(M) Staffing – 3x Operations Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • 5+ years in Operations, SRE, or DevOps roles
  • 3+ years managing on-premise Kubernetes clusters
  • Strong troubleshooting skills in Kubernetes, networking, and databases
  • Proficient in monitoring tools such as Prometheus, Grafana, and Loki

Key responsibilities:

  • Manage and maintain on-premise Kubernetes infrastructure
  • Monitor and troubleshoot performance issues and respond to incidents
  • Support and maintain databases, ensuring performance and data integrity
  • Execute and manage production deployments with rollback strategies

Believe Solutions Scaleup http://www.believesol.com/
51 - 200 Employees

Job description

Kubernetes On-Premise Operations Engineer

Location: Remote (candidates based in Bolivia only)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)

We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. This role is focused on day-to-day operations, proactive monitoring, troubleshooting, and ensuring high availability and system stability. The engineer will collaborate closely with Level 3 Engineers who provide the infrastructure backbone, ensuring seamless and reliable production operations.


Scope of Applications Supported
  • Mi Tigo – Serving 6 countries

  • Tigo Sports – Available in 6 countries

  • Apigee – Active in 1 country

  • KannelGateway – Used across 9 countries


Key Responsibilities
  • Kubernetes Cluster Management

    • Apply patches and updates

    • Monitor and troubleshoot performance issues

  • Incident Management & On-Call Support

    • Participate in on-call rotation

    • Respond to incidents, perform root cause analysis (RCA), and document resolutions

  • Networking & Ingress Management

    • Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik

  • Storage & Databases

    • Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity

  • Observability & Monitoring

    • Manage Prometheus, Grafana, and Loki for proactive alerting and system logging

  • Automation & Configuration Management

    • Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations

  • Production Deployments

    • Execute, monitor, and manage production deployments with proper rollback strategies

  • OS & Security Management

    • Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant


Requirements
  • 5+ years in Operations, SRE, or DevOps roles

  • 3+ years managing on-premise Kubernetes clusters

  • Strong troubleshooting skills in:

    • Kubernetes

    • Networking

    • Databases (MongoDB, MySQL, PostgreSQL)

  • Proficient in monitoring tools: Prometheus, Grafana, Loki

  • Familiar with operational processes, incident management, and runbooks

  • Experience with Helm, Ansible, and optionally Terraform

  • Prior experience with production on-call support and incident resolution

  • Competent in performing production deployments under change management practices

  • Experience managing Ubuntu systems

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Troubleshooting (Problem Solving)
