Offer summary

Qualifications:

5+ years in Operations, SRE, or DevOps roles

3+ years managing on-premise Kubernetes clusters

Strong troubleshooting skills in Kubernetes, networking, and databases

Proficient in monitoring tools such as Prometheus, Grafana, and Loki

Key responsibilities:

Manage and maintain on-premise Kubernetes infrastructure

Monitor and troubleshoot performance issues and respond to incidents

Support and maintain databases, ensuring performance and data integrity

Execute and manage production deployments with rollback strategies

Job description

Kubernetes On-Premise Operations Engineer

Location: Remote (candidates based in Bolivia only)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)

We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. This role is focused on day-to-day operations, proactive monitoring, troubleshooting, and ensuring high availability and system stability. The engineer will collaborate closely with Level 3 Engineers who provide the infrastructure backbone, ensuring seamless and reliable production operations.

Scope of Applications Supported

Mi Tigo – Serving 6 countries
Tigo Sports – Available in 6 countries
Apigee – Active in 1 country
KannelGateway – Used across 9 countries

Key Responsibilities

Kubernetes Cluster Management
- Apply patches and updates
- Monitor and troubleshoot performance issues
Incident Management & On-Call Support
- Participate in on-call rotation
- Respond to incidents, perform root cause analysis (RCA), and document resolutions
Networking & Ingress Management
- Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik
Storage & Databases
- Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity
Observability & Monitoring
- Manage Prometheus, Grafana, and Loki for proactive alerting and system logging
Automation & Configuration Management
- Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations
Production Deployments
- Execute, monitor, and manage production deployments with proper rollback strategies
OS & Security Management
- Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant

Requirements

5+ years in Operations, SRE, or DevOps roles
3+ years managing on-premise Kubernetes clusters
Strong troubleshooting skills in:
- Kubernetes
- Networking
- Databases (MongoDB, MySQL, PostgreSQL)
Proficient in monitoring tools: Prometheus, Grafana, Loki
Familiar with operational processes, incident management, and runbooks
Experience with Helm, Ansible, and optionally Terraform
Prior experience with production on-call support and incident resolution
Competent in performing production deployments under change management practices
Experience managing Ubuntu systems