Site Reliability Consultant
LATAM | Remote | Work from Home
Why you?
As a Site Reliability Consultant, you will serve as both a technology leader and trusted advisor to our customers, while mentoring teammates in cutting-edge tools and approaches. Your projects will focus on infrastructure design and modernization, automation of CI/CD pipelines, and building out intelligent monitoring and observability systems—spanning Linux, Cloud, container orchestration, and other open-source technologies. You’ll become our resident expert for Git-based source code management, artifact repository solutions, and Kubernetes in both cloud (e.g., AWS EKS) and on-prem environments.
What you will be doing:
Operate & Maintain
- Administer and optimize platforms such as GitLab (CI/CD pipelines, runners) and artifact repository solutions (e.g., JFrog Artifactory).
- Maintain and troubleshoot Kubernetes clusters, whether in the cloud (AWS EKS) or on on-prem distributions, with a focus on availability, performance, and security.
Automation & CI/CD
- Champion “infrastructure as code” using tools like Terraform (or CloudFormation), building repeatable processes for provisioning and updating clusters, repos, and associated services.
- Implement or improve CI/CD pipelines to reduce manual toil and ensure quick, reliable deployments across multiple environments.
Monitoring & Incident Response
- Design and configure observability solutions (e.g., Prometheus, Dynatrace, Grafana) to proactively detect and address issues in container orchestration environments, code repositories, and artifact repositories.
- Participate in an on-call rotation, troubleshooting incidents at all tiers (from first-contact resolution to escalation) and driving continuous improvement based on Root Cause Analysis.
Architectural Guidance & Roadmaps
- Collaborate with clients to shape infrastructure strategies around container orchestration, secure CI/CD, and DevSecOps best practices.
- Provide leadership and technical direction on automating repetitive administrative tasks, enforcing security policies (RBAC, TLS, container scanning), and adopting GitOps workflows.
Documentation & Mentorship
- Create and maintain design documents, runbooks, and operational playbooks for container platforms, CI/CD pipelines, and code management services.
- Mentor fellow consultants and client stakeholders on Kubernetes, infrastructure automation, and advanced CI/CD usage to enhance knowledge across the organization.
Process Management
- Plan and coordinate maintenance activities, ensuring minimal downtime and clear communication with stakeholders.
- Provide ITIL-oriented support (Incident, Change, Problem Management), and champion continuous improvement of operational processes and service reliability.
What we need from you:
Kubernetes & Containerization
- Must have strong experience with container orchestration (Kubernetes, Docker) in cloud (AWS EKS) or on-prem distributions.
- Familiarity with related ecosystem tools (Helm, Operators, GitOps, etc.).
AWS & Cloud Expertise
- Hands-on experience using AWS (VPC, EC2, EKS, IAM, S3, etc.), including provisioning with IaC tools like Terraform (or AWS CloudFormation).
- AWS certifications (Solutions Architect, DevOps Engineer) are a plus.
CI/CD & Source Code Management
- Experience setting up GitLab or similar platforms (GitHub, Bitbucket) for CI/CD pipelines, managing runners, and integrating code scanning.
- Familiarity with artifact repository solutions (e.g., JFrog Artifactory), including repository creation, access controls, and automation of artifact flows.
DevOps & Automation
- Track record of infrastructure automation using Terraform, Ansible, Puppet, or Chef to reduce manual intervention and ensure repeatable deployments.
- Strong scripting skills (Bash, Python, Go, etc.) to automate system tasks and streamline operational workflows.
Monitoring & Observability
- Experience with modern monitoring stacks (Prometheus, Dynatrace, Grafana, ELK/EFK) for analyzing logs, metrics, and traces.
- Proven ability to design alerts, dashboards, and runbooks that enable rapid first-contact resolution.
Linux & Networking
- Solid understanding of Linux-based systems, performance tuning, and troubleshooting.
- Knowledge of network fundamentals (TCP/IP, load balancers, DNS, NTP, etc.) and the ability to diagnose connectivity or performance issues in complex distributed environments.
Security & Compliance
- Familiarity with container security best practices (RBAC, TLS, vulnerability scanning) and how to apply them at scale.
- Understanding of compliance frameworks (HIPAA, PCI, etc.) and data privacy constraints is a plus.
Soft Skills & Collaboration
- Adept at communicating technical concepts to both engineering and non-technical stakeholders.
- Ability to mentor junior team members, champion DevOps culture, and contribute to an inclusive, knowledge-sharing environment.
Education & Experience
- Bachelor’s Degree in Computer Science, Information Systems, or equivalent experience.
- Several years of progressive DevOps or SRE experience managing large-scale systems in a production environment.
AI/Automation Tooling
- Experience or strong interest in leveraging AI-based services or scripts for operational efficiency and faster issue resolution is highly desirable.
What you get in return:
- Love your career: Work on innovative projects such as complex, modern platform initiatives, ranging from on-prem container orchestration to advanced CI/CD and DevSecOps solutions in the cloud.
- Love your work/life balance: Work flexibly and remotely from your home; there’s no daily travel requirement to an office! All you need is a stable internet connection.
- Love your team: Enjoy our collaborative culture in a diverse, global team of SRE experts who share knowledge, tackle incidents together, and learn from each other.
- Love your development: Pythian cares about continuous learning and provides opportunities to earn certifications (AWS, Kubernetes, Terraform) and expand your skill set across multiple platforms, frameworks, and industries.
- Love your workspace: We give you all the equipment you need to work from home, including a laptop with your choice of OS and an annual budget to personalize your work environment!
- Love yourself: Pythian cares about the health and well-being of our team. You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more). Additionally, you will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity.
- Love your customers: You will be focused on highly impactful work, helping clients modernize their infrastructure, automate critical processes, and improve reliability for mission-critical applications.
Why Pythian
Pythian excels at helping businesses use data, analytics, and cloud to transform how they compete and win by delivering advanced on-prem, hybrid, cloud, and multi-cloud solutions. In the early years, we focused on supporting mission-critical operational databases. As our experts became known for their ability to solve the toughest data challenges, our services grew to meet the rapidly changing needs of our clients, expanding from on-premises to the cloud and from operational to analytics data systems.
A powerful combination of extensive expertise in data and cloud, as well as our ability to keep ahead of the latest bleeding-edge technologies, makes us the perfect partner. We help mid- and large-sized businesses transform to stay ahead in today’s rapidly changing digital economy.
We pride ourselves on our ability to deliver innovative solutions that meet the specific data goals of each client, and we have built meaningful partnerships with the major cloud vendors: AWS, Google Cloud, Microsoft, and Oracle.
Not the right job for you? Check out what other great jobs Pythian has open around the world!
Pythian Careers
Disclaimer
The successful applicant will need to fulfill the requirements necessary to obtain a background check.
Accommodations are available upon request for candidates taking part in all aspects of the selection process.