Senior Staff Engineer - Site Reliability at Nagarro

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

Colombia

Offer summary

Qualifications:

Expert in Kubernetes and Terraform
Proficient in AWS services, especially EKS
Experience with incident and problem management
Understanding of networking concepts like TCP/IP and DNS
Familiarity with CI/CD tools like Jenkins

Key responsibilities:

Provide L3 support across full stack
Automate SRE tools for proactive support
Communicate effectively with stakeholders during troubleshooting
Monitor resource utilization and application performance
Handle business pressure for critical applications

Nagarro Information Technology & Services XLarge https://www.nagarro.com/

10001 Employees

Job description

Company Description

We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale — across all devices and digital mediums, and our people exist everywhere in the world (19000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!

Job Description

Experienced L3 SRE engineer working on a business-critical SaaS application.
Capacity to provide L3 support across the full stack, including infrastructure, back end, and front end, before escalating to the engineering business unit.
Capacity to automate SRE tools to provide proactive L3 support, in line with our technical monitoring strategy.
Capacity to work under business pressure for business-critical applications.
Capacity to communicate appropriately with L1, L2, Engineering, Product Managers, leadership, and end users during troubleshooting.

Qualifications

Must-have skills: Kubernetes (Expert), GitHub Actions, Terraform (Expert), and AWS.

Capacity to communicate effectively with technical and non-technical stakeholders.
Experience with incident and problem management.
Experience with multitenant applications.
Solid understanding of networking concepts (TCP/IP, DNS, routing, etc.), including VPCs, subnets, firewalls, load balancing, and TLS/SSL.
Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control.
Python and React/Next.js.
Monitoring and logging to analyze and track resource utilization and application performance and to identify potential issues (Grafana, Prometheus, Loki, or ELK).
Experience with AWS, particularly EKS, serverless, queues, and various databases.
Solid knowledge of Kubernetes.