Offer summary
Qualifications:
- Proficiency in at least one programming language (Python, Go, or Java)
- Solid knowledge of operating systems and distributed systems
- Experience handling production incidents, including root cause analysis
- Strong communication skills for explaining technical concepts
- Familiarity with cloud platforms and container orchestration systems (preferred)
Key responsibilities:
- Develop tools to automate operational processes
- Lead monitoring and observability framework improvements
- Participate in on-call rotation for incident response
- Collaborate with developers to enhance service reliability
- Manage SLIs, SLOs, and SLAs effectively