Offer summary

Qualifications:

Experience in incident response and system monitoring.

Strong understanding of cloud infrastructure and automation tools.

Familiarity with Kubernetes orchestration and performance optimization.

Detail-oriented with a passion for improving system reliability.

Key responsibilities:

Assist with incident investigation and root cause analysis.

Design and implement preventive measures based on incident patterns.

Monitor service health and implement proactive improvements.

Collaborate with the SRE team to enhance system reliability.

Job description

Site Reliability Engineer

About RebelMouse

RebelMouse is the always-modern SaaS CMS where more than 100 enterprise brands and media companies grow their digital audience. Websites running on RebelMouse serve more than half a billion page views per month thanks to powerful tools and incredible distribution across search and social. We blend technology and strategy to move the needle where it matters most, increasing traffic, loyalty, and revenue.

Our People

Our fully distributed team lives in 33 countries around the world. Led by Andrea Breanna, our Mexican-American, gender-fluid founder and CEO, we are a very safe, positive, and loving environment where diversity matters. We enjoy interesting tasks and strong challenges, value a sense of humor, and strive for work-life balance.

Job Summary

We are looking for a motivated and detail-oriented Site Reliability Engineer (SRE) to join our Infrastructure team. In this role, you will focus on incident response, system monitoring, and maintaining the reliability of our services. Over time, you will have the opportunity to take on broader responsibilities within the SRE function. We are seeking someone who is passionate about infrastructure, eager to learn, and ready to grow by supporting and improving the stability and performance of our platform.

Key Responsibilities:

Assist with incident investigation and root cause analysis
Design and implement preventive measures based on incident patterns
Create and update runbooks and documentation for operational procedures
Develop automation to prevent recurring incidents
Monitor service health and implement proactive improvements
Collaborate with existing SRE team members to enhance system reliability
Identify and address technical debt related to infrastructure stability
Help reduce alert noise by refining monitoring thresholds and rules

Growth Opportunities

Develop expertise in cloud infrastructure management
Learn advanced Kubernetes orchestration
Gain experience with performance optimization
Contribute to automation and tooling development
Participate in system architecture discussions

Benefits Package

Remote work forever
Monthly wellness subsidy
Flexible work hours
Flexible paid time off (PTO) with 12 national holidays and 20 days of vacation per year, as well as paid sick days and personal celebration days :)

RebelMouse is committed to providing a diverse work environment. We appreciate the unique competencies that each person brings to the company, and we provide equal employment opportunity to all applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, protected veteran status, or disability status.

Required profile