SRE (LATAM)

Remote: Full Remote

Offer summary

Qualifications:

  • Experience in incident response and system monitoring.
  • Strong understanding of cloud infrastructure and automation tools.
  • Familiarity with Kubernetes orchestration and performance optimization.
  • Detail-oriented with a passion for improving system reliability.

Key responsibilities:

  • Assist with incident investigation and root cause analysis.
  • Design and implement preventive measures based on incident patterns.
  • Monitor service health and implement proactive improvements.
  • Collaborate with the SRE team to enhance system reliability.

RebelMouse Startup https://www.rebelmouse.com/
51 - 200 Employees

Job description


Site Reliability Engineer

About RebelMouse

RebelMouse is the always-modern SaaS CMS where more than 100 enterprise brands and media companies grow their digital audience. Websites running on RebelMouse serve more than half a billion page views per month thanks to powerful tools and incredible distribution across search and social. We blend technology and strategy to move the needle where it matters most: increasing traffic, loyalty, and revenue.

Our People

Our fully distributed team lives in 33 countries around the world. Led by Andrea Breanna, our Mexican-American, gender-fluid founder and CEO, we are a very safe, positive, and loving environment where diversity matters. We enjoy interesting tasks and strong challenges, value a sense of humor, and strive for work-life balance.

Job Summary

We are looking for a motivated and detail-oriented Site Reliability Engineer (SRE) to join our Infrastructure team. In this role, you will focus on incident response, system monitoring, and maintaining the reliability of our services. Over time, you will have the opportunity to take on broader responsibilities within the SRE function. We are seeking someone who is passionate about infrastructure, eager to learn, and ready to grow by supporting and improving the stability and performance of our platform.

Key Responsibilities:
  • Assist with incident investigation and root cause analysis

  • Design and implement preventive measures based on incident patterns

  • Create and update runbooks and documentation for operational procedures

  • Develop automation to prevent recurring incidents

  • Monitor service health and implement proactive improvements

  • Collaborate with existing SRE team members to enhance system reliability

  • Identify and address technical debt related to infrastructure stability

  • Help reduce alert noise by refining monitoring thresholds and rules

Growth Opportunities
  • Develop expertise in cloud infrastructure management

  • Learn advanced Kubernetes orchestration

  • Gain experience with performance optimization

  • Contribute to automation and tooling development

  • Participate in system architecture discussions

Benefits Package
  • Remote work forever

  • Monthly wellness subsidy

  • Flexible work hours

  • Flexible paid time off (PTO) with 12 national holidays and 20 days of vacation per year, as well as paid sick days and personal celebration days :)


RebelMouse is committed to providing a diverse work environment. We appreciate the unique competencies that each person brings to the company, and we provide equal employment opportunity to all applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, protected veteran status, or disability status.



Required profile

Experience

Spoken language(s):
English

Other Skills

  • Detail Oriented
  • Collaboration
