
Cloud SRE

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • 5+ years in Site Reliability Engineering or DevOps.
  • Experience with major cloud providers.
  • Proficient in Docker and Kubernetes.
  • Strong analytical skills and communication abilities.

Key responsibilities:

  • Collaborate to design solutions for high availability.
  • Own incident response and root cause analysis.

Ryz Labs (Information Technology & Services startup): https://ryzlabs.com/
11 - 50 Employees

Job description

Remote position within Argentina or Uruguay

RYZ is seeking a Cloud SRE to join one of our clients, a company developing self-driving robotic carriers that deliver and serve food.

In this role, you will balance hands-on work (building and maintaining critical SRE tooling and processes) with technical leadership: guiding architecture decisions, mentoring others in SRE practices, and steering strategic initiatives to improve system resiliency and availability. You’ll collaborate across engineering, product, and operations teams to ensure that systems meet strict uptime and performance goals while staying aligned with overall business objectives.

Qualifications: 

- 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
- Demonstrated success implementing SRE best practices in high-availability, large-scale systems.
- Experience with one or more major cloud providers (e.g., Google Cloud, AWS, Azure); familiarity with managed services and best practices for high availability.
- Proficiency in Docker, Kubernetes, or similar containerization/orchestration platforms.
- Hands-on experience with logging, metrics, and tracing tools (e.g., Prometheus, Grafana, Datadog, Splunk, New Relic).
- Familiarity with Infrastructure-as-Code (Terraform, Ansible, etc.) and scripting (Python, Go, Bash).
- Proven ability to guide teams in adopting SRE principles without direct managerial authority.
- Excellent communication skills to work across diverse technical and business teams.
- Strong analytical skills to navigate complex systems and identify root causes.
- Comfortable operating in a fast-paced environment with shifting priorities.

Key Responsibilities: 

- Collaborate with development teams to design and implement solutions that ensure high availability in the cloud.
- Lead the definition and management of SLIs and SLOs aligned with business objectives.
- Perform capacity planning, load testing, and performance tuning.
- Develop monitoring and observability tools to validate system availability and performance.
- Implement best practices for instrumentation with tools like Prometheus, Grafana, or Datadog.
- Own the incident response process and root cause analysis.
- Identify and mitigate reliability risks to reduce downtime.
- Facilitate postmortems to capture learnings and drive continuous improvement.
- Advise teams on reliability-oriented design and development practices.
- Mentor engineers to foster a culture of continuous learning and operational excellence.


About RYZ Labs:
RYZ Labs is a startup studio founded in 2021 by two lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam. What brought them together is a passion for the early phases of company creation and for attracting the brightest talent to build industry-defining companies in a post-pandemic world.

Our teams are remote and distributed throughout the US and Latam. They use cutting-edge cloud computing technologies to build scalable, resilient applications. We aim to provide diverse product solutions for different industries and plan to build a large number of startups in the coming years.

At RYZ, you will find yourself working with autonomy and efficiency, owning every step of your development. We provide an environment of opportunities, learning, growth, expansion, and challenging projects. You will deepen your experience while sharing and learning from a team of great professionals and specialists.

Our values and what to expect:

- Customer First Mentality - every decision we make should be made through the lens of the customer.
- Bias for Action - urgency is critical; expect timelines for getting things done to be accelerated.
- Ownership - step up if you see an opportunity to help, even if it is not your core responsibility.
- Humility and Respect - be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
- Frugality - being frugal and cost-conscious helps us do more with less.
- Deliver Impact - get things done in the most efficient way. 
- Raise our Standards - always be looking to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Problem Solving
  • Teamwork
  • Communication
  • Analytical Skills
