
Site Reliability Engineer (Remote)

extra holidays
Remote: Full Remote
Salary: 10 - 105K yearly
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Experience in Site Reliability Engineering or DevOps
  • Strong experience with AWS services
  • Expertise using monitoring tools like Prometheus and Grafana
  • Hands-on with CI/CD pipeline management
  • Familiarity with Infrastructure as Code tools

Key responsibilities:

  • Monitor platform activity and manage incidents
  • Manage and support AWS infrastructure
  • Collaborate with teams for platform improvements
  • Automate processes for efficiency
  • Enhance monitoring systems and optimize performance
Perlego E-learning Scaleup https://www.perlego.com/
51 - 200 Employees

Job description

What we do

At Perlego, there are over 100 of us working hard to make education accessible to all. In this digital age, we believe that anyone should be able to learn anything at any time. Knowledge should be more accessible, not locked behind sky-high price tags.

Over the past 5 years, our goal has been to support students across the UK & Europe to access quality books. The next stage of Perlego is twofold: 1) expand our support to students globally, and 2) build a product that goes beyond the book, a platform that helps students study smarter and more effectively.

What we're looking for:

We are looking for an experienced Site Reliability Engineer (SRE) with a strong background in AWS services and monitoring tools. In this role, you will ensure the availability and reliability of our services, especially during out-of-office hours, while most of the team is based in Europe and India. You will be integral to swiftly addressing issues, resolving incidents independently, and thriving in a fast-paced environment.

How we collaborate:

Our organization operates across multiple time zones, with teams based across Europe. As an SRE, you will provide critical support during off-hours, working autonomously to resolve issues while collaborating closely with our teams to ensure continuous service availability. You will be part of a global team, supporting cloud infrastructure and platform initiatives.

What you’ll do:

As a Site Reliability Engineer, your main focus will be to ensure our services remain highly available and performant. Key responsibilities include:

Monitoring & Incident Management:

  • Monitor and manage platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch.
  • Respond quickly to alerts and incidents, independently resolving issues and ensuring service uptime during off-peak hours.
  • Conduct post-incident reviews and help improve system resiliency through automation and monitoring enhancements.

Cloud Infrastructure Management:

  • Manage and support AWS infrastructure, focusing on scalability, security, and reliability.
  • Handle deployments, managing CI/CD pipelines for both containerized (Docker/Kubernetes) and serverless (AWS Lambda) applications.
  • Ensure effective backup, recovery, and disaster recovery strategies to minimize downtime.

Collaboration & Communication:

  • Collaborate with cross-functional teams to implement platform improvements.
  • Work independently and make swift decisions when managing service incidents outside core business hours.
  • Assist in platform security, ensuring adherence to best practices for cloud security and compliance.

Continuous Improvement:

  • Automate manual processes to reduce human error and improve efficiency.
  • Continuously enhance monitoring systems, ensuring robust early detection and resolution capabilities.
  • Identify potential performance bottlenecks and contribute to overall platform optimization.

Requirements

This role is ideal for you if you possess:

  • Experience in Site Reliability Engineering, DevOps, or a similar field.
  • Strong experience with AWS services.
  • Expertise in using monitoring tools (e.g. Prometheus, Grafana, CloudWatch) for real-time platform performance insights.
  • Hands-on experience with CI/CD pipeline management for deploying containerized (Docker) and serverless applications.
  • Proficiency in Linux-based operating systems and shell scripting.
  • Familiarity with Infrastructure as Code tools (Terraform, CloudFormation).
  • Experience with incident management, troubleshooting, and platform recovery in high-pressure environments.
  • Strong communication skills with a proven ability to work both independently and collaboratively across time zones.

⭐️ It’s a plus if you have:

  • Experience working in a global, distributed team providing off-hours support.
  • Knowledge of container orchestration tools.
  • Previous experience with SecOps and cloud security best practices.
  • Familiarity with scaling highly available systems in a fast-paced, growth-oriented environment.

Benefits

🌈 Benefits include:

✨Compensation

The salary available for this role is CA$105,000 + Share options

Why should you work at Perlego?

Apart from our mission, we foster a unique company culture championing self-empowerment, personal development, direct communication and mutual support. We’re proud of our Glassdoor reviews and the fact that 97% of our team would recommend Perlego as a place to work.

Want to learn more about how we’re making learning accessible? Check out our latest impact report

🧠 L&D Budget

We value continuous learning and you will have a personal L&D budget for online courses, subscriptions or books not on Perlego.

🌱 Unlimited Coaching Opportunities

Unlimited access to MoreHappi, an on-demand coaching platform that gives all employees unbiased, professional coaching.

🤓 Learning Time

All employees have dedicated Learning Time to focus on new skills, projects or interests that lie outside their day-to-day job.

🌴 Work-Life Balance

Everyone needs a break, so enjoy 30 days off (incl. bank holidays), plus 1 additional day of annual leave for every year of service, up to 35 days off (incl. bank holidays).

🛐 Flexi Bank Holidays

We understand that not everyone aligns with the same calendar; we offer the flexibility to take your local country's bank holiday allowance for other religious or cultural days.

e.g. switch UK Easter Bank Holiday days for Eid celebrations

❄️ Office Reset

All employees can also enjoy the days between Boxing Day and New Year off, to reset and refresh for the new year - this is additional to your annual leave 🙂

🏖 Sabbatical

After three years there is an opportunity to take a 1-month unpaid sabbatical, and after five years a 1-month paid sabbatical.

💛 Personal Days

Life happens and we want you to be able to use your annual leave for resting, relaxing or taking time out to do something you love!

We offer 1 additional day a year for life events (your wedding, relocation, moving house, or a child starting school).

🍏 Health & Wellbeing

We want everyone to feel healthy and happy, so you get private medical insurance.

🍼 Family time

We believe family is really important; we offer new parents competitive matched parental leave as well as a phased return to work from extended leave.

👋 Belonging at Perlego:

🌈 We are an equal opportunity employer and value diversity of thought and background.

❤️ We are actively building a diverse team, so we strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.

📣 To enable an equitable experience for all and give you the best chance of success, if you have any specific requirements for any stage of the interview process, please let us know by emailing ben@perlego.com

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: E-learning
Spoken language(s): English

Other Skills

  • Communication
  • Problem Solving
  • Collaboration
