
Senior Site Reliability Engineer

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

Strong background in software engineering, experience in IT operations, knowledge of Docker and Kubernetes, and familiarity with AWS and monitoring tools.

Key responsibilities:

  • Ensure reliability, scalability, and performance
  • Manage system capacity and load balancing
  • Define and measure SLAs, SLIs, SLOs
  • Collaborate with development teams
  • Automate operational tasks and maintain tech stack

Job description


Promaton is changing the dental healthcare landscape by automating treatment planning workflows using AI, making healthcare more affordable and accessible for everyone. We are on a mission to eliminate errors in dentistry by improving diagnostic accuracy and automating treatment planning workflows. See our company page to learn more about what we do.

Our team’s mission is to ensure that (1) our AI can be accessed efficiently and effectively by thousands of customers worldwide and (2) our internal Product teams have the best experience when developing new features.

We are looking for a highly motivated and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in software engineering and IT operations and will be responsible for ensuring the reliability, scalability, and performance of our production systems.

Do you want to join us in our journey to improve the lives of patients?

You will:
  • Be in charge of our products’ reliability
    • Design, implement, and maintain the monitoring and alerting systems to ensure system availability and performance.
    • Manage the system’s capacity, load balancing, and performance to anticipate and mitigate problems before they occur.
    • Define and measure our Service Level Agreements (SLAs), Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
    • Collaborate with the development teams to improve the reliability and scalability of applications.
  • Promote a DevOps philosophy inside the engineering teams
    • Conduct post-mortem analyses of incidents and develop action plans to prevent future issues.
    • Bring an Incident Commander mentality to handling incidents and on-call rotas.
    • Participate in the planning and execution of software deployments.
  • Improve our stack
    • Automate repetitive operational tasks using tools and scripts.
    • Keep the tech stack up to date and help other teams do the same.

Our tech stack:

Docker | Kubernetes | AWS | Grafana | Prometheus | GitHub & GitHub Actions | TypeScript | Node.js | Express | PostgreSQL | Metabase | OpenAPI | Python | PyTorch | TensorFlow | ArgoCD & Workflows | ClearML | Packer

Our whole stack runs on AWS using EKS, and we deploy our infrastructure changes in a GitOps pipeline using CDK. Our applications are deployed in a GitOps fashion using ArgoCD.

Our backend is mostly written in TypeScript and Python, and all our machine-learning applications are in Python. We have an efficient and effective design and development process around RFCs, PR reviews, and pair programming.

The perks of working at Promaton:

🎈 Inclusive environment: we value and celebrate diversity.

🏡 Excellent work/life balance. Freedom to work from home or anywhere you like (and any time you like). We only have a few touchpoints.

💪 Loads of responsibility and autonomy (we stay away from micromanagement) and a chance to make a real impact.

👩‍🔬 Dedicated time for hackathons and growth to explore new ideas of your own. Every quarter, we have a hackathon week where you can work on anything you like to expand your skill set!

🎓 Real training budget for books, conferences, or anything else you need to grow.

💰 Attractive salary package and excellent employment terms.

🚀 Work with the latest technology at the forefront of a rapidly developing field in medical imaging AI.

🏖 Awesome yearly company retreat and quarterly team events.

💻 Top-notch gear and even bigger servers to play with.

🏄‍♂️ Promaton is funded for many years to come, meaning you can have the impact you only get at a startup but with the job security of an established company.

🛬 For international engineers based in the NL (already relocated to the Netherlands), we are able to offer visa sponsorship.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Problem Solving
  • Communication
  • Collaboration
  • Analytical Thinking
