Senior Site Reliability Engineer

extra holidays - extra parental leave
Remote: Full Remote

Offer summary

Qualifications:

  • 7+ years of experience in Site Reliability Engineering or related roles.
  • Proficiency in Kubernetes and Terraform, with a strong understanding of cloud providers like AWS and GCP.
  • Deep knowledge of observability practices and experience with tools like PromQL and Grafana.
  • Strong programming skills in languages such as Go, Python, or Rust, with a focus on automation and security best practices.

Key responsibilities:

  • Write and maintain Terraform configurations and modules for Kubernetes clusters.
  • Develop and maintain Helm charts for internal and third-party software deployments.
  • Respond to incidents in the production environment and participate in on-call rotations.
  • Support pre-sales activities by addressing customer inquiries regarding architecture and data security.

Supermetrics Scaleup http://supermetrics.com
201 - 500 Employees

Job description

We’re looking for a Senior Site Reliability Engineer to join our Infrastructure team at Supermetrics. We are a frontrunner in data integration technology, with headquarters in Helsinki, Finland, and offices across different locations. We're a team of 360+ growth-minded people from diverse backgrounds. Together, we make a multicultural, resourceful, and collaborative team.

About the role:

  • Location: Fully remote from Brazil (GMT -3 or GMT -4)
  • Contract: Full-time contract, through a local third-party Employer of Record (EOR)
  • Language Requirement: Fluency in English (both spoken and written) is essential
  • Onboarding: As part of your onboarding, we expect you to spend 2-3 weeks at our HQ in Helsinki (we organize the travel arrangements).

In this role, we're looking for someone with:

  • Experience as a Software Engineer with an extensive background in developing tools and services within a Platform Engineering team focused on building APIs, services, and CLIs to support development teams.
  • Proficiency in Kubernetes, including extending its capabilities through Custom Resource Definitions (CRDs) and operators, improving networking and expanding storage options via Container Network Interfaces (CNIs) and Container Storage Interfaces (CSIs), and securing clusters with network policies, admission controls, security profiles, and runtime classes.
  • Deep knowledge of observability practices, including advanced proficiency with PromQL, defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and configuring and deploying OpenTelemetry collectors. Skilled in setting up and managing Grafana Dashboards, operating Time Series Databases (e.g., VictoriaMetrics), and working with Elastic/OpenSearch.
  • Background in Site Reliability Engineering (SRE), with practical experience both as a user and as an operator. Capable of developing, supporting, and maintaining the tools, processes, and infrastructure essential for collecting, analyzing, and optimizing metrics, logs, and traces to ensure the high performance and scalability of SaaS platforms.

Your responsibilities will include:

  • Write Terraform configuration and modules that bootstrap a Kubernetes cluster, or review PRs with contributions from other members, making sure that our modules are truly reusable and well-defined, improving how we test and release them.
  • Write (using Golang, for example), maintain, and improve our tooling so that engineering teams can use the platform more easily.
  • Develop and maintain Helm charts for internal deployments and third-party software.
  • Respond to an incident in our production environment.
  • Support our pre-sales team by helping them answer potential customers' questions about our architecture and how we ensure data security, consistency, and uptime.
  • Review an architecture change involving a new database and take part in the meetings discussing the pros and cons of such an approach.
  • Rewrite a GitHub Action to improve how we deploy to Kubernetes using GitOps.
  • Fix technical issues as they arise.
  • Participate in our on-call rotations to provide support, respond to incidents, or handle internal users' questions.

Technologies you'll be working with:

  • Kubernetes
  • ArgoCD, Helmfile, Helm, External Secrets, Cert-manager, Nginx, Contour 
  • Terraform
  • Cloud providers: AWS/GCP (Queues, Compute, Object Storage, Networking, IAM, etc.)
  • Other providers: Cloudflare (CDN, DNS), Aiven, Redis Co.
  • GitHub Cloud and GitHub Enterprise
  • OpenSearch, Redis, PostgreSQL, ClickHouse, MySQL  
  • PHP, Golang

Requirements: 

  • 7+ years of experience in Site Reliability Engineering, Platform Engineering, or related roles.
  • In-depth understanding of containers and experience operating Kubernetes clusters at scale.
  • Experience operating databases in production.
  • Proficiency in database concepts, with practical experience in both relational and NoSQL databases.
  • In-depth knowledge of Linux systems and Terraform.
  • In-depth experience with and understanding of AWS and GCP.
  • Solid understanding of modern observability practices and tools.
  • Automation mindset, with the ability to automate repetitive tasks using scripting languages such as Python or Bash.
  • Collaborative approach to working with others.
  • Willingness to join on-call rotations during non-business hours.
  • Good communication skills, particularly in writing (documentation as well as good PRs).
  • Strong problem-solving skills and a keen interest in the tools, technologies, and problems in this space.
  • A developer background and the ability to write CLIs and other tools in Go, Python, or Rust.
  • Security mindset, with experience implementing security best practices in platform and operational contexts.
  • Experience creating and managing Helm charts.
  • Expert knowledge of continuous integration and continuous deployment (CI/CD) systems and processes, and experience developing and maintaining GitHub Actions.

Recruitment Process:

  • Screening call with the recruiter 
  • Hiring Manager Interview
  • Technical Assignment + Technical interview
  • Team Interview

Does this sound like your next adventure? Apply now! We'll fill the role as soon as we find the right person.

Hear why our team likes it here at supermetrics.com/careers/life-at-supermetrics.

Get to know our Engineering team at supermetrics.com/careers/engineering.

Join us on our mission to make data a marketing superpower

Supermetrics is a frontrunner in data integration technology, with 15% of global advertising spend reported through our products. 

Our technology streamlines marketing data for over 200,000 businesses through a network of agencies and customers like Shopify, HubSpot, and Nestlé. We help marketers master their data and turn it into insights that improve business results and predict the best next step. Since our founding in 2013, we've grown profitably to reach 750K+ users and over 50M€ in annual recurring revenue.

Supermetrics operates on trust, transparency, and a keen customer focus. Forward-looking and action-oriented, we work hard to be the leader in our industry. As team players, we help each other and win together.

We're hiring for a diverse, competent, and collaborative team and building an inclusive workplace where everyone is treated fairly and respectfully.

It all started with a Google t-shirt... Read the rest of our growth story at supermetrics.com/about.

Required profile

Spoken language(s):
English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
