Site Reliability Engineer

extra holidays - extra parental leave

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

United Kingdom, Europe

Offer summary

Qualifications:

Proven experience as an SRE or in a similar role
Strong knowledge of Elasticsearch/OpenSearch architecture
Experience with performance tuning and cluster optimization
Understanding of JVM concepts and programming languages
Familiarity with monitoring and automation tools

Key responsibilities:

Oversee the performance and reliability of Elasticsearch/OpenSearch clusters
Implement best practices for scaling and indexing
Develop and maintain automated performance testing and monitoring
Diagnose and resolve issues related to cluster health and performance
Collaborate with development and DevOps teams for system enhancements

Coralogix SME https://coralogix.com/

Job description

Description

Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events, with features such as APM, RUM, SIEM, Kubernetes monitoring, and more, all enhancing operational efficiency and reducing observability spend by up to 70%.

We are seeking a skilled Site Reliability Engineer (SRE) with a strong background in Elasticsearch/OpenSearch to join our team. The ideal candidate will manage and optimize large-scale Elasticsearch/OpenSearch clusters, ensuring the infrastructure's stability, performance, and scalability. You'll work closely with development and operations teams to build robust and efficient systems.

Key Responsibilities:

Manage & Monitor: Oversee the performance, reliability, and availability of large-scale Elasticsearch/OpenSearch clusters.
Optimize & Scale: Implement best practices for scaling, indexing, and querying to ensure optimal performance.
Automate & Streamline: Develop and maintain automated performance testing or benchmarking, monitoring, and alerting for Elasticsearch/OpenSearch clusters.
Troubleshoot & Resolve: Quickly identify and resolve issues related to cluster health, data integrity, performance bottlenecks, and search accuracy.
Collaborate: Work closely with development, DevOps, and other teams to design and implement enhancements to cluster architecture, stability, performance, and data management flows.

Requirements

Experience: Proven experience as an SRE or in a similar role, with specific expertise in managing Elasticsearch or OpenSearch clusters.
Technical Skills:
Strong knowledge of Elasticsearch/OpenSearch architecture, including index management, sharding, and replication.
Experience with performance tuning, scaling, and cluster optimization.
Understanding of JVM concepts and ability to code in languages such as Java, Scala, Python, or Go.
Familiarity with monitoring tools (e.g., Prometheus, Grafana).
Experience with configuration management and automation tools (e.g., Ansible, Terraform, Kubernetes).
Problem Solving: Ability to diagnose and troubleshoot complex performance and stability issues in large-scale distributed systems.
Communication: Strong verbal and written communication skills to collaborate across teams and document processes clearly.