
Site Reliability Engineer

extra holidays
Remote: Full Remote
Salary: 140 - 170K yearly
Experience: Expert & Leadership (>10 years)

Offer summary

Qualifications:

Bachelor's degree in Computer Science or related field, 10+ years of IT experience across multiple disciplines, 5+ years leading SRE implementation for large teams, Expert in process improvement methodologies.

Key responsibilities:

  • Translate requirements into solution reliability strategy
  • Lead data-driven improvements in software reliability
Cotiviti Large http://www.cotiviti.com/
5001 - 10000 Employees

Job description

Overview:

The Site Reliability Engineer (SRE) is responsible for leading the continuous evolution of the capabilities needed to ensure the reliable delivery and operation of the software solutions that enable Cotiviti to retrieve medical records from healthcare providers.

 

The SRE works closely with architects, development teams, production operations, and product owners to enable the appropriate level of reliability to meet business objectives. The SRE serves as a mentor and role model, providing thought leadership and collaborating with a cross-functional team to drive the continuous improvement of SDLC and production operations. They improve reliability by focusing on monitoring, productivity, performance, and availability.

 

The SRE has three primary areas of responsibility:

  • Operations: emergency incident response; change management; infrastructure management
  • System support: ensure system stability; production operations enablement
  • Process improvement: post-incident reviews; improve software development, deployment, and release practices; improve support practices; recommend changes to solution architecture

 

Collaborating with stakeholders, the SRE: defines business-aligned Service Level Indicators and Objectives; implements capabilities which provide real-time insight into the health of applications and the development pipelines; implements process and technology changes; automates routine SDLC tasks.

 

The SRE possesses a deep understanding of AWS cloud-native services and makes sure the team employs the correct strategies and tactics to keep the applications and services operating on AWS reliable.

Responsibilities:
  • Ability to translate functional and nonfunctional requirements and strategies into solution reliability strategy, architecture, and roadmap in collaboration with development team members and other architects.
  • Ability to define key business-value aligned Service Level Indicators and Objectives. Automate SLIs/SLOs through observability tools.
  • Ability to lead data-driven improvement in reliability of the software solution.
  • Ability to apply SRE principles and practices to solutions built using AWS cloud-native services, such as but not limited to:
    • API Gateways
    • Lambda functions built using NestJS/Node.js
    • Datastores (DynamoDB, OpenSearch, RDS, S3, HealthLake)
    • Event messaging technologies (SQS, EventBridge, Kinesis)
    • Logging/Tracing (CloudWatch, X-Ray)
    • Infrastructure as Code (Terraform)
  • Ability to drive continuous process and technology improvements to increase the reliability of deployments and releases
  • Coaching/training development team members as necessary to drive improvements in the teams’ delivery of the solution.
  • Support the continuous evolution of best practices and standards for solution reliability
  • Complete all responsibilities as outlined in the annual Performance Plan.
Qualifications:
  • Proven track record of applying SRE principles and practices to drive reliable software delivery and operation
  • Self-starter with a passion for delivering reliable, mission-critical solutions which delight customers
  • Expert in applying process improvement methodologies (Lean, Six Sigma, Kaizen, etc.) to software engineering practices
  • Bachelor’s degree in Computer Science, Information Technology or related field, or equivalent work experience
  • 10+ years of experience in at least two IT disciplines (such as data/solution architecture, technical/infrastructure architecture, information/data architecture, and business architecture) in a multitier enterprise environment
  • 5+ years of recent experience leading the implementation of SRE in support of large development teams
  • 5+ years of hands-on experience implementing site reliability engineering practices; expert with tools and technologies used to improve software reliability at scale
  • 10+ years working in an Agile model, SAFe preferred
  • Prior hands-on experience with greenfield software development
  • Ability to apply data-driven decision making when evaluating architecture alternatives, balancing cost, complexity, time-to-market, and other factors
  • Basic knowledge of financial models and budgeting
  • Strong problem-solving and critical-thinking skills
  • Exceptional interpersonal skills including teamwork, facilitation, coaching, and negotiation
  • Excellent written and verbal communication skills
  • Strong leadership skills

Mental Requirements:

  • Communicating with others to exchange information.
  • Assessing the accuracy, neatness, and thoroughness of the work assigned.

 

Physical Requirements and Working Conditions:

  • Remaining in a stationary position, often standing or sitting for prolonged periods.
  • Communicating with others to exchange information.
  • Repeating motions that may include the wrists, hands, and/or fingers.
  • Assessing the accuracy, neatness, and thoroughness of the work assigned.
  • No adverse environmental conditions are expected.
  • Must be able to provide a dedicated, secure work area.
  • Must be able to provide high-speed internet access/connectivity and office setup and maintenance.

 

Base compensation ranges from $140,000 to $170,000. Specific offers are determined by various factors, such as experience, education, skills, certifications, and other business needs.

 

This role is eligible for discretionary bonus consideration.

 

Cotiviti offers team members a competitive benefits package to address a wide range of personal and family needs, including medical, dental, vision, disability, and life insurance coverage, 401(k) savings plans, paid family leave, 9 paid holidays per year, and 17-27 days of Paid Time Off (PTO) per year, depending on specific level and length of service with Cotiviti. For information about our benefits package, please refer to our Careers page.

 

Since this job will be based remotely, all interviews will be conducted virtually.

 

Date of posting: 12/13/2024

Applications are assessed on a rolling basis. We anticipate that the application window will close on 03/12/2025, but the application window may change depending on the volume of applications received or close immediately if a qualified candidate is selected.


 

Required profile

Experience

Level of experience: Expert & Leadership (>10 years)
Spoken language(s): English

Other Skills

  • Problem Solving
  • Communication
  • Leadership
  • Teamwork
  • Critical Thinking
