Site Reliability Engineer

Remote: Full Remote
Salary: $100K - $130K yearly
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Extensive experience with SRE methodologies.
  • Expertise in automation and coding.
  • Experience in public cloud environments.
  • 2+ years of hands-on cloud administration.
  • Relevant degree or equivalent experience.

Key responsibilities:

  • Support and enhance cloud infrastructure.
  • Respond to production incidents effectively.
  • Automate processes using Terraform and Ansible.
  • Maintain documentation for technical procedures.
  • Provide 24x7 support and engage in root cause analysis.
Restaurant365 Computer Software / SaaS SME https://www.restaurant365.com/
201 - 500 Employees

Job description

Restaurant365 is a SaaS company disrupting the restaurant industry! Our cloud-based platform provides a unique, centralized solution for accounting and back-office operations for restaurants. Restaurant365’s culture is focused on empowering team members to produce top-notch results while elevating their skills. We’re constantly evolving and improving to make sure we are and always will be “Best in Class” ... and we want that for you too!

The SRE will help support, enhance, and maintain our infrastructure and cloud services. Qualified candidates will demonstrate immediate technical aptitude and the ability to learn new tools and techniques quickly in a fast-paced environment. The SRE will collaborate with the DevOps and development teams to sustain a healthy, responsive system. The SRE team is the front line for supporting our system and for developing a best-in-class monitoring platform. The candidate will propose enhancements to system health, performance, and reliability in order to deliver SaaS-based services to Restaurant365 customers.

How you'll add value:
  • Responding to production incidents and determining how we can prevent them in the future.
  • Triaging and troubleshooting production issues to ensure reliability and performance.
  • Identifying and automating manual processes.
  • Continuously evolving our monitoring tools and platform.
  • Promoting and applying best practices for building scalable and reliable services across engineering.
  • Developing and maintaining technical documentation/diagrams, runbooks, and procedures.
  • Providing “Always On” support for a 24x7 online environment by participating in an on-call rotation, responding to production incidents, and taking part in root cause analysis and problem management.
  • Automating public cloud environments with tools such as Terraform, Ansible, and AWS CloudFormation.
  • Working within strict time frames and following change management protocols to maximize uptime.
  • Implementing, reviewing, and adhering to security policies, and working with audit teams.
  • Researching and remediating system vulnerabilities.
  • Interacting and coordinating with architects, developers, vendors, and internal business partners.
  • Maintaining documentation of all cloud infrastructure components.
  • Maintaining a solid working knowledge of current infrastructure and future trends.
  • Other duties as assigned.

What you'll need to be successful in this role:
  • Extensive experience with SRE methodologies and processes.
  • Automation expertise, with coding skills and a mindset to automate manual/repetitive tasks using PowerShell, Bash, Perl, PHP, or containers.
  • Extensive scripting experience with Terraform, YAML, Ansible, and Python.
  • Automation experience in public cloud environments, with a strong understanding of infrastructure as code.
  • Experience in continuous deployment and lifecycle management using tools such as GitLab, Git, and Stash.
  • Linux engineering skills and working knowledge of Windows.
  • Working experience with Nginx and Apache Tomcat.
  • Azure or AWS: 2+ years of hands-on administration and automation of Azure or AWS services (Azure AKS, Azure Functions, Azure Blob, AWS ECS, AWS EKS, AWS Lambda, S3, ALB/ELB, etc.).
  • Experience with Windows and Linux. 
  • Ability to effectively prioritize and execute tasks in a high velocity environment. 
  • Minimum of 2 years of related experience with a bachelor's degree; or equivalent work experience.  
  • Strong written, oral, and interpersonal communication skills.
  • AWS or Azure cloud certification is preferred. 
  • Experience with Jira, Prometheus, Grafana, ELK, and Site24x7 is preferred; Nagios is a bonus!

R365 Team Member Benefits & Compensation

This position has a salary range of $100K-$130K. The above range represents the expected salary range for this position. The actual salary may vary based upon several factors, including, but not limited to, relevant skills/experience, time in the role, business line, and geographic location. Restaurant365 focuses on equitable pay for our team and aims for transparency with our pay practices.

  • Comprehensive medical benefits, 100% paid for employees
  • 401k + matching
  • Equity Option Grant
  • Unlimited PTO + company holidays
  • Wellness initiatives

R365 is an Equal Opportunity Employer, and we encourage all forward-thinkers who embrace change and possess a positive attitude to apply.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Computer Software / SaaS
Spoken language(s): English

Other Skills

  • Time Management
  • Interpersonal Communications
