Offer summary

Qualifications:

Bachelor's degree in computer science, information technology, or a related field, or equivalent work experience.

Proven experience in software development and/or system administration.

Strong scripting and coding skills in languages like Python, Go, or Shell.

Familiarity with cloud platforms and containerization technologies.

Key responsibilities:

Ensure the reliability and availability of production systems by monitoring and responding to incidents.

Develop and maintain automation tools for system monitoring and incident response.

Collaborate with development teams for capacity planning and performance improvements.

Maintain documentation for operational processes and best practices.

Job description

● Ensure the reliability and availability of production systems and services by monitoring, troubleshooting, and responding to incidents.

● Develop and maintain tools and automation for system monitoring, alerting, and incident response to minimize manual intervention.

● Collaborate with development teams to plan for capacity scaling and performance improvements based on usage patterns and growth forecasts.

● Collaborate with development and product teams to ensure that new features and services are designed with reliability in mind.

● Maintain documentation for operational processes, system configurations, and best practices.

Requirements

● Bachelor's degree in computer science, information technology, or a related field (or equivalent work experience).

● Proven experience in software development and/or system administration.

● Strong scripting and coding skills (e.g., Python, Go, Shell) for automation and tool development.

● Familiarity with containerization and orchestration technologies like Docker and Kubernetes.

● Experience with cloud platforms (e.g., AWS, Azure, GCP) and infrastructure as code tools (e.g., Terraform).

● Proficiency in monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

● Knowledge of network, security, and database concepts.

● Strong problem-solving skills and the ability to work well under pressure.

● Understanding of agile and DevOps methodologies.

● Excellent communication and collaboration skills.

● Availability to work US hours until 3 pm ET is essential for this role.

● Candidates must have their own system/work setup for remote work.
