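Kyndryl is home to world-class people who design, run, and manage the most advanced and reliable technology infrastructure every day. Together, we keep taking a broad view of the health of the technology ecosystems that the world depends on. We aim to be an independent company that builds systems in new ways and focuses on surpassing past successes. We bring in the right partners, invest in our business, and work alongside our customers to unlock new possibilities. We overcome obstacles. Kyndryl has 90,000 highly skilled, experienced employees who serve 75 of the Fortune 100. But the purpose that drives us is advancing the vital systems that move society forward. When the digital ecosystem is healthy, it becomes more adaptable and can support continued growth, opening up a world full of possibility for everyone. Kyndryl: the lifeline of society's growth. Join us in building sustained progress for society.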

Who We Are

At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.

The Role

Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers.

Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.

We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth.

The DevOps team is responsible for deploying and managing on-demand virtual machines, as well as the underlying infrastructure, in over a dozen global data centers. We are responsible for managing both ESXi and PowerVM hypervisors. We also build monitoring components which provide insights into the performance of these virtual machines.

Representative projects include building features around scaling and performance, gathering metrics to improve and optimize self-healing systems, creating new monitoring frameworks, automation improvements, bug fixes, and internal technical improvements. We use a lot of Python, Git, Mercurial, InfluxDB, MySQL, etc. (You don’t need to know all these technologies to apply!) Skytap has a deep technical stack, so you will have plenty of opportunities to learn new things.

The team is a diverse collection of engineers who collaborate to ensure the overall team is effective. We own our services end to end, and are responsible for the architectural design, development, quality assurance, and operation of the services.

With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.

If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.

Your Future at Kyndryl
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential – offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.

Who You Are

You’re good at what you do and possess the required experience to prove it. Equally important, you have a growth mindset and are keen to drive your own personal and professional development. You are customer-focused – someone who prioritizes customer success in their work. And finally, you’re open and borderless – naturally inclusive in how you work with others.

Role:

In this role, you will help us expand and sustainably manage our fleet of ESXi and PowerVM hypervisors and related technologies. You’ll also share your knowledge of design and best practices, so that collectively we can build a rock-solid system that’s indispensable to our customers. Thus, your skills should span in-depth ability to configure, deploy, troubleshoot, and tune x86 systems. This list isn't exhaustive, so you'll need to be perceptive and thoughtful, identifying latent problems and finding solutions.

Responsibilities:

Help design efficient distributed systems with emphasis on high availability and disaster recovery.
Design and add new monitoring, logging, alerting, and metrics to systems.
Diagnose system issues, performance issues and assess fleet health.
Investigate and update provisioning systems with every new hardware build.
Improve configuration management systems and work to automate server provisioning.
Improve processes and documentation of day-to-day service administration.
Improve system tests and functional tests.
Participate in on-call rotation alongside the rest of the team. (Note: The Compute team understands the necessity of a healthy on-call rotation that doesn’t burn out people or trample their morale. We continually work to pay down technical and operational debt, ensuring that being on call is sustainable.)
Work with Support, Customer Success, and Professional Services teams to jointly address customers’ problems and needs.
Plan for hardware refreshes and upgrades.
Plan for hypervisor patches and upgrades.
Track upstream package updates and determine which need to be applied.

Required Skills and Experience

Experience with shell scripting and automated provisioning tools (e.g., Bash scripting, Ansible, Puppet)
Experience setting up, maintaining, and coordinating patch and configuration management of production servers
Ability to clearly communicate technical ideas, meeting outcomes, and problem statements, verbally and in writing
Good reading comprehension and attention to detail—you can read a spec and see the big picture as well as missing edge cases
Experience improving processes and documentation for service administration.
Experience writing design documents for major service improvements.
A collaborative attitude and working style.
Knowledge of deploying, configuring, troubleshooting, and performance-tuning infrastructure at scale.
Knowledge of best practices for architecting highly available, scale-out systems

Preferred Skills and Experience

Knowledge of vCenter/ESXi is a plus
Knowledge of PowerVM/VIOS is a plus

Being You

Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way.

What You Can Expect

With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate and to build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed.

Get Referred!

If you know someone who works at Kyndryl, when asked ‘How Did You Hear About Us’ during the application process, select ‘Employee Referral’ and enter your contact's Kyndryl email address.

Senior Associate, Site Reliability Engineer
