A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your Role and Responsibilities

As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes. The Site Reliability Engineering (SRE) team ensures the service is highly available and fully optimized in a 24/7 environment. As an SRE, you will play a crucial role in ensuring the reliability and resiliency of our systems. If you are passionate about optimizing, building automation, solving problems, testing, deploying, and managing highly scalable environments, this is the perfect opportunity for you. In this role, you will be part of a global SRE team that works closely with our development and product teams to increase the quality and reliability of our products and services, and that also deploys and manages Kubernetes clusters on IBM Cloud and other cloud platforms (AWS, Azure).
As an SRE, you must be willing to work in a fast-paced cloud environment, share rotational on-call duty coverage with the global Ops team, and support the back-end cloud infrastructure components.

Key Responsibilities:
- Maintain highly available products and services on the cloud
- Identify issues and drive them towards a resolution while ensuring minimal downtime
- Monitor the health and performance of production systems
- Automate repetitive tasks using scripts and tools to reduce manual intervention
- Collaborate with development teams to roll out new services and ensure stability and reliability
- Improve operational practices to ensure efficiency and innovation
- Share knowledge, ideas, and solutions with the global team