Position Information
Storage Operations Lead / Senior Storage Reliability Engineer (Kubernetes Platform)
Role Overview
We are looking for an experienced Storage Operations Lead to ensure the stable, secure, and reliable operation of storage services supporting a business-critical Kubernetes platform. This role combines deep technical expertise in enterprise storage technologies with operational leadership responsibilities, including coordinating incident response, change management, problem resolution, and deployment activities across specialized teams.
The successful candidate will be responsible for maintaining high availability, performance, and resilience of the storage infrastructure while driving operational excellence and continuous improvement.
Key Responsibilities
- Manage, maintain, and enhance enterprise storage environments, including file, block, and object storage solutions (preferably NetApp ONTAP).
- Ensure reliable integration between storage platforms and Kubernetes environments, including Container Storage Interface (CSI) drivers, Persistent Volumes (PVs), and Persistent Volume Claims (PVCs).
- Monitor storage services and platform health, analyze performance metrics, and ensure compliance with defined SLAs and SLOs.
- Lead and coordinate operational processes following ITSM and SRE best practices, including incident, problem, and change management.
- Develop, maintain, and enforce operational documentation, runbooks, and playbooks.
- Collaborate with Platform Engineering, DevOps, and Infrastructure teams to drive automation initiatives, GitOps adoption, and Infrastructure-as-Code practices.
- Ensure operational readiness, disaster recovery preparedness, and high availability of storage services, and own capacity planning.
- Act as a technical point of escalation for complex storage and platform-related issues.
- Drive continuous improvement initiatives focused on reliability, scalability, and operational efficiency.
Required Experience & Skills
- Strong hands-on experience with enterprise storage technologies (file, block, and object storage).
- Practical expertise with NetApp ONTAP administration and operations.
- Solid understanding of Kubernetes storage architecture, including CSI integrations and persistent storage management.
- Experience working in highly available, mission-critical production environments.
- Strong knowledge of ITSM processes, including Incident, Problem, and Change Management.
- Familiarity with Site Reliability Engineering (SRE) principles and operational excellence practices.
- Experience with automation, GitOps methodologies, and Infrastructure as Code.
- Excellent troubleshooting, coordination, and stakeholder management skills.
- Ability to lead operational activities across multiple specialized teams.
Location: Warsaw