Architect and implement robust solutions leveraging Kubernetes as the primary orchestration tool, focusing on the deployment and management of GPU workloads with NVIDIA cards.
Design, build, and maintain Kubernetes clusters in both on-premises and cloud environments, ensuring high availability, scalability, and security.
Develop and implement a new deployment process heavily reliant on Kubernetes, including infrastructure automation and Infrastructure as Code (IaC) principles.
Integrate supplementary services such as Single Sign-On (SSO) and cloud applications.
Collaborate with internal teams and third-party vendors to implement architecture solutions that are robust, maintainable, and supportable.
Serve as the point of contact for Kubernetes-related activities, coordinating with suppliers/vendors and internal stakeholders.
Build monitoring solutions specifically tailored for Kubernetes clusters, including setting up Prometheus or similar tools for effective cluster health monitoring.
Drive automation efforts within Kubernetes, using tools like Argo CD, Helm, and Terraform to streamline deployments and cluster management.
Train and hand over new services to the support teams, ensuring comprehensive documentation and knowledge transfer.
Requirements:
Extensive hands-on experience with Kubernetes, including setting up, managing, and troubleshooting clusters in on-premises and cloud environments.
Strong knowledge of Amazon Web Services (AWS) and experience with cloud-native services.
Proficiency with Infrastructure as Code (IaC) tools like Terraform for automating infrastructure.
Solid understanding of containerization technologies and how they integrate within orchestration platforms.
Experience with monitoring tools and platforms, with a focus on ensuring the health and performance of deployed services.
Experience in managing and automating infrastructure across Linux systems.
Excellent troubleshooting skills, particularly in large-scale, distributed environments.
Experience with CI/CD pipelines and modern deployment practices.
Familiarity with load balancer solutions and network configurations for high-availability environments.
What we offer:
We love sports, but we love diverse thinking more!
We know that diversity brings creativity, so we invite people from all backgrounds to join us. At Stats Perform, you can make a difference by using your skills and experience every day, and you'll feel valued and respected for your contribution.
We take care of our colleagues.
We like happy and healthy colleagues. You will benefit from things like Mental Health Days Off, ‘No Meeting Fridays,’ and flexible working schedules.
We pull together to build a better workplace and world for all.
We encourage employees to take part in charitable activities, use their two days of Volunteering Time Off, support our environmental efforts, and be actively involved in Employee Resource Groups.