Lead the development of developer and cloud platforms, including internal engineering accelerators and reusable toolsets.
Platform Catalog & Developer Experience
Design, implement, and manage unified platform catalogs using Backstage, enhancing developer experience and application metadata management.
Develop custom plugins and APIs for Backstage to support internal engineering workflows and documentation.
Automation & DevOps Excellence
Build and maintain Python-based automation frameworks, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Helm, Pulumi, AWS CDK).
Operationalize containerized solutions using Docker and Kubernetes, integrating MLflow, Kubeflow, and other ML lifecycle and orchestration platforms.
Implement robust automation for provisioning, configuring, and managing cloud resources across multiple environments.
MLOps & Reliability Engineering
Lead the implementation of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and advanced observability and alerting (Prometheus, Grafana, PagerDuty).
Develop and maintain APIs and services for model management, feature stores, and inference pipelines.
Operationalize ML model serving at scale using frameworks such as TensorFlow Serving, TorchServe, KServe, and Seldon Core.
Ensure compliance with industry regulations and standards (e.g., HIPAA, FDA requirements) for data protection and reliability.
Collaboration & Leadership
Mentor engineers and lead cross-functional teams to deliver integrated solutions.
Champion engineering excellence through design documentation, code reviews, and testing automation.
Present at industry summits, author technical proposals, and contribute to open-source projects (Kubernetes, Helm, Go, Envoy).
Continuous Improvement
Drive agile delivery, sprint planning, and performance optimization.
Lead incident response and disaster recovery initiatives for mission-critical platforms.
Foster a culture of shared ownership, transparency, and innovation.
Expected Requirements
7+ years of hands-on software engineering experience in cloud infrastructure, DevOps, and MLOps.
Deep expertise in Python, Kubernetes, Terraform, Helm, and CI/CD pipeline development.
Proven experience architecting and operating containerized solutions on AWS, GCP, and Azure.
Strong knowledge of Infrastructure-as-Code, distributed systems, and production system reliability.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.