We’re looking for a Senior AI Infrastructure Engineer to help design, build, and scale cloud-based AI and data infrastructure for a global technology company leading digital transformation. You’ll play a key role in building MLOps pipelines and production-grade AI systems that empower teams to move faster from experimentation to deployment.
You’ll collaborate with AI engineers, data engineers, and platform teams to create reliable, secure, and scalable infrastructure that powers next-generation AI products and services.
Responsibilities:
Design, implement, and maintain cloud-native infrastructure for AI and data workloads, with a focus on platforms such as Databricks and Amazon Bedrock.
Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics.
Develop infrastructure-as-code using tools such as AWS CloudFormation and AWS CDK to ensure repeatable and secure deployments.
Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production.
Drive best practices for observability, including monitoring, alerting, and logging for AI platforms.
Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types.
Stay current with new tools and technologies to recommend improvements to architecture and operations.
Integrate AI models and large language models (LLMs) into production systems, using architectures such as retrieval-augmented generation (RAG).
Requirements (expected):
7+ years of professional experience in software and infrastructure engineering.
Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management.
Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK.
Expert-level coding skills in TypeScript and Python, with experience building robust APIs and backend services.
Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows.
Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads.
Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members.
Strong collaboration skills and the ability to partner effectively with cross-functional teams.