Do you thrive on optimizing AI systems for peak performance?
Are you ready to push the boundaries of inference speed and efficiency?
Join the Akamai Inference Cloud Team!
The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We build AI platforms for efficient, compliant, and high-performing applications. These platforms support customers in running inference models and empower developers to create advanced AI solutions effectively. #AIC
Partner with the best
The Performance Engineer is responsible for benchmarking, tuning, and optimizing the performance of an AI inference platform. Responsibilities include applying advanced optimization techniques to increase throughput, reduce latency, and improve resource efficiency. The role spans models, hardware accelerators, and infrastructure. Expertise in AI/ML performance optimization, proficiency with inference frameworks, and a passion for maximizing hardware and software performance are essential.
Responsibilities:
- Benchmarking and profiling AI models and inference workloads across different hardware configurations, measuring latency, throughput, and resource utilization.
- Researching and implementing model optimization techniques, including quantization, pruning, distillation, and hardware-specific optimizations.
- Optimizing inference frameworks and infrastructure to maximize performance, working with TensorRT, vLLM, TorchServe, Triton, and other serving platforms.
- Establishing performance baselines and monitoring for the platform, identifying and addressing performance regressions.
- Collaborating with engineering teams to identify bottlenecks, recommend optimizations, and validate performance improvements.

Requirements: Python, AI/ML model optimization, inference, GPU optimization, benchmarking

Additionally: Sport subscription, Private healthcare, International projects, Free coffee, Gym, Bike parking, In-house trainings, Modern office, No dress code.