AI Site Reliability Engineer
Warsaw, Masovian Voivodeship, Poland
Procter & Gamble
25 July 2025
About the position

AI Site Reliability Engineers (SREs) are responsible for keeping all production systems running smoothly, which includes some bug fixing. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation, including AI, to our operating environments and the P&G codebase.

SREs specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with varied interests in algorithms and distributed systems.

In this role, you'll be constantly learning, staying up to date with industry trends and emerging technologies in data solutions. You'll have the chance to work with a variety of tools and technologies, including big data platforms, AI and machine learning frameworks, and data visualization tools, to build innovative and effective solutions.

So, if you're excited about the possibilities of data and eager to make a real impact in the world of business, a career on the SRE team might be just what you're looking for. Join us and become a part of the future of digital transformation.

Key Responsibilities:

As a Site Reliability Engineer (SRE) at P&G, you will play a crucial role in ensuring the reliability, availability, and performance of our production systems. Your role will blend software engineering principles with operational discipline to create scalable and highly available systems. You will collaborate with development and operations teams to implement automation, optimize costs, and troubleshoot issues as they arise.

  • Oversee and maintain the smooth operation of production systems, ensuring high availability and reliability.

  • Design and implement automation solutions, including AI agents, for routine operational tasks to enhance efficiency and reduce manual intervention.

  • Develop monitoring and observability dashboards and alerts to provide actionable insights into system health.

  • Develop and maintain automated tests to ensure the quality and reliability of production systems.

  • Analyze system performance and resource utilization to identify opportunities for cost optimization.

  • Work with teams to implement best practices for resource allocation and cost-effective architecture.

  • Lead post-incident reviews to identify improvements in processes and systems. 

  • Participate in the change management process to facilitate seamless production deployments.

  • Plan, execute, and monitor production deployments to ensure minimal downtime and service disruption.

  • Collaborate with other teams to ensure proper deployment strategies and rollback mechanisms are in place.
