Site Reliability Engineer
Warsaw, Masovian Voivodeship, Poland
N-iX
24. 12. 2025
About the position

We are seeking experienced Site Reliability Engineers (SREs) to help monitor, maintain, and scale production software environments, with a primary focus on onboarding new microservices.

You will work closely with development and platform teams to automate and program‑manage the onboarding lifecycle—from initial requirements and environment setup through deployment, testing, documentation, and handover—ensuring reliability, scalability, performance, and compliance at every step.

Key Responsibilities

1. Service Onboarding & Automation

  • Lead and support the end-to-end onboarding process for new microservices into production environments.

  • Identify gaps in the current onboarding workflow (deployment, configuration, monitoring, scaling, etc.) and close them through automation.

  • Provide program management for onboarding activities, including timelines, dependencies, and stakeholder communication.

  • Collaborate with development and operations/platform teams to ensure smooth and consistent rollout of new services.

2. Monitoring, Logging & Observability

  • Design and implement monitoring, logging, and alerting for all onboarded services.

  • Ensure comprehensive metrics collection (e.g., availability, latency, error rates, throughput) to support SLOs/SLIs.

  • Tune alerts to minimize noise while ensuring rapid detection and response to production issues.

3. Scalability, Load & Performance

  • Perform load and stress testing to validate that services can scale to meet current and projected demand.

  • Implement and refine auto‑scaling mechanisms and capacity planning practices.

  • Conduct ongoing performance tuning and optimization to achieve minimal latency and high throughput.

4. Reliability, Resilience & Uptime

  • Drive high service reliability and uptime for all onboarded microservices.

  • Help teams design and implement fault‑tolerant architectures, including failover and redundancy mechanisms.

  • Work with teams to adopt SRE best practices (e.g., error budgets, post‑incident reviews, runbooks).

5. Security & Compliance

  • Ensure all onboarded services meet security and compliance requirements.

  • Integrate security best practices into deployment, monitoring, and operational processes.

  • Maintain audit trails and documentation for onboarding activities to support regulatory and internal compliance.

6. Documentation, Training & Knowledge Transfer

  • Create detailed documentation for the service onboarding process, including standards, patterns, and templates.

  • Develop and maintain runbooks, playbooks, and SOPs for ongoing operations.

  • Conduct training sessions and workshops for internal teams to enable self‑service onboarding and long‑term maintainability.

7. Planning, Testing & Post‑Onboarding Support

  • Participate in requirements analysis for new services; define onboarding success criteria and KPIs.

  • Develop onboarding plans outlining steps, timelines, responsibilities, and acceptance criteria; present plans to stakeholders for review and approval.

  • Prepare and validate environments, ensuring appropriate access, permissions, and tooling are in place.

  • Conduct comprehensive functional, performance, reliability, and security testing prior to go‑live.

  • Provide post‑onboarding support, monitoring services to ensure continued reliability and quickly addressing any issues that arise.

Required Qualifications

  • Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role in microservices-based environments.

  • Strong understanding of microservices architecture, distributed systems, and cloud‑native concepts.

  • Hands-on experience with:

    • Production monitoring, logging, and alerting (e.g., metrics, tracing, log aggregation tools).

    • Automation of deployment and operational workflows (e.g., scripts, pipelines, IaC, or similar).

    • Load/performance testing and capacity planning.

  • Demonstrated ability to improve service reliability, scalability, and performance in production.

  • Familiarity with security best practices related to service deployment, monitoring, and operations.

  • Experience working across cross‑functional teams (development, operations, security, compliance) to deliver complex initiatives.

  • Excellent documentation, communication, and stakeholder management skills.

Preferred Qualifications

  • Experience defining and tracking SRE KPIs/SLOs/SLIs for onboarding and production services.

  • Background in program or project management of technical initiatives (especially service onboarding or platform rollouts).

  • Prior experience in high‑availability, regulated, or large‑scale SaaS environments.
