Site Reliability Engineer
Warszawa, Masovian Voivodeship, Poland
Antal Sp. z o.o.
16. 9. 2024
About the Position

Grow Your Career with Us!

If you’re looking for a career that will help you stand out, join us and fulfill your potential. Whether you aim to reach the top or simply explore an exciting new direction, we offer opportunities, support, and rewards that will take you further.


Technologies We Use

  • Java SE
  • Spring Boot
  • Spring Cloud
  • Apache Beam
  • Apache Flink
  • GCP
  • Redis
  • REST APIs
  • Ansible
  • Jenkins


Our Work Culture

We invest heavily in an Agile culture, adopting DevOps processes, CI/CD pipelines, and cloud technologies. We plan to establish a new development team in Krakow in 2023 as part of a long-term strategy to develop and support our platform in Europe.

This is an exciting opportunity to join a team in its early stages and make a key contribution.


Your Responsibilities

  • Manage application support operations, focusing on resiliency, availability, and the monitoring of system health and performance.
  • Coordinate the resolution of production incidents and run post-mortems/root cause analyses (RCAs) to identify root causes and improve processes.
  • Investigate, triage, and resolve production incidents, using technical signals to drive root cause analysis.
  • Document post-incident recovery steps, identify deviations, contribute to process improvements, and build a knowledge base.
  • Actively participate in the service management community, engaging in Incident Management, Problem Management, and Service Delivery.
  • Define and deliver tactical and strategic service improvements across the technical and process landscape.
  • Apply SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability.
  • Develop observability tools and techniques for monitoring, alerting, incident detection and response, capacity management, and release safety.


What You Need to Succeed in This Role

  • 4+ years of experience in developing and supporting distributed systems written in Java.
  • Experience with Disaster Recovery methods and processes.
  • A methodical approach to troubleshooting and strong problem-solving skills.
  • Experience with application lifecycle management tooling and practices: JIRA/Confluence, Ansible, vulnerability remediation, and CI/CD automation.
  • Experience implementing and managing logging, monitoring, and alerting frameworks for hybrid cloud environments, using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar.
  • Understanding of relational databases (RDBMS), cloud technology, Unix/Linux, and job scheduling (e.g., Control-M or AutoSys).
  • Ability to lead technical conversations with various technical support groups.
  • Excellent communication skills and experience working with Agile methodologies.

Join us and grow your career in a dynamic and innovative environment!
