We are looking for an experienced Observability Engineer who will be responsible for designing, implementing, and maintaining a modern observability and telemetry platform used to monitor complex systems and applications. In this role, you will collaborate with system architects and engineering teams to ensure high availability, performance, and transparency across production environments.
The position focuses on developing and operating a monitoring ecosystem based on the Elastic Stack, implementing OpenTelemetry standards, and supporting development teams in identifying performance issues through advanced observability tools. A key part of the role will also involve infrastructure automation, applying Site Reliability Engineering (SRE) practices, and introducing AIOps capabilities for proactive anomaly detection and faster incident analysis.
Responsibilities:
- Design and maintain scalable observability solutions for distributed systems and cloud-native environments
- Manage and optimize the Elastic Stack (Elasticsearch, Logstash, Kibana) to ensure performance, reliability, and cost efficiency
- Implement centralized data collection using Elastic Agent and Fleet
- Develop and maintain telemetry pipelines for logs, metrics, and traces using OpenTelemetry
- Monitor and analyze Kubernetes / OpenShift environments to ensure system stability and performance
- Implement and manage Application Performance Monitoring (APM) and distributed tracing to identify application bottlenecks
- Automate deployment and configuration of observability infrastructure using Ansible and Infrastructure as Code practices
- Configure and maintain Elastic Machine Learning models for anomaly detection and operational insights
- Define and monitor Service Level Objectives (SLOs) and manage Error Budgets in line with SRE practices
- Collaborate with development and architecture teams to improve system reliability and performance

Requirements:
- Elasticsearch

Additionally:
- Private healthcare
- Training budget
- International projects
- Sport subscription