SOFTSWISS continues to expand its team and is looking for a Monitoring System Engineer. We need an experienced, accomplished professional who shares our culture and values.
The two main pillars of our workflow are:
Responding to Events/Monitoring Alerts (L1/L2 tasks for certain system parts):
Provide on‑duty service coverage, including day and night on‑call shifts.
Provide timely and effective solutions to technical problems reported by users.
Communicate clearly with users to understand their issues and provide updates on resolution status.
Address incidents by troubleshooting and resolving issues, seeking assistance from third‑party or vendor support when necessary.
Direct issues or queries to the relevant department as needed.
Keep detailed records and documentation of current infrastructure challenges and Root Cause Analyses (RCAs).
Create detailed reports for all technical support incidents, including descriptions, resolutions, and timelines.
Maintaining and Enhancing the Monitoring Systems:
Collaborate with other teams to understand and define their monitoring needs, then implement the right solutions.
Set up and adjust monitoring/observability systems for various teams.
Design and tweak alerts and dashboards to suit specific needs.
Refine alerts to reduce irrelevant notifications and increase their significance.
Enhance dashboards for better clarity, understanding, and a more comprehensive view.
Build and sustain connections between the monitoring systems and other platforms like Jira, Opsgenie, etc. when required.
Establish and update a Knowledge Base, covering system configurations, alert processes, troubleshooting guidelines, and user manuals.
Stay up to date with the latest trends and best practices to continuously improve our organization’s monitoring capabilities.
Minimum of 3 years’ experience as a Systems Engineer, SRE, DevOps, or Monitoring Support Engineer.
Good understanding of Linux operating systems (Debian‑based distributions).
Experience with containerization, virtualization, and orchestration (LXC/LXD, Docker, Kubernetes).
Development experience in any scripting language (Bash, Python, Go, etc.) and familiarity with REST APIs.
Knowledge of basic database concepts (experience with PostgreSQL is preferable), including transactions and WAL.
English proficiency at an Intermediate (B1) level or higher. It’s crucial to understand technical terminology related to our specific tech stack and to be able to interpret technical documentation.
Zabbix (familiarity with concepts such as LLD, prototypes, dependencies, and preprocessing)
Grafana (knowledge of data sources, dashboard creation, and query usage)
Prometheus/VictoriaMetrics/etc. (understanding of metrics collection and alerting)
ELK/Splunk/etc. (ability to use queries and filters for log analysis)
Site24x7/Pingdom/etc. (experience with web monitoring and performance metrics)
Strong understanding of key concepts, including:
File systems
Process management
Built‑in monitoring tools
Networks
Scripting
Troubleshooting
Kafka
RabbitMQ
GitLab
Nginx/Puma
ClickHouse
PostgreSQL
MongoDB
HashiCorp Vault
Microservices and orchestration (Kubernetes)
Any IaC / infrastructure automation:
Provisioning tools (Terraform)
Configuration management (Ansible, Salt, Puppet)
Full‑time remote work opportunities and flexible working hours
Private insurance
One additional day off per calendar year
Sports program compensation
Comprehensive Mental Health Programme
Free online English lessons with a native speaker
Generous referral program
Training, internal workshops, and participation in international professional conferences and corporate events