Site Reliability Engineer (Hadoop)
87_30635
Responsibilities
- perform site reliability engineering tasks for big data platforms such as Hadoop, Spark, and HBase in open-source environments,
- monitor systems and follow standard operating procedures and runbooks to manage platform and application issues,
- assist with cluster maintenance activities and implement updates based on predefined installation and validation plans,
- apply strong troubleshooting and debugging skills to identify and resolve issues, while recommending preventive solutions,
- perform root cause analysis of major production incidents, document findings, and implement preventive measures to improve system reliability,
- automate routine operations using scripts or automation tools to reduce manual intervention, minimize errors, and enhance system performance.
Requirements
- solid understanding of Hadoop platforms and core Hadoop ecosystem components,
- strong ability to troubleshoot both platform services and application-level issues, with a focus on identifying root causes,
- hands-on experience writing Ansible playbooks and automating tasks using Ansible, shell scripts, and Python,
- familiarity with Unix/Linux system internals, networking concepts, and distributed systems.
We offer
- contract of employment or a B2B contract,
- for the contract of employment: an extensive benefits package including a Multisport Card, Lux Med medical care with dental care, life insurance, and cafeteria benefits,
- hybrid work model,
- training, continuous learning, and certification opportunities.
Source: Arche Consulting/Praca