We are seeking a Senior Site Reliability Engineer to join our IB Client Platform Stream.
You will focus on improving the reliability of production systems and reducing escalations to IT, support, and development teams. Your role involves managing critical incidents, monitoring systems, and automating processes to improve system performance and business continuity. You will collaborate with stakeholders across business and technology units to ensure the smooth delivery and support of technology projects. If you are ready to contribute your expertise in a dynamic environment, we encourage you to apply.
Responsibilities:
Drive reliability improvements in production systems to reduce escalations and enable faster feature development
Handle support escalations with a thorough understanding of environment, code, and logs
Manage incident response, change management, and business continuity activities
Analyze and document system issues from business and technical perspectives
Identify and implement solutions and system improvements, including automation of manual tasks
Collaborate with product managers, developers, quality analysts, and support teams to support project delivery and onboarding
Provide regular updates to management on system status and issues
Develop technical fixes and scripts to support operational needs
Investigate problems to determine root cause and provide workarounds
Create and maintain known error documentation
Own the lifecycle of problem resolution
Perform daily system monitoring and troubleshoot production issues
Support and configure global production environments
Manage release processes for UAT and production environments
Document support procedures, releases, and troubleshooting guides
Provide coverage on weekdays and weekends as needed
Requirements:
At least 3 years of programming experience in Python, JavaScript, or Java
At least 3 years of DevOps experience, including building and troubleshooting pipelines
Proficiency in automation using Python or other scripting languages
Knowledge of Unix administration
Familiarity with ITIL processes
Experience using ServiceNow for operational support
Experience with Azure Log Analytics or Splunk and their query languages (KQL or SPL)