Join Allvue Systems, the leading provider of software solutions for the Private Capital and Credit markets, as Team Lead, Production Engineering. Allvue helps eliminate the boundaries between systems, information, and people while fostering innovation and continuous improvement.
Job Summary
We are seeking a Team Lead, Production Engineering to transform our Cloud Operations team into a modern Site Reliability Engineering (SRE) organization. In this mid‑level management role, you will lead a distributed team across Europe and India, covering the AWS and Azure cloud environments for all product lines (Credit, Equity, Data). You will focus on people management, process execution, and delivery, while the Technical Lead drives deep technical strategy. Your objectives over the next 6‑12 months are to stabilize operations, eliminate manual toil, implement automation, and establish robust planning and workflow processes. In the longer term, you will shape the Production Engineering practice and have a broad impact across all our products.
Responsibilities
- Team Leadership: Manage and provide direction to the Production Engineering team (spanning EU and India), ensuring 24/7 operational coverage and reliability for all product lines on our AWS/Azure infrastructure.
- SRE Transformation: Drive the evolution from a traditional cloud operations model to a Production Engineering/SRE approach. Instill SRE best practices such as robust monitoring, alerting, and automation of manual processes to minimize toil.
- Operational Excellence: Oversee day‑to‑day production operations, including incident management and response. Ensure incidents are resolved quickly and followed by blameless post‑mortems and root cause analysis.
- Automation & Efficiency: Identify operational waste and repetitive manual work, and drive initiatives to automate it. Focus on streamlining workflows, implementing scripts and tooling, and eliminating hands‑on tasks through Infrastructure as Code, CI/CD pipelines, and other automation.
- Process & Planning: Implement effective team processes for planning and execution. Use Agile methodologies (e.g., Kanban) to visualize work, manage long‑term projects, and ensure the team is working on the highest priority tasks.
- Collaboration with Tech Lead: Work closely with the Technical Lead to align on technical strategy and architectural decisions, while you own delivery timelines, resource allocation, and team enablement.
- People Management: Mentor and coach team members, set performance goals, conduct regular 1:1s, and support their professional development.
- Cross‑Functional Collaboration: Serve as the liaison between the Production Engineering team and other departments (Development, QA, Product, etc.), ensuring new applications and features are built with reliability in mind.
- Continuous Improvement: Assess and improve platform reliability and efficiency, including capacity planning, cost optimization, security best practices, and adoption of new tools or technologies.
Requirements
- Experience: 7+ years in IT infrastructure, DevOps, or SRE roles, including 2+ years in a technical leadership or manager position.
- Cloud & Infrastructure Knowledge: Strong expertise in AWS and/or Azure services (REQUIRED), plus hands‑on experience with cloud infrastructure, containerization, and modern deployment practices. Solid understanding of managing production systems in Windows Server environments (REQUIRED). Experience with Linux systems is also highly beneficial.
- Automation & Tools: Demonstrated ability to automate operational tasks and workflows. Proficiency with scripting (PowerShell, Python, or similar) and infrastructure‑as‑code tools (Terraform, CloudFormation, etc.). Experience setting up CI/CD pipelines and using configuration management or DevOps tools (Jenkins, Ansible, etc.).
- SRE Best Practices: Strong knowledge of Site Reliability Engineering principles—monitoring/observability, incident response, SLAs/SLOs, and reducing toil through automation.
- Agile Planning: Experience implementing team workflows using Agile methodologies (Kanban or Scrum). Ability to manage a backlog of work, plan sprints or continuous flow, and deliver projects on schedule.
- Leadership & Communication: Excellent people management skills and strong communication skills to work across global teams and report operational status to leadership.
- Problem‑Solving: Hands‑on, analytical mindset to troubleshoot complex systems and drive problem resolution under pressure during incidents.
- Proactive Mindset: Self‑driven and proactive in identifying areas of improvement, capable of proposing innovative solutions and driving changes independently.
Education
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).
What We Offer
- 26 Working days of PTO
- Parental Program
- Daily English Practice
- Udemy Learning
- Certification Reimbursement
- People Committee
- Hobby Clubs
- Corporate Events and Team Building
EEOC Statement
Allvue Systems provides equal employment opportunities (EEO) for all employees and applicants for employment. Allvue is committed to advancing diversity, equity, and inclusion, and it is our policy to prohibit discrimination and harassment of any type. Allvue will provide reasonable accommodations for qualified individuals with disabilities.
Job function
- Production, Information Technology, and Engineering
Industries
- Software Development