We are seeking a talented and passionate Support Engineer to join our growing team. You will play a critical role in providing exceptional technical support to our customers by efficiently resolving escalated issues, conducting in-depth root cause analysis, and collaborating with cross-functional teams to ensure system stability and operational excellence.
You will be instrumental in building and maintaining a robust support infrastructure by developing and implementing automation solutions, improving support processes, and proactively preventing incidents. This role requires a strong understanding of cloud technologies, software development principles, and a passion for delivering outstanding customer support.
This fully remote role requires coverage of US working hours. Expect a typical workday to start around 2:00-3:00 PM CET and last for approximately 8 hours, with some flexibility.
Maxima Consulting is an IT consulting company founded in 1993 in Boston. Our technology experts across North America, Europe, Asia, and Australia help organizations of all shapes and sizes with their digital transformation efforts. We provide effective and dependable solutions to IT infrastructure, software development, quality assurance, maintenance and support, and cybersecurity challenges, as well as a broad range of additional services.
Responsibilities:
Efficiently resolve escalated support issues from Tier 1, prioritizing critical incidents while ensuring smooth transitions between teams.
Conduct in-depth root cause analysis and ensure that findings are used to prevent future occurrences through proactive system changes and automation.
Partner with cross-functional teams (Engineering, SRE, DevOps) to ensure smooth handoffs, shared knowledge, and efficient problem-solving, minimizing disruptions to product development.
Continuously identify areas for improvement in support processes, incident management, and tooling. Work actively on automation initiatives to reduce manual interventions.
Develop and maintain automation scripts and tools to streamline repetitive tasks and shorten incident resolution times.
Leverage monitoring tools and predictive analytics to identify and resolve issues before they escalate, reducing the frequency of customer-facing incidents.
Participate in incident response and post-incident reviews to learn from failures and continuously improve system reliability.
Empower others by sharing knowledge through documentation, training, and mentorship.
Requirements:
Proficiency in software debugging and system diagnostics, and familiarity with the software development lifecycle (SDLC).
Strong understanding of cloud infrastructure: cloud-based environments (e.g., AWS, GCP, Azure), networking fundamentals, databases (e.g., MySQL, PostgreSQL, NoSQL), and systems infrastructure.
Proficiency in debugging across software layers, diagnosing infrastructure issues, and troubleshooting production systems.
Experience with advanced monitoring tools (e.g., Datadog, Grafana, Prometheus), log analysis tools (e.g., Splunk, ELK Stack), and automation solutions for incident remediation.
Expertise in managing incidents using platforms like Jira, ServiceNow, PagerDuty, or OpsGenie.
Familiarity with coding or scripting for automation, particularly using languages such as Python, Bash, Go, or Rust.
Experience creating self-healing or auto-remediation scripts is a plus.
Proven ability to diagnose and resolve complex technical issues, especially in high-availability, high-scalability environments.
Ability to use data and performance metrics to proactively identify patterns and implement preventative measures.
Experience working with Engineering, Product, and SRE teams to ensure issues are resolved efficiently and root causes are documented and communicated effectively.
Ability to translate technical issues into easily understandable terms for both technical and non-technical audiences, ensuring clear communication with stakeholders.
Willingness to take full ownership of escalated incidents, ensuring thorough documentation, follow-up, and resolution, with priority given to maintaining service continuity and minimizing customer impact.
Experience setting up and optimizing proactive monitoring systems and leveraging these tools to catch potential issues before they become critical.
Understanding of Site Reliability Engineering (SRE) principles or DevOps practices is highly preferred. Candidates should have experience in, or an understanding of, improving system reliability through monitoring, incident response automation, and continuous feedback loops.
Experience with modern programming languages such as Elixir, Scala, JavaScript/TypeScript, Java, Go, or C#.