We are seeking a mission-driven Senior SRE to join our growing, international team. You will play a critical role in building and maintaining a highly resilient, efficient, and secure infrastructure and platform. You will be responsible for delivering a catalog of high-quality, world-class products and services to our customers at scale.
This fully remote role requires coverage of US working hours. Expect a typical workday to start around 2:00-3:00 PM CET and last for approximately 8 hours, with some flexibility.
Maxima Consulting is an IT consulting company founded in 1993 in Boston. Our technology experts across North America, Europe, Asia, and Australia help organizations of all shapes and sizes with their digital transformation efforts. We provide effective and dependable solutions to IT infrastructure, software development, quality assurance, maintenance and support, and cybersecurity challenges, along with a broad range of additional services.
Responsibilities:
Collaborate with cross-functional teams to design and implement CI/CD pipelines that automate fast and safe delivery of software to our customers, enable experimentation, create fast feedback loops, and provide developer self-service capabilities.
Lead efforts in automating deployment, monitoring, and infrastructure management.
Proactively identify and resolve performance bottlenecks, system failures, and security vulnerabilities.
Minimize or eliminate degradations and failures related to fault tolerance, security, availability, and performance.
Develop SLOs and SLIs to manage risk through continuous monitoring and measurement of system performance.
Build, manage, and deploy highly available, self-healing, customer-facing production infrastructure and applications (microservice and event-based architectures) using Docker, Kubernetes, Helm, and Terraform.
Apply the Twelve-Factor App methodology when building and deploying all our services and systems.
Apply infrastructure-as-code (IaC) best practices to configuration management and infrastructure deployment.
Enhance operational efficiency by identifying repetitive tasks and developing automation to eliminate toil.
Implement robust metrics, monitoring, and alerting for proactive issue identification and resolution.
Participate in incident response, on-call rotation, and post-incident reviews to ensure 24/7 availability of critical systems and to learn from failures and continuously improve system reliability.
Implement and enforce security best practices for infrastructure and applications.
Collaborate with security teams to ensure compliance with industry standards and regulations.
Empower others by sharing knowledge through documentation, training, and mentorship.
Requirements:
Proven experience as a Senior DevOps Engineer or Senior Site Reliability Engineer.
Strong expertise in cloud platforms such as AWS, GCP, or Azure.
Strong experience with CI/CD tools (GitHub Actions, GitLab CI, CircleCI) and version control systems (Git).
Proficiency with infrastructure-as-code tools (e.g., Terraform, Ansible, CloudFormation).
Hands-on experience with containerization and container orchestration tools such as Docker and Kubernetes.
Solid understanding of networking, security, and system engineering.
Experience with monitoring and logging tools (e.g., Datadog, Prometheus, Grafana, ELK stack).
Strong scripting skills in languages such as Python, Shell, or similar.
Familiarity with security best practices and compliance requirements.
Excellent problem-solving and troubleshooting skills.
Ability to work collaboratively in a fast-paced, agile environment.
Passion for building high-quality, long-term solutions that delight both internal and external customers.