We are seeking a highly skilled and motivated Cloud Operations Engineer to join our IT Operations team. You will manage and optimize our hybrid cloud infrastructure, ensuring high availability, performance, and security across our AWS and on-premises environments. You will also play a key role in our migration to OpenShift Virtualization and help lead our transition to containerized infrastructure.
Responsibilities:
Cloud and On-Premises Infrastructure Management:
Provision, configure, and troubleshoot Linux servers in AWS and on-premises environments.
Administer key AWS services, including S3, ELB, EFS, and Auto Scaling.
Implement and maintain system monitoring and alerting to ensure optimal performance and stability.
DevOps and Automation:
Use Git and GitHub Actions to automate CI/CD pipelines.
Develop and maintain Infrastructure-as-Code (IaC) using Terraform and Ansible.
Automate operational tasks with Bash, PowerShell, and the AWS CLI.
Virtualization and Containerization:
Key Goal: Participate in migrating VMs from VMware to OpenShift Virtualization in Q1-Q2 2025.
Train the rest of the team on Kubernetes and OpenShift to enable a smooth transition to containerized infrastructure.
Contribute to the planning and implementation of our broader containerization strategy.
Incident Response and Support:
Participate in the on-call rotation to address and resolve production issues.
Proactively identify and address potential system vulnerabilities and performance bottlenecks.
Collaboration and Knowledge Sharing:
Create and maintain comprehensive documentation of systems and processes.
Actively share knowledge and collaborate effectively with team members.
Qualifications:
Basic Requirements:
Experience: 5+ years as a Cloud Operations Engineer or Systems Administrator, including hands-on management of cloud-based infrastructure.
Cloud Expertise: Strong understanding of the AWS cloud platform, with experience managing core services (EC2, S3, VPC, ELB, EFS, Auto Scaling).
Linux Proficiency: Deep understanding of Linux system administration, including networking, security, and performance tuning.
Networking: Solid understanding of networking principles and protocols.
DevOps Skills: Experience with Git, GitHub Actions, IaC tools (Terraform, Ansible), scripting (Bash, PowerShell), and the AWS CLI.
Problem-Solving Abilities: Excellent analytical and problem-solving skills with a proactive approach to identifying and resolving issues.
Communication and Teamwork: Strong communication and interpersonal skills, with the ability to work effectively in a collaborative team environment.
Highly Desired:
Linux Certification (RHCSA or RHCE)
AWS Certification (SysOps Administrator)
Kubernetes Certification (CKAD or CKA)
OpenShift Certification (EX280 or EX316)
Desired:
Experience with VMware vSphere and other virtualization technologies.
Familiarity with other cloud service providers (GCP, OCI, Azure).
Understanding of security best practices for cloud and on-premises environments.