Staff Site Reliability Engineer (Hadoop)
Visa is a world leader in payments and technology, with over 259 billion payment transactions flowing safely every year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network.
This role focuses on ensuring the reliability and performance of Visa’s Data Platform built on an open‑source Hadoop ecosystem.
Job Description
- Sound knowledge of managing large‑scale Hadoop platforms, including monitoring, debugging, and tuning cluster performance.
- In‑depth knowledge of the Hadoop ecosystem: ZooKeeper, HDFS, YARN, Hive, Spark, Trino, and Kafka.
- Proven experience debugging issues on Hadoop platforms and applications.
- Familiarity with security tools such as Kerberos, Ranger, and Active Directory integrations.
- Experience with cloud technologies, preferably AWS EMR.
- Knowledge of Kubernetes, AI, and MLOps is advantageous.
Collaboration & Teamwork
- Collaborate closely with L‑3 teams to review new use cases and implement cluster hardening techniques.
- Build and maintain strong relationships with customer teams, user communities, architects, and engineering teams.
- Work jointly on key deliverables to ensure production scalability and stability.
Automation
Hands‑on experience with automation using Ansible, Shell, Python, or other programming languages. The ability to automate manual tasks is key.
Observability
Knowledge of observability tools such as Grafana, Prometheus, and Splunk.
Operating System & Programming
Understanding of Linux, networking, CPU, memory, and storage. Ability to code in Python, Java, or another widely used language.
Communication
Excellent interpersonal, verbal, and written communication skills.
This position is not ideal for a Hadoop developer. This is a hybrid role; office days will be confirmed by the hiring manager.
Qualifications
- Key role in maintaining and supporting Visa’s Data Platform.
- Drive innovation for partners and clients by working on open‑source Big Data clusters.
Education & Experience
- Master’s degree in a related field.
- Bachelor’s degree in a related field and a minimum of five years of relevant experience.
- Minimum of five years of experience working with Hadoop systems.
Preferred Qualifications
- Experience in Big Data SRE and Engineering across platforms such as Hadoop, Kafka, HBase, and Spark.
- Strong troubleshooting and debugging skills.
- Root cause analysis of major production incidents and implementation of high‑availability solutions.
- Capacity planning, system expansions, and timely upgrades.
- Fine‑tuning alerting and observability tools.
- Strong documentation skills for SOPs and platform guidelines.
- Proficiency in DevOps tools and incident, problem, and change management.
- Commitment to meeting service‑level agreements and experience in security remediation, automation, and self‑healing.
- Experience developing automation tools and reports using Shell, Ansible, Python, or other languages.
Additional Information
Visa is an EEO employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in accordance with EEOC guidelines.