We are looking for an experienced Big Data Engineer specializing in Scala, Hadoop, and Spark to join a large enterprise digital product company. The role combines technical proficiency, strategic thinking, collaboration, and proactive problem-solving to deliver robust and scalable data solutions.
Senior Big Data Engineer
Your responsibilities
- Design and Develop Big Data Solutions: Architect, design, and implement scalable and reliable big data solutions using Scala, Hadoop, and Spark technologies to meet business requirements.
- Data Ingestion and Processing: Develop robust data ingestion processes to acquire, clean, and transform large volumes of structured and unstructured data from various sources into usable formats.
- Optimize Data Pipelines: Build and optimize data pipelines and workflows for efficient data processing, storage, and retrieval, ensuring high performance and low latency.
- Cluster Management and Optimization: Manage and optimize Hadoop cluster resources to ensure high availability, reliability, and scalability of big data applications.
- Real-Time Data Processing: Implement streaming solutions using Spark Streaming or similar technologies to enable real-time analytics and decision-making.
- Data Quality and Governance: Establish and enforce data quality standards, data governance policies, and best practices to ensure data accuracy, consistency, and integrity across systems.
- Performance Tuning and Monitoring: Monitor system performance, troubleshoot bottlenecks, and conduct performance tuning of Spark jobs and Hadoop clusters to optimize resource utilization and job execution times.
Our requirements
- Strong Programming Skills: Proficiency in Scala and Java, including a deep understanding of functional programming concepts in Scala.
- Expertise in Big Data Technologies: Extensive experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, and HBase.
- Advanced Spark Knowledge: In-depth knowledge of Apache Spark, including Spark Core, Spark SQL, Spark Streaming, and MLlib for large-scale data processing and analytics.
- Data Modeling and ETL: Experience with data modeling, schema design, and implementing efficient ETL (Extract, Transform, Load) processes using big data technologies.
- Cluster Management Tools: Hands-on experience with cluster management tools such as Apache Hadoop YARN, Apache Mesos, or Kubernetes for resource allocation and job scheduling.
- SQL and NoSQL Databases: Proficiency in SQL for querying and analyzing data in relational databases (e.g., MySQL, PostgreSQL), and experience working with NoSQL databases (e.g., MongoDB, Cassandra).
What we offer
- Opportunity to work on bleeding-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical insurance, sports
- Corporate social events
- Professional development opportunities
- Well-equipped office