Senior Spark Engineer
Warsaw, Masovian Voivodeship, Poland
Link Group
16. 9. 2024
Position details

Employment Type: Full-Time, Remote

Job Description: We are looking for a highly experienced Senior Spark Engineer with deep expertise in Apache Spark, particularly in performance tuning and managing cyclic Spark data flows. The candidate should be proficient in troubleshooting and optimizing real-time data processing systems, including customizing Spark's Catalyst Optimizer. Experience with federated data systems and distributed computing environments is essential, as is the ability to integrate external systems and APIs. The role involves optimizing Spark pipeline performance across large-scale, multi-cloud environments.


Key Responsibilities:


  • Expertise in Apache Spark: Utilize in-depth knowledge of Spark, including performance tuning, query optimization, and customizing the Catalyst Optimizer for distributed systems.
  • Federated Data Systems: Design, implement, and manage data workflows within federated models across multi-cloud environments.
  • Performance Optimization: Diagnose and address bottlenecks in Spark jobs, ensuring scalable and efficient performance on large clusters.
  • Distributed Computing: Manage Spark clusters, oversee task scheduling and resource allocation, and ensure fault tolerance in distributed environments.
  • API Integration: Connect Spark applications with external systems and APIs to improve data processing workflows.
  • Scala and Java Development: Apply strong skills in Scala and Java to build, maintain, and optimize real-time distributed applications in Spark.
  • Front-End Collaboration: Work with front-end developers and data teams to create and deploy user interfaces for monitoring Spark pipeline performance.
  • CI/CD and Version Control: Develop and manage CI/CD pipelines, maintain version control, and automate the deployment of distributed applications to ensure reliable software delivery.


Required Skills and Experience:


  • Apache Spark: Advanced experience tuning and optimizing Spark workloads, including customizing the Spark Catalyst Optimizer for maximum performance.
  • Scala and Java Proficiency: Strong hands-on experience with Scala and Java in Spark-based distributed systems.
  • Federated Data Models: Proven experience managing federated data systems in multi-cloud environments (e.g., AWS, GCP, Azure).
  • Distributed Computing: Deep understanding of distributed computing principles, including task scheduling, resource management, fault tolerance, and cluster optimization.
  • Performance Optimization: Demonstrated expertise in optimizing Spark pipelines for large-scale, high-volume systems.
  • API Integration: Experience integrating Spark with third-party systems and APIs to streamline data workflows.
  • Front-End Development: Basic to intermediate skills in front-end development to collaborate on building monitoring dashboards for Spark systems.
  • Software Development: Strong programming fundamentals, experience with version control (Git), and a solid understanding of CI/CD pipelines.


Preferred Qualifications:


  • Experience with Kubernetes for managing Spark clusters in containerized environments.
  • Familiarity with cloud platforms such as AWS, GCP, or Azure.
  • Knowledge of SQL and database integration with Spark.
  • Experience with big data tools (e.g., Hadoop, Kafka) used alongside Spark.


Why Join Us?


  • Be part of a forward-thinking, tech-driven team.
  • Work on cutting-edge distributed systems using federated models.
  • Collaborate with experts in cloud computing, big data, and data engineering.
  • Opportunities for professional growth and continuous learning.
