Working with Spark & Python to define and maintain data ingestion and transformation processes
Building distributed, highly parallelized big data processing pipelines that process massive amounts of data (both structured and unstructured) in near real-time
Leveraging Spark to enrich and transform corporate data to enable search, data visualization, and advanced analytics
Working closely with analysts and business stakeholders to develop analytical models
Collaborating with Data Science and Machine Learning experts
Providing detailed progress updates during PI Demo sessions
Our requirements
Experience with Spark & Python - minimum 4 years
Strong Spark SQL or Hive SQL skills - minimum 4 years
Experience with the Hadoop/Hive ecosystem and/or other Big Data technologies - minimum 3 years
Previous experience creating data flows (ETLs, ELTs, etc.)
Experience with Bitbucket and Git, including code versioning and branching strategies
Familiarity with the Agile/SAFe framework
Cloud experience is a big plus (AWS, GCP, Azure)
Basic knowledge of machine learning models (how to build, validate, and maintain regression models)