- Building distributed, highly parallelized big data processing pipelines that process massive amounts of structured and unstructured data in near real time
- Leveraging Spark to enrich and transform corporate data to enable searching, data visualization, and advanced analytics
- Working closely with analysts and business stakeholders to develop analytics models
- Delivering continuously on Hadoop and other big data platforms
- Automating processes wherever possible so that they are repeatable and reliable
- Working closely with the QA team