The main goal of the project is to establish a single source of truth for high-quality, maintainable data.
The customer wants to integrate terabytes of insurance and call center data from multiple warehouses while addressing tech debt.
Implemented Solutions:
- Merged data from various warehouses into a unified system.
- Optimized and automated data ingestion processes with Airflow and Airbyte.
- Added new data sources to enhance reporting and analytics capabilities.
- Refactored existing code and processes to improve system performance and maintainability.
- Developed a mechanism to analyze caller sentiment using the integrated data.
Responsibilities:
- Design, build, and maintain scalable data pipelines to ingest, transform, and load data into Google Cloud Platform (GCP).
- Serve as a source of knowledge for the Data Engineering team on process improvement, automation, and new technologies that enable best-in-class timeliness and data coverage.
- Design data pipelines using ETL tools, event-driven software, and other streaming software.
- Partner with data scientists and engineers to bring our concepts to reality. This requires learning to speak the language of both statisticians and software engineers.
- Ensure the reliability of data pipelines and enforce data governance, security, and protection of our customers’ information while balancing tech debt.
- Stay up to date with industry trends and best practices in data engineering and GCP services.
Expected Requirements:
- Minimum 6 years of experience with GCP (Google BigQuery).
- Familiarity with and working knowledge of Apache Airflow, dbt, Datadog, and Airbyte.
- Comfort with and expertise in data engineering tooling such as Jira, Git, Buildkite, Terraform, Airflow, dbt, and containers, as well as the GCP suite, Kubernetes, and Cloud Functions.
- Extensive experience with data engineering techniques, Python, and SQL.
Technologies You Will Use:
- Python for data pipelining and automation.
- Airbyte for ETL.
- Airflow and dbt for data pipelining.
- GitHub for version control.
- Various APIs for data integration.
- Google Cloud Platform, Terraform, Kubernetes, Cloud SQL, Cloud Functions, BigQuery, Datastore, and more: we keep adopting new tools as we grow!