We are seeking a highly skilled Senior Data Engineer to join our team. This position involves working on a groundbreaking project that enables the discovery, access, processing, publication, and sharing of biomedical data. The objective is to generate insights for secondary use and to integrate both clinical and non-clinical data (Real-World Data, RWD) using the EDIS end-to-end engine. The ideal candidate will possess a strong background in ETL processes, data modeling, and cloud technologies, particularly within the AWS ecosystem.
Responsibilities:
Develop and enhance Data Warehouse solutions.
Design and implement ETL processes to ensure efficient data flow and storage.
Maintain and optimize data pipelines using various cloud technologies.
Collaborate with cross-functional teams to integrate and process biomedical data.
Conduct performance analysis, troubleshooting, and remediation of data-related issues.
Ensure data quality and adherence to architectural standards.
Utilize best practices in data architecture, including data modeling, metadata management, and workflow management.
Expected requirements:
4+ years of programming experience building data pipelines in Python or R.
4+ years of experience working with SQL and relational databases.
3+ years in maintaining data pipelines and handling various data types (structured, unstructured, metrics, logs).
3+ years of experience in data architecture concepts, including ETL/ELT processes, real-time streaming, and data quality.
3+ years of experience with cloud technologies and data pipeline solutions such as Apache Airflow, AWS Glue, and Google Cloud Dataflow.
Proficiency with AWS services such as S3, Redshift, DocumentDB, and DynamoDB.
Familiarity with different storage solutions (filesystem, relational, MPP, NoSQL).
Basic knowledge of Java and/or Scala (1+ years).
Strong understanding of data serialization formats such as JSON, XML, and YAML.
Proficiency with Git, the Gitflow workflow, and DevOps tools (Docker, Bamboo, Jenkins, Terraform).