We are seeking a highly skilled Senior Data Engineer to join our team. This position involves working on a groundbreaking project that enables the discovery, access, processing, publication, and sharing of biomedical data. The objective is to generate insights for secondary use and to integrate both clinical and non-clinical data (Real-World Data, RWD) using the EDIS end-to-end engine. The ideal candidate will possess a strong background in ETL processes, data modeling, and cloud technologies, particularly within the AWS ecosystem.
Senior AWS Data Engineer
Your responsibilities
- Develop and enhance Data Warehouse solutions.
- Design and implement ETL processes to ensure efficient data flow and storage.
- Maintain and optimize data pipelines using various cloud technologies.
- Collaborate with cross-functional teams to integrate and process biomedical data.
- Conduct performance analysis, troubleshooting, and remediation of data-related issues.
- Ensure data quality and adherence to architectural standards.
- Apply best practices in data architecture, including data modeling, metadata management, and workflow management.
Our requirements
- 4+ years of programming experience focused on data pipelines, using Python or R.
- 4+ years of experience working with SQL and relational databases.
- 3+ years of experience maintaining data pipelines and handling various data types (structured, unstructured, metrics, logs).
- 3+ years of experience with data architecture concepts, including ETL/ELT processes, real-time streaming, and data quality.
- 3+ years of experience with cloud technologies and data pipeline solutions such as Airflow, Glue, and Dataflow.
- Proficiency in AWS components such as S3, Redshift, DocumentDB, and DynamoDB.
- Familiarity with different storage solutions (filesystem, relational, MPP, NoSQL).
- Basic knowledge of Java and/or Scala (1+ years).
- Strong understanding of data serialization formats such as JSON, XML, and YAML.
- Proficiency with Git, Gitflow, and DevOps tools (Docker, Bamboo, Jenkins, Terraform).
- Excellent knowledge of Unix environments.
- Experience with pharmaceutical data formats (SDTM).