We are seeking talented individuals to join our team in a key project focused on developing data products and reporting solutions that support decision-making for organizations servicing diagnostic instruments.
Responsibilities:
Develop and Maintain Data Pipelines: Design and optimize data pipelines using Python or R for efficient data processing
Data Architecture Implementation: Collaborate on data architecture concepts, including ETL/ELT processes and real-time data streaming
Performance Optimization: Conduct performance analysis and troubleshooting of data pipelines to enhance efficiency
Utilize Advanced Tools: Leverage dbt, Airflow, and cloud services to improve data processing and storage
Data Quality Assurance: Implement data quality checks and ensure compliance with relevant standards, particularly in pharmaceutical data formats
Requirements:
Programming Experience: Several years of experience building data pipelines in a programming language, particularly Python or R
Java/Scala Experience: 1+ year of experience with Java or Scala
SQL Proficiency
Pipeline Management: Several years of experience managing data pipelines
Diverse Storage Knowledge: Familiarity with various storage types (file systems, relational, NoSQL) and data formats (structured, unstructured)
Cloud Technologies: Experience working with cloud data solutions (e.g., Airflow, AWS Glue, BigQuery)
Version Control and CI/CD: Strong knowledge of Git, Gitflow, and tools like Docker and Jenkins