The project enables finding, accessing, processing, publishing, and sharing biomedical data to generate insights for secondary use. It also includes the EDIS end-to-end engine for secondary use, primary exploration, and integration with externally generated real-world data (RWD) from both clinical and non-clinical sources.
Senior ML Engineer / Data Scientist with a focus on LLMs
Your responsibilities
- Design, develop, and deploy solutions based on large language models (LLMs), including agent design and tool use for problem-solving.
- Fine-tune LLMs and optimize their performance to meet specific business requirements.
- Develop and optimize data pipelines and deployment pipelines for LLM-based applications.
- Integrate clinical, non-clinical, and external real-world data (RWD) from various sources.
- Work with ML/AI tools, including AWS SageMaker, PyTorch, TensorFlow, and Vertex AI, and implement MLOps solutions using tools such as Kubeflow.
- Create scripts and automate processes using tools like Git, Bash, Docker, and Kubernetes.
- Develop scalable applications in cloud environments (AWS, Azure, GCP).
- Implement Continuous Integration / Continuous Deployment (CI/CD) practices using tools like Jenkins or GitLab CI.
- Collaborate with teams across different locations and cultures to deliver customer-oriented solutions.
- Test and optimize ML models, manage training and testing datasets, and mitigate overfitting.
Our requirements
- Experience with LLM application development, in particular agentic design such as tool use and reasoning.
- Experience in building data pipelines and deployment pipelines for LLM applications.
- Recent experience with ML/AI toolkits such as AWS SageMaker (other toolkits such as PyTorch, TensorFlow, Keras, MXNet, H2O, etc. are nice to have).
- Experience with MLOps technologies (SageMaker, Vertex AI, Kubeflow).
- Experience with cloud solutions (AWS / Azure / GCP) and Docker.
- Proven scripting and automation skills.
- Good knowledge of: Git, Bash, Linux, CI/CD tools (e.g. Jenkins, GitLab CI), the software lifecycle, relational databases (RDB), visualization tools (e.g. Tableau), Jira, Confluence.
- Programming languages: Python, R.
- Test-driven development and good coding practices.
- Problem-solving and decision-making skills.
- Customer & delivery focus.
- Ability to work effectively with team members and virtual teams from different locations and different cultural backgrounds.
- Experience with LLM fine-tuning a big plus.
- Experience with deployment of scalable apps a plus.
- Experience with clinical study data a plus.