technologies-expected :

responsibilities :

Build comprehensive training observability systems - Design and implement monitoring infrastructure to track how model behavior evolves throughout training.
Develop next-generation evaluation frameworks - Move beyond traditional benchmarks to create evaluations that capture real-world utility.
Create automated quality assessment pipelines - Build custom classifiers that continuously monitor models for complex issues.
Create, follow, and refine data annotation guidelines to ensure consistently high-quality labeled datasets.
Bridge research and production - Partner with research teams to translate cutting-edge evaluation techniques into production-ready systems, and work with engineering teams to ensure our monitoring infrastructure scales with increasingly complex training workflows.

requirements-expected :

Hands-on experience in AI governance, responsible AI, or AI/ML model documentation, or a willingness to work in these areas and expand your knowledge of them.
Proficiency in Python and experience building production ML systems.
Experience training, evaluating, or monitoring large language models.
Experience creating, following, and refining data annotation guidelines to ensure high-quality labeled datasets.
Experience building dashboards for reporting and visualization.
Ability to analyze and structure technical information into clear, standardized documentation.
Ability to create systematic and repeatable frameworks for AI documentation that engineering teams can easily adopt.
Strong communication and collaboration skills for working across teams.
Detail-oriented approach with an emphasis on accuracy and consistency.
Fluency in English (you will attend meetings with English-speaking clients). Polish is a plus.
