The role involves designing, implementing, and optimizing document intelligence pipelines and ML models for document classification, segmentation, and field extraction on AWS platforms.
Senior ML Engineer
Your responsibilities
- Design and implement end-to-end document intelligence pipelines on AWS
- Develop and optimize ML models for document classification, segmentation, and field extraction
- Build scalable data processing systems capable of handling PDFs of up to 2,000 pages
- Collaborate with subject matter experts to define and refine extraction requirements
- Own features from research through production deployment and monitoring
- Establish evaluation frameworks and quality metrics for extraction accuracy
Our requirements
- Advanced knowledge of Python (core language, Pandas, scikit-learn, TensorFlow or PyTorch, PyStats, Pydantic)
- Experience with AWS tools for ML engineering and deployment (SageMaker, Lambda, CloudFormation/CDK, Step Functions)
- Advanced knowledge of SQL and data modeling
- Experience with GenAI for document intelligence, including prompt engineering, RAG (Retrieval-Augmented Generation), multimodal (vision + text) models, and production deployment using Amazon Bedrock or Azure OpenAI APIs
- Experience in experimental design (power analysis and hypothesis testing)
- Strong written and verbal communication skills, essential for a remote and largely asynchronous work environment
- Demonstrated capacity to clearly and concisely communicate complex technical problems and propose iterative solutions
- Experience owning a feature from concept to production, including proposal, discussion, and execution
- Experience with document processing tools (Amazon Textract, Azure Document Intelligence, or similar OCR and layout detection systems)
- Experience with PDF and image processing libraries (e.g., PyMuPDF, OpenCV, Pillow)
- Experience in machine learning and data science (e.g., algorithm selection, feature engineering, model training, hyperparameter tuning, implementing supervised and unsupervised learning, building model pipelines, and working with ML tools, libraries, and frameworks)
- Experience working with AWS big data technologies (Redshift, S3, EMR, Glue, etc.)