We are looking for an AI Expert with deep experience in designing, implementing, and optimizing Retrieval-Augmented Generation (RAG) systems in on-premises environments. The ideal candidate has hands-on experience with vLLM, LiteLLM, and open-source LLMs such as Llama 3.2, along with a proven ability to integrate these tools into scalable, secure, high-performance enterprise workflows.
Network & Services International (NWI) develops, plans, builds and operates the international network infrastructure of DTAG and produces intercarrier and wholesale services for the sales units W-IC, B2B and IoT.
The T-BDA squad provides seamless AI and automation solutions, integrated in a self-hosted environment, to DT's internal customers.
responsibilities :
RAG System Development:
Architect and deploy end-to-end RAG pipelines, combining retrieval mechanisms (e.g., vector or graph databases such as Neo4j) with generative models (e.g., Llama) for enterprise use cases.
Fine-tune and optimize retrieval models to ensure high accuracy and low latency in on-prem environments.
Model Integration & Deployment:
Implement and customize inference servers using vLLM for efficient LLM serving and LiteLLM as a lightweight gateway for routing requests across models.
Integrate open-source LLMs (e.g., Llama, Mistral) with proprietary data sources and APIs.
On-Prem Infrastructure Management:
Design GPU-optimized, scalable infrastructure for LLM training and inference, ensuring compliance with security and data governance policies.
Collaborate with DevOps teams to containerize workflows using Docker/Kubernetes and automate MLOps pipelines.
Performance Optimization:
Apply techniques such as quantization, pruning, and dynamic batching to maximize efficiency in resource-constrained on-prem setups.
Monitor system performance, troubleshoot bottlenecks, and ensure high availability.
Cross-Functional Collaboration:
Partner with data engineers to curate and preprocess domain-specific datasets for retrieval and generation tasks.
Translate business requirements into technical solutions for stakeholders in telco environments.
requirements-expected :
Education:
Bachelor’s, Master’s, or PhD degree in Computer Science, AI, or a related field.
Experience:
3+ years in ML/AI roles, with 2+ years focused on RAG systems.
Proven experience deploying LLMs in on-prem or hybrid environments.
Proficiency with vLLM, LiteLLM, and open-source LLMs (e.g., Llama, DeepSeek, Mistral).
Experience introducing AI agents and assistants.
Technical Skills:
Strong Python expertise with frameworks like PyTorch, Hugging Face Transformers, and LangChain.
Experience with vector and graph databases (e.g., Neo4j).
Familiarity with Linux-based systems and Red Hat OpenShift.
Soft Skills:
Ability to communicate complex AI concepts to non-technical stakeholders.
Strong problem-solving skills and adaptability in fast-paced environments.
offered :
Employment contract
An additional day off for your birthday or name day
Medical package and life insurance
Benefits platform: you choose the benefits that suit you
Access to training platforms to improve your knowledge
"I know Talent" referral program: training or a cash reward for referring friends
In addition, you get access to our products and services on preferential terms, along with the benefits listed below
benefits :
sharing the costs of sports activities
private medical care
sharing the costs of foreign language classes
sharing the costs of professional training & courses
life insurance
flexible working time
corporate products and services at discounted prices
integration events
mobile phone available for private use
no dress code
parking space for employees
extra social benefits
sharing the costs of movie and theater tickets
holiday funds
birthday celebration
sharing the costs of a streaming platform subscription