
Job summary
The Data Scientist / GenAI Specialist leads the applied AI work for GenAI and LLM use cases and owns the model quality of each solution from proof-of-concept through stabilization. The role covers feasibility, prompt engineering, RAG tuning, fine-tuning decisions, evaluation design, and continuous model-quality improvement. Working within the AI Centre of Excellence, the ideal candidate pairs strong hands-on data-science craft with the rigour to measure and reduce hallucination, bias, and toxicity, define quality SLOs, and produce the evidence a model-risk reviewer needs in a regulated banking environment.
Job description
• Leads proof-of-concept feasibility, rapid prototyping, and value-hypothesis validation for GenAI use cases.
• Designs and implements prompt engineering, few-shot strategies, chain-of-thought, and agent reasoning patterns.
• Tunes RAG components: chunking, embeddings, retrievers, re-rankers, and context-window strategy.
• Decides fine-tuning versus prompting versus tool-use trade-offs, and selects base models and adapter strategies.
• Designs evaluation frameworks: offline evals, LLM-as-judge, human-in-the-loop review, A/B testing, and red-teaming.
• Measures and reduces hallucination, bias, and toxicity, and defines per-use-case quality SLOs for accuracy and for each of these risks.
• Documents model cards, evaluation reports, and assumptions for model-risk review.
• Partners with Data Engineers on data quality, curation, and annotation, and monitors production model quality and drift.
• Iteratively improves prompts, retrievers, and evaluation hooks, contributing to reusable eval pipelines and prompt/RAG libraries.
• Experiments with new models, prompts, and retrieval strategies, and adopts the methods that meet quality SLOs and Responsible-AI goals.
Qualifications
• Bachelor's or Master's degree in Computer Science, Data Science, Statistics, Machine Learning, or a related quantitative field (Master's preferred).
• 5+ years of data-science / machine-learning experience, with demonstrated hands-on LLM/GenAI delivery.
• Strong proficiency in Python and the ML/AI ecosystem: PyTorch or TensorFlow, Hugging Face, scikit-learn, pandas/NumPy, and notebook workflows.
• Practical experience with prompt engineering, RAG (chunking, embeddings, retrievers, re-rankers), and agent patterns.
• Experience designing evaluation (offline evals, LLM-as-judge, human-in-the-loop, A/B testing, and red-teaming) with frameworks such as Ragas or DeepEval.
• Understanding of fine-tuning approaches (LoRA/PEFT) and their trade-offs versus prompting and tool-use.
• Ability to define and measure model-quality metrics and to write clear model cards and evaluation reports.
• Solid grasp of Responsible AI, bias, and drift monitoring; exposure to model-risk validation is an advantage.
• Experience with Azure OpenAI / Azure AI Foundry, Azure Machine Learning, and Azure AI Search is preferred.
• Comfortable working in an Agile / Scrum environment.
Job ID: 149788925
Skills:
PyTorch, Spark, Kafka, Databricks, Python, AWS, LangChain, Hugging Face Transformers, MLflow