
Building Reliable RAG

Data prep, retrieval quality, evals, and monitoring for messy real-world data.

3–5 days · For ML engineers, data engineers, and backend developers

Overview

Most RAG systems fail silently: they return plausible-sounding answers backed by the wrong documents. This masterclass focuses on building RAG pipelines that stay accurate, measurable, and maintainable on messy real-world data.

What you’ll build

A complete RAG pipeline with:

  • Robust ingestion for heterogeneous document types (PDFs, HTML, databases, Confluence, Slack)
  • Hybrid retrieval (dense + sparse + reranking)
  • Automated quality evaluation suite
  • Production monitoring dashboard

Curriculum

Day 1 — Data Ingestion & Chunking

  • Document parsing strategies for PDFs, tables, images, and mixed-format sources
  • Chunking methods: fixed-size, semantic, recursive, document-structure-aware
  • Metadata extraction and enrichment
  • Handling multilingual and domain-specific content
  • Data cleaning pipelines for noisy enterprise data
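To give a feel for the simplest of the chunking methods above, here is a minimal sketch of fixed-size chunking with overlap. The function name and default sizes are illustrative, not part of any library; the exercises cover the semantic and structure-aware variants as well.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based).

    Overlap preserves context across chunk boundaries so a sentence cut
    in half at one boundary still appears whole in the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

In practice you would chunk on token counts rather than characters, and respect document structure (headings, tables) where it exists.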

Day 2 — Embeddings & Retrieval

  • Embedding model selection and fine-tuning for your domain
  • Vector databases: Qdrant, Weaviate, pgvector — choosing the right one
  • Hybrid search: combining dense vectors, BM25, and metadata filters
  • Reranking with cross-encoders
  • Query transformation: HyDE, multi-query, step-back prompting
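One common way to combine dense and sparse result lists, covered on this day, is reciprocal rank fusion (RRF). The sketch below assumes each retriever returns a ranked list of document IDs; the constant `k = 60` is the conventional default, and the function name is my own.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank).

    Documents ranked highly by multiple retrievers float to the top,
    without needing to calibrate dense and BM25 scores against each other.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list is then typically cut to the top N and passed to a cross-encoder reranker.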

Day 3 — Generation & Grounding

  • Prompt engineering for grounded generation
  • Citation and source attribution
  • Handling “I don’t know” — abstention and confidence estimation
  • Multi-turn conversational RAG
  • Structured output from RAG (tables, summaries, comparisons)
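To make the grounding and citation ideas concrete, here is one illustrative way to assemble a prompt that demands numbered citations and permits abstention. The passage format (`id`/`text` dicts) and the exact wording are assumptions for the sketch, not a prescribed template.

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a prompt that asks for cited answers and allows "I don't know".

    Each passage is assumed to look like {"id": ..., "text": ...}; passages
    are numbered so the model can cite them as [1], [2], ...
    """
    context = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite sources as [1], [2], ... after each claim. "
        "If the sources do not contain the answer, reply exactly: I don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

An explicit abstention instruction like this is a starting point; the course pairs it with confidence estimation so abstention can also be enforced outside the prompt.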

Day 4 — Evaluation & Testing

  • Building evaluation datasets from your domain
  • Metrics: retrieval precision/recall, answer faithfulness, relevance
  • Automated evaluation with LLM-as-judge
  • Regression testing: catching quality drops before deployment
  • Human-in-the-loop evaluation workflows
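The retrieval metrics listed above reduce to simple set arithmetic once you have labelled relevant documents per query. A minimal sketch (function name is illustrative):

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision@k and recall@k for one query.

    precision@k: fraction of the top-k retrieved docs that are relevant.
    recall@k:    fraction of all relevant docs found in the top k.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaged over an evaluation set, these numbers become the regression signal: a retrieval change that drops recall@k gets caught before deployment, not after.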

Day 5 — Production & Monitoring

  • Deployment architecture: sync vs async, caching strategies
  • Monitoring retrieval quality in production
  • Feedback loops: user signals to improve retrieval
  • Cost optimisation: balancing quality and latency
  • Capstone: end-to-end pipeline on your data

Prerequisites

  • Python proficiency
  • Basic familiarity with SQL and APIs
  • Sample documents from your domain (we’ll use them in exercises)

Outcomes

Your team leaves with a tested RAG pipeline on your own data, an evaluation suite to catch regressions, and a monitoring setup for production.

Interested in this masterclass?

Tell me about your team and I'll tailor the programme to your needs.

Book this masterclass