
Building Reliable RAG

Data prep, retrieval quality, evals, and monitoring for messy real-world data.

3–5 days · For ML engineers, data engineers, and backend developers

Overview

Most RAG systems fail silently: they return plausible-sounding answers backed by the wrong documents. This masterclass focuses on building RAG pipelines that stay accurate, measurable, and maintainable on messy real-world data.

What you’ll build

A complete RAG pipeline with:

  • Robust ingestion for heterogeneous document types (PDFs, HTML, databases, Confluence, Slack)
  • Hybrid retrieval (dense + sparse + reranking)
  • Automated quality evaluation suite
  • Production monitoring dashboard

Curriculum

Day 1 — Data Ingestion & Chunking

  • Document parsing strategies for PDFs, tables, images, and mixed-format sources
  • Chunking methods: fixed-size, semantic, recursive, document-structure-aware
  • Metadata extraction and enrichment
  • Handling multilingual and domain-specific content
  • Data cleaning pipelines for noisy enterprise data
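To give a feel for the simplest of the chunking methods above, here is a minimal sketch of fixed-size chunking with overlap. The function name and default sizes are illustrative, not part of any library; the exercises cover the semantic and structure-aware variants as well.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based).

    Overlap preserves context across chunk boundaries so a sentence cut
    in half at one boundary still appears whole in the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

In practice you would chunk on token counts rather than characters, and respect document structure (headings, tables) where it exists.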

Day 2 — Embeddings & Retrieval

  • Embedding model selection and fine-tuning for your domain
  • Vector databases: Qdrant, Weaviate, pgvector — choosing the right one
  • Hybrid search: combining dense vectors, BM25, and metadata filters
  • Reranking with cross-encoders
  • Query transformation: HyDE, multi-query, step-back prompting
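One common way to combine dense and sparse result lists, covered on this day, is reciprocal rank fusion (RRF). The sketch below assumes each retriever returns a ranked list of document IDs; the constant `k = 60` is the conventional default, and the function name is my own.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank).

    Documents ranked highly by multiple retrievers float to the top,
    without needing to calibrate dense and BM25 scores against each other.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list is then typically cut to the top N and passed to a cross-encoder reranker.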

Day 3 — Generation & Grounding

  • Prompt engineering for grounded generation
  • Citation and source attribution
  • Handling “I don’t know” — abstention and confidence estimation
  • Multi-turn conversational RAG
  • Structured output from RAG (tables, summaries, comparisons)
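To make the grounding and citation ideas concrete, here is one illustrative way to assemble a prompt that demands numbered citations and permits abstention. The passage format (`id`/`text` dicts) and the exact wording are assumptions for the sketch, not a prescribed template.

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a prompt that asks for cited answers and allows "I don't know".

    Each passage is assumed to look like {"id": ..., "text": ...}; passages
    are numbered so the model can cite them as [1], [2], ...
    """
    context = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite sources as [1], [2], ... after each claim. "
        "If the sources do not contain the answer, reply exactly: I don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

An explicit abstention instruction like this is a starting point; the course pairs it with confidence estimation so abstention can also be enforced outside the prompt.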

Day 4 — Evaluation & Testing

  • Building evaluation datasets from your domain
  • Metrics: retrieval precision/recall, answer faithfulness, relevance
  • Automated evaluation with LLM-as-judge
  • Regression testing: catching quality drops before deployment
  • Human-in-the-loop evaluation workflows
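The retrieval metrics listed above reduce to simple set arithmetic once you have labelled relevant documents per query. A minimal sketch (function name is illustrative):

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision@k and recall@k for one query.

    precision@k: fraction of the top-k retrieved docs that are relevant.
    recall@k:    fraction of all relevant docs found in the top k.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaged over an evaluation set, these numbers become the regression signal: a retrieval change that drops recall@k gets caught before deployment, not after.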

Day 5 — Production & Monitoring

  • Deployment architecture: sync vs async, caching strategies
  • Monitoring retrieval quality in production
  • Feedback loops: user signals to improve retrieval
  • Cost optimisation: balancing quality and latency
  • Capstone: end-to-end pipeline on your data

Prerequisites

  • Python proficiency
  • Basic familiarity with SQL and APIs
  • Sample documents from your domain (we’ll use them in exercises)

Outcomes

Your team leaves with a tested RAG pipeline on your own data, an evaluation suite to catch regressions, and a monitoring setup for production.

Interested in this masterclass?

Tell me about your team and I'll tailor the programme to your needs.

Book this masterclass