Orchestrating and Testing RAG Pipelines with Airflow

By Shrividya Hegde

RAG pipelines fail silently. Bad retrievals, stale vectors, and unfaithful answers rarely trigger alerts: your row counts pass, your DAG turns green, and your AI product quietly gets worse. This session presents a reference DAG architecture for production-grade RAG ingestion on Airflow 3, with inline quality gates that evaluate retrieval accuracy and answer faithfulness before new vectors are promoted to production. We’ll walk through four failure modes (chunking regressions, embedding model drift, partial re-index states, and retrieval quality decay) and the specific Airflow pattern that catches each, using Ragas for evaluation and Airflow 3’s TaskFlow API, Assets, and DAG versioning for reproducible, event-driven runs.
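As a rough illustration of what an inline quality gate might look like, here is a minimal Python sketch. It is not the session's reference implementation. It assumes an upstream evaluation step (for example, one using Ragas) has already produced a dictionary of metric scores keyed by name; the names `GateThresholds`, `QualityGateError`, and `check_quality_gate`, as well as the threshold values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    # Hypothetical minimum scores; tune these against your own evaluation set.
    min_context_precision: float = 0.80
    min_faithfulness: float = 0.85


class QualityGateError(Exception):
    """Raised when candidate vectors should not be promoted."""


def check_quality_gate(scores: dict[str, float], thresholds: GateThresholds) -> None:
    """Raise QualityGateError if any required metric is missing or too low.

    `scores` is assumed to come from an upstream evaluation step run against
    the candidate (not yet promoted) index.
    """
    required = {
        "context_precision": thresholds.min_context_precision,
        "faithfulness": thresholds.min_faithfulness,
    }
    failures = []
    for name, minimum in required.items():
        value = scores.get(name)
        if value is None:
            # A missing metric is treated as a failure, not a silent pass.
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.2f} < {minimum:.2f}")
    if failures:
        raise QualityGateError("; ".join(failures))
```

Called from inside an Airflow task placed between the evaluation step and the promotion step, an exception like this would fail the task, so the DAG turns red and the promotion task downstream does not run with its default trigger rule.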

Shrividya Hegde

Senior AI Data Engineer