Gen AI using Airflow 3: A vision for Airflow RAGs

Presented at Airflow Summit 2024

By Kaxil Naik and Ash Berlin-Taylor

Gen AI has taken the computing world by storm. As enterprises and startups have started to experiment with LLM applications, it has become clear that providing the right context to these applications is critical.

This process, known as retrieval-augmented generation (RAG), relies on supplying custom data to the large language model so that the quality of its responses improves. Processing custom data and integrating with enterprise applications are strengths of Apache Airflow.
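
As a rough illustration of the RAG idea described above, the following minimal sketch retrieves the most relevant document for a question and places it in the prompt as context. It is not taken from the talk: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding model and vector database a real pipeline would likely use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question.

    Assumption: word overlap is a stand-in for real embedding similarity.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble a prompt that gives the model the retrieved context."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string would then be passed to whichever model service or local model the pipeline uses; that call is deliberately left out here.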

This talk presents a vision for enhancing Apache Airflow to support RAG more intuitively, with additional capabilities and patterns. Specifically, these include the following:

Support for unstructured data sources such as text, extending to image, audio, video, and custom sensor data
LLM invocation, including both external model services through APIs and local models through container invocation
Automatic index refreshing, with a focus on unstructured data lifecycle management, to avoid cumbersome and expensive index creation in vector databases
Templates for hallucination reduction via testing and scoping strategies
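
One way to picture the automatic index refreshing item above is an incremental refresh that re-indexes only documents whose content has changed, rather than rebuilding the whole index. The sketch below is an assumption about how such a step could work, not the design presented in the talk: `content_hash` and `plan_refresh` are hypothetical helpers, and the stored hashes stand in for metadata kept alongside a vector database.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(documents: dict, indexed_hashes: dict):
    """Decide which document IDs need re-indexing or removal.

    documents:      current source documents, doc_id -> text
    indexed_hashes: hashes recorded at the last indexing run, doc_id -> hash
                    (assumed to be stored next to the vector index)
    Returns (upsert_ids, delete_ids), both sorted for determinism.
    """
    # New or changed documents must be embedded and written to the index.
    upsert = sorted(
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    # Documents that disappeared from the source should leave the index.
    delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in documents)
    return upsert, delete
```

Unchanged documents appear in neither list, so the comparatively expensive embedding and index-write work is limited to what actually changed since the last run.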

Kaxil Naik

Airflow PMC member & Committer | Senior Director of Engineering at Astronomer

Ash Berlin-Taylor

Airflow PMC member & Director Airflow Engineering at Astronomer