The industry treats agents and pipelines as opposing paradigms. We think that framing is wrong. Most agentic problem-solving, when you look at what it actually does, has pipeline structure: gather data, process each dimension independently, synthesize, evaluate. The question is not “agents or pipelines?” but where the LLM fits inside the pipeline and what you gain by making each step explicit.
This talk makes that concrete. We start with AIP-99 and the operator library that gives Airflow first-class LLM support: inference, SQL generation, branching, schema validation, and embedding, all backed by PydanticAI with 20+ model providers out of the box. We walk through a real pipeline that analyzes 5,856 survey responses using four parallel LLM-generated queries, DataFusion execution, and a synthesis step, showing exactly where the LLM reasons and where the pipeline handles everything else.
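The fan-out/fan-in shape described above can be sketched in plain Python. This is an illustrative sketch only, not the AIP-99 operator API: the function names, the four dimension labels, and the thread-pool parallelism are stand-ins for the real LLM-generated queries and DataFusion execution.

```python
# Hypothetical sketch of the gather -> parallel analysis -> synthesize shape.
# Names are illustrative; this is not the AIP-99 operator library.
from concurrent.futures import ThreadPoolExecutor

DIMENSIONS = ["sentiment", "themes", "feature_requests", "churn_risk"]

def gather() -> list[str]:
    # Stand-in for loading the survey responses.
    return [f"response-{i}" for i in range(4)]

def analyze(dimension: str, responses: list[str]) -> dict:
    # Stand-in for one LLM-generated query executed over the data.
    return {"dimension": dimension, "rows": len(responses)}

def synthesize(results: list[dict]) -> dict:
    # Deterministic fan-in: combine the per-dimension results.
    return {r["dimension"]: r["rows"] for r in results}

def run_pipeline() -> dict:
    responses = gather()
    # The four dimension analyses are independent, so they fan out in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda d: analyze(d, responses), DIMENSIONS))
    return synthesize(results)
```

The point of the shape is that only `analyze` needs an LLM; gathering, fan-out, and fan-in are ordinary pipeline mechanics the orchestrator already handles.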
Then we go deeper. Fault-tolerant agentic systems need more than retry counts. AIP-105 introduces pluggable retry policies that classify failures at the exception level, including an LLM-powered variant that distinguishes a rate limit from an auth error from a transient network blip. LLMSchemaCheckOperator validates upstream data before the LLM ever sees it. The DAG Result API lets a pipeline expose a semantic output, turning a DAG into a callable function for downstream agents. These are not theoretical. We demo each one.
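The failure-classification idea behind pluggable retry policies can be illustrated with a small rule-based sketch. This is a hypothetical stand-in, not the AIP-105 interface; a real policy would inspect exception types and an LLM-powered variant would classify the messages a rule set cannot anticipate.

```python
# Illustrative sketch of exception-level failure classification.
# Not the AIP-105 retry-policy API; names and rules are hypothetical.
from enum import Enum

class Verdict(Enum):
    RETRY = "retry"  # transient: back off and try again
    FAIL = "fail"    # permanent: retrying will not help

def classify(exc: Exception) -> Verdict:
    msg = str(exc).lower()
    if "rate limit" in msg or "429" in msg:
        return Verdict.RETRY  # provider throttling is transient
    if "timeout" in msg or "connection reset" in msg:
        return Verdict.RETRY  # transient network blip
    if "unauthorized" in msg or "401" in msg:
        return Verdict.FAIL   # auth error: no retry count fixes bad credentials
    return Verdict.FAIL       # unknown failures fail fast by default
```

A fixed retry count treats all three cases identically; classifying at the exception level is what lets the pipeline retry the rate limit and fail fast on the auth error.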
We close with what is next: persistent task state for agentic workflows that survive retries (AIP-103), and the path toward dynamic execution graphs that support feedback loops while preserving the auditability that makes pipelines worth building in the first place.
Vikram Koka
Chief Strategy Officer at Astronomer & PMC Member of Apache Airflow
Kaxil Naik
Airflow PMC member & Committer | Senior Director of Engineering @ Astronomer.io