These are the confirmed sessions for Airflow Summit 2026.
**[Keynote] The State of Airflow: Momentum, Innovation, and What's Next** (Vikram Koka)

Airflow 3 has been out for a year. In this keynote, we take stock of where the community stands, what we built together, and where we are headed. We open with the data: adoption trends, community growth, and honest feedback from teams running Airflow 3 in production. We cover what is working, what surprised us, and what the survey tells us about how the ecosystem is evolving. The second section covers the year in Airflow: provider discovery and distribution has been modernized, Airflow gained first-class support for AI and LLM workloads, and scheduling became more powerful, letting pipelines respond to data at a finer granularity.

**Airflow at the Heart of Equifax's Data Processing** (Yuvaraj Sankaran)

At Equifax, Apache Airflow is used across many departments, helping Data Engineers, Data Scientists, and Business Analysts in their daily work. This presentation shows how modern orchestration technology can sit at the heart of data processing and business processes to support daily company operations.

**Airflow Autopilot: The Generate-Verify-Refine Loop That Makes Pipeline Authoring Truly AI-Native** (Yifan Wang)

Today's pipeline authoring is synchronous: writing code and chasing errors, with every step blocking the engineer until it is resolved. You can't step away or parallelize. Airflow Autopilot reimagines this to be AI-native and asynchronous. Describe your pipeline's intent, and the agent takes over, orchestrating two classes of purpose-built tools: generator tools that produce the DAG code and automate setup, and scorer tools that evaluate it across dimensions such as data discovery, auth, compliance, DAG validation, and even end-to-end execution. Every scorer returns a deterministic result and structured, prioritized hints. The agent runs the generate → verify → refine loop, calling scorers, reading hints, fixing code, and re-scoring, until every dimension passes. You come back to a PR with DAGs that have been iteratively built, tested, and made ready for review. For 10,000+ Airflow users, this shifts the engineer from executor to reviewer: you own the intent and final judgment, the agent owns the execution. Attendees leave with the architecture for an AI-native authoring experience, the principles behind decomposing work into scorer-sized verification units, and what it takes to scale this in production.

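The generate → verify → refine loop described in this abstract can be sketched in a few lines. This is a toy model, not Airflow Autopilot's implementation: the generator, the scorers, and all names are invented stand-ins that only illustrate the control flow of looping until every scorer passes.

```python
# Toy sketch of a generate -> verify -> refine loop. All names are invented;
# the real system's generator and scorers are far richer.

def generate(intent: str, hints: list[str]) -> str:
    """Toy generator: 'code' is the intent plus every fix applied so far."""
    return intent + "".join(f"+{h}" for h in hints)

def score(code: str) -> list[str]:
    """Toy scorers: return prioritized hints; an empty list means all pass."""
    hints = []
    if "+auth" not in code:
        hints.append("auth")
    if "+validation" not in code:
        hints.append("validation")
    return hints

def autopilot(intent: str, max_rounds: int = 5) -> tuple[str, int]:
    """Loop: generate, score, apply hints, re-score, until every dimension passes."""
    applied: list[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(intent, applied)
        hints = score(candidate)
        if not hints:
            return candidate, round_no   # every dimension passes
        applied.extend(hints)            # refine using the scorers' hints
    raise RuntimeError("did not converge within max_rounds")

code, rounds = autopilot("daily_orders_dag")
```

The key property the abstract emphasizes is that scorers are deterministic and return actionable hints, which is what lets the loop terminate instead of thrashing.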
**Airflow to the rescue: managing chemical emergencies** (Eloi Codina Torras)

At Meteosim, Airflow is the engine for our entire decision system. It runs daily weather and air quality forecasts on schedule, but it also powers OnaChem React, a software product that lets users manage chemical emergencies in real time, and helps us manage consultancy projects. This talk covers how we set up Airflow 3 to handle five very different types of workloads. We will explain why Airflow 3 was necessary to make this work. You will see how we orchestrate physics, AI, and human decisions in a single environment.

**Asset Partitions: Matching Workflow to the Right Data** (Wei Lee)

Asset partitions are a key building block in Expanded Data Awareness. This session explains the core semantics of partition definitions, partition mappings, and backfill behavior in AIP-76. I will show how these pieces fit together in the current design, then discuss where asset partitions can go next, including improvements in authoring ergonomics, observability, and partition-aware workflow capabilities. Attendees will leave with a clear mental model of today's implementation and a practical view of the future direction.

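To make the idea of a partition mapping concrete: it answers, for a given downstream partition, which upstream partitions it depends on. The sketch below is illustrative only; AIP-76's actual API is still evolving and this is not its interface. It models one common mapping, a monthly downstream partition depending on a daily-partitioned upstream asset.

```python
# Illustrative partition mapping: a 'YYYY-MM' downstream partition depends on
# every 'YYYY-MM-DD' upstream partition in that month. Not AIP-76's real API.
from datetime import date, timedelta

def month_to_days(month_key: str) -> list[str]:
    """Map a 'YYYY-MM' downstream partition to its upstream daily partitions."""
    year, month = map(int, month_key.split("-"))
    day = date(year, month, 1)
    days = []
    while day.month == month:          # walk every day of the month
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days

deps = month_to_days("2026-02")        # the 28 daily partitions of Feb 2026
```

Backfill behavior falls out of the same function: rebuilding one monthly partition means re-checking exactly these upstream keys, no more and no less.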
**Beyond Multi-Cluster Airflow: Operating GPU Workloads at Scale** (Aleksandr Shirokov & Tarasov Alexey)

At last year's Airflow Summit, we shared how we built a multi-cluster orchestration layer on top of Apache Airflow to run ML workloads across multiple Kubernetes GPU clusters. Once hundreds of ML engineers started running GPU pipelines in production, we discovered that orchestration alone is not enough. Operating multi-cluster GPU infrastructure introduces new challenges: controlling GPU allocation across teams, observing pipelines across clusters, and helping users run workloads efficiently without wasting expensive GPU resources.

**Build AI Pipelines with Apache Airflow 3** (Kenten Danas)

Apache Airflow® has long been the control plane for data pipelines. As AI workflows move into production, teams are discovering the same challenges apply: LLM calls fail, embeddings need regenerating, and agent outputs need human review. The operational discipline that Airflow brings to data pipelines is exactly what AI workflows need too. Rather than managing data pipelines in Airflow and AI workflows in a separate system, Airflow lets you build both in one observable, reliable control plane. You get scheduling, retries, lineage, versioning, and human-in-the-loop capabilities for your LLM tasks the same way you already have them for your SQL transformations.

**Building a Context-Aware Agentic Coding Platform for Airflow at Scale** (Yarden Wolf)

Generic AI coding assistants like Cursor and Claude Code are powerful, but they struggle with proprietary infrastructure. At Wix, managing 7,500 active DAGs across 120 Data Engineers, we found that standard AI tools lacked the context to be truly effective: they didn't know our custom operators, DWH modeling patterns, or strict governance rules. In this session, we introduce our internal "Agentic IDE Configuration Manager" that bridges this gap. We will demonstrate how we leverage MCPs to inject deep Airflow context into our AI agents. You will learn how we enabled our coding agents to:

- Generate compliant code, using custom Cursor rules to ensure every DAG meets production standards and naming conventions.
- Interact with Airflow, using our custom MCPs to run DAGs locally, parse error logs, and autonomously fix pipeline failures.
- Understand data, accessing our Data Catalog and Trino engine to validate schema logic in real time.

Whether you are trying to optimize your team's workflows or are simply curious how far coding agents can go, join us for this talk.

**Cloud Composer Workshop - Managing DAGs at Scale** (Danny De Leo)

In this workshop you will learn how to effectively set up CI/CD for a Cloud Composer environment and build observability of your DAGs across many Cloud Composer environments.

**DAGs Move Robots: Closed-Loop Orchestration for Silicon Validation Labs with Airflow** (Dheeraj Turaga, Deva Madhavan & Shubham Raj)

What if your Airflow DAG could orchestrate robots, thermal chambers, and silicon tests, not just code? Silicon validation labs rely on scarce, stateful physical resources: robotic handlers, DUT boards, thermal/power systems, instruments, and shared hardware queues. Teams often coordinate these via spreadsheets and ad hoc reservations, causing contention, idle gaps, conflicts, poor observability, and slow triage. This talk presents a closed-loop orchestration model where Apache Airflow is the control plane for a software-defined validation lab. A central DAG coordinates robotic handling, thermal/power setup, stress and performance runs, and parametric characterization on hosts connected to silicon. It continuously ingests hardware health, measurements, and test outcomes, then feeds results into AI-assisted analysis to choose the next physical action: refine parameters, schedule follow-up experiments, or trigger mitigation.

**Migrating Airflow 2 to 3 for Infrastructure Operations at Scale** (Ethan (Tianyang) Lin & Rumeysa Ozaydin)

This talk covers migrating a production Airflow platform that orchestrates a large VM fleet: provisioning, OS patching, and decommissioning at high concurrency. This is not a data pipeline; it is infrastructure operations at fleet scale. We'll share workflow patterns that make fleet-scale orchestration possible in Airflow, then cover how we moved from an Airflow 2 monolith (all components on every node with fixed worker counts) to Airflow 3 with independently scalable services, each with its own release cycle. We'll dig into a silent breaking change in Airflow 3's XCom behavior: `xcom_pull(key=…)` without `task_ids` no longer searches upstream tasks, returning None with no warning. We'll present three iterations of solving this, from O(n) DAG traversal to a custom XCom backend that restores Airflow 2 semantics with zero DAG code changes, along with the design tradeoffs at each stage. Attendees will learn how Airflow powers infrastructure operations beyond data pipelines, how Airflow 3's XCom silently breaks Airflow 2 workflows, three approaches to the same migration problem, and lessons from running both versions in parallel.

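The XCom pitfall this talk covers can be shown with a simplified model. This is not Airflow code: it just mimics the two resolution behaviors the abstract describes (Airflow 2 searching upstream tasks, Airflow 3 returning None when `task_ids` is omitted), using an invented in-memory XCom store.

```python
# Simplified model of the behavior change described in the abstract.
# XCOMS maps (task_id, key) -> value for one imaginary dag run.

XCOMS = {
    ("extract", "fleet_batch"): ["vm-1", "vm-2"],
    ("patch", "status"): "ok",
}

def xcom_pull_v2(upstream_task_ids, key):
    """Airflow-2-style semantics: no task_ids given -> search upstream tasks."""
    for task_id in upstream_task_ids:
        if (task_id, key) in XCOMS:
            return XCOMS[(task_id, key)]
    return None

def xcom_pull_v3(key, task_ids=None):
    """Airflow-3-style semantics: omitting task_ids no longer searches
    upstream, so the same call silently resolves to None."""
    if task_ids is None:
        return None                      # the silent breaking change
    for task_id in task_ids:
        if (task_id, key) in XCOMS:
            return XCOMS[(task_id, key)]
    return None

# Same call site, different results across versions:
v2 = xcom_pull_v2(["extract", "patch"], "fleet_batch")   # finds the value
v3 = xcom_pull_v3("fleet_batch")                         # None, no warning
fixed = xcom_pull_v3("fleet_batch", task_ids=["extract"])  # explicit fix
```

Passing `task_ids` explicitly is the per-call fix; the talk's custom XCom backend restores the old lookup globally so no DAG code changes are needed.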
**Multi-Team Airflow: A Customer-Driven Journey** (Niko Oliveira & Vincent Beck)

As Airflow deployments scale and the number of Dag authors increases, the question arises: how do we support many teams with different needs and requirements on a shared platform? Over the years we've observed many organizations building their own multi-tenant layers on top of Apache Airflow to solve this problem, and we're now adding native support for this type of deployment. This talk explores building multi-team support in Airflow, working backwards from the real deployment challenges and community pain points we've observed.

**One Codebase, Many Distributions: Airflow's Modular Approach** (Jarek Potiuk & Amogh Desai)

Airflow's evolution toward a client-server architecture faced a fundamental challenge: splitting a monolithic codebase into independent distributions (airflow-core, task-sdk, providers) without triggering dependency hell. Traditional PyPI packaging and code duplication both fail at Airflow's scale. Airflow 3.2 solves this through modular isolation and shared libraries using in-repository symlinks. This approach ensures each distribution ships with the exact version of shared code it requires, eliminating runtime version conflicts and allowing for independent dependency management. We have already migrated 10+ critical components, including the config parser, observability, and secrets masking, into this shared model.

**Orchestrating 100 ML Models using Airflow** (Ryan Stevens)

Productionizing ML workflows is complicated; scaling them is harder. At Ramp, we grew from zero to nearly 100 production ML models powering systems like credit risk assessment and sales lead valuation. This talk covers how Airflow became the backbone of our ML platform, orchestrating ETL jobs, data quality checks, and model runs. We'll discuss how we evolved it to meet the increasing complexity of our ML systems. Each of our ML systems consists of feature creation and large-batch inference. We started with a few dbt models and one cloud-hosted notebook, which evolved into thousands of upstream tables and hundreds of AWS Batch inference jobs.

**Scaling Airflow for Capacity Forecasting at Amazon Prime Video** (Shivam Rastogi)

Amazon Prime Video uses Airflow to forecast traffic for hundreds of microservices to deliver the best customer experience for some of the world's biggest live events across multiple global regions. The forecasting methodology involves complex job dependencies between customer interaction metrics and geographies, translating to ~50 production DAGs with cross-DAG dependencies that process terabytes of customer activity data daily across tens of thousands of compute cores. In this talk, we'll cover how we manage dependency complexity at scale, coordinate data flows across geographical boundaries, and keep forecasts reliable as the system grows.

**Self-Service DAGs: Event-Driven Design for GitHub Actions and Airflow at Lyft** (Ken Obata)

At Lyft, driver pay configs on GitHub must be validated through Airflow DAGs before merging. However, the Scientists and Analysts who change these configs are not familiar with Airflow. How do we make such validation self-service while meeting SOX compliance? This talk presents a design pattern for bidirectional GitHub-Airflow integration: GitHub Actions trigger DAGs, and DAGs push results back as PR status checks via the GitHub Commit Status API. We compare the event-driven push style with traditional polling, and explain why the push style works well with Dynamic Task Mapping. This pattern aligns with Airflow 3's event-driven scheduling vision. We also discuss how SOX requirements shaped the design.

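The push-back half of this pattern is a single call to GitHub's Commit Status API (`POST /repos/{owner}/{repo}/statuses/{sha}`). Below is a minimal sketch of how a final Airflow task might assemble that request; the endpoint and payload fields follow the documented API, while the repo, context name, and URLs are made-up placeholders.

```python
# Build the request for GitHub's Commit Status API. The state, target_url,
# description, and context fields are the API's documented payload; the
# owner/repo/context values here are illustrative placeholders.

def build_commit_status(owner: str, repo: str, sha: str,
                        passed: bool, run_url: str) -> tuple[str, dict]:
    """Return the (endpoint, payload) for reporting a validation verdict."""
    url = f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}"
    payload = {
        "state": "success" if passed else "failure",
        "target_url": run_url,  # deep link back to the Airflow run
        "description": "Config validation " + ("passed" if passed else "failed"),
        "context": "airflow/config-validation",  # the PR check's display name
    }
    return url, payload

url, payload = build_commit_status(
    "lyft-example", "pay-configs", "abc123",
    passed=True, run_url="https://airflow.example.com/dags/validate/runs/1",
)
# A real task would POST this with an authenticated HTTP client.
```

Because the DAG pushes the verdict, GitHub needs no polling: the PR check flips as soon as the run finishes, which is what makes the pattern pair well with Dynamic Task Mapping over many changed config files.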
**Streamlining Data Pipeline Creation at Stripe with Airflow** (Jiayu Yi)

At Stripe, we process petabytes of data daily across thousands of pipelines powering financial reporting, fraud detection, and merchant analytics. As our data estate grew, so did the complexity of authoring, scheduling, and operating these pipelines. Engineers spent more time wrangling Airflow DAG boilerplate and managing dependencies than writing transformation logic. To address this, we built a declarative platform that generates Airflow DAGs from YAML and SQL definitions. Authors specify what they want — source tables, SQL transformations, incremental mode, output schema — and the platform handles the rest: generating Airflow tasks, wiring upstream sensors, registering Iceberg tables, and configuring scheduling parameters. A key piece is an in-house dataset-to-task mapping service that resolves upstream dataset dependencies to their producing Airflow tasks. When an author declares an input dataset, the platform automatically looks up which task produces it and generates the appropriate sensor — no manual DAG cross-referencing required. This eliminates an entire class of misconfigured dependency bugs common in hand-wired Airflow deployments.

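A compiler of this shape can be sketched in miniature. Everything below is invented for illustration (Stripe's platform is not public): a dict stands in for the parsed YAML spec, a lookup table stands in for the dataset-to-task mapping service, and the output is a plain plan rather than real Airflow objects.

```python
# Conceptual sketch: compile a declarative pipeline spec into task/sensor
# definitions. Spec shape, dataset names, and the mapping table are invented.

DATASET_PRODUCERS = {  # stand-in for the dataset-to-task mapping service
    "raw.payments": ("ingest_dag", "load_payments"),
    "raw.merchants": ("ingest_dag", "load_merchants"),
}

def compile_pipeline(spec: dict) -> dict:
    """Turn a declarative spec (as parsed from YAML) into a DAG plan."""
    sensors = []
    for dataset in spec["inputs"]:
        dag_id, task_id = DATASET_PRODUCERS[dataset]  # resolve the producer
        sensors.append({
            "type": "external_task_sensor",
            "external_dag_id": dag_id,
            "external_task_id": task_id,
        })
    return {
        "dag_id": spec["name"],
        "schedule": spec.get("schedule", "@daily"),
        "sensors": sensors,   # one auto-wired sensor per declared input
        "tasks": [{"type": "sql",
                   "sql": spec["transform"],
                   "output_table": spec["output"]}],
    }

plan = compile_pipeline({
    "name": "merchant_daily_rollup",
    "inputs": ["raw.payments", "raw.merchants"],
    "transform": "SELECT m.id, SUM(p.amount) FROM raw.payments p "
                 "JOIN raw.merchants m ON p.merchant_id = m.id GROUP BY m.id",
    "output": "analytics.merchant_daily",
})
```

The point the abstract makes lives in the `DATASET_PRODUCERS` lookup: because sensors are derived from declared inputs rather than hand-wired, a renamed producer task breaks loudly at compile time instead of silently at run time.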
**The Messy Middle: When Your Data Team Doesn't Need a Streaming Engine** (Constance Martineau)

There's a class of workload that doesn't belong in your streaming stack. A team needs to react to data arriving in S3 or a message landing in Kafka. The SLA is minutes. Someone reaches for Flink because the orchestrator can't trigger on events. Six months later, you're running a streaming app for what is a bounded computation with a latency requirement. This talk names that pattern, the "messy middle," and argues that Airflow 3 eliminates the gap that pushed these workloads to streaming. Asset Watchers monitor external sources through async triggers, firing DAGs within minutes of event arrival. Assets turn data products into scheduling primitives. Partitions let Airflow reason about which slices of a dataset are ready.

**Toward a Polyglot Airflow** (Tzu-ping Chung)

Building on Airflow 3's new worker structure and the foundation laid by the Go SDK, we take a look at how Airflow can support a fully cross-language Dag-authoring experience. We will discuss how a new language SDK is built, how a task talks to Airflow, and how multiple languages may be mixed inside a Dag. To support additional languages without duplicating logic, a new middle layer is required between Airflow and the task. Additional topics, such as security, distributed workloads, and user interface considerations, will also be touched on.

**When Airflow Meets YuniKorn: Enhancing Airflow on Kubernetes with YuniKorn for Higher Efficiency** (Xiaodong Deng)

Apache Airflow's Kubernetes integration enables flexible workload execution on Kubernetes but lacks advanced resource management features such as application queueing, tenant isolation, and gang scheduling. These features are increasingly critical for data engineering as well as AI/ML use cases, particularly GPU utilization optimization. For example, gang scheduling ensures all required resources for a job are allocated atomically, preventing partial allocations that waste resources. Apache YuniKorn, a Kubernetes-native scheduler, addresses these gaps by offering a high-performance alternative to the default Kubernetes scheduler. In this talk, we'll demonstrate how to conveniently leverage YuniKorn's power in Airflow, along with practical use cases and examples.

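The gang-scheduling idea in this abstract reduces to an all-or-nothing admission test: either every pod of a job fits, or none are placed. The sketch below illustrates only that decision rule; the numbers and logic are invented and do not reflect YuniKorn's actual scheduler internals.

```python
# Minimal illustration of gang scheduling's admission rule: a job's pods are
# allocated atomically or not at all. Not YuniKorn's implementation.

def admit_gang(free_gpus: int, pod_gpu_requests: list[int]) -> tuple[bool, int]:
    """Admit the whole gang or nothing; return (admitted, gpus_remaining)."""
    needed = sum(pod_gpu_requests)
    if needed <= free_gpus:
        return True, free_gpus - needed   # every pod placed in one decision
    return False, free_gpus               # refuse outright: no partial
                                          # allocation stranding GPUs

# A 4-pod training job needing 8 GPUs total:
ok, left = admit_gang(8, [2, 2, 2, 2])         # fits: admitted
blocked, unchanged = admit_gang(6, [2, 2, 2, 2])  # does not fit: refused
```

Without this rule, a default scheduler might place two of the four pods and leave them holding GPUs while the job waits forever for the other two, which is exactly the waste the abstract describes.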
**Your first Apache Airflow Contribution** (Amogh Rajesh Desai, Kalyan Reddy & Phani Kumar)

Ready to contribute to Apache Airflow? In this hands-on workshop, we'll help you jump straight into the project with real, beginner-friendly issues matched to your skills and interests. To make the most of our time together, come with a development environment set up in advance — installing Breeze is highly recommended, but GitHub Codespaces is a great alternative if Docker isn't an option for you. We'll walk through the full contribution journey step by step: exploring the codebase, picking an issue, opening your first pull request, and engaging with the community for feedback and reviews. Whether you're interested in writing code, improving documentation, writing tests, or sharing ideas, there's a welcoming place for you in the Airflow community.
