These are the confirmed sessions for Airflow Summit 2026.

Advanced Deadline Alerts: Writing your own custom References and Callbacks

by Dennis Ferruzzi

Airflow 3’s Deadline Alerts let you set “need-by” times on DAGs and fire callbacks when deadlines are missed. The built-in references cover common cases, but the real power is the feature’s extensibility. In this workshop, led by the feature’s author, we will go beyond the basics and explore these more advanced features.

We start with an overview of how DeadlineAlert, DeadlineReference, and Callback fit together, and how the scheduler detects misses. Then, a guided project: coding our own Callback implementation and building custom DeadlineReference classes using the @deadline_reference decorator, implementing _evaluate_with(), serialization, and required_kwargs. We wrap up with a hackathon-style “competition” to build the most creative WORKING DeadlineReference (business hours, the last time it didn’t rain in Vancouver, the moon phases, the last time the Leafs won the cup… anything goes, as long as it serializes and returns a valid datetime).
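
For a flavor of the workshop material, here is a minimal sketch of a custom reference built from the pieces named above (the import path and exact signatures are assumptions for illustration, not the workshop’s actual code):

```python
from datetime import datetime, timedelta, timezone

# Assumed import path; the real location may differ.
from airflow.sdk.definitions.deadline import deadline_reference


@deadline_reference
class NextBusinessMorning:
    """Resolve deadlines to 09:00 UTC on the next day, plus a grace interval."""

    required_kwargs = {"grace"}  # callers must supply `grace` when using this reference

    def _evaluate_with(self, *, grace: timedelta, **kwargs) -> datetime:
        now = datetime.now(timezone.utc)
        morning = now.replace(hour=9, minute=0, second=0, microsecond=0)
        if morning <= now:
            morning += timedelta(days=1)
        return morning + grace  # must serialize and return a valid datetime
```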

Airflow 3.0 Asset Watchers: Cross-Domain Data Mesh Orchestration with AI-Assisted Deployment

by Corrine Tan & Haofei Feng

Data Mesh decentralises data ownership across business domains. In regulated industries, each domain operates in its own account where producers publish data products and consumers subscribe. This enforces governance, limits blast radius, and preserves autonomy. When each domain runs its own Airflow, orchestrating across these boundaries is the central challenge. Airflow 2.4 introduced data-aware scheduling, but it was designed for a single Airflow instance, with no native cross-instance event propagation. In practice this meant building polling sensors that queried the producer’s REST API to check upstream completion, which proved unreliable: events were lost and ordering was not guaranteed. Airflow 3.0 resolves this with event-driven scheduling via AssetWatcher: the Triggerer monitors a message queue and triggers the consumer DAG when the producer publishes a completion event. This talk traces that journey through a regulated enterprise Data Mesh. We also share how we built an agentic AI skills framework that encodes operational Airflow knowledge into reusable skills, enabling an AI agent to autonomously deploy, validate, and troubleshoot the cross-environment pattern end-to-end.
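
A minimal sketch of the Airflow 3 consumer side of this pattern (the queue URI and names are placeholders; exact trigger arguments depend on your broker and provider version):

```python
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import DAG, Asset, AssetWatcher

# The producer domain publishes a completion event to this queue (placeholder URI).
trigger = MessageQueueTrigger(
    queue="https://sqs.us-east-1.amazonaws.com/123456789012/orders-events"
)

orders = Asset(
    "orders_data_product",
    watchers=[AssetWatcher(name="orders_watcher", trigger=trigger)],
)

with DAG(dag_id="consumer_domain_pipeline", schedule=[orders]):
    ...  # consumer tasks run when the producer's completion event arrives
```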

Airflow as a Harness: The Workflow That Merged Itself

by Ryan Hatter

This talk is the story of getting a PR merged into Apache Airflow without writing a single line of code, using Apache Airflow itself as an agentic orchestration harness to replicate the functionality of Claude Code for any pluggable LLM.

We’ll walk through how Airflow’s AIP-99 Dag functionality maps naturally onto the tool-use loops, context management, and decision branching that power modern agentic coding workflows. The result is a model-agnostic harness that can read a codebase, reason about changes, write and test code, and deploy a commit to a git repository, all orchestrated as an Airflow Dag.
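
As a toy illustration of decision branching inside a Dag (all names here are ours, not the speaker’s harness):

```python
from airflow.sdk import dag, task


@dag
def agent_harness():
    @task
    def plan() -> str:
        return "write_code"  # stand-in for an LLM choosing the next tool

    @task.branch
    def route(action: str) -> str:
        return action  # return the task_id of the branch to follow

    @task
    def write_code():
        print("editing files")

    @task
    def run_tests():
        print("running pytest")

    choice = route(plan())
    choice >> [write_code(), run_tests()]


agent_harness()
```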

Airflow at the Heart of Equifax’s Data Processing

by Yuvaraj Sankaran

At Equifax, Apache Airflow is used across many departments, helping Data Engineers, Data Scientists, and Business Analysts in their daily work.

This presentation is about how to use modern orchestration technology at the heart of data processing and business processes to support daily company operations.

Airflow Autopilot: The Generate-Verify-Refine Loop That Makes Pipeline Authoring Truly AI-Native

by Yifan Wang

Today’s pipeline authoring is synchronous: writing code, chasing errors, every step blocking the engineer until it is resolved. You can’t step away or parallelize. Airflow Autopilot reimagines this to be AI-native and asynchronous. Describe your pipeline’s intent, and the agent takes over, orchestrating two classes of purpose-built tools: tools that generate the DAG code and automate setup, and scorer tools that evaluate it across dimensions such as data discovery, auth, compliance, DAG validation, even end-to-end execution. Every scorer returns a deterministic result and structured, prioritized hints. The agent runs the generate → verify → refine loop — calling scorers, reading hints, fixing code, re-scoring — until every dimension passes. You come back to a PR with DAGs that have been iteratively built, tested, and made ready for review. For 10,000+ Airflow users, this shifts the engineer from executor to reviewer: you own the intent and final judgment, the agent owns the execution. Attendees leave with the architecture for an AI-native authoring experience, the principles behind decomposing work into scorer-sized verification units, and what it takes to scale this in production.
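
Stripped of the tooling, the loop itself is simple; a self-contained sketch with placeholder generator and scorer functions (none of these names are Autopilot’s actual API):

```python
def generate_dag(intent: str) -> str:
    return f"# DAG generated for: {intent}"  # stand-in for the generator tool


def refine_dag(code: str, hints: list) -> str:
    return code + f"\n# applied fixes: {hints}"  # stand-in for the refiner


def validate_dag(code: str):
    return "DAG" in code, "add a DAG definition"  # scorer: deterministic result + hint


def autopilot(intent: str, scorers, max_rounds: int = 10) -> str:
    code = generate_dag(intent)
    for _ in range(max_rounds):
        hints = []
        for scorer in scorers:
            passed, hint = scorer(code)
            if not passed:
                hints.append(hint)
        if not hints:
            return code  # every dimension passes; ready for a PR
        code = refine_dag(code, hints)
    raise RuntimeError("did not converge; needs human review")


print(autopilot("ingest orders from S3 to Snowflake daily", [validate_dag]))
```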

Airflow in a Box: Methodology or Madness?

by Nicholas Redd

Airflow testing today is a patchwork: you can validate code and catch obvious breakage early, but many production failures live in the seams—runtime state, persistence, serialization boundaries, API behavior, and the way a real deployment executes work across components. The fast tools are valuable, yet they don’t fully model Airflow as a system. Meanwhile, the default development posture nudges you toward single-process behavior and away from realistic concurrency and state interactions. The result is a familiar trade: quick feedback vs. meaningful confidence. “Airflow in a Box” is a step toward collapsing that trade—making deeper, more production-relevant tests accessible without requiring a full, heavyweight instance for every iteration. In this talk, we’ll discuss methodology, quantify slickness, and share real code!

Airflow orchestration in Data Mesh Architecture

by Ramesh Babu

Airflow has become the default orchestration tool in our day-to-day data engineering work. Here I want to focus on Data Mesh architecture and how we can plug Airflow orchestration into our major workflows. Data Mesh has likewise become the default architecture for building data platforms in most large organisations, and orchestration is a key enabler across data products, ingestion, and transformation.

How they work together:

1. Airflow for Data Mesh pipelines: Airflow can be used to orchestrate data pipelines within a data mesh architecture, ensuring the smooth flow of data between different domains.

2. OpenMetadata and Data Mesh: OpenMetadata can be used to provide visibility into the metadata of data products within a data mesh, helping to understand data structure, relationships, and context.

Benefits of using Airflow with Data Mesh:

1. Increased agility: Airflow allows for flexible data pipelines, enabling quick adaptation to changing business needs within a data mesh.

2. Enhanced data governance: Data Mesh promotes federated data governance which, when combined with Airflow, helps ensure data quality and compliance across the organization.

Airflow to the rescue: managing chemical emergencies

by Eloi Codina Torras

At Meteosim, Airflow is the engine for our entire decision system. It runs daily weather and air quality forecasts on schedule, but it also powers OnaChem React, software that lets users manage chemical emergencies in real time, and helps us manage consultancy projects.

This talk covers how we set up Airflow 3 to handle five very different types of workloads:

1. Daily Forecasts: Running physics simulations for weather and air quality.

2. Sensor Validation: Ingesting data from thousands of sensors and validating it.

3. Human-in-the-Loop: Managing long-running consultancy projects where Dags pause and wait for expert approval.

4. Emergency Response: Helping users manage chemical emergencies using multiple real-time toxic dispersion simulations with pre-defined workflows through our SaaS platform.

5. Training AI models: Tracking multiple experiments.

We will explain why Airflow 3 was necessary to make this work. You will see how we orchestrate physics, AI, and human decisions in a single environment.

An SQL Query is Just a DAG: Building an SQL Engine on Apache Airflow

by Hussein Awala

Ever wondered what happens between typing SELECT ... GROUP BY and getting results back? Inside every SQL engine lives a scheduler that breaks your query into a DAG of tasks — shuffling, sorting, aggregating, and parallelizing work across partitions. Sound familiar?

In this talk, I’ll demystify SQL engine internals by building one on top of Apache Airflow. We’ll take a SQL query, parse it, optimize it, and transform it into a DAG of Airflow tasks that you can watch execute step by step in the Airflow UI.
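
To make the analogy concrete, here is a toy GROUP BY expressed with Airflow’s dynamic task mapping, pre-aggregating per partition and then merging, much like a real engine’s shuffle stage (a sketch of the idea, not the talk’s engine):

```python
from collections import Counter

from airflow.sdk import dag, task


@dag
def select_sum_group_by():
    """Toy plan for `SELECT key, SUM(val) ... GROUP BY key` over two partitions."""

    @task
    def scan(partition: int) -> dict:
        # Stand-in for scanning one partition and pre-aggregating locally
        partitions = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]
        return partitions[partition]

    @task
    def merge(partials: list) -> dict:
        # The final aggregation stage, fed by all mapped scan tasks
        total = Counter()
        for partial in partials:
            total.update(partial)
        return dict(total)

    merge(scan.expand(partition=[0, 1]))


select_sum_group_by()
```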

Anatomy of a Task Instance: From Scheduled to Done

by Cedrik Neumann

In this session I will provide a deep dive into a task instance’s lifetime: from the moment the scheduler decides to schedule it until it is marked as success or failed.

We will explore when in the process concepts like concurrency, pools and priority weights apply, what it means for a task to be “queued” and where things like cluster policies, operator links, callbacks and event listeners are evaluated.
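
For orientation, most of these concepts surface as ordinary task arguments; for example (the values are placeholders):

```python
from airflow.sdk import task


# The pool caps how many of these tasks hold slots concurrently;
# priority_weight orders tasks competing for slots once they are queued.
@task(pool="etl_pool", priority_weight=10, weight_rule="downstream")
def heavy_step():
    ...
```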

Architecting the Center of Excellence: A Strategic Blueprint for Federated Airflow at Scale

by Akanksha Khushboo

As Airflow becomes mission-critical, centralized data teams often become a bottleneck. This session provides a framework for building a Center of Excellence (CoE) that empowers autonomous domain teams while maintaining global standards.

We detail the shift toward “Data Platform Engineering,” treating orchestration as a product. Using case studies from large-scale organizations, we discuss a three-layer model: Strategic (governance), Tactical (platform development), and Operational (business unit execution).

Attendees will learn to design a self-service platform with guardrails that manages multiple teams without interference. We will explore using Airflow 3.0’s architecture for task isolation and conclude with a guide on aligning cross-functional teams and measuring value through consumption-based billing.

Asset Partitions: Matching Workflow to the Right Data

by Wei Lee

Asset partitions are a key building block in Expanded Data Awareness. This session explains the core semantics of partition definitions, partition mappings, and backfill behavior in AIP-76. I will show how these pieces fit together in the current design, then discuss where asset partitions can go next, including improvements in authoring ergonomics, observability, and partition-aware workflow capabilities. Attendees will leave with a clear mental model of today’s implementation and a practical view of future direction.

Beyond Containers: Securely Orchestrating AI Agents with Strong Isolation in Airflow

by Uriel Munoz

AI agents break the traditional Airflow trust model. While standard tasks are deterministic, agents execute dynamic logic and invoke external tools, meaning untrusted code is suddenly running inside standard containers sharing your host kernel. This session demonstrates how to secure AI workloads in Airflow without rewriting the orchestrator or building custom executors. We will introduce a custom, policy-driven @agent TaskFlow abstraction that leverages Kubernetes executor_config overrides (like runtimeClassName) to isolate workloads on the fly.
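
A rough sketch of the underlying mechanism (the runtime class name is a placeholder for whatever sandboxed runtime, such as gVisor or Kata, your cluster registers; the `@agent` decorator itself is the speakers’ custom layer on top of patterns like this):

```python
from kubernetes.client import models as k8s

from airflow.sdk import task

sandboxed = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            runtime_class_name="gvisor",  # placeholder RuntimeClass
            containers=[k8s.V1Container(name="base")],
        )
    )
}


@task(executor_config=sandboxed)
def run_agent(prompt: str):
    ...  # dynamic, untrusted agent logic runs inside the isolated runtime
```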

Beyond Multi-Cluster Airflow: Operating GPU Workloads at Scale

by Aleksandr Shirokov, Tarasov Alexey & Vladislav Repev

At last year’s Airflow Summit, we shared how we built a multi-cluster orchestration layer on top of Apache Airflow to run ML workloads across multiple Kubernetes GPU clusters.

Once hundreds of ML engineers started running GPU pipelines in production, we discovered that orchestration alone is not enough. Operating multi-cluster GPU infrastructure introduces new challenges: controlling GPU allocation across teams, observing pipelines across clusters, and helping users run workloads efficiently without wasting expensive GPU resources.

Breaking the Monolith: Implementing Airflow 3.x Remote Execution for Multi-Team Environments

by Kowsy Narayan

Problem Statement: As our data platform scaled, our shared Airflow 2.9 deployment became a bottleneck with critical challenges: development friction from shared repositories, custom security workarounds, release coordination complexity, data isolation concerns, and cost attribution opacity. When Airflow 3.x launched with hybrid execution support, we restructured our architecture. Following a successful proof of value, we implemented remote execution - enabling teams to run workloads in isolated Kubernetes clusters while maintaining centralized orchestration. This session shares our journey, architectural decisions, and how we leveraged agentic AI to streamline migration and developer experience.

Build AI Pipelines with Apache Airflow 3

by Kenten Danas

Apache Airflow® has long been the control plane for data pipelines. As AI workflows move into production, teams are discovering the same challenges apply: LLM calls fail, embeddings need regenerating, and agent outputs need human review. The operational discipline that Airflow brings to data pipelines is exactly what AI workflows need too.

Rather than managing data pipelines in Airflow and AI workflows in a separate system, Airflow lets you build both in one observable, reliable control plane. You get scheduling, retries, lineage, versioning, and human-in-the-loop capabilities for your LLM tasks the same way you already have them for your SQL transformations.

Building a Context-Aware Agentic Coding Platform for Airflow at Scale

by Yarden Wolf

Generic AI coding assistants like Cursor and Claude Code are powerful, but they struggle with proprietary infrastructure. At Wix, managing 7,500 active DAGs across 120 Data Engineers, we found that standard AI tools lacked the context to be truly effective - they didn’t know our custom operators, DWH modeling patterns, or strict governance rules. In this session, we introduce our internal “Agentic IDE Configuration Manager” that bridges this gap. We will demonstrate how we leverage MCPs to inject deep Airflow context into our AI agents. You will learn how we enabled our coding agents to:

1. Generate compliant code, utilizing custom Cursor rules to ensure every DAG meets production standards and naming conventions.

2. Interact with Airflow, using our custom MCPs to run DAGs locally, parse error logs, and autonomously fix pipeline failures.

3. Understand data, accessing our Data Catalog and Trino engine to validate schema logic in real time.

Whether you are trying to optimize your team’s workflows or simply curious how far coding agents can go, join us in this exciting talk.

Building a low-cost, scalable Airflow Platform for Small Teams

by Aniruddha Sengupta

Apache Airflow is often perceived as a platform best suited for large organisations with significant infrastructure budgets and dedicated platform teams. In this talk, I want to share how we built and scaled a robust Airflow platform with tight cost constraints whilst still maintaining reliability, governance and developer productivity.

Starting from a small Airflow setup, we have evolved our architecture to support multiple teams and increasingly complex workflows. This includes standardising environments and making sure best practices are adopted around observability, resource management and version control.

Building storage analytics pipelines for cloud cost optimization with Airflow

by Bao Nguyen

Storage usage is a major driver of infrastructure cost for media collaboration platforms. Understanding how storage grows across accounts, assets, and workflows requires analytics pipelines that combine product data with infrastructure metrics.

In this talk, I’ll share how we built storage analytics pipelines that model storage usage across accounts and plan tiers to help leadership understand infrastructure cost drivers. Using warehouse data models orchestrated with Airflow, we developed pipelines that track storage usage over time, identify discrepancies in legacy storage calculations, and resolve edge cases.

Cloud Composer Workshop - Managing DAGs at Scale

by Danny De Leo

During this workshop you will learn how to effectively set up CI/CD for a Cloud Composer environment and build observability for your DAGs across many Cloud Composer environments.

Common Issues When Running dbt in Airflow (and How to Fix Them)

by Tatiana Al-Chueyr Martins

In many modern data platforms, orchestration tools are combined with transformation frameworks. A common pattern is orchestrating dbt (data build tool) transformations using Apache Airflow — something reported by roughly 44% of the community.

At first glance, the integration seems straightforward: simply run dbt run inside an Airflow task. Some teams go further and use libraries that convert dbt projects into native Airflow DAGs, such as Astronomer Cosmos.

In practice, however, teams quickly run into operational and architectural challenges. Slowness, out-of-memory errors, zombie tasks, and DAGs that take minutes to appear in the UI are just a few of the issues that can emerge as projects scale.
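
For context, the Cosmos route mentioned above looks roughly like this (paths and profile names are placeholders; see the Cosmos docs for your warehouse’s profile options):

```python
from cosmos import DbtDag, ProfileConfig, ProjectConfig

dbt_dag = DbtDag(
    dag_id="jaffle_shop",
    schedule="@daily",
    # Point Cosmos at the dbt project; it renders each model as an Airflow task.
    project_config=ProjectConfig("/usr/local/airflow/dbt/jaffle_shop"),
    profile_config=ProfileConfig(
        profile_name="jaffle_shop",
        target_name="prod",
        profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",
    ),
)
```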

Dag Versioning in Airflow: Version Proliferation and Open Questions

by Ephraim Anierobi

This session explores the next phase of Dag versioning in Airflow and the practical questions users face in real deployments. Dag versioning moved Airflow beyond a “latest only” model, but it also introduced confusion around why Dag versions keep increasing, what disabling Dag bundle versioning actually does, what creates a new version, and how users should think about clears, reruns, and backfills after a Dag changes. I will examine a common misconception: disabling bundle versioning does not stop Dag version changes. I will also connect Dag versioning to Dag delivery in Airflow 3, showing how Git-backed Dag bundles provide a more native alternative to git-sync in Helm-based deployments.

DAGs Move Robots: Closed‑Loop Orchestration for Silicon Validation Labs with Airflow

by Dheeraj Turaga, Deva Madhavan & Shubham Raj

What if your Airflow DAG could orchestrate robots, thermal chambers, and silicon tests, not just code?

Silicon validation labs rely on scarce, stateful physical resources: robotic handlers, DUT boards, thermal/power systems, instruments, and shared hardware queues. Teams often coordinate these via spreadsheets and ad hoc reservations, causing contention, idle gaps, conflicts, poor observability, and slow triage.

This talk presents a closed-loop orchestration model where Apache Airflow is the control plane for a software-defined validation lab. A central DAG coordinates robotic handling, thermal/power setup, stress and performance runs, and parametric characterization on hosts connected to silicon. It continuously ingests hardware health, measurements, and test outcomes, then feeds results into AI-assisted analysis to choose the next physical action: refine parameters, schedule follow-up experiments, or trigger mitigation.

Debugging the Undebuggable: Lessons from Real Airflow Incidents

by Pankaj Singh

Debugging Airflow failures in production can be harder than building the pipelines themselves. Engineers encounter issues such as disappearing DAGs, hanging tasks, missing logs, zombie tasks, or sudden performance degradation, often with little visibility into the root cause.

Over the past year, while supporting multiple Airflow deployments and integrations, we investigated several such incidents across different teams and environments. This session shares lessons from these real debugging cases and explains how the issues were diagnosed and resolved.

Declarative Pipelines Meet Declarative Orchestration: Spark Declarative Pipelines + Airflow 3

by Lisa Cao & Andreas Neumann

Apache Spark’s new Declarative Pipelines (SDP) let engineers define WHAT their data should look like, not HOW to build it. Apache Airflow 3 brings a declarative orchestration model. Together, they eliminate an entire category of boilerplate: the DAG that exists only to babysit a pipeline. This talk walks through building a production Spark SDP pipeline orchestrated by Airflow 3, showing how dependency graphs replace imperative task chains, how testing and recovery patterns change when your pipeline is declarative end-to-end, and what this means for the 80% of data engineering time currently spent on operational plumbing.

Designing Domain-Oriented dbt Projects and Making Them Work in Airflow

by Pankaj Koti

As analytics teams grow, monolithic dbt projects can become tightly coupled and difficult to scale. Cross-domain dependencies multiply, deployment cycles slow down, and ownership boundaries blur.

dbt Mesh proposes a domain-oriented approach with independently owned dbt projects, explicit cross-project contracts, and controlled exposure to dependencies. Applying Mesh principles is not just about splitting repositories; orchestration must also support these boundaries.

In this session, we explore how to design dbt projects according to Mesh principles and how Airflow orchestration can reinforce those architectural decisions. Using multi-project capabilities in Cosmos that leverage dbt Loom-style cross-project referencing, we demonstrate how Airflow can model domain separation while still enabling controlled cross-project dependencies.

Designing Self-Healing Airflow Platforms: Autonomous DAG Recovery at Scale

by Kumuda Sreenivasa & Sandeep Bommisetti

Most Airflow failures are still handled manually — retries, Slack alerts, and late-night debugging. This talk shows how to design Airflow as a self-healing platform that detects problems early, limits blast radius, and automatically recovers. We’ll cover practical patterns for DAG, schema, and dependency-drift detection; safe, selective backfills; predictive failure modeling using metadata; lineage-aware rollbacks; and canary deployment for DAGs. You’ll learn how to isolate unstable workloads before they impact others and how to turn Airflow into an intelligent control plane — not just a scheduler.

Developer Velocity at Scale: Production-Like Airflow Environments on Kubernetes

by Matthew Davis & Matt Koski

Teams running Airflow on Kubernetes know the trade‑off all too well: Kubernetes scales beautifully in production, but makes local development slow, brittle, and unrealistic. Engineers struggle to replicate production environments locally, forcing them into inefficient “test-in-production” cycles that slow delivery velocity, increase deployment risk, and frustrate data teams.

In this talk, we’ll walk through the architectural patterns and platform engineering approach we used to give engineers on‑demand, isolated, production‑like Airflow environments, without sacrificing the benefits of shared Kubernetes infrastructure.

Developing an AI-powered personal endurance sports coach

by Bas Harenslak

During this session, I’ll deep dive into the implementation of an AI-powered endurance sports coach using Apache Airflow as the backbone for data ingestion and processing. Beyond data pipelines, I’ll explain what’s required to build a conversational AI system, from structured data modeling to orchestration and retrieval. We’ll explore how metrics are precomputed, how vector search enables contextual memory, and which front-end patterns work best for interacting with AI agents. The result is a reproducible architecture where Airflow powers the data layer and an LLM provides the reasoning on top to help athletes perform at their best in numbers-driven endurance sports.

Enterprise-Grade Airflow Upgrade: Strategies & Deep Dive

by M Waqas Shahid

Upgrading to Apache Airflow in large, production-grade environments can be complex—especially in enterprise setups with hundreds of DAGs, custom plugins, and mission-critical pipelines. The challenge grows even more complex in decentralized setups, where platform teams are responsible for the system’s stability, but the DAG code lives across multiple teams you don’t directly control.

You will have the chance for a personalised review of your current organizational setup, to assess testing coverage, and to identify concrete ways to improve your upgrade process. This hands-on workshop will provide:

Event-Driven Orchestration Monitoring: Streaming Airflow Metadata to Kafka via CDC

by Vipin Kataria

How do you monitor Airflow across 50 teams in real-time? How do downstream systems react instantly to pipeline completions without polling APIs? How do you build custom dashboards without overloading Airflow’s database? This talk demonstrates how we use Change Data Capture to stream Airflow’s metadata to Kafka, making orchestration events consumable by any system in real-time. By capturing changes in Airflow’s Postgres database and publishing them to Kafka topics, we enable instant notifications, real-time dashboards, compliance audit trails, and cross-system orchestration without modifying Airflow code or impacting performance. You’ll learn how to set up Debezium CDC for Airflow’s metadata tables, design Kafka topics for task and DAG events, build real-time consumers for monitoring and alerting, handle schema evolution across Airflow upgrades, and implement cost attribution and SLA monitoring in real-time. Using production examples processing millions of events daily, I’ll share architecture decisions, performance optimizations, and lessons from running CDC at scale. You’ll leave with patterns for making Airflow observable to your entire organization.
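
The capture side of such a setup can be as small as one Kafka Connect registration; a hedged sketch (hostnames, credentials, and the connector name are placeholders):

```python
import requests

connector = {
    "name": "airflow-metadata-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "airflow-postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "********",
        "database.dbname": "airflow",
        "topic.prefix": "airflow",  # topics become airflow.public.dag_run, ...
        "table.include.list": "public.dag_run,public.task_instance",
    },
}

# Register the connector with the Kafka Connect REST API
requests.post("http://kafka-connect:8083/connectors", json=connector, timeout=30).raise_for_status()
```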

Fixing The Token Authentication: Revocation, Scoping, and Securing the Execution Boundary

by Anish Giri

When Airflow 3 introduced JWT-based task authentication, it also introduced new attack surfaces: tokens that can’t be revoked, tasks that lose authentication while waiting in queues, and forked processes that inherit signing keys and can forge tokens for other tasks.

In this talk, I’ll walk through three security challenges at the task execution boundary and the code contributed to fix them:

Token revocation (merged, PR #61339): Airflow 3.x had no way to invalidate issued JWTs, with implications for common compliance frameworks.

From Airflow 2 to Airflow 3: Migrating 100+ DAGs Without Downtime or Developer Burden

by Goncalo Costa

Migrating a production Airflow deployment from version 2 to 3 without disrupting hundreds of DAGs across multiple teams sounds scary (and it is). In this talk I will share how we migrated versions without a big-bang cutover, without weeks of cross-team change requests, and without leaving our pipelines in a broken state.

I’ll walk through how we built a compatibility layer to make sure our code runs on both versions during the migration, how we used AI tooling to orchestrate 400+ DAG changes, and how our on-demand ephemeral environments - full k8s deployments created for each pull request - helped us experiment and test all the required changes.

From Chaos to Control: Navigating Airflow Sprawl with Centralized Observability

by Jon Hiett

As data platforms mature, organizations often experience “Airflow Sprawl”—the rapid, organic growth of isolated Airflow instances across different teams and projects. While this empowers localized control, it creates dangerous silos that hinder visibility, increase operational risk, and erode developer productivity. In this session, we will explore the critical challenges of managing a fragmented Airflow ecosystem and discuss strategies for regaining control. We will examine why centralizing execution history and establishing unified observability is essential for reducing Mean Time to Recovery (MTTR), mitigating hidden security risks, and transforming fragmented instances into a cohesive, reliable data service. Attendees will leave with a strategic framework for managing Airflow at scale.

From Experiments to Production: How We Built a Lightweight ML Platform on Airflow

by Marion Azoulai

Many data teams can build machine learning models, but operationalizing them reliably remains a challenge.

At Astronomer, our data team recently moved from exploratory modeling to running multiple production ML models powering go-to-market analytics and workflows. Rather than introducing heavy MLOps infrastructure, we integrated the full ML lifecycle directly into our Airflow-based data platform.

In this talk, we’ll share how we use Airflow to orchestrate production ML end-to-end: from feature pipelines in Snowflake, to model training and artifact promotion, to batch scoring and prediction delivery.

From Hours to Minutes: Orchestrating Local LLMs for Sensitive Data Pipelines with Apache Airflow

by Chhayank Jain

Processing unstructured data in regulated industries (healthcare, finance, legal) is one of the hardest data engineering challenges: the data is messy, privacy constraints prevent sending it to external APIs, and scale makes manual processing impossible.

In this talk, I’ll walk through how to design and deploy an Apache Airflow–orchestrated LangChain pipeline powered by LLMs to digitize unstructured documents into a unified structured platform.

I’ll cover the full architecture: how Airflow DAGs coordinate multi-step LLM inference, validation, and ingestion stages; how LoRA/PEFT fine-tuning adapted open-source LLMs for domain-specific language without leaking sensitive data; and how failure handling, retries, and data quality checks were built natively into Airflow.
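
Much of that native failure handling maps onto standard task arguments; for instance (the backoff values are placeholders):

```python
from datetime import timedelta

from airflow.sdk import task


@task(retries=3, retry_delay=timedelta(minutes=2), retry_exponential_backoff=True)
def run_inference(document_path: str) -> dict:
    ...  # call the local LLM; transient failures retry with exponential backoff
```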

Get Certified: DAG Authoring for Apache Airflow 3

by Marc Lamberti

We’re excited to offer Airflow Summit 2026 attendees an exclusive opportunity to earn their DAG Authoring certification in person, now updated to include all the latest Airflow 3 features. This certification workshop comes at no additional cost to summit attendees.

The DAG Authoring for Apache Airflow certification validates your expertise in advanced Airflow concepts and demonstrates your ability to build production-grade data pipelines. It covers TaskFlow API, Dynamic task mapping, Templating, Asset-driven scheduling, Best practices for production DAGs, and new Airflow 3.0 features and optimizations.

Gleaming the Cube: Exploring the limits of Airflow through the Rubik's Cube

by Jonathan Leek

What does solving a Rubik’s Cube have to do with Apache Airflow? More than you’d think.

In this talk, I’ll walk through a project where Airflow orchestrates the process of solving a Rubik’s Cube — not as a gimmick, but as a framework for exploring cyclic workflows, state management, and iterative computation in a system designed for DAGs. Cube-solving algorithms naturally require feedback loops, evolving state, and conditional branching — all things that challenge Airflow’s acyclic model.
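
One common workaround for cycles in an acyclic model, and a flavor of what such a design can lean on, is a Dag that re-triggers itself with evolved state (a sketch, not necessarily the speaker’s approach):

```python
from airflow.providers.standard.operators.trigger_dagrun import TriggerDagRunOperator

# Re-run the solver Dag, carrying the cube state forward until solved.
next_iteration = TriggerDagRunOperator(
    task_id="next_iteration",
    trigger_dag_id="solve_cube",  # this Dag's own id (placeholder)
    conf={"cube_state": "{{ ti.xcom_pull(task_ids='apply_move') }}"},
)
```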

Healthcare Interoperability Meets Airflow Extensibility

by Wyatt Shapiro

In healthcare data, standards are often anything but standard. Every new partner arrives with its own requirements for data exchange spanning FHIR APIs, HL7 feeds, SFTP drops, and custom vendor extracts.

The result? Integration projects that stretch from weeks into months, custom pipelines that only one engineer understands, and implementation teams who are already counting down to your next missed deadline.

This session shows how Airflow can change your approach to managing data transfer for healthcare partners.

It Works! Now What? Fast Iteration for AI Capabilities in Airflow

by Alex Guglielmone

Building an AI capability in Airflow is the easy part. The hard part is what comes next.

You want to swap a model, refactor a prompt, cut token costs, or try a local model instead of paying for cloud. How do you know it still works as expected? Without a fast feedback loop, every change is a gamble.

This talk shows practical patterns for building that feedback loop, with real examples using agent skills, MCPs, and local and cloud models. It covers the challenges too: sandboxing, observability, non-determinism, and keeping checks simple enough that people actually use them.

Migrating Airflow 2 to 3 for Infrastructure Operations at Scale

by Ethan (Tianyang) Lin & Rumeysa Ozaydin

This talk covers migrating a production Airflow platform that orchestrates a large VM fleet — provisioning, OS patching, and decommissioning at high concurrency. This is not a data pipeline — it is infrastructure operations at fleet scale. We’ll share workflow patterns that make fleet-scale orchestration possible in Airflow, then cover how we moved from an Airflow 2 monolith — all components on every node with fixed worker counts — to Airflow 3 with independently scalable services, each with its own release cycle. We’ll dig into a silent breaking change in Airflow 3’s XCom behavior: xcom_pull(key=…) without task_ids no longer searches upstream tasks, returning None with no warning. We’ll present three iterations of solving this — from O(n) DAG traversal to a custom XCom backend that restores Airflow 2 semantics with zero DAG code changes — and the design tradeoffs at each stage. Attendees will learn how Airflow powers infrastructure operations beyond data pipelines, how Airflow 3’s XCom change silently breaks Airflow 2 workflows, three approaches to the same migration problem, and lessons from running both versions in parallel.
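
The breaking change described above boils down to one call pattern; a minimal illustration (task and key names are placeholders):

```python
from airflow.sdk import task


@task
def downstream(ti=None):
    # Airflow 2 searched upstream tasks for a matching key; in Airflow 3 this
    # returns None unless task_ids is passed explicitly (per this talk).
    implicit = ti.xcom_pull(key="fleet_batch")                        # None on Airflow 3
    explicit = ti.xcom_pull(task_ids="provision", key="fleet_batch")  # works on both
    return explicit
```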

Migrating Airflow at Scale - What the Docs Don't Tell You

by Olivier Daneau

If you are migrating from self-hosted Airflow to any of the managed platforms, most migration guides you’ll find online assume one environment, one team, one version. Large organizations are never that simple.

This talk comes from four years of assisting customers with real migrations across some of the biggest Airflow deployments out there, from self-hosted open source to managed cloud platforms like MWAA, GCC, and Astro, and between major version upgrades.

Multi-Team Airflow: A Customer-Driven Journey

by Niko Oliveira & Vincent Beck

As Airflow deployments scale and the number of Dag authors increases, the question arises: how do we support many teams with different needs and requirements on a shared platform? Over the years we’ve observed many organizations building their own multi-tenant layers on top of Apache Airflow to solve this problem, and we’re now adding native support for this type of deployment. This talk explores building multi-team support in Airflow, working backwards from those real deployment challenges and community pain points we’ve observed.

One Codebase, Many Distributions: Airflow’s Modular Approach

by Jarek Potiuk & Amogh Desai

Airflow’s evolution toward a client-server architecture faced a fundamental challenge: splitting a monolithic codebase into independent distributions (airflow-core, task-sdk, providers) without triggering dependency hell. Traditional PyPI packaging and code duplication both fail at Airflow’s scale.

Airflow 3.2 solves this through modular isolation and shared libraries using in-repository symlinks. This approach ensures each distribution ships with the exact version of shared code it requires, eliminating runtime version conflicts and allowing for independent dependency management. We have already migrated 10+ critical components—including the config parser, observability, and secrets masking—into this shared model.

Optimising Airflow in Real-World Deployments: Profiling, Performance Drift, and Confident Upgrades

by Pankaj Koti & Vara Prasad Regani

Performance issues in Apache Airflow rarely appear as clear failures. Instead, they surface as subtle signals: longer task queue times, slower DAG parsing, scheduler lag, or workers hitting limits as workloads grow.

In this talk, we share lessons from profiling real production deployments across Airflow 2.x and 3.x. Combining frontline operational insights with focused technical investigation, we analysed task latency, DAG parsing time, worker behaviour, and metadata database performance under sustained load.

Orchestrating 100 ML Models using Airflow

by Ryan Stevens

Productionizing ML workflows is complicated; scaling them is harder. At Ramp, we grew from zero to nearly 100 production ML models powering systems like credit risk assessment and sales lead valuation.

This talk covers how Airflow became the backbone of our ML platform, orchestrating ETL jobs, data quality checks, and model runs. We’ll discuss how we evolved it to meet the increasing complexity of our ML systems.

Every ML system consists of feature creation and large-batch inference. We started with a few dbt models and one cloud-hosted notebook, which evolved into thousands of upstream tables and hundreds of AWS Batch inference jobs.

Orchestrating AI-Enabled Prescription Workflows with Apache Airflow: Improving Accuracy, Efficiency,

by Chandra Kiran Yelagam

Modern pharmacy enterprise systems must process high volumes of complex prescriptions while maintaining strict safety, compliance, and operational efficiency. However, traditional rule-based platforms frequently generate low-specificity alerts that contribute to alert fatigue, workflow bottlenecks, and increased manual intervention. As clinical guidelines, payer requirements, and treatment protocols evolve, static rule engines struggle to keep pace with the dynamic nature of modern pharmacy operations.

This session presents a practical architecture for AI-enabled prescription workflow automation orchestrated through Apache Airflow, enabling scalable, transparent, and auditable clinical workflows. By combining rule-based safety checks with machine learning models for classification, anomaly detection, and intelligent workflow routing, the system significantly improves routing precision, reduces false positives, and accelerates prescription verification.

Orchestrating Graph Database Workloads in Apache Airflow with Apache TinkerPop

by Ahmad Farhan

Graph databases are increasingly used for relationship-heavy data such as fraud detection, knowledge graphs and CRM systems, yet integrating them into orchestration workflows has remained difficult. This session introduces the Apache TinkerPop Provider for Airflow, enabling graph databases to be orchestrated as first-class citizens. I will demonstrate how it works with both self-hosted and managed services such as AWS Neptune and Azure Cosmos DB.

Orchestrating Streaming Data Pipelines with Airflow, Kafka, Spark, and Kubernetes on GCP

by Karan Alang

Modern data platforms rely on real-time pipelines to process and analyze large volumes of streaming events. Apache Airflow is widely used for batch orchestration, but it can also coordinate complex streaming architectures. In this session, we explore how Airflow orchestrates scalable pipelines built with Apache Kafka and Apache Spark running on Kubernetes in cloud environments.

We walk through an architecture where Kafka handles high-throughput event ingestion, Spark processes streaming data for analytics and transformation, and Kubernetes provides scalable infrastructure for distributed workloads. Airflow acts as the orchestration layer, coordinating job scheduling, pipeline dependencies, and operational visibility.

Performance Debugging in Airflow: From Symptoms to Solutions

by Rahul Vats

Airflow running slow? Memory is spiking. Tasks are queuing forever. Now what? Debugging performance issues in a distributed system like Airflow can feel overwhelming—is it the scheduler, the database, the DAG Processor, or your DAG code? This talk shares practical techniques for isolating and fixing performance problems, using real examples from the Airflow codebase.

We’ll cover:

  1. Understanding Airflow’s moving parts – Where bottlenecks typically hide (scheduler loop, DAG parsing, database queries).

Pushing to Prod on a Friday

by Ashley Gough

If the idea of pushing to production on a Friday still makes your stomach drop, you’re in good company because most data professionals know that particular flavor of dread. But that fear says more about systemic fragility than the day of the week. This talk explores how unclear ownership, hidden dependencies, and late validation create production risk in data platforms. I’ll show how data contracts clarify expectations between producers and consumers, how Behavior‑Driven Development (BDD) provides a shared language for system behavior, and how Airflow can enforce guardrails that shift validation earlier and reduce blast radius. This session focuses on the organizational and architectural decisions that shape platform reliability. Because Airflow often becomes the visible surface of upstream uncertainty, its teams feel the impact of broader design and governance choices. Attendees will learn to interpret “Friday fear” as a strategic signal, how contracts and BDD strengthen alignment and predictability, and how Airflow can act as a platform‑level safety system that builds trust and supports confident deployments - Fridays included.

Remote Control Isolation: airflowctl becomes the new default

by Bugra Ozturk

Meet airflowctl, the new default for API-driven remote operations. You will see how separating control from execution enhances security, enables isolation, and simplifies automation across different environments. I will discuss the development of airflowctl, demonstrate practical examples of secure remote execution, and provide a guide for transitioning from legacy workflows. You will learn how to easily migrate towards airflowctl and leverage the flexibility of an API-driven approach.

Scaling Airflow for Capacity Forecasting at Amazon Prime Video

by Shivam Rastogi

Amazon Prime Video uses Airflow to forecast traffic for hundreds of micro-services to deliver the best customer experience for some of the world’s biggest live events across multiple global regions. The forecasting methodology involves complex job dependencies between customer interaction metrics and geographies - translating to ~50 production DAGs with cross-DAG dependencies that process terabytes of customer activity data daily across tens of thousands of compute cores. In this talk, we’ll cover how we manage dependency complexity at scale, coordinate data flows across geographical boundaries, and keep forecasts reliable as the system grows.

Scaling to 1,000 DAGs: Idelic's Blueprint for Airflow Automation and Reliability

by Matthew Stavinga, Ray Carroll & Matt McCormack

This session details Idelic’s critical journey to a robust, scaled Astronomer Airflow environment. We’ll share technical lessons from overcoming initial orchestration challenges and successfully scaling to over 1,000 active DAGs. The session will showcase our advanced, Jenkins-integrated testing deployment for managing this scale, and the development of a standardized framework that simplifies DAG creation, eliminates code repetition, and enables configuration changes without a full deployment. This is essential for any team managing complex data pipelines, offering a blueprint for standardized Airflow development, maximum data reliability, and future growth at a large scale.

Securing Apache Airflow with Keycloak: A Deep Dive into the Keycloak Auth Manager

by Vincent Beck

As organizations scale their data platforms, managing access to Apache Airflow becomes increasingly complex. In this talk, we introduce the Keycloak Auth Manager — a pluggable authentication and authorization backend for Airflow that delegates identity management to Keycloak, a battle-tested open-source Identity and Access Management solution.

We’ll start with the big picture: what problem does the Keycloak Auth Manager solve, and why Keycloak? We’ll walk through the architecture — how Airflow’s auth manager interface works, how the Keycloak integration hooks into it, and how authentication flows (OIDC/OAuth2) and authorization (role mapping, resource-based permissions) are handled under the hood.

Self-Service DAGs: Event-Driven Design for GitHub Actions and Airflow at Lyft

by Ken Obata

At Lyft, driver pay configs on GitHub must be validated through Airflow DAGs before merging. However, Scientists and Analysts who change configs are not familiar with Airflow. How do we make such validation self-service while meeting SOX compliance?

This talk presents a design pattern for bidirectional GitHub-Airflow integration: GitHub Actions trigger DAGs, and DAGs push results back as PR status checks via the GitHub Commit Status API. We cover the event-driven push style vs the traditional polling style, and why the push style works well with Dynamic Task Mapping. This pattern aligns with Airflow 3’s event-driven scheduling vision. We also discuss how SOX requirements shaped this design.
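
The push-back half of the loop is a small piece of code; a hedged sketch of a task posting a PR status check (the repo, SHA, and token handling are placeholders; the endpoint is GitHub’s documented commit status path):

```python
import requests

from airflow.sdk import task


@task
def report_status(sha: str, passed: bool):
    response = requests.post(
        f"https://api.github.com/repos/example-org/pay-configs/statuses/{sha}",
        headers={"Authorization": "Bearer <token>"},  # placeholder credentials
        json={
            "state": "success" if passed else "failure",
            "context": "airflow/config-validation",
            "description": "Driver pay config validation",
        },
        timeout=30,
    )
    response.raise_for_status()
```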

Spark + Airflow: How Orchestration Decisions Impact Performance and Cost

by Meni Shmueli

Apache Airflow is the de facto orchestrator for modern data platforms, while Apache Spark powers large-scale data processing. But when the two meet in production, teams quickly face architectural decisions that affect reliability, performance, and cloud cost.

In this talk we explore key design questions when orchestrating Spark with Airflow:

1. Should you run a shared Spark cluster, a cluster per DAG run, or clusters per task?

2. When should Spark workloads run in parallel vs sequentially within a workflow?

3. How can teams benchmark pipeline performance in terms of both runtime and cost?

4. How do emerging features like Spark Declarative Pipelines change how Spark integrates with orchestration systems?

Spec-Driven Development for Airflow DAGs

by Kyle McCluskey

AI coding assistants have transformed software development, moving from ad hoc “vibe coding” to rigorous spec-driven development (SDD). The Airflow ecosystem has fully embraced these advancements, but different use cases demand different SDD approaches. This talk compares ETL and ML pipeline patterns, showing how each leverages Airflow’s unique capabilities differently. I then present SDD strategies along a Spec Stability Spectrum. ETL specs are stable and external — schemas, dbt models — making deterministic, template-driven approaches like DAG Factory and the cosmos-dbt-core skill the right fit. ML specs are volatile and internal, as experiments evolve, so LLM-driven hybrid approaches like the Airflow AI SDK and the airflow-hitl skill are better suited. Both approaches are demonstrated live with Claude Code. Examples draw from my work at TXI Digital generating ETL and ML pipelines for heavy industry clients, with a focus on Rail and anecdotes from Renewable Energy.

Stabilizing LinkedIn Continuous Deployment on Airflow

by Wensi Hu & Pooja Pal

Last year, we showed how LinkedIn’s continuous deployment (LCD) runs on Apache Airflow to orchestrate safe, repeatable releases across thousands of services—powering everyday deployments for 10,000+ engineers.

This year, we’ll dive into the hard‑won patterns that keep those deployments stable at scale: preserving DAG consistency during live updates; routing seamlessly across multiple clusters for graceful failover; enforcing HA guardrails on the control plane; and using dynamic task mapping to deliver faster rollbacks and reduce deployment overhead. You’ll see how we abstract Airflow for a cleaner user experience, what really moved the needle on launching tasks faster, and portable observability practices that cut on‑call toil.

Stop Being the Dag Bottleneck: How to Scale Airflow Orchestration Beyond Your Engineering Team

by Yetunde Dada

Your data platform team didn’t sign up to be a Dag factory. But when Airflow expertise is concentrated in a small group of engineers, that’s exactly what happens. Analysts wait days for simple workflows, engineers burn cycles rebuilding the same patterns, and frustrated teams start building outside the stack entirely. The real fix isn’t a better onboarding guide or a friendlier UI. It’s rethinking the abstraction layer your team exposes to the rest of the business.

Streamlining Data Pipelines Creation at Stripe with Airflow

by Jiayu Yi

At Stripe, we process petabytes of data daily across thousands of pipelines powering financial reporting, fraud detection, and merchant analytics. As our data estate grew, so did the complexity of authoring, scheduling, and operating these pipelines. Engineers spent more time wrangling Airflow DAG boilerplate and managing dependencies than writing transformation logic.

To address this, we built a declarative platform that generates Airflow DAGs from YAML and SQL definitions. Authors specify what they want — source tables, SQL transformations, incremental mode, output schema — and the platform handles the rest: generating Airflow tasks, wiring upstream sensors, registering Iceberg tables, and configuring scheduling parameters. A key piece is an in-house dataset-to-task mapping service that resolves upstream dataset dependencies to their producing Airflow tasks. When an author declares an input dataset, the platform automatically looks up which task produces it and generates the appropriate sensor — no manual DAG cross-referencing required. This eliminates an entire class of misconfigured dependency bugs common in hand-wired Airflow deployments.
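
An illustrative reduction of the pattern (the spec format and loader below are ours, not Stripe’s actual platform):

```python
import yaml

from airflow.sdk import DAG, task

SPEC = yaml.safe_load(
    """
dag_id: orders_daily
schedule: "@daily"
tasks:
  - id: extract_orders
  - id: build_report
    upstream: [extract_orders]
"""
)

with DAG(dag_id=SPEC["dag_id"], schedule=SPEC["schedule"]):

    @task
    def run_step(step_id: str):
        print(f"running {step_id}")

    # One task per spec entry; dependencies wired from the declared upstreams.
    steps = {t["id"]: run_step.override(task_id=t["id"])(t["id"]) for t in SPEC["tasks"]}
    for t in SPEC["tasks"]:
        for upstream_id in t.get("upstream", []):
            steps[upstream_id] >> steps[t["id"]]
```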

Streamlining Your Airflow Upgrade: Essential Tools for Migrating from 2.x to 3

by Ankit Chaurasia

Airflow 3 has officially arrived! If you’re considering an upgrade, this session will equip you with essential migration utilities that facilitate a smooth transition from Airflow 2.x. Attendees will learn the new CLI command, “airflow config lint”, to analyze your configuration files for any removed, deprecated, or renamed elements. This command provides comprehensive feedback and allows for filtering specific sections and options.

During the session, attendees will learn to leverage a set of rigorous Ruff rules - AIR301, AIR302, and AIR303 - crafted to detect migration issues within your codebase automatically. Notably, rule AIR301 flags DAG definitions lacking an explicit schedule argument, a critical update in Airflow 3. Rule AIR302 identifies deprecated functions and removed configuration settings, offering recommended alternatives. Rule AIR303 highlights code that references components now shifted to provider packages, ensuring your integrations are up to date.

Taming AI Workloads in Apache Airflow: Dag Patterns to Avoid Infrastructure Instability

by Zhe-You(Jason) Liu

Orchestrating AI workloads introduces a two-front battle with infrastructure instability. First, the Airflow workers themselves (e.g., Kubernetes pod evictions, Celery node scaling) can restart and lose track of active tasks. Second, the external AI cluster running the heavy compute can experience temporary network blips, API timeouts or compute rescheduling. With standard Dag designs, these transient hiccups often cause Airflow to panic, fail the task, and tragically send a kill signal to an expensive, perfectly healthy AI job.

Taming the MLOps Zoo: Orchestrating and Monitoring Models with Airflow

by Lindy Bustabad

Thanks to AI, your data scientists can build models faster than ever. The new bottleneck? Their attention. When your team maintains a zoo of ML models (dbt/SQL scoring models, Python ML on Kubernetes, and point-and-click product UI models) every new species adds feeding schedules, health checks, and habitat needs. The real question becomes: which animals need the zookeeper right now?

At Pendo, we orchestrate 10+ ML models through Airflow, each with its own dbt Cloud feature prep, Kubernetes scoring pods, and downstream monitoring. This talk covers how we keep the zoo running: DAG dependencies across heterogeneous model types, conditional execution for models that only score on certain schedules, and model-specific sub-pipelines that keep each species healthy. Then we’ll demo DS ModelGuard, an agentic monitoring system we built internally that does the morning rounds, tracking API health, output volume, likelihood drift, and feature-level input drift, so your data scientists know which enclosure to check first.

The Messy Middle: When Your Data Team Doesn't Need a Streaming Engine

by Constance Martineau

There’s a class of workload that doesn’t belong in your streaming stack. A team needs to react to data arriving in S3 or a message landing in Kafka. The SLA is minutes. Someone reaches for Flink because the orchestrator can’t trigger on events. Six months later, you’re running a streaming app for what is a bounded computation with a latency requirement.

This talk names that pattern, the “messy middle,” and argues that Airflow 3 eliminates the gap that pushed these workloads to streaming. Asset Watchers monitor external sources through async triggers, firing DAGs within minutes of event arrival. Assets turn data products into scheduling primitives. Partitions let Airflow reason about which slices of a dataset are ready.

The Rise of Abstraction in Dag Authoring: From YAML to Minecraft

by Volker Janz

In my almost 15 years as a data engineer, I’ve learned one universal truth: everyone needs orchestration. The marketing team needs daily attribution reports. The CRM team needs personalized newsletter triggers. The platform team needs cross-cloud data transfers. The analytics team needs third-party data imports. Data touches every corner of the business, and the orchestration layer is the one layer that connects it all.

This talk explores what becomes possible when we decouple pipeline logic (what happens) from definition (how it’s authored). With the right abstractions, the authoring interface can be anything: Python, declarative YAML, templates, spreadsheets, or even a video game.

The Self-Healing Pipeline: How Error Classification Eliminated 100% of Engineering Oversight

by Evgeny Nuger

What if your pipeline could tell the difference between recoverable errors and real bugs and handle both without waking anyone up? At OnsiteIQ, we process millions of construction site images monthly through Airflow with mixed AWS Batch spot and on-demand tasks. We need to handle corrupt data, spot evictions, and real bugs. Before Airflow, every failure looked the same: something broke, an engineer investigated, and the same transient infrastructure issues kept masking real bugs underneath. In a 3-month solo migration, I built custom Airflow operators that automatically classified and handled every failure via Airflow’s callbacks. Actual code bugs surface through clean, noise-free alerts directly to actionable tickets. Every genuine bug got caught exactly once and permanently fixed. Engineering oversight dropped from 20% to zero within months. This talk covers the error classification architecture, automatic fallback patterns, and the framework for turning Airflow into a self-healing system.

The State of Airflow: Momentum, Innovation, and What's Next

by Vikram Koka

Airflow 3 has been out for a year. In this keynote, we take stock of where the community stands, what we built together, and where we are headed.

We open with the data: adoption trends, community growth, and honest feedback from teams running Airflow 3 in production. What is working, what surprised us, and what the survey tells us about how the ecosystem is evolving.

The second section covers the year in Airflow. Provider discovery and distribution has been modernized. Airflow gained first-class support for AI and LLM workloads. And scheduling became more powerful, letting pipelines respond to data at a finer granularity.

Toward a Polyglot Airflow

by Tzu-ping Chung

Building on Airflow 3’s new worker structure and the foundation laid by the Go SDK, we take a look at how Airflow can support a fully cross-language Dag-authoring experience.

We will discuss how a new language SDK is built, how a task talks to Airflow, and how multiple languages may be mixed inside a Dag. To support additional languages without logic duplication, a new middle layer is required between Airflow and the task. Additional topics, such as security, distributed workload, and user interface considerations, will also be touched on.

Triggers at Datadog: What Are Trigger Queues and Why You Should Use Them

by Zach Gottesman

Datadog is a world-class data platform ingesting more than 100 trillion events a day, providing real-time insights. Since our internal adoption of Airflow following the release of 3.0.0, the number of teams relying on our internal Airflow platform has grown organically and quickly.

This internal Airflow adoption came with a number of platform challenges, requiring novel solutions which could support multi-tenancy, scalability, and bespoke runtime environments. In this talk, we will cover how we’ve expanded the functionality of Airflow triggers – via trigger queue assignment – to support multi-tenancy deployments, while contributing those solutions upstream with the broader Airflow community. We’ll cover the conceptual design and motivations for Trigger queues, and how the trigger queue pattern can benefit both multi-tenant and single-occupant Airflow systems alike.

When Airflow Meets Yunikorn: Enhancing Airflow on Kubernetes with Yunikorn for Higher Efficiency

by Xiaodong Deng

Apache Airflow’s Kubernetes integration enables flexible workload execution on Kubernetes but lacks advanced resource management features, including application queueing, tenant isolation, and gang scheduling. These features are increasingly critical for data engineering as well as AI/ML use cases, particularly GPU utilization optimization. For example, gang scheduling ensures all required resources for a job are allocated atomically, preventing partial allocations that waste resources. Apache Yunikorn, a Kubernetes-native scheduler, addresses these gaps by offering a high-performance alternative to the default Kubernetes scheduler. In this talk, we’ll demonstrate how to conveniently leverage Yunikorn’s power in Airflow, along with practical use cases and examples.
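
The handoff can be as simple as a pod override per task; a sketch following Yunikorn’s documented label conventions (queue and application names are placeholders; adapt them to your cluster’s setup):

```python
from kubernetes.client import models as k8s

from airflow.sdk import task

yunikorn_config = {
    "pod_override": k8s.V1Pod(
        metadata=k8s.V1ObjectMeta(
            labels={"applicationId": "airflow-train-job", "queue": "root.ml-team"}
        ),
        spec=k8s.V1PodSpec(
            scheduler_name="yunikorn",  # hand the pod to Yunikorn, not the default scheduler
            containers=[k8s.V1Container(name="base")],
        ),
    )
}


@task(executor_config=yunikorn_config)
def train_on_gpu():
    ...
```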

Your first Apache Airflow Contribution

by Amogh Desai, Kalyan Reddy & Phani Kumar

Ready to contribute to Apache Airflow?

In this hands-on workshop, we’ll help you jump straight into the project with real, beginner-friendly issues matched to your skills and interests.

To make the most of our time together, come with a development environment set up in advance — installing Breeze is highly recommended, but GitHub Codespaces is a great alternative if Docker isn’t an option for you.

We’ll walk through the full contribution journey step by step: exploring the codebase, picking an issue, opening your first pull request, and engaging with the community for feedback and reviews. Whether you’re interested in writing code, improving documentation, writing tests, or sharing ideas, there’s a welcoming place for you in the Airflow community.