Check out the full program for Airflow Summit.

If you prefer, you can also view this program in the Sessionize layout or as a list of sessions.

Tuesday, October 7, 2025

  • 8:50 - Welcome
  • 9:00-11:00 - Opening keynote
  • 11:00 - Coffee break
  • 11:30 and 12:15 - Sessions
  • 13:00 - Lunch
  • 14:00, 14:30, and 15:00 - Sessions
  • 15:35 - Coffee break
  • 15:45, 16:15, and 16:45 - Sessions
Introducing Apache Airflow® 3 – The Next Evolution in Orchestration
09:00 - 11:00
By Amogh Desai, Ash Berlin-Taylor, Brent Bovenzi, Bugra Ozturk, Daniel Standish, Jed Cunningham, Jens Scheffler, Kaxil Naik, Pierre Jeambrun, Tzu-ping Chung, Vikram Koka, Vincent Beck & Constance Martineau
Track: Keynote
Room: Columbia A

Apache Airflow® 3 is here, bringing major improvements to data orchestration. In this keynote, core Airflow contributors will walk through key enhancements that boost flexibility, efficiency, and user experience.

Vikram Koka will kick things off with an overview of Airflow 3, followed by deep dives into DAG versioning (Jed Cunningham), enhanced backfilling (Daniel Standish), and a modernized UI (Brent Bovenzi & Pierre Jeambrun).

Next, Ash Berlin-Taylor, Kaxil Naik, and Amogh Desai will introduce the Task Execution Interface and Task SDK, enabling tasks in any environment and language. Jens Scheffler will showcase the Edge Executor, while Tzu-ping Chung and Vincent Beck will demo event-driven scheduling and data assets. Finally, Buğra Öztürk will unveil CLI enhancements for automation and debugging.

This keynote sets the stage for Airflow 3—don’t miss the chance to learn from the experts shaping the future of workflow orchestration!

Benchmarking the Performance of Dynamically Generated DAGs
11:30 - 12:10
By Tatiana Al-Chueyr Martins & Rahul Vats
Track: Best practices
Room: Beckler

As teams scale their Airflow workflows, a common question is: “My DAG has 5,000 tasks—how long will it take to run in Airflow?”

Beyond execution time, users often face challenges with dynamically generated DAGs, such as:

  • Delayed visualization in the Airflow UI after deployment.
  • High resource consumption, leading to Kubernetes pod evictions and out-of-memory errors.

While estimating the resource utilization in a distributed data platform is complex, benchmarking can provide crucial insights.

In this talk, we’ll share our approach to benchmarking dynamically generated DAGs with Astronomer Cosmos (https://github.com/astronomer/astronomer-cosmos), covering:

  • Designing representative and extensible baseline tests.
  • Setting up an isolated, distributed infrastructure for benchmarking.
  • Running reproducible performance tests.
  • Measuring DAG run times and task throughput.
  • Evaluating CPU & memory consumption to optimize deployments.

By the end of this session, you will have practical benchmarks and strategies for making informed decisions about evaluating the performance of DAGs in Airflow.
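
For context, here is a toy dynamically generated DAG of the kind such benchmarks exercise. This is a sketch using Airflow 3's Task SDK imports and dynamic task mapping; the names and the task count are illustrative:

```python
from datetime import datetime

from airflow.sdk import dag, task  # Airflow 3 Task SDK; on Airflow 2 use airflow.decorators


@dag(schedule=None, start_date=datetime(2025, 1, 1))
def dynamic_fanout():
    @task
    def process(chunk: int) -> int:
        # Stand-in for real work; each mapped task instance handles one chunk.
        return chunk * 2

    # Dynamic task mapping: the scheduler expands this into one task instance
    # per element at run time, which is exactly what such benchmarks stress.
    process.expand(chunk=list(range(100)))


dynamic_fanout()
```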

LinkedIn's Journey on Scaling Airflow
11:30 - 12:10
By Rahul Gade & Arun Kumar
Track: Airflow & ...
Room: Columbia D

Last year, we shared how LinkedIn’s continuous deployment platform (LCD) leveraged Apache Airflow to streamline and automate deployment workflows. LCD is LinkedIn’s internal deployment platform, actively used by all 10,000+ of its engineers.

This year, we take a deeper dive into the challenges, solutions, and engineering innovations that helped us scale Airflow to support thousands of concurrent tasks while maintaining usability and reliability.

Key Takeaways: Abstracting Airflow for a Better User Experience – How we designed a system where users could define and update their workflows without directly interacting with Airflow.

Scaling to 10,000+ Concurrent Tasks – The architectural and configuration changes that enabled us to scale execution efficiently.

Enhanced Observability & Monitoring – The tools and techniques we implemented to track Airflow’s health, detect failures, and improve reliability.

Lessons from the Field – Key learnings, trade-offs, and best practices for managing large-scale Airflow deployments.

Security made us do it: Airflow’s new Task Execution Architecture
11:30 - 12:10
By Amogh Desai & Ash Berlin-Taylor
Track: Airflow 3
Room: Columbia A

The Airflow 2 architecture tightly couples the Airflow core and the user code running in an Airflow task. This poses barriers in security, maintenance, and adoption. One such threat is that user code can access the source of truth of Airflow - the metadata DB - and run any query against it! From a scalability angle, ‘n’ tasks create ‘n’ DB connections, limiting Airflow’s ability to scale effectively.

To address this we proposed AIP-72 – a client-server model for task execution. The new architecture addresses several long-standing issues, including DB isolation from workers, dependency conflicts between the Airflow core and workers, and the proliferation of DB connections. The new architecture has two parts:

  1. Execution API Server: Tasks no longer have direct DB access, they use this new slim, secure API
  2. Task SDK: A lightweight toolkit that lets you write tasks without drowning in Airflow’s codebase

Beyond isolation and security, the redesign unlocks the ability for native multi-language task authoring support, and secure Remote Execution. Join us to explore how AIP-72 transforms Airflow task execution, paving the way for a more secure, flexible, and futuristic task orchestration!
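
For a flavor of what authoring against the Task SDK looks like, here is a minimal sketch (assuming Airflow 3, where the `airflow.sdk` package is the public authoring surface):

```python
from airflow.sdk import dag, task  # no imports from Airflow core internals


@dag
def hello_task_sdk():
    @task
    def greet() -> str:
        # Runs in an isolated worker process; results and state flow through
        # the Execution API server rather than direct metadata-DB access.
        return "hello from an isolated task"

    greet()


hello_task_sdk()
```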

Why AWS chose Apache Airflow to power workflows for the next generation of Amazon SageMaker
11:30 - 12:10
By John Jackson
Track: Use cases
Room: Columbia C

On March 13th, 2025, Amazon Web Services announced General Availability of Amazon SageMaker Unified Studio, bringing together AWS machine learning and analytics capabilities. At the heart of this next generation of Amazon SageMaker sits Apache Airflow. All SageMaker Unified Studio users have a personal, open-source Airflow deployment, running alongside their Jupyter notebook, enabling those users to easily develop Airflow DAGs that have unified access to all of their data.

In this talk, I will go into details around the motivations for choosing Airflow for this capability, the challenges with incorporating Airflow into such a large and diverse experience, the key role that open-source plays, how we’re leveraging GenAI to make that open source development experience better, and the goals for the future of Airflow in SageMaker Unified Studio.

Attendees will leave with a better understanding of the considerations they need to make when choosing Airflow as a component of their enterprise project, and a greater appreciation of how Airflow can power advanced capabilities.

Airflow That Remembers: The Dag Versioning Era is here!
12:15 - 12:55
By Jed Cunningham & Ephraim Anierobi
Track: Airflow 3
Room: Columbia A

Airflow 3 introduced a game-changing feature: Dag versioning.

Gone are the days of “latest only” Dags and confusing, inconsistent UI views when pipelines change mid-flight. This talk covers:

  • Visualizing Dag changes over time in the UI
  • How Dag code is versioned and can be fetched from external sources
  • Executing a whole Dag run against the same code version
  • Dynamic Dags? Where do they fit in?!

You’ll see real-world scenarios, UI demos, and learn how these advancements will help avoid “Airflow amnesia”.

Enhancing DAG Management with DMS: A Scalable Solution for Airflow
12:15 - 12:55
By Sungji Yang & DaeHoon Song
Track: Best practices
Room: Beckler

In this talk, we will introduce the DAG Management Service (DMS), developed to address critical challenges in managing Airflow clusters. With over 10,000 active DAGs, a single Airflow cluster faces scaling limits and noisy neighbor issues, impacting task scheduling SLAs. DMS enhances reliability by distributing DAGs across multiple clusters and enforcing proper configurations.

We will also discuss how DMS streamlines Airflow version upgrades. Upgrading from an old Airflow version to the latest requires sequential updates and code modifications for over 10,000 DAGs. DMS proposes an efficient upgrade method, reducing dependency on users.

Key functions of DMS include:

  1. DAG Deployment: Selectively deploys DAG files from GitHub to Airflow clusters via an event-driven pipeline.
  2. DAG Migration: Facilitates seamless DAG migration between clusters, supporting both cluster upgrades and team-specific deployments.
  3. Connections and Variables Management: Centralizes management of connection IDs and variables, ensuring consistency and smooth migrations.

Join us to explore how DMS can revolutionize your Airflow DAG management, enhancing scalability, reliability, and efficiency.

Event-Driven, Partition-Aware: Modern Orchestration with Airflow at Datadog
12:15 - 12:55
By Julien Le Dem & Zach Gottesman
Track: Use cases
Room: Columbia C

Datadog is a world-class data platform ingesting more than 100 trillion events a day, providing real-time insights.

Before Airflow’s prominence, we built batch processing on Luigi, Spotify’s open-source orchestrator. As Airflow gained wide adoption, we evaluated adopting the major improvements of release 2.0, but opted for building our own orchestrator instead to realize our dataset-centric, event-driven vision.

Meanwhile, the 3.0 release aligned Airflow with the same vision we pursued internally, as a modern asset-driven orchestrator. It showed how futile it is to build our own compared to the momentum of the community. We evaluated several orchestrators and decided to join forces with the Airflow project.

This talk follows our journey from building a custom orchestrator to adopting and contributing to Airflow 3. We’ll share our thought process, our asset partitions use case, and how we’re working with the community to materialize the Data Awareness (AIP-73) vision. Partition-based incremental scheduling is core to our orchestration model, enabling scalable, observable pipelines thanks to Datadog’s Data Observability product providing visibility into pipeline health.

Lessons learned for scaling up Airflow 3 in Public Cloud
12:15 - 12:55
By Przemek Wiech & Augusto Hidalgo
Track: Airflow 3
Room: Columbia D

Apache Airflow 3 is the new state-of-the-art version of Airflow. For the many users who plan to adopt it, it’s important to understand how Airflow 3 behaves from a performance perspective compared to Airflow 2.

This presentation will share performance results for various Airflow 3 configurations and should give Airflow 3 adopters a good understanding of its performance.

The reference Airflow 3 configuration uses a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, running on Google Cloud Platform. Performance tests will be run using the community version of the performance-test framework, with occasional references to Cloud Composer (a managed service for Apache Airflow). The tests are done in production-grade configurations that can serve as good references for Airflow community users.

  • Users will be provided with a comparison of Airflow 3 and Airflow 2 from a performance standpoint

  • Users will also learn how to optimize Airflow scheduler performance by understanding DAG file processing and task scheduling, and by configuring the scheduler to run tens of thousands of DAGs/tasks in Airflow 3

Airflow & Your Automation CoE: Streamlining Integration for Enterprise-Wide Governance and Value
14:00 - 14:25
By Jon Hiett
Track: Sponsored
Room: Columbia D

As Apache Airflow adoption accelerates for data pipeline orchestration, integrating it effectively into your enterprise’s Automation Center of Excellence (CoE) is crucial for maximizing ROI, ensuring governance, and standardizing best practices. This session explores common challenges faced when bringing specialized tools like Airflow into a broader CoE framework. We’ll demonstrate how leveraging enterprise automation platforms like Automic Automation can simplify this integration by providing centralized orchestration, standardized lifecycle management, and unified auditing for Airflow DAGs alongside other enterprise workloads. Furthermore, discover how Automation Analytics & Intelligence (AAI) can offer the CoE a single pane of glass for monitoring performance, tracking SLAs, and proving the business value of Airflow initiatives within the complete automation landscape. Learn practical strategies to ensure Airflow becomes a well-governed, high-performing component of your overall automation strategy.

Empowering Precision Healthcare with Apache Airflow – iKang Healthcare Group’s DataHub Journey
14:00 - 14:25
By Yuan Luo & Huiliang Zhang
Track: Use cases
Room: Columbia C

iKang Healthcare Group, serving nearly 10 million patients annually, built a centralized healthcare data hub powered by Apache Airflow to support its large-scale, real-time clinical operations. The platform integrates batch and streaming data in a lakehouse architecture, orchestrating complex workflows from data ingestion (HL7/FHIR) to clinical decision support.

Healthcare data’s inherent complexity—spanning structured lab results to unstructured clinical notes—requires dynamic, reliable orchestration. iKang uses Airflow’s DAGs, extensibility, and workflow-as-code capabilities to address challenges like multi-system coordination, semantic data linking, and fault-tolerant automation.

iKang extended Airflow with cross-DAG event triggers, task priority weights, LLM-driven clinical text processing, and a visual drag-and-drop DAG builder for medical teams. These innovations improved diagnostic turnaround, patient safety, and cross-system workflow visibility.

iKang’s work demonstrates Airflow’s power in transforming healthcare data infrastructure and advancing intelligent, scalable patient care.

Orchestrating Data Quality - Quality Data Brought To You By Airflow
14:00 - 14:25
By Maggie Stark & Marion Azoulai
Track: Best practices
Room: Beckler

Ensuring high-quality data is essential for building user trust and enabling data teams to work efficiently. In this talk, we’ll explore how the Astronomer data team leverages Airflow to uphold data quality across complex pipelines, minimizing firefighting and maximizing confidence in reported metrics.

Maintaining data quality requires a multi-faceted approach: safeguarding the integrity of source data, orchestrating pipelines reliably, writing robust code, and maintaining consistency in outputs. We’ve embedded data quality into the developer experience, so it’s always at the forefront instead of in the backlog of tech debt.

We’ll share how we’ve operationalized:

  • Implementing data contracts to define and enforce expectations
  • Differentiating between critical (pipeline-blocking) and non-critical (soft) failures
  • Exposing upstream data issues to domain owners
  • Tracking metrics to measure overall data quality of our team

Join us to learn practical strategies for building scalable, trustworthy data systems powered by Airflow.
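
To illustrate the critical vs. non-critical distinction above, here is a sketch of a contract check task (the thresholds and helper names are assumptions, not the Astronomer team's actual checks):

```python
from airflow.exceptions import AirflowFailException
from airflow.sdk import task


@task
def check_contract(row_count: int, min_rows: int = 1000) -> None:
    if row_count == 0:
        # Critical (pipeline-blocking): fail hard so downstream tasks never run.
        raise AirflowFailException("source returned no rows")
    if row_count < min_rows:
        # Non-critical (soft): surface the issue to the domain owner,
        # but let the rest of the pipeline continue.
        print(f"WARNING: row_count={row_count} below expected {min_rows}")
```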

Unlocking Event-Driven Scheduling in Airflow 3: A New Era of Reactive Data Pipelines
14:00 - 14:25
By Vincent Beck
Track: Airflow 3
Room: Columbia A

Airflow 3 introduces a major evolution in orchestration: native support for external event-driven scheduling. In this talk, I’ll share the journey behind AIP-82—why we needed it, how we built it, and what it unlocks. I’ll dive into how the new AssetWatcher enables pipelines to respond immediately to events like file arrivals, API calls, or pub/sub messages. You’ll see how this drastically reduces latency and infrastructure overhead while improving reactivity and resource efficiency. We’ll explore how it works under the hood, real-world use cases, best practices, and migration tips for teams ready to shift from time-based to event-driven workflows. If you’re looking to make your Airflow DAGs more dynamic, this is the talk that shows you how. Whether you’re an operator or contributor, you’ll walk away with a deep understanding of one of Airflow 3’s most impactful features.
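
For a taste of the feature, here is a minimal sketch of an asset watched by an external message queue, following the pattern in the Airflow 3 documentation (the queue URL is hypothetical, and the common messaging provider must be installed):

```python
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import DAG, Asset, AssetWatcher, task

# Emits an asset event whenever a message lands on the (hypothetical) queue.
trigger = MessageQueueTrigger(queue="https://sqs.us-east-1.amazonaws.com/123456789/incoming")

incoming = Asset(
    "incoming_files",
    watchers=[AssetWatcher(name="incoming_watcher", trigger=trigger)],
)

with DAG(dag_id="react_to_incoming", schedule=[incoming]):

    @task
    def handle_event() -> None:
        # Runs as soon as the event arrives; no polling schedule needed.
        ...

    handle_event()
```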

5 Simple Strategies To Enhance Your DAGs For Data Processing
14:15 - 14:40
By William Orgertrice
Track: Best practices
Room: Beckler

Ready to take your DAGs in Apache Airflow to the next level? Join this insightful session where we’ll uncover 5 transformative strategies to enhance your data workflows. Whether you’re a data engineering pro or just getting started, this presentation is packed with practical tips and actionable insights that you can apply right away.

We’ll dive into the magic of using powerful libraries like Pandas, share techniques to trim down data volumes for faster processing, and highlight the importance of modularizing your code for easier maintenance. Plus, you’ll discover efficient ways to monitor and debug your DAGs, and how to make the most of Airflow’s built-in features.

By the end of this session, you’ll have a toolkit of strategies to boost the efficiency and performance of your DAGs, making your data processing tasks smoother and more effective. Don’t miss out on this opportunity to elevate your Airflow DAGs!
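
As one concrete flavor of these strategies, here is a sketch of trimming data volumes with pandas inside a task (column names and the file layout are made up for illustration):

```python
import pandas as pd
from airflow.sdk import task


@task
def load_trimmed(path: str) -> int:
    # Read only the columns the pipeline needs, and downcast types to cut memory.
    df = pd.read_csv(path, usecols=["user_id", "amount"], dtype={"user_id": "int32"})
    df["amount"] = pd.to_numeric(df["amount"], downcast="float")
    return len(df)
```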

Airflow at Zoox: A journey to orchestrate heterogeneous workflows
14:15 - 14:40
By Justin Wang & Saurabh Gupta
Track: Use cases
Room: Columbia A

The workflow orchestration team at Zoox aims to build a solution for orchestrating heterogeneous workflows encompassing data, ML, and QA pipelines. We have encountered two primary challenges: first, the steep learning curve for new Airflow users and the need for a user-friendly yet scalable development process; second, integrating and migrating existing pipelines with established solutions.

This presentation will detail our approach, as a small team at Zoox, to address these challenges. The discussion will cover the scope and scale of Airflow within Zoox, including current applications and future directions. Furthermore, we will share our strategies for simplifying the Airflow DAG creation process and enhancing user experience. Finally, we will present a case study illustrating the onboarding of a heterogeneous workflow across Databricks, AWS, and a Zoox in-house platform to manage both on-prem and cloud services.

EdgeExecutor / Edge Worker - The new option to run anywhere
14:15 - 14:40
By Jens Scheffler & Daniel Wolf
Track: Airflow 3
Room: Columbia C

Airflow 3 extends the deployment options to run your workload anywhere. You don’t need to bring your data to Airflow; you can bring the execution to where it needs to be.

You can connect any cloud and on-prem location together and create a hybrid workflow from one central Airflow instance. Only an HTTP connection is needed.

We will present the use cases and concepts of the Edge deployment and how it also works in a hybrid setup with Celery or other executors.

ELT, AI, and Elections: Leveraging Airflow and Machine Learning to Analyze Voting Behavior at INTRVL
14:15 - 14:40
By Kyle McCluskey
Track: Use cases
Room: Columbia D

Discover how Apache Airflow powers scalable ELT pipelines, enabling seamless data ingestion, transformation, and machine learning-driven insights. This session will walk through:

Automating Data Ingestion: Using Airflow to orchestrate raw data ingestion from third-party sources into your data lake (S3, GCP), ensuring a steady pipeline of high-quality training and prediction data.

Optimizing Transformations with Serverless Computing: Offloading intensive transformations to serverless functions (GCP Cloud Run, AWS Lambda) and machine learning models (BigQuery ML, Sagemaker), integrating their outputs seamlessly into Airflow workflows.

Real-World Impact: A case study on how INTRVL leveraged Airflow, BigQuery ML, and Cloud Run to analyze early voting data in near real-time, generating actionable insights on voter behavior across swing states.

This talk not only provides a deep dive into the Political Tech space but also serves as a reference architecture for building robust, repeatable ELT pipelines. Attendees will gain insights into modern serverless technologies from AWS and GCP that enhance Airflow’s capabilities, helping data engineers design scalable, cloud-agnostic workflows.

Automating Business Intelligence with Airflow: A Practical Guide
14:30 - 14:55
By Chinni Krishna Abburi
Track: Use cases
Room: Columbia C

In today’s fast-paced business world, timely and reliable insights are crucial — but manual BI workflows can’t keep up. This session offers a practical guide to automating business intelligence processes using Apache Airflow. We’ll walk through real-world examples of automating data extraction, transformation, dashboard refreshes, and report distribution. Learn how to design DAGs that align with business SLAs, trigger workflows based on events, integrate with popular BI tools like Tableau and Power BI, and implement alerting and failure recovery mechanisms. Whether you’re new to Airflow or looking to scale your BI operations, this session will equip you with actionable strategies to save time, reduce errors, and supercharge your organization’s decision-making capabilities.

Beyond Logs: Unlocking Airflow 3.0 Observability with OpenTelemetry Traces
14:30 - 14:55
By Christos Bisias
Track: Airflow 3
Room: Columbia A

Using OpenTelemetry tracing, users can gain full visibility into tasks and calls to outside services. This is increasingly important, especially as tasks in an Airflow DAG involve multiple complex computations that take hours or days to complete. Airflow makes it easy to monitor how long entire DAG runs or individual tasks take, but leaves internal actions opaque. OpenTelemetry gives users much more operational awareness and metrics they can use to improve operations.

This presentation will explain the basics: what OpenTelemetry is and how it works – perfect for someone with no prior familiarity with tracing or with the use of OpenTelemetry. It will demonstrate how Airflow users can leverage the new tracing support to achieve deeper observability into DAG runs.
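
For readers new to tracing, here is what a span looks like in plain OpenTelemetry Python. This is generic OTel SDK usage for orientation, not Airflow's internal instrumentation:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("extract") as span:
    span.set_attribute("rows_processed", 42)  # attributes make spans searchable
```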

DAGLint: Elevating Airflow DAG Quality Through Automated Linting
14:30 - 14:55
By Snir Israeli
Track: Use cases
Room: Beckler

Maintaining consistency, code quality, and best practices for writing Airflow DAGs between teams and individual developers can be a significant challenge. Trying to achieve it using manual code reviews is both time-consuming and error-prone.

To solve this at Next, we built a custom, internally developed linting tool for Airflow DAGs to help us evaluate their quality and uniformity - we call it DAGLint.

In this talk, I am going to share why we chose to implement it, how we built it, and how we use it to elevate our code quality and standards throughout the entire Data engineering group.

This tool supports our day-to-day development process, provides us with a visual analysis of the state of our entire code base, and allows our code reviews to focus on other code quality aspects. We can now easily identify deviations from our defined standards, promote consistency throughout our DAGs repository, and extend the tool with additional new standards introduced to our group.

The talk will cover how you can implement a similar solution in your own organization. We have also published a blog post on it: https://medium.com/apache-airflow/mastering-airflow-dag-standardization-with-pythons-ast-a-deep-dive-into-linting-at-scale-1396771a9b90
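
To make the AST approach concrete, here is a toy lint rule of the kind such a tool enforces (an assumed example rule, not DAGLint's actual rule set):

```python
import ast


def dag_calls_missing_tags(source: str) -> list[int]:
    """Return line numbers of DAG(...) calls that don't set `tags` (example rule)."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name == "DAG" and not any(kw.arg == "tags" for kw in node.keywords):
                offenders.append(node.lineno)
    return offenders


with open("my_dag.py") as f:  # hypothetical DAG file
    print(dag_calls_missing_tags(f.read()))
```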

Orchestrating Databricks with Airflow: Unlocking the Power of MVs, Streaming Tables, and AI
14:30 - 14:55
By Denny Lee
Track: Sponsored
Room: Columbia D

As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment.

In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines.

Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.

Breaking News with Data Pipelines: How Airflow and AI Power Investigative Journalism
15:00 - 15:25
By Zdravko Hvarlingov & Ivan Nikolov
Track: Use cases
Room: Beckler

Investigative journalism often relies on uncovering hidden patterns in vast amounts of unstructured and semi-structured data. At the FT, we leverage Airflow to orchestrate AI-powered pipelines that transform complex, fragmented datasets into structured insights. Our Storyfinding team works closely with journalists to automate tedious data processing, enabling them to tell stories that might otherwise go untold.

This talk will explore how we use Airflow to process and analyze text, documents, and other difficult-to-structure data sources combining AI, machine learning, and advanced computational techniques to extract meaningful entities, relationships, and patterns. We’ll also showcase our connection analysis workflows, which link various datasets to reveal previously hidden chains of people and companies, a crucial capability for investigative reporting.

Attendees will learn:

  • How Airflow can orchestrate AI-driven pipelines for handling unstructured and semi-structured data.
  • Techniques for automating connection analysis to support investigative journalism.
  • Lessons from our experience working with journalists to develop data-driven storytelling and storyfinding capabilities.

Deadline Alerts in Airflow 3.1
15:00 - 15:25
By Dennis Ferruzzi
Track: Airflow 3
Room: Columbia A

Do you have a DAG that needs to be done by a certain time? Have you tried to use Airflow 2’s SLA feature and found it restrictive or complicated? You aren’t alone! Come learn about the all-new Deadline Alerts feature in Airflow 3.1 which replaces SLA. We will discuss how Deadline Alerts work and how they improve on the retired SLA feature. Then we will look at some examples of workflows you can build with the new feature, including some of the callback options and how they work, and finally looking ahead to some future use-cases of using Deadlines for Tasks and even Assets.

Orchestrator of Orchestrators: Uniting Airflow Pipelines with Business Applications in Production
15:00 - 15:25
By Basil Faruqui
Track: Sponsored
Room: Columbia D

Airflow powers thousands of data and ML pipelines—but in the enterprise, these pipelines often need to interact with business-critical systems like ERPs, CRMs, and core banking platforms.

In this demo-driven session we will connect Airflow with Control-M from BMC and showcase how Airflow can participate in end-to-end workflows that span not just data platforms but also transactional business applications.

Session highlights

  • Trigger Airflow DAGs based on business events (e.g., invoice approvals, trade settlements)
  • Feed Airflow pipeline outputs into ERP systems (e.g., SAP) or CRMs (e.g., Salesforce)
  • Orchestrate multi-platform workflows from cloud to mainframe with SLA enforcement, dependency management, and centralized control.
  • Provide unified monitoring and auditing across data and application layers

Seamless Airflow Upgrades: Migrating from 2.x to 3
15:00 - 15:25
By Ankit Chaurasia
Track: Best practices
Room: Columbia C

Airflow 3 has officially arrived! In this session, we’ll start by discussing prerequisites for a smooth upgrade from Airflow 2.x to Airflow 3, including Airflow version requirements, removing deprecated SubDAGs, and backing up and cleaning your metadata database prior to migration. We’ll then explore the new CLI utility, airflow config update [--fix], for auto-applying configuration changes. We’ll demo cleaning old XCom data to speed up schema migration.

During this session, attendees will learn to verify and adapt their pipelines for Airflow 3 using a Ruff-based upgrade utility. I will demo running ruff check dag/ --select AIR301 to surface scheduling issues, inspecting fixes via ruff check dag/ --select AIR301 --show-fixes, and applying corrections with ruff check dag/ --select AIR301 --fix. We’ll also examine rules AIR302 for deprecated config and AIR303 for provider package migrations. By the end, your DAGs will pass all AIR3xx checks error-free.

Join this session for live demos and practical examples that will empower you to confidently upgrade, minimise downtime, and achieve optimal performance in Airflow 3.

Airflow Without Borders: A Journey into Internationalization (i18n)
15:45 - 16:10
By Shahar Epstein
Track: Airflow 3
Room: Columbia C

One of the exciting new features in Airflow 3 is internationalization (i18n), bringing multilingual support to the UI and making Airflow more accessible to users worldwide. This talk will highlight the UI changes made to support different languages, including locale-aware adjustments. We’ll discuss how translations are contributed and managed — including the use of LLMs to accelerate the process — and why human review remains an essential part of it. We’ll present the i18n policy designed to ensure long-term maintainability, along with the tooling developed to support it. Finally, you’ll learn how to get involved and contribute to Airflow’s global reach by translating or reviewing content in your language.

Designing Scalable Retrieval-Augmented Generation (RAG) Pipelines at SAP with Apache Airflow
15:45 - 16:10
By Sagar Sharma
Track: Use cases
Room: Columbia D

At SAP Business AI, we’ve transformed Retrieval-Augmented Generation (RAG) pipelines into enterprise-grade powerhouses using Apache Airflow. Our Generative AI Foundations Team developed a cutting-edge system that effectively grounds Large Language Models (LLMs) with rich SAP enterprise data. Powering Joule for Consultants, our innovative AI copilot, this pipeline manages the seamless ingestion, sophisticated metadata enrichment, and efficient lifecycle management of over a million structured and unstructured documents. By leveraging Airflow’s Dynamic DAGs, TaskFlow API, XCom, and Kubernetes Event-Driven Autoscaling (KEDA), we achieved unprecedented scalability and flexibility. Join our session to discover actionable insights, innovative scaling strategies, and a forward-looking vision for Pipeline-as-a-Service, empowering seamless integration of customer-generated content into scalable AI workflows.

Ensuring Data Accuracy & Consistency with Airflow and dbt Tests
15:45 - 16:10
By Bao Nguyen
Track: Best practices
Room: Beckler

As analytics engineers, ensuring data accuracy and consistency is critical, but how do we systematically catch errors before they impact stakeholders? This session will explore how to integrate Airflow with dbt tests to build reliable and automated data validation workflows.

We’ll cover:

  • How to orchestrate dbt tests with Airflow DAGs for real-time data quality checks.
  • Handling test failures with alerting and retry strategies.
  • Using custom dbt tests for advanced validation beyond built-in checks.
  • Best practices for data observability, logging, and monitoring failed runs.
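
As a starting point for the first bullet above, here is a minimal orchestration sketch. The project paths are assumptions about your deployment, and the operator import path follows Airflow 3's standard provider:

```python
from datetime import datetime

from airflow.providers.standard.operators.bash import BashOperator  # airflow.operators.bash on Airflow 2
from airflow.sdk import DAG

with DAG(dag_id="dbt_build_and_test", schedule="@daily", start_date=datetime(2025, 1, 1)):
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --profiles-dir /opt/dbt",
        retries=0,  # a failing test should alert, not silently retry
    )
    dbt_run >> dbt_test
```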

Pittsburgh Goes With The Flow - Use Cases In Local Government
15:45 - 16:10
By Alida Laney
Track: Use cases
Room: Columbia A

The City of Pittsburgh utilizes Airflow (via Astronomer) for a wide variety of tasks. From employee-focused use cases, like time bank balancing and internal dashboards, to public-facing publication, the City’s data flows through our DAGs from many sources to many sources. Airflow acts as a funnel point and is an essential tool for Pittsburgh’s Data Services team.

Allegro’s Airflow Journey: From On-Prem to Cloud Orchestration at Scale
16:45 - 17:10
By Piotr Dziuba & Marek Gawinski
Track: Use cases
Room: Columbia A

This session will detail the Apache Airflow journey of Allegro, a leading e-commerce company in Poland. It will chart our evolution from a custom, on-premises Airflow-as-a-Service solution through a significant expansion to over 300 Cloud Composer instances in Google Cloud, culminating in Airflow becoming the core of our data processing. We orchestrate over 64,000 regular tasks spanning over 6,000 active DAGs on more than 200 Airflow instances, from feeding business-supporting dashboards to managing main data marts, handling ML pipelines, and more.

We will share our practical experiences, lessons learned, and the strategies employed to manage and scale this critical infrastructure. Furthermore, we will introduce our innovative economy-of-share approach for providing ready-to-use Airflow environments, significantly enhancing both user productivity and cost efficiency.

16:45 - 17:10.
By Naseem Shah
Track: Use cases
Room: Columbia D
LLM-Powered Review Analysis: Optimising Data Engineering using Airflow

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews.

What You’ll Learn:

  1. Cost-Efficient DAG Design: Decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups. Diving into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability.

  2. Integrating LLM Analysis: Explore how we integrated LLM-based analysis into our pipeline. Learn how we designed the database, queries, and ingestion to Postgres.

  3. Extending Airflow UI: We developed a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting.

  4. Leveraging Airflow REST API: Discover how we leveraged the API to trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs.

  5. CI/CD and Cost Management: Get practical tips for deploying DAGs with CI/CD.

16:45 - 17:10.
By Ryan Singman
Track: Best practices
Room: Beckler
Sustainable Computing in Airflow: Reducing Emissions with Carbon Aware Scheduling

As the climate impact of cloud computing grows, carbon-aware computing offers a promising way to cut emissions without compromising performance. By shifting workloads to times of lower carbon intensity on the power grid, we can achieve significant emissions reductions—often 10–30%—with no code changes to the underlying task.

In this talk, we’ll explore the principles behind carbon-aware computing, walk through how these ideas translate to actionable reductions in Airflow, and introduce the open-source CarbonAware provider for Airflow. We’ll also highlight how Airflow’s deferrable operators, task metadata, and flexible execution model make it uniquely well suited for temporal shifting based on grid carbon intensity.

Attendees will walk away with practical tools to make their workflows more sustainable—and a deeper understanding of how orchestration can play a role in climate-conscious engineering.
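
To make the temporal-shifting idea concrete, here is a hypothetical sketch of a gating sensor that holds downstream work until grid carbon intensity drops below a threshold. This is not the CarbonAware provider's actual API; the helper, region, and threshold are all assumptions.

  from datetime import datetime

  from airflow import DAG
  from airflow.sensors.python import PythonSensor

  def fetch_carbon_intensity(region: str) -> float:
      # Hypothetical helper: in practice, query a grid-intensity API
      # (e.g. WattTime or Electricity Maps) for the region's gCO2eq/kWh.
      return 180.0  # stubbed value for illustration

  def grid_is_clean() -> bool:
      return fetch_carbon_intensity("us-west-2") < 200  # assumed threshold

  with DAG(
      dag_id="carbon_aware_batch",
      start_date=datetime(2025, 1, 1),
      schedule="@daily",
      catchup=False,
  ) as dag:
      wait_for_clean_grid = PythonSensor(
          task_id="wait_for_low_carbon_intensity",
          python_callable=grid_is_clean,
          mode="reschedule",      # frees the worker slot between checks
          poke_interval=15 * 60,  # re-check every 15 minutes
          timeout=6 * 60 * 60,    # give up after 6 hours
      )
      # The actual workload would be chained after wait_for_clean_grid.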

8:50 - 9:00
Welcome
11:00 - 11:30
Coffee break
13:00 - 13:55
Lunch
15:35 - 16:00
Coffee break
09:00 - 11:00. Columbia A
By Amogh Desai, Ash Berlin-Taylor, Brent Bovenzi, Bugra Ozturk, Daniel Standish, Jed Cunningham, Jens Scheffler, Kaxil Naik, Pierre Jeambrun, Tzu-ping Chung, Vikram Koka, Vincent Beck & Constance Martineau
Track: keynote

Apache Airflow® 3 is here, bringing major improvements to data orchestration. In this keynote, core Airflow contributors will walk through key enhancements that boost flexibility, efficiency, and user experience.

Vikram Koka will kick things off with an overview of Airflow 3, followed by deep dives into DAG versioning (Jed Cunningham), enhanced backfilling (Daniel Standish), and a modernized UI (Brent Bovenzi & Pierre Jeambrun).

Next, Ash Berlin-Taylor, Kaxil Naik, and Amogh Desai will introduce the Task Execution Interface and Task SDK, enabling tasks in any environment and language. Jens Scheffler will showcase the Edge Executor, while Tzu-ping Chung and Vincent Beck will demo event-driven scheduling and data assets. Finally, Buğra Öztürk will unveil CLI enhancements for automation and debugging.

11:30 - 12:10. Columbia A
By Amogh Desai & Ash Berlin-Taylor
Track: Airflow 3

The Airflow v2 architecture has strong coupling between the Airflow core & the user code running in an Airflow task. This poses barriers in security, maintenance, and adoption. One such threat is that user code can access the source of truth of Airflow - the metadata DB - and run any query against it! From a scalability angle, ‘n’ tasks create ‘n’ DB connections, limiting Airflow’s ability to scale effectively.

To address this, we proposed AIP-72 – a client-server model for task execution. The new architecture addresses several long-standing issues, including DB isolation from workers, dependency conflicts between Airflow core & workers, and the ‘n’ DB connections problem. The new architecture has two parts:

11:30 - 12:10. Columbia C
By John Jackson
Track: Use cases

On March 13th, 2025, Amazon Web Services announced General Availability of Amazon SageMaker Unified Studio, bringing together AWS machine learning and analytics capabilities. At the heart of this next generation of Amazon SageMaker sits Apache Airflow. All SageMaker Unified Studio users have a personal, open-source Airflow deployment, running alongside their Jupyter notebook, enabling those users to easily develop Airflow DAGs that have unified access to all of their data.

In this talk, I will go into details around the motivations for choosing Airflow for this capability, the challenges with incorporating Airflow into such a large and diverse experience, the key role that open-source plays, how we’re leveraging GenAI to make that open source development experience better, and the goals for the future of Airflow in SageMaker Unified Studio.

11:30 - 12:10. Columbia D
By Rahul Gade & Arun Kumar
Track: Airflow & ...

Last year, we shared how LinkedIn’s continuous deployment platform (LCD) leveraged Apache Airflow to streamline and automate deployment workflows. LCD is the deployment platform inside LinkedIn, actively used by all 10,000+ engineers at LinkedIn.

This year, we take a deeper dive into the challenges, solutions, and engineering innovations that helped us scale Airflow to support thousands of concurrent tasks while maintaining usability and reliability.

Key Takeaways: Abstracting Airflow for a Better User Experience – How we designed a system where users could define and update their workflows without directly interacting with Airflow.

11:30 - 12:10. Beckler
By Tatiana Al-Chueyr Martins & Rahul Vats
Track: Best practices

As teams scale their Airflow workflows, a common question is: “My DAG has 5,000 tasks—how long will it take to run in Airflow?”

Beyond execution time, users often face challenges with dynamically generated DAGs, such as:

  • Delayed visualization in the Airflow UI after deployment.
  • High resource consumption, leading to Kubernetes pod evictions and out-of-memory errors.

While estimating the resource utilization in a distributed data platform is complex, benchmarking can provide crucial insights.

12:15 - 12:55. Columbia A
By Jed Cunningham & Ephraim Anierobi
Track: Airflow 3

Airflow 3 introduced a game-changing feature: Dag versioning.

Gone are the days of “latest only” Dags and confusing, inconsistent UI views when pipelines change mid-flight. This talk covers:

  • Visualizing Dag changes over time in the UI
  • How Dag code is versioned and can be grabbed from external sources
  • Executing a whole Dag run against the same code version
  • Dynamic Dags? Where do they fit in?!

You’ll see real-world scenarios, UI demos, and learn how these advancements will help avoid “Airflow amnesia”.

12:15 - 12:55. Columbia C
By Julien Le Dem & Zach Gottesman
Track: Use cases

Datadog is a world-class data platform ingesting more than 100 trillion events a day, providing real-time insights.

Before Airflow’s prominence, we built batch processing on Luigi, Spotify’s open-source orchestrator. As Airflow gained wide adoption, we evaluated adopting the major improvements of release 2.0, but opted for building our own orchestrator instead to realize our dataset-centric, event-driven vision.

Meanwhile, the 3.0 release aligned Airflow with the same vision we pursued internally, as a modern asset-driven orchestrator. It showed how futile building our own was compared to the momentum of the community. We evaluated several orchestrators and decided to join forces with the Airflow project.

12:15 - 12:55. Columbia D
By Przemek Wiech & Augusto Hidalgo
Track: Airflow 3

Apache Airflow 3 is the new state-of-the-art version of Airflow. For many users who plan to adopt Airflow 3, it’s important to understand how Airflow 3 behaves from a performance perspective compared to Airflow 2.

This presentation will share performance results for various Airflow 3 configurations and provide information that should give Airflow 3 adopters a good understanding of Airflow 3 performance.

The reference Airflow 3 configuration will use a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, and the tests will be performed on Google Cloud Platform. Performance tests will be run using the community version of the performance test framework, and there may be references to Cloud Composer (a managed service for Apache Airflow). The tests will be done in production-grade configurations that should serve as good references for Airflow community users.

12:15 - 12:55. Beckler
By Sungji Yang & DaeHoon Song
Track: Best practices

In this talk, we will introduce the DAG Management Service (DMS), developed to address critical challenges in managing Airflow clusters. With over 10,000 active DAGs, a single Airflow cluster faces scaling limits and noisy neighbor issues, impacting task scheduling SLAs. DMS enhances reliability by distributing DAGs across multiple clusters and enforcing proper configurations.

We will also discuss how DMS streamlines Airflow version upgrades. Upgrading from an old Airflow version to the latest requires sequential updates and code modifications for over 10,000 DAGs. DMS proposes an efficient upgrade method, reducing dependency on users.

14:00 - 14:25. Columbia A
By Vincent Beck
Track: Airflow 3

Airflow 3 introduces a major evolution in orchestration: native support for external event-driven scheduling. In this talk, I’ll share the journey behind AIP-82—why we needed it, how we built it, and what it unlocks. I’ll dive into how the new AssetWatcher enables pipelines to respond immediately to events like file arrivals, API calls, or pub/sub messages. You’ll see how this drastically reduces latency and infrastructure overhead while improving reactivity and resource efficiency. We’ll explore how it works under the hood, real-world use cases, best practices, and migration tips for teams ready to shift from time-based to event-driven workflows. If you’re looking to make your Airflow DAGs more dynamic, this is the talk that shows you how. Whether you’re an operator or contributor, you’ll walk away with a deep understanding of one of Airflow 3’s most impactful features.
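
For a taste of what this looks like for DAG authors, here is a minimal sketch along the lines of the Airflow 3 documentation, assuming the common messaging provider is installed; the queue URL is a placeholder.

  from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
  from airflow.sdk import DAG, Asset, AssetWatcher, task

  # Fires whenever a message lands on the (placeholder) SQS queue
  trigger = MessageQueueTrigger(queue="https://sqs.us-east-1.amazonaws.com/0123456789/my-queue")
  asset = Asset("sqs_queue_asset", watchers=[AssetWatcher(name="sqs_watcher", trigger=trigger)])

  with DAG(dag_id="event_driven_dag", schedule=[asset]) as dag:
      @task
      def process_event():
          print("Triggered by an external event, not a clock")

      process_event()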

14:00 - 14:25. Columbia C
By Yuan Luo & Huiliang Zhang
Track: Use cases

iKang Healthcare Group, serving nearly 10 million patients annually, built a centralized healthcare data hub powered by Apache Airflow to support its large-scale, real-time clinical operations. The platform integrates batch and streaming data in a lakehouse architecture, orchestrating complex workflows from data ingestion (HL7/FHIR) to clinical decision support.

Healthcare data’s inherent complexity—spanning structured lab results to unstructured clinical notes—requires dynamic, reliable orchestration. iKang uses Airflow’s DAGs, extensibility, and workflow-as-code capabilities to address challenges like multi-system coordination, semantic data linking, and fault-tolerant automation.

14:00 - 14:25. Columbia D
By Jon Hiett
Track: Sponsored

As Apache Airflow adoption accelerates for data pipeline orchestration, integrating it effectively into your enterprise’s Automation Center of Excellence (CoE) is crucial for maximizing ROI, ensuring governance, and standardizing best practices. This session explores common challenges faced when bringing specialized tools like Airflow into a broader CoE framework. We’ll demonstrate how leveraging enterprise automation platforms like Automic Automation can simplify this integration by providing centralized orchestration, standardized lifecycle management, and unified auditing for Airflow DAGs alongside other enterprise workloads. Furthermore, discover how Automation Analytics & Intelligence (AAI) can offer the CoE a single pane of glass for monitoring performance, tracking SLAs, and proving the business value of Airflow initiatives within the complete automation landscape. Learn practical strategies to ensure Airflow becomes a well-governed, high-performing component of your overall automation strategy.

14:00 - 14:25. Beckler
By Maggie Stark & Marion Azoulai
Track: Best practices

Ensuring high-quality data is essential for building user trust and enabling data teams to work efficiently. In this talk, we’ll explore how the Astronomer data team leverages Airflow to uphold data quality across complex pipelines; minimizing firefighting and maximizing confidence in reported metrics.

Maintaining data quality requires a multi-faceted approach: safeguarding the integrity of source data, orchestrating pipelines reliably, writing robust code, and maintaining consistency in outputs. We’ve embedded data quality into the developer experience, so it’s always at the forefront instead of in the backlog of tech debt.

14:30 - 14:55. Columbia A
By Christos Bisias
Track: Airflow 3

Using OpenTelemetry tracing, users can gain full visibility into tasks and calls to outside services. This is increasingly important, especially as tasks in an Airflow DAG involve multiple complex computations which take hours or days to complete. Airflow allows users to easily monitor how long entire DAG runs or individual tasks take, but it leaves internal actions opaque. OpenTelemetry gives users much more operational awareness and metrics they can use to improve operations.

14:30 - 14:55. Columbia C
By Chinni Krishna Abburi
Track: Use cases

In today’s fast-paced business world, timely and reliable insights are crucial — but manual BI workflows can’t keep up. This session offers a practical guide to automating business intelligence processes using Apache Airflow. We’ll walk through real-world examples of automating data extraction, transformation, dashboard refreshes, and report distribution. Learn how to design DAGs that align with business SLAs, trigger workflows based on events, integrate with popular BI tools like Tableau and Power BI, and implement alerting and failure recovery mechanisms. Whether you’re new to Airflow or looking to scale your BI operations, this session will equip you with actionable strategies to save time, reduce errors, and supercharge your organization’s decision-making capabilities.

14:30 - 14:55. Columbia D
By Denny Lee
Track: Sponsored

As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment.

In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines.

14:30 - 14:55. Beckler
By Snir Israeli
Track: Use cases

Maintaining consistency, code quality, and best practices for writing Airflow DAGs between teams and individual developers can be a significant challenge. Trying to achieve it using manual code reviews is both time-consuming and error-prone.

To solve this at Next, we decided to build a custom, internally developed linting tool for Airflow DAGs to help us evaluate their quality and uniformity. We call it DAGLint.

In this talk I am going to share why we chose to implement it, how we built it, and how we use it to elevate our code quality and standards throughout the entire Data engineering group.

15:00 - 15:25. Columbia A
By Dennis Ferruzzi
Track: Airflow 3

Do you have a DAG that needs to be done by a certain time? Have you tried to use Airflow 2’s SLA feature and found it restrictive or complicated? You aren’t alone! Come learn about the all-new Deadline Alerts feature in Airflow 3.1, which replaces SLAs. We will discuss how Deadline Alerts work and how they improve on the retired SLA feature. Then we will look at some examples of workflows you can build with the new feature, including some of the callback options and how they work, and finally look ahead to future use cases for Deadlines on Tasks and even Assets.

15:00 - 15:25. Columbia C
By Ankit Chaurasia
Track: Best practices

Airflow 3 has officially arrived! In this session, we’ll start by discussing prerequisites for a smooth upgrade from Airflow 2.x to Airflow 3, including Airflow version requirements, removing deprecated SubDAGs, and backing up and cleaning your metadata database prior to migration. We’ll then explore the new CLI utility, airflow config update [--fix], for auto-applying configuration changes. We’ll demo cleaning old XCom data to speed up schema migration.

During this session, attendees will learn to verify and adapt their pipelines for Airflow 3 using a Ruff-based upgrade utility. I will demo running ruff check dag/ --select AIR301 to surface scheduling issues, inspecting fixes via ruff check dag/ --select AIR301 --show-fixes, and applying corrections with ruff check dag/ --select AIR301 --fix. We’ll also examine rules AIR302 for deprecated config and AIR303 for provider package migrations. By the end, your DAGs will pass all AIR3xx checks error-free.

15:00 - 15:25. Columbia D
By Basil Faruqui
Track: Sponsored

Airflow powers thousands of data and ML pipelines—but in the enterprise, these pipelines often need to interact with business-critical systems like ERPs, CRMs, and core banking platforms.

In this demo-driven session we will connect Airflow with Control-M from BMC and showcase how Airflow can participate in end-to-end workflows that span not just data platforms but also transactional business applications.

Session highlights

  • Trigger Airflow DAGs based on business events (e.g., invoice approvals, trade settlements)
  • Feed Airflow pipeline outputs into ERP systems (e.g., SAP) or CRMs (e.g., Salesforce)
  • Orchestrate multi-platform workflows from cloud to mainframe with SLA enforcement, dependency management, and centralized control
  • Provide unified monitoring and auditing across data and application layers

15:00 - 15:25. Beckler
By Zdravko Hvarlingov & Ivan Nikolov
Track: Use cases

Investigative journalism often relies on uncovering hidden patterns in vast amounts of unstructured and semi-structured data. At the FT, we leverage Airflow to orchestrate AI-powered pipelines that transform complex, fragmented datasets into structured insights. Our Storyfinding team works closely with journalists to automate tedious data processing, enabling them to tell stories that might otherwise go untold.

This talk will explore how we use Airflow to process and analyze text, documents, and other difficult-to-structure data sources combining AI, machine learning, and advanced computational techniques to extract meaningful entities, relationships, and patterns. We’ll also showcase our connection analysis workflows, which link various datasets to reveal previously hidden chains of people and companies, a crucial capability for investigative reporting.

15:45 - 16:10. Columbia A
By Alida Laney
Track: Use cases

The City of Pittsburgh utilizes Airflow (via Astronomer) for a wide variety of tasks. From employee-focused use cases, like time bank balancing and internal dashboards, to public-facing publication, the City’s data flows through our DAGs from many sources to many sources. Airflow acts as a funnel point and is an essential tool for Pittsburgh’s Data Services team.

15:45 - 16:10. Columbia C
By Shahar Epstein
Track: Airflow 3

One of the exciting new features in Airflow 3 is internationalization (i18n), bringing multilingual support to the UI and making Airflow more accessible to users worldwide. This talk will highlight the UI changes made to support different languages, including locale-aware adjustments. We’ll discuss how translations are contributed and managed — including the use of LLMs to accelerate the process — and why human review remains an essential part of it. We’ll present the i18n policy designed to ensure long-term maintainability, along with the tooling developed to support it. Finally, you’ll learn how to get involved and contribute to Airflow’s global reach by translating or reviewing content in your language.

15:45 - 16:10. Columbia D
By Sagar Sharma
Track: Use cases

At SAP Business AI, we’ve transformed Retrieval-Augmented Generation (RAG) pipelines into enterprise-grade powerhouses using Apache Airflow. Our Generative AI Foundations Team developed a cutting-edge system that effectively grounds Large Language Models (LLMs) with rich SAP enterprise data. Powering Joule for Consultants, our innovative AI copilot, this pipeline manages the seamless ingestion, sophisticated metadata enrichment, and efficient lifecycle management of over a million structured and unstructured documents. By leveraging Airflow’s Dynamic DAGs, TaskFlow API, XCom, and Kubernetes Event-Driven Autoscaling (KEDA), we achieved unprecedented scalability and flexibility. Join our session to discover actionable insights, innovative scaling strategies, and a forward-looking vision for Pipeline-as-a-Service, empowering seamless integration of customer-generated content into scalable AI workflows.

15:45 - 16:10. Beckler
By Bao Nguyen
Track: Best practices

As analytics engineers, ensuring data accuracy and consistency is critical, but how do we systematically catch errors before they impact stakeholders? This session will explore how to integrate Airflow with dbt tests to build reliable and automated data validation workflows.

We’ll cover:

  • How to orchestrate dbt tests with Airflow DAGs for real-time data quality checks.
  • Handling test failures with alerting and retry strategies.
  • Using custom dbt tests for advanced validation beyond built-in checks.
  • Best practices for data observability, logging, and monitoring failed runs.
14:15 - 14:40. Columbia A
By Justin Wang & Saurabh Gupta
Track: Use cases

The workflow orchestration team at Zoox aims to build a solution for orchestrating heterogeneous workflows encompassing data, ML, and QA pipelines. We have encountered two primary challenges: first, the steep learning curve for new Airflow users and the need for a user-friendly yet scalable development process; second, integrating and migrating existing pipelines with established solutions.

This presentation will detail our approach, as a small team at Zoox, to address these challenges. The discussion will cover the scope and scale of Airflow within Zoox, including current applications and future directions. Furthermore, we will share our strategies for simplifying the Airflow DAG creation process and enhancing user experience. Finally, we will present a case study illustrating the onboarding of a heterogeneous workflow across Databricks, AWS, and a Zoox in-house platform to manage both on-prem and cloud services.

14:15 - 14:40. Columbia C
By Jens Scheffler & Daniel Wolf
Track: Airflow 3

Airflow 3 extends the deployment options to run your workload anywhere. You don’t need to bring your data to Airflow; instead, you can bring the execution to where it needs to be.

You can connect any cloud and on-prem location together and generate a hybrid workflow from one central Airflow instance. Only an HTTP connection is needed.

We will present the use cases and concepts of the Edge deployment and how it works, including in a hybrid setup with Celery or other executors.

14:15 - 14:40. Columbia D
By Kyle McCluskey
Track: Use cases

Discover how Apache Airflow powers scalable ELT pipelines, enabling seamless data ingestion, transformation, and machine learning-driven insights. This session will walk through:

Automating Data Ingestion: Using Airflow to orchestrate raw data ingestion from third-party sources into your data lake (S3, GCP), ensuring a steady pipeline of high-quality training and prediction data.

Optimizing Transformations with Serverless Computing: Offloading intensive transformations to serverless functions (GCP Cloud Run, AWS Lambda) and machine learning models (BigQuery ML, Sagemaker), integrating their outputs seamlessly into Airflow workflows.

14:15 - 14:40. Beckler
By William Orgertrice
Track: Best practices

Ready to take your DAGs in Apache Airflow to the next level? Join this insightful session where we’ll uncover five transformative strategies to enhance your data workflows. Whether you’re a data engineering pro or just getting started, this presentation is packed with practical tips and actionable insights that you can apply right away.

We’ll dive into the magic of using powerful libraries like Pandas, share techniques to trim down data volumes for faster processing, and highlight the importance of modularizing your code for easier maintenance. Plus, you’ll discover efficient ways to monitor and debug your DAGs, and how to make the most of Airflow’s built-in features.

16:45 - 17:10. Columbia A
By Piotr Dziuba & Marek Gawinski
Track: Use cases

This session will detail the Apache Airflow journey of Allegro, a leading e-commerce company in Poland. It will chart our evolution from a custom, on-premises Airflow-as-a-Service solution through a significant expansion to over 300 Cloud Composer instances in Google Cloud, culminating in Airflow becoming the core of our data processing. We orchestrate over 64,000 regular tasks spanning over 6,000 active DAGs on more than 200 Airflow instances, feeding business-supporting dashboards, managing main data marts, handling ML pipelines, and more.

16:45 - 17:10. Columbia D
By Naseem Shah
Track: Use cases

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews.

What You’ll Learn:

  1. Cost-Efficient DAG Design: Decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups. Diving into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability.

16:45 - 17:10. Beckler
By Ryan Singman
Track: Best practices

As the climate impact of cloud computing grows, carbon-aware computing offers a promising way to cut emissions without compromising performance. By shifting workloads to times of lower carbon intensity on the power grid, we can achieve significant emissions reductions—often 10–30%—with no code changes to the underlying task.

In this talk, we’ll explore the principles behind carbon-aware computing, walk through how these ideas translate to actionable reductions in Airflow, and introduce the open-source CarbonAware provider for Airflow. We’ll also highlight how Airflow’s deferrable operators, task metadata, and flexible execution model make it uniquely well suited for temporal shifting based on grid carbon intensity.

Wednesday, October 8, 2025

09:00
09:30
Keynote TBC
10:00
Coffee break
10:30
Sponsored talk
11:00
Sponsored talk
12:00
Sponsored talk
12:30
13:00
Lunch
14:00
Workshop TBD
14:30
15:00
15:30
Coffee break
15:45
16:15
16:45
09:10 - 09:25.
By Tala Karadsheh
Track: Use cases
Room: Columbia A
From DAGs to Insights: Business-Driven Airflow Use Cases

Airflow is integral to GitHub’s data and insight generation. This session dives into use cases from GitHub where key business decisions are driven, at the root, with the help of Airflow. The session will also highlight how both GitHub and Airflow celebrate, promote, and nurture OSS innovations in their own ways.

10:30 - 11:10.
By M Waqas Shahid
Track: Airflow 3
Room: Columbia D
Airflow 3 - An Open Heart Surgery

Curious how code truly flows inside Airflow? Join me for a unique visualisation journey into Airflow’s inner workings (first of its kind) — code blocks and modules called when certain operations are running.

A walkthrough that unveils task execution, observability, and debugging like never before. You’ll see Airflow scaling in action, with a performance comparison between Airflow 3 and 2. This session will demystify Airflow’s architecture, showcasing real-time task flows and the heartbeat of pipelines in action.

Perfect for engineers looking to optimize workflows, troubleshoot efficiently, and gain a new perspective on Airflow’s powerful upgraded core. See Airflow running live with detailed insights and unlock the secrets to better pipeline management!

10:30 - 11:10.
By Miquel Angel Andreu Febrer
Track: Use cases
Room: Beckler
Dynamic Data Pipelines with DBT and Airflow

This session showcases Okta’s innovative approach to data pipeline orchestration with dbt and Airflow. We’ll show how we’ve implemented dynamically generated Airflow DAG workflows based on dbt’s dependency graph. This allows us to enforce strict data quality standards by automatically executing downstream model tests before upstream model deployments, effectively preventing error cascades. The entire CI/CD pipeline, from dbt model changes to production DAG deployment, is fully automated. The result? Accelerated development cycles, reduced operational overhead, and bulletproof data reliability.

10:30 - 13:00.
By Kenten Danas
Track: Workshop
Room: 301
Get started with Airflow 3.0

Airflow 3.0 is the most significant release in the project’s history, and brings a better user experience, stronger security, and the ability to run tasks anywhere, at any time. In this workshop, you’ll get hands-on experience with the new release and learn how to leverage new features like DAG versioning, backfills, data assets, and a new React-based UI.

Whether you’re writing traditional ELT/ETL pipelines or complex ML and GenAI workflows, you’ll learn how Airflow 3 will make your day-to-day work smoother and your pipelines even more flexible. This workshop is suitable for intermediate to advanced Airflow users. Beginning users should consider taking the Airflow fundamentals course on the Astronomer Academy before attending this workshop.

10:30 - 13:00.
By Vinod Jayendra, Suba Palanisamy, Sean Bjurstrom & Anurag Srivastava
Track: Workshop
Room: 305
Orchestrating Apache Airflow ML Workflows at Scale with SageMaker Unified Studio

As organizations increasingly rely on data-driven applications, managing the diverse tools, data, and teams involved can create challenges. Amazon SageMaker Unified Studio addresses this by providing an integrated, governed platform to orchestrate end-to-end data and AI/ML workflows.

In this workshop, we’ll explore how to leverage Amazon SageMaker Unified Studio to build and deploy scalable Apache Airflow workflows that span the data and AI/ML lifecycle. We’ll walk through real-world examples showcasing how this AWS service brings together familiar Airflow capabilities with SageMaker’s data processing, model training, and inference features - all within a unified, collaborative workspace.

Key topics covered:

  • Authoring and scheduling Airflow DAGs in SageMaker Unified Studio
  • Understanding how Apache Airflow powers workflow orchestration under the hood
  • Leveraging SageMaker capabilities like Notebooks, Data Wrangler, and Models
  • Implementing centralized governance and workflow monitoring
  • Enhancing productivity through unified development environments

Join us to transform your ML workflow experience from complex and fragmented to streamlined and efficient.

10:30 - 11:10.
By Amogh Desai, Jarek Potiuk & Pavan kumar Gopidesu
Track: Community
Room: Columbia C
The Secret to Airflow's Evergreen Build: CI/CD magic

Have you ever wondered why Apache Airflow builds are asymptotically(*) green? That striving for a “perennial green build” is not magic; it’s the result of continuous, often unseen engineering effort within our CI/CD pipelines & dev environments. This dedication ensures that maintainers can work efficiently & contributors can onboard smoothly.

To tackle the ever-growing contributor base, we have a CI/CD team run by volunteers putting in significant work on the foundational tooling. In this talk, we reveal some of the innovative solutions we have implemented, like:

  • Handling GitHub Actions pull_request_target challenges
  • Restructuring the repo for better clarity
  • Slack bot for CI failure alerts
  • A cherry picker workflow for releases
  • Pre-commit hooks
  • Faster website and image builds
  • Tackling the new GitHub API rate limits
  • Solving chicken-and-egg build issues during releases

Join us to understand the “why” & “how” behind these infra components. You’ll gain insights into the continuous effort required to support a thriving open-source project like Airflow and, hopefully, be inspired to contribute to these areas.

(*) asymptotically = we fix failures as quickly as we can when they happen

10:30 - 13:00.
By Mike Ellis
Track: Workshop
Room: 306
Unleash Airflow's Potential with hands-on Performance Optimization

This interactive workshop session empowers you to unlock the full potential of Apache Airflow through performance optimization techniques. Gain hands-on experience identifying performance bottlenecks and implementing best practices to overcome them.

11:15 - 12:00.
By Kaxil Naik
Track: Roadmap
Room: Columbia A
Airflow as an AI Agent's Toolkit: Unlocking 1000+ Integrations with MCP

AI agents transform conversational prompts into actionable automation, provided they have reliable access to essential tools like data warehouses, cloud storage, and APIs.

Now imagine exposing Airflow’s rich integration layer directly to AI agents via the emerging Model Context Protocol (MCP). This isn’t just gluing AI into Airflow; it’s turning Airflow into a structured execution layer for adaptive, agentic logic with full observability, retries, and audit trails built in.

We’ll demonstrate a real-world fraud detection pipeline powered by agents: suspicious transactions are analyzed, enriched dynamically with external customer data via MCP, and escalated based on validated, structured outputs. Every prompt, decision, and action is auditable and compliant.

We will then explore how Airflow can be extended into a conversational future - such as querying Snowflake from natural language, inspecting AWS S3 files, or executing BigQuery operations directly via agent prompts.

Explore the next potential evolution of Airflow - going beyond scheduling DAGs and empowering conversational AI agents with a toolkit of over 1,000 integrations you already use and trust.

11:15 - 12:00.
By Sumit Maheshwari
Track: Use cases
Room: Columbia C
Operation Airlift: Uber's ongoing journey of migrating 200K pipelines to a single Airflow3 instance

Yes, you read that right — 200,000 pipelines, nearly 1 million task executions per day, all powered by a single Airflow instance.

In this session, we’ll take you behind the scenes of one of the boldest orchestration projects ever attempted: how Uber’s data platform team is executing what might be the largest Apache Airflow migration in history — and doing it straight to Airflow 3.

From scaling challenges and architectural choices to lessons learned in high-throughput orchestration, this is a deep dive into the tech, the chaos, and the strategy behind making data fly at unprecedented scale.

11:15 - 12:00.
By Bugra Ozturk
Track: Airflow 3
Room: Beckler
Securing Airflow CLI with API

This talk will explore the key changes introduced by AIP-81, focusing on security enhancements and user experience improvements across the entire software development lifecycle. We will break down the technical advancements from both a security and usability perspective, addressing key questions for Apache Airflow users of all levels. Topics include, but are not limited to: isolating CLI communication to enhance security by leveraging Role-Based Access Control (RBAC) within the API for secure database interactions, clearly defining local vs. remote command execution, and future improvements.

12:00 - 12:25.
By Andres Astorga Espriella & Soren Archibald
Track: Airflow & ...
Room: Beckler
Agentic AI Automating Semantic Layer Updates with Airflow 3

In today’s dynamic data environments, tables and schemas are constantly evolving and keeping semantic layers up to date has become a critical operational challenge. Manual updates don’t scale, and delays can quickly lead to broken dashboards, failed pipelines, and lost trust.

We’ll show how to harness Apache Airflow 3 and its new event-driven scheduling capabilities to automate the entire lifecycle: detecting table and schema changes in real time, parsing and interpreting those changes, and shifting left the updating of semantic models across dbt, Looker, or custom metadata layers. AI agents will add intelligence and automation that rationalize schema diffs, assess the impact of changes, and propose targeted updates to semantic layers, reducing manual work and minimizing the risk of errors.

We’ll dive into strategies for efficient change detection, safe incremental updates, and orchestrating workflows where humans collaborate with AI agents to validate and deploy changes.

By the end of the session, you’ll understand how to build resilient, self-healing semantic layers that minimize downtime, reduce manual intervention, and scale effortlessly across fast-changing data environments.

12:00 - 12:25.
By Pankaj Koti, Tatiana Al-Chueyr Martins & Pankaj Singh
Track: Airflow & ...
Room: Columbia A
Boosting dbt-core workflows performance with Airflow's Deferrable capabilities

Efficiently handling long-running workflows is crucial for scaling modern data pipelines. Apache Airflow’s deferrable operators help offload tasks during idle periods — freeing worker slots while tracking progress.

This session explores how Cosmos 1.9 (https://github.com/astronomer/astronomer-cosmos) integrates Airflow’s deferrable capabilities to enhance orchestrating dbt (https://github.com/dbt-labs/dbt-core) in production, with insights from recent contributions that introduced this functionality.

Key takeaways:

  • Deferrable Operators: How they work and why they’re ideal for long-running dbt tasks.
  • Integrating with Cosmos: Refactoring and enhancements to enable deferrable behaviour across platforms.
  • Performance Gains: Resource savings and task throughput improvements from deferrable execution.
  • Challenges & Future Enhancements: Lessons learned, compatibility, and ideas for broader support.

Whether orchestrating dbt models on a cloud warehouse or managing large-scale transformations, this session offers practical strategies to reduce resource contention and boost pipeline performance.
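
To ground the mechanism, here is a generic sketch of the deferral pattern itself (not Cosmos-specific code): a deferrable operator hands its wait over to the triggerer process and frees its worker slot until the trigger fires.

  from datetime import timedelta

  from airflow.models.baseoperator import BaseOperator
  from airflow.triggers.temporal import TimeDeltaTrigger

  class WaitThenContinue(BaseOperator):
      """Toy deferrable operator: waits without holding a worker slot."""

      def execute(self, context):
          # Hand off to the triggerer; no worker is consumed while waiting.
          self.defer(
              trigger=TimeDeltaTrigger(timedelta(minutes=30)),
              method_name="resume",
          )

      def resume(self, context, event=None):
          self.log.info("Trigger fired; resuming on a worker slot.")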

12:00 - 12:25.
By Belle Romea
Track: Use cases
Room: Columbia A
Creating DuoFactory: An Orchestration Ecosystem with Airflow

Duolingo has built an internal tool, DuoFactory, to orchestrate AI-generated content using Airflow. The tool has been used to generate example sentences per lesson, math exercises, and Duoradio lessons. The ecosystem is flexible for various company needs. Some of these use cases involve end-to-end generation, where one click of a button generates content in-app. We have also created a Workflow Builder to orchestrate and iterate on generative AI workflows by creating one-time DAG instances, with a UI easy enough for non-engineers to use.

12:30 - 12:55.
By Kengo Seki
Track: Airflow & ...
Room: Beckler
Airflow & Bigtop: Modernize and integrate time-proven OSS stack with Apache Airflow

Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over such a long period, and recently its job scheduler was removed, mainly due to the inactivity of its development. The speaker believes that Airflow fits this gap perfectly and is proposing incorporating it into the Bigtop stack. This presentation will introduce how easily users can build a data platform with Bigtop including Airflow, and how Airflow can integrate that software through its wide range of providers and enterprise-readiness features such as Kerberos support.

12:30 - 12:55.
By Jens Scheffler, Brent Bovenzi & Pierre Jeambrun
Track: Airflow 3
Room: Columbia A
Airflow 3 UI is not enough? Add a Plugin!

In Airflow 2 there was a plugin mechanism to extend the UI with new functions, as well as to add hooks and other features.

As Airflow 3 rewrote the UI, old plugins no longer work in all cases. Airflow 3.1 now provides a revamped option to extend the UI with a new plugin schema based on native React components and embedded iframes, following the AIP-68 definitions.

In this session we will provide an overview of the capabilities and an introduction to rolling your own.

12:30 - 12:55.
By Nick Bilozerov, Daniel Melchor & Sabrina Liu
Track: Use cases
Room: Columbia D
Do you trust Airflow with your money? (We do!)

Airflow is wonderfully, frustratingly complex - and so is global finance! Stripe has very specific needs all over the planet, and we have customized Airflow to adapt to the variety and rigor that we need to grow the GDP of the internet.

In this talk, you’ll learn:

  • How we support independent DAG change management for over 500 different teams running over 150k tasks.

  • How we’ve customized Airflow’s Kubernetes integration to comply with Stripe’s unique compliance requirements.

  • How we’ve built on Airflow to support no-code data pipelines.

12:30 - 12:55.
By Ryan Hatter
Track: Use cases
Room: Columbia C
LLMOps with Airflow 3.0 and the Airflow AI SDK

Airflow 3 brings several exciting new features that better support MLOps:

  • Native, intuitive backfills
  • Removal of the unique execution date for dag runs
  • Native support for event-driven scheduling

These features, combined with the Airflow AI SDK, enable dag authors to easily build scalable, maintainable, and performant LLMOps pipelines.

In this talk, we’ll go through a series of workflows that use the Airflow AI SDK to empower Astronomer’s support staff to more quickly resolve problems faced by Astronomer’s customers.

14:00 - 14:25.
By Ankit Chaurasia & Rahul Vats
Track: Airflow 3
Room: Columbia D
Beyond Execution Dates: Empowering inference execution and hyper-parameter tuning with Airflow 3

In legacy Airflow 2.x, each DAG run was tied to a unique “execution_date.” By removing this requirement, Airflow can now directly support a variety of new use cases, such as model training and generative AI inference, without the need for hacks and workarounds typically used by machine learning and AI engineers.

In this talk, we will delve into the significant advancements in Airflow 3 that enable GenAI and MLOps use cases, particularly through the changes outlined in AIP 83. We’ll cover key changes like the renaming of “execution_date” to “logical_date,” along with the allowance for it to be null, and the introduction of the new “run_after” field which provides a more meaningful mechanism for scheduling and sorting. Furthermore, we’ll discuss how by removing the uniqueness constraint, Airflow 3 enables multiple parallel runs, empowering diverse triggering mechanisms and easing backfill logic with a real-world demo.
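
As a small illustration of what this unlocks (a sketch assuming Airflow 3's Task SDK; field access is as described in AIP-83), an on-demand inference DAG can now have many overlapping runs, with logical_date possibly null and run_after always populated:

  from airflow.sdk import dag, task

  @dag(schedule=None)  # triggered on demand, e.g. via the REST API; runs may overlap
  def run_inference():
      @task
      def infer(dag_run=None):
          # logical_date may be None for triggered runs; run_after is always set
          print("logical_date:", dag_run.logical_date, "run_after:", dag_run.run_after)

      infer()

  run_inference()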

14:00 - 16:30.
By Eugene Kosteev
Track: Workshop
Room: 30
Cloud Composer: Introduction into Advanced Features

During this workshop you are going to learn about the latest features published in Cloud Composer, a managed service for Apache Airflow on Google Cloud Platform.

14:00 - 14:25.
By Andrea Bombino & Nawfel Bacha
Track: Airflow 3
Room: Columbia C
10/08/2025 2:00 PM 10/08/2025 2:25 PM America/Los_Angeles AS24: Event-Driven Airflow 3.0: Real-Time Orchestration with Pub/Sub

Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives—eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We’ll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we’ll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency.

Key Takeaways:

  • How event-driven DAGs work vs. traditional scheduling.
  • Best practices for integrating Airflow with Pub/Sub.
  • Eliminating polling-based sensors for efficiency.
  • Live demo: an event-driven pipeline with Airflow 3.0, Pub/Sub & dbt.

This session will showcase how Airflow 3.0 enables truly real-time orchestration.
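
As a rough sketch of the mechanics (ours, not the speakers’ demo): Airflow 3 attaches a trigger-backed watcher to an Asset and schedules the DAG on that asset. The queue URI below is hypothetical, and the exact trigger class to use for Pub/Sub depends on your provider versions; the common messaging provider’s MessageQueueTrigger is shown as a stand-in.

    from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
    from airflow.sdk import Asset, AssetWatcher, dag, task

    # Hypothetical subscription URI; the scheme depends on the messaging provider.
    trigger = MessageQueueTrigger(queue="pubsub://projects/my-project/subscriptions/new-data")

    new_data = Asset("new_data", watchers=[AssetWatcher(name="pubsub_watcher", trigger=trigger)])

    @dag(schedule=[new_data])  # runs when a message arrives, not on a clock
    def refresh_models():
        @task
        def run_dbt():
            ...  # start dbt transformations only now that fresh data exists

        run_dbt()

    refresh_models()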

14:00 - 14:25.
By Oleksandr Slynko
Track: Use cases
Room: Columbia A
10/08/2025 2:00 PM 10/08/2025 2:25 PM America/Los_Angeles AS24: GitHub's Airflow Journey: Lessons, Mistakes, and Insights

This session explores how GitHub uses Apache Airflow for efficient data engineering. We will share nearly 9 years of experiences, including lessons learnt, mistakes made, and the ways we reduced our on-call and engineering burden. We’ll demonstrate how we keep data flowing smoothly while continuously evolving Airflow and other components of our data platform, ensuring safety and reliability. The session will touch on how we migrate Airflow between clouds without user impact. We’ll also cover how we cut down the time from idea to running a DAG in production, despite our Airflow repo being among the top 15 by number of PRs within GitHub.

We’ll dive into specific techniques such as testing connections and operators, relying on dag-sync, providing short-lived development environments to let developers test their DAG runs, and creating reusable patterns for DAGs. By the end of this session, you will gain practical insights and actionable strategies to improve your own data engineering processes.

14:00 - 16:30.
By Philippe Gagnon
Track: Workshop
Room: 305
10/08/2025 2:00 PM 10/08/2025 4:30 PM America/Los_Angeles AS24: Implementing Operations Research Problems with Apache Airflow: From Modelling to Production

This workshop will provide an overview of implementing operations research problems using Apache Airflow. This is a hands-on session where attendees will gain experience creating DAGs to define and manage workflows for classical operations research problems. The workshop will include several examples of how Airflow can be used to optimize and automate various decision-making processes, including:

  • Inventory management: How to use Airflow to optimize inventory levels and reduce stockouts by analyzing demand patterns, lead times, and other factors.
  • Production planning: How to use Airflow to create optimized production schedules that minimize downtime, reduce costs, and increase throughput.
  • Logistics optimization: How to use Airflow to optimize transportation routes and other factors to improve the efficiency of logistics operations.

Attendees will come away with a solid understanding of using Airflow to automate decision-making processes with optimization solvers.

14:00 - 14:25.
By Ephraim Anierobi
Track: Airflow & ...
Room: Beckler
10/08/2025 2:00 PM 10/08/2025 2:25 PM America/Los_Angeles AS24: Seamless Integration: Building Applications That Leverage Airflow's Database Migration Framework

This session presents a comprehensive guide to building applications that integrate with Apache Airflow’s database migration system. We’ll explore how to harness Airflow’s robust Alembic-based migration toolchain to maintain schema compatibility between Airflow and custom applications, enabling developers to create solutions that evolve alongside the Airflow ecosystem without disruption.
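
Session specifics aside, one baseline pattern (a minimal sketch under our own assumptions, with a hypothetical alembic.ini) is to give your application’s tables their own Alembic environment and run it alongside Airflow’s migrations, so neither toolchain touches the other’s schema:

    from alembic import command
    from alembic.config import Config

    # Your application's own migration environment; Airflow's schema stays
    # managed separately by `airflow db migrate`.
    app_migrations = Config("alembic.ini")  # hypothetical path
    command.upgrade(app_migrations, "head")  # bring the app's tables to the latest revision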

14:30 - 14:55.
By Shubham Raj & Jens Scheffler
Track: Airflow intro/overview
Room: Columbia A
10/08/2025 2:30 PM 10/08/2025 2:55 PM America/Los_Angeles AS24: Airflow 3’s Trigger UI: Evolution of Params

Are you looking to build slick, dynamic trigger forms for your DAGs? It all starts with mastering params.

Params are the gold standard for adding execution options to your DAGs, allowing you to create dynamic, user-friendly trigger forms with descriptions, validation, and now, with Airflow 3, bidirectional support for conf data!

In this talk, we’ll break down how to use params effectively, share best practices, and explore what’s new since the 2023 Airflow Summit talk (https://airflowsummit.org/sessions/2023/flexible-dag-trigger-forms-aip-50/). If you want to make DAG execution more flexible, intuitive, and powerful, this session is a must-attend!
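
For reference ahead of the talk, a minimal sketch of a params-driven trigger form (the Param import path assumes the Airflow 3 Task SDK): typed params with descriptions render as validated fields in the trigger UI.

    from airflow.sdk import Param, dag, task

    @dag(
        params={
            "environment": Param("staging", type="string", enum=["staging", "prod"],
                                 description="Rendered as a dropdown in the trigger form"),
            "batch_size": Param(500, type="integer", minimum=1, maximum=10_000),
        },
    )
    def parametrized_job():
        @task
        def run(params: dict | None = None):
            print(f"{params['environment']}: batch={params['batch_size']}")

        run()

    parametrized_job()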

14:30 - 14:55.
By Igor Kholopov
Track: Roadmap
Room: Columbia C
10/08/2025 2:30 PM 10/08/2025 2:55 PM America/Los_Angeles AS24: Beyond the bundle - evolving DAG parsing in Airflow 3

Airflow 3 made some great strides with AIP-66, introducing the concept of a DAG bundle. This successfully challenged one of the fundamental architectural limitations of the original Airflow design: how DAGs are deployed, bringing structure to something that often had to be operated as a pile of files in the past. However, we believe this should by no means be the end of the road when it comes to making DAG management easier, authoring more accessible to a broader audience, and integration with Data Agents smoother. We believe the next step in Airflow’s evolution is a native option to break away from the necessity of having a real file in the file systems of multiple components to have your DAG up and running. This is what we are hoping to achieve as part of AIP-85, extendable DAG parsing control. In this talk I’d like to give a detailed overview of how we want to make it happen and show examples of the valuable integrations we hope to unblock with it.

14:30 - 14:55.
By Aleksandr Shirokov, Roman Khomenko & Tarasov Alexey
Track: Airflow & ...
Room: Columbia D
10/08/2025 2:30 PM 10/08/2025 2:55 PM America/Los_Angeles AS24: Building an MLOps Platform for 300+ ML/DS Specialists on Top of Airflow

As your organization scales to 20+ data science teams and 300+ DS/ML/DE engineers, you face a critical challenge: how to build a secure, reliable, and scalable orchestration layer that supports both fast experimentation and stable production workflows. We chose Airflow — and didn’t regret it! But to make it truly work at our scale, we had to rethink its architecture from the ground up.

In this talk, we’ll share how we turned Airflow into a powerful MLOps platform through its core capability: running pipelines across multiple K8s GPU clusters from a single UI (!) using per-cluster worker pools. To support ease of use, we developed MLTool — our own library for fast and standardized DAG development, integrated Vault for secure secret management across teams, enabled real-time logging with S3 persistence and built a custom SparkSubmitOperator for Kerberos-authenticated Spark/Hadoop jobs in Kubernetes. We also streamlined the developer experience — users can generate a GitLab repo and deploy a versioned pipeline to prod in under 10 minutes!

We’re proud of what we’ve built — and our users are too. Now we want to share it with the world!

14:30 - 14:55.
By Ipsa Trivedi & Chirag Tailor
Track: Use cases
Room: Beckler
10/08/2025 2:30 PM 10/08/2025 2:55 PM America/Los_Angeles AS24: Data Quality and Observability with Airflow

Tekmetric is the largest cloud-based auto shop management system in the United States. We process vast amounts of data from various integrations with internal and external systems. Data quality and governance are crucial for both our internal operations and the success of our customers.

We leverage multi-step data processing pipelines using AWS services and Airflow. While we utilize traditional data pipeline workflows to manage and move data, we go beyond standard orchestration. After data is processed, we apply tailored quality checks for schema validation, record completeness, freshness, duplication and more.

In this talk, we’ll explore how Airflow allows us to enhance data observability. We’ll discuss how Airflow’s flexibility enables seamless integration and monitoring across different teams and datasets, ensuring reliable and accurate data at every stage.

This session will highlight how Tekmetric uses data quality governance and observability practices to drive business success through trusted data.
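
As one hedged example of what such checks can look like in Airflow (our sketch; the connection and table names are hypothetical), the common SQL provider ships column-level checks that fail the task when a rule is violated:

    from airflow.providers.common.sql.operators.sql import SQLColumnCheckOperator

    orders_quality = SQLColumnCheckOperator(
        task_id="orders_quality",
        conn_id="warehouse",       # hypothetical connection
        table="analytics.orders",  # hypothetical table
        column_mapping={
            "order_id": {
                "null_check": {"equal_to": 0},    # completeness
                "unique_check": {"equal_to": 0},  # duplication
            },
            "amount": {"min": {"geq_to": 0}},     # basic sanity bound
        },
    )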

15:00 - 15:25.
By Khaled Hassan
Track: Airflow 3
Room: Columbia C
10/08/2025 3:00 PM 10/08/2025 3:25 PM America/Los_Angeles AS24: Building Airflow 3 setups resilient to zonal/regional down events, ready for Disaster Recovery event

Want to be resilient to any zonal/regional down events when building Airflow in a cloud environment? Unforeseen disruptions in cloud infrastructure, whether isolated to specific zones or impacting entire regions, pose a tangible threat to the continuous operation of critical data workflows managed by Airflow. These outages, though often technical in nature, translate directly into real-world consequences, potentially causing interruptions in essential services, delays in crucial information delivery, and ultimately impacting the reliability and efficiency of various operational processes that businesses and individuals depend upon daily. The inability to process data reliably due to infrastructure instability can cascade into tangible setbacks across diverse sectors, highlighting the urgent need for resilient and robust Airflow deployments.

Let’s dive deep into strategies for building truly resilient Airflow setups that can withstand zonal and even regional down events. We’ll explore architectural patterns like multi-availability zone deployments, cross-region failover mechanisms, and robust data replication techniques to minimise downtime and ensure business continuity. Discover practical tips and best practices for having a resilient Airflow infrastructure. By attending this presentation, you’ll gain the knowledge and tools necessary to significantly improve the reliability and stability of your critical data pipelines, ultimately saving time, resources, and preventing costly disruptions.

15:00 - 15:25.
By Eloi Codina Torras
Track: Use cases
Room: Columbia A
10/08/2025 3:00 PM 10/08/2025 3:25 PM America/Los_Angeles AS24: How Airflow Runs The Weather

Forecasting the weather and air quality is a logistical challenge. Numerical simulations are complex, resource-hungry, and sometimes fail without warning. Yet, our clients depend on accurate forecasts delivered daily and on time. At the heart of this operation is Airflow: the orchestration engine that keeps everything running.

In this session, we’ll dive into the world behind weather and air quality forecasts. In particular, we’ll explore:

  • The atmospheric modeling pipeline, to understand the unique demands it places on infrastructure
  • How we use Airflow to orchestrate complex simulations reliably and at scale, to inspire new ways of managing time-critical, compute-heavy workflows.
  • Our integration of Airflow with a high-performance computing (HPC) environment using Slurm, to run resource-intensive workloads efficiently on bare-metal machines.

At Meteosim we are experts on weather and air quality intelligence. With projects in over 80 countries, we support decision-making in industries where weather and air quality matter most: from daily operations to long-term sustainability.

15:00 - 15:25.
By Harel Shein & Maciej Obuchowski
Track: Airflow & ...
Room: Beckler
10/08/2025 3:00 PM 10/08/2025 3:25 PM America/Los_Angeles AS24: Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond

OpenLineage has simplified collecting lineage metadata across the data ecosystem by standardizing its representation in an extensible model. It has enabled a whole ecosystem that improves data pipeline reliability and ease of troubleshooting in production environments. In this talk, we’ll briefly introduce the OpenLineage model and explore how this metadata is collected from Airflow, Spark, dbt, and Flink. We’ll demonstrate how to extract valuable insights and outline practical benefits and common challenges when building ingestion, processing, and storage for OpenLineage data. We will also briefly show how OpenLineage events can be used to observe data pipelines exhaustively, and the benefits that brings.

15:00 - 15:25.
By Maxime Beauchemin
Track: Use cases
Room: Columbia D
10/08/2025 3:00 PM 10/08/2025 3:25 PM America/Los_Angeles AS24: Why Data Teams Keep Reinventing the Wheel: The Struggle for Code Reuse in the Data Transformation La

Data teams have a bad habit: reinventing the wheel. Despite the explosion of open-source tooling, best practices, and managed services, teams still find themselves building bespoke data platforms from scratch—often hitting the same roadblocks as those before them. Why does this keep happening, and more importantly, how can we break the cycle?

In this talk, we’ll unpack the key reasons data teams default to building rather than adopting, from technical nuances to cultural and organizational dynamics. We’ll discuss why fragmentation in the modern data stack, the pressure to “own” infrastructure, and the allure of in-house solutions make this problem so persistent.

Using real-world examples, we’ll explore strategies to help data teams focus on delivering business value rather than endlessly rebuilding foundational infrastructure. Whether you’re an engineer, a data leader, or an open-source contributor, this session will provide insights into navigating the build-vs-buy tradeoff more effectively.

15:45 - 16:10.
By Bhavani Ravi
Track: Best practices
Room: Columbia C
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: Apache Airflow 3.0 - Bad vs. Best Practices In Production

The general-purpose nature of Airflow has always left us asking, “Is this the right way?” While existing resources and the community cover the basics, each new Airflow release leaves us wondering if there is more. This talk reveals how 3.0’s innovations redefine best practices for building production-ready data platforms.

  • Dag Development: Future-proof your dags without compromising on fundamentals.
  • Modern Pipelines: How to best incorporate new Airflow features.
  • Infrastructure: Leveraging 3.0’s Service-Oriented Architecture and the Edge Executor.
  • Teams & Responsibilities: Streamlined operations with the new split CLI and improved UI.
  • Monitoring & Observability: Building fail-proof pipelines.

15:45 - 16:10.
By Tzu-ping Chung
Track: Roadmap
Room: Columbia A
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: Assets: Past, Present, Future

Airflow Asset originated from data lineage and evolved into its current state, being used as a scheduling concept (data-aware, event-based scheduling). It has even more potential. This talk discusses how other parts of Airflow, namely Connection and Object Storage, contain concepts related to Asset, and we can tie them all together to make task authoring flow even more naturally.

Planned topics:

  • Brief history on Asset and related constructs.
  • Current state of Asset concepts.
  • Inlets, anyone?
  • Finding inspiration from Pydantic et al.
  • My next step for Asset.
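
For background (our illustration, not the speaker’s material), this is roughly how Asset already ties authoring to scheduling today: outlets on the producer mark the asset as updated, and an asset schedule drives the consumer.

    from airflow.sdk import Asset, dag, task

    report = Asset("s3://lake/daily_report.parquet")  # hypothetical URI

    @dag(schedule="@daily")
    def producer():
        @task(outlets=[report])  # completing this task marks the asset as updated
        def build():
            ...

        build()

    @dag(schedule=[report])  # the consumer runs whenever the asset is updated
    def consumer():
        @task
        def summarize():
            ...

        summarize()

    producer()
    consumer()
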
15:45 - 16:10.
By Blagoy Kaloferov
Track: Use cases
Room: Beckler
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: From Legacy to Leading Edge: How Airflow Migration Unlocked Cross-Team Business Value

At TrueCar, migrating hundreds of legacy Oozie workflows and in-house orchestration tools to Apache Airflow required key technical decisions that transformed our data platform architecture and organizational capabilities. We consolidated individual chained tasks into optimized DAGs leveraging native Airflow functionality to trigger compute across cloud environments. A crucial breakthrough was developing DAG generators to scale migration—essential for efficiently migrating hundreds of workflows while maintaining consistency. By decoupling orchestration from compute, we gained flexibility to select optimal tools for specific outcomes—programmatic processing, analytics, batch jobs, or AI/ML pipelines. This resulted in cost reductions, performance improvements, and team agility. We also gained unprecedented visibility into DAG performance and dependency patterns previously invisible across fragmented systems. Attendees will learn how we redesigned complex workflows into efficient DAGs using dynamic task generation, architectural decisions that enabled platform innovation and the decision framework that made our migration transformational.

15:45 - 16:10.
By Oscar Ligthart & Rodrigo Loredo
Track: Use cases
Room: Columbia A
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: How Airflow solves the coordination of decentralised teams at Vinted

Vinted is the biggest second-hand marketplace in Europe with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building Data Products from petabytes of data. This creates a challenging environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams.

In this session, we will present how we utilize a code generator to streamline the creation of Airflow code for numerous dbt repositories, dockerized jobs, and Vertex-AI pipelines. With this approach, we simplify the complexity and offer our users the flexibility required to accommodate their use cases. We will share our sensor-callback strategy, which we developed to manage task dependencies, overcoming the limitations of traditional dataset triggers. This approach requires a data asset registry to monitor global dependencies and SLOs, and serves as a safeguard during CI processes for detecting potential breaking changes.

15:45 - 16:10.
By Ethan Shalev
Track: Use cases
Room: Columbia D
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: Purple is the new green: harnessing deferrable operators to improve performance & reduce costs

Airflow’s traditional execution model often leads to wasted resources: worker nodes sitting idle, waiting on external systems. At Wix, we tackled this inefficiency head-on by refactoring our in-house operators to support Airflow’s deferrable execution model.

Join us on a walk through Wix’s journey to a more efficient Airflow setup, from identifying bottlenecks to implementing deferrable operators and reaping their benefits. We’ll share the alternatives considered, the refactoring process, and how the team seamlessly integrated deferrable execution with no disruption to data engineers’ workflows.

Attendees will get a practical introduction to deferrable operators: how they work, what they require, and how to implement them. We’ll also discuss the changes made to Wix’s Airflow environment, the process of prioritizing operators for modification, and lessons learned from testing and rollout.

By the end of this talk, attendees will be ready to embrace more purple tasks in their Airflow UI, boosting efficiency, cutting costs, and making their workflows greener in every other way.
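
For attendees new to the mechanics, a minimal deferrable sensor sketch (ours, simplified): execute() hands the wait to the triggerer and frees the worker slot, and the task resumes in execute_complete() when the trigger fires.

    from datetime import timedelta

    from airflow.sensors.base import BaseSensorOperator
    from airflow.triggers.temporal import TimeDeltaTrigger

    class WaitForExternalJob(BaseSensorOperator):
        """Illustrative only: waits on an external system without occupying a worker."""

        def execute(self, context):
            self._wait_or_finish()

        def execute_complete(self, context, event=None):
            self._wait_or_finish()  # re-defer until the external job reports done

        def _wait_or_finish(self):
            if not self._job_done():
                # While deferred, only the lightweight triggerer tracks the wait.
                self.defer(
                    trigger=TimeDeltaTrigger(timedelta(minutes=5)),
                    method_name="execute_complete",
                )

        def _job_done(self) -> bool:
            return True  # placeholder: poll the external system's status API here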

15:45 - 16:10.
By Wei Lee
Track: Airflow 3
Room: Columbia C
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: Seamless Migration: Leveraging Ruff for a Smooth Transition from Airflow 2 to Airflow 3

Migrating from Airflow 2 to the newly released Airflow 3 may seem intimidating due to numerous breaking changes and the introduction of new features. Although a backward compatibility layer has been implemented and most of the existing dags should work fine, some features—such as subdags and execution_date—have been removed based on community consensus.

To support this transition, we worked with Ruff to establish rules that automatically identify removed or deprecated features and even assist in fixing them. In this presentation, I will outline our current Ruff features, the migration rules from Airflow 2 to 3, and how this experience opens the door for us to promote best practices in Airflow through Ruff in the future.

After this session, Airflow users will understand how Ruff can facilitate a smooth transition to Airflow 3. As developers of Airflow, we will delve into the details of how these migration rules were implemented and discuss how we can leverage this knowledge to introduce linting rules that encourage Airflow users to adopt best practices.
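
If you want to experiment before the session: recent Ruff releases ship Airflow-specific lint rules under the AIR code prefix, with the Airflow 3 migration rules in preview. An invocation along the lines of “ruff check dags/ --select AIR301 --preview --fix” flags removed Airflow 2 idioms and auto-fixes the ones Ruff can rewrite; exact rule codes and flags vary by Ruff version, so treat that command as illustrative and consult the Ruff documentation.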

15:45 - 16:10.
By Dheeraj Turaga
Track: Use cases
Room: Columbia C
10/08/2025 3:45 PM 10/08/2025 4:10 PM America/Los_Angeles AS24: Semiconductor (Chip) Design Workflow Orchestration with Airflow

The design of Qualcomm’s Snapdragon System-On-Chip (SoCs) involves several hundred complex workflows orchestrated across multiple data centers, taking the design from RTL to GDS. In the Snapdragon Oryon Custom CPU team, we introduced Airflow about 2 years ago to orchestrate design, verification, emulation, CI/CD, and physical implementation of our CPUs.

Use cases:

  • Standardization and Templatization: We standardize and templatize common workflows, allowing designers to verify their designs by customizing YAML parameters.
  • Custom Shell Operators: We created custom shell operators (tcshrc) to source project environments and work with internal tooling.
  • Smart Retries: We use pre/post-execute hooks to trigger smart retries on failure.
  • Dynamic Celery Workers: We auto-create Celery workers on the fly on our High-Performance Compute (HPC) clusters to launch and manage Electronic Design Automation (EDA) workloads.
  • Hybrid Executor Strategy: We use a hybrid executor strategy (CeleryExecutor and EdgeExecutor) to orchestrate tasks across multiple data centers.
  • EdgeExecutor for Remote Testing: We leverage EdgeExecutor to access post-silicon hardware in remote locations.
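
For orientation (our sketch, not Qualcomm’s code), the hybrid strategy relies on Airflow’s per-task executor assignment, available since Airflow 2.10; the executor names below assume both executors are registered in the deployment’s configuration.

    from airflow.sdk import dag, task

    @dag(schedule=None)
    def chip_flow():
        @task(executor="CeleryExecutor")  # heavy EDA job in the data-center pool
        def simulate():
            ...

        @task(executor="EdgeExecutor")  # runs on a remote lab machine near the silicon
        def test_on_silicon():
            ...

        simulate() >> test_on_silicon()

    chip_flow()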

16:45 - 17:10.
By Rahul Vats & Phani Kumar
Track: Airflow 3
Room: Beckler
10/08/2025 4:45 PM 10/08/2025 5:10 PM America/Los_Angeles AS24: Behind the Scenes: How We Tested Airflow 3 for Stability and Reliability

Ensuring the stability of a major release like Airflow 3 required extensive testing across multiple dimensions. In this session, we will dive into the testing strategies and validation techniques used to guarantee a smooth rollout. From unit and integration tests to real-world DAG validations, this talk will cover the challenges faced, key learnings, and best practices for testing Airflow. Whether you’re a contributor, QA engineer, or Airflow user preparing for migration, this session will offer valuable takeaways to improve your own testing approach.

16:45 - 17:10.
By Silver Pang
Track: Use cases
Room: Columbia A
10/08/2025 4:45 PM 10/08/2025 5:10 PM America/Los_Angeles AS24: From Centralization to Autonomy: Managing Airflow Pipelines through Multi-Tenancy

At the enterprise level, managing Airflow deployments across multiple teams can become complex, leading to bottlenecks and slowed development cycles. We will share our journey of decentralizing Airflow repositories to empower data engineering teams with multi-tenancy, clean folder structures, and streamlined DevOps processes.

We dive into how restructuring our Airflow architecture and utilizing repository templates allowed teams to generate new data pipelines effortlessly. This approach enables engineers to focus on business logic without worrying about underlying Airflow configurations. By automating deployments and reducing manual errors through CI/CD pipelines, we minimized operational overhead.

However, this transformation wasn’t without challenges. We’ll discuss obstacles we faced, such as maintaining code consistency, variables, and utility functions across decentralized repositories; ensuring compliance in a multi-tenant environment; and managing the learning curve associated with new workflows.

Join us to discover practical insights on how decentralizing Airflow repositories can boost team productivity and adapt to evolving business needs with minimal effort.

16:45 - 17:10.
By Cedrik Neumann
Track: Airflow 3
Room: Columbia C
10/08/2025 4:45 PM 10/08/2025 5:10 PM America/Los_Angeles AS24: Run Airflow tasks on your coffee machine

Airflow 3 comes with two new features: Edge execution and the Task SDK. Powered by an HTTP API, these make it possible to write and execute Airflow tasks in any language from anywhere.

In this session I will explain some of the APIs needed and show how to interact with them, based on an embedded toy worker written in Rust and running on an ESP32-C3. Furthermore, I will provide practical tips on writing your own edge worker and how to develop against a running instance of Airflow.

16:45 - 17:10.
By Srinivas Podila & Venkat Sadineni
Track: Use cases
Room: Columbia D
10/08/2025 4:45 PM 10/08/2025 5:10 PM America/Los_Angeles AS24: Scaling Airflow with MWAA: A Multi-Tenant Enterprise Data Platform Journey

We use Amazon MWAA to orchestrate our enterprise data warehouse and MDM solutions. Our DAGs extract data from Salesforce, Oracle, Workday, and SFTP, transform it using Mulesoft, Informatica, and DBT, and load it into Salesforce Data Cloud and Snowflake. MWAA is configured as a multi-tenant platform, supporting more than 10 teams and managing thousands of DAGs per environment. Each team follows a full SDLC and has a dedicated Git repo integrated with Jenkins-based CI/CD pipelines for independent deployments.

To enhance security, we integrated CyberArk, assigning each team a SAFE and restricting access at the DAG level. We built custom Python frameworks for Salesforce, Snowflake, DBT Cloud, and Informatica, which not only trigger jobs in DBT Cloud and Informatica but also retrieve service account credentials from CyberArk at runtime. These frameworks bypass default Airflow plugins and eliminate the use of variables or hardcoded credentials, ensuring a secure and scalable approach to credential management.

We’ve enabled CloudWatch dashboards to monitor system health and resolve performance or DAG parsing issues, ensuring high availability and observability across environments.

9:30 - 10:00
Keynote TBC
10:00 - 10:30
Coffee break
10:00 - 10:30
Sponsored talk
10:00 - 10:30
Sponsored talk
10:00 - 10:30
Sponsored talk
13:30 - 14:30
Lunch
14:00 - 16:30
Workshop TBD
15:30 - 15:45
Coffee break
09:10 - 09:25. Columbia A
By Tala Karadsheh
Track: Use cases

Airflow is integral to GitHub’s data and insight generation. This session dives into use cases from GitHub where key business decisions are driven, at the root, with the help of Airflow. The session will also highlight how both GitHub and Airflow celebrate, promote, and nurture OSS innovations in their own ways.

10:30 - 11:10. Columbia C
By Amogh Desai, Jarek Potiuk & Pavan kumar Gopidesu
Track: Community

Have you ever wondered why Apache Airflow builds are asymptotically(*) green? That drive for a “perennial green build” is not magic; it’s the result of continuous, often unseen engineering effort within our CI/CD pipelines & dev environments. This dedication ensures that maintainers can work efficiently & contributors can onboard smoothly.

To tackle the ever-growing contributor base, we have a CI/CD team run by volunteers putting significant work into the foundational tooling. In this talk, we reveal some of the innovative solutions we have implemented, like:

10:30 - 11:10. Columbia D
By M Waqas Shahid
Track: Airflow 3

Curious how code truly flows inside Airflow? Join me for a unique visualisation journey into Airflow’s inner workings (the first of its kind) — the code blocks and modules called as certain operations run.

A walkthrough that unveils task execution, observability, and debugging like never before. You’ll see Airflow scaling in action, with a performance comparison between Airflow 3 and 2. This session will demystify Airflow’s architecture, showcasing real-time task flows and the heartbeat of pipelines in action.

10:30 - 11:10. Beckler
By Miquel Angel Andreu Febrer
Track: Use cases

This session showcases Okta’s innovative approach to data pipeline orchestration with dbt and Airflow, and how we’ve implemented dynamically generated Airflow DAGs based on dbt’s dependency graph. This allows us to enforce strict data quality standards by automatically executing downstream model tests before upstream model deployments, effectively preventing error cascades. The entire CI/CD pipeline, from dbt model changes to production DAG deployment, is fully automated. The result? Accelerated development cycles, reduced operational overhead, and bulletproof data reliability.
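
A condensed sketch of the manifest-driven idea (our illustration under assumed paths, not Okta’s implementation): read dbt’s manifest.json, create one task per model, and wire dependencies from the graph.

    import json

    from airflow.sdk import dag, task

    @dag(schedule="@daily")
    def dbt_graph():
        with open("target/manifest.json") as f:  # path is an assumption
            manifest = json.load(f)

        models = {node_id: node for node_id, node in manifest["nodes"].items()
                  if node["resource_type"] == "model"}

        tasks = {}
        for node_id, node in models.items():
            @task(task_id=node["name"])
            def run_model(name: str = node["name"]):
                ...  # e.g. shell out to: dbt run --select <name>

            tasks[node_id] = run_model()

        # Mirror dbt's dependency graph as task dependencies.
        for node_id, node in models.items():
            for parent in node["depends_on"]["nodes"]:
                if parent in tasks:
                    tasks[parent] >> tasks[node_id]

    dbt_graph()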

10:30 - 13:00. 301
By Kenten Danas
Track: Workshop
Get hands-on experience with the new release and learn how to leverage new features like DAG versioning, backfills, data assets, and a new React-based UI.
10:30 - 13:00. 305
By Vinod Jayendra, Suba Palanisamy, Sean Bjurstrom & Anurag Srivastava
Track: Workshop
We’ll explore how to leverage Amazon SageMaker Unified Studio to build and deploy scalable Apache Airflow workflows that span the data and AI/ML lifecycle.
10:30 - 13:00. 306
By Mike Ellis
Track: Workshop
This interactive workshop session empowers you to unlock the full potential of Apache Airflow through performance optimization techniques.
11:15 - 12:00. Columbia A
By Kaxil Naik
Track: Roadmap

AI agents transform conversational prompts into actionable automation, provided they have reliable access to essential tools like data warehouses, cloud storage, and APIs.

Now imagine exposing Airflow’s rich integration layer directly to AI agents via the emerging Model Context Protocol (MCP). This isn’t just gluing AI into Airflow; it’s turning Airflow into a structured execution layer for adaptive, agentic logic with full observability, retries, and audit trails built in.

We’ll demonstrate a real-world fraud detection pipeline powered by agents: suspicious transactions are analyzed, enriched dynamically with external customer data via MCP, and escalated based on validated, structured outputs. Every prompt, decision, and action is auditable and compliant.

11:15 - 12:00. Columbia C
By Sumit Maheshwari
Track: Use cases

Yes, you read that right — 200,000 pipelines, nearly 1 million task executions per day, all powered by a single Airflow instance.

In this session, we’ll take you behind the scenes of one of the boldest orchestration projects ever attempted: how Uber’s data platform team is executing what might be the largest Apache Airflow migration in history — and doing it straight to Airflow 3.

From scaling challenges and architectural choices to lessons learned in high-throughput orchestration, this is a deep dive into the tech, the chaos, and the strategy behind making data fly at unprecedented scale.

11:15 - 12:00. Beckler
By Bugra Ozturk
Track: Airflow 3

This talk will explore the key changes introduced by AIP-81, focusing on security enhancements and user experience improvements across the entire software development lifecycle. We will break down the technical advancements from both a security and usability perspective, addressing key questions for Apache Airflow users of all levels. Topics include, but are not limited to, isolating CLI communication to enhance security by leveraging Role-Based Access Control (RBAC) within the API for secure database interactions, clearly defining local vs. remote command execution, and future improvements.

12:00 - 12:25. Columbia A
By Belle Romea
Track: Use cases

Duolingo has built an internal tool, DuoFactory, to orchestrate AI-generated content using Airflow. The tool has been used to generate example sentences per lesson, math exercises, and Duoradio lessons. The ecosystem is flexible for various company needs. Some of these use cases involve end-to-end generation, where one click of a button generates content in app. We have also created a Workflow Builder to orchestrate and iterate on generative AI workflows by creating one-time DAG instances, with a UI easy enough for non-engineers to use.

12:00 - 12:25. Columbia A
By Pankaj Koti, Tatiana Al-Chueyr Martins & Pankaj Singh
Track: Airflow & ...

Efficiently handling long-running workflows is crucial for scaling modern data pipelines. Apache Airflow’s deferrable operators help offload tasks during idle periods — freeing worker slots while tracking progress.

This session explores how Cosmos 1.9 (https://github.com/astronomer/astronomer-cosmos) integrates Airflow’s deferrable capabilities to enhance orchestrating dbt (https://github.com/dbt-labs/dbt-core) in production, with insights from recent contributions that introduced this functionality.

Key takeaways:

  • Deferrable Operators: How they work and why they’re ideal for long-running dbt tasks.
  • Integrating with Cosmos: Refactoring and enhancements to enable deferrable behaviour across platforms.
  • Performance Gains: Resource savings and task throughput improvements from deferrable execution.
  • Challenges & Future Enhancements: Lessons learned, compatibility, and ideas for broader support.

Whether orchestrating dbt models on a cloud warehouse or managing large-scale transformations, this session offers practical strategies to reduce resource contention and boost pipeline performance.
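
As a rough usage sketch (ours; the paths and profile details are hypothetical, and which execution modes support deferral depends on your Cosmos version and target warehouse), enabling the deferrable path in Cosmos is mostly a matter of the execution config:

    from cosmos import DbtDag, ExecutionConfig, ExecutionMode, ProfileConfig, ProjectConfig

    jaffle_shop = DbtDag(
        dag_id="jaffle_shop_deferrable",
        schedule="@hourly",
        project_config=ProjectConfig("/usr/local/dbt/jaffle_shop"),  # hypothetical path
        profile_config=ProfileConfig(
            profile_name="jaffle_shop",
            target_name="prod",
            profiles_yml_filepath="/usr/local/dbt/profiles.yml",  # hypothetical path
        ),
        # AIRFLOW_ASYNC is the deferrable execution mode in recent Cosmos releases;
        # verify support for your warehouse before relying on it.
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.AIRFLOW_ASYNC),
    )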

12:00 - 12:25. Beckler
By Andres Astorga Espriella & Soren Archibald
Track: Airflow & ...

In today’s dynamic data environments, tables and schemas are constantly evolving, and keeping semantic layers up to date has become a critical operational challenge. Manual updates don’t scale, and delays can quickly lead to broken dashboards, failed pipelines, and lost trust.

We’ll show how to harness Apache Airflow 3 and its new event-driven scheduling capabilities to automate the entire lifecycle: detecting table and schema changes in real time, parsing and interpreting those changes, and shifting left the updating of semantic models across dbt, Looker, or custom metadata layers. AI agents will add intelligence and automation that rationalize schema diffs, assess the impact of changes, and propose targeted updates to semantic layers, reducing manual work and minimizing the risk of errors.

12:30 - 12:55. Columbia A
By Jens Scheffler, Brent Bovenzi & Pierre Jeambrun
Track: Airflow 3

In Airflow 2 there was a plugin mechanism to extend the UI with new functions, as well as to add hooks and other features.

Because Airflow 3 rewrote the UI, the old plugins no longer work in all cases. Airflow 3.1 now provides a revamped option to extend the UI with a new plugin schema based on native React components and embedded iframes, following the AIP-68 definitions.

In this session we will provide an overview of its capabilities and a short intro to rolling your own.

12:30 - 12:55. Columbia C
By Ryan Hatter
Track: Use cases

Airflow 3 brings several exciting new features that better support MLOps:

  • Native, intuitive backfills
  • Removal of the unique execution date for dag runs
  • Native support for event-driven scheduling

These features, combined with the Airflow AI SDK, enable dag authors to easily build scalable, maintainable, and performant LLMOps pipelines.

In this talk, we’ll go through a series of workflows that use the Airflow AI SDK to empower Astronomer’s support staff to more quickly resolve problems faced by Astronomer’s customers.

12:30 - 12:55. Columbia D
By Nick Bilozerov, Daniel Melchor & Sabrina Liu
Track: Use cases

Airflow is wonderfully, frustratingly complex - and so is global finance! Stripe has very specific needs all over the planet, and we have customized Airflow to adapt to the variety and rigor that we need to grow the GDP of the internet.

In this talk, you’ll learn:

  • How we support independent DAG change management for over 500 different teams running over 150k tasks.

  • How we’ve customized Airflow’s Kubernetes integration to comply with Stripe’s unique compliance requirements.

12:30 - 12:55. Beckler
By Kengo Seki
Track: Airflow & ...

Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over that long period, and the job scheduler was recently removed, mainly due to inactive development. The speaker believes that Airflow fits this gap perfectly and is proposing incorporating it into the Bigtop stack. This presentation will introduce how easily users can build a data platform with Bigtop including Airflow, and how Airflow can integrate that software with its wide range of providers and enterprise-readiness features such as Kerberos support.

14:00 - 14:25. Columbia A
By Oleksandr Slynko
Track: Use cases

This session explores how GitHub uses Apache Airflow for efficient data engineering. We will share nearly 9 years of experiences, including lessons learnt, mistakes made, and the ways we reduced our on-call and engineering burden. We’ll demonstrate how we keep data flowing smoothly while continuously evolving Airflow and other components of our data platform, ensuring safety and reliability. The session will touch on how we migrate Airflow between clouds without user impact. We’ll also cover how we cut down the time from idea to running a DAG in production, despite our Airflow repo being among the top 15 by number of PRs within GitHub.

14:00 - 14:25. Columbia C
By Andrea Bombino & Nawfel Bacha
Track: Airflow 3

Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives—eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We’ll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we’ll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency.

Key Takeaways:

  • How event-driven DAGs work vs. traditional scheduling.
  • Best practices for integrating Airflow with Pub/Sub.
  • Eliminating polling-based sensors for efficiency.
  • Live demo: an event-driven pipeline with Airflow 3.0, Pub/Sub & dbt.

14:00 - 14:25. Columbia D
By Ankit Chaurasia & Rahul Vats
Track: Airflow 3

In legacy Airflow 2.x, each DAG run was tied to a unique “execution_date.” By removing this requirement, Airflow can now directly support a variety of new use cases, such as model training and generative AI inference, without the need for hacks and workarounds typically used by machine learning and AI engineers.

In this talk, we will delve into the significant advancements in Airflow 3 that enable GenAI and MLOps use cases, particularly through the changes outlined in AIP 83. We’ll cover key changes like the renaming of “execution_date” to “logical_date,” along with the allowance for it to be null, and the introduction of the new “run_after” field which provides a more meaningful mechanism for scheduling and sorting. Furthermore, we’ll discuss how by removing the uniqueness constraint, Airflow 3 enables multiple parallel runs, empowering diverse triggering mechanisms and easing backfill logic with a real-world demo.

14:00 - 14:25. Beckler
By Ephraim Anierobi
Track: Airflow & ...

This session presents a comprehensive guide to building applications that integrate with Apache Airflow’s database migration system. We’ll explore how to harness Airflow’s robust Alembic-based migration toolchain to maintain schema compatibility between Airflow and custom applications, enabling developers to create solutions that evolve alongside the Airflow ecosystem without disruption.

14:00 - 16:30. 305
By Philippe Gagnon
Track: Workshop
Hands-on session where attendees will gain experience creating DAGs to define and manage workflows for classical operations research problems.
14:00 - 16:30. 30
By Eugene Kosteev
Track: Workshop
Learn about the latest features published in Cloud Composer, a managed service for Apache Airflow on Google Cloud Platform.
14:30 - 14:55. Columbia A
By Shubham Raj & Jens Scheffler
Track: Airflow intro/overview

Are you looking to build slick, dynamic trigger forms for your DAGs? It all starts with mastering params.

Params are the gold standard for adding execution options to your DAGs, allowing you to create dynamic, user-friendly trigger forms with descriptions, validation, and now, with Airflow 3, bidirectional support for conf data!

In this talk, we’ll break down how to use params effectively, share best practices, and explore what’s new since the 2023 Airflow Summit talk (https://airflowsummit.org/sessions/2023/flexible-dag-trigger-forms-aip-50/). If you want to make DAG execution more flexible, intuitive, and powerful, this session is a must-attend!
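
To make this concrete, here is a minimal params sketch (field names and bounds are illustrative; on Airflow 2, Param lives in airflow.models.param):

```python
from airflow.sdk import Param, dag, task

@dag(
    params={
        # Each Param becomes a trigger-form field with description/validation
        "environment": Param("staging", type="string", enum=["staging", "prod"]),
        "batch_size": Param(100, type="integer", minimum=1, maximum=10_000),
    }
)
def param_demo():
    @task
    def report(params: dict):
        print(f"Running against {params['environment']}, batch={params['batch_size']}")

    report()

param_demo()
```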

14:30 - 14:55. Columbia C
By Igor Kholopov
Track: Roadmap

Airflow 3 made some great strides with AIP-66, introducing the concept of a DAG bundle. This successfully challenged one of the fundamental architectural limitations of the original Airflow design, namely how DAGs are deployed, bringing structure to something that often had to be operated as a pile of files. However, we believe this should by no means be the end of the road when it comes to making DAG management easier, authoring more accessible to a broader audience, and integration with Data Agents smoother. We believe the next step in Airflow’s evolution is a native option to break away from needing a real file on the file system of multiple components to get your DAG up and running. This is what we hope to achieve as part of AIP-85, extendable DAG parsing control. In this talk I’d like to give a detailed overview of how we want to make it happen and show examples of the valuable integrations we hope to unblock with it.

14:30 - 14:55. Columbia D
By Aleksandr Shirokov, Roman Khomenko & Tarasov Alexey
Track: Airflow & ...

As your organization scales to 20+ data science teams and 300+ DS/ML/DE engineers, you face a critical challenge: how to build a secure, reliable, and scalable orchestration layer that supports both fast experimentation and stable production workflows. We chose Airflow — and didn’t regret it! But to make it truly work at our scale, we had to rethink its architecture from the ground up.

In this talk, we’ll share how we turned Airflow into a powerful MLOps platform through its core capability: running pipelines across multiple K8s GPU clusters from a single UI (!) using per-cluster worker pools. To support ease of use, we developed MLTool — our own library for fast and standardized DAG development, integrated Vault for secure secret management across teams, enabled real-time logging with S3 persistence and built a custom SparkSubmitOperator for Kerberos-authenticated Spark/Hadoop jobs in Kubernetes. We also streamlined the developer experience — users can generate a GitLab repo and deploy a versioned pipeline to prod in under 10 minutes!

14:30 - 14:55. Beckler
By Ipsa Trivedi & Chirag Tailor
Track: Use cases

Tekmetric is the largest cloud-based auto shop management system in the United States. We process vast amounts of data from various integrations with internal and external systems. Data quality and governance are crucial for both our internal operations and the success of our customers.

We leverage multi-step data processing pipelines using AWS services and Airflow. While we utilize traditional data pipeline workflows to manage and move data, we go beyond standard orchestration. After data is processed, we apply tailored quality checks for schema validation, record completeness, freshness, duplication and more.

15:00 - 15:25. Columbia A
By Eloi Codina Torras
Track: Use cases

Forecasting the weather and air quality is a logistical challenge. Numerical simulations are complex, resource-hungry, and sometimes fail without warning. Yet, our clients depend on accurate forecasts delivered daily and on time. At the heart of this operation is Airflow: the orchestration engine that keeps everything running.

In this session, we’ll dive into the world behind weather and air quality forecasts. In particular, we’ll explore:

  • The atmospheric modeling pipeline, to understand the unique demands it places on infrastructure
  • How we use Airflow to orchestrate complex simulations reliably and at scale, to inspire new ways of managing time-critical, compute-heavy workflows.
  • Our integration of Airflow with a high-performance computing (HPC) environment using Slurm, to run resource-intensive workloads efficiently on bare-metal machines.

At Meteosim we are experts on weather and air quality intelligence. With projects in over 80 countries, we support decision-making in industries where weather and air quality matter most: from daily operations to long-term sustainability.

15:00 - 15:25. Columbia C
By Khaled Hassan
Track: Airflow 3

Want to be resilient to zonal or regional outage events when building Airflow in a cloud environment? Unforeseen disruptions in cloud infrastructure, whether isolated to specific zones or impacting entire regions, pose a tangible threat to the continuous operation of critical data workflows managed by Airflow. These outages, though often technical in nature, translate directly into real-world consequences, potentially causing interruptions in essential services, delays in crucial information delivery, and ultimately impacting the reliability and efficiency of operational processes that businesses and individuals depend upon daily. The inability to process data reliably due to infrastructure instability can cascade into tangible setbacks across diverse sectors, highlighting the urgent need for resilient and robust Airflow deployments.

15:00 - 15:25. Columbia D
By Maxime Beauchemin
Track: Use cases

Data teams have a bad habit: reinventing the wheel. Despite the explosion of open-source tooling, best practices, and managed services, teams still find themselves building bespoke data platforms from scratch—often hitting the same roadblocks as those before them. Why does this keep happening, and more importantly, how can we break the cycle?

In this talk, we’ll unpack the key reasons data teams default to building rather than adopting, from technical nuances to cultural and organizational dynamics. We’ll discuss why fragmentation in the modern data stack, the pressure to “own” infrastructure, and the allure of in-house solutions make this problem so persistent.

15:00 - 15:25. Beckler
By Harel Shein & Maciej Obuchowski
Track: Airflow & ...

OpenLineage has simplified collecting lineage metadata across the data ecosystem by standardizing its representation in an extensible model. It has enabled a whole ecosystem that improves data pipeline reliability and eases troubleshooting in production environments. In this talk, we’ll briefly introduce the OpenLineage model and explore how this metadata is collected from Airflow, Spark, dbt, and Flink. We’ll demonstrate how to extract valuable insights and outline practical benefits and common challenges when building ingestion, processing and storage for OpenLineage data. We will also briefly show how OpenLineage events can be used to observe data pipelines exhaustively, and the benefits that brings.

15:45 - 16:10. Columbia A
By Tzu-ping Chung
Track: Roadmap

Airflow Asset originated from data lineage and evolved into its current state, being used as a scheduling concept (data-aware, event-based scheduling). It has even more potential. This talk discusses how other parts of Airflow, namely Connection and Object Storage, contain concepts related to Asset, and we can tie them all together to make task authoring flow even more naturally.

Planned topics:

  • Brief history on Asset and related constructs.
  • Current state of Asset concepts.
  • Inlets, anyone?
  • Finding inspiration from Pydantic et al.
  • My next step for Asset.
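
As a hint of the overlap the talk explores, one URI can already wear both hats (a sketch; the bucket, key, and the 2.x-vs-3.x import paths noted in the comment are assumptions):

```python
from airflow.sdk import Asset, ObjectStoragePath  # Airflow 3; on 2.x: airflow.io.path

# One URI, two lenses: as an Asset it drives data-aware scheduling; as an
# ObjectStoragePath it reads and writes bytes, resolving credentials
# through a Connection (conn_id).
logs = Asset("s3://example-bucket/logs/run.json")
path = ObjectStoragePath("s3://example-bucket/logs/run.json", conn_id="aws_default")
```
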
15:45 - 16:10. Columbia C
By Wei Lee
Track: Airflow 3

Migrating from Airflow 2 to the newly released Airflow 3 may seem intimidating due to numerous breaking changes and the introduction of new features. Although a backward compatibility layer has been implemented and most of the existing dags should work fine, some features—such as subdags and execution_date—have been removed based on community consensus.

To support this transition, we worked with Ruff to establish rules that automatically identify removed or deprecated features and even assist in fixing them. In this presentation, I will outline our current Ruff features, the migration rules from Airflow 2 to 3, and how this experience opens the door for us to promote best practices in Airflow through Ruff in the future.
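
For illustration, here is the kind of rewrite these rules automate (a hedged sketch; exact rule codes and selectors, e.g. `ruff check --select AIR --preview --fix`, depend on your Ruff version):

```python
from airflow import DAG

# Airflow 2 pattern flagged by the AIR migration rules:
# schedule_interval was removed in Airflow 3.
with DAG(dag_id="daily_report_v1", schedule_interval="@daily"):
    ...

# After the autofix, the argument is renamed to schedule.
with DAG(dag_id="daily_report_v2", schedule="@daily"):
    ...
```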

15:45 - 16:10. Columbia C
By Dheeraj Turaga
Track: Use cases

The design of Qualcomm’s Snapdragon System-On-Chip (SoCs) involves several hundred complex workflows orchestrated across multiple data centers, taking the design from RTL to GDS. In the Snapdragon Oryon Custom CPU team, we introduced Airflow about 2 years ago to orchestrate design, verification, emulation, CI/CD, and physical implementation of our CPUs.

Use cases:

  • Standardization and Templatization: We standardize and templatize common workflows, allowing designers to verify their designs by customizing YAML parameters.
  • Custom Shell Operators: We created custom shell operators (tcshrc) to source project environments and work with internal tooling.
  • Smart Retries: We use pre/post-execute hooks to trigger smart retries on failure.
  • Dynamic Celery Workers: We auto-create Celery workers on the fly on our High-Performance Compute (HPC) clusters to launch and manage Electronic Design Automation (EDA) workloads.
  • Hybrid Executor Strategy: We use a hybrid executor strategy (CeleryExecutor and EdgeExecutor) to orchestrate tasks across multiple data centers.
  • EdgeExecutor for Remote Testing: We leverage EdgeExecutor to access post-silicon hardware in remote locations.

15:45 - 16:10. Beckler
By Blagoy Kaloferov
Track: Use cases

At TrueCar, migrating hundreds of legacy Oozie workflows and in-house orchestration tools to Apache Airflow required key technical decisions that transformed our data platform architecture and organizational capabilities. We consolidated individual chained tasks into optimized DAGs leveraging native Airflow functionality to trigger compute across cloud environments. A crucial breakthrough was developing DAG generators to scale the migration—essential for efficiently migrating hundreds of workflows while maintaining consistency. By decoupling orchestration from compute, we gained the flexibility to select optimal tools for specific outcomes—programmatic processing, analytics, batch jobs, or AI/ML pipelines. This resulted in cost reductions, performance improvements, and team agility. We also gained unprecedented visibility into DAG performance and dependency patterns previously invisible across fragmented systems. Attendees will learn how we redesigned complex workflows into efficient DAGs using dynamic task generation, the architectural decisions that enabled platform innovation, and the decision framework that made our migration transformational.

15:45 - 16:10. Columbia A
By Oscar Ligthart & Rodrigo Loredo
Track: Use cases

Vinted is the biggest second-hand marketplace in Europe with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building Data Products from petabytes of data. This creates a demanding environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams.

15:45 - 16:10. Columbia C
By Bhavani Ravi
Track: Best practices

The general-purpose nature of Airflow has always left us asking, “Is this the right way?” While existing resources and the community address these questions, each new Airflow release leaves us wondering if there is more. This talk reveals how 3.0’s innovations redefine best practices for building production-ready data platforms.

  • Dag Development: Future-proof your dags without compromising on fundamentals
  • Modern Pipelines: How to best incorporate new Airflow features
  • Infrastructure: Leveraging 3.0’s service-oriented architecture and the Edge Executor
  • Teams & Responsibilities: Streamlined operations with the new split CLI and improved UI
  • Monitoring & Observability: Building fail-proof pipelines

15:45 - 16:10. Columbia D
By Ethan Shalev
Track: Use cases

Airflow’s traditional execution model often leads to wasted resources: worker nodes sitting idle, waiting on external systems. At Wix, we tackled this inefficiency head-on by refactoring our in-house operators to support Airflow’s deferrable execution model.

Join us on a walk through Wix’s journey to a more efficient Airflow setup, from identifying bottlenecks to implementing deferrable operators and reaping their benefits. We’ll share the alternatives considered, the refactoring process, and how the team seamlessly integrated deferrable execution with no disruption to data engineers’ workflows.
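
The core of the pattern is small (an illustrative sketch, not Wix’s operators; on Airflow 3 these imports move to the Task SDK and standard provider):

```python
from datetime import timedelta

from airflow.models import BaseOperator
from airflow.triggers.temporal import TimeDeltaTrigger


class WaitForExternalJob(BaseOperator):
    """Illustrative deferrable operator: frees its worker slot while waiting."""

    def execute(self, context):
        # Hand the wait to the triggerer instead of blocking a worker; a real
        # operator would use a trigger that polls the external system.
        self.defer(
            trigger=TimeDeltaTrigger(timedelta(minutes=10)),
            method_name="execute_complete",
        )

    def execute_complete(self, context, event=None):
        # Resumes on a worker once the trigger fires
        self.log.info("External job finished; continuing")
```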

16:45 - 17:10. Columbia A
By Silver Pang
Track: Use cases

At the enterprise level, managing Airflow deployments across multiple teams can become complex, leading to bottlenecks and slowed development cycles. We will share our journey of decentralizing Airflow repositories to empower data engineering teams with multi-tenancy, clean folder structures, and streamlined DevOps processes.

We dive into how restructuring our Airflow architecture and utilizing repository templates allowed teams to generate new data pipelines effortlessly. This approach enables engineers to focus on business logic without worrying about underlying Airflow configurations. By automating deployments and reducing manual errors through CI/CD pipelines, we minimized operational overhead.

16:45 - 17:10. Columbia C
By Cedrik Neumann
Track: Airflow 3

Airflow 3 comes with two new features: edge execution and the Task SDK. Powered by an HTTP API, these make it possible to write and execute Airflow tasks in any language, from anywhere.

In this session I will explain some of the APIs involved and show how to interact with them, based on an embedded toy worker written in Rust and running on an ESP32-C3. Furthermore, I will provide practical tips on writing your own edge worker and on developing against a running instance of Airflow.

16:45 - 17:10. Columbia D
By Srinivas Podila & Venkat Sadineni
Track: Use cases

We use Amazon MWAA to orchestrate our enterprise data warehouse and MDM solutions. Our DAGs extract data from Salesforce, Oracle, Workday, and SFTP, transform it using Mulesoft, Informatica, and dbt, and load it into Salesforce Data Cloud and Snowflake. MWAA is configured as a multi-tenant platform, supporting more than 10 teams and managing thousands of DAGs per environment. Each team follows a full SDLC and has a dedicated Git repo integrated with Jenkins-based CI/CD pipelines for independent deployments.

16:45 - 17:10. Beckler
By Rahul Vats & Phani Kumar
Track: Airflow 3

Ensuring the stability of a major release like Airflow 3 required extensive testing across multiple dimensions. In this session, we will dive into the testing strategies and validation techniques used to guarantee a smooth rollout. From unit and integration tests to real-world DAG validations, this talk will cover the challenges faced, key learnings, and best practices for testing Airflow. Whether you’re a contributor, QA engineer, or Airflow user preparing for migration, this session will offer valuable takeaways to improve your own testing approach.

Thursday, October 9, 2025

09:00
09:30
Keynote TBC
10:10
Coffee break
10:30
11:00
11:30
12:00
12:30
13:00
Lunch
14:00
Invited talk
14:30
15:00
15:30
Coffee break
15:45
16:15
16:45
17:30
17:35
Lightning talk
17:40
Lightning talk
17:45
Lightning talk
17:50
Lightning talk
18:00
Event wrap-up
09:00 - 09:25.
By Brooke Jamieson
Track: Keynote
Room: Columbia A
New Tools, Same Craft: The Developer's Toolbox in 2025

Our development workflows look dramatically different than they did a year ago. Code generation, automated testing, and AI-assisted documentation tools are now part of many developers’ daily work. Yet as these tools reshape how we code, I’ve noticed something worth examining: while our toolbox is changing rapidly, the core of being a good developer hasn’t. Problem-solving, collaborative debugging, and systems thinking remain as crucial as ever.

In this keynote, I’ll share observations about:

  • Which parts of our workflow are genuinely enhanced by new tools
  • The development skills that continue to separate good code from great code
  • How teams can collaborate effectively when everyone’s tools are evolving
  • What Airflow’s journey teaches us about balancing innovation with stability

No hype or grand pronouncements—just an honest look at incorporating new tools while preserving the craft that makes us developers in the first place.

10:30 - 10:55.
By Zhe-You Liu
Track: Airflow intro/overview
Room: Beckler
Becoming an Apache Airflow Committer from 0

How a Complete Beginner in Data Engineering / Junior Computer Science Student Became an Apache Airflow Committer in Just 5 Months—With 70+ PRs and 300 Hours of Contributions

This talk is aimed at those who are still hesitant about contributing to Apache Airflow. I hope to inspire and encourage anyone to take the first step and start their journey in open-source—let’s build together!

10:30 - 13:00.
By Jon Fink & Amy Pitcher
Track: Workshop
Room: 305
Bridging Data Pipelines and Business Applications with Airflow and Control-M

AI and ML pipelines built in Airflow often power critical business outcomes, but they rarely operate in isolation. In this hands-on workshop, learn how Control-M integrates with Airflow to orchestrate end-to-end workflows that include upstream and downstream enterprise systems like Supply Chain and Billing. Gain visibility, reliability, and seamless coordination across your data pipelines and the business operations they support.

10:30 - 10:55.
By Hannah Lundrigan & Alberto Hernandez
Track: Use cases
Room: Columbia C
Enhancing Small Retailer Visibility: Machine Learning Pipelines with Apache Airflow

Small retailers often lack the data visibility that larger companies rely on for decision-making. In this session, we’ll dive into how Apache Airflow powers end-to-end machine learning pipelines that process inventory and sales data, enabling retailers and suppliers to gain valuable industry insights. We’ll cover feature engineering, model training, and automated inference workflows, along with strategies for handling messy, incomplete retail data. We will discuss how Airflow enables scalable ML-driven insights that improve demand forecasting, product categorization, and supply chain optimization.

10:30 - 13:00.
By Marc Lamberti
Track: Workshop
Room: 301
Get Certified: DAG Authoring for Apache Airflow 3

We’re excited to offer Airflow Summit 2025 attendees an exclusive opportunity to earn their DAG Authoring certification in person, now updated to include all the latest Airflow 3.0 features. This certification workshop comes at no additional cost to summit attendees.

The DAG Authoring for Apache Airflow certification validates your expertise in advanced Airflow concepts and demonstrates your ability to build production-grade data pipelines. It covers TaskFlow API, Dynamic task mapping, Templating, Asset-driven scheduling, Best practices for production DAGs, and new Airflow 3.0 features and optimizations.
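
Two of those topics in miniature: a TaskFlow DAG with dynamic task mapping (a sketch; the file names and “work” are placeholders):

```python
from airflow.sdk import dag, task

@dag(schedule=None)
def taskflow_mapping_demo():
    @task
    def list_files() -> list[str]:
        return ["a.csv", "b.csv", "c.csv"]  # stand-in for a real listing

    @task
    def row_count(path: str) -> int:
        return len(path)  # placeholder work per file

    # Dynamic task mapping: one task instance per file, decided at runtime
    row_count.expand(path=list_files())

taskflow_mapping_demo()
```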

The certification session includes:

  • 20-minute preparation period with expert guidance
  • Live Q&A session with Marc Lamberti from Astronomer
  • 60-minute examination period
  • Real-time results and immediate feedback

To prepare for the Airflow Certification, visit the Astronomer Academy (https://academy.astronomer.io/page/astronomer-certification).

10:30 - 13:00.
By Luan Moreno Medeiros Maciel
Track: Workshop
Room: 306
Mastering Event-Driven in Airflow 3: Building Scalable Data Pipelines

Transform your data pipelines with event-driven scheduling in Airflow 3. In this hands-on workshop, you’ll:

  • Set up AssetWatchers to track S3, Kafka, or database events
  • Build DAGs that trigger instantly on new data
  • Master scaling techniques for high-volume workflows

Create a live pipeline—process logs or IoT data in real time—and adapt it to your needs. No event-driven experience required; just bring a laptop and Airflow basics. Gain practical skills to make your pipelines responsive and efficient.
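
A minimal sketch of the first step, wiring an AssetWatcher to a message-queue trigger (the common messaging provider import and the SQS queue URL are assumptions; Kafka or other sources follow the same shape):

```python
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import Asset, AssetWatcher, dag, task

trigger = MessageQueueTrigger(queue="https://sqs.us-east-1.amazonaws.com/0000/new-data")
new_data = Asset("new_data", watchers=[AssetWatcher(name="queue_watcher", trigger=trigger)])

@dag(schedule=[new_data])  # fires on each incoming message, not on a schedule
def process_events():
    @task
    def handle():
        print("Triggered by a message, not a clock")

    handle()

process_events()
```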

Don’t miss this chance to level up your orchestration game!

10:30 - 10:55.
By Karthik Dulam
Track: Use cases
Room: Columbia A
Orchestrating MLOps and Data Transformation at EDB with Airflow

This talk explores EDB’s journey from siloed reporting to a unified data platform, powered by Airflow. We’ll delve into the architectural evolution, showcasing how Airflow orchestrates a diverse range of use cases, from Analytics Engineering to complex MLOps pipelines.

Learn how EDB leverages Airflow and Cosmos to integrate dbt for robust data transformations, ensuring data quality and consistency.

We’ll provide a detailed case study of our MLOps implementation, demonstrating how Airflow manages training, inference, and model monitoring pipelines for Azure Machine Learning models.

Discover the design considerations driven by our internal data governance framework and gain insights into our future plans for AIOps integration with Airflow.

10:30 - 10:55.
By Xiaodong Deng & Chaoran Yu
Track: Airflow & ...
Room: Columbia D
When Airflow Meets Yunikorn: Enhancing Airflow with Yunikorn for Higher Efficiency

Apache Airflow’s Kubernetes integration enables flexible workload execution on Kubernetes but lacks advanced resource management features including application queueing, tenant isolation and gang scheduling. These features are increasingly critical for data engineering as well as AI/ML use cases, particularly GPU utilization optimization. Apache Yunikorn, a Kubernetes-native scheduler, addresses these gaps by offering a high-performance alternative to Kubernetes default scheduler. In this talk, we’ll demonstrate how to conveniently leverage Yunikorn’s power in Airflow, along with practical use cases and examples.

11:00 - 11:25.
By Karan Alang
Track: Airflow & ...
Room: Beckler
Automating Threat Intelligence with Airflow, XDR, and LLMs using the MITRE ATT&CK Framework

Security teams often face alert fatigue from massive volumes of raw log data. This session demonstrates how to combine Apache Airflow, Wazuh, and LLMs to build automated pipelines for smarter threat triage—grounded in the MITRE ATT&CK framework.

We’ll explore how Airflow can orchestrate a full workflow: ingesting Wazuh alerts, using LLMs to summarize log events, matching behavior to ATT&CK tactics and techniques, and generating enriched incident summaries. With AI-powered interpretation layered on top of structured threat intelligence, teams can reduce manual effort while increasing context and clarity.

You’ll learn how to build modular DAGs that automate:

  • Parsing and routing Wazuh alerts
  • Querying LLMs for human-readable summaries
  • Mapping IOCs to ATT&CK using vector similarity or prompt templates
  • Outputting structured threat reports for analysts

The session includes a real-world example integrating open-source tools and public ATT&CK data, and will provide reusable components for rapid adoption. If you’re a SecOps engineer or ML practitioner in cybersecurity, this talk gives you a practical blueprint to deploy intelligent, scalable threat automation.
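
In outline, such a triage DAG might look like the following (every helper below is a placeholder, not the session’s code):

```python
from airflow.sdk import dag, task

@dag(schedule="@hourly")
def threat_triage():
    @task
    def parse_alerts() -> list[dict]:
        # Placeholder: pull and normalize Wazuh alerts
        return [{"id": "a1", "rule": "ssh_brute_force"}]

    @task
    def summarize(alert: dict) -> dict:
        # Placeholder: ask an LLM for a human-readable summary
        alert["summary"] = f"Possible {alert['rule']} activity"
        return alert

    @task
    def map_to_attack(alert: dict) -> dict:
        # Placeholder: match behavior to a MITRE ATT&CK technique
        alert["technique"] = "T1110"  # Brute Force
        return alert

    @task
    def report(alerts: list[dict]):
        print(f"Enriched {len(alerts)} alerts")

    report(map_to_attack.expand(alert=summarize.expand(alert=parse_alerts())))

threat_triage()
```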

11:00 - 11:25.
By Jonathan Leek & Michelle Winters
Track: Best practices
Room: Columbia C
Building an Airflow Center of Excellence: Lessons from the Frontlines

As organizations scale their data infrastructure, Apache Airflow becomes a mission-critical component for orchestrating workflows efficiently. But scaling Airflow successfully isn’t just about running pipelines—it’s about building a Center of Excellence (CoE) that empowers teams with the right strategy, best practices, and long-term enablement. Join Jon Leek and Michelle Winters as they share their experiences helping customers design and implement Airflow Centers of Excellence. They’ll walk through real-world challenges, best practices, and the structured approach Astronomer takes to ensure teams have the right plan, resources, and support to succeed. Whether you’re just starting with Airflow or looking to optimize and scale your workflows, this session will give you a proven framework to build a sustainable Airflow Center of Excellence within your organization. 🚀

11:00 - 11:25.
By Rachel Sun
Track: Airflow & ...
Room: Columbia D
How Pinterest Uses AI to Empower Airflow Users for Troubleshooting

At Pinterest, there are over 10,000 DAGs supporting various use cases across different teams and roles. With this scale and diversity, user support has been an ongoing challenge to unlock productivity. As Airflow increasingly serves as a user interface to a variety of data and ML infrastructure behind the scenes, it’s common for issues from multiple areas to surface in Airflow, making triage and troubleshooting a challenge.

In this session, we will discuss the scale of the problem we are facing, how we have addressed it so far, and how we are introducing LLM AI to help solve this problem.

11:00 - 11:25.
By Bolke de Bruin
Track: Community
Room: Columbia A
Your privacy or our progress: rethinking telemetry in Airflow

We face a paradox: we could use usage data to build better software, but collecting that data seems to contradict the very principles of user freedom that open source represents. Apache Airflow’s telemetry system (already purged) has become a battleground for this conflict, with some users voicing concerns over privacy while maintainers struggle to make informed decisions without data. What can we do to strike the right balance?

11:30 - 11:55.
By Shalabh Agarwal
Track: Airflow & ...
Room: Beckler
Custom Operators in Action: A Guide to Extending Airflow's Capabilities

Custom operators are the secret weapon for solving Airflow’s unique & challenging orchestration problems.

This session will cover:

  • When to build custom operators vs. using existing solutions
  • Architecture patterns for creating maintainable, reusable operators
  • Live coding demonstration: Building a custom operator from scratch
  • Real-world examples: How custom operators solve specific business challenges

Through practical code examples and architecture patterns, attendees will walk away with the knowledge to implement custom operators that enhance their Airflow deployments.

This session is ideal for experienced Airflow users looking to extend functionality beyond out-of-the-box solutions.
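
For orientation, the skeleton of a custom operator is compact (an illustrative sketch; the notification example and its parameters are invented for this listing):

```python
from airflow.models import BaseOperator  # airflow.sdk.BaseOperator on Airflow 3


class NotifyChannelOperator(BaseOperator):
    """Illustrative custom operator wrapping a reusable business action."""

    template_fields = ("message",)  # enables Jinja, e.g. "{{ ds }}"

    def __init__(self, message: str, channel: str = "#data-alerts", **kwargs):
        super().__init__(**kwargs)
        self.message = message
        self.channel = channel

    def execute(self, context):
        # A real implementation would delegate I/O to a hook, keeping the
        # operator thin and unit-testable.
        self.log.info("Posting %r to %s", self.message, self.channel)
```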

11:30 - 11:55.
By Ankit Sahu & Brandon Abear
Track: Airflow & ...
Room: Columbia C
Data & AI Orchestration at GoDaddy

As the adoption of Airflow increases within large enterprises to orchestrate their data pipelines, more than one team needs to create, manage, and run their workflows in isolation. With multi-tenancy not yet supported natively in Airflow, customers are adopting alternate ways to enable multiple teams to share infrastructure. In this session, we will explore how GoDaddy uses MWAA to build a Single Pane Airflow setup for multiple teams with a common observability platform, and how this foundation enables orchestration expansion beyond data workflows to AI workflows as well. We’ll discuss our roadmap for leveraging upcoming Airflow 3 features, including the task execution API for enhanced workflow management and DAG versioning capabilities for comprehensive auditing and governance. This session will help attendees gain insights into the use case, the solution architecture, implementation challenges and benefits, and our strategic vision for unified orchestration across data and AI workloads.

Outline:

  • About GoDaddy
  • GoDaddy Data & AI Orchestration Vision
  • Current State & Airflow Usage
  • Airflow Monitoring & Observability
  • Lessons Learned & Best Practices
  • Airflow 3 Adoption

11:30 - 11:55.
By Nathan Hadfield
Track: Airflow & ...
Room: Columbia D
From Oops to Secure Ops: Self-Hosted AI for Airflow Failure Diagnosis

Last year, ‘From Oops to Ops’ showed how AI-powered failure analysis could help diagnose why Airflow tasks fail. But do we really need large, expensive cloud-based AI models to answer simple diagnostic questions? Relying on external AI APIs introduces privacy risks, unpredictable costs, and latency, often without clear benefits for this use case.

With the rise of distilled, open-source models, self-hosted failure analysis is now a practical alternative. This talk will explore how to deploy an AI service on infrastructure you control, compare cost, speed, and accuracy between OpenAI’s API and self-hosted models, and showcase a live demo of AI-powered task failure diagnosis using DeepSeek and Llama—running without external dependencies to keep data private and costs predictable.
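
Conceptually, the hook-in point is a failure callback that consults a locally hosted model (a sketch under that assumption, not the speaker’s implementation):

```python
from airflow.sdk import dag, task

def diagnose_failure(context):
    # Placeholder: send the failing task's log tail to a self-hosted
    # model endpoint (e.g. a local DeepSeek or Llama server).
    ti = context["task_instance"]
    print(f"Asking the local model why task {ti.task_id} failed")

@dag(schedule=None, default_args={"on_failure_callback": diagnose_failure})
def demo_pipeline():
    @task
    def flaky():
        raise RuntimeError("simulated failure")

    flaky()

demo_pipeline()
```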

11:30 - 11:55.
By Theo Lebrun
Track: Use cases
Room: Columbia A
Orchestrating AI Knowledge Bases with Apache Airflow

In the age of Generative AI, knowledge bases are the backbone of intelligent systems, enabling them to deliver accurate and context-aware responses. But how do you ensure that these knowledge bases remain up-to-date and relevant in a rapidly changing world? Enter Apache Airflow, a robust orchestration tool that streamlines the automation of data workflows.

This talk will explore how Airflow can be leveraged to manage and update AI knowledge bases across multiple data sources. We’ll dive into the architecture, demonstrate how Airflow enables efficient data extraction, transformation, and loading (ETL), and share insights on tackling challenges like data consistency, scheduling, and scalability.

Whether you’re building your own AI-driven systems or looking to optimize existing workflows, this session will provide practical takeaways to make the most of Apache Airflow in orchestrating intelligent solutions.

12:00 - 12:25.
By Lawrence Gerstley
Track: Use cases
Room: Columbia A
Airflow Uses in an on-prem Research Setting

KP Division of Research uses Airflow as a central technology for integrating diverse technologies in an agile setting. We wish to present a set of use cases for AI/ML workloads, including imaging analysis (tissue segmentation, mammography), NLP (early identification of psychosis), LLM processing (identification of vessel diameter from radiological impressions), and other large data processing tasks. We create these “short-lived” project workflows to accomplish specific aims, and then may never run the job again, so leveraging generalized patterns is crucial to implementing these jobs quickly. Our Advanced Computational Infrastructure comprises multiple Kubernetes clusters, and we use Airflow to democratize the use of our batch-level resources in those clusters. We use Airflow form-based parameters to deploy pods running R and Python scripts, where generalized parameters are injected into scripts that follow internal programming patterns. Finally, we also leverage Airflow to create headless services inside Kubernetes for large computational workloads (Spark & H2O) that subsequent pods consume ephemerally.

12:00 - 12:25.
By Vishal Vijayvargiya
Track: Airflow & ...
Room: Beckler
Enhancing Airflow REST API: From Basic Integration to Enterprise Scale

Apache Airflow’s REST API has evolved to support diverse orchestration needs, with managed services like MWAA introducing custom enhancements. One such feature, InvokeRestApi, enables dynamic interactions with external services while maintaining Airflow’s core orchestration capabilities.

In this talk, we will explore the architectural design behind InvokeRestApi, detailing how it enhances API-driven workflows. Beyond the architecture, we’ll share key challenges and learnings from implementing and scaling Airflow’s REST API in production environments. Topics include authentication, performance considerations, error handling, and best practices for integrating external APIs efficiently.

Attendees will gain a deeper understanding of Airflow’s API extensibility, its implications for workflow automation, and actionable insights for building robust, API-driven orchestration solutions. Whether you’re an Airflow user or an architect, this session will provide valuable takeaways for simplifying API interactions across Airflow environments.

12:00 - 12:25.
By Salih Goktug Kose & Burak Ozdemir
Track: Airflow & ...
Room: Columbia D
From Complexity to Simplicity with TaskHarbor: Trendyol's Path to a Unified Orchestration Platform

At Trendyol, Turkey’s leading e-commerce company, Apache Airflow powers our task orchestration, handling DAGs with 500+ tasks, complex interdependencies, and diverse environments. Managing on-prem Airflow instances posed challenges in scalability, maintenance, and deployment. To address these, we built TaskHarbor, a fully managed orchestration platform with a hybrid architecture—combining Airflow on GKE with on-prem resources for optimal performance and efficiency.

This talk covers how we:

  • Enabled seamless DAG synchronization across environments using GCS Fuse.
  • Optimized workload distribution via GCP’s HTTPS & TCP Load Balancers.
  • Automated infrastructure provisioning (GKE, CloudSQL, Kubernetes) using Terraform.
  • Simplified Airflow deployments by replacing Helm YAML files with a custom templating tool, reducing configurations to 10-15 lines.
  • Built a fully automated deployment pipeline, ensuring zero developer intervention.

We enhanced efficiency, reliability, and automation in hybrid orchestration by embracing a scalable, maintainable, and cloud-native strategy. Attendees will obtain practical insights into architecting Airflow at scale and optimizing deployments.

12:00 - 12:25.
By Peeyush Rai
Track: Airflow & ...
Room: Columbia C
Transforming Insurance Underwriting with Agentic AI

The weav.ai platform is built on top of Apache Airflow, chosen for its deterministic, predictable execution coupled with extreme developer customizability. weav.ai has seamlessly integrated its AI agents with Airflow to enable unified AI orchestration, bringing the power of scalability, robustness, and the intelligence of AI into a single process. This talk will focus on the use cases being served, an architecture overview of the key Airflow capabilities being leveraged, and how agentic AI has been seamlessly integrated to deliver AI-powered workflows. Weav.ai’s platform is agnostic to any specific cloud or LLM and can orchestrate across them based on the use case.

12:30 - 12:55.
By Vikram Koka
Track: Best practices
Room: Columbia A
Common provider abstractions: Key for multi-cloud data handling

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or ensure data sovereignty. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimum boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider: an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after. We expect additional implementations such as Amazon Kinesis and Managed Kafka over time.

This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what’s next for Airflow’s evolving provider ecosystem.
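
The write-once flavor of Common-SQL in brief (connection IDs and table name are illustrative):

```python
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# The same operator runs against any supported database; switching the
# conn_id from Postgres to Snowflake requires no code change.
count_rows = SQLExecuteQueryOperator(
    task_id="count_rows",
    conn_id="postgres_default",  # or "snowflake_default", etc.
    sql="SELECT COUNT(*) FROM orders",
)
```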

12:30 - 12:55.
By Chirag Todarka & Alvin Zhang
Track: Airflow & ...
Room: Columbia D
Scaling and Unifying Multiple Airflow Instances with Orchestration Frederator

In large organizations, multiple Apache Airflow instances often arise organically—driven by team-specific needs, distinct use cases, or tiered workloads. This fragmentation introduces complexity, operational overhead, and higher infrastructure costs. To address these challenges, we developed the “Orchestration Frederator,” a solution designed to unify and horizontally scale multiple Airflow deployments seamlessly.

This session will detail our journey in implementing Orchestration Frederator, highlighting how we achieved:

  • Horizontal Scalability: Seamlessly scaling Airflow across multiple instances without operational overhead.

  • End-to-End Data Lineage: Constructing comprehensive data lineage across disparate Airflow deployments to simplify monitoring and debugging.

  • Multi-Region Support: Introducing multi-region capabilities, enhancing reliability and disaster recovery.

  • Unified Ecosystem: Consolidating previously fragmented Airflow environments into a cohesive orchestration platform.

Join us to explore practical strategies, technical challenges, lessons learned, and best practices for enhancing scalability, reliability, and maintainability in large-scale Airflow deployments.

12:30 - 12:55.
By Rakesh Kumar Tai & Mili Tripathi
Track: Use cases
Room: Beckler
Transforming Data Engineering: Achieving Efficiency and Ease with an Intuitive Orchestration Solution

In the rapidly evolving field of data engineering and data science, efficiency and ease of use are crucial. Our innovative solution offers a user-friendly interface to manage and schedule custom PySpark, PySQL, Python, and SQL code, streamlining the process from development to production. Using Airflow at the backend, this tool eliminates the complexities of infrastructure management, version control, CI/CD processes, and workflow orchestration. The intuitive UI allows users to upload code, configure job parameters, and set schedules effortlessly, without the need for additional scripting or coding. Additionally, users have the flexibility to bring their own custom artifactory solution and run their code. In summary, our solution significantly enhances the orchestration and scheduling of custom code, breaking down traditional barriers and empowering organizations to maximize their data’s potential and drive innovation efficiently. Whether you are an individual data scientist or part of a large data engineering team, this tool provides the resources needed to streamline your workflow and achieve your goals faster than ever before.

14:00 - 16:30.
By M Waqas Shahid
Track: Workshop
Room: 305
10/09/2025 2:00 PM 10/09/2025 4:30 PM America/Los_Angeles AS24: Airflow and Optimised Data Platform: Setup & Customisations

Airflow has been used by many companies as a core part of their internal data platform. Would you be interested in finding out how Airflow can play a pivotal role in achieving data engineering excellence and efficiency with a modern data architecture? Let’s find out the best practices, tools, and setup needed to run small and big workloads in a stable yet cost-effective way!

In this workshop we will review how an organisation can set up a data platform architecture around Airflow, and the necessary requirements.

  • Airflow and its role in the Data Platform
  • Different ways to organise your Airflow environment for scalability and stability
  • Useful open-source libraries and custom plugins that improve efficiency
  • How to manage multi-tenancy and cost savings
  • Challenges and factors to keep in mind, using a Success Matrix!

This workshop should be suitable for any Architect, Data Engineer, or DevOps engineer aiming to build or enhance their internal Data Platform. By the end of this workshop, you will have a solid understanding of the initial setup and of ways to optimise further, getting the most out of the tool for your own organisation.

305
14:00 - 14:25.
By Yunhao Qing
Track: Use cases
Room: Columbia A
10/09/2025 2:00 PM 10/09/2025 2:25 PM America/Los_Angeles AS24: From Cron to Data-Aware: Evolving Airflow Scheduling at Scale

As data platforms grow in complexity, so do the orchestration needs behind them. Time-based (cron) scheduling has long been the default in Airflow, but dataset-based scheduling promises a more data-aware, efficient alternative. In this session, I’ll share lessons learned from operating Airflow at scale—supporting thousands of DAGs across teams with varied use cases, from simple ETL to complex ML workflows. We’ll explore when dataset scheduling makes sense, the challenges it introduces, and how to evolve your DAG design and platform architecture to make the most of it. Whether you’re migrating legacy workflows or designing new ones, this talk will help you evaluate the right scheduling model for your needs.
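
For readers newer to the feature, a minimal sketch of what dataset-based scheduling looks like in DAG code (the URI and task bodies are illustrative placeholders):

    import pendulum

    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    orders = Dataset("s3://warehouse/orders/")  # illustrative URI

    @dag(schedule="@hourly", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), catchup=False)
    def producer():
        @task(outlets=[orders])
        def refresh_orders():
            """Completing this task emits a dataset event for `orders`."""

        refresh_orders()

    @dag(schedule=[orders], start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), catchup=False)
    def consumer():
        @task
        def build_report():
            """Runs only after the producer has updated the dataset."""

        build_report()

    producer()
    consumer()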

Columbia A

14:00 - 14:25.
By Arthur Chen, Trevor DeVore & Deng Pan
Track: Airflow & ...
Room: Columbia C
10/09/2025 2:00 PM 10/09/2025 2:25 PM America/Los_Angeles AS24: Lessons learned from migrating to Airflow @ LI Scale

At LinkedIn, our data pipelines process exabytes of data, with our offline infrastructure executing 300K ETL workflows daily and 10K concurrent executions. Historically, these workloads ran on our legacy system, Azkaban, which faced UX, scalability, and operational challenges. To modernize our infra, we built a managed Airflow service, leveraging its enhanced developer & operator experience, rich feature set, and strong OSS community support. That initiated LinkedIn’s largest-ever infrastructure migration—transitioning thousands of legacy workflows to Airflow.

In this talk, we will share key lessons from migrating massive-scale pipelines with minimal production disruption. We will discuss:

  • Overall Migration Strategy
  • Custom Tooling Enhancements for testing, deployment, and observability
  • Architectural Innovations decoupling orchestration and compute
  • GenAI-powered Migration automating code rewrites
  • Post-Migration Challenges & Airflow 3.0

Attendees will walk away with battle-tested strategies for large-scale Airflow adoption and practical insights into scaling Airflow in enterprise environments.

Columbia C

14:00 - 16:30.
By Pankaj Singh, Tatiana Al-Chueyr Martins & Pankaj Koti
Track: Workshop
Room: 301
10/09/2025 2:00 PM 10/09/2025 4:30 PM America/Los_Angeles AS24: Productionising dbt-core with Airflow

As a popular open-source library for analytics engineering, dbt is often combined with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.

This workshop will cover a step-by-step guide to Cosmos (https://github.com/astronomer/astronomer-cosmos), a popular open-source package from Astronomer that helps you quickly run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:

  • Running and visualising your dbt transformations
  • Managing dependency conflicts
  • Defining database credentials (profiles)
  • Configuring source and test nodes
  • Using dbt selectors
  • Customising arguments per model
  • Addressing performance challenges
  • Leveraging deferrable operators
  • Visualising dbt docs in the Airflow UI
  • Example of how to deploy to production
  • Troubleshooting

We encourage participants to bring their dbt project to follow this step-by-step workshop.
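
As a flavour of what the workshop covers, a minimal Cosmos sketch (paths, profile names, and schedule are placeholders for your own project):

    import pendulum

    from cosmos import DbtDag, ProfileConfig, ProjectConfig

    # Placeholder paths: point these at your own dbt project and profiles.yml.
    jaffle_shop = DbtDag(
        dag_id="jaffle_shop",
        project_config=ProjectConfig("/usr/local/airflow/dags/dbt/jaffle_shop"),
        profile_config=ProfileConfig(
            profile_name="jaffle_shop",
            target_name="dev",
            profiles_yml_filepath="/usr/local/airflow/dags/dbt/profiles.yml",
        ),
        schedule="@daily",
        start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
        catchup=False,
    )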

301
14:00 - 14:25.
By Khadija Al Ahyane
Track: Airflow & ...
Room: Beckler
10/09/2025 2:00 PM 10/09/2025 2:25 PM America/Los_Angeles AS24: Task failures troubleshooting based on Airflow & Kubernetes signals

Per the Airflow community survey, Kubernetes is the most popular compute platform for running Airflow. When run on Kubernetes, Airflow gains many benefits out of the box, such as monitoring, reliability, ease of deployment, scalability, and autoscaling. On the other hand, running Airflow on Kubernetes means running a sophisticated distributed system on top of another distributed system, which makes troubleshooting Airflow task and DAG failures harder.

This session tackles that bottleneck head-on, introducing a practical approach to building an automated diagnostic pipeline for Airflow on Kubernetes. Imagine offloading tedious investigations to a system that, on task failure, automatically collects and correlates key signals from Kubernetes components (linking Airflow tasks to specific Pods and their events), Kubernetes/GKE monitoring, and relevant logs, pinpointing root causes and suggesting actionable fixes.

Attendees will leave with a clear understanding of common Airflow-on-Kubernetes failure patterns—and more importantly, a blueprint and practical strategies to reduce MTTR and boost team efficiency.

Beckler

14:00 - 16:30.
By Ryan Hatter, Amogh Desai, Phani Kumar & Kalya Reddy
Track: Workshop
Room: 306
10/09/2025 2:00 PM 10/09/2025 4:30 PM America/Los_Angeles AS24: Your first Apache Airflow Contribution

Ready to contribute to Apache Airflow? In this hands-on workshop, you’ll be expected to come prepared with your development environment already configured (having Breeze installed is strongly recommended, but Codespaces works if you can’t install Docker). We’ll dive straight into finding issues that match your skills and walk you through the entire contribution process, from creating your first pull request to receiving community feedback. Whether you’re writing code, enhancing documentation, or offering feedback, there’s a place for you. Let’s get started and see your name among Airflow contributors!

306
14:30 - 14:55.
By Shoubhik Bose
Track: Use cases
Room: Columbia C
10/09/2025 2:30 PM 10/09/2025 2:55 PM America/Los_Angeles AS24: Applying Airflow to drive the digital workforce in the Enterprise

Red Hat’s unified data and AI platform relies on Apache Airflow for orchestration, alongside Snowflake, Fivetran, and Atlan. The platform prioritizes building a dependable data foundation, recognizing that effective AI depends on quality data. Airflow was selected for its predictability, extensive connectivity, reliability, and scalability.

The platform now supports business analytics, transitioning from ETL to ELT processes. This has resulted in a remarkable improvement in how we make data available for business decisions.

The platform’s capabilities are being extended to power Digital Workers (AI agents) using large language models, encompassing model training, fine-tuning, and inference. Two Digital Workers are currently deployed, with more in development.

This presentation will detail the rationale and background of this evolution, followed by an explanation of the architectural decisions made and the challenges encountered and resolved throughout the process of transforming into an AI-enabled data platform to power Red Hat’s business.

Columbia C

14:30 - 14:55.
By Christian Foernges
Track: Use cases
Room: Columbia A
10/09/2025 2:30 PM 10/09/2025 2:55 PM America/Los_Angeles AS24: Learn from Deutsche Bank: Using Apache Airflow in Regulated Environments

Operating within the stringent regulatory landscape of Corporate Banking, Deutsche Bank relies heavily on robust data orchestration. This session explores how Deutsche Bank’s Corporate Bank leverages Apache Airflow across diverse environments, including both on-premises infrastructure and cloud platforms. Discover their approach to managing critical data & analytics workflows, encompassing areas like regulatory reporting, data integration and complex data processing pipelines. Gain insights into the architectural patterns and operational best practices employed to ensure compliance, security, and scalability when running Airflow at scale in a highly regulated, hybrid setting.

Columbia A

14:30 - 14:55.
By Purshotam Shah & Prakash Nandha Mukunthan
Track: Use cases
Room: Beckler
10/09/2025 2:30 PM 10/09/2025 2:55 PM America/Los_Angeles AS24: Navigating Secure and Cost-Efficient Flink Batch on Kubernetes with Airflow

At Yahoo, we built a secure, scalable, and cost-efficient batch processing platform using Amazon MWAA to orchestrate Apache Flink jobs on EKS, managed by the Flink Kubernetes Operator. This setup enables dynamic job orchestration while meeting strict enterprise compliance standards.

In this session, we’ll share how Airflow DAGs:

  • Dynamically launch, monitor, and clean up isolated Flink clusters per batch job, improving resource efficiency.

  • Securely fetch EKS kubeconfig, submit FlinkDeployment CRDs using FlinkKubernetesOperator, and poll job status using Airflow sensors.

  • Integrate IAM for access control and meet Yahoo’s security requirements, including mutual TLS (mTLS) with Athenz.

  • Optimize for cost and resilience through automated cleanup of jobs and the operator, and handle job failures and retries.

Join us for practical strategies and lessons from Yahoo’s production-scale Flink workflows in a Kubernetes environment.
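
For orientation, a rough sketch of the submit-and-poll pattern described above, using the operator and sensor from the apache-flink provider (file, namespace, and job names are placeholders):

    import pendulum

    from airflow import DAG
    from airflow.providers.apache.flink.operators.flink_kubernetes import FlinkKubernetesOperator
    from airflow.providers.apache.flink.sensors.flink_kubernetes import FlinkKubernetesSensor

    with DAG(dag_id="flink_batch", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), schedule=None):
        # Submit a FlinkDeployment CRD for the Flink Kubernetes Operator to reconcile.
        submit = FlinkKubernetesOperator(
            task_id="submit_batch_job",
            application_file="flink_deployment.yaml",  # placeholder FlinkDeployment manifest
            namespace="flink-batch",
            kubernetes_conn_id="kubernetes_default",
        )

        # Poll the deployment until the job reaches a terminal state.
        wait = FlinkKubernetesSensor(
            task_id="wait_for_batch_job",
            application_name="etl-batch-job",  # metadata.name from the manifest
            namespace="flink-batch",
            kubernetes_conn_id="kubernetes_default",
        )

        submit >> wait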

Beckler
14:30 - 14:55.
By Katarzyna Kalek & Jakub Orlowski
Track: Airflow & ...
Room: Columbia D
10/09/2025 2:30 PM 10/09/2025 2:55 PM America/Los_Angeles AS24: Simplifying Data Management with DAG Factory

At OLX, we connect millions of people daily through our online marketplace while relying on robust data pipelines. In this talk, we explore how the DAG Factory concept elevates data governance, lineage, and discovery by centralizing operator logic and restricting direct DAG creation. This approach enforces code quality, optimizes resources, maintains infrastructure hygiene and enables smooth version upgrades. We then leverage consistent naming conventions in Airflow to build targeted namespaces, aligning teams with global policies while preserving autonomy. Integrating external tools like AWS Lake Formation and Open Metadata further unifies governance, making it straightforward to manage and secure data. This is critical when handling hundreds or even thousands of active DAGs. If the idea of storing 1,600 pipelines in one folder seems overwhelming, join us to learn how the DAG Factory concept simplifies pipeline management. We’ll also share insights from OLX, highlighting how thoughtful design fosters oversight, efficiency, and discoverability across diverse use cases.

Columbia D

15:00 - 15:25.
By Niko Oliveira
Track: Airflow intro/overview
Room: Beckler
10/09/2025 3:00 PM 10/09/2025 3:25 PM America/Los_Angeles AS24: AWS Lambda Executor: The Speed of Local Execution with the Advantages of Remote

Apache Airflow’s executor landscape has traditionally presented users with a clear trade-off: choose either the speed of local execution or the scalability, isolation and configurability of remote execution. The AWS Lambda Executor introduces a new paradigm that bridges this gap, offering near-local execution speeds with the benefits of remote containerization.

This talk will begin with a brief overview of Airflow’s executors, how they work and what they are responsible for, highlighting the compromises between different executors. We will explore the emerging niche for fast, yet remote execution and demonstrate how the AWS Lambda Executor fills this space. We will also address practical considerations when using such an executor, such as working within Lambda’s 15-minute execution limit, and how to mitigate this using a multi-executor configuration.
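
To make that mitigation concrete, a sketch of a hybrid-executor setup (Airflow 2.10+); the Lambda executor's module path in the comment is an assumption to verify against the Amazon provider documentation:

    # airflow.cfg sketch for hybrid executors (Airflow 2.10+); the Lambda executor's
    # module path below is an assumption, not confirmed:
    #
    #   [core]
    #   executor = LocalExecutor,airflow.providers.amazon.aws.executors.aws_lambda.AwsLambdaExecutor

    import pendulum

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(dag_id="mixed_executors", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), schedule=None):
        # Short-running task: routed to the Lambda executor, well under the 15-minute cap.
        quick = BashOperator(task_id="quick", bash_command="echo fast", executor="AwsLambdaExecutor")

        # Long-running task: kept on the default (first-listed) executor to avoid the cap.
        slow = BashOperator(task_id="slow", bash_command="sleep 1200", executor="LocalExecutor")

        quick >> slow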

Whether you’re new to Airflow or an experienced user, this session will provide valuable insights into task execution and how you can combine the best of both local and remote execution paradigms.

Beckler

15:00 - 15:25.
By Kunal Jain
Track: Use cases
Room: Columbia C
10/09/2025 3:00 PM 10/09/2025 3:25 PM America/Los_Angeles AS24: How Airflow can help with Data Management and Governance

Metadata management is a cornerstone of effective data governance, yet it presents unique challenges distinct from traditional data engineering. At scale, efficiently extracting metadata from relational and NoSQL databases demands specialized solutions. To address this, our team has developed custom Airflow operators that scan and extract metadata across various database technologies, orchestrating 100+ production jobs to ensure continuous and reliable metadata collection.

Now, we’re expanding beyond databases to tackle non-traditional data sources such as file repositories and message queues. This shift introduces new complexities, including processing structured and unstructured files, managing schema evolution in streaming data, and maintaining metadata consistency across heterogeneous sources. In this session, we’ll share our approach to building scalable metadata scanners, optimizing performance, and ensuring adaptability across diverse data environments. Attendees will gain insights into designing efficient metadata pipelines, overcoming common pitfalls, and leveraging Airflow to drive metadata governance at scale.

Columbia C

15:00 - 15:25.
By Oluwafemi Olawoyin
Track: Use cases
Room: Columbia A
10/09/2025 3:00 PM 10/09/2025 3:25 PM America/Los_Angeles AS24: Modernizing Automation in Secure, Regulated Environments: Lessons from Deploying Airflow

This session details practical strategies for introducing Apache Airflow in strict, compliance-heavy organizations. Learn how on-premise deployment and hybrid tooling can help modernize legacy workflows when public cloud solutions and container technologies are restricted. Discover how cross-platform engineering teams can collaborate securely using CI/CD bridges, and what it takes to meet rigorous security and governance standards. Key lessons address navigating resistance to change, achieving production sign-off, and avoiding common compliance pitfalls, relevant to anyone automating in public sector settings.

Columbia A

15:00 - 15:25.
By Philippe Gagnon
Track: Airflow & ...
Room: Columbia D
10/09/2025 3:00 PM 10/09/2025 3:25 PM America/Los_Angeles AS24: Using Apache Airflow with Trino for (almost) all your data problems

Trino is incredibly effective at enabling users to extract insights quickly and effectively from large amounts of data located in dispersed and heterogeneous federated data systems.

However, some business data problems are more complex than interactive analytics use cases, and are best broken down into a sequence of interdependent steps, a.k.a. a workflow. For these use cases, dedicated software is often required in order to schedule and manage these processes with a principled approach.

In this session, we will look at how we can leverage Apache Airflow to orchestrate Trino queries into complex workflows that solve practical batch processing problems, all the while avoiding the use of repetitive, redundant data movement.
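
A skeletal illustration of the idea (connection id, catalogs, and SQL are placeholders); each step executes inside Trino, so no intermediate data is funnelled through Airflow:

    import pendulum

    from airflow import DAG
    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    with DAG(dag_id="trino_batch", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), schedule="@daily", catchup=False):
        # Step 1: federate and stage entirely inside Trino; no data flows through Airflow.
        stage = SQLExecuteQueryOperator(
            task_id="stage_orders",
            conn_id="trino_default",
            sql="""
                CREATE TABLE IF NOT EXISTS lake.staging.orders_{{ ds_nodash }} AS
                SELECT * FROM postgres.public.orders WHERE order_date = DATE '{{ ds }}'
            """,
        )

        # Step 2: aggregate the staged rows into a reporting table, again inside Trino.
        aggregate = SQLExecuteQueryOperator(
            task_id="aggregate_orders",
            conn_id="trino_default",
            sql="""
                INSERT INTO lake.reporting.daily_orders
                SELECT order_date, count(*) FROM lake.staging.orders_{{ ds_nodash }} GROUP BY order_date
            """,
        )

        stage >> aggregate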

Columbia D

15:45 - 16:10.
By John Robert
Track: Airflow & ...
Room: Columbia C
10/09/2025 3:45 PM 10/09/2025 4:10 PM America/Los_Angeles AS24: Building a Transparent Data Workflow with Airflow and Data Catalog

As modern data ecosystems grow in complexity, ensuring transparency, discoverability, and governance in data workflows becomes critical. Apache Airflow, a powerful workflow orchestration tool, enables data engineers to build scalable pipelines, but without proper visibility into data lineage, ownership, and quality, teams risk operating in a black box.

In this talk, we will explore how integrating Airflow with a data catalog can bring clarity and transparency to data workflows. We’ll discuss how metadata-driven orchestration enhances data governance, enables lineage tracking, and improves collaboration across teams. Through real-world use cases, we will demonstrate how Airflow can automate metadata collection, update data catalogs dynamically, and ensure data quality at every stage of the pipeline.

Attendees will walk away with practical strategies for implementing a transparent data workflow that fosters trust, efficiency, and compliance in their data infrastructure.

Columbia C

15:45 - 16:10.
By Gurmeet Saran & Kushal Thakkar
Track: Airflow & ...
Room: Columbia D
10/09/2025 3:45 PM 10/09/2025 4:10 PM America/Los_Angeles AS24: Enabling SQL testing in Airflow workflows using Pydantic types

This session explores how to bring unit testing to SQL pipelines using Airflow. I’ll walk through the development of a SQL testing library that allows isolated testing of SQL logic by injecting mock data into base tables. To support this, we built a type system for AWS Glue tables using Pydantic, enabling schema validation and mock data generation. Over time, this type system also powered production data quality checks via a custom Airflow operator. Learn how this approach improves reliability, accelerates development, and scales testing across data workflows.
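
The library itself is internal, but the core idea can be sketched with plain Pydantic; the model and helper below are hypothetical:

    from pydantic import BaseModel

    class Order(BaseModel):
        """Hypothetical typed schema mirroring a Glue table."""
        order_id: int
        customer_id: int
        amount_usd: float

    def mock_rows() -> list[dict]:
        """Generate schema-valid mock data to inject into a base table under test."""
        rows = [
            Order(order_id=1, customer_id=10, amount_usd=25.0),
            Order(order_id=2, customer_id=11, amount_usd=40.5),
        ]
        return [r.model_dump() for r in rows]

    # A test could materialize mock_rows() as the `orders` base table, run the
    # SQL under test against it, and assert on the result set in isolation.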

Columbia D

15:45 - 16:10.
By Yu Lung Law & Ivan Sayapin
Track: Use cases
Room: Columbia A
10/09/2025 3:45 PM 10/09/2025 4:10 PM America/Los_Angeles AS24: Fine-Tuning Airflow: Parameters You May Not Know About

The Bloomberg Data Platform Engineering team is responsible for managing, storing, and providing access to business and financial data used by financial professionals across the global capital markets. Our team utilizes Apache Airflow to orchestrate data workflows across various applications and Bloomberg Terminal functions. Over the years, we have fine-tuned our Airflow cluster to handle more than 1,000 ingestion DAGs, which has presented unique scalability challenges. In this session, we will share insights into several key Airflow parameters — some of which you may not be all that familiar with — that our team uses to optimize and scale the platform effectively.

Columbia A

15:45 - 16:10.
By Priyanka Samanta
Track: Use cases
Room: Beckler
10/09/2025 3:45 PM 10/09/2025 4:10 PM America/Los_Angeles AS24: Orchestrating Travel Insights: Priceline's MLOps with Airflow

The journey from ML model development to production deployment and monitoring is often complex and fragmented. How can teams overcome the chaos of disparate tools and processes? This session dives into how Apache Airflow serves as a unifying force in MLOps. We’ll begin with a look at the broader MLOps trends observed by Google within the Airflow community, highlighting how Airflow is evolving to meet these challenges and showcasing diverse MLOps use cases – both current and future.

Then, Priceline will present a deep-dive case study on their MLOps transformation. Learn how they leveraged Cloud Composer, Google Cloud’s managed Apache Airflow service, to orchestrate their entire ML pipeline end-to-end: ETL, data preprocessing, model building & training, Dockerization, Google Artifact Registry integration, deployment, model serving, and evaluation. Discover how using Cloud Composer on GCP enabled them to build a scalable, reliable, adaptable, and maintainable MLOps practice, moving decisively from chaos to coordination. Cloud Composer (Airflow) has served as a major backbone in transforming the whole ML experience at Priceline.

Join us to learn how to harness Airflow, particularly within a managed environment like Cloud Composer, for robust MLOps workflows, drawing lessons from both industry trends and a concrete, successful implementation.

Beckler

16:15 - 16:40.
By Annie Friedman & Caitlin Petro
Track: Best practices
Room: Beckler
10/09/2025 4:15 PM 10/09/2025 4:40 PM America/Los_Angeles AS24: Lessons from Airflow gone wrong: How to set yourself up to scale successfully

Ever seen a DAG go rogue and deploy itself? Or try to time travel back to 1999? Join us for a light-hearted yet painfully relatable look at how not to scale your Airflow deployment to avoid chaos and debugging nightmares.

We’ll cover the classics: hardcoded secrets, unbounded retries (hello, immortal task!), and the infamous spaghetti DAG where 200 tasks are lovingly connected by hand and no one dares open the Airflow UI anymore. If you’ve ever used datetime.now() in your DAG definition and watched your backfills implode, this talk is for you.

From the BashOperator that became sentient to the XCom that tried to pass a whole Pandas DataFrame and the key to your mother’s house, we’ll walk through real-world bloopers with practical takeaways. You’ll learn why overusing PythonOperator is a recipe for mess, how not to use sensors unless you enjoy resource starvation, and why scheduling in local timezones is basically asking for a daylight savings time horror story. Other highlights include:

  • Over-provisioning resources in KubernetesPodOperator: many teams allocate excessive memory/CPU “just in case”, leading to cluster contention and resource waste.
  • Dynamic task mapping gone wild: 10,000 mapped tasks later… the scheduler is still crying.
  • SLAs used as data quality guarantees: creating alerts so noisy, nobody listens.
  • Design-free DAGs: no docs, no comments, no idea why a task has a 3-day timeout.

Finally, we’ll round it out with some dos and don’ts: using environment variables, avoiding memory-hungry monolith DAGs, skipping global imports, and not allocating 10x more memory “just in case.” Whether you’re new to Airflow or battle-hardened from a thousand failed backfills, come learn how to scale your pipelines without losing your mind (or your cluster).
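
To illustrate just one of these pitfalls, the datetime.now() problem comes down to giving the scheduler a moving target; a fixed, timezone-aware start_date keeps runs and backfills deterministic:

    import pendulum

    from airflow.decorators import dag, task

    # Anti-pattern: start_date=datetime.now() changes on every DAG parse, so the
    # scheduler can never pin down which runs should exist, and backfills implode.

    @dag(
        schedule="@daily",
        start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),  # fixed and timezone-aware
        catchup=False,
    )
    def well_behaved():
        @task
        def do_work():
            ...

        do_work()

    well_behaved()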

Beckler

16:15 - 16:40.
By Abhishek Bhakat & Sudarshan Chaudhari
Track: Airflow & ...
Room: Columbia D
10/09/2025 4:15 PM 10/09/2025 4:40 PM America/Los_Angeles AS24: Model Context Protocol with Airflow

In today’s data-driven world, effective workflow management and AI are crucial for success. However, there’s a notable gap between Airflow and AI. Our presentation offers a solution to close this gap.

We propose an MCP (Model Context Protocol) server to act as a bridge. We’ll dive into two paths:

  • AI-Augmented Airflow: Enhancing Airflow with AI to improve error handling, automate DAG generation, proactively detect issues, and optimize resource use.
  • Airflow-Powered AI: Utilizing Airflow’s reliability to empower LLMs in executing complex tasks, orchestrating AI agents, and supporting decision-making with real-time data.

Key takeaways:

  • Understanding how to integrate AI insights directly into your workflow orchestration.
  • Learning how MCP empowers AI with robust orchestration capabilities, offering full logging, monitoring, and auditability.
  • Gaining insights into how to transform LLMs from reactive responders into proactive, intelligent, and reliable executors.

We invite you to explore how MCP can help with workflow management, making AI-driven decisions more reliable and turning workflow systems into intelligent, autonomous agents.
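
As one hypothetical illustration of the bridge idea, an MCP server built with the Python MCP SDK's FastMCP helper could expose Airflow's stable REST API as a tool; the URL and credentials are placeholders:

    import os

    import requests
    from mcp.server.fastmcp import FastMCP

    AIRFLOW_URL = os.environ.get("AIRFLOW_URL", "http://localhost:8080")  # placeholder
    AUTH = (os.environ["AIRFLOW_USER"], os.environ["AIRFLOW_PASSWORD"])   # placeholder

    mcp = FastMCP("airflow-bridge")

    @mcp.tool()
    def trigger_dag(dag_id: str) -> str:
        """Trigger a DAG run via Airflow's stable REST API and return its run id."""
        resp = requests.post(
            f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
            auth=AUTH,
            json={"conf": {}},
        )
        resp.raise_for_status()
        return resp.json()["dag_run_id"]

    if __name__ == "__main__":
        mcp.run()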

Columbia D

16:15 - 16:40.
By Sebastien Crocquevieille
Track: Use cases
Room: Columbia C
10/09/2025 4:15 PM 10/09/2025 4:40 PM America/Los_Angeles AS24: Multi-Instance Asset Synchronization - push or pull?

As Data Engineers, our jobs regularly include scheduling or scaling workflows.

But have you ever asked yourself: can I scale my scheduling?

It turns out that you can! But doing so raises a number of issues that need to be addressed.

In this talk we’ll be:

  • Recapping Asset-aware scheduling in Apache Airflow
  • Discussing diverse methods to upscale our scheduling
  • Solving the issue of maintaining our Airflow Asset synchronized between instances
  • Comparing our in-house push-based solution with the built-in solution from AIP-82, and the pros and cons of each method.

I hope you will enjoy it!
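
To ground the push variant, one deployment can notify another by recording a dataset event over the REST API; the endpoint shown follows the Airflow 2.9+ stable API, and the URL, credentials, and URI are placeholders:

    import requests

    # Hypothetical push-based sync: after instance A materializes s3://warehouse/orders/,
    # it records a dataset event on instance B, whose dataset-scheduled DAGs then fire.
    resp = requests.post(
        "https://airflow-b.example.com/api/v1/datasets/events",  # Airflow 2.9+ endpoint
        auth=("user", "password"),  # placeholder credentials
        json={"dataset_uri": "s3://warehouse/orders/", "extra": {"source": "airflow-a"}},
    )
    resp.raise_for_status()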

Columbia C

16:15 - 16:40.
By Pei-Chi (Miko) Chen
Track: Use cases
Room: Columbia A
10/09/2025 4:15 PM 10/09/2025 4:40 PM America/Los_Angeles AS24: No More Missed Beats: How Airflow Rescued Our Analytics Pipeline

Before Airflow, our BigQuery pipelines at Create Music Group operated like musicians without a conductor—each playing on its own schedule, regardless of whether upstream data was ready. As our data platform grew, this chaos led to spiralling costs, performance bottlenecks, and became utterly unsustainable.

This talk tells the story of how Create Music Group brought harmony to its data workflows by adopting Apache Airflow and the Medallion architecture, ultimately slashing our data processing costs by 50%. We’ll show how moving to event-driven scheduling with datasets helped eliminate stale data issues, dramatically improved performance, and unlocked faster iteration across teams. Discover how we replaced repetitive SQL with standardized dimension/fact tables, empowering analysts in a safer sandbox.

Columbia A

16:45 - 17:10.
By Di Wu
Track: Use cases
Room: Columbia A
10/09/2025 4:45 PM 10/09/2025 5:10 PM America/Los_Angeles AS24: Orchestrating Global Market Data Pipelines with Airflow

In this presentation, I will highlight how Apache Airflow addresses key data management challenges for Exchange-Traded Funds (ETFs) in the global financial market. ETFs, which combine features of mutual funds and stocks, track indexes, commodities, or baskets of assets and trade on major stock exchanges. Because they operate around the clock across multiple time zones, ETF managers must navigate diverse regulations, coordinate complex operational constraints, and ensure accurate valuations.

This often requires integrating data from vendors for pricing and reference details. These data sets arrive at different times, can conflict, and must pass rigorous quality checks before being published for global investors. Managing updates, orchestrating workflows, and maintaining high data quality present significant hurdles.

Apache Airflow tackles these issues by scheduling repetitive tasks and enabling event-triggered job runs for immediate data checks. It offers monitoring and alerting, thus reducing manual intervention and errors. Using DAGs, Airflow scales efficiently, streamlining complex data ingestion, validation, and publication processes.

Columbia A

16:45 - 17:10.
By Ashok Prakash
Track: Best practices
Room: Beckler
10/09/2025 4:45 PM 10/09/2025 5:10 PM America/Los_Angeles AS24: Scaling ML Infrastructure: Lessons from Building Distributed Systems

In today’s data-driven world, scalable ML infrastructure is mission-critical. As ML workloads grow, orchestration tools like Apache Airflow become essential for managing pipelines, training, deployment, and observability. In this talk, I’ll share lessons from building distributed ML systems across cloud platforms, including GPU-based training and AI-powered healthcare. We’ll cover patterns for scaling Airflow DAGs, integrating telemetry and auto-healing, and aligning cross-functional teams. Whether you’re launching your first pipeline or managing ML at scale, you’ll gain practical strategies to make Airflow the backbone of your ML infrastructure.

Beckler

17:30 - 17:35.
By Shahar Epstein
Track: Airflow & ...
Room: Columbia A
10/09/2025 5:30 PM 10/09/2025 5:35 PM America/Los_Angeles AS24: Supercharging Apache Airflow: Enhancing Core Components with Rust

Apache Airflow is a powerful workflow orchestrator, but as workloads grow, its Python-based components can become performance bottlenecks. This talk explores how Rust, with its speed, safety, and concurrency advantages, can enhance Airflow’s core components (e.g., the scheduler, the DAG processor). We’ll dive into the motivations behind using Rust, architectural trade-offs, and the challenges of bridging the gap between Python and Rust. A proof-of-concept showcasing an Airflow scheduler rewritten in Rust will demonstrate the potential benefits of this approach.

Columbia A

9:30 - 10:00
Keynote TBC
10:00 - 10:30
Coffee break
13:00 - 14:00
Lunch
14:00 - 14:25
Invited talk
15:30 - 15:45
Coffee break
17:35 - 17:40
Lightning talk
17:40 - 17:45
Lightning talk
17:45 - 17:50
Lightning talk
17:50 - 17:55
Lightning talk
18:00 - 18:15
Event wrap-up
09:00 - 09:25. Columbia A
By Brooke Jamieson
Track: Keynote

Our development workflows look dramatically different than they did a year ago. Code generation, automated testing, and AI-assisted documentation tools are now part of many developers’ daily work. Yet as these tools reshape how we code, I’ve noticed something worth examining: while our toolbox is changing rapidly, the core of being a good developer hasn’t. Problem-solving, collaborative debugging, and systems thinking remain as crucial as ever.

In this keynote, I’ll share observations about:

10:30 - 10:55. Columbia A
By Karthik Dulam
Track: Use cases

This talk explores EDB’s journey from siloed reporting to a unified data platform, powered by Airflow. We’ll delve into the architectural evolution, showcasing how Airflow orchestrates a diverse range of use cases, from Analytics Engineering to complex MLOps pipelines.

Learn how EDB leverages Airflow and Cosmos to integrate dbt for robust data transformations, ensuring data quality and consistency.

We’ll provide a detailed case study of our MLOps implementation, demonstrating how Airflow manages training, inference, and model monitoring pipelines for Azure Machine Learning models.

10:30 - 10:55. Columbia C
By Hannah Lundrigan & Alberto Hernandez
Track: Use cases

Small retailers often lack the data visibility that larger companies rely on for decision-making. In this session, we’ll dive into how Apache Airflow powers end-to-end machine learning pipelines that process inventory and sales data, enabling retailers and suppliers to gain valuable industry insights. We’ll cover feature engineering, model training, and automated inference workflows, along with strategies for handling messy, incomplete retail data. We will discuss how Airflow enables scalable ML-driven insights that improve demand forecasting, product categorization, and supply chain optimization.

10:30 - 10:55. Columbia D
By Xiaodong Deng & Chaoran Yu
Track: Airflow & ...

Apache Airflow’s Kubernetes integration enables flexible workload execution on Kubernetes but lacks advanced resource management features including application queueing, tenant isolation and gang scheduling. These features are increasingly critical for data engineering as well as AI/ML use cases, particularly GPU utilization optimization. Apache Yunikorn, a Kubernetes-native scheduler, addresses these gaps by offering a high-performance alternative to Kubernetes default scheduler. In this talk, we’ll demonstrate how to conveniently leverage Yunikorn’s power in Airflow, along with practical use cases and examples.
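
As a taste of what that can look like, a sketch that assumes the YuniKorn admission controller is installed in the cluster (it routes labelled pods to YuniKorn automatically); the image and queue names are placeholders:

    import pendulum

    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(dag_id="yunikorn_demo", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), schedule=None):
        train = KubernetesPodOperator(
            task_id="train_model",
            name="train-model",
            image="ghcr.io/example/train:latest",  # placeholder image
            labels={
                # With the YuniKorn admission controller installed, these labels hand
                # the pod to YuniKorn, grouping it into an application and a queue.
                "applicationId": "airflow-train-{{ ds_nodash }}",
                "queue": "root.ml",
            },
        )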

10:30 - 10:55. Beckler
By Zhe-You Liu
Track: Airflow intro/overview

How a Complete Beginner in Data Engineering / Junior Computer Science Student Became an Apache Airflow Committer in Just 5 Months—With 70+ PRs and 300 Hours of Contributions

This talk is aimed at those who are still hesitant about contributing to Apache Airflow. I hope to inspire and encourage anyone to take the first step and start their journey in open-source—let’s build together!

10:30 - 13:00. 301
By Marc Lamberti
Track: Workshop
The DAG Authoring for Apache Airflow certification validates your expertise in advanced Airflow concepts and demonstrates your ability to build production-grade data pipelines.
10:30 - 13:00. 305
By Jon Fink & Amy Pitcher
Track: Workshop
Learn how Control-M integrates with Airflow to orchestrate end-to-end workflows that include upstream and downstream enterprise systems like Supply Chain and Billing. Gain visibility, reliability, and seamless coordination across your data pipelines and the business operations they support.
10:30 - 13:00. 306
By Luan Moreno Medeiros Maciel
Track: Workshop
Transform your data pipelines with event-driven scheduling in Airflow 3.
11:00 - 11:25. Columbia A
By Bolke de Bruin
Track: Community

We face a paradox: we could use usage data to build better software, but collecting that data seems to contradict the very principles of user freedom that open source represents. Apache Airflow’s telemetry system (already purged) has become a battleground for this conflict, with some users voicing concerns over privacy while maintainers struggle to make informed decisions without data. What can we do to strike the right balance?

11:00 - 11:25. Columbia C
By Jonathan Leek & Michelle Winters
Track: Best practices

As organizations scale their data infrastructure, Apache Airflow becomes a mission-critical component for orchestrating workflows efficiently. But scaling Airflow successfully isn’t just about running pipelines—it’s about building a Center of Excellence (CoE) that empowers teams with the right strategy, best practices, and long-term enablement. Join Jon Leek and Michelle Winters as they share their experiences helping customers design and implement Airflow Centers of Excellence. They’ll walk through real-world challenges, best practices, and the structured approach Astronomer takes to ensure teams have the right plan, resources, and support to succeed. Whether you’re just starting with Airflow or looking to optimize and scale your workflows, this session will give you a proven framework to build a sustainable Airflow Center of Excellence within your organization. 🚀

11:00 - 11:25. Columbia D
By Rachel Sun
Track: Airflow & ...

At Pinterest, there are over 10,000 DAGs supporting various use cases across different teams and roles. With this scale and diversity, user support has been an ongoing challenge to unlock productivity. As Airflow increasingly serves as a user interface to a variety of data and ML infrastructure behind the scenes, it’s common for issues from multiple areas to surface in Airflow, making triage and troubleshooting a challenge.

In this session, we will discuss the scale of the problem we are facing, how we have addressed it so far, and how we are introducing LLM AI to help solve this problem.

11:00 - 11:25. Beckler
By Karan Alang
Track: Airflow & ...

Security teams often face alert fatigue from massive volumes of raw log data. This session demonstrates how to combine Apache Airflow, Wazuh, and LLMs to build automated pipelines for smarter threat triage—grounded in the MITRE ATT&CK framework.

We’ll explore how Airflow can orchestrate a full workflow: ingesting Wazuh alerts, using LLMs to summarize log events, matching behavior to ATT&CK tactics and techniques, and generating enriched incident summaries. With AI-powered interpretation layered on top of structured threat intelligence, teams can reduce manual effort while increasing context and clarity.
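
In skeleton form, such a pipeline might look like the following TaskFlow DAG; the Wazuh and LLM calls are stubbed out as hypothetical helpers:

    import pendulum

    from airflow.decorators import dag, task

    @dag(schedule="@hourly", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"), catchup=False)
    def alert_triage():
        @task
        def ingest_alerts() -> list[dict]:
            # Hypothetical: query the Wazuh API for the latest alerts.
            return []

        @task
        def enrich(alerts: list[dict]) -> list[dict]:
            # Hypothetical: ask an LLM to summarize each alert and map it to
            # MITRE ATT&CK tactics and techniques.
            return alerts

        @task
        def publish(summaries: list[dict]) -> None:
            # Hypothetical: write enriched incident summaries to a ticketing system.
            ...

        publish(enrich(ingest_alerts()))

    alert_triage()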

11:30 - 11:55. Columbia A
By Theo Lebrun
Track: Use cases

In the age of Generative AI, knowledge bases are the backbone of intelligent systems, enabling them to deliver accurate and context-aware responses. But how do you ensure that these knowledge bases remain up-to-date and relevant in a rapidly changing world? Enter Apache Airflow, a robust orchestration tool that streamlines the automation of data workflows.

This talk will explore how Airflow can be leveraged to manage and update AI knowledge bases across multiple data sources. We’ll dive into the architecture, demonstrate how Airflow enables efficient data extraction, transformation, and loading (ETL), and share insights on tackling challenges like data consistency, scheduling, and scalability.

11:30 - 11:55. Columbia C
By Ankit Sahu & Brandon Abear
Track: Airflow & ...

As the adoption of Airflow increases within large enterprises to orchestrate their data pipelines, more than one team needs to create, manage, and run their workflows in isolation. With multi-tenancy not yet supported natively in Airflow, customers are adopting alternate ways to enable multiple teams to share infrastructure. In this session, we will explore how GoDaddy uses MWAA to build a Single Pane Airflow setup for multiple teams with a common observability platform, and how this foundation enables orchestration expansion beyond data workflows to AI workflows as well. We’ll discuss our roadmap for leveraging upcoming Airflow 3 features, including the task execution API for enhanced workflow management and DAG versioning capabilities for comprehensive auditing and governance. This session will help attendees gain insights into the use case, the solution architecture, implementation challenges and benefits, and our strategic vision for unified orchestration across data and AI workloads.

11:30 - 11:55. Columbia D
By Nathan Hadfield
Track: Airflow & ...

Last year, ‘From Oops to Ops’ showed how AI-powered failure analysis could help diagnose why Airflow tasks fail. But do we really need large, expensive cloud-based AI models to answer simple diagnostic questions? Relying on external AI APIs introduces privacy risks, unpredictable costs, and latency, often without clear benefits for this use case.

With the rise of distilled, open-source models, self-hosted failure analysis is now a practical alternative. This talk will explore how to deploy an AI service on infrastructure you control, compare cost, speed, and accuracy between OpenAI’s API and self-hosted models, and showcase a live demo of AI-powered task failure diagnosis using DeepSeek and Llama—running without external dependencies to keep data private and costs predictable.
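
One plausible wiring is an on_failure_callback that consults a locally hosted model; the Ollama-style endpoint and model name below are assumptions:

    import requests

    def diagnose_failure(context):
        """Airflow on_failure_callback: ask a self-hosted LLM to explain a task failure."""
        ti = context["task_instance"]
        prompt = (
            f"Task {ti.task_id} in DAG {ti.dag_id} failed with: "
            f"{context.get('exception')}. Suggest likely causes."
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",  # assumed local Ollama endpoint
            json={"model": "llama3", "prompt": prompt, "stream": False},
            timeout=60,
        )
        ti.log.info("Self-hosted diagnosis: %s", resp.json().get("response"))

    # Usage sketch: DAG(..., default_args={"on_failure_callback": diagnose_failure})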

11:30 - 11:55. Beckler
By Shalabh Agarwal
Track: Airflow & ...

Custom operators are the secret weapon for solving Airflow’s unique & challenging orchestration problems.

This session will cover:

  • When to build custom operators vs. using existing solutions
  • Architecture patterns for creating maintainable, reusable operators
  • Live coding demonstration: Building a custom operator from scratch
  • Real-world examples: How custom operators solve specific business challenges

Through practical code examples and architecture patterns, attendees will walk away with the knowledge to implement custom operators that enhance their Airflow deployments.
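
As a taste of the pattern, a hypothetical custom operator (the hook wiring is deliberately left abstract):

    from airflow.models.baseoperator import BaseOperator

    class S3ToWarehouseOperator(BaseOperator):
        """Hypothetical reusable operator: load an S3 object into a warehouse table."""

        template_fields = ("s3_key", "table")  # allow Jinja templating, e.g. {{ ds }}

        def __init__(self, *, s3_key: str, table: str, conn_id: str = "warehouse_default", **kwargs):
            super().__init__(**kwargs)
            self.s3_key = s3_key
            self.table = table
            self.conn_id = conn_id

        def execute(self, context):
            self.log.info("Loading %s into %s", self.s3_key, self.table)
            # A real implementation would delegate to provider hooks here
            # (e.g. S3Hook to read, a DB hook bound to self.conn_id to write).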

12:00 - 12:25. Columbia A
By Lawrence Gerstley
Track: Use cases

KP Division of Research uses Airflow as a central technology for integrating diverse technologies in an agile setting. We wish to present a set of use cases for AI/ML workloads, including imaging analysis (tissue segmentation, mammography), NLP (early identification of psychosis), LLM processing (identification of vessel diameter from radiological impressions), and other large data processing tasks. We create these “short-lived” project workflows to accomplish specific aims, and then may never run the job again, so leveraging generalized patterns is crucial to quickly implementing these jobs. Our Advanced Computational Infrastructure comprises multiple Kubernetes clusters, and we use Airflow to democratize the use of our batch-level resources in those clusters. We use Airflow form-based parameters to deploy pods running R and Python scripts, where generalized parameters are injected into scripts that follow internal programming patterns. Finally, we also leverage Airflow to create headless services inside Kubernetes for large computational workloads (Spark & H2O) that subsequent pods consume ephemerally.

12:00 - 12:25. Columbia C
By Peeyush Rai
Track: Airflow & ...

The weav.ai platform is built on top of Apache Airflow, chosen for its deterministic, predictable execution coupled with extreme developer customizability. weav.ai has seamlessly integrated its AI agents with Airflow to enable unified AI orchestration, bringing scalability, robustness, and the intelligence of AI together in a single process. This talk will focus on the use cases being served, an architecture overview of the key Airflow capabilities being leveraged, and how agentic AI has been seamlessly integrated to deliver AI-powered workflows. weav.ai’s platform is agnostic to any specific cloud or LLM and can orchestrate across them based on the use case.

12:00 - 12:25. Columbia D
By Salih Goktug Kose & Burak Ozdemir
Track: Airflow & ...

At Trendyol, Turkey’s leading e-commerce company, Apache Airflow powers our task orchestration, handling DAGs with 500+ tasks, complex interdependencies, and diverse environments. Managing on-prem Airflow instances posed challenges in scalability, maintenance, and deployment. To address these, we built TaskHarbor, a fully managed orchestration platform with a hybrid architecture—combining Airflow on GKE with on-prem resources for optimal performance and efficiency.

This talk covers how we:

  • Enabled seamless DAG synchronization across environments using GCS Fuse.
  • Optimized workload distribution via GCP’s HTTPS & TCP Load Balancers.
  • Automated infrastructure provisioning (GKE, CloudSQL, Kubernetes) using Terraform.
  • Simplified Airflow deployments by replacing Helm YAML files with a custom templating tool, reducing configurations to 10-15 lines.
  • Built a fully automated deployment pipeline, ensuring zero developer intervention.

We enhanced efficiency, reliability, and automation in hybrid orchestration by embracing a scalable, maintainable, and cloud-native strategy. Attendees will obtain practical insights into architecting Airflow at scale and optimizing deployments.

12:00 - 12:25. Beckler
By Vishal Vijayvargiya
Track: Airflow & ...

Apache Airflow’s REST API has evolved to support diverse orchestration needs, with managed services like MWAA introducing custom enhancements. One such feature, InvokeRestApi, enables dynamic interactions with external services while maintaining Airflow’s core orchestration capabilities.

In this talk, we will explore the architectural design behind InvokeRestApi, detailing how it enhances API-driven workflows. Beyond the architecture, we’ll share key challenges and learnings from implementing and scaling Airflow’s REST API in production environments. Topics include authentication, performance considerations, error handling, and best practices for integrating external APIs efficiently.
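
As a hedged illustration of the feature, MWAA's InvokeRestApi action proxies calls to the environment's Airflow REST API; a sketch via boto3 (environment name is a placeholder, and response fields may vary by provider version):

    import boto3

    # Call the Airflow REST API of an MWAA environment without managing
    # webserver credentials directly; IAM authorizes the request.
    mwaa = boto3.client("mwaa")
    response = mwaa.invoke_rest_api(
        Name="my-environment",  # placeholder environment name
        Path="/dags",
        Method="GET",
    )
    print(response["RestApiResponse"])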

12:30 - 12:55. Columbia A
By Vikram Koka
Track: Best practices

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or meet data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimal boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider: an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after. We expect additional implementations such as Amazon Kinesis and Managed Kafka over time.
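
To make the write-once idea concrete, a minimal sketch using the Common-SQL provider (connection IDs, DAG name, and table are placeholders, not part of the talk):

    import pendulum
    from airflow import DAG
    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    with DAG("daily_rollup_example", start_date=pendulum.datetime(2025, 1, 1),
             schedule="@daily", catchup=False):
        # The same task definition runs on Postgres, Snowflake, SQLite, ...
        # Only the Airflow connection decides the backend.
        daily_rollup = SQLExecuteQueryOperator(
            task_id="daily_rollup",
            conn_id="postgres_default",  # swap the connection to retarget
            sql="SELECT event_date, COUNT(*) FROM events GROUP BY event_date",
        )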

12:30 - 12:55. Columbia D
By Chirag Todarka & Alvin Zhang
Track: Airflow & ...

In large organizations, multiple Apache Airflow instances often arise organically—driven by team-specific needs, distinct use cases, or tiered workloads. This fragmentation introduces complexity, operational overhead, and higher infrastructure costs. To address these challenges, we developed the “Orchestration Frederator,” a solution designed to unify and horizontally scale multiple Airflow deployments seamlessly.

This session will detail our journey in implementing Orchestration Frederator, highlighting how we achieved:

  • Horizontal Scalability: Seamlessly scaling Airflow across multiple instances without operational overhead.

12:30 - 12:55. Beckler
By Rakesh Kumar Tai & Mili Tripathi
Track: Use cases

In the rapidly evolving field of data engineering and data science, efficiency and ease of use are crucial. Our innovative solution offers a user-friendly interface to manage and schedule custom PySpark, PySQL, Python, and SQL code, streamlining the process from development to production. Using Airflow at the backend, this tool eliminates the complexities of infrastructure management, version control, CI/CD processes, and workflow orchestration.

The intuitive UI allows users to upload code, configure job parameters, and set schedules effortlessly, without the need for additional scripting or coding. Additionally, users have the flexibility to bring their own custom artifactory solution and run their code. In summary, our solution significantly enhances the orchestration and scheduling of custom code, breaking down traditional barriers and empowering organizations to maximize their data’s potential and drive innovation efficiently. Whether you are an individual data scientist or part of a large data engineering team, this tool provides the resources needed to streamline your workflow and achieve your goals faster than ever before.

14:00 - 14:25. Columbia A
By Yunhao Qing
Track: Use cases

As data platforms grow in complexity, so do the orchestration needs behind them. Time-based (cron) scheduling has long been the default in Airflow, but dataset-based scheduling promises a more data-aware, efficient alternative. In this session, I’ll share lessons learned from operating Airflow at scale—supporting thousands of DAGs across teams with varied use cases, from simple ETL to complex ML workflows. We’ll explore when dataset scheduling makes sense, the challenges it introduces, and how to evolve your DAG design and platform architecture to make the most of it. Whether you’re migrating legacy workflows or designing new ones, this talk will help you evaluate the right scheduling model for your needs.
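
For readers new to the model, a minimal sketch of dataset-based scheduling with the Airflow 2.x API (URIs and DAG names are illustrative; Airflow 3 renames datasets to assets):

    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    # Opaque identifier for the data; the URI is a placeholder.
    events = Dataset("s3://warehouse/events/daily")

    @dag(start_date=pendulum.datetime(2025, 1, 1), schedule="@daily", catchup=False)
    def producer():
        @task(outlets=[events])
        def load_events():
            ...  # writing the partition marks the dataset as updated

        load_events()

    @dag(start_date=pendulum.datetime(2025, 1, 1), schedule=[events], catchup=False)
    def consumer():
        @task
        def build_report():
            ...  # runs only after load_events signals fresh data

        build_report()

    producer()
    consumer()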

14:00 - 14:25. Columbia C
By Arthur Chen, Trevor DeVore & Deng Pan
Track: Airflow & ...

At LinkedIn, our data pipelines process exabytes of data, with our offline infrastructure executing 300K ETL workflows daily and 10K concurrent executions. Historically, these workloads ran on our legacy system, Azkaban, which faced UX, scalability, and operational challenges. To modernize our infra, we built a managed Airflow service, leveraging its enhanced developer & operator experience, rich feature set, and strong OSS community support. That initiated LinkedIn’s largest-ever infrastructure migration—transitioning thousands of legacy workflows to Airflow.

14:00 - 14:25. Beckler
By Khadija Al Ahyane
Track: Airflow & ...

Per the Airflow community survey, Kubernetes is the most popular compute platform for running Airflow, and when run on Kubernetes, Airflow gains many benefits out of the box, such as monitoring, reliability, ease of deployment, scalability, and autoscaling. On the other hand, running Airflow on Kubernetes means running one sophisticated distributed system on top of another, which makes troubleshooting task and DAG failures harder.

This session tackles that bottleneck head-on, introducing a practical approach to building an automated diagnostic pipeline for Airflow on Kubernetes. Imagine offloading tedious investigations to a system that, on task failure, automatically collects and correlates key signals from Kubernetes components (linking Airflow tasks to specific Pods and their events), Kubernetes/GKE monitoring, and relevant logs, pinpointing root causes and suggesting actionable fixes.
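
To make the correlation step concrete, a rough sketch using the Kubernetes Python client (the label names follow the KubernetesExecutor's conventions but are assumptions here, and run_id values are sanitized in real deployments):

    from kubernetes import client, config

    def pod_events_for_task(dag_id: str, task_id: str, run_id: str,
                            namespace: str = "airflow") -> None:
        # KubernetesExecutor labels worker pods with the task's identity,
        # so a failure callback can locate the pod and pull its events.
        config.load_incluster_config()
        v1 = client.CoreV1Api()
        selector = f"dag_id={dag_id},task_id={task_id},run_id={run_id}"
        pods = v1.list_namespaced_pod(namespace, label_selector=selector).items
        for pod in pods:
            events = v1.list_namespaced_event(
                namespace,
                field_selector=f"involvedObject.name={pod.metadata.name}",
            ).items
            for ev in events:
                print(pod.metadata.name, ev.reason, ev.message)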

14:00 - 16:30. 301
By Pankaj Singh, Tatiana Al-Chueyr Martins & Pankaj Koti
Track: Workshop

This workshop will cover a step-by-step guide to Cosmos, an open-source package that helps you quickly run your dbt Core projects as Airflow DAGs and Task Groups.

14:00 - 16:30. 305
By M Waqas Shahid
Track: Workshop

This workshop is suitable for any architect, data engineer, or DevOps engineer aiming to build or enhance their internal data platform. By the end of the workshop, you will have a solid understanding of the initial setup, and of ways to optimize further to get the most out of the tool for your own organization.

14:00 - 16:30. 306
By Ryan Hatter, Amogh Desai, Phani Kumar & Kalya Reddy
Track: Workshop

Whether you’re writing code, enhancing documentation, or offering feedback, there’s a place for you. Let’s get started and see your name among Airflow contributors!

14:30 - 14:55. Columbia A
By Christian Foernges
Track: Use cases

Operating within the stringent regulatory landscape of Corporate Banking, Deutsche Bank relies heavily on robust data orchestration. This session explores how Deutsche Bank’s Corporate Bank leverages Apache Airflow across diverse environments, including both on-premises infrastructure and cloud platforms. Discover their approach to managing critical data & analytics workflows, encompassing areas like regulatory reporting, data integration and complex data processing pipelines. Gain insights into the architectural patterns and operational best practices employed to ensure compliance, security, and scalability when running Airflow at scale in a highly regulated, hybrid setting.

14:30 - 14:55. Columbia C
By Shoubhik Bose
Track: Use cases

Red Hat’s unified data and AI platform relies on Apache Airflow for orchestration, alongside Snowflake, Fivetran, and Atlan. The platform prioritizes building a dependable data foundation, recognizing that effective AI depends on quality data. Airflow was selected for its predictability, extensive connectivity, reliability, and scalability.

The platform now supports business analytics, transitioning from ETL to ELT processes. This transition has markedly improved how quickly data is made available for business decisions.

14:30 - 14:55. Columbia D
By Katarzyna Kalek & Jakub Orlowski
Track: Airflow & ...

At OLX, we connect millions of people daily through our online marketplace while relying on robust data pipelines. In this talk, we explore how the DAG Factory concept elevates data governance, lineage, and discovery by centralizing operator logic and restricting direct DAG creation. This approach enforces code quality, optimizes resources, maintains infrastructure hygiene and enables smooth version upgrades. We then leverage consistent naming conventions in Airflow to build targeted namespaces, aligning teams with global policies while preserving autonomy. Integrating external tools like AWS Lake Formation and Open Metadata further unifies governance, making it straightforward to manage and secure data. This is critical when handling hundreds or even thousands of active DAGs. If the idea of storing 1,600 pipelines in one folder seems overwhelming, join us to learn how the DAG Factory concept simplifies pipeline management. We’ll also share insights from OLX, highlighting how thoughtful design fosters oversight, efficiency, and discoverability across diverse use cases.
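
A minimal sketch of the factory pattern the talk describes (function and naming scheme are illustrative, not OLX's actual code):

    import pendulum
    from airflow import DAG

    def make_ingestion_dag(team: str, source: str) -> DAG:
        """Hypothetical factory: every team gets a DAG with enforced naming,
        ownership tags, and centrally vetted operators; no hand-rolled DAGs."""
        with DAG(
            dag_id=f"{team}__ingest__{source}",  # naming convention drives namespaces
            start_date=pendulum.datetime(2025, 1, 1),
            schedule="@daily",
            catchup=False,
            tags=[team, "ingestion"],
        ) as dag:
            ...  # attach centrally maintained operators here
            return dag

    # One declarative entry per pipeline instead of 1,600 bespoke files.
    # Assigning into globals() keeps the DAGs discoverable by the parser.
    for team, source in [("payments", "orders"), ("ads", "clicks")]:
        globals()[f"{team}__ingest__{source}"] = make_ingestion_dag(team, source)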

14:30 - 14:55. Beckler
By Purshotam Shah & Prakash Nandha Mukunthan
Track: Use cases
15:00 - 15:25. Columbia A
By Oluwafemi Olawoyin
Track: Use cases

This session details practical strategies for introducing Apache Airflow in strict, compliance-heavy organizations. Learn how on-premise deployment and hybrid tooling can help modernize legacy workflows when public cloud solutions and container technologies are restricted. Discover how cross-platform engineering teams can collaborate securely using CI/CD bridges, and what it takes to meet rigorous security and governance standards. Key lessons address navigating resistance to change, achieving production sign-off, and avoiding common compliance pitfalls, relevant to anyone automating in public sector settings.

15:00 - 15:25. Columbia C
By Kunal Jain
Track: Use cases

Metadata management is a cornerstone of effective data governance, yet it presents unique challenges distinct from traditional data engineering. At scale, efficiently extracting metadata from relational and NoSQL databases demands specialized solutions. To address this, our team has developed custom Airflow operators that scan and extract metadata across various database technologies, orchestrating 100+ production jobs to ensure continuous and reliable metadata collection.

Now, we’re expanding beyond databases to tackle non-traditional data sources such as file repositories and message queues. This shift introduces new complexities, including processing structured and unstructured files, managing schema evolution in streaming data, and maintaining metadata consistency across heterogeneous sources. In this session, we’ll share our approach to building scalable metadata scanners, optimizing performance, and ensuring adaptability across diverse data environments. Attendees will gain insights into designing efficient metadata pipelines, overcoming common pitfalls, and leveraging Airflow to drive metadata governance at scale.

15:00 - 15:25. Columbia D
By Philippe Gagnon
Track: Airflow & ...

Trino is incredibly effective at enabling users to extract insights quickly and effectively from large amounts of data located in dispersed and heterogeneous federated data systems.

However, some business data problems are more complex than interactive analytics use cases and are best broken down into a sequence of interdependent steps, a.k.a. a workflow. For these use cases, dedicated software is often required to schedule and manage these processes with a principled approach.

15:00 - 15:25. Beckler
By Niko Oliveira
Track: Airflow intro/overview

Apache Airflow’s executor landscape has traditionally presented users with a clear trade-off: choose either the speed of local execution or the scalability, isolation and configurability of remote execution. The AWS Lambda Executor introduces a new paradigm that bridges this gap, offering near-local execution speeds with the benefits of remote containerization.

This talk will begin with a brief overview of Airflow’s executors, how they work and what they are responsible for, highlighting the compromises between different executors. We will explore the emerging niche for fast yet remote execution and demonstrate how the AWS Lambda Executor fills this space. We will also address practical considerations when using such an executor, such as working within Lambda’s 15-minute execution limit and how to mitigate it using multi-executor configuration.
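
As a hedged sketch of that mitigation, per-task executor selection (multi-executor support, Airflow 2.10+) lets short, bursty tasks run on Lambda while long ones stay on the default executor; the executor alias below is an assumption and must match what is configured in airflow.cfg:

    import pendulum
    from airflow.decorators import dag, task

    @dag(start_date=pendulum.datetime(2025, 1, 1), schedule=None, catchup=False)
    def hybrid_executors():
        # "AwsLambdaExecutor" is a placeholder alias for the name the
        # Lambda executor is registered under in [core] executor.
        @task(executor="AwsLambdaExecutor")
        def quick_transform():
            ...  # must finish within Lambda's 15-minute cap

        @task  # runs on the environment's default executor
        def long_running_step():
            ...

        quick_transform() >> long_running_step()

    hybrid_executors()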

15:45 - 16:10. Columbia A
By Yu Lung Law & Ivan Sayapin
Track: Use cases

The Bloomberg Data Platform Engineering team is responsible for managing, storing, and providing access to business and financial data used by financial professionals across the global capital markets. Our team utilizes Apache Airflow to orchestrate data workflows across various applications and Bloomberg Terminal functions. Over the years, we have fine-tuned our Airflow cluster to handle more than 1,000 ingestion DAGs, which has presented unique scalability challenges. In this session, we will share insights into several key Airflow parameters — some of which you may not be all that familiar with — that our team uses to optimize and scale the platform effectively.

15:45 - 16:10. Columbia C
By John Robert
Track: Airflow & ...

As modern data ecosystems grow in complexity, ensuring transparency, discoverability, and governance in data workflows becomes critical. Apache Airflow, a powerful workflow orchestration tool, enables data engineers to build scalable pipelines, but without proper visibility into data lineage, ownership, and quality, teams risk operating in a black box.

In this talk, we will explore how integrating Airflow with a data catalog can bring clarity and transparency to data workflows. We’ll discuss how metadata-driven orchestration enhances data governance, enables lineage tracking, and improves collaboration across teams. Through real-world use cases, we will demonstrate how Airflow can automate metadata collection, update data catalogs dynamically, and ensure data quality at every stage of the pipeline.

15:45 - 16:10. Columbia D
By Gurmeet Saran & Kushal Thakkar
Track: Airflow & ...

This session explores how to bring unit testing to SQL pipelines using Airflow. I’ll walk through the development of a SQL testing library that allows isolated testing of SQL logic by injecting mock data into base tables. To support this, we built a type system for AWS Glue tables using Pydantic, enabling schema validation and mock data generation. Over time, this type system also powered production data quality checks via a custom Airflow operator. Learn how this approach improves reliability, accelerates development, and scales testing across data workflows.
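
The library itself isn't shown in this abstract, but the core idea can be sketched: a Pydantic model doubles as the table's schema and the source of typed mock rows, which a test then queries (SQLite stands in for Glue here; all names are illustrative):

    import sqlite3
    from datetime import date
    from pydantic import BaseModel

    class OrderRow(BaseModel):
        """Assumed schema for a base table; fields are illustrative."""
        order_id: int
        amount: float
        order_date: date

    def mock_rows() -> list[dict]:
        # Typed, validated mock data to inject into the base table.
        rows = [
            OrderRow(order_id=1, amount=10.0, order_date=date(2025, 1, 1)),
            OrderRow(order_id=2, amount=20.0, order_date=date(2025, 1, 1)),
        ]
        return [r.model_dump() for r in rows]

    def test_daily_total():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (order_id INT, amount REAL, order_date TEXT)")
        conn.executemany(
            "INSERT INTO orders VALUES (:order_id, :amount, :order_date)",
            [{**r, "order_date": str(r["order_date"])} for r in mock_rows()],
        )
        (total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
        assert total == 30.0  # the SQL logic under test behaves as expected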

15:45 - 16:10. Beckler
By Priyanka Samanta
Track: Use cases

The journey from ML model development to production deployment and monitoring is often complex and fragmented. How can teams overcome the chaos of disparate tools and processes? This session dives into how Apache Airflow serves as a unifying force in MLOps. We’ll begin with a look at the broader MLOps trends observed by Google within the Airflow community, highlighting how Airflow is evolving to meet these challenges and showcasing diverse MLOps use cases – both current and future.

16:15 - 16:40. Columbia A
By Pei-Chi (Miko) Chen
Track: Use cases

Before Airflow, our BigQuery pipelines at Create Music Group operated like musicians without a conductor, each playing on its own schedule regardless of whether upstream data was ready. As our data platform grew, this chaos led to spiraling costs and performance bottlenecks, and became utterly unsustainable.

This talk tells the story of how Create Music Group brought harmony to its data workflows by adopting Apache Airflow and the Medallion architecture, ultimately slashing our data processing costs by 50%. We’ll show how moving to event-driven scheduling with datasets helped eliminate stale data issues, dramatically improved performance, and unlocked faster iteration across teams. Discover how we replaced repetitive SQL with standardized dimension/fact tables, empowering analysts in a safer sandbox.

16:15 - 16:40. Columbia C
By Sebastien Crocquevieille
Track: Use cases

As Data Engineers, our jobs regularly include scheduling or scaling workflows.

But have you ever asked yourself: can I scale my scheduling?

It turns out that you can! But doing so raises a number of issues that need to be addressed.

In this talk we’ll be:

  • Recapping Asset-aware scheduling in Apache Airflow
  • Discussing diverse methods to upscale our scheduling
  • Solving the issue of keeping our Airflow Assets synchronized between instances
  • Comparing our production push-based solution with the built-in solution from AIP-82, including the pros and cons of each method

I hope you will enjoy it!
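
To hint at what the push-based approach looks like, a hedged sketch: when an asset updates on instance A, a notifier posts a dataset event to instance B's REST API so B's asset-scheduled DAGs fire (the endpoint exists in recent Airflow 2.x releases; URL and credentials are placeholders):

    import requests

    def push_asset_event(uri: str) -> None:
        # Tell the downstream Airflow instance that the asset was updated.
        resp = requests.post(
            "https://airflow-b.example.com/api/v1/datasets/events",
            json={"dataset_uri": uri},
            auth=("user", "pass"),  # placeholder; use a proper auth mechanism
            timeout=10,
        )
        resp.raise_for_status()

    push_asset_event("s3://warehouse/events/daily")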

16:15 - 16:40. Columbia D
By Abhishek Bhakat & Sudarshan Chaudhari
Track: Airflow & ...

In today’s data-driven world, effective workflow management and AI are crucial for success. However, there’s a notable gap between Airflow and AI. Our presentation offers a solution to close this gap.

We propose an MCP (Model Context Protocol) server to act as a bridge. We’ll dive into two paths:

  • AI-Augmented Airflow: Enhancing Airflow with AI to improve error handling, automate DAG generation, proactively detect issues, and optimize resource use.
  • Airflow-Powered AI: Utilizing Airflow’s reliability to empower LLMs in executing complex tasks, orchestrating AI agents, and supporting decision-making with real-time data.

Key takeaways:

16:15 - 16:40. Beckler
By Annie Friedman & Caitlin Petro
Track: Best practices

Ever seen a DAG go rogue and deploy itself? Or try to time-travel back to 1999? Join us for a light-hearted yet painfully relatable look at how not to scale your Airflow deployment, so you can avoid chaos and debugging nightmares.

We’ll cover the classics: hardcoded secrets, unbounded retries (hello, immortal task!), and the infamous spaghetti DAG where 200 tasks are lovingly connected by hand and no one dares open the Airflow UI anymore. If you’ve ever used datetime.now() in your DAG definition and watched your backfills implode, this talk is for you.
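
For the record, a minimal sketch of the fix (the DAG name is illustrative): pin start_date and read the templated logical date instead of the wall clock, so reruns and backfills stay deterministic:

    import pendulum
    from airflow.decorators import dag, task

    # Anti-pattern: start_date=datetime.now() is re-evaluated at every parse,
    # so the scheduler never sees a stable anchor and backfills implode.

    @dag(start_date=pendulum.datetime(2025, 1, 1), schedule="@daily", catchup=False)
    def well_behaved():
        @task
        def process(ds=None):
            # ds is the templated logical date, not wall-clock "now".
            print(f"processing partition {ds}")

        process()

    well_behaved()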

16:45 - 17:10. Columbia A
By Di Wu
Track: Use cases

In this presentation, I will highlight how Apache Airflow addresses key data management challenges for Exchange-Traded Funds (ETFs) in the global financial market. ETFs, which combine features of mutual funds and stocks, track indexes, commodities, or baskets of assets and trade on major stock exchanges. Because they operate around the clock across multiple time zones, ETF managers must navigate diverse regulations, coordinate complex operational constraints, and ensure accurate valuations. This often requires integrating data from vendors for pricing and reference details. These data sets arrive at different times, can conflict, and must pass rigorous quality checks before being published for global investors. Managing updates, orchestrating workflows, and maintaining high data quality present significant hurdles. Apache Airflow tackles these issues by scheduling repetitive tasks and enabling event-triggered job runs for immediate data checks. It offers monitoring and alerting, thus reducing manual intervention and errors. Using DAGs, Airflow scales efficiently, streamlining complex data ingestion, validation, and publication processes.

16:45 - 17:10. Beckler
By Ashok Prakash
Track: Best practices

In today’s data-driven world, scalable ML infrastructure is mission-critical. As ML workloads grow, orchestration tools like Apache Airflow become essential for managing pipelines, training, deployment, and observability. In this talk, I’ll share lessons from building distributed ML systems across cloud platforms, including GPU-based training and AI-powered healthcare. We’ll cover patterns for scaling Airflow DAGs, integrating telemetry and auto-healing, and aligning cross-functional teams. Whether you’re launching your first pipeline or managing ML at scale, you’ll gain practical strategies to make Airflow the backbone of your ML infrastructure.

17:30 - 17:35. Columbia A
By Shahar Epstein
Track: Airflow & ...

Apache Airflow is a powerful workflow orchestrator, but as workloads grow, its Python-based components can become performance bottlenecks. This talk explores how Rust, with its speed, safety, and concurrency advantages, can enhance Airflow’s core components (e.g., the scheduler and DAG processor). We’ll dive into the motivations behind using Rust, architectural trade-offs, and the challenges of bridging the gap between Python and Rust. A proof of concept showcasing an Airflow scheduler rewritten in Rust will demonstrate the potential benefits of this approach.