These are the confirmed sessions for Airflow Summit 2025.

5 Simple Strategies To Enhance Your DAGs For Data Processing

by William Orgertrice

Want to take your Apache Airflow DAGs to the next level? In this insightful session we’ll uncover five transformative strategies to enhance your data workflows. Whether you’re a data engineering pro or just getting started, this presentation is packed with practical tips and actionable insights you can apply right away.

We’ll dive into the magic of using powerful libraries like Pandas, share techniques to trim down data volumes for faster processing, and highlight the importance of modularizing your code for easier maintenance. Plus, you’ll discover efficient ways to monitor and debug your DAGs, and how to make the most of Airflow’s built-in features.

Airflow & Bigtop: Modernize and integrate time-proven OSS stack with Apache Airflow

by Kengo Seki

Apache Bigtop is a time-proven open-source software stack for building data platforms, built around the Hadoop and Spark ecosystem since 2011. Its software composition has changed over that long period, and its job scheduler was recently removed, mainly due to inactive development. The speaker believes that Airflow fits this gap perfectly and is proposing to incorporate it into the Bigtop stack. This presentation will show how easily users can build a data platform with Bigtop that includes Airflow, and how Airflow can integrate that software through its wide range of providers and enterprise-readiness features such as Kerberos support.

Assets: Past, Present, Future

by Tzu-ping Chung

Airflow’s Asset originated from data lineage and evolved into its current state as a scheduling concept (data-aware, event-based scheduling). It has even more potential. This talk discusses how other parts of Airflow, namely Connection and Object Storage, contain concepts related to Asset, and how we can tie them all together to make task authoring flow even more naturally.

Planned topics:

  • A brief history of Asset and related constructs.
  • Current state of Asset concepts.
  • Inlets, anyone?
  • Finding inspiration from Pydantic et al.
  • My next step for Asset.
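
As a concrete reminder of the data-aware scheduling the talk builds on, here is a minimal sketch of the producer/consumer Asset pattern. It assumes Airflow 3’s Task SDK imports (in Airflow 2.x the analogous concept is Dataset from airflow.datasets); the URI and DAG names are placeholders.

```python
from datetime import datetime

from airflow.sdk import Asset, dag, task  # Airflow 3 Task SDK imports (assumed)

orders = Asset("s3://example-bucket/orders.parquet")  # placeholder URI


@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def orders_producer():
    @task(outlets=[orders])
    def write_orders():
        ...  # produce the data; finishing the task emits an Asset event

    write_orders()


@dag(schedule=[orders], start_date=datetime(2025, 1, 1))  # runs whenever the asset is updated
def orders_consumer():
    @task
    def read_orders():
        ...  # consume the freshly produced data

    read_orders()


orders_producer()
orders_consumer()
```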

Behind the Scenes: How We Tested Airflow 3 for Stability and Reliability

by Rahul Vats & Phani Kumar

Ensuring the stability of a major release like Airflow 3 required extensive testing across multiple dimensions. In this session, we will dive into the testing strategies and validation techniques used to guarantee a smooth rollout. From unit and integration tests to real-world DAG validations, this talk will cover the challenges faced, key learnings, and best practices for testing Airflow. Whether you’re a contributor, QA engineer, or Airflow user preparing for migration, this session will offer valuable takeaways to improve your own testing approach.

Beyond the bundle - evolving DAG parsing in Airflow 3

by Igor Kholopov

Airflow 3 made great strides with AIP-66, introducing the concept of a DAG bundle. This successfully challenged one of the fundamental architectural limitations of the original Airflow design, namely how DAGs are deployed, bringing structure to something that often had to be operated as a pile of files. However, we believe this should by no means be the end of the road for making DAG management easier, authoring more accessible to a broader audience, and integration with Data Agents smoother. We believe the next step in Airflow’s evolution is a native option to break away from requiring a real file on the file systems of multiple components just to have your DAG up and running. This is what we hope to achieve with AIP-85 - extendable DAG parsing control. In this talk I’d like to give a detailed overview of how we want to make it happen and show examples of the valuable integrations we hope to unblock with it.

Bringing Apache Airflow to a Security-First Organization: The Unconventional Battle for Automation

by Oluwafemi Olawoyin

What happens when you introduce Apache Airflow in an organization where every change is scrutinized, every deployment must pass security hurdles, and cloud is not an option?

In this talk, I share how I successfully implemented Airflow inside a high-compliance, security-driven environment, overcoming infrastructure limitations, manual workflows, and initial resistance from leadership.

This session will cover:

  1. How I deployed Airflow in an on-premise, cost-conscious setup without Docker or cloud
  2. How I automated DAG deployments from GitHub Actions, bridging Windows-based engineers with a Linux-hosted Airflow instance
  3. Security and compliance challenges: Getting Airflow approved in a highly regulated setting
  4. Scaling workflows: From a single team to cross-departmental adoption
  5. What worked, what failed, and what I’d do differently

Who Should Attend?

Building an Airflow Center of Excellence: Lessons from the Frontlines

by Jonathan Leek & Michelle Winters

As organizations scale their data infrastructure, Apache Airflow becomes a mission-critical component for orchestrating workflows efficiently. But scaling Airflow successfully isn’t just about running pipelines—it’s about building a Center of Excellence (CoE) that empowers teams with the right strategy, best practices, and long-term enablement. Join Jon Leek and Michelle Winters as they share their experiences helping customers design and implement Airflow Centers of Excellence. They’ll walk through real-world challenges, best practices, and the structured approach Astronomer takes to ensure teams have the right plan, resources, and support to succeed. Whether you’re just starting with Airflow or looking to optimize and scale your workflows, this session will give you a proven framework to build a sustainable Airflow Center of Excellence within your organization. 🚀

Common provider abstractions: Key for multi-cloud data handling

by Vikram Koka

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or meet data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimum boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider, an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after. We expect additional implementations, such as Amazon Kinesis and Managed Kafka, over time.
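
To make the “write once, run anywhere” point concrete, here is a minimal sketch using the Common-SQL provider’s SQLExecuteQueryOperator: the same query definition runs against different databases simply by pointing conn_id at a different connection. The connection IDs and table are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

DAILY_ROLLUP = """
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date
"""

with DAG("common_sql_demo", start_date=datetime(2025, 1, 1), schedule=None):
    # Identical task definition; only conn_id decides which database executes it.
    SQLExecuteQueryOperator(
        task_id="rollup_on_postgres", conn_id="postgres_default", sql=DAILY_ROLLUP
    )
    SQLExecuteQueryOperator(
        task_id="rollup_on_snowflake", conn_id="snowflake_default", sql=DAILY_ROLLUP
    )
```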

Creating DuoFactory: An Orchestration Ecosystem with Airflow

by Belle Romea

Duolingo has built an internal tool, DuoFactory, to orchestrate AI-generated content using Airflow. The tool has been used to generate example sentences for lessons, math exercises, and Duoradio lessons. The ecosystem is flexible enough to serve various company needs; some use cases involve end-to-end generation, where one click of a button generates content in the app. We have also created a Workflow Builder to orchestrate and iterate on generative AI workflows by creating one-time DAG instances, with a UI easy enough for non-engineers to use.

DAGLint: Elevating Airflow DAG Quality Through Automated Linting

by Snir Israeli

Maintaining consistency, code quality, and best practices for writing Airflow DAGs across teams and individual developers can be a significant challenge. Trying to achieve this through manual code reviews is both time-consuming and error-prone.

To solve this at Next, we decided to build a custom, internally developed linting tool for Airflow DAGs to help us evaluate their quality and uniformity. We call it DAGLint.

In this talk I am going to share why we chose to implement it, how we built it, and how we use it to elevate our code quality and standards throughout the entire Data engineering group.

Data Quality and Observability with Airflow

by Ipsa Trivedi & Chirag Tailor

Tekmetric is the largest cloud-based auto shop management system in the United States. We process vast amounts of data from various integrations with internal and external systems. Data quality and governance are crucial for both our internal operations and the success of our customers.

We leverage multi-step data processing pipelines using AWS services and Airflow. While we utilize traditional data pipeline workflows to manage and move data, we go beyond standard orchestration. After data is processed, we apply tailored quality checks for schema validation, record completeness, freshness, duplication and more.
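
As a flavor of what such post-processing checks can look like (this is an illustrative sketch, not Tekmetric’s actual pipeline), the Common-SQL provider ships declarative check operators for completeness, duplication, and freshness; the connection, table, and column names below are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import (
    SQLColumnCheckOperator,
    SQLTableCheckOperator,
)

with DAG("quality_checks_demo", start_date=datetime(2025, 1, 1), schedule="@daily"):
    SQLColumnCheckOperator(
        task_id="check_order_id",
        conn_id="warehouse",
        table="orders",
        column_mapping={
            "order_id": {
                "null_check": {"equal_to": 0},    # record completeness
                "unique_check": {"equal_to": 0},  # no duplicates
            }
        },
    )
    SQLTableCheckOperator(
        task_id="check_freshness",
        conn_id="warehouse",
        table="orders",
        checks={"loaded_today": {"check_statement": "MAX(loaded_at) >= CURRENT_DATE"}},
    )
```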

Designing Scalable Retrieval-Augmented Generation (RAG) Pipelines at SAP with Apache Airflow

by Sagar Sharma

At SAP Business AI, we’ve transformed Retrieval-Augmented Generation (RAG) pipelines into enterprise-grade powerhouses using Apache Airflow. Our Generative AI Foundations Team developed a cutting-edge system that effectively grounds Large Language Models (LLMs) with rich SAP enterprise data. Powering Joule for Consultants, our innovative AI copilot, this pipeline manages the seamless ingestion, sophisticated metadata enrichment, and efficient lifecycle management of over a million structured and unstructured documents. By leveraging Airflow’s Dynamic DAGs, TaskFlow API, XCom, and Kubernetes Event-Driven Autoscaling (KEDA), we achieved unprecedented scalability and flexibility. Join our session to discover actionable insights, innovative scaling strategies, and a forward-looking vision for Pipeline-as-a-Service, empowering seamless integration of customer-generated content into scalable AI workflows.

Do you trust Airflow with your money? (We do!)

by Nick Bilozerov, Daniel Melchor & Sabrina Liu

Airflow is wonderfully, frustratingly complex - and so is global finance! Stripe has very specific needs all over the planet, and we have customized Airflow to adapt to the variety and rigor that we need to grow the GDP of the internet.

In this talk, you’ll learn:

  • How we support independent DAG change management for over 500 different teams running over 150k tasks.

  • How we’ve customized Airflow’s Kubernetes integration to comply with Stripe’s unique compliance requirements.

EdgeExecutor / Edge Worker - The new option to run anywhere

by Jens Scheffler & Daniel Wolf

Airflow 3 extends the deployment options so you can run your workload anywhere. You don’t need to bring your data to Airflow; instead, you can bring the execution to where it needs to be.

You can connect any cloud and on-prem location and run a hybrid workflow from one central Airflow instance. Only an HTTP connection is needed.

We will present the use cases and concepts of the Edge deployment, and how it also works in a hybrid setup alongside Celery or other executors.

ELT, AI, and Elections: Leveraging Airflow and Machine Learning to Analyze Voting Behavior at INTRVL

by Kyle McCluskey

Discover how Apache Airflow powers scalable ELT pipelines, enabling seamless data ingestion, transformation, and machine learning-driven insights. This session will walk through:

Automating Data Ingestion: Using Airflow to orchestrate raw data ingestion from third-party sources into your data lake (S3, GCP), ensuring a steady pipeline of high-quality training and prediction data.

Optimizing Transformations with Serverless Computing: Offloading intensive transformations to serverless functions (GCP Cloud Run, AWS Lambda) and machine learning models (BigQuery ML, Sagemaker), integrating their outputs seamlessly into Airflow workflows.

Ensuring Data Accuracy & Consistency with Airflow and dbt Tests

by Bao Nguyen

As analytics engineers, ensuring data accuracy and consistency is critical, but how do we systematically catch errors before they impact stakeholders? This session will explore how to integrate Airflow with dbt tests to build reliable and automated data validation workflows.

We’ll cover:

  • How to orchestrate dbt tests with Airflow DAGs for real-time data quality checks.
  • Handling test failures with alerting and retry strategies.
  • Using custom dbt tests for advanced validation beyond built-in checks.
  • Best practices for data observability, logging, and monitoring failed runs.
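
A minimal sketch of the orchestration pattern above, assuming dbt is installed on the worker and the project lives at /opt/dbt/project (both placeholders): run the models, then the tests, retry transient failures, and alert on anything that still fails. Import paths are the Airflow 2.x ones; in Airflow 3 BashOperator lives in the standard provider.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator


def notify_failure(context):
    # Placeholder alerting hook: swap in Slack, PagerDuty, email, etc.
    print(f"dbt task {context['task_instance'].task_id} failed")


with DAG(
    "dbt_quality_checks",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_failure,
    },
):
    dbt_run = BashOperator(
        task_id="dbt_run", bash_command="dbt run --project-dir /opt/dbt/project"
    )
    dbt_test = BashOperator(
        task_id="dbt_test", bash_command="dbt test --project-dir /opt/dbt/project"
    )
    dbt_run >> dbt_test
```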

Event-Driven Airflow 3.0: Real-Time Orchestration with Pub/Sub

by Andrea Bombino & Nawfel Bacha

Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives, eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We’ll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we’ll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency. Key takeaways: how event-driven DAGs work vs. traditional scheduling, best practices for integrating Airflow with Pub/Sub, eliminating polling-based sensors for efficiency, and a live demo of an event-driven pipeline with Airflow 3.0, Pub/Sub, and dbt.
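
For orientation, here is a hedged sketch of the event-driven mechanism described above, using Airflow 3’s AssetWatcher together with the common messaging provider’s MessageQueueTrigger. It is shown with an SQS-style queue URI (the message bus abstraction’s initial target); a Pub/Sub subscription would plug into the same watcher pattern, and exact import paths and supported queue schemes depend on your provider versions.

```python
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import Asset, AssetWatcher, dag, task

# Placeholder queue URI; the triggerer watches it asynchronously (no polling sensor).
trigger = MessageQueueTrigger(queue="https://sqs.us-east-1.amazonaws.com/123456789012/new-data")

new_data = Asset("new_data", watchers=[AssetWatcher(name="new_data_watcher", trigger=trigger)])


@dag(schedule=[new_data])  # runs as soon as a message lands
def transform_on_arrival():
    @task
    def run_transformations():
        ...  # e.g. kick off the dbt models mentioned in the talk

    run_transformations()


transform_on_arrival()
```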

Event-Driven, Partition-Aware: Modern Orchestration with Airflow at Datadog

by Julien Le Dem

Datadog is a world-class data platform ingesting more than 100 trillion events a day, providing real-time insights.

Before Airflow’s prominence, we built batch processing on Luigi, Spotify’s open-source orchestrator. As Airflow gained wide adoption, we evaluated adopting the major improvements of release 2.0, but opted for building our own orchestrator instead to realize our dataset-centric, event-driven vision.

Meanwhile, the 3.0 release aligned Airflow with the same vision we had pursued internally: a modern, asset-driven orchestrator. It showed how futile building our own was compared to the momentum of the community. We evaluated several orchestrators and decided to join forces with the Airflow project.

From Chaos to Cosmos: Automating DBT Workflows with Airflow at Riot

by Zach Ward

Breaking Barriers in Data Orchestration: Discover how Riot Games has integrated DBT, Airflow, Astronomer Cosmos, and custom automation to transform data workflows from a complex, code-heavy process into a simple, config-driven experience.

From Complexity to Simplicity: Learn how Riot has dramatically reduced the technical overhead of building and orchestrating DBT models, slashing Time to Production and accelerating Time to Insights.

Building a Seamless Data Pipeline Ecosystem: Get a high-level overview of how we’ve stitched together multiple technologies to create a unified, scalable, and developer-friendly pipelining system.

From Complexity to Simplicity with TaskHarbor: Trendyol's Path to a Unified Orchestration Platform

by Salih Goktug Kose & Burak Ozdemir

At Trendyol, Turkey’s leading e-commerce company, Apache Airflow powers our task orchestration, handling DAGs with 500+ tasks, complex interdependencies, and diverse environments. Managing on-prem Airflow instances posed challenges in scalability, maintenance, and deployment. To address these, we built TaskHarbor, a fully managed orchestration platform with a hybrid architecture—combining Airflow on GKE with on-prem resources for optimal performance and efficiency.

This talk covers how we:

  • Enabled seamless DAG synchronization across environments using GCS Fuse.
  • Optimized workload distribution via GCP’s HTTPS & TCP Load Balancers.
  • Automated infrastructure provisioning (GKE, CloudSQL, Kubernetes) using Terraform.
  • Simplified Airflow deployments by replacing Helm YAML files with a custom templating tool, reducing configurations to 10-15 lines.
  • Built a fully automated deployment pipeline, ensuring zero developer intervention.

We enhanced efficiency, reliability, and automation in hybrid orchestration by embracing a scalable, maintainable, and cloud-native strategy. Attendees will obtain practical insights into architecting Airflow at scale and optimizing deployments.

From DAGs to Insights: Business-Driven Airflow Use Cases

by Tala Karadsheh

Airflow is integral to GitHub’s data and insight generation. This session dives into use cases from GitHub where key business decisions are driven, at the root, with the help of Airflow. The session will also highlight how both GitHub and Airflow celebrate, promote, and nurture OSS innovations in their own ways.

From Oops to Secure Ops: Self-Hosted AI for Airflow Failure Diagnosis

by Nathan Hadfield

Last year, ‘From Oops to Ops’ showed how AI-powered failure analysis could help diagnose why Airflow tasks fail. But do we really need large, expensive cloud-based AI models to answer simple diagnostic questions? Relying on external AI APIs introduces privacy risks, unpredictable costs, and latency, often without clear benefits for this use case.

With the rise of distilled, open-source models, self-hosted failure analysis is now a practical alternative. This talk will explore how to deploy an AI service on infrastructure you control, compare cost, speed, and accuracy between OpenAI’s API and self-hosted models, and showcase a live demo of AI-powered task failure diagnosis using DeepSeek and Llama—running without external dependencies to keep data private and costs predictable.

Get Certified: DAG Authoring for Apache Airflow 3

by Marc Lamberti

We’re excited to offer Airflow Summit 2025 attendees an exclusive opportunity to earn their DAG Authoring certification in person, now updated to include all the latest Airflow 3.0 features. This certification workshop comes at no additional cost to summit attendees.

The DAG Authoring for Apache Airflow certification validates your expertise in advanced Airflow concepts and demonstrates your ability to build production-grade data pipelines. It covers TaskFlow API, Dynamic task mapping, Templating, Asset-driven scheduling, Best practices for production DAGs, and new Airflow 3.0 features and optimizations.
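
As a taste of two of those topics, here is a small sketch combining the TaskFlow API with dynamic task mapping; the file names are placeholders, and the import path shown is Airflow 3’s Task SDK (use airflow.decorators on Airflow 2.x).

```python
from datetime import datetime

from airflow.sdk import dag, task  # Airflow 3 Task SDK (airflow.decorators on 2.x)


@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def taskflow_mapping_demo():
    @task
    def list_files() -> list[str]:
        return ["a.csv", "b.csv", "c.csv"]  # stand-in for a real listing step

    @task
    def process(path: str) -> int:
        return len(path)  # stand-in for real processing

    # Dynamic task mapping: one mapped task instance per file, decided at runtime.
    process.expand(path=list_files())


taskflow_mapping_demo()
```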

How Airflow can help with Data Management and Governance

by Kunal Jain

Metadata management is a cornerstone of effective data governance, yet it presents unique challenges distinct from traditional data engineering. At scale, efficiently extracting metadata from relational and NoSQL databases demands specialized solutions. To address this, our team has developed custom Airflow operators that scan and extract metadata across various database technologies, orchestrating 100+ production jobs to ensure continuous and reliable metadata collection.

Now, we’re expanding beyond databases to tackle non-traditional data sources such as file repositories and message queues. This shift introduces new complexities, including processing structured and unstructured files, managing schema evolution in streaming data, and maintaining metadata consistency across heterogeneous sources. In this session, we’ll share our approach to building scalable metadata scanners, optimizing performance, and ensuring adaptability across diverse data environments. Attendees will gain insights into designing efficient metadata pipelines, overcoming common pitfalls, and leveraging Airflow to drive metadata governance at scale.

How Airflow solves the coordination of decentralised teams at Vinted

by Oscar Ligthart & Rodrigo Loredo

Vinted is the biggest second-hand marketplace in Europe with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building Data Products from petabytes of data. This creates a challenging environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams.

Implementing Operations Research Problems with Apache Airflow: From Modelling to Production

by Philippe Gagnon

This workshop will provide an overview of implementing operations research problems using Apache Airflow. This is a hands-on session where attendees will gain experience creating DAGs to define and manage workflows for classical operations research problems. The workshop will include several examples of how Airflow can be used to optimize and automate various decision-making processes, including:

Inventory management: How to use Airflow to optimize inventory levels and reduce stockouts by analyzing demand patterns, lead times, and other factors.
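
To give a flavor of what such a workflow can look like (an illustrative sketch only, not part of the workshop materials), a daily DAG might recompute a classic reorder point, average daily demand times lead time plus safety stock, and flag SKUs to reorder; the data sources and numbers are placeholders.

```python
from datetime import datetime

from airflow.sdk import dag, task  # Airflow 3 Task SDK (airflow.decorators on 2.x)


@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def inventory_reorder():
    @task
    def fetch_demand():
        # In practice: query the warehouse for average daily demand per SKU.
        return {"sku-1": 40.0, "sku-2": 12.5}

    @task
    def compute_reorder_points(demand: dict):
        lead_time_days, safety_stock = 7, 50
        return {sku: d * lead_time_days + safety_stock for sku, d in demand.items()}

    @task
    def publish(reorder_points: dict):
        print(reorder_points)  # stand-in for writing back to the inventory system

    publish(compute_reorder_points(fetch_demand()))


inventory_reorder()
```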

Introducing Apache Airflow® 3 – The Next Evolution in Orchestration

by Amogh Desai, Ash Berlin-Taylor, Brent Bovenzi, Bugra Ozturk, Daniel Standish, Jed Cunningham, Jens Scheffler, Kaxil Naik, Pierre Jeambrun, Tzu-ping Chung, Vikram Koka & Vincent Beck

Apache Airflow® 3 is here, bringing major improvements to data orchestration. In this keynote, core Airflow contributors will walk through key enhancements that boost flexibility, efficiency, and user experience.

Vikram Koka will kick things off with an overview of Airflow 3, followed by deep dives into DAG versioning (Jed Cunningham), enhanced backfilling (Daniel Standish), and a modernized UI (Brent Bovenzi & Pierre Jeambrun).

Next, Ash Berlin-Taylor, Kaxil Naik, and Amogh Desai will introduce the Task Execution Interface and Task SDK, enabling tasks in any environment and language. Jens Scheffler will showcase the Edge Executor, while Tzu-ping Chung and Vincent Beck will demo event-driven scheduling and data assets. Finally, Buğra Öztürk will unveil CLI enhancements for automation and debugging.

Lessons learned for scaling up Airflow 3 in Public Cloud

by Przemek Wiech

Apache Airflow 3 is the new state-of-the-art version of Airflow. For many users planning to adopt Airflow 3, it is important to understand how it behaves from a performance perspective compared to Airflow 2.

This presentation will share performance results for various Airflow 3 configurations and give adopters a solid understanding of Airflow 3 performance.

The reference Airflow 3 configuration will use a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, and will run on Google Cloud Platform. Performance tests will be run using the community version of the performance-testing framework, and there may be references to Cloud Composer (the managed service for Apache Airflow). The tests will be done in production-grade configurations that can serve as good references for Airflow community users.

LinkedIn's Journey on Scaling Airflow

by Rahul Gade

Last year, we shared how LinkedIn’s continuous deployment platform (LCD) leveraged Apache Airflow to streamline and automate deployment workflows. LCD is the deployment platform inside LinkedIn, actively used by all 10,000+ engineers at LinkedIn.

This year, we take a deeper dive into the challenges, solutions, and engineering innovations that helped us scale Airflow to support thousands of concurrent tasks while maintaining usability and reliability.

Key Takeaways: Abstracting Airflow for a Better User Experience – How we designed a system where users could define and update their workflows without directly interacting with Airflow.

Managing Airflow DAGs Across DAG and ETL Repos

by Yunhao Qing

At Lyft, we manage Airflow DAGs across both the ETL and DAG repos, each serving distinct needs. The ETL repo is ideal for simple use cases and users with only a few DAGs, offering a streamlined workflow. Meanwhile, the DAG repo supports power users with numerous DAGs, custom dependencies, and complex ML pipelines. In this session, I’ll share how we structure these repos, the trade-offs involved, and best practices for scaling Airflow DAG management across diverse teams and workloads.

Mastering Event-Driven in Airflow 3: Building Scalable Data Pipelines

by Luan Moreno Medeiros Maciel

Transform your data pipelines with event-driven scheduling in Airflow 3. In this hands-on workshop, you’ll:

  • Set up AssetWatchers to track S3, Kafka, or database events
  • Build DAGs that trigger instantly on new data
  • Master scaling techniques for high-volume workflows

Create a live pipeline—process logs or IoT data in real time—and adapt it to your needs. No event-driven experience required; just bring a laptop and Airflow basics. Gain practical skills to make your pipelines responsive and efficient.

Model Context Protocol with Airflow

by Abhishek Bhakat & Sudarshan Chaudhari

In today’s data-driven world, effective workflow management and AI are crucial for success. However, there’s a notable gap between Airflow and AI. Our presentation offers a solution to close this gap.

We propose an MCP (Model Context Protocol) server to act as a bridge. We’ll dive into two paths:

  • AI-Augmented Airflow: Enhancing Airflow with AI to improve error handling, automate DAG generation, proactively detect issues, and optimize resource use.
  • Airflow-Powered AI: Utilizing Airflow’s reliability to empower LLMs in executing complex tasks, orchestrating AI agents, and supporting decision-making with real-time data.

Key takeaways:

New Tools, Same Craft: The Developer's Toolbox in 2025

by Brooke Jamieson

Our development workflows look dramatically different than they did a year ago. Code generation, automated testing, and AI-assisted documentation tools are now part of many developers’ daily work. Yet as these tools reshape how we code, I’ve noticed something worth examining: while our toolbox is changing rapidly, the core of being a good developer hasn’t. Problem-solving, collaborative debugging, and systems thinking remain as crucial as ever.

In this keynote, I’ll share observations about:

Orchestrating Apache Airflow ML Workflows at Scale with SageMaker Unified Studio

by Vinod Jayendra

As organizations increasingly rely on data-driven applications, managing the diverse tools, data, and teams involved can create challenges. Amazon SageMaker Unified Studio addresses this by providing an integrated, governed platform to orchestrate end-to-end data and AI/ML workflows.

In this workshop, we’ll explore how to leverage Amazon SageMaker Unified Studio to build and deploy scalable Apache Airflow workflows that span the data and AI/ML lifecycle. We’ll walk through real-world examples showcasing how this AWS service brings together familiar Airflow capabilities with SageMaker’s data processing, model training, and inference features - all within a unified, collaborative workspace.

Orchestrating Global Market Data Pipelines with Airflow

by Di Wu

In this presentation, I will highlight how Apache Airflow addresses key data management challenges for Exchange-Traded Funds (ETFs) in the global financial market. ETFs, which combine features of mutual funds and stocks, track indexes, commodities, or baskets of assets and trade on major stock exchanges. Because they operate around the clock across multiple time zones, ETF managers must navigate diverse regulations, coordinate complex operational constraints, and ensure accurate valuations. This often requires integrating data from vendors for pricing and reference details. These data sets arrive at different times, can conflict, and must pass rigorous quality checks before being published for global investors. Managing updates, orchestrating workflows, and maintaining high data quality present significant hurdles. Apache Airflow tackles these issues by scheduling repetitive tasks and enabling event-triggered job runs for immediate data checks. It offers monitoring and alerting, thus reducing manual intervention and errors. Using DAGs, Airflow scales efficiently, streamlining complex data ingestion, validation, and publication processes.

Orchestrating MLOps and Data Transformation at EDB with Airflow

by Karthik Dulam

This talk explores EDB’s journey from siloed reporting to a unified data platform, powered by Airflow. We’ll delve into the architectural evolution, showcasing how Airflow orchestrates a diverse range of use cases, from Analytics Engineering to complex MLOps pipelines.

Learn how EDB leverages Airflow and Cosmos to integrate dbt for robust data transformations, ensuring data quality and consistency.

We’ll provide a detailed case study of our MLOps implementation, demonstrating how Airflow manages training, inference, and model monitoring pipelines for Azure Machine Learning models.

Purple is the new green: harnessing deferrable operators to improve performance & reduce costs

by Ethan Shalev

Airflow’s traditional execution model often leads to wasted resources: worker nodes sitting idle, waiting on external systems. At Wix, we tackled this inefficiency head-on by refactoring our in-house operators to support Airflow’s deferrable execution model.

Join us on a walk through Wix’s journey to a more efficient Airflow setup, from identifying bottlenecks to implementing deferrable operators and reaping their benefits. We’ll share the alternatives considered, the refactoring process, and how the team seamlessly integrated deferrable execution with no disruption to data engineers’ workflows.
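
For readers new to the pattern, here is a minimal sketch of a deferrable operator (not Wix’s actual code): instead of blocking a worker slot while waiting on an external system, the operator defers itself to the triggerer and resumes only when the trigger fires. Import paths are the Airflow 2.x ones, and TimeDeltaTrigger stands in for an async trigger that would poll the external system.

```python
from datetime import timedelta

from airflow.models.baseoperator import BaseOperator
from airflow.triggers.temporal import TimeDeltaTrigger


class WaitThenRunOperator(BaseOperator):
    """Waits without occupying a worker slot, then does its work."""

    def execute(self, context):
        # Hand control to the triggerer; the worker slot is freed immediately.
        self.defer(
            trigger=TimeDeltaTrigger(timedelta(minutes=30)),  # stand-in async trigger
            method_name="resume",
        )

    def resume(self, context, event=None):
        # Called on a (possibly different) worker once the trigger fires.
        self.log.info("External system ready, continuing with the real work")
```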

Run Airflow tasks on your coffee machine

by Cedrik Neumann

Airflow 3 comes with two new features: Edge execution and the Task SDK. Powered by an HTTP API, these make it possible to write and execute Airflow tasks in any language from anywhere.

In this session I will explain some of the APIs needed and show how to interact with them based on an embedded toy worker written in Rust and running on an ESP32-C3. Furthermore I will provide practical tips on writing your own edge worker and how to develop against a running instance of Airflow.

Scaling and Unifying Multiple Airflow Instances with Orchestration Frederator

by Chirag Todarka & Alvin Zhang

In large organizations, multiple Apache Airflow instances often arise organically—driven by team-specific needs, distinct use cases, or tiered workloads. This fragmentation introduces complexity, operational overhead, and higher infrastructure costs. To address these challenges, we developed the “Orchestration Frederator,” a solution designed to unify and horizontally scale multiple Airflow deployments seamlessly.

This session will detail our journey in implementing Orchestration Frederator, highlighting how we achieved:

  • Horizontal Scalability: Seamlessly scaling Airflow across multiple instances without operational overhead.

Seamless Integration: Building Applications That Leverage Airflow's Database Migration Framework

by Ephraim Anierobi

This session presents a comprehensive guide to building applications that integrate with Apache Airflow’s database migration system. We’ll explore how to harness Airflow’s robust Alembic-based migration toolchain to maintain schema compatibility between Airflow and custom applications, enabling developers to create solutions that evolve alongside the Airflow ecosystem without disruption.

Security made us do it: Airflow’s new Task Execution Architecture

by Amogh Desai & Ash Berlin-Taylor

The Airflow v2 architecture has strong coupling between the Airflow core and the user code running in an Airflow task. This poses barriers to security, maintenance, and adoption. One such threat is that user code can access Airflow’s source of truth, the metadata DB, and run any query against it! From a scalability angle, ‘n’ tasks create ‘n’ DB connections, limiting Airflow’s ability to scale effectively.

To address this, we proposed AIP-72, a client-server model for task execution. The new architecture addresses several long-standing issues, including DB isolation from workers, dependency conflicts between the Airflow core and workers, and the ‘n’ DB connections problem. The new architecture has two parts:

Simplifying Data Management with DAG Factory

by Katarzyna Kalek & Jakub Orlowski

At OLX, we connect millions of people daily through our online marketplace while relying on robust data pipelines. In this talk, we explore how the DAG Factory concept elevates data governance, lineage, and discovery by centralizing operator logic and restricting direct DAG creation. This approach enforces code quality, optimizes resources, maintains infrastructure hygiene and enables smooth version upgrades. We then leverage consistent naming conventions in Airflow to build targeted namespaces, aligning teams with global policies while preserving autonomy. Integrating external tools like AWS Lake Formation and Open Metadata further unifies governance, making it straightforward to manage and secure data. This is critical when handling hundreds or even thousands of active DAGs. If the idea of storing 1,600 pipelines in one folder seems overwhelming, join us to learn how the DAG Factory concept simplifies pipeline management. We’ll also share insights from OLX, highlighting how thoughtful design fosters oversight, efficiency, and discoverability across diverse use cases.
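
For readers unfamiliar with the pattern, here is a hedged sketch of the DAG Factory idea (not OLX’s implementation): teams describe pipelines as configuration, and a single factory owns DAG construction, naming conventions, and default policies. The pipeline specs and import paths (Airflow 2.x style) are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Teams contribute config entries instead of writing DAG files directly.
PIPELINES = {
    "team_search__clicks_daily": {"schedule": "@daily", "command": "run_clicks_job"},
    "team_ads__spend_hourly": {"schedule": "@hourly", "command": "run_spend_job"},
}


def build_dag(dag_id: str, spec: dict) -> DAG:
    # Centralized defaults: naming, tags, retries, and policies are enforced here once.
    with DAG(
        dag_id=dag_id,
        schedule=spec["schedule"],
        start_date=datetime(2025, 1, 1),
        tags=[dag_id.split("__")[0]],  # namespace derived from the naming convention
    ) as dag:
        BashOperator(task_id="run", bash_command=spec["command"])
    return dag


for dag_id, spec in PIPELINES.items():
    globals()[dag_id] = build_dag(dag_id, spec)
```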

Supercharging Apache Airflow: Enhancing Core Components with Rust

by Shahar Epstein

Apache Airflow is a powerful workflow orchestrator, but as workloads grow, its Python-based components can become performance bottlenecks. This talk explores how Rust, with its speed, safety, and concurrency advantages, can enhance Airflow’s core components (e.g., the scheduler, the DAG processor, etc.). We’ll dive into the motivations behind using Rust, architectural trade-offs, and the challenges of bridging the gap between Python and Rust. A proof-of-concept showcasing an Airflow scheduler rewritten in Rust will demonstrate the potential benefits of this approach.

Task failures troubleshooting based on Airflow & Kubernetes signals

by Khadija Al Ahyane

Per the Airflow community survey, Kubernetes is the most popular compute platform for running Airflow. When run on Kubernetes, Airflow gains many out-of-the-box benefits such as monitoring, reliability, ease of deployment, scalability, and autoscaling. On the other hand, running Airflow on Kubernetes means running one sophisticated distributed system on top of another, which makes troubleshooting Airflow task and DAG failures harder.

This session tackles that bottleneck head-on, introducing a practical approach to building an automated diagnostic pipeline for Airflow on Kubernetes. Imagine offloading tedious investigations to a system that, on task failure, automatically collects and correlates key signals from Kubernetes components (linking Airflow tasks to specific Pods and their events), Kubernetes/GKE monitoring, and relevant logs, pinpointing root causes and suggesting actionable fixes.

Transforming Data Engineering: Achieving Efficiency and Ease with an Intuitive Orchestration Solution

by Rakesh Kumar Tai & Mili Tripathi

In the rapidly evolving field of data engineering and data science, efficiency and ease of use are crucial. Our innovative solution offers a user-friendly interface to manage and schedule custom PySpark, PySQL, Python, and SQL code, streamlining the process from development to production. Using Airflow at the backend, this tool eliminates the complexities of infrastructure management, version control, CI/CD processes, and workflow orchestration. The intuitive UI allows users to upload code, configure job parameters, and set schedules effortlessly, without the need for additional scripting or coding. Additionally, users have the flexibility to bring their own custom artifactory solution and run their code.

In summary, our solution significantly enhances the orchestration and scheduling of custom code, breaking down traditional barriers and empowering organizations to maximize their data’s potential and drive innovation efficiently. Whether you are an individual data scientist or part of a large data engineering team, this tool provides the resources needed to streamline your workflow and achieve your goals faster than ever before.

Transforming Insurance underwriting with Agentic AI

by Peeyush Rai

The weav.ai platform is built on top of Apache Airflow, chosen for its deterministic, predictable execution coupled with extreme developer customizability. weav.ai has seamlessly integrated its AI agents with Airflow to enable unified AI orchestration, bringing the power of scalability, robustness, and the intelligence of AI into a single process. This talk will focus on the use cases being served, an architecture overview of the key Airflow capabilities being leveraged, and how Agentic AI has been seamlessly integrated to deliver AI-powered workflows. Weav.ai’s platform is agnostic to any specific cloud or LLM and can orchestrate across them based on the use case.

Unleash Airflow’s Potential with a hands-on Performance Optimization workshop

by Mike Ellis

This interactive workshop session empowers you to unlock the full potential of Apache Airflow through performance optimization techniques. Gain hands-on experience identifying performance bottlenecks and implementing best practices to overcome them.

Variant DAGs: A new approach to support A/B testing in ML workflows with Airflow

by Shlomi Chovel

Our team at Amazon’s P13n faced a challenge: how to create experimental variants of ML workflows for A/B testing product recommendations on the website without sacrificing speed or resources. The solution we developed addresses this challenge while disrupting the conventional Airflow practices.

Airflow documentation advises: “Never delete a task from a DAG.. It is advised to create a new DAG in case the tasks need to be deleted.”

We offer an alternative perspective: by carefully modifying existing DAGs to create experimental variants, we’ve achieved a balance between code reuse and experiment isolation.

Why AWS chose Apache Airflow to power workflows for the next generation of Amazon SageMaker

by John Jackson

On March 13th, 2025, Amazon Web Services announced General Availability of Amazon SageMaker Unified Studio, bringing together AWS machine learning and analytics capabilities. At the heart of this next generation of Amazon SageMaker sits Apache Airflow. All SageMaker Unified Studio users have a personal, open-source Airflow deployment, running alongside their Jupyter notebook, enabling those users to easily develop Airflow DAGs that have unified access to all of their data.

In this talk, I will go into details around the motivations for choosing Airflow for this capability, the challenges with incorporating Airflow into such a large and diverse experience, the key role that open-source plays, how we’re leveraging GenAI to make that open source development experience better, and the goals for the future of Airflow in SageMaker Unified Studio.

Why Data Teams Keep Reinventing the Wheel: The Struggle for Code Reuse in the Data Transformation La

by Maxime Beauchemin

Data teams have a bad habit: reinventing the wheel. Despite the explosion of open-source tooling, best practices, and managed services, teams still find themselves building bespoke data platforms from scratch—often hitting the same roadblocks as those before them. Why does this keep happening, and more importantly, how can we break the cycle?

In this talk, we’ll unpack the key reasons data teams default to building rather than adopting, from technical nuances to cultural and organizational dynamics. We’ll discuss why fragmentation in the modern data stack, the pressure to “own” infrastructure, and the allure of in-house solutions make this problem so persistent.

Workshop: Get started with Airflow 3.0

by Kenten Danas

Airflow 3.0 is the most significant release in the project’s history, bringing a better user experience, stronger security, and the ability to run tasks anywhere, at any time. In this workshop, you’ll get hands-on experience with the new release and learn how to leverage new features like DAG versioning, backfills, data assets, and a new React-based UI.

Whether you’re writing traditional ELT/ETL pipelines or complex ML and GenAI workflows, you’ll learn how Airflow 3 will make your day-to-day work smoother and your pipelines even more flexible. This workshop is suitable for intermediate to advanced Airflow users. Beginning users should consider taking the Airflow fundamentals course on the Astronomer Academy before attending this workshop.

Your first Apache Airflow Contribution

by Ryan Hatter, Amogh Desai & Phani Kumar

Ready to contribute to Apache Airflow? In this hands-on workshop, you’ll be expected to come prepared with your development environment already configured (having Breeze installed is strongly recommended, but Codespaces works if you can’t install Docker). We’ll dive straight into finding issues that match your skills and walk you through the entire contribution process, from creating your first pull request to receiving community feedback. Whether you’re writing code, enhancing documentation, or offering feedback, there’s a place for you. Let’s get started and see your name among Airflow contributors!