Managing Apache Airflow at Scale

In this session we’ll discuss the considerations and challenges of running Apache Airflow at scale.

We’ll start by defining what it means to run Airflow at scale. Then we’ll dive deep into the limitations of the Airflow architecture, the Scheduler processes, and the relevant configuration options.

We’ll then cover scaling workloads with containers, pools, and priority, followed by scaling DAGs with dynamic DAGs/DAG factories, CI/CD, and DAG access control.

Finally, we’ll get into managing multiple Airflow environments: how to split up workloads, how to provide central governance for environment creation and monitoring, and an example of distributing workloads across environments.