In this talk, we will introduce the DAG Management Service (DMS), developed to address critical challenges in managing Airflow clusters. With over 10,000 active DAGs, a single Airflow cluster runs into scaling limits and noisy-neighbor issues that put task scheduling SLAs at risk. DMS improves reliability by distributing DAGs across multiple clusters and enforcing proper configurations.
We will also discuss how DMS streamlines Airflow version upgrades. Upgrading from an old Airflow version to the latest requires sequential updates and code modifications across more than 10,000 DAGs. DMS proposes a more efficient upgrade method that relies less on users making these changes themselves.
Key functions of DMS include:
- DAG Deployment: Selectively deploys DAG files from GitHub to Airflow clusters via an event-driven pipeline.
- DAG Migration: Facilitates seamless DAG migration between clusters, supporting both cluster upgrades and team-specific deployments.
- Connections and Variables Management: Centralizes management of connection IDs and variables, ensuring consistency and smooth migrations.
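As a rough illustration of the selective deployment idea, the sketch below groups changed DAG files by target cluster. It is not the DMS implementation: the `dags/<team>/...` repository layout, the `ROUTES` table, the cluster names, and the `plan_deployment` function are all hypothetical, chosen only to show how a routing table could decide which cluster receives which file.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical routing table: team folder in the repository -> target cluster.
ROUTES = {
    "team_a": "cluster-1",
    "team_b": "cluster-2",
}


def plan_deployment(changed_files, routes=ROUTES):
    """Group changed DAG files by the cluster they should be deployed to."""
    plan = defaultdict(list)
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Assumed layout: only Python files under dags/<team>/... are DAG files.
        if len(parts) < 3 or parts[0] != "dags" or not path.endswith(".py"):
            continue
        cluster = routes.get(parts[1])
        if cluster is None:
            continue  # unknown team: skip rather than guess a cluster
        plan[cluster].append(path)
    return dict(plan)


if __name__ == "__main__":
    changed = [
        "dags/team_a/daily_sales.py",
        "dags/team_b/hourly_sync.py",
        "README.md",
    ]
    for cluster, files in plan_deployment(changed).items():
        print(cluster, files)
```

Under these assumptions, migrating a team's DAGs to another cluster amounts to changing one entry in the routing table and redeploying that team's files.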
Join us to explore how DMS can revolutionize your Airflow DAG management, enhancing scalability, reliability, and efficiency.
Sungji Yang
Coupang, Staff Backend Engineer
DaeHoon Song
Coupang, Backend Engineer