How do you monitor Airflow across 50 teams in real time? How do downstream systems react instantly to pipeline completions without polling APIs? How do you build custom dashboards without overloading Airflow’s database? This talk demonstrates how we use Change Data Capture (CDC) to stream Airflow’s metadata to Kafka, making orchestration events consumable by any system in real time. By capturing changes in Airflow’s Postgres database and publishing them to Kafka topics, we enable instant notifications, real-time dashboards, compliance audit trails, and cross-system orchestration, all without modifying Airflow code or degrading its performance.

You’ll learn how to set up Debezium CDC for Airflow’s metadata tables, design Kafka topics for task and DAG events, build real-time consumers for monitoring and alerting, handle schema evolution across Airflow upgrades, and implement real-time cost attribution and SLA monitoring. Drawing on production examples that process millions of events daily, I’ll share architecture decisions, performance optimizations, and lessons from running CDC at scale. You’ll leave with patterns for making Airflow observable to your entire organization.
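To make the core idea concrete, here is a minimal sketch of the consumer side: decoding a Debezium change envelope for Airflow’s `task_instance` metadata table and surfacing task-state transitions as events. The envelope shape follows Debezium’s standard `{before, after, source, op}` payload, and the column names (`dag_id`, `task_id`, `state`) match Airflow’s `task_instance` table; everything else (the sample data, the returned event shape) is an illustrative assumption, not the speaker’s actual implementation.

```python
import json

def decode_task_event(raw: str):
    """Decode a Debezium change event (JSON string) for Airflow's
    task_instance table. Returns a small event dict for a state
    transition, or None if the change is irrelevant."""
    payload = json.loads(raw).get("payload", {})
    # Only react to changes on the task_instance table.
    if payload.get("source", {}).get("table") != "task_instance":
        return None
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    # Ignore deletes and rows whose state did not change.
    if not after or before.get("state") == after.get("state"):
        return None
    return {
        "dag_id": after["dag_id"],
        "task_id": after["task_id"],
        "old_state": before.get("state"),
        "new_state": after["state"],
    }

# Hypothetical example: a task transitioning from running to failed,
# as Debezium would emit it after an UPDATE on task_instance.
sample = json.dumps({
    "payload": {
        "op": "u",
        "source": {"table": "task_instance"},
        "before": {"dag_id": "etl", "task_id": "load", "state": "running"},
        "after": {"dag_id": "etl", "task_id": "load", "state": "failed"},
    }
})
print(decode_task_event(sample))
```

In a real deployment this function would sit inside a Kafka consumer loop reading the Debezium topic, with the resulting events fanned out to alerting, dashboards, or audit sinks.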

Vipin Kataria

Architect, Data/ML, Picarro Inc.