Achieving Airflow Observability

Presented at Airflow Summit 2020

By Evgeny Shulman

Identify issues in a fraction of the time and streamline root cause analysis for your DAGs. Airflow is the leading orchestration platform for data engineers. But when running Airflow at production scale, many teams have bigger needs for monitoring jobs, creating the right level of alerting, tracking problems in data, and finding the root cause of errors. In this talk we will cover our suggested approach to gaining Airflow observability so that you have the visibility you need to be productive.

What is observability? The capability of monitoring and analyzing event logs, along with KPIs and other data, that yields actionable insights.

In the data engineering context, observability is crucial for finding problems in jobs and data before those problems impact data consumers downstream. It’s a particularly difficult challenge because of the different platforms data engineers use (Airflow, Spark, Kubernetes, etc.) and the complicated life cycle of data pipeline CI/CD.

In the session, we will do a deep dive into the visibility gaps your team might face running production-scale Airflow. We will walk through a typical day in the life of finding errors in DAGs, offer best practices, and discuss open source tools you can use to extend Airflow for observability and robust monitoring.

We will use standard Airflow DAG examples to guide the presentation.

Evgeny Shulman

CTO and co-founder at databand.ai