Using Airflow to speed up development of data intensive tools

Presented at Airflow Summit 2020

In this talk we review how Airflow helped create a tool to detect data anomalies. Leveraging Airflow for process management, database interoperability, and authentication created an easy path forward to achieve scale, decrease the development time and pass security audits. While Airflow is generally looked at as a solution to manage data pipelines, integrating tools with Airflow can also speed up development of those tools.

The Data Anomaly Detector was created at One Medical to scan thousands of metrics per day for data anomalies.  It’s a complicated tool and much of that complexity was outsourced to Airflow.  Because the data infrastructure at One Medical was already built around Airflow, and Airflow had many desirable features, it made sense to build the tool to integrate closely with Airflow.  The end result was more time could be spend on building features to do statistical analysis, and less effort had to be spent on database authentication, interoperability or process management.  It’s an interesting example of how Airflow can be leveraged to build data intensive tools.