Dataclasses as Pipeline Definitions in Airflow

Airflow Summit, 07/13/2021, 04:00-04:25 UTC

We will describe how we built a system in Airflow for MySQL-to-Redshift ETL pipelines defined in pure Python using dataclasses. These dataclasses are then used to dynamically generate DAGs based on pipeline type. This setup allows us to implement robust testing, validation, alerting, and documentation for our pipelines. We will also describe the performance improvements we achieved by upgrading to Airflow 2.0.
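The abstract does not show the actual definitions, so the following is only a minimal sketch of the general pattern under stated assumptions. Every name in it (`MySQLToRedshiftPipeline`, `PipelineType`, `DagPlan`, `generate_dags`, and the two pipeline types) is hypothetical. Each pipeline is a frozen dataclass that validates itself in `__post_init__`, and a dispatch table keyed on pipeline type picks which DAG shape to generate. To keep the sketch runnable without an Airflow installation, each generated "DAG" is a plain `DagPlan` of task IDs and dependency edges. In a real deployment the builders would instead construct an `airflow.DAG` with operators and expose each one at module level so the scheduler can discover it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class PipelineType(Enum):
    # Hypothetical pipeline types; the talk does not name the real ones.
    FULL_REFRESH = "full_refresh"
    INCREMENTAL = "incremental"


@dataclass(frozen=True)
class MySQLToRedshiftPipeline:
    """Hypothetical definition of one MySQL table copied into Redshift."""
    name: str
    mysql_table: str
    redshift_table: str
    pipeline_type: PipelineType
    schedule: str = "@daily"
    incremental_column: str = ""  # required only for INCREMENTAL pipelines

    def __post_init__(self) -> None:
        # Validate at definition time so a bad config fails in tests, not in production.
        if not self.name.isidentifier():
            raise ValueError(f"invalid pipeline name: {self.name!r}")
        if self.pipeline_type is PipelineType.INCREMENTAL and not self.incremental_column:
            raise ValueError(f"{self.name}: incremental pipelines need incremental_column")


@dataclass
class DagPlan:
    """Stand-in for an airflow.DAG so the sketch runs without Airflow installed."""
    dag_id: str
    schedule: str
    tasks: List[str]
    edges: List[Tuple[str, str]]  # (upstream, downstream) pairs


def _chain(p: MySQLToRedshiftPipeline, task_ids: List[str]) -> DagPlan:
    # Linear flow: each task runs after the one before it.
    return DagPlan(f"etl_{p.name}", p.schedule, task_ids, list(zip(task_ids, task_ids[1:])))


def build_full_refresh(p: MySQLToRedshiftPipeline) -> DagPlan:
    return _chain(p, [f"extract_{p.mysql_table}",
                      f"truncate_{p.redshift_table}",
                      f"load_{p.redshift_table}"])


def build_incremental(p: MySQLToRedshiftPipeline) -> DagPlan:
    return _chain(p, [f"extract_{p.mysql_table}_since_{p.incremental_column}",
                      f"load_staging_{p.redshift_table}",
                      f"merge_{p.redshift_table}"])


# The pipeline type selects which DAG shape gets generated.
BUILDERS: Dict[PipelineType, Callable[[MySQLToRedshiftPipeline], DagPlan]] = {
    PipelineType.FULL_REFRESH: build_full_refresh,
    PipelineType.INCREMENTAL: build_incremental,
}


def generate_dags(pipelines: List[MySQLToRedshiftPipeline]) -> Dict[str, DagPlan]:
    dags: Dict[str, DagPlan] = {}
    for p in pipelines:
        if p.name in dags:
            raise ValueError(f"duplicate pipeline name: {p.name}")
        dags[p.name] = BUILDERS[p.pipeline_type](p)
    return dags


PIPELINES = [
    MySQLToRedshiftPipeline("users", "users", "users", PipelineType.FULL_REFRESH),
    MySQLToRedshiftPipeline("orders", "orders", "orders", PipelineType.INCREMENTAL,
                            incremental_column="updated_at"),
]
DAGS = generate_dags(PIPELINES)
```

Because validation lives in `__post_init__` and generation is a pure function of the definitions, a unit test can import `PIPELINES` and call `generate_dags` to catch a misconfigured pipeline before the scheduler ever parses it. The same dataclass fields could also feed generated documentation.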

https://airflowsummit.org/live