Effective Cross-DAG dependency

Presented at Airflow Summit 2020

Cross-DAG dependency may reduce cohesion in data pipelines and, without having an explicit solution in Airflow or in a third-party plugin, those pipelines tend to become complex to handle. That is the reason we, at QuintoAndar, have created an intermediate DAG to handle relationships across data pipelines called Mediator, in order for them to be scalable and maintainable by any team.

At QuintoAndar we seek automation and modularization in our data pipelines and believe that breaking them into many responsibility modules (DAGs) enhances maintainability, reusability and understanding to move data from one point to another. However, extending interconnections between DAGs tend to reduce those enhancements, make them complex and, above all, there’s no explicit built-in solution in Airflow for them. That is why we created a Mediator DAG.

The Mediator DAG in Airflow has the responsibility of looking for successfully finished DAG executions that may represent the previous step of another. That is, if a DAG is dependent of another, the Mediator will take care of checking and triggering the necessary objects for the data flow to continue.

In conclusion, it is sometimes not practical to combine multiple DAGs into one. Hence, our proposal, is to define a Mediator DAG to handle dependencies and bring cohesion to a data pipeline without losing its purpose.

View presentation (Prezi)