Data DAGs with lineage for fun and for profit

Speaker(s): Bolke de Bruin
When: Jul 7, 17:00 UTC

Let’s be honest about it: many of us don’t consider data lineage to be cool. But what if lineage allowed you to write less boilerplate and less code, while at the same time making your data scientists, your auditors, your management and, well, everyone happier? What if you could write DAGs that mix task-based and data-based styles?

Lineage support has been incubating in Airflow for a while. It was buggy and not very easy to use. Still, there are many reasons why having data lineage available is really cool, and one of them is that it can make writing DAGs a lot easier. Recently, a lot of development has gone into improving lineage support and making it much easier, or even transparent, to use. In this talk I will focus on what we have in mind and evangelize data lineage, but I also want to gather feedback from the audience on where we should take it next.
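To make the "less boilerplate" idea concrete, here is a minimal, hypothetical sketch in plain Python (not Airflow's API) of how declared lineage could replace hand-written task ordering. Each task lists the datasets it reads (`inlets`) and writes (`outlets`). The names mirror Airflow's terminology, but the `Task` class, `infer_dependencies` helper and the `s3://` URLs are assumptions made for illustration. Upstream edges are then derived from which task produces each dataset, and `graphlib.TopologicalSorter` from the standard library (Python 3.9+) yields a run order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    # Hypothetical task: a name plus the datasets it reads and writes.
    name: str
    inlets: list[str] = field(default_factory=list)
    outlets: list[str] = field(default_factory=list)


def infer_dependencies(tasks):
    """Map each task name to the set of upstream task names, derived from lineage."""
    # Record which task produces each dataset (last producer wins in this sketch).
    producers = {}
    for task in tasks:
        for dataset in task.outlets:
            producers[dataset] = task.name
    # A task depends on whichever tasks produce the datasets it consumes.
    return {
        task.name: {producers[d] for d in task.inlets if d in producers}
        for task in tasks
    }


# Tasks are deliberately declared out of order; no explicit edges are written.
tasks = [
    Task("report", inlets=["s3://bucket/clean.parquet"]),
    Task("extract", outlets=["s3://bucket/raw.csv"]),
    Task("clean", inlets=["s3://bucket/raw.csv"],
         outlets=["s3://bucket/clean.parquet"]),
]

deps = infer_dependencies(tasks)
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'clean', 'report']
```

The point of the design is that the DAG author only states what each task reads and writes; the ordering falls out of the data, so explicit dependency wiring becomes redundant. The same metadata is what an auditor or data scientist would want to inspect anyway.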