If a job fails, how can you learn which downstream datasets have become out of date? Can you be confident that jobs are consuming fresh, high-quality data from their upstream sources? How might you predict the impact of a planned change on distant corners of the pipeline?
These questions become easier once you have a complete understanding of data lineage, the complex set of relationships between all of your jobs and datasets. In this talk, Ross Turk from Datakin will provide a quick introduction to the core concepts behind data lineage and an overview of common architectural approaches.
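To make the first question concrete, lineage can be pictured as a graph in which jobs read and write datasets. The sketch below is only an illustration, not anything taken from the talk or from Datakin's product: the job names, dataset names, and the `downstream_datasets` helper are all hypothetical. It walks the graph from a failed job and collects every dataset that may now be stale.

```python
from collections import defaultdict, deque

# Hypothetical lineage metadata: each job lists the datasets it reads and writes.
JOBS = {
    "ingest_orders": {"inputs": ["raw.orders"], "outputs": ["staging.orders"]},
    "build_revenue": {"inputs": ["staging.orders"], "outputs": ["marts.revenue"]},
    "build_dashboard": {"inputs": ["marts.revenue"], "outputs": ["reports.dashboard"]},
    "ingest_users": {"inputs": ["raw.users"], "outputs": ["staging.users"]},
}


def downstream_datasets(jobs, failed_job):
    """Return the datasets that may be out of date after failed_job fails."""
    # Index: dataset name -> names of the jobs that consume it.
    consumers = defaultdict(list)
    for name, io in jobs.items():
        for dataset in io["inputs"]:
            consumers[dataset].append(name)

    stale = []                # affected datasets, in discovery order
    seen_jobs = {failed_job}  # guards against revisiting jobs (and cycles)
    queue = deque([failed_job])

    # Breadth-first walk: job -> its outputs -> jobs that read those outputs.
    while queue:
        job = queue.popleft()
        for dataset in jobs[job]["outputs"]:
            if dataset not in stale:
                stale.append(dataset)
            for consumer in consumers.get(dataset, []):
                if consumer not in seen_jobs:
                    seen_jobs.add(consumer)
                    queue.append(consumer)
    return stale


if __name__ == "__main__":
    print(downstream_datasets(JOBS, "ingest_orders"))
```

In this toy graph, a failure in `ingest_orders` touches everything derived from `staging.orders`, while the unrelated `staging.users` branch is left alone. Real lineage systems would typically collect this metadata automatically rather than from a hand-written dictionary.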