Building reuseable and trustworthy ELT pipelines (A templated approach)

Presented at Airflow Summit 2020

By Nehil Jain

To improve automation of data pipelines, I propose a universal approach to ELT pipeline that optimizes for data integrity, extensibility, and speed to delivery. The workflow is built using open source tools and standards like Apache Airflow, Singer, Great Expectations, and DBT.

Templating ETLs is challenging! The creation and maintenance of data pipelines in production require hard work to manage bugs in code and bad data.

I like to propose a data pipeline pattern that can simplify building pipelines while optimizing for data integrity and observability. The workflow is built using open source tools like Singer, Great Expectations, and DBT.

Goals:

Make ELT simple and fast to implement
Validate your assumptions of the data before you make it available for use
Allow analysts/data scientists add pain-free contributions to ELT using SQL
Generate data documentation, failure logs for quick recovery, and fixes outages in your pipeline

Target Audience:

Approachable to any level of developer
Novice data personals interested in starting ELT workflow and learning about different tools of the ecosystem
Intermediate+ developers interested in supercharging their pipeline with Write Audit Publish pattern and reducing pipeline debt

Download slides

Nehil Jain

Tech Lead, Data @ SnapTravel