Ever wondered what happens between typing SELECT ... GROUP BY and getting results back? Inside most distributed SQL engines lives a scheduler that breaks your query into a DAG of tasks that shuffle, sort, and aggregate data, parallelizing the work across partitions. Sound familiar?
In this talk, I’ll demystify SQL engine internals by building one on top of Apache Airflow. We’ll take a SQL query, parse it, optimize it, and transform it into a DAG of Airflow tasks that you can watch execute step by step in the Airflow UI.
You’ll walk away understanding:
- How SQL engines plan and schedule query execution
- What shuffles, partitions, and pipeline-breaking operators actually are
- How query parallelism works under the hood
No PhD in databases required — just curiosity and an Airflow UI to watch it all unfold.
Hussein Awala
Sr. Data Engineer at Datadog | PMC member and committer at Apache Airflow