We would love to speak about our experience upgrading our legacy Airflow 1 infrastructure to Airflow 2 on Kubernetes, and how we orchestrated the migration of approximately 1,500 DAGs owned by multiple teams across our organization. We ran into some interesting challenges along the way and can share how we solved them.
Points we can talk about:
- Our old Airflow 1 infrastructure and why we decided to move to Kubernetes for Airflow 2.
- The migration paths we considered and why we chose the route we did.
Things we did to make the migration easier to achieve:
- Implementing DAG factories: we used some neat programmatic approaches to build a clean factory interface for our users.
- A custom solution for DAG dependencies that span Airflow instances.
- DAG audits: how we programmatically determined which DAGs were still in use, to reduce the migration load.
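To give a flavour of the DAG audit idea, here is a minimal sketch of how stale DAGs could be flagged from run history. It is an illustration, not our actual implementation: the `DagActivity` record, the `find_stale_dags` helper, the 90-day threshold, and the staleness criteria are all assumptions made for this example. In practice the records might be built from the Airflow metadata database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DagActivity:
    """Hypothetical summary of one DAG's recent activity."""
    dag_id: str
    is_paused: bool
    last_success: Optional[datetime]  # None if the DAG has never succeeded


def find_stale_dags(
    activity: Iterable[DagActivity],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),  # assumed threshold
) -> List[str]:
    """Return the sorted ids of DAGs that look unused.

    A DAG is treated as stale if it is paused, has never succeeded,
    or last succeeded longer ago than `max_idle`.
    """
    stale = []
    for record in activity:
        never_succeeded = record.last_success is None
        idle_too_long = (
            not never_succeeded and now - record.last_success > max_idle
        )
        if record.is_paused or never_succeeded or idle_too_long:
            stale.append(record.dag_id)
    return sorted(stale)
```

A list produced this way would only be a starting point: each flagged DAG would still need confirmation from its owning team before being dropped from the migration.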
Problems that we faced:
- DAG ownership
- Backfilling in Airflow 2 on Kubernetes
- DAG dependencies