For the DAG owner, testing Airflow DAGs can be complicated and tedious. kubectl cp your DAG from your local machine to a pod, exec into the pod, and run a command? Install Breeze? Why pull the Airflow image and start up the webserver, scheduler, and triggerer when all we want is to test the addition of a new task?
It doesn’t have to be this hard. At Etsy, we’ve simplified testing DAGs for the DAG owner with dagtest, a Python package that we house on our internal PyPI. It is a small client binary that makes HTTP requests to a test API: a simple Flask server that receives these requests and builds pods that run airflow dags backfill commands based on the options provided via dagtest. The simplest of these options is a dry run; more typically, users kick off test runs in which the DAG executes end to end (E2E) for a single ds.
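To make the translation from request to backfill concrete, here is a minimal sketch of what such a test API could look like. The endpoint path and the payload fields (dag_id, ds, dry_run) are hypothetical, and where the real server builds a pod to run the command, this sketch simply returns it:

    # A minimal, hypothetical sketch of the test API, not Etsy's actual server.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/test", methods=["POST"])  # hypothetical endpoint
    def run_test():
        payload = request.get_json()
        dag_id = payload["dag_id"]
        ds = payload["ds"]  # the single execution date to test

        # Backfilling a single ds means the start and end dates are the same.
        command = ["airflow", "dags", "backfill", "-s", ds, "-e", ds]
        if payload.get("dry_run"):
            command.append("--dry-run")
        command.append(dag_id)

        # The real server builds a Kubernetes pod spec around this command;
        # here we just echo it back for illustration.
        return jsonify({"command": command})

    if __name__ == "__main__":
        app.run()

A dry-run request for my_dag on 2024-01-01 would then map to airflow dags backfill -s 2024-01-01 -e 2024-01-01 --dry-run my_dag inside the pod.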
Equally important is the environment setup. We use an ad hoc Airflow instance in a separate GCP environment with a service account (SA) that cannot write to production buckets.
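As one hedged illustration of that isolation, assuming the buckets live in GCS and using hypothetical bucket and service account names, the ad hoc SA can be granted a read-only role on production data and nothing more:

    # A sketch of the permission model with hypothetical names; the ad hoc
    # SA gets objectViewer (read) on production buckets and is never
    # granted a write role there.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("prod-data")  # hypothetical production bucket

    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(
        {
            "role": "roles/storage.objectViewer",  # read-only
            "members": {
                "serviceAccount:dagtest@adhoc-project.iam.gserviceaccount.com"
            },
        }
    )
    bucket.set_iam_policy(policy)

This talk will discuss both the tooling and the environment setup.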