Apache Airflow and Ray: Orchestrating ML at Scale

Presented at Airflow Summit 2021

As the Apache Airflow project grows, we look for ways to incorporate emerging technologies and for new ways to expose them to our users. Ray is one of the fastest-growing distributed computation systems available today. In this talk, we will introduce the Ray decorator and the Ray backend. These features, built with the help of the Ray maintainers at Anyscale, will allow data scientists to natively integrate their distributed pandas, XGBoost, and TensorFlow jobs into their Airflow pipelines with a single decorator. Combining the orchestration of Airflow with the distributed computation of Ray opens up a whole host of new possibilities for Airflow users designing their pipelines.
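
To give a feel for the "single decorator" idea, here is a minimal sketch of the pattern in plain Python. It is not the Ray decorator presented in the talk: the name `distributed_task` is hypothetical, and a local thread pool stands in for a Ray cluster so the example runs with the standard library alone. The point is only that the pipeline author writes ordinary functions and the decorator decides where they execute.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Assumption for illustration: a local thread pool plays the role that a
# Ray cluster would play in the real integration.
_POOL = ThreadPoolExecutor(max_workers=4)


def distributed_task(func):
    """Hypothetical decorator: submit `func` to the pool and wait for its result."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        future = _POOL.submit(func, *args, **kwargs)
        return future.result()  # block until the submitted work finishes

    return wrapper


@distributed_task
def scale(values, factor):
    # Ordinary function body; the decorator controls where it runs.
    return [v * factor for v in values]


@distributed_task
def column_mean(values):
    return sum(values) / len(values)


def pipeline(values):
    # Two decorated steps chained like tasks in a small pipeline.
    scaled = scale(values, 2)
    return column_mean(scaled)
```

Calling `pipeline([1, 2, 3])` runs both steps through the pool. In an actual Airflow and Ray setup, the orchestration layer would schedule the steps and the compute layer would execute them on remote workers, but the function bodies would look much the same.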