Orchestrating Cross-Account ML & Data Pipelines with Apache Airflow

As organizations scale data and ML workloads across multiple AWS accounts and Regions, orchestration, rather than the models themselves, often becomes the hardest engineering problem. This session shows how Apache Airflow serves as a centralized orchestration hub for distributed data-processing and machine-learning pipelines that span account and regional boundaries.

We walk through a production-ready architecture where a single Airflow environment coordinates:

  • Cross-account DAG patterns — using Airflow connections, IAM role assumption, and custom hooks to trigger AWS Glue, SageMaker, and Lambda in remote accounts
  • Cross-Region data flow — leveraging S3 Cross-Region Replication with S3KeySensor tasks to gate downstream tasks on data availability
  • Custom operators for cross-account ML — extending SageMakerHook and SageMakerTrainingOperator to train models in a separate account while keeping orchestration centralized
  • Sensor and operator design — choosing the right sensor modes, timeouts, and poke intervals for long-running training jobs and inference calls
  • Human-in-the-loop approval gates — using Airflow’s built-in mechanisms to require manual sign-off before promoting models to production
  • Cost and governance controls — short-circuiting DAG branches on early evaluation metrics, managing spot instances, and enforcing least-privilege IAM across accounts
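
As a rough illustration of two of the patterns above (role assumption through an Airflow connection, and short-circuiting on early evaluation metrics), the following is a minimal sketch rather than the architecture presented in the session. The account ID, role name, Region, metric name, and threshold are all hypothetical placeholders. It assumes the Amazon provider's AWS connection, which reads role_arn and region_name from the connection's JSON "extra" field, and a ShortCircuitOperator, which skips downstream tasks when its callable returns a falsy value.

```python
import json


def cross_account_conn_extra(account_id: str, role_name: str, region: str) -> str:
    """Build the JSON 'extra' for an Airflow AWS connection that assumes
    a role in a remote account (all argument values are placeholders)."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return json.dumps({"role_arn": role_arn, "region_name": region})


def should_continue(metrics: dict, metric: str = "validation:auc",
                    threshold: float = 0.80) -> bool:
    """Predicate that could be passed as python_callable to a
    ShortCircuitOperator: returning False skips the downstream branch.
    The metric name and threshold are illustrative assumptions."""
    value = metrics.get(metric)
    return value is not None and value >= threshold


if __name__ == "__main__":
    # Hypothetical remote account and role, for illustration only.
    print(cross_account_conn_extra("111122223333", "airflow-remote-exec", "eu-west-1"))
    print(should_continue({"validation:auc": 0.85}))
```

Keeping the role ARN in the connection rather than in DAG code means each remote account maps to one connection ID, so the same DAG can target a different account by switching the aws_conn_id it passes to its operators.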

Attendees leave with reusable DAG patterns, operator recipes, and an architecture blueprint for running multi-account, multi-Region data and ML pipelines — all orchestrated through Airflow.

Sneha Rao

Solutions Architect, AWS

Sushmita Barthakur

Senior Data Solutions Architect, AWS

Suba Palanisamy

Enterprise Support Lead TAM