At Yahoo, we built a secure, scalable, and cost-efficient batch processing platform that uses Amazon MWAA (Managed Workflows for Apache Airflow) to orchestrate Apache Flink jobs on Amazon EKS, where the Flink Kubernetes Operator manages the Flink clusters. This setup enables dynamic job orchestration while meeting strict enterprise compliance standards.
In this session, we’ll share how Airflow DAGs:
- Dynamically launch, monitor, and clean up an isolated Flink cluster for each batch job, improving resource efficiency.
- Securely fetch the EKS kubeconfig, submit FlinkDeployment custom resources using FlinkKubernetesOperator, and poll job status using Airflow sensors.
- Integrate with IAM for access control and meet Yahoo’s security requirements, including mutual TLS (mTLS) with Athenz.
- Optimize for cost and resilience by automatically cleaning up jobs and the operator, and by handling job failures and retries.
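The per-job lifecycle described in these bullets can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Yahoo's implementation. It derives a unique, Kubernetes-safe name for each batch run so every job gets its own isolated cluster. It then builds a FlinkDeployment manifest in the Flink Kubernetes Operator's `flink.apache.org/v1beta1` shape and maps the reported job state to success, failure, or pending, the way a sensor's poke might. The helper names (`k8s_safe_name`, `build_flink_deployment`, `classify_job_state`), the image, jar path, namespace, and resource sizes are all hypothetical. The choice of terminal states is also an assumption.

```python
import re
import uuid

# Assumed terminal Flink job states as reported in a FlinkDeployment's
# status.jobStatus.state; every other state is treated as "still in progress".
SUCCESS_STATES = {"FINISHED"}
FAILURE_STATES = {"FAILED", "CANCELED"}


def k8s_safe_name(prefix: str, run_id: str, max_len: int = 45) -> str:
    """Derive a DNS-1123-style resource name unique to one batch run (hypothetical helper).

    The hash suffix is deterministic, so a retry of the same Airflow run reuses
    the same name instead of creating a second, orphaned cluster.
    """
    raw = f"{prefix}-{run_id}".lower()
    # Collapse every run of characters Kubernetes rejects into a single hyphen.
    cleaned = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    suffix = uuid.uuid5(uuid.NAMESPACE_URL, raw).hex[:8]
    base = cleaned[: max_len - len(suffix) - 1].rstrip("-") or "job"
    return f"{base}-{suffix}"


def build_flink_deployment(name: str, namespace: str, jar_uri: str, parallelism: int) -> dict:
    """Build a FlinkDeployment custom resource for one isolated application cluster.

    Image, Flink version, service account, and resource sizes are placeholder values.
    """
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "image": "flink:1.17",
            "flinkVersion": "v1_17",
            "serviceAccount": "flink",
            "flinkConfiguration": {"taskmanager.numberOfTaskSlots": "2"},
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {"jarURI": jar_uri, "parallelism": parallelism, "upgradeMode": "stateless"},
        },
    }


def classify_job_state(resource: dict) -> str:
    """Map a FlinkDeployment's reported job state to success/failure/pending."""
    status = resource.get("status") or {}
    state = (status.get("jobStatus") or {}).get("state")
    if state in SUCCESS_STATES:
        return "success"
    if state in FAILURE_STATES:
        return "failure"
    return "pending"


if __name__ == "__main__":
    name = k8s_safe_name("nightly-agg", "scheduled__2024-05-01T00:00:00+00:00")
    manifest = build_flink_deployment(name, "flink-batch", "local:///opt/flink/usrlib/batch-job.jar", 4)
    print(manifest["metadata"]["name"])
    print(classify_job_state({"status": {"jobStatus": {"state": "RUNNING"}}}))  # pending
```

In a DAG, a manifest like this could be handed to the Apache Flink provider's `FlinkKubernetesOperator` and watched with `FlinkKubernetesSensor`. A final cleanup task with `trigger_rule="all_done"` could then delete the resource whether the job succeeded or failed.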
Join us for practical strategies and lessons from Yahoo’s production-scale Flink workflows in a Kubernetes environment.
Purshotam Shah
Yahoo, Sr Principal Software Dev Engineer
David Scherba
Yahoo, Principal Software Dev Engineer