In this presentation, we discuss how we built a fully managed workflow orchestration system at Salesforce using Apache Airflow to facilitate dependable data lake infrastructure on the public cloud. We touch upon how we utilized kubernetes for increased scalability and resilience, as well as the most effective approaches for managing and scaling data pipelines. We will also talk about how we addressed data security and privacy, multitenancy, and interoperability with other internal systems.

We discuss how we use this system to empower users with the ability to effortlessly build reliable pipelines that incorporate failure detection, alerting, and monitoring for deep insights through monitoring, removing the undifferentiated heavy lifting associated with running and managing their own orchestration engines.

Lastly, we elaborate on how we integrated our in-house CI/CD pipelines to enable effective DAG and dependency management, further enhancing the system’s capabilities.