by Mohammed Marragh, Alaeddine Maaoui
This talk gives an overview of Airflow and shares lessons learned from implementing it in a banking production environment at Société Générale. It summarizes two years of experience: the story of building an internal cloud offering based on Airflow (AirflowaaS) within Société Générale.
by Nitish Victor, Preethi Ganeshan, Xiaoqin Zhu
In this talk, we share the lessons learned while building a scheduler-as-a-service on Apache Airflow to improve stability and security for one of the largest gaming companies. The platform integrates with different data sources and meets varied SLAs across workflows owned by multiple game studios. In particular, we present a comprehensive self-serve Airflow architecture with multi-tenancy, automatic DAG generation, SSO integration, and improved ease of deployment.
by Kevin Yang, Dan Davydov, Tao Feng
In this talk, colleagues from Airbnb, Twitter and Lyft share details about how they are using Apache Airflow to power their data pipelines.
by Bolke de Bruin
Let’s be honest about it. Many of us don’t consider data lineage to be cool. But what if lineage allowed you to write less boilerplate and less code, while at the same time making your data scientists, your auditors, your management and, well, everyone happier? What if you could write DAGs that mix task-based and data-based approaches?
by Michael Hewitt
At Nielsen Digital we have been moving our ETLs to containerized environments managed by Kubernetes, and we have successfully transferred some of them to this environment in production. To do this we used the following technologies: Helm to easily deploy Airflow onto Kubernetes; Airflow’s Kubernetes Executor to take full advantage of Kubernetes features; and Airflow’s Kubernetes Pod Operator to execute our containerized tasks within our DAGs. To automate much of the deployment process we also used Terraform. Lastly, we used Kubernetes features to gain much more fine-grained control of Airflow’s infrastructure.
by Aishwarya Sankaravadivel
by Mihail Petkov, Emil Todorov
Financial Times is increasing its digital revenue by allowing business people to make data-driven decisions. Providing an Airflow-based platform where data engineers, data scientists, BI experts and others can run language-agnostic jobs was a huge swing. One of the most successful steps in the platform’s development was building our own execution environment, which allows stakeholders to self-deploy jobs without cross-team dependencies on top of the unlimited scale of Kubernetes. In this talk we share how we have integrated and extended Airflow at Financial Times.
by Roi Teveth, Itai Yaffe
Host: Bay Area
by Rafal Biegacz
Security is more important than ever, and Airflow installations are no exception. Google Cloud Platform and Cloud Composer offer useful security options for running your DAGs and tasks so that you can effectively manage the risk of data exfiltration and limit access to the system.
by Yulei Li, Dinghang Yu, Ace Haidrey
by Aizhamal Nurmamat kyzy, Griselda Cuevas
by Ry Walker, Maxime Beauchemin, Viraj Parekh
Astronomer is focused on improving Airflow’s user experience through the entire lifecycle: from authoring and testing DAGs, to building containers and deploying the DAGs, to running and monitoring both the DAGs and the infrastructure they operate within, with an eye towards increased security and governance as well. In this talk we walk you through some current UX challenges, give an overview of how the Astronomer platform addresses the major ones, and provide a sneak peek of what we’re working on in the coming months to improve Airflow’s user experience.
by Jacob Ferriero
Deploying bad DAGs to your Airflow environment can wreak havoc. This talk provides an opinionated take on a monorepo structure for GCP data pipelines that leverage BigQuery and Dataflow, along with a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer.
by Maxime Beauchemin
Superset is the leading open source data exploration and visualization platform. In this talk, we’ll be presenting Superset with a focus on the advanced topics that are most relevant to data engineers. The presentation will be largely a live demo of the product, with a deeper dive into those topics.
by QP Hou
Scribd is migrating its data pipeline from an in-house system to Airflow. It is one giant data pipeline consisting of more than 1,500 tasks. In this talk, I would like to share a couple of best practices for setting up a cloud-native Airflow deployment in AWS. For those who are interested in migrating a non-trivial data pipeline to Airflow, I will also share how Scribd plans and executes the migration.
by Traey Hatch
In this talk I will introduce a DAG authoring and editing tool for Airflow that we have built. Installed as a plugin, this tool allows users to author DAGs by composing existing operators and hooks with virtually no Python experience. We walk through a demo of DAG authorship and deployment, and spend time reviewing the underlying open-source standards used and the general approach that was taken to develop the code.
by Gerard Casas Saez
Airflow does not currently have an explicit way to declare messages passed between tasks in a DAG. XComs are available, but they are hidden in execution functions inside the operator. AIP-31 proposes a way to make this message passing explicit in the DAG file and make it easier to reason about your DAG’s behaviour.
by Blaine Elliot
In this talk we review how Airflow helped create a tool to detect data anomalies. Leveraging Airflow for process management, database interoperability, and authentication created an easy path forward to achieve scale, decrease the development time and pass security audits. While Airflow is generally looked at as a solution to manage data pipelines, integrating tools with Airflow can also speed up development of those tools.
by Daniel Imberman, Greg Neiheisel
by Amr Noureldin, Michal Dura
This talk describes how Airflow is used in an autonomous driving project originating from Munich, Germany. We describe the Airflow setup, the challenges we encountered, and how we maneuvered to achieve a distributed and highly scalable Airflow setup.
by Adam Boscarino
Learn how Devoted Health went from cron jobs to an Airflow deployment on Kubernetes using a combination of open source and internal tooling.
by Victor Shafran
How do you create fast and painless delivery of new DAGs into production? When running Airflow at scale, it becomes a big challenge to manage the full lifecycle around your pipelines and make sure that DAGs are easy to develop, test, and ship into prod. In this talk, we will cover our suggested approach to building a proper CI/CD cycle that ensures the quality and fast delivery of production pipelines.
by Jarek Potiuk
by Hendrik Kleine, Vicente Ruben del Pino Ruiz
In search of a better, modern, simple method of managing ETL processes and merging them with various AI and ML tasks, we landed on Airflow. We envisioned a new user-friendly interface that can leverage dynamic DAGs and reusable components to build an ETL tool that requires virtually no training.
by Alexander Eliseev
by Josh Benamram
While Airflow is a central product for data engineering teams, it’s usually one piece of a bigger puzzle. The vast majority of teams use Airflow in combination with other tools like Spark, Snowflake, and BigQuery. Making sure pipelines are reliable, detecting issues that lead to SLA misses, and identifying data quality problems all require deep visibility into DAGs and data flows. Join this session to learn how Databand’s observability system makes it easy to monitor your end-to-end pipeline health and quickly remediate issues.
by Leah Cole
BigQuery is GCP’s serverless, highly scalable and cost-effective cloud data warehouse that can analyze petabytes of data at super fast speeds. Amazon S3 is one of the oldest and most popular cloud storage offerings. Folks with data in S3 often want to use BigQuery to gain insights into their data. Using Apache Airflow, they can build pipelines to seamlessly orchestrate that connection. In this talk, Leah walks through how they created an easily configurable pipeline to extract data.
by Nehil Jain
To improve automation of data pipelines, I propose a universal approach to ELT pipelines that optimizes for data integrity, extensibility, and speed of delivery. The workflow is built using open source tools and standards like Apache Airflow, Singer, Great Expectations, and dbt.
by Bas Harenslak
How do you ensure your workflows work before deploying to production? In this talk I’ll go over various ways to ensure your code works as intended, at both the task level and the DAG level.
by Vanessa Sochat
Engaging with a new community is a common experience in OSS development. There are usually expectations held by the project about the contributor’s exposure to the community, and by the contributor about interactions with the community. When these expectations are misaligned, the process is strained. In this talk Vanessa discusses a real-life experience that required communication, persistence, and patience to ultimately lead to a positive outcome.
by Noam Elfanbaum
At Bluevine we use Airflow to drive our ML platform. In this talk, Noam presents the challenges and gains we had in transitioning from a single server running Python scripts with cron to a full-blown Airflow setup. This includes supporting multiple Python versions, event-driven DAGs, performance issues and more!
by Sergio Fandino
For three years we at LOVOO, a market-leading dating app, have been using the Google Cloud managed version of Airflow, a product we’ve been familiar with since its Alpha release. We took a calculated risk and integrated the Alpha into our product, and, luckily, it was a match. Since then, we have been leveraging this software not only to build out our data pipeline, but also to boost the way we do analytics and BI.
Host: Bay Area
by Angel Daz
Data infrastructure looks different across small, mid-sized, and large companies. Yet most content out there is for large and sophisticated systems, and almost none of it covers migrating legacy on-prem databases to the cloud. To better explain the evolving needs of data engineering organizations, we will review the hierarchy of needs for data engineering.
by Karolina Rosol, Maciej Oczko
This talk shares Polidea’s journey from a mobile app development studio to an OSS-oriented business partner. We will tell you the story of our path towards code leadership over the years. We will also share the challenges of, and practical insights into, managing open source projects in our company. After this talk, you will know how we approached combining open source, business and team management without forgetting the human aspect.
by Rafael Ribaldo, Lucas Mendes Mota da Fonseca
Cross-DAG dependencies may reduce cohesion in data pipelines and, without an explicit solution in Airflow or in a third-party plugin, those pipelines tend to become complex to handle. That is why we, at QuintoAndar, have created an intermediate DAG called Mediator to handle relationships across data pipelines, so that they remain scalable and maintainable by any team.
by Naresh Yegireddi, Patricio Garza
A pioneer for the past 25 years, SONY PlayStation has played a vital role in the interactive gaming industry. With 100+ million monthly active users, 100+ million PS4 console sales, and thousands of game development partners across the globe, big-data problems are inevitable. This presentation covers how we scaled Airflow horizontally, which has helped us build a stable, scalable and optimal data processing infrastructure powered by Apache Spark, AWS ECS, EC2 and Docker.
by Evgeny Shulman
Identify issues in a fraction of the time and streamline root cause analysis for your DAGs. Airflow is the leading orchestration platform for data engineers. But when running Airflow at production scale, many teams have bigger needs for monitoring jobs, creating the right level of alerting, tracking problems in data, and finding the root cause of errors. In this talk we will cover our suggested approach to gaining Airflow observability so that you have the visibility you need to be productive.