Airflow - beyond the basics (and also some basics)

Speaker: sheena

Track: Other

Type: Tutorial

Room: Tutorial Room

Time: Oct 04 (Wed): 09:00

Duration: 4:00

Apache Airflow is a platform that allows Pythonistas to programmatically author, schedule and monitor workflows. It was originally created by the nice folks at Airbnb because they had a lot of problems.

xkcd comic

Airflow is now open source and pretty well documented, but it has a few gotchas and unexpected behaviors.

In this tutorial we'll cover:

  • a setup tutorial and tour for those who are new to Airflow
  • using Airflow's command-line utilities for efficient DAG authoring
  • how to create custom operators that don't break Airflow's magic. We'll do this by creating a Django operator (Airflow tasks are instances of operators)
  • strategies for passing non-trivial data between tasks
  • how to generate DAGs and tasks based on data instead of authoring everything by hand (a DAG is like a workflow: a collection of tasks and configuration)
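
To give a feel for the last point, here is a minimal plain-Python sketch of generating workflows from data. It deliberately does not use Airflow's API, so it runs on any Python 3.9+ install; the workflow names, task names and the build_run_order helper are all hypothetical, made up for illustration. In Airflow itself, a loop of the same shape would construct DAG and operator objects instead of lists.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definitions (illustrative data only):
# workflow name -> {task name: [names of upstream tasks]}.
WORKFLOWS = {
    "daily_sales": {
        "extract": [],
        "transform": ["extract"],
        "load": ["transform"],
    },
    "weekly_report": {
        "fetch": [],
        "render": ["fetch"],
    },
}


def build_run_order(tasks):
    """Return task names in an order that respects upstream dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(tasks).static_order())


# One entry per workflow, generated from the data rather than written by hand.
run_orders = {name: build_run_order(tasks) for name, tasks in WORKFLOWS.items()}

for name, order in run_orders.items():
    print(name, "->", " >> ".join(order))
```

Adding a new workflow then means adding an entry to the data, not writing another block of near-identical code.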

The examples covered in the tutorial are available from this GitHub repo.
