Ecosystem

These resources and services are neither maintained nor endorsed by the Apache Airflow® Community or the Apache Airflow project (maintained by the Committers and the Airflow PMC). Use them at your sole discretion. The community does not verify the licences or validity of those tools, so it is your responsibility to verify them.

If you would like to be included on this page, please reach out on the Apache Airflow dev or user mailing list and let us know, or simply open a Pull Request to this page.


Learning resources

Apache Airflow YouTube Channel - Official YouTube Channel

Airflow Summit - Conference for Apache Airflow developers

Awesome Apache Airflow - Curated list of resources about Apache Airflow

Astronomer Academy - Full courses and certifications offered by the Education team at Astronomer

The Complete Hands-On Introduction to Apache Airflow by Marc Lamberti on Udemy

Apache Airflow: Complete Hands-On Beginner to Advanced Class by Alexandra Abbas on Udemy

Data Pipelines with Apache Airflow - Apache Airflow book on Amazon


Airflow as a Service

Astro - Provided by Astronomer, Astro is a modern data orchestration platform powered by Apache Airflow. Astro enables data engineers, data scientists, and data analysts to build, run, and observe pipelines-as-code.

Google Cloud Composer - Managed Apache Airflow service on Google Cloud Platform

Amazon Managed Workflows for Apache Airflow - Managed Apache Airflow on Amazon Web Services (AWS)

Azure Data Factory Managed Airflow - Managed Apache Airflow service on Azure

Yandex Managed Service for Apache Airflow - Managed Apache Airflow on Yandex Cloud

Airflow with Restack - Managed Apache Airflow on Restack Cloud, or bring your own cloud: AWS EKS, GCP GKE, or Azure AKS. This lets you use the latest version of Airflow with your own DAGs. Connect your repo to the Restack GitHub app for built-in CI/CD.

DoubleCloud Managed Service for Apache Airflow - Managed Apache Airflow on DoubleCloud platform.


Other deployment methods

Airflow Heroku Deployment - Allows creating a demo Airflow instance on Heroku in just a couple of clicks.

Self-Managed Airflow via CNDI - Toolkit for deploying Airflow Kubernetes clusters, with support for AWS, GCP, Azure, VMware, bare metal, and even multi/hybrid cloud setups. See the docs for more details.

Self-managed Airflow on Amazon EKS - A guide for deploying self-managed Apache Airflow on Amazon EKS with Terraform, using the Data on EKS Blueprints and the Terraform Data add-ons module. Check out the Data on EKS Airflow blueprint.

Amazon MWAA Terraform Module - Allows you to deploy Amazon Managed Workflows for Apache Airflow using the official Terraform module. For a full example of how to use Amazon MWAA, check out the Data on EKS MWAA blueprint.


Third Party Airflow Plugins and Providers

Astronomer Registry - The discovery and distribution hub for Apache Airflow integrations created to aggregate and curate the best bits of the ecosystem.

Airflow Plugins - Central collection of repositories of various plugins for Airflow, including Mailchimp, Trello, SFTP, GitHub, etc.

Airflow ECR Plugin - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR.

Airflow OpenMLDB Provider - Provider containing Operators for feature extraction on OpenMLDB.

Airflow Apache Mesos Provider - Provider containing a Scheduler to scale out with Apache Mesos.

Airflow Netezza Provider - Airflow Provider to connect with Netezza using nzpy.

Airflow Grafana Loki Provider - Provides a Hook and a LogHandler that integrate with Grafana Loki, allowing Task Logs to be written to and read from Grafana Loki.

Airflow SAS Provider - Provides Hook and Operators for creating Airflow tasks to execute SAS Studio Flows and Jobs.

Airflow Cloudera Provider - Provides Hooks and Operators to interact with and run your workloads on Cloudera Data Platform services.

Airflow Alembic Provider - Provides Hooks and Operators to run database migrations with Alembic.

Airflow Pulumi Provider - Provides Hooks and Operators to manage Infrastructure-as-Code with Pulumi.

Airflow DolphinDB Provider - Provides Hooks and Operators to run scripts with DolphinDB.

Airflow TM1 Provider - Provides Hook and Operators to simplify connecting to the IBM Cognos TM1 / Planning Analytics database over REST API.

Astronomer Cosmos - Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code.

Airflow OpenTelemetry Provider - Provides Hook and EventListener which will generate traces, metrics, and logs in OpenTelemetry for your DAG runs.
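
Most of the packages above describe themselves as providing Hooks and Operators. As a rough illustration of what that usually means, a provider wraps the external service's client in a Hook that reads its credentials from an Airflow connection and exposes Operators that delegate the actual work to that Hook. The sketch below uses only core Airflow base classes; every other name in it (ExampleServiceHook, example_service_default, run_job) is hypothetical and not taken from any package listed here.

```python
# Illustrative shape of a third-party provider's Hook and Operator.
# All service-specific names here are hypothetical.
from airflow.hooks.base import BaseHook
from airflow.models import BaseOperator


class ExampleServiceHook(BaseHook):
    """Thin wrapper around an external service, configured from an Airflow connection."""

    def __init__(self, conn_id: str = "example_service_default"):
        super().__init__()
        self.conn_id = conn_id

    def run_job(self, job_name: str) -> None:
        conn = self.get_connection(self.conn_id)  # host/login/password from Airflow
        # A real hook would build a client from `conn` and call the service here.
        self.log.info("Would run job %s against %s", job_name, conn.host)


class ExampleRunJobOperator(BaseOperator):
    """Operator that delegates the actual work to the Hook."""

    def __init__(self, job_name: str, conn_id: str = "example_service_default", **kwargs):
        super().__init__(**kwargs)
        self.job_name = job_name
        self.conn_id = conn_id

    def execute(self, context):
        ExampleServiceHook(conn_id=self.conn_id).run_job(self.job_name)
```

In a DAG you then use such an Operator like any built-in one, passing it a task_id and the conn_id of a connection configured in Airflow.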


Async Providers

Astronomer Providers - A collection of Async Operators and Sensors for Apache Airflow built and maintained by Astronomer.

Airflow Kafka Provider - Apache Airflow Kafka provider containing Deferrable Operators & Sensors.
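
Deferrable ("async") Operators and Sensors like these release their worker slot while they wait, handing the waiting off to the triggerer process. Below is a minimal sketch of that pattern using only core Airflow APIs (available since Airflow 2.2); the sensor class itself is illustrative and not taken from either package above.

```python
# Minimal sketch of a deferrable sensor: instead of polling in a worker slot,
# it defers to a trigger that runs asynchronously in the triggerer process.
from datetime import timedelta

from airflow.sensors.base import BaseSensorOperator
from airflow.triggers.temporal import TimeDeltaTrigger


class WaitAWhileSensor(BaseSensorOperator):
    """Illustrative sensor that frees its worker slot while waiting."""

    def execute(self, context):
        # Suspend the task and let the triggerer take over; the worker slot is released.
        self.defer(
            trigger=TimeDeltaTrigger(timedelta(minutes=10)),
            method_name="execute_complete",
        )

    def execute_complete(self, context, event=None):
        # Runs in a fresh worker slot once the trigger fires.
        self.log.info("Trigger fired, resuming downstream work.")
```

While the task is deferred it occupies no worker slot, which is the main benefit these async providers offer over classic polling sensors.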


Third Party Airflow Helm Charts

Apache Airflow has published the official Apache Airflow Community Chart since early 2021, but historically there have been a few other popular charts:

User Community Chart - the user community managed chart that has existed since 2018 and was previously called stable/airflow on the official (now deprecated) Helm Charts repo.

Bitnami Chart - Bitnami manages a number of charts and the Airflow chart is one of them.

Astronomer Chart - The chart managed by Astronomer. This was the original chart that the Official Airflow Community Chart is based on (it was donated by Astronomer).


Tools integrating with Airflow

ADA - A microservice created to retrieve analytics metrics from an Airflow database instance.

as-scraper - An integration with Selenium to build & maintain web scrapers inside Airflow.

afctl - A CLI tool that includes everything required to create, manage, and deploy Airflow projects faster and more smoothly.

airflint - Enforce Best Practices for all your Airflow DAGs.

airflow-aws-executors - Run Airflow Tasks directly on AWS Batch, AWS Fargate, or AWS ECS; the less infrastructure you have to provision, the better.

airflow-code-editor - A tool for Apache Airflow that allows you to edit DAGs in the browser.

airflow-diagrams - Auto-generates diagrams from Airflow DAGs.

airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operate on Airflow itself, clearing out various bits of the backing metadata store.

AirflowK8sDebugger - A library for generating Kubernetes pod YAML templates from an Airflow DAG using the KubernetesPodOperator.

Airflow Ditto - An extensible framework to transform an Airflow DAG into another DAG that is flow-isomorphic with the original, so that it can run in different environments (e.g. on different clouds, or even different container frameworks such as Apache Spark on YARN vs. Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight DAG transforms.

Amundsen - Amundsen is a data discovery and metadata platform for improving the productivity of data analysts, data scientists and engineers when interacting with data. It can surface which Airflow task generates a given table.

Apache-Liminal-Incubating - Liminal provides a domain-specific-language (DSL) to build ML/AI workflows on top of Apache Airflow. Its goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production.

Astro CLI - The Astro CLI is the easiest way to get a local Airflow server for prototyping and development.

Astro SDK - Astro SDK allows rapid and clean development of Extract, Load, Transform workflows using Python and SQL, powered by Apache Airflow and maintained by Astronomer.

Chartis - Python package to convert Common Workflow Language (CWL) into Airflow DAG.

CWL-Airflow - Python package to extend Apache-Airflow 1.10.11 functionality with CWL v1.2 support.

DAGify - A Python tool which converts Control-M workflows to Airflow DAGs.

dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.
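
The general idea behind YAML-driven DAG generation is to read a configuration file at DAG-parse time and turn each entry into a DAG object exposed at module level so the scheduler can discover it. The sketch below illustrates that pattern with a made-up config layout; it is not dag-factory's actual schema, and it assumes Airflow 2.4+ for the DAG(schedule=...) argument.

```python
# Generic sketch of YAML-driven DAG generation (not dag-factory's real schema):
# each top-level key becomes a DAG, each entry under "tasks" becomes a BashOperator.
from datetime import datetime
from textwrap import dedent

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG = yaml.safe_load(
    dedent(
        """\
        yaml_demo_dag:
          schedule: "@daily"
          tasks:
            extract: "echo extracting"
            load: "echo loading"
        """
    )
)

for dag_id, cfg in CONFIG.items():
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2024, 1, 1),
        schedule=cfg.get("schedule"),
        catchup=False,
    ) as dag:
        previous = None
        for task_id, command in cfg["tasks"].items():
            task = BashOperator(task_id=task_id, bash_command=command)
            if previous:
                previous >> task  # simple linear chain, for illustration only
            previous = task

    # Expose each generated DAG at module level so the scheduler picks it up.
    globals()[dag_id] = dag
```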

Dag Dependencies viewer - A tool which creates a view to visualize dependencies between Airflow DAGs.

data-dag - A library for building factories to dynamically generate DAGs from data (such as YAML files).

Databand - Observability platform built on top of Airflow.

DataHub - A metadata platform for the modern data stack. It can automatically collect lineage and other metadata from Airflow.

dbt (data build tool) - Data transformation tool; dbt jobs can be scheduled using Airflow, for example as sketched below.
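
A common, if simple, way to do that scheduling is to shell out to the dbt CLI from an Airflow task. The sketch below uses the core BashOperator with placeholder project and profiles paths (and the Airflow 2.4+ schedule argument); adjust both to your own layout.

```python
# Minimal sketch: scheduling a nightly dbt run followed by dbt tests.
# The project/profiles directories are placeholders for your own setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_ARGS = "--project-dir /opt/dbt/my_project --profiles-dir /opt/dbt/profiles"

with DAG(
    dag_id="dbt_nightly_run",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(task_id="dbt_run", bash_command=f"dbt run {DBT_ARGS}")
    dbt_test = BashOperator(task_id="dbt_test", bash_command=f"dbt test {DBT_ARGS}")

    dbt_run >> dbt_test  # only test once the models have been built
```

Tools listed elsewhere on this page, such as Astronomer Cosmos, take this further by turning a dbt project into Airflow DAGs and Task Groups instead of a single shell command.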

Domino - Domino is an open source Graphical User Interface platform for creating data and Machine Learning workflows (DAGs) with no-code, visually intuitive drag-and-drop actions. It is also a standard for publishing and sharing your Python code so it can be automatically used by anyone, directly in the GUI.

Elyra - Elyra provides a visual editor that enables data scientists to create AI pipelines in a low-code/no-code fashion.

GeniumCloud - A one-stop-shop platform for rapidly building, scheduling, and controlling Airflow workflows via a completely new UI. Provides out-of-the-box comprehensive Airflow infrastructure monitoring, integration with alerting systems, and service adoption from small to enterprise organizations. The easiest way to manage complex workflows.

gusty - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is available here.

Marquez - Marquez is an open source metadata service that maintains data provenance, shows how datasets are consumed and produced and centralizes dataset lifecycle management. Marquez can be used with Apache Airflow as an OpenLineage backend.

Meltano - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces Singer for extraction and loading, leverages dbt for transformation, and integrates with Airflow for orchestration.

Nexla - Build, transform, and manage data flows to and from databases, APIs, streams, SaaS services, events, and even emails. Use Nexla's Airflow Operators to trigger flows, or to start other Operators when your Nexla flow finishes running.

Oozie to Airflow - A tool to easily convert between Apache Oozie workflows and Apache Airflow workflows.

OpenLineage - An open standard for the collection of data lineage, which can be used to trace the path of datasets as they traverse multiple systems including Apache Airflow.

Panda Patrol - Test and profile your data right within your Airflow DAGs, with pre-built dashboards and alerts.

PowerBI-Airflow-Plugin - The Airflow plugin for Power BI includes a custom Airflow operator designed to refresh Power BI datasets.

Pylint-Airflow - A Pylint plugin for static code analysis on Airflow code.

Redactics - A managed appliance (built on Airflow) installed next to your databases that powers a growing collection of data management workflows.

simple-dag-editor - Zero-configuration Airflow tool that lets you manage your DAG files.

Viewflow - An Airflow-based framework that allows data scientists to create data models without writing Airflow code.

whirl - Fast iterative local development and testing of Apache Airflow workflows.

ZenML - Run your machine learning specific pipelines on Airflow, easily integrating with your existing data science tools and workflows.

Airflow Vscode Extension - A VSCode extension for Apache Airflow 2+. You can trigger your DAGs, pause/unpause DAGs, view execution logs, explore source code, and do much more.

Airflow Provider Template - Template and commands for creating and testing Airflow provider packages.

Airflow Template - Template and commands for creating minimal Airflow environments for rapid testing and prototyping.


Airflow Provider System Test Dashboards

Amazon provider package health dashboard - Dashboard listing all system tests within the Amazon provider package and their current health status: last execution status (succeeded/failed, average duration, …).

Google provider package health dashboard - Dashboard listing all system tests within the Google provider package and their current health status.

LLM Providers health dashboard - Dashboard listing all system tests within the LLM provider packages and their current health status: execution status for the last 7 runs (succeeded/failed, execution date).

Teradata Provider health dashboard - Dashboard listing all system tests for the Teradata provider and their current health status for recent runs.