Dask Executor

airflow.executors.dask_executor.DaskExecutor allows you to run Airflow tasks in a Dask Distributed cluster.

Dask clusters can run on a single machine or across machines on a remote network. For complete details, consult the Distributed documentation.

To create a cluster, first start a Scheduler:

# default settings for a local cluster
DASK_HOST=127.0.0.1
DASK_PORT=8786

dask-scheduler --host $DASK_HOST --port $DASK_PORT

Next, start at least one Worker on any machine that can connect to the host:

dask-worker $DASK_HOST:$DASK_PORT

Edit your airflow.cfg to set your executor to airflow.executors.dask_executor.DaskExecutor and provide the Dask Scheduler address in the [dask] section. For more information on setting the configuration, see Setting Configuration Options.
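
As an alternative to editing airflow.cfg by hand, the same two settings can be sketched as environment variables, using Airflow's AIRFLOW__{SECTION}__{KEY} override convention. This is a minimal sketch, not the only way to configure it: it assumes the option in the [dask] section is named cluster_address, and it falls back to 127.0.0.1:8786 (an assumed local scheduler address) when DASK_HOST and DASK_PORT are not set.

```shell
# Select the Dask executor (equivalent to setting `executor` in the [core] section).
export AIRFLOW__CORE__EXECUTOR="airflow.executors.dask_executor.DaskExecutor"

# Point Airflow at the Dask Scheduler (assumed key: `cluster_address` in [dask]).
# The fallback values are assumptions for a local cluster.
export AIRFLOW__DASK__CLUSTER_ADDRESS="${DASK_HOST:-127.0.0.1}:${DASK_PORT:-8786}"

# Show what Airflow will pick up from the environment.
echo "executor:        $AIRFLOW__CORE__EXECUTOR"
echo "cluster_address: $AIRFLOW__DASK__CLUSTER_ADDRESS"
```

Environment variables take precedence over values in airflow.cfg, so this form is convenient for containers or per-host overrides.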

Please note:

  • Each Dask worker must be able to import Airflow and any dependencies you require.

  • The DaskExecutor implements queues using Dask Worker Resources functionality. To enable the use of queues, start your Dask workers with resources of the same name as the desired queues and a limit of inf. E.g. dask-worker <scheduler_address> --resources="QUEUE1=inf,QUEUE2=inf".
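
The queue note above can be sketched as a small helper that builds the --resources value from a list of queue names. The queue names QUEUE1 and QUEUE2 are placeholders taken from the example above, and the variable names QUEUES and RESOURCES are hypothetical; the script only prints the resulting command so it can be reviewed before running.

```shell
# Hypothetical list of Airflow queue names this worker should serve.
QUEUES="QUEUE1 QUEUE2"

# Build a comma-separated "NAME=inf" list, one entry per queue.
RESOURCES=""
for q in $QUEUES; do
  RESOURCES="${RESOURCES:+$RESOURCES,}$q=inf"
done

# Print the worker command instead of running it; replace <scheduler_address>
# with the address of your Dask Scheduler.
echo "dask-worker <scheduler_address> --resources=\"$RESOURCES\""
```

A task assigned to a queue is then expected to run only on workers that advertise a resource with the same name.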
