airflow.executors.dask_executor.DaskExecutor allows you to run Airflow tasks in a Dask Distributed cluster.
Dask clusters can be run on a single machine or on remote networks. For complete details, consult the Distributed documentation.
To create a cluster, first start a Scheduler:
    # default settings for a local cluster
    DASK_HOST=127.0.0.1
    DASK_PORT=8786
    dask-scheduler --host $DASK_HOST --port $DASK_PORT
Next, start at least one Worker on any machine that can connect to the host:
    dask-worker $DASK_HOST:$DASK_PORT
Edit your airflow.cfg to set your executor to
airflow.executors.dask_executor.DaskExecutor and provide
the Dask Scheduler address in the
[dask] section. For more information on setting the configuration,
see Setting Configuration Options.
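As a rough illustration, the same two settings can also be supplied through environment variables, which Airflow reads using the AIRFLOW__{SECTION}__{KEY} naming pattern. This sketch assumes that your Airflow version accepts the short executor name DaskExecutor, that the scheduler address is stored under the cluster_address key of the [dask] section, and that the scheduler runs at the local defaults shown earlier (127.0.0.1:8786).

```shell
# Assumption: equivalent to "executor = DaskExecutor" in the [core] section of airflow.cfg.
export AIRFLOW__CORE__EXECUTOR=DaskExecutor

# Assumption: equivalent to "cluster_address = 127.0.0.1:8786" in the [dask] section.
# Replace the address with the host and port of your own Dask Scheduler.
export AIRFLOW__DASK__CLUSTER_ADDRESS=127.0.0.1:8786
```

Environment variables are convenient for containers and quick experiments; for a long-lived deployment, the airflow.cfg entries are usually easier to audit.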
Each Dask worker must be able to import Airflow and any dependencies you require.
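One way to check this requirement is to attempt the import on each worker machine before starting the worker. The sketch below assumes that python3 is the interpreter used by dask-worker on that machine; the variable name AIRFLOW_IMPORT_STATUS is only illustrative.

```shell
# Run on each worker machine, in the same Python environment that dask-worker uses.
# Assumption: "python3" resolves to that environment's interpreter.
if python3 -c "import airflow" 2>/dev/null; then
  AIRFLOW_IMPORT_STATUS="ok"
else
  AIRFLOW_IMPORT_STATUS="missing"
fi
echo "airflow import: $AIRFLOW_IMPORT_STATUS"
```

The same pattern can be repeated for any other package your tasks depend on.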
The DaskExecutor implements queues using Dask Worker Resources functionality. To enable the use of queues, start your Dask workers with resources of the same name as the desired queues and a limit of inf. For example:
dask-worker <scheduler_address> --resources="QUEUE1=inf,QUEUE2=inf".
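When a worker should serve several queues, the resources string can be assembled from a list of queue names. The following is a minimal sketch: the queue names and the scheduler address are placeholders, and the command is only printed, not executed, so you can review it before running it.

```shell
# Hypothetical queue names; replace them with the queues your tasks use.
QUEUES="QUEUE1 QUEUE2"

# Assumption: the scheduler runs at the local default address used earlier.
SCHEDULER_ADDRESS=127.0.0.1:8786

# Build a comma-separated "NAME=inf" list, one entry per queue.
RESOURCES=""
for q in $QUEUES; do
  RESOURCES="${RESOURCES:+$RESOURCES,}$q=inf"
done

# Print the resulting command for review instead of starting the worker here.
echo "dask-worker $SCHEDULER_ADDRESS --resources=\"$RESOURCES\""
```

Giving each resource a limit of inf means the resource acts as a label for routing tasks to the worker rather than as a capacity constraint.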