Dask Executor
airflow.providers.daskexecutor.executors.dask_executor.DaskExecutor
allows you to run Airflow tasks in a Dask Distributed cluster.
Dask clusters can run on a single machine or span multiple machines. For complete details, consult the Dask Distributed documentation.
To create a cluster, first start a Scheduler:
# default settings for a local cluster
DASK_HOST=127.0.0.1
DASK_PORT=8786
dask-scheduler --host $DASK_HOST --port $DASK_PORT
Next, start at least one Worker on any machine that can connect to the scheduler host:
dask-worker $DASK_HOST:$DASK_PORT
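Before pointing workers (or Airflow) at the scheduler address, it can help to confirm the port is actually reachable from that machine. A minimal stdlib sketch, assuming the default address above; `is_port_open` is a hypothetical helper, not part of Airflow or Dask:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, so success
        # means something is listening at that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Example: check the default local scheduler address before starting workers
# is_port_open("127.0.0.1", 8786)
```

If this returns False from a worker machine, check firewalls and that `dask-scheduler` is bound to an interface the worker can reach, not just localhost.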
Edit your airflow.cfg to set your executor to airflow.providers.daskexecutor.executors.dask_executor.DaskExecutor and provide the Dask Scheduler address in the [dask] section. For more information on setting the configuration, see Setting Configuration Options.
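With the default local cluster above, the relevant airflow.cfg entries might look like the following sketch; consult the provider's configuration reference for the authoritative key names, and adjust the address for your deployment:

```ini
[core]
executor = airflow.providers.daskexecutor.executors.dask_executor.DaskExecutor

[dask]
# Address of the running dask-scheduler (host:port)
cluster_address = 127.0.0.1:8786
```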
Please note:
Each Dask worker must be able to import Airflow and any dependencies you require.
The DaskExecutor implements queues using Dask Worker Resources. To enable the use of queues, start your Dask workers with resources named after the desired queues, each with a limit of inf, e.g. dask-worker <scheduler_address> --resources="QUEUE1=inf,QUEUE2=inf".