class, aws_conn_id='aws_default', region_name=None, poll_interval=5, wait_for_completion=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)

Bases: airflow.models.BaseOperator

Creates, updates and triggers an AWS Glue Crawler.

AWS Glue Crawler is a serverless service that manages a catalog of metadata tables that contain the inferred schema, format and data types of data stores within the AWS cloud.

For more information on how to use this operator, take a look at the guide: Create an AWS Glue crawler

  • config – Configuration of the AWS Glue crawler, given as a dictionary of crawler settings

  • aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner, that default boto3 configuration must be maintained on each worker node.

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check crawler status

  • wait_for_completion (bool) – Whether to wait for crawl execution completion. (default: True)

  • deferrable (bool) – If True, the operator waits asynchronously for the crawl to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
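
As a rough illustration of how these parameters fit together, the sketch below builds a crawler configuration and passes it to the operator. It assumes the class documented here is `GlueCrawlerOperator` from `airflow.providers.amazon.aws.operators.glue_crawler`. The helper names, crawler name, role ARN, database, and S3 path are placeholders invented for illustration, and the config keys follow the Glue `create_crawler` request parameters.

```python
from __future__ import annotations

from typing import Any


def build_crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict[str, Any]:
    """Return a minimal crawler configuration (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def make_crawl_task(config: dict[str, Any]):
    """Create the operator (hypothetical helper). Airflow is imported lazily."""
    # Assumed import path for the class documented on this page.
    from airflow.providers.amazon.aws.operators.glue_crawler import GlueCrawlerOperator

    return GlueCrawlerOperator(
        task_id="crawl_s3",
        config=config,
        aws_conn_id="aws_default",
        poll_interval=5,           # seconds between status checks
        wait_for_completion=True,  # block until the crawl finishes
    )


# Placeholder values only; replace with real resources.
example_config = build_crawler_config(
    name="example_crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    s3_path="s3://example-bucket/input/",
)
```

Calling `make_crawl_task(example_config)` inside a DAG definition would create the task. Setting `wait_for_completion=False` instead would, per the parameter description above, return without waiting for the crawl to finish.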

template_fields: Sequence[str] = ('config',)
ui_color = '#ededed'

Create and return a GlueCrawlerHook.


Execute AWS Glue Crawler from Airflow.


Returns the name of the current Glue crawler.

execute_complete(context, event=None)
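
The source gives no description for this method. In deferrable operators, a method with this signature is typically the callback that runs once the trigger fires. A minimal sketch of that pattern follows, written as a standalone function so it runs without Airflow. The event shape (a dict with a `status` key), the function name, and the use of `RuntimeError` are assumptions for illustration, not the operator's confirmed implementation.

```python
from __future__ import annotations

from typing import Any


def handle_crawl_event(config: dict[str, Any], event: dict[str, Any] | None = None) -> str:
    """Hypothetical completion callback: validate the trigger event, return the crawler name."""
    # Assumed convention: the trigger reports {"status": "success"} when the crawl is done.
    if event is None or event.get("status") != "success":
        raise RuntimeError(f"Error in Glue crawl: {event}")
    return config["Name"]
```

With this sketch, `handle_crawl_event({"Name": "example_crawler"}, {"status": "success"})` returns the crawler name, while any other event raises an error so the task is marked as failed.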

