airflow.providers.amazon.aws.hooks.glue_crawler

Module Contents

class airflow.providers.amazon.aws.hooks.glue_crawler.AwsGlueCrawlerHook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interacts with AWS Glue Crawler.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

See also

AwsBaseHook

glue_client(self)[source]

Returns

AWS Glue client

has_crawler(self, crawler_name)[source]

Checks if the crawler already exists

Parameters

crawler_name (str) -- unique crawler name per AWS account

Returns

True if the crawler already exists, False otherwise.

get_crawler(self, crawler_name: str)[source]

Gets crawler configurations

Parameters

crawler_name (str) -- unique crawler name per AWS account

Returns

Nested dictionary of crawler configurations
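
As a rough illustration of reading the nested dictionary, the helper below pulls out the crawler's current state and the status of its last run. It is only a sketch: the key names "State" and "LastCrawl" / "Status" are assumptions based on the AWS Glue GetCrawler response shape, and crawler_state_and_last_status is a hypothetical helper, not part of the hook.

```python
from typing import Any, Dict, Optional, Tuple


def crawler_state_and_last_status(
    crawler: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (state, last crawl status) from a crawler configuration dict.

    Assumes the dict uses the Glue API key names "State" and
    "LastCrawl" -> "Status". Missing keys yield None, which covers a
    crawler that has never run and so has no "LastCrawl" entry.
    """
    state = crawler.get("State")
    last_status = crawler.get("LastCrawl", {}).get("Status")
    return state, last_status
```

In a task, the argument would typically be the value returned by hook.get_crawler(crawler_name).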

update_crawler(self, **crawler_kwargs)[source]

Updates crawler configurations

Parameters

crawler_kwargs (any) -- Keyword args that define the configurations used for the crawler

Returns

True if the crawler was updated, False otherwise

create_crawler(self, **crawler_kwargs)[source]

Creates an AWS Glue Crawler

Parameters

crawler_kwargs (any) -- Keyword args that define the configurations used to create the crawler

Returns

Name of the crawler
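
The three methods has_crawler, update_crawler and create_crawler can be combined into a create-or-update step. The sketch below shows one possible way to do that. It accepts any object exposing those methods, so in practice hook would be an AwsGlueCrawlerHook instance. ensure_crawler is a hypothetical helper, and the assumption that the configuration carries the crawler name under the "Name" key follows the Glue API keyword names rather than anything stated on this page.

```python
from typing import Any, Dict


def ensure_crawler(hook: Any, config: Dict[str, Any]) -> str:
    """Create the crawler described by ``config``, or update it if it exists.

    ``hook`` must expose has_crawler, update_crawler and create_crawler
    as documented above. ``config`` is assumed to hold the crawler name
    under the "Name" key (an assumption, see the lead-in).
    """
    crawler_name = config["Name"]
    if hook.has_crawler(crawler_name):
        # The crawler is already registered: push the new configuration.
        hook.update_crawler(**config)
    else:
        # No crawler with this name yet: create it.
        hook.create_crawler(**config)
    return crawler_name
```

A call might look like ensure_crawler(AwsGlueCrawlerHook(aws_conn_id="aws_default"), config), where config holds the same keyword arguments that would be passed to create_crawler.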

start_crawler(self, crawler_name: str)[source]

Triggers the AWS Glue crawler

Parameters

crawler_name (str) -- unique crawler name per AWS account

Returns

Empty dictionary

wait_for_crawler_completion(self, crawler_name: str, poll_interval: int = 5)[source]

Waits until Glue crawler completes and returns the status of the latest crawl run. Raises AirflowException if the crawler fails or is cancelled.

Parameters
  • crawler_name (str) -- unique crawler name per AWS account

  • poll_interval (int) -- Time (in seconds) to wait between two consecutive calls to check crawler status

Returns

Crawler's status
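
A common use of start_crawler and wait_for_crawler_completion is to trigger a run and block until it finishes. The following is a minimal sketch of that sequence. run_crawler is a hypothetical helper, and hook stands for an AwsGlueCrawlerHook instance (any object with the same two methods works).

```python
from typing import Any


def run_crawler(hook: Any, crawler_name: str, poll_interval: int = 5) -> str:
    """Start the named crawler and wait for the run to finish.

    Returns whatever status wait_for_crawler_completion reports. Any
    exception raised by the hook (for example when the crawl fails or
    is cancelled) is left to propagate to the caller.
    """
    hook.start_crawler(crawler_name)
    return hook.wait_for_crawler_completion(
        crawler_name, poll_interval=poll_interval
    )
```

Letting the exception propagate means a failed or cancelled crawl fails the calling task as well, which is usually the desired behaviour inside an Airflow task.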
