airflow.providers.amazon.aws.hooks.glue_crawler

Module Contents

Classes

GlueCrawlerHook

Interacts with AWS Glue Crawler.

class airflow.providers.amazon.aws.hooks.glue_crawler.GlueCrawlerHook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interacts with AWS Glue Crawler. Provides a thin wrapper around boto3.client("glue").

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
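For example, a minimal sketch of instantiating the hook (the connection id "aws_default" is the provider's default; region_name here is purely illustrative):

    from airflow.providers.amazon.aws.hooks.glue_crawler import GlueCrawlerHook

    # Extra keyword arguments such as region_name are forwarded to AwsBaseHook.
    hook = GlueCrawlerHook(aws_conn_id="aws_default", region_name="us-east-1")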

glue_client()[source]

Returns

AWS Glue client
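If you need a Glue API operation the hook does not wrap, the underlying boto3 client can be used directly. A hedged sketch (in most provider versions glue_client is exposed as a cached property and read as an attribute; list_crawlers is a standard boto3 Glue operation):

    from airflow.providers.amazon.aws.hooks.glue_crawler import GlueCrawlerHook

    hook = GlueCrawlerHook(aws_conn_id="aws_default")
    client = hook.glue_client  # boto3 "glue" client
    # Any boto3 Glue call is available, e.g. listing crawler names.
    print(client.list_crawlers()["CrawlerNames"])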

has_crawler(crawler_name)[source]

Checks if the crawler already exists.

Parameters

crawler_name (str) – unique crawler name per AWS account

Returns

True if the crawler already exists, False otherwise.

Return type

bool
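A short sketch of the usual existence check, reusing the hook instance from the class-level example above (the crawler name is a placeholder):

    if hook.has_crawler("my-example-crawler"):
        print("Crawler already exists")
    else:
        print("Crawler not found")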

get_crawler(crawler_name)[source]

Gets crawler configurations.

Parameters

crawler_name (str) – unique crawler name per AWS account

Returns

Nested dictionary of crawler configurations

Return type

dict
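The returned dictionary is the Crawler structure from boto3, so nested fields can be read directly. An illustrative sketch, reusing the hook from the class-level example (the crawler name is a placeholder):

    config = hook.get_crawler("my-example-crawler")
    print(config["State"])                 # e.g. READY, RUNNING or STOPPING
    print(config["Targets"]["S3Targets"])  # configured S3 targets, if any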

update_crawler(**crawler_kwargs)[source]

Updates crawler configurations.

Parameters

crawler_kwargs – Keyword args that define the configurations used for the crawler

Returns

True if the crawler was updated and False otherwise

Return type

bool
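The keyword arguments mirror boto3's update_crawler call, so Name identifies the crawler and only fields that differ from the current configuration trigger an update. A hedged sketch, reusing the hook from the class-level example (all values are placeholders):

    updated = hook.update_crawler(
        Name="my-example-crawler",
        Description="Nightly crawl of the raw data bucket",
        Schedule="cron(0 2 * * ? *)",
    )
    print("Updated" if updated else "No change needed")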

update_tags(crawler_name, crawler_tags)[source]

Updates crawler tags.

Parameters
  • crawler_name (str) – Name of the crawler for which to update tags

  • crawler_tags (dict) – Dictionary of new tags. If empty, all tags will be deleted

Returns

True if the tags were updated and False otherwise

Return type

bool
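An illustrative sketch of retagging a crawler, reusing the hook from the class-level example; passing an empty dictionary removes all existing tags (the name and tag values are placeholders):

    changed = hook.update_tags("my-example-crawler", {"team": "data-eng", "env": "prod"})
    if changed:
        print("Tags updated")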

create_crawler(**crawler_kwargs)[source]

Creates an AWS Glue Crawler.

Parameters

crawler_kwargs – Keyword args that define the configurations used to create the crawler

Returns

Name of the crawler

Return type

str
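The keyword arguments are passed through to boto3's create_crawler, which requires at least Name, Role and Targets. A hedged sketch, reusing the hook from the class-level example (the role ARN, database name and S3 path are placeholders):

    name = hook.create_crawler(
        Name="my-example-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
        DatabaseName="example_db",
        Targets={"S3Targets": [{"Path": "s3://example-bucket/raw/"}]},
    )
    print(name)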

start_crawler(crawler_name)[source]

Triggers the AWS Glue Crawler.

Parameters

crawler_name (str) – unique crawler name per AWS account

Returns

Empty dictionary

Return type

dict
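Starting a crawl is a single call, for example (the crawler name is a placeholder, reusing the hook from the class-level example):

    hook.start_crawler("my-example-crawler")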

wait_for_crawler_completion(crawler_name, poll_interval=5)[source]

Waits until the Glue crawler completes and returns the status of the latest crawl run. Raises AirflowException if the crawler fails or is cancelled.

Parameters
  • crawler_name (str) – unique crawler name per AWS account

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check crawler status

Returns

Crawler’s status

Return type

str
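A typical trigger-and-wait sequence, reusing the hook from the class-level example and polling every 10 seconds (the crawler name and interval are illustrative):

    hook.start_crawler("my-example-crawler")
    status = hook.wait_for_crawler_completion("my-example-crawler", poll_interval=10)
    print(f"Last crawl finished with status: {status}")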
