airflow.providers.amazon.aws.hooks.glue

Module Contents

Classes

GlueJobHook

Interact with AWS Glue - create job, trigger, crawler

AwsGlueJobHook

This hook is deprecated.

class airflow.providers.amazon.aws.hooks.glue.GlueJobHook(s3_bucket=None, job_name=None, desc=None, concurrent_run_limit=1, script_location=None, retry_limit=0, num_of_dpus=None, iam_role_name=None, create_job_kwargs=None, *args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS Glue - create job, trigger, crawler

Parameters
  • s3_bucket (Optional[str]) -- S3 bucket where logs and the local ETL script will be uploaded

  • job_name (Optional[str]) -- unique job name per AWS account

  • desc (Optional[str]) -- job description

  • concurrent_run_limit (int) -- The maximum number of concurrent runs allowed for a job

  • script_location (Optional[str]) -- path to the ETL script on S3

  • retry_limit (int) -- Maximum number of times to retry this job if it fails

  • num_of_dpus (Optional[int]) -- Number of AWS Glue DPUs to allocate to this Job

  • region_name -- AWS region name (example: us-east-1)

  • iam_role_name (Optional[str]) -- AWS IAM Role for Glue Job Execution

  • create_job_kwargs (Optional[dict]) -- Extra arguments for Glue Job Creation
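As a rough illustration of how these parameters fit together, the sketch below collects them in a keyword dictionary and passes it to the hook. Every value (bucket, job name, role, script path, region) is a hypothetical placeholder, and the `Timeout` entry assumes that `create_job_kwargs` is forwarded to the Glue job-creation call. The import sits inside the function so the snippet loads even where the Amazon provider is not installed.

```python
# Hypothetical values throughout; replace them with your own resources.
HOOK_KWARGS = {
    "job_name": "example_etl_job",
    "desc": "Example ETL job",
    "s3_bucket": "example-glue-bucket",
    "script_location": "s3://example-glue-bucket/scripts/etl.py",
    "iam_role_name": "ExampleGlueRole",
    "concurrent_run_limit": 1,
    "retry_limit": 0,
    "num_of_dpus": 2,
    "region_name": "us-east-1",
    # Assumption: extra keys are passed through to the Glue job-creation call.
    "create_job_kwargs": {"Timeout": 60},  # job timeout in minutes
}


def make_hook():
    """Build a GlueJobHook from HOOK_KWARGS (requires the Amazon provider)."""
    from airflow.providers.amazon.aws.hooks.glue import GlueJobHook

    return GlueJobHook(**HOOK_KWARGS)
```

Calling `make_hook()` only constructs the hook object; the methods documented below are what actually interact with AWS Glue.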

JOB_POLL_INTERVAL = 6

list_jobs(self)
Returns

List of jobs

Return type

List

get_iam_execution_role(self)
Returns

IAM role for job execution

Return type

Dict

initialize_job(self, script_arguments=None, run_kwargs=None)

Initializes the connection with AWS Glue to run the job.

get_job_state(self, job_name, run_id)

Get the state of the Glue job. The job state can be running, finished, failed, stopped or timeout.

Parameters
  • job_name -- unique job name per AWS account

  • run_id -- The job-run ID of the predecessor job run

Returns

State of the Glue job

job_completion(self, job_name, run_id)

Waits until the Glue job with job_name completes or fails and returns the final state if finished. Raises AirflowException when the job fails.

Parameters
  • job_name -- unique job name per AWS account

  • run_id -- The job-run ID of the predecessor job run

Returns

Dict of JobRunState and JobRunId
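The waiting behaviour described for job_completion can be pictured as a simple polling loop. The following is a minimal sketch under stated assumptions, not the hook's actual implementation: `wait_for_job`, `JobFailedError`, and the state-name sets are hypothetical, the poll interval is assumed to be in seconds, and a plain callable stands in for get_job_state so the snippet runs without Airflow or AWS access.

```python
import time
from typing import Callable, Dict

JOB_POLL_INTERVAL = 6  # same value as the class attribute; assumed to be seconds

# Assumed state names, for illustration only.
FAILED_STATES = {"FAILED", "TIMEOUT"}
FINISHED_STATES = {"SUCCEEDED", "STOPPED"}


class JobFailedError(Exception):
    """Stand-in for AirflowException so the sketch has no Airflow dependency."""


def wait_for_job(
    get_state: Callable[[str, str], str],
    job_name: str,
    run_id: str,
    poll_interval: float = JOB_POLL_INTERVAL,
) -> Dict[str, str]:
    """Poll get_state until the run reaches a terminal state."""
    while True:
        state = get_state(job_name, run_id)
        if state in FINISHED_STATES:
            return {"JobRunState": state, "JobRunId": run_id}
        if state in FAILED_STATES:
            raise JobFailedError(f"Job {job_name} run {run_id} ended in state {state}")
        # Any other state is treated as still in progress.
        time.sleep(poll_interval)


# Usage with a fake state source that finishes on the third poll.
_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
result = wait_for_job(lambda name, rid: next(_states), "example_etl_job", "jr_123", poll_interval=0)
print(result)
```

Passing the state lookup in as a callable keeps the loop testable; with the real hook, the bound get_job_state method would play that role.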

get_or_create_glue_job(self)

Creates the Glue job if it does not already exist and returns the job name.

Returns

Name of the job
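The get-or-create pattern named by this method can be sketched as a lookup followed by a fallback creation. This is an illustration only: `FakeGlueClient`, `JobNotFound`, and `get_or_create_job` are hypothetical in-memory stand-ins, not the hook's code or the boto3 API, although the `get_job` and `create_job` method names are chosen to resemble a Glue client.

```python
class JobNotFound(Exception):
    """Stand-in for the service's 'entity not found' error."""


class FakeGlueClient:
    """In-memory stand-in for a Glue client; not the boto3 API."""

    def __init__(self):
        self.jobs = {}

    def get_job(self, JobName):
        if JobName not in self.jobs:
            raise JobNotFound(JobName)
        return {"Job": self.jobs[JobName]}

    def create_job(self, Name, **config):
        self.jobs[Name] = {"Name": Name, **config}
        return {"Name": Name}


def get_or_create_job(client, job_name, **config):
    """Return the job name, creating the job only if the lookup fails."""
    try:
        return client.get_job(JobName=job_name)["Job"]["Name"]
    except JobNotFound:
        return client.create_job(Name=job_name, **config)["Name"]


client = FakeGlueClient()
first = get_or_create_job(client, "example_etl_job", Description="Example ETL job")
second = get_or_create_job(client, "example_etl_job")
print(first, second, len(client.jobs))
```

The second call finds the existing job and creates nothing new, which is why the method is safe to call before every run.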

class airflow.providers.amazon.aws.hooks.glue.AwsGlueJobHook(*args, **kwargs)

Bases: GlueJobHook

This hook is deprecated. Please use airflow.providers.amazon.aws.hooks.glue.GlueJobHook.
