airflow.providers.amazon.aws.hooks.glue
Module Contents
Classes
| GlueJobHook | Interact with AWS Glue. |
Attributes
- class airflow.providers.amazon.aws.hooks.glue.GlueJobHook(s3_bucket=None, job_name=None, desc=None, concurrent_run_limit=1, script_location=None, retry_limit=0, num_of_dpus=None, iam_role_name=None, create_job_kwargs=None, update_config=False, job_poll_interval=6, *args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with AWS Glue. Provides a thick wrapper around boto3.client("glue").
- Parameters
- s3_bucket (str | None) – S3 bucket to which logs and the local ETL script are uploaded 
- job_name (str | None) – Job name, unique per AWS account 
- desc (str | None) – Job description 
- concurrent_run_limit (int) – The maximum number of concurrent runs allowed for a job 
- script_location (str | None) – Path to the ETL script on S3 
- retry_limit (int) – Maximum number of times to retry this job if it fails 
- num_of_dpus (int | float | None) – Number of AWS Glue DPUs to allocate to this job 
- region_name – AWS region name (example: us-east-1) 
- iam_role_name (str | None) – AWS IAM role for Glue job execution 
- create_job_kwargs (dict | None) – Extra arguments for Glue Job Creation 
- update_config (bool) – Update job configuration on Glue (default: False) 
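
As a rough illustration of how the constructor parameters fit together, the sketch below collects them in a dict and imports the hook only when `make_hook` is called. The bucket, job name, script path and role name are placeholder values, and `build_hook_kwargs` and `make_hook` are hypothetical helpers rather than part of the provider.

```python
def build_hook_kwargs():
    """Collect GlueJobHook constructor arguments (all values are placeholders)."""
    return {
        "job_name": "example-etl-job",
        "desc": "Nightly ETL run",
        "s3_bucket": "example-glue-bucket",
        "script_location": "s3://example-glue-bucket/scripts/etl.py",
        "iam_role_name": "ExampleGlueRole",
        "concurrent_run_limit": 1,
        "retry_limit": 0,
        "num_of_dpus": 10,
        "update_config": False,
        "job_poll_interval": 6,
        # Not consumed by GlueJobHook itself; forwarded to AwsBaseHook.
        "aws_conn_id": "aws_default",
        "region_name": "us-east-1",
    }


def make_hook():
    """Build the hook; requires the Amazon provider package to be installed."""
    # Imported lazily so this module loads even without Airflow available.
    from airflow.providers.amazon.aws.hooks.glue import GlueJobHook

    return GlueJobHook(**build_hook_kwargs())
```

Keeping the arguments in a plain dict makes them easy to inspect or override in tests before any AWS connection is involved.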
 
 - Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - class LogContinuationTokens
- Holds the continuation tokens used when reading logs from the two streams that Glue jobs write to. 
 - initialize_job(script_arguments=None, run_kwargs=None)
- Initializes the connection with AWS Glue to run the job.
 - get_job_state(job_name, run_id)
- Gets the state of the Glue job run. The state can be running, finished, failed, stopped or timeout.
 - print_job_logs(job_name, run_id, continuation_tokens)
- Prints the latest job logs to the Airflow task log and updates the continuation tokens.
- Parameters
- continuation_tokens (LogContinuationTokens) – The tokens to resume from when reading logs. This method updates the object with the new tokens. 
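
The in-place token update described for print_job_logs can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: `Tokens` is a hypothetical stand-in for LogContinuationTokens, its field names are invented, and the log streams are plain lists with integer positions acting as tokens instead of real CloudWatch tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tokens:
    """Hypothetical token holder: one resume position per log stream."""

    output: Optional[int] = None
    error: Optional[int] = None


def read_from(lines, token):
    """Return the lines after `token` and the token to resume from next time."""
    start = token or 0
    return lines[start:], len(lines)


def print_new_logs(output_lines, error_lines, tokens):
    """Print unseen lines from both streams and update `tokens` in place."""
    new_out, tokens.output = read_from(output_lines, tokens.output)
    new_err, tokens.error = read_from(error_lines, tokens.error)
    for line in new_out + new_err:
        print(line)
    return new_out + new_err
```

Because the caller keeps the same `Tokens` object between calls, each call prints only what was written since the previous one.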
 
 - job_completion(job_name, run_id, verbose=False)
- Waits until the Glue job job_name finishes. Returns the final state if the run finished; otherwise raises AirflowException. 
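
The wait-until-finished behaviour can be sketched as a simple polling loop. This is not the hook's implementation: `get_state` is a hypothetical callable taking the same arguments as get_job_state, the state names are assumed Glue run states, RuntimeError stands in for AirflowException, and the default interval only mirrors job_poll_interval=6 from the signature.

```python
import time

# Assumed terminal states that count as failure (not confirmed by this page).
FAILURE_STATES = {"FAILED", "STOPPED", "TIMEOUT"}


def wait_for_completion(get_state, job_name, run_id, poll_interval=6, sleep=time.sleep):
    """Poll `get_state` until the run succeeds (return state) or fails (raise)."""
    while True:
        state = get_state(job_name, run_id)
        if state == "SUCCEEDED":
            return state
        if state in FAILURE_STATES:
            # Stand-in for AirflowException in this sketch.
            raise RuntimeError(f"Glue job {job_name} run {run_id} ended in state {state}")
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.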
 - async async_job_completion(job_name, run_id, verbose=False)
- Asynchronously waits until the Glue job job_name finishes. Returns the final state if the run finished; otherwise raises AirflowException. 
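
An asynchronous wait differs from a blocking one mainly in that it yields to the event loop between polls. The sketch below shows that shape under stated assumptions: `get_state` is a hypothetical coroutine function, the state names are assumed, and RuntimeError again stands in for AirflowException.

```python
import asyncio

# Assumed terminal states that count as failure (not confirmed by this page).
FAILURE_STATES = {"FAILED", "STOPPED", "TIMEOUT"}


async def async_wait_for_completion(get_state, job_name, run_id, poll_interval=6):
    """Await `get_state` repeatedly without blocking the event loop."""
    while True:
        state = await get_state(job_name, run_id)
        if state == "SUCCEEDED":
            return state
        if state in FAILURE_STATES:
            # Stand-in for AirflowException in this sketch.
            raise RuntimeError(f"Glue job {job_name} run {run_id} ended in state {state}")
        # Other tasks on the loop can run while this one sleeps.
        await asyncio.sleep(poll_interval)
```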
 - has_job(job_name)
- Checks if the job already exists.
- Parameters
- job_name – Job name, unique per AWS account 
- Returns
- True if the job already exists, False otherwise. 
- Return type
- bool
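
One plausible way to implement an existence check is to look the job up and treat a "not found" error as False. The sketch below uses an in-memory fake; `FakeGlueClient` and `EntityNotFound` are hypothetical stand-ins and do not reproduce the boto3 client's interface, and this page does not confirm how the hook performs the check.

```python
class EntityNotFound(Exception):
    """Hypothetical stand-in for a client's 'job not found' error."""


class FakeGlueClient:
    """In-memory stand-in for a Glue client; not the boto3 API."""

    def __init__(self, jobs):
        self.jobs = dict(jobs)

    def get_job(self, job_name):
        if job_name not in self.jobs:
            raise EntityNotFound(job_name)
        return self.jobs[job_name]


def has_job(client, job_name):
    """Return True if the lookup succeeds, False if the job is not found."""
    try:
        client.get_job(job_name)
        return True
    except EntityNotFound:
        return False
```

Catching only the "not found" error keeps other failures, such as permission problems, visible to the caller.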
 
 - update_job(**job_kwargs)
- Updates the job configuration.
- Parameters
- job_kwargs – Keyword arguments that define the configuration used for the job 
- Returns
- True if the job was updated, False otherwise 
- Return type
- bool
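
A True/False result from an update suggests the new settings are compared with the existing ones first. The sketch below shows that idea against a plain dict. It is an assumption about the general approach, not the hook's code: `store`, `changed_settings` and this `update_job` function are hypothetical, and `MaxRetries` is used only as an example setting name.

```python
def changed_settings(current, desired):
    """Return the desired settings whose values differ from the current ones."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def update_job(store, name, **job_kwargs):
    """Apply `job_kwargs` to the stored job; return True only if anything changed."""
    current = store[name]
    changes = changed_settings(current, job_kwargs)
    if not changes:
        return False
    current.update(changes)
    return True
```

Skipping the write when nothing differs avoids an unnecessary API call in the real setting.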
 
 
