airflow.providers.amazon.aws.hooks.glue
Module Contents
Classes
GlueJobHook | Interact with AWS Glue.
Attributes
- class airflow.providers.amazon.aws.hooks.glue.GlueJobHook(s3_bucket=None, job_name=None, desc=None, concurrent_run_limit=1, script_location=None, retry_limit=0, num_of_dpus=None, iam_role_name=None, create_job_kwargs=None, *args, **kwargs)
Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
Interact with AWS Glue. Provides a thick wrapper around boto3.client("glue").
- Parameters
s3_bucket (str | None) – S3 bucket where logs and the local ETL script will be uploaded
job_name (str | None) – Unique job name per AWS account
desc (str | None) – Job description
concurrent_run_limit (int) – Maximum number of concurrent runs allowed for the job
script_location (str | None) – Path to the ETL script on S3
retry_limit (int) – Maximum number of times to retry the job if it fails
num_of_dpus (int | float | None) – Number of AWS Glue DPUs to allocate to the job
region_name – AWS region name (example: us-east-1)
iam_role_name (str | None) – AWS IAM role for Glue job execution
create_job_kwargs (dict | None) – Extra arguments for Glue job creation
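As a minimal sketch of how these parameters fit together, the snippet below collects them in a dictionary and passes them to the constructor. The bucket, job name, script path, and role name are hypothetical placeholders, and `build_glue_hook` is an illustrative helper, not part of the provider. The import is done lazily so the module can be loaded without Airflow installed.

```python
# Placeholder values only: the bucket, job name, script path, and role
# below are hypothetical and must be replaced with real resources.
EXAMPLE_HOOK_KWARGS = {
    "job_name": "example-etl-job",
    "desc": "Nightly example ETL",
    "s3_bucket": "example-glue-bucket",
    "script_location": "s3://example-glue-bucket/scripts/etl.py",
    "iam_role_name": "ExampleGlueRole",
    "concurrent_run_limit": 1,  # documented default
    "retry_limit": 0,           # documented default
    "num_of_dpus": 2,
    "region_name": "us-east-1",
}


def build_glue_hook(aws_conn_id="aws_default", **overrides):
    """Construct a GlueJobHook from the example settings (illustrative helper)."""
    # Imported here so that loading this module does not require Airflow;
    # calling the function requires the apache-airflow-providers-amazon package.
    from airflow.providers.amazon.aws.hooks.glue import GlueJobHook

    settings = {**EXAMPLE_HOOK_KWARGS, **overrides}
    # aws_conn_id is not a GlueJobHook parameter; it is forwarded to AwsBaseHook.
    return GlueJobHook(**settings, aws_conn_id=aws_conn_id)
```

Calling `build_glue_hook(retry_limit=2)` would, under these assumptions, return a hook configured with the example values and a retry limit of 2.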
Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
- initialize_job(script_arguments=None, run_kwargs=None)
Initializes the connection with AWS Glue to run the job.
- get_job_state(job_name, run_id)
Gets the state of the Glue job. The job state can be running, finished, failed, stopped, or timeout.
- print_job_logs(job_name, run_id, job_failed=False, next_token=None)
Prints the batch of logs to the Airflow task log and returns nextToken.
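Because each call prints one batch and returns a token, a caller can feed that token back in to continue from where the previous batch ended. The sketch below shows this pattern; `stream_job_logs` and its `batches` argument are hypothetical, and the text above does not state what token is returned when no new log lines are available, so the loop simply runs a fixed number of times.

```python
def stream_job_logs(hook, job_name, run_id, batches=3):
    """Print up to `batches` batches of logs, carrying the token between calls.

    `hook` is any object exposing print_job_logs with the documented
    signature, such as a GlueJobHook instance.
    """
    next_token = None  # the first call starts without a token
    for _ in range(batches):
        # Each call prints one batch and hands back the token for the next one.
        next_token = hook.print_job_logs(
            job_name=job_name,
            run_id=run_id,
            next_token=next_token,
        )
    return next_token
```

Returning the last token lets the caller resume later without reprinting earlier batches.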
- job_completion(job_name, run_id, verbose=False)
Waits until the Glue job named job_name completes or fails, and returns the final state if it finished. Raises AirflowException if the job failed.
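The waiting behavior described here amounts to a polling loop over get_job_state. The following is a minimal sketch of that idea, not the hook's actual implementation: the state names, the polling interval, and the helper name `wait_for_glue_job` are assumptions, and it raises `RuntimeError` rather than `AirflowException` so that it runs without Airflow installed.

```python
import time

# Assumed state names; the documentation lists the states informally as
# running, finished, failed, stopped, or timeout.
FINISHED_STATES = {"SUCCEEDED", "STOPPED"}
FAILED_STATES = {"FAILED", "TIMEOUT"}


def wait_for_glue_job(hook, job_name, run_id, poll_seconds=6.0, sleep=time.sleep):
    """Poll hook.get_job_state until the run reaches a terminal state.

    `hook` is any object exposing get_job_state(job_name, run_id).
    Returns the final state on success and raises RuntimeError on failure.
    """
    while True:
        state = hook.get_job_state(job_name, run_id)
        if state in FINISHED_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(
                f"Glue job {job_name!r} run {run_id!r} ended in state {state}"
            )
        # Still in progress: wait before asking again.
        sleep(poll_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.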
- has_job(job_name)
Checks if the job already exists.
- Parameters
job_name – unique job name per AWS account
- Returns
Returns True if the job already exists and False if not.
- Return type