airflow.providers.amazon.aws.operators.glue

Module Contents

Classes

GlueJobOperator

Create an AWS Glue Job.

class airflow.providers.amazon.aws.operators.glue.GlueJobOperator(*, job_name='aws_glue_default_job', job_desc='AWS Glue Job with Airflow', script_location=None, concurrent_run_limit=None, script_args=None, retry_limit=0, num_of_dpus=None, aws_conn_id='aws_default', region_name=None, s3_bucket=None, iam_role_name=None, iam_role_arn=None, create_job_kwargs=None, run_job_kwargs=None, wait_for_completion=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), verbose=False, replace_script_file=False, update_config=False, job_poll_interval=6, stop_job_run_on_kill=False, **kwargs)

Bases: airflow.models.BaseOperator

Create an AWS Glue Job.

AWS Glue is a serverless ETL service for running Apache Spark jobs on the AWS cloud. Language support: Python and Scala.

See also

For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job

Parameters
  • job_name (str) – unique job name per AWS account

  • script_location (str | None) – location of ETL script. Must be a local or S3 path

  • job_desc (str) – job description details

  • concurrent_run_limit (int | None) – The maximum number of concurrent runs allowed for a job

  • script_args (dict | None) – ETL script arguments and AWS Glue arguments (templated)

  • retry_limit (int) – The maximum number of times to retry this job if it fails

  • num_of_dpus (int | float | None) – Number of AWS Glue DPUs to allocate to this job.

  • region_name (str | None) – AWS region name (example: us-east-1)

  • s3_bucket (str | None) – S3 bucket where logs and the local ETL script will be uploaded

  • iam_role_name (str | None) – AWS IAM role name for Glue Job execution. If set, iam_role_arn must be None.

  • iam_role_arn (str | None) – AWS IAM role ARN for Glue Job execution. If set, iam_role_name must be None.

  • create_job_kwargs (dict | None) – Extra arguments for Glue Job Creation

  • run_job_kwargs (dict | None) – Extra arguments for Glue Job Run

  • wait_for_completion (bool) – Whether to wait for job run completion. (default: True)

  • deferrable (bool) – If True, the operator will wait asynchronously for the job to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False)

  • verbose (bool) – If True, Glue job run logs are shown in the Airflow task logs. (default: False)

  • update_config (bool) – If True, the operator will update the job configuration. (default: False)

  • replace_script_file (bool) – If True, the script file will be replaced in S3. (default: False)

  • stop_job_run_on_kill (bool) – If True, the operator will stop the job run when the task is killed. (default: False)

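A minimal sketch of submitting a Glue job with this operator inside a DAG; the DAG id, job name, bucket, script path, and role name below are hypothetical placeholders, and the create_job_kwargs keys mirror fields of the Glue CreateJob API:

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="example_glue_job",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    submit_glue_job = GlueJobOperator(
        task_id="submit_glue_job",
        job_name="my_etl_job",                            # placeholder job name
        script_location="s3://my-bucket/scripts/etl.py",  # placeholder S3 path
        s3_bucket="my-bucket",                            # bucket for logs and script upload
        iam_role_name="my-glue-role",                     # set this or iam_role_arn, never both
        script_args={"--input_path": "s3://my-bucket/input/"},  # templated
        create_job_kwargs={
            "GlueVersion": "4.0",
            "NumberOfWorkers": 2,
            "WorkerType": "G.1X",
        },
        wait_for_completion=True,  # block until the run finishes
        # deferrable=True,         # wait asynchronously instead (requires aiobotocore)
    )
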
template_fields: Sequence[str] = ('job_name', 'script_location', 'script_args', 'create_job_kwargs', 's3_bucket',...
template_ext: Sequence[str] = ()
template_fields_renderers
ui_color = '#ededed'
glue_job_hook()
execute(context)

Execute AWS Glue Job from Airflow.

Returns

the ID of the current Glue job run.
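The run ID is also pushed as the task's return-value XCom, so a downstream task can consume it; a sketch continuing the example above, where report_run_id is a hypothetical TaskFlow task:

from airflow.decorators import task

@task
def report_run_id(run_id: str) -> None:
    # run_id is the JobRunId returned by GlueJobOperator.execute
    print(f"Glue job run ID: {run_id}")

# Inside the same DAG context as submit_glue_job above:
# report_run_id(submit_glue_job.output)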

execute_complete(context, event=None)
on_kill()

Cancel the running AWS Glue Job.
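This cleanup only stops the remote run when stop_job_run_on_kill is True; a minimal sketch, reusing the placeholder names from the example above:

# With stop_job_run_on_kill=True, on_kill() also stops the in-flight Glue job run
# when the Airflow task itself is killed (e.g. cleared or timed out).
cancellable_job = GlueJobOperator(
    task_id="cancellable_glue_job",
    job_name="my_etl_job",         # placeholder
    iam_role_name="my-glue-role",  # placeholder
    stop_job_run_on_kill=True,
)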
