airflow.providers.amazon.aws.operators.glue

Module Contents

Classes

GlueJobOperator

Creates an AWS Glue Job. AWS Glue is a serverless Spark ETL service.

class airflow.providers.amazon.aws.operators.glue.GlueJobOperator(*, job_name='aws_glue_default_job', job_desc='AWS Glue Job with Airflow', script_location=None, concurrent_run_limit=None, script_args=None, retry_limit=0, num_of_dpus=None, aws_conn_id='aws_default', region_name=None, s3_bucket=None, iam_role_name=None, create_job_kwargs=None, run_job_kwargs=None, wait_for_completion=True, verbose=False, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates an AWS Glue Job. AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala.

See also

For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job

Parameters
  • job_name (str) -- unique job name per AWS account

  • script_location (Optional[str]) -- location of the ETL script. Must be a local or S3 path

  • job_desc (str) -- job description details

  • concurrent_run_limit (Optional[int]) -- the maximum number of concurrent runs allowed for the job

  • script_args (Optional[dict]) -- ETL script arguments and AWS Glue arguments (templated)

  • retry_limit (int) -- the maximum number of times to retry this job if it fails

  • num_of_dpus (Optional[int]) -- number of AWS Glue DPUs to allocate to this job

  • region_name (Optional[str]) -- AWS region name (example: us-east-1)

  • s3_bucket (Optional[str]) -- S3 bucket where logs and the local ETL script will be uploaded

  • iam_role_name (Optional[str]) -- AWS IAM role for Glue Job execution

  • create_job_kwargs (Optional[dict]) -- extra arguments for Glue Job creation

  • run_job_kwargs (Optional[dict]) -- extra arguments for the Glue Job run

  • wait_for_completion (bool) -- whether to wait for job run completion (default: True)

  • verbose (bool) -- if True, Glue Job run logs are shown in the Airflow task logs (default: False)

template_fields :Sequence[str] = ['job_name', 'script_location', 'script_args', 's3_bucket', 'iam_role_name'][source]
template_ext :Sequence[str] = [][source]
template_fields_renderers[source]
ui_color = #ededed[source]
execute(context)[source]

Executes the AWS Glue Job from Airflow.

Returns

the ID of the current Glue Job run.
