airflow.providers.amazon.aws.operators.glue
Module Contents
Classes
GlueJobOperator | Create an AWS Glue Job.
- class airflow.providers.amazon.aws.operators.glue.GlueJobOperator(*, job_name='aws_glue_default_job', job_desc='AWS Glue Job with Airflow', script_location=None, concurrent_run_limit=None, script_args=None, retry_limit=0, num_of_dpus=None, aws_conn_id='aws_default', region_name=None, s3_bucket=None, iam_role_name=None, create_job_kwargs=None, run_job_kwargs=None, wait_for_completion=True, deferrable=False, verbose=False, update_config=False, **kwargs)
Bases: airflow.models.BaseOperator
Create an AWS Glue Job.
AWS Glue is a serverless Spark ETL service for running Spark Jobs on the AWS cloud. Language support: Python and Scala.
See also
For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job
- Parameters
job_name (str) – Job name, unique within the AWS account
script_location (str | None) – Location of the ETL script. Must be a local path or an S3 path
job_desc (str) – Description of the job
concurrent_run_limit (int | None) – The maximum number of concurrent runs allowed for a job
script_args (dict | None) – ETL script arguments and AWS Glue arguments (templated)
retry_limit (int) – The maximum number of times to retry this job if it fails
num_of_dpus (int | float | None) – Number of AWS Glue DPUs to allocate to this job
region_name (str | None) – AWS region name (example: us-east-1)
s3_bucket (str | None) – S3 bucket where logs and the local ETL script will be uploaded
iam_role_name (str | None) – AWS IAM Role for Glue Job Execution
create_job_kwargs (dict | None) – Extra arguments for Glue Job Creation
run_job_kwargs (dict | None) – Extra arguments for Glue Job Run
wait_for_completion (bool) – Whether to wait for job run completion. (default: True)
deferrable (bool) – If True, the operator waits asynchronously for the job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
verbose (bool) – If True, the Glue job run logs are shown in the Airflow task logs. (default: False)
update_config (bool) – If True, the operator will update the job configuration. (default: False)
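The parameters above can be combined as in the following minimal sketch. It is an illustration, not the provider's reference usage: the helper names `build_glue_job_kwargs` and `make_glue_task`, the job name, bucket, script path, role name, and the `--input_date` script argument are all hypothetical. Only the keyword names come from the signature documented here. The operator import is deferred into a function so that the argument-building part runs even where the Amazon provider is not installed.

```python
from typing import Any, Dict


def build_glue_job_kwargs(
    job_name: str,
    script_location: str,
    s3_bucket: str,
    iam_role_name: str,
    deferrable: bool = False,
) -> Dict[str, Any]:
    """Collect keyword arguments for GlueJobOperator (hypothetical helper).

    Every key is a parameter from the documented signature; the values
    chosen here are placeholders for illustration only.
    """
    return {
        "job_name": job_name,
        "script_location": script_location,  # local path or S3 path
        "s3_bucket": s3_bucket,  # where logs and a local script are uploaded
        "iam_role_name": iam_role_name,
        # script_args is templated, so Airflow macros such as {{ ds }} are
        # rendered at run time. "--input_date" is an assumed argument name.
        "script_args": {"--input_date": "{{ ds }}"},
        "retry_limit": 1,
        "region_name": "us-east-1",
        "wait_for_completion": True,
        # Per the docs above, deferrable=True implies waiting for completion
        # and requires aiobotocore to be installed.
        "deferrable": deferrable,
        "verbose": False,
    }


def make_glue_task(task_id: str, **kwargs: Any):
    """Instantiate the operator; requires apache-airflow-providers-amazon."""
    # Imported lazily so this module still loads without Airflow installed.
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    return GlueJobOperator(task_id=task_id, **kwargs)


if __name__ == "__main__":
    example_kwargs = build_glue_job_kwargs(
        job_name="example_glue_job",  # hypothetical job name
        script_location="s3://example-bucket/scripts/etl.py",  # hypothetical
        s3_bucket="example-bucket",  # hypothetical bucket
        iam_role_name="ExampleGlueRole",  # hypothetical role
    )
    print(example_kwargs)
```

Inside a DAG definition, something like `make_glue_task("submit_glue_job", **example_kwargs)` would create the task. Keeping the arguments in a plain dictionary makes it easy to inspect or override individual settings, for example switching `deferrable` on, before the operator is constructed.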