airflow.providers.amazon.aws.operators.glue

Module Contents

Classes

GlueJobOperator: Create an AWS Glue Job.
class airflow.providers.amazon.aws.operators.glue.GlueJobOperator(*, job_name='aws_glue_default_job', job_desc='AWS Glue Job with Airflow', script_location=None, concurrent_run_limit=None, script_args=None, retry_limit=0, num_of_dpus=None, aws_conn_id='aws_default', region_name=None, s3_bucket=None, iam_role_name=None, create_job_kwargs=None, run_job_kwargs=None, wait_for_completion=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), verbose=False, update_config=False, job_poll_interval=6, stop_job_run_on_kill=False, **kwargs)
Bases: airflow.models.BaseOperator

Create an AWS Glue Job.

AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala.

See also: For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job

Parameters
- job_name (str) – unique job name per AWS account
- job_desc (str) – job description
- script_location (str | None) – location of the ETL script; must be a local or S3 path
- concurrent_run_limit (int | None) – the maximum number of concurrent runs allowed for the job
- script_args (dict | None) – ETL script arguments and AWS Glue arguments (templated)
- retry_limit (int) – the maximum number of times to retry this job if it fails
- num_of_dpus (int | float | None) – number of AWS Glue DPUs to allocate to this job
- aws_conn_id (str | None) – the Airflow connection used for AWS credentials
- region_name (str | None) – AWS region name (example: us-east-1)
- s3_bucket (str | None) – S3 bucket where logs and the local ETL script will be uploaded
- iam_role_name (str | None) – AWS IAM role for Glue job execution
- create_job_kwargs (dict | None) – extra arguments for Glue job creation
- run_job_kwargs (dict | None) – extra arguments for the Glue job run
- wait_for_completion (bool) – whether to wait for job run completion (default: True)
- deferrable (bool) – if True, the operator waits asynchronously for the job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
- verbose (bool) – if True, Glue job run logs are shown in the Airflow task logs (default: False)
- update_config (bool) – if True, the operator updates the job configuration (default: False)
- job_poll_interval (int | float) – time, in seconds, to wait between consecutive checks of the job run status (default: 6)
- stop_job_run_on_kill (bool) – if True, the operator stops the job run when the task is killed (default: False)
 
 
