airflow.providers.amazon.aws.operators.glue
Module Contents

class airflow.providers.amazon.aws.operators.glue.AwsGlueJobOperator(*, job_name: str = 'aws_glue_default_job', job_desc: str = 'AWS Glue Job with Airflow', script_location: Optional[str] = None, concurrent_run_limit: Optional[int] = None, script_args: Optional[dict] = None, retry_limit: Optional[int] = None, num_of_dpus: int = 6, aws_conn_id: str = 'aws_default', region_name: Optional[str] = None, s3_bucket: Optional[str] = None, iam_role_name: Optional[str] = None, create_job_kwargs: Optional[dict] = None, run_job_kwargs: Optional[dict] = None, wait_for_completion: bool = True, **kwargs)
Bases: airflow.models.BaseOperator
Creates an AWS Glue job (if a job with this name does not already exist) and starts a run of it. AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala.
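The job lifecycle this operator drives can be sketched roughly as: look up the Glue job by name, create it if it is missing, start a run, and, when `wait_for_completion` is true, poll until the run reaches a terminal state. The sketch below is not Airflow's implementation. `FakeGlueClient`, `run_glue_job`, `TERMINAL_STATES`, and the local `EntityNotFoundException` are hypothetical; only the method names and request/response shapes mirror boto3's Glue client (`get_job`, `create_job`, `start_job_run`, `get_job_run`), so the flow runs without AWS credentials. Mapping `num_of_dpus` onto `MaxCapacity` is likewise an assumption for illustration.

```python
import itertools
import time
from typing import Dict, List, Optional

# Glue job-run states treated as final in this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


class EntityNotFoundException(Exception):
    """Hypothetical stand-in for the Glue client's EntityNotFoundException."""


class FakeGlueClient:
    """Hypothetical in-memory stand-in for boto3's Glue client.

    Each run walks through ``run_states`` one state per poll,
    then stays on the last state.
    """

    def __init__(self, run_states: List[str]) -> None:
        self.jobs: Dict[str, dict] = {}
        self.create_calls = 0
        self._run_states = run_states
        self._runs: Dict[str, List[str]] = {}
        self._ids = itertools.count(1)

    def get_job(self, JobName: str) -> dict:
        if JobName not in self.jobs:
            raise EntityNotFoundException(JobName)
        return {"Job": self.jobs[JobName]}

    def create_job(self, Name: str, **kwargs) -> dict:
        self.create_calls += 1
        self.jobs[Name] = {"Name": Name, **kwargs}
        return {"Name": Name}

    def start_job_run(self, JobName: str, Arguments: Optional[dict] = None) -> dict:
        run_id = f"jr_{next(self._ids)}"
        self._runs[run_id] = list(self._run_states)
        return {"JobRunId": run_id}

    def get_job_run(self, JobName: str, RunId: str) -> dict:
        states = self._runs[RunId]
        # Advance to the next simulated state, or stay on the final one.
        state = states.pop(0) if len(states) > 1 else states[0]
        return {"JobRun": {"Id": RunId, "JobRunState": state}}


def run_glue_job(client, job_name: str, script_location: str, role_arn: str,
                 num_of_dpus: int = 6, script_args: Optional[dict] = None,
                 wait_for_completion: bool = True,
                 poll_interval: float = 6.0) -> dict:
    """Create the job if missing, start a run, and optionally poll it."""
    try:
        client.get_job(JobName=job_name)  # reuse an existing job definition
    except EntityNotFoundException:
        client.create_job(
            Name=job_name,
            Role=role_arn,
            Command={"Name": "glueetl", "ScriptLocation": script_location},
            MaxCapacity=float(num_of_dpus),  # assumed DPU mapping
        )
    run_id = client.start_job_run(JobName=job_name, Arguments=script_args or {})["JobRunId"]
    state = None
    if wait_for_completion:
        while True:
            state = client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
            if state in TERMINAL_STATES:
                break
            time.sleep(poll_interval)
        if state != "SUCCEEDED":
            # In this sketch, any non-successful terminal state is an error.
            raise RuntimeError(f"Glue job run {run_id} ended in state {state}")
    return {"JobRunId": run_id, "JobRunState": state}
```

With a real client you would pass `boto3.client("glue")` and catch `client.exceptions.EntityNotFoundException` instead of the local stand-in.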
Parameters
job_name (str) – Unique job name per AWS account.
script_location (Optional[str]) – Location of the ETL script. Must be a local path or an S3 path.
job_desc (str) – Job description details.
concurrent_run_limit (Optional[int]) – The maximum number of concurrent runs allowed for the job.
script_args (Optional[dict]) – ETL script arguments and AWS Glue arguments (templated).
retry_limit (Optional[int]) – The maximum number of times to retry this job if it fails.
num_of_dpus (int) – Number of AWS Glue DPUs to allocate to this job (default: 6).
aws_conn_id (str) – Airflow connection ID to use for AWS credentials (default: 'aws_default').
region_name (Optional[str]) – AWS region name (example: us-east-1).
s3_bucket (Optional[str]) – S3 bucket where logs and the local ETL script will be uploaded.
iam_role_name (Optional[str]) – Name of the AWS IAM role used for Glue job execution.
create_job_kwargs (Optional[dict]) – Extra arguments for Glue job creation.
run_job_kwargs (Optional[dict]) – Extra arguments for the Glue job run.
wait_for_completion (bool) – Whether to wait for the job run to complete (default: True).
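Because `script_location` may be either a local path or an S3 path, a local script has to be uploaded to `s3_bucket` before Glue can run it. A minimal sketch of that decision follows, assuming an `s3://` prefix marks an S3 path. `resolve_script_location` and `ARTIFACTS_PREFIX` are hypothetical names, and the S3 key layout is an assumption rather than the operator's documented behavior. The upload itself (for example with boto3's `S3.Client.upload_file(Filename, Bucket, Key)`) is left out so the sketch runs offline.

```python
import os
from typing import Optional, Tuple

# Hypothetical key prefix for uploaded scripts; the operator's actual layout may differ.
ARTIFACTS_PREFIX = "artifacts/glue-scripts/"


def resolve_script_location(script_location: str,
                            s3_bucket: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (local file to upload or None, S3 URI to hand to Glue)."""
    if script_location.startswith("s3://"):
        # Already in S3: Glue can read it directly, nothing to upload.
        return None, script_location
    if not s3_bucket:
        # In this sketch, a local script cannot be used without a target bucket.
        raise ValueError("s3_bucket is required when script_location is a local path")
    script_name = os.path.basename(script_location)
    return script_location, f"s3://{s3_bucket}/{ARTIFACTS_PREFIX}{script_name}"
```

Keeping the path decision separate from the upload call makes it easy to check which S3 URI will end up in the job definition without touching AWS.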