airflow.providers.amazon.aws.operators.redshift_data

Module Contents

Classes

RedshiftDataOperator

Executes SQL statements against an Amazon Redshift cluster or Redshift Serverless workgroup using the Redshift Data API.

class airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperator(sql, database=None, cluster_identifier=None, db_user=None, parameters=None, secret_arn=None, statement_name=None, with_event=False, wait_for_completion=True, poll_interval=10, return_sql_result=False, workgroup_name=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), session_id=None, session_keep_alive_seconds=None, **kwargs)

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.redshift_data.RedshiftDataHook]


See also:

For more information on how to use this operator, take a look at the guide: Execute a statement on an Amazon Redshift cluster

Parameters
  • database (str | None) – the name of the database

  • sql (str | list) – the SQL statement or list of SQL statements to run

  • cluster_identifier (str | None) – unique identifier of a cluster

  • db_user (str | None) – the database username

  • parameters (list | None) – the parameters for the SQL statement

  • secret_arn (str | None) – the name or ARN of the secret that enables database access

  • statement_name (str | None) – the name of the SQL statement

  • with_event (bool) – indicates whether to send an event to EventBridge

  • wait_for_completion (bool) – whether to wait for the statement to finish. If True (default), the operator waits; if False, it returns without waiting for the statement to complete

  • poll_interval (int) – how often in seconds to check the query status

  • return_sql_result (bool) – if True, return the result of the SQL statement(s); if False (default), return the statement ID(s)

  • workgroup_name (str | None) – name of the Redshift Serverless workgroup. Mutually exclusive with cluster_identifier. Specify this parameter to query Redshift Serverless. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html

  • session_id (str | None) – the identifier of an existing session in which to run the query

  • session_keep_alive_seconds (int | None) – how many seconds to keep the session alive after the query finishes. A session can be kept alive for at most 24 hours

  • aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used; when Airflow runs in a distributed manner, that default boto3 configuration must then be maintained on each worker node.

  • region_name – The AWS region name. If not specified, the default boto3 behaviour is used.

  • verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html

  • botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
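To make the relationship between these arguments and the Redshift Data API easier to follow, here is a minimal sketch that assembles keyword arguments for boto3's `redshift-data` client. It assumes a single SQL string maps to `execute_statement` (`Sql`) and a list maps to `batch_execute_statement` (`Sqls`). The helper `build_request` is hypothetical and is not the provider's actual code; it only illustrates the parameter names and the mutual exclusivity of `cluster_identifier` and `workgroup_name`.

```python
from __future__ import annotations

from typing import Any

# AWS documents a 24-hour maximum for keeping a session alive
MAX_KEEP_ALIVE_SECONDS = 24 * 60 * 60


def build_request(
    sql: str | list[str],
    database: str | None = None,
    cluster_identifier: str | None = None,
    workgroup_name: str | None = None,
    db_user: str | None = None,
    secret_arn: str | None = None,
    statement_name: str | None = None,
    with_event: bool = False,
    parameters: list[dict[str, str]] | None = None,
    session_id: str | None = None,
    session_keep_alive_seconds: int | None = None,
) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return a boto3 redshift-data method name and its kwargs."""
    if cluster_identifier and workgroup_name:
        raise ValueError("cluster_identifier and workgroup_name are mutually exclusive")
    if session_keep_alive_seconds is not None and session_keep_alive_seconds > MAX_KEEP_ALIVE_SECONDS:
        raise ValueError("a session can be kept alive for at most 24 hours")
    # BatchExecuteStatement takes no Parameters argument, so reject the combination
    if isinstance(sql, list) and parameters:
        raise ValueError("parameters apply to a single SQL statement")

    kwargs: dict[str, Any] = {
        "Database": database,
        "ClusterIdentifier": cluster_identifier,
        "WorkgroupName": workgroup_name,
        "DbUser": db_user,
        "SecretArn": secret_arn,
        "StatementName": statement_name,
        "WithEvent": with_event,
        "Parameters": parameters,
        "SessionId": session_id,
        "SessionKeepAliveSeconds": session_keep_alive_seconds,
    }
    # Drop unset (None) values so boto3 only receives the arguments that were supplied
    kwargs = {key: value for key, value in kwargs.items() if value is not None}

    # A single string maps to ExecuteStatement (Sql); a list maps to BatchExecuteStatement (Sqls)
    if isinstance(sql, list):
        return "batch_execute_statement", {**kwargs, "Sqls": sql}
    return "execute_statement", {**kwargs, "Sql": sql}


method, request = build_request("SELECT 1", database="dev", workgroup_name="my-workgroup")
print(method, request)
# execute_statement {'Database': 'dev', 'WorkgroupName': 'my-workgroup', 'WithEvent': False, 'Sql': 'SELECT 1'}
```

With a real client, the pair could be dispatched as `getattr(boto3.client("redshift-data"), method)(**request)`. In practice the provider's hook makes this call, so treat the sketch as a reading aid for the parameter names only.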

aws_hook_class
template_fields
template_ext = ('.sql',)
template_fields_renderers
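Because `template_ext` is `('.sql',)`, Airflow treats a templated value ending in `.sql`, assuming `sql` is among the `template_fields`, as the path of a template file whose rendered contents become the SQL, rather than as SQL text. The sketch below reproduces only that suffix check for a string or a list of strings. The helper name is hypothetical, and file lookup and Jinja rendering are left to Airflow.

```python
from __future__ import annotations

# Mirrors RedshiftDataOperator.template_ext
TEMPLATE_EXT = (".sql",)


def template_file_references(value: object, template_ext: tuple[str, ...] = TEMPLATE_EXT) -> list[str]:
    """Hypothetical helper: list the entries of a templated value that name template files."""
    # str.endswith accepts a tuple of suffixes
    if isinstance(value, str):
        return [value] if value.endswith(template_ext) else []
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str) and item.endswith(template_ext)]
    return []


print(template_file_references("queries/daily_load.sql"))  # ['queries/daily_load.sql']
print(template_file_references(["SELECT 1", "queries/cleanup.sql"]))  # ['queries/cleanup.sql']
```

Inline SQL such as `"SELECT 1"` is left as-is, which is why a statement and a file path can be mixed in the same list.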
execute(context)

Execute a statement against Amazon Redshift.
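When `wait_for_completion` is True and the operator is not running in deferrable mode, the statement is checked repeatedly until it reaches a terminal state, with `poll_interval` seconds between checks. A minimal sketch of such a loop follows, assuming a boto3-style client whose `describe_statement(Id=...)` response carries a `Status` key (`SUBMITTED`, `PICKED`, `STARTED`, `FINISHED`, `FAILED`, `ABORTED`). The function and the stub client are hypothetical, included only so the example runs without AWS.

```python
from __future__ import annotations

import time
from typing import Any, Callable

FAILED_STATES = {"FAILED", "ABORTED"}


def wait_for_statement(
    client: Any,
    statement_id: str,
    poll_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll describe_statement until the statement finishes; raise if it fails or is aborted."""
    while True:
        response = client.describe_statement(Id=statement_id)
        status = response["Status"]
        if status == "FINISHED":
            return response
        if status in FAILED_STATES:
            raise RuntimeError(f"statement {statement_id} ended as {status}: {response.get('Error', '')}")
        # SUBMITTED, PICKED or STARTED: still in progress, so wait and check again
        sleep(poll_interval)


class FakeRedshiftDataClient:
    """Hypothetical stub that replays a fixed sequence of statuses."""

    def __init__(self, statuses: list[str]) -> None:
        self._statuses = iter(statuses)
        self.calls = 0

    def describe_statement(self, *, Id: str) -> dict[str, Any]:
        self.calls += 1
        return {"Id": Id, "Status": next(self._statuses)}


client = FakeRedshiftDataClient(["SUBMITTED", "STARTED", "FINISHED"])
print(wait_for_statement(client, "stmt-1", poll_interval=0)["Status"])  # FINISHED
```

Injecting `sleep` keeps the loop testable; a deferrable operator would instead hand the waiting to a trigger rather than blocking a worker slot.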

execute_complete(context, event=None)
get_sql_results(statement_id, return_sql_result)

Retrieve either the result of the SQL query, or the statement ID(s).

Parameters
  • statement_id (str) – Statement ID of the running queries

  • return_sql_result (bool) – if True, return the query results; otherwise, return the statement ID(s)
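The "statement ID(s)" wording matters for batches: when several statements are submitted together, the Redshift Data API's `describe_statement` response lists them under `SubStatements`, each with its own `Id`. The sketch below, assuming a boto3-style client with `describe_statement` and `get_statement_result`, expands a batch into its sub-statement IDs and then returns either those IDs or one result per statement. The function and stub client are hypothetical.

```python
from __future__ import annotations

from typing import Any


def collect_results(client: Any, statement_id: str, return_sql_result: bool) -> list[Any]:
    """Return statement results or statement IDs, expanding batch sub-statements."""
    statement = client.describe_statement(Id=statement_id)
    sub_statements = statement.get("SubStatements") or []
    # A batch reports one sub-statement per SQL string; a single statement reports none
    statement_ids = [sub["Id"] for sub in sub_statements] if sub_statements else [statement_id]
    if return_sql_result:
        # Only the first page of each result is fetched here (NextToken is ignored)
        return [client.get_statement_result(Id=sid) for sid in statement_ids]
    return statement_ids


class FakeRedshiftDataClient:
    """Hypothetical stub with canned describe_statement responses."""

    def __init__(self, described: dict[str, dict[str, Any]]) -> None:
        self.described = described
        self.fetched: list[str] = []

    def describe_statement(self, *, Id: str) -> dict[str, Any]:
        return self.described[Id]

    def get_statement_result(self, *, Id: str) -> dict[str, Any]:
        self.fetched.append(Id)
        return {"Records": [], "TotalNumRows": 0}


fake = FakeRedshiftDataClient({"batch": {"Id": "batch", "SubStatements": [{"Id": "batch:1"}, {"Id": "batch:2"}]}})
print(collect_results(fake, "batch", return_sql_result=False))  # ['batch:1', 'batch:2']
```

Large result sets from `get_statement_result` are paginated through `NextToken`, so a complete implementation would keep requesting pages until no token is returned.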

on_kill()

Cancel the submitted Redshift query.
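Cancellation corresponds to the Redshift Data API's `cancel_statement(Id=...)` call, whose response carries a boolean `Status`. A minimal sketch, assuming the statement ID is remembered when the query is submitted so that a kill signal has something to cancel; the class, its methods, and the stub client are hypothetical.

```python
from __future__ import annotations

from typing import Any


class CancellableStatement:
    """Hypothetical wrapper that remembers the submitted statement so it can be cancelled."""

    def __init__(self, client: Any) -> None:
        self.client = client
        self.statement_id: str | None = None

    def submit(self, sql: str, **kwargs: Any) -> str:
        response = self.client.execute_statement(Sql=sql, **kwargs)
        self.statement_id = response["Id"]
        return self.statement_id

    def on_kill(self) -> bool:
        """Cancel the submitted statement; return False if nothing was submitted."""
        if self.statement_id is None:
            return False
        # CancelStatement reports whether the cancellation succeeded in its Status field
        return bool(self.client.cancel_statement(Id=self.statement_id)["Status"])


class FakeRedshiftDataClient:
    """Hypothetical stub standing in for boto3.client("redshift-data")."""

    def __init__(self) -> None:
        self.cancelled: list[str] = []

    def execute_statement(self, *, Sql: str, **kwargs: Any) -> dict[str, Any]:
        return {"Id": "stmt-1"}

    def cancel_statement(self, *, Id: str) -> dict[str, Any]:
        self.cancelled.append(Id)
        return {"Status": True}
```

The guard for an unset statement ID covers a kill signal that arrives before anything was submitted; a production version would likely also catch and log errors from the cancel call instead of letting them propagate.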
