airflow.providers.amazon.aws.hooks.redshift_data

Module Contents

Classes

QueryExecutionOutput

Describes the output of a query execution.

RedshiftDataHook

Interact with Amazon Redshift Data API.

Attributes

FINISHED_STATE

FAILED_STATE

ABORTED_STATE

FAILURE_STATES

RUNNING_STATES

airflow.providers.amazon.aws.hooks.redshift_data.FINISHED_STATE = 'FINISHED'[source]
airflow.providers.amazon.aws.hooks.redshift_data.FAILED_STATE = 'FAILED'[source]
airflow.providers.amazon.aws.hooks.redshift_data.ABORTED_STATE = 'ABORTED'[source]
airflow.providers.amazon.aws.hooks.redshift_data.FAILURE_STATES[source]
airflow.providers.amazon.aws.hooks.redshift_data.RUNNING_STATES[source]
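
The listing above gives values for only three of the five constants. The sketch below shows one way such constants could be used to classify a statement status. The contents of FAILURE_STATES and RUNNING_STATES are assumptions based on the status names used by the Redshift Data API, not values taken from this page, and classify is a hypothetical helper.

```python
# Values for these three constants come from the attribute listing.
FINISHED_STATE = "FINISHED"
FAILED_STATE = "FAILED"
ABORTED_STATE = "ABORTED"

# ASSUMPTION: this page does not show the values of the two sets below.
# They are filled in here from the Redshift Data API status names.
FAILURE_STATES = {FAILED_STATE, ABORTED_STATE}
RUNNING_STATES = {"PICKED", "STARTED", "SUBMITTED"}


def classify(status: str) -> str:
    """Map a status string to 'finished', 'failed', 'running', or 'unknown'."""
    if status == FINISHED_STATE:
        return "finished"
    if status in FAILURE_STATES:
        return "failed"
    if status in RUNNING_STATES:
        return "running"
    return "unknown"
```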
class airflow.providers.amazon.aws.hooks.redshift_data.QueryExecutionOutput[source]

Describes the output of a query execution.

statement_id: str[source]
session_id: str | None[source]
exception airflow.providers.amazon.aws.hooks.redshift_data.RedshiftDataQueryFailedError[source]

Bases: ValueError

Raised when a Redshift Data query fails.

exception airflow.providers.amazon.aws.hooks.redshift_data.RedshiftDataQueryAbortedError[source]

Bases: ValueError

Raised when a Redshift Data query is aborted.

class airflow.providers.amazon.aws.hooks.redshift_data.RedshiftDataHook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook[mypy_boto3_redshift_data.RedshiftDataAPIServiceClient]

Interact with Amazon Redshift Data API.

Provides a thin wrapper around boto3.client("redshift-data").

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

execute_query(sql, database=None, cluster_identifier=None, db_user=None, parameters=None, secret_arn=None, statement_name=None, with_event=False, wait_for_completion=True, poll_interval=10, workgroup_name=None, session_id=None, session_keep_alive_seconds=None)[source]

Execute a statement against Amazon Redshift.

Parameters
  • sql (str | list[str]) – the SQL statement or list of SQL statements to run

  • database (str | None) – the name of the database

  • cluster_identifier (str | None) – unique identifier of a cluster

  • db_user (str | None) – the database username

  • parameters (Iterable | None) – the parameters for the SQL statement

  • secret_arn (str | None) – the name or ARN of the secret that enables db access

  • statement_name (str | None) – the name of the SQL statement

  • with_event (bool) – whether to send an event to EventBridge

  • wait_for_completion (bool) – whether to wait for a result

  • poll_interval (int) – how often in seconds to check the query status

  • workgroup_name (str | None) – name of the Redshift Serverless workgroup. Mutually exclusive with cluster_identifier. Specify this parameter to query Redshift Serverless. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html

  • session_id (str | None) – the session identifier of the query

  • session_keep_alive_seconds (int | None) – duration in seconds to keep the session alive after the query finishes. A session can be kept alive for at most 24 hours

Returns

The query execution output. Its statement_id field (str) is the UUID of the statement

Return type

QueryExecutionOutput
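
As an illustration of the parameters above, here is a minimal sketch of calling execute_query against a Redshift Serverless workgroup. The helper run_serverless_query is hypothetical, and the hook is passed in as an argument so the snippet does not need an Airflow installation. In a real task it would be an instance of RedshiftDataHook.

```python
from typing import Any


def run_serverless_query(hook: Any, sql: str, database: str, workgroup_name: str) -> str:
    """Run `sql` on a Redshift Serverless workgroup and return the statement ID.

    `hook` is expected to behave like RedshiftDataHook (hypothetical helper).
    """
    # workgroup_name is mutually exclusive with cluster_identifier,
    # so only workgroup_name is passed here.
    output = hook.execute_query(
        sql=sql,
        database=database,
        workgroup_name=workgroup_name,
        wait_for_completion=True,  # block until the statement finishes
        poll_interval=5,  # seconds between status checks
    )
    # execute_query returns a QueryExecutionOutput with statement_id and session_id.
    return output.statement_id
```

In a DAG the hook would typically be built first, for example with RedshiftDataHook(aws_conn_id="aws_default"), where the connection ID is a placeholder for your own.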

wait_for_results(statement_id, poll_interval)[source]
check_query_is_finished(statement_id)[source]

Check whether the query has finished; raise an exception if it failed.
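
The "finished or raise" behaviour can be sketched as a small function over a describe_statement-style response. This is not the hook's actual implementation: the exception classes are local stand-ins for RedshiftDataQueryFailedError and RedshiftDataQueryAbortedError, and is_finished is a hypothetical name. The response keys Id, Status, and Error follow the boto3 describe_statement response shape.

```python
class QueryFailedError(ValueError):
    """Stand-in for RedshiftDataQueryFailedError (assumption, defined locally)."""


class QueryAbortedError(ValueError):
    """Stand-in for RedshiftDataQueryAbortedError (assumption, defined locally)."""


def is_finished(resp: dict) -> bool:
    """Return True if the statement finished, False if it is still in progress.

    Raises a ValueError subclass when the statement failed or was aborted.
    """
    status = resp["Status"]
    statement_id = resp.get("Id")
    if status == "FINISHED":
        return True
    if status == "FAILED":
        raise QueryFailedError(f"Statement {statement_id} failed: {resp.get('Error')}")
    if status == "ABORTED":
        raise QueryAbortedError(f"Statement {statement_id} was aborted")
    # Any other status is treated as "not finished yet".
    return False
```

Because both documented exceptions derive from ValueError, a caller can catch either one specifically or both with a single except ValueError clause.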

parse_statement_response(resp)[source]

Parse the response of describe_statement.

get_table_primary_key(table, database, schema='public', cluster_identifier=None, workgroup_name=None, db_user=None, secret_arn=None, statement_name=None, with_event=False, wait_for_completion=True, poll_interval=10)[source]

Return the table primary key.

Copied from RedshiftSQLHook.get_table_primary_key()

Parameters
  • table (str) – Name of the target table

  • database (str) – the name of the database

  • schema (str | None) – Name of the target schema, public by default

  • cluster_identifier (str | None) – unique identifier of a cluster

  • workgroup_name (str | None) – name of the Redshift Serverless workgroup. Mutually exclusive with cluster_identifier. Specify this parameter to query Redshift Serverless. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html

  • db_user (str | None) – the database username

  • secret_arn (str | None) – the name or ARN of the secret that enables db access

  • statement_name (str | None) – the name of the SQL statement

  • with_event (bool) – indicates whether to send an event to EventBridge

  • wait_for_completion (bool) – whether to wait for a result (True waits, False returns immediately)

  • poll_interval (int) – how often in seconds to check the query status

Returns

Primary key columns list

Return type

list[str] | None
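
Since the return type is list[str] | None, callers need to handle a table without a primary key. A minimal sketch follows, assuming a provisioned cluster. The helper primary_key_or_fail is hypothetical, and the hook argument stands in for a RedshiftDataHook instance.

```python
from __future__ import annotations

from typing import Any


def primary_key_or_fail(
    hook: Any,
    table: str,
    database: str,
    cluster_identifier: str,
    schema: str = "public",
) -> list[str]:
    """Return the primary key columns of `table`, or raise if there are none."""
    keys = hook.get_table_primary_key(
        table=table,
        database=database,
        schema=schema,
        cluster_identifier=cluster_identifier,
    )
    # None (or an empty list) is treated as "no primary key defined".
    if not keys:
        raise LookupError(f"No primary key found for {schema}.{table}")
    return keys
```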

async is_still_running(statement_id)[source]

Async function to check whether the query is still running.

Parameters

statement_id (str) – the UUID of the statement

async check_query_is_finished_async(statement_id)[source]

Async function to check whether the statement has finished.

Takes a statement_id, makes an async connection to the Redshift Data API to look up the query status for that statement_id, and returns the query status.

Parameters

statement_id (str) – the UUID of the statement
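
The async methods are suited to polling from an event loop. Below is a minimal sketch of such a loop built on is_still_running. It assumes that method returns a truthy value while the statement is in progress, which this page does not state explicitly. The names wait_until_done and max_polls are hypothetical.

```python
import asyncio
from typing import Any


async def wait_until_done(
    hook: Any,
    statement_id: str,
    poll_interval: float = 10.0,
    max_polls: int = 60,
) -> bool:
    """Poll until the statement stops running.

    Returns True once the statement is no longer running, or False if it is
    still running after `max_polls` checks.
    """
    for _ in range(max_polls):
        # ASSUMPTION: is_still_running returns a truthy value while in progress.
        if not await hook.is_still_running(statement_id):
            return True
        # Yield to the event loop between checks instead of blocking.
        await asyncio.sleep(poll_interval)
    return False
```

A True result only means the statement left the running states. A follow-up call to check_query_is_finished_async would still be needed to learn whether it succeeded.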
