airflow.providers.amazon.aws.transfers.redshift_to_s3

Transfers data from AWS Redshift into an S3 bucket.

Module Contents

Classes

RedshiftToS3Operator

Execute an UNLOAD command to S3 as a CSV with headers.

class airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator(*, s3_bucket, s3_key, schema=None, table=None, select_query=None, redshift_conn_id='redshift_default', aws_conn_id='aws_default', verify=None, unload_options=None, autocommit=False, include_header=False, parameters=None, table_as_file_name=True, redshift_data_api_kwargs=None, **kwargs)

Bases: airflow.models.BaseOperator

Execute an UNLOAD command to S3 as a CSV with headers.

See also

For more information on how to use this operator, take a look at the guide: Amazon Redshift To Amazon S3 transfer operator
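
For orientation, the sketch below assembles the text of a Redshift UNLOAD statement in plain Python. It is not the provider's implementation: the helper name build_unload_statement, the bucket, the table, and the IAM role ARN are all hypothetical, and the only escaping shown is doubling single quotes inside the quoted query.

```python
def build_unload_statement(select_query, s3_bucket, s3_key, iam_role, options=()):
    """Return UNLOAD statement text (illustrative only, no validation)."""
    # The query is passed to UNLOAD as a single-quoted string, so any single
    # quotes inside it are doubled.
    escaped_query = select_query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped_query}')",
        f"TO 's3://{s3_bucket}/{s3_key}'",
        f"IAM_ROLE '{iam_role}'",
        *options,  # e.g. CSV, HEADER
    ]
    return "\n".join(parts) + ";"


# Hypothetical values, used only to show the shape of the statement.
statement = build_unload_statement(
    select_query="SELECT * FROM public.orders WHERE status = 'shipped'",
    s3_bucket="example-bucket",
    s3_key="exports/orders_",
    iam_role="arn:aws:iam::123456789012:role/ExampleUnloadRole",
    options=("CSV", "HEADER"),
)
print(statement)
```

The operator builds and runs a statement of this general kind on your behalf; the parameters below control the query, the target key, the credentials, and the trailing options.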

Parameters
  • s3_bucket (str) – reference to a specific S3 bucket

  • s3_key (str) – reference to a specific S3 key. If table_as_file_name is set to False, this param must include the desired file name

  • schema (str | None) – reference to a specific schema in redshift database, used when table param provided and select_query param not provided. Do not provide when unloading a temporary table

  • table (str | None) – reference to a specific table in redshift database, used when schema param provided and select_query param not provided

  • select_query (str | None) – custom select query to fetch data from redshift database, has precedence over default query SELECT * FROM schema.table

  • redshift_conn_id (str) – reference to a specific redshift database

  • aws_conn_id (str | None) – reference to a specific S3 connection. If the AWS connection contains ‘aws_iam_role’ in extras, the operator will use AWS STS credentials with a token. See https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-authorization.html#copy-credentials

  • verify (bool | str | None) –

    Whether to verify SSL certificates for S3 connection. By default, SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • unload_options (list | None) – reference to a list of UNLOAD options

  • autocommit (bool) – If set to True it will automatically commit the UNLOAD statement. Otherwise, it will be committed right before the redshift connection gets closed.

  • include_header (bool) – If set to True the s3 file contains the header columns.

  • parameters (Iterable | Mapping | None) – (optional) the parameters to render the SQL query with.

  • table_as_file_name (bool) – If set to True, the s3 file will be named as the table. Applicable when table param provided.

  • redshift_data_api_kwargs (dict | None) – If using the Redshift Data API instead of the SQL-based connection, dict of arguments for the hook’s execute_query method. Cannot include any of these kwargs: {'sql', 'parameters'}
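
How several of these parameters interact can be sketched in plain Python. This is a reading of the parameter descriptions above, not the provider's code: the helper name resolve_unload_inputs is hypothetical, and the exact "<s3_key>/<table>_" prefix format and the automatic HEADER option are assumptions made for illustration.

```python
def resolve_unload_inputs(
    s3_key,
    schema=None,
    table=None,
    select_query=None,
    unload_options=None,
    include_header=False,
    table_as_file_name=True,
):
    """Apply the documented parameter rules (illustrative, not the provider's code)."""
    # A custom select_query takes precedence over the default query.
    if select_query is None:
        if not table:
            raise ValueError("Provide select_query, or table (optionally with schema).")
        # schema is left out when unloading a temporary table.
        select_query = (
            f"SELECT * FROM {schema}.{table}" if schema else f"SELECT * FROM {table}"
        )

    # Assumption: the table name becomes a "<table>_" prefix under s3_key.
    if table and table_as_file_name:
        s3_key = f"{s3_key}/{table}_"

    # Assumption: include_header adds HEADER unless the caller already listed it.
    options = list(unload_options or [])
    if include_header and not any(opt.strip().upper() == "HEADER" for opt in options):
        options.append("HEADER")

    return {"select_query": select_query, "s3_key": s3_key, "unload_options": options}


# Table-based unload: default query, table-named file prefix, header option.
from_table = resolve_unload_inputs(
    "exports", schema="public", table="orders", include_header=True
)

# Custom query: s3_key must carry the file name because table_as_file_name is False.
custom = resolve_unload_inputs(
    "exports/report.csv",
    select_query="SELECT 1",
    unload_options=["CSV"],
    table_as_file_name=False,
)
```

In the second call the key is used as given, which is why s3_key has to include the desired file name when table_as_file_name is False.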

property default_select_query: str | None
template_fields: Sequence[str] = ('s3_bucket', 's3_key', 'schema', 'table', 'unload_options', 'select_query', 'redshift_conn_id',...
template_ext: Sequence[str] = ('.sql',)
template_fields_renderers
ui_color = '#ededed'
execute(context)

Derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
