Transfers data from AWS Redshift into an S3 bucket.

Module Contents



Executes an UNLOAD command to S3 as a CSV with headers.

class (*, s3_bucket: str, s3_key: str, schema: Optional[str] = None, table: Optional[str] = None, select_query: Optional[str] = None, redshift_conn_id: str = 'redshift_default', aws_conn_id: str = 'aws_default', verify: Optional[Union[bool, str]] = None, unload_options: Optional[List] = None, autocommit: bool = False, include_header: bool = False, parameters: Optional[Union[Mapping, Iterable]] = None, table_as_file_name: bool = True, **kwargs)

Bases: airflow.models.BaseOperator

Executes an UNLOAD command to S3 as a CSV with headers.

  • s3_bucket (str) -- reference to a specific S3 bucket

  • s3_key (str) -- reference to a specific S3 key. If table_as_file_name is set to False, this param must include the desired file name

  • schema (str) -- reference to a specific schema in the Redshift database. Applicable when the table param is provided.

  • table (str) -- reference to a specific table in the Redshift database. Used when the select_query param is not provided.

  • select_query (str) -- custom select query to fetch data from the Redshift database

  • redshift_conn_id (str) -- reference to a specific Redshift database

  • aws_conn_id (str) -- reference to a specific S3 connection. If the AWS connection contains 'aws_iam_role' in extras, the operator will use AWS STS credentials with a token.

  • verify (bool or str) --

    Whether or not to verify SSL certificates for S3 connection. By default SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: a filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • unload_options (list) -- reference to a list of UNLOAD options

  • autocommit (bool) -- If set to True, the UNLOAD statement is committed automatically. Otherwise, it is committed right before the Redshift connection is closed.

  • include_header (bool) -- If set to True, the S3 file contains the header columns.

  • parameters (dict or iterable) -- (optional) the parameters to render the SQL query with.

  • table_as_file_name (bool) -- If set to True, the S3 file will be named after the table. Applicable when the table param is provided.
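The way table, select_query, table_as_file_name, and include_header interact can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's source: the helper name resolve_unload_inputs is hypothetical, and both the "<s3_key>/<table>_" key layout and the appended HEADER option are assumptions about how the rules above might be applied.

```python
from typing import List, Optional, Tuple


def resolve_unload_inputs(
    s3_key: str,
    schema: Optional[str] = None,
    table: Optional[str] = None,
    select_query: Optional[str] = None,
    unload_options: Optional[List[str]] = None,
    include_header: bool = False,
    table_as_file_name: bool = True,
) -> Tuple[str, str, List[str]]:
    """Return (select_query, s3_key, unload_options) after applying the parameter rules.

    Hypothetical helper for illustration only.
    """
    options = list(unload_options or [])

    # A custom query is used as given; otherwise fall back to the whole table.
    if select_query is None:
        if not (schema and table):
            raise ValueError("provide select_query, or both schema and table")
        select_query = f"SELECT * FROM {schema}.{table}"

    # Assumed key layout: name the output after the table when requested.
    if table and table_as_file_name:
        s3_key = f"{s3_key}/{table}_"

    # Assumed behavior: add HEADER unless the caller already listed it.
    if include_header and "HEADER" not in [o.strip().upper() for o in options]:
        options.append("HEADER")

    return select_query, s3_key, options
```

With table_as_file_name set to False, s3_key is passed through unchanged, which is why it must then carry the desired file name itself.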

template_fields: Sequence[str] = ['s3_bucket', 's3_key', 'schema', 'table', 'unload_options', 'select_query']
template_ext: Sequence[str] = ['.sql']
ui_color = #ededed
execute(self, context: airflow.utils.context.Context) → None

This is the main method to override when creating an operator. The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
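For this operator, execute presumably assembles an UNLOAD statement from the resolved parameters and runs it over the Redshift connection. The sketch below shows only the string assembly, as a rough illustration rather than the operator's actual code. The function name build_unload_statement, the credentials_block argument, and the example values are hypothetical, and the sketch does not escape quotes inside select_query.

```python
from typing import Iterable


def build_unload_statement(
    select_query: str,
    s3_bucket: str,
    s3_key: str,
    credentials_block: str,
    unload_options: Iterable[str] = (),
) -> str:
    """Render a Redshift UNLOAD statement as one string (hypothetical helper)."""
    lines = [
        # UNLOAD takes the query as a quoted string literal.
        f"UNLOAD ('{select_query}')",
        # Destination is an s3:// URI built from bucket and key.
        f"TO 's3://{s3_bucket}/{s3_key}'",
        # Credentials string, e.g. an IAM role or temporary access keys.
        f"CREDENTIALS '{credentials_block}'",
        # Each UNLOAD option goes on its own line.
        *unload_options,
    ]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(
        build_unload_statement(
            select_query="SELECT * FROM public.orders",
            s3_bucket="my-bucket",  # example value
            s3_key="exports/orders_",  # example value
            credentials_block="aws_iam_role=arn:aws:iam::123456789012:role/example",
            unload_options=["CSV", "HEADER"],
        )
    )
```

Keeping the options as a list until the final join makes it easy to append or deduplicate entries such as HEADER before the statement is rendered.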
