airflow.providers.amazon.aws.transfers.sql_to_s3

Module Contents

Classes

FILE_FORMAT

Possible file formats.

SqlToS3Operator

Saves data from a specific SQL query into a file in S3.

Attributes

FileOptions

FILE_OPTIONS_MAP

class airflow.providers.amazon.aws.transfers.sql_to_s3.FILE_FORMAT[source]

Bases: enum.Enum

Possible file formats.

CSV[source]
JSON[source]
PARQUET[source]
airflow.providers.amazon.aws.transfers.sql_to_s3.FileOptions[source]
airflow.providers.amazon.aws.transfers.sql_to_s3.FILE_OPTIONS_MAP[source]
class airflow.providers.amazon.aws.transfers.sql_to_s3.SqlToS3Operator(*, query, s3_bucket, s3_key, sql_conn_id, sql_hook_params=None, parameters=None, replace=False, aws_conn_id='aws_default', verify=None, file_format='csv', max_rows_per_file=0, pd_kwargs=None, groupby_kwargs=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Saves data from a specific SQL query into a file in S3.

See also

For more information on how to use this operator, take a look at the guide: MySQL to Amazon S3 transfer operator
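A minimal usage sketch follows, assuming the Amazon provider package is installed. The task id, connection ids, bucket, key, and query are hypothetical placeholders; only the keyword names are taken from the signature above. The helper build_export_task is also hypothetical, and it imports the operator lazily so the snippet can be loaded where Airflow is not available.

```python
# Keyword arguments follow the documented signature; every value is a placeholder.
export_kwargs = {
    "task_id": "export_orders_to_s3",   # hypothetical task id
    "query": "SELECT * FROM orders",    # hypothetical query
    "s3_bucket": "my-example-bucket",   # hypothetical bucket
    "s3_key": "exports/orders.csv",     # key includes the file name
    "sql_conn_id": "my_sql_conn",       # hypothetical SQL connection id
    "aws_conn_id": "aws_default",
    "file_format": "csv",
    "replace": True,
    "pd_kwargs": {"index": False},      # passed on to DataFrame.to_csv()
}


def build_export_task():
    """Create the operator; hypothetical helper, call it inside a DAG definition."""
    # Imported here so that defining this function does not require Airflow.
    from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

    return SqlToS3Operator(**export_kwargs)
```

Calling build_export_task() inside a `with DAG(...)` block would attach the task to that DAG.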

Parameters
  • query (str) – the SQL query to execute. To execute a file, provide its absolute path, ending with the .sql extension. (templated)

  • s3_bucket (str) – bucket where the data will be stored. (templated)

  • s3_key (str) – desired key for the file. It includes the name of the file. (templated)

  • replace (bool) – whether to replace the file in S3 if it already exists.

  • sql_conn_id (str) – reference to a specific database.

  • sql_hook_params (dict | None) – Extra config params to be passed to the underlying hook. Should match the desired hook constructor params.

  • parameters (None | Mapping[str, Any] | list | tuple) – (optional) the parameters to render the SQL query with.

  • aws_conn_id (str | None) – reference to a specific S3 connection

  • verify (bool | str | None) –

    Whether or not to verify SSL certificates for the S3 connection. By default, SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: the filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • file_format (typing_extensions.Literal[csv, json, parquet]) – the destination file format. Only the strings ‘csv’, ‘json’, or ‘parquet’ are accepted.

  • max_rows_per_file (int) – (optional) maximum number of rows per destination file. If the source data has more rows than this limit, it is split across multiple files. Ignored if the groupby_kwargs argument is specified.

  • pd_kwargs (dict | None) – arguments to include in DataFrame .to_parquet(), .to_json() or .to_csv().

  • groupby_kwargs (dict | None) – argument to include in DataFrame groupby().
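The interaction between max_rows_per_file and groupby_kwargs described above can be sketched in plain Python. This is an illustration only, not the operator's actual code. The helper partition_rows is hypothetical, rows are plain dicts rather than a DataFrame, grouping uses a single key, and treating a limit of 0 (the default in the signature) as "no splitting" is an assumption.

```python
from collections import defaultdict


def partition_rows(rows, max_rows_per_file=0, group_key=None):
    """Yield (label, chunk) pairs; hypothetical helper mirroring the documented rules."""
    if group_key is not None:
        # Grouping takes precedence: the row limit is ignored.
        groups = defaultdict(list)
        for row in rows:
            groups[row[group_key]].append(row)
        for label in sorted(groups):
            yield str(label), groups[label]
    elif max_rows_per_file > 0:
        # Split the rows into consecutive chunks of at most max_rows_per_file.
        for start in range(0, len(rows), max_rows_per_file):
            yield str(start // max_rows_per_file), rows[start:start + max_rows_per_file]
    else:
        # Assumed behavior for the default of 0: everything goes into one file.
        yield "", rows


rows = [{"id": i, "region": "eu" if i % 2 else "us"} for i in range(5)]
by_size = list(partition_rows(rows, max_rows_per_file=2))
by_group = list(partition_rows(rows, max_rows_per_file=2, group_key="region"))
```

Here by_size holds three chunks of 2, 2, and 1 rows, while by_group holds one chunk per region regardless of the row limit.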

template_fields: Sequence[str] = ('s3_bucket', 's3_key', 'query', 'sql_conn_id')[source]
template_ext: Sequence[str] = ('.sql',)[source]
template_fields_renderers[source]
execute(context)[source]

The main method to override when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
