airflow.providers.amazon.aws.transfers.sql_to_s3
Module Contents
Classes
FILE_FORMAT | Possible file formats.
SqlToS3Operator | Saves data from a specific SQL query into a file in S3.
Attributes
- class airflow.providers.amazon.aws.transfers.sql_to_s3.FILE_FORMAT
Bases: enum.Enum
Possible file formats.
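As a rough illustration of how an enum like this can back the file_format parameter, the sketch below maps the accepted strings onto enum members. The class name FileFormat, the member names, and the helper parse_file_format are assumptions made for this example; they are inferred from the accepted values 'csv', 'json' and 'parquet' and are not taken from the provider's source.

```python
import enum


class FileFormat(enum.Enum):
    """Hypothetical stand-in for FILE_FORMAT; member names are assumed."""

    CSV = enum.auto()
    JSON = enum.auto()
    PARQUET = enum.auto()


def parse_file_format(name: str) -> FileFormat:
    """Map a string such as 'csv' to an enum member, rejecting anything else."""
    try:
        return FileFormat[name.upper()]
    except KeyError:
        accepted = ", ".join(member.name.lower() for member in FileFormat)
        raise ValueError(
            f"Unsupported file format {name!r}; expected one of: {accepted}"
        ) from None


if __name__ == "__main__":
    print(parse_file_format("parquet"))
```

Validating the string once, up front, means a misspelled format fails when the value is parsed rather than later, after the query has already run.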
- class airflow.providers.amazon.aws.transfers.sql_to_s3.SqlToS3Operator(*, query, s3_bucket, s3_key, sql_conn_id, sql_hook_params=None, parameters=None, replace=False, aws_conn_id='aws_default', verify=None, file_format='csv', max_rows_per_file=0, pd_kwargs=None, groupby_kwargs=None, **kwargs)
Bases: airflow.models.BaseOperator
Saves data from a specific SQL query into a file in S3.
See also
For more information on how to use this operator, take a look at the guide: MySQL to Amazon S3 transfer operator
- Parameters
query (str) – the SQL query to be executed. To execute a query stored in a file, provide the file's absolute path, ending with the .sql extension. (templated)
s3_bucket (str) – bucket where the data will be stored. (templated)
s3_key (str) – desired key for the file. It includes the name of the file. (templated)
replace (bool) – whether to replace the file in S3 if it already exists
sql_conn_id (str) – reference to a specific database.
sql_hook_params (dict | None) – Extra config params to be passed to the underlying hook. Should match the desired hook constructor params.
parameters (None | Mapping[str, Any] | list | tuple) – (optional) the parameters to render the SQL query with.
aws_conn_id (str | None) – reference to a specific S3 connection
verify (bool | str | None) – whether to verify SSL certificates for the S3 connection. By default SSL certificates are verified. You can provide the following values:
False
: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
path/to/cert/bundle.pem
: the filename of the CA cert bundle to use. Specify this if you want to use a different CA cert bundle than the one used by botocore.
file_format (typing_extensions.Literal[csv, json, parquet]) – the destination file format; only the strings ‘csv’, ‘json’ or ‘parquet’ are accepted.
max_rows_per_file (int) – (optional) maximum number of rows per destination file. If the source data has more rows than this limit, it is split across multiple files. Ignored if the groupby_kwargs argument is specified.
pd_kwargs (dict | None) – arguments to pass to DataFrame .to_parquet(), .to_json() or .to_csv().
groupby_kwargs (dict | None) – arguments to pass to DataFrame groupby().
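The parameters above can be combined as in the following minimal sketch. Every value here (task id, connection id, bucket, key, query) is hypothetical, and the %(day)s placeholder style is an assumption that depends on the database driver behind sql_conn_id. The Airflow import is deferred into a function so that the argument dictionary can be inspected without Airflow installed.

```python
# Keyword arguments for SqlToS3Operator. All values are illustrative
# placeholders, not defaults or recommendations from the provider.
SQL_TO_S3_KWARGS = {
    "task_id": "orders_to_s3",  # passed through **kwargs to BaseOperator
    # Placeholder syntax varies by driver; %(name)s is assumed here.
    "query": "SELECT * FROM orders WHERE order_date = %(day)s",
    "parameters": {"day": "2024-01-01"},
    "sql_conn_id": "my_sql_conn",
    "s3_bucket": "my-example-bucket",
    "s3_key": "exports/orders.csv",
    "replace": True,  # overwrite the object if it already exists
    "file_format": "csv",
    "pd_kwargs": {"index": False},  # forwarded to DataFrame.to_csv()
}


def make_task():
    """Build the operator; requires the Amazon provider package at call time."""
    from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

    return SqlToS3Operator(**SQL_TO_S3_KWARGS)
```

In a real DAG file, make_task() would typically be called inside a DAG context so the task is attached to that DAG.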