SQL to Amazon S3¶
Use SqlToS3Operator
to copy data from a SQL server to an Amazon Simple Storage Service (S3) file.
SqlToS3Operator
is compatible with any SQL connection as long as the SQL hook has a function that
converts the SQL result to a pandas DataFrame
(e.g. MySQL, Hive, …).
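For instance, the same operator can point at a Postgres connection without any code changes. The sketch below is illustrative only; the connection id, bucket, key and query are hypothetical placeholders, not values from this guide:

from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

# Hypothetical values for illustration: any SQL connection whose hook can
# return a pandas DataFrame will work here.
export_pets = SqlToS3Operator(
    task_id="export_pets",
    sql_conn_id="postgres_default",
    query="SELECT * FROM pets",
    s3_bucket="my-example-bucket",
    s3_key="exports/pets.csv",
    replace=True,
)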
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in the Installation documentation.
Operators¶
MySQL to Amazon S3 transfer operator¶
This example sends the response of a MySQL query to an Amazon S3 file.
For more information about this operator, visit:
SqlToS3Operator
Example usage:
from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

# conn_id_name, SQL_QUERY, bucket_name and key are defined elsewhere in the example DAG.
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
)
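Beyond the defaults shown above, the operator also accepts a file_format argument (csv by default) and a pd_kwargs dict that is forwarded to the corresponding pandas writer; availability of these options depends on your provider version. A minimal sketch writing Parquet instead of CSV, with a hypothetical object key:

sql_to_s3_parquet_task = SqlToS3Operator(
    task_id="sql_to_s3_parquet_task",
    sql_conn_id=conn_id_name,
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key="exports/result.parquet",  # hypothetical key for the Parquet object
    file_format="parquet",            # instead of the default "csv"
    pd_kwargs={"index": False},       # forwarded to the pandas writer
    replace=True,
)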
Grouping¶
We can group the data in the table by passing the groupby_kwargs
param. This param accepts a dict
which will be passed to pandas groupby() as kwargs.
Example usage:
sql_to_s3_task_with_groupby = SqlToS3Operator(
    task_id="sql_to_s3_with_groupby_task",
    sql_conn_id=conn_id_name,
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    groupby_kwargs={"by": "color"},
)
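To make the mapping concrete, the dict above corresponds to a plain pandas groupby call on the query result. The standalone sketch below uses made-up sample data purely for illustration:

import pandas as pd

# Made-up sample data; only illustrates what groupby_kwargs={"by": "color"} means.
df = pd.DataFrame({"color": ["red", "blue", "red"], "value": [1, 2, 3]})

# The operator effectively calls df.groupby(**groupby_kwargs) on the query result.
for color, group in df.groupby(by="color"):
    print(color, len(group))  # "blue" -> 1 row, "red" -> 2 rows

Each resulting group is then written out separately; see the SqlToS3Operator reference for details on how the object keys are derived.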