SQL to Amazon S3

Use SqlToS3Operator to copy data from a SQL database to an Amazon Simple Storage Service (S3) file. SqlToS3Operator is compatible with any SQL connection as long as the corresponding SQL hook has a method that converts the query result to a pandas DataFrame (e.g. MySQL, Hive, ...).
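
The contract this relies on is small: the hook behind the connection must be able to return the query result as a pandas DataFrame. A minimal sketch of that contract, assuming a MySQL connection registered as mysql_default and a table named sample_table (both hypothetical names used only for illustration):

from airflow.providers.mysql.hooks.mysql import MySqlHook

# Any DB-API hook that exposes get_pandas_df() satisfies the operator's
# requirement; MySqlHook is just one example.
hook = MySqlHook(mysql_conn_id="mysql_default")
df = hook.get_pandas_df("SELECT color, quantity FROM sample_table")
print(df.head())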

Prerequisite Tasks

To use these operators, you must do a few things:

- Create the necessary resources using the AWS Console or the AWS CLI.
- Install the API libraries via pip:

  pip install 'apache-airflow[amazon]'

- Set up an Amazon Web Services connection.

Operators

MySQL to Amazon S3 transfer operator

This example sends the response of a MySQL query to an Amazon S3 file.

For more information about this operator, see: SqlToS3Operator

Example usage:

tests/system/amazon/aws/example_sql_to_s3.py

from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,  # any SQL connection whose hook returns a pandas DataFrame
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,  # overwrite the S3 object if it already exists
)
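
The snippet references names defined elsewhere in the system test. A sketch of plausible stand-ins (all values here are hypothetical, not taken from the test):

# Hypothetical stand-ins for the names used in the snippet above; in the
# system test they are created by its setup tasks.
conn_id_name = "mysql_default"  # any SQL connection with a DB-API hook
SQL_QUERY = "SELECT color, quantity FROM sample_table"
bucket_name = "my-example-bucket"  # an existing S3 bucket
key = "files/sample_data.csv"  # destination object key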

Grouping

We can group the data in the table by passing the groupby_kwargs param. This param accepts a dict that is passed to pandas groupby() as keyword arguments; each resulting group is then uploaded to its own S3 object.

Example usage:

tests/system/amazon/aws/example_sql_to_s3.py

sql_to_s3_task_with_groupby = SqlToS3Operator(
    task_id="sql_to_s3_with_groupby_task",
    sql_conn_id=conn_id_name,
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    groupby_kwargs={"by": "color"},  # forwarded to pandas DataFrame.groupby()
)
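
Under the hood the dict is splatted into pandas DataFrame.groupby(). A rough sketch of the grouping semantics (illustrative only, not the provider's exact code):

import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "quantity": [1, 2, 3]})
groupby_kwargs = {"by": "color"}

# Splatting the dict into groupby() mirrors how groupby_kwargs is applied;
# each group would then be serialized and written under its own S3 key.
for group_label, group_df in df.groupby(**groupby_kwargs):
    print(group_label, len(group_df))  # prints: blue 1, then red 2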
