SQL to Amazon S3

Use SqlToS3Operator to copy data from a SQL database to an Amazon Simple Storage Service (S3) file. SqlToS3Operator is compatible with any SQL connection as long as the corresponding SQL hook has a method that converts the query result to a pandas DataFrame (e.g. MySQL, Hive, ...).
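
The contract this relies on is small: the hook behind the connection must be able to return the query result as a pandas DataFrame. A minimal sketch of that contract, assuming a MySQL connection registered as mysql_default and a table named sample_table (both hypothetical names used only for illustration):

from airflow.providers.mysql.hooks.mysql import MySqlHook

# Any DB-API hook that exposes get_pandas_df() satisfies the operator's
# requirement; MySqlHook is just one example.
hook = MySqlHook(mysql_conn_id="mysql_default")
df = hook.get_pandas_df("SELECT color, quantity FROM sample_table")
print(df.head())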

Prerequisite Tasks

To use these operators, you must do a few things:

- Create the necessary resources using the AWS Console or the AWS CLI.
- Install the API libraries via pip:

  pip install 'apache-airflow[amazon]'

- Set up an Amazon Web Services connection.

Operators

MySQL to Amazon S3 transfer operator

This example sends the response of a MySQL query to an Amazon S3 file.

For more information about this operator, see: SqlToS3Operator

Example usage:

tests/system/amazon/aws/example_sql_to_s3.py

from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,  # any SQL connection whose hook returns a pandas DataFrame
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,  # overwrite the S3 object if it already exists
)
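
The snippet references names defined elsewhere in the system test. A sketch of plausible stand-ins (all values here are hypothetical, not taken from the test):

# Hypothetical stand-ins for the names used in the snippet above; in the
# system test they are created by its setup tasks.
conn_id_name = "mysql_default"  # any SQL connection with a DB-API hook
SQL_QUERY = "SELECT color, quantity FROM sample_table"
bucket_name = "my-example-bucket"  # an existing S3 bucket
key = "files/sample_data.csv"  # destination object key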

Grouping

We can group the data in the table by passing the groupby_kwargs param. This param accepts a dict that is passed to pandas groupby() as keyword arguments; each resulting group is then uploaded to its own S3 object.

Example usage:

tests/system/amazon/aws/example_sql_to_s3.py

sql_to_s3_task_with_groupby = SqlToS3Operator(
    task_id="sql_to_s3_with_groupby_task",
    sql_conn_id=conn_id_name,
    query=SQL_QUERY,
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    groupby_kwargs={"by": "color"},  # forwarded to pandas DataFrame.groupby()
)
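
Under the hood the dict is splatted into pandas DataFrame.groupby(). A rough sketch of the grouping semantics (illustrative only, not the provider's exact code):

import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "quantity": [1, 2, 3]})
groupby_kwargs = {"by": "color"}

# Splatting the dict into groupby() mirrors how groupby_kwargs is applied;
# each group would then be serialized and written under its own S3 key.
for group_label, group_df in df.groupby(**groupby_kwargs):
    print(group_label, len(group_df))  # prints: blue 1, then red 2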
