Amazon S3 to SQL

Use the S3ToSqlOperator transfer to copy data from an Amazon Simple Storage Service (S3) file into an existing SQL table. By providing a parser function which is applied to the downloaded file, this operator can accept a variety of file formats.

Prerequisite Tasks

To use these operators, you must do a few things:

Operators

Amazon S3 To SQL Transfer Operator

To get more information about this operator visit: S3ToSqlOperator

Example usage with a parser for a csv file. This parser loads the file into memory and returns a list of rows:

tests/system/amazon/aws/example_s3_to_sql.py

#
# This operator requires a parser method. The Parser should take a filename as input
# and return an iterable of rows.
# This example parser uses the builtin csv library and returns a list of rows
#
def parse_csv_to_list(filepath):
    import csv

    with open(filepath, newline="") as file:
        return list(csv.reader(file))

transfer_s3_to_sql = S3ToSqlOperator(
    task_id="transfer_s3_to_sql",
    s3_bucket=s3_bucket_name,
    s3_key=s3_key,
    table=SQL_TABLE_NAME,
    column_list=SQL_COLUMN_LIST,
    parser=parse_csv_to_list,
    sql_conn_id=conn_id_name,
)

Example usage with a parser function that returns a generator.

tests/system/amazon/aws/example_s3_to_sql.py

#
# As the parser can return any kind of iterator, a generator is also allowed.
# This example parser returns a generator which prevents python from loading
# the whole file into memory.
#

def parse_csv_to_generator(filepath):
    import csv

    with open(filepath, newline="") as file:
        yield from csv.reader(file)

transfer_s3_to_sql_generator = S3ToSqlOperator(
    task_id="transfer_s3_to_sql_paser_to_generator",
    s3_bucket=s3_bucket_name,
    s3_key=s3_key,
    table=SQL_TABLE_NAME,
    column_list=SQL_COLUMN_LIST,
    parser=parse_csv_to_generator,
    sql_conn_id=conn_id_name,
)

Was this entry helpful?