Amazon S3 to SQL¶
Use the S3ToSqlOperator
transfer to copy data from an Amazon Simple Storage Service (S3)
file into an existing SQL table. By providing a parser function which is applied to the
downloaded file, this operator can accept a variety of file formats.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'Detailed information is available Installation of Apache Airflow®
Operators¶
Amazon S3 To SQL Transfer Operator¶
To get more information about this operator visit:
S3ToSqlOperator
Example usage with a parser for a csv file. This parser loads the file into memory and returns a list of rows:
#
# This operator requires a parser method. The Parser should take a filename as input
# and return an iterable of rows.
# This example parser uses the builtin csv library and returns a list of rows
#
def parse_csv_to_list(filepath):
import csv
with open(filepath, newline="") as file:
return [row for row in csv.reader(file)]
transfer_s3_to_sql = S3ToSqlOperator(
task_id="transfer_s3_to_sql",
s3_bucket=s3_bucket_name,
s3_key=s3_key,
table=SQL_TABLE_NAME,
column_list=SQL_COLUMN_LIST,
parser=parse_csv_to_list,
sql_conn_id=conn_id_name,
)
Example usage with a parser function that returns a generator.
#
# As the parser can return any kind of iterator, a generator is also allowed.
# This example parser returns a generator which prevents python from loading
# the whole file into memory.
#
def parse_csv_to_generator(filepath):
import csv
with open(filepath, newline="") as file:
yield from csv.reader(file)
transfer_s3_to_sql_generator = S3ToSqlOperator(
task_id="transfer_s3_to_sql_paser_to_generator",
s3_bucket=s3_bucket_name,
s3_key=s3_key,
table=SQL_TABLE_NAME,
column_list=SQL_COLUMN_LIST,
parser=parse_csv_to_generator,
sql_conn_id=conn_id_name,
)