Amazon S3 to Amazon Redshift¶
Use the S3ToRedshiftOperator
transfer to copy data from an Amazon Simple Storage Service (S3) file into an
Amazon Redshift table.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Airflow®
Operators¶
Amazon S3 To Amazon Redshift transfer operator¶
This operator loads data from Amazon S3 to an existing Amazon Redshift table.
To get more information about this operator visit:
S3ToRedshiftOperator
Example usage:
tests/system/amazon/aws/example_redshift_s3_transfers.py
transfer_s3_to_redshift = S3ToRedshiftOperator(
    task_id="transfer_s3_to_redshift",
    redshift_data_api_kwargs={
        "database": DB_NAME,
        "cluster_identifier": redshift_cluster_identifier,
        "db_user": DB_LOGIN,
        "wait_for_completion": True,
    },
    s3_bucket=bucket_name,
    s3_key=S3_KEY_2,
    schema="PUBLIC",
    table=REDSHIFT_TABLE,
    copy_options=["csv"],
)
Example of ingesting multiple keys:
tests/system/amazon/aws/example_redshift_s3_transfers.py
transfer_s3_to_redshift_multiple = S3ToRedshiftOperator(
    task_id="transfer_s3_to_redshift_multiple",
    redshift_data_api_kwargs={
        "database": DB_NAME,
        "cluster_identifier": redshift_cluster_identifier,
        "db_user": DB_LOGIN,
        "wait_for_completion": True,
    },
    s3_bucket=bucket_name,
    s3_key=S3_KEY_PREFIX,
    schema="PUBLIC",
    table=REDSHIFT_TABLE,
    copy_options=["csv"],
)
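The examples above authenticate through the Redshift Data API via redshift_data_api_kwargs. If you instead connect through Airflow connections, the operator can be configured without those kwargs. The following is a minimal sketch, not part of the example DAG above; the connection IDs shown are the operator's defaults and are assumed to point at your cluster and AWS credentials:
transfer_s3_to_redshift_conn = S3ToRedshiftOperator(
    task_id="transfer_s3_to_redshift_conn",
    redshift_conn_id="redshift_default",  # assumed Airflow connection to the Redshift cluster
    aws_conn_id="aws_default",  # assumed Airflow connection holding AWS credentials for the COPY
    s3_bucket=bucket_name,
    s3_key=S3_KEY_2,
    schema="PUBLIC",
    table=REDSHIFT_TABLE,
    copy_options=["csv"],
)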
You can find more information about the COPY
command used
here.
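Entries in copy_options are appended to the generated COPY statement, so any option Redshift accepts can be passed through the list. As an illustrative sketch (assuming the source files are CSVs with a single header row, which is not part of the examples above), the header could be skipped like this:
transfer_s3_to_redshift_with_header = S3ToRedshiftOperator(
    task_id="transfer_s3_to_redshift_with_header",
    s3_bucket=bucket_name,
    s3_key=S3_KEY_2,
    schema="PUBLIC",
    table=REDSHIFT_TABLE,
    copy_options=["csv", "IGNOREHEADER 1"],  # CSV format; skip the first (header) row
)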