Amazon S3 Glacier to GCS
Amazon Glacier is a secure, durable, and extremely low-cost Amazon S3 cloud storage class for data archiving and long-term backup.
Prerequisite Tasks
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Apache Airflow®.
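As a quick sanity check after the install step, the sketch below reports which expected modules cannot be located in the current environment. The helper name `missing_modules` is hypothetical, and the provider module paths in the usage section are assumptions about where the provider packages are importable from. Adjust them to your setup.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module paths for the Amazon and Google provider packages.
    print(missing_modules(["airflow.providers.amazon", "airflow.providers.google"]))
```

An empty list means every listed module was found. Any names that are printed point at packages that still need to be installed.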
Operators
Amazon S3 Glacier To GCS transfer operator
To transfer data from an Amazon Glacier vault to Google Cloud Storage, you can use GlacierToGCSOperator:
from airflow.providers.google.cloud.transfers.glacier_to_gcs import GlacierToGCSOperator

transfer_archive_to_gcs = GlacierToGCSOperator(
    task_id="transfer_archive_to_gcs",
    vault_name=vault_name,
    bucket_name=gcs_bucket_name,
    object_name=gcs_object_name,
    gzip=False,
    # Override to match your needs.
    # If the chunk size is bigger than the actual file size,
    # the whole file is downloaded at once.
    chunk_size=1024,
)
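To illustrate what the `chunk_size` comment describes, here is a minimal standalone sketch of copying a byte stream in fixed-size chunks. It is not the operator's actual implementation. `copy_in_chunks` and the in-memory streams are hypothetical stand-ins for the Glacier download and the GCS upload.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy `source` to `destination`, reading at most `chunk_size` bytes per read.

    Returns the number of reads that produced data.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    reads = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        destination.write(chunk)
        reads += 1
    return reads


if __name__ == "__main__":
    payload = b"x" * 2500
    # 2500 bytes with 1024-byte chunks: reads of 1024, 1024, and 452 bytes.
    print(copy_in_chunks(io.BytesIO(payload), io.BytesIO(), chunk_size=1024))
    # A chunk size larger than the payload: everything arrives in one read.
    print(copy_in_chunks(io.BytesIO(payload), io.BytesIO(), chunk_size=4096))
```

In this sketch a smaller chunk size means more reads but less data held per read, while a chunk size larger than the payload pulls the whole payload in a single read.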
Note
Please be aware that GlacierToGCSOperator depends on available memory. Transferring large files may exhaust memory on the worker host.