Amazon S3 Glacier
Amazon Glacier is a secure, durable, and extremely low-cost Amazon S3 cloud storage class for data archiving and long-term backup.
Prerequisite Tasks
To use these operators, you must do a few things:
Create the necessary resources using the AWS Console or AWS CLI.
Install the API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Apache Airflow®.
Operators
Create an Amazon Glacier job
To initiate an Amazon Glacier inventory retrieval job, use GlacierCreateJobOperator.
This operator returns a dictionary of information about the initiated job, such as the jobId, which subsequent tasks require.
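from airflow.providers.amazon.aws.operators.glacier import GlacierCreateJobOperator
create_glacier_job = GlacierCreateJobOperator(task_id="create_glacier_job", vault_name=vault_name)
# Jinja template that pulls the job ID out of the returned dictionary at run time
JOB_ID = '{{ task_instance.xcom_pull("create_glacier_job")["jobId"] }}'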
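To make the data flow concrete, the following plain-Python sketch mimics the ["jobId"] lookup that the JOB_ID template performs. It assumes the returned dictionary has the shape of boto3's Glacier initiate_job response; SAMPLE_RESPONSE and extract_job_id are hypothetical names with made-up values, not part of the Airflow provider.

```python
from typing import Any, Mapping

# Hypothetical example of the dictionary GlacierCreateJobOperator returns.
# Keys are assumed to follow boto3's Glacier initiate_job response; values are made up.
SAMPLE_RESPONSE: dict[str, Any] = {
    "location": "/111122223333/vaults/example-vault/jobs/EXAMPLE-JOB-ID",
    "jobId": "EXAMPLE-JOB-ID",
}


def extract_job_id(response: Mapping[str, Any]) -> str:
    """Return the job ID, mirroring the ["jobId"] lookup in the JOB_ID template."""
    job_id = response.get("jobId")
    if not job_id:
        # Fail loudly instead of handing an empty ID to downstream tasks.
        raise KeyError("response has no 'jobId'; was the job initiated?")
    return str(job_id)


print(extract_job_id(SAMPLE_RESPONSE))  # EXAMPLE-JOB-ID
```

Raising early on a missing key surfaces a failed job initiation at the point where it happened, rather than as a confusing error in a later task.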
Upload an archive to an Amazon Glacier vault
To add an archive to an Amazon S3 Glacier vault, use GlacierUploadArchiveOperator.
from airflow.providers.amazon.aws.operators.glacier import GlacierUploadArchiveOperator
upload_archive_to_glacier = GlacierUploadArchiveOperator(
    task_id="upload_data_to_glacier", vault_name=vault_name, body=b"Test Data"
)
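Glacier checks uploaded data against a SHA-256 tree hash. boto3 normally fills in this checksum when it is not supplied, so the operator call above should not need one; the sketch below only illustrates how such a tree hash is built, following the scheme AWS documents (hash each 1 MiB chunk, then combine digests pairwise up to a single root). sha256_tree_hash is a hypothetical helper, and treating an empty body as one empty chunk is an assumption of this sketch.

```python
import hashlib

MIB = 1024 * 1024  # tree hashes are built from 1 MiB chunks


def sha256_tree_hash(body: bytes) -> str:
    """Compute a SHA-256 tree hash of body as a hex string (illustrative sketch)."""
    # Leaf level: SHA-256 digest of each 1 MiB chunk (the last may be shorter).
    level = [hashlib.sha256(body[i:i + MIB]).digest() for i in range(0, len(body), MIB)]
    if not level:
        # Assumption: treat an empty body as a single empty chunk.
        return hashlib.sha256(b"").hexdigest()
    # Combine adjacent digests pairwise until one root digest remains;
    # an unpaired digest is carried up to the next level unchanged.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


print(sha256_tree_hash(b"Test Data"))
```

For a body smaller than 1 MiB, such as b"Test Data", there is only one leaf, so the tree hash equals the plain SHA-256 of the bytes.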
Sensors
Wait on an Amazon Glacier job state
To wait for an Amazon Glacier job to reach a terminal state, use GlacierJobOperationSensor.
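from airflow.providers.amazon.aws.sensors.glacier import GlacierJobOperationSensor
wait_for_operation_complete = GlacierJobOperationSensor(
    vault_name=vault_name,
    job_id=JOB_ID,  # templated job ID pulled from create_glacier_job
    task_id="wait_for_operation_complete",
)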
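Conceptually, the sensor keeps checking the job's status until it leaves the in-progress state. The stdlib-only sketch below shows that polling loop under stated assumptions: describe_job is a hypothetical callable standing in for Glacier's DescribeJob call, the status codes are assumed to be InProgress, Succeeded, and Failed, and the poke interval and timeout defaults are arbitrary. It is not the sensor's actual implementation, which runs inside Airflow's sensor framework.

```python
import time
from typing import Any, Callable, Mapping

# Assumed Glacier job status codes: InProgress, Succeeded, Failed.
IN_PROGRESS = "InProgress"
SUCCEEDED = "Succeeded"


def wait_for_glacier_job(
    describe_job: Callable[[str, str], Mapping[str, Any]],
    vault_name: str,
    job_id: str,
    poke_interval: float = 60.0,  # arbitrary default for this sketch
    timeout: float = 3600.0,  # arbitrary default for this sketch
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll describe_job until the job succeeds, fails, or the timeout expires."""
    deadline = clock() + timeout
    while True:
        response = describe_job(vault_name, job_id)
        status = response["StatusCode"]
        if status == SUCCEEDED:
            return response  # terminal success
        if status != IN_PROGRESS:
            # Any other status (e.g. Failed) is treated as a terminal failure.
            raise RuntimeError(f"Glacier job {job_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Glacier job {job_id} still in progress after {timeout}s")
        sleep(poke_interval)


# Usage with a fake describe_job that succeeds on the third poll.
_statuses = iter([IN_PROGRESS, IN_PROGRESS, SUCCEEDED])
final = wait_for_glacier_job(
    lambda vault, job: {"StatusCode": next(_statuses)},
    "example-vault",
    "EXAMPLE-JOB-ID",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final["StatusCode"])  # Succeeded
```

Injecting sleep and clock keeps the loop testable without waiting in real time; in a real DAG, GlacierJobOperationSensor handles the scheduling and retries for you.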