Amazon Glacier Operator

Amazon Glacier is a secure, durable, and extremely low-cost Amazon S3 cloud storage class for data archiving and long-term backup. For more information about the service, see the Amazon Glacier API documentation.

GlacierCreateJobOperator

The operator initiates an Amazon Glacier inventory-retrieval job. It returns a dictionary of information about the initiated job, such as the jobId, which is required by subsequent tasks.

For more information about the operator, see GlacierCreateJobOperator.

Example usage:

airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py

create_glacier_job = GlacierCreateJobOperator(task_id="create_glacier_job", vault_name=VAULT_NAME)
JOB_ID = '{{ task_instance.xcom_pull("create_glacier_job")["jobId"] }}'
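The templated JOB_ID expression looks up the dictionary returned by the create_glacier_job task and reads its jobId key. The following is a minimal plain-Python sketch of that lookup, not Airflow's implementation: xcom_store, create_glacier_job_stub, xcom_pull, and the placeholder job ID are all hypothetical stand-ins introduced for illustration.

```python
# Hypothetical stand-in for the XCom store: task_id -> value returned by that task.
xcom_store = {}


def create_glacier_job_stub() -> dict:
    """Stand-in for the operator's return value (assumption: only jobId is shown).

    The ID below is a placeholder; a real jobId is assigned by Amazon Glacier.
    """
    return {"jobId": "EXAMPLE-JOB-ID"}


def xcom_pull(task_id: str) -> dict:
    """Return the value that the given task stored, mimicking the template call."""
    return xcom_store[task_id]


# The upstream task runs and its return value is stored under its task_id.
xcom_store["create_glacier_job"] = create_glacier_job_stub()

# Plain-Python equivalent of the lookup performed by the JOB_ID template.
job_id = xcom_pull("create_glacier_job")["jobId"]
```

In a real DAG the template is rendered at run time, so the job ID is only known after create_glacier_job has executed.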

GlacierJobOperationSensor

The sensor waits until the job started by the create_glacier_job task has completed. Once the sensor returns true, subsequent tasks can run. In this case, the subsequent tasks are GlacierDownloadArchive and GlacierTransferDataToGCS.

Job states:

  • Succeeded – the job is finished and, for example, archives from the vault can be downloaded

  • InProgress – the job is still running and you have to wait until it reaches Succeeded

GlacierJobOperationSensor checks the job status. If the response status code is Succeeded, the sensor returns true and subsequent tasks are executed. If the status code is InProgress, the sensor returns false and the task is rescheduled with poke_interval=60 * 20, which means the status is checked again every 20 minutes.
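The status check described above can be sketched in plain Python. This is an illustrative model under stated assumptions, not the provider's source code: the JobStatus enum, the poke function signature, the seconds_until_poke helper, and the choice to raise ValueError for any other status are all hypothetical.

```python
from enum import Enum

# poke_interval=60 * 20 is expressed in seconds: 1200 seconds, i.e. 20 minutes.
POKE_INTERVAL_SECONDS = 60 * 20


class JobStatus(Enum):
    """The two job states named in this section (hypothetical enum)."""

    IN_PROGRESS = "InProgress"
    SUCCEEDED = "Succeeded"


def poke(status_code: str) -> bool:
    """Return True when the job is finished, False when it must be polled again."""
    if status_code == JobStatus.SUCCEEDED.value:
        return True
    if status_code == JobStatus.IN_PROGRESS.value:
        return False
    # Assumption: any other status is treated as an error in this sketch.
    raise ValueError(f"Unexpected Glacier job status: {status_code!r}")


def seconds_until_poke(n: int, interval: int = POKE_INTERVAL_SECONDS) -> int:
    """Seconds between the first poke and the n-th poke (n counts from 1)."""
    return (n - 1) * interval
```

With a 20-minute interval, the fourth status check in this model happens one hour after the first, which is worth keeping in mind when setting task timeouts.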

For more information about the sensor, see GlacierJobOperationSensor.

Example usage:

airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py

transfer_archive_to_gcs = GlacierToGCSOperator(
    task_id="transfer_archive_to_gcs",
    vault_name=VAULT_NAME,
    bucket_name=BUCKET_NAME,
    object_name=OBJECT_NAME,
    gzip=False,
    # Override to match your needs
    # If chunk size is bigger than actual file size
    # then whole file will be downloaded
    chunk_size=1024,
    delegate_to=None,
    google_impersonation_chain=None,
)
