Azure Blob Storage to Google Cloud Storage (GCS) Transfer Operator

Azure Blob Storage stores text and binary data as objects in the cloud. The service exposes three types of resources: the storage account, containers, and blobs. Within a storage account, containers organize sets of blobs. For more information about the service, see the Azure Blob Storage API documentation.

Before you begin

Before using Blob Storage within Airflow, you need to authenticate your account with a token, a login, and a password. Follow the Azure instructions to obtain them.

Add TOKEN to the Airflow Connection in JSON format, and enter Login and Password as plain text. For details, see how to create a connection in Airflow.

See the following example. Set values for these fields:

Connection Id: wasb_default
Login: Storage Account Name
Password: KEY1
Extra: {"sas_token": "TOKEN"}
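The same fields can also be supplied without the UI. The following is a minimal sketch, assuming an Airflow version that accepts JSON-serialized connections in environment variables (2.3 or later) and that the connection type is "wasb". The helper name build_wasb_connection_json and the account name are hypothetical placeholders, not part of the provider.

```python
import json
import os


def build_wasb_connection_json(account_name: str, account_key: str, sas_token: str) -> str:
    """Serialize the fields listed above into Airflow's JSON connection format.

    Hypothetical helper for illustration; not part of any Airflow provider.
    """
    connection = {
        "conn_type": "wasb",  # assumption: connection type used by the Azure Blob hook
        "login": account_name,  # Login: Storage Account Name
        "password": account_key,  # Password: KEY1
        "extra": {"sas_token": sas_token},  # Extra: {"sas_token": "TOKEN"}
    }
    return json.dumps(connection)


# Placeholder values; substitute real credentials from a secrets store.
conn_json = build_wasb_connection_json("mystorageaccount", "KEY1", "TOKEN")

# Airflow resolves the connection id "wasb_default" from this variable name.
os.environ["AIRFLOW_CONN_WASB_DEFAULT"] = conn_json
```

Setting the variable in the scheduler and worker environments keeps credentials out of the metadata database; a connection defined this way does not appear in the Airflow UI.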

Transfer Data from Blob Storage to Google Cloud Storage

The operator transfers data from Azure Blob Storage to a specified bucket in Google Cloud Storage.

To transfer data from Azure Blob Storage to Google Cloud Storage, use: AzureBlobStorageToGCSOperator

Example usage:

tests/system/providers/microsoft/azure/example_azure_blob_to_gcs.py

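from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.sensors.wasb import WasbBlobSensor, WasbPrefixSensor

# Import path as used by the Microsoft Azure provider; adjust it if your
# provider version places the operator elsewhere.
from airflow.providers.microsoft.azure.transfers.azure_blob_to_gcs import AzureBlobStorageToGCSOperator

# DAG_ID, AZURE_CONTAINER_NAME, BLOB_NAME, PREFIX_NAME and the GCP_* constants
# must be defined before this point.
with DAG(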
    DAG_ID,
    schedule=None,
    start_date=datetime(2021, 1, 1),  # Override to match your needs
    default_args={"container_name": AZURE_CONTAINER_NAME, "blob_name": BLOB_NAME, "prefix": PREFIX_NAME},
) as dag:
    wait_for_blob = WasbBlobSensor(task_id="wait_for_blob")

    wait_for_blob_async = WasbBlobSensor(task_id="wait_for_blob_async", deferrable=True)

    wait_for_blob_prefix = WasbPrefixSensor(task_id="wait_for_blob_prefix")

    wait_for_blob_prefix_async = WasbPrefixSensor(
        task_id="wait_for_blob_prefix_async",
        deferrable=True,
    )

    transfer_files_to_gcs = AzureBlobStorageToGCSOperator(
        task_id="transfer_files_to_gcs",
        # AZURE arg
        file_path=GCP_OBJECT_NAME,
        # GCP args
        bucket_name=GCP_BUCKET_NAME,
        object_name=GCP_OBJECT_NAME,
        filename=GCP_BUCKET_FILE_PATH,
        gzip=False,
        impersonation_chain=None,
    )
