Azure Blob Storage to Google Cloud Storage (GCS) Transfer Operator¶
The Blob service stores text and binary data as objects in the cloud. It offers three resources: the storage account, containers, and blobs. Within your storage account, containers provide a way to organize sets of blobs. For more information about the service, visit the Azure Blob Storage API documentation.
Before you begin¶
Before using Blob Storage within Airflow, you need to authenticate your account with a token, login, and password. Please follow the Azure instructions to do so.
The TOKEN should be added to the Connection in Airflow in JSON format, and the Login and Password as plain text. You can check how to create such a connection in the Airflow documentation.
See the following example and set values for these fields (a programmatic sketch follows the list):
Connection Id: wasb_default
Login: Storage Account Name
Password: KEY1
Extra: {"sas_token": "TOKEN"}
Transfer Data from Blob Storage to Google Cloud Storage¶
The operator transfers data from Azure Blob Storage to a specified bucket in Google Cloud Storage.
To transfer data from Azure Blob Storage to Google Cloud Storage, use:
AzureBlobStorageToGCSOperator
Example usage:
from __future__ import annotations

from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.sensors.wasb import WasbBlobSensor, WasbPrefixSensor
from airflow.providers.microsoft.azure.transfers.azure_blob_to_gcs import (
    AzureBlobStorageToGCSOperator,
)

# Placeholder values: replace with your own identifiers.
DAG_ID = "example_azure_blob_to_gcs"
AZURE_CONTAINER_NAME = "azure-container"
BLOB_NAME = "file.txt"
PREFIX_NAME = "file"  # prefix watched by the WasbPrefixSensor tasks
GCP_BUCKET_NAME = "gcp-bucket"
GCP_OBJECT_NAME = "file.txt"
GCP_BUCKET_FILE_PATH = "file.txt"

with DAG(
    DAG_ID,
    schedule=None,
    start_date=datetime(2021, 1, 1),  # Override to match your needs
    default_args={"container_name": AZURE_CONTAINER_NAME, "blob_name": BLOB_NAME, "prefix": PREFIX_NAME},
) as dag:
    # Wait until the blob exists; container_name and blob_name come from default_args.
    wait_for_blob = WasbBlobSensor(task_id="wait_for_blob")
    # The same check as a deferrable sensor, which frees the worker slot while waiting.
    wait_for_blob_async = WasbBlobSensor(task_id="wait_for_blob_async", deferrable=True)
    # Wait until at least one blob matching the prefix exists.
    wait_for_blob_prefix = WasbPrefixSensor(task_id="wait_for_blob_prefix")
    wait_for_blob_prefix_async = WasbPrefixSensor(
        task_id="wait_for_blob_prefix_async",
        deferrable=True,
    )
    transfer_files_to_gcs = AzureBlobStorageToGCSOperator(
        task_id="transfer_files_to_gcs",
        # AZURE args (container_name and blob_name) are supplied via default_args above;
        # older provider versions also accepted a file_path argument here.
        # GCP args
        bucket_name=GCP_BUCKET_NAME,
        object_name=GCP_OBJECT_NAME,
        filename=GCP_BUCKET_FILE_PATH,
        gzip=False,
        impersonation_chain=None,
    )
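The example above declares the tasks but no ordering between them. A minimal sketch of one possible wiring, assuming the transfer should run only after the blob sensor succeeds (this dependency is illustrative, not part of the upstream example):

# Illustrative ordering: run the transfer only once the blob has been observed.
wait_for_blob >> transfer_files_to_gcs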