airflow.contrib.operators.adls_to_gcs
Module Contents
class airflow.contrib.operators.adls_to_gcs.AdlsToGoogleCloudStorageOperator(src_adls, dest_gcs, azure_data_lake_conn_id, google_cloud_storage_conn_id, delegate_to=None, replace=False, gzip=False, *args, **kwargs)
Bases: airflow.contrib.operators.adls_list_operator.AzureDataLakeStorageListOperator
Synchronizes an Azure Data Lake Storage path with a GCS bucket.
- Parameters
src_adls (str) – The Azure Data Lake path to find the objects (templated)
dest_gcs (str) – The Google Cloud Storage bucket and prefix to store the objects. (templated)
replace (bool) – If true, replaces same-named files in GCS.
gzip (bool) – If true, compresses each file with gzip before upload.
azure_data_lake_conn_id (str) – The connection ID to use when connecting to Azure Data Lake Storage.
google_cloud_storage_conn_id (str) – The connection ID to use when connecting to Google Cloud Storage.
delegate_to (str) – The account to impersonate, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
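The effect of the replace parameter can be sketched in plain Python, without any Airflow imports. This is a minimal illustration only: `select_files_to_copy` is a hypothetical helper, not part of the operator's API, and the operator's actual internals may differ. It assumes the source and destination listings are already available as lists of object names.

```python
def select_files_to_copy(adls_files, existing_gcs_objects, replace=False):
    """Return the ADLS object names that should be uploaded.

    Hypothetical helper for illustration; not part of Airflow.
    """
    if replace:
        # Overwrite mode: every listed source file is copied again.
        return sorted(adls_files)
    # Default mode: skip names that already exist in the destination.
    return sorted(set(adls_files) - set(existing_gcs_objects))
```

With replace=False, a file whose name is already present in the destination listing is left untouched; with replace=True, it is uploaded again.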
- Examples:
The following Operator would copy a single file named hello/world.avro from ADLS to the GCS bucket mybucket. Its full resulting GCS path will be gs://mybucket/hello/world.avro
    copy_single_file = AdlsToGoogleCloudStorageOperator(
        task_id='copy_single_file',
        src_adls='hello/world.avro',
        dest_gcs='gs://mybucket',
        replace=False,
        azure_data_lake_conn_id='azure_data_lake_default',
        google_cloud_storage_conn_id='google_cloud_default'
    )
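How a destination such as gs://mybucket and a source object name combine into the final GCS path can be sketched with the standard library alone. This is a rough illustration under stated assumptions: `resolve_gcs_target` is a hypothetical helper, not an Airflow function, and the handling of leading slashes is an assumption rather than documented operator behavior.

```python
import posixpath
from urllib.parse import urlparse


def resolve_gcs_target(dest_gcs, adls_object):
    """Split dest_gcs into bucket and prefix, then append the ADLS object name.

    Hypothetical helper for illustration; not part of Airflow.
    """
    parsed = urlparse(dest_gcs)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("dest_gcs must look like gs://bucket[/prefix]")
    bucket = parsed.netloc
    # The URL path carries the optional prefix; drop its leading slash.
    prefix = parsed.path.lstrip("/")
    # Assumption: a leading slash on the source name is ignored.
    object_name = posixpath.join(prefix, adls_object.lstrip("/"))
    return bucket, object_name


bucket, object_name = resolve_gcs_target("gs://mybucket", "hello/world.avro")
full_path = "gs://{}/{}".format(bucket, object_name)
```

For the single-file example, this yields the bucket mybucket and the object name hello/world.avro, which matches the resulting path stated in the example.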
The following Operator would copy all parquet files from ADLS to the GCS bucket mybucket.

    copy_all_files = AdlsToGoogleCloudStorageOperator(
        task_id='copy_all_files',
        src_adls='*.parquet',
        dest_gcs='gs://mybucket',
        replace=False,
        azure_data_lake_conn_id='azure_data_lake_default',
        google_cloud_storage_conn_id='google_cloud_default'
    )

The following Operator would copy all parquet files from ADLS path /hello/world to the GCS bucket mybucket.

    copy_world_files = AdlsToGoogleCloudStorageOperator(
        task_id='copy_world_files',
        src_adls='hello/world/*.parquet',
        dest_gcs='gs://mybucket',
        replace=False,
        azure_data_lake_conn_id='azure_data_lake_default',
        google_cloud_storage_conn_id='google_cloud_default'
    )