airflow.contrib.operators.adls_to_gcs

Module Contents

class airflow.contrib.operators.adls_to_gcs.AdlsToGoogleCloudStorageOperator(src_adls, dest_gcs, azure_data_lake_conn_id, google_cloud_storage_conn_id, delegate_to=None, replace=False, *args, **kwargs)

Bases: airflow.contrib.operators.adls_list_operator.AzureDataLakeStorageListOperator

Synchronizes an Azure Data Lake Storage (ADLS) path with a Google Cloud Storage (GCS) bucket.

Parameters
  • src_adls (str) – The Azure Data Lake path to find the objects (templated)

  • dest_gcs (str) – The Google Cloud Storage bucket and prefix to store the objects. (templated)

  • replace (bool) – If True, replaces same-named files in GCS.

  • azure_data_lake_conn_id (str) – The connection ID to use when connecting to Azure Data Lake Storage.

  • google_cloud_storage_conn_id (str) – The connection ID to use when connecting to Google Cloud Storage.

  • delegate_to (str) – The account to impersonate, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
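
The effect of replace can be pictured as a filter over the ADLS listing. The following is a minimal sketch, not the operator's implementation: the helper name select_files_to_copy and its arguments are hypothetical, and it assumes that with replace=False any file whose name already exists in GCS is skipped.

```python
def select_files_to_copy(adls_files, existing_gcs_objects, replace=False):
    """Return the sorted ADLS file names that would be uploaded.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    if replace:
        # Every listed file is uploaded, overwriting same-named GCS objects.
        return sorted(set(adls_files))
    # Assumption: files already present under the same name are left alone.
    return sorted(set(adls_files) - set(existing_gcs_objects))
```

With replace=False a rerun of the task would then upload only the files that are new since the previous run.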

Examples:

The following Operator would copy a single file named hello/world.avro from ADLS to the GCS bucket mybucket. The resulting GCS path is gs://mybucket/hello/world.avro.

copy_single_file = AdlsToGoogleCloudStorageOperator(
    task_id='copy_single_file',
    src_adls='hello/world.avro',
    dest_gcs='gs://mybucket',
    replace=False,
    azure_data_lake_conn_id='azure_data_lake_default',
    google_cloud_storage_conn_id='google_cloud_default'
)
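
How the destination path in the example above comes about can be sketched as follows. This is an illustration under stated assumptions, not the operator's code: resolve_gcs_destination is a hypothetical helper, and it assumes that dest_gcs is split into a bucket and an optional prefix, with the ADLS path appended to that prefix.

```python
import posixpath
from urllib.parse import urlparse


def resolve_gcs_destination(dest_gcs, adls_path):
    """Return (bucket, object_name) for an ADLS path copied to dest_gcs.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    parsed = urlparse(dest_gcs)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("dest_gcs must look like gs://bucket[/prefix]")
    # Everything after the bucket name is treated as an object prefix.
    prefix = parsed.path.lstrip("/")
    # Assumption: the ADLS path is kept as-is and appended to the prefix.
    object_name = posixpath.join(prefix, adls_path.lstrip("/"))
    return parsed.netloc, object_name
```

For dest_gcs='gs://mybucket' and src_adls='hello/world.avro' this yields the bucket mybucket and the object name hello/world.avro, which matches the path gs://mybucket/hello/world.avro stated above.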

The following Operator would copy all parquet files from ADLS to the GCS bucket mybucket.

   copy_all_files = AdlsToGoogleCloudStorageOperator(
       task_id='copy_all_files',
       src_adls='*.parquet',
       dest_gcs='gs://mybucket',
       replace=False,
       azure_data_lake_conn_id='azure_data_lake_default',
       google_cloud_storage_conn_id='google_cloud_default'
   )

The following Operator would copy all parquet files from the ADLS path /hello/world to the GCS bucket mybucket.

   copy_world_files = AdlsToGoogleCloudStorageOperator(
       task_id='copy_world_files',
       src_adls='hello/world/*.parquet',
       dest_gcs='gs://mybucket',
       replace=False,
       azure_data_lake_conn_id='azure_data_lake_default',
       google_cloud_storage_conn_id='google_cloud_default'
   )
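
Which files a wildcard such as hello/world/*.parquet selects can be approximated locally. The sketch below is only an approximation: match_adls_pattern is a hypothetical helper, and it assumes the pattern behaves like pathlib.PurePosixPath.match, where * does not cross a / separator. The actual matching is done by the ADLS client and may differ.

```python
from pathlib import PurePosixPath


def match_adls_pattern(paths, pattern):
    """Return the paths that match a glob-style pattern, in input order.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    # Assumption: pathlib-style matching, so '*' stays within one segment.
    return [p for p in paths if PurePosixPath(p).match(pattern)]


listing = [
    "hello/world/a.parquet",
    "hello/world/b.avro",
    "other/c.parquet",
]
selected = match_adls_pattern(listing, "hello/world/*.parquet")
```

Under these assumptions only hello/world/a.parquet is selected from the sample listing.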

template_fields = ['src_adls', 'dest_gcs']
ui_color = #f0eee4
execute(self, context)