airflow.contrib.operators.gcs_list_operator

Module Contents

class airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator(bucket, prefix=None, delimiter=None, google_cloud_storage_conn_id='google_cloud_default', delegate_to=None, *args, **kwargs)

Bases: airflow.models.BaseOperator

List all objects in the bucket whose names match the given prefix and delimiter.

This operator returns a Python list of the matching object names, which downstream tasks can retrieve via XCom.

Parameters
  • bucket (str) – The Google Cloud Storage bucket in which to look for objects. (templated)

  • prefix (str) – Prefix string that filters objects to those whose names begin with this prefix. (templated)

  • delimiter (str) – The delimiter by which you want to filter the objects. (templated) For example, to list the CSV files in a directory in GCS, you would use delimiter='.csv'.

  • google_cloud_storage_conn_id (str) – The connection ID to use when connecting to Google Cloud Storage.

  • delegate_to (str) – The account to impersonate, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
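The delimiter trick works because of how the Cloud Storage `objects.list` API treats a delimiter. Any matching name that contains the delimiter after the prefix is collapsed, up to and including the first occurrence of the delimiter, into a separate `prefixes` list. A name ending in `.csv` therefore collapses to itself, and names without the delimiter are returned separately as `items`. The pure-Python sketch below imitates only those API semantics. `simulate_gcs_list` and the object names are hypothetical, and the function is not part of Airflow or any GCS client. It also assumes that the operator returns the collapsed prefixes whenever the API reports any.

```python
from typing import Dict, Iterable, List, Optional


def simulate_gcs_list(names: Iterable[str], prefix: str = '',
                      delimiter: Optional[str] = None) -> Dict[str, List[str]]:
    """Hypothetical helper: mimic how objects.list splits matching names
    into 'items' and 'prefixes' for a given prefix and delimiter."""
    items: List[str] = []
    prefixes: List[str] = []
    for name in sorted(names):  # GCS lists names in lexicographic order
        if not name.startswith(prefix):
            continue  # prefix keeps only names that start with it
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse everything up to and including the first delimiter
            # occurring after the prefix into one "prefix" entry.
            end = rest.index(delimiter) + len(delimiter)
            collapsed = prefix + rest[:end]
            if collapsed not in prefixes:
                prefixes.append(collapsed)
        else:
            items.append(name)
    return {'items': items, 'prefixes': prefixes}


# Made-up bucket contents for illustration.
objects = [
    'sales/sales-2017/jan.avro',
    'sales/sales-2017/feb.avro',
    'sales/sales-2017/README.txt',
    'sales/sales-2018/jan.avro',
]
result = simulate_gcs_list(objects, prefix='sales/sales-2017/', delimiter='.avro')
print(result['prefixes'])  # Avro files under sales/sales-2017/
print(result['items'])     # other objects under the same prefix
```

With `delimiter='/'`, the same rule yields one entry per "subdirectory". That is the more conventional use of a delimiter, and it is why a file-extension delimiter behaves like a suffix filter only when the extension appears once, at the end of the name.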

Example:

The following operator would list all the Avro files in the sales/sales-2017 folder of the data bucket.

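from airflow.contrib.operators.gcs_list_operator import GoogleCloudStorageListOperator

# Airflow connection ID used for GCS access (the default ID is assumed here)
google_cloud_conn_id = 'google_cloud_default'

GCS_Files = GoogleCloudStorageListOperator(
    task_id='GCS_Files',
    bucket='data',
    prefix='sales/sales-2017/',
    delimiter='.avro',
    google_cloud_storage_conn_id=google_cloud_conn_id
)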
template_fields: Iterable[str] = ['bucket', 'prefix', 'delimiter']
ui_color = '#f0eee4'
execute(self, context)
