airflow.contrib.operators.s3_list_operator

Module Contents

class airflow.contrib.operators.s3_list_operator.S3ListOperator(bucket, prefix='', delimiter='', aws_conn_id='aws_default', verify=None, *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

List all objects from the bucket whose names begin with the given string prefix.

This operator returns a Python list with the names of the matching objects, which can be pulled via XCom in downstream tasks (see the sketch after the example below).

Parameters
  • bucket (str) – The S3 bucket in which to find the objects. (templated)

  • prefix (str) – Prefix string used to filter the objects whose names begin with that prefix. (templated)

  • delimiter (str) – The delimiter that marks the key hierarchy. (templated)

  • aws_conn_id (str) – The connection ID to use when connecting to S3 storage.

  • verify (bool or str) –

    Whether or not to verify SSL certificates for the S3 connection. By default SSL certificates are verified. You can provide the following values (see the sketch after this parameter list):

    • False: do not validate SSL certificates. SSL will still be used

      (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: A filename of the CA cert bundle to use.

      You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
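
For instance, verify can point at a custom CA bundle instead of the one shipped with botocore. The following is a minimal sketch; the task id, bucket, prefix, and bundle path are illustrative assumptions, not values taken from this page.

s3_list_custom_ca = S3ListOperator(
    task_id='list_s3_files_custom_ca',         # illustrative task id
    bucket='data',
    prefix='customers/2018/04/',
    verify='/etc/ssl/certs/my-ca-bundle.pem',  # hypothetical CA bundle path
    aws_conn_id='aws_default'
)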

Example:

The following operator would list all the files (excluding subfolders) from the S3 customers/2018/04/ key in the data bucket.

s3_file = S3ListOperator(
    task_id='list_s3_files',
    bucket='data',
    prefix='customers/2018/04/',
    delimiter='/',
    aws_conn_id='aws_customers_conn'
)
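
Because the returned list is pushed to XCom, a downstream task can pull it using the listing task's id. Below is a minimal sketch assuming a PythonOperator in the same DAG; the callable and the downstream task id are illustrative, and only 'list_s3_files' comes from the example above.

from airflow.operators.python_operator import PythonOperator

def process_keys(**context):
    # Pull the list of S3 keys pushed by the listing task above.
    keys = context['ti'].xcom_pull(task_ids='list_s3_files')
    for key in keys:
        print(key)

process_s3_keys = PythonOperator(
    task_id='process_s3_keys',   # illustrative task id
    python_callable=process_keys,
    provide_context=True,        # needed on Airflow 1.10 so the callable receives the context kwargs
)

s3_file >> process_s3_keys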
template_fields: Iterable[str] = ['bucket', 'prefix', 'delimiter']
ui_color = '#ffd700'
execute(self, context)
