airflow.contrib.operators.s3_to_gcs_operator

Module Contents

class airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator(bucket, prefix='', delimiter='', aws_conn_id='aws_default', verify=None, dest_gcs_conn_id=None, dest_gcs=None, delegate_to=None, replace=False, gzip=False, *args, **kwargs)

Bases: airflow.contrib.operators.s3_list_operator.S3ListOperator

Synchronizes an S3 key, possibly a prefix, with a Google Cloud Storage destination path.

Parameters
  • bucket (str) – The S3 bucket in which to find the objects. (templated)

  • prefix (str) – Prefix string that filters objects whose names begin with it. (templated)

  • delimiter (str) – The delimiter that marks key hierarchy. (templated)

  • aws_conn_id (str) – The source S3 connection ID.

  • verify (bool or str) – Whether or not to verify SSL certificates for the S3 connection. By default, SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: the filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • dest_gcs_conn_id (str) – The destination connection ID to use when connecting to Google Cloud Storage.

  • dest_gcs (str) – The destination Google Cloud Storage bucket and prefix where you want to store the files. (templated)

  • delegate_to (str) – The account to impersonate, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

  • replace (bool) – Whether you want to replace existing destination files or not.

  • gzip (bool) – Whether to compress the file for upload.
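The interaction between dest_gcs and replace can be illustrated with a small standalone sketch. It is not the operator's code: the helper names split_gcs_url and keys_to_copy are hypothetical, and the sketch assumes that each destination object name is the dest_gcs object prefix followed by the S3 key, and that replace=False means keys already present under that prefix are skipped.

```python
from urllib.parse import urlsplit


def split_gcs_url(gcs_url):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix/').

    Hypothetical helper, not part of Airflow's public API.
    """
    parts = urlsplit(gcs_url)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError("expected a gs://bucket/prefix URL, got %r" % gcs_url)
    return parts.netloc, parts.path.lstrip("/")


def keys_to_copy(s3_keys, existing_gcs_objects, dest_gcs, replace=False):
    """Return the S3 keys that still need uploading, in sorted order.

    Assumption: a destination object is named <dest prefix> + <S3 key>.
    """
    if replace:
        # Everything is uploaded again, overwriting what is already there.
        return sorted(s3_keys)
    _, object_prefix = split_gcs_url(dest_gcs)
    # Strip the destination prefix so the names are comparable to S3 keys.
    already_there = {
        name[len(object_prefix):] if name.startswith(object_prefix) else name
        for name in existing_gcs_objects
    }
    return sorted(set(s3_keys) - already_there)


dest_bucket, dest_prefix = split_gcs_url("gs://my.gcs.bucket/some/customers/")
pending = keys_to_copy(
    s3_keys=["data/a.csv", "data/b.csv"],
    existing_gcs_objects=["some/customers/data/a.csv"],
    dest_gcs="gs://my.gcs.bucket/some/customers/",
)
```

Under these assumptions, only data/b.csv remains in pending, because data/a.csv already exists below the destination prefix. Passing replace=True would return both keys.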

Example:

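from airflow.contrib.operators.s3_to_gcs_operator import S3ToGoogleCloudStorageOperator

# my_dag is assumed to be an existing airflow.DAG instance.
s3_to_gcs_op = S3ToGoogleCloudStorageOperator(
    task_id='s3_to_gcs_example',
    bucket='my-s3-bucket',
    prefix='data/customers-201804',
    dest_gcs_conn_id='google_cloud_default',
    dest_gcs='gs://my.gcs.bucket/some/customers/',
    replace=False,
    gzip=True,
    dag=my_dag)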

Note that bucket, prefix, delimiter and dest_gcs are templated, so you can use Jinja template variables in them, for example prefix='data/customers-{{ ds_nodash }}'.

template_fields = ['bucket', 'prefix', 'delimiter', 'dest_gcs']
ui_color = #e09411
execute(self, context)
static _gcs_object_is_directory(object)
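
This page lists only the signature of _gcs_object_is_directory, so its exact behavior is not documented here. One plausible reading, shown below as a standalone sketch, is that a gs:// URL is treated as a directory when it names a bucket root or ends in a slash. The function name looks_like_gcs_directory is hypothetical and the rule itself is an assumption, not confirmed by this page.

```python
from urllib.parse import urlsplit


def looks_like_gcs_directory(gcs_url):
    """Return True when the URL names a bucket root or ends in '/'.

    Hypothetical stand-in for the private helper; the rule is an assumption.
    """
    # Everything after the bucket name, without the leading slash.
    object_name = urlsplit(gcs_url).path.lstrip("/")
    return object_name == "" or object_name.endswith("/")


is_dir = looks_like_gcs_directory("gs://my.gcs.bucket/some/customers/")
is_file = looks_like_gcs_directory("gs://my.gcs.bucket/some/customers.csv")
```

With this rule, the dest_gcs value from the example above counts as a directory, while a URL ending in an object name such as customers.csv does not.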
