airflow.providers.google.cloud.operators.cloud_storage_transfer_service

This module contains Google Cloud Transfer operators.

Module Contents

Classes

TransferJobPreprocessor

Helper class for preprocess of transfer job body.

TransferJobValidator

Helper class for validating transfer job body.

CloudDataTransferServiceCreateJobOperator

Creates a transfer job that runs periodically.

CloudDataTransferServiceUpdateJobOperator

Updates a transfer job that runs periodically.

CloudDataTransferServiceDeleteJobOperator

Delete a transfer job.

CloudDataTransferServiceGetOperationOperator

Gets the latest state of a long-running operation in Google Storage Transfer Service.

CloudDataTransferServiceListOperationsOperator

Lists long-running operations in Google Storage Transfer Service that match the specified filter.

CloudDataTransferServicePauseOperationOperator

Pauses a transfer operation in Google Storage Transfer Service.

CloudDataTransferServiceResumeOperationOperator

Resumes a transfer operation in Google Storage Transfer Service.

CloudDataTransferServiceCancelOperationOperator

Cancels a transfer operation in Google Storage Transfer Service.

CloudDataTransferServiceS3ToGCSOperator

Sync an S3 bucket with a Google Cloud Storage bucket using the Google Cloud Storage Transfer Service.

CloudDataTransferServiceGCSToGCSOperator

Copies objects from a bucket to another using the Google Cloud Storage Transfer Service.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.TransferJobPreprocessor(body, aws_conn_id='aws_default', default_schedule=False)[source]

Helper class for preprocess of transfer job body.

process_body()[source]

Injects AWS credentials into body if needed and reformats schedule information.

Returns

Preprocessed body

Return type

dict

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.TransferJobValidator(body)[source]

Helper class for validating transfer job body.

validate_body()[source]

Validate the body.

Checks if body specifies transferSpec if yes, then check if AWS credentials are passed correctly and no more than 1 data source was selected.

Raises

AirflowException

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCreateJobOperator(*, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', project_id=None, google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Creates a transfer job that runs periodically.

Warning

This operator is NOT idempotent in the following cases:

  • name is not passed in body param

  • transfer job name has been soft deleted. In this case, each new task will receive a unique suffix

If you run it many times, many transfer jobs will be created in the Google Cloud.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceCreateJobOperator

Parameters
  • body (dict) –

    (Required) The request body, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs#TransferJob With three additional improvements:

    • dates can be given in the form datetime.date

    • times can be given in the form datetime.time

    • credentials to Amazon Web Service should be stored in the connection and indicated by the aws_conn_id parameter

  • aws_conn_id (str | None) – The connection ID used to retrieve credentials to Amazon Web Service.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('body', 'gcp_conn_id', 'aws_conn_id', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceUpdateJobOperator(*, job_name, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', project_id=None, google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Updates a transfer job that runs periodically.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceUpdateJobOperator

Parameters
  • job_name (str) – (Required) Name of the job to be updated

  • body (dict) –

    (Required) The request body, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/patch#request-body With three additional improvements:

    • dates can be given in the form datetime.date

    • times can be given in the form datetime.time

    • credentials to Amazon Web Service should be stored in the connection and indicated by the aws_conn_id parameter

  • aws_conn_id (str | None) – The connection ID used to retrieve credentials to Amazon Web Service.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('job_name', 'body', 'gcp_conn_id', 'aws_conn_id', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceDeleteJobOperator(*, job_name, gcp_conn_id='google_cloud_default', api_version='v1', project_id=None, google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete a transfer job.

This is a soft delete. After a transfer job is deleted, the job and all the transfer executions are subject to garbage collection. Transfer jobs become eligible for garbage collection 30 days after soft delete.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceDeleteJobOperator

Parameters
  • job_name (str) – (Required) Name of the TRANSFER operation

  • project_id (str | None) – (Optional) the ID of the project that owns the Transfer Job. If set to None or missing, the default project_id from the Google Cloud connection is used.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('job_name', 'project_id', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceGetOperationOperator(*, project_id=None, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Gets the latest state of a long-running operation in Google Storage Transfer Service.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceGetOperationOperator

Parameters
  • operation_name (str) – (Required) Name of the transfer operation.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('operation_name', 'gcp_conn_id', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceListOperationsOperator(request_filter, project_id=None, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Lists long-running operations in Google Storage Transfer Service that match the specified filter.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceListOperationsOperator

Parameters
  • request_filter (dict) – (Required) A request filter, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/list#body.QUERY_PARAMETERS.filter

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

property filter: dict | None[source]

Alias for request_filter, used for compatibility.

template_fields: Sequence[str] = ('request_filter', 'gcp_conn_id', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServicePauseOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Pauses a transfer operation in Google Storage Transfer Service.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServicePauseOperationOperator

Parameters
  • operation_name (str) – (Required) Name of the transfer operation.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceResumeOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Resumes a transfer operation in Google Storage Transfer Service.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceResumeOperationOperator

Parameters
  • operation_name (str) – (Required) Name of the transfer operation.

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • api_version (str) – API version used (e.g. v1).

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCancelOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Cancels a transfer operation in Google Storage Transfer Service.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceCancelOperationOperator

Parameters
  • operation_name (str) – (Required) Name of the transfer operation.

  • api_version (str) – API version used (e.g. v1).

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceS3ToGCSOperator(*, s3_bucket, gcs_bucket, s3_path=None, gcs_path=None, project_id=None, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, google_impersonation_chain=None, delete_job_after_completion=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Sync an S3 bucket with a Google Cloud Storage bucket using the Google Cloud Storage Transfer Service.

Warning

This operator is NOT idempotent. If you run it many times, many transfer jobs will be created in the Google Cloud.

Example:

s3_to_gcs_transfer_op = S3ToGoogleCloudStorageTransferOperator(
    task_id="s3_to_gcs_transfer_example",
    s3_bucket="my-s3-bucket",
    project_id="my-gcp-project",
    gcs_bucket="my-gcs-bucket",
    dag=my_dag,
)
Parameters
  • s3_bucket (str) – The S3 bucket where to find the objects. (templated)

  • gcs_bucket (str) – The destination Google Cloud Storage bucket where you want to store the files. (templated)

  • s3_path (str | None) – Optional root path where the source objects are. (templated)

  • gcs_path (str | None) – Optional root path for transferred objects. (templated)

  • project_id (str | None) – Optional ID of the Google Cloud Console project that owns the job

  • aws_conn_id (str | None) – The source S3 connection

  • gcp_conn_id (str) – The destination connection ID to use when connecting to Google Cloud Storage.

  • description (str | None) – Optional transfer service job description

  • schedule (dict | None) –

    Optional transfer service schedule; If not set, run transfer job once as soon as the operator runs The format is described https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs. With two additional improvements:

  • object_conditions (dict | None) – Optional transfer service object conditions; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec

  • transfer_options (dict | None) – Optional transfer service transfer options; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec

  • wait (bool) – Wait for transfer to finish. It must be set to True, if ‘delete_job_after_completion’ is set to True.

  • timeout (float | None) – Time to wait for the operation to end in seconds. Defaults to 60 seconds if not specified.

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • delete_job_after_completion (bool) – If True, delete the job after complete. If set to True, ‘wait’ must be set to True.

template_fields: Sequence[str] = ('gcp_conn_id', 's3_bucket', 'gcs_bucket', 's3_path', 'gcs_path', 'description',...[source]
ui_color = '#e09411'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceGCSToGCSOperator(*, source_bucket, destination_bucket, source_path=None, destination_path=None, project_id=None, gcp_conn_id='google_cloud_default', description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, google_impersonation_chain=None, delete_job_after_completion=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Copies objects from a bucket to another using the Google Cloud Storage Transfer Service.

Warning

This operator is NOT idempotent. If you run it many times, many transfer jobs will be created in the Google Cloud.

See also

For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceGCSToGCSOperator

Example:

gcs_to_gcs_transfer_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator(
    task_id="gcs_to_gcs_transfer_example",
    source_bucket="my-source-bucket",
    destination_bucket="my-destination-bucket",
    project_id="my-gcp-project",
    dag=my_dag,
)
Parameters
  • source_bucket (str) – The source Google Cloud Storage bucket where the object is. (templated)

  • destination_bucket (str) – The destination Google Cloud Storage bucket where the object should be. (templated)

  • source_path (str | None) – Optional root path where the source objects are. (templated)

  • destination_path (str | None) – Optional root path for transferred objects. (templated)

  • project_id (str | None) – The ID of the Google Cloud Console project that owns the job

  • gcp_conn_id (str) – Optional connection ID to use when connecting to Google Cloud Storage.

  • description (str | None) – Optional transfer service job description

  • schedule (dict | None) –

    Optional transfer service schedule; If not set, run transfer job once as soon as the operator runs See: https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs. With two additional improvements:

  • object_conditions (dict | None) – Optional transfer service object conditions; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#ObjectConditions

  • transfer_options (dict | None) – Optional transfer service transfer options; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#TransferOptions

  • wait (bool) – Wait for transfer to finish. It must be set to True, if ‘delete_job_after_completion’ is set to True.

  • timeout (float | None) – Time to wait for the operation to end in seconds. Defaults to 60 seconds if not specified.

  • google_impersonation_chain (str | Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • delete_job_after_completion (bool) – If True, delete the job after complete. If set to True, ‘wait’ must be set to True.

template_fields: Sequence[str] = ('gcp_conn_id', 'source_bucket', 'destination_bucket', 'source_path', 'destination_path',...[source]
ui_color = '#e09411'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

Was this entry helpful?