airflow.providers.google.cloud.operators.cloud_storage_transfer_service
¶
This module contains Google Cloud Transfer operators.
Module Contents¶
Classes¶
Helper class for preprocess of transfer job body. |
|
Helper class for validating transfer job body. |
|
Creates a transfer job that runs periodically. |
|
Updates a transfer job that runs periodically. |
|
Delete a transfer job. |
|
Runs a transfer job. |
|
Gets the latest state of a long-running operation in Google Storage Transfer Service. |
|
Lists long-running operations in Google Storage Transfer Service that match the specified filter. |
|
Pauses a transfer operation in Google Storage Transfer Service. |
|
Resumes a transfer operation in Google Storage Transfer Service. |
|
Cancels a transfer operation in Google Storage Transfer Service. |
|
Sync an S3 bucket with a Google Cloud Storage bucket using the Google Cloud Storage Transfer Service. |
|
Copies objects from a bucket to another using the Google Cloud Storage Transfer Service. |
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.TransferJobPreprocessor(body, aws_conn_id='aws_default', default_schedule=False)[source]¶
Helper class for preprocess of transfer job body.
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.TransferJobValidator(body)[source]¶
Helper class for validating transfer job body.
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCreateJobOperator(*, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', project_id=PROVIDE_PROJECT_ID, google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a transfer job that runs periodically.
Warning
This operator is NOT idempotent in the following cases:
name is not passed in body param
transfer job name has been soft deleted. In this case, each new task will receive a unique suffix
If you run it many times, many transfer jobs will be created in the Google Cloud.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceCreateJobOperator
- Parameters
body (dict) –
(Required) The request body, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs#TransferJob With three additional improvements:
dates can be given in the form
datetime.date
times can be given in the form
datetime.time
credentials to Amazon Web Service should be stored in the connection and indicated by the aws_conn_id parameter
aws_conn_id (str | None) – The connection ID used to retrieve credentials to Amazon Web Service.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('body', 'gcp_conn_id', 'aws_conn_id', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceUpdateJobOperator(*, job_name, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', project_id=PROVIDE_PROJECT_ID, google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Updates a transfer job that runs periodically.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceRunJobOperator
- Parameters
job_name (str) – (Required) Name of the job to be updated
body (dict) –
(Required) The request body, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/patch#request-body With three additional improvements:
dates can be given in the form
datetime.date
times can be given in the form
datetime.time
credentials to Amazon Web Service should be stored in the connection and indicated by the aws_conn_id parameter
aws_conn_id (str | None) – The connection ID used to retrieve credentials to Amazon Web Service.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('job_name', 'body', 'gcp_conn_id', 'aws_conn_id', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceDeleteJobOperator(*, job_name, gcp_conn_id='google_cloud_default', api_version='v1', project_id=PROVIDE_PROJECT_ID, google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Delete a transfer job.
This is a soft delete. After a transfer job is deleted, the job and all the transfer executions are subject to garbage collection. Transfer jobs become eligible for garbage collection 30 days after soft delete.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceDeleteJobOperator
- Parameters
job_name (str) – (Required) Name of the TRANSFER operation
project_id (str) – (Optional) the ID of the project that owns the Transfer Job. If set to None or missing, the default project_id from the Google Cloud connection is used.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('job_name', 'project_id', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceRunJobOperator(*, job_name, gcp_conn_id='google_cloud_default', api_version='v1', project_id=PROVIDE_PROJECT_ID, google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Runs a transfer job.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceUpdateJobOperator
- Parameters
job_name (str) – (Required) Name of the job to be run
project_id (str) – (Optional) the ID of the project that owns the Transfer Job. If set to None or missing, the default project_id from the Google Cloud connection is used.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('job_name', 'project_id', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceGetOperationOperator(*, project_id=PROVIDE_PROJECT_ID, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Gets the latest state of a long-running operation in Google Storage Transfer Service.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceGetOperationOperator
- Parameters
operation_name (str) – (Required) Name of the transfer operation.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('operation_name', 'gcp_conn_id', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceListOperationsOperator(request_filter, project_id=PROVIDE_PROJECT_ID, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Lists long-running operations in Google Storage Transfer Service that match the specified filter.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceListOperationsOperator
- Parameters
request_filter (dict) – (Required) A request filter, as described in https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/list#body.QUERY_PARAMETERS.filter
gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('request_filter', 'gcp_conn_id', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServicePauseOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Pauses a transfer operation in Google Storage Transfer Service.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServicePauseOperationOperator
- Parameters
operation_name (str) – (Required) Name of the transfer operation.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceResumeOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Resumes a transfer operation in Google Storage Transfer Service.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceResumeOperationOperator
- Parameters
operation_name (str) – (Required) Name of the transfer operation.
gcp_conn_id (str) – The connection ID used to connect to Google Cloud.
api_version (str) – API version used (e.g. v1).
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCancelOperationOperator(*, operation_name, gcp_conn_id='google_cloud_default', api_version='v1', google_impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Cancels a transfer operation in Google Storage Transfer Service.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceCancelOperationOperator
- Parameters
operation_name (str) – (Required) Name of the transfer operation.
api_version (str) – API version used (e.g. v1).
gcp_conn_id (str) – The connection ID used to connect to Google Cloud Platform.
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('operation_name', 'gcp_conn_id', 'api_version', 'google_impersonation_chain')[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceS3ToGCSOperator(*, s3_bucket, gcs_bucket, s3_path=None, gcs_path=None, project_id=PROVIDE_PROJECT_ID, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, google_impersonation_chain=None, delete_job_after_completion=False, aws_role_arn=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Sync an S3 bucket with a Google Cloud Storage bucket using the Google Cloud Storage Transfer Service.
Warning
This operator is NOT idempotent. If you run it many times, many transfer jobs will be created in the Google Cloud.
Example:
s3_to_gcs_transfer_op = S3ToGoogleCloudStorageTransferOperator( task_id="s3_to_gcs_transfer_example", s3_bucket="my-s3-bucket", project_id="my-gcp-project", gcs_bucket="my-gcs-bucket", dag=my_dag, )
- Parameters
s3_bucket (str) – The S3 bucket where to find the objects. (templated)
gcs_bucket (str) – The destination Google Cloud Storage bucket where you want to store the files. (templated)
s3_path (str | None) – Optional root path where the source objects are. (templated)
gcs_path (str | None) – Optional root path for transferred objects. (templated)
project_id (str) – Optional ID of the Google Cloud Console project that owns the job
aws_conn_id (str | None) – The source S3 connection
gcp_conn_id (str) – The destination connection ID to use when connecting to Google Cloud Storage.
description (str | None) – Optional transfer service job description
schedule (dict | None) –
Optional transfer service schedule; If not set, run transfer job once as soon as the operator runs The format is described https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs. With two additional improvements:
dates they can be passed as
datetime.date
times they can be passed as
datetime.time
object_conditions (dict | None) – Optional transfer service object conditions; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
transfer_options (dict | None) – Optional transfer service transfer options; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
wait (bool) – Wait for transfer to finish. It must be set to True, if ‘delete_job_after_completion’ is set to True.
timeout (float | None) – Time to wait for the operation to end in seconds. Defaults to 60 seconds if not specified.
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
delete_job_after_completion (bool) – If True, delete the job after complete. If set to True, ‘wait’ must be set to True.
aws_role_arn (str | None) – Optional AWS role ARN for workload identity federation. This will override the aws_conn_id for authentication between GCP and AWS; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#AwsS3Data
- template_fields: collections.abc.Sequence[str] = ('gcp_conn_id', 's3_bucket', 'gcs_bucket', 's3_path', 'gcs_path', 'description',...[source]¶
- class airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceGCSToGCSOperator(*, source_bucket, destination_bucket, source_path=None, destination_path=None, project_id=PROVIDE_PROJECT_ID, gcp_conn_id='google_cloud_default', description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, google_impersonation_chain=None, delete_job_after_completion=False, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Copies objects from a bucket to another using the Google Cloud Storage Transfer Service.
Warning
This operator is NOT idempotent. If you run it many times, many transfer jobs will be created in the Google Cloud.
See also
For more information on how to use this operator, take a look at the guide: CloudDataTransferServiceGCSToGCSOperator
Example:
gcs_to_gcs_transfer_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator( task_id="gcs_to_gcs_transfer_example", source_bucket="my-source-bucket", destination_bucket="my-destination-bucket", project_id="my-gcp-project", dag=my_dag, )
- Parameters
source_bucket (str) – The source Google Cloud Storage bucket where the object is. (templated)
destination_bucket (str) – The destination Google Cloud Storage bucket where the object should be. (templated)
source_path (str | None) – Optional root path where the source objects are. (templated)
destination_path (str | None) – Optional root path for transferred objects. (templated)
project_id (str) – The ID of the Google Cloud Console project that owns the job
gcp_conn_id (str) – Optional connection ID to use when connecting to Google Cloud Storage.
description (str | None) – Optional transfer service job description
schedule (dict | None) –
Optional transfer service schedule; If not set, run transfer job once as soon as the operator runs See: https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs. With two additional improvements:
dates they can be passed as
datetime.date
times they can be passed as
datetime.time
object_conditions (dict | None) – Optional transfer service object conditions; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#ObjectConditions
transfer_options (dict | None) – Optional transfer service transfer options; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#TransferOptions
wait (bool) – Wait for transfer to finish. It must be set to True, if ‘delete_job_after_completion’ is set to True.
timeout (float | None) – Time to wait for the operation to end in seconds. Defaults to 60 seconds if not specified.
google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional Google service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
delete_job_after_completion (bool) – If True, delete the job after complete. If set to True, ‘wait’ must be set to True.
- template_fields: collections.abc.Sequence[str] = ('gcp_conn_id', 'source_bucket', 'destination_bucket', 'source_path', 'destination_path',...[source]¶