:mod:`airflow.contrib.operators.gcp_transfer_operator`
======================================================

.. py:module:: airflow.contrib.operators.gcp_transfer_operator


Module Contents
---------------

.. data:: AwsHook

.. py:class:: TransferJobPreprocessor(body, aws_conn_id='aws_default')

   .. method:: _inject_aws_credentials(self)

   .. method:: _reformat_date(self, field_key)

   .. method:: _reformat_time(self, field_key)

   .. method:: _reformat_schedule(self)

   .. method:: process_body(self)

   .. staticmethod:: _convert_date_to_dict(field_date)

      Convert a native Python ``datetime.date`` object to a format supported by the API

   .. staticmethod:: _convert_time_to_dict(time)

      Convert a native Python ``datetime.time`` object to a format supported by the API

.. py:class:: TransferJobValidator(body)

   .. method:: _verify_data_source(self)

   .. method:: _restrict_aws_credentials(self)

   .. method:: _restrict_empty_body(self)

   .. method:: validate_body(self)

.. py:class:: GcpTransferServiceJobCreateOperator(body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Creates a transfer job that runs periodically.

   .. warning::

      This operator is NOT idempotent. If you run it many times, many transfer
      jobs will be created in the Google Cloud Platform.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceJobCreateOperator`

   :param body: (Required) The request body, as described in
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/create#request-body,
       with three additional improvements:

       * dates can be given in the form :class:`datetime.date`
       * times can be given in the form :class:`datetime.time`
       * credentials to Amazon Web Services should be stored in the connection
         and indicated by the aws_conn_id parameter
   :type body: dict
   :param aws_conn_id: The connection ID used to retrieve credentials to Amazon Web Services.
   :type aws_conn_id: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['body', 'gcp_conn_id', 'aws_conn_id']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)
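   A minimal usage sketch, assuming a GCS-to-GCS transfer; the bucket,
   project, and ``my_dag`` names are placeholders, and the body keys follow
   the REST request schema linked above:

   .. code-block:: python

       import datetime

       # The schedule dates and start time are native Python objects; the
       # operator converts them into the representation the API expects.
       create_transfer_job_body = {
           'description': 'example transfer job',
           'status': 'ENABLED',
           'projectId': 'my-gcp-project',
           'schedule': {
               'scheduleStartDate': datetime.date(2019, 1, 1),
               'scheduleEndDate': datetime.date(2019, 12, 31),
               'startTimeOfDay': datetime.time(1, 0),
           },
           'transferSpec': {
               'gcsDataSource': {'bucketName': 'my-source-bucket'},
               'gcsDataSink': {'bucketName': 'my-destination-bucket'},
           },
       }

       create_transfer_job_op = GcpTransferServiceJobCreateOperator(
           task_id='create_transfer_job_example',
           body=create_transfer_job_body,
           dag=my_dag)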
.. py:class:: GcpTransferServiceJobUpdateOperator(job_name, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Updates a transfer job that runs periodically.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceJobUpdateOperator`

   :param job_name: (Required) Name of the job to be updated
   :type job_name: str
   :param body: (Required) The request body, as described in
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/patch#request-body,
       with three additional improvements:

       * dates can be given in the form :class:`datetime.date`
       * times can be given in the form :class:`datetime.time`
       * credentials to Amazon Web Services should be stored in the connection
         and indicated by the aws_conn_id parameter
   :type body: dict
   :param aws_conn_id: The connection ID used to retrieve credentials to Amazon Web Services.
   :type aws_conn_id: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['job_name', 'body', 'gcp_conn_id', 'aws_conn_id']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)

.. py:class:: GcpTransferServiceJobDeleteOperator(job_name, gcp_conn_id='google_cloud_default', api_version='v1', project_id=None, *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Deletes a transfer job. This is a soft delete. After a transfer job is
   deleted, the job and all the transfer executions are subject to garbage
   collection. Transfer jobs become eligible for garbage collection 30 days
   after soft delete.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceJobDeleteOperator`

   :param job_name: (Required) Name of the transfer job to delete.
   :type job_name: str
   :param project_id: (Optional) The ID of the project that owns the Transfer
       Job. If set to None or missing, the default project_id from the GCP
       connection is used.
   :type project_id: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['job_name', 'project_id', 'gcp_conn_id', 'api_version']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)

.. py:class:: GcpTransferServiceOperationGetOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Gets the latest state of a long-running operation in Google Storage Transfer Service.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceOperationGetOperator`

   :param operation_name: (Required) Name of the transfer operation.
   :type operation_name: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['operation_name', 'gcp_conn_id']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)

.. py:class:: GcpTransferServiceOperationsListOperator(filter, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Lists long-running operations in Google Storage Transfer Service that match the specified filter.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceOperationsListOperator`

   :param filter: (Required) A request filter, as described in
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/list#body.QUERY_PARAMETERS.filter
   :type filter: dict
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['filter', 'gcp_conn_id']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)
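   A sketch listing the operations spawned by one transfer job; the project
   and job names are placeholders, and the snake_case filter keys follow the
   REST filter format linked above:

   .. code-block:: python

       # Returns the matching transfer operations; the result is pushed to
       # XCom, so downstream tasks can inspect individual operation names.
       list_operations_op = GcpTransferServiceOperationsListOperator(
           task_id='list_operations_example',
           filter={
               'project_id': 'my-gcp-project',
               'job_names': ['transferJobs/example-transfer-job'],
           },
           dag=my_dag)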
.. py:class:: GcpTransferServiceOperationPauseOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Pauses a transfer operation in Google Storage Transfer Service.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceOperationPauseOperator`

   :param operation_name: (Required) Name of the transfer operation.
   :type operation_name: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['operation_name', 'gcp_conn_id', 'api_version']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)

.. py:class:: GcpTransferServiceOperationResumeOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Resumes a transfer operation in Google Storage Transfer Service.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceOperationResumeOperator`

   :param operation_name: (Required) Name of the transfer operation.
   :type operation_name: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str

   .. attribute:: template_fields
      :annotation: = ['operation_name', 'gcp_conn_id', 'api_version']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)
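   A sketch pausing and later resuming the same operation; the operation name
   is a placeholder, and since ``operation_name`` is templated it could also
   be pulled from XCom (e.g. from an upstream
   :class:`GcpTransferServiceOperationsListOperator` task):

   .. code-block:: python

       pause_operation_op = GcpTransferServiceOperationPauseOperator(
           task_id='pause_operation_example',
           operation_name='transferOperations/example-operation',
           dag=my_dag)

       resume_operation_op = GcpTransferServiceOperationResumeOperator(
           task_id='resume_operation_example',
           operation_name='transferOperations/example-operation',
           dag=my_dag)

       # Resume only after the pause task has completed.
       pause_operation_op >> resume_operation_op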
.. py:class:: GcpTransferServiceOperationCancelOperator(operation_name, api_version='v1', gcp_conn_id='google_cloud_default', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Cancels a transfer operation in Google Storage Transfer Service.

   .. seealso::

      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:GcpTransferServiceOperationCancelOperator`

   :param operation_name: (Required) Name of the transfer operation.
   :type operation_name: str
   :param api_version: API version used (e.g. v1).
   :type api_version: str
   :param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
   :type gcp_conn_id: str

   .. attribute:: template_fields
      :annotation: = ['operation_name', 'gcp_conn_id', 'api_version']

   .. method:: _validate_inputs(self)

   .. method:: execute(self, context)

.. py:class:: S3ToGoogleCloudStorageTransferOperator(s3_bucket, gcs_bucket, project_id=None, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', delegate_to=None, description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Synchronizes an S3 bucket with a Google Cloud Storage bucket using the
   GCP Storage Transfer Service.

   .. warning::

      This operator is NOT idempotent. If you run it many times, many transfer
      jobs will be created in the Google Cloud Platform.

   **Example**:

   .. code-block:: python

      s3_to_gcs_transfer_op = S3ToGoogleCloudStorageTransferOperator(
           task_id='s3_to_gcs_transfer_example',
           s3_bucket='my-s3-bucket',
           project_id='my-gcp-project',
           gcs_bucket='my-gcs-bucket',
           dag=my_dag)

   :param s3_bucket: The S3 bucket where to find the objects. (templated)
   :type s3_bucket: str
   :param gcs_bucket: The destination Google Cloud Storage bucket where you
       want to store the files. (templated)
   :type gcs_bucket: str
   :param project_id: Optional ID of the Google Cloud Platform Console project
       that owns the job
   :type project_id: str
   :param aws_conn_id: The source S3 connection
   :type aws_conn_id: str
   :param gcp_conn_id: The destination connection ID to use when connecting to
       Google Cloud Storage.
   :type gcp_conn_id: str
   :param delegate_to: The account to impersonate, if any. For this to work,
       the service account making the request must have domain-wide delegation
       enabled.
   :type delegate_to: str
   :param description: Optional transfer service job description
   :type description: str
   :param schedule: Optional transfer service schedule; if not set, the
       transfer job runs once as soon as the operator runs. The format is
       described at
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs,
       with two additional improvements:

       * dates can be passed as :class:`datetime.date`
       * times can be passed as :class:`datetime.time`
   :type schedule: dict
   :param object_conditions: Optional transfer service object conditions; see
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
   :type object_conditions: dict
   :param transfer_options: Optional transfer service transfer options; see
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
   :type transfer_options: dict
   :param wait: Wait for transfer to finish; defaults to ``True``
   :type wait: bool
   :param timeout: Time to wait for the operation to end in seconds
   :type timeout: int

   .. attribute:: template_fields
      :annotation: = ['gcp_conn_id', 's3_bucket', 'gcs_bucket', 'description', 'object_conditions']

   .. attribute:: ui_color
      :annotation: = #e09411

   .. method:: execute(self, context)

   .. method:: _create_body(self)
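   A sketch of a scheduled, filtered transfer, assuming placeholder bucket and
   project names; the ``schedule`` and ``object_conditions`` keys follow the
   REST resources linked above:

   .. code-block:: python

       import datetime

       # Run daily at 01:00 UTC for a year, copying only objects under the
       # 'data/' prefix; dates and times are passed as native Python objects.
       s3_to_gcs_scheduled_op = S3ToGoogleCloudStorageTransferOperator(
           task_id='s3_to_gcs_scheduled_example',
           s3_bucket='my-s3-bucket',
           gcs_bucket='my-gcs-bucket',
           project_id='my-gcp-project',
           description='daily S3 to GCS sync',
           schedule={
               'scheduleStartDate': datetime.date(2019, 1, 1),
               'scheduleEndDate': datetime.date(2019, 12, 31),
               'startTimeOfDay': datetime.time(1, 0),
           },
           object_conditions={'includePrefixes': ['data/']},
           wait=False,
           dag=my_dag)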
.. py:class:: GoogleCloudStorageToGoogleCloudStorageTransferOperator(source_bucket, destination_bucket, project_id=None, gcp_conn_id='google_cloud_default', delegate_to=None, description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Copies objects from one bucket to another using the GCP Storage Transfer Service.

   .. warning::

      This operator is NOT idempotent. If you run it many times, many transfer
      jobs will be created in the Google Cloud Platform.

   **Example**:

   .. code-block:: python

      gcs_to_gcs_transfer_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator(
           task_id='gcs_to_gcs_transfer_example',
           source_bucket='my-source-bucket',
           destination_bucket='my-destination-bucket',
           project_id='my-gcp-project',
           dag=my_dag)

   :param source_bucket: The source Google Cloud Storage bucket where the
       object is. (templated)
   :type source_bucket: str
   :param destination_bucket: The destination Google Cloud Storage bucket
       where the object should be. (templated)
   :type destination_bucket: str
   :param project_id: The ID of the Google Cloud Platform Console project that
       owns the job
   :type project_id: str
   :param gcp_conn_id: Optional connection ID to use when connecting to Google
       Cloud Storage.
   :type gcp_conn_id: str
   :param delegate_to: The account to impersonate, if any. For this to work,
       the service account making the request must have domain-wide delegation
       enabled.
   :type delegate_to: str
   :param description: Optional transfer service job description
   :type description: str
   :param schedule: Optional transfer service schedule; if not set, the
       transfer job runs once as soon as the operator runs. See:
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs,
       with two additional improvements:

       * dates can be passed as :class:`datetime.date`
       * times can be passed as :class:`datetime.time`
   :type schedule: dict
   :param object_conditions: Optional transfer service object conditions; see
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#ObjectConditions
   :type object_conditions: dict
   :param transfer_options: Optional transfer service transfer options; see
       https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#TransferOptions
   :type transfer_options: dict
   :param wait: Wait for transfer to finish; defaults to ``True``
   :type wait: bool
   :param timeout: Time to wait for the operation to end in seconds
   :type timeout: int

   .. attribute:: template_fields
      :annotation: = ['gcp_conn_id', 'source_bucket', 'destination_bucket', 'description', 'object_conditions']

   .. attribute:: ui_color
      :annotation: = #e09411

   .. method:: execute(self, context)

   .. method:: _create_body(self)
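   A sketch of a one-way mirror between two buckets using ``transfer_options``,
   with placeholder bucket and project names; the option keys follow the
   TransferOptions resource linked above:

   .. code-block:: python

       # Overwrite changed objects and delete sink objects that no longer
       # exist in the source, so the destination mirrors the source bucket.
       gcs_mirror_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator(
           task_id='gcs_mirror_example',
           source_bucket='my-source-bucket',
           destination_bucket='my-destination-bucket',
           project_id='my-gcp-project',
           transfer_options={
               'overwriteObjectsAlreadyExistingInSink': True,
               'deleteObjectsUniqueInSink': True,
           },
           timeout=3600,  # allow up to an hour before timing out the wait
           dag=my_dag)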