airflow.contrib.operators.gcp_dlp_operator

This module contains various GCP Cloud DLP operators which allow you to perform basic operations using Cloud DLP.

Module Contents

class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCancelDLPJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Starts asynchronous cancellation on a long-running DlpJob.

Parameters
  • dlp_job_id (str) – ID of the DLP job resource to be cancelled.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['dlp_job_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
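
The retry, timeout, and metadata parameters described above are shared by every operator in this module. The following is a minimal sketch of how they might be passed, not a definitive configuration: the task ID, the header name, and all numeric values are illustrative assumptions, and the Airflow and google-api-core imports are kept inside the factory function so the snippet loads even where those packages are absent.

```python
# Shared per-call options; every value here is an illustrative assumption.
CALL_OPTIONS = {
    "timeout": 10.0,  # seconds; applies to each attempt when retry is set
    "metadata": [("x-example-header", "demo")],  # hypothetical header pair
}


def make_cancel_task(dag, dlp_job_id):
    """Return a task that starts cancellation of the given DLP job.

    The imports are local so this module can be loaded without Airflow.
    """
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCancelDLPJobOperator,
    )
    from google.api_core.retry import Retry

    return CloudDLPCancelDLPJobOperator(
        task_id="cancel_dlp_job",  # hypothetical task ID
        dlp_job_id=dlp_job_id,
        # Exponential backoff between attempts, giving up after 60 seconds.
        retry=Retry(initial=1.0, maximum=10.0, multiplier=2.0, deadline=60.0),
        dag=dag,
        **CALL_OPTIONS
    )
```

Leaving retry at its default of None means a failed request is not retried, so the timeout then bounds the single attempt.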
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDeidentifyTemplateOperator(organization_id=None, project_id=None, deidentify_template=None, template_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a DeidentifyTemplate for re-using frequently used configuration for de-identifying content, images, and storage.

Parameters
  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • deidentify_template (dict or google.cloud.dlp_v2.types.DeidentifyTemplate) – (Optional) The DeidentifyTemplate to create.

  • template_id (str) – (Optional) The template ID.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DeidentifyTemplate

template_fields = ['organization_id', 'project_id', 'deidentify_template', 'template_id', 'gcp_conn_id'][source]
execute(self, context)[source]
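
Because deidentify_template accepts a plain dict, the template can be assembled in ordinary Python before it is handed to the operator. The sketch below builds one possible template that replaces each finding with a fixed string; the helper names, the template ID, the display name, and the chosen infoTypes are assumptions made for illustration.

```python
def build_deidentify_template(info_type_names, display_name, replacement):
    """Return a DeidentifyTemplate-shaped dict.

    Every finding of the listed infoTypes is replaced by `replacement`.
    """
    return {
        "display_name": display_name,
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [{"name": n} for n in info_type_names],
                        "primitive_transformation": {
                            "replace_config": {
                                "new_value": {"string_value": replacement}
                            }
                        },
                    }
                ]
            }
        },
    }


TEMPLATE = build_deidentify_template(
    ["EMAIL_ADDRESS", "PHONE_NUMBER"], "Redact contact details", "[REDACTED]"
)


def make_create_deidentify_template_task(dag, project_id):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateDeidentifyTemplateOperator,
    )

    return CloudDLPCreateDeidentifyTemplateOperator(
        task_id="create_deidentify_template",  # hypothetical task ID
        project_id=project_id,
        deidentify_template=TEMPLATE,
        template_id="redact-contact-details",  # hypothetical template ID
        dag=dag,
    )
```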
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDLPJobOperator(project_id=None, inspect_job=None, risk_job=None, job_id=None, retry=None, timeout=None, metadata=None, wait_until_finished=True, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a new job to inspect storage or calculate risk metrics.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • inspect_job (dict or google.cloud.dlp_v2.types.InspectJobConfig) – (Optional) The configuration for the inspect job.

  • risk_job (dict or google.cloud.dlp_v2.types.RiskAnalysisJobConfig) – (Optional) The configuration for the risk job.

  • job_id (str) – (Optional) The job ID.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • wait_until_finished (bool) – (Optional) If True, the operator keeps polling the job state until it is DONE.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DlpJob

template_fields = ['project_id', 'inspect_job', 'risk_job', 'job_id', 'gcp_conn_id'][source]
execute(self, context)[source]
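
As an illustration of the inspect_job parameter, the sketch below builds an InspectJobConfig-shaped dict that scans objects in a Cloud Storage location. The bucket URL, task ID, and infoTypes are placeholders chosen for the example, and only inspect_job is passed; risk_job is left at its default.

```python
def build_inspect_job(bucket_url, info_type_names):
    """Return an InspectJobConfig-shaped dict for a Cloud Storage scan."""
    return {
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": bucket_url}}
        },
        "inspect_config": {
            "info_types": [{"name": n} for n in info_type_names]
        },
    }


# Placeholder bucket; replace with a location the connection can read.
INSPECT_JOB = build_inspect_job("gs://example-bucket/*", ["EMAIL_ADDRESS"])


def make_create_job_task(dag, project_id=None):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateDLPJobOperator,
    )

    return CloudDLPCreateDLPJobOperator(
        task_id="create_dlp_job",  # hypothetical task ID
        project_id=project_id,  # None falls back to the connection's project
        inspect_job=INSPECT_JOB,
        wait_until_finished=True,  # poll until the job reaches DONE
        dag=dag,
    )
```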
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateInspectTemplateOperator(organization_id=None, project_id=None, inspect_template=None, template_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates an InspectTemplate for re-using frequently used configuration for inspecting content, images, and storage.

Parameters
  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • inspect_template (dict or google.cloud.dlp_v2.types.InspectTemplate) – (Optional) The InspectTemplate to create.

  • template_id (str) – (Optional) The template ID.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.InspectTemplate

template_fields = ['organization_id', 'project_id', 'inspect_template', 'template_id', 'gcp_conn_id'][source]
execute(self, context)[source]
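
The organization_id and project_id parameters name alternative parents for the template. One way to keep that choice explicit is sketched below. The rule that exactly one of the two must be given is an assumption of this sketch (stricter than the parameter descriptions require), and the helper names, task ID, and template ID are hypothetical.

```python
def resolve_parent(organization_id=None, project_id=None):
    """Return the keyword arguments that name the parent resource.

    Assumption: exactly one ID is supplied. Both or neither is rejected
    here rather than relying on any precedence between the two.
    """
    if bool(organization_id) == bool(project_id):
        raise ValueError("Set exactly one of organization_id or project_id.")
    if organization_id:
        return {"organization_id": organization_id}
    return {"project_id": project_id}


def build_inspect_template(info_type_names, display_name):
    """Return an InspectTemplate-shaped dict."""
    return {
        "display_name": display_name,
        "inspect_config": {
            "info_types": [{"name": n} for n in info_type_names]
        },
    }


def make_create_inspect_template_task(dag, template, **parent):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateInspectTemplateOperator,
    )

    return CloudDLPCreateInspectTemplateOperator(
        task_id="create_inspect_template",  # hypothetical task ID
        inspect_template=template,
        template_id="contact-details",  # hypothetical template ID
        dag=dag,
        **resolve_parent(**parent)
    )
```

For example, make_create_inspect_template_task(dag, template, project_id="my-project") would create a project-level template, where "my-project" is a placeholder.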
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateJobTriggerOperator(project_id=None, job_trigger=None, trigger_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a job trigger to run DLP actions such as scanning storage for sensitive information on a set schedule.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • job_trigger (dict or google.cloud.dlp_v2.types.JobTrigger) – (Optional) The JobTrigger to create.

  • trigger_id (str) – (Optional) The JobTrigger ID.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.JobTrigger

template_fields = ['project_id', 'job_trigger', 'trigger_id', 'gcp_conn_id'][source]
execute(self, context)[source]
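
A job_trigger dict combines an inspect job with a schedule. The sketch below assumes a recurring scan of a Cloud Storage location every N days; the bucket URL, display name, task ID, and trigger ID are placeholders rather than values taken from this module.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def build_job_trigger(inspect_job, period_days, display_name):
    """Return a JobTrigger-shaped dict that reruns inspect_job on a schedule."""
    return {
        "display_name": display_name,
        "inspect_job": inspect_job,
        "triggers": [
            {
                "schedule": {
                    "recurrence_period_duration": {
                        "seconds": period_days * SECONDS_PER_DAY
                    }
                }
            }
        ],
        "status": "HEALTHY",
    }


JOB_TRIGGER = build_job_trigger(
    inspect_job={
        "storage_config": {
            # Placeholder bucket URL.
            "cloud_storage_options": {"file_set": {"url": "gs://example-bucket/*"}}
        },
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    },
    period_days=7,
    display_name="Weekly bucket scan",
)


def make_create_trigger_task(dag, project_id=None):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateJobTriggerOperator,
    )

    return CloudDLPCreateJobTriggerOperator(
        task_id="create_job_trigger",  # hypothetical task ID
        project_id=project_id,
        job_trigger=JOB_TRIGGER,
        trigger_id="weekly-bucket-scan",  # hypothetical trigger ID
        dag=dag,
    )
```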
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateStoredInfoTypeOperator(organization_id=None, project_id=None, config=None, stored_info_type_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a pre-built stored infoType to be used for inspection.

Parameters
  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • config (dict or google.cloud.dlp_v2.types.StoredInfoTypeConfig) – (Optional) The config for the StoredInfoType.

  • stored_info_type_id (str) – (Optional) The StoredInfoType ID.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.StoredInfoType

template_fields = ['organization_id', 'project_id', 'config', 'stored_info_type_id', 'gcp_conn_id'][source]
execute(self, context)[source]
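
The config parameter can likewise be given as a dict. As a rough sketch, the helper below builds a StoredInfoTypeConfig-shaped dict backed by a small word list; the words, display name, task ID, and stored infoType ID are invented for the example.

```python
def build_word_list_config(display_name, words):
    """Return a StoredInfoTypeConfig-shaped dict using a dictionary word list."""
    cleaned = [w.strip() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("At least one non-empty word is required.")
    return {
        "display_name": display_name,
        "dictionary": {"word_list": {"words": cleaned}},
    }


# Hypothetical internal code names to detect.
CONFIG = build_word_list_config("Project code names", ["bluebird", " falcon "])


def make_create_stored_info_type_task(dag, project_id):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateStoredInfoTypeOperator,
    )

    return CloudDLPCreateStoredInfoTypeOperator(
        task_id="create_stored_info_type",  # hypothetical task ID
        project_id=project_id,
        config=CONFIG,
        stored_info_type_id="project-code-names",  # hypothetical ID
        dag=dag,
    )
```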
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeidentifyContentOperator(project_id=None, deidentify_config=None, inspect_config=None, item=None, inspect_template_name=None, deidentify_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

De-identifies potentially sensitive info from a ContentItem. This method has limits on input size and output size.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • deidentify_config (dict or google.cloud.dlp_v2.types.DeidentifyConfig) – (Optional) Configuration for the de-identification of the content item. Items specified here will override the template referenced by the deidentify_template_name argument.

  • inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector. Items specified here will override the template referenced by the inspect_template_name argument.

  • item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to de-identify. Will be treated as text.

  • inspect_template_name (str) – (Optional) Name of the inspect template to use. Any configuration specified directly in inspect_config overrides the values set in the template.

  • deidentify_template_name (str) – (Optional) Name of the de-identify template to use. Any configuration specified directly in deidentify_config overrides the values set in the template.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DeidentifyContentResponse

template_fields = ['project_id', 'deidentify_config', 'inspect_config', 'item', 'inspect_template_name', 'deidentify_template_name', 'gcp_conn_id'][source]
execute(self, context)[source]
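
To make the relationship between item, inspect_config, and deidentify_config concrete, here is a small sketch that masks detected values in a text item. The sample text, masking character, infoTypes, and task ID are assumptions chosen for the example; no templates are referenced, so the inline configuration is all that applies.

```python
def build_deidentify_request(text, info_type_names, masking_character="#"):
    """Return the item, inspect_config, and deidentify_config arguments.

    Findings of the listed infoTypes are masked character by character.
    """
    info_types = [{"name": n} for n in info_type_names]
    return {
        "item": {"value": text},
        "inspect_config": {"info_types": info_types},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": info_types,
                        "primitive_transformation": {
                            "character_mask_config": {
                                "masking_character": masking_character
                            }
                        },
                    }
                ]
            }
        },
    }


REQUEST = build_deidentify_request(
    "Contact me at jane@example.com", ["EMAIL_ADDRESS"]
)


def make_deidentify_task(dag, project_id=None):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPDeidentifyContentOperator,
    )

    return CloudDLPDeidentifyContentOperator(
        task_id="deidentify_content",  # hypothetical task ID
        project_id=project_id,
        dag=dag,
        **REQUEST
    )
```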
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes a DeidentifyTemplate.

Parameters
  • template_id (str) – The ID of the deidentify template to be deleted.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['template_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDlpJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes a long-running DlpJob. This method indicates that the client is no longer interested in the DlpJob result. The job will be cancelled if possible.

Parameters
  • dlp_job_id (str) – The ID of the DLP job resource to be deleted.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['dlp_job_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteInspectTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes an InspectTemplate.

Parameters
  • template_id (str) – The ID of the inspect template to be deleted.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['template_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteJobTriggerOperator(job_trigger_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes a job trigger.

Parameters
  • job_trigger_id (str) – The ID of the DLP job trigger to be deleted.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['job_trigger_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes a stored infoType.

Parameters
  • stored_info_type_id (str) – The ID of the stored info type to be deleted.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['stored_info_type_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets a DeidentifyTemplate.

Parameters
  • template_id (str) – The ID of the deidentify template to be read.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DeidentifyTemplate

template_fields = ['template_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDlpJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets the latest state of a long-running DlpJob.

Parameters
  • dlp_job_id (str) – The ID of the DLP job resource to be read.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DlpJob

template_fields = ['dlp_job_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetInspectTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets an InspectTemplate.

Parameters
  • template_id (str) – The ID of the inspect template to be read.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.InspectTemplate

template_fields = ['template_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetJobTripperOperator(job_trigger_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets a job trigger.

Parameters
  • job_trigger_id (str) – The ID of the DLP job trigger to be read.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.JobTrigger

template_fields = ['job_trigger_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets a stored infoType.

Parameters
  • stored_info_type_id (str) – The ID of the stored info type to be read.

  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.StoredInfoType

template_fields = ['stored_info_type_id', 'organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPInspectContentOperator(project_id=None, inspect_config=None, item=None, inspect_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Finds potentially sensitive info in content. This method has limits on input size, processing time, and output size.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector. Items specified here will override the template referenced by the inspect_template_name argument.

  • item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to inspect. Will be treated as text.

  • inspect_template_name (str) – (Optional) Name of the inspect template to use. Any configuration specified directly in inspect_config overrides the values set in the template.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.InspectContentResponse

template_fields = ['project_id', 'inspect_config', 'item', 'inspect_template_name', 'gcp_conn_id'][source]
execute(self, context)[source]
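
The override rule between inspect_template_name and inspect_config can be illustrated with a short sketch. It assumes the template name follows the DLP resource-name form projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID; the project ID, template ID, task ID, and sample text are placeholders.

```python
def inspect_template_path(project_id, template_id):
    """Return a project-level inspect template resource name (assumed format)."""
    return "projects/{}/inspectTemplates/{}".format(project_id, template_id)


def build_inspect_request(text, template_name, extra_info_types=()):
    """Return keyword arguments for an inspection that starts from a template.

    Anything placed in inspect_config here takes precedence over the
    corresponding setting in the referenced template.
    """
    request = {"item": {"value": text}, "inspect_template_name": template_name}
    if extra_info_types:
        request["inspect_config"] = {
            "info_types": [{"name": n} for n in extra_info_types],
            "include_quote": True,  # return the matched text with each finding
        }
    return request


REQUEST = build_inspect_request(
    "Call 555-0100 or write to jane@example.com",
    inspect_template_path("my-project", "contact-details"),  # placeholders
    extra_info_types=["PHONE_NUMBER"],
)


def make_inspect_task(dag, project_id=None):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPInspectContentOperator,
    )

    return CloudDLPInspectContentOperator(
        task_id="inspect_content",  # hypothetical task ID
        project_id=project_id,
        dag=dag,
        **REQUEST
    )
```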
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDeidentifyTemplatesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Lists DeidentifyTemplates.

Parameters
  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.

  • order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

list[google.cloud.dlp_v2.types.DeidentifyTemplate]

template_fields = ['organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
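
The order_by string described above is easy to get wrong by hand, so a small helper can assemble it. This is a sketch only: the helper name, task ID, and page size are hypothetical, and the field names used (create_time, name) are assumed to be sortable fields of the listed resource.

```python
def build_order_by(*fields):
    """Join (field, direction) pairs into an order_by string.

    Each direction must be "asc" or "desc".
    """
    parts = []
    for name, direction in fields:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        parts.append("{} {}".format(name, direction))
    return ", ".join(parts)


ORDER_BY = build_order_by(("create_time", "desc"), ("name", "asc"))


def make_list_templates_task(dag, project_id):
    """Build the operator; the import is local so Airflow is optional here."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPListDeidentifyTemplatesOperator,
    )

    return CloudDLPListDeidentifyTemplatesOperator(
        task_id="list_deidentify_templates",  # hypothetical task ID
        project_id=project_id,
        page_size=50,  # illustrative page size
        order_by=ORDER_BY,
        dag=dag,
    )
```

Note that order_by does not appear in template_fields for this operator, so the string is passed through as written rather than rendered as a template.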
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDlpJobsOperator(project_id=None, results_filter=None, page_size=None, job_type=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Lists DlpJobs that match the specified filter in the request.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • results_filter (str) – (Optional) Filter used to specify a subset of results.

  • page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.

  • job_type (str) – (Optional) The type of job.

  • order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

list[google.cloud.dlp_v2.types.DlpJob]

template_fields = ['project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInfoTypesOperator(language_code=None, results_filter=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Returns a list of the sensitive information types that the DLP API supports.

Parameters
  • language_code (str) – (Optional) BCP-47 language code for localized infoType friendly names. If omitted, or if localized strings are not available, en-US strings will be returned.

  • results_filter (str) – (Optional) Filter used to specify a subset of results.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.ListInfoTypesResponse

template_fields = ['language_code', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInspectTemplatesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Lists InspectTemplates.

Parameters
  • organization_id (str) – (Optional) The organization ID. This field is required if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.

  • order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

list[google.cloud.dlp_v2.types.InspectTemplate]

template_fields = ['organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
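The order_by format and the organization/project choice described above can be sketched as two small helpers. This is an illustration only: the helper names, the field names create_time and name, and the project ID are assumptions, and the strict "exactly one parent" check is a choice made for this sketch rather than documented operator behaviour.

```python
def build_order_by(fields):
    """Join (field, direction) pairs into a comma-separated order_by string."""
    parts = []
    for field, direction in fields:
        direction = direction.lower()
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc', got %r" % direction)
        parts.append("%s %s" % (field, direction))
    return ",".join(parts)


def parent_kwargs(organization_id=None, project_id=None):
    """Return exactly one of organization_id / project_id as a kwargs dict."""
    if bool(organization_id) == bool(project_id):
        raise ValueError("set exactly one of organization_id or project_id")
    if organization_id:
        return {"organization_id": organization_id}
    return {"project_id": project_id}


# Keyword arguments that could be passed to CloudDLPListInspectTemplatesOperator.
list_templates_kwargs = dict(
    task_id="list_inspect_templates",  # hypothetical task ID
    page_size=50,
    order_by=build_order_by([("create_time", "desc"), ("name", "asc")]),
    **parent_kwargs(project_id="my-project")  # hypothetical project ID
)
```

The same helpers would apply to CloudDLPListStoredInfoTypesOperator, which takes the same organization_id, project_id, page_size, and order_by parameters.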
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListJobTriggersOperator(project_id=None, page_size=None, order_by=None, results_filter=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Lists job triggers.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.

  • order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.

  • results_filter (str) – (Optional) Filter used to specify a subset of results.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

list[google.cloud.dlp_v2.types.JobTrigger]

template_fields = ['project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListStoredInfoTypesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Lists stored infoTypes.

Parameters
  • organization_id (str) – (Optional) The organization ID. Must be set if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.

  • order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

list[google.cloud.dlp_v2.types.StoredInfoType]

template_fields = ['organization_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPRedactImageOperator(project_id=None, inspect_config=None, image_redaction_configs=None, include_findings=None, byte_item=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Redacts potentially sensitive info from an image. This method has limits on input size, processing time, and output size.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector.

  • image_redaction_configs (list[dict] or list[google.cloud.dlp_v2.types.ImageRedactionConfig]) – (Optional) The configuration for specifying what content to redact from images.

  • include_findings (bool) – (Optional) Whether the response should include findings along with the redacted image.

  • byte_item (dict or google.cloud.dlp_v2.types.ByteContentItem) – (Optional) The image content to redact. The content must be PNG, JPEG, SVG or BMP.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.RedactImageResponse

template_fields = ['project_id', 'inspect_config', 'image_redaction_configs', 'include_findings', 'byte_item', 'gcp_conn_id'][source]
execute(self, context)[source]
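As a rough illustration of the dict forms accepted by inspect_config, image_redaction_configs, and byte_item, the sketch below assembles them for a PNG image. The helper name, task ID, and infoType names are hypothetical, and passing the image type as the string "IMAGE_PNG" assumes the client accepts enum names in dict form.

```python
def redact_image_kwargs(image_bytes, info_type_names, image_type="IMAGE_PNG"):
    """Build dict-form arguments for CloudDLPRedactImageOperator."""
    info_types = [{"name": name} for name in info_type_names]
    return {
        "task_id": "redact_image",  # hypothetical task ID
        # Which infoTypes the inspector should look for.
        "inspect_config": {"info_types": info_types},
        # One redaction config per infoType to be blacked out in the image.
        "image_redaction_configs": [{"info_type": it} for it in info_types],
        "include_findings": True,
        # Assumed enum name for the image type; data holds the raw image bytes.
        "byte_item": {"type": image_type, "data": image_bytes},
    }


# Placeholder bytes (only the PNG signature); a real task would read a file.
placeholder_png = b"\x89PNG\r\n\x1a\n"
redact_kwargs = redact_image_kwargs(placeholder_png, ["EMAIL_ADDRESS", "PHONE_NUMBER"])
```

Because this method has limits on input size, a real DAG would probably check the size of the image before passing it to the operator.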
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPReidentifyContentOperator(project_id=None, reidentify_config=None, inspect_config=None, item=None, inspect_template_name=None, reidentify_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Re-identifies content that has been de-identified.

Parameters
  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • reidentify_config (dict or google.cloud.dlp_v2.types.DeidentifyConfig) – (Optional) Configuration for the re-identification of the content item.

  • inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector.

  • item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to re-identify. Will be treated as text.

  • inspect_template_name (str) – (Optional) Template to use. Any configuration directly specified in inspect_config will override the settings in the template.

  • reidentify_template_name (str) – (Optional) Template to use. References an instance of DeidentifyTemplate. Any configuration directly specified in reidentify_config or inspect_config will override the settings in the template.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.ReidentifyContentResponse

template_fields = ['project_id', 'reidentify_config', 'inspect_config', 'item', 'inspect_template_name', 'reidentify_template_name', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, deidentify_template=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Updates the DeidentifyTemplate.

Parameters
  • template_id (str) – The ID of the deidentify template to be updated.

  • organization_id (str) – (Optional) The organization ID. Must be set if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • deidentify_template (dict or google.cloud.dlp_v2.types.DeidentifyTemplate) – New DeidentifyTemplate value.

  • update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.DeidentifyTemplate

template_fields = ['template_id', 'organization_id', 'project_id', 'deidentify_template', 'update_mask', 'gcp_conn_id'][source]
execute(self, context)[source]
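One way to keep deidentify_template and update_mask consistent is to derive the mask from the fields being changed. The sketch below does this for top-level fields only. The helper name, task ID, and the template and project IDs are hypothetical, and it assumes the mask paths are the snake_case field names of DeidentifyTemplate.

```python
def update_template_kwargs(template_id, project_id, changes):
    """Pair a partial template dict with a FieldMask-style dict naming its keys."""
    if not changes:
        raise ValueError("changes must name at least one field to update")
    return {
        "task_id": "update_deidentify_template",  # hypothetical task ID
        "template_id": template_id,
        "project_id": project_id,
        # Only the fields that should change.
        "deidentify_template": dict(changes),
        # FieldMask in dict form: 'paths' lists the fields to overwrite.
        "update_mask": {"paths": sorted(changes)},
    }


update_kwargs = update_template_kwargs(
    "my-template",  # hypothetical template ID
    "my-project",   # hypothetical project ID
    {"display_name": "Mask emails", "description": "Replaces e-mail addresses"},
)
```

The resulting dictionary could be unpacked into CloudDLPUpdateDeidentifyTemplateOperator. The same pattern would fit CloudDLPUpdateInspectTemplateOperator with inspect_template in place of deidentify_template.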
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateInspectTemplateOperator(template_id, organization_id=None, project_id=None, inspect_template=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Updates the InspectTemplate.

Parameters
  • template_id (str) – The ID of the inspect template to be updated.

  • organization_id (str) – (Optional) The organization ID. Must be set if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • inspect_template (dict or google.cloud.dlp_v2.types.InspectTemplate) – New InspectTemplate value.

  • update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.InspectTemplate

template_fields = ['template_id', 'organization_id', 'project_id', 'inspect_template', 'update_mask', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateJobTriggerOperator(job_trigger_id, project_id=None, job_trigger=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Updates a job trigger.

Parameters
  • job_trigger_id (str) – The ID of the DLP job trigger to be updated.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.

  • job_trigger (dict or google.cloud.dlp_v2.types.JobTrigger) – New JobTrigger value.

  • update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.JobTrigger

template_fields = ['job_trigger_id', 'project_id', 'job_trigger', 'update_mask', 'gcp_conn_id'][source]
execute(self, context)[source]
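A hedged sketch of changing only the schedule of an existing job trigger: job_trigger carries the new schedule and update_mask limits the update to the triggers field. The helper name, task ID, and trigger ID are hypothetical, and the nested keys assume the dict form of JobTrigger mirrors the message fields triggers, schedule, and recurrence_period_duration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reschedule_trigger_kwargs(job_trigger_id, every_n_days=1):
    """Build arguments that change a trigger's recurrence period only."""
    if every_n_days < 1:
        raise ValueError("every_n_days must be at least 1")
    job_trigger = {
        "triggers": [
            {
                "schedule": {
                    # Duration in dict form, expressed in whole seconds.
                    "recurrence_period_duration": {
                        "seconds": every_n_days * SECONDS_PER_DAY
                    }
                }
            }
        ]
    }
    return {
        "task_id": "update_job_trigger",  # hypothetical task ID
        "job_trigger_id": job_trigger_id,
        "job_trigger": job_trigger,
        # Restrict the update to the 'triggers' field.
        "update_mask": {"paths": ["triggers"]},
    }


weekly_kwargs = reschedule_trigger_kwargs("my-trigger", every_n_days=7)
```

Leaving project_id out of the dictionary relies on the documented fallback to the default project_id of the GCP connection.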
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, config=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Updates the stored infoType by creating a new version.

Parameters
  • stored_info_type_id (str) – The ID of the stored info type to be updated.

  • organization_id (str) – (Optional) The organization ID. Must be set if the parent resource is an organization.

  • project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.

  • config (dict or google.cloud.dlp_v2.types.StoredInfoTypeConfig) – Updated configuration for the storedInfoType. If not provided, a new version of the storedInfoType will be created with the existing configuration.

  • update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

Return type

google.cloud.dlp_v2.types.StoredInfoType

template_fields = ['stored_info_type_id', 'organization_id', 'project_id', 'config', 'update_mask', 'gcp_conn_id'][source]
execute(self, context)[source]