airflow.contrib.operators.gcp_dlp_operator
This module contains various GCP Cloud DLP operators which allow you to perform basic operations using Cloud DLP.
Module Contents
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCancelDLPJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Starts asynchronous cancellation on a long-running DlpJob.
- Parameters
dlp_job_id (str) – ID of the DLP job resource to be cancelled.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
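As a rough illustration of how these parameters might be combined, the sketch below collects keyword arguments for a cancel task. The job ID, project ID, and task ID are placeholders, and `make_cancel_task` is a hypothetical helper, not part of the module. Airflow and google-api-core are imported only inside the helper, so the snippet can be read and run without either installed.

```python
# Placeholder values; replace them with a real DLP job ID and project.
cancel_kwargs = {
    "task_id": "cancel_dlp_job",
    "dlp_job_id": "i-1234567890",   # hypothetical job ID
    "project_id": "my-project",     # hypothetical; omit to use the connection default
    "timeout": 30.0,                # seconds, applied to each attempt when retry is set
    "gcp_conn_id": "google_cloud_default",
}


def make_cancel_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import CloudDLPCancelDLPJobOperator
    from google.api_core.retry import Retry

    # Exponential backoff starting at 1 second and capped at 10 seconds.
    retry = Retry(initial=1.0, maximum=10.0, multiplier=2.0)
    return CloudDLPCancelDLPJobOperator(dag=dag, retry=retry, **cancel_kwargs)
```

Leaving `retry` as None means the request is attempted once, as described above.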
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDeidentifyTemplateOperator(organization_id=None, project_id=None, deidentify_template=None, template_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Creates a DeidentifyTemplate for re-using frequently used configuration for de-identifying content, images, and storage.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
deidentify_template (dict or google.cloud.dlp_v2.types.DeidentifyTemplate) – (Optional) The DeidentifyTemplate to create.
template_id (str) – (Optional) The template ID.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
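A minimal sketch of a `deidentify_template` dict, assuming a project-level template that replaces each finding with its infoType name. The display name, template ID, project ID, and the `make_create_deidentify_template_task` helper are all hypothetical; the nested keys follow the DLP v2 `DeidentifyTemplate` message as far as I know it.

```python
# Template body: replace every finding with the name of its infoType.
deidentify_template = {
    "display_name": "Replace findings with infoType name",
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    },
}

create_deidentify_template_kwargs = {
    "task_id": "create_deidentify_template",
    "project_id": "my-project",              # hypothetical project
    "deidentify_template": deidentify_template,
    "template_id": "replace-with-infotype",  # hypothetical template ID
}


def make_create_deidentify_template_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateDeidentifyTemplateOperator,
    )

    return CloudDLPCreateDeidentifyTemplateOperator(
        dag=dag, **create_deidentify_template_kwargs
    )
```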
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDLPJobOperator(project_id=None, inspect_job=None, risk_job=None, job_id=None, retry=None, timeout=None, metadata=None, wait_until_finished=True, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Creates a new job to inspect storage or calculate risk metrics.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
inspect_job (dict or google.cloud.dlp_v2.types.InspectJobConfig) – (Optional) The configuration for the inspect job.
risk_job (dict or google.cloud.dlp_v2.types.RiskAnalysisJobConfig) – (Optional) The configuration for the risk job.
job_id (str) – (Optional) The job ID.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
wait_until_finished (bool) – (Optional) If true, it will keep polling the job state until it is set to DONE.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
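The following sketch shows one way an `inspect_job` dict could look for scanning a Cloud Storage prefix. The bucket URL, job ID, and project ID are placeholders, and `make_create_job_task` is a hypothetical helper. Only one of `inspect_job` or `risk_job` is set here, on the assumption that a job is either an inspection or a risk analysis.

```python
# Inspect every object under a (hypothetical) bucket prefix for two infoTypes.
inspect_job = {
    "storage_config": {
        "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/exports/*"}}
    },
    "inspect_config": {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
    },
}

create_job_kwargs = {
    "task_id": "create_dlp_inspect_job",
    "project_id": "my-project",     # hypothetical project
    "inspect_job": inspect_job,
    "job_id": "nightly-export-scan",  # hypothetical job ID
    "wait_until_finished": True,    # keep polling until the job state is DONE
}


def make_create_job_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import CloudDLPCreateDLPJobOperator

    return CloudDLPCreateDLPJobOperator(dag=dag, **create_job_kwargs)
```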
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateInspectTemplateOperator(organization_id=None, project_id=None, inspect_template=None, template_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Creates an InspectTemplate for re-using frequently used configuration for inspecting content, images, and storage.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
inspect_template (dict or google.cloud.dlp_v2.types.InspectTemplate) – (Optional) The InspectTemplate to create.
template_id (str) – (Optional) The template ID.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
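Because a template belongs to either an organization or a project, it can help to guard against passing both. The sketch below uses a hypothetical `parent_kwargs` helper for that check and builds keyword arguments for an organization-level inspect template. The organization ID, template ID, and display name are placeholders.

```python
def parent_kwargs(organization_id=None, project_id=None):
    """Return the parent-related keyword arguments, rejecting ambiguous input."""
    if organization_id and project_id:
        raise ValueError("Set organization_id or project_id, not both.")
    kwargs = {}
    if organization_id:
        kwargs["organization_id"] = organization_id
    if project_id:
        kwargs["project_id"] = project_id
    return kwargs


inspect_template = {
    "display_name": "Email addresses",
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
}

create_inspect_template_kwargs = dict(
    task_id="create_inspect_template",
    inspect_template=inspect_template,
    template_id="email-only",  # hypothetical template ID
    **parent_kwargs(organization_id="123456789012")  # hypothetical organization
)


def make_create_inspect_template_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateInspectTemplateOperator,
    )

    return CloudDLPCreateInspectTemplateOperator(
        dag=dag, **create_inspect_template_kwargs
    )
```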
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateJobTriggerOperator(project_id=None, job_trigger=None, trigger_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Creates a job trigger to run DLP actions such as scanning storage for sensitive information on a set schedule.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
job_trigger (dict or google.cloud.dlp_v2.types.JobTrigger) – (Optional) The JobTrigger to create.
trigger_id (str) – (Optional) The JobTrigger ID.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
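A job trigger pairs an inspect job with a schedule. The sketch below assumes a daily recurrence and passes the trigger status by enum name, which I believe the client library accepts in dict form; if it does not, the corresponding enum value would be needed instead. The bucket, trigger ID, project ID, and `make_create_trigger_task` helper are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

job_trigger = {
    "display_name": "Daily bucket scan",
    "inspect_job": {
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/incoming/*"}}
        },
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    },
    # Run once per day.
    "triggers": [
        {"schedule": {"recurrence_period_duration": {"seconds": SECONDS_PER_DAY}}}
    ],
    "status": "HEALTHY",  # assumption: enum given by name
}

create_trigger_kwargs = {
    "task_id": "create_job_trigger",
    "project_id": "my-project",       # hypothetical project
    "job_trigger": job_trigger,
    "trigger_id": "daily-bucket-scan",  # hypothetical trigger ID
}


def make_create_trigger_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateJobTriggerOperator,
    )

    return CloudDLPCreateJobTriggerOperator(dag=dag, **create_trigger_kwargs)
```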
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateStoredInfoTypeOperator(organization_id=None, project_id=None, config=None, stored_info_type_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Creates a pre-built stored infoType to be used for inspection.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
config (dict or google.cloud.dlp_v2.types.StoredInfoTypeConfig) – (Optional) The config for the StoredInfoType.
stored_info_type_id (str) – (Optional) The StoredInfoType ID.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
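One possible shape for `config` is a large custom dictionary built from word lists in Cloud Storage. The sketch below assumes that form; the bucket paths, stored infoType ID, project ID, and `make_create_stored_info_type_task` helper are hypothetical.

```python
stored_info_type_config = {
    "display_name": "Internal customer IDs",
    "large_custom_dictionary": {
        # Where the service writes the generated dictionary (hypothetical path).
        "output_path": {"path": "gs://my-bucket/dlp/dictionaries/"},
        # Source word lists, one term per line (hypothetical path).
        "cloud_storage_file_set": {"url": "gs://my-bucket/dlp/terms/*.txt"},
    },
}

create_stored_info_type_kwargs = {
    "task_id": "create_stored_info_type",
    "project_id": "my-project",          # hypothetical project
    "config": stored_info_type_config,
    "stored_info_type_id": "customer-ids",  # hypothetical ID
}


def make_create_stored_info_type_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPCreateStoredInfoTypeOperator,
    )

    return CloudDLPCreateStoredInfoTypeOperator(
        dag=dag, **create_stored_info_type_kwargs
    )
```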
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeidentifyContentOperator(project_id=None, deidentify_config=None, inspect_config=None, item=None, inspect_template_name=None, deidentify_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
De-identifies potentially sensitive info from a ContentItem. This method has limits on input size and output size.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
deidentify_config (dict or google.cloud.dlp_v2.types.DeidentifyConfig) – (Optional) Configuration for the de-identification of the content item. Items specified here will override the template referenced by the deidentify_template_name argument.
inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector. Items specified here will override the template referenced by the inspect_template_name argument.
item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to de-identify. Will be treated as text.
inspect_template_name (str) – (Optional) The inspect template to use. Any configuration directly specified in inspect_config will override the settings in the template.
deidentify_template_name (str) – (Optional) The de-identify template to use. Any configuration directly specified in deidentify_config will override the settings in the template.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
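To make the relationship between `inspect_config`, `deidentify_config`, and `item` concrete, here is a small sketch that de-identifies a short text value inline, without referencing any stored template. The sample text, project ID, and `make_deidentify_task` helper are hypothetical.

```python
# What to look for.
inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}]}

# What to do with each finding: replace it with its infoType name.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

# The content to process, treated as text.
item = {"value": "Contact me at jane.doe@example.com"}

deidentify_kwargs = {
    "task_id": "deidentify_content",
    "project_id": "my-project",  # hypothetical project
    "inspect_config": inspect_config,
    "deidentify_config": deidentify_config,
    "item": item,
}


def make_deidentify_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import (
        CloudDLPDeidentifyContentOperator,
    )

    return CloudDLPDeidentifyContentOperator(dag=dag, **deidentify_kwargs)
```

If template names were passed as well, the inline configurations here would take precedence over the template settings, as the parameter descriptions state.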
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Deletes a DeidentifyTemplate.
- Parameters
template_id (str) – The ID of the deidentify template to be deleted.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDlpJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Deletes a long-running DlpJob. This method indicates that the client is no longer interested in the DlpJob result. The job will be cancelled if possible.
- Parameters
dlp_job_id (str) – The ID of the DLP job resource to be deleted.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteInspectTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Deletes an InspectTemplate.
- Parameters
template_id (str) – The ID of the inspect template to be deleted.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteJobTriggerOperator(job_trigger_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Deletes a job trigger.
- Parameters
job_trigger_id (str) – The ID of the DLP job trigger to be deleted.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Deletes a stored infoType.
- Parameters
stored_info_type_id (str) – The ID of the stored info type to be deleted.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Gets a DeidentifyTemplate.
- Parameters
template_id (str) – The ID of the deidentify template to be read.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDlpJobOperator(dlp_job_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Gets the latest state of a long-running DlpJob.
- Parameters
dlp_job_id (str) – The ID of the DLP job resource to be read.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetInspectTemplateOperator(template_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Gets an InspectTemplate.
- Parameters
template_id (str) – The ID of the inspect template to be read.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetJobTripperOperator(job_trigger_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Gets a job trigger.
- Parameters
job_trigger_id (str) – The ID of the DLP job trigger to be read.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Gets a stored infoType.
- Parameters
stored_info_type_id (str) – The ID of the stored info type to be read.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPInspectContentOperator(project_id=None, inspect_config=None, item=None, inspect_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Finds potentially sensitive info in content. This method has limits on input size, processing time, and output size.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector. Items specified here will override the template referenced by the inspect_template_name argument.
item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to inspect. Will be treated as text.
inspect_template_name (str) – (Optional) The inspect template to use. Any configuration directly specified in inspect_config will override the settings in the template.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
google.cloud.dlp_v2.types.InspectContentResponse
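A short sketch of an inline inspection request, assuming the goal is to find email addresses and phone numbers in a text value and to include the matched text in each finding. The sample text, project ID, and `make_inspect_task` helper are hypothetical.

```python
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
    "include_quote": True,  # return the matched text with each finding
}

item = {"value": "Call 555-0100 or write to jane.doe@example.com"}

inspect_kwargs = {
    "task_id": "inspect_content",
    "project_id": "my-project",  # hypothetical project
    "inspect_config": inspect_config,
    "item": item,
}


def make_inspect_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import CloudDLPInspectContentOperator

    return CloudDLPInspectContentOperator(dag=dag, **inspect_kwargs)
```

Since this method has limits on input size and processing time, larger inputs are probably better handled through a DLP job over storage.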
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDeidentifyTemplatesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Lists DeidentifyTemplates.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.
order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDlpJobsOperator(project_id=None, results_filter=None, page_size=None, job_type=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Lists DlpJobs that match the specified filter in the request.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
results_filter (str) – (Optional) Filter used to specify a subset of results.
page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.
job_type (str) – (Optional) The type of job.
order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
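The filter, ordering, and paging parameters can be illustrated with a sketch that lists finished inspect jobs, newest first. The filter expression and the `INSPECT_JOB` type name follow my understanding of the DLP API and should be treated as assumptions; the project ID and `make_list_jobs_task` helper are hypothetical.

```python
list_jobs_kwargs = {
    "task_id": "list_finished_inspect_jobs",
    "project_id": "my-project",        # hypothetical project
    "results_filter": "state = DONE",  # assumption: DLP filter syntax
    "job_type": "INSPECT_JOB",         # assumption: job type given by name
    "order_by": "create_time desc",    # field name followed by asc or desc
    "page_size": 50,
}


def make_list_jobs_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import CloudDLPListDlpJobsOperator

    return CloudDLPListDlpJobsOperator(dag=dag, **list_jobs_kwargs)
```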
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInfoTypesOperator(language_code=None, results_filter=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Returns a list of the sensitive information types that the DLP API supports.
- Parameters
language_code (str) – (Optional) BCP-47 language code for localized infoType friendly names. If omitted, or if localized strings are not available, en-US strings will be returned.
results_filter (str) – (Optional) Filter used to specify a subset of results.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
ListInfoTypesResponse
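A brief sketch of listing infoTypes with localized names. The `supported_by=INSPECT` filter is, to my knowledge, the form the DLP API documents for this call, but it is included here as an assumption; `make_list_info_types_task` is a hypothetical helper.

```python
list_info_types_kwargs = {
    "task_id": "list_info_types",
    "language_code": "en-US",                  # BCP-47 code for friendly names
    "results_filter": "supported_by=INSPECT",  # assumption: DLP filter syntax
}


def make_list_info_types_task(dag):
    """Build the operator. Needs Airflow 1.10.x with the contrib GCP operators."""
    from airflow.contrib.operators.gcp_dlp_operator import CloudDLPListInfoTypesOperator

    return CloudDLPListInfoTypesOperator(dag=dag, **list_info_types_kwargs)
```

Note that this operator takes no project or organization argument, since the list of supported infoTypes is not tied to a parent resource.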
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInspectTemplatesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Lists InspectTemplates.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.
order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListJobTriggersOperator(project_id=None, page_size=None, order_by=None, results_filter=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Lists job triggers.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.
order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.
results_filter (str) – (Optional) Filter used to specify a subset of results.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPListStoredInfoTypesOperator(organization_id=None, project_id=None, page_size=None, order_by=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Lists stored infoTypes.
- Parameters
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
page_size (int) – (Optional) The maximum number of resources contained in the underlying API response.
order_by (str) – (Optional) Comma-separated list of fields to order by, each followed by an asc or desc postfix.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPRedactImageOperator(project_id=None, inspect_config=None, image_redaction_configs=None, include_findings=None, byte_item=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Redacts potentially sensitive info from an image. This method has limits on input size, processing time, and output size.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector. Items specified here will override the template referenced by the inspect_template_name argument.
image_redaction_configs (list[dict] or list[google.cloud.dlp_v2.types.ImageRedactionConfig]) – (Optional) The configuration for specifying what content to redact from images.
include_findings (bool) – (Optional) Whether the response should include findings along with the redacted image.
byte_item (dict or google.cloud.dlp_v2.types.ByteContentItem) – (Optional) The image content to redact. The content must be PNG, JPEG, SVG or BMP.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
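As an illustration of the dict forms accepted by the parameters above, the following sketch builds a byte_item and matching redaction settings. The helper `build_byte_item`, the file name, the project ID, and the chosen infoTypes are hypothetical. The keys follow the field names of the DLP `ByteContentItem`, `InspectConfig`, and `ImageRedactionConfig` messages. Passing the `BytesType` enum names as strings is an assumption; the client library may expect the corresponding enum member instead.

```python
import os

# DLP BytesType enum names for the image formats listed above.
# Assumption: string names are accepted when the dict is converted.
BYTES_TYPE_BY_EXTENSION = {
    ".png": "IMAGE_PNG",
    ".jpg": "IMAGE_JPEG",
    ".jpeg": "IMAGE_JPEG",
    ".bmp": "IMAGE_BMP",
    ".svg": "IMAGE_SVG",
}


def build_byte_item(filename, data):
    """Hypothetical helper: wrap raw image bytes as a ByteContentItem dict."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in BYTES_TYPE_BY_EXTENSION:
        raise ValueError("Unsupported image type: %r" % extension)
    return {"type": BYTES_TYPE_BY_EXTENSION[extension], "data": data}


# Illustrative infoTypes to find and redact.
info_types = [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]

redact_kwargs = {
    "project_id": "example-project",  # placeholder
    "inspect_config": {"info_types": info_types},
    # One redaction config per infoType that should be blacked out.
    "image_redaction_configs": [{"info_type": info_type} for info_type in info_types],
    "include_findings": True,
    "byte_item": build_byte_item("scan.PNG", b"<raw image bytes>"),
}
```

In a DAG file, `redact_kwargs` could then be passed as `CloudDLPRedactImageOperator(task_id=..., **redact_kwargs)`.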
-
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPReidentifyContentOperator(project_id=None, reidentify_config=None, inspect_config=None, item=None, inspect_template_name=None, reidentify_template_name=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Re-identifies content that has been de-identified.
- Parameters
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
reidentify_config (dict or google.cloud.dlp_v2.types.DeidentifyConfig) – (Optional) Configuration for the re-identification of the content item.
inspect_config (dict or google.cloud.dlp_v2.types.InspectConfig) – (Optional) Configuration for the inspector.
item (dict or google.cloud.dlp_v2.types.ContentItem) – (Optional) The item to re-identify. Will be treated as text.
inspect_template_name (str) – (Optional) Template to use. Any configuration directly specified in inspect_config will override the values set in the template.
reidentify_template_name (str) – (Optional) Template to use. References an instance of DeidentifyTemplate. Any configuration directly specified in reidentify_config or inspect_config will override the values set in the template.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
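The sketch below shows one way the reidentify_config, inspect_config, and item dicts could fit together when text was de-identified with format-preserving encryption and a surrogate infoType. The helper `build_reidentify_kwargs` and every value passed to it (surrogate name, wrapped key, KMS key name, sample text) are hypothetical placeholders. The nested keys follow the field names of the DLP `DeidentifyConfig` message; passing the alphabet as the string `"NUMERIC"` is an assumption about how the dict is converted.

```python
def build_reidentify_kwargs(text, surrogate_type, wrapped_key, kms_key_name,
                            alphabet="NUMERIC"):
    """Hypothetical helper: arguments for CloudDLPReidentifyContentOperator."""
    # Reverse the same crypto transformation that was used to de-identify.
    crypto_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": kms_key_name,
            }
        },
        "common_alphabet": alphabet,
        "surrogate_info_type": {"name": surrogate_type},
    }
    reidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {
                    "crypto_replace_ffx_fpe_config": crypto_config,
                }}
            ]
        }
    }
    # Tell the inspector that the surrogate marks tokens to re-identify.
    inspect_config = {
        "custom_info_types": [
            {"info_type": {"name": surrogate_type}, "surrogate_type": {}}
        ]
    }
    return {
        "reidentify_config": reidentify_config,
        "inspect_config": inspect_config,
        "item": {"value": text},  # treated as text, per the description above
    }


# All values below are placeholders, not working credentials.
reidentify_kwargs = build_reidentify_kwargs(
    text="My phone number is PHONE_TOKEN(10):9617256398",
    surrogate_type="PHONE_TOKEN",
    wrapped_key=b"<wrapped key bytes>",
    kms_key_name="projects/example-project/locations/global/keyRings/r/cryptoKeys/k",
)
```

The surrogate infoType name appears twice on purpose: once in the transformation and once as a custom infoType, so the inspector can locate the tokens that the transformation should reverse.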
-
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateDeidentifyTemplateOperator(template_id, organization_id=None, project_id=None, deidentify_template=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Updates the DeidentifyTemplate.
- Parameters
template_id (str) – The ID of the deidentify template to be updated.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
deidentify_template (dict or google.cloud.dlp_v2.types.DeidentifyTemplate) – New DeidentifyTemplate value.
update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
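To illustrate how deidentify_template and update_mask relate, here is a minimal sketch that derives the mask from the fields being changed. The helper `field_mask_for`, the template ID, the project ID, and the field values are hypothetical. The `paths` key is the repeated string field of the protobuf `FieldMask` message. Restricting the mask to top-level keys is a simplification made for this example.

```python
def field_mask_for(changes):
    """Hypothetical helper: FieldMask dict naming the top-level fields in `changes`."""
    # Sorting keeps the mask deterministic regardless of dict ordering.
    return {"paths": sorted(changes)}


# Only these fields of the existing template should be overwritten.
template_changes = {
    "display_name": "Mask email addresses",
    "description": "Replaces email addresses with a fixed placeholder.",
}

update_kwargs = {
    "template_id": "example-template",  # placeholder
    "project_id": "example-project",    # placeholder
    "deidentify_template": template_changes,
    "update_mask": field_mask_for(template_changes),
}
```

Fields left out of the mask are expected to keep their current values, which is why the mask is built from the same dict that carries the new values.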
-
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateInspectTemplateOperator(template_id, organization_id=None, project_id=None, inspect_template=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Updates the InspectTemplate.
- Parameters
template_id (str) – The ID of the inspect template to be updated.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
inspect_template (dict or google.cloud.dlp_v2.types.InspectTemplate) – New InspectTemplate value.
update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
-
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateJobTriggerOperator(job_trigger_id, project_id=None, job_trigger=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Updates a job trigger.
- Parameters
job_trigger_id (str) – The ID of the DLP job trigger to be updated.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. If set to None or missing, the default project_id from the GCP connection is used.
job_trigger (dict or google.cloud.dlp_v2.types.JobTrigger) – New JobTrigger value.
update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type
-
class airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateStoredInfoTypeOperator(stored_info_type_id, organization_id=None, project_id=None, config=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: airflow.models.BaseOperator
Updates the stored infoType by creating a new version.
- Parameters
stored_info_type_id (str) – The ID of the stored info type to be updated.
organization_id (str) – (Optional) The organization ID. Required if the parent resource is an organization.
project_id (str) – (Optional) Google Cloud Platform project ID where the DLP Instance exists. Only set this field if the parent resource is a project instead of an organization.
config (dict or google.cloud.dlp_v2.types.StoredInfoTypeConfig) – Updated configuration for the storedInfoType. If not provided, a new version of the storedInfoType will be created with the existing configuration.
update_mask (dict or google.cloud.dlp_v2.types.FieldMask) – Mask to control which fields get updated.
retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.
timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.
- Return type