Google Cloud Data Loss Prevention Operator

Google Cloud DLP provides tools to classify, mask, tokenize, and transform sensitive elements to help you better manage the data that you collect, store, or use for business or analytics.

Prerequisite Tasks

Info-Types

Google Cloud DLP uses info-types to define what it scans for.

Create Stored Info-Type

To create a custom info-type you can use CloudDLPCreateStoredInfoTypeOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

create_info_type = CloudDLPCreateStoredInfoTypeOperator(
    project_id=GCP_PROJECT,
    config=CUSTOM_INFO_TYPES,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    task_id="create_info_type",
)
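The snippet above refers to CUSTOM_INFO_TYPES and CUSTOM_INFO_TYPE_ID without showing them. A minimal sketch of what they might contain, assuming a large custom dictionary built from word lists in Cloud Storage. The bucket name and paths are placeholders, not values taken from this guide:

```python
# Hypothetical bucket; replace it with one that DLP is allowed to read and write.
OUTPUT_BUCKET = "gs://example-dlp-bucket"

CUSTOM_INFO_TYPE_ID = "custom_info_type"

# Dict form of a StoredInfoTypeConfig message.
CUSTOM_INFO_TYPES = {
    "large_custom_dictionary": {
        # Location where DLP stores the compiled dictionary.
        "output_path": {"path": f"{OUTPUT_BUCKET}/tmp/dictionary-output"},
        # Source files that contain the dictionary terms.
        "cloud_storage_file_set": {"url": f"{OUTPUT_BUCKET}/tmp/*"},
    }
}
```

The operator also accepts a StoredInfoTypeConfig object in place of the dict; the dict form is used here only to keep the sketch free of extra imports.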

Retrieve Stored Info-Type

To retrieve the list of sensitive info-types supported by the DLP API for reference, you can use CloudDLPListInfoTypesOperator.

Similarly, to retrieve the list of custom info-types, you can use CloudDLPListStoredInfoTypesOperator.

To retrieve a single info-type, you can use CloudDLPGetStoredInfoTypeOperator.
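The three lookup operators can be wired up as in the sketch below. The helper name, the task IDs, and the decision to wrap everything in a function are illustrative assumptions, not part of the example DAG:

```python
def build_info_type_lookup_tasks(project_id: str, stored_info_type_id: str) -> list:
    """Return the three lookup tasks; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when tasks are built.
    from airflow.providers.google.cloud.operators.dlp import (
        CloudDLPGetStoredInfoTypeOperator,
        CloudDLPListInfoTypesOperator,
        CloudDLPListStoredInfoTypesOperator,
    )

    # Built-in info-types: no stored info-type ID is involved.
    list_builtin = CloudDLPListInfoTypesOperator(task_id="list_info_types")

    # Custom (stored) info-types that belong to the project.
    list_stored = CloudDLPListStoredInfoTypesOperator(
        task_id="list_stored_info_types",
        project_id=project_id,
    )

    # One stored info-type, looked up by its ID.
    get_stored = CloudDLPGetStoredInfoTypeOperator(
        task_id="get_stored_info_type",
        project_id=project_id,
        stored_info_type_id=stored_info_type_id,
    )
    return [list_builtin, list_stored, get_stored]
```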

Update Stored Info-Type

To update an info-type you can use CloudDLPUpdateStoredInfoTypeOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

update_info_type = CloudDLPUpdateStoredInfoTypeOperator(
    project_id=GCP_PROJECT,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    config=UPDATE_CUSTOM_INFO_TYPE,
    task_id="update_info_type",
)

Deleting Stored Info-Type

To delete an info-type you can use CloudDLPDeleteStoredInfoTypeOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

delete_info_type = CloudDLPDeleteStoredInfoTypeOperator(
    project_id=GCP_PROJECT,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    task_id="delete_info_type",
)

Templates

Templates can be used to create and persist configuration information to use with Cloud Data Loss Prevention. There are two types of DLP templates supported by Airflow:

  • Inspection Template

  • De-Identification Template

Here we will be using an inspection template for our example.

Creating Template

To create an inspection template you can use CloudDLPCreateInspectTemplateOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

create_template = CloudDLPCreateInspectTemplateOperator(
    project_id=GCP_PROJECT,
    inspect_template=INSPECT_TEMPLATE,
    template_id=TEMPLATE_ID,
    task_id="create_template",
    do_xcom_push=True,
)
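INSPECT_TEMPLATE and TEMPLATE_ID are not shown in the snippet above. One possible definition, assuming the template should look for phone numbers; the info-type selection and the ID are illustrative:

```python
# Built-in detectors to run during inspection.
INSPECT_CONFIG = {
    "info_types": [{"name": "PHONE_NUMBER"}, {"name": "US_TOLLFREE_PHONE_NUMBER"}]
}

# Dict form of an InspectTemplate message wrapping the config above.
INSPECT_TEMPLATE = {"inspect_config": INSPECT_CONFIG}

TEMPLATE_ID = "dlp-inspect-example"  # hypothetical template ID
```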

Retrieving Template

If you already have an inspect template, you can retrieve it with CloudDLPGetInspectTemplateOperator. The list of existing inspect templates can be retrieved with CloudDLPListInspectTemplatesOperator.
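A short sketch of both lookups follows. The helper function and task IDs are assumptions made for illustration:

```python
def build_template_lookup_tasks(project_id: str, template_id: str) -> list:
    """Return [get_task, list_task]; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when tasks are built.
    from airflow.providers.google.cloud.operators.dlp import (
        CloudDLPGetInspectTemplateOperator,
        CloudDLPListInspectTemplatesOperator,
    )

    # Fetch one template by its ID.
    get_template = CloudDLPGetInspectTemplateOperator(
        task_id="get_template",
        project_id=project_id,
        template_id=template_id,
    )

    # List every inspect template in the project.
    list_templates = CloudDLPListInspectTemplatesOperator(
        task_id="list_templates",
        project_id=project_id,
    )
    return [get_template, list_templates]
```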

Using Template

To find potentially sensitive info using the inspection template we just created, we can use CloudDLPInspectContentOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

inspect_content = CloudDLPInspectContentOperator(
    task_id="inspect_content",
    project_id=GCP_PROJECT,
    item=ITEM,
    inspect_template_name="{{ task_instance.xcom_pull('create_template', key='return_value')['name'] }}",
)
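The ITEM passed to the operator is the content to inspect. A minimal sketch of two shapes it might take, written as dicts that mirror the ContentItem message; the sample text is made up:

```python
# A one-column table with a single row.
ITEM = {
    "table": {
        "headers": [{"name": "column1"}],
        "rows": [{"values": [{"string_value": "My phone number is (206) 555-0123"}]}],
    }
}

# Plain text can be passed through the "value" field instead.
TEXT_ITEM = {"value": "My phone number is (206) 555-0123"}
```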

Updating Template

To update the template you can use CloudDLPUpdateInspectTemplateOperator.
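As a rough sketch, an update might replace the inspection config and restrict the change with a field mask. The replacement template, the mask, and the helper name are illustrative assumptions:

```python
# Hypothetical replacement: detect email addresses instead of phone numbers.
UPDATED_INSPECT_TEMPLATE = {
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]}
}

# Field mask in dict form; only the listed paths are overwritten.
UPDATE_MASK = {"paths": ["inspect_config"]}


def build_update_template_task(project_id: str, template_id: str):
    """Return the update task; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when the task is built.
    from airflow.providers.google.cloud.operators.dlp import (
        CloudDLPUpdateInspectTemplateOperator,
    )

    return CloudDLPUpdateInspectTemplateOperator(
        task_id="update_template",
        project_id=project_id,
        template_id=template_id,
        inspect_template=UPDATED_INSPECT_TEMPLATE,
        update_mask=UPDATE_MASK,
    )
```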

Deleting Template

To delete the template you can use CloudDLPDeleteInspectTemplateOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

delete_template = CloudDLPDeleteInspectTemplateOperator(
    task_id="delete_template",
    template_id=TEMPLATE_ID,
    project_id=GCP_PROJECT,
)

De-Identification Template

Like inspect templates, de-identification templates also have CRUD operators:

  • CloudDLPCreateDeidentifyTemplateOperator

  • CloudDLPDeleteDeidentifyTemplateOperator

  • CloudDLPUpdateDeidentifyTemplateOperator

  • CloudDLPGetDeidentifyTemplateOperator

  • CloudDLPListDeidentifyTemplatesOperator
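These operators follow the same pattern as their inspect-template counterparts. A sketch of a create, get, and delete sequence, assuming a template that replaces every finding with a fixed string; the template body, helper name, and task IDs are illustrative:

```python
# Dict form of a DeidentifyTemplate message.
DEIDENTIFY_TEMPLATE = {
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {"new_value": {"string_value": "[redacted]"}}
                    }
                }
            ]
        }
    }
}


def build_deidentify_template_tasks(project_id: str, template_id: str) -> list:
    """Return [create, get, delete]; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when tasks are built.
    from airflow.providers.google.cloud.operators.dlp import (
        CloudDLPCreateDeidentifyTemplateOperator,
        CloudDLPDeleteDeidentifyTemplateOperator,
        CloudDLPGetDeidentifyTemplateOperator,
    )

    create = CloudDLPCreateDeidentifyTemplateOperator(
        task_id="create_deidentify_template",
        project_id=project_id,
        deidentify_template=DEIDENTIFY_TEMPLATE,
        template_id=template_id,
    )
    get = CloudDLPGetDeidentifyTemplateOperator(
        task_id="get_deidentify_template",
        project_id=project_id,
        template_id=template_id,
    )
    delete = CloudDLPDeleteDeidentifyTemplateOperator(
        task_id="delete_deidentify_template",
        project_id=project_id,
        template_id=template_id,
    )
    return [create, get, delete]
```

Inside a DAG, the returned tasks can be chained in order, for example with `create >> get >> delete`.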

Jobs & Job Triggers

Cloud Data Loss Prevention uses a job to run actions to scan content for sensitive data or calculate the risk of re-identification. You can schedule these jobs using job triggers.

Creating Job

To create a job you can use CloudDLPCreateDLPJobOperator.
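A minimal sketch of an inspection job over files in Cloud Storage. The bucket URL, the info-type, and the helper name are assumptions; a risk-analysis job would pass `risk_job` instead of `inspect_job`:

```python
# Dict form of an InspectJobConfig message.
INSPECT_JOB = {
    "storage_config": {
        # Hypothetical bucket and prefix to scan.
        "cloud_storage_options": {"file_set": {"url": "gs://example-dlp-bucket/tmp/*"}}
    },
    "inspect_config": {"info_types": [{"name": "PHONE_NUMBER"}]},
}


def build_create_job_task(project_id: str):
    """Return the create-job task; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when the task is built.
    from airflow.providers.google.cloud.operators.dlp import CloudDLPCreateDLPJobOperator

    return CloudDLPCreateDLPJobOperator(
        task_id="create_job",
        project_id=project_id,
        inspect_job=INSPECT_JOB,
    )
```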

Retrieving Job

To retrieve the list of jobs you can use CloudDLPListDLPJobsOperator. To retrieve a single job you can use CloudDLPGetDLPJobOperator.

Deleting Job

To delete a job you can use CloudDLPDeleteDLPJobOperator.

Canceling a Job

To start asynchronous cancellation of a long-running DLP job you can use CloudDLPCancelDLPJobOperator.
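The job operators in this and the preceding sections all identify a job by `dlp_job_id`. The sketch below puts them side by side; the helper name and task IDs are illustrative, and `dlp_job_id` is assumed to be the ID of a job that already exists:

```python
def build_job_management_tasks(project_id: str, dlp_job_id: str) -> dict:
    """Return job tasks keyed by action; call this inside a ``with DAG(...)`` block."""
    # Imported here so the Google provider is only needed when tasks are built.
    from airflow.providers.google.cloud.operators.dlp import (
        CloudDLPCancelDLPJobOperator,
        CloudDLPDeleteDLPJobOperator,
        CloudDLPGetDLPJobOperator,
        CloudDLPListDLPJobsOperator,
    )

    return {
        # All jobs in the project.
        "list": CloudDLPListDLPJobsOperator(task_id="list_jobs", project_id=project_id),
        # A single job, looked up by ID.
        "get": CloudDLPGetDLPJobOperator(
            task_id="get_job", project_id=project_id, dlp_job_id=dlp_job_id
        ),
        # Requests cancellation; the job stops asynchronously.
        "cancel": CloudDLPCancelDLPJobOperator(
            task_id="cancel_job", project_id=project_id, dlp_job_id=dlp_job_id
        ),
        # Removes the job and its results.
        "delete": CloudDLPDeleteDLPJobOperator(
            task_id="delete_job", project_id=project_id, dlp_job_id=dlp_job_id
        ),
    }
```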

Creating Job Trigger

To create a job trigger you can use CloudDLPCreateJobTriggerOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

create_trigger = CloudDLPCreateJobTriggerOperator(
    project_id=GCP_PROJECT,
    job_trigger=JOB_TRIGGER,
    trigger_id=TRIGGER_ID,
    task_id="create_trigger",
)
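JOB_TRIGGER and TRIGGER_ID are defined elsewhere in the example file. A sketch of what such a trigger might look like, assuming a daily scan of a Datastore kind; the project ID, kind name, and trigger ID are placeholders:

```python
GCP_PROJECT = "example-project"  # hypothetical project ID
TRIGGER_ID = "example_trigger"   # hypothetical trigger ID

# Dict form of a JobTrigger message.
JOB_TRIGGER = {
    # What each triggered job inspects.
    "inspect_job": {
        "storage_config": {
            "datastore_options": {
                "partition_id": {"project_id": GCP_PROJECT},
                "kind": {"name": "test"},
            }
        }
    },
    # Run once every 24 hours.
    "triggers": [{"schedule": {"recurrence_period_duration": {"seconds": 60 * 60 * 24}}}],
    "status": "HEALTHY",
}
```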

Retrieving Job Trigger

To retrieve the list of job triggers you can use CloudDLPListJobTriggersOperator. To retrieve a single job trigger you can use CloudDLPGetDLPJobTriggerOperator.

Updating Job Trigger

To update a job trigger you can use CloudDLPUpdateJobTriggerOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

update_trigger = CloudDLPUpdateJobTriggerOperator(
    project_id=GCP_PROJECT,
    job_trigger_id=TRIGGER_ID,
    job_trigger=JOB_TRIGGER,
    task_id="update_trigger",
)

Deleting Job Trigger

To delete a job trigger you can use CloudDLPDeleteJobTriggerOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

delete_trigger = CloudDLPDeleteJobTriggerOperator(
    project_id=GCP_PROJECT, job_trigger_id=TRIGGER_ID, task_id="delete_trigger"
)

Content Method

Unlike storage methods (jobs), content methods are synchronous and stateless.

De-identify Content

De-identification is the process of removing identifying information from data. Configuration information defines how you want the sensitive data de-identified.

This config can either be saved and persisted in de-identification templates or defined in a DeidentifyConfig object:

DEIDENTIFY_CONFIG = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "replace_config": {"new_value": {"string_value": "[deidentified_number]"}}
                }
            }
        ]
    }
}

To de-identify potentially sensitive information from a content item, you can use CloudDLPDeidentifyContentOperator.

airflow/providers/google/cloud/example_dags/example_dlp.py

deidentify_content = CloudDLPDeidentifyContentOperator(
    project_id=GCP_PROJECT,
    item=ITEM,
    deidentify_config=DEIDENTIFY_CONFIG,
    inspect_config=INSPECT_CONFIG,
    task_id="deidentify_content",
)

Re-identify Content

To re-identify the content that has been de-identified you can use CloudDLPReidentifyContentOperator.
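Re-identification only works for reversible transformations. A `replace_config` like the one in DEIDENTIFY_CONFIG discards the original value, so it cannot be undone. The sketch below assumes the content was de-identified with `crypto_deterministic_config` and a surrogate info-type; the helper name and the default surrogate name are hypothetical, and the caller must supply the same crypto key that was used for de-identification:

```python
def build_reidentify_configs(crypto_key: dict, surrogate_name: str = "PHONE_TOKEN"):
    """Return (reidentify_config, inspect_config) for surrogate-tagged tokens."""
    # Mirror of the de-identification transformation: same key, same surrogate.
    reidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_deterministic_config": {
                            "crypto_key": crypto_key,
                            "surrogate_info_type": {"name": surrogate_name},
                        }
                    }
                }
            ]
        }
    }
    # Declares the surrogate name as a custom info-type so the tokens are detected.
    inspect_config = {
        "custom_info_types": [
            {"info_type": {"name": surrogate_name}, "surrogate_type": {}}
        ]
    }
    return reidentify_config, inspect_config
```

The two returned dicts would then be passed to CloudDLPReidentifyContentOperator as `reidentify_config` and `inspect_config`, together with the de-identified `item`.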

Redact Image

To redact potentially sensitive information from an image you can use CloudDLPRedactImageOperator.

Reference

For further information, look at:
