Google Cloud Data Loss Prevention Operator¶
Google Cloud DLP provides tools to classify, mask, tokenize, and transform sensitive elements, helping you better manage the data that you collect, store, or use for business or analytics.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Select or create a Cloud Platform project using the Cloud Console.
Enable billing for your project, as described in the Google Cloud documentation.
Enable the API, as described in the Cloud Console documentation.
Install API libraries via pip.
pip install 'apache-airflow[google]'
Detailed information is available for Installation.
Info-Types¶
Google Cloud DLP uses info-types to define what it scans for.
Create Stored Info-Type¶
To create a custom info-type, you can use CloudDLPCreateStoredInfoTypeOperator.
create_info_type = CloudDLPCreateStoredInfoTypeOperator(
    project_id=PROJECT_ID,
    config=CUSTOM_INFO_TYPES,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    task_id="create_info_type",
)
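The snippet above refers to PROJECT_ID, CUSTOM_INFO_TYPE_ID, and CUSTOM_INFO_TYPES without defining them. A minimal sketch of what they might look like follows, assuming a regex-based stored info-type. The project ID, info-type ID, display name, and pattern are all hypothetical placeholders, and the dictionary layout is assumed to mirror the StoredInfoTypeConfig message of the DLP API.

```python
import re

# Hypothetical placeholder values; substitute your own project and IDs.
PROJECT_ID = "my-gcp-project"
CUSTOM_INFO_TYPE_ID = "employee_id_info_type"

# Assumed pattern for an internal employee ID such as "EMP-123456".
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"

# Dictionary form of a stored info-type configuration (assumed to follow
# the StoredInfoTypeConfig message: display_name, description, regex).
CUSTOM_INFO_TYPES = {
    "display_name": "Employee ID",
    "description": "Matches internal employee identifiers",
    "regex": {"pattern": EMPLOYEE_ID_PATTERN},
}

# Sanity-check the pattern locally before handing it to the operator.
assert re.fullmatch(EMPLOYEE_ID_PATTERN, "EMP-123456")
```

Checking the pattern locally first can catch a typo before the task ever reaches the DLP API.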
Retrieve Stored Info-Type¶
To retrieve the list of sensitive info-types supported by the DLP API for reference, you can use CloudDLPListInfoTypesOperator.
Similarly, to retrieve the list of custom info-types, you can use CloudDLPListStoredInfoTypesOperator.
To retrieve a single info-type, you can use CloudDLPGetStoredInfoTypeOperator.
Update Stored Info-Type¶
To update an info-type, you can use CloudDLPUpdateStoredInfoTypeOperator.
update_info_type = CloudDLPUpdateStoredInfoTypeOperator(
    project_id=PROJECT_ID,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    config=UPDATE_CUSTOM_INFO_TYPE,
    task_id="update_info_type",
)
Deleting Stored Info-Type¶
To delete an info-type, you can use CloudDLPDeleteStoredInfoTypeOperator.
delete_info_type = CloudDLPDeleteStoredInfoTypeOperator(
    project_id=PROJECT_ID,
    stored_info_type_id=CUSTOM_INFO_TYPE_ID,
    task_id="delete_info_type",
)
Templates¶
Templates let you create and persist configuration information for use with Cloud Data Loss Prevention. Airflow supports two types of DLP templates:
Inspection Template
De-Identification Template
Here we will use the inspection template for our example.
Creating Template¶
To create an inspection template, you can use CloudDLPCreateInspectTemplateOperator.
create_template = CloudDLPCreateInspectTemplateOperator(
    task_id="create_template",
    project_id=PROJECT_ID,
    inspect_template=INSPECT_TEMPLATE,
    template_id=TEMPLATE_ID,
    do_xcom_push=True,
)
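INSPECT_TEMPLATE and TEMPLATE_ID are not defined in the snippet above. One way they might be written is sketched below, assuming the operator accepts a plain dictionary that mirrors the InspectTemplate message. The template ID is a hypothetical placeholder, and the choice of info-types is only an illustration.

```python
# Hypothetical template ID; any ID unique within the project would do.
TEMPLATE_ID = "phone-number-inspect-template"

# Info-types the template should look for. PHONE_NUMBER and
# US_TOLLFREE_PHONE_NUMBER are built-in DLP detectors.
INSPECT_CONFIG = {
    "info_types": [
        {"name": "PHONE_NUMBER"},
        {"name": "US_TOLLFREE_PHONE_NUMBER"},
    ]
}

# Dictionary form of the template: it simply wraps the inspect config.
INSPECT_TEMPLATE = {"inspect_config": INSPECT_CONFIG}
```

Keeping INSPECT_CONFIG as a separate name makes it easy to reuse the same detector list outside the template.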
Retrieving Template¶
If you already have an inspect template, you can retrieve it using CloudDLPGetInspectTemplateOperator.
The list of existing inspect templates can be retrieved with CloudDLPListInspectTemplatesOperator.
Using Template¶
To find potentially sensitive info using the inspection template we just created, we can use CloudDLPInspectContentOperator.
inspect_content = CloudDLPInspectContentOperator(
    task_id="inspect_content",
    project_id=PROJECT_ID,
    item=ITEM,
    inspect_template_name="{{ task_instance.xcom_pull('create_template', key='return_value')['name'] }}",
)
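Two things in the snippet above may need unpacking: ITEM is not defined, and the Jinja expression reads the name key from the value that the create_template task pushed to XCom. The sketch below assumes a table-shaped ITEM that mirrors the ContentItem message, and imitates the XCom lookup with a plain dictionary so it runs without Airflow. The resource name and the helper resolve_template_name are made up for illustration.

```python
# Content to inspect, in dictionary form (assumed to mirror ContentItem).
ITEM = {
    "table": {
        "headers": [{"name": "column1"}],
        "rows": [
            {"values": [{"string_value": "My phone number is (206) 555-0123"}]}
        ],
    }
}

# Stand-in for what the create_template task might push to XCom.
# The resource name below is a made-up example, not real output.
fake_xcom = {
    "create_template": {
        "name": "projects/my-gcp-project/inspectTemplates/my-template",
    }
}


def resolve_template_name(xcom_store: dict, task_id: str) -> str:
    """Mimic xcom_pull(task_id, key='return_value')['name'] with a plain dict."""
    return xcom_store[task_id]["name"]


inspect_template_name = resolve_template_name(fake_xcom, "create_template")
```

In a real DAG, Airflow renders the Jinja expression at run time, so the template name does not have to be known when the DAG file is written.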
Updating Template¶
To update the template, you can use CloudDLPUpdateInspectTemplateOperator.
Deleting Template¶
To delete the template, you can use CloudDLPDeleteInspectTemplateOperator.
delete_template = CloudDLPDeleteInspectTemplateOperator(
    task_id="delete_template",
    template_id=TEMPLATE_ID,
    project_id=PROJECT_ID,
)
De-Identification Template¶
Like inspect templates, de-identification templates have their own CRUD operators:
CloudDLPCreateDeidentifyTemplateOperator
CloudDLPDeleteDeidentifyTemplateOperator
CloudDLPUpdateDeidentifyTemplateOperator
CloudDLPGetDeidentifyTemplateOperator
CloudDLPListDeidentifyTemplatesOperator
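A de-identification template wraps a de-identify configuration in much the same way that an inspect template wraps an inspect configuration. The sketch below shows what such a template might look like as a plain dictionary, assuming it mirrors the DeidentifyTemplate message. The template ID, display name, and replacement string are hypothetical placeholders.

```python
# Hypothetical template ID.
DEIDENTIFY_TEMPLATE_ID = "replace-with-placeholder"

# Replace every finding with a fixed placeholder string.
DEIDENTIFY_CONFIG = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "replace_config": {
                        "new_value": {"string_value": "[redacted]"}
                    }
                }
            }
        ]
    }
}

# Dictionary form of the template (assumed fields: display_name,
# deidentify_config).
DEIDENTIFY_TEMPLATE = {
    "display_name": "Replace findings with a placeholder",
    "deidentify_config": DEIDENTIFY_CONFIG,
}
```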
Jobs & Job Triggers¶
Cloud Data Loss Prevention uses jobs to scan content for sensitive data or to calculate the risk of re-identification. You can schedule these jobs using job triggers.
Creating Job¶
To create a job, you can use CloudDLPCreateDLPJobOperator.
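No job configuration is shown above, so here is a minimal sketch of what an inspect job might look like as a plain dictionary, assuming it mirrors the InspectJobConfig message and is handed to the operator through its inspect_job argument. The bucket name is a hypothetical placeholder, and scanning Cloud Storage is just one possible storage choice.

```python
# Hypothetical bucket; the trailing wildcard selects every object in it.
BUCKET_NAME = "my-dlp-input-bucket"

# Dictionary form of an inspect job: where to scan and what to look for.
INSPECT_JOB = {
    "storage_config": {
        "cloud_storage_options": {
            "file_set": {"url": f"gs://{BUCKET_NAME}/*"},
        }
    },
    "inspect_config": {
        "info_types": [{"name": "PHONE_NUMBER"}],
    },
}
```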
Retrieving Job¶
To retrieve the list of jobs, you can use CloudDLPListDLPJobsOperator.
To retrieve a single job, you can use CloudDLPGetDLPJobOperator.
Deleting Job¶
To delete a job, you can use CloudDLPDeleteDLPJobOperator.
Canceling a Job¶
To start asynchronous cancellation of a long-running DLP job, you can use CloudDLPCancelDLPJobOperator.
Creating Job Trigger¶
To create a job trigger, you can use CloudDLPCreateJobTriggerOperator.
create_trigger = CloudDLPCreateJobTriggerOperator(
    project_id=PROJECT_ID,
    job_trigger=JOB_TRIGGER,
    trigger_id=TRIGGER_ID,
    task_id="create_trigger",
)
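JOB_TRIGGER and TRIGGER_ID are not defined in the snippet above. The sketch below shows one way they might be written, assuming a plain dictionary that mirrors the JobTrigger message: an inspect job to run, a schedule, and a status. The project ID, trigger ID, Datastore kind, and daily schedule are all illustrative assumptions.

```python
from datetime import timedelta

# Hypothetical placeholder values.
PROJECT_ID = "my-gcp-project"
TRIGGER_ID = "daily-inspect-trigger"

# How often the trigger should fire, expressed in whole seconds.
RECURRENCE_SECONDS = int(timedelta(days=1).total_seconds())

JOB_TRIGGER = {
    # What to scan: here, a Datastore kind named "test" (an assumption).
    "inspect_job": {
        "storage_config": {
            "datastore_options": {
                "partition_id": {"project_id": PROJECT_ID},
                "kind": {"name": "test"},
            }
        }
    },
    # When to scan: once per recurrence period.
    "triggers": [
        {"schedule": {"recurrence_period_duration": {"seconds": RECURRENCE_SECONDS}}}
    ],
    "status": "HEALTHY",
}
```

Deriving the period from a timedelta avoids hand-computed second counts when the schedule changes.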
Retrieving Job Trigger¶
To retrieve a list of job triggers, you can use CloudDLPListJobTriggersOperator.
To retrieve a single job trigger, you can use CloudDLPGetDLPJobTriggerOperator.
Updating Job Trigger¶
To update a job trigger, you can use CloudDLPUpdateJobTriggerOperator.
update_trigger = CloudDLPUpdateJobTriggerOperator(
    project_id=PROJECT_ID,
    job_trigger_id=TRIGGER_ID,
    job_trigger=JOB_TRIGGER,
    task_id="update_trigger",
)
Deleting Job Trigger¶
To delete a job trigger, you can use CloudDLPDeleteJobTriggerOperator.
delete_trigger = CloudDLPDeleteJobTriggerOperator(
    project_id=PROJECT_ID,
    job_trigger_id=TRIGGER_ID,
    task_id="delete_trigger",
)
Content Method¶
Unlike storage methods (jobs), content methods are synchronous and stateless.
De-identify Content¶
De-identification is the process of removing identifying information from data. Configuration information defines how you want the sensitive data de-identified.
This config can either be saved and persisted in de-identification templates or defined in a DeidentifyConfig object:
DEIDENTIFY_CONFIG = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "replace_config": {"new_value": {"string_value": "[deidentified_number]"}}
                }
            }
        ]
    }
}
To de-identify potentially sensitive information from a content item, you can use CloudDLPDeidentifyContentOperator.
deidentify_content = CloudDLPDeidentifyContentOperator(
    project_id=PROJECT_ID,
    item=ITEM,
    deidentify_config=DEIDENTIFY_CONFIG,
    inspect_config=INSPECT_CONFIG,
    task_id="deidentify_content",
)
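To give a feel for what the replace_config transformation above does, the following sketch imitates it locally with a regular expression. This is only an illustration under stated assumptions: it does not call the DLP API, the regex is a crude stand-in for the PHONE_NUMBER detector, and simulate_replace is a hypothetical helper, not part of Airflow or the DLP client.

```python
import re

# Same configuration as in the section above.
DEIDENTIFY_CONFIG = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "replace_config": {"new_value": {"string_value": "[deidentified_number]"}}
                }
            }
        ]
    }
}

# Crude stand-in for DLP's PHONE_NUMBER detector (an assumption for this
# sketch); the real service recognizes far more formats than this regex.
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def simulate_replace(text: str, config: dict) -> str:
    """Replace each locally detected phone number with the configured value."""
    transformation = config["info_type_transformations"]["transformations"][0]
    replace_config = transformation["primitive_transformation"]["replace_config"]
    new_value = replace_config["new_value"]["string_value"]
    # A function as the replacement avoids any backslash interpretation.
    return PHONE_PATTERN.sub(lambda _match: new_value, text)


result = simulate_replace("My phone number is (206) 555-0123", DEIDENTIFY_CONFIG)
print(result)  # My phone number is [deidentified_number]
```

The actual operator sends the item and both configs to the service, which performs detection and transformation server-side.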
Re-identify Content¶
To re-identify content that has been de-identified, you can use CloudDLPReidentifyContentOperator.
Redact Image¶
To redact potentially sensitive information from an image, you can use CloudDLPRedactImageOperator.