airflow.providers.google.cloud.utils.field_sanitizer

Sanitizer for body fields sent via Google Cloud API.

The sanitizer removes the specified fields from the body.

Context

Some Google Cloud operations modify existing resources (such as instances or instance templates). In those cases we retrieve the resource from Google Cloud first, modify its body, and then either update the existing resource or create a new one from the modified body. A retrieved resource usually contains extra output-only fields, and these must be removed before the body can be used as input for a subsequent create/insert operation.
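The retrieve, modify and re-submit flow described above can be sketched without any Google Cloud client. The following is only an illustration: the `retrieved` dictionary, the `OUTPUT_ONLY_FIELDS` list and the `prepare_insert_body` helper are hypothetical names chosen for this example, and the sketch handles top-level fields only.

```python
import copy

# Hypothetical body standing in for what a "get" call might return.
retrieved = {
    "name": "template-1",
    "id": "1234567890",
    "creationTimestamp": "2020-01-01T00:00:00.000-08:00",
    "selfLink": "https://example.invalid/templates/template-1",
    "description": "original",
}

# Assumed list of output-only fields for this illustration.
OUTPUT_ONLY_FIELDS = ["id", "creationTimestamp", "selfLink"]


def prepare_insert_body(source, new_name):
    """Return a copy of source that is usable as input for an insert call."""
    body = copy.deepcopy(source)  # leave the retrieved resource untouched
    for field in OUTPUT_ONLY_FIELDS:
        body.pop(field, None)  # drop the field if present, ignore if absent
    body["name"] = new_name  # the actual modification we wanted to make
    return body


new_body = prepare_insert_body(retrieved, "template-2")
print(new_body)
```

Working on a deep copy keeps the retrieved resource available for comparison or logging; whether that is worth the extra copy depends on the size of the body.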

Field specification

The field specification is an array of strings naming the fields to remove. Each entry is either the name of a field directly in the body or the full path to a nested field, with components separated by '.'.

>>> import json
>>> from airflow.providers.google.cloud.utils.field_sanitizer import GcpBodyFieldSanitizer
>>> FIELDS_TO_SANITIZE = [
...     "kind",
...     "properties.disks.kind",
...     "properties.metadata.kind",
... ]
>>> body = {
...     "kind": "compute#instanceTemplate",
...     "name": "instance",
...     "properties": {
...         "disks": [
...             {
...                 "name": "a",
...                 "kind": "compute#attachedDisk",
...                 "type": "PERSISTENT",
...                 "mode": "READ_WRITE",
...             },
...             {
...                 "name": "b",
...                 "kind": "compute#attachedDisk",
...                 "type": "PERSISTENT",
...                 "mode": "READ_WRITE",
...             },
...         ],
...         "metadata": {
...             "kind": "compute#metadata",
...             "fingerprint": "GDPUYxlwHe4=",
...         },
...     },
... }
>>> sanitizer = GcpBodyFieldSanitizer(FIELDS_TO_SANITIZE)
>>> sanitizer.sanitize(body)  # modifies body in place
>>> print(json.dumps(body, indent=4))
{
    "name": "instance",
    "properties": {
        "disks": [
            {
                "name": "a",
                "type": "PERSISTENT",
                "mode": "READ_WRITE"
            },
            {
                "name": "b",
                "type": "PERSISTENT",
                "mode": "READ_WRITE"
            }
        ],
        "metadata": {
            "fingerprint": "GDPUYxlwHe4="
        }
    }
}

Note that the components of a path can be either dictionaries or arrays of dictionaries. For a dictionary, the next component names a key within it; for an array, the sanitizer iterates over all dictionaries in the array and looks up the remaining components in each element.
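The traversal rule described in the note can be sketched as a small recursive function. This is a minimal re-implementation for illustration only, not the provider's actual code: the names `remove_field` and `sanitize_body` are hypothetical, and silently skipping values of any other type is an assumption of this sketch.

```python
from typing import Any, Dict, List


def remove_field(node: Dict[str, Any], spec: str) -> None:
    """Delete the field named by a dot-separated spec from node, in place."""
    head, _, rest = spec.partition(".")
    if not rest:
        # Last path component: drop the key if it is present.
        node.pop(head, None)
        return
    child = node.get(head)
    if isinstance(child, dict):
        # Dictionary: the next component names a key inside it.
        remove_field(child, rest)
    elif isinstance(child, list):
        # Array: apply the remaining path to every dictionary element.
        for element in child:
            if isinstance(element, dict):
                remove_field(element, rest)
    # A missing key or any other type is left untouched in this sketch.


def sanitize_body(body: Dict[str, Any], specs: List[str]) -> None:
    """Apply every field spec to body, modifying it in place."""
    for spec in specs:
        remove_field(body, spec)


body = {
    "kind": "compute#instanceTemplate",
    "name": "instance",
    "properties": {
        "disks": [
            {"name": "a", "kind": "compute#attachedDisk"},
            {"name": "b", "kind": "compute#attachedDisk"},
        ],
    },
}
sanitize_body(body, ["kind", "properties.disks.kind"])
print(body)
```

Splitting off only the first component at each step keeps the recursion simple: every call handles one level of the body and passes the rest of the path down unchanged.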

Module Contents

exception airflow.providers.google.cloud.utils.field_sanitizer.GcpFieldSanitizerException

Bases: airflow.exceptions.AirflowException

Raised when the sanitizer finds a field of an unexpected type in the path (anything other than a dict or an array).

class airflow.providers.google.cloud.utils.field_sanitizer.GcpBodyFieldSanitizer(sanitize_specs: List[str])

Bases: airflow.utils.log.logging_mixin.LoggingMixin

Sanitizes the body according to specification.

Parameters

sanitize_specs (list[str]) -- array of strings that specifies which fields to remove

sanitize(self, body)

Sanitizes the body according to specification. The body is modified in place.
