airflow.providers.google.cloud.utils.field_validator

Validator for body fields sent via Google Cloud API.

The validator checks the body (a dictionary of fields) that is sent in an API request to Google Cloud (usually via the googleclient API).

Context

The specification mostly focuses on helping Airflow DAG developers during the development phase. You can build your own Google Cloud operator (such as GcfDeployOperator) with a built-in validation specification for the particular API. This is especially helpful when a developer experiments with different fields and their values in the initial phase of DAG development. Most Google Cloud APIs perform their own validation on the server side, but most requests are asynchronous, so you have to wait for the result of the operation. This takes precious time and slows down iteration over the API. GcpBodyFieldValidator is meant to be used on the client side, so it should give the developer instant feedback on misspelled parameters or parameters of the wrong type.

Validation should be performed in the execute() method so that template parameters are expanded before validation runs.

Types of fields

The specification is an array of dictionaries. Each dictionary describes a field: its name, type, validation, optionality, supported api_version, and nested fields (for unions and dicts).

Typically (for clarity and to aid syntax highlighting) the array of dicts should be defined as a series of dict() calls. A fragment of an example specification might look as follows:

SPECIFICATION = [
    dict(name="an_union", type="union", optional=True, fields=[
        dict(name="variant_1", type="dict"),
        dict(name="variant_2", regexp=r'^.+$', api_version='v1beta2'),
    ]),
    dict(name="a_dict", type="dict", fields=[
        dict(name="field_1", type="dict"),
        dict(name="field_2", regexp=r'^.+$'),
    ]),
    ...
]

Each field should have a "name" key giving the field name. A field can be of one of the following types:

  • Dict fields (key = "type", value = "dict"): a field of this type should contain nested fields in the form of an array of dicts. Each field in the array is then expected (unless marked as optional) and validated recursively. If an extra field is present in the dictionary, a warning is written to the log (but validation succeeds; see the Forward-compatibility notes).

  • List fields (key = "type", value = "list"): a field of this type should be a list. Only the type is validated; the contents of the list are not.

  • Union fields (key = "type", value = "union"): a field of this type should contain nested fields in the form of an array of dicts. Exactly one of the fields should be present (unless the union is marked as optional). If more than one union field is present, GcpFieldValidationException is raised. If none of the union fields is present, a warning is written to the log (see the Forward-compatibility notes below).

  • Fields validated for non-emptiness (key = "allow_empty"): this applies only to fields whose value is a string; setting allow_empty=False checks that the field is not empty.

  • Regexp-validated fields (key = "regexp"): fields of this type are assumed to be strings and are validated against the specified regexp. The regexp should ideally start with ^ and end with $ so that the whole field content is validated. Use such regexp validation carefully and sparingly (see the Forward-compatibility notes below).

  • Custom-validated fields (key = "custom_validation"): fields of this type are validated using the method given in the custom_validation key. Any exception raised by the custom validation is turned into GcpFieldValidationException and causes validation to fail. Custom validation can be used to check numeric fields (including ranges of values), booleans, or any other type of field.

  • API version (key = "api_version"): if an API version is specified, the field is validated only when the api_version passed at validator initialization exactly matches the specified version. To declare a field that is available in several API versions, specify the field once per supported version (each time with a different api_version).

  • If none of the keys "type", "regexp", or "custom_validation" is specified, the field is not validated.
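The structural rules in the list above (dict, list, union, and api_version filtering) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the provider's implementation: SketchValidationError, specs_for_version, check_union, walk_dict, and the SPEC contents are hypothetical names, and the sketch assumes that union variants appear at the same level of the body as the union entry itself.

```python
import logging

log = logging.getLogger(__name__)


class SketchValidationError(Exception):
    """Hypothetical stand-in for GcpFieldValidationException."""


def specs_for_version(specs, api_version):
    """Keep entries with no api_version key or an exactly matching one."""
    return [s for s in specs if s.get("api_version") in (None, api_version)]


def check_union(spec, body, api_version):
    """Allow at most one variant; return the names of all known variants."""
    # Assumption: variants are looked up at the same level as the union entry.
    variants = [v["name"] for v in specs_for_version(spec["fields"], api_version)]
    present = [name for name in variants if name in body]
    if len(present) > 1:
        raise SketchValidationError(f"mutually exclusive fields present: {present}")
    if not present and not spec.get("optional", False):
        log.warning("no variant of union %r found in the body", spec["name"])
    return variants


def walk_dict(specs, body, api_version):
    """Check body against specs and return the sorted names of unknown fields."""
    known = set()
    for spec in specs_for_version(specs, api_version):
        name, field_type = spec["name"], spec.get("type")
        if field_type == "union":
            known.update(check_union(spec, body, api_version))
            continue
        known.add(name)
        if name not in body:
            if not spec.get("optional", False):
                raise SketchValidationError(f"missing required field: {name!r}")
            continue
        value = body[name]
        if field_type == "dict":
            if not isinstance(value, dict):
                raise SketchValidationError(f"field {name!r} should be a dict")
            if "fields" in spec:  # nested fields are validated recursively
                walk_dict(spec["fields"], value, api_version)
        elif field_type == "list" and not isinstance(value, list):
            raise SketchValidationError(f"field {name!r} should be a list")
    unknown = sorted(set(body) - known)
    if unknown:
        # Unknown fields only produce a warning, so future API fields pass through.
        log.warning("fields not in the specification (not validated): %s", unknown)
    return unknown


# Hypothetical specification, written in the style of the fragment above.
SPEC = [
    dict(name="name"),
    dict(name="labels", type="dict", optional=True),
    dict(name="source", type="union", fields=[
        dict(name="source_url"),
        dict(name="source_repo", type="dict", api_version="v1beta2"),
    ]),
]

walk_dict(SPEC, {"name": "fn", "source_url": "gs://bucket/src.zip"}, "v1")
```

In this sketch a misspelled key is only reported, never rejected, which mirrors the warning-only treatment of extra fields described above. A variant tagged with a different api_version is filtered out and therefore treated as an unknown field.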

You can see some of the field examples in EXAMPLE_VALIDATION_SPECIFICATION.
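For the scalar checks (allow_empty, regexp, custom_validation), a minimal sketch might look like the following. It is an illustration only: SketchValidationError, check_max_instances, check_scalar, and the field names are hypothetical, and the sketch assumes that a custom validation callable receives just the field value.

```python
import re


class SketchValidationError(Exception):
    """Hypothetical stand-in for GcpFieldValidationException."""


def check_max_instances(value):
    """Hypothetical custom validation: require a positive integer."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"expected a positive integer, got {value!r}")


def check_scalar(spec, value):
    """Apply the allow_empty, regexp and custom_validation keys of one entry."""
    name = spec["name"]
    if spec.get("allow_empty") is False and not value:
        raise SketchValidationError(f"field {name!r} must not be empty")
    regexp = spec.get("regexp")
    if regexp is not None:
        # re.match anchors only at the start, hence the advice to end with $.
        if not isinstance(value, str) or not re.match(regexp, value):
            raise SketchValidationError(f"field {name!r} does not match {regexp!r}")
    custom = spec.get("custom_validation")
    if custom is not None:
        try:
            custom(value)  # assumption: the callable takes the value only
        except Exception as exc:
            # Any exception from the custom check becomes a validation error.
            raise SketchValidationError(f"field {name!r}: {exc}") from exc


# Hypothetical specification entries using the three scalar keys.
SPEC = [
    dict(name="name", allow_empty=False),
    dict(name="runtime", regexp=r"^.+$"),
    dict(name="max_instances", custom_validation=check_max_instances, optional=True),
]

check_scalar(SPEC[0], "my-function")
check_scalar(SPEC[1], "python39")
check_scalar(SPEC[2], 3)
```

The anchoring advice matters in practice: with re.match, a pattern such as r"v\d+" accepts "v1-bad" because only a prefix has to match, whereas r"^v\d+$" rejects it.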

Forward-compatibility notes

Certain decisions are crucial for allowing the client APIs to work with future API versions as well. Since the attached body is passed as-is to the API call, it is entirely possible to pass through any new fields in the body (for future API versions), albeit without validation on the client side; they usually can and will still be validated on the server side.

Here are the guidelines that you should follow to make validation forward-compatible:

  • Most of the fields are not validated for their content. It is possible to use a regexp in specific cases that are guaranteed not to change in the future, but for most fields the regexp validation should be r'^.+$', which checks only for non-emptiness.

  • api_version is not validated: the user can pass any future version of the API here. The API version is used only to filter the parameters that are marked as present in that version.

  • Any new fields in the body (not present in the specification) are allowed and are not verified. For dictionaries, new fields can be added by future calls. However, if an unknown field is present in a dictionary, the client logs a warning (but validation still succeeds). This is a useful protection against typos in names.

  • For unions, new union variants can be added by future calls and they will pass validation; however, the content or presence of those fields will not be validated. This means that it is possible to send a new, non-validated union field together with an old, validated field, and the client will not detect this problem. In such a case a warning will be printed.

  • When you add a validator to an operator, you should also add a validate_body parameter (default = True) to the __init__ of that operator. When it is set to False, no validation should be performed. This is a safeguard against totally unpredicted and backwards-incompatible changes that might sometimes occur in the APIs.
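The last guideline, combined with the advice to validate in execute(), might be wired into an operator roughly as follows. This is a sketch with stand-ins so that it runs without Airflow: SketchValidator and SketchDeployOperator are hypothetical, the stand-in validator only checks required top-level names, and a real operator would subclass Airflow's BaseOperator and use GcpBodyFieldValidator instead.

```python
class SketchValidator:
    """Hypothetical stand-in for GcpBodyFieldValidator (required names only)."""

    def __init__(self, validation_specs, api_version):
        self._specs = validation_specs
        self._api_version = api_version

    def validate(self, body_to_validate):
        for spec in self._specs:
            if not spec.get("optional", False) and spec["name"] not in body_to_validate:
                raise ValueError(f"missing required field: {spec['name']!r}")


class SketchDeployOperator:
    """Hypothetical operator; a real one would subclass Airflow's BaseOperator."""

    SPEC = [dict(name="name"), dict(name="entryPoint", optional=True)]

    def __init__(self, body, api_version="v1", validate_body=True):
        self.body = body
        self.api_version = api_version
        # None means "skip validation": the escape hatch for unexpected API changes.
        self._validator = SketchValidator(self.SPEC, api_version) if validate_body else None

    def execute(self, context):
        # Validate here rather than in __init__, so templated values are final.
        if self._validator is not None:
            self._validator.validate(self.body)
        return self.body  # placeholder for the actual API call


SketchDeployOperator({"name": "my-function"}).execute(context={})
SketchDeployOperator({}, validate_body=False).execute(context={})
```

Constructing the operator never raises in this sketch; an invalid body is reported only when execute() runs, and not at all when validate_body is False.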

Module Contents

Classes

GcpBodyFieldValidator

Validates correctness of request body according to specification.

Attributes

COMPOSITE_FIELD_TYPES

EXAMPLE_VALIDATION_SPECIFICATION

airflow.providers.google.cloud.utils.field_validator.COMPOSITE_FIELD_TYPES = ['union', 'dict', 'list']

exception airflow.providers.google.cloud.utils.field_validator.GcpFieldValidationException

Bases: airflow.exceptions.AirflowException

Thrown when validation finds a dictionary field that is not valid according to the specification.

exception airflow.providers.google.cloud.utils.field_validator.GcpValidationSpecificationException

Bases: airflow.exceptions.AirflowException

Thrown when the validation specification is wrong.

This should only happen during development, as ideally the specification itself should not be invalid ;).

airflow.providers.google.cloud.utils.field_validator.EXAMPLE_VALIDATION_SPECIFICATION

class airflow.providers.google.cloud.utils.field_validator.GcpBodyFieldValidator(validation_specs: Sequence[Dict], api_version: str)

Bases: airflow.utils.log.logging_mixin.LoggingMixin

Validates correctness of request body according to specification.

The specification can describe various types of fields, including custom validation and unions of fields. This validator is meant to be reusable by various operators. See EXAMPLE_VALIDATION_SPECIFICATION for some examples and explanations of how to create a specification.

Parameters
  • validation_specs (list[dict]) -- list of dictionaries describing the validation specification

  • api_version (str) -- Version of the api used (for example v1)

validate(self, body_to_validate: dict) → None

Validates whether the body (dictionary) follows the specification that the validator was instantiated with. Raises GcpValidationSpecificationException in case of problems with the specification, or GcpFieldValidationException if the body does not conform to the specification.

Parameters

body_to_validate (dict) -- body that must follow the specification

Returns

None
