airflow.providers.google.cloud.operators.mlengine

This module contains Google Cloud MLEngine operators.

Module Contents

Classes

MLEngineStartBatchPredictionJobOperator

Start a Google Cloud ML Engine prediction job.

MLEngineManageModelOperator

Operator for managing a Google Cloud ML Engine model.

MLEngineCreateModelOperator

Creates a new model.

MLEngineGetModelOperator

Gets a particular model.

MLEngineDeleteModelOperator

Deletes a model.

MLEngineManageVersionOperator

Operator for managing a Google Cloud ML Engine version.

MLEngineCreateVersionOperator

Creates a new version in the model.

MLEngineSetDefaultVersionOperator

Sets a version as the default version of the model.

MLEngineListVersionsOperator

Lists all available versions of the model.

MLEngineDeleteVersionOperator

Deletes the version from the model.

MLEngineStartTrainingJobOperator

Operator for launching an MLEngine training job.

MLEngineTrainingCancelJobOperator

Operator for cleaning up a failed MLEngine training job.

Attributes

log

airflow.providers.google.cloud.operators.mlengine.log
class airflow.providers.google.cloud.operators.mlengine.MLEngineStartBatchPredictionJobOperator(*, job_id, region, data_format, input_paths, output_path, model_name=None, version_name=None, uri=None, max_worker_count=None, runtime_version=None, signature_name=None, project_id=None, gcp_conn_id='google_cloud_default', labels=None, impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Start a Google Cloud ML Engine prediction job.

See also

For more information on how to use this operator, take a look at the guide: Making predictions

NOTE: For the model origin, specify exactly one of the three options below:

  1. Populate the uri field only, which should be a GCS location that points to a TensorFlow SavedModel directory.

  2. Populate the model_name field only, which refers to an existing model; the default version of the model will be used.

  3. Populate both the model_name and version_name fields, which refer to a specific version of a specific model.

In options 2 and 3, model_name and version_name should contain only the minimal identifier. For instance, call:

MLEngineStartBatchPredictionJobOperator(
    ...,
    model_name='my_model',
    version_name='my_version',
    ...)

if the desired model version is projects/my_project/models/my_model/versions/my_version.
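The three origin options can be sketched as a small validation helper. This is a minimal illustration, not the operator's actual implementation: the function name resolve_model_origin is hypothetical, and it assumes the uri, modelName and versionName fields of the predictionInput object described in the projects.jobs REST reference.

```python
from typing import Optional


def resolve_model_origin(
    project_id: str,
    uri: Optional[str] = None,
    model_name: Optional[str] = None,
    version_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper: return the predictionInput origin field for one valid option.

    Raises ValueError when the origin is ambiguous or missing.
    """
    if uri and not model_name and not version_name:
        # Option 1: a GCS path to a TensorFlow SavedModel directory.
        return {"uri": uri}
    if model_name and not uri:
        # Options 2 and 3 expand the minimal identifiers into full resource paths.
        model_path = f"projects/{project_id}/models/{model_name}"
        if version_name:
            # Option 3: a specific version of a specific model.
            return {"versionName": f"{model_path}/versions/{version_name}"}
        # Option 2: the model's default version is used.
        return {"modelName": model_path}
    raise ValueError(
        "Specify exactly one model origin: uri, model_name, or model_name with version_name."
    )
```

For example, resolve_model_origin("my_project", model_name="my_model", version_name="my_version") expands the minimal names into projects/my_project/models/my_model/versions/my_version.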

See https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs for further documentation on the parameters.

Parameters
  • job_id (str) – A unique id for the prediction job on Google Cloud ML Engine. (templated)

  • data_format (str) – The format of the input data. It defaults to ‘DATA_FORMAT_UNSPECIFIED’ if it is not provided or is not one of [“TEXT”, “TF_RECORD”, “TF_RECORD_GZIP”].

  • input_paths (list[str]) – A list of GCS paths of input data for batch prediction. The wildcard operator * is accepted, but only at the end. (templated)

  • output_path (str) – The GCS path to which the prediction results are written. (templated)

  • region (str) – The Google Compute Engine region to run the prediction job in. (templated)

  • model_name (str | None) – The Google Cloud ML Engine model to use for prediction. If version_name is not provided, the default version of this model will be used. Should not be None if version_name is provided. Should be None if uri is provided. (templated)

  • version_name (str | None) – The Google Cloud ML Engine model version to use for prediction. Should be None if uri is provided. (templated)

  • uri (str | None) – The GCS path of the saved model to use for prediction. Should be None if model_name is provided. It should be a GCS path pointing to a TensorFlow SavedModel. (templated)

  • max_worker_count (int | None) – The maximum number of workers to be used for parallel processing. Defaults to 10 if not specified. Should be a string representing the worker count (“10” instead of 10, “50” instead of 50, etc.)

  • runtime_version (str | None) – The Google Cloud ML Engine runtime version to use for batch prediction.

  • signature_name (str | None) – The name of the signature defined in the SavedModel to use for this job.

  • project_id (str | None) – The Google Cloud project name where the prediction job is submitted. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID used for connection to Google Cloud Platform.

  • labels (dict[str, str] | None) – A dictionary containing labels for the job; passed to the ML Engine job request.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
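The data-related parameters above roughly correspond to fields of the predictionInput object in a projects.jobs create request. The sketch below is an illustration under assumptions, not the operator's code: build_prediction_input is a hypothetical name, the camelCase keys follow the PredictionInput REST resource, and max_worker_count is sent as a string because that API encodes the int64 worker count as a JSON string.

```python
from typing import Optional

# Formats accepted as-is; anything else falls back to DATA_FORMAT_UNSPECIFIED.
KNOWN_DATA_FORMATS = {"TEXT", "TF_RECORD", "TF_RECORD_GZIP"}


def build_prediction_input(
    data_format: Optional[str],
    input_paths: list,
    output_path: str,
    region: str,
    max_worker_count: Optional[int] = None,
    runtime_version: Optional[str] = None,
    signature_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper assembling the non-origin predictionInput fields."""
    body = {
        "dataFormat": data_format if data_format in KNOWN_DATA_FORMATS else "DATA_FORMAT_UNSPECIFIED",
        "inputPaths": list(input_paths),
        "outputPath": output_path,
        "region": region,
    }
    # Optional fields are only included when they were set.
    if max_worker_count is not None:
        # int64 values are encoded as strings in the REST JSON representation.
        body["maxWorkerCount"] = str(max_worker_count)
    if runtime_version:
        body["runtimeVersion"] = runtime_version
    if signature_name:
        body["signatureName"] = signature_name
    return body
```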

Raises

ValueError: if a unique model/version origin cannot be determined.

template_fields: Sequence[str] = ('_project_id', '_job_id', '_region', '_input_paths', '_output_path', '_model_name',...
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineManageModelOperator(*, model, operation='create', project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Operator for managing a Google Cloud ML Engine model.

Warning

This operator is deprecated. Consider using operators for specific operations: MLEngineCreateModelOperator, MLEngineGetModelOperator.

Parameters
  • model (dict) –

    A dictionary containing the information about the model. If the operation is create, then the model parameter should contain all the information about this model such as name.

    If the operation is get, the model parameter should contain the name of the model.

  • operation (str) –

    The operation to perform. Available operations are:

    • create: Creates a new model as provided by the model parameter.

    • get: Gets a particular model where the name is specified in model.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
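Because this operator is deprecated, migrating a call means picking the replacement operator for each operation. A minimal sketch of that mapping, based only on the signatures documented on this page; migrate_manage_model is a hypothetical helper that returns the replacement class name and the keyword arguments it would receive, rather than constructing Airflow objects.

```python
def migrate_manage_model(operation: str, model: dict) -> tuple:
    """Hypothetical helper: map a deprecated MLEngineManageModelOperator call."""
    if operation == "create":
        # The create operator takes the full model dictionary unchanged.
        return "MLEngineCreateModelOperator", {"model": model}
    if operation == "get":
        # The get operator takes only the model name.
        return "MLEngineGetModelOperator", {"model_name": model["name"]}
    raise ValueError(f"Unknown operation: {operation}")
```

Shared arguments such as project_id, gcp_conn_id and impersonation_chain carry over unchanged, since all three operators accept them.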

template_fields: Sequence[str] = ('_project_id', '_model', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineCreateModelOperator(*, model, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Creates a new model.

See also

For more information on how to use this operator, take a look at the guide: Creating a model

The model should be provided by the model parameter.

Parameters
  • model (dict) – A dictionary containing the information about the model.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
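The model parameter is a dictionary shaped like the Model REST resource. The sketch below builds such a dictionary under stated assumptions: build_model_body is hypothetical, the naming rule (start with a letter, then only letters, digits and underscores) is taken from the AI Platform naming guidance and enforced client-side only for illustration, and only the name, description and regions fields are shown.

```python
import re
from typing import Optional

# Assumed naming rule: starts with a letter; only letters, digits and underscores.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_model_body(name: str, description: Optional[str] = None, region: Optional[str] = None) -> dict:
    """Hypothetical helper producing a dictionary for the model parameter."""
    if not MODEL_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid model name: {name!r}")
    model = {"name": name}
    if description:
        model["description"] = description
    if region:
        # The Model resource takes a list of regions; a single region is assumed here.
        model["regions"] = [region]
    return model
```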

template_fields: Sequence[str] = ('_project_id', '_model', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineGetModelOperator(*, model_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Gets a particular model.

See also

For more information on how to use this operator, take a look at the guide: Getting a model

The name of the model should be specified in model_name.

Parameters
  • model_name (str) – The name of the model.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
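The model_name and project_id rules combine into a single resource path. A minimal sketch, assuming the projects/{project}/models/{model} path format used elsewhere on this page; model_resource_path and its connection_project_id argument are hypothetical stand-ins for the connection's default project.

```python
from typing import Optional


def model_resource_path(model_name: str, project_id: Optional[str], connection_project_id: Optional[str]) -> str:
    """Hypothetical helper: expand a minimal model name into a resource path."""
    # Fall back to the connection's default project when project_id is not given.
    resolved_project = project_id or connection_project_id
    if not resolved_project:
        raise ValueError("No project_id given and the connection has no default project.")
    if "/" in model_name:
        # model_name is expected to be the minimal identifier, not a full path.
        raise ValueError(f"Expected a minimal model name, got {model_name!r}")
    return f"projects/{resolved_project}/models/{model_name}"
```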

template_fields: Sequence[str] = ('_project_id', '_model_name', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineDeleteModelOperator(*, model_name, delete_contents=False, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Deletes a model.

See also

For more information on how to use this operator, take a look at the guide: Cleaning up

The model should be provided by the model_name parameter.

Parameters
  • model_name (str) – The name of the model.

  • delete_contents (bool) – (Optional) Whether to force the deletion even if the model is not empty. If set to True, all versions of the model (if any) are deleted as well. The default value is False.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
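With delete_contents=True, every version must be removed before the model itself. The AI Platform REST reference states that a model's default version cannot be deleted unless it is the only remaining version, so any cleanup has to delete the non-default versions first. A minimal sketch of that ordering, assuming version dictionaries shaped like the REST Version resource (a name plus an optional isDefault flag); version_deletion_order is a hypothetical helper.

```python
def version_deletion_order(versions: list) -> list:
    """Hypothetical helper: return short version names in a safe deletion order."""
    non_default = [v for v in versions if not v.get("isDefault", False)]
    default = [v for v in versions if v.get("isDefault", False)]
    # Non-default versions go first; the default version can only be deleted last.
    ordered = non_default + default
    # Keep only the trailing identifier in case "name" is a full resource path.
    return [v["name"].rpartition("/")[2] for v in ordered]
```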

template_fields: Sequence[str] = ('_project_id', '_model_name', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineManageVersionOperator(*, model_name, version_name=None, version=None, operation='create', project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Operator for managing a Google Cloud ML Engine version.

Warning

This operator is deprecated. Consider using operators for specific operations: MLEngineCreateVersionOperator, MLEngineSetDefaultVersionOperator, MLEngineListVersionsOperator, MLEngineDeleteVersionOperator.

Parameters
  • model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)

  • version_name (str | None) – A name to use for the version being operated upon. If not None and the version argument is None or does not have a value for the name key, then this will be populated in the payload for the name key. (templated)

  • version (dict | None) – A dictionary containing the information about the version. If the operation is create, version should contain all the information about this version, such as name and deploymentUri. If the operation sets the default version or deletes a version, the version parameter should contain the name of the version. If it is None, the only possible operation is list. (templated)

  • operation (str) –

    The operation to perform. Available operations are:

    • create: Creates a new version in the model specified by model_name, in which case the version parameter should contain all the information to create that version (e.g. name, deploymentUri).

    • set_default: Sets a version in the model specified by model_name to be the default. The name of the version should be specified in the version parameter.

    • list: Lists all available versions of the model specified by model_name.

    • delete: Deletes the version named in the version parameter from the model specified by model_name.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
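The version_name rule above only fills in a missing name; it never overrides one already present in version. A minimal sketch of that documented rule; merge_version_name is a hypothetical helper, and treating an empty or None name as missing is an assumption.

```python
from typing import Optional


def merge_version_name(version: Optional[dict], version_name: Optional[str]) -> dict:
    """Hypothetical helper applying the documented version_name rule."""
    # Copy so the caller's dictionary is not mutated.
    payload = dict(version or {})
    # version_name only fills a missing or empty "name"; an existing name wins.
    if version_name is not None and not payload.get("name"):
        payload["name"] = version_name
    return payload
```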

template_fields: Sequence[str] = ('_project_id', '_model_name', '_version_name', '_version', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineCreateVersionOperator(*, model_name, version, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Creates a new version in the model.

See also

For more information on how to use this operator, take a look at the guide: Creating model versions

The model is specified by model_name, and the version parameter should contain all the information needed to create the version.

Parameters
  • model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)

  • version (dict) – A dictionary containing the information about the version. (templated)

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
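The version parameter is a dictionary shaped like the Version REST resource. The sketch below builds one under stated assumptions: build_version_body is hypothetical, it assumes a deployment from model artifacts stored in Cloud Storage (so deploymentUri must be a gs:// path), and it shows only a few optional fields (runtimeVersion, framework, pythonVersion).

```python
from typing import Optional


def build_version_body(
    name: str,
    deployment_uri: str,
    runtime_version: Optional[str] = None,
    framework: Optional[str] = None,
    python_version: Optional[str] = None,
) -> dict:
    """Hypothetical helper producing a dictionary for the version parameter."""
    if not deployment_uri.startswith("gs://"):
        raise ValueError("deployment_uri should be a Cloud Storage path (gs://...)")
    body = {"name": name, "deploymentUri": deployment_uri}
    optional = {"runtimeVersion": runtime_version, "framework": framework, "pythonVersion": python_version}
    # Only include optional fields that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```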

template_fields: Sequence[str] = ('_project_id', '_model_name', '_version', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineSetDefaultVersionOperator(*, model_name, version_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Sets a version as the default version of the model.

See also

For more information on how to use this operator, take a look at the guide: Managing model versions

The model is specified by model_name, and the version to make its default is specified by the version_name parameter.

Parameters
  • model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)

  • version_name (str) – A name to use for the version being operated upon. (templated)

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
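A model has at most one default version, so promoting one version demotes the previous default. The sketch below simulates that effect on an in-memory list of version dictionaries; apply_default is hypothetical, and the name and isDefault keys are assumed to follow the Version REST resource. The real change happens server-side.

```python
def apply_default(versions: list, version_name: str) -> list:
    """Hypothetical helper: return copies of versions with exactly one marked default."""
    # Compare on the trailing identifier in case "name" is a full resource path.
    short_names = [v["name"].rpartition("/")[2] for v in versions]
    if version_name not in short_names:
        raise ValueError(f"Version {version_name!r} not found in model")
    # The chosen version becomes the default; every other version is cleared.
    return [dict(v, isDefault=(short == version_name)) for v, short in zip(versions, short_names)]
```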

template_fields: Sequence[str] = ('_project_id', '_model_name', '_version_name', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineListVersionsOperator(*, model_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Lists all available versions of the model.

See also

For more information on how to use this operator, take a look at the guide: Managing model versions

The model should be specified by model_name.

Parameters
  • model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
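Listing "all available versions" implies following the API's pagination. A minimal sketch of that loop, assuming list responses shaped like the projects.models.versions.list REST response (a versions array plus an optional nextPageToken); list_all_versions and the fetch_page callable are hypothetical stand-ins for the actual API call.

```python
from typing import Callable, Optional


def list_all_versions(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Hypothetical helper: collect versions across all result pages.

    fetch_page(page_token) stands in for one list call and returns a dict
    with optional "versions" and "nextPageToken" keys.
    """
    versions = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        versions.extend(response.get("versions", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            # No token means the last page has been read.
            return versions
```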

template_fields: Sequence[str] = ('_project_id', '_model_name', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineDeleteVersionOperator(*, model_name, version_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Deletes the version from the model.

See also

For more information on how to use this operator, take a look at the guide: Cleaning up

The version to delete is specified by the version_name parameter, and the model it belongs to by model_name.

Parameters
  • model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)

  • version_name (str) – A name to use for the version being operated upon. (templated)

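  • project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)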

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('_project_id', '_model_name', '_version_name', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.mlengine.MLEngineStartTrainingJobOperator(*, job_id, region, project_id, package_uris=None, training_python_module=None, training_args=None, scale_tier=None, master_type=None, master_config=None, runtime_version=None, python_version=None, job_dir=None, service_account=None, gcp_conn_id='google_cloud_default', mode='PRODUCTION', labels=None, impersonation_chain=None, hyperparameters=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), cancel_on_kill=True, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Operator for launching an MLEngine training job.

See also

For more information on how to use this operator, take a look at the guide: Launching a Job

For more information about used parameters, check:

https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training

Parameters
  • job_id (str) – A unique id for the submitted Google MLEngine training job. (templated)

  • region (str) – The Google Compute Engine region to run the MLEngine training job in (templated).

  • package_uris (list[str] | None) – A list of Python package locations for the training job, which should include the main training program and any additional dependencies. This is mutually exclusive with a custom image specified via master_config. (templated)

  • training_python_module (str | None) – The name of the Python module to run within the training job after installing the packages. This is mutually exclusive with a custom image specified via master_config. (templated)

  • training_args (list[str] | None) – A list of command-line arguments to pass to the training program. (templated)

  • scale_tier (str | None) – Resource tier for MLEngine training job. (templated)

  • master_type (str | None) – The type of virtual machine to use for the master worker. It must be set whenever scale_tier is CUSTOM. (templated)

  • master_config (dict | None) – The configuration for the master worker. If this is provided, master_type must be set as well. If a custom image is specified, this is mutually exclusive with package_uris and training_python_module. (templated)

  • runtime_version (str | None) – The Google Cloud ML runtime version to use for training. (templated)

  • python_version (str | None) – The version of Python used in training. (templated)

  • job_dir (str | None) – A Google Cloud Storage path in which to store training outputs and other data needed for training. (templated)

  • service_account (str | None) – Optional service account to use when running the training application. The specified service account must have the iam.serviceAccounts.actAs role. The Google-managed Cloud ML Engine service account must have the iam.serviceAccountAdmin role for the specified service account. If set to None or missing, the Google-managed Cloud ML Engine service account will be used. (templated)

  • project_id (str) – The Google Cloud project name within which MLEngine training job should run.

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • mode (str) – Either ‘DRY_RUN’ or the default ‘PRODUCTION’. In ‘DRY_RUN’ mode, no real training job is launched; the MLEngine training job request is only printed out. Otherwise, a real MLEngine training job creation request is issued.

  • labels (dict[str, str] | None) – A dictionary containing labels for the job; passed to the ML Engine job request.

  • hyperparameters (dict | None) – Optional HyperparameterSpec dictionary for hyperparameter tuning. For further reference, check: https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#HyperparameterSpec

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).

  • cancel_on_kill (bool) – Whether to cancel the submitted training job when on_kill is called.

  • deferrable (bool) – Whether to run the operator in deferrable mode.
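The parameter descriptions above imply several cross-parameter rules: master_type is required for the CUSTOM tier and whenever master_config is given, and a custom image excludes package_uris and training_python_module. The sketch below checks those rules and assembles a trainingInput body. It is an illustration, not the operator's code: build_training_input is hypothetical, a custom image is assumed to be signalled by an imageUri key in master_config, requiring both package_uris and training_python_module when no image is given is an assumption, and the camelCase keys follow the TrainingInput REST resource. Only a subset of parameters is shown.

```python
from typing import Optional


def build_training_input(
    region: str,
    scale_tier: Optional[str] = None,
    package_uris: Optional[list] = None,
    training_python_module: Optional[str] = None,
    training_args: Optional[list] = None,
    master_type: Optional[str] = None,
    master_config: Optional[dict] = None,
    job_dir: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate documented rules and build trainingInput."""
    custom_tier = scale_tier is not None and scale_tier.upper() == "CUSTOM"
    if custom_tier and not master_type:
        raise ValueError("master_type must be set when scale_tier is CUSTOM")
    if master_config and not master_type:
        raise ValueError("master_type must be set when master_config is provided")
    # Assumption: a custom container is signalled by an imageUri in master_config.
    custom_image = bool(master_config and master_config.get("imageUri"))
    uses_package = bool(package_uris or training_python_module)
    if custom_image and uses_package:
        raise ValueError("package_uris/training_python_module are mutually exclusive with a custom image")
    if not custom_image and not (package_uris and training_python_module):
        raise ValueError("Provide both package_uris and training_python_module, or a custom image")

    body = {"region": region}
    optional = {
        "scaleTier": scale_tier,
        "packageUris": package_uris,
        "pythonModule": training_python_module,
        "args": training_args,
        "masterType": master_type,
        "masterConfig": master_config,
        "jobDir": job_dir,
    }
    # Drop unset (None or empty) values so only provided fields are sent.
    body.update({key: value for key, value in optional.items() if value})
    return body
```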

template_fields: Sequence[str] = ('_project_id', '_job_id', '_region', '_package_uris', '_training_python_module',...
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)

Callback for when the trigger fires; returns immediately.

Relies on the trigger to raise an exception; otherwise, execution is assumed to have succeeded.

on_kill()

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

class airflow.providers.google.cloud.operators.mlengine.MLEngineTrainingCancelJobOperator(*, job_id, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Operator for cleaning up a failed MLEngine training job.

Parameters
  • job_id (str) – A unique id for the submitted Google MLEngine training job. (templated)

  • project_id (str | None) – The Google Cloud project name within which MLEngine training job should run. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
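Cancellation only has an effect on a job that has not yet reached a final state. The sketch below is a hypothetical pre-check, not part of this operator: should_request_cancel is an illustrative name, and the state strings are assumed to follow the State enum of the AI Platform Training projects.jobs REST resource.

```python
# Assumed final job states from the projects.jobs REST resource.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def should_request_cancel(job: dict) -> bool:
    """Hypothetical pre-check: only cancel a job whose state can still change."""
    state = job.get("state", "STATE_UNSPECIFIED")
    # CANCELLING is already in progress, so a second request adds nothing.
    return state not in TERMINAL_STATES and state != "CANCELLING"
```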

template_fields: Sequence[str] = ('_project_id', '_job_id', '_impersonation_chain')
execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
