airflow.providers.google.cloud.operators.mlengine
This module contains Google Cloud MLEngine operators.
Module Contents
Classes
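MLEngineStartBatchPredictionJobOperator | Start a Google Cloud ML Engine prediction job.
MLEngineManageModelOperator | Operator for managing a Google Cloud ML Engine model.
MLEngineCreateModelOperator | Creates a new model.
MLEngineGetModelOperator | Gets a particular model.
MLEngineDeleteModelOperator | Deletes a model.
MLEngineManageVersionOperator | Operator for managing a Google Cloud ML Engine version.
MLEngineCreateVersionOperator | Creates a new version in the model.
MLEngineSetDefaultVersionOperator | Sets a version in the model.
MLEngineListVersionsOperator | Lists all available versions of the model.
MLEngineDeleteVersionOperator | Deletes the version from the model.
MLEngineStartTrainingJobOperator | Operator for launching a MLEngine training job.
MLEngineTrainingCancelJobOperator | Operator for cleaning up failed MLEngine training job.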
Attributes
- class airflow.providers.google.cloud.operators.mlengine.MLEngineStartBatchPredictionJobOperator(*, job_id, region, data_format, input_paths, output_path, model_name=None, version_name=None, uri=None, max_worker_count=None, runtime_version=None, signature_name=None, project_id=None, gcp_conn_id='google_cloud_default', labels=None, impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Start a Google Cloud ML Engine prediction job.
See also
For more information on how to use this operator, take a look at the guide: Making predictions
NOTE: For the model origin, set exactly one of the three options below:
1. Populate the uri field only, which should be a GCS location that points to a TensorFlow SavedModel directory.
2. Populate the model_name field only, which refers to an existing model; the default version of the model will be used.
3. Populate both the model_name and version_name fields, which refer to a specific version of a specific model.
In options 2 and 3, both the model and version names should contain only the minimal identifier. For instance, call:
MLEngineStartBatchPredictionJobOperator( ..., model_name='my_model', version_name='my_version', ...)
if the desired model version is projects/my_project/models/my_model/versions/my_version.
See https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs for further documentation on the parameters.
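The "exactly one of three options" rule above can be sketched as a small stand-alone check. This is only an illustration of the rule, not the operator's own code: the helper name resolve_model_origin is hypothetical, and the returned keys (uri, modelName, versionName) are assumed to mirror the prediction input fields of the REST reference linked above.

```python
from __future__ import annotations


def resolve_model_origin(
    project_id: str,
    uri: str | None = None,
    model_name: str | None = None,
    version_name: str | None = None,
) -> dict[str, str]:
    """Return the single model-origin field implied by the three options."""
    if uri:
        # Option 1: uri must be the only field that is set.
        if model_name or version_name:
            raise ValueError("Ambiguous model origin: uri excludes model_name and version_name")
        return {"uri": uri}
    if not model_name:
        # version_name alone (or nothing at all) does not identify a model.
        raise ValueError("Missing model origin: set uri, or model_name")
    base = f"projects/{project_id}/models/{model_name}"
    if version_name:
        # Option 3: a specific version of a specific model.
        return {"versionName": f"{base}/versions/{version_name}"}
    # Option 2: the default version of the model is used.
    return {"modelName": base}
```

Passing the minimal identifiers my_model and my_version with project my_project yields the fully qualified name shown in the note, which is why the operator arguments should not carry the projects/... prefix themselves.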
- Parameters
job_id (str) – A unique id for the prediction job on Google Cloud ML Engine. (templated)
data_format (str) – The format of the input data. It defaults to ‘DATA_FORMAT_UNSPECIFIED’ if it is not provided or is not one of [“TEXT”, “TF_RECORD”, “TF_RECORD_GZIP”].
input_paths (list[str]) – A list of GCS paths of input data for batch prediction. The wildcard operator * is accepted, but only at the end. (templated)
output_path (str) – The GCS path where the prediction results are written to. (templated)
region (str) – The Google Compute Engine region to run the prediction job in. (templated)
model_name (str | None) – The Google Cloud ML Engine model to use for prediction. If version_name is not provided, the default version of this model will be used. Should not be None if version_name is provided. Should be None if uri is provided. (templated)
version_name (str | None) – The Google Cloud ML Engine model version to use for prediction. Should be None if uri is provided. (templated)
uri (str | None) – The GCS path of the saved model to use for prediction. Should be None if model_name is provided. It should be a GCS path pointing to a tensorflow SavedModel. (templated)
max_worker_count (int | None) – The maximum number of workers to be used for parallel processing. Defaults to 10 if not specified. Should be a string representing the worker count (“10” instead of 10, “50” instead of 50, etc.)
runtime_version (str | None) – The Google Cloud ML Engine runtime version to use for batch prediction.
signature_name (str | None) – The name of the signature defined in the SavedModel to use for this job.
project_id (str | None) – The Google Cloud project name where the prediction job is submitted. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID used for connection to Google Cloud Platform.
labels (dict[str, str] | None) – A dictionary containing labels for the job; passed to the ML Engine job request.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- Raises
ValueError: if a unique model/version origin cannot be determined.
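A minimal sketch of how the parameters above might be assembled for a task. The bucket, job, and model names are hypothetical, and check_input_paths is a local illustration of the "wildcard only at the end" rule rather than a check the operator is known to perform. The Airflow import is deferred into a function so the argument dictionary can be inspected without Airflow installed.

```python
from __future__ import annotations


def check_input_paths(input_paths: list[str]) -> list[str]:
    """Reject paths that use '*' anywhere except as the final character."""
    for path in input_paths:
        if "*" in path[:-1]:
            raise ValueError(f"Wildcard is only allowed at the end: {path}")
    return input_paths


# Hypothetical values, for illustration only.
batch_prediction_kwargs = {
    "task_id": "batch_predict",
    "job_id": "predict_{{ ds_nodash }}",  # job_id is a templated field
    "region": "us-central1",
    "data_format": "TEXT",
    "input_paths": check_input_paths(["gs://my-bucket/inputs/*"]),
    "output_path": "gs://my-bucket/outputs/",
    # Option 3 of the model-origin note: model_name plus version_name, no uri.
    "model_name": "my_model",
    "version_name": "my_version",
}


def make_batch_prediction_task():
    """Instantiate the operator; Airflow is imported only when this is called."""
    from airflow.providers.google.cloud.operators.mlengine import (
        MLEngineStartBatchPredictionJobOperator,
    )

    return MLEngineStartBatchPredictionJobOperator(**batch_prediction_kwargs)
```

Leaving project_id unset relies on the default project of the Google Cloud connection, as described for that parameter.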
- class airflow.providers.google.cloud.operators.mlengine.MLEngineManageModelOperator(*, model, operation='create', project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Operator for managing a Google Cloud ML Engine model.
Warning
This operator is deprecated. Consider using operators for specific operations: MLEngineCreateModelOperator, MLEngineGetModelOperator.
- Parameters
model (dict) – A dictionary containing the information about the model. If the operation is create, the model parameter should contain all the information about this model, such as name. If the operation is get, the model parameter should contain the name of the model.
operation (str) – The operation to perform. Available operations are:
create: Creates a new model as provided by the model parameter.
get: Gets a particular model where the name is specified in model.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.mlengine.MLEngineCreateModelOperator(*, model, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a new model.
See also
For more information on how to use this operator, take a look at the guide: Creating a model
The model should be provided by the model parameter.
- Parameters
model (dict) – A dictionary containing the information about the model.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
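As a rough illustration of the model parameter, the sketch below builds the arguments for a create-model task. The model name is hypothetical; "name" is the key mentioned in this reference, while "description" is assumed to be an optional field of the model resource. The Airflow import is deferred so the snippet can be read and checked without Airflow installed.

```python
# Hypothetical payload, for illustration only.
create_model_kwargs = {
    "task_id": "create_model",
    "project_id": None,  # fall back to the project of the Google Cloud connection
    "model": {
        "name": "my_model",
        "description": "Example model",  # assumed optional field
    },
}


def make_create_model_task():
    """Instantiate the operator; Airflow is imported only when this is called."""
    from airflow.providers.google.cloud.operators.mlengine import (
        MLEngineCreateModelOperator,
    )

    return MLEngineCreateModelOperator(**create_model_kwargs)
```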
- class airflow.providers.google.cloud.operators.mlengine.MLEngineGetModelOperator(*, model_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Gets a particular model.
See also
For more information on how to use this operator, take a look at the guide: Getting a model
The name of model should be specified in model_name.
- Parameters
model_name (str) – The name of the model.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.mlengine.MLEngineDeleteModelOperator(*, model_name, delete_contents=False, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Deletes a model.
See also
For more information on how to use this operator, take a look at the guide: Cleaning up
The model should be provided by the model_name parameter.
- Parameters
model_name (str) – The name of the model.
delete_contents (bool) – (Optional) Whether to force the deletion even if the model is not empty. Deletes all versions (if any) in the model if set to True. The default value is False.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.mlengine.MLEngineManageVersionOperator(*, model_name, version_name=None, version=None, operation='create', project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Operator for managing a Google Cloud ML Engine version.
Warning
This operator is deprecated. Consider using operators for specific operations: MLEngineCreateVersionOperator, MLEngineSetDefaultVersionOperator, MLEngineListVersionsOperator, MLEngineDeleteVersionOperator.
- Parameters
model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)
version_name (str | None) – A name to use for the version being operated upon. If not None and the version argument is None or does not have a value for the name key, then this will be populated in the payload for the name key. (templated)
version (dict | None) – A dictionary containing the information about the version. If the operation is create, version should contain all the information about this version, such as name and deploymentUri. If the operation is get or delete, the version parameter should contain the name of the version. If it is None, the only operation possible would be list. (templated)
operation (str) – The operation to perform. Available operations are:
create: Creates a new version in the model specified by model_name, in which case the version parameter should contain all the information to create that version (e.g. name, deploymentUri).
set_defaults: Sets a version in the model specified by model_name to be the default. The name of the version should be specified in the version parameter.
list: Lists all available versions of the model specified by model_name.
delete: Deletes the version specified in the version parameter from the model specified by model_name. The name of the version should be specified in the version parameter.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
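Because this operator is deprecated, a migration usually means picking the dedicated operator for each operation value. The mapping below is a sketch inferred from the operation descriptions and the class names in the deprecation warning; the helper replacement_for is hypothetical.

```python
# Operation strings as listed in this reference, mapped to the dedicated
# operator class names given in the deprecation warning (inferred pairing).
REPLACEMENTS = {
    "create": "MLEngineCreateVersionOperator",
    "set_defaults": "MLEngineSetDefaultVersionOperator",
    "list": "MLEngineListVersionsOperator",
    "delete": "MLEngineDeleteVersionOperator",
}


def replacement_for(operation: str) -> str:
    """Return the name of the dedicated operator for a given operation."""
    try:
        return REPLACEMENTS[operation]
    except KeyError:
        raise ValueError(f"Unknown operation: {operation!r}") from None
```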
- class airflow.providers.google.cloud.operators.mlengine.MLEngineCreateVersionOperator(*, model_name, version, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a new version in the model.
See also
For more information on how to use this operator, take a look at the guide: Creating model versions
The model should be specified by model_name, and the version parameter should contain all the information needed to create the version.
- Parameters
model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)
version (dict) – A dictionary containing the information about the version. (templated)
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
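The version dictionary is passed through as the version payload, so its keys follow the service's version resource rather than Python naming. The sketch below assumes the keys name, deploymentUri, and runtimeVersion from the REST reference; the helper build_version, the bucket path, and the runtime version value are hypothetical.

```python
def build_version(name: str, deployment_uri: str, runtime_version: str) -> dict:
    """Assemble a version payload; key names are assumed from the REST Version resource."""
    if not deployment_uri.startswith("gs://"):
        raise ValueError("deployment_uri is expected to be a GCS path")
    return {
        "name": name,
        "deploymentUri": deployment_uri,
        "runtimeVersion": runtime_version,
    }


# Hypothetical values, for illustration only.
create_version_kwargs = {
    "task_id": "create_version_v1",
    "model_name": "my_model",
    "version": build_version("v1", "gs://my-bucket/saved_model/", "2.11"),
}


def make_create_version_task():
    """Instantiate the operator; Airflow is imported only when this is called."""
    from airflow.providers.google.cloud.operators.mlengine import (
        MLEngineCreateVersionOperator,
    )

    return MLEngineCreateVersionOperator(**create_version_kwargs)
```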
- class airflow.providers.google.cloud.operators.mlengine.MLEngineSetDefaultVersionOperator(*, model_name, version_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Sets a version in the model.
See also
For more information on how to use this operator, take a look at the guide: Managing model versions
Sets a version of the model specified by model_name to be the default. The name of the version should be specified in the version_name parameter.
- Parameters
model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)
version_name (str) – A name to use for the version being operated upon. (templated)
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.mlengine.MLEngineListVersionsOperator(*, model_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Lists all available versions of the model.
See also
For more information on how to use this operator, take a look at the guide: Managing model versions
The model should be specified by model_name.
- Parameters
model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.mlengine.MLEngineDeleteVersionOperator(*, model_name, version_name, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Deletes the version from the model.
See also
For more information on how to use this operator, take a look at the guide: Cleaning up
Deletes the version named by the version_name parameter from the model specified by model_name.
- Parameters
model_name (str) – The name of the Google Cloud ML Engine model that the version belongs to. (templated)
version_name (str) – A name to use for the version being operated upon. (templated)
project_id (str | None) – The Google Cloud project name to which MLEngine model belongs.
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
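The version operators take the same model_name and are often used together. The following sketch wires a set-default, list, and delete sequence; the model and version names are hypothetical, and the ordering is one plausible clean-up flow rather than a prescribed one. The function has to be called inside a DAG context, since Airflow only allows dependencies between tasks that belong to a DAG.

```python
MODEL_NAME = "my_model"  # hypothetical model name

set_default_kwargs = {
    "task_id": "set_default_version",
    "model_name": MODEL_NAME,
    "version_name": "v2",
}
list_versions_kwargs = {"task_id": "list_versions", "model_name": MODEL_NAME}
delete_version_kwargs = {
    "task_id": "delete_version",
    "model_name": MODEL_NAME,
    "version_name": "v1",
}


def make_version_tasks():
    """Create and chain the three tasks; call this inside a `with DAG(...)` block."""
    from airflow.providers.google.cloud.operators.mlengine import (
        MLEngineDeleteVersionOperator,
        MLEngineListVersionsOperator,
        MLEngineSetDefaultVersionOperator,
    )

    set_default = MLEngineSetDefaultVersionOperator(**set_default_kwargs)
    list_versions = MLEngineListVersionsOperator(**list_versions_kwargs)
    delete_old = MLEngineDeleteVersionOperator(**delete_version_kwargs)
    # Promote v2 first, then list the versions, then remove v1.
    set_default >> list_versions >> delete_old
    return set_default, list_versions, delete_old
```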
- class airflow.providers.google.cloud.operators.mlengine.MLEngineStartTrainingJobOperator(*, job_id, region, project_id, package_uris=None, training_python_module=None, training_args=None, scale_tier=None, master_type=None, master_config=None, runtime_version=None, python_version=None, job_dir=None, service_account=None, gcp_conn_id='google_cloud_default', mode='PRODUCTION', labels=None, impersonation_chain=None, hyperparameters=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), cancel_on_kill=True, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Operator for launching a MLEngine training job.
See also
For more information on how to use this operator, take a look at the guide: Launching a Job
For more information about the parameters used, see: https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training
- Parameters
job_id (str) – A unique templated id for the submitted Google MLEngine training job. (templated)
region (str) – The Google Compute Engine region to run the MLEngine training job in (templated).
package_uris (list[str] | None) – A list of Python package locations for the training job, which should include the main training program and any additional dependencies. This is mutually exclusive with a custom image specified via master_config. (templated)
training_python_module (str | None) – The name of the Python module to run within the training job after installing the packages. This is mutually exclusive with a custom image specified via master_config. (templated)
training_args (list[str] | None) – A list of command-line arguments to pass to the training program. (templated)
scale_tier (str | None) – Resource tier for MLEngine training job. (templated)
master_type (str | None) – The type of virtual machine to use for the master worker. It must be set whenever scale_tier is CUSTOM. (templated)
master_config (dict | None) – The configuration for the master worker. If this is provided, master_type must be set as well. If a custom image is specified, this is mutually exclusive with package_uris and training_python_module. (templated)
runtime_version (str | None) – The Google Cloud ML runtime version to use for training. (templated)
python_version (str | None) – The version of Python used in training. (templated)
job_dir (str | None) – A Google Cloud Storage path in which to store training outputs and other data needed for training. (templated)
service_account (str | None) – Optional service account to use when running the training application. (templated) The specified service account must have the iam.serviceAccounts.actAs role. The Google-managed Cloud ML Engine service account must have the iam.serviceAccountAdmin role for the specified service account. If set to None or missing, the Google-managed Cloud ML Engine service account will be used.
project_id (str) – The Google Cloud project name within which MLEngine training job should run.
gcp_conn_id (str) – The connection ID to use when fetching connection info.
mode (str) – Can be one of ‘DRY_RUN’ or ‘CLOUD’. In ‘DRY_RUN’ mode, no real training job will be launched, but the MLEngine training job request will be printed out. In ‘CLOUD’ mode, a real MLEngine training job creation request will be issued.
labels (dict[str, str] | None) – A dictionary containing labels for the job; passed to the ML Engine job request.
hyperparameters (dict | None) – Optional HyperparameterSpec dictionary for hyperparameter tuning. For further reference, check: https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#HyperparameterSpec
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
cancel_on_kill (bool) – Flag that indicates whether to cancel the hook’s job when on_kill is called.
deferrable (bool) – Run the operator in deferrable mode.
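Several of the parameters above constrain one another. The sketch below restates those constraints as a stand-alone check; it illustrates the documented rules and is not the operator's own validation. The helper name check_training_args is hypothetical, and the imageUri key used to detect a custom image in master_config is an assumption taken from the REST reference.

```python
from __future__ import annotations


def check_training_args(
    scale_tier: str | None = None,
    master_type: str | None = None,
    master_config: dict | None = None,
    package_uris: list[str] | None = None,
    training_python_module: str | None = None,
) -> None:
    """Raise ValueError when the documented parameter constraints are violated."""
    if scale_tier is not None and scale_tier.upper() == "CUSTOM" and not master_type:
        raise ValueError("master_type must be set when scale_tier is CUSTOM")
    if master_config and not master_type:
        raise ValueError("master_type must be set when master_config is provided")
    # Assumed key: a custom container image is declared as master_config["imageUri"].
    custom_image = bool(master_config and master_config.get("imageUri"))
    if custom_image and (package_uris or training_python_module):
        raise ValueError(
            "A custom image is mutually exclusive with package_uris and training_python_module"
        )


# Hypothetical arguments for a packaged-code training job.
training_kwargs = {
    "task_id": "train_model",
    "project_id": "my-project",
    "job_id": "training_{{ ds_nodash }}",
    "region": "us-central1",
    "package_uris": ["gs://my-bucket/packages/trainer-0.1.tar.gz"],
    "training_python_module": "trainer.task",
    "scale_tier": "BASIC",
    "mode": "DRY_RUN",  # print the request instead of launching a job
}
check_training_args(
    scale_tier=training_kwargs["scale_tier"],
    package_uris=training_kwargs["package_uris"],
    training_python_module=training_kwargs["training_python_module"],
)
```

Starting in DRY_RUN mode is a cheap way to review the generated request before switching to a mode that submits a real job.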
- template_fields: Sequence[str] = ('_project_id', '_job_id', '_region', '_package_uris', '_training_python_module',...
- execute(context)
Derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
- class airflow.providers.google.cloud.operators.mlengine.MLEngineTrainingCancelJobOperator(*, job_id, project_id=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Operator for cleaning up a failed MLEngine training job.
- Parameters
job_id (str) – A unique templated id for the submitted Google MLEngine training job. (templated)
project_id (str | None) – The Google Cloud project name within which MLEngine training job should run. If set to None or missing, the default project_id from the Google Cloud connection is used. (templated)
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).