Google Cloud AutoML Operators

Google Cloud AutoML makes the power of machine learning available to you even if you have limited knowledge of machine learning. You can use AutoML to build on Google’s machine learning capabilities to create custom machine learning models tailored to your business needs, and then integrate those models into your applications and websites.

Prerequisite Tasks

To use these operators, you must do a few things:

Creating Datasets

To create a Google AutoML dataset, use AutoMLCreateDatasetOperator. The operator returns the dataset ID in XCom under the dataset_id key.

This operator is deprecated for text, video, and vision prediction and will be removed soon. All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with new features, is available on the Vertex AI platform. Please use CreateDatasetOperator instead.

tests/system/providers/google/cloud/automl/example_automl_dataset.py[source]

create_dataset = AutoMLCreateDatasetOperator(
    task_id="create_dataset",
    dataset=DATASET,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
dataset_id = create_dataset.output["dataset_id"]
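The dataset_id value stored in XCom is a short identifier rather than a full resource path. The following is a minimal sketch of how such an ID can be derived, assuming the API returns a resource name of the form projects/<project>/locations/<location>/datasets/<id>. The helper extract_id_from_name and the sample values are hypothetical and are not taken from the provider code.

```python
def extract_id_from_name(resource: dict) -> str:
    """Return the trailing ID segment of a resource's ``name`` field."""
    # rpartition splits on the last "/" and keeps whatever follows it.
    return resource["name"].rpartition("/")[-1]


# Hypothetical API response, trimmed to the one field that matters here.
dataset = {"name": "projects/my-project/locations/us-central1/datasets/TBL1234567890"}
dataset_id = extract_id_from_name(dataset)
print(dataset_id)
```

Downstream tasks then only need this short ID together with the location and project_id arguments, which is why the later examples pass dataset_id around instead of the full path.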

After creating a dataset, you can import data into it using AutoMLImportDataOperator.

tests/system/providers/google/cloud/automl/example_automl_dataset.py[source]

import_dataset = AutoMLImportDataOperator(
    task_id="import_dataset",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    input_config=IMPORT_INPUT_CONFIG,
)

To update a dataset, use AutoMLTablesUpdateDatasetOperator.

This operator is deprecated for text, video, and vision prediction and will be removed soon. All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with new features, is available on the Vertex AI platform. Please use UpdateDatasetOperator instead.

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_dataset.py[source]

update_dataset_job = UpdateDatasetOperator(
    task_id="update_dataset",
    project_id=PROJECT_ID,
    region=REGION,
    dataset_id=create_video_dataset_job.output["dataset_id"],
    dataset=DATASET_TO_UPDATE,
    update_mask=TEST_UPDATE_MASK,
)

Listing Table And Column Specs

To list table specs you can use AutoMLTablesListTableSpecsOperator.

To list column specs you can use AutoMLTablesListColumnSpecsOperator.

AutoML Tables operators are deprecated. Please use the corresponding Vertex AI Tabular operators.

Operations On Models

To create a Google AutoML model, use AutoMLTrainModelOperator. The operator waits for the operation to complete. Additionally, it returns the model ID in XCom under the model_id key.

This operator is deprecated for text, video, and vision prediction and will be removed soon. All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with new features, is available on the Vertex AI platform. Please use CreateAutoMLTextTrainingJobOperator, CreateAutoMLImageTrainingJobOperator or CreateAutoMLVideoTrainingJobOperator.

The Vertex AutoMLText API for model training is deprecated as of September 15, 2024, and the remaining parts will be deprecated on June 15, 2025. Please consider fine-tuning a Gemini model instead: https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-tuning.

You can find an example of how to use Vertex AI operators for AutoML Vision classification here:

tests/system/providers/google/cloud/automl/example_automl_vision_classification.py[source]

create_auto_ml_image_training_job = CreateAutoMLImageTrainingJobOperator(
    task_id="auto_ml_image_task",
    display_name=IMAGE_DISPLAY_NAME,
    dataset_id=image_dataset_id,
    prediction_type="classification",
    multi_label=False,
    model_type="CLOUD",
    training_fraction_split=0.6,
    validation_fraction_split=0.2,
    test_fraction_split=0.2,
    budget_milli_node_hours=8000,
    model_display_name=MODEL_DISPLAY_NAME,
    disable_early_stopping=False,
    region=REGION,
    project_id=PROJECT_ID,
)
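Two of the numeric arguments in the example above are worth checking by hand. The three split fractions (0.6, 0.2, 0.2) add up to 1, and budget_milli_node_hours is expressed in thousandths of a node hour, so 8000 corresponds to 8 node hours. The sketch below captures both checks. The helper names are hypothetical, and the rule that the fractions lie in [0, 1] and sum to at most 1 is our reading of the Vertex AI API rather than something stated on this page.

```python
def check_fraction_split(training: float, validation: float, test: float) -> float:
    """Validate the three split fractions and return their sum."""
    fractions = {"training": training, "validation": validation, "test": test}
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction {value} is outside [0, 1]")
    total = sum(fractions.values())
    # Allow a tiny tolerance for floating-point rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"split fractions sum to {total}, which exceeds 1")
    return total


def milli_node_hours_to_node_hours(budget_milli_node_hours: int) -> float:
    """Convert a budget in milli node hours to node hours."""
    return budget_milli_node_hours / 1000


check_fraction_split(0.6, 0.2, 0.2)  # passes silently for the values used above
print(milli_node_hours_to_node_hours(8000))  # 8.0
```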

You can find an example of how to use Vertex AI operators for AutoML Video Intelligence classification here:

tests/system/providers/google/cloud/automl/example_automl_video_classification.py[source]

create_auto_ml_video_training_job = CreateAutoMLVideoTrainingJobOperator(
    task_id="auto_ml_video_task",
    display_name=VIDEO_DISPLAY_NAME,
    prediction_type="classification",
    model_type="CLOUD",
    dataset_id=video_dataset_id,
    model_display_name=MODEL_DISPLAY_NAME,
    region=REGION,
    project_id=PROJECT_ID,
)

When running a Vertex AI operator for training, please ensure that your data is correctly stored in a Vertex AI dataset. To create a dataset and import data into it, use CreateDatasetOperator and ImportDataOperator.

tests/system/providers/google/cloud/automl/example_automl_model.py[source]

create_model = AutoMLTrainModelOperator(
    task_id="create_model",
    model=MODEL,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
model_id = create_model.output["model_id"]

To get an existing model, use AutoMLGetModelOperator.

This operator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.model_service.GetModelOperator instead. You can find an example of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_model_service.py[source]

get_model = GetModelOperator(
    task_id="get_model", region=REGION, project_id=PROJECT_ID, model_id=model_id_v1
)

Once a model is created, it can be deployed using AutoMLDeployModelOperator.

This operator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.endpoint_service.DeployModelOperator instead. You can find an example of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_endpoint.py[source]

deploy_model = DeployModelOperator(
    task_id="deploy_model",
    endpoint_id=create_endpoint.output["endpoint_id"],
    deployed_model=DEPLOYED_MODEL,
    traffic_split={"0": 100},
    region=REGION,
    project_id=PROJECT_ID,
)
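In the example above, traffic_split={"0": 100} sends all endpoint traffic to the model being deployed. As we understand the Vertex AI API, the key "0" stands for the model deployed by the current request, other keys are IDs of models already on the endpoint, and the percentages are expected to add up to 100 (or the map is empty when the endpoint should receive no traffic). The following sketch checks that rule before a DAG is submitted. The validate_traffic_split helper and the second model ID are hypothetical.

```python
from typing import Mapping


def validate_traffic_split(traffic_split: Mapping[str, int]) -> None:
    """Raise ValueError if the percentages are out of range or do not sum to 100."""
    if not traffic_split:
        # An empty map is taken to mean "serve no traffic for now".
        return
    for model_key, percent in traffic_split.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"traffic for {model_key!r} must be between 0 and 100")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


# All traffic to the newly deployed model, as in the example above.
validate_traffic_split({"0": 100})

# A canary-style split: 10% to the new model, 90% to a hypothetical existing one.
validate_traffic_split({"0": 10, "1234567890": 90})
```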

If you wish to delete a model you can use AutoMLDeleteModelOperator.

This operator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.model_service.DeleteModelOperator instead. You can find an example of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_model_service.py[source]

delete_model = DeleteModelOperator(
    task_id="delete_model",
    project_id=PROJECT_ID,
    region=REGION,
    model_id=upload_model.output["model_id"],
    trigger_rule=TriggerRule.ALL_DONE,
)

Making Predictions

To obtain predictions from a Google Cloud AutoML model, use AutoMLPredictOperator or AutoMLBatchPredictOperator. AutoMLPredictOperator requires the model to be deployed.

tests/system/providers/google/cloud/automl/example_automl_model.py[source]

predict_task = AutoMLPredictOperator(
    task_id="predict_task",
    model_id=model_id,
    payload={
        "row": {
            "values": PREDICT_VALUES,
        }
    },
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
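The payload in the example above wraps a flat list of cell values in a "row" object. A minimal sketch of building such a payload from a named record is shown below. It assumes the values must be listed in the same order as the model's input feature columns. The build_row_payload helper, the column names, and the sample record are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Sequence


def build_row_payload(record: Mapping[str, Any], column_order: Sequence[str]) -> Dict[str, Any]:
    """Arrange a named record into the {"row": {"values": [...]}} payload shape."""
    missing: List[str] = [column for column in column_order if column not in record]
    if missing:
        raise KeyError(f"record is missing columns: {missing}")
    # Emit the values in the agreed column order, not in dict order.
    return {"row": {"values": [record[column] for column in column_order]}}


# Hypothetical feature columns and a single record to score.
columns = ["age", "job", "balance"]
record = {"balance": 1200, "age": 39, "job": "technician"}
payload = build_row_payload(record, columns)
print(payload)
```

Building the list from an explicit column order avoids silently misaligned features when the record's keys arrive in a different order.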

tests/system/providers/google/cloud/automl/example_automl_model.py[source]

batch_predict_task = AutoMLBatchPredictOperator(
    task_id="batch_predict_task",
    model_id=model_id,
    input_config=IMPORT_INPUT_CONFIG,
    output_config=IMPORT_OUTPUT_CONFIG,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

AutoMLBatchPredictOperator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.CreateBatchPredictionJobOperator, airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.GetBatchPredictionJobOperator, airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.ListBatchPredictionJobsOperator, or airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.DeleteBatchPredictionJobOperator instead. You can find examples of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_batch_prediction_job.py[source]

create_batch_prediction_job = CreateBatchPredictionJobOperator(
    task_id="create_batch_prediction_job",
    job_display_name=JOB_DISPLAY_NAME,
    model_name="{{ti.xcom_pull('auto_ml_forecasting_task')['name']}}",
    predictions_format="csv",
    bigquery_source=BIGQUERY_SOURCE,
    gcs_destination_prefix=GCS_DESTINATION_PREFIX,
    model_parameters=MODEL_PARAMETERS,
    region=REGION,
    project_id=PROJECT_ID,
)

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_batch_prediction_job.py[source]

list_batch_prediction_job = ListBatchPredictionJobsOperator(
    task_id="list_batch_prediction_jobs",
    region=REGION,
    project_id=PROJECT_ID,
)

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_batch_prediction_job.py[source]

delete_batch_prediction_job = DeleteBatchPredictionJobOperator(
    task_id="delete_batch_prediction_job",
    batch_prediction_job_id=create_batch_prediction_job.output["batch_prediction_job_id"],
    region=REGION,
    project_id=PROJECT_ID,
    trigger_rule=TriggerRule.ALL_DONE,
)

Listing And Deleting Datasets

You can get a list of AutoML datasets using AutoMLListDatasetOperator. The operator returns a list of dataset IDs in XCom under the dataset_id_list key.

This operator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.dataset.ListDatasetsOperator instead. You can find an example of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_dataset.py[source]

list_dataset_job = ListDatasetsOperator(
    task_id="list_dataset",
    region=REGION,
    project_id=PROJECT_ID,
)

To delete a dataset, use AutoMLDeleteDatasetOperator. The operator also accepts a list or a comma-separated string of dataset IDs to delete.

This operator is deprecated for tables, video intelligence, vision, and natural language and will be removed after 31.03.2024. Please use airflow.providers.google.cloud.operators.vertex_ai.dataset.DeleteDatasetOperator instead. You can find an example of how to use Vertex AI operators here:

tests/system/providers/google/cloud/vertex_ai/example_vertex_ai_dataset.py[source]

delete_dataset_job = DeleteDatasetOperator(
    task_id="delete_dataset",
    dataset_id=create_text_dataset_job.output["dataset_id"],
    region=REGION,
    project_id=PROJECT_ID,
)
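As noted earlier in this section, AutoMLDeleteDatasetOperator accepts either a list of dataset IDs or a single comma-separated string. The sketch below shows one way such input could be normalized into a list. It is an illustration only: normalize_dataset_ids is a hypothetical helper, and the provider's own parsing may differ in detail.

```python
from typing import List, Union


def normalize_dataset_ids(dataset_id: Union[str, List[str]]) -> List[str]:
    """Turn a single ID, a comma-separated string, or a list into a list of IDs."""
    if isinstance(dataset_id, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [part.strip() for part in dataset_id.split(",") if part.strip()]
    return list(dataset_id)


print(normalize_dataset_ids("TBL111, TBL222,TBL333"))
print(normalize_dataset_ids(["TBL111", "TBL222"]))
```

Accepting both shapes is convenient when the IDs arrive from a templated field, where a rendered value may be a plain string rather than a Python list.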

Reference

For further information, look at:
