Google Cloud AutoML Operators
Google Cloud AutoML makes the power of machine learning available to you even if you have limited knowledge of machine learning. You can use AutoML to build on Google’s machine learning capabilities, create custom machine learning models tailored to your business needs, and then integrate those models into your applications and websites.
Prerequisite Tasks
To use these operators, you must do a few things:
Select or create a Cloud Platform project using the Cloud Console.
Enable billing for your project, as described in the Google Cloud documentation.
Enable the API, as described in the Cloud Console documentation.
Install API libraries via pip.
pip install 'apache-airflow[google]'
For details, see the Installation documentation.
Creating Datasets
To create a Google AutoML dataset you can use AutoMLCreateDatasetOperator.
The operator returns the dataset ID in XCom under the dataset_id key.
Warning
This operator is deprecated for text, video, and vision prediction and will be removed soon.
All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with
new features, is available on the Vertex AI platform. Please use CreateDatasetOperator instead.
create_dataset = AutoMLCreateDatasetOperator(
task_id="create_dataset",
dataset=DATASET,
location=GCP_AUTOML_LOCATION,
project_id=GCP_PROJECT_ID,
)
dataset_id = create_dataset.output["dataset_id"]
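The dataset ID stored in XCom is the short identifier, not the full resource name. As a minimal sketch, assuming AutoML resource names follow the form projects/<project>/locations/<location>/datasets/<id>, the hypothetical helper below (not part of the provider's public API) shows how such an ID can be derived from a name. The project and dataset values are made up for illustration.

```python
def extract_id_from_name(resource_name: str) -> str:
    """Return the trailing ID segment of a slash-separated resource name."""
    # rpartition splits on the last "/", so the ID is the final element.
    return resource_name.rpartition("/")[-1]


# Illustrative name only; the project and dataset IDs are placeholders.
name = "projects/my-project/locations/us-central1/datasets/TRL1234567890"
print(extract_id_from_name(name))
```

A name without any slash is returned unchanged, so the helper is safe to call on a value that is already a bare ID.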
After creating a dataset, you can import data into it using AutoMLImportDataOperator.
import_dataset = AutoMLImportDataOperator(
task_id="import_dataset",
dataset_id=dataset_id,
location=GCP_AUTOML_LOCATION,
input_config=IMPORT_INPUT_CONFIG,
)
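The DATASET and IMPORT_INPUT_CONFIG constants are not defined in the snippets above. One plausible shape, assuming a translation dataset whose data is read from Cloud Storage, is sketched below. The bucket, file path, display name, and language codes are placeholders, and the field names should be checked against the AutoML API reference for your dataset type.

```python
# Hypothetical bucket for illustration; replace with your own.
GCP_AUTOML_BUCKET = "gs://my-automl-bucket"

# Dataset definition: a display name plus task-specific metadata.
# The translation metadata here is an assumption about the dataset type.
DATASET = {
    "display_name": "test_translation_dataset",
    "translation_dataset_metadata": {
        "source_language_code": "en",
        "target_language_code": "es",
    },
}

# Import configuration: one or more Cloud Storage URIs to read from.
IMPORT_INPUT_CONFIG = {
    "gcs_source": {"input_uris": [f"{GCP_AUTOML_BUCKET}/data/import.csv"]}
}
```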
To update a dataset you can use AutoMLTablesUpdateDatasetOperator.
Warning
This operator is deprecated for text, video, and vision prediction and will be removed soon.
All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with
new features, is available on the Vertex AI platform. Please use UpdateDatasetOperator instead.
update_dataset_job = UpdateDatasetOperator(
task_id="update_dataset",
project_id=PROJECT_ID,
region=REGION,
dataset_id=create_video_dataset_job.output["dataset_id"],
dataset=DATASET_TO_UPDATE,
update_mask=TEST_UPDATE_MASK,
)
Listing Table And Column Specs
To list table specs you can use AutoMLTablesListTableSpecsOperator.
To list column specs you can use AutoMLTablesListColumnSpecsOperator.
AutoML Tables related operators are deprecated. Please use the related Vertex AI Tabular operators.
Operations On Models
To create a Google AutoML model you can use AutoMLTrainModelOperator.
The operator will wait for the operation to complete. Additionally, the operator
returns the ID of the model in XCom under the model_id key.
Warning
This operator is deprecated for text, video, and vision prediction and will be removed soon.
All the functionality of legacy AutoML Natural Language, Vision, and Video Intelligence, along with
new features, is available on the Vertex AI platform. Please use
SupervisedFineTuningTrainOperator, CreateAutoMLImageTrainingJobOperator, or
CreateAutoMLVideoTrainingJobOperator instead.
You can find an example of how to use Vertex AI operators for AutoML Vision classification here:
create_auto_ml_image_training_job = CreateAutoMLImageTrainingJobOperator(
task_id="auto_ml_image_task",
display_name=IMAGE_DISPLAY_NAME,
dataset_id=image_dataset_id,
prediction_type="classification",
multi_label=False,
model_type="CLOUD",
training_fraction_split=0.6,
validation_fraction_split=0.2,
test_fraction_split=0.2,
budget_milli_node_hours=8000,
model_display_name=MODEL_DISPLAY_NAME,
disable_early_stopping=False,
region=REGION,
project_id=PROJECT_ID,
)
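In the example above, the three split fractions add up to 1.0 and the budget is given in milli node hours, where 1000 milli node hours correspond to 1 node hour. The following sketch shows a sanity check you might run before submitting a job, assuming you want the three fractions to cover the whole dataset. Both helper functions are hypothetical and are not part of the provider.

```python
import math


def splits_cover_dataset(training: float, validation: float, test: float) -> bool:
    """Return True when the fractions are non-negative and sum to 1."""
    fractions = (training, validation, test)
    if any(f < 0 for f in fractions):
        return False
    # Compare with a tolerance because decimal fractions are inexact floats.
    return math.isclose(sum(fractions), 1.0, abs_tol=1e-9)


def milli_node_hours_to_node_hours(milli_node_hours: int) -> float:
    """Convert a budget in milli node hours to node hours (1000 milli = 1)."""
    return milli_node_hours / 1000


print(splits_cover_dataset(0.6, 0.2, 0.2))
print(milli_node_hours_to_node_hours(8000))
```

With the values from the example, the budget of 8000 milli node hours corresponds to 8 node hours.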
You can find an example of how to use Vertex AI operators for AutoML Video Intelligence classification here:
create_auto_ml_video_training_job = CreateAutoMLVideoTrainingJobOperator(
task_id="auto_ml_video_task",
display_name=VIDEO_DISPLAY_NAME,
prediction_type="classification",
model_type="CLOUD",
dataset_id=video_dataset_id,
model_display_name=MODEL_DISPLAY_NAME,
region=REGION,
project_id=PROJECT_ID,
)
When running Vertex AI operators for training, please ensure that your data is correctly stored in Vertex AI
datasets. To create a dataset and import data into it, please use CreateDatasetOperator and ImportDataOperator.
create_model = AutoMLTrainModelOperator(task_id="create_model", model=MODEL, location=GCP_AUTOML_LOCATION)
To get an existing model you can use AutoMLGetModelOperator.
This operator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.model_service.GetModelOperator
instead.
You can find an example of how to use Vertex AI operators here:
get_model = GetModelOperator(
task_id="get_model", region=REGION, project_id=PROJECT_ID, model_id=model_id_v1
)
Once a model is created, it can be deployed using AutoMLDeployModelOperator.
This operator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.endpoint_service.DeployModelOperator
instead.
You can find an example of how to use Vertex AI operators here:
deploy_model = DeployModelOperator(
task_id="deploy_model",
endpoint_id=create_endpoint.output["endpoint_id"],
deployed_model=DEPLOYED_MODEL,
traffic_split={"0": 100},
region=REGION,
project_id=PROJECT_ID,
)
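In the example above, traffic_split={"0": 100} sends all endpoint traffic to the model being deployed. As far as the Vertex AI endpoint API describes it, the key "0" stands for the model deployed by the current request and the percentages should add up to 100. Treat both points as assumptions to verify against the API reference. The hypothetical check below (not part of the provider) sketches how a traffic split could be validated before it is passed to the operator.

```python
def is_valid_traffic_split(traffic_split: dict[str, int]) -> bool:
    """Check that each percentage is within 0..100 and the total is 100."""
    if any(not 0 <= pct <= 100 for pct in traffic_split.values()):
        return False
    # An empty mapping sums to 0 and is therefore rejected as well.
    return sum(traffic_split.values()) == 100


print(is_valid_traffic_split({"0": 100}))
print(is_valid_traffic_split({"0": 90}))
```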
If you wish to delete a model you can use AutoMLDeleteModelOperator.
This operator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.model_service.DeleteModelOperator
instead.
You can find an example of how to use Vertex AI operators here:
delete_model = DeleteModelOperator(
task_id="delete_model",
project_id=PROJECT_ID,
region=REGION,
model_id=upload_model.output["model_id"],
trigger_rule=TriggerRule.ALL_DONE,
)
Making Predictions
To obtain predictions from a Google Cloud AutoML model you can use AutoMLPredictOperator
or AutoMLBatchPredictOperator. In the first case, the model must be deployed.
TRANSLATION_STR = "A Dog walks down the street"
predict_task = AutoMLPredictOperator(
task_id="predict_task",
model_id=model_id,
payload={"text_snippet": {"content": TRANSLATION_STR}},
location=GCP_AUTOML_LOCATION,
project_id=GCP_PROJECT_ID,
)
AutoMLBatchPredictOperator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.CreateBatchPredictionJobOperator,
airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.GetBatchPredictionJobOperator,
airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.ListBatchPredictionJobsOperator, or
airflow.providers.google.cloud.operators.vertex_ai.batch_prediction_job.DeleteBatchPredictionJobOperator
instead.
You can find examples of how to use Vertex AI operators here:
create_batch_prediction_job = CreateBatchPredictionJobOperator(
task_id="create_batch_prediction_job",
job_display_name=JOB_DISPLAY_NAME,
model_name="{{ti.xcom_pull('auto_ml_forecasting_task')['name']}}",
predictions_format="csv",
bigquery_source=BIGQUERY_SOURCE,
gcs_destination_prefix=GCS_DESTINATION_PREFIX,
model_parameters=MODEL_PARAMETERS,
region=REGION,
project_id=PROJECT_ID,
)
list_batch_prediction_job = ListBatchPredictionJobsOperator(
task_id="list_batch_prediction_jobs",
region=REGION,
project_id=PROJECT_ID,
)
delete_batch_prediction_job = DeleteBatchPredictionJobOperator(
task_id="delete_batch_prediction_job",
batch_prediction_job_id=create_batch_prediction_job.output["batch_prediction_job_id"],
region=REGION,
project_id=PROJECT_ID,
trigger_rule=TriggerRule.ALL_DONE,
)
Listing And Deleting Datasets
You can get a list of AutoML datasets using AutoMLListDatasetOperator. The operator returns the list
of dataset IDs in XCom under the dataset_id_list key.
This operator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.dataset.ListDatasetsOperator
instead.
You can find an example of how to use Vertex AI operators here:
list_dataset_job = ListDatasetsOperator(
task_id="list_dataset",
region=REGION,
project_id=PROJECT_ID,
)
To delete a dataset you can use AutoMLDeleteDatasetOperator.
The delete operator also accepts a list or a comma-separated string of dataset IDs to delete.
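To make the accepted input forms concrete, the sketch below normalizes either a list or a comma-separated string into a list of IDs. It is a hypothetical helper written for illustration and is not the operator's actual implementation. The whitespace trimming and the dropping of empty items are assumptions about how you might want to clean the input.

```python
from typing import Union


def normalize_dataset_ids(dataset_id: Union[str, list[str]]) -> list[str]:
    """Return a list of dataset IDs from a list or a comma-separated string."""
    if isinstance(dataset_id, str):
        # Split on commas, trim whitespace, and drop empty items.
        return [part.strip() for part in dataset_id.split(",") if part.strip()]
    # Copy the input so the caller's list is not modified later.
    return list(dataset_id)


print(normalize_dataset_ids("TBL111, TBL222,TBL333"))
print(normalize_dataset_ids(["TBL111", "TBL222"]))
```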
This operator is deprecated for tables, video intelligence, vision and natural language
and will be removed after 31.03.2024. Please use
airflow.providers.google.cloud.operators.vertex_ai.dataset.DeleteDatasetOperator
instead.
You can find an example of how to use Vertex AI operators here:
delete_dataset_job = DeleteDatasetOperator(
task_id="delete_dataset",
dataset_id=create_text_dataset_job.output["dataset_id"],
region=REGION,
project_id=PROJECT_ID,
)