airflow.providers.google.cloud.operators.gen_ai¶
This module contains Google Gen AI operators.
Classes¶
Uses the Gemini AI Embeddings API to generate embeddings for words, phrases, sentences, and code. |
|
Generate a model response based on given configuration. Input capabilities differ between models, including tuned models. |
|
Create a tuning job to adapt model behavior with a labeled dataset. |
|
Use Count Tokens API to calculate the number of input tokens before sending a request to Gemini API. |
|
Create CachedContent resource to reduce the cost of requests that contain repeat content with high input token counts. |
|
Possible states of batch job in Gemini Batch API. |
|
Create Batch job using Gemini Batch API. Use to generate model response for several requests. |
|
Get Batch job using Gemini API. |
|
Get list of Batch jobs metadata using Gemini API. |
|
Queue a batch job for deletion using the Gemini API. |
|
Cancel Batch job using Gemini API. |
|
Create embeddings Batch job using Gemini Batch API. |
|
Get file uploaded to Gemini Files API. |
|
Get file's metadata uploaded to Gemini Files API by using GenAIGeminiUploadFileOperator. |
|
List files uploaded to Gemini Files API. |
|
Delete file uploaded to Gemini Files API. |
Module Contents¶
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGenerateEmbeddingsOperator(*, project_id, location, model, contents, config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorUses the Gemini AI Embeddings API to generate embeddings for words, phrases, sentences, and code.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to (templated).
location (str) – Required. The ID of the Google Cloud location that the service belongs to (templated).
model (str) – Required. The name of the model to use for content generation, which can be a text-only or multimodal model. For example, gemini-pro or gemini-pro-vision.
contents (google.genai.types.ContentListUnion | google.genai.types.ContentListUnionDict | list[str]) – Optional. The contents to use for embedding.
config (google.genai.types.EmbedContentConfigOrDict | None) – Optional. Configuration for embeddings.
gcp_conn_id (str) – Optional. The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional. Service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGenerateContentOperator(*, project_id, location, contents, model, generation_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorGenerate a model response based on given configuration. Input capabilities differ between models, including tuned models.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to (templated).
location (str) – Required. The ID of the Google Cloud location that the service belongs to (templated).
model (str) – Required. The name of the model to use for content generation, which can be a text-only or multimodal model. For example, gemini-pro or gemini-pro-vision.
contents (google.genai.types.ContentListUnionDict) – Required. The multi-part content of a message that a user or a program gives to the generative model, in order to elicit a specific response.
generation_config (google.genai.types.GenerateContentConfig | dict[str, Any] | None) – Optional. Generation configuration settings.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAISupervisedFineTuningTrainOperator(*, project_id, location, source_model, training_dataset, tuning_job_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorCreate a tuning job to adapt model behavior with a labeled dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
source_model (str) – Required. A pre-trained model optimized for performing natural language tasks such as classification, summarization, extraction, content creation, and ideation.
training_dataset (google.genai.types.TuningDatasetOrDict) – Required. Cloud Storage URI of your training dataset. The dataset must be formatted as a JSONL file. For best results, provide at least 100 to 500 examples.
tuning_job_config (google.genai.types.CreateTuningJobConfigOrDict | dict[str, Any] | None) – Optional. Configuration of the Tuning job to be created.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAICountTokensOperator(*, project_id, location, contents, model, config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorUse Count Tokens API to calculate the number of input tokens before sending a request to Gemini API.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to (templated).
location (str) – Required. The ID of the Google Cloud location that the service belongs to (templated).
contents (google.genai.types.ContentListUnion | google.genai.types.ContentListUnionDict) – Required. The multi-part content of a message that a user or a program gives to the generative model, in order to elicit a specific response.
model (str) – Required. Model, supporting prompts with text-only input, including natural language tasks, multi-turn text and code chat, and code generation. It can output text and code.
config (google.genai.types.CountTokensConfigOrDict | None) – Optional. Configuration for Count Tokens.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAICreateCachedContentOperator(*, project_id, location, model, cached_content_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorCreate CachedContent resource to reduce the cost of requests that contain repeat content with high input token counts.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model (str) – Required. The name of the publisher model to use for cached content.
cached_content_config (google.genai.types.CreateCachedContentConfigOrDict | None) – Optional. Configuration of the Cached Content.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.BatchJobStatus[source]¶
Bases:
enum.EnumPossible states of batch job in Gemini Batch API.
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiCreateBatchJobOperator(*, project_id, location, model, input_source, gemini_api_key, create_batch_job_config=None, results_folder=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, retrieve_result=False, wait_until_complete=False, polling_interval=30, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorCreate Batch job using Gemini Batch API. Use to generate model response for several requests.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model (str) – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
input_source (list | str) – Required. Source of requests, could be inline requests or file name.
results_folder (str | None) – Optional. Path to a folder on local machine where file with results will be saved.
create_batch_job_config (google.genai.types.CreateBatchJobConfig | dict | None) – Optional. Config for batch job creation.
wait_until_complete (bool) – Optional. Await job completion.
retrieve_result (bool) – Optional. Push the result to XCom. If the input_source is inline, this pushes the execution result. If a file name is specified, this pushes the output file path.
polling_interval (int) – Optional. The interval, in seconds, to poll the job status.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiGetBatchJobOperator(*, project_id, location, job_name, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorGet Batch job using Gemini API.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
job_name (str) – Required. Name of the batch job.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiListBatchJobsOperator(*, project_id, location, gemini_api_key, list_batch_jobs_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorGet list of Batch jobs metadata using Gemini API.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiDeleteBatchJobOperator(*, project_id, location, job_name, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorQueue a batch job for deletion using the Gemini API.
The job will not be deleted immediately. After submitting it for deletion, it will still be available through GenAIGeminiListBatchJobsOperator or GenAIGeminiGetBatchJobOperator for some time.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
job_name (str) – Required. Name of the batch job.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiCancelBatchJobOperator(*, project_id, location, job_name, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorCancel Batch job using Gemini API.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
job_name (str) – Required. Name of the batch job.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiCreateEmbeddingsBatchJobOperator(*, project_id, location, model, gemini_api_key, input_source, results_folder=None, create_embeddings_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, wait_until_complete=False, retrieve_result=False, polling_interval=30, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorCreate embeddings Batch job using Gemini Batch API.
Use to generate embeddings for words, phrases, sentences, and code for several requests.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
model (str) – Required. The name of the publisher model to use for Batch job.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
input_source (dict | str) – Required. Source of requests, could be inline requests or file name.
results_folder (str | None) – Optional. Path to a folder on local machine where file with results will be saved.
create_embeddings_config (google.genai.types.CreateBatchJobConfig | dict | None) – Optional. Config for batch job creation.
wait_until_complete (bool) – Optional. Await job completion.
retrieve_result (bool) – Optional. Push the result to XCom. If the input_source is inline, this pushes the execution result. If a file name is specified, this pushes the output file path.
polling_interval (int) – Optional. The interval, in seconds, to poll the job status.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiUploadFileOperator(*, project_id, location, file_path, gemini_api_key, upload_file_config=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorGet file uploaded to Gemini Files API.
The Files API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Supported types are audio files, images, videos, documents, and others. Files are stored for 48 hours.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
file_path (str) – Required. Path to file on your local machine.
upload_file_config (dict | None) – Optional. Metadata configuration for file upload. Defaults to display name and mime type parsed from file_path.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiGetFileOperator(*, project_id, location, file_name, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorGet file’s metadata uploaded to Gemini Files API by using GenAIGeminiUploadFileOperator.
The Files API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
file_name (str) – Required. File name in Gemini Files API to get
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiListFilesOperator(*, project_id, location, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorList files uploaded to Gemini Files API.
The Files API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.operators.gen_ai.GenAIGeminiDeleteFileOperator(*, project_id, location, file_name, gemini_api_key, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperatorDelete file uploaded to Gemini Files API.
The Files API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
location (str) – Required. The ID of the Google Cloud location that the service belongs to.
gemini_api_key (str) – Required. Key to interact with Gemini Batch API.
file_name (str) – Required. File name in Gemini Files API to delete.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).