airflow.providers.google.cloud.triggers.bigquery

Classes

BigQueryInsertJobTrigger

BigQueryInsertJobTrigger run on the trigger worker to perform insert operation.

BigQueryCheckTrigger

BigQueryCheckTrigger run on the trigger worker.

BigQueryGetDataTrigger

BigQueryGetDataTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

BigQueryIntervalCheckTrigger

BigQueryIntervalCheckTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

BigQueryValueCheckTrigger

BigQueryValueCheckTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

BigQueryTableExistenceTrigger

Initialize the BigQuery Table Existence Trigger with needed parameters.

BigQueryTablePartitionExistenceTrigger

Initialize the BigQuery Table Partition Existence Trigger with needed parameters.

BigQueryStreamingBufferEmptyTrigger

Poll a BigQuery table until its streaming buffer is empty.

Module Contents

class airflow.providers.google.cloud.triggers.bigquery.BigQueryInsertJobTrigger(conn_id, job_id, project_id, location, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None, cancel_on_kill=True)[source]

Bases: airflow.triggers.base.BaseTrigger

BigQueryInsertJobTrigger run on the trigger worker to perform insert operation.

Parameters:
  • conn_id (str) – Reference to google cloud connection id

  • job_id (str | None) – The ID of the job. It will be suffixed with hash of job configuration

  • project_id (str) – Google Cloud Project where the job is running

  • location (str | None) – The dataset location.

  • dataset_id (str | None) – The dataset ID of the requested table. (templated)

  • table_id (str | None) – The table ID of the requested table. (templated)

  • poll_interval (float) – polling period in seconds to check for the status. (templated)

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account. (templated)

conn_id[source]
job_id[source]
dataset_id = None[source]
project_id[source]
location[source]
table_id = None[source]
poll_interval = 4.0[source]
impersonation_chain = None[source]
cancel_on_kill = True[source]
serialize()[source]

Serialize BigQueryInsertJobTrigger arguments and classpath.

async on_kill()[source]

Cancel the BigQuery job when the task is killed by a user action.

get_task_instance(session)[source]
async run()[source]

Get current job execution status and yields a TriggerEvent.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryCheckTrigger(conn_id, job_id, project_id, location, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None, cancel_on_kill=True)[source]

Bases: BigQueryInsertJobTrigger

BigQueryCheckTrigger run on the trigger worker.

serialize()[source]

Serialize BigQueryCheckTrigger arguments and classpath.

async run()[source]

Get current job execution status and yields a TriggerEvent.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryGetDataTrigger(as_dict=False, selected_fields=None, **kwargs)[source]

Bases: BigQueryInsertJobTrigger

BigQueryGetDataTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

Parameters:

as_dict (bool) – if True returns the result as a list of dictionaries, otherwise as list of lists (default: False).

as_dict = False[source]
selected_fields = None[source]
serialize()[source]

Serialize BigQueryInsertJobTrigger arguments and classpath.

async run()[source]

Get current job execution status and yields a TriggerEvent with response data.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryIntervalCheckTrigger(conn_id, first_job_id, second_job_id, project_id, table, metrics_thresholds, location=None, date_filter_column='ds', days_back=-7, ratio_formula='max_over_min', ignore_zero=True, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None)[source]

Bases: BigQueryInsertJobTrigger

BigQueryIntervalCheckTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

Parameters:
  • conn_id (str) – Reference to google cloud connection id

  • first_job_id (str) – The ID of the job 1 performed

  • second_job_id (str) – The ID of the job 2 performed

  • project_id (str) – Google Cloud Project where the job is running

  • dataset_id (str | None) – The dataset ID of the requested table. (templated)

  • table (str) – table name

  • metrics_thresholds (dict[str, int]) – dictionary of ratios indexed by metrics

  • location (str | None) – The dataset location.

  • date_filter_column (str | None) – column name. (templated)

  • days_back (SupportsAbs[int]) – number of days between ds and the ds we want to check against. (templated)

  • ratio_formula (str) – ration formula. (templated)

  • ignore_zero (bool) – boolean value to consider zero or not. (templated)

  • table_id (str | None) – The table ID of the requested table. (templated)

  • poll_interval (float) – polling period in seconds to check for the status. (templated)

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account. (templated)

conn_id[source]
first_job_id[source]
second_job_id[source]
project_id[source]
table[source]
metrics_thresholds[source]
date_filter_column = 'ds'[source]
days_back = -7[source]
ratio_formula = 'max_over_min'[source]
ignore_zero = True[source]
serialize()[source]

Serialize BigQueryCheckTrigger arguments and classpath.

async run()[source]

Get current job execution status and yields a TriggerEvent.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryValueCheckTrigger(conn_id, sql, pass_value, job_id, project_id, tolerance=None, dataset_id=None, table_id=None, location=None, poll_interval=4.0, impersonation_chain=None)[source]

Bases: BigQueryInsertJobTrigger

BigQueryValueCheckTrigger run on the trigger worker, inherits from BigQueryInsertJobTrigger class.

Parameters:
  • conn_id (str) – Reference to google cloud connection id

  • sql (str) – the sql to be executed

  • pass_value (int | float | str) – pass value

  • job_id (str | None) – The ID of the job

  • project_id (str) – Google Cloud Project where the job is running

  • tolerance (Any) – certain metrics for tolerance. (templated)

  • dataset_id (str | None) – The dataset ID of the requested table. (templated)

  • table_id (str | None) – The table ID of the requested table. (templated)

  • location (str | None) – The dataset location

  • poll_interval (float) – polling period in seconds to check for the status. (templated)

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

sql[source]
pass_value[source]
tolerance = None[source]
serialize()[source]

Serialize BigQueryValueCheckTrigger arguments and classpath.

async run()[source]

Get current job execution status and yields a TriggerEvent.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryTableExistenceTrigger(project_id, dataset_id, table_id, gcp_conn_id, hook_params, poll_interval=4.0, impersonation_chain=None)[source]

Bases: airflow.triggers.base.BaseTrigger

Initialize the BigQuery Table Existence Trigger with needed parameters.

Parameters:
  • project_id (str) – Google Cloud Project where the job is running

  • dataset_id (str) – The dataset ID of the requested table.

  • table_id (str) – The table ID of the requested table.

  • gcp_conn_id (str) – Reference to google cloud connection id

  • hook_params (dict[str, Any]) – params for hook

  • poll_interval (float) – polling period in seconds to check for the status

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account. (templated)

dataset_id[source]
project_id[source]
table_id[source]
gcp_conn_id: str[source]
poll_interval = 4.0[source]
hook_params[source]
impersonation_chain = None[source]
serialize()[source]

Serialize BigQueryTableExistenceTrigger arguments and classpath.

async run()[source]

Will run until the table exists in the Google Big Query.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryTablePartitionExistenceTrigger(partition_id, **kwargs)[source]

Bases: BigQueryTableExistenceTrigger

Initialize the BigQuery Table Partition Existence Trigger with needed parameters.

Parameters:
  • partition_id (str) – The name of the partition to check the existence of.

  • project_id – Google Cloud Project where the job is running

  • dataset_id – The dataset ID of the requested table.

  • table_id – The table ID of the requested table.

  • gcp_conn_id – Reference to google cloud connection id

  • hook_params – params for hook

  • poll_interval – polling period in seconds to check for the status

  • impersonation_chain – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account. (templated)

partition_id[source]
serialize()[source]

Serialize BigQueryTablePartitionExistenceTrigger arguments and classpath.

async run()[source]

Will run until the table exists in the Google Big Query.

class airflow.providers.google.cloud.triggers.bigquery.BigQueryStreamingBufferEmptyTrigger(project_id, dataset_id, table_id, gcp_conn_id, poll_interval=30.0, impersonation_chain=None)[source]

Bases: airflow.triggers.base.BaseTrigger

Poll a BigQuery table until its streaming buffer is empty.

Used by BigQueryStreamingBufferEmptySensor in deferrable mode.

Parameters:
  • project_id (str) – Google Cloud project ID.

  • dataset_id (str) – Dataset of the table to monitor.

  • table_id (str) – Table to monitor.

  • gcp_conn_id (str) – Airflow connection ID for GCP.

  • poll_interval (float) – Seconds between polls.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate, or a chained list of accounts.

project_id[source]
dataset_id[source]
table_id[source]
gcp_conn_id[source]
poll_interval = 30.0[source]
impersonation_chain = None[source]
serialize()[source]

Return the information needed to reconstruct this Trigger.

Returns:

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type:

tuple[str, dict[str, Any]]

async run()[source]

Run the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.

Was this entry helpful?