airflow.providers.google.cloud.sensors.dataplex
¶
This module contains Google Dataplex sensors.
Module Contents¶
Classes¶
Dataplex Task states. |
|
Check the status of the Dataplex task. |
|
Check the status of the Dataplex DataQuality job. |
|
Check the status of the Dataplex DataProfile job. |
- class airflow.providers.google.cloud.sensors.dataplex.DataplexTaskStateSensor(project_id, region, lake_id, dataplex_task_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, *args, **kwargs)[source]¶
Bases:
airflow.sensors.base.BaseSensorOperator
Check the status of the Dataplex task.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.
region (str) – Required. The ID of the Google Cloud region that the task belongs to.
lake_id (str) – Required. The ID of the Google Cloud lake that the task belongs to.
dataplex_task_id (str) – Required. Task identifier.
api_version (str) – The version of the api that will be requested for example ‘v3’.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.
metadata (Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- class airflow.providers.google.cloud.sensors.dataplex.DataplexDataQualityJobStatusSensor(project_id, region, data_scan_id, job_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, fail_on_dq_failure=False, result_timeout=60.0 * 10, start_sensor_time=None, *args, **kwargs)[source]¶
Bases:
airflow.sensors.base.BaseSensorOperator
Check the status of the Dataplex DataQuality job.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.
region (str) – Required. The ID of the Google Cloud region that the task belongs to.
data_scan_id (str) – Required. Data Quality scan identifier.
job_id (str) – Required. Job ID.
api_version (str) – The version of the api that will be requested for example ‘v3’.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.
metadata (Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
result_timeout (float) – Value in seconds for which operator will wait for the Data Quality scan result. Throws exception if there is no result found after specified amount of seconds.
fail_on_dq_failure (bool) – If set to true and not all Data Quality scan rules have been passed, an exception is thrown. If set to false and not all Data Quality scan rules have been passed, execution will finish with success.
- Returns
Boolean indicating if the job run has reached the
DataScanJob.State.SUCCEEDED
.
- class airflow.providers.google.cloud.sensors.dataplex.DataplexDataProfileJobStatusSensor(project_id, region, data_scan_id, job_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, result_timeout=60.0 * 10, start_sensor_time=None, *args, **kwargs)[source]¶
Bases:
airflow.sensors.base.BaseSensorOperator
Check the status of the Dataplex DataProfile job.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.
region (str) – Required. The ID of the Google Cloud region that the task belongs to.
data_scan_id (str) – Required. Data Quality scan identifier.
job_id (str) – Required. Job ID.
api_version (str) – The version of the api that will be requested for example ‘v3’.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.
metadata (Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.
gcp_conn_id (str) – The connection ID to use when fetching connection info.
impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
result_timeout (float) – Value in seconds for which operator will wait for the Data Quality scan result. Throws exception if there is no result found after specified amount of seconds.
- Returns
Boolean indicating if the job run has reached the
DataScanJob.State.SUCCEEDED
.