airflow.providers.google.cloud.hooks.dataprep

This module contains the Google Dataprep hook.

Module Contents

Classes

JobGroupStatuses

Types of job group run statuses.

GoogleDataprepHook

Hook for connection with Dataprep API.

class airflow.providers.google.cloud.hooks.dataprep.JobGroupStatuses[source]

Bases: str, enum.Enum

Types of job group run statuses.

CREATED = 'Created'[source]
UNDEFINED = 'undefined'[source]
IN_PROGRESS = 'InProgress'[source]
COMPLETE = 'Complete'[source]
FAILED = 'Failed'[source]
CANCELED = 'Canceled'[source]
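Because the class mixes in str, each member compares equal to its raw string value. The sketch below uses a local copy of the documented members so that it runs without Airflow installed. The `is_terminal` helper and the choice of which statuses count as finished are assumptions for illustration, not part of the provider.

```python
from enum import Enum


class JobGroupStatuses(str, Enum):
    """Local copy of the documented values, so the snippet runs without Airflow."""

    CREATED = "Created"
    UNDEFINED = "undefined"
    IN_PROGRESS = "InProgress"
    COMPLETE = "Complete"
    FAILED = "Failed"
    CANCELED = "Canceled"


# Assumption: these three statuses mean the run has stopped.
TERMINAL_STATUSES = (
    JobGroupStatuses.COMPLETE,
    JobGroupStatuses.FAILED,
    JobGroupStatuses.CANCELED,
)


def is_terminal(raw_status: str) -> bool:
    """Hypothetical helper: True if a raw status string names a finished run."""
    # Looking a member up by value raises ValueError for unknown strings.
    return JobGroupStatuses(raw_status) in TERMINAL_STATUSES


# The str mixin lets a member compare equal to the plain string.
print(JobGroupStatuses.COMPLETE == "Complete")
print(is_terminal("InProgress"))
```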
class airflow.providers.google.cloud.hooks.dataprep.GoogleDataprepHook(dataprep_conn_id=default_conn_name)[source]

Bases: airflow.hooks.base.BaseHook

Hook for connection with Dataprep API.

To connect Airflow to Dataprep, you need a Dataprep access token.

https://clouddataprep.com/documentation/api#section/Authentication

The token should be added to the Airflow Connection in JSON format.
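As a rough illustration, the JSON for the Connection could be produced as follows. The field names `token` and `base_url`, and the base URL value, are assumptions that this page does not confirm; check the connection documentation for your provider version before relying on them.

```python
import json

# Placeholder values only. The keys "token" and "base_url" are assumptions;
# the provider version you use may expect different field names.
extra = json.dumps(
    {
        "token": "YOUR_DATAPREP_TOKEN",
        "base_url": "https://api.clouddataprep.com",
    }
)

# Paste the resulting string into the Extra field of the Airflow Connection.
print(extra)
```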

conn_name_attr = 'dataprep_conn_id'[source]
default_conn_name = 'google_cloud_dataprep_default'[source]
conn_type = 'dataprep'[source]
hook_name = 'Google Dataprep'[source]
get_jobs_for_job_group(job_id)[source]

Get information about the batch jobs within a Cloud Dataprep job.

Parameters

job_id (int) – The ID of the job that will be fetched

get_job_group(job_group_id, embed, include_deleted)[source]

Get the specified job group. A job group is a job that is executed from a specific node in a flow.

Parameters
  • job_group_id (int) – The ID of the job group that will be fetched

  • embed (str) – Comma-separated list of objects to pull in as part of the response

  • include_deleted (bool) – If True, deleted objects are included in the response

run_job_group(body_request)[source]

Creates a jobGroup, which launches the specified job as the authenticated user.

This performs the same action as clicking on the Run Job button in the application.

To find the recipe_id, see the Dataprep API documentation: https://clouddataprep.com/documentation/api#operation/runJobGroup.

Parameters

body_request (dict) – Body of the request; it carries the identifier of the recipe you would like to run.
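A minimal sketch of what such a body might look like, assuming the runJobGroup endpoint expects the recipe under a `wrangledDataset` key (as the linked API documentation describes it; verify against the version you use). The helper name `build_run_job_group_body` is hypothetical.

```python
def build_run_job_group_body(recipe_id: int) -> dict:
    """Hypothetical helper: wrap a recipe id in the assumed request shape."""
    # Assumption: the API identifies the recipe as a "wrangledDataset".
    return {"wrangledDataset": {"id": recipe_id}}


body_request = build_run_job_group_body(7)
print(body_request)
```

The resulting dict would then be passed as `body_request` to `run_job_group`.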

copy_flow(*, flow_id, name='', description='', copy_datasources=False)[source]

Create a copy of the flow with the provided id, as well as all recipes it contains.

Parameters
  • flow_id (int) – ID of the flow to be copied

  • name (str) – Name for the copy of the flow

  • description (str) – Description of the copy of the flow

  • copy_datasources (bool) – Whether copies of the data inputs should also be made.

delete_flow(*, flow_id)[source]

Delete the flow with the provided id.

Parameters

flow_id (int) – ID of the flow to be deleted

run_flow(*, flow_id, body_request)[source]

Run the flow with the provided id.

Parameters
  • flow_id (int) – ID of the flow to be run

  • body_request (dict) – Body of the POST request to be sent.

get_job_group_status(*, job_group_id)[source]

Check the current status of a Dataprep job group.

Parameters

job_group_id (int) – ID of the job group to check
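A status check like this is typically called in a loop until the job group stops. The sketch below shows one way to do that without depending on Airflow: `wait_for_job_group`, `poke_interval`, and `max_polls` are hypothetical names, and the set of terminal statuses is an assumption. In real use the callable would wrap something like `hook.get_job_group_status(job_group_id=...)`; here a canned sequence stands in for the hook.

```python
import time
from typing import Callable

# Assumption: these status values mean the job group has stopped.
TERMINAL = ("Complete", "Failed", "Canceled")


def wait_for_job_group(
    get_status: Callable[[], str],
    poke_interval: float = 10.0,
    max_polls: int = 60,
) -> str:
    """Hypothetical helper: poll get_status until a terminal status appears."""
    for _ in range(max_polls):
        status = get_status()
        # Tuple membership uses ==, so str-based enum members match as well.
        if status in TERMINAL:
            return status
        time.sleep(poke_interval)
    raise TimeoutError(f"job group still running after {max_polls} polls")


# Stand-in for the hook: a canned sequence of statuses.
_fake_statuses = iter(["Created", "InProgress", "Complete"])
final_status = wait_for_job_group(lambda: next(_fake_statuses), poke_interval=0)
print(final_status)
```

Capping the number of polls keeps a stuck job group from blocking the caller indefinitely.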
