airflow.providers.google.cloud.operators.dataflow
¶
This module contains Google Dataflow operators.
Module Contents¶
Classes¶
Helper enum for choosing what to do if job is already running. |
|
Dataflow configuration for BeamRunJavaPipelineOperator and BeamRunPythonPipelineOperator. |
|
Start a Dataflow job with a classic template; the parameters of the operation will be passed to the job. |
|
Starts a Dataflow Job with a Flex Template. |
|
Starts Dataflow SQL query. |
|
Launch a Dataflow YAML job and return the result. |
|
Stops the job with the specified name prefix or Job ID. |
|
Creates a new Dataflow Data Pipeline instance. |
|
Runs a Dataflow Data Pipeline. |
|
Deletes a Dataflow Data Pipeline. |
- class airflow.providers.google.cloud.operators.dataflow.CheckJobRunning[source]¶
Bases:
enum.Enum
Helper enum for choosing what to do if job is already running.
IgnoreJob - do not check if running FinishIfRunning - finish current dag run with no action WaitForRun - wait for job to finish and then continue with new job
- class airflow.providers.google.cloud.operators.dataflow.DataflowConfiguration(*, job_name=None, append_job_name=True, project_id=PROVIDE_PROJECT_ID, location=DEFAULT_DATAFLOW_LOCATION, gcp_conn_id='google_cloud_default', poll_sleep=10, impersonation_chain=None, drain_pipeline=False, cancel_timeout=5 * 60, wait_until_finished=None, multiple_jobs=None, check_if_running=CheckJobRunning.WaitForRun, service_account=None)[source]¶
Dataflow configuration for BeamRunJavaPipelineOperator and BeamRunPythonPipelineOperator.
See also
BeamRunJavaPipelineOperator
andBeamRunPythonPipelineOperator
.- Parameters
job_name (str | None) – The ‘jobName’ to use when executing the Dataflow job (templated). This ends up being set in the pipeline options, so any entry with key
'jobName'
or'job_name'``in ``options
will be overwritten.append_job_name (bool) – True if unique suffix has to be appended to job name.
project_id (str) – Optional, the Google Cloud project ID in which to start a job. If set to None or missing, the default project_id from the Google Cloud connection is used.
location (str | None) – Job location.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
poll_sleep (int) – The time in seconds to sleep between polling Google Cloud Platform for the dataflow job status while the job is in the JOB_STATE_RUNNING state.
impersonation_chain (str | collections.abc.Sequence[str] | None) –
Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
Warning
This option requires Apache Beam 2.39.0 or newer.
drain_pipeline (bool) – Optional, set to True if want to stop streaming job by draining it instead of canceling during killing task instance. See: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
cancel_timeout (int | None) – How long (in seconds) operator should wait for the pipeline to be successfully cancelled when task is being killed. (optional) default to 300s
wait_until_finished (bool | None) –
(Optional) If True, wait for the end of pipeline execution before exiting. If False, only submits job. If None, default behavior.
The default behavior depends on the type of pipeline:
for the streaming pipeline, wait for jobs to start,
for the batch pipeline, wait for the jobs to complete.
Warning
You cannot call
PipelineResult.wait_until_finish
method in your pipeline code for the operator to work properly. i. e. you must use asynchronous execution. Otherwise, your pipeline will always wait until finished. For more information, look at: Asynchronous executionThe process of starting the Dataflow job in Airflow consists of two steps: * running a subprocess and reading the stderr/stderr log for the job id. * loop waiting for the end of the job ID from the previous step by checking its status.
Step two is started just after step one has finished, so if you have wait_until_finished in your pipeline code, step two will not start until the process stops. When this process stops, steps two will run, but it will only execute one iteration as the job will be in a terminal state.
If you in your pipeline do not call the wait_for_pipeline method but pass wait_until_finish=True to the operator, the second loop will wait for the job’s terminal state.
If you in your pipeline do not call the wait_for_pipeline method, and pass wait_until_finish=False to the operator, the second loop will check once is job not in terminal state and exit the loop.
multiple_jobs (bool | None) – If pipeline creates multiple jobs then monitor all jobs. Supported only by
BeamRunJavaPipelineOperator
.check_if_running (CheckJobRunning) – Before running job, validate that a previous run is not in process. Supported only by:
BeamRunJavaPipelineOperator
.service_account (str | None) – Run the job as a specific service account, instead of the default GCE robot.
- template_fields: collections.abc.Sequence[str] = ('job_name', 'location')[source]¶
- class airflow.providers.google.cloud.operators.dataflow.DataflowTemplatedJobStartOperator(*, template, project_id=PROVIDE_PROJECT_ID, job_name='{{task.task_id}}', options=None, dataflow_default_options=None, parameters=None, location=None, gcp_conn_id='google_cloud_default', poll_sleep=10, impersonation_chain=None, environment=None, cancel_timeout=10 * 60, wait_until_finished=None, append_job_name=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), expected_terminal_state=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Start a Dataflow job with a classic template; the parameters of the operation will be passed to the job.
See also
For more information on how to use this operator, take a look at the guide: Templated jobs
- Parameters
template (str) – The reference to the Dataflow template.
job_name (str) – The ‘jobName’ to use when executing the Dataflow template (templated).
options (dict[str, Any] | None) –
Map of job runtime environment options. It will update environment argument if passed.
See also
For more information on possible configurations, look at the API documentation https://cloud.google.com/dataflow/pipelines/specifying-exec-params
dataflow_default_options (dict[str, Any] | None) – Map of default job environment options.
parameters (dict[str, str] | None) – Map of job specific parameters for the template.
project_id (str) – Optional, the Google Cloud project ID in which to start a job. If set to None or missing, the default project_id from the Google Cloud connection is used.
location (str | None) – Job location.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
poll_sleep (int) – The time in seconds to sleep between polling Google Cloud Platform for the dataflow job status while the job is in the JOB_STATE_RUNNING state.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
environment (dict | None) –
Optional, Map of job runtime environment options.
See also
For more information on possible configurations, look at the API documentation https://cloud.google.com/dataflow/pipelines/specifying-exec-params
cancel_timeout (int | None) – How long (in seconds) operator should wait for the pipeline to be successfully cancelled when task is being killed.
append_job_name (bool) – True if unique suffix has to be appended to job name.
wait_until_finished (bool | None) –
(Optional) If True, wait for the end of pipeline execution before exiting. If False, only submits job. If None, default behavior.
The default behavior depends on the type of pipeline:
for the streaming pipeline, wait for jobs to start,
for the batch pipeline, wait for the jobs to complete.
Warning
You cannot call
PipelineResult.wait_until_finish
method in your pipeline code for the operator to work properly. i. e. you must use asynchronous execution. Otherwise, your pipeline will always wait until finished. For more information, look at: Asynchronous executionThe process of starting the Dataflow job in Airflow consists of two steps:
running a subprocess and reading the stderr/stderr log for the job id.
loop waiting for the end of the job ID from the previous step. This loop checks the status of the job.
Step two is started just after step one has finished, so if you have wait_until_finished in your pipeline code, step two will not start until the process stops. When this process stops, steps two will run, but it will only execute one iteration as the job will be in a terminal state.
If you in your pipeline do not call the wait_for_pipeline method but pass wait_until_finish=True to the operator, the second loop will wait for the job’s terminal state.
If you in your pipeline do not call the wait_for_pipeline method, and pass wait_until_finish=False to the operator, the second loop will check once is job not in terminal state and exit the loop.
expected_terminal_state (str | None) – The expected terminal state of the operator on which the corresponding Airflow task succeeds. When not specified, it will be determined by the hook.
It’s a good practice to define dataflow_* parameters in the default_args of the dag like the project, zone and staging location.
See also
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/LaunchTemplateParameters https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment
default_args = { "dataflow_default_options": { "zone": "europe-west1-d", "tempLocation": "gs://my-staging-bucket/staging/", } }
You need to pass the path to your dataflow template as a file reference with the
template
parameter. Useparameters
to pass on parameters to your job. Useenvironment
to pass on runtime environment variables to your job.t1 = DataflowTemplatedJobStartOperator( task_id="dataflow_example", template="{{var.value.gcp_dataflow_base}}", parameters={ "inputFile": "gs://bucket/input/my_input.txt", "outputFile": "gs://bucket/output/my_output.txt", }, gcp_conn_id="airflow-conn-id", dag=my_dag, )
template
,dataflow_default_options
,parameters
, andjob_name
are templated, so you can use variables in them.Note that
dataflow_default_options
is expected to save high-level options for project information, which apply to all dataflow operators in the DAG.See also
https://cloud.google.com/dataflow/docs/reference/rest/v1b3 /LaunchTemplateParameters https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment For more detail on job template execution have a look at the reference: https://cloud.google.com/dataflow/docs/templates/executing-templates
- Parameters
deferrable (bool) – Run operator in the deferrable mode.
- template_fields: collections.abc.Sequence[str] = ('template', 'job_name', 'options', 'parameters', 'project_id', 'location', 'gcp_conn_id',...[source]¶
- class airflow.providers.google.cloud.operators.dataflow.DataflowStartFlexTemplateOperator(body, location, project_id=PROVIDE_PROJECT_ID, gcp_conn_id='google_cloud_default', drain_pipeline=False, cancel_timeout=10 * 60, wait_until_finished=None, impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), append_job_name=True, expected_terminal_state=None, poll_sleep=10, *args, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Starts a Dataflow Job with a Flex Template.
See also
For more information on how to use this operator, take a look at the guide: Templated jobs
- Parameters
body (dict) – The request body. See: https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch#request-body
location (str) – The location of the Dataflow job (for example europe-west1)
project_id (str) – The ID of the GCP project that owns the job.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud Platform.
drain_pipeline (bool) – Optional, set to True if want to stop streaming job by draining it instead of canceling during killing task instance. See: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
cancel_timeout (int | None) – How long (in seconds) operator should wait for the pipeline to be successfully cancelled when task is being killed.
wait_until_finished (bool | None) –
(Optional) If True, wait for the end of pipeline execution before exiting. If False, only submits job. If None, default behavior.
The default behavior depends on the type of pipeline:
for the streaming pipeline, wait for jobs to start,
for the batch pipeline, wait for the jobs to complete.
Warning
You cannot call
PipelineResult.wait_until_finish
method in your pipeline code for the operator to work properly. i. e. you must use asynchronous execution. Otherwise, your pipeline will always wait until finished. For more information, look at: Asynchronous executionThe process of starting the Dataflow job in Airflow consists of two steps:
running a subprocess and reading the stderr/stderr log for the job id.
loop waiting for the end of the job ID from the previous step. This loop checks the status of the job.
Step two is started just after step one has finished, so if you have wait_until_finished in your pipeline code, step two will not start until the process stops. When this process stops, steps two will run, but it will only execute one iteration as the job will be in a terminal state.
If you in your pipeline do not call the wait_for_pipeline method but pass wait_until_finished=True to the operator, the second loop will wait for the job’s terminal state.
If you in your pipeline do not call the wait_for_pipeline method, and pass wait_until_finished=False to the operator, the second loop will check once is job not in terminal state and exit the loop.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
deferrable (bool) – Run operator in the deferrable mode.
expected_terminal_state (str | None) – The expected final status of the operator on which the corresponding Airflow task succeeds. When not specified, it will be determined by the hook.
append_job_name (bool) – True if unique suffix has to be appended to job name.
poll_sleep (int) – The time in seconds to sleep between polling Google Cloud Platform for the dataflow job status while the job is in the JOB_STATE_RUNNING state.
- template_fields: collections.abc.Sequence[str] = ('body', 'location', 'project_id', 'gcp_conn_id')[source]¶
- class airflow.providers.google.cloud.operators.dataflow.DataflowStartSqlJobOperator(job_name, query, options, location=DEFAULT_DATAFLOW_LOCATION, project_id=PROVIDE_PROJECT_ID, gcp_conn_id='google_cloud_default', drain_pipeline=False, impersonation_chain=None, *args, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Starts Dataflow SQL query.
See also
For more information on how to use this operator, take a look at the guide: Dataflow SQL
Warning
This operator requires
gcloud
command (Google Cloud SDK) must be installed on the Airflow worker <https://cloud.google.com/sdk/docs/install>`__- Parameters
job_name (str) – The unique name to assign to the Cloud Dataflow job.
query (str) – The SQL query to execute.
Job parameters to be executed. It can be a dictionary with the following keys.
For more information, look at: https://cloud.google.com/sdk/gcloud/reference/beta/dataflow/sql/query command reference
location (str) – The location of the Dataflow job (for example europe-west1)
project_id (str) – The ID of the GCP project that owns the job. If set to
None
or missing, the default project_id from the GCP connection is used.gcp_conn_id (str) – The connection ID to use connecting to Google Cloud Platform.
drain_pipeline (bool) – Optional, set to True if want to stop streaming job by draining it instead of canceling during killing task instance. See: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('job_name', 'query', 'options', 'location', 'project_id', 'gcp_conn_id')[source]¶
- class airflow.providers.google.cloud.operators.dataflow.DataflowStartYamlJobOperator(*, job_name, yaml_pipeline_file, region=DEFAULT_DATAFLOW_LOCATION, project_id=PROVIDE_PROJECT_ID, gcp_conn_id='google_cloud_default', append_job_name=True, drain_pipeline=False, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), poll_sleep=10, cancel_timeout=5 * 60, expected_terminal_state=None, jinja_variables=None, options=None, impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Launch a Dataflow YAML job and return the result.
See also
For more information on how to use this operator, take a look at the guide: Dataflow YAML
Warning
This operator requires
gcloud
command (Google Cloud SDK) must be installed on the Airflow worker <https://cloud.google.com/sdk/docs/install>`__- Parameters
job_name (str) – Required. The unique name to assign to the Cloud Dataflow job.
yaml_pipeline_file (str) – Required. Path to a file defining the YAML pipeline to run. Must be a local file or a URL beginning with ‘gs://’.
region (str) – Optional. Region ID of the job’s regional endpoint. Defaults to ‘us-central1’.
project_id (str) – Required. The ID of the GCP project that owns the job. If set to
None
or missing, the default project_id from the GCP connection is used.gcp_conn_id (str) – Optional. The connection ID used to connect to GCP.
append_job_name (bool) – Optional. Set to True if a unique suffix has to be appended to the job_name. Defaults to True.
drain_pipeline (bool) – Optional. Set to True if you want to stop a streaming pipeline job by draining it instead of canceling when killing the task instance. Note that this does not work for batch pipeline jobs or in the deferrable mode. Defaults to False. For more info see: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
deferrable (bool) – Optional. Run operator in the deferrable mode.
expected_terminal_state (str | None) – Optional. The expected terminal state of the Dataflow job at which the operator task is set to succeed. Defaults to ‘JOB_STATE_DONE’ for the batch jobs and ‘JOB_STATE_RUNNING’ for the streaming jobs.
poll_sleep (int) – Optional. The time in seconds to sleep between polling Google Cloud Platform for the Dataflow job status. Used both for the sync and deferrable mode.
cancel_timeout (int | None) – Optional. How long (in seconds) operator should wait for the pipeline to be successfully canceled when the task is being killed.
jinja_variables (dict[str, str] | None) – Optional. A dictionary of Jinja2 variables to be used in reifying the yaml pipeline file.
options (dict[str, Any] | None) – Optional. Additional gcloud or Beam job parameters. It must be a dictionary with the keys matching the optional flag names in gcloud. The list of supported flags can be found at: https://cloud.google.com/sdk/gcloud/reference/dataflow/yaml/run. Note that if a flag does not require a value, then its dictionary value must be either True or None. For example, the –log-http flag can be passed as {‘log-http’: True}.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- Returns
Dictionary containing the job’s data.
- template_fields: collections.abc.Sequence[str] = ('job_name', 'yaml_pipeline_file', 'jinja_variables', 'options', 'region', 'project_id', 'gcp_conn_id')[source]¶
- execute(context)[source]¶
Derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
- class airflow.providers.google.cloud.operators.dataflow.DataflowStopJobOperator(job_name_prefix=None, job_id=None, project_id=PROVIDE_PROJECT_ID, location=DEFAULT_DATAFLOW_LOCATION, gcp_conn_id='google_cloud_default', poll_sleep=10, impersonation_chain=None, stop_timeout=10 * 60, drain_pipeline=True, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Stops the job with the specified name prefix or Job ID.
All jobs with provided name prefix will be stopped. Streaming jobs are drained by default.
Parameter
job_name_prefix
andjob_id
are mutually exclusive.See also
For more details on stopping a pipeline see: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
See also
For more information on how to use this operator, take a look at the guide: Stopping a pipeline
- Parameters
job_name_prefix (str | None) – Name prefix specifying which jobs are to be stopped.
job_id (str | None) – Job ID specifying which jobs are to be stopped.
project_id (str) – Optional, the Google Cloud project ID in which to start a job. If set to None or missing, the default project_id from the Google Cloud connection is used.
location (str) – Optional, Job location. If set to None or missing, “us-central1” will be used.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
poll_sleep (int) – The time in seconds to sleep between polling Google Cloud Platform for the dataflow job status to confirm it’s stopped.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
drain_pipeline (bool) – Optional, set to False if want to stop streaming job by canceling it instead of draining. See: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline
stop_timeout (int | None) – wait time in seconds for successful job canceling/draining
- class airflow.providers.google.cloud.operators.dataflow.DataflowCreatePipelineOperator(*, body, project_id=PROVIDE_PROJECT_ID, location=DEFAULT_DATAFLOW_LOCATION, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a new Dataflow Data Pipeline instance.
See also
For more information on how to use this operator, take a look at the guide: JSON-formatted pipelines
- Parameters
body (dict) – The request body (contains instance of Pipeline). See: https://cloud.google.com/dataflow/docs/reference/data-pipelines/rest/v1/projects.locations.pipelines/create#request-body
project_id (str) – The ID of the GCP project that owns the job.
location (str) – The location to direct the Data Pipelines instance to (for example us-central1).
gcp_conn_id (str) – The connection ID to connect to the Google Cloud Platform.
impersonation_chain (str | collections.abc.Sequence[str] | None) –
Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
Warning
This option requires Apache Beam 2.39.0 or newer.
Returns the created Dataflow Data Pipeline instance in JSON representation.
- class airflow.providers.google.cloud.operators.dataflow.DataflowRunPipelineOperator(pipeline_name, project_id=PROVIDE_PROJECT_ID, location=DEFAULT_DATAFLOW_LOCATION, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Runs a Dataflow Data Pipeline.
See also
For more information on how to use this operator, take a look at the guide: JSON-formatted pipelines
- Parameters
pipeline_name (str) – The display name of the pipeline. In example projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID it would be the PIPELINE_ID.
project_id (str) – The ID of the GCP project that owns the job.
location (str) – The location to direct the Data Pipelines instance to (for example us-central1).
gcp_conn_id (str) – The connection ID to connect to the Google Cloud Platform.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
Returns the created Job in JSON representation.
- class airflow.providers.google.cloud.operators.dataflow.DataflowDeletePipelineOperator(pipeline_name, project_id=PROVIDE_PROJECT_ID, location=DEFAULT_DATAFLOW_LOCATION, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]¶
Bases:
airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Deletes a Dataflow Data Pipeline.
See also
For more information on how to use this operator, take a look at the guide: Deleting a pipeline
- Parameters
pipeline_name (str) – The display name of the pipeline. In example projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID it would be the PIPELINE_ID.
project_id (str) – The ID of the GCP project that owns the job.
location (str) – The location to direct the Data Pipelines instance to (for example us-central1).
gcp_conn_id (str) – The connection ID to connect to the Google Cloud Platform.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).