airflow.providers.alibaba.cloud.operators.analyticdb_spark

Module Contents

Classes

AnalyticDBSparkBaseOperator

Abstract base class that defines how users develop AnalyticDB Spark applications.

AnalyticDBSparkSQLOperator

Submits a Spark SQL application to the underlying cluster; wraps the AnalyticDB Spark REST API.

AnalyticDBSparkBatchOperator

Submits a Spark batch application to the underlying cluster; wraps the AnalyticDB Spark REST API.

class airflow.providers.alibaba.cloud.operators.analyticdb_spark.AnalyticDBSparkBaseOperator(*, adb_spark_conn_id='adb_spark_default', region=None, polling_interval=0, **kwargs)[source]

Bases: airflow.models.BaseOperator

Abstract base class that defines how users develop AnalyticDB Spark applications.

hook()[source]

Get a valid AnalyticDBSparkHook.

get_hook()[source]

Get a valid AnalyticDBSparkHook. Deprecated; prefer the hook property.

execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

monitor_application()[source]

Monitor the submitted Spark application, polling for termination when polling_interval is greater than 0.

poll_for_termination(app_id)[source]

Poll for Spark application termination.

Parameters

app_id (str) – ID of the Spark application to monitor
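
Conceptually, the polling loop resembles the sketch below. This is a simplified illustration, not the provider's exact implementation; the get_spark_state call and the terminal-state set are assumptions about AnalyticDBSparkHook, not confirmed by this page.

import time

from airflow.providers.alibaba.cloud.hooks.analyticdb_spark import AnalyticDBSparkHook

# Assumed terminal states; the authoritative set lives in the provider's
# application-state definitions.
TERMINAL_STATES = {"COMPLETED", "FATAL", "DEAD"}

def poll_for_termination_sketch(hook: AnalyticDBSparkHook, app_id: str, polling_interval: int) -> None:
    # Query the application state, sleeping polling_interval seconds
    # between checks, until a terminal state is reached.
    state = hook.get_spark_state(app_id)
    while state not in TERMINAL_STATES:
        time.sleep(polling_interval)
        state = hook.get_spark_state(app_id)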

on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

kill()[source]

Delete the specified application.

class airflow.providers.alibaba.cloud.operators.analyticdb_spark.AnalyticDBSparkSQLOperator(*, sql, conf=None, driver_resource_spec=None, executor_resource_spec=None, num_executors=None, name=None, cluster_id, rg_name, **kwargs)[source]

Bases: AnalyticDBSparkBaseOperator

Submits a Spark SQL application to the underlying cluster; wraps the AnalyticDB Spark REST API.

Parameters
  • sql (str) – The SQL query to execute.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

  • driver_resource_spec (str | None) – The resource specifications of the Spark driver.

  • executor_resource_spec (str | None) – The resource specifications of each Spark executor.

  • num_executors (int | str | None) – Number of executors to launch for this application.

  • name (str | None) – Name of this application.

  • cluster_id (str) – The cluster ID of AnalyticDB MySQL 3.0 Data Lakehouse.

  • rg_name (str) – The name of the resource group in the AnalyticDB MySQL 3.0 Data Lakehouse cluster.

template_fields: Sequence[str] = ('spark_params',)[source]

template_fields_renderers[source]

execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
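
To show how the pieces fit together, here is a minimal usage sketch. The DAG id, cluster_id, and rg_name values are placeholders, the connection falls back to the adb_spark_default default above, and a positive polling_interval is assumed to enable the poll_for_termination loop described earlier so the task waits for the application to finish:

from datetime import datetime

from airflow import DAG
from airflow.providers.alibaba.cloud.operators.analyticdb_spark import (
    AnalyticDBSparkSQLOperator,
)

with DAG(
    dag_id="adb_spark_sql_example",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Submit a Spark SQL application to a placeholder cluster and
    # resource group, then poll every 5 seconds until it terminates.
    show_databases = AnalyticDBSparkSQLOperator(
        task_id="show_databases",
        sql="SHOW DATABASES;",
        cluster_id="amv-example-cluster",  # placeholder cluster ID
        rg_name="spark",                   # placeholder resource group name
        polling_interval=5,
    )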

class airflow.providers.alibaba.cloud.operators.analyticdb_spark.AnalyticDBSparkBatchOperator(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_resource_spec=None, executor_resource_spec=None, num_executors=None, archives=None, name=None, cluster_id, rg_name, **kwargs)[source]

Bases: AnalyticDBSparkBaseOperator

Submits a Spark batch application to the underlying cluster; wraps the AnalyticDB Spark REST API.

Parameters
  • file (str) – Path of the file containing the application to execute.

  • class_name (str | None) – Name of the application's Java/Spark main class.

  • args (Sequence[str | int | float] | None) – Application command-line arguments.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

  • jars (Sequence[str] | None) – JARs to be used in this application.

  • py_files (Sequence[str] | None) – Python files to be used in this application.

  • files (Sequence[str] | None) – Files to be used in this application.

  • driver_resource_spec (str | None) – The resource specifications of the Spark driver.

  • executor_resource_spec (str | None) – The resource specifications of each Spark executor.

  • num_executors (int | str | None) – Number of executors to launch for this application.

  • archives (Sequence[str] | None) – Archives to be used in this application.

  • name (str | None) – Name of this application.

  • cluster_id (str) – The cluster ID of AnalyticDB MySQL 3.0 Data Lakehouse.

  • rg_name (str) – The name of the resource group in the AnalyticDB MySQL 3.0 Data Lakehouse cluster.

template_fields: Sequence[str] = ('spark_params',)[source]

template_fields_renderers[source]

execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
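
A minimal batch-submission sketch follows the same pattern. The DAG id, OSS path, cluster ID, and resource group name are placeholders; the main class is the stock Spark Pi example:

from datetime import datetime

from airflow import DAG
from airflow.providers.alibaba.cloud.operators.analyticdb_spark import (
    AnalyticDBSparkBatchOperator,
)

with DAG(
    dag_id="adb_spark_batch_example",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Submit a JAR stored in OSS as a batch application and poll
    # every 10 seconds until the application terminates.
    spark_pi = AnalyticDBSparkBatchOperator(
        task_id="spark_pi",
        file="oss://your-bucket/jars/spark-examples.jar",  # placeholder path
        class_name="org.apache.spark.examples.SparkPi",
        args=["1000"],
        num_executors=2,
        cluster_id="amv-example-cluster",  # placeholder cluster ID
        rg_name="spark",                   # placeholder resource group name
        polling_interval=10,
    )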
