Apache Spark

apache-airflow-providers-apache-spark

Works with Airflow 2.11+
Install:
pip install apache-airflow-providers-apache-spark==5.6.0

Airflow

2.11+

Python

>=3.10

Dependencies (4)

apache-airflow>=2.11.0
apache-airflow-providers-common-compat>=1.12.0
pyspark>=3.5.2
grpcio-status>=1.59.0

Connections (4)

Modules

Operator

SparkJDBCOperator

Extend the SparkSubmitOperator to perform data transfers to/from JDBC-based databases with Apache Spark.

airflow.providers.apache.spark.operators.spark_jdbc.SparkJDBCOperator
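
As a rough illustration, a JDBC-to-Spark transfer might be declared as follows. The task id, table names, and driver class are hypothetical, and the sketch assumes the Spark and JDBC connections the operator falls back to are already configured in Airflow. The provider import sits inside the factory function so the argument dictionary can be inspected without Airflow installed.

```python
from typing import Any


def jdbc_to_spark_kwargs() -> dict[str, Any]:
    """Illustrative arguments for copying a JDBC table into a Spark metastore table."""
    return {
        "task_id": "jdbc_to_spark_job",          # hypothetical task id
        "cmd_type": "jdbc_to_spark",             # direction; "spark_to_jdbc" goes the other way
        "jdbc_table": "public.orders",           # hypothetical source table
        "jdbc_driver": "org.postgresql.Driver",  # assumes this driver jar is available to Spark
        "metastore_table": "staging.orders",     # hypothetical destination table
        "save_mode": "overwrite",
        "save_format": "parquet",
    }


def make_jdbc_to_spark_task():
    """Build the operator; needs Airflow and this provider installed."""
    # Imported lazily so this module loads even where Airflow is absent.
    from airflow.providers.apache.spark.operators.spark_jdbc import SparkJDBCOperator

    return SparkJDBCOperator(**jdbc_to_spark_kwargs())
```

Calling `make_jdbc_to_spark_task()` inside a DAG definition would register the task; the returned operator still relies on the spark-submit binary at run time, since it extends SparkSubmitOperator.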
Operator

SparkPipelinesOperator

Execute Spark Declarative Pipelines using the spark-pipelines CLI.

airflow.providers.apache.spark.operators.spark_pipelines.SparkPipelinesOperator
Operator

SparkSqlOperator

Execute a Spark SQL query.

airflow.providers.apache.spark.operators.spark_sql.SparkSqlOperator
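
A minimal sketch of a query task, under the assumption that the spark-sql binary is available on the worker. The task id, table name, and job name are hypothetical, and `master="local"` is only a convenient choice for trying the operator out.

```python
from typing import Any


def spark_sql_kwargs() -> dict[str, Any]:
    """Illustrative arguments for a SparkSqlOperator task."""
    return {
        "task_id": "count_events",                    # hypothetical task id
        "sql": "SELECT COUNT(1) AS cnt FROM events",  # hypothetical table
        "master": "local",                            # run against a local Spark master
        "name": "count_events",                       # job name shown by Spark
    }


def make_spark_sql_task():
    """Build the operator; needs Airflow and this provider installed."""
    # Imported lazily so this module loads even where Airflow is absent.
    from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

    return SparkSqlOperator(**spark_sql_kwargs())
```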
Operator

SparkSubmitOperator

Wrap the spark-submit binary to kick off a spark-submit job; requires the "spark-submit" binary to be in the PATH.

airflow.providers.apache.spark.operators.spark_submit.SparkSubmitOperator
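
One way this operator might be declared is sketched below. The task id, application path, arguments, and configuration values are hypothetical, and `conn_id="spark_default"` assumes a Spark connection with that id exists in the Airflow deployment.

```python
from typing import Any


def spark_submit_kwargs() -> dict[str, Any]:
    """Illustrative arguments for a SparkSubmitOperator task."""
    return {
        "task_id": "submit_wordcount",                     # hypothetical task id
        "application": "/opt/jobs/wordcount.py",           # hypothetical application path
        "conn_id": "spark_default",                        # assumes this connection exists
        "name": "wordcount",                               # job name shown by Spark
        "conf": {"spark.executor.memory": "2g"},           # extra Spark configuration
        "application_args": ["--input", "/data/in.txt"],   # passed through to the application
        "verbose": False,
    }


def make_spark_submit_task():
    """Build the operator; needs Airflow and this provider installed."""
    # Imported lazily so this module loads even where Airflow is absent.
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    return SparkSubmitOperator(**spark_submit_kwargs())
```

Keeping the arguments in a plain dictionary is only a convenience for this sketch; in a DAG file they would more commonly be passed to the operator directly.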
Operator

PySparkOperator

Submit a PySpark job to an external Spark Connect service, or run the job directly in standalone mode.

airflow.providers.apache.spark.operators.spark_pyspark.PySparkOperator
Hook

SparkConnectHook

Hook for Spark Connect.

airflow.providers.apache.spark.hooks.spark_connect.SparkConnectHook
Hook

SparkJDBCHook

Extend the SparkSubmitHook to perform data transfers to/from JDBC-based databases with Apache Spark.

airflow.providers.apache.spark.hooks.spark_jdbc.SparkJDBCHook
Hook

SparkPipelinesHook

Hook for interacting with Spark Declarative Pipelines via the spark-pipelines CLI.

airflow.providers.apache.spark.hooks.spark_pipelines.SparkPipelinesHook
Hook

SparkSqlHook

This hook is a wrapper around the spark-sql binary; requires the "spark-sql" binary to be in the PATH.

airflow.providers.apache.spark.hooks.spark_sql.SparkSqlHook
Hook

SparkSubmitHook

Wrap the spark-submit binary to kick off a spark-submit job; requires the "spark-submit" binary to be in the PATH.

airflow.providers.apache.spark.hooks.spark_submit.SparkSubmitHook
Decorator

@task.pyspark

Task decorator for PySpark.

airflow.providers.apache.spark.decorators.pyspark.pyspark_task
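
A small sketch of how the decorator might be applied. It assumes the decorator supplies a Spark session to a parameter named `spark`; the connection id is a hypothetical example, and the function body is an arbitrary illustration rather than anything prescribed by the provider.

```python
def count_even_ids(spark) -> int:
    """Count the even ids in a ten-row range using the session passed in as `spark`."""
    return spark.range(10).filter("id % 2 = 0").count()


def make_pyspark_task(conn_id: str = "spark_connect_default"):
    """Wrap count_even_ids as an Airflow task; the default connection id is hypothetical."""
    # Imported lazily so this module loads even where Airflow is absent.
    from airflow.decorators import task

    # Equivalent to writing @task.pyspark(conn_id=...) directly above the function.
    return task.pyspark(conn_id=conn_id)(count_even_ids)
```

In a DAG file the usual form is to place `@task.pyspark(conn_id=...)` directly above the function; the factory form here only keeps the plain function usable on its own.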