Apache Spark Connection
The Apache Spark connection type enables connection to Apache Spark.
Default Connection IDs
Spark Submit and Spark JDBC hooks and operators use spark_default
by default. Spark SQL hooks and operators point to spark_sql_default
by default, but don’t use it.
Configuring the Connection
- Host (required)
The host to connect to. It can be local, yarn, or a URL.
- Port (optional)
Specify the port if the host is a URL.
- Extra (optional)
Specify the extra parameters (as a JSON dictionary) that can be used in the Spark connection. In addition to the standard Python parameters, the following parameters are supported:
queue
- The name of the YARN queue to which the application is submitted.
deploy-mode
- Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client).
spark-home
- If passed, build the spark-binary executable path using it (spark-home/bin/spark-binary); otherwise assume that spark-binary is present in the PATH of the executing user.
spark-binary
- The command to use for Spark submit. Some distros may use spark2-submit. Default: spark-submit.
namespace
- Kubernetes namespace (spark.kubernetes.namespace) to divide cluster resources between multiple users (via resource quota).
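The fields above can be sketched in plain Python. This is an illustration only, not the provider's implementation: the helper names resolve_master and resolve_spark_binary are hypothetical, every value (host name, port, paths, queue, namespace) is a made-up example, and joining host and port with a colon is an assumption about how the two fields combine.

```python
import json
import posixpath

# Hypothetical Extra values for illustration; none come from a real cluster.
extra = {
    "queue": "root.default",
    "deploy-mode": "cluster",
    "spark-home": "/opt/spark",
    "spark-binary": "spark-submit",
    "namespace": "spark-jobs",
}

# The Extra field holds these parameters as a JSON dictionary.
extra_json = json.dumps(extra)


def resolve_master(host, port=None):
    """Hypothetical helper: append the port only when one is given.

    Assumes host and port are joined with a colon when the host is a URL.
    """
    return f"{host}:{port}" if port else host


def resolve_spark_binary(extra):
    """Hypothetical helper mirroring the spark-home / spark-binary rule."""
    binary = extra.get("spark-binary", "spark-submit")  # documented default
    home = extra.get("spark-home")
    # With spark-home set, the path is spark-home/bin/spark-binary;
    # otherwise the binary is assumed to be on the user's PATH.
    return posixpath.join(home, "bin", binary) if home else binary


print(extra_json)
print(resolve_master("yarn"))
print(resolve_master("spark://spark-master.example.com", 7077))
print(resolve_spark_binary(extra))
```

The string in extra_json is what would be pasted into the Extra field. With spark-home omitted from the dictionary, resolve_spark_binary returns the bare command name, matching the case where the binary is expected on the PATH.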