Apache Spark Connection¶
The Apache Spark connection type enables connection to Apache Spark.
Default Connection IDs¶
Spark Submit and Spark JDBC hooks and operators use `spark_default` by default. Spark SQL hooks and operators point to `spark_sql_default` by default, but don't use it.
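As a sketch of how such a connection can be supplied, Airflow reads connections from `AIRFLOW_CONN_<CONN_ID>` environment variables in URI form; the host, port, and values below are illustrative, not defaults:

```shell
# Define the spark_default connection as a URI in an environment variable.
# Scheme is the connection type, host/port follow, extras become query params.
export AIRFLOW_CONN_SPARK_DEFAULT='spark://yarn?queue=root.default&deploy-mode=cluster'
```

Connections can equally be created through the Airflow UI or CLI; the environment-variable form is convenient for containerized deployments.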
Configuring the Connection¶
- Host (required)
The host to connect to; it can be `yarn` or a URL.
- Port (optional)
Specify the port when the host is a URL.
- Extra (optional)
Specify the extra parameters (as a JSON dictionary) that can be used in the Spark connection. The following parameters are supported:
  - `queue` - The name of the YARN queue to which the application is submitted.
  - `deploy-mode` - Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`).
  - `spark-home` - If passed, build the `spark-binary` executable path using it (`spark-home`/bin/`spark-binary`); otherwise assume that `spark-binary` is present in the PATH of the executing user.
  - `spark-binary` - The command to use for Spark submit. Some distros may use `spark2-submit` instead of the default `spark-submit`.
  - `namespace` - The Kubernetes namespace (`spark.kubernetes.namespace`) used to divide cluster resources between multiple users (via resource quota).
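To illustrate the Extra field, a minimal sketch of building the JSON dictionary from the parameters listed above (the values chosen here are examples, not defaults):

```python
import json

# Example Extra payload for a Spark connection; keys match the supported
# parameters documented above, values are illustrative.
extra = {
    "queue": "root.default",        # YARN queue to submit to
    "deploy-mode": "cluster",       # run the driver on the worker nodes
    "spark-binary": "spark-submit", # command used for Spark submit
    "namespace": "default",         # Kubernetes namespace
}

# The Extra field stores this dictionary serialized as JSON.
extra_json = json.dumps(extra)
print(extra_json)
```

The resulting JSON string is what you would paste into the connection's Extra field in the Airflow UI.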