Apache Spark Connection¶
The Apache Spark connection type enables connection to Apache Spark.
Default Connection IDs¶
Spark Submit and Spark JDBC hooks and operators use `spark_default` by default. Spark SQL hooks and operators point to `spark_sql_default` by default, but do not use it.
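Airflow connections can also be supplied through an `AIRFLOW_CONN_{CONN_ID}` environment variable in URI form. As a sketch, assuming made-up host, port, and extra values (none of these are defaults of the provider), the `spark_default` connection could be defined like this:

```shell
# Sketch only: the host, port, and query values are illustrative assumptions.
# The URI scheme is the connection type; query parameters become Extra fields.
export AIRFLOW_CONN_SPARK_DEFAULT='spark://mysparkcluster.example.com:80?deploy-mode=cluster&queue=root.default'
echo "$AIRFLOW_CONN_SPARK_DEFAULT"
```

With this variable set, hooks and operators that default to `spark_default` would resolve the connection from the environment instead of the metadata database.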
Configuring the Connection¶
- Host (required)
The host to connect to; it can be `local`, `yarn`, or a URL.
- Port (optional)
Specify the port when the host is a URL.
- Extra (optional)
Specify the extra parameters (as a JSON dictionary) that can be used in the Spark connection. The following parameters, beyond the standard Python parameters, are supported:
- `queue` - The name of the YARN queue to which the application is submitted.
- `deploy-mode` - Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`).
- `spark-home` - If passed, build the `spark-binary` executable path using it (`spark-home/bin/spark-binary`); otherwise assume that `spark-binary` is present in the PATH of the executing user.
- `spark-binary` - The command to use for Spark submit. Some distros may use `spark2-submit`. Default `spark-submit`.
- `namespace` - Kubernetes namespace (`spark.kubernetes.namespace`) to divide cluster resources between multiple users (via resource quota).
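To make the Extra field concrete, here is a minimal Python sketch that builds such a JSON dictionary. The specific values (queue name, paths, namespace) are illustrative assumptions, not provider defaults:

```python
import json

# Hypothetical extras for a Spark connection; every value below is an
# illustrative assumption, not a default shipped with the provider.
extra = {
    "queue": "root.default",         # YARN queue the application is submitted to
    "deploy-mode": "cluster",        # run the driver on the worker nodes
    "spark-home": "/opt/spark",      # spark-binary resolved as /opt/spark/bin/spark-submit
    "spark-binary": "spark-submit",  # command used for Spark submit
    "namespace": "spark-jobs",       # Kubernetes namespace (spark.kubernetes.namespace)
}

# The Extra field stores this dictionary serialized as a JSON string.
extra_json = json.dumps(extra)
print(extra_json)
```

Pasting the printed JSON string into the connection's Extra field (or passing it to whichever mechanism you use to create connections) makes these parameters available to the Spark hooks and operators.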