Apache HDFS Connection
The Apache HDFS connection type enables connection to Apache HDFS.
Default Connection IDs
The HDFS Hook reads its connection ID from the hdfs_conn_id parameter, which defaults to hdfs_default.
The Web HDFS Hook reads its connection ID from the webhdfs_conn_id parameter, which defaults to webhdfs_default.
Configuring the Connection
- Host
The host to connect to. It can be local, yarn, or a URL.
- Port
The port to connect to. Specify it when the host is a URL.
- Login
Effective user for HDFS operations (non-Kerberized).
- Extra (optional, connection parameters)
Extra parameters (as a JSON dictionary) that can be used in the HDFS connection. The following parameters are supported in addition to the standard Python parameters:
autoconfig
- Boolean, defaults to False. Whether to use snakebite’s automatically configured client. This HDFSHook implementation requires snakebite.
hdfs_namenode_principal
- Specifies the Kerberos principal to use for HDFS.
The following extra parameters can be used to configure SSL for Web HDFS Hook:
use_ssl
- Whether SSL should be used. Defaults to false.
verify
- How to verify SSL. For more information, refer to https://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification.
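As a minimal sketch of what the Extra field might contain, the snippet below builds the JSON dictionaries for both hooks using only the parameter names listed above. The principal and the CA bundle path are hypothetical placeholders, and passing a file path for verify assumes the value is handed through to requests, which accepts either a boolean or a path to a CA bundle.

```python
import json

# Extra for the HDFS Hook. The principal below is a made-up placeholder.
hdfs_extra = {
    "autoconfig": False,
    "hdfs_namenode_principal": "hdfs/namenode.example.com@EXAMPLE.COM",
}

# Extra for the Web HDFS Hook. The CA bundle path is a made-up placeholder;
# a plain boolean (True/False) is the other form requests accepts for verify.
webhdfs_extra = {
    "use_ssl": True,
    "verify": "/etc/ssl/certs/example-ca.pem",
}

# Serialize each dictionary to the JSON string that goes into the Extra field.
hdfs_extra_json = json.dumps(hdfs_extra, sort_keys=True)
webhdfs_extra_json = json.dumps(webhdfs_extra, sort_keys=True)

print(hdfs_extra_json)
print(webhdfs_extra_json)
```

Each printed string can be pasted into the Extra field of the corresponding connection. Note that JSON booleans are lowercase (true/false), unlike Python's True/False.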