airflow.providers.sftp.hooks.sftp

This module contains the SFTP hooks.

Classes

SFTPHook

Interact with SFTP.

SFTPHookAsync

Interact with an SFTP server via the asyncssh package.

Module Contents

class airflow.providers.sftp.hooks.sftp.SFTPHook(ssh_conn_id='sftp_default', host_proxy_cmd=None, *args, **kwargs)

Bases: airflow.providers.ssh.hooks.ssh.SSHHook

Interact with SFTP.

This hook inherits from SSHHook. Please refer to SSHHook for the input arguments.

Pitfalls::
  • In contrast with FTPHook, describe_directory only returns size, type and modify. It doesn’t return unix.owner, unix.mode, perm, unix.group and unique.

  • If no mode is passed to create_directory, the directory is created with 777 permissions (possibly reduced by the server's umask).

Errors that may occur throughout should be handled downstream.

For consistency with SSHHook, the preferred parameter is “ssh_conn_id”.

Parameters:

ssh_conn_id (str | None) – The SFTP connection ID

conn_name_attr = 'ssh_conn_id'
default_conn_name = 'sftp_default'
conn_type = 'sftp'
hook_name = 'SFTP'
classmethod get_ui_field_behaviour()

Return custom UI field behaviour for SSH connection.

conn: paramiko.sftp_client.SFTPClient | None = None
ssh_conn_id = 'sftp_default'
get_conn()

Open an SFTP connection to the remote host.

close_conn()

Close the SFTP connection.

get_managed_conn()

Context manager that closes the connection after use.

get_conn_count()

Get the number of open connections.
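get_managed_conn() and get_conn_count() point to a shared, reference-counted connection. The sketch below assumes that design and is not the hook's actual code: nested with blocks reuse one client, and the client is closed only when the outermost block exits. FakeSFTPClient and RefCountedHook are hypothetical names; in the real hook the client is a paramiko SFTPClient (see the conn attribute above).

```python
from contextlib import contextmanager


class FakeSFTPClient:
    """Hypothetical stand-in for paramiko's SFTPClient; it only records whether it is closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class RefCountedHook:
    """Sketch of a hook that shares one connection across nested `with` blocks."""

    def __init__(self):
        self.conn = None
        self._conn_count = 0

    def get_conn_count(self):
        return self._conn_count

    @contextmanager
    def get_managed_conn(self):
        # Open lazily on first use; nested blocks reuse the same client.
        if self.conn is None:
            self.conn = FakeSFTPClient()
        self._conn_count += 1
        try:
            yield self.conn
        finally:
            self._conn_count -= 1
            # Close only when the outermost `with` block exits.
            if self._conn_count == 0:
                self.conn.close()
                self.conn = None


hook = RefCountedHook()
with hook.get_managed_conn() as outer:
    with hook.get_managed_conn() as inner:
        print(inner is outer, hook.get_conn_count())  # True 2
print(outer.closed, hook.get_conn_count())  # True 0
```

Counting opens lets helper methods use the managed connection freely without closing a connection that an outer caller is still using.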

describe_directory(path)

Get file information in a directory on the remote system.

The return format is {filename: {attributes}}. The remote system must support the MLSD command.

Parameters:

path (str) – full path to the remote directory
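To make the {filename: {attributes}} shape concrete, here is a sketch that builds the same structure for a local directory with os.scandir rather than over SFTP. The attribute keys follow the Pitfalls note (size, type, modify); the "dir"/"file" type values and the YYYYMMDDHHMMSS modify format are assumptions, and describe_local_directory is a hypothetical helper.

```python
import datetime
import os
import stat
import tempfile


def describe_local_directory(path):
    """Build {filename: {attributes}} for a *local* directory (illustration only)."""
    files = {}
    for entry in os.scandir(path):
        st = entry.stat(follow_symlinks=False)
        files[entry.name] = {
            "size": st.st_size,
            # Assumed type labels: "dir" for directories, "file" for everything else.
            "type": "dir" if stat.S_ISDIR(st.st_mode) else "file",
            # Assumed compact timestamp format, in local time.
            "modify": datetime.datetime.fromtimestamp(st.st_mtime).strftime("%Y%m%d%H%M%S"),
        }
    return files


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    os.mkdir(os.path.join(tmp, "sub"))
    listing = describe_local_directory(tmp)

print(listing["a.txt"]["type"], listing["a.txt"]["size"])  # file 5
```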

list_directory(path)

List files in a directory on the remote system.

Parameters:

path (str) – full path to the remote directory to list

list_directory_with_attr(path)

List files in a directory on the remote system including their SFTPAttributes.

Parameters:

path (str) – full path to the remote directory to list

mkdir(path, mode=511)

Create a directory on the remote system.

The default mode is 0o777 (511 in decimal), but on some systems the current umask value may be masked out first.

Parameters:
  • path (str) – full path to the remote directory to create

  • mode (int) – permissions for the directory, as an integer (usually written in octal, e.g. 0o755)
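The default mode=511 is simply the decimal spelling of 0o777. The arithmetic below shows that equivalence and how a umask, if the server applies one, clears permission bits (requested & ~umask). Whether a umask applies, and which one, depends on the server; effective_mode is a hypothetical helper.

```python
# mode=511 in the signature is the decimal form of 0o777 (rwxrwxrwx).
print(511 == 0o777)  # True


def effective_mode(requested, umask):
    """Permissions left after a umask clears bits (typical POSIX behaviour)."""
    return requested & ~umask


# A common umask of 0o022 removes group/other write permission.
print(oct(effective_mode(0o777, 0o022)))  # 0o755
```

Passing an explicit octal value such as mode=0o755 makes the intended permissions clearer than the decimal default.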

isdir(path)

Check if the path provided is a directory.

Parameters:

path (str) – full path to the remote directory to check

isfile(path)

Check if the path provided is a file.

Parameters:

path (str) – full path to the remote file to check

create_directory(path, mode=511)

Create a directory on the remote system.

The default mode is 0o777 (511 in decimal), but on some systems the current umask value may be masked out first. Unlike mkdir(), this function attempts to create parent directories if needed, and returns silently if the target directory already exists.

Parameters:
  • path (str) – full path to the remote directory to create

  • mode (int) – permissions for the directory, as an integer (usually written in octal, e.g. 0o755)
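Unlike mkdir(), create_directory() creates missing parents and returns silently if the directory already exists. One way to get that behaviour is to recurse up to the first existing ancestor and create directories on the way back down. This is a sketch against a hypothetical in-memory FakeRemote that handles absolute POSIX paths only; it is not the hook's actual implementation.

```python
import posixpath


class FakeRemote:
    """Hypothetical in-memory stand-in for a remote filesystem (directories only)."""

    def __init__(self):
        self.dirs = {"/"}

    def isdir(self, path):
        return path in self.dirs

    def mkdir(self, path, mode=0o777):
        # Behaves like a plain mkdir: the parent must exist. `mode` is ignored by this fake.
        parent = posixpath.dirname(path)
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        if path in self.dirs:
            raise FileExistsError(path)
        self.dirs.add(path)


def create_directory(remote, path, mode=0o777):
    """Create `path` and any missing parents; do nothing if it already exists."""
    path = posixpath.normpath(path)
    if remote.isdir(path):
        return
    parent = posixpath.dirname(path)
    if parent and parent != path:
        create_directory(remote, parent, mode)
    remote.mkdir(path, mode)


remote = FakeRemote()
create_directory(remote, "/data/in/2024")
create_directory(remote, "/data/in/2024")  # second call returns silently
print(sorted(remote.dirs))  # ['/', '/data', '/data/in', '/data/in/2024']
```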

delete_directory(path, include_files=False)

Delete a directory on the remote system.

Parameters:

path (str) – full path to the remote directory to delete

retrieve_file(remote_full_path, local_full_path, prefetch=True)

Transfer the remote file to a local location.

If local_full_path is a string path, the file will be put at that location.

Parameters:
  • remote_full_path (str) – full path to the remote file

  • local_full_path (str) – full path to the local file or a file-like buffer

  • prefetch (bool) – controls whether prefetch is performed (default: True)
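local_full_path accepts either a path or a file-like buffer. paramiko's SFTPClient offers get() for local paths and getfo() for file-like objects, and a str-versus-buffer check like the one sketched here is a plausible way to choose between them (an assumption about the hook). write_to_target is hypothetical, and an in-memory stream stands in for the remote file.

```python
import io
import os
import shutil
import tempfile


def write_to_target(source, local_full_path):
    """Copy a readable binary stream to a path (str) or to a writable file-like buffer."""
    if isinstance(local_full_path, str):
        # A string is treated as a filesystem path.
        with open(local_full_path, "wb") as f:
            shutil.copyfileobj(source, f)
    else:
        # Anything else is assumed to be a writable binary file-like object.
        shutil.copyfileobj(source, local_full_path)


buffer = io.BytesIO()
write_to_target(io.BytesIO(b"remote bytes"), buffer)
print(buffer.getvalue())  # b'remote bytes'

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "copy.bin")
    write_to_target(io.BytesIO(b"abc"), target)
    with open(target, "rb") as f:
        print(f.read())  # b'abc'
```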

store_file(remote_full_path, local_full_path, confirm=True)

Transfer a local file to the remote location.

If local_full_path is a string path, the file will be read from that location.

Parameters:
  • remote_full_path (str) – full path to the remote file

  • local_full_path (str) – full path to the local file or a file-like buffer

  • confirm (bool) – whether to confirm the file size after transfer (default: True)

delete_file(path)

Remove a file on the server.

Parameters:

path (str) – full path to the remote file

retrieve_directory(remote_full_path, local_full_path, prefetch=True)

Transfer the remote directory to a local location.

If local_full_path is a string path, the directory will be put at that location.

Parameters:
  • remote_full_path (str) – full path to the remote directory

  • local_full_path (str) – full path to the local directory

  • prefetch (bool) – controls whether prefetch is performed (default: True)

retrieve_directory_concurrently(remote_full_path, local_full_path, workers=os.cpu_count() or 2)

Transfer the remote directory to a local location concurrently.

If local_full_path is a string path, the directory will be put at that location.

Parameters:
  • remote_full_path (str) – full path to the remote directory

  • local_full_path (str) – full path to the local directory

  • prefetch – controls whether prefetch is performed (default: True)

  • workers (int) – number of workers to use for concurrent transfer (default: number of CPUs or 2 if undetermined)
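The concurrent variants spread individual file transfers across workers. The local sketch below shows the general pattern under stated assumptions: recreate the directory structure first, then hand file copies to a ThreadPoolExecutor sized like the documented default (os.cpu_count() or 2). shutil.copyfile stands in for SFTP transfers and copy_tree_concurrently is a hypothetical helper; how the hook itself manages connections per worker is not shown.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_tree_concurrently(src_dir, dst_dir, workers=os.cpu_count() or 2):
    """Recreate the directory tree, then copy files in parallel with a thread pool."""
    jobs = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)  # directories first, sequentially
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target_root, name)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any worker exception is re-raised here.
        list(pool.map(lambda job: shutil.copyfile(*job), jobs))
    return len(jobs)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(src, rel), "w") as f:
            f.write(rel)
    copied = copy_tree_concurrently(src, dst, workers=2)
    print(copied)  # 2
```

Creating directories before starting workers avoids races where two workers try to create the same parent directory.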

store_directory(remote_full_path, local_full_path, confirm=True)

Transfer a local directory to the remote location.

If local_full_path is a string path, the directory will be read from that location.

Parameters:
  • remote_full_path (str) – full path to the remote directory

  • local_full_path (str) – full path to the local directory

  • confirm (bool) – whether to confirm the file size after transfer (default: True)

store_directory_concurrently(remote_full_path, local_full_path, confirm=True, workers=os.cpu_count() or 2)

Transfer a local directory to the remote location concurrently.

If local_full_path is a string path, the directory will be read from that location.

Parameters:
  • remote_full_path (str) – full path to the remote directory

  • local_full_path (str) – full path to the local directory

  • confirm (bool) – whether to confirm the file size after transfer (default: True)

  • workers (int) – number of workers to use for concurrent transfer (default: number of CPUs or 2 if undetermined)

get_mod_time(path)

Get an entry’s modification time.

Parameters:

path (str) – full path to the remote file
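The return format of get_mod_time() is not stated here. Assuming it is a YYYYMMDDHHMMSS string derived from the file's st_mtime, this sketch shows the conversion in both directions. It uses UTC to stay deterministic; the hook may use the worker's local time zone instead. Both helpers are hypothetical.

```python
from datetime import datetime, timezone

MOD_TIME_FORMAT = "%Y%m%d%H%M%S"  # assumed format of the returned string


def format_mod_time(st_mtime):
    """Render a POSIX modification timestamp as a YYYYMMDDHHMMSS string (UTC)."""
    return datetime.fromtimestamp(st_mtime, tz=timezone.utc).strftime(MOD_TIME_FORMAT)


def parse_mod_time(value):
    """Parse such a string back into a timezone-aware datetime, treating it as UTC."""
    return datetime.strptime(value, MOD_TIME_FORMAT).replace(tzinfo=timezone.utc)


print(format_mod_time(0))  # 19700101000000
print(format_mod_time(1700000000))  # 20231114221320
```

Because the string sorts lexicographically in time order, such values can be compared directly, which is handy when checking whether a remote file is newer than a threshold.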

path_exists(path)

Return whether a remote entity exists.

Parameters:

path (str) – full path to the remote file or directory

walktree(path, fcallback, dcallback, ucallback, recurse=True)

Recursively descend, depth first, the directory tree at path.

This calls discrete callback functions for each regular file, directory, and unknown file type.

Parameters:
  • path (str) – root of remote directory to descend, use ‘.’ to start at pwd

  • fcallback (callable) – callback function to invoke for a regular file. (form: func(str))

  • dcallback (callable) – callback function to invoke for a directory. (form: func(str))

  • ucallback (callable) – callback function to invoke for an unknown file type. (form: func(str))

  • recurse (bool) – whether to recurse into subdirectories (default: True)
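A local-filesystem sketch of the same depth-first, callback-per-type traversal, using os.listdir and the stat module instead of an SFTP connection. Each directory's callback fires before its contents are visited. This version uses os.lstat, so symlinks land in the unknown bucket; that is a choice made for the sketch, not necessarily what the hook does.

```python
import os
import stat
import tempfile


def walktree(path, fcallback, dcallback, ucallback, recurse=True):
    """Depth-first walk of a *local* tree, calling one callback per entry type."""
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        mode = os.lstat(full).st_mode
        if stat.S_ISDIR(mode):
            dcallback(full)
            if recurse:
                walktree(full, fcallback, dcallback, ucallback, recurse)
        elif stat.S_ISREG(mode):
            fcallback(full)
        else:
            # Symlinks, sockets, FIFOs and the like fall into the "unknown" bucket.
            ucallback(full)


files, dirs, unknown = [], [], []
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(root, rel), "w").close()
    walktree(root, files.append, dirs.append, unknown.append)
    rel_files = sorted(os.path.relpath(p, root) for p in files)

print(rel_files)  # ['a.txt', 'sub/b.txt'] on POSIX
```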

get_tree_map(path, prefix=None, delimiter=None)

Get a tuple of recursive lists of files, directories and unknown paths.

It is possible to filter results by giving prefix and/or delimiter parameters.

Parameters:
  • path (str) – path from which the tree will be built

  • prefix (str | None) – if set, only paths that start with prefix are added

  • delimiter (str | None) – if set, only paths that end with delimiter are added

Returns:

tuple with list of files, dirs and unknown items

Return type:

tuple[list[str], list[str], list[str]]
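Going by the parameter descriptions, prefix and delimiter act as startswith and endswith filters on each collected path. The sketch below applies that reading to a plain list of paths; is_path_match is hypothetical, and the assumption is that both checks run against the full path string.

```python
def is_path_match(path, prefix=None, delimiter=None):
    """True when `path` starts with `prefix` and ends with `delimiter` (each check optional)."""
    if prefix is not None and not path.startswith(prefix):
        return False
    if delimiter is not None and not path.endswith(delimiter):
        return False
    return True


paths = ["/in/2024/a.csv", "/in/2024/b.json", "/in/2023/c.csv", "/out/d.csv"]
print([p for p in paths if is_path_match(p, prefix="/in/2024", delimiter=".csv")])
# ['/in/2024/a.csv']
```

Despite its name, delimiter behaves here as a suffix filter, so a value such as ".csv" selects paths by extension.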

test_connection()

Test the SFTP connection by resolving the current directory path ('.') on the remote server.

get_file_by_pattern(path, fnmatch_pattern)

Get the first matching file based on the given fnmatch type pattern.

Parameters:
  • path – path to be checked

  • fnmatch_pattern – The pattern that will be matched with fnmatch

Returns:

string containing the first found file, or an empty string if none matched

Return type:

str

get_files_by_pattern(path, fnmatch_pattern)

Get all matching files based on the given fnmatch type pattern.

Parameters:
  • path – path to be checked

  • fnmatch_pattern – The pattern that will be matched with fnmatch

Returns:

list of strings containing the found files, or an empty list if none matched

Return type:

list[str]
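Both pattern methods come down to Python's fnmatch applied to a directory listing. A sketch of the documented return contracts (first match or "", all matches or []), with a hard-coded list standing in for the remote listing; first_match and all_matches are hypothetical names.

```python
import fnmatch


def first_match(names, pattern):
    """Return the first name matching `pattern`, or "" when nothing matches."""
    for name in names:
        if fnmatch.fnmatch(name, pattern):
            return name
    return ""


def all_matches(names, pattern):
    """Return every name matching `pattern`, or [] when nothing matches."""
    return [name for name in names if fnmatch.fnmatch(name, pattern)]


# Stand-in for a remote directory listing.
listing = ["report_2024.csv", "notes.txt", "report_2025.csv"]
print(first_match(listing, "report_*.csv"))  # report_2024.csv
print(all_matches(listing, "*.pdf"))  # []
```

fnmatch.fnmatch follows the local operating system's case rules; fnmatch.fnmatchcase is always case-sensitive.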

class airflow.providers.sftp.hooks.sftp.SFTPHookAsync(sftp_conn_id=default_conn_name, host='', port=22, username='', password='', known_hosts=default_known_hosts, key_file='', passphrase='', private_key='')

Bases: airflow.hooks.base.BaseHook

Interact with an SFTP server via the asyncssh package.

Parameters:
  • sftp_conn_id (str) – SFTP connection ID to be used for connecting to SFTP server

  • host (str) – hostname of the SFTP server

  • port (int) – port of the SFTP server

  • username (str) – username used when authenticating to the SFTP server

  • password (str) – password used when authenticating to the SFTP server. Can be left blank if using a key file

  • known_hosts (str) – path to the known_hosts file on the local file system. Defaults to ~/.ssh/known_hosts.

  • key_file (str) – path to the client key file used for authentication to SFTP server

  • passphrase (str) – passphrase used with the key_file for authentication to SFTP server

conn_name_attr = 'ssh_conn_id'
default_conn_name = 'sftp_default'
conn_type = 'sftp'
hook_name = 'SFTP'
default_known_hosts = '~/.ssh/known_hosts'
sftp_conn_id = 'sftp_default'
host = ''
port = 22
username = ''
password = ''
known_hosts: bytes | str
key_file = ''
passphrase = ''
private_key = ''
async list_directory(path='')

Return a list of files on the SFTP server at the provided path.

async read_directory(path='')

Return a list of files along with their attributes on the SFTP server at the provided path.

async get_files_and_attrs_by_pattern(path='', fnmatch_pattern='')

Get the files, along with their attributes, that match the pattern (e.g. *.pdf) at the provided path, if any exist. Otherwise, raise an AirflowException to be handled upstream for deferring.
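Raising when nothing matches presumably lets a deferrable caller keep waiting until files appear. A minimal asyncio sketch of that contract follows. fake_read_directory, the (name, attributes) tuple shape and NoMatchingFilesError are hypothetical stand-ins; asyncssh itself returns SFTPName objects carrying filename and attrs, and the hook raises AirflowException.

```python
import asyncio
import fnmatch


class NoMatchingFilesError(Exception):
    """Hypothetical stand-in for the AirflowException raised by the hook."""


async def fake_read_directory(path):
    """Hypothetical async listing that returns (filename, attributes) pairs."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [("a.pdf", {"size": 10}), ("b.txt", {"size": 3}), ("c.pdf", {"size": 7})]


async def files_and_attrs_by_pattern(path, fnmatch_pattern):
    entries = await fake_read_directory(path)
    matches = [(name, attrs) for name, attrs in entries if fnmatch.fnmatch(name, fnmatch_pattern)]
    if not matches:
        # Signal "nothing yet" so the caller can decide to keep waiting.
        raise NoMatchingFilesError(f"No files matching {fnmatch_pattern!r} in {path!r}")
    return matches


print(asyncio.run(files_and_attrs_by_pattern("/in", "*.pdf")))
# [('a.pdf', {'size': 10}), ('c.pdf', {'size': 7})]
```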

async get_mod_time(path)

Make an SFTP async connection, look up the last modification time of the file at the given path, and return it.

Parameters:

path (str) – full path to the remote file
