airflow.providers.microsoft.azure.hooks.wasb

This module contains integration with Azure Blob Storage.

It communicates via the Windows Azure Storage Blob (WASB) protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (the storage account name) and a password (the account key), or a login and a SAS token in the extra field (see the connection wasb_default for an example).

Module Contents

class airflow.providers.microsoft.azure.hooks.wasb.WasbHook(wasb_conn_id: str = default_conn_name, public_read: bool = False)[source]

Bases: airflow.hooks.base.BaseHook

Interacts with Azure Blob Storage through the wasb:// protocol.

The account_name and account_key parameters must be provided in the connection stored in the Airflow database.

Additional options passed in the 'extra' field of the connection will be passed to the BlockBlobService() constructor. For example, authenticate using a SAS token by adding {"sas_token": "YOUR_TOKEN"}.
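As a small illustration of the 'extra' field described above, the following sketch builds the JSON string for SAS-token authentication. The helper name build_wasb_extra is hypothetical; only the sas_token key comes from the text.

```python
import json


def build_wasb_extra(sas_token: str) -> str:
    """Return a JSON string suitable for the connection's 'extra' field.

    Hypothetical helper: only the 'sas_token' key is taken from the
    documentation above.
    """
    return json.dumps({"sas_token": sas_token})
```

The returned string can be pasted into the Extra field of a wasb connection; the login field would still hold the storage account name.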

Parameters
  • wasb_conn_id (str) -- Reference to the wasb connection.

  • public_read (bool) -- Whether anonymous public read access should be used. Defaults to False.

conn_name_attr = wasb_conn_id[source]
default_conn_name = wasb_default[source]
conn_type = wasb[source]
hook_name = Azure Blob Storage[source]
get_conn(self)[source]

Return the BlobServiceClient object.

_get_container_client(self, container_name: str)[source]

Instantiates a container client

Parameters

container_name (str) -- The name of the container

Returns

ContainerClient

_get_blob_client(self, container_name: str, blob_name: str)[source]

Instantiates a blob client

Parameters
  • container_name (str) -- The name of the blob container

  • blob_name (str) -- The name of the blob. The blob need not already exist.

check_for_blob(self, container_name: str, blob_name: str, **kwargs)[source]

Check if a blob exists on Azure Blob Storage.

Parameters
  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • kwargs (object) -- Optional keyword arguments that BlobClient.get_blob_properties takes.

Returns

True if the blob exists, False otherwise.

Return type

bool

check_for_prefix(self, container_name: str, prefix: str, **kwargs)[source]

Check if a prefix exists on Azure Blob storage.

Parameters
  • container_name (str) -- Name of the container.

  • prefix (str) -- Prefix of the blob.

  • kwargs (object) -- Optional keyword arguments that ContainerClient.walk_blobs takes

Returns

True if blobs matching the prefix exist, False otherwise.

Return type

bool
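The two existence checks above can be combined, for example when a name may refer either to a single blob or to a "folder" prefix. This is a minimal sketch: the function name target_exists is hypothetical, and hook is assumed to be a WasbHook instance (or any object exposing the same two methods).

```python
def target_exists(hook, container_name: str, name: str) -> bool:
    """Return True if `name` is an existing blob or a prefix matching any blob."""
    # Exact blob match first, then fall back to a prefix match.
    if hook.check_for_blob(container_name, name):
        return True
    return hook.check_for_prefix(container_name, name)


# Usage (requires the Azure provider package and a configured connection):
#   from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
#   hook = WasbHook(wasb_conn_id="wasb_default")
#   target_exists(hook, "my-container", "logs/")
```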

get_blobs_list(self, container_name: str, prefix: Optional[str] = None, include: Optional[List[str]] = None, delimiter: Optional[str] = '/', **kwargs)[source]

List blobs in a given container

Parameters
  • container_name (str) -- The name of the container

  • prefix (str) -- Filters the results to return only blobs whose names begin with the specified prefix.

  • include (List[str]) -- Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.

  • delimiter (str) -- Filters objects based on the delimiter (e.g. '.csv').
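A listing call along these lines might be used to pick out blobs of one type under a prefix. This sketch assumes that get_blobs_list returns a list of blob names as strings, which the reference above does not state explicitly; the function name list_blobs_with_suffix is hypothetical.

```python
from typing import List


def list_blobs_with_suffix(hook, container_name: str, prefix: str, suffix: str) -> List[str]:
    """List names under `prefix` and keep only those ending with `suffix`.

    Assumption: get_blobs_list returns blob names as plain strings.
    """
    names = hook.get_blobs_list(container_name, prefix=prefix)
    return [name for name in names if name.endswith(suffix)]
```

Filtering on the client side keeps the default delimiter untouched, at the cost of listing every name under the prefix first.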

load_file(self, file_path: str, container_name: str, blob_name: str, **kwargs)[source]

Upload a file to Azure Blob Storage.

Parameters
  • file_path (str) -- Path to the file to load.

  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • kwargs (object) -- Optional keyword arguments that BlobClient.upload_blob() takes.

load_string(self, string_data: str, container_name: str, blob_name: str, **kwargs)[source]

Upload a string to Azure Blob Storage.

Parameters
  • string_data (str) -- String to load.

  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • kwargs (object) -- Optional keyword arguments that BlobClient.upload_blob() takes.

get_file(self, file_path: str, container_name: str, blob_name: str, **kwargs)[source]

Download a file from Azure Blob Storage.

Parameters
  • file_path (str) -- Local path where the downloaded file is written.

  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • kwargs (object) -- Optional keyword arguments that BlobClient.download_blob() takes.

read_file(self, container_name: str, blob_name: str, **kwargs)[source]

Read a file from Azure Blob Storage and return as a string.

Parameters
  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • kwargs (object) -- Optional keyword arguments that BlobClient.download_blob takes.
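As a rough illustration of how load_string and read_file pair up, the sketch below writes a string and reads it back. The function name write_and_verify is hypothetical, and hook is assumed to be a WasbHook instance.

```python
def write_and_verify(hook, container_name: str, blob_name: str, text: str) -> bool:
    """Upload `text` as a blob, then read it back and compare."""
    hook.load_string(text, container_name, blob_name)
    return hook.read_file(container_name, blob_name) == text


# Usage (requires the Azure provider package and a configured connection):
#   from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
#   hook = WasbHook(wasb_conn_id="wasb_default")
#   write_and_verify(hook, "my-container", "notes/hello.txt", "hello")
```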

upload(self, container_name, blob_name, data, blob_type: str = 'BlockBlob', length: Optional[int] = None, **kwargs)[source]

Creates a new blob from a data source with automatic chunking.

Parameters
  • container_name (str) -- The name of the container to upload data to

  • blob_name (str) -- The name of the blob to upload. This need not exist in the container

  • data -- The blob data to upload

  • blob_type (storage.BlobType) -- The type of the blob. This can be either BlockBlob, PageBlob or AppendBlob. The default value is BlockBlob.

  • length (int) -- Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.
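The length parameter described above can be supplied when the payload size is known in advance. A minimal sketch, assuming hook is a WasbHook instance; the function name upload_bytes is hypothetical.

```python
import io


def upload_bytes(hook, container_name: str, blob_name: str, payload: bytes) -> None:
    """Upload an in-memory payload as a block blob, passing its length explicitly."""
    stream = io.BytesIO(payload)
    # blob_type is left at its documented default, 'BlockBlob'.
    hook.upload(container_name, blob_name, stream, length=len(payload))
```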

download(self, container_name, blob_name, offset: Optional[int] = None, length: Optional[int] = None, **kwargs)[source]

Downloads a blob and returns a StorageStreamDownloader.

Parameters
  • container_name (str) -- The name of the container containing the blob

  • blob_name (str) -- The name of the blob to download

  • offset (int) -- Start of byte range to use for downloading a section of the blob. Must be set if length is provided.

  • length (int) -- Number of bytes to read from the stream.
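The offset and length rule above can be enforced before calling download. This is a sketch under the assumption that the returned StorageStreamDownloader is consumed with its readall() method; the function name read_blob_bytes is hypothetical.

```python
from typing import Optional


def read_blob_bytes(
    hook,
    container_name: str,
    blob_name: str,
    offset: Optional[int] = None,
    length: Optional[int] = None,
) -> bytes:
    """Download a blob, or a byte range of it, and return the content."""
    # The documentation requires offset whenever length is given.
    if length is not None and offset is None:
        raise ValueError("offset must be set when length is provided")
    downloader = hook.download(container_name, blob_name, offset=offset, length=length)
    return downloader.readall()
```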

create_container(self, container_name: str)[source]

Creates the container if it does not already exist.

Parameters

container_name (str) -- The name of the container to create

delete_container(self, container_name: str)[source]

Delete a container object

Parameters

container_name (str) -- The name of the container

delete_blobs(self, container_name: str, *blobs, **kwargs)[source]

Marks the specified blobs or snapshots for deletion.

Parameters
  • container_name (str) -- The name of the container containing the blobs

  • blobs (Union[str, BlobProperties]) -- The blobs to delete. This can be a single blob, or multiple values can be supplied, where each value is either the name of the blob (str) or BlobProperties.

delete_file(self, container_name: str, blob_name: str, is_prefix: bool = False, ignore_if_missing: bool = False, **kwargs)[source]

Delete a file from Azure Blob Storage.

Parameters
  • container_name (str) -- Name of the container.

  • blob_name (str) -- Name of the blob.

  • is_prefix (bool) -- If blob_name is a prefix, delete all matching files

  • ignore_if_missing (bool) -- If True, return success even if the blob does not exist.

  • kwargs (object) -- Optional keyword arguments that ContainerClient.delete_blobs() takes.
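The is_prefix and ignore_if_missing flags can be combined for an idempotent cleanup step. A minimal sketch, assuming hook is a WasbHook instance; the function name purge_prefix is hypothetical.

```python
def purge_prefix(hook, container_name: str, prefix: str) -> None:
    """Delete every blob whose name starts with `prefix`; do nothing if none exist."""
    hook.delete_file(
        container_name,
        prefix,
        is_prefix=True,          # treat blob_name as a prefix
        ignore_if_missing=True,  # do not fail when nothing matches
    )
```

With ignore_if_missing=True the call can be repeated safely, which suits retried Airflow tasks.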
