airflow.providers.apache.iceberg.hooks.iceberg

Attributes

TOKENS_ENDPOINT

Classes

IcebergHook – Hook for Apache Iceberg REST catalogs.

Module Contents

airflow.providers.apache.iceberg.hooks.iceberg.TOKENS_ENDPOINT = 'oauth/tokens'
class airflow.providers.apache.iceberg.hooks.iceberg.IcebergHook(iceberg_conn_id=default_conn_name)

Bases: airflow.providers.common.compat.sdk.BaseHook

Hook for Apache Iceberg REST catalogs.

Provides catalog-level operations (list namespaces, list tables, load schemas) using pyiceberg, plus OAuth2 token generation for external query engines.

Parameters:

iceberg_conn_id (str) – ID of the Airflow connection that holds the details needed to connect to the Iceberg catalog.
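As a rough illustration of how the catalog-level operations fit together, the sketch below walks every namespace and table and collects the column lists. It only uses the methods documented on this page (list_namespaces, list_tables, get_table_schema). The helper name collect_schemas is hypothetical, and the sketch assumes that list_namespaces() returns values that list_tables() accepts as its namespace argument.

```python
from typing import Any


def collect_schemas(hook: Any) -> dict[str, list[dict[str, str]]]:
    """Map every table in the catalog to its column list.

    `hook` is expected to behave like IcebergHook. Assumption: each item
    returned by list_namespaces() can be passed straight to list_tables().
    """
    schemas: dict[str, list[dict[str, str]]] = {}
    for namespace in hook.list_namespaces():
        # list_tables() is documented to return fully-qualified names.
        for table_name in hook.list_tables(namespace):
            schemas[table_name] = hook.get_table_schema(table_name)
    return schemas


# Inside an Airflow task, assuming the provider is installed and an
# "iceberg_default" connection is configured:
#
#     from airflow.providers.apache.iceberg.hooks.iceberg import IcebergHook
#     schemas = collect_schemas(IcebergHook(iceberg_conn_id="iceberg_default"))
```

The function takes the hook as an argument rather than creating it, so the same code can be exercised with a stand-in object outside Airflow.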

conn_name_attr = 'iceberg_conn_id'
default_conn_name = 'iceberg_default'
conn_type = 'iceberg'
hook_name = 'Iceberg'
classmethod get_ui_field_behaviour()

Return custom UI field behaviour for Iceberg connection.

conn_id = 'iceberg_default'
property catalog: pyiceberg.catalog.Catalog

Return a pyiceberg Catalog instance for the configured connection.

get_conn()

Return the pyiceberg Catalog.

test_connection()

Test the Iceberg connection by listing namespaces.

get_token()

Obtain a short-lived OAuth2 access token.

This preserves the behavior that get_conn() had before version 2.0. Use it when you need a raw token for an external engine (Spark, Trino, Flink).
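A raw OAuth2 access token is usually sent to an HTTP service as a bearer token. The sketch below shows that wrapping step only. The helper name bearer_headers is hypothetical, and the commented usage assumes that get_token() returns the token as a plain string, which this page does not state.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Build HTTP headers that carry an OAuth2 bearer token."""
    return {"Authorization": f"Bearer {token}"}


# With the real hook (assumption: get_token() returns a str):
#
#     token = IcebergHook().get_token()
#     headers = bearer_headers(token)
```

Because the token is short-lived, it is safer to request it close to the point of use than to store it.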

get_token_macro()

Return a Jinja2 macro that resolves to a fresh token at render time.

list_namespaces()

Return all namespace names in the catalog.

list_tables(namespace)

Return all table names in the given namespace.

Parameters:

namespace (str) – Namespace (database/schema) to list tables from.

Returns:

List of fully-qualified table names (“namespace.table”).

Return type:

list[str]
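Since the returned names are fully qualified, callers often need to separate the namespace from the table again. A minimal sketch follows. The helper name split_table_name is hypothetical, and it assumes that nested namespaces, if any, are joined with dots, so the table is whatever follows the last dot.

```python
def split_table_name(qualified_name: str) -> tuple[str, str]:
    """Split "namespace.table" into (namespace, table).

    Assumption: the table name is the part after the last dot, so a
    nested namespace such as "a.b" is kept intact.
    """
    namespace, sep, table = qualified_name.rpartition(".")
    if not sep or not namespace or not table:
        raise ValueError(f"expected 'namespace.table', got {qualified_name!r}")
    return namespace, table
```

For example, split_table_name("sales.orders") yields ("sales", "orders"), where "sales.orders" is an invented name.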

load_table(table_name)

Load an Iceberg table object.

Parameters:

table_name (str) – Fully-qualified table name (“namespace.table”).

Returns:

pyiceberg Table instance.

Return type:

pyiceberg.table.Table

table_exists(table_name)

Check whether a table exists in the catalog.
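One possible use is to guard load_table() so that a missing table produces None instead of an error. The sketch below assumes that table_exists() returns a boolean. The helper name load_if_exists is hypothetical.

```python
from typing import Any, Optional


def load_if_exists(hook: Any, table_name: str) -> Optional[Any]:
    """Return the loaded table, or None when the catalog has no such table.

    `hook` is expected to behave like IcebergHook. Assumption:
    table_exists() returns a truthy value only for existing tables.
    """
    if not hook.table_exists(table_name):
        return None
    return hook.load_table(table_name)
```

The check and the load are two separate catalog calls, so a table dropped in between would still make load_table() fail.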

get_table_schema(table_name, **kwargs)

Return column names and types for an Iceberg table.

Compatible with the DbApiHook.get_table_schema() contract so that LLM operators can use this hook interchangeably for schema context.

Parameters:

table_name (str) – Fully-qualified table name (“namespace.table”).

Returns:

List of dicts with name and type keys.

Return type:

list[dict[str, str]]

Example return value:

[
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "created_at", "type": "timestamptz"},
]
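To show how this return shape might feed schema context to an LLM prompt, the sketch below renders the list of dicts as plain text. The helper name schema_to_text and the table name "sales.orders" are hypothetical. Only the name and type keys documented above are used.

```python
def schema_to_text(table_name: str, columns: list[dict[str, str]]) -> str:
    """Render a get_table_schema()-style result as indented text."""
    lines = [f"Table {table_name}:"]
    lines += [f"  {col['name']}: {col['type']}" for col in columns]
    return "\n".join(lines)


# Same shape as the example return value above.
example_columns = [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "created_at", "type": "timestamptz"},
]
print(schema_to_text("sales.orders", example_columns))
```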
get_partition_spec(table_name)

Return the partition spec for an Iceberg table.

Parameters:

table_name (str) – Fully-qualified table name.

Returns:

List of dicts with field and transform keys.

Return type:

list[dict[str, str]]

Example return value:

[
    {"field": "event_date", "transform": "day"},
    {"field": "region", "transform": "identity"},
]
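A partition spec in this shape can be condensed into a one-line summary for logs or prompts. The sketch below is one way to do that. The helper name describe_partitioning and the transform(field) display format are illustrative choices, not part of the hook.

```python
def describe_partitioning(spec: list[dict[str, str]]) -> str:
    """Summarize a get_partition_spec()-style result as "transform(field), ..."."""
    if not spec:
        # An empty spec is treated as an unpartitioned table.
        return "unpartitioned"
    return ", ".join(f"{part['transform']}({part['field']})" for part in spec)


# Same shape as the example return value above.
example_spec = [
    {"field": "event_date", "transform": "day"},
    {"field": "region", "transform": "identity"},
]
summary = describe_partitioning(example_spec)
```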
get_table_properties(table_name)

Return table properties (format version, write config, etc.).

Parameters:

table_name (str) – Fully-qualified table name.

get_snapshots(table_name, limit=10)

Return recent snapshots for an Iceberg table.

Parameters:
  • table_name (str) – Fully-qualified table name.

  • limit (int) – Maximum number of snapshots to return (most recent first).

Returns:

List of dicts with snapshot metadata.

Return type:

list[dict[str, Any]]
