airflow.providers.apache.iceberg.hooks.iceberg

Attributes

TOKENS_ENDPOINT

Classes

IcebergHook

Hook for Apache Iceberg REST catalogs.

Module Contents

airflow.providers.apache.iceberg.hooks.iceberg.TOKENS_ENDPOINT = 'oauth/tokens'

class airflow.providers.apache.iceberg.hooks.iceberg.IcebergHook(iceberg_conn_id=default_conn_name)

Bases: airflow.providers.common.compat.sdk.BaseHook

Hook for Apache Iceberg REST catalogs.

Provides catalog-level operations (list namespaces, list tables, load schemas) using pyiceberg, plus OAuth2 token generation for external query engines.

Parameters: iceberg_conn_id (str) – ID of the Airflow connection that holds the settings for connecting to the Iceberg catalog.

conn_name_attr = 'iceberg_conn_id'

default_conn_name = 'iceberg_default'

conn_type = 'iceberg'

hook_name = 'Iceberg'

classmethod get_ui_field_behaviour()

Return custom UI field behaviour for Iceberg connection.

conn_id = 'iceberg_default'

property catalog: pyiceberg.catalog.Catalog

Return a pyiceberg Catalog instance for the configured connection.

get_conn()

Return the pyiceberg Catalog.

test_connection()

Test the Iceberg connection by listing namespaces.

get_token()

Obtain a short-lived OAuth2 access token.

This preserves the legacy behavior of the pre-2.0 get_conn() method. Use this when you need a raw token for external engines (Spark, Trino, Flink).
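As a rough sketch of handing the token to an external engine's HTTP client: the helper below wraps a token in a standard OAuth2 bearer header. It assumes `get_token()` returns the access token as a plain string, which the reference above does not state. `bearer_headers` is a hypothetical helper, not part of the provider.

```python
from __future__ import annotations


def bearer_headers(token: str) -> dict[str, str]:
    """Build HTTP headers carrying an OAuth2 bearer token."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}


# With a configured connection this would look like
#     bearer_headers(IcebergHook().get_token())
# (assumption: get_token() returns a str). A placeholder value is used here
# so the sketch runs without a catalog.
headers = bearer_headers("example-token")
```

Because the token is described as short-lived, it is probably safer to call `get_token()` close to the point of use than to cache the resulting headers.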

get_token_macro()

Return a Jinja2 macro that resolves to a fresh token at render time.

list_namespaces()

Return all namespace names in the catalog.

list_tables(namespace)

Return all table names in the given namespace.

Parameters: namespace (str) – Namespace (database/schema) to list tables from.
Returns: List of fully-qualified table names (“namespace.table”).
Return type: list[str]
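One way to combine `list_namespaces()` and `list_tables()` is to build a small inventory of the catalog. The sketch below assumes `list_namespaces()` returns an iterable of namespace strings (its return type is not given above). `all_tables` and `_StubHook` are hypothetical names; the stub only stands in for `IcebergHook` so the example runs without a catalog.

```python
from __future__ import annotations


def all_tables(hook) -> dict[str, list[str]]:
    """Map each namespace to the table names the hook reports for it."""
    return {ns: hook.list_tables(ns) for ns in hook.list_namespaces()}


class _StubHook:
    """Hypothetical stand-in exposing the two documented methods."""

    def list_namespaces(self):
        return ["analytics", "raw"]

    def list_tables(self, namespace):
        tables = {"analytics": ["analytics.events"], "raw": []}
        return tables[namespace]


# In a DAG, an IcebergHook instance would be passed instead of the stub.
inventory = all_tables(_StubHook())
```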

load_table(table_name)

Load an Iceberg table object.

Parameters: table_name (str) – Fully-qualified table name (“namespace.table”).
Returns: pyiceberg Table instance.
Return type: pyiceberg.table.Table

table_exists(table_name)

Check whether a table exists in the catalog.

get_table_schema(table_name, **kwargs)

Return column names and types for an Iceberg table.

Compatible with the DbApiHook.get_table_schema() contract so that LLM operators can use this hook interchangeably for schema context.

Parameters: table_name (str) – Fully-qualified table name (“namespace.table”).
Returns: List of dicts with name and type keys.
Return type: list[dict[str, str]]

Example return value:

[
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "created_at", "type": "timestamptz"},
]
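Since the method is meant to supply schema context, a caller may want to flatten the returned list into text. The following is a minimal sketch that relies only on the `name` and `type` keys shown above. `schema_as_text` is a hypothetical helper, and the output layout is an arbitrary choice.

```python
from __future__ import annotations


def schema_as_text(table_name: str, columns: list[dict[str, str]]) -> str:
    """Render a column list as a short, human-readable schema summary."""
    lines = [f"Table {table_name}:"]
    lines += [f"  {col['name']} ({col['type']})" for col in columns]
    return "\n".join(lines)


# Same shape as the example return value of get_table_schema().
columns = [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "created_at", "type": "timestamptz"},
]
summary = schema_as_text("analytics.events", columns)
```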

get_partition_spec(table_name)

Return the partition spec for an Iceberg table.

Parameters: table_name (str) – Fully-qualified table name.
Returns: List of dicts with field and transform keys.
Return type: list[dict[str, str]]

Example:

[
    {"field": "event_date", "transform": "day"},
    {"field": "region", "transform": "identity"},
]
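To make a partition spec easier to read in logs, each entry can be rendered as a `transform(field)` expression. This sketch uses only the `field` and `transform` keys from the example above. `partition_expressions` is a hypothetical helper, and the rendered strings are for display, not valid DDL for any particular engine.

```python
from __future__ import annotations


def partition_expressions(spec: list[dict[str, str]]) -> list[str]:
    """Render each entry as 'transform(field)', or just 'field' for identity."""
    rendered = []
    for entry in spec:
        field, transform = entry["field"], entry["transform"]
        rendered.append(field if transform == "identity" else f"{transform}({field})")
    return rendered


# Same shape as the example return value of get_partition_spec().
spec = [
    {"field": "event_date", "transform": "day"},
    {"field": "region", "transform": "identity"},
]
expressions = partition_expressions(spec)
```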

get_table_properties(table_name)

Return table properties (format version, write config, etc.).

Parameters: table_name (str) – Fully-qualified table name.
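The return type is not stated above. Assuming the method returns a mapping of property names to string values, the write configuration could be isolated by filtering on the `write.` prefix that Iceberg uses for write-related properties. `properties_with_prefix` is a hypothetical helper, and the sample values are illustrative only.

```python
from __future__ import annotations

from collections.abc import Mapping


def properties_with_prefix(props: Mapping[str, str], prefix: str) -> dict[str, str]:
    """Return the properties whose key starts with the given prefix, sorted by key."""
    return {key: props[key] for key in sorted(props) if key.startswith(prefix)}


# Illustrative stand-in for the result of get_table_properties()
# (assumption: a str -> str mapping).
props = {
    "write.parquet.compression-codec": "zstd",
    "format-version": "2",
    "write.format.default": "parquet",
}
write_props = properties_with_prefix(props, "write.")
```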

get_snapshots(table_name, limit=10)

Return recent snapshots for an Iceberg table.

Parameters:

table_name (str) – Fully-qualified table name.
limit (int) – Maximum number of snapshots to return (most recent first).

Returns: List of dicts with snapshot metadata.
Return type: list[dict[str, Any]]
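A caller that cannot be sure the table exists might guard the call with `table_exists()`. The sketch below shows that pattern under stated assumptions: `recent_snapshots` and `_StubHook` are hypothetical, and the `snapshot_id` key in the stub data is invented for illustration, since the keys of the snapshot dicts are not documented above.

```python
from __future__ import annotations

from typing import Any


def recent_snapshots(hook, table_name: str, limit: int = 10) -> list[dict[str, Any]]:
    """Return up to `limit` snapshots, or an empty list if the table is missing."""
    if not hook.table_exists(table_name):
        return []
    return hook.get_snapshots(table_name, limit=limit)


class _StubHook:
    """Hypothetical stand-in for IcebergHook; snapshots are stored newest first."""

    def __init__(self):
        self._snapshots = {
            "analytics.events": [{"snapshot_id": 3}, {"snapshot_id": 2}, {"snapshot_id": 1}],
        }

    def table_exists(self, table_name):
        return table_name in self._snapshots

    def get_snapshots(self, table_name, limit=10):
        return self._snapshots[table_name][:limit]


stub = _StubHook()
found = recent_snapshots(stub, "analytics.events", limit=2)
missing = recent_snapshots(stub, "analytics.unknown")
```

Returning an empty list for a missing table is a design choice of this sketch. Raising an error may suit pipelines where an absent table should fail the task.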