airflow.providers.amazon.aws.hooks.glue_catalog

This module contains the AWS Glue Data Catalog hook.

Module Contents

Classes

GlueCatalogHook

Interact with AWS Glue Data Catalog.

class airflow.providers.amazon.aws.hooks.glue_catalog.GlueCatalogHook(*args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS Glue Data Catalog.

Provides a thin wrapper around boto3.client("glue").

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

async async_get_partitions(client, database_name, table_name, expression='', page_size=None, max_items=1)

Asynchronously retrieves the partition values for a table.

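Parameters
  • client – Asynchronous Glue client used to issue the paginated request

  • database_name (str) – Name of the catalog database where the partitions reside

  • table_name (str) – Name of the table the partitions belong to

  • expression (str) – Expression filtering the partitions to return

  • page_size (int | None) – Pagination size

  • max_items (int | None) – Maximum number of items to return

Returns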

Set of partition values. Each value is a tuple, because a partition may be composed of multiple columns. For example: {('2018-01-01','1'), ('2018-01-01','2')}

Return type

set[tuple]
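
The sketch below illustrates how paginated results can be folded into the documented set[tuple] return shape. It is not the hook's actual code: FakePaginator and collect_partition_values are hypothetical names, and the canned pages only imitate the "Partitions"/"Values" layout of a Glue GetPartitions response so the example runs without AWS access.

```python
import asyncio


class FakePaginator:
    """Hypothetical stand-in for an async Glue paginator (not part of Airflow or boto3)."""

    def __init__(self, pages):
        self._pages = pages

    async def paginate(self, **kwargs):
        # Yield canned pages shaped like GetPartitions responses.
        for page in self._pages:
            yield page


async def collect_partition_values(paginator, database_name, table_name, expression=""):
    """Gather the "Values" list of every partition into a set of tuples."""
    partitions = set()
    async for page in paginator.paginate(
        DatabaseName=database_name, TableName=table_name, Expression=expression
    ):
        for partition in page["Partitions"]:
            # Lists are unhashable, so each list of values becomes a tuple.
            partitions.add(tuple(partition["Values"]))
    return partitions


pages = [
    {"Partitions": [{"Values": ["2018-01-01", "1"]}]},
    {"Partitions": [{"Values": ["2018-01-01", "2"]}, {"Values": ["2018-01-01", "1"]}]},
]
result = asyncio.run(collect_partition_values(FakePaginator(pages), "db", "table"))
```

Because the result is a set, a partition that appears on more than one page is stored only once.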

get_partitions(database_name, table_name, expression='', page_size=None, max_items=None)

Retrieve the partition values for a table.

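Parameters
  • database_name (str) – Name of the catalog database where the partitions reside

  • table_name (str) – Name of the table the partitions belong to

  • expression (str) – Expression filtering the partitions to return

  • page_size (int | None) – Pagination size

  • max_items (int | None) – Maximum number of items to return

Returns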

Set of partition values. Each value is a tuple, because a partition may be composed of multiple columns. For example: {('2018-01-01','1'), ('2018-01-01','2')}

Return type

set[tuple]
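
As a rough illustration of working with the returned set, the following sketch pairs each tuple with partition column names and picks the most recent partition. The helper partitions_as_dicts and the column names "ds" and "hour" are assumptions made for this example; the set literal stands in for a real get_partitions result.

```python
def partitions_as_dicts(partitions, partition_keys):
    """Pair each partition tuple with the table's partition column names."""
    rows = (dict(zip(partition_keys, values)) for values in partitions)
    # Sets are unordered, so sort for a stable, readable result.
    return sorted(rows, key=lambda row: tuple(row[key] for key in partition_keys))


# Stand-in for the value returned by get_partitions().
partitions = {("2018-01-01", "1"), ("2018-01-01", "2")}

rows = partitions_as_dicts(partitions, ["ds", "hour"])

# Tuples compare element by element; ISO-formatted dates sort chronologically as strings.
latest = max(partitions)
```

Note that partition values are strings, so max() compares them lexicographically; numeric columns with differing digit counts would need explicit conversion.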

check_for_partition(database_name, table_name, expression)

Check whether a partition exists.

hook = GlueCatalogHook()
t = "static_babynames_partitioned"
hook.check_for_partition("airflow", t, "ds='2015-01-01'")
Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table the partition belongs to

  • expression (str) – Expression that matches the partitions to check for, e.g.: a = 'b' AND c = 'd'
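
Expressions of the form shown above can be assembled from column/value pairs. The helper below is a hypothetical convenience, not part of the hook, and it assumes string-valued partition columns whose values contain no single quotes (it performs no escaping).

```python
def build_partition_expression(filters):
    """Join column/value pairs into an expression such as "a='b' AND c='d'"."""
    # Dictionaries preserve insertion order, so the clauses appear as given.
    return " AND ".join(f"{column}='{value}'" for column, value in filters.items())


single = build_partition_expression({"ds": "2015-01-01"})
combined = build_partition_expression({"a": "b", "c": "d"})
```

The resulting string could then be passed as the expression argument, for example hook.check_for_partition("airflow", t, single).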

get_table(database_name, table_name)

Get the information of the table.

hook = GlueCatalogHook()
r = hook.get_table("db", "table_foo")
r["Name"] == "table_foo"
Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table

get_table_location(database_name, table_name)

Get the physical location of the table.

Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table
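
In the Glue API, a table description carries its physical location under StorageDescriptor → Location, which is presumably the value this method returns. The sketch below reads that field from a trimmed, made-up table dictionary and splits an S3 location into bucket and prefix; split_s3_location and the bucket name are hypothetical.

```python
from urllib.parse import urlparse

# Trimmed, made-up example of the dictionary shape describing a Glue table.
table = {
    "Name": "table_foo",
    "StorageDescriptor": {"Location": "s3://example-bucket/warehouse/table_foo"},
}


def split_s3_location(location):
    """Split an s3://bucket/prefix location into (bucket, prefix)."""
    parsed = urlparse(location)
    # netloc holds the bucket; the path keeps a leading slash that we drop.
    return parsed.netloc, parsed.path.lstrip("/")


location = table["StorageDescriptor"]["Location"]
bucket, prefix = split_s3_location(location)
```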

get_partition(database_name, table_name, partition_values)

Get a Partition.

hook = GlueCatalogHook()
partition = hook.get_partition("db", "table", ["string"])
partition["Values"]
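Parameters
  • database_name (str) – Name of the database

  • table_name (str) – Name of the table in that database

  • partition_values (list[str]) – List of strings that define the partition

Raises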

AirflowException

create_partition(database_name, table_name, partition_input)

Create a new Partition.

hook = GlueCatalogHook()
partition_input = {"Values": []}
hook.create_partition(database_name="db", table_name="table", partition_input=partition_input)
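Parameters
  • database_name (str) – Name of the database

  • table_name (str) – Name of the table in that database

  • partition_input (dict) – Definition of the partition to create

Raises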

AirflowException
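
A partition_input dictionary usually carries more than an empty "Values" list. The following sketch builds a minimal one with "Values" and a "StorageDescriptor" location, both keys of the Glue PartitionInput structure. The helper build_partition_input, the column names, and the Hive-style key=value directory layout are assumptions for illustration; a real table may need further StorageDescriptor fields.

```python
def build_partition_input(values, table_location, partition_keys):
    """Assemble a minimal partition definition suitable for create_partition."""
    # Assumed Hive-style layout: <table location>/<key>=<value>/...
    suffix = "/".join(f"{key}={value}" for key, value in zip(partition_keys, values))
    return {
        "Values": list(values),
        "StorageDescriptor": {"Location": f"{table_location.rstrip('/')}/{suffix}"},
    }


partition_input = build_partition_input(
    values=["2018-01-01", "1"],
    table_location="s3://example-bucket/warehouse/table/",
    partition_keys=["ds", "hour"],
)
```

The result could then be passed on as hook.create_partition(database_name="db", table_name="table", partition_input=partition_input).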
