airflow.providers.amazon.aws.hooks.glue_catalog

This module contains the AWS Glue Catalog hook.

Module Contents

Classes

GlueCatalogHook

Interact with AWS Glue Catalog

AwsGlueCatalogHook

This hook is deprecated.

class airflow.providers.amazon.aws.hooks.glue_catalog.GlueCatalogHook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS Glue Catalog

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

See also

AwsBaseHook

get_partitions(self, database_name, table_name, expression='', page_size=None, max_items=None)[source]

Retrieves the partition values for a table.

Parameters
Returns

Set of partition values, where each value is a tuple because a partition may be composed of multiple columns. For example: {('2018-01-01','1'), ('2018-01-01','2')}

Return type

Set[tuple]
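As a minimal sketch of consuming this return value, the helper below picks the greatest partition from a set of tuples shaped like the example above. `latest_partition` is a hypothetical name, not part of the hook, and it assumes the partition values sort lexicographically in the order you care about (for example ISO-formatted date strings).

```python
from typing import Optional, Set, Tuple


def latest_partition(partitions: Set[Tuple[str, ...]]) -> Optional[Tuple[str, ...]]:
    """Return the greatest partition tuple, or None for an empty set.

    Hypothetical helper: tuples compare element by element as strings,
    so the result is only chronological when the values sort
    lexicographically (for example ISO dates such as '2018-01-01').
    """
    return max(partitions) if partitions else None


# Sample data copied from the documented return-value example.
sample = {("2018-01-01", "1"), ("2018-01-01", "2")}
newest = latest_partition(sample)
```

With a live connection, the input set would presumably come from a call such as `hook.get_partitions('db', 'table')`. Numeric partition values stored as strings (for example '10' and '2') would need a custom sort key.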

check_for_partition(self, database_name, table_name, expression)[source]

Checks whether a partition exists

Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table the partition belongs to

  • expression (str) – Expression that matches the partitions to check for (e.g. a = 'b' AND c = 'd')

Return type

bool

>>> hook = GlueCatalogHook()
>>> t = 'static_babynames_partitioned'
>>> hook.check_for_partition('airflow', t, "ds='2015-01-01'")
True
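
The expression argument is a plain string, so it can be assembled from column/value pairs. The sketch below shows one way to do that; `build_partition_expression` is a hypothetical helper, not part of the hook, and it assumes every partition value is a string without single quotes, so it does no escaping.

```python
from typing import Mapping


def build_partition_expression(spec: Mapping[str, str]) -> str:
    """Join column/value pairs into an expression such as "a='b' AND c='d'".

    Hypothetical helper; assumes every value is a string that contains
    no single quotes, so no escaping is attempted.
    """
    for value in spec.values():
        if "'" in value:
            raise ValueError(f"unsupported quote in partition value: {value!r}")
    return " AND ".join(f"{column}='{value}'" for column, value in spec.items())


# Reproduces the expression used in the doctest above.
expression = build_partition_expression({"ds": "2015-01-01"})
```

The resulting string could then be passed as the third argument, as in `hook.check_for_partition('airflow', t, expression)`.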

get_table(self, database_name, table_name)[source]

Get the information of the table

Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table

Return type

dict

>>> hook = GlueCatalogHook()
>>> r = hook.get_table('db', 'table_foo')
>>> r['Name']
'table_foo'

get_table_location(self, database_name, table_name)[source]

Get the physical location of the table

Parameters
  • database_name (str) – Name of the Hive database (schema) the table belongs to

  • table_name (str) – Name of the Hive table

Returns

The physical location of the table

Return type

str
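For tables stored on S3, the returned location is typically a URI that callers split into a bucket and a key prefix. The following is a sketch under that assumption; `split_s3_location` and the sample URI are hypothetical and not part of the hook.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_location(location: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(location)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 location: {location!r}")
    # netloc holds the bucket; the path keeps a leading slash we drop.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical location value, used only for illustration.
bucket, prefix = split_s3_location("s3://example-bucket/warehouse/table_foo/")
```

In practice the argument would be the string returned by `hook.get_table_location('db', 'table_foo')`.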

get_partition(self, database_name, table_name, partition_values)[source]

Gets a Partition

Parameters
Return type

dict

Raises

AirflowException

>>> hook = GlueCatalogHook()
>>> partition = hook.get_partition('db', 'table', ['string'])
>>> partition['Values']

create_partition(self, database_name, table_name, partition_input)[source]

Creates a new Partition

Parameters
Return type

dict

Raises

AirflowException

>>> hook = GlueCatalogHook()
>>> partition_input = {"Values": []}
>>> hook.create_partition(database_name="db", table_name="table", partition_input=partition_input)
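
A partition_input with an empty Values list, as in the doctest above, is only a placeholder. One common pattern, sketched here, is to derive the input from the table description so the new partition reuses the table's storage settings. `build_partition_input`, the stand-in table dict, and the S3 paths are hypothetical; the "Values" and "StorageDescriptor" keys follow the AWS Glue PartitionInput structure, and the sketch assumes the dict returned by get_table carries a "StorageDescriptor" entry.

```python
import copy
from typing import Any, Dict, List


def build_partition_input(
    table: Dict[str, Any], values: List[str], location: str
) -> Dict[str, Any]:
    """Derive a partition_input dict from a table description.

    Hypothetical helper: copies the table's StorageDescriptor so the
    partition reuses the table's formats, then overrides Location.
    """
    descriptor = copy.deepcopy(table.get("StorageDescriptor", {}))
    descriptor["Location"] = location
    return {"Values": list(values), "StorageDescriptor": descriptor}


# Stand-in for the dict returned by get_table; real tables carry more keys.
fake_table = {
    "Name": "table_foo",
    "StorageDescriptor": {"Location": "s3://example-bucket/table_foo/"},
}
partition_input = build_partition_input(
    fake_table, ["2015-01-01"], "s3://example-bucket/table_foo/ds=2015-01-01/"
)
```

The deep copy keeps the original table description unchanged, so the same dict can be reused to build inputs for several partitions.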

class airflow.providers.amazon.aws.hooks.glue_catalog.AwsGlueCatalogHook(*args, **kwargs)[source]

Bases: GlueCatalogHook

This hook is deprecated. Please use airflow.providers.amazon.aws.hooks.glue_catalog.GlueCatalogHook.
