airflow.providers.amazon.aws.hooks.athena

This module contains AWS Athena hook.

Module Contents

Classes

AthenaHook

Interact with Amazon Athena.

class airflow.providers.amazon.aws.hooks.athena.AthenaHook(*args, sleep_time=30, log_query=True, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with Amazon Athena.

Provide thick wrapper around boto3.client("athena").

Parameters
  • sleep_time (int) – Time (in seconds) to wait between two consecutive calls to check query status on Athena.

  • log_query (bool) – Whether to log athena query and other execution params when it’s executed. Defaults to True.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

INTERMEDIATE_STATES = ('QUEUED', 'RUNNING')[source]
FAILURE_STATES = ('FAILED', 'CANCELLED')[source]
SUCCESS_STATES = ('SUCCEEDED',)[source]
TERMINAL_STATES = ('SUCCEEDED', 'FAILED', 'CANCELLED')[source]
run_query(query, query_context, result_configuration, client_request_token=None, workgroup='primary')[source]

Run a Presto query on Athena with provided config.

Parameters
  • query (str) – Presto query to run.

  • query_context (dict[str, str]) – Context in which query need to be run.

  • result_configuration (dict[str, Any]) – Dict with path to store results in and config related to encryption.

  • client_request_token (str | None) – Unique token created by user to avoid multiple executions of same query.

  • workgroup (str) – Athena workgroup name, when not specified, will be 'primary'.

Returns

Submitted query execution ID.

Return type

str

check_query_status(query_execution_id)[source]

Fetch the state of a submitted query.

Parameters

query_execution_id (str) – Id of submitted athena query

Returns

One of valid query states, or None if the response is malformed.

Return type

str | None

get_state_change_reason(query_execution_id)[source]

Fetch the reason for a state change (e.g. error message). Returns None or reason string.

Parameters

query_execution_id (str) – Id of submitted athena query

get_query_results(query_execution_id, next_token_id=None, max_results=1000)[source]

Fetch submitted query results.

Parameters
  • query_execution_id (str) – Id of submitted athena query

  • next_token_id (str | None) – The token that specifies where to start pagination.

  • max_results (int) – The maximum number of results (rows) to return in this request.

Returns

None if the query is in intermediate, failed, or cancelled state. Otherwise a dict of query outputs.

Return type

dict | None

get_query_results_paginator(query_execution_id, max_items=None, page_size=None, starting_token=None)[source]

Fetch submitted Athena query results.

Parameters
  • query_execution_id (str) – Id of submitted athena query

  • max_items (int | None) – The total number of items to return.

  • page_size (int | None) – The size of each page.

  • starting_token (str | None) – A token to specify where to start paginating.

Returns

None if the query is in intermediate, failed, or cancelled state. Otherwise a paginator to iterate through pages of results.

Return type

botocore.paginate.PageIterator | None

Call :meth`.build_full_result()` on the returned paginator to get all results at once.

poll_query_status(query_execution_id, max_polling_attempts=None)[source]

Poll the state of a submitted query until it reaches final state.

Parameters
  • query_execution_id (str) – ID of submitted athena query

  • max_polling_attempts (int | None) – Number of times to poll for query state before function exits

Returns

One of the final states

Return type

str | None

get_output_location(query_execution_id)[source]

Get the output location of the query results in S3 URI format.

Parameters

query_execution_id (str) – Id of submitted athena query

stop_query(query_execution_id)[source]

Cancel the submitted query.

Parameters

query_execution_id (str) – Id of submitted athena query

Was this entry helpful?