airflow.providers.amazon.aws.operators.athena

Module Contents

Classes

AthenaOperator

An operator that submits a Trino/Presto query to Amazon Athena.

class airflow.providers.amazon.aws.operators.athena.AthenaOperator(*, query, database, output_location, client_request_token=None, workgroup='primary', query_execution_context=None, result_configuration=None, sleep_time=30, max_polling_attempts=None, log_query=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), catalog='AwsDataCatalog', **kwargs)[source]

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.athena.AthenaHook]

An operator that submits a Trino/Presto query to Amazon Athena.

Note

If the task is killed while it runs, the Athena query it launched is cancelled, except when the operator runs in deferrable mode.

See also

For more information on how to use this operator, take a look at the guide: Run a query in Amazon Athena

Parameters
  • query (str) – Trino/Presto query to be run on Amazon Athena. (templated)

  • database (str) – Database to select. (templated)

  • catalog (str) – Catalog to select. (templated)

  • output_location (str) – S3 path to write the query results to. (templated)

  • client_request_token (str | None) – Unique token created by the user to avoid multiple executions of the same query.

  • workgroup (str) – Athena workgroup in which the query will be run. (templated)

  • query_execution_context (dict[str, str] | None) – Context in which the query needs to be run.

  • result_configuration (dict[str, Any] | None) – Dict with the path to store results in and encryption-related configuration.

  • sleep_time (int) – Time (in seconds) to wait between two consecutive calls to check query status on Athena

  • max_polling_attempts (int | None) – Number of times to poll for the query state before the function exits. To limit task execution time, use execution_timeout.

  • log_query (bool) – Whether to log the Athena query and other execution parameters when it is executed. Defaults to True.

  • aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner, that default boto3 configuration must be maintained on each worker node.

  • region_name – AWS region_name. If not specified then the default boto3 behaviour is used.

  • verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html

  • botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
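The parameters above could be assembled as in the following minimal sketch. The table, database, and bucket names are hypothetical placeholders, and the helpers make_request_token, build_athena_kwargs, and make_task are illustrative names, not part of the provider. The token helper assumes that a SHA-256 hex digest (64 characters) is an acceptable client_request_token, and the result_configuration keys are assumed to follow the boto3 start_query_execution ResultConfiguration shape.

```python
import hashlib
from typing import Any


def make_request_token(query: str, run_id: str) -> str:
    """Derive a deterministic token so a retried run reuses the same value.

    Illustrative helper (not part of the provider): hashes the run id and the
    query text into a 64-character hex string.
    """
    return hashlib.sha256(f"{run_id}:{query}".encode("utf-8")).hexdigest()


def build_athena_kwargs(run_id: str) -> dict[str, Any]:
    """Collect keyword arguments for AthenaOperator; all values are placeholders."""
    query = "SELECT COUNT(*) FROM example_table"  # hypothetical table
    return {
        "task_id": "run_athena_query",
        "query": query,
        "database": "example_db",  # hypothetical database
        "catalog": "AwsDataCatalog",
        "output_location": "s3://example-bucket/athena-results/",  # hypothetical bucket
        "workgroup": "primary",
        "client_request_token": make_request_token(query, run_id),
        # Assumed to mirror boto3's ResultConfiguration structure.
        "result_configuration": {
            "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"}
        },
        "sleep_time": 30,            # seconds between status checks
        "max_polling_attempts": 20,  # stop polling after this many checks
        "log_query": True,
    }


def make_task(run_id: str):
    """Instantiate the operator; the import is deferred so the kwargs can be
    inspected without the Amazon provider installed."""
    from airflow.providers.amazon.aws.operators.athena import AthenaOperator

    return AthenaOperator(**build_athena_kwargs(run_id))
```

In a DAG file, make_task would typically be called inside a DAG context so the task is attached to that DAG. Deriving the token from the run identifier is one possible design choice: the same run resubmits the same token, while different runs get different tokens.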

aws_hook_class[source]
ui_color = '#44b5e2'[source]
template_fields: Sequence[str][source]
template_ext: Sequence[str] = ('.sql',)[source]
template_fields_renderers[source]
execute(context)[source]

Run the Trino/Presto query on Amazon Athena.
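To illustrate how sleep_time and max_polling_attempts interact while the operator waits for a query, here is a simplified polling loop. It is a sketch of the general idea only, not the hook's actual implementation: poll_until_done and get_state are hypothetical names, and returning None when the attempt limit is reached is an assumption made for this example.

```python
import time
from typing import Callable, Optional

# Athena query states after which no further state change is expected.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def poll_until_done(
    get_state: Callable[[], str],
    sleep_time: float = 30,
    max_polling_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_state until a terminal state or the attempt limit is reached.

    Returns the terminal state, or None if max_polling_attempts checks were
    made without the query finishing (an assumption of this sketch).
    """
    attempts = 0
    while True:
        state = get_state()
        attempts += 1
        if state in TERMINAL_STATES:
            return state
        if max_polling_attempts is not None and attempts >= max_polling_attempts:
            return None
        # Wait sleep_time seconds between two consecutive status checks.
        sleep(sleep_time)


# Usage with a fake state source standing in for the Athena API.
states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
final_state = poll_until_done(lambda: next(states), sleep_time=0)
```

With max_polling_attempts left as None the loop runs until a terminal state appears, which is why execution_timeout is the suggested way to bound total task time.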

execute_complete(context, event=None)[source]
on_kill()[source]

Cancel the submitted Amazon Athena query.
