airflow.providers.apache.hive.operators.hive

Module Contents

Classes

HiveOperator

Executes HQL code or a Hive script in a specific Hive database.

class airflow.providers.apache.hive.operators.hive.HiveOperator(*, hql, hive_cli_conn_id='hive_cli_default', schema='default', hiveconfs=None, hiveconf_jinja_translate=False, script_begin_tag=None, run_as_owner=False, mapred_queue=None, mapred_queue_priority=None, mapred_job_name=None, **kwargs)

Bases: airflow.models.BaseOperator

Executes HQL code or a Hive script in a specific Hive database.

Parameters
  • hql (str) – The HQL to be executed. You may also give the path, relative to the DAG file, of a Hive script, which may itself be a template. (templated)

  • hive_cli_conn_id (str) – Reference to the Hive CLI connection id. (templated)

  • hiveconfs (dict[Any, Any] | None) – If defined, these key-value pairs are passed to Hive as -hiveconf "key"="value".

  • hiveconf_jinja_translate (bool) – When True, hiveconf-style templating is translated into Jinja-style templating: both ${var} and ${hiveconf:var} become {{ var }}. You may want to use this together with the DAG(user_defined_macros=myargs) parameter. See the DAG object documentation for more details.

  • script_begin_tag (str | None) – If defined, the operator removes the part of the script that precedes the first occurrence of script_begin_tag.

  • run_as_owner (bool) – Run the HQL code as the DAG's owner.

  • mapred_queue (str | None) – Queue used by the Hadoop CapacityScheduler. (templated)

  • mapred_queue_priority (str | None) – Priority within the CapacityScheduler queue. Possible settings include: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.

  • mapred_job_name (str | None) – This name will appear in the jobtracker. This can make monitoring easier.
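The behaviour described for hiveconfs, hiveconf_jinja_translate and script_begin_tag can be illustrated with a small standalone sketch. It uses only the standard library and does not import the provider. The function names and the regular expression are assumptions made for illustration, not the operator's actual code, and the sketch assumes the begin tag itself is dropped along with the text before it. Quoting of the -hiveconf values is left out.

```python
from __future__ import annotations

import re

# Assumed pattern: matches ${var} and ${hiveconf:var}. Illustrative only;
# the provider may use a different expression.
_HIVECONF_VAR = re.compile(r"\$\{(?:hiveconf:)?\s*([A-Za-z0-9_]+)\s*\}")


def translate_hiveconf_to_jinja(hql: str) -> str:
    """Rewrite ${var} and ${hiveconf:var} as {{ var }}."""
    return _HIVECONF_VAR.sub(r"{{ \1 }}", hql)


def strip_before_tag(hql: str, tag: str | None) -> str:
    """Drop everything up to and including the first occurrence of tag."""
    if not tag or tag not in hql:
        return hql
    return hql.split(tag, 1)[1]


def hiveconf_args(hiveconfs: dict | None) -> list[str]:
    """Flatten a dict into ['-hiveconf', 'key=value', ...] CLI arguments."""
    args: list[str] = []
    for key, value in (hiveconfs or {}).items():
        args += ["-hiveconf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    script = (
        "-- setup notes\n"
        "-- BEGIN\n"
        "SELECT * FROM t WHERE ds = '${hiveconf:day}' AND region = '${region}';"
    )
    body = strip_before_tag(script, "-- BEGIN")
    print(translate_hiveconf_to_jinja(body))
    print(hiveconf_args({"day": "2024-01-01"}))
```

After translation, the placeholders are ordinary Jinja variables, which is why pairing this option with user_defined_macros on the DAG is suggested above.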

template_fields: Sequence[str] = ['hql', 'schema', 'hive_cli_conn_id', 'mapred_queue', 'hiveconfs', 'mapred_job_name',...
template_ext: Sequence[str] = ['.hql', '.sql']
template_fields_renderers
ui_color = '#f0e4ec'
get_hook()

Get the Hive CLI hook.

prepare_template()

Hook triggered after the templated fields get replaced by their content.

If you need your operator to alter the content of the file before the template is rendered, it should override this method to do so.

execute(context)

This is the main method to override when creating an operator. Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

dry_run()

Performs a dry run for the operator: it only renders the template fields.

on_kill()

Override this method to clean up subprocesses when a task instance gets killed. Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
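As a rough illustration of the cleanup pattern, the sketch below uses a hypothetical stand-in class (SubprocessTask is not an Airflow class, and its execute() takes no context) that starts a child process and stops it in on_kill(). It assumes a polite terminate followed by a forced kill after a short timeout is acceptable. A real operator would subclass BaseOperator and may need different handling.

```python
from __future__ import annotations

import subprocess
import sys


class SubprocessTask:
    """Hypothetical stand-in for an operator; not part of Airflow."""

    def __init__(self) -> None:
        self.process: subprocess.Popen | None = None

    def execute(self) -> None:
        # Start a long-running child process (a 60 s sleep as a placeholder).
        self.process = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )

    def on_kill(self) -> None:
        # Nothing to do if the child never started or has already exited.
        if self.process is None or self.process.poll() is not None:
            return
        self.process.terminate()
        try:
            self.process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The child ignored the terminate request; force it to stop.
            self.process.kill()
            self.process.wait()
```

Waiting on the child after terminating it reaps the process, so no zombie entry is left behind.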

clear_airflow_vars()

Reset airflow environment variables to prevent existing ones from impacting behavior.
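The idea can be sketched without Airflow, assuming that "reset" means overwriting each variable with an empty string. The variable names and the blank_env_vars helper below are illustrative assumptions; the set of variables the operator actually resets is defined by Airflow itself.

```python
import os

# Assumed variable names, for illustration only.
ASSUMED_CONTEXT_VARS = ("AIRFLOW_CTX_DAG_ID", "AIRFLOW_CTX_TASK_ID")


def blank_env_vars(names=ASSUMED_CONTEXT_VARS, environ=None):
    """Overwrite each named variable with an empty string."""
    target = os.environ if environ is None else environ
    for name in names:
        target[name] = ""
    return target
```

Passing a plain dict as environ makes it possible to try the helper without touching the real process environment.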
