airflow.providers.apache.hive.operators.hive

Module Contents

class airflow.providers.apache.hive.operators.hive.HiveOperator(*, hql: str, hive_cli_conn_id: str = 'hive_cli_default', schema: str = 'default', hiveconfs: Optional[Dict[Any, Any]] = None, hiveconf_jinja_translate: bool = False, script_begin_tag: Optional[str] = None, run_as_owner: bool = False, mapred_queue: Optional[str] = None, mapred_queue_priority: Optional[str] = None, mapred_job_name: Optional[str] = None, **kwargs)

Bases: airflow.models.BaseOperator

Executes hql code or hive script in a specific Hive database.

Parameters
  • hql (str) -- the hql to be executed. Note that you may also use a relative path from the dag file of a (template) hive script. (templated)

  • hive_cli_conn_id (str) -- reference to the Hive database. (templated)

  • hiveconfs (dict) -- if defined, these key/value pairs will be passed to Hive as -hiveconf "key"="value".

  • hiveconf_jinja_translate (bool) -- when True, hiveconf-type templating ${var} and ${hiveconf:var} both get translated into jinja-type templating {{ var }}. Note that you may want to use this along with the DAG(user_defined_macros=myargs) parameter. View the DAG object documentation for more details.

  • script_begin_tag (str) -- if defined, the operator will discard the part of the script before the first occurrence of script_begin_tag.

  • run_as_owner (bool) -- Run HQL code as a DAG's owner.

  • mapred_queue (str) -- queue used by the Hadoop CapacityScheduler. (templated)

  • mapred_queue_priority (str) -- priority within the CapacityScheduler queue. Possible settings include: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.

  • mapred_job_name (str) -- This name will appear in the jobtracker. This can make monitoring easier.
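The effect of hiveconf_jinja_translate can be pictured as a simple text rewrite applied to the HQL before Jinja rendering. The sketch below is illustrative only, not the provider's source; the function name is hypothetical:

```python
import re

def translate_hiveconf_to_jinja(hql: str) -> str:
    # Rewrite both ${var} and ${hiveconf:var} into jinja-type
    # {{ var }} references, which is conceptually what
    # hiveconf_jinja_translate=True does before templating.
    return re.sub(r"\$\{(hiveconf:)?([ \w]*)\}", r"{{ \2 }}", hql)

print(translate_hiveconf_to_jinja("SELECT * FROM t WHERE ds = '${hiveconf:ds}'"))
# SELECT * FROM t WHERE ds = '{{ ds }}'
```

With this translation in place, values supplied via DAG(user_defined_macros=...) can resolve the resulting {{ var }} references at render time.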

template_fields = ['hql', 'schema', 'hive_cli_conn_id', 'mapred_queue', 'hiveconfs', 'mapred_job_name', 'mapred_queue_priority']
template_ext = ['.hql', '.sql']
ui_color = '#f0e4ec'
get_hook(self)

Get the Hive CLI hook.

prepare_template(self)
execute(self, context: Dict[str, Any])
dry_run(self)
on_kill(self)
clear_airflow_vars(self)

Reset airflow environment variables to prevent existing ones from impacting behavior.
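The reset can be pictured as blanking Airflow context variables in the process environment before the Hive CLI subprocess is launched. A minimal sketch, assuming a standalone function and an AIRFLOW_CTX_ prefix (both are illustrative assumptions, not the provider's source):

```python
import os

AIRFLOW_CTX_PREFIX = "AIRFLOW_CTX_"  # assumed prefix for context variables

def clear_airflow_vars() -> None:
    # Blank every Airflow context environment variable so values
    # exported by an earlier task cannot leak into the Hive CLI call.
    for key in list(os.environ):
        if key.startswith(AIRFLOW_CTX_PREFIX):
            os.environ[key] = ""

os.environ["AIRFLOW_CTX_DAG_ID"] = "example_dag"  # hypothetical value
clear_airflow_vars()
assert os.environ["AIRFLOW_CTX_DAG_ID"] == ""
```

Blanking the values rather than deleting the keys keeps downstream lookups well-defined while still discarding stale content.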
