:mod:`airflow.contrib.hooks.databricks_hook`
============================================

.. py:module:: airflow.contrib.hooks.databricks_hook


Module Contents
---------------

.. data:: RESTART_CLUSTER_ENDPOINT
   :annotation: = ['POST', 'api/2.0/clusters/restart']

.. data:: START_CLUSTER_ENDPOINT
   :annotation: = ['POST', 'api/2.0/clusters/start']

.. data:: TERMINATE_CLUSTER_ENDPOINT
   :annotation: = ['POST', 'api/2.0/clusters/delete']

.. data:: RUN_NOW_ENDPOINT
   :annotation: = ['POST', 'api/2.0/jobs/run-now']

.. data:: SUBMIT_RUN_ENDPOINT
   :annotation: = ['POST', 'api/2.0/jobs/runs/submit']

.. data:: GET_RUN_ENDPOINT
   :annotation: = ['GET', 'api/2.0/jobs/runs/get']

.. data:: CANCEL_RUN_ENDPOINT
   :annotation: = ['POST', 'api/2.0/jobs/runs/cancel']

.. data:: USER_AGENT_HEADER

.. py:class:: DatabricksHook(databricks_conn_id='databricks_default', timeout_seconds=180, retry_limit=3, retry_delay=1.0)

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Interact with Databricks.

   .. staticmethod:: _parse_host(host)

      The purpose of this function is to be robust to improper connection
      settings provided by users, specifically in the host field.

      For example -- when users supply ``https://xx.cloud.databricks.com`` as the
      host, we must strip out the protocol to get the host.::

          h = DatabricksHook()
          assert h._parse_host('https://xx.cloud.databricks.com') == 'xx.cloud.databricks.com'

      In the case where users supply the correct ``xx.cloud.databricks.com`` as the
      host, this function is a no-op.::

          assert h._parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'

   .. method:: _do_api_call(self, endpoint_info, json)

      Utility function to perform an API call with retries.

      :param endpoint_info: Tuple of method and endpoint
      :type endpoint_info: tuple[string, string]
      :param json: Parameters for this API call.
      :type json: dict
      :return: If the API call returns an OK status code,
          this function returns the response in JSON. Otherwise,
          we throw an AirflowException.
      :rtype: dict

   .. method:: _log_request_error(self, attempt_num, error)

   .. method:: run_now(self, json)

      Utility function to call the ``api/2.0/jobs/run-now`` endpoint.

      :param json: The data used in the body of the request to the ``run-now`` endpoint.
      :type json: dict
      :return: the run_id as a string
      :rtype: str

   .. method:: submit_run(self, json)

      Utility function to call the ``api/2.0/jobs/runs/submit`` endpoint.

      :param json: The data used in the body of the request to the ``submit`` endpoint.
      :type json: dict
      :return: the run_id as a string
      :rtype: str

   .. method:: get_run_page_url(self, run_id)

   .. method:: get_run_state(self, run_id)

   .. method:: cancel_run(self, run_id)

   .. method:: restart_cluster(self, json)

   .. method:: start_cluster(self, json)

   .. method:: terminate_cluster(self, json)


.. function:: _retryable_error(exception)


.. data:: RUN_LIFE_CYCLE_STATES
   :annotation: = ['PENDING', 'RUNNING', 'TERMINATING', 'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']

.. py:class:: RunState(life_cycle_state, result_state, state_message)

   Utility class for the run state concept of Databricks runs.

   .. attribute:: is_terminal

   .. attribute:: is_successful

   .. method:: __eq__(self, other)

   .. method:: __repr__(self)


.. py:class:: _TokenAuth(token)

   Bases: :class:`requests.auth.AuthBase`

   Helper class for the requests ``auth`` field. ``AuthBase`` requires you to
   implement the ``__call__`` magic function.

   .. method:: __call__(self, r)
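
The following is a minimal usage sketch, not part of the module itself. It assumes a
``databricks_default`` connection is configured in Airflow, and the ``runs/submit``
payload (cluster spec, notebook path, run name) consists of placeholder values chosen
purely for illustration.::

    # Hypothetical example: submit a one-time run and inspect its state.
    from airflow.contrib.hooks.databricks_hook import DatabricksHook

    hook = DatabricksHook(databricks_conn_id='databricks_default', retry_limit=3)

    # Placeholder body for the api/2.0/jobs/runs/submit endpoint; adjust the
    # cluster spec and notebook path to your workspace.
    run_payload = {
        'run_name': 'example-run',
        'new_cluster': {
            'spark_version': '7.3.x-scala2.12',
            'node_type_id': 'i3.xlarge',
            'num_workers': 2,
        },
        'notebook_task': {'notebook_path': '/Users/someone@example.com/my-notebook'},
    }

    run_id = hook.submit_run(run_payload)
    print('Run page:', hook.get_run_page_url(run_id))

    state = hook.get_run_state(run_id)  # returns a RunState instance
    if state.is_terminal:
        print('Finished; successful:', state.is_successful)
    else:
        hook.cancel_run(run_id)  # e.g. cancel instead of waiting for completion

``run_now`` is used the same way as ``submit_run``, but its payload references an
existing job (the ``api/2.0/jobs/run-now`` endpoint) rather than defining a new run
from scratch.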