airflow.contrib.hooks.databricks_hook

Module Contents

airflow.contrib.hooks.databricks_hook.RESTART_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/restart'][source]
airflow.contrib.hooks.databricks_hook.START_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/start'][source]
airflow.contrib.hooks.databricks_hook.TERMINATE_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/delete'][source]
airflow.contrib.hooks.databricks_hook.RUN_NOW_ENDPOINT = ['POST', 'api/2.0/jobs/run-now'][source]
airflow.contrib.hooks.databricks_hook.SUBMIT_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/submit'][source]
airflow.contrib.hooks.databricks_hook.GET_RUN_ENDPOINT = ['GET', 'api/2.0/jobs/runs/get'][source]
airflow.contrib.hooks.databricks_hook.CANCEL_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/cancel'][source]
airflow.contrib.hooks.databricks_hook.USER_AGENT_HEADER[source]
class airflow.contrib.hooks.databricks_hook.DatabricksHook(databricks_conn_id='databricks_default', timeout_seconds=180, retry_limit=3, retry_delay=1.0)[source]

Bases: airflow.hooks.base_hook.BaseHook

Interact with Databricks.

static _parse_host(host)[source]

The purpose of this function is to be robust to improper connection settings provided by users, specifically in the host field.

For example – when users supply https://xx.cloud.databricks.com as the host, we must strip out the protocol to get the host:

h = DatabricksHook()
assert h._parse_host('https://xx.cloud.databricks.com') == 'xx.cloud.databricks.com'

In the case where users supply the correct xx.cloud.databricks.com as the host, this function is a no-op:

assert h._parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
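The stripping described above can be sketched with the standard library's urllib.parse; this is an illustrative re-implementation, not the hook's actual code:

```python
from urllib.parse import urlparse


def parse_host(host):
    """Strip the scheme from a host string if one is present.

    'https://xx.cloud.databricks.com' -> 'xx.cloud.databricks.com'
    'xx.cloud.databricks.com'         -> 'xx.cloud.databricks.com'
    """
    parsed = urlparse(host)
    if parsed.netloc:
        # A scheme was supplied, so the host lives in netloc.
        return parsed.netloc
    # No scheme: urlparse puts the whole string in path; return it unchanged.
    return parsed.path
```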
_do_api_call(self, endpoint_info, json)[source]

Utility function to perform an API call with retries.

Parameters
  • endpoint_info (tuple[string, string]) – Tuple of method and endpoint

  • json (dict) – Parameters for this API call.

Returns

If the API call returns an OK status code, this function returns the response in JSON. Otherwise, an AirflowException is raised.

Return type

dict
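The retry behaviour described above can be pictured as a simple loop: attempt the call up to retry_limit times, sleep retry_delay seconds between attempts, and raise once the attempts are exhausted. A minimal sketch of that pattern, where do_call and the local AirflowException stand in for the real HTTP request and Airflow's exception class:

```python
import time


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


def call_with_retries(do_call, retry_limit=3, retry_delay=1.0):
    """Invoke do_call(), retrying transient failures up to retry_limit times."""
    for attempt in range(1, retry_limit + 1):
        try:
            return do_call()
        except IOError as err:  # stand-in for a "retryable" error
            if attempt == retry_limit:
                raise AirflowException(
                    'API request failed after {} attempts'.format(retry_limit)
                ) from err
            # Back off before the next attempt.
            time.sleep(retry_delay)
```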

_log_request_error(self, attempt_num, error)[source]

Logs the error from a failed API call attempt.

run_now(self, json)[source]

Utility function to call the api/2.0/jobs/run-now endpoint.

Parameters

json (dict) – The data used in the body of the request to the run-now endpoint.

Returns

the run_id as a string

Return type

str
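A run-now request body typically carries a job_id plus any parameter overrides. The identifiers and parameter values below are illustrative placeholders, not real ones:

```python
# Illustrative run-now request body; job_id and the notebook parameters
# are placeholders.
run_now_json = {
    'job_id': 42,
    'notebook_params': {
        'dry-run': 'true',
        'oldest-time-to-consider': '1457570074236',
    },
}
# run_id = hook.run_now(run_now_json)  # would return the run_id as a string
```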

submit_run(self, json)[source]

Utility function to call the api/2.0/jobs/runs/submit endpoint.

Parameters

json (dict) – The data used in the body of the request to the submit endpoint.

Returns

the run_id as a string

Return type

str
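A runs/submit request body usually pairs a cluster spec with a task to run, for example an ephemeral cluster plus a notebook task. The Spark version, worker count, and notebook path below are illustrative placeholders:

```python
# Illustrative runs/submit request body: an ephemeral cluster plus a
# notebook task. All values are placeholders.
submit_run_json = {
    'new_cluster': {
        'spark_version': '2.1.0-db3-scala2.11',
        'num_workers': 2,
    },
    'notebook_task': {
        'notebook_path': '/Users/airflow@example.com/PrepareData',
    },
}
# run_id = hook.submit_run(submit_run_json)  # would return the run_id as a string
```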

get_run_page_url(self, run_id)[source]

Retrieves the run_page_url for the run with the given run_id.

get_run_state(self, run_id)[source]

Retrieves the current RunState of the run with the given run_id.

cancel_run(self, run_id)[source]

Cancels the run with the given run_id.

restart_cluster(self, json)[source]

Utility function to call the api/2.0/clusters/restart endpoint.

start_cluster(self, json)[source]

Utility function to call the api/2.0/clusters/start endpoint.

terminate_cluster(self, json)[source]

Utility function to call the api/2.0/clusters/delete endpoint.
airflow.contrib.hooks.databricks_hook._retryable_error(exception)[source]

Returns True if the exception represents a transient failure (for example, a connection error or a 5xx response) that warrants a retry.

airflow.contrib.hooks.databricks_hook.RUN_LIFE_CYCLE_STATES = ['PENDING', 'RUNNING', 'TERMINATING', 'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'][source]
class airflow.contrib.hooks.databricks_hook.RunState(life_cycle_state, result_state, state_message)[source]

Utility class for the run state concept of Databricks runs.

is_terminal[source]

True if the run's life cycle state indicates that the run has ended.

is_successful[source]

True if the run's result state is SUCCESS.

__eq__(self, other)[source]
__repr__(self)[source]
class airflow.contrib.hooks.databricks_hook._TokenAuth(token)[source]

Bases: requests.auth.AuthBase

Helper class for the requests Auth field. AuthBase requires you to implement the __call__ magic method.

__call__(self, r)[source]
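The contract requests expects is small: the auth object's __call__ receives the prepared request and must return it after mutating its headers. A sketch of that pattern using a stand-in request object (real code would subclass requests.auth.AuthBase and receive requests' own PreparedRequest):

```python
class TokenAuth:
    """Sketch of a bearer-token auth callable in the style of
    requests.auth.AuthBase: __call__ mutates and returns the request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers['Authorization'] = 'Bearer ' + self.token
        return r


class FakeRequest:
    """Stand-in for requests' PreparedRequest, just enough for this sketch."""

    def __init__(self):
        self.headers = {}
```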