airflow.contrib.hooks.gcp_dataproc_hook

Module Contents

class airflow.contrib.hooks.gcp_dataproc_hook._DataProcJob(dataproc_api, project_id, job, region='global', job_error_states=None, num_retries=5)[source]

Bases: airflow.utils.log.logging_mixin.LoggingMixin

wait_for_done(self)[source]
raise_error(self, message=None)[source]
get(self)[source]
class airflow.contrib.hooks.gcp_dataproc_hook._DataProcJobBuilder(project_id, task_id, cluster_name, job_type, properties)[source]
add_labels(self, labels)[source]

Set labels for Dataproc job.

Parameters

labels (dict) – Labels for the job query.

add_variables(self, variables)[source]
add_args(self, args)[source]
add_query(self, query)[source]
add_query_uri(self, query_uri)[source]
add_jar_file_uris(self, jars)[source]
add_archive_uris(self, archives)[source]
add_file_uris(self, files)[source]
add_python_file_uris(self, pyfiles)[source]
set_main(self, main_jar, main_class)[source]
set_python_main(self, main)[source]
set_job_name(self, name)[source]
build(self)[source]
class airflow.contrib.hooks.gcp_dataproc_hook._DataProcOperation(dataproc_api, operation, num_retries)[source]

Bases: airflow.utils.log.logging_mixin.LoggingMixin

Continuously polls Dataproc Operation until it completes.

wait_for_done(self)[source]
get(self)[source]
_check_done(self)[source]
_raise_error(self)[source]
class airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook(gcp_conn_id='google_cloud_default', delegate_to=None, api_version='v1beta2')[source]

Bases: airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook

Hook for Google Cloud Dataproc APIs.

get_conn(self)[source]

Returns a Google Cloud Dataproc service object.

get_cluster(self, project_id, region, cluster_name)[source]
submit(self, project_id, job, region='global', job_error_states=None)[source]
create_job_template(self, task_id, cluster_name, job_type, properties)[source]
wait(self, operation)[source]

Awaits for Google Cloud Dataproc Operation to complete.

cancel(self, project_id, job_id, region='global')[source]

Cancel a Google Cloud DataProc job. :param project_id: Name of the project the job belongs to :type project_id: str :param job_id: Identifier of the job to cancel :type job_id: int :param region: Region used for the job :type region: str :returns A Job json dictionary representing the canceled job