airflow.providers.cncf.kubernetes.operators.spark_kubernetes¶
Classes¶
| Creates sparkApplication object in kubernetes cluster. | 
Module Contents¶
- class airflow.providers.cncf.kubernetes.operators.spark_kubernetes.SparkKubernetesOperator(*, image=None, code_path=None, namespace='default', name=None, application_file=None, template_spec=None, get_logs=True, do_xcom_push=False, success_run_history_limit=1, startup_timeout_seconds=600, log_events_on_failure=False, reattach_on_restart=True, delete_on_termination=True, kubernetes_conn_id='kubernetes_default', random_name_suffix=True, **kwargs)[source]¶
- Bases: - airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator- Creates sparkApplication object in kubernetes cluster. - See also - For more detail about Spark Application Object have a look at the reference: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta2-1.3.3-3.1.1/docs/api-docs.md#sparkapplication - Parameters:
- image (str | None) – Docker image you wish to launch. Defaults to hub.docker.com, 
- code_path (str | None) – path to the spark code in image, 
- namespace (str) – kubernetes namespace to put sparkApplication 
- name (str | None) – name of the pod in which the task will run, will be used (plus a random suffix if random_name_suffix is True) to generate a pod id (DNS-1123 subdomain, containing only [a-z0-9.-]). 
- application_file (str | None) – filepath to kubernetes custom_resource_definition of sparkApplication 
- template_spec – kubernetes sparkApplication specification 
- get_logs (bool) – get the stdout of the container as logs of the tasks. 
- do_xcom_push (bool) – If True, the content of the file /airflow/xcom/return.json in the container will also be pushed to an XCom when the container completes. 
- success_run_history_limit (int) – Number of past successful runs of the application to keep. 
- startup_timeout_seconds – timeout in seconds to startup the pod. 
- log_events_on_failure (bool) – Log the pod’s events if a failure occurs 
- reattach_on_restart (bool) – if the scheduler dies while the pod is running, reattach and monitor 
- delete_on_termination (bool) – What to do when the pod reaches its final state, or the execution is interrupted. If True (default), delete the pod; if False, leave the pod. 
- kubernetes_conn_id (str) – the connection to Kubernetes cluster 
- random_name_suffix (bool) – If True, adds a random suffix to the pod name 
 
 - property pod_manager: airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager[source]¶
 - property launcher: airflow.providers.cncf.kubernetes.operators.custom_object_launcher.CustomObjectLauncher[source]¶
 - execute(context)[source]¶
- Based on the deferrable parameter runs the pod asynchronously or synchronously. 
 - on_kill()[source]¶
- Override this method to clean up subprocesses when a task instance gets killed. - Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.