Apache Livy Operators

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library.

LivyOperator

This operator wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster.

airflow/providers/apache/livy/example_dags/example_livy.py[source]

    livy_java_task = LivyOperator(
        task_id='pi_java_task',
        file='/spark-examples.jar',
        num_executors=1,
        conf={
            'spark.shuffle.compress': 'false',
        },
        class_name='org.apache.spark.examples.SparkPi',
    )

    livy_python_task = LivyOperator(task_id='pi_python_task', file='/pi.py', polling_interval=60)

    livy_java_task >> livy_python_task

Reference

For further information, look at Apache Livy.

Was this entry helpful?