Yandex.Cloud Data Proc Operators
Yandex.Cloud Data Proc is a service that helps you deploy Apache Hadoop® and Apache Spark™ clusters in the Yandex.Cloud infrastructure.
You can control the cluster size and node capacity, as well as the set of Apache® services (Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, Zeppelin).
Apache Hadoop is used for storing and analyzing structured and unstructured big data.
Apache Spark is a tool for fast data processing that can be integrated with Apache Hadoop as well as with other storage systems.
To use these operators, install the yandexcloud package first, like so:
pip install 'apache-airflow[yandexcloud]'
Then restart the Airflow webserver and scheduler so the new provider is picked up.
Make sure the Yandex.Cloud connection type has been defined in Airflow: open the connections list and look for a connection with the ‘yandexcloud’ type.
Fill in the required fields of the Yandex.Cloud connection.
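Once the package is installed and the connection is configured, the Data Proc operators can be used in a DAG. The following is a minimal sketch, not a definitive recipe: the zone, bucket name, and node count are illustrative placeholders, and the operator parameters assume the `DataprocCreateClusterOperator` / `DataprocDeleteClusterOperator` API from the Yandex provider package.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.yandex.operators.yandexcloud_dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
)

# Zone, bucket, and counts below are illustrative assumptions;
# replace them with values from your own Yandex.Cloud setup.
with DAG(
    dag_id='example_yandexcloud_dataproc',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    tags=['example'],
) as dag:
    # Create a Data Proc cluster; credentials come from the
    # 'yandexcloud' connection configured above.
    create_cluster = DataprocCreateClusterOperator(
        task_id='create_cluster',
        zone='ru-central1-b',                  # assumed availability zone
        s3_bucket='my-dataproc-logs-bucket',   # assumed bucket for job logs
        computenode_count=1,
    )

    # Tear the cluster down when the work is done.
    delete_cluster = DataprocDeleteClusterOperator(
        task_id='delete_cluster',
    )

    create_cluster >> delete_cluster
```

The delete task does not name a cluster explicitly: the create operator passes the cluster id downstream, so the teardown step can pick it up within the same DAG run.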