tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark
Example Airflow DAG for DataprocSubmitJobOperator with a PySpark job.
Module Contents
- tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark.DAG_ID = 'dataproc_pyspark'
- tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark.REGION = 'europe-west1'
- tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark.ZONE = 'europe-west1-b'
- tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark.JOB_FILE_NAME = 'dataproc-pyspark-job.py'
- tests.system.providers.google.cloud.dataproc.example_dataproc_pyspark.JOB_FILE_CONTENT = Multiline-String
"""from operator import add from random import random from pyspark.sql import SparkSession def f(_: int) -> float: x = random() * 2 - 1 y = random() * 2 - 1 return 1 if x**2 + y**2 <= 1 else 0 spark = SparkSession.builder.appName("PythonPi").getOrCreate() partitions = 2 n = 100000 * partitions count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add) print(f"Pi is roughly {4.0 * count / n:f}") spark.stop() """