airflow.providers.apache.beam.hooks.beam

This module contains a Apache Beam Hook.

Module Contents

Classes

BeamRunnerType

Helper class for listing runner types.

BeamCommandRunner

Class responsible for running pipeline command in subprocess

BeamHook

Hook for Apache Beam.

Functions

beam_options_to_args(options)

Returns a formatted pipeline options from a dictionary of arguments

class airflow.providers.apache.beam.hooks.beam.BeamRunnerType[source]

Helper class for listing runner types. For more information about runners see: https://beam.apache.org/documentation/

DataflowRunner = DataflowRunner[source]
DirectRunner = DirectRunner[source]
SparkRunner = SparkRunner[source]
FlinkRunner = FlinkRunner[source]
SamzaRunner = SamzaRunner[source]
NemoRunner = NemoRunner[source]
JetRunner = JetRunner[source]
Twister2Runner = Twister2Runner[source]
airflow.providers.apache.beam.hooks.beam.beam_options_to_args(options)[source]

Returns a formatted pipeline options from a dictionary of arguments

The logic of this method should be compatible with Apache Beam: https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/ apache_beam/options/pipeline_options.py#L230-L251

Parameters

options (dict) – Dictionary with options

Returns

List of arguments

Return type

List[str]

class airflow.providers.apache.beam.hooks.beam.BeamCommandRunner(cmd, process_line_callback=None, working_directory=None)[source]

Bases: airflow.utils.log.logging_mixin.LoggingMixin

Class responsible for running pipeline command in subprocess

Parameters
  • cmd (List[str]) – Parts of the command to be run in subprocess

  • process_line_callback (Optional[Callable[[str], None]]) – Optional callback which can be used to process stdout and stderr to detect job id

  • working_directory (Optional[str]) – Working directory

wait_for_done(self)[source]

Waits for Apache Beam pipeline to complete.

class airflow.providers.apache.beam.hooks.beam.BeamHook(runner)[source]

Bases: airflow.hooks.base.BaseHook

Hook for Apache Beam.

All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.

Parameters

runner (str) – Runner type

start_python_pipeline(self, variables, py_file, py_options, py_interpreter='python3', py_requirements=None, py_system_site_packages=False, process_line_callback=None)[source]

Starts Apache Beam python pipeline.

Parameters
  • variables (dict) – Variables passed to the pipeline.

  • py_file (str) – Path to the python file to execute.

  • py_options (List[str]) – Additional options.

  • py_interpreter (str) – Python version of the Apache Beam pipeline. If None, this defaults to the python3. To track python versions supported by beam and related issues check: https://issues.apache.org/jira/browse/BEAM-1251

  • py_requirements (Optional[List[str]]) –

    Additional python package(s) to install. If a value is passed to this parameter, a new virtual environment has been created with additional packages installed.

    You could also install the apache-beam package if it is not installed on your system or you want to use a different version.

  • py_system_site_packages (bool) –

    Whether to include system_site_packages in your virtualenv. See virtualenv documentation for more information.

    This option is only relevant if the py_requirements parameter is not None.

  • process_line_callback (Optional[Callable[[str], None]]) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.

start_java_pipeline(self, variables, jar, job_class=None, process_line_callback=None)[source]

Starts Apache Beam Java pipeline.

Parameters
  • variables (dict) – Variables passed to the job.

  • jar (str) – Name of the jar for the pipeline

  • job_class (Optional[str]) – Name of the java class for the pipeline.

  • process_line_callback (Optional[Callable[[str], None]]) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.

start_go_pipeline(self, variables, go_file, process_line_callback=None, should_init_module=False)[source]

Starts Apache Beam Go pipeline.

Parameters
  • variables (dict) – Variables passed to the job.

  • go_file (str) – Path to the Go file with your beam pipeline.

  • go_file

  • process_line_callback (Optional[Callable[[str], None]]) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.

  • should_init_module (bool) – If False (default), will just execute a go run command. If True, will init a module and dependencies with a go mod init and go mod tidy, useful when pulling source with GCSHook.

Returns

Return type

None

Was this entry helpful?