airflow.providers.apache.beam.hooks.beam¶
This module contains a Apache Beam Hook.
Module Contents¶
Classes¶
Helper class for listing runner types. |
|
Class responsible for running pipeline command in subprocess |
|
Hook for Apache Beam. |
Functions¶
|
Returns a formatted pipeline options from a dictionary of arguments |
- class airflow.providers.apache.beam.hooks.beam.BeamRunnerType[source]¶
Helper class for listing runner types. For more information about runners see: https://beam.apache.org/documentation/
- airflow.providers.apache.beam.hooks.beam.beam_options_to_args(options)[source]¶
Returns a formatted pipeline options from a dictionary of arguments
The logic of this method should be compatible with Apache Beam: https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/ apache_beam/options/pipeline_options.py#L230-L251
- class airflow.providers.apache.beam.hooks.beam.BeamCommandRunner(cmd, process_line_callback=None, working_directory=None)[source]¶
Bases:
airflow.utils.log.logging_mixin.LoggingMixinClass responsible for running pipeline command in subprocess
- Parameters
- class airflow.providers.apache.beam.hooks.beam.BeamHook(runner)[source]¶
Bases:
airflow.hooks.base.BaseHookHook for Apache Beam.
All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.
- Parameters
runner (str) -- Runner type
- start_python_pipeline(self, variables, py_file, py_options, py_interpreter='python3', py_requirements=None, py_system_site_packages=False, process_line_callback=None)[source]¶
Starts Apache Beam python pipeline.
- Parameters
variables (dict) -- Variables passed to the pipeline.
py_file (str) -- Path to the python file to execute.
py_options (List[str]) -- Additional options.
py_interpreter (str) -- Python version of the Apache Beam pipeline. If None, this defaults to the python3. To track python versions supported by beam and related issues check: https://issues.apache.org/jira/browse/BEAM-1251
py_requirements (Optional[List[str]]) --
Additional python package(s) to install. If a value is passed to this parameter, a new virtual environment has been created with additional packages installed.
You could also install the apache-beam package if it is not installed on your system or you want to use a different version.
py_system_site_packages (bool) --
Whether to include system_site_packages in your virtualenv. See virtualenv documentation for more information.
This option is only relevant if the
py_requirementsparameter is not None.process_line_callback (Optional[Callable[[str], None]]) -- (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
- start_java_pipeline(self, variables, jar, job_class=None, process_line_callback=None)[source]¶
Starts Apache Beam Java pipeline.
- Parameters
variables (dict) -- Variables passed to the job.
jar (str) -- Name of the jar for the pipeline
job_class (Optional[str]) -- Name of the java class for the pipeline.
process_line_callback (Optional[Callable[[str], None]]) -- (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
- start_go_pipeline(self, variables, go_file, process_line_callback=None, should_init_module=False)[source]¶
Starts Apache Beam Go pipeline.
- Parameters
variables (dict) -- Variables passed to the job.
go_file (str) -- Path to the Go file with your beam pipeline.
go_file --
process_line_callback (Optional[Callable[[str], None]]) -- (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
should_init_module (bool) -- If False (default), will just execute a go run command. If True, will init a module and dependencies with a
go mod initandgo mod tidy, useful when pulling source with GCSHook.
- Returns
- Return type
None