airflow.providers.apache.beam.hooks.beam
¶
This module contains a Apache Beam Hook.
Module Contents¶
Classes¶
Helper class for listing runner types. |
|
Hook for Apache Beam. |
|
Asynchronous hook for Apache Beam. |
Functions¶
|
Return a formatted pipeline options from a dictionary of arguments. |
|
Print output to logs. |
|
Run pipeline command in subprocess. |
- class airflow.providers.apache.beam.hooks.beam.BeamRunnerType[source]¶
Helper class for listing runner types.
For more information about runners see: https://beam.apache.org/documentation/
- airflow.providers.apache.beam.hooks.beam.beam_options_to_args(options)[source]¶
Return a formatted pipeline options from a dictionary of arguments.
The logic of this method should be compatible with Apache Beam: https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/ apache_beam/options/pipeline_options.py#L230-L251
- airflow.providers.apache.beam.hooks.beam.process_fd(proc, fd, log, process_line_callback=None, check_job_status_callback=None)[source]¶
Print output to logs.
- Parameters
proc – subprocess.
fd – File descriptor.
process_line_callback (Callable[[str], None] | None) – Optional callback which can be used to process stdout and stderr to detect job id.
log (logging.Logger) – logger.
- airflow.providers.apache.beam.hooks.beam.run_beam_command(cmd, log, process_line_callback=None, working_directory=None, check_job_status_callback=None)[source]¶
Run pipeline command in subprocess.
- Parameters
cmd (list[str]) – Parts of the command to be run in subprocess
process_line_callback (Callable[[str], None] | None) – Optional callback which can be used to process stdout and stderr to detect job id
working_directory (str | None) – Working directory
log (logging.Logger) – logger.
- class airflow.providers.apache.beam.hooks.beam.BeamHook(runner)[source]¶
Bases:
airflow.hooks.base.BaseHook
Hook for Apache Beam.
All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.
- Parameters
runner (str) – Runner type
- start_python_pipeline(variables, py_file, py_options, py_interpreter='python3', py_requirements=None, py_system_site_packages=False, process_line_callback=None, check_job_status_callback=None)[source]¶
Start Apache Beam python pipeline.
- Parameters
variables (dict) – Variables passed to the pipeline.
py_file (str) – Path to the python file to execute.
py_interpreter (str) – Python version of the Apache Beam pipeline. If None, this defaults to the python3. To track python versions supported by beam and related issues check: https://issues.apache.org/jira/browse/BEAM-1251
py_requirements (list[str] | None) –
Additional python package(s) to install. If a value is passed to this parameter, a new virtual environment has been created with additional packages installed.
You could also install the apache-beam package if it is not installed on your system, or you want to use a different version.
py_system_site_packages (bool) –
Whether to include system_site_packages in your virtualenv. See virtualenv documentation for more information.
This option is only relevant if the
py_requirements
parameter is not None.process_line_callback (Callable[[str], None] | None) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
- start_java_pipeline(variables, jar, job_class=None, process_line_callback=None)[source]¶
Start Apache Beam Java pipeline.
- Parameters
variables (dict) – Variables passed to the job.
jar (str) – Name of the jar for the pipeline
job_class (str | None) – Name of the java class for the pipeline.
process_line_callback (Callable[[str], None] | None) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
- start_go_pipeline(variables, go_file, process_line_callback=None, should_init_module=False)[source]¶
Start Apache Beam Go pipeline with a source file.
- Parameters
variables (dict) – Variables passed to the job.
go_file (str) – Path to the Go file with your beam pipeline.
process_line_callback (Callable[[str], None] | None) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
should_init_module (bool) – If False (default), will just execute a go run command. If True, will init a module and dependencies with a
go mod init
andgo mod tidy
, useful when pulling source with GCSHook.
- Returns
- Return type
None
- start_go_pipeline_with_binary(variables, launcher_binary, worker_binary, process_line_callback=None)[source]¶
Start Apache Beam Go pipeline with an executable binary.
- Parameters
variables (dict) – Variables passed to the job.
launcher_binary (str) – Path to the binary compiled for the launching platform.
worker_binary (str) – Path to the binary compiled for the worker platform.
process_line_callback (Callable[[str], None] | None) – (optional) Callback that can be used to process each line of the stdout and stderr file descriptors.
- class airflow.providers.apache.beam.hooks.beam.BeamAsyncHook(runner)[source]¶
Bases:
BeamHook
Asynchronous hook for Apache Beam.
- Parameters
runner (str) – Runner type.
- async start_python_pipeline_async(variables, py_file, py_options=None, py_interpreter='python3', py_requirements=None, py_system_site_packages=False)[source]¶
Start Apache Beam python pipeline.
- Parameters
variables (dict) – Variables passed to the pipeline.
py_file (str) – Path to the python file to execute.
py_interpreter (str) – Python version of the Apache Beam pipeline. If None, this defaults to the python3. To track python versions supported by beam and related issues check: https://issues.apache.org/jira/browse/BEAM-1251
py_requirements (list[str] | None) – Additional python package(s) to install. If a value is passed to this parameter, a new virtual environment has been created with additional packages installed. You could also install the apache-beam package if it is not installed on your system, or you want to use a different version.
py_system_site_packages (bool) – Whether to include system_site_packages in your virtualenv. See virtualenv documentation for more information. This option is only relevant if the
py_requirements
parameter is not None.
- async start_java_pipeline_async(variables, jar, job_class=None)[source]¶
Start Apache Beam Java pipeline.