airflow.providers.git.bundles.git
Attributes
Classes
GitDagBundle | git DAG bundle - exposes a git repository as a DAG bundle.
Module Contents
- class airflow.providers.git.bundles.git.GitDagBundle(*, tracking_ref, subdir=None, git_conn_id=None, repo_url=None, submodules=False, prune_dotgit_folder=True, **kwargs)
Bases: airflow.dag_processing.bundles.base.BaseDagBundle
git DAG bundle - exposes a git repository as a DAG bundle.
Instead of cloning the repository every time, we clone the repository once into a bare repo from the source and then do a clone for each version from there.
- Parameters:
tracking_ref (str) – Branch or tag for this DAG bundle
subdir (str | None) – Subdirectory within the repository where the DAGs are stored (Optional)
git_conn_id (str | None) – Connection ID for SSH/token based connection to the repository (Optional)
repo_url (str | None) – Explicit Git repository URL to override the connection’s host. (Optional)
submodules (bool) – Whether to initialize git submodules. When submodules are enabled, the .git folder is preserved.
prune_dotgit_folder (bool) – Remove the .git folder from the versions after cloning.
The per-version clone is not a full “git” copy: it uses git’s --local ability to share the object directory via hard links. Even so, if you have many versions in use at once, or an especially large git repository, leaving this as True saves some disk space, at the expense of git operations not working in the bundle that tasks run from.
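The parameters above are normally supplied as keyword arguments through Airflow's bundle configuration rather than by constructing the class directly. The following is a minimal sketch, assuming the bundle is registered through the `dag_bundle_config_list` option in the `[dag_processor]` section; the bundle name, connection ID, and subdirectory are hypothetical placeholders.

```python
import json

# Hypothetical values for illustration only: the bundle name, the
# connection ID, and the subdirectory are not taken from a real deployment.
bundle_config = [
    {
        "name": "example_git_bundle",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
            "tracking_ref": "main",  # branch or tag to follow
            "subdir": "dags",  # where the DAG files live in the repository
            "git_conn_id": "example_git_conn",  # SSH/token connection
            "submodules": False,
            "prune_dotgit_folder": True,  # drop .git from each version clone
        },
    }
]

# Serialize to the JSON string that the configuration option expects.
config_value = json.dumps(bundle_config, indent=2)
print(config_value)
```

The printed JSON can be pasted as the value of the configuration option. Keeping `prune_dotgit_folder` at True trades git functionality inside the bundle for lower disk usage, as described above.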
- hook: airflow.providers.git.hooks.git.GitHook | None = None
- initialize()
Initialize the bundle.
This method is called by the DAG processor and worker before the bundle is used, and allows for deferring expensive operations until that point in time. This will only be called when Airflow needs the bundle files on disk - some uses only need to call the view_url method, which can run without initializing the bundle.
This method must ultimately be safe to call concurrently from different threads or processes. If it isn’t naturally safe, you’ll need to make it so with some form of locking. There is a lock context manager on this class available for this purpose.
If you override this method, ensure you call super().initialize() at the end of your method, after the bundle is initialized, not the beginning.
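The ordering rule for overrides can be illustrated with a small, self-contained sketch. `StandInBundle` below is a hypothetical stand-in written for this example, not the real Airflow base class; it only mimics the two features the text mentions, a lock context manager and an `initialize` method. The thread-based lock is also an assumption chosen to keep the example runnable on its own.

```python
import threading
from contextlib import contextmanager


class StandInBundle:
    """Hypothetical stand-in for the bundle base class (not Airflow code)."""

    _guard = threading.Lock()  # shared by all instances in this process

    def __init__(self):
        self.is_initialized = False

    @contextmanager
    def lock(self):
        # Serialize the guarded section across threads.
        with self._guard:
            yield

    def initialize(self):
        self.is_initialized = True


class PreparedBundle(StandInBundle):
    """Override that prepares its files first and calls the parent last."""

    def __init__(self):
        super().__init__()
        self.events = []

    def initialize(self):
        # Do the expensive, non-thread-safe work while holding the lock.
        with self.lock():
            self.events.append("files prepared")
        # Only after the bundle is ready, let the parent finish.
        super().initialize()
        self.events.append("parent initialize called")


bundle = PreparedBundle()
bundle.initialize()
print(bundle.events, bundle.is_initialized)
```

Calling the parent method last means nothing treats the bundle as initialized before its files are actually in place.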
- get_current_version()
Retrieve a string that represents the version of the DAG bundle.
Airflow can use this value to retrieve this same bundle version later.
- property path: pathlib.Path
Path for this bundle.
Airflow will use this path to find/load/execute the DAGs from the bundle. After initialize has been called, all dag files in the bundle should be accessible from this path.
- refresh()
Retrieve the latest version of the files in the bundle.
This method must ultimately be safe to call concurrently from different threads or processes. If it isn’t naturally safe, you’ll need to make it so with some form of locking. There is a lock context manager on this class available for this purpose.
- view_url(version=None)
Return a URL for viewing the DAGs in the repository.
This method is deprecated and will be removed when the minimum supported Airflow version is 3.1. Use view_url_template instead.
- view_url_template()
URL template to view the bundle on an external website.
This is shown to users in the Airflow UI, allowing them to navigate to this URL for more details about that version of the bundle.
The template should use format string placeholders like {version}, {subdir}, etc. Common placeholders:
- {version}: The version identifier
- {subdir}: The subdirectory within the bundle (if applicable)
This needs to function without initialize being called.
- Returns:
URL template string or None if not applicable
- Return type:
str | None
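To show how such a template might be filled in, here is a minimal sketch using Python's `str.format`. The template URL, the `render_view_url` helper, and the sample version string are all hypothetical; how Airflow itself renders the template is not specified here.

```python
from typing import Optional

# Hypothetical template: the host, organization, and repository are made up.
TEMPLATE = "https://git.example.com/example-org/example-repo/tree/{version}/{subdir}"


def render_view_url(
    template: Optional[str], version: str, subdir: str = ""
) -> Optional[str]:
    """Fill the placeholders of a URL template (illustrative helper)."""
    if template is None:
        # A bundle may return None when no external view is applicable.
        return None
    url = template.format(version=version, subdir=subdir)
    # Drop the trailing slash left behind when subdir is empty.
    return url.rstrip("/")


print(render_view_url(TEMPLATE, version="0123abc", subdir="dags"))
```

Because the template is a plain string, it can be produced without touching the files on disk, which fits the requirement that it work before `initialize` is called.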