apache-airflow-providers-databricks

This is a provider package for the databricks provider. All classes for this provider package are in the airflow.providers.databricks Python package.
You can install this package on top of an existing Airflow 2 installation (see Requirements below for the minimum Airflow version supported) via

pip install apache-airflow-providers-databricks

The minimum Apache Airflow version supported by this provider package is 2.4.0.
PIP package | Version required
---|---
apache-airflow | >=2.4.0
Those are dependencies that might be needed in order to use all the features of the package. You need to install the specified provider packages in order to use them.
You can install such cross-provider dependencies when installing from PyPI. For example:
pip install apache-airflow-providers-databricks[common.sql]
Dependent package | Extra
---|---
apache-airflow-providers-common-sql | common.sql
You can download officially released packages and verify their checksums and signatures from the Official Apache Download site.
The apache-airflow-providers-databricks 4.3.0 sdist package (asc, sha512)
The apache-airflow-providers-databricks 4.3.0 wheel package (asc, sha512)
Note
This release dropped support for Python 3.7
add a return when the event is yielded in a loop to stop the execution (#31985)
Fix type annotation (#31888)
Fix Databricks SQL operator serialization (#31780)
Making Databricks run related multi-query string in one session again (#31898) (#31899)
Remove return statement after yield from triggers class (#31703)
Remove Python 3.7 support (#30963)
Note
This release of the provider is only available for Airflow 2.4+ as explained in the Apache Airflow providers support policy.
Add conditional output processing in SQL operators (#31136)
Add cancel all runs functionality to Databricks hook (#31038)
Add retry param in databricks async operator (#30744)
Add repair job functionality to databricks hook (#30786)
Add 'DatabricksPartitionSensor' (#30980)
Bump minimum Airflow version in providers (#30917)
Deprecate databricks async operator (#30761)
Add delete inactive run functionality to databricks provider (#30646)
Databricks SQL sensor (#30477)
The DatabricksSqlHook now conforms to the same semantics as all the other DBApiHook implementations and returns the same kind of response from its run method. Previously (in pre-4.* versions of the provider), the hook returned a tuple of ("cursor description", "results"), which was not compatible with other DBApiHooks that return just "results". After this change (and with the dependency on common.sql >= 1.3.1), the DatabricksSqlHook returns "results" only. The description can be retrieved via the last_description field of the hook after the run method completes. That makes the DatabricksSqlHook suitable for the generic SQL operator and detailed lineage analysis. If you had custom hooks, or used the hook in your TaskFlow code or custom operators that relied on the old behaviour, you need to adapt your DAGs.
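A minimal sketch of the new hook semantics; the connection id and query below are illustrative assumptions, not part of the release notes:

```python
from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook

# Assumption: a Databricks connection named "databricks_default" is configured.
hook = DatabricksSqlHook(databricks_conn_id="databricks_default")

# run() now returns just the results, like other DBApiHook implementations.
results = hook.run("SELECT 1 AS answer", handler=lambda cursor: cursor.fetchall())

# The cursor description is no longer part of the return value; read it from
# the hook's last_description field after run() completes.
description = hook.last_description
```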
The Databricks DatabricksSqlOperator is also more standard now: it derives from the common SQLExecuteQueryOperator and uses a more consistent approach to process the output when SQL queries are run. However, in this case the result returned by the execute method is unchanged (it still returns a tuple of ("description", "results")), and this tuple is pushed to XCom, so your DAGs relying on this behaviour should continue working without any change.
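A minimal sketch of the operator after this change, assuming a configured Databricks connection and an existing SQL endpoint; the dag id, connection id, endpoint name, and SQL below are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

# Assumption: "databricks_default" and "my-endpoint" exist in your environment.
with DAG(dag_id="databricks_sql_example", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    select_data = DatabricksSqlOperator(
        task_id="select_data",
        databricks_conn_id="databricks_default",
        sql_endpoint_name="my-endpoint",
        sql="SELECT * FROM my_table LIMIT 10",
        do_xcom_push=True,  # execute() still returns ("description", "results"), pushed to XCom
    )
```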
Fix errors in Databricks SQL operator introduced when refactoring (#27854)
Bump common.sql provider to 1.3.1 (#27888)
Fix templating fields and do_xcom_push in DatabricksSQLOperator (#27868)
Fixing the behaviours of SQL Hooks and Operators finally (#27912)
Note
This release of the provider is only available for Airflow 2.3+ as explained in the Apache Airflow providers support policy.
Move min airflow version to 2.3.0 for all providers (#27196)
Replace urlparse with urlsplit (#27389)
Add SQLExecuteQueryOperator (#25717)
Use new job search API for triggering Databricks job by name (#27446)
DatabricksSubmitRunOperator dbt task support (#25623)
Add common-sql lower bound for common-sql (#25789)
Remove duplicated connection-type within the provider (#26628)
Databricks: fix provider name in the User-Agent string (#25873)
Databricks: update user-agent string (#25578)
More improvements in the Databricks operators (#25260)
Improved telemetry for Databricks provider (#25115)
Unify DbApiHook.run() method with the methods which override it (#23971)
Databricks: fix test_connection implementation (#25114)
Do not convert boolean values to string in deep_string_coerce function (#25394)
Correctly handle output of the failed tasks (#25427)
Databricks: Fix provider for Airflow 2.2.x (#25674)
Added databricks_conn_id as templated field (#24945)
Add 'test_connection' method to Databricks hook (#24617)
Move all SQL classes to common-sql provider (#24836)
Update providers to use functools compat for 'cached_property' (#24582)
Note
This release of the provider is only available for Airflow 2.2+ as explained in the Apache Airflow providers support policy.
Add Deferrable Databricks operators (#19736)
Add git_source to DatabricksSubmitRunOperator (#23620)
fix: DatabricksSubmitRunOperator and DatabricksRunNowOperator cannot define .json as template_ext (#23622) (#23641)
Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (#23815)
Update to the released version of DBSQL connector
DatabricksSqlOperator - switch to databricks-sql-connector 2.x
Further improvement of Databricks Jobs operators (#23199)
More operators for Databricks Repos (#22422)
Add a link to Databricks Job Run (#22541)
Databricks SQL operators are now Python 3.10 compatible (#22886)
Databricks: Correctly handle HTTP exception (#22885)
Refactor 'DatabricksJobRunLink' to not create ad hoc TaskInstances (#22571)
Operator for updating Databricks Repos (#22278)
Fix mistakenly added install_requires for all providers (#22382)
Add new options to DatabricksCopyIntoOperator (#22076)
Databricks hook - retry on HTTP Status 429 as well (#21852)
Skip some tests for Databricks from running on Python 3.10 (#22221)
Add showing runtime error feature to DatabricksSubmitRunOperator (#21709)
Databricks: add support for triggering jobs by name (#21663)
Added template_ext = ('.json') to databricks operators #18925 (#21530)
Databricks SQL operators (#21363)
Fixed changelog for January 2022 (delayed) provider's release (#21439)
Support for Python 3.10
Updated Databricks docs for correct jobs 2.1 API and links (#21494)
Add 'wait_for_termination' argument for Databricks Operators (#20536)
Update connection object to 'cached_property' in 'DatabricksHook' (#20526)
Remove 'host' as an instance attr in 'DatabricksHook' (#20540)
Databricks: fix verification of Managed Identity (#20550)
Databricks: add more methods to represent run state information (#19723)
Databricks - allow Azure SP authentication on other Azure clouds (#19722)
Databricks: allow to specify PAT in Password field (#19585)
Databricks jobs 2.1 (#19544)
Update Databricks API from 2.0 to 2.1 (#19412)
Authentication with AAD tokens in Databricks provider (#19335)
Update Databricks operators to match latest version of API 2.0 (#19443)
Remove db call from DatabricksHook.__init__() (#20180)
Fixup string concatenations (#19099)
Databricks hook: fix expiration time check (#20036)
Auto-apply apply_default decorator (#15667)
Warning
Due to apply_default decorator removal, this version of the provider requires Airflow 2.1.0+.
If your Airflow version is < 2.1.0, and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise your Airflow package version will be upgraded automatically and you will have to manually run airflow upgrade db to complete the migration.
Updated documentation and readme files.
Initial version of the provider.