Release Notes¶
Apache Airflow Releases
Airflow 2.8.3 (2024-03-11)¶
Significant Changes¶
The smtp provider is now pre-installed when you install Airflow. (#37713)¶
Bug Fixes¶
Add “MENU” permission in auth manager (#37881)
Fix external_executor_id being overwritten (#37784)
Make more MappedOperator members modifiable (#37828)
Set parsing context dag_id in dag test command (#37606)
Miscellaneous¶
Remove useless methods from security manager (#37889)
Improve code coverage for TriggerRuleDep (#37680)
The SMTP provider is now preinstalled when installing Airflow (#37713)
Bump min versions of openapi validators (#37691)
Properly include airflow_pre_installed_providers.txt artifact (#37679)
Doc Only Changes¶
Clarify lack of sync between workers and scheduler (#37913)
Simplify some docs around airflow_local_settings (#37835)
Add section about local settings configuration (#37829)
Fix docs of BranchDayOfWeekOperator (#37813)
Write to secrets store is not supported by design (#37814)
ERD generating doc improvement (#37808)
Update incorrect config value (#37706)
Update security model to clarify Connection Editing user’s capabilities (#37688)
Fix ImportError on examples dags (#37571)
Airflow 2.8.2 (2024-02-26)¶
Significant Changes¶
The allowed_deserialization_classes flag now follows a glob pattern (#36147).¶
For example, to add the class airflow.tests.custom_class to the allowed_deserialization_classes list, write either the full class name (airflow.tests.custom_class) or a glob pattern (e.g., airflow.*, airflow.tests.*).
If you currently use a custom regexp path, make sure to rewrite it as a glob pattern. Alternatively, if you still wish to match it as a regexp pattern, add it under the new list allowed_deserialization_classes_regexp instead.
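The difference between the two lists can be sketched in plain Python. This is only an illustration under stated assumptions: the helper is_allowed and the my_company pattern are hypothetical, and the standard-library fnmatch module stands in for whatever matching Airflow performs internally.

```python
import re
from fnmatch import fnmatchcase
from typing import Sequence


def is_allowed(classname: str, globs: Sequence[str], regexps: Sequence[str]) -> bool:
    """Hypothetical helper: True if classname matches any glob or any regexp."""
    if any(fnmatchcase(classname, pattern) for pattern in globs):
        return True
    return any(re.fullmatch(pattern, classname) for pattern in regexps)


# Glob patterns, as they would go into allowed_deserialization_classes.
globs = ["airflow.tests.*"]
# Regexp patterns, as they would go into allowed_deserialization_classes_regexp.
regexps = [r"my_company\.models\.[A-Z]\w+"]

print(is_allowed("airflow.tests.custom_class", globs, regexps))  # True
print(is_allowed("airflow.models.dag", globs, regexps))          # False
print(is_allowed("my_company.models.Report", globs, regexps))    # True
```

Note that in a glob the dot is a literal character and * matches any run of characters, whereas in a regexp the dot must be escaped to be literal.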
The audit_logs permissions have been updated for heightened security (#37501).¶
This was done under the policy that users other than Admin, such as Viewer and Ops, should not have access to audit_logs. The intention behind this change is to prevent users with fewer permissions from viewing user details like First Name, Email etc. in the audit_logs when they are not permitted to.
The impact of this change is that existing users with non-admin rights won’t be able to view or access the audit_logs, either from the Browse tab or from the DAG run.
AirflowTimeoutError is no longer caught by default by except Exception (#35653).¶
AirflowTimeoutError now inherits from BaseException instead of AirflowException -> Exception.
See https://docs.python.org/3/library/exceptions.html#exception-hierarchy
This prevents code that catches Exception from accidentally catching AirflowTimeoutError and continuing to run. AirflowTimeoutError signals an explicit intent to cancel the task, and should not be caught in attempts to handle the error and return some default value.
Catching AirflowTimeoutError is still possible by explicitly catching AirflowTimeoutError or BaseException. This is discouraged, as it may allow the code to continue running even after such cancellation requests.
Code that previously depended on performing strict cleanup in every situation after catching Exception is advised to use finally blocks or context managers, which perform only the cleanup and then automatically re-raise the exception.
See similar considerations about catching KeyboardInterrupt in
https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
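A minimal sketch of the behaviour described above, using a hypothetical stand-in class rather than Airflow's own exception. TaskTimeout, fetch_value and run_task are illustrative names only; the point is that except Exception does not intercept a BaseException subclass, while finally still runs the cleanup.

```python
class TaskTimeout(BaseException):
    """Hypothetical stand-in for a timeout that derives from BaseException."""


events = []


def fetch_value():
    # Simulates a task body that is interrupted by a timeout.
    raise TaskTimeout("timed out")


def run_task():
    try:
        return fetch_value()
    except Exception:
        # Not reached: TaskTimeout is not a subclass of Exception.
        events.append("swallowed")
        return "default"
    finally:
        # Cleanup still runs, then the timeout keeps propagating.
        events.append("cleanup")


try:
    run_task()
except TaskTimeout:
    events.append("timeout propagated")

print(events)  # ['cleanup', 'timeout propagated']
```

The except Exception branch never fires, so no default value is returned after the cancellation request.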
Bug Fixes¶
Sort dag processing stats by last_runtime (#37302)
Allow pre-population of trigger form values via URL parameters (#37497)
Base date for fetching dag grid view must include selected run_id (#34887)
Check permissions for ImportError (#37468)
Move IMPORT_ERROR from DAG related permissions to view related permissions (#37292)
Change AirflowTaskTimeout to inherit BaseException (#35653)
Revert “Fix future DagRun rarely triggered by race conditions when max_active_runs reached its upper limit. (#31414)” (#37596)
Change margin to padding so first task can be selected (#37527)
Fix Airflow serialization for namedtuple (#37168)
Fix bug with clicking url-unsafe tags (#37395)
Set deterministic and new getter for Treeview function (#37162)
Fix permissions of parent folders for log file handler (#37310)
Fix permission check on DAGs when access_entity is specified (#37290)
Fix the value of dateTimeAttrFormat constant (#37285)
Resolve handler close race condition at triggerer shutdown (#37206)
Fixing status icon alignment for various views (#36804)
Remove superfluous @Sentry.enrich_errors (#37002)
Use execution_date= param as a backup to base date for grid view (#37018)
Handle SystemExit raised in the task. (#36986)
Revoking audit_log permission from all users except admin (#37501)
Fix broken regex for allowed_deserialization_classes (#36147)
Fix the bug that affected the DAG end date. (#36144)
Adjust node width based on task name length (#37254)
fix: PythonVirtualenvOperator crashes if any python_callable function is defined in the same source as DAG (#37165)
Fix collapsed grid width, line up selected bar with gantt (#37205)
Adjust graph node layout (#37207)
Revert the sequence of initializing configuration defaults (#37155)
Displaying “actual” try number in TaskInstance view (#34635)
Bugfix Triggering DAG with parameters is mandatory when show_trigger_form_if_no_params is enabled (#37063)
Secret masker ignores passwords with special chars (#36692)
Fix DagRuns with UPSTREAM_FAILED tasks get stuck in the backfill. (#36954)
Disable dryrun auto-fetch (#36941)
Fix copy button on a DAG run’s config (#36855)
Fix bug introduced by replacing spaces by + in run_id (#36877)
Fix webserver always redirecting to home page if user was not logged in (#36833)
REST API set description on POST to /variables endpoint (#36820)
Sanitize the conn_id to disallow potential script execution (#32867)
Fix task id copy button copying wrong id (#34904)
Fix security manager inheritance in fab provider (#36538)
Avoid pendulum.from_timestamp usage (#37160)
Miscellaneous¶
Install latest docker CLI instead of specific one (#37651)
Bump undici from 5.26.3 to 5.28.3 in /airflow/www (#37493)
Add Python 3.12 exclusions in providers/pyproject.toml (#37404)
Remove markdown from core dependencies (#37396)
Remove unused pageSize method. (#37319)
Add more-itertools as dependency of common-sql (#37359)
Replace other Python 3.11 and 3.12 deprecations (#37478)
Include airflow_pre_installed_providers.txt into sdist distribution (#37388)
Turn Pydantic into an optional dependency (#37320)
Limit universal-pathlib to < 0.2.0 (#37311)
Allow running airflow against sqlite in-memory DB for tests (#37144)
Add description to queue_when (#36997)
Updated config.yml for environment variable sql_alchemy_connect_args (#36526)
Bump min version of Alembic to 1.13.1 (#36928)
Limit flask-session to <0.6 (#36895)
Doc Only Changes¶
Fix upgrade docs to reflect true CLI flags available (#37231)
Fix a bug in fundamentals doc (#37440)
Add redirect for deprecated page (#37384)
Fix the otel config descriptions (#37229)
Update Objectstore tutorial with prereqs section (#36983)
Add more precise description on avoiding generic package/module names (#36927)
Add airflow version substitution into Docker Compose Howto (#37177)
Add clarification about DAG author capabilities to security model (#37141)
Move docs for cron basics to Authoring and Scheduling section (#37049)
Link to release notes in the upgrade docs (#36923)
Prevent templated field logic checks in __init__ of operators automatically (#33786)
Airflow 2.8.1 (2024-01-19)¶
Significant Changes¶
Target version for core dependency pendulum package set to 3 (#36281).¶
Support for pendulum 2.1.2 will be kept for a while, presumably until the next feature version of Airflow. It is advised to upgrade user code to use pendulum 3 as soon as possible.
Airflow packaging specification follows modern Python packaging standards (#36537).¶
We standardized Airflow dependency configuration to follow the latest developments in Python packaging by using pyproject.toml. Airflow is now compliant with these accepted PEPs:
PEP-518 Specifying Minimum Build System Requirements for Python Projects
PEP-660 Editable installs for pyproject.toml based builds (wheel based)
PEP-685 Comparison of extra names for optional distribution dependencies
We also implement support for multiple license files, which comes from a draft PEP that is not yet accepted (but is supported by hatchling):
PEP 639 Improving License Clarity with Better Package Metadata
This has almost no noticeable impact on users who use modern Python packaging and development tools. Generally speaking, Airflow should behave as it did before when installed from PyPI, and it should be much easier to install for development purposes using pip install -e ".[devel]".
The differences from the user side are:
Airflow extras are now normalized to - (following PEP-685) instead of _ and . (as was the case before in some extras). When you install Airflow with such extras (for example dbt.core or all_dbs), you should use - instead of _ and . (for example dbt-core or all-dbs).
In most modern tools this will work in a backwards-compatible way, but in some old versions of those tools you might need to replace _ and . with -. You can also get warnings that the extra you are installing does not exist, but usually this warning is harmless and the extra is installed anyway. It is, however, recommended to use - in extras in your dependency specifications for all Airflow extras.
The released airflow package does not contain the devel, devel-*, doc and doc-gen extras. Those extras are only available when you install Airflow from sources in --editable mode. This is because those extras are only used for development and documentation building purposes and are not needed when you install Airflow for production use. Those dependencies had unspecified and varying behaviour for released packages anyway and you were not supposed to use them in released packages.
The all and all-* extras were not always working correctly when installing Airflow using constraints because they were also considered development-only dependencies. With this change, those dependencies now handle constraints properly and install correctly with constraints, pulling the right set of providers and dependencies when constraints are used.
Graphviz dependency is now optional, not required (#36647).¶
The graphviz dependency has been problematic as a required Airflow dependency, especially for ARM-based installations. Graphviz packages require binary graphviz libraries, which is already a limitation, but they also require the graphviz Python bindings to be built and installed. This does not work for older Linux installations but, more importantly, when you try to install Graphviz libraries for Python 3.8 or 3.9 on ARM M1 MacBooks, the packages fail to install because Python bindings compilation for M1 only works for Python 3.10+.
This is technically not a breaking change: the CLI commands that render DAGs are still there, and if you already have graphviz installed, they will continue working as before. The only case where they do not work is when graphviz is not installed; Airflow then raises an error and informs you that you need it.
Graphviz will remain installed for most users:
the Airflow image will still contain the graphviz library, because it is added there as an extra
when a previous version of Airflow has already been installed, the graphviz library is already installed there and Airflow will continue working as it did
The only change affects new installations of a new version of Airflow from scratch, where graphviz needs to be specified as an extra or installed separately in order to enable the DAG rendering option.
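To check whether an environment can still render DAGs, one option is to test whether the graphviz Python package is importable. This is a sketch, not Airflow's own check: graphviz_available is a hypothetical helper, and the extra name in the printed hint is an assumption based on the statement above that graphviz can be specified as an extra.

```python
from importlib.util import find_spec


def graphviz_available() -> bool:
    """Return True if the graphviz Python package can be imported."""
    return find_spec("graphviz") is not None


if graphviz_available():
    print("graphviz is installed; DAG rendering should work")
else:
    # Assumed extra name; installing the graphviz package separately also works.
    print('graphviz is missing; try: pip install "apache-airflow[graphviz]"')
```

The Python package alone is not sufficient: as noted above, the binary graphviz libraries must also be present on the system.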
Bug Fixes¶
Fix airflow-scheduler exiting with code 0 on exceptions (#36800)
Fix Callback exception when a removed task is the last one in the taskinstance list (#36693)
Allow anonymous user edit/show resource when set AUTH_ROLE_PUBLIC=admin (#36750)
Better error message when sqlite URL uses relative path (#36774)
Explicit string cast required to force integer-type run_ids to be passed as strings instead of integers (#36756)
Add log lookup exception for empty op subtypes (#35536)
Remove unused index on task instance (#36737)
Fix check on subclass for typing.Union in _infer_multiple_outputs for Python 3.10+ (#36728)
Make sure multiple_outputs is inferred correctly even when using TypedDict (#36652)
Add back FAB constant in legacy security manager (#36719)
Fix AttributeError when using Dagrun.update_state (#36712)
Do not let EventsTimetable schedule past events if catchup=False (#36134)
Support encryption for triggers parameters (#36492)
Fix the type hint for tis_query in _process_executor_events (#36655)
Redirect to index when user does not have permission to access a page (#36623)
Avoid using dict as default value in call_regular_interval (#36608)
Remove option to set a task instance to running state in UI (#36518)
Fix details tab not showing when using dynamic task mapping (#36522)
Raise error when DagRun fails while running dag test (#36517)
Refactor _manage_executor_state by refreshing TIs in batch (#36502)
Add flask config: MAX_CONTENT_LENGTH (#36401)
Fix get_leaves calculation for teardown in nested group (#36456)
Stop serializing timezone-naive datetime to timezone-aware datetime with UTC tz (#36379)
Make kubernetes decorator type annotation consistent with operator (#36405)
Fix Webserver returning 500 for POST requests to api/dag/*/dagrun from anonymous user (#36275)
Fix the required access for get_variable endpoint (#36396)
Fix datetime reference in DAG.is_fixed_time_schedule (#36370)
Fix AirflowSkipException message raised by BashOperator (#36354)
Allow PythonVirtualenvOperator.skip_on_exit_code to be zero (#36361)
Increase width of execution_date input in trigger.html (#36278)
Fix logging for pausing DAG (#36182)
Stop deserializing pickle when enable_xcom_pickling is False (#36255)
Check DAG read permission before accessing DAG code (#36257)
Enable mark task as failed/success always (#36254)
Create latest log dir symlink as relative link (#36019)
Fix Python-based decorators templating (#36103)
Miscellaneous¶
Rename concurrency label to max active tasks (#36691)
Restore function scoped httpx import in file_task_handler for performance (#36753)
Add support of Pendulum 3 (#36281)
Standardize airflow build process and switch to Hatchling build backend (#36537)
Get rid of pyarrow-hotfix for CVE-2023-47248 (#36697)
Make graphviz dependency optional (#36647)
Announce MSSQL support end in Airflow 2.9.0, add migration script hints (#36509)
Set min pandas dependency to 1.2.5 for all providers and airflow (#36698)
Bump follow-redirects from 1.15.3 to 1.15.4 in /airflow/www (#36700)
Provide the logger_name param to base hook in order to override the logger name (#36674)
Fix run type icon alignment with run type text (#36616)
Follow BaseHook connection fields method signature in FSHook (#36444)
Remove redundant docker decorator type annotations (#36406)
Straighten typing in workday timetable (#36296)
Use batch_is_authorized_dag to check if user has permission to read DAGs (#36279)
Replace deprecated get_accessible_dag_ids and use get_readable_dags in get_dag_warnings (#36256)
Doc Only Changes¶
Metrics tagging documentation (#36627)
In docs use logical_date instead of deprecated execution_date (#36654)
Add section about live-upgrading Airflow (#36637)
Replace numpy example with practical exercise demonstrating top-level code (#35097)
Improve and add more complete description in the architecture diagrams (#36513)
Improve the error message displayed when there is a webserver error (#36570)
Update dags.rst with information on DAG pausing (#36540)
Update installation prerequisites after upgrading to Debian Bookworm (#36521)
Add description on the ways how users should approach DB monitoring (#36483)
Add branching based on mapped task group example to dynamic-task-mapping.rst (#36480)
Add further details to replacement documentation (#36485)
Use cards when describing priority weighting methods (#36411)
Update metrics.rst for param dagrun.schedule_delay (#36404)
Update admonitions in Python operator doc to reflect sentiment (#36340)
Improve audit_logs.rst (#36213)
Remove Redshift mention from the list of managed Postgres backends (#36217)
Airflow 2.8.0 (2023-12-18)¶
Significant Changes¶
Raw HTML code in DAG docs and DAG params descriptions is disabled by default (#35460)¶
To ensure that no malicious javascript can be injected with DAG descriptions or trigger UI forms by DAG authors, a new parameter webserver.allow_raw_html_descriptions was added with a default value of False.
If you trust your DAG authors' code and want to allow using raw HTML in DAG descriptions and params, you can restore the previous behavior by setting the configuration value to True.
To ensure Airflow is secure by default, the raw HTML support in the trigger UI has been superseded by markdown support via the description_md attribute. If you have been using description_html, please migrate to description_md.
The custom_html_form is now deprecated.
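Airflow options can generally also be supplied through environment variables named AIRFLOW__{SECTION}__{KEY}. As a small sketch, the helper below (airflow_env_var, a name made up for this example) derives the variable name for the new option; setting that variable to True would be one way to restore the previous behavior.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


print(airflow_env_var("webserver", "allow_raw_html_descriptions"))
# AIRFLOW__WEBSERVER__ALLOW_RAW_HTML_DESCRIPTIONS
```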
New Features¶
AIP-58: Add Airflow ObjectStore (AFS) (AIP-58)
Add XCom tab to Grid (#35719)
Add “literal” wrapper to disable field templating (#35017)
Add task context logging feature to allow forwarding messages to task logs (#32646, #32693, #35857)
Add Listener hooks for Datasets (#34418, #36247)
Allow override of navbar text color (#35505)
Add lightweight serialization for deltalake tables (#35462)
Add support for serialization of iceberg tables (#35456)
prev_end_date_success method access (#34528)
Add task parameter to set custom logger name (#34964)
Add pyspark decorator (#35247)
Add trigger as a valid option for the db clean command (#34908)
Add decorators for external and venv python branching operators (#35043)
Allow PythonVenvOperator using other index url (#33017)
Add Python Virtualenv Operator Caching (#33355)
Introduce a generic export for containerized executor logging (#34903)
Add ability to clear downstream tis in List Task Instances view (#34529)
Attribute clear_number to track DAG run being cleared (#34126)
Add BranchPythonVirtualenvOperator (#33356)
Add CLI notification commands to providers (#33116)
Use dropdown instead of buttons when there are more than 10 retries in log tab (#36025)
Improvements¶
Add multiselect to run state in grid view (#35403)
Fix warning message in Connection.get_hook in case of ImportError (#36005)
Add processor_subdir to import_error table to handle multiple dag processors (#35956)
Consolidate the call of change_state to fail or success in the core executors (#35901)
Relax mandatory requirement for start_date when schedule=None (#35356)
Use ExitStack to manage mutation of secrets_backend_list in dag.test (#34620)
Improved visibility of tasks in ActionModal for taskinstance (#35810)
Create directories based on AIRFLOW_CONFIG path (#35818)
Implements JSON-string connection representation generator (#35723)
Move BaseOperatorLink into the separate module (#35032)
Set mark_end_on_close after set_context (#35761)
Move external logs links to top of react logs page (#35668)
Change terminal mode to cbreak in execute_interactive and handle SIGINT (#35602)
Make raw HTML descriptions configurable (#35460)
Allow email field to be templated (#35546)
Hide logical date and run id in trigger UI form (#35284)
Improved instructions for adding dependencies in TaskFlow (#35406)
Add optional exit code to list import errors (#35378)
Limit query result on DB rather than client in synchronize_log_template function (#35366)
Allow description to be passed in when using variables CLI (#34791)
Allow optional defaults in required fields with manual triggered dags (#31301)
Permitting airflow kerberos to run in different modes (#35146)
Refactor commands to unify daemon context handling (#34945)
Add extra fields to plugins endpoint (#34913)
Add description to pools view (#34862)
Move cli’s Connection export and Variable export command print logic to a separate function (#34647)
Extract and reuse get_kerberos_principle func from get_kerberos_principle (#34936)
Change type annotation for BaseOperatorLink.operators (#35003)
Optimise and migrate to SA2-compatible syntax for TaskReschedule (#33720)
Consolidate the permissions name in SlaMissModelView (#34949)
Add debug log saying what’s being run to EventScheduler (#34808)
Increase log reader stream loop sleep duration to 1 second (#34789)
Resolve pydantic deprecation warnings re update_forward_refs (#34657)
Unify mapped task group lookup logic (#34637)
Allow filtering event logs by attributes (#34417)
Make connection login and password TEXT (#32815)
Ban import Dataset from airflow package in codebase (#34610)
Use airflow.datasets.Dataset in examples and tests (#34605)
Enhance task status visibility (#34486)
Simplify DAG trigger UI (#34567)
Ban import AirflowException from airflow (#34512)
Add descriptions for airflow resource config parameters (#34438)
Simplify trigger name expression (#34356)
Move definition of Pod*Exceptions to pod_generator (#34346)
Add deferred tasks to the cluster_activity view Pools Slots (#34275)
heartbeat failure log message fix (#34160)
Rename variables for dag runs (#34049)
Clarify new_state in OpenAPI spec (#34056)
Remove version top-level element from docker compose files (#33831)
Remove generic trigger cancelled error log (#33874)
Use NOT EXISTS subquery instead of tuple_not_in_condition (#33527)
Allow context key args to not provide a default (#33430)
Order triggers by - TI priority_weight when assign unassigned triggers (#32318)
Add metric triggerer_heartbeat (#33320)
Allow airflow variables export to print to stdout (#33279)
Workaround failing deadlock when running backfill (#32991)
add dag_run_ids and task_ids filter for the batch task instance API endpoint (#32705)
Configurable health check threshold for triggerer (#33089)
Rework provider manager to treat Airflow core hooks like other provider hooks (#33051)
Ensure DAG-level references are filled on unmap (#33083)
Affix webserver access_denied warning to be configurable (#33022)
Add support for arrays of different data types in the Trigger Form UI (#32734)
Add a mechanism to warn if executors override existing CLI commands (#33423)
Bug Fixes¶
Account for change in UTC offset when calculating next schedule (#35887)
Add read access to pools for viewer role (#35352)
Fix gantt chart queued duration when queued_dttm is greater than start_date for deferred tasks (#35984)
Avoid crushing container when directory is not found on rm (#36050)
Update reset_user_sessions to work from either CLI or web (#36056)
Fix UI Grid error when DAG has been removed. (#36028)
Change Trigger UI to use HTTP POST in web ui (#36026)
Fix airflow db shell needing an extra key press to exit (#35982)
Change dag grid overscroll behaviour to auto (#35717)
Run triggers inline with dag test (#34642)
Add borderWidthRight to grid for Firefox scrollbar (#35346)
Fix for infinite recursion due to secrets_masker (#35048)
Fix write processor_subdir in serialized_dag table (#35661)
Reload configuration for standalone dag file processor (#35725)
Long custom operator name overflows in graph view (#35382)
Add try_number to extra links query (#35317)
Prevent assignment of non JSON serializable values to DagRun.conf dict (#35096)
Numeric values in DAG details are incorrectly rendered as timestamps (#35538)
Fix Scheduler and triggerer crashes in daemon mode when statsd metrics are enabled (#35181)
Infinite UI redirection loop after deactivating an active user (#35486)
Bug fix fetch_callback of Partial Subset DAG (#35256)
Fix DagRun data interval for DeltaDataIntervalTimetable (#35391)
Fix query in get_dag_by_pickle util function (#35339)
Fix TriggerDagRunOperator failing to trigger subsequent runs when reset_dag_run=True (#35429)
Fix weight_rule property type in mappedoperator (#35257)
Bugfix/prevent concurrency with cached venv (#35258)
Fix dag serialization (#34042)
Fix py/url-redirection by replacing request.referrer by get_redirect() (#34237)
Fix updating variables during variable imports (#33932)
Use Literal from airflow.typing_compat in Airflow core (#33821)
Always use Literal from typing_extensions (#33794)
Miscellaneous¶
Change default MySQL client to MariaDB (#36243)
Mark daskexecutor provider as removed (#35965)
Bump FAB to 4.3.10 (#35991)
Rename Connection.to_json_dict to Connection.to_dict (#35894)
Upgrade to Pydantic v2 (#35551)
Bump moto version to >= 4.2.9 (#35687)
Use pyarrow-hotfix to mitigate CVE-2023-47248 (#35650)
Bump axios from 0.26.0 to 1.6.0 in /airflow/www/ (#35624)
Make docker decorator’s type annotation consistent with operator (#35568)
Add default to navbar_text_color and rm condition in style (#35553)
Avoid initiating session twice in dag_next_execution (#35539)
Work around typing issue in examples and providers (#35494)
Enable TCH004 and TCH005 rules (#35475)
Humanize log output about retrieved DAG(s) (#35338)
Switch from Black to Ruff formatter (#35287)
Upgrade to Flask Application Builder 4.3.9 (#35085)
D401 Support (#34932, #34933)
Use requires_access to check read permission on dag instead of checking it explicitly (#34940)
Deprecate lazy import AirflowException from airflow (#34541)
View util refactoring on mapped stuff use cases (#34638)
Bump postcss from 8.4.25 to 8.4.31 in /airflow/www (#34770)
Refactor Sqlalchemy queries to 2.0 style (#34763, #34665, #32883, #35120)
Change to lazy loading of io in pandas serializer (#34684)
Use airflow.models.dag.DAG in examples (#34617)
Use airflow.exceptions.AirflowException in core (#34510)
Check that dag_ids passed in request are consistent (#34366)
Refactors to make code better (#34278, #34113, #34110, #33838, #34260, #34409, #34377, #34350)
Suspend qubole provider (#33889)
Generate Python API docs for Google ADS (#33814)
Improve importing in modules (#33812, #33811, #33810, #33806, #33807, #33805, #33804, #33803, #33801, #33799, #33800, #33797, #33798, #34406, #33808)
Upgrade Elasticsearch to 8 (#33135)
Doc Only Changes¶
Add support for tabs (and other UX components) to docs (#36041)
Replace architecture diagram of Airflow with diagrams-generated one (#36035)
Add the section describing the security model of DAG Author capabilities (#36022)
Enhance docs for zombie tasks (#35825)
Reflect drop/add support of DB Backends versions in documentation (#35785)
More detail on mandatory task arguments (#35740)
Indicate usage of the re2 regex engine in the .airflowignore documentation. (#35663)
Update best-practices.rst (#35692)
Update dag-run.rst to mention Airflow’s support for extended cron syntax through croniter (#35342)
Update webserver.rst to include information of supported OAuth2 providers (#35237)
Add back dag_run to docs (#35142)
Fix rst code block format (#34708)
Add typing to concrete taskflow examples (#33417)
Add concrete examples for accessing context variables from TaskFlow tasks (#33296)
Fix links in security docs (#33329)
Airflow 2.7.3 (2023-11-06)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Fix pre-mature evaluation of tasks in mapped task group (#34337)
Add TriggerRule missing value in rest API (#35194)
Fix Scheduler crash looping when dagrun creation fails (#35135)
Fix test connection with codemirror and extra (#35122)
Fix usage of cron-descriptor since BC in v1.3.0 (#34836)
Fix get_plugin_info for class based listeners. (#35022)
Some improvements/fixes for dag_run and task_instance endpoints (#34942)
Fix the dags count filter in webserver home page (#34944)
Return only the TIs of the readable dags when ~ is provided as a dag_id (#34939)
Fix triggerer thread crash in daemon mode (#34931)
Fix wrong plugin schema (#34858)
Use DAG timezone in TimeSensorAsync (#33406)
Mark tasks with all_skipped trigger rule as skipped if any task is in upstream_failed state (#34392)
Add read only validation to read only fields (#33413)
Misc/Internal¶
Improve testing harness to separate DB and non-DB tests (#35160, #35333)
Add pytest db_test markers to our tests (#35264)
Add pip caching for faster build (#35026)
Upper bound pendulum requirement to <3.0 (#35336)
Limit sentry_sdk to 1.33.0 (#35298)
Fix subtle bug in mocking processor_agent in our tests (#35221)
Bump @babel/traverse from 7.16.0 to 7.23.2 in /airflow/www (#34988)
Bump undici from 5.19.1 to 5.26.3 in /airflow/www (#34971)
Remove unused set from SchedulerJobRunner (#34810)
Remove warning about max_tis per query > parallelism (#34742)
Improve modules import in Airflow core by moving some of them into a type-checking block (#33755)
Fix tests to respond to Python 3.12 handling of utcnow in sentry-sdk (#34946)
Add connexion<3.0 upper bound (#35218)
Limit Airflow to < 3.12 (#35123)
Update moto version (#34938)
Limit WTForms to below 3.1.0 (#34943)
Doc Only Changes¶
Fix variables substitution in Airflow Documentation (#34462)
Added example for defaults in conn.extras (#35165)
Update datasets.rst issue with running example code (#35035)
Remove mysql-connector-python from recommended MySQL driver (#34287)
Fix syntax error in task dependency set_downstream example (#35075)
Update documentation to enable test connection (#34905)
Update docs errors.rst - Mention sentry “transport” configuration option (#34912)
Update dags.rst to put SubDag deprecation note right after the SubDag section heading (#34925)
Add info on getting variables and config in custom secrets backend (#34834)
Document BaseExecutor interface in more detail to help users in writing custom executors (#34324)
Fix broken link to airflow_local_settings.py template (#34826)
Fixes python_callable function assignment context kwargs example in params.rst (#34759)
Add missing multiple_outputs=True param in the TaskFlow example (#34812)
Remove extraneous '>' in provider section name (#34813)
Fix imports in extra link documentation (#34547)
Airflow 2.7.2 (2023-10-12)¶
Significant Changes¶
No significant changes
Bug Fixes¶
Check if the lower of provided values are sensitives in config endpoint (#34712)
Add support for ZoneInfo and generic UTC to fix datetime serialization (#34683, #34804)
Fix AttributeError: ‘Select’ object has no attribute ‘count’ during the airflow db migrate command (#34348)
Make dry run optional for patch task instance (#34568)
Fix non deterministic datetime deserialization (#34492)
Use iterative loop to look for mapped parent (#34622)
Fix is_parent_mapped value by checking if any of the parent taskgroup is mapped (#34587)
Avoid top-level airflow import to avoid circular dependency (#34586)
Add more exemptions to lengthy metric list (#34531)
Fix dag warning endpoint permissions (#34355)
Fix task instance access issue in the batch endpoint (#34315)
Correcting wrong time showing in grid view (#34179)
Fix www cluster_activity view not loading due to standaloneDagProcessor templating (#34274)
Set loglevel=DEBUG in ‘Not syncing DAG-level permissions’ (#34268)
Make param validation consistent for DAG validation and triggering (#34248)
Ensure details panel is shown when any tab is selected (#34136)
Fix issues related to access_control={} (#34114)
Fix not found ab_user table in the CLI session (#34120)
Fix FAB-related logging format interpolation (#34139)
Fix query bug in next_run_datasets_summary endpoint (#34143)
Fix for TaskGroup toggles for duplicated labels (#34072)
Fix the required permissions to clear a TI from the UI (#34123)
Reuse _run_task_session in mapped render_template_fields (#33309)
Fix scheduler logic to plan new dag runs by ignoring manual runs (#34027)
Add missing audit logs for Flask actions add, edit and delete (#34090)
Hide Irrelevant Dag Processor from Cluster Activity Page (#33611)
Remove infinite animation for pinwheel, spin for 1.5s (#34020)
Restore rendering of provider configuration with version_added (#34011)
Doc Only Changes¶
Clarify audit log permissions (#34815)
Add explanation for Audit log users (#34814)
Import AUTH_REMOTE_USER from FAB in WSGI middleware example (#34721)
Add information about drop support MsSQL as DB Backend in the future (#34375)
Document how to use the system’s timezone database (#34667)
Clarify what landing time means in doc (#34608)
Fix screenshot in dynamic task mapping docs (#34566)
Fix class reference in Public Interface documentation (#34454)
Clarify var.value.get and var.json.get usage (#34411)
Schedule default value description (#34291)
Docs for triggered_dataset_event (#34410)
Add DagRun events (#34328)
Provide tabular overview about trigger form param types (#34285)
Add link to Amazon Provider Configuration in Core documentation (#34305)
Add “security infrastructure” paragraph to security model (#34301)
Change links to SQLAlchemy 1.4 (#34288)
Add SBOM entry in security documentation (#34261)
Added more example code for XCom push and pull (#34016)
Add state utils to Public Airflow Interface (#34059)
Replace markdown style link with rst style link (#33990)
Fix broken link to the “UPDATING.md” file (#33583)
Misc/Internal¶
Update min-sqlalchemy version to account for latest features used (#34293)
Fix SesssionExemptMixin spelling (#34696)
Restrict astroid version < 3 (#34658)
Fail dag test if defer without triggerer (#34619)
Fix connections exported output (#34640)
Don’t run isort when creating new alembic migrations (#34636)
Deprecate numeric type python version in PythonVirtualEnvOperator (#34359)
Refactor os.path.splitext to Path.* (#34352, #33669)
Replace = by is for type comparison (#33983)
Refactor integer division (#34180)
Refactor: Simplify comparisons (#34181)
Refactor: Simplify string generation (#34118)
Replace unnecessary dict comprehension with dict() in core (#33858)
Change “not all” to “any” for ease of readability (#34259)
Replace assert by if…raise in code (#34250, #34249)
Move default timezone to except block (#34245)
Combine similar if logic in core (#33988)
Refactor: Consolidate import and usage of random (#34108)
Consolidate importing of os.path.* (#34060)
Replace sequence concatenation by unpacking in Airflow core (#33934)
Refactor unneeded ‘continue’ jumps around the repo (#33849, #33845, #33846, #33848, #33839, #33844, #33836, #33842)
Remove [project] section from pyproject.toml (#34014)
Move the try outside the loop when this is possible in Airflow core (#33975)
Replace loop by any when looking for a positive value in core (#33985)
Do not create lists we don’t need (#33519)
Remove useless string join from core (#33969)
Add TCH001 and TCH002 rules to pre-commit to detect and move type checking modules (#33865)
Add cancel_trigger_ids to to_cancel dequeue in batch (#33944)
Avoid creating unnecessary list when parsing stats datadog tags (#33943)
Replace dict.items by dict.values when key is not used in core (#33940)
Replace lambdas with comprehensions (#33745)
Improve modules import in Airflow core by moving some of them into a type-checking block (#33755)
Refactor: remove unused state - SHUTDOWN (#33746, #34063, #33893)
Refactor: Use in-place .sort() (#33743)
Use literal dict instead of calling dict() in Airflow core (#33762)
remove unnecessary map and rewrite it using list in Airflow core (#33764)
Replace lambda by a def method in Airflow core (#33758)
Replace type func by isinstance in fab_security manager (#33760)
Replace single quotes by double quotes in all Airflow modules (#33766)
Merge multiple isinstance calls for the same object in a single call (#33767)
Use a single statement with multiple contexts instead of nested statements in core (#33769)
Refactor: Use f-strings (#33734, #33455)
Refactor: Use random.choices (#33631)
Use str.splitlines() to split lines (#33592)
Refactor: Remove useless str() calls (#33629)
Refactor: Improve detection of duplicates and list sorting (#33675)
Simplify conditions on len() (#33454)
Airflow 2.7.1 (2023-09-07)¶
Significant Changes¶
CronTriggerTimetable is now less aggressive when trying to skip a run (#33404)¶
When setting catchup=False, CronTriggerTimetable no longer skips a run if the scheduler does not query the timetable immediately after the previous run has been triggered.
This should not affect scheduling in most cases, but can change the behaviour if a DAG is paused-unpaused to manually skip a run. Previously, the timetable (with catchup=False) would only start a run after a DAG is unpaused, but with this change, the scheduler would try to look a little bit back to schedule the previous run that covers a part of the period when the DAG was paused. This means you will need to keep a DAG paused longer (namely, for the entire cron period to pass) to really skip a run.
Note that this is also the behaviour exhibited by various other cron-based scheduling tools, such as anacron.
conf.set() becomes case insensitive to match conf.get() behavior (#33452)¶
Also, conf.get() will now break if used with non-string parameters.
conf.set(section, key, value) used to be case sensitive, i.e. conf.set("SECTION", "KEY", value) and conf.set("section", "key", value) were stored as two distinct configurations. This was inconsistent with the behavior of conf.get(section, key), which was always converting the section and key to lower case. As a result, configuration options set with upper case characters in the section or key were unreachable. That’s why we are now converting section and key to lower case in conf.set too.
We also slightly changed the behavior of conf.get(). It used to allow objects that are not strings in the section or key. Doing this will now result in an exception. For instance, conf.get("section", 123) needs to be replaced with conf.get("section", "123").
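The normalization described above can be illustrated with a small stand-in class. This is a minimal sketch, not Airflow's implementation: MiniConf is a hypothetical name, the ValueError type is an assumption (the note only says "an exception"), and only the lower-casing of section and key plus the string-only check are modelled.

```python
class MiniConf:
    """Hypothetical stand-in for a config store; not Airflow's real class."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _normalize(section, key):
        # Reject non-string section/key, mirroring the stricter conf.get() rule.
        # The exception type is an assumption made for this sketch.
        if not isinstance(section, str) or not isinstance(key, str):
            raise ValueError("section and key must be strings")
        # Both set() and get() see the same lower-cased pair.
        return section.lower(), key.lower()

    def set(self, section, key, value):
        self._values[self._normalize(section, key)] = value

    def get(self, section, key):
        return self._values[self._normalize(section, key)]


conf = MiniConf()
conf.set("SECTION", "KEY", "a")
conf.set("section", "key", "b")  # same entry as above, so it overwrites "a"
```

With this normalization there is a single stored entry, reachable through any casing of the section and key, while a call such as conf.get("section", 123) raises instead of silently missing.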
Bug Fixes¶
Ensure that tasks wait for running indirect setup (#33903)
Respect “soft_fail” for core async sensors (#33403)
Differentiate 0 and unset as a default param values (#33965)
Raise 404 from Variable PATCH API if variable is not found (#33885)
Fix MappedTaskGroup tasks not respecting upstream dependency (#33732)
Add limit 1 if required first value from query result (#33672)
Fix UI DAG counts including deleted DAGs (#33778)
Fix cleaning zombie RESTARTING tasks (#33706)
SECURITY_MANAGER_CLASS should be a reference to class, not a string (#33690)
Add back get_url_for_login in security manager (#33660)
Fix 2.7.0 db migration job errors (#33652)
Set context inside templates (#33645)
Treat dag-defined access_control as authoritative if defined (#33632)
Bind engine before attempting to drop archive tables (#33622)
Add a fallback in case no first name and last name are set (#33617)
Sort data before groupby in TIS duration calculation (#33535)
Stop adding values to rendered templates UI when there is no dagrun (#33516)
Set strict to True when parsing dates in webserver views (#33512)
Use dialect.name in custom SA types (#33503)
Do not return ongoing dagrun when a end_date is less than utcnow (#33488)
Fix a bug in formatDuration method (#33486)
Make conf.set case insensitive (#33452)
Allow timetable to slightly miss catchup cutoff (#33404)
Respect soft_fail argument when poke is called (#33401)
Create a new method used to resume the task in order to implement specific logic for operators (#33424)
Fix DagFileProcessor interfering with dags outside its processor_subdir (#33357)
Remove the unnecessary <br> text in Provider’s view (#33326)
Respect soft_fail argument when ExternalTaskSensor runs in deferrable mode (#33196)
Fix handling of default value and serialization of Param class (#33141)
Check if the dynamically-added index is in the table schema before adding (#32731)
Fix rendering the mapped parameters when using expand_kwargs method (#32272)
Fix dependencies for celery and opentelemetry for Python 3.8 (#33579)
Misc/Internal¶
Bring back Pydantic 1 compatibility (#34081, #33998)
Use a trimmed version of README.md for PyPI (#33637)
Upgrade to Pydantic 2 (#33956)
Reorganize devel_only extra in Airflow’s setup.py (#33907)
Bumping FAB to 4.3.4 in order to fix issues with filters (#33931)
Add minimum requirement for sqlalchemy to 1.4.24 (#33892)
Update version_added field for configs in config file (#33509)
Replace OrderedDict with plain dict (#33508)
Consolidate import and usage of itertools (#33479)
Static check fixes (#33462)
Import utc from datetime and normalize its import (#33450)
D401 Support (#33352, #33339, #33337, #33336, #33335, #33333, #33338)
Fix some missing type hints (#33334)
D205 Support - Stragglers (#33301, #33298, #33297)
Refactor: Simplify code (#33160, #33270, #33268, #33267, #33266, #33264, #33292, #33453, #33476, #33567, #33568, #33480, #33753, #33520, #33623)
Fix Pydantic warning about orm_mode rename (#33220)
Add MySQL 8.1 to supported versions. (#33576)
Remove Pydantic limitation for version < 2 (#33507)
Doc only changes¶
Add documentation explaining template_ext (and how to override it) (#33735)
Explain how users can check if python code is top-level (#34006)
Clarify that DAG authors can also run code in DAG File Processor (#33920)
Fix broken link in Modules Management page (#33499)
Fix secrets backend docs (#33471)
Fix config description for base_log_folder (#33388)
Airflow 2.7.0 (2023-08-18)¶
Significant Changes¶
Remove Python 3.7 support (#30963)¶
As of now, Python 3.7 is no longer supported by the Python community. Therefore, to use Airflow 2.7.0, you must ensure your Python version is either 3.8, 3.9, 3.10, or 3.11.
Old Graph View is removed (#32958)¶
The old Graph View is removed. The new Graph View is the default view now.
The trigger UI form is skipped in web UI if no parameters are defined in a DAG (#33351)¶
If you are using dag_run.conf dictionary and web UI JSON entry to run your DAG you should either:
Enable the new configuration show_trigger_form_if_no_params to bring back old behaviour
The “db init”, “db upgrade” commands and “[database] load_default_connections” configuration options are deprecated (#33136).¶
Instead, you should use “airflow db migrate” command to create or upgrade database. This command will not create default connections. In order to create default connections you need to run “airflow connections create-default-connections” explicitly, after running “airflow db migrate”.
In case of SMTP SSL connection, the context now uses the “default” context (#33070)¶
The “default” context is Python’s default_ssl_context instead of previously used “none”. The default_ssl_context provides a balance between security and compatibility but in some cases, when certificates are old, self-signed or misconfigured, it might not work. This can be configured by setting “ssl_context” in “email” configuration of Airflow.
Setting it to “none” brings back the “none” setting that was used in Airflow 2.6 and before, but it is not recommended due to security reasons as this setting disables validation of certificates and allows MITM attacks.
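To make the security difference concrete, the sketch below inspects a context built with Python's standard ssl.create_default_context() and contrasts it with one where verification is switched off. It assumes the “default” setting corresponds to what ssl.create_default_context() returns. How Airflow builds the “none” case is not described in these notes, so the second context is only an illustration of disabled certificate validation.

```python
import ssl

# Context with Python's default settings: certificates and hostnames are verified.
default_ctx = ssl.create_default_context()
print(default_ctx.verify_mode == ssl.CERT_REQUIRED)
print(default_ctx.check_hostname)

# Illustration only: a context with validation disabled, which is what makes
# MITM attacks possible. check_hostname must be cleared before verify_mode.
unverified_ctx = ssl.create_default_context()
unverified_ctx.check_hostname = False
unverified_ctx.verify_mode = ssl.CERT_NONE
```

Old, self-signed or misconfigured certificates fail against the first context. Fixing the certificate is the safer remedy than falling back to an unverified context.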
Disable default allowing the testing of connections in UI, API and CLI (#32052)¶
For security reasons, the test connection functionality is disabled by default across Airflow UI, API and CLI. The availability of the functionality can be controlled by the test_connection flag in the core section of the Airflow configuration (airflow.cfg). It can also be controlled by the environment variable AIRFLOW__CORE__TEST_CONNECTION.
The following values are accepted for this config param:
1. Disabled: Disables the test connection functionality and disables the Test Connection button in the UI. This is also the default value set in the Airflow configuration.
2. Enabled: Enables the test connection functionality and activates the Test Connection button in the UI.
3. Hidden: Disables the test connection functionality and hides the Test Connection button in UI.
For more information on capabilities of users, see the documentation: https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html#capabilities-of-authenticated-ui-users
It is strongly advised to not enable the feature until you make sure that only highly trusted UI/API users have “edit connection” permissions.
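The three accepted values can be summarized as a lookup table. The sketch below is illustrative only: describe_test_connection is a hypothetical helper, not an Airflow API, and exact-case matching of the value is an assumption. It reads the environment variable named above and falls back to the documented default, Disabled.

```python
import os

# Behaviour per value, transcribed from the list above.
TEST_CONNECTION_MODES = {
    "Disabled": {"functionality": False, "button": "disabled"},
    "Enabled": {"functionality": True, "button": "active"},
    "Hidden": {"functionality": False, "button": "hidden"},
}


def describe_test_connection(environ=os.environ):
    """Hypothetical helper: return the behaviour for the configured value."""
    # "Disabled" is the documented default when nothing is configured.
    value = environ.get("AIRFLOW__CORE__TEST_CONNECTION", "Disabled")
    if value not in TEST_CONNECTION_MODES:
        raise ValueError(f"unsupported test_connection value: {value!r}")
    return TEST_CONNECTION_MODES[value]
```

Only Enabled turns the functionality on. Disabled and Hidden differ solely in how the button appears in the UI.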
The xcomEntries API disables support for the deserialize flag by default (#32176)¶
For security reasons, the /dags/*/dagRuns/*/taskInstances/*/xcomEntries/* API endpoint now disables the deserialize option to deserialize arbitrary XCom values in the webserver. For backward compatibility, server admins may set the [api] enable_xcom_deserialize_support config to True to re-enable the flag.
However, it is strongly advised to not enable the feature, and perform deserialization at the client side instead.
Change of the default Celery application name (#32526)¶
Default name of the Celery application changed from airflow.executors.celery_executor to airflow.providers.celery.executors.celery_executor.
You should change both your configuration and Health check command to use the new name:
in configuration (celery_app_name configuration in celery section) use airflow.providers.celery.executors.celery_executor
in your Health check command use airflow.providers.celery.executors.celery_executor.app
The default value for scheduler.max_tis_per_query is changed from 512 to 16 (#32572)¶
This change is expected to make the Scheduler more responsive.
scheduler.max_tis_per_query needs to be lower than core.parallelism.
If both were left to their default value previously, the effective default value of scheduler.max_tis_per_query was 32 (because it was capped at core.parallelism).
To keep the behavior as close as possible to the old config, one can set scheduler.max_tis_per_query = 0, in which case it’ll always use the value of core.parallelism.
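The capping rule described above can be written as a few lines of arithmetic. This is a sketch of the rule as stated in the note, not the scheduler's code, and effective_max_tis_per_query is a hypothetical helper name. The parallelism value of 32 is the default implied by the note.

```python
def effective_max_tis_per_query(max_tis_per_query: int, parallelism: int) -> int:
    """Hypothetical helper modelling the capping rule from the release note."""
    if max_tis_per_query == 0:
        # 0 means "always use the value of core.parallelism".
        return parallelism
    # Otherwise the value is capped at core.parallelism.
    return min(max_tis_per_query, parallelism)


print(effective_max_tis_per_query(512, 32))  # old default: capped to 32
print(effective_max_tis_per_query(16, 32))   # new default: 16
print(effective_max_tis_per_query(0, 32))    # 0 follows parallelism: 32
```

This shows why the practical change for untouched configurations is from 32 to 16 rather than from 512 to 16.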
Some executors have been moved to corresponding providers (#32767)¶
In order to use the executors, you need to install the providers:
for Celery executors you need to install apache-airflow-providers-celery package >= 3.3.0
for Kubernetes executors you need to install apache-airflow-providers-cncf-kubernetes package >= 7.4.0
for Dask executors you need to install apache-airflow-providers-daskexecutor package in any version
You can achieve it also by installing airflow with [celery], [cncf.kubernetes], [daskexecutor] extras respectively.
Users who base their images on the apache/airflow reference image (not slim) should be unaffected - the base reference image comes with all the three providers installed.
Improvement Changes¶
PostgreSQL only improvement: Added index on taskinstance table (#30762)¶
This index seems to have a great positive effect in a setup with tens of millions of such rows.
New Features¶
Add OpenTelemetry to Airflow (AIP-49)
Trigger Button - Implement Part 2 of AIP-50 (#31583)
Removing Executor Coupling from Core Airflow (AIP-51)
Automatic setup and teardown tasks (AIP-52)
OpenLineage in Airflow (AIP-53)
Experimental: Add a cache to Variable and Connection when called at dag parsing time (#30259)
Enable pools to consider deferred tasks (#32709)
Allows to choose SSL context for SMTP connection (#33070)
New gantt tab (#31806)
Load plugins from providers (#32692)
Add BranchExternalPythonOperator (#32787, #33360)
Add option for storing configuration description in providers (#32629)
Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration (#32313)
Add Executors discovery and documentation (#32532)
Add JobState for job state constants (#32549)
Add config to disable the ‘deserialize’ XCom API flag (#32176)
Show task instance in web UI by custom operator name (#31852)
Add default_deferrable config (#31712)
Introducing AirflowClusterPolicySkipDag exception (#32013)
Use reactflow for datasets graph (#31775)
Add an option to load the dags from db for command tasks run (#32038)
Add version of chain which doesn’t require matched lists (#31927)
Use operator_name instead of task_type in UI (#31662)
Add --retry and --retry-delay to airflow db check (#31836)
Allow skipped task state task_instance_schema.py (#31421)
Add a new config for celery result_backend engine options (#30426)
UI Add Cluster Activity Page (#31123, #32446)
Adding keyboard shortcuts to common actions (#30950)
Adding more information to kubernetes executor logs (#29929)
Add support for configuring custom alembic file (#31415)
Add running and failed status tab for DAGs on the UI (#30429)
Add multi-select, proposals and labels for trigger form (#31441)
Making webserver config customizable (#29926)
Render DAGCode in the Grid View as a tab (#31113)
Add rest endpoint to get option of configuration (#31056)
Add section query param in get config rest API (#30936)
Create metrics to track Scheduled->Queued->Running task state transition times (#30612)
Mark Task Groups as Success/Failure (#30478)
Add CLI command to list the provider trigger info (#30822)
Add Fail Fast feature for DAGs (#29406)
Improvements¶
Improve graph nesting logic (#33421)
Configurable health check threshold for triggerer (#33089, #33084)
add dag_run_ids and task_ids filter for the batch task instance API endpoint (#32705)
Ensure DAG-level references are filled on unmap (#33083)
Add support for arrays of different data types in the Trigger Form UI (#32734)
Always show gantt and code tabs (#33029)
Move listener success hook to after SQLAlchemy commit (#32988)
Rename db upgrade to db migrate and add connections create-default-connections (#32810, #33136)
Remove old gantt chart and redirect to grid views gantt tab (#32908)
Adjust graph zoom based on selected task (#32792)
Call listener on_task_instance_running after rendering templates (#32716)
Display execution_date in graph view task instance tooltip. (#32527)
Allow configuration to be contributed by providers (#32604, #32755, #32812)
Reduce default for max TIs per query, enforce <= parallelism (#32572)
Store config description in Airflow configuration object (#32669)
Use isdisjoint instead of not intersection (#32616)
Speed up calculation of leaves and roots for task groups (#32592)
Kubernetes Executor Load Time Optimizations (#30727)
Save DAG parsing time if dag is not schedulable (#30911)
Updates health check endpoint to include dag_processor status. (#32382)
Disable default allowing the testing of connections in UI, API and CLI (#32052, #33342)
Fix config var types under the scheduler section (#32132)
Allow to sort Grid View alphabetically (#32179)
Add hostname to triggerer metric [triggers.running] (#32050)
Improve DAG ORM cleanup code (#30614)
TriggerDagRunOperator: Add wait_for_completion to template_fields (#31122)
Open links in new tab that take us away from Airflow UI (#32088)
Only show code tab when a task is not selected (#31744)
Add descriptions for celery and dask cert configs (#31822)
PythonVirtualenvOperator termination log in alert (#31747)
Migration of all DAG details to existing grid view dag details panel (#31690)
Add a diagram to help visualize timer metrics (#30650)
Celery Executor load time optimizations (#31001)
Update code style for airflow db commands to SQLAlchemy 2.0 style (#31486)
Mark uses of md5 as “not-used-for-security” in FIPS environments (#31171)
Add pydantic support to serde (#31565)
Enable search in note column in DagRun and TaskInstance (#31455)
Save scheduler execution time by adding new Index idea for dag_run (#30827)
Save scheduler execution time by caching dags (#30704)
Support for sorting DAGs by Last Run Date in the web UI (#31234)
Better typing for Job and JobRunners (#31240)
Add sorting logic by created_date for fetching triggers (#31151)
Remove DAGs.can_create on access control doc, adjust test fixture (#30862)
Split Celery logs into stdout/stderr (#30485)
Decouple metrics clients and validators into their own modules (#30802)
Description added for pagination in get_log api (#30729)
Optimize performance of scheduling mapped tasks (#30372)
Add sentry transport configuration option (#30419)
Better message on deserialization error (#30588)
Bug Fixes¶
Remove user sessions when resetting password (#33347)
Gantt chart: Use earliest/oldest ti dates if different than dag run start/end (#33215)
Fix virtualenv detection for Python virtualenv operator (#33223)
Correctly log when there are problems trying to chmod airflow.cfg (#33118)
Pass app context to webserver_config.py (#32759)
Skip served logs for non-running task try (#32561)
Fix reload gunicorn workers (#32102)
Fix future DagRun rarely triggered by race conditions when max_active_runs reached its upper limit. (#31414)
Fix BaseOperator get_task_instances query (#33054)
Fix issue with using the various state enum value in logs (#33065)
Use string concatenation to prepend base URL for log_url (#33063)
Update graph nodes with operator style attributes (#32822)
Affix webserver access_denied warning to be configurable (#33022)
Only load task action modal if user can edit (#32992)
OpenAPI Spec fix nullable alongside $ref (#32887)
Make the decorators of PythonOperator sub-classes extend its decorator (#32845)
Fix check if virtualenv is installed in PythonVirtualenvOperator (#32939)
Unwrap Proxy before checking __iter__ in is_container() (#32850)
Override base log folder by using task handler’s base_log_folder (#32781)
Catch arbitrary exception from run_job to prevent zombie scheduler (#32707)
Fix depends_on_past work for dynamic tasks (#32397)
Sort extra_links for predictable order in UI. (#32762)
Fix prefix group false graph (#32764)
Fix bad delete logic for dagruns (#32684)
Fix bug in prune_dict where empty dict and list would be removed even in strict mode (#32573)
Add explicit browsers list and correct rel for blank target links (#32633)
Handle returned None when multiple_outputs is True (#32625)
Fix returned value when ShortCircuitOperator condition is falsy and there are no downstream tasks (#32623)
Fix returned value when ShortCircuitOperator condition is falsy (#32569)
Fix rendering of dagRunTimeout (#32565)
Fix permissions on /blocked endpoint (#32571)
Bugfix, prevent force of unpause on trigger DAG (#32456)
Fix data interval in cli.dags.trigger command output (#32548)
Strip whitespaces from airflow connections form (#32292)
Add timedelta support for applicable arguments of sensors (#32515)
Fix incorrect default on readonly property in our API (#32510)
Add xcom map_index as a filter to xcom endpoint (#32453)
Fix CLI commands when custom timetable is used (#32118)
Use WebEncoder to encode DagRun.conf in DagRun’s list view (#32385)
Fix logic of the skip_all_except method (#31153)
Ensure dynamic tasks inside dynamic task group only marks the (#32354)
Handle the cases that webserver.expose_config is set to non-sensitive-only instead of boolean value (#32261)
Add retry functionality for handling process termination caused by database network issues (#31998)
Adapt Notifier for sla_miss_callback (#31887)
Fix XCOM view (#31807)
Fix for “Filter dags by tag” flickering on initial load of dags.html (#31578)
Fix where expanding resizer would not expanse grid view (#31581)
Fix MappedOperator-BaseOperator attr sync check (#31520)
Always pass named type_ arg to drop_constraint (#31306)
Fix bad drop_constraint call in migrations (#31302)
Resolving problems with redesigned grid view (#31232)
Support requirepass redis sentinel (#30352)
Fix webserver crash when calling get /config (#31057)
Misc/Internal¶
Modify pathspec version restriction (#33349)
Refactor: Simplify code in dag_processing (#33161)
For now limit Pydantic to < 2.0.0 (#33235)
Refactor: Simplify code in models (#33181)
Add elasticsearch group to pre-2.7 defaults (#33166)
Refactor: Simplify dict manipulation in airflow/cli (#33159)
Remove redundant dict.keys() call (#33158)
Upgrade ruff to latest 0.0.282 version in pre-commits (#33152)
Move openlineage configuration to provider (#33124)
Replace State by TaskInstanceState in Airflow executors (#32627)
Get rid of Python 2 numeric relics (#33050)
Remove legacy dag code (#33058)
Remove legacy task instance modal (#33060)
Remove old graph view (#32958)
Move CeleryExecutor to the celery provider (#32526, #32628)
Move all k8S classes to cncf.kubernetes provider (#32767, #32891)
Refactor existence-checking SQL to helper (#32790)
Extract Dask executor to new daskexecutor provider (#32772)
Remove atlas configuration definition (#32776)
Add Redis task handler (#31855)
Move writing configuration for webserver to main (webserver limited) (#32766)
Improve getting the query count in Airflow API endpoints (#32630)
Remove click upper bound (#32634)
Add D400 pydocstyle check - core Airflow only (#31297)
D205 Support (#31742, #32575, #32213, #32212, #32591, #32449, #32450)
Bump word-wrap from 1.2.3 to 1.2.4 in /airflow/www (#32680)
Strong-type all single-state enum values (#32537)
More strong typed state conversion (#32521)
SQL query improvements in utils/db.py (#32518)
Bump semver from 6.3.0 to 6.3.1 in /airflow/www (#32506)
Bump jsonschema version to 4.18.0 (#32445)
Bump stylelint from 13.13.1 to 15.10.1 in /airflow/www (#32435)
Bump tough-cookie from 4.0.0 to 4.1.3 in /airflow/www (#32443)
Upgrade flask-appbuilder (#32054)
Support Pydantic 2 (#32366)
Limit click until we fix mypy issues (#32413)
A couple of minor cleanups (#31890)
Replace State usages with strong-typed enums (#31735)
Upgrade ruff to 0.272 (#31966)
Better error message when serializing callable without name (#31778)
Improve the views module a bit (#31661)
Remove asynctest (#31664)
Refactor sqlalchemy queries to 2.0 style (#31569, #31772, #32350, #32339, #32474, #32645)
Remove Python 3.7 support (#30963)
Bring back min-airflow-version for preinstalled providers (#31469)
Docstring improvements (#31375)
Improve typing in SchedulerJobRunner (#31285)
Upgrade ruff to 0.0.262 (#30809)
Upgrade to MyPy 1.2.0 (#30687)
Docs only changes¶
Clarify UI user types in security model (#33021)
Add links to DAGRun / DAG / Task in templates-ref.rst (#33013)
Add docs of how to test for DAG Import Errors (#32811)
Clean-up of our new security page (#32951)
Cleans up Extras reference page (#32954)
Update Dag trigger API and command docs (#32696)
Add deprecation info to the Airflow modules and classes docstring (#32635)
Formatting installation doc to improve readability (#32502)
Fix triggerer HA doc (#32454)
Add type annotation to code examples (#32422)
Document cron and delta timetables (#32392)
Update index.rst doc to correct grammar (#32315)
Fixing small typo in python.py (#31474)
Separate out and clarify policies for providers (#30657)
Fix docs: add an “apache” prefix to pip install (#30681)
Airflow 2.6.3 (2023-07-10)¶
Significant Changes¶
Default allowed pattern of a run_id has been changed to ^[A-Za-z0-9_.~:+-]+$ (#32293).¶
Previously, there was no validation on the run_id string. There is now a validation regex that can be set by configuring allowed_run_id_pattern in scheduler section.
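To check ahead of time whether existing run_id values pass the new default, the pattern quoted above can be applied with Python's re module. This is a minimal sketch: is_valid_run_id is a hypothetical helper, not part of Airflow, and it tests only the default pattern, not any custom allowed_run_id_pattern you may have configured.

```python
import re

# Default pattern quoted in the release note.
ALLOWED_RUN_ID_PATTERN = re.compile(r"^[A-Za-z0-9_.~:+-]+$")


def is_valid_run_id(run_id: str) -> bool:
    """Hypothetical helper: True if run_id matches the default pattern."""
    return ALLOWED_RUN_ID_PATTERN.match(run_id) is not None


print(is_valid_run_id("manual__2023-07-10T00:00:00+00:00"))  # letters, digits, _ - : + only
print(is_valid_run_id("my run/1"))  # contains a space and a slash
```

Identifiers containing spaces, slashes or other characters outside the listed set are rejected by the default pattern.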
Bug Fixes¶
Use linear time regular expressions (#32303)
Fix triggerers alive check and add a new conf for triggerer heartbeat rate (#32123)
Catch the exception that triggerer initialization failed (#31999)
Hide sensitive values from extra in connection edit form (#32309)
Sanitize DagRun.run_id and allow flexibility (#32293)
Add triggerer canceled log (#31757)
Fix try number shown in the task view (#32361)
Retry transactions on occasional deadlocks for rendered fields (#32341)
Fix behaviour of LazyDictWithCache when import fails (#32248)
Remove executor_class from Job - fixing backfill for custom executors (#32219)
Fix bugged singleton implementation (#32218)
Use mapIndex to display extra links per mapped task. (#32154)
Ensure that main triggerer thread exits if the async thread fails (#32092)
Use re2 for matching untrusted regex (#32060)
Render list items in rendered fields view (#32042)
Fix hashing of dag_dependencies in serialized dag (#32037)
Return None if an XComArg fails to resolve in a multiple_outputs Task (#32027)
Check for DAG ID in query param from url as well as kwargs (#32014)
Flash an error message instead of failure in rendered-templates when map index is not found (#32011)
Fix ExternalTaskSensor when there is no task group TIs for the current execution date (#32009)
Fix number param html type in trigger template (#31980, #31946)
Fix masking nested variable fields (#31964)
Fix operator_extra_links property serialization in mapped tasks (#31904)
Decode old-style nested Xcom value (#31866)
Add a check for trailing slash in webserver base_url (#31833)
Fix connection uri parsing when the host includes a scheme (#31465)
Fix database session closing with xcom_pull and inlets (#31128)
Fix DAG’s on_failure_callback is not invoked when task failed during testing dag. (#30965)
Fix airflow module version check when using ExternalPythonOperator and debug logging level (#30367)
Misc/Internal¶
Fix task.sensor annotation in type stub (#31954)
Limit Pydantic to < 2.0.0 until we solve 2.0.0 incompatibilities (#32312)
Fix Pydantic 2 pickiness about model definition (#32307)
Doc only changes¶
Add explanation about tag creation and cleanup (#32406)
Minor updates to docs (#32369, #32315, #32310, #31794)
Clarify Listener API behavior (#32269)
Add information for users who ask for requirements (#32262)
Add links to DAGRun / DAG / Task in Templates Reference (#32245)
Add comment to warn off a potential wrong fix (#32230)
Add a note that we’ll need to restart triggerer to reflect any trigger change (#32140)
Adding missing hyperlink to the tutorial documentation (#32105)
Added difference between Deferrable and Non-Deferrable Operators (#31840)
Add comments explaining need for special “trigger end” log message (#31812)
Documentation update on Plugin updates. (#31781)
Fix SemVer link in security documentation (#32320)
Update security model of Airflow (#32098)
Update references to restructured documentation from Airflow core (#32282)
Separate out advanced logging configuration (#32131)
Add ™ to Airflow in prominent places (#31977)
Airflow 2.6.2 (2023-06-17)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Cascade update of TaskInstance to TaskMap table (#31445)
Fix Kubernetes executors detection of deleted pods (#31274)
Use keyword parameters for migration methods for mssql (#31309)
Control permissibility of driver config in extra from airflow.cfg (#31754)
Fixing broken links in openapi/v1.yaml (#31619)
Hide old alert box when testing connection with different value (#31606)
Add TriggererStatus to OpenAPI spec (#31579)
Resolving issue where Grid won’t un-collapse when Details is collapsed (#31561)
Fix sorting of tags (#31553)
Add the missing map_index to the xcom key when skipping downstream tasks (#31541)
Fix airflow users delete CLI command (#31539)
Include triggerer health status in Airflow /health endpoint (#31529)
Remove dependency already registered for this task warning (#31502)
Use kube_client over default CoreV1Api for deleting pods (#31477)
Ensure min backoff in base sensor is at least 1 (#31412)
Fix max_active_tis_per_dagrun for Dynamic Task Mapping (#31406)
Fix error handling when pre-importing modules in DAGs (#31401)
Fix dropdown default and adjust tutorial to use 42 as default for proof (#31400)
Fix crash when clearing run with task from normal to mapped (#31352)
Make BaseJobRunner a generic on the job class (#31287)
Fix url_for_asset fallback and 404 on DAG Audit Log (#31233)
Don’t present an undefined execution date (#31196)
Added spinner activity while the logs load (#31165)
Include rediss to the list of supported URL schemes (#31028)
Optimize scheduler by skipping “non-schedulable” DAGs (#30706)
Save scheduler execution time during search for queued dag_runs (#30699)
Fix ExternalTaskSensor to work correctly with task groups (#30742)
Fix DAG.access_control can’t sync when clean access_control (#30340)
Fix failing get_safe_url tests for latest Python 3.8 and 3.9 (#31766)
Fix typing for POST user endpoint (#31767)
Fix wrong update for nested group default args (#31776)
Fix overriding default_args in nested task groups (#31608)
Mark [secrets] backend_kwargs as a sensitive config (#31788)
Executor events are not always “exited” here (#30859)
Validate connection IDs (#31140)
Misc/Internal¶
Add Python 3.11 support (#27264)
Replace unicodecsv with standard csv library (#31693)
Bring back unicodecsv as dependency of Airflow (#31814)
Remove found_descendents param from get_flat_relative_ids (#31559)
Fix typing in external task triggers (#31490)
Wording the next and last run DAG columns better (#31467)
Skip auto-document things with :meta private: (#31380)
Add an example for sql_alchemy_connect_args conf (#31332)
Convert dask upper-binding into exclusion (#31329)
Upgrade FAB to 4.3.1 (#31203)
Added metavar and choices to --state flag in airflow dags list-jobs CLI for suggesting valid state arguments. (#31308)
Use only one line for tmp dir log (#31170)
Rephrase comment in setup.py (#31312)
Add fullname to owner on logging (#30185)
Make connection id validation consistent across interface (#31282)
Use single source of truth for sensitive config items (#31820)
Doc only changes¶
Add docstring and signature for _read_remote_logs (#31623)
Remove note about triggerer being 3.7+ only (#31483)
Fix version support information (#31468)
Add missing BashOperator import to documentation example (#31436)
Fix task.branch error caused by incorrect initial parameter (#31265)
Update callbacks documentation (errors and context) (#31116)
Add an example for dynamic task mapping with non-TaskFlow operator (#29762)
Few doc fixes - links, grammar and wording (#31719)
Add description in a few more places about adding airflow to pip install (#31448)
Fix table formatting in docker build documentation (#31472)
Update documentation for constraints installation (#31882)
Airflow 2.6.1 (2023-05-16)¶
Significant Changes¶
Clarifications of the external Health Check mechanism and using Job classes (#31277).¶
In the past, SchedulerJob and other *Job classes are known to have been used to perform external health checks for Airflow components. Those are, however, Airflow DB ORM related classes. The DB models and database structure of Airflow are considered internal implementation details, as described in the public interface documentation. Therefore, they should not be used for external health checks. Instead, you should use the airflow jobs check CLI command (introduced in Airflow 2.1) for that purpose.
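An external health check along these lines could shell out to the CLI instead of querying the Job tables. The sketch below is only an illustration: the helper names build_jobs_check_command and scheduler_is_healthy are hypothetical, and it assumes the airflow executable is on PATH and that the --job-type and --hostname options of airflow jobs check are available in your version.

```python
import subprocess
from typing import List, Optional


def build_jobs_check_command(
    job_type: str = "SchedulerJob", hostname: Optional[str] = None
) -> List[str]:
    """Build the argument list for `airflow jobs check` (hypothetical helper)."""
    cmd = ["airflow", "jobs", "check", "--job-type", job_type]
    if hostname:
        # Restrict the check to jobs reported from one host.
        cmd += ["--hostname", hostname]
    return cmd


def scheduler_is_healthy(timeout_seconds: int = 30) -> bool:
    """Return True when the CLI exits with status 0.

    Assumption: a zero exit status means a live job of that type was found.
    """
    result = subprocess.run(
        build_jobs_check_command(),
        capture_output=True,
        timeout=timeout_seconds,
    )
    return result.returncode == 0
```

A container liveness probe or a monitoring script could call scheduler_is_healthy() and alert on False, without importing any Airflow ORM classes.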
Bug Fixes¶
Fix calculation of health check threshold for SchedulerJob (#31277)
Fix timestamp parse failure for k8s executor pod tailing (#31175)
Make sure that DAG processor job row has filled value in job_type column (#31182)
Fix section name reference for api_client_retry_configuration (#31174)
Ensure the KPO runs pod mutation hooks correctly (#31173)
Remove worrying log message about redaction from the OpenLineage plugin (#31149)
Move interleave_timestamp_parser config to the logging section (#31102)
Ensure that we check worker for served logs if no local or remote logs found (#31101)
Fix MappedTaskGroup import in taskinstance file (#31100)
Format DagBag.dagbag_report() Output (#31095)
Mask task attribute on task detail view (#31125)
Fix template error when iterating None value and fix params documentation (#31078)
Fix apache-hive extra so it installs the correct package (#31068)
Fix issue with zip files in DAGs folder when pre-importing Airflow modules (#31061)
Move TaskInstanceKey to a separate file to fix circular import (#31033, #31204)
Fix deleting DagRuns and TaskInstances that have a note (#30987)
Fix airflow providers get command output (#30978)
Fix Pool schema in the OpenAPI spec (#30973)
Add support for dynamic tasks with template fields that contain pandas.DataFrame (#30943)
Use the Task Group explicitly passed to ‘partial’ if any (#30933)
Fix order_by request in list DAG rest api (#30926)
Include node height/width in center-on-task logic (#30924)
Remove print from dag trigger command (#30921)
Improve task group UI in new graph (#30918)
Fix mapped states in grid view (#30916)
Fix problem with displaying graph (#30765)
Fix backfill KeyError when try_number out of sync (#30653)
Re-enable clear and setting state in the TaskInstance UI (#30415)
Prevent DagRun’s state and start_date from being reset when clearing a task in a running DagRun (#30125)
Misc/Internal¶
Upper bind dask until they solve a side effect in their test suite (#31259)
Show task instances affected by clearing in a table (#30633)
Fix missing models in API documentation (#31021)
Doc only changes¶
Improve description of the dag_processing.processes metric (#30891)
Improve Quick Start instructions (#30820)
Add section about missing task logs to the FAQ (#30717)
Mount the config directory in docker compose (#30662)
Update version_added config field for might_contain_dag and metrics_allow_list (#30969)
Airflow 2.6.0 (2023-04-30)¶
Significant Changes¶
Default permissions of file task handler log directories and files have been changed to “owner + group” writeable (#29506).¶
The default setting handles the case where impersonation is needed and both users (airflow and the impersonated user) have the same group set as their main group. Previously the default was also other-writeable; users can restore the other-writeable setting if they wish by configuring file_task_handler_new_folder_permissions and file_task_handler_new_file_permissions in the logging section.
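To make the difference between “owner + group” writeable and other-writeable concrete, the sketch below renders a few octal modes as permission strings. It assumes the new defaults are 0o775 for folders and 0o664 for files; check the configuration reference for your version before relying on those values. The helper names are hypothetical.

```python
import stat

# Assumed new defaults (owner + group writeable); verify against your config reference.
NEW_FOLDER_PERMISSIONS = 0o775
NEW_FILE_PERMISSIONS = 0o664


def describe(mode: int, is_dir: bool) -> str:
    """Render an octal mode the way `ls -l` would show it."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


def is_other_writeable(mode: int) -> bool:
    """True when users outside the owner and group can write."""
    return bool(mode & stat.S_IWOTH)


print(describe(NEW_FOLDER_PERMISSIONS, is_dir=True))   # drwxrwxr-x
print(describe(NEW_FILE_PERMISSIONS, is_dir=False))    # -rw-rw-r--
print(describe(0o777, is_dir=True))                    # drwxrwxrwx (other-writeable)
```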
SLA callbacks no longer add files to the dag processor manager’s queue (#30076)¶
This stops SLA callbacks from keeping the dag processor manager permanently busy. It reduces CPU usage and fixes issues where SLAs stop the system from seeing changes to existing dag files. Additional metrics have been added to help track queue state.
The cleanup() method in BaseTrigger is now defined as asynchronous (following the async/await pattern) (#30152).¶
This is potentially a breaking change for any custom trigger implementations that override the cleanup() method and use synchronous code. However, using synchronous operations in cleanup was technically wrong, because the method was executed in the main loop of the Triggerer and it was introducing unnecessary delays impacting other triggers. The change is unlikely to affect any existing trigger implementations.
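For custom triggers that override cleanup(), the migration amounts to declaring the method with async def and awaiting any I/O. The sketch below is illustrative only: StandInBaseTrigger is a hypothetical stand-in so the example runs without Airflow installed, and ConnectionTrigger is an invented subclass.

```python
import asyncio


class StandInBaseTrigger:
    """Hypothetical stand-in for BaseTrigger so the sketch runs without Airflow."""

    async def cleanup(self) -> None:
        """Default no-op, awaited when the trigger is torn down."""


class ConnectionTrigger(StandInBaseTrigger):
    """Hypothetical custom trigger that releases a resource on cleanup."""

    def __init__(self) -> None:
        self.closed = False

    async def cleanup(self) -> None:
        # Await instead of blocking, so other triggers sharing the event
        # loop are not delayed. A real trigger might await an async
        # client's close() here.
        await asyncio.sleep(0)
        self.closed = True


async def main() -> bool:
    trigger = ConnectionTrigger()
    await trigger.cleanup()
    return trigger.closed
```

Running asyncio.run(main()) returns True once the overridden cleanup() has been awaited.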
The gauge scheduler.tasks.running no longer exists (#30374)¶
The gauge has never been working and its value has always been 0. Having an accurate value for this metric is complex so it has been decided that removing this gauge makes more sense than fixing it with no certainty of the correctness of its value.
Consolidate handling of tasks stuck in queued under new task_queued_timeout config (#30375)¶
Logic for handling tasks stuck in the queued state has been consolidated, and all the configurations responsible for timing out stuck queued tasks have been deprecated and merged into [scheduler] task_queued_timeout. The configurations that have been deprecated are [kubernetes] worker_pods_pending_timeout, [celery] stalled_task_timeout, and [celery] task_adoption_timeout. If any of these configurations are set, the longest timeout will be respected. For example, if [celery] stalled_task_timeout is 1200, and [scheduler] task_queued_timeout is 600, Airflow will set [scheduler] task_queued_timeout to 1200.
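The “longest timeout wins” rule can be sketched as follows. This is a plain restatement of the rule described above, not Airflow’s actual implementation; the function name is hypothetical and None stands for a deprecated option that is not set.

```python
from typing import Optional


def effective_task_queued_timeout(
    task_queued_timeout: float, *deprecated_timeouts: Optional[float]
) -> float:
    """Return the longest timeout among the new option and any deprecated ones that are set."""
    candidates = [task_queued_timeout]
    candidates += [t for t in deprecated_timeouts if t is not None]
    return max(candidates)


# [scheduler] task_queued_timeout = 600, [celery] stalled_task_timeout = 1200,
# the other deprecated options unset.
print(effective_task_queued_timeout(600, 1200, None, None))  # 1200
```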
Improvement Changes¶
Display only the running configuration in configurations view (#28892)¶
The configurations view now only displays the running configuration. Previously, the default configuration was displayed at the top but it was not obvious whether this default configuration was overridden or not. Consequently, the non-documented endpoint /configuration?raw=true is deprecated and will be removed in Airflow 3.0. The HTTP response now returns an additional Deprecation header. The /config endpoint on the REST API is the standard way to fetch Airflow configuration programmatically.
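A client that previously scraped /configuration?raw=true could build a REST request instead. The sketch below assumes the stable REST API is served under /api/v1, that the webserver is reachable at the base URL you pass in, and that exposing the configuration is enabled on your deployment. Authentication is deliberately left out, and the helper name is hypothetical.

```python
import urllib.request


def build_config_request(
    base_url: str, accept: str = "application/json"
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the REST API config endpoint."""
    url = base_url.rstrip("/") + "/api/v1/config"  # assumed API prefix
    return urllib.request.Request(url, headers={"Accept": accept})


request = build_config_request("http://localhost:8080/")
print(request.full_url)  # http://localhost:8080/api/v1/config
```

Sending the request (for example with urllib.request.urlopen) requires whatever credentials your configured API auth backend expects.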
Explicit skipped states list for ExternalTaskSensor (#29933)¶
ExternalTaskSensor now has an explicit skipped_states list.
Miscellaneous Changes¶
Handle OverflowError on exponential backoff in next_run_calculation (#28172)¶
Maximum retry task delay is set to be 24h (86400s) by default. You can change it globally via core.max_task_retry_delay parameter.
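The idea of capping an exponential backoff and treating an OverflowError as “use the maximum” can be sketched as below. This is a simplified illustration, not Airflow’s retry formula (which also applies jitter); the function name and the plain doubling schedule are assumptions.

```python
from datetime import timedelta

MAX_RETRY_DELAY_SECONDS = 86400  # 24h default mentioned above


def capped_retry_delay(
    base_delay: timedelta,
    try_number: int,
    max_delay_seconds: int = MAX_RETRY_DELAY_SECONDS,
) -> timedelta:
    """Double the delay per attempt, never exceeding the configured maximum."""
    max_delay = timedelta(seconds=max_delay_seconds)
    try:
        delay = base_delay * (2 ** (try_number - 1))
    except OverflowError:
        # The product no longer fits in a timedelta; fall back to the cap.
        return max_delay
    return min(delay, max_delay)


print(capped_retry_delay(timedelta(minutes=5), 3))     # 0:20:00
print(capped_retry_delay(timedelta(minutes=5), 1000))  # 1 day, 0:00:00
```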
Move Hive macros to the provider (#28538)¶
The Hive Macros (hive.max_partition, hive.closest_ds_partition) are available only when Hive Provider is installed. Please install Hive Provider > 5.1.0 when using those macros.
Updated app to support configuring the caching hash method for FIPS v2 (#30675)¶
Various updates for FIPS-compliance when running Airflow in Python 3.9+. This includes a new webserver option, caching_hash_method, for changing the default flask caching method.
New Features¶
AIP-50 Trigger DAG UI Extension with Flexible User Form Concept (#27063,#29376)
Skip PythonVirtualenvOperator task when it returns a provided exit code (#30690)
rename skip_exit_code to skip_on_exit_code and allow providing multiple codes (#30692)
Add skip_on_exit_code also to ExternalPythonOperator (#30738)
Add max_active_tis_per_dagrun for Dynamic Task Mapping (#29094)
Add serializer for pandas dataframe (#30390)
Deferrable TriggerDagRunOperator (#30292)
Add command to get DAG Details via CLI (#30432)
Adding ContinuousTimetable and support for @continuous schedule_interval (#29909)
Allow customized rules to check if a file has dag (#30104)
Add a new Airflow conf to specify a SSL ca cert for Kubernetes client (#30048)
Bash sensor has an explicit retry code (#30080)
Add filter task upstream/downstream to grid view (#29885)
Add testing a connection via Airflow CLI (#29892)
Support deleting the local log files when using remote logging (#29772)
Blocklist to disable specific metric tags or metric names (#29881)
Add a new graph inside of the grid view (#29413)
Add database check_migrations config (#29714)
add output format arg for cli.dags.trigger (#29224)
Make json and yaml available in templates (#28930)
Enable tagged metric names for existing Statsd metric publishing events | influxdb-statsd support (#29093)
Add arg --yes to db export-archived command. (#29485)
Make the policy functions pluggable (#28558)
Add airflow db drop-archived command (#29309)
Enable individual trigger logging (#27758)
Implement new filtering options in graph view (#29226)
Add triggers for ExternalTask (#29313)
Add command to export purged records to CSV files (#29058)
Add FileTrigger (#29265)
Emit DataDog statsd metrics with metadata tags (#28961)
Add some statsd metrics for dataset (#28907)
Add --overwrite option to connections import CLI command (#28738)
Add general-purpose “notifier” concept to DAGs (#28569)
Add a new conf to wait past_deps before skipping a task (#27710)
Add Flink on K8s Operator (#28512)
Allow Users to disable SwaggerUI via configuration (#28354)
Show mapped task groups in graph (#28392)
Log FileTaskHandler to work with KubernetesExecutor’s multi_namespace_mode (#28436)
Add a new config for adapting masked secrets to make it easier to prevent secret leakage in logs (#28239)
List specific config section and its values using the cli (#28334)
KubernetesExecutor multi_namespace_mode can use namespace list to avoid requiring cluster role (#28047)
Automatically save and allow restore of recent DAG run configs (#27805)
Added exclude_microseconds to cli (#27640)
Improvements¶
Rename most pod_id usage to pod_name in KubernetesExecutor (#29147)
Update the error message for invalid use of poke-only sensors (#30821)
Update log level in scheduler critical section edge case (#30694)
AIP-51 Removing Executor Coupling from Core Airflow (AIP-51)
Add multiple exit code handling in skip logic for BashOperator (#30739)
Updated app to support configuring the caching hash method for FIPS v2 (#30675)
Preload airflow imports before dag parsing to save time (#30495)
Improve task & run actions UX in grid view (#30373)
Speed up TaskGroups with caching property of group_id (#30284)
Use the engine provided in the session (#29804)
Type related import optimization for Executors (#30361)
Add more type hints to the code base (#30503)
Always use self.appbuilder.get_session in security managers (#30233)
Update SQLAlchemy select() to new style (#30515)
Refactor out xcom constants from models (#30180)
Add exception class name to DAG-parsing error message (#30105)
Rename statsd_allow_list and statsd_block_list to metrics_*_list (#30174)
Improve serialization of tuples and sets (#29019)
Make cleanup method in trigger an async one (#30152)
Lazy load serialization modules (#30094)
SLA callbacks no longer add files to the dag_processing manager queue (#30076)
Add task.trigger rule to grid_data (#30130)
Speed up log template sync by avoiding ORM (#30119)
Separate cli_parser.py into two modules (#29962)
Explicit skipped states list for ExternalTaskSensor (#29933)
Add task state hover highlighting to new graph (#30100)
Store grid tabs in url params (#29904)
Use custom Connexion resolver to load lazily (#29992)
Delay Kubernetes import in secret masker (#29993)
Delay ConnectionModelView init until it’s accessed (#29946)
Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout (#29446)
Improve grid view height calculations (#29563)
Avoid importing executor during conf validation (#29569)
Make permissions for FileTaskHandler group-writeable and configurable (#29506)
Add colors in help outputs of Airflow CLI commands #28789 (#29116)
Add a param for get_dags endpoint to list only unpaused dags (#28713)
Expose updated_at filter for dag run and task instance endpoints (#28636)
Increase length of user identifier columns (#29061)
Update gantt chart UI to display queued state of tasks (#28686)
Add index on log.dttm (#28944)
Display only the running configuration in configurations view (#28892)
Cap dropdown menu size dynamically (#28736)
Added JSON linter to connection edit / add UI for field extra. On connection edit screen, existing extra data will be displayed indented (#28583)
Use labels instead of pod name for pod log read in k8s exec (#28546)
Use time not tries for queued & running re-checks. (#28586)
CustomTTYColoredFormatter should inherit TimezoneAware formatter (#28439)
Improve past depends handling in Airflow CLI tasks.run command (#28113)
Support using a list of callbacks in on_*_callback/sla_miss_callbacks (#28469)
Better table name validation for db clean (#28246)
Use object instead of array in config.yml for config template (#28417)
Add markdown rendering for task notes. (#28245)
Show mapped task groups in grid view (#28208)
Add renamed and previous_name in config sections (#28324)
Speed up most Users/Role CLI commands (#28259)
Speed up Airflow role list command (#28244)
Refactor serialization (#28067, #30819, #30823)
Allow longer pod names for k8s executor / KPO (#27736)
Updates health check endpoint to include triggerer status (#27755)
Bug Fixes¶
Fix static_folder for cli app (#30952)
Initialize plugins for cli appbuilder (#30934)
Fix dag file processor heartbeat to run only if necessary (#30899)
Fix KubernetesExecutor sending state to scheduler (#30872)
Count mapped upstream only if all are finished (#30641)
ExternalTaskSensor: add external_task_group_id to template_fields (#30401)
Improve url detection for task instance details (#30779)
Use material icons for dag import error banner (#30771)
Fix misc grid/graph view UI bugs (#30752)
Add a collapse grid button (#30711)
Fix d3 dependencies (#30702)
Simplify logic to resolve tasks stuck in queued despite stalled_task_timeout (#30375)
When clearing task instances try to get associated DAGs from database (#29065)
Fix mapped tasks partial arguments when DAG default args are provided (#29913)
Deactivate DAGs deleted from within zip files (#30608)
Recover from too old resource version exception by retrieving the latest resource_version (#30425)
Fix possible race condition when refreshing DAGs (#30392)
Use custom validator for OpenAPI request body (#30596)
Fix TriggerDagRunOperator with deferrable parameter (#30406)
Speed up dag runs deletion (#30330)
Do not use template literals to construct html elements (#30447)
Fix deprecation warning in example_sensor_decorator DAG (#30513)
Avoid logging sensitive information in triggerer job log (#30110)
Add a new parameter for base sensor to catch the exceptions in poke method (#30293)
Fix dag run conf encoding with non-JSON serializable values (#28777)
Added fixes for Airflow to be usable on Windows Dask-Workers (#30249)
Force DAG last modified time to UTC (#30243)
Fix EmptySkipOperator in example dag (#30269)
Make the webserver startup respect update_fab_perms (#30246)
Ignore error when changing log folder permissions (#30123)
Disable ordering DagRuns by note (#30043)
Fix reading logs from finished KubernetesExecutor worker pod (#28817)
Mask out non-access bits when comparing file modes (#29886)
Remove Run task action from UI (#29706)
Fix log tailing issues with legacy log view (#29496)
Fixes to how DebugExecutor handles sensors (#28528)
Ensure that pod_mutation_hook is called before logging the pod name (#28534)
Handle OverflowError on exponential backoff in next_run_calculation (#28172)
Misc/Internal¶
Make eager upgrade additional dependencies optional (#30811)
Upgrade to pip 23.1.1 (#30808)
Remove protobuf limitation from eager upgrade (#30182)
Deprecate skip_exit_code in BashOperator (#30734)
Remove gauge scheduler.tasks.running (#30374)
Bump json5 to 1.0.2 and eslint-plugin-import to 2.27.5 in /airflow/www (#30568)
Add tests to PythonOperator (#30362)
Add asgiref as a core dependency (#30527)
Discovery safe mode toggle comment clarification (#30459)
Upgrade moment-timezone package to fix Tehran tz (#30455)
Bump loader-utils from 2.0.0 to 2.0.4 in /airflow/www (#30319)
Bump babel-loader from 8.1.0 to 9.1.0 in /airflow/www (#30316)
DagBag: Use dag.fileloc instead of dag.full_filepath in exception message (#30610)
Change log level of serialization information (#30239)
Minor DagRun helper method cleanup (#30092)
Improve type hinting in stats.py (#30024)
Limit importlib-metadata backport to < 5.0.0 (#29924)
Align cncf provider file names with AIP-21 (#29905)
Upgrade FAB to 4.3.0 (#29766)
Clear ExecutorLoader cache in tests (#29849)
Lazy load Task Instance logs in UI (#29827)
added warning log for max page limit exceeding api calls (#29788)
Aggressively cache entry points in process (#29625)
Don’t use importlib.metadata to get Version for speed (#29723)
Upgrade Mypy to 1.0 (#29468)
Rename db export-cleaned to db export-archived (#29450)
listener: simplify API by replacing SQLAlchemy event-listening by direct calls (#29289)
No multi-line log entry for bash env vars (#28881)
Switch to ruff for faster static checks (#28893)
Remove horizontal lines in TI logs (#28876)
Make allowed_deserialization_classes more intuitive (#28829)
Propagate logs to stdout when in k8s executor pod (#28440, #30860)
Fix code readability, add docstrings to json_client (#28619)
AIP-51 - Misc. Compatibility Checks (#28375)
Fix is_local for LocalKubernetesExecutor (#28288)
Move Hive macros to the provider (#28538)
Rerun flaky PinotDB integration test (#28562)
Add pre-commit hook to check session default value (#28007)
Refactor get_mapped_group_summaries for web UI (#28374)
Add support for k8s 1.26 (#28320)
Replace freezegun with time-machine (#28193)
Completed D400 for airflow/kubernetes/* (#28212)
Completed D400 for multiple folders (#27969)
Drop k8s 1.21 and 1.22 support (#28168)
Remove unused task_queue attr from k8s scheduler class (#28049)
Completed D400 for multiple folders (#27767, #27768)
Doc only changes¶
Add instructions on how to avoid accidental airflow upgrade/downgrade (#30813)
Add explicit information about how to write task logs (#30732)
Better explanation on how to log from tasks (#30746)
Use correct import path for Dataset (#30617)
Create audit_logs.rst (#30405)
Adding taskflow API example for sensors (#30344)
Add clarification about timezone aware dags (#30467)
Clarify params documentation (#30345)
Fix unit for task duration metric (#30273)
Update dag-run.rst for dead links of cli commands (#30254)
Add Write efficient Python code section to Reducing DAG complexity (#30158)
Allow to specify which connection, variable or config are being looked up in the backend using *_lookup_pattern parameters (#29580)
Add Documentation for notification feature extension (#29191)
Clarify that executor interface is public but instances are not (#29200)
Add Public Interface description to Airflow documentation (#28300)
Add documentation for task group mapping (#28001)
Some fixes to metrics doc (#30290)
Airflow 2.5.3 (2023-04-01)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Fix DagProcessorJob integration for standalone dag-processor (#30278)
Fix proper termination of gunicorn when it hangs (#30188)
Fix XCom.get_one exactly one exception text (#30183)
Correct the VARCHAR size to 250. (#30178)
Revert fix for on_failure_callback when task receives a SIGTERM (#30165)
Move read only property to DagState to fix generated docs (#30149)
Ensure that dag.partial_subset doesn’t mutate task group properties (#30129)
Fix inconsistent returned value of airflow dags next-execution cli command (#30117)
Fix www/utils.dag_run_link redirection (#30098)
Fix TriggerRuleDep when the mapped tasks count is 0 (#30084)
Dag processor manager, add retry_db_transaction to _fetch_callbacks (#30079)
Fix db clean command for mysql db (#29999)
Avoid considering EmptyOperator in mini scheduler (#29979)
Fix some long known Graph View UI problems (#29971, #30355, #30360)
Fix dag docs toggle icon initial angle (#29970)
Fix tags selection in DAGs UI (#29944)
Including airflow/example_dags/sql/sample.sql in MANIFEST.in (#29883)
Fixing broken filter in /taskinstance/list view (#29850)
Allow generic param dicts (#29782)
Fix update_mask in patch variable route (#29711)
Strip markup from app_name if instance_name_has_markup = True (#28894)
Misc/Internal¶
Revert “Also limit importlib on Python 3.9 (#30069)” (#30209)
Add custom_operator_name to @task.sensor tasks (#30131)
Bump webpack from 5.73.0 to 5.76.0 in /airflow/www (#30112)
Formatted config (#30103)
Remove upper bound limit of astroid (#30033)
Remove accidentally merged vendor daemon patch code (#29895)
Fix warning in airflow tasks test command regarding absence of data_interval (#27106)
Doc only changes¶
Adding more information regarding top level code (#30040)
Update workday example (#30026)
Fix some typos in the DAGs docs (#30015)
Update set-up-database.rst (#29991)
Fix some typos on the kubernetes documentation (#29936)
Fix some punctuation and grammar (#29342)
Airflow 2.5.2 (2023-03-15)¶
Significant Changes¶
The date-time fields passed as API parameters or Params should be RFC3339-compliant (#29395)¶
In case of API calls, it was possible that a “+” passed as part of a date-time field was not URL-encoded, and such date-time fields could pass validation. Such date-time parameters should now be URL-encoded (as %2B).
In case of parameters, we still allow ISO8601-compliant date-time (so for example it is possible that ‘ ‘ was used instead of T separating date from time and no timezone was specified) but we raise a deprecation warning.
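When building API URLs by hand, percent-encoding the timestamp avoids the unencoded “+” problem. The sketch below uses only the standard library; the helper name is hypothetical, and passing safe="" is a choice made here so that every reserved character, including “+” and “:”, is encoded.

```python
from datetime import datetime, timezone
from urllib.parse import quote, unquote


def encode_datetime_param(value: datetime) -> str:
    """Percent-encode an RFC3339-style timestamp for use in a query string."""
    return quote(value.isoformat(), safe="")


encoded = encode_datetime_param(datetime(2023, 3, 15, 10, 0, tzinfo=timezone.utc))
print(encoded)  # 2023-03-15T10%3A00%3A00%2B00%3A00
```

Clients that pass a params dictionary to an HTTP library usually get this encoding for free; the issue mainly affects URLs assembled through string concatenation.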
Default for [webserver] expose_hostname changed to False (#29547)¶
The default for [webserver] expose_hostname has been set to False, instead of True. This means administrators must opt-in to expose webserver hostnames to end users.
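Administrators who want the previous behaviour can opt in through airflow.cfg or an environment variable. The sketch below relies on Airflow’s AIRFLOW__{SECTION}__{KEY} environment variable naming convention; the helper name is hypothetical.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("webserver", "expose_hostname")

# Environment for a webserver process that opts back in to exposing hostnames.
webserver_env = dict(os.environ, **{name: "True"})
print(name)  # AIRFLOW__WEBSERVER__EXPOSE_HOSTNAME
```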
Bug Fixes¶
Fix validation of date-time field in API and Parameter schemas (#29395)
Fix grid logs for large logs (#29390)
Fix on_failure_callback when task receives a SIGTERM (#29743)
Update min version of python-daemon to fix containerd file limits (#29916)
POST /dagRuns API should 404 if dag not active (#29860)
DAG list sorting lost when switching page (#29756)
Fix Scheduler crash when clear a previous run of a normal task that is now a mapped task (#29645)
Convert moment with timezone to UTC instead of raising an exception (#29606)
Fix clear dag run openapi spec responses by adding additional return type (#29600)
Don’t display empty rendered attrs in Task Instance Details page (#29545)
Remove section check from get-value command (#29541)
Do not show version/node in UI traceback for unauthenticated user (#29501)
Make prev_logical_date variable offset-aware (#29454)
Fix nested fields rendering in mapped operators (#29451)
Datasets, next_run_datasets, remove unnecessary timestamp filter (#29441)
Edgemodifier refactoring w/ labels in TaskGroup edge case (#29410)
Fix Rest API update user output (#29409)
Ensure Serialized DAG is deleted (#29407)
Persist DAG and task doc values in TaskFlow API if explicitly set (#29399)
Redirect to the origin page with all the params (#29212)
Fixing Task Duration view in case of manual DAG runs only (#22015) (#29195)
Remove poke method to fall back to parent implementation (#29146)
PR: Introduced fix to run tasks on Windows systems (#29107)
Fix warning in migrations about old config. (#29092)
Emit dagrun failed duration when timeout (#29076)
Handling error on cluster policy itself (#29056)
Fix kerberos authentication for the REST API. (#29054)
Fix leak sensitive field via V1EnvVar on exception (#29016)
Sanitize url_for arguments before they are passed (#29039)
Fix dag run trigger with a note. (#29228)
Write action log to DB when DAG run is triggered via API (#28998)
Resolve all variables in pickled XCom iterator (#28982)
Allow URI without authority and host blocks in airflow connections add (#28922)
Be more selective when adopting pods with KubernetesExecutor (#28899)
KubernetesExecutor sends state even when successful (#28871)
Annotate KubernetesExecutor pods that we don’t delete (#28844)
Throttle streaming log reads (#28818)
Introduce dag processor job (#28799)
Fix #28391 manual task trigger from UI fails for k8s executor (#28394)
Logging poke info when external dag is not none and task_id and task_ids are none (#28097)
Fix inconsistencies in checking edit permissions for a DAG (#20346)
Misc/Internal¶
Add a check for not templateable fields (#29821)
Removed continue for not in (#29791)
Move extra links position in grid view (#29703)
Bump undici from 5.9.1 to 5.19.1 (#29583)
Change expose_hostname default to false (#29547)
Change permissions of config/password files created by airflow (#29495)
Use newer setuptools v67.2.0 (#29465)
Increase max height for grid view elements (#29367)
Clarify description of worker control config (#29247)
Bump ua-parser-js from 0.7.31 to 0.7.33 in /airflow/www (#29172)
Remove upper bound limitation for pytest (#29086)
Check for run_id url param when linking to graph/gantt views (#29066)
Clarify graph view dynamic task labels (#29042)
Fixing import error for dataset (#29007)
Update how PythonSensor returns values from python_callable (#28932)
Add dep context description for better log message (#28875)
Bump swagger-ui-dist from 3.52.0 to 4.1.3 in /airflow/www (#28824)
Limit importlib-metadata backport to < 5.0.0 (#29924, #30069)
Doc only changes¶
Update pipeline.rst - Fix query in merge_data() task (#29158)
Correct argument name of Workday timetable in timetable.rst (#29896)
Update ref anchor for env var link in Connection how-to doc (#29816)
Better description for limit in api (#29773)
Description of dag_processing.last_duration (#29740)
Update docs re: template_fields typing and subclasses (#29725)
Fix formatting of Dataset inlet/outlet note in TaskFlow concepts (#29678)
Specific use-case: adding packages via requirements.txt in compose (#29598)
Detect is ‘docker-compose’ existing (#29544)
Add Landing Times entry to UI docs (#29511)
Improve health checks in example docker-compose and clarify usage (#29408)
Remove notes param from TriggerDagRunOperator docstring (#29298)
Use schedule param rather than timetable in Timetables docs (#29255)
Add trigger process to Airflow Docker docs (#29203)
Update set-up-database.rst (#29104)
Several improvements to the Params doc (#29062)
Email Config docs more explicit env var examples (#28845)
Listener plugin example added (#27905)
Airflow 2.5.1 (2023-01-20)¶
Significant Changes¶
Trigger gevent monkeypatching via environment variable (#28283)¶
If you are using gevent for your webserver deployment and used local settings to monkeypatch gevent, you might want to replace local settings patching with an _AIRFLOW_PATCH_GEVENT environment variable set to 1 in your webserver. This ensures gevent patching is done as early as possible.
Bug Fixes¶
Fix masking of non-sensitive environment variables (#28802)
Remove swagger-ui extra from connexion and install swagger-ui-dist via npm package (#28788)
Fix UIAlert should_show when AUTH_ROLE_PUBLIC set (#28781)
Only patch single label when adopting pod (#28776)
Update CSRF token to expire with session (#28730)
Fix “airflow tasks render” cli command for mapped task instances (#28698)
Allow XComArgs for external_task_ids of ExternalTaskSensor (#28692)
Row-lock TIs to be removed during mapped task expansion (#28689)
Handle ConnectionReset exception in Executor cleanup (#28685)
Fix description of output redirection for access_log for gunicorn (#28672)
Add back join to zombie query that was dropped in #28198 (#28544)
Fix calendar view for CronTriggerTimeTable dags (#28411)
After running the DAG the employees table is empty. (#28353)
Fix DetachedInstanceError when finding zombies in Dag Parsing process (#28198)
Nest header blocks in divs to fix dagid copy nit on dag.html (#28643)
Fix UI caret direction (#28624)
Guard not-yet-expanded ti in trigger rule dep (#28592)
Move TI setNote endpoints under TaskInstance in OpenAPI (#28566)
Consider previous run in CronTriggerTimetable (#28532)
Ensure correct log dir in file task handler (#28477)
Fix bad pods pickled in executor_config (#28454)
Add ensure_ascii=False in trigger dag run API (#28451)
Add setters to MappedOperator on_*_callbacks (#28313)
Fix ti._try_number for deferred and up_for_reschedule tasks (#26993)
separate callModal from dag.js (#28410)
A manual run can’t look like a scheduled one (#28397)
Don’t show task/run durations when there is no start_date (#28395)
Maintain manual scroll position in task logs (#28386)
Correctly select a mapped task’s “previous” task (#28379)
Trigger gevent monkeypatching via environment variable (#28283)
Fix db clean warnings (#28243)
Make arguments ‘offset’ and ‘length’ not required (#28234)
Make live logs reading work for “other” k8s executors (#28213)
Add custom pickling hooks to LazyXComAccess (#28191)
fix next run datasets error (#28165)
Ensure that warnings from @dag decorator are reported in dag file (#28153)
Do not warn when airflow dags tests command is used (#28138)
Ensure the dagbag_size metric decreases when files are deleted (#28135)
Improve run/task grid view actions (#28130)
Make BaseJob.most_recent_job favor “running” jobs (#28119)
Don’t emit FutureWarning when code not calling old key (#28109)
Add airflow.api.auth.backend.session to backend sessions in compose (#28094)
Resolve false warning about calling conf.get on moved item (#28075)
Return list of tasks that will be changed (#28066)
Handle bad zip files nicely when parsing DAGs. (#28011)
Prevent double loading of providers from local paths (#27988)
Fix deadlock when chaining multiple empty mapped tasks (#27964)
fix: current_state method on TaskInstance doesn’t filter by map_index (#27898)
Don’t log CLI actions if db not initialized (#27851)
Make sure we can get out of a faulty scheduler state (#27834)
dagrun, next_dagruns_to_examine, add MySQL index hint (#27821)
Handle DAG disappearing mid-flight when dag verification happens (#27720)
fix: continue checking sla (#26968)
Allow generation of connection URI to work when no conn type (#26765)
Misc/Internal¶
Remove limit for dnspython after eventlet got fixed (#29004)
Limit dnspython to <2.3.0 until eventlet incompatibility is solved (#28962)
Add automated version replacement in example dag indexes (#28090)
Cleanup and do housekeeping with plugin examples (#28537)
Limit SQLAlchemy to below 2.0 (#28725)
Bump json5 from 1.0.1 to 1.0.2 in /airflow/www (#28715)
Fix some docs on using sensors with taskflow (#28708)
Change Architecture and OperatingSystem classes into Enums (#28627)
Add doc-strings and small improvement to email util (#28634)
Fix Connection.get_extra type (#28594)
navbar, cap dropdown size, and add scroll bar (#28561)
Emit warnings for conf.get* from the right source location (#28543)
Move MyPY plugins of ours to dev folder (#28498)
Add retry to purge_inactive_dag_warnings (#28481)
Re-enable Plyvel on ARM as it now builds cleanly (#28443)
Add SIGUSR2 handler for LocalTaskJob and workers to aid debugging (#28309)
Convert test_task_command to Pytest and unquarantine tests in it (#28247)
Make invalid characters exception more readable (#28181)
Bump decode-uri-component from 0.2.0 to 0.2.2 in /airflow/www (#28080)
Use asserts instead of exceptions for executor not started (#28019)
Simplify dataset subgraph logic (#27987)
Order TIs by map_index (#27904)
Additional info about Segmentation Fault in LocalTaskJob (#27381)
Doc only changes¶
Mention mapped operator in cluster policy doc (#28885)
Slightly improve description of Dynamic DAG generation preamble (#28650)
Restructure Docs (#27235)
Update scheduler docs about low priority tasks (#28831)
Clarify that versioned constraints are fixed at release time (#28762)
Clarify about docker compose (#28729)
Adding an example dag for dynamic task mapping (#28325)
Use docker compose v2 command (#28605)
Add AIRFLOW_PROJ_DIR to docker-compose example (#28517)
Remove outdated Optional Provider Feature outdated documentation (#28506)
Add documentation for [core] mp_start_method config (#27993)
Documentation for the LocalTaskJob return code counter (#27972)
Note which versions of Python are supported (#27798)
Airflow 2.5.0 (2022-12-02)¶
Significant Changes¶
airflow dags test no longer performs a backfill job (#26400)¶
To make airflow dags test more useful as a testing and debugging tool, it no longer runs a backfill job and instead runs a “local task runner”. Users can still backfill their DAGs using the airflow dags backfill command.
Airflow config section kubernetes renamed to kubernetes_executor (#26873)¶
KubernetesPodOperator no longer considers any core kubernetes config params, so this section now applies only to the Kubernetes executor. Renaming it reduces the potential for confusion.
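As a minimal sketch of what the rename means for an existing airflow.cfg, the snippet below moves every option from a [kubernetes] section into [kubernetes_executor] using only the Python standard library. The helper name rename_section and the namespace option in the sample are illustrative assumptions, not part of Airflow. Note that configparser drops comments when it rewrites a file, so review the output before replacing a real config.

```python
import configparser
import io


def rename_section(cfg_text: str, old: str = "kubernetes",
                   new: str = "kubernetes_executor") -> str:
    """Return cfg_text with section `old` renamed to `new`, options preserved."""
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.has_section(old) and not parser.has_section(new):
        parser.add_section(new)
        for key, value in parser.items(old):
            parser.set(new, key, value)
        parser.remove_section(old)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative input; `namespace` stands in for whatever options your file sets.
sample = "[kubernetes]\nnamespace = airflow\n"
migrated = rename_section(sample)
```

The function leaves the text unchanged in meaning if the old section is absent or the new one already exists, so it should be safe to run more than once.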
AirflowException is now thrown as soon as any dependent task of ExternalTaskSensor fails (#27190)¶
ExternalTaskSensor no longer hangs indefinitely when failed_states is set, an execution_date_fn is used, and some but not all of the dependent tasks fail. Instead, an AirflowException is thrown as soon as any of the dependent tasks fail. Any code handling this failure in addition to timeouts should move to catching the AirflowException base class and not only the AirflowSensorTimeout subclass.
The Airflow config option scheduler.deactivate_stale_dags_interval has been renamed to scheduler.parsing_cleanup_interval (#27828).¶
The old option will continue to work but will issue deprecation warnings, and will be removed entirely in Airflow 3.
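If the option is set through an environment variable rather than airflow.cfg, the rename applies to the variable name too, since Airflow reads options from variables of the form AIRFLOW__{SECTION}__{KEY}. The following is a small sketch, assuming a plain dict of environment values; the helper name migrate_env is hypothetical.

```python
from __future__ import annotations

OLD = "AIRFLOW__SCHEDULER__DEACTIVATE_STALE_DAGS_INTERVAL"
NEW = "AIRFLOW__SCHEDULER__PARSING_CLEANUP_INTERVAL"


def migrate_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with the old variable renamed.

    If both names are present, the value under the new name wins and the
    old entry is dropped.
    """
    result = dict(env)
    if OLD in result:
        old_value = result.pop(OLD)
        result.setdefault(NEW, old_value)
    return result
```

Called as migrate_env(dict(os.environ)), this gives a mapping you can inspect or export before removing the deprecated name from deployment manifests.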
New Features¶
TaskRunner: notify of component start and finish (#27855)
Add DagRun state change to the Listener plugin system (#27113)
Metric for raw task return codes (#27155)
Add logic for XComArg to pull specific map indexes (#27771)
Clear TaskGroup (#26658, #28003)
Add critical section query duration metric (#27700)
Add: #23880 :: Audit log for AirflowModelViews(Variables/Connection) (#24079, #27994, #27923)
Add postgres 15 support (#27444)
Expand tasks in mapped group at run time (#27491)
reset commits, clean submodules (#27560)
scheduler_job, add metric for scheduler loop timer (#27605)
Allow datasets to be used in taskflow (#27540)
Add expanded_ti_count to ti context (#27680)
Add user comment to task instance and dag run (#26457, #27849, #27867)
Enable copying DagRun JSON to clipboard (#27639)
Implement extra controls for SLAs (#27557)
add dag parsed time in DAG view (#27573)
Add max_wait for exponential_backoff in BaseSensor (#27597)
Expand tasks in mapped group at parse time (#27158)
Add disable retry flag on backfill (#23829)
Adding sensor decorator (#22562)
Api endpoint update ti (#26165)
Filtering datasets by recent update events (#26942)
Support Is /not Null filter for value is None on webui (#26584)
Add search to datasets list (#26893)
Split out and handle ‘params’ in mapped operator (#26100)
Add authoring API for TaskGroup mapping (#26844)
Add one_done trigger rule (#26146)
Create a more efficient airflow dag test command that also has better local logging (#26400)
Support add/remove permissions to roles commands (#26338)
Auto tail file logs in Web UI (#26169)
Add triggerer info to task instance in API (#26249)
Flag to deserialize value on custom XCom backend (#26343)
Improvements¶
Allow depth-first execution (#27827)
UI: Update offset height if data changes (#27865)
Improve TriggerRuleDep typing and readability (#27810)
Make views requiring session, keyword only args (#27790)
Optimize TI.xcom_pull() with explicit task_ids and map_indexes (#27699)
Allow hyphens in pod id used by k8s executor (#27737)
optimise task instances filtering (#27102)
Use context managers to simplify log serve management (#27756)
Fix formatting leftovers (#27750)
Improve task deadlock messaging (#27734)
Improve “sensor timeout” messaging (#27733)
Replace urlparse with urlsplit (#27389)
Align TaskGroup semantics to AbstractOperator (#27723)
Add new files to parsing queue on every loop of dag processing (#27060)
Make Kubernetes Executor & Scheduler resilient to error during PMH execution (#27611)
Separate dataset deps into individual graphs (#27356)
Use log.exception where more economical than log.error (#27517)
Move validation branch_task_ids into SkipMixin (#27434)
Coerce LazyXComAccess to list when pushed to XCom (#27251)
Update cluster-policies.rst docs (#27362)
Add warning if connection type already registered within the provider (#27520)
Activate debug logging in commands with --verbose option (#27447)
Add classic examples for Python Operators (#27403)
change .first() to .scalar() (#27323)
Improve reset_dag_run description (#26755)
Add examples and howtos about sensors (#27333)
Make grid view widths adjustable (#27273)
Sorting plugins custom menu links by category before name (#27152)
Simplify DagRun.verify_integrity (#26894)
Add mapped task group info to serialization (#27027)
Correct the JSON style used for Run config in Grid View (#27119)
No extra__conn_type__ prefix required for UI behaviors (#26995)
Improve dataset update blurb (#26878)
Rename kubernetes config section to kubernetes_executor (#26873)
decode params for dataset searches (#26941)
Get rid of the DAGRun details page & rely completely on Grid (#26837)
Fix scheduler crashloopbackoff when using hostname_callable (#24999)
Reduce log verbosity in KubernetesExecutor. (#26582)
Don’t iterate tis list twice for no reason (#26740)
Clearer code for PodGenerator.deserialize_model_file (#26641)
Don’t import kubernetes unless you have a V1Pod (#26496)
Add updated_at column to DagRun and Ti tables (#26252)
Move the deserialization of custom XCom Backend to 2.4.0 (#26392)
Avoid calculating all elements when one item is needed (#26377)
Add __future__.annotations automatically by isort (#26383)
Handle list when serializing expand_kwargs (#26369)
Apply PEP-563 (Postponed Evaluation of Annotations) to core airflow (#26290)
Add more weekday operator and sensor examples #26071 (#26098)
Bug Fixes¶
Gracefully handle whole config sections being renamed (#28008)
Add allow list for imports during deserialization (#27887)
Soft delete datasets that are no longer referenced in DAG schedules or task outlets (#27828)
Redirect to home view when there are no valid tags in the URL (#25715)
Refresh next run datasets info in dags view (#27839)
Make MappedTaskGroup depend on its expand inputs (#27876)
Make DagRun state updates for paused DAGs faster (#27725)
Don’t explicitly set include_examples to False on task run command (#27813)
Fix menu border color (#27789)
Fix backfill queued task getting reset to scheduled state. (#23720)
Fix clearing child dag mapped tasks from parent dag (#27501)
Handle json encoding of V1Pod in task callback (#27609)
Fix ExternalTaskSensor can’t check zipped dag (#27056)
Avoid re-fetching DAG run in TriggerDagRunOperator (#27635)
Continue on exception when retrieving metadata (#27665)
External task sensor fail fix (#27190)
Add the default None when pop actions (#27537)
Display parameter values from serialized dag in trigger dag view. (#27482, #27944)
Move TriggerDagRun conf check to execute (#27035)
Resolve trigger assignment race condition (#27072)
Update google_analytics.html (#27226)
Fix some bug in web ui dags list page (auto-refresh & jump search null state) (#27141)
Fixed broken URL for docker-compose.yaml (#26721)
Fix xcom arg.py .zip bug (#26636)
Fix 404 taskInstance errors and split into two tables (#26575)
Fix browser warning of improper thread usage (#26551)
template rendering issue fix (#26390)
Clear autoregistered DAGs if there are any import errors (#26398)
Fix from airflow import version lazy import (#26239)
allow scroll in triggered dag runs modal (#27965)
Misc/Internal¶
Remove is_mapped attribute (#27881)
Simplify FAB table resetting (#27869)
Fix old-style typing in Base Sensor (#27871)
Switch (back) to late imports (#27730)
Completed D400 for multiple folders (#27748)
simplify notes accordion test (#27757)
completed D400 for airflow/callbacks/* airflow/cli/* (#27721)
Completed D400 for airflow/api_connexion/* directory (#27718)
Completed D400 for airflow/listener/* directory (#27731)
Completed D400 for airflow/lineage/* directory (#27732)
Update API & Python Client versions (#27642)
Completed D400 & D401 for airflow/api/* directory (#27716)
Completed D400 for multiple folders (#27722)
Bump minimatch from 3.0.4 to 3.0.8 in /airflow/www (#27688)
Bump loader-utils from 1.4.1 to 1.4.2 in /airflow/www (#27697)
Disable nested task mapping for now (#27681)
bump alembic minimum version (#27629)
remove unused code.html (#27585)
Enable python string normalization everywhere (#27588)
Upgrade dependencies in order to avoid backtracking (#27531)
Strengthen a bit and clarify importance of triaging issues (#27262)
Deduplicate type hints (#27508)
Add stub ‘yield’ to BaseTrigger.run (#27416)
Remove upper-bound limit to dask (#27415)
Limit Dask to under 2022.10.1 (#27383)
Update old style typing (#26872)
Enable string normalization for docs (#27269)
Slightly faster up/downgrade tests (#26939)
Deprecate use of core get_kube_client in PodManager (#26848)
Add memray files to gitignore / dockerignore (#27001)
Bump sphinx and sphinx-autoapi (#26743)
Simplify RTIF.delete_old_records() (#26667)
migrate last react files to typescript (#26112)
Work around pyupgrade edge cases (#26384)
Doc only changes¶
Document dag_file_processor_timeouts metric as deprecated (#27067)
Drop support for PostgreSQL 10 (#27594)
Update index.rst (#27529)
Add note about pushing the lazy XCom proxy to XCom (#27250)
Fix BaseOperator link (#27441)
[docs] best-practices add use variable with template example. (#27316)
docs for custom view using plugin (#27244)
Update graph view and grid view on overview page (#26909)
Documentation fixes (#26819)
make consistency on markup title string level (#26696)
Add documentation to dag test function (#26713)
Fix broken URL for docker-compose.yaml (#26726)
Add a note against use of top level code in timetable (#26649)
Fix example_datasets dag names (#26495)
Update docs: zip-like effect is now possible in task mapping (#26435)
changing to task decorator in docs from classic operator use (#25711)
Airflow 2.4.3 (2022-11-14)¶
Significant Changes¶
Make RotatingFilehandler used in DagProcessor non-caching (#27223)¶
If you want to reduce cache memory when CONFIG_PROCESSOR_MANAGER_LOGGER=True and you created your local settings before this release, you can update processor_manager_handler to use the airflow.utils.log.non_caching_file_handler.NonCachingRotatingFileHandler handler instead of logging.RotatingFileHandler.
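One way to apply this to a custom logging config is sketched below. It assumes the config is a standard logging dictConfig-style dict and that the handler is registered under the key processor_manager; both the key name and the helper use_non_caching_handler are assumptions for illustration, so check them against your own local settings.

```python
import copy

NON_CACHING = (
    "airflow.utils.log.non_caching_file_handler.NonCachingRotatingFileHandler"
)


def use_non_caching_handler(logging_config: dict,
                            handler_name: str = "processor_manager") -> dict:
    """Return a copy of logging_config whose named handler uses the
    non-caching rotating file handler class."""
    updated = copy.deepcopy(logging_config)
    handler = updated.get("handlers", {}).get(handler_name)
    if handler is not None:
        handler["class"] = NON_CACHING
    return updated


# Illustrative stand-in for a custom logging config; real configs carry more keys.
custom_config = {
    "version": 1,
    "handlers": {
        "processor_manager": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "dag_processor_manager.log",
        }
    },
}
patched = use_non_caching_handler(custom_config)
```

Working on a deep copy keeps the original dict intact, which helps when the same base config is shared by several settings modules.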
Bug Fixes¶
Fix double logging with some task logging handler (#27591)
Replace FAB url filtering function with Airflow’s (#27576)
Fix mini scheduler expansion of mapped task (#27506)
SLAMiss is nullable and not always given back when pulling task instances (#27423)
Fix behavior of _ when searching for DAGs (#27448)
Fix getting the dag/task ids from BaseExecutor (#27550)
Fix SQLAlchemy primary key black-out error on DDRQ (#27538)
Fix IntegrityError during webserver startup (#27297)
Add case insensitive constraint to username (#27266)
Fix python external template keys (#27256)
Reduce extraneous task log requests (#27233)
Make RotatingFilehandler used in DagProcessor non-caching (#27223)
Listener: Set task on SQLAlchemy TaskInstance object (#27167)
Fix dags list page auto-refresh & jump search null state (#27141)
Set executor.job_id to BackfillJob.id for backfills (#27020)
Misc/Internal¶
Bump loader-utils from 1.4.0 to 1.4.1 in /airflow/www (#27552)
Reduce log level for k8s TCP_KEEPALIVE etc warnings (#26981)
Doc only changes¶
Use correct executable in docker compose docs (#27529)
Fix wording in DAG Runs description (#27470)
Document that KubernetesExecutor overwrites container args (#27450)
Fix BaseOperator links (#27441)
Correct timer units to seconds from milliseconds. (#27360)
Add missed import in the Trigger Rules example (#27309)
Update SLA wording to reflect it is relative to Dag Run start. (#27111)
Add kerberos environment variables to the docs (#27028)
Airflow 2.4.2 (2022-10-23)¶
Significant Changes¶
Default for [webserver] expose_stacktrace changed to False (#27059)¶
The default for [webserver] expose_stacktrace has been set to False, instead of True. This means administrators must opt in to expose tracebacks to end users.
Bug Fixes¶
Make tracebacks opt-in (#27059)
Add missing AUTOINC/SERIAL for FAB tables (#26885)
Add separate error handler for 405(Method not allowed) errors (#26880)
Don’t re-patch pods that are already controlled by current worker (#26778)
Handle mapped tasks in task duration chart (#26722)
Fix task duration cumulative chart (#26717)
Avoid 500 on dag redirect (#27064)
Filter dataset dependency data on webserver (#27046)
Remove double collection of dags in airflow dags reserialize (#27030)
Fix auto refresh for graph view (#26926)
Don’t overwrite connection extra with invalid json (#27142)
Fix next run dataset modal links (#26897)
Change dag audit log sort by date from asc to desc (#26895)
Bump min version of jinja2 (#26866)
Add missing colors to state_color_mapping jinja global (#26822)
Fix running debuggers inside airflow tasks test (#26806)
Fix warning when using xcomarg dependencies (#26801)
demote Removed state in priority for displaying task summaries (#26789)
Ensure the log messages from operators during parsing go somewhere (#26779)
Add restarting state to TaskState Enum in REST API (#26776)
Allow retrieving error message from data.detail (#26762)
Simplify origin string cleaning (#27143)
Remove DAG parsing from StandardTaskRunner (#26750)
Fix non-hidden cumulative chart on duration view (#26716)
Remove TaskFail duplicates check (#26714)
Fix airflow tasks run --local when dags_folder differs from that of processor (#26509)
Fix yarn warning from d3-color (#27139)
Fix version for a couple configurations (#26491)
Revert “No grid auto-refresh for backfill dag runs (#25042)” (#26463)
Retry on Airflow Schedule DAG Run DB Deadlock (#26347)
Misc/Internal¶
Clean-ups around task-mapping code (#26879)
Move user-facing string to template (#26815)
add icon legend to datasets graph (#26781)
Bump sphinx and sphinx-autoapi (#26743)
Simplify RTIF.delete_old_records() (#26667)
Bump FAB to 4.1.4 (#26393)
Doc only changes¶
Fixed triple quotes in task group example (#26829)
Documentation fixes (#26819)
make consistency on markup title string level (#26696)
Add a note against use of top level code in timetable (#26649)
Fix broken URL for docker-compose.yaml (#26726)
Airflow 2.4.1 (2022-09-30)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
When rendering template, unmap task in context (#26702)
Fix scroll overflow for ConfirmDialog (#26681)
Resolve deprecation warning re Table.exists() (#26616)
Fix XComArg zip bug (#26636)
Use COALESCE when ordering runs to handle NULL (#26626)
Check user is active (#26635)
No missing user warning for public admin (#26611)
Allow MapXComArg to resolve after serialization (#26591)
Resolve warning about DISTINCT ON query on dags view (#26608)
Log warning when secret backend kwargs is invalid (#26580)
Fix grid view log try numbers (#26556)
Template rendering issue in passing templates_dict to task decorator (#26390)
Fix Deferrable stuck as scheduled during backfill (#26205)
Suppress SQLALCHEMY_TRACK_MODIFICATIONS warning in db init (#26617)
Correctly set json_provider_class on Flask app so it uses our encoder (#26554)
Fix WSGI root app (#26549)
Fix deadlock when mapped task with removed upstream is rerun (#26518)
ExecutorConfigType should be cacheable (#26498)
Fix proper joining of the path for logs retrieved from celery workers (#26493)
DAG Deps extends base_template (#26439)
Don’t update backfill run from the scheduler (#26342)
Doc only changes¶
Clarify owner links document (#26515)
Fix invalid RST in dataset concepts doc (#26434)
Document the non-sensitive-only option for expose_config (#26507)
Fix example_datasets dag names (#26495)
Zip-like effect is now possible in task mapping (#26435)
Use task decorator in docs instead of classic operators (#25711)
Airflow 2.4.0 (2022-09-19)¶
Significant Changes¶
Data-aware Scheduling and Dataset concept added to Airflow¶
New in this release of Airflow is the concept of Datasets, and with it a new way of scheduling DAGs: data-aware scheduling.
This allows DAG runs to be automatically created as a result of a task “producing” a dataset. In some ways this can be thought of as the inverse of TriggerDagRunOperator, where instead of the producing DAG controlling which DAGs get created, the consuming DAGs can “listen” for changes.
A dataset is identified by a URI:
from airflow import Dataset
# The URI doesn't have to be absolute
dataset = Dataset(uri="my-dataset")
# Or you can use a scheme to show where it lives.
dataset2 = Dataset(uri="s3://bucket/prefix")
To create a DAG that runs whenever a Dataset is updated, use the new schedule parameter (see below) and pass a list of one or more Datasets:
with DAG(dag_id='dataset-consumer', schedule=[dataset]):
    ...
And to mark a task as producing a dataset, pass the dataset(s) to the outlets attribute:
@task(outlets=[dataset])
def my_task():
    ...
# Or for classic operators
BashOperator(task_id="update-ds", bash_command=..., outlets=[dataset])
If you have the producer and consumer in different files, you do not need to use the same Dataset object: two Dataset()s created with the same URI are equal.
Datasets represent the abstract concept of a dataset, and (for now) do not have any direct read or write capability. In this release we are adding the foundational feature that we will build upon.
For more info on Datasets please see Data-aware scheduling.
Expanded dynamic task mapping support¶
Dynamic task mapping now includes support for expand_kwargs, zip and map.
For more info on dynamic task mapping please see Dynamic Task Mapping.
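As a rough intuition for how these three pieces combine, here is a plain-Python analogy that runs without Airflow. It is not the Airflow API: fan_out and add are hypothetical names, and the built-in zip and map only mimic the shape of the data flow (pair up upstream outputs, transform each element, then run one call per set of keyword arguments).

```python
def fan_out(func, kwargs_list):
    """Call func once per dict, like one mapped task instance per kwargs set."""
    return [func(**kwargs) for kwargs in kwargs_list]


def add(x, y):
    return x + y


# zip: pair up two upstream sequences element by element.
pairs = list(zip([1, 2, 3], [10, 20, 30]))

# map: transform each element into the keyword arguments a task expects.
kwargs_list = list(map(lambda pair: {"x": pair[0], "y": pair[1]}, pairs))

# expand_kwargs-style fan-out: one call per dict of keyword arguments.
results = fan_out(add, kwargs_list)
```

In Airflow itself these steps operate on task outputs that are resolved at run time rather than on in-memory lists.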
DAGS used in a context manager no longer need to be assigned to a module variable (#23592)¶
Previously you had to assign a DAG to a module-level variable in order for Airflow to pick it up. For example, this
with DAG(dag_id="example") as dag:
    ...
@dag
def dag_maker():
    ...
dag2 = dag_maker()
can become
with DAG(dag_id="example"):
    ...
@dag
def dag_maker():
    ...
dag_maker()
If you want to disable this behaviour for any reason, set auto_register=False on the DAG:
# This DAG will not be picked up by Airflow: auto-registration is off and it is not assigned to a variable
with DAG(dag_id="example", auto_register=False):
    ...
Deprecation of schedule_interval and timetable arguments (#25410)¶
We added a new DAG argument, schedule, that can accept a cron expression, timedelta object, timetable object, or list of dataset objects. The schedule_interval and timetable arguments are deprecated.
If you previously used the @daily cron preset, your DAG may have looked like this:
with DAG(
    dag_id="my_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
):
    ...
Going forward, you should use the schedule argument instead:
with DAG(
    dag_id="my_example",
    start_date=datetime(2021, 1, 1),
    schedule="@daily",
):
    ...
The same is true if you used a custom timetable. Previously you would have used the timetable argument:
with DAG(
    dag_id="my_example",
    start_date=datetime(2021, 1, 1),
    timetable=EventsTimetable(event_dates=[pendulum.datetime(2022, 4, 5)]),
):
    ...
Now you should use the schedule argument:
with DAG(
    dag_id="my_example",
    start_date=datetime(2021, 1, 1),
    schedule=EventsTimetable(event_dates=[pendulum.datetime(2022, 4, 5)]),
):
    ...
Removal of experimental Smart Sensors (#25507)¶
Smart Sensors were added in 2.0 and deprecated in favor of Deferrable operators in 2.2, and have now been removed.
airflow.contrib packages and deprecated modules are dynamically generated (#26153, #26179, #26167)¶
The airflow.contrib packages and the deprecated modules from Airflow 1.10 in the airflow.hooks, airflow.operators, and airflow.sensors packages are now dynamically generated modules. Users can continue using the deprecated contrib classes, but these classes are no longer visible to static code check tools and will be reported as missing. Users are encouraged to move to the non-deprecated classes.
DBApiHook and SQLSensor have moved (#24836)¶
DBApiHook and SQLSensor have been moved to the apache-airflow-providers-common-sql provider.
DAG runs sorting logic changed in grid view (#25090)¶
The ordering of DAG runs in the grid view has been changed to be more “natural”. The new logic generally orders by data interval, but a custom ordering can be applied by setting the DAG to use a custom timetable.
New Features¶
Add Data-aware Scheduling (AIP-48)
Add @task.short_circuit TaskFlow decorator (#25752)
Make execution_date_or_run_id optional in tasks test command (#26114)
Automatically register DAGs that are used in a context manager (#23592, #26398)
Add option of sending DAG parser logs to stdout. (#25754)
Support multiple DagProcessors parsing files from different locations. (#25935)
Implement ExternalPythonOperator (#25780)
Make execution_date optional for command dags test (#26111)
Implement expand_kwargs() against a literal list (#25925)
Add trigger rule tooltip (#26043)
Add conf parameter to CLI for airflow dags test (#25900)
Include scheduled slots in pools view (#26006)
Add output property to MappedOperator (#25604)
Add roles delete command to cli (#25854)
Add Airflow specific warning classes (#25799)
Add support for TaskGroup in ExternalTaskSensor (#24902)
Add @task.kubernetes taskflow decorator (#25663)
Add a way to import Airflow without side-effects (#25832)
Let timetables control generated run_ids. (#25795)
Allow per-timetable ordering override in grid view (#25633)
Grid logs for mapped instances (#25610, #25621, #25611)
Consolidate to one schedule param (#25410)
DAG regex flag in backfill command (#23870)
Adding support for owner links in the Dags view UI (#25280)
Ability to clear a specific DAG Run’s task instances via REST API (#23516)
Possibility to document DAG with a separate markdown file (#25509)
Add parsing context to DAG Parsing (#25161)
Implement CronTriggerTimetable (#23662)
Add option to mask sensitive data in UI configuration page (#25346)
Create new databases from the ORM (#24156)
Implement XComArg.zip(*xcom_args) (#25176)
Introduce sla_miss metric (#23402)
Implement map() semantic (#25085)
Add override method to TaskGroupDecorator (#25160)
Implement expand_kwargs() (#24989)
Add parameter to turn off SQL query logging (#24570)
Add DagWarning model, and a check for missing pools (#23317)
Add Task Logs to Grid details panel (#24249)
Added small health check server and endpoint in scheduler (#23905)
Add built-in External Link for ExternalTaskMarker operator (#23964)
Add default task retry delay config (#23861)
Add clear DagRun endpoint. (#23451)
Add support for timezone as string in cron interval timetable (#23279)
Add auto-refresh to dags home page (#22900, #24770)
Improvements¶
Add more weekday operator and sensor examples #26071 (#26098)
Add subdir parameter to dags reserialize command (#26170)
Update zombie message to be more descriptive (#26141)
Only send an SlaCallbackRequest if the DAG is scheduled (#26089)
Promote Operator.output more (#25617)
Upgrade API files to typescript (#25098)
Less hacky double-rendering prevention in mapped task (#25924)
Improve Audit log (#25856)
Remove mapped operator validation code (#25870)
More DAG(schedule=...) improvements (#25648)
Reduce operator_name dupe in serialized JSON (#25819)
Make grid view group/mapped summary UI more consistent (#25723)
Remove useless statement in task_group_to_grid (#25654)
Add optional data interval to CronTriggerTimetable (#25503)
Remove unused code in /grid endpoint (#25481)
Add and document description fields (#25370)
Improve Airflow logging for operator Jinja template processing (#25452)
Update core example DAGs to use @task.branch decorator (#25242)
Update DAG audit_log route (#25415)
Change stdout and stderr access mode to append in commands (#25253)
Remove getTasks from Grid view (#25359)
Improve taskflow type hints with ParamSpec (#25173)
Use tables in grid details panes (#25258)
Explicitly list @dag arguments (#25044)
More typing in SchedulerJob and TaskInstance (#24912)
Patch getfqdn with more resilient version (#24981)
Replace all NBSP characters by whitespaces (#24797)
Re-serialize all DAGs on airflow db upgrade (#24518)
Rework contract of try_adopt_task_instances method (#23188)
Make expand() error vague so it’s not misleading (#24018)
Add enum validation for [webserver]analytics_tool (#24032)
Add dttm searchable field in audit log (#23794)
Allow more parameters to be piped through via execute_in_subprocess (#23286)
Use func.count to count rows (#23657)
Remove stale serialized dags (#22917)
AIP45 Remove dag parsing in airflow run local (#21877)
Add support for queued state in DagRun update endpoint. (#23481)
Add fields to dagrun endpoint (#23440)
Use sql_alchemy_conn for celery result backend when result_backend is not set (#24496)
Bug Fixes¶
Have consistent types between the ORM and the migration files (#24044, #25869)
Disallow any dag tags longer than 100 char (#25196)
Add the dag_id to AirflowDagCycleException message (#26204)
Properly build URL to retrieve logs independently from system (#26337)
For worker log servers only bind to IPV6 when dual stack is available (#26222)
Fix TaskInstance.task not defined before handle_failure (#26040)
Undo secrets backend config caching (#26223)
Fix faulty executor config serialization logic (#26191)
Show DAGs and Datasets menu links based on role permission (#26183)
Allow setting TaskGroup tooltip via function docstring (#26028)
Fix RecursionError on graph view of a DAG with many tasks (#26175)
Fix backfill occasional deadlocking (#26161)
Fix DagRun.start_date not set during backfill with --reset-dagruns True (#26135)
Use label instead of id for dynamic task labels in graph (#26108)
Don’t fail DagRun when leaf mapped_task is SKIPPED (#25995)
Add group prefix to decorated mapped task (#26081)
Fix UI flash when triggering with dup logical date (#26094)
Fix Make items nullable for TaskInstance related endpoints to avoid API errors (#26076)
Fix BranchDateTimeOperator to be timezone-awareness-insensitive (#25944)
Fix legacy timetable schedule interval params (#25999)
Fix response schema for list-mapped-task-instance (#25965)
Properly check the existence of missing mapped TIs (#25788)
Fix broken auto-refresh on grid view (#25950)
Use per-timetable ordering in grid UI (#25880)
Rewrite recursion when parsing DAG into iteration (#25898)
Find cross-group tasks in iter_mapped_dependants (#25793)
Fail task if mapping upstream fails (#25757)
Support / in variable get endpoint (#25774)
Use cfg default_wrap value for grid logs (#25731)
Add origin request args when triggering a run (#25729)
Operator name separate from class (#22834)
Fix incorrect data interval alignment due to assumption on input time alignment (#22658)
Return None if an XComArg fails to resolve (#25661)
Correct json arg help in airflow variables set command (#25726)
Added MySQL index hint to use ti_state on find_zombies query (#25725)
Only excluded actually expanded fields from render (#25599)
Grid, fix toast for axios errors (#25703)
Fix UI redirect (#26409)
Require dag_id arg for dags list-runs (#26357)
Check for queued states for dags auto-refresh (#25695)
Fix upgrade code for the dag_owner_attributes table (#25579)
Add map index to task logs api (#25568)
Ensure that zombie tasks for dags with errors get cleaned up (#25550)
Make extra link work in UI (#25500)
Sync up plugin API schema and definition (#25524)
First/last names can be empty (#25476)
Refactor DAG pages to be consistent (#25402)
Check expand_kwargs() input type before unmapping (#25355)
Filter XCOM by key when calculating map lengths (#24530)
Fix ExternalTaskSensor not working with dynamic task (#25215)
Added exception catching to send default email if template file raises any exception (#24943)
Bring MappedOperator members in sync with BaseOperator (#24034)
Misc/Internal¶
Add automatically generated ERD schema for the MetaData DB (#26217)
Mark serialization functions as internal (#26193)
Remove remaining deprecated classes and replace them with PEP562 (#26167)
Move dag_edges and task_group_to_dict to corresponding util modules (#26212)
Lazily import many modules to improve import speed (#24486, #26239)
FIX Incorrect typing information (#26077)
Add missing contrib classes to deprecated dictionaries (#26179)
Re-configure/connect the ORM after forking to run a DAG processor (#26216)
Remove cattrs from lineage processing. (#26134)
Removed deprecated contrib files and replace them with PEP-562 getattr (#26153)
Make BaseSerialization.serialize “public” to other classes. (#26142)
Change the template to use human readable task_instance description (#25960)
Bump moment-timezone from 0.5.34 to 0.5.35 in /airflow/www (#26080)
Fix Flask deprecation warning (#25753)
Add CamelCase to generated operations types (#25887)
Fix migration issues and tighten the CI upgrade/downgrade test (#25869)
Fix type annotations in SkipMixin (#25864)
Workaround setuptools editable packages path issue (#25848)
Bump undici from 5.8.0 to 5.9.1 in /airflow/www (#25801)
Add custom_operator_name attr to _BranchPythonDecoratedOperator (#25783)
Clarify filename_template deprecation message (#25749)
Use ParamSpec to replace ... in Callable (#25658)
Remove deprecated modules (#25543)
Documentation on task mapping additions (#24489)
Remove Smart Sensors (#25507)
Fix elasticsearch test config to avoid warning on deprecated template (#25520)
Bump terser from 4.8.0 to 4.8.1 in /airflow/ui (#25178)
Generate typescript types from rest API docs (#25123)
Upgrade utils files to typescript (#25089)
Upgrade remaining context file to typescript. (#25096)
Migrate files to ts (#25267)
Upgrade grid Table component to ts. (#25074)
Skip mapping against mapped ti if it returns None (#25047)
Refactor js file structure (#25003)
Move mapped kwargs introspection to separate type (#24971)
Only assert stuff for mypy when type checking (#24937)
Bump moment from 2.29.3 to 2.29.4 in /airflow/www (#24885)
Remove “bad characters” from our codebase (#24841)
Remove xcom_push flag from BashOperator (#24824)
Move Flask hook registration to end of file (#24776)
Upgrade more javascript files to typescript (#24715)
Clean up task decorator type hints and docstrings (#24667)
Preserve original order of providers’ connection extra fields in UI (#24425)
Rename charts.css to chart.css (#24531)
Rename grid.css to chart.css (#24529)
Misc: create new process group by set_new_process_group utility (#24371)
Airflow UI fix Prototype Pollution (#24201)
Bump moto version (#24222)
Remove unused [github_enterprise] from ref docs (#24033)
Clean up f-strings in logging calls (#23597)
Add limit for JPype1 (#23847)
Simply json responses (#25518)
Add min attrs version (#26408)
Doc only changes¶
Add url prefix setting for Celery Flower (#25986)
Updating deprecated configuration in examples (#26037)
Fix wrong link for taskflow tutorial (#26007)
Reorganize tutorials into a section (#25890)
Fix concept doc for dynamic task map (#26002)
Update code examples from “classic” operators to taskflow (#25845, #25657)
Add instructions on manually fixing MySQL Charset problems (#25938)
Prefer the local Quick Start in docs (#25888)
Fix broken link to Trigger Rules (#25840)
Improve docker documentation (#25735)
Correctly link to Dag parsing context in docs (#25722)
Add note on task_instance_mutation_hook usage (#25607)
Note that TaskFlow API automatically passes data between tasks (#25577)
Update DAG run to clarify when a DAG actually runs (#25290)
Update tutorial docs to include a definition of operators (#25012)
Rewrite the Airflow documentation home page (#24795)
Fix task-generated mapping example (#23424)
Add note on subtle logical date change in 2.2.0 (#24413)
Add missing import in best-practices code example (#25391)
Airflow 2.3.4 (2022-08-23)¶
Significant Changes¶
Added new config [logging]log_formatter_class to fix timezone display for logs on UI (#24811)¶
If you are using a custom Formatter subclass in your [logging]logging_config_class, please inherit from airflow.utils.log.timezone_aware.TimezoneAware instead of logging.Formatter.
For example, in your custom_config.py:
import logging
from airflow.utils.log.timezone_aware import TimezoneAware
# before
class YourCustomFormatter(logging.Formatter):
    ...
# after
class YourCustomFormatter(TimezoneAware):
    ...
# LOGGING_CONFIG is the logging config dict defined in your custom_config.py
AIRFLOW_FORMATTER = LOGGING_CONFIG["formatters"]["airflow"]
AIRFLOW_FORMATTER["class"] = "somewhere.your.custom_config.YourCustomFormatter"
# or use the TimezoneAware class directly if you don't have a custom Formatter.
AIRFLOW_FORMATTER["class"] = "airflow.utils.log.timezone_aware.TimezoneAware"
Bug Fixes¶
Disable attrs state management on MappedOperator (#24772)
Serialize pod_override to JSON before pickling executor_config (#24356)
Fix pid check (#24636)
Rotate session id during login (#25771)
Fix mapped sensor with reschedule mode (#25594)
Cache the custom secrets backend so the same instance gets reused (#25556)
Add right padding (#25554)
Fix reducing mapped length of a mapped task at runtime after a clear (#25531)
Fix airflow db reset when dangling tables exist (#25441)
Change disable_verify_ssl behaviour (#25023)
Set default task group in dag.add_task method (#25000)
Removed interfering force of index. (#25404)
Remove useless logging line (#25347)
Adding mysql index hint to use index on task_instance.state in critical section query (#25673)
Configurable umask to all daemonized processes. (#25664)
Fix the errors raised when None is passed to template filters (#25593)
Allow wildcarded CORS origins (#25553)
Fix “This Session’s transaction has been rolled back” (#25532)
Fix Serialization error in TaskCallbackRequest (#25471)
fix - resolve bash by absolute path (#25331)
Add __repr__ to ParamsDict class (#25305)
Only load distribution of a name once (#25296)
convert TimeSensorAsync target_time to utc on call time (#25221)
call updateNodeLabels after expandGroup (#25217)
Stop SLA callbacks gazumping other callbacks and DOS’ing the DagProcessorManager queue (#25147)
Fix invalidateQueries call (#25097)
airflow/www/package.json: Add name, version fields. (#25065)
No grid auto-refresh for backfill dag runs (#25042)
Fix tag link on dag detail page (#24918)
Fix zombie task handling with multiple schedulers (#24906)
Bind log server on worker to IPv6 address (#24755) (#24846)
Add %z for %(asctime)s to fix timezone for logs on UI (#24811)
TriggerDagRunOperator.operator_extra_links is attr (#24676)
Send DAG timeout callbacks to processor outside of prohibit_commit (#24366)
Don’t rely on current ORM structure for db clean command (#23574)
Clear next method when clearing TIs (#23929)
Two typing fixes (#25690)
Doc only changes¶
Update set-up-database.rst (#24983)
Fix syntax in mysql setup documentation (#24893) (#24939)
Note how DAG policy works with default_args (#24804)
Update PythonVirtualenvOperator Howto (#24782)
Doc: Add hyperlinks to Github PRs for Release Notes (#24532)
Misc/Internal¶
Remove deprecation warning when using default remote tasks logging handlers (#25764)
clearer method name in scheduler_job.py (#23702)
Bump cattrs version (#25689)
Include missing mention of external_executor_id in sql_engine_collation_for_ids docs (#25197)
Refactor DR.task_instance_scheduling_decisions (#24774)
Sort operator extra links (#24992)
Extends resolve_xcom_backend function level documentation (#24965)
Upgrade FAB to 4.1.3 (#24884)
Limit Flask to <2.3 in the wake of 2.2 breaking our tests (#25511)
Limit astroid version to < 2.12 (#24982)
Move javascript compilation to host (#25169)
Bump typing-extensions and mypy for ParamSpec (#25088)
Airflow 2.3.3 (2022-07-09)¶
Significant Changes¶
We’ve upgraded Flask App Builder to a major version 4.* (#24399)¶
Flask App Builder is one of the important components of the Airflow webserver, as it brings in many dependencies that are essential to run the webserver and integrate it in enterprise environments, especially for authentication.
FAB 4.* upgrades a number of dependencies to new major releases, which fix a number of security issues. Extensive testing was done to bring in the dependencies in a backwards-compatible way. However, the dependencies themselves implement breaking changes in their internals, so some of those changes might affect users who use the libraries for their own purposes.
One important change that you will likely need to apply to your OAuth configuration is to add server_metadata_url or jwks_uri. You can read more about it in this issue.
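As a rough illustration of where such a key might go, the sketch below shows a provider entry for webserver_config.py. It assumes the OAUTH_PROVIDERS layout from the Flask App Builder documentation; the provider name, client credentials, and scope are placeholders, and only one of server_metadata_url or jwks_uri is normally needed.

```python
# Sketch of an OAuth provider entry for webserver_config.py.
# All values are placeholders; adapt them to your identity provider.
OAUTH_PROVIDERS = [
    {
        "name": "google",
        "icon": "fa-google",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
            "api_base_url": "https://www.googleapis.com/oauth2/v2/",
            "client_kwargs": {"scope": "openid email profile"},
            # OpenID Connect discovery document; the signing keys and
            # endpoints are looked up from here.
            "server_metadata_url": (
                "https://accounts.google.com/.well-known/openid-configuration"
            ),
            # Alternative when no discovery document is available:
            # "jwks_uri": "https://www.googleapis.com/oauth2/v3/certs",
        },
    },
]
```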
Here is the list of breaking changes in dependencies that come with FAB 4:
Flask from 1.X to 2.X breaking changes
flask-jwt-extended 3.X to 4.X breaking changes
Jinja2 2.X to 3.X breaking changes
Werkzeug 1.X to 2.X breaking changes
pyJWT 1.X to 2.X breaking changes
Click 7.X to 8.X breaking changes
itsdangerous 1.X to 2.X breaking changes
Bug Fixes¶
Fix exception in mini task scheduler (#24865)
Fix cycle bug with attaching label to task group (#24847)
Fix timestamp defaults for sensorinstance (#24638)
Move fallible ti.task.dag assignment back inside try/except block (#24533) (#24592)
Add missing types to FSHook (#24470)
Mask secrets in stdout for airflow tasks test (#24362)
DebugExecutor use ti.run() instead of ti._run_raw_task (#24357)
Fix bugs in URI constructor for MySQL connection (#24320)
Missing scheduleinterval nullable true added in openapi (#24253)
Unify return_code interface for task runner (#24093)
Handle occasional deadlocks in trigger with retries (#24071)
Remove special serde logic for mapped op_kwargs (#23860)
ExternalTaskSensor respects soft_fail if the external task enters a failed_state (#23647)
Fix StatD timing metric units (#21106)
Add cache_ok flag to sqlalchemy TypeDecorators. (#24499)
Allow for LOGGING_LEVEL=DEBUG (#23360)
Fix grid date ticks (#24738)
Debounce status highlighting in Grid view (#24710)
Fix Grid vertical scrolling (#24684)
don’t try to render child rows for closed groups (#24637)
Do not calculate grid root instances (#24528)
Maintain grid view selection on filtering upstream (#23779)
Speed up grid_data endpoint by 10x (#24284)
Apply per-run log templates to log handlers (#24153)
Don’t crash scheduler if exec config has old k8s objects (#24117)
TI.log_url fix for map_index (#24335)
Fix migration 0080_2_0_2 - Replace null values before setting column not null (#24585)
Patch sql_alchemy_conn if old Postgres schemes used (#24569)
Seed log_template table (#24511)
Fix deprecated log_id_template value (#24506)
Fix toast messages (#24505)
Add indexes for CASCADE deletes for task_instance (#24488)
Return empty dict if Pod JSON encoding fails (#24478)
Improve grid rendering performance with a custom tooltip (#24417, #24449)
Check for run_id for grid group summaries (#24327)
Optimize calendar view for cron scheduled DAGs (#24262)
Use get_hostname instead of socket.getfqdn (#24260)
Check that edge nodes actually exist (#24166)
Fix useTasks crash on error (#24152)
Do not fail re-queued TIs (#23846)
Reduce grid view API calls (#24083)
Rename Permissions to Permission Pairs. (#24065)
Replace use_task_execution_date with use_task_logical_date (#23983)
Grid fix details button truncated and small UI tweaks (#23934)
Add TaskInstance State REMOVED to finished states and success states (#23797)
Fix mapped task immutability after clear (#23667)
Fix permission issue for dag that has dot in name (#23510)
Fix closing connection dbapi.get_pandas_df (#23452)
Check bag DAG schedule_interval match timetable (#23113)
Parse error for task added to multiple groups (#23071)
Fix flaky order of returned dag runs (#24405)
Migrate jsx files that affect run/task selection to tsx (#24509)
Fix links to sources for examples (#24386)
Set proper Content-Type and chartset on grid_data endpoint (#24375)
Doc only changes¶
Update templates doc to mention extras and format Airflow Vars / Conns (#24735)
Document built in Timetables (#23099)
Alphabetizes two tables (#23923)
Clarify that users should not use Maria DB (#24556)
Add imports to deferring code samples (#24544)
Add note about image regeneration in June 2022 (#24524)
Small cleanup of get_current_context() chapter (#24482)
Fix default 2.2.5 log_id_template (#24455)
Update description of installing providers separately from core (#24454)
Mention context variables and logging (#24304)
Misc/Internal¶
Remove internet explorer support (#24495)
Removing magic status code numbers from api_connexion (#24050)
Upgrade FAB to 4.1.2 (#24619)
Switch Markdown engine to markdown-it-py (#19702)
Update rich to latest version across the board. (#24186)
Get rid of TimedJSONWebSignatureSerializer (#24519)
Update flask-appbuilder authlib/oauth dependency (#24516)
Upgrade to webpack 5 (#24485)
Add typescript (#24337)
The JWT claims in the request to retrieve logs have been standardized: we use nbf and aud claims for maturity and audience of the requests. Also “filename” payload field is used to keep log name. (#24519)
Address all yarn test warnings (#24722)
Upgrade to react 18 and chakra 2 (#24430)
Refactor DagRun.verify_integrity (#24114)
Upgrade FAB to 4.1.1 (#24399)
We now need at least Flask-WTF 0.15 (#24621)
Airflow 2.3.2 (2022-06-04)¶
No significant changes.
Bug Fixes¶
Run the check_migration loop at least once
Fix grid view for mapped tasks (#24059)
Icons in grid view for different DAG run types (#23970)
Faster grid view (#23951)
Disallow calling expand with no arguments (#23463)
Add missing is_mapped field to Task response. (#23319)
DagFileProcessorManager: Start a new process group only if current process not a session leader (#23872)
Mask sensitive values for not-yet-running TIs (#23807)
Add cascade to dag_tag to dag foreign key (#23444)
Use --subdir argument value for standalone dag processor. (#23864)
Highlight task states by hovering on legend row (#23678)
Fix and speed up grid view (#23947)
Prevent UI from crashing if grid task instances are null (#23939)
Remove redundant register exit signals in dag-processor command (#23886)
Add __wrapped__ property to _TaskDecorator (#23830)
Fix UnboundLocalError when sql is empty list in DbApiHook (#23816)
Enable clicking on DAG owner in autocomplete dropdown (#23804)
Simplify flash message for _airflow_moved tables (#23635)
Exclude missing tasks from the gantt view (#23627)
Doc only changes¶
Add column names for DB Migration Reference (#23853)
Misc/Internal¶
Remove pinning for xmltodict (#23992)
Airflow 2.3.1 (2022-05-25)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Automatically reschedule stalled queued tasks in CeleryExecutor (#23690)
Fix expand/collapse all buttons (#23590)
Grid view status filters (#23392)
Expand/collapse all groups (#23487)
Fix retrieval of deprecated non-config values (#23723)
Fix secrets rendered in UI when task is not executed. (#22754)
Fix provider import error matching (#23825)
Fix regression in ignoring symlinks (#23535)
Fix dag-processor fetch metadata database config (#23575)
Fix auto upstream dep when expanding non-templated field (#23771)
Fix task log is not captured (#23684)
Add reschedule to the serialized fields for the BaseSensorOperator (#23674)
Modify db clean to also catch the ProgrammingError exception (#23699)
Remove titles from link buttons (#23736)
Fix grid details header text overlap (#23728)
Ensure execution_timeout as timedelta (#23655)
Don’t run pre-migration checks for downgrade (#23634)
Add index for event column in log table (#23625)
Implement send_callback method for CeleryKubernetesExecutor and LocalKubernetesExecutor (#23617)
Fix PythonVirtualenvOperator templated_fields (#23559)
Apply specific ID collation to root_dag_id too (#23536)
Prevent KubernetesJobWatcher getting stuck on resource too old (#23521)
Fix scheduler crash when expanding with mapped task that returned none (#23486)
Fix broken dagrun links when many runs start at the same time (#23462)
Fix: Exception when parsing log #20966 (#23301)
Handle invalid date parsing in webserver views. (#23161)
Pools with negative open slots should not block other pools (#23143)
Move around overflow, position and padding (#23044)
Change approach to finding bad rows to LEFT OUTER JOIN. (#23528)
Only count bad refs when moved table exists (#23491)
Visually distinguish task group summary (#23488)
Remove color change for highly nested groups (#23482)
Optimize 2.3.0 pre-upgrade check queries (#23458)
Add backward compatibility for core__sql_alchemy_conn__cmd (#23441)
Fix literal cross product expansion (#23434)
Fix broken task instance link in xcom list (#23367)
Fix connection test button (#23345)
fix cli airflow dags show for mapped operator (#23339)
Hide some task instance attributes (#23338)
Don’t show grid actions if server would reject with permission denied (#23332)
Use run_id for ti.mark_success_url (#23330)
Fix update user auth stats (#23314)
Use <Time /> in Mapped Instance table (#23313)
Fix duplicated Kubernetes DeprecationWarnings (#23302)
Store grid view selection in url params (#23290)
Remove custom signal handling in Triggerer (#23274)
Override pool for TaskInstance when pool is passed from cli. (#23258)
Show warning if ‘/’ is used in a DAG run ID (#23106)
Use kubernetes queue in kubernetes hybrid executors (#23048)
Add tags inside try block. (#21784)
Doc only changes¶
Move dag_processing.processor_timeouts to counters section (#23393)
Clarify that bundle extras should not be used for PyPi installs (#23697)
Synchronize support for Postgres and K8S in docs (#23673)
Replace DummyOperator references in docs (#23502)
Add doc notes for keyword-only args for expand() and partial() (#23373)
Document fix for broken elasticsearch logs with 2.3.0+ upgrade (#23821)
Misc/Internal¶
Add typing for airflow/configuration.py (#23716)
Disable Flower by default from docker-compose (#23685)
Added postgres 14 to support versions(including breeze) (#23506)
add K8S 1.24 support (#23637)
Refactor code references from tree to grid (#23254)
Airflow 2.3.0 (2022-04-30)¶
For production docker image related changes, see the Docker Image Changelog.
Significant Changes¶
Passing execution_date to XCom.set(), XCom.clear(), XCom.get_one(), and XCom.get_many() is deprecated (#19825)¶
Continuing the effort to bind TaskInstance to a DagRun, XCom entries are now also tied to a DagRun. Use the run_id argument to specify the DagRun instead.
Task log templates are now read from the metadata database instead of airflow.cfg (#20165)¶
Previously, a task’s log was dynamically rendered from the [core] log_filename_template and [elasticsearch] log_id_template config values at runtime. This had unfortunate consequences: for example, it was impractical to modify the config value after an Airflow instance had been running for a while, since all existing task logs had been saved under the previous format and could not be found with the new config value.
A new log_template table is introduced to solve this problem. This table is synchronized with the aforementioned config values every time Airflow starts, and a new field log_template_id is added to every DAG run to point to the format used by tasks (NULL indicates the first ever entry for compatibility).
Minimum kubernetes library version bumped from 3.0.0 to 21.7.0 (#20759)¶
Note
This is only about changing the kubernetes library, not the Kubernetes cluster. Airflow support for Kubernetes version is described in Installation prerequisites.
No change in behavior is expected. This was necessary in order to take advantage of a bugfix concerning refreshing of Kubernetes API tokens with EKS, which enabled the removal of some workaround code.
XCom now defined by run_id instead of execution_date (#20975)¶
As a continuation of the TaskInstance-DagRun relation change started in Airflow 2.2, the execution_date column on XCom has been removed from the database and replaced by an association proxy field at the ORM level. If you access Airflow’s metadata database directly, you should rewrite the implementation to use the run_id column instead.
Note that Airflow’s metadata database definition on both the database and ORM levels is considered an implementation detail without strict backward compatibility guarantees.
Non-JSON-serializable params deprecated (#21135).¶
It was previously possible to use dag or task param defaults that were not JSON-serializable.
For example this worked previously:
@dag.task(params={"a": {1, 2, 3}, "b": pendulum.now()})
def datetime_param(value):
    print(value)
datetime_param("{{ params.a }} | {{ params.b }}")
Note the use of set and datetime types, which are not JSON-serializable. This behavior is problematic because to override these values in a dag run conf, you must use JSON, which could make these params non-overridable. Another problem is that the support for param validation assumes JSON. Use of non-JSON-serializable params will be removed in Airflow 3.0 and until then, use of them will produce a warning at parse time.
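To find param defaults that would trigger this warning, one option is to try encoding each value with the standard json module. The sketch below uses only the standard library; is_json_serializable is a hypothetical helper name, and the fixed datetime stands in for pendulum.now() so the result is reproducible.

```python
import json
from datetime import datetime, timezone


def is_json_serializable(value):
    """Return True if the standard json encoder accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Param defaults like the ones in the example above: a set and a datetime.
old_params = {"a": {1, 2, 3}, "b": datetime(2022, 4, 30, tzinfo=timezone.utc)}

# JSON-friendly replacements: a sorted list and an ISO 8601 string.
new_params = {
    "a": sorted(old_params["a"]),
    "b": old_params["b"].isoformat(),
}

# Names of the params that would need to be converted.
bad_keys = [key for key, value in old_params.items() if not is_json_serializable(value)]
```

The converted values can be overridden from a dag run conf because they round-trip through JSON unchanged.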
You must use postgresql:// instead of postgres:// in sql_alchemy_conn for SQLAlchemy 1.4.0+ (#21205)¶
When you use SQLAlchemy 1.4.0+, you need to use postgresql:// as the scheme in the sql_alchemy_conn.
In the previous versions of SQLAlchemy it was possible to use postgres://, but using it in SQLAlchemy 1.4.0+ results in:
>       raise exc.NoSuchModuleError(
            "Can't load plugin: %s:%s" % (self.group, name)
        )
E       sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:postgres
If you cannot change the scheme of your URL immediately, Airflow continues to work with SQLAlchemy 1.3 and you can downgrade SQLAlchemy, but we recommend updating the scheme. Details in the SQLAlchemy Changelog.
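If the connection string is generated by tooling, the scheme can be rewritten before it reaches Airflow. The following is a minimal sketch using plain string handling; normalize_postgres_scheme is a hypothetical helper, not an Airflow or SQLAlchemy function.

```python
def normalize_postgres_scheme(conn_uri: str) -> str:
    """Rewrite the legacy postgres scheme to postgresql, leaving other URIs alone."""
    scheme, sep, rest = conn_uri.partition("://")
    if not sep:
        # Not a URI with a scheme; return it unchanged.
        return conn_uri
    if scheme == "postgres":
        return "postgresql://" + rest
    if scheme.startswith("postgres+"):
        # Keep the driver suffix, e.g. postgres+psycopg2 -> postgresql+psycopg2.
        driver = scheme[len("postgres+"):]
        return "postgresql+" + driver + "://" + rest
    return conn_uri
```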
auth_backends replaces auth_backend configuration setting (#21472)¶
Previously, only one backend was used to authorize use of the REST API. In 2.3 this was changed to support multiple backends, separated by comma. Each will be tried in turn until a successful response is returned.
This setting is also used for the deprecated experimental API, which only uses the first option even if multiple are given.
airflow.models.base.Operator is removed (#21505)¶
Previously, there was an empty class airflow.models.base.Operator for “type hinting”. This class was never really useful for anything (everything it did could be done better with airflow.models.baseoperator.BaseOperator), and has been removed. If you are relying on the class’s existence, use BaseOperator (for concrete operators), airflow.models.abstractoperator.AbstractOperator (the base class of both BaseOperator and the AIP-42 MappedOperator), or airflow.models.operator.Operator (a union type BaseOperator | MappedOperator for type annotation).
Zip files in the DAGs folder can no longer have a .py extension (#21538)¶
It was previously possible to have any extension for zip files in the DAGs folder. Now .py files are loaded as modules without checking whether they are zip files, which reduces IO. If a .py file in the DAGs folder is a zip-compressed file, parsing it will fail with an exception.
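Before upgrading, it may help to scan the DAGs folder for .py files that are really zip archives. The sketch below relies only on the standard library's zipfile.is_zipfile; find_zipped_py_files is a hypothetical helper, and the folder path is whatever your deployment uses.

```python
import zipfile
from pathlib import Path


def find_zipped_py_files(dags_folder):
    """Return the .py files under dags_folder that are actually zip archives."""
    folder = Path(dags_folder)
    return sorted(
        path
        for path in folder.rglob("*.py")
        if path.is_file() and zipfile.is_zipfile(str(path))
    )
```

Any file it reports would need to be renamed (for example to a .zip extension) to keep loading as a packaged DAG.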
auth_backends includes session (#21640)¶
To allow the Airflow UI to use the API, the previous default authorization backend airflow.api.auth.backend.deny_all is changed to airflow.api.auth.backend.session, and this is automatically added to the list of API authorization backends if a non-default value is set.
Default templates for log filenames and elasticsearch log_id changed (#21734)¶
In order to support Dynamic Task Mapping, the default templates for per-task instance logging have changed. If your config contains the old default values, they will be upgraded in place.
If you are happy with the new config values you should remove the setting in airflow.cfg and let the default value be used. Old default values were:
[core] log_filename_template: {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
[elasticsearch] log_id_template: {dag_id}-{task_id}-{execution_date}-{try_number}
[core] log_filename_template now uses “hive partition style” of dag_id=<id>/run_id=<id> by default, which may cause problems on some older FAT filesystems. If this affects you then you will have to change the log template.
If you have customized the templates you should ensure that they contain {{ ti.map_index }} if you want to use dynamically mapped tasks.
If after upgrading you find your task logs are no longer accessible, try adding a row in the log_template table with id=0 containing your previous log_id_template and log_filename_template. For example, if you used the defaults in 2.2.5:
INSERT INTO log_template (id, filename, elasticsearch_id, created_at) VALUES (0, '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log', '{dag_id}-{task_id}-{execution_date}-{try_number}', NOW());
BaseOperatorLink’s get_link method changed to take a ti_key keyword argument (#21798)¶
In v2.2 we “deprecated” passing an execution date to XCom.get methods, but there was no other option for operator links as they were only passed an execution_date.
Now in 2.3, as part of Dynamic Task Mapping (AIP-42), we need to add map_index to the XCom row to support the “reduce” part of the API.
In order to support that cleanly we have changed the interface for BaseOperatorLink to take a TaskInstanceKey as the ti_key keyword argument (as execution_date + task is no longer unique for mapped operators).
The existing signature will be detected (by the absence of the ti_key argument) and continue to work.
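Detecting a signature by the absence of an argument can be done with the standard inspect module. The sketch below only illustrates the idea and is not Airflow's actual implementation; the two link classes are hypothetical stand-ins that do not subclass BaseOperatorLink.

```python
import inspect


def accepts_ti_key(get_link) -> bool:
    """Return True if the callable declares a parameter named ti_key."""
    return "ti_key" in inspect.signature(get_link).parameters


class OldStyleLink:
    # Pre-2.3 style: receives the operator and an execution date.
    def get_link(self, operator, dttm):
        return f"https://example.com/{operator}/{dttm}"


class NewStyleLink:
    # 2.3 style: receives the operator and a keyword-only ti_key.
    def get_link(self, operator, *, ti_key):
        return f"https://example.com/{operator}/{ti_key}"
```

A caller could use accepts_ti_key to decide whether to pass ti_key or fall back to the execution date.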
ReadyToRescheduleDep now only runs when reschedule is True (#21815)¶
When a ReadyToRescheduleDep is run, it now checks the reschedule attribute on the operator, and always reports itself as passed unless the attribute is set to True. If you use this dep class on your custom operator, you will need to add this attribute to the operator class. Built-in operator classes that use this dep class (including sensors and all subclasses) already have this attribute and are not affected.
The deps attribute on an operator class should be a class level attribute (#21815)¶
To support operator-mapping (AIP 42), the deps attribute on an operator class must be a set at the class level. This means that if a custom operator implements this as an instance-level variable, it cannot be used for operator-mapping. This does not affect existing code, but we highly recommend that you restructure the operator’s dep logic in order to support the new feature.
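The difference between the two patterns can be shown in plain Python. In this sketch the classes and the string dep names are hypothetical stand-ins (real operators subclass BaseOperator and hold dep objects); it only demonstrates that a class-level attribute is readable without creating an instance.

```python
class InstanceLevelDepsOperator:
    """Pattern to avoid: deps only exists after __init__ has run."""

    def __init__(self):
        self.deps = {"my_custom_dep"}


class ClassLevelDepsOperator:
    """Recommended pattern: deps is defined on the class itself."""

    deps = frozenset({"my_custom_dep"})


def class_level_deps(operator_class):
    """Return deps as readable from the class alone, or None if it is missing."""
    return getattr(operator_class, "deps", None)
```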
Deprecation: Connection.extra must be JSON-encoded dict (#21816)¶
TLDR¶
From Airflow 3.0, the extra field in airflow connections must be a JSON-encoded Python dict.
What, why, and when?¶
Airflow’s Connection is used for storing credentials. For storage of information that does not fit into user / password / host / schema / port, we have the extra string field. Its intention was always to provide for storage of arbitrary key-value pairs, like no_host_key_check in the SSH hook, or keyfile_dict in GCP.
But since the field is a string, it’s technically been permissible to store any string value. For example one could have stored the string value 'my-website.com' and used this in the hook. But this is a very bad practice. One reason is intelligibility: when you look at the value for extra, you don’t have any idea what its purpose is. Better would be to store {"api_host": "my-website.com"} which at least tells you something about the value. Another reason is extensibility: if you store the API host as a simple string value, what happens if you need to add more information, such as the API endpoint, or credentials? Then you would need to convert the string to a dict, and this would be a breaking change.
For these reasons, starting in Airflow 3.0 we will require that the Connection.extra field store a JSON-encoded Python dict.
How will I be affected?¶
For users of providers that are included in the Airflow codebase, you should not have to make any changes because in the Airflow codebase we should not allow hooks to misuse the Connection.extra field in this way.
However, if you have any custom hooks that store something other than a JSON dict, you will have to update them. If you do, you should see a warning any time that this connection is retrieved or instantiated (e.g. it should show up in task logs).
To see if you have any connections that will need to be updated, you can run this command:
airflow connections export - 2>&1 >/dev/null | grep 'non-JSON'
This will catch any warnings about connections that are storing something other than a JSON-encoded Python dict in the extra field.
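The same condition can be checked on an individual value with the standard json module. This is a minimal sketch, not Airflow's own validation code; extra_is_json_dict is a hypothetical helper, and treating a missing (None) extra as acceptable is an assumption.

```python
import json


def extra_is_json_dict(extra):
    """Return True if extra is unset or a string that decodes to a JSON object."""
    if extra is None:
        # Assumption: a connection with no extra at all needs no change.
        return True
    try:
        parsed = json.loads(extra)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return isinstance(parsed, dict)
```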
The tree default view setting has been renamed to grid (#22167)¶
If you set the dag_default_view config option or the default_view argument to DAG() to tree you will need to update your deployment. The old name will continue to work but will issue warnings.
Database configuration moved to new section (#22284)¶
The following configurations have been moved from [core] to the new [database] section. However when reading the new option, the old option will be checked to see if it exists. If it does a DeprecationWarning will be issued and the old option will be used instead.
sql_alchemy_conn
sql_engine_encoding
sql_engine_collation_for_ids
sql_alchemy_pool_enabled
sql_alchemy_pool_size
sql_alchemy_max_overflow
sql_alchemy_pool_recycle
sql_alchemy_pool_pre_ping
sql_alchemy_schema
sql_alchemy_connect_args
load_default_connections
max_db_retries
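To silence the DeprecationWarning, the listed options can be moved to the new section by hand or with a small script. The sketch below uses the standard configparser module; move_database_options is a hypothetical helper, and note that configparser drops comments when writing, so review the output before replacing a real airflow.cfg.

```python
import configparser
import io

# Options that moved from [core] to [database], as listed above.
MOVED_OPTIONS = (
    "sql_alchemy_conn",
    "sql_engine_encoding",
    "sql_engine_collation_for_ids",
    "sql_alchemy_pool_enabled",
    "sql_alchemy_pool_size",
    "sql_alchemy_max_overflow",
    "sql_alchemy_pool_recycle",
    "sql_alchemy_pool_pre_ping",
    "sql_alchemy_schema",
    "sql_alchemy_connect_args",
    "load_default_connections",
    "max_db_retries",
)


def move_database_options(cfg_text: str) -> str:
    """Return config text with the moved options relocated to [database]."""
    # interpolation=None keeps literal % characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("core"):
        return cfg_text
    if not parser.has_section("database"):
        parser.add_section("database")
    for option in MOVED_OPTIONS:
        if parser.has_option("core", option):
            # Do not overwrite a value already set in the new section.
            if not parser.has_option("database", option):
                parser.set("database", option, parser.get("core", option))
            parser.remove_option("core", option)
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()
```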
Remove requirement that custom connection UI fields be prefixed (#22607)¶
Hooks can define custom connection fields for their connection type by implementing method get_connection_form_widgets. These custom fields appear in the web UI as additional connection attributes, but internally they are stored in the connection extra dict field. For technical reasons, previously, when stored in the extra dict, the custom field’s dict key had to take the form extra__<conn type>__<field name>. This had the consequence of making it more cumbersome to define connections outside of the UI, since the prefix extra__<conn type>__ makes it tougher to read and work with. With #22607, we make it so that you can now define custom fields such that they can be read from and stored in extra without the prefix.
To enable this, update the dict returned by the get_connection_form_widgets method to remove the prefix from the keys. Internally, the providers manager will still use a prefix to ensure each custom field is globally unique, but the absence of a prefix in the returned widget dict will signal to the Web UI to read and store custom fields without the prefix. Note that this is only a change to the Web UI behavior; when updating your hook in this way, you must make sure that when your hook reads the extra field, it will also check for the prefixed value for backward compatibility.
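The backward-compatible lookup described above can be sketched as a small helper that prefers the unprefixed key and falls back to the prefixed one. get_extra_field is a hypothetical name, and the extra dict is assumed to be the already-decoded connection extra.

```python
def get_extra_field(extra: dict, conn_type: str, field_name: str, default=None):
    """Read a custom field from extra, accepting the new and the legacy key."""
    if field_name in extra:
        # New style: the key is stored without a prefix.
        return extra[field_name]
    # Legacy style: extra__<conn type>__<field name>.
    prefixed_name = f"extra__{conn_type}__{field_name}"
    return extra.get(prefixed_name, default)
```

For example, a hook could call get_extra_field(extra, "my_conn_type", "api_host") and work with connections saved before and after the change.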
The webserver.X_FRAME_ENABLED configuration works according to description now (#23222).¶
In Airflow 2.0.0 - 2.2.4 the webserver.X_FRAME_ENABLED parameter worked the opposite of its description: setting the value to “true” caused the “X-Frame-Options” header to be set to “DENY” (not allowing Airflow to be used in an iframe). When you set it to “false”, the header was not added, so Airflow could be embedded in an iframe. By default Airflow could not be embedded in an iframe.
In Airflow 2.2.5 a bug was introduced that made it impossible to prevent Airflow from being embedded in an iframe. No matter how the configuration was set, it was possible to embed Airflow in an iframe.
Airflow 2.3.0 restores the original meaning of the parameter. If you set it to “true” (default) Airflow can be embedded in an iframe (no header is added), but when you set it to “false” the header is added and Airflow cannot be embedded in an iframe.
New Features¶
Add dynamic task mapping (AIP-42)
New Grid View replaces Tree View (#18675)
Templated requirements.txt in Python Operators (#17349)
Allow reuse of decorated tasks (#22941)
Move the database configuration to a new section (#22284)
Add SmoothOperator (#22813)
Make operator’s execution_timeout configurable (#22389)
Events Timetable (#22332)
Support dag serialization with custom ti_deps rules (#22698)
Support log download in task log view (#22804)
support for continue backfill on failures (#22697)
Add dag-processor cli command (#22305)
Add possibility to create users in LDAP mode (#22619)
Add ignore_first_depends_on_past for scheduled jobs (#22491)
Update base sensor operator to support XCOM return value (#20656)
Add an option for run id in the ui trigger screen (#21851)
Enable JSON serialization for connections (#19857)
Add REST API endpoint for bulk update of DAGs (#19758)
Add queue button to click-on-DagRun interface. (#21555)
Add list-import-errors to airflow dags command (#22084)
Store callbacks in database if standalone_dag_processor config is True. (#21731)
Add LocalKubernetesExecutor (#19729)
Add celery.task_timeout_error metric (#21602)
Airflow db downgrade cli command (#21596)
Add ALL_SKIPPED trigger rule (#21662)
Add db clean CLI command for purging old data (#20838)
Add celery_logging_level (#21506)
Support different timeout value for dag file parsing (#21501)
Support generating SQL script for upgrades (#20962)
Add option to compress Serialized dag data (#21332)
Branch python operator decorator (#20860)
Add Audit Log View to Dag View (#20733)
Add missing StatsD metric for failing SLA Callback notification (#20924)
Add ShortCircuitOperator configurability for respecting downstream trigger rules (#20044)
Allow using Markup in page title in Webserver (#20888)
Add Listener Plugin API that tracks TaskInstance state changes (#20443)
Add context var hook to inject more env vars (#20361)
Add a button to set all tasks to skipped (#20455)
Cleanup pending pods (#20438)
Add config to warn public deployment exposure in UI (#18557)
Log filename template records (#20165)
Added windows extensions (#16110)
Showing approximate time until next dag_run in Airflow (#20273)
Extend config window on UI (#20052)
Add show dag dependencies feature to CLI (#19985)
Add cli command for ‘airflow dags reserialize’ (#19471)
Add missing description field to Pool schema(REST API) (#19841)
Introduce DagRun action to change state to queued. (#19353)
Add DAG run details page (#19705)
Add role export/import to cli tools (#18916)
Adding dag_id_pattern parameter to the /dags endpoint (#18924)
Improvements¶
Show schedule_interval/timetable description in UI (#16931)
Added column duration to DAG runs view (#19482)
Enable use of custom conn extra fields without prefix (#22607)
Initialize finished counter at zero (#23080)
Improve logging of optional provider features messages (#23037)
Meaningful error message in resolve_template_files (#23027)
Update ImportError items instead of deleting and recreating them (#22928)
Add option
--skip-init
to db reset command (#22989)Support importing connections from files with “.yml” extension (#22872)
Support glob syntax in
.airflowignore
files (#21392) (#22051)Hide pagination when data is a single page (#22963)
Support for sorting DAGs in the web UI (#22671)
Speed up
has_access
decorator by ~200ms (#22858)Add XComArg to lazy-imported list of Airflow module (#22862)
Add more fields to REST API dags/dag_id/details endpoint (#22756)
Don’t show irrelevant/duplicated/”internal” Task attrs in UI (#22812)
No need to load whole ti in current_state (#22764)
Pickle dag exception string fix (#22760)
Better verification of LocalExecutor’s parallelism option (#22711)
log backfill exceptions to sentry (#22704)
retry commit on MySQL deadlocks during backfill (#22696)
Add more fields to REST API get DAG(dags/dag_id) endpoint (#22637)
Use timetable to generate planned days for current year (#22055)
Disable connection pool for celery worker (#22493)
Make date picker label visible in trigger dag view (#22379)
Expose try_number in airflow vars (#22297)
Add generic connection type (#22310)
Add a few more fields to the taskinstance finished log message (#22262)
Pause auto-refresh if scheduler isn’t running (#22151)
Show DagModel details. (#21868)
Add pip_install_options to PythonVirtualenvOperator (#22158)
Show import error for airflow dags list CLI command (#21991)
Pause auto-refresh when page is hidden (#21904)
Default args type check (#21809)
Enhance magic methods on XComArg for UX (#21882)
py files don’t have to be checked is_zipfiles in refresh_dag (#21926)
Fix TaskDecorator type hints (#21881)
Add ‘Show record’ option for variables (#21342)
Use DB where possible for quicker airflow dag subcommands (#21793)
REST API: add rendered fields in task instance. (#21741)
Change the default auth backend to session (#21640)
Don’t check if py DAG files are zipped during parsing (#21538)
Switch XCom implementation to use run_id (#20975)
Action log on Browse Views (#21569)
Implement multiple API auth backends (#21472)
Change logging level details of connection info in get_connection() (#21162)
Support mssql in airflow db shell (#21511)
Support config worker_enable_remote_control for celery (#21507)
Log memory usage in CgroupTaskRunner (#21481)
Modernize DAG-related URL routes and rename “tree” to “grid” (#20730)
Move Zombie detection to SchedulerJob (#21181)
Improve speed to run airflow by 6x (#21438)
Add more SQL template fields renderers (#21237)
Simplify fab has access lookup (#19294)
Log context only for default method (#21244)
Log trigger status only if at least one is running (#21191)
Add optional features in providers. (#21074)
Better multiple_outputs inferral for @task.python (#20800)
Improve handling of string type and non-attribute template_fields (#21054)
Remove un-needed deps/version requirements (#20979)
Correctly specify overloads for TaskFlow API for type-hinting (#20933)
Introduce notification_sent to SlaMiss view (#20923)
Rewrite the task decorator as a composition (#20868)
Add “Greater/Smaller than or Equal” to filters in the browse views (#20602) (#20798)
Rewrite DAG run retrieval in task command (#20737)
Speed up creation of DagRun for large DAGs (5k+ tasks) by 25-130% (#20722)
Make native environment Airflow-flavored like sandbox (#20704)
Better error when param value has unexpected type (#20648)
Add filter by state in DagRun REST API (List Dag Runs) (#20485)
Prevent exponential memory growth in Tasks with custom logging handler (#20541)
Set default logger in logging Mixin (#20355)
Reduce deprecation warnings from www (#20378)
Add hour and minute to time format on x-axis of all charts using nvd3.lineChart (#20002)
Add specific warning when Task asks for more slots than pool defined with (#20178)
UI: Update duration column for better human readability (#20112)
Use Viewer role as example public role (#19215)
Properly implement DAG param dict copying (#20216)
ShortCircuitOperator push XCom by returning python_callable result (#20071)
Add clear logging to tasks killed due to a Dagrun timeout (#19950)
Change log level for Zombie detection messages (#20204)
Better confirmation prompts (#20183)
Only execute TIs of running DagRuns (#20182)
Check and run migration in commands if necessary (#18439)
Log only when Zombies exists (#20118)
Increase length of the email and username (#19932)
Add more filtering options for TI’s in the UI (#19910)
Dynamically enable “Test Connection” button by connection type (#19792)
Avoid littering postgres server logs with “could not obtain lock” with HA schedulers (#19842)
Renamed Connection.get_hook parameter to make it the same as in SqlSensor and SqlOperator. (#19849)
Add hook_params in SqlSensor using the latest changes from PR #18718. (#18431)
Speed up webserver boot time by delaying provider initialization (#19709)
Configurable logging of XCOM value in PythonOperator (#19378)
Minimize production js files (#19658)
Add hook_params in BaseSqlOperator (#18718)
Add missing “end_date” to hash components (#19281)
More friendly output of the airflow plugins command + add timetables (#19298)
Add sensor default timeout config (#19119)
Update taskinstance REST API schema to include dag_run_id field (#19105)
Adding feature in bash operator to append the user defined env variable to system env variable (#18944)
Duplicate Connection: Added logic to query if a connection id exists before creating one (#18161)
Bug Fixes¶
Use inherited ‘trigger_tasks’ method (#23016)
In DAG dependency detector, use class type instead of class name (#21706)
Fix tasks being wrongly skipped by schedule_after_task_execution (#23181)
Fix X-Frame enabled behaviour (#23222)
Allow extra to be nullable in connection payload as per schema (REST API). (#23183)
Fix dag_id extraction for dag level access checks in web ui (#23015)
Fix timezone display for logs on UI (#23075)
Include message in graph errors (#23021)
Change trigger dropdown left position (#23013)
Don’t add planned tasks for legacy DAG runs (#23007)
Add dangling rows check for TaskInstance references (#22924)
Validate the input params in connection CLI command (#22688)
Fix trigger event payload is not persisted in db (#22944)
Drop “airflow moved” tables in command db reset (#22990)
Add max width to task group tooltips (#22978)
Add template support for external_task_ids. (#22809)
Allow DagParam to hold falsy values (#22964)
Fix regression in pool metrics (#22939)
Priority order tasks even when using pools (#22483)
Do not clear XCom when resuming from deferral (#22932)
Handle invalid JSON metadata in get_logs_with_metadata endpoint. (#22898)
Fix pre-upgrade check for rows dangling w.r.t. dag_run (#22850)
Fixed backfill interference with scheduler (#22701)
Support conf param override for backfill runs (#22837)
Correctly interpolate pool name in PoolSlotsAvailableDep statuses (#22807)
Fix email_on_failure with render_template_as_native_obj (#22770)
Fix processor cleanup on DagFileProcessorManager (#22685)
Prevent meta name clash for task instances (#22783)
remove json parse for gantt chart (#22780)
Check for missing dagrun should know version (#22752)
Fixes ScheduleInterval spec (#22635)
Fixing task status for non-running and non-committed tasks (#22410)
Do not log the hook connection details even at DEBUG level (#22627)
Stop crashing when empty logs are received from kubernetes client (#22566)
Fix bugs about timezone change (#22525)
Fix entire DAG stops when one task has end_date (#20920)
Use logger to print message during task execution. (#22488)
Make sure finalizers are not skipped during exception handling (#22475)
update smart sensor docs and minor fix on is_smart_sensor_compatible() (#22386)
Fix run_id k8s and elasticsearch compatibility with Airflow 2.1 (#22385)
Allow to except_skip None on BranchPythonOperator (#20411)
Fix incorrect datetime details (DagRun views) (#21357)
Remove incorrect deprecation warning in secrets backend (#22326)
Remove RefreshConfiguration workaround for K8s token refreshing (#20759)
Masking extras in GET /connections/<connection> endpoint (#22227)
Set queued_dttm when submitting task directly to executor (#22259)
Addressed some issues in the tutorial mentioned in discussion #22233 (#22236)
Change default python executable to python3 for docker decorator (#21973)
Don’t validate that Params are JSON when NOTSET (#22000)
Add per-DAG delete permissions (#21938)
Fix handling some None parameters in kubernetes 23 libs. (#21905)
Fix handling of empty (None) tags in bulk_write_to_db (#21757)
Fix DAG date range bug (#20507)
Removed request.referrer from views.py (#21751)
Make DbApiHook use get_uri from Connection (#21764)
Fix some migrations (#21670)
[de]serialize resources on task correctly (#21445)
Add params dag_id, task_id etc to XCom.serialize_value (#19505)
Update test connection functionality to use custom form fields (#21330)
fix all “high” npm vulnerabilities (#21526)
Fix bug incorrectly removing action from role, rather than permission. (#21483)
Fix relationship join bug in FAB/SecurityManager with SQLA 1.4 (#21296)
Use Identity instead of Sequence in SQLAlchemy 1.4 for MSSQL (#21238)
Ensure on_task_instance_running listener can get at task (#21157)
Return to the same place when triggering a DAG (#20955)
Fix task ID deduplication in @task_group (#20870)
Add downgrade to some FAB migrations (#20874)
Only validate Params when DAG is triggered (#20802)
Fix airflow trigger cli (#20781)
Fix task instances iteration in a pool to prevent blocking (#20816)
Allow depending to a @task_group as a whole (#20671)
Use original task’s start_date if a task continues after deferral (#20062)
Disabled edit button in task instances list view page (#20659)
Fix a package name import error (#20519)
Remove execution_date label when get cleanup pods list (#20417)
Remove unneeded FAB REST API endpoints (#20487)
Fix parsing of Cloudwatch log group arn containing slashes (#14667) (#19700)
Sanity check for MySQL’s TIMESTAMP column (#19821)
Allow using default celery command group with executors subclassed from Celery-based executors. (#18189)
Move class_permission_name to mixin so it applies to all classes (#18749)
Adjust trimmed_pod_id and replace ‘.’ with ‘-’ (#19036)
Pass custom_headers to send_email and send_email_smtp (#19009)
Ensure catchup=False is used in example dags (#19396)
Edit permalinks in OpenApi description file (#19244)
Navigate directly to DAG when selecting from search typeahead list (#18991)
[Minor] Fix padding on home page (#19025)
Doc only changes¶
Update doc for DAG file processing (#23209)
Replace changelog/updating with release notes and towncrier now (#22003)
Fix wrong reference in tracking-user-activity.rst (#22745)
Remove references to rbac = True from docs (#22725)
Doc: Update description for executor-bound dependencies (#22601)
Update check-health.rst (#22372)
Stronger language about Docker Compose customizability (#22304)
Update logging-tasks.rst (#22116)
Add example config of sql_alchemy_connect_args (#22045)
Update best-practices.rst (#22053)
Add information on DAG pausing/deactivation/deletion (#22025)
Add brief examples of integration test dags you might want (#22009)
Run inclusive language check on CHANGELOG (#21980)
Add detailed email docs for Sendgrid (#21958)
Add docs for db upgrade / db downgrade (#21879)
Update modules_management.rst (#21889)
Fix UPDATING section on SqlAlchemy 1.4 scheme changes (#21887)
Update TaskFlow tutorial doc to show how to pass “operator-level” args. (#21446)
Fix doc - replace decreasing by increasing (#21805)
Add another way to dynamically generate DAGs to docs (#21297)
Add extra information about time synchronization needed (#21685)
Update debug.rst docs (#21246)
Replaces the usage of postgres:// with postgresql:// (#21205)
Fix task execution process in CeleryExecutor docs (#20783)
Misc/Internal¶
Bring back deprecated security manager functions (#23243)
Replace usage of DummyOperator with EmptyOperator (#22974)
Deprecate DummyOperator in favor of EmptyOperator (#22832)
Remove unnecessary python 3.6 conditionals (#20549)
Bump moment from 2.29.1 to 2.29.2 in /airflow/www (#22873)
Bump prismjs from 1.26.0 to 1.27.0 in /airflow/www (#22823)
Bump nanoid from 3.1.23 to 3.3.2 in /airflow/www (#22803)
Bump minimist from 1.2.5 to 1.2.6 in /airflow/www (#22798)
Remove dag parsing from db init command (#22531)
Update our approach for executor-bound dependencies (#22573)
Use Airflow.Base.metadata in FAB models (#22353)
Limit docutils to make our documentation pretty again (#22420)
Add Python 3.10 support (#22050)
[FEATURE] add 1.22 1.23 K8S support (#21902)
Remove pandas upper limit now that SQLA is 1.4+ (#22162)
Patch sql_alchemy_conn if old postgres scheme used (#22333)
Protect against accidental misuse of XCom.get_value() (#22244)
Order filenames for migrations (#22168)
Don’t try to auto generate migrations for Celery tables (#22120)
Require SQLAlchemy 1.4 (#22114)
bump sphinx-jinja (#22101)
Add compat shim for SQLAlchemy to avoid warnings (#21959)
Rename xcom.dagrun_id to xcom.dag_run_id (#21806)
Deprecate non-JSON conn.extra (#21816)
Bump upper bound version of jsonschema to 5.0 (#21712)
Deprecate helper utility days_ago (#21653)
Remove `:type` lines now sphinx-autoapi supports type hints (#20951)
Silence deprecation warning in tests (#20900)
Use DagRun.run_id instead of execution_date when updating state of TIs (UI & REST API) (#18724)
Add Context stub to Airflow packages (#20817)
Update Kubernetes library version (#18797)
Rename PodLauncher to PodManager (#20576)
Removes Python 3.6 support (#20467)
Add deprecation warning for non-json-serializable params (#20174)
Rename TaskMixin to DependencyMixin (#20297)
Deprecate passing execution_date to XCom methods (#19825)
Remove get_readable_dags and get_editable_dags, and get_accessible_dags. (#19961)
Remove postgres 9.6 support (#19987)
Removed hardcoded connection types. Check if hook is instance of DbApiHook. (#19639)
add kubernetes 1.21 support (#19557)
Add FAB base class and set import_name explicitly. (#19667)
Removes unused state transitions to handle auto-changing view permissions. (#19153)
Chore: Use enum for __var and __type members (#19303)
Use fab models (#19121)
Consolidate method names between Airflow Security Manager and FAB default (#18726)
Remove distutils usages for Python 3.10 (#19064)
Removing redundant max_tis_per_query initialisation on SchedulerJob (#19020)
Remove deprecated usage of init_role() from API (#18820)
Remove duplicate code on dbapi hook (#18821)
Airflow 2.2.5 (2022-04-04)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Check and disallow a relative path for sqlite (#22530)
Fixed dask executor and tests (#22027)
Fix broken links to celery documentation (#22364)
Fix incorrect data provided to tries & landing times charts (#21928)
Fix assignment of unassigned triggers (#21770)
Fix triggerer --capacity parameter (#21753)
Fix graph auto-refresh on page load (#21736)
Fix filesystem sensor for directories (#21729)
Fix stray order_by(TaskInstance.execution_date) (#21705)
Correctly handle multiple ‘=’ in LocalFileSystem secrets. (#21694)
Log exception in local executor (#21667)
Disable default_pool delete on web ui (#21658)
Extends typing-extensions to be installed with python 3.8+ #21566 (#21567)
Dispose unused connection pool (#21565)
Fix logging JDBC SQL error when task fails (#21540)
Filter out default configs when overrides exist. (#21539)
Fix Resources __eq__ check (#21442)
Fix max_active_runs=1 not scheduling runs when min_file_process_interval is high (#21413)
Reduce DB load incurred by Stale DAG deactivation (#21399)
Fix race condition between triggerer and scheduler (#21316)
Fix trigger dag redirect from task instance log view (#21239)
Log traceback in trigger exceptions (#21213)
A trigger might use a connection; make sure we mask passwords (#21207)
Update ExternalTaskSensorLink to handle templated external_dag_id (#21192)
Ensure clear_task_instances sets valid run state (#21116)
Fix: Update custom connection field processing (#20883)
Truncate stack trace to DAG user code for exceptions raised during execution (#20731)
Fix duplicate trigger creation race condition (#20699)
Fix Tasks getting stuck in scheduled state (#19747)
Fix: Do not render undefined graph edges (#19684)
Set X-Frame-Options header to DENY only if X_FRAME_ENABLED is set to true. (#19491)
Doc only changes¶
adding on_execute_callback to callbacks docs (#22362)
Add documentation on specifying a DB schema. (#22347)
Fix postgres part of pipeline example of tutorial (#21586)
Extend documentation for states of DAGs & tasks and update trigger rules docs (#21382)
DB upgrade is required when updating Airflow (#22061)
Remove misleading MSSQL information from the docs (#21998)
Misc¶
Add the new Airflow Trove Classifier to setup.cfg (#22241)
Rename to_delete to to_cancel in TriggerRunner (#20658)
Update Flask-AppBuilder to 3.4.5 (#22596)
Airflow 2.2.4 (2022-02-22)¶
Significant Changes¶
Smart sensors deprecated¶
Smart sensors, an “early access” feature added in Airflow 2, are now deprecated and will be removed in Airflow 2.4.0. They have been superseded by Deferrable Operators, added in Airflow 2.2.0.
See Migrating to Deferrable Operators for details on how to migrate.
Bug Fixes¶
Adding missing login provider related methods from Flask-Appbuilder (#21294)
Fix slow DAG deletion due to missing dag_id index for job table (#20282)
Add a session backend to store session data in the database (#21478)
Show task status only for running dags or only for the last finished dag (#21352)
Use compat data interval shim in log handlers (#21289)
Fix mismatch in generated run_id and logical date of DAG run (#18707)
Fix TriggerDagRunOperator extra link (#19410)
Add possibility to create user in the Remote User mode (#19963)
Avoid deadlock when rescheduling task (#21362)
Fix the incorrect scheduling time for the first run of dag (#21011)
Fix Scheduler crash when executing task instances of missing DAG (#20349)
Deferred tasks does not cancel when DAG is marked fail (#20649)
Removed duplicated dag_run join in Dag.get_task_instances() (#20591)
Avoid unintentional data loss when deleting DAGs (#20758)
Fix session usage in /rendered-k8s view (#21006)
Fix airflow dags backfill --reset-dagruns errors when run twice (#21062)
Do not set TaskInstance.max_tries in refresh_from_task (#21018)
Don’t require dag_id in body in dagrun REST API endpoint (#21024)
Add Roles from Azure OAUTH Response in internal Security Manager (#20707)
Allow Viewing DagRuns and TIs if a user has DAG “read” perms (#20663)
Fix running airflow dags test <dag_id> <execution_dt> results in error when run twice (#21031)
Switch to non-vendored latest connexion library (#20910)
Bump flask-appbuilder to >=3.3.4 (#20628)
upgrade celery to 5.2.3 (#19703)
Bump croniter from <1.1 to <1.2 (#20489)
Avoid calling DAG.following_schedule() for TaskInstance.get_template_context() (#20486)
Fix(standalone): Remove hardcoded Webserver port (#20429)
Remove unnecessary logging in experimental API (#20356)
Un-ignore DeprecationWarning (#20322)
Deepcopying Kubernetes Secrets attributes causing issues (#20318)
Fix(dag-dependencies): fix arrow styling (#20303)
Adds retry on taskinstance retrieval lock (#20030)
Correctly send timing metrics when using dogstatsd (fix schedule_delay metric) (#19973)
Enhance multiple_outputs inference of dict typing (#19608)
Fixing Amazon SES email backend (#18042)
Pin MarkupSafe until we are able to upgrade Flask/Jinja (#21664)
Doc only changes¶
Added explaining concept of logical date in DAG run docs (#21433)
Add note about Variable precedence with env vars (#21568)
Update error docs to include before_send option (#21275)
Augment xcom docs (#20755)
Add documentation and release policy on “latest” constraints (#21093)
Add a link to the DAG model in the Python API reference (#21060)
Added an enum param example (#20841)
Compare taskgroup and subdag (#20700)
Add note about reserved params keyword (#20640)
Improve documentation on Params (#20567)
Fix typo in MySQL Database creation code (Set up DB docs) (#20102)
Add requirements.txt description (#20048)
Clean up default_args usage in docs (#19803)
Add docker-compose explanation to conn localhost (#19076)
Update CSV ingest code for tutorial (#18960)
Adds Pendulum 1.x -> 2.x upgrade documentation (#18955)
Clean up dynamic start_date values from docs (#19607)
Docs for multiple pool slots (#20257)
Update upgrading.rst with detailed code example of how to resolve post-upgrade warning (#19993)
Misc¶
Deprecate some functions in the experimental API (#19931)
Deprecate smart sensors (#20151)
Airflow 2.2.3 (2021-12-21)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Lazy Jinja2 context (#20217)
Exclude snowflake-sqlalchemy v1.2.5 (#20245)
Move away from legacy importlib.resources API (#19091)
Move setgid as the first command executed in forked task runner (#20040)
Fix race condition when starting DagProcessorAgent (#19935)
Limit httpx to <0.20.0 (#20218)
Log provider import errors as debug warnings (#20172)
Bump minimum required alembic version (#20153)
Fix log link in gantt view (#20121)
fixing #19028 by moving chown to use sudo (#20114)
Lift off upper bound for MarkupSafe (#20113)
Fix infinite recursion on redact log (#20039)
Fix db downgrades (#19994)
Context class handles deprecation (#19886)
Fix possible reference to undeclared variable (#19933)
Validate DagRun state is valid on assignment (#19898)
Workaround occasional deadlocks with MSSQL (#19856)
Enable task run setting to be able reinitialize (#19845)
Fix log endpoint for same task (#19672)
Cast macro datetime string inputs explicitly (#19592)
Do not crash with stacktrace when task instance is missing (#19478)
Fix log timezone in task log view (#19342) (#19401)
Fix: Add taskgroup tooltip to graph view (#19083)
Rename execution date in forms and tables (#19063)
Simplify “invalid TI state” message (#19029)
Handle case of nonexistent file when preparing file path queue (#18998)
Do not create dagruns for DAGs with import errors (#19367)
Fix field relabeling when switching between conn types (#19411)
KubernetesExecutor should default to template image if used (#19484)
Fix task instance api cannot list task instances with None state (#19487)
Fix IntegrityError in DagFileProcessor.manage_slas (#19553)
Declare data interval fields as serializable (#19616)
Relax timetable class validation (#19878)
Fix labels used to find queued KubernetesExecutor pods (#19904)
Fix moved data migration check for MySQL when replication is used (#19999)
Doc only changes¶
Warn without tracebacks when example_dags are missing deps (#20295)
Deferrable operators doc clarification (#20150)
Ensure the example DAGs are all working (#19355)
Updating core example DAGs to use TaskFlow API where applicable (#18562)
Add xcom clearing behaviour on task retries (#19968)
Add a short chapter focusing on adapting secret format for connections (#19859)
Add information about supported OS-es for Apache Airflow (#19855)
Update docs to reflect that changes to the base_log_folder require updating other configs (#19793)
Disclaimer in KubernetesExecutor pod template docs (#19686)
Add upgrade note on execution_date -> run_id (#19593)
Expanding .output operator property information in TaskFlow tutorial doc (#19214)
Add example SLA DAG (#19563)
Add a proper example to patch DAG (#19465)
Add DAG file processing description to Scheduler Concepts (#18954)
Updating explicit arg example in TaskFlow API tutorial doc (#18907)
Adds back documentation about context usage in Python/@task (#18868)
Add release date for when an endpoint/field is added in the REST API (#19203)
Better pod_template_file examples (#19691)
Add description on how you can customize image entrypoint (#18915)
Dags-in-image pod template example should not have dag mounts (#19337)
Airflow 2.2.2 (2021-11-15)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Fix bug when checking for existence of a Variable (#19395)
Fix Serialization when relativedelta is passed as schedule_interval (#19418)
Fix moving of dangling TaskInstance rows for SQL Server (#19425)
Fix task instance modal in gantt view (#19258)
Fix serialization of Params with set data type (#19267)
Check if job object is None before calling .is_alive() (#19380)
Task should fail immediately when pod is unprocessable (#19359)
Fix downgrade for a DB Migration (#19390)
Only mark SchedulerJobs as failed, not any jobs (#19375)
Fix message on “Mark as” confirmation page (#19363)
Bugfix: Check next run exists before reading data interval (#19307)
Fix MySQL db migration with default encoding/collation (#19268)
Fix hidden tooltip position (#19261)
sqlite_default Connection has been hard-coded to /tmp, use gettempdir instead (#19255)
Fix Toggle Wrap on DAG code page (#19211)
Clarify “dag not found” error message in CLI (#19338)
Add Note to SLA regarding schedule_interval (#19173)
Use execution_date to check for existing DagRun for TriggerDagRunOperator (#18968)
Add explicit session parameter in PoolSlotsAvailableDep (#18875)
FAB still requires WTForms<3.0 (#19466)
Fix missing dagruns when catchup=True (#19528)
Doc only changes¶
Add missing parameter documentation for “timetable” (#19282)
Improve Kubernetes Executor docs (#19339)
Update image tag used in docker docs
Airflow 2.2.1 (2021-10-29)¶
Significant Changes¶
Param’s default value for default removed¶
Param, introduced in Airflow 2.2.0, accidentally set the default value to None. This default has been removed. If you want None as your default, explicitly set it as such. For example:
Param(None, type=["null", "string"])
Now if you resolve a Param without a default and don’t pass a value, you will get a TypeError. For example:
Param().resolve() # raises TypeError
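The difference between "no default" and "a default of None" is commonly modelled with a sentinel object. The following is a minimal standalone sketch of that idea. SketchParam and _NOTSET are hypothetical names used only for illustration; this is not Airflow's Param implementation and it performs no schema validation.

```python
# Hypothetical stand-in, NOT Airflow's Param: it only illustrates why
# "no default given" must be distinguishable from "default is None".
_NOTSET = object()  # sentinel meaning "nothing was supplied"


class SketchParam:
    def __init__(self, default=_NOTSET):
        self.default = default

    def resolve(self, value=_NOTSET):
        """Return the passed value, else the default, else raise TypeError."""
        if value is not _NOTSET:
            return value
        if self.default is _NOTSET:
            raise TypeError("No value passed and Param has no default value")
        return self.default


print(SketchParam(None).resolve())  # an explicit None default resolves to None

try:
    SketchParam().resolve()  # no default and no value
except TypeError as exc:
    print(f"TypeError: {exc}")
```

With a sentinel, None stays a legitimate value that must be requested explicitly, which mirrors the behaviour described above.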
max_queued_runs_per_dag configuration has been removed¶
The max_queued_runs_per_dag configuration option in the [core] section has been removed. Previously, this controlled the number of queued DAG runs the scheduler could create for a DAG. Now, the maximum number is controlled internally by the DAG’s max_active_runs.
Bug Fixes¶
Fix Unexpected commit error in SchedulerJob (#19213)
Add DagRun.logical_date as a property (#19198)
Clear ti.next_method and ti.next_kwargs on task finish (#19183)
Faster PostgreSQL db migration to Airflow 2.2 (#19166)
Remove incorrect type comment in Swagger2Specification._set_defaults classmethod (#19065)
Add TriggererJob to jobs check command (#19179, #19185)
Hide tooltip when next run is None (#19112)
Create TI context with data interval compat layer (#19148)
Fix queued dag runs changes catchup=False behaviour (#19130, #19145)
add detailed information to logging when a dag or a task finishes. (#19097)
Warn about unsupported Python 3.10 (#19060)
Fix catchup by limiting queued dagrun creation using max_active_runs (#18897)
Prevent scheduler crash when serialized dag is missing (#19113)
Don’t install SQLAlchemy/Pendulum adapters for other DBs (#18745)
Workaround libstdcpp TLS error (#19010)
Change ds, ts, etc. back to use logical date (#19088)
Ensure task state doesn’t change when marked as failed/success/skipped (#19095)
Relax packaging requirement (#19087)
Rename trigger page label to Logical Date (#19061)
Allow Param to support a default value of None (#19034)
Upgrade old DAG/task param format when deserializing from the DB (#18986)
Don’t bake ENV and _cmd into tmp config for non-sudo (#18772)
CLI: Fail backfill command before loading DAGs if missing args (#18994)
BugFix: Null execution date on insert to task_fail violating NOT NULL (#18979)
Try to move “dangling” rows in db upgrade (#18953)
Row lock TI query in SchedulerJob._process_executor_events (#18975)
Sentry before send fallback (#18980)
Fix XCom.delete error in Airflow 2.2.0 (#18956)
Check python version before starting triggerer (#18926)
Doc only changes¶
Update access control documentation for TaskInstances and DagRuns (#18644)
Add information about keepalives for managed Postgres (#18850)
Doc: Add Callbacks Section to Logging & Monitoring (#18842)
Group PATCH DAGrun together with other DAGRun endpoints (#18885)
Airflow 2.2.0 (2021-10-11)¶
Significant Changes¶
Note: Upgrading the database to 2.2.0 or later can take some time to complete, particularly if you have a large task_instance table.
worker_log_server_port configuration has been moved to the logging section.¶
The worker_log_server_port configuration option has been moved from the [celery] section to the [logging] section to allow for reuse between different executors.
pandas is now an optional dependency¶
Previously pandas was a core requirement, so running pip install apache-airflow looked for the pandas library and installed it if it was not already present.
If you want to install a pandas version compatible with Airflow, you can use the [pandas] extra while installing Airflow. For example, for Python 3.8 and Airflow 2.1.2:
pip install -U "apache-airflow[pandas]==2.1.2" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.2/constraints-3.8.txt"
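The constraints URL in the command above embeds both the Airflow version and the Python version. As a small sketch of how that command could be assembled for other version pairs, the helpers below build the same URL and pip invocation. The function names constraints_url and pip_command are hypothetical, and the sketch assumes the constraints files for other versions follow the same naming pattern as the example.

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints-file URL for an Airflow/Python version pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_command(airflow_version: str, python_version: str, extra: str = "pandas") -> str:
    """Return a pip command string installing Airflow with a single extra."""
    url = constraints_url(airflow_version, python_version)
    return (
        f'pip install -U "apache-airflow[{extra}]=={airflow_version}" '
        f'--constraint "{url}"'
    )


# Reproduces the example from the notes: Python 3.8 and Airflow 2.1.2.
print(pip_command("2.1.2", "3.8"))
```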
none_failed_or_skipped trigger rule has been deprecated¶
TriggerRule.NONE_FAILED_OR_SKIPPED is replaced by TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS.
This is only a name change; no functionality has changed.
This change is backward compatible; however, TriggerRule.NONE_FAILED_OR_SKIPPED will be removed in the next major release.
Dummy trigger rule has been deprecated¶
TriggerRule.DUMMY is replaced by TriggerRule.ALWAYS.
This is only a name change; no functionality has changed.
This change is backward compatible; however, TriggerRule.DUMMY will be removed in the next major release.
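DAG files that pass trigger rules as plain strings can be updated ahead of the removal with a simple lookup. The sketch below covers the two trigger rule renames in this release. It assumes the string values are the lowercase forms of the TriggerRule constant names; modernize_trigger_rule and RENAMED_TRIGGER_RULES are hypothetical helpers, not part of Airflow.

```python
import warnings

# Deprecated trigger rule value -> replacement value. Assumption: these are
# the lowercase string forms of the TriggerRule constants named above.
RENAMED_TRIGGER_RULES = {
    "none_failed_or_skipped": "none_failed_min_one_success",
    "dummy": "always",
}


def modernize_trigger_rule(rule: str) -> str:
    """Return the current name for a trigger rule, warning if it was renamed."""
    replacement = RENAMED_TRIGGER_RULES.get(rule)
    if replacement is None:
        return rule  # not renamed, pass through unchanged
    warnings.warn(
        f"trigger rule {rule!r} is deprecated; use {replacement!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement


for rule in ("dummy", "none_failed_or_skipped", "all_success"):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        print(rule, "->", modernize_trigger_rule(rule))
```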
DAG concurrency settings have been renamed¶
The [core] dag_concurrency setting in airflow.cfg has been renamed to [core] max_active_tasks_per_dag for better understanding.
It is the maximum number of task instances allowed to run concurrently in each DAG. To calculate the number of tasks that are running concurrently for a DAG, add up the number of running tasks for all DAG runs of the DAG.
This is configurable at the DAG level with max_active_tasks, and a default can be set in airflow.cfg as [core] max_active_tasks_per_dag.
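As a worked illustration of the counting rule (running tasks are summed over all DAG runs of a DAG), the sketch below uses a hypothetical in-memory snapshot of task instances. The tuple layout and the helper names are assumptions made for the example; the real scheduler performs this check against the metadata database.

```python
from collections import Counter

# Hypothetical snapshot of task instances: (dag_id, run_id, task_id, state).
TASK_INSTANCES = [
    ("example_dag", "run_1", "extract", "running"),
    ("example_dag", "run_1", "load", "queued"),
    ("example_dag", "run_2", "extract", "running"),
    ("example_dag", "run_2", "transform", "running"),
    ("other_dag", "run_1", "extract", "running"),
]


def running_per_dag(task_instances):
    """Count running task instances per DAG, across all of its DAG runs."""
    return Counter(
        dag_id for dag_id, _run, _task, state in task_instances if state == "running"
    )


def has_free_slot(dag_id, task_instances, max_active_tasks):
    """True if the DAG may start another task under its max_active_tasks limit."""
    return running_per_dag(task_instances)[dag_id] < max_active_tasks


print(running_per_dag(TASK_INSTANCES)["example_dag"])  # 3: one in run_1, two in run_2
print(has_free_slot("example_dag", TASK_INSTANCES, max_active_tasks=3))  # False
```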
Before:
[core]
dag_concurrency = 16
Now:
[core]
max_active_tasks_per_dag = 16
Similarly, DAG.concurrency has been renamed to DAG.max_active_tasks.
Before:
dag = DAG(
    dag_id="example_dag",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    concurrency=3,
)
Now:
dag = DAG(
    dag_id="example_dag",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    max_active_tasks=3,
)
If you are using the DAGs Details API endpoint, use max_active_tasks instead of concurrency.
Task concurrency parameter has been renamed¶
BaseOperator.task_concurrency has been deprecated and renamed to max_active_tis_per_dag for better understanding.
This parameter controls the number of concurrently running task instances across dag_runs per task.
Before:
with DAG(dag_id="task_concurrency_example"):
    BashOperator(task_id="t1", task_concurrency=2, bash_command="echo Hi")
After:
with DAG(dag_id="task_concurrency_example"):
    BashOperator(task_id="t1", max_active_tis_per_dag=2, bash_command="echo Hi")
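Where DAG or operator arguments are assembled as dictionaries (for example in generated DAGs), the two keyword renames described in these notes can be applied mechanically. This is a minimal sketch under that assumption; rename_kwargs and the two mapping names are hypothetical and nothing here imports Airflow.

```python
# Keyword renames from these notes, for DAG(...) and operator arguments.
DAG_RENAMES = {"concurrency": "max_active_tasks"}
OPERATOR_RENAMES = {"task_concurrency": "max_active_tis_per_dag"}


def rename_kwargs(kwargs, renames):
    """Return a copy of kwargs with deprecated names replaced by new ones."""
    updated = {}
    for name, value in kwargs.items():
        new_name = renames.get(name, name)
        # Refuse to guess when both the old and the new spelling are present.
        if new_name != name and new_name in kwargs:
            raise ValueError(f"both {name!r} and {new_name!r} were given")
        updated[new_name] = value
    return updated


print(rename_kwargs({"dag_id": "example_dag", "concurrency": 3}, DAG_RENAMES))
print(rename_kwargs({"task_id": "t1", "task_concurrency": 2}, OPERATOR_RENAMES))
```

Raising on a conflict instead of silently preferring one spelling keeps the migration from changing behaviour unnoticed.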
processor_poll_interval config has been renamed to scheduler_idle_sleep_time¶
The [scheduler] processor_poll_interval setting in airflow.cfg has been renamed to [scheduler] scheduler_idle_sleep_time for better understanding.
It controls the ‘time to sleep’ at the end of the Scheduler loop if nothing was scheduled inside SchedulerJob.
Before:
[scheduler]
processor_poll_interval = 16
Now:
[scheduler]
scheduler_idle_sleep_time = 16
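The option moves and renames listed in these 2.2.0 notes can be applied to a configuration file with the standard library. The sketch below is one possible approach rather than an official migration tool: migrate_options and RENAMED_OPTIONS are hypothetical names, the sample values are made up, and writing the file back with configparser would drop comments from a real airflow.cfg.

```python
import configparser

# (old section, old option) -> (new section, new option), per these notes.
RENAMED_OPTIONS = {
    ("celery", "worker_log_server_port"): ("logging", "worker_log_server_port"),
    ("core", "dag_concurrency"): ("core", "max_active_tasks_per_dag"),
    ("scheduler", "processor_poll_interval"): ("scheduler", "scheduler_idle_sleep_time"),
}


def migrate_options(cfg):
    """Move renamed options in place and return a list of change descriptions."""
    changes = []
    for (old_sec, old_opt), (new_sec, new_opt) in RENAMED_OPTIONS.items():
        if not cfg.has_option(old_sec, old_opt):
            continue
        value = cfg.get(old_sec, old_opt)
        if not cfg.has_section(new_sec):
            cfg.add_section(new_sec)
        # Do not overwrite a value already set under the new name.
        if not cfg.has_option(new_sec, new_opt):
            cfg.set(new_sec, new_opt, value)
        cfg.remove_option(old_sec, old_opt)
        changes.append(f"[{old_sec}] {old_opt} -> [{new_sec}] {new_opt}")
    return changes


# Sample configuration with illustrative values only.
SAMPLE = """
[core]
dag_concurrency = 16

[celery]
worker_log_server_port = 8793

[scheduler]
processor_poll_interval = 1
"""

# interpolation=None keeps literal "%" characters in values untouched.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)
for change in migrate_options(config):
    print(change)
```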
Marking success/failed automatically clears failed downstream tasks¶
When marking a task success/failed in Graph View, its downstream tasks that are in failed/upstream_failed state are automatically cleared.
[core] store_dag_code has been removed¶
While DAG Serialization has been a strict requirement since Airflow 2, we allowed users to control where the Webserver looked when showing the Code View.
If [core] store_dag_code was set to True, the Scheduler stored the code of the DAG file in the DB (in the dag_code table) as a plain string, and the webserver just read it from the same table. If the value was set to False, the webserver read it from the DAG file.
While this setting made sense for Airflow < 2, it confused some users, who thought it controlled DAG Serialization.
From Airflow 2.2, Airflow will only look in the DB when a user clicks on Code View for a DAG.
Clearing a running task sets its state to RESTARTING¶
Previously, clearing a running task set its state to SHUTDOWN. The task was killed and went into the FAILED state. After #16681, clearing a running task sets its state to RESTARTING. The task is eligible for retry without going into the FAILED state.
Remove TaskInstance.log_filepath attribute¶
This attribute returned incorrect values for a long time, because it did not take into account the different logger configurations and task retries. We have also started supporting more advanced tools that don’t use files, so it is impossible to determine the correct file path in every case; for example, Stackdriver doesn’t use files but identifies logs based on labels. For this reason, we decided to delete this attribute.
If you need to read logs, you can use the airflow.utils.log.log_reader.TaskLogReader class, which does not have the above restrictions.
If a sensor times out, it will not retry¶
Previously, a sensor was retried when it timed out until the number of retries was exhausted, so the effective timeout of a sensor was timeout * (retries + 1). This behaviour has now changed: a sensor will immediately fail without retrying if timeout is reached. If it’s desirable to let the sensor continue running for a longer time, set a larger timeout instead.
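The arithmetic behind this change can be sketched as follows. The two function names and the sample values are assumptions chosen for illustration; they are not Airflow APIs.

```python
def effective_timeout_before(timeout: float, retries: int) -> float:
    """Old behaviour: a timed-out sensor was retried, so timeouts stacked up."""
    return timeout * (retries + 1)


def effective_timeout_now(timeout: float, retries: int) -> float:
    """New behaviour: reaching the timeout fails the sensor without retrying."""
    return timeout


# Illustrative values (assumptions): a one-hour timeout with two retries.
timeout_seconds = 3600
retries = 2
before = effective_timeout_before(timeout_seconds, retries)
now = effective_timeout_now(timeout_seconds, retries)
print(before, now)
```

With these sample values, keeping the previous overall waiting window would mean setting timeout to the value computed by effective_timeout_before.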
Default Task Pools Slots can be set using [core] default_pool_task_slot_count¶
By default, tasks run in default_pool. default_pool is initialized with 128 slots, and users can change the number of slots through the UI/CLI/API for an existing deployment.
For new deployments, you can use the default_pool_task_slot_count setting in the [core] section. This setting does not have any effect in an existing deployment where the default_pool already exists.
Previously this was controlled by non_pooled_task_slot_count in the [core] section, which was not documented.
Webserver DAG refresh buttons removed¶
Now that the DAG parser syncs DAG permissions there is no longer a need for manually refreshing DAGs. As such, the buttons to refresh a DAG have been removed from the UI.
In addition, the /refresh and /refresh_all webserver endpoints have also been removed.
TaskInstances now require a DagRun¶
Under normal operation every TaskInstance row in the database would have a DagRun row too, but it was possible to manually delete the DagRun and Airflow would still schedule the TaskInstances.
In Airflow 2.2 we have changed this and now there is a database-level foreign key constraint ensuring that every TaskInstance has a DagRun row.
Before updating to this 2.2 release you will have to manually resolve any inconsistencies (add back DagRun rows, or delete TaskInstances) if you have any “dangling” TaskInstance rows.
As part of this change the clean_tis_without_dagrun_interval config option under the [scheduler] section has been removed and has no effect.
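One way to look for “dangling” rows before upgrading is an anti-join between the two tables. The sketch below uses an in-memory SQLite database with a heavily simplified, assumed schema (only dag_id, execution_date, and task_id); the real Airflow tables have many more columns, so treat the query as a starting point to adapt to your metadatabase rather than a ready-made tool.

```python
import sqlite3

# Simplified stand-in tables (assumption): the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT);
    CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);
    INSERT INTO dag_run VALUES ('example', '2021-10-01');
    INSERT INTO task_instance VALUES ('t1', 'example', '2021-10-01');
    INSERT INTO task_instance VALUES ('t1', 'example', '2021-10-02');
    """
)

# A TaskInstance is "dangling" when no DagRun shares its dag_id and execution_date.
DANGLING_QUERY = """
    SELECT ti.task_id, ti.dag_id, ti.execution_date
    FROM task_instance AS ti
    LEFT JOIN dag_run AS dr
      ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
    WHERE dr.dag_id IS NULL
"""
dangling = conn.execute(DANGLING_QUERY).fetchall()
print(dangling)
```

Each row returned would need either its DagRun row restored or the TaskInstance deleted, as described above.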
TaskInstance and TaskReschedule now define run_id instead of execution_date¶
As a part of the TaskInstance-DagRun relation change, the execution_date columns on TaskInstance and TaskReschedule have been removed from the database, and replaced by association proxy fields at the ORM level. If you access Airflow’s metadatabase directly, you should rewrite the implementation to use the run_id columns instead.
Note that Airflow’s metadatabase definition on both the database and ORM levels is considered an implementation detail without strict backward compatibility guarantees.
DaskExecutor - Dask Worker Resources and queues¶
If dask workers are not started with complementary resources to match the specified queues, it will now result in an AirflowException, whereas before it would have just ignored the queue argument.
Logical date of a DAG run triggered from the web UI now has its sub-second component set to zero¶
Due to a change in how the logical date (execution_date) is generated for a manual DAG run, a manual DAG run’s logical date may not match its time-of-trigger, but have its sub-second part zeroed out. For example, a DAG run triggered on 2021-10-11T12:34:56.78901 would have its logical date set to 2021-10-11T12:34:56.00000.
This may affect some logic that relies on this quirk to detect whether a run is triggered manually or not. Note that dag_run.run_type is a more authoritative value for this purpose. Also, if you need this distinction between automated and manually-triggered runs for “next execution date” calculation, please also consider using the new data interval variables instead, which provide more consistent behavior between the two run types.
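The effect on the timestamp can be reproduced with the standard library. This is only an illustration of the resulting value, not Airflow's implementation; the UTC timezone is an assumption added to make the example self-contained.

```python
from datetime import datetime, timezone

# Trigger time from the example above (timezone assumed to be UTC).
triggered_at = datetime(2021, 10, 11, 12, 34, 56, 789010, tzinfo=timezone.utc)

# Zero the sub-second part, mirroring what happens to the logical date.
logical_date = triggered_at.replace(microsecond=0)

print(triggered_at.isoformat())
print(logical_date.isoformat())
```

Code that compared the logical date with the trigger time to detect manual runs will no longer see them match exactly, which is why dag_run.run_type is the safer check.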
New Features¶
AIP-39: Add (customizable) Timetable class to Airflow for richer scheduling behaviour (#15397, #16030, #16352, #17030, #17122, #17414, #17552, #17755, #17989, #18084, #18088, #18244, #18266, #18420, #18434, #18421, #18475, #18499, #18573, #18522, #18729, #18706, #18742, #18786, #18804)
AIP-40: Add Deferrable “Async” Tasks (#15389, #17564, #17565, #17601, #17745, #17747, #17748, #17875, #17876, #18129, #18210, #18214, #18552, #18728, #18414)
Add a Docker Taskflow decorator (#15330, #18739)
Add Airflow Standalone command (#15826)
Display alert messages on dashboard from local settings (#18284)
Advanced Params using json-schema (#17100)
Ability to test connections from UI or API (#15795, #18750)
Add Next Run to UI (#17732)
Add default weight rule configuration option (#18627)
Add a calendar field to choose the execution date of the DAG when triggering it (#16141)
Allow setting specific cwd for BashOperator (#17751)
Show import errors in DAG views (#17818)
Add pre/post execution hooks [Experimental] (#17576)
Added table to view providers in Airflow ui under admin tab (#15385)
Adds secrets backend/logging/auth information to provider yaml (#17625)
Add date format filters to Jinja environment (#17451)
Introduce RESTARTING state (#16681)
Webserver: Unpause DAG on manual trigger (#16569)
API endpoint to create new user (#16609)
Add insert_args for support transfer replace (#15825)
Add recursive flag to glob in filesystem sensor (#16894)
Add conn to jinja template context (#16686)
Add default_args for TaskGroup (#16557)
Allow adding duplicate connections from UI (#15574)
Allow specifying multiple URLs via the CORS config option (#17941)
Implement API endpoint for DAG deletion (#17980)
Add DAG run endpoint for marking a dagrun success or failed (#17839)
Add support for kinit options [-f|-F] and [-a|-A] (#17816)
Queue support for DaskExecutor using Dask Worker Resources (#16829, #18720)
Make auto refresh interval configurable (#18107)
Improvements¶
Small improvements for Airflow UI (#18715, #18795)
Rename processor_poll_interval to scheduler_idle_sleep_time (#18704)
Check the allowed values for the logging level (#18651)
Fix error on triggering a dag that doesn’t exist using dagrun_conf (#18655)
Add muldelete action to TaskInstanceModelView (#18438)
Avoid importing DAGs during clean DB installation (#18450)
Require can_edit on DAG privileges to modify TaskInstances and DagRuns (#16634)
Make Kubernetes job description fit on one log line (#18377)
Always draw borders if task instance state is null or undefined (#18033)
Inclusive Language (#18349)
Improved log handling for zombie tasks (#18277)
Adding Variable.update method and improving detection of variable key collisions (#18159)
Add note about params on trigger DAG page (#18166)
Change TaskInstance and TaskReschedule PK from execution_date to run_id (#17719)
Adding TaskGroup support in BaseOperator.chain() (#17456)
Allow filtering DAGS by tags in the REST API (#18090)
Optimize imports of Providers Manager (#18052)
Adds capability of Warnings for incompatible community providers (#18020)
Serialize the template_ext attribute to show it in UI (#17985)
Add robots.txt and X-Robots-Tag header (#17946)
Refactor BranchDayOfWeekOperator, DayOfWeekSensor (#17940)
Update error message to guide the user into self-help mostly (#17929)
Update to Celery 5 (#17397)
Add links to provider’s documentation (#17736)
Remove Marshmallow schema warnings (#17753)
Rename none_failed_or_skipped by none_failed_min_one_success trigger rule (#17683)
Remove [core] store_dag_code & use DB to get Dag Code (#16342)
Rename task_concurrency to max_active_tis_per_dag (#17708)
Import Hooks lazily individually in providers manager (#17682)
Adding support for multiple task-ids in the external task sensor (#17339)
Replace execution_date with run_id in airflow tasks run command (#16666)
Make output from users cli command more consistent (#17642)
Open relative extra links in place (#17477)
Move worker_log_server_port option to the logging section (#17621)
Use gunicorn to serve logs generated by worker (#17591)
Improve validation of Group id (#17578)
Simplify 404 page (#17501)
Add XCom.clear so it’s hookable in custom XCom backend (#17405)
Add deprecation notice for SubDagOperator (#17488)
Support DAGS folder being in different location on scheduler and runners (#16860)
Remove /dagrun/create and disable edit form generated by F.A.B (#17376)
Enable specifying dictionary paths in template_fields_renderers (#17321)
error early if virtualenv is missing (#15788)
Handle connection parameters added to Extra and custom fields (#17269)
Fix airflow celery stop to accept the pid file. (#17278)
Remove DAG refresh buttons (#17263)
Deprecate dummy trigger rule in favor of always (#17144)
Be verbose about failure to import airflow_local_settings (#17195)
Include exit code in AirflowException str when BashOperator fails. (#17151)
Adding EdgeModifier support for chain() (#17099)
Only allows supported field types to be used in custom connections (#17194)
Secrets backend failover (#16404)
Warn on Webserver when using SQLite or SequentialExecutor (#17133)
Extend init_containers defined in pod_override (#17537)
Client-side filter dag dependencies (#16253)
Improve executor validation in CLI (#17071)
Prevent running airflow db init/upgrade migrations and setup in parallel. (#17078)
Update chain() and cross_downstream() to support XComArgs (#16732)
Improve graph view refresh (#16696)
When a task instance fails with exception, log it (#16805)
Set process title for serve-logs and LocalExecutor (#16644)
Rename test_cycle to check_cycle (#16617)
Add schema as DbApiHook instance attribute (#16521, #17423)
Improve compatibility with MSSQL (#9973)
Add transparency for unsupported connection type (#16220)
Call resource based fab methods (#16190)
Format more dates with timezone (#16129)
Replace deprecated dag.sub_dag with dag.partial_subset (#16179)
Treat AirflowSensorTimeout as immediate failure without retrying (#12058)
Marking success/failed automatically clears failed downstream tasks (#13037)
Add close/open indicator for import dag errors (#16073)
Add collapsible import errors (#16072)
Always return a response in TI’s action_clear view (#15980)
Add cli command to delete user by email (#15873)
Use resource and action names for FAB permissions (#16410)
Rename DAG concurrency ([core] dag_concurrency) settings for easier understanding (#16267, #18730)
Calendar UI improvements (#16226)
Refactor: SKIPPED should not be logged again as SUCCESS (#14822)
Remove version limits for dnspython (#18046, #18162)
Accept custom run ID in TriggerDagRunOperator (#18788)
Bug Fixes¶
Make REST API patch user endpoint work the same way as the UI (#18757)
Properly set start_date for cleared tasks (#18708)
Ensure task_instance exists before running update on its state (REST API) (#18642)
Make AirflowDateTimePickerWidget a required field (#18602)
Retry deadlocked transactions on deleting old rendered task fields (#18616)
Fix retry_exponential_backoff divide by zero error when retry delay is zero (#17003)
Improve how UI handles datetimes (#18611, #18700)
Bugfix: dag_bag.get_dag should return None, not raise exception (#18554)
Only show the task modal if it is a valid instance (#18570)
Fix accessing rendered {{ task.x }} attributes from within templates (#18516)
Add missing email type of connection (#18502)
Don’t use flash for “same-page” UI messages. (#18462)
Fix task group tooltip (#18406)
Properly fix dagrun update state endpoint (#18370)
Properly handle ti state difference between executor and scheduler (#17819)
Fix stuck “queued” tasks in KubernetesExecutor (#18152)
Don’t permanently add zip DAGs to sys.path (#18384)
Fix random deadlocks in MSSQL database (#18362)
Deactivating DAGs which have been removed from files (#17121)
When syncing dags to db remove dag_tag rows that are now unused (#8231)
Graceful scheduler shutdown on error (#18092)
Fix mini scheduler not respecting wait_for_downstream dep (#18338)
Pass exception to run_finished_callback for Debug Executor (#17983)
Make XCom.get_one return full, not abbreviated values (#18274)
Use try/except when closing temporary file in task_runner (#18269)
show next run if not none (#18273)
Fix DB session handling in XCom.set (#18240)
Fix external_executor_id not being set for manually run jobs (#17207)
Fix deleting of zipped Dags in Serialized Dag Table (#18243)
Return explicit error on user-add for duplicated email (#18224)
Remove loading dots even when last run data is empty (#18230)
Swap dag import error dropdown icons (#18207)
Automatically create section when migrating config (#16814)
Set encoding to utf-8 by default while reading task logs (#17965)
Apply parent dag permissions to subdags (#18160)
Change id collation for MySQL to case-sensitive (#18072)
Logs task launch exception in StandardTaskRunner (#17967)
Applied permissions to self._error_file (#15947)
Fix blank dag dependencies view (#17990)
Add missing menu access for dag dependencies and configurations pages (#17450)
Fix passing Jinja templates in DateTimeSensor (#17959)
Fixing bug which restricted the visibility of ImportErrors (#17924)
Fix grammar in traceback.html (#17942)
Fix DagRunState enum query for MySQLdb driver (#17886)
Fixed button size in “Actions” group. (#17902)
Only show import errors for DAGs a user can access (#17835)
Show all import_errors from zip files (#17759)
fix EXTRA_LOGGER_NAMES param and related docs (#17808)
Use one interpreter for Airflow and gunicorn (#17805)
Fix: Mysql 5.7 id utf8mb3 (#14535)
Fix dag_processing.last_duration metric random holes (#17769)
Automatically use utf8mb3_general_ci collation for MySQL (#17729)
fix: filter condition of TaskInstance does not work #17535 (#17548)
Dont use TaskInstance in CeleryExecutor.trigger_tasks (#16248)
Remove locks for upgrades in MSSQL (#17213)
Create virtualenv via python call (#17156)
Ensure a DAG is acyclic when running DAG.cli() (#17105)
Translate non-ascii characters (#17057)
Change the logic of None comparison in model_list template (#16893)
Have UI and POST /task_instances_state API endpoint have same behaviour (#16539)
ensure task is skipped if missing sla (#16719)
Fix direct use of cached_property module (#16710)
Fix TI success confirm page (#16650)
Modify return value check in python virtualenv jinja template (#16049)
Fix dag dependency search (#15924)
Make custom JSON encoder support Decimal (#16383)
Bugfix: Allow clearing tasks with just dag_id and empty subdir (#16513)
Convert port value to a number before calling test connection (#16497)
Handle missing/null serialized DAG dependencies (#16393)
Correctly set dag.fileloc when using the @dag decorator (#16384)
Fix TI success/failure links (#16233)
Correctly implement autocomplete early return in airflow/www/views.py (#15940)
Backport fix to allow pickling of Loggers to Python 3.6 (#18798)
Fix bug that Backfill job fail to run when there are tasks run into reschedule state (#17305, #18806)
Doc only changes¶
Update dagbag_size documentation (#18824)
Update documentation about bundle extras (#18828)
Fix wrong Postgres search_path set up instructions (#17600)
Remove AIRFLOW_GID from Docker images (#18747)
Improve error message for BranchPythonOperator when no task_id to follow (#18471)
Improve guidance to users telling them what to do on import timeout (#18478)
Explain scheduler fine-tuning better (#18356)
Added example JSON for airflow pools import (#18376)
Add sla_miss_callback section to the documentation (#18305)
Explain sentry default environment variable for subprocess hook (#18346)
Refactor installation pages (#18282)
Improves quick-start docker-compose warnings and documentation (#18164)
Production-level support for MSSQL (#18382)
Update non-working example in documentation (#18067)
Remove default_args pattern + added get_current_context() use for Core Airflow example DAGs (#16866)
Update max_tis_per_query to better render on the webpage (#17971)
Adds Github Oauth example with team based authorization (#17896)
Update docker.rst (#17882)
Example xcom update (#17749)
Add doc warning about connections added via env vars (#17915)
fix wrong documents around upgrade-check.rst (#17903)
Add Brent to Committers list (#17873)
Improves documentation about modules management (#17757)
Remove deprecated metrics from metrics.rst (#17772)
Make sure “production-readiness” of docker-compose is well explained (#17731)
Doc: Update Upgrade to v2 docs with Airflow 1.10.x EOL dates (#17710)
Doc: Replace deprecated param from docstrings (#17709)
Describe dag owner more carefully (#17699)
Update note so avoid misinterpretation (#17701)
Docs: Make DAG.is_active read-only in API (#17667)
Update documentation regarding Python 3.9 support (#17611)
Fix MySQL database character set instruction (#17603)
Document overriding XCom.clear for data lifecycle management (#17589)
Path correction in docs for airflow core (#17567)
docs(celery): reworded, add actual multiple queues example (#17541)
Doc: Add FAQ to speed up parsing with tons of dag files (#17519)
Improve image building documentation for new users (#17409)
Doc: Strip unnecessary arguments from MariaDB JIRA URL (#17296)
Update warning about MariaDB and multiple schedulers (#17287)
Doc: Recommend using same configs on all Airflow components (#17146)
Move docs about masking to a new page (#17007)
Suggest use of Env vars instead of Airflow Vars in best practices doc (#16926)
Docs: Better description for pod_template_file (#16861)
Add Aneesh Joseph as Airflow Committer (#16835)
Docs: Added new pipeline example for the tutorial docs (#16548)
Remove upstart from docs (#16672)
Add new committers: Jed and TP (#16671)
Docs: Fix flask-ouathlib to flask-oauthlib in Upgrading docs (#16320)
Docs: Fix creating a connection docs (#16312)
Docs: Fix url for Elasticsearch (#16275)
Small improvements for README.md files (#16244)
Fix docs for dag_concurrency (#16177)
Check syntactic correctness for code-snippets (#16005)
Add proper link for wheel packages in docs. (#15999)
Add Docs for default_pool slots (#15997)
Add memory usage warning in quick-start documentation (#15967)
Update example KubernetesExecutor git-sync pod template file (#15904)
Docs: Fix Taskflow API docs (#16574)
Added new pipeline example for the tutorial docs (#16084)
Updating the DAG docstring to include render_template_as_native_obj (#16534)
Update docs on setting up SMTP (#16523)
Docs: Fix API verb from POST to PATCH (#16511)
Misc/Internal¶
Renaming variables to be consistent with code logic (#18685)
Simplify strings previously split across lines (#18679)
fix exception string of BranchPythonOperator (#18623)
Add multiple roles when creating users (#18617)
Move FABs base Security Manager into Airflow. (#16647)
Remove unnecessary css state colors (#18461)
Update boto3 to <1.19 (#18389)
Improve coverage for airflow.security.kerberos module (#18258)
Fix Amazon Kinesis test (#18337)
Fix provider test accessing importlib-resources (#18228)
Silence warnings in tests from using SubDagOperator (#18275)
Fix usage of range(len()) to enumerate (#18174)
Test coverage on the autocomplete view (#15943)
Add “packaging” to core requirements (#18122)
Adds LoggingMixins to BaseTrigger (#18106)
Fix building docs in main builds (#18035)
Remove upper-limit on tenacity (#17593)
Remove redundant numpy dependency (#17594)
Bump mysql-connector-python to latest version (#17596)
Make pandas an optional core dependency (#17575)
Add more typing to airflow.utils.helpers (#15582)
Chore: Some code cleanup in airflow/utils/db.py (#17090)
Refactor: Remove processor_factory from DAG processing (#16659)
Remove AbstractDagFileProcessorProcess from dag processing (#16816)
Update TaskGroup typing (#16811)
Update click to 8.x (#16779)
Remove remaining Pylint disables (#16760)
Remove duplicated try, there is already a try in create_session (#16701)
Removes pylint from our toolchain (#16682)
Refactor usage of unneeded function call (#16653)
Add type annotations to setup.py (#16658)
Remove SQLAlchemy <1.4 constraint (#16630) (Note: our dependencies still have a requirement on <1.4)
Refactor dag.clear method (#16086)
Use DAG_ACTIONS constant (#16232)
Use updated _get_all_non_dag_permissions method (#16317)
Add updated-name wrappers for built-in FAB methods (#16077)
Remove TaskInstance.log_filepath attribute (#15217)
Removes unnecessary function call in airflow/www/app.py (#15956)
Move plyvel to google provider extra (#15812)
Update permission migrations to use new naming scheme (#16400)
Use resource and action names for FAB (#16380)
Swap out calls to find_permission_view_menu for get_permission wrapper (#16377)
Fix deprecated default for fab_logging_level to WARNING (#18783)
Allow running tasks from UI when using CeleryKubernetesExecutor (#18441)
Airflow 2.1.4 (2021-09-18)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Fix deprecation error message rather than silencing it (#18126)
Limit the number of queued dagruns created by the Scheduler (#18065)
Fix DagRun execution order from queued to running not being properly followed (#18061)
Fix max_active_runs not allowing moving of queued dagruns to running (#17945)
Avoid redirect loop for users with no permissions (#17838)
Avoid endless redirect loop when user has no roles (#17613)
Fix log links on graph TI modal (#17862)
Hide variable import form if user lacks permission (#18000)
Improve dag/task concurrency check (#17786)
Fix Clear task instances endpoint resets all DAG runs bug (#17961)
Fixes incorrect parameter passed to views (#18083) (#18085)
Fix Sentry handler from LocalTaskJob causing error (#18119)
Limit colorlog version (6.x is incompatible) (#18099)
Only show Pause/Unpause tooltip on hover (#17957)
Improve graph view load time for dags with open groups (#17821)
Increase width for Run column (#17817)
Fix wrong query on running tis (#17631)
Add root to tree refresh url (#17633)
Do not delete running DAG from the UI (#17630)
Improve discoverability of Provider packages’ functionality
Do not let create_dagrun overwrite explicit run_id (#17728)
Regression on pid reset to allow task start after heartbeat (#17333)
Set task state to failed when pod is DELETED while running (#18095)
Advises the kernel to not cache log files generated by Airflow (#18054)
Sort adopted tasks in _check_for_stalled_adopted_tasks method (#18208)
Doc only changes¶
Update version added fields in airflow/config_templates/config.yml (#18128)
Improve the description of how to handle dynamic task generation (#17963)
Improve cross-links to operators and hooks references (#17622)
Doc: Fix replacing Airflow version for Docker stack (#17711)
Make the providers operators/hooks reference much more usable (#17768)
Update description about the new connection-types provider meta-data
Suggest to use secrets backend for variable when it contains sensitive data (#17319)
Separate Installing from sources section and add more details (#18171)
Doc: Use closer.lua script for downloading sources (#18179)
Doc: Improve installing from sources (#18194)
Improves installing from sources pages for all components (#18251)
Airflow 2.1.3 (2021-08-23)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Fix task retries when they receive sigkill and have retries and properly handle sigterm (#16301)
Fix redacting secrets in context exceptions. (#17618)
Fix race condition with dagrun callbacks (#16741)
Add ‘queued’ to DagRunState (#16854)
Add ‘queued’ state to DagRun (#16401)
Fix external elasticsearch logs link (#16357)
Add proper warning message when recorded PID is different from current PID (#17411)
Fix running tasks with default_impersonation config (#17229)
Rescue if a DagRun’s DAG was removed from db (#17544)
Fixed broken json_client (#17529)
Handle and log exceptions raised during task callback (#17347)
Fix CLI kubernetes cleanup-pods which fails on invalid label key (#17298)
Show serialization exceptions in DAG parsing log (#17277)
Fix: TaskInstance does not show queued_by_job_id & external_executor_id (#17179)
Adds more explanatory message when SecretsMasker is not configured (#17101)
Enable the use of __init_subclass__ in subclasses of BaseOperator (#17027)
Fix task instance retrieval in XCom view (#16923)
Validate type of priority_weight during parsing (#16765)
Correctly handle custom deps and task_group during DAG Serialization (#16734)
Fix slow (cleared) tasks being adopted by Celery worker. (#16718)
Fix calculating duration in tree view (#16695)
Fix AttributeError: datetime.timezone object has no attribute name (#16599)
Redact conn secrets in webserver logs (#16579)
Change graph focus to top of view instead of center (#16484)
Fail tasks in scheduler when executor reports they failed (#15929)
fix(smart_sensor): Unbound variable errors (#14774)
Add back missing permissions to UserModelView controls. (#17431)
Better diagnostics and self-healing of docker-compose (#17484)
Improve diagnostics message when users have secret_key misconfigured (#17410)
Stop checking execution_date in task_instance.refresh_from_db (#16809)
Improvements¶
Run mini scheduler in LocalTaskJob during task exit (#16289)
Remove SQLAlchemy<1.4 constraint (#16630)
Bump Jinja2 upper-bound from 2.12.0 to 4.0.0 (#16595)
Bump dnspython (#16698)
Updates to FlaskAppBuilder 3.3.2+ (#17208)
Add State types for tasks and DAGs (#15285)
Set Process title for Worker when using LocalExecutor (#16623)
Move DagFileProcessor and DagFileProcessorProcess out of scheduler_job.py (#16581)
Doc only changes¶
Fix inconsistencies in configuration docs (#17317)
Fix docs link for using SQLite as Metadata DB (#17308)
Misc¶
Switch back http provider after requests removes LGPL dependency (#16974)
Airflow 2.1.2 (2021-07-14)¶
Significant Changes¶
No significant changes.
Bug Fixes¶
Only allow webserver to request from the worker log server (#16754)
Fix “Invalid JSON configuration, must be a dict” bug (#16648)
Fix CeleryKubernetesExecutor (#16700)
Mask value if the key is token (#16474)
Fix impersonation issue with LocalTaskJob (#16852)
Resolve all npm vulnerabilities including bumping jQuery to 3.5 (#16440)
Misc¶
Add Python 3.9 support (#15515)
Airflow 2.1.1 (2021-07-02)¶
Significant Changes¶
activate_dag_runs argument of the function clear_task_instances is replaced with dag_run_state¶
To achieve the previous default behaviour of clear_task_instances with activate_dag_runs=True, no change is needed. To achieve the previous behaviour of activate_dag_runs=False, pass dag_run_state=False instead. (The previous parameter is still accepted, but is deprecated.)
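The mapping described above can be written down as a small keyword-argument translation. This is a sketch only: migrate_clear_kwargs is a hypothetical helper, not an Airflow function, and it encodes nothing beyond the two cases stated in this note.

```python
def migrate_clear_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with activate_dag_runs rewritten to dag_run_state.

    Hypothetical helper for illustration; it is not part of Airflow.
    """
    migrated = dict(kwargs)
    if "activate_dag_runs" in migrated:
        activate = migrated.pop("activate_dag_runs")
        if activate is False:
            # Previous activate_dag_runs=False maps to dag_run_state=False.
            migrated["dag_run_state"] = False
        # activate_dag_runs=True was the old default, so nothing is added.
    return migrated


print(migrate_clear_kwargs({"activate_dag_runs": False}))
print(migrate_clear_kwargs({"activate_dag_runs": True}))
```

Other keyword arguments pass through unchanged, so the helper can wrap existing call sites while they are being updated.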
Bug Fixes¶
Don’t crash attempting to mask secrets in dict with non-string keys (#16601)
Always install sphinx_airflow_theme from PyPI (#16594)
Remove limitation for elasticsearch library (#16553)
Adding extra requirements for build and runtime of the PROD image. (#16170)
Cattrs 1.7.0 released by the end of May 2021 break lineage usage (#16173)
Removes unnecessary packages from setup_requires (#16139)
Pins docutils to <0.17 until breaking behaviour is fixed (#16133)
Improvements for Docker Image docs (#14843)
Ensure that dag_run.conf is a dict (#15057)
Fix CLI connections import and migrate logic from secrets to Connection model (#15425)
Fix Dag Details start date bug (#16206)
Fix DAG run state not updated while DAG is paused (#16343)
Allow null value for operator field in task_instance schema(REST API) (#16516)
Avoid recursion going too deep when redacting logs (#16491)
Backfill: Don’t create a DagRun if no tasks match task regex (#16461)
Tree View UI for larger DAGs & more consistent spacing in Tree View (#16522)
Correctly handle None returns from Query.scalar() (#16345)
Adding only_active parameter to /dags endpoint (#14306)
Don’t show stale Serialized DAGs if they are deleted in DB (#16368)
Make REST API List DAGs endpoint consistent with UI/CLI behaviour (#16318)
Support remote logging in elasticsearch with filebeat 7 (#14625)
Queue tasks with higher priority and earlier execution_date first. (#15210)
Make task ID on legend have enough width and width of line chart to be 100%. (#15915)
Fix normalize-url vulnerability (#16375)
Validate retries value on init for better errors (#16415)
add num_runs query param for tree refresh (#16437)
Fix templated default/example values in config ref docs (#16442)
Add passphrase and private_key to default sensitive field names (#16392)
Fix tasks in an infinite slots pool were never scheduled (#15247)
Fix Orphaned tasks stuck in CeleryExecutor as running (#16550)
Don’t fail to log if we can’t redact something (#16118)
Set max tree width to 1200 pixels (#16067)
Fill the “job_id” field for airflow task run without --local/--raw for KubeExecutor (#16108)
Fixes problem where conf variable was used before initialization (#16088)
Fix apply defaults for task decorator (#16085)
Parse recently modified files even if just parsed (#16075)
Ensure that we don’t try to mask empty string in logs (#16057)
Don’t die when masking log.exception when there is no exception (#16047)
Restores apply_defaults import in base_sensor_operator (#16040)
Fix auto-refresh in tree view When webserver ui is not in / (#16018)
Fix dag.clear() to set multiple dags to running when necessary (#15382)
Fix Celery executor getting stuck randomly because of reset_signals in multiprocessing (#15989)
Airflow 2.1.0 (2021-05-21)¶
Significant Changes¶
New “deprecated_api” extra¶
We have a new ‘[deprecated_api]’ extra that should be used when installing airflow when the deprecated API is going to be used. This is now an optional feature of Airflow because it pulls in requests, which (as of 14 May 2021) pulls in the LGPL chardet dependency.
The http provider is not installed by default¶
The http provider is now optional and not installed by default, until chardet becomes an optional dependency of requests.
See PR to replace chardet with charset-normalizer
@apply_defaults decorator is no longer necessary¶
This decorator is now automatically added to all operators via the metaclass on BaseOperator.
Change the configuration options for field masking¶
We’ve improved masking for sensitive data in Web UI and logs. As part of it, the following configurations have been changed:
The hide_sensitive_variable_fields option in the admin section has been replaced by the hide_sensitive_var_conn_fields option in the core section.
The sensitive_variable_fields option in the admin section has been replaced by the sensitive_var_conn_names option in the core section.
Deprecated PodDefaults and add_xcom_sidecar in airflow.kubernetes.pod_generator¶
We have moved PodDefaults from airflow.kubernetes.pod_generator.PodDefaults to airflow.providers.cncf.kubernetes.utils.xcom_sidecar.PodDefaults and moved add_xcom_sidecar from airflow.kubernetes.pod_generator.PodGenerator.add_xcom_sidecar to airflow.providers.cncf.kubernetes.utils.xcom_sidecar.add_xcom_sidecar.
This change will allow us to modify the KubernetesPodOperator XCom functionality without requiring airflow upgrades.
Removed pod_launcher from core airflow¶
Moved the pod launcher from airflow.kubernetes.pod_launcher to airflow.providers.cncf.kubernetes.utils.pod_launcher.
This will allow users to update the pod_launcher for the KubernetesPodOperator without requiring an airflow upgrade.
Default [webserver] worker_refresh_interval is changed to 6000 seconds¶
The default value for [webserver] worker_refresh_interval was 30 seconds for Airflow <=2.0.1. However, since Airflow 2.0 DAG Serialization is a hard requirement and the Webserver uses the serialized DAGs, there is no need to kill an existing worker and create a new one as frequently as every 30 seconds.
This setting can be raised to an even higher value; currently it is set to 6000 seconds (100 minutes) to serve as a DagBag cache burst time.
default_queue configuration has been moved to the operators section.¶
The default_queue configuration option has been moved from the [celery] section to the [operators] section to allow for reuse between different executors.
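For deployments that set this option explicitly, the move between sections can be sketched with Python's standard configparser. The helper name move_default_queue and the sample queue name high_priority are assumptions for illustration; neither comes from Airflow.

```python
import configparser


def move_default_queue(cfg_text: str) -> configparser.ConfigParser:
    """Move default_queue from [celery] to [operators] in INI-style text.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.has_option("celery", "default_queue"):
        value = parser.get("celery", "default_queue")
        parser.remove_option("celery", "default_queue")
        if not parser.has_section("operators"):
            parser.add_section("operators")
        # Keep an existing [operators] value rather than overwriting it.
        if not parser.has_option("operators", "default_queue"):
            parser.set("operators", "default_queue", value)
    return parser


parser = move_default_queue("[celery]\ndefault_queue = high_priority\n")
print(parser.get("operators", "default_queue"))
```

The sketch leaves the now-empty [celery] section in place, since other Celery options normally live there.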
New Features¶
Add PythonVirtualenvDecorator to Taskflow API (#14761)
Add Taskgroup decorator (#15034)
Create a DAG Calendar View (#15423)
Create cross-DAG dependencies view (#13199)
Add rest API to query for providers (#13394)
Mask passwords and sensitive info in task logs and UI (#15599)
Add SubprocessHook for running commands from operators (#13423)
Add DAG Timeout in UI page “DAG Details” (#14165)
Add WeekDayBranchOperator (#13997)
Add JSON linter to DAG Trigger UI (#13551)
Add DAG Description Doc to Trigger UI Page (#13365)
Add airflow webserver URL into SLA miss email. (#13249)
Add read only REST API endpoints for users (#14735)
Add files to generate Airflow’s Python SDK (#14739)
Add dynamic fields to snowflake connection (#14724)
Add read only REST API endpoint for roles and permissions (#14664)
Add new datetime branch operator (#11964)
Add Google leveldb hook and operator (#13109) (#14105)
Add plugins endpoint to the REST API (#14280)
Add worker_pod_pending_timeout support (#15263)
Add support for labeling DAG edges (#15142)
Add CUD REST API endpoints for Roles (#14840)
Import connections from a file (#15177)
A bunch of template_fields_renderers additions (#15130)
Add REST API query sort and order to some endpoints (#14895)
Add timezone context in new ui (#15096)
Add query mutations to new UI (#15068)
Add different modes to sort dag files for parsing (#15046)
Auto refresh on Tree View (#15474)
BashOperator to raise AirflowSkipException on exit code 99 (by default, configurable) (#13421) (#14963)
Clear tasks by task ids in REST API (#14500)
Support jinja2 native Python types (#14603)
Allow celery workers without gossip or mingle modes (#13880)
Add airflow jobs check CLI command to check health of jobs (Scheduler etc) (#14519)
Rename DateTimeBranchOperator to BranchDateTimeOperator (#14720)
Improvements¶
Add optional result handler callback to DbApiHook (#15581)
Update Flask App Builder limit to recently released 3.3 (#15792)
Prevent creating flask sessions on REST API requests (#15295)
Sync DAG specific permissions when parsing (#15311)
Increase maximum length of pool name on Tasks to 256 characters (#15203)
Enforce READ COMMITTED isolation when using mysql (#15714)
Auto-apply apply_default to subclasses of BaseOperator (#15667)
Emit error on duplicated DAG ID (#15302)
Update KubernetesExecutor pod templates to allow access to IAM permissions (#15669)
More verbose logs when running airflow db check-migrations (#15662)
When one_success mark task as failed if no success (#15467)
Add an option to trigger a dag w/o changing conf (#15591)
Add Airflow UI instance_name configuration option (#10162)
Add a decorator to retry functions with DB transactions (#14109)
Add return to PythonVirtualenvOperator’s execute method (#14061)
Add verify_ssl config for kubernetes (#13516)
Add description about secret_key when Webserver > 1 (#15546)
Add Traceback in LogRecord in JSONFormatter (#15414)
Add support for arbitrary json in conn uri format (#15100)
Adds description field in variable (#12413) (#15194)
Add logs to show last modified in SFTP, FTP and Filesystem sensor (#15134)
Execute on_failure_callback when SIGTERM is received (#15172)
Allow hiding of all edges when highlighting states (#15281)
Display explicit error in case UID has no actual username (#15212)
Serve logs with Scheduler when using Local or Sequential Executor (#15557)
Deactivate trigger, refresh, and delete controls on dag detail view. (#14144)
Turn off autocomplete for connection forms (#15073)
Increase default worker_refresh_interval to 6000 seconds (#14970)
Only show User’s local timezone if it’s not UTC (#13904)
Suppress LOG/WARNING for a few tasks CLI for better CLI experience (#14567)
Configurable API response (CORS) headers (#13620)
Allow viewers to see all docs links (#14197)
Update Tree View date ticks (#14141)
Make the tooltip to Pause / Unpause a DAG clearer (#13642)
Warn about precedence of env var when getting variables (#13501)
Move [celery] default_queue config to [operators] default_queue to reuse between executors (#14699)
Bug Fixes¶
Fix 500 error from updateTaskInstancesState API endpoint when dry_run not passed (#15889)
Ensure that task preceding a PythonVirtualenvOperator doesn’t fail (#15822)
Prevent mixed case env vars from crashing processes like worker (#14380)
Fixed type annotations in DAG decorator (#15778)
Fix on_failure_callback when task receive SIGKILL (#15537)
Fix dags table overflow (#15660)
Fix changing the parent dag state on subdag clear (#15562)
Fix reading from zip package to default to text (#13962)
Fix wrong parameter for drawDagStatsForDag in dags.html (#13884)
Fix QueuedLocalWorker crashing with EOFError (#13215)
Fix typo in NotPreviouslySkippedDep (#13933)
Fix parallelism after KubeExecutor pod adoption (#15555)
Fix kube client on mac with keepalive enabled (#15551)
Fixes wrong limit for dask for python>3.7 (should be <3.7) (#15545)
Fix Task Adoption in KubernetesExecutor (#14795)
Fix timeout when using XCom with KubernetesPodOperator (#15388)
Fix deprecated provider aliases in “extras” not working (#15465)
Fixed default XCom deserialization. (#14827)
Fix used_group_ids in dag.partial_subset (#13700) (#15308)
Further fix trimmed pod_id for KubernetesPodOperator (#15445)
Bugfix: Invalid name when trimmed pod_id ends with hyphen in KubernetesPodOperator (#15443)
Fix incorrect slots stats when TI pool_slots > 1 (#15426)
Fix DAG last run link (#15327)
Fix sync-perm to work correctly when update_fab_perms = False (#14847)
Fixes limits on Arrow for plexus test (#14781)
Fix UI bugs in tree view (#14566)
Fix AzureDataFactoryHook failing to instantiate its connection (#14565)
Fix permission error on non-POSIX filesystem (#13121)
Fix spelling in “ignorable” (#14348)
Fix get_context_data doctest import (#14288)
Correct typo in GCSObjectsWtihPrefixExistenceSensor (#14179)
Fix order of failed deps (#14036)
Fix critical CeleryKubernetesExecutor bug (#13247)
Fix four bugs in StackdriverTaskHandler (#13784)
func.sum may return Decimal that break rest APIs (#15585)
Persist tags params in pagination (#15411)
API: Raise AlreadyExists exception when the execution_date is same (#15174)
Remove duplicate call to sync_metadata inside DagFileProcessorManager (#15121)
Extra docker-py update to resolve docker op issues (#15731)
Ensure executors end method is called (#14085)
Remove user_id from API schema (#15117)
Prevent clickable bad links on disabled pagination (#15074)
Acquire lock on db for the time of migration (#10151)
Skip SLA check only if SLA is None (#14064)
Print right version in airflow info command (#14560)
Make airflow info work with pipes (#14528)
Rework client-side script for connection form. (#14052)
API: Add CollectionInfo in all Collections that have total_entries (#14366)
Fix task_instance_mutation_hook when importing airflow.models.dagrun (#15851)
Doc only changes¶
Fix docstring of SqlSensor (#15466)
Small changes on “DAGs and Tasks documentation” (#14853)
Add note on changes to configuration options (#15696)
Add docs to the markdownlint and yamllint config files (#15682)
Rename old “Experimental” API to deprecated in the docs. (#15653)
Fix documentation error in git_sync_template.yaml (#13197)
Fix doc link permission name (#14972)
Fix link to Helm chart docs (#14652)
Fix docstrings for Kubernetes code (#14605)
docs: Capitalize & minor fixes (#14283) (#14534)
Fixed reading from zip package to default to text. (#13984)
An initial rework of the “Concepts” docs (#15444)
Improve docstrings for various modules (#15047)
Add documentation on database connection URI (#14124)
Add Helm Chart logo to docs index (#14762)
Create a new documentation package for Helm Chart (#14643)
Add docs about supported logging levels (#14507)
Update docs about tableau and salesforce provider (#14495)
Replace deprecated doc links to the correct one (#14429)
Refactor redundant doc url logic to use utility (#14080)
docs: NOTICE: Updated 2016-2019 to 2016-now (#14248)
Skip DAG perm sync during parsing if possible (#15464)
Add picture and examples for Edge Labels (#15310)
Add example DAG & how-to guide for sqlite (#13196)
Add links to new modules for deprecated modules (#15316)
Add note in Updating.md about FAB data model change (#14478)
Misc/Internal¶
Fix logging.exception redundancy (#14823)
Bump stylelint to remove vulnerable sub-dependency (#15784)
Add resolution to force dependencies to use patched version of lodash (#15777)
Update croniter to 1.0.x series (#15769)
Get rid of Airflow 1.10 in Breeze (#15712)
Run helm chart tests in parallel (#15706)
Bump ssri from 6.0.1 to 6.0.2 in /airflow/www (#15437)
Remove the limit on Gunicorn dependency (#15611)
Better “dependency already registered” warning message for tasks #14613 (#14860)
Pin pandas-gbq to <0.15.0 (#15114)
Use Pip 21.* to install airflow officially (#15513)
Bump mysqlclient to support the 1.4.x and 2.x series (#14978)
Finish refactor of DAG resource name helper (#15511)
Refactor/Cleanup Presentation of Graph Task and Path Highlighting (#15257)
Standardize default fab perms (#14946)
Remove datepicker for task instance detail view (#15284)
Turn provider’s import warnings into debug logs (#14903)
Remove left-over fields from required in provider_info schema. (#14119)
Deprecate tableau extra (#13595)
Use built-in cached_property on Python 3.8 where possible (#14606)
Clean-up JS code in UI templates (#14019)
Bump elliptic from 6.5.3 to 6.5.4 in /airflow/www (#14668)
Switch to f-strings using flynt. (#13732)
use jquery ready instead of vanilla js (#15258)
Migrate task instance log (ti_log) js (#15309)
Migrate graph js (#15307)
Migrate dags.html javascript (#14692)
Removes unnecessary AzureContainerInstance connection type (#15514)
Separate Kubernetes pod_launcher from core airflow (#15165)
update remaining old import paths of operators (#15127)
Remove broken and undocumented “demo mode” feature (#14601)
Simplify configuration/legibility of Webpack entries (#14551)
remove inline tree js (#14552)
Js linting and inline migration for simple scripts (#14215)
Remove use of repeated constant in AirflowConfigParser (#14023)
Deprecate email credentials from environment variables. (#13601)
Remove unused ‘context’ variable in task_instance.py (#14049)
Disable suppress_logs_and_warning in cli when debugging (#13180)
Airflow 2.0.2 (2021-04-19)¶
Significant Changes¶
Default [kubernetes] enable_tcp_keepalive is changed to True¶
This allows Airflow to work more reliably with some environments (like Azure) by default.
sync-perm CLI no longer syncs DAG specific permissions by default¶
The sync-perm CLI command will no longer sync DAG specific permissions by default as they are now being handled during DAG parsing. If you need or want the old behavior, you can pass --include-dags to have sync-perm also sync DAG specific permissions.
Bug Fixes¶
Bugfix: TypeError when Serializing & sorting iterable properties of DAGs (#15395)
Fix missing on_load trigger for folder-based plugins (#15208)
kubernetes cleanup-pods subcommand will only clean up Airflow-created Pods (#15204)
Fix password masking in CLI action_logging (#15143)
Fix url generation for TriggerDagRunOperatorLink (#14990)
Restore base lineage backend (#14146)
Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)
Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)
Bugfix: Fix overriding pod_template_file in KubernetesExecutor (#15197)
Bugfix: resources in executor_config breaks Graph View in UI (#15199)
Fix celery executor bug trying to call len on map (#14883)
Fix bug in airflow.stats timing that broke dogstatsd mode (#15132)
Avoid scheduler/parser manager deadlock by using non-blocking IO (#15112)
Re-introduce dagrun.schedule_delay metric (#15105)
Compare string values, not if strings are the same object in Kube executor (#14942)
Pass queue to BaseExecutor.execute_async like in airflow 1.10 (#14861)
Scheduler: Remove TIs from starved pools from the critical path. (#14476)
Remove extra/needless deprecation warnings from airflow.contrib module (#15065)
Fix support for long dag_id and task_id in KubernetesExecutor (#14703)
Sort lists, sets and tuples in Serialized DAGs (#14909)
Simplify cleaning string passed to origin param (#14738) (#14905)
Fix error when running tasks with Sentry integration enabled. (#13929)
Webserver: Sanitize string passed to origin param (#14738)
Fix losing duration < 1 secs in tree (#13537)
Pin SQLAlchemy to <1.4 due to breakage of sqlalchemy-utils (#14812)
Fix KubernetesExecutor issue with deleted pending pods (#14810)
Default to Celery Task model when backend model does not exist (#14612)
Bugfix: Plugins endpoint was unauthenticated (#14570)
BugFix: fix DAG doc display (especially for TaskFlow DAGs) (#14564)
BugFix: TypeError in airflow.kubernetes.pod_launcher’s monitor_pod (#14513)
Bugfix: Fix wrong output of tags and owners in dag detail API endpoint (#14490)
Fix logging error with task error when JSON logging is enabled (#14456)
Fix StatsD metrics not sending when using daemon mode (#14454)
Gracefully handle missing start_date and end_date for DagRun (#14452)
BugFix: Serialize max_retry_delay as a timedelta (#14436)
Fix crash when user clicks on “Task Instance Details” caused by start_date being None (#14416)
BugFix: Fix TaskInstance API call fails if a task is removed from running DAG (#14381)
Scheduler should not fail when invalid executor_config is passed (#14323)
Fix bug allowing task instances to survive when dagrun_timeout is exceeded (#14321)
Fix bug where DAG timezone was not always shown correctly in UI tooltips (#14204)
Use Lax for cookie_samesite when empty string is passed (#14183)
[AIRFLOW-6076] fix dag.cli() KeyError (#13647)
Fix running child tasks in a subdag after clearing a successful subdag (#14776)
Improvements¶
Remove unused JS packages causing false security alerts (#15383)
Change default of [kubernetes] enable_tcp_keepalive for new installs to True (#15338)
Fixed #14270: Add error message in OOM situations (#15207)
Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
Updates 3.6 limits for latest versions of a few libraries (#15209)
Adds Blinker dependency which is missing after recent changes (#15182)
Remove ‘conf’ from search_columns in DagRun View (#15099)
More proper default value for namespace in K8S cleanup-pods CLI (#15060)
Faster default role syncing during webserver start (#15017)
Speed up webserver start when there are many DAGs (#14993)
Much easier to use and better documented Docker image (#14911)
Use libyaml C library when available. (#14577)
Don’t create unittest.cfg when not running in unit test mode (#14420)
Webserver: Allow Filtering TaskInstances by queued_dttm (#14708)
Update Flask-AppBuilder dependency to allow 3.2 (and all 3.x series) (#14665)
Remember expanded task groups in browser local storage (#14661)
Add plain format output to cli tables (#14546)
Make airflow dags show command display TaskGroups (#14269)
Increase maximum size of extra connection field. (#12944)
Speed up clear_task_instances by doing a single sql delete for TaskReschedule (#14048)
Add more flexibility with FAB menu links (#13903)
Add better description and guidance in case of sqlite version mismatch (#14209)
Doc only changes¶
Add documentation create/update community providers (#15061)
Fix mistake and typos in airflow.utils.timezone docstrings (#15180)
Replace new url for Stable Airflow Docs (#15169)
Docs: Clarify behavior of delete_worker_pods_on_failure (#14958)
Create a documentation package for Docker image (#14846)
Multiple minor doc (OpenAPI) fixes (#14917)
Replace Graph View Screenshot to show Auto-refresh (#14571)
Misc/Internal¶
Import Connection lazily in hooks to avoid cycles (#15361)
Rename last_scheduler_run into last_parsed_time, and ensure it’s updated in DB (#14581)
Make TaskInstance.pool_slots not nullable with a default of 1 (#14406)
Log migrations info in consistent way (#14158)
Airflow 2.0.1 (2021-02-08)¶
Significant Changes¶
Permission to view Airflow Configurations has been removed from User and Viewer role¶
Previously, users with the User or Viewer role were able to get/view configurations using the REST API or in the Webserver. From Airflow 2.0.1, only users with the Admin or Op role are able to get/view Configurations.
To allow users with other roles to view configuration, add the can read on Configurations permission to that role.
Note that if [webserver] expose_config is set to False, the API will return a 403 response even if the user has a role with the can read on Configurations permission.
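The interaction between the role permission and expose_config can be summarized as a small decision function. This is only a model of the rule stated above, not Airflow's code; the function name is hypothetical, and returning 403 for a missing permission is an assumption (the notes only specify the 403 for the expose_config case).

```python
def config_endpoint_status(expose_config: bool, has_read_permission: bool) -> int:
    """Model of the rule above: expose_config=False wins over any role permission."""
    if not expose_config:
        # Stated in the notes: 403 even when the role can read Configurations.
        return 403
    if not has_read_permission:
        # Assumption for this sketch: a missing permission is also reported as 403.
        return 403
    return 200


for expose in (True, False):
    for permitted in (True, False):
        print(expose, permitted, config_endpoint_status(expose, permitted))
```

In other words, granting can read on Configurations is necessary but not sufficient; the webserver setting has to allow exposure as well.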
Default [celery] worker_concurrency is changed to 16¶
The default value for [celery] worker_concurrency was 16 for Airflow <2.0.0. However, it was unintentionally changed to 8 in 2.0.0.
From Airflow 2.0.1, we revert to the old default of 16.
Default [scheduler] min_file_process_interval is changed to 30¶
The default value for [scheduler] min_file_process_interval was 0, which kept CPU usage at around 100% because the DAG files were parsed constantly.
From Airflow 2.0.0, the scheduling decisions have been moved from DagFileProcessor to Scheduler, so we can keep the default a bit higher: 30.
Bug Fixes¶
Bugfix: Return XCom Value in the XCom Endpoint API (#13684)
Bugfix: Import error when using custom backend and sql_alchemy_conn_secret (#13260)
Allow PID file path to be relative when daemonize a process (scheduler, kerberos, etc) (#13232)
Bugfix: no generic DROP CONSTRAINT in MySQL during airflow db upgrade (#13239)
Bugfix: Sync Access Control defined in DAGs when running sync-perm (#13377)
Stop sending Callback Requests if no callbacks are defined on DAG (#13163)
BugFix: Dag-level Callback Requests were not run (#13651)
Stop creating duplicate Dag File Processors (#13662)
Filter DagRuns with Task Instances in removed State while Scheduling (#13165)
Bump datatables.net from 1.10.21 to 1.10.22 in /airflow/www (#13143)
Bump datatables.net JS to 1.10.23 (#13253)
Bump dompurify from 2.0.12 to 2.2.6 in /airflow/www (#13164)
Update minimum cattrs version (#13223)
Remove inapplicable arg ‘output’ for CLI pools import/export (#13071)
Webserver: Fix the behavior to deactivate the authentication option and add docs (#13191)
Fix: add support for no-menu plugin views (#11742)
Add python-daemon limit for Python 3.8+ to fix daemon crash (#13540)
Change the default celery worker_concurrency to 16 (#13612)
Audit Log records View should not contain link if dag_id is None (#13619)
Fix invalid continue_token for cleanup list pods (#13563)
Switches to latest version of snowflake connector (#13654)
Fix backfill crash on task retry or reschedule (#13712)
Setting max_tis_per_query to 0 now correctly removes the limit (#13512)
Fix race conditions in task callback invocations (#10917)
Fix webserver exiting when gunicorn master crashes (#13518)(#13780)
Fix SQL syntax to check duplicate connections (#13783)
BaseBranchOperator will push to xcom by default (#13704) (#13763)
Fix Deprecation for configuration.getsection (#13804)
Fix TaskNotFound in log endpoint (#13872)
Fix race condition when using Dynamic DAGs (#13893)
Fix: Linux/Chrome window bouncing in Webserver
Fix db shell for sqlite (#13907)
Only compare updated time when Serialized DAG exists (#13899)
Fix dag run type enum query for mysqldb driver (#13278)
Add authentication to lineage endpoint for experimental API (#13870)
Do not add User role perms to custom roles. (#13856)
Do not add Website.can_read access to default roles. (#13923)
Fix invalid value error caused by long Kubernetes pod name (#13299)
Fix DB Migration for SQLite to upgrade to 2.0 (#13921)
Bugfix: Manual DagRun trigger should not skip scheduled runs (#13963)
Stop loading Extra Operator links in Scheduler (#13932)
Added missing return parameter in read function of FileTaskHandler (#14001)
Bugfix: Do not try to create a duplicate Dag Run in Scheduler (#13920)
Make v1/config endpoint respect webserver expose_config setting (#14020)
Disable row level locking for Mariadb and MySQL <8 (#14031)
Bugfix: Fix permissions to triggering only specific DAGs (#13922)
Fix broken SLA Mechanism (#14056)
Bugfix: Scheduler fails if task is removed at runtime (#14057)
Remove permissions to read Configurations for User and Viewer roles (#14067)
Fix DB Migration from 2.0.1rc1
Improvements¶
Increase the default min_file_process_interval to decrease CPU Usage (#13664)
Dispose connections when running tasks with os.fork & CeleryExecutor (#13265)
Make function purpose clearer in example_kubernetes_executor example dag (#13216)
Remove unused libraries - flask-swagger, funcsigs (#13178)
Display alternative tooltip when a Task has yet to run (no TI) (#13162)
Use werkzeug’s own type conversion for request args (#13184)
UI: Add queued_by_job_id & external_executor_id Columns to TI View (#13266)
Make json-merge-patch an optional library and unpin it (#13175)
Adds missing LDAP “extra” dependencies to ldap provider. (#13308)
Refactor setup.py to better reflect changes in providers (#13314)
Pin pyjwt and Add integration tests for Apache Pinot (#13195)
Removes provider-imposed requirements from setup.cfg (#13409)
Replace deprecated decorator (#13443)
Streamline & simplify __eq__ methods in models Dag and BaseOperator (#13449)
Additional properties should be allowed in provider schema (#13440)
Remove unused dependency - contextdecorator (#13455)
Remove ‘typing’ dependency (#13472)
Log migrations info in consistent way (#13458)
Unpin mysql-connector-python to allow 8.0.22 (#13370)
Remove thrift as a core dependency (#13471)
Add NotFound response for DELETE methods in OpenAPI YAML (#13550)
Stop Log Spamming when [core] lazy_load_plugins is False (#13578)
Display message and docs link when no plugins are loaded (#13599)
Unpin restriction for colorlog dependency (#13176)
Add missing Dag Tag for Example DAGs (#13665)
Support tables in DAG docs (#13533)
Add python3-openid dependency (#13714)
Add __repr__ for Executors (#13753)
Add description to hint if conn_type is missing (#13778)
Upgrade Azure blob to v12 (#12188)
Add extra field to get_connnection REST endpoint (#13885)
Make Smart Sensors DB Migration idempotent (#13892)
Improve the error when DAG does not exist when running dag pause command (#13900)
Update airflow_local_settings.py to fix an error message (#13927)
Only allow passing JSON Serializable conf to TriggerDagRunOperator (#13964)
Bugfix: Allow getting details of a DAG with null start_date (REST API) (#13959)
Add params to the DAG details endpoint (#13790)
Make the role assigned to anonymous users customizable (#14042)
Retry critical methods in Scheduler loop in case of OperationalError (#14032)
Doc only changes¶
Add Missing StatsD Metrics in Docs (#13708)
Add Missing Email configs in Configuration doc (#13709)
Add quick start for Airflow on Docker (#13660)
Describe which Python versions are supported (#13259)
Add note block to 2.x migration docs (#13094)
Add documentation about webserver_config.py (#13155)
Add missing version information to recently added configs (#13161)
API: Use generic information in UpdateMask component (#13146)
Add Airflow 2.0.0 to requirements table (#13140)
Avoid confusion in doc for CeleryKubernetesExecutor (#13116)
Update docs link in REST API spec (#13107)
Add link to PyPI Repository to provider docs (#13064)
Fix link to Airflow master branch documentation (#13179)
Minor enhancements to Sensors docs (#13381)
Use 2.0.0 in Airflow docs & Breeze (#13379)
Improves documentation regarding providers and custom connections (#13375)(#13410)
Fix malformed table in production-deployment.rst (#13395)
Update celery.rst to fix broken links (#13400)
Remove reference to scheduler run_duration param in docs (#13346)
Set minimum SQLite version supported (#13412)
Fix installation doc (#13462)
Add docs about mocking variables and connections (#13502)
Add docs about Flask CLI (#13500)
Fix Upgrading to 2 guide to use rbac UI (#13569)
Make docs clear that Auth can not be disabled for Stable API (#13568)
Remove archived links from docs & add link for AIPs (#13580)
Minor fixes in upgrading-to-2.rst (#13583)
Fix Link in Upgrading to 2.0 guide (#13584)
Fix heading for Mocking section in best-practices.rst (#13658)
Add docs on how to use custom operators within plugins folder (#13186)
Update docs to register Operator Extra Links (#13683)
Improvements for database setup docs (#13696)
Replace module path to Class with just Class Name (#13719)
Update DAG Serialization docs (#13722)
Fix link to Apache Airflow docs in webserver (#13250)
Clarifies differences between extras and provider packages (#13810)
Add information about all access methods to the environment (#13940)
Docs: Fix FAQ on scheduler latency (#13969)
Updated taskflow api doc to show dependency with sensor (#13968)
Add deprecated config options to docs (#13883)
Added a FAQ section to the Upgrading to 2 doc (#13979)
Airflow 2.0.0 (2020-12-18)¶
The full changelog is about 3,000 lines long (already excluding everything backported to 1.10) so please check Airflow 2.0.0 Highlights Blog Post instead.
Significant Changes¶
The Airflow 2.0 release is a significant upgrade that includes substantial changes, some of which may be breaking. Existing code written for earlier versions of this project may require updates to use this version. Configuration changes are sometimes required as well. This document describes the changes that have been made and what you need to do to update your usage.
If you experience issues or have questions, please file an issue.
Major changes¶
This section describes the major changes that have been made in this release.
The experimental REST API is disabled by default¶
The experimental REST API is disabled by default. To restore these APIs while migrating to the stable REST API, set the enable_experimental_api option in the [api] section to True.
Please note that the experimental REST API does not have access control. The authenticated user has full access.
SparkJDBCHook default connection¶
The default connection for SparkJDBCHook was spark-default, and for SparkSubmitHook it was spark_default. Both hooks now use spark_default, which follows the common pattern for connection names used across all providers.
Changes to output argument in commands¶
From Airflow 2.0, we are replacing tabulate with rich to render command output. Due to this change, the --output argument will no longer accept formats of tabulate tables. Instead, it now accepts:
table - will render the output in a predefined table
json - will render the output as JSON
yaml - will render the output as YAML
This increases consistency and lets users manipulate the output programmatically (when using json or yaml).
Affected commands:
airflow dags list
airflow dags report
airflow dags list-runs
airflow dags list-jobs
airflow connections list
airflow connections get
airflow pools list
airflow pools get
airflow pools set
airflow pools delete
airflow pools import
airflow pools export
airflow roles list
airflow providers list
airflow providers get
airflow providers hooks
airflow tasks states-for-dag-run
airflow users list
airflow variables list
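Because the json format is machine-readable, the output of the commands listed above can be consumed directly from a script. The sketch below parses a hard-coded sample rather than calling the CLI; the sample string and its field names (`pool`, `slots`) are assumptions for illustration, so check the real output of your Airflow version before relying on them.

```python
import json

# Hypothetical output of `airflow pools list --output json`; field names are assumed.
# In a real script the text could come from:
#   subprocess.run(["airflow", "pools", "list", "--output", "json"],
#                  capture_output=True, text=True, check=True).stdout
sample_output = '[{"pool": "default_pool", "slots": "128"}, {"pool": "etl", "slots": "4"}]'

pools = json.loads(sample_output)

# Build a simple lookup table from the parsed rows.
slots_by_pool = {row["pool"]: int(row["slots"]) for row in pools}
print(slots_by_pool)
```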
Azure Wasb Hook does not work together with Snowflake hook¶
The WasbHook in Apache Airflow uses a legacy version of the Azure library. While the conflict is not significant for most of the Azure hooks, it is a problem for the Wasb Hook because the blob folders for both libraries overlap. Installing both the Snowflake and Azure extras will result in a non-importable WasbHook.
Rename all to devel_all extra¶
The all extras were reduced to include only user-facing dependencies. This means that this extra does not contain development dependencies. If you were relying on the all extra, you should now use devel_all or figure out if you need development extras at all.
Context variables prev_execution_date_success and prev_start_date_success are now pendulum.DateTime¶
Rename policy to task_policy¶
Because Airflow introduced a DAG-level policy (dag_policy), we decided to rename the existing policy function to task_policy to make the distinction clearer and avoid any confusion.
Users using cluster policy need to rename their policy functions in airflow_local_settings.py to task_policy.
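For an existing airflow_local_settings.py, the migration is a rename of the function; its body stays the same. The following is a minimal sketch of what such a function might look like. The 24-hour cap is a hypothetical rule, and the `SimpleNamespace` object is only a stand-in so the snippet runs without Airflow installed.

```python
from datetime import timedelta
from types import SimpleNamespace


def task_policy(task):
    """Cluster policy applied to every task; formerly this function was named `policy`."""
    # Hypothetical rule for this sketch: cap every task's execution timeout at 24 hours.
    limit = timedelta(hours=24)
    if task.execution_timeout is None or task.execution_timeout > limit:
        task.execution_timeout = limit


# Stand-in object so the sketch runs without Airflow; real tasks are operator instances.
fake_task = SimpleNamespace(task_id="example", execution_timeout=None)
task_policy(fake_task)
print(fake_task.execution_timeout)
```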
Default value for [celery] operation_timeout has changed to 1.0¶
From Airflow 2, by default Airflow will retry 3 times to publish a task to the Celery broker. This is controlled by [celery] task_publish_max_retries. Because of this, we can now have a lower operation timeout that raises AirflowTaskTimeout. This generally occurs during network blips or intermittent DNS issues.
Adding Operators and Sensors via plugins is no longer supported¶
Operators and Sensors should no longer be registered or imported via Airflow’s plugin mechanism – these types of classes are just treated as plain python classes by Airflow, so there is no need to register them with Airflow.
If you previously had a plugins/my_plugin.py
and you used it like this in a DAG:
from airflow.operators.my_plugin import MyOperator
You should instead import it as:
from my_plugin import MyOperator
The name under airflow.operators. was the plugin name, whereas in the second example it is the Python module name where the operator is defined.
See https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html for more info.
Importing Hooks via plugins is no longer supported¶
Importing hooks added in plugins via airflow.hooks.<plugin_name> is no longer supported; hooks should be imported as regular Python modules. If you previously imported a hook like this:
from airflow.hooks.my_plugin import MyHook
You should instead import it as:
from my_plugin import MyHook
It is still possible (but not required) to “register” hooks in plugins. This is to allow future support for dynamically populating the Connections form in the UI.
See https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html for more info.
The default value for [core] enable_xcom_pickling has been changed to False¶
The pickle type for XCom messages has been replaced by JSON by default to prevent RCE attacks.
Note that JSON serialization is stricter than pickling, so for example if you want to pass raw bytes through XCom you must encode them using an encoding like base64.
If you understand the risk and still want to use pickling, set enable_xcom_pickling = True in your Airflow config’s core section.
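The base64 round trip mentioned above looks roughly like this. The snippet uses only the standard library and does not call any XCom API; the `payload_b64` key is a made-up name for the example.

```python
import base64
import json

raw = b"\x00\xffbinary payload"

# Producer side: bytes -> ASCII text that JSON can carry.
encoded = base64.b64encode(raw).decode("ascii")
xcom_value = json.dumps({"payload_b64": encoded})

# Consumer side: reverse the two steps to recover the original bytes.
decoded = base64.b64decode(json.loads(xcom_value)["payload_b64"])
print(decoded == raw)
```

Passing `raw` to `json.dumps` directly would raise a `TypeError`, which is the stricter behavior the note refers to.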
Airflowignore of base path¶
A bug fixed in https://github.com/apache/airflow/pull/11993 caused “airflowignore” to check the base path of the DAG folder for forbidden DAGs, not only the relative part. As a result, if the base path contained the excluded word, the whole DAG folder could be excluded. For example, if the airflowignore file contained x and the DAGs folder was ‘/var/x/dags’, then all DAGs in the folder would be excluded. The fix matches only the relative path, which means that if you previously used a full path as an ignore pattern, you should change it to a relative one. For example, if your DAG folder was ‘/var/dags/’ and your airflowignore contained ‘/var/dags/excluded/’, you should change it to ‘excluded/’.
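The difference between the old and the fixed matching can be reproduced with a small model. This is a simplified sketch, not Airflow's implementation: `is_ignored` is a hypothetical helper, and it assumes each ignore entry is treated as a regular expression searched anywhere in the path.

```python
import posixpath
import re


def is_ignored(file_path, dag_folder, patterns, match_relative=True):
    """Return True if any regex pattern matches the path (relative or full)."""
    if match_relative:
        target = posixpath.relpath(file_path, dag_folder)
    else:
        target = file_path
    return any(re.search(pattern, target) for pattern in patterns)


dag_folder = "/var/x/dags"
dag_file = "/var/x/dags/report.py"
patterns = ["x"]  # the single entry from the example above

# Old behavior: the base path contains "x", so every file is excluded.
old_result = is_ignored(dag_file, dag_folder, patterns, match_relative=False)
# Fixed behavior: only "report.py" is checked, so the file is kept.
new_result = is_ignored(dag_file, dag_folder, patterns, match_relative=True)
print(old_result, new_result)
```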
ExternalTaskSensor provides all task context variables to execution_date_fn as keyword arguments¶
The old syntax of passing context as a dictionary will continue to work with the caveat that the argument must be named context. The following will break. To fix it, change ctx to context.
def execution_date_fn(execution_date, ctx):
    ...
execution_date_fn can take in any number of keyword arguments available in the task context dictionary. The following forms of execution_date_fn are all supported:
def execution_date_fn(dt):
    ...
def execution_date_fn(execution_date):
    ...
def execution_date_fn(execution_date, ds_nodash):
    ...
def execution_date_fn(execution_date, ds_nodash, dag):
    ...
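One way to picture why these signatures work, and why a parameter named ctx does not, is a dispatcher that passes the execution date positionally and fills every other parameter by name from the context. The sketch below is a simplified model under that assumption; `call_execution_date_fn` is hypothetical and is not how ExternalTaskSensor is actually implemented.

```python
import inspect


def call_execution_date_fn(fn, execution_date, context):
    """Call fn with execution_date plus only the context keys it names as parameters."""
    params = list(inspect.signature(fn).parameters)
    # Every parameter after the first is looked up by name in the context.
    kwargs = {name: context[name] for name in params[1:] if name in context}
    if "context" in params[1:]:
        # Legacy form: the whole dictionary is passed under the name `context`.
        kwargs["context"] = context
    return fn(execution_date, **kwargs)


# Made-up context values for the demonstration.
context = {"ds_nodash": "20210101", "dag": "my_dag"}


def by_position(dt):
    return dt


def with_keyword(execution_date, ds_nodash):
    return (execution_date, ds_nodash)


def legacy(execution_date, context):
    return context["dag"]


def broken(execution_date, ctx):
    return ctx


print(call_execution_date_fn(by_position, "2021-01-01", context))
print(call_execution_date_fn(with_keyword, "2021-01-01", context))
print(call_execution_date_fn(legacy, "2021-01-01", context))
```

Calling the dispatcher with `broken` raises a `TypeError`, because nothing named ctx exists in the context and the parameter is left unfilled.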
The default value for [webserver] cookie_samesite has been changed to Lax¶
As recommended by Flask, the default value of [webserver] cookie_samesite has been changed from '' (empty string) to Lax.
Changes to import paths¶
Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib
package was supported by the community. The project was passed to the Apache community and currently the
entire code is maintained by the community, so now the division has no justification, and it is only due
to historical reasons. In Airflow 2.0, we want to organize packages and move integrations
with third party services to the airflow.providers
package.
All changes made are backward compatible, but if you use the old import paths you will see a deprecation warning. The old import paths may be removed in the future.
In accordance with AIP-21, the _operator suffix has been removed from operators. Importing from a path that still has the suffix raises a deprecation warning.
The following table shows changes in import paths.
Old path |
New path |
---|---|
Database schema changes¶
In order to migrate the database, you should use the command airflow db upgrade
, but in
some cases manual steps are required.
Unique conn_id in connection table¶
Previously, Airflow allowed users to add more than one connection with the same conn_id
and on access it would choose one connection randomly. This acted as a basic load balancing and fault tolerance technique, when used in conjunction with retries.
This behavior caused some confusion for users, and there was no clear evidence if it actually worked well or not.
Now the conn_id
will be unique. If you already have duplicates in your metadata database, you will have to manage those duplicate connections before upgrading the database.
Not-nullable conn_type column in connection table¶
The conn_type
column in the connection
table must contain content. Previously, this rule was enforced
by application logic, but was not enforced by the database schema.
If you made any modifications to the table directly, make sure you don’t have
null in the conn_type
column.
Configuration changes¶
This release contains many changes that require a change in the configuration of this application or of other applications that integrate with it.
This section describes the changes that have been made and what you need to do.
airflow.contrib.utils.log has been moved¶
Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib
package was supported by the community. The project was passed to the Apache community and currently the
entire code is maintained by the community, so now the division has no justification, and it is only due
to historical reasons. In Airflow 2.0, we want to organize packages and move integrations
with third party services to the airflow.providers
package.
To clean up, the following packages were moved:
Old package |
New package |
---|---|
You should update the import paths if you are setting log configurations with the logging_config_class
option.
The old import paths still work but may be removed in the future.
SendGrid emailer has been moved¶
Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib package was supported by the community. The project was passed to the Apache community and currently the entire code is maintained by the community, so now the division has no justification, and it is only due to historical reasons.
To clean up, the send_email function from the airflow.contrib.utils.sendgrid module has been moved.
If your configuration file looks like this:
[email]
email_backend = airflow.contrib.utils.sendgrid.send_email
It should look like this now:
[email]
email_backend = airflow.providers.sendgrid.utils.emailer.send_email
The old configuration still works but can be abandoned.
Unify hostname_callable
option in core
section¶
The previous option used a colon (:) to separate the module from the function. Now a dot (.) is used.
The change aims to unify the format of all options that refer to objects in the airflow.cfg
file.
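The conversion from the colon form to the dotted form, and the way a dotted path can be resolved to a callable, can be sketched as follows. The value socket:getfqdn is used purely as an illustration, and both helper functions are hypothetical rather than Airflow's own code:

```python
import importlib
import socket


def to_dotted(option_value: str) -> str:
    """Convert the old 'module:function' form to 'module.function'."""
    return option_value.replace(":", ".", 1)


def resolve(dotted_path: str):
    """Import the module part and return the attribute named by the last segment."""
    module_name, attribute = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attribute)


new_value = to_dotted("socket:getfqdn")
hostname_callable = resolve(new_value)
```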
Custom executors are loaded using the full import path¶
In previous versions of Airflow it was possible to use plugins to load custom executors. It is still
possible, but the configuration has changed. Now you don’t have to create a plugin to configure a
custom executor, but you need to provide the full path to the module in the executor
option
in the core
section. The purpose of this change is to simplify the plugin mechanism and make
it easier to configure an executor.
If your module was in the path my_acme_company.executors.MyCustomExecutor
and the plugin was
called my_plugin
then your configuration looked like this:
[core]
executor = my_plugin.MyCustomExecutor
And now it should look like this:
[core]
executor = my_acme_company.executors.MyCustomExecutor
The old configuration still works but may be removed at any time.
Use CustomSQLAInterface
instead of SQLAInterface
for custom data models.¶
From Airflow 2.0, if you want to define your own Flask App Builder data models you need to use CustomSQLAInterface instead of SQLAInterface.
For Non-RBAC replace:
from flask_appbuilder.models.sqla.interface import SQLAInterface
datamodel = SQLAInterface(your_data_model)
with RBAC (in 1.10):
from airflow.www_rbac.utils import CustomSQLAInterface
datamodel = CustomSQLAInterface(your_data_model)
and in 2.0:
from airflow.www.utils import CustomSQLAInterface
datamodel = CustomSQLAInterface(your_data_model)
Drop plugin support for stat_name_handler¶
In previous versions, you could use the plugin mechanism to configure stat_name_handler. You should now use the stat_name_handler option in the [scheduler] section to achieve the same effect.
If your plugin looked like this and was available through the test_plugin
path:
from airflow.plugins_manager import AirflowPlugin


def my_stat_name_handler(stat):
    return stat


class AirflowTestPlugin(AirflowPlugin):
    name = "test_plugin"
    stat_name_handler = my_stat_name_handler
then your airflow.cfg
file should look like this:
[scheduler]
stat_name_handler=test_plugin.my_stat_name_handler
This change is intended to simplify the statsd configuration.
Logging configuration has been moved to new section¶
The following configurations have been moved from [core]
to the new [logging]
section.
base_log_folder
remote_logging
remote_log_conn_id
remote_base_log_folder
encrypt_s3_logs
logging_level
fab_logging_level
logging_config_class
colored_console_log
colored_log_format
colored_formatter_class
log_format
simple_log_format
task_log_prefix_template
log_filename_template
log_processor_filename_template
dag_processor_manager_log_location
task_log_reader
Metrics configuration has been moved to new section¶
The following configurations have been moved from [scheduler]
to the new [metrics]
section.
statsd_on
statsd_host
statsd_port
statsd_prefix
statsd_allow_list
stat_name_handler
statsd_datadog_enabled
statsd_datadog_tags
statsd_custom_client_path
Changes to Elasticsearch logging provider¶
When JSON output to stdout is enabled, log lines will now contain the log_id and offset fields. This should make reading task logs from Elasticsearch on the webserver work out of the box. Example configuration:
[logging]
remote_logging = True
[elasticsearch]
host = http://es-host:9200
write_stdout = True
json_format = True
Note that the webserver expects the log line data itself to be present in the message
field of the document.
Remove gcp_service_account_keys option in airflow.cfg file¶
This option has been removed because it is no longer supported by the Google Kubernetes Engine. The newly recommended method for managing service account keys on Google Cloud is Workload Identity.
Fernet is enabled by default¶
The fernet mechanism is enabled by default to increase the security of the default installation. In order to
restore the previous behavior, the user must consciously set an empty key in the fernet_key
option of
section [core]
in the airflow.cfg
file.
At the same time, this means that the apache-airflow[crypto]
extra-packages are always installed.
However, this requires that your operating system has libffi-dev
installed.
Changes to propagating Kubernetes worker annotations¶
kubernetes_annotations
configuration section has been removed.
A new key worker_annotations
has been added to existing kubernetes
section instead.
This removes the restriction on the character set for Kubernetes annotation keys.
All key/value pairs from kubernetes_annotations should now go to worker_annotations as JSON. For example, instead of
[kubernetes_annotations]
annotation_key = annotation_value
annotation_key2 = annotation_value2
it should be rewritten to
[kubernetes]
worker_annotations = { "annotation_key" : "annotation_value", "annotation_key2" : "annotation_value2" }
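If there are many annotations, the JSON value can be generated rather than typed by hand. The sketch below assumes the old key/value pairs have already been collected into a dictionary; the variable names are illustrative only:

```python
import json

# Key/value pairs as they appeared in the old [kubernetes_annotations] section.
old_section = {
    "annotation_key": "annotation_value",
    "annotation_key2": "annotation_value2",
}

# json.dumps produces the single-line JSON object expected by the new option.
worker_annotations = json.dumps(old_section)
print(f"worker_annotations = {worker_annotations}")
```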
Remove run_duration¶
The run_duration option should no longer be used. It was used to restart the scheduler from time to time, but the scheduler is now more stable, so using this setting is considered bad practice and might cause an inconsistent state.
Rename pool statsd metrics¶
Used slot has been renamed to running slot to make the name self-explanatory and the code more maintainable.
This means pool.used_slots.<pool_name>
metric has been renamed to
pool.running_slots.<pool_name>
. The Used Slots
column in Pools Web UI view
has also been changed to Running Slots
.
Removal of Mesos Executor¶
The Mesos Executor has been removed from the code base because it was not widely used and not maintained. See the mailing list discussion on deleting it.
Change dag loading duration metric name¶
Change DAG file loading duration metric from
dag.loading-duration.<dag_id>
to dag.loading-duration.<dag_file>
. This is to
better handle the case when a DAG file has multiple DAGs.
Sentry is disabled by default¶
Sentry is disabled by default. To enable the integration, you need to set the sentry_on option in the [sentry] section to "True".
Simplified GCSTaskHandler configuration¶
In previous versions, in order to configure the service account key file, you had to create a connection entry.
In the current version, you can configure google_key_path
option in [logging]
section to set
the key file path.
Users using Application Default Credentials (ADC) need not take any action.
The change aims to simplify the configuration of logging and to prevent corruption of the instance configuration through changes to a value controlled by the user, namely the connection entry. If you configure a secrets backend, it also means the webserver doesn’t need to connect to it. This simplifies setups with multiple GCP projects, because only one project will require the Secret Manager API to be enabled.
Changes to the core operators/hooks¶
We strive to ensure that there are no changes that may affect the end user and your files, but this release may contain changes that will require changes to your DAG files.
This section describes the changes that have been made and what you need to do to update your DAG files if you use core operators or any others.
BaseSensorOperator now respects the trigger_rule of downstream tasks¶
Previously, when a BaseSensorOperator with soft_fail=True failed, it skipped itself and unconditionally skipped all its downstream tasks, i.e. the trigger_rule of downstream tasks was not respected.
In the new behavior, the trigger_rule of downstream tasks is respected.
Users can preserve the original behaviour by setting the trigger_rule of each downstream task to all_success.
BaseOperator uses metaclass¶
BaseOperator
class uses a BaseOperatorMeta
as a metaclass. This meta class is based on
abc.ABCMeta
. If your custom operator uses different metaclass then you will have to adjust it.
Remove SQL support in BaseHook¶
The get_records, get_pandas_df and run methods, which only apply to SQL-like hooks, have been removed from BaseHook.
If you want to use them, or your custom hook inherits them, please use airflow.hooks.dbapi.DbApiHook.
Assigning a task to a DAG using bitwise shift (bit-shift) operators is no longer supported¶
Previously, you could assign a task to a DAG as follows:
dag = DAG("my_dag")
dummy = DummyOperator(task_id="dummy")
dag >> dummy
This is no longer supported. Instead, we recommend using the DAG as context manager:
with DAG("my_dag") as dag:
    dummy = DummyOperator(task_id="dummy")
Removed deprecated import mechanism¶
The deprecated import mechanism has been removed so the import of modules becomes more consistent and explicit.
For example: from airflow.operators import BashOperator
becomes from airflow.operators.bash_operator import BashOperator
Changes to sensor imports¶
Sensors are now accessible via airflow.sensors
and no longer via airflow.operators.sensors
.
For example: from airflow.operators.sensors import BaseSensorOperator
becomes from airflow.sensors.base import BaseSensorOperator
Skipped tasks can satisfy wait_for_downstream¶
Previously, a task instance with wait_for_downstream=True would only run if the downstream task of the previous task instance was successful. Meanwhile, a task instance with depends_on_past=True would run if the previous task instance was either successful or skipped. These two flags are close siblings, yet they behaved differently. This inconsistency made the API less intuitive to users.
To maintain consistent behavior, both successful and skipped downstream tasks can now satisfy the wait_for_downstream=True flag.
airflow.utils.helpers.cross_downstream
¶
airflow.utils.helpers.chain
¶
The chain
and cross_downstream
methods are now moved to airflow.models.baseoperator module from
airflow.utils.helpers
module.
The baseoperator
module seems to be a better choice to keep
closely coupled methods together. Helpers module is supposed to contain standalone helper methods
that can be imported by all classes.
The chain
method and cross_downstream
method both use BaseOperator. If any other package imports
any classes or functions from helpers module, then it automatically has an
implicit dependency to BaseOperator. That can often lead to cyclic dependencies.
More information in AIRFLOW-6392
In Airflow < 2.0 you imported those two methods like this:
from airflow.utils.helpers import chain
from airflow.utils.helpers import cross_downstream
In Airflow 2.0 it should be changed to:
from airflow.models.baseoperator import chain
from airflow.models.baseoperator import cross_downstream
airflow.operators.python.BranchPythonOperator
¶
BranchPythonOperator
will now return a value equal to the task_id
of the chosen branch,
where previously it returned None. Since it inherits from BaseOperator it will do an
xcom_push
of this value if do_xcom_push=True
. This is useful for downstream decision-making.
airflow.sensors.sql_sensor.SqlSensor
¶
SqlSensor is now consistent with the Python bool() function, and the allow_null parameter has been removed.
It will resolve after receiving any value that is cast to True by Python's bool(value). This changes the previous handling of NULL and '0'. Previously, '0' was treated as a success criterion, and NULL was handled depending on the value of the allow_null parameter. The previous behaviour is still achievable by setting the success parameter to lambda x: x is None or str(x) not in ('0', '').
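To see where a plain bool() check and the success callable quoted above disagree, both can be evaluated on a few sample values. This is only a sketch in pure Python; the function names are illustrative, and no sensor is involved:

```python
def bool_check(first_cell):
    """The bool()-based check described above."""
    return bool(first_cell)


def success_callable(x):
    """The callable suggested above for the `success` parameter."""
    return x is None or str(x) not in ("0", "")


samples = [None, "0", 0, "", 1, "ok"]

# Map each sample (by its repr) to the pair (bool_check, success_callable).
comparison = {
    repr(value): (bool_check(value), success_callable(value)) for value in samples
}

for value_repr, (by_bool, by_callable) in comparison.items():
    print(f"{value_repr:>6}: bool()={by_bool}, success={by_callable}")
```

The two checks differ for None and for the string '0', which are exactly the cases the text singles out.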
airflow.operators.trigger_dagrun.TriggerDagRunOperator
¶
The TriggerDagRunOperator now takes a conf
argument to which a dict can be provided as conf for the DagRun.
As a result, the python_callable
argument was removed. PR: https://github.com/apache/airflow/pull/6317.
airflow.operators.python.PythonOperator
¶
provide_context
argument on the PythonOperator was removed. The signature of the callable passed to the PythonOperator is now inferred and argument values are always automatically provided. There is no need to explicitly provide or not provide the context anymore. For example:
def myfunc(execution_date):
    print(execution_date)


python_operator = PythonOperator(task_id="mytask", python_callable=myfunc, dag=dag)
Notice you don’t have to set provide_context=True; variables from the task context are now automatically detected and provided.
All context variables can still be provided with a double-asterisk argument:
def myfunc(**context):
    print(context)  # all variables will be provided to context


python_operator = PythonOperator(task_id="mytask", python_callable=myfunc)
The task context variable names are reserved names in the callable function, hence a clash with op_args
and op_kwargs
results in an exception:
def myfunc(dag):
    # raises a ValueError because "dag" is a reserved name
    # valid signature example: myfunc(mydag)
    print("output")


python_operator = PythonOperator(
    task_id="mytask",
    op_args=[1],
    python_callable=myfunc,
)
The change is backwards compatible: setting provide_context will add the provide_context variable to the kwargs (but won’t do anything).
PR: #5990
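The idea of inferring which context variables a callable wants can be sketched with the standard inspect module. This illustrates the general technique only and is not Airflow's actual implementation; call_with_context and the sample context dictionary are hypothetical:

```python
import inspect


def call_with_context(func, context):
    """Call func with only the context entries its signature asks for."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callable accepts the whole context.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**context)
    wanted = {name: context[name] for name in params if name in context}
    return func(**wanted)


sample_context = {"execution_date": "2020-01-01", "ds_nodash": "20200101", "dag": "my_dag"}


def by_name(execution_date):
    return execution_date


def everything(**context):
    return sorted(context)


single_value = call_with_context(by_name, sample_context)
all_keys = call_with_context(everything, sample_context)
```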
airflow.sensors.filesystem.FileSensor
¶
FileSensor now takes a glob pattern, not just a filename. If the filename you are looking for has *
, ?
, or [
in it then you should replace these with [*]
, [?]
, and [[]
.
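The replacements described above are the same ones performed by glob.escape from the Python standard library, so a literal filename can be escaped programmatically. The filename below is an arbitrary example:

```python
import fnmatch
import glob

literal_name = "report[2020]*.csv"

# glob.escape wraps each of *, ? and [ in square brackets.
escaped = glob.escape(literal_name)

# The escaped pattern matches only the literal name, not similar names.
matches_literal = fnmatch.fnmatchcase(literal_name, escaped)
matches_other = fnmatch.fnmatchcase("report2.csv", escaped)
```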
airflow.operators.subdag_operator.SubDagOperator
¶
SubDagOperator now uses the Airflow scheduler instead of backfill to schedule tasks in the subdag. Users no longer need to specify the executor in SubDagOperator.
airflow.providers.google.cloud.operators.datastore.CloudDatastoreExportEntitiesOperator
¶
airflow.providers.google.cloud.operators.datastore.CloudDatastoreImportEntitiesOperator
¶
airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator
¶
airflow.providers.ssh.operators.ssh.SSHOperator
¶
airflow.providers.microsoft.winrm.operators.winrm.WinRMOperator
¶
airflow.operators.bash.BashOperator
¶
airflow.providers.docker.operators.docker.DockerOperator
¶
airflow.providers.http.operators.http.SimpleHttpOperator
¶
The do_xcom_push flag (a switch to push the result of an operator to XCom or not) appeared in different incarnations in different operators. Its function has been unified under a common name (do_xcom_push) on BaseOperator. This also makes it easy to globally disable pushing results to XCom.
The following operators were affected:
DatastoreExportOperator (Backwards compatible)
DatastoreImportOperator (Backwards compatible)
KubernetesPodOperator (Not backwards compatible)
SSHOperator (Not backwards compatible)
WinRMOperator (Not backwards compatible)
BashOperator (Not backwards compatible)
DockerOperator (Not backwards compatible)
SimpleHttpOperator (Not backwards compatible)
See AIRFLOW-3249 for details
airflow.operators.latest_only_operator.LatestOnlyOperator
¶
In previous versions, the LatestOnlyOperator
forcefully skipped all (direct and indirect) downstream tasks on its own. From this version on the operator will only skip direct downstream tasks and the scheduler will handle skipping any further downstream dependencies.
No change is needed if only the default trigger rule all_success
is being used.
If the DAG relies on tasks with other trigger rules (e.g. all_done) being skipped by the LatestOnlyOperator, adjustments to the DAG need to be made to accommodate the change in behaviour, e.g. with additional edges from the LatestOnlyOperator.
The goal of this change is to achieve a more consistent and configurable cascading behaviour based on the BaseBranchOperator
(see AIRFLOW-2923 and AIRFLOW-1784).
Changes to the core Python API¶
We strive to ensure that there are no changes that may affect the end user, and your Python files, but this release may contain changes that will require changes to your plugins, DAG File or other integration.
Only changes unique to this provider are described here. You should still pay attention to the changes that have been made to the core (including core operators) as they can affect the integration behavior of this provider.
This section describes the changes that have been made, and what you need to do to update your Python files.
Removed sub-package imports from airflow/__init__.py
¶
The imports LoggingMixin
, conf
, and AirflowException
have been removed from airflow/__init__.py
.
All implicit references of these objects will no longer be valid. To migrate, all usages of each old path must be
replaced with its corresponding new path.
Old Path (Implicit Import) |
New Path (Explicit Import) |
---|---|
Variables removed from the task instance context¶
The following variables were removed from the task instance context:
end_date
latest_date
tables
airflow.contrib.utils.Weekday
¶
Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib package was supported by the community. The project was passed to the Apache community and currently the entire code is maintained by the community, so now the division has no justification, and it is only due to historical reasons.
To clean up, Weekday
enum has been moved from airflow.contrib.utils
into airflow.utils
module.
airflow.models.connection.Connection
¶
The connection module has new deprecated methods:
Connection.parse_from_uri
Connection.log_info
Connection.debug_info
and one deprecated function:
parse_netloc_to_hostname
Previously, users could create a connection object in two ways
conn_1 = Connection(conn_id="conn_a", uri="mysql://AAA/")
# or
conn_2 = Connection(conn_id="conn_a")
conn_2.parse_uri(uri="mysql://AAA/")
Now the second way is not supported.
The Connection.log_info and Connection.debug_info methods have been deprecated. Read each Connection field individually or use the default representation (__repr__).
The old methods still work but may be removed at any time. The changes are intended to remove methods that are rarely used.
airflow.models.dag.DAG.create_dagrun
¶
DAG.create_dagrun accepts run_type and does not require run_id
This change is caused by adding run_type
column to DagRun
.
Previous signature:
def create_dagrun(
    self,
    run_id,
    state,
    execution_date=None,
    start_date=None,
    external_trigger=False,
    conf=None,
    session=None,
):
    ...
current:
def create_dagrun(
    self,
    state,
    execution_date=None,
    run_id=None,
    start_date=None,
    external_trigger=False,
    conf=None,
    run_type=None,
    session=None,
):
    ...
If the user provides run_id, then the run_type will be derived from it by checking the prefix. Allowed types are manual, scheduled and backfill (defined by airflow.utils.types.DagRunType).
If the user provides run_type and execution_date, then run_id is constructed as {run_type}__{execution_date.isoformat()}.
Airflow should construct dagruns using run_type and execution_date; creation using run_id is preserved for user actions.
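The relationship between run_id and run_type described above can be sketched as follows. RunType is a hypothetical stand-in for airflow.utils.types.DagRunType, and both helper functions are illustrative rather than Airflow code. In this sketch an unrecognised prefix simply raises ValueError, which is an assumption and not a statement about Airflow's behaviour:

```python
from datetime import datetime
from enum import Enum


class RunType(Enum):
    """Hypothetical stand-in for airflow.utils.types.DagRunType."""

    MANUAL = "manual"
    SCHEDULED = "scheduled"
    BACKFILL = "backfill"


def build_run_id(run_type: RunType, execution_date: datetime) -> str:
    """Construct a run_id as {run_type}__{execution_date.isoformat()}."""
    return f"{run_type.value}__{execution_date.isoformat()}"


def run_type_from_run_id(run_id: str) -> RunType:
    """Derive the run type from the prefix before the double underscore."""
    prefix = run_id.split("__", 1)[0]
    return RunType(prefix)  # raises ValueError for an unknown prefix


run_id = build_run_id(RunType.SCHEDULED, datetime(2020, 1, 1))
derived = run_type_from_run_id(run_id)
```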
airflow.models.dagrun.DagRun
¶
Use DagRunType.SCHEDULED.value instead of DagRun.ID_PREFIX
All the run_id prefixes for different kind of DagRuns have been grouped into a single
enum in airflow.utils.types.DagRunType
.
Previously, they were defined in various places, for example as ID_PREFIX class variables for DagRun and BackfillJob, and in the _trigger_dag function.
Was:
>>> from airflow.models.dagrun import DagRun
>>> DagRun.ID_PREFIX
scheduled__
Replaced by:
>>> from airflow.utils.types import DagRunType
>>> DagRunType.SCHEDULED.value
scheduled
airflow.utils.file.TemporaryDirectory
¶
airflow.utils.file.TemporaryDirectory has been removed.
Since Airflow dropped support for Python < 3.5, there’s no need to have this custom implementation of TemporaryDirectory because the same functionality is provided by tempfile.TemporaryDirectory.
Instead of from airflow.utils.file import TemporaryDirectory, users should now use from tempfile import TemporaryDirectory. Both context managers provide the same interface, so no additional changes should be required.
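A minimal usage sketch of the standard library replacement is shown below. The prefix and file name are arbitrary examples:

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory(prefix="airflow_tmp_") as tmp_dir:
    # tmp_dir is the path of a freshly created directory.
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w") as handle:
        handle.write("scratch data")
    existed_inside = os.path.exists(path)

# The directory and its contents are deleted when the block exits.
removed_after = not os.path.exists(tmp_dir)
```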
airflow.AirflowMacroPlugin
¶
The airflow.AirflowMacroPlugin class has been removed. The class was present in the airflow package but has not been used (apparently since 2015).
airflow.settings.CONTEXT_MANAGER_DAG
¶
CONTEXT_MANAGER_DAG was removed from settings. Its role has been taken over by DagContext in ‘airflow.models.dag’. One reason is that settings should be static rather than store dynamic context from the DAG, but the main one is that moving the context out of settings made it possible to untangle cyclic imports between DAG, BaseOperator, SerializedDAG and SerializedBaseOperator, which was part of AIRFLOW-6010.
airflow.utils.log.logging_mixin.redirect_stderr
¶
airflow.utils.log.logging_mixin.redirect_stdout
¶
The redirect_stderr and redirect_stdout functions from the airflow.utils.log.logging_mixin module have been deleted because they can easily be replaced by the standard library.
The standard library functions are more flexible and can be used in a wider range of cases.
The code below
import logging

from airflow.utils.log.logging_mixin import redirect_stderr, redirect_stdout

logger = logging.getLogger("custom-logger")
with redirect_stdout(logger, logging.INFO), redirect_stderr(logger, logging.WARN):
    print("I love Airflow")
can be replaced by the following code:
from contextlib import redirect_stdout, redirect_stderr
import logging

from airflow.utils.log.logging_mixin import StreamLogWriter

logger = logging.getLogger("custom-logger")

with redirect_stdout(StreamLogWriter(logger, logging.INFO)), redirect_stderr(
    StreamLogWriter(logger, logging.WARN)
):
    print("I Love Airflow")
airflow.models.baseoperator.BaseOperator
¶
Additional arguments passed to BaseOperator now cause an exception. Previous versions of Airflow accepted additional arguments and displayed a message on the console. When users did not notice the message, it led to errors that were very difficult to detect.
To restore the previous behavior, you must set allow_illegal_arguments to True in the [operators] section of the airflow.cfg file. This option may be removed completely in the future.
airflow.models.dagbag.DagBag
¶
Passing the store_serialized_dags argument to DagBag.__init__ and accessing the DagBag.store_serialized_dags property are deprecated and will be removed in future versions.
Previous signature:
def __init__(
    self,
    dag_folder=None,
    include_examples=conf.getboolean("core", "LOAD_EXAMPLES"),
    safe_mode=conf.getboolean("core", "DAG_DISCOVERY_SAFE_MODE"),
    store_serialized_dags=False,
):
    ...
current:
def __init__(
    self,
    dag_folder=None,
    include_examples=conf.getboolean("core", "LOAD_EXAMPLES"),
    safe_mode=conf.getboolean("core", "DAG_DISCOVERY_SAFE_MODE"),
    read_dags_from_db=False,
):
    ...
If you were using positional arguments, it requires no change but if you were using keyword
arguments, please change store_serialized_dags
to read_dags_from_db
.
Similarly, if you were using DagBag().store_serialized_dags
property, change it to
DagBag().read_dags_from_db
.
Changes in google
provider package¶
We strive to ensure that there are no changes that may affect the end user and your Python files, but this release may contain changes that will require changes to your configuration, DAG Files or other integration e.g. custom operators.
Only changes unique to this provider are described here. You should still pay attention to the changes that have been made to the core (including core operators) as they can affect the integration behavior of this provider.
This section describes the changes that have been made and what you need to do to update your code if you use operators or hooks which integrate with Google services (including Google Cloud - GCP).
Direct impersonation added to operators communicating with Google services¶
Directly impersonating a service account has been made possible for operators communicating with Google services via a new argument called impersonation_chain (google_impersonation_chain in the case of operators that also communicate with services of other cloud providers).
As a result, GCSToS3Operator no longer derives from GCSListObjectsOperator.
Normalize gcp_conn_id for Google Cloud¶
Previously, not all hooks and operators related to Google Cloud used gcp_conn_id as the parameter for the GCP connection. There is now one parameter which applies to most services. Parameters like datastore_conn_id, bigquery_conn_id, google_cloud_storage_conn_id and similar have been deprecated. Operators that require two connections are not changed.
The following components were affected by normalization:
airflow.providers.google.cloud.hooks.datastore.DatastoreHook
airflow.providers.google.cloud.hooks.bigquery.BigQueryHook
airflow.providers.google.cloud.hooks.gcs.GoogleCloudStorageHook
airflow.providers.google.cloud.operators.bigquery.BigQueryCheckOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryValueCheckOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryIntervalCheckOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryGetDataOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryDeleteDatasetOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryCreateEmptyDatasetOperator
airflow.providers.google.cloud.operators.bigquery.BigQueryTableDeleteOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageCreateBucketOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageListOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageDownloadOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageDeleteOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageBucketCreateAclEntryOperator
airflow.providers.google.cloud.operators.gcs.GoogleCloudStorageObjectCreateAclEntryOperator
airflow.operators.sql_to_gcs.BaseSQLToGoogleCloudStorageOperator
airflow.operators.adls_to_gcs.AdlsToGoogleCloudStorageOperator
airflow.operators.gcs_to_s3.GoogleCloudStorageToS3Operator
airflow.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator
airflow.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator
airflow.operators.local_to_gcs.FileToGoogleCloudStorageOperator
airflow.operators.cassandra_to_gcs.CassandraToGoogleCloudStorageOperator
airflow.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator
Changes to import paths and names of GCP operators and hooks¶
According to AIP-21, operators related to Google Cloud have been moved from contrib to core. The following table shows changes in import paths.
Old path |
New path |
---|---|
Unify default conn_id for Google Cloud¶
Previously, not all hooks and operators related to Google Cloud used google_cloud_default as the default conn_id. There is now one default variant. Values like google_cloud_storage_default, bigquery_default and google_cloud_datastore_default have been deprecated. The configuration of existing relevant connections in the database has been preserved. To use those deprecated GCP conn_ids, you need to explicitly pass their conn_id into operators/hooks. Otherwise, google_cloud_default will be used as the GCP conn_id by default.
airflow.providers.google.cloud.hooks.dataflow.DataflowHook
¶
airflow.providers.google.cloud.operators.dataflow.DataflowCreateJavaJobOperator
¶
airflow.providers.google.cloud.operators.dataflow.DataflowTemplatedJobStartOperator
¶
airflow.providers.google.cloud.operators.dataflow.DataflowCreatePythonJobOperator
¶
To use the project_id argument consistently across GCP hooks and operators, we made the following changes:
- Changed order of arguments in DataflowHook.start_python_dataflow. Uses
with positional arguments may break.
- Changed order of arguments in DataflowHook.is_job_dataflow_running. Uses
with positional arguments may break.
- Changed order of arguments in DataflowHook.cancel_job. Uses
with positional arguments may break.
- Added optional project_id argument to DataflowCreateJavaJobOperator
constructor.
- Added optional project_id argument to DataflowTemplatedJobStartOperator
constructor.
- Added optional project_id argument to DataflowCreatePythonJobOperator
constructor.
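Why a reordering breaks positional callers can be sketched with plain Python stand-ins. The two functions below are hypothetical and do not reproduce the real DataflowHook.cancel_job signatures. They only illustrate that a positional call written against an old ordering binds values to the wrong parameters, while a keyword call is unaffected.

```python
# Hypothetical stand-ins for illustration only; NOT the real DataflowHook signatures.
def cancel_job_old(project_id, job_name):
    """Pretend 'old' ordering: project_id comes first."""
    return {"project_id": project_id, "job_name": job_name}


def cancel_job_new(job_name, project_id=None):
    """Pretend 'new' ordering: project_id moved and made optional."""
    return {"project_id": project_id, "job_name": job_name}


# A positional call written against the old ordering...
old_result = cancel_job_old("my-project", "my-job")

# ...does not fail under the new ordering, it just swaps the values.
broken_result = cancel_job_new("my-project", "my-job")

# A keyword call gives the same result regardless of parameter order.
safe_result = cancel_job_new(project_id="my-project", job_name="my-job")
```

Passing every argument by keyword is the simplest way to make existing calls robust against this kind of change.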
airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor¶
To provide more precise control over how changes to objects in the underlying GCS bucket are handled, the constructor of this sensor has changed.
Old behavior: the constructor optionally took previous_num_objects: int.
New replacement constructor kwarg: previous_objects: Optional[Set[str]].
Most users would not specify this argument because the bucket begins empty and the user wants to treat any files as new.
Example of updating usage of this sensor. Users who used to call:
GCSUploadSessionCompleteSensor(bucket='my_bucket', prefix='my_prefix', previous_num_objects=1)
will now call:
GCSUploadSessionCompleteSensor(bucket='my_bucket', prefix='my_prefix', previous_objects={'.keep'})
where '.keep' is a single file at your prefix that the sensor should not consider new.
airflow.providers.google.cloud.hooks.bigquery.BigQueryBaseCursor¶
airflow.providers.google.cloud.hooks.bigquery.BigQueryHook¶
To simplify BigQuery operators (no need for a Cursor) and to standardize the usage of hooks across all GCP integrations, methods from BigQueryBaseCursor were moved to BigQueryHook. Calling them on a Cursor object is still possible because backward compatibility has been preserved, but doing so raises a DeprecationWarning.
The following methods were moved:
Old path | New path |
---|---|
airflow.providers.google.cloud.hooks.bigquery.BigQueryBaseCursor¶
Since BigQuery is part of GCP, it was possible to simplify the code by handling exceptions with the airflow.providers.google.common.hooks.base.GoogleBaseHook.catch_http_exception decorator. However, this changes the exceptions raised by the following methods:
- airflow.providers.google.cloud.hooks.bigquery.BigQueryBaseCursor.run_table_delete raises AirflowException instead of Exception.
- airflow.providers.google.cloud.hooks.bigquery.BigQueryBaseCursor.create_empty_dataset raises AirflowException instead of ValueError.
- airflow.providers.google.cloud.hooks.bigquery.BigQueryBaseCursor.get_dataset raises AirflowException instead of ValueError.
airflow.providers.google.cloud.operators.bigquery.BigQueryCreateEmptyTableOperator¶
airflow.providers.google.cloud.operators.bigquery.BigQueryCreateEmptyDatasetOperator¶
Idempotency was added to BigQueryCreateEmptyTableOperator and BigQueryCreateEmptyDatasetOperator. To achieve that, the try/except clause was removed from the create_empty_dataset and create_empty_table methods of BigQueryHook.
airflow.providers.google.cloud.hooks.dataflow.DataflowHook¶
airflow.providers.google.cloud.hooks.mlengine.MLEngineHook¶
airflow.providers.google.cloud.hooks.pubsub.PubSubHook¶
The change in GCP operators means that the GCP hooks for those operators now require keyword parameters rather than positional ones in all methods where project_id is used. These methods raise an explanatory exception if they are called with positional parameters.
Other GCP hooks are unaffected.
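The keyword-only behavior described above can be approximated with a small decorator. This is a minimal sketch, not the provider's implementation: keyword_only, PositionalArgumentsError, and FakeHook are hypothetical names introduced for illustration.

```python
import functools


class PositionalArgumentsError(Exception):
    """Hypothetical exception type used only in this sketch."""


def keyword_only(func):
    """Reject positional arguments (other than self) with an explanatory error."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise PositionalArgumentsError(
                f"{func.__name__} must be called with keyword arguments, "
                "not positional ones"
            )
        return func(self, **kwargs)

    return wrapper


class FakeHook:
    """Stand-in for a GCP hook; not a real Airflow class."""

    @keyword_only
    def create_topic(self, topic, project_id=None):
        return f"projects/{project_id}/topics/{topic}"


hook = FakeHook()

# Keyword call: accepted.
path = hook.create_topic(topic="events", project_id="my-project")

# Positional call: rejected with an explanatory exception.
try:
    hook.create_topic("events", "my-project")
    positional_rejected = False
except PositionalArgumentsError:
    positional_rejected = True
```

Failing loudly on positional arguments avoids the silent mis-binding that a changed parameter order would otherwise cause.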
airflow.providers.google.cloud.hooks.pubsub.PubSubHook¶
airflow.providers.google.cloud.operators.pubsub.PubSubTopicCreateOperator¶
airflow.providers.google.cloud.operators.pubsub.PubSubSubscriptionCreateOperator¶
airflow.providers.google.cloud.operators.pubsub.PubSubTopicDeleteOperator¶
airflow.providers.google.cloud.operators.pubsub.PubSubSubscriptionDeleteOperator¶
airflow.providers.google.cloud.operators.pubsub.PubSubPublishOperator¶
airflow.providers.google.cloud.sensors.pubsub.PubSubPullSensor¶
In the PubSubPublishOperator and the PubSubHook.publish method, the data field in a message should be a bytestring (UTF-8 encoded) rather than a base64-encoded string.
Due to the normalization of parameters within GCP operators and hooks, parameters such as project or topic_project are deprecated and will be replaced by the project_id parameter.
In the PubSubHook.create_subscription hook method, the subscription_project parameter is replaced by subscription_project_id.
Template fields are updated accordingly and old ones may not work.
It is now required to pass keyword-only arguments to the PubSub hook.
These changes are not backward compatible.
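The difference between the two data encodings mentioned above can be shown with the standard library alone. This is a sketch: the message dict shape and the attribute values are assumptions made for illustration, and no Pub/Sub client is called.

```python
import base64

text = "Hello, Pub/Sub"

# New form described above: a UTF-8 encoded bytestring.
data_as_bytes = text.encode("utf-8")

# Previous form: a base64-encoded string.
data_as_base64_str = base64.b64encode(text.encode("utf-8")).decode("ascii")

# A payload prepared the old way can be converted by decoding it once.
converted = base64.b64decode(data_as_base64_str)

# Illustrative message; the dict layout is an assumption, not a documented schema.
message = {"data": data_as_bytes, "attributes": {"source": "example"}}
```

If existing DAGs build base64 strings before publishing, dropping that encoding step (or decoding once, as with converted) yields the bytestring form.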
airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator¶
The gcp_conn_id parameter in GKEStartPodOperator is now required. In previous versions, it was possible to pass None as the gcp_conn_id to the GKEStartPodOperator operator, which resulted in credentials being determined according to the Application Default Credentials strategy.
Now this parameter requires a value. To restore the previous behavior, configure the connection without specifying the service account.
Detailed information about connection management is available in Google Cloud Connection.
airflow.providers.google.cloud.hooks.gcs.GCSHook¶
The following parameters have been replaced in all the methods in GCSHook:
- bucket is changed to bucket_name
- object is changed to object_name
The maxResults parameter in GoogleCloudStorageHook.list has been renamed to max_results for consistency.
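When many call sites pass these parameters as a dict of keyword arguments, the renames above can be applied mechanically. The helper below is a hypothetical sketch, not part of Airflow. Only the old and new parameter names come from the text, and maxResults applies to the list method only.

```python
# Renames taken from the text above; the helper itself is hypothetical.
GCS_HOOK_RENAMES = {
    "bucket": "bucket_name",
    "object": "object_name",
    "maxResults": "max_results",
}


def migrate_gcs_kwargs(kwargs):
    """Return a copy of kwargs with old GCSHook parameter names replaced."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = GCS_HOOK_RENAMES.get(name, name)
        if new_name in migrated:
            # Both the old and the new spelling were supplied; refuse to guess.
            raise ValueError(f"both old and new names given for {new_name!r}")
        migrated[new_name] = value
    return migrated


old_call = {"bucket": "my_bucket", "object": "data/file.csv"}
new_call = migrate_gcs_kwargs(old_call)
```

The original dict is left unchanged, so the helper can be used to compare old and new call arguments side by side during a migration.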
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitPigJobOperator¶
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitHiveJobOperator¶
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitSparkSqlJobOperator¶
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitSparkJobOperator¶
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitHadoopJobOperator¶
airflow.providers.google.cloud.operators.dataproc.DataprocSubmitPySparkJobOperator¶
The 'properties' and 'jars' arguments of the Dataproc-related operators (DataprocXXXOperator) have been renamed from dataproc_xxx_properties and dataproc_xxx_jars to dataproc_properties and dataproc_jars, respectively.
airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCreateJobOperator¶
To obtain pylint compatibility, the filter argument in CloudDataTransferServiceCreateJobOperator has been renamed to request_filter.
airflow.providers.google.cloud.hooks.cloud_storage_transfer_service.CloudDataTransferServiceHook¶
To obtain pylint compatibility, the filter argument in CloudDataTransferServiceHook.list_transfer_job and CloudDataTransferServiceHook.list_transfer_operations has been renamed to request_filter.
airflow.providers.google.cloud.hooks.bigquery.BigQueryHook¶
In general, all hook methods are decorated with @GoogleBaseHook.fallback_to_default_project_id, so parameters can only be passed to the hook as keyword arguments.
- create_empty_table now accepts a table_resource parameter. If it is provided, all other parameters are ignored.
- create_empty_dataset now uses the values from dataset_reference instead of raising an error when parameters are passed both in dataset_reference and as arguments to the method. Additionally, dataset_reference is validated using Dataset.from_api_repr. Exception and log messages have been changed.
- update_dataset now requires a new fields argument (breaking change).
- delete_dataset has a new signature (dataset_id, project_id, …); the previous one was (project_id, dataset_id, …) (breaking change).
- get_tabledata returns a list of rows instead of the API response in dict format. This method is deprecated in favor of list_rows (breaking change).
airflow.providers.google.cloud.hooks.cloud_build.CloudBuildHook¶
airflow.providers.google.cloud.operators.cloud_build.CloudBuildCreateBuildOperator¶
- The api_version parameter has been removed and will not be used, since CloudBuildHook was migrated from the Discovery API to the native google-cloud-build Python library.
- The body parameter in CloudBuildCreateBuildOperator has been deprecated. Instead, pass the body using the build parameter.
airflow.providers.google.cloud.hooks.dataflow.DataflowHook.start_python_dataflow¶
airflow.providers.google.cloud.operators.dataflow.DataflowCreatePythonJobOperator¶
python3 is now the default interpreter for Dataflow hooks and operators.
The default value of the py_interpreter argument for Dataflow hooks and operators has been changed from python2 to python3.
airflow.providers.google.common.hooks.base_google.GoogleBaseHook¶
To simplify the code, the provide_gcp_credential_file decorator has been moved out of the inner class.
Instead of @GoogleBaseHook._Decorators.provide_gcp_credential_file, you should write @GoogleBaseHook.provide_gcp_credential_file.
airflow.providers.google.cloud.operators.dataproc.DataprocCreateClusterOperator¶
A disk size of 1TB or more is highly recommended for Dataproc to have sufficient throughput: https://cloud.google.com/compute/docs/disks/performance
Hence, the default value for master_disk_size in DataprocCreateClusterOperator has been changed from 500GB to 1TB.
Generating Cluster Config¶
If you are upgrading from Airflow 1.10.x and are not using CLUSTER_CONFIG, you can generate the config using make() of airflow.providers.google.cloud.operators.dataproc.ClusterGenerator.
This has proved especially useful if you are using the metadata argument from the older API; refer to AIRFLOW-16911 for details.
For example, your cluster creation may look like this in v1.10.x:
# Airflow 1.10.x import path for the old operator
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

path = "gs://goog-dataproc-initialization-actions-us-central1/python/pip-install.sh"
create_cluster = DataprocClusterCreateOperator(
    task_id="create_dataproc_cluster",
    cluster_name="test",
    project_id="test",
    zone="us-central1-a",
    region="us-central1",
    master_machine_type="n1-standard-4",
    worker_machine_type="n1-standard-4",
    num_workers=2,
    storage_bucket="test_bucket",
    init_actions_uris=[path],
    metadata={"PIP_PACKAGES": "pyyaml requests pandas openpyxl"},
)
After upgrading to v2.x.x and using CLUSTER_CONFIG, it will look like this:
from airflow.providers.google.cloud.operators.dataproc import (
    ClusterGenerator,
    DataprocCreateClusterOperator,
)

path = "gs://goog-dataproc-initialization-actions-us-central1/python/pip-install.sh"
# Build the cluster config dict from the old-style arguments
CLUSTER_CONFIG = ClusterGenerator(
    project_id="test",
    zone="us-central1-a",
    master_machine_type="n1-standard-4",
    worker_machine_type="n1-standard-4",
    num_workers=2,
    storage_bucket="test",
    init_actions_uris=[path],
    metadata={"PIP_PACKAGES": "pyyaml requests pandas openpyxl"},
).make()
# Pass the generated config to the renamed operator
create_cluster_operator = DataprocCreateClusterOperator(
    task_id="create_dataproc_cluster",
    cluster_name="test",
    project_id="test",
    region="us-central1",
    cluster_config=CLUSTER_CONFIG,
)