apache-airflow-providers-google¶
Package apache-airflow-providers-google¶
Google services including:
Google Workspace (formerly G Suite)
Release: 3.0.0
Provider package¶
This is a provider package for the google provider. All classes for this provider package are in the airflow.providers.google Python package.
Installation¶
You can install this package on top of an existing Airflow 2.* installation via:
pip install apache-airflow-providers-google
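After installing, it can be useful to confirm which version of the provider ended up in the environment. The following is a minimal sketch using only the Python standard library (Python 3.8 or newer); the helper name installed_version is hypothetical and not part of Airflow.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("apache-airflow-providers-google")
    print(version if version else "apache-airflow-providers-google is not installed")
```

The lookup reads the installed distribution metadata, so it does not import Airflow itself.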
PIP requirements¶
PIP package | Version required |
---|---|
Cross provider package dependencies¶
These are dependencies that might be needed in order to use all the features of the package. You need to install the specified provider packages in order to use them.
You can install such cross-provider dependencies when installing from PyPI. For example:
pip install apache-airflow-providers-google[amazon]
Dependent package | Extra |
---|---|
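Code that relies on a cross-provider dependency can check whether the other provider is importable before using it. The sketch below assumes the amazon provider follows the same naming scheme as this one, so that its classes live in airflow.providers.amazon; the helper name module_available is hypothetical.

```python
from importlib.util import find_spec


def module_available(module_name: str) -> bool:
    """Return True if the named module can be located in this environment."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


if __name__ == "__main__":
    # Assumption: the amazon provider ships the airflow.providers.amazon package.
    if module_available("airflow.providers.amazon"):
        print("amazon provider is available")
    else:
        print("install it with: pip install apache-airflow-providers-google[amazon]")
```

Note that find_spec imports the parent packages of a dotted name, so checking airflow.providers.amazon imports airflow when it is installed.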
Changelog¶
3.0.0¶
Breaking changes¶
Integration with the apache.beam provider¶
In version 3.0.0 of the provider we have changed the way of integrating with the apache.beam provider. The previous versions of both providers caused conflicts when trying to install them together using PIP > 20.2.4. PIP 20.2.4 and below did not detect the conflict, but it was still there: the required version of the Google BigQuery python client did not match on both sides. As a result, when both the apache.beam and google providers were installed, some features of the BigQuery operators might not work properly. This was caused by the apache-beam client not yet supporting the new google python clients when the apache-beam[gcp] extra was used. The apache-beam[gcp] extra is used by the Dataflow operators and, while they might work with the newer version of the Google BigQuery python client, it is not guaranteed.
This version introduces an additional extra requirement for the apache.beam extra of the google provider and, symmetrically, an additional requirement for the google extra of the apache.beam provider. Neither the google nor the apache.beam provider uses those extras by default, but you can specify them when installing the providers. The consequence is that, without those extras, some functionality of the Dataflow operators might not be available.
Unfortunately, the only complete solution to the problem is for apache.beam to migrate to the new (>=2.0.0) Google Python clients.
This is the extra for the google provider:
extras_require={
...
'apache.beam': ['apache-airflow-providers-apache-beam', 'apache-beam[gcp]'],
....
},
And likewise this is the extra for the apache.beam provider:
extras_require={'google': ['apache-airflow-providers-google', 'apache-beam[gcp]']},
You can still run this with PIP version <= 20.2.4 and go back to the previous behaviour:
pip install apache-airflow-providers-google[apache.beam]
or
pip install apache-airflow-providers-apache-beam[google]
But be aware that some functionality of the BigQuery operators might not be available in this case.
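The PIP version threshold described above can be checked programmatically before choosing an install command. This is a rough sketch that only encodes the "PIP > 20.2.4" rule stated in this changelog; the helper names parse_release and pip_detects_conflict are hypothetical, and pre-release suffixes are simply ignored.

```python
import re
from typing import Tuple

# Per the text above, PIP 20.2.4 and below do not detect the conflict.
LAST_PIP_WITHOUT_DETECTION = (20, 2, 4)


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version, e.g. '21.0.1' -> (21, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pip_detects_conflict(pip_version: str) -> bool:
    """Return True if this PIP version is newer than 20.2.4."""
    release = parse_release(pip_version)
    # Pad short versions such as '20.3' to three components before comparing.
    padded = release + (0,) * (3 - len(release))
    return padded > LAST_PIP_WITHOUT_DETECTION


if __name__ == "__main__":
    for candidate in ("20.2.4", "20.3", "21.0.1"):
        print(candidate, pip_detects_conflict(candidate))
```

In practice the version string could come from `pip --version` or from the installed pip distribution metadata.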
Features¶
[Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator (#15250)
Fixes¶
Bugfix: Fix rendering of 'object_name' in 'GCSToLocalFilesystemOperator' (#15487)
Fix typo in DataprocCreateClusterOperator (#15462)
Fixes wrongly specified path for leveldb hook (#15453)
2.2.0¶
Features¶
Adds 'Trino' provider (with lower memory footprint for tests) (#15187)
update remaining old import paths of operators (#15127)
Override project in dataprocSubmitJobOperator (#14981)
GCS to BigQuery Transfer Operator with Labels and Description parameter (#14881)
Add GCS timespan transform operator (#13996)
Add job labels to bigquery check operators. (#14685)
Use libyaml C library when available. (#14577)
Add Google leveldb hook and operator (#13109) (#14105)
Bug fixes¶
Google Dataflow Hook to handle no Job Type (#14914)
2.1.0¶
Features¶
Corrects order of argument in docstring in GCSHook.download method (#14497)
Refactor SQL/BigQuery/Qubole/Druid Check operators (#12677)
Add GoogleDriveToLocalOperator (#14191)
Add 'exists_ok' flag to BigQueryCreateEmptyTable(Dataset)Operator (#14026)
Add materialized view support for BigQuery (#14201)
Add BigQueryUpdateTableOperator (#14149)
Add param to CloudDataTransferServiceOperator (#14118)
Add gdrive_to_gcs operator, drive sensor, additional functionality to drive hook (#13982)
Improve GCSToSFTPOperator paths handling (#11284)
Bug Fixes¶
Fixes to dataproc operators and hook (#14086)
#9803 fix bug in copy operation without wildcard (#13919)
2.0.0¶
Breaking changes¶
Updated google-cloud-* libraries¶
This release of the provider package contains third-party library updates, which may require updating your DAG files or custom hooks and operators, if you were using objects from those libraries. Updating of these libraries is necessary to be able to use new features made available by new versions of the libraries and to obtain bug fixes that are only available for new versions of the library.
Details are covered in the UPDATING.md files for each library, but there are some details that you should pay attention to.
Library name | Previous constraints | Current constraints | Upgrade Documentation |
---|---|---|---|
The field names use the snake_case convention¶
If your DAG uses an object from the above-mentioned libraries passed via XCom, you need to update the names of the fields that are read. Previously, the fields used the camelCase convention; now the snake_case convention is used.
Before:
set_acl_permission = GCSBucketCreateAclEntryOperator(
    task_id="gcs-set-acl-permission",
    bucket=BUCKET_NAME,
    entity="user-{{ task_instance.xcom_pull('get-instance')['persistenceIamIdentity']"
    ".split(':', 2)[1] }}",
    role="OWNER",
)
After:
set_acl_permission = GCSBucketCreateAclEntryOperator(
    task_id="gcs-set-acl-permission",
    bucket=BUCKET_NAME,
    entity="user-{{ task_instance.xcom_pull('get-instance')['persistence_iam_identity']"
    ".split(':', 2)[1] }}",
    role="OWNER",
)
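When many field names need updating, a small helper can suggest the snake_case form of each old key. The following is a minimal sketch, not part of the provider: camel_to_snake and snake_case_keys are hypothetical names, and the simple rule used here splits before every capital letter, so names containing acronyms should be checked against the library documentation.

```python
import re
from typing import Any

# Matches the empty position before each capital letter, except at the start.
_WORD_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    return _WORD_BOUNDARY.sub("_", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Recursively rename string keys of dicts (including dicts in lists)."""
    if isinstance(value, dict):
        return {
            camel_to_snake(key) if isinstance(key, str) else key: snake_case_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


if __name__ == "__main__":
    print(camel_to_snake("persistenceIamIdentity"))
```

Applied to the example above, the old key persistenceIamIdentity maps to persistence_iam_identity, which is the name used in the updated template.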
Features¶
Add Apache Beam operators (#12814)
Add Google Cloud Workflows Operators (#13366)
Replace 'google_cloud_storage_conn_id' by 'gcp_conn_id' when using 'GCSHook' (#13851)
Add How To Guide for Dataflow (#13461)
Generalize MLEngineStartTrainingJobOperator to custom images (#13318)
Add Parquet data type to BaseSQLToGCSOperator (#13359)
Add DataprocCreateWorkflowTemplateOperator (#13338)
Add OracleToGCS Transfer (#13246)
Add timeout option to gcs hook methods. (#13156)
Add regional support to dataproc workflow template operators (#12907)
Add project_id to client inside BigQuery hook update_table method (#13018)
Bug fixes¶
Fix four bugs in StackdriverTaskHandler (#13784)
Decode Remote Google Logs (#13115)
Fix and improve GCP BigTable hook and system test (#13896)
updated Google DV360 Hook to fix SDF issue (#13703)
Fix insert_all method of BigQueryHook to support tables without schema (#13138)
Fix Google BigQueryHook method get_schema() (#13136)
Fix Data Catalog operators (#13096)
1.0.0¶
Initial version of the provider.