Integration

Azure: Microsoft Azure

Airflow has limited support for Microsoft Azure. The Hooks, Sensors and Operators for Azure Blob Storage, Azure Data Lake and the other Azure services described below are in the contrib section.

Logging

Airflow can be configured to read and write task logs in Azure Blob Storage. See Writing Logs to Azure Blob Storage.
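For orientation, a minimal airflow.cfg sketch of what that configuration looks like. The base-folder value here is an assumption (the Azure handler is selected by a remote base log folder that starts with wasb); follow Writing Logs to Azure Blob Storage for the authoritative steps.

    [core]
    # Turn on remote logging and point it at the wasb connection.
    remote_logging = True
    remote_log_conn_id = wasb_default
    # A base folder starting with "wasb" selects the Azure Blob Storage
    # log handler; the exact value is deployment-specific.
    remote_base_log_folder = wasb-my-logs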

Azure Blob Storage

All classes communicate via the Windows Azure Storage Blob protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).

The operators are defined in the following module:

They use airflow.contrib.hooks.wasb_hook.WasbHook to communicate with Microsoft Azure.
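As an illustration, a minimal sketch of driving the hook from a Python callable; the container and blob names are placeholders.

    from airflow.contrib.hooks.wasb_hook import WasbHook

    def upload_report(**context):
        # Uses the credentials stored in the wasb_default connection.
        hook = WasbHook(wasb_conn_id='wasb_default')
        hook.load_string('hello from Airflow',
                         container_name='my-container',
                         blob_name='reports/hello.txt')
        # check_for_blob returns True once the blob exists.
        return hook.check_for_blob('my-container', 'reports/hello.txt')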

Azure File Share

Cloud variant of an SMB file share. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).

It uses airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook to communicate with Microsoft Azure.
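A minimal usage sketch, assuming the wasb_default connection described above and the hook's load_file(file_path, share_name, directory_name, file_name) upload method, which mirrors the underlying azure-storage-file call; all names are placeholders.

    from airflow.contrib.hooks.azure_fileshare_hook import AzureFileShareHook

    def upload_to_share():
        # The file-share hook reads its credentials from a wasb-type connection.
        hook = AzureFileShareHook(wasb_conn_id='wasb_default')
        # Local path first, then share, directory and file name on the share.
        hook.load_file('/tmp/report.csv', 'my-share', 'reports', 'report.csv')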

Azure CosmosDB

AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a login (=Endpoint URI), password (=secret key) and extra fields database_name and collection_name to specify the default database and collection to use (see connection azure_cosmos_default for an example).
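Concretely, such a connection might look as follows; all values are placeholders.

    Conn Id:    azure_cosmos_default
    Conn Type:  azure_cosmos
    Login:      https://<account>.documents.azure.com:443/
    Password:   <master key>
    Extra:      {"database_name": "<default database>",
                 "collection_name": "<default collection>"}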

The operators are defined in the following modules:

They also use airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook to communicate with Microsoft Azure.

Azure Data Lake

AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a login (=Client ID), password (=Client Secret) and extra fields tenant (Tenant) and account_name (Account Name) (see connection azure_data_lake_default for an example).

The operators are defined in the following modules:

They also use airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook to communicate with Microsoft Azure.
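A minimal usage sketch, assuming the azure_data_lake_default connection and the hook's upload_file/check_for_file methods; the paths are placeholders.

    from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook

    def push_to_adls():
        hook = AzureDataLakeHook(
            azure_data_lake_conn_id='azure_data_lake_default')
        # Upload a local file into the lake store.
        hook.upload_file(local_path='/tmp/data.csv',
                         remote_path='landing/data.csv')
        # Confirm the file arrived.
        return hook.check_for_file('landing/data.csv')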

Azure Container Instances

Azure Container Instances provides a method to run a Docker container without having to worry about managing infrastructure. The AzureContainerInstanceHook requires a service principal. The credentials for this principal can be defined in the extra field key_path, in an environment variable named AZURE_AUTH_LOCATION, or by providing a login/password and tenantId in extras.
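For example, the two file-based variants might be wired up like this; the paths are placeholders.

    # Option 1: reference the credential file from the connection's extra field.
    Extra: {"key_path": "/path/to/azure_credentials.json"}

    # Option 2: expose the same file through the environment instead.
    export AZURE_AUTH_LOCATION=/path/to/azure_credentials.json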

The operator is defined in the airflow.contrib.operators.azure_container_instances_operator module.

The operator also uses airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook, airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook and airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook to communicate with Microsoft Azure.

The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.

AWS: Amazon Web Services

Airflow has extensive support for Amazon Web Services, but note that the Hooks, Sensors and Operators are in the contrib section.

Logging

Airflow can be configured to read and write task logs in Amazon Simple Storage Service (Amazon S3). See Writing Logs to Amazon S3.
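For orientation, a minimal airflow.cfg sketch, assuming an aws_default connection and a placeholder bucket; see Writing Logs to Amazon S3 for the full steps.

    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-bucket/airflow/logs
    remote_log_conn_id = aws_default
    # Optional: server-side encryption for the written log objects.
    encrypt_s3_logs = False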

AWS S3

The operators are defined in the following modules:

They also use airflow.hooks.S3_hook.S3Hook to communicate with Amazon Web Services.
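A minimal sketch of calling the hook directly; the bucket and key are placeholders.

    from airflow.hooks.S3_hook import S3Hook

    def archive_results(**context):
        hook = S3Hook(aws_conn_id='aws_default')
        # Write a small object to the bucket.
        hook.load_string('job finished',
                         key='results/2019-01-01/status.txt',
                         bucket_name='my-bucket',
                         replace=True)
        # check_for_key returns True once the object exists.
        return hook.check_for_key('results/2019-01-01/status.txt',
                                  bucket_name='my-bucket')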

AWS Batch Service

The AWSBatchOperator is defined in the airflow.contrib.operators.awsbatch_operator module.

AWS RedShift

The operators are defined in the following modules:

They also use airflow.contrib.hooks.redshift_hook.RedshiftHook to communicate with Amazon Web Services.

AWS DynamoDB

The operator is defined in the airflow.contrib.operators.hive_to_dynamodb module.

It uses airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook to communicate with Amazon Web Services.
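A minimal sketch of the hook, with the table name, key list and region as placeholders.

    from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

    def write_rows():
        # table_keys lists the table's key attributes.
        hook = AwsDynamoDBHook(table_keys=['id'],
                               table_name='my_table',
                               region_name='us-east-1')
        # Batch-write a list of items to the table.
        hook.write_batch_data([{'id': '1', 'value': 'a'},
                               {'id': '2', 'value': 'b'}])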

AWS Lambda

It uses airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook to communicate with Amazon Web Services.
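A minimal sketch, assuming the hook's boto3-style invoke_lambda call; the function name and region are placeholders.

    import json

    from airflow.contrib.hooks.aws_lambda_hook import AwsLambdaHook

    def call_function():
        hook = AwsLambdaHook(function_name='my-function',
                             region_name='us-east-1',
                             invocation_type='RequestResponse')
        # The payload is the JSON event passed to the Lambda function.
        response = hook.invoke_lambda(payload=json.dumps({'action': 'ping'}))
        return response['StatusCode']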

AWS Kinesis

It uses airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook to communicate with Amazon Web Services.
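A minimal sketch; the records follow boto3's put_record_batch format, and the stream name is a placeholder.

    from airflow.contrib.hooks.aws_firehose_hook import AwsFirehoseHook

    def stream_records():
        hook = AwsFirehoseHook(delivery_stream='my-stream',
                               region_name='us-east-1')
        # Each record is a dict with a bytes 'Data' payload.
        hook.put_records([{'Data': b'{"event": "click"}\n'}])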

Databricks

With contributions from Databricks, Airflow includes several operators for submitting and running jobs on the Databricks platform. Internally the operators talk to the api/2.0/jobs/runs/submit endpoint.

The operators are defined in the airflow.contrib.operators.databricks_operator module.
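For example, a one-off notebook run can be expressed like this inside a DAG definition; the cluster spec and notebook path are placeholders, following the pattern in the Airflow examples.

    from airflow.contrib.operators.databricks_operator import (
        DatabricksSubmitRunOperator,
    )

    # Assumed to live inside a `with DAG(...)` block or be given dag=dag.
    notebook_run = DatabricksSubmitRunOperator(
        task_id='notebook_run',
        json={
            'new_cluster': {
                'spark_version': '2.1.0-db3-scala2.11',
                'num_workers': 2,
            },
            'notebook_task': {
                'notebook_path': '/Users/someone@example.com/my-notebook',
            },
        },
    )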

GCP: Google Cloud Platform

Airflow has extensive support for the Google Cloud Platform, but note that most Hooks and Operators are in the contrib section. This means they have beta status and can have breaking changes between minor releases.

See the GCP connection type documentation to configure connections to GCP.
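As a quick illustration, a service-account key can be wired in through the connection's extra fields; a sketch with placeholder values (the extra__google_cloud_platform__ prefix matches the connection form fields; the GCP connection docs are authoritative).

    Conn Id:    google_cloud_default
    Conn Type:  google_cloud_platform
    Extra:      {"extra__google_cloud_platform__project": "my-project",
                 "extra__google_cloud_platform__key_path": "/path/to/key.json"}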

Logging

Airflow can be configured to read and write task logs in Google Cloud Storage. See Writing Logs to Google Cloud Storage.
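As with the other backends, a minimal airflow.cfg sketch with placeholder values; see Writing Logs to Google Cloud Storage for the full steps.

    [core]
    remote_logging = True
    remote_base_log_folder = gs://my-bucket/airflow/logs
    remote_log_conn_id = google_cloud_default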

GoogleCloudBaseHook

All hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.

Cloud Spanner

The operators are defined in the airflow.contrib.operators.gcp_spanner_operator package.

They also use airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook to communicate with Google Cloud Platform.

Cloud SQL

The operators are defined in the airflow.contrib.operators.gcp_sql_operator package.

They also use airflow.contrib.hooks.gcp_sql_hook.CloudSqlDatabaseHook and airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook to communicate with Google Cloud Platform.

Cloud Bigtable

The operators are defined in the airflow.contrib.operators.gcp_bigtable_operator package.

They also use airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook to communicate with Google Cloud Platform.

Cloud Build

The operator is defined in the airflow.contrib.operators.gcp_cloud_build_operator package.

It also uses airflow.contrib.hooks.gcp_cloud_build_hook.CloudBuildHook to communicate with Google Cloud Platform.

Compute Engine

The operators are defined in the airflow.contrib.operators.gcp_compute_operator package.

They also use airflow.contrib.hooks.gcp_compute_hook.GceHook to communicate with Google Cloud Platform.

Cloud Functions

The operators are defined in the airflow.contrib.operators.gcp_function_operator package.

They also use airflow.contrib.hooks.gcp_function_hook.GcfHook to communicate with Google Cloud Platform.

Cloud DataFlow

The operators are defined in the airflow.contrib.operators.dataflow_operator package.

They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook to communicate with Google Cloud Platform.
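A minimal sketch of launching a Python pipeline with DataFlowPythonOperator from that package; the paths and project are placeholders.

    from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

    # Assumed to live inside a DAG definition.
    start_pipeline = DataFlowPythonOperator(
        task_id='start_pipeline',
        py_file='gs://my-bucket/pipelines/wordcount.py',
        # Passed through to the Dataflow runner as pipeline options.
        options={'project': 'my-project',
                 'temp_location': 'gs://my-bucket/tmp'},
    )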

Cloud DataProc

The operators are defined in the airflow.contrib.operators.dataproc_operator package.
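A minimal sketch of creating a short-lived cluster with DataprocClusterCreateOperator from that package; all identifiers are placeholders.

    from airflow.contrib.operators.dataproc_operator import (
        DataprocClusterCreateOperator,
    )

    # Assumed to live inside a DAG definition.
    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        project_id='my-project',
        cluster_name='airflow-cluster',
        num_workers=2,
        zone='europe-west1-b',
    )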

Cloud Datastore

airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator

Export entities from Google Cloud Datastore to Cloud Storage.

airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator

Import entities from Cloud Storage to Google Cloud Datastore.

They also use airflow.contrib.hooks.datastore_hook.DatastoreHook to communicate with Google Cloud Platform.

Cloud ML Engine

airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator

Start a Cloud ML Engine batch prediction job.

airflow.contrib.operators.mlengine_operator.MLEngineModelOperator

Manages a Cloud ML Engine model.

airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator

Start a Cloud ML Engine training job.

airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator

Manages a Cloud ML Engine model version.

The operators are defined in the airflow.contrib.operators.mlengine_operator package.

They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook to communicate with Google Cloud Platform.

Transfer Service

The operators are defined in the following module:

They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook to communicate with Google Cloud Platform.

Cloud Vision

The operators are defined in the airflow.contrib.operators.gcp_vision_operator package.

They also use airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook to communicate with Google Cloud Platform.

Cloud Text to Speech

The operator is defined in the airflow.contrib.operators.gcp_text_to_speech_operator package.

It also uses airflow.contrib.hooks.gcp_text_to_speech_hook.GCPTextToSpeechHook to communicate with Google Cloud Platform.

Cloud Speech to Text

The operator is defined in the airflow.contrib.operators.gcp_speech_to_text_operator package.

It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook to communicate with Google Cloud Platform.

Cloud Speech Translate Operators

The operator is defined in the airflow.contrib.operators.gcp_translate_speech_operator package.

It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook and airflow.contrib.hooks.gcp_translate_hook.CloudTranslateHook to communicate with Google Cloud Platform.

Cloud Translate

Cloud Translate Text Operators

airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator

Translate a string or list of strings.

The operator is defined in the airflow.contrib.operators.gcp_translate_operator package.
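A minimal usage sketch; the strings and language codes are placeholders, and the arguments mirror the underlying Translate client.

    from airflow.contrib.operators.gcp_translate_operator import (
        CloudTranslateTextOperator,
    )

    # Assumed to live inside a DAG definition.
    translate = CloudTranslateTextOperator(
        task_id='translate',
        values=['Hello', 'How are you?'],
        target_language='de',
        format_='text',
        source_language=None,
        model='base',
    )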

Cloud Video Intelligence

The operators are defined in the airflow.contrib.operators.gcp_video_intelligence_operator package.

They also use airflow.contrib.hooks.gcp_video_intelligence_hook.CloudVideoIntelligenceHook to communicate with Google Cloud Platform.

Google Kubernetes Engine

The operators are defined in the airflow.contrib.operators.gcp_container_operator package.

They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook to communicate with Google Cloud Platform.
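One of the operators in that package, GKEPodOperator, runs a pod on an existing cluster; a minimal sketch with placeholder identifiers.

    from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

    # Assumed to live inside a DAG definition.
    run_pod = GKEPodOperator(
        task_id='run_pod',
        project_id='my-project',
        location='europe-west1-b',
        cluster_name='my-cluster',
        name='hello-pod',
        namespace='default',
        image='ubuntu:16.04',
    )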

Google Natural Language

The operators are defined in the airflow.contrib.operators.gcp_natural_language_operator package.

They also use airflow.contrib.hooks.gcp_natural_language_hook.CloudNaturalLanguageHook to communicate with Google Cloud Platform.

Google Cloud Data Loss Prevention (DLP)

The operators are defined in the airflow.contrib.operators.gcp_dlp_operator package.

They also use airflow.contrib.hooks.gcp_dlp_hook.CloudDLPHook to communicate with Google Cloud Platform.

Google Cloud Tasks

The operators are defined in the airflow.contrib.operators.gcp_tasks_operator package.

They also use airflow.contrib.hooks.gcp_tasks_hook.CloudTasksHook to communicate with Google Cloud Platform.

Qubole

Apache Airflow has a native operator and hooks to talk to Qubole, which let you submit your big data jobs directly to Qubole from Airflow.
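For example, a Hive command can be submitted like this inside a DAG, following the pattern of the Qubole example DAG shipped with Airflow; the query and cluster label are placeholders.

    from airflow.contrib.operators.qubole_operator import QuboleOperator

    # Assumed to live inside a DAG definition.
    hive_query = QuboleOperator(
        task_id='hive_show_tables',
        command_type='hivecmd',
        query='show tables',
        cluster_label='default',
    )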

The operators are defined in the following module:
