Integration

Azure: Microsoft Azure

Airflow has limited support for Microsoft Azure. Hooks, sensors and operators exist for Azure Blob Storage, Azure File Share, Azure CosmosDB, Azure Data Lake and Azure Container Instances, and all of them live in the contrib section.

Azure Blob Storage

All classes communicate via the Windows Azure Storage Blob (wasb) protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).
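As an illustrative sketch (the account name and key below are placeholders, not real credentials), such a connection can be supplied through Airflow's AIRFLOW_CONN_<CONN_ID> environment-variable mechanism as a connection URI:

```python
import os
from urllib.parse import quote

# Placeholder credentials -- substitute your storage account name and access key.
account_name = "mystorageaccount"
account_key = "c2VjcmV0a2V5=="

# Airflow turns AIRFLOW_CONN_<CONN_ID> environment variables into connections.
# Login and password sit in the URI's userinfo part, so the key must be URL-quoted.
os.environ["AIRFLOW_CONN_WASB_DEFAULT"] = "wasb://{}:{}@".format(
    account_name, quote(account_key, safe="")
)
print(os.environ["AIRFLOW_CONN_WASB_DEFAULT"])
# → wasb://mystorageaccount:c2VjcmV0a2V5%3D%3D@
```

Alternatively, the same fields can be entered on the wasb_default connection in the Airflow UI.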

airflow.contrib.hooks.wasb_hook.WasbHook

Interface with Azure Blob Storage.

airflow.contrib.sensors.wasb_sensor.WasbBlobSensor

Checks if a blob is present on Azure Blob Storage.

airflow.contrib.operators.wasb_delete_blob_operator.WasbDeleteBlobOperator

Deletes blob(s) on Azure Blob Storage.

airflow.contrib.sensors.wasb_sensor.WasbPrefixSensor

Checks if blobs matching a prefix are present on Azure Blob Storage.

airflow.contrib.operators.file_to_wasb.FileToWasbOperator

Uploads a local file to a container as a blob.

Azure File Share

Cloud variant of an SMB file share. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=Storage account key), or a login and SAS token in the extra field (see connection wasb_default for an example).

airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook

Interface with Azure File Share.

Logging

Airflow can be configured to read and write task logs in Azure Blob Storage. See Writing Logs to Azure Blob Storage.

Azure CosmosDB

AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a login (=Endpoint URI), password (=secret key), and the extra fields database_name and collection_name to specify the default database and collection to use (see connection azure_cosmos_default for an example).
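As a sketch, the pieces of such a connection can be assembled like this; the extra field is a JSON document, and every value below is an illustrative placeholder rather than a working credential:

```python
import json

# Illustrative fields for an azure_cosmos connection; all values are placeholders.
conn = {
    "conn_type": "azure_cosmos",
    "login": "https://myaccount.documents.azure.com:443/",  # endpoint URI
    "password": "c2VjcmV0LWtleQ==",                         # secret key
    # Default database and collection travel in the JSON extra field.
    "extra": json.dumps({"database_name": "mydatabase",
                         "collection_name": "mycollection"}),
}
print(json.loads(conn["extra"])["database_name"])  # → mydatabase
```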

airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook

Interface with Azure CosmosDB.

airflow.contrib.operators.azure_cosmos_operator.AzureCosmosInsertDocumentOperator

Simple operator to insert document into CosmosDB.

airflow.contrib.sensors.azure_cosmos_sensor.AzureCosmosDocumentSensor

Simple sensor to detect document existence in CosmosDB.

Azure Data Lake

AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a login (=Client ID), password (=Client Secret), and the extra fields tenant (Tenant) and account_name (Account Name) (see connection azure_data_lake_default for an example).
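A sketch of the corresponding extra field (the tenant ID and account name are placeholders): the service-principal credentials go into login (client ID) and password (client secret), while tenant and account_name travel in the JSON extra field.

```python
import json

# Illustrative extra field for an azure_data_lake connection; values are placeholders.
extra = json.dumps({
    "tenant": "00000000-0000-0000-0000-000000000000",
    "account_name": "myadlsaccount",
})
print(json.loads(extra)["account_name"])  # → myadlsaccount
```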

airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook

Interface with Azure Data Lake.

airflow.contrib.operators.adls_list_operator.AzureDataLakeStorageListOperator

Lists the files located in a specified Azure Data Lake path.

airflow.contrib.operators.adls_to_gcs.AdlsToGoogleCloudStorageOperator

Copies files from an Azure Data Lake path to a Google Cloud Storage bucket.

Azure Container Instances

Azure Container Instances provides a method to run a Docker container without having to manage infrastructure. The AzureContainerInstanceHook requires a service principal. The credentials for this principal can be defined in the extra field key_path, in an environment variable named AZURE_AUTH_LOCATION, or by providing a login/password and tenantId in the extras.
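The three credential options can be sketched as follows (the file paths and tenant ID are placeholders):

```python
import json
import os

# Option 1: point the connection's extra field at an Azure credentials file.
extra_key_path = json.dumps({"key_path": "/path/to/azure_credentials.json"})

# Option 2: let the hook discover the credentials file via the environment.
os.environ["AZURE_AUTH_LOCATION"] = "/path/to/azure_credentials.json"

# Option 3: supply login/password on the connection plus the tenant in extras.
extra_tenant = json.dumps({"tenantId": "00000000-0000-0000-0000-000000000000"})
```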

The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.

airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook

Interface with Azure Container Volumes.

airflow.contrib.operators.azure_container_instances_operator.AzureContainerInstancesOperator

Starts and monitors a new Azure Container Instance (ACI).

airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook

Wrapper around a single ACI.

airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook

Interface with Azure Container Registry (ACR).

AWS: Amazon Web Services

Airflow has extensive support for Amazon Web Services, but note that most hooks, sensors and operators are in the contrib section.

AWS S3

airflow.hooks.S3_hook.S3Hook

Interface with AWS S3.

airflow.operators.s3_file_transform_operator.S3FileTransformOperator

Copies data from a source S3 location to a temporary location on the local filesystem.

airflow.contrib.operators.s3_list_operator.S3ListOperator

Lists the files matching a key prefix from an S3 location.

airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator

Syncs an S3 location with a Google Cloud Storage bucket.

airflow.contrib.operators.s3_to_gcs_transfer_operator.S3ToGoogleCloudStorageTransferOperator

Syncs an S3 bucket with a Google Cloud Storage bucket using the GCP Storage Transfer Service.

airflow.operators.s3_to_hive_operator.S3ToHiveTransfer

Moves data from S3 to Hive. The operator downloads a file from S3 and stores it locally before loading it into a Hive table.

AWS Batch Service

airflow.contrib.operators.awsbatch_operator.AWSBatchOperator

Execute a task on AWS Batch Service.

AWS RedShift

airflow.contrib.sensors.aws_redshift_cluster_sensor.AwsRedshiftClusterSensor

Waits for a Redshift cluster to reach a specific status.

airflow.contrib.hooks.redshift_hook.RedshiftHook

Interact with AWS Redshift, using the boto3 library.

airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer

Executes an UNLOAD command to S3 as CSV, with or without headers.

airflow.operators.s3_to_redshift_operator.S3ToRedshiftTransfer

Executes a COPY command from S3 as CSV, with or without headers.

AWS Lambda

airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook

Interface with AWS Lambda.

AWS Kinesis

airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook

Interface with AWS Kinesis Firehose.

Databricks

Databricks has contributed an Airflow operator which enables submitting runs to the Databricks platform. Internally the operator talks to the api/2.0/jobs/runs/submit endpoint.
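For orientation, a minimal runs/submit request body looks roughly like this; the cluster spec and notebook path are illustrative placeholders, and the full schema is defined by the Databricks Jobs API:

```python
import json

# Sketch of the JSON body that ends up being posted to the
# api/2.0/jobs/runs/submit endpoint. All values are placeholders.
payload = {
    "run_name": "airflow-run",
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/someone@example.com/my-notebook"},
}
body = json.dumps(payload)
print(json.loads(body)["run_name"])  # → airflow-run
```

With DatabricksSubmitRunOperator, these same fields are typically passed as the operator's `json` parameter rather than posted by hand.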

airflow.contrib.operators.databricks_operator.DatabricksSubmitRunOperator

Submits a Spark job run to Databricks using the api/2.0/jobs/runs/submit API endpoint.

GCP: Google Cloud Platform

Airflow has extensive support for the Google Cloud Platform, but note that most hooks and operators are in the contrib section. This means they have beta status and may see breaking changes between minor releases.

See the GCP connection type documentation to configure connections to GCP.
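As a sketch, the service-account details usually land in the connection's extra field under prefixed keys; the key names below follow Airflow's convention, while the project ID and key-file path are placeholders:

```python
import json

# Illustrative extra field for a google_cloud_platform connection.
extra = json.dumps({
    "extra__google_cloud_platform__project": "my-gcp-project",
    "extra__google_cloud_platform__key_path": "/path/to/keyfile.json",
    "extra__google_cloud_platform__scope":
        "https://www.googleapis.com/auth/cloud-platform",
})
print(json.loads(extra)["extra__google_cloud_platform__project"])  # → my-gcp-project
```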

Logging

Airflow can be configured to read and write task logs in Google Cloud Storage. See Writing Logs to Google Cloud Storage.
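A minimal sketch of the relevant airflow.cfg settings (the bucket name is a placeholder; the linked guide covers the details):

```ini
[core]
# Ship task logs to the remote store instead of only keeping them locally.
remote_logging = True
# Placeholder bucket -- any gs:// path the connection below can write to.
remote_base_log_folder = gs://my-airflow-logs/logs
remote_log_conn_id = google_cloud_default
```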

GoogleCloudBaseHook

All GCP hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.

BigQuery

airflow.contrib.operators.bigquery_check_operator.BigQueryCheckOperator

Performs checks against a SQL query that will return a single row with different values.

airflow.contrib.operators.bigquery_check_operator.BigQueryIntervalCheckOperator

Checks that the values of metrics given as SQL expressions are within a certain tolerance of the ones from days_back before.

airflow.contrib.operators.bigquery_check_operator.BigQueryValueCheckOperator

Performs a simple value check using SQL code.

airflow.contrib.operators.bigquery_get_data.BigQueryGetDataOperator

Fetches the data from a BigQuery table and returns the data in a Python list.

airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyDatasetOperator

Creates an empty BigQuery dataset.

airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyTableOperator

Creates a new, empty table in the specified BigQuery dataset optionally with schema.

airflow.contrib.operators.bigquery_operator.BigQueryCreateExternalTableOperator

Creates a new, external table in the dataset with the data in Google Cloud Storage.

airflow.contrib.operators.bigquery_operator.BigQueryDeleteDatasetOperator

Deletes an existing BigQuery dataset.

airflow.contrib.operators.bigquery_operator.BigQueryOperator

Executes BigQuery SQL queries in a specific BigQuery database.

airflow.contrib.operators.bigquery_table_delete_operator.BigQueryTableDeleteOperator

Deletes an existing BigQuery table.

airflow.contrib.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator

Copy a BigQuery table to another BigQuery table.

airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator

Transfers a BigQuery table to a Google Cloud Storage bucket.

They also use airflow.contrib.hooks.bigquery_hook.BigQueryHook to communicate with Google Cloud Platform.

Cloud Spanner

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseDeleteOperator

Deletes an existing database from a Google Cloud Spanner instance, or returns success if the database is missing.

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseDeployOperator

Creates a new database in a Google Cloud Spanner instance, or returns success if the database already exists.

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseQueryOperator

Executes an arbitrary DML query (INSERT, UPDATE, DELETE).

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseUpdateOperator

Updates the structure of a Google Cloud Spanner database.

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDeleteOperator

Deletes a Google Cloud Spanner instance.

airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDeployOperator

Creates a new Google Cloud Spanner instance or, if an instance with the same name exists, updates it.

They also use airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook to communicate with Google Cloud Platform.

Cloud Bigtable

airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator

Updates the number of nodes in a Google Cloud Bigtable cluster.

airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator

Creates a Cloud Bigtable instance.

airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator

Deletes a Google Cloud Bigtable instance.

airflow.contrib.operators.gcp_bigtable_operator.BigtableTableCreateOperator

Creates a table in a Google Cloud Bigtable instance.

airflow.contrib.operators.gcp_bigtable_operator.BigtableTableDeleteOperator

Deletes a table in a Google Cloud Bigtable instance.

airflow.contrib.operators.gcp_bigtable_operator.BigtableTableWaitForReplicationSensor

Sensor that waits for a table to be fully replicated.

They also use airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook to communicate with Google Cloud Platform.

Compute Engine

airflow.contrib.operators.gcp_compute_operator.GceInstanceStartOperator

Starts an existing Google Compute Engine instance.

airflow.contrib.operators.gcp_compute_operator.GceInstanceStopOperator

Stops an existing Google Compute Engine instance.

airflow.contrib.operators.gcp_compute_operator.GceSetMachineTypeOperator

Changes the machine type for a stopped instance.

airflow.contrib.operators.gcp_compute_operator.GceInstanceTemplateCopyOperator

Copies the Instance Template, applying specified changes.

airflow.contrib.operators.gcp_compute_operator.GceInstanceGroupManagerUpdateTemplateOperator

Patches the Instance Group Manager, replacing the source Instance Template URL with the destination one.

The operators have the common base operator airflow.contrib.operators.gcp_compute_operator.GceBaseOperator.

They also use airflow.contrib.hooks.gcp_compute_hook.GceHook to communicate with Google Cloud Platform.

Cloud Functions

airflow.contrib.operators.gcp_function_operator.GcfFunctionDeployOperator

Deploys a Google Cloud Function to Google Cloud Platform.

airflow.contrib.operators.gcp_function_operator.GcfFunctionDeleteOperator

Deletes a Google Cloud Function from Google Cloud Platform.

They also use airflow.contrib.hooks.gcp_function_hook.GcfHook to communicate with Google Cloud Platform.

Cloud DataFlow

airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator

Launches Cloud Dataflow jobs written in Java.

airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator

Launches a templated Cloud Dataflow batch job.

airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator

Launches Cloud Dataflow jobs written in Python.

They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook to communicate with Google Cloud Platform.

Cloud DataProc

airflow.contrib.operators.dataproc_operator.DataprocClusterCreateOperator

Create a new cluster on Google Cloud Dataproc.

airflow.contrib.operators.dataproc_operator.DataprocClusterDeleteOperator

Delete a cluster on Google Cloud Dataproc.

airflow.contrib.operators.dataproc_operator.DataprocClusterScaleOperator

Scale up or down a cluster on Google Cloud Dataproc.

airflow.contrib.operators.dataproc_operator.DataProcHadoopOperator

Start a Hadoop Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataProcHiveOperator

Start a Hive query Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataProcPigOperator

Start a Pig query Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataProcPySparkOperator

Start a PySpark Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataProcSparkOperator

Start a Spark Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataProcSparkSqlOperator

Start a Spark SQL query Job on a Cloud DataProc cluster.

airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateInlineOperator

Instantiate a WorkflowTemplate Inline on Google Cloud Dataproc.

airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator

Instantiate a WorkflowTemplate on Google Cloud Dataproc.

Cloud Datastore

airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator

Export entities from Google Cloud Datastore to Cloud Storage.

airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator

Import entities from Cloud Storage to Google Cloud Datastore.

They also use airflow.contrib.hooks.datastore_hook.DatastoreHook to communicate with Google Cloud Platform.

Cloud ML Engine

airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator

Start a Cloud ML Engine batch prediction job.

airflow.contrib.operators.mlengine_operator.MLEngineModelOperator

Manages a Cloud ML Engine model.

airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator

Start a Cloud ML Engine training job.

airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator

Manages a Cloud ML Engine model version.

They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook to communicate with Google Cloud Platform.

Cloud Storage

airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator

Uploads a file to Google Cloud Storage.

airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageBucketCreateAclEntryOperator

Creates a new ACL entry on the specified bucket.

airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageObjectCreateAclEntryOperator

Creates a new ACL entry on the specified object.

airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator

Downloads a file from Google Cloud Storage.

airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator

Lists all objects in a bucket whose names match the given string prefix and delimiter.

airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator

Creates a new Cloud Storage bucket.

airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator

Loads files from Google Cloud Storage into BigQuery.

airflow.contrib.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator

Copies objects from a bucket to another, with renaming if requested.

airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator

Copies data from any MySQL database to Google Cloud Storage in JSON format.

They also use airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook to communicate with Google Cloud Platform.

Transfer Service

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobDeleteOperator

Deletes a transfer job.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobCreateOperator

Creates a transfer job.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobUpdateOperator

Updates a transfer job.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationCancelOperator

Cancels a transfer operation.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationGetOperator

Gets a transfer operation.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationPauseOperator

Pauses a transfer operation.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationResumeOperator

Resumes a transfer operation.

airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationsListOperator

Gets a list of transfer operations.

airflow.contrib.operators.gcp_transfer_operator.GoogleCloudStorageToGoogleCloudStorageTransferOperator

Copies objects from a Google Cloud Storage bucket to another bucket.

airflow.contrib.operators.gcp_transfer_operator.S3ToGoogleCloudStorageTransferOperator

Synchronizes an S3 bucket with a Google Cloud Storage bucket.

airflow.contrib.sensors.gcp_transfer_operator.GCPTransferServiceWaitForJobStatusSensor

Waits for at least one operation belonging to the job to have the expected status.

They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook to communicate with Google Cloud Platform.

Cloud Vision

Cloud Vision Product Search Operators

airflow.contrib.operators.gcp_vision_operator.CloudVisionAddProductToProductSetOperator

Adds a Product to the specified ProductSet.

airflow.contrib.operators.gcp_vision_operator.CloudVisionAnnotateImageOperator

Run image detection and annotation for an image.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductCreateOperator

Creates a new Product resource.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductDeleteOperator

Permanently deletes a product and its reference images.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductGetOperator

Gets information associated with a Product.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetCreateOperator

Creates a new ProductSet resource.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetDeleteOperator

Permanently deletes a ProductSet.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetGetOperator

Gets information associated with a ProductSet.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetUpdateOperator

Makes changes to a ProductSet resource.

airflow.contrib.operators.gcp_vision_operator.CloudVisionProductUpdateOperator

Makes changes to a Product resource.

airflow.contrib.operators.gcp_vision_operator.CloudVisionReferenceImageCreateOperator

Creates a new ReferenceImage resource.

airflow.contrib.operators.gcp_vision_operator.CloudVisionRemoveProductFromProductSetOperator

Removes a Product from the specified ProductSet.

They also use airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook to communicate with Google Cloud Platform.

Cloud Translate

Cloud Translate Text Operators

airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator

Translate a string or list of strings.

Google Kubernetes Engine

airflow.contrib.operators.gcp_container_operator.GKEClusterCreateOperator

Creates a Kubernetes cluster in Google Cloud Platform.

airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator

Deletes a Kubernetes cluster in Google Cloud Platform.

airflow.contrib.operators.gcp_container_operator.GKEPodOperator

Executes a task in a Kubernetes pod in the specified Google Kubernetes Engine cluster.

They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook to communicate with Google Cloud Platform.

Google Natural Language

airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeEntities

Finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.

airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeEntitySentiment

Finds entities in the text, similar to AnalyzeEntities, and analyzes the sentiment associated with each entity and its mentions.

airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeSentiment

Analyzes the sentiment of the provided text.

airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageClassifyTextOperator

Classifies a document into categories.

They also use airflow.contrib.hooks.gcp_natural_language_hook.CloudNaturalLanguageHook to communicate with Google Cloud Platform.

Qubole

Apache Airflow has a native operator and hooks to talk to Qubole, which lets you submit your big data jobs directly to Qubole from Apache Airflow.
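As a hedged sketch, the Qubole integration ultimately submits commands to the QDS REST API; a Hive query submission body looks roughly like this (the field names are assumed from the QDS command API, and the query is a placeholder):

```python
import json

# Illustrative QDS command-submission body; field names are assumptions
# about the QDS command API, not taken verbatim from the Airflow hook.
payload = {
    "command_type": "HiveCommand",
    "query": "SHOW TABLES;",
}
body = json.dumps(payload)
print(json.loads(body)["command_type"])  # → HiveCommand
```

With QuboleOperator, the same information is passed as operator arguments (e.g. a command type and query) rather than posted by hand.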

airflow.contrib.operators.qubole_operator.QuboleOperator

Execute tasks (commands) on QDS (https://qubole.com).

airflow.contrib.sensors.qubole_sensor.QubolePartitionSensor

Waits for a Hive partition to show up in QHS (Qubole Hive Service) and checks for its presence via QDS APIs.

airflow.contrib.sensors.qubole_sensor.QuboleFileSensor

Waits for a file or folder to be present in cloud storage and checks for its presence via QDS APIs.

airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator

Performs checks against Qubole Commands. QuboleCheckOperator expects a command that will be executed on QDS.

airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator

Performs a simple value check using a Qubole command. By default, each value on the first row of this Qubole command is compared with a pre-defined value.