Integration¶
Azure: Microsoft Azure¶
Airflow has limited support for Microsoft Azure: interfaces exist only for Azure Blob Storage and Azure Data Lake. The Hook, Sensor and Operators for Blob Storage, as well as the Azure Data Lake Hook, are in the contrib section.
Logging¶
Airflow can be configured to read and write task logs in Azure Blob Storage. See Writing Logs to Azure Blob Storage.
Azure Blob Storage¶
All classes communicate via the Windows Azure Storage Blob protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (=Storage account name) and password (=KEY), or a login and SAS token in the extra field (see connection wasb_default for an example).
The operators are defined in the following module:
They use airflow.contrib.hooks.wasb_hook.WasbHook to communicate with Microsoft Azure.
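As a minimal sketch of direct hook usage (the container and blob names below are illustrative placeholders, and a wasb_default connection is assumed to exist):

```python
# Upload a string as a blob and check that it arrived, using WasbHook.
from airflow.contrib.hooks.wasb_hook import WasbHook

hook = WasbHook(wasb_conn_id="wasb_default")
hook.load_string("hello", container_name="my-container", blob_name="greeting.txt")
if hook.check_for_blob(container_name="my-container", blob_name="greeting.txt"):
    print("blob exists")
```

The same connection id is used by the Blob Storage operators and sensor.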
Azure CosmosDB¶
AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a login (=Endpoint URI), password (=secret key) and the extra fields database_name and collection_name to specify the default database and collection to use (see connection azure_cosmos_default for an example).
The operators are defined in the following modules:
They also use airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook to communicate with Microsoft Azure.
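A minimal sketch of hook usage, assuming an azure_cosmos_default connection whose extras name the default database and collection (the document contents are illustrative):

```python
# Upsert a document into the default database/collection via AzureCosmosDBHook.
from airflow.contrib.hooks.azure_cosmos_hook import AzureCosmosDBHook

hook = AzureCosmosDBHook(azure_cosmos_conn_id="azure_cosmos_default")
hook.upsert_document({"id": "airflow-demo", "state": "running"})
```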
Azure Data Lake¶
AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a login (=Client ID), password (=Client Secret) and the extra fields tenant (Tenant) and account_name (Account Name) (see connection azure_data_lake_default for an example).
The operators are defined in the following modules:
They also use airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook to communicate with Microsoft Azure.
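A minimal sketch of hook usage, assuming an azure_data_lake_default connection is configured (the local and remote paths are illustrative placeholders):

```python
# Upload a local file to Azure Data Lake Store and verify it exists.
from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook

hook = AzureDataLakeHook(azure_data_lake_conn_id="azure_data_lake_default")
hook.upload_file(local_path="/tmp/report.csv", remote_path="/landing/report.csv")
print(hook.check_for_file("/landing/report.csv"))
```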
Azure Container Instances¶
Azure Container Instances provides a method to run a Docker container without having to worry about managing infrastructure. The AzureContainerInstanceHook requires a service principal. The credentials for this principal can either be defined in the extra field key_path, as an environment variable named AZURE_AUTH_LOCATION, or by providing a login/password and tenantId in extras.
The operator is defined in the airflow.contrib.operators.azure_container_instances_operator module. It also uses airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook, airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook and airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook to communicate with Microsoft Azure. The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.
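A sketch of the operator in a DAG (the resource group, region, image and connection ids are illustrative placeholders, not prescribed values):

```python
# Run a short-lived container on Azure Container Instances.
from airflow.contrib.operators.azure_container_instances_operator import (
    AzureContainerInstancesOperator,
)

run_container = AzureContainerInstancesOperator(
    task_id="run_container",
    ci_conn_id="azure_default",      # connection holding the service principal
    registry_conn_id=None,           # set to a connection id for a private registry
    resource_group="my-resource-group",
    name="airflow-ci-demo",
    image="hello-world:latest",
    region="WestEurope",
    cpu=1.0,
    memory_in_gb=1.5,
)
```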
AWS: Amazon Web Services¶
Airflow has extensive support for Amazon Web Services, but note that the Hooks, Sensors and Operators are in the contrib section.
Logging¶
Airflow can be configured to read and write task logs in Amazon Simple Storage Service (Amazon S3). See Writing Logs to Amazon S3.
AWS EMR¶
The operators are defined in the following modules:
They also use airflow.contrib.hooks.emr_hook.EmrHook to communicate with Amazon Web Services.
AWS S3¶
The operators are defined in the following module:
airflow.contrib.operators.s3_to_gcs_transfer_operator
They also use airflow.hooks.S3_hook.S3Hook to communicate with Amazon Web Services.
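A minimal sketch of direct hook usage, assuming an aws_default connection (the bucket and key are illustrative placeholders):

```python
# Write a string to S3 and read it back with S3Hook.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")
hook.load_string("hello", key="demo/greeting.txt", bucket_name="my-bucket")
print(hook.read_key(key="demo/greeting.txt", bucket_name="my-bucket"))
```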
AWS Batch Service¶
The AWSBatchOperator is defined in the airflow.contrib.operators.awsbatch_operator module.
AWS RedShift¶
The operators are defined in the following modules:
They also use airflow.contrib.hooks.redshift_hook.RedshiftHook to communicate with Amazon Web Services.
AWS DynamoDB¶
The operator is defined in the airflow.contrib.operators.hive_to_dynamodb module. It uses airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook to communicate with Amazon Web Services.
AWS Lambda¶
It uses airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook to communicate with Amazon Web Services.
AWS Kinesis¶
It uses airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook to communicate with Amazon Web Services.
Amazon SageMaker¶
For more instructions on using Amazon SageMaker in Airflow, please see the SageMaker Python SDK README.
The operators are defined in the following modules:
airflow.contrib.operators.sagemaker_training_operator
airflow.contrib.operators.sagemaker_tuning_operator
airflow.contrib.operators.sagemaker_model_operator
airflow.contrib.operators.sagemaker_transform_operator
airflow.contrib.operators.sagemaker_endpoint_config_operator
airflow.contrib.operators.sagemaker_endpoint_operator
They use airflow.contrib.hooks.sagemaker_hook.SageMakerHook to communicate with Amazon Web Services.
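The SageMaker operators take a config dictionary that mirrors the corresponding SageMaker API request. As a sketch, a training config as it might be passed to SageMakerTrainingOperator via its config argument (all names, ARNs, images and S3 paths below are illustrative placeholders):

```python
# A CreateTrainingJob-shaped config for SageMakerTrainingOperator.
training_config = {
    "TrainingJobName": "demo-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:1",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/train/",
                }
            },
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",
        "VolumeSizeInGB": 30,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# In a DAG file this would then be used roughly as:
# train = SageMakerTrainingOperator(task_id="train", config=training_config,
#                                   wait_for_completion=True)
```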
Databricks¶
With contributions from Databricks, Airflow has several operators which enable submitting and running jobs against the Databricks platform. Internally the operators talk to the api/2.0/jobs/runs/submit endpoint.
The operators are defined in the airflow.contrib.operators.databricks_operator module.
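As a sketch, the kind of payload DatabricksSubmitRunOperator forwards to the runs/submit endpoint (the cluster spec and notebook path are illustrative placeholders):

```python
# A runs/submit-shaped payload: an ephemeral cluster plus a notebook task.
submit_run_json = {
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/someone@example.com/demo"},
}

# In a DAG file this would then be used roughly as:
# submit = DatabricksSubmitRunOperator(task_id="submit_run", json=submit_run_json)
```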
GCP: Google Cloud Platform¶
Airflow has extensive support for the Google Cloud Platform, but note that most Hooks and Operators are in the contrib section. This means they have beta status and can have breaking changes between minor releases.
See the GCP connection type documentation to configure connections to GCP.
Logging¶
Airflow can be configured to read and write task logs in Google Cloud Storage. See Writing Logs to Google Cloud Storage.
GoogleCloudBaseHook¶
All hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.
BigQuery¶
The operators are defined in the following module:
They also use airflow.contrib.hooks.bigquery_hook.BigQueryHook to communicate with Google Cloud Platform.
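A sketch of a query task in a DAG, assuming a configured GCP connection (the project, dataset and table names are illustrative placeholders):

```python
# Run a standard-SQL query and write the result to a destination table.
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

aggregate_events = BigQueryOperator(
    task_id="aggregate_events",
    sql="SELECT user_id, COUNT(*) AS n FROM `my_project.logs.events` GROUP BY user_id",
    destination_dataset_table="my_project.analytics.user_counts",
    write_disposition="WRITE_TRUNCATE",
    use_legacy_sql=False,
)
```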
Cloud Spanner¶
The operator is defined in the airflow.contrib.operators.gcp_spanner_operator module. It also uses airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook to communicate with Google Cloud Platform.
Cloud SQL¶
The operator is defined in the airflow.contrib.operators.gcp_sql_operator module. It also uses airflow.contrib.hooks.gcp_sql_hook.CloudSqlDatabaseHook and airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook to communicate with Google Cloud Platform.
Cloud Bigtable¶
The operator is defined in the airflow.contrib.operators.gcp_bigtable_operator module. It also uses airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook to communicate with Google Cloud Platform.
Cloud Build¶
The operator is defined in the airflow.contrib.operators.gcp_cloud_build_operator module. It also uses airflow.contrib.hooks.gcp_cloud_build_hook.CloudBuildHook to communicate with Google Cloud Platform.
Compute Engine¶
The operators are defined in the airflow.contrib.operators.gcp_compute_operator module. They also use airflow.contrib.hooks.gcp_compute_hook.GceHook to communicate with Google Cloud Platform.
Cloud Functions¶
The operators are defined in the airflow.contrib.operators.gcp_function_operator module. They also use airflow.contrib.hooks.gcp_function_hook.GcfHook to communicate with Google Cloud Platform.
Cloud DataFlow¶
The operators are defined in the airflow.contrib.operators.dataflow_operator module. They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook to communicate with Google Cloud Platform.
Cloud DataProc¶
The operators are defined in the airflow.contrib.operators.dataproc_operator module.
Cloud Datastore¶
airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator
Export entities from Google Cloud Datastore to Cloud Storage.
airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator
Import entities from Cloud Storage to Google Cloud Datastore.
They also use airflow.contrib.hooks.datastore_hook.DatastoreHook to communicate with Google Cloud Platform.
Cloud ML Engine¶
airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator
Start a Cloud ML Engine batch prediction job.
airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
Manages a Cloud ML Engine model.
airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator
Start a Cloud ML Engine training job.
airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
Manages a Cloud ML Engine model version.
The operators are defined in the airflow.contrib.operators.mlengine_operator module. They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook to communicate with Google Cloud Platform.
Cloud Storage¶
The operators are defined in the following module:
They also use airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook to communicate with Google Cloud Platform.
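A minimal sketch of direct hook usage, assuming a google_cloud_default connection (the bucket and object names are illustrative placeholders):

```python
# Upload a local file to GCS and verify the object exists.
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

hook = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
hook.upload(bucket="my-bucket", object="data/report.csv", filename="/tmp/report.csv")
print(hook.exists(bucket="my-bucket", object="data/report.csv"))
```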
Transfer Service¶
The operators are defined in the following module:
airflow.contrib.sensors.gcp_transfer_operator
They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook to communicate with Google Cloud Platform.
Cloud Vision¶
The operator is defined in the airflow.contrib.operators.gcp_vision_operator module. It also uses airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook to communicate with Google Cloud Platform.
Cloud Text to Speech¶
The operator is defined in the airflow.contrib.operators.gcp_text_to_speech_operator module. It also uses airflow.contrib.hooks.gcp_text_to_speech_hook.GCPTextToSpeechHook to communicate with Google Cloud Platform.
Cloud Speech to Text¶
The operator is defined in the airflow.contrib.operators.gcp_speech_to_text_operator module. It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook to communicate with Google Cloud Platform.
Cloud Speech Translate Operators¶
The operator is defined in the airflow.contrib.operators.gcp_translate_speech_operator module. It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook and airflow.contrib.hooks.gcp_translate_hook.CloudTranslateHook to communicate with Google Cloud Platform.
Cloud Translate¶
Cloud Translate Text Operators¶
airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator
Translate a string or list of strings.
The operator is defined in the airflow.contrib.operators.gcp_translate_operator module.
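A sketch of the operator in a DAG, assuming a configured GCP connection (the input string and language settings are illustrative):

```python
# Translate a list of strings to English using the base translation model.
from airflow.contrib.operators.gcp_translate_operator import CloudTranslateTextOperator

translate = CloudTranslateTextOperator(
    task_id="translate",
    values=["bonjour le monde"],
    target_language="en",
    format_="text",
    source_language=None,  # let the API detect the source language
    model="base",
)
```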
Cloud Video Intelligence¶
The operators are defined in the airflow.contrib.operators.gcp_video_intelligence_operator module. They also use airflow.contrib.hooks.gcp_video_intelligence_hook.CloudVideoIntelligenceHook to communicate with Google Cloud Platform.
Google Kubernetes Engine¶
The operators are defined in the airflow.contrib.operators.gcp_container_operator module. They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook to communicate with Google Cloud Platform.
Google Natural Language¶
The operators are defined in the airflow.contrib.operators.gcp_natural_language_operator module. They also use airflow.contrib.hooks.gcp_natural_language_hook.CloudNaturalLanguageHook to communicate with Google Cloud Platform.
Google Cloud Data Loss Prevention (DLP)¶
The operators are defined in the airflow.contrib.operators.gcp_dlp_operator module. They also use airflow.contrib.hooks.gcp_dlp_hook.CloudDLPHook to communicate with Google Cloud Platform.
Google Cloud Tasks¶
The operators are defined in the airflow.contrib.operators.gcp_tasks_operator module. They also use airflow.contrib.hooks.gcp_tasks_hook.CloudTasksHook to communicate with Google Cloud Platform.