Integration¶
Azure: Microsoft Azure¶
Airflow has limited support for Microsoft Azure: interfaces exist only for Azure Blob Storage and Azure Data Lake. The hook, sensor, and operator for Blob Storage and the hook for Azure Data Lake are in the contrib section.
Logging¶
Airflow can be configured to read and write task logs in Azure Blob Storage. See Writing Logs to Azure Blob Storage.
Azure Blob Storage¶
All classes communicate via the Windows Azure Storage Blob (WASB) protocol. Make sure that an
Airflow connection of type wasb exists. Authorization can be done by supplying a
login (=Storage account name) and password (=KEY), or login and SAS token in the extra
field (see connection wasb_default for an example).
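As a rough sketch of the two authorization variants described above, the connection fields could be assembled as follows. The helper name wasb_connection_fields and the placeholder account values are hypothetical, and the sas_token key in the extra field is an assumption modeled on the wasb_default connection; check it against your Airflow version.

```python
import json


def wasb_connection_fields(account_name, key=None, sas_token=None):
    """Return the fields of a hypothetical connection of type wasb.

    Exactly one of key or sas_token must be given.
    """
    if (key is None) == (sas_token is None):
        raise ValueError("supply exactly one of key or sas_token")
    fields = {"conn_type": "wasb", "login": account_name}
    if key is not None:
        # Variant 1: login = storage account name, password = key.
        fields["password"] = key
    else:
        # Variant 2: login plus a SAS token in the extra field
        # (the "sas_token" key name is an assumption).
        fields["extra"] = json.dumps({"sas_token": sas_token})
    return fields


key_conn = wasb_connection_fields("mystorageaccount", key="<KEY>")
sas_conn = wasb_connection_fields("mystorageaccount", sas_token="<SAS_TOKEN>")
```

The resulting values can be entered in the Airflow connection form or passed to whatever mechanism you use to create connections.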
The operators are defined in the following module:
They use airflow.contrib.hooks.wasb_hook.WasbHook to communicate with Microsoft Azure.
Azure CosmosDB¶
AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an
Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a
login (=Endpoint URI), password (=secret key) and extra fields database_name and collection_name to specify the
default database and collection to use (see connection azure_cosmos_default for an example).
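To illustrate how the default database and collection in the extra field might interact with explicit arguments, here is a minimal sketch. The function resolve_cosmos_target and the sample names are hypothetical, and the hook's actual lookup logic may differ.

```python
import json


def resolve_cosmos_target(extra_json, database_name=None, collection_name=None):
    """Pick the database and collection, falling back to connection defaults."""
    defaults = json.loads(extra_json) if extra_json else {}
    # Explicit arguments win; otherwise use the defaults from the extra field.
    database = database_name or defaults.get("database_name")
    collection = collection_name or defaults.get("collection_name")
    if database is None or collection is None:
        raise ValueError("database and collection must be set explicitly or in extra")
    return database, collection


# Hypothetical extra field of an azure_cosmos connection.
extra = json.dumps({"database_name": "analytics", "collection_name": "events"})
default_target = resolve_cosmos_target(extra)
override_target = resolve_cosmos_target(extra, collection_name="audit")
```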
The operators are defined in the following modules:
They also use airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook to communicate with Microsoft Azure.
Azure Data Lake¶
AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an
Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a
login (=Client ID), password (=Client Secret) and extra fields tenant (Tenant) and account_name (Account Name)
(see connection azure_data_lake_default for an example).
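Before pointing a DAG at the connection, it can help to check that all four values are present. The following is a sketch only: missing_adl_fields is a hypothetical helper, not part of Airflow, and it checks nothing beyond the fields named above.

```python
import json

REQUIRED_EXTRAS = ("tenant", "account_name")


def missing_adl_fields(login, password, extra_json):
    """List the azure_data_lake connection fields that are still empty."""
    extras = json.loads(extra_json) if extra_json else {}
    missing = []
    if not login:
        missing.append("login (Client ID)")
    if not password:
        missing.append("password (Client Secret)")
    # Both extras named in the text are treated as required here.
    missing.extend(name for name in REQUIRED_EXTRAS if not extras.get(name))
    return missing


complete = missing_adl_fields(
    "client-id", "client-secret",
    json.dumps({"tenant": "my-tenant", "account_name": "my-account"}),
)
incomplete = missing_adl_fields("client-id", "", json.dumps({"tenant": "my-tenant"}))
```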
The operators are defined in the following modules:
They also use airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook to communicate with Microsoft Azure.
Azure Container Instances¶
Azure Container Instances provides a method to run a Docker container without having to worry
about managing infrastructure. The AzureContainerInstanceHook requires a service principal. The
credentials for this principal can either be defined in the extra field key_path, as an
environment variable named AZURE_AUTH_LOCATION,
or by providing a login/password and tenantId in extras.
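The three ways of supplying the service principal credentials can be pictured as a simple lookup. This is a sketch under assumptions: credential_source is a hypothetical function, and the precedence shown (key_path, then AZURE_AUTH_LOCATION, then login/password with tenantId) follows the order in which the text lists the options rather than a confirmed implementation detail.

```python
import os


def credential_source(extra, login=None, password=None, environ=None):
    """Report which of the three documented credential sources is configured."""
    environ = os.environ if environ is None else environ
    if extra.get("key_path"):
        # Option 1: path to a key file in the connection's extra field.
        return "key_path"
    if environ.get("AZURE_AUTH_LOCATION"):
        # Option 2: environment variable pointing at the auth file.
        return "AZURE_AUTH_LOCATION"
    if login and password and extra.get("tenantId"):
        # Option 3: login/password on the connection plus tenantId in extras.
        return "login/password/tenantId"
    raise ValueError("no service principal credentials configured")
```

For example, credential_source({"tenantId": "my-tenant"}, login="app-id", password="secret", environ={}) selects the third option.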
The operator is defined in the airflow.contrib.operators.azure_container_instances_operator module.
It also uses airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook,
airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook and
airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook to communicate with Microsoft Azure.
The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.
AWS: Amazon Web Services¶
Airflow has extensive support for Amazon Web Services. Note, however, that the hooks, sensors, and operators are in the contrib section.
Logging¶
Airflow can be configured to read and write task logs in Amazon Simple Storage Service (Amazon S3). See Writing Logs to Amazon S3.
AWS EMR¶
The operators are defined in the following modules:
They also use airflow.contrib.hooks.emr_hook.EmrHook to communicate with Amazon Web Services.
AWS S3¶
The operators are defined in the following modules:
airflow.contrib.operators.s3_to_gcs_transfer_operator
They also use airflow.hooks.S3_hook.S3Hook to communicate with Amazon Web Services.
AWS Batch Service¶
The operator, AWSBatchOperator, is defined in the airflow.contrib.operators.awsbatch_operator module.
AWS Redshift¶
The operators are defined in the following modules:
They also use airflow.contrib.hooks.redshift_hook.RedshiftHook to communicate with Amazon Web Services.
AWS DynamoDB¶
The operator is defined in the airflow.contrib.operators.hive_to_dynamodb module.
It uses airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook to communicate with Amazon Web Services.
AWS Lambda¶
It uses airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook to communicate with Amazon Web Services.
AWS Kinesis¶
It uses airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook to communicate with Amazon Web Services.
Amazon SageMaker¶
For more instructions on using Amazon SageMaker in Airflow, please see the SageMaker Python SDK README.
The operators are defined in the following modules:
airflow.contrib.operators.sagemaker_training_operator
airflow.contrib.operators.sagemaker_tuning_operator
airflow.contrib.operators.sagemaker_model_operator
airflow.contrib.operators.sagemaker_transform_operator
airflow.contrib.operators.sagemaker_endpoint_config_operator
airflow.contrib.operators.sagemaker_endpoint_operator
They use airflow.contrib.hooks.sagemaker_hook.SageMakerHook to communicate with Amazon Web Services.
Databricks¶
With contributions from Databricks, Airflow has several operators
that submit and run jobs on the Databricks platform. Internally, the
operators talk to the api/2.0/jobs/runs/submit endpoint.
The operators are defined in the airflow.contrib.operators.databricks_operator module.
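To give a feel for what a request to the api/2.0/jobs/runs/submit endpoint looks like, the sketch below only builds the URL and JSON body and sends nothing. The function build_runs_submit_request, the host name, and the cluster values are illustrative placeholders, not settings taken from Airflow or recommended by Databricks.

```python
import json


def build_runs_submit_request(host, notebook_path, run_name="airflow-run"):
    """Return (url, json_body) for a one-time notebook run submission."""
    url = "https://{}/api/2.0/jobs/runs/submit".format(host)
    payload = {
        "run_name": run_name,
        # Placeholder cluster spec; replace with values valid in your workspace.
        "new_cluster": {
            "spark_version": "<spark-version>",
            "node_type_id": "<node-type>",
            "num_workers": 2,
        },
        "notebook_task": {"notebook_path": notebook_path},
    }
    return url, json.dumps(payload)


url, body = build_runs_submit_request(
    "example.cloud.databricks.com", "/Users/someone@example.com/demo"
)
```

In a DAG, the operators accept an equivalent specification and take care of authentication and polling the run, so building the request by hand is rarely needed.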
GCP: Google Cloud Platform¶
Airflow has extensive support for the Google Cloud Platform. Note, however, that most hooks and operators are in the contrib section, which means that they have beta status and can have breaking changes between minor releases.
See the GCP connection type documentation to configure connections to GCP.
Logging¶
Airflow can be configured to read and write task logs in Google Cloud Storage. See Writing Logs to Google Cloud Storage.
GoogleCloudBaseHook¶
All hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.
BigQuery¶
The operators are defined in the following module:
They also use airflow.contrib.hooks.bigquery_hook.BigQueryHook to communicate with Google Cloud Platform.
Cloud Spanner¶
The operator is defined in the airflow.contrib.operators.gcp_spanner_operator package.
It also uses airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook to communicate with Google Cloud Platform.
Cloud SQL¶
The operator is defined in the airflow.contrib.operators.gcp_sql_operator package.
It also uses airflow.contrib.hooks.gcp_sql_hook.CloudSqlDatabaseHook and airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook to communicate with Google Cloud Platform.
Cloud Bigtable¶
The operator is defined in the airflow.contrib.operators.gcp_bigtable_operator package.
It also uses airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook to communicate with Google Cloud Platform.
Cloud Build¶
The operator is defined in the airflow.contrib.operators.gcp_cloud_build_operator package.
It also uses airflow.contrib.hooks.gcp_cloud_build_hook.CloudBuildHook to communicate with Google Cloud Platform.
Compute Engine¶
The operators are defined in the airflow.contrib.operators.gcp_compute_operator package.
They also use airflow.contrib.hooks.gcp_compute_hook.GceHook to communicate with Google Cloud Platform.
Cloud Functions¶
The operators are defined in the airflow.contrib.operators.gcp_function_operator package.
They also use airflow.contrib.hooks.gcp_function_hook.GcfHook to communicate with Google Cloud Platform.
Cloud DataFlow¶
The operators are defined in the airflow.contrib.operators.dataflow_operator package.
They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook to communicate with Google Cloud Platform.
Cloud DataProc¶
The operators are defined in the airflow.contrib.operators.dataproc_operator package.
Cloud Datastore¶
airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator: Export entities from Google Cloud Datastore to Cloud Storage.
airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator: Import entities from Cloud Storage to Google Cloud Datastore.
They also use airflow.contrib.hooks.datastore_hook.DatastoreHook to communicate with Google Cloud Platform.
Cloud ML Engine¶
airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator: Start a Cloud ML Engine batch prediction job.
airflow.contrib.operators.mlengine_operator.MLEngineModelOperator: Manages a Cloud ML Engine model.
airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator: Start a Cloud ML Engine training job.
airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator: Manages a Cloud ML Engine model version.
The operators are defined in the airflow.contrib.operators.mlengine_operator package.
They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook to communicate with Google Cloud Platform.
Cloud Storage¶
The operators are defined in the following module:
They also use airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook to communicate with Google Cloud Platform.
Transfer Service¶
The operators are defined in the following module:
airflow.contrib.sensors.gcp_transfer_sensor
They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook to communicate with Google Cloud Platform.
Cloud Vision¶
The operator is defined in the airflow.contrib.operators.gcp_vision_operator package.
It also uses airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook to communicate with Google Cloud Platform.
Cloud Text to Speech¶
The operator is defined in the airflow.contrib.operators.gcp_text_to_speech_operator package.
It also uses airflow.contrib.hooks.gcp_text_to_speech_hook.GCPTextToSpeechHook to communicate with Google Cloud Platform.
Cloud Speech to Text¶
The operator is defined in the airflow.contrib.operators.gcp_speech_to_text_operator package.
It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook to communicate with Google Cloud Platform.
Cloud Speech Translate Operators¶
The operator is defined in the airflow.contrib.operators.gcp_translate_speech_operator package.
It also uses airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook and airflow.contrib.hooks.gcp_translate_hook.CloudTranslateHook to communicate with Google Cloud Platform.
Cloud Translate¶
Cloud Translate Text Operators¶
airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator: Translate a string or list of strings.
The operator is defined in the airflow.contrib.operators.gcp_translate_operator package.
Cloud Video Intelligence¶
The operators are defined in the airflow.contrib.operators.gcp_video_intelligence_operator package.
They also use airflow.contrib.hooks.gcp_video_intelligence_hook.CloudVideoIntelligenceHook to communicate with Google Cloud Platform.
Google Kubernetes Engine¶
The operators are defined in the airflow.contrib.operators.gcp_container_operator package.
They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook to communicate with Google Cloud Platform.
Google Natural Language¶
The operators are defined in the airflow.contrib.operators.gcp_natural_language_operator package.
They also use airflow.contrib.hooks.gcp_natural_language_hook.CloudNaturalLanguageHook to communicate with Google Cloud Platform.
Google Cloud Data Loss Prevention (DLP)¶
The operators are defined in the airflow.contrib.operators.gcp_dlp_operator package.
They also use airflow.contrib.hooks.gcp_dlp_hook.CloudDLPHook to communicate with Google Cloud Platform.
Google Cloud Tasks¶
The operators are defined in the airflow.contrib.operators.gcp_tasks_operator package.
They also use airflow.contrib.hooks.gcp_tasks_hook.CloudTasksHook to communicate with Google Cloud Platform.