Integration¶
Azure: Microsoft Azure¶
Airflow has limited support for Microsoft Azure. The hooks, sensors and operators for the Azure services listed below are in the contrib section.
Azure Blob Storage¶
All classes communicate via the Windows Azure Storage Blob (WASB) protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (= storage account name) and password (= key), or a login and SAS token in the extra field (see connection wasb_default for an example).
airflow.contrib.hooks.wasb_hook.WasbHook
Interface with Azure Blob Storage.
airflow.contrib.sensors.wasb_sensor.WasbBlobSensor
Checks if a blob is present on Azure Blob storage.
airflow.contrib.operators.wasb_delete_blob_operator.WasbDeleteBlobOperator
Deletes blob(s) on Azure Blob Storage.
airflow.contrib.sensors.wasb_sensor.WasbPrefixSensor
Checks if blobs matching a prefix are present on Azure Blob storage.
airflow.contrib.operators.file_to_wasb.FileToWasbOperator
Uploads a local file to a container as a blob.
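The two authorization options described at the start of this section map onto connection fields roughly as follows. This is a minimal sketch using only the standard library: the helper name wasb_connection_fields is hypothetical, and the sas_token extra key is an assumption modeled on the wasb_default connection, so verify it against your Airflow version.

```python
import json


def wasb_connection_fields(account_name, key=None, sas_token=None):
    """Return the fields of a 'wasb' connection as a plain dict.

    Hypothetical helper for illustration; it does not talk to Airflow.
    """
    if (key is None) == (sas_token is None):
        raise ValueError("supply exactly one of key or sas_token")
    fields = {"conn_type": "wasb", "login": account_name}
    if key is not None:
        # Option 1: login = storage account name, password = account key.
        fields["password"] = key
    else:
        # Option 2: login plus a SAS token stored in the extra field.
        # The "sas_token" key name is an assumption.
        fields["extra"] = json.dumps({"sas_token": sas_token})
    return fields


# Placeholder values, not real credentials.
by_key = wasb_connection_fields("mystorageaccount", key="EXAMPLE_KEY")
by_sas = wasb_connection_fields("mystorageaccount", sas_token="EXAMPLE_SAS")
```

The resulting values can be entered in the connection form of the Airflow UI or passed to whatever mechanism you use to create connections.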
Logging¶
Airflow can be configured to read and write task logs in Azure Blob Storage. See Writing Logs to Azure Blob Storage.
Azure CosmosDB¶
AzureCosmosDBHook communicates via the Azure Cosmos library. Make sure that an Airflow connection of type azure_cosmos exists. Authorization can be done by supplying a login (= endpoint URI) and password (= secret key); the extra fields database_name and collection_name specify the default database and collection to use (see connection azure_cosmos_default for an example).
airflow.contrib.hooks.azure_cosmos_hook.AzureCosmosDBHook
Interface with Azure CosmosDB.
airflow.contrib.operators.azure_cosmos_operator.AzureCosmosInsertDocumentOperator
Simple operator to insert document into CosmosDB.
airflow.contrib.sensors.azure_cosmos_sensor.AzureCosmosDocumentSensor
Simple sensor to detect document existence in CosmosDB.
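The role of the database_name and collection_name extra fields can be sketched as a simple fallback: an explicit argument wins, otherwise the connection default is used. The function resolve_target and the sample values are hypothetical, and the sketch only illustrates the idea of defaults; it is not the hook's actual code.

```python
import json

# Extra field of a hypothetical azure_cosmos connection, stored as JSON text.
extra = json.dumps({"database_name": "analytics", "collection_name": "events"})


def resolve_target(extra_json, database_name=None, collection_name=None):
    """Pick the database and collection, falling back to connection defaults."""
    defaults = json.loads(extra_json or "{}")
    database = database_name or defaults.get("database_name")
    collection = collection_name or defaults.get("collection_name")
    if database is None or collection is None:
        raise ValueError("database and collection must be given or defaulted")
    return database, collection


# No arguments: both values come from the connection extras.
default_target = resolve_target(extra)
# An explicit collection overrides the default; the database still falls back.
override_target = resolve_target(extra, collection_name="audit")
```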
Azure Data Lake¶
AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an Airflow connection of type azure_data_lake exists. Authorization can be done by supplying a login (= client ID), password (= client secret) and the extra fields tenant (tenant) and account_name (account name) (see connection azure_data_lake_default for an example).
airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook
Interface with Azure Data Lake.
airflow.contrib.operators.adls_list_operator.AzureDataLakeStorageListOperator
Lists the files located in a specified Azure Data Lake path.
airflow.contrib.operators.adls_to_gcs.AdlsToGoogleCloudStorageOperator
Copies files from an Azure Data Lake path to a Google Cloud Storage bucket.
Azure Container Instances¶
Azure Container Instances provides a method to run a Docker container without having to worry about managing infrastructure. The AzureContainerInstanceHook requires a service principal. The credentials for this principal can be defined in the extra field key_path, as an environment variable named AZURE_AUTH_LOCATION, or by providing a login/password and tenantId in extras.
The AzureContainerRegistryHook requires a host/login/password to be defined in the connection.
airflow.contrib.hooks.azure_container_volume_hook.AzureContainerVolumeHook
Interface with Azure Container Volumes.
airflow.contrib.operators.azure_container_instances_operator.AzureContainerInstancesOperator
Start/Monitor a new ACI.
airflow.contrib.hooks.azure_container_instance_hook.AzureContainerInstanceHook
Wrapper around a single ACI.
airflow.contrib.hooks.azure_container_registry_hook.AzureContainerRegistryHook
Interface with Azure Container Registry (ACR).
AWS: Amazon Web Services¶
Airflow has extensive support for Amazon Web Services. Note that most of the hooks, sensors and operators are in the contrib section.
AWS EMR¶
airflow.contrib.hooks.emr_hook.EmrHook
Interface with AWS EMR.
airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator
Adds steps to an existing EMR JobFlow.
airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator
Creates an EMR JobFlow, reading the config from the EMR connection.
airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator
Terminates an EMR JobFlow.
AWS S3¶
airflow.hooks.S3_hook.S3Hook
Interface with AWS S3.
airflow.operators.s3_file_transform_operator.S3FileTransformOperator
Copies data from a source S3 location to a temporary location on the local filesystem.
airflow.contrib.operators.s3_list_operator.S3ListOperator
Lists the files matching a key prefix from an S3 location.
airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator
Syncs an S3 location with a Google Cloud Storage bucket.
airflow.contrib.operators.s3_to_gcs_transfer_operator.S3ToGoogleCloudStorageTransferOperator
Syncs an S3 bucket with a Google Cloud Storage bucket using the GCP Storage Transfer Service.
airflow.operators.s3_to_hive_operator.S3ToHiveTransfer
Moves data from S3 to Hive. The operator downloads a file from S3 and stores it locally before loading it into a Hive table.
AWS Batch Service¶
airflow.contrib.operators.awsbatch_operator.AWSBatchOperator
Execute a task on AWS Batch Service.
AWS RedShift¶
airflow.contrib.sensors.aws_redshift_cluster_sensor.AwsRedshiftClusterSensor
Waits for a Redshift cluster to reach a specific status.
airflow.contrib.hooks.redshift_hook.RedshiftHook
Interact with AWS Redshift, using the boto3 library.
airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer
Executes an unload command to S3 as CSV with or without headers.
airflow.operators.s3_to_redshift_operator.S3ToRedshiftTransfer
Executes a COPY command from S3 as CSV with or without headers.
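To make the "CSV with or without headers" wording concrete, the following sketch builds the kind of Redshift COPY statement such a transfer runs. It is an illustration only: build_copy_statement is a hypothetical helper, the IAM_ROLE form of authorization is an assumption chosen for brevity, and the SQL that the operator actually generates may differ.

```python
def build_copy_statement(schema, table, s3_bucket, s3_key, iam_role, header=True):
    """Build a Redshift COPY statement that loads CSV data from S3."""
    options = ["CSV"]
    if header:
        # Skip the first line of each file when it holds column names.
        options.append("IGNOREHEADER 1")
    return (
        "COPY {schema}.{table} "
        "FROM 's3://{bucket}/{key}' "
        "IAM_ROLE '{role}' "
        "{options};"
    ).format(
        schema=schema,
        table=table,
        bucket=s3_bucket,
        key=s3_key,
        role=iam_role,
        options=" ".join(options),
    )


# Placeholder names for illustration.
copy_sql = build_copy_statement(
    "public",
    "events",
    "my-bucket",
    "exports/events.csv",
    "arn:aws:iam::123456789012:role/example",
)
```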
AWS DynamoDB¶
airflow.contrib.operators.hive_to_dynamodb.HiveToDynamoDBTransferOperator
Moves data from Hive to DynamoDB.
airflow.contrib.hooks.aws_dynamodb_hook.AwsDynamoDBHook
Interface with AWS DynamoDB.
AWS Lambda¶
airflow.contrib.hooks.aws_lambda_hook.AwsLambdaHook
Interface with AWS Lambda.
AWS Kinesis¶
airflow.contrib.hooks.aws_firehose_hook.AwsFirehoseHook
Interface with AWS Kinesis Firehose.
Amazon SageMaker¶
airflow.contrib.hooks.sagemaker_hook.SageMakerHook
Interface with Amazon SageMaker.
airflow.contrib.operators.sagemaker_training_operator.SageMakerTrainingOperator
Create a SageMaker training job.
airflow.contrib.operators.sagemaker_tuning_operator.SageMakerTuningOperator
Create a SageMaker tuning job.
airflow.contrib.operators.sagemaker_model_operator.SageMakerModelOperator
Create a SageMaker model.
airflow.contrib.operators.sagemaker_transform_operator.SageMakerTransformOperator
Create a SageMaker transform job.
airflow.contrib.operators.sagemaker_endpoint_config_operator.SageMakerEndpointConfigOperator
Create a SageMaker endpoint config.
airflow.contrib.operators.sagemaker_endpoint_operator.SageMakerEndpointOperator
Create a SageMaker endpoint.
Databricks¶
Databricks has contributed an Airflow operator which enables submitting runs to the Databricks platform. Internally the operator talks to the api/2.0/jobs/runs/submit endpoint.
airflow.contrib.operators.databricks_operator.DatabricksSubmitRunOperator
Submits a Spark job run to Databricks using the api/2.0/jobs/runs/submit API endpoint.
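A request body for the api/2.0/jobs/runs/submit endpoint might look like the sketch below. All values are placeholders, and the exact set of fields you need depends on your Databricks workspace, so treat this as an assumption-laden example rather than a reference.

```python
import json

# Request body in the shape accepted by api/2.0/jobs/runs/submit.
# Every value here is a placeholder for illustration.
run_payload = {
    "run_name": "example-notebook-run",
    "new_cluster": {
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type>",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/someone@example.com/example"},
}

# With Airflow installed, the payload would typically be handed to the
# operator through its `json` argument, for example:
#   DatabricksSubmitRunOperator(task_id="notebook_run", json=run_payload)
body = json.dumps(run_payload, sort_keys=True)
```

Keeping the payload as a plain dictionary makes it easy to review or unit-test separately from the DAG definition.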
GCP: Google Cloud Platform¶
Airflow has extensive support for the Google Cloud Platform. Note that most hooks and operators are in the contrib section, which means that they have beta status and can have breaking changes between minor releases.
See the GCP connection type documentation to configure connections to GCP.
Logging¶
Airflow can be configured to read and write task logs in Google Cloud Storage. See Writing Logs to Google Cloud Storage.
GoogleCloudBaseHook¶
All hooks are based on airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook.
BigQuery¶
airflow.contrib.operators.bigquery_check_operator.BigQueryCheckOperator
Performs checks against a SQL query that will return a single row with different values.
airflow.contrib.operators.bigquery_check_operator.BigQueryIntervalCheckOperator
Checks that the values of metrics given as SQL expressions are within a certain tolerance of the ones from days_back before.
airflow.contrib.operators.bigquery_check_operator.BigQueryValueCheckOperator
Performs a simple value check using SQL code.
airflow.contrib.operators.bigquery_get_data.BigQueryGetDataOperator
Fetches the data from a BigQuery table and returns it as a Python list.
airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyDatasetOperator
Creates an empty BigQuery dataset.
airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyTableOperator
Creates a new, empty table in the specified BigQuery dataset optionally with schema.
airflow.contrib.operators.bigquery_operator.BigQueryCreateExternalTableOperator
Creates a new, external table in the dataset with the data in Google Cloud Storage.
airflow.contrib.operators.bigquery_operator.BigQueryDeleteDatasetOperator
Deletes an existing BigQuery dataset.
airflow.contrib.operators.bigquery_operator.BigQueryOperator
Executes BigQuery SQL queries in a specific BigQuery database.
airflow.contrib.operators.bigquery_table_delete_operator.BigQueryTableDeleteOperator
Deletes an existing BigQuery table.
airflow.contrib.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator
Copy a BigQuery table to another BigQuery table.
airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator
Transfers a BigQuery table to a Google Cloud Storage bucket.
They also use airflow.contrib.hooks.bigquery_hook.BigQueryHook to communicate with Google Cloud Platform.
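The tolerance test behind an interval check can be sketched in plain Python. The version below assumes positive metric values and a "larger value divided by smaller value" ratio; interval_check and the sample numbers are hypothetical and only illustrate the idea, not the operator's exact implementation.

```python
def interval_check(current, reference, thresholds):
    """Return the metrics whose max/min ratio reaches the threshold.

    current and reference map each SQL expression to today's value and to
    the value from days_back before. A ratio of None marks a metric whose
    ratio is undefined because one side is zero.
    """
    failed = {}
    for metric, threshold in thresholds.items():
        cur, ref = current[metric], reference[metric]
        if cur == 0 or ref == 0:
            failed[metric] = None
            continue
        ratio = float(max(cur, ref)) / min(cur, ref)
        if ratio >= threshold:
            failed[metric] = ratio
    return failed


# Illustrative numbers only.
failed_metrics = interval_check(
    current={"COUNT(*)": 1050, "SUM(amount)": 300.0},
    reference={"COUNT(*)": 1000, "SUM(amount)": 100.0},
    thresholds={"COUNT(*)": 1.5, "SUM(amount)": 1.5},
)
```

Here the row count changed by a factor of 1.05 and stays within tolerance, while the summed amount tripled and is reported as a failure.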
Cloud Spanner¶
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseDeleteOperator
Deletes an existing database from a Google Cloud Spanner instance or returns success if the database is missing.
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseDeployOperator
Creates a new database in a Google Cloud Spanner instance or returns success if the database already exists.
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseQueryOperator
Executes an arbitrary DML query (INSERT, UPDATE, DELETE).
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDatabaseUpdateOperator
Updates the structure of a Google Cloud Spanner database.
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDeleteOperator
Deletes a Google Cloud Spanner instance.
airflow.contrib.operators.gcp_spanner_operator.CloudSpannerInstanceDeployOperator
Creates a new Google Cloud Spanner instance or, if an instance with the same name exists, updates the instance.
They also use airflow.contrib.hooks.gcp_spanner_hook.CloudSpannerHook to communicate with Google Cloud Platform.
Cloud SQL¶
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceCreateOperator
Creates a new Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabaseCreateOperator
Creates a new database inside a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabaseDeleteOperator
Deletes a database from a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDatabasePatchOperator
Updates a database inside a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceDeleteOperator
Deletes a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceExportOperator
Exports data from a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstanceImportOperator
Imports data into a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlInstancePatchOperator
Patches a Cloud SQL instance.
airflow.contrib.operators.gcp_sql_operator.CloudSqlQueryOperator
Runs a query in a Cloud SQL instance.
They also use airflow.contrib.hooks.gcp_sql_hook.CloudSqlDatabaseHook and airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook to communicate with Google Cloud Platform.
Cloud Bigtable¶
airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator
Updates the number of nodes in a Google Cloud Bigtable cluster.
airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator
Creates a Google Cloud Bigtable instance.
airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator
Deletes a Google Cloud Bigtable instance.
airflow.contrib.operators.gcp_bigtable_operator.BigtableTableCreateOperator
Creates a table in a Google Cloud Bigtable instance.
airflow.contrib.operators.gcp_bigtable_operator.BigtableTableDeleteOperator
Deletes a table in a Google Cloud Bigtable instance.
airflow.contrib.operators.gcp_bigtable_operator.BigtableTableWaitForReplicationSensor
Sensor that waits for a table to be fully replicated.
They also use airflow.contrib.hooks.gcp_bigtable_hook.BigtableHook to communicate with Google Cloud Platform.
Cloud Build¶
airflow.contrib.operators.gcp_cloud_build_operator.CloudBuildCreateBuildOperator
Starts a build with the specified configuration.
They also use airflow.contrib.hooks.gcp_cloud_build_hook.CloudBuildHook to communicate with Google Cloud Platform.
Compute Engine¶
airflow.contrib.operators.gcp_compute_operator.GceInstanceStartOperator
Starts an existing Google Compute Engine instance.
airflow.contrib.operators.gcp_compute_operator.GceInstanceStopOperator
Stops an existing Google Compute Engine instance.
airflow.contrib.operators.gcp_compute_operator.GceSetMachineTypeOperator
Changes the machine type for a stopped instance.
airflow.contrib.operators.gcp_compute_operator.GceInstanceTemplateCopyOperator
Copies the Instance Template, applying specified changes.
airflow.contrib.operators.gcp_compute_operator.GceInstanceGroupManagerUpdateTemplateOperator
Patches the Instance Group Manager, replacing the source Instance Template URL with the destination one.
The operators share the common base operator airflow.contrib.operators.gcp_compute_operator.GceBaseOperator.
They also use airflow.contrib.hooks.gcp_compute_hook.GceHook to communicate with Google Cloud Platform.
Cloud Functions¶
airflow.contrib.operators.gcp_function_operator.GcfFunctionDeployOperator
Deploys a Google Cloud Function to Google Cloud Platform.
airflow.contrib.operators.gcp_function_operator.GcfFunctionDeleteOperator
Deletes a Google Cloud Function in Google Cloud Platform.
They also use airflow.contrib.hooks.gcp_function_hook.GcfHook to communicate with Google Cloud Platform.
Cloud DataFlow¶
airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator
Launches Cloud Dataflow jobs written in Java.
airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator
Launches a templated Cloud Dataflow batch job.
airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator
Launches Cloud Dataflow jobs written in Python.
They also use airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook to communicate with Google Cloud Platform.
Cloud DataProc¶
airflow.contrib.operators.dataproc_operator.DataprocClusterCreateOperator
Create a new cluster on Google Cloud Dataproc.
airflow.contrib.operators.dataproc_operator.DataprocClusterDeleteOperator
Delete a cluster on Google Cloud Dataproc.
airflow.contrib.operators.dataproc_operator.DataprocClusterScaleOperator
Scale up or down a cluster on Google Cloud Dataproc.
airflow.contrib.operators.dataproc_operator.DataProcHadoopOperator
Start a Hadoop Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataProcHiveOperator
Start a Hive query Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataProcPigOperator
Start a Pig query Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataProcPySparkOperator
Start a PySpark Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataProcSparkOperator
Start a Spark Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataProcSparkSqlOperator
Start a Spark SQL query Job on a Cloud DataProc cluster.
airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateInlineOperator
Instantiate a WorkflowTemplate Inline on Google Cloud Dataproc.
airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator
Instantiate a WorkflowTemplate on Google Cloud Dataproc.
Cloud Datastore¶
airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator
Export entities from Google Cloud Datastore to Cloud Storage.
airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator
Import entities from Cloud Storage to Google Cloud Datastore.
They also use airflow.contrib.hooks.datastore_hook.DatastoreHook to communicate with Google Cloud Platform.
Cloud ML Engine¶
airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator
Start a Cloud ML Engine batch prediction job.
airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
Manages a Cloud ML Engine model.
airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator
Start a Cloud ML Engine training job.
airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
Manages a Cloud ML Engine model version.
They also use airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook to communicate with Google Cloud Platform.
Cloud Storage¶
airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator
Uploads a file to Google Cloud Storage.
airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageBucketCreateAclEntryOperator
Creates a new ACL entry on the specified bucket.
airflow.contrib.operators.gcs_acl_operator.GoogleCloudStorageObjectCreateAclEntryOperator
Creates a new ACL entry on the specified object.
airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator
Downloads a file from Google Cloud Storage.
airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator
Lists all objects from the bucket with the given string prefix and delimiter in name.
airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator
Creates a new cloud storage bucket.
airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator
Loads files from Google Cloud Storage into BigQuery.
airflow.contrib.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator
Copies objects from a bucket to another, with renaming if requested.
airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator
Copy data from any MySQL Database to Google Cloud Storage in JSON format.
airflow.contrib.operators.mssql_to_gcs.MsSqlToGoogleCloudStorageOperator
Copy data from any Microsoft SQL Server Database to Google Cloud Storage in JSON format.
airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectSensor
Checks for the existence of a file in Google Cloud Storage.
airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageObjectUpdatedSensor
Checks if an object is updated in Google Cloud Storage.
airflow.contrib.sensors.gcs_sensor.GoogleCloudStoragePrefixSensor
Checks for the existence of objects at a prefix in Google Cloud Storage.
airflow.contrib.sensors.gcs_sensor.GoogleCloudStorageUploadSessionCompleteSensor
Checks for changes in the number of objects at a prefix in a Google Cloud Storage bucket and returns True if the inactivity period has passed with no increase in the number of objects. This is useful when many objects are being uploaded to a bucket with no formal success signal.
airflow.contrib.operators.gcs_delete_operator.GoogleCloudStorageDeleteOperator
Deletes objects from a Google Cloud Storage bucket.
They also use airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook to communicate with Google Cloud Platform.
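The inactivity-period idea behind the upload session sensor can be sketched without any Google Cloud dependency. The class UploadSessionTracker is hypothetical, time is passed in as plain seconds to keep the example deterministic, and deletions from the bucket are not handled; the real sensor may behave differently in those cases.

```python
class UploadSessionTracker:
    """Track an object count and report when uploads appear to have stopped."""

    def __init__(self, inactivity_period):
        self.inactivity_period = inactivity_period  # seconds
        self.previous_count = 0
        self.last_change = None

    def poke(self, object_count, now):
        """Return True once the count has not grown for the inactivity period."""
        if self.last_change is None or object_count > self.previous_count:
            # First observation or new objects arrived: restart the clock.
            self.previous_count = object_count
            self.last_change = now
            return False
        return (now - self.last_change) >= self.inactivity_period


tracker = UploadSessionTracker(inactivity_period=60)
# (object count, timestamp in seconds) as seen on successive pokes.
observations = [(3, 0), (5, 30), (5, 60), (5, 90)]
results = [tracker.poke(count, now) for count, now in observations]
```

In this run the count last grows at 30 seconds, so the check only succeeds at 90 seconds, once a full 60 seconds have passed without new objects.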
Transfer Service¶
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobDeleteOperator
Deletes a transfer job.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobCreateOperator
Creates a transfer job.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceJobUpdateOperator
Updates a transfer job.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationCancelOperator
Cancels a transfer operation.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationGetOperator
Gets a transfer operation.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationPauseOperator
Pauses a transfer operation.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationResumeOperator
Resumes a transfer operation.
airflow.contrib.operators.gcp_transfer_operator.GcpTransferServiceOperationsListOperator
Gets a list of transfer operations.
airflow.contrib.operators.gcp_transfer_operator.GoogleCloudStorageToGoogleCloudStorageTransferOperator
Copies objects from a Google Cloud Storage bucket to another bucket.
airflow.contrib.operators.gcp_transfer_operator.S3ToGoogleCloudStorageTransferOperator
Synchronizes an S3 bucket with a Google Cloud Storage bucket.
airflow.contrib.sensors.gcp_transfer_sensor.GCPTransferServiceWaitForJobStatusSensor
Waits for at least one operation belonging to the job to have the expected status.
They also use airflow.contrib.hooks.gcp_transfer_hook.GCPTransferServiceHook to communicate with Google Cloud Platform.
Cloud Vision¶
Cloud Vision Product Search Operators¶
airflow.contrib.operators.gcp_vision_operator.CloudVisionAddProductToProductSetOperator
Adds a Product to the specified ProductSet.
airflow.contrib.operators.gcp_vision_operator.CloudVisionAnnotateImageOperator
Run image detection and annotation for an image.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductCreateOperator
Creates a new Product resource.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductDeleteOperator
Permanently deletes a product and its reference images.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductGetOperator
Gets information associated with a Product.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetCreateOperator
Creates a new ProductSet resource.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetDeleteOperator
Permanently deletes a ProductSet.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetGetOperator
Gets information associated with a ProductSet.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetUpdateOperator
Makes changes to a ProductSet resource.
airflow.contrib.operators.gcp_vision_operator.CloudVisionProductUpdateOperator
Makes changes to a Product resource.
airflow.contrib.operators.gcp_vision_operator.CloudVisionReferenceImageCreateOperator
Creates a new ReferenceImage resource.
airflow.contrib.operators.gcp_vision_operator.CloudVisionRemoveProductFromProductSetOperator
Removes a Product from the specified ProductSet.
airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectTextOperator
Run text detection for an image.
airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectDocumentTextOperator
Run document text detection for an image.
airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectImageLabelsOperator
Run image labels detection for an image.
airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectImageSafeSearchOperator
Run safe search detection for an image.
They also use airflow.contrib.hooks.gcp_vision_hook.CloudVisionHook to communicate with Google Cloud Platform.
Cloud Text to Speech¶
airflow.contrib.operators.gcp_text_to_speech_operator.GcpTextToSpeechSynthesizeOperator
Synthesizes input text into audio file and stores this file to GCS.
They also use airflow.contrib.hooks.gcp_text_to_speech_hook.GCPTextToSpeechHook to communicate with Google Cloud Platform.
Cloud Speech to Text¶
airflow.contrib.operators.gcp_speech_to_text_operator.GcpSpeechToTextRecognizeSpeechOperator
Recognizes speech in audio input and returns text.
They also use airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook to communicate with Google Cloud Platform.
Cloud Speech Translate Operators¶
airflow.contrib.operators.gcp_translate_speech_operator.GcpTranslateSpeechOperator
Recognizes speech in audio input and translates it.
They also use airflow.contrib.hooks.gcp_speech_to_text_hook.GCPSpeechToTextHook and airflow.contrib.hooks.gcp_translate_hook.CloudTranslateHook to communicate with Google Cloud Platform.
Cloud Translate¶
Cloud Translate Text Operators¶
airflow.contrib.operators.gcp_translate_operator.CloudTranslateTextOperator
Translate a string or list of strings.
Cloud Video Intelligence¶
airflow.contrib.operators.gcp_video_intelligence_operator.CloudVideoIntelligenceDetectVideoLabelsOperator
Performs video annotation, annotating video labels.
airflow.contrib.operators.gcp_video_intelligence_operator.CloudVideoIntelligenceDetectVideoExplicitContentOperator
Performs video annotation, annotating explicit content.
airflow.contrib.operators.gcp_video_intelligence_operator.CloudVideoIntelligenceDetectVideoShotsOperator
Performs video annotation, annotating video shots.
They also use airflow.contrib.hooks.gcp_video_intelligence_hook.CloudVideoIntelligenceHook to communicate with Google Cloud Platform.
Google Kubernetes Engine¶
airflow.contrib.operators.gcp_container_operator.GKEClusterCreateOperator
Creates a Kubernetes Cluster in Google Cloud Platform.
airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator
Deletes a Kubernetes Cluster in Google Cloud Platform.
airflow.contrib.operators.gcp_container_operator.GKEPodOperator
Executes a task in a Kubernetes pod in the specified Google Kubernetes Engine cluster.
They also use airflow.contrib.hooks.gcp_container_hook.GKEClusterHook to communicate with Google Cloud Platform.
Google Natural Language¶
airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeEntitiesOperator
Finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.
airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeEntitySentimentOperator
Finds entities, similar to AnalyzeEntities, in the text and analyzes sentiment associated with each entity and its mentions.
airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageAnalyzeSentimentOperator
Analyzes the sentiment of the provided text.
airflow.contrib.operators.gcp_natural_language_operator.CloudLanguageClassifyTextOperator
Classifies a document into categories.
They also use airflow.contrib.hooks.gcp_natural_language_hook.CloudNaturalLanguageHook to communicate with Google Cloud Platform.
Google Cloud Data Loss Prevention (DLP)¶
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCancelDLPJobOperator
Starts asynchronous cancellation on a long-running DlpJob.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDeidentifyTemplateOperator
Creates a DeidentifyTemplate for re-using frequently used configuration for de-identifying content, images, and storage.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateDLPJobOperator
Creates a new job to inspect storage or calculate risk metrics.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateInspectTemplateOperator
Creates an InspectTemplate for re-using frequently used configuration for inspecting content, images, and storage.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateJobTriggerOperator
Creates a job trigger to run DLP actions such as scanning storage for sensitive information on a set schedule.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPCreateStoredInfoTypeOperator
Creates a pre-built stored infoType to be used for inspection.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeidentifyContentOperator
De-identifies potentially sensitive info from a ContentItem. This method has limits on input size and output size.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDeidentifyTemplateOperator
Deletes a DeidentifyTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteDlpJobOperator
Deletes a long-running DlpJob. This method indicates that the client is no longer interested in the DlpJob result. The job will be cancelled if possible.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteInspectTemplateOperator
Deletes an InspectTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteJobTriggerOperator
Deletes a job trigger.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPDeleteStoredInfoTypeOperator
Deletes a stored infoType.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDeidentifyTemplateOperator
Gets a DeidentifyTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetDlpJobOperator
Gets the latest state of a long-running DlpJob.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetInspectTemplateOperator
Gets an InspectTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetJobTripperOperator
Gets a job trigger.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPGetStoredInfoTypeOperator
Gets a stored infoType.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPInspectContentOperator
Finds potentially sensitive info in content. This method has limits on input size, processing time, and output size.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDeidentifyTemplatesOperator
Lists DeidentifyTemplates.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListDlpJobsOperator
Lists DlpJobs that match the specified filter in the request.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInfoTypesOperator
Returns a list of the sensitive information types that the DLP API supports.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListInspectTemplatesOperator
Lists InspectTemplates.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListJobTriggersOperator
Lists job triggers.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPListStoredInfoTypesOperator
Lists stored infoTypes.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPRedactImageOperator
Redacts potentially sensitive info from an image. This method has limits on input size, processing time, and output size.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPReidentifyContentOperator
Re-identifies content that has been de-identified.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateDeidentifyTemplateOperator
Updates the DeidentifyTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateInspectTemplateOperator
Updates the InspectTemplate.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateJobTriggerOperator
Updates a job trigger.
airflow.contrib.operators.gcp_dlp_operator.CloudDLPUpdateStoredInfoTypeOperator
Updates the stored infoType by creating a new version.
They also use airflow.contrib.hooks.gcp_dlp_hook.CloudDLPHook to communicate with Google Cloud Platform.
Qubole¶
Apache Airflow has a native operator and hooks to talk to Qubole, which lets you submit your big data jobs directly to Qubole from Apache Airflow.
airflow.contrib.operators.qubole_operator.QuboleOperator
Execute tasks (commands) on QDS (https://qubole.com).
airflow.contrib.sensors.qubole_sensor.QubolePartitionSensor
Wait for a Hive partition to show up in QHS (Qubole Hive Service) and check for its presence via QDS APIs.
airflow.contrib.sensors.qubole_sensor.QuboleFileSensor
Wait for a file or folder to be present in cloud storage and check for its presence via QDS APIs.
airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator
Performs checks against Qubole Commands. QuboleCheckOperator expects a command that will be executed on QDS.
airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator
Performs a simple value check using Qubole command. By default, each value on the first row of this Qubole command is compared with a pre-defined value.
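The comparison described for the value check can be sketched as follows. The function value_check is hypothetical, and the optional tolerance argument is an assumption added for illustration; the text above only states that each value in the first row is compared with a pre-defined value.

```python
def value_check(first_row, pass_value, tolerance=None):
    """Compare every value in the first result row with pass_value.

    With no tolerance the values must match exactly. With a tolerance
    (a fraction such as 0.05), each value must lie within that share of
    pass_value on either side.
    """
    if tolerance is None:
        return all(value == pass_value for value in first_row)
    low = pass_value * (1 - tolerance)
    high = pass_value * (1 + tolerance)
    return all(low <= value <= high for value in first_row)


# Illustrative first row returned by a command.
row = [100, 104]
exact_ok = value_check(row, 100)
tolerant_ok = value_check(row, 100, tolerance=0.05)
```

With these numbers the exact comparison fails because 104 differs from 100, while the 5% tolerance accepts both values.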