Google Cloud Natural Language Operators¶
The Google Cloud Natural Language API reveals the structure and meaning of text through machine learning models. You can use it to extract information about people, places, events, and more from text documents, news articles, or blog posts. You can also use it to gauge sentiment about your product on social media, or to parse intent from customer conversations in a call center or a messaging app.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Select or create a Cloud Platform project using Cloud Console.
Enable billing for your project, as described in Google Cloud documentation.
Enable the API, as described in Cloud Console documentation.
Install API libraries via pip.
pip install 'apache-airflow[gcp]'
Detailed information is available in the Installation documentation.
Documents¶
Each operator uses a Document to represent text.
Here is an example of a document with text provided as a string:
TEXT = """
Airflow is a platform to programmatically author, schedule and monitor workflows.
Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes
your tasks on an array of workers while following the specified dependencies. Rich command line utilities
make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize
pipelines running in production, monitor progress, and troubleshoot issues when needed.
"""
document = Document(content=TEXT, type="PLAIN_TEXT")
In addition to supplying a string, a document can refer to content stored in Google Cloud Storage:
GCS_CONTENT_URI = "gs://my-text-bucket/sentiment-me.txt"
document_gcs = Document(gcs_content_uri=GCS_CONTENT_URI, type="PLAIN_TEXT")
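The two document forms above differ only in where the text comes from: inline content or a Cloud Storage URI. As a minimal sketch, the helper below builds the same information as a plain dict and checks that exactly one source is given. The name make_document is hypothetical and not part of Airflow or the Google client library; it assumes the operators accept a dict with the same field names as the Document examples above, which may differ between library versions.

```python
from typing import Optional


def make_document(content: Optional[str] = None,
                  gcs_content_uri: Optional[str] = None,
                  doc_type: str = "PLAIN_TEXT") -> dict:
    """Build a dict mirroring the Document fields shown above (hypothetical helper)."""
    # A document has exactly one source: inline text or a Cloud Storage URI.
    if (content is None) == (gcs_content_uri is None):
        raise ValueError("provide exactly one of content or gcs_content_uri")
    if doc_type not in ("PLAIN_TEXT", "HTML"):
        raise ValueError(f"unsupported document type: {doc_type}")
    if gcs_content_uri is not None:
        if not gcs_content_uri.startswith("gs://"):
            raise ValueError("gcs_content_uri must start with gs://")
        return {"gcs_content_uri": gcs_content_uri, "type": doc_type}
    return {"content": content, "type": doc_type}


# Field names follow the Document examples above (assumption).
document_dict = make_document(content="Airflow is a platform to author workflows.")
document_gcs_dict = make_document(gcs_content_uri="gs://my-text-bucket/sentiment-me.txt")
```

Validating the source up front surfaces a malformed URI when the DAG file is parsed rather than when the task runs.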
Analyzing Entities¶
Entity Analysis inspects the given text for known entities (proper nouns such as
public figures, landmarks, etc.), and returns information about those entities.
Entity analysis is performed with the CloudLanguageAnalyzeEntitiesOperator operator.
analyze_entities = CloudLanguageAnalyzeEntitiesOperator(document=document, task_id="analyze_entities")
You can use Jinja templating with the document and gcp_conn_id parameters, which allows you to determine values dynamically.
The result is saved to XCom, which allows it to be used by other operators.
analyze_entities_result = BashOperator(
    bash_command="echo \"{{ task_instance.xcom_pull('analyze_entities') }}\"",
    task_id="analyze_entities_result",
)
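Beyond echoing the XCom value, a downstream task will often want to pick out the most relevant entities. The sketch below assumes the pulled value is a dict form of the API response with an "entities" list whose items carry "name", "type", and "salience" keys; that shape is an assumption about how the operator serializes the response. The function top_entities and the sample payload are hypothetical and made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def top_entities(result: Dict[str, Any], limit: int = 3) -> List[Tuple[str, str, float]]:
    """Return (name, type, salience) for the most salient entities."""
    entities = result.get("entities", [])
    # Fields with default values may be absent from the dict, so fall back to 0.0.
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [
        (e.get("name", ""), e.get("type", "UNKNOWN"), e.get("salience", 0.0))
        for e in ranked[:limit]
    ]


# Made-up payload for illustration only; not real API output.
sample_result = {
    "entities": [
        {"name": "workflows", "type": "OTHER", "salience": 0.21},
        {"name": "Airflow", "type": "ORGANIZATION", "salience": 0.62},
        {"name": "DAGs", "type": "OTHER", "salience": 0.09},
    ],
    "language": "en",
}

top = top_entities(sample_result, limit=2)
```

In a DAG, a function like this could run inside a Python task that pulls the "analyze_entities" result from XCom.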
Analyzing Entity Sentiment¶
Entity Sentiment Analysis combines entity analysis and sentiment analysis: it
inspects the given text for known entities and attempts to determine the
sentiment (positive or negative) expressed about each of them. Entity sentiment
analysis is performed with the CloudLanguageAnalyzeEntitySentimentOperator operator.
analyze_entity_sentiment = CloudLanguageAnalyzeEntitySentimentOperator(
    document=document, task_id="analyze_entity_sentiment"
)
You can use Jinja templating with the document and gcp_conn_id parameters, which allows you to determine values dynamically.
The result is saved to XCom, which allows it to be used by other operators.
analyze_entity_sentiment_result = BashOperator(
    bash_command="echo \"{{ task_instance.xcom_pull('analyze_entity_sentiment') }}\"",
    task_id="analyze_entity_sentiment_result",
)
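One way to act on a per-entity result is to flag the entities that are discussed negatively. The sketch below assumes each item in the "entities" list carries a "sentiment" dict with a "score" between -1.0 and 1.0; the payload shape, the function name negative_entities, and the -0.25 threshold are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple


def negative_entities(result: Dict[str, Any], threshold: float = -0.25) -> List[Tuple[str, float]]:
    """Return (name, score) for entities scoring at or below threshold, most negative first."""
    flagged = []
    for entity in result.get("entities", []):
        # A missing sentiment or score is treated as neutral (0.0).
        score = entity.get("sentiment", {}).get("score", 0.0)
        if score <= threshold:
            flagged.append((entity.get("name", ""), score))
    return sorted(flagged, key=lambda item: item[1])


# Made-up payload for illustration only; not real API output.
sample_result = {
    "entities": [
        {"name": "Airflow", "sentiment": {"score": 0.6, "magnitude": 1.2}},
        {"name": "issues", "sentiment": {"score": -0.5, "magnitude": 0.5}},
        {"name": "surgeries", "sentiment": {"score": -0.3, "magnitude": 0.3}},
        {"name": "workers"},
    ]
}

flagged = negative_entities(sample_result)
```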
Analyzing Sentiment¶
Sentiment Analysis inspects the given text and identifies the prevailing
emotional opinion within the text, especially to determine a writer’s
attitude as positive, negative, or neutral. Sentiment analysis is performed
through the CloudLanguageAnalyzeSentimentOperator operator.
analyze_sentiment = CloudLanguageAnalyzeSentimentOperator(document=document, task_id="analyze_sentiment")
You can use Jinja templating with the document and gcp_conn_id parameters, which allows you to determine values dynamically.
The result is saved to XCom, which allows it to be used by other operators.
analyze_sentiment_result = BashOperator(
    bash_command="echo \"{{ task_instance.xcom_pull('analyze_sentiment') }}\"",
    task_id="analyze_sentiment_result",
)
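The positive, negative, or neutral reading described above can be derived from the numeric score in the result. The sketch below assumes the XCom value is a dict with a "documentSentiment" entry holding a "score" between -1.0 and 1.0. The key names, the function label_sentiment, and the 0.25 neutral band are assumptions for illustration, not values defined by the API.

```python
from typing import Any, Dict


def label_sentiment(result: Dict[str, Any], neutral_band: float = 0.25) -> str:
    """Map the document-level score to 'positive', 'negative', or 'neutral'."""
    # A missing score is treated as 0.0, which falls in the neutral band.
    score = result.get("documentSentiment", {}).get("score", 0.0)
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


# Made-up payloads for illustration only; not real API output.
positive_label = label_sentiment({"documentSentiment": {"score": 0.8, "magnitude": 3.2}})
negative_label = label_sentiment({"documentSentiment": {"score": -0.6, "magnitude": 1.1}})
neutral_label = label_sentiment({"documentSentiment": {"magnitude": 0.1}})
```

The width of the neutral band is a design choice; a wider band reports fewer documents as clearly positive or negative.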
Classifying Content¶
Content Classification analyzes a document and returns a list of content
categories that apply to the text found in the document. To classify the
content in a document, use the CloudLanguageClassifyTextOperator operator.
analyze_classify_text = CloudLanguageClassifyTextOperator(
    document=document, task_id="analyze_classify_text"
)
You can use Jinja templating with the document and gcp_conn_id parameters, which allows you to determine values dynamically.
The result is saved to XCom, which allows it to be used by other operators.
analyze_classify_text_result = BashOperator(
    bash_command="echo \"{{ task_instance.xcom_pull('analyze_classify_text') }}\"",
    task_id="analyze_classify_text_result",
)
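A classification result usually lists several categories with differing confidence, so a downstream task may want to keep only the confident ones. The sketch below assumes the XCom value is a dict with a "categories" list whose items have "name" and "confidence" keys. The function confident_categories, the 0.5 cutoff, and the sample category names are hypothetical.

```python
from typing import Any, Dict, List


def confident_categories(result: Dict[str, Any], min_confidence: float = 0.5) -> List[str]:
    """Return the names of categories whose confidence meets the cutoff."""
    return [
        category.get("name", "")
        for category in result.get("categories", [])
        if category.get("confidence", 0.0) >= min_confidence
    ]


# Made-up payload for illustration only; not real API output.
sample_result = {
    "categories": [
        {"name": "/Computers & Electronics/Software", "confidence": 0.83},
        {"name": "/Business & Industrial", "confidence": 0.31},
    ]
}

selected = confident_categories(sample_result)
```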