Google Cloud Natural Language Operators¶
The Google Cloud Natural Language API uses machine learning models to reveal the structure and meaning of text. You can use it to extract information about people, places, and events mentioned in text documents, news articles, or blog posts. You can also use it to understand sentiment about your product on social media, or to parse intent from customer conversations in a call center or messaging app.
Documents¶
Each operator uses a Document to represent text.
Here is an example of a document with text provided as a string:
TEXT = """Airflow is a platform to programmatically author, schedule and monitor workflows.
Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes
your tasks on an array of workers while following the specified dependencies. Rich command line utilities
make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize
pipelines running in production, monitor progress, and troubleshoot issues when needed.
"""
document = Document(content=TEXT, type="PLAIN_TEXT")
Instead of supplying a string, a document can refer to content stored in Google Cloud Storage.
GCS_CONTENT_URI = "gs://my-text-bucket/sentiment-me.txt"
document_gcs = Document(gcs_content_uri=GCS_CONTENT_URI, type="PLAIN_TEXT")
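The two forms above differ only in where the text comes from. As a minimal sketch, the same two documents can be described as plain dictionaries that mirror the Document fields shown above (content, gcs_content_uri, type). The helper name make_document is hypothetical and is not part of Airflow or the Google client library. Whether an operator accepts a dictionary in place of a Document is an assumption to check against the operator reference for your provider version.

```python
from typing import Optional


def make_document(
    content: Optional[str] = None,
    gcs_content_uri: Optional[str] = None,
    doc_type: str = "PLAIN_TEXT",
) -> dict:
    """Build a dict using the same field names as the Document examples.

    Hypothetical helper, for illustration only.
    """
    # Exactly one source of text must be supplied: inline or in GCS.
    if (content is None) == (gcs_content_uri is None):
        raise ValueError("Provide exactly one of content or gcs_content_uri")
    if gcs_content_uri is not None and not gcs_content_uri.startswith("gs://"):
        raise ValueError("gcs_content_uri must start with gs://")

    document = {"type": doc_type}
    if content is not None:
        document["content"] = content
    else:
        document["gcs_content_uri"] = gcs_content_uri
    return document


inline_doc = make_document(content="Airflow is a platform to author workflows.")
gcs_doc = make_document(gcs_content_uri="gs://my-text-bucket/sentiment-me.txt")
print(inline_doc)
print(gcs_doc)
```

Rejecting a document that sets both sources, or neither, surfaces the mistake when the DAG file is parsed rather than when the task runs.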
Analyzing Entities¶
Entity Analysis inspects the given text for known entities (proper nouns such as
public figures, landmarks, etc.), and returns information about those entities.
Entity analysis is performed with the CloudNaturalLanguageAnalyzeEntitiesOperator operator.
analyze_entities = CloudNaturalLanguageAnalyzeEntitiesOperator(
document=document, task_id="analyze_entities"
)
You can use Jinja templating with the document, gcp_conn_id, and impersonation_chain parameters, which allows you to determine their values dynamically. The result is saved to XCom, so other operators can use it.
analyze_entities_result = BashOperator(
bash_command="echo \"{{ task_instance.xcom_pull('analyze_entities') }}\"",
task_id="analyze_entities_result",
)
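Once the result is in XCom, a downstream Python task can treat it as ordinary data. The sketch below assumes the pulled value is a dictionary with an entities list whose items carry name, type, and salience keys, following the field names of the Natural Language API entity response. The exact serialized shape can vary between provider versions, and the sample payload is made up for illustration. top_entities is a hypothetical helper.

```python
def top_entities(result: dict, limit: int = 3) -> list:
    """Return (name, type) pairs for the most salient entities.

    Assumes `result` looks like {"entities": [{"name", "type", "salience"}, ...]}.
    """
    entities = result.get("entities", [])
    # A missing salience is treated as 0.0 so incomplete items sort last.
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e.get("name"), e.get("type")) for e in ranked[:limit]]


# Made-up payload standing in for xcom_pull("analyze_entities").
sample_result = {
    "entities": [
        {"name": "workflows", "type": "OTHER", "salience": 0.21},
        {"name": "Airflow", "type": "ORGANIZATION", "salience": 0.55},
        {"name": "scheduler", "type": "OTHER", "salience": 0.08},
    ],
    "language": "en",
}

top_two = top_entities(sample_result, limit=2)
print(top_two)
```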
Analyzing Entity Sentiment¶
Entity Sentiment Analysis combines entity analysis and sentiment analysis: it
inspects the given text for known entities and attempts to determine the
sentiment (positive or negative) expressed about each entity. Entity sentiment
analysis is performed with the CloudNaturalLanguageAnalyzeEntitySentimentOperator operator.
analyze_entity_sentiment = CloudNaturalLanguageAnalyzeEntitySentimentOperator(
document=document, task_id="analyze_entity_sentiment"
)
You can use Jinja templating with the document, gcp_conn_id, and impersonation_chain parameters, which allows you to determine their values dynamically. The result is saved to XCom, so other operators can use it.
analyze_entity_sentiment_result = BashOperator(
bash_command="echo \"{{ task_instance.xcom_pull('analyze_entity_sentiment') }}\"",
task_id="analyze_entity_sentiment_result",
)
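As a rough sketch of consuming this result downstream, the snippet below picks out entities whose sentiment score falls at or below a threshold. It assumes each item in entities carries a nested sentiment dictionary with a score key, as in the Natural Language API response. The helper negative_entities, the -0.25 threshold, and the sample payload are illustrative assumptions, not part of the operator.

```python
def negative_entities(result: dict, threshold: float = -0.25) -> list:
    """Return names of entities whose sentiment score is <= threshold."""
    names = []
    for entity in result.get("entities", []):
        # A missing sentiment or score is treated as neutral (0.0).
        score = entity.get("sentiment", {}).get("score", 0.0)
        if score <= threshold:
            names.append(entity.get("name"))
    return names


# Made-up payload standing in for xcom_pull("analyze_entity_sentiment").
sample_result = {
    "entities": [
        {"name": "user interface", "sentiment": {"score": 0.7, "magnitude": 0.7}},
        {"name": "scheduler", "sentiment": {"score": -0.4, "magnitude": 0.4}},
        {"name": "workers", "sentiment": {}},
    ]
}

flagged = negative_entities(sample_result)
print(flagged)
```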
Analyzing Sentiment¶
Sentiment Analysis inspects the given text and identifies the prevailing
emotional opinion within the text, especially to determine a writer's
attitude as positive, negative, or neutral. Sentiment analysis is performed
through the CloudNaturalLanguageAnalyzeSentimentOperator operator.
analyze_sentiment = CloudNaturalLanguageAnalyzeSentimentOperator(
document=document, task_id="analyze_sentiment"
)
You can use Jinja templating with the document, gcp_conn_id, and impersonation_chain parameters, which allows you to determine their values dynamically. The result is saved to XCom, so other operators can use it.
analyze_sentiment_result = BashOperator(
bash_command="echo \"{{ task_instance.xcom_pull('analyze_sentiment') }}\"",
task_id="analyze_sentiment_result",
)
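To turn the stored result into the positive, negative, or neutral reading described above, a downstream task could bucket the document-level score. This is a minimal sketch under stated assumptions: the score is taken to lie between -1.0 and 1.0, the document-level sentiment is looked up under either documentSentiment or document_sentiment because the key casing depends on how the response was serialized, and the 0.25 neutral band is an arbitrary illustrative choice. label_sentiment is a hypothetical helper.

```python
def label_sentiment(result: dict, neutral_band: float = 0.25) -> str:
    """Map a document-level sentiment score to a coarse label."""
    # Accept both camelCase and snake_case keys (assumption: either may appear).
    sentiment = result.get("documentSentiment") or result.get("document_sentiment") or {}
    # A missing score is treated as 0.0, which lands in the neutral bucket.
    score = sentiment.get("score", 0.0)
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


# Made-up payloads standing in for xcom_pull("analyze_sentiment").
print(label_sentiment({"documentSentiment": {"score": 0.6, "magnitude": 1.2}}))
print(label_sentiment({"document_sentiment": {"score": -0.5, "magnitude": 0.5}}))
print(label_sentiment({"documentSentiment": {"magnitude": 0.1}}))
```

The magnitude value is ignored here. A fuller treatment might use it to separate genuinely neutral text from text with mixed strong opinions that cancel out.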
Classifying Content¶
Content Classification analyzes a document and returns a list of content
categories that apply to the text found in the document. To classify the
content in a document, use the CloudNaturalLanguageClassifyTextOperator operator.
analyze_classify_text = CloudNaturalLanguageClassifyTextOperator(
document=document, task_id="analyze_classify_text"
)
You can use Jinja templating with the document, gcp_conn_id, and impersonation_chain parameters, which allows you to determine their values dynamically. The result is saved to XCom, so other operators can use it.
analyze_classify_text_result = BashOperator(
bash_command="echo \"{{ task_instance.xcom_pull('analyze_classify_text') }}\"",
task_id="analyze_classify_text_result",
)
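A downstream task will often want only the categories the model is reasonably sure about. The sketch below assumes the pulled value has a categories list whose items carry name and confidence keys, matching the Natural Language API classification response. The helper confident_categories, the 0.5 cutoff, and the sample payload are illustrative assumptions.

```python
def confident_categories(result: dict, min_confidence: float = 0.5) -> list:
    """Return category names whose confidence is at least min_confidence."""
    categories = result.get("categories", [])
    return [
        category["name"]
        for category in categories
        # A missing confidence is treated as 0.0 and filtered out.
        if category.get("confidence", 0.0) >= min_confidence
    ]


# Made-up payload standing in for xcom_pull("analyze_classify_text").
sample_result = {
    "categories": [
        {"name": "/Computers & Electronics/Software", "confidence": 0.82},
        {"name": "/Business & Industrial", "confidence": 0.31},
    ]
}

kept = confident_categories(sample_result)
print(kept)
```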