LangChainHook
Use LangChainHook to bridge an Airflow connection to LangChain chat and embedding models. The hook reads credentials (an API key and an optional base URL) from the connection and returns configured LangChain model objects via two universal entry-point functions:

- langchain.chat_models.init_chat_model for chat models, dispatching to the right vendor based on the provider:name prefix.
- langchain.embeddings.init_embeddings for embedding models, with the same prefix-based dispatch.
The hook registers its own langchain connection type, so the Airflow UI makes clear which framework a connection configures.
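To make the provider:name convention concrete, the stdlib-only sketch below splits an identifier into its provider and model parts. split_model_id and the KNOWN_PROVIDERS set are hypothetical illustrations, not LangChain's internal parsing; the sketch also ignores any other provider inference LangChain may perform on bare model names.

```python
from __future__ import annotations

# Hypothetical, illustrative subset of provider prefixes (not LangChain's full list).
KNOWN_PROVIDERS = {"openai", "anthropic", "groq", "mistralai", "ollama", "deepseek"}


def split_model_id(model_id: str) -> tuple[str | None, str]:
    """Split 'provider:name' into (provider, name); return (None, model_id) otherwise."""
    prefix, sep, rest = model_id.partition(":")
    if sep and prefix in KNOWN_PROVIDERS:
        # Only the first colon separates the provider; later colons stay in the name.
        return prefix, rest
    return None, model_id


print(split_model_id("openai:gpt-4o"))     # ('openai', 'gpt-4o')
print(split_model_id("ollama:llama3:8b"))  # ('ollama', 'llama3:8b')
print(split_model_id("gpt-4o"))            # (None, 'gpt-4o')
```

Splitting on the first colon only keeps tags such as llama3:8b intact in the model name.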
Chat model usage
Pass llm_model to the constructor (or set extra["model"] on the
connection) and call get_chat_model():
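```python
from airflow.providers.common.ai.hooks.langchain import LangChainHook  # assumed import path; verify against the provider docs
from airflow.sdk import dag, task


@dag(schedule=None, tags=["example"])
def example_langchain_chat():
    @task
    def summarize(text: str) -> str:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            llm_model="openai:gpt-4o",
        )
        llm = hook.get_chat_model()
        # LangChain BaseMessage.content is `str | list[...]` (multi-modal union);
        # coerce to str for the text-only path this example demonstrates.
        return str(llm.invoke(f"Summarize concisely: {text}").content)

    summarize("Apache Airflow is a platform for authoring, scheduling, and monitoring workflows.")


# Call the decorated function so Airflow registers the DAG.
example_langchain_chat()
```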
The returned model is a LangChain BaseChatModel, so it composes with the rest of LangChain’s Runnable ecosystem (ChatPromptTemplate, StrOutputParser, RunnableSequence, …).
Supported chat providers
Any model identifier accepted by langchain.chat_models.init_chat_model works out of the box, provided the matching LangChain integration package is installed. Common identifiers:

- openai:gpt-4o, openai:gpt-4o-mini – requires langchain-openai
- anthropic:claude-3-7-sonnet – requires langchain-anthropic
- groq:llama-3.3-70b-versatile – requires langchain-groq
- mistralai:mistral-large-latest – requires langchain-mistralai
- ollama:llama3 – requires langchain-ollama (point host at the Ollama URL)
- deepseek:deepseek-chat – requires langchain-deepseek
Cloud providers with non-standard authentication (AWS Bedrock, Google Vertex AI, Azure OpenAI) are not supported through the api_key + base_url pair this hook forwards. Support for them is deferred to per-vendor hooks, mirroring the cloud-auth subclass pattern used by the pydantic-ai hooks.
Embedding model usage
Pass embed_model to the constructor (or set extra["embed_model"] on
the connection) and call get_embedding_model():
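```python
from airflow.providers.common.ai.hooks.langchain import LangChainHook  # assumed import path; verify against the provider docs
from airflow.sdk import dag, task


@dag(schedule=None, tags=["example"])
def example_langchain_embedding():
    @task
    def embed_documents(texts: list[str]) -> int:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            embed_model="openai:text-embedding-3-small",
        )
        embeddings = hook.get_embedding_model()
        vectors = embeddings.embed_documents(texts)
        # Return the embedding dimension (length of the first vector).
        return len(vectors[0])

    embed_documents(
        [
            "Apache Airflow is a workflow orchestrator.",
            "Workflows are defined as Python DAGs.",
        ]
    )


# Call the decorated function so Airflow registers the DAG.
example_langchain_embedding()
```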
The same hook instance can serve both chat and embedding models when both identifiers are set:
```python
from airflow.providers.common.ai.hooks.langchain import LangChainHook  # assumed import path; verify against the provider docs
from airflow.sdk import dag, task


@dag(schedule=None, tags=["example"])
def example_langchain_chat_and_embedding():
    """One hook instance serves both chat and embeddings when both models are set."""

    @task
    def use_both() -> dict:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            llm_model="openai:gpt-4o",
            embed_model="openai:text-embedding-3-small",
        )
        chat = hook.get_chat_model()
        embeddings = hook.get_embedding_model()
        return {
            "answer": str(chat.invoke("In one sentence: what does Airflow do?").content),
            "embedding_dim": len(embeddings.embed_query("Airflow")),
        }

    use_both()


# Call the decorated function so Airflow registers the DAG.
example_langchain_chat_and_embedding()
```
Supported embedding providers
The hook passes api_key and, if set, base_url from the connection to langchain.embeddings.init_embeddings. Providers whose embedding classes accept these keyword arguments work directly:

- openai:text-embedding-3-small, openai:text-embedding-3-large – requires langchain-openai
- openai:<model> against an OpenAI-compatible endpoint (point host at Ollama / vLLM / LM Studio) – requires langchain-openai
init_embeddings advertises more providers (Cohere, Mistral AI, HuggingFace,
Bedrock, Vertex AI, Azure OpenAI, …), but their embedding classes expect
provider-specific credential kwargs (cohere_api_key, AWS auth chain, GCP
service-account, …) rather than the generic api_key / base_url this
hook forwards. Those are deferred to per-vendor subclasses mirroring the
pydantic-ai pattern (PydanticAIBedrockHook / PydanticAIVertexHook /
PydanticAIAzureHook).
Different connections for chat and embeddings
If chat and embeddings use different API keys (e.g. a premium chat key and a free-tier embeddings key), pass an explicit embed_conn_id. When it is unset, the hook falls back to llm_conn_id, so the common one-provider case stays simple:
```python
from airflow.providers.common.ai.hooks.langchain import LangChainHook  # assumed import path; verify against the provider docs
from airflow.sdk import dag, task


@dag(schedule=None, tags=["example"])
def example_langchain_different_conns():
    """Use separate connections when chat and embeddings live on different API keys."""

    @task
    def use_separate_conns() -> dict:
        hook = LangChainHook(
            llm_conn_id="openai_chat",
            embed_conn_id="openai_embed",
            llm_model="openai:gpt-4o",
            embed_model="openai:text-embedding-3-small",
        )
        chat = hook.get_chat_model()
        embeddings = hook.get_embedding_model()
        return {
            "answer": str(chat.invoke("In one sentence: what does Airflow do?").content),
            "embedding_dim": len(embeddings.embed_query("Airflow")),
        }

    use_separate_conns()


# Call the decorated function so Airflow registers the DAG.
example_langchain_different_conns()
```
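The fallback rule can be pictured with a small stdlib-only function. resolve_conn_ids is hypothetical and only restates the documented behaviour (an unset embed_conn_id reuses llm_conn_id); it assumes "unset" means None, which may differ from how the hook treats an empty string.

```python
from __future__ import annotations


def resolve_conn_ids(llm_conn_id: str, embed_conn_id: str | None = None) -> dict[str, str]:
    """Hypothetical: which connection each getter reads under the documented fallback."""
    return {
        "chat": llm_conn_id,
        # Assumption: only None counts as unset; any other value is used as given.
        "embeddings": embed_conn_id if embed_conn_id is not None else llm_conn_id,
    }


print(resolve_conn_ids("langchain_default"))
# {'chat': 'langchain_default', 'embeddings': 'langchain_default'}
print(resolve_conn_ids("openai_chat", "openai_embed"))
# {'chat': 'openai_chat', 'embeddings': 'openai_embed'}
```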
Connection Configuration
The hook reads credentials from the Airflow connection of type langchain:
- password – API key (passed as api_key to init_chat_model and init_embeddings).
- host – Optional base URL (passed as base_url; useful for custom OpenAI-compatible endpoints, Ollama, or vLLM).
- extra JSON – for example {"model": "openai:gpt-4o", "embed_model": "openai:text-embedding-3-small"}, to set default chat and embedding model identifiers on the connection.
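One way to define such a connection without the UI is Airflow's AIRFLOW_CONN_<CONN_ID> environment variable, which accepts a JSON-serialized connection in recent Airflow versions. The helper below, langchain_conn_env_var, is hypothetical; it only builds the variable name and JSON value from the fields listed above, and the API key shown is a placeholder. You would export the result in the environment of the Airflow components that run tasks.

```python
from __future__ import annotations

import json


def langchain_conn_env_var(
    conn_id: str, api_key: str, extra: dict, host: str | None = None
) -> tuple[str, str]:
    """Hypothetical helper: build an AIRFLOW_CONN_<CONN_ID> name and JSON value."""
    conn = {"conn_type": "langchain", "password": api_key, "extra": extra}
    if host:
        # Optional base URL, e.g. an OpenAI-compatible endpoint.
        conn["host"] = host
    return f"AIRFLOW_CONN_{conn_id.upper()}", json.dumps(conn)


name, value = langchain_conn_env_var(
    "langchain_default",
    api_key="sk-placeholder",  # placeholder; never commit real keys
    extra={"model": "openai:gpt-4o", "embed_model": "openai:text-embedding-3-small"},
)
print(name)  # AIRFLOW_CONN_LANGCHAIN_DEFAULT
```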
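Parameters

| Parameter | Default | Description |
|---|---|---|
| llm_conn_id |  | Airflow connection ID for the LLM provider. |
| embed_conn_id |  | Optional separate Airflow connection ID for the embedding provider. Useful when chat and embeddings use different API keys; in the common one-provider case, leave it unset and the hook reuses llm_conn_id. |
| llm_model |  | Chat model identifier in provider:name format (for example, openai:gpt-4o). Can also be set as extra["model"] on the connection. |
| embed_model |  | Embedding model identifier in provider:name format (for example, openai:text-embedding-3-small). Can also be set as extra["embed_model"] on the connection. |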
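To illustrate how these parameters and the connection fields could combine, the stdlib-only sketch below builds keyword arguments suitable for init_chat_model. resolve_chat_kwargs is hypothetical, and it assumes that a model passed to the constructor takes precedence over extra["model"]; this page does not state the actual precedence.

```python
from __future__ import annotations


def resolve_chat_kwargs(
    password: str | None,
    host: str | None,
    extra: dict,
    llm_model: str | None = None,
) -> dict:
    """Hypothetical mapping of connection fields to init_chat_model kwargs.

    Assumption: a constructor llm_model wins over extra["model"].
    """
    model = llm_model or extra.get("model")
    if not model:
        raise ValueError("Set llm_model on the hook or extra['model'] on the connection")
    kwargs: dict = {"model": model, "api_key": password}
    if host:
        kwargs["base_url"] = host  # only forwarded when the connection sets host
    return kwargs


print(resolve_chat_kwargs("sk-placeholder", None, {"model": "openai:gpt-4o-mini"}, llm_model="openai:gpt-4o"))
# {'model': 'openai:gpt-4o', 'api_key': 'sk-placeholder'}
```

The resulting dict could then be passed as init_chat_model(**kwargs), since init_chat_model takes model as its first argument and forwards other keyword arguments to the provider's model class.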
Dependencies
Install the langchain extra to use this hook:
```shell
pip install "apache-airflow-providers-common-ai[langchain]"
```
That extra installs only langchain itself, since the framework is
vendor-agnostic. Install the LangChain integration package for whichever
provider(s) you intend to use:
- langchain-openai – OpenAI and OpenAI-compatible endpoints (Ollama, vLLM)
- langchain-anthropic – Anthropic
- langchain-groq, langchain-mistralai, langchain-deepseek, langchain-ollama, …
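Because a missing integration package only surfaces when a model is first built, a pre-flight check can help. The sketch below is hypothetical: INTEGRATION_MODULES is an illustrative mapping taken from the package lists on this page, and missing_integrations uses only importlib.util.find_spec to report which packages the given identifiers would need but are not importable.

```python
import importlib.util

# Hypothetical mapping: provider prefix -> import name of its LangChain integration
# package (illustrative subset of the packages listed above).
INTEGRATION_MODULES = {
    "openai": "langchain_openai",
    "anthropic": "langchain_anthropic",
    "groq": "langchain_groq",
    "mistralai": "langchain_mistralai",
    "deepseek": "langchain_deepseek",
    "ollama": "langchain_ollama",
}


def missing_integrations(model_ids: list[str]) -> list[str]:
    """Return pip names of required integration packages that are not installed."""
    missing = set()
    for model_id in model_ids:
        provider, sep, _ = model_id.partition(":")
        module = INTEGRATION_MODULES.get(provider) if sep else None
        if module is not None and importlib.util.find_spec(module) is None:
            # PyPI names use hyphens where import names use underscores.
            missing.add(module.replace("_", "-"))
    return sorted(missing)


print(missing_integrations(["openai:gpt-4o", "openai:text-embedding-3-small"]))
```

Identifiers with an unknown or absent prefix are skipped rather than reported, so the check never blocks providers it does not know about.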