airflow.providers.common.ai.operators.llm_file_analysis

Operator for analyzing files with LLMs.

Classes

LLMFileAnalysisOperator

Analyze files from object storage or local storage using a single LLM call.

Module Contents

class airflow.providers.common.ai.operators.llm_file_analysis.LLMFileAnalysisOperator(*, file_path, file_conn_id=None, multi_modal=False, max_files=20, max_file_size_bytes=5 * 1024 * 1024, max_total_size_bytes=20 * 1024 * 1024, max_text_chars=100000, sample_rows=10, **kwargs)

Bases: airflow.providers.common.ai.operators.llm.LLMOperator

Analyze files from object storage or local storage using a single LLM call.

The operator resolves file_path via ObjectStoragePath, normalizes supported formats into text context, and optionally attaches images/PDFs as multimodal inputs when multi_modal=True.
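A minimal sketch of how the keyword arguments for a task might be assembled. The task name, connection IDs, storage prefix, and prompt text are hypothetical placeholders, not values from this documentation. The import and construction are left as comments so the snippet runs without Airflow installed.

```python
# Keyword arguments mirroring the documented signature.
# All concrete values (names, IDs, paths, prompt text) are hypothetical.
analysis_kwargs = {
    "task_id": "analyze_reports",            # hypothetical task name
    "prompt": "Summarize anomalies in these files.",
    "llm_conn_id": "my_llm_conn",            # hypothetical connection ID
    "file_path": "s3://my-bucket/reports/",  # hypothetical prefix
    "file_conn_id": "my_storage_conn",       # hypothetical connection ID
    "multi_modal": False,
    "max_files": 20,
    "max_file_size_bytes": 5 * 1024 * 1024,
    "max_total_size_bytes": 20 * 1024 * 1024,
    "max_text_chars": 100000,
    "sample_rows": 10,
}

# With the provider installed, inside a DAG definition, this would become:
# from airflow.providers.common.ai.operators.llm_file_analysis import (
#     LLMFileAnalysisOperator,
# )
# analyze = LLMFileAnalysisOperator(**analysis_kwargs)
```

Everything except task_id, prompt, llm_conn_id, and file_path is set to its documented default here, so those entries could be omitted.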

Parameters:

prompt – The analysis prompt for the LLM.
llm_conn_id – Connection ID for the LLM provider.
file_path (str) – File or prefix to analyze.
file_conn_id (str | None) – Optional connection ID for the storage backend. Overrides a connection embedded in file_path.
multi_modal (bool) – Allow PNG/JPG/PDF inputs as binary attachments. Default False.
max_files (int) – Maximum number of files to include from a prefix. Excess files are omitted and noted in the prompt. Default 20.
max_file_size_bytes (int) – Maximum size of any single input file. Default 5 MiB.
max_total_size_bytes (int) – Maximum cumulative size across all resolved files. Default 20 MiB.
max_text_chars (int) – Maximum number of characters of normalized text context passed to the LLM after sampling/truncation. Default 100000.
sample_rows (int) – Maximum number of sampled rows or records included for CSV, Parquet, and Avro inputs. This limits how deep the structural preview goes; max_file_size_bytes and max_total_size_bytes limit the bytes read from storage, and max_text_chars limits the final prompt text budget. Default 10.
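The interplay between sample_rows and max_text_chars can be illustrated with a small standard-library sketch. This is not the provider's implementation: the preview_csv helper, its omission note, and the sample data are assumptions made for illustration only.

```python
import csv
import io


def preview_csv(text: str, sample_rows: int = 10, max_text_chars: int = 100000) -> str:
    """Hypothetical helper: header plus at most ``sample_rows`` records, capped in length."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, records = rows[0], rows[1:]
    sampled = records[:sample_rows]  # row-level sampling

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(sampled)
    preview = out.getvalue()

    # Note how many records were left out of the preview.
    omitted = len(records) - len(sampled)
    if omitted > 0:
        preview += f"... {omitted} more rows omitted\n"

    # Character-level cap applied last, after sampling.
    return preview[:max_text_chars]


raw = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(50))
preview = preview_csv(raw, sample_rows=3)
print(preview)
```

In this sketch the row limit shapes the preview first and the character cap acts as a final safeguard, which matches the order described above (sampling, then truncation).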

template_fields: collections.abc.Sequence[str] = ('prompt', 'llm_conn_id', 'model_id', 'system_prompt', 'agent_params', 'file_path', 'file_conn_id')

file_path

file_conn_id = None

multi_modal = False

max_files = 20

max_file_size_bytes = 5242880

max_total_size_bytes = 20971520
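The two byte values above are the evaluated forms of the expressions in the class signature. A short check of the arithmetic, using a MIB constant introduced here only for readability:

```python
MIB = 1024 * 1024  # one mebibyte in bytes (name introduced for this example)

max_file_size_bytes = 5 * MIB
max_total_size_bytes = 20 * MIB

print(max_file_size_bytes)   # 5242880
print(max_total_size_bytes)  # 20971520

# At these defaults, the total budget fits at most four maximum-size files,
# even though max_files allows up to 20 entries.
print(max_total_size_bytes // max_file_size_bytes)  # 4
```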

max_text_chars = 100000

sample_rows = 10

execute(context)

The main method called to execute the task. Override it when deriving an operator.

context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more details.

execute_complete(context, generated_output, event)

Resume after human review, restoring structured outputs for XCom consumers.