airflow.providers.common.ai.operators.llm_file_analysis
Operator for analyzing files with LLMs.
Classes
LLMFileAnalysisOperator | Analyze files from object storage or local storage using a single LLM call.
Module Contents
- class airflow.providers.common.ai.operators.llm_file_analysis.LLMFileAnalysisOperator(*, file_path, file_conn_id=None, multi_modal=False, max_files=20, max_file_size_bytes=5 * 1024 * 1024, max_total_size_bytes=20 * 1024 * 1024, max_text_chars=100000, sample_rows=10, **kwargs)
Bases: airflow.providers.common.ai.operators.llm.LLMOperator
Analyze files from object storage or local storage using a single LLM call.
The operator resolves file_path via ObjectStoragePath, normalizes supported formats into text context, and optionally attaches images/PDFs as multimodal inputs when multi_modal=True.
Parameters:
prompt – The analysis prompt for the LLM.
llm_conn_id – Connection ID for the LLM provider.
file_path (str) – File or prefix to analyze.
file_conn_id (str | None) – Optional connection ID for the storage backend. Overrides a connection embedded in file_path.
multi_modal (bool) – Allow PNG/JPG/PDF inputs as binary attachments. Default False.
max_files (int) – Maximum number of files to include from a prefix. Excess files are omitted and noted in the prompt. Default 20.
max_file_size_bytes (int) – Maximum size of any single input file, in bytes. Default 5 MiB (5 * 1024 * 1024 bytes).
max_total_size_bytes (int) – Maximum cumulative size across all resolved files, in bytes. Default 20 MiB (20 * 1024 * 1024 bytes).
max_text_chars (int) – Maximum number of characters of normalized text context passed to the LLM after sampling/truncation. Default 100000.
sample_rows (int) – Maximum number of sampled rows or records included for CSV, Parquet, and Avro inputs. This limits structural preview depth, while max_file_size_bytes and max_total_size_bytes limit bytes read from storage and max_text_chars limits the final prompt text budget. Default 10.
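The limits above act at different stages: a file-count cap, per-file and cumulative byte caps, row sampling for tabular inputs, and a final character cap on the prompt text. The following is a minimal plain-Python sketch of one plausible way those budgets compose. It is not the provider's implementation: the helpers plan_files, sample_csv, truncate_context, and the Plan class are hypothetical. Raising ValueError on a size violation, rejecting PNG/JPG/PDF files when multi_modal is False, and treating "sampled rows" as the first sample_rows data rows are all assumptions this page does not document.

```python
import csv
import io
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Defaults copied from the documented signature.
MAX_FILES = 20
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024     # 5 MiB = 5,242,880 bytes
MAX_TOTAL_SIZE_BYTES = 20 * 1024 * 1024   # 20 MiB = 20,971,520 bytes
MAX_TEXT_CHARS = 100_000
SAMPLE_ROWS = 10

# Assumption: suffixes routed to binary attachments when multi_modal=True.
ATTACHMENT_SUFFIXES = {".png", ".jpg", ".pdf"}


@dataclass
class Plan:
    """Hypothetical result of applying the file limits to a resolved prefix."""
    text_files: list[str] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)
    omitted: int = 0  # files dropped by max_files, to be noted in the prompt


def plan_files(
    files: list[tuple[str, int]],
    multi_modal: bool = False,
    max_files: int = MAX_FILES,
    max_file_size_bytes: int = MAX_FILE_SIZE_BYTES,
    max_total_size_bytes: int = MAX_TOTAL_SIZE_BYTES,
) -> Plan:
    """Apply the count and byte limits to (path, size_in_bytes) pairs."""
    plan = Plan(omitted=max(len(files) - max_files, 0))
    total = 0
    for path, size in files[:max_files]:
        # Assumption: a size violation raises instead of skipping the file.
        if size > max_file_size_bytes:
            raise ValueError(f"{path} exceeds max_file_size_bytes")
        total += size
        if total > max_total_size_bytes:
            raise ValueError("resolved files exceed max_total_size_bytes")
        if PurePosixPath(path).suffix.lower() in ATTACHMENT_SUFFIXES:
            # Assumption: images/PDFs are rejected unless multi_modal=True.
            if not multi_modal:
                raise ValueError(f"{path} requires multi_modal=True")
            plan.attachments.append(path)
        else:
            plan.text_files.append(path)
    return plan


def sample_csv(raw: str, sample_rows: int = SAMPLE_ROWS) -> str:
    """Keep the header plus the first sample_rows data rows of a CSV."""
    reader = csv.reader(io.StringIO(raw))
    header = next(reader, [])
    # zip stops after sample_rows rows without consuming the rest of the input.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def truncate_context(text: str, max_text_chars: int = MAX_TEXT_CHARS) -> str:
    """Cap the final normalized prompt text at max_text_chars characters."""
    return text[:max_text_chars]
```

For example, calling plan_files with 25 one-byte .txt files keeps 20 of them in text_files and reports omitted == 5, which mirrors the documented "excess files are omitted and noted in the prompt" behavior of max_files.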
- template_fields: collections.abc.Sequence[str] = ('prompt', 'llm_conn_id', 'model_id', 'system_prompt', 'agent_params', 'file_path', 'file_conn_id')