airflow.providers.common.ai.operators.llm_file_analysis

Operator for analyzing files with LLMs.

Classes

LLMFileAnalysisOperator

Analyze files from object storage or local storage using a single LLM call.

Module Contents

class airflow.providers.common.ai.operators.llm_file_analysis.LLMFileAnalysisOperator(*, file_path, file_conn_id=None, multi_modal=False, max_files=20, max_file_size_bytes=5 * 1024 * 1024, max_total_size_bytes=20 * 1024 * 1024, max_text_chars=100000, sample_rows=10, **kwargs)

Bases: airflow.providers.common.ai.operators.llm.LLMOperator

Analyze files from object storage or local storage using a single LLM call.

The operator resolves file_path via ObjectStoragePath, normalizes supported formats into text context, and optionally attaches images/PDFs as multimodal inputs when multi_modal=True.
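As a rough illustration of the description above, the operator might be configured as follows. This is a minimal sketch, not an official example: the task ID, bucket path, connection IDs, and prompt text are hypothetical placeholders, and only the keyword names come from the documented signature (task_id is the usual operator argument forwarded through **kwargs).

```python
# Hypothetical values throughout: the bucket path, connection IDs, and prompt
# are placeholders, not defaults shipped with the provider.
ANALYSIS_KWARGS = {
    "task_id": "analyze_daily_reports",
    "prompt": "Summarize anomalies across these report files.",
    "llm_conn_id": "my_llm_conn",
    "file_path": "s3://example-bucket/reports/",
    "file_conn_id": "my_storage_conn",
    "multi_modal": False,  # text-only; images and PDFs are not attached
    "max_files": 5,
    "sample_rows": 10,
}


def build_analysis_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this snippet loads even where the provider is absent.
    from airflow.providers.common.ai.operators.llm_file_analysis import (
        LLMFileAnalysisOperator,
    )

    return LLMFileAnalysisOperator(**ANALYSIS_KWARGS)
```

Calling build_analysis_task() requires the provider package to be installed; the dictionary is kept separate so the settings can be inspected or reused without importing Airflow.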

Parameters:
  • prompt – The analysis prompt for the LLM.

  • llm_conn_id – Connection ID for the LLM provider.

  • file_path (str) – File or prefix to analyze.

  • file_conn_id (str | None) – Optional connection ID for the storage backend. Overrides a connection embedded in file_path.

  • multi_modal (bool) – Allow PNG/JPG/PDF inputs as binary attachments. Default False.

  • max_files (int) – Maximum number of files to include from a prefix. Excess files are omitted and noted in the prompt. Default 20.

  • max_file_size_bytes (int) – Maximum size of any single input file. Default 5 MiB.

  • max_total_size_bytes (int) – Maximum cumulative size across all resolved files. Default 20 MiB.

  • max_text_chars (int) – Maximum number of characters of normalized text context passed to the LLM after sampling/truncation. Default 100000.

  • sample_rows (int) – Maximum number of sampled rows or records included for CSV, Parquet, and Avro inputs. This caps how deep the structural preview goes; max_file_size_bytes and max_total_size_bytes cap the bytes read from storage, and max_text_chars caps the final prompt text. Default 10.
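The way the file-count and size limits in the parameter list relate to each other can be sketched in plain Python. This is an illustrative model only, not the provider's implementation: apply_limits and SelectionReport are hypothetical names, and because this page does not say whether a size violation raises an error or skips the file, the sketch only records violations. The one behavior taken from the documentation is that files beyond max_files are omitted and counted.

```python
from dataclasses import dataclass, field

MIB = 1024 * 1024


@dataclass
class SelectionReport:
    """Outcome of checking a candidate file list against the documented limits."""

    considered: list[str] = field(default_factory=list)
    omitted_count: int = 0  # files dropped because of max_files
    violations: list[str] = field(default_factory=list)


def apply_limits(
    files,
    max_files=20,
    max_file_size_bytes=5 * MIB,
    max_total_size_bytes=20 * MIB,
):
    """Check (name, size_bytes) tuples, in listing order, against the limits."""
    report = SelectionReport()
    # Excess files are omitted (documented); the count can be noted in the prompt.
    report.omitted_count = max(0, len(files) - max_files)
    total = 0
    for name, size in files[:max_files]:
        report.considered.append(name)
        total += size
        if size > max_file_size_bytes:
            # Assumption: we only record the problem; the real operator's
            # reaction (error vs. skip) is not specified on this page.
            report.violations.append(f"{name}: {size} bytes exceeds max_file_size_bytes")
    if total > max_total_size_bytes:
        report.violations.append(f"total {total} bytes exceeds max_total_size_bytes")
    return report
```

For instance, with files of 1 MiB, 6 MiB, and 2 MiB and max_files=2, the third file is omitted and the second is flagged against the 5 MiB per-file default.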

template_fields: collections.abc.Sequence[str] = ('prompt', 'llm_conn_id', 'model_id', 'system_prompt', 'agent_params', 'file_path', 'file_conn_id')
file_path
file_conn_id = None
multi_modal = False
max_files = 20
max_file_size_bytes = 5242880
max_total_size_bytes = 20971520
max_text_chars = 100000
sample_rows = 10
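To make the sample_rows and max_text_chars defaults above concrete, here is a small sketch of row sampling followed by character truncation for CSV text. The function preview_csv is hypothetical and uses only the standard library; the actual text format the operator produces for CSV, Parquet, or Avro inputs is not described on this page, so treat the output shape as an assumption.

```python
import csv
import io
from itertools import islice


def preview_csv(text, sample_rows=10, max_text_chars=100000):
    """Return the header plus at most sample_rows data rows, cut to max_text_chars."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ""  # empty input: nothing to preview
    lines = [",".join(header)]
    # sample_rows bounds how many records appear in the preview.
    for row in islice(reader, sample_rows):
        lines.append(",".join(row))
    preview = "\n".join(lines)
    # max_text_chars bounds the final text, independent of the row count.
    return preview[:max_text_chars]
```

The two limits act at different stages: the row cap is applied while reading records, and the character cap is applied once to the assembled text.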
execute(context)

Override this method when deriving a new operator.

This is the main method that executes the task. context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for details.

execute_complete(context, generated_output, event)

Resume after human review, restoring structured outputs for XCom consumers.
