airflow.providers.elasticsearch.log.es_task_handler
¶
Module Contents¶
-
class
airflow.providers.elasticsearch.log.es_task_handler.
ElasticsearchTaskHandler
(base_log_folder: str, filename_template: str, log_id_template: str, end_of_log_mark: str, write_stdout: bool, json_format: bool, json_fields: str, host_field: str = 'host', offset_field: str = 'offset', host: str = 'localhost:9200', frontend: str = 'localhost:5601', es_kwargs: Optional[dict] = conf.getsection('elasticsearch_configs'))[source]¶ Bases:
airflow.utils.log.file_task_handler.FileTaskHandler
,airflow.utils.log.logging_mixin.ExternalLoggingMixin
,airflow.utils.log.logging_mixin.LoggingMixin
ElasticsearchTaskHandler is a python log handler that reads logs from Elasticsearch. Note logs are not directly indexed into Elasticsearch. Instead, it flushes logs into local files. Additional software setup is required to index the log into Elasticsearch, such as using Filebeat and Logstash. To efficiently query and sort Elasticsearch results, we assume each log message has a field log_id consists of ti primary keys: log_id = {dag_id}-{task_id}-{execution_date}-{try_number} Log messages with specific log_id are sorted based on offset, which is a unique integer indicates log message's order. Timestamp here are unreliable because multiple log messages might have the same timestamp.
-
es_read
(self, log_id: str, offset: str, metadata: dict)[source]¶ Returns the logs matching log_id in Elasticsearch and next offset. Returns '' if no log is found or there was an error.
-
set_context
(self, ti: TaskInstance)[source]¶ Provide task_instance context to airflow task handler.
- Parameters
ti -- task instance object
-