Commit Graph

8 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
1000ae437c
lib/logstorage: optimize performance for top pipe when it is applied to a field with millions of unique values
- Use parallel merge of per-CPU shard results. This improves merge performance on multi-CPU systems.
- Use topN heap sort of per-shard results. This improves performance when results contain millions of entries.

(cherry picked from commit 192c07f76a)
2024-10-18 11:42:15 +02:00
Aliaksandr Valialkin
81f3e07e1e
lib/logstorage: do not count dictionary values which have no matching logs in count_uniq stats function
Create blockResultColumn.forEachDictValue* helper functions for visiting matching
dictionary values. These helper functions should prevent from counting dictionary values
without matching logs in the future.

This is a follow-up for 0c0f013a60
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152
2024-10-01 13:36:27 +02:00
Aliaksandr Valialkin
dbcf06cd85
lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483
2024-09-30 14:16:21 +02:00
Aliaksandr Valialkin
58d1e517de
lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column
encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values.
So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.
2024-09-29 10:29:50 +02:00
Aliaksandr Valialkin
b5d94f06f5
lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock()
This simplifies pipeProcessor initialization logic a bit.
This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.
2024-09-29 10:29:49 +02:00
Aliaksandr Valialkin
246c339e3d
lib/logstorage: read timestamps column when it is really needed during query execution
Previously timestamps column was read unconditionally on every query.
This could significantly slow down queries, which do not need reading this column
like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .
2024-09-25 19:18:37 +02:00
Aliaksandr Valialkin
dd62a2b9d6
lib/logstorage: work-in-progress 2024-06-27 14:21:03 +02:00
Aliaksandr Valialkin
1750991119
lib/logstorage: work-in-progress 2024-06-17 12:13:25 +02:00