VictoriaMetrics/lib/storage
Roman Khavronenko 7c0ae3a86a
lib/storage: keep sample with the biggest value on timestamp conflict (#3421)
When deduplication is enabled and multiple raw samples share the same
timestamp within a `-dedup.minScrapeInterval` discrete interval,
the sample with the biggest value is now kept.

```
benchstat old.txt new.txt
name                                         old time/op    new time/op    delta
DeduplicateSamples/minScrapeInterval=1s-10      817ns ± 2%     832ns ± 3%      ~     (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10     1.56µs ± 1%    2.12µs ± 0%   +35.19%  (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10     1.32µs ± 3%    1.65µs ± 2%   +25.57%  (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10    1.13µs ± 2%    1.50µs ± 1%   +32.85%  (p=0.000 n=10+10)

name                                         old speed      new speed      delta
DeduplicateSamples/minScrapeInterval=1s-10   10.0GB/s ± 2%   9.9GB/s ± 3%      ~     (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10   5.24GB/s ± 1%  3.87GB/s ± 0%   -26.03%  (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10   6.22GB/s ± 3%  4.96GB/s ± 2%   -20.37%  (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10  7.28GB/s ± 2%  5.48GB/s ± 1%   -24.74%  (p=0.000 n=10+10)
```

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-08 18:06:11 -08:00
..
block_header_test.go lib/storage: typo fix: umarshal -> unmarshal 2021-03-02 20:47:59 +02:00
block_header.go lib/storage: verify that timestamps in block are in the range specified by blockHeader.{Min,Max}Timestamp when upacking the block 2022-09-06 13:08:09 +03:00
block_stream_merger.go lib/storage: skip blocks outside the configured retention during search 2022-10-24 02:52:44 +03:00
block_stream_reader_test.go all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:05:11 +03:00
block_stream_reader_timing_test.go all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:05:11 +03:00
block_stream_reader.go lib/storage: remove logging redundant path values in a single error message 2022-12-03 22:13:13 -08:00
block_stream_writer_timing_test.go lib/{mergeset,storage}: pass compressLevel to blockStreamWriter.InitFromInmemoryPart 2022-12-03 22:46:48 -08:00
block_stream_writer.go lib/{mergeset,storage}: pass compressLevel to blockStreamWriter.InitFromInmemoryPart 2022-12-03 22:46:48 -08:00
block_test.go lib/storage: verify that timestamps in block are in the range specified by blockHeader.{Min,Max}Timestamp when upacking the block 2022-09-06 13:08:09 +03:00
block.go lib/storage: validate timestamps in the block only if they use encoding, which needs validation 2022-10-21 00:52:32 +03:00
dedup_test.go lib/storage: keep sample with the biggest value on timestamp conflict (#3421) 2022-12-08 18:06:11 -08:00
dedup_timing_test.go lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() 2021-12-14 20:49:12 +02:00
dedup.go lib/storage: keep sample with the biggest value on timestamp conflict (#3421) 2022-12-08 18:06:11 -08:00
index_db_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
index_db_timing_test.go lib/storage: properly take into account already registered series when -storage.maxHourlySeries or -storage.maxDailySeries limits are enabled 2022-06-20 13:47:47 +03:00
index_db.go lib/mergeset: panic when too long item is passed to Table.AddItems() 2022-12-03 23:32:16 -08:00
inmemory_part_test.go
inmemory_part_timing_test.go
inmemory_part.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
merge_test.go lib/{mergeset,storage}: pass compressLevel to blockStreamWriter.InitFromInmemoryPart 2022-12-03 22:46:48 -08:00
merge_timing_test.go lib/{mergeset,storage}: pass compressLevel to blockStreamWriter.InitFromInmemoryPart 2022-12-03 22:46:48 -08:00
merge.go lib/storage: optimization: do not scan block for rows outside retention if it is covered by the retention 2022-12-03 22:14:12 -08:00
metaindex_row_test.go lib/storage: correctly use maxBlockSize in various checks 2020-09-24 18:12:56 +03:00
metaindex_row.go all: subsitute ioutil.ReadAll with io.ReadAll 2022-08-22 00:16:37 +03:00
metric_name_test.go app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
metric_name.go lib/regexutil: add Simplify() function for simplifying the regular expression 2022-08-26 11:57:12 +03:00
part_header_test.go
part_header.go lib/fs: add canOverwrite arg to WriteFileAtomically when it is allowed to overwrite the file atomically if it already exists 2022-10-26 01:07:34 +03:00
part_search_test.go app/vmselect: optimize /api/v1/series a bit for time ranges smaller than one day 2022-06-28 13:02:47 +03:00
part_search.go lib/storage: speed up search for data block for the given tsids 2022-12-03 20:58:32 -08:00
part.go lib/mergeset: tune caches size limits for indexdb/dataBlocks and indexdb/indexBlocks 2022-01-21 12:45:43 +02:00
partition_search_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
partition_search.go all: make fmt via the upcoming Go1.19 2022-07-11 19:22:15 +03:00
partition_test.go lib/{mergeset,storage}: improve the detection of the needed free space for background merge 2021-08-25 09:35:44 +03:00
partition.go lib/{storage,mergeset}: log the duration for flushing in-memory parts on graceful shutdown 2022-12-05 21:30:48 -08:00
raw_block.go
raw_row.go lib/{mergeset,storage}: pass compressLevel to blockStreamWriter.InitFromInmemoryPart 2022-12-03 22:46:48 -08:00
search_test.go app/vmselect: optimize /api/v1/series a bit for time ranges smaller than one day 2022-06-28 13:02:47 +03:00
search.go lib/storage: skip blocks outside the configured retention during search 2022-10-24 02:52:44 +03:00
storage_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
storage_timing_test.go app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
storage.go lib/mergeset: panic when too long item is passed to Table.AddItems() 2022-12-03 23:32:16 -08:00
table_search_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
table_search_timing_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
table_search.go lib/storage: do not pass retentionMsecs and isReadOnly args explicitly - access them via Storage arg 2022-10-24 01:31:04 +03:00
table_test.go lib/storage: do not pass retentionMsecs and isReadOnly args explicitly - access them via Storage arg 2022-10-24 01:31:04 +03:00
table_timing_test.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
table.go all: add -inmemoryDataFlushInterval command-line flag for controlling the frequency of saving in-memory data to disk 2022-12-05 15:16:14 -08:00
tag_filters_test.go lib/regexutil: add Simplify() function for simplifying the regular expression 2022-08-26 11:57:12 +03:00
tag_filters_timing_test.go lib/regexutil: add Simplify() function for simplifying the regular expression 2022-08-26 11:57:12 +03:00
tag_filters.go lib/storage: optimize matching speed for non-trivial regexp filters 2022-10-01 12:06:06 +03:00
time_test.go
time.go all: readability improvements for query traces 2022-06-30 18:20:33 +03:00
tsid_test.go
tsid.go all: remove the remaining mentions of cluster version 2019-11-21 23:18:22 +02:00