VictoriaMetrics/lib/storage
Roman Khavronenko cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
..
block_header_test.go lib/storage: typo fix: umarshal -> unmarshal 2021-03-02 20:47:59 +02:00
block_header.go lib/storage: correctly use maxBlockSize in various checks 2020-09-24 18:12:56 +03:00
block_stream_merger.go lib/storage: set bsm.Block to nil on error, so the previous block couldn't be used. 2022-01-20 20:13:14 +02:00
block_stream_reader_test.go all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:05:11 +03:00
block_stream_reader_timing_test.go all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:05:11 +03:00
block_stream_reader.go lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations 2022-02-01 00:18:42 +02:00
block_stream_writer_timing_test.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
block_stream_writer.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
block_test.go lib/storage: fix tests for 32-bit arches such as GOARCH=386 and GOARCH=arm 2020-09-29 13:10:22 +03:00
block.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
dedup_test.go lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() 2021-12-14 20:49:12 +02:00
dedup_timing_test.go lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() 2021-12-14 20:49:12 +02:00
dedup.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
index_db_test.go lib/index: reduce read/write load after indexDB rotation (#2177) 2022-02-12 00:30:08 +02:00
index_db_timing_test.go lib/index: reduce read/write load after indexDB rotation (#2177) 2022-02-12 00:30:08 +02:00
index_db.go lib/index: reduce read/write load after indexDB rotation (#2177) 2022-02-12 00:30:08 +02:00
inmemory_part_test.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
inmemory_part_timing_test.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
inmemory_part.go lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects 2021-07-06 16:28:41 +03:00
merge_test.go app/vmstorage: support for -retentionPeriod smaller than one month 2020-10-20 14:31:44 +03:00
merge_timing_test.go app/vmstorage: support for -retentionPeriod smaller than one month 2020-10-20 14:31:44 +03:00
merge.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
metaindex_row_test.go lib/storage: correctly use maxBlockSize in various checks 2020-09-24 18:12:56 +03:00
metaindex_row.go lib/storage: correctly use maxBlockSize in various checks 2020-09-24 18:12:56 +03:00
metric_name_test.go app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
metric_name.go lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations 2022-02-01 00:18:42 +02:00
part_header_test.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
part_header.go lib/promscrape: support prometheus-like duration in scrape configs (#2169) 2022-02-11 16:17:00 +02:00
part_search_test.go Revert "lib/storage: remove unused fetchData arg from BlockRef.MustReadBlock" 2020-09-24 22:44:23 +03:00
part_search.go lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations 2022-02-01 00:18:42 +02:00
part.go lib/mergeset: tune caches size limits for indexdb/dataBlocks and indexdb/indexBlocks 2022-01-21 12:45:43 +02:00
partition_search_test.go app/vmstorage: support for -retentionPeriod smaller than one month 2020-10-20 14:31:44 +03:00
partition_search.go optimized code (#2103) 2022-01-28 14:15:41 +02:00
partition_test.go lib/{mergeset,storage}: improve the detection of the needed free space for background merge 2021-08-25 09:35:44 +03:00
partition.go lib/{mergeset,storage}: properly limit cache sizes for indexdb 2022-01-20 18:37:17 +02:00
raw_block.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
raw_row.go lib/storage: deduplicate samples more thoroughly 2021-12-15 15:59:58 +02:00
search_test.go app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
search.go lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations 2022-02-01 00:18:42 +02:00
storage_test.go lib/index: reduce read/write load after indexDB rotation (#2177) 2022-02-12 00:30:08 +02:00
storage_timing_test.go app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
storage.go lib/index: reduce read/write load after indexDB rotation (#2177) 2022-02-12 00:30:08 +02:00
table_search_test.go app/vmstorage: support for -retentionPeriod smaller than one month 2020-10-20 14:31:44 +03:00
table_search_timing_test.go lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard 2021-06-11 10:57:23 +03:00
table_search.go optimized code (#2103) 2022-01-28 14:15:41 +02:00
table_test.go app/vmstorage: support for -retentionPeriod smaller than one month 2020-10-20 14:31:44 +03:00
table_timing_test.go all: properly handle CPU limits set on the host system/container 2020-12-08 21:07:29 +02:00
table.go lib/storage/table.go: add missing tb.ptwsLock.Unlock() before the return 2022-01-28 14:15:42 +02:00
tag_filters_test.go lib/storage: properly handle {__name__=~"prefix(suffix1|suffix2)",other_label="..."} queries 2021-09-23 21:48:51 +03:00
tag_filters_timing_test.go lib/storage: small code adjustements after d2960a20e0 2020-10-17 01:16:54 +03:00
tag_filters.go lib/storage: fix broken BenchmarkHeadPostingForMatchers for {i=~".*"} after f4dead529f 2022-02-12 00:27:10 +02:00
time_test.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
time.go all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:05:11 +03:00
tsid_test.go all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
tsid.go all: remove the remaining mentions of cluster version 2019-11-21 23:18:22 +02:00