VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-16 00:41:24 +01:00

Author	SHA1	Message	Date
Roman Khavronenko	d107f86fbc	lib/index: reduce read/write load after indexDB rotation (#2177 ) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - ID's used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So in any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the relad lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation the higher is chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:34:44 +02:00
Aliaksandr Valialkin	21a42e990f	lib/storage: fix broken BenchmarkHeadPostingForMatchers for `{i=~".*"}` after `f4dead529f` The commit `f4dead529f` makes such query to return nothing instead of all the time series. This aligns more with Prometheus behaviour.	2022-02-12 00:28:21 +02:00
Aliaksandr Valialkin	ce10bdc82a	lib/storage: reset cache on disk during series deletion and during indexdb rotation This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347	2021-06-11 12:54:36 +03:00
Aliaksandr Valialkin	148422bcba	lib/storage: disable composite index usage when querying old data	2021-02-10 14:57:58 +02:00
Aliaksandr Valialkin	fd7dd5064a	lib/storage: code cleanup after `10f2eedee0` Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search, since this code became unused after implementing per-day inverted index almost a year ago. While at it, fix a bug, which could prevent from finding time series with names containing dots (aka Graphite-like names such as `foo.bar.baz`).	2020-10-01 19:12:04 +03:00
Aliaksandr Valialkin	94cc677b0c	lib/storage: slightly reduce code difference between single-node and cluster versions	2020-07-24 01:18:05 +03:00
Aliaksandr Valialkin	fb3d1380ac	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:22:05 +03:00
Aliaksandr Valialkin	be0ab4fbfe	lib/storage: reset `MetricName->TSID` cache after marking metricIDs as deleted This is a follow-up commit after `12b16077c4` , which didn't reset the `tsidCache` in all the required places. This could result in indefinite errors like: missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time Fix this by resetting the cache inside deleteMetricIDs function.	2020-07-14 14:05:19 +03:00
Aliaksandr Valialkin	d962568e93	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:33:46 +03:00
Aliaksandr Valialkin	a02a57fbe9	lib/storage: verify the number of returned metricIDs in BenchmarkHeadPostingForMatchers	2019-11-20 15:40:03 +02:00
Aliaksandr Valialkin	6ca4b94511	lib/storage: increase the number of created time series in BenchmarkHeadPostingForMatchers in order to be on par with Promethues The previous commit was accidentally creating 10x smaller number of time series than Prometheus and this led to invalid benchmark results. The updated benchmark results: benchmark old ns/op new ns/op delta BenchmarkHeadPostingForMatchers/n="1" 272756688 6194893 -97.73% BenchmarkHeadPostingForMatchers/n="1",j="foo" 138132923 10781372 -92.19% BenchmarkHeadPostingForMatchers/j="foo",n="1" 134723762 10632834 -92.11% BenchmarkHeadPostingForMatchers/n="1",j!="foo" 195823953 10679975 -94.55% BenchmarkHeadPostingForMatchers/i=~"." 7962582919 100118510 -98.74% BenchmarkHeadPostingForMatchers/i=~".+" 7589543864 154955671 -97.96% BenchmarkHeadPostingForMatchers/i=~"" 1142371741 258003769 -77.42% BenchmarkHeadPostingForMatchers/i!="" 9964150263 159783895 -98.40% BenchmarkHeadPostingForMatchers/n="1",i=~".",j="foo" 216995884 10937895 -94.96% BenchmarkHeadPostingForMatchers/n="1",i=~".",i!="2",j="foo" 202541348 10990027 -94.57% BenchmarkHeadPostingForMatchers/n="1",i!="" 486285711 87004349 -82.11% BenchmarkHeadPostingForMatchers/n="1",i!="",j="foo" 350776931 53342793 -84.79% BenchmarkHeadPostingForMatchers/n="1",i=~".+",j="foo" 380888565 54256156 -85.76% BenchmarkHeadPostingForMatchers/n="1",i=~"1.+",j="foo" 89500296 21823279 -75.62% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!="2",j="foo" 379529654 46671359 -87.70% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!~"2.",j="foo" 424563825 53915842 -87.30% VictoriaMetrics uses 1GB of RAM during the benchmark (vs 3.5GB of RAM for Prometheus)	2019-11-18 19:48:27 +02:00
Aliaksandr Valialkin	6f61fd367a	lib/storage: add BenchmarkHeadPostingForMatchers similar to the benchmark from Prometheus See the corresponding benchmark in Prometheus - `23c0299d85/tsdb/head_bench_test.go (L52)` The benchmark allows performing apples-to-apples comparison of time series search in Prometheus and VictoriaMetrics. The following article - https://www.robustperception.io/evaluating-performance-and-correctness - contains incorrect numbers for VictoriaMetrics, since there wasn't this benchmark yet. Fix it. Benchmarks can be repeated with the following commands from Prometheus and VictoriaMetrics source code roots: - Prometheus: GOMAXPROCS=1 go test ./tsdb/ -run=111 -bench=BenchmarkHeadPostingForMatchers - VictoriaMetrics: GOMAXPROCS=1 go test ./lib/storage/ -run=111 -bench=BenchmarkHeadPostingForMatchers Benchmark results: benchmark old ns/op new ns/op delta BenchmarkHeadPostingForMatchers/n="1" 272756688 364977 -99.87% BenchmarkHeadPostingForMatchers/n="1",j="foo" 138132923 1181636 -99.14% BenchmarkHeadPostingForMatchers/j="foo",n="1" 134723762 1141578 -99.15% BenchmarkHeadPostingForMatchers/n="1",j!="foo" 195823953 1148056 -99.41% BenchmarkHeadPostingForMatchers/i=~"." 7962582919 8716755 -99.89% BenchmarkHeadPostingForMatchers/i=~".+" 7589543864 12096587 -99.84% BenchmarkHeadPostingForMatchers/i=~"" 1142371741 16164560 -98.59% BenchmarkHeadPostingForMatchers/i!="" 9964150263 12230021 -99.88% BenchmarkHeadPostingForMatchers/n="1",i=~".",j="foo" 216995884 1173476 -99.46% BenchmarkHeadPostingForMatchers/n="1",i=~".",i!="2",j="foo" 202541348 1299743 -99.36% BenchmarkHeadPostingForMatchers/n="1",i!="" 486285711 11555193 -97.62% BenchmarkHeadPostingForMatchers/n="1",i!="",j="foo" 350776931 5607506 -98.40% BenchmarkHeadPostingForMatchers/n="1",i=~".+",j="foo" 380888565 6380335 -98.32% BenchmarkHeadPostingForMatchers/n="1",i=~"1.+",j="foo" 89500296 2078970 -97.68% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!="2",j="foo" 379529654 6561368 -98.27% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!~"2.",j="foo" 424563825 6757132 -98.41% The first column (old) is for Prometheus, the second column (new) is for VictoriaMetrics. Prometheus was using 3.5GB of RAM during the benchmark, while VictoriaMetrics was using 400MB of RAM.	2019-11-18 18:47:02 +02:00
Aliaksandr Valialkin	4ed63d033a	lib/storage: add benchmarks for regexp filter match / mismatch These benchmarks allow estimate the performance of regexp filters in promql	2019-08-22 16:37:19 +03:00
Aliaksandr Valialkin	8c2158af24	all: use workingsetcache instead of fastcache This should reduce the amount of RAM required for processing time series with non-zero churn rate. The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.	2019-08-13 21:40:28 +03:00
Aliaksandr Valialkin	5a7ab0d90b	lib/storage: remove broken BenchmarkIndexDBSearchTSIDs	2019-08-13 20:21:23 +03:00
Aliaksandr Valialkin	ee23a143b9	lib/storage: make sure non-nil args are passed to openIndexDB	2019-06-25 20:10:08 +03:00
Aliaksandr Valialkin	d882afa905	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:14:04 +03:00
Aliaksandr Valialkin	24578b4bb1	all: open-sourcing cluster version	2019-05-23 00:25:38 +03:00
Aliaksandr Valialkin	1836c415e6	all: open-sourcing single-node version	2019-05-23 00:18:06 +03:00

19 Commits