VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-23 20:37:12 +01:00

Author	SHA1	Message	Date
Artem Fetishev	55febc0920	lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064 ) ### Describe Your Changes Currently it the metricID list is empty it won't be mashalled and as the result won't be put into the tagFiltersToMetricIDsCache which causes the cache misses for the corresponding tagFilters. In some setups this causes severe search speed detradation (see #7009). The empty metric IDs was covered before but then was accidentally removed in `6c21439`. This PR restores the coverage of this case. A new unit test can be used as a proof that empty metricID lists are not added to the cache (just remove the fix in index_db.go and run the test to see the result) Also a benchmark has been added to see the implications of the compression. ``` user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR goos: linux goarch: amd64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: 13th Gen Intel(R) Core(TM) i7-1355U BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12 3237240 363.5 ns/op 0 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12 2831049 451.8 ns/op 0.4706 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12 1152764 1009 ns/op 1.667 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12 297055 3998 ns/op 5.755 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12 31172 34566 ns/op 8.484 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12 4900 289659 ns/op 9.416 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12 447 2341173 ns/op 9.456 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12 42 24926928 ns/op 9.468 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12 5 204098872 ns/op 9.467 compression-rate PASS ok github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 15.018s ``` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-09-20 17:21:53 +02:00
Dima Lazerka	e0e856d2e7	Add flagutil.Duration to avoid conversion bugs (#4835 ) * Introduce flagutil.Duration To avoid conversion bugs * Fix tests * Comment why not .Seconds()	2023-09-01 09:27:51 +02:00
Aliaksandr Valialkin	6685f6ce7c	lib/storage: move series registration in caches from createAllIndexesForMetricName into a separate function - putSeriesToCache This makes the code more clear and easier to read This is a follow-up for `7094fa38bc`	2023-07-13 23:13:23 -07:00
Aliaksandr Valialkin	3dacdcb707	lib/storage: optimize BenchmarkIndexDBGetTSIDs() - Sort MetricName tags only once before the benchmark loop. - Obtain indexSearch per each benchmark loop in order to give a chance for background merge for the recently created parts	2023-07-13 21:56:53 -07:00
Aliaksandr Valialkin	443661a5da	lib/storage: properly free up resources from newTestStorage() by calling stopTestStorage()	2023-07-13 17:13:24 -07:00
Aliaksandr Valialkin	7094fa38bc	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index has no the corresponding entry too. This is a typical event under high churn rate, when old time series are constantly substituted with new time series. Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 . At the same time the change fixes the issue, which could result in lost access to time series, which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin	1f28b46ae9	lib/storage: revert the migration from global to per-day index for (MetricName -> TSID) This reverts the following commits: - `e0e16a2d36` - `2ce02a7fe6` The reason for revert: the updated logic breaks assumptions made when fixing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 . For example, if a time series stop receiving new samples during the first day after the indexdb rotation, there are chances that the time series won't be registered in the new indexdb. This is OK until the next indexdb rotation, since the time series is registered in the previous indexdb, so it can be found during queries. But the time series will become invisible for search after the next indexdb rotation, while its data is still there. There is also incompletely solved issue with the increased CPU and disk IO resource usage just after the indexdb rotation. There was an attempt to fix it, but it didn't fix it in full, while introducing the issue mentioned above. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 TODO: to find out the solution, which simultaneously solves the following issues: - increased memory usage for setups high churn rate and long retention (e.g. what the reverted commit does) - increased CPU and disk IO usage during indexdb rotation ( https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 ) - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 Possible solution - to create the new indexdb in one hour before the indexdb rotation and to gradually pre-populate it with the needed index data during the last hour before indexdb rotation. Then the new indexdb will contain all the needed data just after the rotation, so it won't trigger increased CPU and disk IO.	2023-05-18 11:30:49 -07:00
Aliaksandr Valialkin	e0e16a2d36	lib/storage: follow-up after `2ce02a7fe6` - Document the change at docs/CHANGELOG.md - Clarify comments for non-trivial code touched by the commit - Improve the logic behind maybeCreateIndexes(): - Correctly create per-day indexes if the indexdb rotation is performed during the first hour or the last hour of the day by UTC. Previously there was a possibility of missing index entries on that day. - Increase the duration for creating new indexes in the current indexdb for up to 22 hours after indexdb rotation. This should reduce the increased resource usage after indexdb rotation. It is safe to postpone index creation for the current day until the last hour of the current day after indexdb rotation by UTC, since the corresponding (date, ...) entries exist in the previous indexdb. - Search for TSID by (date, MetricName) in both the current and the previous indexdb. Previously the search was performed only in the current indexdb. This could lead to excess creation of per-day indexes for the current day just after indexdb rotation. - Search for (date, metricID) entries in both the current and the previous indexdb. Previously the search was performed only in the current indexdb. This could lead to excess creation of per-day indexes for the current day just after indexdb rotation.	2023-05-16 23:19:27 -07:00
Aliaksandr Valialkin	3727251910	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin	b958fc7846	lib/storage: properly take into account already registered series when `-storage.maxHourlySeries` or `-storage.maxDailySeries` limits are enabled The commit `5fb45173ae` takes into account only newly registered series when applying cardinality limits. This means that the cardinality limit could be exceeded with already registered series. This commit returns back accounting for already registered series when applying cardinality limits.	2022-06-20 13:47:47 +03:00
Aliaksandr Valialkin	55e7afae3a	lib/storage: create per-day indexes together with global indexes when registering new time series Previously the creation of per-day indexes and global indexes for the newly registered time series was decoupled. Now global indexes and per-day indexes for the current day are created toghether for new time series. This should speed up registering new time series a bit.	2022-06-19 22:42:10 +03:00
Aliaksandr Valialkin	ea06d2fd3c	lib/storage: stop background merge when storage enters read-only mode This should prevent from `no space left on device` errors when VictoriaMetrics under-estimates the additional disk space needed for background merge. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2603	2022-06-01 14:36:45 +03:00
Aliaksandr Valialkin	41958ed5dd	all: add initial support for query tracing See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-01 02:29:23 +03:00
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177 ) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - ID's used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So in any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the relad lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation the higher is chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	08428464e9	lib/storage: fix broken BenchmarkHeadPostingForMatchers for `{i=~".*"}` after `f4dead529f` The commit `f4dead529f` makes such query to return nothing instead of all the time series. This aligns more with Prometheus behaviour.	2022-02-12 00:27:10 +02:00
Aliaksandr Valialkin	c4f3fbfa5d	lib/storage: reset cache on disk during series deletion and during indexdb rotation This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347	2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin	553016ea99	lib/storage: disable composite index usage when querying old data	2021-02-10 14:57:50 +02:00
Aliaksandr Valialkin	764dc2499f	lib/storage: code cleanup after `10f2eedee0` Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search, since this code became unused after implementing per-day inverted index almost a year ago. While at it, fix a bug, which could prevent from finding time series with names containing dots (aka Graphite-like names such as `foo.bar.baz`).	2020-10-01 19:06:23 +03:00
Aliaksandr Valialkin	039c9d2441	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin	e1107fec10	lib/storage: reset `MetricName->TSID` cache after marking metricIDs as deleted This is a follow-up commit after `12b16077c4` , which didn't reset the `tsidCache` in all the required places. This could result in indefinite errors like: missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time Fix this by resetting the cache inside deleteMetricIDs function.	2020-07-14 14:06:32 +03:00
Aliaksandr Valialkin	d5dddb0953	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin	5c1e4143e9	lib/storage: verify the number of returned metricIDs in BenchmarkHeadPostingForMatchers	2019-11-20 15:39:28 +02:00
Aliaksandr Valialkin	b6f22a62cb	lib/storage: increase the number of created time series in BenchmarkHeadPostingForMatchers in order to be on par with Promethues The previous commit was accidentally creating 10x smaller number of time series than Prometheus and this led to invalid benchmark results. The updated benchmark results: benchmark old ns/op new ns/op delta BenchmarkHeadPostingForMatchers/n="1" 272756688 6194893 -97.73% BenchmarkHeadPostingForMatchers/n="1",j="foo" 138132923 10781372 -92.19% BenchmarkHeadPostingForMatchers/j="foo",n="1" 134723762 10632834 -92.11% BenchmarkHeadPostingForMatchers/n="1",j!="foo" 195823953 10679975 -94.55% BenchmarkHeadPostingForMatchers/i=~"." 7962582919 100118510 -98.74% BenchmarkHeadPostingForMatchers/i=~".+" 7589543864 154955671 -97.96% BenchmarkHeadPostingForMatchers/i=~"" 1142371741 258003769 -77.42% BenchmarkHeadPostingForMatchers/i!="" 9964150263 159783895 -98.40% BenchmarkHeadPostingForMatchers/n="1",i=~".",j="foo" 216995884 10937895 -94.96% BenchmarkHeadPostingForMatchers/n="1",i=~".",i!="2",j="foo" 202541348 10990027 -94.57% BenchmarkHeadPostingForMatchers/n="1",i!="" 486285711 87004349 -82.11% BenchmarkHeadPostingForMatchers/n="1",i!="",j="foo" 350776931 53342793 -84.79% BenchmarkHeadPostingForMatchers/n="1",i=~".+",j="foo" 380888565 54256156 -85.76% BenchmarkHeadPostingForMatchers/n="1",i=~"1.+",j="foo" 89500296 21823279 -75.62% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!="2",j="foo" 379529654 46671359 -87.70% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!~"2.",j="foo" 424563825 53915842 -87.30% VictoriaMetrics uses 1GB of RAM during the benchmark (vs 3.5GB of RAM for Prometheus)	2019-11-18 19:50:58 +02:00
Aliaksandr Valialkin	8a0dfc6220	lib/storage: add BenchmarkHeadPostingForMatchers similar to the benchmark from Prometheus See the corresponding benchmark in Prometheus - `23c0299d85/tsdb/head_bench_test.go (L52)` The benchmark allows performing apples-to-apples comparison of time series search in Prometheus and VictoriaMetrics. The following article - https://www.robustperception.io/evaluating-performance-and-correctness - contains incorrect numbers for VictoriaMetrics, since there wasn't this benchmark yet. Fix this. Benchmarks can be repeated with the following commands from Prometheus and VictoriaMetrics source code roots: - Prometheus: GOMAXPROCS=1 go test ./tsdb/ -run=111 -bench=BenchmarkHeadPostingForMatchers - VictoriaMetrics: GOMAXPROCS=1 go test ./lib/storage/ -run=111 -bench=BenchmarkHeadPostingForMatchers Benchmark results: benchmark old ns/op new ns/op delta BenchmarkHeadPostingForMatchers/n="1" 272756688 364977 -99.87% BenchmarkHeadPostingForMatchers/n="1",j="foo" 138132923 1181636 -99.14% BenchmarkHeadPostingForMatchers/j="foo",n="1" 134723762 1141578 -99.15% BenchmarkHeadPostingForMatchers/n="1",j!="foo" 195823953 1148056 -99.41% BenchmarkHeadPostingForMatchers/i=~"." 7962582919 8716755 -99.89% BenchmarkHeadPostingForMatchers/i=~".+" 7589543864 12096587 -99.84% BenchmarkHeadPostingForMatchers/i=~"" 1142371741 16164560 -98.59% BenchmarkHeadPostingForMatchers/i!="" 9964150263 12230021 -99.88% BenchmarkHeadPostingForMatchers/n="1",i=~".",j="foo" 216995884 1173476 -99.46% BenchmarkHeadPostingForMatchers/n="1",i=~".",i!="2",j="foo" 202541348 1299743 -99.36% BenchmarkHeadPostingForMatchers/n="1",i!="" 486285711 11555193 -97.62% BenchmarkHeadPostingForMatchers/n="1",i!="",j="foo" 350776931 5607506 -98.40% BenchmarkHeadPostingForMatchers/n="1",i=~".+",j="foo" 380888565 6380335 -98.32% BenchmarkHeadPostingForMatchers/n="1",i=~"1.+",j="foo" 89500296 2078970 -97.68% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!="2",j="foo" 379529654 6561368 -98.27% BenchmarkHeadPostingForMatchers/n="1",i=~".+",i!~"2.",j="foo" 424563825 6757132 -98.41% The first column (old) is for Prometheus, the second column (new) is for VictoriaMetrics. As you can see, VictoriaMetrics outperforms Prometheus by more than 100x in almost all the test cases of this benchmark. Prometheus was using 3.5GB of RAM during the benchmark, while VictoriaMetrics was using 400MB of RAM.	2019-11-18 18:45:06 +02:00
Aliaksandr Valialkin	380cae23a0	lib/storage: add benchmarks for regexp filter match / mismatch These benchmarks allow estimate the performance of regexp filters in promql	2019-08-22 16:36:42 +03:00
Aliaksandr Valialkin	09fc6e22e5	all: use workingsetcache instead of fastcache This should reduce the amount of RAM required for processing time series with non-zero churn rate. The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.	2019-08-13 21:39:34 +03:00
Aliaksandr Valialkin	ec1b185991	lib/storage: remove broken BenchmarkIndexDBSearchTSIDs	2019-08-13 20:22:08 +03:00
Aliaksandr Valialkin	683bf2a11f	lib/storage: make sure non-nil args are passed to openIndexDB	2019-06-25 20:10:04 +03:00
Aliaksandr Valialkin	80db24386e	lib/storage: typo fixes found by golangci-lint; updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/69	2019-06-20 14:37:55 +03:00
Aliaksandr Valialkin	d37924900b	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:13:56 +03:00
Aliaksandr Valialkin	1836c415e6	all: open-sourcing single-node version	2019-05-23 00:18:06 +03:00

31 Commits