VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-26 04:10:08 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	ef12598ad4	lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers Already terminated pods and containers cannot be scraped and will never resurrect, so there is zero sense in creating scrape targets for them.	2024-01-24 14:57:53 +02:00
Aliaksandr Valialkin	4d961c70f7	app/{vmselect,vmstorage}: restore compression of the data passed from vmstorage to vmselect This reverts `cd4f641d32`, since it turned out that disabling compression for vmstorage->vmselect data increases network bandwidth usage by more than 10x on typical production workloads, while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%. A 10x increase in network usage is too high a price for 10% improvements in query latency and vmstorage CPU usage. It may result in network bandwidth bottlenecks, which can reduce the overall performance and stability of a VictoriaMetrics cluster. That's why vmstorage->vmselect data compression is enabled by default again. The vmstorage->vmselect compression can be disabled by passing the -rpc.disableCompression command-line flag to vmstorage. The vmselect->vmselect compression in a multi-level cluster setup can be disabled by passing the -clusternative.disableCompression command-line flag.	2024-01-24 13:39:28 +02:00
Aliaksandr Valialkin	f888a019fe	lib/streamaggr: expand `%{ENV}` placeholders in stream aggregation configs	2024-01-24 12:31:27 +02:00
Aliaksandr Valialkin	fa566c68a6	lib/mergeset: really limit the number of in-memory parts to 15 It turned out that the registration of new time series slows down linearly with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part when it searches for TSID by newly ingested metric name. The number of in-memory parts grows when new time series are registered at a high rate. The number of in-memory parts grows faster on systems with a big number of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries for the indexdb, and every such entry is eventually transformed into a separate in-memory part. The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 by @misutoth - to limit the number of in-memory parts with a buffered channel. This solution is implemented in this commit. Additionally, this commit merges per-CPU parts into a single part before adding it to the list of in-memory parts. This reduces CPU load when searching for TSID by newly ingested metric name. The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number of in-memory parts to 100, but my internal testing shows that a much lower limit of 15 works with the same efficiency on a system with 16 CPU cores while reducing memory usage for the `indexdb/dataBlocks` cache by up to 50%. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190	2024-01-24 03:38:12 +02:00
Aliaksandr Valialkin	5543c04061	docs/Cluster-VictoriaMetrics.md: document that `vmstorage` doesn't compress data it sends to `vmselect` by default This is a follow-up for `cd4f641d32`	2024-01-23 23:22:18 +02:00
Aliaksandr Valialkin	8fb8b71295	lib/encoding: remove unneeded re-slicing of byte slice before passing it to binary.BigEndian.Uint*	2024-01-23 22:50:29 +02:00
Aliaksandr Valialkin	1c58c00618	app/vmselect/netstorage: limit the initial size for brsPoolCap with 32Kb This should reduce the number of expensive memory allocations with sizes bigger than 32Kb	2024-01-23 22:29:39 +02:00
Aliaksandr Valialkin	43ecd5d258	app/vmselect/netstorage: pre-allocate memory for metricNamesBuf This should reduce the number of metricNamesBuf re-allocations in append()	2024-01-23 21:34:16 +02:00
Aliaksandr Valialkin	ae643ef1f1	lib/{storage,mergeset}: reduce the maximum compression level for the stored data This reduces CPU usage a bit, while it doesn't increase the resulting file sizes according to synthetic tests.	2024-01-23 17:46:50 +02:00
Github Actions	05c9a4d7ce	Automatic update operator docs from VictoriaMetrics/operator@1470569 (#5668 )	2024-01-23 16:22:16 +01:00
Aliaksandr Valialkin	6c214397ed	lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average. The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.	2024-01-23 16:09:55 +02:00
Aliaksandr Valialkin	4d78954158	lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting	2024-01-23 16:08:38 +02:00
Aliaksandr Valialkin	6d84b1beef	lib/filestream: do not measure read / write duration from / to in-memory buffers Measuring read / write duration from / to in-memory buffers makes little sense, since it will always be fast. It is better to measure read / write duration from / to real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics. This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers. This is a follow-up for `2f63dec2e3`	2024-01-23 14:52:22 +02:00
Aliaksandr Valialkin	41456d9569	app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery() This avoids slow path in Go runtime for allocating objects bigger than 32Kb - see `704401ffa0/src/runtime/malloc.go (L11)` This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `5dd37ad836` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 14:04:49 +02:00
Aliaksandr Valialkin	1f1768d7af	app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size See `704401ffa0/src/runtime/malloc.go (L11)` This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `508c608062` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 13:46:37 +02:00
Aliaksandr Valialkin	fac7c30f4e	docs/vmagent.md: clarify how `-promscrape.seriesLimitPerTarget` command-line flag, `series_limit` config option and `__series_limit__` label interact with each other This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5663 See also `89e3c70ccd`	2024-01-23 13:14:50 +02:00
Roman Khavronenko	89e3c70ccd	lib/promscrape: respect `0` value for `series_limit` param (#5663) * lib/promscrape: respect `0` value for `series_limit` param Respect `0` value for `series_limit` param in `scrape_config` even if a global limit was set via `-promscrape.seriesLimitPerTarget`. Previously, the `0` value was ignored in favor of `-promscrape.seriesLimitPerTarget`. This behavior aligns with the possibility to override the `series_limit` value via relabeling with the `__series_limit__` label. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 13:09:14 +02:00
Aliaksandr Valialkin	1c5163ae51	lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock() Previously the checks were too strict, requiring prepareBlock() to leave the same first and last items. Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655	2024-01-23 12:58:32 +02:00
Fred Navruzov	2adb38a9c4	- fix 404 errors after page renaming (#5664) - slight text fixes	2024-01-23 01:56:42 -08:00
Aliaksandr Valialkin	15a15e5b99	app/vmselect/vmui: run `make vmui-update` in order to sync recent changes in app/vmui	2024-01-23 04:31:44 +02:00
Aliaksandr Valialkin	114822d585	app/{vmstorage,vmselect}: disable vmstorage->vmselect RPC compression by default in order to improve query performance	2024-01-23 04:24:57 +02:00
Zakhar Bessarab	bf4742526d	lib/storage: print tenant ID in log when discarding or truncating labels (#5658) Previously, it was not possible to determine which tenant sends metrics with an excessive number of labels or label values. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 04:24:56 +02:00
Yury Molodov	38231d5994	vmui: query report (#5497 ) * vmui: add query analyzer page * vmui: fix tabs for query analyzer * vmui: add help to export query * vmui: add time params to query analyzer * docs/vmui: add query analyzer * vmui: fix validation JSON form --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 04:23:26 +02:00
Yury Molodov	eb6def0695	vmui: add flag for default timezone setting (#5611 ) * vmui: add flag for default timezone setting #5375 * vmui: validate timezone before client return * Update app/vmselect/vmui.go --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 04:11:19 +02:00
Yury Molodov	633e6b48ad	vmui: fix cache autocomplete (#5591 ) * vmui: fix the logic of closing the popper #5470 * vmui: fix the logic of caching autocomplete results #5472 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 04:06:14 +02:00
Aliaksandr Valialkin	980338861f	lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651	2024-01-23 03:15:52 +02:00
Aliaksandr Valialkin	bc7d19c8ca	app/vmselect/promql: remove superfluous memory allocations at aggrPrepareSeries() While at it, also remove unneeded map lookup	2024-01-23 02:28:31 +02:00
Aliaksandr Valialkin	9240bc36a3	app/vmselect/promql/aggr_incremental.go: eliminate unnecessary memory allocation in incrementalAggrFuncContext.updateTimeseries	2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin	e0399ec29a	app/vmselect/netstorage: remove tswPool, since it isn't efficient	2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin	72a838a2a1	app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series This saves a few CPU cycles for common case	2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin	5dd37ad836	app/vmselect/netstorage: use []blockRef from blockRefPool in order to reduce memory allocations	2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin	7345567c29	app/vmselect/netstorage: substitute pointer to blockRefs by brssPool index at the metricName->blockRefs map This should reduce the pressure on Go GC, since it will see lower number of pointers. This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin	678234e9f0	app/vmselect/netstorage: reduce the number of allocations for blockRefs objects in ProcessSearchQuery() This should reduce pressure on Go GC at vmselect The change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:28:28 +02:00
Aliaksandr Valialkin	508c608062	app/vmselect/netstorage: reduce the number of memory allocations in ProcessSearchQuery() by storing all the metric names in a single byte slice This reduces the number of memory allocations at the cost of a possible memory usage increase, since now different metric name strings may hold references to the previous byte slice. This is a good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory. This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 02:28:28 +02:00
Daria Karavaieva	ffaf48b99e	add 1.8.0 notes to changelog (#5616 ) * add 1.8.0 notes to changelog * added release date * MAD internal link * monitoring health deprecation	2024-01-22 23:51:12 +01:00
Jaskeerat Singh Randhawa	b606521745	custom-resources: fix link text for alertmanager (#5660 )	2024-01-22 18:06:40 +01:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	9b4294e53e	lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate The dateMetricIDCache puts recently registered (date, metricID) entries into a mutable cache protected by the mutex. The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache. Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores. The mutable cache was merged into the immutable cache every 10 seconds in order to avoid slow access to the mutable cache. This means that ingestion of new time series to VictoriaMetrics could result in a significant slowdown for up to 10 seconds because of the bottleneck at the mutex. Fix this by merging the mutable cache into the immutable cache after len(cacheItems) / 2 cache hits under the mutex, i.e. when the entry is found in the mutable cache. This should automatically adjust intervals between merges depending on the addition rate for new time series (aka churn rate): - The interval will be much smaller than 10 seconds under high churn rate. This should reduce the mutex contention for the mutable cache. - The interval will be bigger than 10 seconds under low churn rate. This should reduce the unneeded work on merging the mutable cache into the immutable cache.	2024-01-22 18:40:32 +02:00
hagen1778	8b8d0e3677	deployment/docker: fix typo in commands example Follow up after `38b2a5bc44` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 16:56:27 +01:00
hagen1778	b25ef138ce	dashboards: reflect dashboard rename in copy script This is a follow-up for `ff33e60a3d` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 16:51:24 +01:00
hagen1778	0e5e502b3c	deployment/docker: follow-up `38b2a5bc44` * Simplify folder structure * mention datasource in README Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 16:05:44 +01:00
Dmytro Kozlov	38b2a5bc44	deployment/docker: add grafana datasource to the docker-compose files (#5363 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3920 https://github.com/VictoriaMetrics/grafana-datasource/issues/113	2024-01-22 15:45:31 +01:00
hagen1778	1075fcfc8c	app/vmctl/backoff: fix flaky test The change removes artificial delay before returning error, which sometimes caused less retry events than expected. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 12:21:14 +01:00
hagen1778	da556cc329	docs: fix Grafana link example for vmalert Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 09:35:18 +01:00
dependabot[bot]	df197723ae	build(deps): bump github/codeql-action from 2 to 3 (#5462 ) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2 to 3. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/v2...v3) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-01-22 01:49:17 +02:00
Aliaksandr Valialkin	d3ee3e0ef5	Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)" This reverts commit `cfec258803`. Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled. The scrapeWork.areIdenticalSeries() function always returns true if stale markers are disabled. This prevents storing the last response at scrapeWork.processScrapedData(). It looks like the reverted commit could also bring back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577	2024-01-22 00:43:48 +02:00
Aliaksandr Valialkin	9c0863babc	docs: use persistent links to Grafana dashboards These links do not depend on the dashboard name, so they do not break after the renaming of the dashboard. This is a follow-up for `ff33e60a3d`	2024-01-22 00:17:17 +02:00
Aliaksandr Valialkin	1c7f990fad	app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery() This is a follow-up for `cf03e11d89` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630	2024-01-21 23:45:31 +02:00
Artem Navoiev	3f7ed7e6b2	docs vmanomaly fix anchor Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-21 22:21:37 +01:00
Hui Wang	4e3242b02d	lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557) * lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long. * remove misleading comment * docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 * wip * lib/promscrape/kubernetes: prevent stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls groupWatcher is stopped if it has zero registered apiWatchers for 14 seconds. But such a groupWatcher can still be in use if an apiWatcher for `role: endpoints` or `role: endpointslice` is being registered and the discovery of the associated `pod` and/or `service` objects takes longer than 14 seconds - see the beginning of the groupWatcher.startWatchersForRole() function for details. Track the number of in-flight calls to apiWatcher.mustStart() and prevent stopping the associated groupWatcher if the number of in-flight calls is non-zero. P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets. * typo fix --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-21 23:13:15 +02:00

8083 Commits