VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-22 00:16:23 +01:00

Author	SHA1	Message	Date
Roman Khavronenko	9e9f170fe7	lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689) Sending unfinished aggregate states tends to produce unexpected anomalies with lower values than expected. The old behavior can be restored by specifying the `flush_on_shutdown: true` setting in the streaming aggregation config. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 22:45:45 +01:00
Roman Khavronenko	562edb72ea	app/vmalert: fix data race during hot-config reload (#5698) * app/vmalert: fix data race during hot-config reload During hot-reload, the logic invokes the group update and the rules evaluation interruption simultaneously, falsely assuming that the interruption happens before the update. However, the group could be updated first and the rules evaluation cancelled only afterwards, which results in permanent interruption of all rules within the group. The fix caches the context cancel function in a local variable first and only then performs the group update. With the cached cancel function we can safely call it without worrying that we cancel the evaluation for an already updated group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "app/vmalert: fix data race during hot-config reload" This reverts commit `a4bb7e8932`. * app/vmalert: fix data race during hot-config reload During hot-reload, the logic invokes the group update and the rules evaluation interruption simultaneously, falsely assuming that the interruption happens before the update. However, the group could be updated first and the rules evaluation cancelled only afterwards, which results in permanent interruption of all rules within the group. The fix cancels the evaluation context before applying the update, making sure that the context is always cancelled for the old group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 22:43:02 +01:00
Yury Molodov	551f48466c	vmui: fix `Enter` key in query field (#5667 ) (#5681 )	2024-01-26 22:38:51 +01:00
Artem Navoiev	d42908133c	docs: remove <p> for images (#5702) Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 22:34:40 +01:00
Artem Navoiev	36fa314161	remove all <div> since they are obsolete and can break markdown (#5701) Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 22:32:54 +01:00
Aliaksandr Valialkin	2b0123058a	docs: update -help output after `bb7a419cc3`	2024-01-26 22:29:22 +01:00
Aliaksandr Valialkin	7a8b92b590	lib/{mergeset,storage}: make background merge more responsive and scalable - Maintain a separate worker pool per each part type (in-memory, file, big and small). Previously a shared pool was used for merging all the part types. A single merge worker could merge parts with mixed types at once. For example, it could simultaneously merge an in-memory part plus a big file part. Such a merge could take hours for a big file part. For the duration of this merge the in-memory part was pinned in memory and couldn't be persisted to disk within the configured -inmemoryDataFlushInterval. Another common issue, which could happen when parts with mixed types are merged, is uncontrolled growth of in-memory parts or small parts when all the merge workers were busy with merging big files. Such growth could lead to significant performance degradation for queries, since every query needs to check an ever-growing list of parts. This could also slow down the registration of new time series, since VictoriaMetrics searches for the internal series_id in the indexdb for every new time series. The third issue is graceful shutdown duration, which could be very long when a background merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted, since it merges in-memory parts. A separate pool of merge workers per every part type elegantly resolves all three issues: - In-memory parts are merged to file-based parts in a timely manner, since the maximum size of in-memory parts is limited. - Long-running merges for big parts do not block merges for in-memory parts and small parts. - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files. Merging for file parts is instantly canceled on graceful shutdown now. - Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm should automatically self-tune according to the number of available CPU cores. - Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly. It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge - Tune the number of shards for pending rows and items before the data goes to in-memory parts and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate for registration of new time series. This should reduce the duration of data ingestion slowdown in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily unavailable. - Prevent a possible "sync: WaitGroup misuse" panic on graceful shutdown. This is a follow-up for `fa566c68a6`. Thanks to @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2024-01-26 22:19:52 +01:00
Artem Navoiev	9e0416c666	docs: delete docs/provision_datasources.png as we support webp Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:36:19 +01:00
Github Actions	41c48a4c59	Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@ef5cfe6 (#5700 )	2024-01-26 21:35:53 +01:00
Aliaksandr Valialkin	c067f3f288	lib/mergeset: remove inmemoryBlock pooling, since it wasn't effective This should reduce memory usage a bit when new time series are ingested at high rate (aka high churn rate)	2024-01-26 21:34:22 +01:00
Aliaksandr Valialkin	d8c82b6421	app/vmselect/netstorage: initialize tmpBlocksFileWrapper in the goroutine that continues using it This may improve CPU cache locality	2024-01-26 21:29:30 +01:00
Aliaksandr Valialkin	230ef43a32	lib/logstorage: make sure that WaitGroup.Add isn't called after stopCh is closed and WaitGroup.Wait is called This protects against a rare panic, which may occur during graceful shutdown of VictoriaLogs	2024-01-26 21:18:07 +01:00
Aliaksandr Valialkin	9e70d9ab47	docs/Makefile: mention that the Makefile rules must be run from VictoriaMetrics repository root	2024-01-26 21:11:14 +01:00
Aliaksandr Valialkin	7c7bfa27ac	app/vmauth: return 503 Service Unavailable status code when the backend returns a response with an unsupported status code, but the request cannot be retried. While at it, properly close the response body. This should prevent possible HTTP keep-alive connection leaks to backends caused by unclosed response bodies. This is a follow-up for `3c0aa14b5b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5688	2024-01-26 21:10:57 +01:00
Artem Navoiev	2b870b1116	docs: fix key concepts image and links Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:10:36 +01:00
Artem Navoiev	bb5a0719a5	docs: change [image] to img, since we support it in the release guide Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:09:22 +01:00
Artem Navoiev	89fbefefdb	docs: remove vmanomaly, since we have a dedicated section with already existing redirects Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:08:54 +01:00
Artem Navoiev	6ed9a05a08	docs: vmanomaly fix images Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:08:28 +01:00
Daria Karavaieva	b042982339	Vmanomaly Guide dashboard provisioning (#5679 ) * dashboard provisioning * delete dashboard filter, new query * dashboard screens, guide fixes	2024-01-26 21:07:44 +01:00
Artem Navoiev	aee3e51315	docs: remove raw and endraw tags as they are not needed for the new v… (#5696) * docs: remove raw and endraw tags as they are not needed for the new version of the site Signed-off-by: Artem Navoiev <tenmozes@gmail.com> * revert formatting in vmalert Signed-off-by: Artem Navoiev <tenmozes@gmail.com> --------- Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2024-01-26 21:06:10 +01:00
Github Actions	cd287b2e4c	Automatic update operator docs from VictoriaMetrics/operator@0628def (#5694 )	2024-01-26 21:05:13 +01:00
Github Actions	b8a4a78fef	Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@c644bec (#5691 )	2024-01-26 20:52:47 +01:00
Roman Khavronenko	a2f83115ae	app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` (#5680 ) * app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`. This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE time series for alerting rules with `for: 0` as well. Such time series can be useful for tracking the moment when alerting rule became active. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056 Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: support ALERTS_FOR_STATE in `replay` mode Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 20:51:50 +01:00
Github Actions	362e52f880	Automatic update operator docs from VictoriaMetrics/operator@e75a096 (#5690 )	2024-01-26 20:50:57 +01:00
hagen1778	d76e338505	docs: simplify instructions for spinning up docker env Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 20:50:31 +01:00
Github Actions	8fa6c42396	Automatic update operator docs from VictoriaMetrics/operator@f6b9c08 (#5676 )	2024-01-26 20:50:00 +01:00
Alexander Marshalov	ef4bb36d99	vmauth: fix `vmauth_user_request_backend_errors_total` metric calc logic for use case when only one backend is available - if we get an error from the retry_status_codes list, but cannot execute retry, we increment vmauth_user_request_backend_errors_total as well (#5688 )	2024-01-26 20:49:18 +01:00
hagen1778	bd7ebb41b2	docs: fix the issue link Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 20:49:01 +01:00
Alexander Marshalov	1f858eb417	Follow up after https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5685 (#5686 )	2024-01-26 20:08:15 +01:00
Alexander Marshalov	711ace0582	Make the Makefile work on macOS and Linux (#5685)	2024-01-26 20:07:20 +01:00
Aliaksandr Valialkin	0715f1efcd	lib/storage: rename AssistedMerges to AssistedMergesCount in order to make these field names less misleading These fields are counters, not gauges, so adding the Count suffix to them makes this easier to understand while reading the code	2024-01-25 10:21:13 +02:00
Alexander Marshalov	14712e3b99	vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682 )	2024-01-25 10:21:09 +02:00
Aliaksandr Valialkin	1cdef56d84	lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts The maxFileParts usage was accidentally removed in `fa566c68a6`. While at it, add the Count suffix to *AssistedMerges counter names in order to make them less misleading. Previously their names falsely suggested that these are gauges, which show the number of concurrently executed assisted merges.	2024-01-24 15:10:48 +02:00
Aliaksandr Valialkin	b8c7f0d3bc	lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct This is a follow-up for `ef12598ad4`	2024-01-24 15:10:47 +02:00
Aliaksandr Valialkin	1e364c992d	lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers Already terminated pods and containers cannot be scraped and will never resurrect, so there is zero sense in creating scrape targets for them.	2024-01-24 14:58:51 +02:00
Aliaksandr Valialkin	0dca3c4025	app/{vmselect,vmstorage}: restore compression of the data passed from vmstorage to vmselect This reverts `cd4f641d32`, since it turned out that disabling compression for vmstorage->vmselect data increases network bandwidth usage by more than 10x on typical production workloads, while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%. The 10x increase in network usage is too high a price for 10% improvements in query latency and vmstorage CPU usage. This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability of VictoriaMetrics cluster. That's why vmstorage->vmselect data compression is enabled again by default. The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage. The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.	2024-01-24 13:37:05 +02:00
Aliaksandr Valialkin	e6e5b97e1e	lib/streamaggr: expand `%{ENV}` placeholders in stream aggregation configs	2024-01-24 12:31:42 +02:00
Aliaksandr Valialkin	12698b9136	lib/mergeset: really limit the number of in-memory parts to 15 It turned out that the registration of new time series slows down linearly with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part when it searches for TSID by newly ingested metric name. The number of in-memory parts grows when new time series are registered at high rate. The number of in-memory parts grows faster on systems with a large number of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries for the indexdb, and every such entry is eventually transformed into a separate in-memory part. The solution was suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 by @misutoth: limit the number of in-memory parts with a buffered channel. This solution is implemented in this commit. Additionally, this commit merges per-CPU parts into a single part before adding it to the list of in-memory parts. This reduces CPU load when searching for TSID by newly ingested metric name. https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number of in-memory parts to 100, but my internal testing shows that a much lower limit of 15 works with the same efficiency on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190	2024-01-24 03:41:19 +02:00
Aliaksandr Valialkin	5205f1c6a6	docs/Cluster-VictoriaMetrics.md: document that `vmstorage` doesn't compress data it sends to `vmselect` by default This is a follow-up for `cd4f641d32`	2024-01-23 23:21:31 +02:00
Aliaksandr Valialkin	8dd73574ca	lib/encoding: remove unneeded re-slicing of byte slice before passing it to binary.BigEndian.Uint*	2024-01-23 22:50:11 +02:00
Aliaksandr Valialkin	5a97668ad6	lib/handshake: substitute time.Now() with fasttime.UnixTimestamp(), since profiling shows time.Now() is slow	2024-01-23 18:39:28 +02:00
Aliaksandr Valialkin	be320c81bc	app/vminsert/clusternative: explain why lower-level vminsert doesn't compress responses to upper-level vminsert	2024-01-23 18:14:19 +02:00
Aliaksandr Valialkin	3199558da9	lib/{storage,mergeset}: reduce the maximum compression level for the stored data This reduces CPU usage a bit without increasing resulting file sizes according to synthetic tests.	2024-01-23 17:47:40 +02:00
Github Actions	4ccf3f41c6	Automatic update operator docs from VictoriaMetrics/operator@1470569 (#5668 )	2024-01-23 17:47:36 +02:00
Aliaksandr Valialkin	68d76b1436	lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache This reduces the indexdb/tagFiltersToMetricIDs cache size by 8x on average. The cache size can be checked via the vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at the /metrics page.	2024-01-23 16:13:25 +02:00
Aliaksandr Valialkin	9b3217db61	lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting	2024-01-23 16:13:19 +02:00
Aliaksandr Valialkin	7ed7eb95b4	lib/filestream: do not measure read / write duration from / to in-memory buffers Measuring read / write duration from / to in-memory buffers makes little sense, since it is always fast. It is better to measure read / write duration from / to real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics. This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers. This is a follow-up for `2f63dec2e3`	2024-01-23 14:53:35 +02:00
Aliaksandr Valialkin	cfc1193d15	app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery() This avoids slow path in Go runtime for allocating objects bigger than 32Kb - see `704401ffa0/src/runtime/malloc.go (L11)` This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `5dd37ad836` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 14:12:27 +02:00
Aliaksandr Valialkin	fe4ea30a79	app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size See `704401ffa0/src/runtime/malloc.go (L11)` This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics after the commit `508c608062` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527	2024-01-23 13:50:59 +02:00
Aliaksandr Valialkin	47cb79198e	docs/vmagent.md: clarify how `-promscrape.seriesLimitPerTarget` command-line flag, `series_limit` config option and `__series_limit__` label interact with each other This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5663 See also `89e3c70ccd`	2024-01-23 13:15:45 +02:00
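
Commit 9e9f170fe7 drops the unfinished aggregation state on shutdown unless `flush_on_shutdown: true` is set. A minimal Go sketch of that decision, assuming a simple sum aggregator; `aggrState`, `flushOnShutdown` and `pushFunc` are hypothetical names, not the lib/streamaggr API:

```go
package main

// aggrState is a hypothetical sum aggregator used for illustration; it is not
// the lib/streamaggr implementation.
type aggrState struct {
	sum             float64
	samples         int
	flushOnShutdown bool                // stands in for the flush_on_shutdown setting
	pushFunc        func(value float64) // receives aggregated values
}

// add accumulates one incoming sample.
func (a *aggrState) add(v float64) {
	a.sum += v
	a.samples++
}

// flush emits the value for a completed interval and resets the state.
func (a *aggrState) flush() {
	if a.samples > 0 {
		a.pushFunc(a.sum)
	}
	a.sum, a.samples = 0, 0
}

// shutdown handles the interval that is still open when the process stops.
// By default the partial state is dropped, because a half-filled interval
// would be reported with a lower value than a complete one.
func (a *aggrState) shutdown() {
	if a.flushOnShutdown {
		a.flush()
		return
	}
	a.sum, a.samples = 0, 0
}
```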
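
Commit 562edb72ea fixes the vmalert race by cancelling the evaluation context before the group update is applied. The sketch below shows that ordering under a mutex; `group`, `evalContext` and `update` are hypothetical names, not vmalert's code:

```go
package main

import (
	"context"
	"sync"
)

// group is a hypothetical stand-in for a vmalert rules group.
type group struct {
	mu     sync.Mutex
	rules  []string
	ctx    context.Context
	cancel context.CancelFunc
}

func newGroup(rules []string) *group {
	g := &group{rules: rules}
	g.ctx, g.cancel = context.WithCancel(context.Background())
	return g
}

// evalContext returns the context the current evaluation round must use.
func (g *group) evalContext() context.Context {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.ctx
}

// update cancels the in-flight evaluation of the old rules first and only
// then installs the new rules with a fresh context, so a late cancellation
// can never hit the already updated group.
func (g *group) update(newRules []string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cancel()
	g.rules = newRules
	g.ctx, g.cancel = context.WithCancel(context.Background())
}
```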
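
Commit 7a8b92b590 replaces the shared merge pool with one worker pool per part type. A simplified Go sketch, assuming merges are plain `func()` tasks; `mergePools`, `partType` and the queue size of 16 are hypothetical, and the cancellation of file-part merges on shutdown described in the commit is omitted here:

```go
package main

import "sync"

// partType values mirror the part categories named in the commit message.
type partType int

const (
	inmemoryPart partType = iota
	smallPart
	bigPart
)

// mergePools keeps an independent queue and set of workers per part type,
// so a long merge of big parts never occupies a worker that in-memory parts need.
type mergePools struct {
	queues map[partType]chan func()
	wg     sync.WaitGroup
}

func newMergePools(workersPerType int) *mergePools {
	mp := &mergePools{queues: make(map[partType]chan func())}
	for _, t := range []partType{inmemoryPart, smallPart, bigPart} {
		q := make(chan func(), 16)
		mp.queues[t] = q
		for i := 0; i < workersPerType; i++ {
			mp.wg.Add(1)
			go func() {
				defer mp.wg.Done()
				for merge := range q {
					merge()
				}
			}()
		}
	}
	return mp
}

// submit queues a merge on the pool dedicated to the given part type.
func (mp *mergePools) submit(t partType, merge func()) {
	mp.queues[t] <- merge
}

// stop closes every queue and waits until the workers drain them.
func (mp *mergePools) stop() {
	for _, q := range mp.queues {
		close(q)
	}
	mp.wg.Wait()
}
```

Because each part type drains its own queue, a big merge that runs for hours only occupies a big-part worker.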
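
Commit 230ef43a32 guards against calling `WaitGroup.Add` after `stopCh` is closed and `WaitGroup.Wait` has started. One common way to do that, sketched with a hypothetical `worker` type, is to check a stopped flag and call `Add` under the same mutex that shutdown holds:

```go
package main

import "sync"

// worker is a hypothetical component; the locking pattern is the point.
type worker struct {
	mu      sync.Mutex
	stopped bool
	stopCh  chan struct{}
	wg      sync.WaitGroup
}

func newWorker() *worker {
	return &worker{stopCh: make(chan struct{})}
}

// startTask registers a background task unless shutdown has already begun.
// Checking the flag and calling wg.Add under the mutex held by stop()
// guarantees that wg.Add never runs concurrently with wg.Wait.
func (w *worker) startTask(task func(stopCh <-chan struct{})) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.stopped {
		return false
	}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		task(w.stopCh)
	}()
	return true
}

// stop rejects new tasks, signals running ones and waits for them to exit.
func (w *worker) stop() {
	w.mu.Lock()
	w.stopped = true
	close(w.stopCh)
	w.mu.Unlock()
	w.wg.Wait()
}
```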
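
Commit 1e364c992d skips scrape targets for terminated pods and containers. The filter below uses a simplified, hypothetical subset of the Kubernetes Pod object; it treats the `Succeeded` and `Failed` pod phases as terminal and skips containers whose state is `terminated`:

```go
package main

// containerStatus is a simplified, hypothetical subset of the Kubernetes object.
type containerStatus struct {
	Name       string
	Terminated bool // stands in for state.terminated being set
}

// pod is a simplified, hypothetical subset of the Kubernetes Pod object.
type pod struct {
	Name              string
	Phase             string // "Pending", "Running", "Succeeded", "Failed" or "Unknown"
	ContainerStatuses []containerStatus
}

// scrapableContainers returns the names of containers that may still be scraped.
func scrapableContainers(p pod) []string {
	// Succeeded and Failed are terminal phases: the pod will not run again.
	if p.Phase == "Succeeded" || p.Phase == "Failed" {
		return nil
	}
	var names []string
	for _, cs := range p.ContainerStatuses {
		if cs.Terminated {
			continue
		}
		names = append(names, cs.Name)
	}
	return names
}
```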
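
Commit 12698b9136 caps the number of in-memory parts at 15 with a buffered channel, as proposed in PR 5212, and merges per-CPU parts into one part before registering it. A hypothetical sketch of both ideas, with string items standing in for indexdb entries; `inmemoryPartsLimiter` and `mergeShards` are illustrative names:

```go
package main

import "sort"

// maxInmemoryParts matches the limit chosen in the commit.
const maxInmemoryParts = 15

// inmemoryPartsLimiter uses a channel with capacity maxInmemoryParts
// as a counting semaphore for in-memory parts.
type inmemoryPartsLimiter struct {
	tokens chan struct{}
}

func newInmemoryPartsLimiter() *inmemoryPartsLimiter {
	return &inmemoryPartsLimiter{tokens: make(chan struct{}, maxInmemoryParts)}
}

// tryAddPart reserves room for one more in-memory part. A real writer would
// block on the send until a merge frees a slot; this sketch reports failure.
func (l *inmemoryPartsLimiter) tryAddPart() bool {
	select {
	case l.tokens <- struct{}{}:
		return true
	default:
		return false
	}
}

// partsMerged frees n slots after n in-memory parts were merged into a file part.
func (l *inmemoryPartsLimiter) partsMerged(n int) {
	for i := 0; i < n; i++ {
		<-l.tokens
	}
}

// mergeShards combines per-CPU shards of new entries into a single sorted
// part, instead of registering one in-memory part per shard.
func mergeShards(shards [][]string) []string {
	var items []string
	for _, s := range shards {
		items = append(items, s...)
	}
	sort.Strings(items)
	return items
}
```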
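
Commit 68d76b1436 compresses the matching metricIDs before storing them in tagFiltersToMetricIDsCache, but the message does not say which encoding is used. As an assumption, the sketch below uses a common technique for sorted uint64 IDs: delta encoding followed by uvarint encoding from Go's `encoding/binary`. `compressMetricIDs` and `decompressMetricIDs` are hypothetical names:

```go
package main

import (
	"encoding/binary"
	"errors"
)

// compressMetricIDs delta-encodes sorted IDs and writes each delta as a uvarint.
func compressMetricIDs(sortedIDs []uint64) []byte {
	var dst []byte
	buf := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, id := range sortedIDs {
		n := binary.PutUvarint(buf, id-prev)
		dst = append(dst, buf[:n]...)
		prev = id
	}
	return dst
}

// decompressMetricIDs reverses compressMetricIDs.
func decompressMetricIDs(src []byte) ([]uint64, error) {
	var ids []uint64
	var prev uint64
	for len(src) > 0 {
		delta, n := binary.Uvarint(src)
		if n <= 0 {
			return nil, errors.New("corrupted uvarint")
		}
		prev += delta
		ids = append(ids, prev)
		src = src[n:]
	}
	return ids, nil
}
```

Consecutive IDs that are close to each other shrink to one byte per delta instead of eight bytes per raw uint64.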
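
Commits cfc1193d15 and fe4ea30a79 keep buffers at or below 32Kb so that allocations stay on the Go runtime's small-object path. Below is one way to enforce such a limit, not necessarily the one used in vmselect: append items to chunks of at most 32 KiB and start a new chunk when the next item does not fit. `chunkedBuf` is a hypothetical type:

```go
package main

// maxChunkSize mirrors the 32Kb limit: the Go allocator serves objects up to
// 32 KiB from size-classed spans, while bigger ones take the large-object path.
const maxChunkSize = 32 * 1024

// chunkedBuf stores items in bounded chunks instead of one ever-growing slice.
type chunkedBuf struct {
	chunks [][]byte
}

// add appends b and returns the chunk index and offset where it is stored.
func (cb *chunkedBuf) add(b []byte) (chunkIdx, offset int) {
	n := len(cb.chunks)
	if n == 0 || len(cb.chunks[n-1])+len(b) > maxChunkSize {
		capacity := maxChunkSize
		if len(b) > capacity {
			capacity = len(b) // an oversized item gets a dedicated chunk
		}
		cb.chunks = append(cb.chunks, make([]byte, 0, capacity))
		n++
	}
	offset = len(cb.chunks[n-1])
	cb.chunks[n-1] = append(cb.chunks[n-1], b...)
	return n - 1, offset
}
```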


8411 Commits