VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-15 08:23:34 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	12e94f10cc	deployment/docker: update Go builder from Go1.21.4 to Go1.21.5 See https://github.com/golang/go/issues?q=milestone%3AGo1.21.5+label%3ACherryPickApproved	2023-12-06 22:33:27 +02:00
Dmytro Kozlov	6a41e1ec0c	app/vmalert: replace error metrics for gauges with counter metrics (#5217 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `935bec447b`)	2023-12-06 19:41:34 +01:00
Aliaksandr Valialkin	8b6bce61e4	lib/promscrape: follow-up for `97373b7786` Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down /metrics exposition significantly when -promscrape.config contains thousands of scrape jobs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335	2023-12-06 17:36:48 +02:00
Aliaksandr Valialkin	509339bf63	app/vmselect: properly adjust the lower bound for the time range where raw samples must be selected for default_rollup() function Previously the lower bound could be too small, which could result in missing values at the beginning of the graph for default_rollup() function. This function is automatically applied to all the series selectors if they aren't explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting the lower bound for the selected time range. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388	2023-12-06 14:46:18 +02:00
Hui Wang	065f5a7f9e	vmagent: add `vm_promscrape_scrape_pool_targets` for scrape jobs like… (#5335 ) * vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers * add extra panel for new metric	2023-12-06 14:46:02 +02:00
Aliaksandr Valialkin	61db92cdc7	Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb" This reverts commit `73d18fbc7a`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:29:00 +02:00
Aliaksandr Valialkin	bf187b2dc9	app/vmagent: add `-enableMultitenantHandlers` command-line flag This flag allows converting tenant id to (vm_account_id, vm_project_id) labels. this flag deprecates `-remoteWrite.multitenantURL` command-line flag, because `-enableMultitenantHandlers` is easier to use and combine with multitenant url at vminsert - https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels See https://docs.victoriametrics.com/vmagent.html#multitenancy Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505	2023-12-05 01:35:59 +02:00
Dmytro Kozlov	6770bad207	app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format (#5397 ) * app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format * app/vmalert: hide updates if query param not set * app/vmalert: fix panic (recursion call) * app/vmalert: add needed group name and file name * app/vmalert: fix comment, update behavior * app/vmalert: fix description * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-12-04 22:49:39 +02:00
Aliaksandr Valialkin	a3d0bbfcda	deployment/docker: update base Docker image from alpine 3.18.4 to 3.18.5 See https://www.alpinelinux.org/posts/Alpine-3.15.11-3.16.8-3.17.6-3.18.5-released.html	2023-12-04 18:17:07 +02:00
Aliaksandr Valialkin	d868155751	app/vmselect: do not limit concurrency for static and fast queries Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages, since `vmui` and these pages couldn't be opened on overloaded vmselect. Thanks to @f41gh7 for the idea.	2023-12-04 18:14:29 +02:00
Aliaksandr Valialkin	b6d6a3a530	lib/promscrape: show dropped targets because of sharding at /service-discovery page Previously the /service-discovery page didn't show targets dropped because of sharding ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). Show also the reason why every target is dropped at /service-discovery page. This should improve debugging why particular targets are dropped. While at it, do not remove dropped targets from the list at /service-discovery page until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets . Previously the list was cleaned up every 10 minutes from the entries, which weren't updated for the last minute. This could complicate debugging of dropped targets. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2023-12-04 17:42:46 +02:00
Zakhar Bessarab	2992682f6c	lib/backup/s3remote: remove prev object versions for recursive delete (#719 ) * lib/backup/s3remote: remove prev object versions for recursive delete - fix error caused by sending empty objects list to be deleted. This was possible in case old versions of objects were deleted, but root-level entries were still available. This caused paginator to return an empty page which wasn't skipped. - delete previous versions of objects recursively for S3 remote Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/changelog: add vmbackupmanager fix entry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/backup/s3remote: unify path construction for S3 objects Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-12-04 17:01:09 +02:00
Aliaksandr Valialkin	9f352f1b93	app/vminsert/newrelic: simplify the code a bit after `1fb8dc0092` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5416 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5421	2023-12-04 16:26:52 +02:00
Dmytro Kozlov	1fb8dc0092	app/vminsert: fix newrelic ingestion in cluster version (#5421 ) Properly pass tenant ID to ingested data from newrelic. Before tenant ID was mistakenly skipped.	2023-12-04 09:38:32 +01:00
Hui Wang	3507e1e27b	vmalert-tool: fix alert_rule_test case when eval_time is not multiple of evaluation_interval (#5387 ) Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `1911320c86`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 14:00:58 +01:00
Aliaksandr Valialkin	d1445bc0c8	all: expose additional metrics for simplifying debugging of VictoriaMetrics components Updates https://github.com/VictoriaMetrics/metrics/issues/54 (cherry picked from commit `8eddccfbb4`)	2023-12-01 14:00:28 +01:00
Aliaksandr Valialkin	f0215afee3	lib/promrelabel: add `keep_if_contains` and `drop_if_contains` relabeling actions (cherry picked from commit `ac65c6b178`)	2023-12-01 14:00:20 +01:00
Nikolay	9505d48070	lib/streamaggr: properly reference slice with labels (#5406 ) * lib/streamaggr: properly reference slice with labels by limiting slice capacity. It must fix issues with slice modification: in case of append, a new slice will be allocated instead of modifying the referenced slice https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402 * Reduce memory allocations when output_relabel_configs adds new labels to output samples --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> (cherry picked from commit `41f7940f97`)	2023-12-01 14:00:18 +01:00
hagen1778	73d18fbc7a	lib/protoparser/datadog: follow-up after `543f218fe9` * prevent /api/v1 from panic on parsing rows * add tests for Extract function for v1 and v2 api's * separate request types in different pools to prevent different objects mixing * add changelog line `543f218fe9` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `98d0f81f21`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 13:56:23 +01:00
hagen1778	1e557b73a5	docs: mention contributor of PR 5368 Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `5424632ba3`)	2023-11-28 12:49:49 +01:00
luckyxiaoqiang	8ce82c5400	app/vmselect/promql: add day_of_year() function (#5368 ) Co-authored-by: dingxiaoqiang <dingxiaoqiang@bytedance.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `d7897e0d70`)	2023-11-28 12:49:48 +01:00
Aliaksandr Valialkin	2f14394335	app/vmagent: follow-up for `090cb2c9de` - Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check for the result returned from these functions. - Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to persistent queue. - Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case. - Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage cannot keep up with the data ingestion rate. - Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples. - Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push data to persistent queue when -remoteWrite.disableOnDiskQueue is set. - Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics, because they are replaced with vmagent_remotewrite_samples_dropped_total metric. - Update 'Disabling on-disk persistence' docs at docs/vmagent.md - Update stale comments in the code Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 12:13:39 +02:00
Nikolay	25ac2aac31	app/vmagent: allow to disable on-disk persistence (#5088 ) * app/vmagent: allow to disable on-disk queue Previously, it wasn't possible to build data processing pipeline with a chain of vmagents. In case when remoteWrite for the last vmagent in the chain wasn't accessible, it persisted data only when it had enough disk capacity. If the disk queue was full, it started to silently drop ingested metrics. New flags allow disabling on-disk persistence and immediately returning an error if remoteWrite is not accessible anymore. It blocks any writes and notifies the client that data ingestion isn't possible. Main use case for this feature - use external queue such as kafka for data persistence. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110 * adds test, updates readme * apply review suggestions * update docs for vmagent * makes linter happy --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin	3674232128	docs: make more visible that the maximum JSON line length, which is accepted by /api/v1/import, is limited by -import.maxLineLen command-line flag value This is a follow-up for `0cf55ded34` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5364	2023-11-24 13:14:40 +02:00
Roman Khavronenko	26242f526e	lib/protoparser: decrease `import.maxLineLen` from 100MB to 10MB (#5364 ) Tests showed that importing a single line with 70MB size takes 5.3GiB RSS memory for VictoriaMetrics single-node. In the scenario when user exports and imports data from one VM to another, it could possibly lead to OOM exception for destination VM. Importing a single line with 16MB size takes 1.3GiB RSS memory. Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB to improve reliability of VictoriaMetrics during imports. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 13:13:33 +02:00
Aliaksandr Valialkin	01bc62eff9	docs/CHANGELOG.md: document Google PubSub support at vmagent (see `752f89f13f` )	2023-11-23 21:14:04 +02:00
Aliaksandr Valialkin	a906a7d85c	app/vmagent/remotewrite: do not drop persistent queues when -remoteWrite.multitenantURL is set It is unsafe to drop persistent queues when -remoteWrite.multitenantURL command-line flag is set, since these queues are created on demand when a new sample for the given tenant is pushed to the remote storage. This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5357 The issue appeared in the commit `f3a51e8b1d` when implementing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014	2023-11-23 20:43:21 +02:00
Hui Wang	91379331eb	lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344 ) This case is possible after the following steps: 1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data. 2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk. 3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression. 4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data. 5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy. The solution is the same as `12cd32fd75`, just fall back to zstd decompression if Snappy decompression fails.	2023-11-17 15:53:18 +01:00
Aliaksandr Valialkin	1a15b0f57b	docs/CHANGELOG.md: cut v1.95.1	2023-11-16 20:32:27 +01:00
Aliaksandr Valialkin	ef80a89a24	lib/handshake: add SetReadDeadline and SetWriteDeadline implementations additionally to SetDeadline This is a follow-up for `27a5461785` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327	2023-11-16 16:43:36 +01:00
Roman Khavronenko	27a5461785	lib/handshake: check for deadline in Read and Write methods (#5327 ) The buffered connection could have exceeded the underlying connection deadline during reading or writing to an internal buffer. With this change, buffered connection struct additionally checks for a deadline in Read/Write methods. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-16 16:33:40 +01:00
Aliaksandr Valialkin	147fe45828	docs/CHANGELOG.md: remove duplicate word `query` after `2cbdb1db22`	2023-11-16 16:24:15 +01:00
Aliaksandr Valialkin	7ca8ebef20	app/vmselect/promql: properly handle duplicate series when merging cached results with the results obtained from the database evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series) when performing queries satisfying all the following conditions: - It must select time series with multiple metric names. For example, {__name__=~"foo\|bar"} - The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo\|bar"}) - The rollup function must be wrapped into aggregate function, which has no streaming optimization. For example, quantile(0.9, rate({__name__=~"foo\|bar"}) In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error. Instead, it should fall back to query execution with disabled cache. Also properly store the merged results. Previously they were incorrectly stored because of a typo introduced in the commit `41a0fdaf39` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337	2023-11-16 16:16:17 +01:00
hagen1778	7d72474a38	dashboards: use `version` instead of `short_version` in annotations `version` label won't show the difference if various flavors of the same version were deployed. But `short_version` will. For example, on the sandbox env we test VM builds before new version release. Without this change, the version update won't be visible on dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `d389a4fcf3`)	2023-11-16 09:27:42 +01:00
Aliaksandr Valialkin	9ad4a8fffe	docs/CHANGELOG.md: cut v1.95.0 release	2023-11-15 17:46:02 +01:00
Aliaksandr Valialkin	bd5bbdf00c	docs/CHANGELOG.md: document v1.93.8 LTS release	2023-11-15 17:12:56 +01:00
Aliaksandr Valialkin	6a8911ad38	docs/CHANGELOG.md: document v1.87.11 LTS release	2023-11-15 15:54:57 +01:00
Aliaksandr Valialkin	d7a63529b5	docs/CHANGELOG.md: consistently prepend command-line flags with a single dash	2023-11-14 21:44:46 +01:00
hagen1778	cfc58dd932	docs: clarify vmalert flag changes Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-14 21:44:46 +01:00
Nikolay	0730c2586d	lib/querytracer: makes package concurrent safe to use (#5322 ) * lib/querytracer: makes package concurrent safe to use it must fix various issues with concurrent code usage. Especially, when it's not reasonable to wait for all goroutines to be finished * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 20:58:28 +01:00
hagen1778	72a40539b0	dashboards: update description for RSS and anonymous memory panels to be consistent for single-node, cluster and vmagent dashboards. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `d3ae2b2f62`)	2023-11-14 10:00:11 +01:00
hagen1778	777424082b	deployment/dashboards: respect `job` and `instance` filters for `alerts` annotation in cluster and single-node dashboards Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `d6ae082598`)	2023-11-14 10:00:11 +01:00
Aliaksandr Valialkin	d6a2264709	docs/CHANGELOG.md: document `0e056ddb2d` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5203	2023-11-14 01:24:29 +01:00
Zakhar Bessarab	f7834767c1	vmcluster: re-routing enhancement (#5293 ) * app/vmstorage: close vminsert connections gradually before stopping storage Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878 Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update graceful shutdown logic - close connections from vminsert in deterministic order - update flag description - lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s (default value in Kubernetes and ansible-playbooks). Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add information about re-routing enhancement during restart Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/changelog: add entry for new command-line flag Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * {app/vmstorage,lib/ingestserver}: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add note to update workload scheduler timeout Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * wip --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 01:00:42 +01:00
Aliaksandr Valialkin	c1f651a9f9	app/vmauth: add ability to drop the specified number of `/`-delimited prefix parts from request path This can be done via `drop_src_path_prefix_parts` option at `url_map` and `user` levels. See https://docs.victoriametrics.com/vmauth.html#dropping-request-path-prefix	2023-11-13 22:34:40 +01:00
Aliaksandr Valialkin	12cd32fd75	lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails This case is possible after the following steps: 1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether the remote storage supports zstd-compressed data. 2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression for the data sent to the remote storage. 3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk. 4. The remote storage becomes available. 5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data. 6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage, while falsely advertizing it sends zstd-compressed data. 7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd. The solution is to just fall back to Snappy decompression if zstd decompression fails. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301	2023-11-13 21:25:39 +01:00
Aliaksandr Valialkin	356deada8c	lib/htmlcomponents: use relative links for the top page and for favicon.ico This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes. For example, vmagent HTTP handlers can be served via /vmagent/ path prefix: - http://proxy/vmagent/targets - http://proxy/vmagent/service-discovery The path prefix can be arbitrary. For example, below are vmagent urls for /tenantID/vmagent/ path prefix: - http://proxy/tenantID/vmagent/targets - http://proxy/tenantID/vmagent/service-discovery While at it, consistently serve favicon.ico from any path directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307	2023-11-13 20:28:17 +01:00
Aliaksandr Valialkin	fb2071a01e	lib/regexutil: properly handle alternate regexps surrounded by .+ or .* Previously the following regexps were improperly handled: .+foo\|bar.+ .foo\|bar. This could lead to unexpected regexp match results. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297 Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308	2023-11-13 18:25:57 +01:00
Aliaksandr Valialkin	2a3352c70e	docs/CHANGELOG.md: remove trailing whitespace after `bffd30b57a`	2023-11-13 09:47:36 +01:00
Aliaksandr Valialkin	b9aba7edfb	app/vmauth: properly pass `Host` header to backends Previously the `Host` header remained unchanged when passing it in requests to backends. This may work improperly if the backend uses host-based routing. While at it, allow http/2.0 requests to backends. While VictoriaMetrics components do not accept http/2.0 requests, other backends can require such requests. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240	2023-11-13 09:45:34 +01:00
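
Commit 9505d48070 above (lib/streamaggr) describes fixing slice modification by limiting slice capacity, so that an append allocates a new slice. The following is a minimal sketch of that general Go idiom, not the actual lib/streamaggr code; the helper name `clipCap` and the sample label values are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// clipCap is a hypothetical helper: it returns labels with its capacity
// limited to its length (a full slice expression), so a later append has
// to allocate a new backing array instead of writing into shared memory.
func clipCap(labels []string) []string {
	return labels[:len(labels):len(labels)]
}

func main() {
	// base has length 2 and spare capacity for 2 more elements.
	base := make([]string, 2, 4)
	base[0], base[1] = "job", "instance"

	// Without clipping, both appends write into the same spare slot of
	// the shared backing array, so the second overwrites the first.
	a := append(base, "a")
	b := append(base, "b")
	fmt.Println(a[2], b[2]) // prints "b b"

	// With clipped capacity, each append allocates its own array.
	c := append(clipCap(base), "c")
	d := append(clipCap(base), "d")
	fmt.Println(c[2], d[2]) // prints "c d"
}
```

The trade-off is one extra allocation on the first append to the clipped slice, in exchange for never mutating memory that other references can still see.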


1750 Commits