VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-15 16:30:55 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	e1cf962bad	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If a new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index has no corresponding entry either. This is a typical event under high churn rate, when old time series are constantly substituted with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amount of data that needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when a new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series, which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 17:03:50 -07:00
Aliaksandr Valialkin	df67b78f75	docs/CHANGELOG.md: clarify the description of the bugfix at `177a0c1ca9` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4555	2023-07-13 12:19:00 -07:00
Dmytro Kozlov	f31ac064f9	app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined (#4605 ) * app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined * app/vmctl: update CHANGELOG.md --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-07-13 12:13:21 -07:00
Dmytro Kozlov	555a0a9d57	app/vmctl: fix issue with adding many seconds (#4617 ) * app/vmctl: fix issue with adding many seconds * app/vmagent: add CHANGELOG.md	2023-07-13 12:09:54 -07:00
Roman Khavronenko	fdccb56620	vmalert: check for negative offset for missed rounds (#4628 ) It could happen for low evaluation intervals and irregular delays during execution that evaluation time would get a negative offset. This could result in a cumulative discrepancy between the actual time and evaluation time for rules. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-13 12:05:52 -07:00
Aliaksandr Valialkin	b07a1c85b9	all: update Go builder from 1.20.5 to 1.20.6 See https://github.com/golang/go/issues?q=milestone%3AGo1.20.6+label%3ACherryPickApproved	2023-07-12 01:00:24 -07:00
Haleygo	ef8e3eb9b3	vmselect: fix result in Prometheus query when time is small (#4578 ) vmselect: fix result in Prometheus query when time is small Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-07-09 12:33:29 -07:00
Haleygo	3c2308fd52	vmalert: fix query request using rfc3339 format (#4577 ) vmalert: consistently use time.RFC3339 format for time in queries Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-07-09 11:03:10 -07:00
Roman Khavronenko	173ccf4333	vmselect: introduce `search.skipSlowReplicas` cmd-line flag (#4538 ) * vmselect: introduce `search.skipSlowReplicas` cmd-line flag vmselect has two logical conditions during request processing when `-replicationFactor` cmd-line flag is set: 1. If at least `len(storageNodes) - replicationFactor` responded, it could skip waiting for the rest of nodes to respond. This could lead to problems described here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207. 2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded without an error. P1 proved error-prone and became the main reason why using `-replicationFactor` at the vmselect level wasn't recommended. However, this optimization could still be very useful when a cluster has both slow and fast replicas. P2 remains viable and important unconditionally. Hiding P1 behind the feature-flag `search.skipSlowReplicas` should make `-replicationFactor` flag usable again and let users choose whether they want P1 to be respected. Related issues https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711 Signed-off-by: hagen1778 <roman@victoriametrics.com> * docs: update changelog Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-07 11:50:26 +02:00
Roman Khavronenko	109e55f865	vmalert: allow disabling of `step` param attached to instant queries (#4574 ) vmalert: allow disabling of `step` param attached to instant queries This might be useful for using vmalert with datasources that do not support this param, unlike VictoriaMetrics. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 23:13:56 -07:00
Aliaksandr Valialkin	eea088d87f	docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix This is a follow-up for `5eb5df96e2`	2023-07-06 22:42:02 -07:00
Aliaksandr Valialkin	eeb53660b8	docs/CHANGELOG.md: use the proper link to the issue related to the commit `7a92263459` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4402	2023-07-06 22:41:43 -07:00
Aliaksandr Valialkin	67a8992798	docs/CHANGELOG.md: remove redundant info from the url to consulagent_sd_configs docs This is a follow-up for `40d12be607`	2023-07-06 22:41:23 -07:00
Aliaksandr Valialkin	40f1ccba67	docs/CHANGELOG.md: clarify the description of the bugfix at `ce7141383d`	2023-07-06 22:41:03 -07:00
Aliaksandr Valialkin	dc89e1f644	app/vmselect/graphite: follow-up after `c7884f8686` - Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API - Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API - Clarify the change in docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841	2023-07-06 22:33:30 -07:00
Alexander Marshalov	eb611c3dc3	fix removing storage data dir before restoring from backup (#598 ) * fix removing storage data dir before restoring from backup Signed-off-by: Alexander Marshalov <_@marshalov.org> * fix review comment Signed-off-by: Alexander Marshalov <_@marshalov.org> * fix review comment Signed-off-by: Alexander Marshalov <_@marshalov.org> * fixes after merge with `enterprise-single-node` branch Signed-off-by: Alexander Marshalov <_@marshalov.org> --------- Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-07-06 22:32:12 -07:00
Aliaksandr Valialkin	2f19ba0f75	app/vmselect/netstorage: follow-up after `11ac551d52` - Clarify the scope of the fix at docs/CHANGELOG.md - Handle the case when -search.maxSamplesPerSeries limit is exceeded in the same way as the -search.maxSamplesPerQuery limit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472	2023-07-06 22:26:47 -07:00
Roman Khavronenko	bd5abb74fd	vmctl: interrupt explore procedure in influx mode if no numeric fields were found (#4576 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 22:21:18 -07:00
Roman Khavronenko	41f0ed48eb	docs: follow-up after `9da638aa66` (#4572 ) `9da638aa66` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 22:18:54 -07:00
Dmytro Kozlov	dd412a3757	app/vmalert: show on UI groups error after reload config (#4543 ) show on UI groups error after reload config https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4076 Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 22:11:36 -07:00
Haleygo	b029286298	fix parse for invalid partial RFC3339 format (#4539 ) The validation was needed for covering corner cases when storage is tested with data from 1970. This resulted in unexpected search results, as year was parsed incorrectly from the given timestamp. Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 22:09:35 -07:00
Nikolay	68879061be	docs: adds v1.91.3 release docs (#4561 )	2023-07-06 22:06:58 -07:00
Yury Molodov	8c190ec8fb	vmui: fix app routing issues (#4408 ) The change focuses on rectifying inconsistencies in the navigation behavior of the application and eliminating issues encountered when manually altering the URL. The key updates include: - Refactoring of the routing mechanism to handle all possible routes and their states. - Enhancement of the React Router usage to ensure a smoother navigation experience. - Handling application state when the URL is manually changed.	2023-07-06 21:58:09 -07:00
Alexander Marshalov	677c8a5465	show backup progress percentage in vmbackup log during backup uploading and restoring progress percentage in vmrestore log during backup downloading (#4460 ) (#4530 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-07-06 21:56:54 -07:00
Roman Khavronenko	cf433c066a	vmauth: expose latency metrics per user (#4525 ) expose `vmauth_user_request_duration_seconds` and `vmauth_unauthorized_user_request_duration_seconds` summary metrics for measuring requests latency per user. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 21:55:37 -07:00
Haleygo	9e49a9e924	vmalert: add `vmalert_remotewrite_sent_duration_seconds_total` metric (#4517 ) add `vmalert_remotewrite_sent_duration_seconds_total` metric	2023-07-06 21:51:31 -07:00
Roman Khavronenko	d5e7ea5ef3	vmalert: update retry policy for pushing data to `-remoteWrite.url` (#4504 ) By default, vmalert will make multiple retry attempts with exponential delay. The total time spent during retry attempts shouldn't exceed `-remoteWrite.retryMaxTime` (default is 30s). When retry time is exceeded vmalert drops the data dedicated for `-remoteWrite.url`. Before, vmalert dropped data after 5 retry attempts with 1s delay between attempts (not configurable). See `-remoteWrite.retryMinInterval` and `-remoteWrite.retryMaxTime` cmd-line flags. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-07-06 21:44:18 -07:00
Roman Khavronenko	311a81c7b0	vmalert: properly interrupt remotewrite retries on shutdown (#4505 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 21:43:04 -07:00
Zakhar Bessarab	7a000159d8	docs/changelog: followup for `830dac177f` (#4499 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-07-06 21:41:36 -07:00
Roman Khavronenko	d4ee505f6f	vmalert: retry all errors except 4XX status codes (#4461 ) vmalert: retry all errors except 4XX status codes Retry all errors except 4XX status codes while pushing via remote-write to the remote storage. Previously, errors like broken connection could prevent vmalert from retrying the request. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 17:34:32 -07:00
Yury Molodov	0ad966a898	vmui: memory leak fix (#4455 ) * fix: optimize the preparation of data for the graph * fix: optimize tooltip rendering * fix: optimize re-rendering of the chart * vmui: memory leak fix	2023-07-06 17:33:54 -07:00
Aliaksandr Valialkin	46210c4d5e	lib/promutils.ParseTime(): add support for timestamps in milliseconds See https://stackoverflow.com/questions/76437098/how-to-handle-time-unit-and-step-while-ingesting-or-querying-in-victoriametrics/76438405 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4459	2023-07-06 17:11:54 -07:00
Nikolay	dd7ebd6779	lib/storage: creates parts.json on start-up if it does not exist. (#4450 ) * lib/storage: creates parts.json on start-up if it does not exist. It fixes migrations from versions below v1.90.0. Previously parts.json was created only after successful merge. But if merge was interrupted for some reason (OOM or shutdown), parts.json wasn't created and partitions left after interrupted merge weren't properly deleted, since VM cannot check whether they must be removed or not. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 * Apply suggestions from code review Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> * Update lib/storage/partition.go Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-07-06 17:10:26 -07:00
Dmytro Kozlov	b32a270da7	vmctl: increase retry backoff policy delay (#4447 ) vmctl: update backoff policy on retries to reduce probability of overloading for `source` or `destination` databases	2023-07-06 17:00:06 -07:00
Dmytro Kozlov	2e81c5f740	vmctl: finish retries if context canceled (#4442 ) vmctl: interrupt backoff retries if import context is cancelled Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-07-06 16:56:00 -07:00
Alexander Marshalov	4084dba9e4	fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390 ) (#4439 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-07-06 16:53:29 -07:00
Roman Khavronenko	ecd7ec4832	Dashboard upd (#4438 ) dashboards: update dashboard for single-node version * add anonymous mem usage panel; * add syscall rate panel; * add location to logs panel; * update legend for panels to reflect instance name; * update queries to aggregate per instance. dashboards: update dashboard for cluster version * add syscall rate panel; * add drilldown to logs panel. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 16:49:42 -07:00
Aliaksandr Valialkin	ed868f47f9	docs/CHANGELOG.md: remove the change regarding http2 support at vmagent This is a follow-up for `8a07621a0c` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4283	2023-07-06 16:06:44 -07:00
Aliaksandr Valialkin	dff199a745	app/vmselect/graphite: follow-up after `c7884f8686` - Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API - Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API - Clarify the change in docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841	2023-07-06 15:19:07 -07:00
Aliaksandr Valialkin	ec75d9097d	app/vmselect/netstorage: follow-up after `11ac551d52` - Clarify the scope of the fix at docs/CHANGELOG.md - Handle the case when -search.maxSamplesPerSeries limit is exceeded in the same way as the -search.maxSamplesPerQuery limit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472	2023-07-05 21:13:34 -07:00
Roman Khavronenko	11ac551d52	app/vmselect/netstorage: properly process `-search.maxSamplesPerQuery` limit (#4472 ) Properly return the error to user when `-search.maxSamplesPerQuery` limit is exceeded. Before, user could have received a partial response instead. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-23 13:17:34 +02:00
Roman Khavronenko	4624fda00d	all: update Go builder from Go1.20.4 to Go1.20.5 (#4427 ) See https://github.com/golang/go/issues?q=milestone%3AGo1.20.5+label%3ACherryPickApproved Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `476c7bdd6f`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-09 10:42:15 +02:00
Roman Khavronenko	c42365dc31	docs/changelog: mention `a6a7795b9e` change (#4425 ) docs/changelog: mention `a6a7795b9e` change `a6a7795b9e` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `d4c314d628`)	2023-06-09 10:41:07 +02:00
Zakhar Bessarab	bcece4c5ce	doc: changelog followup for #4420 fix (#4421 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `9a490d0b5c`)	2023-06-09 10:41:07 +02:00
Zakhar Bessarab	7925e9698f	app/vmagent/remotewrite: fix vmagent panic on shutdown (#4407 ) app/vmagent/remotewrite: fix vmagent panic on shutdown Currently, when vmagent is stopping it first flushes pending series in remote write context and proceeds to stop streaming aggregation. This leads to streaming aggregation being unable to write results into pending timeseries (since it is already nil) and panic. This can lead to some aggregation results being lost almost silently. The fix is reordering flow to first stop streaming aggregation and flush all pending time series after that. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `ce7141383d`)	2023-06-09 10:40:52 +02:00
Roman Khavronenko	fb9b8f6b1b	app/vmagent: mention `enable_http2` in changelog (#4403 ) Follow-up after `72c3cd47eb` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `3305a6901c`)	2023-06-09 10:40:24 +02:00
Roman Khavronenko	d9131d71cd	docs/CHANGELOG.md: cut v1.91.2 (#4393 ) Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `cc739e3f8d`)	2023-06-09 10:40:13 +02:00
Dmytro Kozlov	dd89fb2e12	app/vmctl: add verbose output for docker installations or when TTY isn't available (#4333 ) * app/vmctl: add verbose output for docker installations or when TTY isn't available * app/vmctl: fix tests * app/vmctl: make vmctl interactive if no tty * app/vmctl: cleanup * app/vmctl: add comment --------- Co-authored-by: Nikolay <nik@victoriametrics.com> (cherry picked from commit `fc5292d8ed`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-09 10:39:38 +02:00
Dmytro Kozlov	c5debee3f4	app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove redundant limit from SeriesHandler handler (#4352 ) * app/{graphite,netstorage,prometheus}: fix graphite search tags api limits, remove unused limit from SeriesHandler handler, * app/{graphite,netstorage,prometheus}: use search.maxTagValues for Graphite * app/{graphite,netstorage,prometheus}: update CHANGELOG.md * app/{graphite,netstorage,prometheus}: use own flags for Graphite API * app/{graphite,netstorage,prometheus}: cleanup * app/{graphite,netstorage,prometheus}: cleanup * app/{graphite,netstorage,prometheus}: update docs --------- Co-authored-by: Nikolay <nik@victoriametrics.com> (cherry picked from commit `c7884f8686`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-09 10:39:12 +02:00
Roman Khavronenko	a1b6a9317d	vmalert: fix nil map assignment (#4392 ) * vmalert: fix nil map assignment The storage instance with nil map params was created for remote-read purposes. And before change `7a9ae9de0d` this map was ignored in ApplyParams. Now, it started to be used and vmalert panics in runtime. The fix properly inits the map at `NewVMStorage` and verifies it is not nil on assignment in `ApplyParams`. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: add to changelog Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: properly clone Storage params Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: properly clone Storage params Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: properly clone Storage params Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `de94812088`)	2023-06-02 13:29:51 +02:00
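
Note on commit `d5e7ea5ef3` (vmalert retry policy): the message describes retries with exponential delay whose total time should not exceed `-remoteWrite.retryMaxTime`. The Python sketch below only illustrates that idea and is not the vmalert code. The helper name `backoff_schedule`, the doubling factor, the rule for when to stop, and the 1s starting interval in the usage note are all assumptions; the commit message states only that the delay is exponential and that the default maximum time is 30s.

```python
from datetime import timedelta


def backoff_schedule(min_interval: timedelta, max_time: timedelta) -> list[timedelta]:
    """Return the delays to wait between retry attempts.

    Hypothetical helper: the delay starts at min_interval and doubles after
    every attempt (assumed factor). Scheduling stops as soon as the next
    delay would push the cumulative wait past max_time.
    """
    if min_interval <= timedelta(0):
        raise ValueError("min_interval must be positive")

    delays = []
    delay = min_interval
    total = timedelta(0)
    while total + delay <= max_time:
        delays.append(delay)
        total += delay
        delay *= 2  # exponential growth; the real factor is not given in the commit
    return delays
```

With an assumed starting interval of 1s and the documented 30s limit, `backoff_schedule(timedelta(seconds=1), timedelta(seconds=30))` yields delays of 1s, 2s, 4s and 8s (15s in total), because a further 16s delay would exceed the limit. After the last scheduled delay a caller would give up and drop the data, which matches the behaviour the commit message describes.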

1461 Commits