VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-03 16:21:14 +01:00

Author	SHA1	Message	Date
Lauri Tirkkonen	8fe41b2b08	deployment/alerts: fix quoting on DiskRunsOutOfSpace (#7234) ### Describe Your Changes There's an extra `"` at the end of the dashboard URL for this alert; remove it by making the quoting consistent with other alerts in this file. ### Checklist The following checks are mandatory: - [X] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Co-authored-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>	2024-10-11 00:44:18 -07:00
hagen1778	49f13b12d9	deployment/alerts: rm `ProcessNearFDLimits` alert from alerts-cluster As it is already present in alerts-health file Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 14:59:27 +02:00
Nikolay	69d244e6fb	lib/mergeset: adds tracking for indexdb records drop (#6297) It allows creating an alert for possible item drops in indexdb. This may happen if the ingested metric size exceeds the max indexdb item size. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-05-24 14:55:20 +02:00
Hui Wang	4369bc1df2	deployment/dashboards: fix `Storage full ETA` panels (#5747) During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than rate(vm_rows_added_to_storage_total), and this could last quite some time, which causes negative values of Storage full ETA and confuses users, see playground. Instead of trying to get more accurate results during downsampling, I think it's OK to ignore vm_deduplicated_samples_total entirely; it's more reasonable to see Storage full ETA increase after downsampling. --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-08 09:43:39 +01:00
hagen1778	0a5ffb3bc1	docs: remove slug from Grafana dashboard URLs Each Grafana dashboard has a unique ID which can be used to fetch the dashboard from grafana.com: https://grafana.com/grafana/dashboards/11176 The same dashboard can be accessed via a URL with a slug: https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/ But using the slug means that any change to the dashboard name will break the link. So it is better to use just the ID, so the dashboard URL will never break. This is a follow-up for `ff33e60a3d` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-18 11:19:53 +01:00
hagen1778	d0e4190969	deployment/alerts: add `job` label to `DiskRunsOutOfSpace` alerting rule So it is easier to understand to which installation the triggered instance belongs. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-16 09:49:39 +01:00
hagen1778	8fb68152e6	alerts: simplify aggregation of alerting rules This is a follow-up after `75196d7234`. It updates some of the alerting rules to remove unnecessary aggregations. It keeps aggregations for expressions which use multiple time series filters, to make sure their labels will match. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-11 15:17:30 +01:00
hagen1778	2e4d0d0e41	alerts: move `ConcurrentFlushesHitTheLimit` alert to health alerts The `ConcurrentFlushesHitTheLimit` alert could be related to components like vminsert, vmstorage, vm-single-node and vmagent. Moving this alert to the `health` section of alerts will be beneficial for all components and will remove the duplicates from single/cluster alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-08-03 10:46:26 +02:00
Aliaksandr Valialkin	91533531f5	docs/Troubleshooting.md: document an additional case, which could result in slow inserts If `-cacheExpireDuration` is lower than the interval between ingested samples for the same time series, then the `vm_slow_row_inserts_total` metric is increased. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183	2023-03-20 13:28:36 -07:00
Aliaksandr Valialkin	c63755c316	lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit Previously the -maxConcurrentInserts option limited the number of established client connections which write data to VictoriaMetrics. Some of these connections could be idle. Such connections do not consume large amounts of CPU and RAM, so there is little sense in limiting their number. So now the -maxConcurrentInserts command-line option limits the number of concurrently executed insert requests, not including idle connections. It is recommended to remove the -maxConcurrentInserts command-line option, since the default value for this option should work well for most cases.	2023-01-06 22:20:19 -08:00
Artem Navoiev	7d9c4bebc0	update links to grafana dashboards (#3534) docs: update links to grafana dashboards Signed-off-by: Artem Navoiev <tenmozes@gmail.com>	2022-12-25 17:36:20 +01:00
Aliaksandr Valialkin	f3e84b4dea	{dashboards,alerts}: substitute `{type="indexdb"}` with `{type=~"indexdb.*"}` inside queries after `8189770c50` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337	2022-12-05 16:00:22 -08:00
Roman Khavronenko	6801b37e53	dashboards: add `Disk space usage %` and `Disk space usage % by type` panels (#3436) The new panels have been added to the vmstorage and drilldown rows. `Disk space usage %` is supposed to show disk space usage percentage. This panel is now also referred to by the `DiskRunsOutOfSpace` alerting rule. This panel has a Drilldown option to show absolute values. `Disk space usage % by type` shows the relation between datapoints and indexdb size. It is supposed to help identify cases when indexdb starts to take too much disk space. This panel has a Drilldown option to show absolute values. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-05 08:35:33 +01:00
Roman Khavronenko	3407006cdb	dashboards: cluster dashboard update (#3380) The purpose of the update is to make the dashboard more usable for large installations with many instances. Panels which showed metrics per-instance (Mem, CPU) now show metrics per-job or min/max/avg aggregations in % instead. This is supposed to help identify resource shortages immediately and remain usable for both small and big installations. For cases when detailed info is needed, a new `Drilldown` row was added to the bottom of the dashboard. Panels like Mem or CPU now contain a `data-link` named `Drilldown` (shown on line click) which takes the user to a more detailed panel. The change list is the following: * bump Grafana version to 9.1.0; * replace old "Graph" panel with "TimeSeries" panel; * improve Uptime panel to show number of instances per job; * show % usage of Mem and CPU instead of absolute values; * `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`; * add `Drilldown` section for detailed resource usage; * add Annotations for Alert triggers. Not all alerts are supposed to be displayed on the dashboard, but only those with label `show_at: dashboard`. See `alerts-cluster.yml` change. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-11-23 18:03:25 -08:00
Zakhar Bessarab	6711eec109	docker-compose: move `TooManyLogs` into `vm-health` alerts set (#3199)	2022-10-05 19:23:36 +02:00
Roman Khavronenko	5714a68ac6	deployment/docker: move cluster compose env to master branch (#3130) * deployment/docker: move cluster compose env to master branch The change is supposed to simplify maintaining the single/cluster docker-compose envs, alerts and dashboards. It is also supposed to reduce confusion for users when looking for cluster-related alerts/configs. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: move cluster compose env to master branch Review updates. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-21 11:48:38 +03:00

16 Commits