8 Commits

Author SHA1 Message Date
Max Golionko
738741ab0d
rename group for cluster (#1546)
rename group for cluster, so that the groups do not overlap when vmsingle and vmcluster are deployed alongside each other
2021-08-18 16:03:04 +03:00
Roman Khavronenko
d63842cdbe
Cluster alerts (#1513)
* alerts: move `ProcessNearFDLimits` to `vm-health` group since it is relevant for all services

* alerts: add new `TooHighMemoryUsage` alerting rule
2021-08-02 17:54:24 +03:00
Roman Khavronenko
ce3f087d46
alerts: sync alert expression for DiskRunsOutOfSpaceIn3Days with dashboard (#1435)
2021-07-07 00:47:08 +03:00
k1rk
c6c789db8f
rename serviceHealth group name to vm-health (#1360)
the old group name causes conflicts in the `victoria-metrics-k8s-stack` chart =)
2021-06-09 02:26:21 +03:00
Aliaksandr Valialkin
1c09e71f5b
app/vminsert: add -disableRerouting command-line flag for disabling re-routing if some vmstorage nodes have lower performance than the others
Refactor the rerouting mechanism and make it more resilient to cases when some of vmstorage nodes are temporarily unavailable.

Reduce the probability of rerouting storm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-06-04 04:33:52 +03:00
Roman Khavronenko
c6fc3fa94d
alerts: make alerting rule RPCErrors compatible with PromQL (#1204)
Original query can't be executed via PromQL which results in error
if expression is evaluated by Prometheus. The new expression is
compatible with both engines.
2021-04-13 08:10:23 +03:00
Roman Khavronenko
c4f6b79d76
alerts: add ServiceDown alert to detect "dead" services (#1196)
2021-04-08 18:23:10 +03:00
Roman Khavronenko
51faea5e4b
deployment: add vmalert+alertmanager services and list of default alerts for cluster version (#1187)
2021-04-05 22:29:04 +03:00