mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2024-11-23 12:31:07 +01:00
dashboards: cluster dashboard update (#3380)
The purpose of the update is to make the dash more usable for large installations with many instances. Panels which showed metrics per-instance (Mem, CPU) now are showing metrics per-job or min/max/avg aggregations in % instead. This supposed to help immediately to identify resource shortage and remain usable for small and big installations. For cases when detailed info is needed, to the bottom of the dashboard a new row `Drilldown` was added. Panels like Mem or CPU now contain a `data-link` named `Drilldown` (cis shown on line click) which takes user to more detailed panel. The change list is the following: * bump Grafana version to 9.1.0; * replace old "Graph" panel with "TimeSeries" panel; * improve Uptime panel to show number of instances per job; * show % usage of Mem and CPU instead of absolute values; * `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`; * add `Drilldown` section for detailed resource usage; * add Annotations for Alert triggers. Not all alerts are supposed to be displayed on the dashboard, but only those with label `show_at: dashboard`. See `alerts-cluster.yml` change. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit is contained in:
parent
04bb2e14dd
commit
3407006cdb
File diff suppressed because it is too large
Load Diff
@ -54,6 +54,7 @@ groups:
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many errors served for {{ $labels.job }} path {{ $labels.path }} (instance {{ $labels.instance }})"
|
||||
@ -72,6 +73,7 @@ groups:
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance }}"
|
||||
summary: "Too many RPC errors for {{ $labels.job }} (instance {{ $labels.instance }})"
|
||||
@ -83,6 +85,7 @@ groups:
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=133&var-instance={{ $labels.instance }}"
|
||||
summary: "vmstorage on instance {{ $labels.instance }} is constantly hitting concurrent flushes limit"
|
||||
@ -179,6 +182,7 @@ groups:
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
show_at: dashboard
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance }}"
|
||||
summary: "Connection between vminsert on {{ $labels.instance }} and vmstorage on {{ $labels.addr }} is saturated"
|
||||
@ -186,3 +190,4 @@ groups:
|
||||
is saturated by more than 90% and vminsert won't be able to keep up.\n
|
||||
This usually means that more vminsert or vmstorage nodes must be added to the cluster in order to increase
|
||||
the total number of vminsert -> vmstorage links."
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user