Commit Graph

79 Commits

Author SHA1 Message Date
Roman Khavronenko
b2f45b4856
dashboards: update VM single dash (#3400)
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old "Graph" panel with "TimeSeries" panel;
* show % usage of Mem and CPU in addition to absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts.yml` change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 20:39:05 -08:00
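The `show_at: dashboard` rule described in commit b2f45b4856 can be illustrated with a minimal Python sketch. The rule dictionaries and alert names below are hypothetical and only mimic the general shape of an alerting rule with `labels`; they are not taken from `alerts.yml`.

```python
def shown_on_dashboard(rules):
    """Return the names of rules labelled with show_at: dashboard."""
    return [
        rule["alert"]
        for rule in rules
        # Rules without labels, or without show_at, are skipped.
        if rule.get("labels", {}).get("show_at") == "dashboard"
    ]


# Hypothetical rules, for illustration only.
example_rules = [
    {"alert": "ExampleDiskAlert",
     "labels": {"severity": "critical", "show_at": "dashboard"}},
    {"alert": "ExampleInfoAlert", "labels": {"severity": "info"}},
    {"alert": "ExampleNoLabels"},
]

print(shown_on_dashboard(example_rules))
```

Only the first hypothetical rule carries the label, so it is the only one that would be annotated on the dashboard under this reading.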
Roman Khavronenko
e42308e84c
dashboards: update vmalert dash (#3404)
The change list is the following:
* bump Grafana version to 9.2.6;
* replace old Graph panel with TimeSeries panel;
* add RemoteWrite section;
* allow configuring topK elements for some of the panels;
* prefer grouping by job instead of grouping by instance.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 20:38:35 -08:00
Roman Khavronenko
e6f0da480e
dashboards: update vmagent dash (#3411)
The change list is the following:
* bump Grafana version to 9.2.6;
* add version change annotations;
* switch to per-job panels instead of per-instance;
* add drilldown option for resource usage panels.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-29 20:29:34 -08:00
Roman Khavronenko
85d0cbbfc6
dashboards: update VM cluster dash (#3401)
The change list is the following:
* bump Grafana version to 9.2.6;
* remove artifacts in data links.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-28 16:43:58 -08:00
Timur Bakeyev
b6064dd645
Update datasource entries to consistently contain type prometheus and uid $ds. (#3393)
Co-authored-by: Timour I. Bakeev <tbakeev@ripe.net>
2022-11-28 16:43:58 -08:00
Roman Khavronenko
ed39d0d11c
dashboards: cleanup & remove artifacts (#3387)
* some unexpected DS UIDs were removed;
* replace `$instance.*` filter with `$instance` since we respect
the instance port anyway;
* remove predefined datasource for `clusterbytenant`
in favour of datasource variable `ds`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-25 07:28:24 -08:00
Roman Khavronenko
cae148d5c6
dashboards: cluster dashboard update (#3380)
The purpose of the update is to make the dash more usable
for large installations with many instances. Panels which showed
metrics per-instance (Mem, CPU) now show metrics per-job or min/max/avg
aggregations in % instead. This is supposed to help identify resource
shortages immediately and remain usable for small and big installations.

For cases when detailed info is needed, a new row `Drilldown` was added
to the bottom of the dashboard. Panels like Mem or CPU now contain
a `data-link` named `Drilldown` (shown on line click) which takes
the user to a more detailed panel.

The change list is the following:
* bump Grafana version to 9.1.0;
* replace old "Graph" panel with "TimeSeries" panel;
* improve Uptime panel to show number of instances per job;
* show % usage of Mem and CPU instead of absolute values;
* `Caches` row was removed. All needed info for caches is now part of `Troubleshooting`;
* add `Drilldown` section for detailed resource usage;
* add Annotations for Alert triggers. Not all alerts are supposed to be displayed
on the dashboard, but only those with label `show_at: dashboard`.
See `alerts-cluster.yml` change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-11-24 13:20:10 -08:00
Roman Khavronenko
0efc20d7b8
dashboards: replace Index size panel with Active series (#3157)
Panel `Index size` proved impractical for users, so it is
replaced with the `Active series` panel.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/776#issuecomment-1255823734
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-09-26 08:48:25 +03:00
Roman Khavronenko
5dfe63e102
Dashboards (#3120)
* dashboards/cluster: few updates

* apply consistent formatting across panels;
* make resource usage panels per component more detailed;
* add extra panels to vmselect for displaying
`vm_rows_read_per_query`, `vm_rows_scanned_per_query`,
`vm_rows_read_per_series` and `vm_series_read_per_query` metrics.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/single: few updates

* apply consistent formatting across panels;
* add extra panels to Performance for displaying
`vm_rows_read_per_query`, `vm_rows_scanned_per_query`,
`vm_rows_read_per_series` and `vm_series_read_per_query` metrics.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: few updates

* apply consistent formatting across panels;
* add panels for showing number of samples ingested
or scraped;
* adapt resource usage panels for multiple selected jobs/instances;
* add adhoc variable;
* display vmagent's version in Stats.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmalert: few updates

* apply consistent formatting across panels;
* adapt resource usage panels for multiple selected jobs/instances;
* show vmalert version in Stats section.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-09-19 15:04:37 +03:00
Max Golionko
e07f23a1b9
moved cluster dashboard to master (#3074)
dashboards: move cluster dashboard to master branch

This change should simplify dashboards management.
2022-09-08 11:47:25 +03:00
Roman Khavronenko
3c583c16a1
dashboards: add Cache usage % panel to Caches row (#2960)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2941
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-08-08 11:45:17 +02:00
Roman Khavronenko
23e85e0fc5
vmagent: expose metric vmagent_remotewrite_queues (#2871)
The new metric `vmagent_remotewrite_queues` exports the static
number of configured remote write queues. This metric is useful for
calculating the total saturation per configured URL given the number
of queues. See the corresponding changes to vmagent alerts and dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-18 14:41:04 +03:00
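One way to read "saturation per URL" in commit 23e85e0fc5 is the share of available queue time that is spent sending. The Python sketch below assumes that, for one URL, the combined busy time of all queues is already known in seconds per second; that input (`busy_seconds_per_second`) is hypothetical, and the only quantity taken from the commit is the number of queues, as reported by `vmagent_remotewrite_queues`.

```python
def queue_saturation(busy_seconds_per_second: float, queues: int) -> float:
    """Fraction of total queue capacity in use for one remote write URL.

    busy_seconds_per_second is a hypothetical input: the summed time all
    queues spend sending per second of wall-clock time.
    queues is the value exported by vmagent_remotewrite_queues.
    """
    if queues <= 0:
        raise ValueError("queues must be positive")
    return busy_seconds_per_second / queues


# Four queues that are busy for 3 seconds per second in total.
print(queue_saturation(3.0, 4))
```

Under this assumption a value close to 1 means every configured queue is busy nearly all the time, which is why the queue count is needed as the denominator.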
Roman Khavronenko
018782c24e
dashboards: small visual tweaks for vmagent's dashboard (#2828)
* remove lines filling
* filter series with zero values
* update descriptions

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-05 13:20:41 +03:00
Roman Khavronenko
fc03950efa
dashboards: update cluster dashboard (#2773)
* dashboards: update cluster dashboard

* add assisted merges panel https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2754
* add mem panel per each component
* remove lines filling for some panels for clarity

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update dashboards/victoriametrics.json
2022-06-23 09:46:28 +02:00
Roman Khavronenko
246d2df361
dashboards: add cpu usage panels per each component type (#2723)
The change adds extra panel per each component, showing
the amount of used CPU cores and the limit (summary of all instances).

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2696

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-16 20:49:55 +03:00
Artem Navoiev
5b4b922433
dashboards: update cluster by tenant dashboard (#2695)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2022-06-09 13:15:40 +03:00
Roman Khavronenko
d956f6f68e
Dashboard cluster update (#2674)
* dashboard: fix query for `CPU percentage` panel

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboard: replace Uptime panel with Version panel

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-02 16:03:48 +02:00
Roman Khavronenko
46c06334ee
dashboards: use vm_concurrent_select_current instead of vm_concurrent_queries (#2655)
Using metric `vm_concurrent_queries` in relation to `vm_concurrent_select_capacity`
is incorrect. Switching to `vm_concurrent_select_current` in `Concurrent selects` panel.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-30 12:16:24 +03:00
Nikolay
514843be77
dashboards: adds dashboard for operator (#2621)
Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

Adds proper interval to rate functions
2022-05-23 11:49:03 +03:00
Roman Khavronenko
8c30640828
dashboards: bump version requirement for cluster dashboard (#2537)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-05 13:40:07 +03:00
hagen1778
e856d74b7b dashboards: replace fixed interval of 5m for rate expressions
Previously, we used a fixed `5m` interval for expressions with the `rate` function.
Unfortunately, this interval did not fit all cases, so we
switch to `$__rate_interval` instead.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
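The switch in commit e856d74b7b can be made concrete with a small Python sketch. It assumes the rule given in Grafana's documentation for the Prometheus data source, where `$__rate_interval` is the larger of `$__interval` plus the scrape interval and four times the scrape interval. The function and parameter names are illustrative, not part of any API.

```python
def rate_interval(interval_s: float, scrape_interval_s: float) -> float:
    """Approximate Grafana's $__rate_interval, in seconds.

    interval_s stands for $__interval (the panel step), and
    scrape_interval_s for the data source scrape interval.
    """
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Zoomed in: small step, 15s scrape interval.
print(rate_interval(15, 15))
# Zoomed out: the step is 5 minutes, so the window grows with it.
print(rate_interval(300, 15))
```

With a fixed `5m` window, the zoomed-in case would average over far more data than the step covers, while a step longer than 5 minutes would leave samples between points unused; a window derived from the step and scrape interval avoids both.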
hagen1778
16b3374874 dashboards: add new panel IndexDB items rate
The new panel is supposed to reflect the pressure on indexDB
caused by churn rate or new series registration.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
hagen1778
1762256c7e dashboards: mention that Rows.Sent can be affected by replication
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
hagen1778
4255cb7559 dashboards: rm "Deferred merges" panel since it could be misleading
See more context here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1682#issuecomment-938608067

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
hagen1778
0bbc7221f3 dashboards: add adhoc filter to dashboard variables
The adhoc filter allows quickly applying global filters without
modifying the panels.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
hagen1778
80e8413f3a dashboards: remove index filter from stats panel for DiskUsage
The diskUsage stats panel was showing disk usage without including
size of the index, which is not correct. The filter was removed
to reflect the total disk usage.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2368

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-24 23:25:32 +03:00
Roman Khavronenko
3569352fe0
dashboards: update the threshold for slow inserts % on the dashboard (#2198)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 21:57:21 +02:00
Roman Khavronenko
3458a3d593
Monitoring cluster (#2191)
* dashboards: add `CPU percentage` panel for cluster dashboards

The new panel `CPU percentage` was added instead of adding a limit
to the existing `CPU` panel because the dashboard may display a big number
of components, each with its own limits. The separate panel should provide
a clear display of CPU load.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards: sync vmagent and vmalert changes from single version

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docker: remove unsupported param from vmagent config

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* alerts: add `TooHighCPUUsage` alert for all VM components

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 11:57:58 +02:00
Roman Khavronenko
4010f548b5
dashboards: migrate from old table panel in cluster dashboard (#1993)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-22 11:21:06 +02:00
Roman Khavronenko
89facbc5c4
dashboards/vmagent: fix cached datasource uid (#1984)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-20 17:34:55 +02:00
Roman Khavronenko
0311d3cc89
Dashboards cluster (#1983)
* dashboards/cluster: add panels for vmstorage in read-only mode

vmstorage readonly status panel was added to "vmstorage" row.

One more panel for showing vminsert->vmstorage readonly status
was added to troubleshooting row.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/cluster: add "Cache usage" panel

The new panel is supposed to show the % of used cache
relative to the allowed size, by cache type.
It should help to identify underutilized cache types.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/cluster: add "Merges deferred" panel

The new panel is supposed to show if there were deferred merges
due to insufficient disk space.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/cluster: update Network panel for vminsert

* delete bytes_written query, since in most cases it is insignificant
* change display type to Stack

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/cluster: bump version requirement

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-20 17:32:05 +02:00
Roman Khavronenko
ada18cd963
Dashboards vmagent updates (#1973)
* dashboards/vmagent: shuffle panels for better visibility

More important error/dropped panels were moved higher on the main row.
Network usage panel moved to Resource usage row.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add Troubleshooting row to show top 5 instances/jobs by churn rate

New panels are supposed to show the top 5 jobs or targets which generate the most
churn. They were placed into a new row "Troubleshooting".

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add panels for showing persistent queue saturation

New panels were added to Troubleshooting row to show the persistent queue
saturation. The corresponding alerts were added and linked to these
panels as well.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/vmagent: add alert "RejectedRemoteWriteDataBlocksAreDropped"

The new alert is supposed to send a notification when vmagent starts to drop
data blocks rejected by the configured remote write destination.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-20 12:19:17 +02:00
Aliaksandr Valialkin
6346b78fa8
dashboards: consistently use regexp filters for template vars (#1799)
Template vars may contain regexp when `all` is selected (.*) or when multiple values are selected (foo|bar).
So they must be passed to regexp filters.
2021-11-09 16:50:08 +02:00
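The reasoning in commit 6346b78fa8 can be sketched in Python. The helper names and the `some_metric` name are hypothetical. Python's `re.fullmatch` is used only as a stand-in for fully anchored regexp matching in label filters; the regexp dialect of the real query engine may differ in details.

```python
import re


def selector(metric: str, label: str, value: str) -> str:
    """Build a series selector that passes the variable to a regexp filter."""
    return f'{metric}{{{label}=~"{value}"}}'


def label_matches(filter_value: str, label_value: str, regexp: bool) -> bool:
    """Emulate `=~` (fully anchored regexp) versus `=` (exact equality)."""
    if regexp:
        return re.fullmatch(filter_value, label_value) is not None
    return filter_value == label_value


# A multi-value template variable expands to an alternation.
print(selector("some_metric", "job", "foo|bar"))
# The regexp filter matches each selected value...
print(label_matches("foo|bar", "foo", regexp=True))
# ...while an equality filter would look for the literal string "foo|bar".
print(label_matches("foo|bar", "foo", regexp=False))
```

The same applies when `all` is selected: `.*` matches any label value as a regexp, but matches nothing useful as a literal.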
Roman Khavronenko
d763837130
dashboards: add cardinality limiter panels for vmagent (#1720)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 09:00:05 +03:00
Roman Khavronenko
18313f3f8e
Cluster dashboard update (#1594)
* dashboards: sync `vmagent` updates from master branch

* dashboards: add new `Storage connection saturation` panel for cluster dashboard

* dashboards: add new cluster alert for corresponding `Storage connection saturation` panel
2021-09-01 17:05:17 +03:00
Roman Khavronenko
be3e31a574 dashboards: bump vmagent version requirement
2021-09-01 16:08:17 +03:00
Roman Khavronenko
af8c1feddb Single dashboards upd (#1593)
* dashboard: replace `null` datasources

null datasource value may confuse Grafana and make it drop panel query in some
versions.

* docker: bump grafana image version

* dashboards: add URL variable selector to vmagent dashboard

* dashboards: add new panel `Remote write connection saturation` to vmagent dashboard

* alerts: add new alert for `Remote write connection saturation` panel of vmagent dashboard

* dashboards: add "Logging rate" panel to vmagent dashboard
2021-09-01 12:24:55 +03:00
Roman Khavronenko
434f33d04d
Cluster sync master changes (#1592)
* docker: add README for docker compose env

* docker: add vmalert Grafana dashboard
2021-09-01 10:25:07 +03:00
Roman Khavronenko
c6cf821600
dashboard: several minor fixes (#1418)
* move panel `Disk writes/reads` to `Resource usage` row
* rename row `storage` to `vmstorage`
* remove cumulative display for `Storage ETA` panel
2021-07-01 05:45:35 +03:00
Roman Khavronenko
5cb378f5b5
dashboard: display tweaks (#1387)
* rm cumulative visualisation for panel `Disk space used`.
It uses % threshold and cumulative display breaks it.
* remove area filling for resource usage row;
* add job name for panels in resource usage row.
2021-06-18 10:48:40 +03:00
Roman Khavronenko
db39c4a7d1
dashboard: bump version requirements (#1379)
2021-06-14 13:32:32 +03:00
Roman Khavronenko
1053d3e5a9
Dashboard cluster (#1375)
* dashboard: update vmagent dash

The update contains the following changes:
* display anonymous memory usage metric. This metric is supposed to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0

* dashboard: update cluster dash

The update contains the following changes:
* move stats panels to Configuration row, so it can be collapsed;
* display anonymous memory usage metric. This metric is supposed to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0
2021-06-14 13:03:54 +03:00
Aliaksandr Valialkin
1c09e71f5b app/vminsert: add -disableRerouting command-line flag for disabling re-routing if some vmstorage nodes have lower performance than the others
Refactor the rerouting mechanism and make it more resilient to cases when some of vmstorage nodes are temporarily unavailable.

Reduce the probability of rerouting storm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-06-04 04:33:52 +03:00
Roman Khavronenko
78c388b246
dashboard: update descriptions for panel (#1275)
This commit fixes panel descriptions for `Concurrent flushes on disk` (vmstorage)
and `Concurrent inserts` (vminsert).
2021-05-07 11:25:00 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com
2021-04-20 20:20:01 +03:00
Roman Khavronenko
2b1f6b2373
dashboard: plot avg GC duration instead of quantile 1 for better perception (#1229)
2021-04-19 13:29:31 +03:00
Artem Navoiev
39b5de9f24 [draft] per tenant statistic (#121)
* [draft] per tenant statistic

* updates metric name
update graph
adds link and example config

* quick fix

* adds grafana dashboard
adds example alert

Co-authored-by: f41gh7 <nik@victoriametrics.com>
2021-04-14 11:23:41 +03:00
Roman Khavronenko
7fc9239536
Cluster dashboard update (#1185)
* dashboard: change FreeDiskSpace panel to show percentage of used space instead

* dashboard: disable area fill for Cache hit ratio

* dashboard: minor display updates

* dashboard: add panel `Concurrent flushes on disk`

* dashboard: add `Rows ignored` panel

* dashboard: update ChurnRate panel with proper description and additional query over 24h time window
2021-04-05 22:28:02 +03:00
Roman Khavronenko
b736c40053 Dashboards update (#1153)
* dashboard: update single node dashboard

* add number of new series created over last 24h;
* bump version requirements.

* dashboard: update vmagent dashboard

* add panel for open file descriptors;
* add panel for disk I/O;
* add panel for `vmagent_remotewrite_packets_dropped_total` metric;
* bump version requirements.
2021-03-29 12:41:28 +03:00
Roman Khavronenko
540c00f2a0
dashboard: update cluster node dashboard (#1136)
* add panel `Open FDs` for file descriptors metrics;
* add panel `Disk writes/reads` to show the real read/write
load on storage layer;
* add stats panel to show available CPUs, memory and disk space.
2021-03-18 12:04:12 +02:00