Commit Graph

161 Commits

Author SHA1 Message Date
Artem Fetishev
683f8c2780
dashboards: add Restarts panel (#7394)
Reopening PR #7373 from a branch in VictoriaMetrics repo in order to
enable edits and rebase.

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-10-30 16:44:08 +01:00
Zhu Jiekun
cd2222aa95
dashboards: fix query for full ETA vm_free_disk_space_bytes - vm_free_disk_space_limit_bytes (#7355)
Some checks failed
publish-docs / Build (push) Has been cancelled
### Describe Your Changes

Fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7334

available disk space should be 
```
(vm_free_disk_space_bytes{job=~...} - vm_free_disk_space_limit_bytes{job=~...})
```
instead of 
```
vm_free_disk_space_bytes{job=~...}
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-10-25 15:09:14 +02:00
Artem Fetishev
ca787c70d1
dashboards: fix vmagent monitoring chart descriptions (#7283)
### Describe Your Changes

Fix vmagent monitoring chart descriptions
### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-10-17 12:12:47 +02:00
Hui Wang
d3f110373c
dashboards: fix description about pending datapoints (#7235)
See [our
playground](https://play-grafana.victoriametrics.com/d/oS7Bi_0Wz_vm/victoriametrics-cluster-vm?orgId=1&var-ds=P996FABE17B5F6D1E&var-job=All&var-job_insert=All&var-job_select=All&var-job_storage=All&var-instance=All)
for reference.
2024-10-11 13:47:14 +02:00
Nikolay
fbaa026ae6
dashboards: updates operator dashboard (#7139)
* Replaces deprecated graphs with Timeseries panels
* Adds new latency dashboards for rest client and golang scheduler
* Adds new overview panels
* Adds VM Datasource version of dashboard

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-09-30 15:35:39 +02:00
Roman Khavronenko
4d0b41e63b
deployment: add panel and alerts for displying go scheduler latency (#7078)
The panel and alerting rule should help to understand whether VM
component doesn't have enough CPU resources or gets throttled. The alert
is applicable for all VM components.
The panel was added to vmalert, vmagent, vmsingle, vm clusert and
victorialogs dashes.

-------------------

This alerting rule should have help us identify resource shortage for
sandbox vmagent - see [this
link](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/?g0.range_input=23d13h25m25s424ms&g0.end_input=2024-09-23T14%3A11%3A00&g0.relative_time=none&g0.tab=0&g0.expr=histogram_quantile%280.99%2C+sum%28rate%28go_sched_latencies_seconds_bucket%7Bjob%3D%22vmagent-monitoring-vmagent%22%7D%5B5m%5D%29%29+by+%28le%2C+job%2C+instance%29%29+%3E+0.1)
for example. We weren't aware of resource shortage, because VM metrics
assumed this vmagent had 1vCPU while in fact its limit was 0.2vCPU.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-23 16:54:42 +02:00
hagen1778
4dcb6a3719
dashboards/vmagent: fix legend captions for stream aggregation related panels.
Before they were displaying wrong label names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-09-03 14:23:35 +02:00
zjbztianya
1b1e61030b
dashboards: typo fix (#6920)
### Describe Your Changes

Correct the spelling error of 'vminsert' in the dashboards.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-09-03 10:27:01 +02:00
hagen1778
d225a2eb56
dashboards: add Scrape duration 0.99 quantile panel
The new panel will show the 99th quantile of scrape duration in seconds.
This should help identifying vmagent instances that experiences too high scraping durations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-30 14:57:17 +02:00
Aliaksandr Valialkin
db557b86ee
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents from potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have consistent set of labels comparing to the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:24:01 +02:00
Hui Wang
c1c2286e09
vmagent-dashboard: update streaming aggregation panels (#6588)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-07-05 15:10:37 +02:00
hagen1778
ee66fb4387
docs: update refereneces to victoriametrics-datasource plugin
The plugin was renamed in https://github.com/VictoriaMetrics/victoriametrics-datasource/pull/178

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-05 09:46:17 +02:00
hagen1778
e45d80cd79
dashboards: fix wrong templating for vmauth
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-02 13:08:11 +02:00
Andrii Chubatiuk
861852f262
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrid - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 14:56:17 +02:00
Artem Navoiev
c9f496bdd0
dashboards: update statistic by tenant dashboard, fix billing disk usage pie panel (#6521)
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-27 09:29:25 +02:00
Nikolay
14b9ef1e4d
dashboards: add dashboard and alerts for vmauth (#6491)
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-06-25 11:15:29 +02:00
hagen1778
b201d1722d
dashboards: fix typo in panel descriptions for vmagent
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-21 11:42:38 +02:00
Hui Wang
75ad6c1b49
vmalert-dashboard: replace variable query metric (#6505)
`vmalert_iteration_total` series number is 4 time less than
`vmalert_iteration_duration_seconds`, queries will be lighter.
2024-06-19 09:40:34 +02:00
James Rhoat
fbd4b8e1ab
updating operator dashboard chart to be titled working instead of wokring (#6455)
### Describe Your Changes

Corrected spelling mistake in the operator json to be "working" instead
of "wokring"

### Checklist

The following checks are **mandatory**:

- [ x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-11 12:39:17 +04:00
Andrii Chubatiuk
2da45a8368
vmagent: updated dashboard and alert for stream aggregation (#6427)
### Describe Your Changes

Added streaming aggregation section to vmagent dashboards
Added alert for streaming aggregation and deduplication flush timeouts
Removed deprecated compose versions from compose files


Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-06-10 11:49:00 +02:00
hagen1778
9dd9b4442f
dashboards: use $__interval variable for offsets and look-behind windows in annotations
This should improve precision of `restarts` and `version change` annotations when
 zooming-in/zooming-out on the dashboards.

 The change also makes `restarts` dashboard visible on the panels, so user can disable it from
 displaying if needed. This could be useful when restarts overlap with version change events.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-22 16:32:51 +02:00
Hui Wang
d7b5062917
app/vmalert: support DNS SRV record in -remoteWrite.url (#6299)
part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053,
supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in
`-remoteWrite.url` command-line option.
2024-05-22 10:52:51 +02:00
hagen1778
c746ba154d
deployment/dashboards: fix AnnotationQueryRunner error in Grafana
The error appears when executing annotations query against Prometheus backend
because the query itself hasn't specified look-behind window (which is allowed
in VictoriaMetrics query engine).

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6309
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-05-21 11:39:02 +02:00
hagen1778
d386a68b59
dashboards: add new panel Concurrent selects to vmstorage row
The panel will show how many ongoing select queries are processed by vmstorage
and should help to identify resource bottlenecks. See panel description for more details.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 13:55:19 +02:00
hagen1778
9e18724036
deployment: update per-tenant-statistic dashboard to be compatible with Grafana 10
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:38:56 +02:00
hagen1778
5917ac003e
deployment: update backupmanager dashboard to be compatible with Grafana 10
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:35:25 +02:00
hagen1778
f0c4d372bd
deployment: update operator dashboard to be compatible with Grafana 10
- Use TimeSeries panel instead of deprecated Graph
- Update panel styles
- Fix version panel

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:32:19 +02:00
hagen1778
9256df17fa
deployment: bump Grafana version to 10.4.2
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 12:10:24 +02:00
hagen1778
8606b48ce5
dashboards: add Network Usage panel to Resource Usage row
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4478
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 11:54:17 +02:00
Dima Lazerka
564463259a
deployment/dashboards: properly show version for non-stable docker images (#6150)
re: .*-(?:tags|heads)-(.*)-(?:0|dirty)-.*

cases:
victoria-metrics-20240419-160209-heads-enterprise-single-node-0-g08f933ab0c
enterprise-single-node

victoria-metrics-20240201-133950-tags-v1.97.1-enterprise-0-g760a8733b
v1.97.1-enterprise

victoria-metrics-20240419-160209-heads-rotation-part-2-0-ge2367b6d1-dirty-848b54cd
rotation-part-2-0-ge2367b6d1

victoria-metrics-20240419-160209-heads-lts-1.93-enterprise-search-contention-0-g30ef4aad21-amd64
lts-1.93-enterprise-search-contention

victoria-metrics-20240425-150852-tags-v1.101.0-enterprise-0-g718138c64
v1.101.0-enterprise

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Dzmitry Lazerka <dlazerka@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 11:11:28 +02:00
Zakhar Bessarab
6b493582da
dashboards/victoria-metrics-single: allow selecting multiple instance values (#5870)
Allowing to select multiple instance IPs makes it much easier to view
metrics for longer periods of time in dynamic environments such as
Kubernetes. In k8s update will also cause IP to change making it harder
to use dashboard to check the status.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5869

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-29 10:34:36 +02:00
hagen1778
035de57e5e
dashboards: show max number of active merges instead of cumulative
The cumulative number of active merges could be red herring
as it its value depends on the number of vmstorages.
For example, vmstorage could be added or removed and this will affect
the panel.
Or, each vmstorage could start a merging process (i.e. for downsampling)
and visiually it could look like a massive change.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-04-23 16:41:48 +02:00
Aliaksandr Valialkin
59d495d469
all: replace old https://docs.victoriametrics.com/Troubleshooting.html url with the new one - https://docs.victoriametrics.com/troubleshooting/ 2024-04-18 03:26:36 +02:00
Aliaksandr Valialkin
f4b1cbfef0
all: replace old https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html url with the new one - https://docs.victoriametrics.com/cluster-victoriametrics/ 2024-04-18 02:54:20 +02:00
Aliaksandr Valialkin
8eeb045d3f
all: replace old https://docs.victoriametrics.com/MetricsQL.html url with the new one - https://docs.victoriametrics.com/metricsql/ 2024-04-18 02:14:53 +02:00
Aliaksandr Valialkin
b4fac26360
all: replace old https://docs.victoriametrics.com/vmalert.html url with the new one - https://docs.victoriametrics.com/vmalert/ 2024-04-18 01:44:12 +02:00
Aliaksandr Valialkin
4927e64700
all: replace remaining https://docs.victoriametrics.com/vmagent.html urls with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:36:13 +02:00
Vadim Rutkovsky
66c5fc3243
dashboards: fix typo in VictoriaLogs panel (#6102)
Comprasion -> compression
2024-04-16 09:50:46 +02:00
Artem Navoiev
5f9fb58dde
dashboards: statistic per tenant dashboard use variable for datasource in pie charts
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-03-16 13:46:56 +01:00
hagen1778
f781c42ea4
dashboards: add more context to cluster dashboard panels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-05 15:00:49 +01:00
hagen1778
0ab1069363
dashboards: update links in various panels
* use docs.victoriametrics.com instead of github docs
* add links to common terms used in VictoriaMetrics

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-04 15:43:31 +01:00
Artem Navoiev
44e15c866a
dashboards: update statistic per tenant dashbaord. Change to timeseries panel, add churn rate over 24h and query duration, add billing section
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-28 16:04:20 +01:00
hagen1778
ecccd2a1cc
dashboards: add legend details to network panels in cluster dash
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 10:20:38 +01:00
hagen1778
3380043424
dashboards: follow-up 4369bc1df2
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 09:51:43 +01:00
Hui Wang
4369bc1df2
deployment/dashboards: fix Storage full ETA panels (#5747)
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than 
rate(vm_rows_added_to_storage_total) and it could last quite some time,
 which causes negative values of Storage full ETA and confuses users, see playground.

Instead of trying to get more accurate results during downsampling, I think it's ok to ignore 
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-08 09:43:39 +01:00
hagen1778
487a94565b
dashboards/all: add new panel CPU spent on GC
It should help identifying cases when too much CPU is spent on garbage collection,
 and advice users on how this can be addressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 16:21:21 +01:00
hagen1778
29a9b31584
dashboards: add Targets scraped/s
A new stat panel shows the number of targets scraped by the vmagent per-second.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:48:26 +01:00
hagen1778
db11b94e30
dashboards: update to grafana/grafana:10.3.1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:41:08 +01:00
hagen1778
02492bc1a4
dashboards/single: fix typo in query for version annotation
The typo falsely produced many version change events.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-31 09:13:46 +01:00
hagen1778
c23e8bee89
dashboards: specify where to see details about dropped labels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 07:37:51 +01:00