Commit Graph

91 Commits

Author SHA1 Message Date
Will Bollock
0a17e17718
docs (node/mixin): fix annotation for Skew alert (#2671)
This updates the annotation for the NodeClockSkewDetected mixin alert to
match the new threshold set.

Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480

I spent an embarrassingly large amount of time trying to figure out how
the heck that alert would mean 300s of clock skew. Turns out the
annotation was just left the same after the threshold change.

Signed-off-by: Will Bollock <wbollock@linode.com>
2023-05-11 10:33:10 +02:00
Ryan J. Geyer
5e552bac02 Replace mistaken ) with }, resulting in parsable promql
Signed-off-by: Ryan J. Geyer <me@ryangeyer.com>
2022-12-13 13:30:42 +01:00
Jan Fajerski
87b8e3790d
docs/node-mixin: add fsMointpointSelector to alerts and dashboards (#2446)
* docs/node-mixin: add fsMountpointSelector

This adds the option to add a `mountpoint` selector to filesystem
related alerts. The default is `mountpoint!=""`.

* docs/node-mixins: add fsMountpointSelector to dashboards

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-10-20 13:06:31 +02:00
Vitaly Zhuravlev
7519830a8a Change io time units to %util
When appying rate() to seconds we have 'seconds per second' or fractions of the second, so actually it actually can be from 0 to 1.

Also update intervalFactor to 1 for better rates.

Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-07-26 11:09:43 +02:00
Vitaly Zhuravlev
469600f4bf Update units of network ad disk graphs
https://prometheus.io/docs/prometheus/latest/querying/functions/#rate

rate() calculates per-second average rate, therefore Bps units should be used for disks.

In networking bandwidth throughput is usually measured in bits/s so units are changed accordingly.

Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-07-26 11:09:43 +02:00
Paweł Krupa (paulfantom)
8571536327 docs/node-mixin: add missing selectors
Signed-off-by: Paweł Krupa (paulfantom) <pawel@krupa.net.pl>
2022-07-19 16:44:16 +02:00
Sven Kieske
d64766f43d
fix the following markdownlint issues (#2362)
fix the following markdownlint errors (and some more):

[..]mixins/node-exporter/README.md:13: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:21: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:27: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:33: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:41: MD034 Bare URL used
A detailed description of the rules is available at https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md

Signed-off-by: Sven Kieske <s.kieske@mittwald.de>
2022-06-28 05:50:06 +02:00
Björn Rabenstein
e5128e83f2
Merge pull request #2364 from grafana/vzhuravlev/fs_table
mixin: Change disk graph to disk table
2022-06-08 20:46:47 +02:00
Jan Fajerski
cec414df78 node-mixins/config: Switch fsAvailable warning and critical thresholds
Problem: In 0b50eb7294 the usage of the
threshold variables was adjusted. The values had been switched as well
resulting in reversed thresholds after the commit above. Warnings now
have a smaller threshold than critical alerts.

Solution: Adjust thresholds to reflect that warnings should be alerted
on before critical alerts.

Issues: https://github.com/prometheus/node_exporter/pull/2352

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-06-07 12:10:48 +02:00
Björn Rabenstein
b5a2ad46e3
Merge pull request #2351 from grafana/vzhuravlev/macos
Add darwin dashboard
2022-05-03 12:59:29 +02:00
Vitaly Zhuravlev
eef827006a Change disk graph to disk table
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-27 19:15:50 +04:00
Daniel Lenar
0b50eb7294 Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold
Currently critical alert for space available alerts on warning and
warning alert for space available alerts on critical.

Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
2022-04-21 11:34:54 -05:00
Gabriel Amaral Antunes
410e069471 Add darwin dashboard to mixin
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-20 15:18:43 +04:00
Vitaly Zhuravlev
8823605f12 Fix NodeFileDescriptorLimit alerts
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-07 16:25:17 +04:00
Severyn Lisovskyi
7b86b7cb29
[node-mixin] change current datasource to grafana's default
Signed-off-by: Severyn Lisovskyi <993215+sev3ryn@users.noreply.github.com>
2022-02-02 14:45:26 +01:00
Julian Wiedmann
3e6f4ce627
mixin: exclude iowait and steal from CPU Utilisation (#2194)
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted into CPU Utilisation. Also see
https://github.com/prometheus-operator/kube-prometheus/pull/796 and
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667.

Per the iostat man page:

%idle
    Show the percentage of time that the CPU or CPUs were idle and the
    system did not have an outstanding disk I/O request.

%iowait
     Show the percentage of time that the CPU or CPUs were idle during
     which the system had an outstanding disk I/O request.

%steal
     Show the percentage of time spent in involuntary wait by the
     virtual CPU or CPUs while the hypervisor was servicing another
     virtual processor.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
2021-11-04 11:03:27 +01:00
Ben Kochie
421fc429f3
Replace deprecated linter (#2176)
Upstream is replacing `golint` with `revive`.
* Cleanup unused mixin go files.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-10-27 11:01:15 +02:00
ngc104
4bc1c02000
fix bug in #2130 (#2170)
Signed-off-by: Yves Mettier <yves.mettier@orange.com>

Co-authored-by: Yves Mettier <yves.mettier@orange.com>
2021-10-21 12:07:38 +02:00
Tom Wilkie
9bc184d236
Datasource template variable should be labelled 'Data Source'
Signed-off-by: Tom Wilkie <tom@grafana.com>
2021-10-20 17:10:14 +01:00
Ben Kochie
5a38949451
Fix up mixin tests (#2167)
Use new Go install format, cleanup working dir setup.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-10-14 11:06:01 +02:00
Julien Pivotto
68a6c78c0d
Update go to 1.17 (#2159)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-03 13:35:24 +02:00
Ben Kochie
aeef1edd62
mixin: Add fallback for MemAvailable (#2130)
Add a fallback to Buffers+Cached+MemFree+Slab for older Linux kernels
where the MemAvailable metric is not available for memory utilization.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-09-28 10:22:06 +02:00
Johannes 'fish' Ziemke
6f1286b314 mixin: Drop mode label for num cpu metric
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 12:13:35 +02:00
Johannes 'fish' Ziemke
fa9926c4eb mixin: Cheaper calculation for instance:node_num_cpu:sum
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 11:34:25 +02:00
paulfantom
832909dd25 docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 16:35:58 +02:00
Johannes 'fish' Ziemke
7fc5c6045a Read config from $
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-07-27 16:32:05 +02:00
ArthurSens
3731f93fd7 Refactor USE method mixin dashboards with grafonnet-lib, add multi-cluster support.
Aiming for cleaner code and following standards used on younger mixins.

Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-07-27 16:32:05 +02:00
Frederic Hemberger
5bee84f30d docs: Replace go get with go install for command installation
`go get` is deprecated for installation of commands as of go v1.17
Ref: https://go.googlesource.com/go/+/ced0fdbad0655d63d535390b1a7126fd1fef8348

Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>
2021-07-20 12:16:46 +02:00
Loïc Blot
55ffe57cbc
feat(rules): add NodeFileDescriptorLimit kernel exhaustion alert
Add a new alert when fs.file-nr is close to fs.file-max

Signed-off-by: Loic Blot <loic.blot@unix-experience.fr>
2021-04-30 12:40:09 +02:00
raviprasad_lr
504f9b785c fix interval in graphs panels of node dashboard
Signed-off-by: raviprasad_lr <raviprasad_lr@yahoo.com>
2021-04-26 11:14:30 +02:00
Johannes 'fish' Ziemke
a5908bf82b Make interval configurable
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
Johannes 'fish' Ziemke
772335caa8 Use 5m rate in mixins
The default scrape interval of Prometheus is 60s, so we can't use a 1m
rate.

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
Ben Kochie
eefb18db02
Merge pull request #1764 from dhoppe/patch-1
Use description instead of message as field for annotations
2021-01-24 14:56:03 +01:00
Ben Kochie
4b68aeb80a
Merge pull request #1862 from fsschmitt/fix/alerts-label-naming
fix: node_md_disks state label from fail to failed
2021-01-24 14:53:22 +01:00
Julien Pivotto
f645d49242 Mixin: Bump jsonnet requirement to 0.16 to use go-jsonnetcmd
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:41:46 +01:00
Matthias Loibl
77e76485c0
Use absolute jsonnet import paths
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.

This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2020-10-20 11:34:43 +02:00
Björn Rabenstein
9c9c636305
Merge pull request #1861 from paulfantom/network-alerts
docs/node-mixin/alerts: use ratio for network alerts
2020-10-19 12:14:24 +02:00
paulfantom
f81747e608 docs/node-mixin/alerts: add max error condition to alert about desynchronized clock
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-08 11:15:16 +02:00
fsschmitt
effa4da989 fix: node_md_disks state label as failed
Signed-off-by: fsschmitt <492108+fsschmitt@users.noreply.github.com>
2020-10-07 14:20:56 +01:00
paulfantom
d7cbe85d22
docs/node-mixin/alerts: use a rate for network alerts
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-07 13:04:51 +02:00
Arthur Outhenin-Chalandre
6585e43eec Fix memory gauge in mixin with multiple pods
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
2020-09-23 15:36:43 +02:00
Nicolas Lamirault
ff2ff3410f
Configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert (#1835)
* Add: configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert

Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
2020-09-18 11:28:32 +02:00
Rajat Vig
7dd8adf7ed
Fix NodeRAIDDegraded to not use a string rule expressions
Signed-off-by: Rajat Vig <rvig@etsy.com>
2020-08-28 10:43:39 +01:00
Simon Pasquier
02212dd2c6 Run jsonnetfmt
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:15:30 +02:00
Hao Ke
9b7a0d06a1 Fix syntax error
Signed-off-by: Hao Ke <hao.ke@auryc.com>

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:07:37 +02:00
Simon Pasquier
6d959e2e8c *: add mixin tests to CI
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:03:46 +02:00
paulfantom
e4ec8e04c5 docs/node-mixin: add alerts about failing RAID array
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-08-24 16:17:20 +02:00
Dennis Hoppe
fc64b70386
Use description instead of message as field for annotations
Signed-off-by: Dennis Hoppe <github@debian-solutions.de>
2020-06-24 13:38:57 +02:00
Frederic Branczyk
b42819b69d
Merge pull request #1657 from povilasv/NodeTextFileCollectorScrapeError
Add NodeTextFileCollectorScrapeError alert to mixin
2020-04-30 17:54:06 +02:00
Povilas Versockas
bd3e6d224c
Add NodeTextFileCollectorScrapeError alert to mixin
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-31 18:12:36 +03:00