114 Commits

Author SHA1 Message Date
Vitaly Zhuravlev
614030bb80 Set 'at' everywhere as preposition for instance
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
3d8075da7d Decrease NodeNetwork*Errs pending period
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
74794182a7 Add failed systemd service alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
fd2d62af63 Add CPU and memory alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
0e0399d41e Decrease NodeFilesystem pending time to 15m
30m is too long: there is a risk of running out of disk space or inodes completely if something, such as a log file, is filling up the disk very fast.

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
fc967aa992 Add mountpoint to NodeFilesystem alerts
This helps to identify the alerting filesystem.

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Will Bollock
0a17e17718
docs (node/mixin): fix annotation for Skew alert (#2671)
This updates the annotation for the NodeClockSkewDetected mixin alert to
match the new threshold set.

Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480

I spent an embarrassingly large amount of time trying to figure out how
the heck that alert would mean 300s of clock skew. Turns out the
annotation was just left the same after the threshold change.

Signed-off-by: Will Bollock <wbollock@linode.com>
2023-05-11 10:33:10 +02:00
Ben Kochie
c8705ec4b2
Deprecate ntp collector
The ntp collector has always been a source of confusion and problems.
The data it produces is more of a blackbox probe against an NTP server.
The time sync / offset data produced is not what users expect.

Mark this collector as deprecated to be removed in v2.0.0

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-02-16 09:27:38 +01:00
Ryan J. Geyer
5e552bac02 Replace mistaken ) with }, resulting in parsable promql
Signed-off-by: Ryan J. Geyer <me@ryangeyer.com>
2022-12-13 13:30:42 +01:00
Jan Fajerski
87b8e3790d
docs/node-mixin: add fsMountpointSelector to alerts and dashboards (#2446)
* docs/node-mixin: add fsMountpointSelector

This adds the option to add a `mountpoint` selector to filesystem
related alerts. The default is `mountpoint!=""`.

* docs/node-mixins: add fsMountpointSelector to dashboards

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-10-20 13:06:31 +02:00
Siavash Sefid Rodi
f40dd31780 Fix CPU renaming rule
Signed-off-by: Florian Best <best@univention.de>
2022-07-27 13:16:00 +02:00
Vitaly Zhuravlev
7519830a8a Change io time units to %util
When applying rate() to seconds we get 'seconds per second', i.e. fractions of a second, so the value actually ranges from 0 to 1.

Also update intervalFactor to 1 for better rates.

Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-07-26 11:09:43 +02:00
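The reasoning in commit 7519830a8a (a per-second rate over a counter of seconds is a fraction between 0 and 1) can be sketched as follows. This is a simplified illustration, not how Prometheus implements `rate()`: it ignores counter resets and extrapolation, and the sample values and the helper name `per_second_rate` are hypothetical.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp_s, value) samples.

    Simplified stand-in for a rate calculation; counter resets and
    window extrapolation are deliberately not handled.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Hypothetical samples of a cumulative "seconds spent doing I/O" counter:
# the device was busy for 30 of the 60 seconds between the two samples.
busy_fraction = per_second_rate((0.0, 100.0), (60.0, 130.0))

# Seconds per second is dimensionless, so it maps directly onto a percentage.
util_percent = busy_fraction * 100
```

Because a device cannot be busy for more than one second per second, the fraction is bounded by 1, which is why a percentage-style utilisation unit fits better than seconds.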
Vitaly Zhuravlev
469600f4bf Update units of network and disk graphs
https://prometheus.io/docs/prometheus/latest/querying/functions/#rate

rate() calculates per-second average rate, therefore Bps units should be used for disks.

In networking, throughput is usually measured in bits/s, so the units are changed accordingly.

Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-07-26 11:09:43 +02:00
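The unit choice described in commit 469600f4bf comes down to simple arithmetic: a per-second rate over a byte counter is in bytes per second, and multiplying by 8 gives bits per second. A minimal sketch, with hypothetical function names and counter values:

```python
BITS_PER_BYTE = 8


def bytes_per_second(bytes_start, bytes_end, seconds):
    """Average throughput in bytes/s between two readings of a byte counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_end - bytes_start) / seconds


def bits_per_second(byte_rate):
    """Convert a bytes/s rate to bits/s, the unit usually used for networks."""
    return byte_rate * BITS_PER_BYTE


# Hypothetical counter readings taken 60 seconds apart.
disk_rate = bytes_per_second(0, 1_500_000, 60)   # shown as bytes/s for disks
network_rate = bits_per_second(disk_rate)        # shown as bits/s for networks
```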
Albert Mikaelyan
cee386678c fix compatibility rule to convert to old node_cpu metric
Signed-off-by: Albert Mikaelyan <tahvok@gmail.com>
2022-07-25 18:54:53 +02:00
Paweł Krupa (paulfantom)
8571536327 docs/node-mixin: add missing selectors
Signed-off-by: Paweł Krupa (paulfantom) <pawel@krupa.net.pl>
2022-07-19 16:44:16 +02:00
Sven Kieske
d64766f43d
fix the following markdownlint issues (#2362)
fix the following markdownlint errors (and some more):

[..]mixins/node-exporter/README.md:13: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:21: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:27: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:33: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:41: MD034 Bare URL used
A detailed description of the rules is available at https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md

Signed-off-by: Sven Kieske <s.kieske@mittwald.de>
2022-06-28 05:50:06 +02:00
Björn Rabenstein
e5128e83f2
Merge pull request #2364 from grafana/vzhuravlev/fs_table
mixin: Change disk graph to disk table
2022-06-08 20:46:47 +02:00
Jan Fajerski
cec414df78 node-mixins/config: Switch fsAvailable warning and critical thresholds
Problem: In 0b50eb7294 the usage of the
threshold variables was adjusted. The values had been switched as well
resulting in reversed thresholds after the commit above. Warnings now
have a smaller threshold than critical alerts.

Solution: Adjust thresholds to reflect that warnings should be alerted
on before critical alerts.

Issues: https://github.com/prometheus/node_exporter/pull/2352

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-06-07 12:10:48 +02:00
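The ordering that commit cec414df78 restores can be sketched like this: for an "available space" percentage, the warning threshold has to be the larger of the two so that the warning is raised before the critical alert. The function name and the threshold values (5 and 3) are hypothetical and are not taken from the mixin's configuration.

```python
def severity(available_percent, warning_threshold, critical_threshold):
    """Classify free space; warnings must trigger before critical alerts."""
    if warning_threshold <= critical_threshold:
        # This is the reversed configuration the commit message describes.
        raise ValueError("warning threshold must be above critical threshold")
    if available_percent < critical_threshold:
        return "critical"
    if available_percent < warning_threshold:
        return "warning"
    return "ok"


# Hypothetical thresholds: warn below 5% available, critical below 3%.
levels = [severity(p, 5, 3) for p in (10, 4, 2)]
```

As the filesystem fills up, the available percentage passes the warning threshold first and the critical threshold second.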
Björn Rabenstein
b5a2ad46e3
Merge pull request #2351 from grafana/vzhuravlev/macos
Add darwin dashboard
2022-05-03 12:59:29 +02:00
Vitaly Zhuravlev
eef827006a Change disk graph to disk table
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-27 19:15:50 +04:00
Daniel Lenar
0b50eb7294 Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold
Currently the critical alert for available space fires at the warning
threshold, and the warning alert fires at the critical threshold.

Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
2022-04-21 11:34:54 -05:00
Gabriel Amaral Antunes
410e069471 Add darwin dashboard to mixin
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-20 15:18:43 +04:00
Vitaly Zhuravlev
8823605f12 Fix NodeFileDescriptorLimit alerts
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-07 16:25:17 +04:00
Severyn Lisovskyi
7b86b7cb29
[node-mixin] change current datasource to grafana's default
Signed-off-by: Severyn Lisovskyi <993215+sev3ryn@users.noreply.github.com>
2022-02-02 14:45:26 +01:00
Julian Wiedmann
3e6f4ce627
mixin: exclude iowait and steal from CPU Utilisation (#2194)
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted into CPU Utilisation. Also see
https://github.com/prometheus-operator/kube-prometheus/pull/796 and
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667.

Per the iostat man page:

%idle
    Show the percentage of time that the CPU or CPUs were idle and the
    system did not have an outstanding disk I/O request.

%iowait
     Show the percentage of time that the CPU or CPUs were idle during
     which the system had an outstanding disk I/O request.

%steal
     Show the percentage of time spent in involuntary wait by the
     virtual CPU or CPUs while the hypervisor was servicing another
     virtual processor.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
2021-11-04 11:03:27 +01:00
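The idea in commit 3e6f4ce627 can be illustrated by treating `idle`, `iowait` and `steal` as non-busy time and computing utilisation as what remains. This is a sketch under assumptions: the per-mode fractions are hypothetical, and the function is not the mixin's actual recording rule.

```python
# Modes the commit message describes as idle/wait states.
NON_BUSY_MODES = ("idle", "iowait", "steal")


def cpu_utilisation(mode_fractions):
    """Return the busy fraction of CPU time.

    mode_fractions maps a CPU mode name to the fraction of time spent in
    that mode; the fractions are assumed to sum to 1.
    """
    non_busy = sum(mode_fractions.get(mode, 0.0) for mode in NON_BUSY_MODES)
    return 1.0 - non_busy


# Hypothetical breakdown of one CPU's time over some window.
sample = {
    "user": 0.25,
    "system": 0.125,
    "idle": 0.5,
    "iowait": 0.0625,
    "steal": 0.0625,
}
utilisation = cpu_utilisation(sample)
```

With these numbers, counting `iowait` and `steal` as busy would report 0.5 instead of 0.375.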
Ben Kochie
421fc429f3
Replace deprecated linter (#2176)
Upstream is replacing `golint` with `revive`.
* Cleanup unused mixin go files.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-10-27 11:01:15 +02:00
ngc104
4bc1c02000
fix bug in #2130 (#2170)
Signed-off-by: Yves Mettier <yves.mettier@orange.com>

Co-authored-by: Yves Mettier <yves.mettier@orange.com>
2021-10-21 12:07:38 +02:00
Tom Wilkie
9bc184d236
Datasource template variable should be labelled 'Data Source'
Signed-off-by: Tom Wilkie <tom@grafana.com>
2021-10-20 17:10:14 +01:00
Ben Kochie
5a38949451
Fix up mixin tests (#2167)
Use new Go install format, cleanup working dir setup.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-10-14 11:06:01 +02:00
Julien Pivotto
68a6c78c0d
Update go to 1.17 (#2159)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-03 13:35:24 +02:00
Michal
186e2e79c8
add yamllint config, fix yamllint errors (#2088)
After a recent change in prometheus/prometheus, Makefile.common now
includes a yamllint target, which currently fails. This PR adds the missing
yamllint config and fixes the yamllint errors.

Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
2021-09-29 20:12:14 +02:00
Ben Kochie
aeef1edd62
mixin: Add fallback for MemAvailable (#2130)
Add a fallback to Buffers+Cached+MemFree+Slab for older Linux kernels
where the MemAvailable metric is not available for memory utilization.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-09-28 10:22:06 +02:00
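The fallback described in commit aeef1edd62 can be sketched as below. The dictionary keys mirror the field names in the commit message; the dictionary layout, the function names and the byte values are hypothetical, and the mixin itself expresses this in PromQL rather than Python.

```python
FALLBACK_FIELDS = ("Buffers", "Cached", "MemFree", "Slab")


def available_bytes(meminfo):
    """Use MemAvailable when present, else Buffers+Cached+MemFree+Slab."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    return sum(meminfo.get(field, 0) for field in FALLBACK_FIELDS)


def memory_utilisation(meminfo):
    """Fraction of total memory that is not available."""
    return 1 - available_bytes(meminfo) / meminfo["MemTotal"]


# Hypothetical values: an older kernel without MemAvailable...
old_kernel = {"MemTotal": 1000, "Buffers": 100, "Cached": 200,
              "MemFree": 150, "Slab": 50}
# ...and a newer kernel that reports it directly.
new_kernel = {"MemTotal": 1000, "MemAvailable": 750, "MemFree": 150}

old_util = memory_utilisation(old_kernel)
new_util = memory_utilisation(new_kernel)
```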
Johannes 'fish' Ziemke
6f1286b314 mixin: Drop mode label for num cpu metric
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 12:13:35 +02:00
Johannes 'fish' Ziemke
fa9926c4eb mixin: Cheaper calculation for instance:node_num_cpu:sum
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 11:34:25 +02:00
paulfantom
832909dd25 docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 16:35:58 +02:00
Johannes 'fish' Ziemke
7fc5c6045a Read config from $
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-07-27 16:32:05 +02:00
ArthurSens
3731f93fd7 Refactor USE method mixin dashboards with grafonnet-lib, add multi-cluster support.
Aiming for cleaner code and following standards used on younger mixins.

Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-07-27 16:32:05 +02:00
Frederic Hemberger
5bee84f30d docs: Replace go get with go install for command installation
`go get` is deprecated for installation of commands as of go v1.17
Ref: https://go.googlesource.com/go/+/ced0fdbad0655d63d535390b1a7126fd1fef8348

Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>
2021-07-20 12:16:46 +02:00
Loïc Blot
55ffe57cbc
feat(rules): add NodeFileDescriptorLimit kernel exhaustion alert
Add a new alert when fs.file-nr is close to fs.file-max

Signed-off-by: Loic Blot <loic.blot@unix-experience.fr>
2021-04-30 12:40:09 +02:00
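The condition behind the alert added in commit 55ffe57cbc, allocated file descriptors getting close to the kernel maximum, can be sketched as a ratio check. The function name and the 0.7 default are hypothetical; the commit message does not state the threshold used by the alert.

```python
def near_fd_limit(allocated, maximum, threshold=0.7):
    """Return True when allocated/maximum exceeds the given ratio.

    allocated corresponds to the value reported by fs.file-nr and maximum
    to fs.file-max; the threshold is an illustrative assumption.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return allocated / maximum > threshold


# Hypothetical readings from two hosts.
busy_host = near_fd_limit(8000, 10000)
quiet_host = near_fd_limit(1000, 10000)
```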
raviprasad_lr
504f9b785c fix interval in graphs panels of node dashboard
Signed-off-by: raviprasad_lr <raviprasad_lr@yahoo.com>
2021-04-26 11:14:30 +02:00
Johannes 'fish' Ziemke
a5908bf82b Make interval configurable
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
Johannes 'fish' Ziemke
772335caa8 Use 5m rate in mixins
The default scrape interval of Prometheus is 60s, so we can't use a 1m
rate.

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
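The argument in commit 772335caa8 is that a rate needs at least two samples inside its window, and a 1m window over a 60s scrape interval cannot guarantee that. A rough sketch, assuming perfectly regular scrapes with no jitter or missed scrapes (the function names are hypothetical):

```python
MIN_SAMPLES_FOR_RATE = 2


def guaranteed_samples(window_s, scrape_interval_s):
    """Minimum number of evenly spaced samples that fall inside a window."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return window_s // scrape_interval_s


def window_supports_rate(window_s, scrape_interval_s):
    """True if the window always holds enough samples to compute a rate."""
    return guaranteed_samples(window_s, scrape_interval_s) >= MIN_SAMPLES_FOR_RATE


one_minute_ok = window_supports_rate(60, 60)     # 1m window, 60s scrapes
five_minute_ok = window_supports_rate(300, 60)   # 5m window, 60s scrapes
```

A 5m window leaves room for a few missed scrapes while still holding two or more samples, which a window barely longer than the scrape interval would not.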
Ben Kochie
eefb18db02
Merge pull request #1764 from dhoppe/patch-1
Use description instead of message as field for annotations
2021-01-24 14:56:03 +01:00
Ben Kochie
4b68aeb80a
Merge pull request #1862 from fsschmitt/fix/alerts-label-naming
fix: node_md_disks state label from fail to failed
2021-01-24 14:53:22 +01:00
Anthony D'Atri
8b466360a3
Modest doc improvements (#1876)
* Modest doc improvements

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
2020-11-25 16:46:58 +01:00
Julien Pivotto
f645d49242 Mixin: Bump jsonnet requirement to 0.16 to use go-jsonnetcmd
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:41:46 +01:00
Matthias Loibl
77e76485c0
Use absolute jsonnet import paths
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.

This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2020-10-20 11:34:43 +02:00
Björn Rabenstein
9c9c636305
Merge pull request #1861 from paulfantom/network-alerts
docs/node-mixin/alerts: use ratio for network alerts
2020-10-19 12:14:24 +02:00
paulfantom
f81747e608 docs/node-mixin/alerts: add max error condition to alert about desynchronized clock
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-08 11:15:16 +02:00
fsschmitt
effa4da989 fix: node_md_disks state label as failed
Signed-off-by: fsschmitt <492108+fsschmitt@users.noreply.github.com>
2020-10-07 14:20:56 +01:00