289 Commits

Author SHA1 Message Date
Gavin Lam
95efb86f6b
Add new collector and metrics for watchdog (#2309) (#2880)
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
2024-03-09 10:00:06 +01:00
linuxgcc
5e412a689a
Disable selinux, fix end-to-end-test.sh error (#2934) (#2937)
Signed-off-by: heyitao <heyitao@uniontech.com>
Co-authored-by: heyitao <heyitao@uniontech.com>
2024-03-08 15:06:03 +01:00
João Pedro Lima
16f7122d31
Add mitigation information to the linux vulnerabilities collector (#2806)
While the CPU vulnerabilities collector was added in https://github.com/prometheus/node_exporter/pull/2721, it currently does not include information about the mitigation strategy used for a given vulnerability.

This information can be quite valuable, as different mitigation strategies often come with different performance impacts.

This commit adds a third label to the cpu_vulnerabilities_info metric to include the "mitigation" used for a given vulnerability. If a vulnerability does not affect a node, or the node is still vulnerable, the mitigation is expected to be empty.

Signed-off-by: João Lima <jlima@cloudflare.com>
2023-12-14 13:15:27 +01:00
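As a rough illustration of the label split described in the commit above, the following sketch separates the raw text of a sysfs vulnerability file into a state and a mitigation. The helper name `parseVulnerability`, the state strings, and the sample inputs are assumptions made for this example; they are not taken from the exporter or from procfs.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVulnerability is a hypothetical helper: it splits the raw content of
// a vulnerability file into a state and a mitigation. The mitigation is
// empty unless the content starts with "Mitigation:".
func parseVulnerability(raw string) (state, mitigation string) {
	raw = strings.TrimSpace(raw)
	switch {
	case strings.HasPrefix(raw, "Mitigation:"):
		return "mitigation", strings.TrimSpace(strings.TrimPrefix(raw, "Mitigation:"))
	case strings.HasPrefix(raw, "Vulnerable"):
		// Still vulnerable: no mitigation to report.
		return "vulnerable", ""
	case raw == "Not affected":
		return "not affected", ""
	default:
		return "unknown", ""
	}
}

func main() {
	// Sample inputs chosen for illustration only.
	for _, raw := range []string{"Not affected\n", "Vulnerable\n", "Mitigation: PTE Inversion\n"} {
		state, mitigation := parseVulnerability(raw)
		fmt.Printf("state=%q mitigation=%q\n", state, mitigation)
	}
}
```

Keeping the mitigation in its own label, rather than folding it into the state, lets queries group by state without being fragmented by the free-form mitigation text.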
frigo
0550ab3f04
Add TCPOFOQueue to default netstat metrics (#2867)
Adds a count of TCP packets received out of order. This can be an
indication that there is packet loss on the path packets travel towards
this server. In that case, the sender will retransmit (and we can
already monitor Tcp_RetransSegs there), but we have no way to
monitor the packet loss on the receiver side. When a packet is received
and the receiver detects that a previous one is missing, it will increase the
TCPOFOQueue counter and reply with a selective ACK to the sender, both
possible indications of packet loss. Confirmation of packet loss can be
achieved by taking packet captures, ignoring Wireshark analysis, and
carefully looking at data being retransmitted based on the TCP seq.

Just like RetransSegs, TCPOFOQueue should be interesting for any
deployment as a means to detect packet loss, so this commit adds it
to the default list.

Signed-off-by: François Rigault <frigo@amadeus.com>
Co-authored-by: François Rigault <frigo@amadeus.com>
2023-12-08 18:24:07 +01:00
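To make the counter discussed above more concrete, here is a minimal sketch of reading TCPOFOQueue from text laid out like `/proc/net/netstat`, where each protocol contributes a header line of names followed by a line of values. The function name `parseNetstat`, the `Proto_Name` key format, and the sample numbers are assumptions for illustration, not the exporter's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNetstat reads pairs of lines ("Proto: names..." then "Proto: values...")
// and returns a map keyed by "Proto_Name". Hypothetical helper for this sketch.
func parseNetstat(content string) (map[string]uint64, error) {
	out := map[string]uint64{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		names := strings.Fields(sc.Text())
		if !sc.Scan() {
			return nil, fmt.Errorf("missing value line after header %q", names)
		}
		values := strings.Fields(sc.Text())
		if len(names) == 0 || len(names) != len(values) {
			return nil, fmt.Errorf("header and value lines differ in length")
		}
		proto := strings.TrimSuffix(names[0], ":")
		for i := 1; i < len(names); i++ {
			v, err := strconv.ParseUint(values[i], 10, 64)
			if err != nil {
				return nil, err
			}
			out[proto+"_"+names[i]] = v
		}
	}
	return out, sc.Err()
}

func main() {
	// Invented sample data; real files contain many more fields.
	sample := "TcpExt: SyncookiesSent TCPOFOQueue TCPTimeouts\nTcpExt: 0 42 7\n"
	stats, err := parseNetstat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("TcpExt_TCPOFOQueue =", stats["TcpExt_TCPOFOQueue"])
}
```

A steadily rising value on the receiver, alongside retransmissions on the sender, would be consistent with the packet-loss scenario the commit message describes.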
Gavin Lam
332232c22c
Add new collector and metrics for XFRM (#2544) (#2866)
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
2023-12-03 17:10:59 +01:00
Tobias Klausmann
78af952e63
NFSd: handle new wdeleg_getattr attribute in /proc/net/rpc/nfsd (#2810)
This attribute was introduced in v6.6-rc1.

The relevant changes in procfs were merged here:

https://github.com/prometheus/procfs/pull/574

and are part of procfs v0.11.2

I have also figured out that the stat should be part of the v4 ops
counters struct, but that will need changes to both procfs and this
code. Since people are already using 6.6-rc1, I think it's better to get
the code out there --- even if they don't care about wdeleg_getattr,
currently they get _no_ nfsd stats with 6.6-rc1.

I will make two follow-up PRs to clean this up in the next releases of
procfs and node-exporter.

Signed-off-by: Tobias Klausmann <klausman@schwarzvogel.de>
2023-11-14 03:54:11 +01:00
dongjiang
86ed8cdc6b
NFSd: fix nfsd v4 index miss (#2824)
* fix nfsd v4 index miss

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2023-10-16 18:14:21 +02:00
Ben Kochie
31a9cca551
Update e2e fixtures
Update for fixes in https://github.com/prometheus/procfs/pull/543

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-10-16 13:37:17 +02:00
John Kordich
933b1c1797
Add new node_cpu_frequency_hertz metric
Revert changes to node_cpu_info and add new node_cpu_frequency_hertz
metric for measuring CPU frequency from /proc/cpuinfo

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich
e84c278107
Update e2e-output.txt with new expected metric values
Changes the e2e-output.txt file to have the expected CPU MHz values
for the node_cpu_info metric.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
Ben Kochie
7c564bcbef
Fixup hwmon chip include (#2739)
Use the correct include value to the device filter function.
* Add new bogus hwmon fixture.
* Update end-to-end test to use hwmon chip include flag.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-07-10 12:46:30 +02:00
Michal
c31ebb4359
Add cpu vulnerabilities reporting from sysfs (#2721)
* Add cpu vulnerabilities reporting from sysfs

---------

Signed-off-by: Michal Wasilewski <michal@mwasilewski.net>
2023-07-01 14:21:49 +02:00
Abbey Woodyear
eaacb2e3c7
exposing softirq metrics (#2294)
Signed-off-by: abbeywoodyear <abbey.woodyear@thehutgroup.com>
2023-05-25 15:09:32 +02:00
Remi Jouannet
df1b53bee2
softnet: additional metrics from softnet_data (#2592)
* softnet: additional metrics from softnet_data, https://github.com/prometheus/procfs/pull/473
---------

Signed-off-by: remi <remijouannet@gmail.com>
Signed-off-by: Rémi Jouannet <remijouannet@gmail.com>
2023-05-24 17:23:13 +02:00
dongjiang
1bbb2a94c0
fix(zfs): add memory_available_bytes, fix dbufstats filename on Linux (#2687)
* Fix zfs memory_available_bytes collector
* Fix zfs dbufstats collector
---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2023-05-23 11:13:48 +02:00
Ben Kochie
3f64e91b0d
Update Go modules (#2695)
Update Prometheus modules to latest releases.
* Add missing fixtures for cpus online/offline.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-05-23 06:14:58 +02:00
Ben Kochie
d2dd793e39
Update e2e output fixtures (#2696)
Fix up correct e2e output for node_power_supply_info.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-05-22 17:28:45 +02:00
Sal Sal
dcb10ff291
bcache: remove cache_readaheads_totals metrics #2103 (#2583)
* bcache: remove cache_readaheads_totals metrics #2103

Signed-off-by: Saleh Sal <0xack13@gmail.com>

* Append bcacheReadaheadMetrics when CacheReadaheads value exists

Signed-off-by: Saleh Sal <0xack13@gmail.com>

* Update test cases for cachereadahead greater than zero

Signed-off-by: Saleh Sal <0xack13@gmail.com>

---------

Signed-off-by: Saleh Sal <0xack13@gmail.com>
2023-05-20 14:13:07 +02:00
Maximilian Wilhelm
c8129fadd6
Expose administrative state of network interfaces as 'adminstate'. (#2515)
Signed-off-by: Maximilian Wilhelm <max@sdn.clinic>
2023-05-02 15:25:05 +02:00
Pablo Caderno
d31af1d1e5
feat: added suspended as a node_zfs_zpool_state (#2449)
Signed-off-by: Pablo Caderno <kaderno@gmail.com>
2023-04-26 18:12:54 +02:00
Lukas Coppens
fe19fdd1e8
feat: add support for cpu freq governor metrics
Signed-off-by: Lukas Coppens <lukas.coppens@be-mobile.com>
2023-03-10 18:19:33 +01:00
Daniël van Eeden
8d3c594346
interrupts_linux: Fix fields on aarch64 (#2631)
* interrupts_linux: Fix fields on aarch64

Fixes #2557

---------

Signed-off-by: Daniël van Eeden <git@myname.nl>
2023-03-10 13:02:33 +01:00
Ben Kochie
2d77d8c562
Update e2e output for new common version.
Signed-off-by: Ben Kochie <superq@gmail.com>
2023-01-20 10:38:19 +01:00
Ben Kochie
13a5cc1f74
Refactor netclass_rtnl collector (#2528)
* Refactor netclass_rtnl collector

Merge the netclass_rtnl collector into the netclass collector.
* Disabled by default
* Followup to #2492

Signed-off-by: Ben Kochie <superq@gmail.com>
2022-11-29 11:22:25 +01:00
Ben Kochie
98a40bd712
Fix hwmon label sanitizer (#2504)
We don't need to fully sanitize the hwmon label values to metric/label
name strings.
* Just make sure they're valid UTF-8.
* Always include the label metric to avoid group_left failures.

Signed-off-by: Ben Kochie <superq@gmail.com>
2022-10-11 14:40:28 +02:00
Darshil Chanpura
daba360c93
Archived fixtures/udev similar to fixtures/sys to avoid go-get errors, fixes #2482 (#2485)
Signed-off-by: Darshil Chanpura <darshil@thatwebsite.xyz>
2022-09-27 23:07:57 +02:00
Guillaume E
863f3ac54c
Merge metrics descriptions in textfile collector (#2475)
The textfile collector will now provide a unified metric description
(that will look like "Metric read from file/a.prom, file/b.prom")
for metrics collected across several text files that don't already
have a description.

Also change the error handling in the textfile collector tests to
ContinueOnError to better mirror the real-life use-case.

Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
2022-09-20 12:49:21 +02:00
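The unified description mentioned in the commit above can be sketched roughly as follows. The function name `mergedHelp`, and the choice to sort and de-duplicate the file names before joining them, are assumptions for this example; the commit message only shows the resulting string format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergedHelp builds one description for a metric that appears in several
// text files without its own help text. Hypothetical helper: sorting and
// de-duplicating keep the output stable between scrapes.
func mergedHelp(files []string) string {
	sorted := append([]string(nil), files...)
	sort.Strings(sorted)
	uniq := make([]string, 0, len(sorted))
	for _, f := range sorted {
		if len(uniq) == 0 || uniq[len(uniq)-1] != f {
			uniq = append(uniq, f)
		}
	}
	return "Metric read from " + strings.Join(uniq, ", ")
}

func main() {
	fmt.Println(mergedHelp([]string{"file/b.prom", "file/a.prom", "file/a.prom"}))
}
```

A stable description matters because inconsistent help strings for the same metric name are the kind of mismatch that can cause a gather error.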
Ben Kochie
88a031567f
Merge pull request #2074 from BenoitKnecht/netdev-linux-netlink
collector/netdev_linux.go: Use netlink to get stats
2022-07-27 13:47:01 +02:00
Benoît Knecht
a71d0bddc8
end-to-end-test.sh: Fix netdev metrics
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.

Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-26 13:24:20 +02:00
DavidVentura
6477a197da
adjust expected output for 64k file
Signed-off-by: DavidVentura <davidventura27@gmail.com>
2022-07-26 12:25:23 +02:00
david
9ea9a5f029
only publish metrics for isolated cpus
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
698670bb6e
add fixture & e2e output
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
Johannes 'fish' Ziemke
d962e48ca2
Add sysctl collector
Signed-off-by: Johannes Ziemke <github@5pi.de>
2022-07-25 18:27:48 +02:00
Benoît Knecht
296aa35dd2
end-to-end-test.sh: Use udev fixture and update output
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with

```console
$ ./end-to-end-test.sh -u
```

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-06 12:30:50 +02:00
Benoît Knecht
9b5d55e511
collector/diskstats: Add fixtures for udev data
Now that we read some data from `/run/udev/data`, add the corresponding
fixtures and update the expected test results accordingly.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-06 12:30:50 +02:00
Nobuhiro MIKI
3ed95908d6
collector: add slab info
Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
2022-07-06 12:18:27 +02:00
Jonathan Davies
88f1811eb1
Add selinux collector (#2205)
Signed-off-by: Jonathan Davies <jpds@protonmail.com>
2022-06-28 05:54:05 +02:00
dependabot[bot]
b99f933713
Bump github.com/prometheus/client_golang from 1.12.1 to 1.12.2 (#2411)
* Bump github.com/prometheus/client_golang from 1.12.1 to 1.12.2

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.1 to 1.12.2.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.12.1...v1.12.2)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update fixtures for client_golang 1.12.2.

Signed-off-by: Ben Kochie <superq@gmail.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2022-06-26 11:33:15 +02:00
Ben Kochie
59c146e57d
Update end-to-end test for aarch64 (#2415)
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.

Fixes: https://github.com/prometheus/node_exporter/issues/1959

Signed-off-by: Ben Kochie <superq@gmail.com>
2022-06-26 09:41:21 +02:00
Ben Kochie
a516d4de4a
Cleanup cgroups collector (#2414)
* Correctly name collector file.
* Fix cgroup summary type as gauge.
* Use a boolean metric rather than a label for enabled.

Signed-off-by: Ben Kochie <superq@gmail.com>
2022-06-24 17:15:31 +02:00
Kobe Biello
45c75f1dbc
Add cgroup summary collector (#2408)
* add cgroups summary collector

Signed-off-by: biello <bellusa@qq.com>
Co-authored-by: bielu <bielu@zuoyebang.com>
2022-06-24 12:05:13 +02:00
Fionera
9ece38fca9
refactor: Use netlink for tcpstat collector
Signed-off-by: Tim Windelschmidt <t.windelschmidt@babiel.com>
2022-04-25 10:13:06 +02:00
Ben Kochie
9155971e07
Update Go modules
Update to latest releases.
* Fix up perf collector syntax.

Signed-off-by: Ben Kochie <superq@gmail.com>
2022-03-30 11:47:09 +02:00
Ben Kochie
eecc2b1dea
Add device filter flags to arp collector
Allow filtering ARP entries based on device. Useful for ignoring
entries for network namespaces (containers).

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-12-16 15:41:10 +01:00
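One plausible shape for such a device filter is an include/exclude pair of regular expressions, sketched below. The type name `deviceFilter`, its constructor, and the `^veth` pattern are assumptions for illustration; the commit message does not name the flags or the implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// deviceFilter is a hypothetical include/exclude filter over device names.
// A nil pattern means "no restriction" for that side of the filter.
type deviceFilter struct {
	include *regexp.Regexp
	exclude *regexp.Regexp
}

// newDeviceFilter compiles the given patterns; empty strings are skipped.
func newDeviceFilter(excludePattern, includePattern string) deviceFilter {
	var f deviceFilter
	if excludePattern != "" {
		f.exclude = regexp.MustCompile(excludePattern)
	}
	if includePattern != "" {
		f.include = regexp.MustCompile(includePattern)
	}
	return f
}

// ignored reports whether entries for the named device should be dropped.
func (f deviceFilter) ignored(name string) bool {
	return (f.exclude != nil && f.exclude.MatchString(name)) ||
		(f.include != nil && !f.include.MatchString(name))
}

func main() {
	// Example: drop ARP entries seen on container veth interfaces.
	f := newDeviceFilter("^veth", "")
	for _, dev := range []string{"eth0", "veth1a2b"} {
		fmt.Printf("%s ignored=%v\n", dev, f.ignored(dev))
	}
}
```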
heyitao
7dbf358915
delete duplicate items
Signed-off-by: heyitao <linuxgcc@163.com>
2021-12-09 11:50:10 +01:00
Ben Kochie
1d5afd05b5
Sanitize UTF-8 in dmi collector (#2229)
Replace invalid UTF-8 chars with "�" string.

Fixes: https://github.com/prometheus/node_exporter/issues/2228

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-12-01 11:13:43 +01:00
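The replacement described above maps naturally onto Go's standard `strings.ToValidUTF8`, as in this minimal sketch. The wrapper name `sanitizeUTF8` and the sample vendor string are assumptions for the example; whether the collector uses this exact call is not stated in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeUTF8 replaces each run of invalid UTF-8 bytes with the Unicode
// replacement character, so the value is safe to use as a label value.
// Hypothetical wrapper for this sketch.
func sanitizeUTF8(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	// A made-up DMI string containing two invalid bytes in a row.
	raw := "Dell\xff\xfeInc"
	fmt.Printf("%q\n", sanitizeUTF8(raw))
}
```

Note that `strings.ToValidUTF8` collapses a run of consecutive invalid bytes into a single replacement, so the two bad bytes in the sample yield one "�".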
Jacob Vosmaer
5c8d162ca6
Add node_softirqs_total metric (#2221)
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
2021-12-01 09:55:13 +01:00
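For readers unfamiliar with the 'softirq' line in /proc/stat, the sketch below parses one such line: the first number is the total across all softirq types, followed by one count per type. The function name `parseSoftirqLine`, the lowercase type names used as map keys, and the sample numbers are assumptions for illustration and may differ from the labels the exporter emits.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// softirqTypes lists the per-type counters in the order they follow the
// total on the softirq line. The lowercase names are this sketch's choice.
var softirqTypes = []string{
	"hi", "timer", "net_tx", "net_rx", "block",
	"irq_poll", "tasklet", "sched", "hrtimer", "rcu",
}

// parseSoftirqLine returns the per-type counts from a "softirq ..." line.
func parseSoftirqLine(line string) (map[string]uint64, error) {
	fields := strings.Fields(line)
	// Expect: "softirq", the total, then one value per type.
	if len(fields) != len(softirqTypes)+2 || fields[0] != "softirq" {
		return nil, fmt.Errorf("unexpected softirq line: %q", line)
	}
	out := make(map[string]uint64, len(softirqTypes))
	for i, name := range softirqTypes {
		v, err := strconv.ParseUint(fields[i+2], 10, 64)
		if err != nil {
			return nil, err
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	// Invented sample values; the leading 55 is the sum of the rest.
	counts, err := parseSoftirqLine("softirq 55 1 2 3 4 5 6 7 8 9 10")
	if err != nil {
		panic(err)
	}
	fmt.Println("net_rx =", counts["net_rx"], "rcu =", counts["rcu"])
}
```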
Martin Kennelly
4065902fe5
Add TCPTimeouts to netstat default filter (#2189)
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
2021-11-18 09:34:55 +01:00
Benjamin Drung
d85cbaa17c
ethtool: Prevent duplicate metric names (#2187)
Sanitizing the metric names can lead to duplicate metric names:

```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```

Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.

Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
2021-11-15 11:22:36 +01:00
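The approach described in the commit above (map sanitized names to original names, and drop both entries on a collision) can be sketched as follows. The `sanitize` function here is a simplified stand-in, and `uniqueSanitized` is a hypothetical name; neither is the collector's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidChars matches anything outside a conservative metric-name alphabet.
// Simplified stand-in for the real sanitizer.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

func sanitize(name string) string {
	return strings.ToLower(invalidChars.ReplaceAllString(name, "_"))
}

// uniqueSanitized maps sanitized names to original names. When two or more
// originals collapse to the same sanitized name, all of them are dropped,
// because it is unknown which one to keep.
func uniqueSanitized(names []string) map[string]string {
	out := map[string]string{}
	dropped := map[string]bool{}
	for _, n := range names {
		s := sanitize(n)
		if dropped[s] {
			continue
		}
		if _, seen := out[s]; seen {
			delete(out, s)
			dropped[s] = true
			continue
		}
		out[s] = n
	}
	return out
}

func main() {
	// "giant_hdr" and "giant-hdr" collide after sanitizing; both are dropped.
	fmt.Println(uniqueSanitized([]string{"giant_hdr", "giant-hdr", "rx_packets"}))
}
```

Tracking dropped names separately ensures that a third colliding name does not slip back into the map after the first two were removed.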
Johannes 'fish' Ziemke
85e20238e7
Add clocksource metrics to time collector (#2197)
* Add clocksource metrics to time collector

This closes #1336

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-11-12 11:45:31 +01:00