While the CPU vulnerabilities collector has been added in https://github.com/prometheus/node_exporter/pull/2721 , it's currently not including information regarding the mitigation strategy used for a given vulnerability.
This information can be quite valuable, as often times different mitigation strategies come with a different performance impact.
This commit adds a third label to the cpu_vulnerabilities_info metric, to include the "mitigation" used for a given vulnerability - if a given vulnerability is not affecting a node or the node is still vulnerable, the mitigation is expected to be empty.
Signed-off-by: João Lima <jlima@cloudflare.com>
Adds a count for TCP packets received out of orders. This can be an
indication that there is packet loss on the way packets travel towards
this server. In that case, the sender will retransmit (and we can
already monitor the Tcp_RetransSegs there), but we have no way to
monitor the packet loss on the receiver side. When a packet is received
and the receiver detects previous one missing, it will increase the
TCPOFOQueue counter and reply with selective ACK to the sender, both
possible indications of packet loss. Confirmation of packet loss can be
achieved by taking packet captures, ignoring wireshark analysis, and
carefully looking at data being retransmitted based on the TCP seq.
Just like RetransSegs, TCPOFOQueue should be interesting for any
deployment as a mean to detect packet loss, so here suggesting adding it
to the default list.
Signed-off-by: François Rigault <frigo@amadeus.com>
Co-authored-by: François Rigault <frigo@amadeus.com>
This attribute was introduced it v6.6-rc1.
The relevant changes in procfs were merged here:
https://github.com/prometheus/procfs/pull/574
and are part of procfs v0.11.2
I have also figured out that the stat should be part of the v4 ops
counters struct, but that will need changes to both procfs and this
code. Since people are already using 6.6-rc1, I think it's better to get
the code out there --- even if they don't care about wdeleg_getattr,
currently they get _no_ nfsd stats with 6.6-rc1.
I will make two follow-up PRs to clean this up in the next releases of
procfs and node-exporter.
Signed-off-by: Tobias Klausmann <klausman@schwarzvogel.de>
Revert changes to node_cpu_info and add new node_cpu_frequency_hertz
metric for measuring CPU frequency from /proc/cpuinfo
Signed-off-by: John Kordich <jkordich@gmail.com>
Use the correct include value to the device filter function.
* Add new bogus hwmon fixture.
* Update end-to-end test to use hwmon chip include flag.
Signed-off-by: Ben Kochie <superq@gmail.com>
* bcache: remove cache_readaheads_totals metrics #2103
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Append bcacheReadaheadMetrics when CacheReadaheads value exists
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Update test cases for cachereadahead greater than zero
Signed-off-by: Saleh Sal <0xack13@gmail.com>
---------
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Refactor netclass_rtnl collector
Merge the netclass_rtnl collector into the netclass collector.
* Disabled by default
* Followup to #2492
Signed-off-by: Ben Kochie <superq@gmail.com>
We don't need to fully sanitize the hwmon label values to metric/label
name strings.
* Just make sure they're valid UTF-8.
* Always included the label metric to avoid group_left failures.
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
The textfile collector will now provide a unified metric description
(that will look like "Metric read from file/a.prom, file/b.prom")
for metrics collected accross several text-files that don't already
have a description.
Also change the error handling in the textfile collector tests to
ContinueOnError to better mirror the real-life use-case.
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.
Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with
```console
$ ./end-to-end-test.sh -u
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Now that we read some data from `/run/udev/data`, add the corresponding
fixtures and update the expected test results accordingly.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.
Fixes: https://github.com/prometheus/node_exporter/issues/1959
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly name collector file.
* Fix cgroup summary type as gauge.
* Use a boolean metric rather than a label for enabled.
Signed-off-by: Ben Kochie <superq@gmail.com>
Allow filtering APR entries based on device. Useful for ignoring
entries for network namespaces (containers).
Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Sanitizing the metric names can lead to duplicate metric names:
```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```
Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.
Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>