The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.
Fixes #1241
Signed-off-by: Paul Gier <pgier@redhat.com>
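The isolation described above can be sketched as follows. This is a minimal illustration, not the collector's code: `cpuCollector`, `cpuFreqEnabled`, `updateCPUStats` and `updateCPUFreq` are hypothetical names. A frequency error is logged and swallowed, so only a failure in the core cpu statistics fails the collector.

```go
package main

import (
	"errors"
	"log"
)

// cpuCollector is a hypothetical stand-in for the real cpu collector.
type cpuCollector struct {
	cpuFreqEnabled bool         // separate toggle for the frequency metrics
	updateCPUStats func() error // core cpu metrics (hypothetical helper)
	updateCPUFreq  func() error // optional frequency metrics (hypothetical helper)
}

// Update fails only if the core cpu statistics fail; a frequency
// error (e.g. a parse error) is logged and otherwise ignored.
func (c *cpuCollector) Update() error {
	if err := c.updateCPUStats(); err != nil {
		return err
	}
	if c.cpuFreqEnabled {
		if err := c.updateCPUFreq(); err != nil {
			log.Printf("cpufreq: skipping frequency metrics: %v", err)
		}
	}
	return nil
}

// errParse simulates a frequency parse failure.
var errParse = errors.New("parse error")
```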
This reduces the systemd metric collection time by using a wait group
and goroutines so that the systemd metric calls happen concurrently.
It also disables the start time, restarts, tasks_max, and tasks_current metrics by default,
because these can be time-consuming to gather.
Signed-off-by: Paul Gier <pgier@redhat.com>
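A minimal sketch of the wait-group pattern, assuming each systemd call can be modeled as a hypothetical function returning a float64 (the real collector's calls and types are not shown here). Each goroutine writes only to its own slice slot, so no mutex is needed, and `wg.Wait()` blocks until every call has finished.

```go
package main

import "sync"

// collectConcurrently runs each collection function in its own goroutine
// and waits for all of them, returning results in input order.
func collectConcurrently(fns []func() float64) []float64 {
	results := make([]float64, len(fns))
	var wg sync.WaitGroup
	for i, fn := range fns {
		wg.Add(1)
		go func(i int, fn func() float64) {
			defer wg.Done()
			results[i] = fn() // distinct index per goroutine: race-free
		}(i, fn)
	}
	wg.Wait()
	return results
}
```

Total collection time then tracks the slowest call rather than the sum of all calls.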
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.
When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.
When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking. This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.
If a bond is not monitored via `miimon` or ARP, `mii_status` should
likely always be `up`; however, I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`. In any case,
the kernel bonding documentation stresses that a bond should not be
configured without either `miimon` or `arp_interval`.
This change makes the metric 'node_bonding_active' match the
up/down state from the bond's point of view rather than the operstate.
Signed-off-by: Sachi King <nakato@nakato.io>
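A minimal sketch of the counting rule, assuming each slave's `mii_status` string has already been read (how and from where the collector reads it is not part of this sketch; the map argument is a hypothetical input). A slave counts as active only when its `mii_status` is `up`, regardless of its `operstate`.

```go
package main

// bondStatus counts the slaves of one bond and how many of them are
// active, judging each slave by its mii_status rather than its operstate.
func bondStatus(miiStatus map[string]string) (slaves, active int) {
	for _, status := range miiStatus {
		slaves++
		if status == "up" {
			active++
		}
	}
	return slaves, active
}
```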
* netclass_linux: remove varying labels from the 'up' metric
This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continuous over state changes.
Fixes #1236
Signed-off-by: Paul Gier <pgier@redhat.com>
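The split can be illustrated by rendering the two series by hand. This sketch assumes the names `node_network_up` and `node_network_info` and a `device` label, and uses Go `%q` quoting, which matches the text format only for simple label values. Because the `up` series never carries `operstate`, its label set stays the same when the state changes.

```go
package main

import "fmt"

// exposeLink renders a stable up series plus an info series that carries
// the volatile labels (here only operstate) with a constant value of 1.
func exposeLink(device, operstate string, up bool) []string {
	v := 0
	if up {
		v = 1
	}
	return []string{
		fmt.Sprintf("node_network_up{device=%q} %d", device, v),
		fmt.Sprintf("node_network_info{device=%q,operstate=%q} 1", device, operstate),
	}
}
```

The volatile labels can be joined back at query time, e.g. `node_network_up * on(device) group_left(operstate) node_network_info`.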
* Rename interface to device in netclass collector
This makes it consistent with other networking metrics like node_network_receive_bytes_total
This closes #1223
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
* Add diskstats collector for OpenBSD
Tested on i386 and amd64, OpenBSD 6.4 and -current.
* Refactor diskstats collectors
This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
Similar to #1228. Update the remaining collectors to use
'path/filepath' instead of 'path' for manipulating file paths.
Signed-off-by: Paul Gier <pgier@redhat.com>
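A small illustration of why `path/filepath` is the right package here: `filepath.Join` uses the operating system's separator and cleans the result, whereas `path.Join` always uses `/` and is intended for slash-separated paths such as URLs. The helper name `joinUnderRoot` is hypothetical.

```go
package main

import "path/filepath"

// joinUnderRoot builds a file path below a configurable root (e.g. a
// procfs mount point) using OS-specific separators and cleaning.
func joinUnderRoot(root string, parts ...string) string {
	return filepath.Join(append([]string{root}, parts...)...)
}
```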
Adds a new label called "type" to systemd_unit_state, which contains the
Type field from the unit file. This applies only to the .service and
.mount unit types. The other unit types do not include the optional
type field.
Fixes #1210
Signed-off-by: Paul Gier <pgier@redhat.com>
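A minimal sketch of the label rule, assuming the unit's Type field has already been fetched (how it is retrieved is not shown; `unitTypeLabel` is a hypothetical name). Units other than .service and .mount get an empty value.

```go
package main

import "strings"

// unitTypeLabel returns the value of the "type" label: the unit file's
// Type field for .service and .mount units, and "" for every other
// unit kind, since those have no such field.
func unitTypeLabel(unitName, typeField string) string {
	if strings.HasSuffix(unitName, ".service") || strings.HasSuffix(unitName, ".mount") {
		return typeField
	}
	return ""
}
```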
> ST1003 – Poorly chosen identifier (non-default)
> Identifiers, such as variable and package names, follow certain rules.
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
Add this new metric (where sda is active and sdb is in standby mode):
smartmon_device_active{disk="/dev/sda",type="sat"} 1
smartmon_device_active{disk="/dev/sdb",type="sat"} 0
Also skip further metrics if the drive is in a low-power mode. This
prevents spinning up disks just to collect metrics (which matches, e.g.,
Debian's default behavior for smartd).
Signed-off-by: Andre Heider <a.heider@gmail.com>
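The smartmon collector itself is a textfile script; the Go sketch below only illustrates the control flow, assuming the power state is already known (`active`) and that reading the SMART attributes is a hypothetical callback `readAttrs`. The callback is never invoked for a drive in standby, so it is not woken up.

```go
package main

import "fmt"

// smartMetrics always emits smartmon_device_active, but queries the
// remaining SMART attributes only when the disk is active.
func smartMetrics(disk, typ string, active bool, readAttrs func() []string) []string {
	v := 0
	if active {
		v = 1
	}
	out := []string{fmt.Sprintf("smartmon_device_active{disk=%q,type=%q} %d", disk, typ, v)}
	if !active {
		return out // low-power mode: do not touch the drive further
	}
	return append(out, readAttrs()...)
}
```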
* netstat: Add TCP In/Out Segs
In order to get a better idea of TCP packet loss, we need to know how
many `node_netstat_Tcp_OutSegs` there are so we can compare this to
`node_netstat_Tcp_RetransSegs`.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Update fixtures
Signed-off-by: Ben Kochie <superq@gmail.com>
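Comparing the two counters means taking the ratio of their increases over the same interval. A minimal sketch, assuming two samples of each cumulative counter are available (`retransRatio` is a hypothetical helper):

```go
package main

// retransRatio returns retransmitted segments per sent segment between
// two samples of the cumulative counters node_netstat_Tcp_OutSegs and
// node_netstat_Tcp_RetransSegs.
func retransRatio(outPrev, outNow, retransPrev, retransNow float64) float64 {
	sent := outNow - outPrev
	if sent <= 0 {
		return 0 // no traffic, or a counter reset, in the interval
	}
	return (retransNow - retransPrev) / sent
}
```

For example, 50 retransmits against 10000 sent segments gives 0.005. The equivalent query is `rate(node_netstat_Tcp_RetransSegs[5m]) / rate(node_netstat_Tcp_OutSegs[5m])`.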
* Add fallback for missing /proc/1/mounts
On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add tests.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add CHANGELOG entry.
Signed-off-by: Ben Kochie <superq@gmail.com>
Pull request #1002 changed the logic used on Linux servers to determine if a filesystem is
read-only. As a result of this change, the variable `readOnly` is now unused and can be removed.
Signed-off-by: Jerome Froelich <jeromefroelich@hotmail.com>
* Convert to Go modules
* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack
* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.
Signed-off-by: Ben Kochie <superq@gmail.com>
We use the output-compatible perccli, and storcli.py does not handle 'Unknown' as a result value. This produces:
```
sg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know perccli should not return 'Unknown', but this error breaks all other useful measurements because the .prom file is no longer parsable. My if condition fixes this.
Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
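The guard amounts to emitting a sample only when its value parses as a number. storcli.py is written in Python; this Go sketch only illustrates the same idea, and `formatSample` and the metric name in the usage are hypothetical. Note that `strconv.ParseFloat` also accepts `NaN` and `Inf`, which the text format allows.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatSample returns a text-format sample line, or ok=false when the
// value is not numeric (such as "Unknown"), so a single bad value cannot
// make the whole .prom file unparsable.
func formatSample(name, value string) (line string, ok bool) {
	v, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return "", false
	}
	return fmt.Sprintf("%s %g", name, v), true
}
```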
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. Requests over
the limit receive a 503 error.
The default is 40 requests, which seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Scrapes failing after being stuck for 5 minutes.
Signed-off-by: Ben Kochie <superq@gmail.com>
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.
The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.
This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry when no filtering is requested.
Signed-off-by: beorn7 <beorn@soundcloud.com>
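To make the affected metric families concrete, here is a sketch that filters by name prefix. This is a swapped-in technique for illustration only: the change itself works by choosing which registry is exposed, and `filterExporterMetrics` is a hypothetical name. As in the refactor, the no-filtering path returns its input untouched.

```go
package main

import "strings"

// exporterPrefixes are the prefixes of the exporter's own metrics.
var exporterPrefixes = []string{"go_", "process_", "promhttp_"}

// filterExporterMetrics drops the exporter's own metric families when
// disable is true, and returns names unchanged otherwise.
func filterExporterMetrics(names []string, disable bool) []string {
	if !disable {
		return names // no filtering requested: nothing to rebuild
	}
	kept := make([]string, 0, len(names))
	for _, n := range names {
		exporter := false
		for _, p := range exporterPrefixes {
			if strings.HasPrefix(n, p) {
				exporter = true
				break
			}
		}
		if !exporter {
			kept = append(kept, n)
		}
	}
	return kept
}
```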
TCP SYN retransmits are a very useful signal, as they affect
network performance disproportionately compared to regular TCP retransmits.
Signed-off-by: Ben Kochie <superq@gmail.com>
* storcli.py: Remove IntEnum
This removes an external dependency.
Moved VD state to VD info labels
* storcli.py: Fix BBU health detection
BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.
* storcli.py: Strip all strings from PD
Strip all strings that we get from PDs.
They often contain surrounding whitespace.
* storcli.py: Add formatting options
Add help text explaining how this document was formatted
* storcli.py: Add DG to pd_info label
Add the disk group to pd_info.
That way we can relate PDs in the same DG,
for example to check whether all disks in one RAID
use the same interface.
* storcli.py: Fix promtool issues
Fix linting issues reported by promtool check-metrics
* storcli.py: Exit if storcli reports issues
storcli reports if the command was a success.
We should not continue if there are issues.
* storcli.py: Try to parse metrics to float
This sanitizes the values we hand over to
node_exporter, eliminating any unforeseen values we read out.
* storcli.py: Refactor code to implement handle_sas_controller()
Move code into methods so that we can now also support HBA queries.
* storcli.py: Sort inputs
"...like a good python developer"
- Daniel Swarbrick
* storcli.py: Replace external dateutil library with internal datetime
Removes external dependency...
* storcli.py: Also collect temperature on megaraid cards
We have already collected them on mpt3sas cards...
* storcli.py: Clean up old code
Removed dead code that is not used any more.
* storcli.py: strip() all information for labels
They often contain surrounding whitespace.
* storcli.py: Try to catch KeyErrors generally
If some key we expect is not there, we still want to
print whatever we have collected so far.
* storcli.py: Increment version number
We have made some changes here and there.
The general look of the data has not been changed.
* storcli.py: Fix CodeSpell issue
Split the string to avoid Codespell issues caused by the "Celcius" spelling in a JSON key
Signed-off-by: Christopher Blum <zeichenanonym@web.de>
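The BBU health fix in the storcli.py changes above rests on two healthy status codes. storcli.py is Python; this Go sketch (hypothetical `bbuHealthy` helper) only encodes the rule as stated: 0 means a healthy cache vault and 32 a healthy BBU, so either value counts as healthy.

```go
package main

// bbuHealthy reports whether a BBU status code denotes a healthy unit:
// 0 for a cache vault, 32 for a battery backup unit.
func bbuHealthy(status int) bool {
	return status == 0 || status == 32
}
```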