Commit Graph

572 Commits

Author SHA1 Message Date
Sachi King
18fc512fc4 Bond: Monitor bond mii_status not link operstate (#1124)
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.

When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.

When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking.  This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.

If a bond is not monitored via miimon or arp, the `mii_status` should
likely be always `up`, however I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`.  Kernel
bond documentation stresses that a bond should not be configured without
one of `mii_mon` or `arp_interval` configured however.

This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.

Signed-off-by: Sachi King <nakato@nakato.io>
2019-02-10 11:00:04 +01:00
Paul Gier
e0d6d11859 netclass_linux: remove varying labels from the 'up' metric (#1243)
* netclass_linux: remove varying labels from the 'up' metric

This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continous over state changes.
Fixes #1236

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-07 15:59:32 +01:00
Johannes 'fish' Ziemke
6ea0aa73e4 Rename interface to device in netclass collector (#1224)
* Rename interface to device in netclass collector

This makes it consistent with other networking metrics like node_network_receive_bytes_total

This closes #1223 

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2019-02-06 20:02:48 +01:00
Ralf Horstmann
3867ad5ab0 Add diskstats collector for OpenBSD (#1250)
* Add diskstats collector for OpenBSD

Tested on i386 and amd64, OpenBSD 6.4 and -current.

* Refactor diskstats collectors

This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2019-02-06 11:36:22 +01:00
David O'Rourke
d442108d7a collector: Implement uname collector for FreeBSD (#1239)
* collector: Implement uname collector for FreeBSD

Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2019-02-05 17:39:24 +01:00
Paul Gier
2b81bff518 collector: use path/filepath for handling file paths (#1245)
Similar to #1228.  Update the remaining collectors to use
'path/filepath' intead of 'path' for manipulating file paths.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-05 16:37:27 +01:00
Ralf Horstmann
dda51ad06a Fix staticcheck ST1003 warnings (#1249)
This fixes a few staticcheck ST1003 warnings in OpenBSD CPU
collector. No functional change.

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2019-02-05 07:46:50 +01:00
mknapphrt
7fbdd0ae93 Update procfs vendor (#1248)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-02-04 16:54:41 +01:00
Paul Gier
40dce45d8d collector/systemd: add new label "type" for systemd_unit_state (#1229)
Adds a new label called "type" systemd_unit_state which contains the
Type field from the unit file.  This applies only to the .service and
.mount unit types.  The other unit types do not include the optional
type field.

Fixes #1210

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-01-29 23:54:47 +01:00
Matt Layher
3b5c2f6463 collector: use path/filepath for handling file paths (#1228)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-21 17:44:55 +01:00
Jon Davies
e766485286 Add kstat-based Solaris metrics (#1197)
* collector/loadavg_solaris.go: Use libkstat to gather load averages.
* go.mod: Added go-kstat.
* boot_time_solaris.go: Added.
* cpu_solaris.go: Added.
* README.md: Updated entries for Solaris.
* collector/zfs_solaris.go: Added.
* CHANGELOG.md: Added note about kstat-based Solaris metrics.

Signed-off-by: Jonathan Davies <jpds@protonmail.com>
2019-01-12 13:33:56 +01:00
Ben Kochie
070e4b2e17 Update Makefile.common (#1220)
* Update Makefile.common

Update to new staticcheck method[0].

[0]: https://github.com/prometheus/prometheus/pull/5057

Signed-off-by: Ben Kochie <superq@gmail.com>

* Fix staticcheck errors.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-01-04 15:58:53 +00:00
Ben Kochie
73ddf5f1f7 netstat: Add TCP In/Out Segs (#1185)
* netstat: Add TCP In/Out Segs

In order to get a better idea of TCP packet loss, we need to know how
many `node_netstat_Tcp_OutSegs` there are so we can compare this to
`node_netstat_Tcp_RetransSegs`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Update fixtures

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-12-08 12:16:02 +01:00
Tariq Ibrahim
6bd51269b7 update to host_statistics64 for Darwin meminfo (#1183)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2018-12-06 16:47:20 +01:00
Ben Kochie
4abc6fba7d
Add fallback for missing /proc/1/mounts (#1172)
* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fallback to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:55 +01:00
Jerome Froelich
0cb0c4d911 Remove unused variable readOnly from filesystem_linux.go. (#1173)
The pull request #1002 changed the logic used on Linux servers to determine if a filesystem is
read-only. As a result of this change, the variable `readOnly` is now unused and can be removed.

Signed-off-by: Jerome Froelich <jeromefroelich@hotmail.com>
2018-11-30 14:01:39 +01:00
Nemikolh
62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
ioriveur
17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Arno Uhlig
6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Ben Kochie
b1eec66640
Add TCPSynRetrans to netstat default filter (#1143)
Tcp SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Matt Layher
073e056121
Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher
c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick
bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier
988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
Paul Gier
38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
Matt Layher
778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher
3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00
Matt Layher
2c2ee93519
collector: export NodeCollector for documentation purposes
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-26 15:42:00 -04:00
Ben Kochie
a0a164defb
Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Paul Gier
7057c64f45 fix a few minor golint warnings (#1110)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 18:44:06 +02:00
Paul Gier
e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Ben Kochie
0880d460d7
Ignore additional virtual filesystems (#1104)
Add more virtual filesystems to the default ignore list
* bpf
* cgroup2
* selinuxfs
* squashfs

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-12 11:24:32 +02:00
Dario Maiocchi
01ec8c5c5c Remove continue with label (#1084)
Instead of continue with label use helper function
Signed-off-by: dmaiocchi <dmaiocchi@suse.com>
2018-10-05 13:20:30 +02:00
Ben Kochie
a1ce712e22
Cleanup unused /proc/mounts fixture. (#1097)
* Cleanup unused /proc/mounts fixture.
* Ignore Uint -> Unit in codespell.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-04 18:07:12 +02:00
Mario Trangoni
3659260b66 infiniband: Handle iWARP* RDMA modules N/A (#974)
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available

This is related to #966, and handle this error,

Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-10-04 15:05:59 +02:00
Yecheng Fu
0f9842f20a [continue 912] strip rootfs prefix for run in docker (#1058)
* strip rootfs prefix for run in docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share host's PID
namespace, which allows processes within the container to see all of the
processes on the system.

Closes: #66

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
2018-10-04 14:11:21 +02:00
Ralf Horstmann
9f820bd3ee Update cpu collector for OpenBSD 6.4 (#1094)
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.

SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.

For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2018-10-02 10:21:30 +02:00
Daniele Sluijters
d999dacdc6 filesystem: Ignore netns/nsfs mounts (#1047)
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).

This ignores all nsfs filesystems.

Fixes #875

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
2018-09-26 10:45:51 +02:00
Ben Kochie
0fdc089187
Change systemd unit filtering (#1083)
* Change systemd unit filtering

Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-24 15:04:55 +02:00
Luca Bruno
4672ea1671 collector/timex: remove cgo dependency (#1079)
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h

Signed-off-by: Luca Bruno <luca.bruno@coreos.com>
2018-09-20 11:51:34 +02:00
Björn Rabenstein
1c9ea46cca Update vendoring for client_golang and friends (#1076)
Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 17:09:52 +02:00
Ben Kochie
ebdd524123
Correctly cast Darwin memory info (#1060)
* Correctly cast Darwin memory info

* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-07 22:27:52 +02:00
Marco Tulio R Braga
05e55bddad Fix typo on description of read_time_seconds_total (#1057)
Fix typo on unit description of metric `*read_time_seconds_total` from milliseconds to seconds.

Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
2018-09-02 09:46:45 +02:00
Dan Fredell
c52e0d3353 Fix SmartOS build #1017 (#1018)
Signed-off-by: Dan Fredell <Dan.Fredell@gmail.com>
2018-08-23 10:57:15 +00:00
James Hartig
60c827231a NRestarts or NRefused aren't available on older systemd versions (#1039)
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available

Signed-off-by: James Hartig <james@getadmiral.com>
2018-08-14 14:28:26 +02:00
Ben Kochie
fe5a117831
Handle vanishing PIDs (#1043)
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.

* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Cleanup some error style issues.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-13 17:27:23 +02:00
Ben Kochie
0662673ad6
Disable wifi collector by default (#1037)
* Disable wifi collector by default

Disable the wifi collector by default due to suspected cashing issues and goroutine leaks.
* https://github.com/prometheus/node_exporter/issues/870
* https://github.com/prometheus/node_exporter/issues/1008

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-07 10:27:20 +02:00
Ben Kochie
5d23ad0ca7
Fix supervisord collector (#978)
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric

* Use Prometheus best practices for uptime metric.
  * Use "start time" rather than "uptime".
  * Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-06 16:54:46 +02:00
Julius Volz
2c52b8c761
systemd: Remove unneeded/unhandled error returns (#1035)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 16:55:25 +02:00
Christian Hoffmann
6bdc5558ec build: make staticcheck happy by using real regexp patterns #1025 (#1026)
Signed-off-by: Christian Hoffmann <mail@hoffmann-christian.info>
2018-07-30 07:57:18 +02:00