555 Commits

Author SHA1 Message Date
ioriveur
17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Arno Uhlig
6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Ben Kochie
b1eec66640
Add TCPSynRetrans to netstat default filter (#1143)
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Matt Layher
073e056121
Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher
c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick
bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier
988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
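The file-name handling described in commit 988f049040 above can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the helper `parseSensorFilename`, the regular expression, and the choice of index "0" for a bare "temp" file are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensorFileRE matches the usual hwmon naming scheme "<type><index>_<item>",
// for example "temp1_input". The pattern is an assumption made for this sketch.
var sensorFileRE = regexp.MustCompile(`^([a-z]+)([0-9]+)_(.+)$`)

// parseSensorFilename is a hypothetical helper. It splits a hwmon file name
// into sensor type, index and item. A bare "temp" file is treated as the
// "input" item of the sensor (index "0" is an assumption of this sketch).
func parseSensorFilename(name string) (sensorType, index, item string, ok bool) {
	if name == "temp" {
		return "temp", "0", "input", true
	}
	m := sensorFileRE.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, name := range []string{"temp1_input", "temp", "uevent"} {
		sensorType, index, item, ok := parseSensorFilename(name)
		fmt.Println(name, sensorType, index, item, ok)
	}
}
```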
Paul Gier
38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
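A minimal sketch of the "ignore extra stats" idea from commit 38163f234f above, assuming a hypothetical helper `parseDiskstatsLine` and a fixed number of known counters (11, the classic `/proc/diskstats` layout). The real collector is organised differently; this only illustrates tolerating extra trailing fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// knownStats is the number of per-device counters this sketch understands.
// 11 is the classic /proc/diskstats layout; newer kernels may append more.
const knownStats = 11

// parseDiskstatsLine is a hypothetical helper. It returns the device name and
// up to knownStats counters, silently dropping any additional fields.
func parseDiskstatsLine(line string) (dev string, stats []uint64, err error) {
	fields := strings.Fields(line)
	if len(fields) < 4 { // major, minor, name and at least one counter
		return "", nil, fmt.Errorf("too few fields in %q", line)
	}
	dev = fields[2]
	values := fields[3:]
	if len(values) > knownStats {
		values = values[:knownStats] // ignore extra stats instead of failing
	}
	for _, v := range values {
		n, perr := strconv.ParseUint(v, 10, 64)
		if perr != nil {
			return "", nil, fmt.Errorf("invalid value %q: %w", v, perr)
		}
		stats = append(stats, n)
	}
	return dev, stats, nil
}

func main() {
	// A line with 15 counters, as a newer kernel might produce.
	dev, stats, err := parseDiskstatsLine("8 0 sda 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15")
	fmt.Println(dev, len(stats), err)
}
```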
Matt Layher
778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher
3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00
Matt Layher
2c2ee93519
collector: export NodeCollector for documentation purposes
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-26 15:42:00 -04:00
Ben Kochie
a0a164defb
Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Paul Gier
7057c64f45 fix a few minor golint warnings (#1110)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 18:44:06 +02:00
Paul Gier
e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Ben Kochie
0880d460d7
Ignore additional virtual filesystems (#1104)
Add more virtual filesystems to the default ignore list
* bpf
* cgroup2
* selinuxfs
* squashfs

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-12 11:24:32 +02:00
Dario Maiocchi
01ec8c5c5c Remove continue with label (#1084)
Use a helper function instead of a labeled continue.
Signed-off-by: dmaiocchi <dmaiocchi@suse.com>
2018-10-05 13:20:30 +02:00
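The refactoring pattern named in commit 01ec8c5c5c above, replacing a labeled `continue` with a helper function, might look like the sketch below. The names `contains` and `filterIgnored` are hypothetical and do not come from the commit.

```go
package main

import "fmt"

// contains is the hypothetical helper that replaces the inner loop which
// previously needed a labeled continue to skip the outer iteration.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// filterIgnored returns the devices that are not in the ignore list.
func filterIgnored(devices, ignored []string) []string {
	var kept []string
	for _, d := range devices {
		if contains(ignored, d) {
			continue // a plain continue is enough, no label required
		}
		kept = append(kept, d)
	}
	return kept
}

func main() {
	fmt.Println(filterIgnored([]string{"sda", "loop0", "sdb"}, []string{"loop0"}))
}
```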
Ben Kochie
a1ce712e22
Cleanup unused /proc/mounts fixture. (#1097)
* Cleanup unused /proc/mounts fixture.
* Ignore Uint -> Unit in codespell.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-04 18:07:12 +02:00
Mario Trangoni
3659260b66 infiniband: Handle iWARP* RDMA modules N/A (#974)
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available

This is related to #966 and handles this error:

Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-10-04 15:05:59 +02:00
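One way to avoid the `strconv.ParseUint` failure quoted in commit 3659260b66 above is to recognise the "N/A" marker before parsing. The sketch below assumes a hypothetical helper `parseCounter`; how the actual collector reports an unavailable counter is not shown in the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter is a hypothetical helper. It returns ok=false (and no error)
// when the sysfs file reports the counter as unavailable, e.g. "N/A (no PMA)".
func parseCounter(raw string) (value uint64, ok bool, err error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, "N/A") {
		return 0, false, nil // counter not available on this device
	}
	value, err = strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return value, true, nil
}

func main() {
	for _, raw := range []string{"1234\n", "N/A (no PMA)\n"} {
		v, ok, err := parseCounter(raw)
		fmt.Println(v, ok, err)
	}
}
```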
Yecheng Fu
0f9842f20a [continue 912] strip rootfs prefix for run in docker (#1058)
* strip rootfs prefix for run in docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share host's PID
namespace, which allows processes within the container to see all of the
processes on the system.

Closes: #66

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
2018-10-04 14:11:21 +02:00
Ralf Horstmann
9f820bd3ee Update cpu collector for OpenBSD 6.4 (#1094)
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.

SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.

For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2018-10-02 10:21:30 +02:00
Daniele Sluijters
d999dacdc6 filesystem: Ignore netns/nsfs mounts (#1047)
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).

This ignores all nsfs filesystems.

Fixes #875

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
2018-09-26 10:45:51 +02:00
Ben Kochie
0fdc089187
Change systemd unit filtering (#1083)
* Change systemd unit filtering

Get all units from systemd and filter in Go.
* Improve compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-24 15:04:55 +02:00
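The "get all units and filter in Go" approach from commit 0fdc089187 above can be sketched as follows. The helper `filterUnits`, its string-pattern parameters, and the anchoring of the patterns are assumptions for illustration only; the collector's real flag names and types are not taken from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterUnits is a hypothetical helper. It keeps unit names that fully match
// the include pattern and do not fully match the exclude pattern.
func filterUnits(names []string, includePattern, excludePattern string) ([]string, error) {
	include, err := regexp.Compile("^(?:" + includePattern + ")$")
	if err != nil {
		return nil, err
	}
	exclude, err := regexp.Compile("^(?:" + excludePattern + ")$")
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, name := range names {
		if include.MatchString(name) && !exclude.MatchString(name) {
			kept = append(kept, name) // unit passes the filter
		}
	}
	return kept, nil
}

func main() {
	units := []string{"sshd.service", "dev-sda.device", "cron.service"}
	kept, err := filterUnits(units, ".+", `.+\.device`)
	fmt.Println(kept, err)
}
```

Filtering on the client side means the query sent to systemd stays the same regardless of version, which is consistent with the compatibility note in the commit message.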
Luca Bruno
4672ea1671 collector/timex: remove cgo dependency (#1079)
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h

Signed-off-by: Luca Bruno <luca.bruno@coreos.com>
2018-09-20 11:51:34 +02:00
Björn Rabenstein
1c9ea46cca Update vendoring for client_golang and friends (#1076)
Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 17:09:52 +02:00
Ben Kochie
ebdd524123
Correctly cast Darwin memory info (#1060)
* Correctly cast Darwin memory info

* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-07 22:27:52 +02:00
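The integer-wrapping problem mentioned in commit ebdd524123 above can be illustrated with a small sketch. The `uint32` types and the function names are assumptions chosen to make the wrap-around visible; they are not the Darwin collector's actual types.

```go
package main

import "fmt"

// bytesWrapping multiplies in uint32 and wraps around for large values.
func bytesWrapping(pages, pageSize uint32) uint32 {
	return pages * pageSize
}

// bytesFloat casts each operand to float64 before multiplying, so the
// product is not truncated to 32 bits.
func bytesFloat(pages, pageSize uint32) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	pages, pageSize := uint32(2000000), uint32(4096)
	fmt.Println(bytesWrapping(pages, pageSize)) // wrapped result
	fmt.Println(bytesFloat(pages, pageSize))    // full result
}
```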
Marco Tulio R Braga
05e55bddad Fix typo on description of read_time_seconds_total (#1057)
Fix typo on unit description of metric `*read_time_seconds_total` from milliseconds to seconds.

Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
2018-09-02 09:46:45 +02:00
Dan Fredell
c52e0d3353 Fix SmartOS build #1017 (#1018)
Signed-off-by: Dan Fredell <Dan.Fredell@gmail.com>
2018-08-23 10:57:15 +00:00
James Hartig
60c827231a NRestarts or NRefused aren't available on older systemd versions (#1039)
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available

Signed-off-by: James Hartig <james@getadmiral.com>
2018-08-14 14:28:26 +02:00
Ben Kochie
fe5a117831
Handle vanishing PIDs (#1043)
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.

* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Cleanup some error style issues.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-13 17:27:23 +02:00
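A minimal sketch of the vanishing-PID handling described in commit fe5a117831 above: a file-not-found error is skipped, and only PIDs that were actually read are counted. The helper `countReadableStats` and the choice of the `stat` file are assumptions for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// countReadableStats is a hypothetical helper. It reads <procDir>/<pid>/stat
// for each PID and counts the ones that could be read. A PID whose file has
// disappeared in the meantime is skipped rather than treated as a failure.
func countReadableStats(procDir string, pids []string) (int, error) {
	count := 0
	for _, pid := range pids {
		_, err := os.ReadFile(filepath.Join(procDir, pid, "stat"))
		if errors.Is(err, os.ErrNotExist) {
			continue // the process exited between listing and reading
		}
		if err != nil {
			return 0, err
		}
		count++ // explicitly count only the PIDs we actually found
	}
	return count, nil
}

func main() {
	n, err := countReadableStats("/proc", []string{"1", "999999999"})
	fmt.Println(n, err)
}
```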
Ben Kochie
0662673ad6
Disable wifi collector by default (#1037)
* Disable wifi collector by default

Disable the wifi collector by default due to suspected crashing issues and goroutine leaks.
* https://github.com/prometheus/node_exporter/issues/870
* https://github.com/prometheus/node_exporter/issues/1008

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-07 10:27:20 +02:00
Ben Kochie
5d23ad0ca7
Fix supervisord collector (#978)
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric

* Use Prometheus best practices for uptime metric.
  * Use "start time" rather than "uptime".
  * Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-06 16:54:46 +02:00
Julius Volz
2c52b8c761
systemd: Remove unneeded/unhandled error returns (#1035)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 16:55:25 +02:00
Christian Hoffmann
6bdc5558ec build: make staticcheck happy by using real regexp patterns #1025 (#1026)
Signed-off-by: Christian Hoffmann <mail@hoffmann-christian.info>
2018-07-30 07:57:18 +02:00
Hannes Körber
14a4f0028e Enable nfs protocol (#998)
* vendor: Update prometheus/procfs

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* mountstats: Use new NFS protocol field

In https://github.com/prometheus/procfs/pull/100, the NFSTransportStats
struct was expanded by a field called protocol that specifies the NFS
protocol in use, either "tcp" or "udp". This commit adds the protocol as
a label to all NFS metrics exported via the mountstats collector.

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* Update fixtures for UDP mount

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
2018-07-24 00:47:12 +02:00
Johannes Wienke
5c780d132c Exclude only subdirectories of /var/lib/docker (#1003)
It is quite common to put /var/lib/docker itself on a separate partition
and that should be monitored as well.

Signed-off-by: Johannes Wienke <languitar@semipol.de>
2018-07-23 15:43:42 +02:00
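The effect of commit 5c780d132c above can be illustrated with a regular expression that matches only paths below `/var/lib/docker`, not the directory itself. The pattern and the helper `isIgnoredMountPoint` are assumptions for illustration and may differ from the exporter's actual default.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoredMountPoints is an illustrative pattern: the ".+" after
// "var/lib/docker/" requires at least one further path component.
var ignoredMountPoints = regexp.MustCompile(`^/(dev|proc|sys|var/lib/docker/.+)($|/)`)

// isIgnoredMountPoint is a hypothetical helper around the pattern above.
func isIgnoredMountPoint(path string) bool {
	return ignoredMountPoints.MatchString(path)
}

func main() {
	for _, p := range []string{"/var/lib/docker", "/var/lib/docker/overlay2/abc/merged", "/proc"} {
		fmt.Println(p, isIgnoredMountPoint(p))
	}
}
```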
Ben Kochie
23f95c8e04
Fix ntp collector thread safety (#1014)
Make the ntp collector thread safe by wrapping a mutex lock around the
leapMidnight variable.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-07-22 14:36:33 +02:00
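The thread-safety fix in commit 23f95c8e04 above, wrapping a mutex around the `leapMidnight` variable, follows a common Go pattern. The sketch below assumes a hypothetical `leapState` type with `set` and `get` accessors; only the variable name comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// leapState is a hypothetical wrapper that guards leapMidnight with a mutex.
type leapState struct {
	mu           sync.Mutex
	leapMidnight time.Time
}

// set stores a new value while holding the lock.
func (s *leapState) set(t time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.leapMidnight = t
}

// get returns the current value while holding the lock.
func (s *leapState) get() time.Time {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.leapMidnight
}

func main() {
	var s leapState
	var wg sync.WaitGroup
	// Several concurrent scrapes may read and write the value.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(days int) {
			defer wg.Done()
			s.set(time.Date(2018, 7, 22+days, 0, 0, 0, 0, time.UTC))
			_ = s.get()
		}(i)
	}
	wg.Wait()
	fmt.Println(s.get().IsZero())
}
```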
xginn8
140b8b85c3 Filter out uninstalled systemd units when collecting all units (#1011)
fixes #567

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-22 09:20:03 +02:00
Sven Lange
2ae8c1c7a7 Add systemd uptime metric collection (#952)
* Add systemd uptime metric collection

Signed-off-by: Sven Lange <tdl@hadiko.de>
2018-07-18 16:02:05 +02:00
neiledgar
7e4d9bd150 Update wifi stats to support multiple stations (#977) (#980)
Signed-off-by: neiledgar <neil.edgar@btinternet.com>
2018-07-16 16:02:25 +02:00
xginn8
9b97f44a70 Add a counter for refused socket unit connections, available as of systemd 239 (#995)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-16 16:01:42 +02:00
Brandon Gilmore
76bbd8dd18 Use /proc/mounts instead of statfs(2) for ro state (#1002)
While the statfs(2) approach is reliable for normally mounted filesystems, the
flags returned can be inconsistent when a filesystem has been remounted read-only
after encountering an error. The returned flags do accurately represent the
internal state of the filesystem, but they do not reflect whether the VFS layer
will accept writes. Instead, it makes sense to parse the current VFS mount
state from the options field in /proc/mounts since it takes precedence.

Signed-off-by: Brandon Gilmore <bgilmore@valvesoftware.com>
2018-07-16 15:56:27 +02:00
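Reading the read-only state from the options field of `/proc/mounts`, as commit 76bbd8dd18 above describes, could be sketched like this. The helpers `isReadOnly` and `mountReadOnly` are hypothetical names; the field order (device, mount point, type, options) is the standard `/proc/mounts` layout.

```go
package main

import (
	"fmt"
	"strings"
)

// isReadOnly reports whether the comma-separated mount options contain the
// exact option "ro". Options such as "errors=remount-ro" do not count.
func isReadOnly(options string) bool {
	for _, opt := range strings.Split(options, ",") {
		if opt == "ro" {
			return true
		}
	}
	return false
}

// mountReadOnly is a hypothetical helper that parses one /proc/mounts line
// and returns the mount point and whether it is currently mounted read-only.
func mountReadOnly(line string) (mountPoint string, ro bool, err error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return "", false, fmt.Errorf("malformed mounts line %q", line)
	}
	return fields[1], isReadOnly(fields[3]), nil
}

func main() {
	mp, ro, err := mountReadOnly("/dev/sda1 / ext4 ro,relatime,errors=remount-ro 0 0")
	fmt.Println(mp, ro, err)
}
```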
Jan Klat
c4102f1175 Add sys/class/net parsing from procfs and expose its metrics (#851)
* add sys/class/net parsing from procfs and expose its metrics

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming

Signed-off-by: Jan Klat <jenik@klatys.cz>

* bump year in licence, remove redundant newline, correct fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix style

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change carrier changes to counter type

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix e2e output

Signed-off-by: Jan Klat <jenik@klatys.cz>

* add fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* update vendor, use fixtures correctly

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change fixtures (device in /sys/class/net should be symlinked)

Signed-off-by: Jan Klat <jenik@klatys.cz>

* correct fixtures for 64k page, updated readme

Signed-off-by: Jan Klat <jenik@klatys.cz>
2018-07-16 15:08:18 +02:00
mknapphrt
09b4305090 Changed the way that stuck mounts are handled. If a mount fails to return, it will stop being queried until it returns. (#997)
Fixed spelling mistakes.

Update transport_generic.go

Changed to a mutex approach instead of channels and added a timeout before declaring a mount stuck.

Removed unnecessary lock channel and clarified some var names.

Fixed style nits.

Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2018-07-14 11:10:28 +02:00
xginn8
ac5a981761 Adding socket stat collection for systemd socket units (#968)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-05 16:26:48 +02:00
xginn8
8af84a215d Add support for NRestarts counter introduced in systemd 235 (#992)
* Add support for NRestarts counter introduced in systemd 235

`.service` units increment this counter any time the Restart= condition is
triggered.

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-05 13:31:45 +02:00
Ben Kochie
107e5dfecc
Fix mdadm collector issues (#985)
* Send "Personality unknown" to debug, not info, remove unnecessary newline.
* Add support for "linear" personality.
* Always set number of active disks to 0 when a device is inactive.
* Add total disks calculation to unknown personalities.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-07-02 12:38:20 +02:00
Derek Marcotte
2678d68dcc Fix for #945, cpu temperature is signed. (#965)
* Fix for #945, cpu temperature is signed.

Added a type conversion to cpu temperature sysctl.  Will still
collect/report -1 when the value is -1, this is because it should be up
to interpretation whether this is the correct value for the system or
not.

Some drivers will report -1 for cpu temperature.  Other sensors will
report "an input into the fan control algorithm", i.e. not the actual
temperature, but how much fan it wants.  Some people cool their machines
with liquid nitrogen.

Signed-off-by: Derek Marcotte <554b8425@razorfever.net>
2018-06-07 15:01:25 +02:00
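The type conversion mentioned in commit 2678d68dcc above can be illustrated in isolation. The sketch assumes the raw sysctl value arrives as a `uint32`; the actual types used in the collector are not stated in the commit message, and `signedTemp` is a hypothetical name.

```go
package main

import "fmt"

// signedTemp reinterprets the raw 32-bit value as signed, so that a driver
// reporting -1 is seen as -1 rather than as 4294967295.
func signedTemp(raw uint32) int32 {
	return int32(raw)
}

func main() {
	fmt.Println(signedTemp(0xFFFFFFFF)) // all bits set, i.e. -1 when signed
	fmt.Println(signedTemp(42))
}
```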
Brad Beam
e3cf1d5187 Adding support for evaluating octal characters in mountpoint (#954)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2018-06-06 16:49:19 +02:00
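Mount points in `/proc/mounts` encode special characters as three-digit octal escapes, for example `\040` for a space. A minimal sketch of the decoding that commit e3cf1d5187 above refers to, using a hypothetical helper `decodeOctalEscapes` (not the collector's actual function), might be:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeOctalEscapes is a hypothetical helper. It replaces each backslash
// followed by three octal digits with the byte it encodes and leaves all
// other input untouched.
func decodeOctalEscapes(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+3 < len(s) {
			// Parse the next three characters as an 8-bit octal number.
			if v, err := strconv.ParseUint(s[i+1:i+4], 8, 8); err == nil {
				b.WriteByte(byte(v))
				i += 3 // skip the three digits just consumed
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(decodeOctalEscapes(`/mnt/my\040disk`))
}
```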
Pavlo Kutishchev
456bf5094a Add processes exporter (#950)
* Add processes exporter

Signed-off-by: Pavel Kutishchev <pavel.kutishchev@olx.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-06-05 19:38:32 +02:00
Alexey Kopytov
dd98a09bb2 A couple of ARM64-related fixes (#934)
* Do not rely on AArch64 CPUs to support 32-bit ARM for cross-testing.

Signed-off-by: Alexey Kopytov <akopytov@gmail.com>

* aarch64, like ppc64le, reports 64k node_sockstat_TCP_mem_bytes due to 64k pages.

Signed-off-by: Alexey Kopytov <akopytov@gmail.com>
2018-05-14 15:55:49 +02:00