113 Commits

Author SHA1 Message Date
Patrick
bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier
38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
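The "ignore extra stats" behaviour described in 38163f234f can be sketched roughly as follows. This is a minimal illustration and not the collector's actual code: the function name `parseDiskstatsLine` and the constant `knownStats` are hypothetical, and the value 11 is assumed to be the number of per-device counters in the classic `/proc/diskstats` layout.

```go
package main

import (
	"fmt"
	"strings"
)

// knownStats is the number of per-device counters this sketch understands.
// The value 11 is an assumption made for illustration.
const knownStats = 11

// parseDiskstatsLine splits one /proc/diskstats line into the device name and
// its counters. Counters beyond knownStats are dropped instead of being
// treated as an error.
func parseDiskstatsLine(line string) (string, []string, error) {
	fields := strings.Fields(line)
	// Layout: major number, minor number, device name, then the counters.
	if len(fields) < 4 {
		return "", nil, fmt.Errorf("too few fields in %q", line)
	}
	dev, stats := fields[2], fields[3:]
	if len(stats) > knownStats {
		stats = stats[:knownStats] // ignore counters added by newer kernels
	}
	return dev, stats, nil
}

func main() {
	// A made-up line with 15 counters, four more than the sketch knows about.
	line := "8 0 sda 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15"
	dev, stats, err := parseDiskstatsLine(line)
	fmt.Println(dev, len(stats), err)
}
```

Truncating rather than rejecting means a kernel that appends new columns does not break scraping; the extra values are simply not exported until the parser learns about them.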
Paul Gier
e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Mario Trangoni
3659260b66 infiniband: Handle iWARP* RDMA modules N/A (#974)
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available

This is related to #966 and handles the following error:

Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-10-04 15:05:59 +02:00
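The error quoted in 3659260b66 comes from passing the literal string "N/A (no PMA)" to `strconv.ParseUint`. One way to tolerate such values is sketched below; the helper name `parseCounter` and its three-value return are assumptions for illustration, not the collector's real API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter parses the contents of a counter file. The boolean result is
// false when the value is reported as not available, e.g. "N/A (no PMA)",
// so the caller can skip the metric instead of failing the whole collector.
func parseCounter(raw string) (uint64, bool, error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, "N/A") {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

func main() {
	for _, raw := range []string{"12345\n", "N/A (no PMA)\n"} {
		v, ok, err := parseCounter(raw)
		fmt.Println(v, ok, err)
	}
}
```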
Björn Rabenstein
1c9ea46cca Update vendoring for client_golang and friends (#1076)
Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 17:09:52 +02:00
Marco Tulio R Braga
05e55bddad Fix typo on description of read_time_seconds_total (#1057)
Fix typo in the unit description of metric `*read_time_seconds_total`: change milliseconds to seconds.

Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
2018-09-02 09:46:45 +02:00
Hannes Körber
14a4f0028e Enable nfs protocol (#998)
* vendor: Update prometheus/procfs

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* mountstats: Use new NFS protocol field

In https://github.com/prometheus/procfs/pull/100, the NFSTransportStats
struct was expanded by a field called protocol that specifies the NFS
protocol in use, either "tcp" or "udp". This commit adds the protocol as
a label to all NFS metrics exported via the mountstats collector.

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* Update fixtures for UDP mount

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
2018-07-24 00:47:12 +02:00
neiledgar
7e4d9bd150 Update wifi stats to support multiple stations (#977) (#980)
Signed-off-by: neiledgar <neil.edgar@btinternet.com>
2018-07-16 16:02:25 +02:00
Jan Klat
c4102f1175 Add sys/class/net parsing from procfs and expose its metrics (#851)
* add sys/class/net parsing from procfs and expose its metrics

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming

Signed-off-by: Jan Klat <jenik@klatys.cz>

* bump year in licence, remove redundant newline, correct fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix style

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change carrier changes to counter type

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix e2e output

Signed-off-by: Jan Klat <jenik@klatys.cz>

* add fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* update vendor, use fixtures correctly

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change fixtures (device in /sys/class/net should be symlinked)

Signed-off-by: Jan Klat <jenik@klatys.cz>

* correct fixtures for 64k page, updated readme

Signed-off-by: Jan Klat <jenik@klatys.cz>
2018-07-16 15:08:18 +02:00
Ben Kochie
107e5dfecc Fix mdadm collector issues (#985)
* Send "Personality unknown" to debug, not info, remove unnecessary newline.
* Add support for "linear" personality.
* Always set number of active disks to 0 when a device is inactive.
* Add total disks calculation to unknown personalities.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-07-02 12:38:20 +02:00
Pavlo Kutishchev
456bf5094a Add processes exporter (#950)
* Add processes exporter

Signed-off-by: Pavel Kutishchev <pavel.kutishchev@olx.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-06-05 19:38:32 +02:00
Ben Kochie
b10ca77680 Fix /proc/net/dev/ interface name handling
* Allow any character (UTF-8) for Linux interface names.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-18 12:53:59 +02:00
Ben Kochie
a528966dcd Fix parsing of interface aliases in netdev linux
Very old kernels expose interface aliases as `foo0:0`, adjust the line
parsing to handle these names.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-17 13:15:02 +02:00
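Both a528966dcd and b10ca77680 concern how the interface name is separated from the counters in a `/proc/net/dev` line. A minimal sketch of one approach is to cut at the last colon, so that an alias such as `foo0:0` stays intact. The function name `splitNetDevLine` is hypothetical, and the actual collector may use a different technique (for example a regular expression).

```go
package main

import (
	"fmt"
	"strings"
)

// splitNetDevLine separates the interface name from the counters in one
// /proc/net/dev data line. It cuts at the LAST colon, so a name that itself
// contains a colon (an alias like "foo0:0") is preserved. This relies on the
// counters being plain numbers that never contain a colon.
func splitNetDevLine(line string) (string, []string, error) {
	i := strings.LastIndex(line, ":")
	if i < 0 {
		return "", nil, fmt.Errorf("no interface separator in %q", line)
	}
	name := strings.TrimSpace(line[:i])
	if name == "" {
		return "", nil, fmt.Errorf("empty interface name in %q", line)
	}
	return name, strings.Fields(line[i+1:]), nil
}

func main() {
	for _, line := range []string{"  eth0: 10 20 30", " foo0:0: 1 2 3"} {
		name, stats, err := splitNetDevLine(line)
		fmt.Println(name, stats, err)
	}
}
```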
Dmitriy Lukyanchikov
eddd1b9357 Fix netdev collector for linux (#890)
fix variable name, fix transmitHeader extracting
modify fixtures to run tests with updated netdev_linux collector

Signed-off-by: dmitriy-lukyanchikov <d.lukyanchikov@anchorfree.com>
2018-04-14 13:58:56 +02:00
Karsten Weiss
7e392e6634 Fix spelling mistakes found by codespell
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:27:17 +02:00
Karsten Weiss
efc1fdb6d0 cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total (#871)
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total

This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.

Rename the node_package_throttles_total metric label `node` to `package`.

Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Use the direct /sys path to the cpu files.

Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.

The latter path also does not exist e.g. on RHEL 6.9's kernel.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Reverse core+package throttle processing order

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Add documentation URLs

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:01:52 +02:00
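The reasoning in efc1fdb6d0 (core IDs repeat on every package, so a core is only unique together with its package) can be illustrated with a composite map key. Everything here is a sketch: the types `coreKey` and `cpuSample` and the function `throttlesByCore` are hypothetical, and it assumes that logical CPUs sharing a core report the same per-core throttle count.

```go
package main

import "fmt"

// coreKey identifies a physical core. core_id values repeat across packages,
// so the package ID has to be part of the key.
type coreKey struct {
	pkg, core int
}

// cpuSample is what this sketch assumes was read for one logical CPU.
type cpuSample struct {
	pkg, core int
	throttles uint64
}

// throttlesByCore collapses per-logical-CPU samples into one value per
// (package, core) pair. Samples for the same core overwrite each other,
// which is harmless under the assumption that they carry the same count.
func throttlesByCore(samples []cpuSample) map[coreKey]uint64 {
	out := make(map[coreKey]uint64)
	for _, s := range samples {
		out[coreKey{s.pkg, s.core}] = s.throttles
	}
	return out
}

func main() {
	samples := []cpuSample{
		{0, 0, 5}, {0, 1, 0}, // package 0, cores 0 and 1
		{1, 0, 7}, {1, 1, 2}, // package 1 reuses core IDs 0 and 1
	}
	for k, v := range throttlesByCore(samples) {
		fmt.Printf("package=%d core=%d throttles=%d\n", k.pkg, k.core, v)
	}
}
```

Keying by `core` alone would merge core 0 of package 0 with core 0 of package 1, which is the miscount the commit message describes.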
Brian Brazil
31ce32f1fe Greatly trim what netstat collector exposes by default (#876)
Netstat is 40% of the metrics on my laptop, many of which
are highly detailed information about IP internals in the kernel.
~300 such metrics on every machine in your fleet is excessive,
so focus on key metrics by default, overridable by the user.

Fixes #515

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-30 19:28:08 +01:00
Ben Kochie
cf3edadcbb Update fixtures
* Add oom_kill to fixture.
* Update e2e outputs.
* Put regexp in order.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-03-29 22:00:02 +01:00
Brian Brazil
499c342fed Greatly reduce the metrics vmstat returns by default.
Vmstat has over 100 fields, most of which are highly
detailed debug information. Trim this down to only
essential fields by default, configurable by flag.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-29 22:00:02 +01:00
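The "essential fields by default, configurable by flag" idea from 499c342fed (and the similar netstat change in 31ce32f1fe) amounts to filtering field names through a pattern. The sketch below assumes a regular expression is used; the pattern in `defaultFields` and the function `filterFields` are hypothetical placeholders, not the collector's real defaults.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultFields is a hypothetical default pattern. A real collector would
// read the pattern from a command-line flag so users can override it.
const defaultFields = `^(oom_kill|pgpg|pswp)`

// filterFields keeps only the stats whose name matches pattern.
func filterFields(pattern string, stats map[string]float64) (map[string]float64, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	out := make(map[string]float64)
	for name, v := range stats {
		if re.MatchString(name) {
			out[name] = v
		}
	}
	return out, nil
}

func main() {
	stats := map[string]float64{"oom_kill": 1, "nr_free_pages": 2, "pgpgin": 3}
	kept, err := filterFields(defaultFields, stats)
	fmt.Println(kept, err)
}
```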
Mario Trangoni
1f11a86d59 Fix nfs golint issues (#863)
* procfs: update vendoring

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* procfs: fix e2e tests after nfs changes

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-03-22 22:25:37 +01:00
Ben Kochie
7b720df1c5 Use lowercase cpu label name in interrupts (#849)
To match other CPU related metric labels, use a lowercase named label.
2018-03-08 15:04:49 +01:00
Rene Treffer
c504c7e264 Only report core throttles per core, not per cpu (#836)
* Only report core throttles per core, not per cpu

* Add topology/core_id to the cpu sysfs fixtures

* Add new cpu fixtures to ttar file

* Merge core_id reading and thermal throttle accounting

* Declare core_id
2018-02-27 19:43:15 +01:00
Ben Kochie
e0d54a509c Cleanup NFS metrics (#834)
* Cleanup NFS metrics

* Update `nfs` metric names to match `nfsd`.
* Remove unneeded `tcp` label from TCP connections metric.
* Remove unneeded `v` on `nfsd` metrics.
* Enable all `nfs` v4 client metrics.
* Remove `nfs` metric name overrides.

* Add ppc64le fixture.

* Fix typo.
2018-02-21 07:25:41 +01:00
Ben Kochie
d33a447047 Remove deprecated prometheus.InstrumentHandlerFunc (#831)
Update the Prometheus client_golang usage to call `promhttp.Handler()` instead
of `prometheus.InstrumentHandlerFunc()`.
2018-02-19 15:44:59 +01:00
Richard Elling
d7348a5c78 updates for zfsonlinux 0.7.5 (#779)
* updates for zfsonlinux 0.7.5

* add constants for KSTAT_DATA_* types

* added e2e test for negative values represented by uint64 that can result from ZFS bugs
2018-02-16 15:46:31 +01:00
Ben Kochie
3de2542d21 Fix NFSd metric type (#819)
RPC Count should be a counter, not a gauge.
2018-02-13 17:03:22 +01:00
Matt Layher
544488ddd6 Fix remaining metric naming issues (#799)
2018-02-12 18:53:31 +01:00
Ben Kochie
6a041692ed Add NFS Server metrics collector. (#803)
* Add NFS Server metrics collector.

* Add File Handles metrics.

* Add nfsd IO stats.

* Add metrics for NFSd threads.

* Add metrics for NFSd read ahead cache.

* Add NFSd network traffic counters.

* Add RPC metrics.

* Add V2 requests metrics.

* Add NFSv3 metrics.

* Add NFSv4 metrics.

* Update reply cache comment.

* Update help text.
2018-02-12 17:56:05 +01:00
Ben Kochie
14d60958d6 Unify CPU collector conventions (#806)
* Unify CPU collector conventions

Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.

* Fix subsystem string in cpu_freebsd.

* Fix Linux CPU freq label names.
2018-02-01 18:42:20 +01:00
Ben Kochie
111e3af437 Remove obsolete megacli collector. (#798)
This collector has been replaced by the textfile collector tool
`storcli.py`.
2018-01-23 11:25:42 +01:00
Julius Volz
6cac74f0e0 Add unit suffix to textfile collector mtime metric (#796)
2018-01-22 14:02:19 +01:00
Brian Brazil
a98067a294 Make metrics better follow guidelines (#787)
* Improve stat linux metric names.

cpu is no longer used.

* node_cpu -> node_cpu_seconds_total for Linux

* Improve filesystem metric names with units

* Improve units and names of linux disk stats

Remove sector metrics, the bytes metrics cover those already.

* Infiniband counters should end in _total

* Improve timex metric names, convert to more normal units.

See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.

* Update test fixture

* For meminfo metrics that had "kB" units, add _bytes

* Interrupts counter should have _total
2018-01-17 17:55:55 +01:00
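One item in a98067a294, adding `_bytes` to meminfo metrics that had "kB" units, implies both a rename and a unit conversion. A rough sketch under stated assumptions: the function name `parseMeminfoLine` is hypothetical, and the factor 1024 assumes that "kB" in `/proc/meminfo` means kibibytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfoLine turns one /proc/meminfo line into a metric name suffix
// and value. Lines carrying a "kB" unit are converted to bytes and get a
// "_bytes" suffix; unit-less lines (plain counts) are passed through.
func parseMeminfoLine(line string) (string, float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", 0, fmt.Errorf("malformed meminfo line %q", line)
	}
	name := strings.TrimSuffix(fields[0], ":")
	v, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return "", 0, err
	}
	if len(fields) == 3 && fields[2] == "kB" {
		name += "_bytes"
		v *= 1024 // assumption: kB here means 1024 bytes
	}
	return name, v, nil
}

func main() {
	for _, line := range []string{"MemTotal:       16384 kB", "HugePages_Total:       0"} {
		name, v, err := parseMeminfoLine(line)
		fmt.Println(name, v, err)
	}
}
```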
Julius Volz
f536857ac6 Fix e2e tests after textfile custom timestamp removal (#768)
2017-12-24 11:54:33 +01:00
Ben Kochie
cd2a17176a Add full make to CircleCI (#761)
* Add full make to CircleCI

Ensure end-to-end test is run.

* Fix go fmt error.

* Fix end-to-end output.
2017-12-21 16:24:23 +01:00
Ben Kochie
2a80537547 Split out guest cpu metrics on Linux. (#744)
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics.  Separate these into their own metric to
avoid duplication of data.
2017-11-23 15:04:47 +01:00
Ben Kochie
ea250d73f4 Fix off by one in Linux interrupts collector (#721)
* Fix off by one in Linux interrupts collector

* Fix off by one in CPU column handler.
* Add test.

* Enable interrupts in end-to-end test.
2017-11-02 09:59:46 +01:00
Matt Layher
f9ad88fc03 xfs: expose correct fields, fix metric names
2017-10-20 18:41:51 -04:00
Ben Kochie
deadfef4c9 Update vendoring (#685)
* Update vendor github.com/coreos/go-systemd/dbus@v15

* Update vendor github.com/ema/qdisc

* Update vendor github.com/godbus/dbus

* Update vendor github.com/golang/protobuf/proto

* Update vendor github.com/lufia/iostat

* Update vendor github.com/matttproud/golang_protobuf_extensions/pbutil@v1.0.0

* Update vendor github.com/prometheus/client_golang/...

* Update vendor github.com/prometheus/common/...

* Update vendor github.com/prometheus/procfs/...

* Update vendor github.com/sirupsen/logrus@v1.0.3

Adds vendor golang.org/x/crypto

* Update vendor golang.org/x/net/...

* Update vendor golang.org/x/sys/...

* Update end to end output.
2017-10-05 16:20:47 +02:00
Karsten Weiss
b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Ben Kochie
46c31d8a7e Enable IPVS collector by default (#623)
* Silence error output when no IPVS present.
* Enable by default.
* Update end-to-end fixture.
* Update README.
2017-07-26 15:20:28 +02:00
Andrea De Pasquale
1369763067 Change raid0 status line regexp for mdadm collector (#619)
2017-07-20 17:04:33 +02:00
Aleksey Zhukov
7a914e58f2 Add parsing /proc/net/snmp6 file for netstat-linux (#615)
* Add parsing /proc/net/snmp6 file

* add /proc/net/snmp6 fixture

* fix e2e test

* gofmt

* remove unused variable

* safe checks

* add tests

* change help format
2017-07-08 20:16:35 +02:00
Matt Layher
6e82fd1c56 Add XFS block mapping and block map B-tree stats (#575)
2017-07-07 07:27:52 +02:00
ideaship
8d90276283 Add bcache collector (#597)
* Add bcache collector for Linux

This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.

* Removed commented out code

* Use project comment style

* Add _sectors to metric name to indicate unit

* Really use project comment style

* Rename bcache.go to bcache_linux.go

* Keep collector namespace clean

Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric

* Shorten slice initialization

* Change label names to backing_device, cache_device

* Remove five minute metrics (keep only total)

* Include units in additional metric names

* Enable bcache collector by default

* Provide metrics in seconds, not nanoseconds

* remove metrics with label "all"

* Add fixtures, update end-to-end for bcache collector

* Move fixtures/sys into tar.gz

This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.

The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not to turn up in regular
file names).

* Add ttar: plain text archive, replacement for tar

This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).

Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.

The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.

The program also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
2017-07-07 07:20:18 +02:00
Rene Treffer
bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz
2017-06-27 11:05:55 +02:00
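The kHz-to-Hz conversion named in bcc3cd92b8 is a multiplication by 1000 applied to the raw value before it is exported. A minimal sketch, assuming the raw frequency arrives as a decimal string in kHz; the helper name `khzToHz` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// khzToHz parses a frequency given in kHz and returns it in Hz, the base
// unit that exported metrics are expected to use.
func khzToHz(raw string) (float64, error) {
	khz, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(khz) * 1000, nil
}

func main() {
	hz, err := khzToHz("2400000\n")
	fmt.Println(hz, err)
}
```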
Ben Kochie
182810056f Fix Linux cpu errors (#606)
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00
Rene Treffer
2e9f1913b8 Move stat_linux to cpu_linux and add cpufreq stats (#548)
2017-06-13 11:21:53 +02:00
Emanuele Rocca
047003b6bb Add qdisc collector for Linux (#580)
* Add qdisc collector for Linux

This collector gathers basic queueing discipline metrics via netlink,
similarly to what `tc -s qdisc show` does.

* qdisc collector: nl-specific code moved, names fixed

- netlink-specific parts moved to github.com/ema/qdisc
- avoid using shortened names
- counters renamed into XXX_total

* Get rid of parseMessage error checking leftover

* Add github.com/ema/qdisc to vendored packages

* Update help texts and comments

* Add qdisc collector to README file

* qdisc collector end-to-end testing

* Update qdisc dependency to latest version

Update github.com/ema/qdisc dependency to revision 2c7e72d, which
includes unit testing.

* qdisc collector: rename "iface" label into "device"
2017-05-23 11:55:50 +02:00
Robert Clark
58f50b31f2 Multiply port data XMIT/RCV metrics by 4 (#579)
According to Mellanox, it is standard practice that the port_xmit_data and port_rcv_data
files are split into 4 lanes. To get the actual transmit and receive values for each
port, the metric needs to be multiplied by 4.

Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-05-12 07:28:53 +02:00
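The scaling described in 58f50b31f2 can be written as a single named constant applied to the raw counter. This is only a sketch of the arithmetic stated in the commit message; the names `laneFactor` and `scalePortData` are hypothetical.

```go
package main

import "fmt"

// laneFactor is the multiplier from the commit message: the raw
// port_xmit_data and port_rcv_data values are split into 4 lanes.
const laneFactor = 4

// scalePortData converts a raw port data counter into the actual value.
func scalePortData(raw uint64) uint64 {
	return raw * laneFactor
}

func main() {
	fmt.Println(scalePortData(250))
}
```

Keeping the factor in a named constant makes it clear at the call site why the exported value differs from the number in the counter file.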
Matt Layher
1feb091b36 Initial XFS collector
2017-04-22 11:53:07 -04:00