133 Commits

Author SHA1 Message Date
Vladislav Rassokhin
4b9d7b63e6 Update CHANGELOG.md
Signed-off-by: Vladislav Rassokhin <vladislav.rassokhin@jetbrains.com>
2021-09-02 15:19:16 +02:00
Ben Kochie
84b36c4fd8
Add flag to disable guest CPU metrics
In high scale virtualized / cloud environments there are typically
no guest VMs. Add a boolean flag to allow disabling the Linux guest
CPU metrics.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-08-17 13:04:46 +02:00
Ben Kochie
120b9b463e
Release 1.2.2
* [BUGFIX] Fix processes collector long int parsing #2112

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-08-06 14:41:30 +02:00
Ben Kochie
1958a77add
Release 1.2.1
* [BUGFIX] Fix zoneinfo parsing prometheus/procfs#386
* [BUGFIX] Fix nvme collector log noise #2091
* [BUGFIX] Fix rapl collector log noise #2092

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-07-23 10:44:29 +02:00
Ben Kochie
138d4a20ee
Release 1.2.0
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors #2012
* [CHANGE] Make node_exporter print usage to STDOUT #2039
* [FEATURE] Add conntrack statistics metrics #1155
* [FEATURE] Add ethtool stats collector #1832
* [FEATURE] Add flag to ignore network speed if it is unknown #1989
* [FEATURE] Add tapestats collector for Linux #2044
* [FEATURE] Add nvme collector #2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887
* [ENHANCEMENT] Add more Infiniband counters #2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing #2033
* [ENHANCEMENT] Add time zone offset metric #2060
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Fix panic when using backwards compatible flags #2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache #2015
* [BUGFIX] Only initiate collectors once #2048
* [BUGFIX] Handle small backwards jumps in CPU idle #2067

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-07-15 14:27:42 +02:00
Luiz Angelo Daros de Luca
00aa2f34ce Add tapestats to collect tape devices statistics
It is based on diskstats to allow metrics reuse by simply
s/disk/tape/ the query.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
2021-07-09 21:01:08 -03:00
Ben Kochie
13be860e25 Add time zone offset metric
Add the time zone and offset in seconds.

Closes: https://github.com/prometheus/node_exporter/issues/2052

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-07-01 11:25:53 +02:00
Ben Kochie
3bc9a93c20
Add ErrorLog plumbing to promhttp
Fix the error logging of the promhttp handler by connecting it to the
promlog setup.
* Switch to go-kit/log.
* Cleanup CHANGELOG.

Fixes: https://github.com/prometheus/node_exporter/issues/1886

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-06-03 10:47:41 +02:00
Frederic Hemberger
39124626cd Rename collector.filesystem flags to match other collectors
Ref: #1743
Fixes: #1994

Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>
2021-03-24 21:01:10 +01:00
Ben Kochie
9893fca77e
Add flag to ignore network speed if it is unknown
Some devices (e.g. virtual ones) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.
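
The check described above can be sketched as follows. This is a minimal illustration, not the collector's actual code: the function name `shouldExportSpeed` and the parameter `ignoreUnknown` are hypothetical stand-ins for whatever the flag controls internally.

```go
package main

import "fmt"

// shouldExportSpeed reports whether a speed reading should be exposed.
// speedMbps is the raw value read from the device; ignoreUnknown is a
// hypothetical boolean standing in for the new flag.
func shouldExportSpeed(speedMbps int64, ignoreUnknown bool) bool {
	// Devices without a speed report a negative value such as -1.
	if ignoreUnknown && speedMbps < 0 {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldExportSpeed(-1, true))   // unknown speed is skipped
	fmt.Println(shouldExportSpeed(-1, false))  // previous behavior: still exported
	fmt.Println(shouldExportSpeed(1000, true)) // valid speed is always exported
}
```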

Fixes: https://github.com/prometheus/node_exporter/issues/1967

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-18 11:36:31 +01:00
Ben Kochie
378d7b46bf
Release version 1.1.2
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Sanitize strings from /sys/class/power_supply #1984
* [BUGFIX] Silence missing netclass errors #1986

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-05 08:30:29 +01:00
Ben Kochie
3b3ef7357f
Silence missing netclass errors
* Handle no such file and permission denied errors.
* Reduce excessive error wrapping.

Fixes: https://github.com/prometheus/node_exporter/issues/1840

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-03 20:40:08 +01:00
Ben Kochie
23e5b245a4
Sanitize strings from /sys/class/power_supply
Avoid panic on invalid UTF-8 from /sys/class/power_supply by
sanitizing strings parsed from the kernel.
* Add a broken string to the test fixtures.
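
One way to sanitize such strings is the standard library's `strings.ToValidUTF8`. The sketch below only shows the idea; the helper name `sanitize`, the sample value, and the `?` replacement are assumptions, not necessarily what the collector uses.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitize replaces each run of invalid UTF-8 bytes with a single "?".
// The replacement character is an arbitrary choice for this sketch.
func sanitize(s string) string {
	return strings.ToValidUTF8(s, "?")
}

func main() {
	// Hypothetical value with two invalid bytes appended.
	raw := "BAT0\xff\xfe"
	clean := sanitize(raw)
	fmt.Println(utf8.ValidString(raw), utf8.ValidString(clean), clean)
}
```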

Fixes: https://github.com/prometheus/node_exporter/issues/1979

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-03 18:05:51 +01:00
Ben Kochie
46d0a0813f
Handle errors from disabled PSI subsystem
When CONFIG_PSI_DEFAULT_DISABLED=y, the pressure system returns
"operation not supported", rather than permission denied or not
exposing the /proc/pressure files.
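
A minimal sketch of how the three cases named above might be told apart with `errors.Is`. The helper names are hypothetical, and the simulated error stands in for a real read of a /proc/pressure file.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isPSIUnavailable reports whether err means pressure data cannot be read:
// the files are missing, access is denied, or the kernel has PSI disabled.
func isPSIUnavailable(err error) bool {
	return errors.Is(err, os.ErrNotExist) ||
		errors.Is(err, os.ErrPermission) ||
		errors.Is(err, syscall.ENOTSUP)
}

// simulatedPSIError builds the kind of error a read might return when PSI
// is disabled by default. The path is illustrative only.
func simulatedPSIError() error {
	return &os.PathError{Op: "read", Path: "/proc/pressure/cpu", Err: syscall.ENOTSUP}
}

func main() {
	fmt.Println(isPSIUnavailable(simulatedPSIError()))
}
```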

Fixes: https://github.com/prometheus/node_exporter/issues/1961

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-03 11:02:28 +01:00
Ben Kochie
d1a791b1af
Release 1.1.1
* [BUGFIX] Fix ineffassign issue #1957
* [BUGFIX] Fix some noisy log lines #1962

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-02-12 16:47:01 +01:00
Ben Kochie
a37d3f659c
Release 1.1.0
* Update Build
  - Update CircleCI orb.
  - Update CircleCI Machine image.
  - Use golang-builder 1.15.
* Update Go modules.
* Fixup fixtures for XFS bug.

NOTE: We have improved some of the flag naming conventions (PR #1743). The old names are
      deprecated and will be removed in 2.0. They will continue to work for backwards
      compatibility.

* [CHANGE] Improve filter flag names #1743
* [CHANGE] Add btrfs and powersupplyclass to list of exporters enabled by default #1897
* [FEATURE] Add fibre channel collector #1786
* [FEATURE] Expose cpu bugs and flags as info metrics. #1788
* [FEATURE] Add network_route collector #1811
* [FEATURE] Add zoneinfo collector #1922
* [ENHANCEMENT] Add more InfiniBand counters #1694
* [ENHANCEMENT] Add flag to aggr ipvs metrics to avoid high cardinality metrics #1709
* [ENHANCEMENT] Adding backlog/current queue length to qdisc collector #1732
* [ENHANCEMENT] Include TCP OutRsts in netstat metrics #1733
* [ENHANCEMENT] Add pool size to entropy collector #1753
* [ENHANCEMENT] Remove CGO dependencies for OpenBSD amd64 #1774
* [ENHANCEMENT] bcache: add writeback_rate_debug stats #1658
* [ENHANCEMENT] Add check state for mdadm arrays via node_md_state metric #1810
* [ENHANCEMENT] Expose XFS inode statistics #1870
* [ENHANCEMENT] Expose zfs zpool state #1878
* [ENHANCEMENT] Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable #1947
* [BUGFIX] filesystem_freebsd: Fix label values #1728
* [BUGFIX] Fix various procfs parsing errors #1735
* [BUGFIX] Handle no data from powersupplyclass #1747
* [BUGFIX] udp_queues_linux.go: change upd to udp in two error strings #1769
* [BUGFIX] Fix node_scrape_collector_success behaviour #1816
* [BUGFIX] Fix NodeRAIDDegraded to not use a string rule expressions #1827
* [BUGFIX] Fix node_md_disks state label from fail to failed #1862
* [BUGFIX] Handle EPERM for syscall in timex collector #1938
* [BUGFIX] bcache: fix typo in a metric name #1943
* [BUGFIX] Fix XFS read/write stats (https://github.com/prometheus/procfs/pull/343)

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-02-05 21:23:23 +01:00
Proskurin Kirill
8d147327a6 feat: Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable
Signed-off-by: Proskurin Kirill <kirill.proskurin@behavox.com>
2021-01-27 18:09:52 +00:00
Ben Kochie
1d03daf616
Handle EPERM for syscall in timex collector
Handle the case where the Adjtimex syscall gets a permission denied error.

Fixes: https://github.com/prometheus/node_exporter/issues/1934

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-01-23 14:22:46 +01:00
ston1th
f8609aeee2 remove openbsd amd64 cgo dependencies
I have rewritten all CGO dependencies for OpenBSD amd64
in pure Go, to be able to cross-compile node_exporter.

Signed-off-by: ston1th <ston1th@giftfish.de>
2020-11-12 23:37:48 +01:00
Ondrej Baudys
ed10485073
Expose XFS inode statistics (#1869) (#1870)
* Expose XFS inode statistics (#1869)

Also fixes #1177

@SuperQ @discordianfish

Signed-off-by: Ondrej Baudys <obaudys@gmail.com>
Co-authored-by: obaudys@gmail.com <ondrej.baudys@nextgen.net>
2020-10-22 18:14:33 +02:00
Ben Kochie
1f46669916
Fix up node_md_disks changelog entry
Fixes: https://github.com/prometheus/node_exporter/issues/1759

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-29 16:30:59 +02:00
Ben Kochie
08ce3c6dd4
Merge pull request #1733 from prometheus/superq/OutRsts
Include TCP OutRsts in netstat metrics
2020-06-18 17:12:45 +02:00
Ben Kochie
a34630b8a2
Update for 1.0.1 release
Update changelog and version for 1.0.1 release.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-15 14:34:07 +02:00
Ben Kochie
c8c1618074
Merge pull request #1747 from prometheus/superq/fix_powersupplyclass
Handle no data from powersupplyclass
2020-06-14 15:45:12 +02:00
Ben Kochie
5fed4f01e9
Handle no data from powersupplyclass
Handle the case when /sys/class/power_supply doesn't exist. Fixes
logging error spam.

Requires https://github.com/prometheus/procfs/pull/308

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-13 11:09:16 +02:00
Ben Kochie
7e49b68d3a
Improve filter flag names.
Update netdev and systemd collectors to deprecate poorly chosen flag names.

Old flag names to be removed in 2.0.0.

https://github.com/prometheus/node_exporter/issues/1742

Add log messages for parsed flag values to help discover quoting issues in
supervisors.

https://github.com/prometheus/node_exporter/issues/1737

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-12 12:46:31 +02:00
Ben Kochie
204164e4e4
Include TCP OutRsts in netstat metrics
TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-04 08:51:39 +02:00
Ben Kochie
11a0aaaa0a
Release 1.0.0
* The netdev collector CLI argument `--collector.netdev.ignored-devices` was renamed to `--collector.netdev.device-blacklist` in order to conform with the systemd collector. #1279
* The label named `state` on `node_systemd_service_restart_total` metrics was changed to `name` to better describe the metric. #1393
* Refactoring of the mdadm collector changes several metrics
    - `node_md_disks_active` is removed
    - `node_md_disks` now has a `state` label for "fail", "spare", "active" disks.
    - `node_md_is_active` is replaced by `node_md_state` with a state set of "active", "inactive", "recovering", "resync".
* Additional label `mountaddr` added to NFS device metrics to distinguish mounts from the same URL, but different IP addresses. #1417
* Metrics node_cpu_scaling_frequency_min_hrts and node_cpu_scaling_frequency_max_hrts of the cpufreq collector were renamed to node_cpu_scaling_frequency_min_hertz and node_cpu_scaling_frequency_max_hertz. #1510
* Collectors that are enabled, but are unable to find data to collect, now return 0 for `node_scrape_collector_success`.

* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Ignore iso9660 filesystem on Linux #1355
* [CHANGE] Refactor mdadm collector #1403
* [CHANGE] Add `mountaddr` label to NFS metrics. #1417
* [CHANGE] Don't count empty collectors as success. #1613
* [FEATURE] New flag to disable default collectors #1276
* [FEATURE] Add experimental TLS support #1277, #1687, #1695
* [FEATURE] Add collector for Power Supply Class #1280
* [FEATURE] Add new schedstat collector #1389
* [FEATURE] Add FreeBSD zfs support #1394
* [FEATURE] Add uname support for Darwin and OpenBSD #1433
* [FEATURE] Add new metric node_cpu_info #1489
* [FEATURE] Add new thermal_zone collector #1425
* [FEATURE] Add new cooling_device metrics to thermal zone collector #1445
* [FEATURE] Add swap usage on darwin #1508
* [FEATURE] Add Btrfs collector #1512
* [FEATURE] Add RAPL collector #1523
* [FEATURE] Add new softnet collector #1576
* [FEATURE] Add new udp_queues collector #1503
* [FEATURE] Add basic authentication #1673
* [ENHANCEMENT] Log pid when there is a problem reading the process stats #1341
* [ENHANCEMENT] Collect InfiniBand port state and physical state #1357
* [ENHANCEMENT] Include additional XFS runtime statistics. #1423
* [ENHANCEMENT] Report non-fatal collection errors in the exporter metric. #1439
* [ENHANCEMENT] Expose IPVS firewall mark as a label #1455
* [ENHANCEMENT] Add check for systemd version before attempting to query certain metrics. #1413
* [ENHANCEMENT] Add a flag to adjust mount timeout #1486
* [ENHANCEMENT] Add new counters for flush requests in Linux 5.5 #1548
* [ENHANCEMENT] Add metrics and tests for UDP receive and send buffer errors #1534
* [ENHANCEMENT] The sockstat collector now exposes IPv6 statistics in addition to the existing IPv4 support. #1552
* [ENHANCEMENT] Add infiniband info metric #1563
* [ENHANCEMENT] Add unix socket support for supervisord collector #1592
* [ENHANCEMENT] Implement loadavg on all BSDs without cgo #1584
* [ENHANCEMENT] Add model_name and stepping to node_cpu_info metric #1617
* [ENHANCEMENT] Add `--collector.perf.cpus` to allow setting the CPU list for perf stats. #1561
* [ENHANCEMENT] Add metrics for IO errors and retries on Darwin. #1636
* [ENHANCEMENT] Add perf tracepoint collection flag #1664
* [ENHANCEMENT] ZFS: read contents of objset file #1632
* [ENHANCEMENT] Linux CPU: Cache CPU metrics to make them monotonically increasing #1711
* [BUGFIX] Read /proc/net files with a single read syscall #1380
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393
* [BUGFIX] Fix netdev nil reference on Darwin #1414
* [BUGFIX] Strip path.rootfs from mountpoint labels #1421
* [BUGFIX] Fix seconds reported by schedstat #1426
* [BUGFIX] Fix empty string in path.rootfs #1464
* [BUGFIX] Fix typo in cpufreq metric names #1510
* [BUGFIX] Read /proc/stat in one syscall #1538
* [BUGFIX] Fix OpenBSD cache memory information #1542
* [BUGFIX] Refactor textfile collector to avoid looping defer #1549
* [BUGFIX] Fix network speed math #1580
* [BUGFIX] collector/systemd: use regexp to extract systemd version #1647
* [BUGFIX] Fix initialization in perf collector when using multiple CPUs #1665
* [BUGFIX] Fix accidentally empty lines in meminfo_linux #1671

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-25 14:03:04 +02:00
Ben Kochie
3565316d7e
Linux CPU: Cache CPU metrics
Cache CPU metrics to avoid counters (e.g. iowait) jumping backwards.
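
The caching idea can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the collector's code: the type `counterCache` and its methods are hypothetical, and a real implementation would also have to decide how to treat genuine counter resets.

```go
package main

import "fmt"

// counterCache remembers the last value exposed for each counter name.
type counterCache struct {
	last map[string]float64
}

func newCounterCache() *counterCache {
	return &counterCache{last: make(map[string]float64)}
}

// update returns the value to expose: the new reading if it did not move
// backwards, otherwise the previously cached value.
func (c *counterCache) update(name string, v float64) float64 {
	if prev, ok := c.last[name]; ok && v < prev {
		return prev
	}
	c.last[name] = v
	return v
}

func main() {
	c := newCounterCache()
	// The second reading jumps backwards and is replaced by the cached value.
	for _, reading := range []float64{100.0, 99.5, 101.0} {
		fmt.Println(c.update("iowait", reading))
	}
}
```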

Fixes: https://github.com/prometheus/node_exporter/issues/1686

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-24 16:31:26 +02:00
Ben Kochie
3cedd344fd Release 1.0.0-rc.1
* Update CHANGELOG with fixes and improvements from rc.0

Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Richard Hartmann <richih@richih.org>
2020-05-14 16:41:37 +02:00
Julien Pivotto
202ecf9c9d
Add basic authentication (#1683)
* Add basic authentication

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-01 14:26:51 +02:00
Peter Bueschel
da5972b539
Add gauges for allocated memory for queued UDP and TCP packets (#1503)
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.

For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this change gives us the option to check for overloaded UDP + TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:

* I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
* I chose the name 'udp_queue' instead of 'udpstat' as UDP has no state.


Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
2020-03-31 10:46:32 +02:00
Ben Kochie
4891b01b6c
Add changelog entry for #1647
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-27 21:36:39 +01:00
Tom Wilkie
6496c24d61
Metrics for IO errors on Mac. (#1636)
* Metrics for IO errors and retries on Mac.

Signed-off-by: Tom Wilkie <tom@grafana.com>
2020-03-21 21:05:38 +01:00
Benjamin Drung
34d50e15d5 Add model_name and stepping to node_cpu_info metric
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different steppings
of a processor might behave differently.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2020-03-20 17:27:11 +01:00
Ben Kochie
ef7c05816a
Release 1.0.0-rc.0 (#1614)
Update CHANGELOG/VERSION for 1.0.0-rc.0 release.
* Add a note about new https settings to top-level README.
* Mark --web.config flag as experimental.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-20 13:42:47 +01:00
Daniel Hodges
ec62141388
Fix num cpu (#1561)
* add a map of profilers to CPUids

`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.

This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.

The CPU id is stored as the value of a map keyed on the profiler
object's address.
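
To illustrate why a CPU count is not a list of CPU ids, the sketch below prints `runtime.NumCPU()` next to an explicitly parsed id list. The list syntax (`0-2,5`) and the helper `parseCPUList` are assumptions made for this example; the commit does not spell out the accepted format.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// parseCPUList expands a string such as "0-2,5" into [0 1 2 5].
// The syntax is an assumption for this sketch.
func parseCPUList(s string) ([]int, error) {
	var ids []int
	for _, part := range strings.Split(s, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid CPU id %q: %w", bounds[0], err)
		}
		last := first
		if len(bounds) == 2 {
			last, err = strconv.Atoi(bounds[1])
			if err != nil {
				return nil, fmt.Errorf("invalid CPU id %q: %w", bounds[1], err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for id := first; id <= last; id++ {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

func main() {
	// NumCPU is a count of usable CPUs, not a set of ids.
	fmt.Println("usable CPUs:", runtime.NumCPU())
	ids, err := parseCPUList("0-2,5")
	fmt.Println(ids, err)
}
```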

Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>

Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
2020-02-20 11:36:33 +01:00
Paul Gier
b40954dce5
new flag to disable all default collectors (#1460)
* new flag to disable all default collectors

Signed-off-by: Paul Gier <pgier@redhat.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
2020-02-20 11:03:33 +01:00
Ben Kochie
3e1b0f1bee
Don't count empty collection as success (#1613)
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.

* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in infiniband debug message.
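
The behavior in the list above could look roughly like this. The sentinel `errNoData` and the function `scrapeResult` are hypothetical names chosen for the sketch; it only shows how a "no data" error can map to a failed scrape while being logged at a quieter level.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoData is a hypothetical sentinel a collector returns when the
// underlying feature is absent and there is nothing to collect.
var errNoData = errors.New("collector returned no data")

// scrapeResult maps a collector error to the success value (1 or 0) and
// the log level the error should be reported at ("" means no log line).
func scrapeResult(err error) (success float64, logLevel string) {
	switch {
	case err == nil:
		return 1, ""
	case errors.Is(err, errNoData):
		// No data is not a success, but it is expected on many hosts,
		// so it is logged at debug level to avoid log spam.
		return 0, "debug"
	default:
		return 0, "error"
	}
}

func main() {
	fmt.Println(scrapeResult(nil))
	fmt.Println(scrapeResult(fmt.Errorf("read sysfs: %w", errNoData)))
	fmt.Println(scrapeResult(errors.New("permission denied")))
}
```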

Closes: https://github.com/prometheus/node_exporter/issues/1323

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 16:11:29 +01:00
Ben Kochie
1a75bc7b50
Fix up Darwin swap metrics
* Add a changelog entry.
* Remove redundant swap free metric.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 15:52:47 +01:00
Silke Hofstra
8faa843fc4
Add Btrfs collector (#1512)
* Add procfs/btrfs to vendor folder
* Add Btrfs collector

Resolves #1100

Signed-off-by: Silke Hofstra <silke@slxh.eu>
2020-02-19 15:48:51 +01:00
Ukri Niemimuukko
eac3e30f7f rapl_linux collector
This exposes RAPL statistics from /sys/class/powercap.

Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-02-01 12:06:30 +01:00
Paul Cameron
9bb37873a8 Add unix socket support for supervisord collector (#1592)
* Add unix socket support for supervisord collector

For example:
  --collector.supervisord.url=unix:///var/run/supervisor.sock
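
A sketch of how such a URL might be split into a dial network and address using `net/url`. The helper `dialTarget` is hypothetical, and the HTTP URL in `main` is only a sample value for comparison.

```go
package main

import (
	"fmt"
	"net/url"
)

// dialTarget returns the network and address to dial for a supervisord URL.
// For unix:// URLs the socket path is the address; otherwise host:port is.
func dialTarget(raw string) (network, address string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme == "unix" {
		return "unix", u.Path, nil
	}
	return "tcp", u.Host, nil
}

func main() {
	fmt.Println(dialTarget("unix:///var/run/supervisor.sock"))
	fmt.Println(dialTarget("http://localhost:9001/RPC2"))
}
```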

Fixes prometheus/node_exporter#262

Signed-off-by: Paul Cameron <cameronpm@gmail.com>
2020-01-28 08:50:23 +01:00
Thomas Lin
3ddc82c2d8 Fixed inaccurate 'node_network_speed_bytes' when speeds are low (#1580)
Integer division and the order of operations when converting Mbps to Bps
results in a loss of accuracy if the interface speeds are set low.
e.g. 100 Mbps is reported as 12000000 Bps, should be 12500000
     10 Mbps is reported as 1000000 Bps, should be 1250000
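
The rounding problem can be reproduced in a few lines. The function names are hypothetical, and this is one way to order the operations, not necessarily the exact fix that was merged.

```go
package main

import "fmt"

// mbpsToBytesTruncating divides first, so the fractional part of mbps/8
// is discarded before scaling up.
func mbpsToBytesTruncating(mbps int64) int64 {
	return mbps / 8 * 1000 * 1000
}

// mbpsToBytes scales to bits per second first and divides last, so
// whole-Mbps inputs convert without loss.
func mbpsToBytes(mbps int64) int64 {
	return mbps * 1000 * 1000 / 8
}

func main() {
	for _, mbps := range []int64{10, 100} {
		fmt.Println(mbps, mbpsToBytesTruncating(mbps), mbpsToBytes(mbps))
	}
}
```

With the values from the message, the truncating version yields 1000000 and 12000000, while multiplying first yields 1250000 and 12500000.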

Signed-off-by: Thomas Lin <t.lin@mail.utoronto.ca>
2020-01-01 13:10:53 +01:00
Peter Nicholson
a80b7d0bc5 Add softnet collector (#1576)
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2019-12-30 01:36:10 +01:00
Ben Kochie
0d9d7e961a
Update CHANGELOG
Add/update entries for recent merged PRs.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-11-25 21:50:00 +01:00
Matt Layher
da6b66371f collector: reimplement sockstat collector with procfs (#1552)
* collector: reimplement sockstat collector with procfs
* collector: handle sockstat IPv4 disabled, debug logging

Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-11-25 13:41:38 -06:00
John Belmonte
15e36e2230 fix typo in cpufreq metric names (#1510)
Signed-off-by: John Belmonte <john@neggie.net>
2019-10-11 02:12:20 +09:00
Paul Gier
9f5225456d fix order of items in CHANGELOG
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:39:43 -05:00
Paul Gier
4d72cb8059 add node_cpu_info metric
Contains information gathered from /proc/cpuinfo

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:38:57 -05:00