* deleted_libraries: Upgrade to Python 3
Python 2.7 will not be maintained past 2020. Therefore upgrade
text_collector_examples/deleted_libraries.py to Python 3.
* Add mellanox_hca_temp text collector example
mellanox_hca_temp is a script that reads the Mellanox HCA temperature using
the Mellanox mget_temp_ext tool.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
In some cases the hwmon sysfs file may be named "temp" instead of using the usual "temp<index>_<item>" format
described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat it as an _input file containing the current temperature reading.
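A minimal Go sketch of the naming rule above. `parseHwmonFilename` is a hypothetical helper, not the collector's actual function: names of the form `<type><index>_<item>` are split with a regular expression, and a bare `temp` is mapped to the item `input`.

```go
package main

import (
	"fmt"
	"regexp"
)

// hwmonFileRE matches the usual "<type><index>_<item>" layout described in
// the kernel's hwmon sysfs-interface document, e.g. "temp1_input".
var hwmonFileRE = regexp.MustCompile(`^([a-z]+)(\d+)_(\w+)$`)

// parseHwmonFilename is a hypothetical helper that splits a hwmon file name
// into sensor type, sensor name and item. A bare "temp" file is treated as an
// "_input" file holding the current temperature reading.
func parseHwmonFilename(name string) (sensorType, sensor, item string, ok bool) {
	if name == "temp" {
		return "temp", "temp", "input", true
	}
	m := hwmonFileRE.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[1] + m[2], m[3], true
}

func main() {
	for _, name := range []string{"temp1_input", "temp2_crit", "temp", "name"} {
		sensorType, sensor, item, ok := parseHwmonFilename(name)
		fmt.Printf("%-12s type=%q sensor=%q item=%q ok=%v\n", name, sensorType, sensor, item, ok)
	}
}
```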
Fixes #1122
Signed-off-by: Paul Gier <pgier@redhat.com>
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields. See: https://www.kernel.org/doc/Documentation/iostats.txt
* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics
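A hedged Go sketch of parsing a `/proc/diskstats` line whose field count depends on the kernel version. It assumes the layout in iostats.txt: major, minor and device name, followed by 11 I/O counters, with any newer fields (such as the discard counters) appended at the end. The `diskStats` type and `parseDiskstatsLine` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diskStats holds the counters parsed from one /proc/diskstats line.
type diskStats struct {
	Device   string
	Counters []uint64 // the 11 classic I/O counters
	Extra    []uint64 // fields appended by newer kernels; nil if absent
}

// parseDiskstatsLine accepts both the older and the longer line format.
func parseDiskstatsLine(line string) (diskStats, error) {
	fields := strings.Fields(line)
	// major, minor, device name, then at least 11 counters.
	if len(fields) < 14 {
		return diskStats{}, fmt.Errorf("unexpected number of fields in %q", line)
	}
	values := make([]uint64, 0, len(fields)-3)
	for _, f := range fields[3:] {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return diskStats{}, fmt.Errorf("parsing %q: %w", f, err)
		}
		values = append(values, v)
	}
	s := diskStats{Device: fields[2], Counters: values[:11]}
	if len(values) > 11 {
		s.Extra = values[11:]
	}
	return s, nil
}

func main() {
	for _, line := range []string{
		"   8       0 sda 1 2 3 4 5 6 7 8 9 10 11",
		"   8       0 sda 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15",
	} {
		s, err := parseDiskstatsLine(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d counters, %d extra\n", s.Device, len(s.Counters), len(s.Extra))
	}
}
```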
Signed-off-by: Paul Gier <pgier@redhat.com>
* State that the wifi collector is disabled by default
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add the 'processes' collector to the Readme
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
LaunchDaemons are the correct way to create restart-proof services.
The readme now mentions only a single destination for the plist file.
Signed-off-by: Dávid Balakirev <dave00ster@gmail.com>
This is mostly required to fix a bug with histograms on 32-bit platforms
(which may or may not be used in node_exporter; this is a precaution).
Signed-off-by: beorn7 <beorn@soundcloud.com>
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available
This is related to #966 and handles the following error:
Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
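A minimal Go sketch of tolerating the `N/A (no PMA)` value from the log above. `parseCounter` is a hypothetical helper: it reports such values as unavailable (`ok == false`) instead of returning a parse error, so one unreadable port does not fail the whole collector, while genuinely malformed values still produce an error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter is a hypothetical helper for one InfiniBand sysfs counter.
// Values starting with "N/A" (e.g. "N/A (no PMA)") are reported as
// unavailable rather than as an error.
func parseCounter(raw string) (value uint64, ok bool, err error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, "N/A") {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

func main() {
	for _, raw := range []string{"1234\n", "N/A (no PMA)\n", "bogus"} {
		v, ok, err := parseCounter(raw)
		fmt.Printf("%q -> value=%d ok=%v err=%v\n", raw, v, ok, err)
	}
}
```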
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
* Strip the rootfs prefix when running in Docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share host's PID
namespace, which allows processes within the container to see all of the
processes on the system.
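A sketch of the prefix stripping in the first bullet, assuming the host's `/` is mounted at `/host` inside the container and `path.rootfs` points there. `stripRootfsPrefix` is a hypothetical helper; it strips only whole path components, so a mount point such as `/hostile` is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRootfsPrefix is a hypothetical helper that maps a mount point carrying
// the rootfs prefix (e.g. "/host/boot") back to the host path ("/boot").
func stripRootfsPrefix(rootfs, mountPoint string) string {
	rootfs = strings.TrimSuffix(rootfs, "/")
	if rootfs == "" {
		return mountPoint // rootfs is "/", so there is nothing to strip
	}
	if mountPoint == rootfs {
		return "/"
	}
	// Only strip at a path-component boundary.
	if strings.HasPrefix(mountPoint, rootfs+"/") {
		return strings.TrimPrefix(mountPoint, rootfs)
	}
	return mountPoint
}

func main() {
	for _, mp := range []string{"/host", "/host/boot", "/hostile", "/proc"} {
		fmt.Printf("%s -> %s\n", mp, stripRootfsPrefix("/host", mp))
	}
}
```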
Closes: #66
Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.
SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.
For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
When starting Docker containers, a large number of netns (network
namespace) mounts are created that the node exporter can neither make
sense of nor read.
This ignores all nsfs filesystems.
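A minimal Go sketch of dropping `nsfs` entries while parsing `/proc/mounts`-style data. The `mount` type, `ignoredFSTypes` set and `parseMounts` function are hypothetical; the real collector may configure ignored filesystem types differently.

```go
package main

import (
	"fmt"
	"strings"
)

// mount is one parsed line of /proc/mounts.
type mount struct {
	Device, MountPoint, FSType string
}

// ignoredFSTypes is a hypothetical set of filesystem types to skip.
var ignoredFSTypes = map[string]bool{"nsfs": true}

// parseMounts keeps device, mount point and filesystem type for every line
// whose filesystem type is not ignored.
func parseMounts(data string) []mount {
	var out []mount
	for _, line := range strings.Split(data, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 || ignoredFSTypes[f[2]] {
			continue
		}
		out = append(out, mount{Device: f[0], MountPoint: f[1], FSType: f[2]})
	}
	return out
}

func main() {
	data := "/dev/sda1 / ext4 rw,relatime 0 0\n" +
		"nsfs /run/docker/netns/abc123 nsfs rw 0 0\n"
	fmt.Println(parseMounts(data))
}
```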
Fixes #875
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
* Update build
* Only use CGO when building for non-Linux platforms.
* Update build to Go 1.11
* Use tab indenting consistently.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Change systemd unit filtering
Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.
Signed-off-by: Ben Kochie <superq@gmail.com>
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h
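The commit does not name the two constants. This sketch assumes they are `TIME_ERROR` (5) and `STA_NANO` (0x2000) from timex.h, and shows them declared as plain Go constants and used to scale the timex offset without cgo. `offsetSeconds` and `synced` are hypothetical helpers.

```go
package main

import "fmt"

// Values copied from include/uapi/linux/timex.h (assumed to be the two
// constants the commit refers to).
const (
	timeError = 5      // TIME_ERROR: clock not synchronized
	staNano   = 0x2000 // STA_NANO: offset is in nanoseconds, not microseconds
)

// offsetSeconds converts the timex offset to seconds, honouring STA_NANO in
// the status field.
func offsetSeconds(offset int64, status int32) float64 {
	if status&staNano != 0 {
		return float64(offset) / 1e9
	}
	return float64(offset) / 1e6
}

// synced reports whether the adjtimex clock state is anything but TIME_ERROR.
func synced(state int) bool {
	return state != timeError
}

func main() {
	fmt.Println(offsetSeconds(1500, 0), offsetSeconds(2500000000, staNano), synced(timeError))
}
```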
Signed-off-by: Luca Bruno <luca.bruno@coreos.com>
* textfile smartmon.sh
Added functions to also parse megaraid disks.
Added parsing to detect the grown_defects counters.
* textfile storcli.py
Reworked the example file to export lots more information about
megaraid attached controllers, VDs and PDs.
Signed-off-by: Christopher Blum <christopher.blum@profitbricks.com>
* Correctly cast Darwin memory info
* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.
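A small Go illustration of the integer wrapping in the first bullet, assuming the page count and page size are held in 32-bit unsigned integers (the values are made up). Multiplying in `uint32` silently wraps, while converting to `float64` first gives the expected byte count.

```go
package main

import "fmt"

// wrappedBytes multiplies in uint32 arithmetic, which wraps on overflow.
func wrappedBytes(pages, pageSize uint32) uint32 {
	return pages * pageSize
}

// floatBytes casts to float64 before doing the math, avoiding the wrap.
func floatBytes(pages, pageSize uint32) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	var pages, pageSize uint32 = 2000000, 4096 // hypothetical values
	fmt.Println(wrappedBytes(pages, pageSize), floatBytes(pages, pageSize))
}
```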
Signed-off-by: Ben Kochie <superq@gmail.com>
Fix typo in the unit description of metric `*read_time_seconds_total` from milliseconds to seconds.
Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
Add metrics that expose more information about MD RAID devices and
disks:
- the RAID level in use
- the RAID set that a disk belongs to
This allows things like alerting on unusually high I/O utilisation for a
disk compared to other disks in the same RAID set (which usually means
the disk is failing) and comparing write/read latency across RAID sets.
Output looks like:
node_md_disk_info{disk_device="/dev/dm-0", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-3", md_device="md1", md_set="B"} 1
node_md_disk_info{disk_device="/dev/dm-2", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-1", md_device="md1", md_set="B"} 1
node_md_disk_info{disk_device="/dev/dm-4", md_device="md1", md_set="A"} 1
node_md_disk_info{disk_device="/dev/dm-5", md_device="md1", md_set="B"} 1
node_md_info{md_device="md1", md_name="foo", raid_level="10", md_metadata_version="1.2"} 1
The `node_md_info` metric, which gives additional information about the
RAID array, is intentionally separate to avoid adding all of those
labels to each disk. If you need to query using the labels contained in
`node_md_info`, you can do that using PromQL:
https://www.robustperception.io/how-to-have-labels-for-machine-roles/
I looked at adding the array UUID, but there's no sysfs entry for it and
I'm not sure there's a strong use case for it.
This patch to add a sysfs entry for the UUID was apparently not
accepted:
https://www.spinics.net/lists/raid/msg40667.html
Add these metrics as a textfile script rather than adding them to the Go
'md' module as they're perhaps less commonly useful. If lots of people
find them useful, we can later rewrite this in Go.
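The textfile script itself is not shown here. As a hedged illustration of what a later Go rewrite might emit, this sketch formats the two metric families from the sample output; `mdDisk` and `formatMDMetrics` are hypothetical. Label values are escaped for backslash, double quote and newline, as the Prometheus text format requires.

```go
package main

import (
	"fmt"
	"strings"
)

// mdDisk is one member disk of an MD array.
type mdDisk struct {
	Device string // e.g. "/dev/dm-0"
	Set    string // RAID set the disk belongs to, e.g. "A"
}

// labelEscaper escapes label values for the Prometheus text format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatMDMetrics renders one node_md_disk_info line per disk and a single
// node_md_info line, keeping array-level labels off the per-disk series.
func formatMDMetrics(mdDevice, mdName, raidLevel, metadata string, disks []mdDisk) string {
	e := labelEscaper.Replace
	var b strings.Builder
	for _, d := range disks {
		fmt.Fprintf(&b, "node_md_disk_info{disk_device=\"%s\", md_device=\"%s\", md_set=\"%s\"} 1\n",
			e(d.Device), e(mdDevice), e(d.Set))
	}
	fmt.Fprintf(&b, "node_md_info{md_device=\"%s\", md_name=\"%s\", raid_level=\"%s\", md_metadata_version=\"%s\"} 1\n",
		e(mdDevice), e(mdName), e(raidLevel), e(metadata))
	return b.String()
}

func main() {
	disks := []mdDisk{{"/dev/dm-0", "A"}, {"/dev/dm-3", "B"}}
	fmt.Print(formatMDMetrics("md1", "foo", "10", "1.2", disks))
}
```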
Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available
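A minimal Go sketch of the two bullets, assuming unit properties arrive as a map where missing keys mean "not available". `unitSamples` is hypothetical and the metric names are illustrative: the unit's own sample is always emitted, while NRestarts and NRefused samples are added only when present.

```go
package main

import "fmt"

// sample is one metric value to be exported.
type sample struct {
	Name  string
	Value float64
}

// unitSamples always reports the unit and adds optional counters only when
// systemd provides them.
func unitSamples(props map[string]uint64) []sample {
	s := []sample{{"node_systemd_unit_state", 1}} // illustrative name
	if v, ok := props["NRestarts"]; ok {
		s = append(s, sample{"node_systemd_service_restart_total", float64(v)})
	}
	if v, ok := props["NRefused"]; ok {
		s = append(s, sample{"node_systemd_socket_refused_connections_total", float64(v)})
	}
	return s
}

func main() {
	fmt.Println(unitSamples(map[string]uint64{}))               // older systemd
	fmt.Println(unitSamples(map[string]uint64{"NRestarts": 3})) // newer systemd
}
```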
Signed-off-by: James Hartig <james@getadmiral.com>
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.
* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Clean up some error style issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric
* Use Prometheus best practices for uptime metric.
* Use "start time" rather than "uptime".
* Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.
Signed-off-by: Ben Kochie <superq@gmail.com>