mirror of https://github.com/prometheus/node_exporter.git synced 2024-11-23 12:30:46 +01:00

Exporter for machine metrics

host-metrics machine-metrics metrics node-metrics procfs prometheus prometheus-exporter system-information system-metrics

Matt Bostock 516e5d4beb Add metric for outdated libraries (#957 ) Add metrics that count how many running processes are linking to deleted libraries on each machine. Deleted libraries are usually outdated libraries, and outdated libraries may have known security vulnerabilities. The rationale behind storing these as metrics is to allow the rollout of security fixes to be tracked across a fleet of machines, ensuring that all affected processes are restarted (e.g. via a reboot). I'm parsing the output from `/proc/*/maps` because using `lsof -d DEL` can be too slow, particularly if you have sockets that bind to thousands of IP addresses. The metric labels include the library path and the base filename, which allows us to pinpoint the exact path of the deleted library but also allows us to aggregate on the library name (or approximations of it) even if library locations differ between operating system versions. The metrics output and the CPU time consumed are as follows: user@host:~$ time sudo python processes.py # HELP node_processes_linking_deleted_libraries Count of running processes that link a deleted library # TYPE node_processes_linking_deleted_libraries gauge node_processes_linking_deleted_libraries{library_path="locale-archive", library_name="/usr/lib/locale"} 3 node_processes_linking_deleted_libraries{library_path="libevent-2.0.so.5.1.9", library_name="/usr/lib/x86_64-linux-gnu"} 4 real 0m0.071s user 0m0.030s sys 0m0.041s Including the library filename and path will result in reasonably high metrics cardinality, however I think the benefits when an urgent security patch is being deployed outweigh concerns around cardinality. This script assumes that library files do not contain spaces in their path. Signed-off-by: Matt Bostock <mbostock@cloudflare.com>		2018-05-25 18:20:42 +02:00
.circleci	Use Go 1.9 for build.	2018-04-24 16:22:35 +02:00
.github	Add additional field to github issue template. (#645 )	2017-08-17 12:44:26 +02:00
collector	A couple of ARM64-related fixes (#934 )	2018-05-14 15:55:49 +02:00
docs	docs: Add example recording rule for node_memory_MemAvailable	2018-05-16 17:01:51 -05:00
examples	Add example launchctl-file for MacOS (#856 )	2018-03-22 15:31:53 +01:00
text_collector_examples	Add metric for outdated libraries (#957 )	2018-05-25 18:20:42 +02:00
vendor	Update github.com/prometheus/common dependencies	2018-03-31 22:20:45 +02:00
.dockerignore	New release process using docker, circleci and a centralized	2016-04-28 22:07:21 +02:00
.gitignore	Ignore extracted sysfs fixture files from git	2017-07-20 14:36:48 -04:00
.promu.yml	define binary name in promu configuration file (#650 )	2017-08-22 17:24:07 +02:00
CHANGELOG.md	Release 0.16.0	2018-05-15 16:16:05 +02:00
checkmetrics.sh	Makefile: add checkmetrics target, use in CI (#797 )	2018-02-13 18:04:03 +01:00
CONTRIBUTING.md	Document DCO in CONTRIBUTING.md	2018-04-16 12:51:12 +02:00
Dockerfile	Run node-exporter in Docker as nobody (#599 )	2017-06-08 20:02:20 +02:00
Dockerfile.ppc64le	Add dockerfile for ppc64le (#638 )	2017-08-17 11:53:04 +02:00
end-to-end-test.sh	A couple of ARM64-related fixes (#934 )	2018-05-14 15:55:49 +02:00
example-rules.yml	Fix cpu utilization rule.	2018-05-17 18:15:07 +02:00
LICENSE	License cleanup	2015-01-22 17:11:26 +01:00
MAINTAINERS.md	Replace AUTHORS.md by an updated MAINTAINERS.md	2017-02-19 18:27:34 +01:00
Makefile	Add Makefile.common (#940 )	2018-05-24 23:31:48 +02:00
Makefile.common	Add Makefile.common (#940 )	2018-05-24 23:31:48 +02:00
node_exporter_test.go	Remove unnecessary select statement (#692 )	2017-10-18 07:38:48 +02:00
node_exporter.go	Sort collector names in startup logs (#857 )	2018-03-29 13:42:44 +01:00
NOTICE	Vendor github.com/mdlayher/wifi and dependencies	2017-01-10 11:29:00 -05:00
README.md	Merge pull request #852 from prometheus/remove-gmond	2018-04-27 10:02:16 +02:00
test_image.sh	Resolves prometheus/node_exporter#585 (#586 )	2017-07-07 07:26:11 +02:00
ttar	Vendor ttar from github.com/ideaship/ttar	2018-03-10 15:19:44 +01:00
VERSION	Release 0.16.0	2018-05-15 16:16:05 +02:00

README.md

Node exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The WMI exporter is recommended for Windows users.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag.
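
As a rough sketch of how these flags combine, the helper below turns a list of collector names into the matching flags. The `collector_flags` function and its `+name` / `-name` argument convention are hypothetical and not part of node_exporter; only the `--collector.<name>` and `--no-collector.<name>` flag forms come from the text above.

```shell
# Hypothetical helper (not part of node_exporter): print one flag per argument.
#   +name  ->  --collector.name      (enable a collector)
#   -name  ->  --no-collector.name   (disable a default-enabled collector)
collector_flags() {
  for spec in "$@"; do
    case $spec in
      +*) printf '%s%s\n' '--collector.' "${spec#+}" ;;
      -*) printf '%s%s\n' '--no-collector.' "${spec#-}" ;;
      *)  echo "collector_flags: expected +name or -name, got: $spec" >&2
          return 1 ;;
    esac
  done
}

# Example use (assumes a built ./node_exporter in the current directory):
#   ./node_exporter $(collector_flags +systemd -wifi)
```

Here `systemd` is taken from the "Disabled by default" table and `wifi` from the "Enabled by default" table, so the example enables the former and disables the latter.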

Enabled by default

Name	Description	OS
arp	Exposes ARP statistics from `/proc/net/arp`.	Linux
bcache	Exposes bcache statistics from `/sys/fs/bcache/`.	Linux
bonding	Exposes the number of configured and active slaves of Linux bonding interfaces.	Linux
boottime	Exposes system boot time derived from the `kern.boottime` sysctl.	Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD
conntrack	Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present).	Linux
cpu	Exposes CPU statistics.	Darwin, Dragonfly, FreeBSD, Linux
diskstats	Exposes disk I/O statistics.	Darwin, Linux
edac	Exposes error detection and correction statistics.	Linux
entropy	Exposes available entropy.	Linux
exec	Exposes execution statistics.	Dragonfly, FreeBSD
filefd	Exposes file descriptor statistics from `/proc/sys/fs/file-nr`.	Linux
filesystem	Exposes filesystem statistics, such as disk space used.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon	Exposes hardware monitoring and sensor data from `/sys/class/hwmon/`.	Linux
infiniband	Exposes network statistics specific to InfiniBand and Intel OmniPath configurations.	Linux
ipvs	Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`.	Linux
loadavg	Exposes load average.	Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm	Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present).	Linux
meminfo	Exposes memory statistics.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netdev	Exposes network interface statistics such as bytes transferred.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat	Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`.	Linux
nfs	Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`.	Linux
nfsd	Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`.	Linux
sockstat	Exposes various statistics from `/proc/net/sockstat`.	Linux
stat	Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts.	Linux
textfile	Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set.	any
time	Exposes the current system time.	any
timex	Exposes selected adjtimex(2) system call stats.	Linux
uname	Exposes system information as provided by the uname system call.	Linux
vmstat	Exposes statistics from `/proc/vmstat`.	Linux
wifi	Exposes WiFi device and station statistics.	Linux
xfs	Exposes XFS runtime statistics.	Linux (kernel 4.4+)
zfs	Exposes ZFS performance statistics.	Linux

Disabled by default

Name	Description	OS
buddyinfo	Exposes statistics of memory fragments as reported by `/proc/buddyinfo`.	Linux
devstat	Exposes device statistics.	Dragonfly, FreeBSD
drbd	Exposes Distributed Replicated Block Device statistics (to version 8.4).	Linux
interrupts	Exposes detailed interrupts statistics.	Linux, OpenBSD
ksmd	Exposes kernel and system statistics from `/sys/kernel/mm/ksm`.	Linux
logind	Exposes session counts from logind.	Linux
meminfo_numa	Exposes memory statistics from `/proc/meminfo_numa`.	Linux
mountstats	Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics.	Linux
ntp	Exposes local NTP daemon health to check time.	any
qdisc	Exposes queuing discipline statistics.	Linux
runit	Exposes service status from runit.	any
supervisord	Exposes service status from supervisord.	any
systemd	Exposes service and system status from systemd.	Linux
tcpstat	Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.)	Linux

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the Node exporter. The collector will parse all files in that directory matching the glob *.prom using the text format.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
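
The two examples above share one pattern: write to a temporary name that does not match `*.prom`, then rename it into place. A minimal sketch of that pattern as a reusable function follows. The `write_prom_atomic` name is hypothetical, and the sketch assumes a `mktemp` that accepts a template path and that the temporary file lives in the same directory as the target, so that `mv` is a rename rather than a copy.

```shell
# Hypothetical helper: read metric lines from stdin and publish them
# atomically as <dir>/<name>.prom.
write_prom_atomic() {
  dir=$1
  name=$2
  # The temporary name ends in a random suffix, so it never matches *.prom
  # and the collector will not read a half-written file.
  tmp=$(mktemp "$dir/$name.prom.XXXXXX") || return 1
  if cat > "$tmp"; then
    # mktemp creates the file readable only by its owner; relax that in case
    # node_exporter runs as a different user (an assumption about the setup).
    chmod 644 "$tmp" && mv "$tmp" "$dir/$name.prom"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example use, mirroring the cron job example above:
#   echo "my_batch_job_completion_time $(date +%s)" |
#     write_prom_atomic /path/to/directory my_batch_job
```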

Filtering enabled collectors

The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use, the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In the Prometheus configuration, use the following syntax under the scrape config:

  params:
    collect[]:
      - foo
      - bar

This can be useful for having different Prometheus servers collect specific metrics from nodes.
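
To try the same filter by hand, the `collect[]` parameter can be repeated in the query string of a request to the exporter. The sketch below only builds such a URL; the `collect_url` helper is hypothetical, and the host and port in the usage comment (`localhost:9100`) and the `/metrics` path are assumptions about a local setup.

```shell
# Hypothetical helper: print a metrics URL that asks for the named collectors only.
# Usage: collect_url <base-url> [collector ...]
collect_url() {
  base=$1
  shift
  query=""
  for name in "$@"; do
    # collect[] is repeated once per collector; the brackets are
    # percent-encoded as %5B and %5D.
    query="${query:+$query&}collect%5B%5D=$name"
  done
  printf '%s/metrics%s\n' "$base" "${query:+?$query}"
}

# Example use (assumes an exporter listening on localhost:9100):
#   curl -s "$(collect_url http://localhost:9100 cpu meminfo)"
```

With no collector names, the function prints the plain metrics URL, which corresponds to the default of exposing all enabled collectors.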

Building and running

Prerequisites:

Go compiler
RHEL/CentOS: glibc-static package.

Building:

go get github.com/prometheus/node_exporter
cd ${GOPATH-$HOME/go}/src/github.com/prometheus/node_exporter
make
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

Using Docker

The node_exporter is designed to monitor the host system. Deploying it as a Docker container is not recommended because it requires access to the host system. Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container.

docker run -d \
  --net="host" \
  --pid="host" \
  quay.io/prometheus/node-exporter

Using a third-party repository for RHEL/CentOS/Fedora

There is a community-supplied COPR repository. It closely follows upstream releases.