mirror of https://github.com/prometheus/node_exporter.git synced 2024-11-23 12:30:46 +01:00

Sami Kerola 3762191e66 Add timex collector (#664)

This collector is based on the adjtimex(2) system call.  It returns three
values: whether time is synchronised, the offset to the remote reference,
and the local clock frequency adjustment.

Values are taken from the kernel timekeeping data structures to avoid
getting involved in how the synchronisation is implemented.  By that I mean
one should not care whether time is updated using ntpd, systemd-timesyncd,
ptpd, and so on.  Since every time sync implementation ends up telling the
kernel what the time status is, one can simply omit the software in between
and look at the results of the syncing.  As a positive side effect this
makes the collector very quick and conceptually specific: it does not
monitor the availability of an NTP server, the network in between, DNS
resolution, or other unrelated but necessary things.

The minimum set of values to keep an eye on is the following three:

    node_timex_sync_status tells whether the local clock is in sync with a
    remote clock.  The value is set to zero when synchronisation to a
    reliable server is lost, or when the time sync software is misconfigured.

    node_timex_offset_seconds tells how far the local clock is off compared
    to the reference.  With multiple time references this value is the
    outcome of the RFC 5905 adjustment algorithm.  Ideally the offset should
    be close to zero; how large a value is acceptable depends on the use
    case.  For example, a typical web server is probably fine if the offset
    is about 0.1 seconds or less, but that would not be good enough for a
    mobile phone base station operator.

    node_timex_freq tells the amount of adjustment applied to the local
    clock tick frequency.  For example, if the offset is one second and
    growing, the local clock needs to be instructed to tick quicker.  The
    number itself is not very important, and occasional small adjustments
    are fine.  When the frequency is unusually unstable one can assume that
    time stamps will not be accurate very far into the sub-second range.
    Obviously, explaining why the local clock frequency behaves like a
    passenger on a roller coaster is a different matter.  Explanations can
    vary from system load to environmental issues such as a machine being
    physically too hot.

The rest of the measurements can help when debugging.  If you run a clock
server you probably want to collect and keep track of everything.
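
As a minimal sketch of how the sync status and offset described above might
be checked, the shell function below reads Prometheus text-format metrics on
stdin.  The function name check_timex, the exit codes, and the 0.1 second
default threshold are assumptions made for illustration; they are not part of
the collector.

```shell
# check_timex [MAX_OFFSET]: read text-format metrics on stdin and report
# whether the clock looks healthy. MAX_OFFSET (seconds) defaults to 0.1,
# an assumed threshold taken from the web server example above.
# Exit codes (illustrative): 0 ok, 1 offset too large, 2 not in sync,
# 3 timex metrics missing.
check_timex() {
  max_offset="${1:-0.1}"
  awk -v max="$max_offset" '
    $1 == "node_timex_sync_status"    { sync = $2; have_sync = 1 }
    $1 == "node_timex_offset_seconds" { off = $2 + 0; have_off = 1 }
    END {
      if (!have_sync || !have_off) { print "UNKNOWN: timex metrics missing"; exit 3 }
      if (off < 0) off = -off          # compare the absolute offset
      if (sync + 0 != 1) { print "CRITICAL: clock not synchronised"; exit 2 }
      if (off > max + 0) { print "WARNING: offset " off "s exceeds " max "s"; exit 1 }
      print "OK: offset " off "s"
    }'
}
```

It could be fed with something like
`curl -s http://localhost:9100/metrics | check_timex 0.1`, assuming the
exporter listens on port 9100 and serves its metrics under /metrics.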

Pull-request: https://github.com/prometheus/node_exporter/pull/664

2017-09-19 07:54:06 -07:00

Node exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The WMI exporter is recommended for Windows users.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Which collectors are used is controlled by the --collectors.enabled flag.
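
As a rough sketch, the helper below builds the flag from a list of collector names. It assumes the flag takes a comma-separated list of names from the tables below, which is how this revision is understood to behave; the helper name `collectors_flag` is hypothetical.

```shell
# collectors_flag NAME...: print a --collectors.enabled argument for the
# given collector names. Assumption: the flag value is a comma-separated
# list; check ./node_exporter -h for the build you are running.
collectors_flag() {
  # Join all arguments with commas inside a subshell so that the caller's
  # IFS is left untouched.
  (IFS=,; printf '%s%s\n' '--collectors.enabled=' "$*")
}
```

For example, `./node_exporter "$(collectors_flag cpu meminfo timex)"` would start the exporter with only those three collectors, if the assumption above holds.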

Enabled by default

Name	Description	OS
arp	Exposes ARP statistics from `/proc/net/arp`.	Linux
bcache	Exposes bcache statistics from `/sys/fs/bcache/`.	Linux
conntrack	Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present).	Linux
cpu	Exposes CPU statistics.	Darwin, Dragonfly, FreeBSD, Linux
diskstats	Exposes disk I/O statistics.	Darwin, Linux
edac	Exposes error detection and correction statistics.	Linux
entropy	Exposes available entropy.	Linux
exec	Exposes execution statistics.	Dragonfly, FreeBSD
filefd	Exposes file descriptor statistics from `/proc/sys/fs/file-nr`.	Linux
filesystem	Exposes filesystem statistics, such as disk space used.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon	Exposes hardware monitoring and sensor data from `/sys/class/hwmon/`.	Linux
infiniband	Exposes network statistics specific to InfiniBand and Intel OmniPath configurations.	Linux
ipvs	Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`.	Linux
loadavg	Exposes load average.	Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm	Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present).	Linux
meminfo	Exposes memory statistics.	Darwin, Dragonfly, FreeBSD, Linux
netdev	Exposes network interface statistics such as bytes transferred.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat	Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`.	Linux
sockstat	Exposes various statistics from `/proc/net/sockstat`.	Linux
stat	Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts.	Linux
textfile	Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set.	any
time	Exposes the current system time.	any
timex	Exposes selected adjtimex(2) system call stats.	Linux
uname	Exposes system information as provided by the uname system call.	Linux
vmstat	Exposes statistics from `/proc/vmstat`.	Linux
wifi	Exposes WiFi device and station statistics.	Linux
xfs	Exposes XFS runtime statistics.	Linux (kernel 4.4+)
zfs	Exposes ZFS performance statistics.	Linux

Disabled by default

Name	Description	OS
bonding	Exposes the number of configured and active slaves of Linux bonding interfaces.	Linux
buddyinfo	Exposes statistics of memory fragments as reported by `/proc/buddyinfo`.	Linux
devstat	Exposes device statistics.	Dragonfly, FreeBSD
drbd	Exposes Distributed Replicated Block Device statistics (to version 8.4).	Linux
interrupts	Exposes detailed interrupts statistics.	Linux, OpenBSD
ksmd	Exposes kernel and system statistics from `/sys/kernel/mm/ksm`.	Linux
logind	Exposes session counts from logind.	Linux
meminfo_numa	Exposes memory statistics from `/proc/meminfo_numa`.	Linux
mountstats	Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics.	Linux
nfs	Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`.	Linux
ntp	Exposes local NTP daemon health to check time.	any
qdisc	Exposes queuing discipline statistics.	Linux
runit	Exposes service status from runit.	any
supervisord	Exposes service status from supervisord.	any
systemd	Exposes service and system status from systemd.	Linux
tcpstat	Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.)	Linux

Deprecated

These collectors will be (re)moved in the future.

Name	Description	OS
gmond	Exposes statistics from Ganglia.	any
megacli	Exposes RAID statistics from MegaCLI.	Linux

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the Node exporter. The collector will parse all files in that directory matching the glob *.prom using the text format.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
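
Both examples write to a temporary name first and then rename it. The temporary name does not match the `*.prom` glob, and a rename within one filesystem is atomic, so the collector should never parse a half-written file. A minimal sketch of a reusable helper for this pattern follows; the name `write_prom` and its arguments are illustrative and not part of the Node exporter.

```shell
# write_prom DIR NAME: atomically publish metrics read from stdin as
# DIR/NAME.prom. Hypothetical helper, shown only to illustrate the
# write-then-rename pattern used above.
write_prom() {
  dir="$1"
  name="$2"
  tmp="$dir/$name.prom.$$"   # the $$ suffix keeps it outside the *.prom glob
  # Only rename when the write succeeded, so that a failed write never
  # replaces the previous file.
  cat > "$tmp" && mv "$tmp" "$dir/$name.prom"
}
```

For example, `echo "my_batch_job_completion_time $(date +%s)" | write_prom /path/to/directory my_batch_job` is equivalent to the first example above.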

Building and running

go get github.com/prometheus/node_exporter
cd ${GOPATH-$HOME/go}/src/github.com/prometheus/node_exporter
make
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

Using Docker

The node_exporter is designed to monitor the host system. Deploying it as a Docker container is not recommended, because it requires access to the host system. If you need to run it on Docker, you can deploy this exporter using the node-exporter Docker image with the following options and bind-mounts:

docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  --net="host" \
  quay.io/prometheus/node-exporter \
    --collector.procfs /host/proc \
    --collector.sysfs /host/sys \
    --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"

Be aware, though, that the mountpoint label in various metrics will now have /rootfs as a prefix.

Using a third-party repository for RHEL/CentOS/Fedora

There is a community-supplied COPR repository. It closely follows upstream releases.
