1345 Commits

Author SHA1 Message Date
Karsten Weiss
a8d7d1101a cpu: Support processor-less (memory-only) NUMA nodes (#734)
* cpu: Support processor-less (memory-only) NUMA nodes

Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).

IMDT RAM expansion supports two modes:

* "Unify Remote Memory domains": present a processor-less (memory-only)
  NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
  with a portion of the memory made available by Optane and IMDT

This commit fixes a crash in the first case (when "cpulist" is empty).

Here's an example of such a system:

$ numastat -m|head -n5

Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56

$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:

$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
    Boards:      3
       1 x Proc. + I/O + Memory
       2 x NVM devices (Intel SSDPED1K375GAQ)
    Processors:  2, Cores: 16, Threads: 32
        Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
    Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
       1 x 249088MB   [262036/   678/12270]
       1 x 232192MB   [357707/125369/  146]  82:00.0#1
       1 x 232192MB   [357707/125369/  146]  83:00.0#1

* cpu: rename some variables (pkg => node)

* cpu: Use %v not %q in log.Debugf() format strings
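The empty "cpulist" case from the first commit above can be sketched as follows. This is only an illustration of tolerating a memory-only node, not the collector's actual code; the helper name `parseCPUList` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList (hypothetical helper) expands a sysfs cpulist string such as
// "0-7,16-23" into individual CPU numbers. An empty list, as found on a
// processor-less (memory-only) NUMA node, yields an empty result and no error.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid cpulist entry %q: %v", part, err)
		}
		last := first
		if len(bounds) == 2 {
			last, err = strconv.Atoi(bounds[1])
			if err != nil {
				return nil, fmt.Errorf("invalid cpulist entry %q: %v", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("invalid cpulist range %q", part)
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	// The three cpulist values from the example system above.
	for node, list := range []string{"0-7,16-23", "8-15,24-31", ""} {
		cpus, err := parseCPUList(list)
		fmt.Printf("node%d: %d cpus, err=%v\n", node, len(cpus), err)
	}
}
```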
2017-11-10 15:31:26 +01:00
Matt Layher
f6f9c8d6cc Add and use sysReadFile in hwmon collector (#728) 2017-11-07 07:49:37 +01:00
Ben Kochie
4d7aa57da0 Update vendoring (#722)
* Update vendor github.com/beevik/ntp@v0.2.0

* Update vendor github.com/mdlayher/netlink/...

* Update vendor github.com/mdlayher/wifi/...

Adds vendor github.com/mdlayher/genetlink

* Update vendor github.com/prometheus/common/...

* Update vendor github.com/prometheus/procfs/...

* Update vendor golang.org/x/sys/unix

* Update vendor golang.org/x/sys/windows
2017-11-02 12:30:34 +01:00
david
eb3a917bd8 Use host PID namespace in docker example (#672)
* Use host PID namespace in docker example

See https://github.com/prometheus/node_exporter/issues/671

* Update readme for readability

* Fix comments in readme
2017-11-02 12:07:40 +01:00
Nicholas Johns
defe2f373c Remove travis ci (#702)
This PR closes #690
2017-11-02 12:01:28 +01:00
Tobias Klauser
d73f1e60c4 Simplify Utsname string conversion (#716)
* Update golang.org/x/sys/unix

This allows simplified string conversion of Utsname members.

* Simplify Utsname string conversion

Use Utsname from golang.org/x/sys/unix, whose members are byte arrays
instead of int8/uint8 arrays. This simplifies the string conversions of
these members.
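
A minimal sketch of the simplified conversion, assuming a NUL-padded byte array field like those in Utsname. The helper name `utsField` and the sample release string are hypothetical; the example uses a plain array so that it runs without the x/sys dependency.

```go
package main

import (
	"bytes"
	"fmt"
)

// utsField (hypothetical helper) converts a fixed-size, NUL-padded byte
// array field into a Go string by cutting at the first NUL byte.
func utsField(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return string(b)
}

func main() {
	// Stand-in for a Utsname member; the value is made up for illustration.
	var release [65]byte
	copy(release[:], "4.13.0-16-generic")
	fmt.Println(utsField(release[:]))
}
```

With int8 members, each element would first have to be copied into a byte slice, which is the step the byte arrays make unnecessary.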
2017-11-02 11:57:14 +01:00
Ben Kochie
ea250d73f4 Fix off by one in Linux interrupts collector (#721)
* Fix off by one in Linux interrupts collector

* Fix off by one in CPU column handler.
* Add test.

* Enable interrupts in end-to-end test.
2017-11-02 09:59:46 +01:00
Julius Volz
f6556e69ec Merge pull request #718 from prometheus/mdl-netstat-ipv6
netstat: return nothing when /proc/net/snmp6 not found
2017-10-31 21:01:17 +00:00
Matt Layher
296b62acb7 netstat: return nothing when /proc/net/snmp6 not found 2017-10-31 15:26:32 -04:00
Derek Marcotte
0eecaa9547 Correct buffer_bytes > INT_MAX on BSD/amd64. (#712)
* Correct buffer_bytes > INT_MAX on BSD/amd64.

The sysctl vfs.bufspace returns either an int or a long, depending on
the value.  Large values of vfs.bufspace will result in error messages
like:

  couldn't get meminfo: cannot allocate memory

This will detect the returned data type, and cast appropriately.

* Added explicit length checks per feedback.

* Flatten Value() to make it easier to read.

* Simplify per feedback.

* Fix style.

* Doc updates.
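
The type detection described in this commit (an int or a long, depending on the value) can be sketched by looking at the length of the raw sysctl result. This is an assumption-laden illustration, not the collector's code: `decodeSysctlInt` is a hypothetical name, and little-endian byte order is assumed because the commit targets amd64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeSysctlInt (hypothetical helper) interprets a raw sysctl value as a
// 32-bit int or a 64-bit long based on its length. Little-endian order is
// an assumption that holds on amd64.
func decodeSysctlInt(raw []byte) (int64, error) {
	switch len(raw) {
	case 4:
		return int64(int32(binary.LittleEndian.Uint32(raw))), nil
	case 8:
		return int64(binary.LittleEndian.Uint64(raw)), nil
	default:
		return 0, fmt.Errorf("unexpected sysctl value length %d", len(raw))
	}
}

func main() {
	small := []byte{0x2a, 0x00, 0x00, 0x00}
	// 0x80000000 is one more than INT_MAX, so it only fits in 8 bytes.
	large := []byte{0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00}
	for _, raw := range [][]byte{small, large} {
		v, err := decodeSysctlInt(raw)
		fmt.Println(v, err)
	}
}
```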
2017-10-25 20:55:22 +02:00
Matt Layher
715ebd1ced Merge pull request #708 from prometheus/mdl-fix-xfs
xfs: expose correct fields, fix metric names
2017-10-21 01:24:17 -04:00
Matt Layher
f9ad88fc03 xfs: expose correct fields, fix metric names 2017-10-20 18:41:51 -04:00
William
6ecd8780d9 added Wear_Leveling_Count attribute to smartmon.sh script (#707) 2017-10-19 19:20:43 +02:00
Pontus Leitzler
0b6763886a Remove unnecessary select statement (#692)
* Remove unnecessary select statement

* Remove unnecessary if-statement
2017-10-18 07:38:48 +02:00
Ben Kochie
1824ac3b9e Fix smartmon.sh textfile script (#700)
When there are no SMART compatible devices (Raspberry Pi for example) an
error is returned, but the return code is still 0.

`# scan_smart_devices: glob(3) aborted matching pattern /dev/discs/disc*`

* Remove unused `disks` variable.
* Filter for only valid `/dev` devices.
2017-10-18 07:37:47 +02:00
Siavash Safi
f3a7022602 Add collect[] parameter (#699)
* Add `collect[]` parameter

* Add TODO comment about staticcheck ignored

* Restore promhttp.HandlerOpts

* Log a warning and return HTTP error instead of failing

* Check collector existence and status, cleanups

* Fix warnings and error messages

* Don't panic, return error if collector registration failed

* Update README
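
The behaviour listed above (check that each requested collector exists and is enabled, and return an HTTP error instead of failing) might look roughly like this. It is a sketch under stated assumptions: the registry `knownCollectors`, the function names, the error texts, and the choice of status 400 are all hypothetical and not taken from the commit.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// knownCollectors is a hypothetical registry: collector name -> enabled.
var knownCollectors = map[string]bool{"cpu": true, "meminfo": true, "wifi": false}

// selectCollectors validates the values of the collect[] parameter. With no
// filter it returns every enabled collector; otherwise each requested name
// must exist and be enabled.
func selectCollectors(filters []string) ([]string, error) {
	if len(filters) == 0 {
		var all []string
		for name, enabled := range knownCollectors {
			if enabled {
				all = append(all, name)
			}
		}
		sort.Strings(all)
		return all, nil
	}
	for _, name := range filters {
		enabled, ok := knownCollectors[name]
		if !ok {
			return nil, fmt.Errorf("missing collector: %s", name)
		}
		if !enabled {
			return nil, fmt.Errorf("disabled collector: %s", name)
		}
	}
	return filters, nil
}

// handler answers with an HTTP error (400 is an assumption) instead of
// panicking when the filter is invalid.
func handler(w http.ResponseWriter, r *http.Request) {
	selected, err := selectCollectors(r.URL.Query()["collect[]"])
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "would collect: %v\n", selected)
}

func main() {
	req := httptest.NewRequest("GET", "/metrics?collect%5B%5D=cpu&collect%5B%5D=meminfo", nil)
	rec := httptest.NewRecorder()
	handler(rec, req)
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```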
2017-10-14 14:23:42 +02:00
Ben Kochie
8f9edf87b5 Add extra notes to Building section (#694)
* Add link to Golang
* Add note about RHEL/CentOS build dep.
2017-10-11 11:46:13 +02:00
Wei Wei
1e4af21256 add rslave for docker example, so node_exporter can receive host mount/unmount events (#660) 2017-10-11 11:18:30 +02:00
Ben Kochie
6e2053c557 Fix circle docker test tag name. (#688)
The default DOCKER_IMAGE_TAG setup fails when running in circle,
override with the CIRCLE_TAG.
2017-10-06 12:33:03 +02:00
Ben Kochie
f84dd15be7 Release v0.15.0 (#686)
* Release v0.15.0

* Bump version.
* Update CHANGELOG.

* Update to Go 1.9 in circle.yml
2017-10-06 09:43:58 +02:00
Ben Kochie
deadfef4c9 Update vendoring (#685)
* Update vendor github.com/coreos/go-systemd/dbus@v15

* Update vendor github.com/ema/qdisc

* Update vendor github.com/godbus/dbus

* Update vendor github.com/golang/protobuf/proto

* Update vendor github.com/lufia/iostat

* Update vendor github.com/matttproud/golang_protobuf_extensions/pbutil@v1.0.0

* Update vendor github.com/prometheus/client_golang/...

* Update vendor github.com/prometheus/common/...

* Update vendor github.com/prometheus/procfs/...

* Update vendor github.com/sirupsen/logrus@v1.0.3

Adds vendor golang.org/x/crypto

* Update vendor golang.org/x/net/...

* Update vendor golang.org/x/sys/...

* Update end to end output.
2017-10-05 16:20:47 +02:00
Tobias Schmidt
ba96b6561b Merge pull request #682 from derekmarcotte/dm-386-native
Only enable race detector when GOHOSTARCH is amd64.
2017-10-05 09:07:52 +02:00
Ben Kochie
a47f033f1b Add text file helper for apt-get. (#680)
* Add metric for pending upgrades.
* Add metric for pending reboot required.
2017-10-04 08:34:30 +02:00
Brett Vickers
b62c7bc0ad Updated vendored ntp package (#681)
The github.com/beevik/ntp package was recently updated with some
API changes that broke node_exporter. This commit fetches the
latest version of the ntp package and brings node_exporter in
line with the latest API.
2017-10-04 08:33:49 +02:00
Derek Marcotte
a6b8922a01 Only enable race detector when GOHOSTARCH is amd64.
This enables native builds to still run the test and all targets without
problems on, say, 386.

A build failure on Buildkite build 85 prevents enabling native FreeBSD
386 builds.
2017-10-03 16:40:22 -04:00
Calle Pettersson
859a825bb8 Replace --collectors.enabled with per-collector flags (#640)
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Sami Kerola
3762191e66 Add timex collector (#664)
This collector is based on the adjtimex(2) system call.  The three most
important values it returns are the status of time synchronisation, the
offset to the remote reference, and the local clock frequency adjustment.

Values are taken from the kernel's timekeeping data structures, so the
collector does not depend on how synchronisation is implemented.  It does
not matter whether time is updated by ntpd, systemd-timesyncd, ptpd, and so
on.  Every time sync implementation ends up telling the kernel the current
synchronisation status, so one can skip the software in between and look
directly at the results of the syncing.  As a positive side effect this
makes the collector very quick and conceptually specific: it does not
monitor the availability of the NTP server, the network in between, DNS
resolution, or other unrelated but necessary things.

The minimum set of values to keep an eye on is the following three:

    node_timex_sync_status tells whether the local clock is in sync with a
    remote clock.  The value is set to zero when synchronisation to a
    reliable server is lost, or when the time sync software is misconfigured.

    node_timex_offset_seconds tells how far the local clock is off compared
    to the reference.  With multiple time references this value is the
    outcome of the RFC 5905 adjustment algorithm.  Ideally the offset should
    be close to zero; how large a value is acceptable depends on the use
    case.  For example, a typical web server is probably fine if the offset
    is about 0.1 or less, but that would not be good enough for a mobile
    phone base station operator.

    node_timex_freq tells the amount of adjustment to the local clock tick
    frequency.  For example, if the offset is one second and growing, the
    local clock will need an instruction to tick quicker.  The number itself
    is not very important, and occasional small adjustments are fine.  When
    the frequency is unusually unstable, one can assume that time stamps
    will not be accurate very far into the sub-second range.  Explaining why
    the local clock frequency behaves like a passenger on a roller coaster
    is a different matter.  Explanations can vary from system load to
    environmental issues such as a machine being physically too hot.

The rest of the measurements can help when debugging.  If you run a clock
server you probably want to collect and keep track of everything.
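
As a rough illustration of how the three values relate to the raw adjtimex(2) fields, the sketch below converts a sample by hand. The struct and method names are hypothetical, and the unit conversions follow the adjtimex(2) manual page (offset in microseconds, or nanoseconds when STA_NANO is set; frequency in parts per million with a 16-bit fractional part; TIME_ERROR meaning unsynchronised) rather than this commit, so the collector's own scaling may differ.

```go
package main

import "fmt"

const (
	staNano   = 0x2000 // STA_NANO: offset is reported in nanoseconds
	timeError = 5      // TIME_ERROR: clock is not synchronised
)

// timexSample (hypothetical type) holds the raw fields of interest.
type timexSample struct {
	Offset int64 // timex.offset, microseconds or nanoseconds
	Freq   int64 // timex.freq, ppm scaled by 2^16
	Status int32 // timex.status bit mask
	State  int   // return value of adjtimex(2)
}

// offsetSeconds converts the raw offset to seconds, honouring STA_NANO.
func (t timexSample) offsetSeconds() float64 {
	divisor := 1e6
	if t.Status&staNano != 0 {
		divisor = 1e9
	}
	return float64(t.Offset) / divisor
}

// freqPPM converts the scaled frequency adjustment to parts per million.
func (t timexSample) freqPPM() float64 {
	return float64(t.Freq) / 65536
}

// syncStatus is 1 when the kernel considers the clock synchronised.
func (t timexSample) syncStatus() float64 {
	if t.State == timeError {
		return 0
	}
	return 1
}

func main() {
	// Made-up sample values for illustration only.
	s := timexSample{Offset: 500000, Freq: 65536, Status: 0, State: 0}
	fmt.Println(s.syncStatus(), s.offsetSeconds(), s.freqPPM())
}
```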

Pull-request: https://github.com/prometheus/node_exporter/pull/664
2017-09-19 07:54:06 -07:00
Leonid Evdokimov
c169b4b1c5 Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check (#655)
* Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check

1. Checking the local clock against a remote NTP daemon is a bad idea: a
local ntpd acting as a client does it better and avoids excessive load on
the remote NTP server, so the collector is refactored to query the local
NTP server.

2. Checking the local clock against a remote one does not check the local
ntpd itself. The local ntpd may be down or out of sync due to network
issues while the clock is still OK.

3. Checking an NTP server by the sanity of its response is tricky and
depends on the ntpd implementation, which is why a common `node_ntp_sanity`
variable is exported.

* `govendor add golang.org/x/net/ipv4`, it is dependency of github.com/beevik/ntp

* Update github.com/beevik/ntp to include boring SNTP fix

* Use variable name from RFC5905

* ntp: move code to make export of raw metrics more explicit

* Move NTP math to `github.com/beevik/ntp`

* Make `golint` happy

* Add some brief docs explaining `ntp` #655 and `timex` #664 modules

* ntp: drop XXX comment that got its decision

* ntp: add `_seconds` suffix to relevant metrics

* Better `node_ntp_leap` comment

* s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish

* Extract subsystem name to const as suggested by @SuperQ
2017-09-19 10:36:14 +02:00
Karsten Weiss
b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Alexey Palazhchenko
abb58a31e2 Test with Go 1.9.x (#667) 2017-08-31 18:00:55 +02:00
Matt Bostock
89a2f21f45 Always try to return smartmon_device_info metric (#663)
* Always try to return smartmon_device_info metric

Sometimes the 'model family' field is not returned by `smartctl` because
a disk is not in the disk database for the version of smartmontools
installed on the system.

In those cases, the device model and serial number are still returned (at
least as far as I have observed).

Re-work the logic to prefer the 'vendor' field first, and if not
present, always output a `smartmon_device_info` metric even if some
labels have empty values.

On the box I'm testing this on, where previously no metric was returned,
it now returns:

    # HELP smartmon_device_info SMART metric device_info
    # TYPE smartmon_device_info gauge
    smartmon_device_info{disk="/dev/sda",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdb",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdc",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdd",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sde",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdf",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1

* Add trailing newline

Because POSIX:
https://stackoverflow.com/a/729795
2017-08-31 18:00:42 +02:00
Tobias Schmidt
f9a2388c60 Merge pull request #662 from prometheus/bjk/buildkite
Add buildkite status badge.
2017-08-24 12:59:18 +02:00
Ben Kochie
9947f602f3 Add buildkite status badge. 2017-08-24 12:29:34 +02:00
Matthias Rampke
d3e3a9c181 Only cross-test 32bit on Linux (#658)
This doesn't work on at least FreeBSD and Darwin. It does work on Linux,
so only try it there.
2017-08-24 09:13:17 +02:00
Christian Will
2ed98fd5a5 define binary name in promu configuration file (#650) 2017-08-22 17:24:07 +02:00
Tobias Schmidt
505275b48c Merge pull request #652 from prometheus/mr/test-32
Automatically cross-test 32bit based on GOARCH
2017-08-22 00:10:04 +02:00
Tobias Schmidt
ba6897583b Merge pull request #653 from prometheus/mr/fix-629
Use int64 throughout the ZFS collector.
2017-08-21 22:28:37 +02:00
Matthias Rampke
7420046383 Automatically cross-test 32bit based on GOARCH
Try to determine the corresponding 32bit architecture from the current
GOARCH and run the tests under that architecture. This only works on a
GOOS/GOARCH that can execute binaries for the smaller architecture, such
as running linux/386 binaries under linux/amd64.

I tested that this works under linux/amd64 and darwin/amd64, the rest of
the architectures is guesswork.

While we still only run regular tests on Intel/Linux architectures, this
covers general integer overflow issues like #629.
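
The idea of deriving the 32-bit counterpart from the current GOARCH can be sketched as a lookup table. The mapping below is illustrative and incomplete, and the names `arch32` and `crossTestArch` are hypothetical; the commit does not show how the mapping is actually implemented.

```go
package main

import (
	"fmt"
	"runtime"
)

// arch32 is an illustrative, incomplete mapping from a 64-bit GOARCH to the
// corresponding 32-bit GOARCH.
var arch32 = map[string]string{
	"amd64":    "386",
	"arm64":    "arm",
	"mips64":   "mips",
	"mips64le": "mipsle",
}

// crossTestArch (hypothetical helper) reports the 32-bit architecture to
// cross-test under, if one is known for goarch.
func crossTestArch(goarch string) (string, bool) {
	a, ok := arch32[goarch]
	return a, ok
}

func main() {
	if a, ok := crossTestArch(runtime.GOARCH); ok {
		fmt.Printf("would also run: GOARCH=%s go test ./...\n", a)
	} else {
		fmt.Println("no 32-bit counterpart known for", runtime.GOARCH)
	}
}
```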
2017-08-21 17:27:25 +00:00
Matthias Rampke
5aa6819eb1 gofmt node_exporter_test 2017-08-21 16:45:42 +00:00
Matthias Rampke
e1f129c729 Use int64 throughout the ZFS collector.
This avoids issues with integer overflows on 32-bit architectures. The
Prometheus data format is float64, so regardless of the architecture we
should handle large numbers.

Fixes #629.
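
A small sketch of why an explicit 64-bit parse matters: a value such as 8 GiB does not fit into a 32-bit integer, yet converts to float64 without trouble. The helper name `parseStatValue` is hypothetical and not the ZFS collector's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseStatValue (hypothetical helper) parses a counter as a 64-bit integer
// regardless of the platform's native int size, then converts it to the
// float64 used for metric values.
func parseStatValue(s string) (float64, error) {
	v, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(v), nil
}

func main() {
	const eightGiB = "8589934592"

	v, err := parseStatValue(eightGiB)
	fmt.Println(v, err)

	// Parsing the same value with a 32-bit size fails with a range error,
	// which is comparable to what a native int does on a 32-bit platform.
	_, err = strconv.ParseInt(eightGiB, 10, 32)
	fmt.Println(err)
}
```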
2017-08-21 16:40:16 +00:00
Matthias Rampke
8661bbbb42 Merge pull request #651 from TheTincho/fix_integration_test_timing
Fix path and timing issues with integration tests.
2017-08-19 15:12:42 +02:00
Martín Ferrari
2cd49eb020 Fix path and timing issues with integration tests. 2017-08-19 11:37:57 +02:00
Ben Kochie
8839640cd1 Ignore wifi collector permission errors (#646)
Ignore the permission denied error when the wifi collector has no
permission to read metrics.
2017-08-18 10:19:48 +02:00
Ben Kochie
b7cc6fbea7 Add additional field to github issue template. (#645)
* Add additional field to github issue template.

Request the command line flags to the exporter.

* Update version flag for kingpin.
2017-08-17 12:44:26 +02:00
Hemant Kumar
de08e38c5e Add dockerfile for ppc64le (#638)
* Add dockerfile for ppc64le and related changes

* Pass the full file as DOCKERFILE

* Add the dockerfile name to build msg
2017-08-17 11:53:04 +02:00
Joe Handzik
4b011bfe44 Clarify Infiniband collector support (#643)
Tested a DL360 Gen9 box with an Omni-Path adapter in it. The existing InfiniBand collector can provide support for the same metrics on Omni-Path cards as well.

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-08-16 07:32:54 +02:00
Calle Pettersson
dfe07eaae8 Switch to kingpin flags (#639)
* Switch to kingpin flags

* Fix logrus vendoring

* Fix flags in main tests

* Fix vendoring versions
2017-08-12 15:07:24 +02:00
Vojtech Galda
1467d845fb Status information in /proc/drbd (#630)
Deprecated in version 8.4 (but won’t be removed)
2017-08-02 08:04:13 +02:00
Matthias Rampke
6506513be5 Merge pull request #626 from teohhanhui/patch-1
Fix Docker mountpoint prefix docs
2017-07-28 09:32:19 +02:00
Teoh Han Hui
0b1f64bb15 Fix Docker mountpoint prefix docs 2017-07-28 15:06:28 +08:00