* remove injection hook for textfile metrics, convert them to prometheus format
* add support for summaries
* add support for histograms
* add logic for handling inconsistent labels within a metric family for counter, gauge, untyped
* change logic for parsing the metrics textfile
* fix logic to adding missing labels
* Export time and error metrics for textfiles
* Add tests for new textfile collector, fix found bugs
* refactor Update() to split into smaller functions
* remove parseTextFiles(), fix import issue
* add mtime metric directly to channel, fix handling of mtime during testing
* rename variables related to labels
* refactor: add default case, remove if guard for metrics, remove extra loop and slice
* refactor: remove extra loop iterating over metric families
* test: add test case for different metric type, fix found bug
* test: add test for metrics with inconsistent labels
* test: add test for histogram
* test: add test for histogram with extra dimension
* test: add test for summary
* test: add test for summary with extra dimension
* remove unnecessary creation of protobuf
* nit: remove extra blank line
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics. Separate these into their own metric to
avoid duplication of data.
* cpu: Support processor-less (memory-only) NUMA nodes
Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).
IMDT RAM expansion supports two modes:
* "Unify Remote Memory domains": present a processor-less (memory-only)
NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
with a portion of the memory made available by Optane and IMDT
This commit fixes a crash in the first case (when "cpulist" is empty).
Here's an example of such a system:
$ numastat -m|head -n5
Per-node system memory usage (in MBs):
Node 0 Node 1 Node 2 Total
--------------- --------------- --------------- ---------------
MemTotal 118239.56 130816.00 464384.00 713439.56
$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:
$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
Boards: 3
1 x Proc. + I/O + Memory
2 x NVM devices (Intel SSDPED1K375GAQ)
Processors: 2, Cores: 16, Threads: 32
Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
1 x 249088MB [262036/ 678/12270]
1 x 232192MB [357707/125369/ 146] 82:00.0#1
1 x 232192MB [357707/125369/ 146] 83:00.0#1
* cpu: rename some variables (pkg => node)
* cpu: Use %v not %q in log.Debugf() format strings
* cpu: Metric 'package_throttles_total' is per package.
'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).
* cpu: Better handling of a cpulist edge-case.
* cpu: Extract the package number from the directory name.
Do not rely on the range index.
* cpu: Add package_throttle_count for node0 cpu1
This file must be ignored by the cpu collector.
* Add bcache collector for Linux
This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.
* Removed commented out code
* Use project comment style
* Add _sectors to metric name to indicate unit
* Really use project comment style
* Rename bcache.go to bcache_linux.go
* Keep collector namespace clean
Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric
* Shorten slice initialization
* Change label names to backing_device, cache_device
* Remove five minute metrics (keep only total)
* Include units in additional metric names
* Enable bcache collector by default
* Provide metrics in seconds, not nanoseconds
* remove metrics with label "all"
* Add fixtures, update end-to-end for bcache collector
* Move fixtures/sys into tar.gz
This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.
The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not the turn up in regular
file names).
* Add ttar: plain text archive, replacement for tar
This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).
Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.
The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.
The programm also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
* Add qdisc collector for Linux
This collector gathers basic queueing discipline metrics via netlink,
similarly to what `tc -s qdisc show` does.
* qdisc collector: nl-specific code moved, names fixed
- netlink-specific parts moved to github.com/ema/qdisc
- avoid using shortened names
- counters renamed into XXX_total
* Get rid of parseMessage error checking leftover
* Add github.com/ema/qdisc to vendored packages
* Update help texts and comments
* Add qdisc collector to README file
* qdisc collector end-to-end testing
* Update qdisc dependency to latest version
Update github.com/ema/qdisc dependency to revision 2c7e72d, which
includes unit testing.
* qdisc collector: rename "iface" label into "device"
According to Mellanox, it is standard practice that the port_xmit_data and port_rcv_data
files are split into 4 lanes. To get the actual transmit and receive values for each
port, the metric needs to be multiplied by 4.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
* Implement commonalities and linux support for ARP collection
* Add ARP collector to fixtures and run as part of e2e tests
* Bubble up scanner errors
* Use single return values where it makes sense
* Add missing annotation
* Move arp_common into arp_linux
* Add license header to arp_linux.go
* Address initial feedback
* Use strings.Fields instead of strings.Split
* Deal with scanner.Err() rather than throwing away errors
* Check for scan errors in-line before interacting with the entries map
* Don't interact with potentially empty text from scan
* Check for scan errors outside the scan loop
* Add comment about moving procfs parsing
* Add more direct comment
* Update initialism style to match go style guide
* Put function args on the same line
* Add TODO in front of comment about procfs extraction
* Guard against strings.Fields returning an empty slice
* Be more defensive about ARP table format and use upcase more broadly
* Enable the ARP collector by default
* Add ARP collector to the README
* Remove 'entry'
Older versions of the OFED drivers contain 64-bit variants of the port counters and are located in a directory named 'counters_ext'. This patch includes these older metrics that have since been deprecated with OFED 4.0.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Add new metrics for the InfiniBand network protocol including the amount of packets sent and received, the number of times the link has been downed and how many times the link has recovered from an error state.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>