VictoriaMetrics/app/vmagent
Aliaksandr Valialkin bb00bae353
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature, even though
the source code of the reverted commit is of excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after a Prometheus restart. This doesn't look like a production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have a very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains the histogram_quantile() function. It queries exemplars via a special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases, when histogram_quantile() is calculated over thousands of histogram buckets
  exposed by a large number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is literally filled with thousands of exemplar dots.
  Neither the Prometheus API nor Grafana provides the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labels on the time range with the anomaly on the metrics' graph.
2024-07-03 15:30:21 +02:00
common app/vmagent/common: use plain sync.Pool instead of a mix of sync.Pool with channel-based pool for PushCtx 2024-04-20 21:27:05 +02:00
csvimport app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
datadogsketches app/{vmagent,vminsert}: follow-up after a1d1ccd6f2 2024-02-07 01:28:05 +02:00
datadogv1 app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent 2023-12-21 20:50:55 +02:00
datadogv2 lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc. 2023-12-21 23:05:41 +02:00
deployment Rootless docker images by default (#358) 2020-03-27 21:23:50 +02:00
graphite app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers 2023-12-04 22:46:28 +02:00
influx app/vmagent/influx: replace hybrid channel-based pool + sync.Pool with plain sync.Pool for pushCtx 2024-04-20 21:38:11 +02:00
multiarch deployment: build image for vmagent streamaggr benchmark (#6515) 2024-06-24 16:28:50 +02:00
native app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
newrelic app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
opentelemetry app/{vmagent/insert} fix typo in Firehose 2024-04-02 17:41:21 +02:00
opentsdb app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
opentsdbhttp app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
prometheusimport app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
promremotewrite lib/prompb: change type of Label.Name and Label.Value from []byte to string 2024-01-14 22:33:21 +02:00
remotewrite Revert "Exemplar support (#5982)" 2024-07-03 15:30:21 +02:00
static/css all: follow-up after 8edb390e21 2022-06-07 00:57:09 +03:00
statsd follow-up for c6c5a5a186 (#6265) 2024-05-16 09:25:42 +02:00
vmimport app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
main.go lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338) 2024-06-10 12:09:47 +02:00
Makefile Add build support for loong64 (#6222) 2024-05-09 14:22:03 +02:00
README.md all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:31:37 +02:00
vmagent.png app/vmagent: update docs 2020-02-25 00:09:18 +02:00

See vmagent docs here.

vmagent docs can be edited at docs/vmagent.md.