VictoriaMetrics/app/vmagent
Hui Wang a21aea5dd4
stream aggregation: perform deduplication for all received data when specifying `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval` command-line flag (#6711)

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. The statement below says that **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says that deduplication is performed individually
for the **matching samples**:
>The de-duplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).

Consider the following deduplication use cases:
1. To apply deduplication (globally or for a specific remoteWrite
destination) to all the received data, scraped or pushed
--- using `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule `match`
filters
--- using `-remoteWrite.streamAggr.config` and specifying the
`dedup_interval` option in [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `-streamAggr.config`
set for some metrics
--- not possible with a single vmagent before this change; it requires a two-level vmagent setup

This PR implements case 3.
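
As a rough illustration of cases 2 and 3, the fragment below sketches a stream aggregation config and the flags that would go with it. The file name `aggr.yaml`, the selector `http_requests_total`, and the interval values are assumptions made up for this example; only the option and flag names come from the description above and the stream aggregation docs.

```yaml
# aggr.yaml (hypothetical file name), passed via -streamAggr.config
# Case 2: deduplicate and aggregate only the series matching `match`.
- match: 'http_requests_total'   # hypothetical series selector
  interval: 1m                   # aggregation interval (example value)
  dedup_interval: 30s            # per-rule deduplication for matching samples
  outputs: [total]

# Case 3: to also deduplicate all the received data, including series
# that match no rule in this file, start vmagent with, for example:
#   -streamAggr.config=aggr.yaml -streamAggr.dedupInterval=30s
# or, for a specific remote write destination:
#   -remoteWrite.streamAggr.config=aggr.yaml -remoteWrite.streamAggr.dedupInterval=30s
```

The 30s and 1m durations are placeholders; suitable values depend on the scrape or push interval of the incoming data.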

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d523015f27)
2024-09-03 10:49:38 +02:00
| Name | Last commit | Date |
|---|---|---|
| common | app/vmagent/common: use plain sync.Pool instead of a mix of sync.Pool with channel-based pool for PushCtx | 2024-04-20 21:31:14 +02:00 |
| csvimport | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| datadogsketches | app/{vmagent,vminsert}: follow-up after a1d1ccd6f2 | 2024-02-07 01:31:52 +02:00 |
| datadogv1 | app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent | 2023-12-21 20:50:27 +02:00 |
| datadogv2 | lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc. | 2023-12-21 23:05:52 +02:00 |
| deployment | Rootless docker images by default (#358) | 2020-03-27 21:18:32 +02:00 |
| graphite | app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent | 2023-12-21 20:50:27 +02:00 |
| influx | app/vmagent/influx: replace hybrid channel-based pool + sync.Pool with plain sync.Pool for pushCtx | 2024-04-20 21:38:25 +02:00 |
| multiarch | deployment: build image for vmagent streamaggr benchmark (#6515) | 2024-06-24 16:29:14 +02:00 |
| native | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| newrelic | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| opentelemetry | app/{vmagent/insert} fix typo in Firehose | 2024-04-03 02:51:57 +03:00 |
| opentsdb | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| opentsdbhttp | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| prometheusimport | app: consistently use t.Fatal* instead of t.Error* (except of app/vmalert and app/vmctl - these packages will be processed in a separate commit) | 2024-07-11 16:01:25 +02:00 |
| promremotewrite | lib/prompb: change type of Label.Name and Label.Value from []byte to string | 2024-01-16 20:41:37 +02:00 |
| remotewrite | stream aggregation: perform deduplication for all received data when … (#6711) | 2024-09-03 10:49:38 +02:00 |
| static/css | all: follow-up after 8edb390e21 | 2022-06-07 01:05:53 +03:00 |
| vmimport | app/vmagent: follow-up for 090cb2c9de | 2023-11-25 12:13:39 +02:00 |
| main.go | app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749) | 2024-08-05 09:45:32 +02:00 |
| Makefile | Add build support for loong64 (#6222) | 2024-05-10 14:32:05 +02:00 |
| README.md | all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ | 2024-04-18 01:32:57 +02:00 |
| vmagent.png | app/vmagent: update docs | 2020-02-25 00:09:53 +02:00 |

See vmagent docs [here](https://docs.victoriametrics.com/vmagent/).

vmagent docs can be edited at docs/vmagent.md.