VictoriaMetrics/app/vmagent
Hui Wang d523015f27
stream aggregation: perform deduplication for all received data when … (#6711)
…specifying `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval` command-line flag

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. The statement below says **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says deduplication is performed individually
for the **matching samples** only:
>The de-duplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).


Consider the following deduplication use cases:
1. To apply deduplication (globally or for a specific remoteWrite
destination) to all the received data, scraped or pushed
--- use `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate metrics that match the rule's `match`
filters
--- use `-remoteWrite.streamAggr.config` and specify the
`dedup_interval` option in the [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `-streamAggr.config`
for some metrics
--- currently not possible with a single vmagent; it requires setting up two levels of vmagents.

This PR implements case 3.
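
The intended order for case 3 can be illustrated with a minimal, self-contained Go sketch. It is not the vmagent implementation: the `Sample` type, the `dedupAll` and `process` helpers, the prefix-based matching, and the fixed window alignment are all hypothetical simplifications. The sketch assumes deduplication keeps the sample with the biggest timestamp per series in each interval, and that matched series are aggregated (summed here) while non-matched series pass through after deduplication.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified data point; it is not a vmagent type.
type Sample struct {
	Series    string
	Timestamp int64 // unix milliseconds
	Value     float64
}

// dedupAll keeps one sample per series per interval window: the one with
// the biggest timestamp (assumed behavior, simplified window alignment).
func dedupAll(samples []Sample, intervalMs int64) []Sample {
	type key struct {
		series string
		window int64
	}
	last := make(map[key]Sample)
	for _, s := range samples {
		k := key{s.Series, s.Timestamp / intervalMs}
		if prev, ok := last[k]; !ok || s.Timestamp >= prev.Timestamp {
			last[k] = s
		}
	}
	out := make([]Sample, 0, len(last))
	for _, s := range last {
		out = append(out, s)
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Series != out[j].Series {
			return out[i].Series < out[j].Series
		}
		return out[i].Timestamp < out[j].Timestamp
	})
	return out
}

// process deduplicates ALL samples first, then sums the samples whose series
// name starts with matchPrefix. Non-matching samples are returned as is.
func process(samples []Sample, intervalMs int64, matchPrefix string) (passed []Sample, sum float64) {
	for _, s := range dedupAll(samples, intervalMs) {
		if strings.HasPrefix(s.Series, matchPrefix) {
			sum += s.Value
			continue
		}
		passed = append(passed, s)
	}
	return passed, sum
}

func main() {
	in := []Sample{
		{"http_requests:a", 1000, 1},
		{"http_requests:a", 6000, 2}, // same 10s window, newer: kept
		{"http_requests:b", 2000, 5},
		{"cpu_usage", 3000, 0.5},
		{"cpu_usage", 9000, 0.7}, // non-matched series is deduplicated too
	}
	passed, sum := process(in, 10000, "http_requests")
	fmt.Println(len(passed), passed[0].Value, sum) // prints "1 0.7 7"
}
```

In this sketch the non-matched `cpu_usage` series is still reduced to a single sample per window, which is the difference between case 3 and per-config deduplication for matching samples only.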

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-09-03 10:47:05 +02:00
..
common app/vmagent/common: use plain sync.Pool instead of a mix of sync.Pool with channel-based pool for PushCtx 2024-04-20 21:27:05 +02:00
csvimport app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
datadogsketches app/{vmagent,vminsert}: follow-up after a1d1ccd6f2 2024-02-07 01:28:05 +02:00
datadogv1 app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent 2023-12-21 20:50:55 +02:00
datadogv2 lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc. 2023-12-21 23:05:41 +02:00
deployment Rootless docker images by default (#358) 2020-03-27 21:23:50 +02:00
graphite app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers 2023-12-04 22:46:28 +02:00
influx app/vmagent/influx: replace hybrid channel-based pool + sync.Pool with plain sync.Pool for pushCtx 2024-04-20 21:38:11 +02:00
multiarch deployment: build image for vmagent streamaggr benchmark (#6515) 2024-06-24 16:28:50 +02:00
native app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
newrelic app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
opentelemetry app/{vmagent/insert} fix typo in Firehose 2024-04-02 17:41:21 +02:00
opentsdb app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
opentsdbhttp app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
prometheusimport app: consistently use t.Fatal* instead of t.Error* (except of app/vmalert and app/vmctl - these packages will be processed in a separate commit) 2024-07-11 15:59:08 +02:00
promremotewrite lib/prompb: change type of Label.Name and Label.Value from []byte to string 2024-01-14 22:33:21 +02:00
remotewrite stream aggregation: perform deduplication for all received data when … (#6711) 2024-09-03 10:47:05 +02:00
static/css all: follow-up after 8edb390e21 2022-06-07 00:57:09 +03:00
vmimport app/vmagent: follow-up for 090cb2c9de 2023-11-25 12:09:44 +02:00
main.go app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749) 2024-08-05 09:34:54 +02:00
Makefile Add build support for loong64 (#6222) 2024-05-09 14:22:03 +02:00
README.md all: replace the outdated url https://docs.victoriametrics.com/vmagent.html with the new one - https://docs.victoriametrics.com/vmagent/ 2024-04-18 01:31:37 +02:00
vmagent.png app/vmagent: update docs 2020-02-25 00:09:18 +02:00

See vmagent docs here.

vmagent docs can be edited at docs/vmagent.md.