VictoriaMetrics/app/vmagent/remotewrite
Hui Wang d523015f27
stream aggregation: perform deduplication for all received data when … (#6711)
…specifying `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval` command-line flag

[The
documentation](https://docs.victoriametrics.com/stream-aggregation/)
contains conflicting descriptions regarding deduplication for
non-matched series when `-remoteWrite.streamAggr.config` and / or
`-streamAggr.config` are set:
1. The statement below says that **all the received data** is deduplicated:
>[vmagent](https://docs.victoriametrics.com/vmagent/) supports
relabeling, deduplication and stream aggregation for all the received
data, scraped or pushed. Then, the collected data will be forwarded to
specified -remoteWrite.url destinations. The data processing order is
the following:
>1. all the received data is relabeled according to the specified
[-remoteWrite.relabelConfig](https://docs.victoriametrics.com/vmagent/#relabeling)
(if it is set)
>2. all the received data is deduplicated according to specified
[-streamAggr.dedupInterval](https://docs.victoriametrics.com/stream-aggregation/#deduplication)
(if it is set to duration bigger than 0)

2. Another statement says that deduplication is performed individually
for the **matching samples**:
>The de-deduplication is performed after applying
[relabeling](https://docs.victoriametrics.com/vmagent/#relabeling) and
before performing the aggregation. If the -remoteWrite.streamAggr.config
and / or -streamAggr.config is set, then the de-duplication is performed
individually per each [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config)
for the matching samples after applying
[input_relabel_configs](https://docs.victoriametrics.com/stream-aggregation/#relabeling).


Consider the following deduplication use cases:
1. To apply deduplication (globally or for a specific remoteWrite
destination) to all the received data, scraped or pushed
--- use `-streamAggr.dedupInterval` or
`-remoteWrite.streamAggr.dedupInterval`.
2. To deduplicate and aggregate only the metrics that match the rule's
`match` filters
--- use `-remoteWrite.streamAggr.config` and specify the
`dedup_interval` option in the [stream aggregation
config](https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config).
3. To deduplicate all the received data while having `streamAggr.config`
for some metrics
--- currently not possible with a single vmagent; it requires setting up
two levels of vmagents.

This PR implements case 3.
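
The data flow for this third use case can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `Sample` type and the `dedup` and `route` helpers are hypothetical names. As a simplifying assumption, samples are bucketed by their own timestamps, whereas vmagent flushes deduplicated data once per interval. All received samples are deduplicated first; only then are the samples matching an aggregation rule separated from the rest.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified stand-in for a received sample.
type Sample struct {
	Series    string  // metric name plus labels
	Timestamp int64   // Unix milliseconds
	Value     float64 // sample value
}

// dedup keeps one sample per series per interval: the one with the biggest
// timestamp (on equal timestamps, the one seen later in the input wins).
// Assumption: intervals are derived from sample timestamps.
func dedup(samples []Sample, intervalMs int64) []Sample {
	type key struct {
		series string
		bucket int64
	}
	last := make(map[key]Sample)
	for _, s := range samples {
		k := key{s.Series, s.Timestamp / intervalMs}
		if prev, ok := last[k]; !ok || s.Timestamp >= prev.Timestamp {
			last[k] = s
		}
	}
	out := make([]Sample, 0, len(last))
	for _, s := range last {
		out = append(out, s)
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Series != out[j].Series {
			return out[i].Series < out[j].Series
		}
		return out[i].Timestamp < out[j].Timestamp
	})
	return out
}

// route splits already deduplicated samples into those matched by an
// aggregation rule and those that are not matched by any rule.
func route(samples []Sample, match func(Sample) bool) (toAggr, rest []Sample) {
	for _, s := range samples {
		if match(s) {
			toAggr = append(toAggr, s)
		} else {
			rest = append(rest, s)
		}
	}
	return toAggr, rest
}

func main() {
	received := []Sample{
		{"requests_total", 1000, 1},
		{"requests_total", 2000, 2}, // same 30s interval as the sample above
		{"cpu_usage", 1000, 0.5},
		{"cpu_usage", 31000, 0.7}, // falls into the next 30s interval
	}
	// Step 1: deduplicate everything, matched or not.
	deduped := dedup(received, 30_000)
	// Step 2: only samples matching the (hypothetical) rule are aggregated.
	toAggr, rest := route(deduped, func(s Sample) bool {
		return strings.HasPrefix(s.Series, "requests_")
	})
	fmt.Println("to aggregation:", toAggr)
	fmt.Println("not matched:", rest)
}
```

The point of the sketch is the ordering: deduplication no longer depends on whether a series matches an aggregation rule, so non-matched series are deduplicated as well.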

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-09-03 10:47:05 +02:00
..

| File | Last commit | Date |
|---|---|---|
| client.go | app/vmagent: add remoteWrite.retryMinInterval and remoteWrite.retryMaxTime flags (#6289) | 2024-08-23 14:05:51 +02:00 |
| pendingseries_test.go | Revert "Exemplar support (#5982)" | 2024-07-03 15:30:21 +02:00 |
| pendingseries_timing_test.go | lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto | 2024-01-14 23:04:45 +02:00 |
| pendingseries.go | Revert "Exemplar support (#5982)" | 2024-07-03 15:30:21 +02:00 |
| relabel_test.go | app/vmagent/remotewrite: fix data race when extra labels are added to samples before sending them to multiple remote storage systems | 2023-09-08 23:24:00 +02:00 |
| relabel.go | all: consistently use 'any' instead of 'interface{}' | 2024-07-10 00:20:37 +02:00 |
| remotewrite_test.go | app/vmagent/remotewrite: follow-up for f153f54d11 | 2024-07-15 20:24:01 +02:00 |
| remotewrite.go | stream aggregation: perform deduplication for all received data when … (#6711) | 2024-09-03 10:47:05 +02:00 |
| streamaggr.go | stream aggregation: perform deduplication for all received data when … (#6711) | 2024-09-03 10:47:05 +02:00 |