Previously all the incoming samples were de-duplicated, even if their series doesn't
match aggregation rule filters. This could result in increased CPU usage.
Now the de-duplication isn't applied to samples for series, which do not match
aggregation rule filters. Such samples are just ignored.
This reverts commit 9e99f2f5b3.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4068
Reason for revert: this breaks valid use cases:
- If timestamps aren't specified in the incoming samples on purpose. For example, if stream aggregation is used
as StatsD replacement. StatsD protocol has no timestamp concept for incoming samples.
See https://github.com/b/statsd_spec
- If all the samples must be aggregated, even if they contain stale timestamps.
for example, if the stream aggregation produces some counter of some events,
it may be better to count all the events even if they were delayed before
being ingested into VictoriaMetrics.
Is is also unclear how to determine whether the sample becomes stale.
For example, if the aggregation interval equals to 1h, and the previous
aggregation cycle just finished 10 minutes ago, what to do with the newly
incoming sample with the timestamp 30 minutes older than the current time?
The answer highly depends on the context, so it is unsafe to uncoditionally
use a single logic for dropping the old samples here.
* lib/streamaggr: discard samples with timestamps not matching aggregation interval
Samples with timestamps lower than `now - aggregation_interval` are likely to be written via backfilling and should not be used for calculation of aggregation.
See #4068
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/streamaggr: make log message more descriptive, fix imports
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
This simplifies the code a bit. The fine-grained aggregator reload may be returned back
if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639