docs: mention information loss when downsampling gauges (#5204)

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-11-23 12:31:07 +01:00 · 2023-10-30 15:29:06 +01:00 · 23369321f1
commit 23369321f1
parent abcb21aa5e
3 changed files with 42 additions and 6 deletions
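
The documentation change in this commit states that downsampling keeps only the sample with the biggest timestamp on each interval, which suits counters but hides gauge changes. The following is a minimal Python sketch of that behavior, not VictoriaMetrics code: the function name `downsample_keep_last`, the sample values, and the alignment of intervals to multiples of `interval` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample_keep_last(samples: List[Sample], interval: int) -> List[Sample]:
    """Keep only the sample with the biggest timestamp in each interval.

    Assumption: intervals are aligned to multiples of `interval`; the
    alignment used by VictoriaMetrics itself may differ.
    """
    last_per_bucket: Dict[int, Sample] = {}
    for ts, value in sorted(samples):
        # Later samples overwrite earlier ones that fall in the same bucket.
        last_per_bucket[ts // interval] = (ts, value)
    return [last_per_bucket[bucket] for bucket in sorted(last_per_bucket)]


# Hypothetical counter scraped every 60s: the value only grows.
counter = [(0, 10.0), (60, 20.0), (120, 30.0), (180, 40.0), (240, 50.0)]
# Hypothetical gauge scraped every 60s: it spikes at t=120 and then recovers.
gauge = [(0, 1.0), (60, 2.0), (120, 95.0), (180, 3.0), (240, 2.0)]

counter_ds = downsample_keep_last(counter, 300)
gauge_ds = downsample_keep_last(gauge, 300)

# The last counter value still reflects the total growth on the interval.
print(counter_ds)  # [(240, 50.0)]
# The gauge spike to 95.0 is gone; only the last value survives.
print(gauge_ds)    # [(240, 2.0)]
# A max aggregation computed before downsampling would have kept the peak.
print(max(value for _, value in gauge))  # 95.0
```

The last line mirrors what a max-based recording rule would preserve: the aggregated value is stored as its own series, so the peak survives even after the raw gauge samples are downsampled.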
--- a/README.md
+++ b/README.md
@@ -1897,8 +1897,20 @@ See how to request a free trial license [here](https://victoriametrics.com/produ

 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.

-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series. 
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series and leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) on each interval, in the same way as [deduplication](#deduplication).
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+as their values are always increasing. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge), however,
+loses the changes within the downsampling interval. You can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions, such as min, max or avg, in order to make gauges more resilient to downsampling.
+
+Downsampling can reduce disk space usage and improve query performance if it is applied to time series with a big number
+of samples per series. Downsampling doesn't improve query performance if the database contains a big number
+of time series with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series, so the majority of query time is still spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).

 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage) 
 and can't be performed if there is not enough of free disk space or if vmstorage 
--- a/docs/README.md
+++ b/docs/README.md
@@ -1900,8 +1900,20 @@ See how to request a free trial license [here](https://victoriametrics.com/produ

 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.

-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series. 
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series and leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) on each interval, in the same way as [deduplication](#deduplication).
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+as their values are always increasing. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge), however,
+loses the changes within the downsampling interval. You can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions, such as min, max or avg, in order to make gauges more resilient to downsampling.
+
+Downsampling can reduce disk space usage and improve query performance if it is applied to time series with a big number
+of samples per series. Downsampling doesn't improve query performance if the database contains a big number
+of time series with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series, so the majority of query time is still spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).

 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage) 
 and can't be performed if there is not enough of free disk space or if vmstorage 
--- a/docs/Single-server-VictoriaMetrics.md
+++ b/docs/Single-server-VictoriaMetrics.md
@@ -1908,8 +1908,20 @@ See how to request a free trial license [here](https://victoriametrics.com/produ

 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.

-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series. 
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series and leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) on each interval, in the same way as [deduplication](#deduplication).
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+as their values are always increasing. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge), however,
+loses the changes within the downsampling interval. You can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions, such as min, max or avg, in order to make gauges more resilient to downsampling.
+
+Downsampling can reduce disk space usage and improve query performance if it is applied to time series with a big number
+of samples per series. Downsampling doesn't improve query performance if the database contains a big number
+of time series with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series, so the majority of query time is still spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).

 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage) 
 and can't be performed if there is not enough of free disk space or if vmstorage