From 23369321f15d2c4720f92b9a6affb924337871c0 Mon Sep 17 00:00:00 2001
From: Roman Khavronenko
Date: Mon, 30 Oct 2023 15:29:06 +0100
Subject: [PATCH] docs: mention information loss when downsampling gauges
 (#5204)

Signed-off-by: hagen1778
---
 README.md                             | 33 +++++++++++++++++++++++++++++++--
 docs/README.md                        | 33 +++++++++++++++++++++++++++++++--
 docs/Single-server-VictoriaMetrics.md | 16 ++++++++++++++--
 3 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index d013eea18..c9bc53700 100644
--- a/README.md
+++ b/README.md
@@ -1897,8 +1897,37 @@ See how to request a free trial license [here](https://victoriametrics.com/produ
 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.
 
-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series.
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series. It leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) within each downsampling interval, in the same way as [deduplication](#deduplication) does.
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+since their values always increase, so the last sample on the interval preserves the overall trend. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge),
+however, loses all the value changes within the downsampling interval. Note that you can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions such as min, max or avg in order to make gauges more resilient to downsampling, as shown in the example below.
+
+Downsampling can reduce disk space usage and improve query performance when applied to time series with a big number
+of samples per series. It doesn't improve query performance when the database contains a big number of time series
+with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series; the majority of query time is then spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
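+
+For example, a recording rule along the following lines could pre-aggregate a gauge before it gets downsampled
+(the metric name and the 5m window below are illustrative):
+
+```yaml
+groups:
+  - name: gauge-pre-aggregation
+    interval: 5m
+    rules:
+      # illustrative: keep the min, max and avg of a gauge over each 5m window
+      - record: process_resident_memory_bytes:5m_min
+        expr: min_over_time(process_resident_memory_bytes[5m])
+      - record: process_resident_memory_bytes:5m_max
+        expr: max_over_time(process_resident_memory_bytes[5m])
+      - record: process_resident_memory_bytes:5m_avg
+        expr: avg_over_time(process_resident_memory_bytes[5m])
+```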
 
 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage)
 and can't be performed if there is not enough of free disk space or if vmstorage
diff --git a/docs/README.md b/docs/README.md
index 28603c46b..e9f940bbb 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1900,8 +1900,37 @@ See how to request a free trial license [here](https://victoriametrics.com/produ
 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.
 
-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series.
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series. It leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) within each downsampling interval, in the same way as [deduplication](#deduplication) does.
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+since their values always increase, so the last sample on the interval preserves the overall trend. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge),
+however, loses all the value changes within the downsampling interval. Note that you can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions such as min, max or avg in order to make gauges more resilient to downsampling, as shown in the example below.
+
+Downsampling can reduce disk space usage and improve query performance when applied to time series with a big number
+of samples per series. It doesn't improve query performance when the database contains a big number of time series
+with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series; the majority of query time is then spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
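+
+For example, a recording rule along the following lines could pre-aggregate a gauge before it gets downsampled
+(the metric name and the 5m window below are illustrative):
+
+```yaml
+groups:
+  - name: gauge-pre-aggregation
+    interval: 5m
+    rules:
+      # illustrative: keep the min, max and avg of a gauge over each 5m window
+      - record: process_resident_memory_bytes:5m_min
+        expr: min_over_time(process_resident_memory_bytes[5m])
+      - record: process_resident_memory_bytes:5m_max
+        expr: max_over_time(process_resident_memory_bytes[5m])
+      - record: process_resident_memory_bytes:5m_avg
+        expr: avg_over_time(process_resident_memory_bytes[5m])
+```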
 
 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage)
 and can't be performed if there is not enough of free disk space or if vmstorage
diff --git a/docs/Single-server-VictoriaMetrics.md b/docs/Single-server-VictoriaMetrics.md
index 4941830c3..71a88345f 100644
--- a/docs/Single-server-VictoriaMetrics.md
+++ b/docs/Single-server-VictoriaMetrics.md
@@ -1908,8 +1908,20 @@ See how to request a free trial license [here](https://victoriametrics.com/produ
 * `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.
 
-Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series.
-It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
+Downsampling is applied independently to each time series. It leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) within each downsampling interval, in the same way as [deduplication](#deduplication) does.
+It works best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
+since their values always increase, so the last sample on the interval preserves the overall trend. Downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge),
+however, loses all the value changes within the downsampling interval. Note that you can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+to apply custom aggregation functions such as min, max or avg in order to make gauges more resilient to downsampling.
+
+Downsampling can reduce disk space usage and improve query performance when applied to time series with a big number
+of samples per series. It doesn't improve query performance when the database contains a big number of time series
+with a small number of samples per series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
+since downsampling doesn't reduce the number of time series; the majority of query time is then spent on searching for the matching time series.
+It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or
+recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
+[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
 
 Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage)
 and can't be performed if there is not enough of free disk space or if vmstorage
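
For the stream aggregation approach mentioned in the added text, a minimal sketch of a
[stream aggregation config](https://docs.victoriametrics.com/stream-aggregation.html) might look like this,
assuming vmagent is started with `-remoteWrite.streamAggr.config` pointing at the file (the matched metric
and the `by` labels are illustrative):

```yaml
# illustrative: replace a hypothetical high-cardinality gauge with per-job
# min/max/avg series before the raw samples reach storage
- match: 'process_resident_memory_bytes'
  interval: 5m
  by: [job]
  outputs: [min, max, avg]
```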