Stream aggregation doc improvements based on users feedback (#3934)
docs: stream aggregation doc improvements based on users feedback
BIN
docs/stream-aggregation-check-avg.png
Normal file
After Width: | Height: | Size: 449 KiB |
BIN
docs/stream-aggregation-check-increase.png
Normal file
After Width: | Height: | Size: 490 KiB |
BIN
docs/stream-aggregation-check-max.png
Normal file
After Width: | Height: | Size: 444 KiB |
BIN
docs/stream-aggregation-check-min.png
Normal file
After Width: | Height: | Size: 335 KiB |
BIN
docs/stream-aggregation-check-stdvar.png
Normal file
After Width: | Height: | Size: 355 KiB |
BIN
docs/stream-aggregation-check-sum-samples.png
Normal file
After Width: | Height: | Size: 351 KiB |
BIN
docs/stream-aggregation-check-total.png
Normal file
After Width: | Height: | Size: 436 KiB |
@ -50,7 +50,7 @@ Stream aggregation can be used in the following cases:
|
||||
|
||||
### Statsd alternative
|
||||
|
||||
Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) altnernative in the following cases:
|
||||
Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) alternative in the following cases:
|
||||
|
||||
* [Counting input samples](#counting-input-samples)
|
||||
* [Summing input metrics](#summing-input-metrics)
|
||||
@ -60,8 +60,8 @@ Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) alt
|
||||
### Recording rules alternative
|
||||
|
||||
Sometimes [alerting queries](https://docs.victoriametrics.com/vmalert.html#alerting-rules) may require non-trivial amounts of CPU, RAM,
|
||||
disk IO and network bandwith at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
|
||||
of app instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
|
||||
disk IO and network bandwidth at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
|
||||
of application instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
|
||||
can become slow, since it needs to scan too many unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series)
|
||||
with `http_request_duration_seconds_bucket` name. This alerting query can be sped up by pre-calculating
|
||||
the `sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)` via [recording rule](https://docs.victoriametrics.com/vmalert.html#recording-rules).
|
||||
@ -87,6 +87,8 @@ This query is executed much faster than the original query, because it needs to
|
||||
See [the list of aggregate output](#aggregation-outputs), which can be specified at `output` field.
|
||||
See also [aggregating by labels](#aggregating-by-labels).
|
||||
|
||||
It is recommended to set the `interval` field to a value at least several times higher than your metrics collection interval.
|
||||
|
||||
|
||||
### Reducing the number of stored samples
|
||||
|
||||
@ -131,7 +133,7 @@ See also [aggregating by labels](#aggregating-by-labels).
|
||||
|
||||
### Reducing the number of stored series
|
||||
|
||||
Sometimes apps may generate too many [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
|
||||
Sometimes applications may generate too many [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
|
||||
For example, the `http_requests_total` metric may have a `path` or `user` label with too many unique values.
|
||||
In this case the following stream aggregation can be used for reducing the number of metrics stored in VictoriaMetrics:
|
||||
|
||||
@ -156,7 +158,7 @@ See [the list of aggregate output](#aggregation-outputs), which can be specified
|
||||
|
||||
### Counting input samples
|
||||
|
||||
If the monitored app generates event-based metrics, then it may be useful to count the number of such metrics
|
||||
If the monitored application generates event-based metrics, then it may be useful to count the number of such metrics
|
||||
at stream aggregation level.
|
||||
|
||||
For example, if an advertising server generates `hits{some="labels"} 1` and `clicks{some="labels"} 1` metrics
|
||||
@ -183,7 +185,7 @@ See also [aggregating by labels](#aggregating-by-labels).
|
||||
|
||||
### Summing input metrics
|
||||
|
||||
If the monitored app calulates some events and then sends the calculated number of events to VictoriaMetrics
|
||||
If the monitored application calculates some events and then sends the calculated number of events to VictoriaMetrics
|
||||
at irregular intervals or at too high frequency, then stream aggregation can be used for summing such events
|
||||
and writing the aggregate sums to the storage at regular intervals.
|
||||
|
||||
@ -210,10 +212,10 @@ See also [aggregating by labels](#aggregating-by-labels).
|
||||
|
||||
### Quantiles over input metrics
|
||||
|
||||
If the monitored app generates measurement metrics per each request, then it may be useful to calculate
|
||||
If the monitored application generates measurement metrics per each request, then it may be useful to calculate
|
||||
the pre-defined set of [percentiles](https://en.wikipedia.org/wiki/Percentile) over these measurements.
|
||||
|
||||
For example, if the monitored app generates `request_duration_seconds N` and `response_size_bytes M` metrics
|
||||
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
|
||||
per each incoming request, then the following [stream aggregation config](#stream-aggregation-config)
|
||||
can be used for calculating 50th and 99th percentiles for these metrics every 30 seconds:
|
||||
|
||||
@ -238,10 +240,10 @@ See also [histograms over input metrics](#histograms-over-input-metrics) and [ag
|
||||
|
||||
### Histograms over input metrics
|
||||
|
||||
If the monitored app generates measurement metrics per each request, then it may be useful to calculate
|
||||
If the monitored application generates measurement metrics per each request, then it may be useful to calculate
|
||||
a [histogram](https://docs.victoriametrics.com/keyConcepts.html#histogram) over these metrics.
|
||||
|
||||
For example, if the monitored app generates `request_duration_seconds N` and `response_size_bytes M` metrics
|
||||
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
|
||||
per each incoming request, then the following [stream aggregation config](#stream-aggregation-config)
|
||||
can be used for calculating [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
for these metrics every 60 seconds:
|
||||
@ -313,7 +315,7 @@ Output metric names for stream aggregation are constructed according to the foll
|
||||
- `<output>` is the aggregate used for constructing the output metric. The aggregate name is taken from the `outputs` list
|
||||
at the corresponding [stream aggregation config](#stream-aggregation-config).
|
||||
|
||||
Both input and ouput metric names can be modified if needed via relabeling according to [these docs](#relabeling).
|
||||
Both input and output metric names can be modified if needed via relabeling according to [these docs](#relabeling).
|
||||
|
||||
|
||||
## Relabeling
|
||||
@ -334,35 +336,128 @@ For example, the following config removes the `:1m_sum_samples` suffix added [to
|
||||
|
||||
## Aggregation outputs
|
||||
|
||||
The following aggregation outputs can be put in the `outputs` list at [stream aggregation config](#stream-aggregation-config):
|
||||
|
||||
* `total` generates output [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) by summing the input counters.
|
||||
The `total` handler properly handles input counter resets.
|
||||
The `total` handler returns garbage when something other than [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) is passed to the input.
|
||||
* `increase` returns the increase of input [counters](https://docs.victoriametrics.com/keyConcepts.html#counter).
|
||||
The `increase` handler properly handles the input counter resets.
|
||||
The `increase` handler returns garbage when something other than [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) is passed to the input.
|
||||
* `count_series` counts the number of unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
|
||||
* `count_samples` counts the number of input [samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `sum_samples` sums input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `last` returns the last input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `min` returns the minimum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `max` returns the maximum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `avg` returns the average input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `stddev` returns [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `stdvar` returns [standard variance](https://en.wikipedia.org/wiki/Variance) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `histogram_bucket` returns [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
* `quantiles(phi1, ..., phiN)` returns [percentiles](https://en.wikipedia.org/wiki/Percentile) for the given `phi*`
|
||||
over the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
The `phi` must be in the range `[0..1]`, where `0` means `0th` percentile, while `1` means `100th` percentile.
|
||||
|
||||
The aggregations are calculated during the `interval` specified in the [config](#stream-aggregation-config)
|
||||
and then sent to the storage.
|
||||
|
||||
If `by` or `without` lists are specified in the [config](#stream-aggregation-config),
|
||||
then the [aggregation by labels](#aggregating-by-labels) is performed in addition to aggregation by `interval`.
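
The grouping step can be pictured with a small Python sketch. It is only an illustration of the idea, not the VictoriaMetrics implementation: the function name `output_key` and the dict-based label representation are hypothetical, and the sketch assumes that only one of `by` or `without` is set at a time.

```python
def output_key(labels, by=None, without=None):
    """Return the group key (sorted label pairs) that one input series falls into.

    Hypothetical helper: `by` keeps only the listed labels,
    `without` drops the listed labels, neither keeps all labels.
    """
    if by:
        kept = {k: v for k, v in labels.items() if k in by}
    elif without:
        kept = {k: v for k, v in labels.items() if k not in without}
    else:
        kept = dict(labels)
    return tuple(sorted(kept.items()))


series = [
    {"instance": "a", "path": "/x"},
    {"instance": "a", "path": "/y"},
    {"instance": "b", "path": "/x"},
]

# Series that share a key are aggregated into the same output series.
groups_by = {output_key(s, by=["instance"]) for s in series}
groups_without = {output_key(s, without=["instance"]) for s in series}
```

With `by: ["instance"]` the three input series collapse into two groups (one per instance), while `without: ["instance"]` yields one group per `path` value.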
|
||||
|
||||
Below are aggregation functions that can be put in the `outputs` list at [stream aggregation config](#stream-aggregation-config).
|
||||
|
||||
### total
|
||||
|
||||
`total` generates output [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) by summing the input counters.
|
||||
`total` only makes sense for aggregating [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) type metrics.
|
||||
|
||||
The result of `total` is equal to the result of the `sum(some_counter)` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` with the result of the regular query:
|
||||
|
||||
<img alt="total aggregation" src="stream-aggregation-check-total.png">
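
The way a `total`-style output can survive input counter resets is sketched below in Python. This is a simplified illustration under stated assumptions, not the actual handler: the class name `TotalSketch` is hypothetical, and the sketch assumes that the first sample of each input series only sets a baseline, which the real handler may treat differently.

```python
class TotalSketch:
    """Hypothetical sketch of a reset-aware running total over input counters."""

    def __init__(self):
        self.last = {}    # last seen value per input series
        self.total = 0.0  # running output counter, kept across flushes

    def push(self, series_key, value):
        prev = self.last.get(series_key)
        if prev is not None:
            # A drop in value is treated as a counter reset:
            # the counter restarted from zero, so the whole value is new growth.
            self.total += (value - prev) if value >= prev else value
        # Assumption: the first sample of a series only sets the baseline.
        self.last[series_key] = value

    def flush(self):
        """Return the value written to storage at the end of an interval."""
        return self.total


t = TotalSketch()
for v in [10, 15, 3, 7]:  # the drop from 15 to 3 simulates a counter reset
    t.push("instance-a", v)
```

Because the running total is never decreased, the output stays a valid counter even when individual inputs restart.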
|
||||
|
||||
### increase
|
||||
|
||||
`increase` returns the increase of input [counters](https://docs.victoriametrics.com/keyConcepts.html#counter).
|
||||
`increase` only makes sense for aggregating [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) type metrics.
|
||||
|
||||
The result of `increase` with an aggregation interval of `1m` is equal to the result of the `increase(some_counter[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` with the result of the regular query:
|
||||
|
||||
<img alt="increase aggregation" src="stream-aggregation-check-increase.png">
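
A per-interval increase can be sketched in Python as follows. This is an illustration only: the class name `IncreaseSketch` is hypothetical, and the sketch assumes that the first sample of each series sets a baseline and that the accumulated value is cleared on every flush.

```python
class IncreaseSketch:
    """Hypothetical sketch: reset-aware counter growth, reported per interval."""

    def __init__(self):
        self.last = {}   # last seen value per input series
        self.acc = 0.0   # growth accumulated during the current interval

    def push(self, series_key, value):
        prev = self.last.get(series_key)
        if prev is not None:
            # A drop in value is treated as a counter reset.
            self.acc += (value - prev) if value >= prev else value
        self.last[series_key] = value

    def flush(self):
        """Return the growth for the finished interval and start a new one."""
        out, self.acc = self.acc, 0.0
        return out


inc = IncreaseSketch()
inc.push("instance-a", 10)
inc.push("instance-a", 15)
first_interval = inc.flush()
inc.push("instance-a", 3)   # counter reset
inc.push("instance-a", 7)
second_interval = inc.flush()
```

The only difference from a running total in this sketch is that the accumulator is cleared on flush, so each output sample describes a single interval.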
|
||||
|
||||
### count_series
|
||||
|
||||
`count_series` counts the number of unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
|
||||
|
||||
The result of `count_series` is equal to the result of the `count(some_metric)` query.
|
||||
|
||||
### count_samples
|
||||
|
||||
`count_samples` counts the number of input [samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `count_samples` with an aggregation interval of `1m` is equal to the result of the `count_over_time(some_metric[1m])` query.
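
The difference between `count_series` and `count_samples` can be shown with a short Python sketch. The function name `count_outputs` and the `(series_key, value)` sample representation are hypothetical and used only for illustration.

```python
def count_outputs(samples):
    """samples: iterable of (series_key, value) pairs seen during one interval."""
    seen = set()
    n = 0
    for series_key, _value in samples:
        seen.add(series_key)  # each unique series is counted once
        n += 1                # every sample is counted
    return {"count_series": len(seen), "count_samples": n}


interval_samples = [("hits{a}", 1), ("hits{b}", 1), ("hits{a}", 1)]
counts = count_outputs(interval_samples)
```

Here two unique series delivered three samples, so the two outputs differ.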
|
||||
|
||||
### sum_samples
|
||||
|
||||
`sum_samples` sums input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `sum_samples` with an aggregation interval of `1m` is equal to the result of the `sum_over_time(some_metric[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` with the result of the regular query:
|
||||
|
||||
<img alt="sum_samples aggregation" src="stream-aggregation-check-sum-samples.png">
|
||||
|
||||
### last
|
||||
|
||||
`last` returns the last input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `last` with an aggregation interval of `1m` is equal to the result of the `last_over_time(some_metric[1m])` query.
|
||||
|
||||
This aggregation output doesn't make much sense with `by` lists specified in the [config](#stream-aggregation-config).
|
||||
The result of aggregation by labels in this case is non-deterministic, because it depends on the order in which the time series are processed.
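
The order dependence can be seen in a minimal Python sketch. The function name `last_output` is hypothetical; the sketch assumes that samples from several series land in the same output group and that the most recently processed sample wins.

```python
def last_output(samples):
    """samples: (series_key, value) pairs of one output group, in processing order."""
    out = None
    for _series_key, value in samples:
        out = value  # each new sample overwrites the previous one
    return out


# The same two samples, processed in a different order, give different results.
order_1 = last_output([("instance-a", 1.0), ("instance-b", 2.0)])
order_2 = last_output([("instance-b", 2.0), ("instance-a", 1.0)])
```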
|
||||
|
||||
### min
|
||||
|
||||
`min` returns the minimum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `min` with an aggregation interval of `1m` is equal to the result of the `min_over_time(some_metric[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` with the result of the regular query:
|
||||
|
||||
<img alt="min aggregation" src="stream-aggregation-check-min.png">
|
||||
|
||||
### max
|
||||
|
||||
`max` returns the maximum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `max` with an aggregation interval of `1m` is equal to the result of the `max_over_time(some_metric[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` with the result of the regular query:
|
||||
|
||||
<img alt="max aggregation" src="stream-aggregation-check-max.png">
|
||||
|
||||
### avg
|
||||
|
||||
`avg` returns the average input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `avg` with an aggregation interval of `1m` is equal to the result of the `avg_over_time(some_metric[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` with the result of the regular query:
|
||||
|
||||
<img alt="avg aggregation" src="stream-aggregation-check-avg.png">
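
The simple outputs `sum_samples`, `min`, `max` and `avg` all reduce the sample values seen during one interval to a single number. The Python sketch below shows this for one output group; the function name `simple_outputs` is hypothetical and the sketch ignores timestamps and labels.

```python
def simple_outputs(values):
    """values: non-empty list of sample values seen during one interval."""
    total = sum(values)
    return {
        "sum_samples": total,
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
    }


outputs = simple_outputs([1.0, 2.0, 6.0])
```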
|
||||
|
||||
### stddev
|
||||
|
||||
`stddev` returns [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `stddev` with an aggregation interval of `1m` is equal to the result of the `stddev_over_time(some_metric[1m])` query.
|
||||
|
||||
### stdvar
|
||||
|
||||
`stdvar` returns [standard variance](https://en.wikipedia.org/wiki/Variance) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `stdvar` with an aggregation interval of `1m` is equal to the result of the `stdvar_over_time(some_metric[1m])` query.
|
||||
|
||||
For example, the graph below compares the time series produced by a config with aggregation interval `1m` with the result of the regular query:
|
||||
|
||||
<img alt="stdvar aggregation" src="stream-aggregation-check-stdvar.png">
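
Variance and standard deviation can be computed in a streaming fashion without storing every sample. The Python sketch below uses Welford's online algorithm and assumes population variance (division by the number of samples). The class name `StdvarSketch` is hypothetical, and this is not necessarily how the aggregation is implemented internally.

```python
import math


class StdvarSketch:
    """Hypothetical streaming variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def push(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stdvar(self):
        # Assumption: population variance, i.e. division by count.
        return self.m2 / self.count if self.count else float("nan")

    def stddev(self):
        return math.sqrt(self.stdvar())


s = StdvarSketch()
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    s.push(v)
```

For these eight values the mean is 5, the population variance is 4 and the standard deviation is 2.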
|
||||
|
||||
### histogram_bucket
|
||||
|
||||
`histogram_bucket` returns [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
|
||||
The result of `histogram_bucket` with an aggregation interval of `1m` is equal to the result of the `histogram_over_time(some_metric[1m])` query.
|
||||
|
||||
### quantiles
|
||||
|
||||
`quantiles(phi1, ..., phiN)` returns [percentiles](https://en.wikipedia.org/wiki/Percentile) for the given `phi*`
|
||||
over the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
|
||||
The `phi` must be in the range `[0..1]`, where `0` means `0th` percentile, while `1` means `100th` percentile.
|
||||
|
||||
The result of `quantiles(phi1, ..., phiN)` with an aggregation interval of `1m`
|
||||
is equal to the result of the `quantiles_over_time("quantile", phi1, ..., phiN, some_metric[1m])` query.
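
The meaning of `phi` can be illustrated with a small Python sketch that computes exact quantiles over the samples of one interval using linear interpolation between sorted values. This is an assumption made for clarity: the function name `quantiles` is hypothetical, and the real aggregation may use an approximate estimator that returns slightly different values.

```python
def quantiles(phis, values):
    """Return {phi: quantile} over values, using linear interpolation."""
    if not values:
        raise ValueError("at least one sample is required")
    xs = sorted(values)
    out = {}
    for phi in phis:
        if not 0 <= phi <= 1:
            raise ValueError("phi must be in the range [0..1]")
        pos = phi * (len(xs) - 1)       # fractional index into the sorted values
        lo = int(pos)                   # floor, since pos is non-negative
        hi = min(lo + 1, len(xs) - 1)
        out[phi] = xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    return out


result = quantiles([0, 0.5, 1], [4.0, 1.0, 3.0, 2.0])
```

Here `phi=0` returns the smallest sample, `phi=1` returns the largest one, and `phi=0.5` returns the median.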
|
||||
|
||||
## Aggregating by labels
|
||||
|
||||
|