Stream aggregation doc improvements based on users feedback (#3934)

docs: stream aggregation doc improvements based on users feedback
Alexander Marshalov 2023-03-10 21:39:58 +01:00 committed by GitHub
parent 3eebe52a06
commit c0c3dc02cf
8 changed files with 129 additions and 34 deletions

7 binary files not shown (images added by this commit; sizes range from 335 KiB to 490 KiB).


@ -50,7 +50,7 @@ Stream aggregation can be used in the following cases:
### Statsd alternative
Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) altnernative in the following cases:
Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) alternative in the following cases:
* [Counting input samples](#counting-input-samples)
* [Summing input metrics](#summing-input-metrics)
@ -60,8 +60,8 @@ Stream aggregation can be used as [statsd](https://github.com/statsd/statsd) alt
### Recording rules alternative
Sometimes [alerting queries](https://docs.victoriametrics.com/vmalert.html#alerting-rules) may require non-trivial amounts of CPU, RAM,
disk IO and network bandwith at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
of app instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
disk IO and network bandwidth at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
of application instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
can become slow, since it needs to scan too many unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series)
with `http_request_duration_seconds_bucket` name. This alerting query can be sped up by pre-calculating
the `sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)` via [recording rule](https://docs.victoriametrics.com/vmalert.html#recording-rules).
@ -87,6 +87,8 @@ This query is executed much faster than the original query, because it needs to
See [the list of aggregate output](#aggregation-outputs), which can be specified at `output` field.
See also [aggregating by labels](#aggregating-by-labels).
It is recommended to set the `interval` field to a value at least several times higher than the metrics collection interval.
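As a hedged illustration of the `interval` recommendation: assuming metrics are collected every `30s` (an assumed value, not taken from this doc), an aggregation `interval` of `5m` is ten times the collection interval. The `match` selector and the `total` output below are illustrative choices, not a prescribed setup:

```yaml
# Sketch only; assumes a 30s collection interval for the matched metric.
- match: 'http_request_duration_seconds_bucket'
  interval: 5m          # several times higher than the assumed 30s collection interval
  without: [instance]   # aggregate away the per-instance label
  outputs: [total]
```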
### Reducing the number of stored samples
@ -131,7 +133,7 @@ See also [aggregating by labels](#aggregating-by-labels).
### Reducing the number of stored series
Sometimes apps may generate too many [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
Sometimes applications may generate too many [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
For example, the `http_requests_total` metric may have `path` or `user` label with too many unique values.
In this case the following stream aggregation can be used for reducing the number of metrics stored in VictoriaMetrics:
@ -156,7 +158,7 @@ See [the list of aggregate output](#aggregation-outputs), which can be specified
### Counting input samples
If the monitored app generates event-based metrics, then it may be useful to count the number of such metrics
If the monitored application generates event-based metrics, then it may be useful to count the number of such metrics
at stream aggregation level.
For example, if an advertising server generates `hits{some="labels"} 1` and `clicks{some="labels"} 1` metrics
@ -183,7 +185,7 @@ See also [aggregating by labels](#aggregating-by-labels).
### Summing input metrics
If the monitored app calulates some events and then sends the calculated number of events to VictoriaMetrics
If the monitored application calculates some events and then sends the calculated number of events to VictoriaMetrics
at irregular intervals or at too high frequency, then stream aggregation can be used for summing such events
and writing the aggregate sums to the storage at regular intervals.
@ -210,10 +212,10 @@ See also [aggregating by labels](#aggregating-by-labels).
### Quantiles over input metrics
If the monitored app generates measurement metrics per each request, then it may be useful to calculate
If the monitored application generates measurement metrics per each request, then it may be useful to calculate
the pre-defined set of [percentiles](https://en.wikipedia.org/wiki/Percentile) over these measurements.
For example, if the monitored app generates `request_duration_seconds N` and `response_size_bytes M` metrics
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
per each incoming request, then the following [stream aggregation config](#stream-aggregation-config)
can be used for calculating 50th and 99th percentiles for these metrics every 30 seconds:
@ -238,10 +240,10 @@ See also [histograms over input metrics](#histograms-over-input-metrics) and [ag
### Histograms over input metrics
If the monitored app generates measurement metrics per each request, then it may be useful to calculate
If the monitored application generates measurement metrics per each request, then it may be useful to calculate
a [histogram](https://docs.victoriametrics.com/keyConcepts.html#histogram) over these metrics.
For example, if the monitored app generates `request_duration_seconds N` and `response_size_bytes M` metrics
For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
per each incoming request, then the following [stream aggregation config](#stream-aggregation-config)
can be used for calculating [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
for these metrics every 60 seconds:
@ -313,7 +315,7 @@ Output metric names for stream aggregation are constructed according to the foll
- `<output>` is the aggregate used for constructing the output metric. The aggregate name is taken from the `outputs` list
at the corresponding [stream aggregation config](#stream-aggregation-config).
Both input and ouput metric names can be modified if needed via relabeling according to [these docs](#relabeling).
Both input and output metric names can be modified if needed via relabeling according to [these docs](#relabeling).
## Relabeling
@ -334,35 +336,128 @@ For example, the following config removes the `:1m_sum_samples` suffix added [to
## Aggregation outputs
The following aggregation outputs can be put in the `outputs` list at [stream aggregation config](#stream-aggregation-config):
* `total` generates output [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) by summing the input counters.
The `total` handler properly handles input counter resets.
The `total` handler returns garbage when something other than [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) is passed to the input.
* `increase` returns the increase of input [counters](https://docs.victoriametrics.com/keyConcepts.html#counter).
The `increase` handler properly handles the input counter resets.
The `increase` handler returns garbage when something other than [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) is passed to the input.
* `count_series` counts the number of unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
* `count_samples` counts the number of input [samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `sum_samples` sums input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `last` returns the last input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `min` returns the minimum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `max` returns the maximum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `avg` returns the average input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `stddev` returns [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `stdvar` returns [standard variance](https://en.wikipedia.org/wiki/Variance) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `histogram_bucket` returns [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
* `quantiles(phi1, ..., phiN)` returns [percentiles](https://en.wikipedia.org/wiki/Percentile) for the given `phi*`
over the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The `phi` must be in the range `[0..1]`, where `0` means `0th` percentile, while `1` means `100th` percentile.
The aggregations are calculated during the `interval` specified in the [config](#stream-aggregation-config)
and then sent to the storage.
If `by` and `without` lists are specified in the [config](#stream-aggregation-config),
then the [aggregation by labels](#aggregating-by-labels) is performed in addition to aggregation by `interval`.
Below are aggregation functions that can be put in the `outputs` list at [stream aggregation config](#stream-aggregation-config).
### total
`total` generates output [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) by summing the input counters.
`total` only makes sense for aggregating [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) type metrics.
The results of `total` are equal to the results of the `sum(some_counter)` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` against the regular query:
<img alt="total aggregation" src="stream-aggregation-check-total.png">
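A minimal config sketch for a setup like the one described above, assuming an input counter named `some_counter` (a placeholder name):

```yaml
# Sketch only; `some_counter` is a placeholder metric name.
- match: 'some_counter'
  interval: 1m
  by: [instance]    # keep only the `instance` label in the output
  outputs: [total]
```

Under the output naming rules described in this doc, the resulting series should carry a suffix such as `:1m_by_instance_total`.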
### increase
`increase` returns the increase of input [counters](https://docs.victoriametrics.com/keyConcepts.html#counter).
`increase` only makes sense for aggregating [counter](https://docs.victoriametrics.com/keyConcepts.html#counter) type metrics.
The results of `increase` with aggregation interval of `1m` are equal to the results of the `increase(some_counter[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` against the regular query:
<img alt="increase aggregation" src="stream-aggregation-check-increase.png">
### count_series
`count_series` counts the number of unique [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series).
The results of `count_series` are equal to the results of the `count(some_metric)` query.
### count_samples
`count_samples` counts the number of input [samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `count_samples` with aggregation interval of `1m` are equal to the results of the `count_over_time(some_metric[1m])` query.
### sum_samples
`sum_samples` sums input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `sum_samples` with aggregation interval of `1m` are equal to the results of the `sum_over_time(some_metric[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` against the regular query:
<img alt="sum_samples aggregation" src="stream-aggregation-check-sum-samples.png">
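A minimal config sketch for this case, assuming an input metric named `some_metric` (a placeholder name). With no `by` or `without` list, the input labels are expected to be preserved in the output:

```yaml
# Sketch only; `some_metric` is a placeholder metric name.
- match: 'some_metric'
  interval: 1m
  outputs: [sum_samples]   # output series get the `:1m_sum_samples` suffix
```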
### last
`last` returns the last input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `last` with aggregation interval of `1m` are equal to the results of the `last_over_time(some_metric[1m])` query.
This aggregation output is of limited use when `by` lists are specified in the [config](#stream-aggregation-config).
In this case the result of aggregation by labels is non-deterministic, because it depends on the order in which the time series are processed.
### min
`min` returns the minimum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `min` with aggregation interval of `1m` are equal to the results of the `min_over_time(some_metric[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` against the regular query:
<img alt="min aggregation" src="stream-aggregation-check-min.png">
### max
`max` returns the maximum input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `max` with aggregation interval of `1m` are equal to the results of the `max_over_time(some_metric[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` against the regular query:
<img alt="max aggregation" src="stream-aggregation-check-max.png">
### avg
`avg` returns the average input [sample value](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `avg` with aggregation interval of `1m` are equal to the results of the `avg_over_time(some_metric[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` and `by: ["instance"]` against the regular query:
<img alt="avg aggregation" src="stream-aggregation-check-avg.png">
### stddev
`stddev` returns [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `stddev` with aggregation interval of `1m` are equal to the results of the `stddev_over_time(some_metric[1m])` query.
### stdvar
`stdvar` returns [variance](https://en.wikipedia.org/wiki/Variance) for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `stdvar` with aggregation interval of `1m` are equal to the results of the `stdvar_over_time(some_metric[1m])` query.
For example, the graph below compares the time series produced by a config with aggregation interval `1m` against the regular query:
<img alt="stdvar aggregation" src="stream-aggregation-check-stdvar.png">
### histogram_bucket
`histogram_bucket` returns [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
for the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The results of `histogram_bucket` with aggregation interval of `1m` are equal to the results of the `histogram_over_time(some_histogram_bucket[1m])` query.
### quantiles
`quantiles(phi1, ..., phiN)` returns [percentiles](https://en.wikipedia.org/wiki/Percentile) for the given `phi*`
over the input [sample values](https://docs.victoriametrics.com/keyConcepts.html#raw-samples).
The `phi` must be in the range `[0..1]`, where `0` means `0th` percentile, while `1` means `100th` percentile.
The results of `quantiles(phi1, ..., phiN)` with aggregation interval of `1m`
are equal to the results of the `quantiles_over_time("quantile", phi1, ..., phiN, some_histogram_bucket[1m])` query.
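A minimal config sketch for the `quantiles` output, reusing the `request_duration_seconds` and `response_size_bytes` metric names and the 30-second interval from the "Quantiles over input metrics" example. The `phi` values `0.50` and `0.99` correspond to the 50th and 99th percentiles. Note that the output name is quoted, since it contains parentheses and commas:

```yaml
# Sketch only; metric names are taken from the example earlier in this doc.
- match:
  - request_duration_seconds
  - response_size_bytes
  interval: 30s
  outputs: ["quantiles(0.50, 0.99)"]   # each phi must be in the range [0..1]
```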
## Aggregating by labels