Commit Graph

60 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
ea81f6fc36
app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:10:31 +01:00
Nikolay
1f91f22b5f
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:45:20 +02:00
Aliaksandr Valialkin
f7ef80aaad
.golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:18:59 -08:00
Roman Khavronenko
e1c3267e34
vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 16:59:26 -08:00
Aliaksandr Valialkin
2cea923ff7
app/vmselect/promql: add share(q) aggregate function for normalizing results across multiple time series in [0..1] value range per each timestamp and aggregation group 2023-02-18 22:42:01 -08:00
Oleksandr Redko
9fff48c3e3
app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
9a563a6aef
app/vmselect/promql: eliminate memory allocation when sorting values inside float64s 2023-01-09 23:06:46 -08:00
Aliaksandr Valialkin
645c24dc5f
app/vmselect/promql: consistently intern series names obtained from marshalMetricNameSorted
This reduces memory allocations when the returned series names are used as map keys later
2023-01-09 22:45:40 -08:00
Aliaksandr Valialkin
562d6bca08
app/vmselect/promql: intern output series names during normal aggregation 2023-01-09 22:15:24 -08:00
Aliaksandr Valialkin
e272a0ec78
app/vmselect/promql: allow passing inf arg into functions, which accept numeric limit on the number of output time series
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3461
2022-12-10 22:47:47 -08:00
Aliaksandr Valialkin
b70f815dc4
app/vmselect/promql: remove empty series before applying aggregate function
Previously empty series (e.g. series with all NaN samples) were passed to aggregate functions.
Such series must be ingored by all the aggregate functions.
So it is better from consistency PoV filtering out empty series before applying aggregate functions.
2022-09-30 08:39:54 +03:00
Aliaksandr Valialkin
e6ed92529b
all: remove explicit "xxhash" name when importing github.com/cespare/xxhash/v2 package
This package already has the same name, so there is no need in explicit name
2022-06-21 20:23:32 +03:00
Aliaksandr Valialkin
ed97908ca9
app/vmselect/promql: rename removeNaNs() to more clear removeEmptySeries() 2022-04-20 19:53:46 +03:00
Aliaksandr Valialkin
d1f8915ed1
app/vmselect/promql: preserve the order of time series passed to limit_offset() function
Previously time series passed to `limit_offset()` were shuffled according to hash for their labels.
This was unexpected behaviour for most users.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1920 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/951
2021-12-12 18:04:58 +02:00
Aliaksandr Valialkin
f43586c63c
app/vmselect/promql: arrange function names in the code in alphabetical order
This should simplify code maintenance in the future
2021-11-14 13:55:06 +02:00
Aliaksandr Valialkin
27044b84d2
app/vmselect/promql: add limit_offset(limit, offset, q) function, which can be used for paging over big number of time series
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1778
2021-11-03 16:02:27 +02:00
Aliaksandr Valialkin
da97e58979
app/vmselect/promql: randomize the static selection of time series returned from limitk()
Sort series by a hash calculated from the series labels. This should guarantee "random" selection of the returned time series.
Previously the selection could be biased, since time series were sorted alphabetically by label names and label values.
2021-10-16 21:16:49 +03:00
Aliaksandr Valialkin
92b92d4d2c
app/vmselect/promql: consistently return the same set of time series from limitk() function
This is the expected behaviour by most users.
2021-10-08 19:53:52 +03:00
Aliaksandr Valialkin
0e3de5a0cc
app/vmselect/promql: add topk_last and bottomk_last functions 2021-09-30 13:22:52 +03:00
Aliaksandr Valialkin
c4c77aa2dd
app/vmselect/promql: follow-up after 526dd93b32
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1625
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1612
2021-09-27 18:55:39 +03:00
Roman Khavronenko
526dd93b32
app/vmselect: quantile func compatiblity with Prometheus (#1646)
* app/vmselect: `quantile` func compatiblity with Prometheus

The `quantile` func was previously calculated by https://github.com/valyala/histogram
package. The result of such calculation was always the closest real value to
requested quantile. While in Prometheus implementation interpolation is used.
Such difference may result into discrepancy in output between Prometheus and
VictoriaMetrics.

This commit adds a Prometheus-like `quantile` function. It also used by other
functions which depend on it, such as `quantiles`, `quantile_over_time`, `median` etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1625

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmselect: `quantile` review fixes

* quantile functions were split into multiple to provide
different API for already sorted data;
* float64sPool is used for reducing allocations. Items in pool may have
different sizes, but defining a new pool was complicates due to name collisions;

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-27 18:02:41 +03:00
Aliaksandr Valialkin
5a44be0e52 app/vmselect/promql: optimize quantiles() calculation
Calculate quantiles in one go instead of calculating each quantile individually
2021-09-17 12:33:42 +03:00
Aliaksandr Valialkin
e60dfc96ff app/vmselect/promql: add mad(q) and outliers_mad(tolerance, q) functions to MetricsQL 2021-09-16 13:33:53 +03:00
Aliaksandr Valialkin
d1a16e0891 app/vmselect/promql: use Prometheus-compatible label value formatting for count_values function 2021-09-13 13:48:06 +03:00
Aliaksandr Valialkin
5ea689d61b app/vmselect/promql: add quantile("phiLabel", phi1, ..., phiN, q) aggregate function to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573
2021-08-27 18:37:20 +03:00
Aliaksandr Valialkin
d3fa0ccabd app/vmselect/promql: properly detect aggregate topk* and bottomk* aggregate functions in order to disable duplicate sorting
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-08 00:09:40 +03:00
Aliaksandr Valialkin
3055ab0115 app/vmselect/promql: add ability to set label value additionally to label name for the remaining sum of time series returned from topk_* and bottomk_* functions in the form: topk_min(N, m, "label=value") 2021-04-02 23:55:54 +03:00
Aliaksandr Valialkin
dd6bfa50e9 app/vmselect/promql: code cleanup after 43823addea 2020-11-06 01:30:50 +02:00
n4mine
43823addea
app/vmselect/promql: fix when the parameter of maxValue(), minValue() leading by NaN. it will cause {top,bottom}k_{max,min} return inappropriate result (#883) 2020-11-06 01:29:24 +02:00
Aliaksandr Valialkin
c539494b36 app/vmselect/promql: allow passing optional third argument to topk_* and bottomk_* functions in order to obtain sum of time series outside top/bottom K 2020-11-01 23:35:06 +02:00
Aliaksandr Valialkin
d8af290947 app/vmselect/promql: fix mode_over_time calculations
Previously `mode_over_time` could return garbage due to improper shuffling of input data points.
2020-10-13 11:58:25 +03:00
Aliaksandr Valialkin
7d89fafe1a app/vmselect/promql: allow passing multiple args to aggregate functions such as avg(q1, q2, q3) 2020-08-15 01:15:09 +03:00
Aliaksandr Valialkin
e6eee2bebf app/vmselect/promql: add zscore-related functions: zscore_over_time(m[d]) and zscore(q) by (...) 2020-08-03 21:52:18 +03:00
Aliaksandr Valialkin
0da202023b app/vmselect/promql: return empty values from group() if all the time series have no values at the given timestamp
This aligns `group()` behaviour to Prometheus
2020-07-28 13:40:11 +03:00
Aliaksandr Valialkin
ecb1b2564a app/vmselect/promql: add mode() aggregate function 2020-07-20 15:31:20 +03:00
Aliaksandr Valialkin
6b5ad535ae app/vmselect/promql: optimize group(rollup(m)) calculations 2020-07-17 16:47:16 +03:00
Aliaksandr Valialkin
aa5d88055d app/vmselect/promql: add group() aggregate function to MetricsQL
This function has been added in Prometheus 2.20. See https://github.com/prometheus/prometheus/pull/7480
2020-07-17 15:17:55 +03:00
Aliaksandr Valialkin
df01836818 app/vmselect/promql: keep all labels for time series from any() call 2020-07-17 15:17:54 +03:00
Aliaksandr Valialkin
538fdfe133 app/vmselect/promql: move common code from aggrFuncOutliersK and newAggrFuncRangeTopK into getRangeTopKTimeseries 2020-05-19 16:11:14 +03:00
Aliaksandr Valialkin
f52769f6ee app/vmselect/promql: fix outilersk calculations 2020-05-19 14:44:53 +03:00
Aliaksandr Valialkin
a441cdd1d9 app/vmselect/promql: add outliersk(N, m) aggregate function for anomaly detection across groups of similar time series 2020-05-19 13:52:36 +03:00
Aliaksandr Valialkin
cc311e20fe app/vmselect/promql: add any(x) by (y) aggregate function, which returns any time series from q for each group y 2020-05-12 19:45:56 +03:00
Aliaksandr Valialkin
574289c3fb app/vmselect/promql: support for sum(x) by (y) limit N syntax in order to limit the number of output time series after aggregation 2020-05-12 19:45:54 +03:00
Aliaksandr Valialkin
4e4f57b121 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:28:22 +03:00
Aliaksandr Valialkin
1925ee038d Rename lib/promql to lib/metricsql and apply small fixes 2019-12-25 22:03:59 +02:00
Mike Poindexter
bec62e4e43 Split Extended PromQL parsing to a separate library 2019-12-25 22:03:51 +02:00
Aliaksandr Valialkin
cea5a14853 all: rename Extended PromQL to PromQL extensions 2019-12-12 19:25:58 +02:00
Aliaksandr Valialkin
c09472dfd9 app/vmselect/promql: add {topk|bottomk}_{min|max|avg|median} aggregate functions for returning the exact k time series on the given time range
The full list of functions added:
- `topk_min(k, q)` - returns top K time series with the max minimums on the given time range
- `topk_max(k, q)` - returns top K time series with the max maximums on the given time range
- `topk_avg(k, q)` - returns top K time series with the max averages on the given time range
- `topk_median(k, q)` - returns top K time series with the max medians on the given time range
- `bottomk_min(k, q)` - returns bottom K time series with the min minimums on the given time range
- `bottomk_max(k, q)` - returns bottom K time series with the min maximums on the given time range
- `bottomk_avg(k, q)` - returns bottom K time series with the min averages on the given time range
- `bottomk_median(k, q)` - returns bottom K time series with the min medians on the given time range
2019-12-05 19:26:47 +02:00
Aliaksandr Valialkin
0401969d78 app/vmselect/promql: re-use metrics.Histogram when calculating histogram function for each point on the graph
This should reduce the amounts memory allocations
2019-11-25 14:24:21 +02:00
Aliaksandr Valialkin
50ae1879c6 app/vmselect/promql: add histogram aggregate function, which is useful for building heatmaps from multiple time series 2019-11-24 00:04:25 +02:00