model list - isolation forest (#5235)

* model list - isolation forest

* curse of dimensionality

* isol forest definition change, minor fixes

* blank line fix
Daria Karavaieva 2023-10-26 12:25:54 +02:00 committed by Aliaksandr Valialkin
parent 8fbe5a0893
commit 076a796061

@@ -17,13 +17,13 @@ Please [contact us](https://victoriametrics.com/contact-us/) to find out more._*
## About
-**VictoriaMetrics Anomaly Detection** is a service that continuously scans Victoria Metrics time
+**VictoriaMetrics Anomaly Detection** is a service that continuously scans VictoriaMetrics time
series and detects unexpected changes within data patterns in real-time. It does so by utilizing
user-configurable machine learning models.
It periodically queries user-specified metrics, computes an “anomaly score” for them, based on how
well they fit a predicted distribution, taking into account periodical data patterns with trends,
-and pushes back the computed “anomaly score” to Victoria Metrics. Then, users can enable alerting
+and pushes back the computed “anomaly score” to VictoriaMetrics. Then, users can enable alerting
rules based on the “anomaly score”.
Compared to classical alerting rules, anomaly detection is more “hands-off” i.e. it allows users to
@@ -37,7 +37,7 @@ metrics.
## How?
-Victoria Metrics Anomaly Detection service (**vmanomaly**) allows you to apply several built-in
+VictoriaMetrics Anomaly Detection service (**vmanomaly**) allows you to apply several built-in
anomaly detection algorithms. You can also plug in your own detection models; the code doesn’t make any
distinction between built-in and external ones.
@@ -94,6 +94,12 @@ Currently, vmanomaly ships with a few common models:
A simple moving window of quantiles. Easy to use, easy to understand, but not as powerful as
other models.
1. **Isolation Forest**
Detects anomalies using binary trees. It works for both univariate and multivariate data. Be aware of [the curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) in the case of multivariate data - we advise against using a single model when handling multiple time series *if the number of these series significantly exceeds their average length (# of data points)*.
The algorithm has a linear time complexity and a low memory requirement, which works well with high-volume data. See [scikit-learn.org documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) for Isolation Forest.
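
As a rough illustration of the Isolation Forest description above, here is a minimal sketch that uses scikit-learn's `IsolationForest` directly on a synthetic univariate series. The data, the variable names, and the parameter values are assumptions made for this example; it does not show how vmanomaly configures the model or how it converts the model output into its own “anomaly score”.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic univariate series (an assumption for this sketch):
# 200 "normal" points plus one obvious outlier appended at the end.
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=0.0, scale=1.0, size=200), 50.0)
X = values.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

# Fit an ensemble of random binary trees; random_state makes the run repeatable.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

labels = model.predict(X)        # -1 marks an outlier, 1 marks an inlier
scores = model.score_samples(X)  # lower values mean "more abnormal"

print("label of the injected outlier:", labels[-1])
print("outlier has the lowest score:", scores[-1] == scores.min())
```

Note that scikit-learn's scores are on a different scale from the vmanomaly anomaly score: here lower means more abnormal, and there is no fixed threshold at 1.0.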
### Examples
For example, here’s what Prophet predictions could look like on real data
@@ -115,7 +121,7 @@ Then, reads new data from VictoriaMetrics, according to schedule, and invokes it
“anomaly score” for each data point. The anomaly score ranges from 0 to positive infinity.
Values less than 1.0 are considered “not an anomaly”, values greater than or equal to 1.0 are
considered “anomalous”, with greater values corresponding to larger anomalies.
-Then, VMAnomaly pushes the metric to vminsert (under the user-configured metric name,
+Then, vmanomaly pushes the metric to vminsert (under the user-configured metric name,
optionally preserving labels).
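
The interpretation of the anomaly score described above (below 1.0 is “not an anomaly”, 1.0 or above is “anomalous”) can be sketched as a simple threshold check. The function name `is_anomalous` and the sample scores are hypothetical and are not part of vmanomaly.

```python
ANOMALY_THRESHOLD = 1.0  # from the text: scores >= 1.0 are considered anomalous


def is_anomalous(score: float) -> bool:
    """Return True when an anomaly score reaches the 1.0 threshold."""
    return score >= ANOMALY_THRESHOLD


# Hypothetical anomaly scores for five data points.
sample_scores = [0.0, 0.42, 0.99, 1.0, 3.7]
flags = [is_anomalous(s) for s in sample_scores]
print(flags)
```

In practice the same comparison would more likely live in an alerting rule evaluated against the stored anomaly score metric than in application code.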