Mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git (synced 2025-01-20 15:29:24 +01:00)

Commit a05cfcd96f "update wiki pages" (parent fda857bab9)

vmalert.md
@ -6,7 +6,11 @@ sort: 4
`vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
rules against the configured `-datasource.url`. For sending alerting notifications
vmalert relies on [Alertmanager](https://github.com/prometheus/alertmanager) configured via the `-notifier.url` flag.
Recording rule results are persisted via the [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
protocol and require `-remoteWrite.url` to be configured.
`vmalert` is heavily inspired by the [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax.

## Features
@ -22,12 +26,12 @@ implementation and aims to be compatible with its syntax.
* Lightweight without extra dependencies.

## Limitations
* `vmalert` executes queries against a remote datasource, which carries reliability risks because of the network.
It is recommended to configure alert thresholds and rule expressions with the understanding that network
requests may fail;
* by default, rules execution is sequential within one group, but persistence of execution results to remote
storage is asynchronous. Hence, users shouldn't rely on chaining of recording rules when the result of a previous
recording rule is reused in the next one;
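
The following hypothetical group shows the chaining limitation in practice. Metric and rule names are illustrative. If the alert read the recorded series, it would depend on that series already being persisted to remote storage. For that reason, the alert repeats the underlying expression instead.

```yaml
groups:
  - name: errors                 # hypothetical group name
    rules:
      - record: job:errors_total:rate5m
        expr: sum(rate(errors_total[5m])) by (job)

      # Risky variant: `expr: job:errors_total:rate5m > 10` would read the series
      # recorded above, which may not be persisted to remote storage yet.
      # Safer variant: query the raw data directly with the same expression.
      - alert: HighErrorRate
        expr: sum(rate(errors_total[5m])) by (job) > 10
```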

## QuickStart

@ -37,13 +41,13 @@ git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
make vmalert
```

The built binary will be placed in the `VictoriaMetrics/bin` folder.

To start using `vmalert` you will need the following things:
* list of rules - PromQL/MetricsQL expressions to execute;
* datasource address - reachable MetricsQL endpoint to run queries against;
* notifier address [optional] - reachable [Alertmanager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts, and sending notifications;
* remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage to persist rules and alerts state info;
* remote read address [optional] - MetricsQL compatible datasource to restore alerts state from.
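
Below is a minimal sketch of a `rules.yml` file for the "list of rules" item above. It contains one recording rule and one alerting rule in a single group. The sketch assumes the Prometheus-compatible `groups`/`rules` layout. The group name, metric names, threshold, labels and annotation text are all hypothetical.

```yaml
groups:
  - name: example              # hypothetical group name
    interval: 1m               # how often rules in this group are evaluated
    rules:
      # Recording rule: the result is persisted via -remoteWrite.url
      # as a new time series named after `record`
      - record: job:http_requests_total:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

      # Alerting rule: pending once expr returns a non-empty result,
      # firing after it keeps returning results for `for` (5m here)
      - alert: HighRequestRate
        expr: sum(rate(http_requests_total[5m])) by (job) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High request rate for job {{ $labels.job }}"
```

A file like this is passed to `vmalert` via the `-rule` flag, for example `-rule=rules.yml`.
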
@ -62,7 +66,7 @@ Then configure `vmalert` accordingly:

Note there's a separate `-remoteWrite.url` flag that allows writing results of
alerting/recording rules into a different storage than the initial data that's
queried. This allows using `vmalert` to aggregate data from a short-term,
high-frequency, high-cardinality storage into a long-term storage with
decreased cardinality and a bigger interval between samples.

@ -124,14 +128,14 @@ expression and then act according to the Rule type.

There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules define alert conditions via the `expr` field and send notifications to
[Alertmanager](https://github.com/prometheus/alertmanager) if the execution result is not empty.
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules define an `expr` whose result is then written to the configured
`-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.

`vmalert` forbids defining duplicates - rules with the same combination of name, expression, and labels
within one group.
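
The hypothetical group below illustrates the duplicate check. Both `InstanceDown` rules share a name and an expression but carry different labels, so under the rule above they are not duplicates. The severity values and `for` durations are illustrative.

```yaml
groups:
  - name: availability           # hypothetical group name
    rules:
      # Same name and expr as the next rule, but different labels: allowed
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
      - alert: InstanceDown
        expr: up == 0
        for: 15m
        labels:
          severity: critical
      # Adding another InstanceDown rule with `expr: up == 0` and
      # `severity: critical` to this group would be rejected as a duplicate.
```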

#### Alerting rules
@ -147,7 +151,7 @@ alert: <string>
expr: <string>

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
# If param is omitted or set to 0 then alerts will be immediately considered
# as firing once they return.
[ for: <duration> | default = 0s ]
@ -192,19 +196,19 @@ For recording rules to work `-remoteWrite.url` must be specified.
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just like any other time series.
The state is stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state
from the configured address by querying time series with name `ALERTS_FOR_STATE`.

Both flags are required for proper state restoration. The restore process may fail if time series are missing
in the configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`)
or the received state doesn't match the current `vmalert` rules configuration.

### Multitenancy

The following approaches exist for alerting and recording rules across
[multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy):

* To run a separate `vmalert` instance for each tenant.
@ -244,6 +248,106 @@ The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz`
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).

### Topology examples

The following sections show how `vmalert` may be used and configured
for different scenarios.

Please note that not all flags in the examples are required:
* `-remoteWrite.url` and `-remoteRead.url` are optional and are needed only if
you have recording rules or want to store [alerts state](#alerts-state-on-restarts) on `vmalert` restarts;
* `-notifier.url` is optional and is needed only if you have alerting rules.

#### Single-node VictoriaMetrics

<img alt="vmalert single" src="vmalert_single.png">

`vmalert` configuration flags:
```
# -rule              path to the file with rules configuration; supports wildcards
# -datasource.url    VM-single address for executing rule expressions
# -remoteWrite.url   VM-single address to persist alerts state and recording rule results
# -remoteRead.url    VM-single address for restoring alerts state after restart
# -notifier.url      Alertmanager address to send alerts to when they trigger
./bin/vmalert -rule=rules.yml \
    -datasource.url=http://victoriametrics:8428 \
    -remoteWrite.url=http://victoriametrics:8428 \
    -remoteRead.url=http://victoriametrics:8428 \
    -notifier.url=http://alertmanager:9093
```

This is the simplest configuration. One single-node VM server is used for
rule execution and for storing recording rule results and alerts state.

#### Cluster VictoriaMetrics

<img alt="vmalert cluster" src="vmalert_cluster.png">

`vmalert` configuration flags:
```
# -rule              path to the file with rules configuration; supports wildcards
# -datasource.url    vmselect address for executing rule expressions
# -remoteWrite.url   vminsert address to persist alerts state and recording rule results
# -remoteRead.url    vmselect address for restoring alerts state after restart
# -notifier.url      Alertmanager address to send alerts to when they trigger
./bin/vmalert -rule=rules.yml \
    -datasource.url=http://vmselect:8481/select/0/prometheus \
    -remoteWrite.url=http://vminsert:8480/insert/0/prometheus \
    -remoteRead.url=http://vmselect:8481/select/0/prometheus \
    -notifier.url=http://alertmanager:9093
```

In [cluster mode](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html),
VictoriaMetrics has separate components for the writing and reading paths:
`vminsert` and `vmselect` respectively. `vmselect` is used for executing rule expressions,
and `vminsert` is used to persist recording rule results and alerts state.
A cluster can have multiple `vminsert` and `vmselect` components. If you want
to spread the load across these components, add load balancers in front of them and configure
`vmalert` with the balancers' addresses. See more about VM's cluster architecture
[here](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview).

#### vmalert high availability

<img alt="vmalert ha" src="vmalert_ha.png">

`vmalert` configuration flags:
```
# -rule              path to the file with rules configuration; supports wildcards
# -datasource.url    VM-single address for executing rule expressions
# -remoteWrite.url   VM-single address to persist alerts state and recording rule results
# -remoteRead.url    VM-single address for restoring alerts state after restart
# -notifier.url      multiple Alertmanager addresses to send alerts to when they trigger;
#                    the same alert will be sent to all configured notifiers
./bin/vmalert -rule=rules.yml \
    -datasource.url=http://victoriametrics:8428 \
    -remoteWrite.url=http://victoriametrics:8428 \
    -remoteRead.url=http://victoriametrics:8428 \
    -notifier.url=http://alertmanager1:9093 \
    -notifier.url=http://alertmanagerN:9093
```

For HA, users can run multiple identically configured `vmalert` instances.
This means all of them will execute the same rules, write state and results to
the same destinations, and send alert notifications to multiple configured
Alertmanagers.

To avoid duplication of recording rule results and alerts state in the VictoriaMetrics server,
don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).

Alertmanager will automatically deduplicate alerts with identical labels, so ensure that
all `vmalert` instances have the same config.

Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
for Alertmanagers for better reliability.

This example uses a single-node VM server for the sake of simplicity.
Check how to replace it with [cluster VictoriaMetrics](#cluster-victoriametrics) if needed.

#### Downsampling and aggregation via vmalert

<img alt="vmalert multi cluster" src="vmalert_multicluster.png">

`vmalert` configuration flags:
```
# -rule              path to the file with rules configuration; supports wildcards
# -datasource.url    vmselect address of the raw cluster for executing recording rule expressions
# -remoteWrite.url   vminsert address of the aggregated cluster to persist recording rule results
./bin/vmalert -rule=downsampling-rules.yml \
    -datasource.url=http://raw-cluster-vmselect:8481/select/0/prometheus \
    -remoteWrite.url=http://aggregated-cluster-vminsert:8480/insert/0/prometheus
```

This example shows how to build a topology where `vmalert` processes data from one cluster
and writes the results into another. Such clusters may be called "hot" (low retention,
high-speed disks, used for operational monitoring) and "cold" (long-term retention,
slower/cheaper disks, low-resolution data). With the help of `vmalert`, users can set up
recording rules that process raw data from the "hot" cluster (by applying additional transformations
or reducing resolution) and push the results to the "cold" cluster.

Please note that the [replay](#rules-backfilling) feature may be used for transforming historical data.

Flags `-remoteRead.url` and `-notifier.url` are omitted because we assume only recording rules are used.
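
Below is a sketch of what `downsampling-rules.yml` could contain. It assumes the "hot" cluster stores raw `node_cpu_seconds_total` samples. The group name, series names and interval are hypothetical. Each evaluation writes one sample per resulting series, so the group `interval` sets the resolution of the data pushed to the "cold" cluster.

```yaml
groups:
  - name: downsampling             # hypothetical group name
    interval: 5m                   # one output sample per series every 5m
    rules:
      # Per-instance rate of non-idle CPU time over the last 5m
      - record: instance:node_cpu_seconds:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
      # Same rate aggregated per job to reduce cardinality
      - record: job:node_cpu_seconds:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (job)
```

The second rule repeats the raw expression rather than reading the first rule's output. This follows the chaining limitation described in the Limitations section.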

### Web

@ -263,7 +367,7 @@ vmalert sends requests to `<-datasource.url>/render?format=json` during evaluati
if the corresponding group or rule contains the `type: "graphite"` config option. It is expected that the `<-datasource.url>/render`
implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) for `format=json`.
When using vmalert with both `graphite` and `prometheus` rules configured against the cluster version of VM, do not forget
to set the `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust the URL prefix automatically based on the query type.
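
Below is a hypothetical group that uses the `type: "graphite"` option. Its expressions are sent to `<-datasource.url>/render?format=json`, so they are written in Graphite function syntax rather than PromQL/MetricsQL. The series path and threshold are illustrative. The sketch also assumes that the datasource's Render API supports Graphite's `currentAbove` function.

```yaml
groups:
  - name: graphite-example         # hypothetical group name
    type: graphite                 # evaluate this group's rules via the Graphite Render API
    rules:
      # Non-empty result (any series whose last value is above 5) triggers the alert
      - alert: HighLoad
        expr: "currentAbove(servers.*.cpu.load5, 5)"
        for: 5m
```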

## Rules backfilling

@ -322,11 +426,11 @@ to prevent cache pollution and unwanted time range boundaries adjustment during

#### Recording rules

The result of recording rules `replay` should match the results of normal rules evaluation.

#### Alerting rules

The result of alerting rules `replay` is a set of time series reflecting [alert's state](#alerts-state-on-restarts).
To see if a `replayed` alert has fired in the past, use the following PromQL/MetricsQL expression:
```
ALERTS{alertname="your_alertname", alertstate="firing"}
@ -339,7 +443,7 @@ There are following non-required `replay` flags:

* `-replay.maxDatapointsPerQuery` - the max number of data points vmalert expects to receive in one request.
In short, it affects the max time range for every `/query_range` request. The higher the value,
the fewer requests will be issued during `replay`.
* `-replay.ruleRetryAttempts` - when the datasource fails to respond, vmalert will make this number of retries
per rule before giving up.
* `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining
@ -361,7 +465,7 @@ See full description for these flags in `./vmalert --help`.
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
metrics may be analyzed later.

Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for a `vmalert` overview. Graphs on this dashboard contain useful hints: hover over the `i` icon in the top left corner of each graph to read them.
If you have suggestions for improvements or have found a bug, please open an issue on GitHub or add
a review to the dashboard.

Binary files added (not shown):
* `vmalert_cluster.png` (73 KiB)
* `vmalert_ha.png` (151 KiB)
* `vmalert_multicluster.png` (122 KiB)
* `vmalert_single.png` (62 KiB)