update wiki pages

Vika 2021-12-01 15:33:29 +00:00
parent fda857bab9
commit a05cfcd96f
5 changed files with 124 additions and 20 deletions

@ -6,7 +6,11 @@ sort: 4
`vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) `vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
rules against configured address. It is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/) rules against configured `-datasource.url`. For sending alerting notifications
vmalert relies on [Alertmanager]((https://github.com/prometheus/alertmanager)) configured via `-notifier.url` flag.
Recording rules results are persisted via [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
protocol and require `-remoteWrite.url` to be configured.
Vmalert is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax. implementation and aims to be compatible with its syntax.
## Features ## Features
@ -22,12 +26,12 @@ implementation and aims to be compatible with its syntax.
* Lightweight without extra dependencies. * Lightweight without extra dependencies.
## Limitations ## Limitations
* `vmalert` execute queries against remote datasource which has reliability risks because of network. * `vmalert` execute queries against remote datasource which has reliability risks because of the network.
It is recommended to configure alerts thresholds and rules expressions with understanding that network request It is recommended to configure alerts thresholds and rules expressions with the understanding that network
may fail; requests may fail;
* by default, rules execution is sequential within one group, but persistence of execution results to remote * by default, rules execution is sequential within one group, but persistence of execution results to remote
storage is asynchronous. Hence, user shouldn't rely on chaining of recording rules when result of previous storage is asynchronous. Hence, user shouldn't rely on chaining of recording rules when result of previous
recording rule is reused in next one; recording rule is reused in the next one;
## QuickStart ## QuickStart
@ -37,13 +41,13 @@ git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics cd VictoriaMetrics
make vmalert make vmalert
``` ```
The build binary will be placed to `VictoriaMetrics/bin` folder. The build binary will be placed in `VictoriaMetrics/bin` folder.
To start using `vmalert` you will need the following things: To start using `vmalert` you will need the following things:
* list of rules - PromQL/MetricsQL expressions to execute; * list of rules - PromQL/MetricsQL expressions to execute;
* datasource address - reachable MetricsQL endpoint to run queries against; * datasource address - reachable MetricsQL endpoint to run queries against;
* notifier address [optional] - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing, * notifier address [optional] - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications. aggregating alerts, and sending notifications.
* remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations) * remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage to persist rules and alerts state info; compatible storage to persist rules and alerts state info;
* remote read address [optional] - MetricsQL compatible datasource to restore alerts state from. * remote read address [optional] - MetricsQL compatible datasource to restore alerts state from.
@ -62,7 +66,7 @@ Then configure `vmalert` accordingly:
Note there's a separate `remoteRead.url` to allow writing results of Note there's a separate `remoteRead.url` to allow writing results of
alerting/recording rules into a different storage than the initial data that's alerting/recording rules into a different storage than the initial data that's
queried. This allows using `vmalert` to aggregate data from a short-term, queried. This allows using `vmalert` to aggregate data from a short-term,
high-frequency, high-cardinality storage into a long-term storage with high-frequency, high-cardinality storage into a long-term storage with
decreased cardinality and a bigger interval between samples. decreased cardinality and a bigger interval between samples.
@ -124,14 +128,14 @@ expression and then act according to the Rule type.
There are two types of Rules: There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) - * [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allow to define alert conditions via `expr` field and to send notifications to Alerting rules allow defining alert conditions via `expr` field and to send notifications to
[Alertmanager](https://github.com/prometheus/alertmanager) if execution result is not empty. [Alertmanager](https://github.com/prometheus/alertmanager) if execution result is not empty.
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) - * [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow to define `expr` which result will be then backfilled to configured Recording rules allow defining `expr` which result will be then backfilled to configured
`-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally `-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series. expensive expressions and save their result as a new set of time series.
`vmalert` forbids defining duplicates - rules with the same combination of name, expression and labels `vmalert` forbids defining duplicates - rules with the same combination of name, expression, and labels
within one group. within one group.
#### Alerting rules #### Alerting rules
@ -147,7 +151,7 @@ alert: <string>
expr: <string> expr: <string>
# Alerts are considered firing once they have been returned for this long. # Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending. # Alerts which have not yet been fired for long enough are considered pending.
# If param is omitted or set to 0 then alerts will be immediately considered # If param is omitted or set to 0 then alerts will be immediately considered
# as firing once they return. # as firing once they return.
[ for: <duration> | default = 0s ] [ for: <duration> | default = 0s ]
@ -192,19 +196,19 @@ For recording rules to work `-remoteWrite.url` must be specified.
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags: the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state * `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol. into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series. These are regular time series and maybe queried from VM just as any other time series.
The state is stored to the configured address on every rule evaluation. The state is stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state * `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state
from configured address by querying time series with name `ALERTS_FOR_STATE`. from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for proper state restoring. Restore process may fail if time series are missing Both flags are required for proper state restoration. Restore process may fail if time series are missing
in configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`) in configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`)
or received state doesn't match current `vmalert` rules configuration. or received state doesn't match current `vmalert` rules configuration.
### Multitenancy ### Multitenancy
The are the following approaches exist for alerting and recording rules across There are the following approaches exist for alerting and recording rules across
[multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy): [multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy):
* To run a separate `vmalert` instance per each tenant. * To run a separate `vmalert` instance per each tenant.
@ -244,6 +248,106 @@ The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz`
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise` at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags). tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### Topology examples
The following sections are showing how `vmalert` may be used and configured
for different scenarios.
Please note, not all flags in examples are required:
* `-remoteWrite.url` and `-remoteRead.url` are optional and are needed only if
you have recording rules or want to store [alerts state](#alerts-state-on-restarts) on `vmalert` restarts;
* `-notifier.url` is optional and is needed only if you have alerting rules.
#### Single-node VictoriaMetrics
<img alt="vmalert single" src="vmalert_single.png">
`vmalert` configuration flags:
```
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://victoriametrics:8428 \ # VM-single addr for executing rules expressions
-remoteWrite.url=http://victoriametrics:8428 \ # VM-single addr to persist alerts state and recording rules results
-remoteRead.url=http://victoriametrics:8428 \ # VM-single addr for restoring alerts state after restart
-notifier.url=http://alertmanager:9093 # AlertManager addr to send alerts when they trigger
```
The simplest configuration where one single-node VM server is used for
rules execution, storing recording rules results and alerts state.
#### Cluster VictoriaMetrics
<img alt="vmalert cluster" src="vmalert_cluster.png">
`vmalert` configuration flags:
```
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://vmselect:8481/select/0/prometheus # vmselect addr for executing rules expressions
-remoteWrite.url=http://vminsert:8480/insert/0/prometheuss # vminsert addr to persist alerts state and recording rules results
-remoteRead.url=http://vmselect:8481/select/0/prometheus # vmselect addr for restoring alerts state after restart
-notifier.url=http://alertmanager:9093 # AlertManager addr to send alerts when they trigger
```
In [cluster mode](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html)
VictoriaMetrics has separate components for writing and reading path:
`vminsert` and `vmselect` components respectively. `vmselect` is used for executing rules expressions
and `vminsert` is used to persist recording rules results and alerts state.
Cluster mode could have multiple `vminsert` and `vmselect` components. In case when you want
to spread the load on these components - add balancers before them and configure
`vmalert` with balancer's addresses. Please, see more about VM's cluster architecture
[here](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview).
#### vmalert high availability
<img alt="vmalert ha" src="vmalert_ha.png">
`vmalert` configuration flags:
```
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://victoriametrics:8428 \ # VM-single addr for executing rules expressions
-remoteWrite.url=http://victoriametrics:8428 \ # VM-single addr to persist alerts state and recording rules results
-remoteRead.url=http://victoriametrics:8428 \ # VM-single addr for restoring alerts state after restart
-notifier.url=http://alertmanager1:9093 \ # Multiple AlertManager addresses to send alerts when they trigger
-notifier.url=http://alertmanagerN:9093 # The same alert will be sent to all configured notifiers
```
For HA user can run multiple identically configured `vmalert` instances.
It means all of them will execute the same rules, write state and results to
the same destinations, and send alert notifications to multiple configured
Alertmanagers.
To avoid recording rules results and alerts state duplication in VictoriaMetrics server
don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
Alertmanager will automatically deduplicate alerts with identical labels, so ensure that
all `vmalert`s are having the same config.
Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
for Alertmanagers for better reliability.
This example uses single-node VM server for the sake of simplicity.
Check how to replace it with [cluster VictoriaMetrics](#cluster-victoriametrics) if needed.
#### Downsampling and aggregation via vmalert
<img alt="vmalert multi cluster" src="vmalert_multicluster.png">
`vmalert` configuration flags:
```
./bin/vmalert -rule=downsampling-rules.yml \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://raw-cluster-vmselect:8481/select/0/prometheus # vmselect addr for executing recordi ng rules expressions
-remoteWrite.url=http://aggregated-cluster-vminsert:8480/insert/0/prometheuss # vminsert addr to persist recording rules results
```
Example shows how to build a topology where `vmalert` will process data from one cluster
and write results into another. Such clusters may be called as "hot" (low retention,
high-speed disks, used for operative monitoring) and "cold" (long term retention,
slower/cheaper disks, low resolution data). With help of `vmalert`, user can setup
recording rules to process raw data from "hot" cluster (by applying additional transformations
or reducing resolution) and push results to "cold" cluster.
Please note, [replay](#rules-backfilling) feature may be used for transforming historical data.
Flags `-remoteRead.url` and `-notifier.url` are omitted since we assume only recording rules are used.
### Web ### Web
@ -263,7 +367,7 @@ vmalert sends requests to `<-datasource.url>/render?format=json` during evaluati
if the corresponding group or rule contains `type: "graphite"` config option. It is expected that the `<-datasource.url>/render` if the corresponding group or rule contains `type: "graphite"` config option. It is expected that the `<-datasource.url>/render`
implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) for `format=json`. implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) for `format=json`.
When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget
to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on query type. to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on the query type.
## Rules backfilling ## Rules backfilling
@ -322,11 +426,11 @@ to prevent cache pollution and unwanted time range boundaries adjustment during
#### Recording rules #### Recording rules
Result of recording rules `replay` should match with results of normal rules evaluation. The result of recording rules `replay` should match with results of normal rules evaluation.
#### Alerting rules #### Alerting rules
Result of alerting rules `replay` is time series reflecting [alert's state](#alerts-state-on-restarts). The result of alerting rules `replay` is time series reflecting [alert's state](#alerts-state-on-restarts).
To see if `replayed` alert has fired in the past use the following PromQL/MetricsQL expression: To see if `replayed` alert has fired in the past use the following PromQL/MetricsQL expression:
``` ```
ALERTS{alertname="your_alertname", alertstate="firing"} ALERTS{alertname="your_alertname", alertstate="firing"}
@ -339,7 +443,7 @@ There are following non-required `replay` flags:
* `-replay.maxDatapointsPerQuery` - the max number of data points expected to receive in one request. * `-replay.maxDatapointsPerQuery` - the max number of data points expected to receive in one request.
In two words, it affects the max time range for every `/query_range` request. The higher the value, In two words, it affects the max time range for every `/query_range` request. The higher the value,
the less requests will be issued during `replay`. the fewer requests will be issued during `replay`.
* `-replay.ruleRetryAttempts` - when datasource fails to respond vmalert will make this number of retries * `-replay.ruleRetryAttempts` - when datasource fails to respond vmalert will make this number of retries
per rule before giving up. per rule before giving up.
* `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining * `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining
@ -361,7 +465,7 @@ See full description for these flags in `./vmalert --help`.
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
metrics may be analyzed later. metrics may be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview. Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it. Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview. Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard. a review to the dashboard.

BIN
vmalert_cluster.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 73 KiB

BIN
vmalert_ha.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 151 KiB

BIN
vmalert_multicluster.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 122 KiB

BIN
vmalert_single.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 62 KiB