---
weight: 12
menu:
  docs:
    parent: victoriametrics
    weight: 12
title: vmalert-tool
aliases:
  - /vmalert-tool.html
---
VMAlert command-line tool
## Unit testing for rules
You can use `vmalert-tool` to run unit tests for alerting and recording rules.
It performs the following actions:
* sets up an isolated VictoriaMetrics instance;
* simulates the periodic ingestion of time series;
* queries the ingested data to evaluate recording and alerting rules, as [vmalert](https://docs.victoriametrics.com/vmalert/) does;
* checks whether the firing alerts or resulting recording rules match the expected results.
Run `vmalert-tool` in unit-test mode as follows:
```
# Run vmalert-tool with one or multiple test files via `--files` cmd-line flag
# Supports file paths with hierarchical patterns and regular expressions, and http urls.
./vmalert-tool unittest --files /path/to/file --files http://<some-server-addr>/path/to/test.yaml
```
`vmalert-tool unittest` is compatible with the [Prometheus config format for tests](https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/#test-file-format)
except for the `promql_expr_test` field. Use the `metricsql_expr_test` field name instead. The name is different because vmalert-tool
validates and executes [MetricsQL](https://docs.victoriametrics.com/metricsql/) expressions,
which aren't always backward compatible with [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).
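
As an illustration of the field rename, here is a minimal sketch of a test file. The file name `rules.yaml`, the `demo` job label and the sample values are assumptions made up for this example; the layout otherwise mirrors a Prometheus test file.

```yaml
# Hypothetical test file: the same layout as a Prometheus unit test,
# except that expression tests live under `metricsql_expr_test`.
rule_files:
  - rules.yaml            # assumed to exist next to this file
tests:
  - interval: 1m
    input_series:
      - series: 'up{job="demo"}'
        values: '1x5'     # 1 1 1 1 1 1
    metricsql_expr_test:  # named `promql_expr_test` in Prometheus
      - expr: up
        eval_time: 5m
        exp_samples:
          - labels: 'up{job="demo"}'
            value: 1
```

A Prometheus test file should need only this one key renamed, provided its expressions are also valid MetricsQL.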
### Limitations
* vmalert-tool evaluates all the groups defined in `rule_files` using `evaluation_interval` (default `1m`) instead of the `interval` set under each rule group.
* vmalert-tool shares the same limitation as [vmalert](https://docs.victoriametrics.com/vmalert/#limitations) on chaining rules under one group:

>by default, rules execution is sequential within one group, but persistence of execution results to remote storage is asynchronous. Hence, user shouldn't rely on chaining of recording rules when result of previous recording rule is reused in the next one;

For example, suppose recording rule A and alerting rule B are in the same group, and rule B's expression is based on A's results.
Rule B won't get the latest data of A, since that data hasn't been persisted to remote storage yet.
The workaround is to split them into two groups and put groupA before groupB (or use `group_eval_order` to define the evaluation order).
This way, vmalert-tool makes sure that the results of groupA are written to storage before groupB is evaluated:
```yaml
groups:
  - name: groupA
    rules:
      - record: A
        expr: sum(xxx)
  - name: groupB
    rules:
      - alert: B
        expr: A >= 0.75
        for: 1m
```
### Test file format
The configuration format for files specified in `--files` cmd-line flag is the following:
```yaml
# Path to the files or http url containing [rule groups](https://docs.victoriametrics.com/vmalert/#groups) configuration.
# Enterprise version of vmalert-tool supports S3 and GCS paths to rules.
rule_files:
  [ - <string> ]
# The evaluation interval for rules specified in `rule_files`
[ evaluation_interval: <duration> | default = 1m ]
# Groups listed below are evaluated in the given order.
# Not all groups need to be listed; groups that are not listed are evaluated
# in the order they are defined in rule_files.
group_eval_order:
  [ - <string> ]
# The list of unit test files to be checked during evaluation.
tests:
  [ - <test_group> ]
```
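
A minimal sketch of the top-level layout with an explicit evaluation order. It reuses the `groupA` and `groupB` names from the Limitations section; the `rules.yaml` file name and the `30s` interval are assumptions for illustration only.

```yaml
# Hypothetical top-level section of a test file.
rule_files:
  - rules.yaml           # assumed to define groupA and groupB
evaluation_interval: 30s # applies to every group, regardless of its own `interval`
group_eval_order:
  - groupA               # recording rules that groupB depends on
  - groupB
# `tests:` with one or more test groups follows here; omitted in this sketch.
```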
#### `<test_group>`
```yaml
# Interval between samples for input series
[ interval: <duration> | default = evaluation_interval ]
# Time series to persist into the database according to configured <interval> before running tests.
input_series:
  [ - <series> ]
# Name of the test group, optional
[ name: <string> ]
# Unit tests for alerting rules
alert_rule_test:
  [ - <alert_test_case> ]
# Unit tests for MetricsQL expressions.
metricsql_expr_test:
  [ - <metricsql_expr_test> ]
# external_labels is not accessible for [templating](https://docs.victoriametrics.com/vmalert/#templating), use "-external.label" cmd-line flag instead.
# Will be deprecated soon, check https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6735 for details.
external_labels:
  [ <labelname>: <string> ... ]
```
#### `<series>`
```yaml
# series in the following format '<metric name>{<label name>=<label value>, ...}'
# Examples:
# series_name{label1="value1", label2="value2"}
# go_goroutines{job="prometheus", instance="localhost:9090"}
series: <string>
# values support several special equations:
# 'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) … a+(c*b)'
# Read this as series starts at a, then c further samples incrementing by b.
# 'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) … a-(c*b)'
# Read this as series starts at a, then c further samples decrementing by b (or incrementing by negative b).
# '_' represents a missing sample from scrape
# 'stale' indicates a stale sample
# Examples:
# 1. '-2+4x3' becomes '-2 2 6 10' - series starts at -2, then 3 further samples incrementing by 4.
# 2. '1-2x4' becomes '1 -1 -3 -5 -7' - series starts at 1, then 4 further samples decrementing by 2.
# 3. '1x4' becomes '1 1 1 1 1' - shorthand for '1+0x4', series starts at 1, then 4 further samples incrementing by 0.
# 4. '1 _x3 stale' becomes '1 _ _ _ stale' - the missing sample cannot increment, so 3 missing samples are produced by the '_x3' expression.
values: <string>
```
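
The expanding notation can be combined within one `values` string. The fragment below is a sketch with made-up series names; the expansions in the comments follow the rules described above.

```yaml
# Hypothetical input series for one test group.
input_series:
  # Starts at 0, then 5 further samples incrementing by 10: 0 10 20 30 40 50
  - series: 'http_requests_total{job="api"}'
    values: '0+10x5'
  # Three samples of 1, two missing samples, then a staleness marker: 1 1 1 _ _ stale
  - series: 'up{job="api"}'
    values: '1x2 _x2 stale'
```

Samples are spaced by the test group's `interval`, so with `interval: 1m` the first series covers the range from `0m` to `5m`.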
#### `<alert_test_case>`
By default, vmalert adds the `alertgroup` and `alertname` labels to the generated alerts and time series.
Each `<alert_test_case>` must therefore specify both `groupname` and `alertname`,
but these labels don't need to be repeated under `exp_alerts`.
You can also pass `--disableAlertgroupLabel` to skip the `alertgroup` check.
```yaml
# The time elapsed from time=0s when this alerting rule should be checked.
# Means this rule should be firing at this point, or shouldn't be firing if 'exp_alerts' is empty.
eval_time: <duration>
# Name of the group to be tested.
groupname: <string>
# Name of the alert to be tested.
alertname: <string>
# List of the expected alerts that are firing under the given alertname at
# the given evaluation time. If you want to test if an alerting rule should
# not be firing, then you can mention only the fields above and leave 'exp_alerts' empty.
exp_alerts:
  [ - <alert> ]
```
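
The following sketch checks one alert at two points in time. It assumes a hypothetical group `demo` containing an alert `QueueBacklog` with the expression `queue_length > 100`, a `severity: warn` label, no annotations and no `for` clause; none of these names come from vmalert-tool itself.

```yaml
# Hypothetical test group for the assumed `QueueBacklog` alert.
tests:
  - interval: 1m
    input_series:
      - series: 'queue_length{queue="emails"}'
        values: '0+50x10'   # 0 50 100 150 ... 500
    alert_rule_test:
      # At 1m the value is 50, so no alert is expected.
      - eval_time: 1m
        groupname: demo
        alertname: QueueBacklog
        exp_alerts: []
      # At 5m the value is 250, so one alert is expected.
      # `alertgroup` and `alertname` are not listed under exp_labels.
      - eval_time: 5m
        groupname: demo
        alertname: QueueBacklog
        exp_alerts:
          - exp_labels:
              queue: emails
              severity: warn
```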
#### `<alert>`
```yaml
# These are the expanded labels and annotations of the expected alert.
# Note: labels also include the labels of the sample associated with the alert
exp_labels:
  [ <labelname>: <string> ]
exp_annotations:
  [ <labelname>: <string> ]
```
#### `<metricsql_expr_test>`
```yaml
# Expression to evaluate
expr: <string>
# The time elapsed from time=0s when this expression should be evaluated.
eval_time: <duration>
# Expected samples at the given evaluation time.
exp_samples:
  [ - <sample> ]
```
#### `<sample>`
```yaml
# Labels of the sample in usual series notation '<metric name>{<label name>=<label value>, ...}'
# Examples:
# series_name{label1="value1", label2="value2"}
# go_goroutines{job="prometheus", instance="localhost:9090"}
labels: <string>
# The expected value of the MetricsQL expression.
value: <number>
```
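
As a sketch of how `exp_samples` relates to the expression result, the fragment below tests a raw selector and an aggregation over the same input. The series and label values are assumptions for illustration. The expected samples follow from the usual query semantics, where aggregation drops the metric name and the labels that are not grouped by.

```yaml
# Hypothetical test group; merge it into a complete test file to run it.
tests:
  - interval: 1m
    input_series:
      - series: 'up{job="api", instance="a"}'
        values: '1x10'
      - series: 'up{job="api", instance="b"}'
        values: '1x10'
    metricsql_expr_test:
      # A plain selector keeps the metric name and all labels.
      - expr: 'up{instance="a"}'
        eval_time: 5m
        exp_samples:
          - labels: 'up{job="api", instance="a"}'
            value: 1
      # The aggregation leaves only the `job` label.
      - expr: 'sum(up) by (job)'
        eval_time: 5m
        exp_samples:
          - labels: '{job="api"}'
            value: 2
```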
### Example
This is an example input file for unit testing which will pass.
`test.yaml` is the test file which follows the syntax above and `rules.yaml` contains the alerting and recording rules.
With `rules.yaml` in the same directory as `test.yaml`, run `./vmalert-tool unittest --files=./unittest/testdata/test.yaml -external.label=cluster=prod`.
#### `test.yaml`
```yaml
rule_files:
  - rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: "0+0x1440"
    metricsql_expr_test:
      - expr: subquery_interval_test
        eval_time: 4m
        exp_samples:
          - labels: '{__name__="subquery_interval_test", cluster="prod", instance="localhost:9090", job="prometheus"}'
            value: 1
    alert_rule_test:
      - eval_time: 2h
        groupname: group1
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              job: prometheus
              severity: page
              instance: localhost:9090
              cluster: prod
            exp_annotations:
              summary: "Instance localhost:9090 down"
              description: "localhost:9090 of job prometheus in cluster prod has been down for more than 5 minutes."
      - eval_time: 0
        groupname: group1
        alertname: AlwaysFiring
        exp_alerts:
          - exp_labels:
              cluster: prod
      - eval_time: 0
        groupname: group1
        alertname: InstanceDown
        exp_alerts: []
```
#### `rules.yaml`
```yaml
# This is the rules file.
groups:
  - name: group1
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} in cluster {{ $externalLabels.cluster }} has been down for more than 5 minutes."
      - alert: AlwaysFiring
        expr: 1
  - name: group2
    rules:
      - record: job:test:count_over_time1m
        expr: sum without(instance) (count_over_time(test[1m]))
      - record: subquery_interval_test
        expr: count_over_time(up[5m:])
```