Commit Graph

159 Commits

Author SHA1 Message Date
Roman Khavronenko
a07fa92ef2 vmalert: add new metric vmalert_remotewrite_flush_duration_seconds (#1622) 2021-09-16 14:01:24 +03:00
Roman Khavronenko
286b79dc48 vmalert: create basic auth config only if args aren't empty (#1618)
* vmalert: create basic auth config only if args aren't empty

follow-up after 68721f6

* vmalert: make lint happy
2021-09-15 09:35:38 +03:00
Aliaksandr Valialkin
63fb5a5ca5 docs/vmalert.md: follow-up after 68721f6e7d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:45 +03:00
Roman Khavronenko
eea6317d40 vmalert: support bearer token for datasource, remotewrite and remoteread (#1614)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:43 +03:00
Aliaksandr Valialkin
2932f495f3 docs/CHANGELOG.md: document 5494bc02a6 2021-09-13 17:14:45 +03:00
Roman Khavronenko
7df731ea91 vmalert: add flag to limit the max value for auto-resovle duration for alerts (#1609)
* vmalert: add flag to limit the max value for auto-resovle duration for alerts

The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
2021-09-13 17:14:45 +03:00
Roman Khavronenko
9dd121db36 vmalert: display extra filter labels in UI (#1613) 2021-09-13 14:17:51 +03:00
Aliaksandr Valialkin
e1632dc0df docs/vmalert.md: typo fix in Multitenancy chapter 2021-09-10 17:57:43 +03:00
Aliaksandr Valialkin
d72593cca2 app/vmalert: document GroupAlerts
This makes golint happy
2021-09-07 22:50:48 +03:00
Aliaksandr Valialkin
608ed19655 app/vmalert: follow-up after 21f022e5f0 2021-09-07 22:45:47 +03:00
Roman Khavronenko
bb26aa9b1f vmalert: add initial UI implementation (#1602)
New UI pages:
/ - welcome page with API handlers list;
/groups - list of all rules per group;
/alerts - list of all active alerts;
/groupID/alertID/status - status of the active alert;
2021-09-07 22:45:45 +03:00
Roman Khavronenko
1cf4f5a715 Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-09-01 12:20:01 +03:00
Nikolay
210c76b51e adds external_labels per group for vmalert (#1485)
* adds external_label per group for vmalert
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-09-01 12:19:49 +03:00
Roman Khavronenko
1cb7037fc8 Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-09-01 12:19:34 +03:00
Aliaksandr Valialkin
ff4c7c1a3d docs/vmalert.md: run make docs-sync after 9ee3d0378f 2021-08-21 20:25:26 +03:00
Roman Khavronenko
0c2284b95f vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:23:22 +03:00
Alexander Rickardsson
9e2e9d83a5 vmalert: accept http.StatusOK for remotewrite (#1550) 2021-08-21 20:23:22 +03:00
Aliaksandr Valialkin
fe8c462044 app/vmalert: mention -remoteWrite.disablePathAppend in the description for -remoteWrite.url 2021-08-16 15:23:39 +03:00
Aliaksandr Valialkin
21974cb571 app/vmalert: follow-up for 2400f85761 2021-08-16 15:20:35 +03:00
Alexander Rickardsson
d27dc3721b vmalert: enable configuring explicit path (#1536)
* vmalert: allow to disable automatically added path to remote write address via disablePathAppend flag
* docs: update docs to include remoteWrite.disablePathAppend
2021-08-16 14:58:05 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Roman Khavronenko
d5ba8248cc vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 10:02:35 +03:00
Qifei Wan
095bb90879 app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 16:12:48 +03:00
assassins
6ab0001a1f Performance optimization (#1481)
There are redundant steps
2021-07-28 19:29:22 +03:00
Aliaksandr Valialkin
d98e22fe50 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:36:22 +03:00
Aliaksandr Valialkin
acb7a95c64 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4
2021-07-07 17:45:09 +03:00
Aliaksandr Valialkin
ceda2b1df4 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:11:29 +03:00
Roman Khavronenko
c5f493db8e Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-14 11:43:01 +03:00
Roman Khavronenko
f3cb2158a3 vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:30:07 +03:00
Aliaksandr Valialkin
f3749dedba docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:30:54 +03:00
Roman Khavronenko
5aa7846900 vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:30:54 +03:00
Roman Khavronenko
e183a5c532 vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-26 12:24:27 +03:00
Roman Khavronenko
beee24ecee vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 14:15:49 +03:00
Nikolay
23a6c9c016 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
d77db9d813 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
69e365cd48 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:30:24 +03:00
Aliaksandr Valialkin
74ef40034c lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:25:27 +03:00
Aliaksandr Valialkin
1668280e67 docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:25:21 +03:00
Aliaksandr Valialkin
6ea191d196 docs: dealay -> delay 2021-05-18 01:07:32 +03:00
Roman Khavronenko
3428df6f15 vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:51:48 +03:00
Aliaksandr Valialkin
a6cb4f10a7 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:13:34 +03:00
Aliaksandr Valialkin
cca9670573 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:58 +03:00
Roman Khavronenko
a7f00101f5 vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:18:56 +03:00
Roman Khavronenko
35237fe1f5 vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:10:32 +03:00
Aliaksandr Valialkin
07bc021f58 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 19:53:45 +03:00
Roman Khavronenko
bb7e113dd4 vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 12:24:32 +03:00
Aliaksandr Valialkin
0a2e746175 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:40 +03:00
Roman Khavronenko
7394967841 vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 11:47:11 +03:00
Roman Khavronenko
6fbedd62b8 vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:03:50 +03:00
Aliaksandr Valialkin
daf2778025 docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:56:47 +03:00