Commit Graph

208 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
d375d9b878 lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:29:33 +03:00
Roman Khavronenko
7416fdaa8b
vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 09:59:46 +03:00
Qifei Wan
fa9c5c5940
app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 12:55:29 +03:00
assassins
a483044557
Performance optimization (#1481)
There are redundant steps
2021-07-28 19:26:20 +03:00
Aliaksandr Valialkin
bfba4c28a4 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:35:17 +03:00
Aliaksandr Valialkin
c5f0b454f0 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
2021-07-07 17:43:35 +03:00
Aliaksandr Valialkin
766edbc421 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:09:40 +03:00
Roman Khavronenko
6d5a8c28cd
Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-11 13:25:53 +03:00
Roman Khavronenko
7adfe878e1
vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:22:05 +03:00
Aliaksandr Valialkin
ab15bf8c90 docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:27:34 +03:00
Roman Khavronenko
2a259ef5e7
vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:20:38 +03:00
Roman Khavronenko
d210958fd0
vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-25 14:27:22 +01:00
Roman Khavronenko
84cc0513e1
vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 00:26:01 +03:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Nikolay
d626c5c2a9
changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 13:55:43 +03:00
Aliaksandr Valialkin
4c7bb75fa2 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:27:10 +03:00
Aliaksandr Valialkin
f4719889da lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:26:16 +03:00
Aliaksandr Valialkin
b30925738b docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:26:14 +03:00
Aliaksandr Valialkin
6c944b86d8 docs: dealay -> delay
Thanks to @jelmd . See 0b7e3510c8 (r50884991)
2021-05-18 01:07:52 +03:00
Roman Khavronenko
b38edec7ee
vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-15 13:25:57 +03:00
Aliaksandr Valialkin
10a47af631 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:12:24 +03:00
Aliaksandr Valialkin
229d9d6dd7 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:26 +03:00
Roman Khavronenko
5c448126dc
vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:11:45 +03:00
Roman Khavronenko
4247168a2d
vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:06:31 +03:00
Aliaksandr Valialkin
237e9f9fd7 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 15:59:35 +03:00
Roman Khavronenko
9cdd4696fe
vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 10:07:19 +03:00
Aliaksandr Valialkin
508f500477 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:01 +03:00
Roman Khavronenko
0f988e5a31
vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 10:31:45 +03:00
Roman Khavronenko
afca7b430c
vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:01:05 +03:00
Aliaksandr Valialkin
4394dc6cbb docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:54:41 +03:00
Roman Khavronenko
f3a048288e
Vmalert: adjust time param for datasource queries according to evaluationInterval (#1257)
* Simplify arguments list for fn `queryDataSource` to improve readbility

* vmalert: adjust `time` param according to rule evaluation interval

With this change, vmalert will start to use rule's evaluation interval
for truncating the `time` param. This is mostly needed to produce consistent
time series with timestamps unaffected by vmalert start time. Now, timestamp
becomes predictable.
Additionally, adjustment is similar to what Grafana does for plotting range graphs.
Hence, recording rule series and recording rule expression plotted in grafana
suppose to become similar in most of cases.
2021-04-30 09:46:03 +03:00
Aliaksandr Valialkin
6fa5981e68 app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:36:43 +03:00
Nikolay
15609ee447
changes vmalert Querier with per rule querier (#1249)
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
2021-04-28 21:41:15 +01:00
Roman Khavronenko
87018650dd
vmalert: keep the returned timestamp when persisting recording rule (#1245)
Previously, vmalert used `lastExecTime` timestamp when writing recording rules
to the remote storage. This may be incorrect, if vmalert uses `datasource.lookback` flag,
which means rule's expression will be executed at some moment in the past.
To avoid such situations, vmalert now will use returned timestamp instead of `lastExecTime`.
2021-04-26 01:03:22 +03:00
Aliaksandr Valialkin
6bc52fe41a all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:16:17 +03:00
Aliaksandr Valialkin
5720b283fa all: consistency renaming Victoria Metrics -> VictoriaMetrics
VMInsert -> vminsert
VMSelect -> vmselect
VMStorage -> vmstorage
2021-04-20 11:45:48 +03:00
Aliaksandr Valialkin
663a91bb82 docs: update -help output after the commit 77be3e3a82 2021-04-12 12:34:59 +03:00
Artem Navoiev
77be3e3a82
improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:04 +03:00
Roman Khavronenko
4ed8de62ac
vmalert: document template functions and mention them in README (#1197) 2021-04-08 18:19:08 +03:00
Roman Khavronenko
c6a8ebb11f
docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:39:16 +03:00
Aliaksandr Valialkin
0db901617d app: do not process non-GET requests on at / handler 2021-04-02 22:54:06 +03:00
Aliaksandr Valialkin
1db1a29ffa all: increase minimum supported Go version for building VictoriaMetrics components from v1.14 to v1.15
This is needed after the commit c0ac740f93, which uses URL.Redacted() method,
which has been added in v1.15.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:04:53 +03:00
Aliaksandr Valialkin
f6529f932a docs: add a link to the repository from build instruction for all the VictoriaMetrics components 2021-03-25 17:14:42 +02:00
Aliaksandr Valialkin
c3c3e51f17 docs/vmalert.md: remove misleading -evaluationInterval=3s from example config args
3s evaluation interval is too small for practical setups. It can result in increased load on datasource.
So it is better to remove it from example config args, which are usually copy-pasted by novice users.
2021-03-25 15:29:06 +02:00
Aliaksandr Valialkin
aba955fa16 Makefile: prepare vmutils-windows-*.zip archive on make release-vmutils command
The archive contains the following executables for Windows:

* vmagent
* vmalert
* vmauth
* vmctl

Other components - vmbackup, vmrestore, victoria-metrics - aren't supported for Windows yet
2021-03-16 20:52:41 +02:00
Aliaksandr Valialkin
85a95bf60c all: various fixes in command-line flag descriptions 2021-03-15 21:59:25 +02:00
Ihor Borodin
d212f35ae8
Fixing examples of external.alert.source in documentation (#1120)
* Fixing examples of external.alert.source in documentation
2021-03-09 20:49:50 +00:00
Aliaksandr Valialkin
201b685b13 all: bump minimum supported Go version from 1.13 to 1.14 2021-03-03 15:57:13 +02:00
Aliaksandr Valialkin
e09a245b2b app/vmalert/README.md: sync with docs/vmalert.md 2021-03-03 10:43:59 +02:00
Aliaksandr Valialkin
baefe5a8ad docs: actualize -help output 2021-03-01 17:01:27 +02:00
Nikolay
317b0cbed2
adds query params for vmalert (#1094)
remoteWrite.url now accepts query params at provided url
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1087
2021-02-27 10:04:58 +00:00
Aliaksandr Valialkin
03ebc028f7 app/vmalert: add missing multiarch Dockerfile 2021-02-18 15:23:17 +02:00
Roman Khavronenko
2ff038d841
vmalert: mention -datasource.appendTypePrefix in README (#1052) 2021-02-03 23:44:37 +02:00
Dmitry Shevchuk
2baf98082b
Adds ability to query right vmselect endpoint based on the query type (#1050)
* Adds ability to query right vmselect endpoint based on the query type

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2021-02-03 21:26:30 +00:00
Aliaksandr Valialkin
a2344ef4b7 docs/vmalert.md: mention that type option can be set at group level additionally to rule level 2021-02-03 21:13:13 +02:00
Aliaksandr Valialkin
5d87dbfd65 docs: document ability to query Graphite datasource from vmalert 2021-02-01 15:26:33 +02:00
Nikolay
195341a7cf Graphite vmalert wip (#112)
* init implementation for graphite alerts

* adds graphite support for vmalert

* small fix

* changes vmalert graphite api with type

* updates tests

* small fix

* fixes graphite parse

* Fixes graphite from time
2021-02-01 15:05:32 +02:00
weng zhao
cc3e69e963
vmalert: add option datasource.queryStep to allow user to address the inconsistency between grafana dashboards(query_range with step 15s usually) and ALERTS (#1027)
Co-authored-by: zhao.weng <zhao.weng@shopee.com>
2021-01-26 08:12:04 +00:00
Roman Khavronenko
2e2e4f7e21
vmalert-989: return non-empty result in template func query stub to pass validation (#1002)
On templates validation stage vmalert does not acutally send queries, so for complex
chained expression validation may fail. To avoid this, we add a blank sample in response
so validation can pass successfully. Later, during the rule execution, stub will be replaced
with real `query` function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989
2021-01-10 02:56:11 +03:00
Nikolay
1de15ad490
adds escape for CRLF (#984)
at external.alert.source - \n and \r symbols was url encoded, instead of direct usage.
replace it from "\n" to `\n`  allows to skip url encoding.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
2020-12-25 11:03:13 +02:00
Aliaksandr Valialkin
0326638c90 app/vmalert: typo fix in descriptions for notifier.basicAuth.username and notifier.basicAuth.password command-line flags 2020-12-24 12:48:59 +02:00
Aliaksandr Valialkin
9df60518bb docs: mention that it is possible to set multiple -notifier.tlsInsecureSkipVerify command-line flags for vmalert
See c3a92968343c2b3619f1ab935702d0e9b3a46733
2020-12-22 22:32:13 +02:00
Nikolay
c270f8f3e6
changes vmalert notifier flag, (#978)
fixes issue with notifier insecure setting, now its possible to use multiple notifier.tlsInsecureSkipVerify multiple time.
2020-12-22 23:23:04 +03:00
Roman Khavronenko
404cbd1522
vmalert-974: fix order for labels templating (#975)
The change fixes bug caused by 3adf8c5a6f.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/974
2020-12-19 14:10:59 +02:00
Roman Khavronenko
6247884057
vmalert: add function "query", "first" and "value" to alert templates functions (#960)
The commit adds a support for template function `query`,
`first` and `value`. The function `query` executes
a MetricsQL query for active alerts. In vmalert we
update templates on every evaluation for active alerts
to keep them up to date. With `query` func it may become
a perf issue since it will fire a query on every execution.
We should keep it in mind for now.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539
2020-12-14 20:11:45 +02:00
Aliaksandr Valialkin
9d42546a27 docs: consistently use links to https://victoriametrics.github.io for documentation references 2020-12-11 21:08:18 +02:00
Aliaksandr Valialkin
4146fc4668 all: properly handle CPU limits set on the host system/container
This can reduce memory usage on systems with enabled CPU limits.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:29 +02:00
Aliaksandr Valialkin
b6b1b06d70 app/{vmalert,vmagent}: skip empty values in -remoteWrite.label and -label lists 2020-12-08 14:55:13 +02:00
Aliaksandr Valialkin
9d787f9edd all: do not print usage info for all the flags when incorrect command-line flag is passed
This should improve usability for VictoriaMetrics apps that have big number of command-line flags,
i.e. all the apps.
2020-12-03 21:47:37 +02:00
Nikolay
0463cb5550
fixes checksum calculation (#928)
* fixes checksum calculation,
'for' rule param wasnt marshal properly during checksum calculation

* fixes error
2020-11-29 09:48:42 +02:00
Aliaksandr Valialkin
47a038401b all: consistently return text-based HTTP responses with charset=utf-8
This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2020-11-13 10:35:41 +02:00
Roman Khavronenko
3adf8c5a6f
vmalert: explicitly set extra labels to alert entities (#886)
The previous implementation treated extra labels (global and rule labels) as
separate label set to returned time series labels. Hence, time series always contained
only original labels and alert ID was generated from sorted labels key-values.
Extra labels didn't affect the generated ID and were applied on the following actions:
- templating for Summary and Annotations;
- persisting state via remote write;
- restoring state via remote read.

Such behaviour caused difficulties on restore procedure because extra labels had to be dropped
before checking the alert ID, but that not always worked. Consider the case when expression
returns the following time series `up{job="foo"}` and rule has extra label `job=bar`.
This would mean that restored alert ID will be always different to the real time series because
of collision.

To solve the situation extra labels are now always applied beforehand and `vmalert` doesn't
store original labels anymore. However, this could result into a new error situation.
Consider the case when expression returns two time series `up{job="foo"}` and `up{job="baz"}`,
while rule has extra label `job=bar`. In such case, applying extra labels will result into
two identical time series and `vmalert` will return error:
 `result contains metrics with the same labelset after applying rule labels`

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-11-10 00:27:32 +02:00
Roman Khavronenko
f0bdc5716e
vmalert: skip automatically added labels on alerts restore (#871)
Label `alertgroup` was introduced in #611 and automatically added to generated
time series. By mistake, this new label wasn't correctly purged on restore event
and affected alert's ID uniqueness. This commit removes `alertgroup` label
in restore function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-10-30 08:18:20 +00:00
kreedom
ef77120170
vmalert - add dryRun (#842)
vmalert: add `dryRun` flag for rules validation without running the service
2020-10-20 08:15:21 +01:00
Roman Khavronenko
bc42b5598f
vmalert: update docs to highlight the state restore requirements; (#833)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/830
2020-10-13 18:32:43 +03:00
Aliaksandr Valialkin
f4e8687c88 app/vmalert: accept days, weeks and years in for: part of config like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817
2020-10-08 20:13:15 +03:00
Aliaksandr Valialkin
d423d73251 app/vmalert: do not pring description for all the flags on config errors
The description is too big to consume by human and it just distracts humans.
2020-10-08 13:35:57 +03:00
Dmitry Shihovtsev
92e5d89fc9
Fix typos in the vmalert datasource (#814)
* Fix typos in the vmalert datasource

* Fix typo in the vmalert datasource test
2020-10-07 17:59:50 +03:00
Roman Khavronenko
daa2d1c065
vmalert: make maxIdleConnections configurable for datasource HTTP client (#797)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/795
2020-09-30 09:49:45 +03:00
Aliaksandr Valialkin
2985077c35 all: consistently use "%w" formatting in fmt.Errorf for wrapped errors 2020-09-23 22:46:34 +03:00
Aliaksandr Valialkin
1e1a27d803 app/vmalert: remove unneeded UTC() call
UTC() doesn't change the underlying timestamp, so the call isn't needed here
2020-09-21 15:55:59 +03:00
Roman Khavronenko
5dffc7a553
vmalert: add support for datasource.lookback flag (#779)
New datasource flag `datasource.lookback` defines how far to look into
past when evaluating queries.

Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/668
2020-09-21 15:53:49 +03:00
Roman Khavronenko
82c3bbce34
vmalert: fix the typo in error message (#782)
The error will be always nil so no sense in printing it.
2020-09-21 11:34:23 +03:00
Roman Khavronenko
6ad6480400
vmalert: add Group name as label to generated alerts and timeseries (#761)
Solves #611
2020-09-11 20:52:56 +01:00
Roman Khavronenko
4cdffb04a4
vmalert: update groups on config reload only if changes detected (#759)
On config reload event `vmalert` reloads configuration for every group. While
it works for simple configurations, the more complex and heavy installations may
suffer from frequent config reloads.
The change introduces the `checksum` field for every group and is set to md5 hash
of yaml configuration. The checksum will change if on any change to group
definition like rules order or annotation change. Comparing the `checksum` field
on config reload event helps to detect if group should be updated.
The groups update is now done concurrently, so reload duration will be limited by
the slowest group now.

Partially solves #691 by improving config reload speed.
2020-09-11 20:14:30 +01:00
Aliaksandr Valialkin
d7c04db1fc docs: sync docs for vmalert, vmauth, vmbackup and vmrestore 2020-09-09 21:10:34 +03:00
Nikolay Khramchikhin
764b3d4fda
changed vmalert behaviour (#738)
* VMAlert start with empty rules dir

There are some applications (operator for instance), that generates alerts configuration at runtime
and vmalert must start correctly without rules to support this behaviour.
Later application will add rules files and send SIGHUP to vmalert,
which will trigger reading rules files and start rules exectuion.

Removing rules files with SIGHUP signal must stop rules execution and
vmalert will wait for new rules.

* imports sorted

* added test cases for empty rules, removed blank line

* fixed imports conflict

* updated tests
2020-09-03 11:04:42 +03:00
Aliaksandr Valialkin
5f16ceb294 app/vmalert: imrovements over 3f932c2db1 2020-09-03 01:00:55 +03:00
DexterZhang
3f932c2db1
feat: spread load of rule evaluation by group when starting new groups (#724)
* feat: spread load of rule evaluation by group when starting new groups

* review: reduce the resulting diff.

* Update app/vmalert/group.go

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2020-09-03 00:58:54 +03:00
Roman Khavronenko
10601bc652
vmalert: update -rule flag description to enforce quotes using (#709)
Description for `-rule` flag uses as example specific chars like asterisks
which could be interpreted wrong by different shells. To avoid this, description
now contains quoted flag values.

See also #708
2020-08-20 22:36:38 +01:00
Aliaksandr Valialkin
184670fb9b docs: update docs 2020-08-14 19:13:42 +03:00
Aliaksandr Valialkin
c402265e88 all: support %{ENV_VAR} placeholders in yaml configs in all the vm* components
Such placeholders are substituted by the corresponding environment variable values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/583
2020-08-13 17:15:25 +03:00
Aliaksandr Valialkin
ef7e2af8f5 app: respect CPU limits set via cgroups
Update GOMAXPROCS to limits set via cgroups. This should reduce CPU trashing and reduce memory usage
for cases when VictoriaMetrics components run in containers with CPU limits.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685
2020-08-11 22:59:19 +03:00
Roman Khavronenko
0be5b09fb4
app/vmalert: extend metrics set exported by vmalert #573 (#654)
* app/vmalert: extend metrics set exported by `vmalert` #573

New metrics were added to improve observability:
+ vmalert_alerts_pending{alertname, group} - number of pending alerts per group
per alert;
+ vmalert_alerts_acitve{alertname, group} - number of active alerts per group
per alert;
+ vmalert_alerts_error{alertname, group} - is 1 if alertname ended up with error
during prev execution, is 0 if no errors happened;
+ vmalert_recording_rules_error{recording, group} - is 1 if recording rule
 ended up with error during prev execution, is 0 if no errors happened;
* vmalert_iteration_total{group, file} - now contains group and file name labels.
This should improve control over specific groups;
* vmalert_iteration_duration_seconds{group, file} - now contains group and file name labels. This should improve control over specific groups;

Some collisions for alerts and recording rules are possible, because neither
group name nor alert/recording rule name are unique for compatibility reasons.

Commit contains list of TODOs for Unregistering metrics since groups and rules
are ephemeral and could be removed without application restart. In order to
unlock Unregistering feature corresponding PR was filed - https://github.com/VictoriaMetrics/metrics/pull/13

* app/vmalert: extend metrics set exported by `vmalert` #573

The changes are following:
* add an ID label to rules metrics, since `name` collisions within one group is
a common case - see the k8s example alerts;
* supports metrics unregistering on rule updates. Consider the case when one rule
was added or removed from the group, or the whole group was added or removed.

The change depends on https://github.com/VictoriaMetrics/metrics/pull/16
where race condition for Unregister method was fixed.
2020-08-09 09:41:29 +03:00
Aliaksandr Valialkin
d01f3c1943 all: add mssing APP_NAME to vm*-GOARCH builds 2020-07-31 13:42:18 +03:00
Aliaksandr Valialkin
3f498cf2dc docs/{vmagent,vmalert}: add instruction on how to build for ARM 2020-07-31 09:25:22 +03:00
Roman Khavronenko
2f1e7298ce
app/vmalert: support external.label to specify global labelset for all rules #622 (#652)
`external.label` flag supposed to help to distinguish alert or recording rules
source in situations when more than one `vmalert` runs for the same datasource
or AlertManager.
2020-07-28 14:20:31 +03:00
Aliaksandr Valialkin
b35cb293f5 lib/httpserver: log remote address in error message from httpserver.Errorf
This should improve detection of the root cause of errors.
Thanks to Anant for the idea.
2020-07-20 14:11:22 +03:00
Aliaksandr Valialkin
ed7580ad22 app/vmalert: consistently use "%w" instead of "%s" in fmt.Errorf when wrapping errors 2020-07-15 13:56:47 +03:00
Roman Khavronenko
703def4b2e
app/vmalert: add retries to remotewrite (#605)
* app/vmalert: add retries to remotewrite

Remotewrite pkg now does limited number of retries if write request failed.
This suppose to make vmalert state persisting more reliable.

New metrics were added to remotewrite in order to track rows/bytes sent/dropped.

defaultFlushInterval was increased from 1s to 5s for sanity reasons.

* fix

* wip

* wip

* wip

* fix bits alignment bug for 32-bit systems

* fix mistakenly dropped field
2020-07-05 18:46:52 +03:00