Commit Graph

549 Commits

Author SHA1 Message Date
Artem Navoiev
42baf5fe3f change link to the enterprise legal doc
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-09-05 01:53:36 -07:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Roman Khavronenko
ddf87b32ed
vmalert: correctly re-instantinate HTTP req on retries (#4864)
* vmalert: correctly re-instantinate HTTP req on retries

Previosly, request retry to datasource re-used existing HTTP request.
But if request object was already partially processed (body was read),
then retry will be unsuccessful.

The change re-instantinates HTTP request object before retry.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fix

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-24 00:04:05 +02:00
Abirdcfly
835c03fb47
vmalert: fix vmalert_remotewrite_send_duration_seconds_total metric value (#4801)
The deferred call's arguments are evaluated immediately, but the function call is not executed until the surrounding function returns.

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2023-08-10 10:51:44 +08:00
hagen1778
bc5065fd14
vmalert: mention vmalert_iteration_duration_seconds metric in README
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-09 22:03:22 +02:00
Haleygo
262c888d4b
vmalert: fix redundant clean up move (#4803)
Follow-up after 55ae2c2d57
2023-08-09 21:16:00 +02:00
Roman Khavronenko
1d4a0796f4
vmalert: cleanup config reload metrics handling (#4790)
* rename `configErr` to `lastConfigErr` to reduce confusion
* add tests to verify metrics and msg are set properly
* fix mistake when config success metric wasn't restored after an error

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-07 21:58:40 +02:00
Haleygo
527cfa2f74
vmalert: fix uncleaned tmp files in tests (#4788) 2023-08-07 14:24:56 +02:00
Zakhar Bessarab
2290252119
docs: make phrase about dedup and evaluation interval relation less obscure (#4781)
Value of `-dedup.minScrapeInterval` comand-line flag must be higher than `evaluation_interval` in order to make sure that only one sample on each evaluation will be left after deduplication.
Moreover, value of `-dedup.minScrapeInterval` must be a multiple of vmalert's `evaluation_interval` in order to make sure that samples will be aligned between deduplication window periods.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4774#issuecomment-1663940811

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-03 17:04:19 +02:00
Roman Khavronenko
216d4091f7
vmalert: remove deprecated in v1.79.0 web links with */status suffix (#4747)
Links of form `/api/v1/<groupID>/<alertID>/status` were deprecated
in favour of `/api/v1/alerts?group_id=<>&alert_id=<>` links in
v1.79.0. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825

This change removes code responsible for deprecated functionality.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 16:51:41 +02:00
hagen1778
3b524f671b
docs: rm typo of naming vmalert as a stateless service
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 16:47:50 +02:00
Roman Khavronenko
9ede3e996b
vmalert: remove deprecated in v1.61.0 -rule.configCheckInterval (#4745)
Use `-configCheckInterval` command-line flag instead.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 16:39:57 +02:00
Aliaksandr Valialkin
fcf5e33e6a
app/vmalert: use proper timestamp in setConfigSuccess() 2023-07-28 11:15:58 -07:00
Roman Khavronenko
9f1b9b86cc
vmalert: revert unittest feature (#4734)
* Revert "vmalert: unittest support stale datapoint (#4696)"

This reverts commit 0b44df7ec8.

* Revert "docs: specify min version and limitations for vmalert's unit tests"

This reverts commit a24541bd

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "vmalert: init unit test (#4596)"

This reverts commit da60a68d

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: mention unittest revert in changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-28 10:42:02 +02:00
Aliaksandr Valialkin
f8d30a486e
docs/vmalert.md: fix broken links to Web chapter 2023-07-27 18:05:01 -07:00
Aliaksandr Valialkin
6db8f26e2f
app/vmalert: make golangci-lint happy after ae0e4a8c90 2023-07-27 13:27:20 -07:00
Haleygo
ae0e4a8c90
vmalert: add keep_firing_for field for alerting rule (#4669)
vmalert: support `keep_firing_for` field for alerting rule

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4529

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 15:13:13 +02:00
hagen1778
03f04afeda
vmalert: clarify docs for state restore with additional details
The important change is to highlight that restore procedure happens
only once and only for already loaded rules. Config hot-reload
doesn't trigger the restore procedure.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 11:29:17 +02:00
hagen1778
7fe81fe612
vmalert: revert accidental changes to Makefile rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 11:08:44 +02:00
Aliaksandr Valialkin
cc8427f11b
docs: use 1. instead of N. in numbered bullets, so they are automatically adjusted by Github Markdown engine
See https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#lists
2023-07-26 14:39:44 -07:00
Aliaksandr Valialkin
56c17d16f6
app/vmalert/datasource: substitute golang.org/x/exp/slices.SortFunc with sort.Slice
This removes unnecessary third-party dependency on golang.org/x/exp.

This is a follow-up for da60a68d09
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945
2023-07-24 18:58:01 -07:00
Haleygo
0b44df7ec8
vmalert: unittest support stale datapoint (#4696)
* vmalert: unittest support stale datapoint

* add stale ut case
2023-07-24 15:40:14 +02:00
Zakhar Bessarab
866b150f0f
app/vmalert/datasource/graphite: allow overriding "from" parameter for datasource queries (#4687)
* app/vmalert/datasource/graphite: allow overriding "from" parameter for datasource queries

Fixes construction of URL parameters for graphite render to allow overriding "from" parameter.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4685
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmalert/datasource/graphite: update flow for building URL parameters

Makes flow of building URL parameters same as Prometheus datasource has:
1) Setting all default values
2) Merging those values with provided `extraParams`

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-21 14:28:10 +04:00
hagen1778
b9b1661319
docs: fix the next release version for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 18:48:21 +02:00
hagen1778
a24541bdb7
docs: specify min version and limitations for vmalert's unit tests
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 15:32:49 +02:00
Haleygo
da60a68d09
vmalert: init unit test (#4596)
vmalert: support unit tests

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945 
---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 15:07:10 +02:00
Aliaksandr Valialkin
d25082cbd9
app/vmalert/README.md: sync with docs/vmalert.md after 54b7bd4564 2023-07-19 16:30:21 -07:00
Roman Khavronenko
c32a01c52e
docs: follow-up after aec4b5db81 (#4638)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-19 10:10:51 +02:00
Roman Khavronenko
25317b4e70
vmalert: follow-up after d4ac4b7813 (#4659)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-18 15:53:37 +02:00
venkatbvc
d4ac4b7813
vmalert: allow to blackhole alerting notifications (#4639)
vmalert: support option to blackhole alerting notifications

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4122

---------

Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com>
2023-07-18 15:06:19 +02:00
Haleygo
b002e2a743
vmalert: fix evalTS after modify group interval (#4629) 2023-07-14 14:45:24 +02:00
Aliaksandr Valialkin
4f6dc25c71
app/vmalert: silence golagci-lint at TestAlertingRule_Template
Add a break if gotAlert is nil

This removes the following golangci-lint warning:

app/vmalert/alerting_test.go:868:8: SA5011(related information): this check suggests that the pointer can be nil (staticcheck)
				if gotAlert == nil {
				   ^
2023-07-13 11:40:44 -07:00
Roman Khavronenko
cbc28ccdb2
vmalert: check for negative offset for missed rounds (#4628)
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result into cumulative
discrepancy between the actual time and evaluation time for rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-13 17:11:22 +02:00
Zakhar Bessarab
51a9cc9783
docs: make httpAuth.* flags description less ambiguous (#4588)
* docs: make `httpAuth.*` flags description less ambiguous

Currently, it may confuse users whether `httpAuth.*` flags are used by HTTP client or server configuration(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4586 for example).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix a typo

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-07 13:50:13 +02:00
Haleygo
bca8ae034f
vmalert:fix query request using rfc3339 format (#4577)
vmalert: consistently use time.RFC3339 format for time in queries

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-07 10:39:25 +02:00
Aliaksandr Valialkin
9fa8e3895a
docs/vmalert.md: update -help output 2023-07-06 23:13:04 -07:00
Roman Khavronenko
7c8a215a7c
vmalert: allow disabling of step param attached to instant queries (#4574)
vmalert: allow disabling of `step` param attached to instant queries

This might be useful for using vmalert with datasources that to not support this param,
unlike VictoriaMetrics.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-07 07:44:34 +02:00
Dmytro Kozlov
9bde95bfff
app/vmalert: show on UI groups error after reload config (#4543)
show on UI groups error after reload config

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4076

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-03 14:59:52 +02:00
Haleygo
a97887a2d9
vmalert: add vmalert_remotewrite_sent_duration_seconds_total metric (#4517)
add `vmalert_remotewrite_sent_duration_seconds_total` metric
2023-06-26 07:34:51 +02:00
Roman Khavronenko
37c9a631ca
vmalert: make linter happy (#4509)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-22 17:46:12 +02:00
Roman Khavronenko
5f9ad22884
vmalert: update retry policy for pushing data to -remoteWrite.url (#4504)
By default, vmalert will make multiple retry attempts with exponential delay.
The total time spent during retry attempts shouldn't exceed `-remoteWrite.retryMaxTime` (default is 30s).
When retry time is exceeded vmalert drops the data dedicated for `-remoteWrite.url`.
Before, vmalert dropped data after 5 retry attempts with 1s delay between attempts (not configurable).

See `-remoteWrite.retryMinInterval` and `-remoteWrite.retryMaxTime` cmd-line flags.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-06-22 15:14:23 +02:00
Roman Khavronenko
4aad7a43df
vmalert: properly interrupt remotewrite retries on shutdown (#4505)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-22 15:07:32 +02:00
Roman Khavronenko
94f516df43
docs/vmalert: specify version requirements for new features (#4480)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-21 09:44:00 +02:00
Roman Khavronenko
79a5499cb2
vmalert: retry all errors except 4XX status codes (#4461)
vmalert: retry all errors except 4XX status codes

Retry all errors except 4XX status codes while pushing via remote-write
to the remote storage. Previously, errors like broken connection could
prevent vmalert from retrying the request.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-20 13:24:45 +02:00
Roman Khavronenko
28b23e7a4c
docs/vmalert: mention same labelset error in docs (#4443)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-13 17:03:53 +02:00
Roman Khavronenko
c4be49e21b
docs: mention stream aggregation as more efficient approach for aggregation (#4429)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-09 13:47:54 +02:00
Roman Khavronenko
de94812088
vmalert: fix nil map assignment (#4392)
* vmalert: fix nil map assignment

The storage instance with nil map params was created for remote-read purposes.
And before change 7a9ae9de0d this map was ignored in ApplyParams.
Now, it started to be used and vmalert panics in runtime.

The fix properly inits map for at `NewVMStorage` and verifies it is not nil
on assignment in `ApplyParams`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add to changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-02 11:38:55 +02:00
Roman Khavronenko
eccecdf177
app/vmalert: follow-up after 7a9ae9de0d (#4381)
7a9ae9de0d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-01 11:38:48 +02:00
gsakun
20dc3db71e
app/vmalert: fix datasource.roundDigits Parameter (#4341)
app/vmalert: fix querybuild clone and extraParams merge logic

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4340
2023-06-01 09:44:11 +02:00
Roman Khavronenko
1b24f4729b
vmalert: mention default value for external.url flag (#4365)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-30 12:33:17 +02:00
Roman Khavronenko
51cea6cad4
vmalert: properly form assets address if httpPrefix set (#4351)
Properly form path to static assets in WEB UI
 if `http.pathPrefix` set.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4349

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-29 07:38:13 +02:00
Roman Khavronenko
66ed6fe62f
vmalert: do not return nil rules for /api/v1/rules (#4344)
The fix addresses a case when vmalert is configured with a group
which has `name`, but doesn't have `rules` configured. In this
case it still returns a `nil` instead of `[]` slice.

Fixing this via current commit.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-25 16:56:54 +02:00
Roman Khavronenko
76f7e66d8e
vmalert: fix the typo in popup (#4331)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-19 12:45:15 +02:00
Roman Khavronenko
f68d93cca2
vmalert: follow-up after 669becd011 (#4318)
* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-16 18:51:38 +02:00
Michael Hoffmann
3a65f4a733
vmalert: improve retry logic for remote write (#4134)
vmalert should not retry on 4xx status codes
according to https://prometheus.io/docs/concepts/remote_write_spec/
2023-05-16 16:30:03 +02:00
Roman Khavronenko
adc5635a07
vmalert: add hints to filter buttons (#4296)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-11 16:38:08 +02:00
Roman Khavronenko
2caf0b05c6
vmalert: correctly update seriesFetched metric for const exprs (#4287)
Previously, metric `vmalert_alerting_rules_last_evaluation_series_fetched`
would be set to 0 for const expressions, because const expression do not match
any series. This may result into a confusion: no series were matched but response isn't empty.
The change updates the logic behind metric: if no series were matched but there are samples
in response - use amount of samples as number of series.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 15:04:05 +02:00
Roman Khavronenko
51196739af
Vmalert UI updates (#4276)
* vmalert: expand rule groups on anchor click

before, anchor click was only updating the URL.
To expand the group, user had to click on rule's block.
Now, group will toggle automatically.

* vmalert: allow filtering group in web UI

The new filter allows to filter groups and rules within
groups by: errors only or noMatch only.

The filtering supposed to help navigating big numbers of groups/rules.
Filtering is reflected in URL, so can be shared as a link.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 14:38:13 +02:00
Alexander Marshalov
2e494e2375
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin
3c0470f91e
docs/vmalert.md: clarify docs regarding the support of recursive globs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4041
2023-05-08 16:21:44 -07:00
Roman Khavronenko
fa3a17938e
vmalert: follow-up after cae87da (#4269)
* vmalert: follow-up after cae87da

cae87da4bb
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: update struct comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rm typo

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 13:31:54 +02:00
Haleygo
cae87da4bb
vmalert: support reading rule from http url (#4212)
vmalert: support reading rule's config from HTTP URL
2023-05-08 09:52:57 +02:00
Roman Khavronenko
5b8450fc1b
app/vmalert: detect alerting rules which don't match any series at all (#4198)
app/vmalert: detect alerting rules which don't match any series at all

vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4056

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 09:36:39 +02:00
Roman Khavronenko
3383f12a4b
vmalert: fix API to return non-nil values (#4222)
Properly return empty slices instead of nil for `/api/v1/rules` and `/api/v1/alerts` API handlers.
This improves compatibility with Grafana.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-28 10:08:29 +02:00
Roman Khavronenko
eb746a4dab
Revert "http server: limit max concurrent requests (#4185)" (#4215)
This reverts commit 77f76371

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:47 +02:00
Roman Khavronenko
29e059e49c
app/vmalert: follow-up after 6c322b4a00 (#4214)
6c322b4a00

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:21 +02:00
Haleygo
6c322b4a00
vmalert: allow configuring custom notifier headers per group (#4088)
vmalert: allow configuring custom notifier headers per group
2023-04-27 12:17:26 +02:00
Zakhar Bessarab
b21a55febf
app/vmalert: add support of recursive path globs for rules and templates (#4148)
Supports using `**` for `-rule` and `-rule.templates`: `dir/**/*.tpl` loads contents of dir and all subdirectories recursively.

See: #4041

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-04-26 19:20:22 +02:00
Zakhar Bessarab
89a1c941c2
app/vmalert: return an error when using query function in -external.alert.source flag (#4191)
Templating of `-external.alert.source` is not expected to have access to the query which was causing runtime error when query function was passed as nil.
See: #4181

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-26 15:31:14 +02:00
Roman Khavronenko
77f76371d0
http server: limit max concurrent requests (#4185)
* lib/httpserver: introduce `-http.maxConcurrentRequests` command-line flag

Introduce `-http.maxConcurrentRequests` command-line flag to protect
VM components from resource exhaustion during unexpected spikes of HTTP requests.
By default, the new flag's value is set to 0 which means no limits are applied.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/httpserver: mention http.maxConcurrentRequests in docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-24 14:52:06 +02:00
Roman Khavronenko
de61a73c63
vmalert: retry datasource requests with EOF or unexpected EOF errors (#4146)
* vmalert: retry datasource requests with EOF or unexpected EOF errors

Retry failed read request on the closed connection one more time.
This may improve rules execution reliability when connection
between vmalert and datasource closes unexpectedly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: fix old tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-19 10:18:32 +02:00
Zakhar Bessarab
a8d1497024
app/vmalert: update Grafana URLs to match latest format (#4061)
See: #4019

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-04 15:25:29 +04:00
Roman Khavronenko
4a49577028
vmalert: use missingkey=zero for templating (#4040)
Replace empty labels with "" instead of "<no value>"
during templating, as Prometheus does.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4012

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-30 16:57:00 +04:00
Roman Khavronenko
4021aa11b5
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 15:18:25 -07:00
Roman Khavronenko
365d2ff0bf
vmalert: add anchor char to Group's link (#4006)
This should help users to see that Group's name is clickable
and used for anchoring.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:48:43 +01:00
Roman Khavronenko
8db6a71f83
vmalert: mention VMUI example for alert's source (#4005)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:40:55 +01:00
Roman Khavronenko
3283f0dae4
vmalert: support logs suppressing during config reloads (#3973)
* vmalert: support logs suppressing during config reloads

The change is mostly required for ENT version of vmalert,
since it supports object-storage for config files.
Reading data from object storage could be time-consuming,
so vmalert emits logs to track the progress.

However, these logs are mostly needed on start or on
manual config reload. Printing these logs each time
`rule.configCheckInterval` is triggered would too verbose.
So the change allows to control logs emitting during
config reloads.

Now, logs are emitted during start up or when SIGHUP is receieved.
For periodicall config checks logs emitted by config pkg are suppressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-20 16:08:30 +01:00
Roman Khavronenko
8fdd613f25
Vmalert tests (#3975)
* vmalert: add tests for notifier pkg

* vmalert: add tests for remotewrite pkg

* vmalert: add tests for template functions

* vmalert: add tests for web pages

* vmalert: fix int overflow in tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-17 15:57:24 +01:00
Zakhar Bessarab
f7a7eb8f3e
docs: add a note about cache reset for vmalert backfilling docs (#3940)
docs: add a note about cache reset for vmalert backfilling docs
2023-03-10 13:45:11 +01:00
Roman Khavronenko
d66bae212b
app/vmalert: log number of configration files found for each specified -rule (#3936)
The change also introduces `List` method to `FS` interface.
The `List` method can be used for wildcard support in object storage FS.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-03-09 14:46:19 +01:00
Aliaksandr Valialkin
1b5dc9f91d
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:26:55 -08:00
Roman Khavronenko
2472baa934
app/vmalert: do not wait for group start on removal (#3891)
Each group in vmalert starts with an artifical delay to avoid
thundering herd problem. For some groups with high evaluation
intervals, the delay could be significant.
If during this delay user will remove the group from the config
and hot-reload it - vmalert will have to wait until the delay
ends. This results into slow config reloading and UI hang.

The change moves the start-delay logic back to the group's
`start` method. Now, group can immediately exit from the
delay when `group.close()` method is called.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-06 14:04:43 +01:00
Roman Khavronenko
d6fa4da712
vmalert: cancel in-flight requests on group's update or close (#3886)
When group's update() or close() method is called, the group
still need to wait for its current evaluation to finish.
Sometimes, evaluation could take a significant amount of time
which slows configuration update or vmalert's graceful shutdown.

The change interrupts current evaluation in order to speed up
the graceful shutdown or config update procedures.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-01 15:48:20 +01:00
Aliaksandr Valialkin
255a0cf635
all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:36:51 -08:00
Aliaksandr Valialkin
510f78a96b
all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 18:58:46 -08:00
my-git9
9dec3c8f80
chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:53:05 -08:00
Roman Khavronenko
5446ce0018
docs: mention rules replay blogpost in vmalert docs (#3851)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-21 22:16:15 +01:00
Roman Khavronenko
74f122294d
docs: update vmalert docs (#3843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-20 16:39:31 +01:00
Aliaksandr Valialkin
8bd2eee8e0
app/vmalert/README.md: sync with docs/vmalert.md after 6ef6f3a771 2023-02-18 15:21:24 -08:00
Haleygo
6ef6f3a771
vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Roman Khavronenko
a645a95bd6
docs: improve troubleshooting docs for vmalert (#3812)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 17:29:30 +01:00
Oleksandr Redko
9fff48c3e3
app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
73358571ee
app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843
vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Aliaksandr Valialkin
0e0095d350
all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:01:32 -08:00
Roman Khavronenko
e83f14210d
Vmalert fixes (#3788)
* vmalert: use group's ID in UI to avoid collisions

Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: prevent disabling state updates tracking

The minimum number of update states to track is now set to 1.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: display `debug` field for rule in UI

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: exclude `updates` field from json marhsaling

This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 14:34:03 +01:00
Roman Khavronenko
c32d8ea29e
vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Roman Khavronenko
6fd10e8871
vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.

While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Roman Khavronenko
1cbdcd391c
docs: mention -vmalert.proxyURL in vmalert docs (#3730)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-30 16:28:33 +01:00
Aliaksandr Valialkin
0890adde67
docs: update command-line descriptions after 73256fe438 2023-01-27 00:00:37 -08:00