Commit Graph

554 Commits

Author SHA1 Message Date
Hui Wang
abd29c15ab
docs: update vmalert and vmagent docs (#6207)
* restore and actualize doc section explaining duplicated labels error
* rm misleading comment about post-aggregation in stream aggregation

(cherry picked from commit e3c226cf92)
2024-04-30 10:30:19 +02:00
Roman Khavronenko
2566e19306
app/vmalert: fix links with anchors in vmalert's UI (#6146)
Starting from v1.99.0 vmalert could ignore anchors pointing to specific
rule groups if `search` param was present in URL.
This change makes anchors compatible with `search` param in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5f487c7090)
2024-04-22 15:05:23 +02:00
Hui Wang
e0d47ab6af
vmalert: avoid blocking APIs when alerting rule uses template functio… (#6129)
* vmalert: avoid blocking APIs when alerting rule uses template function `query`

* app/vmalert: small refactoring

* simplify labels and templates expanding
* simplify `newAlert` interface
* fix `TestGroupStart` which mistakenly skipped annotations
and response labels check

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* reduce alerts lock time when restore

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-19 11:30:40 +02:00
Roman Khavronenko
95b0f82c9b
app/vmalert: make TestGroupStart more reliable (#6130)
There was a sleep statement in the test, waiting for Group
to perform a couple of evaluation. But looks like
it worked unreliable for some CI tests like the one below
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/8718213844/job/23915007958?pr=6115

This commit changes the sleep statement on a function that
waits for a specific number of evaluations. It should make this
test faster in general case, and more reliable for slow environemnts.
2024-04-19 11:28:30 +02:00
Aliaksandr Valialkin
a99005eff6
all: replace old https://docs.victoriametrics.com/vmalert.html url with the new one - https://docs.victoriametrics.com/vmalert/ 2024-04-18 01:44:54 +02:00
Alexander Marshalov
c61eb6d620
vmalert: support any status code from the range 200-299 from alertmanager as successful (#6111)
* any status code from the range 200-299 from alertmanager to vmalert is not considered an error from now on (#6110)

* add changelog

(cherry picked from commit 7308bad777)
2024-04-16 09:58:09 +02:00
wanshuangcheng
52a4ae0b28
chore: fix function names in comment (#6076)
Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>
2024-04-08 15:38:51 +02:00
Aliaksandr Valialkin
00f59d6ddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-03 03:00:45 +03:00
Hui Wang
fdb6eb1071
vmalert: fix sending alert messages (#6028)
* vmalert: fix sending alert messages
1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered;
2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile.

* docs: update changelog notes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-04-02 23:51:06 +03:00
Jiekun
1e0a079143
app/vmalert: respect batch size limit for remote write on shutdown (#6039)
During shutdown period of vmalert, remotewrite client retrieve all pending time series from buffer queue, compose them into 1 batch and execute remote write.

This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others).

This changes ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write
destination.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025

(cherry picked from commit 623d257faf)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-29 14:30:24 +01:00
Tien M. Nguyen
03fb97d42f
feat: include cluster info in alert CPUThrottlingHigh (#5956) 2024-03-17 20:46:15 +02:00
Hui Wang
349564fd82
vmalert: deprecate cmd-line flag -datasource.lookback (#5877)
* vmalert: deprecate cmd-line flag `-datasource.lookback`

* fix lint

* review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e80b44f19d)
2024-03-12 16:17:40 +01:00
Aliaksandr Valialkin
9d5a385973
app/{vmalert,vmctl}: consistently use http.NewRequestWithContext() instead of http.NewRequest() + req.WithContext() 2024-02-29 15:25:36 +02:00
Aliaksandr Valialkin
63d635a5e4
app: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 03:06:14 +02:00
hagen1778
4474c23aed
app/vmalert: consistently sort groups by name and filename on /groups page
This should prevent non-deterministic sorting for groups with identical names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e2dad3a2ac)
2024-02-20 13:51:31 +01:00
hagen1778
6c63fd831d
app/vmalert: follow-up after b60dcbe11f
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog

b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 11b03d9fc8)
2024-02-20 13:35:02 +01:00
Victor Amorim dos Santos
f79abd54b0
vmalert: add filter by group or rule name to UI (#5791)
Co-authored-by: Yury Molodov <yurymolodov@gmail.com>
(cherry picked from commit b60dcbe11f)
2024-02-20 13:35:02 +01:00
Roman Khavronenko
433c3726b2
app/vmalert: support filtering for /api/v1/rule like Prometheus does (#5787)
Follow-up after 62e5e2a4c8

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8850c7431d)
2024-02-09 14:36:15 +01:00
Victor Amorim dos Santos
56b1d8e9ed
app/vmalert: support type param for filtering /api/v1/rules response by rule type (#5749)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
(cherry picked from commit 62e5e2a4c8)
2024-02-09 14:36:14 +01:00
Aliaksandr Valialkin
cf64597878
all: add support for specifying multiple -httpListenAddr options 2024-02-09 03:22:49 +02:00
Khushi Jain
a076cb4a93
app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738)
(cherry picked from commit 83e55456e2)
2024-02-08 15:58:34 +01:00
Roman Khavronenko
a3e198588f
vmalert: set ActiveAt to evaluation timestamp in newAlert fn (#5657)
The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps,
which resulted into inaccurate test states:
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 17:30:14 +01:00
Roman Khavronenko
562edb72ea
app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:43:02 +01:00
Roman Khavronenko
a2f83115ae
app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 20:51:50 +01:00
hagen1778
ede466be56
docs: fix Grafana link example for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 18:41:38 +02:00
Aliaksandr Valialkin
885ee160c2
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-22 01:23:23 +02:00
Aliaksandr Valialkin
9e5e514faf
lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop()
Previously the was a race condition when the background goroutine still could try collecting metrics
from already stopped resources after returning from pushmetrics.Stop().
Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning.

This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549
and the commit fe2d9f6646 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
2024-01-16 21:18:22 +02:00
Aliaksandr Valialkin
d566aa7d78
lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin
063ea8c773
app/vmalert/remotewrite: properly calculate vmalert_remotewrite_dropped_rows_total
It was calculating the number of dropped time series instead of the number of dropped samples.

While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated -
at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating
the size of snappy-compressed prompbmarshal.WriteRequest protobuf message.
Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.
2024-01-16 20:47:13 +02:00
Aliaksandr Valialkin
f7b589e38a
lib/prompb: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:43:09 +02:00
hagen1778
2a7207f38a
app/all: follow-up after 84d710beab
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-09 13:17:09 +01:00
Hui Wang
c14e229b20
vmalert: automatically add exported_ prefix for original evaluation… (#5398)
automatically add `exported_` prefix for original evaluation result label if it's conflicted with external or reserved one,
previously it was overridden.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5161

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1f477aba41)
2023-12-22 16:10:33 +01:00
Aliaksandr Valialkin
3a9cf13aaa
app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags
This is a follow-up for 5ebd5a0d7b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427
2023-12-20 21:38:16 +02:00
Aliaksandr Valialkin
261c173f4b
all: use Gauge instead of Counter for *_config_last_reload_successful metrics
This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag.
See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details.

This is follow-up for 326a77c697
2023-12-20 14:25:44 +02:00
Hui Wang
ed4f77575f
vmalert: validate schema for -external.url (#5450)
Requests with wrong or no schema in  `-external.url` could be rejected by alertmanager.
So we validate schema on start up.

(cherry picked from commit 9253c24dd6)
2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin
55eb48f5ee
app: make more clear that -tls enables https at -httpListenAddr 2023-12-10 00:25:23 +02:00
Roman Khavronenko
276e9301f4
app/vmalert: sanitize label names before sending to Alertmanager (#5442)
Before, vmalert would send notifications with labels containing characters
  not supported by Alertmanager validator, resulting into validation errors
  like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-08 18:09:07 +02:00
Dmytro Kozlov
6a41e1ec0c
app/vmalert: replace error metrics for gauges with counter metrics (#5217)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 935bec447b)
2023-12-06 19:41:34 +01:00
Dmytro Kozlov
6770bad207
app/vmalert: expose /vmalert/api/v1/rule and /api/v1/rule API which returns rule status in JSON format (#5397)
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format

* app/vmalert: hide updates if query param not set

* app/vmalert: fix panic (recursion call)

* app/vmalert: add needed group name and file name

* app/vmalert: fix comment, update behavior

* app/vmalert: fix description

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-12-04 22:49:39 +02:00
Aliaksandr Valialkin
10b4dfbbf9
app/vmalert/notifier: remove backticks from the description for -notifier.blackhole command-line flag
Backticks in flag description are automatically converted to flag type. See https://pkg.go.dev/flag#PrintDefaults

This is a follow-up for 20025d4fd6 and 25317b4e70
2023-11-22 20:17:45 +02:00
Aliaksandr Valialkin
db6dadf1f7
docs: convert png images to webp in all the docs except of docs/operator/*
This reduces the size of docs/* folder from 33MB to 18MB

Images inside docs/operator/* must be converted at the https://github.com/VictoriaMetrics/operator/tree/master/docs
and then the updated images must be automatically propagated to the docs/operator/*

This is a follow-up for d3f919df3e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5206
2023-11-22 19:29:47 +02:00
hagen1778
0dbbffbdd5
docs: typo after 3f5a41e35e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 20025d4fd6)
2023-11-20 17:06:21 +01:00
Roman Khavronenko
c0039ce7a3
docs/vmalert: clarify deduplication recommendations for HA setup (#5336)
Please see discussion here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5279

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 16:27:47 +01:00
hagen1778
cfc58dd932
docs: clarify vmalert flag changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 21:44:46 +01:00
Roman Khavronenko
becf7bf8df
app/vmalert: update remote-write process (#5284)
* app/vmalert: update remote-write process

* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2023-11-13 09:25:29 +01:00
hagen1778
10da9e6e01
app/vmalert: fix typo in remoteWrite.concurrency description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c07dc45786)
2023-11-03 22:05:00 +01:00
Aliaksandr Valialkin
3d6f4da3b3
docs: update -help output after recent changes to VictoriaMetrics components 2023-11-02 20:27:16 +01:00
Roman Khavronenko
4e8c762fd9
app/vmalert: add label file pointing to the group's filename to metrics (#5281)
The filename should help identifying alerting rules belonging to specific groups
with identical names but different filenames.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit b5254199c6)
2023-11-02 16:02:29 +01:00
hagen1778
3773510e8f
app/vmalert: verify alert name correctness in restore test
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6eb205f8b0)
2023-11-02 16:02:29 +01:00
Hui Wang
44fcdf0cf0
vmalert: reduce restore query request for each alerting rule (#5265)
reduce the number of queries for restoring alerts state on start-up.
The change should speed up the restore process and reduce pressure on `remoteRead.url`.

(cherry picked from commit 90d45574bf)
2023-11-02 16:02:28 +01:00