Commit Graph

96 Commits

Author SHA1 Message Date
Roman Khavronenko
fa3a17938e
vmalert: follow-up after cae87da (#4269)
* vmalert: follow-up after cae87da

cae87da4bb
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: update struct comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rm typo

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 13:31:54 +02:00
Haleygo
cae87da4bb
vmalert: support reading rule from http url (#4212)
vmalert: support reading rule's config from HTTP URL
2023-05-08 09:52:57 +02:00
Zakhar Bessarab
b21a55febf
app/vmalert: add support of recursive path globs for rules and templates (#4148)
Supports using `**` for `-rule` and `-rule.templates`: `dir/**/*.tpl` loads contents of dir and all subdirectories recursively.

See: #4041

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-04-26 19:20:22 +02:00
Zakhar Bessarab
89a1c941c2
app/vmalert: return an error when using query function in -external.alert.source flag (#4191)
Templating of `-external.alert.source` is not expected to have access to the query which was causing runtime error when query function was passed as nil.
See: #4181

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-26 15:31:14 +02:00
Zakhar Bessarab
a8d1497024
app/vmalert: update Grafana URLs to match latest format (#4061)
See: #4019

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-04 15:25:29 +04:00
Roman Khavronenko
8db6a71f83
vmalert: mention VMUI example for alert's source (#4005)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:40:55 +01:00
Roman Khavronenko
3283f0dae4
vmalert: support logs suppressing during config reloads (#3973)
* vmalert: support logs suppressing during config reloads

The change is mostly required for ENT version of vmalert,
since it supports object-storage for config files.
Reading data from object storage could be time-consuming,
so vmalert emits logs to track the progress.

However, these logs are mostly needed on start or on
manual config reload. Printing these logs each time
`rule.configCheckInterval` is triggered would too verbose.
So the change allows to control logs emitting during
config reloads.

Now, logs are emitted during start up or when SIGHUP is receieved.
For periodicall config checks logs emitted by config pkg are suppressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-20 16:08:30 +01:00
Roman Khavronenko
d66bae212b
app/vmalert: log number of configration files found for each specified -rule (#3936)
The change also introduces `List` method to `FS` interface.
The `List` method can be used for wildcard support in object storage FS.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-03-09 14:46:19 +01:00
Aliaksandr Valialkin
1b5dc9f91d
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:26:55 -08:00
Haleygo
6ef6f3a771
vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Aliaksandr Valialkin
73358571ee
app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843
vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Roman Khavronenko
c32d8ea29e
vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Roman Khavronenko
6fd10e8871
vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.

While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Nikolay
73256fe438
lib/netutil: init implimentation of proxy protocol (#3687)
* lib/netutil: init implimentation of proxy protocol
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-26 23:08:35 -08:00
Roman Khavronenko
6588fcbfca
vmalert: allow configuring the default number of stored rule's update states (#3556)
Allow configuring the default number of stored rule's update states in memory
 via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's
 `update_entries_limit` configuration param.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-29 12:36:44 +01:00
Aliaksandr Valialkin
a018b1d75e
app/vmalert/templates: properly escape all the special chars in quotesEscape function
Previously the `quotesEscape` function was escaping only double quotes.
This wasn't enough, since the input string could contain other special chars,
which must be escaped when put inside JSON string. For example, carriage return and line feed chars (\n\r),
backslash char, etc. This led to the following issues, which were improperly fixed:

- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890 - this issue
  was "fixed" by introducing the `crlfEscape` function, which led to unnecessary
  complications in user templates, while not fixing various corner cases
  such as backslash chars in the input string.
  See 1de15ad490

- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 - this issue
  was "fixed" by urlencoding the whole string passed to -external.alert.source
  command-line flag. This led to invalid urls, which couldn't be parsed by Grafana.
  See 00c838353d
  and 4bd0244599

This commit properly encodes the input string passed to `quotesEscape`, so it can be safely embedded inside JSON strings.

This commit deprecates crlfEscape template function and adds the following new template functions:

- strvalue and stripDomain - these functions are supported by Prometheus, so they were added
  for compatibility purposes.
- jsonEscape and htmlEscape for converting the input string to valid quoted JSON string
  and for html-escaping the input string, so it could be safely embedded as a plaintext
  into html.

This commit also documents all supported template functions at https://docs.victoriametrics.com/vmalert.html#template-functions
The deprecated crlfEscape function isn't documented on purpose, since its usefulness is negative in general case.
2022-10-28 00:01:16 +03:00
Aliaksandr Valialkin
4bd0244599
Revert "vmalert: escape query params if external alert source defined (#3267)"
This reverts commit 00c838353d.

Reason for revert: it incorrectly fixes the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3139 .
Now `-external.alert.source=explore?orgId=1&left=...` is converted to the following invalid url, which cannot be handled by Grafana:

https://grafana.example.com/explore%3ForgId%3D1%26left%3D...

The next commit will contain the correct fix of the issue - the `quotesEscape` function must
properly escape the string, so it could be embedded into JSON string. This function must
properly escape \n\r chars too. In this case the `crlfEscape` function becomes unnecessary.
Actually, the next commit makes the `crlfEscape` function deprecated.
2022-10-27 22:30:27 +03:00
Dmytro Kozlov
00c838353d
vmalert: escape query params if external alert source defined (#3267)
vmalert: escape query args if external alert source defined
2022-10-26 10:00:14 -04:00
Aliaksandr Valialkin
f9fc838b7b
app/vmalert: update -external.alert.source command-line flag description after 61544e13ad 2022-10-05 22:52:38 +03:00
Roman Khavronenko
61544e13ad
vmalert: allow using {{$labels}} for templating in -external.alert.source (#3194)
The change is supposed to provide additional flexibility for generating alert's
source link based on label values.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-10-05 19:25:03 +02:00
Aliaksandr Valialkin
c1fa9828b3
lib/flagutil: rename Array to ArrayString
This makes the ArrayString more consistent with other Array* types.

While at it, add ArrayBytes type, which will be used for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3071
2022-10-01 18:26:36 +03:00
Aliaksandr Valialkin
3b2599c659
docs/vmalert.md: update -help output after explicit marking of enterprise flags 2022-09-15 13:22:57 +03:00
Roman Khavronenko
68e56b6fc5
vmalert: set alert's source link to UI instead of JSON source (#2986)
We switch default alert's source link to redirect user
to vmalert's UI instead of previous JSON object. While it breaks
compatibility, it also supposed to improve user's experience.
The old behavior can be achieved by updating `-external.alert.source`
command-line flag.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-08-17 14:46:28 +02:00
Roman Khavronenko
f1b2273d13
vmalert: support $alertID and $groupID in template variables (#2983)
Support of these two variables allows building custom URLs with
alert's ID and group ID params.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/517#issuecomment-1207141432

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-08-16 08:08:27 +02:00
Matthew Blewitt
28441711e6
vmalert: mark some url flags as sensitive (#2965)
Other components, such as `vmagent`, mark these flags as sensitive and
hide them from the `/metrics` endpoint by default. This commit adds
similar handling to the `vmalert` component, hiding them by default, to
prevent logging of secrets inappropriately.

Showing of these values is controlled by an additional flag.

Follow up to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2947
2022-08-11 09:56:40 +02:00
Roman Khavronenko
1aa5112771
vmalert: remove notifier dependency from config (#2906)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-22 13:50:41 +02:00
Aliaksandr Valialkin
ad6b3cd47d
lib/pushmetrics: properly handle errors when initializing pushmetrics 2022-07-22 13:36:06 +03:00
Aliaksandr Valialkin
5ced032d66
all: follow-up after 46f803fa7a
Add -pushmetrics.* command-line flags to all the VictoriaMetrics apps
2022-07-21 20:36:27 +03:00
Roman Khavronenko
e9b977859b
vmalert: deprecate alert's status link (#2840)
* vmalert: deprecate alert's status link

Deprecate alert's status link `/api/v1/<groupID>/<alertID>/status` in favour of
`api/v1/alerts?group_id=<group_id>&alert_id=<alert_id>"`.

The change was needed for simplifying logic in vmselect for proxying vmalert's requests.

The old alert's status link will be still supported for a few versions but will be removed in the future.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: fix review comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-08 10:26:13 +02:00
Andrii Chubatiuk
a531a96193
added reusable templates support (#2532)
Signed-off-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2022-05-14 11:38:44 +02:00
Roman Khavronenko
331a5d9a17
Code check (#2558)
* vmstorage: make gofmt happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-09 10:11:56 +02:00
Roman Khavronenko
2b59fff526
vmalert: fix labels and annotations processing for alerts (#2403)
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.

For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.

See more details here https://github.com/prometheus/compliance/issues/80

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-06 20:24:45 +02:00
Dmytro Kozlov
11ae1ae924
Added resendDelay for alerts (#2296)
* vmalert: add support of `resendDelay` flag for alerts

Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 15:26:33 +00:00
hagen1778
2efa46a11c vmalert: support $externalLabels and $externalURL in templates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2193
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 17:33:52 +03:00
Roman Khavronenko
5da71eb685
vmalert: support configuration file for notifiers (#2127)
vmalert: support configuration file for notifiers

* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
2022-02-02 14:11:41 +02:00
Roman Khavronenko
9bb7905d26
vmalert: check if remoteWrite is configured for replay mode (#1990)
* vmalert: check if remoteWrite is configured for replay mode

The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update app/vmalert/main.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 20:25:47 +02:00
Roman Khavronenko
0afd14a14a
vmalert: introduce additional HTTP URL params per-group configuration (#1892)
* vmalert: introduce additional HTTP URL params per-group configuration

The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.

Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.

`params` fields are supported by both Prometheus and Graphite datasource types.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: provide more examples for `params` field

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: set higher priority for `params` setting

If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:45:08 +02:00
Roman Khavronenko
40fcf667b0
vmalert: continue to print errors for bad config during hot reload (#1871)
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt

Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:23:49 +02:00
Roman Khavronenko
dcd881bb7a
vmalert: properly init SIGHUP listener before starting group manager (#1725)
Regression was introduced during code refactoring. It potentially
could lead to situation when SIGHUP signals were ignored while
vmalert was still busy with initing group manager.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 16:35:27 +03:00
Roman Khavronenko
8df3c569c7
vmalert: add Source link to alerts UI (#1701)
The source link is controlled by `external.url` and `external.alert.source`
flags, in the same way as for alertmanager notifications.
The source link is added to Alerts list view, and specific Alert view.
2021-10-13 15:25:11 +03:00
Roman Khavronenko
5494bc02a6
vmalert: add flag to limit the max value for auto-resovle duration for alerts (#1609)
* vmalert: add flag to limit the max value for auto-resovle duration for alerts

The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
2021-09-13 15:48:18 +03:00
Roman Khavronenko
cfb6436be5
Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-08-31 14:57:47 +03:00
Roman Khavronenko
eff940aa76
Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-08-31 12:28:02 +03:00
Roman Khavronenko
9ee3d0378f
vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:08:55 +03:00
Qifei Wan
fa9c5c5940
app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 12:55:29 +03:00
Roman Khavronenko
2a259ef5e7
vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:20:38 +03:00
Roman Khavronenko
d210958fd0
vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-25 14:27:22 +01:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Roman Khavronenko
9cdd4696fe
vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 10:07:19 +03:00