Commit Graph

572 Commits

Author SHA1 Message Date
Roman Khavronenko
4de1b2b74a
vmalert: fix labels and annotations processing for alerts (#2403)
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.

For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.

See more details here https://github.com/prometheus/compliance/issues/80

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-07 15:24:06 +03:00
Roman Khavronenko
ce1629b70a
vmalert: add flag for disabling long-lived connections (keepalive) (#2395)
The new flag `datasource.disableKeepAlive` allows disabling keepalive
connections. This may be useful if there are multiple datasource
replicas (e.g. vmselects) behind the HTTP balancer to avoid uneven
load spread because of long-lived connections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-04 13:08:06 +03:00
Roman Khavronenko
7aa9d0f5f6
vmalert: protect executor's field from concurrent access (#2387)
Executor recently gain field for storing previously sent series.
Since the same executor object can be used in multiple goroutines,
the access to this field should be serialized.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Roman Khavronenko
ab10178c85
Vmalert compliance 2 (#2340)
* vmalert: split alert's `Start` field into `ActiveAt` and `Start`

The `ActiveAt` field identifies when alert becomes active for rules
with `for > 0`. Previously, this value was stored in field `Start`.

The field `Start` now identifies the moment alert became `FIRING`.

The split is needed in order to distinguish these two moments
in the API responses for alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: support specific moment of time for rules evaluation

The Querier interface was extended to accept a new argument
used as a timestamp at which evaluation should be made.

It is needed to align rules execution time within the group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: mark disappeared series as stale

Series generated by alerting rules, which were sent to remote write
now will be marked as stale if they will disappear on the next
evaluation. This would make ALERTS and ALERTS_FOR_TIME series
more precise.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: evaluate rules at fixed timestamp

Before, time at which rules were evaluated was calculated
right before rule execution. The change makes sure
that timestamp is calculated only once per evalution round
and all rules are using the same timestamp.

It also updates the logic of resending of already resolved
alert notification.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: allow overridin `alertname` label value if it is present in response

Previously, `alertname` was always equal to the Alerting Rule name. Now,
its value can be overriden if series in response containt the different value
for this label.

The change is needed for improving compatibility with Prometheus.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: align rules evaluation in time

Now, evaluation timestamp for rules evaluates as if
there was no delay in rules evaluation. It means, that
rules will be evaluated at fixed timestamps+group_interval.
This way provides more consistent evaluation results and
improves compatibility with Prometheus,

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add metric for missed iterations

New metric `vmalert_iteration_missed_total` will show
whether rules evaluation round was missed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: reduce delay before the initial rule evaluation in group

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rollback alertname override

According to the spec:
```
The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: throw err immediately on dedup detection

```
The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts
or add samples to samples receiver if there is more than one alert with the same labels
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: cleanup

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: use strings builder to reduce allocs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Roman Khavronenko
d907e0d9f0
docs: fix typo in vmalert's API (#2380)
The API handler was changed in 1.75 but docs
still contain the old address.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2366
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Aliaksandr Valialkin
ba7cfd7b25
app: sync Markdown changes from a8de1ab000 2022-03-22 14:12:03 +02:00
Aliaksandr Valialkin
3b7aefade2
docs/vmalert.md: sync after 11ae1ae924 2022-03-17 20:18:07 +02:00
Dmytro Kozlov
2f350c200a
Added resendDelay for alerts (#2296)
* vmalert: add support of `resendDelay` flag for alerts

Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-17 20:09:18 +02:00
Roman Khavronenko
35bf5bf688
Vmalert compliance improvements (#2320)
* vmalert: add support for `sortByLabel` template function

* vmalert: update API according to Prometheus conformance program

The changes to the API, field names and URL path has been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

* vmalert: fix the timestamp of the evaluated rules

The timestamp used for alert's `EndsAt` was calculated
before sending the notification. While the correct way
is to use the timestamp taken right before rules evaluation.

* vmalert: add `-datasource.queryTimeAlignment` flag

The flag is supposed to provide ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.

The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 13:22:26 +02:00
Roman Khavronenko
ec48d022df
docs: fix broken links (#2303)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 13:01:46 +02:00
Dmytro Kozlov
7f0212f964
Issue-1824: added flags and different auth types support (#2287)
* vmalert/notifier: added flags and different auth types support

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 12:52:01 +02:00
Bastien Dronneau
da722e462e
docs(vmalert): typo in path (#2278) 2022-03-16 12:24:12 +02:00
Denys Holius
fc6f71fba8
Added minimal supported version of AlertManager (#2237)
* added minimal supported version of supported AlertManager

* docs: `make docs-sync`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-22 20:09:26 +02:00
Roman Khavronenko
5a4b16794d
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:33:33 +02:00
hagen1778
11345c3fd1
vmalert: support $externalLabels and $externalURL in templates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2193
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 21:13:32 +02:00
Aliaksandr Valialkin
7e8596edb8
docs: update -help output for VictoriaMetrics components 2022-02-15 21:02:36 +02:00
Nikolay
48a9e068be
adds release build for macos darwin amd64 and arm64 (#2185)
* adds release build for macos darwin amd64 and arm64

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1896
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1851

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:42:33 +02:00
Roman Khavronenko
791cad8c2e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:51 +02:00
hagen1778
1662b5bcec
vmalert: fix bug with relative links in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-10 12:21:05 +02:00
Aliaksandr Valialkin
eed66b6640
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:42:33 +02:00
Aliaksandr Valialkin
5da7f08be6
docs: cross-link downsampling docs from deduplication and vmalert docs 2022-02-07 14:53:50 +02:00
Aliaksandr Valialkin
dbead4813e
docs: updates after 5da71eb685
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
2022-02-02 23:42:25 +02:00
hagen1778
a67e843257
vmalert: add support of -notifier.basicAuth.passwordFile flag for notifiers
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1567

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 23:42:25 +02:00
hagen1778
605de7bdd6
vmalert: remove trailing slash for static notifier addresses
This would make addresses `http://localhost:9093` and `http://localhost:9093/`
both to result into `http://localhost:9093/api/v2/alerts`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 23:42:25 +02:00
Roman Khavronenko
2a3a62dc41
vmalert: support configuration file for notifiers (#2127)
vmalert: support configuration file for notifiers

* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
2022-02-02 23:42:25 +02:00
Aliaksandr Valialkin
755779adb7
app/vmalert: add parseDuration function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/8817
2022-01-13 23:30:56 +02:00
Aliaksandr Valialkin
0f81ecd7df
app/vmalert: add stripPort template function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/10002
2022-01-13 22:54:46 +02:00
Andrey Afoninsky
2e8c23e0f6
chore: add vmalert_remotewrite_total metric (#2040)
Co-authored-by: Andrey Afoninsky <andrey.afoninsky@booking.com>
2022-01-11 08:56:02 +02:00
Denys Holius
3a522c591a
Old links replaced for newest (#2033)
* replaced old links to the website

* fixed deletion main README.md file

* fix: added docs files after docs-sync
2022-01-05 16:32:54 +02:00
Roman Khavronenko
e92c987dc3
vmalert: check if remoteWrite is configured for replay mode (#1990)
* vmalert: check if remoteWrite is configured for replay mode

The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update app/vmalert/main.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 20:31:10 +02:00
Roman Khavronenko
1f0301c809
vmalert: always convert step value to seconds for better compatibility (#1955)
When using `vmalert` with older Prometheus versions, the passed
`step=2m` may be parsed by Prometheus with an err: "cannot parse \"2m0s\" to a valid duration".
In order to improve compatibility vmalert will always convert step duration to seconds.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-17 20:17:09 +02:00
Roman Khavronenko
e7d7f6c94e
vmalert: update the order of service labels attaching (#1922)
Service labels like `alertname` or `alertgroup` were attached
after template expanding for `labels` section. Because of this,
labels `alertname` or `alertgroup` weren't available for templating
in `labels` section of alert's definition.
This commit changes the order of labels attaching and adds a test
for verifying these labels availability.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-10 12:14:16 +02:00
Aliaksandr Valialkin
7d8c481960
app/vmalert/config: sort extra_filter labels before passing them to query args in order to get consistent order of query args across runs
This fixes TestGroupParams test - see https://github.com/VictoriaMetrics/VictoriaMetrics/runs/4432510244?check_suite_focus=true#step:5:288
2021-12-08 13:04:08 +02:00
Roman Khavronenko
582c063698
vmalert: introduce additional HTTP URL params per-group configuration (#1892)
* vmalert: introduce additional HTTP URL params per-group configuration

The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.

Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.

`params` fields are supported by both Prometheus and Graphite datasource types.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: provide more examples for `params` field

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: set higher priority for `params` setting

If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:51:54 +02:00
Roman Khavronenko
8b67168609
ci: bump go version to 1.17 (#1895)
The bump was required for `vmalert` package.
`vmalert` docs now also contain an updated description.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:51:45 +02:00
Roman Khavronenko
fc4e59ec1d
vmalert: adjust topologies docs in README (#1893)
Commit changes images width and order in topologies section
for better readability.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 10:33:47 +02:00
Roman Khavronenko
6823818843
Vmalert docs upd (#1890)
* vmalert: add topology examples in docs

* vmalert: docs typo fix
2021-12-02 10:33:47 +02:00
Roman Khavronenko
83c46eaf1a
vmalert: continue to print errors for bad config during hot reload (#1871)
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt

Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:24:20 +02:00
Roman Khavronenko
48fb972ddc
vmalert: make notifier.Addr optional (#1870)
For a long time notifier.Addr flag was required. The assumption was that vmalert will
be always used for alerting. However, practice shows that some users need only
recording rules. In this case, requirement of notifier.Addr is ambigious.

The change verifies if loaded config contains recording or alerting rules and
if there are corresponding flags set. This is true for initial config load
and hot reload.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:23:00 +02:00
Aliaksandr Valialkin
7b88daf2ec
docs/CHANGELOG.md: document 852a895b70 2021-11-30 01:22:59 +02:00
Aliaksandr Valialkin
aa01bb9349
app/vmalert/README.md: sync with docs/vmalert.md
This is a follow-up after d8c70903ec
2021-11-17 00:56:53 +02:00
Aliaksandr Valialkin
4fb19fe34b
all: consistently return application/json content-type without charset=utf-8
The `application/json` content-type has utf-8 encoding by default.
See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2021-11-09 18:07:22 +02:00
Aliaksandr Valialkin
bc357d7132
docs/vmalert.md: improve wording in Multitenancy chapter 2021-11-09 14:20:30 +02:00
Aliaksandr Valialkin
57aaf914ac
docs: mention that graphs on the official dashboards contain useful hints 2021-11-08 19:54:25 +02:00
Aliaksandr Valialkin
e58630401e
app/{vmalert,vmbackup}/README.md: sync with docs after the commit 47d1612bf8 2021-11-05 20:46:07 +02:00
Aliaksandr Valialkin
9aa40fc0c3
docs/vmalert.md: document the addition of -defaultTenant.prometheus and -defaultTenant.graphite command-line options to enterprise version of vmalert 2021-11-05 20:04:53 +02:00
Aliaksandr Valialkin
56a25146cb
app/vmalert/datasource: use plain string literals instead of constants
This removes the unneeded level of indirection and improves code readability.

The "prometheus" and "graphite" constants aren't going to change in the future, so there is no sense in hiding them behind constants.
2021-11-05 19:58:15 +02:00
Aliaksandr Valialkin
31ac507bee
app/vmalert: remove rule.type config, since it doesnt play well with the upcoming default tenants for -clusterMode
It is better from the consistency point of view to set up rule types at group level where tenant config is set up.
2021-11-05 19:52:41 +02:00
Aliaksandr Valialkin
847004fa77
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:42:13 +02:00
Aliaksandr Valialkin
2ebee4e741
app/{vmalert,vmagent}: improve the distribution of scrape offsets among targets / rules
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
2021-10-27 20:04:02 +03:00
Roman Khavronenko
5321127add
vmalert: allow groups with empty rules for compatibility reasons (#1742)
Prometheus allows to have groups with no rules, so we should support
it in vmalert as well for compatibility reasons.
It is also allowed to hot-reload empty groups by adding or removing rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-25 14:46:51 +03:00
Roman Khavronenko
e0f21d6000
vmalert: correctly calculate alert ID including extra labels (#1734)
Previously, ID for alert entity was generated without alertname or groupname.
This led to collision, when multiple alerting rules within the same group
producing same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and
expr: `sum(metric2) by (job) > 0` could result into same labelset `job: "job"`.

The issue affects only UI and Web API parts of vmalert, because alert ID is used
only for displaying and finding active alerts. It does not affect state restore
procedure, since this label was added right before pushing to remote storage.

The change now adds all extra labels right after receiving response from the datasource.
And removes adding extra labels before pushing to remote storage.

Additionally, change introduces a new flag `Restored` which will be displayed in UI
for alerts which have been restored from remote storage on restart.
2021-10-22 12:31:35 +03:00
Aliaksandr Valialkin
5705f4b6d1
lib/httpserver: expose command-line flags at /flags page
This should simplify debugging.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-20 00:46:54 +03:00
Roman Khavronenko
ca49853664
vmalert: make group.ID() thread-safe (#1726)
Commit fixes potential race condition when group update
and generating of ID() happens simultaneously.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 16:45:52 +03:00
Roman Khavronenko
627224d493
vmalert: properly init SIGHUP listener before starting group manager (#1725)
Regression was introduced during code refactoring. It potentially
could lead to situation when SIGHUP signals were ignored while
vmalert was still busy with initing group manager.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 16:36:08 +03:00
Roman Khavronenko
44a88b7458
vmalert: remove extra / from path in WEB interface (#1717)
The extra `/` may cause issues when additional path prefixes
are configured. Also, removing it makes it consistent
with the rest of declarations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-18 15:19:01 +03:00
Aliaksandr Valialkin
9273b83cd0
app/vmalert: follow-up after 0e2486df56 2021-10-18 15:08:42 +03:00
Alexander Rickardsson
0e1dbcd039
vmalert: add disablePathAppend to remote read (#1712)
* vmalert: add disablePathAppend to remoteRead

* docs: add docs for remoteRead.disablePathAppend
2021-10-18 14:59:17 +03:00
Alexander Rickardsson
63571e1334
vmalert: Redact passwords from error messages (#1713) 2021-10-18 14:59:17 +03:00
Roman Khavronenko
f393145843
Adjust http.Transport.MaxIdleConns setting for vmauth/vmalert services (#1704)
* vmalert: adjust `http.Transport.MaxIdleConns` value accordingly to `http.Transport.MaxIdleConnsPerHost`

`http.Transport.MaxIdleConnsPerHost` setting is controlled by `datasource.maxIdleConnections` flag,
while `http.Transport.MaxIdleConns` is inherited from DefaultTransport and is equal to `100`.
The fix adjusts `http.Transport.MaxIdleConns` value if it is lower than `http.Transport.MaxIdleConnsPerHost`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmauth: adjust `http.Transport.MaxIdleConns` value accordingly to `http.Transport.MaxIdleConnsPerHost`

`http.Transport.MaxIdleConnsPerHost` setting is controlled by `maxIdleConnsPerBackend` flag,
while `http.Transport.MaxIdleConns` is inherited from DefaultTransport and is equal to `100`.
The fix adjusts `http.Transport.MaxIdleConns` value if it is lower than `http.Transport.MaxIdleConnsPerHost`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-13 19:23:52 +03:00
Roman Khavronenko
14995eece6
vmalert: add Source link to alerts UI (#1701)
The source link is controlled by `external.url` and `external.alert.source`
flags, in the same way as for alertmanager notifications.
The source link is added to Alerts list view, and specific Alert view.
2021-10-13 15:26:57 +03:00
Roman Khavronenko
1bdccfc2bf
app/vmalert: remove unnecessary omitempty tag for interval param (#1649)
`omitempty` tag resulted into skipping this param on marshaling,
which was used as a checksum for groups configuration. Since on
config reload checksums are compared before applying changes,
any change to `interval` only didn't trigger config reload.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1641
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-23 20:47:56 +03:00
Roman Khavronenko
aa052097cf
app/vmalert: support http.pathPrefix flag in UI (#1636)
The change makes UI to respect `http.pathPrefix` flag
for API or navigation items links.
2021-09-21 18:15:08 +03:00
Roman Khavronenko
a07fa92ef2 vmalert: add new metric vmalert_remotewrite_flush_duration_seconds (#1622) 2021-09-16 14:01:24 +03:00
Roman Khavronenko
286b79dc48 vmalert: create basic auth config only if args aren't empty (#1618)
* vmalert: create basic auth config only if args aren't empty

follow-up after 68721f6

* vmalert: make lint happy
2021-09-15 09:35:38 +03:00
Aliaksandr Valialkin
63fb5a5ca5 docs/vmalert.md: follow-up after 68721f6e7d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:45 +03:00
Roman Khavronenko
eea6317d40 vmalert: support bearer token for datasource, remotewrite and remoteread (#1614)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:43 +03:00
Aliaksandr Valialkin
2932f495f3 docs/CHANGELOG.md: document 5494bc02a6 2021-09-13 17:14:45 +03:00
Roman Khavronenko
7df731ea91 vmalert: add flag to limit the max value for auto-resovle duration for alerts (#1609)
* vmalert: add flag to limit the max value for auto-resovle duration for alerts

The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
2021-09-13 17:14:45 +03:00
Roman Khavronenko
9dd121db36 vmalert: display extra filter labels in UI (#1613) 2021-09-13 14:17:51 +03:00
Aliaksandr Valialkin
e1632dc0df docs/vmalert.md: typo fix in Multitenancy chapter 2021-09-10 17:57:43 +03:00
Aliaksandr Valialkin
d72593cca2 app/vmalert: document GroupAlerts
This makes golint happy
2021-09-07 22:50:48 +03:00
Aliaksandr Valialkin
608ed19655 app/vmalert: follow-up after 21f022e5f0 2021-09-07 22:45:47 +03:00
Roman Khavronenko
bb26aa9b1f vmalert: add initial UI implementation (#1602)
New UI pages:
/ - welcome page with API handlers list;
/groups - list of all rules per group;
/alerts - list of all active alerts;
/groupID/alertID/status - status of the active alert;
2021-09-07 22:45:45 +03:00
Roman Khavronenko
1cf4f5a715 Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-09-01 12:20:01 +03:00
Nikolay
210c76b51e adds external_labels per group for vmalert (#1485)
* adds external_label per group for vmalert
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-09-01 12:19:49 +03:00
Roman Khavronenko
1cb7037fc8 Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-09-01 12:19:34 +03:00
Aliaksandr Valialkin
ff4c7c1a3d docs/vmalert.md: run make docs-sync after 9ee3d0378f 2021-08-21 20:25:26 +03:00
Roman Khavronenko
0c2284b95f vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:23:22 +03:00
Alexander Rickardsson
9e2e9d83a5 vmalert: accept http.StatusOK for remotewrite (#1550) 2021-08-21 20:23:22 +03:00
Aliaksandr Valialkin
fe8c462044 app/vmalert: mention -remoteWrite.disablePathAppend in the description for -remoteWrite.url 2021-08-16 15:23:39 +03:00
Aliaksandr Valialkin
21974cb571 app/vmalert: follow-up for 2400f85761 2021-08-16 15:20:35 +03:00
Alexander Rickardsson
d27dc3721b vmalert: enable configuring explicit path (#1536)
* vmalert: allow to disable automatically added path to remote write address via disablePathAppend flag
* docs: update docs to include remoteWrite.disablePathAppend
2021-08-16 14:58:05 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Roman Khavronenko
d5ba8248cc vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 10:02:35 +03:00
Qifei Wan
095bb90879 app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 16:12:48 +03:00
assassins
6ab0001a1f Performance optimization (#1481)
There are redundant steps
2021-07-28 19:29:22 +03:00
Aliaksandr Valialkin
d98e22fe50 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:36:22 +03:00
Aliaksandr Valialkin
acb7a95c64 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4
2021-07-07 17:45:09 +03:00
Aliaksandr Valialkin
ceda2b1df4 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:11:29 +03:00
Roman Khavronenko
c5f493db8e Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-14 11:43:01 +03:00
Roman Khavronenko
f3cb2158a3 vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:30:07 +03:00
Aliaksandr Valialkin
f3749dedba docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:30:54 +03:00
Roman Khavronenko
5aa7846900 vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:30:54 +03:00
Roman Khavronenko
e183a5c532 vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-26 12:24:27 +03:00
Roman Khavronenko
beee24ecee vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 14:15:49 +03:00
Nikolay
23a6c9c016 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
d77db9d813 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
69e365cd48 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:30:24 +03:00
Aliaksandr Valialkin
74ef40034c lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:25:27 +03:00
Aliaksandr Valialkin
1668280e67 docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:25:21 +03:00
Aliaksandr Valialkin
6ea191d196 docs: dealay -> delay 2021-05-18 01:07:32 +03:00
Roman Khavronenko
3428df6f15 vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:51:48 +03:00
Aliaksandr Valialkin
a6cb4f10a7 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:13:34 +03:00
Aliaksandr Valialkin
cca9670573 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:58 +03:00
Roman Khavronenko
a7f00101f5 vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:18:56 +03:00
Roman Khavronenko
35237fe1f5 vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:10:32 +03:00
Aliaksandr Valialkin
07bc021f58 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 19:53:45 +03:00
Roman Khavronenko
bb7e113dd4 vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 12:24:32 +03:00
Aliaksandr Valialkin
0a2e746175 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:40 +03:00
Roman Khavronenko
7394967841 vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 11:47:11 +03:00
Roman Khavronenko
6fbedd62b8 vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:03:50 +03:00
Aliaksandr Valialkin
daf2778025 docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:56:47 +03:00
Roman Khavronenko
b55677e93d Vmalert: adjust time param for datasource queries according to evaluationInterval (#1257)
* Simplify arguments list for fn `queryDataSource` to improve readbility

* vmalert: adjust `time` param according to rule evaluation interval

With this change, vmalert will start to use rule's evaluation interval
for truncating the `time` param. This is mostly needed to produce consistent
time series with timestamps unaffected by vmalert start time. Now, timestamp
becomes predictable.
Additionally, adjustment is similar to what Grafana does for plotting range graphs.
Hence, recording rule series and recording rule expression plotted in grafana
suppose to become similar in most of cases.
2021-04-30 09:56:46 +03:00
Aliaksandr Valialkin
8be1cb297b app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:38:23 +03:00
Nikolay
2eb8ef7b2b changes vmalert Querier with per rule querier (#1249)
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
2021-04-29 11:31:07 +03:00
Roman Khavronenko
0ceb4f7565 vmalert: keep the returned timestamp when persisting recording rule (#1245)
Previously, vmalert used `lastExecTime` timestamp when writing recording rules
to the remote storage. This may be incorrect, if vmalert uses `datasource.lookback` flag,
which means rule's expression will be executed at some moment in the past.
To avoid such situations, vmalert now will use returned timestamp instead of `lastExecTime`.
2021-04-27 00:16:45 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin
64f1ddefe5 all: consistency renaming Victoria Metrics -> VictoriaMetrics
VMInsert -> vminsert
VMSelect -> vmselect
VMStorage -> vmstorage
2021-04-20 11:45:02 +03:00
Aliaksandr Valialkin
c872ba45b9 docs: update -help output after the commit 77be3e3a82 2021-04-12 12:35:39 +03:00
Artem Navoiev
c3dcfdef8c improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:36 +03:00
Roman Khavronenko
712725b4a5 vmalert: document template functions and mention them in README (#1197) 2021-04-08 18:20:57 +03:00
Roman Khavronenko
ff3711eea2 docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:43:01 +03:00
Aliaksandr Valialkin
4028d692f5 app: do not process non-GET requests on at / handler 2021-04-02 22:56:38 +03:00
Aliaksandr Valialkin
0a8f0a4e2f all: increase minimum supported Go version for building VictoriaMetrics components from v1.14 to v1.15
This is needed after the commit c0ac740f93, which uses URL.Redacted() method,
which has been added in v1.15.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:06:36 +03:00
Aliaksandr Valialkin
82047be90b docs: add a link to the repository from build instruction for all the VictoriaMetrics components 2021-03-25 17:16:55 +02:00
Aliaksandr Valialkin
450e23533d docs/vmalert.md: remove misleading -evaluationInterval=3s from example config args
3s evaluation interval is too small for practical setups. It can result in increased load on datasource.
So it is better to remove it from example config args, which are usually copy-pasted by novice users.
2021-03-25 15:31:10 +02:00
Aliaksandr Valialkin
3caac5edd4 Makefile: prepare vmutils-windows-*.zip archive on make release-vmutils command
The archive contains the following executables for Windows:

* vmagent
* vmalert
* vmauth
* vmctl

Other components - vmbackup, vmrestore, victoria-metrics - aren't supported for Windows yet
2021-03-16 20:54:10 +02:00
Aliaksandr Valialkin
e2717d84c0 all: various fixes in command-line flag descriptions 2021-03-15 22:03:49 +02:00
Ihor Borodin
933de6b9b1 Fixing examples of external.alert.source in documentation (#1120)
* Fixing examples of external.alert.source in documentation
2021-03-10 12:08:22 +02:00
Aliaksandr Valialkin
d109e17f46 all: bump minimum supported Go version from 1.13 to 1.14 2021-03-03 15:58:17 +02:00
Aliaksandr Valialkin
2ecee0515a app/vmalert/README.md: sync with docs/vmalert.md 2021-03-03 10:42:54 +02:00
Aliaksandr Valialkin
d9e8af0e8f docs: actualize -help output 2021-03-01 17:02:05 +02:00
Nikolay
b52d1e4f19 adds query params for vmalert (#1094)
remoteWrite.url now accepts query params at provided url
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1087
2021-02-28 14:12:34 +02:00
Aliaksandr Valialkin
901710b9e2 app/vmalert: add missing multiarch Dockerfile 2021-02-18 15:23:57 +02:00
Roman Khavronenko
2aa37b0450 vmalert: mention -datasource.appendTypePrefix in README (#1052) 2021-02-03 23:48:45 +02:00
Dmitry Shevchuk
007fd6ce9c Adds ability to query right vmselect endpoint based on the query type (#1050)
* Adds ability to query right vmselect endpoint based on the query type

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2021-02-03 23:48:44 +02:00
Aliaksandr Valialkin
88ee836d0c docs/vmalert.md: mention that type option can be set at group level additionally to rule level 2021-02-03 21:12:39 +02:00
Aliaksandr Valialkin
6811445b64 docs: document ability to query Graphite datasource from vmalert 2021-02-01 15:28:31 +02:00
Nikolay
b8bc1c2e0f Graphite vmalert wip (#112)
* init implementation for graphite alerts

* adds graphite support for vmalert

* small fix

* changes vmalert graphite api with type

* updates tests

* small fix

* fixes graphite parse

* Fixes graphite from time
2021-02-01 15:28:30 +02:00
weng zhao
2a8a34ea05 vmalert: add option datasource.queryStep to allow user to address the inconsistency between grafana dashboards(query_range with step 15s usually) and ALERTS (#1027)
Co-authored-by: zhao.weng <zhao.weng@shopee.com>
2021-01-26 16:38:20 +02:00
Roman Khavronenko
304512b668 vmalert-989: return non-empty result in template func query stub to pass validation (#1002)
On templates validation stage vmalert does not acutally send queries, so for complex
chained expression validation may fail. To avoid this, we add a blank sample in response
so validation can pass successfully. Later, during the rule execution, stub will be replaced
with real `query` function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989
2021-01-11 12:59:33 +02:00
Nikolay
14915071d6 adds escape for CRLF (#984)
at external.alert.source - \n and \r symbols was url encoded, instead of direct usage.
replace it from "\n" to `\n`  allows to skip url encoding.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
2020-12-25 11:06:47 +02:00
Aliaksandr Valialkin
b480585905 app/vmalert: typo fix in descriptions for notifier.basicAuth.username and notifier.basicAuth.password command-line flags 2020-12-24 12:49:40 +02:00
Aliaksandr Valialkin
d8511b6651 docs: mention that it is possible to set multiple -notifier.tlsInsecureSkipVerify command-line flags for vmalert
See c3a92968343c2b3619f1ab935702d0e9b3a46733
2020-12-22 22:32:56 +02:00
Nikolay
67e470e598 changes vmalert notifier flag, (#978)
fixes issue with notifier insecure setting, now its possible to use multiple notifier.tlsInsecureSkipVerify multiple time.
2020-12-22 22:27:03 +02:00
Roman Khavronenko
9ce8b36d2a vmalert-974: fix order for labels templating (#975)
The change fixes bug caused by 3adf8c5a6f.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/974
2020-12-19 14:21:27 +02:00
Roman Khavronenko
9f578e389c vmalert: add function "query", "first" and "value" to alert templates functions (#960)
The commit adds a support for template function `query`,
`first` and `value`. The function `query` executes
a MetricsQL query for active alerts. In vmalert we
update templates on every evaluation for active alerts
to keep them up to date. With `query` func it may become
a perf issue since it will fire a query on every execution.
We should keep it in mind for now.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539
2020-12-14 20:12:16 +02:00
Aliaksandr Valialkin
fc82c22e50 docs: consistently use links to https://victoriametrics.github.io for documentation references 2020-12-11 21:09:17 +02:00
Aliaksandr Valialkin
1a237c6903 all: properly handle CPU limits set on the host system/container
This can reduce memory usage on systems with enabled CPU limits.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:03 +02:00
Aliaksandr Valialkin
7bdf07883b app/{vmalert,vmagent}: skip empty values in -remoteWrite.label and -label lists 2020-12-08 14:54:02 +02:00
Aliaksandr Valialkin
bdac2171f1 all: do not print usage info for all the flags when incorrect command-line flag is passed
This should improve usability for VictoriaMetrics apps that have big number of command-line flags,
i.e. all the apps.
2020-12-03 21:46:19 +02:00
Nikolay
e4e33cb757 fixes checksum calculation (#928)
* fixes checksum calculation,
'for' rule param wasnt marshal properly during checksum calculation

* fixes error
2020-11-29 09:50:57 +02:00
Aliaksandr Valialkin
7ceaf4ba8f all: consistently return text-based HTTP responses with charset=utf-8
This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2020-11-13 10:30:21 +02:00
Roman Khavronenko
4fd2b6cd16 vmalert: explicitly set extra labels to alert entities (#886)
The previous implementation treated extra labels (global and rule labels) as
separate label set to returned time series labels. Hence, time series always contained
only original labels and alert ID was generated from sorted labels key-values.
Extra labels didn't affect the generated ID and were applied on the following actions:
- templating for Summary and Annotations;
- persisting state via remote write;
- restoring state via remote read.

Such behaviour caused difficulties on restore procedure because extra labels had to be dropped
before checking the alert ID, but that not always worked. Consider the case when expression
returns the following time series `up{job="foo"}` and rule has extra label `job=bar`.
This would mean that restored alert ID will be always different to the real time series because
of collision.

To solve the situation extra labels are now always applied beforehand and `vmalert` doesn't
store original labels anymore. However, this could result into a new error situation.
Consider the case when expression returns two time series `up{job="foo"}` and `up{job="baz"}`,
while rule has extra label `job=bar`. In such case, applying extra labels will result into
two identical time series and `vmalert` will return error:
 `result contains metrics with the same labelset after applying rule labels`

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-11-10 00:27:56 +02:00
Roman Khavronenko
333675875f vmalert: skip automatically added labels on alerts restore (#871)
Label `alertgroup` was introduced in #611 and automatically added to generated
time series. By mistake, this new label wasn't correctly purged on restore event
and affected alert's ID uniqueness. This commit removes `alertgroup` label
in restore function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-11-01 23:26:00 +02:00
kreedom
4526cf92d3 vmalert - add dryRun (#842)
vmalert: add `dryRun` flag for rules validation without running the service
2020-10-20 10:49:22 +03:00
Roman Khavronenko
d6155a3f33 vmalert: update docs to highlight the state restore requirements; (#833)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/830
2020-10-13 18:34:00 +03:00
Aliaksandr Valialkin
4b1c401790 app/vmalert: accept days, weeks and years in for: part of config like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817
2020-10-08 20:13:20 +03:00
Aliaksandr Valialkin
f9f8e4a39c app/vmalert: do not pring description for all the flags on config errors
The description is too big to consume by human and it just distracts humans.
2020-10-08 13:35:46 +03:00
Dmitry Shihovtsev
aec863e70b Fix typos in the vmalert datasource (#814)
* Fix typos in the vmalert datasource

* Fix typo in the vmalert datasource test
2020-10-07 18:00:29 +03:00
Roman Khavronenko
368b890e11 vmalert: make maxIdleConnections configurable for datasource HTTP client (#797)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/795
2020-09-30 09:51:14 +03:00
Aliaksandr Valialkin
543f3aea97 all: consistently use "%w" formatting in fmt.Errorf for wrapped errors 2020-09-23 22:48:21 +03:00
Aliaksandr Valialkin
8cd89cb847 app/vmalert: remove unneeded UTC() call
UTC() doesn't change the underlying timestamp, so the call isn't needed here
2020-09-21 15:56:48 +03:00
Roman Khavronenko
d111969d39 vmalert: add support for datasource.lookback flag (#779)
New datasource flag `datasource.lookback` defines how far to look into
past when evaluating queries.

Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/668
2020-09-21 15:56:47 +03:00
Roman Khavronenko
0042b0f307 vmalert: fix the typo in error message (#782)
The error will be always nil so no sense in printing it.
2020-09-21 11:36:09 +03:00
Roman Khavronenko
e2b31590e6 vmalert: add Group name as label to generated alerts and timeseries (#761)
Solves #611
2020-09-11 23:41:12 +03:00
Roman Khavronenko
16e0bb496e vmalert: update groups on config reload only if changes detected (#759)
On config reload event `vmalert` reloads configuration for every group. While
it works for simple configurations, the more complex and heavy installations may
suffer from frequent config reloads.
The change introduces the `checksum` field for every group and is set to md5 hash
of yaml configuration. The checksum will change if on any change to group
definition like rules order or annotation change. Comparing the `checksum` field
on config reload event helps to detect if group should be updated.
The groups update is now done concurrently, so reload duration will be limited by
the slowest group now.

Partially solves #691 by improving config reload speed.
2020-09-11 23:41:12 +03:00
Aliaksandr Valialkin
475698d2ad docs: sync docs for vmalert, vmauth, vmbackup and vmrestore 2020-09-09 21:10:48 +03:00
Nikolay Khramchikhin
80a9dc79fe changed vmalert behaviour (#738)
* VMAlert start with empty rules dir

There are some applications (operator for instance), that generates alerts configuration at runtime
and vmalert must start correctly without rules to support this behaviour.
Later application will add rules files and send SIGHUP to vmalert,
which will trigger reading rules files and start rules exectuion.

Removing rules files with SIGHUP signal must stop rules execution and
vmalert will wait for new rules.

* imports sorted

* added test cases for empty rules, removed blank line

* fixed imports conflict

* updated tests
2020-09-03 11:07:40 +03:00
Aliaksandr Valialkin
7ac10ee978 app/vmalert: imrovements over 3f932c2db1 2020-09-03 01:14:30 +03:00
DexterZhang
85f49ad439 feat: spread load of rule evaluation by group when starting new groups (#724)
* feat: spread load of rule evaluation by group when starting new groups

* review: reduce the resulting diff.

* Update app/vmalert/group.go

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2020-09-03 01:14:26 +03:00
Roman Khavronenko
08b76cb26f vmalert: update -rule flag description to enforce quotes using (#709)
Description for `-rule` flag uses as example specific chars like asterisks
which could be interpreted wrong by different shells. To avoid this, description
now contains quoted flag values.

See also #708
2020-08-28 09:46:35 +03:00
Aliaksandr Valialkin
e7c0b2ca56 docs: update docs 2020-08-14 19:14:46 +03:00
Aliaksandr Valialkin
60c7397be5 all: support %{ENV_VAR} placeholders in yaml configs in all the vm* components
Such placeholders are substituted by the corresponding environment variable values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/583
2020-08-13 17:17:06 +03:00
Aliaksandr Valialkin
6721e47ae9 app: respect CPU limits set via cgroups
Update GOMAXPROCS to limits set via cgroups. This should reduce CPU trashing and reduce memory usage
for cases when VictoriaMetrics components run in containers with CPU limits.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685
2020-08-11 23:01:03 +03:00
Roman Khavronenko
78afc61896 app/vmalert: extend metrics set exported by vmalert #573 (#654)
* app/vmalert: extend metrics set exported by `vmalert` #573

New metrics were added to improve observability:
+ vmalert_alerts_pending{alertname, group} - number of pending alerts per group
per alert;
+ vmalert_alerts_acitve{alertname, group} - number of active alerts per group
per alert;
+ vmalert_alerts_error{alertname, group} - is 1 if alertname ended up with error
during prev execution, is 0 if no errors happened;
+ vmalert_recording_rules_error{recording, group} - is 1 if recording rule
 ended up with error during prev execution, is 0 if no errors happened;
* vmalert_iteration_total{group, file} - now contains group and file name labels.
This should improve control over specific groups;
* vmalert_iteration_duration_seconds{group, file} - now contains group and file name labels. This should improve control over specific groups;

Some collisions for alerts and recording rules are possible, because neither
group name nor alert/recording rule name are unique for compatibility reasons.

Commit contains list of TODOs for Unregistering metrics since groups and rules
are ephemeral and could be removed without application restart. In order to
unlock Unregistering feature corresponding PR was filed - https://github.com/VictoriaMetrics/metrics/pull/13

* app/vmalert: extend metrics set exported by `vmalert` #573

The changes are following:
* add an ID label to rules metrics, since `name` collisions within one group is
a common case - see the k8s example alerts;
* supports metrics unregistering on rule updates. Consider the case when one rule
was added or removed from the group, or the whole group was added or removed.

The change depends on https://github.com/VictoriaMetrics/metrics/pull/16
where race condition for Unregister method was fixed.
2020-08-09 09:42:05 +03:00
Aliaksandr Valialkin
67cacb22ac lib/httpserver: add -tls, -tlsCertFile and -tlsKeyFile command-line flags in every vm binary
This makes such binaries compatible with binaries from `master` branch (aka single-node version)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/677
2020-08-07 10:57:32 +03:00
Aliaksandr Valialkin
106e302d7a all: add mssing APP_NAME to vm*-GOARCH builds 2020-07-31 13:45:32 +03:00
Aliaksandr Valialkin
945645f38f docs/{vmagent,vmalert}: add instruction on how to build for ARM 2020-07-31 09:25:41 +03:00
Roman Khavronenko
ec6ed467c6 app/vmalert: support external.label to specify global labelset for all rules #622 (#652)
`external.label` flag supposed to help to distinguish alert or recording rules
source in situations when more than one `vmalert` runs for the same datasource
or AlertManager.
2020-07-28 14:23:04 +03:00
Aliaksandr Valialkin
31ef39e8da lib/httpserver: log remote address in error message from httpserver.Errorf
This should improve detection of the root cause of errors.
Thanks to Anant for the idea.
2020-07-20 14:06:29 +03:00
Aliaksandr Valialkin
ce381b3868 app/vmalert: consistently use "%w" instead of "%s" in fmt.Errorf when wrapping errors 2020-07-15 13:55:13 +03:00
Roman Khavronenko
9afd19d375 app/vmalert: add retries to remotewrite (#605)
* app/vmalert: add retries to remotewrite

Remotewrite pkg now does limited number of retries if write request failed.
This suppose to make vmalert state persisting more reliable.

New metrics were added to remotewrite in order to track rows/bytes sent/dropped.

defaultFlushInterval was increased from 1s to 5s for sanity reasons.

* fix

* wip

* wip

* wip

* fix bits alignment bug for 32-bit systems

* fix mistakenly dropped field
2020-07-05 18:47:38 +03:00
Ween
d28fb0baf9 [VMAlert] Fix error log when remoteWrite queue size is full (#602)
* Fix Auto metrics relabeled errors

* Finalize auto-genenated  Labels

* Fix Test Errors

* fix error logs when queue is full

Co-authored-by: xinyulong <xinyulong@kuaishou.com>
2020-07-03 16:50:43 +03:00
Aliaksandr Valialkin
a45856570b all: typo fix: exptected -> expected 2020-07-02 18:06:21 +03:00
BigFish
aa26b94f33 fix: spelling mistakes (#594)
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2020-07-01 01:36:40 +03:00
Aliaksandr Valialkin
d962568e93 all: use %w instead of %s for wrapping errors in fmt.Errorf
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode .
See https://blog.golang.org/go1.13-errors for details.
2020-06-30 23:33:46 +03:00
Roman Khavronenko
156c83d112 app/vmalert: support multiple notifier urls (#584) (#590)
* app/vmalert: support multiple notifier urls (#584)

User now can set multiple notifier URLs in the same fashion
as for other vmutils (e.g. vmagent). The same is correct for
TLS setting for every configured URL. Alerts sending is done
in sequential way for respecting the specified URLs order.

* app/vmalert: add basicAuth support for notifier client (#585)

The change adds possibility to set basicAuth creds for notifier
client in the same fasion as for remote write/read and datasource.
2020-06-29 22:21:56 +03:00
Roman Khavronenko
bbeab70de6 app/vmalert: move flags description and initialization into subpackages
The change adds no new functionality and aims to move flags definitions
to subpackages that are using them. This should improve readability
of the main function.
2020-06-29 22:18:29 +03:00
kreedom
63c36e2e69 app/vmalert: properly set transport for HTTP clients
Fixes issue #586
2020-06-29 22:18:25 +03:00
nicbaz
ea2ed4b7e8 vmalert: add support for TLS configuration (#578)
app/vmalert: add support for TLS configuration

Add support for TLS optional configuration in a similar fashion to what
is currently supported in other vmutils such as vmagent. TLS
configuration options are distinct for datasource, remoteRead,
remoteWrite as well as notifier.
2020-06-23 22:47:23 +03:00
kreedom
f227799c87 Support of custom URL path for alert (#560)
app/vmalert: Support custom URL for alerts source

Add flag `external.alert.source` for configuring custom URL
for alert's source. This may be handy to re-point default source
URL to other systems like Grafana.
Updates #517
2020-06-21 16:33:58 +03:00
Roman Khavronenko
1a01fe2cf2 vmalert-537: allow name duplication for rules within one group. (#559)
Uniqueness of rule is now defined by combination of its name, expression and
labels. The hash of the combination is now used as rule ID and identifies rule within the group.

Set of rules from coreos/kube-prometheus was added for testing purposes to
verify compatibility. The check also showed that `vmalert` doesn't support
`query` template function that was mentioned as limitation in README.
2020-06-18 18:54:35 +03:00
Clémence Saussez
0b53e380cf app/vmalert: fix link to testdata (#547)
Fix broken link to vmalert test data
Signed-off-by: Clemence Saussez <clemence@zen.ly>
2020-06-10 19:37:21 +03:00
Roman Khavronenko
d71b6e6584 vmalert-491: allow to configure concurrent rules execution per group. (#542)
The feature allows to speed up group rules execution by
executing them concurrently.

Change also contains README changes to reflect configuration
details.
2020-06-09 15:22:11 +03:00
Roman Khavronenko
5c049bf4dd vmalert-521: allow to disable rules expression validation. (#536)
This feature may be useful for using `vmalert` with PromQL
compatible datasources like Loki.
2020-06-09 15:19:25 +03:00
Aliaksandr Valialkin
58069f5a6a app/vmalert: print brief usage info for vmalert -help 2020-06-05 10:43:24 +03:00
Aliaksandr Valialkin
045b87c662 app/vmalert: fix comment for UpdateWith exported methods 2020-06-01 14:35:03 +03:00
Roman Khavronenko
44c51c627f vmalert: Add recording rules support. (#519)
* vmalert: Add recording rules support.

Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;

* vmalert: uncomment tests

* vmalert: rm outdated TODO

* vmalert: fix typos in README
2020-06-01 13:53:46 +03:00
kreedom
2752d6cb26 vmalert add quotes escape function (#510)
* vmalert add quotes escape function

Co-authored-by: kreedom
2020-05-21 12:10:35 +03:00
Aliaksandr Valialkin
9ca781b8f0 app/vmalert/notifier: go fmt 2020-05-19 13:00:18 +03:00
kreedom
27911ae179 vmalert - add expr to variables, add escape functions (#495)
* vmalert - add expr to variables, add escape functions

Co-authored-by: kreedom
2020-05-19 11:55:03 +03:00
Roman Khavronenko
c7f3e58032 vmalert: avoid sending resolves for pending alerts (#498)
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state

We were sending Inactive notifications to resolve alert ASAP. 
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.

In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
2020-05-19 11:55:00 +03:00
Roman Khavronenko
e5f5342e18 vmalert: fix potential race during configuration reloads (#497)
Configuration reload and rules evaluation can't be executed
in same time now. This may make reload time longer but
prevents from potential races.
2020-05-19 11:54:55 +03:00
Aliaksandr Valialkin
b99d03a956 app/vmalert: run make quicktemplate-gen from the root dir of the repository 2020-05-16 22:45:45 +03:00
Aliaksandr Valialkin
2784015a4d all: print --help output to stdout instead of stderr
This is easier to grep and pipe
2020-05-16 12:03:06 +03:00
Roman Khavronenko
e850bf0eff vmalert: fix the access to rules slice element by wrong index (#486)
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
2020-05-15 13:26:06 +03:00
hagen1778
d369450f27 vmalert: update README 2020-05-15 13:26:04 +03:00
Aliaksandr Valialkin
3845420a8f lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:06:50 +03:00
Roman Khavronenko
e208e76222 vmalert: check if remoteRead object was initied before calling Restore (#473)
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
2020-05-13 22:57:26 +03:00
Roman Khavronenko
1523890742 vmalert: fix flag names and description in README (#475)
Change also adds the recommendation for `remotewrite`
queue error.
2020-05-13 22:57:20 +03:00
肖贝贝
8c3e9adf7f Feat/vmalert add max queue size (#472)
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency

Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-13 22:57:16 +03:00
Roman Khavronenko
0157566fdb vmalert: cleanup and restructure of code to improve maintainability (#471)
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.

Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.

Some tests were added to improve coverage.

Bug with wrong annotation value if $value is used in
 templates after metrics being restored fixed.

Notifier interface was extended to accept context.

New set of metrics was introduced for config reload.
2020-05-11 14:35:55 +03:00
Nikolay Khramchikhin
0e8c345ffb vmalert config reload
added config hot reload for vmalert with sighup and api call
2020-05-11 14:35:50 +03:00
Roman Khavronenko
abce2b092f app/vmalert: restore alerts state from datasource metrics (#461)
* app/vmalert: restore alerts state from datasource metrics

Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.

* app/vmalert: mention remotewerite and remoteread configuration in README
2020-05-05 00:52:19 +03:00
Artem Navoiev
121f7e1d56 Update README.md 2020-04-29 17:41:04 +03:00
Aliaksandr Valialkin
9ed4951ec8 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:30:06 +03:00
Aliaksandr Valialkin
a858b7e393 app/vmalert: added missing comments for public entities 2020-04-28 11:19:48 +03:00
Aliaksandr Valialkin
50af16baf2 app/vmalert: fix build 2020-04-28 00:34:01 +03:00
Aliaksandr Valialkin
e3db2c73a6 app/vmalert: sync with master branch 2020-04-28 00:19:42 +03:00
Aliaksandr Valialkin
7644f40763 app/vmalert: include it into the next release 2020-04-28 00:11:41 +03:00