Commit Graph

526 Commits

Author SHA1 Message Date
Roman Khavronenko
ffa75c423d
vmalert-521: allow to disable rules expression validation. (#536)
This feature may be useful for using `vmalert` with PromQL
compatible datasources like Loki.
2020-06-06 21:27:09 +01:00
Aliaksandr Valialkin
f5dd2a71a6 app/vmauth: disable automatic response compression/uncompression, since it may work improperly in some cases
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
2020-06-05 20:13:56 +03:00
Aliaksandr Valialkin
4b98e436ef app/vmauth: emit fatal errors instead of panics when incorrect command-line flags are set 2020-06-05 20:13:55 +03:00
Aliaksandr Valialkin
0d92abfbf6 app/vmalert: print brief usage info for vmalert -help 2020-06-05 10:43:18 +03:00
Aliaksandr Valialkin
ff1a725a56 app/vmauth: print brief usage info for vmauth -help 2020-06-05 10:40:00 +03:00
Aliaksandr Valialkin
05ae1472e3 app/vmagent: print brief usage info for vmagent -help 2020-06-05 10:39:59 +03:00
Aliaksandr Valialkin
f8692a1d43 app/vmauth: log when -auth.config is reloaded in SIGHUP 2020-06-03 23:22:14 +03:00
Aliaksandr Valialkin
d2f30e8d79 app/vmalert: fix comment for UpdateWith exported methods 2020-06-01 14:35:32 +03:00
Roman Khavronenko
270552fde4
vmalert: Add recording rules support. (#519)
* vmalert: Add recording rules support.

Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;

* vmalert: uncomment tests

* vmalert: rm outdated TODO

* vmalert: fix typos in README
2020-06-01 13:46:37 +03:00
Aliaksandr Valialkin
32652485e3 app/vmagent: reload -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig on SIGHUP and on /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/518
2020-05-30 14:37:12 +03:00
Aliaksandr Valialkin
d988f89415 app/vmagent: log fatal errors instead of panics when improper command-line flags are passed to vmagent 2020-05-30 14:23:14 +03:00
Aliaksandr Valialkin
5b6a9675d8 app/vmauth: fix make run-vmauth command 2020-05-22 16:45:02 +03:00
Aliaksandr Valialkin
8905bc2a40 app/vmagent: check for error returned from flag.Set 2020-05-21 16:31:14 +03:00
Aliaksandr Valialkin
f9847352b4 app/vmagent: add -dryRun option for checking all the configs mentioned in command-line flags without running vmagent
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/362
2020-05-21 15:23:27 +03:00
Aliaksandr Valialkin
dbd8beccfa app/vmselect/promql: add ascent_over_time(m[d]) and descent_over_time(m[d]) functions
These functions could be useful in GPS tracking apps for calculating the summary for height gain/loss
over the given duration `d`.
2020-05-21 12:07:48 +03:00
kreedom
6b23df2bec
vmalert add quotes escape function (#510)
* vmalert add quotes escape function

Co-authored-by: kreedom
2020-05-20 22:20:31 +03:00
Aaron France
619d4959c7 Update README.md 2020-05-20 09:04:31 +03:00
Aliaksandr Valialkin
70ea4e28a7 app/vmselect/promql: update numbers after the upgrade of github.com/VictoriaMetrics/metrics from v1.11.2 to v1.11.3 2020-05-20 03:06:23 +03:00
Aliaksandr Valialkin
80c18b7275 docs/vmagent.md: mention an alternative to refresh_interval option in scrape configs 2020-05-19 23:10:06 +03:00
Aliaksandr Valialkin
538fdfe133 app/vmselect/promql: move common code from aggrFuncOutliersK and newAggrFuncRangeTopK into getRangeTopKTimeseries 2020-05-19 16:11:14 +03:00
Aliaksandr Valialkin
f52769f6ee app/vmselect/promql: fix outilersk calculations 2020-05-19 14:44:53 +03:00
Aliaksandr Valialkin
a441cdd1d9 app/vmselect/promql: add outliersk(N, m) aggregate function for anomaly detection across groups of similar time series 2020-05-19 13:52:36 +03:00
Aliaksandr Valialkin
d0f08b4a58 app/vmalert/notifier: go fmt 2020-05-19 12:59:46 +03:00
kreedom
7e173655ba
vmalert - add expr to variables, add escape functions (#495)
* vmalert - add expr to variables, add escape functions

Co-authored-by: kreedom
2020-05-18 11:55:16 +03:00
Roman Khavronenko
92212f04da
vmalert: avoid sending resolves for pending alerts (#498)
Before the change we were sending notifications to notifier
if following conditions are met:
* alert is in Fire state
* alert is in Inactive state

We were sending Inactive notifications to resolve alert ASAP. 
Unfortunately, we were sending resolves for Pending alerts that become
Inactive, which is wrong.

In this change we delete alert from the active list if
it was Pending and become Inactive. In this way we now
have Inactive alerts only if they were in state Fire before.
See test change for example.
2020-05-17 15:13:22 +01:00
Roman Khavronenko
de60ad0cd6
vmalert: fix potential race during configuration reloads (#497)
Configuration reload and rules evaluation can't be executed
in same time now. This may make reload time longer but
prevents from potential races.
2020-05-17 15:12:09 +01:00
Aliaksandr Valialkin
eac3da478e app/vmalert: run make quicktemplate-gen from the root dir of the repository 2020-05-16 22:46:02 +03:00
Aliaksandr Valialkin
93c87d28f6 all: print --help output to stdout instead of stderr
This is easier to grep and pipe
2020-05-16 11:59:33 +03:00
Aliaksandr Valialkin
8cb35974af app/vmrestore: document better that vmrestore works like rsync --delete, i.e. it deletes files in -storageDataPath, which are missing in the backup 2020-05-16 09:22:17 +03:00
Aliaksandr Valialkin
3a68c47de0 app/vmagent/Makefile: fix make run-vmagent rule 2020-05-15 19:35:10 +03:00
Aliaksandr Valialkin
697b6af10f app/vmagent/remotewrite: remove unused import after the commit 93267f143f 2020-05-15 17:42:19 +03:00
Aliaksandr Valialkin
93267f143f app/vmagent/remotewrite: allow ingesting time series with multiple samples at once
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/481
2020-05-15 17:36:36 +03:00
Aliaksandr Valialkin
82ffbcb9a6 app/vmstorage: add vm_slow_metric_name_loads_total metric, which could be used as an indicator when more RAM is needed for improving query performance 2020-05-15 14:11:45 +03:00
Aliaksandr Valialkin
82ccdfaa91 app/vmstorage: add vm_slow_row_inserts_total and vm_slow_per_day_index_inserts_total metrics for determining whether VictoriaMetrics required more RAM for the current number of active time series 2020-05-15 13:44:32 +03:00
Roman Khavronenko
a249cd9d22
vmalert: fix the access to rules slice element by wrong index (#486)
During group's update rules deletion was causing slice
mutations while slice index was assumed to be unchanged.
This caused "slice bounds out of range" errors when multiple
rules were deleted sequentially.
2020-05-15 07:55:22 +01:00
hagen1778
ef0e37cb9e vmalert: update README 2020-05-15 09:17:28 +03:00
Aliaksandr Valialkin
0afd48d2ee lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:02:07 +03:00
Aliaksandr Valialkin
894e5d2b9b docs/vmbackup.md: add a link to vmbackuper tool 2020-05-13 22:54:22 +03:00
Roman Khavronenko
415b1ddfb5
vmalert: check if remoteRead object was initied before calling Restore (#473)
The check for non-nil remoteRead was mistakenly dropped
during refactoring which caused panics when `vmalert`
wasn't configured with `remoteRead` flag.
2020-05-13 19:32:58 +01:00
Roman Khavronenko
db7dd96346
vmalert: fix flag names and description in README (#475)
Change also adds the recommendation for `remotewrite`
queue error.
2020-05-13 19:32:21 +01:00
肖贝贝
ba48438b06
Feat/vmalert add max queue size (#472)
* feat: add remoteWrite.maxQueueSize to reduce queue full
* rename remote(write|read) flags to remote(Write|Read) for the sake of consistency

Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-13 18:58:56 +01:00
Aliaksandr Valialkin
7882a0dbbf app/vmselect/promql: suppress "SA4006: this value of dstValues is never used" error in golangci-lint 2020-05-13 11:47:08 +03:00
Aliaksandr Valialkin
96e001d254 app/vmagent: fix a bug with improper relabeling when multiple -remoteWrite.urlRelableConfig args are set
This bug could result in incorrect relabeling and metrics' drop.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/467
2020-05-12 22:02:58 +03:00
Aliaksandr Valialkin
faf92a0965 app/vmselect/promql: fix any(..) calculations - return all the data points instead of the first one 2020-05-12 20:36:42 +03:00
Aliaksandr Valialkin
cc311e20fe app/vmselect/promql: add any(x) by (y) aggregate function, which returns any time series from q for each group y 2020-05-12 19:45:56 +03:00
Aliaksandr Valialkin
574289c3fb app/vmselect/promql: support for sum(x) by (y) limit N syntax in order to limit the number of output time series after aggregation 2020-05-12 19:45:54 +03:00
Aliaksandr Valialkin
0a134ace63 app/vmagent: fix scraping mTLS targets, which has been broken in v1.35.1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2020-05-12 17:23:03 +03:00
Aliaksandr Valialkin
8300cc17af app/vmagent,lib/promscrape: do not set HostClient.DialDualStack, since it isnt used if HostClient.Dial is set 2020-05-12 15:24:18 +03:00
Aliaksandr Valialkin
6273385618 app/vmagent/remotewrite: properly dial TCP6 addresses set via -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/469
2020-05-12 15:22:29 +03:00
Aliaksandr Valialkin
dbd0c552d5 lib/storage: gradually pre-populate per-day inverted index for the next day
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:05 +03:00
Roman Khavronenko
8c8ff5d0cb
vmalert: cleanup and restructure of code to improve maintainability (#471)
The change introduces new entity `manager` which replaces
`watchdog`, decouples requestHandler and groups. Manager
supposed to control life cycle of groups, rules and
config reloads.

Groups export an ID method which returns a hash
from filename and group name. ID supposed to be unique
identifier across all loaded groups.

Some tests were added to improve coverage.

Bug with wrong annotation value if $value is used in
 templates after metrics being restored fixed.

Notifier interface was extended to accept context.

New set of metrics was introduced for config reload.
2020-05-10 17:58:17 +01:00
Nikolay Khramchikhin
9e8733ff65
vmalert config reload
added config hot reload for vmalert with sighup and api call
2020-05-09 10:32:12 +01:00
Aliaksandr Valialkin
baedb25936 docs/vmauth.md: fix a link to docker images 2020-05-08 14:10:04 +03:00
Aliaksandr Valialkin
51291015a5 app/vmagent: return 200 from /-/reload endpoint as Prometheus does 2020-05-07 19:30:30 +03:00
Aliaksandr Valialkin
6afb25fd08 docs/{vmagent,vmauth}: small clarifications in the docs 2020-05-07 12:55:20 +03:00
Aliaksandr Valialkin
653d51694a app/vmauth: prevent from attacks with .. in path for accessing resources outside the configured url_prefix 2020-05-07 12:55:18 +03:00
Aliaksandr Valialkin
8a00807f60 app/vmagent: allow setting independent auth configs per each configured -remoteWrite.url 2020-05-06 16:51:41 +03:00
Aliaksandr Valialkin
b69eb7bf38 app/vmagent: properly set client-side TLS certificates for -remoteWrite.url. Previously they were mistakenly set as server-side 2020-05-06 16:50:30 +03:00
Aliaksandr Valialkin
e8936c9cb3 docs/vmagent.md: small fixes 2020-05-06 14:49:18 +03:00
Aliaksandr Valialkin
3f52a97f9b lib/promscrape: add Prometheus-compatible DNS-based service discovery aka dns_sd_configs 2020-05-06 00:01:58 +03:00
Aliaksandr Valialkin
08320cfcf4 docs/{vmauth,vmagent}: fix ports for profiling 2020-05-05 20:15:47 +03:00
Aliaksandr Valialkin
f65930b34d docs/vmauth.md: mention that we can help creating customized proxy 2020-05-05 12:34:42 +03:00
Aliaksandr Valialkin
266327642b docs/{vmagent,vmauth}: add Profiling section 2020-05-05 11:45:13 +03:00
Aliaksandr Valialkin
0c7cddfca6 docs: add vmauth.md 2020-05-05 11:17:23 +03:00
Aliaksandr Valialkin
e767aedd17 app/vmauth: add initial version of vmauth. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md for details 2020-05-05 10:54:17 +03:00
Aliaksandr Valialkin
b5a780930d docs/vmagent.md: /targets page doesnt expose infomration about imporperly configured scrape configs now. It is written in error log instead 2020-05-05 10:54:14 +03:00
Roman Khavronenko
0ba1b5c71b
app/vmalert: restore alerts state from datasource metrics (#461)
* app/vmalert: restore alerts state from datasource metrics

Vmalert will restore alerts state for rules that have `rule.For` > 0 from previously written timeseries via `remotewrite.url` flag.

* app/vmalert: mention remotewerite and remoteread configuration in README
2020-05-05 00:51:22 +03:00
Aliaksandr Valialkin
40c3ffb359 lib/promscrape: add Prometheus-compatible service discovery for Consul aka consul_sd_configs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/330
2020-05-04 20:51:17 +03:00
Aliaksandr Valialkin
432187ac3b app/vminsert: add /-/reload handler in the same way as for vmagent 2020-04-30 02:15:39 +03:00
DexterZhang
67511d4165
feat(vmagent): add promscrap config reload suppport via http (#450)
* feat(vmagent): add promscrap config reload suppport via http endpoint `/-/reload`

* fix: typo fix
2020-04-30 02:00:32 +03:00
Aliaksandr Valialkin
43c39dc36c vendor: use github.com/VictoriaMetrics/fasthttp instead of github.com/fasthttp/fasthttp
The upstream fasthttp may contain issues like 996610f021 ,
plus a code that isn't used by VictoriaMetrics. So let's use a private copy under our control instead.
2020-04-29 17:33:34 +03:00
Artem Navoiev
4487b454a8
Update README.md 2020-04-29 12:39:15 +03:00
Aliaksandr Valialkin
57407cca83 app/vmselect/promql: remove -search.maxPointsPerTimeseries command-line flag
Limit the estimated time series count after aggregation with grouping by the number of source time series.
2020-04-29 00:20:04 +03:00
Aliaksandr Valialkin
4e4f57b121 lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics 2020-04-28 15:28:22 +03:00
Aliaksandr Valialkin
17d96e4503 app/vmselect: add -search.estimatedSeriesCountAfterAggregation command-line flag for tuning the probability of OOMs vs false-positive not enough memory errors 2020-04-28 12:52:37 +03:00
Aliaksandr Valialkin
1397612117 app/vmalert: added missing comments for public entities 2020-04-28 11:21:07 +03:00
Roman Khavronenko
3bfa41a95c
app/vmalert: initial remote-write support for alerts state persistence. (#442)
* app/vmalert: initial remote-write support for alerts state persistence.

If `remotewrite.url` flag is set, vmalert will send alerts state  via remote-write protocol to remote storage. The sending is asynchronous to avoid blocking calls in rules evaluation loop.

* app/vmalert: merge with master

* app/vmalert: write both `instant` and `for` alerts timeseries states in remote storage.
2020-04-28 00:18:02 +03:00
Aliaksandr Valialkin
90670cb55e app/vmalert: include it into the next release 2020-04-28 00:10:12 +03:00
Aliaksandr Valialkin
b768bc9a6a lib/promscrape: add initial support for Prometheus-compatible service discovery for Amazon EC2 aka ec2_sd_configs 2020-04-27 19:25:53 +03:00
Aliaksandr Valialkin
b4afe562c1 lib/storage: postpone reading data from blocks during search
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries, which touch big number of time series over long time ranges.

This improves single-node VM performance on heavy queries by up to 2x.
2020-04-27 11:45:24 +03:00
Aliaksandr Valialkin
0224071ebe lib/promscrape/discovery/gce: allow empty project and zone for gce_sd_config 2020-04-27 11:45:02 +03:00
Aliaksandr Valialkin
fcf57f9883 app/vmselect/netstorage: substitute sorting packedTimeseries with the natural order of the fetched blocks
This should minimize the number of disk seeks when reading data from temporary file.
2020-04-26 16:26:23 +03:00
Aliaksandr Valialkin
6954d0edb7 lib/promscrape/discovery/gce: allow empty zone arg in gce_sd_config - in this case zones for the given project are automatically discovered 2020-04-26 14:34:11 +03:00
kreedom
2c18548e08
alert - rename validate function and flags (#440)
* alert - rename validate function and flags
2020-04-26 14:15:04 +03:00
kreedom
5f61d43db9
vmalert - validate template in labels (#439) 2020-04-26 13:53:57 +03:00
肖贝贝
eeadfccdc5
fix: fix vmalert template label not complete bug (#435)
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-04-26 13:30:10 +03:00
Aliaksandr Valialkin
1f3fd93b58 docs/{vmbackup,vmrestore}.md: update -help output 2020-04-24 22:44:21 +03:00
Jason Gardner
66af7e40f3
app/vmbackup: added ability to create and delete snapshots during backup (#428)
* app/vmbackup: added ability to create and delete snapshots during backup

Resolves: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/422

* Add snapshot create and delete url flags

* Fixed errcheck warnings in build
2020-04-24 22:35:03 +03:00
Aliaksandr Valialkin
a596aec82c app/vmselect: fix description for -search.resetCacheAuthKey 2020-04-24 19:45:50 +03:00
Aliaksandr Valialkin
9ef5935552 lib/promscrape: initial implementation for gce_sd_configs aga Prometheus-compatible service discovery for Google Compute Engine 2020-04-24 17:51:22 +03:00
Aliaksandr Valialkin
364db13c9c app/vmselect: add /api/v1/status/tsdb page with useful stats for locating root cause for high cardinality issues
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin
9ebc937685 app/vmselect: add -search.minStalenessInterval command-line flag for removing gaps on graphs built from time series with irregular duration between samples
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:42:15 +03:00
Aliaksandr Valialkin
fe57d46687 app/vmselect: merge -search.maxLookback and -search.maxStalenessInterval flags, since it has been appeared they have identical purpose :(
Leave both flags for backwards compatibility reasons.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/209
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/426
2020-04-20 19:26:31 +03:00
Aliaksandr Valialkin
851946af1e deployment/docker: allow building docker images on top of any base image set via ROOT_IMAGE environment var
For example, the following command will build VictoriaMetrics docker image on top of alpine image:

    ROOT_IMAGE=alpine make package-victoria-metrics
2020-04-20 01:16:57 +03:00
Aliaksandr Valialkin
936fb0eac3 app/vmagent/remotewrite: retry sending data if the server closes keep-alive connection
This should fix the following error when sending data to remote storage:

couldn't send a block with size XX bytes to "YYY": the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-17 15:52:42 +03:00
Aliaksandr Valialkin
79fb595732 docs/vmagent.md: typo fix: unvailable -> unavailable 2020-04-17 13:11:31 +03:00
Aliaksandr Valialkin
546d26523c app/vmagent/README.md: mention about prodmscrape.suppressScrapeErrors 2020-04-17 13:08:21 +03:00
Aliaksandr Valialkin
f41e6a7bd9 app/vmselect: properly apply -search.maxLookback to queries sent to /api/v1/query 2020-04-17 12:30:11 +03:00
Aliaksandr Valialkin
071fdf5518 lib/logger: add WARN level for logging expected errors such as invalid user queries 2020-04-15 20:50:26 +03:00
Aliaksandr Valialkin
6f7f64f757 app/vmselect: handle timestamp(metric offset X) the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:00 +03:00