Commit Graph

506 Commits

Author SHA1 Message Date
Roman Khavronenko
bbeab70de6 app/vmalert: move flags description and initialization into subpackages
The change adds no new functionality and aims to move flags definitions
to subpackages that are using them. This should improve readability
of the main function.
2020-06-29 22:18:29 +03:00
kreedom
63c36e2e69 app/vmalert: properly set transport for HTTP clients
Fixes issue #586
2020-06-29 22:18:25 +03:00
Aliaksandr Valialkin
2b504f17de docs: update the info that docker images are built on top of alpine image now
A follow-up after the commit ff624c9125
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/522
2020-06-26 13:52:25 +03:00
Aliaksandr Valialkin
a586b8b6d4 app/vminsert/netstorage: do not re-route every time series to more than two vmstorage nodes when certain vmstorage nodes are temporarily slower than the rest of them
Previously vminsert may spread data for a single time series across all the available vmstorage nodes
when vmstorage nodes couldn't handle the given ingestion rate. This could lead to increased usage
of CPU and memory on every vmstorage node, since every vmstorage node had to register all the time
series seen in the cluster. Now a time series may spread to maximum two vmstorage nodes under heavy load.
Every time series is routed to a single vmstorage node under normal load.
2020-06-25 16:42:37 +03:00
Aliaksandr Valialkin
12b87b2088 app/vmselect/netstorage: reset big result values every 10 seconds instead of after processing every time series
This should reduce GC pressure when processing time series with big number of rows
2020-06-24 19:37:35 +03:00
nicbaz
46c5c0772c vmselect: fix label_replace when mismatch (#579)
As per documentation on `label_replace` function: "If the regular
expression doesn't match then the timeseries is returned unchanged".

Currently this behavior is not enforced, if a regexp on an existing
tag doesn't match then the tag value is copied as-is in the destination
tag. This fix first checks that the regular expression matches the
source tag before applying anything.

Given the current implementation, this fix also changes the behavior
of the **MetricsQL** `label_transform` function which does not
document this behavior at the moment.
2020-06-23 23:54:29 +03:00
nicbaz
ea2ed4b7e8 vmalert: add support for TLS configuration (#578)
app/vmalert: add support for TLS configuration

Add support for TLS optional configuration in a similar fashion to what
is currently supported in other vmutils such as vmagent. TLS
configuration options are distinct for datasource, remoteRead,
remoteWrite as well as notifier.
2020-06-23 22:47:23 +03:00
Aliaksandr Valialkin
0fdbe5de25 app/vmselect/netstorage: increase concurrency when processing small number of time series with big number of data points per each time series
Previously VictoriaMetrics was processing up to 32 time series in a single goroutine.
This could be slow if each time series contains big number of data points (10M+ or more), since only a single CPU core could be loaded with work,
while other CPU cores were idle. Fix this by launching GOMAXPROCS workers for time series processing.

This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/572
2020-06-23 22:45:57 +03:00
Aliaksandr Valialkin
3a444bb7bb lib/promrelabel: add support for keep_if_equal and drop_if_equal actions to relabel configs
These actions may be useful for filtering out unneeded targets and/or metrics if they contain equal label values.
For example, the following rule would leave the target only if __meta_kubernetes_annotation_prometheus_io_port
equals __meta_kubernetes_pod_container_port_number:

  - action: keep_if_equal
    source_labels: [__meta_kubernetes_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
2020-06-23 17:29:19 +03:00
kreedom
f227799c87 Support of custom URL path for alert (#560)
app/vmalert: Support custom URL for alerts source

Add flag `external.alert.source` for configuring custom URL
for alert's source. This may be handy to re-point default source
URL to other systems like Grafana.
Updates #517
2020-06-21 16:33:58 +03:00
Aliaksandr Valialkin
70bf8218bb app/vmselect/promql: properly override label values from group_left and group_right lists like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/577
2020-06-21 16:32:27 +03:00
Aliaksandr Valialkin
2fc2679a3f app/vminsert/netstorage: remove possible race condition when broken connection may be recovered before acquiring storageNode.bcLock 2020-06-20 16:38:08 +03:00
Aliaksandr Valialkin
9409a31c07 docs/vmauth.md: mention that we can provide custom integration with SAML 2020-06-19 13:13:53 +03:00
Aliaksandr Valialkin
4400700832 app/vminsert: properly replicate data for the last RF-1 storage nodes for -replicationFactor=RF
Previously the data for the last `RF-1` storage noes has been incorrectly replicated to the first storage node.
2020-06-19 12:40:22 +03:00
Aliaksandr Valialkin
4f673a5201 app/vminsert: export metrics for determining ingested rows with dropped or truncated labels
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/565
2020-06-19 01:12:44 +03:00
Aliaksandr Valialkin
6939e36fdd app/vmselect/promql: fill gaps on right side with values from left side of or operator in the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/552
2020-06-18 23:05:23 +03:00
Aliaksandr Valialkin
85c1ccb8b8 app/vminsert/netstorage: add missing return in storageNode.checkHealth on connection failure 2020-06-18 20:51:51 +03:00
Aliaksandr Valialkin
464682f380 app/vminsert/netstorage: periodically check for each -storageNode health, so it could be marked as healthy when it is ready to accept data
This fixes uneven data routing in cluster version when `-replicationFactor` is set to 1 (default value),
i.e. when the replication is disabled.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/546
2020-06-18 20:42:43 +03:00
Roman Khavronenko
1a01fe2cf2 vmalert-537: allow name duplication for rules within one group. (#559)
Uniqueness of rule is now defined by combination of its name, expression and
labels. The hash of the combination is now used as rule ID and identifies rule within the group.

Set of rules from coreos/kube-prometheus was added for testing purposes to
verify compatibility. The check also showed that `vmalert` doesn't support
`query` template function that was mentioned as limitation in README.
2020-06-18 18:54:35 +03:00
Aliaksandr Valialkin
87151e825e docs/vmbackup.md: mention that backups from single-node and cluster versions are incompatible 2020-06-18 18:54:34 +03:00
Aliaksandr Valialkin
cc2225cc49 app/vmselect: fix the error after 936f35920a 2020-06-12 22:00:45 +03:00
Aliaksandr Valialkin
936f35920a app/vmselect/prometheus: allow returning partial response from /api/v1/export if -search.denyPartialResponse=false
This makes `/api/v1/export` behaviour consistent with other `/api/v1/*` handlers.
2020-06-12 21:11:48 +03:00
Clémence Saussez
0b53e380cf app/vmalert: fix link to testdata (#547)
Fix broken link to vmalert test data
Signed-off-by: Clemence Saussez <clemence@zen.ly>
2020-06-10 19:37:21 +03:00
Roman Khavronenko
d71b6e6584 vmalert-491: allow to configure concurrent rules execution per group. (#542)
The feature allows to speed up group rules execution by
executing them concurrently.

Change also contains README changes to reflect configuration
details.
2020-06-09 15:22:11 +03:00
Roman Khavronenko
5c049bf4dd vmalert-521: allow to disable rules expression validation. (#536)
This feature may be useful for using `vmalert` with PromQL
compatible datasources like Loki.
2020-06-09 15:19:25 +03:00
Aliaksandr Valialkin
c1be462d42 app/vmauth: disable automatic response compression/uncompression, since it may work improperly in some cases
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
2020-06-05 20:14:07 +03:00
Aliaksandr Valialkin
7680b7155d app/vmauth: emit fatal errors instead of panics when incorrect command-line flags are set 2020-06-05 20:14:05 +03:00
Aliaksandr Valialkin
01719f4949 app/vmstorage/transport: simplify setupTfss in order to prevent the possibility of nil tfs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534
2020-06-05 13:17:26 +03:00
Aliaksandr Valialkin
e4cef1b678 app/vmstorage: prevent from serving conns from vminsert and vmselect after the server is closed
Previously it was possible that the connection is served after the server is closed if the following
steps are performed:

1) Server accepts new connection.
2) Server.MustClose() is called and successfully finished.
3) Server starts processing the connection accepted at step 1. There could be various crashes
   like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/534 since the storage may be already closed.

Now the server closes the connection at step 3 without processing it.
2020-06-05 11:55:48 +03:00
Aliaksandr Valialkin
58069f5a6a app/vmalert: print brief usage info for vmalert -help 2020-06-05 10:43:24 +03:00
Aliaksandr Valialkin
3848ea3a4a app/vmauth: print brief usage info for vmauth -help 2020-06-05 10:40:11 +03:00
Aliaksandr Valialkin
8ad8ca350a app/vmagent: print brief usage info for vmagent -help 2020-06-05 10:40:10 +03:00
Aliaksandr Valialkin
3d0a0b3785 lib/fs: optimize MustGetFreeSpace performance by caching the results for up to 2 seconds 2020-06-04 13:14:04 +03:00
DexterZhang
fa103875a0
feat(vmselect): add tmp block dir size metrics vm_tmp_blocks_files_size_total (#527)
* feat(vmselect): add tmp block dir size metrics `vm_tmp_blocks_files_size_total`

* refactor(vmselect): use free space instead of used space in tmp block file metrics

* fix: add `bytes` suffix to tmp dir free space metric
2020-06-04 13:05:50 +03:00
Aliaksandr Valialkin
faea804b88 app/vmauth: log when -auth.config is reloaded in SIGHUP 2020-06-03 23:22:20 +03:00
Aliaksandr Valialkin
045b87c662 app/vmalert: fix comment for UpdateWith exported methods 2020-06-01 14:35:03 +03:00
Aliaksandr Valialkin
43b14b9569 app/vminsert/netstorage: free up unused memory in buffer after memory usage spikes 2020-06-01 14:33:35 +03:00
Roman Khavronenko
44c51c627f vmalert: Add recording rules support. (#519)
* vmalert: Add recording rules support.

Recording rules support required additional service refactoring since
it wasn't planned to support them from the very beginning. The list
of changes is following:
* new entity RecordingRule was added for writing results of MetricsQL
expressions into remote storage;
* interface Rule now unites both recording and alerting rules;
* configuration parser was moved to separate package and now performs
more strict validation;
* new endpoint for listing all groups and rules in json format was added;
* evaluation interval may be set to every particular group;

* vmalert: uncomment tests

* vmalert: rm outdated TODO

* vmalert: fix typos in README
2020-06-01 13:53:46 +03:00
Aliaksandr Valialkin
37aa4fe282 app/vmagent: reload -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig on SIGHUP and on /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/518
2020-05-30 14:37:02 +03:00
Aliaksandr Valialkin
a646131a33 app/vmagent: log fatal errors instead of panics when improper command-line flags are passed to vmagent 2020-05-30 14:22:38 +03:00
Aliaksandr Valialkin
f41a01332a app/vminsert/netstorage: evenly distribute rerouted rows among all the availalbe storage nodes
Previously such rows were distributed to the original storage node or to the next storage node.
This may result to uneven load among the remaining storage nodes.
2020-05-30 13:51:09 +03:00
Aliaksandr Valialkin
02b2064d8e app/vminsert/netstorage: do not increment vm_rpc_rows_lost_total when all the vmstorage nodes are unavailable, since vminsert retries sending the data instead of dropping it 2020-05-28 22:36:56 +03:00
Aliaksandr Valialkin
7a61357b5d app/vminsert/netstorage: make sure that the the data is always replicated among -replicationFactor vmstorage nodes
Previously vminsert could write multiple copies of the data to a single vmstorage node when the ingestion rate
exceeds the maximum throughput for connections to vmstorage nodes.
2020-05-28 19:59:07 +03:00
Aliaksandr Valialkin
77e5165e7b app/vminsert: add -replicationFactor command-line flag for enabling data replication among available -storageNode instances 2020-05-27 17:29:44 +03:00
Aliaksandr Valialkin
b4e3bffe4b app/vminsert/netstorage: emit warnings instead of errors when re-routing data to healthy storage nodes 2020-05-27 16:31:41 +03:00
Aliaksandr Valialkin
75f2f3b09d app/vminsert/netstorage: improve ingestion performance when a single vmstorage node is slower than other vmstorage nodes
Previously the ingestion performance has been limited by the slowest vmstorage node.
Now vminsert should re-route data from the slowest vmstorage node to the remaining nodes.
2020-05-27 15:08:22 +03:00
Aliaksandr Valialkin
9844845d79 app/vminsert: tune the maximum summary buffer size for pending data to 1/4 of available RAM, since 1/2 of RAM is too big considering GOGC overhead 2020-05-25 02:00:37 +03:00
Aliaksandr Valialkin
4a82631e44 app/vminsert: limit the summary buffer sizes for all the storage nodes to a half of the allowed memory 2020-05-25 01:39:33 +03:00
Aliaksandr Valialkin
4bd3d4b148 app/vminsert/netstorage: do not return error from storageNode.flushBufLocked when the buffer has been successfully re-routed to healthy nodes
This should reduce the number of false errors in the log and the number of falsely lost rows
2020-05-22 18:29:43 +03:00
Aliaksandr Valialkin
6edc33d9bb app/vminsert/netstorage: capture the first error instead of the last error when sending data to vmstorage
The first error has more chances to point to the real root cause of the issue.
2020-05-22 17:49:33 +03:00