VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-21 07:56:26 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	1d7efc4c64	app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at `73f5fb0f0c`	2024-03-06 13:57:53 +02:00
Aliaksandr Valialkin	27b9e8ed3e	app/{vmagent,vminsert}: add `-streamAggr.dropInputSamples` command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation	2024-03-05 02:27:27 +02:00
Aliaksandr Valialkin	c38c45d71f	app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config This allows performing online de-duplication of incoming samples	2024-03-05 00:47:23 +02:00
Aliaksandr Valialkin	48a425898a	lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval For example, if `interval: 1m`, then data flush occurs at the end of every minute, while `interval: 1h` leads to data flush at the end of every hour. Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.	2024-03-04 06:23:35 +02:00
Aliaksandr Valialkin	63d635a5e4	app: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 03:06:14 +02:00
Anton L	8b7ff0f66e	#5833 Fix Deadlock when using shardByURL of VMAgent (#5834 )	2024-02-22 11:54:53 +02:00
Aliaksandr Valialkin	67091537ae	app/vmagent/remotewrite: add -remoteWrite.tlsHandshakeTimeout command-line flag for tuning tls handshake timeout to -remoteWrite.url Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699	2024-02-13 02:46:24 +02:00
Aliaksandr Valialkin	3b18659487	app/vmagent/remotewrite: limit the concurrency for marshaling time series before sending them to remote storage There is no sense in running more than GOMAXPROCS concurrent marshalers, since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth, but they may result in more RAM usage.	2024-01-30 12:20:27 +02:00
Aliaksandr Valialkin	d52fd73f18	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin	d566aa7d78	lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto	2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin	3a9cf13aaa	app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags This is a follow-up for `5ebd5a0d7b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427	2023-12-20 21:38:16 +02:00
Morgan	64e96fccd9	Expose OAuth2 Endpoint Parameters to cli (#5427) The user may wish to control the endpoint parameters, for instance to set the audience when requesting an access token. Exposing the parameters as a map allows for additional use cases without requiring modification.	2023-12-20 21:38:13 +02:00
Aliaksandr Valialkin	261c173f4b	all: use Gauge instead of Counter for `*_config_last_reload_successful` metrics This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag. See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details. This is follow-up for `326a77c697`	2023-12-20 14:25:44 +02:00
Aliaksandr Valialkin	bf187b2dc9	app/vmagent: add `-enableMultitenantHandlers` command-line flag This flag allows converting tenant id to (vm_account_id, vm_project_id) labels. this flag deprecates `-remoteWrite.multitenantURL` command-line flag, because `-enableMultitenantHandlers` is easier to use and combine with multitenant url at vminsert - https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels See https://docs.victoriametrics.com/vmagent.html#multitenancy Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505	2023-12-05 01:35:59 +02:00
Aliaksandr Valialkin	5ccc22d66d	app/vmagent: properly increase vmagent_remotewrite_samples_dropped_total when scraped samples cannot be sent to the remote storage and -remoteWrite.dropSamplesOnOverload is set This is a follow-up for `5034aa0773` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 14:44:42 +02:00
Aliaksandr Valialkin	2f14394335	app/vmagent: follow-up for `090cb2c9de` - Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check for the result returned from these functions. - Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to persistent queue. - Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case. - Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage cannot keep up with the data ingestion rate. - Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples. - Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push data to persistent queue when -remoteWrite.disableOnDiskQueue is set. - Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics, because they are replaced with vmagent_remotewrite_samples_dropped_total metric. - Update 'Disabling on-disk persistence' docs at docs/vmagent.md - Update stale comments in the code Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 12:13:39 +02:00
Nikolay	25ac2aac31	app/vmagent: allow disabling on-disk persistence (#5088) * app/vmagent: allow disabling on-disk queue Previously, it wasn't possible to build a data processing pipeline with a chain of vmagents. When remoteWrite for the last vmagent in the chain wasn't accessible, it persisted data only while it had enough disk capacity. If the disk queue was full, it started to silently drop ingested metrics. The new flags allow disabling on-disk persistence and immediately returning an error if remoteWrite is not accessible anymore. This blocks any writes and notifies the client that data ingestion isn't possible. Main use case for this feature: using an external queue such as kafka for data persistence. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110 * adds test, updates readme * apply review suggestions * update docs for vmagent * makes linter happy --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin	a906a7d85c	app/vmagent/remotewrite: do not drop persistent queues when -remoteWrite.multitenantURL is set It is unsafe to drop persistent queues when -remoteWrite.multitenantURL command-line flag is set, since these queues are created on demand when a new sample for the given tenant is pushed to the remote storage. This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5357 The issue appeared in the commit `f3a51e8b1d` when implementing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014	2023-11-23 20:43:21 +02:00
Aliaksandr Valialkin	369d37749d	app/vmagent/remotewrite: add -remoteWrite.shardByURL.labels command-line flag This command-line flag can be used for specifying a list of labels used for sharding among -remoteWrite.url entries when -remoteWrite.shardByURL command-line flag is set. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4942	2023-11-01 23:09:08 +01:00
Aliaksandr Valialkin	f03e81c693	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. - Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-26 09:55:47 +02:00
Hui Wang	d7dd7614eb	fix inconsistent behaviors with prometheus when scraping (#5153 ) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-26 08:56:54 +02:00
Aliaksandr Valialkin	b28f904dfa	app/vmagent/remotewrite: move sas var initialization closer to the place where it is used This makes the code slightly easier to understand. This is a follow-up for `1d3d989be5` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5170	2023-10-16 20:54:35 +02:00
hagen1778	1152c30430	app/vmagent/remotewrite: follow-up after `4f102ff945` `4f102ff945` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-16 20:54:35 +02:00
luosjde	c5bd3ff874	vmagent: fix streamaggr config reload bug https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5170 Authored-by: luoshaojun01 <luoshaojun01@baidu.com>	2023-10-16 20:54:35 +02:00
Aliaksandr Valialkin	a5a953fe1e	app/vmagent/remotewrite: fix data race when extra labels are added to samples before sending them to multiple remote storage systems See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972	2023-09-08 23:26:40 +02:00
Aliaksandr Valialkin	d8afd7fe98	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:25:49 +02:00
Aliaksandr Valialkin	1159b31270	app/vmagent/remotewrite: do not retry request immediately on io.ErrUnexpectedEOF, since this error isn't returned on stale connection Also, mention the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139 in comments to the code in order to simplify further maintenance of this code. This is a follow-up for `992a1c0a3a`	2023-08-29 09:48:49 +02:00
hagen1778	33bf28e1bd	app/vmagent: fix comment typo after `992a1c0a3a` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `757ae4275b`)	2023-08-27 09:05:04 +02:00
Roman Khavronenko	b9a2512ac3	vmagent: retry failed write request on the closed connection (#4857) * vmagent: retry failed write request on the closed connection Retry failed write request on the closed connection immediately, without waiting for backoff. This should improve data delivery speed and reduce amount of error logs emitted by vmagent when using idle connections. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmagent: retry failed write request on the closed connection Re-instantiate request before retry as body could have been already spoiled. Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com> (cherry picked from commit `992a1c0a3a`)	2023-08-27 09:04:59 +02:00
Aliaksandr Valialkin	50820ab1aa	app/vmagent/remotewrite: follow-up after `a27c2f3773` - Fix Prometheus-compatible naming after applying the relabeling if -usePromCompatibleNaming command-line flag is set. This should prevent from possible Prometheus-incompatible metric names and label names generated by the relabeling. - Do not return anything from relabelCtx.appendExtraLabels() function, since it cannot change the number of time series passed to it. Append labels for the passed time series in-place. - Remove promrelabel.FinalizeLabels() call after adding extra labels to time series, since this call has been already made at relabelCtx.applyRelabeling(). It is user's responsibility if he passes labels with double underscore prefixes to -remoteWrite.label. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247	2023-08-17 14:44:35 +02:00
Alexander Marshalov	73287a7c3a	vmagent: fixed premature release of the context (after #4247 / #4824 ) (#4849 ) Follow-up after `a27c2f3773` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247 Signed-off-by: Alexander Marshalov <_@marshalov.org> (cherry picked from commit `1e1a30ed7f`)	2023-08-17 12:16:04 +02:00
Alexander Marshalov	58cf862b05	fixed applying `remoteWrite.label` for pushed metrics (#4247 ) (#4824 ) vmagent: properly add extra labels before sending data to remote storage labels from `remoteWrite.label` are now added to sent metrics just before they are pushed to `remoteWrite.url` after all relabelings, including stream aggregation relabelings (#4247) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247 Signed-off-by: Alexander Marshalov <_@marshalov.org> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `a27c2f3773`)	2023-08-15 13:48:19 +02:00
Aliaksandr Valialkin	4b1f01e45d	lib/promrelabel: properly replace `:` char with `_` in metric names when -usePromCompatibleNaming command-line flag is set This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3113#issuecomment-1275077071 comment from @johnseekins	2023-08-14 16:18:17 +02:00
Aliaksandr Valialkin	0ee8a9120a	lib/flagutil: add defaultValue arg to NewArray{Int,Bytes,Duration} functions The defaultValue is printed in the flag description when passing -help to the app. This is a follow-up for `aef31f201a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4776	2023-08-12 04:19:34 -07:00
Aliaksandr Valialkin	02a54dbe63	app/vmagent/remotewrite: go fmt	2023-08-11 06:26:12 -07:00
Aliaksandr Valialkin	b9e34a1386	docs/CHANGELOG.md: add a link to stream aggregation for the description of the bugfix at `a4a1884237` This makes the description more clear. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4804	2023-08-11 05:48:10 -07:00
Aliaksandr Valialkin	d02fb47c2d	app/vmagent/remotewrite: keep in sync the default value for -remoteWrite.sendTimeout option in the description with the actually used timeout This is a follow-up for `aef31f201a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4776	2023-08-11 05:46:27 -07:00
Zakhar Bessarab	bffec2fc02	{vmagent/remotewrite,vminsert/common}: fix dropInput and keepInput flags inconsistency (#4809 ) {vmagent/remotewrite,vminsert/common}: fix dropInput and keepInput flags inconsistency Sync behavior for dropInput and keepInput flags between single-node and vmagent. Fix vmagent not respecting dropInput flag and reverse logic for keepInput.	2023-08-11 05:40:06 -07:00
Alexander Marshalov	d90dae2a68	add info about `remoteWrite.sendTimeout` default value (#4776 ) Signed-off-by: Alexander Marshalov <_@marshalov.org>	2023-08-11 04:53:16 -07:00
Aliaksandr Valialkin	fa295c7daa	app/vmagent: add ability to shard outgoing data among multiple remote storage systems Add -remoteWrite.shardByURL command-line flag, which instructs vmagent to spread evenly outgoing time series data among the configured remote storage systems specified via -remoteWrite.url . Samples for the same time series go to the same -remoteWrite.url . This allows building horizontally scalable stream aggregation when samples for counter and histogram series must be aggregated by the same second-level vmagent instance. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4637	2023-07-24 18:18:04 -07:00
Aliaksandr Valialkin	c049778ad1	lib/streamaggr: follow-up for `736197179e` - Use a byte slice instead of a map for tracking indexes for matching series. This improves performance, since access by slice index is faster than access by map key. - Re-use the byte slice for tracking indexes for matching series. This removes unnecessary memory allocations and improves stream aggregation performance a bit. - Add an ability to return to the previous behaviour by specifying -remoteWrite.streamAggr.dropInput command-line flag. In this case all the input samples are dropped when stream aggregation is enabled. - Backport the new stream aggregation behaviour from vmagent to single-node VictoriaMetrics when -streamAggr.config option is set. - Improve docs regarding this change at docs/CHANGELOG.md - Document the new behavior at docs/stream-aggregation.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4575	2023-07-24 17:06:09 -07:00
Zakhar Bessarab	470afac5ff	{lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag (#4575 ) * {lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * Update app/vmagent/remotewrite/remotewrite.go Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-07-24 16:34:38 -07:00
Aliaksandr Valialkin	992c300ce9	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:48:26 -07:00
Zakhar Bessarab	e42f856b56	app/vmagent/remotewrite: fix error message for auth config (#4545 ) Error message will be present for any auth error, but message claims an error is about OAuth2 configuration which is confusing. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-07-06 22:10:13 -07:00
Zakhar Bessarab	7925e9698f	app/vmagent/remotewrite: fix vmagent panic on shutdown (#4407) app/vmagent/remotewrite: fix vmagent panic on shutdown Currently, when vmagent is stopping it first flushes pending series in remote write context and proceeds to stop streaming aggregation. This leads to streaming aggregation being unable to write results into pending timeseries (since it is already nil) and panic. This can lead to some aggregation results being lost almost silently. The fix is reordering flow to first stop streaming aggregation and flush all pending time series after that. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `ce7141383d`)	2023-06-09 10:40:52 +02:00
Alexander Marshalov	d321ea91f2	fixed typos in documentation and commandline flags descriptions (#4275 )	2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin	079875a127	app/vmagent/remotewrite: make more user-friendly the warning message about too small -remoteWrite.maxdiskUsagePerURL value This is a follow-up for `bc17f4828c` . While at it, document the change at docs/CHANGELOG.md . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4195	2023-05-09 22:48:40 -07:00
Dmytro Kozlov	f425123116	app/vmagent,lib/persistentqueue: show warning message if `--remoteWrite.maxDiskUsagePerURL` flag lower than 500MB (#4196 ) * app/vmagent,lib/persistentqueue: show warning message if `--remoteWrite.maxDiskUsagePerURL` flag lower than 500MB * app/vmagent,lib/persistentqueue: linter fix * app/vmagent,lib/persistentqueue: fix comment	2023-05-08 15:45:21 -07:00
Aliaksandr Valialkin	cf4701db65	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:11:40 -07:00
Aliaksandr Valialkin	dad13c0a91	lib/streamaggr: follow-up for `ff72ca14b9` - Make sure that the last successfully loaded config is used on hot-reload failure - Properly cleanup resources occupied by already initialized aggregators when the current aggregator fails to be initialized - Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config This should simplify monitoring and debugging failed reloads - Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa could be in use at realoadSaConfig() - Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push(). - Remove fine-grained aggregator reload - reload all the aggregators on config change instead. This simplifies the code a bit. The fine-grained aggregator reload may be returned back if there will be demand from real users for it. - Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag - Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639	2023-03-31 22:54:10 -07:00

216 Commits