VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-24 03:06:48 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	36ee08cad4	Revert "lib/streamaggr: do not flush dedup shards in parallel" This reverts commit `eb40395a1c`. Reason for revert: it turned out that the performance gain on multiple CPU cores wasn't visible because the benchmark was generating incorrect pushSample.key. See a207e0bf687d65f5198207477248d70c69284296	2024-03-04 19:13:50 +02:00
Aliaksandr Valialkin	9728aaf5d9	lib/streamaggr: properly generate pushSample.key in benchmarks	2024-03-04 19:13:49 +02:00
Aliaksandr Valialkin	93a057e4e6	lib/streamaggr: reduce the number of pointers at "total" aggregation state This should reduce load on GC when scanning heap objects.	2024-03-04 19:13:49 +02:00
Aliaksandr Valialkin	9e00d8ad60	lib/streamaggr: use multiple job label values in BenchmarkAggregatorsPush instead of single value This should make the benchmark closer to production cases	2024-03-04 19:13:48 +02:00
Aliaksandr Valialkin	9773ad200e	lib/streamaggr: use multiple job labels in BenchmarkAggregatorsPush	2024-03-04 19:13:48 +02:00
Aliaksandr Valialkin	482560a1f3	lib/streamaggr: do not flush dedup shards in parallel Flushing them in parallel significantly increases CPU usage on systems with many CPU cores, while it doesn't reduce flush latency too much	2024-03-04 17:01:42 +02:00
Aliaksandr Valialkin	d7252fce79	lib/streamaggr: reduce memory allocations when registering new series in deduplication and aggregation structs	2024-03-04 17:01:41 +02:00
Aliaksandr Valialkin	402dc14ec0	lib/streamaggr: make aggregate.runFlusher() more robust and clear	2024-03-04 17:01:41 +02:00
Aliaksandr Valialkin	2ffef39bb3	lib/streamaggr: properly drop samples on the first incomplete interval Previously samples were dropped on the first incomplete interval and the next complete interval. Also make sure that the de-duplication is performed just before flushing the aggregate state. This should help the case when dedup_interval = interval.	2024-03-04 17:01:40 +02:00
Aliaksandr Valialkin	c2dae136b3	lib/streamaggr: explicitly call resetSeries after flushSeries This makes the code less fragile	2024-03-04 06:23:36 +02:00
Aliaksandr Valialkin	48a425898a	lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval For example, if `interval: 1m`, then data flush occurs at the end of every minute, while `interval: 1h` leads to data flush at the end of every hour. Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.	2024-03-04 06:23:35 +02:00
Aliaksandr Valialkin	d80deaeaf4	lib/streamaggr: ignore the first sample in new time series during staleness_interval seconds after the stream aggregation start for total and increase outputs	2024-03-04 03:04:58 +02:00
Aliaksandr Valialkin	5e9cbfd4db	lib/streamaggr: flush dedup state and aggregation state in parallel on all the available CPU cores This should reduce the time needed for aggregation state flush on systems with many CPU cores	2024-03-04 01:22:41 +02:00
Aliaksandr Valialkin	1e741ed6db	lib/streamaggr: add a benchmark for flushing dedup state	2024-03-04 01:22:40 +02:00
Aliaksandr Valialkin	5205972b83	lib/streamaggr: add a benchmark for measuring the performance of aggregator.flush	2024-03-04 01:22:40 +02:00
Aliaksandr Valialkin	8daf7a3f43	lib/streamaggr: add a benchmark for de-duplicating of 1M samples	2024-03-04 01:22:39 +02:00
Aliaksandr Valialkin	d4a425af87	lib/prompbmarshal: use clear() instead of a loop for clearing tss inside ResetTimeSeries()	2024-03-03 23:40:47 +02:00
Aliaksandr Valialkin	b958135677	lib/promutils: optimize LabelsCompressor.Decompress by using a specialized labelsMap struct instead of sync.Map The labelsMap struct employs the fact that label indexes are condensed around 0, so it stores the referred labels in a slice instead of map and uses slice index as label key. This allows increasing the LabelsCompressor.Decompress performance by up to 3x. This also reduces the latency of data flush in stream aggregation.	2024-03-03 23:25:27 +02:00
Aliaksandr Valialkin	0d5d46f9db	lib/streamaggr: huge pile of changes - Reduce memory usage by up to 5x when de-duplicating samples across big number of time series. - Reduce memory usage by up to 5x when aggregating across big number of output time series. - Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components for reducing memory usage for marshaled []prompbmarshal.Label. - Add `dedup_interval` option at aggregation config, which allows setting individual deduplication intervals per each aggregation. - Add `keep_metric_names` option at aggregation config, which allows keeping the original metric names in the output samples. - Add `unique_samples` output, which counts the number of unique sample values. - Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample per each newly encountered time series. - Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output. This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579 - Expose various metrics, which may help debugging stream aggregation: - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes, which took longer than the configured interval - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes, which took longer than the configured dedup_interval - Actualize docs/stream-aggregation.md The memory usage reduction increases CPU usage during stream aggregation by up to 30%. This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898	2024-03-02 03:15:43 +02:00
Aliaksandr Valialkin	31f0dc4b97	lib/streamaggr: allow one second aggregation interval	2024-03-01 21:35:43 +02:00
Aliaksandr Valialkin	7533070a52	lib/promrelabel: use clear() function inside CleanLabels()	2024-03-01 21:34:47 +02:00
Aliaksandr Valialkin	052f2177a4	lib/fs: fix GOOS=windows build after `f8baf29b6e`	2024-03-01 01:46:44 +02:00
Aliaksandr Valialkin	816202bca7	lib/protoparser/opentelemetry/firehose: verify that the full response is parsed properly in ProcessRequestBody This is a follow-up for `bf9cb84575` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5899	2024-03-01 00:39:47 +02:00
Andrii Chubatiuk	e575fb1aeb	opentelemetry: fix firehose message parsing (#5899 ) Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>	2024-03-01 00:24:14 +02:00
Aliaksandr Valialkin	01d8bee14c	lib/mergeset: use unsafe.Slice and unsafe.String instead of deprecated reflect.SliceHeader with unsafe conversion from slice header to string header	2024-02-29 17:29:40 +02:00
Aliaksandr Valialkin	99269ea640	lib/bytesutil: use unsafe.String instead of unsafe conversion of slice header to string header	2024-02-29 17:28:04 +02:00
Aliaksandr Valialkin	ddc61e2309	lib/fs: properly handle the case when data=nil is passed to mUnmap	2024-02-29 17:26:26 +02:00
Aliaksandr Valialkin	22acd84019	lib/storage: use unsafe.Slice instead of deprecated reflect.SliceHeader	2024-02-29 17:24:44 +02:00
Aliaksandr Valialkin	a9fb2e91a6	lib/protoparser/csvimport: use unsafe.Slice instead of deprecated reflect.SliceHeader	2024-02-29 17:20:05 +02:00
Aliaksandr Valialkin	9bc4c51ceb	lib/fs: use unsafe.Slice instead of deprecated reflect.SliceHeader	2024-02-29 17:18:42 +02:00
Aliaksandr Valialkin	4b1a262475	lib/fastnum: use unsafe.Slice() instead of deprecated reflect.SliceHeader	2024-02-29 17:17:24 +02:00
Aliaksandr Valialkin	3383f73191	lib/bytesutil: make BenchmarkToUnsafeString and BenchmarkToUnsafeBytes more reliable This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5880	2024-02-29 17:12:30 +02:00
helen	74b0605232	Optimize ToUnsafeBytes to make it leaner, more standards-compliant and (#5880 ) slightly faster.	2024-02-29 17:12:04 +02:00
XLONG96	88b9088499	lib/logstorage: avoid panic when parsing regex with stream filter (#5897 )	2024-02-29 15:32:25 +02:00
Aliaksandr Valialkin	7832d0800e	app/{vminsert,vmagent}: follow-up after `67a55b89a4` - Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md - Simplify parsing Firehose data. There is no need in trying to optimize the parsing with fastjson and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering. It is better to write clear code for better maintainability in the future. - Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose, since it is used only by opentelemetry parser. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893	2024-02-29 14:47:20 +02:00
Andrii Chubatiuk	60cf0c9656	{vmagent,vminsert}: added firehose http destination opentelemetry data ingestion support (#5893 ) Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-29 14:46:16 +02:00
Aliaksandr Valialkin	8187244153	lib/streamaggr: make the BenchmarkAggregatorsPushByJobAvg closer to production case with long list of labels per sample	2024-02-29 02:41:48 +02:00
Hui Wang	d6ecfffa17	chore: add actual request size in error message (#5889 )	2024-02-29 02:40:57 +02:00
Aliaksandr Valialkin	d845edc24b	lib: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 02:10:04 +02:00
Aliaksandr Valialkin	61519f6c22	lib/backup/actions: expose vm_backups_downloaded_bytes_total metric in order to be consistent with vm_backups_uploaded_bytes_total metric	2024-02-24 01:14:57 +02:00
Aliaksandr Valialkin	510e3d9cda	lib/backup/actions: update vm_backups_uploaded_bytes_total metric along the file upload instead of after the file upload This solves two issues: 1. The vm_backups_uploaded_bytes_total metric will grow more smoothly 2. This prevents from int overflow at metrics.Counter.Add() when uploading files bigger than 2GiB	2024-02-24 01:08:34 +02:00
Aliaksandr Valialkin	0ac1c533dc	lib/backup/actions: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 01:02:37 +02:00
Aliaksandr Valialkin	6fd6d4c2de	lib/storage: replace the remaining atomic.* functions with atomic.* types for the sake of consistency See `ea9e2b19a5`	2024-02-24 00:51:03 +02:00
Aliaksandr Valialkin	a1baf25c2e	lib/storage: consistently use atomic.* types instead of atomic.* function calls on ordinary types See `ea9e2b19a5`	2024-02-24 00:33:07 +02:00
Aliaksandr Valialkin	ca1e78bd16	lib/logstorage: consistently use atomic.* types instead of atomic.* functions on regular types See `ea9e2b19a5`	2024-02-24 00:29:39 +02:00
Aliaksandr Valialkin	d0538d11d3	lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types See `ea9e2b19a5`	2024-02-24 00:29:12 +02:00
Aliaksandr Valialkin	92e098012a	lib/logstorage: consistently use atomic.* type for refCount and mustDrop fields in datadb and storage structs in the same way as it is used in lib/storage See `ea9e2b19a5` and `a204fd69f1`	2024-02-24 00:28:56 +02:00
Aliaksandr Valialkin	7fa700a41c	lib/mergeset: consistently use atomic.* type for refCount and mustDrop fields in table struct in the same way as it is used in lib/storage See `ea9e2b19a5` and `a204fd69f1`	2024-02-24 00:28:37 +02:00
Aliaksandr Valialkin	e7dfcdfff6	lib/storage: consistently use atomic.* type for refCount and mustDrop fields in indexDB, table and partition structs See `ea9e2b19a5`	2024-02-24 00:26:26 +02:00
Aliaksandr Valialkin	e2b0cc873b	lib/storage: convert dedupsDuringMerge from uint64 to atomic.Uint64 This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions on int and bool types. See `ea9e2b19a5`	2024-02-24 00:25:44 +02:00
Aliaksandr Valialkin	1eb3346ecc	lib/{storage,mergeset}: properly fix 'unaligned 64-bit atomic operation' panic on 32-bit architectures The issue has been introduced in `bace9a2501` The improper fix was in the `d4c0615dcd` , since it fixed the issue just by accident, because Go compiler aligned the rawRowsShards field by 4-byte boundary inside partition struct. The proper fix is to use atomic.Int64 field - this guarantees that the access to this field won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860 and https://github.com/golang/go/issues/19057	2024-02-24 00:25:08 +02:00
Aliaksandr Valialkin	dc5b1e4dc1	lib/httpserver: return back the default value for -http.connTimeout to 2 minutes It turned out that there are VictoriaMetrics users who rely on the fact that VictoriaMetrics components were closing incoming connections to -httpListenAddr every 2 minutes by default. So let's return back this value by default in order to fix the breaking change made at `d8c1db7953` . See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1961891450 .	2024-02-24 00:20:11 +02:00
hagen1778	ab4fae9dc2	lib/storage: cleanup after `d4c0615dcd` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `c8d1d2ab72`)	2024-02-23 18:55:40 +01:00
Dmytro Kozlov	eb22083924	lib/storage: fix aligning (#5860 ) (cherry picked from commit `d4c0615dcd`)	2024-02-23 18:55:39 +01:00
Aliaksandr Valialkin	2a5c6e1cd5	app/vmstorage: deprecate -snapshotCreateTimeout command-line flag Creating snapshot shouldn't time out under normal conditions. The timeout was related to the bug, which has been fixed in `6460475e3b` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551	2024-02-23 04:51:57 +02:00
Aliaksandr Valialkin	42437e05c7	lib/storage: do not drop (date, metricID) entries for the date older than 2 days if samples are ingested at this date Previously the (date, metricID) entries for dates older than the last 2 days were removed. This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling). The issue has been introduced in `431aa16c8d`	2024-02-23 04:06:54 +02:00
Aliaksandr Valialkin	83217b7473	app/vmselect: add -search.maxLabelsAPIDuration and -search.maxLabelsAPISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values This commit returns back limits for these endpoints, which have been removed at `5d66ee88bd` , since it turned out that missing limits result in high CPU usage, while the introduced concurrency limiter results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-02-23 02:56:58 +02:00
Aliaksandr Valialkin	21170e558c	lib/promutils: hide the math.Round() logic inside ParseTimeMsec() function This should prevent from bugs similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5801 in the future This is a follow-up for `ce3ec3ff2e`	2024-02-23 01:21:42 +02:00
Aliaksandr Valialkin	dfcbcf4368	lib/mergeset: run `go fmt` after `bace9a2501`	2024-02-23 01:21:31 +02:00
Aliaksandr Valialkin	19032f9913	lib/{mergeset,storage}: convert buffered items to searchable parts more optimally Do not convert shard items to part when a shard becomes full. Instead, collect multiple full shards and then convert them to a searchable part at once. This reduces the number of searchable parts, which, in turn, should increase query performance, since queries need to scan smaller number of parts.	2024-02-23 01:21:03 +02:00
Nikolay	22762d7a69	app/vmselect: change export/csv timestamp format for rfc3339 to respect milliseconds (#5853 ) * app/vmselect: adds milliseconds to the csv export response for rfc3339 * milliseconds is a standard precision for VictoriaMetrics query request responses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5837 * app/victoria-metrics: adds tests for csv export/import follow-up after 3541a8d0cf96dd4f8563624c4aab6816615d0756 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-02-23 01:16:08 +02:00
Aliaksandr Valialkin	08c5250a7b	lib/storage: handle common case when the number of rows passed to flushRowsToInmemoryParts() doesn't exceed maxRawRowsPerShard	2024-02-23 01:12:18 +02:00
Aliaksandr Valialkin	8669584e9f	lib/{storage,mergeset}: convert buffered items into searchable in-memory parts exactly once per the given flush interval Previously the interval between item addition and its conversion to searchable in-memory part could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp() to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking the time when buffered items must be converted to searchable in-memory parts, since time.Now() calls aren't located in hot paths. Increase the flush interval for converting buffered samples to searchable in-memory parts from one second to two seconds. This should reduce the number of blocks, which are needed to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage. While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal data ingestion performance according to load tests. This reduces memory usage and CPU usage on systems with big amounts of RAM under high data ingestion rate.	2024-02-23 01:11:57 +02:00
Aliaksandr Valialkin	5f1fa8e7f7	lib/storage: avoid superfluous copy of block header data	2024-02-23 01:11:31 +02:00
Aliaksandr Valialkin	a982ab6bfb	app/vmstorage: expose vm_snapshots metric, which shows the current number of snapshots While at it, refresh docs about snapshots - https://docs.victoriametrics.com/#how-to-work-with-snapshots	2024-02-23 01:07:04 +02:00
Aliaksandr Valialkin	3f9022bc08	lib/storage: do not pool rawRowsBlock when flushing rawRows to in-memory blocks The pooled rawRowsBlock objects occupy big amounts of memory between flushes, and the flushes are relatively rare. So it is better not to use the pool and to allocate rawRow blocks on demand. This should reduce the average memory usage between flushes.	2024-02-23 01:06:28 +02:00
Aliaksandr Valialkin	bf07e2ac87	lib/storage: do not keep rawRows buffer across flush() calls The buffer can be quite big under high ingestion rate (e.g. more than 100MB). This leads to increased memory usage between buffer flushes. So it is better to re-create the buffer on every flush in order to reduce memory usage between buffer flushes.	2024-02-23 01:06:09 +02:00
Alexander Marshalov	8322425364	[lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5814 ) * [lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801) * fixed tests * fixed test * Revert "fixed test" This reverts commit `8a29764806`. * Revert "fixed tests" This reverts commit `9ce13d1042`. * Revert "[lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)" This reverts commit `a7a04bd4` * [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801) --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-02-23 00:58:26 +02:00
Aliaksandr Valialkin	b58c429044	app/vlselect: follow-up for `451d2abf50` - Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb. See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request. - Document the change at docs/VictoriaLogs/CHANGELOG.md - Document the `limit` query arg at docs/VictoriaLogs/querying/README.md - Make the change less intrusive. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778	2024-02-18 23:06:08 +02:00
Dmytro Kozlov	2d674f98d4	Enable the `limit` query param for the `/select/logsql/query` (#5778 ) * app/vlselect: add limit for logs query * app/vlselect: CHANGELOG.md * app/vlselect: stop search process if limit is reached, update logic, remove default limit * app/vlselect: fix tests * app/vlselect: fix filter tests * app/vlselect: fix tests	2024-02-18 22:59:16 +02:00
Aliaksandr Valialkin	82e38e1627	lib/promscrape: add support for `enable_compression` option in the same way as Prometheus does Updates https://github.com/prometheus/prometheus/pull/13166 Updates https://github.com/prometheus/prometheus/issues/12319 Do not document enable_compression option at docs/sd_configs.md, since vmagent already supports more clear disable_compression option - see https://docs.victoriametrics.com/vmagent/#scrape_config-enhancements	2024-02-18 19:42:09 +02:00
Aliaksandr Valialkin	f0db7d474f	lib/promscrape/discovery/kuma: add support for `client_id` option See https://github.com/prometheus/prometheus/pull/13278	2024-02-18 19:19:55 +02:00
Aliaksandr Valialkin	55bba932d4	docs/CHANGELOG.md: document `f8207e33a2`	2024-02-17 17:55:01 +02:00
Alexander Marshalov	89e9bfc276	lib/httputils: fixed error message for getting zero duration (#5795 ) (#5812 ) (cherry picked from commit `f8207e33a2`)	2024-02-16 15:31:59 +01:00
Aliaksandr Valialkin	33b2553c78	app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition	2024-02-15 14:52:53 +02:00
Aliaksandr Valialkin	d4875cccdf	lib/uint64set: go fmt after `c0a9b87f46`	2024-02-15 14:52:53 +02:00
Aliaksandr Valialkin	4e9b70e8b4	lib/mergeset: optimize Set.AddMulti() a bit for len(items) < 10000 This should improve the search speed for time series matching the given label filters	2024-02-15 14:31:00 +02:00
Aliaksandr Valialkin	c89f4c97f3	lib/uint64set: benchmark AddMulti on small number of items, since this case is the most frequent in lib/storage	2024-02-15 14:31:00 +02:00
Aliaksandr Valialkin	06da06dac0	lib/promrelabel: store the original labels before returning them to promutils.PutLabels() This should reduce memory allocations. This is a follow-up for `b09bd6c42a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2024-02-14 16:09:38 +02:00
Aliaksandr Valialkin	990a46c478	lib/promrelabel: factor out applyInternal code into ApplyDebug and Apply functions This improves readability and maintainability Also remove memory allocation from SortLabels()	2024-02-14 14:27:44 +02:00
Aliaksandr Valialkin	61608b6303	lib/promscrape: avoid copying labels when -promscrape.dropOriginalLabels command-line flag is set This should save some CPU This regression has been introduced in `487f6380d0` when working on https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2024-02-14 03:26:32 +02:00
Aliaksandr Valialkin	f5680a6857	all: upgrade Go builder from Go1.21.7 to Go1.22.0 See https://go.dev/doc/go1.22	2024-02-12 22:14:00 +02:00
Aliaksandr Valialkin	99aaa5067f	lib/mergeset: do not panic on too long items passed to Table.AddItems() Instead, log a sample of these long items once per 5 seconds into error log, so users could notice and fix the issue with too long labels or too many labels. Previously this panic could occur in production when ingesting samples with too long labels.	2024-02-12 20:18:19 +02:00
Aliaksandr Valialkin	397bb8771b	lib/mergeset: properly record the firstItem in metaindexRow at blockStreamWriter.WriteBlock The `3c246cdf00` added an optimization where the previous metaindexRow could be saved to disk when the current block header couldn't be added indexBlock because the resulting indexBlock size became too big. This could result in an empty metaindexRow.firstItem for the next metaindexRow.	2024-02-12 20:16:50 +02:00
Aliaksandr Valialkin	838b2275d7	lib/storage: do not append headerData to bsw.indexData if its size exceeds maxBlockSize This is a follow-up optimization after `3c246cdf00`	2024-02-12 20:16:32 +02:00
Aliaksandr Valialkin	ae12ac69ba	lib/snapshot: move Time, Validate and NewName into lib/snapshot/snapshotutil package This allows removing importing unneeded command-line flags into binaries, which import lib/storage, which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions. This is a follow-up for `83e55456e2` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738	2024-02-09 04:19:30 +02:00
Aliaksandr Valialkin	cf64597878	all: add support for specifying multiple -httpListenAddr options	2024-02-09 03:22:49 +02:00
Aliaksandr Valialkin	ae7da12280	lib/httpserver: do not close client connections every 2 minutes by default Closing client connections every 2 minutes doesn't help load balancing - this just leads to "jumpy" connections between multiple backend servers, e.g. the load isn't spread evenly among backend servers, and instead jumps between the servers every 2 minutes. It is still possible to close client connections periodically by specifying a non-zero -http.connTimeout command-line flag. This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037 This is a follow-up for `d387da142e`	2024-02-08 21:10:54 +02:00
Khushi Jain	a076cb4a93	app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738 ) (cherry picked from commit `83e55456e2`)	2024-02-08 15:58:34 +01:00
Aliaksandr Valialkin	1856c9fcc1	lib/mergeset: add a test for too long item passed to Table.AddItems()	2024-02-08 14:14:23 +02:00
Aliaksandr Valialkin	d2a846eddd	lib/mergeset: typo fix: indexdb/indexBlock -> indexdb/indexBlocks	2024-02-08 14:14:23 +02:00
Aliaksandr Valialkin	950b126a09	lib/{storage,mergeset}: do not create index blocks with sizes exceeding 64Kb in common case This should reduce memory fragmentation and memory usage for indexdb/indexBlocks and storage/indexBlocks caches	2024-02-08 14:14:22 +02:00
Aliaksandr Valialkin	1c3eac5c1e	lib/mergeset: verify that the index block for in-memory part doesn't exceed the 3*maxIndexBlockSize	2024-02-08 14:14:22 +02:00
Aliaksandr Valialkin	9a3a88b321	lib/mergeset: do not store commonPrefix in blockHeader if the block contains only a single item There is no sense in storing commonPrefix for blockHeader containing only a single item, since this only increases blockHeader size without any benefits.	2024-02-08 14:14:22 +02:00
Aliaksandr Valialkin	ae2a9c8195	lib/mergeset: prevent from possible `too big indexBlockSize` panic This panic could occur when samples with too long label values are ingested into VictoriaMetrics. This could result in too long firstItem and commonPrefix values at blockHeader (up to 64kb each). This may inflate the maximum index block size by 4 * maxIndexBlockSize.	2024-02-08 12:55:58 +02:00
Aliaksandr Valialkin	ec02e9ba19	lib/protoparser/datadogsketches: use math.RoundToEven() for calculating the rank The original code uses this function - see `48d52eeea6/pkg/quantile/sparse.go (L138)` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775	2024-02-07 21:45:05 +02:00
Aliaksandr Valialkin	28fffdfcc7	lib/protoparser/datadogsketches: add more permalinks to the original source code These permalinks should help verifying the correctness of the code This is a follow-up after `07213f4e0c` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775	2024-02-07 21:45:05 +02:00
Andrii Chubatiuk	3aa439a618	added ddsketch permalink (#5775 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>	2024-02-07 21:45:04 +02:00
Aliaksandr Valialkin	5d9e0ab71e	docs/CHANGELOG.md: support empty command-line flag values in short array notation For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values: - the first and the last one are set to the default value for `-fooDuration` - the second one is set to 10s	2024-02-07 20:55:01 +02:00
Aliaksandr Valialkin	82f4e4e070	app/{vmagent,vminsert}: follow-up after `a1d1ccd6f2` - Document the change at docs/CHANGELOG.md - Copy changes from docs/Single-server-VictoriaMetrics.md to README.md - Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy ) - Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests - Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it and add permalinks to the original source code there. - Various code cleanups Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091	2024-02-07 01:31:52 +02:00
Andrii Chubatiuk	c634859c4f	support datadog /api/beta/sketches API (#5584 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-07 01:30:00 +02:00
Aliaksandr Valialkin	293617028d	lib/storage: move fixupTimestamps() call to Block.Init() This is a follow-up for `0bf7921721`	2024-02-06 22:44:09 +02:00
Zakhar Bessarab	fdbc44d813	lib/storage/raw_row: properly initialize TS for tmp blocks (#5762 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-02-06 22:44:08 +02:00
Aliaksandr Valialkin	e19b53748a	lib/fs: lazily open the file at ReaderAt on the first access This should significantly reduce the number of open ReaderAt files on VictoriaMetrics and VictoriaLogs startup. The open files can be tracked via vm_fs_readers metric	2024-02-06 21:10:00 +02:00
Aliaksandr Valialkin	bace92fab6	lib/httpserver: add support for mTLS for requests to -httpListenAddr	2024-02-06 17:47:27 +02:00
Aliaksandr Valialkin	f222cf9200	lib/cgroup: remove SetGOGC() function GOGC can be already set via environment variable. There is no need in adding new approaches for setting the GOGC (such as command-line flag), since they complicate operations.	2024-02-05 12:13:08 +02:00
Aliaksandr Valialkin	8148cc52c9	lib/prompbmarshal: code cleanup after `8aaa828ba3`	2024-02-01 21:41:10 +02:00
Aliaksandr Valialkin	7a9f0b32a2	app/vmselect/netstorage: prevent from disk write IO when closing temporary files Remove temporary file before closing it in order to signal the OS that it shouldn't store the file contents from page cache to disk when the file is closed. Gracefully handle the case when the file cannot be removed before being closed - in this case remove the file after closing it. This allows working on Windows. Also remove superfluous opening of temporary file for reading - re-use already opened file handle for writing. This is a follow-up for `9b1e002287` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2024-02-01 19:54:48 +02:00
Dima Lazerka	d561f506cd	Improve docs on security http headers (#5262 ) * Improve docs on security http headers * Apply suggestions from code review --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-02-01 14:40:57 +02:00
noodles2hg	60a8e59366	lib/logstorage: proper exit during block search (#5400 )	2024-02-01 14:11:20 +02:00
Jiajing LU	9c75e3ee15	count inmemoryParts that have not been taken for merge (#5447 )	2024-02-01 14:07:13 +02:00
Aliaksandr Valialkin	6c56f49f9c	lib/prompbmarshal: return back custom protobuf marshaler for lib/prompbmarshal.WriteRequest The easyproto-based marshaler is 2x slower than the previous custom marshaler, so let's stick with it. This improves the performance for sending data to remote storage at vmagent and reduces CPU usage to pre-v1.97.0 levels.	2024-02-01 06:34:46 +02:00
Aliaksandr Valialkin	faeabfc730	lib/encoding: follow-up for `49e3665d6d` Improve performance for typical cases of varint marshaling / unmarshaling further. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721	2024-02-01 05:38:58 +02:00
Fuchun Zhang	78af9b3e30	make encoding.MarshalVarInt64s faster (#5721 ) * make encoding.MarshalVarInt64s faster * add fast path for MarshalVarInt64s * make UnmarshalVarUint64s faster * remove comment	2024-02-01 03:33:59 +00:00
Aliaksandr Valialkin	eee210810e	lib/encoding: added benchmarks for marshaling / unmarshaling of varints This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721	2024-02-01 05:11:35 +02:00
helen	99ea84f0fd	clean unused code (#5735 ) Signed-off-by: helen <haitao.zhang@daocloud.io>	2024-01-31 19:51:35 +02:00
Aliaksandr Valialkin	cc626ae3b5	lib/promauth: follow-up for `fca3b14b7b` - Simplify the code for handling BasicAuthConfig at lib/promauth/config.go - Move the description of the change into correct place at docs/CHANGELOG.md - Put tests for username in front of tests for password at lib/promauth/config_test.go Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5720 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511	2024-01-31 19:47:53 +02:00
Nihal	bcd094ac8b	Support for username_file in scrape config (basic_auth) similar to Prometheus for having config compatibility (#5720 ) * adding support for username_file in basic_auth of scrape config Signed-off-by: Syed Nihal <syed.nihal@nokia.com> * adding support for username_file in basic_auth of scrape config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511 Signed-off-by: Syed Nihal <syed.nihal@nokia.com> * adding support for username_file in basic_auth of scrape config Signed-off-by: Syed Nihal <syed.nihal@nokia.com> * adding support for username_file in basic_auth of scrape config Signed-off-by: Syed Nihal <syed.nihal@nokia.com> * adding support for username_file in basic_auth of scrape config Signed-off-by: Syed Nihal <syed.nihal@nokia.com> --------- Signed-off-by: Syed Nihal <syed.nihal@nokia.com>	2024-01-31 19:47:50 +02:00
Aliaksandr Valialkin	09c388a8e4	lib/promscrape: use the standard net/http.Client instead of fasthttp.Client for scraping targets in non-streaming mode While fasthttp.Client uses less CPU and RAM when scraping targets with small responses (up to 10K metrics), it doesn't work well when scraping targets with big responses such as kube-state-metrics. In this case it could use big amounts of additional memory comparing to net/http.Client, since fasthttp.Client reads the full response in memory and then tries re-using the large buffer for further scrapes. Additionally, fasthttp.Client-based scraping had various issues with proxying, redirects and scrape timeouts like the following ones: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5425 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2794 - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017 This should help reducing memory usage for the case when target returns big response and this response is scraped by fasthttp.Client at first before switching to stream parsing mode for subsequent scrapes. Now the switch to stream parsing mode is performed on the first scrape after reading the response body in memory and noticing that its size exceeds the value passed to -promscrape.minResponseSizeForStreamParse command-line flag. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567 Overrides https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4931	2024-01-30 18:39:55 +02:00
Aliaksandr Valialkin	645365b2d1	lib/promscrape: fix BenchmarkScrapeWorkScrapeInternal, which has been broken by the commit `65bc460323`	2024-01-30 16:07:40 +02:00
Aliaksandr Valialkin	61562cdee9	lib/storage: keep (date, metricID) entries only for the last two dates Entries for the previous dates are usually not used, so there is little sense in keeping them in memory. This should reduce the size of storage/date_metricID cache, which can be monitored via vm_cache_entries{type="storage/date_metricID"} metric.	2024-01-29 18:44:27 +01:00
hagen1778	2ff94b2bfa	lib/streamaggr: fix incorrect err message for min `interval` value Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-29 17:27:23 +01:00
Aliaksandr Valialkin	f5559c038c	lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests This limit has little sense for these APIs, since: - These APIs frequently result in scanning of all the time series on the given time range. For example, if extra_filters={datacenter="some_dc"} . - Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit, which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests. Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API. This limit shouldn't affect typical use cases for these APIs: - Grafana dashboard load when dashboard labels should be loaded - Auto-suggestion list load when editing the query in Grafana or vmui Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-01-29 16:44:46 +01:00
Aliaksandr Valialkin	412f872597	lib/decimal: follow-up for `e6bad5174f` - Add a benchmark for CalbirateAndScale. - Reduce the decimal multipliers table size from 256Kb to 192bytes. - Use more clear naming for variables. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5672	2024-01-27 00:08:32 +01:00
Fuchun Zhang	e6bad5174f	Optimize the performance of data merge: decimal.CalibrateScale() (#5672 ) * Optimize the performance of data merge: decimal.CalibrateScale() from 49633 ns/op to 9146 ns/op * Optimize the performance of data merge: decimal.CalibrateScale()	2024-01-27 00:05:04 +01:00
Hui Wang	f579adf05f	add inserting comma inside value instruction to flag description (#5666 )	2024-01-26 22:47:33 +01:00
Roman Khavronenko	9e9f170fe7	lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689 ) Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected. The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 22:45:45 +01:00
Aliaksandr Valialkin	7a8b92b590	lib/{mergeset,storage}: make background merge more responsive and scalable - Maintain a separate worker pool per each part type (in-memory, file, big and small). Previously a shared pool was used for merging all the part types. A single merge worker could merge parts with mixed types at once. For example, it could merge simultaneously an in-memory part plus a big file part. Such a merge could take hours for big file part. During the duration of this merge the in-memory part was pinned in memory and couldn't be persisted to disk under the configured -inmemoryDataFlushInterval . Another common issue, which could happen when parts with mixed types are merged, is uncontrolled growth of in-memory parts or small parts when all the merge workers were busy with merging big files. Such growth could lead to significant performance degradation for queries, since every query needs to check ever growing list of parts. This could also slow down the registration of new time series, since VictoriaMetrics searches for the internal series_id in the indexdb for every new time series. The third issue is graceful shutdown duration, which could be very long when a background merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted, since it merges in-memory parts. A separate pool of merge workers per every part type elegantly resolves these issues: - In-memory parts are merged to file-based parts in a timely manner, since the maximum size of in-memory parts is limited. - Long-running merges for big parts do not block merges for in-memory parts and small parts. - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files. Merging for file parts is instantly canceled on graceful shutdown now. - Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm should automatically self-tune according to the number of available CPU cores. 
- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly. It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge - Tune the number of shards for pending rows and items before the data goes to in-memory parts and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate for registration of new time series. This should reduce the duration of data ingestion slowdown in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily unavailable. - Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown. This is a follow-up for `fa566c68a6` . Thanks to @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2024-01-26 22:19:52 +01:00
Aliaksandr Valialkin	c067f3f288	lib/mergeset: remove inmemoryBlock pooling, since it wasn't effective This should reduce memory usage a bit when new time series are ingested at high rate (aka high churn rate)	2024-01-26 21:34:22 +01:00
Aliaksandr Valialkin	230ef43a32	lib/logstorage: make sure that WaitGroup.Add isn't called after stopCh is closed and WaitGroup.Wait is called This protects from rare panic, which may occur during graceful shutdown of VictoriaLogs	2024-01-26 21:18:07 +01:00
Aliaksandr Valialkin	0715f1efcd	lib/storage: rename AssistedMerges to AssistedMergesCount in order to make these field names less misleading These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code	2024-01-25 10:21:13 +02:00
Aliaksandr Valialkin	1cdef56d84	lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts The maxFileParts usage has been accidentally removed in `fa566c68a6` While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading. Previously their names were falsely suggesting that these are gauges, which show the number of concurrently executed assisted merges.	2024-01-24 15:10:48 +02:00
Aliaksandr Valialkin	b8c7f0d3bc	lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct This is a follow-up for `ef12598ad4`	2024-01-24 15:10:47 +02:00
Aliaksandr Valialkin	1e364c992d	lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers Already terminated pods and containers cannot be scraped and will never resurrect, so there is zero sense in creating scrape targets for them.	2024-01-24 14:58:51 +02:00
Aliaksandr Valialkin	e6e5b97e1e	lib/streamaggr: expand `%{ENV}` placeholders in stream aggregation configs	2024-01-24 12:31:42 +02:00
Aliaksandr Valialkin	12698b9136	lib/mergeset: really limit the number of in-memory parts to 15 It appears that the registration of new time series slows down linearly with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part when it searches for TSID by newly ingested metric name. The number of in-memory parts grows when new time series are registered at high rate. The number of in-memory parts grows faster on systems with big number of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries for the indexdb, and every such entry is transformed eventually into a separate in-memory part. The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 by @misutoth - to limit the number of in-memory parts with buffered channel. This solution is implemented in this commit. Additionally, this commit merges per-CPU parts into a single part before adding it to the list of in-memory parts. This reduces CPU load when searching for TSID by newly ingested metric name. The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190	2024-01-24 03:41:19 +02:00
Aliaksandr Valialkin	8dd73574ca	lib/encoding: remove unneeded re-slicing of byte slice before passing it to binary.BigEndian.Uint*	2024-01-23 22:50:11 +02:00
Aliaksandr Valialkin	5a97668ad6	lib/handshake: substitute time.Now() with fasttime.UnixTimestamp(), since profiling shows time.Now() is slow	2024-01-23 18:39:28 +02:00
Aliaksandr Valialkin	3199558da9	lib/{storage,mergeset}: reduce the maximum compression level for the stored data This reduces CPU usage a bit, while it doesn't increase resulting file sizes according to synthetic tests.	2024-01-23 17:47:40 +02:00
Aliaksandr Valialkin	68d76b1436	lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average. The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.	2024-01-23 16:13:25 +02:00
Aliaksandr Valialkin	9b3217db61	lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting	2024-01-23 16:13:19 +02:00
Aliaksandr Valialkin	7ed7eb95b4	lib/filestream: do not measure read / write duration from / to in-memory buffers Measuring read / write duration from / to in-memory buffers has little sense, since it will be always fast. It is better to measure read / write duration from / to real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics. This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers. This is a follow-up for `2f63dec2e3`	2024-01-23 14:53:35 +02:00
Roman Khavronenko	8461add541	lib/promscrape: respect `0` value for `series_limit` param (#5663 ) * lib/promscrape: respect `0` value for `series_limit` param Respect `0` value for `series_limit` param in `scrape_config` even if global limit was set via `-promscrape.seriesLimitPerTarget`. Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`. This behavior aligns with possibility to override `series_limit` value via relabeling with `__series_limit__` label. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 13:09:36 +02:00
Aliaksandr Valialkin	c2927053ee	lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock() Previously the checks were too strict by requiring to leave the same first and last items by prepareBlock() Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655	2024-01-23 12:59:04 +02:00
Aliaksandr Valialkin	389159767d	lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651	2024-01-23 03:16:30 +02:00
Zakhar Bessarab	60ef978ffc	lib/storage: print tenant ID in log when discarding or truncating labels (#5658 ) Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels or label values. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-23 02:27:59 +02:00
Aliaksandr Valialkin	d52fd73f18	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin	64e615e6cc	lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex. The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache. Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores. The mutable cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache. This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds because of bottleneck at the mutex. Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2 cache hits under the mutex, e.g. when the entry is found in the mutable cache. This should automatically adjust intervals between merges depending on the addition rate for new time series (aka churn rate): - The interval will be much smaller than 10 seconds under high churn rate. This should reduce the mutex contention for mutable cache. - The interval will be bigger than 10 seconds under low churn rate. This should reduce the unneeded work on merging of mutable cache into immutable cache.	2024-01-22 18:14:30 +02:00
Aliaksandr Valialkin	c6f6f094c5	Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577 )" This reverts commit `cfec258803`. Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled. The scrapeWork.areIdenticalSeries() function always returns true if stale markers are disabled. This prevents from storing the last response at scrapeWork.processScrapedData(). It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577	2024-01-22 01:46:12 +02:00
Aliaksandr Valialkin	d4a1a28543	app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery() This is a follow-up for `cf03e11d89` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630	2024-01-22 01:39:27 +02:00
Hui Wang	49fa92c1d0	lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557 ) * lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long. * remove mislead comment * docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 * wip * lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds. But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice` is being registered and the discovery of the associated `pod` and/or `service` objects takes longer than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details. Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher if the number of in-flight calls is non-zero. P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets. * typo fix --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-22 01:33:17 +02:00
Aliaksandr Valialkin	885ee160c2	all: allow dynamically reading *AuthKey flag values from files and urls Examples: 1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath 2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath 3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url The flag value is automatically updated when the file contents changes.	2024-01-22 01:23:23 +02:00
Aliaksandr Valialkin	5f5fcab217	all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time This allows avoiding slow inter-CPU synchronization induced by atomic.CompareAndSwap*	2024-01-22 01:13:41 +02:00
Aliaksandr Valialkin	be5faef552	lib/promscrape: code cleanup: send stale markers immediately after generating automatic metrics This cleanup has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5557/files#diff-6b205cf6637d7b65a5c45d9417d08822d4efad94227268cb196f61aa2a0fc0f7	2024-01-22 01:12:56 +02:00
Aliaksandr Valialkin	e15f07d989	all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually	2024-01-22 01:11:59 +02:00
Aliaksandr Valialkin	2f94bef59c	lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410	2024-01-22 01:11:36 +02:00
Aliaksandr Valialkin	2c7c812a9d	lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640 Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593	2024-01-22 01:08:52 +02:00
Nikolay	e196c61e36	app/vmselect: abort streaming connections for vmselect (#5650 ) * app/vmselect: abort streaming connections for vmselect due to streaming nature of export APIs, curl and similar tools cannot detect errors that happened after http.Header with status 200 was written to it. This PR tracks if body write was already started and closes connection. It allows client to detect unexpected chunk sequence and return error to the caller. Mostly it affects vmselect at cluster version https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645 * wip Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5650 --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-22 00:54:32 +02:00
Aliaksandr Valialkin	c05982bfa7	lib/promscrape/discovery/hetzner: follow-up after `03a97dc678` - docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names, document missing __meta_hetzner_role label. - lib/promscrape/config.go: added missing MustStop() call for Hetzner SD, and moved the code to the correct place according to alphabetical order of SD names. - lib/promscrape/discovery/hetzner: properly handle pagination for hcloud API responses, populate missing __meta_hetzner_role label like Prometheus does. - Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does. - Remove unused SDConfig.Token. - Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154	2024-01-22 00:53:23 +02:00
Hui Wang	66eb013b54	lib/promscrape: do not store last scrape response when stale markers … (#5577 ) * lib/promscrape: do not store last scrape response when stale markers are disabled * update changelog	2024-01-22 00:52:25 +02:00
Aliaksandr Valialkin	41d6c8a7dd	lib/storage: do not prefetch metric names for small number of metricIDs This eliminates prefetchedMetricIDsLock lock contention for queries, which return less than 500 time series. This is a follow-up for `9d886a2eb0`	2024-01-17 13:50:01 +02:00
Aliaksandr Valialkin	09f23b0296	lib/promscrape: cosmetic changes after `3ac44baebe` - Rename mustLoadScrapeConfigFiles() to loadScrapeConfigFiles(), since now it may return error. - Split too long line with the error message into two lines in order to improve readability a bit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5508 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5560	2024-01-17 01:07:16 +02:00
Aliaksandr Valialkin	75c58ab306	lib/httputils: handle step=undefined query arg as an empty value This is needed for Grafana, which may send step=undefined when working with alerting rules and instant queries.	2024-01-17 00:13:04 +02:00
Aliaksandr Valialkin	f673039e86	lib/storage: follow-up for `4b8088e377` - Clarify the bugfix description at docs/CHANGELOG.md - Simplify the code by accessing prefetchedMetricIDs struct under the lock instead of using lockless access to immutable struct. This shouldn't worsen code scalability too much on busy systems with many CPU cores, since the code executed under the lock is quite small and fast. This allows removing cloning of prefetchedMetricIDs struct every time new metric names are pre-fetched. This should reduce load on Go GC, since the cloning of uint64set.Set struct allocates many new objects.	2024-01-16 22:38:57 +02:00
Hui Wang	2f40ed3aac	exit vmagent if there is config syntax error in `scrape_config_files` when `-promscrape.config.strictParse=true` (#5560 )	2024-01-16 22:35:18 +02:00
Aliaksandr Valialkin	6ba2fd3312	app/vmselect/promql: follow-up for `ce4f26db02` - Document the bugfix at docs/CHANGELOG.md - Filter out NaN values before sorting as suggested at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509#discussion_r1447369218 - Revert unrelated changes in lib/filestream and lib/fs - Use simpler test at app/vmselect/promql/exec_test.go Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5506	2024-01-16 22:13:13 +02:00
Zongyang	cb37df5723	FIX bottomk doesn't return any data when there are no time range overlap between timeseries (#5509 ) * FIX sort order in bottomk * Add lessWithNaNsReversed for bottomk * Add ut for TopK * Move lt from loop * FIX lint * FIX lint * FIX lint * Mod log format --------- Co-authored-by: xiaozongyang <xiaozngyang@kanyun.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-01-16 22:12:49 +02:00
Aliaksandr Valialkin	724223fad4	lib/prompbmarshal: move WriteRequest proto definition to the correct place	2024-01-16 21:57:03 +02:00
Aliaksandr Valialkin	0196902b2e	lib/promscrape/discovery/hetzner: fix golangci-lint warnings after `03a97dc678` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550	2024-01-16 21:51:48 +02:00
Aliaksandr Valialkin	9e5e514faf	lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() Previously there was a race condition when the background goroutine still could try collecting metrics from already stopped resources after returning from pushmetrics.Stop(). Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning. This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549 and the commit `fe2d9f6646` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548	2024-01-16 21:18:22 +02:00
Aleksandr Stepanov	3a6e3adc7d	vmagent: added hetzner sd config (#5550 ) * added hetzner robot and hetzner cloud sd configs * remove gettoken fun and update docs * Updated CHANGELOG and vmagent docs * Updated CHANGELOG and vmagent docs --------- Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-01-16 21:13:20 +02:00
Roman Khavronenko	d562d772a8	lib/storage: properly check for `storage/prefetchedMetricIDs` cache expiration deadline (#5607 ) Before, this cache was limited only by size. Cache invalidation by time happens with jitter to prevent thundering herd problem. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-16 21:08:59 +02:00
Aliaksandr Valialkin	d566aa7d78	lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto	2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin	f7b589e38a	lib/prompb: switch to github.com/VictoriaMetrics/easyproto	2024-01-16 20:43:09 +02:00
Aliaksandr Valialkin	7d40506744	lib/prompb: change type of Label.Name and Label.Value from []byte to string This makes it more consistent with lib/prompbmarshal.Label	2024-01-16 20:41:37 +02:00
Aliaksandr Valialkin	8cb138e8df	lib/protoparser/datadogv2: simplify code for parsing protobuf messages after `0597718435` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451	2024-01-16 20:35:17 +02:00
Aliaksandr Valialkin	f8ae2abd88	lib/protoparser/opentelemetry: use github.com/VictoriaMetrics/easyproto for protobuf message unmarshaling and marshaling This reduces VictoriaMetrics binary size by 100KB. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2424	2024-01-16 20:34:18 +02:00
Aliaksandr Valialkin	9eef72bce9	lib/protoparser/datadogv2: add support for reading protobuf-encoded requests at /api/v2/series endpoint Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094	2024-01-16 20:32:15 +02:00
Artem Navoiev	e1005209ba	docs: mention staleNaN handling during deduplication See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5587 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-16 20:11:45 +02:00
hagen1778	f301dc5cfb	lib/uint64: remove accidentally added test Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-09 13:32:22 +01:00
hagen1778	2a7207f38a	app/all: follow-up after `84d710beab` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-09 13:17:09 +01:00
zhdd99	84d710beab	lib/pushmetrics: fix a panic caused by pushing metrics during the graceful shutdown process of vmstorage nodes. (#5549 ) Co-authored-by: zhangdongdong <zhangdongdong@kuaishou.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-01-09 13:01:03 +01:00
Aliaksandr Valialkin	12de0d39eb	lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc.	2023-12-21 23:05:52 +02:00
Aliaksandr Valialkin	6feef14095	lib/protoparser: add missing /datadog/ prefix to the /api/v2/series path in the description for -datadog.maxInsertRequestSize command-line flag	2023-12-21 21:05:24 +02:00
Aliaksandr Valialkin	62a105d9e9	app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent This commit adds only JSON support - https://docs.datadoghq.com/api/latest/metrics/#submit-metrics , while recent versions of DataDog Agent send data to /api/v2/series in undocumented Protobuf format. The support for this format will be added later. Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451	2023-12-21 20:50:27 +02:00
Aliaksandr Valialkin	426a451435	lib/promauth: add more context to errors returned by Options.NewConfig() in order to simplify troubleshooting	2023-12-20 21:58:19 +02:00
Aliaksandr Valialkin	3a9cf13aaa	app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags This is a follow-up for `5ebd5a0d7b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427	2023-12-20 21:38:16 +02:00
Morgan	64e96fccd9	Expose OAuth2 Endpoint Parameters to cli (#5427) The user may wish to control the endpoint parameters, for instance to set the audience when requesting an access token. Exposing the parameters as a map allows for additional use cases without requiring modification.	2023-12-20 21:38:13 +02:00
Nikolay	46a335aa1d	lib/awsapi: properly assume role with webIdentity token (#5495 ) * lib/awsapi: properly assume role with webIdentity token introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token. First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials. Common use case for it - cross AWS accounts authorization https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822 * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-12-20 19:07:04 +02:00
Aliaksandr Valialkin	261c173f4b	all: use Gauge instead of Counter for `*_config_last_reload_successful` metrics This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag. See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details. This is follow-up for `326a77c697`	2023-12-20 14:25:44 +02:00
Aliaksandr Valialkin	0a99c819bf	all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page This may be needed for systems, which require this metadata such as Google Cloud Managed Prometheus. See https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#missing-metric-type	2023-12-19 03:26:02 +02:00
Aliaksandr Valialkin	9540d29154	lib/pushmetrics: add -pushmetrics.header and -pushmetrics.disableCompression command-line flags	2023-12-17 19:58:14 +02:00
Aliaksandr Valialkin	76b120e355	lib/protoparser/opentelemetry: allow ingesting metrics without resource labels Some clients may ingest samples via OpenTelemetry protocol without Resource labels. Previously VictoriaMetrics was silently dropping such samples. The commit `317834f876` added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"} counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459 It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.	2023-12-17 19:16:43 +02:00
Zakhar Bessarab	61f400eccb	lib/protoparser/opentelemetry: add metric to track skipped rows without resource (#5459) Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding a metric should simplify debugging. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `317834f876`)	2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin	329bd244d2	lib/fs: remove unused IsEmptyDir() This function became unused after the commit `43b24164ef`. The unused function has been found with the deadcode tool - https://go.dev/blog/deadcode	2023-12-14 19:40:52 +02:00
Aliaksandr Valialkin	e4bb2808f1	app/vmselect: add support for vmstorage groups with independent -replicationFactor per group Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5197 See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmstorage-groups-at-vmselect Thanks to @zekker6 for the initial pull request at https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/718	2023-12-13 00:14:34 +02:00
hagen1778	14117f2f90	lib/promscrape: cosmetic changes after `e373bb84d5` * fix typos in docs * add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `e0fc5ef140`)	2023-12-12 13:45:34 +01:00
Aliaksandr Valialkin	15a7542ef8	vendor: run `make vendor-update`	2023-12-11 10:48:47 +02:00
Aliaksandr Valialkin	49552eaa15	app/vmauth: add support for `hot standby` mode via `first_available` load balancing policy vmauth in `hot standby` mode sends requests to the first url_prefix while it is available. If the first url_prefix becomes unavailable, then vmauth falls back to the next url_prefix. This allows building highly available setup as described at https://docs.victoriametrics.com/vmauth.html#high-availability Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792	2023-12-08 23:32:10 +02:00
Aliaksandr Valialkin	475ae2a1be	lib/promscrape: add a warning when the /service-discovery page contains an incomplete list of dropped targets	2023-12-08 19:04:29 +02:00
noodles2hg	f3c237bae1	lib/streamaggr/streamaggr.go: fix link in error message (#5439 )	2023-12-08 18:14:29 +02:00
Aliaksandr Valialkin	9074ab68d4	lib/promscrape: add `-promscrape.cluster.memberURLTemplate` command-line flag for creating direct links to vmagent instances at /service-discovery page See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018#issuecomment-1843811569	2023-12-07 16:05:03 +02:00
Aliaksandr Valialkin	896a0f32cd	lib/promscrape: show -promscrape.cluster.memberNum values for vmagent instances, which scrape the given dropped target at /service-discovery page The /service-discovery page contains the list of all the discovered targets after the commit `487f6380d0` on all the vmagent instances in cluster mode ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). This commit improves debuggability of targets in cluster mode by providing a list of -promscrape.cluster.memberNum values per each target at /service-discovery page, which has been dropped because of sharding, e.g. if this target is scraped by other vmagent instances in the cluster. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018	2023-12-07 00:11:30 +02:00
Aliaksandr Valialkin	e8dfecb3f1	lib/promscrape: show `never scraped` message for never scraped targets at /targets page	2023-12-06 22:33:27 +02:00
Aliaksandr Valialkin	8b6bce61e4	lib/promscrape: follow-up for `97373b7786` Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down /metrics exposition significantly when -promscrape.config contains thousands of scrape jobs. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335	2023-12-06 17:36:48 +02:00
Hui Wang	065f5a7f9e	vmagent: add `vm_promscrape_scrape_pool_targets` for scrape jobs like… (#5335 ) * vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers * add extra panel for new metric	2023-12-06 14:46:02 +02:00
Aliaksandr Valialkin	559e4db512	Revert "add datadog /api/v2/series and /api/beta/sketches support (#5094 )" This reverts commit `d6b4c8e4ef`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:30:40 +02:00
Aliaksandr Valialkin	61db92cdc7	Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb" This reverts commit `73d18fbc7a`. Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080	2023-12-05 02:29:00 +02:00
Aliaksandr Valialkin	85fcefaa34	app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers - Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags. - Wait until message processing goroutines are stopped before returning from gcppubsub.Stop(). - Prevent from multiple calls to Init() without Stop(). - Drop message if tenantID cannot be parsed properly. - Take into account tenantID for all the supported message formats. - Support gzip-compressed messages for graphite format. - Use exponential backoff sleep when the message cannot be pushed to remote storage systems because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence - Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called. - Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub - Mention Google PubSub support at docs/enterprise.md - Make Google PubSub docs more clear at docs/vmagent.md This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984 and e6eab781ce42285a6a1750dc01eba6801dd35516 . Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717 Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713	2023-12-04 22:51:04 +02:00
Aliaksandr Valialkin	b6d6a3a530	lib/promscrape: show dropped targets because of sharding at /service-discovery page Previously the /service-discovery page didn't show targets dropped because of sharding ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ). Show also the reason why every target is dropped at /service-discovery page. This should improve debugging why particular targets are dropped. While at it, do not remove dropped targets from the list at /service-discovery page until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets . Previously the list was cleaned up every 10 minutes from the entries, which weren't updated for the last minute. This could complicate debugging of dropped targets. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389	2023-12-04 17:42:46 +02:00
Aliaksandr Valialkin	2f4dc2aff1	lib/backup: consistently use path.Join() when constructing paths for s3, gs and azblob E.g. replace `fs.Dir + filePath` with `path.Join(fs.Dir, filePath)` The fs.Dir is guaranteed to end with slash - see Init() functions. The filePath may start with slash. If it starts with slash, then `fs.Dir + filePath` constructs an incorrect path with double slashes. path.Join() properly substitutes duplicate slashes with a single slash in this case. While at it, also substitute incorrect usage of filepath.Join() with path.Join() for constructing paths to object storage systems, which expect forward slashes in paths. filepath.Join() substitutes forward slashes with backslashes on Windows, so this may break creating or managing backups from Windows. This is a follow-up for 0399367be602b577baf6a872ca81bf0f99ba401b Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/719	2023-12-04 17:25:41 +02:00
Aliaksandr Valialkin	f0215afee3	lib/promrelabel: add `keep_if_contains` and `drop_if_contains` relabeling actions (cherry picked from commit `ac65c6b178`)	2023-12-01 14:00:20 +01:00
Nikolay	9505d48070	lib/streamaggr: properly reference slice with labels (#5406) * lib/streamaggr: properly reference slice with labels by limiting slice capacity. It must fix issues with slice modification: in case of append, a new slice will be allocated instead of modifying the referenced slice https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402 * Reduce memory allocations when output_relabel_configs adds new labels to output samples --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> (cherry picked from commit `41f7940f97`)	2023-12-01 14:00:18 +01:00
hagen1778	73d18fbc7a	lib/protoparser/datadog: follow-up after `543f218fe9` * prevent /api/v1 from panic on parsing rows * add tests for Extract function for v1 and v2 api's * separate request types in different pools to prevent different objects mixing * add changelog line `543f218fe9` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `98d0f81f21`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 13:56:23 +01:00
Andrii Chubatiuk	d6b4c8e4ef	add datadog /api/v2/series and /api/beta/sketches support (#5094 ) Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com> Co-authored-by: Nikolay <https://github.com/f41gh7> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `543f218fe9`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-01 13:55:32 +01:00
Aliaksandr Valialkin	2f14394335	app/vmagent: follow-up for `090cb2c9de` - Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check for the result returned from these functions. - Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to persistent queue. - Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case. - Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage cannot keep up with the data ingestion rate. - Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples. - Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push data to persistent queue when -remoteWrite.disableOnDiskQueue is set. - Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics, because they are replaced with vmagent_remotewrite_samples_dropped_total metric. - Update 'Disabling on-disk persistence' docs at docs/vmagent.md - Update stale comments in the code Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110	2023-11-25 12:13:39 +02:00
Nikolay	25ac2aac31	app/vmagent: allow disabling on-disk persistence (#5088) * app/vmagent: allow disabling on-disk queue Previously, it wasn't possible to build data processing pipeline with a chain of vmagents. In case when remoteWrite for the last vmagent in the chain wasn't accessible, it persisted data only when it had enough disk capacity. If the disk queue was full, it started to silently drop ingested metrics. The new flags allow disabling on-disk persistence and immediately returning an error if remoteWrite is not accessible anymore. This blocks any writes and notifies the client that data ingestion isn't possible. Main use case for this feature - use external queue such as kafka for data persistence. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110 * adds test, updates readme * apply review suggestions * update docs for vmagent * makes linter happy --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-25 12:12:29 +02:00
Roman Khavronenko	26242f526e	lib/protoparser: decrease `import.maxLineLen` from 100MB to 10MB (#5364) Tests showed that importing a single line with 70MB size takes 5.3GiB RSS memory for VictoriaMetrics single-node. In the scenario when user exports and imports data from one VM to another, it could possibly lead to OOM exception for destination VM. Importing a single line with 16MB size takes 1.3GiB RSS memory. Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB to improve reliability of VictoriaMetrics during imports. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-24 13:13:33 +02:00
hagen1778	ae6152be5f	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 12:22:49 +02:00
hagen1778	91e365acb6	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 12:10:34 +02:00
Hui Wang	91379331eb	lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344 ) This case is possible after the following steps: 1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data. 2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk. 3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression. 4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data. 5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy. The solution is the same as `12cd32fd75`, just fall back to zstd decompression if Snappy decompression fails.	2023-11-17 15:53:18 +01:00
Aliaksandr Valialkin	a0f02d06d7	lib/handshake: typo fix after `ef80a89a24`: SetReadDeadline -> SetWriteDeadline Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327	2023-11-16 16:47:07 +01:00
Aliaksandr Valialkin	ef80a89a24	lib/handshake: add SetReadDeadline and SetWriteDeadline implementations additionally to SetDeadline This is a follow-up for `27a5461785` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327	2023-11-16 16:43:36 +01:00
Roman Khavronenko	27a5461785	lib/handshake: check for deadline in Read and Write methods (#5327 ) The buffered connection could have exceeded the underlying connection deadline during reading or writing to an internal buffer. With this change, buffered connection struct additionally checks for a deadline in Read/Write methods. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-16 16:33:40 +01:00
Aliaksandr Valialkin	60ff3cbb3d	lib/querytracer: add missing blank comment line after `3121d76bee`	2023-11-15 16:11:50 +01:00
Aliaksandr Valialkin	e9639a49c2	lib/ingestserver: properly log the number of closed connections Previously there was off-by-one error, which resulted in logging len(conns-1) connections instead of len(conns) Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922	2023-11-14 21:53:10 +01:00
Nikolay	0730c2586d	lib/querytracer: makes package concurrent safe to use (#5322 ) * lib/querytracer: makes package concurrent safe to use it must fix various issues with concurrent code usage. Especially, when it's not reasonable to wait for all goroutines to be finished * wip --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 20:58:28 +01:00
Aliaksandr Valialkin	1f7ab894d7	lib/logger: increase default -loggerMaxArgLen command-line flag value from 500 to 1000 The 500 chars limit for the maximum arg lengths during logging appeared to be too low for some cases	2023-11-14 19:55:55 +01:00
Aliaksandr Valialkin	3a487666ca	lib/ingestserver: typo fix after `f7834767c1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922	2023-11-14 03:26:04 +01:00
Aliaksandr Valialkin	9760221214	lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID, so it must be scanned unconditionally during the search. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856 This is a follow-up for `89dcbc2fe7`	2023-11-14 01:02:02 +01:00
XLONG96	77033dbfb6	lib/logstorage: fix streamID and tenantID search (#4856 ) (#5295 )	2023-11-14 01:02:02 +01:00
Zakhar Bessarab	f7834767c1	vmcluster: re-routing enhancement (#5293) * app/vmstorage: close vminsert connections gradually before stopping storage Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878 Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update graceful shutdown logic - close connections from vminsert in deterministic order - update flag description - lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s (default value in Kubernetes and ansible-playbooks). Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add information about re-routing enhancement during restart Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/changelog: add entry for new command-line flag Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * {app/vmstorage,lib/ingestserver}: address review feedback Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs/cluster: add note to update workload scheduler timeout Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * wip --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-11-14 01:00:42 +01:00
Aliaksandr Valialkin	12cd32fd75	lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails This case is possible after the following steps: 1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether the remote storage supports zstd-compressed data. 2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression for the data sent to the remote storage. 3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk. 4. The remote storage becomes available. 5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data. 6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage, while falsely advertizing it sends zstd-compressed data. 7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd. The solution is to just fall back to Snappy decompression if zstd decompression fails. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301	2023-11-13 21:25:39 +01:00
Aliaksandr Valialkin	356deada8c	lib/htmlcomponents: use relative links for the top page and for favicon.ico This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes. For example, vmagent HTTP handlers can be served via /vmagent/ path prefix: - http://proxy/vmagent/targets - http://proxy/vmagent/service-discovery The path prefix can be arbitrary. For example, below are vmagent urls for /tenantID/vmagent/ path prefix: - http://proxy/tenantID/vmagent/targets - http://proxy/tenantID/vmagent/service-discovery While at it, consistently serve favicon.ico from any path directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307	2023-11-13 20:28:17 +01:00
Aliaksandr Valialkin	a45cbc101f	all: cleanup: remove `// +build ...` lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20	2023-11-13 19:15:42 +01:00
Aliaksandr Valialkin	fb2071a01e	lib/regexutil: properly handle alternate regexps surrounded by .+ or .* Previously the following regexps were improperly handled: .+foo\|bar.+ .foo\|bar. This could lead to unexpected regexp match results. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297 Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308	2023-11-13 18:25:57 +01:00
Aliaksandr Valialkin	22927dcc53	lib/stringsutil: add tests for LimitStringLen() function	2023-11-13 10:33:07 +01:00
Dmytro Kozlov	faf788b4a6	lib/stringsutil: fix failing test (#5313) We have a failing test on master branch. ``` --- FAIL: TestFormatLogMessage (0.00s) logger_test.go:24: unexpected result; got "foo: abcde, \"foo bar baz\", xx" want "foo: a..e, \"f..z\", xx" ``` It failed because when maxLen <= 4 in `LimitStringLen` we always return the incoming string, but in the test we limit maxLen to 4 ``` f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)	2023-11-13 10:33:06 +01:00
Aliaksandr Valialkin	d9ecc3f6d7	lib/logger: add `-loggerMaxArgLen` command-line flag for fine-tuning the maximum length of logged args	2023-11-13 09:43:49 +01:00
Aliaksandr Valialkin	ed79f9806a	lib/blockcache: do not cache entries, which were attempted to be accessed 1 or 2 times Previously entries which were accessed only 1 time weren't cached. It turned out that some rarely executed heavy queries may read indexdb block twice in a row instead of once. There is no need to cache such a block then. This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed. Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning the number of cache misses needed before storing the block in the cache.	2023-11-13 09:38:57 +01:00
Aliaksandr Valialkin	996e746c2c	Makefile: update golangci-lint version from v1.54.2 to v1.55.1 See https://github.com/golangci/golangci-lint/releases/tag/v1.55.1	2023-11-02 21:42:35 +01:00
Aliaksandr Valialkin	3d6f4da3b3	docs: update -help output after recent changes to VictoriaMetrics components	2023-11-02 20:27:16 +01:00
Aliaksandr Valialkin	bf01a97f17	docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes See commits `4497a08e3d` and `92826b0b4a`	2023-11-02 20:09:22 +01:00
Aliaksandr Valialkin	5e7d495eb1	lib/httpserver: follow-up for `0638bbe69c` - Replace spaces with underscores in the `reason` label value for the vm_http_request_errors_total metric in order to be consistent with Prometheus-like naming - Clarify the description for the change at docs/CHANGELOG.md Updates https://github.com/victoriaMetrics/victoriaMetrics/issues/4590 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5166	2023-10-31 19:10:48 +01:00
Aliaksandr Valialkin	2288f81c5b	lib/persistentqueue: properly re-create flock.lock file inside directory if persistent queue is broken. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5249 Thanks to @Sniper91 for the bugreport and initial fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5233	2023-10-31 19:10:26 +01:00
Aliaksandr Valialkin	09c5ac238a	lib/httpserver: call Request.Header() only once instead of calling it each time a new request header is set This is a follow-up for `ad839aa492`	2023-10-31 19:10:09 +01:00
Aliaksandr Valialkin	c22b63af04	lib/storage: follow-up for `29cebd82fb` Use atomic.CompareAndSwapUint32() instead of atomic.LoadUint32() followed by atomic.StoreUint32(). This makes the code more clear. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159	2023-10-31 19:03:50 +01:00
venkatbvc	85fd4917b1	vmauth: add counter metrics for auth successes and failures (#5166 ) New labels `reason="wrong basic auth creds"` and `reason="wrong auth key"` were added to metric `vm_http_request_errors_total` to help identify auth errors. https://github.com/victoriaMetrics/victoriaMetrics/issues/4590 Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> (cherry picked from commit `0638bbe69c`)	2023-10-31 12:54:57 +01:00
Dima Lazerka	ed8fc04898	lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111 ) support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options` HTTP headers in all VictoriaMetrics components. The values for headers can be specified by users via the following flags: `-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`. Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `ad839aa492`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 11:41:38 +01:00
Roman Khavronenko	733b73ffed	lib/storage: log warning about RO mode only on state change (#5191 ) Before, vmstorage would log the same message each second producing excessive amount of logs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `29cebd82fb`)	2023-10-30 11:29:49 +01:00

2621 Commits