Commit Graph

148 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
1c58c00618
app/vmselect/netstorage: limit the initial size for brsPoolCap with 32Kb
This should reduce the number of expensive memory allocations with sizes bigger than 32Kb
2024-01-23 22:29:39 +02:00
Aliaksandr Valialkin
43ecd5d258
app/vmselect/netstorage: pre-allocate memory for metricNamesBuf
This should reduce the number of metricNamesBuf re-allocations in append()
2024-01-23 21:34:16 +02:00
Aliaksandr Valialkin
41456d9569
app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery()
This avoids slow path in Go runtime for allocating objects bigger than 32Kb -
see 704401ffa0/src/runtime/malloc.go (L11)

This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 5dd37ad836 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 14:04:49 +02:00
Aliaksandr Valialkin
1f1768d7af
app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size
See 704401ffa0/src/runtime/malloc.go (L11)

This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 508c608062

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 13:46:37 +02:00
Aliaksandr Valialkin
e0399ec29a
app/vmselect/netstorage: remove tswPool, since it isnt efficient 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
72a838a2a1
app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series
This saves a few CPU cycles for common case
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
5dd37ad836
app/vmselect/netstorage: use []blockRef from blockRefPool in order to reduce memory allocations 2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
7345567c29
app/vmselect/netstorage: substitute pointer to blockRefs by brssPool index at the metricName->blockRefs map
This should reduce the pressure on Go GC, since it will see lower number of pointers.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
678234e9f0
app/vmselect/netstorage: reduce the number of allocations for blockRefs objects in ProcessSearchQuery()
This should reduce pressure on Go GC at vmselect

The change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Aliaksandr Valialkin
508c608062
app/vmselect/netstorage: reduce the number of memory allocations in ProcessSearchQuery() by storing all the metric names in a single byte slice
This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Nikolay
c9f39fd51f
app/vmselect/netstorage (#5649)
* app/vmselect/netstorage

correctly handle errGlobal set

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5649

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 02:47:29 +02:00
Roman Khavronenko
b8b6e120ff
app/vmselect: limit the number of parallel workers by 32 (#5195)
* app/vmselect: limit the number of parallel workers by 32

The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-18 19:51:37 +02:00
Aliaksandr Valialkin
214be01dfa
app/vmselect/netstorage: remove duplicate see word from the error message
This is a follow-up for ac6c40e896

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4827
2023-08-14 02:05:44 -07:00
Aliaksandr Valialkin
ac6c40e896
all: refer to https://docs.victoriametrics.com/#resource-usage-limits in the error message about -search.max* limit
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4827
2023-08-14 01:57:34 -07:00
Aliaksandr Valialkin
45e345806c
app/vmselect/netstorage: remove runtime.Gosched() call from unpackWorker()
This should improve scalability of unpackWorker() on systems with many CPU cores.
This is a follow-up for a2ecf4fa4a and 16f3b279a2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-07-06 10:05:58 -07:00
Aliaksandr Valialkin
036a7b7365
lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.

While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
2023-04-13 22:11:59 -07:00
Nikolay
9b1e002287
app/vmselect: properly remove temp files at windows system (#4020)
With non-posix compliant systems it's not possible to remove unclosed files.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-27 18:10:15 -07:00
Aliaksandr Valialkin
5832242b44
app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores
This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419
2023-03-25 16:37:07 -07:00
Aliaksandr Valialkin
a1e496ced6
app/vmselect/netstorage: document why runtime.Gosched() is removed at 28f054bb00
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:36:51 -07:00
Zakhar Bessarab
28f054bb00
vmselect/netstorage: remove direct calls to Gosched to reduce amount of locks for global scope
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.

Updates #3966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 16:34:03 -07:00
Aliaksandr Valialkin
70959d5dab
app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker()
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().

This should reduce CPU usage when processing queries on systems with big number of CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:31:02 -07:00
Aliaksandr Valialkin
4856a4cf5a
app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-20 15:37:06 -07:00
Aliaksandr Valialkin
b5db69fe05
app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 00:52:35 -08:00
Oleksandr Redko
9fff48c3e3
app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
be8fba9b6a
app/vmselect/netstorage: tune the number of blocks per series which should be unpacked by a single goroutine instead of spinning up multiple goroutines
This reduces overhead on time series data unpacking for typical cases,
this reducing CPU usage at vmselect
2023-01-12 09:31:44 -08:00
Aliaksandr Valialkin
53d871d0b1
app/vmselect/netstorage: reduce tail latency during query processing
Previously the selected time series were split evenly among available CPU cores
for further processing - e.g unpacking the data and applying the given rollup
function to the unpacked data.
Some time series could be processed slower than others.
This could result in uneven work distribution among available CPU cores,
e.g. some CPU cores could complete their work sooner than others.
This could slow down query execution.

The new algorithm allows stealing time series to process from other CPU cores
when all the local work is done. This should reduce the maximum time
needed for query execution (aka tail latency).

The new algorithm should also scale better on systems with many CPU cores,
since every CPU processes locally assigned time series without inter-CPU communications.

The inter-CPU communications are used only when all the local work is finished
and the pending work from other CPUs needs to be stealed.
2023-01-10 13:43:14 -08:00
Aliaksandr Valialkin
e640ff72f1
app/vmselect/netstorage: reduce memory allocations when unpacking time series
Unpack time series with less than 400K samples in the currently running goroutine.
Previously a new goroutine was being started for unpacking the samples.
This was requiring additional memory allocations.
2023-01-09 23:18:17 -08:00
Aliaksandr Valialkin
df2a494a7c
app/vmselect/netstorage: pre-allocate 4 block references per each time series during querying
Usually the number of blocks returned per each time series during queries is around 4.
So it is a good idea to pre-allocate 4 block references per time series
in order to reduce the number of memory allocations.
2023-01-09 22:03:23 -08:00
Aliaksandr Valialkin
c5e0f527bc
app/vmselect/netstorage: cache canonical MetricName for time series returned from the storage
This reduces memory allocations for repeated queries, which return (almost) the same set of time series.
2023-01-09 21:53:10 -08:00
Aliaksandr Valialkin
7afcca0c51
all: use metricsql.CompileRegexp instead of regexp.Compile for compiling regexps used in graphite queries
This should speed up repeated queries, since metricsql.CompileRegexp returns regexps from the cache
on subsequent calls for the same input regexp.
2023-01-09 21:43:08 -08:00
Aliaksandr Valialkin
c38a10e143
app/vmselect/netstorage: eliminate memory allocation for sortBlocksHeap arg when calling mergeSortBlocks() 2023-01-09 21:08:51 -08:00
Aliaksandr Valialkin
1f9d605988
app/vmselect/netstorage: consistently select the sample with the biggest value out of samples with identical timestamps
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333

This fix is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3620 ,
but doesn't slow down the common case with merging replicated data blocks so significantly.

Benchmark results:

Before the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13968	     85643 ns/op	 956.53 MB/s	    1700 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	   10806	    109171 ns/op	1500.77 MB/s	    2191 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    8887	    130623 ns/op	1881.45 MB/s	    2660 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    7440	    157348 ns/op	2082.52 MB/s	    3174 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    6534	    184473 ns/op	2220.38 MB/s	    3612 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   13419	     85205 ns/op	 961.44 MB/s	    2213 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     579	   1894900 ns/op	  43.23 MB/s	   46760 B/op	       1 allocs/op

After the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13832	     85298 ns/op	 960.40 MB/s	    1716 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	    8833	    134222 ns/op	1220.66 MB/s	    2675 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    6487	    184830 ns/op	1329.65 MB/s	    3636 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    4977	    236318 ns/op	1386.61 MB/s	    4733 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    4088	    296734 ns/op	1380.36 MB/s	    5761 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   14083	     84067 ns/op	 974.47 MB/s	    2110 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     536	   2043534 ns/op	  40.09 MB/s	   50511 B/op	       1 allocs/op
2023-01-09 13:01:48 -08:00
Roman Khavronenko
7c0ae3a86a
lib/storage: keep sample with the biggest value on timestamp conflict (#3421)
The change leaves raw sample with the biggest value for identical
timestamps per each `-dedup.minScrapeInterval` discrete interval
when the deduplication is enabled.

```
benchstat old.txt new.txt
name                                         old time/op    new time/op    delta
DeduplicateSamples/minScrapeInterval=1s-10      817ns ± 2%     832ns ± 3%      ~     (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10     1.56µs ± 1%    2.12µs ± 0%   +35.19%  (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10     1.32µs ± 3%    1.65µs ± 2%   +25.57%  (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10    1.13µs ± 2%    1.50µs ± 1%   +32.85%  (p=0.000 n=10+10)

name                                         old speed      new speed      delta
DeduplicateSamples/minScrapeInterval=1s-10   10.0GB/s ± 2%   9.9GB/s ± 3%      ~     (p=0.052 n=10+10)
DeduplicateSamples/minScrapeInterval=2s-10   5.24GB/s ± 1%  3.87GB/s ± 0%   -26.03%  (p=0.000 n=9+7)
DeduplicateSamples/minScrapeInterval=5s-10   6.22GB/s ± 3%  4.96GB/s ± 2%   -20.37%  (p=0.000 n=10+10)
DeduplicateSamples/minScrapeInterval=10s-10  7.28GB/s ± 2%  5.48GB/s ± 1%   -24.74%  (p=0.000 n=10+10)
```

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-08 18:06:11 -08:00
Aliaksandr Valialkin
cae0f37edd
app/vmselect/netstorage: remove superflouos map lookup at ProcessSearchQuery
This should reduce CPU usage a bit during querying
2022-11-18 13:40:04 +02:00
Aliaksandr Valialkin
c53b7e66ef
app/vmselect: improve performance scalability on multi-CPU systems for /api/v1/export/... endpoints 2022-10-01 22:05:43 +03:00
Aliaksandr Valialkin
343241680b
all: remove the remaining bits of io/ioutil
The io/ioutil package is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil

VictoriaMetrics requires at least Go1.18, so it is time to remove the io/ioutil from source code

This is a follow-up for 02ca2342ab
2022-08-22 00:20:58 +03:00
Aliaksandr Valialkin
7478d423c5
app/vmselect/netstorage: cleanup after 92630c1ab4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2896
2022-08-04 18:28:11 +03:00
Aliaksandr Valialkin
c2bd75926b
app/vmselect/netstorage: initialize tsw.rowsProcessed before calling tsw.f, since tsw.f can modify r.Timestamps and r.Values lengths 2022-07-30 00:39:36 +03:00
Aliaksandr Valialkin
19a0b4679a
app/vmselect/netstorage: re-use random generator used for series shuffle in Result.RunParallel
This should reduce CPU usage needed for rand.Rand initialization
2022-07-30 00:30:37 +03:00
Aliaksandr Valialkin
92630c1ab4
app/vmselect/netstorage: improve the speed of queries over big number of time series on multi-CPU system
Reduce inter-CPU communications when processing the query over big number of time series.
This should improve performance for queries over big number of time series
on systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2896

Based on b596ac3745
Thanks to @zqyzyq for the idea.
2022-07-25 09:18:44 +03:00
Aliaksandr Valialkin
159c2e15e3
app/vmselect/netstorage: optimize mergeSortBlocks() for the worst case when blocks contain interleaved samples 2022-07-12 12:31:38 +03:00
Aliaksandr Valialkin
8429d4af5a
app/vmselect/netstorage: add mergeSortBlocks benchmark for the worstcase 2022-07-12 12:31:36 +03:00
Aliaksandr Valialkin
cd09f583fe
app/vmselect/netstorage: add benchmarks for mergeSortBlocks
This is a follow-up for 743ff84863
2022-07-11 12:54:48 +03:00
Aliaksandr Valialkin
743ff84863
app/vmselect/netstorage: optimize mergeSortBlocks function
- Use binary search instead of linear scan when locating the run of smallest timestamps
  in blocks with intersected time ranges. This should improve performance
  when merging blocks with big number of samples

- Skip samples with duplicate timestamps. This should increase query performance
  in cluster version of VictoriaMetrics with the enabled replication.
2022-07-09 00:34:42 +03:00
Aliaksandr Valialkin
77cbbacfdb
lib/vmselectapi: pass storage.SearchQuery to API calls instead of []*storage.TagFilters + storage.TimeRange + maxMetrics
This reduces the number of args to vmselectapi calls
2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin
f435924ab3
lib/vmselectapi: pass maxSuffixes arg to tagValueSuffixes RPC call 2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin
e1b8059086
lib/vmselectapi: rename deleteMetrics to more correct deleteSeries 2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin
a60e03b3a7
lib/vmselectapi: use string type for tagKey and tagValuePrefix args at TagValueSuffixes()
This improves the API consistency
2022-07-06 12:37:53 +03:00
Aliaksandr Valialkin
a14188dd8e
app/vmselect: expose additional histograms at /metrics page, which may help get more insights for the query workload
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2792
2022-06-28 20:18:13 +03:00
Aliaksandr Valialkin
a350d1e81c
lib/storage: return marshaled metric names from SearchMetricNames
Previously SearchMetricNames was returning unmarshaled metric names.
This wasn't great for vmstorage, which should spend additional CPU time
for marshaling the metric names before sending them to vmselect.

While at it, remove possible duplicate metric names, which could occur when
multiple samples for new time series are ingested via concurrent requests.

Also sort the metric names before returning them to the client.
This simplifies debugging of the returned metric names across repeated requests to /api/v1/series
2022-06-28 18:17:15 +03:00