Commit Graph

1027 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
934a7f485c
app/vmselect: log locations of sendPrometheusError() calls
Previously the location inside the sendPrometheusError() was logged.
This could make hard investigating error locations via `vm_log_messages_total` metric.
2023-05-18 20:39:50 -07:00
Aliaksandr Valialkin
1ff67bb036
app/vmselect/vmui: run make vmui-update after 39c1b0f8d1 2023-05-18 12:15:22 -07:00
Alexander Marshalov
d321ea91f2
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
Aliaksandr Valialkin
8703b2fa87
app/vmselect: small cleanup after 4f3f9950d0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807
2023-05-09 22:45:02 -07:00
Aliaksandr Valialkin
fbc28810b1
app/vmselect: small cleanup after 68e31a6000
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3811
2023-05-09 22:43:59 -07:00
Aliaksandr Valialkin
5dbaffe2c6
app/{vmselect,vmctl}: move ParseTime() to lib/promutils
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4091

This is a follow-up for e2053baf32
2023-05-09 22:42:35 -07:00
Roman Khavronenko
5bc8d8f290
vmselect: exit early from queue on context cancel (#4223)
* vmselect: exit early from queue on context cancel

When `-search.maxConcurrentRequests` is reached, vmselect puts
request in the queue. It is expected, that requests in the queue
will be processed as soon as it would be enough capacity to do so.

However, it could happen that while request was waiting its turn,
the client could have already cancel it (close the connection,
or just close the tab with UI). In this case, we should de-queue
such requests to avoid spending extra resources on them.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmselect: address review comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 22:58:05 -07:00
Aliaksandr Valialkin
1a7794735e
app/vmselect: fix the build after fb8889820aba710508033cbf6826eb63a357532a 2023-05-08 17:32:18 -07:00
Yury Molodov
de35cbf251
vmui: Integrate WITH template playground (#3831)
* feat: add WithTemplate page

* app/vmselect/prometheus: enable json mode for expand with expr API

* app/vmselect/prometheus: enable CORS and add content type

* feat: add api for expand with templates

* fix: remove console from useExpandWithExprs

* app/vmselect/prometheus: fix escaping

* vmui:  integrate WITH template

* app/vmctl: check content type instead of form param

* fix: add content-type for fetch with-exprs

* fix: add a header to the server's response that allows the "Content-Type" header

* app/vmctl: added comment and cleanup

* app/vmctl: use format query param

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-05-08 14:35:35 -07:00
Aliaksandr Valialkin
cf4701db65
lib/fs: add MustReadDir() function
Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity.
The fs.MustReadDir() logs the error with the directory name and the call stack on error
before exit. This information should be enough for debugging the cause of the error.
2023-04-14 22:11:40 -07:00
Aliaksandr Valialkin
c4638553a3
lib/fs: rename WriteFileAtomically to MustWriteAtomic
Callers of this function log the returned error and exit.
So let's just log the error with the given filepath and the call stack
inside the function itself and then exit. This simplifies the code
at callers' place while leaves the same level of debuggability in case of errors.
2023-04-13 22:43:30 -07:00
Aliaksandr Valialkin
aac3dccfd1
lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.

While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
2023-04-13 22:22:08 -07:00
Aliaksandr Valialkin
26b361f4c3
app/vmselect/vmui: run make vmui-update after 01fc228fb0 2023-04-06 15:11:54 -07:00
Aliaksandr Valialkin
a241485262
app/vmselect/vmui: run make vmui-update after a1601929ec 2023-04-06 03:20:16 -07:00
Yury Molodov
7871ee0e43
vmui: implement heatmap improvements (#4078)
* fix: disabled limits for histogram

* fix: add sorted buckets by upper bound

* refactor: move line chart components to folder

* feat: implement heatmap improvements (https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3384#issuecomment-1484023162)

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-05 22:15:23 -07:00
Aliaksandr Valialkin
fa2ba7b07b
app/vmselect/vmui: run make vmui-update after edb45d7fc1 2023-04-02 21:22:17 -07:00
Aliaksandr Valialkin
7b10af4846
app/vmselect/vmui: run make vmui-update after 42087518ba 2023-04-01 00:41:03 -07:00
Aliaksandr Valialkin
db8fda4ec6
app/vmselect/graphite: open source Graphite Render API 2023-03-31 23:37:40 -07:00
Nikolay
b38a145cfd
app/vmselect: properly remove temp files at windows system (#4020)
With non-posix compliant systems it's not possible to remove unclosed files.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-27 18:10:44 -07:00
Aliaksandr Valialkin
54b9537a76
app/vmselect/promql: follow-up for 79e1c6a6fc
- Document the fix at docs/CHANGELOG.md
- Add tests with multiple adjancent zero buckets
- Simplify the fix a bit

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021
2023-03-27 18:04:30 -07:00
Ze'ev Klapow
680a661ec0
fix le buckets when adjacent vmrange is empty (#4021)
There is a bug here where if you have a single bucket like:

foo{vmrange="4.084e+02...4.642e+02"} 2 123

The expected output is three le encoded buckets like:

foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123

This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:

foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123

results in:

foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123

This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
2023-03-27 18:04:29 -07:00
Aliaksandr Valialkin
9387793f47
app/vmselect: follow-up for 10ab086366
- Expose stats.seriesFetched at `/api/v1/query_range` responses too
  for the sake of consistency.

- Initialize QueryStats when it is needed and pass it to EvalConfig then.
  This guarantees that the QueryStats is properly collected when the query
  contains some subqueries.
2023-03-27 15:11:42 -07:00
Roman Khavronenko
10ab086366
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 08:51:33 -07:00
Yury Molodov
86a98fa131
vmui: heatmap (#3780)
* fix: add stroke and font for all axes

* feat: add util for generate gradient

* feat: add heatmap plugin

* feat: add heatmap legend

* feat: add heatmap graph (#3384)

* vmui: add heatmap graph (#3384)

* feat: add convert Prometheus to VictoriaMetrics histogram

* fix: prevent re-render graph

* feat: reset step for heatmap

* feat: normalize heatmap data

* fix: format heatmap legend

* wip

* app/vmselect/vmui: run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-26 00:31:21 -07:00
Aliaksandr Valialkin
db3bcbe56a
app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores
This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419
2023-03-25 16:38:39 -07:00
Aliaksandr Valialkin
a2ecf4fa4a
app/vmselect/netstorage: document why runtime.Gosched() is removed at 28f054bb00
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:38:28 -07:00
Zakhar Bessarab
16f3b279a2
vmselect/netstorage: remove direct calls to Gosched to reduce amount of locks for global scope
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.

Updates #3966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 16:37:58 -07:00
Aliaksandr Valialkin
740fa57fdc
app/vmselect/promql: typo fix after e7f46a0aab 2023-03-24 23:47:11 -07:00
Aliaksandr Valialkin
7aff6f872f
app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-24 23:39:43 -07:00
Zakhar Bessarab
fec87e3ada
app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-24 23:39:41 -07:00
Aliaksandr Valialkin
b9632023c4
app/vmselect/vmui: run make vmui-update after dc2c712a29 2023-03-24 18:08:51 -07:00
Aliaksandr Valialkin
79d8f0e7c6
app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:57:34 -07:00
Aliaksandr Valialkin
e749a015a9
app/vmselect/promql: fix TestIncrementalAggr test on systems less than 3 CPU cores
This is a follow-up for 4856a4cf5a
2023-03-20 20:37:44 -07:00
Aliaksandr Valialkin
08da383eac
app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker()
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().

This should reduce CPU usage when processing queries on systems with big number of CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:32:56 -07:00
Aliaksandr Valialkin
18af01c387
app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-20 15:42:13 -07:00
Aliaksandr Valialkin
7a1e2f49cc
app/vmselect/vmui: make vmui-update after d4525bd2d0 2023-03-20 14:35:17 -07:00
Aliaksandr Valialkin
fc3d826d7f
all: add Windows build for VictoriaMetrics
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.

The previous algorithm for background merge:

1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
   swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.

Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.

So the new algorithm for background merge has been implemented:

1. Merge source parts into a destination part inside the partition directory itself.
   E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
   e.g. remove the source parts from the list and add the destination part to the list
   before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.

This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:

- It should work better for NFS.
- It fits object storage semantics.

The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-19 23:28:26 -07:00
oliverpool
8c708ca1e9
app/vmselect/promql: add test to ensure 8-byte alignment (#3948)
See 0af9e2b693
2023-03-16 22:07:13 -07:00
Aliaksandr Valialkin
3b4a3583bc
app/vmselect/promql: prevent from cannot unmarshal timeseries from rollupResultCache panic after the upgrade to v1.89.0 2023-03-12 19:09:11 -07:00
Aliaksandr Valialkin
cf7d8811f6
app/vmselect/vmui: make vmui-update after 00a0816ab1 2023-03-12 17:22:28 -07:00
Aliaksandr Valialkin
a6a4beb89a
app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:53:03 -07:00
Aliaksandr Valialkin
5cd60c54d3
app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:29:18 -07:00
Aliaksandr Valialkin
e491fee1f4
app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 00:44:08 -08:00
Aliaksandr Valialkin
54fe207cc0
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:42:58 -08:00
Aliaksandr Valialkin
bc24d35153
app/vmselect/vmui: make vmui-update after bbf8e459a0 2023-03-08 01:40:18 -08:00
Aliaksandr Valialkin
d3605ad072
app/vmselect/promql: fix panic when calculating aggr_func(rollup*())
The panic has been introduced in dac21d874b
2023-02-27 11:48:38 -08:00
Aliaksandr Valialkin
bbd5914eb1
all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:38:48 -08:00
Aliaksandr Valialkin
18dd0d1dbf
.golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:19:58 -08:00
Roman Khavronenko
66d0b45651
vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 17:10:38 -08:00
Roman Khavronenko
79eb33556e
metricsql: support optional 2nd argument for rollup functions (#3841)
* metricsql: support optional 2nd argument for rollup functions

Support optional 2nd argument `min`, `max` or `avg` for rollup functions:
 * rollup
 * rollup_delta
 * rollup_deriv
 * rollup_increase
 * rollup_rate
 * rollup_scrape_interval

 If second argument is passed, then rollup function will return only the selected aggregation type.
 This change can be useful for situations where only one type of rollup calculation is needed.
 For example, `rollup_rate(requests_total[5m], "max")`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 13:48:30 -08:00