Commit Graph

2930 Commits

Author SHA1 Message Date
Khushi Jain
83e55456e2
app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738) 2024-02-08 15:52:00 +01:00
Aliaksandr Valialkin
a354924b0d
app/victoria-metrics: properly send staleness markers on victoriametrics shutdown if -selfScrapeInterval > 0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/943
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2024-02-08 15:29:19 +02:00
Aliaksandr Valialkin
aea8feee1a
docs: update -help output after 61d9df4c36
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:50:49 +02:00
Aliaksandr Valialkin
61d9df4c36
app/vmselect: add ability to reset rollup result cache on startup by passing -search.resetRollupResultCacheOnStartup command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:40:40 +02:00
Aliaksandr Valialkin
0cf56c1ba5
app/vmselect/promql: properly handle precision errors in rollup functions
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.

The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
2024-02-08 02:30:57 +02:00
Aliaksandr Valialkin
19c1066a25
docs/CHANGELOG.md: properly document the change at b74006e2ca
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774
2024-02-07 22:05:12 +02:00
Nihal
b74006e2ca
[vmsingle/vminsert]: change http success response code to 200 for -/reload request handler (#5776)
* change vmsingle's response code to 200 for reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-02-07 20:00:04 +00:00
Aliaksandr Valialkin
b431ccea5b
all: update Go builder from Go1.21.6 to Go1.21.7
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.7+label%3ACherryPickApproved
2024-02-07 04:00:37 +02:00
Aliaksandr Valialkin
6a7c7ae391
app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after the recent changes to app/vmui
This is a follow-up for the following commits:

- dcbdbc760e
- a81ccbd749
- 65b8002aeb
2024-02-07 01:48:46 +02:00
Aliaksandr Valialkin
541b644d3d
app/{vmagent,vminsert}: follow-up after a1d1ccd6f2
- Document the change at docs/CHANGELOG.md
- Copy changes from docs/Single-server-VictoriaMetrics.md to README.md
- Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy )
- Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests
- Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it
  and add permalinks to the original source code there.
- Various code cleanups

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091
2024-02-07 01:28:05 +02:00
Andrii Chubatiuk
a1d1ccd6f2
support datadog /api/beta/sketches API (#5584)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:58:11 +00:00
Yury Molodov
dcbdbc760e
vmui: improve select component functionality (#5755)
* vmui: fix select closing on click outside (#5728)

* vmui: clear entered text in select after selecting a value (#5727)

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:50:04 +00:00
Yury Molodov
a81ccbd749
vmui: fix handling invalid timezone (#5758)
* vmui: fix handling invalid timezone (#5732)

* vmui: switch browser timezone flag to isValid

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:47:30 +00:00
Yury Molodov
65b8002aeb
vmui: fix graph dragging (#5769)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:32:03 +00:00
Aliaksandr Valialkin
de9a9546c2
lib/cgroup: remove SetGOGC() function
GOGC can be already set via environment variable. There is no need in adding
new approaches for setting the GOGC (such as command-line flag), since they complicate operations.
2024-02-05 12:11:08 +02:00
Aliaksandr Valialkin
0e3c532bf7
app/vmselect/netstorage: prevent from disk write IO when closing temporary files
Remove temporary file before closing it in order to signal the OS that it shouldn't
store the file contents from page cache to disk when the file is closed.

Gracefully handle the case when the file cannot be removed before being closed -
in this case remove the file after closing it. This allows working on Windows.

Also remove superflouos opening of temporary file for reading - re-use already opened file handle for writing.

This is a follow-up for 9b1e002287
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2024-02-01 19:12:44 +02:00
Aliaksandr Valialkin
88dc6cff70
app/vmselect: add missing whitespace into the description for -vmui.defaultTimezone command-line flag
This is a follow-up for eb6def0695
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5611
2024-02-01 14:49:51 +02:00
Aliaksandr Valialkin
db4623efc2
app/vmselect/netstorage: properly handle the case when an empty brsPool points to the end of brs.brs
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
2024-01-31 10:27:50 +02:00
Aliaksandr Valialkin
ec0ca8e7eb
app/vmselect/promql: really keep metric names when keep_metric_names modifier is applied to binary operator
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5556
2024-01-31 02:32:55 +02:00
Roman Khavronenko
6939c53e48
app/vmselect: set proper timestamp for cached instant responses (#5723)
* app/vmselect: set proper timestamp for cached instant responses

The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.

The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:03:34 +00:00
Aliaksandr Valialkin
c12bdd6c28
app/vmselect/vmui: run make vmui-update after 81b5db04f6 2024-01-30 21:13:01 +02:00
Yury Molodov
81b5db04f6
vmui: add the ability to expand all tracing entries (#5677) (#5726) 2024-01-30 19:10:10 +00:00
Aliaksandr Valialkin
adf585f7ed
app/vmselect/vmui: run make vmui-update after 6e8995cfb92fb5a87fc6ad78609bf9ea5e0e712f 2024-01-30 18:45:57 +02:00
Yury Molodov
7007c6a760
vmui: fix Enter key in query field (#5667) (#5717) 2024-01-30 14:36:19 +01:00
Aliaksandr Valialkin
583b6fe1e7
app/vmagent/remotewrite: limit the concurrency for marshaling time series before sending them to remote storage
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
2024-01-30 12:18:19 +02:00
Aliaksandr Valialkin
5d66ee88bd
lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:45:12 +01:00
Roman Khavronenko
24eb1ad0c8
vmalert: set ActiveAt to evaluation timestamp in newAlert fn (#5657)
The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps,
which resulted into inaccurate test states:
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 12:02:02 +01:00
Aliaksandr Valialkin
b9dcaaa7f8
app/vmui: run make vmui-update after a7b11eff7c 2024-01-26 22:53:46 +01:00
Roman Khavronenko
df59ac7f0e
app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:42:21 +01:00
Yury Molodov
a7b11eff7c
vmui: fix Enter key in query field (#5667) (#5681) 2024-01-26 22:38:32 +01:00
Aliaksandr Valialkin
bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin
8e03bc6b53
app/vmselect/promql: do not spend CPU time on verifying whether the rollup cache needs to be reset for the given metric rows when it has been already instructed to reset 2024-01-26 21:13:38 +01:00
Aliaksandr Valialkin
32c064a401
app/vmauth: return 503 service unavailable status code when the backend returns response with unsupported status code, but the request cannot be re-tried.
While at it, properly close response body. This should prevent from possible http keep-alive connection leak to backends because of unclosed response bodies.

This is a follow-up for 3c0aa14b5b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5688
2024-01-26 20:43:11 +01:00
Roman Khavronenko
b11f4ef5ea
app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:42:57 +01:00
Alexander Marshalov
3c0aa14b5b
vmauth: fix vmauth_user_request_backend_errors_total metric calc logic for use case when only one backend is available - if we get an error from the retry_status_codes list, but cannot execute retry, we increment vmauth_user_request_backend_errors_total as well (#5688) 2024-01-25 14:04:20 +01:00
Aliaksandr Valialkin
c3a585cfe5
lib/storage: rename *AssistedMerges to *AssistedMergesCount in order to make these field names less misleading
These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code
2024-01-25 10:19:32 +02:00
Alexander Marshalov
806c07ddd5
vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682) 2024-01-24 17:55:06 +01:00
Aliaksandr Valialkin
18df07e824
lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts
The maxFileParts usage has been accidentally removed in fa566c68a6

While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
2024-01-24 15:08:42 +02:00
Aliaksandr Valialkin
1c58c00618
app/vmselect/netstorage: limit the initial size for brsPoolCap with 32Kb
This should reduce the number of expensive memory allocations with sizes bigger than 32Kb
2024-01-23 22:29:39 +02:00
Aliaksandr Valialkin
43ecd5d258
app/vmselect/netstorage: pre-allocate memory for metricNamesBuf
This should reduce the number of metricNamesBuf re-allocations in append()
2024-01-23 21:34:16 +02:00
Aliaksandr Valialkin
41456d9569
app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery()
This avoids slow path in Go runtime for allocating objects bigger than 32Kb -
see 704401ffa0/src/runtime/malloc.go (L11)

This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 5dd37ad836 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 14:04:49 +02:00
Aliaksandr Valialkin
1f1768d7af
app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size
See 704401ffa0/src/runtime/malloc.go (L11)

This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 508c608062

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 13:46:37 +02:00
Aliaksandr Valialkin
15a15e5b99
app/vmselect/vmui: run make vmui-update in order to sync recent changes in app/vmui 2024-01-23 04:31:44 +02:00
Yury Molodov
38231d5994
vmui: query report (#5497)
* vmui: add query analyzer page

* vmui: fix tabs for query analyzer

* vmui: add help to export query

* vmui: add time params to query analyzer

* docs/vmui: add query analyzer

* vmui: fix validation JSON form

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:23:26 +02:00
Yury Molodov
eb6def0695
vmui: add flag for default timezone setting (#5611)
* vmui: add flag for default timezone setting #5375

* vmui: validate timezone before client return

* Update app/vmselect/vmui.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:11:19 +02:00
Yury Molodov
633e6b48ad
vmui: fix cache autocomplete (#5591)
* vmui: fix the logic of closing the popper #5470

* vmui: fix the logic of caching autocomplete results #5472

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:06:14 +02:00
Aliaksandr Valialkin
bc7d19c8ca
app/vmselect/promql: remove superflouos memory allocations at aggrPrepareSeries()
While at it, also remove unneeded map lookup
2024-01-23 02:28:31 +02:00
Aliaksandr Valialkin
9240bc36a3
app/vmselect/promql/aggr_incremental.go: eliminate unnecessary memory allocation in incrementalAggrFuncContext.updateTimeseries 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
e0399ec29a
app/vmselect/netstorage: remove tswPool, since it isnt efficient 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
72a838a2a1
app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series
This saves a few CPU cycles for common case
2024-01-23 02:28:29 +02:00