Commit Graph

1940 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
f5559c038c
lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:44:46 +01:00
Roman Khavronenko
9e9f170fe7
lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:45 +01:00
Roman Khavronenko
562edb72ea
app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:43:02 +01:00
Yury Molodov
551f48466c
vmui: fix Enter key in query field (#5667) (#5681) 2024-01-26 22:38:51 +01:00
Aliaksandr Valialkin
7a8b92b590
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:19:52 +01:00
Roman Khavronenko
a2f83115ae
app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 20:51:50 +01:00
Alexander Marshalov
14712e3b99
vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682) 2024-01-25 10:21:09 +02:00
Aliaksandr Valialkin
1e364c992d
lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers
Already terminated pods and containers cannot be scraped and will never resurrect,
so there is zero sense in creating scrape targets for them.
2024-01-24 14:58:51 +02:00
Aliaksandr Valialkin
0dca3c4025
app/{vmselect,vmstorage}: return compression of the data passed from vmstorage to vmselect
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.

The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.

The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
2024-01-24 13:37:05 +02:00
Aliaksandr Valialkin
e6e5b97e1e
lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:42 +02:00
Aliaksandr Valialkin
12698b9136
lib/mergeset: really limit the number of in-memory parts to 15
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.

The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.

The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.

The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
2024-01-24 03:41:19 +02:00
Roman Khavronenko
8461add541
lib/promscrape: respect 0 value for series_limit param (#5663)
* lib/promscrape: respect `0` value for `series_limit` param

Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.

This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 13:09:36 +02:00
Yury Molodov
1db2b991b7
vmui: query report (#5497)
* vmui: add query analyzer page

* vmui: fix tabs for query analyzer

* vmui: add help to export query

* vmui: add time params to query analyzer

* docs/vmui: add query analyzer

* vmui: fix validation JSON form

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:26:04 +02:00
Yury Molodov
3a26e4d6ec
vmui: add flag for default timezone setting (#5611)
* vmui: add flag for default timezone setting #5375

* vmui: validate timezone before client return

* Update app/vmselect/vmui.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:15:14 +02:00
Yury Molodov
574d69775e
vmui: fix cache autocomplete (#5591)
* vmui: fix the logic of closing the popper #5470

* vmui: fix the logic of caching autocomplete results #5472

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:06:39 +02:00
Aliaksandr Valialkin
cd4f641d32
app/{vmstorage,vmselect}: disable vmstorage->vmselect RPC compression by default in order to improve query performance 2024-01-23 02:29:13 +02:00
Zakhar Bessarab
60ef978ffc
lib/storage: print tenant ID in log when discarding or truncating labels (#5658)
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 02:27:59 +02:00
Dmytro Kozlov
69e59ac9b7
deployment/docker: add grafana datasource to the docker-compose files (#5363)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3920
https://github.com/VictoriaMetrics/grafana-datasource/issues/113
2024-01-22 18:43:16 +02:00
Aliaksandr Valialkin
c6f6f094c5
Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)"
This reverts commit cfec258803.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled.
This prevents from storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577
2024-01-22 01:46:12 +02:00
Aliaksandr Valialkin
d4a1a28543
app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-22 01:39:27 +02:00
Hui Wang
49fa92c1d0
lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.

* remove mislead comment

* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.

P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 01:33:17 +02:00
Aliaksandr Valialkin
885ee160c2
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-22 01:23:23 +02:00
Nikolay
73c51072e6
app/vmauth: adds metric_labels and backend_errors counter (#5585)
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 01:09:51 +02:00
Aliaksandr Valialkin
2c7c812a9d
lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag
This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593
2024-01-22 01:08:52 +02:00
Hui Wang
e086ef16da
app/vmselect/promql: properly handle possible negative results caused… (#5608)
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()

* fix test
2024-01-22 01:04:50 +02:00
Nikolay
d79a7c36b0
app/vmselect/netstorage (#5649)
* app/vmselect/netstorage

correctly handle errGlobal set

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5649

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 01:01:38 +02:00
Nikolay
e196c61e36
app/vmselect: abort streaming connections for vmselect (#5650)
* app/vmselect: abort streaming connections for vmselect
due to streaming nature of export APIs, curl and simmilr tools cannot
detect errors that happened after http.Header with status 200 was
written to it.

This PR tracks if body write was already started and closes connection.

It allows client to detect not expected chunk sequence and return error
to the caller.

Mostly it affects vmselect at cluster version

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5650

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 00:54:32 +02:00
Aliaksandr Valialkin
c05982bfa7
lib/promscrape/discovery/hetzner: follow-up after 03a97dc678
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
  document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
  and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
  populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
2024-01-22 00:53:23 +02:00
Hui Wang
66eb013b54
lib/promscrape: do not store last scrape response when stale markers … (#5577)
* lib/promscrape: do not store last scrape response when stale markers are disabled

* update changelog
2024-01-22 00:52:25 +02:00
Roman Khavronenko
fe4934f0ec
app/vmui: send step param for instant queries (#5639)
The change reverts https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896
 due to reasons explained in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896#issuecomment-1896704401

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 00:25:31 +02:00
Artem Navoiev
8154d16420
docs: changelog fix the link to cluster
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-21 23:54:08 +02:00
Roman Khavronenko
148e14b3f2
app/vmselect: properly calculate start param for queries with too big look-behind window (#5630)
Properly determine time range search for instant queries with too big look-behind window like `foo[100y]`.
 Previously, such queries could return empty responses even if `foo` is present in database.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-21 23:47:09 +02:00
Aliaksandr Valialkin
014bba6d8e
docs/CHANGELOG*: move changes for 2023 year to docs/CHANGELOG_2023.md 2024-01-17 13:12:58 +02:00
Aliaksandr Valialkin
a130b86533
docs/CHANGELOG.md: document v1.93.10 LTS release
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.10
2024-01-17 01:45:22 +02:00
hagen1778
a2d3fce05f
deployment/dashboards: change title VictoriaMetrics to VictoriaMetrics - single-node
The new title should provide better understanding of this dashboard purpose.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-17 01:06:07 +02:00
Aliaksandr Valialkin
0b1c2f70a5
docs/CHANGELOG.md: document v1.87.13 LTS release
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.87.13
2024-01-17 01:04:26 +02:00
Aliaksandr Valialkin
1340ab1c07
docs/CHANGELOG.md: fix a link in the description of 70cd09e736
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5581
2024-01-17 00:16:17 +02:00
Aliaksandr Valialkin
b7fcdb1985
deployment/docker: update Go builder from Go1.21.5 to Go1.21.6 2024-01-17 00:05:24 +02:00
Aliaksandr Valialkin
f673039e86
lib/storage: follow-up for 4b8088e377
- Clarify the bugfix description at docs/CHANGELOG.md
- Simplify the code by accessing prefetchedMetricIDs struct under the lock
  instead of using lockless access to immutable struct.
  This shouldn't worsen code scalability too much on busy systems with many CPU cores,
  since the code executed under the lock is quite small and fast.
  This allows removing cloning of prefetchedMetricIDs struct every time
  new metric names are pre-fetched. This should reduce load on Go GC,
  since the cloning of uin64set.Set struct allocates many new objects.
2024-01-16 22:38:57 +02:00
Hui Wang
2f40ed3aac
exit vmagent if there is config syntax error in scrape_config_files when -promscrape.config.strictParse=true (#5560) 2024-01-16 22:35:18 +02:00
hagen1778
343176438e
deployment/alerts: add job label to DiskRunsOutOfSpace alerting rule
So it is easier to understand to which installation the triggered instance belongs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-16 22:21:34 +02:00
Aliaksandr Valialkin
6ba2fd3312
app/vmselect/promql: follow-up for ce4f26db02
- Document the bugfix at docs/CHANGELOG.md
- Filter out NaN values before sorting as suggested at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509#discussion_r1447369218
- Revert unrelated changes in lib/filestream and lib/fs
- Use simpler test at app/vmselect/promql/exec_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5506
2024-01-16 22:13:13 +02:00
Aliaksandr Valialkin
015c0a4d1a
app/vmselect/promql: consistently sort results of a or b query
Previously the order of results returned from `a or b` query could change with each request
because the sorting for such query has been disabled in order to satisfy
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763 .

This commit executes `a or b` query as `sortByMetricName(a) or sortByMetricName(b)`.
This makes the order of returned time series consistent across requests,
while maintaining the requirement from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763 ,
e.g. `b` results are consistently put after `a` results.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5393
2024-01-16 22:12:15 +02:00
Aliaksandr Valialkin
9e5e514faf
lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop()
Previously the was a race condition when the background goroutine still could try collecting metrics
from already stopped resources after returning from pushmetrics.Stop().
Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning.

This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549
and the commit fe2d9f6646 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
2024-01-16 21:18:22 +02:00
Aleksandr Stepanov
3a6e3adc7d
vmagent: added hetzner sd config (#5550)
* added hetzner robot and hetzner cloud sd configs

* remove gettoken fun and update docs

* Updated CHANGELOG and vmagent docs

* Updated CHANGELOG and vmagent docs

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-01-16 21:13:20 +02:00
Roman Khavronenko
d562d772a8
lib/storage: properly check for storage/prefetchedMetricIDs cache expiration deadline (#5607)
Before, this cache was limited only by size.
Cache invalidation by time happens with jitter to prevent thundering herd problem.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-16 21:08:59 +02:00
rbizos
62db64e71b
Handling negative index in Graphite groupByNode/aliasByNode (#5581)
Handeling the error case with -1

Signed-off-by: Raphael Bizos <r.bizos@criteo.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-01-16 20:55:27 +02:00
Aliaksandr Valialkin
063ea8c773
app/vmalert/remotewrite: properly calculate vmalert_remotewrite_dropped_rows_total
It was calculating the number of dropped time series instead of the number of dropped samples.

While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated -
at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating
the size of snappy-compressed prompbmarshal.WriteRequest protobuf message.
Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.
2024-01-16 20:47:13 +02:00
Aliaksandr Valialkin
9eef72bce9
lib/protoparser/datadogv2: add support for reading protobuf-encoded requests at /api/v2/series endpoint
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094
2024-01-16 20:32:15 +02:00
Dmytro Kozlov
b95d6f5f5e
app/vmctl: add insecure skip verify flags for source and destination addresses for native protocol (#5606)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5595
2024-01-16 20:29:41 +02:00
Denys Holius
c9b7452a86
CHANGELOG.md: fixed wrong links to vmalert-tool documentation page (#5570) 2024-01-16 19:07:35 +02:00
hagen1778
2a7207f38a
app/all: follow-up after 84d710beab
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-09 13:17:09 +01:00
Dmytro Kozlov
37a76de800
app/vmui: fix broken link for the statistic inaccuracy explanation (#5568)
(cherry picked from commit 105c6b2eb7)
2024-01-08 20:15:04 +01:00
hagen1778
12acea2584
dashboards: update cluster dashboard
* add panels for detailed visualization of traffic usage between vmstorage, vminsert, vmselect
components and their clients. New panels are available in the rows dedicated to specific components.

* update "Slow Queries" panel to show percentage of the slow queries to the total number of read queries
served by vmselect. The percentage value should make it more clear for users whether there is a service degradation.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 463455665b)
2024-01-08 11:58:56 +01:00
Hui Wang
c14e229b20
vmalert: automatically add exported_ prefix for original evaluation… (#5398)
automatically add `exported_` prefix for original evaluation result label if it's conflicted with external or reserved one,
previously it was overridden.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5161

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1f477aba41)
2023-12-22 16:10:33 +01:00
Aliaksandr Valialkin
d145c68c5c
docs/CHANGELOG.md: typo fix after fb90a56de2: supperted -> supported 2023-12-21 21:01:54 +02:00
Aliaksandr Valialkin
62a105d9e9
app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent
This commit adds only JSON support - https://docs.datadoghq.com/api/latest/metrics/#submit-metrics ,
while recent versions of DataDog Agent send data to /api/v2/series in undocumented Protobuf format.
The support for this format will be added later.

Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451
2023-12-21 20:50:27 +02:00
Aliaksandr Valialkin
3a9cf13aaa
app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags
This is a follow-up for 5ebd5a0d7b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427
2023-12-20 21:38:16 +02:00
Morgan
64e96fccd9
Expose OAuth2 Endpoint Parameters to cli (#5427)
The user may which to control the endpoint parameters for instance to
set the audience when requesting an access token. Exposing the
parameters as a map allows for additional use cases without requiring
modification.
2023-12-20 21:38:13 +02:00
Aliaksandr Valialkin
c888d76c4b
app/vmselect/netstorage: make sure that at least a single result is collected from every storage group before deciding whether it is OK to skip results from the remaining storage nodes 2023-12-20 19:53:49 +02:00
Nikolay
46a335aa1d
lib/awsapi: properly assume role with webIdentity token (#5495)
* lib/awsapi: properly assume role with webIdentity token
introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token.
First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials.
Common use case for it - cross AWS accounts authorization
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-12-20 19:07:04 +02:00
Aliaksandr Valialkin
0a99c819bf
all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page
This may be needed for systems, which require this metadata such as Google Cloud Managed Prometheus.
See https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#missing-metric-type
2023-12-19 03:26:02 +02:00
Aliaksandr Valialkin
9540d29154
lib/pushmetrics: add -pushmetrics.header and -pushmetrics.disableCompression command-line flags 2023-12-17 19:58:14 +02:00
Aliaksandr Valialkin
cbd8a62ba4
docs/CHANGELOG.md: typo fix: use proper backtick closing quote instead of single quote 2023-12-17 19:28:22 +02:00
Roman Khavronenko
bcb2b8247c
vmctl: rename vm-native-disable-retries to vm-native-disable-per-metric-migration (#5476)
The change supposed to better reflect the meaning of this flag.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-17 19:19:22 +02:00
Aliaksandr Valialkin
76b120e355
lib/protoparser/opentelemetry: allow ingesting metrics without resource labels
Some clients may ingest samples via OpenTelemetry protocol without Resource labels.
Previously VictoriaMetrics was silently dropping such samples.

The commit 317834f876 added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"}
counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459

It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.
2023-12-17 19:16:43 +02:00
Aliaksandr Valialkin
bed19a0b20
docs/CHANGELOG.md: move the description of the bugfix from 9253c24dd6 into correct place
The description of the bugfix was incorrectly placed in already released v1.96.0,
while it should be placed in tip.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5450
2023-12-17 19:16:42 +02:00
Roman Khavronenko
5f14fa94dd
vmctl: retry requests that failed in the very end for vm-native (#5475)
Before, retries happened only on writes into a network connection
between source and destination. But errors returned by server after
all the data was transmitted were logged, but not retried.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 664fa5cb78)
2023-12-15 11:54:08 +01:00
Hui Wang
ed4f77575f
vmalert: validate schema for -external.url (#5450)
Requests with wrong or no schema in  `-external.url` could be rejected by alertmanager.
So we validate schema on start up.

(cherry picked from commit 9253c24dd6)
2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin
42629dead1
app/vmstorage: addd missing -inmemoryDataFlushInterval command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
2023-12-14 20:47:49 +02:00
Aliaksandr Valialkin
8bc87acfb9
docs/CHANGELOG.md: document the bugfix at 66c76a4d4d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5414
2023-12-14 12:49:03 +02:00
Aliaksandr Valialkin
51acf0179c
app/vmauth: add ability to route requests to different backends depending on the request host 2023-12-14 00:47:00 +02:00
Aliaksandr Valialkin
d59ee93d04
docs/CHANGELOG.md: cut v1.96.0 release 2023-12-13 00:43:04 +02:00
Yury Molodov
e76c44c5b4
vmui: autocomplete usability improvements (#5422)
* vmui: add show quick tip for autocomplete

* vmui: auto-completion usability improvements #5348

* vmui: add const for min symbols in autocomplete

* Use proper queries to VictoriaMetrics

* vmui: fix comments for autocomplete

* app/vmselect: run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-12-13 00:33:27 +02:00
Aliaksandr Valialkin
e4bb2808f1
app/vmselect: add support for vmstorage groups with independent -replicationFactor per group
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5197

See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmstorage-groups-at-vmselect

Thanks to @zekker6 for the initial pull request at https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/718
2023-12-13 00:14:34 +02:00
hagen1778
3b841fe9ce
app/vmctl: follow-up after 6af732b6f7
Make docs more clear about new feature.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 242472086b)
2023-12-12 13:45:35 +01:00
Dmytro Kozlov
63fc200f16
app/vmctl: enable range steps in reverse order (#5444)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5376

(cherry picked from commit 6af732b6f7)
2023-12-12 13:45:35 +01:00
hagen1778
c2ca6cb84d
docs: fix formatting after a list in CHANGELOG.md
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1e02efd511)
2023-12-12 13:45:34 +01:00
hagen1778
307bcb8d4d
app/vmctl: follow-up after 27668c9d01
* remove duplications in error messages
* mention the change in CHANGELOG.md

27668c9d01
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 39c405ed4d)
2023-12-11 15:38:22 +01:00
hagen1778
73f843a70e
docs: mention alerts change in CHANGELOG.md
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e13dc04fbf)
2023-12-11 15:38:18 +01:00
Aliaksandr Valialkin
050e15d85b
docs/CHANGELOG.md: document v1.93.9 LTS release 2023-12-11 10:39:38 +02:00
Aliaksandr Valialkin
09fe1255ca
docs/CHANGELOG.md: document v1.87.12 2023-12-10 14:30:30 +02:00
Aliaksandr Valialkin
842aba3f46
deployment/docker: update base Docker image from alpine:3.18.5 to alpine:3.19.0
See https://www.alpinelinux.org/posts/Alpine-3.19.0-released.html
2023-12-10 02:28:31 +02:00
Aliaksandr Valialkin
3d6517b05e
app/vmselect: add -search.maxResponseSeries command-line flag for limiting the number of time series a single response can return
This limit can be used for preventing from high memory usage at Grafana when the response returns too many series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5372
2023-12-10 00:54:32 +02:00
Aliaksandr Valialkin
6203c1a745
docs: follow-up after 49552eaa15
Link to the related issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792
Fix heading for `Modifying HTTP headers` chapter at docs/vmagent.md
2023-12-08 23:56:40 +02:00
Aliaksandr Valialkin
49552eaa15
app/vmauth: add support for hot standby mode via first_available load balancing policy
vmauth in `hot standby` mode sends requests to the first url_prefix while it is available.
If the first url_prefix becomes unavailable, then vmauth falls back to the next url_prefix.
This allows building highly available setup as described at https://docs.victoriametrics.com/vmauth.html#high-availability

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792
2023-12-08 23:32:10 +02:00
Roman Khavronenko
276e9301f4
app/vmalert: sanitize label names before sending to Alertmanager (#5442)
Before, vmalert would send notifications with labels containing characters
  not supported by Alertmanager validator, resulting into validation errors
  like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-08 18:09:07 +02:00
Alexander Marshalov
e9cf39f519
added field version to the response for /api/v1/status/buildinfo API for using more efficient API in Grafana for receiving label values, added additional info about setup Grafana datasource (#5370) (#5437) 2023-12-07 16:41:56 +02:00
Aliaksandr Valialkin
9f79342e6a
app/vmselect/prometheus: properly encode Prometheus label values at /federate endpoint
Prometheus spec says that only \, \n and " must be escaped inside label values.
See 995743836e/content/docs/instrumenting/exposition_formats.md (L90)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5431
2023-12-07 15:36:50 +02:00
Aliaksandr Valialkin
896a0f32cd
lib/promscrape: show -promscrape.cluster.memberNum values for vmagent instances, which scrape the given dropped target at /service-discovery page
The /service-discovery page contains the list of all the discovered targets
after the commit 487f6380d0 on all the vmagent instances
in cluster mode ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).

This commit improves debuggability of targets in cluster mode by providing a list of -promscrape.cluster.memberNum
values per each target at /service-discovery page, which has been dropped becasue of sharding,
e.g. if this target is scraped by other vmagent instances in the cluster.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018
2023-12-07 00:11:30 +02:00
Aliaksandr Valialkin
12e94f10cc
deployment/docker: update Go builder from Go1.21.4 to Go1.21.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.5+label%3ACherryPickApproved
2023-12-06 22:33:27 +02:00
Dmytro Kozlov
6a41e1ec0c
app/vmalert: replace error metrics for gauges with counter metrics (#5217)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 935bec447b)
2023-12-06 19:41:34 +01:00
Aliaksandr Valialkin
8b6bce61e4
lib/promscrape: follow-up for 97373b7786
Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric
with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down
/metrics exposition significantly when -promscrape.config contains thousands of scrape jobs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335
2023-12-06 17:36:48 +02:00
Aliaksandr Valialkin
509339bf63
app/vmselect: properly adjust the lower bound for the time range where raw samples must be selected for default_rollup() function
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions

While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
2023-12-06 14:46:18 +02:00
Hui Wang
065f5a7f9e
vmagent: add vm_promscrape_scrape_pool_targets for scrape jobs like… (#5335)
* vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers

* add extra panel for new metric
2023-12-06 14:46:02 +02:00
Aliaksandr Valialkin
61db92cdc7
Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb"
This reverts commit 73d18fbc7a.

Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080
2023-12-05 02:29:00 +02:00
Aliaksandr Valialkin
bf187b2dc9
app/vmagent: add -enableMultitenantHandlers command-line flag
This flag allows converting tenant id to (vm_account_id, vm_project_id) labels.
this flag deprecates `-remoteWrite.multitenantURL` command-line flag,
because `-enableMultitenantHandlers` is easier to use and combine with multitenant url
at vminsert - https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels

See https://docs.victoriametrics.com/vmagent.html#multitenancy

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505
2023-12-05 01:35:59 +02:00
Dmytro Kozlov
6770bad207
app/vmalert: expose /vmalert/api/v1/rule and /api/v1/rule API which returns rule status in JSON format (#5397)
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format

* app/vmalert: hide updates if query param not set

* app/vmalert: fix panic (recursion call)

* app/vmalert: add needed group name and file name

* app/vmalert: fix comment, update behavior

* app/vmalert: fix description

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-12-04 22:49:39 +02:00
Aliaksandr Valialkin
a3d0bbfcda
deployment/docker: update backe Docker image from alpine 3.18.4 to 3.18.5
See https://www.alpinelinux.org/posts/Alpine-3.15.11-3.16.8-3.17.6-3.18.5-released.html
2023-12-04 18:17:07 +02:00
Aliaksandr Valialkin
d868155751
app/vmselect: do not limit concurrency for static and fast queries
Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests
command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages,
since `vmui` and these pages couldn't be opened on overloaded vmselect.

Thanks to @f41gh7 for the idea.
2023-12-04 18:14:29 +02:00
Aliaksandr Valialkin
b6d6a3a530
lib/promscrape: show dropped targets because of sharding at /service-discovery page
Previously the /service-discovery page didn't show targets dropped because of sharding
( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).

Show also the reason why every target is dropped at /service-discovery page.
This should improve debuging why particular targets are dropped.

While at it, do not remove dropped targets from the list at /service-discovery page
until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets .
Previously the list was cleaned up every 10 minutes from the entries, which weren't updated
for the last minute. This could complicate debugging of dropped targets.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2023-12-04 17:42:46 +02:00
Zakhar Bessarab
2992682f6c
lib/backup/s3remote: remove prev object versions for recursive delete (#719)
* lib/backup/s3remote: remove prev object versions for recursive delete

- fix error caused by sending empty objects list to be deleted. This was possible in case old versions of objects where deleted, but root-level entries where still available. This caused paginator to return an empty page which wasn't skipped.

- delete previous versions of objects recursively for S3 remote

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add vmbackupmanager fix entry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/backup/s3remote: unify path construction for S3 objects

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-12-04 17:01:09 +02:00
Aliaksandr Valialkin
9f352f1b93
app/vminsert/newrelic: simplify the code a bit after 1fb8dc0092
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5421
2023-12-04 16:26:52 +02:00
Dmytro Kozlov
1fb8dc0092
app/vminsert: fix newrelic ingestion in cluster version (#5421)
Properly pass tenant ID to ingested data from newrelic.
Before tenant ID was mistakenly skipped.
2023-12-04 09:38:32 +01:00
Hui Wang
3507e1e27b
vmalert-tool: fix alert_rule_test case when eval_time is not multiple of evaluation_interval (#5387)
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 1911320c86)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-01 14:00:58 +01:00
Aliaksandr Valialkin
d1445bc0c8
all: expose additional metrics for simplifying debugging of VictoriaMetrics components
Updates https://github.com/VictoriaMetrics/metrics/issues/54

(cherry picked from commit 8eddccfbb4)
2023-12-01 14:00:28 +01:00
Aliaksandr Valialkin
f0215afee3
lib/promrelabel: add keep_if_contains and drop_if_contains relabeling actions
(cherry picked from commit ac65c6b178)
2023-12-01 14:00:20 +01:00
Nikolay
9505d48070
lib/streamaggr: properly reference slice with labels (#5406)
* lib/streamaggr: properly reference slice with labels
by limiting slice capacity. It must fix issues with slice modification, in case of append new slice will be allocated, instead of modifying refrenced slice
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402

* Reduce memory allocations when output_relabel_configs adds new labels to output samples

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
(cherry picked from commit 41f7940f97)
2023-12-01 14:00:18 +01:00
hagen1778
73d18fbc7a
lib/protoparser/datadog: follow-up after 543f218fe9
* prevent /api/v1 from panic on parsing rows
* add tests for Extract function for v1 and v2 api's
* separate request types in different pools to prevent different objects mixing
* add changelog line

543f218fe9
Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 98d0f81f21)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-01 13:56:23 +01:00
hagen1778
1e557b73a5
docs: mention contributor of PR 5368
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5424632ba3)
2023-11-28 12:49:49 +01:00
luckyxiaoqiang
8ce82c5400
app/vmselect/promql: add day_of_year() function (#5368)
Co-authored-by: dingxiaoqiang <dingxiaoqiang@bytedance.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d7897e0d70)
2023-11-28 12:49:48 +01:00
Aliaksandr Valialkin
2f14394335
app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:13:39 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin
3674232128
docs: make more visible that the maximum JSON line length, which is accepted by /api/v1/import, is limited by -import.maxLineLen command-line flag value
This is a follow-up for 0cf55ded34

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5364
2023-11-24 13:14:40 +02:00
Roman Khavronenko
26242f526e
lib/protoparser: decrease import.maxLineLen from 100MB to 10MB (#5364)
Tests showed that importing a single line with 70MB size takes 5.3GiB
RSS memory for VictoriaMetrics single-node.
In the scenario when user exports and imports data from one VM to another,
it could possibly lead to OOM exception for destination VM.

Importing a single line with 16MB size taks 1.3GiB RSS memory.
Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB
to improve reliability of VictoriaMetrics during imports.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-24 13:13:33 +02:00
Aliaksandr Valialkin
01bc62eff9
docs/CHANGELOG.md: document Google PubSub support at vmagent (see 752f89f13f ) 2023-11-23 21:14:04 +02:00
Aliaksandr Valialkin
a906a7d85c
app/vmagent/remotewrite: do not drop persistent queues when -remoteWrite.multitenantURL is set
It is unsafe to drop persistent queues when -remoteWrite.multitenantURL command-line flag is set,
since these queues are created on demand when a new sample for the given tenant is pushed
to the remote storage.

This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5357
The issue has been appeared in the commit f3a51e8b1d
when implementing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-11-23 20:43:21 +02:00
Hui Wang
91379331eb
lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344)
This case is possible after the following steps:
1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data.
2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk.
3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression.
4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data.
5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy.

The solution is the same as 12cd32fd75, just fall back to zstd decompression if Snappy decompression fails.
2023-11-17 15:53:18 +01:00
Aliaksandr Valialkin
1a15b0f57b
docs/CHANGELOG.md: cut v1.95.1 2023-11-16 20:32:27 +01:00
Aliaksandr Valialkin
ef80a89a24
lib/handshake: add SetReadDeadline and SetWriteDeadline implementations additionally to SetDeadline
This is a follow-up for 27a5461785

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327
2023-11-16 16:43:36 +01:00
Roman Khavronenko
27a5461785
lib/handshake: check for deadline in Read and Write methods (#5327)
The buffered connection could have exceeded the underlying connection
deadline during reading or writing to an internal buffer.
With this change, buffered connection struct additionally checks
for a deadline in Read/Write methods.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 16:33:40 +01:00
Aliaksandr Valialkin
147fe45828
docs/CHANGELOG.md: remove duplicate word query after 2cbdb1db22 2023-11-16 16:24:15 +01:00
Aliaksandr Valialkin
7ca8ebef20
app/vmselect/promql: properly handle duplicate series when merging cached results with the results obtained from the database
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:

- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
  For example, quantile(0.9, rate({__name__=~"foo|bar"})

In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.

Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
2023-11-16 16:16:17 +01:00
hagen1778
7d72474a38
dashboards: use version instead of short_version in annotations
`version` label won't show the difference if various flavors of the same
version were deployed. But `short_version` will.

For example, on the sandbox env we test VM builds before new version release.
Without this change, the version update won't be visible on dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d389a4fcf3)
2023-11-16 09:27:42 +01:00
Aliaksandr Valialkin
9ad4a8fffe
docs/CHANGELOG.md: cut v1.95.0 release 2023-11-15 17:46:02 +01:00
Aliaksandr Valialkin
bd5bbdf00c
docs/CHANGELOG.md: document v1.93.8 LTS release 2023-11-15 17:12:56 +01:00
Aliaksandr Valialkin
6a8911ad38
docs/CHANGELOG.md: document v1.87.11 LTS release 2023-11-15 15:54:57 +01:00
Aliaksandr Valialkin
d7a63529b5
docs/CHANGELOG.md: consistently prepend command-line flags with a single dash 2023-11-14 21:44:46 +01:00
hagen1778
cfc58dd932
docs: clarify vmalert flag changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 21:44:46 +01:00
Nikolay
0730c2586d
lib/querytracer: makes package concurrent safe to use (#5322)
* lib/querytracer: makes package concurrent safe to use
it must fix various issues with concurrent code usage.
Especially, when it's not reasonable to wait for all goroutines to be finished

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 20:58:28 +01:00
hagen1778
72a40539b0
dashboards: update description for RSS and anonymous memory panels to be consistent for single-node, cluster and vmagent dashboards.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d3ae2b2f62)
2023-11-14 10:00:11 +01:00
hagen1778
777424082b
deployment/dashboards: respect job and instance filters for alerts annotation in cluster and single-node dashboards
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d6ae082598)
2023-11-14 10:00:11 +01:00
Aliaksandr Valialkin
d6a2264709
docs/CHANGELOG.md: document 0e056ddb2d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5203
2023-11-14 01:24:29 +01:00
Zakhar Bessarab
f7834767c1
vmcluster: re-routing enhancement (#5293)
* app/vmstorage: close vminsert connections gradually before stopping storage

Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878

Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update graceful shutdown logic

- close connections from vminsert in determenistic order
- update flag description
- lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s(default value in Kubernetes and ansible-playbooks).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add information about re-routing enhancement during restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add entry for new command-line flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* {app/vmstorage,lib/ingestserver}: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add note to update workload scheduler timeout

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:00:42 +01:00
Aliaksandr Valialkin
c1f651a9f9
app/vmauth: add ability to drop the specified number of /-delimited prefix parts from request path
This can be done via `drop_src_path_prefix_parts` option at `url_map` and `user` levels.

See https://docs.victoriametrics.com/vmauth.html#dropping-request-path-prefix
2023-11-13 22:34:40 +01:00
Aliaksandr Valialkin
12cd32fd75
lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails
This case is possible after the following steps:

1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether
   the remote storage supports zstd-compressed data.
2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression
   for the data sent to the remote storage.
3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk.
4. The remote storage becomes available.
5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data.
6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage,
   while falsely advertizing it sends zstd-compressed data.
7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd.

The solution is to just fall back to Snappy decompression if zstd decompression fails.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2023-11-13 21:25:39 +01:00
Aliaksandr Valialkin
356deada8c
lib/htmlcomponents: use relative links for the top page and for favicon.ico
This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes.
For example, vmagent HTTP handlers can be served via /vmagent/ path prefix:

- http://proxy/vmagent/targets
- http://proxy/vmagent/service-discovery

The path prefix can be arbitrary. For example, below are vmagent urls
for /tenantID/vmagent/ path prefix:

- http://proxy/tenantID/vmagent/targets
- http://proxy/tenantID/vmagent/service-discovery

While at it, consistently serve favicon.ico from any path directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307
2023-11-13 20:28:17 +01:00
Aliaksandr Valialkin
fb2071a01e
lib/regexutil: properly handle alternate regexps surrounded by .+ or .*
Previously the following regexps were improperly handled:

  .+foo|bar.+
  .*foo|bar.*

This could lead to unexpected regexp match results.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297

Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308
2023-11-13 18:25:57 +01:00
Aliaksandr Valialkin
2a3352c70e
docs/CHANGELOG.md: remove trailing whitespace after bffd30b57a 2023-11-13 09:47:36 +01:00
Aliaksandr Valialkin
b9aba7edfb
app/vmauth: properly pass Host header to backends
Previously the `Host` header was remained unchanged when passing it in requests to backends.
This may improperly work if the backend uses host-based routing.

While at it, allows http/2.0 requests to backends. While VictoriaMetrics components
do not accept http/2.0 requests, other backends can require such requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-13 09:45:34 +01:00
Aliaksandr Valialkin
78bc816220
app/vmauth: follow-up for 323f3720ed
- Re-use identically configured http.Transport across multiple users.
  This fixes handling of the limit on the number of connection, which can be established per each backend
  via -maxIdleConnsPerBackend command-line flag. This limit stopped working after 323f3720ed

- Add docs about backend TLS setup at https://docs.victoriametrics.com/vmauth.html#backend-tls-setup

- Add ability to disable backend TLS verification for all the users via -backend.tlsInsecureSkipVerify command-line flag.
  This flag may be useful when -auth.config contains big number of users, and every user must disable backend TLS verification.

- Add ability to specify TLS Root CA via tls_ca_file option at per-user basis and via -backend.tlsCAFile command-line flag
  across all the users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-13 09:45:16 +01:00
Aliaksandr Valialkin
76384b6d28
app/vmauth: improve docs a bit after 323f3720ed
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-13 09:44:25 +01:00
Aliaksandr Valialkin
d9ecc3f6d7
lib/logger: add -loggerMaxArgLen command-line flag for fine-tuning the maximum length of logged args 2023-11-13 09:43:49 +01:00
Aliaksandr Valialkin
c916294b61
app/vmselect/promql: optimize instant queries with min_over_time() and max_over_time() rollup functions
This is a follow-up for 41a0fdaf39
2023-11-13 09:43:18 +01:00
Aliaksandr Valialkin
7bbdecb79a
deployment: update Go builder from Go1.21.3 to Go1.21.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.4+label%3ACherryPickApproved
2023-11-13 09:40:08 +01:00
Aliaksandr Valialkin
ed79f9806a
lib/blockcache: do not cache entries, which were attempted to be accessed 1 or 2 times
Previously entries which were accessed only 1 time weren't cached.
It has been appeared that some rarely executed heavy queries may read indexdb block twice
in a row instead of once. There is no need in caching such a block then.
This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed.

Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning
the number of cache misses needed before storing the block in the caching.
2023-11-13 09:38:57 +01:00
Zakhar Bessarab
7c7e0a5caa
docs/changelog: document vmbackupmanager bugfix (#5303)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-11-13 09:31:58 +01:00
Roman Khavronenko
becf7bf8df
app/vmalert: update remote-write process (#5284)
* app/vmalert: update remote-write process

* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2023-11-13 09:25:29 +01:00
Yury Molodov
d7c6153f68
vmui: display query error on Explore metrics page (#5272)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5202

(cherry picked from commit f90d2ec843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-03 16:25:21 +01:00
Zakhar Bessarab
dea4695df5
app/vmauth: add option to skip TLS verification (#5256)
Add `tls_insecure_skip_verify` option on per-user basis which allows to disable TLS verification for all requests to backend on behalf of this user.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 323f3720ed)
2023-11-03 12:05:26 +01:00