Commit Graph

3365 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
8b1c38abde
app/vmauth: follow-up for 3a45bbb4e0
- Move the test for SRV discovery into a separate function. This allows verifying round-robin discovery across SRV records.
- Restore the original netutil.Resolver after the test finishes, so it doesn't interfere with other tests.
- Move the description of the bugfix into the correct place at docs/CHANGELOG.md - it should be placed under v1.102.0-rc2
  instead of v1.102.0-rc1.
- Remove unneeded code in URLPrefix.sanitizeAndInitialize(), since it is expected this function is called only once
  for finishing URLPrefix initialization. In this case URLPrefix.nextDiscoveryDeadline and URLPrefix.n are equal to 0
  according to https://pkg.go.dev/sync/atomic#Uint64
- Properly fix the bug at URLPrefix.discoverBackendAddrsIfNeeded() - it is expected that hostToAddrs map uses
  the original hostname keys, including 'srv+' prefix, so it shouldn't be removed when looping over up.busOriginal.
  Instead, the 'srv+' prefix must be removed from the hostname only locally before passing the hostname to netutil.Resolver.LookupSRV.
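
A minimal sketch of the last point, not the vmauth code: the `discover` helper and the injected `lookupSRV` function are hypothetical stand-ins for the discovery loop and for netutil.Resolver.LookupSRV. The map keeps the original hostname (including the 'srv+' prefix) as its key, while the prefix is trimmed only for the lookup call.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupSRVFunc is a hypothetical stand-in for an SRV resolver.
type lookupSRVFunc func(host string) ([]string, error)

// discover maps every ORIGINAL hostname (including any "srv+" prefix)
// to the addresses resolved for it.
func discover(hosts []string, lookupSRV lookupSRVFunc) (map[string][]string, error) {
	hostToAddrs := make(map[string][]string, len(hosts))
	for _, host := range hosts {
		if !strings.HasPrefix(host, "srv+") {
			// Plain hostnames are used as is.
			hostToAddrs[host] = []string{host}
			continue
		}
		// Trim the prefix only locally; the map key stays unchanged.
		name := strings.TrimPrefix(host, "srv+")
		addrs, err := lookupSRV(name)
		if err != nil {
			return nil, fmt.Errorf("cannot resolve SRV records for %q: %w", name, err)
		}
		hostToAddrs[host] = addrs
	}
	return hostToAddrs, nil
}

func main() {
	// A fake resolver, so the example needs no network access.
	fake := func(host string) ([]string, error) {
		return []string{host + "-1:8481", host + "-2:8481"}, nil
	}
	m, err := discover([]string{"srv+vmselect", "plain-host:8481"}, fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["srv+vmselect"], m["plain-host:8481"])
}
```

Keeping the original key means later lookups by the configured hostname still find the discovered addresses.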

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6401
2024-07-16 10:41:08 +02:00
Aliaksandr Valialkin
468c04d3c2
app/vmauth: clarify the description for -idleConnTimeout command-line flag
This is a follow-up for d44058bcd6
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6388
2024-07-16 09:40:01 +02:00
Aliaksandr Valialkin
8b76a40715
lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag()
This is a follow-up for 61dce6f2a1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329
2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin
aa52d6cd9b
app/vminsert: increase default value for -maxLabelValueLen command-line flag from 1KiB to 4KiB
It turned out that the standard Kubernetes monitoring can generate labels with sizes up to 4KiB

This is a follow-up for a5d1013042
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176
2024-07-15 23:32:54 +02:00
Aliaksandr Valialkin
476bf400ac
lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc
- Rename GetStatDialFunc to NewStatDialFunc, since it returns a new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric

This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin
353766061b
app/{vminsert,vmselect}: pass proper args to metrics.UnregisterSet() after a8356f3a26
2024-07-15 20:27:40 +02:00
Aliaksandr Valialkin
cbc637d1dd
app/vmagent/remotewrite: follow-up for f153f54d11
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
  This improves code maintainability a bit.

- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
  This prevents potential resource leaks.

- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
  This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.

- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
  in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.

- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
  This label should simplify investigation of the exposed metrics.

- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
  it is hard to use these labels in query filters and aggregation functions.

- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
  This metric shows the number of samples generated by the corresponding streaming aggregation rule.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
  This metric has been added in the commit 861852f262 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462

- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
  which could modify the behaviour of the constructed streaming aggregator.
  Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.

- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
  This also allows passing nil instead of Options when default options are enough.

- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
  so they have a set of labels consistent with the rest of streaming aggregation metrics.

- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
  `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
  `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
  metrics. This is a follow-up for the commit 2eb1bc4f81 .
  See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604

- Add missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
  figure out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
  in the commit 2eb1bc4f81 .

- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
  While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin
a8356f3a26
vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0
Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call
because of missing s.UnregisterAllMetrics() call.

This is a follow-up for 6a6e34ab8e . It is OK if some vmauth metrics
aren't visible for a few microseconds when the previous metrics are unregistered and new metrics
weren't registered yet.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805
2024-07-15 10:45:39 +02:00
Aliaksandr Valialkin
3365dd508f
app/vmagent/remotewrite: do not spend CPU time on an attempt to send data to blocked queue if some queues are unblocked
Previously remotewrite.TryPush() was trying to send data to remote storages with blocked persistent queues,
if some persistent queues to other remote storage systems were unblocked. This resulted in excess CPU usage
on relabeling and stream aggregation for the remote storage with blocked queues.

The solution is to check whether some persistent storages have blocked queues and skip them before applying
per- -remoteWrite.url relabeling and streaming aggregation.
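
The approach described above can be sketched roughly as follows. This is a simplified illustration under assumed names (`remoteCtx`, `selectUnblocked`, `push`), not the vmagent code: destinations with blocked queues are filtered out first, so the expensive per-destination processing runs only for the rest.

```go
package main

import "fmt"

// remoteCtx is a hypothetical per-destination context.
type remoteCtx struct {
	url     string
	blocked bool // true when the persistent queue cannot accept data
}

// selectUnblocked returns the destinations which can accept data right now.
func selectUnblocked(ctxs []*remoteCtx) []*remoteCtx {
	unblocked := make([]*remoteCtx, 0, len(ctxs))
	for _, c := range ctxs {
		if !c.blocked {
			unblocked = append(unblocked, c)
		}
	}
	return unblocked
}

// push runs the expensive per-destination processing (relabeling, aggregation)
// only for unblocked destinations and returns how many of them were processed.
func push(ctxs []*remoteCtx, samples []float64, process func(c *remoteCtx, samples []float64)) int {
	targets := selectUnblocked(ctxs)
	for _, c := range targets {
		process(c, samples)
	}
	return len(targets)
}

func main() {
	ctxs := []*remoteCtx{
		{url: "http://a/api/v1/write"},
		{url: "http://b/api/v1/write", blocked: true},
	}
	n := push(ctxs, []float64{1, 2, 3}, func(c *remoteCtx, samples []float64) {
		fmt.Printf("processing %d samples for %s\n", len(samples), c.url)
	})
	fmt.Println("processed destinations:", n)
}
```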

While at it, properly update per- -remoteWrite.url vmagent_remotewrite_samples_dropped_total and vmagent_remotewrite_push_failures_total
counters when global streaming aggregation cannot send data to remote storage systems because of blocked queues.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 and https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268 .

This is a follow-up for 87fd400dfc and f153f54d11

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-15 09:40:34 +02:00
Aliaksandr Valialkin
4921ec5604
docs/CHANGELOG.md: use new link to VictoriaMetrics cluster docs instead of old link
The old link was changed globally to the new link in the commit f4b1cbfef0 .
Unfortunately, old links are still posted in new commits :(

This is a follow-up for 680b8c25c8 .

While at it, remove duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() functions,
since this check is already made at the beginning of the function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253
2024-07-13 03:04:20 +02:00
Aliaksandr Valialkin
bc1f92d7f5
app/vmagent/remotewrite: follow-up for 87fd400dfc
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
  systems are configured with the disabled on-disk queue, every in-memory queue is full
  and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
  so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
  relabeling and other processing in this case.

- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
  Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
  In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
  and on-disk queue is disabled.
  The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
  the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
  processes pending data, so it drops aggregated samples in this case.

- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
  so it is clear that this flag can be set individually per each -remoteWrite.url.

- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
  are configured with the disabled on-disk queue, then there is no sense in keeping samples
  on some of these systems, while dropping samples on the remaining systems, since this
  will result in global stall on the remote storage system with the disabled on-disk queue
  and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
  from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
  written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
  is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
  This allows proceeding with newly scraped / pushed samples by sending them to the remaining
  remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.

- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.

- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
  See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin
5c7345b8ce
app/victoria-logs/Makefile: add make victoria-logs-linux-loong64 build rule
This is a follow-up for 80f3644ee3

https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6222 missed the build rule for VictoriaLogs.
2024-07-12 23:13:19 +02:00
Aliaksandr Valialkin
43fc1183b9
app/vmalert: switch from table-driven tests to f-tests
This makes test code more clear and reduces the number of code lines by 500.
This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows logging multiple errors for the same test, this doesn't simplify fixing
broken tests most of the time.
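
For readers unfamiliar with the style, here is a minimal f-test sketch. The function under test, `normalizeLabel`, is a hypothetical example rather than vmalert code. In a real project the test function lives in a `_test.go` file and runs via `go test`; `main` is present only to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// normalizeLabel is a hypothetical function under test.
func normalizeLabel(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// TestNormalizeLabel uses the f-test style: a local function f holds
// the test logic, and every test case is a plain call to f.
func TestNormalizeLabel(t *testing.T) {
	f := func(input, want string) {
		t.Helper()
		got := normalizeLabel(input)
		if got != want {
			// t.Fatalf stops the test at the first failure.
			t.Fatalf("unexpected result for %q; got %q; want %q", input, got, want)
		}
	}

	f("", "")
	f("  Job ", "job")
	f("INSTANCE", "instance")
}

func main() {
	fmt.Println(normalizeLabel("  Job "))
}
```

Because of `t.Helper()`, a failure is reported at the line of the failing `f(...)` call, which points directly at the broken case.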

This is a follow-up for a9525da8a4
2024-07-12 22:45:50 +02:00
Aliaksandr Valialkin
04a304fd39
app/vmctl: switch from table-driven tests to f-tests
This simplifies debugging tests and makes the test code more clear and concise.
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e

While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows logging multiple errors for the same test, this doesn't simplify fixing
broken tests most of the time.

This is a follow-up for a9525da8a4
2024-07-12 22:45:49 +02:00
Aliaksandr Valialkin
7c97cef95c
app: consistently use t.Fatal* instead of t.Error* (except for app/vmalert and app/vmctl - these packages will be processed in a separate commit)
Consistently using t.Fatal* simplifies the test code and makes it less fragile, since it is a common error
to forget to make proper cleanup after t.Error* call. Also t.Error* calls do not provide any practical
benefits when some tests fail. They just clutter test output with additional noise,
which does not help in fixing failing tests most of the time.

This is a follow-up for a9525da8a4
2024-07-11 16:01:25 +02:00
Zhu Jiekun
2ea575e776
vmalert: [bug] fixed System hyperlink 404 redirect (#6620)
### Describe Your Changes

As mentioned in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6603, some hyperlinks under `vmalert` -> `System`
section are not working as expected.

Pages and redirection:
- For page `http://127.0.0.1:8880/`: `flags` button will redirect to
`http://127.0.0.1:8880/flags`
- For page `http://127.0.0.1:8880/vmalert`:
`http://127.0.0.1:8880/flags`
- For page `http://127.0.0.1:8880/vmalert/`:
`http://127.0.0.1:8880/vmalert/flags` (page does not exist)
- Similar redirection could be observed with `-http.pathPrefix`

Two potential ways to avoid 404 redirection:
1. **avoid visiting `/vmalert/`** (I'm trying to do this).
2. provide support for `/vmalert/flags`.

`/vmalert/` can be visited only when the user clicks another navigation item (e.g.
Group) and then clicks vmalert again:
![Peek 2024-07-10 10-07](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/13d7b147-a1b6-4e93-9ee0-26f881a16bef)
Because: `http://127.0.0.1:8880/vmalert/groups?search=` + `<a class="nav-link" href=".">` = `http://127.0.0.1:8880/vmalert/`

So I'm trying to change the `href="."` to `href="../vmalert"`.
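
The relative link resolution described above can be reproduced with Go's `net/url` package. The `resolve` helper is a hypothetical name introduced for this sketch, and the relative `flags` href in the last two calls is an assumption inferred from the redirects listed above, not taken from the vmalert templates.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve returns the absolute URL a browser would navigate to
// when following the given href on the given page.
func resolve(page, href string) string {
	base, err := url.Parse(page)
	if err != nil {
		panic(err)
	}
	ref, err := url.Parse(href)
	if err != nil {
		panic(err)
	}
	return base.ResolveReference(ref).String()
}

func main() {
	page := "http://127.0.0.1:8880/vmalert/groups?search="

	// href="." keeps the trailing slash of the parent directory.
	fmt.Println(resolve(page, ".")) // http://127.0.0.1:8880/vmalert/

	// href="../vmalert" points to the page without the trailing slash.
	fmt.Println(resolve(page, "../vmalert")) // http://127.0.0.1:8880/vmalert

	// Assumed relative href for the flags button.
	fmt.Println(resolve("http://127.0.0.1:8880/vmalert/", "flags")) // http://127.0.0.1:8880/vmalert/flags
	fmt.Println(resolve("http://127.0.0.1:8880/vmalert", "flags"))  // http://127.0.0.1:8880/flags
}
```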

### Checklist

The following checks are **mandatory**:

- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit cadf1eb5ab)
2024-07-11 12:40:23 +02:00
Zakhar Bessarab
401ae72587
app/vmselect/promql: propagate lower bucket values when fixing a histogram (#6547)
### Describe Your Changes

In most cases histograms are exposed in sorted manner with lower buckets
being first. This means that during scraping buckets with lower bounds
have higher chance of being updated earlier than upper ones.

Previously, values were propagated from upper to lower bounds, which
means that in most cases that would produce results higher than expected
once all buckets are updated.
Propagating from upper bound effectively limits highest value of
histogram to the value of previous scrape. Once the data becomes
consistent in the subsequent evaluation, this causes spikes in the
result.

Changing propagation to be from lower to higher buckets reduces value
spikes in most cases due to nature of the original inconsistency.

 See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580

An example histogram with previous(red) and updated(blue) versions:

![1719565540](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/1367798/605c5e60-6abe-45b5-89b2-d470b60127b8)

This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete.
Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one.

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6a4bd5049b)
2024-07-10 15:17:08 +02:00
Aliaksandr Valialkin
a1decb5ca1
app/vlinsert/loki: use easyproto instead for parsing Loki protobuf messages
2024-07-10 03:05:55 +02:00
Aliaksandr Valialkin
32ae40410c
app/vlselect/vmui: run make vmui-logs-update after 662e026279
2024-07-10 03:05:55 +02:00
Aliaksandr Valialkin
b8a8d3d6f1
lib/logstorage: drop all the pipes from the query when calculating the number of matching logs at /select/logsql/hits API
2024-07-10 00:39:16 +02:00
Aliaksandr Valialkin
d6415b2572
all: consistently use 'any' instead of 'interface{}'
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
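
A short illustration of why the replacement is safe: `any` is a predeclared alias for `interface{}`, so the two spellings denote the same type. The `describe` function is a hypothetical example.

```go
package main

import "fmt"

// describe accepts a value of any type; `v any` is identical to `v interface{}`.
func describe(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	default:
		return fmt.Sprintf("other %v", x)
	}
}

func main() {
	var a any = 42
	// Assignment in both directions compiles without conversion,
	// since both variables have the same type.
	var b interface{} = a
	a = b
	fmt.Println(describe(a), describe("foo"))
}
```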
2024-07-10 00:23:26 +02:00
Aliaksandr Valialkin
73ca22bb7d
app/vlinsert/loki: remove unused functions from the generated protobuf code
2024-07-10 00:22:10 +02:00
Yury Molodov
33bd5ccbab
vmui/logs: add spinner to bar chart (#6577)
Add a spinner to the bar chart

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6558

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 662e026279)
2024-07-09 18:27:23 +02:00
Hui Wang
6f602a4ef5
security: upgrade base docker image (Alpine) from 3.20.0 to 3.20.1
See https://www.alpinelinux.org/posts/Alpine-3.20.1-released.html

>including security fixes for:
OPENSSL
[CVE-2024-4741](https://security.alpinelinux.org/vuln/CVE-2024-4741)
BUSYBOX
[CVE-2023-42364](https://security.alpinelinux.org/vuln/CVE-2023-42364)
[CVE-2023-42365](https://security.alpinelinux.org/vuln/CVE-2023-42365)

(cherry picked from commit 8e9f98e725)
2024-07-09 11:38:44 +02:00
Artem Navoiev
7b508a9334
fix typo
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
(cherry picked from commit 4527020a68)
2024-07-09 10:52:50 +02:00
Yury Molodov
7fc9912d15
vmui: add compact JSON display (#6582)
### Describe Your Changes
If a JSON element has only one field, it will be displayed on a single
line.
 #6559

| Old Display | New Display |
|-------------|-------------|
| ![image](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/8866517b-a49d-450f-904c-19117397a078) | ![image](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/8e222b43-a4cb-4f32-9a79-6199778404d3) |

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 959a4383c5)
2024-07-05 09:49:12 +02:00
Hui Wang
bbd49a1a61
vmalert: allow omitting -replay.timeTo in replay mode, default value is the current timestamp (#6575)

address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6492

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3169524fb7)
2024-07-05 09:49:06 +02:00
Roman Khavronenko
b13c363f12
app/vmalert: add examples for source override (#6561)
The change adds a new docs section with examples on how source can be
overridden. It should address questions like
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6536

While there, fix the example in `external.alert.source` cmd-line flag
and docker-compose examples.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c429bbf889)
2024-07-05 09:49:03 +02:00
Aliaksandr Valialkin
172ae1adf7
Revert c6c5a5a186 and b2765c45d0
Reason for revert:

Many statsd servers exist:

- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DataDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics

These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).

Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while having no significant drawbacks compared
to existing statsd servers.

The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).

Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.

P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:

- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)

It would be great if this configuration were compatible with the configuration of the most widely used statsd server.

In the meantime it is recommended to continue using an external statsd server.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
2024-07-03 23:57:49 +02:00
Aliaksandr Valialkin
cd152693c6
Revert "Exemplar support (#5982)"
This reverts commit 5a3abfa041.

Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for exemplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature, even though
the source code of the reverted commit is of excellent quality. See https://docs.victoriametrics.com/goals/ .

Issues with Prometheus exemplars:

- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
  It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
  See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
  and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage

- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
  for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
  for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
  via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .

- It looks like exemplars are supported for Histogram metric types only -
  see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
  Exemplars aren't supported for Counter, Gauge and Summary metric types.

- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
  contains histogram_quantile() function. It queries exemplars via special Prometheus API -
  https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
  and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
  in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
  exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
  This makes the graph unusable, since it is literally filled with thousands of exemplar dots.
  Neither Prometheus API nor Grafana provides the ability to filter out unneeded exemplars.

- Exemplars are usually connected to traces. While traces are good for some

I doubt exemplars will become production-ready in the near future because of the issues outlined above.

Alternative to exemplars:

Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labels on the time range with the anomaly on metrics' graph.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
2024-07-03 16:09:18 +02:00
Aliaksandr Valialkin
a5d60ad78e
app/vmagent/remotewrite,lib/streamaggr: re-use common code in tests after 879771808b
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
  This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.

- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
  so it could be used in tests for building []prompbmarshal.TimeSeries from string.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
2024-07-03 15:22:51 +02:00
Aliaksandr Valialkin
4268a310c1
app/vmagent/remotewrite/remotewrite.go: make remoteWriteCtx.TryPush code easier to follow
Move the code responsible for relabelCtx clearing into deferred function.
This makes the remoteWriteCtx.TryPush code clearer.

This is a follow-up for 879771808b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205

While at it, clarify the description of the bugfix at docs/CHANGELOG.md
2024-07-03 14:18:51 +02:00
Aliaksandr Valialkin
f406764ccc
app/vmagent/remotewrite/streamaggr.go: clarify the description for -remoteWrite.streamAggr.* command-line flags, so they are applied to the corresponding -remoteWrite.url
2024-07-03 14:18:51 +02:00
Aliaksandr Valialkin
bb7406e9c0
app/vmselect/promql: follow-up for dd0d2c77c8 and 6149adbe10
Use metricsql.IsLikelyInvalid() function for determining whether the given query is likely invalid,
e.g. there is a high chance the query is incorrectly written, so it will return unexpected results.

The query is invalid most of the time if it passes something other than series selector into rollup function.
For example:

- rate(sum(foo))
- rate(foo + bar)
- rate(foo > bar)

Important note: the query is considered valid if it misses the lookbehind window in square brackets inside rollup function,
e.g. rate(foo), since this is a very convenient MetricsQL extension to PromQL, and this query returns the expected results
most of the time.

Other unsafe query types can be added in the future into metricsql.IsLikelyInvalid().

TODO: probably, the -search.disableImplicitConversion command-line flag must be set by default in the future releases of VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6180
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6450
2024-07-03 00:46:56 +02:00
Aliaksandr Valialkin
82748b2b9d
deployment/docker: update Go builder from Go1.22.4 to Go1.22.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.22.5+label%3ACherryPickApproved
2024-07-03 00:07:55 +02:00
LHHDZ
c8431c8e4d
app/vmauth: reader pool to reduce gc & mem alloc (#6533)
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6446

issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6445

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
(cherry picked from commit 4d66e042e3)
2024-07-02 14:37:15 +02:00
Aliaksandr Valialkin
0912a652d5
app/vlinsert/insertutils: flush the ingested logs from in-memory buffer to storage every second
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
2024-07-02 01:39:45 +02:00
Aliaksandr Valialkin
ab28a1f93e
app/vlinsert/syslog: add an ability to use log ingestion time as the _time field
2024-07-02 01:39:45 +02:00
Hui Wang
085bc1f15c
vmui: increase max query tab from 4 to 10 (#6546)
(cherry picked from commit 9da78f1e0e)
2024-07-01 16:40:42 +02:00
Hui Wang
87cb132f53
app/vmselect/netstorage: do not retry request when complexity limit is already exceeded (#6469)

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 16:38:15 +02:00
Andrii Chubatiuk
937ae2ca90
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrId - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`
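
The naming scheme above can be sketched as a small formatting helper. The `aggrName` function and its parameters are hypothetical and only follow the `{rwctx}:{aggrId}:{aggrSuffix}` format described in the list; the actual lib/streamaggr code may build the value differently.

```go
package main

import (
	"fmt"
	"strings"
)

// aggrName builds the value of the `name` label in the form
// {rwctx}:{aggrId}:{aggrSuffix}, where aggrSuffix is
// <interval>_(by|without)_label1_label2_labeln.
func aggrName(rwctx string, aggrID int, interval string, without bool, labels []string) string {
	mode := "by"
	if without {
		mode = "without"
	}
	suffix := interval + "_" + mode
	if len(labels) > 0 {
		suffix += "_" + strings.Join(labels, "_")
	}
	return fmt.Sprintf("%s:%d:%s", rwctx, aggrID, suffix)
}

func main() {
	fmt.Println(aggrName("global", 1, "1m", true, []string{"instance", "pod"}))
	fmt.Println(aggrName("2", 3, "5m", false, []string{"job"}))
}
```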

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 861852f262)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 15:01:49 +02:00
Aliaksandr Valialkin
4b3477e62b
lib/logstorage: add stream_context pipe, which allows selecting surrounding logs for the matching logs
2024-06-28 19:15:19 +02:00
Aliaksandr Valialkin
c9fc8079c4
app/vlinsert/syslog: properly skip empty lines in Syslog protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6548
2024-06-28 14:09:45 +02:00
Aliaksandr Valialkin
bb6424aeca
app/vlselect/logsql: add optional fields_limit query arg to /select/logsql/hits HTTP endpoint
This query arg is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6545
in order to return top N groups with the biggest number of hits.
2024-06-28 03:10:05 +02:00
Aliaksandr Valialkin
b26acec9a8
app/vlselect: properly return live tailing results
2024-06-27 15:06:15 +02:00
Aliaksandr Valialkin
dd62a2b9d6
lib/logstorage: work-in-progress
2024-06-27 14:21:03 +02:00
Andrii Chubatiuk
f79df2aa8b
app/vmauth: allow dropping host header (#6525)
### Describe Your Changes

Fixes #6453

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-26 19:12:35 +02:00
Yury Molodov
6bde0196d8
vmui/logs: fix the update of the relative time range (#6517)
### Describe Your Changes

- Fixed the update of the relative time range when `Execute Query` is
clicked
- Optimized server requests: now, if an error occurs in the `/query`
request, the `/hits` request will not be executed.

#6345 (duplicates: #6440, #6312)

(cherry picked from commit 43342745ac)
2024-06-26 11:26:08 +02:00
Yury Molodov
904ec020ed
vmui: fix input cursor position reset (#6530)
### Describe Your Changes

This PR addresses the issue where the cursor jumps to the end of the
input fields in the modal settings window after each keystroke.

### Before fix:

![ezgif-7-4c69805cea](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/29711459/2e99e833-09e3-4b44-89aa-fc1bd3c4346d)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit e9b71a2883)
2024-06-26 11:25:47 +02:00
Yury Molodov
25f3e700a6
vmui: update package-lock.json (#6532)
1. Updated `package-lock.json` to resolve [Dependabot
alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/security/dependabot).
2. Updated types to align with the latest `Preact` update.

(cherry picked from commit 6cab811134)
2024-06-26 11:25:45 +02:00