to allow configuring additional headers in each request to the
corresponding notifier.
Other flags like `-datasource.headers`, `-remoteWrite.headers` already
use `^^` as delimiter, it's consistent to use it in `-notifier.headers`
as well.
related https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3260
vmalert can integrate with alertmanager that supports multi-tenant by
adding tenantID header`X-Scope-OrgID` in requests.
In multitenancy, vmalert can also filter alerts which send to different
notifier addresses(or with different header settings) using
`alert_relabel_configs`.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3260
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 0f1ec33892)
Production workload shows that it's useful optimisation.
Channel based objects pool allows to handle irregural data ingestion
requests and make memory allocations more smooth.
It's improves sync.Pool efficiency, since objects from sync.Pool removed
after 2 GC cycles. With GOGC=30 value, GC runs significantly more often.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6733
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Describe Your Changes
When I use usePromCompatibleNaming with vmagent to process data that
needs to be formatted from different sources such as InfluxDB, I find
that it doesn’t work
However, it works in vminsert. I found that vminsert uses the
HasRelabeling method to determine whether to relabel.
```go
func HasRelabeling() bool {
pcs := pcsGlobal.Load()
return pcs.Len() > 0 || *usePromCompatibleNaming
}
```
in vmagent, the decision to relabel is determined only by
pcsGlobal.Len() > 0. However, in the applyRelabeling method, the
usePromCompatibleNaming logic is also used to determine whether to
relabel in the error handling.
```go
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
if pcs.Len() == 0 && !*usePromCompatibleNaming {
// Nothing to change.
return tss
}
```
So I think that the logic for determining whether to relabel in vmagent
is not as expected.
Checklist
The following checks are mandatory:
[✅]My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
(cherry picked from commit d134a310f3)
vmselect will create `./tmp` dir under `cacheDataPath`. If
`cacheDataPath` is set to `/`, vmselect will use `/tmp`.
content under `/tmp` dir might be auto removed based on the OS
behaviour. See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5770
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
…eep_metric_names` options in stream aggregation config together
With aggregated data and raw data under the same metric, results would
be confusing.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 62d19369a3)
VM has different responses to equivalent queries for MetricsQL and
GraphiteQL in case of failed access to one of vmstorage node of the
cluster vmstorage nodes. For GraphiteQL, the denyPartialResponse feature
is not used, it is always true, which is not always correct (depending
on the configuration).
In the PR I have removed the hardcoded denyPartialResponse for
GraphiteQL, just like MetricsQL does.
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Make `-remoteWrite.streamAggr.ignoreFirstIntervals` of array type so it could
accept multiple values which can be applied to the corresponding`-remoteWrite.url`.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8f5c26d788)
Fix `-streamAggr.dropInputLabels` behavior when global deduplication is enabled without `-streamAggr.config`.
Previously, `-remoteWrite.streamAggr.dropInputLabels` is misapplied.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 4863605469)
The error check was needed before a84491324d
It was kept by mistake and makes no sense to have rn.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9726e6c1a2)
### Describe Your Changes
- Adds support for displaying the top 5 log streams in the hits graph,
grouping the remaining streams into an "other" label.
#6545
- Adds options to customize the graph display with bar, line, stepped
line, and points views.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
(cherry picked from commit 04c2232e45)
### Describe Your Changes
This is useful for clients which validate InfluxDB is available before
data ingestion can be started.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 9877a5e7d5)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Added `--vm-backoff-retries`, `--vm-backoff-factor`,
`--vm-backoff-min-duration` and `--vm-native-backoff-retries`,
`--vm-native-backoff-factor`, `--vm-native-backoff-min-duration`
command-line flags to the `vmctl` app. Those changes will help to
configure the retry backoff policy for different situations.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6622
### Checklist
The following checks are **mandatory**:
- [X] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6f401daacb)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* add a toggle button to the "Group" tab that allows users to expand or collapse all groups at once
* introduce the ability to select a key for grouping logs within the "Group" tab
* display the number of entries within each log group.
* move the Markdown toggle to the general settings panel in the upper left corner.
(cherry picked from commit e06a19d85f)
### Describe Your Changes
Change response code to 502 to align it with behaviour of other existing
reverse proxies. Currently, the following reverse proxies will return
502 in case an upstream is not available: nginx, traefik, caddy, apache.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
The %q formatter may result in incorrectly formatted JSON string if the original string
contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string
cannot be parsed by JSON parsers.
This is a follow-up for c0caa69939
See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24
`Storage.AddRows()` returns an error only in one case: when
`Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But
the same error is treated as a warning when it happens inside
`Storage.add()` or returned by `Storage.prefillNextIndexDB()`.
This commit fixes this inconsistency by treating the error returned by
`Storage.updatePerDateData()` as a warning as well. As a result
`Storage.add()` does not need a return value anymore and so doesn't
`Storage.AddRows()`.
Additionally, this commit adds a unit test that checks all cases that
result in a row not being added to the storage.
---------
Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
- Rename overrideHostHeader() function to hasEmptyHostHeader()
- Rename overrideHostHeader field at UserInfo to useBackendHostHeader
This should simplify the future maintenance of the code
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6525
This reverts commit 4d66e042e3.
Reasons for revert:
- The commit makes unrelated invalid changes to docs/CHANGELOG.md
- The changes at app/vmauth/main.go are too complex. It is better splitting them into two parts:
- pooling readTrackingBody struct for reducing pressure on GC
- avoiding to use readTrackingBody when -maxRequestBodySizeToRetry command-line flag is set to 0
Let's make this in the follow-up commits!
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6445
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6533
- Move the test for SRV discovery into a separate function. This allows verifying round-robin discovery across SRV records.
- Restore the original netutil.Resolver after the test finishes, so it doesn't interfere with other tests.
- Move the description of the bugfix into the correct place at docs/CHANGELOG.md - it should be placed under v1.102.0-rc2
instead of v1.102.0-rc1.
- Remove unneeded code in URLPrefix.sanitizeAndInitialize(), since it is expected this function is called only once
for finishing URLPrefix initializiation. In this case URLPrefix.nextDiscoveryDeadline and URLPrefix.n are equal to 0
according to https://pkg.go.dev/sync/atomic#Uint64
- Properly fix the bug at URLPrefix.discoverBackendAddrsIfNeeded() - it is expected that hostToAddrs map uses
the original hostname keys, including 'srv+' prefix, so it shouldn't be removed when looping over up.busOriginal.
Instead, the 'srv+' prefix must be removed from the hostname only locally before passing the hostname to netutil.Resolver.LookupSRV.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6401
- Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call
- NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil
- Simplify the implementation of NewStatDialFunc by removing sync.Map from there.
- Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils
- Use gauge instead of counter type for *_conns metric
This is a follow-up for d7b5062917
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299
- Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go .
This improves code maintainability a bit.
- Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs().
This prevents from potential resource leaks.
- Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation.
This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions.
- Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label
in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics.
- Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation.
This label should simplify investigation of the exposed metrics.
- Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability:
it is hard to use these labels in query filters and aggregation functions.
- Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` .
This metric shows the number of samples generated by the corresponding streaming aggregation rule.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice.
This metric has been added in the commit 861852f262 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462
- Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params,
which could modify the behaviour of the constructed streaming aggregator.
Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory.
- Pass Options arg to LoadFromFile() function by reference, since this structure is quite big.
This also allows passing nil instead of Options when default options are enough.
- Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics,
so they have consistent set of labels comparing to the rest of streaming aggregation metrics.
- Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding
`aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding
`output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output
metrics. This is a follow-up for the commit 2eb1bc4f81 .
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604
- Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users
figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason
in the commit 2eb1bc4f81 .
- Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function.
While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268
The old link was changed globally to the new link in the commit f4b1cbfef0 .
Unfortunately, old links are still posted in new commits :(
This is a follow-up for 680b8c25c8 .
While at it, remove duplicate 'len(*remoteWriteURLs) > 0' check in the remotewrite.Init() functions,
since this check is already made at the beginning of the function.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6253
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
systems are configured with the disabled on-disk queue, every in-memory queue is full
and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
relabeling and other processing in this case.
- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
and on-disk queue is disabled.
The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
processes pending data, so it drops aggregated samples in this case.
- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
so it is clear that this flag can be set individually per each -remoteWrite.url.
- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
are configured with the disabled on-disk queue, then there is no sense in keeping samples
on some of these systems, while dropping samples on the remaining systems, since this
will result in global stall on the remote storage system with the disabled on-disk queue
and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
This allows proceeding with newly scraped / pushed samples by sending them to the remaining
remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.
- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.
- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
This makes test code more clear and reduces the number of code lines by 500.
This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e
While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.
This is a follow-up for a9525da8a4
This simplifies debugging tests and makes the test code more clear and concise.
See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e
While at is, consistently use t.Fatal* instead of t.Error* across tests, since t.Error*
requires more boilerplate code, which can result in additional bugs inside tests.
While t.Error* allows writing logging errors for the same, this doesn't simplify fixing
broken tests most of the time.
This is a follow-up for a9525da8a4
Consistently using t.Fatal* simplifies the test code and makes it less fragile, since it is common error
to forget to make proper cleanup after t.Error* call. Also t.Error* calls do not provide any practical
benefits when some tests fail. They just clutter test output with additional noise information,
which do not help in fixing failing tests most of the time.
This is a follow-up for a9525da8a4
### Describe Your Changes
In most cases histograms are exposed in sorted manner with lower buckets
being first. This means that during scraping buckets with lower bounds
have higher chance of being updated earlier than upper ones.
Previously, values were propagated from upper to lower bounds, which
means that in most cases that would produce results higher than expected
once all buckets will become updated.
Propagating from upper bound effectively limits highest value of
histogram to the value of previous scrape. Once the data will become
consistent in the subsequent evaluation this causes spikes in the
result.
Changing propagation to be from lower to higher buckets reduces value
spikes in most cases due to nature of the original inconsistency.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
An example histogram with previous(red) and updated(blue) versions:
![1719565540](https://github.com/VictoriaMetrics/VictoriaMetrics/assets/1367798/605c5e60-6abe-45b5-89b2-d470b60127b8)
This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete.
Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6a4bd5049b)
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
The change adds a new docs section with examples on how source can be
overridden. It should address questions like
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6536
While there, fix the example in `external.alert.source` cmd-line flag
and docker-compose examples.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c429bbf889)
Reason for revert:
There are many statsd servers exist:
- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics
These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).
Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.
The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.
P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:
- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)
It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.
In the mean time it is recommended continue using external statsd server.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
This reverts commit 5a3abfa041.
Reason for revert: exemplars aren't in wide use because they have numerous issues which prevent their adoption (see below).
Adding support for examplars into VictoriaMetrics introduces non-trivial code changes. These code changes need to be supported forever
once the release of VictoriaMetrics with exemplar support is published. That's why I don't think this is a good feature despite
that the source code of the reverted commit has an excellent quality. See https://docs.victoriametrics.com/goals/ .
Issues with Prometheus exemplars:
- Prometheus still has only experimental support for exemplars after more than three years since they were introduced.
It stores exemplars in memory, so they are lost after Prometheus restart. This doesn't look like production-ready feature.
See 0a2f3b3794/content/docs/instrumenting/exposition_formats.md (L153-L159)
and https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage
- It is very non-trivial to expose exemplars alongside metrics in your application, since the official Prometheus SDKs
for metrics' exposition ( https://prometheus.io/docs/instrumenting/clientlibs/ ) either have very hard-to-use API
for exposing histograms or do not have this API at all. For example, try figuring out how to expose exemplars
via https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus .
- It looks like exemplars are supported for Histogram metric types only -
see https://pkg.go.dev/github.com/prometheus/client_golang@v1.19.1/prometheus#Timer.ObserveDurationWithExemplar .
Exemplars aren't supported for Counter, Gauge and Summary metric types.
- Grafana has very poor support for Prometheus exemplars. It looks like it supports exemplars only when the query
contains histogram_quantile() function. It queries exemplars via special Prometheus API -
https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars - (which is still marked as experimental, btw.)
and then displays all the returned exemplars on the graph as special dots. The issue is that this doesn't work
in production in most cases when the histogram_quantile() is calculated over thousands of histogram buckets
exposed by big number of application instances. Every histogram bucket may expose an exemplar on every timestamp shown on the graph.
This makes the graph unusable, since it is litterally filled with thousands of exemplar dots.
Neither Prometheus API nor Grafana doesn't provide the ability to filter out unneeded exemplars.
- Exemplars are usually connected to traces. While traces are good for some
I doubt exemplars will become production-ready in the near future because of the issues outlined above.
Alternative to exemplars:
Exemplars are marketed as a silver bullet for the correlation between metrics, traces and logs -
just click the exemplar dot on some graph in Grafana and instantly see the corresponding trace or log entry!
This doesn't work as expected in production as shown above. Are there better solutions, which work in production?
Yes - just use time-based and label-based correlation between metrics, traces and logs. Assign the same `job`
and `instance` labels to metrics, logs and traces, so you can quickly find the needed trace or log entry
by these labes on the time range with the anomaly on metrics' graph.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5982
- Export streamaggr.LoadFromData() function, so it could be used in tests outside the lib/streamaggr package.
This allows removing a hack with creation of temporary files at TestRemoteWriteContext_TryPush_ImmutableTimeseries.
- Move common code for mustParsePromMetrics() function into lib/prompbmarshal package,
so it could be used in tests for building []prompbmarshal.TimeSeries from string.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206
Move the code responsible for relabelCtx clearing into deferred function.
This allows making more clear the remoteWriteCtx.TryPush code.
This is a follow-up for 879771808b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
While at it, clarify the description of the bugfix at docs/CHANGELOG.md
Use metricsql.IsLikelyInvalid() function for determining whether the given query is likely invalid,
e.g. there is high change the query is incorrectly written, so it will return unexpected results.
The query is invalid most of the time if it passes something other than series selector into rollup function.
For example:
- rate(sum(foo))
- rate(foo + bar)
- rate(foo > bar)
Improtant note: the query is considered valid if it misses the lookbehind window in square brackes inside rollup function,
e.g. rate(foo), since this is very convenient MetricsQL extention to PromQL, and this query returns the expected results
most of the time.
Other unsafe query types can be added in the future into metricsql.IsLikelyInvalid().
TODO: probably, the -search.disableImplicitConversion command-line flag must be set by default in the future releases of VictoriaMetrics.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6180
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6450
Previously the in-memory buffer could remain unflushed for long periods of time under low ingestion rate.
The ingested logs weren't visible for search during this time.
### Describe Your Changes
- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
- rwctx - global or number starting from 1
- aggrid - aggregator id starting from 1
- aggrSuffix - <interval>_(by|without)_label1_label2_labeln
e.g: `name="global:1:1m_without_instance_pod"`
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 861852f262)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Fixes#6453
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
### Describe Your Changes
- Fixed the update of the relative time range when `Execute Query` is
clicked
- Optimized server requests: now, if an error occurs in the `/query`
request, the `/hits` request will not be executed.
#6345 (duplicates: #6440, #6312)
(cherry picked from commit 43342745ac)
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.
While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
This reverts commit 5ecf439078.
Reason for revert: the previous logic was correct.
The purpose of `-search.maxSamplesPerQuery` command-line flag is to limit the amounts of CPU resources,
which could be taken by a single query - see https://docs.victoriametrics.com/#resource-usage-limits .
VictoriaMetrics processes samples in blocks during querying - it reads the block, then unpacks it,
then filters out samples outside the selected time range. This means that it _spends CPU time_
on reading and unpacking of _all the samples_ in every block on the requested time range,
even if only a single sample per each block matches the given time range.
The previous logic was effectively limiting CPU time a single query could take.
The new logic fails limiting CPU time a single query could take in some pathological cases
when only a small fraction of samples per each requested block fit the requested time range.
This allows performing multiplication DoS-attacks by querying very narrow time ranges over historical blocks,
which tend to be full. For example, if the `-search.maxSamplesPerQuery` equals to a billion,
and the query requests a single sample out of 8K samples per each block, this means that the query
may unpack a billion of such blocks without exceeding the limit, e.g. it may unpack and process 8K*1e9=8e12 samples.
This is not what the resource usage limits were created for originally - see https://docs.victoriametrics.com/#resource-usage-limits
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5851
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6464