* feat: add cardinality support for prometheus (#4320)
* docs/CHANGELOG.md: add cardinality support for prometheus
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* added ability to set and clear response headers (#4825)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
* fix review comment
Signed-off-by: Alexander Marshalov <_@marshalov.org>
---------
Signed-off-by: Alexander Marshalov <_@marshalov.org>
* vmagent: retry failed write request on the closed connection
Retry a failed write request on a closed connection immediately,
without waiting for backoff. This should improve data delivery speed
and reduce the number of error logs emitted by vmagent when using idle connections.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmagent: retry failed write request on the closed connection
Re-instantiate the request before retry, since its body could have already been consumed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* vmalert: correctly re-instantiate HTTP req on retries
Previously, request retry to datasource re-used the existing HTTP request.
But if the request object was already partially processed (body was read),
then the retry would be unsuccessful.
The change re-instantiates the HTTP request object before retry.
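A minimal sketch of the idea, not the actual vmalert code: every attempt builds a fresh request, so a body consumed by a failed attempt is never re-used. The `doWithRetry` helper, the retry-on-5xx rule and the test server are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// doWithRetry is a hypothetical helper. It creates a new *http.Request
// for every attempt, so each attempt sends the full payload.
func doWithRetry(client *http.Client, url string, payload []byte, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Re-create the request so the body reader starts from the beginning.
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode < 500 {
			return resp, nil
		}
		// Drain and close the failed response before the next attempt.
		_, _ = io.Copy(io.Discard, resp.Body)
		_ = resp.Body.Close()
		lastErr = fmt.Errorf("unexpected status code %d", resp.StatusCode)
	}
	return nil, lastErr
}

func main() {
	calls := 0
	// The test server fails the first request and accepts the second one.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		calls++
		body, _ := io.ReadAll(r.Body)
		if calls == 1 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintf(w, "got %d bytes", len(body))
	}))
	defer srv.Close()

	resp, err := doWithRetry(srv.Client(), srv.URL, []byte("query=up"), 3)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // prints: got 8 bytes
}
```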
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: review fix
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Add button to prettify query
Just capitalizes query text for now
* Add /prettify-query API handler
* Replace UI prettifier with the prettifier API
* Add showing server errors
Had to pass setQueryErrors from useFetchQuery.ts
* Use serverUrl from global AppState
* Change icon to AutoAwesome icon + added style to change color when button is active
* Add async/await to prettifyQuery function
* Doc public function for lint
* Minor async fix
* Removed extra blank lines
* Extract usePrettifyQuery hook
* Made more generic style for :active button
* Refactor usePrettifyQuery
However, prettify errors don't clean up query errors, but should
* Add prettyQuery functionality to CHANGELOG.md
* Reuse queryErrors
* Unhide errors on start
---------
Co-authored-by: Tamara <toma.vashchuk@gmail.com>
- Fix Prometheus-compatible naming after applying the relabeling if -usePromCompatibleNaming command-line flag is set.
This should prevent Prometheus-incompatible metric names and label names from being generated by the relabeling.
- Do not return anything from relabelCtx.appendExtraLabels() function, since it cannot change the number of time series
passed to it. Append labels for the passed time series in-place.
- Remove promrelabel.FinalizeLabels() call after adding extra labels to time series, since this call has already been
made at relabelCtx.applyRelabeling(). It is the user's responsibility if they pass labels with double underscore prefixes
to -remoteWrite.label.
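The in-place variant of appending extra labels can be sketched roughly as follows. The `label` and `timeSeries` types are simplified stand-ins rather than the real vmagent types, and overriding an existing label with the same name is an assumption of this sketch.

```go
package main

import "fmt"

// label and timeSeries are simplified, hypothetical stand-ins for the real types.
type label struct{ name, value string }

type timeSeries struct{ labels []label }

// appendExtraLabels adds extra labels to every series in place.
// It returns nothing, since the number of series never changes.
// Assumption: an extra label overrides an existing label with the same name.
func appendExtraLabels(tss []timeSeries, extra []label) {
	for i := range tss {
		ts := &tss[i]
		for _, e := range extra {
			replaced := false
			for j := range ts.labels {
				if ts.labels[j].name == e.name {
					ts.labels[j].value = e.value
					replaced = true
					break
				}
			}
			if !replaced {
				ts.labels = append(ts.labels, e)
			}
		}
	}
}

func main() {
	tss := []timeSeries{{labels: []label{{"__name__", "up"}, {"dc", "a"}}}}
	appendExtraLabels(tss, []label{{"dc", "b"}, {"env", "prod"}})
	fmt.Println(tss[0].labels)
}
```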
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247
app/vmctl: don't interrupt migration process if tenant has no data
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Alexander Marshalov <_@marshalov.org>
vmagent: properly add extra labels before sending data to remote storage
labels from `remoteWrite.label` are now added to sent metrics just before they
are pushed to `remoteWrite.url` after all relabelings, including stream aggregation relabelings (#4247)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247
Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* app/vlinsert/elasticsearch: add a command-line flag to provide ES version
Adds a flag which allows changing the version reported by the ES endpoint for compatibility checks performed by external log shippers (such as filebeat).
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4777
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Document the -elasticsearch.version command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4777
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This reverts commit 252643d100.
Reason for revert: the commit incorrectly fixes the issue.
The `remoteAddr` must be properly quoted inside lib/httpserver.GetQuotedRemoteAddr().
It isn't quoted properly if the request contains X-Forwarded-For header.
The proper fix will be included in the follow-up commit.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676
{vmagent/remotewrite,vminsert/common}: fix dropInput and keepInput flags inconsistency
Sync behavior for dropInput and keepInput flags between single-node and vmagent.
Fix vmagent not respecting dropInput flag and reverse logic for keepInput.
The deferred call's arguments are evaluated immediately, but the function call is not executed until the surrounding function returns.
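A small standalone illustration of this rule (it is not the vmagent code): the argument of a directly deferred call is captured at the `defer` statement, while a deferred closure reads the variable when it finally runs. The `record` and `deferDemo` names are made up for the example.

```go
package main

import "fmt"

// record appends v to the log. It stands in for any deferred call.
func record(log *[]int, v int) {
	*log = append(*log, v)
}

// deferDemo returns the values seen by two deferred calls.
func deferDemo() []int {
	var log []int
	func() {
		x := 1
		// The argument x is evaluated right here, so this call records 1.
		defer record(&log, x)
		// The closure reads x when it runs, so it records 2.
		defer func() { record(&log, x) }()
		x = 2
	}()
	return log
}

func main() {
	// Deferred calls run in last-in-first-out order.
	fmt.Println(deferDemo()) // prints: [2 1]
}
```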
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
* rename `configErr` to `lastConfigErr` to reduce confusion
* add tests to verify metrics and msg are set properly
* fix mistake when config success metric wasn't restored after an error
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The value of the `-dedup.minScrapeInterval` command-line flag must be higher than `evaluation_interval` in order to make sure that only one sample per evaluation is left after deduplication.
Moreover, the value of `-dedup.minScrapeInterval` must be a multiple of vmalert's `evaluation_interval` in order to make sure that samples are aligned between deduplication window periods.
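The two requirements above can be expressed as a small check. This is only a sketch that follows the wording literally; `checkDedupInterval` is a hypothetical helper and is not part of vmalert or VictoriaMetrics.

```go
package main

import (
	"fmt"
	"time"
)

// checkDedupInterval is a hypothetical validator. It reports an error
// if dedup is not higher than eval or is not a multiple of eval.
func checkDedupInterval(dedup, eval time.Duration) error {
	if eval <= 0 {
		return fmt.Errorf("evaluation_interval must be positive; got %s", eval)
	}
	if dedup <= eval {
		return fmt.Errorf("-dedup.minScrapeInterval=%s must be higher than evaluation_interval=%s", dedup, eval)
	}
	if dedup%eval != 0 {
		return fmt.Errorf("-dedup.minScrapeInterval=%s must be a multiple of evaluation_interval=%s", dedup, eval)
	}
	return nil
}

func main() {
	fmt.Println(checkDedupInterval(60*time.Second, 30*time.Second)) // prints: <nil>
	fmt.Println(checkDedupInterval(45*time.Second, 30*time.Second)) // prints an error
}
```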
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4774#issuecomment-1663940811
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
vmauth: allow configuring deadline for a backend to be excluded from the rotation
The new flag `-failTimeout` allows overriding default time for a bad backend
to be excluded from rotation. The override option could be useful for systems
where it is expected for backends to be off for significant periods of time.
Co-authored-by: Zakhar Bessarab <zekker6@gmail.com>
The binary export API protocol can be disabled via the `-vm-native-disable-binary-protocol` cmd-line flag when migrating data from VictoriaMetrics. Disabling the binary protocol
can be useful for deduplication of the exported data before ingestion.
For this, deduplication needs to be configured on the `-vm-native-src-addr` side
and `-vm-native-disable-binary-protocol` should be set on the vmctl side.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Links of form `/api/v1/<groupID>/<alertID>/status` were deprecated
in favour of `/api/v1/alerts?group_id=<>&alert_id=<>` links in
v1.79.0. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825
This change removes code responsible for deprecated functionality.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Revert "vmalert: unittest support stale datapoint (#4696)"
This reverts commit 0b44df7ec8.
* Revert "docs: specify min version and limitations for vmalert's unit tests"
This reverts commit a24541bd
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Revert "vmalert: init unit test (#4596)"
This reverts commit da60a68d
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs: mention unittest revert in changelog
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* lib/protoparser: adds opentelemetry parser
app/{vmagent,vminsert}: adds opentelemetry ingestion path
Adds ability to ingest data with opentelemetry protocol
protobuf and json encoding is supported
data converted into prometheus protobuf timeseries
each data type has own converter and it may produce multiple timeseries
from single datapoint (for summary and histogram).
only cumulative aggregationFamily is supported for sum(prometheus
counter) and histogram.
Apply suggestions from code review
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
updates deps
fixes tests
wip
wip
wip
wip
lib/protoparser/opentelemetry: moves to vtprotobuf generator
go mod vendor
lib/protoparser/opentelemetry: reduce memory allocations
* wip
- Remove support for JSON parsing, since it is too fragile and is rarely used in practice.
Most clients send OpenTelemetry metrics in protobuf.
The JSON parser can be added in the future if needed.
- Remove unused code from lib/protoparser/opentelemetry/pb and lib/protoparser/opentelemetry/proto
- Do not re-use protobuf message between ParseStream() calls, since there is high chance
of high fragmentation of the re-used message because of too complex nested structure of the message.
* wip
* wip
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
The important change is to highlight that restore procedure happens
only once and only for already loaded rules. Config hot-reload
doesn't trigger the restore procedure.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Add -remoteWrite.shardByURL command-line flag, which instructs vmagent to spread evenly
outgoing time series data among the configured remote storage systems specified via -remoteWrite.url .
Samples for the same time series go to the same -remoteWrite.url . This allows building horizontally
scalable stream aggregation when samples for counter and histogram series must be aggregated
by the same second-level vmagent instance.
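A rough sketch of how such sharding can work, assuming a hash over the sorted labels. The `shardIndex` function and the FNV hash are illustrative choices and may differ from what vmagent actually does.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardIndex is a hypothetical helper. It maps a label set to one of
// the given number of shards, so identical label sets always get the same shard.
func shardIndex(labels map[string]string, shards int) int {
	if shards <= 1 {
		return 0
	}
	// Sort label names, so the hash doesn't depend on map iteration order.
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0})
		h.Write([]byte(labels[name]))
		h.Write([]byte{0})
	}
	return int(h.Sum64() % uint64(shards))
}

func main() {
	// Placeholder URLs standing in for the configured -remoteWrite.url values.
	urls := []string{"http://agg-1/write", "http://agg-2/write", "http://agg-3/write"}
	series := map[string]string{"__name__": "http_requests_total", "job": "api"}
	fmt.Println(urls[shardIndex(series, len(urls))])
}
```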
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4637
- Use a byte slice instead of a map for tracking indexes for matching series.
This improves performance, since access by slice index is faster than access by map key.
- Re-use the byte slice for tracking indexes for matching series.
This removes unnecessary memory allocations and improves stream aggregation performance a bit.
- Add an ability to return to the previous behaviour by specifying -remoteWrite.streamAggr.dropInput command-line flag.
In this case all the input samples are dropped when stream aggregation is enabled.
- Backport the new stream aggregation behaviour from vmagent to single-node VictoriaMetrics when -streamAggr.config
option is set.
- Improve docs regarding this change at docs/CHANGELOG.md
- Document the new behavior at docs/stream-aggregation.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4575
* {lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag
Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4243
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update app/vmagent/remotewrite/remotewrite.go
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: pre-create timeseries before indexDB rotation
Start creating records in the next indexDB during the hour before indexDB rotation.
This should improve performance during the switch to the next indexDB and remove ingestion issues,
since there is no need to create new index records for time series already ingested into the current indexDB.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
* lib/storage: further work on indexdb rotation optimization
- Document the change at docs/CHANGELOG.md
- Move back various caches from indexDB to Storage. This makes the change less intrusive.
The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID)
entries for both the current and the next indexDB.
- Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function.
This improves code readability and maintainability a bit.
- Rewrite and simplify the code responsible for calculating the next retention timestamp.
Add various tests for corner cases of this code.
- Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called.
It is OK to add indexdb entries on demand in this function. This simplifies the code.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
* docs/CHANGELOG.md: refer to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* app/vmalert/datasource/graphite: allow overriding "from" parameter for datasource queries
Fixes construction of URL parameters for graphite render to allow overriding "from" parameter.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4685
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmalert/datasource/graphite: update flow for building URL parameters
Makes the flow of building URL parameters the same as in the Prometheus datasource:
1) Setting all default values
2) Merging those values with provided `extraParams`
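The two steps can be sketched as follows. The `buildParams` helper and the default values (`format=json`, `from=-5min`) are assumptions made for illustration and are not copied from the vmalert source.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildParams is a hypothetical helper. It sets defaults first and then
// lets the provided extra params override them, including "from".
func buildParams(query string, extra url.Values) url.Values {
	params := url.Values{}
	// Step 1: default values (assumed for this sketch).
	params.Set("format", "json")
	params.Set("target", query)
	params.Set("from", "-5min")
	// Step 2: merge extra params on top of the defaults.
	for k, vs := range extra {
		params.Del(k)
		for _, v := range vs {
			params.Add(k, v)
		}
	}
	return params
}

func main() {
	extra := url.Values{"from": {"-1h"}}
	fmt.Println(buildParams("sumSeries(app.*)", extra).Encode())
}
```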
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/CHANGELOG.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
- Improve docs
- Hide `debug relabeling` column when -promscrape.dropOriginalLabels command-line flag is set
- Inline the code from the added template functions, since the code is harder to follow
with the template functions, especially when these functions have misleading names.
Also, these functions are used only in one place, e.g. they do not reduce the amounts of code.
- Hide `click to show original labels` title at `labels` column when original labels aren't available.
- Show the reason why original labels aren't available at /service-discovery page.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4597
- Parse protobuf if Content-Type isn't set to `application/json` - this behavior is documented at https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki
- Properly handle gzip'ped JSON requests. The `gzip` header must be read from `Content-Encoding` instead of `Content-Type` header
- Properly flush all the parsed logs with the explicit call to vlstorage.MustAddRows() at the end of query handler
- Check JSON field types more strictly.
- Allow parsing Loki timestamp as floating-point number. Such a timestamp can be generated by some clients,
which store timestamps in float64 instead of int64.
- Optimize parsing of Loki labels in Prometheus text exposition format.
- Simplify tests.
- Remove lib/slicesutil, since there are no more users for it.
- Update docs with missing info and fix various typos. For example, it should be enough to have `instance` and `job` labels
as stream fields in most Loki setups.
- Allow empty or missing timestamps in the ingested logs.
The current timestamp at VictoriaLogs side is then used for the ingested logs.
This simplifies debugging and testing of the provided HTTP-based data ingestion APIs.
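The timestamp handling described in the list above (integer or floating-point values, with a fallback to the current time for empty values) might look roughly like this. `parseTimestamp` is a hypothetical function and is not the actual VictoriaLogs parser.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseTimestamp is a hypothetical parser. It returns now for an empty
// string, parses integers exactly and falls back to float64 parsing otherwise.
func parseTimestamp(s string, now int64) (int64, error) {
	if s == "" {
		// Missing timestamp: use the current time supplied by the caller.
		return now, nil
	}
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse timestamp %q: %w", s, err)
	}
	if math.IsNaN(f) || f < math.MinInt64 || f >= math.MaxInt64 {
		return 0, fmt.Errorf("timestamp %q is out of int64 range", s)
	}
	// float64 keeps only about 15-16 significant digits,
	// so large timestamps may lose precision here.
	return int64(f), nil
}

func main() {
	fmt.Println(parseTimestamp("1690000000000000000", 0))
	fmt.Println(parseTimestamp("1.5e3", 0))
	fmt.Println(parseTimestamp("", 42))
}
```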
The remaining MAJOR issue, which needs to be addressed: victoria-logs binary size increased from 13MB to 22MB
after adding support for Loki data ingestion protocol at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4482 .
This is because of shitty protobuf dependencies. They must be replaced with another protobuf implementation
similar to the one used at lib/prompb or lib/prompbmarshal .
Loki uses default labels format without "or" operator. This format can't create a list of LabelFilters, so only first set of LabelFilters should be used.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmagent: fix creating target id if `--promscrape.dropOriginalLabels` flag was used
* app/vmagent: hide links if OriginalLabels was dropped
* app/vmagent: update CHANGELOG.md and added information to the docs
* app/vmagent: fix comments
* app/vlinsert: add support of loki push protocol
- implemented loki push protocol for both Protobuf and JSON formats
- added examples in documentation
- added example docker-compose
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vlinsert: move protobuf metric into its own file
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* deployment/docker/victorialogs/promtail: update reference to docker image
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* deployment/docker/victorialogs/promtail: make volume name unique
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vlinsert/loki: add license reference
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* deployment/docker/victorialogs/promtail: fix volume name
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/VictoriaLogs/data-ingestion: add stream fields for loki JSON ingestion example
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vlinsert/loki: move entities to places where those are used
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vlinsert/loki: refactor to use common components
- use CommonParameters from insertutils
- stop ingestion after first error similar to elasticsearch and jsonline
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vlinsert/loki: address review feedback
- add missing logstorage.PutLogRows calls
- refactor tenant ID parsing to use common function
- reduce number of allocations for parsing by reusing logfields slices
- add tests and benchmarks for requests processing funcs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This eliminates the need for .(*T) casting of results obtained from Load().
Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map,
because map is already a pointer type.
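A short comparison of the two approaches, assuming Go 1.19 or newer for `atomic.Pointer`. The `config` type and the variable names are made up for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// config is a hypothetical type used only for this illustration.
type config struct{ name string }

var (
	oldStyle atomic.Value           // holds *config; Load() returns any
	newStyle atomic.Pointer[config] // Load() returns *config directly
	labels   atomic.Value           // holds map[string]string; a map is already a reference
)

// storeAll publishes the same value through all three holders.
func storeAll(name string) {
	c := &config{name: name}
	oldStyle.Store(c)
	newStyle.Store(c)
	labels.Store(map[string]string{"name": name})
}

// currentNames reads the name via both styles.
func currentNames() (string, string) {
	a := oldStyle.Load().(*config).name // type assertion is required
	b := newStyle.Load().name           // no type assertion is needed
	return a, b
}

func main() {
	storeAll("prod")
	fmt.Println(currentNames())
	fmt.Println(labels.Load().(map[string]string)["name"])
}
```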
- Add `Active queries` chapter to VMUI docs
- Set `Content-Type: json` header inside promql.WriteActiveQueries() handler,
in order to be consistent with other request handlers called at app/vmselect/main.go
- Pass the request to promql.WriteActiveQueries() handler, so it can change its output
depending on the provided request params. This also improves consistency of
promql.WriteActiveQueries() args with other request handlers at app/vmselect/main.go
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4653
* feat: add page to display a list of active queries (#4598)
* app/vmagent: code formatting
* fix: remove console
---------
Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
The `a op b keep_metric_names` is ambiguous with `a op (b keep_metric_names)` when `b` is a transform or rollup function.
For example, `a + rate(b) keep_metric_names`. So it is better to use more clear syntax: `(a op b) keep_metric_names`
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3710
* metricsql: add support of using keep_metric_names for binary operations
This should help to avoid confusion with queries like one in the issue #3710.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* wip
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no corresponding entry either.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
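A toy in-memory model of the per-day layout, only meant to illustrate the trade-off described above. It is not the on-disk indexdb format, and the `perDayIndex` type with its methods is hypothetical.

```go
package main

import "fmt"

// perDayIndex is a toy model: date -> MetricName -> TSID.
// The date is the number of days since the Unix epoch.
type perDayIndex map[uint64]map[string]uint64

// add registers the MetricName -> TSID entry for the given date.
func (idx perDayIndex) add(date uint64, name string, tsid uint64) {
	day := idx[date]
	if day == nil {
		day = make(map[string]uint64)
		idx[date] = day
	}
	day[name] = tsid
}

// lookup consults only the entries for the given date.
func (idx perDayIndex) lookup(date uint64, name string) (uint64, bool) {
	tsid, ok := idx[date][name]
	return tsid, ok
}

// size returns the total number of entries across all days.
func (idx perDayIndex) size() int {
	n := 0
	for _, day := range idx {
		n += len(day)
	}
	return n
}

func main() {
	idx := perDayIndex{}
	// A static series gets an identical entry for each of the three days.
	for date := uint64(19500); date < 19503; date++ {
		idx.add(date, `up{job="node"}`, 1)
	}
	// A new series appears only on the last day.
	idx.add(19502, `up{job="new"}`, 2)
	_, ok := idx.lookup(19502, `up{job="new"}`)
	fmt.Println(ok, idx.size()) // prints: true 4
}
```

In this model a lookup for a newly ingested series touches a single day's entries, while a static series costs one entry per day.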
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9
Add a break if gotAlert is nil
This removes the following golangci-lint warning:
app/vmalert/alerting_test.go:868:8: SA5011(related information): this check suggests that the pointer can be nil (staticcheck)
if gotAlert == nil {
^
* app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined
* app/vmctl: update CHANGELOG.md
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result in cumulative
discrepancy between the actual time and evaluation time for rules.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs: make `httpAuth.*` flags description less ambiguous
Currently, users may be confused about whether `httpAuth.*` flags are used by HTTP client or server configuration (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4586 for example).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: fix a typo
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
vmalert: allow disabling of `step` param attached to instant queries
This might be useful for using vmalert with datasources that do not support this param,
unlike VictoriaMetrics.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573
Signed-off-by: hagen1778 <roman@victoriametrics.com>
libcrypto3 and libssl3 in Alpine 3.18.0 have version `3.1.0-r4`,
which contains CVE-2023-2650:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-2650
Use Alpine image 3.18.2 which contains fixed versions of libssl3
and libcrypto3: 3.1.1-r0
NB: In Openshift these containers are marked as vulnerabilities
because of these CVEs.
The error message is shown for any auth error, but it claims the error is about OAuth2 configuration, which is confusing.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>