Commit Graph

2333 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
3199558da9
lib/{storage,mergeset}: reduce the maxium compression level for the stored data
This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.
2024-01-23 17:47:40 +02:00
Aliaksandr Valialkin
68d76b1436
lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
2024-01-23 16:13:25 +02:00
Aliaksandr Valialkin
9b3217db61
lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting 2024-01-23 16:13:19 +02:00
Aliaksandr Valialkin
7ed7eb95b4
lib/filestream: do not measure read / write duration from / to in-memory buffers
Measuring read / write duration from / to in-memory buffers has little sense,
since it will be always fast. It is better to measure read / write duration from / to
real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics.
This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls
per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers.

This is a follow-up for 2f63dec2e3
2024-01-23 14:53:35 +02:00
Roman Khavronenko
8461add541
lib/promscrape: respect 0 value for series_limit param (#5663)
* lib/promscrape: respect `0` value for `series_limit` param

Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.

This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 13:09:36 +02:00
Aliaksandr Valialkin
c2927053ee
lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock()
Previously the checks were to strict by requiring to leave the same first and last items by prepareBlock()

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655
2024-01-23 12:59:04 +02:00
Aliaksandr Valialkin
389159767d
lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block
Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651
2024-01-23 03:16:30 +02:00
Zakhar Bessarab
60ef978ffc
lib/storage: print tenant ID in log when discarding or truncating labels (#5658)
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 02:27:59 +02:00
Aliaksandr Valialkin
d52fd73f18
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin
64e615e6cc
lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.

Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):

- The interval will be much smaller than 10 seconds under high churn rate.
  This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
  This should reduce the uneeded work on merging of mutable cache into immutable cache.
2024-01-22 18:14:30 +02:00
Aliaksandr Valialkin
c6f6f094c5
Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)"
This reverts commit cfec258803.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled.
This prevents from storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577
2024-01-22 01:46:12 +02:00
Aliaksandr Valialkin
d4a1a28543
app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-22 01:39:27 +02:00
Hui Wang
49fa92c1d0
lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.

* remove mislead comment

* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.

P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 01:33:17 +02:00
Aliaksandr Valialkin
885ee160c2
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-22 01:23:23 +02:00
Aliaksandr Valialkin
5f5fcab217
all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time
This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*
2024-01-22 01:13:41 +02:00
Aliaksandr Valialkin
be5faef552
lib/promscrape: code cleanup: send stale markers immediately after generating automatic metrics
This cleanup has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5557/files#diff-6b205cf6637d7b65a5c45d9417d08822d4efad94227268cb196f61aa2a0fc0f7
2024-01-22 01:12:56 +02:00
Aliaksandr Valialkin
e15f07d989
all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually 2024-01-22 01:11:59 +02:00
Aliaksandr Valialkin
2f94bef59c
lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search
Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410
2024-01-22 01:11:36 +02:00
Aliaksandr Valialkin
2c7c812a9d
lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag
This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593
2024-01-22 01:08:52 +02:00
Nikolay
e196c61e36
app/vmselect: abort streaming connections for vmselect (#5650)
* app/vmselect: abort streaming connections for vmselect
due to streaming nature of export APIs, curl and simmilr tools cannot
detect errors that happened after http.Header with status 200 was
written to it.

This PR tracks if body write was already started and closes connection.

It allows client to detect not expected chunk sequence and return error
to the caller.

Mostly it affects vmselect at cluster version

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5650

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-22 00:54:32 +02:00
Aliaksandr Valialkin
c05982bfa7
lib/promscrape/discovery/hetzner: follow-up after 03a97dc678
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
  document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
  and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
  populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
2024-01-22 00:53:23 +02:00
Hui Wang
66eb013b54
lib/promscrape: do not store last scrape response when stale markers … (#5577)
* lib/promscrape: do not store last scrape response when stale markers are disabled

* update changelog
2024-01-22 00:52:25 +02:00
Aliaksandr Valialkin
41d6c8a7dd
lib/storage: do not prefetch metric names for small number of metricIDs
This eliminates prefetchedMetricIDsLock lock contention for queries, which return less than 500 time series.

This is a follow-up for 9d886a2eb0
2024-01-17 13:50:01 +02:00
Aliaksandr Valialkin
09f23b0296
lib/promscrape: cosmetic changes after 3ac44baebe
- Rename mustLoadScrapeConfigFiles() to loadScrapeConfigFiles(), since now it may return error.
- Split too long line with the error message into two lines in order to improve readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5508
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5560
2024-01-17 01:07:16 +02:00
Aliaksandr Valialkin
75c58ab306
lib/httputils: handle step=undefined query arg as an empty value
This is needed for Grafana, which may send step=undefined
when working with alerting rules and instant queries.
2024-01-17 00:13:04 +02:00
Aliaksandr Valialkin
f673039e86
lib/storage: follow-up for 4b8088e377
- Clarify the bugfix description at docs/CHANGELOG.md
- Simplify the code by accessing prefetchedMetricIDs struct under the lock
  instead of using lockless access to immutable struct.
  This shouldn't worsen code scalability too much on busy systems with many CPU cores,
  since the code executed under the lock is quite small and fast.
  This allows removing cloning of prefetchedMetricIDs struct every time
  new metric names are pre-fetched. This should reduce load on Go GC,
  since the cloning of uin64set.Set struct allocates many new objects.
2024-01-16 22:38:57 +02:00
Hui Wang
2f40ed3aac
exit vmagent if there is config syntax error in scrape_config_files when -promscrape.config.strictParse=true (#5560) 2024-01-16 22:35:18 +02:00
Aliaksandr Valialkin
6ba2fd3312
app/vmselect/promql: follow-up for ce4f26db02
- Document the bugfix at docs/CHANGELOG.md
- Filter out NaN values before sorting as suggested at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509#discussion_r1447369218
- Revert unrelated changes in lib/filestream and lib/fs
- Use simpler test at app/vmselect/promql/exec_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5506
2024-01-16 22:13:13 +02:00
Zongyang
cb37df5723
FIX bottomk doesn't return any data when there are no time range overlap between timeseries (#5509)
* FIX sort order in bottomk

* Add lessWithNaNsReversed for bottomk

* Add ut for TopK

* Move lt from loop

* FIX lint

* FIX lint

* FIX lint

* Mod log format

---------

Co-authored-by: xiaozongyang <xiaozngyang@kanyun.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-16 22:12:49 +02:00
Aliaksandr Valialkin
724223fad4
lib/prompbmarshal: move WriteRequest proto definition to the correct place 2024-01-16 21:57:03 +02:00
Aliaksandr Valialkin
0196902b2e
lib/promscrape/discovery/hetzner: fix golangci-lint warnings after 03a97dc678
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
2024-01-16 21:51:48 +02:00
Aliaksandr Valialkin
9e5e514faf
lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop()
Previously the was a race condition when the background goroutine still could try collecting metrics
from already stopped resources after returning from pushmetrics.Stop().
Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning.

This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549
and the commit fe2d9f6646 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
2024-01-16 21:18:22 +02:00
Aleksandr Stepanov
3a6e3adc7d
vmagent: added hetzner sd config (#5550)
* added hetzner robot and hetzner cloud sd configs

* remove gettoken fun and update docs

* Updated CHANGELOG and vmagent docs

* Updated CHANGELOG and vmagent docs

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-01-16 21:13:20 +02:00
Roman Khavronenko
d562d772a8
lib/storage: properly check for storage/prefetchedMetricIDs cache expiration deadline (#5607)
Before, this cache was limited only by size.
Cache invalidation by time happens with jitter to prevent thundering herd problem.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-16 21:08:59 +02:00
Aliaksandr Valialkin
d566aa7d78
lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin
f7b589e38a
lib/prompb: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:43:09 +02:00
Aliaksandr Valialkin
7d40506744
lib/prompb: change type of Label.Name and Label.Value from []byte to string
This makes it more consistent with lib/prompbmarshal.Label
2024-01-16 20:41:37 +02:00
Aliaksandr Valialkin
8cb138e8df
lib/protoparser/datadogv2: simplify code for parsing protobuf messages after 0597718435
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451
2024-01-16 20:35:17 +02:00
Aliaksandr Valialkin
f8ae2abd88
lib/protoparser/opentelemetry: use github.com/VictoriaMetrics/easyproto for protobuf message unmarshaling and marshaling
This reduces VictoriaMetrics binary size by 100KB.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2424
2024-01-16 20:34:18 +02:00
Aliaksandr Valialkin
9eef72bce9
lib/protoparser/datadogv2: add support for reading protobuf-encoded requests at /api/v2/series endpoint
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094
2024-01-16 20:32:15 +02:00
Artem Navoiev
e1005209ba
docs: mention staleNaN handling during deduplication
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5587

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-16 20:11:45 +02:00
hagen1778
f301dc5cfb
lib/uint64: remove accidentally added test
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-09 13:32:22 +01:00
hagen1778
2a7207f38a
app/all: follow-up after 84d710beab
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-09 13:17:09 +01:00
zhdd99
84d710beab
lib/pushmetrics: fix a panic caused by pushing metrics during the graceful shutdown process of vmstorage nodes. (#5549)
Co-authored-by: zhangdongdong <zhangdongdong@kuaishou.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-01-09 13:01:03 +01:00
Aliaksandr Valialkin
12de0d39eb
lib/protoparser/datadogv2: take into account source_type_name field, since it contains useful value such as kubernetes, docker, system, etc. 2023-12-21 23:05:52 +02:00
Aliaksandr Valialkin
6feef14095
lib/protoparser: add missing /datadog/ prefix to the /api/v2/series path in the description for -datadog.maxInsertRequestSize command-line flag 2023-12-21 21:05:24 +02:00
Aliaksandr Valialkin
62a105d9e9
app/{vminsert,vmagent}: preliminary support for /api/v2/series ingestion from new versions of DataDog Agent
This commit adds only JSON support - https://docs.datadoghq.com/api/latest/metrics/#submit-metrics ,
while recent versions of DataDog Agent send data to /api/v2/series in undocumented Protobuf format.
The support for this format will be added later.

Thanks to @AndrewChubatiuk for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4451
2023-12-21 20:50:27 +02:00
Aliaksandr Valialkin
426a451435
lib/promauth: add more context to errors returned by Options.NewConfig() in order to simplify troubleshooting 2023-12-20 21:58:19 +02:00
Aliaksandr Valialkin
3a9cf13aaa
app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags
This is a follow-up for 5ebd5a0d7b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427
2023-12-20 21:38:16 +02:00
Morgan
64e96fccd9
Expose OAuth2 Endpoint Parameters to cli (#5427)
The user may which to control the endpoint parameters for instance to
set the audience when requesting an access token. Exposing the
parameters as a map allows for additional use cases without requiring
modification.
2023-12-20 21:38:13 +02:00
Nikolay
46a335aa1d
lib/awsapi: properly assume role with webIdentity token (#5495)
* lib/awsapi: properly assume role with webIdentity token
introduce new irsaRoleArn param for config. It's only needed for authorization with webIdentity token.
First credentials obtained with irsa role and the next sts assume call for an actual roleArn made with those credentials.
Common use case for it - cross AWS accounts authorization
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3822

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-12-20 19:07:04 +02:00
Aliaksandr Valialkin
261c173f4b
all: use Gauge instead of Counter for *_config_last_reload_successful metrics
This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag.
See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details.

This is follow-up for 326a77c697
2023-12-20 14:25:44 +02:00
Aliaksandr Valialkin
0a99c819bf
all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page
This may be needed for systems, which require this metadata such as Google Cloud Managed Prometheus.
See https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#missing-metric-type
2023-12-19 03:26:02 +02:00
Aliaksandr Valialkin
9540d29154
lib/pushmetrics: add -pushmetrics.header and -pushmetrics.disableCompression command-line flags 2023-12-17 19:58:14 +02:00
Aliaksandr Valialkin
76b120e355
lib/protoparser/opentelemetry: allow ingesting metrics without resource labels
Some clients may ingest samples via OpenTelemetry protocol without Resource labels.
Previously VictoriaMetrics was silently dropping such samples.

The commit 317834f876 added vm_protoparser_rows_dropped_total{type="opentelemetry",reason="resource_not_set"}
counter for tracking of such dropped samples. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5459

It is better from usability PoV to accept such samples instead of dropping them and incrementing the corresponding counter.
2023-12-17 19:16:43 +02:00
Zakhar Bessarab
61f400eccb
lib/protoparser/opentelemetry: add metric to track skipped rows without resource (#5459)
Currently, it is impossible to understand why metrics are not ingested when resource is not set by OTEL exporter. Adding metric should simplify debugging and make it improve debuggability.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit 317834f876)
2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin
329bd244d2
lib/fs: remove unused IsEmptyDir()
This function became unused after the commit 43b24164ef

The unused function has been found with deadode tool - https://go.dev/blog/deadcode
2023-12-14 19:40:52 +02:00
Aliaksandr Valialkin
e4bb2808f1
app/vmselect: add support for vmstorage groups with independent -replicationFactor per group
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5197

See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmstorage-groups-at-vmselect

Thanks to @zekker6 for the initial pull request at https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/718
2023-12-13 00:14:34 +02:00
hagen1778
14117f2f90
lib/promscrape: comsetic changes after e373bb84d5
* fix typos in docs
* add `shard-` prefix to generated links when `-promscrape.cluster.memberURLTemplate` is enabled

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e0fc5ef140)
2023-12-12 13:45:34 +01:00
Aliaksandr Valialkin
15a7542ef8
vendor: run make vendor-update 2023-12-11 10:48:47 +02:00
Aliaksandr Valialkin
49552eaa15
app/vmauth: add support for hot standby mode via first_available load balancing policy
vmauth in `hot standby` mode sends requests to the first url_prefix while it is available.
If the first url_prefix becomes unavailable, then vmauth falls back to the next url_prefix.
This allows building highly available setup as described at https://docs.victoriametrics.com/vmauth.html#high-availability

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792
2023-12-08 23:32:10 +02:00
Aliaksandr Valialkin
475ae2a1be
lib/promscrape: add a wraning when the /service-discovery page contains incomplete list of dropped targets 2023-12-08 19:04:29 +02:00
noodles2hg
f3c237bae1
lib/streamaggr/streamaggr.go: fix link in error message (#5439) 2023-12-08 18:14:29 +02:00
Aliaksandr Valialkin
9074ab68d4
lib/promscrape: add -promscrape.cluster.memberURLTemplate command-line flag for creating direct links to vmagent instances at /service-discovery page
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018#issuecomment-1843811569
2023-12-07 16:05:03 +02:00
Aliaksandr Valialkin
896a0f32cd
lib/promscrape: show -promscrape.cluster.memberNum values for vmagent instances, which scrape the given dropped target at /service-discovery page
The /service-discovery page contains the list of all the discovered targets
after the commit 487f6380d0 on all the vmagent instances
in cluster mode ( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).

This commit improves debuggability of targets in cluster mode by providing a list of -promscrape.cluster.memberNum
values per each target at /service-discovery page, which has been dropped becasue of sharding,
e.g. if this target is scraped by other vmagent instances in the cluster.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4018
2023-12-07 00:11:30 +02:00
Aliaksandr Valialkin
e8dfecb3f1
lib/promscrape: show never scraped message for never scraped targets at /targets page 2023-12-06 22:33:27 +02:00
Aliaksandr Valialkin
8b6bce61e4
lib/promscrape: follow-up for 97373b7786
Substitute O(N^2) algorithm for exposing the `vm_promscrape_scrape_pool_targets` metric
with O(N) algorithm, where N is the number of scrape jobs. The previous algorithm could slow down
/metrics exposition significantly when -promscrape.config contains thousands of scrape jobs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5311
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5335
2023-12-06 17:36:48 +02:00
Hui Wang
065f5a7f9e
vmagent: add vm_promscrape_scrape_pool_targets for scrape jobs like… (#5335)
* vmagent: export `vm_promscrape_scrape_pool_targets` metric to track the number of targets that each scrape_job discovers

* add extra panel for new metric
2023-12-06 14:46:02 +02:00
Aliaksandr Valialkin
559e4db512
Revert "add datadog /api/v2/series and /api/beta/sketches support (#5094)"
This reverts commit d6b4c8e4ef.

Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080
2023-12-05 02:30:40 +02:00
Aliaksandr Valialkin
61db92cdc7
Revert "lib/protoparser/datadog: follow-up after 543f218fe96574b9b2189c8350bb09afa349e3bb"
This reverts commit 73d18fbc7a.

Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080
2023-12-05 02:29:00 +02:00
Aliaksandr Valialkin
85fcefaa34
app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers
- Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags.
- Wait until message processing goroutines are stopped before returning from gcppubsub.Stop().
- Prevent from multiple calls to Init() without Stop().
- Drop message if tenantID cannot be parsed properly.
- Take into account tenantID for all the supported message formats.
- Support gzip-compressed messages for graphite format.
- Use exponential backoff sleep when the message cannot be pushed to remote storage systems
  because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence
- Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called.
- Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub
- Mention Google PubSub support at docs/enterprise.md
- Make Google PubSub docs more clear at docs/vmagent.md

This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984
and e6eab781ce42285a6a1750dc01eba6801dd35516 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713
2023-12-04 22:51:04 +02:00
Aliaksandr Valialkin
b6d6a3a530
lib/promscrape: show dropped targets because of sharding at /service-discovery page
Previously the /service-discovery page didn't show targets dropped because of sharding
( https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets ).

Show also the reason why every target is dropped at /service-discovery page.
This should improve debuging why particular targets are dropped.

While at it, do not remove dropped targets from the list at /service-discovery page
until the total number of targets exceeds the limit passed to -promscrape.maxDroppedTargets .
Previously the list was cleaned up every 10 minutes from the entries, which weren't updated
for the last minute. This could complicate debugging of dropped targets.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2023-12-04 17:42:46 +02:00
Aliaksandr Valialkin
2f4dc2aff1
lib/backup: consistently use path.Join() when constructing paths for s3, gs and azblob
E.g. replace `fs.Dir + filePath` with `path.Join(fs.Dir, filePath)`

The fs.Dir is guaranteed to end with slash - see Init() functions.
The filePath may start with slash. If it starts with slash, then `fs.Dir + filePath` constructs
an incorrect path with double slashes.
path.Join() properly substitutes duplicate slashes with a single slash in this case.

While at it, also substitute incorrect usage of filepath.Join() with path.Join()
for constructing paths to object storage systems, which expect forward slashes in paths.
filepath.Join() substittues forward slashes with backslashes on Windows, so this may break
creating or managing backups from Windows.

This is a follow-up for 0399367be602b577baf6a872ca81bf0f99ba401b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/719
2023-12-04 17:25:41 +02:00
Aliaksandr Valialkin
f0215afee3
lib/promrelabel: add keep_if_contains and drop_if_contains relabeling actions
(cherry picked from commit ac65c6b178)
2023-12-01 14:00:20 +01:00
Nikolay
9505d48070
lib/streamaggr: properly reference slice with labels (#5406)
* lib/streamaggr: properly reference slice with labels
by limiting slice capacity. It must fix issues with slice modification, in case of append new slice will be allocated, instead of modifying refrenced slice
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5402

* Reduce memory allocations when output_relabel_configs adds new labels to output samples

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
(cherry picked from commit 41f7940f97)
2023-12-01 14:00:18 +01:00
hagen1778
73d18fbc7a
lib/protoparser/datadog: follow-up after 543f218fe9
* prevent /api/v1 from panic on parsing rows
* add tests for Extract function for v1 and v2 api's
* separate request types in different pools to prevent different objects mixing
* add changelog line

543f218fe9
Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 98d0f81f21)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-01 13:56:23 +01:00
Andrii Chubatiuk
d6b4c8e4ef
add datadog /api/v2/series and /api/beta/sketches support (#5094)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
Co-authored-by: Nikolay <https://github.com/f41gh7>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

(cherry picked from commit 543f218fe9)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-01 13:55:32 +01:00
Aliaksandr Valialkin
2f14394335
app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:13:39 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Roman Khavronenko
26242f526e
lib/protoparser: decrease import.maxLineLen from 100MB to 10MB (#5364)
Tests showed that importing a single line with 70MB size takes 5.3GiB
RSS memory for VictoriaMetrics single-node.
In the scenario when user exports and imports data from one VM to another,
it could possibly lead to OOM exception for destination VM.

Importing a single line with 16MB size taks 1.3GiB RSS memory.
Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB
to improve reliability of VictoriaMetrics during imports.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-24 13:13:33 +02:00
hagen1778
ae6152be5f
lib/storage: fix typo
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-21 12:22:49 +02:00
hagen1778
91e365acb6
lib/storage: fix typo
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-21 12:10:34 +02:00
Hui Wang
91379331eb
lib/protoparser/promremotewrite: fall back to zstd decoding if Snappy-decoding fails (#5344)
This case is possible after the following steps:
1. vmagent successfully performed handshake with the -remoteWrite.url and the remote storage supports zstd-compressed data.
2. remote storage became unavailable or slow to ingest data, vmagent compressed the collected data into blocks with zstd and puts these blocks to persistent queue on disk.
3. vmagent restarts and the remote storage is unavailable during the handshake, then vmagent falls back to Snappy compression.
4. vmagent starts sending zstd-compressed data from persistent queue to the remote storage, while falsely advertizing it sends Snappy-compressed data.
5. The remote storage receives zstd-compressed data and fails unpacking it with Snappy.

The solution is the same as 12cd32fd75, just fall back to zstd decompression if Snappy decompression fails.
2023-11-17 15:53:18 +01:00
Aliaksandr Valialkin
a0f02d06d7
lib/handshake: typo fix after ef80a89a24: SetReadDeadline -> SetWriteDeadline
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327
2023-11-16 16:47:07 +01:00
Aliaksandr Valialkin
ef80a89a24
lib/handshake: add SetReadDeadline and SetWriteDeadline implementations additionally to SetDeadline
This is a follow-up for 27a5461785

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327
2023-11-16 16:43:36 +01:00
Roman Khavronenko
27a5461785
lib/handshake: check for deadline in Read and Write methods (#5327)
The buffered connection could have exceeded the underlying connection
deadline during reading or writing to an internal buffer.
With this change, buffered connection struct additionally checks
for a deadline in Read/Write methods.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 16:33:40 +01:00
Aliaksandr Valialkin
60ff3cbb3d
lib/querytracer: add missing blank comment line after 3121d76bee 2023-11-15 16:11:50 +01:00
Aliaksandr Valialkin
e9639a49c2
lib/ingestserver: properly log the number of closed connections
Previously there was off-by-one error, which resulted in logging len(conns-1) connections instead of len(conns)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922
2023-11-14 21:53:10 +01:00
Nikolay
0730c2586d
lib/querytracer: makes package concurrent safe to use (#5322)
* lib/querytracer: makes package concurrent safe to use
it must fix various issues with concurrent code usage.
Especially, when it's not reasonable to wait for all goroutines to be finished

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 20:58:28 +01:00
Aliaksandr Valialkin
1f7ab894d7
lib/logger: increase default -loggerMaxArgLen command-line flag value from 500 to 1000
The 500 chars limit for the maximum arg lengths during logging appeared to be too low for some cases
2023-11-14 19:55:55 +01:00
Aliaksandr Valialkin
3a487666ca
lib/ingestserver: typo fix after f7834767c1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922
2023-11-14 03:26:04 +01:00
Aliaksandr Valialkin
9760221214
lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID
The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID,
so it must be scanned unconditionally during the search.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856

This is a follow-up for 89dcbc2fe7
2023-11-14 01:02:02 +01:00
XLONG96
77033dbfb6
lib/logstorage: fix streamID and tenantID search (#4856) (#5295) 2023-11-14 01:02:02 +01:00
Zakhar Bessarab
f7834767c1
vmcluster: re-routing enhancement (#5293)
* app/vmstorage: close vminsert connections gradually before stopping storage

Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878

Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update graceful shutdown logic

- close connections from vminsert in determenistic order
- update flag description
- lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s(default value in Kubernetes and ansible-playbooks).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add information about re-routing enhancement during restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add entry for new command-line flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* {app/vmstorage,lib/ingestserver}: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add note to update workload scheduler timeout

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:00:42 +01:00
Aliaksandr Valialkin
12cd32fd75
lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails
This case is possible after the following steps:

1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether
   the remote storage supports zstd-compressed data.
2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression
   for the data sent to the remote storage.
3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk.
4. The remote storage becomes available.
5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data.
6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage,
   while falsely advertizing it sends zstd-compressed data.
7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd.

The solution is to just fall back to Snappy decompression if zstd decompression fails.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2023-11-13 21:25:39 +01:00
Aliaksandr Valialkin
356deada8c
lib/htmlcomponents: use relative links for the top page and for favicon.ico
This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes.
For example, vmagent HTTP handlers can be served via /vmagent/ path prefix:

- http://proxy/vmagent/targets
- http://proxy/vmagent/service-discovery

The path prefix can be arbitrary. For example, below are vmagent urls
for /tenantID/vmagent/ path prefix:

- http://proxy/tenantID/vmagent/targets
- http://proxy/tenantID/vmagent/service-discovery

While at it, consistently serve favicon.ico from any path directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307
2023-11-13 20:28:17 +01:00
Aliaksandr Valialkin
a45cbc101f
all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:15:42 +01:00
Aliaksandr Valialkin
fb2071a01e
lib/regexutil: properly handle alternate regexps surrounded by .+ or .*
Previously the following regexps were improperly handled:

  .+foo|bar.+
  .*foo|bar.*

This could lead to unexpected regexp match results.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297

Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308
2023-11-13 18:25:57 +01:00
Aliaksandr Valialkin
22927dcc53
lib/stringsutil: add tests for LimitStringLen() function 2023-11-13 10:33:07 +01:00
Dmytro Kozlov
faf788b4a6
lib/stringsutil: fix failing test (#5313)
We have failed test on master branch.

```
--- FAIL: TestFormatLogMessage (0.00s)
    logger_test.go:24: unexpected result; got
        "foo: abcde, \"foo bar baz\", xx"
        want
        "foo: a..e, \"f..z\", xx"
```
if failed because maxArgs maxLen <= 4 in the  `LimitStringLen` in that case we always will return the income string
but in the test we limit the maxLen by value 4
```
f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)
2023-11-13 10:33:06 +01:00