Commit Graph

2077 Commits

Author SHA1 Message Date
Zakhar Bessarab
8d99c12a7d
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048)
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs

It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).

Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-22 13:01:33 +02:00
Aliaksandr Valialkin
3140ef7261
lib/storage: log fatal error inside searchMetricName() instead of propagating it to the caller
This simplifies the code a bit at searchMetricName() and searchMetricNameWithCache() call sites

This is a result of investigating https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972
2023-09-22 11:41:06 +02:00
Zakhar Bessarab
760cdcec68
lib/backup: fix issue with inconsistent copying of appliedRetention.txt (#5027)
* lib/backup: fix issue with inconsistent copying of appliedRetention.txt

appliedRetention.txt can be modified in place, so it should be always copied just the same as parts.json

Updates: https://github.com/victoriaMetrics/victoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry for appliedRetention.txt copying fix

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-21 11:25:19 +02:00
Zakhar Bessarab
bea3431ed1
lib/storage/partition: add check to ensure parts exist on disk (#5017)
* lib/storage/partition: add check to ensure parts exist on disk

If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name".

This change forces verification of part existence and throws a correct error in case it is missing on disk.

Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: use filepath.Join instead of string concatenation

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: add action points for error message

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* all: add a check for missing part in lib/mergeset and lib/logstorage

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-19 11:17:41 +02:00
Aliaksandr Valialkin
30a645cd82
lib/promscrape/discovery/kubernetes: follow-up after 03fece44e0
- Properly update vm_promscrape_discovery_kubernetes_url_watchers
  and vm_promscrape_discovery_kubernetes_group_watchers metrics after config changes

- Properly stop goroutine responsible for recreating scrapeWorks after the corresponding urlWatcher is stopped

- Log the event when urlWatcher is stopped in order to simplify debugging

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861
2023-09-18 23:23:45 +02:00
Aliaksandr Valialkin
03fece44e0
lib/promscrape/discovery/kubernetes: wait for 10 seconds before checking whether the urlWatcher must be stopped
This should prevent from excess urlWatcher churn on config reload, since it leads to removal of all the apiWatchers
before creating new apiWatchers. So, every config reload would lead to stopping of all the previous urlWatchers
and starting new urlWatchers.

The new logic gives 10 seconds for config reload before stopping unused urlWatchers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4861
2023-09-18 17:45:12 +02:00
Aliaksandr Valialkin
76af32d869
lib/promscrape/discovery/kubernetes: follow-up after eeb862f3ff
- Move the bugfix description to the correct place in docs/CHANGELOG.md
- Prevent from logging of 'context canceled' errors after the url watcher is stopped,
  since these errors are expected and may confuse users.
- Remove unused urlWatcher.refCount field.
- Remove unused urlWatcher.close() method.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-09-18 17:06:39 +02:00
Aliaksandr Valialkin
4d01bc6d52
lib/backup: properly copy parts.json files inside indexdb directory additional to data directory
This is a follow-up for 264ffe3fa1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5006
2023-09-18 16:16:50 +02:00
Aliaksandr Valialkin
f93a7b8457
lib/backup/common: consistently use canonical path with / directory separators at Part.Path
Previously Part.Path could contain `\` directory separators on Windows OS,
which could result in incorrect filepaths generation when making backups at object storage.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4704

This is a follow-up for f2df8ad480
2023-09-18 16:15:34 +02:00
Zakhar Bessarab
eeb862f3ff
lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861)
* lib/promscrape/discovery/kubernetes: fix leaking api watcher

goroutine which was polling k8s API had no execution control. This leaded to leaking goroutines during config reload.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server

This is unnecessary since context will is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: clarify comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Apply suggestions from code review

* lib/promscrape/discovery/kubernetes: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:40:13 +02:00
faceair
b6ad581b45
lib/storage: remove ForceMergeAllParts internal loop (#4999)
Signed-off-by: faceair <git@faceair.me>
2023-09-15 19:04:54 +02:00
Zakhar Bessarab
264ffe3fa1
lib/backup: force copying of parts.json (#5006)
* lib/backup: force copying of parts.json

Copying of parts.json is required because `part.key()` comparison can create same key value for files with different contents. This will result in inconsistent backup being created or restored.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/backup: ensure parts.json is only copied once

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:04:38 +02:00
Aliaksandr Valialkin
a09c680170
lib/storage: handle fatal errors inside indexSearch.getTSIDByMetricID() instead of returning them to the caller
This simplifies the code a bit at caller side
2023-09-15 11:55:42 +02:00
Aliaksandr Valialkin
9de440c803
lib/logger: increase the maximum log arg size from 200 to 500
The 200 chars limit has been appeared too small for typical log messages emitted by VictoriaMetrics components

This is a follow-up for 87fea7d8ac
2023-09-07 16:11:08 +02:00
Aliaksandr Valialkin
87fea7d8ac
lib/logger: limit the maximum arg length, which can be emitted to log lines
This should prevent from emitting too long lines when too long args are passed to logger.* functions.
For example, too long MetricsQL queries or too long data samples.
2023-09-07 15:22:46 +02:00
Aliaksandr Valialkin
24d61bf193
lib/flagutil: add Duration.Milliseconds() convenience function after 0c7d46d637
This function is a faster replacement for Duration.Duration().Milliseconds() call
2023-09-03 10:56:44 +02:00
Dima Lazerka
0c7d46d637
flagutil: Make .Msecs private (#4906)
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Clarify documentation re. month=31 days

* Add fasttime.UnixTime() to obtain time.Time

The goal is to refactor out the last usage of `.Msecs`.

* Use fasttime for time.Now()

* wip

- Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability
- Run `make docs-sync` for syncing changes from README.md to docs/ folder
- Make lib/flagutil.Duration.Msec private
- Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-03 10:33:37 +02:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Dima Lazerka
e0e856d2e7
Add flagutil.Duration to avoid conversion bugs (#4835)
* Introduce flagutil.Duration

To avoid conversion bugs

* Fix tests

* Comment why not .Seconds()
2023-09-01 09:27:51 +02:00
Nikolay
00685b627f
lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch … (#4901)
* lib/promscrape/k8s_sd: set resourceVersion to 0 by default for watch requests
it must reduce load for kubernetes ETCD servers. Since requests without resourceVersion performs force cache sync at kubernetes API server with ETCD
more info at https://kubernetes.io/docs/reference/using-api/api-concepts/\#semantics-for-watch
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4855

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-30 16:03:41 +02:00
Aliaksandr Valialkin
1bba4c5118
lib/auth: add NewTokenPossibleMultitenant() for parsing auth token, which can be multitenant
Disallow parsing multitenant token at auth.NewToken().

Use auth.NewTokenPossibleMultitenant() at vminsert only. All the other callers should call auth.NewToken(),
since they do not support multitenant token.

This is a follow-up for f0c06b428e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-08-30 14:17:55 +02:00
Aliaksandr Valialkin
039b8667c4
lib/proxy: consistently use gopkg.in/yaml.v2 across all the code 2023-08-29 13:12:38 +02:00
Aliaksandr Valialkin
317a273c6d
lib/logstorage: eliminate data race when clearing s.ptwHot after deleting the corresponding partition
The previous code could result in the following data race:
1. The s.ptwHot partition is marked to be deleted
2. ptw.decRef() is called on it
3. ptw.pt is set to nil
4. s.ptwHot.pt is accessed from concurrent goroutine, which leads to panic.

The change clears s.ptwHot under s.partitionsLock in order to prevent from the data race.

This is a follow-up for 8d50032dd6

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4895
2023-08-29 11:09:55 +02:00
crossoverJie
8d50032dd6
lib/logstorage: Set ptwHot to nil when the partition pointed by ptwHot is dropped (#4902) 2023-08-29 11:01:19 +02:00
hagen1778
5d848363f0
lib/promscrape: follow-up after eabcfc9bcd
`-promscrape.cluster.membersCount` by default should be `1`, like every
single vmagent is a cluster of one member on its own.
The change additionally validates that user can't set `-promscrape.cluster.membersCount`
to value lower than `1`.

eabcfc9bcd
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-29 10:04:57 +02:00
Haleygo
eabcfc9bcd
fix clusterMembersCount check (#4900) 2023-08-29 15:58:24 +08:00
crossoverJie
cde5029bce
lib/logstorage: add nil check for ptwHot.pt (#4896) 2023-08-27 01:24:26 +02:00
Zakhar Bessarab
6e8611f301
lib/promscrape/client: sync timeout for HostClient and http.Client (#4889)
Initially, stream parse mode was reading data from response and parsing it on flight. This was causing longer delay to read the whole response and required increasing timeout value to allow data processing while reading. So that 908e35affd increased timeout value to fix this.

But after 74c00a8762 response in stream parse mode is saved into memory and then parsed eliminating necessity of having timeout value higher that for usual scrape.

Updates: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4847
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-25 15:47:11 +02:00
Aliaksandr Valialkin
f1c2508243
lib/promscrape: add -promscrape.cluster.memberLabel command-line flag
This flag allows specifying an additional label to add to all the scraped metrics.
The flag must contain label name to add. The label value will be equal to -promscrape.cluster.memberNum.

This functionality can help when there is a need to differentiate metrics scraped
by distinct vmagent instances in the cluster according to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4247#issuecomment-1692279393
2023-08-24 22:03:54 +02:00
hagen1778
4ebe8bb1d5
app/vmagent: follow-up after 6788704152
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-24 11:36:42 +02:00
Zakhar Bessarab
6788704152
lib/promscrape/client: make User-Agent consistent between fasthttp and native client (#4886)
User agent was not set for native client which resulted in using one provided by Golang.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4884

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-24 11:31:13 +02:00
Nikolay
c5aac34b68
lib/storage: properly caclucate nextRotationTimestamp (#4874)
cause of typo unix millis was used instead of unix for current timestamp
calculation
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4873
2023-08-23 13:22:53 +02:00
Dmytro Kozlov
b7d07e5acf
lib/protoparser: handle unexpected EOF error when parsing lines in prometheus exposition format (#4851)
Previously only io.EOF was handled, and io.ErrUnexpectedEOF was ignored, but it may happen if the client interrupts the connection.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4817
2023-08-18 08:55:42 +02:00
Aliaksandr Valialkin
cd9f86afe1
lib/envflag: do not allow unsupported form for boolean command-line flags in the form -boolFlag value
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4845
2023-08-17 13:26:53 +02:00
Aliaksandr Valialkin
fdae53a75b
lib/promrelabel: properly replace : char with _ in metric names when -usePromCompatibleNaming command-line flag is set
This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3113#issuecomment-1275077071 comment from @johnseekins
2023-08-14 16:14:42 +02:00
Aliaksandr Valialkin
63e3571e8c
lib/promrelabel: stop emitting DEBUG log lines when parsing if expressions
These lines were accidentally left in the commit 62651570bb

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4635
2023-08-14 15:24:31 +02:00
Aliaksandr Valialkin
ac6c40e896
all: refer to https://docs.victoriametrics.com/#resource-usage-limits in the error message about -search.max* limit
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4827
2023-08-14 01:57:34 -07:00
Aliaksandr Valialkin
a0f695f5de
app/vmbackup: add ability to make server-side copying of existing backups 2023-08-13 17:24:24 -07:00
Nikolay
d144e39592
lib/protoparser/openetelemetry: fixes panic (#4821)
Opentelemetry format allows histograms with non-counter buckets. In this case it makes no sense to add buckets into database
and save only counter with _count suffix.
It could be used as gauge.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4814

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-12 05:09:18 -07:00
Nikolay
f111ddb862
lib/promscrape: adds validation for proxy_url scheme (#4823)
* lib/promscrape: adds validation for proxy_url scheme
adds tests
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4811

* Update lib/proxy/proxy.go

* Update lib/proxy/proxy.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-12 05:03:08 -07:00
Aliaksandr Valialkin
d7067c46d0
lib/flagutil: add defaultValue arg to NewArray{Int,Bytes,Duration} functions
The defaultValue is printed in the flag description when passing -help to the app.

This is a follow-up for aef31f201a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4776
2023-08-12 04:19:05 -07:00
Zakhar Bessarab
1bd7637fe1
lib/promrelabel: fix relabeling if clause (#4816)
* lib/promrelabel: fix relabeling if clause being applied to labels outside of current context

Relabeling is applied to each metric row separately, but in order to lower amount of memory allocations it is reusing labels.

Functions which are working on current metric row labels are supposed to use only current metric labels by using provided offset, but if clause matcher was using the whole labels set instead of local metrics.

This leaded to invalid relabeling results such as one described here: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4806

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/CHANGELOG.md: document the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1998
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4806

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-11 06:37:48 -07:00
Aliaksandr Valialkin
be5c4818f5
lib/httpserver: properly quote the returned address from GetQuotedRemoteAddr() for requests with X-Forwarded-For header
Make sure that the quoted address can be used as JSON string.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676#issuecomment-1663203424

This is a follow up for 252643d100 and ac0b7e0421

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4676
2023-08-11 05:19:50 -07:00
Zakhar Bessarab
f2df8ad480
vmbackupmanager: fixes for windows compatibility (#641)
* app/vmbackupmanager/storage: fix path join for windows

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4704

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/backup: fixes for windows support

- close dir before running os.RemoveAll. Windows FS does not allow to delete directory before all handles will be closed.

- add path "normalization" for local FS to use the same format of paths for both *unix and Windows

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4704

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-11 02:56:11 -07:00
Aliaksandr Valialkin
f35d27aa2b
app/vlstorage: expose vl_data_size_bytes metric at /metrics page for tracking the on-disk data size (both indexdb and the data itself) 2023-07-31 07:56:53 -07:00
Aliaksandr Valialkin
d18ff993e6
lib/promscrape: add a comment why honor_timestamps is set to false by default
This should prevent from returning it back to true in the future

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697
2023-07-28 21:36:32 -07:00
Aliaksandr Valialkin
e3ef3df938
lib/promscrape: use local scrape timestamp for scraped metrics unless honor_timestamps: true is set explicitly
This fixes the case with gaps for metrics collected from cadvisor,
which exports invalid timestamps, which break staleness detection at VictoriaMetrics side.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697 ,
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1654614799
and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1773
2023-07-28 21:11:26 -07:00
Aliaksandr Valialkin
9082a84566
lib/storage: update nextRotationTimestamp relative to the timestamp of the indexdb rotation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4563
2023-07-28 19:48:10 -07:00
Roman Khavronenko
9f1b9b86cc
vmalert: revert unittest feature (#4734)
* Revert "vmalert: unittest support stale datapoint (#4696)"

This reverts commit 0b44df7ec8.

* Revert "docs: specify min version and limitations for vmalert's unit tests"

This reverts commit a24541bd

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "vmalert: init unit test (#4596)"

This reverts commit da60a68d

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: mention unittest revert in changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-28 10:42:02 +02:00
Aliaksandr Valialkin
3d73640815
lib/promscrape/discovery: close unused HTTP connections to service discovery servers
This should prevent from connection leaks

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4724
2023-07-27 14:48:56 -07:00