Commit Graph

1465 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
808a2f3b61
lib/promauth: add support for proxy_url option at oauth2 section in the same way as Prometheus does 2022-04-23 00:01:53 +03:00
Aliaksandr Valialkin
4ade8511e2
lib/promauth: add support for tls_config section at oauth2 config in the same way as Prometheus does 2022-04-23 00:01:52 +03:00
Aliaksandr Valialkin
c2b13e6a04
lib/promscrape/discovery/kubernetes: limit the minimum sleep time between updating dependent ScrapeWork objects
Previously the sleep time could be dropped to nanoseconds, which could result in CPU time waste
2022-04-22 23:15:34 +03:00
Aliaksandr Valialkin
a89e31b304
lib/promscrape/discovery/kubernetes: allow attaching node-level labels and annotations to discovered pod targets in the same way as Prometheus 2.35 does
See https://github.com/prometheus/prometheus/issues/9510
and https://github.com/prometheus/prometheus/pull/10080
2022-04-22 20:15:34 +03:00
Aliaksandr Valialkin
cc6eae6992
lib/promscrape/discovery/kubernetes: improve the performance of urlWatcher.reloadObjects() on multi-CPU systems
Parallelize the generation of ScrapeWork objects there. Previously they were generated in a single goroutine.
2022-04-22 13:23:39 +03:00
Aliaksandr Valialkin
60f74dab56
lib/promscrape: prevent from memory leaks on -promscrape.config reload when only a small part of scrape jobs is updated
This is a follow-up after 26b78ad707
2022-04-22 13:23:37 +03:00
Aliaksandr Valialkin
ed1b394a1a
app/vmstorage: expose vm_indexdb_items_added_total and vm_indexdb_items_added_size_bytes_total counters at /metrics page
These counters can be used for monitoring the rate of addition of new entries in indexdb (aka inverted index).

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2471
2022-04-21 13:19:42 +03:00
Aliaksandr Valialkin
fea9d1e6ee
lib/promscrape/discovery/kubernetes: properly update endpoints and endpointslice objects when the related pod or service objects are updated
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240

This is a follow-up for 2341bd48d7
2022-04-21 13:06:49 +03:00
Aliaksandr Valialkin
1e0517b9cd
lib/promscrape: remove possible data race when cleaning up internStringsMap 2022-04-20 18:41:23 +03:00
Aliaksandr Valialkin
1ae16bf671
lib/promscrape: zero out labels after duplicate removal inside mergeLabels() 2022-04-20 18:35:27 +03:00
Aliaksandr Valialkin
e9f08b1e6a
lib/promscrape/discovery/kubernetes: do not pre-allocate memory for ScrapeWork objects
There is high chance that ScrapeWork objects won't be generated because of relabeling
2022-04-20 16:42:41 +03:00
Aliaksandr Valialkin
909a3ee0e4
lib/promscrape: follow-up after 91e290a8ff 2022-04-20 16:12:26 +03:00
Nikolay
429848a67d
lib/promscrape: reduce latency for k8s GetLabels (#2454)
replaces internStringMap with sync.Map - it greatly reduces lock contention
concurently reload scrape work for api watcher - each object labels added by dedicated CPU

changes can be tested with following script https://gist.github.com/f41gh7/6f8f8d8719786aff1f18a85c23aebf70
2022-04-20 16:12:25 +03:00
Dmytro Kozlov
9dbfd99777
lib/promscrape: simply update UI (#2479)
* lib/promscrape: simply update UI

* lib/promscrape: added vm icon
2022-04-20 15:38:04 +03:00
Aliaksandr Valialkin
45385a5dc6
lib/promscrape: optimize getScrapeWork() function
Reduce the number of memory allocations in this function. This improves its performance by up to 50%.
This should improve service discovery speed when big number of potential targets with big number of meta-labels
are generated by service discovery.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2270
2022-04-20 15:34:18 +03:00
Aliaksandr Valialkin
bfa0b8f710
lib/promscrape: use a hash over target labels as a key for dropped targets' map
This reduces the number of allocations and improves the performance for updating dropped targets' map.
This map is exposed at /api/v1/targets as in droppedTargets list.
2022-04-20 15:23:54 +03:00
Aliaksandr Valialkin
d0bac8e224
all: typo fix: Kuberntes -> Kubernetes 2022-04-20 10:51:41 +03:00
Dmytro Kozlov
17552dba8b
lib/promscrape: Enable filters for endpoint and labels (#2466)
* lib/promscrape: Enable filters for endpoint and labels

* lib/promscrape: cleanup

* lib/promscrape: update template

* lib/promscrape: move logic filter logic to backend

* lib/promscrape: updated placeholder

* lib/promscrape: updated placeholder

* lib/promscrape: use two different fields for filters, updated form, added error on parsing queries

* lib/promscrape: rename functions

* lib/promscrape: removed unused values

* wip

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-04-19 18:27:44 +03:00
Nikolay
628905f080
lib/promscrape: adds job restart method (#2455)
* lib/promscrape: adds job restart method
it must restart only ScrapeConfig with changed content
this change greatly reduce time, that needed for job restart
and it should decrease possible data loss when config frequently changed at kubernetes based deployments

Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* wip

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-04-16 20:29:33 +03:00
Aliaksandr Valialkin
7debf57ca6
lib/httpserver: clarify that -tls flag enables TLS for http requests to -httpListenAddr 2022-04-16 16:59:41 +03:00
Aliaksandr Valialkin
a7689e1b0c
app/vmstorage: add support for mTLS cipher suites via -cluster.tlsCipherSuites command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2404
2022-04-16 16:36:38 +03:00
Aliaksandr Valialkin
27e74f25d6
lib/httpserver: follow up after def0032c7d 2022-04-16 15:52:44 +03:00
Dmytro Kozlov
26ae50ec26
lib/httpserver: added tlsCipherSuites flag (#2468)
* lib/httpserver: added tlsCipherSuites flag

* lib/httpserver: compare lower case strings

* lib/httpserver: use EqualFold

* lib/httpserver: used flagutil.NewArray, supported only strings cipher suites

* lib/httpserver: updated flag description, added flag to documentation

* Update lib/httpserver/httpserver.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-04-16 15:52:42 +03:00
Aliaksandr Valialkin
c50e48a74c
lib/promscrape: follow-up after baa1c24b36 2022-04-16 14:26:38 +03:00
Nikolay
a56ee034af
lib/promscrape: removes omitempty for ScrapeConfig (#2457)
This change fixes incorrect marshalling for ScrapeConfig
it affects http endpoint and ScrapeConfig checksum.

With omitempty, custom Marshaller is not called if field is not a pointer.

Previously this issue happened at vmalert
2022-04-16 14:26:36 +03:00
Aliaksandr Valialkin
4a3172f150
lib/encoding: explicitly set slice length passed to binary.BigEndian.Uint*
This allows Go complier to generate more optimal code without bound checks
2022-04-12 12:56:52 +03:00
Aliaksandr Valialkin
70ad171070
lib/promscrape: follow-up after 7e79adfb55 2022-04-12 12:37:03 +03:00
Nikolay
e26bcb8bbb
lib/promscrape: allows to use k8s pod name as clusterMemberNum (#2436)
* lib/promscrape: allows to use k8s pod name as clusterMemberNum
it must improve user expirience and simplify clustering scrapers.
it must allow to use vmagent cluster with distroless images
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2359

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-04-12 12:37:02 +03:00
Aliaksandr Valialkin
81b7a31cb1
app/vmstorage: properly handle maxSeries limit passed from vmselect to vmstorage 2022-04-12 11:19:07 +03:00
Aliaksandr Valialkin
e3bf464f11
lib/protoparser/native: follow-up after fe01f4803d 2022-04-11 19:27:53 +03:00
Nikolay
39225fc809
lib/protoparser/native: fixes parseStream dead-lock (#2423)
previously, if native block cannot be unmarshaled, wg.Done wasn't called by unmarshal work.
It leads to connection blocking and possible dead-lock at client side
2022-04-11 19:27:51 +03:00
Aliaksandr Valialkin
edb139cfe4
lib/memory: export process_memory_limit_bytes metric, which shows the amounts of memory the current process has access to
This metric is equivalent to `vm_available_memory_bytes`, but it has better name,
since the metric is related to a process, not VictoriaMetrics itself.

Leave `vm_available_memory_bytes` for backwards compatibility.
2022-04-07 15:24:08 +03:00
Aliaksandr Valialkin
cb319b15bb
lib/storage: increase the number of rawRowsShard shards on systems with more than 4 CPU cores
This should improve data ingestion scalability on systems with many CPU cores
2022-04-06 19:50:41 +03:00
Aliaksandr Valialkin
8ef9348801
lib/mergeset: use more rawItemsShard shards on multi-CPU systems
This should improve the scalability for registering of new time series on multi-CPU system
2022-04-06 19:50:41 +03:00
Aliaksandr Valialkin
db00ddd23e
lib/mergeset: skip common prefixes when comparing inmemoryBlock items
This should improve the performance for items sorting inside inmemoryBlock.MarshalUnsortedData
if they have common prefix.

While at it, improve the performance for inmemoryBlock.updateCommonPrefix for sorted items.
This should improve performance for inmemoryBlock.MarshalSortedData during background merge.
2022-04-06 18:55:25 +03:00
Aliaksandr Valialkin
88c2631320
lib/protoparser: remove superflowous memory allocations during protocol parsing 2022-04-06 14:00:50 +03:00
Aliaksandr Valialkin
123a88bb65
lib/storage: reuse sync.WaitGroup objects
This reduces GC load by up to 10% according to memory profiling
2022-04-06 14:00:50 +03:00
Aliaksandr Valialkin
f526c7814e
lib/cgroup: reduce the default GOGC value from 50% to 30%
This reduces memory usage under production workloads by up to 10%,
while CPU spent on GC remains roughly the same.

The CPU spent on GC can be monitored with go_memstats_gc_cpu_fraction metric
2022-04-06 14:00:50 +03:00
Aliaksandr Valialkin
0f1ebd911d
lib/workingsetcache: reuse prev cache after its reset
This should reduce memory churn rate
2022-04-05 20:39:44 +03:00
Aliaksandr Valialkin
ac93c36be7
lib/workingsetcache: check more frequently for cache size overflow
This should reduce the probability of cache size limit overflow
2022-04-05 18:05:33 +03:00
Nikolay
7eb49d204f
vmctl verify-blocks command (#2390)
* lib/protoparser: changes ParseStream for native format
uses reader instead of http.Request
updates app/vmagent and app/vmagent method usage

* app/vmctl: add verify-block subcommand
it allows to check exported from VictoriaMetrics data block in native format
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2362

Update app/vmctl/README.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2022-04-05 17:46:36 +03:00
Aliaksandr Valialkin
fca0cb8156
lib/workingsetcache: reduce the expiration duration from 20 minutes to 10 minutes
This should reduce memory usage for the cache under high churn rate
2022-04-05 17:08:43 +03:00
Aliaksandr Valialkin
8752cce157
app/vminsert: reduce the max packet size, which vminsert can send to vmstorage
This reduces the max memory usage for vminsert and vmstorage under heavy ingestion rate
by up to 50% on production workload
2022-04-05 15:39:58 +03:00
Nikolay
4cf6219e07
lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache (#2293)
* lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache

It should decrease memory usage for regexp caching
with storing cacheEntry by pointer - golang map should be able to effectivly shrink it's size
original issue with this case - unexpected map grows and storage OOM

Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

Adds missing metrics for regexp cache and regexpPrefixes cache

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-03-26 12:57:27 +02:00
Aliaksandr Valialkin
b843f0e229
app/vmselect: add fine-grained limits for the number of returned/scanned time series for various APIs 2022-03-26 11:28:14 +02:00
Aliaksandr Valialkin
a8a4581c37
lib/blockcache: properly remove references to deleted parts
Previously references to deleted parts may remain active as cache.m keys.
This could prevent from proper memory de-allocation.
This could lead to increased memory usage for the following caches starting from v1.73.0:

* indexdb/indexBlocks
* indexdb/dataBlocks
* storage/indexBlocks

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2242
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007

This is a follow-up for 88605a7ea2
2022-03-18 17:07:54 +02:00
Aliaksandr Valialkin
e35c9124b7
lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second
This should reduce the probability of out of disk space panics when -storage.minFreeDiskSpaceBytes is set to low values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305
2022-03-18 16:53:19 +02:00
Aliaksandr Valialkin
7c92aaeaa4
lib/blockcache: properly release memory occupied by deleted entries
Proviously the deleted entries could remain referenced via lastAccessHeap for long time.
This could lead to increased memory usage for the following caches starting from v1.73.0:

* indexdb/indexBlocks
* indexdb/dataBlocks
* storage/indexBlocks

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2242
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-03-18 16:53:19 +02:00
Aliaksandr Valialkin
a6d65fc824
lib/storage: typo fix after e7831ae154 2022-03-18 16:53:19 +02:00
jduncan0000
e7831ae154
Fix for issue #2255 - matchTagFilters for positive empty-match filters (#2304)
* fix for issue 2255 - matchTagFilters for positive empty-match filters

* add example to comments

* formatting

* add test for positive empty match

* formatting
2022-03-18 13:08:54 +02:00
Aliaksandr Valialkin
698458b742
lib/httpserver: extract the code responsible for initializing server-side TLS config into netutil.GetServerTLSConfig 2022-03-17 19:46:20 +02:00
Aliaksandr Valialkin
191977b324
lib/storage: trashing -> thrashing typo in docs
This is a follow-up for 918ed5cb32
2022-03-16 13:28:29 +02:00
Vic (Shihang) Li
9767fcd837
fix: change thrashing typo (#2317) 2022-03-16 13:05:55 +02:00
Aliaksandr Valialkin
3f999b11f7
lib/mergeset: remove aux buffers from inmemoryPart
This should reduce the size of inmemoryPart items and may improve performance a bit during registering new time series

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 17:12:25 +02:00
Aliaksandr Valialkin
ecf68da79e
lib/mergeset: eliminate copying of itemsData and lensData from storageBlock to inmemoryBlock
This should improve performance when registering new time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 17:12:25 +02:00
Aliaksandr Valialkin
ecf4f7bf21
lib/mergeset: consistency renaming: ip->mp for inmemoryPart vars 2022-03-03 17:12:25 +02:00
Aliaksandr Valialkin
f4e466955d
lib/mergeset: move storageBlock from inmemoryPart to a sync.Pool
The lifetime of storageBlock is much shorter comparing to the lifetime of inmemoryPart,
so sync.Pool usage should reduce overall memory usage and improve performance
because of better locality of reference when marshaling inmemoryBlock to inmemoryPart.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 17:12:25 +02:00
Aliaksandr Valialkin
b47f18f555
lib/{mergeset,storage}: tune compression levels for small blocks
This should reduce CPU usage spent on compression
2022-02-25 15:34:13 +02:00
Aliaksandr Valialkin
28b610db07
lib/storage: document why job-like and instance-like labels must be stored at mn.Tags[0] and mn.Tags[1]
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2244
2022-02-25 13:21:53 +02:00
Aliaksandr Valialkin
d1881fa582
lib/storage: add a comment to indexSearch.containsTimeRange() on why it allows false positives
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2239
2022-02-24 12:48:33 +02:00
Aliaksandr Valialkin
02a922b53f
lib/storage: properly handle series selector matching multiple metric names plus a negative filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2238

This is a follow-up for 00cbb099b6
2022-02-24 12:11:53 +02:00
Aliaksandr Valialkin
1967b9c211
lib/mergeset: remove superflouos sorting of inmemoryBlock.data at inmemoryBlock.sort()
There is no need to sort the underlying data according to sorted items there.
This should reduce cpu usage when registering new time series in `indexdb`.

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2245
2022-02-24 11:19:44 +02:00
Aliaksandr Valialkin
2431c9cf81
lib/promrelabel: add support for conditional relabeling via if filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1998
2022-02-24 02:35:13 +02:00
Aliaksandr Valialkin
6fd85117ac
lib/workingsetcache: do not rotate cache if it is in whole state
This should reduce the maximum memory usage for the cache in `whole` state
2022-02-23 22:55:10 +02:00
Aliaksandr Valialkin
244c23ea2c
lib/workingsetcache: reduce the default cache rotation period from hour to 20 minutes
This should reduce memory usage under high time series churn rate
2022-02-23 13:42:27 +02:00
Aliaksandr Valialkin
fcaa0c5202
lib/storage: optimize /api/v1/status/tsdb call by skipping all the artificially created tag entries at once
This is a follow-up for b71be42d90
2022-02-21 19:00:04 +02:00
Aliaksandr Valialkin
986c12487a
lib/mergeset: typo fix after b6ed9afd6d 2022-02-21 19:00:04 +02:00
Aliaksandr Valialkin
d4df32cc6b
lib/blockcache: evict entries from the cache in LRU order
This should improve hit rate for smaller caches
2022-02-21 19:00:04 +02:00
Roman Khavronenko
5a4b16794d
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:33:33 +02:00
Roman Khavronenko
bd7837d524
lib: allow to configure cache size by type (#2206)
* lib: allow to configure cache size by type

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 13:55:51 +02:00
Aliaksandr Valialkin
5260d7a954
lib/storage: typo fix after c3affb0c4f 2022-02-17 12:56:33 +02:00
Aliaksandr Valialkin
d9bdb42219
lib/storage: simplify code for searching for label values
This is a follow-up after 9dd191b27c
2022-02-17 12:39:14 +02:00
Aliaksandr Valialkin
2ebc3d21c3
lib/storage: properly skip composite tag entries when searching for tag names or tag values
This is a follow-up for b71be42d90

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 23:02:18 +02:00
Aliaksandr Valialkin
3107224306
lib/blockcache: fix TestCache by ensuring that the cache size can be divided by the number of cache shards
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2204
2022-02-16 18:48:09 +02:00
Aliaksandr Valialkin
63bc89dd81
lib/storage: document why tsid cache is reset before saving it to disk
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205
2022-02-16 18:37:29 +02:00
Aliaksandr Valialkin
ee066aa0d5
lib/storage: use binary search instead of full scan for skipping artificial tags when searching for tag names or tag values
This should improve performance for /api/v1/labels and /api/v1/label/<label_name>/values

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 18:17:27 +02:00
Roman Khavronenko
8cbcc560b9
vmagent: fix js error on CollapseAll/ExpandAll buttons click (#2192)
* vmagent: fix js error on CollapseAll/ExpandAll buttons click

`Uncaught TypeError: Cannot read properties of null (reading 'style')`

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2021
2022-02-15 12:54:39 +02:00
Corporte Gadfly
cf8df24227
match fileSDCheckInterval with prometheus file_sd_config default (#2188) 2022-02-15 12:05:57 +02:00
Aliaksandr Valialkin
5d8ea8c918
docs/CHANGELOG.md: document 3d890e89f1 2022-02-14 17:42:33 +02:00
Nikolay
748034e7af
Adds server certificate reload for lib/http (#2186)
* Adds server certificate reload for lib/http
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171

* Update lib/httpserver/httpserver.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:42:33 +02:00
Nikolay
c11d0949c8
fixes all_tenants query option usage for openstack service discovery (#2184)
explicit use configuration parametr instead of conditional add
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2182
2022-02-14 13:13:03 +02:00
Aliaksandr Valialkin
31b42e9c57
lib/promscrape: add expand all and collapse all buttons to /targets page
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2021
2022-02-12 18:42:01 +02:00
Aliaksandr Valialkin
53c2135d2a
lib/storage: tune the logic for pre-populating of the per-day inverted index for the next day
- Postpone the pre-poulation to the last hour of the current day. This should reduce the number
  of useless entries in the next per-day index, which shouldn't be created there,
  when the corresponding time series are stopped to be pushed during the current day.

- Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself
  when calculating the need for for the given MetricID pre-population.

- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
  after indexdb rotation. This should improve code maintainability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
2022-02-12 16:39:33 +02:00
artifactori
943fc056ca
Show gce sdconfig zone on vmagent:8429/config (#2178)
* vmagent: add test for marshalling gce sdconfig with ZoneYAML

* vmagent: implement MarshalYAML for ZoneYAML on gce sdconfig
2022-02-12 00:48:38 +02:00
Roman Khavronenko
d107f86fbc
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:34:44 +02:00
Aliaksandr Valialkin
21a42e990f
lib/storage: fix broken BenchmarkHeadPostingForMatchers for {i=~".*"} after f4dead529f
The commit f4dead529f makes such query to return nothing instead of all the time series.
This aligns more with Prometheus behaviour.
2022-02-12 00:28:21 +02:00
Roman Khavronenko
791cad8c2e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:51 +02:00
Aliaksandr Valialkin
727b29d4a3
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpointslice_{label,annotation}* labels to be consistent with other role values for Kubernetes service discovery 2022-02-11 14:56:10 +02:00
Nikolay
265938a385
fixes service discovery for kubernetes (#2173)
* fixes service discovery for kubernetes
now it must take in account all pods that belong to the discovered endpoint and endpointslice
adds simple test for endpoints
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134

* wip

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 13:35:34 +02:00
Aliaksandr Valialkin
895c9f4f11
lib/mergeset: tune indexdb/{indexBlocks,dataBlocks} cache sizes further according to production stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:11:13 +02:00
Aliaksandr Valialkin
102c9a4bf9
lib/blockcache: use higher number of shards for higher number of CPU cores
This should reduce mutex contention and increase performance

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:11:11 +02:00
Aliaksandr Valialkin
ad19c9d302
lib/promscrape: fix errors in test config
The errors were discovered after enabling strict parse mode by default.
See 9bb60ab00f
2022-02-08 20:10:28 +02:00
Aliaksandr Valialkin
fae3040868
lib/blockcache: split the cache into multiple shards
This should reduce contention on cache mutex on hosts with many CPU cores,
which, in turn, should increase overall throughput for the cache.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-08 19:48:32 +02:00
Aliaksandr Valialkin
a0a56d6c1c
lib/mergeset: tune sizes for indexdb/dataBlocks and indexdb/indexBlocks according to production workload
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007#issuecomment-1032308742
2022-02-08 18:04:03 +02:00
Aliaksandr Valialkin
eed66b6640
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:42:33 +02:00
Aliaksandr Valialkin
8863339b6b
lib/blockcache: make fmt 2022-02-08 15:42:31 +02:00
Aliaksandr Valialkin
1caee74235
lib/blockcache: eliminate possible race when Cache.Put is called for the same entry from multiple goroutines
The race could result in incorrect cache size tracking, which, in turn, could result in too frequent cache cleaning
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
10476738a8
lib/blockcache: increase the lifetime for rarely accessed blocks from 2 minutes to 5 minutes
This should improve data ingestion speed if time series samples are ingested with interval bigger than 2 minutes.
The actual interval could exceed 2 minutes if the original interval between samples doesn't exceed 2 minutes
in the case of slow inserts. Slow inserts may appear in the following cases:

* Big number of new time series are pushed to VictoriaMetrics, so they couldn't be registered in 2 minutes.
* MetricName->tsid cache reset on indexdb rotation or due to unclean shutdown.
  In this case VictoriaMetrics needs to load MetricName->tsid entries for all the incoming series from IndexDB.
  IndexDB uses the block cache for increasing lookup performance. If the cache has no the needed block,
  then IndexDB reads and unpacks the block from disk. This requires an extra disk read IO and CPU.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007

This also should increase performance for periodically executed queries with intervals from 2 minutes to 5 minutes.
See the previous similar commit - 43103be011

It is possible that the timeout can be increased further. Let's collect production numbers for this change
so the timeout could be adjusted further.
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
b7cefff7b0
lib/workingsetcache: use the original cache size limits when rotating caches
Previously limits for new caches were taken from cache stats.
These limits could mismatch the original limits. This could result in failed cache load
if the stored cache has been created with the limits obtained from cache stats.
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
87071640a7
lib/blockcache: return proper number of entries from the cache
This has been broken in 0d7374ad2f
2022-02-08 01:18:27 +02:00