Commit Graph

621 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
e78f3ac8ac
app/vmselect/bufferedwriter: suppress trivial network errors, which can be generated by remote side
These errors include `broken pipe` and `reset by peer`.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2334
2022-03-18 19:28:02 +02:00
Aliaksandr Valialkin
ec03dec72d
app/vmagent/remotewrite: prevent from infinite recursion panic when pushing a time series with big number of samples to remote storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2335
2022-03-18 19:06:22 +02:00
Aliaksandr Valialkin
620b605786
docs: document 20bb5e703c 2022-03-18 18:41:19 +02:00
Aliaksandr Valialkin
2ae3a9a8a3
lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second
This should reduce the probability of out of disk space panics when -storage.minFreeDiskSpaceBytes is set to low values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305
2022-03-18 16:52:27 +02:00
Aliaksandr Valialkin
88605a7ea2
lib/blockcache: properly release memory occupied by deleted entries
Proviously the deleted entries could remain referenced via lastAccessHeap for long time.
This could lead to increased memory usage for the following caches starting from v1.73.0:

* indexdb/indexBlocks
* indexdb/dataBlocks
* storage/indexBlocks

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2242
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-03-18 16:52:27 +02:00
Aliaksandr Valialkin
db781a9342
docs/CHANGELOG.md: document e5868b9c29
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/546
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2255
2022-03-18 13:07:32 +02:00
Aliaksandr Valialkin
e6acd16daf
docs/CHANGELOG.md: document 11ae1ae924 2022-03-17 20:08:05 +02:00
Aliaksandr Valialkin
11869a8307
docs: document the addition of mTLS communication between cluster components 2022-03-17 20:00:56 +02:00
Aliaksandr Valialkin
8ae9825bb4
docs/CHANGELOG.md: document c1d07e7c52f0a2ab892921b0639cd42677aa33a8 2022-03-16 14:25:44 +02:00
Aliaksandr Valialkin
54ec080bbc
docs/CHANGELOG.md: document changes from fb6eab03a2 2022-03-16 13:21:57 +02:00
Aliaksandr Valialkin
370024c7ed
docs/CHANGELOG.md: document 565bd08c43
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1824
2022-03-16 12:50:49 +02:00
hagen1778
894416b4ca docs: add update details for some releases
Some of the releases could negatively affect performance for a limited
period of time due to some changes in core. Update details are meant to
warn users about expected changes in peformance after the update.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-15 07:06:48 +03:00
Roman Khavronenko
0fa7effc4b
docs: fix broken links (#2303)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-13 15:56:01 +02:00
Aliaksandr Valialkin
74bb9ea734
docs/CHANGELOG.md: cut v1.74.0 2022-03-03 19:30:41 +02:00
Aliaksandr Valialkin
ce8d28f8f4
docs/CHANGELOG.md: document performance improvements when registering new time series 2022-03-03 17:11:30 +02:00
Aliaksandr Valialkin
e757ebc58b
app/vmselect/netstorage: report vmstorage errors to vmselect clients even if partial responses are allowed
If a vmstorage is reachable and returns an application-level error to vmselect,
then such error must be returned to the caller even if partial responses are allowed,
since it usually means cluster mis-configuration.

Partial responses may be returned only if some vmstorage nodes are temporarily unavailable.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1941
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/678
2022-02-25 13:56:42 +02:00
Nikolay
9fe2e4e2c2
fixes incorrect step for calculation for MovingWindow functions (#283)
* fixes incorrect step for calculation for MovingWindow functions
https://victoriametrics.zendesk.com/agent/tickets/99

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-25 13:54:03 +02:00
Aliaksandr Valialkin
0d79c8cbef
docs/CHANGELOG.md: document cfc6c14dc48ae9dd35e65f1a6e5c7af8ccb9f029 2022-02-25 13:52:48 +02:00
Aliaksandr Valialkin
a16f1ae565
lib/storage: properly handle series selector matching multiple metric names plus a negative filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2238

This is a follow-up for 00cbb099b6
2022-02-24 12:15:54 +02:00
Aliaksandr Valialkin
3f49bdaeff
lib/promrelabel: add support for conditional relabeling via if filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1998
2022-02-24 02:27:26 +02:00
Nikolay
fbac1a9dad
fixes jwt token parse with correct base64Url decoding (#281)
* fixes jwt token parse with correct base64Url decoding
it must be applied according to jwt RFC that requires token to be URL safe

added slow path for decoding tokens with std base64 decoding

adds error logging for vmgateway

* docs/CHANGELOG.md: document the bugfix

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-23 13:57:36 +02:00
Aliaksandr Valialkin
62b46007c5
lib/workingsetcache: reduce the default cache rotation period from hour to 20 minutes
This should reduce memory usage under high time series churn rate
2022-02-23 13:41:45 +02:00
Aliaksandr Valialkin
acbea6c1ee
docs/CHANGELOG.md: cut v1.73.1 2022-02-22 21:11:42 +02:00
Aliaksandr Valialkin
0a38542a45
docs/CHANGELOG.md: link to the feature request for X-Influxdb-Version response header
Follow-up for 71ef3155c8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2209
2022-02-22 20:35:53 +02:00
Roman Khavronenko
69d1893f4c
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:32:45 +02:00
Roman Khavronenko
b6ed9afd6d
lib: allow to configure cache size by type (#2206)
* lib: allow to configure cache size by type

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 13:50:34 +02:00
Aliaksandr Valialkin
71ef3155c8
app/vminsert: add X-Influxdb-Version response header for InfluxDB API requests
This is needed for some clients, which expect this header.
See https://github.com/ntop/ntopng/issues/5449#issuecomment-1005347597
2022-02-17 12:47:43 +02:00
Aliaksandr Valialkin
3c3805865b
docs: document 3d19fa6932 2022-02-16 23:30:17 +02:00
Aliaksandr Valialkin
b71be42d90
lib/storage: use binary search instead of full scan for skipping artificial tags when searching for tag names or tag values
This should improve performance for /api/v1/labels and /api/v1/label/<label_name>/values

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 18:15:41 +02:00
Aliaksandr Valialkin
424121c126
docs/CHANGELOG.md: document 2efa46a11c 2022-02-15 21:11:56 +02:00
Aliaksandr Valialkin
0f9e107e36
docs/CHANGELOG.md: document ad6bdd78d0 2022-02-15 12:47:55 +02:00
Aliaksandr Valialkin
ac502785b6
docs/CHANGELOG.md: cut v1.73.0 2022-02-14 17:54:32 +02:00
Aliaksandr Valialkin
1215f51043
docs/CHANGELOG.md: document 3d890e89f1 2022-02-14 17:39:12 +02:00
Nikolay
75e84144c7
adds release build for macos darwin amd64 and arm64 (#2185)
* adds release build for macos darwin amd64 and arm64

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1896
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1851

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:28:56 +02:00
Aliaksandr Valialkin
578a37aa14
docs/CHANGELOG.md: document c90c1c4d54 2022-02-14 13:09:12 +02:00
Aliaksandr Valialkin
f10c38b827
lib/promscrape: add expand all and collapse all buttons to /targets page 2022-02-12 18:41:29 +02:00
Aliaksandr Valialkin
b1f94f7f0e
app/vmselect/promql: return at most one time series from absent_over_time() in the same way as Prometheus does
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2130
2022-02-12 15:45:09 +02:00
Aliaksandr Valialkin
d8ffbf55a2
docs/CHANGELOG.md: document ea153e5f90 2022-02-12 00:48:06 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin
3cb72ccc2a
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpointslice_{label,annotation}* labels to be consistent with other role values for Kubernetes service discovery 2022-02-11 14:54:47 +02:00
Nikolay
4e7f7f3302
fixes service discovery for kubernetes (#2173)
* fixes service discovery for kubernetes
now it must take in account all pods that belong to the discovered endpoint and endpointslice
adds simple test for endpoints
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134

* wip

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 13:34:22 +02:00
Aliaksandr Valialkin
8f2d03fdc7
docs/CHANGELOG.md: document 4e722c459b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
2022-02-10 12:20:12 +02:00
Aliaksandr Valialkin
0028b2c6d1
docs/CHANGELOG.md: add instructions on how to build VictoriaMetrics components from source code in order to test tip changes 2022-02-08 16:44:08 +02:00
Nikolay
a8acad7453
adds CGO build for arm64 (#2102)
* adds CGO build for arm64
it must improve performance for arm64 based deployments of vmstorage and
vmsingle for 15-20%

it depends on gozstd package update for correct musl gozstd vendoring

* typo fixes

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-08 16:25:59 +02:00
Aliaksandr Valialkin
9bb60ab00f
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin
8b36044c93
docs/CHANGELOG.md: add links to issues, which could benefit from improved re-routing algorithm 2022-02-07 16:55:28 +02:00
Aliaksandr Valialkin
865f09ecbb
docs: sync with cluster branch 2022-02-07 14:54:16 +02:00
Aliaksandr Valialkin
b5b3c585b3
lib/promscrape: show the total number of scrapes and the total number of scrape errors per target at /targets page
This information may be useful when debugging unreliable scrape targets
2022-02-03 20:22:41 +02:00
Aliaksandr Valialkin
2968779f16
lib/promscrape: provide the ability to fetch target responses on behalf of vmagent or single-node VictoriaMetrics
This feature may be useful when debugging metrics for the given target located in isolated environment
2022-02-03 19:00:55 +02:00