Commit Graph

2511 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
222e8b5a7b
lib/streamaggr: update the minimum allowed timestamp for incoming samples before flushing the samples to the storage
This should prevent from dropping samples with old timestamps during long flushes.

This is a follow-up for 1cedaf61cb
2024-04-04 02:26:08 +03:00
Aliaksandr Valialkin
ecd782c75e
app/vmagent: follow-up for b3b29ba6ac
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
  instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
  This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
  This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
  from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
2024-04-04 01:26:38 +03:00
Zakhar Bessarab
80315e07b1
lib/promscrape/config: fix missing timeout for http client (#6063)
Follow-up for b3b29ba6

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-04 00:40:48 +03:00
Zakhar Bessarab
da4352fe7b
lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk (#5725)
* lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk

Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change.

Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`.

This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: update NewRoundTripper API

Update API to allow user to update only parameters required for transport.

Add warning log when reloading Root CA failed.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: fix mutex acquire logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: replace RWMutex with regular mutex to simplify the code

- remove additional mutex used for getRootCABytes - require callee to use mutex
- replace RWMutex with regular mutex

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promauth/config: refactor

- hold the mutex lock to avoid round tripper being re-created twice
- move recreation logic into separate func to simplify the code

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-04-04 00:34:43 +03:00
Aliaksandr Valialkin
bb9bb600b3
lib/protoparser/opentelemetry: follow-up after 47892b4a4c
- Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming
- Clarify the description of the change at docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens
- Properly split metric names at '_' char in promerlabel.SplitMetricNameToTokens.
- Add tests for various edge cases for Prometheus metric names' normalization
  according to the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
2024-04-03 03:09:52 +03:00
Aliaksandr Valialkin
00f59d6ddf
all: fix golangci-lint(revive) warnings after 0c0ed61ce7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001
2024-04-03 03:00:45 +03:00
Aliaksandr Valialkin
1ffad3a182
lib/storage: consistently use stopCh instead of stop 2024-04-03 02:54:51 +03:00
Aliaksandr Valialkin
faa2ba828a
app/vmagent: simplify code after 509df44d03
- Simplify the code in order to improve its maintenance
- Properly pass tenant ID when processing multi-tenant opentelemetry request at vmagent

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016
2024-04-03 02:50:46 +03:00
Aliaksandr Valialkin
e51190a34c
Revert "app/vmselect: make vmselect resilient to absence of cache folder (#5987)"
This reverts commit cb23685681.

Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders
before their use. This may complicate detecting and fixing such bugs in the future.

There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 :
- To configure the OS to do not drop data from the system-wide temporary directory (aka /tmp).
- To run VictoriaMetrics with -cacheDataPath command-line flag, which points to the directory,
  which cannot be removed automatically by the OS.

The case when the user accidentally deletes the directory with some files created by VictoriaMetrics
shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically.
It is much better from operation and debuggability PoV is to crash with the clear `directory doesn't exist` error
in this case.
2024-04-03 02:44:00 +03:00
Aliaksandr Valialkin
7edb5f77f1
app/vmagent: properly shutdown when -maxIngestionRate limit is reached
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().

While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.

This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
2024-04-03 02:41:11 +03:00
Zakhar Bessarab
3d15a31c6d
lib/storage: add ability to use downsampling for the given series filter (#733)
* lib/storage: add ability to use downsampling for the given series filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add information about downsampling filters

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix MetricsQL filter

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: treat missing downsampling filter as a bug

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/part_header: verify correctness of downsampling filters when opening partition

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: save only appliable rules in part metadata

Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/downsampling: update log messages for final dedup

Properly specify a reason of re-running deduplication for partition.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules

Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Follow-up

- Apply the first matching downsampling period if multiple filters match the given time series.
  This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-03 02:38:37 +03:00
Aliaksandr Valialkin
ae6190f0b5
lib/storage/table.go: reduce the difference with enterprise branch 2024-04-03 02:37:05 +03:00
Aliaksandr Valialkin
b6d1d6982e
lib/storage/partition.go: reduce code difference a bit with enterprise branch 2024-04-03 02:36:49 +03:00
Nikolay
c457f7de69
lib/storage: adds metrics for downsampling (#382)
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts, that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows total size in bytes for parts, the must be donwsampled

These two metrics answer the questions - is downsampling running? how many parts scheduled for downsampling and how many of them currently downsampled? Storage space that it occupies.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-04-03 02:36:05 +03:00
Andrii Chubatiuk
0799834aaa
opentelemetry: added cmd flag to sanitize metric names (#6035) 2024-04-03 02:31:39 +03:00
Aliaksandr Valialkin
b8d37ad747
lib/storage: follow-up for 76f00cea6b
Store the deadline when the metricID entries must be deleted from indexdb
if metricID->metricName entry isn't found after the deadline. This should
make the code more clear comparing the the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.

Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
2024-04-02 23:46:21 +03:00
Zakhar Bessarab
7c1ee69205
lib/storage/table: wait for merges to be completed when closing a table (#5965)
* lib/storage/table: properly wait for force merges to be completed during shutdown

Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-04-02 21:25:30 +03:00
Andrii Chubatiuk
914b23f1e8
app/{vmagent,vminsert}: fixed firehose response (#6016) 2024-04-02 18:03:12 +03:00
Roman Khavronenko
548bf31dd2
app/vmselect: make vmselect resilient to absence of cache folder (#5987)
vmselect uses a cache folder in file system for two purposes:
1. Storing rollup cache results on shutdown;
2. Storing temporary search results from vmstorage during query executions.

It could happen that cache folder is deleted accidentally by user, or by OS
during cleanup routines. This would cause vmselect to:
1. panic on /metrics call, because `MustGetFreeSpace` will fail;
2. return query error user, as it won't be able to store temporary search results.

The changes in this commit are the following:
1. Make `MustGetFreeSpace` to try re-creating the cache folder if it is missing;
2. Make vmselect to try re-creating the cache folder if it can't persist tmp search
results.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
(cherry picked from commit cb23685681)
2024-03-26 15:27:32 +01:00
hagen1778
c0659800d5
lib/promauth: follow-up b577413d3b
Convert test result expectations to canonical form.
Starting from b577413d3b specified header keys are forced
into canonical form https://pkg.go.dev/net/http#CanonicalHeaderKey

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e6dd52b04c)
2024-03-25 15:42:22 +01:00
Aliaksandr Valialkin
cd222d6502
lib/streamaggr: ignore out of order samples for last output
This is a follow-up for 6a465f6e29

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-18 01:03:58 +02:00
Aliaksandr Valialkin
eecc5e8463
lib/storage: wait for up to 60 seconds before deciding to delete metricID entries from indexdb if metricID->metricName entry is missing during search
The metricID->metricName entry can remain invisible for search for some time after registering new metricName.
This is expected condition. So wait for up to 60 seconds in the hope that the metricID->metricName
entry will become visible before deleting all the entries from indexdb, which are associated with the given metricID.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948

See also 20812008a7
2024-03-18 00:37:11 +02:00
Aliaksandr Valialkin
05236e7cd4
lib/httputils: rename CAFile -> caFile in order to be consistent with local var naming in Go
This is a follow-up for 83e55456e2
2024-03-17 23:31:53 +02:00
Aliaksandr Valialkin
111d0aa2bf
app/{vmagent,vminsert}: add an ability to ignore input samples outside the current aggregation interval for stream aggregation
See https://docs.victoriametrics.com/stream-aggregation.html#ignoring-old-samples
2024-03-17 23:30:46 +02:00
Aliaksandr Valialkin
e70b644f1f
lib/streamaggr: ignore out of order samples when calculating increase, increase_prometheus, total and total_prometheus outputs
Out of order samples may result in unexpected spikes for these outputs.
So it is better to ignore such samples.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931
2024-03-17 23:24:14 +02:00
Aliaksandr Valialkin
1f753a049a
lib/streamaggr: follow-up for 15e33d56f1
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation
  This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5931

- Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md

- Verify de-duplication logic for samples with different timestamps

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5939
2024-03-17 23:23:57 +02:00
Aliaksandr Valialkin
7c4d7dc6dd
lib/promauth: properly set Host header in requests to scrape targets.
The `Host` header must be set via net/http.Request.Host field, since net/http.Client
ignores this header if it is set via Request.Header.Set().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970
2024-03-17 23:22:54 +02:00
Andrii Chubatiuk
a58a81d80b
lib/streamaggr: pick sample with bigger timestamp or value on deduplicator (#5939)
Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication
This would require more memory for deduplication, since we need to track timestamp
for each record. However, deduplication should become more consistent.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5643

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-03-17 23:06:37 +02:00
Aliaksandr Valialkin
f92d4609e2
lib/storage: optimize /api/v1/labels and /api/v1/label/.../values when match[] contains metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-12 03:01:47 +02:00
Aliaksandr Valialkin
540f65cc49
lib/storage: move the conversion of tag filters to composite tag filters into indexSearch.searchMetricIDsInternal
This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055

This is a follow-up for 2d31fd7855
2024-03-12 02:59:04 +02:00
Aliaksandr Valialkin
293f03f2dd
lib/storage: use composite indexes (metricName, label=value) when searching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb
This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-03-12 02:56:35 +02:00
Aliaksandr Valialkin
d9e3670627
lib/promauth: set the Host header to tlsServerName if itsn't empty
If tlsServerName isn't empty, then it is likely the https request is sent to IP instead of hostname.
In this case the request will fail, since Go automatically sets the Host header to the IP instead
of the desired hostname at tlsServerName. So set the Host header to tlsServerName if itsn't empty.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5802
2024-03-07 01:23:40 +02:00
Aliaksandr Valialkin
5817bf1c59
lib/streamaggr: add tests for keep_metric_names and drop_input_labels options 2024-03-06 20:06:23 +02:00
Aliaksandr Valialkin
1d7efc4c64
app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at 73f5fb0f0c 2024-03-06 13:57:53 +02:00
hagen1778
a364de3cfe
lib/writeconcurrencylimiter: mention dependency on CPU cores for -maxConcurrentInserts flag
The change also removes misleading `default` value from README for `maxConcurrentInserts`
cmd-line flag.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 73f5fb0f0c)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-03-05 18:56:38 +01:00
Aliaksandr Valialkin
27b9e8ed3e
app/{vmagent,vminsert}: add -streamAggr.dropInputSamples command-line flag for dropping the specified labels from input samples before deduplication and streaming aggregation 2024-03-05 02:27:27 +02:00
Aliaksandr Valialkin
c38c45d71f
app/{vminsert,vmagent}: allow using -streamAggr.dedupInterval without -streamAggr.config
This allows performing online de-duplication of incoming samples
2024-03-05 00:47:23 +02:00
Aliaksandr Valialkin
4352544d61
lib/streamaggr: do not reset aggregation state after the aggregation took longer than the configured interval
It is better from user PoV preserving this state until the next flush
2024-03-04 20:03:45 +02:00
Aliaksandr Valialkin
9f81450b38
lib/streamaggr: add missing "s" suffix in the warning message when the de-duplication or aggregation couldnt be finished in a timely manner 2024-03-04 19:38:39 +02:00
Aliaksandr Valialkin
d10932bd99
lib/streamaggr: benchmark only flush routines in BenchmarkDedupAggrFlushSerial and BenchmarkAggregatorsFlushSerial 2024-03-04 19:13:50 +02:00
Aliaksandr Valialkin
36ee08cad4
Revert "lib/streamaggr: do not flush dedup shards in parallel"
This reverts commit eb40395a1c.

Reason for revert: it has been appeared that the performance gain on multiple CPU cores
wasn't visible because the benchmark was generating incorrect pushSample.key.

See a207e0bf687d65f5198207477248d70c69284296
2024-03-04 19:13:50 +02:00
Aliaksandr Valialkin
9728aaf5d9
lib/streamaggr: properly generate pushSample.key in benchmarks 2024-03-04 19:13:49 +02:00
Aliaksandr Valialkin
93a057e4e6
lib/streamaggr: reduce the number of pointers at "total" aggregation state
This should reduce load on GC when scanning heap objects.
2024-03-04 19:13:49 +02:00
Aliaksandr Valialkin
9e00d8ad60
lib/streamaggr: use multiple job label values in BenchmarkAggregatorsPush instead of single value
This should make the benchmark closer to production cases
2024-03-04 19:13:48 +02:00
Aliaksandr Valialkin
9773ad200e
lib/streamaggr: use multiple job labels in BenchmarkAggregatorsPush 2024-03-04 19:13:48 +02:00
Aliaksandr Valialkin
482560a1f3
lib/streamaggr: do not flush dedup shards in parallel
This significantly increases CPU usage on systems with many CPU cores, while doesn't reduce flush latency too much
2024-03-04 17:01:42 +02:00
Aliaksandr Valialkin
d7252fce79
lib/streamaggr: reduce memory allocations when registering new series in deduplication and aggregation structs 2024-03-04 17:01:41 +02:00
Aliaksandr Valialkin
402dc14ec0
lib/streamaggr: make aggregate.runFlusher() more roubst and clear 2024-03-04 17:01:41 +02:00
Aliaksandr Valialkin
2ffef39bb3
lib/streamaggr: properly drop samples on the first incomplete interval
Previously samples were dropped on the first incomplete interval and the next complete interval.
Also make sure that the de-duplication is performed just before flushing the aggregate state.
This should help the case then dedup_interval = interval.
2024-03-04 17:01:40 +02:00
Aliaksandr Valialkin
c2dae136b3
lib/streamaggr: explicitly call resetSeries after flushSeries
This makes the code less fragile
2024-03-04 06:23:36 +02:00
Aliaksandr Valialkin
48a425898a
lib/streamaggr: enable time alignment for aggregate flushed to multiples of interval
For example, if `interval: 1m`, then data flush occurs at the end of every minute,
while `interval: 1h` leads to data flush at the end of every hour.

Add `no_align_flush_to_interval` option, which can be used for disabling the alignment.
2024-03-04 06:23:35 +02:00
Aliaksandr Valialkin
d80deaeaf4
lib/streamaggr: ignore the first sample in new time series during staleness_interval seconds after the stream aggregation start for total and increase outputs 2024-03-04 03:04:58 +02:00
Aliaksandr Valialkin
5e9cbfd4db
lib/streamaggr: flush dedup state and aggregation state in parallel on all the available CPU cores
This should reduce the time needed for aggregation state flush on systems with many CPU cores
2024-03-04 01:22:41 +02:00
Aliaksandr Valialkin
1e741ed6db
lib/streamaggr: add a benchmark for flushing dedup state 2024-03-04 01:22:40 +02:00
Aliaksandr Valialkin
5205972b83
lib/streamaggr: add a benchmark for measuring the performance of aggregator.flush 2024-03-04 01:22:40 +02:00
Aliaksandr Valialkin
8daf7a3f43
lib/streamaggr: add a benchmark for de-duplicating of 1M samples 2024-03-04 01:22:39 +02:00
Aliaksandr Valialkin
d4a425af87
lib/prompbmarshal: use clear() instead of a loop for clearing tss inside ResetTimeSeries() 2024-03-03 23:40:47 +02:00
Aliaksandr Valialkin
b958135677
lib/promutils: optimize LabelsCompressor.Decompress by using a specialized labelsMap struct instead of sync.Map
The labelsMap struct employs the fact that label indexes are condensed around 0,
so it stores the referred labels in a slice instead of map and uses slice index as label key.
This allows increasing the LabelsCompressor.Decompress performance by up to 3x.
This also reduces the latency of data flush in stream aggregation.
2024-03-03 23:25:27 +02:00
Aliaksandr Valialkin
0d5d46f9db
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
  for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
  deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
  metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
  per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
  This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
  - vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
  - vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
  - vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
  - vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
  - vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
  - vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
  - vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
    which took longer than the configured interval
  - vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
    which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md

The memory usage reduction increases CPU usage during stream aggregation by up to 30%.

This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 03:15:43 +02:00
Aliaksandr Valialkin
31f0dc4b97
lib/streamaggr: allow one second aggregation interval 2024-03-01 21:35:43 +02:00
Aliaksandr Valialkin
7533070a52
lib/promrelabel: use clear() function inside CleanLabels() 2024-03-01 21:34:47 +02:00
Aliaksandr Valialkin
052f2177a4
lib/fs: fix GOOS=windows build after f8baf29b6e 2024-03-01 01:46:44 +02:00
Aliaksandr Valialkin
816202bca7
lib/protoparser/opentelemetry/firehose: verify that the full response is parsed properly in ProcessRequestBody
This is a follow-up for bf9cb84575
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5899
2024-03-01 00:39:47 +02:00
Andrii Chubatiuk
e575fb1aeb
opentelemetry: fix firehose message parsing (#5899)
Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>
2024-03-01 00:24:14 +02:00
Aliaksandr Valialkin
01d8bee14c
lib/mergeset: use unsafe.Slice and unsafe.String instead of deprecated reflect.SliceHeader with unsafe conversion from slice header to string header 2024-02-29 17:29:40 +02:00
Aliaksandr Valialkin
99269ea640
lib/bytesutil: use unsafe.String instead of unsafe conversion of slice header to string header 2024-02-29 17:28:04 +02:00
Aliaksandr Valialkin
ddc61e2309
lib/fs: properly handle the case when data=nil is passed to mUnmap 2024-02-29 17:26:26 +02:00
Aliaksandr Valialkin
22acd84019
lib/storage: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:24:44 +02:00
Aliaksandr Valialkin
a9fb2e91a6
lib/protoparser/csvimport: unse unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:20:05 +02:00
Aliaksandr Valialkin
9bc4c51ceb
lib/fs: use unsafe.Slice instead of deprecated reflect.SliceHeader 2024-02-29 17:18:42 +02:00
Aliaksandr Valialkin
4b1a262475
lib/fastnum: use unsafe.Slice() instead of deprecated reflect.SliceHeader 2024-02-29 17:17:24 +02:00
Aliaksandr Valialkin
3383f73191
lib/bytesutil: make BenchmarkToUnsafeString and BenchmarkToUnsafeBytes more reliable
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5880
2024-02-29 17:12:30 +02:00
helen
74b0605232
Optimize TouUnsafeBytes to make it leaner, more standards-compliant and (#5880)
slightly faster.
2024-02-29 17:12:04 +02:00
XLONG96
88b9088499
lib/logstorage: avoid panic when parsing regex with stream filter (#5897) 2024-02-29 15:32:25 +02:00
Aliaksandr Valialkin
7832d0800e
app/{vminsert,vmagent}: follow-up after 67a55b89a4
- Document the ability to read OpenTelemetry data from Amazon Firehose at docs/CHANGELOG.md

- Simplify parsing Firehose data. There is no need in trying to optimize the parsing with fastjson
  and byte slice tricks, since OpenTelemetry protocol is really slooow because of over-engineering.
  It is better to write clear code for better maintanability in the future.

- Move Firehose parser from /lib/protoparser/firehose to lib/protoparser/opentelemetry/firehose,
  since it is used only by opentelemetry parser.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5893
2024-02-29 14:47:20 +02:00
Andrii Chubatiuk
60cf0c9656
{vmagent,vminsert}: added firehose http destination opentelemetry data ingestion support (#5893)
Co-authored-by: Andrii Chubatiuk <wachy@Andriis-MBP-2.lan>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-29 14:46:16 +02:00
Aliaksandr Valialkin
8187244153
lib/streamaggr: make the BenchmarkAggregatorsPushByJobAvg closer to production case with long list of labels per sample 2024-02-29 02:41:48 +02:00
Hui Wang
d6ecfffa17
chore: add actual request size in error message (#5889) 2024-02-29 02:40:57 +02:00
Aliaksandr Valialkin
d845edc24b
lib: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 02:10:04 +02:00
Aliaksandr Valialkin
61519f6c22
lib/backup/actions: expose vm_backups_downloaded_bytes_total metric in order to be consistent with vm_backups_uploaded_bytes_total metric 2024-02-24 01:14:57 +02:00
Aliaksandr Valialkin
510e3d9cda
lib/backup/actions: update vm_backups_uploaded_bytes_total metric along the file upload instead of after the file upload
This solves two issues:

1. The vm_backups_uploaded_bytes_total metric will grow more smoothly
2. This prevents from int overflow at metrics.Counter.Add() when uploading files bigger than 2GiB
2024-02-24 01:08:34 +02:00
Aliaksandr Valialkin
0ac1c533dc
lib/backup/actions: consistently use atomic.* types instead of atomic.* functions
See ea9e2b19a5
2024-02-24 01:02:37 +02:00
Aliaksandr Valialkin
6fd6d4c2de
lib/storage: replace the remaining atomic.* functions with atomic.* types for the sake of consistency
See ea9e2b19a5
2024-02-24 00:51:03 +02:00
Aliaksandr Valialkin
a1baf25c2e
lib/storage: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-24 00:33:07 +02:00
Aliaksandr Valialkin
ca1e78bd16
lib/logstorage: consistently use atomic.* types instead of atomic.* functions on regular types
See ea9e2b19a5
2024-02-24 00:29:39 +02:00
Aliaksandr Valialkin
d0538d11d3
lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types
See ea9e2b19a5
2024-02-24 00:29:12 +02:00
Aliaksandr Valialkin
92e098012a
lib/logstorage: consistently use atomic.* type for refCount and mustDrop fields in datadb and storage structs in the same way as it is used in lib/storage
See ea9e2b19a5 and a204fd69f1
2024-02-24 00:28:56 +02:00
Aliaksandr Valialkin
7fa700a41c
lib/mergeset: consistently use atomic.* type for refCount and mustDrop fields in table struct in the same way as it is used in lib/storage
See ea9e2b19a5 and a204fd69f1
2024-02-24 00:28:37 +02:00
Aliaksandr Valialkin
e7dfcdfff6
lib/storage: consistently use atomic.* type for refCount and mustDrop fields in indexDB, table and partition structs
See ea9e2b19a5
2024-02-24 00:26:26 +02:00
Aliaksandr Valialkin
e2b0cc873b
lib/storage: convert dedupsDuringMerge from uint64 to atomic.Uint64
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.

See ea9e2b19a5
2024-02-24 00:25:44 +02:00
Aliaksandr Valialkin
1eb3346ecc
lib/{storage,mergeset}: properly fix 'unaligned 64-bit atomic operation' panic on 32-bit architectures
The issue has been introduced in bace9a2501
The improper fix was in the d4c0615dcd ,
since it fixed the issue just by an accident, because Go comiler aligned the rawRowsShards field
by 4-byte boundary inside partition struct.

The proper fix is to use atomic.Int64 field - this guarantees that the access to this field
won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
2024-02-24 00:25:08 +02:00
Aliaksandr Valialkin
dc5b1e4dc1
lib/httpserver: return back the default value for -http.connTimeout to 2 minutes
It has been appeared that there are VictoriaMetrics users, who rely on the fact that
VictoriaMetrics components were closing incoming connections to -httpListenAddr every 2 minutes
by default. So let's return back this value by default in order to fix the breaking change
made at d8c1db7953 .

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1961891450 .
2024-02-24 00:20:11 +02:00
hagen1778
ab4fae9dc2
lib/storage: cleanup after d4c0615dcd
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c8d1d2ab72)
2024-02-23 18:55:40 +01:00
Dmytro Kozlov
eb22083924
lib/storage: fix aligning (#5860)
(cherry picked from commit d4c0615dcd)
2024-02-23 18:55:39 +01:00
Aliaksandr Valialkin
2a5c6e1cd5
app/vmstorage: deprecate -snapshotCreateTimeout command-line flag
Creating snapshot shouldn't time out under normal conditions.
The timeout was related to the bug, which has been fixed in 6460475e3b .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2024-02-23 04:51:57 +02:00
Aliaksandr Valialkin
42437e05c7
lib/storage: do not drop (date, metricID) entries for the date older than 2 days if samples are ingested at this date
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to slow check for the (date, metricID) entry in the indexdb during ingesting historical data (aka backfilling).

The issue has been introduced in 431aa16c8d
2024-02-23 04:06:54 +02:00
Aliaksandr Valialkin
83217b7473
app/vmselect: add -search.maxLabelsAPIDuration and -search.maxLabelsAPISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values
This commit returns back limits for these endpoints, which have been removed at 5d66ee88bd ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-02-23 02:56:58 +02:00
Aliaksandr Valialkin
21170e558c
lib/promutils: hide the math.Round() logic inside ParseTimeMsec() function
This should prevent from bugs similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5801 in the future

This is a follow-up for ce3ec3ff2e
2024-02-23 01:21:42 +02:00
Aliaksandr Valialkin
dfcbcf4368
lib/mergeset: run go fmt after bace9a2501 2024-02-23 01:21:31 +02:00
Aliaksandr Valialkin
19032f9913
lib/{mergeset,storage}: convert bufferred items to searchable parts more optimally
Do not convert shard items to part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan smaller number of parts.
2024-02-23 01:21:03 +02:00