This should improve debuggability of unexpected deletion of directories inside partitions.
While at it, log the proper path to parts.json when the directory for a big part is missing in the partition.
parts.json is located inside the directory with small parts; there is no parts.json file inside the directory with big parts.
* lib/storage: add ability to use downsampling for the given series filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add information about downsampling filters
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: fix MetricsQL filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: treat missing downsampling filter as a bug
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/part_header: verify correctness of downsampling filters when opening partition
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: save only applicable rules in part metadata
Filter and save only the rules which are applicable to the partition based on the MinTimestamp of the stored data.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: update log messages for final dedup
Properly specify the reason for re-running deduplication for the partition.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules
Using MinTimestamp leads to applying downsampling to parts which are only partially covered by a downsampling rule.
For example, a partition covers the range [1000, 2000]. At t=2100 with a rule offset of 500, only data older than 2100-500=1600 must be downsampled. The range check against MinTimestamp evaluates to true even though the partition contains the range [1600, 2000], which must not be downsampled yet.
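The corrected check in a nutshell - a minimal Go sketch using the numbers from the example above; the function name and time units are illustrative, not the actual lib/storage code:

```go
package main

import "fmt"

// mustDownsample reports whether the whole partition is covered by a
// downsampling rule with the given offset (all values in the same time units).
func mustDownsample(partMinTimestamp, partMaxTimestamp, now, ruleOffset int64) bool {
	downsampleBefore := now - ruleOffset
	// Checking MaxTimestamp guarantees the rule covers the whole partition.
	// A check against MinTimestamp would also match partitions that still
	// contain newer data which must not be downsampled yet.
	return partMaxTimestamp <= downsampleBefore
}

func main() {
	// Partition covers [1000, 2000]; at now=2100 with offset=500 only data
	// older than 1600 may be downsampled, so this partition must be skipped.
	fmt.Println(mustDownsample(1000, 2000, 2100, 500)) // false
	fmt.Println(mustDownsample(1000, 1500, 2100, 500)) // true
}
```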
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Follow-up
- Apply the first matching downsampling period if multiple filters match the given time series (see the sketch after this list).
This allows fine-tuning the downsampling config for specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.
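A minimal sketch of the "first matching filter wins" selection; the rule type and matcher below are hypothetical simplifications, while the real feature matches series filters against the time series labels:

```go
package main

import "fmt"

type downsamplingRule struct {
	matchMetric string // "" means catch-all; the real config uses a series filter
	interval    int64  // downsampling interval in seconds
}

func (r downsamplingRule) matches(metricName string) bool {
	return r.matchMetric == "" || r.matchMetric == metricName
}

// selectRule returns the first rule whose filter matches the series, so more
// specific filters must be listed before the catch-all one in the config.
func selectRule(rules []downsamplingRule, metricName string) *downsamplingRule {
	for i := range rules {
		if rules[i].matches(metricName) {
			return &rules[i]
		}
	}
	return nil
}

func main() {
	rules := []downsamplingRule{
		{matchMetric: "temperature", interval: 60},
		{matchMetric: "", interval: 300}, // catch-all
	}
	fmt.Println(selectRule(rules, "temperature").interval) // 60 - first match wins
}
```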
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* lib/storage: adds metrics for downsampling
vm_downsampling_partitions_scheduled - shows the number of parts that must be downsampled
vm_downsampling_partitions_scheduled_size_bytes - shows the total size in bytes of the parts that must be downsampled
These two metrics answer the questions: is downsampling running? How many parts are scheduled for downsampling, how many of them have already been downsampled, and how much storage space do they occupy?
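A sketch of how such gauges can be exposed with the github.com/VictoriaMetrics/metrics package; the counters and their update logic here are illustrative, the real values are computed by the storage code:

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"

	"github.com/VictoriaMetrics/metrics"
)

var (
	scheduledParts     atomic.Int64
	scheduledSizeBytes atomic.Int64
)

func main() {
	// The gauge callbacks are evaluated on every /metrics scrape.
	metrics.NewGauge(`vm_downsampling_partitions_scheduled`, func() float64 {
		return float64(scheduledParts.Load())
	})
	metrics.NewGauge(`vm_downsampling_partitions_scheduled_size_bytes`, func() float64 {
		return float64(scheduledSizeBytes.Load())
	})
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		metrics.WritePrometheus(w, true)
	})
	log.Fatal(http.ListenAndServe(":8428", nil))
}
```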
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Store the deadline when the metricID entries must be deleted from indexdb
if the metricID->metricName entry isn't found after the deadline. This should
make the code clearer compared to the previous version, where the timestamp
of the first metricID->metricName lookup miss was stored in missingMetricIDs.
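A rough sketch of the deadline-based bookkeeping described above, assuming a simple map and a hypothetical one-minute deadline; the actual indexdb code differs:

```go
package main

import (
	"fmt"
	"time"
)

// missingMetricIDs maps a metricID to the deadline after which its index
// entries must be deleted if the metricID->metricName entry is still missing.
var missingMetricIDs = make(map[uint64]time.Time)

func onMetricNameLookupMiss(metricID uint64, now time.Time) (mustDelete bool) {
	deadline, ok := missingMetricIDs[metricID]
	if !ok {
		// Store the deadline itself on the first miss instead of storing the
		// timestamp of the miss (the one-minute value is an assumption).
		missingMetricIDs[metricID] = now.Add(time.Minute)
		return false
	}
	// Delete the metricID entries only after the deadline has passed.
	return now.After(deadline)
}

func main() {
	now := time.Now()
	fmt.Println(onMetricNameLookupMiss(123, now))                    // false - deadline stored
	fmt.Println(onMetricNameLookupMiss(123, now.Add(2*time.Minute))) // true - deadline passed
}
```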
Remove the misleading comment about the importance of the order for creating entries
in the inverted index when registering new time series. The order doesn't matter,
since any subset of the created entries can become visible for search
before any other subset after registering in indexdb.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
* lib/storage/table: properly wait for force merges to be completed during shutdown
Properly keep track of running background merges and wait for their completion when closing the table.
Previously, force merge was not in sync with the overall storage shutdown, which could lead to holding a ptw ref.
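A simplified sketch of the idea - track running force merges with a WaitGroup and wait for them on close; the type and method names are illustrative, not the actual table code:

```go
package main

import "sync"

type table struct {
	forceMergeWG sync.WaitGroup
	stopCh       chan struct{}
}

// ForceMerge registers the background merge in the WaitGroup before starting it,
// so MustClose can wait for its completion.
func (tb *table) ForceMerge() {
	tb.forceMergeWG.Add(1)
	go func() {
		defer tb.forceMergeWG.Done()
		select {
		case <-tb.stopCh:
			// Shutdown has been requested - do not start a new merge.
			return
		default:
			// ... perform the force merge ...
		}
	}()
}

func (tb *table) MustClose() {
	close(tb.stopCh)
	// Wait for in-flight force merges, so they do not hold partition (ptw) refs
	// after the table is closed.
	tb.forceMergeWG.Wait()
}

func main() {
	tb := &table{stopCh: make(chan struct{})}
	tb.ForceMerge()
	tb.MustClose()
}
```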
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add changelog entry
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate the search for time series in the old indexdb for new time ranges.
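The early-exit idea as a hypothetical Go sketch; the types and names are illustrative, not the actual indexSearch code:

```go
package main

import "fmt"

type timeRange struct {
	minTimestamp, maxTimestamp int64
}

type indexDB struct {
	coveredRange timeRange // the time range actually covered by this indexdb generation
}

func (db *indexDB) containsTimeRange(tr timeRange) bool {
	return tr.minTimestamp <= db.coveredRange.maxTimestamp &&
		tr.maxTimestamp >= db.coveredRange.minTimestamp
}

func (db *indexDB) searchMetricIDs(tr timeRange) []uint64 {
	if !db.containsTimeRange(tr) {
		// Quickly terminate the search in the old indexdb for new time ranges.
		return nil
	}
	// ... perform the actual inverted index search ...
	return []uint64{1, 2, 3}
}

func main() {
	oldDB := &indexDB{coveredRange: timeRange{0, 1000}}
	fmt.Println(len(oldDB.searchMetricIDs(timeRange{2000, 3000}))) // 0 - search skipped
}
```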
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
This is a follow-up for 2d31fd7855
This should simplify code maintenance by gradually converting to atomic.* types instead of calling atomic.* functions
on int and bool types.
See ea9e2b19a5
The issue has been introduced in bace9a2501
The improper fix was in d4c0615dcd, since it fixed the issue just by accident: the Go compiler
happened to align the rawRowsShards field on a 4-byte boundary inside the partition struct.
The proper fix is to use an atomic.Int64 field - this guarantees that access to this field
won't result in an unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860
and https://github.com/golang/go/issues/19057
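An illustrative before/after sketch; the struct and field names are hypothetical, not the actual partition struct:

```go
package main

import "sync/atomic"

// Old approach: a plain int64 updated via atomic.AddInt64. On 32-bit platforms
// the field must be kept 8-byte aligned manually, otherwise the atomic access
// may panic (see https://github.com/golang/go/issues/19057).
type partitionOld struct {
	rawRowsProcessed int64
}

// New approach: atomic.Int64 (Go 1.19+) guarantees proper alignment by itself,
// so accidental struct layout changes can no longer reintroduce the bug.
type partitionNew struct {
	rawRowsProcessed atomic.Int64
}

func main() {
	var pOld partitionOld
	atomic.AddInt64(&pOld.rawRowsProcessed, 1)

	var pNew partitionNew
	pNew.rawRowsProcessed.Add(1)
}
```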
Previously the (date, metricID) entries for dates older than the last 2 days were removed.
This could lead to a slow check for the (date, metricID) entry in the indexdb when ingesting historical data (aka backfilling).
The issue has been introduced in 431aa16c8d
This commit returns back the limits for these endpoints, which were removed in 5d66ee88bd ,
since it turned out that the missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeouts when heavyweight requests are executed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
Do not convert shard items to a part when a shard becomes full. Instead, collect multiple
full shards and then convert them to a searchable part at once. This reduces
the number of searchable parts, which, in turn, should increase query performance,
since queries need to scan a smaller number of parts.
Previously the interval between item addition and its conversion to a searchable in-memory part
could vary significantly because of the too coarse per-second precision. Switch from fasttime.UnixTimestamp()
to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking
the time when buffered items must be converted to searchable in-memory parts, since time.Now()
calls aren't located in hot paths.
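A minimal sketch of the millisecond-precision deadline tracking via time.Now().UnixMilli(); the struct and method names are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

type rawRowsShard struct {
	lastFlushTimeMs int64 // when the shard was last converted to a searchable part
}

func (s *rawRowsShard) mustFlush(flushInterval time.Duration) bool {
	return time.Now().UnixMilli()-s.lastFlushTimeMs >= flushInterval.Milliseconds()
}

func (s *rawRowsShard) flush() {
	// ... convert the buffered rows to a searchable in-memory part ...
	s.lastFlushTimeMs = time.Now().UnixMilli()
}

func main() {
	s := &rawRowsShard{}
	s.flush()
	fmt.Println(s.mustFlush(2 * time.Second)) // false right after a flush
}
```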
Increase the flush interval for converting buffered samples to searchable in-memory parts
from one second to two seconds. This should reduce the number of blocks which need
to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage.
While at it, hardcode the maximum size of a rawRows shard to 8MB, since this size gives the optimal
data ingestion performance according to load tests. This reduces memory usage and CPU usage on systems
with big amounts of RAM under high data ingestion rates.
The pooled rawRowsBlock objects occupy big amounts of memory between flushes,
and the flushes are relatively rare. So it is better not to use the pool
and to allocate rawRow blocks on demand. This should reduce the average
memory usage between flushes.
The buffer can be quite big under high ingestion rate (e.g. more than 100MB).
This leads to increased memory usage between buffer flushes.
So it is better to re-create the buffer on every flush in order to reduce memory usage
between buffer flushes.
Instead, log a sample of these long items once per 5 seconds into the error log,
so users can notice and fix the issue with too long labels or too many labels.
Previously this panic could occur in production when ingesting samples with too long labels.
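A generic sketch of such throttled logging (at most one message per 5 seconds), using the standard library instead of the actual lib/logger helpers:

```go
package main

import (
	"log"
	"sync/atomic"
	"time"
)

var lastLogTimeMs atomic.Int64

// logTooLongItem logs at most one message per 5 seconds, so the error log
// isn't flooded when many samples with too long labels are ingested.
func logTooLongItem(item string) {
	nowMs := time.Now().UnixMilli()
	last := lastLogTimeMs.Load()
	if nowMs-last < 5000 {
		return
	}
	if lastLogTimeMs.CompareAndSwap(last, nowMs) {
		log.Printf("skipping too long item (%d bytes); sample: %.100q", len(item), item)
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		logTooLongItem("metric_with_too_many_or_too_long_labels{...}")
	}
}
```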
This allows removing unneeded command-line flags from binaries which import lib/storage,
which, in turn, was importing lib/snapshot in order to use the Time, Validate and NewName functions.
This is a follow-up for 83e55456e2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
Entries for the previous dates are usually not used, so there is little sense in keeping them in memory.
This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
This limit makes little sense for these APIs, since:
- These APIs frequently result in scanning all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .
- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.
Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these APIs (a sketch of the idea follows the list below).
This limit shouldn't affect typical use cases for these APIs:
- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui
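A simplified sketch of a channel-based concurrency limiter for these handlers; the limit value, timeout and handler wiring are illustrative, not the actual vmselect code:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
	"time"
)

// Allow at most GOMAXPROCS concurrent heavyweight requests; the rest wait for
// a free slot for a limited time instead of piling up and exhausting memory.
var concurrencyCh = make(chan struct{}, runtime.GOMAXPROCS(-1))

func limitConcurrency(h http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		select {
		case concurrencyCh <- struct{}{}:
			defer func() { <-concurrencyCh }()
			h(w, r)
		case <-time.After(10 * time.Second):
			http.Error(w, "too many concurrent requests", http.StatusServiceUnavailable)
		}
	}
}

func main() {
	http.HandleFunc("/api/v1/labels", limitConcurrency(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"status":"success","data":[]}`)
	}))
	log.Fatal(http.ListenAndServe(":8428", nil))
}
```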
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
Previously a shared pool was used for merging all the part types.
A single merge worker could merge parts with mixed types at once. For example,
it could simultaneously merge an in-memory part plus a big file part.
Such a merge could take hours for a big file part. For the duration of this merge
the in-memory part was pinned in memory and couldn't be persisted to disk
under the configured -inmemoryDataFlushInterval .
Another common issue, which could happen when parts with mixed types are merged,
is uncontrolled growth of in-memory parts or small parts when all the merge workers
were busy with merging big files. Such growth could lead to significant performance
degradation for queries, since every query needs to check an ever-growing list of parts.
This could also slow down the registration of new time series, since VictoriaMetrics
searches for the internal series_id in the indexdb for every new time series.
The third issue is graceful shutdown duration, which could be very long when a background
merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
since it merges in-memory parts.
A separate pool of merge workers per part type elegantly resolves all these issues (see the sketch after this list):
- In-memory parts are merged to file-based parts in a timely manner, since the maximum
size of in-memory parts is limited.
- Long-running merges for big parts do not block merges for in-memory parts and small parts.
- Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
Merging for file parts is instantly canceled on graceful shutdown now.
- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
should automatically self-tune according to the number of available CPU cores.
- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge
- Tune the number of shards for pending rows and items before the data goes to in-memory parts
and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
for registration of new time series. This should reduce the duration of data ingestion slowdown
in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
unavailable.
- Prevent a possible "sync: WaitGroup misuse" panic on graceful shutdown.
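A high-level sketch of the per-part-type worker pools; names, pool sizes and the job representation are illustrative, not the actual lib/storage code:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

type partType int

const (
	inmemoryPart partType = iota
	smallFilePart
	bigFilePart
)

func startMergeWorkers(pt partType, workers int, jobs <-chan string, wg *sync.WaitGroup) {
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				// ... merge only parts of the given type ...
				fmt.Printf("merging %s (part type %d)\n", job, pt)
			}
		}()
	}
}

func main() {
	var wg sync.WaitGroup
	cpus := runtime.GOMAXPROCS(-1)

	inmemoryJobs := make(chan string)
	smallJobs := make(chan string)
	bigJobs := make(chan string)

	// Each part type gets its own pool, so a long-running big-part merge
	// cannot pin in-memory parts in memory or delay their flush to disk.
	startMergeWorkers(inmemoryPart, cpus, inmemoryJobs, &wg)
	startMergeWorkers(smallFilePart, cpus, smallJobs, &wg)
	startMergeWorkers(bigFilePart, cpus, bigJobs, &wg)

	inmemoryJobs <- "inmemory parts batch"
	close(inmemoryJobs)
	close(smallJobs)
	close(bigJobs)
	wg.Wait()
}
```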
This is a follow-up for fa566c68a6 .
Thanks @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
Previously, it was not possible to determine which tenant sends metrics with an excessive number of labels or label values.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple periodic tasks are performed at the same time.
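A small sketch of the usual way to achieve this - adding random jitter to the periodic interval; the interval and jitter size below are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sleepWithJitter sleeps for roughly d +/- 10%, so independent periodic tasks
// drift apart instead of waking up at exactly the same time.
func sleepWithJitter(d time.Duration) {
	jitter := time.Duration(rand.Int63n(int64(d) / 5)) // up to 20% of d
	time.Sleep(d - d/10 + jitter)                      // total sleep in [0.9*d, 1.1*d)
}

func main() {
	for i := 0; i < 3; i++ {
		sleepWithJitter(time.Second)
		fmt.Println("periodic task run", i)
	}
}
```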