VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-26 20:30:10 +01:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	140e7b6b74	all: replace atomic.Value with atomic.Pointer[T] This eliminates the need in .(*T) casting for results obtained from Load() Leave atomic.Value for map, since atomic.Pointer[map[...]...] makes double pointer to map, because map is already a pointer type.	2023-07-19 17:42:06 -07:00
Aliaksandr Valialkin	8815080030	app/vmselect/promql: add the ability to copy all the labels from `one` side of group_left()/group_right() operation This is performed by specifying `*` inside group_left()/group_right(). Also allow specifying prefix for the copied labels via `group_left(...) prefix "..."` and `group_right(...) prefix "..."` syntax. For example, the following query adds all the namespace-related labels to pod info, and prefixes all the copied label names with "ns_" prefix: kube_pod_info on(namespace) group_left(*) prefix "ns_" kube_namespace_labels This resolves the following StackOverflow questions: - https://stackoverflow.com/questions/76661818/how-to-add-namespace-labels-to-pod-labels-in-prometheus - https://stackoverflow.com/questions/76653997/how-can-i-make-a-new-copy-of-kube-namespace-labels-metric-with-a-different-name	2023-07-17 19:07:39 -07:00
Aliaksandr Valialkin	7094fa38bc	lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping Previously all the newly ingested time series were registered in global `MetricName -> TSID` index. This index was used during data ingestion for locating the TSID (internal series id) for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names). The `MetricName -> TSID` index is stored on disk in order to make sure that the data isn't lost on VictoriaMetrics restart or unclean shutdown. The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache, and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics uses in-memory cache for speeding up the lookup for active time series. This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk. VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases: - If `storage/tsid` cache capacity isn't enough for active time series. Then just increase available memory for VictoriaMetrics or reduce the number of active time series ingested into VictoriaMetrics. - If new time series is ingested into VictoriaMetrics. In this case it cannot find the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index, since it doesn't know that the index has no corresponding entry either. This is a typical event under high churn rate, when old time series are constantly substituted with new time series. 
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index, are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics. Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName` for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod. This index can become very large under high churn rate and long retention. VictoriaMetrics caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups. The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series. This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics consults only the index for the current day when new time series is ingested into it. The downside of this change is increased indexdb size on disk for workloads without high churn rate, e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store identical `MetricName -> TSID` entries for static time series for every day. This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation, since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 . 
At the same time the change fixes the issue, which could result in lost access to time series, which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685 This is a follow-up for `1f28b46ae9`	2023-07-13 16:07:30 -07:00
Aliaksandr Valialkin	3b50b94f7a	lib/storage: fix possible test failure in TestStorageAddRowsConcurrent The number of parts in the snapshot partition may be zero if concurrent goroutine just started creating new partition, but didn't put data into it yet when the current goroutine made a snapshot.	2023-07-13 15:03:45 -07:00
Zakhar Bessarab	242050ba94	lib/storage: follow-up after `a50d63c376` (#4289 ) * lib/storage: follow-up after `a50d63c376` - ensure retentionMsecs is rounded to day - remove localTimeOffset in test as localOffset is ignored when using `UnixMilli` Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage: restore retention timezone offset effect on retention deadline Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2023-05-16 17:14:08 +02:00
Zakhar Bessarab	aca256735c	lib/storage: fix indexdb rotation infinite loop (#4249 ) When using `retentionTimezoneOffset` and having local timezone being more than 4 hours different from UTC indexdb retention calculation could return negative value. This caused indexdb rotation to get in loop. Fix calculation of offset to use `retentionTimezoneOffset` value properly and add test to cover all legit timezone configs. See: - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4207 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4206 Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-05-04 17:16:48 +02:00
Aliaksandr Valialkin	52006149b2	lib/storage: replace OpenStorage() with MustOpenStorage() Callers of OpenStorage() log the returned error and exit. The error logging and exit can be performed inside MustOpenStorage() alongside with printing the stack trace for better debuggability. This simplifies the code at caller side.	2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin	df619bdff0	all: consistently use fs.MustClose() for closing lock files	2023-04-14 20:14:21 -07:00
Aliaksandr Valialkin	c8f2febaa1	lib/storage: consistently use OS-independent separator in file paths This is needed for Windows support, which uses `\` instead of `/` as file separator Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 14:33:58 -07:00
Zakhar Bessarab	39cdc546dd	lib/storage: enhancements for snapshots process (#3873 ) * lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858) * lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage * docs: fix formatting * app/vmstorage: add metrics to track status of snapshots * app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: update flag name in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * app/vmstorage: reflect new metrics names change in docs Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-02-27 12:12:03 -08:00
Nikolay	9254e494f9	lib/storage: fixes finalDedup for backfilled data (#3737 ) Previously, backfilling historical data could trigger a force merge for the previous month every hour, which consumed CPU and disk IO and decreased cluster performance. This commit fixes it by applying deduplication for InMemoryParts	2023-02-01 09:54:21 -08:00
Aliaksandr Valialkin	ba5a6c851c	lib/storage: use deterministic random generator in tests Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683	2023-01-23 20:10:32 -08:00
Aliaksandr Valialkin	8189770c50	all: add `-inmemoryDataFlushInterval` command-line flag for controlling the frequency of saving in-memory data to disk The main purpose of this command-line flag is to increase the lifetime of low-end flash storage with the limited number of write operations it can perform. Such flash storage is usually installed on Raspberry PI or similar appliances. For example, `-inmemoryDataFlushInterval=1h` reduces the frequency of disk write operations to up to once per hour if the ingested one-hour worth of data fits the limit for in-memory data. The in-memory data is searchable in the same way as the data stored on disk. VictoriaMetrics automatically flushes the in-memory data to disk on graceful shutdown via SIGINT signal. The in-memory data is lost on unclean shutdown (hardware power loss, OOM crash, SIGKILL). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337	2022-12-05 15:16:14 -08:00
Aliaksandr Valialkin	299285b147	lib/storage: fix TestUpdateCurrHourMetricIDs test when it runs on the first hour of the day by UTC	2022-12-02 18:52:37 -08:00
匠心零度	fa0ce10275	lib/storage: remove extra error check (#3396 )	2022-11-28 16:43:31 -08:00
Aliaksandr Valialkin	daa70e6560	lib/storage: follow-up for `790768f20b` - Document the bugfix at docs/CHANGELOG.md - Simplify the bugfix a bit	2022-11-07 14:04:08 +02:00
Aliaksandr Valialkin	dd88c628aa	lib/storage: remove unused isFull field from hourMetricIDs struct	2022-11-07 13:58:26 +02:00
Łukasz Marszał	790768f20b	Fix issue-3309 - currHourMetricIDs shouldn't contain metrics from prev hour (#3320 ) * fix issue-3309 currHourMetricIDs shouldn't contain metrics from prev hour * Update storage.go	2022-11-07 13:55:37 +02:00
Aliaksandr Valialkin	e1b8059086	lib/vmselectapi: rename deleteMetrics to more correct deleteSeries	2022-07-06 12:37:54 +03:00
Aliaksandr Valialkin	a350d1e81c	lib/storage: return marshaled metric names from SearchMetricNames Previously SearchMetricNames was returning unmarshaled metric names. This wasn't great for vmstorage, which should spend additional CPU time for marshaling the metric names before sending them to vmselect. While at it, remove possible duplicate metric names, which could occur when multiple samples for new time series are ingested via concurrent requests. Also sort the metric names before returning them to the client. This simplifies debugging of the returned metric names across repeated requests to /api/v1/series	2022-06-28 18:17:15 +03:00
Aliaksandr Valialkin	ba514284f1	lib/storage: add querytracer to more contexts querytracer has been added to the following storage.Storage methods: - RegisterMetricNames - DeleteMetrics - SearchTagValueSuffixes - SearchGraphitePaths	2022-06-27 13:45:51 +03:00
Aliaksandr Valialkin	52cf05c6d2	lib/storage: test GetTSDBStatusWithFiltersForDate on a global time range	2022-06-12 14:27:40 +03:00
Aliaksandr Valialkin	374beb350e	app/vmselect: optimize `/api/v1/labels` and `/api/v1/label/.../values` handlers when `match[]` query arg is passed to them	2022-06-12 04:32:13 +03:00
Aliaksandr Valialkin	41958ed5dd	all: add initial support for query tracing See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403	2022-06-01 02:29:23 +03:00
Artem Navoiev	37cf509c3a	lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487 ) * lib/{storage,flagutil} - Add option for snapshot autoremoval - add prometheus-like duration as command flag - add option to delete stale snapshots - update duration.go flag to re-use own code * wip * lib/flagutil: re-use Duration.Set() call in NewDuration * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-02 11:00:15 +03:00
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177 ) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - ID's used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So at any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the real lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. 
It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation, the higher the chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index are supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric is supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	afafeb379a	all: typo fix: unexected -> unexpected	2021-12-20 17:39:52 +02:00
Aliaksandr Valialkin	e1a715b0f5	lib/storage: convert alternate regexps into Graphite wildcards inside `__graphite__` pseudo-label For example, `{__graphite__=~"foo.(bar\|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution. This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.	2021-12-14 19:51:49 +02:00
Aliaksandr Valialkin	2d8bd41f8a	lib/storage: reduce memory allocations when syncing dateMetricIDCache	2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin	ad73f226ff	app/vmstorage: add ability to limit series cardinality via `-storage.maxHourlySeries` and `-storage.maxDailySeries` command-line flags	2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin	12d733dd5d	app/vminsert: add support for data ingestion via other vminsert nodes	2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin	4a07820048	lib/storage: make sure that nobody uses partitions when closing the table	2021-02-17 14:59:04 +02:00
Aliaksandr Valialkin	9f5ac603a7	lib/storage: reduce the minimum supported retention for inverted index from one month to one day	2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin	d16f22f3a1	app/vmselect,lib/storage: properly parse Graphite selectors with inner wildcards Example: foo{bar{x,yz},a[b-c],*de}	2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin	157c02622b	app/vmselect: add ability to set Graphite-compatible filter via `{__graphite__="foo.*.bar"}` syntax	2021-02-03 01:21:54 +02:00
Aliaksandr Valialkin	0208d8c103	lib/storage: add a test for Storage.SearchMetricNames	2020-11-16 13:15:16 +02:00
Aliaksandr Valialkin	48d033a198	app/vminsert: add `/tags/tagSeries` and `/tags/tagMultiSeries` handlers from Graphite Tags API See https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb	2020-11-16 02:39:58 +02:00
immerrr again	51c529a2b6	app/vmstorage: add "/internal/force_flush" endpoint (#893 )	2020-11-11 14:40:27 +02:00
Aliaksandr Valialkin	5bfd4e6218	app/vmstorage: support for `-retentionPeriod` smaller than one month Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/17	2020-10-20 14:31:44 +03:00
Aliaksandr Valialkin	1f33dd717f	lib/storage: add `/internal/force_merge` handler for running forced compactions on historical per-month partitions This may be useful for freeing up storage space after time series deletion. See https://victoriametrics.github.io/#force-merge for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-17 12:20:40 +03:00
Aliaksandr Valialkin	039c9d2441	lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`. This commit stops searching in inverted index on query timeout.	2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin	d5dddb0953	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin	b4afe562c1	lib/storage: postpone reading data from blocks during search This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics during heavy queries, which touch big number of time series over long time ranges. This improves single-node VM performance on heavy queries by up to 2x.	2020-04-27 11:45:24 +03:00
Aliaksandr Valialkin	f3e0c55ea1	lib/storage: serialize snapshot creation process with mutex This guarantees that the snapshot contains all the recently added data from inmemory buffers when multiple concurrent calls to Storage.CreateSnapshot are performed.	2020-03-24 22:27:05 +02:00
Aliaksandr Valialkin	605d588ba6	lib/uint64set: reduce memory usage in Union, Intersect and Subtract methods Iterate items with newly added Set.ForEach method instead of allocating `[]uint64` slice for all the items before the iteration.	2020-01-15 12:12:49 +02:00
Aliaksandr Valialkin	62a915f2b2	lib/storage: protect from time drift during indexdb rotation Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/248	2019-12-02 14:44:42 +02:00
Aliaksandr Valialkin	86a1cd700b	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 17:58:07 +02:00
Aliaksandr Valialkin	ca259864e2	lib/storage: return back inmemory inverted index for recent hour Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:11:04 +02:00
Aliaksandr Valialkin	01bb3c06c7	lib/storage: remove inmemory inverted index for recent hours Production load with >10M active time series showed it could slow down VictoriaMetrics startup times and could eat all the memory leading to OOM. Remove inmemory inverted index for recent hours until thorough testing on production data shows it works OK.	2019-11-13 10:45:53 +02:00
Oleg Kovalov	b4f44befa3	fix misspelled words (#229 )	2019-11-12 00:16:42 +02:00

60 Commits