VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-19 07:01:02 +01:00

Author	SHA1	Message	Date
Roman Khavronenko	e9ee043879	lib/storage: make `indexdb/tagFilters` cache size configurable (#2667) The default size of `indexdb/tagFilters` can now be overridden via the `storage.cacheSizeIndexDBTagFilters` flag. Please be careful when changing the default size, since it may lead to inefficient operation of vmstorage or to OOM errors. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2022-06-01 14:57:39 +03:00
Aliaksandr Valialkin	38beb9fe04	lib/storage: add ability to change the indexdb rotation time offset with -retentionTimezoneOffset command-line flag This is a follow-up for `0fbf59199a` See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2574	2022-05-25 16:07:14 +03:00
Aliaksandr Valialkin	e961aec551	app/vmstorage: do not allow to set -retentionPeriod smaller than one day VictoriaMetrics doesn't support retention periods smaller than one day, so do not allow to set it to small values. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2496	2022-05-07 00:54:42 +03:00
Roman Khavronenko	c41ae2db2c	vmstorage: switch to rich duration parser for flag `snapshotsMaxAge` (#2542) The switch is supposed to allow setting `d`, `w`, `y` duration units. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-05-05 21:13:55 +03:00
Aliaksandr Valialkin	361b08c30e	lib/storage: leave the last sample per each discrete interval during the deduplication This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness	2022-05-02 21:59:31 +03:00
Artem Navoiev	11db05a4ff	lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487) * lib/{storage,flagutil} - Add option for snapshot autoremoval - add prometheus-like duration as command flag - add option to delete stale snapshots - update duration.go flag to re-use own code * wip * lib/flagutil: re-use Duration.Set() call in NewDuration * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-05-02 11:24:12 +03:00
Aliaksandr Valialkin	a436836402	lib/flagutil: re-use Duration.Set() call in NewDuration	2022-05-02 10:58:08 +03:00
Aliaksandr Valialkin	ed1b394a1a	app/vmstorage: expose `vm_indexdb_items_added_total` and `vm_indexdb_items_added_size_bytes_total` counters at `/metrics` page These counters can be used for monitoring the rate of addition of new entries in indexdb (aka inverted index). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2471	2022-04-21 13:19:42 +03:00
Nikolay	4cf6219e07	lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache (#2293) * lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache It should decrease memory usage for regexp caching with storing cacheEntry by pointer - golang map should be able to effectively shrink its size original issue with this case - unexpected map grows and storage OOM Apply suggestions from code review Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> Adds missing metrics for regexp cache and regexpPrefixes cache * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-03-26 12:57:27 +02:00
Roman Khavronenko	bd7837d524	lib: allow to configure cache size by type (#2206) * lib: allow to configure cache size by type https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940 Signed-off-by: hagen1778 <roman@victoriametrics.com> * Apply suggestions from code review * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-21 13:55:51 +02:00
Roman Khavronenko	d107f86fbc	lib/index: reduce read/write load after indexDB rotation (#2177) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSID - IDs used for identifying time series. The index is stored on disk and used by both ingestion and read path. IndexDB is stored separately to data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once in `retention` interval. The rotation procedure means that `current` indexDB becomes `previous`, and new freshly created indexDB struct becomes `current`. So at any time, VM holds indexDB for current and previous retention periods. When time series is ingested or queried, VM checks if its TSID is present in `current` indexDB. If it is missing, it checks the `previous` indexDB. If TSID was found, it gets copied to the `current` indexDB. In this way `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read path consult `tsidCache` and on miss the real lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable for most of the cases, for systems with very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are the following: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of indexDB to which they belong; 3. On ingestion path after the rotation we check if requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add to `current` indexDB, add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on time passed since the last rotation with some threshold. The more time has passed since rotation the higher is chance to re-populate `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from `previous` index are supposed to slowly re-populate `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from `previous` indexDB to the `current` indexDB. This metric is supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:34:44 +02:00
Aliaksandr Valialkin	6ae584b9b3	lib/{mergeset,storage}: properly limit cache sizes for indexdb Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`, since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded, which could result in out of memory errors. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-20 18:45:03 +02:00
Aliaksandr Valialkin	cdfe854c9b	lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge() This improves the code readability and debuggability, since the output of these functions stops depending on global state.	2021-12-14 20:52:29 +02:00
Aliaksandr Valialkin	ab4be24397	app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query: vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9	2021-12-02 10:30:01 +02:00
Aliaksandr Valialkin	4fb19fe34b	all: consistently return `application/json` content-type without `charset=utf-8` The `application/json` content-type has utf-8 encoding by default. See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897	2021-11-09 18:07:22 +02:00
Aliaksandr Valialkin	4fddcf4c83	app/{vminsert,vmstorage}: follow-up after `a171916ef5` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269	2021-10-08 14:09:51 +03:00
Nikolay	a171916ef5	Adds read-only mode for vmstorage node (#1680) * adds read-only mode for vmstorage https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269 * changes order a bit * moves isFreeDiskLimitReached var to storage struct renames functions to be consistent change protoparser api - with optional storage limit check for given opened storage * renames freeSpaceLimit to ReadOnly	2021-10-08 12:52:56 +03:00
Aliaksandr Valialkin	c473d8ffe1	lib/storage: re-use the per-day inverted index search code for searching in global index This allows removing a big pile of outdated code for global index search. This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486	2021-07-30 10:28:20 +03:00
Aliaksandr Valialkin	22c6e64bbc	lib/storage: consistency renaming: tagCache -> tagFiltersCache This improves code readability	2021-07-06 11:03:30 +03:00
Aliaksandr Valialkin	44855f0c9b	app/{vmselect,vmstorage}: clarify the description for `-dedup.minScrapeInterval` command-line flag Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1426	2021-07-02 15:06:41 +03:00
Aliaksandr Valialkin	165a9f9200	app/vmstorage: add ability to limit series cardinality via `-storage.maxHourlySeries` and `-storage.maxDailySeries` command-line flags	2021-05-20 15:31:57 +03:00
Aliaksandr Valialkin	4a5f45c77e	app/vminsert: add support for data ingestion via other vminsert nodes	2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin	6dc5d3b357	all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com	2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin	4028d692f5	app: do not process non-GET requests at `/` handler	2021-04-02 22:56:38 +03:00
Aliaksandr Valialkin	512addc608	app/{vminsert,vmagent}: add `-sortLabels` command-line option for sorting time series labels before ingesting them in the storage This option can be useful when samples for the same time series are ingested with distinct order of labels. For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.	2021-03-31 23:27:21 +03:00
Aliaksandr Valialkin	ae1c653d55	lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels	2021-03-31 21:22:40 +03:00
Aliaksandr Valialkin	d074326970	app/vmstorage: add `-logNewSeries` command-line flag for determining the source of series churn rate	2021-03-15 22:40:28 +02:00
Aliaksandr Valialkin	83da939947	app/vmstorage: export vm_composite_filter_success_conversions_total and vm_composite_filter_missing_conversions_total metrics	2021-02-17 19:13:49 +02:00
Aliaksandr Valialkin	08f21d8761	app/vmstorage: export vm_composite_index_min_timestamp metric	2021-02-10 17:14:00 +02:00
Aliaksandr Valialkin	e8ee9fa7fe	app/vmstorage: export missing `vm_cache_size_bytes` metrics for indexdb and data caches	2021-02-09 00:49:58 +02:00
Aliaksandr Valialkin	d5a2b120e9	app/vmstorage: disable final merge by default, since it may result in high disk IO and CPU usage without measurable benefits such as increased query performance and reduced disk space usage	2021-01-08 00:12:12 +02:00
Aliaksandr Valialkin	a2eb451de4	app/{vmagent,vminsert}: follow-up for `ce8c2dd1f1`: return `/targets` page in HTML when requested via web browser	2020-12-14 14:13:01 +02:00
Aliaksandr Valialkin	1a237c6903	all: properly handle CPU limits set on the host system/container This can reduce memory usage on systems with enabled CPU limits. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946	2020-12-08 21:07:03 +02:00
Aliaksandr Valialkin	bdac2171f1	all: do not print usage info for all the flags when incorrect command-line flag is passed This should improve usability for VictoriaMetrics apps that have big number of command-line flags, i.e. all the apps.	2020-12-03 21:46:19 +02:00
Aliaksandr Valialkin	7ceaf4ba8f	all: consistently return text-based HTTP responses with charset=utf-8 This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897	2020-11-13 10:30:21 +02:00
immerrr again	1ec1a9f27f	app/vmstorage: add "/internal/force_flush" endpoint (#893)	2020-11-11 14:46:37 +02:00
Aliaksandr Valialkin	0db7c2b500	app/vmstorage: support for `-retentionPeriod` smaller than one month Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/17	2020-10-20 14:42:46 +03:00
Aliaksandr Valialkin	d2e917d1cb	app/vmstorage: add `vm_rows_added_to_storage_total` metric, which shows the total number of rows added to storage since app start	2020-10-09 13:36:17 +03:00
Aliaksandr Valialkin	b51fa16177	app/vmstorage: add `-finalMergeDelay` command-line flag for configuring the delay before final merge for per-month partitions after no new data is ingested to it	2020-10-07 17:42:31 +03:00
Aliaksandr Valialkin	abfd3a8fab	app/{vminsert,vmselect,vmstorage}: add a link to https://victoriametrics.github.io/Cluster-VictoriaMetrics.html from main page of every cluster component	2020-10-06 15:30:07 +03:00
Aliaksandr Valialkin	fd7dd5064a	lib/storage: code cleanup after `10f2eedee0` Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search, since this code became unused after implementing per-day inverted index almost a year ago. While at it, fix a bug, which could prevent from finding time series with names containing dots (aka Graphite-like names such as `foo.bar.baz`).	2020-10-01 19:12:04 +03:00
Aliaksandr Valialkin	536aa8779a	app/vmstorage: rename `vm_{big\|small}_merge_need_free_disk_space` to `vm_merge_need_free_disk_space` This simplifies alerting.	2020-09-29 22:53:33 +03:00
Aliaksandr Valialkin	097a4c10dd	app/vmstorage: add metrics for determining whether background merges need additional disk space to complete These metrics are: * vm_small_merge_need_free_disk_space * vm_big_merge_need_free_disk_space Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-29 21:47:47 +03:00
Aliaksandr Valialkin	2ee0dc27a6	app/vmstorage: parallelize data processing obtained from a single connection from vminsert Previously vmstorage could use only a single CPU core for data processing from a single connection from vminsert. Now all the CPU cores can be used for data processing from a single connection from vminsert. This should improve the maximum data ingestion performance for a single vminsert->vmstorage connection.	2020-09-28 21:41:16 +03:00
Aliaksandr Valialkin	9b15b11f74	app/vmstorage: added `-forceMergeAuthKey` command-line flag for protecting `/internal/force_merge` endpoint	2020-09-17 14:24:20 +03:00
Aliaksandr Valialkin	d96858b921	lib/storage: add `/internal/force_merge` handler for running forced compactions on historical per-month partitions This may be useful for freeing up storage space after time series deletion. See https://victoriametrics.github.io/#force-merge for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686	2020-09-17 12:20:56 +03:00
Aliaksandr Valialkin	f5cb213ef9	lib/storage: reuse timestamp blocks for adjacent metric blocks with identical timestamps This should reduce disk space usage when scraping targets containing metrics with identical names such as `node_cpu_seconds_total`, histograms, quantiles, etc. Expose `vm_timestamps_blocks_merged_total` and `vm_timestamps_bytes_saved_total` metrics for monitoring the effectiveness of timestamp blocks merging.	2020-09-09 23:59:21 +03:00
Aliaksandr Valialkin	6721e47ae9	app: respect CPU limits set via cgroups Update GOMAXPROCS to limits set via cgroups. This should reduce CPU thrashing and reduce memory usage for cases when VictoriaMetrics components run in containers with CPU limits. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685	2020-08-11 23:01:03 +03:00
Aliaksandr Valialkin	a455930ab4	app/vmstorage: rename `vm_cache_size_entries{type="storage/prefetchedMetricIDs"}` to `vm_cache_entries{type="storage/prefetchedMetricIDs"}` to be consistent with other `vm_cache_entries` metrics	2020-08-06 16:34:18 +03:00
Aliaksandr Valialkin	a3e91c593b	lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2 This should limit the maximum memory usage and reduce CPU thrashing on vmstorage when multiple heavy queries are executed. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-08-05 18:27:21 +03:00
Aliaksandr Valialkin	29bbab0ec9	lib/storage: remove prioritizing of merging small parts over merging big parts, since it doesn't work as expected The prioritizing could lead to big merge starvation, which could end up in too big number of parts that must be merged into big parts. Multiple big merges may be initiated after the migration from v1.39.0 or v1.39.1. It is OK - these merges should be finished soon, which should return CPU and disk IO usage to normal levels. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618	2020-07-30 20:02:22 +03:00
Aliaksandr Valialkin	b8303afcd8	lib/storage: improve prioritizing of data ingestion over querying Prioritize also small merges over big merges. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648	2020-07-23 01:40:38 +03:00
Aliaksandr Valialkin	31ef39e8da	lib/httpserver: log remote address in error message from `httpserver.Errorf` This should improve detection of the root cause of errors. Thanks to Anant for the idea.	2020-07-20 14:06:29 +03:00
Aliaksandr Valialkin	0bff96fe4b	lib/storage: prioritize data ingestion over heavy queries Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream. Prevent this by delaying queries' execution until free resources are available for data ingestion. Expose `vm_search_delays_total` metric, which may be used for alerting when there are not enough CPU resources for data ingestion and/or for executing heavy queries. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2020-07-05 19:44:04 +03:00
Aliaksandr Valialkin	d962568e93	all: use %w instead of %s for wrapping errors in `fmt.Errorf` This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode . See https://blog.golang.org/go1.13-errors for details.	2020-06-30 23:33:46 +03:00
Aliaksandr Valialkin	2784015a4d	all: print `--help` output to stdout instead of stderr This is easier to grep and pipe	2020-05-16 12:03:06 +03:00
Aliaksandr Valialkin	1e5c1d7eaa	app/vmstorage: add `vm_slow_metric_name_loads_total` metric, which could be used as an indicator when more RAM is needed for improving query performance	2020-05-15 14:12:24 +03:00
Aliaksandr Valialkin	d6b9a49481	app/vmstorage: add `vm_slow_row_inserts_total` and `vm_slow_per_day_index_inserts_total` metrics for determining whether VictoriaMetrics requires more RAM for the current number of active time series	2020-05-15 13:46:57 +03:00
Aliaksandr Valialkin	f7753b1469	lib/storage: gradually pre-populate per-day inverted index for the next day This should prevent from CPU usage spikes at 00:00 UTC every day when inverted index for new day must be quickly created for all the active time series. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430	2020-05-12 12:13:32 +03:00
Aliaksandr Valialkin	a53e332a93	app/vmstorage: add missing shutdown for http server on graceful shutdown This could result in the following panic during graceful shutdown when `/metrics` page is requested: http: panic serving 10.101.66.5:57366: runtime error: invalid memory address or nil pointer dereference goroutine 2050 [running]: net/http.(*conn).serve.func1(0xc00ef22000) net/http/server.go:1772 +0x139 panic(0xa0fc00, 0xe91d80) runtime/panic.go:973 +0x3e3 github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache.(*Cache).UpdateStats(0x0, 0xc0000516c8) github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache/cache.go:224 +0x37 github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*indexDB).UpdateMetrics(0xc00b931d00, 0xc02c41acf8) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/index_db.go:258 +0x9f github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*Storage).UpdateMetrics(0xc0000bc7e0, 0xc02c41ac00) github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/storage.go:413 +0x4c5 main.registerStorageMetrics.func1(0x0) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:186 +0xd9 main.registerStorageMetrics.func3(0xc00008c380) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:196 +0x26 main.registerStorageMetrics.func7(0xc) github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/main.go:211 +0x26 github.com/VictoriaMetrics/metrics.(*Gauge).marshalTo(0xc000010148, 0xaa407d, 0x20, 0xb50d60, 0xc005319890) github.com/VictoriaMetrics/metrics@v1.11.2/gauge.go:38 +0x3f github.com/VictoriaMetrics/metrics.(*Set).WritePrometheus(0xc000084300, 0x7fd56809c940, 0xc005319860) github.com/VictoriaMetrics/metrics@v1.11.2/set.go:51 +0x1e1 github.com/VictoriaMetrics/metrics.WritePrometheus(0x7fd56809c940, 0xc005319860, 0xa16f01) github.com/VictoriaMetrics/metrics@v1.11.2/metrics.go:42 +0x41 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.writePrometheusMetrics(0x7fd56809c940, 0xc005319860) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/metrics.go:16 +0x44 github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xb5a120, 0xc005319860, 0xc005018f00, 0xc00002cc90) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:154 +0x58d github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1(0xb5a120, 0xc005319860, 0xc005018f00) github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:119 +0x8e net/http.HandlerFunc.ServeHTTP(0xc00002d110, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2012 +0x44 net/http.serverHandler.ServeHTTP(0xc004414000, 0xb5a660, 0xc0044141c0, 0xc005018f00) net/http/server.go:2807 +0xa3 net/http.(*conn).serve(0xc00ef22000, 0xb5bf60, 0xc010532080) net/http/server.go:1895 +0x86c created by net/http.(*Server).Serve net/http/server.go:2933 +0x35c	2020-04-02 21:09:55 +03:00
Aliaksandr Valialkin	3b744f3c32	app/vmstorage: typo fix	2020-04-01 23:43:09 +03:00
Aliaksandr Valialkin	f838cdc86e	app/vmstorage: add `vm_free_disk_space_bytes` metric for monitoring the remaining disk space at `-storageDataPath`	2020-04-01 23:10:44 +03:00
Aliaksandr Valialkin	8939c19281	app/vmstorage: return 500 status code instead of 200 status code on internal errors inside `/snapshot/*` handlers	2020-03-10 23:54:27 +02:00
Aliaksandr Valialkin	cf9aee4ec3	all: properly split `vm_deduplicated_samples_total` among cluster components Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345	2020-02-27 23:47:51 +02:00
Aliaksandr Valialkin	1010a57882	all: allow setting flags via environment vars Now flags can be set via environment vars with the same names as flags. Command-line flags override flags set via env vars. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/311	2020-02-10 13:31:21 +02:00
Aliaksandr Valialkin	ea66212c93	lib/storage: move `-dedup.minScrapeInterval` flag outside lib/storage, so it doesn't show up in `vminsert` in cluster version	2020-02-10 13:07:25 +02:00
Aliaksandr Valialkin	4ed5e9a7ce	lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init This should speed up Search.NextMetricBlock loop for big number of found time series.	2020-01-30 15:16:16 +02:00
Aliaksandr Valialkin	ea53a21b02	all: consistently log durations in seconds with millisecond precision This should improve logs readability	2020-01-22 18:35:24 +02:00
Aliaksandr Valialkin	4e22b521c2	lib/storage: remove metricID with missing metricID->metricName entry The metricID->metricName entry can be missing in the indexdb after unclean shutdown when only a part of entries for new time series is written into indexdb. Recover from such a situation by removing the broken metricID. A new metricID will be automatically created for the time series with the given metricName when a new data point arrives for it.	2019-12-02 20:52:13 +02:00
Aliaksandr Valialkin	d297b65089	lib/storage: add `vm_cache_size_bytes{type="storage/hour_metric_ids"}` metric	2019-11-13 20:26:05 +02:00
Aliaksandr Valialkin	494ad0fdb3	lib/storage: remove inmemory index for recent hour, since it uses too much memory Production workload shows that the index requires ~4Kb of RAM per active time series. This is too much for high number of active time series, so let's delete this index. Now the queries should fall back to the index for the current day instead of the index for the recent hour. The query performance for the current day index should be good enough given the 100M rows/sec scan speed per CPU core.	2019-11-13 18:08:58 +02:00
Aliaksandr Valialkin	633dd81bb5	lib/storage: add `-disableRecentHourIndex` flag for disabling inmemory index for recent hour This may be useful for saving RAM on high number of time series aka high cardinality	2019-11-13 15:10:12 +02:00
Aliaksandr Valialkin	f1620ba7c0	lib/storage: fix inmemory inverted index issues found in v1.29 Issues fixed: - Slow startup times. Now the index is loaded from cache during start. - High memory usage related to superfluous index copies every 10 seconds.	2019-11-13 13:35:38 +02:00
Aliaksandr Valialkin	87b39222be	Revert "lib/fs: do not postpone directory removal on NFS error" This reverts commit 21aeb02b46649ac9906cb37733f7b155a77a0db9.	2019-11-12 16:29:50 +02:00
Aliaksandr Valialkin	c48e39eea9	lib/storage: add tests for dateMetricIDCache	2019-11-11 13:21:05 +02:00
Aliaksandr Valialkin	5f52eb7653	lib/fs: do not postpone directory removal on NFS error Continue trying to remove NFS directory on temporary errors for up to a minute. The previous async removal process breaks in the following case during VictoriaMetrics start - VictoriaMetrics opens index, finds incomplete merge transactions and starts replaying them. - The transaction instructs removing old directories for parts, which were already merged into bigger part. - VictoriaMetrics removes these directories, but their removal is delayed due to NFS errors. - VictoriaMetrics scans partition directory after all the incomplete merge transactions are finished and finds directories, which should be removed, but weren't still removed due to NFS errors. - VictoriaMetrics panics when it finds unexpected empty directory. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-11-10 13:27:16 +02:00
Aliaksandr Valialkin	9ea2bd822e	lib/storage: implement per-day inverted index	2019-11-10 00:20:32 +02:00
Aliaksandr Valialkin	dea2f3efed	lib/storage: use specialized cache for (date, metricID) entries This improves ingestion performance.	2019-11-09 23:09:18 +02:00
Aliaksandr Valialkin	46e67bb78c	lib/storage: export `vm_new_timeseries_created_total` metric for determining time series churn rate	2019-11-08 19:58:21 +02:00
Aliaksandr Valialkin	0063c857f5	lib/storage: add inmemory inverted index for the last hour It should improve performance for `last N hours` dashboards with update intervals smaller than 1 hour.	2019-11-08 19:37:46 +02:00
Aliaksandr Valialkin	1c777e0245	lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter The origin of the error has been detected and documented in the code, so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`, so it could be monitored and alerted on high error rates. Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`, so its rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.	2019-11-06 14:32:41 +02:00
Aliaksandr Valialkin	6ab9c98a1e	app/vmstorage: add `-bigMergeConcurrency` and `-smallMergeConcurrency` flags for tuning the maximum number of CPU cores used during merges	2019-10-31 16:17:29 +02:00
Aliaksandr Valialkin	2c654258ef	lib/fs: add MustStopDirRemover for waiting until pending directories are removed on graceful shutdown This patch is mainly required for laggy NFS. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162	2019-09-05 11:17:17 +03:00
Aliaksandr Valialkin	f56c1298ad	app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls Track also the number of dropped rows due to the exceeded timeout on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`	2019-08-06 15:08:43 +03:00
Aliaksandr Valialkin	8253790157	app/vmstorage: consistency renaming for `ignored rows` metrics vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"} vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}	2019-07-26 20:02:24 +03:00
Aliaksandr Valialkin	c6bec48927	lib/storage: add metrics for calculating skipped rows outside the retention The metrics are: - vm_too_big_timestamp_rows_total - vm_too_small_timestamp_rows_total	2019-07-26 14:11:56 +03:00
Aliaksandr Valialkin	ba8195c58e	all: consistency renaming: bytesSize -> sizeBytes	2019-07-10 00:47:42 +03:00
Aliaksandr Valialkin	41f512af1c	all: add `vm_data_size_bytes` metrics for easy monitoring of on-disk data size and on-disk inverted index size	2019-07-04 19:43:04 +03:00
Aliaksandr Valialkin	a0c22a6830	app/vmstorage: add `vm_cache_entries{type="storage/hour_metric_ids"}` metric for tracking active time series count	2019-06-19 18:37:38 +03:00
Aliaksandr Valialkin	d54f5fec0b	lib/storage: skip adaptive searching for tag filter matching the minimum number of metrics if the identical previous search didn't find such filter This should improve speed for searching metrics among high number of time series with high churn rate like in big Kubernetes clusters with frequent deployments.	2019-06-10 14:07:47 +03:00
Aliaksandr Valialkin	4c3913290a	app/vmstorage: add missing `_total` suffixes to newly added metrics	2019-06-09 22:11:41 +03:00
Aliaksandr Valialkin	d882afa905	lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)	2019-06-09 19:14:04 +03:00
Aliaksandr Valialkin	24578b4bb1	all: open-sourcing cluster version	2019-05-23 00:25:38 +03:00
Aliaksandr Valialkin	1836c415e6	all: open-sourcing single-node version	2019-05-23 00:18:06 +03:00
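
Commit 361b08c30e in the list above describes leaving the last sample per each discrete interval during deduplication. The Go sketch below illustrates that idea only; it is not the lib/storage implementation. The `sample` type and the `dedupKeepLast` function are hypothetical names, and the sketch assumes non-negative millisecond timestamps sorted in ascending order, with intervals aligned to multiples of the dedup interval. The exact boundary handling in VictoriaMetrics may differ.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair.
// Timestamps are assumed to be non-negative milliseconds.
type sample struct {
	ts int64
	v  float64
}

// dedupKeepLast keeps only the last sample from every discrete interval
// [k*interval, (k+1)*interval). The input must be sorted by timestamp.
func dedupKeepLast(in []sample, interval int64) []sample {
	if interval <= 0 || len(in) == 0 {
		return in
	}
	out := make([]sample, 0, len(in))
	for i, s := range in {
		isLast := i == len(in)-1
		// A sample survives if the next sample falls into a different interval.
		if isLast || in[i+1].ts/interval != s.ts/interval {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	in := []sample{{0, 1}, {400, 2}, {900, 3}, {1000, 4}, {2500, 5}}
	// With a 1000ms interval the samples at 0 and 400 are dropped,
	// because 900 is the last sample in the same interval.
	fmt.Println(dedupKeepLast(in, 1000))
}
```

Keeping the last sample rather than the first means the surviving value is the most recent observation in each interval, which is the property the commit message relates to Prometheus staleness handling.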

144 Commits
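
Commit d107f86fbc in the list above describes a "smooth path": a TSID found in `tsidCache` that belongs to the `previous` indexDB is added to the `current` indexDB with a probability that grows with the time passed since the last rotation, over a default interval of `1h`. The commit message does not give the formula, so the Go sketch below assumes a simple linear ramp clamped to [0, 1]. The function names `repopulateProbability` and `shouldRepopulate` are hypothetical and are not taken from the VictoriaMetrics source.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// repopulateProbability returns the chance of copying an entry from the
// previous index into the current one. ASSUMPTION: a linear ramp from 0
// at rotation time to 1 once the re-population interval has elapsed.
func repopulateProbability(sinceRotation, interval time.Duration) float64 {
	if interval <= 0 || sinceRotation >= interval {
		return 1
	}
	if sinceRotation <= 0 {
		return 0
	}
	return float64(sinceRotation) / float64(interval)
}

// shouldRepopulate makes the probabilistic decision. draw must be a
// uniformly distributed value in [0, 1), e.g. from rand.Float64().
func shouldRepopulate(sinceRotation, interval time.Duration, draw float64) bool {
	return draw < repopulateProbability(sinceRotation, interval)
}

func main() {
	interval := time.Hour // default re-population interval named in the commit message
	for _, elapsed := range []time.Duration{0, 15 * time.Minute, 30 * time.Minute, time.Hour} {
		p := repopulateProbability(elapsed, interval)
		fmt.Printf("elapsed=%v p=%.2f repopulate=%v\n",
			elapsed, p, shouldRepopulate(elapsed, interval, rand.Float64()))
	}
}
```

Under this assumption the re-population work is spread roughly evenly across the interval instead of hitting CPU and disk all at once right after the rotation, which matches the goal stated in the commit message.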