Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation ( #2177 )
...
* lib/index: reduce read/write load after indexDB rotation
IndexDB in VM is responsible for storing TSIDs, the IDs used for identifying
time series. The index is stored on disk and used by both the ingestion and read paths.
IndexDB is stored separately from data parts and is global for all stored data.
It can't be deleted partially the way VM deletes data parts. Instead, indexDB is
rotated once per `retention` interval.
The rotation procedure means that the `current` indexDB becomes `previous`,
and a freshly created indexDB struct becomes `current`. So at any time,
VM holds indexDB for the current and previous retention periods.
When a time series is ingested or queried, VM checks if its TSID is present
in the `current` indexDB. If it is missing, VM checks the `previous` indexDB.
If the TSID is found there, it gets copied to the `current` indexDB. In this way
the `current` indexDB stores only series which were active during the retention
period.
To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read paths consult `tsidCache`, and on a miss the real lookup happens.
When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable in most cases,
for systems with a very high number of unique series each rotation may lead
to performance degradation for some period of time.
This PR makes an attempt to smooth out resource usage after the rotation.
The changes are the following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of the indexDB to which
it belongs;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since the rotation, the higher the chance to re-populate the `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
the `previous` index are supposed to slowly re-populate the `current` index.
The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from the `previous` indexDB to the `current` indexDB. This metric is supposed to
grow only during the first `1h` after the last rotation.
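The "smooth path" above can be illustrated with a minimal sketch. It assumes a linear ramp from 0 to 1 over the re-population interval; the commit message only says the probability grows with the time since rotation, so the exact curve, and the names `repopulateInterval`, `repopulateProbability` and `shouldRepopulate`, are hypothetical and not taken from the VictoriaMetrics code.

```go
package main

import "time"

// repopulateInterval mirrors the 1h default mentioned in the commit message.
const repopulateInterval = time.Hour

// repopulateProbability returns the chance of copying a cached TSID into the
// `current` indexDB. Assumption: the chance grows linearly with the time
// passed since the last rotation and is capped at 1 after the interval.
func repopulateProbability(sinceRotation time.Duration) float64 {
	if sinceRotation <= 0 {
		return 0
	}
	if sinceRotation >= repopulateInterval {
		return 1
	}
	return float64(sinceRotation) / float64(repopulateInterval)
}

// shouldRepopulate decides whether to re-populate for a single lookup.
// r is a random value in [0, 1), e.g. the result of rand.Float64().
func shouldRepopulate(sinceRotation time.Duration, r float64) bool {
	return r < repopulateProbability(sinceRotation)
}
```

With this shape, roughly half of the lookups hitting stale cache entries would trigger re-population 30 minutes after the rotation, and all of them after one hour, which spreads the index writes over the interval instead of concentrating them right after the rotation.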
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
* wip
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin
5f84b17ed6
lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request
2022-01-21 12:38:09 +02:00
Nikolay
8ff7da7202
adds restore.lock ( #1988 )
...
* adds restore.lock
it must prevent from running storage after incomplete restore process
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958
* return back flock file deletion
* Apply suggestions from code review
* wip
* docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin
ce333f28d8
all: use logger.WithThrottler() where appropriate
2021-12-21 17:03:25 +02:00
Aliaksandr Valialkin
e1a715b0f5
lib/storage: convert alternate regexps into Graphite wildcards inside `__graphite__` pseudo-label
...
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.
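A rough sketch of such a conversion is shown below. It is not the code from this commit: `convertAlternates` is a hypothetical helper that only handles non-nested groups made of literal alternatives and leaves anything else untouched.

```go
package main

import "strings"

// convertAlternates rewrites non-nested "(a|b|c)" groups into the
// Graphite-style "{a,b,c}" form. Groups that contain other regexp or
// wildcard meta characters are copied unchanged.
func convertAlternates(expr string) string {
	var sb strings.Builder
	for len(expr) > 0 {
		open := strings.IndexByte(expr, '(')
		if open < 0 {
			break
		}
		closeIdx := strings.IndexByte(expr[open:], ')')
		if closeIdx < 0 {
			break
		}
		closeIdx += open
		inner := expr[open+1 : closeIdx]
		sb.WriteString(expr[:open])
		if strings.ContainsAny(inner, "(){}[]*+?\\") {
			// Too complex for this sketch: keep the group as is.
			sb.WriteString(expr[open : closeIdx+1])
		} else {
			sb.WriteByte('{')
			sb.WriteString(strings.ReplaceAll(inner, "|", ","))
			sb.WriteByte('}')
		}
		expr = expr[closeIdx+1:]
	}
	// Append the tail that contains no more complete groups.
	sb.WriteString(expr)
	return sb.String()
}
```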
2021-12-14 19:51:49 +02:00
Aliaksandr Valialkin
7275ebf91a
app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches
...
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:
vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
2021-12-02 10:30:43 +02:00
Aliaksandr Valialkin
53bb58ed2a
lib/storage: log a warning when the -storageDataPath has less than -storage.minFreeDiskSpaceBytes
...
This should improve the debuggability of the readonly feature.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1727
2021-10-19 23:59:13 +03:00
Aliaksandr Valialkin
001750c239
lib/storage: fix unaligned access on 32-bit architectures.
...
The bug has been introduced at a171916ef5
2021-10-08 19:43:03 +03:00
Aliaksandr Valialkin
cf5cbd1c70
app/{vminsert,vmstorage}: follow-up after a171916ef5
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-10-08 14:35:49 +03:00
Nikolay
4290b46e8c
Adds read-only mode for vmstorage node ( #1680 )
...
* adds read-only mode for vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
* changes order a bit
* moves isFreeDiskLimitReached var to storage struct
renames functions to be consistent
change protoparser api - with optional storage limit check for given opened storage
* renames freeSpaceLimit to ReadOnly
2021-10-08 14:35:48 +03:00
Aliaksandr Valialkin
f77dde837a
lib/promscrape: add the ability to limit the number of unique series per each scrape target
...
The number of series per target can be limited with the following options:
* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:03:59 +03:00
Aliaksandr Valialkin
4401464c22
all: add support for Prometheus staleness markers
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:10:17 +03:00
Aliaksandr Valialkin
682662b2ae
lib/storage: remove cache directory if it contains reset_cache_on_startup file
...
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin
4f80b2f230
lib/storage: properly limit the size of `storage/date_metricID` cache
2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin
8b262d4ba7
lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate
2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
d0c830039d
lib/storage: tune cache sizes according to production workload
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
84fb59b0ba
lib/storage: move deletedMetricIDs set from indexDB to Storage
...
This makes the list of deleted metricIDs consistent when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .
Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .
This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .
The downside of the commit is that the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. This needs an app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
c4f3fbfa5d
lib/storage: reset cache on disk during series deletion and during indexdb rotation
...
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
2d8bd41f8a
lib/storage: reduce memory allocations when syncing dateMetricIDCache
2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin
39ef1e7a51
lib/storage: do not stop data ingestion on the first error in Storage.AddRows
...
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e
lib/storage: limit the number of rows per each block in Storage.AddRows()
...
This should reduce memory usage when ingesting big blocks of rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
f54133b200
lib/storage: do not populate MetricID->MetricName cache during data ingestion
...
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.
This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
ad73f226ff
app/vmstorage: add ability to limit series cardinality via `-storage.maxHourlySeries` and `-storage.maxDailySeries` command-line flags
2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
d7be2753c0
lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss
2021-05-13 09:02:33 +03:00
Aliaksandr Valialkin
832651c6c2
app/vmselect: follow up after 8a0678678b
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b
Adds tsdb match filters ( #1282 )
...
* init work on filters
* init propose for status filters
* fixes tsdb status
adds test
* fix bug
* removes checks from test
2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin
12d733dd5d
app/vminsert: add support for data ingestion via other vminsert nodes
2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin
dc9eafcd02
app/{vminsert,vmagent}: add `-sortLabels` command-line option for sorting time series labels before ingesting them in the storage
...
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
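The effect of sorting can be sketched as follows. The `label` type and the `sortLabels` and `seriesKey` helpers are hypothetical and exist only for illustration; they are not the types used by vminsert or vmagent.

```go
package main

import "sort"

// label is a simplified name/value pair used only in this sketch.
type label struct {
	Name, Value string
}

// sortLabels orders labels by name in place.
func sortLabels(labels []label) {
	sort.Slice(labels, func(i, j int) bool {
		return labels[i].Name < labels[j].Name
	})
}

// seriesKey builds a canonical string for a series from a sorted copy of
// its labels, so the incoming label order no longer matters.
func seriesKey(metric string, labels []label) string {
	sorted := append([]label(nil), labels...)
	sortLabels(sorted)
	key := metric + "{"
	for i, l := range sorted {
		if i > 0 {
			key += ","
		}
		key += l.Name + `="` + l.Value + `"`
	}
	return key + "}"
}
```

Both label orders from the example above map to the same key once the labels are sorted, so they are treated as a single time series.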
2021-03-31 23:27:58 +03:00
Aliaksandr Valialkin
e1f699bb6c
lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels
2021-03-31 21:24:46 +03:00
Aliaksandr Valialkin
aa81039b42
app/vmselect: log the metric which triggers rollup result cache reset
...
This should help finding the source of stale metrics
2021-03-25 21:31:39 +02:00
Aliaksandr Valialkin
3cfb3a3683
lib/storage: respect the deadline passed to Storage.SearchMetricNames
2021-03-22 23:03:17 +02:00
Aliaksandr Valialkin
8e2afdf568
lib/storage: improve Search.NextMetricBlock performance by using MetricID->MetricName cache
2021-03-22 22:49:18 +02:00
Aliaksandr Valialkin
726f6ad804
lib/storage: small code simplification after 6cee5338b2
2021-03-18 15:21:13 +02:00
Aliaksandr Valialkin
6cee5338b2
lib/storage: prevent from infinite loop if `{__graphite__="..."}` filter matches a metric name with `*`, `[` or `{` chars
...
The idea has been borrowed from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1137
2021-03-18 14:53:47 +02:00
John Belmonte
364fdf4a56
spelling fix: adjacent ( #1115 )
2021-03-09 09:18:19 +02:00
Aliaksandr Valialkin
4a07820048
lib/storage: make sure that nobody uses partitions when closing the table
2021-02-17 14:59:04 +02:00
Aliaksandr Valialkin
4e39bf148c
vendor: update github.com/VictoriaMetrics/metrics from v1.13.1 to v1.14.0
...
The new version switches from log-linear histograms to log-based histograms,
which provide up to 3.6 times better accuracy.
2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin
9f5ac603a7
lib/storage: reduce the minimum supported retention for inverted index from one month to one day
2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin
57cac289e0
lib/storage: fix inconsistencies in error logs
2021-02-10 18:12:16 +02:00
Aliaksandr Valialkin
5d5f0b0627
lib/storage: load metadata before loading indexdb, since indexdb depends on the metadata
2021-02-10 17:55:40 +02:00
Aliaksandr Valialkin
553016ea99
lib/storage: disable composite index usage when querying old data
2021-02-10 14:57:50 +02:00
Aliaksandr Valialkin
6b4e6c229c
lib/storage: reduce lock contention in dateMetricIDCache when registering new time series for the current day
...
This should help systems with multiple CPU cores
2021-02-10 00:01:13 +02:00
Aliaksandr Valialkin
d56390b925
optimize Storage.updatePerDateData()
2021-02-09 02:55:36 +02:00
Aliaksandr Valialkin
2242647a04
lib/storage: optimize data ingestion in the beginning of every hour
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1046
2021-02-08 12:01:12 +02:00
Aliaksandr Valialkin
83d3e582ab
lib/storage: check for prevHourMetricIDs cache before falling back to checking for (date, metricID) entries during data ingestion
...
This should reduce possible CPU usage spikes at the beginning of every hour.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1046
2021-02-04 18:48:13 +02:00
Aliaksandr Valialkin
d16f22f3a1
app/vmselect,lib/storage: properly parse Graphite selectors with inner wildcards
...
Example: foo{bar{x,yz},a[b-c],*de}
2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin
a5a1b9bd66
lib/storage: fix a bug, which breaks searching by Graphite wildcard filters
2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin
157c02622b
app/vmselect: add ability to set Graphite-compatible filter via `{__graphite__="foo.*.bar"}` syntax
2021-02-03 01:21:54 +02:00
Aliaksandr Valialkin
4146fc4668
all: properly handle CPU limits set on the host system/container
...
This can reduce memory usage on systems with enabled CPU limits.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:29 +02:00
Aliaksandr Valialkin
8a057e705a
lib/storage: log metric name plus all its labels when the metric timestamp is outside the configured retention
...
This should simplify debugging when the source of the metric with unexpected timestamp must be found.
2020-11-25 14:41:37 +02:00
Aliaksandr Valialkin
b65236530c
lib/storage: typo fix in error message: allowd->allowed
2020-11-25 14:15:42 +02:00
Aliaksandr Valialkin
465923b181
app/vmselect/graphite: add /tags/findSeries handler from Graphite Tags API
...
See https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags
2020-11-16 12:53:13 +02:00
Aliaksandr Valialkin
48d033a198
app/vminsert: add `/tags/tagSeries` and `/tags/tagMultiSeries` handlers from Graphite Tags API
...
See https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb
2020-11-16 02:39:58 +02:00
immerrr again
51c529a2b6
app/vmstorage: add "/internal/force_flush" endpoint ( #893 )
2020-11-11 14:40:27 +02:00
Aliaksandr Valialkin
b378cd6ed8
app/vmselect: optimize querying for `/api/v1/labels` and `/api/v1/label/<name>/values` when `start` and `end` args are set
2020-11-05 01:01:33 +02:00
Aliaksandr Valialkin
fe289331dd
lib/storage: remove obsolete code
2020-11-02 19:11:59 +02:00
Aliaksandr Valialkin
64e2d66014
lib/storage: code cleanup after 5bfd4e6218
2020-11-01 23:35:06 +02:00
Aliaksandr Valialkin
5bfd4e6218
app/vmstorage: support for `-retentionPeriod` smaller than one month
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/17
2020-10-20 14:31:44 +03:00
Aliaksandr Valialkin
68f0e00761
app/vmstorage: add `vm_rows_added_to_storage_total` metric, which shows the total number of rows added to storage since app start
2020-10-09 13:35:48 +03:00
Aliaksandr Valialkin
764dc2499f
lib/storage: code cleanup after 10f2eedee0
...
Remove the code that uses metricIDs caches for the current and the previous hour during metricIDs search,
since this code became unused after implementing per-day inverted index almost a year ago.
While at it, fix a bug, which could prevent from finding time series with names containing dots (aka Graphite-like names
such as `foo.bar.baz`).
2020-10-01 19:06:23 +03:00
Aliaksandr Valialkin
26115891db
lib/decimal: properly store Inf values
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/752
2020-09-18 19:07:07 +03:00
Aliaksandr Valialkin
1f33dd717f
lib/storage: add `/internal/force_merge` handler for running forced compactions on historical per-month partitions
...
This may be useful for freeing up storage space after time series deletion.
See https://victoriametrics.github.io/#force-merge for more details.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686
2020-09-17 12:20:40 +03:00
Aliaksandr Valialkin
5a90a92378
lib/storage: do not store inf values, since they may lead to significant precision loss for previously stored values
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/752
2020-09-11 14:44:53 +03:00
Aliaksandr Valialkin
f6bc608e86
app/vmselect: initial implementation of Graphite Metrics API
...
See https://graphite-api.readthedocs.io/en/latest/api.html#the-metrics-api
2020-09-11 00:30:01 +03:00
Aliaksandr Valialkin
9d8fdff6c5
lib/storage: reuse timestamp blocks for adjacent metric blocks with identical timestamps
...
This should reduce disk space usage when scraping targets containing metrics with identical names
such as `node_cpu_seconds_total`, histograms, quantiles, etc.
Expose `vm_timestamps_blocks_merged_total` and `vm_timestamps_bytes_saved_total` metrics for monitoring
the effectiveness of timestamp blocks merging.
2020-09-09 23:59:32 +03:00
Aliaksandr Valialkin
582c74cd93
lib/storage: mention tag filters used in the query that led to error message
...
This should improve detecting invalid or heavy queries that lead to errors.
2020-08-10 13:36:49 +03:00
Aliaksandr Valialkin
f3d33e23c9
app/vmstorage: improve error logging when the request times out
2020-08-10 13:23:26 +03:00
Aliaksandr Valialkin
84fd8af6d3
lib/storage: slow down concurrent searches when the number of concurrent inserts reaches the limit
...
This should improve data ingestion performance when heavy searches are executed
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618
2020-08-07 08:49:40 +03:00
Aliaksandr Valialkin
9043a509a3
lib/storage: properly check timeouts and pace limits
...
Previously they were checked on every iteration for small number of iterations
2020-08-07 08:40:37 +03:00
Aliaksandr Valialkin
ad730d8a17
lib/storage: optimize prefetching metric names for the given metricIDs
2020-08-06 16:53:10 +03:00
Aliaksandr Valialkin
8f16388428
lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2
...
This should limit the maximum memory usage and reduce CPU thrashing on vmstorage
when multiple heavy queries are executed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-08-05 18:30:07 +03:00
Aliaksandr Valialkin
922d9aadf2
lib/storage: properly update `vm_slow_row_inserts_total` metric when importing multiple data points per time series at once
...
Previously the `vm_slow_row_inserts_total` metric may be incremented multiple times for different data points per a single time series,
while only a single increment is needed when inserting the first data point for this time series.
2020-07-30 16:17:24 +03:00
Aliaksandr Valialkin
039c9d2441
lib/storage: respect `-search.maxQueryDuration` when searching for time series in inverted index
...
Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`.
This commit stops searching in inverted index on query timeout.
2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin
2a45871823
lib/storage: add more fine-grained pace limiting for search
2020-07-23 19:26:08 +03:00
Aliaksandr Valialkin
6f05c4d351
lib/storage: improve prioritizing of data ingestion over querying
...
Prioritize also small merges over big merges.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-07-23 13:23:36 +03:00
Aliaksandr Valialkin
e4303d3d21
lib/storage: prevent possible race condition when all the goroutines exit Storage.AddRows before other goroutines are blocked on searchTSIDsCond inside Storage.searchTSIDs
...
This condition may occur after the following sequence of events:
1) A goroutine enters the loop body when len(addRowsConcurrencyCh) == cap(addRowsConcurrencyCh) inside Storage.searchTSIDs.
2) All the goroutines return from Storage.AddRows.
3) The goroutine from step 1 blocks on searchTSIDsCond.Wait() inside the loop body.
The goroutine remains blocked until the next call to Storage.AddRows, which calls searchTSIDsCond.Signal().
This may take indefinite time.
2020-07-22 21:52:34 +03:00
Aliaksandr Valialkin
d3442b40b2
lib/uint64set: optimize adding items to the set via Set.AddMulti
2020-07-21 20:56:59 +03:00
Aliaksandr Valialkin
e1107fec10
lib/storage: reset `MetricName->TSID` cache after marking metricIDs as deleted
...
This is a follow-up commit after 12b16077c4, which didn't reset the `tsidCache` in all the required places.
This could result in indefinite errors like:
missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time
Fix this by resetting the cache inside deleteMetricIDs function.
2020-07-14 14:06:32 +03:00
Aliaksandr Valialkin
cb92113632
lib/storage: limit the maximum concurrency for data ingestion to GOMAXPROCS
...
Previously the concurrency was limited to GOMAXPROCS*2. This made little sense,
since every call to Storage.AddRows is bound to CPU, so the maximum ingestion bandwidth
is achieved when the number of concurrent calls to Storage.AddRows is limited to the number of CPUs,
i.e. to GOMAXPROCS.
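A common way to express such a limit in Go is a buffered channel used as a semaphore. The sketch below shows that general pattern; the `limiter` type and its methods are hypothetical and are not copied from `lib/storage`.

```go
package main

import "runtime"

// limiter bounds the number of concurrently running calls with a
// buffered channel acting as a counting semaphore.
type limiter struct {
	ch chan struct{}
}

// newLimiter returns a limiter that admits at most n concurrent calls.
func newLimiter(n int) *limiter {
	if n < 1 {
		n = 1
	}
	return &limiter{ch: make(chan struct{}, n)}
}

// newCPULimiter sizes the limiter to GOMAXPROCS, since CPU-bound work
// gains nothing from more concurrent workers than available CPUs.
func newCPULimiter() *limiter {
	// GOMAXPROCS(0) only reads the current setting without changing it.
	return newLimiter(runtime.GOMAXPROCS(0))
}

// Do blocks until a slot is free, runs f and then releases the slot.
func (l *limiter) Do(f func()) {
	l.ch <- struct{}{}
	defer func() { <-l.ch }()
	f()
}
```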
2020-07-08 17:32:18 +03:00
Aliaksandr Valialkin
32b9fb58b8
lib/storage: clarify `out of retention period` error message by mentioning `-retentionPeriod` command-line flag
2020-07-08 13:54:26 +03:00
Aliaksandr Valialkin
12b16077c4
lib/storage: reset MetricName->TSID cache after deleting time series
...
This should prevent from adding new data points to deleted time series
without the need to check for the deleted time series.
This improves ingestion performance a bit when the `deleted time series ids` aka `dmis` set
contains big number of time series.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/596
Based on the idea from @n4mine at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/604
2020-07-06 22:01:08 +03:00
Aliaksandr Valialkin
6daa5f7500
lib/storage: prioritize data ingestion over heavy queries
...
Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream.
Prevent this by delaying queries' execution until free resources are available for data ingestion.
Expose `vm_search_delays_total` metric, which may be used for alerting when there are not enough CPU resources
for data ingestion and/or for executing heavy queries.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2020-07-05 19:42:05 +03:00
Aliaksandr Valialkin
d5dddb0953
all: use %w instead of %s for wrapping errors in fmt.Errorf
...
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode .
See https://blog.golang.org/go1.13-errors for details.
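The difference between the two verbs can be shown with a small sketch. `statusError` below is a hypothetical stand-in for an error type such as httpserver.ErrorWithStatusCode; only `fmt.Errorf`, `%w` and `errors.As` are real standard library features here.

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a hypothetical error type carrying an HTTP status code.
type statusError struct {
	Code int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("status code %d", e.Code)
}

// wrapWithW keeps the original error in the chain, so errors.As can find it.
func wrapWithW(err error) error {
	return fmt.Errorf("cannot process request: %w", err)
}

// wrapWithS only embeds the error text; the original error value is lost.
func wrapWithS(err error) error {
	return fmt.Errorf("cannot process request: %s", err)
}

// statusCode extracts the status code from anywhere in the error chain.
func statusCode(err error) (int, bool) {
	var se *statusError
	if errors.As(err, &se) {
		return se.Code, true
	}
	return 0, false
}
```

Calling `statusCode` on an error produced by `wrapWithW` recovers the code, while the same call on the result of `wrapWithS` reports that no status error is present.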
2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin
b19ca3eb5f
lib/storage: do not increment `vm_slow_metric_name_loads_total` counter for metric_ids which shouldn't be prefetched, since this may mislead users
2020-05-16 10:21:17 +03:00
Aliaksandr Valialkin
82ffbcb9a6
app/vmstorage: add `vm_slow_metric_name_loads_total` metric, which could be used as an indicator when more RAM is needed for improving query performance
2020-05-15 14:11:45 +03:00
Aliaksandr Valialkin
82ccdfaa91
app/vmstorage: add `vm_slow_row_inserts_total` and `vm_slow_per_day_index_inserts_total` metrics for determining whether VictoriaMetrics requires more RAM for the current number of active time series
2020-05-15 13:44:32 +03:00
Aliaksandr Valialkin
4fc33163c4
lib/storage: optimize ingestion performance for new time series
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
8b32e7c3a0
lib/storage: reduce indentation in Storage.add
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
1573ececb2
lib/storage: return the first error instead of the last error, since the first error usually points to the root cause
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
0afd48d2ee
lib: extract common code for returning fast unix timestamp into lib/fasttime
2020-05-14 23:02:07 +03:00
Aliaksandr Valialkin
dbd0c552d5
lib/storage: gradually pre-populate per-day inverted index for the next day
...
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:05 +03:00
Aliaksandr Valialkin
364db13c9c
app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues
...
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin
f3e0c55ea1
lib/storage: serialize snapshot creation process with mutex
...
This guarantees that the snapshot contains all the recently added data
from inmemory buffers when multiple concurrent calls to Storage.CreateSnapshot are performed.
2020-03-24 22:27:05 +02:00
Aliaksandr Valialkin
18af31a4c2
all: properly split `vm_deduplicated_samples_total` among cluster components
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345
2020-02-27 23:48:07 +02:00
Aliaksandr Valialkin
ce15cecae4
lib/storage: typo fix
2020-02-16 15:53:44 +02:00
Aliaksandr Valialkin
32e153e834
lib/storage: prevent from clobbering non-nil lastError in Storage.add
2020-02-16 15:51:26 +02:00
Aliaksandr Valialkin
eceaf13e5e
lib/{storage,mergeset}: use time.Ticker instead of time.Timer where appropriate
...
It turned out that time.Timer was used in places where time.Ticker must be used instead.
This could result in blocked goroutines as in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/316 .
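The general pitfall can be illustrated as follows: a time.Timer delivers a single value unless it is reset, so a loop that keeps receiving from its channel blocks after the first iteration, while a time.Ticker keeps firing. This is a sketch of that behavior only, not a reproduction of the linked issue; `demoInterval`, `countTicks` and `timerFiresAgain` are hypothetical names.

```go
package main

import "time"

// demoInterval is a short period used only for this sketch.
const demoInterval = 2 * time.Millisecond

// countTicks receives n periodic events from a time.Ticker.
func countTicks(interval time.Duration, n int) int {
	t := time.NewTicker(interval)
	defer t.Stop()
	count := 0
	for count < n {
		<-t.C // the ticker fires repeatedly until it is stopped
		count++
	}
	return count
}

// timerFiresAgain reports whether a time.Timer delivers a second value
// without an explicit Reset. The second receive is guarded by a timeout,
// because a bare receive would block forever.
func timerFiresAgain(interval time.Duration) bool {
	t := time.NewTimer(interval)
	defer t.Stop()
	<-t.C // the first and only value
	select {
	case <-t.C:
		return true
	case <-time.After(5 * interval):
		return false
	}
}
```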
2020-02-13 13:10:07 +02:00
Aliaksandr Valialkin
2152f6f0cd
lib/storage: re-use indexSearch inside Storage.prefetchMetricNames
2020-01-31 01:16:53 +02:00
Aliaksandr Valialkin
d68546aa4a
lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init
...
This should speed up Search.NextMetricBlock loop for big number of found time series.
2020-01-30 15:08:51 +02:00
Aliaksandr Valialkin
680080887d
all: consistently log durations in seconds with millisecond precision
...
This should improve logs readability
2020-01-22 18:28:27 +02:00
Aliaksandr Valialkin
605d588ba6
lib/uint64set: reduce memory usage in Union, Intersect and Subtract methods
...
Iterate items with newly added Set.ForEach method instead of allocating `[]uint64`
slice for all the items before the iteration.
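The idea can be sketched with a simplified set type. `miniSet` below is a hypothetical map-backed stand-in and its callback signature is an assumption; it does not reproduce the real `lib/uint64set` layout or API.

```go
package main

// miniSet is a simplified uint64 set used only for this sketch.
type miniSet struct {
	items map[uint64]struct{}
}

// Add inserts x into the set.
func (s *miniSet) Add(x uint64) {
	if s.items == nil {
		s.items = make(map[uint64]struct{})
	}
	s.items[x] = struct{}{}
}

// Has reports whether x is in the set.
func (s *miniSet) Has(x uint64) bool {
	_, ok := s.items[x]
	return ok
}

// Len returns the number of items in the set.
func (s *miniSet) Len() int {
	return len(s.items)
}

// ForEach calls f for every item until f returns false.
// No intermediate []uint64 slice is allocated.
func (s *miniSet) ForEach(f func(x uint64) bool) {
	for x := range s.items {
		if !f(x) {
			return
		}
	}
}

// Union adds all the items from a to s by iterating a in place.
func (s *miniSet) Union(a *miniSet) {
	a.ForEach(func(x uint64) bool {
		s.Add(x)
		return true
	})
}
```

Iterating through a callback keeps the extra memory independent of the size of the set, at the cost of one function call per item.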
2020-01-15 12:12:49 +02:00
Aliaksandr Valialkin
97f70ccda7
lib/storage: optimize bulk import performance when multiple data points are inserted for the same time series
...
This should speed up `/api/v1/import` and make it more scalable on multi-core systems.
2019-12-19 18:18:29 +02:00
Aliaksandr Valialkin
62a915f2b2
lib/storage: protect from time drift during indexdb rotation
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/248
2019-12-02 14:44:42 +02:00
Aliaksandr Valialkin
7a4635f853
all: remove the remaining mentions of cluster version
2019-11-21 23:18:22 +02:00
Aliaksandr Valialkin
119dfd01bb
lib/storage: add `vm_cache_size_bytes{type="storage/hour_metric_ids"}` metric
2019-11-13 20:24:21 +02:00
Aliaksandr Valialkin
86a1cd700b
lib/storage: remove inmemory index for recent hour, since it uses too much memory
...
Production workload shows that the index requires ~4Kb of RAM per active time series.
This is too much for high number of active time series, so let's delete this index.
Now the queries should fall back to the index for the current day instead of the index
for the recent hour. The query performance for the current day index should be good enough
given the 100M rows/sec scan speed per CPU core.
2019-11-13 17:58:07 +02:00
Aliaksandr Valialkin
c57eb0ff83
lib/storage: add `-disableRecentHourIndex` flag for disabling inmemory index for recent hour
...
This may be useful for saving RAM on high number of time series aka high cardinality
2019-11-13 15:02:51 +02:00
Aliaksandr Valialkin
ca259864e2
lib/storage: return back inmemory inverted index for recent hour
...
Issues fixed:
- Slow startup times. Now the index is loaded from cache during start.
- High memory usage related to superfluous index copies every 10 seconds.
2019-11-13 13:11:04 +02:00
Aliaksandr Valialkin
01bb3c06c7
lib/storage: remove inmemory inverted index for recent hours
...
Production load with >10M active time series showed it could
slow down VictoriaMetrics startup times and could eat
all the memory leading to OOM.
Remove inmemory inverted index for recent hours until thorough
testing on production data shows it works OK.
2019-11-13 10:45:53 +02:00
Oleg Kovalov
b4f44befa3
fix misspelled words ( #229 )
2019-11-12 00:16:42 +02:00
Aliaksandr Valialkin
8e8f98f712
lib/storage: add tests for dateMetricIDCache
2019-11-11 13:21:57 +02:00
Aliaksandr Valialkin
c342f5e37e
lib/storage: eliminate data race when updating lastSyncTime in dateMetricIDCache.Has
2019-11-10 22:04:01 +02:00
Aliaksandr Valialkin
ee7765b10d
lib/storage: implement per-day inverted index
2019-11-10 00:02:46 +02:00
Aliaksandr Valialkin
5810ba57c2
lib/storage: use specialized cache for (date, metricID) entries
...
This improves ingestion performance.
2019-11-09 23:06:11 +02:00
Aliaksandr Valialkin
9ea549ed24
lib/storage: sync with cluster changes
2019-11-08 21:21:07 +02:00
Aliaksandr Valialkin
d888b21657
lib/storage: add inmemory inverted index for the last hour
...
It should improve performance for `last N hours` dashboards with update intervals smaller than 1 hour.
2019-11-08 21:21:07 +02:00
Aliaksandr Valialkin
6be4456d88
lib/{storage,uint64set}: add Set.Union() function and use it
2019-11-04 00:44:37 +02:00
Aliaksandr Valialkin
e0b292c6de
lib/storage: small cleanup in Storage.add
2019-10-31 14:30:34 +02:00
hanzai
b3c946e35a
warns during rows addition ( #214 )
2019-10-20 23:41:07 +03:00
Aliaksandr Valialkin
97ce4e03a5
all: add support for GOARCH=386 and fix all the issues related to 32-bit architectures such as GOARCH=arm
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2019-10-17 18:23:23 +03:00
Aliaksandr Valialkin
b986516fbe
lib/storage: create and use `lib/uint64set` instead of `map[uint64]struct{}`
...
This should improve inverted index search performance for filters matching big number of time series,
since `lib/uint64set.Set` is faster than `map[uint64]struct{}` for both `Add` and `Has` calls.
See the corresponding benchmarks in `lib/uint64set`.
2019-09-24 21:17:55 +03:00
Aliaksandr Valialkin
c9063ece66
lib/storage: share tsids across all the partSearch instances
...
This should reduce memory usage when big number of time series matches the given query.
2019-09-23 22:35:15 +03:00
Aliaksandr Valialkin
0dc0006f34
lib/storage: calculate the maximum number of rows per small part from -memory.allowedPercent
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/159
This simplifies error detection additionally to the `vm_rows_ignored_total` counters.
2019-08-25 15:31:47 +03:00
Aliaksandr Valialkin
09fc6e22e5
all: use workingsetcache instead of fastcache
...
This should reduce the amount of RAM required for processing time series
with non-zero churn rate.
The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.
2019-08-13 21:39:34 +03:00
Aliaksandr Valialkin
0967683ae9
lib: move common code for creating flock.lock file into fs.CreateFlockFile
2019-08-13 01:45:46 +03:00
Aliaksandr Valialkin
b8bb74ffc6
app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls
...
Track also the number of dropped rows due to the exceeded timeout
on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`
2019-08-06 15:08:33 +03:00
Aliaksandr Valialkin
f586e1f83c
lib/storage: add metrics for calculating skipped rows outside the retention
...
The metrics are:
- vm_too_big_timestamp_rows_total
- vm_too_small_timestamp_rows_total
2019-07-26 14:11:01 +03:00
Aliaksandr Valialkin
2bd1a01d1a
lib/storage: do not pollute inverted index with data for samples outside the retention period
2019-07-11 17:04:56 +03:00
Aliaksandr Valialkin
1fe6d784d8
all: consistency renaming: bytesSize -> sizeBytes
2019-07-10 00:47:36 +03:00
Aliaksandr Valialkin
a78b3dba7f
app/vmstorage: add `vm_cache_entries{type="storage/hour_metric_ids"}` metric for tracking active time series count
2019-06-19 18:36:47 +03:00
Aliaksandr Valialkin
d2c801029b
lib/storage: persist metric ids for the current and the previous hour on graceful shutdown
...
This should improve performance after restart when the db contains a lot of time series
with high time series churn (i.e. metrics from Kubernetes with many pods and frequent deployments)
2019-06-14 07:55:14 +03:00
Aliaksandr Valialkin
419197ba08
lib/fs: consolidate *RemoveAll* funcs into a single MustRemoveAll func
...
The func syncs parent dir in order to persist directory removal
in the event of power loss
2019-06-12 01:53:46 +03:00
Aliaksandr Valialkin
935bfd7a18
lib/fs: consistency renaming SyncPath -> MustSyncPath, since it doesn't return error
2019-06-11 23:13:49 +03:00
Aliaksandr Valialkin
ac7b186f13
all: try hard removing directory with contents
...
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
2019-06-11 01:57:59 +03:00
Aliaksandr Valialkin
cbe692f0e2
app/vmselect: add `/api/v1/labels/count` handler for quick detection of labels with the maximum number of distinct values
2019-06-10 19:55:38 +03:00
Aliaksandr Valialkin
d37924900b
lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes)
2019-06-09 19:13:56 +03:00
Aliaksandr Valialkin
28f6c36ab4
lib/storage: tune updating a map with today's metric ids
...
- Increase update interval from 1s to 10s. This should reduce CPU usage
for large amounts of metric ids with constant churn.
- Reduce pendingTodayMetricIDsLock lock duration during the update.
2019-06-02 21:58:16 +03:00
Aliaksandr Valialkin
4794f894a4
lib/storage: speed up checking metricID existence in the list for the current date
2019-06-02 18:34:08 +03:00
Aliaksandr Valialkin
e307a4d92c
lib/timerpool: use timer pool in concurrency limiters
...
This should reduce the number of memory allocations in highly loaded system
2019-05-28 17:20:10 +03:00
Aliaksandr Valialkin
54fb8b21f9
all: fix misspellings
2019-05-25 21:51:11 +03:00
Aliaksandr Valialkin
1836c415e6
all: open-sourcing single-node version
2019-05-23 00:18:06 +03:00