Commit Graph

112 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
1c777e0245 lib/storage: substitute error message about unsorted items in the index block after metricIDs merge with counter
The origin of the error has been detected and documented in the code,
so it is enough to export a counter for such errors at `vm_index_blocks_with_metric_ids_incorrect_order_total`,
so it could be monitored and alerted on high error rates.

Export also the counter for processed index blocks with metricIDs - `vm_index_blocks_with_metric_ids_processed_total`,
so its' rate could be compared to `rate(vm_index_blocks_with_metric_ids_incorrect_order_total)`.
2019-11-06 14:32:41 +02:00
Aliaksandr Valialkin
c567a4353a lib/storage: take into account the requested time range when caching TSIDs for the given tag filters 2019-11-06 14:32:41 +02:00
Aliaksandr Valialkin
c6564c5d26 lib/storage: dump incorrectly sorted items on a single line; this should simplify error reporting 2019-11-05 18:41:50 +02:00
Aliaksandr Valialkin
e5b1fa0c38 lib/storage: separate the max inverted index scan loops per metric into fast and slow loops
Slow loops could require seeks and expensive regexp matching, while fast loops just scans
all the metricIDs for the given `tag=value` prefix. So these operations must have separate
max loops multiplier.
2019-11-05 17:28:57 +02:00
Aliaksandr Valialkin
f93c4f2493 lib/storage: skip repeated useless work when intersection of metricIDs with the given filter is too expensive
This should improve performance for query filters over big number of time series.
2019-11-05 14:35:55 +02:00
Aliaksandr Valialkin
f48e97263c lib/storage: reduce the maximum inverted index scans before giving up to label filters matching by metric name
The new value reduces the amount of wasted work during index scans over big number of time series.
2019-11-05 14:35:53 +02:00
Aliaksandr Valialkin
d2f688c550 lib/storage: try potentially faster tag filters at first, then apply slower tag filters
The fastest tag filters are non-negative non-regexp, since they are the most specific.
The slowest tag filters are negative regexp, since they require scanning
all the entries for the given label.
2019-11-05 14:35:48 +02:00
Aliaksandr Valialkin
f5fbc3ffd7 lib/{storage,uint64set}: add Set.Union() function and use it 2019-11-04 00:48:32 +02:00
Aliaksandr Valialkin
23e078261e lib/storage: tune the returned value from adjustMaxMetricsAdaptive 2019-11-04 00:45:28 +02:00
Aliaksandr Valialkin
6a22727676 lib/storage: optimize getMetricIDsForRecentHours for per-tenant lookups 2019-10-31 15:51:09 +02:00
Aliaksandr Valialkin
5b01b7fb01 all: add support for GOARCH=386 and fix all the issues related to 32-bit architectures such as GOARCH=arm
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2019-10-17 18:27:49 +03:00
Aliaksandr Valialkin
661b8ede5b lib/storage: harden the check that the original items are sorted after mergeTagToMetricIDsRows fails to preserve sort order 2019-10-09 12:13:43 +03:00
Aliaksandr Valialkin
7e410e1412 lib/storage: add tests for mergeTagToMetricIDsRows and return the original items if the function breaks items` ordering.
This should save from data corruption issues revealed in the previous releases up to v1.28.0-beta5.
2019-10-08 16:35:39 +03:00
Aliaksandr Valialkin
95e3d648cb lib/storage: verify whether items are sorted in the end of call to mergeTagToMetricIDsRows
This should prevent from inverted index corruption if bug in mergeTagToMetricIDsRows is discovered.
2019-09-26 13:13:58 +03:00
Aliaksandr Valialkin
4e3871ac1e lib/storage: add missing break in removeDuplicateMetricIDs 2019-09-25 18:23:13 +03:00
Aliaksandr Valialkin
4468f9f966 lib/storage: remove duplicate MetricIDs in tag->metricIDs items before writing them into inverted index 2019-09-25 17:57:36 +03:00
Aliaksandr Valialkin
adc18c3ee6 lib/{mergeset,storage}: do not cache inverted index blocks containing tag->metricIDs items
This should reduce the amounts of used RAM during queries with filters over big number of time series.
2019-09-25 13:48:24 +03:00
Aliaksandr Valialkin
de0e4eee2c lib/storage: create and use lib/uint64set instead of map[uint64]struct{}
This should improve inverted index search performance for filters matching big number of time series,
since `lib/uint64set.Set` is faster than `map[uint64]struct{}` for both `Add` and `Has` calls.
See the corresponding benchmarks in `lib/uint64set`.
2019-09-24 21:18:04 +03:00
Aliaksandr Valialkin
2212d0e421 lib/storage: typo fix: return dstData instead of data from mergeTagToMetricIDsRows 2019-09-24 19:32:58 +03:00
Aliaksandr Valialkin
9307de1b92 lib/storage: limit the number of metricIDs in tag->metricIDs row
This reduces the overhead on index and metaindex in lib/mergeset
2019-09-24 00:50:47 +03:00
Aliaksandr Valialkin
7734fc8012 lib/storage: share tsids across all the partSearch instances
This should reduce memory usage when big number of time series matches the given query.
2019-09-23 22:36:16 +03:00
Aliaksandr Valialkin
67a2bcb98a lib/{storage,mergeset}: verify PrepareBlock callback results
Do not touch the first and the last item passed to PrepareBlock
in order to preserve sort order of mergeset blocks.
2019-09-23 20:46:33 +03:00
Aliaksandr Valialkin
d2ed8cb0b2 lib/storage: generate the first tag->metricIDs item in a mergeset block with a single metricID
The first item from each mergeset block goes into index (lib/mergeset.blockHeader),
so it must be short in order to reduce index size.
2019-09-22 19:37:50 +03:00
Aliaksandr Valialkin
7d13c31566 lib/{storage,mergeset}: merge tag->metricID rows into tag->metricIDs rows for common tag values
This should improve lookup performance if the same `label=value` pair exists
in big number of time series.
This should also reduce memory usage for mergeset data cache, since `tag->metricIDs` rows
occupy less space than the original `tag->metricID` rows.
2019-09-20 22:06:23 +03:00
Aliaksandr Valialkin
7e0c6d4ca6 lib/storage: optimize selecting all the metricIDs by scanning MetricID->TSID entries instead of tag->MetricID entries
The number of MetricID->TSID entries is smaller than the number of tag->MetricID entries
and MetricID->TSID entries are usually shorter than tag->MetricID entries.
This should improve performance when selecting all the metricIDs.
2019-09-20 11:57:57 +03:00
Aliaksandr Valialkin
89234f395d lib/storage: use sort.Sort instead of sort.slice in getSortedMetricIDs 2019-09-19 20:08:13 +03:00
Aliaksandr Valialkin
6e586fa09c lib/storage: skip duplicate call to intersectMetricIDsWithTagFilter on zero successful intersects 2019-09-19 17:51:10 +03:00
Aliaksandr Valialkin
c05885fb5f lib/storage: mark tag filter returning errFallbackToMetricNameMatch as useless
This will save CPU on subsequent calls for this filter
2019-09-18 19:11:44 +03:00
Aliaksandr Valialkin
db71c940ea lib/storage: properly construct keys for uselessTagFiltersCache and register useless negative tag filters there 2019-09-17 23:18:37 +03:00
Aliaksandr Valialkin
0b0153ba3d lib/storage: invalidate tagFilters -> TSIDS cache when newly added index data becomes visible to search
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/163
2019-08-29 15:08:44 +03:00
Aliaksandr Valialkin
a63b69e9e2 lib/storage: report proper maxMetrics limit when more than -search.maxUniqueTimeseries series match the given filters 2019-08-27 14:21:31 +03:00
Aliaksandr Valialkin
6ec6a8d7c1 lib/storage: try slower path for searching the tag filter with the minimum number of matching time series before giving up with increase -search.maxUniqueTimeseries error 2019-08-19 16:07:05 +03:00
Aliaksandr Valialkin
99eed2ca14 lib/storage: properly cache tagFilters -> TSIDs entries from historical index 2019-08-14 02:32:25 +03:00
Aliaksandr Valialkin
f1d81b9405 lib/storage: compress contents of cache for tagFilters -> TSIDs
This should increase cache capacity
2019-08-14 02:32:22 +03:00
Aliaksandr Valialkin
8c2158af24 all: use workingsetcache instead of fastcache
This should reduce the amount of RAM required for processing time series
with non-zero churn rate.

The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.
2019-08-13 21:40:28 +03:00
Aliaksandr Valialkin
b7c4b0c6d2 lib/storage: fix matching against tag filter with empty name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/137
2019-07-30 15:15:21 +03:00
Aliaksandr Valialkin
73a47d2a53 lib/storage: remove unused function isTooBigTimeRangeForDateMetricIDs 2019-07-12 02:28:40 +03:00
Aliaksandr Valialkin
97f9397687 lib/storage: do not reduce maxMetrics on time ranges exceeding maxDaysForDateMetricIDs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/95
2019-07-12 02:21:52 +03:00
Aliaksandr Valialkin
0522efb2d6 lib/storage: add missing tagFilter.Marshal func 2019-07-11 15:01:01 +03:00
Aliaksandr Valialkin
bf2e1b0ac1 lib/storage: remember and skip individual tag filters matching too many metrics
This saves CPU time by skipping useless matching for individual tag filters.
2019-07-11 14:48:47 +03:00
Aliaksandr Valialkin
ba8195c58e all: consistency renaming: bytesSize -> sizeBytes 2019-07-10 00:47:42 +03:00
Aliaksandr Valialkin
ffc1bb00f6 lib/storage: skip non-matching metricIDs in sortedFilter
This should improve performance for big sorteFilter lists.
2019-06-29 13:49:40 +03:00
Aliaksandr Valialkin
416d27ef11 lib/storage: optimize time series search by regexp filter
This should improve search speed on label filters like `{foo=~"bar.+baz"}`
2019-06-27 16:18:00 +03:00
Aliaksandr Valialkin
ee23a143b9 lib/storage: make sure non-nil args are passed to openIndexDB 2019-06-25 20:10:08 +03:00
Aliaksandr Valialkin
8b0a63722f lib/storage: reduce too big maxMetrics in getTagFilterWithMinMetricIDsCountAdaptive
This should improve performance on inverted index search for big amount of unique time series
when big -search.maxUniqueTimeseries is set.
2019-06-25 19:57:31 +03:00
Aliaksandr Valialkin
0263cb0adc lib/storage: free up memory from caches owned by indexDB when it is deleted 2019-06-25 14:41:16 +03:00
Aliaksandr Valialkin
362e187011 lib/storage: use unversioned keys for tag cache in extDB
Data in ExtDB cannot be changed, so it is OK to use unversioned keys for tag cache.
This should improve performance for index lookups over big amount of time series.
2019-06-25 13:15:42 +03:00
Aliaksandr Valialkin
51e2f3b48f lib/storage: skip searching in extDB if it doesn't contain items for the given time range
This should improve inverted index search performance for big amount
of unique time series when the search is performed only on recent data.
2019-06-25 12:57:56 +03:00
Aliaksandr Valialkin
9cac11db64 lib/storage: typo fixes found by golangci-lint; updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/69 2019-06-20 14:38:45 +03:00
Aliaksandr Valialkin
18d6f293f7 lib/fs: consolidate *RemoveAll* funcs into a single MustRemoveAll func
The func syncs parent dir in order to persist directory removal
in the event of power loss
2019-06-12 01:55:18 +03:00
Aliaksandr Valialkin
3437c30180 all: try hard removing directory with contents
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
2019-06-11 01:58:08 +03:00
Aliaksandr Valialkin
0ccedbdfd2 lib/storage: mention the accountID and projectID in error message when filtering out other (accountID, projectID) entries 2019-06-10 14:43:53 +03:00
Aliaksandr Valialkin
d54f5fec0b lib/storage: skip adaptive searching for tag filter matching the minimum number of metrics if the identical previous search didn't found such filter
This should improve speed for searching metrics among high number of time series
with high churn rate like in big Kubernetes clusters with frequent deployments.
2019-06-10 14:07:47 +03:00
Aliaksandr Valialkin
27e50e86f4 lib/storage: factor out getTagFilterWithMinMetricIDsCountAdaptive from updateMetricIDsForTagFilters 2019-06-10 13:26:00 +03:00
Aliaksandr Valialkin
b69d3dbd0c lib/storage: filter out metricIDs from another (AccountID, ProjectID) in getMetricIDsForRecentHours 2019-06-10 13:05:16 +03:00
Aliaksandr Valialkin
3059ae7be0 lib/storage: give clearer names to more functions 2019-06-10 12:59:33 +03:00
Aliaksandr Valialkin
d3a024d2d6 lib/storage: give more clear names to functions 2019-06-10 12:50:22 +03:00
Aliaksandr Valialkin
e4cba5a7ed lib/storage: make getSeriesCount func indexSearch method 2019-06-10 12:29:24 +03:00
Aliaksandr Valialkin
d882afa905 lib/storage: optimize time series lookup for recent hours when the db contains many millions of time series with high churn rate (aka frequent deployments in Kubernetes) 2019-06-09 19:14:04 +03:00
Aliaksandr Valialkin
bdf696ef18 all: fix misspellings 2019-05-25 21:51:24 +03:00
Aliaksandr Valialkin
24578b4bb1 all: open-sourcing cluster version 2019-05-23 00:25:38 +03:00
Aliaksandr Valialkin
1836c415e6 all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00