package storage

import (
	"bytes"
	"container/heap"
	"errors"
	"fmt"
	"io"
	"path/filepath"
	"sort"
	"strconv"
	"sync"
	"sync/atomic"
	"time"
	"unsafe"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/uint64set"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache"
	"github.com/VictoriaMetrics/fastcache"
	"github.com/cespare/xxhash/v2"
)
const (
	// Prefix for MetricName->TSID entries.
	//
	// This index was substituted with nsPrefixDateMetricNameToTSID,
	// since the MetricName->TSID index may require big amounts of memory for indexdb/dataBlocks cache
	// when it grows big on the configured retention under high churn rate
	// (e.g. when new time series are constantly registered).
	//
	// It is much more efficient from memory usage PoV to query per-day MetricName->TSID index
	// (aka nsPrefixDateMetricNameToTSID) when the TSID must be obtained for the given MetricName
	// during data ingestion under high churn rate and big retention.
	//
	// nsPrefixMetricNameToTSID = 0

	// Prefix for Tag->MetricID entries.
	nsPrefixTagToMetricIDs = 1

	// Prefix for MetricID->TSID entries.
	nsPrefixMetricIDToTSID = 2

	// Prefix for MetricID->MetricName entries.
	nsPrefixMetricIDToMetricName = 3

	// Prefix for deleted MetricID entries.
	nsPrefixDeletedMetricID = 4

	// Prefix for Date->MetricID entries.
	nsPrefixDateToMetricID = 5

	// Prefix for (Date,Tag)->MetricID entries.
	nsPrefixDateTagToMetricIDs = 6
	// Prefix for (Date,MetricName)->TSID entries.
	nsPrefixDateMetricNameToTSID = 7
)
// indexDB represents an index db.
type indexDB struct {
	// The number of references to indexDB struct.
	refCount atomic.Int32

	// if the mustDrop is set to true, then the indexDB must be dropped after refCount reaches zero.
	mustDrop atomic.Bool

	// The number of missing MetricID -> TSID entries.
	// High rate for this value means corrupted indexDB.
	missingTSIDsForMetricID atomic.Uint64

	// The number of calls for date range searches.
	dateRangeSearchCalls atomic.Uint64

	// The number of hits for date range searches.
	dateRangeSearchHits atomic.Uint64

	// The number of calls for global search.
	globalSearchCalls atomic.Uint64

	// missingMetricNamesForMetricID is a counter of missing MetricID -> MetricName entries.
	// High rate may mean corrupted indexDB due to unclean shutdown.
	// The db must be automatically recovered after that.
	missingMetricNamesForMetricID atomic.Uint64
	// generation identifies the index generation ID
	// and is used for syncing items from different indexDBs
	generation uint64

	name string
	tb   *mergeset.Table

	extDB     *indexDB
	extDBLock sync.Mutex

	// Cache for fast TagFilters -> MetricIDs lookup.
	tagFiltersToMetricIDsCache *workingsetcache.Cache

	// The parent storage.
	s *Storage

	// Cache for (date, tagFilter) -> loopsCount, which is used for reducing
	// the amount of work when matching a set of filters.
	loopsPerDateTagFilterCache *workingsetcache.Cache

	indexSearchPool sync.Pool
}
var maxTagFiltersCacheSize int

// SetTagFiltersCacheSize overrides the default size of tagFiltersToMetricIDsCache
func SetTagFiltersCacheSize(size int) {
	maxTagFiltersCacheSize = size
}

func getTagFiltersCacheSize() int {
	if maxTagFiltersCacheSize <= 0 {
		return int(float64(memory.Allowed()) / 32)
	}
	return maxTagFiltersCacheSize
}
// mustOpenIndexDB opens index db from the given path.
//
// The last segment of the path must be a unique hex value,
// which is then used as indexDB.generation.
func mustOpenIndexDB(path string, s *Storage, isReadOnly *atomic.Bool) *indexDB {
	if s == nil {
		logger.Panicf("BUG: Storage must be non-nil")
	}
	name := filepath.Base(path)
	gen, err := strconv.ParseUint(name, 16, 64)
	if err != nil {
		logger.Panicf("FATAL: cannot parse indexdb path %q: %s", path, err)
	}

	tb := mergeset.MustOpenTable(path, invalidateTagFiltersCache, mergeTagToMetricIDsRows, isReadOnly)

	// Do not persist tagFiltersToMetricIDsCache in files, since it is very volatile because of tagFiltersKeyGen.
	mem := memory.Allowed()
	tagFiltersCacheSize := getTagFiltersCacheSize()

	db := &indexDB{
		generation: gen,
		tb:         tb,
		name:       name,

		tagFiltersToMetricIDsCache: workingsetcache.New(tagFiltersCacheSize),
		s:                          s,
		loopsPerDateTagFilterCache: workingsetcache.New(mem / 128),
	}
	db.incRef()
	return db
}
const noDeadline = 1<<64 - 1

// IndexDBMetrics contains essential metrics for indexDB.
type IndexDBMetrics struct {
	TagFiltersToMetricIDsCacheSize         uint64
	TagFiltersToMetricIDsCacheSizeBytes    uint64
	TagFiltersToMetricIDsCacheSizeMaxBytes uint64
	TagFiltersToMetricIDsCacheRequests     uint64
	TagFiltersToMetricIDsCacheMisses       uint64

	DeletedMetricsCount uint64

	IndexDBRefCount uint64

	MissingTSIDsForMetricID uint64

	RecentHourMetricIDsSearchCalls uint64
	RecentHourMetricIDsSearchHits  uint64

	DateRangeSearchCalls uint64
	DateRangeSearchHits  uint64
	GlobalSearchCalls    uint64

	MissingMetricNamesForMetricID uint64

	IndexBlocksWithMetricIDsProcessed      uint64
	IndexBlocksWithMetricIDsIncorrectOrder uint64

	MinTimestampForCompositeIndex     uint64
	CompositeFilterSuccessConversions uint64
	CompositeFilterMissingConversions uint64

	mergeset.TableMetrics
}
func (db *indexDB) scheduleToDrop() {
	db.mustDrop.Store(true)
}
// UpdateMetrics updates m with metrics from the db.
func (db *indexDB) UpdateMetrics(m *IndexDBMetrics) {
	var cs fastcache.Stats

	cs.Reset()
	db.tagFiltersToMetricIDsCache.UpdateStats(&cs)
	m.TagFiltersToMetricIDsCacheSize += cs.EntriesCount
	m.TagFiltersToMetricIDsCacheSizeBytes += cs.BytesSize
	m.TagFiltersToMetricIDsCacheSizeMaxBytes += cs.MaxBytesSize
	m.TagFiltersToMetricIDsCacheRequests += cs.GetCalls
	m.TagFiltersToMetricIDsCacheMisses += cs.Misses

	m.DeletedMetricsCount += uint64(db.s.getDeletedMetricIDs().Len())

	m.IndexDBRefCount += uint64(db.refCount.Load())
	m.MissingTSIDsForMetricID += db.missingTSIDsForMetricID.Load()

	m.DateRangeSearchCalls += db.dateRangeSearchCalls.Load()
	m.DateRangeSearchHits += db.dateRangeSearchHits.Load()
	m.GlobalSearchCalls += db.globalSearchCalls.Load()

	m.MissingMetricNamesForMetricID += db.missingMetricNamesForMetricID.Load()

	m.IndexBlocksWithMetricIDsProcessed = indexBlocksWithMetricIDsProcessed.Load()
	m.IndexBlocksWithMetricIDsIncorrectOrder = indexBlocksWithMetricIDsIncorrectOrder.Load()

	m.MinTimestampForCompositeIndex = uint64(db.s.minTimestampForCompositeIndex)
	m.CompositeFilterSuccessConversions = compositeFilterSuccessConversions.Load()
	m.CompositeFilterMissingConversions = compositeFilterMissingConversions.Load()

	db.tb.UpdateMetrics(&m.TableMetrics)
	db.doExtDB(func(extDB *indexDB) {
		extDB.tb.UpdateMetrics(&m.TableMetrics)
		m.IndexDBRefCount += uint64(extDB.refCount.Load())
	})
}
func (db *indexDB) doExtDB(f func(extDB *indexDB)) bool {
	db.extDBLock.Lock()
	extDB := db.extDB
	if extDB != nil {
		extDB.incRef()
	}
	db.extDBLock.Unlock()
	if extDB == nil {
		return false
	}
	f(extDB)
	extDB.decRef()
	return true
}

// SetExtDB sets external db to search.
//
// It decrements refCount for the previous extDB.
func (db *indexDB) SetExtDB(extDB *indexDB) {
	db.extDBLock.Lock()
	prevExtDB := db.extDB
	db.extDB = extDB
	db.extDBLock.Unlock()
	if prevExtDB != nil {
		prevExtDB.decRef()
	}
}

// MustClose closes db.
func (db *indexDB) MustClose() {
	db.decRef()
}
func (db *indexDB) incRef() {
	db.refCount.Add(1)
}

func (db *indexDB) decRef() {
	n := db.refCount.Add(-1)
	if n < 0 {
		logger.Panicf("BUG: negative refCount: %d", n)
	}
	if n > 0 {
		return
	}

	tbPath := db.tb.Path()
	db.tb.MustClose()
	db.SetExtDB(nil)

	// Free space occupied by caches owned by db.
	db.tagFiltersToMetricIDsCache.Stop()
	db.loopsPerDateTagFilterCache.Stop()

	db.tagFiltersToMetricIDsCache = nil
	db.s = nil
	db.loopsPerDateTagFilterCache = nil

	if !db.mustDrop.Load() {
		return
	}

	logger.Infof("dropping indexDB %q", tbPath)
	fs.MustRemoveDirAtomic(tbPath)
	logger.Infof("indexDB %q has been dropped", tbPath)
}
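The reference-counting pattern used by `incRef`, `decRef` and `doExtDB` can be illustrated in isolation. The following is a minimal sketch, not the actual indexDB code: the `resource` type, its `closed` flag and `newResource` are hypothetical names introduced only for this example, and the assumption is that the creator holds the initial reference.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// resource is a hypothetical stand-in for a reference-counted object such as indexDB.
type resource struct {
	refCount atomic.Int32
	closed   atomic.Bool
}

// newResource returns a resource with refCount=1 held by the creator (an assumption of this sketch).
func newResource() *resource {
	r := &resource{}
	r.refCount.Store(1)
	return r
}

func (r *resource) incRef() {
	r.refCount.Add(1)
}

// decRef releases one reference and frees the resource when the last one is gone.
func (r *resource) decRef() {
	n := r.refCount.Add(-1)
	if n < 0 {
		panic(fmt.Sprintf("BUG: negative refCount: %d", n))
	}
	if n > 0 {
		return
	}
	// The last reference has been released, so cleanup is safe here.
	r.closed.Store(true)
}

func main() {
	r := newResource()
	r.incRef() // a concurrent reader takes a reference
	r.decRef() // the owner closes the resource; the reader still holds it
	fmt.Println(r.closed.Load())
	r.decRef() // the reader is done; the resource is freed now
	fmt.Println(r.closed.Load())
}
```

The point of the design is that the owner may call the close method at any time, while the actual cleanup is deferred until every in-flight user has released its reference.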
var tagBufPool bytesutil.ByteBufferPool

func (db *indexDB) getMetricIDsFromTagFiltersCache(qt *querytracer.Tracer, key []byte) ([]uint64, bool) {
	qt = qt.NewChild("search for metricIDs in tag filters cache")
	defer qt.Done()
	buf := tagBufPool.Get()
	defer tagBufPool.Put(buf)
	buf.B = db.tagFiltersToMetricIDsCache.GetBig(buf.B[:0], key)
	if len(buf.B) == 0 {
		qt.Printf("cache miss")
		return nil, false
	}
	qt.Printf("found metricIDs with size: %d bytes", len(buf.B))
	metricIDs := mustUnmarshalMetricIDs(nil, buf.B)
	qt.Printf("unmarshaled %d metricIDs", len(metricIDs))
	return metricIDs, true
}

func (db *indexDB) putMetricIDsToTagFiltersCache(qt *querytracer.Tracer, metricIDs []uint64, key []byte) {
	qt = qt.NewChild("put %d metricIDs in cache", len(metricIDs))
	defer qt.Done()
	buf := tagBufPool.Get()
	buf.B = marshalMetricIDs(buf.B, metricIDs)
	qt.Printf("marshaled %d metricIDs into %d bytes", len(metricIDs), len(buf.B))
	db.tagFiltersToMetricIDsCache.SetBig(key, buf.B)
	qt.Printf("stored %d metricIDs into cache", len(metricIDs))
	tagBufPool.Put(buf)
}
func (db *indexDB) getFromMetricIDCache(dst *TSID, metricID uint64) error {
	// There is no need to check for deleted metricIDs here, since they
	// must be checked by the caller.
	buf := (*[unsafe.Sizeof(*dst)]byte)(unsafe.Pointer(dst))
	key := (*[unsafe.Sizeof(metricID)]byte)(unsafe.Pointer(&metricID))
	tmp := db.s.metricIDCache.Get(buf[:0], key[:])
	if len(tmp) == 0 {
		// The TSID for the given metricID wasn't found in the cache.
		return io.EOF
	}
	if &tmp[0] != &buf[0] || len(tmp) != len(buf) {
		return fmt.Errorf("corrupted MetricID->TSID cache: unexpected size for metricID=%d value; got %d bytes; want %d bytes", metricID, len(tmp), len(buf))
	}
	return nil
}

func (db *indexDB) putToMetricIDCache(metricID uint64, tsid *TSID) {
	buf := (*[unsafe.Sizeof(*tsid)]byte)(unsafe.Pointer(tsid))
	key := (*[unsafe.Sizeof(metricID)]byte)(unsafe.Pointer(&metricID))
	db.s.metricIDCache.Set(key[:], buf[:])
}
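The two functions above view a fixed-size struct as a byte array, so that it can be stored in a byte-oriented cache without an explicit marshal step. A minimal sketch of that technique follows. The `record` type and `recordBytes` helper are hypothetical; the sketch assumes a struct with fixed-size fields only and no pointers, and the resulting bytes depend on the platform's endianness, so they are suitable only for in-process caches.

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical fixed-size struct without pointers or padding.
type record struct {
	GroupID    uint64
	JobID      uint32
	InstanceID uint32
	ID         uint64
}

// recordBytes returns a byte view over r without copying.
// The returned slice aliases r: writes to it modify r.
func recordBytes(r *record) []byte {
	return (*[unsafe.Sizeof(*r)]byte)(unsafe.Pointer(r))[:]
}

func main() {
	src := record{GroupID: 1, JobID: 2, InstanceID: 3, ID: 4}

	// "Store" the record as raw bytes, as a byte-oriented cache would do.
	stored := append([]byte(nil), recordBytes(&src)...)

	// "Load" the record by copying the stored bytes over the destination.
	var dst record
	copy(recordBytes(&dst), stored)

	fmt.Println(len(stored), dst == src)
}
```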
func (db *indexDB) getMetricNameFromCache(dst []byte, metricID uint64) []byte {
	// There is no need to check for deleted metricIDs here, since they
	// must be checked by the caller.
	key := (*[unsafe.Sizeof(metricID)]byte)(unsafe.Pointer(&metricID))
	return db.s.metricNameCache.Get(dst, key[:])
}

func (db *indexDB) putMetricNameToCache(metricID uint64, metricName []byte) {
	key := (*[unsafe.Sizeof(metricID)]byte)(unsafe.Pointer(&metricID))
	db.s.metricNameCache.Set(key[:], metricName)
}
func marshalTagFiltersKey(dst []byte, tfss []*TagFilters, tr TimeRange, versioned bool) []byte {
	// There is no need to version the tagFilters key, since the tagFiltersToMetricIDsCache
	// isn't persisted to disk (it is very volatile because of tagFiltersKeyGen).
	prefix := ^uint64(0)
	if versioned {
		prefix = tagFiltersKeyGen.Load()
	}
	// Round start and end times to per-day granularity according to per-day inverted index.
	startDate := uint64(tr.MinTimestamp) / msecPerDay
	endDate := uint64(tr.MaxTimestamp-1) / msecPerDay
	dst = encoding.MarshalUint64(dst, prefix)
	dst = encoding.MarshalUint64(dst, startDate)
	dst = encoding.MarshalUint64(dst, endDate)
	for _, tfs := range tfss {
		dst = append(dst, 0) // separator between tfs groups.
		for i := range tfs.tfs {
			dst = tfs.tfs[i].Marshal(dst)
		}
	}
	return dst
}

func invalidateTagFiltersCache() {
	// This function must be fast, since it is called each time a new time series is added.
	tagFiltersKeyGen.Add(1)
}

var tagFiltersKeyGen atomic.Uint64
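The generation counter above invalidates the whole cache in O(1): instead of deleting entries, every key embeds the current generation, so bumping the counter makes all previously stored keys unreachable. The sketch below shows the idea with standard-library calls only. `keyGen`, `cacheKey` and the string-based filter are hypothetical simplifications, not the real `marshalTagFiltersKey` signature.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync/atomic"
)

const msecPerDay = 24 * 3600 * 1000

// keyGen is a hypothetical generation counter embedded into every cache key.
var keyGen atomic.Uint64

// cacheKey builds a key from the current generation, the time range rounded
// to whole days, and the filter text.
func cacheKey(filter string, minTimestamp, maxTimestamp int64) []byte {
	var dst []byte
	dst = binary.BigEndian.AppendUint64(dst, keyGen.Load())
	dst = binary.BigEndian.AppendUint64(dst, uint64(minTimestamp)/msecPerDay)
	dst = binary.BigEndian.AppendUint64(dst, uint64(maxTimestamp-1)/msecPerDay)
	return append(dst, filter...)
}

func main() {
	cache := map[string][]uint64{}

	key := cacheKey(`job="vmstorage"`, 0, msecPerDay)
	cache[string(key)] = []uint64{1, 2, 3}

	// Queries within the same day map to the same key.
	_, hit := cache[string(cacheKey(`job="vmstorage"`, 1000, msecPerDay))]
	fmt.Println("same day hit:", hit)

	// Bumping the generation makes every stored key unreachable.
	keyGen.Add(1)
	_, hit = cache[string(cacheKey(`job="vmstorage"`, 0, msecPerDay))]
	fmt.Println("hit after invalidation:", hit)
}
```

Stale entries are not removed explicitly in this scheme; it relies on the cache evicting unreachable entries on its own, which a plain map (used here only for brevity) does not do.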
func marshalMetricIDs(dst []byte, metricIDs []uint64) []byte {
	// Compress metricIDs, so they occupy less space in the cache.
	//
	// The srcBuf is a []byte cast of metricIDs.
	srcBuf := unsafe.Slice((*byte)(unsafe.Pointer(unsafe.SliceData(metricIDs))), 8*len(metricIDs))

	dst = encoding.CompressZSTDLevel(dst, srcBuf, 1)
	return dst
}

func mustUnmarshalMetricIDs(dst []uint64, src []byte) []uint64 {
	// Decompress src into dstBuf.
	//
	// dstBuf is a []byte cast of dst.
	dstBuf := unsafe.Slice((*byte)(unsafe.Pointer(unsafe.SliceData(dst))), 8*cap(dst))
	dstBuf = dstBuf[:8*len(dst)]
	dstBufLen := len(dstBuf)
	var err error
	dstBuf, err = encoding.DecompressZSTD(dstBuf, src)
	if err != nil {
		logger.Panicf("FATAL: cannot decompress metricIDs: %s", err)
	}
	if len(dstBuf) == dstBufLen {
		// Zero metricIDs
		return dst
	}
	if (len(dstBuf)-dstBufLen)%8 != 0 {
		logger.Panicf("FATAL: cannot unmarshal metricIDs from buffer of %d bytes; the buffer length must divide by 8", len(dstBuf)-dstBufLen)
	}

	// Convert dstBuf back to dst
	dst = unsafe.Slice((*uint64)(unsafe.Pointer(unsafe.SliceData(dstBuf))), cap(dstBuf)/8)
	dst = dst[:len(dstBuf)/8]

	return dst
}
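For comparison, the same `[]uint64` to `[]byte` round trip can be written without `unsafe`, at the cost of a per-element copy. This is a simplified sketch: `marshalIDs` and `unmarshalIDs` are hypothetical names, the ZSTD compression step is intentionally omitted, and a fixed little-endian layout is assumed instead of the native in-memory layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// marshalIDs appends ids to dst as 8-byte little-endian values.
func marshalIDs(dst []byte, ids []uint64) []byte {
	for _, id := range ids {
		dst = binary.LittleEndian.AppendUint64(dst, id)
	}
	return dst
}

// unmarshalIDs appends ids decoded from src to dst.
// It returns an error instead of panicking when src has an invalid length.
func unmarshalIDs(dst []uint64, src []byte) ([]uint64, error) {
	if len(src)%8 != 0 {
		return dst, fmt.Errorf("cannot unmarshal ids from %d bytes; the length must be divisible by 8", len(src))
	}
	for len(src) > 0 {
		dst = append(dst, binary.LittleEndian.Uint64(src))
		src = src[8:]
	}
	return dst, nil
}

func main() {
	ids := []uint64{1, 42, 1 << 40}
	buf := marshalIDs(nil, ids)
	got, err := unmarshalIDs(nil, buf)
	fmt.Println(len(buf), got, err)
}
```

The `unsafe.Slice` variant in the source avoids this copy by reinterpreting the slice memory in place, which is why it has to validate that the decompressed length is a multiple of 8 before converting back.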
lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no corresponding entry either.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9350795af41cbfc3ca0e8a5af084fce
// getTSIDByMetricName fills the dst with TSID for the given metricName at the given date.
//
// It returns false if the given metricName isn't found in the indexdb.
func (is *indexSearch) getTSIDByMetricName(dst *generationTSID, metricName []byte, date uint64) bool {
	if is.getTSIDByMetricNameNoExtDB(&dst.TSID, metricName, date) {
		// Fast path - the TSID is found in the current indexdb.
		dst.generation = is.db.generation
		return true
	}

	// Slow path - search for the TSID in the previous indexdb
	ok := false
	deadline := is.deadline
	is.db.doExtDB(func(extDB *indexDB) {
		is := extDB.getIndexSearch(deadline)
		ok = is.getTSIDByMetricNameNoExtDB(&dst.TSID, metricName, date)
		extDB.putIndexSearch(is)
		if ok {
			dst.generation = extDB.generation
		}
	})
	return ok
}
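The lookup above follows a two-level scheme: the current index is consulted first, and the previous one only on a miss, with the generation of the index that answered recorded alongside the result. A minimal sketch under simplified assumptions follows; the `index` type, its map-based `entries` and the `lookup` method are hypothetical and stand in for the on-disk indexdb.

```go
package main

import "fmt"

// index is a hypothetical in-memory stand-in for an indexdb generation.
type index struct {
	generation uint64
	entries    map[string]uint64 // metric name -> series ID
	prev       *index            // the previous generation; may be nil
}

// lookup returns the series ID for name and the generation of the index
// where it was found.
func (ix *index) lookup(name string) (id, generation uint64, ok bool) {
	if id, ok := ix.entries[name]; ok {
		// Fast path: found in the current index.
		return id, ix.generation, true
	}
	if ix.prev == nil {
		return 0, 0, false
	}
	// Slow path: fall back to the previous index.
	if id, ok := ix.prev.entries[name]; ok {
		return id, ix.prev.generation, true
	}
	return 0, 0, false
}

func main() {
	prev := &index{generation: 1, entries: map[string]uint64{"old_series": 10}}
	curr := &index{generation: 2, entries: map[string]uint64{"new_series": 20}, prev: prev}

	fmt.Println(curr.lookup("new_series"))
	fmt.Println(curr.lookup("old_series"))
	fmt.Println(curr.lookup("missing"))
}
```

Returning the generation lets the caller tell whether an entry still has to be registered in the current index.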
type indexSearch struct {
	db *indexDB
	ts mergeset.TableSearch
	kb bytesutil.ByteBuffer
	mp tagToMetricIDsRowParser

	// deadline in unix timestamp seconds for the given search.
	deadline uint64
}

func (db *indexDB) getIndexSearch(deadline uint64) *indexSearch {
	v := db.indexSearchPool.Get()
	if v == nil {
		v = &indexSearch{
			db: db,
		}
	}
	is := v.(*indexSearch)
	is.ts.Init(db.tb)
	is.deadline = deadline
	return is
}

func (db *indexDB) putIndexSearch(is *indexSearch) {
	is.ts.MustClose()
	is.kb.Reset()
	is.mp.Reset()
	is.deadline = 0
	db.indexSearchPool.Put(is)
}
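`getIndexSearch` and `putIndexSearch` follow the usual `sync.Pool` get/reset/put discipline: per-call state is set on acquisition and cleared before the object goes back to the pool. A minimal sketch of that discipline, with a hypothetical `searcher` type in place of `indexSearch`:

```go
package main

import (
	"fmt"
	"sync"
)

// searcher is a hypothetical reusable object with a scratch buffer.
type searcher struct {
	buf      []byte
	deadline uint64
}

var searcherPool sync.Pool

// getSearcher obtains a searcher from the pool or allocates a new one.
func getSearcher(deadline uint64) *searcher {
	v := searcherPool.Get()
	if v == nil {
		v = &searcher{}
	}
	s := v.(*searcher)
	s.deadline = deadline
	return s
}

// putSearcher resets per-call state and returns s to the pool.
// The buffer capacity is kept, so the next user avoids a reallocation.
func putSearcher(s *searcher) {
	s.buf = s.buf[:0]
	s.deadline = 0
	searcherPool.Put(s)
}

func main() {
	s := getSearcher(100)
	s.buf = append(s.buf, "scratch data"...)
	putSearcher(s)

	s2 := getSearcher(200)
	fmt.Println(len(s2.buf), s2.deadline)
	putSearcher(s2)
}
```

Callers must not use the object after `putSearcher`, since another goroutine may obtain it immediately.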
func generateTSID(dst *TSID, mn *MetricName) {
	dst.MetricGroupID = xxhash.Sum64(mn.MetricGroup)
	// Assume that the job-like label is put at mn.Tags[0], while the instance-like label is put at mn.Tags[1].
	// This assumption is true because mn.Tags must be sorted with mn.sortTags() before calling generateTSID() function.
	// This allows grouping data blocks for the same (job, instance) close to each other on disk.
	// This reduces disk seeks and disk read IO when data blocks are read from disk for the same job and/or instance.
	// For example, data blocks for time series matching `process_resident_memory_bytes{job="vmstorage"}` are physically adjacent on disk.
	if len(mn.Tags) > 0 {
		dst.JobID = uint32(xxhash.Sum64(mn.Tags[0].Value))
	}
	if len(mn.Tags) > 1 {
		dst.InstanceID = uint32(xxhash.Sum64(mn.Tags[1].Value))
	}
	dst.MetricID = generateUniqueMetricID()
}
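The effect of deriving `JobID` and `InstanceID` from label values can be seen in a small stand-alone sketch. It is not the real implementation: FNV-1a from the standard library is swapped in for xxhash, and the `tag`, `seriesID`, `hash64`, `generateID` and `nextMetricID` names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

type tag struct {
	Key   string
	Value string
}

// seriesID is a hypothetical simplified analogue of TSID.
type seriesID struct {
	MetricGroupID uint64
	JobID         uint32
	InstanceID    uint32
	MetricID      uint64
}

// nextMetricID is a stand-in for a unique ID generator.
var nextMetricID atomic.Uint64

// hash64 hashes s with FNV-1a (used here instead of xxhash).
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// generateID assumes tags are already sorted so that the job-like label
// comes first and the instance-like label comes second.
func generateID(metricGroup string, tags []tag) seriesID {
	var id seriesID
	id.MetricGroupID = hash64(metricGroup)
	if len(tags) > 0 {
		id.JobID = uint32(hash64(tags[0].Value))
	}
	if len(tags) > 1 {
		id.InstanceID = uint32(hash64(tags[1].Value))
	}
	id.MetricID = nextMetricID.Add(1)
	return id
}

func main() {
	a := generateID("process_resident_memory_bytes", []tag{{"job", "vmstorage"}, {"instance", "host-1"}})
	b := generateID("process_resident_memory_bytes", []tag{{"job", "vmstorage"}, {"instance", "host-2"}})

	// Series of the same metric and job share the leading ID components,
	// so they sort next to each other when ordered by seriesID.
	fmt.Println(a.MetricGroupID == b.MetricGroupID, a.JobID == b.JobID, a.MetricID != b.MetricID)
}
```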
func (is *indexSearch) createGlobalIndexes(tsid *TSID, mn *MetricName) {
	ii := getIndexItems()
	defer putIndexItems(ii)

	// Create metricID -> metricName entry.
	ii.B = marshalCommonPrefix(ii.B, nsPrefixMetricIDToMetricName)
	ii.B = encoding.MarshalUint64(ii.B, tsid.MetricID)
	ii.B = mn.Marshal(ii.B)
	ii.Next()

	// Create metricID -> TSID entry.
	ii.B = marshalCommonPrefix(ii.B, nsPrefixMetricIDToTSID)
	ii.B = encoding.MarshalUint64(ii.B, tsid.MetricID)
	ii.B = tsid.Marshal(ii.B)
	ii.Next()

	// Create tag -> metricID entries for every tag in mn.
	kb := kbPool.Get()
	kb.B = marshalCommonPrefix(kb.B[:0], nsPrefixTagToMetricIDs)
	ii.registerTagIndexes(kb.B, mn, tsid.MetricID)
	kbPool.Put(kb)

	is.db.tb.AddItems(ii.Items)
}
type indexItems struct {
	B     []byte
	Items [][]byte

	start int
}

func (ii *indexItems) reset() {
	ii.B = ii.B[:0]
	ii.Items = ii.Items[:0]
	ii.start = 0
}

func (ii *indexItems) Next() {
	ii.Items = append(ii.Items, ii.B[ii.start:])
	ii.start = len(ii.B)
}

func getIndexItems() *indexItems {
	v := indexItemsPool.Get()
	if v == nil {
		return &indexItems{}
	}
	return v.(*indexItems)
}

func putIndexItems(ii *indexItems) {
	ii.reset()
	indexItemsPool.Put(ii)
}

var indexItemsPool sync.Pool
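`indexItems` packs many variable-length items into one growing buffer and records each item as a sub-slice of it, which avoids one allocation per item. The sketch below reproduces the idea with a hypothetical `items` type; the keys and values are made up for the example.

```go
package main

import "fmt"

// items accumulates variable-length entries in a single buffer.
type items struct {
	B     []byte   // the shared buffer the caller appends to
	Items [][]byte // completed entries, each a sub-slice of a buffer
	start int      // offset in B where the current entry begins
}

// next finishes the current entry: everything appended to B since the
// previous call becomes one item.
func (it *items) next() {
	it.Items = append(it.Items, it.B[it.start:])
	it.start = len(it.B)
}

func main() {
	var it items

	it.B = append(it.B, "id->name:cpu_usage"...)
	it.next()

	it.B = append(it.B, "id->tsid:0001"...)
	it.next()

	for _, item := range it.Items {
		fmt.Printf("%s\n", item)
	}
}
```

Earlier items stay valid even when a later `append` reallocates `B`, because each recorded sub-slice keeps referencing the backing array it was created from.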
// SearchLabelNamesWithFiltersOnTimeRange returns all the label names, which match the given tfss on the given tr.
func (db *indexDB) SearchLabelNamesWithFiltersOnTimeRange(qt *querytracer.Tracer, tfss []*TagFilters, tr TimeRange, maxLabelNames, maxMetrics int, deadline uint64) ([]string, error) {
	qt = qt.NewChild("search for label names: filters=%s, timeRange=%s, maxLabelNames=%d, maxMetrics=%d", tfss, &tr, maxLabelNames, maxMetrics)
	defer qt.Done()

	lns := make(map[string]struct{})
	qtChild := qt.NewChild("search for label names in the current indexdb")
	is := db.getIndexSearch(deadline)
	err := is.searchLabelNamesWithFiltersOnTimeRange(qtChild, lns, tfss, tr, maxLabelNames, maxMetrics)
	db.putIndexSearch(is)
	qtChild.Donef("found %d label names", len(lns))
	if err != nil {
		return nil, err
	}

	db.doExtDB(func(extDB *indexDB) {
		qtChild := qt.NewChild("search for label names in the previous indexdb")
		lnsLen := len(lns)
		is := extDB.getIndexSearch(deadline)
		err = is.searchLabelNamesWithFiltersOnTimeRange(qtChild, lns, tfss, tr, maxLabelNames, maxMetrics)
		extDB.putIndexSearch(is)
		qtChild.Donef("found %d additional label names", len(lns)-lnsLen)
	})
	if err != nil {
		return nil, err
	}

	labelNames := make([]string, 0, len(lns))
	for labelName := range lns {
		labelNames = append(labelNames, labelName)
	}
	// Do not sort label names, since they must be sorted by vmselect.
	qt.Printf("found %d label names in the current and the previous indexdb", len(labelNames))
	return labelNames, nil
}
2022-06-12 03:32:13 +02:00
func ( is * indexSearch ) searchLabelNamesWithFiltersOnTimeRange ( qt * querytracer . Tracer , lns map [ string ] struct { } , tfss [ ] * TagFilters , tr TimeRange , maxLabelNames , maxMetrics int ) error {
2020-11-04 23:15:43 +01:00
minDate := uint64 ( tr . MinTimestamp ) / msecPerDay
2022-06-12 03:32:13 +02:00
maxDate := uint64 ( tr . MaxTimestamp - 1 ) / msecPerDay
if maxDate == 0 || minDate > maxDate || maxDate - minDate > maxDaysForPerDaySearch {
qtChild := qt . NewChild ( "search for label names in global index: filters=%s" , tfss )
err := is . searchLabelNamesWithFiltersOnDate ( qtChild , lns , tfss , 0 , maxLabelNames , maxMetrics )
qtChild . Done ( )
return err
2021-04-07 12:31:57 +02:00
}
2020-11-04 23:15:43 +01:00
var mu sync . Mutex
2022-04-06 12:34:00 +02:00
wg := getWaitGroup ( )
2020-11-04 23:15:43 +01:00
var errGlobal error
2022-06-12 03:32:13 +02:00
qt = qt . NewChild ( "parallel search for label names: filters=%s, timeRange=%s" , tfss , & tr )
2020-11-04 23:15:43 +01:00
for date := minDate ; date <= maxDate ; date ++ {
wg . Add ( 1 )
2022-06-30 17:17:07 +02:00
qtChild := qt . NewChild ( "search for label names: filters=%s, date=%s" , tfss , dateToString ( date ) )
2020-11-04 23:15:43 +01:00
go func ( date uint64 ) {
2022-06-12 03:32:13 +02:00
defer func ( ) {
qtChild . Done ( )
wg . Done ( )
} ( )
lnsLocal := make ( map [ string ] struct { } )
2020-11-04 23:15:43 +01:00
isLocal := is . db . getIndexSearch ( is . deadline )
2022-06-12 03:32:13 +02:00
err := isLocal . searchLabelNamesWithFiltersOnDate ( qtChild , lnsLocal , tfss , date , maxLabelNames , maxMetrics )
2020-11-04 23:15:43 +01:00
is . db . putIndexSearch ( isLocal )
mu . Lock ( )
defer mu . Unlock ( )
if errGlobal != nil {
return
}
if err != nil {
errGlobal = err
return
}
2022-06-12 03:32:13 +02:00
if len ( lns ) >= maxLabelNames {
2020-11-04 23:15:43 +01:00
return
}
2022-06-12 03:32:13 +02:00
for k := range lnsLocal {
lns [ k ] = struct { } { }
2020-11-04 23:15:43 +01:00
}
} ( date )
}
wg . Wait ( )
2022-04-06 12:34:00 +02:00
putWaitGroup ( wg )
2022-06-12 03:32:13 +02:00
qt . Done ( )
2020-11-04 23:15:43 +01:00
return errGlobal
}
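// The per-day fan-out used above can be illustrated in isolation. The block below is a
// minimal sketch, not the code from this file: dateRange, searchPerDay and the searchDay
// callback are hypothetical names, and a plain sync.WaitGroup stands in for the pooled
// getWaitGroup/putWaitGroup helpers. It mirrors the date arithmetic and the
// mutex-guarded merge of per-day result sets, recording the first error seen.

```go
package main

import (
	"fmt"
	"sync"
)

const msecPerDay = 24 * 3600 * 1000

// dateRange converts a millisecond time range into an inclusive range of
// day numbers, using the same arithmetic as the function above.
func dateRange(minTimestamp, maxTimestamp int64) (minDate, maxDate uint64) {
	minDate = uint64(minTimestamp) / msecPerDay
	maxDate = uint64(maxTimestamp-1) / msecPerDay
	return minDate, maxDate
}

// searchPerDay calls searchDay (a hypothetical per-day lookup) concurrently for
// every date in [minDate, maxDate] and merges the per-day sets into one set.
// The mutex protects both the merged set and the first recorded error.
func searchPerDay(minDate, maxDate uint64, limit int,
	searchDay func(date uint64) (map[string]struct{}, error)) (map[string]struct{}, error) {
	result := make(map[string]struct{})
	var mu sync.Mutex
	var wg sync.WaitGroup
	var errGlobal error
	for date := minDate; date <= maxDate; date++ {
		wg.Add(1)
		go func(date uint64) {
			defer wg.Done()
			// Search into a local set first, so the lock is held only while merging.
			local, err := searchDay(date)
			mu.Lock()
			defer mu.Unlock()
			if errGlobal != nil {
				return
			}
			if err != nil {
				errGlobal = err
				return
			}
			if len(result) >= limit {
				// The limit was already reached by other days; drop this day's results.
				return
			}
			for k := range local {
				result[k] = struct{}{}
			}
		}(date)
	}
	wg.Wait()
	return result, errGlobal
}

func main() {
	minDate, maxDate := dateRange(0, 3*msecPerDay)
	names, err := searchPerDay(minDate, maxDate, 100, func(date uint64) (map[string]struct{}, error) {
		return map[string]struct{}{fmt.Sprintf("label_%d", date): {}}, nil
	})
	fmt.Println(minDate, maxDate, len(names), err)
}
```

// Note that the limit is checked before a day's set is merged, so the merged set can
// exceed the limit by up to one day's worth of entries; the same holds for the loop above.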
2022-06-12 03:32:13 +02:00
func ( is * indexSearch ) searchLabelNamesWithFiltersOnDate ( qt * querytracer . Tracer , lns map [ string ] struct { } , tfss [ ] * TagFilters , date uint64 , maxLabelNames , maxMetrics int ) error {
filter , err := is . searchMetricIDsWithFiltersOnDate ( qt , tfss , date , maxMetrics )
2019-05-22 23:16:55 +02:00
if err != nil {
2022-06-12 03:32:13 +02:00
return err
2019-05-22 23:16:55 +02:00
}
2022-08-16 12:32:30 +02:00
if filter != nil && filter . Len ( ) <= 100e3 {
// It is faster to obtain label names by metricIDs from the filter
// instead of scanning the inverted index for the matching filters.
2022-08-17 20:32:25 +02:00
// This would help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
2022-08-16 12:32:30 +02:00
metricIDs := filter . AppendTo ( nil )
qt . Printf ( "sort %d metricIDs" , len ( metricIDs ) )
2023-09-22 11:32:59 +02:00
is . getLabelNamesForMetricIDs ( qt , metricIDs , lns , maxLabelNames )
return nil
2019-05-22 23:16:55 +02:00
}
2024-03-12 00:43:27 +01:00
2022-06-12 03:32:13 +02:00
var prevLabelName [ ] byte
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2019-09-20 18:46:47 +02:00
mp := & is . mp
2021-06-15 13:56:51 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2023-01-07 10:04:41 +01:00
underscoreNameSeen := false
2022-06-12 03:32:13 +02:00
nsPrefixExpected := byte ( nsPrefixDateTagToMetricIDs )
if date == 0 {
nsPrefixExpected = nsPrefixTagToMetricIDs
}
2024-03-12 00:43:27 +01:00
hasCompositeLabelName := false
2022-06-12 03:32:13 +02:00
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
2024-03-12 00:43:27 +01:00
if name := getCommonMetricNameForTagFilterss ( tfss ) ; len ( name ) > 0 {
compositeLabelName := marshalCompositeTagKey ( nil , name , nil )
kb . B = marshalTagValue ( kb . B , compositeLabelName )
// Drop trailing tagSeparator
kb . B = kb . B [ : len ( kb . B ) - 1 ]
hasCompositeLabelName = true
}
prefix := append ( [ ] byte { } , kb . B ... )
2019-09-20 18:46:47 +02:00
ts . Seek ( prefix )
2022-06-12 03:32:13 +02:00
for len ( lns ) < maxLabelNames && ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2019-05-22 23:16:55 +02:00
item := ts . Item
2019-09-20 18:46:47 +02:00
if ! bytes . HasPrefix ( item , prefix ) {
2019-05-22 23:16:55 +02:00
break
}
2022-06-12 03:32:13 +02:00
if err := mp . Init ( item , nsPrefixExpected ) ; err != nil {
2019-09-20 18:46:47 +02:00
return err
2019-05-22 23:16:55 +02:00
}
2022-06-14 15:32:38 +02:00
if mp . GetMatchingSeriesCount ( filter , dmis ) == 0 {
2022-06-12 03:32:13 +02:00
continue
2021-02-09 23:44:54 +01:00
}
2022-06-12 03:32:13 +02:00
labelName := mp . Tag . Key
2024-03-12 00:43:27 +01:00
if len ( labelName ) == 0 || hasCompositeLabelName {
2023-01-07 10:04:41 +01:00
underscoreNameSeen = true
2022-06-12 03:32:13 +02:00
}
2024-03-12 00:43:27 +01:00
if ( ! hasCompositeLabelName && isArtificialTagKey ( labelName ) ) || string ( labelName ) == string ( prevLabelName ) {
2022-06-12 03:32:13 +02:00
// Search for the next tag key.
// The last char in kb.B must be tagSeparatorChar.
// Just increment it in order to jump to the next tag key.
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
2024-03-12 00:43:27 +01:00
if ! hasCompositeLabelName && len ( labelName ) > 0 && labelName [ 0 ] == compositeTagKeyPrefix {
2022-06-12 03:32:13 +02:00
// skip composite tag entries
kb . B = append ( kb . B , compositeTagKeyPrefix )
} else {
kb . B = marshalTagValue ( kb . B , labelName )
}
kb . B [ len ( kb . B ) - 1 ] ++
ts . Seek ( kb . B )
2023-01-07 09:50:14 +01:00
continue
}
2024-03-12 00:43:27 +01:00
if ! hasCompositeLabelName {
lns [ string ( labelName ) ] = struct { } { }
} else {
_ , key , err := unmarshalCompositeTagKey ( labelName )
if err != nil {
return fmt . Errorf ( "cannot unmarshal composite tag key: %w" , err )
}
lns [ string ( key ) ] = struct { } { }
}
2022-06-12 03:32:13 +02:00
prevLabelName = append ( prevLabelName [ : 0 ] , labelName ... )
2019-05-22 23:16:55 +02:00
}
2023-01-07 10:04:41 +01:00
if underscoreNameSeen {
lns [ "__name__" ] = struct { } { }
}
2019-05-22 23:16:55 +02:00
if err := ts . Error ( ) ; err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "error during search for prefix %q: %w" , prefix , err )
2019-05-22 23:16:55 +02:00
}
return nil
}
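// The "increment the trailing separator and seek" trick used in the loop above can be
// shown on a plain sorted slice. This is a minimal sketch under stated assumptions:
// distinctKeys and sep are hypothetical, items are "key<sep>value" strings sorted
// bytewise, and keys never contain a byte as small as sep+1. sort.SearchStrings plays
// the role of ts.Seek.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sep is a hypothetical separator byte that terminates every key.
const sep byte = 1

// distinctKeys returns the distinct keys of a sorted list of "key<sep>value" items.
// After seeing a key, it jumps past all the remaining items with that key by seeking
// to key followed by sep+1, instead of iterating over them one by one.
func distinctKeys(items []string) []string {
	var keys []string
	i := 0
	for i < len(items) {
		item := items[i]
		n := strings.IndexByte(item, sep)
		if n < 0 {
			// Malformed item without a separator; skip it.
			i++
			continue
		}
		key := item[:n]
		keys = append(keys, key)
		// The seek target sorts after every "key<sep>..." item
		// and before any item with a larger key.
		target := key + string([]byte{sep + 1})
		i = sort.SearchStrings(items, target)
	}
	return keys
}

func main() {
	items := []string{"env\x01dev", "env\x01prod", "job\x01api", "job\x01web"}
	fmt.Println(distinctKeys(items)) // prints [env job]
}
```

// Each distinct key costs one binary search here, so the work depends on the number
// of keys rather than on the number of items per key.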
2023-09-22 11:32:59 +02:00
func ( is * indexSearch ) getLabelNamesForMetricIDs ( qt * querytracer . Tracer , metricIDs [ ] uint64 , lns map [ string ] struct { } , maxLabelNames int ) {
2024-03-11 19:37:05 +01:00
if len ( metricIDs ) > 0 {
lns [ "__name__" ] = struct { } { }
}
2024-05-29 14:07:44 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
checkDeleted := dmis . Len ( ) > 0
2022-08-16 12:32:30 +02:00
var mn MetricName
foundLabelNames := 0
var buf [ ] byte
for _ , metricID := range metricIDs {
2024-05-29 14:07:44 +02:00
if checkDeleted && dmis . Has ( metricID ) {
// skip deleted IDs from result
continue
}
2023-09-22 11:32:59 +02:00
var ok bool
buf , ok = is . searchMetricNameWithCache ( buf [ : 0 ] , metricID )
if ! ok {
// It is likely the metricID->metricName entry didn't propagate to inverted index yet.
// Skip this metricID for now.
continue
2022-08-16 12:32:30 +02:00
}
if err := mn . Unmarshal ( buf ) ; err != nil {
2023-10-25 21:24:01 +02:00
logger . Panicf ( "FATAL: cannot unmarshal metricName %q: %s" , buf , err )
2022-08-16 12:32:30 +02:00
}
for _ , tag := range mn . Tags {
2023-09-22 11:32:59 +02:00
if _ , ok := lns [ string ( tag . Key ) ] ; ! ok {
2022-08-16 12:32:30 +02:00
foundLabelNames ++
lns [ string ( tag . Key ) ] = struct { } { }
if len ( lns ) >= maxLabelNames {
qt . Printf ( "hit the limit on the number of unique label names: %d" , maxLabelNames )
2023-09-22 11:32:59 +02:00
return
2022-08-16 12:32:30 +02:00
}
}
}
}
qt . Printf ( "get %d distinct label names from %d metricIDs" , foundLabelNames , len ( metricIDs ) )
}
2022-06-12 03:32:13 +02:00
// SearchLabelValuesWithFiltersOnTimeRange returns label values for the given labelName, tfss and tr.
func ( db * indexDB ) SearchLabelValuesWithFiltersOnTimeRange ( qt * querytracer . Tracer , labelName string , tfss [ ] * TagFilters , tr TimeRange ,
maxLabelValues , maxMetrics int , deadline uint64 ) ( [ ] string , error ) {
qt = qt . NewChild ( "search for label values: labelName=%q, filters=%s, timeRange=%s, maxLabelValues=%d, maxMetrics=%d" , labelName , tfss , & tr , maxLabelValues , maxMetrics )
defer qt . Done ( )
2024-03-10 11:54:20 +01:00
2022-06-12 03:32:13 +02:00
lvs := make ( map [ string ] struct { } )
qtChild := qt . NewChild ( "search for label values in the current indexdb" )
2020-11-04 23:15:43 +01:00
is := db . getIndexSearch ( deadline )
2022-06-12 03:32:13 +02:00
err := is . searchLabelValuesWithFiltersOnTimeRange ( qtChild , lvs , labelName , tfss , tr , maxLabelValues , maxMetrics )
2020-11-04 23:15:43 +01:00
db . putIndexSearch ( is )
2022-06-12 03:32:13 +02:00
qtChild . Donef ( "found %d label values" , len ( lvs ) )
2020-11-04 23:15:43 +01:00
if err != nil {
return nil , err
}
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2022-06-12 03:32:13 +02:00
qtChild := qt . NewChild ( "search for label values in the previous indexdb" )
lvsLen := len ( lvs )
2020-11-04 23:15:43 +01:00
is := extDB . getIndexSearch ( deadline )
2022-06-12 03:32:13 +02:00
err = is . searchLabelValuesWithFiltersOnTimeRange ( qtChild , lvs , labelName , tfss , tr , maxLabelValues , maxMetrics )
2020-11-04 23:15:43 +01:00
extDB . putIndexSearch ( is )
2022-06-12 03:32:13 +02:00
qtChild . Donef ( "found %d additional label values" , len ( lvs ) - lvsLen )
2020-11-04 23:15:43 +01:00
} )
2022-12-19 22:20:58 +01:00
if err != nil {
2020-11-04 23:15:43 +01:00
return nil , err
}
2022-06-12 03:32:13 +02:00
labelValues := make ( [ ] string , 0 , len ( lvs ) )
for labelValue := range lvs {
if len ( labelValue ) == 0 {
2020-11-04 23:15:43 +01:00
// Skip empty values, since they have no meaning.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/600
continue
}
2022-06-12 03:32:13 +02:00
labelValues = append ( labelValues , labelValue )
2020-11-04 23:15:43 +01:00
}
2022-06-12 03:32:13 +02:00
// Do not sort labelValues, since they must be sorted by vmselect.
qt . Printf ( "found %d label values in the current and the previous indexdb" , len ( labelValues ) )
return labelValues , nil
2020-11-04 23:15:43 +01:00
}
2022-06-12 03:32:13 +02:00
func ( is * indexSearch ) searchLabelValuesWithFiltersOnTimeRange ( qt * querytracer . Tracer , lvs map [ string ] struct { } , labelName string , tfss [ ] * TagFilters ,
tr TimeRange , maxLabelValues , maxMetrics int ) error {
2020-11-04 23:15:43 +01:00
minDate := uint64 ( tr . MinTimestamp ) / msecPerDay
2022-06-12 03:32:13 +02:00
maxDate := uint64 ( tr . MaxTimestamp - 1 ) / msecPerDay
if maxDate == 0 || minDate > maxDate || maxDate - minDate > maxDaysForPerDaySearch {
qtChild := qt . NewChild ( "search for label values in global index: labelName=%q, filters=%s" , labelName , tfss )
err := is . searchLabelValuesWithFiltersOnDate ( qtChild , lvs , labelName , tfss , 0 , maxLabelValues , maxMetrics )
qtChild . Done ( )
return err
2021-04-07 12:31:57 +02:00
}
2020-11-04 23:15:43 +01:00
var mu sync . Mutex
2022-04-06 12:34:00 +02:00
wg := getWaitGroup ( )
2020-11-04 23:15:43 +01:00
var errGlobal error
2022-06-12 03:32:13 +02:00
qt = qt . NewChild ( "parallel search for label values: labelName=%q, filters=%s, timeRange=%s" , labelName , tfss , & tr )
2020-11-04 23:15:43 +01:00
for date := minDate ; date <= maxDate ; date ++ {
wg . Add ( 1 )
2022-06-30 17:17:07 +02:00
qtChild := qt . NewChild ( "search for label values: filters=%s, date=%s" , tfss , dateToString ( date ) )
2020-11-04 23:15:43 +01:00
go func ( date uint64 ) {
2022-06-12 03:32:13 +02:00
defer func ( ) {
qtChild . Done ( )
wg . Done ( )
} ( )
lvsLocal := make ( map [ string ] struct { } )
2020-11-04 23:15:43 +01:00
isLocal := is . db . getIndexSearch ( is . deadline )
2022-06-12 03:32:13 +02:00
err := isLocal . searchLabelValuesWithFiltersOnDate ( qtChild , lvsLocal , labelName , tfss , date , maxLabelValues , maxMetrics )
2020-11-04 23:15:43 +01:00
is . db . putIndexSearch ( isLocal )
mu . Lock ( )
defer mu . Unlock ( )
if errGlobal != nil {
return
}
if err != nil {
errGlobal = err
return
}
2022-06-12 03:32:13 +02:00
if len ( lvs ) >= maxLabelValues {
2020-11-04 23:15:43 +01:00
return
}
2022-06-12 03:32:13 +02:00
for v := range lvsLocal {
lvs [ v ] = struct { } { }
2020-11-04 23:15:43 +01:00
}
} ( date )
}
wg . Wait ( )
2022-04-06 12:34:00 +02:00
putWaitGroup ( wg )
2022-06-12 03:32:13 +02:00
qt . Done ( )
2020-11-04 23:15:43 +01:00
return errGlobal
}
2022-06-12 03:32:13 +02:00
func ( is * indexSearch ) searchLabelValuesWithFiltersOnDate ( qt * querytracer . Tracer , lvs map [ string ] struct { } , labelName string , tfss [ ] * TagFilters ,
date uint64 , maxLabelValues , maxMetrics int ) error {
filter , err := is . searchMetricIDsWithFiltersOnDate ( qt , tfss , date , maxMetrics )
2019-05-22 23:16:55 +02:00
if err != nil {
2022-06-12 03:32:13 +02:00
return err
2019-05-22 23:16:55 +02:00
}
2024-03-12 00:43:27 +01:00
if filter != nil && filter . Len ( ) <= 100e3 {
2022-08-16 12:44:45 +02:00
// It is faster to obtain label values by metricIDs from the filter
2022-08-16 12:32:30 +02:00
// instead of scanning the inverted index for the matching filters.
2022-08-17 20:32:25 +02:00
// This would help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
2022-08-16 12:32:30 +02:00
metricIDs := filter . AppendTo ( nil )
qt . Printf ( "sort %d metricIDs" , len ( metricIDs ) )
2023-09-22 11:32:59 +02:00
is . getLabelValuesForMetricIDs ( qt , lvs , labelName , metricIDs , maxLabelValues )
return nil
2019-05-22 23:16:55 +02:00
}
2022-06-12 03:32:13 +02:00
if labelName == "__name__" {
// __name__ label is encoded as empty string in indexdb.
labelName = ""
2019-05-22 23:16:55 +02:00
}
2024-03-12 00:43:27 +01:00
2022-06-12 03:32:13 +02:00
labelNameBytes := bytesutil . ToUnsafeBytes ( labelName )
2024-03-12 00:43:27 +01:00
if name := getCommonMetricNameForTagFilterss ( tfss ) ; len ( name ) > 0 && labelName != "" {
labelNameBytes = marshalCompositeTagKey ( nil , name , labelNameBytes )
}
2022-06-12 03:32:13 +02:00
var prevLabelValue [ ] byte
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2019-09-20 18:46:47 +02:00
mp := & is . mp
2021-06-15 13:56:51 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2022-06-12 03:32:13 +02:00
nsPrefixExpected := byte ( nsPrefixDateTagToMetricIDs )
if date == 0 {
nsPrefixExpected = nsPrefixTagToMetricIDs
}
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
kb . B = marshalTagValue ( kb . B , labelNameBytes )
2024-03-12 00:43:27 +01:00
prefix := append ( [ ] byte { } , kb . B ... )
2019-05-22 23:16:55 +02:00
ts . Seek ( prefix )
2022-06-12 03:32:13 +02:00
for len ( lvs ) < maxLabelValues && ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2019-09-20 18:46:47 +02:00
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
2019-05-22 23:16:55 +02:00
break
}
2022-06-12 03:32:13 +02:00
if err := mp . Init ( item , nsPrefixExpected ) ; err != nil {
2019-09-20 18:46:47 +02:00
return err
2019-05-22 23:16:55 +02:00
}
2022-06-14 15:32:38 +02:00
if mp . GetMatchingSeriesCount ( filter , dmis ) == 0 {
2022-06-12 03:32:13 +02:00
continue
2022-02-17 11:29:24 +01:00
}
2022-06-12 03:32:13 +02:00
labelValue := mp . Tag . Value
if string ( labelValue ) == string ( prevLabelValue ) {
// Search for the next tag value.
// The last char in kb.B must be tagSeparatorChar.
// Just increment it in order to jump to the next tag value.
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
kb . B = marshalTagValue ( kb . B , labelNameBytes )
kb . B = marshalTagValue ( kb . B , labelValue )
kb . B [ len ( kb . B ) - 1 ] ++
ts . Seek ( kb . B )
2022-02-17 11:29:24 +01:00
continue
2019-12-02 23:29:44 +01:00
}
2022-06-12 03:32:13 +02:00
lvs [ string ( labelValue ) ] = struct { } { }
prevLabelValue = append ( prevLabelValue [ : 0 ] , labelValue ... )
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "error when searching for tag name prefix %q: %w" , prefix , err )
2019-05-22 23:16:55 +02:00
}
return nil
}
2023-09-22 11:32:59 +02:00
func ( is * indexSearch ) getLabelValuesForMetricIDs ( qt * querytracer . Tracer , lvs map [ string ] struct { } , labelName string , metricIDs [ ] uint64 , maxLabelValues int ) {
2022-08-17 20:32:25 +02:00
if labelName == "" {
labelName = "__name__"
}
2024-05-29 14:07:44 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
checkDeleted := dmis . Len ( ) > 0
2022-08-16 12:32:30 +02:00
var mn MetricName
foundLabelValues := 0
var buf [ ] byte
for _ , metricID := range metricIDs {
2024-05-29 14:07:44 +02:00
if checkDeleted && dmis . Has ( metricID ) {
// skip deleted IDs from result
continue
}
2023-09-22 11:32:59 +02:00
var ok bool
buf , ok = is . searchMetricNameWithCache ( buf [ : 0 ] , metricID )
if ! ok {
// It is likely the metricID->metricName entry didn't propagate to inverted index yet.
// Skip this metricID for now.
continue
2022-08-16 12:32:30 +02:00
}
if err := mn . Unmarshal ( buf ) ; err != nil {
2023-09-22 11:32:59 +02:00
logger . Panicf ( "FATAL: cannot unmarshal metricName %q: %s" , buf , err )
2022-08-16 12:32:30 +02:00
}
tagValue := mn . GetTagValue ( labelName )
2023-09-22 11:32:59 +02:00
if _ , ok := lvs [ string ( tagValue ) ] ; ! ok {
2022-08-16 12:32:30 +02:00
foundLabelValues ++
lvs [ string ( tagValue ) ] = struct { } { }
if len ( lvs ) >= maxLabelValues {
qt . Printf ( "hit the limit on the number of unique label values for label %q: %d" , labelName , maxLabelValues )
2023-09-22 11:32:59 +02:00
return
2022-08-16 12:32:30 +02:00
}
}
}
qt . Printf ( "get %d distinct values for label %q from %d metricIDs" , foundLabelValues , labelName , len ( metricIDs ) )
}
2020-09-10 23:28:19 +02:00
// SearchTagValueSuffixes returns all the tag value suffixes for the given tagKey and tagValuePrefix on the given tr.
//
// This allows implementing https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find or similar APIs.
2021-02-02 23:24:05 +01:00
//
// If it returns maxTagValueSuffixes suffixes, then it is likely that more than maxTagValueSuffixes suffixes were found.
2022-07-05 22:47:46 +02:00
func ( db * indexDB ) SearchTagValueSuffixes ( qt * querytracer . Tracer , tr TimeRange , tagKey , tagValuePrefix string , delimiter byte , maxTagValueSuffixes int , deadline uint64 ) ( [ ] string , error ) {
2022-06-27 11:53:46 +02:00
qt = qt . NewChild ( "search tag value suffixes for timeRange=%s, tagKey=%q, tagValuePrefix=%q, delimiter=%c, maxTagValueSuffixes=%d" ,
& tr , tagKey , tagValuePrefix , delimiter , maxTagValueSuffixes )
defer qt . Done ( )
2020-09-10 23:28:19 +02:00
// TODO: cache results?
tvss := make ( map [ string ] struct { } )
is := db . getIndexSearch ( deadline )
err := is . searchTagValueSuffixesForTimeRange ( tvss , tr , tagKey , tagValuePrefix , delimiter , maxTagValueSuffixes )
db . putIndexSearch ( is )
if err != nil {
return nil , err
}
2021-02-02 23:24:05 +01:00
if len ( tvss ) < maxTagValueSuffixes {
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2021-02-02 23:24:05 +01:00
is := extDB . getIndexSearch ( deadline )
2022-06-27 11:53:46 +02:00
qtChild := qt . NewChild ( "search tag value suffixes in the previous indexdb" )
2021-02-02 23:24:05 +01:00
err = is . searchTagValueSuffixesForTimeRange ( tvss , tr , tagKey , tagValuePrefix , delimiter , maxTagValueSuffixes )
2022-06-27 11:53:46 +02:00
qtChild . Done ( )
2021-02-02 23:24:05 +01:00
extDB . putIndexSearch ( is )
} )
2022-12-19 22:20:58 +01:00
if err != nil {
2021-02-02 23:24:05 +01:00
return nil , err
}
2020-09-10 23:28:19 +02:00
}
suffixes := make ( [ ] string , 0 , len ( tvss ) )
for suffix := range tvss {
// Do not skip empty suffixes, since they may represent leaf tag values.
suffixes = append ( suffixes , suffix )
}
2021-02-02 23:24:05 +01:00
if len ( suffixes ) > maxTagValueSuffixes {
suffixes = suffixes [ : maxTagValueSuffixes ]
}
2020-09-10 23:28:19 +02:00
// Do not sort suffixes, since they must be sorted by vmselect.
2022-06-27 11:53:46 +02:00
qt . Printf ( "found %d suffixes" , len ( suffixes ) )
2020-09-10 23:28:19 +02:00
return suffixes , nil
}
2022-07-05 22:47:46 +02:00
func ( is * indexSearch ) searchTagValueSuffixesForTimeRange ( tvss map [ string ] struct { } , tr TimeRange , tagKey , tagValuePrefix string , delimiter byte , maxTagValueSuffixes int ) error {
2020-09-10 23:28:19 +02:00
minDate := uint64 ( tr . MinTimestamp ) / msecPerDay
2022-06-12 03:32:13 +02:00
maxDate := uint64 ( tr . MaxTimestamp - 1 ) / msecPerDay
2021-04-07 12:31:57 +02:00
if minDate > maxDate || maxDate - minDate > maxDaysForPerDaySearch {
2020-09-10 23:28:19 +02:00
return is . searchTagValueSuffixesAll ( tvss , tagKey , tagValuePrefix , delimiter , maxTagValueSuffixes )
}
// Query over multiple days in parallel.
2022-04-06 12:34:00 +02:00
wg := getWaitGroup ( )
2020-09-10 23:28:19 +02:00
var errGlobal error
var mu sync . Mutex // protects tvss + errGlobal from concurrent access below.
for minDate <= maxDate {
wg . Add ( 1 )
go func ( date uint64 ) {
defer wg . Done ( )
tvssLocal := make ( map [ string ] struct { } )
isLocal := is . db . getIndexSearch ( is . deadline )
err := isLocal . searchTagValueSuffixesForDate ( tvssLocal , date , tagKey , tagValuePrefix , delimiter , maxTagValueSuffixes )
2021-02-16 20:22:10 +01:00
is . db . putIndexSearch ( isLocal )
2020-09-10 23:28:19 +02:00
mu . Lock ( )
defer mu . Unlock ( )
if errGlobal != nil {
return
}
if err != nil {
errGlobal = err
return
}
2021-02-02 23:24:05 +01:00
if len ( tvss ) > maxTagValueSuffixes {
return
}
2020-09-10 23:28:19 +02:00
for k := range tvssLocal {
tvss [ k ] = struct { } { }
}
} ( minDate )
minDate ++
}
wg . Wait ( )
2022-04-06 12:34:00 +02:00
putWaitGroup ( wg )
2020-09-10 23:28:19 +02:00
return errGlobal
}
2022-07-05 22:47:46 +02:00
func ( is * indexSearch ) searchTagValueSuffixesAll ( tvss map [ string ] struct { } , tagKey , tagValuePrefix string , delimiter byte , maxTagValueSuffixes int ) error {
2020-09-10 23:28:19 +02:00
kb := & is . kb
nsPrefix := byte ( nsPrefixTagToMetricIDs )
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefix )
2022-07-05 22:47:46 +02:00
kb . B = marshalTagValue ( kb . B , bytesutil . ToUnsafeBytes ( tagKey ) )
kb . B = marshalTagValue ( kb . B , bytesutil . ToUnsafeBytes ( tagValuePrefix ) )
2020-09-10 23:28:19 +02:00
kb . B = kb . B [ : len ( kb . B ) - 1 ] // remove tagSeparatorChar from the end of kb.B
prefix := append ( [ ] byte ( nil ) , kb . B ... )
2021-02-02 23:24:05 +01:00
return is . searchTagValueSuffixesForPrefix ( tvss , nsPrefix , prefix , len ( tagValuePrefix ) , delimiter , maxTagValueSuffixes )
2020-09-10 23:28:19 +02:00
}
2022-07-05 22:47:46 +02:00
func ( is * indexSearch ) searchTagValueSuffixesForDate ( tvss map [ string ] struct { } , date uint64 , tagKey , tagValuePrefix string , delimiter byte , maxTagValueSuffixes int ) error {
2020-09-10 23:28:19 +02:00
nsPrefix := byte ( nsPrefixDateTagToMetricIDs )
kb := & is . kb
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefix )
kb . B = encoding . MarshalUint64 ( kb . B , date )
2022-07-05 22:47:46 +02:00
kb . B = marshalTagValue ( kb . B , bytesutil . ToUnsafeBytes ( tagKey ) )
kb . B = marshalTagValue ( kb . B , bytesutil . ToUnsafeBytes ( tagValuePrefix ) )
2020-09-10 23:28:19 +02:00
kb . B = kb . B [ : len ( kb . B ) - 1 ] // remove tagSeparatorChar from the end of kb.B
prefix := append ( [ ] byte ( nil ) , kb . B ... )
2021-02-02 23:24:05 +01:00
return is . searchTagValueSuffixesForPrefix ( tvss , nsPrefix , prefix , len ( tagValuePrefix ) , delimiter , maxTagValueSuffixes )
2020-09-10 23:28:19 +02:00
}
2021-02-02 23:24:05 +01:00
func ( is * indexSearch ) searchTagValueSuffixesForPrefix ( tvss map [ string ] struct { } , nsPrefix byte , prefix [ ] byte , tagValuePrefixLen int , delimiter byte , maxTagValueSuffixes int ) error {
2020-09-10 23:28:19 +02:00
kb := & is . kb
ts := & is . ts
mp := & is . mp
2021-06-15 13:56:51 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2020-09-10 23:28:19 +02:00
loopsPaceLimiter := 0
ts . Seek ( prefix )
for len ( tvss ) < maxTagValueSuffixes && ts . NextItem ( ) {
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
}
loopsPaceLimiter ++
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
break
}
if err := mp . Init ( item , nsPrefix ) ; err != nil {
return err
}
2022-06-14 15:32:38 +02:00
if mp . GetMatchingSeriesCount ( nil , dmis ) == 0 {
2020-09-10 23:28:19 +02:00
continue
}
tagValue := mp . Tag . Value
2021-02-02 23:24:05 +01:00
suffix := tagValue [ tagValuePrefixLen : ]
2020-09-10 23:28:19 +02:00
n := bytes . IndexByte ( suffix , delimiter )
if n < 0 {
// Found leaf tag value that doesn't have delimiters after the given tagValuePrefix.
tvss [ string ( suffix ) ] = struct { } { }
continue
}
// Found non-leaf tag value. Extract the suffix that ends with the given delimiter.
suffix = suffix [ : n + 1 ]
tvss [ string ( suffix ) ] = struct { } { }
if suffix [ len ( suffix ) - 1 ] == 255 {
continue
}
// Search for the next suffix
suffix [ len ( suffix ) - 1 ] ++
kb . B = append ( kb . B [ : 0 ] , prefix ... )
kb . B = marshalTagValue ( kb . B , suffix )
kb . B = kb . B [ : len ( kb . B ) - 1 ] // remove tagSeparatorChar
ts . Seek ( kb . B )
}
if err := ts . Error ( ) ; err != nil {
return fmt . Errorf ( "error when searching for tag value suffixes for prefix %q: %w" , prefix , err )
}
return nil
}
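// The leaf / non-leaf split performed in the loop above can be isolated into a small
// helper. This is a minimal sketch, not part of this file: nextSuffix is a hypothetical
// name, and it assumes tagValue is at least prefixLen bytes long, which the prefix check
// in the loop above provides.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextSuffix returns the part of tagValue that follows the first prefixLen bytes,
// cut right after the first delimiter when one is present.
// isLeaf reports whether no delimiter was found after the prefix.
func nextSuffix(tagValue []byte, prefixLen int, delimiter byte) (suffix string, isLeaf bool) {
	rest := tagValue[prefixLen:]
	n := bytes.IndexByte(rest, delimiter)
	if n < 0 {
		// Leaf value: everything after the prefix is the suffix.
		return string(rest), true
	}
	// Non-leaf value: keep the suffix up to and including the delimiter.
	return string(rest[:n+1]), false
}

func main() {
	for _, v := range []string{"foo.bar.baz", "foo.bar.qux", "foo.leaf"} {
		s, leaf := nextSuffix([]byte(v), len("foo."), '.')
		fmt.Printf("%q leaf=%v\n", s, leaf)
	}
}
```

// With the prefix "foo." and the delimiter '.', both "foo.bar.baz" and "foo.bar.qux"
// yield the non-leaf suffix "bar.", while "foo.leaf" yields the leaf suffix "leaf".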
2019-05-22 23:16:55 +02:00
// GetSeriesCount returns the approximate number of unique timeseries in the db.
//
// It includes the deleted series too and may count the same series
// up to two times - in db and extDB.
2020-07-23 19:42:57 +02:00
func ( db * indexDB ) GetSeriesCount ( deadline uint64 ) ( uint64 , error ) {
is := db . getIndexSearch ( deadline )
2019-06-10 11:27:44 +02:00
n , err := is . getSeriesCount ( )
2019-05-22 23:16:55 +02:00
db . putIndexSearch ( is )
if err != nil {
return 0 , err
}
var nExt uint64
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2020-07-23 19:42:57 +02:00
is := extDB . getIndexSearch ( deadline )
2019-06-10 11:27:44 +02:00
nExt , err = is . getSeriesCount ( )
2019-05-22 23:16:55 +02:00
extDB . putIndexSearch ( is )
} )
2022-12-19 22:20:58 +01:00
if err != nil {
2020-06-30 21:58:18 +02:00
return 0 , fmt . Errorf ( "error when searching in extDB: %w" , err )
2019-05-22 23:16:55 +02:00
}
return n + nExt , nil
}
2020-04-22 18:57:36 +02:00
func ( is * indexSearch ) getSeriesCount ( ) ( uint64 , error ) {
ts := & is . ts
kb := & is . kb
mp := & is . mp
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2020-04-22 18:57:36 +02:00
var metricIDsLen uint64
// Extract the number of series from ((__name__=value): metricIDs) rows
2020-07-23 23:31:09 +02:00
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixTagToMetricIDs )
2020-04-22 18:57:36 +02:00
kb . B = marshalTagValue ( kb . B , nil )
ts . Seek ( kb . B )
for ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return 0 , err
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2020-04-22 18:57:36 +02:00
item := ts . Item
if ! bytes . HasPrefix ( item , kb . B ) {
break
}
tail := item [ len ( kb . B ) : ]
n := bytes . IndexByte ( tail , tagSeparatorChar )
if n < 0 {
return 0 , fmt . Errorf ( "invalid tag->metricIDs line %q: cannot find tagSeparatorChar %d" , item , tagSeparatorChar )
}
tail = tail [ n + 1 : ]
if err := mp . InitOnlyTail ( item , tail ) ; err != nil {
return 0 , err
}
// Take into account deleted timeseries too.
// It is OK if series can be counted multiple times in rare cases -
// the returned number is an estimation.
metricIDsLen += uint64 ( mp . MetricIDsLen ( ) )
}
if err := ts . Error ( ) ; err != nil {
2020-06-30 21:58:18 +02:00
return 0 , fmt . Errorf ( "error when counting unique timeseries: %w" , err )
2020-04-22 18:57:36 +02:00
}
return metricIDsLen , nil
}
2022-06-14 16:46:16 +02:00
// GetTSDBStatus returns topN entries for tsdb status for the given tfss, date and focusLabel.
func ( db * indexDB ) GetTSDBStatus ( qt * querytracer . Tracer , tfss [ ] * TagFilters , date uint64 , focusLabel string , topN , maxMetrics int , deadline uint64 ) ( * TSDBStatus , error ) {
2022-06-09 18:46:26 +02:00
qtChild := qt . NewChild ( "collect tsdb stats in the current indexdb" )
2024-03-10 11:54:20 +01:00
2021-05-12 14:18:45 +02:00
is := db . getIndexSearch ( deadline )
2022-06-14 16:46:16 +02:00
status , err := is . getTSDBStatus ( qtChild , tfss , date , focusLabel , topN , maxMetrics )
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2021-05-12 14:18:45 +02:00
db . putIndexSearch ( is )
if err != nil {
return nil , err
}
if status . hasEntries ( ) {
return status , nil
}
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2022-06-09 18:46:26 +02:00
qtChild := qt . NewChild ( "collect tsdb stats in the previous indexdb" )
2021-05-12 14:18:45 +02:00
is := extDB . getIndexSearch ( deadline )
2022-06-14 16:46:16 +02:00
status , err = is . getTSDBStatus ( qtChild , tfss , date , focusLabel , topN , maxMetrics )
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2021-05-12 14:18:45 +02:00
extDB . putIndexSearch ( is )
} )
2022-12-19 22:20:58 +01:00
if err != nil {
2021-05-12 14:18:45 +02:00
return nil , fmt . Errorf ( "error when obtaining TSDB status from extDB: %w" , err )
}
return status , nil
}
2022-06-14 16:46:16 +02:00
// getTSDBStatus returns topN entries for tsdb status for the given tfss, date and focusLabel.
func ( is * indexSearch ) getTSDBStatus ( qt * querytracer . Tracer , tfss [ ] * TagFilters , date uint64 , focusLabel string , topN , maxMetrics int ) ( * TSDBStatus , error ) {
2022-06-12 03:32:13 +02:00
filter , err := is . searchMetricIDsWithFiltersOnDate ( qt , tfss , date , maxMetrics )
if err != nil {
return nil , err
}
if filter != nil && filter . Len ( ) == 0 {
qt . Printf ( "no matching series for filter=%s" , tfss )
return & TSDBStatus { } , nil
2021-05-12 15:32:48 +02:00
}
ts := & is . ts
kb := & is . kb
mp := & is . mp
2022-06-14 15:32:38 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2021-05-12 14:18:45 +02:00
thSeriesCountByMetricName := newTopHeap ( topN )
2022-06-14 15:32:38 +02:00
thSeriesCountByLabelName := newTopHeap ( topN )
2022-06-14 16:46:16 +02:00
thSeriesCountByFocusLabelValue := newTopHeap ( topN )
2022-06-14 15:32:38 +02:00
thSeriesCountByLabelValuePair := newTopHeap ( topN )
thLabelValueCountByLabelName := newTopHeap ( topN )
var tmp , prevLabelName , prevLabelValuePair [ ] byte
2021-05-12 15:32:48 +02:00
var labelValueCountByLabelName , seriesCountByLabelValuePair uint64
2022-06-14 15:32:38 +02:00
var totalSeries , labelSeries , totalLabelValuePairs uint64
2021-05-12 15:32:48 +02:00
nameEqualBytes := [ ] byte ( "__name__=" )
2022-06-14 16:46:16 +02:00
focusLabelEqualBytes := [ ] byte ( focusLabel + "=" )
2021-05-12 14:18:45 +02:00
2021-05-12 15:32:48 +02:00
loopsPaceLimiter := 0
2022-06-12 03:32:13 +02:00
nsPrefixExpected := byte ( nsPrefixDateTagToMetricIDs )
if date == 0 {
nsPrefixExpected = nsPrefixTagToMetricIDs
}
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
2024-03-12 00:43:27 +01:00
prefix := append ( [ ] byte { } , kb . B ... )
2021-05-12 15:32:48 +02:00
ts . Seek ( prefix )
for ts . NextItem ( ) {
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
2021-05-12 14:18:45 +02:00
return nil , err
}
}
2021-05-12 15:32:48 +02:00
loopsPaceLimiter ++
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
break
}
2022-06-12 03:32:13 +02:00
if err := mp . Init ( item , nsPrefixExpected ) ; err != nil {
return nil , err
2021-05-12 14:18:45 +02:00
}
2022-06-14 15:32:38 +02:00
matchingSeriesCount := mp . GetMatchingSeriesCount ( filter , dmis )
2022-06-12 03:32:13 +02:00
if matchingSeriesCount == 0 {
// Skip rows without matching metricIDs.
continue
2021-05-12 14:18:45 +02:00
}
2022-06-12 03:32:13 +02:00
tmp = append ( tmp [ : 0 ] , mp . Tag . Key ... )
2022-06-14 15:32:38 +02:00
labelName := tmp
if isArtificialTagKey ( labelName ) {
2021-05-12 15:32:48 +02:00
// Skip artificially created tag keys.
2022-02-21 17:20:36 +01:00
kb . B = append ( kb . B [ : 0 ] , prefix ... )
2022-06-14 15:32:38 +02:00
if len ( labelName ) > 0 && labelName [ 0 ] == compositeTagKeyPrefix {
2022-02-21 17:20:36 +01:00
kb . B = append ( kb . B , compositeTagKeyPrefix )
} else {
2022-06-14 15:32:38 +02:00
kb . B = marshalTagValue ( kb . B , labelName )
2022-02-21 17:20:36 +01:00
}
kb . B [ len ( kb . B ) - 1 ] ++
ts . Seek ( kb . B )
2021-05-12 15:32:48 +02:00
continue
}
2022-06-14 15:32:38 +02:00
if len ( labelName ) == 0 {
labelName = append ( labelName , "__name__" ... )
tmp = labelName
2021-05-12 15:32:48 +02:00
}
2022-06-14 15:32:38 +02:00
if string ( labelName ) == "__name__" {
2022-06-08 17:43:05 +02:00
totalSeries += uint64 ( matchingSeriesCount )
}
2022-06-14 15:32:38 +02:00
tmp = append ( tmp , '=' )
tmp = append ( tmp , mp . Tag . Value ... )
labelValuePair := tmp
if len ( prevLabelName ) == 0 {
prevLabelName = append ( prevLabelName [ : 0 ] , labelName ... )
}
if string ( labelName ) != string ( prevLabelName ) {
thLabelValueCountByLabelName . push ( prevLabelName , labelValueCountByLabelName )
thSeriesCountByLabelName . push ( prevLabelName , labelSeries )
labelSeries = 0
2022-06-08 17:43:05 +02:00
labelValueCountByLabelName = 0
2022-06-14 15:32:38 +02:00
prevLabelName = append ( prevLabelName [ : 0 ] , labelName ... )
}
if len ( prevLabelValuePair ) == 0 {
prevLabelValuePair = append ( prevLabelValuePair [ : 0 ] , labelValuePair ... )
labelValueCountByLabelName ++
2022-06-08 17:43:05 +02:00
}
2022-06-14 15:32:38 +02:00
if string ( labelValuePair ) != string ( prevLabelValuePair ) {
thSeriesCountByLabelValuePair . push ( prevLabelValuePair , seriesCountByLabelValuePair )
if bytes . HasPrefix ( prevLabelValuePair , nameEqualBytes ) {
thSeriesCountByMetricName . push ( prevLabelValuePair [ len ( nameEqualBytes ) : ] , seriesCountByLabelValuePair )
2021-05-12 14:18:45 +02:00
}
2022-06-14 16:46:16 +02:00
if bytes . HasPrefix ( prevLabelValuePair , focusLabelEqualBytes ) {
thSeriesCountByFocusLabelValue . push ( prevLabelValuePair [ len ( focusLabelEqualBytes ) : ] , seriesCountByLabelValuePair )
}
2021-05-12 15:32:48 +02:00
seriesCountByLabelValuePair = 0
labelValueCountByLabelName ++
2022-06-14 15:32:38 +02:00
prevLabelValuePair = append ( prevLabelValuePair [ : 0 ] , labelValuePair ... )
2021-05-12 14:18:45 +02:00
}
2021-05-12 15:32:48 +02:00
// It is OK if series can be counted multiple times in rare cases -
// the returned number is an estimation.
2022-06-14 15:32:38 +02:00
labelSeries += uint64 ( matchingSeriesCount )
2021-05-12 15:32:48 +02:00
seriesCountByLabelValuePair += uint64 ( matchingSeriesCount )
2022-06-08 17:43:05 +02:00
totalLabelValuePairs += uint64 ( matchingSeriesCount )
2021-05-12 14:18:45 +02:00
}
2021-05-12 15:32:48 +02:00
if err := ts . Error ( ) ; err != nil {
return nil , fmt . Errorf ( "error when counting time series by metric names: %w" , err )
2021-05-12 14:18:45 +02:00
}
2022-06-14 15:32:38 +02:00
thLabelValueCountByLabelName . push ( prevLabelName , labelValueCountByLabelName )
thSeriesCountByLabelName . push ( prevLabelName , labelSeries )
thSeriesCountByLabelValuePair . push ( prevLabelValuePair , seriesCountByLabelValuePair )
if bytes . HasPrefix ( prevLabelValuePair , nameEqualBytes ) {
thSeriesCountByMetricName . push ( prevLabelValuePair [ len ( nameEqualBytes ) : ] , seriesCountByLabelValuePair )
2021-05-12 14:18:45 +02:00
}
2022-06-14 16:46:16 +02:00
if bytes . HasPrefix ( prevLabelValuePair , focusLabelEqualBytes ) {
thSeriesCountByFocusLabelValue . push ( prevLabelValuePair [ len ( focusLabelEqualBytes ) : ] , seriesCountByLabelValuePair )
}
2021-05-12 15:32:48 +02:00
status := & TSDBStatus {
2022-06-14 16:46:16 +02:00
TotalSeries : totalSeries ,
TotalLabelValuePairs : totalLabelValuePairs ,
SeriesCountByMetricName : thSeriesCountByMetricName . getSortedResult ( ) ,
SeriesCountByLabelName : thSeriesCountByLabelName . getSortedResult ( ) ,
SeriesCountByFocusLabelValue : thSeriesCountByFocusLabelValue . getSortedResult ( ) ,
SeriesCountByLabelValuePair : thSeriesCountByLabelValuePair . getSortedResult ( ) ,
LabelValueCountByLabelName : thLabelValueCountByLabelName . getSortedResult ( ) ,
2021-05-12 14:18:45 +02:00
}
2021-05-12 15:32:48 +02:00
return status , nil
2021-05-12 14:18:45 +02:00
}
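The loop in getTSDBStatus relies on index rows arriving sorted by label name and then by label value, so per-label totals can be accumulated in a single pass by detecting where one run ends and the next begins. The following is a minimal sketch of that run-grouping idea only. The `row`, `labelStats` and `groupByLabel` names are hypothetical and are not part of the storage package, and the sketch returns every group instead of feeding a top-N heap.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one index row: a label=value pair
// together with the number of matching series found in that row.
type row struct {
	Label  string
	Value  string
	Series uint64
}

// labelStats accumulates per-label totals.
type labelStats struct {
	Label  string
	Values uint64 // number of distinct values seen for the label
	Series uint64 // sum of series counts over all rows of the label
}

// groupByLabel assumes rows are sorted by Label and then by Value.
// It closes a group whenever the label changes and counts a new value
// whenever the value changes inside the current group.
func groupByLabel(rows []row) []labelStats {
	var out []labelStats
	var prevValue string
	for _, r := range rows {
		n := len(out)
		if n == 0 || out[n-1].Label != r.Label {
			// A new label starts a new run with its first value.
			out = append(out, labelStats{Label: r.Label, Values: 1, Series: r.Series})
			prevValue = r.Value
			continue
		}
		cur := &out[n-1]
		if r.Value != prevValue {
			cur.Values++
			prevValue = r.Value
		}
		cur.Series += r.Series
	}
	return out
}

func main() {
	rows := []row{
		{"job", "a", 2},
		{"job", "a", 1},
		{"job", "b", 4},
		{"zone", "x", 3},
	}
	fmt.Println(groupByLabel(rows)) // prints [{job 2 7} {zone 1 3}]
}
```

As in the original loop, the same series may contribute to several rows, so the sums are estimates rather than exact cardinalities.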
2020-04-22 18:57:36 +02:00
// TSDBStatus contains TSDB status data for /api/v1/status/tsdb.
//
// See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
type TSDBStatus struct {
2022-06-14 16:46:16 +02:00
TotalSeries uint64
TotalLabelValuePairs uint64
SeriesCountByMetricName [ ] TopHeapEntry
SeriesCountByLabelName [ ] TopHeapEntry
SeriesCountByFocusLabelValue [ ] TopHeapEntry
SeriesCountByLabelValuePair [ ] TopHeapEntry
LabelValueCountByLabelName [ ] TopHeapEntry
2020-04-22 18:57:36 +02:00
}
func ( status * TSDBStatus ) hasEntries ( ) bool {
return len ( status . SeriesCountByLabelValuePair ) > 0
}
// topHeap maintains a heap of TopHeapEntry items with the maximum TopHeapEntry.Count values.
type topHeap struct {
topN int
a [ ] TopHeapEntry
}
// newTopHeap returns topHeap for topN items.
func newTopHeap ( topN int ) * topHeap {
return & topHeap {
topN : topN ,
}
}
// TopHeapEntry represents an entry from `top heap` used in stats.
type TopHeapEntry struct {
Name string
Count uint64
}
2022-06-14 15:32:38 +02:00
func ( th * topHeap ) push ( name [ ] byte , count uint64 ) {
2020-04-22 18:57:36 +02:00
if count == 0 {
return
}
if len ( th . a ) < th . topN {
th . a = append ( th . a , TopHeapEntry {
Name : string ( name ) ,
Count : count ,
} )
heap . Fix ( th , len ( th . a ) - 1 )
return
}
if count <= th . a [ 0 ] . Count {
return
}
th . a [ 0 ] = TopHeapEntry {
Name : string ( name ) ,
Count : count ,
}
heap . Fix ( th , 0 )
}
func ( th * topHeap ) getSortedResult ( ) [ ] TopHeapEntry {
result := append ( [ ] TopHeapEntry { } , th . a ... )
sort . Slice ( result , func ( i , j int ) bool {
a , b := result [ i ] , result [ j ]
if a . Count != b . Count {
return a . Count > b . Count
}
return a . Name < b . Name
} )
return result
}
// heap.Interface implementation for topHeap.
func ( th * topHeap ) Len ( ) int {
return len ( th . a )
}
func ( th * topHeap ) Less ( i , j int ) bool {
a := th . a
return a [ i ] . Count < a [ j ] . Count
}
func ( th * topHeap ) Swap ( i , j int ) {
a := th . a
a [ j ] , a [ i ] = a [ i ] , a [ j ]
}
2023-09-01 09:34:16 +02:00
func ( th * topHeap ) Push ( _ interface { } ) {
2020-04-22 18:57:36 +02:00
panic ( fmt . Errorf ( "BUG: Push shouldn't be called" ) )
}
func ( th * topHeap ) Pop ( ) interface { } {
panic ( fmt . Errorf ( "BUG: Pop shouldn't be called" ) )
}
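The topHeap type above keeps the smallest retained count at the root, so a new candidate only has to be compared with the root to decide whether it belongs in the top N. Below is a self-contained sketch of the same bounded min-heap technique. The `entry`, `minHeap` and `topN` names are hypothetical. Unlike topHeap, the sketch implements Push and Pop and uses heap.Push while filling up, and it adds a guard for n <= 0.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for TopHeapEntry.
type entry struct {
	Name  string
	Count uint64
}

// minHeap implements heap.Interface; the entry with the smallest Count sits at index 0.
type minHeap []entry

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Count < h[j].Count }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// topN returns up to n entries with the largest counts,
// sorted by descending count and then by ascending name.
func topN(items []entry, n int) []entry {
	if n <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, it := range items {
		if it.Count == 0 {
			continue
		}
		if h.Len() < n {
			heap.Push(h, it)
			continue
		}
		if it.Count <= (*h)[0].Count {
			// Not larger than the smallest retained entry.
			continue
		}
		// Replace the smallest retained entry and restore heap order.
		(*h)[0] = it
		heap.Fix(h, 0)
	}
	result := append([]entry{}, (*h)...)
	sort.Slice(result, func(i, j int) bool {
		if result[i].Count != result[j].Count {
			return result[i].Count > result[j].Count
		}
		return result[i].Name < result[j].Name
	})
	return result
}

func main() {
	items := []entry{{"a", 5}, {"b", 1}, {"c", 9}, {"d", 0}, {"e", 7}}
	fmt.Println(topN(items, 2)) // prints [{c 9} {e 7}]
}
```

Memory stays bounded by n regardless of how many candidates are pushed, which is the reason for using a heap instead of sorting all entries.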
2021-03-22 21:41:47 +01:00
// searchMetricNameWithCache appends metric name for the given metricID to dst
2019-05-22 23:16:55 +02:00
// and returns the result.
2023-09-22 11:32:59 +02:00
func ( db * indexDB ) searchMetricNameWithCache ( dst [ ] byte , metricID uint64 ) ( [ ] byte , bool ) {
2021-03-22 21:41:47 +01:00
metricName := db . getMetricNameFromCache ( dst , metricID )
if len ( metricName ) > len ( dst ) {
2023-09-22 11:32:59 +02:00
return metricName , true
2021-03-22 21:41:47 +01:00
}
2020-07-23 19:42:57 +02:00
is := db . getIndexSearch ( noDeadline )
2023-09-22 11:32:59 +02:00
var ok bool
dst , ok = is . searchMetricName ( dst , metricID )
2019-05-22 23:16:55 +02:00
db . putIndexSearch ( is )
2023-09-22 11:32:59 +02:00
if ok {
2021-04-13 09:20:35 +02:00
// There is no need to verify whether the given metricID is deleted,
// since the filtering must be performed before calling this func.
db . putMetricNameToCache ( metricID , dst )
2023-09-22 11:32:59 +02:00
return dst , true
2019-05-22 23:16:55 +02:00
}
// Try searching in the external indexDB.
if db . doExtDB ( func ( extDB * indexDB ) {
2020-07-23 19:42:57 +02:00
is := extDB . getIndexSearch ( noDeadline )
2023-09-22 11:32:59 +02:00
dst , ok = is . searchMetricName ( dst , metricID )
2019-05-22 23:16:55 +02:00
extDB . putIndexSearch ( is )
2023-09-22 11:32:59 +02:00
if ok {
2021-04-13 09:20:35 +02:00
// There is no need to verify whether the given metricID is deleted,
// since the filtering must be performed before calling this func.
extDB . putMetricNameToCache ( metricID , dst )
}
2023-09-22 11:32:59 +02:00
} ) && ok {
return dst , true
2019-05-22 23:16:55 +02:00
}
2024-03-17 23:19:17 +01:00
// Cannot find the MetricName for the given metricID.
// There are the following expected cases when this may happen:
//
// 1. The corresponding metricID -> metricName entry isn't visible for search yet.
// The solution is to wait for some time and try the search again.
// It is OK if newly registered time series isn't visible for search during some time.
// This should resolve https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959
//
// 2. The metricID -> metricName entry doesn't exist in the indexdb.
// This is possible after unclean shutdown or after restoring of indexdb from a snapshot.
// In this case the metricID must be deleted, so new metricID is registered
// again when new sample for the given metricName is ingested next time.
//
ct := fasttime . UnixTimestamp ( )
db . s . missingMetricIDsLock . Lock ( )
if ct > db . s . missingMetricIDsResetDeadline {
db . s . missingMetricIDs = nil
db . s . missingMetricIDsResetDeadline = ct + 2 * 60
}
2024-03-27 09:51:03 +01:00
deleteDeadline , ok := db . s . missingMetricIDs [ metricID ]
2024-03-17 23:19:17 +01:00
if ! ok {
if db . s . missingMetricIDs == nil {
db . s . missingMetricIDs = make ( map [ uint64 ] uint64 )
}
2024-03-27 09:51:03 +01:00
deleteDeadline = ct + 60
db . s . missingMetricIDs [ metricID ] = deleteDeadline
2024-03-17 23:19:17 +01:00
}
db . s . missingMetricIDsLock . Unlock ( )
2024-03-27 09:51:03 +01:00
if ct > deleteDeadline {
2024-03-17 23:19:17 +01:00
// Cannot find the MetricName for the given metricID for the last 60 seconds.
// It is likely the indexDB contains incomplete set of metricID -> metricName entries
// after unclean shutdown or after restoring from a snapshot.
// Mark the metricID as deleted, so it is created again when new sample
// for the given time series is ingested next time.
db . missingMetricNamesForMetricID . Add ( 1 )
db . deleteMetricIDs ( [ ] uint64 { metricID } )
}
2023-09-22 11:32:59 +02:00
return dst , false
2019-05-22 23:16:55 +02:00
}
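The tail of searchMetricNameWithCache gives a missing metricID a 60-second grace period before marking it as deleted, and it drops the whole tracking map every two minutes so the map cannot grow without bound. The sketch below isolates that bookkeeping. The `missingTracker` type and `shouldDelete` method are hypothetical names, and the caller passes the current Unix time explicitly instead of reading a clock.

```go
package main

import (
	"fmt"
	"sync"
)

// missingTracker is a hypothetical helper that remembers when each missing id
// becomes eligible for deletion.
type missingTracker struct {
	mu            sync.Mutex
	deadlines     map[uint64]uint64 // id -> Unix time after which the id may be deleted
	resetDeadline uint64            // Unix time after which the whole map is dropped
}

// shouldDelete records a failed lookup for id at time now (Unix seconds)
// and reports whether the id has been missing for longer than 60 seconds.
func (t *missingTracker) shouldDelete(id, now uint64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if now > t.resetDeadline {
		// Periodically forget everything in order to bound memory usage.
		t.deadlines = nil
		t.resetDeadline = now + 2*60
	}
	deadline, ok := t.deadlines[id]
	if !ok {
		if t.deadlines == nil {
			t.deadlines = make(map[uint64]uint64)
		}
		deadline = now + 60
		t.deadlines[id] = deadline
	}
	return now > deadline
}

func main() {
	tr := &missingTracker{}
	fmt.Println(tr.shouldDelete(42, 1000)) // first miss: false
	fmt.Println(tr.shouldDelete(42, 1030)) // still inside the grace period: false
	fmt.Println(tr.shouldDelete(42, 1061)) // grace period elapsed: true
}
```

A consequence of the periodic reset is that an id whose misses straddle a reset starts a fresh grace period, so deletion only happens when two misses more than 60 seconds apart fall within the same two-minute window.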
// DeleteTSIDs marks as deleted all the TSIDs matching the given tfss.
//
// The caller must reset all the caches which may contain the deleted TSIDs.
//
// Returns the number of metrics deleted.
2022-06-27 11:53:46 +02:00
func ( db * indexDB ) DeleteTSIDs ( qt * querytracer . Tracer , tfss [ ] * TagFilters ) ( int , error ) {
qt = qt . NewChild ( "deleting series for %s" , tfss )
defer qt . Done ( )
2019-05-22 23:16:55 +02:00
if len ( tfss ) == 0 {
return 0 , nil
}
// Obtain metricIDs to delete.
2019-11-09 17:48:58 +01:00
tr := TimeRange {
MinTimestamp : 0 ,
2019-11-09 22:05:14 +01:00
MaxTimestamp : ( 1 << 63 ) - 1 ,
2019-11-09 17:48:58 +01:00
}
2020-07-23 19:42:57 +02:00
is := db . getIndexSearch ( noDeadline )
2022-06-27 11:53:46 +02:00
metricIDs , err := is . searchMetricIDs ( qt , tfss , tr , 2e9 )
2019-05-22 23:16:55 +02:00
db . putIndexSearch ( is )
if err != nil {
return 0 , err
}
2022-12-04 08:30:31 +01:00
db . deleteMetricIDs ( metricIDs )
2019-12-02 19:44:18 +01:00
// Delete TSIDs in the extDB.
deletedCount := len ( metricIDs )
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2019-12-02 19:44:18 +01:00
var n int
2022-06-27 11:53:46 +02:00
qtChild := qt . NewChild ( "deleting series from the previous indexdb" )
n , err = extDB . DeleteTSIDs ( qtChild , tfss )
qtChild . Donef ( "deleted %d series" , n )
2019-12-02 19:44:18 +01:00
deletedCount += n
2022-12-19 22:20:58 +01:00
} )
if err != nil {
return deletedCount , fmt . Errorf ( "cannot delete tsids in extDB: %w" , err )
2019-12-02 19:44:18 +01:00
}
return deletedCount , nil
}
2022-12-04 08:30:31 +01:00
func ( db * indexDB ) deleteMetricIDs ( metricIDs [ ] uint64 ) {
2019-05-22 23:16:55 +02:00
if len ( metricIDs ) == 0 {
// Nothing to delete
2022-12-04 08:30:31 +01:00
return
2019-05-22 23:16:55 +02:00
}
// Atomically add deleted metricIDs to an in-memory map.
2019-11-03 23:34:24 +01:00
dmis := & uint64set . Set { }
2020-07-21 19:56:49 +02:00
dmis . AddMulti ( metricIDs )
2021-06-15 13:56:51 +02:00
db . s . updateDeletedMetricIDs ( dmis )
2019-05-22 23:16:55 +02:00
// Reset TagFilters -> TSIDs cache, since it may contain deleted TSIDs.
2021-07-06 10:01:51 +02:00
invalidateTagFiltersCache ( )
2019-05-22 23:16:55 +02:00
2020-07-14 13:02:14 +02:00
// Reset MetricName -> TSID cache, since it may contain deleted TSIDs.
2021-06-11 11:42:26 +02:00
db . s . resetAndSaveTSIDCache ( )
2020-07-14 13:02:14 +02:00
2021-06-11 11:42:26 +02:00
// Store the metricIDs as deleted.
// Do this after updating the deletedMetricIDs and resetting caches
// in order to exclude the possibility of an inconsistent state in which the deleted metricIDs
// remain available in the tsidCache after unclean shutdown.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
items := getIndexItems ( )
for _ , metricID := range metricIDs {
items . B = append ( items . B , nsPrefixDeletedMetricID )
items . B = encoding . MarshalUint64 ( items . B , metricID )
items . Next ( )
}
2022-12-04 08:30:31 +01:00
db . tb . AddItems ( items . Items )
2021-06-11 11:42:26 +02:00
putIndexItems ( items )
2019-05-22 23:16:55 +02:00
}
2021-06-15 13:56:51 +02:00
func ( db * indexDB ) loadDeletedMetricIDs ( ) ( * uint64set . Set , error ) {
is := db . getIndexSearch ( noDeadline )
dmis , err := is . loadDeletedMetricIDs ( )
db . putIndexSearch ( is )
if err != nil {
return nil , err
}
return dmis , nil
2019-05-22 23:16:55 +02:00
}
2019-09-24 20:10:22 +02:00
func ( is * indexSearch ) loadDeletedMetricIDs ( ) ( * uint64set . Set , error ) {
dmis := & uint64set . Set { }
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2019-09-25 12:47:06 +02:00
kb . B = append ( kb . B [ : 0 ] , nsPrefixDeletedMetricID )
2019-05-22 23:16:55 +02:00
ts . Seek ( kb . B )
for ts . NextItem ( ) {
item := ts . Item
if ! bytes . HasPrefix ( item , kb . B ) {
break
}
item = item [ len ( kb . B ) : ]
if len ( item ) != 8 {
return nil , fmt . Errorf ( "unexpected item len; got %d bytes; want %d bytes" , len ( item ) , 8 )
}
metricID := encoding . UnmarshalUint64 ( item )
2019-09-24 20:10:22 +02:00
dmis . Add ( metricID )
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
return nil , err
}
return dmis , nil
}
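loadDeletedMetricIDs is an instance of a prefix scan over sorted items: seek to the first item with the namespace prefix, read items while the prefix still matches, and decode the fixed-size tail. The sketch below shows the pattern on a plain sorted slice. The names `scanIDs` and `makeItem` and the prefix value 7 are hypothetical, and the 8-byte big-endian encoding is an assumption made so that byte order matches numeric order; the real table search and encoding helpers are not reproduced here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// makeItem builds a hypothetical index item: a one-byte namespace prefix
// followed by an 8-byte big-endian id.
func makeItem(prefix byte, id uint64) []byte {
	item := make([]byte, 9)
	item[0] = prefix
	binary.BigEndian.PutUint64(item[1:], id)
	return item
}

// scanIDs returns the ids of all items starting with prefix.
// items must be sorted in ascending byte order.
func scanIDs(items [][]byte, prefix []byte) ([]uint64, error) {
	// Seek: find the first item that is not less than the prefix.
	start := sort.Search(len(items), func(i int) bool {
		return bytes.Compare(items[i], prefix) >= 0
	})
	var ids []uint64
	for _, item := range items[start:] {
		if !bytes.HasPrefix(item, prefix) {
			// Items are sorted, so no later item can match the prefix.
			break
		}
		tail := item[len(prefix):]
		if len(tail) != 8 {
			return nil, fmt.Errorf("unexpected item len; got %d bytes; want %d bytes", len(tail), 8)
		}
		ids = append(ids, binary.BigEndian.Uint64(tail))
	}
	return ids, nil
}

func main() {
	items := [][]byte{makeItem(6, 1), makeItem(7, 10), makeItem(7, 300), makeItem(8, 2)}
	ids, err := scanIDs(items, []byte{7})
	fmt.Println(ids, err) // prints [10 300] <nil>
}
```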
2024-01-23 15:09:52 +01:00
// searchMetricIDs returns metricIDs for the given tfss and tr.
//
// The returned metricIDs are sorted.
2022-10-23 11:15:24 +02:00
func ( db * indexDB ) searchMetricIDs ( qt * querytracer . Tracer , tfss [ ] * TagFilters , tr TimeRange , maxMetrics int , deadline uint64 ) ( [ ] uint64 , error ) {
qt = qt . NewChild ( "search for matching metricIDs: filters=%s, timeRange=%s" , tfss , & tr )
defer qt . Done ( )
2019-05-22 23:16:55 +02:00
if len ( tfss ) == 0 {
return nil , nil
}
2022-10-23 11:15:24 +02:00
qtChild := qt . NewChild ( "search for metricIDs in the current indexdb" )
2019-05-22 23:16:55 +02:00
tfKeyBuf := tagFiltersKeyBufPool . Get ( )
defer tagFiltersKeyBufPool . Put ( tfKeyBuf )
2019-11-06 12:39:48 +01:00
tfKeyBuf . B = marshalTagFiltersKey ( tfKeyBuf . B [ : 0 ] , tfss , tr , true )
2022-10-23 11:15:24 +02:00
metricIDs , ok := db . getMetricIDsFromTagFiltersCache ( qtChild , tfKeyBuf . B )
2019-05-22 23:16:55 +02:00
if ok {
2022-10-23 11:15:24 +02:00
// Fast path - metricIDs found in the cache
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2022-10-23 11:15:24 +02:00
return metricIDs , nil
2019-05-22 23:16:55 +02:00
}
2022-10-23 11:15:24 +02:00
// Slow path - search for metricIDs in the db and extDB.
2020-07-23 19:42:57 +02:00
is := db . getIndexSearch ( deadline )
2022-10-23 11:15:24 +02:00
localMetricIDs , err := is . searchMetricIDs ( qtChild , tfss , tr , maxMetrics )
2019-05-22 23:16:55 +02:00
db . putIndexSearch ( is )
if err != nil {
2023-10-25 21:24:01 +02:00
return nil , fmt . Errorf ( "error when searching for metricIDs in the current indexdb: %w" , err )
2019-05-22 23:16:55 +02:00
}
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2019-05-22 23:16:55 +02:00
2022-10-23 11:15:24 +02:00
var extMetricIDs [ ] uint64
2022-12-19 22:20:58 +01:00
db . doExtDB ( func ( extDB * indexDB ) {
2022-10-23 11:15:24 +02:00
qtChild := qt . NewChild ( "search for metricIDs in the previous indexdb" )
2022-06-09 18:46:26 +02:00
defer qtChild . Done ( )
2019-06-25 12:08:56 +02:00
tfKeyExtBuf := tagFiltersKeyBufPool . Get ( )
defer tagFiltersKeyBufPool . Put ( tfKeyExtBuf )
// Data in extDB cannot be changed, so use unversioned keys for tag cache.
2019-11-06 12:39:48 +01:00
tfKeyExtBuf . B = marshalTagFiltersKey ( tfKeyExtBuf . B [ : 0 ] , tfss , tr , false )
2022-10-23 11:15:24 +02:00
metricIDs , ok := extDB . getMetricIDsFromTagFiltersCache ( qtChild , tfKeyExtBuf . B )
2019-05-22 23:16:55 +02:00
if ok {
2022-10-23 11:15:24 +02:00
extMetricIDs = metricIDs
2019-05-22 23:16:55 +02:00
return
}
2020-07-23 19:42:57 +02:00
is := extDB . getIndexSearch ( deadline )
2022-10-23 11:15:24 +02:00
extMetricIDs , err = is . searchMetricIDs ( qtChild , tfss , tr , maxMetrics )
2019-05-22 23:16:55 +02:00
extDB . putIndexSearch ( is )
2022-10-23 11:15:24 +02:00
extDB . putMetricIDsToTagFiltersCache ( qtChild , extMetricIDs , tfKeyExtBuf . B )
2022-12-19 22:20:58 +01:00
} )
if err != nil {
2023-10-25 21:24:01 +02:00
return nil , fmt . Errorf ( "error when searching for metricIDs in the previous indexdb: %w" , err )
2019-05-22 23:16:55 +02:00
}
2022-10-23 11:15:24 +02:00
// Merge localMetricIDs with extMetricIDs.
metricIDs = mergeSortedMetricIDs ( localMetricIDs , extMetricIDs )
qt . Printf ( "merge %d metricIDs from the current indexdb with %d metricIDs from the previous indexdb; result: %d metricIDs" ,
len ( localMetricIDs ) , len ( extMetricIDs ) , len ( metricIDs ) )
// Store metricIDs in the cache.
db . putMetricIDsToTagFiltersCache ( qt , metricIDs , tfKeyBuf . B )
return metricIDs , nil
}
func mergeSortedMetricIDs ( a , b [ ] uint64 ) [ ] uint64 {
if len ( b ) == 0 {
return a
}
i := 0
j := 0
result := make ( [ ] uint64 , 0 , len ( a ) + len ( b ) )
for {
next := b [ j ]
start := i
for i < len ( a ) && a [ i ] <= next {
i ++
}
result = append ( result , a [ start : i ] ... )
if len ( result ) > 0 {
last := result [ len ( result ) - 1 ]
for j < len ( b ) && b [ j ] == last {
j ++
}
}
if i == len ( a ) {
return append ( result , b [ j : ] ... )
}
a , b = b , a
i , j = j , i
}
}
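mergeSortedMetricIDs above copies runs from one slice and then swaps the roles of the two inputs. For comparison, the sketch below uses the simpler element-by-element two-pointer merge, which produces the same kind of result (a sorted union without duplicates) but is not the algorithm used above. The name `mergeSortedUnique` is hypothetical, and the sketch assumes each input is sorted in ascending order and has no duplicates of its own.

```go
package main

import "fmt"

// mergeSortedUnique merges two ascending slices into one ascending slice,
// emitting values present in both inputs only once.
func mergeSortedUnique(a, b []uint64) []uint64 {
	result := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			result = append(result, a[i])
			i++
		case a[i] > b[j]:
			result = append(result, b[j])
			j++
		default:
			// Equal values: keep one copy and advance both inputs.
			result = append(result, a[i])
			i++
			j++
		}
	}
	// At most one of the two tails is non-empty here.
	result = append(result, a[i:]...)
	return append(result, b[j:]...)
}

func main() {
	fmt.Println(mergeSortedUnique([]uint64{1, 3, 5}, []uint64{2, 3, 6})) // prints [1 2 3 5 6]
}
```

The run-copying variant above appends whole sub-slices at once, which can be cheaper when one input contains long stretches that fall between neighbouring values of the other.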
func ( db * indexDB ) getTSIDsFromMetricIDs ( qt * querytracer . Tracer , metricIDs [ ] uint64 , deadline uint64 ) ( [ ] TSID , error ) {
qt = qt . NewChild ( "obtain tsids from %d metricIDs" , len ( metricIDs ) )
defer qt . Done ( )
if len ( metricIDs ) == 0 {
return nil , nil
}
2023-09-15 11:54:50 +02:00
// Search for TSIDs in the current indexdb
2022-10-23 11:15:24 +02:00
tsids := make ( [ ] TSID , len ( metricIDs ) )
2022-12-19 20:56:46 +01:00
var extMetricIDs [ ] uint64
2022-10-23 11:15:24 +02:00
i := 0
2022-12-19 20:56:46 +01:00
err := func ( ) error {
is := db . getIndexSearch ( deadline )
defer db . putIndexSearch ( is )
for loopsPaceLimiter , metricID := range metricIDs {
if loopsPaceLimiter & paceLimiterSlowIterationsMask == 0 {
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
2022-10-23 11:15:24 +02:00
}
2022-12-19 20:56:46 +01:00
// Try obtaining TSIDs from MetricID->TSID cache. This is much faster
// than scanning the mergeset if it contains a lot of metricIDs.
tsid := & tsids [ i ]
err := is . db . getFromMetricIDCache ( tsid , metricID )
if err == nil {
// Fast path - the tsid for metricID is found in cache.
i ++
continue
}
if err != io . EOF {
return err
}
2023-09-15 11:54:50 +02:00
if ! is . getTSIDByMetricID ( tsid , metricID ) {
// Postpone searching for the missing metricID in the extDB.
extMetricIDs = append ( extMetricIDs , metricID )
continue
2022-12-19 20:56:46 +01:00
}
is . db . putToMetricIDCache ( metricID , tsid )
2022-10-23 11:15:24 +02:00
i ++
}
2022-12-19 20:56:46 +01:00
return nil
} ( )
if err != nil {
return nil , fmt . Errorf ( "error when searching for TSIDs by metricIDs in the current indexdb: %w" , err )
}
tsidsFound := i
qt . Printf ( "found %d tsids for %d metricIDs in the current indexdb" , tsidsFound , len ( metricIDs ) )
2023-09-15 11:54:50 +02:00
if len ( extMetricIDs ) > 0 {
// Search for extMetricIDs in the previous indexdb (aka extDB)
db . doExtDB ( func ( extDB * indexDB ) {
is := extDB . getIndexSearch ( deadline )
defer extDB . putIndexSearch ( is )
for loopsPaceLimiter , metricID := range extMetricIDs {
if loopsPaceLimiter & paceLimiterSlowIterationsMask == 0 {
if err = checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return
}
2022-12-19 20:56:46 +01:00
}
2023-09-15 11:54:50 +02:00
// There is no need to search for TSIDs in the MetricID->TSID cache, since
// this has already been done in the loop above (the MetricID->TSID cache is global).
tsid := & tsids [ i ]
if ! is . getTSIDByMetricID ( tsid , metricID ) {
2022-12-19 20:56:46 +01:00
// Cannot find TSID for the given metricID.
// This may be the case on incomplete indexDB
// due to snapshot or due to unflushed entries.
2022-12-20 22:56:53 +01:00
// Just increment errors counter and skip it for now.
2024-02-23 23:15:21 +01:00
is . db . missingTSIDsForMetricID . Add ( 1 )
2022-12-19 20:56:46 +01:00
continue
}
2023-09-15 11:54:50 +02:00
is . db . putToMetricIDCache ( metricID , tsid )
i ++
2022-10-23 11:15:24 +02:00
}
2023-09-15 11:54:50 +02:00
} )
if err != nil {
return nil , fmt . Errorf ( "error when searching for TSIDs by metricIDs in the previous indexdb: %w" , err )
2022-10-23 11:15:24 +02:00
}
2023-09-15 11:54:50 +02:00
qt . Printf ( "found %d tsids for %d metricIDs in the previous indexdb" , i - tsidsFound , len ( extMetricIDs ) )
2022-10-23 11:15:24 +02:00
}
2022-12-19 20:56:46 +01:00
2022-10-23 11:15:24 +02:00
tsids = tsids [ : i ]
2022-12-19 20:56:46 +01:00
qt . Printf ( "load %d tsids for %d metricIDs from both current and previous indexdb" , len ( tsids ) , len ( metricIDs ) )
2019-05-22 23:16:55 +02:00
// Sort the found tsids, since they must be passed to TSID search
// in the sorted order.
sort . Slice ( tsids , func ( i , j int ) bool { return tsids [ i ] . Less ( & tsids [ j ] ) } )
2022-06-09 18:46:26 +02:00
qt . Printf ( "sort %d tsids" , len ( tsids ) )
2022-10-23 11:15:24 +02:00
return tsids , nil
2019-05-22 23:16:55 +02:00
}
var tagFiltersKeyBufPool bytesutil . ByteBufferPool
lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index also lacks the corresponding entry.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9350795af41cbfc3ca0e8a5af084fce
2023-07-14 00:33:41 +02:00
func ( is * indexSearch ) getTSIDByMetricNameNoExtDB ( dst * TSID , metricName [ ] byte , date uint64 ) bool {
2021-06-15 13:56:51 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2023-07-14 00:33:41 +02:00
kb . B = marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixDateMetricNameToTSID )
kb . B = encoding . MarshalUint64 ( kb . B , date )
2019-05-22 23:16:55 +02:00
kb . B = append ( kb . B , metricName ... )
kb . B = append ( kb . B , kvSeparatorChar )
ts . Seek ( kb . B )
for ts . NextItem ( ) {
if ! bytes . HasPrefix ( ts . Item , kb . B ) {
// Nothing found.
2023-07-14 00:33:41 +02:00
return false
2019-05-22 23:16:55 +02:00
}
v := ts . Item [ len ( kb . B ) : ]
tail , err := dst . Unmarshal ( v )
if err != nil {
2023-07-14 00:33:41 +02:00
logger . Panicf ( "FATAL: cannot unmarshal TSID: %s" , err )
2019-05-22 23:16:55 +02:00
}
if len ( tail ) > 0 {
2023-07-14 00:33:41 +02:00
logger . Panicf ( "FATAL: unexpected non-empty tail left after unmarshaling TSID: %X" , tail )
2019-05-22 23:16:55 +02:00
}
2019-09-24 20:10:22 +02:00
if dmis . Len ( ) > 0 {
2019-05-22 23:16:55 +02:00
// Verify whether the dst is marked as deleted.
2019-09-24 20:10:22 +02:00
if dmis . Has ( dst . MetricID ) {
2019-05-25 20:51:11 +02:00
// The dst is deleted. Continue searching.
2019-05-22 23:16:55 +02:00
continue
}
}
// Found valid dst.
2023-07-14 00:33:41 +02:00
return true
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
2023-07-14 00:33:41 +02:00
logger . Panicf ( "FATAL: error when searching TSID by metricName; searchPrefix %q: %s" , kb . B , err )
2019-05-22 23:16:55 +02:00
}
// Nothing found
2023-07-14 00:33:41 +02:00
return false
2019-05-22 23:16:55 +02:00
}
2023-09-22 11:32:59 +02:00
func ( is * indexSearch ) searchMetricNameWithCache ( dst [ ] byte , metricID uint64 ) ( [ ] byte , bool ) {
2019-05-22 23:16:55 +02:00
metricName := is . db . getMetricNameFromCache ( dst , metricID )
if len ( metricName ) > len ( dst ) {
2023-09-22 11:32:59 +02:00
return metricName , true
2019-05-22 23:16:55 +02:00
}
2023-09-22 11:32:59 +02:00
var ok bool
dst , ok = is . searchMetricName ( dst , metricID )
if ok {
2021-04-13 09:20:35 +02:00
// There is no need to verify whether the given metricID is deleted,
// since the filtering must be performed before calling this func.
is . db . putMetricNameToCache ( metricID , dst )
2023-09-22 11:32:59 +02:00
return dst , true
2021-04-13 09:20:35 +02:00
}
2023-09-22 11:32:59 +02:00
return dst , false
2021-03-22 21:41:47 +01:00
}
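searchMetricNameWithCache detects a cache hit by checking whether the returned slice is longer than dst. A minimal sketch of that append-based convention follows; the nameCache type and the slow callback are hypothetical stand-ins for the real cache and index search.

```go
package main

import "fmt"

// nameCache is a hypothetical stand-in for the metricID -> metricName cache.
type nameCache map[uint64][]byte

// get appends the cached name to dst. On a miss nothing is appended,
// so the result has the same length as dst.
func (c nameCache) get(dst []byte, metricID uint64) []byte {
	return append(dst, c[metricID]...)
}

// lookup consults the cache first and falls back to the slow path,
// storing the result in the cache when the slow path succeeds.
func lookup(c nameCache, dst []byte, metricID uint64, slow func(uint64) ([]byte, bool)) ([]byte, bool) {
	if name := c.get(dst, metricID); len(name) > len(dst) {
		return name, true // cache hit: something was appended
	}
	v, ok := slow(metricID)
	if !ok {
		return dst, false
	}
	c[metricID] = append([]byte(nil), v...) // store a private copy
	return append(dst, v...), true
}

func main() {
	calls := 0
	slow := func(id uint64) ([]byte, bool) {
		calls++
		if id == 42 {
			return []byte("http_requests_total"), true
		}
		return nil, false
	}
	c := nameCache{}
	for i := 0; i < 2; i++ {
		name, ok := lookup(c, nil, 42, slow)
		fmt.Printf("%s found=%v slowCalls=%d\n", name, ok, calls)
	}
}
```

The length comparison only works when cached values are never empty, which is a reasonable assumption for marshaled metric names.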
2019-05-22 23:16:55 +02:00
2023-09-22 11:32:59 +02:00
func ( is * indexSearch ) searchMetricName ( dst [ ] byte , metricID uint64 ) ( [ ] byte , bool ) {
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2020-07-23 23:31:09 +02:00
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixMetricIDToMetricName )
2019-05-22 23:16:55 +02:00
kb . B = encoding . MarshalUint64 ( kb . B , metricID )
if err := ts . FirstItemWithPrefix ( kb . B ) ; err != nil {
if err == io . EOF {
2023-09-22 11:32:59 +02:00
return dst , false
2019-05-22 23:16:55 +02:00
}
2023-10-25 21:24:01 +02:00
logger . Panicf ( "FATAL: error when searching metricName by metricID; searchPrefix %q: %s" , kb . B , err )
2019-05-22 23:16:55 +02:00
}
v := ts . Item [ len ( kb . B ) : ]
dst = append ( dst , v ... )
2023-09-22 11:32:59 +02:00
return dst , true
2019-05-22 23:16:55 +02:00
}
2020-03-31 11:34:29 +02:00
func ( is * indexSearch ) containsTimeRange ( tr TimeRange ) ( bool , error ) {
ts := & is . ts
kb := & is . kb
2022-02-24 11:47:24 +01:00
// Verify whether the tr.MinTimestamp is included into `ts` or is smaller than the minimum date stored in `ts`.
// Do not check whether tr.MaxTimestamp is included into `ts` or is bigger than the max date stored in `ts` for performance reasons.
// This means that containsTimeRange() can return true if `tr` is located below the min date stored in `ts`.
// This is OK, since this case isn't encountered too much in practice.
// The main practical case allows skipping searching in prev indexdb (`ts`) when `tr`
// is located above the max date stored there.
2020-03-31 11:34:29 +02:00
minDate := uint64 ( tr . MinTimestamp ) / msecPerDay
2020-07-23 23:31:09 +02:00
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixDateToMetricID )
2020-03-31 11:34:29 +02:00
prefix := kb . B
kb . B = encoding . MarshalUint64 ( kb . B , minDate )
ts . Seek ( kb . B )
if ! ts . NextItem ( ) {
if err := ts . Error ( ) ; err != nil {
2020-06-30 21:58:18 +02:00
return false , fmt . Errorf ( "error when searching for minDate=%d, prefix %q: %w" , minDate , kb . B , err )
2020-03-31 11:34:29 +02:00
}
return false , nil
}
if ! bytes . HasPrefix ( ts . Item , prefix ) {
// minDate exceeds max date from ts.
return false , nil
}
return true , nil
}
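The check in containsTimeRange can be pictured as a seek over sorted dates. The sketch below models the index as a sorted slice of day numbers, which is an assumption made for illustration; the real code seeks over merged on-disk items.

```go
package main

import (
	"fmt"
	"sort"
)

const msecPerDay = 24 * 3600 * 1000

// containsMinDate reports whether sortedDates has an entry at or after
// the day that contains minTimestampMs. As in the comment above, it also
// returns true when the requested day lies below the smallest indexed date.
func containsMinDate(sortedDates []uint64, minTimestampMs int64) bool {
	minDate := uint64(minTimestampMs) / msecPerDay
	i := sort.Search(len(sortedDates), func(i int) bool {
		return sortedDates[i] >= minDate
	})
	return i < len(sortedDates)
}

func main() {
	dates := []uint64{100, 101, 102}
	fmt.Println(containsMinDate(dates, 101*msecPerDay+5)) // inside the indexed range
	fmt.Println(containsMinDate(dates, 50*msecPerDay))    // below the range: accepted false positive
	fmt.Println(containsMinDate(dates, 103*msecPerDay))   // above the range: search can be skipped
}
```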
2023-09-15 11:54:50 +02:00
func ( is * indexSearch ) getTSIDByMetricID ( dst * TSID , metricID uint64 ) bool {
2019-05-22 23:16:55 +02:00
// There is no need to check for deleted metricIDs here, since they
// must be checked by the caller.
ts := & is . ts
kb := & is . kb
2020-07-23 23:31:09 +02:00
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixMetricIDToTSID )
2019-05-22 23:16:55 +02:00
kb . B = encoding . MarshalUint64 ( kb . B , metricID )
if err := ts . FirstItemWithPrefix ( kb . B ) ; err != nil {
if err == io . EOF {
2023-09-15 11:54:50 +02:00
return false
2019-05-22 23:16:55 +02:00
}
2023-09-15 11:54:50 +02:00
logger . Panicf ( "FATAL: error when searching TSID by metricID=%d; searchPrefix %q: %s" , metricID , kb . B , err )
2019-05-22 23:16:55 +02:00
}
v := ts . Item [ len ( kb . B ) : ]
tail , err := dst . Unmarshal ( v )
if err != nil {
2023-09-15 11:54:50 +02:00
logger . Panicf ( "FATAL: cannot unmarshal the found TSID=%X for metricID=%d: %s" , v , metricID , err )
2019-05-22 23:16:55 +02:00
}
if len ( tail ) > 0 {
2023-09-15 11:54:50 +02:00
logger . Panicf ( "FATAL: unexpected non-zero tail left after unmarshaling TSID for metricID=%d: %X" , metricID , tail )
2019-05-22 23:16:55 +02:00
}
2023-09-15 11:54:50 +02:00
return true
2019-05-22 23:16:55 +02:00
}
2019-06-10 11:57:34 +02:00
// updateMetricIDsByMetricNameMatch matches metricName values for the given srcMetricIDs against tfs
2019-05-22 23:16:55 +02:00
// and adds matching metrics to metricIDs.
2022-06-09 18:46:26 +02:00
func ( is * indexSearch ) updateMetricIDsByMetricNameMatch ( qt * querytracer . Tracer , metricIDs , srcMetricIDs * uint64set . Set , tfs [ ] * tagFilter ) error {
qt = qt . NewChild ( "filter out %d metric ids with filters=%s" , srcMetricIDs . Len ( ) , tfs )
defer qt . Done ( )
2019-05-22 23:16:55 +02:00
// sort srcMetricIDs in order to speed up Seek below.
2019-09-24 20:10:22 +02:00
sortedMetricIDs := srcMetricIDs . AppendTo ( nil )
2022-06-09 18:46:26 +02:00
qt . Printf ( "sort %d metric ids" , len ( sortedMetricIDs ) )
2019-05-22 23:16:55 +02:00
2021-02-10 00:24:45 +01:00
kb := & is . kb
2021-02-10 13:37:14 +01:00
kb . B = is . marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixTagToMetricIDs )
tfs = removeCompositeTagFilters ( tfs , kb . B )
2021-02-10 00:24:45 +01:00
2019-05-22 23:16:55 +02:00
metricName := kbPool . Get ( )
defer kbPool . Put ( metricName )
mn := GetMetricName ( )
defer PutMetricName ( mn )
2020-07-23 18:21:49 +02:00
for loopsPaceLimiter , metricID := range sortedMetricIDs {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterSlowIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
2020-07-23 18:21:49 +02:00
}
2023-09-22 11:32:59 +02:00
var ok bool
metricName . B , ok = is . searchMetricNameWithCache ( metricName . B [ : 0 ] , metricID )
if ! ok {
// It is likely the metricID->metricName entry didn't propagate to inverted index yet.
// Skip this metricID for now.
continue
2019-05-22 23:16:55 +02:00
}
if err := mn . Unmarshal ( metricName . B ) ; err != nil {
2023-09-22 11:32:59 +02:00
logger . Panicf ( "FATAL: cannot unmarshal metricName %q: %s" , metricName . B , err )
2019-05-22 23:16:55 +02:00
}
// Match the mn against tfs.
ok , err := matchTagFilters ( mn , tfs , & is . kb )
if err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "cannot match MetricName %s against tagFilters: %w" , mn , err )
2019-05-22 23:16:55 +02:00
}
if ! ok {
continue
}
2019-09-24 20:10:22 +02:00
metricIDs . Add ( metricID )
2019-05-22 23:16:55 +02:00
}
2022-06-09 18:46:26 +02:00
qt . Printf ( "apply filters %s; resulting metric ids: %d" , tfs , metricIDs . Len ( ) )
2019-05-22 23:16:55 +02:00
return nil
}
2021-02-10 13:37:14 +01:00
func removeCompositeTagFilters ( tfs [ ] * tagFilter , prefix [ ] byte ) [ ] * tagFilter {
if ! hasCompositeTagFilters ( tfs , prefix ) {
return tfs
}
var tagKey [ ] byte
var name [ ] byte
tfsNew := make ( [ ] * tagFilter , 0 , len ( tfs ) + 1 )
2021-02-09 23:44:54 +01:00
for _ , tf := range tfs {
if ! bytes . HasPrefix ( tf . prefix , prefix ) {
tfsNew = append ( tfsNew , tf )
continue
}
suffix := tf . prefix [ len ( prefix ) : ]
var err error
2021-02-10 13:37:14 +01:00
_ , tagKey , err = unmarshalTagValue ( tagKey [ : 0 ] , suffix )
2021-02-09 23:44:54 +01:00
if err != nil {
logger . Panicf ( "BUG: cannot unmarshal tag key from suffix=%q: %s" , suffix , err )
}
if len ( tagKey ) == 0 || tagKey [ 0 ] != compositeTagKeyPrefix {
tfsNew = append ( tfsNew , tf )
continue
}
tagKey = tagKey [ 1 : ]
2024-05-14 01:23:44 +02:00
nameLen , nSize := encoding . UnmarshalVarUint64 ( tagKey )
if nSize <= 0 {
logger . Panicf ( "BUG: cannot unmarshal nameLen from tagKey %q" , tagKey )
2021-02-09 23:44:54 +01:00
}
2024-05-14 01:23:44 +02:00
tagKey = tagKey [ nSize : ]
2021-02-10 13:37:14 +01:00
if nameLen == 0 {
logger . Panicf ( "BUG: nameLen must be greater than 0" )
}
2021-02-09 23:44:54 +01:00
if uint64 ( len ( tagKey ) ) < nameLen {
logger . Panicf ( "BUG: expecting at least %d bytes for name in tagKey=%q; got %d bytes" , nameLen , tagKey , len ( tagKey ) )
}
2021-02-10 13:37:14 +01:00
name = append ( name [ : 0 ] , tagKey [ : nameLen ] ... )
2021-02-09 23:44:54 +01:00
tagKey = tagKey [ nameLen : ]
2021-02-10 13:37:14 +01:00
var tfNew tagFilter
if err := tfNew . Init ( prefix , tagKey , tf . value , tf . isNegative , tf . isRegexp ) ; err != nil {
logger . Panicf ( "BUG: cannot initialize {%s=%q} filter: %s" , tagKey , tf . value , err )
}
tfsNew = append ( tfsNew , & tfNew )
}
if len ( name ) > 0 {
var tfNew tagFilter
if err := tfNew . Init ( prefix , nil , name , false , false ) ; err != nil {
logger . Panicf ( "BUG: unexpected error when initializing {__name__=%q} filter: %s" , name , err )
}
2021-02-09 23:44:54 +01:00
tfsNew = append ( tfsNew , & tfNew )
}
return tfsNew
}
2021-02-10 13:37:14 +01:00
// hasCompositeTagFilters reports whether tfs contains at least one composite tag filter
// under the given prefix.
func hasCompositeTagFilters ( tfs [ ] * tagFilter , prefix [ ] byte ) bool {
var tagKey [ ] byte
for _ , tf := range tfs {
if ! bytes . HasPrefix ( tf . prefix , prefix ) {
continue
}
suffix := tf . prefix [ len ( prefix ) : ]
var err error
_ , tagKey , err = unmarshalTagValue ( tagKey [ : 0 ] , suffix )
if err != nil {
logger . Panicf ( "BUG: cannot unmarshal tag key from suffix=%q: %s" , suffix , err )
}
if len ( tagKey ) > 0 && tagKey [ 0 ] == compositeTagKeyPrefix {
return true
}
}
return false
}
2019-05-22 23:16:55 +02:00
// matchTagFilters reports whether mn matches all the filters in tfs.
//
// kb is used as a scratch buffer.
//
// The order of tfs may be changed: the first filter that fails to match is moved
// to the start of tfs, so it is checked first for the next mn.
func matchTagFilters ( mn * MetricName , tfs [ ] * tagFilter , kb * bytesutil . ByteBuffer ) ( bool , error ) {
2019-09-20 18:46:47 +02:00
kb . B = marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixTagToMetricIDs )
2019-11-21 20:34:32 +01:00
for i , tf := range tfs {
2021-02-09 23:44:54 +01:00
if bytes . Equal ( tf . key , graphiteReverseTagKey ) {
// Skip artificial tag filter for Graphite-like metric names with dots,
// since mn doesn't contain the corresponding tag.
continue
}
2021-02-02 23:24:05 +01:00
if len ( tf . key ) == 0 || string ( tf . key ) == "__graphite__" {
2019-05-22 23:16:55 +02:00
// Match against mn.MetricGroup.
b := marshalTagValue ( kb . B , nil )
b = marshalTagValue ( b , mn . MetricGroup )
kb . B = b [ : len ( kb . B ) ]
2021-02-09 23:44:54 +01:00
ok , err := tf . match ( b )
2019-05-22 23:16:55 +02:00
if err != nil {
2020-06-30 21:58:18 +02:00
return false , fmt . Errorf ( "cannot match MetricGroup %q with tagFilter %s: %w" , mn . MetricGroup , tf , err )
2019-05-22 23:16:55 +02:00
}
if ! ok {
2019-11-21 20:34:32 +01:00
// Move failed tf to start.
// This should reduce the amount of useless work for the next mn.
if i > 0 {
tfs [ 0 ] , tfs [ i ] = tfs [ i ] , tfs [ 0 ]
}
2019-05-22 23:16:55 +02:00
return false , nil
}
continue
}
// Search for matching tag name.
tagMatched := false
2020-06-10 17:40:00 +02:00
tagSeen := false
2021-02-09 23:44:54 +01:00
for _ , tag := range mn . Tags {
2019-05-22 23:16:55 +02:00
if string ( tag . Key ) != string ( tf . key ) {
continue
}
2019-07-30 14:14:09 +02:00
// Found the matching tag name. Match the value.
2020-06-10 17:40:00 +02:00
tagSeen = true
2019-05-22 23:16:55 +02:00
b := tag . Marshal ( kb . B )
kb . B = b [ : len ( kb . B ) ]
2021-02-09 23:44:54 +01:00
ok , err := tf . match ( b )
2019-05-22 23:16:55 +02:00
if err != nil {
2020-06-30 21:58:18 +02:00
return false , fmt . Errorf ( "cannot match tag %q with tagFilter %s: %w" , tag , tf , err )
2019-05-22 23:16:55 +02:00
}
if ! ok {
2019-11-21 20:34:32 +01:00
// Move failed tf to start.
// This should reduce the amount of useless work for the next mn.
if i > 0 {
tfs [ 0 ] , tfs [ i ] = tfs [ i ] , tfs [ 0 ]
}
2019-05-22 23:16:55 +02:00
return false , nil
}
tagMatched = true
break
}
2022-03-18 11:58:22 +01:00
if ! tagSeen && ( ! tf . isNegative && tf . isEmptyMatch || tf . isNegative && ! tf . isEmptyMatch ) {
// tf contains positive empty-match filter for non-existing tag key, i.e.
// {non_existing_tag_key=~"foobar|"}
//
// OR
//
2020-06-10 17:40:00 +02:00
// tf contains negative filter for non-existing tag key
// and this filter doesn't match empty string, i.e. {non_existing_tag_key!="foobar"}
// Such filter matches anything.
//
// Note that the filter `{non_existing_tag_key!~"|foobar"}` shouldn't match anything,
// since it is expected that it matches non-empty `non_existing_tag_key`.
2022-03-18 11:58:22 +01:00
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/546 and
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2255 for details.
2020-06-10 17:40:00 +02:00
continue
}
if tagMatched {
// tf matches mn. Go to the next tf.
continue
}
// Matching tag name wasn't found.
// Move failed tf to start.
// This should reduce the amount of useless work for the next mn.
if i > 0 {
tfs [ 0 ] , tfs [ i ] = tfs [ i ] , tfs [ 0 ]
2019-05-22 23:16:55 +02:00
}
2020-06-10 17:40:00 +02:00
return false , nil
2019-05-22 23:16:55 +02:00
}
return true , nil
}
2022-06-12 03:32:13 +02:00
// searchMetricIDsWithFiltersOnDate returns metricIDs matching tfss on the given date.
//
// The date is the number of days since the Unix epoch.
// If date is 0, then the search covers the whole time range up to the current time.
// A nil set is returned for empty tfss.
func ( is * indexSearch ) searchMetricIDsWithFiltersOnDate ( qt * querytracer . Tracer , tfss [ ] * TagFilters , date uint64 , maxMetrics int ) ( * uint64set . Set , error ) {
if len ( tfss ) == 0 {
return nil , nil
}
tr := TimeRange {
MinTimestamp : int64 ( date ) * msecPerDay ,
MaxTimestamp : int64 ( date + 1 ) * msecPerDay - 1 ,
}
if date == 0 {
// Search for metricIDs on the whole time range.
tr . MaxTimestamp = timestampFromTime ( time . Now ( ) )
}
metricIDs , err := is . searchMetricIDsInternal ( qt , tfss , tr , maxMetrics )
if err != nil {
return nil , err
}
return metricIDs , nil
}
2024-01-23 15:09:52 +01:00
// searchMetricIDs returns metricIDs for the given tfss and tr.
//
// The returned metricIDs are sorted.
2022-06-01 01:29:19 +02:00
func ( is * indexSearch ) searchMetricIDs ( qt * querytracer . Tracer , tfss [ ] * TagFilters , tr TimeRange , maxMetrics int ) ( [ ] uint64 , error ) {
2022-06-09 18:46:26 +02:00
metricIDs , err := is . searchMetricIDsInternal ( qt , tfss , tr , maxMetrics )
2021-05-12 15:32:48 +02:00
if err != nil {
return nil , err
2019-05-22 23:16:55 +02:00
}
2019-09-24 20:10:22 +02:00
if metricIDs . Len ( ) == 0 {
2019-05-22 23:16:55 +02:00
// Nothing found
return nil , nil
}
2019-09-24 20:10:22 +02:00
sortedMetricIDs := metricIDs . AppendTo ( nil )
2022-06-01 01:29:19 +02:00
qt . Printf ( "sort %d matching metric ids" , len ( sortedMetricIDs ) )
2019-05-22 23:16:55 +02:00
// Filter out deleted metricIDs.
2021-06-15 13:56:51 +02:00
dmis := is . db . s . getDeletedMetricIDs ( )
2019-09-24 20:10:22 +02:00
if dmis . Len ( ) > 0 {
2019-06-10 11:49:59 +02:00
metricIDsFiltered := sortedMetricIDs [ : 0 ]
for _ , metricID := range sortedMetricIDs {
2019-09-24 20:10:22 +02:00
if ! dmis . Has ( metricID ) {
2019-05-22 23:16:55 +02:00
metricIDsFiltered = append ( metricIDsFiltered , metricID )
}
}
2022-06-09 18:46:26 +02:00
qt . Printf ( "left %d metric ids after removing deleted metric ids" , len ( metricIDsFiltered ) )
2019-06-10 11:49:59 +02:00
sortedMetricIDs = metricIDsFiltered
2019-05-22 23:16:55 +02:00
}
2019-06-10 11:49:59 +02:00
return sortedMetricIDs , nil
2019-05-22 23:16:55 +02:00
}
2022-06-09 18:46:26 +02:00
// searchMetricIDsInternal returns the union of metricIDs matching any of the tfss on the given tr.
//
// An error is returned if more than maxMetrics metricIDs are found.
func ( is * indexSearch ) searchMetricIDsInternal ( qt * querytracer . Tracer , tfss [ ] * TagFilters , tr TimeRange , maxMetrics int ) ( * uint64set . Set , error ) {
qt = qt . NewChild ( "search for metric ids: filters=%s, timeRange=%s, maxMetrics=%d" , tfss , & tr , maxMetrics )
defer qt . Done ( )
2024-03-11 19:37:05 +01:00
2021-05-12 15:32:48 +02:00
metricIDs := & uint64set . Set { }
2024-03-11 19:37:05 +01:00
ok , err := is . containsTimeRange ( tr )
if err != nil {
return nil , err
}
if ! ok {
qt . Printf ( "indexdb doesn't contain data for the given timeRange=%s" , & tr )
return metricIDs , nil
}
if tr . MinTimestamp >= is . db . s . minTimestampForCompositeIndex {
tfss = convertToCompositeTagFilterss ( tfss )
qt . Printf ( "composite filters=%s" , tfss )
}
2021-05-12 15:32:48 +02:00
for _ , tfs := range tfss {
if len ( tfs . tfs ) == 0 {
// Empty filters must be equivalent to `{__name__!=""}`
tfs = NewTagFilters ( )
if err := tfs . Add ( nil , nil , true , false ) ; err != nil {
logger . Panicf ( `BUG: cannot add {__name__!=""} filter: %s` , err )
}
}
2022-06-09 18:46:26 +02:00
qtChild := qt . NewChild ( "update metric ids: filters=%s, timeRange=%s" , tfs , & tr )
prevMetricIDsLen := metricIDs . Len ( )
err := is . updateMetricIDsForTagFilters ( qtChild , metricIDs , tfs , tr , maxMetrics + 1 )
qtChild . Donef ( "updated %d metric ids" , metricIDs . Len ( ) - prevMetricIDsLen )
if err != nil {
2021-05-12 15:32:48 +02:00
return nil , err
}
if metricIDs . Len ( ) > maxMetrics {
2022-04-12 10:19:04 +02:00
return nil , fmt . Errorf ( "the number of matching timeseries exceeds %d; either narrow down the search " +
2022-06-14 12:23:23 +02:00
"or increase -search.max* command-line flag values at vmselect; see https://docs.victoriametrics.com/#resource-usage-limits" , maxMetrics )
2021-05-12 15:32:48 +02:00
}
}
return metricIDs , nil
}
2022-06-09 18:46:26 +02:00
// updateMetricIDsForTagFilters adds metricIDs matching tfs on the given tr to metricIDs.
//
// The per-day index is tried first. The global index is used only if the per-day search
// returns errFallbackToGlobalSearch.
func ( is * indexSearch ) updateMetricIDsForTagFilters ( qt * querytracer . Tracer , metricIDs * uint64set . Set , tfs * TagFilters , tr TimeRange , maxMetrics int ) error {
err := is . tryUpdatingMetricIDsForDateRange ( qt , metricIDs , tfs , tr , maxMetrics )
2020-03-13 21:42:22 +01:00
if err == nil {
2019-11-09 22:17:42 +01:00
// Fast path: found metricIDs by date range.
return nil
}
2021-03-16 17:46:22 +01:00
if ! errors . Is ( err , errFallbackToGlobalSearch ) {
2020-03-13 21:42:22 +01:00
return err
}
2019-11-08 12:16:40 +01:00
2021-07-30 07:37:10 +02:00
// Slow path - fall back to search in the global inverted index.
2022-06-09 18:46:26 +02:00
qt . Printf ( "cannot find metric ids in per-day index; fall back to global index" )
2024-02-23 23:15:21 +01:00
is . db . globalSearchCalls . Add ( 1 )
2022-06-09 18:46:26 +02:00
m , err := is . getMetricIDsForDateAndFilters ( qt , 0 , tfs , maxMetrics )
2019-06-10 12:25:44 +02:00
if err != nil {
2022-04-12 10:19:04 +02:00
if errors . Is ( err , errFallbackToGlobalSearch ) {
return fmt . Errorf ( "the number of matching timeseries exceeds %d; either narrow down the search " +
2023-08-14 10:57:31 +02:00
"or increase -search.max* command-line flag values at vmselect; see https://docs.victoriametrics.com/#resource-usage-limits" , maxMetrics )
2022-04-12 10:19:04 +02:00
}
2019-08-19 15:04:12 +02:00
return err
2019-05-22 23:16:55 +02:00
}
2021-07-30 07:37:10 +02:00
metricIDs . UnionMayOwn ( m )
2019-05-22 23:16:55 +02:00
return nil
}
2022-06-09 18:46:26 +02:00
// getMetricIDsForTagFilter returns metricIDs matching the given non-negative tf
// together with the number of loops spent on the search.
func ( is * indexSearch ) getMetricIDsForTagFilter ( qt * querytracer . Tracer , tf * tagFilter , maxMetrics int , maxLoopsCount int64 ) ( * uint64set . Set , int64 , error ) {
2019-05-22 23:16:55 +02:00
if tf . isNegative {
logger . Panicf ( "BUG: isNegative must be false" )
}
2019-09-24 20:10:22 +02:00
metricIDs := & uint64set . Set { }
2019-05-22 23:16:55 +02:00
if len ( tf . orSuffixes ) > 0 {
2020-10-16 23:46:55 +02:00
// Fast path for orSuffixes - seek for rows for each value from orSuffixes.
2021-07-30 07:37:10 +02:00
loopsCount , err := is . updateMetricIDsForOrSuffixes ( tf , metricIDs , maxMetrics , maxLoopsCount )
2022-06-09 18:46:26 +02:00
qt . Printf ( "found %d metric ids for filter={%s} using exact search; spent %d loops" , metricIDs . Len ( ) , tf , loopsCount )
2021-02-18 11:47:36 +01:00
if err != nil {
return nil , loopsCount , fmt . Errorf ( "error when searching for metricIDs for tagFilter in fast path: %w; tagFilter=%s" , err , tf )
2019-05-22 23:16:55 +02:00
}
2021-02-18 11:47:36 +01:00
return metricIDs , loopsCount , nil
2019-05-22 23:16:55 +02:00
}
2019-06-27 15:15:25 +02:00
// Slow path - scan for all the rows with the given prefix.
2021-07-30 07:37:10 +02:00
loopsCount , err := is . getMetricIDsForTagFilterSlow ( tf , metricIDs . Add , maxLoopsCount )
2022-06-09 18:46:26 +02:00
qt . Printf ( "found %d metric ids for filter={%s} using prefix search; spent %d loops" , metricIDs . Len ( ) , tf , loopsCount )
2021-02-18 11:47:36 +01:00
if err != nil {
return nil , loopsCount , fmt . Errorf ( "error when searching for metricIDs for tagFilter in slow path: %w; tagFilter=%s" , err , tf )
2019-06-27 15:15:25 +02:00
}
2021-02-18 11:47:36 +01:00
return metricIDs , loopsCount , nil
2019-06-27 15:15:25 +02:00
}
2021-03-16 17:46:22 +01:00
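// errTooManyLoops is returned when applying a tag filter needs more loops than the caller allows.
var errTooManyLoops = errors . New ( "too many loops are needed for applying this filter" )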
2021-07-30 07:37:10 +02:00
// getMetricIDsForTagFilterSlow scans all the index rows with tf.prefix and calls f
// for every metricID whose tag value matches tf.
//
// It returns the number of loops spent. errTooManyLoops is returned
// if more than maxLoopsCount loops are needed.
func ( is * indexSearch ) getMetricIDsForTagFilterSlow ( tf * tagFilter , f func ( metricID uint64 ) , maxLoopsCount int64 ) ( int64 , error ) {
2019-06-27 15:15:25 +02:00
if len ( tf . orSuffixes ) > 0 {
logger . Panicf ( "BUG: the getMetricIDsForTagFilterSlow must be called only for empty tf.orSuffixes; got %s" , tf . orSuffixes )
}
// Scan all the rows with tf.prefix and call f on every tf match.
2019-05-22 23:16:55 +02:00
ts := & is . ts
2019-06-27 15:15:25 +02:00
kb := & is . kb
2019-09-20 18:46:47 +02:00
mp := & is . mp
var prevMatchingSuffix [ ] byte
2019-06-27 15:15:25 +02:00
var prevMatch bool
2021-03-16 17:46:22 +01:00
var loopsCount int64
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2019-09-20 18:46:47 +02:00
prefix := tf . prefix
ts . Seek ( prefix )
2019-06-27 15:15:25 +02:00
for ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterMediumIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , err
2020-07-23 19:42:57 +02:00
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2019-09-20 18:46:47 +02:00
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-05-22 23:16:55 +02:00
}
2019-09-20 18:46:47 +02:00
tail := item [ len ( prefix ) : ]
n := bytes . IndexByte ( tail , tagSeparatorChar )
if n < 0 {
2021-02-18 11:47:36 +01:00
return loopsCount , fmt . Errorf ( "invalid tag->metricIDs line %q: cannot find tagSeparatorChar=%d" , item , tagSeparatorChar )
2019-05-22 23:16:55 +02:00
}
2019-09-20 18:46:47 +02:00
suffix := tail [ : n + 1 ]
tail = tail [ n + 1 : ]
if err := mp . InitOnlyTail ( item , tail ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , err
2019-05-22 23:16:55 +02:00
}
2020-09-21 23:33:43 +02:00
mp . ParseMetricIDs ( )
2021-03-16 17:46:22 +01:00
loopsCount += int64 ( mp . MetricIDsLen ( ) )
if loopsCount > maxLoopsCount {
return loopsCount , errTooManyLoops
}
2019-09-20 18:46:47 +02:00
if prevMatch && string ( suffix ) == string ( prevMatchingSuffix ) {
2019-06-27 15:15:25 +02:00
// Fast path: the same tag value found.
// There is no need to check it again with the potentially
// slow tf.matchSuffix, which may run a regexp.
2019-09-20 18:46:47 +02:00
for _ , metricID := range mp . MetricIDs {
2020-09-21 23:36:43 +02:00
f ( metricID )
2019-06-27 15:15:25 +02:00
}
continue
}
2019-09-20 18:46:47 +02:00
// Slow path: need tf.matchSuffix call.
ok , err := tf . matchSuffix ( suffix )
2021-03-07 20:12:28 +01:00
// Assume that tf.matchSuffix call needs 10x more time than a single metric scan iteration.
2021-03-16 17:46:22 +01:00
loopsCount += 10 * int64 ( tf . matchCost )
2019-05-22 23:16:55 +02:00
if err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , fmt . Errorf ( "error when matching %s against suffix %q: %w" , tf , suffix , err )
2019-05-22 23:16:55 +02:00
}
if ! ok {
2019-06-27 15:15:25 +02:00
prevMatch = false
2019-12-02 23:29:44 +01:00
if mp . MetricIDsLen ( ) < maxMetricIDsPerRow / 2 {
// If the current row contains non-full metricIDs list,
// then it is likely the next row contains the next tag value.
// So skip seeking to the next tag value, since the seek would be slower than a plain ts.NextItem call.
continue
}
2019-06-27 15:15:25 +02:00
// Optimization: skip all the metricIDs for the given tag value
2019-09-20 18:46:47 +02:00
kb . B = append ( kb . B [ : 0 ] , item [ : len ( item ) - len ( tail ) ] ... )
2019-06-27 15:15:25 +02:00
// The last char in kb.B must be tagSeparatorChar. Just increment it
// in order to jump to the next tag value.
if len ( kb . B ) == 0 || kb . B [ len ( kb . B ) - 1 ] != tagSeparatorChar || tagSeparatorChar >= 0xff {
2021-02-18 11:47:36 +01:00
return loopsCount , fmt . Errorf ( "data corruption: the last char in k=%X must be %X" , kb . B , tagSeparatorChar )
2019-06-27 15:15:25 +02:00
}
kb . B [ len ( kb . B ) - 1 ] ++
ts . Seek ( kb . B )
2021-03-07 20:12:28 +01:00
// Assume that a seek cost is equivalent to 1000 ordinary loops.
loopsCount += 1000
2019-05-22 23:16:55 +02:00
continue
}
2019-06-27 15:15:25 +02:00
prevMatch = true
2019-09-20 18:46:47 +02:00
prevMatchingSuffix = append ( prevMatchingSuffix [ : 0 ] , suffix ... )
for _ , metricID := range mp . MetricIDs {
2020-09-21 23:36:43 +02:00
f ( metricID )
2019-06-27 15:15:25 +02:00
}
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , fmt . Errorf ( "error when searching for tag filter prefix %q: %w" , prefix , err )
2019-05-22 23:16:55 +02:00
}
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-05-22 23:16:55 +02:00
}
2021-07-30 07:37:10 +02:00
// updateMetricIDsForOrSuffixes adds to metricIDs the metricIDs for every tag value from tf.orSuffixes.
//
// The search stops as soon as metricIDs contains at least maxMetrics entries.
// The number of loops spent is returned.
func ( is * indexSearch ) updateMetricIDsForOrSuffixes ( tf * tagFilter , metricIDs * uint64set . Set , maxMetrics int , maxLoopsCount int64 ) ( int64 , error ) {
2019-05-22 23:16:55 +02:00
if tf . isNegative {
logger . Panicf ( "BUG: isNegative must be false" )
}
kb := kbPool . Get ( )
defer kbPool . Put ( kb )
2021-03-16 17:46:22 +01:00
var loopsCount int64
2019-05-22 23:16:55 +02:00
for _ , orSuffix := range tf . orSuffixes {
kb . B = append ( kb . B [ : 0 ] , tf . prefix ... )
kb . B = append ( kb . B , orSuffix ... )
kb . B = append ( kb . B , tagSeparatorChar )
2021-07-30 07:37:10 +02:00
lc , err := is . updateMetricIDsForOrSuffix ( kb . B , metricIDs , maxMetrics , maxLoopsCount - loopsCount )
2021-06-08 12:04:08 +02:00
loopsCount += lc
2021-02-18 11:47:36 +01:00
if err != nil {
return loopsCount , err
2019-05-22 23:16:55 +02:00
}
2019-09-24 20:10:22 +02:00
if metricIDs . Len ( ) >= maxMetrics {
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-05-22 23:16:55 +02:00
}
}
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-05-22 23:16:55 +02:00
}
2021-07-30 07:37:10 +02:00
// updateMetricIDsForOrSuffix adds to metricIDs the metricIDs from all the index rows
// starting with the given prefix, until metricIDs contains at least maxMetrics entries.
//
// It returns the number of loops spent. errTooManyLoops is returned
// if more than maxLoopsCount loops are needed.
func ( is * indexSearch ) updateMetricIDsForOrSuffix ( prefix [ ] byte , metricIDs * uint64set . Set , maxMetrics int , maxLoopsCount int64 ) ( int64 , error ) {
2019-05-22 23:16:55 +02:00
ts := & is . ts
2019-09-20 18:46:47 +02:00
mp := & is . mp
2021-03-16 17:46:22 +01:00
var loopsCount int64
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2019-05-22 23:16:55 +02:00
ts . Seek ( prefix )
2019-09-24 20:10:22 +02:00
for metricIDs . Len ( ) < maxMetrics && ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , err
2020-07-23 19:42:57 +02:00
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2019-09-20 18:46:47 +02:00
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-09-20 18:46:47 +02:00
}
if err := mp . InitOnlyTail ( item , item [ len ( prefix ) : ] ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , err
2019-09-20 18:46:47 +02:00
}
2021-03-16 17:46:22 +01:00
loopsCount += int64 ( mp . MetricIDsLen ( ) )
if loopsCount > maxLoopsCount {
return loopsCount , errTooManyLoops
}
2019-09-23 19:40:38 +02:00
mp . ParseMetricIDs ( )
2020-07-21 19:56:49 +02:00
metricIDs . AddMulti ( mp . MetricIDs )
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
2021-02-18 11:47:36 +01:00
return loopsCount , fmt . Errorf ( "error when searching for tag filter prefix %q: %w" , prefix , err )
2019-05-22 23:16:55 +02:00
}
2021-02-18 11:47:36 +01:00
return loopsCount , nil
2019-05-22 23:16:55 +02:00
}
2021-03-15 12:31:55 +01:00
// errFallbackToGlobalSearch is returned by per-day index search when the caller
// must retry the search in the global index.
var errFallbackToGlobalSearch = errors . New ( "fall back from per-day index search to global index search" )
2019-05-22 23:16:55 +02:00
2021-04-07 12:31:57 +02:00
// maxDaysForPerDaySearch is the maximum difference in days between the last and the first day
// of a time range for which the per-day index is searched.
const maxDaysForPerDaySearch = 40
2020-10-01 18:03:34 +02:00
2022-06-09 18:46:26 +02:00
// tryUpdatingMetricIDsForDateRange adds metricIDs matching tfs on the given tr to metricIDs
// by using the per-day index.
//
// It returns errFallbackToGlobalSearch if tr spans too many days (see maxDaysForPerDaySearch)
// or if the per-day search for any of the days asks for the fallback.
func ( is * indexSearch ) tryUpdatingMetricIDsForDateRange ( qt * querytracer . Tracer , metricIDs * uint64set . Set , tfs * TagFilters , tr TimeRange , maxMetrics int ) error {
2024-02-23 23:15:21 +01:00
is . db . dateRangeSearchCalls . Add ( 1 )
2019-11-09 22:17:42 +01:00
minDate := uint64 ( tr . MinTimestamp ) / msecPerDay
2022-06-12 03:32:13 +02:00
maxDate := uint64 ( tr . MaxTimestamp - 1 ) / msecPerDay
2021-04-07 12:31:57 +02:00
if minDate > maxDate || maxDate - minDate > maxDaysForPerDaySearch {
2019-11-09 22:17:42 +01:00
// Too many dates must be covered. Give up, since it may be slow.
2021-03-15 12:31:55 +01:00
return errFallbackToGlobalSearch
2020-03-13 21:42:22 +01:00
}
if minDate == maxDate {
// Fast path - query only a single date.
2022-06-09 18:46:26 +02:00
m , err := is . getMetricIDsForDateAndFilters ( qt , minDate , tfs , maxMetrics )
2020-03-13 21:42:22 +01:00
if err != nil {
return err
}
2021-07-06 17:21:35 +02:00
metricIDs . UnionMayOwn ( m )
2024-02-23 23:15:21 +01:00
is . db . dateRangeSearchHits . Add ( 1 )
2020-03-13 21:42:22 +01:00
return nil
2019-11-09 22:17:42 +01:00
}
2020-03-13 21:42:22 +01:00
// Slower path - search for metricIDs for each day in parallel.
2022-06-09 18:46:26 +02:00
qt = qt . NewChild ( "parallel search for metric ids in per-day index: filters=%s, dayRange=[%d..%d]" , tfs , minDate , maxDate )
defer qt . Done ( )
2022-04-06 12:34:00 +02:00
wg := getWaitGroup ( )
2019-11-09 22:17:42 +01:00
var errGlobal error
2020-03-13 21:42:22 +01:00
var mu sync . Mutex // protects metricIDs + errGlobal vars from concurrent access below
2019-11-09 22:17:42 +01:00
for minDate <= maxDate {
2022-06-30 17:17:07 +02:00
qtChild := qt . NewChild ( "parallel thread for date=%s" , dateToString ( minDate ) )
2019-11-09 22:17:42 +01:00
wg . Add ( 1 )
2020-03-13 21:42:22 +01:00
go func ( date uint64 ) {
2022-06-09 18:46:26 +02:00
defer func ( ) {
qtChild . Done ( )
wg . Done ( )
} ( )
2020-07-23 19:42:57 +02:00
isLocal := is . db . getIndexSearch ( is . deadline )
2022-06-09 18:46:26 +02:00
m , err := isLocal . getMetricIDsForDateAndFilters ( qtChild , date , tfs , maxMetrics )
2021-02-16 20:22:10 +01:00
is . db . putIndexSearch ( isLocal )
2019-11-09 22:17:42 +01:00
mu . Lock ( )
2020-03-13 21:42:22 +01:00
defer mu . Unlock ( )
if errGlobal != nil {
return
2019-11-09 22:17:42 +01:00
}
if err != nil {
2019-12-03 13:46:39 +01:00
dateStr := time . Unix ( int64 ( date * 24 * 3600 ) , 0 )
2021-03-15 12:31:55 +01:00
errGlobal = fmt . Errorf ( "cannot search for metricIDs at %s: %w" , dateStr , err )
2020-03-13 21:42:22 +01:00
return
}
if metricIDs . Len ( ) < maxMetrics {
2021-07-06 17:21:35 +02:00
metricIDs . UnionMayOwn ( m )
2019-11-09 22:17:42 +01:00
}
2020-03-13 21:42:22 +01:00
} ( minDate )
2019-11-09 22:17:42 +01:00
minDate ++
}
wg . Wait ( )
2022-04-06 12:34:00 +02:00
putWaitGroup ( wg )
2019-11-09 22:17:42 +01:00
if errGlobal != nil {
2020-03-13 21:42:22 +01:00
return errGlobal
2019-11-09 22:17:42 +01:00
}
2024-02-23 23:15:21 +01:00
is . db . dateRangeSearchHits . Add ( 1 )
2020-03-13 21:42:22 +01:00
return nil
2019-11-09 22:17:42 +01:00
}
2022-06-09 18:46:26 +02:00
// getMetricIDsForDateAndFilters returns metricIDs matching tfs on the given date.
//
// If date is 0, then the search is performed in the global index.
// errFallbackToGlobalSearch is returned if too many metricIDs are found for the given date.
func ( is * indexSearch ) getMetricIDsForDateAndFilters ( qt * querytracer . Tracer , date uint64 , tfs * TagFilters , maxMetrics int ) ( * uint64set . Set , error ) {
2022-06-30 17:17:07 +02:00
if qt . Enabled ( ) {
qt = qt . NewChild ( "search for metric ids on a particular day: filters=%s, date=%s, maxMetrics=%d" , tfs , dateToString ( date ) , maxMetrics )
defer qt . Done ( )
}
2021-02-18 11:47:36 +01:00
// Sort tfs by loopsCount needed for performing each filter.
// These stats are usually collected from previous queries.
2021-02-16 20:22:10 +01:00
// This way we limit the amount of work below by applying the fast filters first.
2021-02-16 12:03:58 +01:00
type tagFilterWithWeight struct {
2021-03-16 17:46:22 +01:00
tf * tagFilter
loopsCount int64
filterLoopsCount int64
2020-03-30 23:44:41 +02:00
}
2021-02-16 20:22:10 +01:00
tfws := make ( [ ] tagFilterWithWeight , len ( tfs . tfs ) )
2021-02-18 11:47:36 +01:00
currentTime := fasttime . UnixTimestamp ( )
2019-11-09 22:17:42 +01:00
for i := range tfs . tfs {
tf := & tfs . tfs [ i ]
2021-03-16 17:46:22 +01:00
loopsCount , filterLoopsCount , timestamp := is . getLoopsCountAndTimestampForDateFilter ( date , tf )
if currentTime > timestamp + 3600 {
2021-03-11 23:48:28 +01:00
// Update stats once per hour for relatively fast tag filters.
// There is no need to spend CPU resources on updating stats for heavy tag filters.
2021-02-21 20:32:52 +01:00
if loopsCount <= 10e6 {
loopsCount = 0
}
2021-03-16 17:46:22 +01:00
if filterLoopsCount <= 10e6 {
filterLoopsCount = 0
}
2021-02-17 16:55:29 +01:00
}
2021-02-16 20:22:10 +01:00
tfws [ i ] = tagFilterWithWeight {
2021-03-16 17:46:22 +01:00
tf : tf ,
loopsCount : loopsCount ,
filterLoopsCount : filterLoopsCount ,
2020-03-30 23:44:41 +02:00
}
2021-02-17 16:28:15 +01:00
}
sort . Slice ( tfws , func ( i , j int ) bool {
2021-02-16 20:22:10 +01:00
a , b := & tfws [ i ] , & tfws [ j ]
2021-02-18 11:47:36 +01:00
if a . loopsCount != b . loopsCount {
return a . loopsCount < b . loopsCount
2020-03-30 23:44:41 +02:00
}
2021-02-15 15:24:08 +01:00
return a . tf . Less ( b . tf )
2020-03-30 23:44:41 +02:00
} )
2021-03-16 17:46:22 +01:00
getFirstPositiveLoopsCount := func ( tfws [ ] tagFilterWithWeight ) int64 {
for i := range tfws {
if n := tfws [ i ] . loopsCount ; n > 0 {
return n
}
}
return int64Max
}
storeLoopsCount := func ( tfw * tagFilterWithWeight , loopsCount int64 ) {
if loopsCount != tfw . loopsCount {
tfw . loopsCount = loopsCount
is . storeLoopsCountForDateFilter ( date , tfw . tf , tfw . loopsCount , tfw . filterLoopsCount )
}
}
2020-03-30 23:44:41 +02:00
2022-06-09 18:46:26 +02:00
// Populate metricIDs for the first non-negative filter with the smallest cost.
qtChild := qt . NewChild ( "search for the first non-negative filter with the smallest cost" )
2020-04-24 20:11:46 +02:00
var metricIDs * uint64set . Set
2021-02-16 20:22:10 +01:00
tfwsRemaining := tfws [ : 0 ]
2021-03-16 17:46:22 +01:00
maxDateMetrics := intMax
if maxMetrics < intMax / 50 {
maxDateMetrics = maxMetrics * 50
}
for i , tfw := range tfws {
2021-02-16 12:03:58 +01:00
tf := tfw . tf
2021-09-09 20:09:18 +02:00
if tf . isNegative || tf . isEmptyMatch {
2021-02-16 20:22:10 +01:00
tfwsRemaining = append ( tfwsRemaining , tfw )
2020-04-24 20:11:46 +02:00
continue
}
2021-03-16 17:46:22 +01:00
maxLoopsCount := getFirstPositiveLoopsCount ( tfws [ i + 1 : ] )
2022-06-09 18:46:26 +02:00
m , loopsCount , err := is . getMetricIDsForDateTagFilter ( qtChild , tf , date , tfs . commonPrefix , maxDateMetrics , maxLoopsCount )
2020-04-24 20:11:46 +02:00
if err != nil {
2021-03-16 17:46:22 +01:00
if errors . Is ( err , errTooManyLoops ) {
// The tf took too many loops compared to the next filter. Postpone applying this filter.
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "the filter={%s} took more than %d loops; postpone it" , tf , maxLoopsCount )
2021-03-17 14:09:40 +01:00
storeLoopsCount ( & tfw , 2 * loopsCount )
2021-03-16 17:46:22 +01:00
tfwsRemaining = append ( tfwsRemaining , tfw )
continue
}
// Move failing filter to the end of filter list.
storeLoopsCount ( & tfw , int64Max )
2020-04-24 20:11:46 +02:00
return nil , err
}
if m . Len ( ) >= maxDateMetrics {
2021-03-16 17:46:22 +01:00
// Too many time series found by a single tag filter. Move the filter to the end of list.
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "the filter={%s} matches at least %d series; postpone it" , tf , maxDateMetrics )
2021-03-16 17:46:22 +01:00
storeLoopsCount ( & tfw , int64Max - 1 )
2021-03-15 19:31:24 +01:00
tfwsRemaining = append ( tfwsRemaining , tfw )
2019-11-09 22:17:42 +01:00
continue
}
2021-03-16 17:46:22 +01:00
storeLoopsCount ( & tfw , loopsCount )
2020-04-24 20:11:46 +02:00
metricIDs = m
2021-03-16 17:46:22 +01:00
tfwsRemaining = append ( tfwsRemaining , tfws [ i + 1 : ] ... )
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "the filter={%s} matches less than %d series (actually %d series); use it" , tf , maxDateMetrics , metricIDs . Len ( ) )
2019-11-09 22:17:42 +01:00
break
}
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2021-03-16 17:46:22 +01:00
tfws = tfwsRemaining
2020-04-24 20:11:46 +02:00
if metricIDs == nil {
// All the filters in tfs are negative or match too many time series.
// Populate all the metricIDs for the given (date),
2020-03-13 21:42:22 +01:00
// so later they can be filtered out with negative filters.
2022-06-09 18:46:26 +02:00
qt . Printf ( "all the filters are negative or match more than %d time series; fall back to searching for all the metric ids" , maxDateMetrics )
2020-03-13 21:42:22 +01:00
m , err := is . getMetricIDsForDate ( date , maxDateMetrics )
if err != nil {
2020-06-30 21:58:18 +02:00
return nil , fmt . Errorf ( "cannot obtain all the metricIDs: %w" , err )
2019-11-09 22:17:42 +01:00
}
2020-04-24 20:11:46 +02:00
if m . Len ( ) >= maxDateMetrics {
// Too many time series found for the given (date). Fall back to global search.
2021-03-15 12:31:55 +01:00
return nil , errFallbackToGlobalSearch
2019-11-09 22:17:42 +01:00
}
2020-03-13 21:42:22 +01:00
metricIDs = m
2022-06-09 18:46:26 +02:00
qt . Printf ( "found %d metric ids" , metricIDs . Len ( ) )
2019-11-09 22:17:42 +01:00
}
2021-03-16 17:46:22 +01:00
sort . Slice ( tfws , func ( i , j int ) bool {
a , b := & tfws [ i ] , & tfws [ j ]
if a . filterLoopsCount != b . filterLoopsCount {
return a . filterLoopsCount < b . filterLoopsCount
}
return a . tf . Less ( b . tf )
} )
getFirstPositiveFilterLoopsCount := func ( tfws [ ] tagFilterWithWeight ) int64 {
for i := range tfws {
if n := tfws [ i ] . filterLoopsCount ; n > 0 {
return n
}
}
return int64Max
}
storeFilterLoopsCount := func ( tfw * tagFilterWithWeight , filterLoopsCount int64 ) {
if filterLoopsCount != tfw . filterLoopsCount {
is . storeLoopsCountForDateFilter ( date , tfw . tf , tfw . loopsCount , filterLoopsCount )
}
}
2020-03-13 21:42:22 +01:00
// Intersect metricIDs with the rest of filters.
2021-02-10 21:40:20 +01:00
//
// Do not run these tag filters in parallel, since this may result in CPU and RAM waste
2023-02-13 13:27:13 +01:00
// when the initial tag filters significantly reduce the number of found metricIDs,
2021-02-10 21:40:20 +01:00
// so the remaining filters can be applied via much faster metricName matching instead
// of slowly selecting the matching metricIDs.
2022-06-09 18:46:26 +02:00
qtChild = qt . NewChild ( "intersect the remaining %d filters with the found %d metric ids" , len ( tfws ) , metricIDs . Len ( ) )
2021-03-15 19:31:24 +01:00
var tfsPostponed [ ] * tagFilter
2021-03-16 17:46:22 +01:00
for i , tfw := range tfws {
2021-02-16 12:03:58 +01:00
tf := tfw . tf
2021-02-10 21:40:20 +01:00
metricIDsLen := metricIDs . Len ( )
if metricIDsLen == 0 {
2021-03-16 17:46:22 +01:00
// There is no need to apply the remaining filters to an empty set.
2021-02-10 21:40:20 +01:00
break
}
2021-03-16 17:46:22 +01:00
if tfw . filterLoopsCount > int64 ( metricIDsLen ) * loopsCountPerMetricNameMatch {
2021-02-10 21:40:20 +01:00
// It should be faster to perform metricName matching on the remaining filters
// instead of scanning a big number of entries in the inverted index for these filters.
2021-03-16 17:46:22 +01:00
for _ , tfw := range tfws [ i : ] {
tfsPostponed = append ( tfsPostponed , tfw . tf )
2021-03-15 19:31:24 +01:00
}
break
}
2021-03-16 17:46:22 +01:00
maxLoopsCount := getFirstPositiveFilterLoopsCount ( tfws [ i + 1 : ] )
2021-03-16 23:46:22 +01:00
if maxLoopsCount == int64Max {
maxLoopsCount = int64 ( metricIDsLen ) * loopsCountPerMetricNameMatch
}
2022-06-09 18:46:26 +02:00
m , filterLoopsCount , err := is . getMetricIDsForDateTagFilter ( qtChild , tf , date , tfs . commonPrefix , intMax , maxLoopsCount )
2021-02-10 21:40:20 +01:00
if err != nil {
2021-03-16 17:46:22 +01:00
if errors . Is ( err , errTooManyLoops ) {
// Postpone tf, since it took more loops than the next filter may need.
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "postpone filter={%s}, since it took more than %d loops" , tf , maxLoopsCount )
2021-03-17 14:09:40 +01:00
storeFilterLoopsCount ( & tfw , 2 * filterLoopsCount )
2021-03-16 17:46:22 +01:00
tfsPostponed = append ( tfsPostponed , tf )
continue
}
// Move the failing tf to the end of the filter list.
storeFilterLoopsCount ( & tfw , int64Max )
2021-02-10 21:40:20 +01:00
return nil , err
}
2021-03-16 17:46:22 +01:00
storeFilterLoopsCount ( & tfw , filterLoopsCount )
2021-09-09 20:09:18 +02:00
if tf . isNegative || tf . isEmptyMatch {
2021-02-10 21:40:20 +01:00
metricIDs . Subtract ( m )
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "subtract %d metric ids from the found %d metric ids for filter={%s}; resulting metric ids: %d" , m . Len ( ) , metricIDsLen , tf , metricIDs . Len ( ) )
2021-02-10 21:40:20 +01:00
} else {
metricIDs . Intersect ( m )
2022-06-09 18:46:26 +02:00
qtChild . Printf ( "intersect %d metric ids with the found %d metric ids for filter={%s}; resulting metric ids: %d" , m . Len ( ) , metricIDsLen , tf , metricIDs . Len ( ) )
2021-02-10 21:40:20 +01:00
}
2021-02-10 15:13:17 +01:00
}
2022-06-09 18:46:26 +02:00
qtChild . Done ( )
2021-02-10 21:40:20 +01:00
if metricIDs . Len ( ) == 0 {
// There is no need to apply tfsPostponed, since the result is empty.
2022-06-09 18:46:26 +02:00
qt . Printf ( "found zero metric ids" )
2021-02-10 21:40:20 +01:00
return nil , nil
2019-11-09 22:17:42 +01:00
}
2020-04-24 20:11:46 +02:00
if len ( tfsPostponed ) > 0 {
// Apply the postponed filters via metricName match.
2022-06-09 18:46:26 +02:00
qt . Printf ( "apply postponed filters=%s to %d metrics ids" , tfsPostponed , metricIDs . Len ( ) )
2020-04-24 20:11:46 +02:00
var m uint64set . Set
2022-06-09 18:46:26 +02:00
if err := is . updateMetricIDsByMetricNameMatch ( qt , & m , metricIDs , tfsPostponed ) ; err != nil {
2020-04-24 20:11:46 +02:00
return nil , err
}
return & m , nil
}
2022-06-09 18:46:26 +02:00
qt . Printf ( "found %d metric ids" , metricIDs . Len ( ) )
2020-03-13 21:42:22 +01:00
return metricIDs , nil
2019-11-09 22:17:42 +01:00
}
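The loop above decides, filter by filter, whether to scan the inverted index or to postpone the filter to the metricName match. The following is a minimal sketch of that decision only, not the actual implementation: `filterEstimate` and `firstPostponed` are hypothetical names, the value of `loopsCountPerMetricNameMatch` is an assumption (the real constant is defined elsewhere in the package), and the sketch keeps the number of found metricIDs fixed, while the real loop re-reads `metricIDs.Len()` after every applied filter.

```go
package main

import "fmt"

// loopsCountPerMetricNameMatch is an ASSUMED cost of a single metricName match,
// expressed in inverted index loops. The real value lives elsewhere in the package.
const loopsCountPerMetricNameMatch = 150

// filterEstimate is a hypothetical stand-in for tagFilterWithWeight.
type filterEstimate struct {
	name             string
	filterLoopsCount int64 // loops the filter needed on its previous run
}

// firstPostponed returns the index of the first filter that should be postponed.
// filters[:i] are applied via the inverted index, filters[i:] via metricName match.
// The filters are expected to be sorted by ascending cost, as in the code above.
func firstPostponed(filters []filterEstimate, metricIDsLen int) int {
	budget := int64(metricIDsLen) * loopsCountPerMetricNameMatch
	for i, f := range filters {
		if f.filterLoopsCount > budget {
			// Matching metric names for metricIDsLen series is estimated
			// to be cheaper than scanning the index for this filter.
			return i
		}
	}
	return len(filters)
}

func main() {
	filters := []filterEstimate{
		{name: `{job="api"}`, filterLoopsCount: 10},
		{name: `{instance=~".+"}`, filterLoopsCount: 1000},
		{name: `{path=~".*"}`, filterLoopsCount: 100000},
	}
	for _, n := range []int{5, 10, 1000} {
		i := firstPostponed(filters, n)
		fmt.Printf("metricIDs=%d: apply %d filters via the index, postpone %d\n", n, i, len(filters)-i)
	}
}
```

The smaller the set of already found metricIDs, the earlier the switch to metricName matching happens, which is why the cheapest filters are applied first.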
2021-03-16 17:46:22 +01:00
const (
intMax = int ( ( ^ uint ( 0 ) ) >> 1 )
int64Max = int64 ( ( 1 << 63 ) - 1 )
)
lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no corresponding entry either.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amount of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9350795af41cbfc3ca0e8a5af084fce
2023-07-14 00:33:41 +02:00
func ( is * indexSearch ) createPerDayIndexes ( date uint64 , tsid * TSID , mn * MetricName ) {
2021-02-09 23:44:54 +01:00
ii := getIndexItems ( )
defer putIndexItems ( ii )
2019-11-09 22:17:42 +01:00
2024-04-16 18:53:29 +02:00
// Create date -> metricID entry.
2022-06-19 20:58:53 +02:00
ii . B = marshalCommonPrefix ( ii . B , nsPrefixDateToMetricID )
2021-02-09 23:44:54 +01:00
ii . B = encoding . MarshalUint64 ( ii . B , date )
2023-07-14 00:33:41 +02:00
ii . B = encoding . MarshalUint64 ( ii . B , tsid . MetricID )
ii . Next ( )
2024-04-16 18:53:29 +02:00
// Create metricName -> TSID entry.
2023-07-14 00:33:41 +02:00
ii . B = marshalCommonPrefix ( ii . B , nsPrefixDateMetricNameToTSID )
ii . B = encoding . MarshalUint64 ( ii . B , date )
ii . B = mn . Marshal ( ii . B )
ii . B = append ( ii . B , kvSeparatorChar )
ii . B = tsid . Marshal ( ii . B )
2021-02-09 23:44:54 +01:00
ii . Next ( )
2019-11-09 22:17:42 +01:00
2024-04-16 18:53:29 +02:00
// Create per-day tag -> metricID entries for every tag in mn.
2019-11-09 22:17:42 +01:00
kb := kbPool . Get ( )
2022-06-19 20:58:53 +02:00
kb . B = marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixDateTagToMetricIDs )
2019-11-09 22:17:42 +01:00
kb . B = encoding . MarshalUint64 ( kb . B , date )
2023-07-14 00:33:41 +02:00
ii . registerTagIndexes ( kb . B , mn , tsid . MetricID )
2024-04-16 18:53:29 +02:00
kbPool . Put ( kb )
2022-12-04 08:30:31 +01:00
is . db . tb . AddItems ( ii . Items )
2019-05-22 23:16:55 +02:00
}
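The `date -> metricID` entry created above is a flat byte key. The sketch below illustrates why such a layout supports prefix scans by date; it is an illustration under stated assumptions, not the package's code. It assumes that `marshalCommonPrefix` contributes a single namespace byte and that `encoding.MarshalUint64` appends the value in big-endian order. The prefix value `5` and the helper names `appendUint64` and `marshalDateToMetricIDKey` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nsPrefixDateToMetricID is a HYPOTHETICAL namespace prefix value used only in this sketch.
const nsPrefixDateToMetricID = byte(5)

// appendUint64 appends v to dst in big-endian order.
// It is assumed to mirror what encoding.MarshalUint64 does.
func appendUint64(dst []byte, v uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	return append(dst, buf[:]...)
}

// marshalDateToMetricIDKey builds a (prefix, date, metricID) key.
func marshalDateToMetricIDKey(dst []byte, date, metricID uint64) []byte {
	dst = append(dst, nsPrefixDateToMetricID)
	dst = appendUint64(dst, date)
	dst = appendUint64(dst, metricID)
	return dst
}

func main() {
	k1 := marshalDateToMetricIDKey(nil, 19000, 1)
	k2 := marshalDateToMetricIDKey(nil, 19000, 2)
	k3 := marshalDateToMetricIDKey(nil, 19001, 0)

	// Big-endian encoding makes the lexicographic order of keys match
	// the numeric order of (date, metricID) pairs.
	fmt.Println(string(k1) < string(k2), string(k2) < string(k3))

	// All entries for a single date share the first 9 bytes of the key.
	fmt.Printf("%x\n%x\n", k1[:9], k2[:9])
}
```

Because the date precedes the metricID in the key, all entries for one day are adjacent in the sorted index, so a lookup such as `hasDateMetricIDNoExtDB` can seek directly to the full key.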
2021-02-09 23:44:54 +01:00
func ( ii * indexItems ) registerTagIndexes ( prefix [ ] byte , mn * MetricName , metricID uint64 ) {
2024-04-16 18:53:29 +02:00
// Add MetricGroup -> metricID entry.
2021-02-09 23:44:54 +01:00
ii . B = append ( ii . B , prefix ... )
ii . B = marshalTagValue ( ii . B , nil )
ii . B = marshalTagValue ( ii . B , mn . MetricGroup )
ii . B = encoding . MarshalUint64 ( ii . B , metricID )
ii . Next ( )
ii . addReverseMetricGroupIfNeeded ( prefix , mn , metricID )
2024-04-16 18:53:29 +02:00
// Add tag -> metricID entries.
2021-02-09 23:44:54 +01:00
for _ , tag := range mn . Tags {
ii . B = append ( ii . B , prefix ... )
ii . B = tag . Marshal ( ii . B )
ii . B = encoding . MarshalUint64 ( ii . B , metricID )
ii . Next ( )
}
2024-04-16 18:53:29 +02:00
// Add index entries for composite tags: MetricGroup+tag -> metricID.
2021-02-09 23:44:54 +01:00
compositeKey := kbPool . Get ( )
for _ , tag := range mn . Tags {
compositeKey . B = marshalCompositeTagKey ( compositeKey . B [ : 0 ] , mn . MetricGroup , tag . Key )
ii . B = append ( ii . B , prefix ... )
ii . B = marshalTagValue ( ii . B , compositeKey . B )
ii . B = marshalTagValue ( ii . B , tag . Value )
ii . B = encoding . MarshalUint64 ( ii . B , metricID )
ii . Next ( )
}
kbPool . Put ( compositeKey )
}
func ( ii * indexItems ) addReverseMetricGroupIfNeeded ( prefix [ ] byte , mn * MetricName , metricID uint64 ) {
2020-05-27 20:35:58 +02:00
if bytes . IndexByte ( mn . MetricGroup , '.' ) < 0 {
// The reverse metric group is needed only for Graphite-like metrics with dots.
return
}
// This is most likely a Graphite metric like 'foo.bar.baz'.
// Store reverse metric name 'zab.rab.oof' in order to speed up search for '*.bar.baz'
// when the Graphite wildcard has a suffix matching small number of time series.
2021-02-09 23:44:54 +01:00
ii . B = append ( ii . B , prefix ... )
ii . B = marshalTagValue ( ii . B , graphiteReverseTagKey )
2020-05-27 20:35:58 +02:00
revBuf := kbPool . Get ( )
revBuf . B = reverseBytes ( revBuf . B [ : 0 ] , mn . MetricGroup )
2021-02-09 23:44:54 +01:00
ii . B = marshalTagValue ( ii . B , revBuf . B )
2020-05-27 20:35:58 +02:00
kbPool . Put ( revBuf )
2021-02-09 23:44:54 +01:00
ii . B = encoding . MarshalUint64 ( ii . B , metricID )
ii . Next ( )
}
func isArtificialTagKey ( key [ ] byte ) bool {
if bytes . Equal ( key , graphiteReverseTagKey ) {
return true
}
if len ( key ) > 0 && key [ 0 ] == compositeTagKeyPrefix {
return true
}
return false
2020-05-27 20:35:58 +02:00
}
// The tag key for the reverse metric name, which is used for speeding up searching
// for Graphite wildcards with a suffix matching a small number of time series,
// e.g. '*.bar.baz'.
//
// It is expected that the given key isn't used by users.
var graphiteReverseTagKey = [ ] byte ( "\xff" )
2021-02-09 23:44:54 +01:00
// The prefix for composite tag, which is used for speeding up searching
// for composite filters, which contain `{__name__="<metric_name>"}` filter.
//
// It is expected that the given prefix isn't used by users.
const compositeTagKeyPrefix = '\xfe'
func marshalCompositeTagKey ( dst , name , key [ ] byte ) [ ] byte {
dst = append ( dst , compositeTagKeyPrefix )
dst = encoding . MarshalVarUint64 ( dst , uint64 ( len ( name ) ) )
dst = append ( dst , name ... )
dst = append ( dst , key ... )
return dst
}
2022-06-09 18:46:26 +02:00
func unmarshalCompositeTagKey ( src [ ] byte ) ( [ ] byte , [ ] byte , error ) {
if len ( src ) == 0 {
return nil , nil , fmt . Errorf ( "composite tag key cannot be empty" )
}
if src [ 0 ] != compositeTagKeyPrefix {
return nil , nil , fmt . Errorf ( "missing composite tag key prefix in %q" , src )
}
src = src [ 1 : ]
2024-05-14 01:23:44 +02:00
n , nSize := encoding . UnmarshalVarUint64 ( src )
if nSize <= 0 {
return nil , nil , fmt . Errorf ( "cannot unmarshal metric name length from composite tag key" )
2022-06-09 18:46:26 +02:00
}
2024-05-14 01:23:44 +02:00
src = src [ nSize : ]
2022-06-09 18:46:26 +02:00
if uint64 ( len ( src ) ) < n {
return nil , nil , fmt . Errorf ( "missing metric name with length %d in composite tag key %q" , n , src )
}
name := src [ : n ]
key := src [ n : ]
return name , key , nil
}
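The composite tag key layout is: the prefix byte, the varint-encoded length of the metric name, the metric name, and then the tag key. The following round-trip sketch is self-contained and uses the standard `encoding/binary` varint functions as a stand-in for `encoding.MarshalVarUint64` and `encoding.UnmarshalVarUint64`; that the two encodings are byte-compatible is an assumption of this sketch, not something shown in the code above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const compositeTagKeyPrefix = '\xfe'

// marshalCompositeTagKey appends prefix + varint(len(name)) + name + key to dst.
// binary.PutUvarint is a stand-in for encoding.MarshalVarUint64.
func marshalCompositeTagKey(dst, name, key []byte) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(name)))
	dst = append(dst, compositeTagKeyPrefix)
	dst = append(dst, lenBuf[:n]...)
	dst = append(dst, name...)
	dst = append(dst, key...)
	return dst
}

// unmarshalCompositeTagKey splits src back into the metric name and the tag key.
// The returned slices point into src.
func unmarshalCompositeTagKey(src []byte) (name, key []byte, err error) {
	if len(src) == 0 {
		return nil, nil, fmt.Errorf("composite tag key cannot be empty")
	}
	if src[0] != compositeTagKeyPrefix {
		return nil, nil, fmt.Errorf("missing composite tag key prefix in %q", src)
	}
	src = src[1:]
	n, nSize := binary.Uvarint(src)
	if nSize <= 0 {
		return nil, nil, fmt.Errorf("cannot unmarshal metric name length from composite tag key")
	}
	src = src[nSize:]
	if uint64(len(src)) < n {
		return nil, nil, fmt.Errorf("missing metric name with length %d in composite tag key %q", n, src)
	}
	return src[:n], src[n:], nil
}

func main() {
	k := marshalCompositeTagKey(nil, []byte("http_requests_total"), []byte("instance"))
	name, key, err := unmarshalCompositeTagKey(k)
	fmt.Printf("name=%q key=%q err=%v\n", name, key, err)
}
```

Storing the name length explicitly lets the metric name and the tag key contain arbitrary bytes, since no separator has to be escaped.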
2020-05-27 20:35:58 +02:00
func reverseBytes ( dst , src [ ] byte ) [ ] byte {
for i := len ( src ) - 1 ; i >= 0 ; i -- {
dst = append ( dst , src [ i ] )
}
return dst
}
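A short usage sketch of `reverseBytes`, showing how reversing a Graphite-style name turns a suffix match into a prefix match. The metric name and the wildcard suffix are illustrative values taken from the comments above.

```go
package main

import (
	"bytes"
	"fmt"
)

// reverseBytes appends the bytes of src to dst in reverse order.
func reverseBytes(dst, src []byte) []byte {
	for i := len(src) - 1; i >= 0; i-- {
		dst = append(dst, src[i])
	}
	return dst
}

func main() {
	name := []byte("foo.bar.baz")
	rev := reverseBytes(nil, name)
	fmt.Printf("%s\n", rev) // zab.rab.oof

	// The wildcard '*.bar.baz' has the fixed suffix '.bar.baz'.
	// After reversing both sides, the suffix match becomes a prefix match,
	// which a sorted index can answer with a prefix scan.
	revSuffix := reverseBytes(nil, []byte(".bar.baz"))
	fmt.Println(bytes.HasPrefix(rev, revSuffix)) // true
}
```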
2023-07-14 00:33:41 +02:00
func ( is * indexSearch ) hasDateMetricIDNoExtDB ( date , metricID uint64 ) bool {
2019-05-22 23:16:55 +02:00
ts := & is . ts
kb := & is . kb
2022-06-19 20:58:53 +02:00
kb . B = marshalCommonPrefix ( kb . B [ : 0 ] , nsPrefixDateToMetricID )
2019-05-22 23:16:55 +02:00
kb . B = encoding . MarshalUint64 ( kb . B , date )
kb . B = encoding . MarshalUint64 ( kb . B , metricID )
lib/storage: switch from global to per-day index for `MetricName -> TSID` mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no the corresponding entry too.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9350795af41cbfc3ca0e8a5af084fce
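The lookup order described above (in-memory `storage/tsid` cache first, then the per-day `MetricName -> TSID` index) can be sketched roughly as follows. This is a toy model, not the actual VictoriaMetrics code: `seriesIndex`, `dayKey`, `lookupOrCreate` and the map-based "index" are hypothetical names introduced only for illustration, and the on-disk index is replaced by an in-memory map. How the real code treats a series that is cached but not yet registered for the current day is an assumption here.

```go
package main

import "fmt"

// dayKey is a hypothetical key of the per-day `MetricName -> TSID` index.
type dayKey struct {
	date       uint64 // days since the Unix epoch
	metricName string // canonical metric name
}

// seriesIndex is a toy stand-in for the `storage/tsid` cache
// plus the per-day index. All names here are illustrative.
type seriesIndex struct {
	tsidCache   map[string]uint64 // models the in-memory `storage/tsid` cache
	perDayIndex map[dayKey]uint64 // models the on-disk per-day index
	nextTSID    uint64
	slowInserts int // models `vm_slow_row_inserts_total`
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{
		tsidCache:   make(map[string]uint64),
		perDayIndex: make(map[dayKey]uint64),
	}
}

// lookupOrCreate returns the TSID for metricName on the given date.
func (si *seriesIndex) lookupOrCreate(date uint64, metricName string) uint64 {
	k := dayKey{date: date, metricName: metricName}
	if tsid, ok := si.tsidCache[metricName]; ok {
		// Fast path: the cache hit avoids reading the index.
		// Assumption: the series still gets an entry for every day it is seen on,
		// which models the increased index size for static series.
		if _, ok := si.perDayIndex[k]; !ok {
			si.perDayIndex[k] = tsid
		}
		return tsid
	}
	// Slow path: the cache miss forces a lookup in the index for the current day only.
	si.slowInserts++
	tsid, ok := si.perDayIndex[k]
	if !ok {
		// Assumption: an unknown series is registered under a newly allocated ID.
		si.nextTSID++
		tsid = si.nextTSID
		si.perDayIndex[k] = tsid
	}
	si.tsidCache[metricName] = tsid
	return tsid
}

func main() {
	si := newSeriesIndex()
	a := si.lookupOrCreate(19552, `cpu_usage{host="a"}`)
	b := si.lookupOrCreate(19552, `cpu_usage{host="a"}`) // cache hit
	c := si.lookupOrCreate(19553, `cpu_usage{host="a"}`) // same series, next day
	fmt.Println(a == b, a == c, si.slowInserts, len(si.perDayIndex))
}
```

In this sketch a static series seen on two days produces two index entries with the same TSID, while only the first insert takes the slow path.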
2023-07-14 00:33:41 +02:00
err := ts . FirstItemWithPrefix ( kb . B )
if err == nil {
if string ( ts . Item ) != string ( kb . B ) {
logger . Panicf ( "FATAL: unexpected entry for (date=%s, metricID=%d); got %q; want %q" , dateToString ( date ) , metricID , ts . Item , kb . B )
2019-05-22 23:16:55 +02:00
}
2023-07-14 00:33:41 +02:00
// Fast path - the (date, metricID) entry is found in the current indexdb.
return true
2019-05-22 23:16:55 +02:00
}
2023-07-14 00:33:41 +02:00
if err != io . EOF {
logger . Panicf ( "FATAL: unexpected error when searching for (date=%s, metricID=%d) entry: %s" , dateToString ( date ) , metricID , err )
2019-05-22 23:16:55 +02:00
}
2023-07-14 00:33:41 +02:00
return false
2019-05-22 23:16:55 +02:00
}
2022-06-09 18:46:26 +02:00
func ( is * indexSearch ) getMetricIDsForDateTagFilter ( qt * querytracer . Tracer , tf * tagFilter , date uint64 , commonPrefix [ ] byte ,
maxMetrics int , maxLoopsCount int64 ) ( * uint64set . Set , int64 , error ) {
2022-06-30 17:17:07 +02:00
if qt . Enabled ( ) {
qt = qt . NewChild ( "get metric ids for filter and date: filter={%s}, date=%s, maxMetrics=%d, maxLoopsCount=%d" , tf , dateToString ( date ) , maxMetrics , maxLoopsCount )
defer qt . Done ( )
}
2019-11-09 22:17:42 +01:00
if ! bytes . HasPrefix ( tf . prefix , commonPrefix ) {
logger . Panicf ( "BUG: unexpected tf.prefix %q; must start with commonPrefix %q" , tf . prefix , commonPrefix )
}
kb := kbPool . Get ( )
2021-09-09 20:09:18 +02:00
defer kbPool . Put ( kb )
2022-06-12 03:32:13 +02:00
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
2022-01-05 15:00:11 +01:00
prefix := kb . B
kb . B = append ( kb . B , tf . prefix [ len ( commonPrefix ) : ] ... )
2019-11-09 22:17:42 +01:00
tfNew := * tf
tfNew . isNegative = false // isNegative for the original tf is handled by the caller.
2022-01-05 15:00:11 +01:00
tfNew . prefix = kb . B
2022-06-09 18:46:26 +02:00
metricIDs , loopsCount , err := is . getMetricIDsForTagFilter ( qt , & tfNew , maxMetrics , maxLoopsCount )
2021-09-09 20:09:18 +02:00
if err != nil {
return nil , loopsCount , err
}
if tf . isNegative || ! tf . isEmptyMatch {
return metricIDs , loopsCount , nil
}
// The tag filter matches an empty label value, e.g. {foo=~"bar|"}.
// Convert it to a negative filter, which matches {foo=~".+",foo!~"bar|"}.
// This fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601
// See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395
maxLoopsCount -= loopsCount
2022-06-09 18:46:26 +02:00
var tfGross tagFilter
if err := tfGross . Init ( prefix , tf . key , [ ] byte ( ".+" ) , false , true ) ; err != nil {
2021-09-09 20:09:18 +02:00
logger . Panicf ( ` BUG: cannot init tag filter: { %q=~".+"}: %s ` , tf . key , err )
}
2022-06-09 18:46:26 +02:00
m , lc , err := is . getMetricIDsForTagFilter ( qt , & tfGross , maxMetrics , maxLoopsCount )
2021-09-09 20:09:18 +02:00
loopsCount += lc
if err != nil {
return nil , loopsCount , err
}
2022-06-09 18:46:26 +02:00
mLen := m . Len ( )
2021-09-09 20:09:18 +02:00
m . Subtract ( metricIDs )
2022-06-09 18:46:26 +02:00
qt . Printf ( "subtract %d metric ids for filter={%s} from %d metric ids for filter={%s}" , metricIDs . Len ( ) , & tfNew , mLen , & tfGross )
qt . Printf ( "found %d metric ids, spent %d loops" , m . Len ( ) , loopsCount )
2021-09-09 20:09:18 +02:00
return m , loopsCount , nil
2021-02-16 20:22:10 +01:00
}
2021-03-16 17:46:22 +01:00
func ( is * indexSearch ) getLoopsCountAndTimestampForDateFilter ( date uint64 , tf * tagFilter ) ( int64 , int64 , uint64 ) {
2021-06-29 11:40:03 +02:00
is . kb . B = appendDateTagFilterCacheKey ( is . kb . B [ : 0 ] , is . db . name , date , tf )
2021-02-16 20:22:10 +01:00
kb := kbPool . Get ( )
defer kbPool . Put ( kb )
2021-02-23 14:47:19 +01:00
kb . B = is . db . loopsPerDateTagFilterCache . Get ( kb . B [ : 0 ] , is . kb . B )
2021-03-16 17:46:22 +01:00
if len ( kb . B ) != 3 * 8 {
return 0 , 0 , 0
2021-02-16 20:22:10 +01:00
}
2021-03-16 17:46:22 +01:00
loopsCount := encoding . UnmarshalInt64 ( kb . B )
filterLoopsCount := encoding . UnmarshalInt64 ( kb . B [ 8 : ] )
timestamp := encoding . UnmarshalUint64 ( kb . B [ 16 : ] )
return loopsCount , filterLoopsCount , timestamp
2021-02-16 20:22:10 +01:00
}
2021-03-16 17:46:22 +01:00
func ( is * indexSearch ) storeLoopsCountForDateFilter ( date uint64 , tf * tagFilter , loopsCount , filterLoopsCount int64 ) {
2021-02-18 12:56:50 +01:00
currentTimestamp := fasttime . UnixTimestamp ( )
2021-06-29 11:40:03 +02:00
is . kb . B = appendDateTagFilterCacheKey ( is . kb . B [ : 0 ] , is . db . name , date , tf )
2021-02-16 20:22:10 +01:00
kb := kbPool . Get ( )
2021-03-16 17:46:22 +01:00
kb . B = encoding . MarshalInt64 ( kb . B [ : 0 ] , loopsCount )
kb . B = encoding . MarshalInt64 ( kb . B , filterLoopsCount )
2021-02-18 12:56:50 +01:00
kb . B = encoding . MarshalUint64 ( kb . B , currentTimestamp )
2021-02-23 14:47:19 +01:00
is . db . loopsPerDateTagFilterCache . Set ( is . kb . B , kb . B )
2021-02-16 20:22:10 +01:00
kbPool . Put ( kb )
2020-03-30 23:44:41 +02:00
}
2021-06-29 11:40:03 +02:00
func appendDateTagFilterCacheKey ( dst [ ] byte , indexDBName string , date uint64 , tf * tagFilter ) [ ] byte {
dst = append ( dst , indexDBName ... )
2020-03-30 23:44:41 +02:00
dst = encoding . MarshalUint64 ( dst , date )
dst = tf . Marshal ( dst )
return dst
2019-11-09 22:17:42 +01:00
}
2020-03-13 21:42:22 +01:00
func ( is * indexSearch ) getMetricIDsForDate ( date uint64 , maxMetrics int ) ( * uint64set . Set , error ) {
2019-11-09 22:17:42 +01:00
// Extract all the metricIDs from (date, __name__=value)->metricIDs entries.
kb := kbPool . Get ( )
defer kbPool . Put ( kb )
2022-06-12 03:32:13 +02:00
kb . B = is . marshalCommonPrefixForDate ( kb . B [ : 0 ] , date )
2019-11-09 22:17:42 +01:00
kb . B = marshalTagValue ( kb . B , nil )
2020-03-13 21:42:22 +01:00
var metricIDs uint64set . Set
if err := is . updateMetricIDsForPrefix ( kb . B , & metricIDs , maxMetrics ) ; err != nil {
return nil , err
}
return & metricIDs , nil
2019-11-09 22:17:42 +01:00
}
2019-11-09 17:00:58 +01:00
2019-11-09 22:17:42 +01:00
func ( is * indexSearch ) updateMetricIDsForPrefix ( prefix [ ] byte , metricIDs * uint64set . Set , maxMetrics int ) error {
ts := & is . ts
mp := & is . mp
2020-07-23 18:21:49 +02:00
loopsPaceLimiter := 0
2019-09-20 10:53:42 +02:00
ts . Seek ( prefix )
for ts . NextItem ( ) {
2020-08-07 07:37:33 +02:00
if loopsPaceLimiter & paceLimiterFastIterationsMask == 0 {
2020-07-23 19:42:57 +02:00
if err := checkSearchDeadlineAndPace ( is . deadline ) ; err != nil {
return err
}
2020-07-23 18:21:49 +02:00
}
loopsPaceLimiter ++
2019-09-20 10:53:42 +02:00
item := ts . Item
if ! bytes . HasPrefix ( item , prefix ) {
return nil
2019-05-22 23:16:55 +02:00
}
2019-09-20 10:53:42 +02:00
tail := item [ len ( prefix ) : ]
2019-11-09 17:00:58 +01:00
n := bytes . IndexByte ( tail , tagSeparatorChar )
if n < 0 {
return fmt . Errorf ( "invalid tag->metricIDs line %q: cannot find tagSeparatorChar %d" , item , tagSeparatorChar )
}
tail = tail [ n + 1 : ]
if err := mp . InitOnlyTail ( item , tail ) ; err != nil {
return err
}
mp . ParseMetricIDs ( )
2020-07-21 19:56:49 +02:00
metricIDs . AddMulti ( mp . MetricIDs )
2019-09-24 20:10:22 +02:00
if metricIDs . Len ( ) >= maxMetrics {
2019-09-20 10:53:42 +02:00
return nil
}
2019-05-22 23:16:55 +02:00
}
if err := ts . Error ( ) ; err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "error when searching for all metricIDs by prefix %q: %w" , prefix , err )
2019-05-22 23:16:55 +02:00
}
return nil
}
2021-03-15 19:31:24 +01:00
// The estimated number of index scan loops a single loop in updateMetricIDsByMetricNameMatch takes.
2021-03-25 12:27:47 +01:00
const loopsCountPerMetricNameMatch = 150
2019-05-22 23:16:55 +02:00
var kbPool bytesutil . ByteBufferPool
// Returns local unique MetricID.
2020-05-14 13:08:39 +02:00
func generateUniqueMetricID ( ) uint64 {
// The metricIDs returned from this function are expected to be dense.
// Sparse metricIDs may hurt the performance of metric_ids intersection
// with uint64set.Set.
2024-02-23 23:15:21 +01:00
return nextUniqueMetricID . Add ( 1 )
2019-05-22 23:16:55 +02:00
}
// This number mustn't go backwards on restarts, otherwise metricID
// collisions are possible. So don't change time on the server
// between VictoriaMetrics restarts.
2024-02-23 23:15:21 +01:00
var nextUniqueMetricID = func ( ) * atomic . Uint64 {
var n atomic . Uint64
n . Store ( uint64 ( time . Now ( ) . UnixNano ( ) ) )
return & n
} ( )
2019-05-22 23:16:55 +02:00
func marshalCommonPrefix ( dst [ ] byte , nsPrefix byte ) [ ] byte {
dst = append ( dst , nsPrefix )
return dst
}
2020-07-23 23:31:09 +02:00
// This function is needed only for minimizing the difference between code for single-node and cluster version.
func ( is * indexSearch ) marshalCommonPrefix ( dst [ ] byte , nsPrefix byte ) [ ] byte {
return marshalCommonPrefix ( dst , nsPrefix )
}
2022-06-12 03:32:13 +02:00
func ( is * indexSearch ) marshalCommonPrefixForDate ( dst [ ] byte , date uint64 ) [ ] byte {
if date == 0 {
// Global index
return is . marshalCommonPrefix ( dst , nsPrefixTagToMetricIDs )
}
// Per-day index
dst = is . marshalCommonPrefix ( dst , nsPrefixDateTagToMetricIDs )
return encoding . MarshalUint64 ( dst , date )
}
2019-09-20 18:46:47 +02:00
func unmarshalCommonPrefix ( src [ ] byte ) ( [ ] byte , byte , error ) {
if len ( src ) < commonPrefixLen {
return nil , 0 , fmt . Errorf ( "cannot unmarshal common prefix from %d bytes; need at least %d bytes; data=%X" , len ( src ) , commonPrefixLen , src )
}
prefix := src [ 0 ]
return src [ commonPrefixLen : ] , prefix , nil
}
// 1 byte for prefix
const commonPrefixLen = 1
type tagToMetricIDsRowParser struct {
2019-11-09 22:17:42 +01:00
// NSPrefix contains the first byte parsed from the row after Init call.
// This is either nsPrefixTagToMetricIDs or nsPrefixDateTagToMetricIDs.
NSPrefix byte
// Date contains parsed date for nsPrefixDateTagToMetricIDs rows after Init call
Date uint64
2019-09-20 18:46:47 +02:00
// MetricIDs contains parsed MetricIDs after ParseMetricIDs call
MetricIDs [ ] uint64
2022-06-12 03:32:13 +02:00
// metricIDsParsed is set to true after ParseMetricIDs call
metricIDsParsed bool
2019-09-20 18:46:47 +02:00
// Tag contains parsed tag after Init call
Tag Tag
// tail contains the remaining unparsed metricIDs
tail [ ] byte
}
func ( mp * tagToMetricIDsRowParser ) Reset ( ) {
2019-11-09 22:17:42 +01:00
mp . NSPrefix = 0
mp . Date = 0
2019-09-20 18:46:47 +02:00
mp . MetricIDs = mp . MetricIDs [ : 0 ]
2022-06-12 03:32:13 +02:00
mp . metricIDsParsed = false
2019-09-20 18:46:47 +02:00
mp . Tag . Reset ( )
mp . tail = nil
}
// Init initializes mp from b, which should contain encoded tag->metricIDs row.
//
// b cannot be re-used until Reset call.
2019-11-09 22:17:42 +01:00
func ( mp * tagToMetricIDsRowParser ) Init ( b [ ] byte , nsPrefixExpected byte ) error {
tail , nsPrefix , err := unmarshalCommonPrefix ( b )
2019-09-20 18:46:47 +02:00
if err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "invalid tag->metricIDs row %q: %w" , b , err )
2019-09-20 18:46:47 +02:00
}
2019-11-09 22:17:42 +01:00
if nsPrefix != nsPrefixExpected {
return fmt . Errorf ( "invalid prefix for tag->metricIDs row %q; got %d; want %d" , b , nsPrefix , nsPrefixExpected )
}
if nsPrefix == nsPrefixDateTagToMetricIDs {
// unmarshal date.
if len ( tail ) < 8 {
return fmt . Errorf ( "cannot unmarshal date from (date, tag)->metricIDs row %q from %d bytes; want at least 8 bytes" , b , len ( tail ) )
}
mp . Date = encoding . UnmarshalUint64 ( tail )
tail = tail [ 8 : ]
2019-09-20 18:46:47 +02:00
}
2019-11-09 22:17:42 +01:00
mp . NSPrefix = nsPrefix
2019-09-20 18:46:47 +02:00
tail , err = mp . Tag . Unmarshal ( tail )
if err != nil {
2020-06-30 21:58:18 +02:00
return fmt . Errorf ( "cannot unmarshal tag from tag->metricIDs row %q: %w" , b , err )
2019-09-20 18:46:47 +02:00
}
return mp . InitOnlyTail ( b , tail )
}
2019-11-09 22:17:42 +01:00
// MarshalPrefix marshals row prefix without tail to dst.
func ( mp * tagToMetricIDsRowParser ) MarshalPrefix ( dst [ ] byte ) [ ] byte {
dst = marshalCommonPrefix ( dst , mp . NSPrefix )
if mp . NSPrefix == nsPrefixDateTagToMetricIDs {
dst = encoding . MarshalUint64 ( dst , mp . Date )
}
dst = mp . Tag . Marshal ( dst )
return dst
}
2019-09-20 18:46:47 +02:00
// InitOnlyTail initializes mp.tail from tail.
//
// b must contain tag->metricIDs row.
// b cannot be re-used until Reset call.
func ( mp * tagToMetricIDsRowParser ) InitOnlyTail ( b , tail [ ] byte ) error {
if len ( tail ) == 0 {
return fmt . Errorf ( "missing metricID in the tag->metricIDs row %q" , b )
}
if len ( tail ) % 8 != 0 {
return fmt . Errorf ( "invalid tail length in the tag->metricIDs row; got %d bytes; must be multiple of 8 bytes" , len ( tail ) )
}
mp . tail = tail
2022-06-12 03:32:13 +02:00
mp . metricIDsParsed = false
2019-09-20 18:46:47 +02:00
return nil
}
// EqualPrefix returns true if prefixes for mp and x are equal.
//
// Prefix contains (nsPrefix, date, tag)
func ( mp * tagToMetricIDsRowParser ) EqualPrefix ( x * tagToMetricIDsRowParser ) bool {
2019-11-09 22:17:42 +01:00
if ! mp . Tag . Equal ( & x . Tag ) {
return false
}
return mp . Date == x . Date && mp . NSPrefix == x . NSPrefix
2019-09-20 18:46:47 +02:00
}
2019-09-23 19:40:38 +02:00
// MetricIDsLen returns the number of MetricIDs in the mp.tail
func ( mp * tagToMetricIDsRowParser ) MetricIDsLen ( ) int {
return len ( mp . tail ) / 8
}
2019-09-20 18:46:47 +02:00
// ParseMetricIDs parses MetricIDs from mp.tail into mp.MetricIDs.
func ( mp * tagToMetricIDsRowParser ) ParseMetricIDs ( ) {
2022-06-12 03:32:13 +02:00
if mp . metricIDsParsed {
return
}
2019-09-20 18:46:47 +02:00
tail := mp . tail
n := len ( tail ) / 8
2024-05-12 11:24:48 +02:00
mp . MetricIDs = slicesutil . SetLength ( mp . MetricIDs , n )
2019-09-20 18:46:47 +02:00
metricIDs := mp . MetricIDs
_ = metricIDs [ n - 1 ]
for i := 0 ; i < n ; i ++ {
if len ( tail ) < 8 {
logger . Panicf ( "BUG: tail cannot be smaller than 8 bytes; got %d bytes; tail=%X" , len ( tail ) , tail )
return
}
metricID := encoding . UnmarshalUint64 ( tail )
metricIDs [ i ] = metricID
tail = tail [ 8 : ]
}
2022-06-12 03:32:13 +02:00
mp . metricIDsParsed = true
}
2022-06-14 15:32:38 +02:00
// GetMatchingSeriesCount returns the number of series in mp, which match metricIDs from the given filter
// and do not match metricIDs from negativeFilter.
2022-06-12 03:32:13 +02:00
//
// If filter is nil, then all the series in mp are checked only against negativeFilter.
2022-06-14 15:32:38 +02:00
func ( mp * tagToMetricIDsRowParser ) GetMatchingSeriesCount ( filter , negativeFilter * uint64set . Set ) int {
if filter == nil && negativeFilter . Len ( ) == 0 {
2022-06-12 03:32:13 +02:00
return mp . MetricIDsLen ( )
}
mp . ParseMetricIDs ( )
n := 0
for _ , metricID := range mp . MetricIDs {
2022-06-14 15:32:38 +02:00
if filter != nil && ! filter . Has ( metricID ) {
continue
}
if ! negativeFilter . Has ( metricID ) {
2022-06-12 03:32:13 +02:00
n ++
}
}
return n
2019-09-20 18:46:47 +02:00
}
2021-02-21 21:06:45 +01:00
func mergeTagToMetricIDsRows ( data [ ] byte , items [ ] mergeset . Item ) ( [ ] byte , [ ] mergeset . Item ) {
2019-11-09 22:17:42 +01:00
data , items = mergeTagToMetricIDsRowsInternal ( data , items , nsPrefixTagToMetricIDs )
data , items = mergeTagToMetricIDsRowsInternal ( data , items , nsPrefixDateTagToMetricIDs )
return data , items
}
2021-02-21 21:06:45 +01:00
func mergeTagToMetricIDsRowsInternal ( data [ ] byte , items [ ] mergeset . Item , nsPrefix byte ) ( [ ] byte , [ ] mergeset . Item ) {
2019-11-09 22:17:42 +01:00
// Perform quick checks whether items contain rows starting from nsPrefix
2019-09-20 18:46:47 +02:00
// based on the fact that items are sorted.
2019-10-08 15:25:24 +02:00
if len ( items ) <= 2 {
// The first and the last row must remain unchanged.
2019-09-20 18:46:47 +02:00
return data , items
}
2021-02-21 21:06:45 +01:00
firstItem := items [ 0 ] . Bytes ( data )
2019-11-09 22:17:42 +01:00
if len ( firstItem ) > 0 && firstItem [ 0 ] > nsPrefix {
2019-09-20 18:46:47 +02:00
return data , items
}
2021-02-21 21:06:45 +01:00
lastItem := items [ len ( items ) - 1 ] . Bytes ( data )
2019-11-09 22:17:42 +01:00
if len ( lastItem ) > 0 && lastItem [ 0 ] < nsPrefix {
2019-09-20 18:46:47 +02:00
return data , items
}
2019-11-09 22:17:42 +01:00
// items contain at least one row starting from nsPrefix. Merge rows with common tag.
2019-09-20 18:46:47 +02:00
tmm := getTagToMetricIDsRowsMerger ( )
2019-10-08 15:25:24 +02:00
tmm . dataCopy = append ( tmm . dataCopy [ : 0 ] , data ... )
tmm . itemsCopy = append ( tmm . itemsCopy [ : 0 ] , items ... )
2019-09-20 18:46:47 +02:00
mp := & tmm . mp
mpPrev := & tmm . mpPrev
2019-10-08 15:25:24 +02:00
dstData := data [ : 0 ]
dstItems := items [ : 0 ]
2021-02-21 21:06:45 +01:00
for i , it := range items {
item := it . Bytes ( data )
2019-11-09 22:17:42 +01:00
if len ( item ) == 0 || item [ 0 ] != nsPrefix || i == 0 || i == len ( items ) - 1 {
// Write rows not starting with nsPrefix as-is.
2019-09-23 19:40:38 +02:00
// Additionally write the first and the last row as-is in order to preserve
2021-03-09 08:18:19 +01:00
// sort order for adjacent blocks.
2019-11-09 22:17:42 +01:00
dstData , dstItems = tmm . flushPendingMetricIDs ( dstData , dstItems , mpPrev )
2019-09-20 18:46:47 +02:00
dstData = append ( dstData , item ... )
2021-02-21 21:06:45 +01:00
dstItems = append ( dstItems , mergeset . Item {
Start : uint32 ( len ( dstData ) - len ( item ) ) ,
End : uint32 ( len ( dstData ) ) ,
} )
2019-09-20 18:46:47 +02:00
continue
}
2019-11-09 22:17:42 +01:00
if err := mp . Init ( item , nsPrefix ) ; err != nil {
logger . Panicf ( "FATAL: cannot parse row starting with nsPrefix %d during merge: %s" , nsPrefix , err )
2019-09-20 18:46:47 +02:00
}
2019-09-23 23:49:21 +02:00
if mp . MetricIDsLen ( ) >= maxMetricIDsPerRow {
2019-11-09 22:17:42 +01:00
dstData , dstItems = tmm . flushPendingMetricIDs ( dstData , dstItems , mpPrev )
2019-09-23 23:49:21 +02:00
dstData = append ( dstData , item ... )
2021-02-21 21:06:45 +01:00
dstItems = append ( dstItems , mergeset . Item {
Start : uint32 ( len ( dstData ) - len ( item ) ) ,
End : uint32 ( len ( dstData ) ) ,
} )
2019-09-23 23:49:21 +02:00
continue
}
2019-11-09 22:17:42 +01:00
if ! mp . EqualPrefix ( mpPrev ) {
2019-09-20 18:46:47 +02:00
dstData , dstItems = tmm . flushPendingMetricIDs ( dstData , dstItems , mpPrev )
}
mp . ParseMetricIDs ( )
tmm . pendingMetricIDs = append ( tmm . pendingMetricIDs , mp . MetricIDs ... )
mpPrev , mp = mp , mpPrev
2019-09-23 23:49:21 +02:00
if len ( tmm . pendingMetricIDs ) >= maxMetricIDsPerRow {
dstData , dstItems = tmm . flushPendingMetricIDs ( dstData , dstItems , mpPrev )
}
2019-09-20 18:46:47 +02:00
}
if len ( tmm . pendingMetricIDs ) > 0 {
2019-10-08 15:25:24 +02:00
logger . Panicf ( "BUG: tmm.pendingMetricIDs must be empty at this point; got %d items: %d" , len ( tmm . pendingMetricIDs ) , tmm . pendingMetricIDs )
}
2021-02-21 21:06:45 +01:00
if ! checkItemsSorted ( dstData , dstItems ) {
2019-11-06 13:24:48 +01:00
// Items could become unsorted if initial items contain duplicate metricIDs:
//
// item1: 1, 1, 5
// item2: 1, 4
//
// Items could become the following after the merge:
//
// item1: 1, 5
// item2: 1, 4
//
// i.e. item1 > item2
//
// Leave the original items unmerged, so they can be merged next time.
// This case should be quite rare. It may occur when multiple data points are simultaneously inserted
// into the same new time series from multiple concurrent goroutines.
2024-02-23 23:15:21 +01:00
indexBlocksWithMetricIDsIncorrectOrder . Add ( 1 )
2019-10-08 15:25:24 +02:00
dstData = append ( dstData [ : 0 ] , tmm . dataCopy ... )
2021-02-21 21:06:45 +01:00
dstItems = append ( dstItems [ : 0 ] , tmm . itemsCopy ... )
if ! checkItemsSorted ( dstData , dstItems ) {
2019-11-06 13:24:48 +01:00
logger . Panicf ( "BUG: the original items weren't sorted; items=%q" , dstItems )
2019-10-09 11:13:17 +02:00
}
2019-09-20 18:46:47 +02:00
}
2019-09-23 19:40:38 +02:00
putTagToMetricIDsRowsMerger ( tmm )
2024-02-23 23:15:21 +01:00
indexBlocksWithMetricIDsProcessed . Add ( 1 )
2019-09-24 18:32:06 +02:00
return dstData , dstItems
2019-09-20 18:46:47 +02:00
}
2024-02-23 23:15:21 +01:00
var indexBlocksWithMetricIDsIncorrectOrder atomic . Uint64
var indexBlocksWithMetricIDsProcessed atomic . Uint64
2019-11-06 13:24:48 +01:00
2021-02-21 21:06:45 +01:00
func checkItemsSorted ( data [ ] byte , items [ ] mergeset . Item ) bool {
2019-09-26 12:12:24 +02:00
if len ( items ) == 0 {
2019-11-06 13:24:48 +01:00
return true
2019-09-26 12:12:24 +02:00
}
2021-02-21 21:06:45 +01:00
prevItem := items [ 0 ] . String ( data )
for _ , it := range items [ 1 : ] {
currItem := it . String ( data )
if prevItem > currItem {
2019-11-06 13:24:48 +01:00
return false
2019-09-26 12:12:24 +02:00
}
prevItem = currItem
}
2019-11-06 13:24:48 +01:00
return true
2019-09-26 12:12:24 +02:00
}
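// Illustration (not part of the original file): a minimal, self-contained sketch of
// the sortedness check above, replaying the "item1: 1, 5 / item2: 1, 4" case from the
// comment in the merge function. The item type, appendItem and itemsSorted are
// hypothetical stand-ins; the real mergeset.Item may differ in detail. The sketch only
// assumes that an item is a (Start, End) window into a shared data buffer.

```go
package main

import "fmt"

// item is a hypothetical stand-in for mergeset.Item: a window into a shared buffer.
type item struct{ Start, End uint32 }

// str returns the bytes referenced by the item as a string.
func (it item) str(data []byte) string { return string(data[it.Start:it.End]) }

// appendItem appends payload to data and records its offsets in items.
func appendItem(data []byte, items []item, payload ...byte) ([]byte, []item) {
	start := len(data)
	data = append(data, payload...)
	return data, append(items, item{Start: uint32(start), End: uint32(len(data))})
}

// itemsSorted reports whether items are in non-decreasing lexicographic order.
func itemsSorted(data []byte, items []item) bool {
	for i := 1; i < len(items); i++ {
		if items[i-1].str(data) > items[i].str(data) {
			return false
		}
	}
	return true
}

func main() {
	var data []byte
	var items []item
	// After per-item deduplication: item1 = (1, 5), item2 = (1, 4), so item1 > item2.
	data, items = appendItem(data, items, 1, 5)
	data, items = appendItem(data, items, 1, 4)
	fmt.Println(itemsSorted(data, items)) // prints "false"
}
```

// Because the second bytes compare as 5 > 4, the check fails, which is the situation
// in which the merge function falls back to the saved copies of the original items.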
2019-09-23 23:49:21 +02:00
// maxMetricIDsPerRow limits the number of metricIDs in a single tag->metricIDs row.
//
// This reduces overhead on index and metaindex in lib/mergeset.
const maxMetricIDsPerRow = 64
2019-09-19 19:00:33 +02:00
type uint64Sorter [ ] uint64
func ( s uint64Sorter ) Len ( ) int { return len ( s ) }
func ( s uint64Sorter ) Less ( i , j int ) bool {
return s [ i ] < s [ j ]
}
func ( s uint64Sorter ) Swap ( i , j int ) {
s [ i ] , s [ j ] = s [ j ] , s [ i ]
}
2019-09-20 18:46:47 +02:00
type tagToMetricIDsRowsMerger struct {
pendingMetricIDs uint64Sorter
mp tagToMetricIDsRowParser
mpPrev tagToMetricIDsRowParser
2019-10-08 15:25:24 +02:00
2021-02-21 21:06:45 +01:00
itemsCopy [ ] mergeset . Item
2019-10-08 15:25:24 +02:00
dataCopy [ ] byte
2019-09-20 18:46:47 +02:00
}
2019-09-23 19:40:38 +02:00
func ( tmm * tagToMetricIDsRowsMerger ) Reset ( ) {
tmm . pendingMetricIDs = tmm . pendingMetricIDs [ : 0 ]
tmm . mp . Reset ( )
tmm . mpPrev . Reset ( )
2019-10-08 15:25:24 +02:00
tmm . itemsCopy = tmm . itemsCopy [ : 0 ]
tmm . dataCopy = tmm . dataCopy [ : 0 ]
2019-09-23 19:40:38 +02:00
}
2021-02-21 21:06:45 +01:00
func ( tmm * tagToMetricIDsRowsMerger ) flushPendingMetricIDs ( dstData [ ] byte , dstItems [ ] mergeset . Item , mp * tagToMetricIDsRowParser ) ( [ ] byte , [ ] mergeset . Item ) {
2019-09-20 18:46:47 +02:00
if len ( tmm . pendingMetricIDs ) == 0 {
2019-11-09 22:17:42 +01:00
// Nothing to flush
return dstData , dstItems
2019-09-20 18:46:47 +02:00
}
2019-09-23 19:40:38 +02:00
// Use sort.Sort instead of sort.Slice in order to reduce memory allocations.
sort . Sort ( & tmm . pendingMetricIDs )
2019-09-25 16:55:13 +02:00
tmm . pendingMetricIDs = removeDuplicateMetricIDs ( tmm . pendingMetricIDs )
2019-09-23 19:40:38 +02:00
2019-09-25 16:55:13 +02:00
// Marshal pendingMetricIDs
2019-09-20 18:46:47 +02:00
dstDataLen := len ( dstData )
2019-11-09 22:17:42 +01:00
dstData = mp . MarshalPrefix ( dstData )
2019-09-23 19:40:38 +02:00
for _ , metricID := range tmm . pendingMetricIDs {
2019-09-20 18:46:47 +02:00
dstData = encoding . MarshalUint64 ( dstData , metricID )
}
2021-02-21 21:06:45 +01:00
dstItems = append ( dstItems , mergeset . Item {
Start : uint32 ( dstDataLen ) ,
End : uint32 ( len ( dstData ) ) ,
} )
2019-09-23 19:40:38 +02:00
tmm . pendingMetricIDs = tmm . pendingMetricIDs [ : 0 ]
2019-09-20 18:46:47 +02:00
return dstData , dstItems
}
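// Illustration (not part of the original file): a rough sketch of the row layout that
// flushPendingMetricIDs produces, i.e. a prefix followed by fixed-width metricIDs, with
// one item recording the row's offsets in the shared buffer. The names appendRow and
// item are hypothetical, and big-endian 8-byte encoding is an assumption standing in
// for encoding.MarshalUint64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// item is a hypothetical stand-in for mergeset.Item.
type item struct{ Start, End uint32 }

// appendRow appends prefix plus the encoded ids to dstData and records the
// resulting byte range as a new item.
func appendRow(dstData []byte, dstItems []item, prefix []byte, ids []uint64) ([]byte, []item) {
	start := len(dstData)
	dstData = append(dstData, prefix...)
	var buf [8]byte
	for _, id := range ids {
		// Assumption: 8-byte big-endian, so byte order matches numeric order.
		binary.BigEndian.PutUint64(buf[:], id)
		dstData = append(dstData, buf[:]...)
	}
	dstItems = append(dstItems, item{Start: uint32(start), End: uint32(len(dstData))})
	return dstData, dstItems
}

func main() {
	data, items := appendRow(nil, nil, []byte("tag="), []uint64{1, 2})
	// 4 prefix bytes + 2 * 8 id bytes = 20 bytes in a single item.
	fmt.Println(items[0].Start, items[0].End, len(data)) // prints "0 20 20"
}
```

// Storing offsets rather than sub-slices keeps items valid when the data buffer is
// reallocated by append.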
2019-09-25 16:55:13 +02:00
func removeDuplicateMetricIDs ( sortedMetricIDs [ ] uint64 ) [ ] uint64 {
if len ( sortedMetricIDs ) < 2 {
return sortedMetricIDs
}
prevMetricID := sortedMetricIDs [ 0 ]
hasDuplicates := false
for _ , metricID := range sortedMetricIDs [ 1 : ] {
if prevMetricID == metricID {
hasDuplicates = true
2019-09-25 17:23:13 +02:00
break
2019-09-25 16:55:13 +02:00
}
prevMetricID = metricID
}
if ! hasDuplicates {
return sortedMetricIDs
}
dstMetricIDs := sortedMetricIDs [ : 1 ]
prevMetricID = sortedMetricIDs [ 0 ]
for _ , metricID := range sortedMetricIDs [ 1 : ] {
if prevMetricID == metricID {
continue
}
dstMetricIDs = append ( dstMetricIDs , metricID )
prevMetricID = metricID
}
return dstMetricIDs
}
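// Illustration (not part of the original file): a small standalone sketch of in-place
// deduplication of a sorted slice, the technique removeDuplicateMetricIDs relies on.
// dedupSorted is a hypothetical name, and the sketch omits the early "no duplicates"
// scan of the original for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupSorted removes adjacent duplicates from a sorted slice, reusing its storage.
// The write position never overtakes the read position, so aliasing is safe.
func dedupSorted(sorted []uint64) []uint64 {
	if len(sorted) < 2 {
		return sorted
	}
	dst := sorted[:1]
	prev := sorted[0]
	for _, v := range sorted[1:] {
		if v == prev {
			continue
		}
		dst = append(dst, v)
		prev = v
	}
	return dst
}

func main() {
	ids := []uint64{5, 1, 4, 1, 5, 1}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	fmt.Println(dedupSorted(ids)) // prints "[1 4 5]"
}
```

// The input must already be sorted; on unsorted input only adjacent repeats are removed.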
2019-09-20 18:46:47 +02:00
func getTagToMetricIDsRowsMerger ( ) * tagToMetricIDsRowsMerger {
v := tmmPool . Get ( )
if v == nil {
return & tagToMetricIDsRowsMerger { }
}
return v . ( * tagToMetricIDsRowsMerger )
}
func putTagToMetricIDsRowsMerger ( tmm * tagToMetricIDsRowsMerger ) {
2019-09-23 19:40:38 +02:00
tmm . Reset ( )
2019-09-20 18:46:47 +02:00
tmmPool . Put ( tmm )
}
var tmmPool sync . Pool
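// Illustration (not part of the original file): a minimal sketch of the get/put pooling
// pattern used for tagToMetricIDsRowsMerger. The scratch type and its helpers are
// hypothetical. The point is that an object is reset before it goes back into the
// sync.Pool, so the next user never observes stale state.

```go
package main

import (
	"fmt"
	"sync"
)

// scratch is a hypothetical reusable object holding a growable buffer.
type scratch struct {
	ids []uint64
}

// Reset truncates the buffer while keeping its allocated capacity.
func (s *scratch) Reset() { s.ids = s.ids[:0] }

var scratchPool sync.Pool

// getScratch returns a pooled object, or a fresh one if the pool is empty.
func getScratch() *scratch {
	v := scratchPool.Get()
	if v == nil {
		return &scratch{}
	}
	return v.(*scratch)
}

// putScratch resets the object and returns it to the pool.
func putScratch(s *scratch) {
	s.Reset()
	scratchPool.Put(s)
}

func main() {
	s := getScratch()
	s.ids = append(s.ids, 1, 2, 3)
	putScratch(s)

	// Whether the pool hands back the same object or a new one, it is empty.
	s2 := getScratch()
	fmt.Println(len(s2.ids)) // prints "0"
}
```

// sync.Pool may drop objects at any time, so callers must not rely on getting the same
// object back; the pattern only reduces allocations on hot paths.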