VictoriaMetrics/lib/logstorage/consts.go

package logstorage

// maxUncompressedIndexBlockSize is the maximum length of an uncompressed block with blockHeader entries (aka index block).
//
// The real block length can exceed this value by a small percentage because of the block write details.
const maxUncompressedIndexBlockSize = 128 * 1024

// maxUncompressedBlockSize is the maximum size of an uncompressed block in bytes.
//
// The real uncompressed block can exceed this value by up to 2 times because of block merge details.
const maxUncompressedBlockSize = 2 * 1024 * 1024

// maxRowsPerBlock is the maximum number of log entries a single block can contain.
const maxRowsPerBlock = 8 * 1024 * 1024

// maxColumnsPerBlock is the maximum number of columns per block.
//
// Setting this value too high is not recommended, because it may result
// in excess memory usage during data ingestion and significant slowdown during query execution,
// since every column header is unpacked in every matching block during query execution.
const maxColumnsPerBlock = 1_000

// MaxFieldNameSize is the maximum size in bytes of a field name.
//
// Longer field names are truncated to MaxFieldNameSize bytes during data ingestion.
const MaxFieldNameSize = 128

// maxConstColumnValueSize is the maximum size in bytes of a const column value.
//
// Const column values are stored in columnsHeader, which is read every time the corresponding block is scanned during search queries.
// So it is better to store bigger values in regular columns in order to improve search speed.
const maxConstColumnValueSize = 256

// maxIndexBlockSize is the maximum size of the block with blockHeader entries (aka indexBlock).
const maxIndexBlockSize = 8 * 1024 * 1024

// maxTimestampsBlockSize is the maximum size of a timestamps block.
const maxTimestampsBlockSize = 8 * 1024 * 1024

// maxValuesBlockSize is the maximum size of a values block.
const maxValuesBlockSize = 8 * 1024 * 1024

// maxBloomFilterBlockSize is the maximum size of a bloom filter block.
const maxBloomFilterBlockSize = 8 * 1024 * 1024

// maxColumnsHeaderSize is the maximum size of a columnsHeader block.
const maxColumnsHeaderSize = 8 * 1024 * 1024

// maxDictSizeBytes is the maximum total length of all the keys in the valuesDict.
//
// The dict is stored in columnsHeader, which is read every time the corresponding block is scanned during search queries.
// So it is better to store bigger values in regular columns in order to improve search speed.
const maxDictSizeBytes = 256

// maxDictLen is the maximum number of entries in the valuesDict.
//
// It must not exceed 255, since the dict len is marshaled into a single byte.
const maxDictLen = 8