From 9b032763fa27295fb904012d1b3467a49f5f2d14 Mon Sep 17 00:00:00 2001 From: Vika Date: Mon, 20 Mar 2023 06:43:51 +0000 Subject: [PATCH] update wiki pages --- CHANGELOG.md | 5 +++++ README.md | 18 +++++++++--------- Single-server-VictoriaMetrics.md | 18 +++++++++--------- 3 files changed, 23 insertions(+), 18 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 387fd48..f8f8ada 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,11 @@ The following tip changes can be tested by building VictoriaMetrics components f ## tip +**Update note: this release contains backwards-incompatible change in storage data format, +so the previous versions of VictoriaMetrics will exit with the `unexpected number of substrings in the part name` error when trying to run them on the data +created by v1.90.0 or newer versions. The solution is to upgrade to v1.90.0 or newer releases** + +* FEATURE: publish VictoriaMetrics binaries for Windows. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236), [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70) issues. * FEATURE: log metrics with truncated labels if the length of label value in the ingested metric exceeds `-maxLabelValueLen`. This should simplify debugging for this case. * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add support for [VictoriaMetrics remote write protocol](https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol) when [sending / receiving data to / from Kafka](https://docs.victoriametrics.com/vmagent.html#kafka-integration). This protocol allows saving egress network bandwidth costs when sending data from `vmagent` to `Kafka` located in another datacenter or availability zone. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225). 
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add `--kafka.consumer.topic.concurrency` command-line flag. It controls the number of Kafka consumer workers to use by `vmagent`. It should eliminate the need to start multiple `vmagent` instances to improve data transfer rate. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1957). diff --git a/README.md b/README.md index 455094e..7817851 100644 --- a/README.md +++ b/README.md @@ -1448,12 +1448,14 @@ can be configured with the `-inmemoryDataFlushInterval` command-line flag (note In-memory parts are persisted to disk into `part` directories under the `<-storageDataPath>/data/small/YYYY_MM/` folder, where `YYYY_MM` is the month partition for the stored data. For example, `2022_11` is the partition for `parts` with [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) from `November 2022`. +Each partition directory contains `parts.json` file with the actual list of parts in the partition. -The `part` directory has the following name pattern: `rowsCount_blocksCount_minTimestamp_maxTimestamp`, where: +Every `part` directory contains `metadata.json` file with the following fields: -- `rowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part -- `blocksCount` - the number of blocks stored in the part (see details about blocks below) -- `minTimestamp` and `maxTimestamp` - minimum and maximum timestamps across raw samples stored in the part +- `RowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part +- `BlocksCount` - the number of blocks stored in the part (see details about blocks below) +- `MinTimestamp` and `MaxTimestamp` - minimum and maximum timestamps across raw samples stored in the part +- `MinDedupInterval` - the [deduplication interval](#deduplication) applied to the given part. 
Each `part` consists of `blocks` sorted by internal time series id (aka `TSID`). Each `block` contains up to 8K [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples), @@ -1475,9 +1477,8 @@ for fast block lookups, which belong to the given `TSID` and cover the given tim and [freeing up disk space for the deleted time series](#how-to-delete-time-series) are performed during the merge Newly added `parts` either successfully appear in the storage or fail to appear. -The newly added `parts` are being created in a temporary directory under `<-storageDataPath>/data/{small,big}/YYYY_MM/tmp` folder. -When the newly added `part` is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html) -to a temporary directory, then it is atomically moved to the storage directory. +The newly added `part` is atomically registered in the `parts.json` file under the corresponding partition +after it is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html) to the storage. Thanks to this algorithm, storage never contains partially created parts, even if hardware power off occurs in the middle of writing the `part` to disk - such incompletely written `parts` are automatically deleted on the next VictoriaMetrics start. @@ -1506,8 +1507,7 @@ Retention is configured with the `-retentionPeriod` command-line flag, which tak Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders. Data partitions outside the configured retention are deleted on the first day of the new month. -Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. -Data parts outside of the configured retention are eventually deleted during +Each partition consists of one or more data parts. 
Data parts outside of the configured retention are eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282). The maximum disk space usage for a given `-retentionPeriod` is going to be (`-retentionPeriod` + 1) months. diff --git a/Single-server-VictoriaMetrics.md b/Single-server-VictoriaMetrics.md index 814ead9..625d678 100644 --- a/Single-server-VictoriaMetrics.md +++ b/Single-server-VictoriaMetrics.md @@ -1451,12 +1451,14 @@ can be configured with the `-inmemoryDataFlushInterval` command-line flag (note In-memory parts are persisted to disk into `part` directories under the `<-storageDataPath>/data/small/YYYY_MM/` folder, where `YYYY_MM` is the month partition for the stored data. For example, `2022_11` is the partition for `parts` with [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) from `November 2022`. +Each partition directory contains `parts.json` file with the actual list of parts in the partition. 
-The `part` directory has the following name pattern: `rowsCount_blocksCount_minTimestamp_maxTimestamp`, where: +Every `part` directory contains `metadata.json` file with the following fields: -- `rowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part -- `blocksCount` - the number of blocks stored in the part (see details about blocks below) -- `minTimestamp` and `maxTimestamp` - minimum and maximum timestamps across raw samples stored in the part +- `RowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part +- `BlocksCount` - the number of blocks stored in the part (see details about blocks below) +- `MinTimestamp` and `MaxTimestamp` - minimum and maximum timestamps across raw samples stored in the part +- `MinDedupInterval` - the [deduplication interval](#deduplication) applied to the given part. Each `part` consists of `blocks` sorted by internal time series id (aka `TSID`). Each `block` contains up to 8K [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples), @@ -1478,9 +1480,8 @@ for fast block lookups, which belong to the given `TSID` and cover the given tim and [freeing up disk space for the deleted time series](#how-to-delete-time-series) are performed during the merge Newly added `parts` either successfully appear in the storage or fail to appear. -The newly added `parts` are being created in a temporary directory under `<-storageDataPath>/data/{small,big}/YYYY_MM/tmp` folder. -When the newly added `part` is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html) -to a temporary directory, then it is atomically moved to the storage directory. +The newly added `part` is atomically registered in the `parts.json` file under the corresponding partition +after it is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html) to the storage. 
Thanks to this algorithm, storage never contains partially created parts, even if hardware power off occurs in the middle of writing the `part` to disk - such incompletely written `parts` are automatically deleted on the next VictoriaMetrics start. @@ -1509,8 +1510,7 @@ Retention is configured with the `-retentionPeriod` command-line flag, which tak Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders. Data partitions outside the configured retention are deleted on the first day of the new month. -Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. -Data parts outside of the configured retention are eventually deleted during +Each partition consists of one or more data parts. Data parts outside of the configured retention are eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282). The maximum disk space usage for a given `-retentionPeriod` is going to be (`-retentionPeriod` + 1) months.