docs: add Storage section to docs (#2155)

* docs: add Storage section to docs

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2151

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* docs: `make docs-sync`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This commit is contained in:
Roman Khavronenko 2022-02-08 15:59:17 +02:00 committed by Aliaksandr Valialkin
parent eed66b6640
commit 763cb68a67
No known key found for this signature in database
GPG Key ID: A72BEC6CD3D0DED1
2 changed files with 72 additions and 4 deletions

View File

@ -1140,15 +1140,49 @@ write data to the same VictoriaMetrics instance. These vmagent or Prometheus ins
`external_labels` section in their configs, so they write data to the same time series.
## Storage
VictoriaMetrics stores time series data in [MergeTree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like
data structures. On insert, VictoriaMetrics accumulates up to 1s of data and dumps it on disk to
`<-storageDataPath>/data/small/YYYY_MM/` subdirectory forming a `part` with the following
name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. Each part consists of two "columns":
values and timestamps. These are sorted and compressed raw time series values. Additionally, part contains
index files for searching for specific series in the values and timestamps files.
`Parts` are periodically merged into the bigger parts. The resulting `part` is constructed
under `<-storageDataPath>/data/{small,big}/YYYY_MM/tmp` subdirectory. When the resulting `part` is complete, it is atomically moved from the `tmp`
to its own subdirectory, while the source parts are atomically removed. The end result is that the source
parts are substituted by a single resulting bigger `part` in the `<-storageDataPath>/data/{small,big}/YYYY_MM/` directory.
Information about merging process is available in [single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176) Grafana dashboards.
See more details in [monitoring docs](#monitoring).
The `merge` process is usually named "compaction", because the resulting `part` size is usually smaller than
the sum of the source `parts`. There are following benefits of doing the merge process:
* it improves query performance, since lower number of `parts` are inspected with each query;
* it reduces the number of data files, since each `part`contains fixed number of files;
* better compression rate for the resulting part.
Newly added `parts` either appear in the storage or fail to appear.
Storage never contains partially created parts. The same applies to merge process — `parts` are either fully
merged into a new `part` or fail to merge. There are no partially merged `parts` in MergeTree.
`Part` contents in MergeTree never change. Parts are immutable. They may be only deleted after the merge
to a bigger `part` or when the `part` contents goes outside the configured `-retentionPeriod`.
See [this article](https://valyala.medium.com/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) for more details.
See also [how to work with snapshots](#how-to-work-with-snapshots).
## Retention
Retention is configured with `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
that the data will be stored for 3 months and then deleted.
Data is split in per-month partitions inside `<-storageDataPath>/data/small` and `<-storageDataPath>/data/big` folders.
Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders.
Data partitions outside the configured retention are deleted on the first day of new month.
Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`.
Data parts outside of the configured retention are eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
Data parts outside of the configured retention are eventually deleted during
[background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
In order to keep data according to `-retentionPeriod` max disk space usage is going to be `-retentionPeriod` + 1 month.
For example if `-retentionPeriod` is set to 1, data for January is deleted on March 1st.

View File

@ -1144,15 +1144,49 @@ write data to the same VictoriaMetrics instance. These vmagent or Prometheus ins
`external_labels` section in their configs, so they write data to the same time series.
## Storage
VictoriaMetrics stores time series data in [MergeTree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like
data structures. On insert, VictoriaMetrics accumulates up to 1s of data and dumps it on disk to
`<-storageDataPath>/data/small/YYYY_MM/` subdirectory forming a `part` with the following
name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. Each part consists of two "columns":
values and timestamps. These are sorted and compressed raw time series values. Additionally, part contains
index files for searching for specific series in the values and timestamps files.
`Parts` are periodically merged into the bigger parts. The resulting `part` is constructed
under `<-storageDataPath>/data/{small,big}/YYYY_MM/tmp` subdirectory. When the resulting `part` is complete, it is atomically moved from the `tmp`
to its own subdirectory, while the source parts are atomically removed. The end result is that the source
parts are substituted by a single resulting bigger `part` in the `<-storageDataPath>/data/{small,big}/YYYY_MM/` directory.
Information about merging process is available in [single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176) Grafana dashboards.
See more details in [monitoring docs](#monitoring).
The `merge` process is usually named "compaction", because the resulting `part` size is usually smaller than
the sum of the source `parts`. There are following benefits of doing the merge process:
* it improves query performance, since lower number of `parts` are inspected with each query;
* it reduces the number of data files, since each `part`contains fixed number of files;
* better compression rate for the resulting part.
Newly added `parts` either appear in the storage or fail to appear.
Storage never contains partially created parts. The same applies to merge process — `parts` are either fully
merged into a new `part` or fail to merge. There are no partially merged `parts` in MergeTree.
`Part` contents in MergeTree never change. Parts are immutable. They may be only deleted after the merge
to a bigger `part` or when the `part` contents goes outside the configured `-retentionPeriod`.
See [this article](https://valyala.medium.com/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) for more details.
See also [how to work with snapshots](#how-to-work-with-snapshots).
## Retention
Retention is configured with `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
that the data will be stored for 3 months and then deleted.
Data is split in per-month partitions inside `<-storageDataPath>/data/small` and `<-storageDataPath>/data/big` folders.
Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders.
Data partitions outside the configured retention are deleted on the first day of new month.
Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`.
Data parts outside of the configured retention are eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
Data parts outside of the configured retention are eventually deleted during
[background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
In order to keep data according to `-retentionPeriod` max disk space usage is going to be `-retentionPeriod` + 1 month.
For example if `-retentionPeriod` is set to 1, data for January is deleted on March 1st.