diff --git a/README.md b/README.md
index d0ccbdb0f..2651dbcc4 100644
--- a/README.md
+++ b/README.md
@@ -1414,7 +1414,12 @@ For example, `/api/v1/import?extra_label=foo=bar` would add `"foo":"bar"` label
 
 Note that it could be required to flush response cache after importing historical data. See [these docs](#backfilling) for detail.
 
-VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line in memory, then parses it and then saves the parsed samples into persistent storage. This means that VictoriaMetrics can occupy big amounts of RAM when importing too long JSON lines. The solution is to split too long JSON lines into smaller lines. It is OK if samples for a single time series are split among multiple JSON lines.
+VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line into memory, parses it and then saves the parsed samples into persistent storage.
+This means that VictoriaMetrics can occupy large amounts of RAM when importing overly long JSON lines.
+The solution is to split overly long JSON lines into shorter lines. It is OK if samples for a single time series are split among multiple JSON lines.
+JSON line length can be limited via the `max_rows_per_line` query arg when exporting via [/api/v1/export](#how-to-export-data-in-json-line-format).
+
+The maximum length of a JSON line VictoriaMetrics can parse is limited by the `-import.maxLineLen` command-line flag.
 
 ### How to import data in native format
 
@@ -1591,15 +1596,18 @@ The format follows [JSON streaming concept](http://ndjson.org/), e.g. each line
 ```
 
 Note that every JSON object must be written in a single line, e.g. all the newline chars must be removed from it.
-Every line length is limited by the value passed to `-import.maxLineLen` command-line flag (by default this is 10MB).
+The [/api/v1/import](#how-to-import-data-in-json-line-format) handler doesn't accept JSON lines longer than the value
+passed to the `-import.maxLineLen` command-line flag (by default this is 10MB). It is recommended to pass 1K-10K samples per line to achieve the maximum data ingestion performance at [/api/v1/import](#how-to-import-data-in-json-line-format). Overly long JSON lines may increase RAM usage on the VictoriaMetrics side.
+The [/api/v1/export](#how-to-export-data-in-json-line-format) handler accepts the `max_rows_per_line` query arg, which limits the number of samples per exported line.
+
 It is OK to split [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) for the same [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series) across multiple lines.
-The number of lines in JSON line document can be arbitrary.
+The number of lines in the request to [/api/v1/import](#how-to-import-data-in-json-line-format) can be arbitrary, since lines are imported in a streaming manner.
 
 ## Relabeling
 
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 8e2fd6c3b..2246e305e 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -29,6 +29,7 @@ The sandbox cluster installation is running under the constant load generated by
 ## tip
 
 * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add support for reading and writing samples via [Google PubSub](https://cloud.google.com/pubsub). See [these docs](https://docs.victoriametrics.com/vmagent.html#google-pubsub-integration).
+* FEATURE: reduce the default value for the `-import.maxLineLen` command-line flag from 100MB to 10MB in order to prevent excessive memory usage during data import via [/api/v1/import](https://docs.victoriametrics.com/#how-to-import-data-in-json-line-format).
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): prevent from `FATAL: cannot flush metainfo` panic when [`-remoteWrite.multitenantURL`](https://docs.victoriametrics.com/vmagent.html#multitenancy) command-line flag is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5357).
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly decode zstd-encoded data blocks received via [VictoriaMetrics remote_write protocol](https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol). See [this issue comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301#issuecomment-1815871992).
@@ -38,7 +39,6 @@ The sandbox cluster installation is running under the constant load generated by
 
 Released at 2023-11-16
 
 * FEATURE: dashboards: use `version` instead of `short_version` in version change annotation for single/cluster dashboards. The update should reflect version changes even if different flavours of the same release were applied (custom builds).
-* FEATURE: lower limit for `import.maxLineLen` cmd-line flag from 100MB to 10MB in order to prevent excessive memory usage during data import. Please note, the line length of exported data can be limited with `max_rows_per_line` query arg passed to `/api/v1/export`. The change affects vminsert/vmagent/VictoriaMetrics single-node.
 * BUGFIX: fix a bug, which could result in improper results and/or to `cannot merge series: duplicate series found` error during [range query](https://docs.victoriametrics.com/keyConcepts.html#range-query) execution. The issue has been introduced in [v1.95.0](https://docs.victoriametrics.com/CHANGELOG.html#v1950). See [this bugreport](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332) for details.
 * BUGFIX: improve deadline detection when using buffered connection for communication between cluster components. Before, due to nature of a buffered connection the deadline could have been exceeded while reading or writing buffered data to connection. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5327).
diff --git a/docs/README.md b/docs/README.md
index 443ba3452..0f977ee12 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1417,7 +1417,12 @@ For example, `/api/v1/import?extra_label=foo=bar` would add `"foo":"bar"` label
 
 Note that it could be required to flush response cache after importing historical data. See [these docs](#backfilling) for detail.
 
-VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line in memory, then parses it and then saves the parsed samples into persistent storage. This means that VictoriaMetrics can occupy big amounts of RAM when importing too long JSON lines. The solution is to split too long JSON lines into smaller lines. It is OK if samples for a single time series are split among multiple JSON lines.
+VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line into memory, parses it and then saves the parsed samples into persistent storage.
+This means that VictoriaMetrics can occupy large amounts of RAM when importing overly long JSON lines.
+The solution is to split overly long JSON lines into shorter lines. It is OK if samples for a single time series are split among multiple JSON lines.
+JSON line length can be limited via the `max_rows_per_line` query arg when exporting via [/api/v1/export](#how-to-export-data-in-json-line-format).
+
+The maximum length of a JSON line VictoriaMetrics can parse is limited by the `-import.maxLineLen` command-line flag.
 
 ### How to import data in native format
 
@@ -1594,15 +1599,18 @@ The format follows [JSON streaming concept](http://ndjson.org/), e.g. each line
 ```
 
 Note that every JSON object must be written in a single line, e.g. all the newline chars must be removed from it.
-Every line length is limited by the value passed to `-import.maxLineLen` command-line flag (by default this is 10MB).
+The [/api/v1/import](#how-to-import-data-in-json-line-format) handler doesn't accept JSON lines longer than the value
+passed to the `-import.maxLineLen` command-line flag (by default this is 10MB). It is recommended to pass 1K-10K samples per line to achieve the maximum data ingestion performance at [/api/v1/import](#how-to-import-data-in-json-line-format). Overly long JSON lines may increase RAM usage on the VictoriaMetrics side.
+The [/api/v1/export](#how-to-export-data-in-json-line-format) handler accepts the `max_rows_per_line` query arg, which limits the number of samples per exported line.
+
 It is OK to split [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) for the same [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series) across multiple lines.
-The number of lines in JSON line document can be arbitrary.
+The number of lines in the request to [/api/v1/import](#how-to-import-data-in-json-line-format) can be arbitrary, since lines are imported in a streaming manner.
 
 ## Relabeling
 
diff --git a/docs/Single-server-VictoriaMetrics.md b/docs/Single-server-VictoriaMetrics.md
index 1a234a94e..31745d442 100644
--- a/docs/Single-server-VictoriaMetrics.md
+++ b/docs/Single-server-VictoriaMetrics.md
@@ -1425,7 +1425,12 @@ For example, `/api/v1/import?extra_label=foo=bar` would add `"foo":"bar"` label
 
 Note that it could be required to flush response cache after importing historical data. See [these docs](#backfilling) for detail.
 
-VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line in memory, then parses it and then saves the parsed samples into persistent storage. This means that VictoriaMetrics can occupy big amounts of RAM when importing too long JSON lines. The solution is to split too long JSON lines into smaller lines. It is OK if samples for a single time series are split among multiple JSON lines.
+VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line into memory, parses it and then saves the parsed samples into persistent storage.
+This means that VictoriaMetrics can occupy large amounts of RAM when importing overly long JSON lines.
+The solution is to split overly long JSON lines into shorter lines. It is OK if samples for a single time series are split among multiple JSON lines.
+JSON line length can be limited via the `max_rows_per_line` query arg when exporting via [/api/v1/export](#how-to-export-data-in-json-line-format).
+
+The maximum length of a JSON line VictoriaMetrics can parse is limited by the `-import.maxLineLen` command-line flag.
 
 ### How to import data in native format
 
@@ -1602,15 +1607,18 @@ The format follows [JSON streaming concept](http://ndjson.org/), e.g. each line
 ```
 
 Note that every JSON object must be written in a single line, e.g. all the newline chars must be removed from it.
-Every line length is limited by the value passed to `-import.maxLineLen` command-line flag (by default this is 10MB).
+The [/api/v1/import](#how-to-import-data-in-json-line-format) handler doesn't accept JSON lines longer than the value
+passed to the `-import.maxLineLen` command-line flag (by default this is 10MB). It is recommended to pass 1K-10K samples per line to achieve the maximum data ingestion performance at [/api/v1/import](#how-to-import-data-in-json-line-format). Overly long JSON lines may increase RAM usage on the VictoriaMetrics side.
+The [/api/v1/export](#how-to-export-data-in-json-line-format) handler accepts the `max_rows_per_line` query arg, which limits the number of samples per exported line.
+
 It is OK to split [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) for the same [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series) across multiple lines.
-The number of lines in JSON line document can be arbitrary.
+The number of lines in the request to [/api/v1/import](#how-to-import-data-in-json-line-format) can be arbitrary, since lines are imported in a streaming manner.
 
 ## Relabeling
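The splitting advice in the changed paragraphs (break an overly long JSON line into shorter lines, with samples of one time series allowed to span multiple lines) can be sketched as below. This is an illustrative client-side helper, not part of VictoriaMetrics; it only assumes the documented JSON line layout with `metric`, `values` and `timestamps` fields:

```python
import json


def split_json_line(line, max_samples_per_line):
    """Split one exported JSON line into lines carrying at most
    max_samples_per_line samples each. The metric labels are repeated
    on every output line, so the lines can be imported independently."""
    obj = json.loads(line)
    values, timestamps = obj["values"], obj["timestamps"]
    out = []
    for i in range(0, len(values), max_samples_per_line):
        out.append(json.dumps({
            "metric": obj["metric"],
            "values": values[i:i + max_samples_per_line],
            "timestamps": timestamps[i:i + max_samples_per_line],
        }))
    return out


# Example: a single exported line with 5 samples becomes 3 shorter lines.
line = json.dumps({
    "metric": {"__name__": "up", "job": "node_exporter"},
    "values": [0, 0, 0, 1, 1],
    "timestamps": [1549891472010, 1549891487724, 1549891503438,
                   1549891519110, 1549891535102],
})
for shorter in split_json_line(line, 2):
    print(shorter)
```

The resulting lines can be sent to `/api/v1/import` as-is, since samples for the same time series may be split across multiple lines.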