VictoriaMetrics/app
Aliaksandr Valialkin bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool for each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for a big file part. For the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  within the configured -inmemoryDataFlushInterval.

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradation for queries, since every query needs to check an ever-growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers for every part type elegantly resolves all three issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent a possible "sync: WaitGroup misuse" panic on graceful shutdown.
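
The per-part-type worker pools described in the first item can be illustrated with a minimal Go sketch. This is not the code from lib/mergeset or lib/storage: the names partType, mergePools, run and demo are hypothetical, and the pool sizes are arbitrary. It only shows the idea that each part type gets its own concurrency limit, so a long-running big merge cannot occupy the slot needed by an in-memory merge.

```go
package main

import (
	"fmt"
	"sync"
)

// partType is a hypothetical label for the part types named in the commit message.
type partType int

const (
	inmemoryPart partType = iota
	smallPart
	bigPart
)

// mergePools keeps one concurrency-limiting channel per part type
// (an assumption for illustration, not the real data structure).
type mergePools struct {
	limits map[partType]chan struct{}
	wg     sync.WaitGroup
}

// newMergePools creates a separate pool with the given number of workers for each part type.
func newMergePools(workers map[partType]int) *mergePools {
	p := &mergePools{limits: make(map[partType]chan struct{})}
	for pt, n := range workers {
		p.limits[pt] = make(chan struct{}, n)
	}
	return p
}

// run starts merge in a goroutine. It blocks only when the pool for pt is full;
// busy pools for other part types do not delay it.
func (p *mergePools) run(pt partType, merge func()) {
	ch := p.limits[pt]
	ch <- struct{}{} // acquire a worker slot in the pool for this part type
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer func() { <-ch }() // release the worker slot
		merge()
	}()
}

// wait blocks until all the started merges are finished.
func (p *mergePools) wait() { p.wg.Wait() }

// demo occupies the only big-part worker and shows that an in-memory merge still completes.
func demo() string {
	pools := newMergePools(map[partType]int{inmemoryPart: 1, smallPart: 1, bigPart: 1})

	release := make(chan struct{})
	pools.run(bigPart, func() { <-release }) // simulates a long-running big merge

	done := make(chan struct{})
	pools.run(inmemoryPart, func() { close(done) })
	<-done // the in-memory merge finishes while the big merge is still running

	close(release)
	pools.wait()
	return "in-memory merge finished while big merge was running"
}

func main() {
	fmt.Println(demo())
}
```

With a single shared pool of size one, the in-memory merge in demo would have to wait for the big merge to release its slot, which is the behavior the commit moves away from.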

This is a follow-up for fa566c68a6.
Thanks to @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
..
victoria-logs lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() 2024-01-15 13:50:36 +02:00
victoria-metrics app/vmselect: properly calculate start param for queries with too big look-behind window (#5630) 2024-01-17 13:48:06 +01:00
vlinsert Add _stream fields log (#5068) 2023-11-17 15:58:52 +01:00
vlselect vmui: reduced the number of server requests (#5253) 2023-11-14 01:50:00 +01:00
vlstorage lib/logstorage: follow-up for 8a23d08c21 2023-10-02 16:52:23 +02:00
vmagent all: add up to 10% random jitter to the interval between periodic tasks performed by various components 2024-01-22 18:40:32 +02:00
vmalert app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680) 2024-01-25 15:42:57 +01:00
vmalert-tool vmalert-tool: fix alert_rule_test case when eval_time is not multiple of evaluation_interval (#5387) 2023-12-01 12:17:24 +01:00
vmauth app/vmauth: return 503 service unavailable status code when the backend returns response with unsupported status code, but the request cannot be re-tried. 2024-01-26 20:43:11 +01:00
vmbackup lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() 2024-01-15 13:50:36 +02:00
vmbackupmanager docs: convert png images to webp in all the docs except of docs/operator/* 2023-11-22 19:21:00 +02:00
vmctl app/vmctl/backoff: fix flaky test 2024-01-22 12:21:14 +01:00
vmgateway docs: convert png images to webp in all the docs except of docs/operator/* 2023-11-22 19:21:00 +02:00
vminsert all: allow dynamically reading *AuthKey flag values from files and urls 2024-01-21 22:03:38 +02:00
vmrestore lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() 2024-01-15 13:50:36 +02:00
vmselect app/vmselect/promql: do not spend CPU time on verifying whether the rollup cache needs to be reset for the given metric rows when it has been already instructed to reset 2024-01-26 21:13:38 +01:00
vmstorage lib/{mergeset,storage}: make background merge more responsive and scalable 2024-01-26 22:27:47 +01:00
vmui vmui: query report (#5497) 2024-01-23 04:23:26 +02:00