From 4af05065d1e66732b02ea125258267bf866bf298 Mon Sep 17 00:00:00 2001
From: Nikolay
Date: Thu, 26 Jan 2023 17:05:20 +0100
Subject: [PATCH] lib/storage: properly release parts inMerge lock (#3711)

if storage doesn't have enough disk space, finalDedupWatcher holds inMerge lock for all parts and never release it until storage restart
---
 docs/CHANGELOG.md        | 1 +
 lib/storage/partition.go | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index a55b722800..f2343159a0 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -22,6 +22,7 @@ The following tip changes can be tested by building VictoriaMetrics components f
 * FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): reduce memory usage when sending stale markers for targets, which expose big number of metrics. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675) issues.
 * FEATURE: add `-internStringMaxLen` command-line flag, which can be used for fine-tuning RAM vs CPU usage in certain workloads. For example, if the stored time series contain long labels, then it may be useful reducing the `-internStringMaxLen` in order to reduce memory usage at the cost of increased CPU usage. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692).
 
+* BUGFIX: release `inMerge` parts lock properly at `finalDedupWatcher`. It prevented partition merges until storage restart if storage didn't have enough disk space for final deduplication and down-sampling.
 * BUGFIX: [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): propagate all the timeout-related errors from `vmstorage` to `vmselect` when `vmstorage`. Previously some timeout errors weren't returned from `vmselect` to `vmstorage`. Instead, `vmstorage` could log the error and close the connection to `vmselect`, so `vmselect` was logging cryptic errors such as `cannot execute funcName="..." on vmstorage "...": EOF`.
 * BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): add support for time zone selection for older versions of browsers. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3680).
 * BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): update API version for [ec2_sd_configs](https://docs.victoriametrics.com/sd_configs.html#ec2_sd_configs) to fix [the issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3700) with missing `__meta_ec2_availability_zone_id` attribute.
diff --git a/lib/storage/partition.go b/lib/storage/partition.go
index 1e0a7532b6..adb03daa8b 100644
--- a/lib/storage/partition.go
+++ b/lib/storage/partition.go
@@ -952,6 +952,7 @@ func (pt *partition) ForceMergeAllParts() error {
 		if newPartSize > maxOutBytes {
 			freeSpaceNeededBytes := newPartSize - maxOutBytes
 			forceMergeLogger.Warnf("cannot initiate force merge for the partition %s; additional space needed: %d bytes", pt.name, freeSpaceNeededBytes)
+			pt.releasePartsToMerge(pws)
 			return nil
 		}
 
@@ -963,6 +964,7 @@ func (pt *partition) ForceMergeAllParts() error {
 		}
 		pws = pt.getAllPartsForMerge()
 		if len(pws) <= 1 {
+			pt.releasePartsToMerge(pws)
 			return nil
 		}
 	}