Commit Graph

21 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
bc1f92d7f5
app/vmagent/remotewrite: follow-up for 87fd400dfc
- Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage
  systems are configured with the disabled on-disk queue, every in-memory queue is full
  and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common,
  so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx
  relabeling and other processing in this case.

- Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped().
  Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set.
  In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full
  and on-disk queue is disabled.
  The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing
  the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage
  processes pending data, so it drops aggregated samples in this case.

- Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output,
  so it is clear that this flag can be set individually per each -remoteWrite.url.

- Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems
  are configured with the disabled on-disk queue, then there is no sense in keeping samples
  on some of these systems, while dropping samples on the remaining systems, since this
  will result in global stall on the remote storage system with the disabled on-disk queue
  and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false
  from remotewrite.TryPush() in this case. This will result in infinite duplicate samples
  written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload
  is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set.
  This allows proceeding with newly scraped / pushed samples by sending them to the remaining
  remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set.

- Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test.

- Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url.
  See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065
2024-07-13 02:30:10 +02:00
Slava Bobik
a7266785ce
Fixed a typo in the FastQueue mutex comment (#6514)
### Describe Your Changes

Fixed a small typo in a comment about the mutex inside the FastQueue
struct

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit d236604d39)
2024-06-20 14:00:08 +02:00
Roman Khavronenko
0bed453737
Feature allow configuring disableOnDiskQueue and dropSamplesOnOverload per url (#6248)
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
(cherry picked from commit 87fd400dfc)
2024-05-10 14:32:23 +02:00
Aliaksandr Valialkin
2f14394335
app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:13:39 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Aliaksandr Valialkin
fd7efad69f
lib/persistentqueue: typo fix after aea6df8197 2023-03-27 20:05:51 -07:00
Aliaksandr Valialkin
6f5bbf096a
app/vmagent/remotewrite: cosmetic updates after f3a51e8b1d
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
  This is less error-prone solution, since paths to the same directory can differ, which could lead
  to accidental directory removal for the existing -remoteWrite.url

- Log the `removed %d dangling queues` message when at least a single queue has been removed

- Consistently use filepath.Join() for creating paths to persistent queues.
  This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )

- Clarify the description of the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-03-27 18:38:53 -07:00
Zakhar Bessarab
6ed6eb0c4c
app/vmagent: add -remoteWrite.removeDanglingQueues flag (#4017)
* app/vmagent: add `-remoteWrite.removeDanglingQueues` flag which allows to automatically remove dangling persistent queue contents

Related issue: #4014

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmagent: address review feedback

- remove persistent queues files by default
- rename `remoteWrite.removeDanglingQueues` to `remoteWrite.keepDanglingQueues`
- update docs to reflect changed behaviour

Related issue: #4014

* Apply suggestions from code review

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 18:38:51 -07:00
Aliaksandr Valialkin
1a88fe5b1f
lib/flagutil/bytes.go: properly handle values bigger than 2GiB on 32-bit architectures
This fixes handling of values bigger than 2GiB for the following command-line flags:

- -storage.minFreeDiskSpaceBytes
- -remoteWrite.maxDiskUsagePerURL
2022-12-14 19:29:57 -08:00
Aliaksandr Valialkin
c1f81f08d4 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:13:15 +03:00
Aliaksandr Valialkin
f14412321b lib/persistentqueue: eliminate possible data race when obtaining vm_persistentqueue_bytes_pending metric value 2021-04-27 00:26:32 +03:00
Aliaksandr Valialkin
4d6a3aba4d lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030
2021-04-05 19:26:51 +03:00
Aliaksandr Valialkin
72eef964d9 app/vmagent: properly perform graceful shutdown, which was broken in the commit 1d1ba889fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065
2021-02-19 00:34:17 +02:00
Aliaksandr Valialkin
adb4d3b75c lib/persistentqueue: flush data to disk every second
Previously small amounts of data may be left unflushed for extended periods of time if vmagent collects small amounts of data.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/687
2020-09-18 13:06:03 +03:00
Aliaksandr Valialkin
9705ac5d7a lib/persistentqueue: code simplification after d455764a6f 2020-09-16 21:14:01 +03:00
Aliaksandr Valialkin
eee6f1e56d lib/persistentqueue: make the persistent queue more durable against unclean shutdown (kill -9, OOM, hard reset)
The strategy is:

- Periodical flushing of inmemory blocks to files, so they aren't lost on unclean shutdown.
- Periodical syncing of metadata for persisted queues, so the metadata remains in sync with the persisted data.
- Automatic adjusting of too big chunk size when opening the queue. The chunk size may be bigger than the writer offset after unclean shutdown.
- Skipping of broken chunk file if it cannot be read.
- Fsyncing finalized chunk files.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/687
2020-09-16 18:13:24 +03:00
Aliaksandr Valialkin
dc16cdd1ca lib/persistentqueue: a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/484 2020-05-16 09:32:30 +03:00
肖贝贝
c154a92d29 fix: fix vmagent multi queue may become one because sync bug (#484)
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-05-16 09:32:29 +03:00
Aliaksandr Valialkin
f01d1bf4a8 app/vmagent: add -remoteWrite.maxDiskUsagePerURL for limiting the maximum disk usage for each -remoteWrite.url buffer
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/352
2020-03-03 19:49:20 +02:00
Aliaksandr Valialkin
c2e602286c lib/persistentqueue: reset chunk file when the persistent queue is empty 2020-02-28 20:06:59 +02:00
Aliaksandr Valialkin
7ee7614e90 app/vmagent: initial implementation for vmagent 2020-02-23 17:31:54 +02:00