* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
(cherry picked from commit 87fd400dfc)
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 8a03e987cb)
Though labels compressor is quite resource intensive, each aggregator
and deduplicator instance has it's own compressor. Made it shared across
all aggregators to consume less resources while using multiple
aggregators.
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
(cherry picked from commit a9283e06a3)
### Describe Your Changes
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6041
#### Added
- Added service discovery support for Vultr.
#### Docs
- `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated.
#### Note
- Useful links:
- Vultr API:
https://www.vultr.com/api/#tag/instances/operation/list-instances
- Vultr client SDK: https://github.com/vultr/govultr
- Prometheus SD:
https://github.com/prometheus/prometheus/tree/main/discovery/vultr
---
### Checklist
The following checks are mandatory:
- [X] I have read the [Contributing
Guidelines](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md)
- [x] All commits are signed and include `Signed-off-by` line. Use `git
commit -s` to include `Signed-off-by` your commits. See this
[doc](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about
how to sign your commits.
- [x] Tests are passing locally. Use `make test` to run all tests
locally.
- [x] Linting is passing locally. Use `make check-all` to run all
linters locally.
Further checks are optional for External Contributions:
- [X] Include a link to the GitHub issue in the commit message, if issue
exists.
- [x] Mention the change in the
[Changelog](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
Explain what has changed and why. If there is a related issue or
documentation change - link them as well.
Tips for writing a good changelog message::
* Write a human-readable changelog message that describes the problem
and solution.
* Include a link to the issue or pull request in your changelog message.
* Use specific language identifying the fix, such as an error message,
metric name, or flag name.
* Provide a link to the relevant documentation for any new features you
add or modify.
- [ ] After your pull request is merged, please add a message to the
issue with instructions for how to test the fix or try the feature you
added. Here is an
[example](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048#issuecomment-1546453726)
- [x] Do not close the original issue before the change is released.
Please note, in some cases Github can automatically close the issue once
PR is merged. Re-open the issue in such case.
- [x] If the change somehow affects public interfaces (a new flag was
added or updated, or some behavior has changed) - add the corresponding
change to documentation.
Signed-off-by: Jiekun <jiekun.dev@gmail.com>
(cherry picked from commit 17e3d019d2)
When at least one remote write has deduplication configured it cleans up
timeseries while they can be in use by another remote write without
deduplication
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6205
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 879771808b)
The panel will show how many ongoing select queries are processed by vmstorage
and should help to identify resource bottlenecks. See panel description for more details.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d386a68b59)
Allowing to select multiple instance IPs makes it much easier to view
metrics for longer periods of time in dynamic environments such as
Kubernetes. In k8s update will also cause IP to change making it harder
to use dashboard to check the status.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5869
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6b493582da)
The new flag type is supposed to be used for specifying URL values which
could contain sensitive information such as auth tokens in GET params or
HTTP basic authentication.
The URL flag also allows loading its value from files if `file://`
prefix is specified. As example, the new flag type was used in
app/vmbackup as it requires specifying `authKey` param for making the
snapshot.
See related issue
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973
Thanks to @wasim-nihal for initial implementation
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6060
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 029060af60)
Starting from v1.99.0 vmalert could ignore anchors pointing to specific
rule groups if `search` param was present in URL.
This change makes anchors compatible with `search` param in UI.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 5f487c7090)
Using `sh` or `console` formatting doesn't do word-breaking on render. This makes flags description
harder to read, as users need to scroll the web page horizontally.
Removing the formatting renders the description with normal word-breaking.
(cherry picked from commit 9bedbcfa2f)
Stream aggregation may yield inaccurate results if it processes incomplete data.
This issue can arise when data is sourced from clients that maintain a queue of unsent data, such as Prometheus or vmagent.
If the queue isn't fully cleared within the aggregation interval, only a portion of the time series may be included in that period, leading to distorted calculations.
To mitigate this we add an option to ignore first N aggregation intervals. It is expected, that client queues
will be cleared during the time while aggregation ignores first N intervals and all subsequent aggregations
will be correct.
(cherry picked from commit c0e4ccb7b5)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
It is better from visibility PoV if the CONTRIBUTING.md file is visible at https://docs.victoriametrics.com/contributing/ .
This page can be indexed by search engines and searched by our users later.
It is also easier to provide a link to this page now comparing to the old link, which is much harder to remember:
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/CONTRIBUTING.md
While at it, syncrhonize docs/CONTRIBUTING.md with .github/PULL_REQUEST_TEMPLATE/pull_request_template.md ,
so they do not contain duplicate information, which can be outdated over time. Now all the relevant information
is located at docs/CONTRIBUTING.md, while .github/PULL_REQUEST_TEMPLATE/pull_request_template.md refers this doc.
This is a follow-up for c006db1798
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6040
This speeds up auto-suggestion for metric names in VMUI and Grafana, which use the following query in this case:
/api/v1/label/__name__/values?match[]={__name__=~"*.some_value.*"}
When the user types `some_value` in the query input field.
- Use exact matching by default for the query arg value provided via arg=value syntax at src_query_args.
Regex matching can be enabled by using =~ instead of = . For example, arg=~regex.
This ensures that the exact matching works as expected without the need to escape special regex chars.
- Add helper functions for creating QueryArg, Header and Regex structs in tests.
This improves maintainability of the tests.
- Remove url.QueryUnescape() call on the url in TestCreateTargetURLSuccess(), since this is bogus approach.
The url.QueryUnescape() must be applied to individual query args, and it mustn't be applied to the whole url,
since in this case it may perform invalid unescaping in the context of the url, or make the resulting url invalid.
While at it, properly marshal all the fields inside UserInfo config to yaml in tests.
Previously Header and QueryArg structs were improperly marshaled because the custom MarshalYAML
is called only on pointers to Header and QueryArg structs. This improves test coverage.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6070
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6115
If one out of 5 vmstorage nodes is unavailable, then the remaining 4 vmstorage nodes will recieve 1/5=20% increase of workload, not 5%.
If one out of 10 vmstorage nodes is unavailable, then the remaining 9 vmstorage nodes will receive 1/10=10% increase of workload, not 1%.
This is a follow-up for 458338afa5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6099
* app/vmauth: do not increment backend_errors when hitting concurrency limit
Previously, both "vmauth_concurrent_requests_limit_reached_total" and "vmauth_user_request_backend_errors_total" were incremented.
This was based on the assumption that if concurrency limit is hit the backend must be failing to handle the request thus meaning an error.
This assumption does not work in case the endpoint can be overloaded by the misbehaving client sending too many requests within the timeframe.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* docs/cluster: update cluster resizing info
- add example of resources distribution
- add info about how to handle uneven disk usage after adding a new storage node
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit 458338afa5)
* any status code from the range 200-299 from alertmanager to vmalert is not considered an error from now on (#6110)
* add changelog
(cherry picked from commit 7308bad777)
- Automatically reload changed TLS root CA pointed by -remoteWrite.tlsCAFile command-line flag
- Automatically reload changed TLS root CA configured via oauth2.tsl_config.ca_file option at -promscrape.config
- Document the change as a feature instead of a bug at docs/CHANGELOG.md
- Simplify the code at lib/promauth, which is responsible for reloading changed TLS root CA files.
- Simplify the usage of lib/promauth.Config.NewRoundTripper() - now it accepts the base http.Transport
instead of a callback, which can change the internal http.Transport.
- Reuse the default tls config if lib/promauth.Config doesn't contain tls-specific configs.
This should reduce memory usage a bit when tls isn't used for scraping big number of targets.
- Do not re-read TLS root CA files on every processed request. Re-read them once per second.
This should reduce CPU usage when scraping big number of targets over https.
- Do not store cert.pem and key.pem files in TestTLSConfigWithCertificatesFilesUpdate, since they can be loaded
from byte slices via crypto/tls.X509KeyPair().
- Remove obsolete comparisons of string representations for authConfig and proxyAuthConfig at areEqualScrapeConfigs().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5725
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171
* app/vmgateway: add an ability to log invalid auth tokens
This is useful for debugging to make it easier for user to find issues in token contents.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6029
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add info about new vmgateway flag
- add changelog entry
- add info about logInvalidAuthTokens flag
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmgateway/filters/auth: improve reject reason visibility
Explicitly return a rejection reason for request when "logInvalidAuthTokens" is enabled.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* lib/{promauth,promscrape}: automatically refresh root CA certificates after changes on disk
Added a custom `http.RoundTripper` implementation which checks for root CA content changes and updates `tls.Config` used by `http.RoundTripper` after detecting CA change.
Client certificate changes are not tracked by this implementation since `tls.Config` already supports passing certificate dynamically by overriding `tls.Config.GetClientCertificate`.
This change implements dynamic reload of root CA only for streaming client used for scraping. Blocking client (`fasthttp.HostClient`) does not support using custom transport so can't use this implementation.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5526
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: update NewRoundTripper API
Update API to allow user to update only parameters required for transport.
Add warning log when reloading Root CA failed.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: fix mutex acquire logic
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: replace RWMutex with regular mutex to simplify the code
- remove additional mutex used for getRootCABytes - require callee to use mutex
- replace RWMutex with regular mutex
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/promauth/config: refactor
- hold the mutex lock to avoid round tripper being re-created twice
- move recreation logic into separate func to simplify the code
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
* rm recommendation to keep look-behind window empty, as it is not correct
* mention the change of default value for `-search.latencyOffset`
Signed-off-by: hagen1778 <roman@victoriametrics.com>
- Rename -opentelemetry.sanitizeMetrics command-line flag to more clear -opentelemetry.usePrometheusNaming
- Clarify the description of the change at docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to more clear promrelabel.SplitMetricNameToTokens
- Properly split metric names at '_' char in promerlabel.SplitMetricNameToTokens.
- Add tests for various edge cases for Prometheus metric names' normalization
according to the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric names' normalization into a separate file (santize.go)
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
Remove description for -search.maxExportDuration and -search.maxStatusRequestDuration command-line flags
from the 'Resource usage limits' chapter, since these flags are rarely used for limiting resource usage
and they are already documented in the 'List of command-line flags' chapter.
- Fix docs for new functions at app/vmselect/graphite/functions.json
- Properly drain series lists on errors in aggregateSeriesListsGeneric() and aggregateSeriesList()
- Add links to docs for the added functions at docs/CHANGELOG.md
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5809
- Allow specifying only a single HTTP header for reading auth tokens via -httpAuthHeader command-line flag.
This is better from security PoV, since this prevents from accidental reading of auth token from undesired
HTTP header. By default the -httpAuthHeader equals to Authorization. When it is overridden, then
auth token isn't read from Authorization header - it is read only from the specified header.
- Document the -httpAuthHeader command-line flag at https://docs.victoriametrics.com/vmauth/#reading-auth-tokens-from-other-http-headers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6009
* docs/vmanomaly: v1.12.0 & link updates
* add autotuned description to model section
* - update refs of vmanomaly on enterprise and vmalert pages
- add diagrams for model types
- update self-monitoring section
* - fix typos
- remove .index.html from links
This reverts commit cb23685681.
Reason for revert: the "fix" may hide programming bugs related to incorrect creation of folders
before their use. This may complicate detecting and fixing such bugs in the future.
There are the following fixes for the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985 :
- To configure the OS to do not drop data from the system-wide temporary directory (aka /tmp).
- To run VictoriaMetrics with -cacheDataPath command-line flag, which points to the directory,
which cannot be removed automatically by the OS.
The case when the user accidentally deletes the directory with some files created by VictoriaMetrics
shouldn't be considered as expected, so VictoriaMetrics shouldn't try resolving this case automatically.
It is much better from operation and debuggability PoV is to crash with the clear `directory doesn't exist` error
in this case.
The remotewrite.Stop() expects that there are no pending calls to TryPush().
This means that the ingestionRateLimiter.Register() must be unblocked inside TryPush() when calling remotewrite.Stop().
Provide remotewrite.StopIngestionRateLimiter() function for unblocking the rate limiter before calling the remotewrite.Stop().
While at it, move the rate limiter into lib/ratelimiter package, since it has two users.
Also move the description of the feature to the correct place at docs/CHANGELOG.md.
Also cross-reference -remoteWrite.rateLimit and -maxIngestionRate command-line flags.
This is a follow-up for 02bccd1eb9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5900
Custom HTTP headers are set via net/http.Header.Set or net/http.Header.Add functions.
These functions always convert header keys to canonical form. So the change at b577413d3b
isn't visible to users of VictoriaMetrics components.
There is no need in documenting this change at docs/CHANGELOG.md, since it doesn't give any useful information to users.
This is a follow-up for e6dd52b04c
* lib/storage: add ability to use downsampling for the given series filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add information about downsampling filters
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: fix MetricsQL filter
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: treat missing downsampling filter as a bug
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/part_header: verify correctness of downsampling filters when opening partition
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: save only appliable rules in part metadata
Filter and save only rules which are appliable to partition based on MinTimestamp of stored data.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage/downsampling: update log messages for final dedup
Properly specify a reason of re-running deduplication for partition.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* lib/storage: consistently use MaxTimestamp to determine deduplication/downsampling rules
Using MinTimestamp leads to applying downsampling to parts which are only partially covered by downsampling rule.
For example, partition covers range [1000-2000]. At t=2100 and rule offset 500 data with t=2100-500 => 1600 must be downsampled. The range check against MinTimestamp evaluates to true even though partition contains range which must not be downsampled - [1600:2000].
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* Follow-up
- Apply the first matching downsampling period if multiple filters match the given time series.
This allows fine-tuning the downsampling config for the specific needs.
- Take into account downsampling filters during search queries.
- Reduce the difference between community and enterprise branches. This should simplify further maintenance of these branches.
- Properly parse series filters with colons inside them.
- Document the feature at docs/CHANGELOG.md.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4960
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
- Make the configuration more clear by accepting the list of ignored labels during sharding
via a dedicated command-line flag - -remoteWrite.shardByURL.ignoreLabels.
This prevents from overloading the meaning of -remoteWrite.shardByURL.labels command-line flag.
- Removed superfluous memory allocation per each processed sample if sharding by remote storage is enabled.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5938
* vmalert: fix sending alert messages
1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered;
2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile.
* docs: update changelog notes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* lib/storage/table: properly wait for force merges to be completed during shutdown
Properly keep track of running background merges and wait for merges completion when closing the table.
Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: add changelog entry
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
During shutdown period of vmalert, remotewrite client retrieve all pending time series from buffer queue, compose them into 1 batch and execute remote write.
This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others).
This changes ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write
destination.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025
(cherry picked from commit 623d257faf)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
vmselect uses a cache folder in file system for two purposes:
1. Storing rollup cache results on shutdown;
2. Storing temporary search results from vmstorage during query executions.
It could happen that cache folder is deleted accidentally by user, or by OS
during cleanup routines. This would cause vmselect to:
1. panic on /metrics call, because `MustGetFreeSpace` will fail;
2. return query error user, as it won't be able to store temporary search results.
The changes in this commit are the following:
1. Make `MustGetFreeSpace` to try re-creating the cache folder if it is missing;
2. Make vmselect to try re-creating the cache folder if it can't persist tmp search
results.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5985
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
(cherry picked from commit cb23685681)
* [vmagent] added ingestion rate limiting with new flag `-maxIngestionRate`. This flag can be used to limit the number of samples ingested by vmagent per second. If the limit is exceeded, the ingestion rate will be throttled.
* fix changelog
* fix review comment
(cherry picked from commit 02bccd1eb9)