VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-27 02:46:47 +01:00

Author	SHA1	Message	Date
Hui Wang	72941eac36	victorialogs: add more checks for stats query APIs (#7254 ) 1. Verify that the field in [fields pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe) exists. If it does not, a metric with the illegal float value "" is generated for the Prometheus metrics protocol. 2. Check whether multiple time range filters produce a conflicting query time range. For instance, ``` query: _time: 5m \| stats count(), start:2024-10-08T10:00:00.806Z, end: 2024-10-08T12:00:00.806Z, time: 2024-10-10T10:02:59.806Z ``` must give no result due to the invalid final time range. --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-10-16 19:25:43 +02:00
Aliaksandr Valialkin	202eb429a7	lib/logstorage: refactor storage format to be more efficient for querying wide events It turned out that VictoriaLogs is frequently used for collecting logs with tens of fields. For example, a standard Kubernetes setup on top of Filebeat generates more than 20 fields per log. Such logs are also known as "wide events". The previous storage format was optimized for logs with a few fields. When at least a single field was referenced in the query, all the meta-information about all the log fields was unpacked and parsed for each scanned block during the query. This could require a lot of additional disk IO and CPU time when logs contain many fields. Resolve this issue by providing a (field -> metainfo_offset) index for each field in every data block. This index allows reading and extracting only the needed metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename ( columns_header_index.bin ). This allows increasing performance for queries over wide events by 10x and more. Another issue was that the data for bloom filters and field values across all the log fields except _msg was intermixed in two files - fieldBloomFilename ( field_bloom.bin ) and fieldValuesFilename ( field_values.bin ). This could result in huge disk read IO overhead when some small field was referenced in the query, since the Operating System usually reads more data than requested. It reads the data from disk in at least 4KiB blocks (usually the block size is much bigger, in the range 64KiB - 512KiB). So, if a 512-byte bloom filter or values' block is read from the file, then the Operating System reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead.
This overhead isn't visible for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache), but this overhead may become very annoying when performing the query over large volumes of data which isn't present in OS page cache. The solution for this issue is to split bloom filters and field values across multiple shards. This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards, while the disk read IO overhead is completely removed in the best case when the number of columns doesn't exceed N. Currently the number of shards is 8 - see bloomValuesShardsCount . This solution increases performance for queries over large volumes of newly ingested data by up to 1000x. The new storage format is versioned as v1, while the old storage format is versioned as v0. The version is stored in the partHeader.FormatVersion. Parts with the old storage format are converted into parts with the new storage format during background merge. It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .	2024-10-16 17:35:07 +02:00
Andrii Chubatiuk	daa7183749	lib/protoparser/influx: enable batch processing by default (#7165 ) ### Describe Your Changes Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-15 11:48:40 +02:00
Aliaksandr Valialkin	bac193e50b	app/vlselect: do not show empty fields in query results Empty fields are treated as non-existing fields by VictoriaLogs data model. So there is no sense in returning empty fields in query results, since they may mislead and confuse users.	2024-10-14 23:43:58 +02:00
Aliaksandr Valialkin	3c73dbbacc	app/vlstorage: add support for forced merge via /internal/force_merge HTTP endpoint	2024-10-13 22:20:31 +02:00
Aliaksandr Valialkin	b4b79a4961	lib/logstorage: make a copy of s.partitions slice when performing queries over the selected partitions s.partitions can be changed when new partition is registered or when old partition is dropped. This could lead to data races and panics when s.partitions slice is accessed by concurrently executed queries. The fix is to make a copy of the selected partitions under s.partitionsLock before performing the query.	2024-10-13 22:14:34 +02:00
Aliaksandr Valialkin	507b206a7d	lib/logstorage: move getConstColumnValue() and getColumnHeader() methods from columnsHeader to blockSearch This localizes blockSearch.getColumnsHeader() call at block_search.go . This call is going to be optimized in the next commits in order to avoid unmarshaling of header data for unneeded columns, which weren't requested by getConstColumnValue() / getColumnHeader().	2024-10-13 14:29:02 +02:00
Aliaksandr Valialkin	279e25e7c8	lib/logstorage: avoid redundant copying of column names and column values for dictionary-encoded columns during querying Refer the original byte slice with the marshaled columnsHeader for columns names and dictionary-encoded column values. This improves query performance a bit when big number of blocks with big number of columns are scanned during the query.	2024-10-13 13:25:38 +02:00
Aliaksandr Valialkin	9e48074b59	lib/logstorage: avoid calling columnsHeader.initFromBlockHeader() multiple times for the same blockSearch This should improve performance when blockSearch.getColumnsHeader() is called multiple times from different places of the code.	2024-10-13 12:56:12 +02:00
Aliaksandr Valialkin	867f671cc4	lib/logstorage: make sure that bs.br is non-nil before checking br.bs.bsw.bh.rowsCount there br.bs may be nil when br contains the block with additional filters applied during pipe calculations. For example, `* \| count() if (error) errors`.	2024-10-12 20:51:29 +02:00
Andrii Chubatiuk	9eb0c1fd86	lib/protoparser/opentelemetry: added exponential histograms support (#6354 ) ### Describe Your Changes added opentelemetry exponential histograms support. Such histograms are automatically converted into VictoriaMetrics histogram with `vmrange` buckets. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-10-11 13:44:52 +02:00
Aliaksandr Valialkin	7b475ed95d	lib/logstorage: disallow using pipe names as the first unquoted words in `filter` pipe Improperly written pipes could be silently parsed as filter pipe. For example, the following query: * \| by (x) was silently parsed to: * \| filter "by" x It is better to return error, so the user could identify and fix invalid pipe instead of silently executing invalid query with `filter` pipe.	2024-10-09 16:10:13 +02:00
Aliaksandr Valialkin	6acf543b90	lib/logstorage: disallow using by as the first word in log filters, since it frequently clashes with `stats by(...)` pipe where `stats` word is omitted	2024-10-09 15:53:15 +02:00
Zakhar Bessarab	eefae85450	vmagent: add support of HTTP2 client for Kubernetes SD (#7114 ) ### Describe Your Changes Currently, vmagent always uses a separate `http.Client` for every group watcher in Kubernetes SD. With a high number of group watchers this leads to large amount of opened connections. This PR adds 2 changes to address this: - re-use of existing `http.Client` - in case `http.Client` is connecting to the same API server and uses the same parameters it will be re-used between group watchers - HTTP2 support - this allows to reuse connections more efficiently due to ability of using streaming via existing connections. See this issue for the details and test results - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5971 ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-10-08 10:36:31 +02:00
Aliaksandr Valialkin	89686094a0	lib/logstorage: allow special chars in unquoted _stream tag names and values This simplifies writing _stream filters. For example, {foo-bar=abc:de} can be written instead of {"foo-bar"="abc:de"}	2024-10-07 15:10:03 +02:00
Aliaksandr Valialkin	462b7cd597	lib/logstorage: quote logfmt strings only if they contain special chars, which could break logfmt parsing and/or reading	2024-10-07 14:31:30 +02:00
Artem Fetishev	c1cd3e85a7	lib/promscrape: Fix TestClientProxyReadOk flaky test (#7173 ) This PR fixes #7062 For hijacked connections, one has to read from the connection buffer, but still write directly to the connection. Otherwise, when reading directly from such connections, the first byte may be lost. This, in turn corrupts the ClientHello TLS handshake message and when the backend server receives it, it closes the connection and reports the following error in the log: ``` http: TLS handshake error from 127.0.0.1:33150: tls: first record does not look like a TLS handshake ``` The first byte may be lost because underlying HTTP request handler may read it from the connection and put it into the buffer. As the result, subsequent connection reads won't see that byte. - See: https://github.com/golang/go/issues/27408 - The fix is taken from : https://github.com/k3s-io/k3s/pull/6216 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>	2024-10-03 18:27:15 +02:00
Aliaksandr Valialkin	364f084b43	lib/logstorage: add `len` pipe for calculating byte length of log field values	2024-10-03 18:21:10 +02:00
Roman Khavronenko	0d4f4b8f7d	(app\|lib)/vmstorage: do not increment `vm_rows_ignored_total` on NaNs (#7166 ) The `vm_rows_ignored_total` metric signals ingestion issues to users, such as a bad timestamp or a parsing error. In commit `a5424e95b3` this metric started to increment each time vmstorage gets NaN. But NaN is a valid value for the Prometheus data model and for the Prometheus metrics exposition format. Exporters from the Prometheus ecosystem could expose NaNs as values for metrics, and these values will be delivered to vmstorage and increment the metric. Since there is nothing the user can do about this, as opposed to parsing errors or bad timestamps, there is not much sense in incrementing this metric. So this commit rolls back `reason="nan_value"` increments. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-02 12:37:27 +02:00
Aliaksandr Valialkin	a350be48b6	lib/logstorage: do not count dictionary values which have no matching logs in `count_uniq` stats function Create blockResultColumn.forEachDictValue* helper functions for visiting matching dictionary values. These helper functions should prevent from counting dictionary values without matching logs in the future. This is a follow-up for `0c0f013a60` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7152	2024-10-01 13:34:45 +02:00
Aliaksandr Valialkin	630211cfed	app/vlogscli: add interactive command-line tool for querying VictoriaLogs	2024-10-01 12:23:07 +02:00
Zhu Jiekun	7bb8853a5c	feature: [vmagent] Add service discovery support for OVH Cloud VPS and dedicated server (#6160 ) ### Describe Your Changes related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6071 #### Added - Added service discovery support for OVH Cloud: - VPS. - Dedicated server. #### Docs - `CHANGELOG.md`, `sd_configs.md`, `vmagent.md` are updated. #### Note - Useful links: - OVH Cloud VPS API: https://eu.api.ovh.com/console/#/vps~GET - OVH Cloud Dedicated server API: https://eu.api.ovh.com/console/#/dedicated/server~GET - OVH Cloud SDK: https://github.com/ovh/go-ovh - Prometheus SD: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ovhcloud_sd_config Tested on OVH Cloud VPS and dedicated server. <img width="1722" alt="image" src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/d3f0adc8-b0ef-423e-9379-8a9b9b0792ee"> <img width="1724" alt="image" src="https://github.com/VictoriaMetrics/VictoriaMetrics/assets/30280396/18b5b730-3512-4fc0-8b2c-f2450ac550fd"> --- Signed-off-by: Jiekun <jiekun@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-09-30 14:42:46 +02:00
Hui Wang	664f337c70	stream aggregation: fix possible duplicated aggregation results (#7118 ) When ingesting samples with the same labels (duplicated samples or samples with the same labels after `by` or `without` options), they could register different entries for the same labelset in LabelsCompressor. For example, both index 99 and 100 can be assigned to label `foo=1` in two concurrent pushes. Then, due to differing label indexes in encoded keys, the samples will appear as distinct in aggrState, resulting in duplicated results after decompressing the label indexes. `fbde238cdc/lib/streamaggr/streamaggr.go (L933)` In this pull request, since we need to store `idxToLabel` first to ensure the idx can be searched after `lc.labelToIdxStore`, `lc.idxToLabel` could still contain duplicated entries such as [100]="foo=1". But given the low likelihood of this issue and the size of idxToLabel, it should be fine.	2024-09-30 14:24:59 +02:00
Aliaksandr Valialkin	0c0f013a60	lib/logstorage: skip values with zero hits for 'uniq', 'top' and 'field_values' pipes See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/72#issuecomment-2352078483	2024-09-30 14:15:07 +02:00
Artem Fetishev	ed5da38ede	Introduce a flag for limiting the number of time series to delete (#7091 ) ### Describe Your Changes Introduce the `-search.maxDeleteSeries` flag that limits the number of time series that can be deleted with a single `/api/v1/admin/tsdb/delete_series` call. Currently, any number can be deleted and if the number is big (millions) then the operation may result in unaccounted CPU and memory usage spikes which in some cases may result in OOM kill (see #7027). The flag limits the number to 30k by default and the users may override it if needed at the vmstorage start time. --------- Signed-off-by: Artem Fetishev <rtm@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-30 10:02:21 +02:00
Aliaksandr Valialkin	1da4650143	lib/logstorage: allow using `!` in unescaped phrase Previously the phrase filter with `!` was treated unexpectedly. For example, `foo!bar` filter was treated as `foo AND NOT bar`, while most users expect that it matches "foo!bar" phrase. This commit aligns with users' expectations.	2024-09-29 11:14:15 +02:00
Aliaksandr Valialkin	60183c7c79	lib/logstorage: allow using `-` instead of `!` in front of `(...)`	2024-09-29 11:12:22 +02:00
Nikolay	3bbb2aed72	fscore: rollback trailing space trim (#7106 ) Previous commit `201fd6de1e` removed trailing space trim from data read from file. But common practice is to remove such trailing space, and its removal led to authorization errors for a major group of users. In the first place, this change must help to mitigate an issue with Kubernetes, where authorization information was read from Secret content. Changes to the operator were made to mitigate this problem at commit `1cf64358c8` We could later introduce an optional flag for VictoriaMetrics to disable trim space behavior. Related issues: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6986 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7089 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6947 --------- Signed-off-by: f41gh7 <nik@victoriametrics.com> Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-09-29 10:59:25 +02:00
Aliaksandr Valialkin	b52862badf	lib/logstorage: return the expected `hits` results from `uniq` pipe when the number of unique values reaches the specified limit Previously `uniq` pipe could return zero `hits` if the number of found unique values equals the specified limit. This wasn't expected in most cases.	2024-09-29 10:51:09 +02:00
Aliaksandr Valialkin	55eb321f77	lib/logstorage: clear hits slice obtained from encoding.GetUint64s() before updating it with hits for valueTypeDict column encoding.GetUint64s() returns uninitialized slice, which may contain arbitrary values. So values in this slice must be reset to zero before using it for counting hits in `uniq` and `top` pipes.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	94afcbd9a9	lib/logstorage: postpone initialization of per-shard stateSizeBudget until the first call to pipeProcessor.writeBlock() This simplifies pipeProcessor initialization logic a bit. This also doesn't mangle the original maxStateSize value, which is used in error messages when the state size exceeds maxStateSize.	2024-09-29 10:29:13 +02:00
Aliaksandr Valialkin	0b91452ca4	lib/logstorage: add non-empty `if (...)` condition to automatically generated result names in `stats` pipe This allows executing queries with `stats` pipe, which calculate multiple results with the same functions, but with different `if (...)` conditions. For example: _time:5m \| count(), count() if (error) Previously such queries couldn't be executed because the automatically generated name for the second result didn't include `if (error)`, so names for both results were identical - `count(*)`.	2024-09-29 09:51:28 +02:00
Aliaksandr Valialkin	8772aea24b	lib/logstorage: support `order` alias for `sort` pipe Now the following queries are equivalents: _time:5s \| sort by (_time) _time:5s \| order by (_time) This is needed for convenience, since `order by` is commonly used in other query languages such as SQL.	2024-09-29 09:51:27 +02:00
Aliaksandr Valialkin	09b309a82e	lib/logstorage: allow using `-` instead of `!` as a shorthand for `NOT` operator in LogsQL	2024-09-27 13:14:47 +02:00
Aliaksandr Valialkin	76c1b0b8ea	lib/logstorage: support skipping _stream: prefix for stream filters '_stream:{...}' can be written as '{...}' This simplifies writing queries with stream filters, and makes them more familiar to Loki users.	2024-09-27 13:14:46 +02:00
Aliaksandr Valialkin	9367a9a6a2	lib/logstorage: consistently sort stream contexts belonging to different streams by the minimum time seen in the matching logs This should simplify debugging of stream_context output, since it remains stable over repeated requests.	2024-09-27 11:19:26 +02:00
Aliaksandr Valialkin	b49d1ea809	lib/logstorage: add _msg="---" delimiter between different log streams in stream_context output This should help investigating contexts, which belong to different log streams.	2024-09-27 11:01:13 +02:00
Aliaksandr Valialkin	b82bd0c2ec	lib/logstorage: improve performance for stream_context pipe over streams with big number of log entries Do not read timestamps for blocks, which cannot contain surrounding logs. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Also optimize min(_time) and max(_time) calculations a bit by avoiding conversion of timestamp to string when it isn't needed. This should improve performance for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-26 22:22:23 +02:00
Aliaksandr Valialkin	3646724c6f	lib/contextutil: make golangci-lint happy by substituting unused function arg name with _ This is a follow-up for `4b1611267f`	2024-09-26 17:06:48 +02:00
Aliaksandr Valialkin	4b1611267f	lib/logstorage: properly return surrounding logs outside the selected time range by stream_context pipe Previously only logs inside the selected time range could be returned by stream_context pipe. For example, the following query could return up to 10 surrounding logs only for the last 5 minutes, while most users expect this query to return up to 10 surrounding logs without restrictions on the time range. _time:5m panic \| stream_context before 10 This enables the ability to implement stream context feature at VictoriaLogs web UI: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7063 . Reduce memory usage when returning stream context over big log streams with millions of entries. The new logic scans over all the log messages for the selected log stream, while keeping in memory only the given number of surrounding logs. Previously all the logs for the given log stream on the selected time range were loaded in memory before selecting the needed surrounding logs. This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6730 . Improve the scan performance for big log streams by fetching only the requested fields. For example, the following query should be executed much faster than before if logs contain many fields other than _stream, _msg and _time: panic \| stream_context after 30 \| fields _stream, _msg, _time	2024-09-26 17:03:45 +02:00
Aliaksandr Valialkin	037652d5ae	app/vlinsert: support `_time` field without timezone information during data ingestion Use local timezone of the host server in this case. The timezone can be overridden with TZ environment variable if needed. While at it, allow using whitespace instead of T as a delimiter between date and time in the ingested _time field. For example, '2024-09-20 10:20:30' is now accepted during data ingestion. This is valid ISO8601 format, which is used by some log shippers, so it should be supported. This format is also known as SQL datetime format. Also assume local time zone when time without timezone information is passed to querying APIs. Previously such a time was parsed in UTC timezone. Add `Z` to the end of the time string if the old behaviour is preferred. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6721	2024-09-26 12:49:35 +02:00
Aliaksandr Valialkin	255d1d4e13	app/vlselect/logsql: clone the query with the current timestamp when performing live tailing requests in the loop Previously the original timestamp was used in the copied query, so _time:duration filters were applied to the original time range: (timestamp-duration ... timestamp]. This resulted in stopped live tailing, since new logs have timestamps bigger than the original time range. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7028	2024-09-26 08:57:23 +02:00
Aliaksandr Valialkin	e9950f6307	lib/logstorage: add `blocks_count` pipe This pipe is useful for debugging purposes when the number of processed blocks must be calculated for the given query: <query> \| blocks_count This helps detecting the root cause of query performance slowdown in cases like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	65b93b17b1	lib/logstorage: lazily read column headers metadata during queries This improves performance for analytical queries, which do not need column headers metadata. For example, the following query doesn't need column headers metadata, since _stream and min(_time) are stored in block header, which is read separately from colum headers metadata: _time:1w \| stats by (_stream) min(_time) min_time This commit significantly improves the performance for this query. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070	2024-09-25 19:17:48 +02:00
Aliaksandr Valialkin	4599429f51	lib/logstorage: read timestamps column when it is really needed during query execution Previously timestamps column was read unconditionally on every query. This could significantly slow down queries, which do not need reading this column like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7070 .	2024-09-25 19:17:47 +02:00
Aliaksandr Valialkin	7f1ba18719	lib/logstorage: improve the performance of obtaining _stream column value Substitute global streamTagsCache with per-blockSearch cache for ((stream.id) -> (_stream value)) entries. This improves scalability of obtaining _stream values on a machine with many CPU cores, since every CPU has its own blockSearch instance. This also should reduce memory usage when querying logs over big number of streams, since per-blockSearch cache of ((stream.id) -> (_stream value)) entries is limited in size, and its lifetime is bounded by a single query.	2024-09-24 20:57:00 +02:00
Aliaksandr Valialkin	cf2e7d0d92	lib/logstorage/consts.go: document that it isn't recommended setting maxColumnsPerBlock constant to too big values This should help avoiding cases like this one - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6425#issuecomment-2337446083	2024-09-24 18:51:46 +02:00
Aliaksandr Valialkin	f86e093b20	lib/logstorage: improve performance for streamID.marshalString() by more than 2x The streamID.marshalString() is executed in hot path if the query selects _stream_id field. Command to run the benchmark: go test ./lib/logstorage/ -run=NONE -bench=BenchmarkStreamIDMarshalString -benchtime=5s Results before the commit: BenchmarkStreamIDMarshalString-16 438480714 14.04 ns/op 71.23 MB/s 0 B/op 0 allocs/op Results after the commit: BenchmarkStreamIDMarshalString-16 982459660 6.049 ns/op 165.30 MB/s 0 B/op 0 allocs/op	2024-09-24 18:35:04 +02:00
Aliaksandr Valialkin	919d2dc90e	lib/logstorage: add benchmark for streamID.marshalString	2024-09-24 18:31:38 +02:00
hagen1778	8bb3f2fd43	lib/promscrape: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-24 15:12:55 +02:00
hagen1778	c7569dac50	lib/promscrape: temporarily disable TestClientProxyReadOk This test is very flaky and prevents other tests from running in CI. Disabling this test should improve tests quality, since it isn't reliable anyway. There is a ticket to fix this test - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7062 Once fixed, this test should be uncommented. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-09-24 14:59:25 +02:00
Dmytro Kozlov	cbeb7d50e8	lib/promscrape: show only unhealthy targets if `show_only_unhealthy` filter is enabled (#6960 ) ### Describe Your Changes It is better to show only unhealthy targets instead of all of them when `show_only_unhealthy` filter is enabled. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3536 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-09-24 12:18:24 +02:00
Aliaksandr Valialkin	109772bdc4	lib/cgroup: round GOMAXPROCS to the lower integer value of cpuQuota Rounding GOMAXPROCS to the upper integer value of cpuQuota increases chances of CPU starvation, non-optimal goroutine scheduling and additional CPU overhead related to context switching. So it is better to round GOMAXPROCS to the lower integer value of cpuQuota.	2024-09-23 16:09:12 +02:00
Artem Fetishev	55febc0920	lib/storage: restore ability to put empty metric ID list into tagFiltersToMetricIDsCache (#7064 ) ### Describe Your Changes Currently, if the metricID list is empty, it won't be marshalled and as a result won't be put into the tagFiltersToMetricIDsCache, which causes cache misses for the corresponding tagFilters. In some setups this causes severe search speed degradation (see #7009). The empty metric ID list case was covered before but then was accidentally removed in `6c21439`. This PR restores the coverage of this case. A new unit test can be used as a proof that empty metricID lists are not added to the cache (just remove the fix in index_db.go and run the test to see the result) Also a benchmark has been added to see the implications of the compression. ``` user@laptop:~/p/github.com/rtm0/VictoriaMetrics/01/src$ go test ./lib/storage/ -run=NONE -bench BenchmarkMarshalUnmarshalMetricIDs --loggerLevel=ERROR goos: linux goarch: amd64 pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/storage cpu: 13th Gen Intel(R) Core(TM) i7-1355U BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-0-12 3237240 363.5 ns/op 0 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1-12 2831049 451.8 ns/op 0.4706 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10-12 1152764 1009 ns/op 1.667 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100-12 297055 3998 ns/op 5.755 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000-12 31172 34566 ns/op 8.484 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000-12 4900 289659 ns/op 9.416 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-100000-12 447 2341173 ns/op 9.456 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-1000000-12 42 24926928 ns/op 9.468 compression-rate BenchmarkMarshalUnmarshalMetricIDs/numMetricIDs-10000000-12 5 204098872 ns/op 9.467 compression-rate PASS ok 
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage 15.018s ``` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-09-20 17:21:53 +02:00
Aliaksandr Valialkin	787b9cd9a0	lib/storage: improve performance for indexSearch.containsTimeRange() The indexSearch.containsTimeRange() function is called for the current indexDB and the previous indexDB every time when searching for metricIDs by label filters. This function consumes a lot of additional CPU time for cases when queries with lightweight label filters are sent to VictoriaMetrics at high rate (e.g. thousands of RPS), like in the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7009 . Optimize indexSearch.containsTimeRange() function in the following ways: - Unconditionally return true if this function is called for the current indexDB, since there are very high chances that the current indexDB contains the data with timestamps in the requested time range. - Cache the minimum timestamp, which is missing in the indexed data for the previous indexDB. This is safe to do, since the previous indexDB is readonly. This optimization eliminates potentially slow lookup in the previous indexDB for typical use cases when the requested time range is close to the current time.	2024-09-20 13:07:20 +02:00
Aliaksandr Valialkin	6f61e9d49d	lib/storage: simplify indexDB.doExtDB() usage by removing the returned value Previously indexDB.doExtDB() was returning boolean value, which was indicating whether f callback was called. There is no need in returning this boolean value, since the f callback can determine on itself whether it was called. This simplifies the code a bit. While at it, document indexDB.doExtDB().	2024-09-20 11:59:57 +02:00
Roman Khavronenko	218c533874	lib/storage: follow-up after `d8f8822fa5` (#7036 ) Make function name and comments more clear. `d8f8822fa5` Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-09-20 11:50:47 +02:00
Aliaksandr Valialkin	a3d8077959	lib/logstorage: make sure that getCommonTokens returns common tokens in the original order of tokens inside tokenSets arg This fixes flaky test TestGetCommonTokensForOrFilters: filter_or_test.go:143: unexpected tokens for field "_msg"; got ["foo" "bar"]; want ["bar" "foo"]	2024-09-19 15:59:48 +02:00
Roman Khavronenko	e115b85770	lib/logger: increase default value of `-loggerMaxArgLen` cmd-line flag from 1e3 to 5e3 (#7008 ) This should improve visibility on errors produced by very long queries. The change is classified as BUG in order to port it to LTS releases. ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Mathias Palmersheim <mathias@victoriametrics.com>	2024-09-19 14:29:18 +02:00
Nikolay	d8f8822fa5	lib/storage: consistently check for missing metricID index records (#6967 ) * Previously, only missing metricID->metricName index records were tracked with a deadline, but metricID->TSID index records could be missing as well. The IndexDB metrics fix exposed a misleading metric for such missing records. * This commit adds a check for missing metricID->TSID index records and deletes the missing metricID entry once it hits the 60-second deadline. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6931 Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-16 10:05:08 +02:00
Nikolay	264c2ec6bd	lib/fs: properly call windows APIs (#6998 ) Previously we manually imported system Windows DLLs and made direct syscalls. But Go exposes syscall wrappers in the sys/windows package. It seems that the direct syscall was broken in the Go 1.23 release. It was the `GetDiskFreeSpace` syscall in our case. This commit replaces all manual syscalls with wrappers. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6973 Related golang issue: https://github.com/golang/go/issues/69029 Signed-off-by: f41gh7 <nik@victoriametrics.com>	2024-09-13 12:22:25 +02:00
Aliaksandr Valialkin	657988ac3a	app/vlselect: consistently reuse the original query timestamp when executing /select/logsql/query with positive limit=N query arg Previously the query could return incorrect results, since the query timestamp was updated with every Query.Clone() call during iterative search for the time range with up to limit=N rows. While at it, optimize queries that find a low number of matching logs while spending a lot of CPU time on searching across a big number of logs. The optimization reduces the upper bound of the time range to search if the current time range contains zero matching rows. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6785	2024-09-08 14:32:23 +02:00
Aliaksandr Valialkin	45a3713bdb	lib/logstorage: preserve the order of tokens to check against bloom filters in AND filters Previously tokens from AND filters were extracted in random order. This could slow down checking them against bloom filters if the most specific tokens go at the beginning of the AND filters. Preserve the original order of tokens when matching them against bloom filters, so the user could control the performance of the query by putting the most specific AND filters at the beginning of the query. While at it, add tests for getCommonTokensForAndFilters() and getCommonTokensForOrFilters(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 12:27:30 +02:00
Aliaksandr Valialkin	eaee2d7db4	lib/logstorage: improve error logging for incorrect queries passed to /select/logsql/stats_query and /select/logsql/stats_query_range functions	2024-09-08 11:24:44 +02:00
Aliaksandr Valialkin	1cd06ace5a	lib/logstorage: properly extract common tokens from unsupported OR filters Previously the following query could miss rows matching !bar if these rows do not contain foo: foo OR !bar This is because of incorrect detection of common tokens for OR filters - all the unsupported filters were skipped (including the NOT filter (aka `!`)), while in this case zero common tokens must be returned. While at it, move repetitive code in TestFilterAnd and TestFilterOr into f function. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-08 11:14:55 +02:00
Aliaksandr Valialkin	0a40064a6f	app/vlselect: add /select/logsql/stats_query_range endpoint for building time series panels in VictoriaLogs plugin for Grafana Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6943 Updates https://github.com/VictoriaMetrics/victorialogs-datasource/issues/61	2024-09-07 00:41:47 +02:00
Aliaksandr Valialkin	c9bb4ddeed	app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6942 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706	2024-09-06 23:06:43 +02:00
Aliaksandr Valialkin	00e7d5add3	lib/logstorage: substitute `\|` operator with `or` operator at `math` pipe This is needed for avoiding confusion between the `\|` operator at `math` pipe and `\|` pipe delimiter. For example, the following query was parsed unexpectedly: * \| math foo / bar \| fields x as * \| math foo / (bar \| fields) as x Substituting `\|` with `or` inside `math` pipe fixes this ambiguity.	2024-09-06 22:44:14 +02:00
Artem Fetishev	a5424e95b3	lib/storage: adds metrics that count records that failed to insert ### Describe Your Changes Add storage metrics that count records that failed to insert: - `RowsReceivedTotal`: the number of records that have been received by the storage from the clients - `RowsAddedTotal`: the number of records that have actually been persisted. This value must be equal to `RowsReceivedTotal` if all the records have been valid ones. But it will be smaller otherwise. The values of the metrics below should provide insight into why some records haven't been added - `NaNValueRows`: the number of records whose value was `NaN` - `StaleNaNValueRows`: the number of records whose value was `Stale NaN` - `InvalidRawMetricNames`: the number of records whose raw metric name has failed to unmarshal. The following metrics existed before this PR and are listed here for completeness: - `TooSmallTimestampRows`: the number of records whose timestamp is negative or is older than retention period - `TooBigTimestampRows`: the number of records whose timestamp is too far in the future. - `HourlySeriesLimitRowsDropped`: the number of records that have not been added because the hourly series limit has been exceeded. - `DailySeriesLimitRowsDropped`: the number of records that have not been added because the daily series limit has been exceeded. --- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-06 17:57:21 +02:00
Aliaksandr Valialkin	0205170409	lib/logstorage: consistently use nsecsPerDay constant and remove nsecPerDay constant	2024-09-06 16:17:04 +02:00
Aliaksandr Valialkin	258ccfb953	lib/logstorage: pre-calculate hashes from tokens used in bloom filter search Previously per-token hashes for per-block bloom filters were re-calculated on every scanned block. This could be slow when the number of tokens is big or when the number of blocks to scan is big. Pre-calculate hashes for bloom filters and then use them for searching in bloom filters. This improves performance by 2.5x for in(...) filters with many values to search inside `in()`.	2024-09-05 19:44:17 +02:00
Zhu Jiekun	c193e6d43e	lib/discovery/azure: fix host check in next link in Azure SD (#6915 ) Previous bugfix at `49f63b2` only partially fixed pagination host validation error. Before this fix it was: ``` unexpected nextLink host \"management.azure.com\", expecting \"https://management.azure.com\" ``` Now we only check the `Host` without schema. However, when Azure responds with `nextLink` in `Host:Port` format, the `nextLink` check will fail: ``` unexpected nextLink host \"management.azure.com:443\", expecting \"management.azure.com\" ``` This pull request further relaxes the checks by only checking the `Hostname`. --- related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6912	2024-09-05 16:48:09 +02:00
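The relaxed check from `c193e6d43e` can be sketched with the standard `net/url` package, where `URL.Host` may include a port while `URL.Hostname()` strips it. The helper name `sameHostname` and the example path are hypothetical; this is an illustration of the idea, not the code from lib/discovery/azure.

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHostname is a hypothetical helper. It reports whether nextLink points
// at the same host name as apiURL, ignoring the scheme and the port.
func sameHostname(apiURL, nextLink string) (bool, error) {
	a, err := url.Parse(apiURL)
	if err != nil {
		return false, fmt.Errorf("cannot parse API URL %q: %w", apiURL, err)
	}
	n, err := url.Parse(nextLink)
	if err != nil {
		return false, fmt.Errorf("cannot parse nextLink %q: %w", nextLink, err)
	}
	// Hostname() drops the optional ":port" suffix that Host keeps.
	return a.Hostname() == n.Hostname(), nil
}

func main() {
	// The path and query below are made up for the example.
	ok, err := sameHostname("https://management.azure.com", "https://management.azure.com:443/subscriptions?page=2")
	fmt.Println(ok, err)
}
```

Comparing `Host` values directly would treat `management.azure.com:443` and `management.azure.com` as different hosts, which is the failure quoted in the commit message.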
Artem Fetishev	39294b4919	lib/storage: do not drop stale NaN samples (#6936 ) This patch reverts `1fd3385` After discussing it we've come to the conclusion that this is a valid behavior which can be avoided by deleting the time series only once the corresponding stale NaNs have been received. On the other hand, the fix leads to lost stale NaNs in some rare but valid use cases. For example: - In a cluster configuration the samples for a given time series are normally sent to the same vmstorage replica. However, vminsert may reroute the samples to another replica because the original one is down or is overloaded. In this case the stale NaN may end up on a replica that has no data for that time series, but we still want to record that sample. Thus, reverting that fix. --- related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069 Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-05 16:45:09 +02:00
Hui Wang	b48f5f3e59	lib/storage: fix metric `vm_object_references{type="indexdb"}` (#6937 ) follow up `4ecc370acb` ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-09-05 16:42:49 +02:00
Aliaksandr Valialkin	49e57ea80e	lib/logstorage: delete unused function - bloomfilter.containsAny	2024-09-05 16:21:06 +02:00
Aliaksandr Valialkin	2dd845fa53	lib/logstorage: properly fix incorrect extraction of common tokens for `OR` filters at distinct log fields Previously (f1:foo OR f2:bar) was incorrectly returning `foo` token for `f1` and `bar` token for `f2`. These tokens were used for checking against bloom filter for every data block, so the data block, which didn't contain simultaneously `foo` token for `f1` field and `bar` token for `f2` field, was skipped. This was incorrect, since such a block may contain logs matching the original OR filter. The fix is to return common tokens from `OR`-delimited filters only if these tokens exist at EVERY such filter for the given field name. If some `OR`-delimited filter misses the given field name, then `OR`-delimited filters do not contain common tokens, which could be used for checking against bloom filter. While at it, add more tests covering various edge cases for filters delimited by AND and OR. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6554 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6556	2024-09-05 14:29:50 +02:00
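The rule from `2dd845fa53` (a token is common only if it appears in every `OR`-delimited filter for the same field) can be sketched as a per-field intersection. The data model below, where each `OR` branch is a map from field name to tokens, and the function name `commonTokens` are assumptions made for the illustration; the real lib/logstorage code is organized differently.

```go
package main

import "fmt"

// commonTokens is a hypothetical helper. Each element of branches describes
// one OR-delimited filter as field name -> tokens. The result contains, per
// field, only the tokens present in EVERY branch, in the order of the first
// branch. A field missing from any branch yields no common tokens.
func commonTokens(branches []map[string][]string) map[string][]string {
	result := make(map[string][]string)
	if len(branches) == 0 {
		return result
	}
	for field, tokens := range branches[0] {
		common := tokens
		for _, b := range branches[1:] {
			other, ok := b[field]
			if !ok {
				// Some branch does not reference this field at all.
				common = nil
				break
			}
			seen := make(map[string]bool, len(other))
			for _, t := range other {
				seen[t] = true
			}
			var kept []string
			for _, t := range common {
				if seen[t] {
					kept = append(kept, t)
				}
			}
			common = kept
		}
		if len(common) > 0 {
			result[field] = common
		}
	}
	return result
}

func main() {
	// f1:foo OR f2:bar -> no common tokens, so no block may be skipped.
	fmt.Println(commonTokens([]map[string][]string{
		{"f1": {"foo"}},
		{"f2": {"bar"}},
	}))
	// f1:"foo bar" OR f1:"bar baz" -> only "bar" is common for f1.
	fmt.Println(commonTokens([]map[string][]string{
		{"f1": {"foo", "bar"}},
		{"f1": {"bar", "baz"}},
	}))
}
```

An empty result means the bloom filter check cannot be used to skip data blocks for that `OR` filter, which is the safe outcome for queries such as `f1:foo OR f2:bar`.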
f41gh7	7b0aaf1ea2	follow-up after `01430a155c` * properly check SeverityNumber at FormatSeverity function it could be negative, which could cause panic for victorialogs	2024-09-04 15:36:34 +02:00
Andrii Chubatiuk	01430a155c	vlinsert: added opentelemetry logs support Commit adds the following changes: * Adds support of OpenTelemetry logs for Victoria Logs with protobuf encoded messages * json encoding is not supported for the following reasons: - It brings a lot of fragile code, which works inefficiently. - json encoding is impossible to use with language SDK. * splits metrics and logs structures at lib/protoparser/opentelemetry/pb package. * adds docs with examples for opentelemetry logs. --- Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4839 Co-authored-by: AndrewChubatiuk <andrew.chubatiuk@gmail.com> Co-authored-by: f41gh7 <nik@victoriametrics.com>	2024-09-03 20:12:05 +02:00
rtm0	4df243d530	lib/storage: improve the message of the tooManyTimeseries error (#6893 ) ### Describe Your Changes This is a follow-up for #6836. Per @valyala's [comment](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6836#discussion_r1730291704), the error message does not reflect which flag needs to be adjusted. ### Checklist The following checks are mandatory: - [x ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-09-03 10:28:03 +02:00
jackyin	975ed27a76	lib/logstorage: `and` filter results in unexpected response (#6556 ) fix #6554 andfilter shouldn't return orfilter field which result in bloomfilter return false. --------- Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-09-03 10:17:44 +02:00
rtm0	2c856c6951	tests: check Metrics.RowsAddedTotal in unit tests (#6895 ) ### Describe Your Changes This is a follow-up PR: Unit tests introduced in #6872 can now use RowsAddedTotal counter whose scope was fixed in #6841. ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-30 14:31:15 +02:00
Roman Khavronenko	f586082520	attempt to fix flaky TestClientProxyReadOk (#6899 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-30 13:23:32 +02:00
dufucun	95bafc8caf	tests: fix slice init length (#6897 ) ### Describe Your Changes fix slice init length ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: dufucun <dufuchun@sohu.com>	2024-08-30 10:55:25 +02:00
rtm0	334cd92a6c	testing: allow disabling fsync to make tests run faster (#6871 ) ### Describe Your Changes fsync() ensures that the data is written to disk. In production this is needed for data durability. However, during the development, when the unit tests are run, this level of durability is not needed. Therefore fsync() can be disabled which makes test runs two times faster. The disabling is done by setting the `DISABLE_FSYNC_FOR_TESTING` environment variable. The valid values for this variable are the same as the values of the arg of `go doc strconv.ParseBool`: ``` 1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False. ``` Any other value means `false`. The variable is set for all test build targets. Compare running times: Build Target \| DISABLE_FSYNC_FOR_TESTING=0 \| DISABLE_FSYNC_FOR_TESTING=1 ----------------- \| ------------------------------------------------ \| ------------------------------------------------- make test \| 1m5s \| 0m22s make test-race \| 3m1s \| 1m42s make test-pure \| 1m7s \| 0m20s make test-full \| 1m21s \| 0m32s make test-full-386 \| 1m42s \| 0m36s When running tests for a given package, fsync can be disabled as follows: ```shell DISABLE_FSYNC_FOR_TESTING=1 go test ./lib/storage ``` Disabling fsync() is intended for testing purposes only and the name of the variable reflects that. What could also have been done but hasn't been: - lib/filestream/filestream.go: `Writer.MustFlush()` also uses f.Sync() but nothing has been done to it, because the Writer.MustFlush() is not used anywhere in the VM codebase. A side question: what is the general policy for the unused code? - lib/filestream/filestream.go: Writer.Write() calls `adviceDontNeed()` which calls unix.Fdatasync(). Disabling it could potentially improve running time, but running tests with this code disabled has shown otherwise. 
### Checklist The following checks are mandatory: - [ x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-30 10:54:46 +02:00
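The environment variable handling described in `334cd92a6c` can be sketched with `strconv.ParseBool`, treating every unparsable value as `false`. The function name `parseDisableFsync` is hypothetical and the snippet is an illustration of the described behavior, not the code from the commit.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseDisableFsync is a hypothetical helper. It interprets the value of
// DISABLE_FSYNC_FOR_TESTING the way strconv.ParseBool does and falls back
// to false for any other value, including the empty string.
func parseDisableFsync(value string) bool {
	disabled, err := strconv.ParseBool(value)
	if err != nil {
		return false
	}
	return disabled
}

func main() {
	disabled := parseDisableFsync(os.Getenv("DISABLE_FSYNC_FOR_TESTING"))
	fmt.Println("fsync disabled:", disabled)
}
```

Falling back to `false` keeps fsync enabled whenever the variable is unset or misspelled, so durability is only relaxed on an explicit opt-in.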
Nikolay	4ecc370acb	lib/storage: properly add previous indexDB metrics (#6890 ) Previously, some extIndexDB metrics were not registered. It resulted into missing metrics, if metric value was added to the extIndexDB. It's a usual case for search requests at both indexes. Current commit updates all metrics from extIndexDB according to the current IndexDB. It must fix such cases Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868 ### Describe Your Changes Please provide a brief description of the changes you made. Be as specific as possible to help others understand the purpose and impact of your modifications. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).	2024-08-28 11:14:28 +02:00
rtm0	9fcfba3927	lib/storage: properly handle maxMetrics limit at metricID search `TL;DR` This PR improves the metric IDs search in IndexDB: - Avoid searching for metric IDs twice when `maxMetrics` limit is exceeded - Use correct error type for indicating that the `maxMetrics` limit is exceeded - Simplify the logic of deciding between per-day and global index search A unit test has been added to ensure that this refactoring does not break anything. --- Function calls before the fix: ``` idb.searchMetricIDs \|__ is.searchMetricIDs \|__ is.searchMetricIDsInternal \|__ is.updateMetricIDsForTagFilters \|__ is.tryUpdatingMetricIDsForDateRange \| \| \|__ is.getMetricIDsForDateAndFilters ``` - `searchMetricIDsInternal` searches metric IDs for each filter set. It maintains a metric ID set variable which is updated every time the `updateMetricIDsForTagFilters` function is called. After each successful call, the function checks the length of the updated metric ID set and if it is greater than `maxMetrics`, the function returns `too many timeseries` error. - `updateMetricIDsForTagFilters` uses either per-day or global index to search metric IDs for the given filter set. The decision of which index to use is made within the `tryUpdatingMetricIDsForDateRange` function and if it returns `fallback to global search` error then the function uses global index by calling `getMetricIDsForDateAndFilters` with zero date. - `tryUpdatingMetricIDsForDateRange` first checks if the given time range is larger than 40 days and if so returns `fallback to global search` error. Otherwise it proceeds to searching for metric IDs within that time range by calling `getMetricIDsForDateAndFilters` for each date. - `getMetricIDsForDateAndFilters` searches for metric IDs for the given date and returns `fallback to global search` error if the number of found metric IDs is greater than `maxMetrics`. Problems with this solution: 1. 
The `fallback to global search` error returned by `getMetricIDsForDateAndFilters` in case when maxMetrics is exceeded is misleading. 2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search and returns `fallback to global search` error (because `getMetricIDsForDateAndFilters` returns it) then this will trigger global search in `updateMetricIDsForTagFilters`. However the global search uses the same maxMetrics value which means this search is destined to fail too. I.e. the same search is performed twice and fails twice. 3. `too many timeseries` error is already handled in `searchMetricIDsInternal` and therefore handling this error in `updateMetricIDsForTagFilters` is redundant 4. updateMetricIDsForTagFilters is a better place to make a decision on whether to use per-day or global index. Solution: 1. Use a dedicated error for `too many timeseries` case 2. Handle `too many timeseries` error in `searchMetricIDsInternal` only 3. Move the per-day or global search decision from `tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and remove `fallback to global search` error. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-27 21:39:03 +02:00
rtm0	eef6943084	lib/storage: properly register index records with RegisterMetricNames Once the timeseries is in tsidCache, new entries won't be created in per-day index because the RegisterMetricNames() code does not consider different dates for the same timeseries. So this case has been added. The same bug exists for AddRows() but it is not manifested because the index entries are finally created in updatePerDateData(). RegisterMetricNames is also updated to increase the newTimeseriesCreated counter because it actually creates new time series in index. Unit tests have been added that check all possible data patterns (different metric names and dates) and code branches in both RegisterMetricNames and AddRows. The total number of new unit tests is around 100 which increased the running time of storage tests by 50%. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-08-27 21:33:53 +02:00
rtm0	30f98916f9	Move rowsAddedTotal counter to Storage (#6841 ) ### Describe Your Changes Reduced the scope of rowsAddedTotal variable from global to Storage. This metric clearly belongs to a given Storage object as it counts the number of records added by a given Storage instance. Reducing the scope improves the encapsulation and allows resetting this variable during the unit tests (i.e. every time a new Storage object is created by a test, that object gets a new variable). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-27 21:30:37 +02:00
Zhu Jiekun	e97e966f82	lib/promrelabel: follow-up for `8958cecad6` In the previous commit `8958cecad6` the default ports (80/443) were removed for both the `scrapeURL` and `instance` label values for those targets without a port in `__address__`. Different values in the `instance` label generate new time series. This commit reverts the changes made to the `instance` label. Now, for those targets: - `scrapeURL` will remain unchanged. - The `instance` label value will include the default port. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792	2024-08-27 13:04:26 +02:00
Nikolay	9feee15493	lib/promscrape: fixes proxy authorization (#6783 ) * Adds custom dial func for HTTP-Connect and socks5 proxy tunnels. Standard golang http.transport exposes GetProxyConnectHeader function, but it doesn't allow using a separate tls config for proxy. It is also not possible to enforce HTTP-Connect with standard http lib. * For http scrape targets, by default http.Transport.Proxy function must be used, since it has a special case with full uri forward. * Adds proxy.URL json methods that allow properly copying internal fields, like User/Password. It should fix the bug with proxy_url, where credentials specified in the URL were ignored. * Adds tests for scrape client proxy requests related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771	2024-08-19 22:31:18 +02:00
Zhu Jiekun	723d834c1a	lib/promrelabel: stop adding default port 80/443 to address label * It was necessary to add default ports for fasthttp client. After migration to the std.httpclient it's no longer needed. * An additional configuration is required at proxy servers with implicitly set 80/443 ports to the host header (such as HAProxy). It's expected that after upgrade the `__address__` label may change. But it should be a rare case. 80/443 ports are not widely used in the monitoring ecosystem. And it shouldn't have much impact. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792 Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-19 22:28:49 +02:00
hagen1778	febba3971b	make go vet happy Address `non-constant format string in call` check: https://github.com/golang/go/issues/60529 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-19 21:15:33 +02:00
Roman Khavronenko	e58dde6925	lib/httputils: parse URL before creating HTTP transport (#6820 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-16 11:32:04 +02:00
Hui Wang	62d19369a3	stream aggregation: do not allow to enable `-stream.keepInput` and `k… (#6723 ) …eep_metric_names` options in stream aggregation config together With aggregated data and raw data under the same metric, results would be confusing. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-08-13 08:54:35 -04:00
Zhu Jiekun	9e2bd82376	app/vmagent: fixes azure service discovery pagination Azure API response with link to the next page was incorrectly validated. Validation used url.Host header to match the configured API URL. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784	2024-08-09 15:22:47 +02:00
Zakhar Bessarab	cb00b4b00f	lib/backup/s3remote: add retryer configuration (#6747 ) ### Describe Your Changes This helps to improve reliability of performing backups in environments with unreliable connection and tolerate temporary errors at S3 provider side. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6732 Default retry timeout is up to 3 minutes to make this consistent with the same configuration for GCS: `a05317f61f/lib/backup/gcsremote/gcs.go (L70-L76)` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-08-07 16:55:29 +02:00
Roman Khavronenko	f28f496a9d	lib/bytesutil: smooth buffer growth rate (#6761 ) Before, buffer growth was always x2 of its size, which could lead to excessive memory usage when processing large amounts of data. For example, scraping a target with hundreds of MBs in response could result in high memory spikes in vmagent because buffer has to double its size to fit the response. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759 The change smooths out the growth rate, trading higher allocation rate for lower mem usage at certain conditions. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-07 16:49:43 +02:00
hagen1778	1154f90d2d	lib/mergeset: fix typos in comments Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-07 15:54:15 +02:00
Aliaksandr Valialkin	04981c7a7f	lib/streamaggr: remove resetState arg from aggrState.flushState() The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark. This benchmark was testing aggregate state flush performance by keeping the same state across flushes. The benchmark didn't reflect the performance and scalability of stream aggregation in production, while it led to non-trivial code changes related to resetState arg handling. So let's drop the benchmark together with all the code related to resetState handling, in order to simplify the code at lib/streamaggr a bit. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314	2024-08-07 11:39:14 +02:00
Aliaksandr Valialkin	86c7afd126	lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval Previously every aggregation output was using its own timestamp for the output aggregated samples in a single aggregation interval. This could result in unexpected inconsistent timestamps for the output aggregated samples. This commit consistently uses the same timestamp across all the output aggregated samples. This commit makes sure that the duration between subsequent timestamps strictly equals the configured aggregation interval. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314 This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580	2024-08-07 11:39:13 +02:00


2723 Commits