VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-24 03:06:48 +01:00

Author	SHA1	Message	Date
Nikolay	0f9536eaf5	lib/storage: properly add previous indexDB metrics (#6890) Previously, some extIndexDB metrics were not registered. This resulted in missing metrics if a metric value was added to the extIndexDB, which is a usual case for search requests that hit both indexes. The current commit updates all metrics from extIndexDB according to the current IndexDB, which should fix such cases. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6868 (cherry picked from commit `4ecc370acb`)	2024-08-28 11:17:23 +02:00
rtm0	4c31a6a1fc	lib/storage: properly handle maxMetrics limit at metricID search `TL;DR` This PR improves the metric IDs search in IndexDB: - Avoid searching for metric IDs twice when the `maxMetrics` limit is exceeded - Use the correct error type for indicating that the `maxMetrics` limit is exceeded - Simplify the logic of deciding between per-day and global index search A unit test has been added to ensure that this refactoring does not break anything. --- Function calls before the fix: ``` idb.searchMetricIDs \|__ is.searchMetricIDs \|__ is.searchMetricIDsInternal \|__ is.updateMetricIDsForTagFilters \|__ is.tryUpdatingMetricIDsForDateRange \| \| \|__ is.getMetricIDsForDateAndFilters ``` - `searchMetricIDsInternal` searches metric IDs for each filter set. It maintains a metric ID set variable which is updated every time the `updateMetricIDsForTagFilters` function is called. After each successful call, the function checks the length of the updated metric ID set and if it is greater than `maxMetrics`, the function returns `too many timeseries` error. - `updateMetricIDsForTagFilters` uses either per-day or global index to search metric IDs for the given filter set. The decision of which index to use is made within the `tryUpdatingMetricIDsForDateRange` function and if it returns `fallback to global search` error then the function uses global index by calling `getMetricIDsForDateAndFilters` with zero date. - `tryUpdatingMetricIDsForDateRange` first checks if the given time range is larger than 40 days and if so returns `fallback to global search` error. Otherwise it proceeds to searching for metric IDs within that time range by calling `getMetricIDsForDateAndFilters` for each date. - `getMetricIDsForDateAndFilters` searches for metric IDs for the given date and returns `fallback to global search` error if the number of found metric IDs is greater than `maxMetrics`. Problems with this solution: 1. 
The `fallback to global search` error returned by `getMetricIDsForDateAndFilters` when `maxMetrics` is exceeded is misleading. 2. If `tryUpdatingMetricIDsForDateRange` proceeds to date range search and returns `fallback to global search` error (because `getMetricIDsForDateAndFilters` returns it) then this will trigger global search in `updateMetricIDsForTagFilters`. However the global search uses the same `maxMetrics` value which means this search is destined to fail too. I.e. the same search is performed twice and fails twice. 3. `too many timeseries` error is already handled in `searchMetricIDsInternal` and therefore handling this error in `updateMetricIDsForTagFilters` is redundant. 4. `updateMetricIDsForTagFilters` is a better place to make a decision on whether to use per-day or global index. Solution: 1. Use a dedicated error for `too many timeseries` case 2. Handle `too many timeseries` error in `searchMetricIDsInternal` only 3. Move the per-day or global search decision from `tryUpdatingMetricIDsForDateRange` to `updateMetricIDsForTagFilters` and remove `fallback to global search` error. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-27 23:08:17 +02:00
rtm0	b51c6bf75d	lib/storage: properly register index records with RegisterMetricNames Once the timeseries is in tsidCache, new entries won't be created in per-day index because the RegisterMetricNames() code does not consider different dates for the same timeseries. So this case has been added. The same bug exists for AddRows() but it is not manifested because the index entries are finally created in updatePerDateData(). RegisterMetricNames is also updated to increase the newTimeseriesCreated counter because it actually creates new time series in index. Unit tests have been added that check all possible data patterns (different metric names and dates) and code branches in both RegisterMetricNames and AddRows. The total number of new unit tests is around 100, which increased the running time of storage tests by 50%. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>	2024-08-27 23:00:27 +02:00
rtm0	332d290b38	Move rowsAddedTotal counter to Storage (#6841) ### Describe Your Changes Reduced the scope of rowsAddedTotal variable from global to Storage. This metric clearly belongs to a given Storage object as it counts the number of records added by a given Storage instance. Reducing the scope improves encapsulation and allows resetting this variable during the unit tests (i.e. every time a new Storage object is created by a test, that object gets a new variable). Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>	2024-08-27 22:34:54 +02:00
Zhu Jiekun	5b49fd83be	lib/promrelabel: follow-up for `8958cecad6` In the previous commit `8958cecad6` the default ports (80/443) were removed for both the `scrapeURL` and `instance` label values for those targets without a port in `__address__`. Different values in the `instance` label generate new time series. This commit reverts the changes made to the `instance` label. Now, for those targets: - `scrapeURL` will remain unchanged. - The `instance` label value will include the default port. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792 (cherry picked from commit `e97e966f82`)	2024-08-27 15:44:07 +02:00
Nikolay	08cbbf8134	lib/promscrape: fixes proxy authorization (#6783) * Adds custom dial func for HTTP-Connect and socks5 proxy tunnels. Standard golang http.transport exposes GetProxyConnectHeader function, but it doesn't allow using a separate tls config for the proxy. It is also not possible to enforce HTTP-Connect with the standard http lib. * For http scrape targets, the http.Transport.Proxy function must be used by default, since it has a special case for forwarding the full URI. * Adds proxy.URL json methods that allow properly copying internal fields, like User/Password. This should fix the bug with proxy_url, where credentials specified in the URL were ignored. * Adds tests for scrape client proxy requests. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6771	2024-08-19 22:50:39 +02:00
Zhu Jiekun	8958cecad6	lib/promrelabel: stop adding default port 80/443 to address label * It was necessary to add default ports for the fasthttp client. After migration to the std.httpclient it's no longer needed. * An additional configuration is required at proxy servers that implicitly set 80/443 ports in the host header (such as HAProxy). It's expected that the `__address__` label may change after the upgrade, but this should be a rare case. 80/443 ports are not widely used in the monitoring ecosystem, so it shouldn't have much impact. Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6792 Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-08-19 22:50:39 +02:00
hagen1778	bd6405df01	make go vet happy Address `non-constant format string in call` check: https://github.com/golang/go/issues/60529 Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `febba3971b`)	2024-08-19 21:41:44 +02:00
Roman Khavronenko	d4240c4a3e	lib/httputils: parse URL before creating HTTP transport (#6820 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-16 11:34:49 +02:00
Hui Wang	e74d5f266e	stream aggregation: do not allow to enable `-stream.keepInput` and `keep_metric_names` options in stream aggregation config together (#6723) With aggregated data and raw data under the same metric, results would be confusing. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `62d19369a3`)	2024-08-13 09:08:27 -04:00
Zhu Jiekun	49f63b2b9a	app/vmagent: fixes azure service discovery pagination The Azure API response with a link to the next page was incorrectly validated. Validation used the url.Host header to match the configured API URL. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6784	2024-08-09 15:26:18 +02:00
Zakhar Bessarab	54315fbad6	lib/backup/s3remote: add retryer configuration (#6747 ) ### Describe Your Changes This helps to improve reliability of performing backups in environments with unreliable connection and tolerate temporary errors at S3 provider side. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6732 Default retry timeout is up to 3 minutes to make this consistent with the same configuration for GCS: `a05317f61f/lib/backup/gcsremote/gcs.go (L70-L76)` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> (cherry picked from commit `cb00b4b00f`)	2024-08-07 16:59:23 +02:00
Roman Khavronenko	c41a9b8d17	lib/bytesutil: smooth buffer growth rate (#6761) Before, buffer growth was always x2 of its size, which could lead to excessive memory usage when processing big amounts of data. For example, scraping a target with hundreds of MBs in response could result in high memory spikes in vmagent because the buffer has to double its size to fit the response. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6759 The change smooths out the growth rate, trading higher allocation rate for lower mem usage at certain conditions. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `f28f496a9d`)	2024-08-07 16:59:23 +02:00
hagen1778	9e186c0319	lib/mergeset: fix typos in comments Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `1154f90d2d`)	2024-08-07 16:59:22 +02:00
Aliaksandr Valialkin	29d526e20a	lib/streamaggr: remove resetState arg from aggrState.flushState() The resetState arg was used only for the BenchmarkAggregatorsFlushInternalSerial benchmark. This benchmark was testing aggregate state flush performance by keeping the same state across flushes. The benchmark didn't reflect the performance and scalability of stream aggregation in production, while it led to non-trivial code changes related to resetState arg handling. So let's drop the benchmark together with all the code related to resetState handling, in order to simplify the code at lib/streamaggr a bit. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314	2024-08-07 11:46:49 +02:00
Aliaksandr Valialkin	1332b6f912	lib/streamaggr: consistently use the same timestamp across all the output aggregated samples in a single aggregation interval Previously every aggregation output was using its own timestamp for the output aggregated samples in a single aggregation interval. This could result in unexpected inconsistent timestamps for the output aggregated samples. This commit consistently uses the same timestamp across all the output aggregated samples. This commit makes sure that the duration between subsequent timestamps strictly equals the configured aggregation interval. Thanks to @AndrewChubatiuk for the original idea at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6314 This commit should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580	2024-08-07 11:46:47 +02:00
Anzor	7e32daa63a	app/vmagent: read __sample_limit__ from labels (#6665 ) (#6666 ) By introducing this feature, users will have the ability to customize the sampleLimit parameter on a per-target basis, providing more flexibility and control over the job execution behavior. (cherry picked from commit `994796367b`)	2024-08-07 09:57:48 +02:00
hagen1778	c99700ae15	fix typos in comments Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `f283126084`)	2024-08-06 16:30:10 +02:00
Zakhar Bessarab	0b1def6e24	app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749 ) ### Describe Your Changes This is useful for clients which validate InfluxDB is available before data ingestion can be started. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6653 ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `9877a5e7d5`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-05 09:45:32 +02:00
Juraj Bubniak	daa8c4970d	lib/backup/s3remote: fix typos (#6694 ) Fixes a few typos in errors in lib/backup/s3remote package. (cherry picked from commit `11c0b05e8a`)	2024-07-29 14:30:21 +02:00
jackyin	f0a87abedd	lib/netutil: validate TLS cert and key files immediately (#6621 ) Validate files specified via `-tlsKeyFile` and `-tlsCertFile` cmd-line flags on the process start-up. Previously, validation happened on the first connection accepted by HTTP server. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6608 --------- Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `e5d279bb71`)	2024-07-29 14:30:20 +02:00
Aliaksandr Valialkin	d2a825279b	Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629)" This reverts commit `e280d90e9a`. Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case when rows contain timestamps belonging to ptws[0]. The performance may be improved in theory for the case when all the rows belong to a partition other than ptws[0], but this partition is automatically moved to ptws[0] by the code at lines `6aad1d43e9/lib/storage/table.go (L287-L298)` , so the next time the typical case will work. Also the updated code makes the code harder to follow, since it introduces an additional level of indirection with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function. This function needs to be inspected and understood when reading the code at table.MustAddRows(). This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding the code even more. The previous code was using a clearer loop over rows with the clear call to partition.HasTimestamp() for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows() function multiple times. This makes the use of partition.HasTimestamp() call more consistent, easier to understand and easier to maintain compared to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition() calls. Also, there is no need to document some hardcore software engineering refactoring at docs/CHANGELOG.md, since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering. The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users. 
See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629	2024-07-25 14:43:00 +02:00
Ruixiang Tan	8e2ff15203	refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629 ) ### Describe Your Changes The original logic is not only highly complex but also poorly readable, so it can be modified to increase readability and reduce time complexity. --------- Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-07-25 13:52:54 +02:00
Aliaksandr Valialkin	9b529c2742	lib/backup/azremote: follow-up for `5fd3aef549` - Mention that credentials can be configured via env variables at both vmbackup and vmrestore docs. - Make clear that the AZURE_STORAGE_DOMAIN env var is optional at https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables - Use string literals as is for env variable names instead of indirecting them via string constants. This makes the code easier to read and understand. These environment variable names aren't going to change in the future, so there is no sense in hiding them under string constants with some other names. - Refer to https://docs.victoriametrics.com/vmbackup/#providing-credentials-via-env-variables in error messages when auth creds are improperly configured. This should simplify figuring out how to fix the error. - Simplify the code a bit at FS.newClient(), so it is easier to follow it now. While at it, remove the check for superfluous environment variables being set, since it is too fragile and it looks like it doesn't help properly configuring vmbackup / vmrestore. - Remove envLookuper indirection - just use 'func(name string) (string, bool)' type inline. This simplifies code reading and understanding. - Split TestFSInit() into TestFSInit_Failure() and TestFSInit_Success(). This simplifies the test code, so it should be easier to maintain in the future. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6518 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984	2024-07-17 17:55:39 +02:00
Aliaksandr Valialkin	97d696ae8b	all: substitute double "the the" with "the" This is a follow-up for `8786a08d27` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6600	2024-07-17 14:29:05 +02:00
Aliaksandr Valialkin	f8aa445945	all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter The %q formatter may result in incorrectly formatted JSON string if the original string contains special chars such as \x1b . They must be encoded as \u001b , otherwise the resulting JSON string cannot be parsed by JSON parsers. This is a follow-up for `c0caa69939` See https://github.com/VictoriaMetrics/victorialogs-datasource/issues/24	2024-07-17 14:01:37 +02:00
Aliaksandr Valialkin	f2362812c3	lib/protoparser/graphite: use Regex.ReplaceAllLiteralString instead of Regex.ReplaceAllString for the case when the replacement cannot contain placeholders for capturing groups This is a follow-up for `74affa3aec`	2024-07-17 13:01:35 +02:00
Aliaksandr Valialkin	f7789b61e7	lib/protoparser/graphite: follow-up for `476faf5578` - Clarify the description of -graphite.sanitizeMetricName command-line flag at README.md - Do not sanitize tag values - only metric names and tag names must be sanitized, since they are treated specially by Grafana. Grafana doesn't apply any restrictions on tag values. - Properly replace more than two consecutive dots with a single dot. - Disallow unicode letters in metric names and tag names, since neither Prometheus nor Grafana supports them. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6489 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077	2024-07-17 12:57:56 +02:00
Aliaksandr Valialkin	54d3abc092	lib: consistently use regexp.Regexp.ReplaceAllLiteralString instead of regexp.Regexp.ReplaceAllString in places where the replacement cannot contain matching group placeholders	2024-07-17 12:57:43 +02:00
rtm0	1b03d7e6de	Fix inconsistent error handling in Storage.AddRows() (#6583) `Storage.AddRows()` returns an error only in one case: when `Storage.updatePerDateData()` fails to unmarshal a `metricNameRaw`. But the same error is treated as a warning when it happens inside `Storage.add()` or is returned by `Storage.prefillNextIndexDB()`. This commit fixes this inconsistency by treating the error returned by `Storage.updatePerDateData()` as a warning as well. As a result `Storage.add()` does not need a return value anymore and neither does `Storage.AddRows()`. Additionally, this commit adds a unit test that checks all cases that result in a row not being added to the storage. --------- Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-07-17 12:55:07 +02:00
Aliaksandr Valialkin	5168d0f754	lib/promrelabel: add test for IfExpression.String() function While at it, simplify this function a bit after the commit `861852f262` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462	2024-07-16 18:32:34 +02:00
Aliaksandr Valialkin	617a7b4db6	lib/promscrape/discovery/yandexcloud: follow-up for `070abe5c71` - Obtain IAM token via GCE-like API instead of Amazon EC2 IMDSv2 API, since it looks like IMDSv2 API isn't supported by Yandex Cloud according to https://yandex.cloud/en/docs/security/standard/authentication#aws-token : > So far, Yandex Cloud does not support version 2, so it is strongly recommended > to technically disable getting a service account token via the Amazon EC2 metadata service. - Try obtaining IAM token via GCE-like API at first and then fall back to the deprecated Amazon EC2 IMDSv1. This should prevent auth errors for instances with disabled GCE-like auth API. This addresses @ITD27M01 concern at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513#issuecomment-1867794884 - Make the description of the change at docs/CHANGELOG.md clearer, add reference to the related issue. P.S. This change wasn't tested in prod because I have no access to Yandex Cloud. It is recommended to test this change by @ITD27M01 and @vmazgo , who filed the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6524	2024-07-16 18:06:33 +02:00
Aliaksandr Valialkin	6d237da3f3	lib/promscrape: follow-up for `1e83598be3` - Clarify that the -promscrape.maxScrapeSize value is used for limiting the maximum scrape size if max_scrape_size option isn't set at https://docs.victoriametrics.com/sd_configs/#scrape_configs - Fix query example for scrape_response_size_bytes metric at https://docs.victoriametrics.com/vmagent/#automatically-generated-metrics - Mention about max_scrape_size option at the -help description for -promscrape.maxScrapeSize command-line flag - Treat zero value for max_scrape_size option as 'no scrape size limit' - Change float64 to int type for scrapeResponseSize struct fields and function args, since response size cannot be fractional - Optimize isAutoMetric() function a bit - Sort auto metrics in alphabetical order in isAutoMetric() and in scrapeWork.addAutoMetrics() functions for better maintainability in the future Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6434 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429	2024-07-16 12:38:41 +02:00
Aliaksandr Valialkin	893a555051	Revert "lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)" This reverts commit `cd1aca217c`. Reason for revert: this commit makes no sense, since the firehose response has application/json content-type, so it must contain JSON-encoded timestamp and requestId fields according to https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat . HTML-escaping the requestId field may break the response, so the client couldn't correctly recognize the html-escaped requestId. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6451	2024-07-16 09:50:16 +02:00
Aliaksandr Valialkin	8b76a40715	lib/httpserver: skip basic auth check for additional request paths, which should call httpserver.CheckAuthFlag() This is a follow-up for `61dce6f2a1` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6338 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329	2024-07-16 01:08:41 +02:00
Aliaksandr Valialkin	0cf18a6f63	lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now This makes unnecessary the checkDeleted variable at lib/storage/index_db.go This is a follow-up for `b984f4672e` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6342	2024-07-16 00:00:46 +02:00
Aliaksandr Valialkin	cbbc6e7141	lib/mergeset: properly update TableMetrics.TooLongItemsDroppedTotal inside Table.UpdateMetrics Substitute '+=' with '=', since tooLongItemsTotal is global counter, which doesn't belong to the Table struct. This is a follow-up for `69d244e6fb` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6297	2024-07-15 23:41:26 +02:00
Aliaksandr Valialkin	476bf400ac	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:05:46 +02:00
Aliaksandr Valialkin	f4dce57ebe	lib/streamaggr/streamaggr.go: typo fix after `5e29ef5ed5`: IgnoredNaNSamples -> ignoredNaNSamples	2024-07-15 21:59:03 +02:00
Aliaksandr Valialkin	cbc637d1dd	app/vmagent/remotewrite: follow-up for `f153f54d11` - Move the remaining code responsible for stream aggregation initialization from remotewrite.go to streamaggr.go . This improves code maintainability a bit. - Properly shut down streamaggr.Aggregators initialized inside remotewrite.CheckStreamAggrConfigs(). This prevents from potential resource leaks. - Use separate functions for initializing and reloading of global stream aggregation and per-remoteWrite.url stream aggregation. This makes the code easier to read and maintain. This also fixes INFO and ERROR logs emitted by these functions. - Add an ability to specify `name` option in every stream aggregation config. This option is used as `name` label in metrics exposed by stream aggregation at /metrics page. This simplifies investigation of the exposed metrics. - Add `path` label additionally to `name`, `url` and `position` labels at metrics exposed by streaming aggregation. This label should simplify investigation of the exposed metrics. - Remove `match` and `group` labels from metrics exposed by streaming aggregation, since they have little practical applicability: it is hard to use these labels in query filters and aggregation functions. - Rename the metric `vm_streamaggr_flushed_samples_total` to less misleading `vm_streamaggr_output_samples_total` . This metric shows the number of samples generated by the corresponding streaming aggregation rule. This metric has been added in the commit `861852f262` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove the metric `vm_streamaggr_stale_samples_total`, since it is unclear how it can be used in practice. This metric has been added in the commit `861852f262` . 
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 - Remove Alias and aggrID fields from streamaggr.Options struct, since these fields aren't related to optional params, which could modify the behaviour of the constructed streaming aggregator. Convert the Alias field to regular argument passed to LoadFromFile() function, since this argument is mandatory. - Pass Options arg to LoadFromFile() function by reference, since this structure is quite big. This also allows passing nil instead of Options when default options are enough. - Add `name`, `path`, `url` and `position` labels to `vm_streamaggr_dedup_state_size_bytes` and `vm_streamaggr_dedup_state_items_count` metrics, so they have consistent set of labels comparing to the rest of streaming aggregation metrics. - Convert aggregator.aggrStates field type from `map[string]aggrState` to `[]aggrOutput`, where `aggrOutput` contains the corresponding `aggrState` plus all the related metrics (currently only `vm_streamaggr_output_samples_total` metric is exposed with the corresponding `output` label per each configured output function). This simplifies and speeds up the code responsible for updating per-output metrics. This is a follow-up for the commit `2eb1bc4f81` . See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6604 - Added missing urls to docs ( https://docs.victoriametrics.com/stream-aggregation/ ) in error messages. These urls help users figuring out why VictoriaMetrics or vmagent generates the corresponding error messages. The urls were removed for unknown reason in the commit `2eb1bc4f81` . - Fix incorrect update for `vm_streamaggr_output_samples_total` metric in flushCtx.appendSeriesWithExtraLabel() function. While at it, reduce memory usage by limiting the maximum number of samples per flush to 10K. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5467 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6268	2024-07-15 20:25:36 +02:00
Aliaksandr Valialkin	a8356f3a26	vendor: update github.com/VictoriaMetrics/metrics from v1.34.1 to v1.35.0 Fix potential memory leaks across VictoriaMetrics codebase after metrics.UnregisterSet(s) call because of missing s.UnregisterAllMetrics() call. This is a follow-up for `6a6e34ab8e` . It is OK if some vmauth metrics aren't visible for a few microseconds when the previous metrics are unregistered and new metrics weren't registered yet. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6252 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5805	2024-07-15 10:45:39 +02:00
Aliaksandr Valialkin	4878152678	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so there is no sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:11:23 +02:00
Aliaksandr Valialkin	c3d2351948	lib/streamaggr: consistently use alphabetical order of benchmarked stream aggregation outputs	2024-07-15 09:53:26 +02:00
Aliaksandr Valialkin	21f049e211	lib/streamaggr: follow-up for `9c3d44c8c9` - Consistently enumerate stream aggregation outputs in alphabetical order across the source code and docs. This should simplify future maintenance of the corresponding code and docs. - Fix the link to `rate_sum()` at `see also` section of `rate_avg()` docs. - Make the docs for `rate_sum()` and `rate_avg()` outputs clearer. - Encapsulate output metric suffix inside rateAggrState. This eliminates possible bugs related to incorrect suffix passing to newRateAggrState(). - Rename rateAggrState.total field to less misleading rateAggrState.increase name, since it calculates counter increase in the current aggregation window. - Set rateLastValueState.prevTimestamp on the first sample in time series instead of the second sample. This makes the code logic clearer. - Move the code for removing outdated entries at rateAggrState into removeOldEntries() function. This makes the code logic inside rateAggrState.flushState() clearer. - Do not write output sample with zero value if there are no input series, which could be used for calculating the rate, e.g. if only a single sample is registered for every input series. - Do not take into account input series with a single registered sample when calculating rate_avg(), since this leads to incorrect results. - Move {rate,total}AggrState.flushState() function to the end of rate.go and total.go files, so they look more similar. This should simplify future maintenance. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6243	2024-07-15 08:44:48 +02:00
Aliaksandr Valialkin	bc1f92d7f5	app/vmagent/remotewrite: follow-up for `87fd400dfc` - Drop samples and return true from remotewrite.TryPush() at fast path when all the remote storage systems are configured with the disabled on-disk queue, every in-memory queue is full and -remoteWrite.dropSamplesOnOverload is set to true. This case is quite common, so it should be optimized. Previously additional CPU time was spent on per-remoteWriteCtx relabeling and other processing in this case. - Properly count the number of dropped samples inside remoteWriteCtx.pushInternalTrackDropped(). Previously dropped samples were counted only if -remoteWrite.dropSamplesOnOverload flag is set. In reality, the samples are dropped when they couldn't be sent to the queue because in-memory queue is full and on-disk queue is disabled. The remoteWriteCtx.pushInternalTrackDropped() function is called by streaming aggregation for pushing the aggregated data to the remote storage. Streaming aggregation cannot wait until the remote storage processes pending data, so it drops aggregated samples in this case. - Clarify the description for -remoteWrite.disableOnDiskQueue command-line flag at -help output, so it is clear that this flag can be set individually per each -remoteWrite.url. - Make the -remoteWrite.dropSamplesOnOverload flag global. If some of the remote storage systems are configured with the disabled on-disk queue, then there is no sense in keeping samples on some of these systems, while dropping samples on the remaining systems, since this will result in global stall on the remote storage system with the disabled on-disk queue and with the -remoteWrite.dropSamplesOnOverload=false flag. vmagent will always return false from remotewrite.TryPush() in this case. This will result in infinite duplicate samples written to the remaining remote storage systems. That's why the -remoteWrite.dropSamplesOnOverload is forcibly set to true if more than one -remoteWrite.disableOnDiskQueue flag is set. 
This allows proceeding with newly scraped / pushed samples by sending them to the remaining remote storage systems, while dropping them on overloaded systems with the -remoteWrite.disableOnDiskQueue flag set. - Verify that the remoteWriteCtx.TryPush() returns true in the TestRemoteWriteContext_TryPush_ImmutableTimeseries test. - Mention in vmagent docs that the -remoteWrite.disableOnDiskQueue command-line flag can be set individually per each -remoteWrite.url. See https://docs.victoriametrics.com/vmagent/#disabling-on-disk-persistence Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6248 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065	2024-07-13 02:30:10 +02:00
Aliaksandr Valialkin	43fc1183b9	app/vmalert: switch from table-driven tests to f-tests This makes test code clearer and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows logging multiple errors for the same test, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:45:50 +02:00
hagen1778	c835a6351e	lib/streamaggr: add missing test cases Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `2f65956259`)	2024-07-12 14:19:17 +02:00
Hui Wang	f3cbd62823	vmagent: fix `vm_streamaggr_flushed_samples_total` counter (#6604) We use `vm_streamaggr_flushed_samples_total` to show the number of samples produced by an aggregation rule. Previously it was overcounted and didn't account for `output_relabel_configs`. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6462 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `2eb1bc4f81`)	2024-07-12 14:19:17 +02:00
hagen1778	7370f84b97	lib/backup/azremote: follow-up after `5fd3aef549` Simplify tests by converting them to f-tests. `5fd3aef549` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `03e4c5c19c`)	2024-07-10 15:17:06 +02:00
justinrush	e65e55e2dd	lib/backup: add support for Azure Managed Identity (#6518) ### Describe Your Changes These changes support using Azure Managed Identity for the `vmbackup` utility. It adds two new environment variables: * `AZURE_USE_DEFAULT_CREDENTIAL`: Instructs the `vmbackup` utility to build a connection using the [Azure Default Credential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.5.2#NewDefaultAzureCredential) mode. This causes the Azure SDK to check for a variety of environment variables to try and make a connection. By default, it tries to use managed identity if that is set up. This will close https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5984 ### Checklist The following checks are mandatory: - [x] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). ### Testing However you normally test the `vmbackup` utility using Azure Blob should continue to work without any changes. The setup for that is environment-specific and not listed here. Once regression testing has been done, you can set up [Azure Managed Identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) so your resource (AKS, VM, etc.) can use that credential method. Once it is set up, update your environment variables according to the updated documentation. I added unit tests to the `FS.Init` function, then made my changes, then updated the unit tests to capture the new branches. I tested this in our environment, both with SAS token auth and managed identity, and it works as expected. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Justin Rush <jarush@epic.com> Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `5fd3aef549`)	2024-07-10 12:26:21 +02:00

2647 Commits