Commit Graph

385 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
b16e19c053 lib/storage/dedup.go: go fmt 2020-04-26 14:37:36 +03:00
Aliaksandr Valialkin
a0000c3a6e lib/storage: improve deduplication algorithm
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.

Previously it may leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
2020-04-26 13:10:18 +03:00
Aliaksandr Valialkin
13b4069c59 lib/storage: postpone label filters matching too many time series instead of giving up with error
This should reduce the frequency of the following errors:

    cannot find tag filter matching less than N time series; either increase -search.maxUniqueTimeseries or use more specific tag filters

    more than N time series found on the time range [...]; either increase -search.maxUniqueTimeseries or shrink the time range
2020-04-24 21:18:52 +03:00
Aliaksandr Valialkin
7c74efd640 lib/promscrape/discovery/gce: make golint happy by ignoring resp.Body.Close() result 2020-04-24 18:13:26 +03:00
Aliaksandr Valialkin
069690e3bd lib/promscrape: initial implementation for gce_sd_configs aga Prometheus-compatible service discovery for Google Compute Engine 2020-04-24 17:53:43 +03:00
Aliaksandr Valialkin
de991551f5 lib/promscrape: query /api/v1/namespaces/* for the configured namespaces in kubernetes_sd_config
This should fix authroization issues described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/432
2020-04-24 14:42:02 +03:00
Aliaksandr Valialkin
387a21c96d lib/promscrape: add -promscrape.configCheckInterval command-line flag for automating config checking
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/431
2020-04-23 23:41:26 +03:00
Aliaksandr Valialkin
83e4c8427e lib/promscrape: access Config entries by reference, so they can be compared by addresses 2020-04-23 14:38:29 +03:00
Aliaksandr Valialkin
e220f3eeb6 lib/promscrape: move KubernetesSDConfig to lib/promscrape/discovery/kubernetes 2020-04-23 11:34:30 +03:00
Aliaksandr Valialkin
1187494c8f lib/promscrape/discovery/kubernetes: hide role switch logic behind GetLabels function 2020-04-22 22:16:18 +03:00
Aliaksandr Valialkin
f9526809e5 app/vmselect: add /api/v1/status/tsdb page with useful stats for locating root cause for high cardinality issues
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin
f3e5722257 lib/writeconcurrencylimiter: improve docs for -maxConcurrentInserts command-line flag 2020-04-20 21:03:09 +03:00
Aliaksandr Valialkin
81481abaa9 lib/promscrape/discovery/kubernetes: reuse a client for empty api_server inside different jobs 2020-04-20 17:07:37 +03:00
Aliaksandr Valialkin
6764efde39 lib/promscrape/discovery/kubernetes: update stale comments 2020-04-17 14:06:26 +03:00
Aliaksandr Valialkin
d86640d609 lib/promscrape: suppress scrape errors if -promscrape.suppressScrapeErrors flag is set 2020-04-16 23:41:52 +03:00
Aliaksandr Valialkin
70104f3fb1 lib/promscrape: print all the labels for the target on error message for failed scrape
This should improve debuggability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/420
2020-04-16 23:35:10 +03:00
Aliaksandr Valialkin
266bbec52d lib/promscrape: retry target scraping when the target closes previously established keep-alive connection to it
This should fix the following error:

the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-16 23:25:34 +03:00
Aliaksandr Valialkin
b2d009c8db lib/logger: typo fix 2020-04-16 00:20:02 +03:00
Aliaksandr Valialkin
d4bc60d63c lib/logger: add WARN level for logging expected errors such as invalid user queries 2020-04-15 20:50:45 +03:00
Aliaksandr Valialkin
a873b553cf app/vmselect: handle timestamp(metric offset X) the same way as Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:05 +03:00
Aliaksandr Valialkin
99f0cb1f5f lib/promscrape: code cleanup in runScraper func 2020-04-15 11:36:35 +03:00
Aliaksandr Valialkin
e9d9638627 lib/storage: skip metricID if the corresponding metricID->metricName is missing in inverted index during search
This case is possible when the corresponding metricID->metricName entry didn't propagate to inverted index yet.

This should fix the following error:

error when searching tsids for tfss [...]: cannot find metricName by metricID 1582417212213420669: EOF
2020-04-15 00:10:11 +03:00
Aliaksandr Valialkin
6ec582acb9 lib/promscrape: show information on improperly configured scrape targets at the bottom of /targets page
This is a common error whith improperly configured target autodiscovery and/or relabeling.
This error leads to duplicate scraping of the same targets with the same set of labels, which leads
to duplicate samples in time series.
2020-04-14 14:55:13 +03:00
Aliaksandr Valialkin
391fb0903e lib/promscrape/discovery/kubernetes: remove only unused client for API server during cleaning 2020-04-14 14:19:26 +03:00
Aliaksandr Valialkin
636e1578de lib/promscrape: add promrelabel.GetLabelValueByName helper function 2020-04-14 14:12:15 +03:00
Aliaksandr Valialkin
3945bf9dec lib/promscrape: mention job name in error messages when target cannot be scraped
This should improve debuggability
2020-04-14 13:33:18 +03:00
Aliaksandr Valialkin
66da177fe9 lib/promscrape: reset ScrapeWork.ID in tests 2020-04-14 13:31:37 +03:00
Aliaksandr Valialkin
88366cad15 lib/promscrape: properly expose statuses for targets with duplicate scrape urls at /targets page
Previously targets with duplicate scrape urls were merged into a single line on the page.
Now each target with duplicate scrape url is displayed on a separate line.
2020-04-14 13:10:06 +03:00
Aliaksandr Valialkin
09f796e2ab lib/promscrape: remove labels starting with __meta_ after applying relabel_configs as Prometheus does
This should reduce CPU load during scraping when target discovery generates
big number of `__meta_*` labels (for instance, k8s discovery).

See https://www.robustperception.io/life-of-a-label for details.
2020-04-14 12:23:30 +03:00
Aliaksandr Valialkin
f58d15f27c lib/promscrape: rename 'scrape_config->scrape_limit' to 'scrape_config->sample_limit'
`scrape_config` block from Prometheus config contains `sample_limit` field,
while in `vmagent` this field was mistakenly named as `scrape_limit`.
2020-04-14 12:00:03 +03:00
Aliaksandr Valialkin
7c4fb038e3 lib/promscrape: add initial support for kubernetes_sd_config
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/334
2020-04-13 21:03:53 +03:00
Aliaksandr Valialkin
4017163393 lib/promscrape: add -promscrape.config.strictParse flag for detecting errors in -promscrape.config file 2020-04-13 13:15:52 +03:00
Aliaksandr Valialkin
7fbfef2aee lib/promscrape: extract common auth code to lib/promauth 2020-04-13 12:59:22 +03:00
Aliaksandr Valialkin
e0c6da8e2a lib/storage: disable deduplication after dedup tests are complete
The rest of tests expect that the de-duplication is disabled.
2020-04-10 17:33:38 +03:00
Aliaksandr Valialkin
8ed0d5471a lib/storage: correctly handle -dedup.minScrapeInterval values smaller than 8ms
Such small values may be used for removing samples with duplicate timestamps.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.
2020-04-10 16:40:41 +03:00
Aliaksandr Valialkin
0b2f678d8e lib/{storage,mergeset}: make sure that requests and misses cache counters never go down 2020-04-10 14:44:52 +03:00
Aliaksandr Valialkin
661cfb03e2 lib/protoparser: add -*TrimTimstamp command-line flags for Influx, Graphite, OpenTSDB and CSV data
These flags can be used for reducing disk space usage for timestamps data ingested over the given protocols
2020-04-10 12:44:46 +03:00
Aliaksandr Valialkin
f0b08dbd9e lib/workingsetcache: accumulate stat counters on cache rotation
This should prevent from cache stats counters going down after cache rotation,
which may corrupt `cache hit ratio` graph on the official Grafan dasbhoards
when using the following query:

    1 - (sum(rate(vm_cache_misses_total[5m])) by (type) / sum(rate(vm_cache_requests_total[5m])) by (type))
2020-04-10 11:51:47 +03:00
Aliaksandr Valialkin
28c65b58a2 lib/memory: add more details to -memory.allowedPercent help message 2020-04-09 15:34:21 +03:00
Aliaksandr Valialkin
84fa146792 lib/httpserver: remove unnecessary http.HandlerFunc wrapper in gzipHandler 2020-04-01 18:14:47 +03:00
Aliaksandr Valialkin
0ad7aaf535 lib/storage: add missing reset for tagFilter.matchesEmptyValue on tagFilter.Init 2020-04-01 17:40:27 +03:00
Aliaksandr Valialkin
c189104be7 lib/promscrape: reduce timestamp jitter when scraping targets
This should improve compression for timestamps
2020-04-01 16:13:01 +03:00
Aliaksandr Valialkin
4c56acbafa lib/storage: remove duplicate data points on 7/8*minScrapeInterval interval instead of 1/2*minScrapeInterval
This should reduce storage usage and should improve deduplication accuracy
2020-04-01 15:47:04 +03:00
Aliaksandr Valialkin
504ea876f2 lib/storage: handle errors returned from TagFilters.Add when cloning TagFilters with negative filter 2020-03-31 16:18:34 +03:00
Aliaksandr Valialkin
ef714e01c1 lib/storage: add fast path for the previous indexdb search if it doesn't contain per-day inverted index yet 2020-03-31 12:35:15 +03:00
Aliaksandr Valialkin
7e755b4bac lib/storage: optimize per-day inverted index search for tag filters matching big number of time series
- Sort tag filters in the ascending number of matching time series
  in order to apply the most specific filters first.
- Fall back to metricName search for filters matching big number of time series
  (usually this are negative filters or regexp filters).
2020-03-31 00:53:29 +03:00
Aliaksandr Valialkin
d450249955 lib/storage: properly handle {label=~"foo|"} filters as Prometheus does
Such filters must match all the time series with `label="foo"` plus all the time series without `label`

Previously only time series with `label="foo"` were matched.
2020-03-30 20:21:47 +03:00
Aliaksandr Valialkin
b47444e69d lib/envflag: add -envflag.prefix for setting optional prefix for environment vars 2020-03-30 15:51:44 +03:00
kreedom
f058efb3d1 [vmalert] config parser (#393)
* [vmalert] config parser

* make linter be happy

* fix test

* fix sprintf add test for rule validation
2020-03-29 01:49:40 +02:00
Aliaksandr Valialkin
42c290ce9f lib/httpserver: add -http.maxGracefulShutdownDuration command-line flag for tuning the maximum duration required for graceful shutdown of http server 2020-03-27 20:10:05 +02:00