Aliaksandr Valialkin
424068f804
lib/promscrape: handle connection reset when a target responds with an http redirect
2020-04-28 02:14:26 +03:00
肖贝贝
7d045bf2ca
fix: vmagent not following 301/302 redirects (#445)
...
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-04-28 02:14:25 +03:00
Aliaksandr Valialkin
2aecf7c37c
lib/{encoding,decimal}: typo fixes in tests: epxecting->expecting
2020-04-28 00:02:19 +03:00
Aliaksandr Valialkin
806dc73d8a
lib/encoding: reduce possibility of failure in TestMarshalInt64ArraySize
2020-04-28 00:02:18 +03:00
Aliaksandr Valialkin
a603a15757
lib/promscrape/discovery/gce: make golangci-lint happy
2020-04-27 19:29:42 +03:00
Aliaksandr Valialkin
86a1d9cb0c
lib/promscrape: add initial support for Prometheus-compatible service discovery for Amazon EC2 aka ec2_sd_configs
2020-04-27 19:29:22 +03:00
Aliaksandr Valialkin
1acb6eb25a
lib/promscrape/discovery/gce: properly set `filter` query arg in api url
2020-04-27 16:01:53 +03:00
Aliaksandr Valialkin
0daa37fa02
lib/promscrape/discovery/gce: allow empty project and zone for gce_sd_config
2020-04-27 11:45:45 +03:00
Aliaksandr Valialkin
989d84cf3f
app/{vminsert,vmstorage}: wait for `ack` from `vmstorage` after each packet sent to it from `vminsert`
...
This should protect from possible data loss when `vmstorage` is stopped while the packet is sent from `vminsert`.
This commit switches to new protocol between vminsert and vmstorage, which is incompatible
with the previous protocol. So it is required that both vminsert and vmstorage nodes are updated.
2020-04-27 09:53:26 +03:00
Aliaksandr Valialkin
e933cbac16
lib/storage: postpone reading data from blocks during search
...
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries, which touch a big number of time series over long time ranges.
This improves single-node VM performance on heavy queries by up to 2x.
2020-04-27 08:44:01 +03:00
Aliaksandr Valialkin
31861c5b8e
lib/promscrape/discovery/gce: allow empty `zone` arg in `gce_sd_config` - in this case zones for the given project are automatically discovered
2020-04-26 14:37:38 +03:00
Aliaksandr Valialkin
b16e19c053
lib/storage/dedup.go: go fmt
2020-04-26 14:37:36 +03:00
Aliaksandr Valialkin
a0000c3a6e
lib/storage: improve deduplication algorithm
...
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.
Previously it could leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
2020-04-26 13:10:18 +03:00
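The "first data point per interval" rule from commit a0000c3a6e can be illustrated with a small sketch. The `sample` type, the `dedupFirstPerInterval` name and the choice to align intervals to multiples of the interval length are assumptions made for this example; the actual code in `lib/storage/dedup.go` may differ.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair; timestamps are in milliseconds.
type sample struct {
	ts    int64
	value float64
}

// dedupFirstPerInterval keeps only the first sample inside every
// intervalMs-wide bucket. Samples must be sorted by non-negative timestamp.
// Bucket alignment to multiples of intervalMs is an assumption of this sketch.
func dedupFirstPerInterval(samples []sample, intervalMs int64) []sample {
	if intervalMs <= 0 || len(samples) == 0 {
		return samples
	}
	out := make([]sample, 0, len(samples))
	var lastBucket int64
	for i, s := range samples {
		bucket := s.ts / intervalMs
		if i > 0 && bucket == lastBucket {
			// A sample from this interval has already been kept.
			continue
		}
		out = append(out, s)
		lastBucket = bucket
	}
	return out
}

func main() {
	samples := []sample{{0, 1}, {400, 2}, {1000, 3}, {1900, 4}, {2000, 5}}
	// With a 1000ms interval the samples at 400 and 1900 are dropped.
	fmt.Println(dedupFirstPerInterval(samples, 1000))
}
```

Keeping exactly one point per interval means that every series contributes the same number of samples to a window, which is what a query such as `histogram_quantile(phi, sum(rate(buckets)) by (le))` relies on.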
Aliaksandr Valialkin
13b4069c59
lib/storage: postpone label filters matching too many time series instead of giving up with error
...
This should reduce the frequency of the following errors:
cannot find tag filter matching less than N time series; either increase -search.maxUniqueTimeseries or use more specific tag filters
more than N time series found on the time range [...]; either increase -search.maxUniqueTimeseries or shrink the time range
2020-04-24 21:18:52 +03:00
Aliaksandr Valialkin
7c74efd640
lib/promscrape/discovery/gce: make golint happy by ignoring resp.Body.Close() result
2020-04-24 18:13:26 +03:00
Aliaksandr Valialkin
069690e3bd
lib/promscrape: initial implementation for `gce_sd_configs` aka Prometheus-compatible service discovery for Google Compute Engine
2020-04-24 17:53:43 +03:00
Aliaksandr Valialkin
de991551f5
lib/promscrape: query `/api/v1/namespaces/*` for the configured namespaces in kubernetes_sd_config
...
This should fix authorization issues described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/432
2020-04-24 14:42:02 +03:00
Aliaksandr Valialkin
387a21c96d
lib/promscrape: add `-promscrape.configCheckInterval` command-line flag for automating config checking
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/431
2020-04-23 23:41:26 +03:00
Aliaksandr Valialkin
83e4c8427e
lib/promscrape: access Config entries by reference, so they can be compared by addresses
2020-04-23 14:38:29 +03:00
Aliaksandr Valialkin
e220f3eeb6
lib/promscrape: move KubernetesSDConfig to lib/promscrape/discovery/kubernetes
2020-04-23 11:34:30 +03:00
Aliaksandr Valialkin
1187494c8f
lib/promscrape/discovery/kubernetes: hide role switch logic behind GetLabels function
2020-04-22 22:16:18 +03:00
Aliaksandr Valialkin
f9526809e5
app/vmselect: add `/api/v1/status/tsdb` page with useful stats for locating root cause for high cardinality issues
...
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:23 +03:00
Aliaksandr Valialkin
f3e5722257
lib/writeconcurrencylimiter: improve docs for -maxConcurrentInserts command-line flag
2020-04-20 21:03:09 +03:00
Aliaksandr Valialkin
81481abaa9
lib/promscrape/discovery/kubernetes: reuse a client for empty `api_server` inside different jobs
2020-04-20 17:07:37 +03:00
Aliaksandr Valialkin
6764efde39
lib/promscrape/discovery/kubernetes: update stale comments
2020-04-17 14:06:26 +03:00
Aliaksandr Valialkin
d86640d609
lib/promscrape: suppress scrape errors if `-promscrape.suppressScrapeErrors` flag is set
2020-04-16 23:41:52 +03:00
Aliaksandr Valialkin
70104f3fb1
lib/promscrape: print all the labels for the target in the error message for a failed scrape
...
This should improve debuggability.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/420
2020-04-16 23:35:10 +03:00
Aliaksandr Valialkin
266bbec52d
lib/promscrape: retry target scraping when the target closes previously established keep-alive connection to it
...
This should fix the following error:
the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-16 23:25:34 +03:00
Aliaksandr Valialkin
b2d009c8db
lib/logger: typo fix
2020-04-16 00:20:02 +03:00
Aliaksandr Valialkin
d4bc60d63c
lib/logger: add WARN level for logging expected errors such as invalid user queries
2020-04-15 20:50:45 +03:00
Aliaksandr Valialkin
a873b553cf
app/vmselect: handle `timestamp(metric offset X)` the same way as Prometheus does
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:05 +03:00
Aliaksandr Valialkin
99f0cb1f5f
lib/promscrape: code cleanup in runScraper func
2020-04-15 11:36:35 +03:00
Aliaksandr Valialkin
e9d9638627
lib/storage: skip metricID if the corresponding metricID->metricName is missing in inverted index during search
...
This case is possible when the corresponding metricID->metricName entry didn't propagate to inverted index yet.
This should fix the following error:
error when searching tsids for tfss [...]: cannot find metricName by metricID 1582417212213420669: EOF
2020-04-15 00:10:11 +03:00
Aliaksandr Valialkin
6ec582acb9
lib/promscrape: show information on improperly configured scrape targets at the bottom of `/targets` page
...
This is a common error with improperly configured target autodiscovery and/or relabeling.
This error leads to duplicate scraping of the same targets with the same set of labels, which leads
to duplicate samples in time series.
2020-04-14 14:55:13 +03:00
Aliaksandr Valialkin
391fb0903e
lib/promscrape/discovery/kubernetes: remove only unused client for API server during cleaning
2020-04-14 14:19:26 +03:00
Aliaksandr Valialkin
636e1578de
lib/promscrape: add promrelabel.GetLabelValueByName helper function
2020-04-14 14:12:15 +03:00
Aliaksandr Valialkin
3945bf9dec
lib/promscrape: mention job name in error messages when target cannot be scraped
...
This should improve debuggability
2020-04-14 13:33:18 +03:00
Aliaksandr Valialkin
66da177fe9
lib/promscrape: reset ScrapeWork.ID in tests
2020-04-14 13:31:37 +03:00
Aliaksandr Valialkin
88366cad15
lib/promscrape: properly expose statuses for targets with duplicate scrape urls at `/targets` page
...
Previously targets with duplicate scrape urls were merged into a single line on the page.
Now each target with duplicate scrape url is displayed on a separate line.
2020-04-14 13:10:06 +03:00
Aliaksandr Valialkin
09f796e2ab
lib/promscrape: remove labels starting with `__meta_` after applying `relabel_configs` as Prometheus does
...
This should reduce CPU load during scraping when target discovery generates
a big number of `__meta_*` labels (for instance, k8s discovery).
See https://www.robustperception.io/life-of-a-label for details.
2020-04-14 12:23:30 +03:00
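The label cleanup described in commit 09f796e2ab amounts to a prefix filter applied once relabeling is finished. The sketch below is an illustration under assumptions: the `label` type and the `dropMetaLabels` function are hypothetical names, not the ones used in `lib/promscrape`.

```go
package main

import (
	"fmt"
	"strings"
)

// label is a hypothetical name/value pair attached to a scrape target.
type label struct {
	name  string
	value string
}

// dropMetaLabels returns the labels whose names do not start with "__meta_".
// It is meant to run after relabel_configs have been applied, since relabeling
// rules may still need to read the discovery metadata.
func dropMetaLabels(labels []label) []label {
	out := make([]label, 0, len(labels))
	for _, l := range labels {
		if strings.HasPrefix(l.name, "__meta_") {
			continue
		}
		out = append(out, l)
	}
	return out
}

func main() {
	labels := []label{
		{"__meta_kubernetes_pod_name", "web-0"},
		{"job", "web"},
		{"instance", "10.0.0.1:8080"},
	}
	// Only the job and instance labels remain.
	fmt.Println(dropMetaLabels(labels))
}
```

Dropping these labels once per target, rather than carrying them through every scrape, is where the CPU saving mentioned in the commit message would come from.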
Aliaksandr Valialkin
f58d15f27c
lib/promscrape: rename 'scrape_config->scrape_limit' to 'scrape_config->sample_limit'
...
`scrape_config` block from Prometheus config contains `sample_limit` field,
while in `vmagent` this field was mistakenly named as `scrape_limit`.
2020-04-14 12:00:03 +03:00
Aliaksandr Valialkin
7c4fb038e3
lib/promscrape: add initial support for kubernetes_sd_config
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/334
2020-04-13 21:03:53 +03:00
Aliaksandr Valialkin
4017163393
lib/promscrape: add `-promscrape.config.strictParse` flag for detecting errors in `-promscrape.config` file
2020-04-13 13:15:52 +03:00
Aliaksandr Valialkin
7fbfef2aee
lib/promscrape: extract common auth code to lib/promauth
2020-04-13 12:59:22 +03:00
Aliaksandr Valialkin
e0c6da8e2a
lib/storage: disable deduplication after dedup tests are complete
...
The rest of the tests expect de-duplication to be disabled.
2020-04-10 17:33:38 +03:00
Aliaksandr Valialkin
8ed0d5471a
lib/storage: correctly handle `-dedup.minScrapeInterval` values smaller than 8ms
...
Such small values may be used for removing samples with duplicate timestamps.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.
2020-04-10 16:40:41 +03:00
Aliaksandr Valialkin
0b2f678d8e
lib/{storage,mergeset}: make sure that `requests` and `misses` cache counters never go down
2020-04-10 14:44:52 +03:00
Aliaksandr Valialkin
661cfb03e2
lib/protoparser: add `-*TrimTimestamp` command-line flags for Influx, Graphite, OpenTSDB and CSV data
...
These flags can be used for reducing disk space usage for timestamps data ingested over the given protocols
2020-04-10 12:44:46 +03:00
Aliaksandr Valialkin
f0b08dbd9e
lib/workingsetcache: accumulate stat counters on cache rotation
...
This should prevent cache stats counters from going down after cache rotation,
which may corrupt `cache hit ratio` graph on the official Grafana dashboards
when using the following query:
1 - (sum(rate(vm_cache_misses_total[5m])) by (type) / sum(rate(vm_cache_requests_total[5m])) by (type))
2020-04-10 11:51:47 +03:00
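The counter accumulation from commit f0b08dbd9e can be sketched like this. The `cacheStats` and `rotatingCache` types are hypothetical and much simpler than `lib/workingsetcache`; the point is only that counters from a retired cache are added to a running total, so the exported values never decrease and `rate()` over them stays meaningful.

```go
package main

import "fmt"

// cacheStats holds monotonically increasing counters.
type cacheStats struct {
	requests uint64
	misses   uint64
}

// rotatingCache is a hypothetical cache that is periodically replaced
// by a fresh one. Only the statistics are modeled here.
type rotatingCache struct {
	curr    cacheStats // counters of the cache currently in use
	retired cacheStats // accumulated counters of caches dropped by rotation
}

// record registers a single lookup and whether it was a hit.
func (c *rotatingCache) record(hit bool) {
	c.curr.requests++
	if !hit {
		c.curr.misses++
	}
}

// rotate drops the current cache but keeps its counters in the running total.
func (c *rotatingCache) rotate() {
	c.retired.requests += c.curr.requests
	c.retired.misses += c.curr.misses
	c.curr = cacheStats{}
}

// stats returns totals that never go down across rotations.
func (c *rotatingCache) stats() cacheStats {
	return cacheStats{
		requests: c.retired.requests + c.curr.requests,
		misses:   c.retired.misses + c.curr.misses,
	}
}

func main() {
	var c rotatingCache
	c.record(true)
	c.record(false)
	before := c.stats()
	c.rotate()
	c.record(true)
	after := c.stats()
	fmt.Printf("before rotation: %+v, after rotation: %+v\n", before, after)
}
```

Without the `retired` totals, `stats()` would restart from zero after each rotation, and a Prometheus-style `rate()` would treat that drop as a counter reset.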
Aliaksandr Valialkin
28c65b58a2
lib/memory: add more details to `-memory.allowedPercent` help message
2020-04-09 15:34:21 +03:00