Commit Graph

8677 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
39f3f3a517 lib: move common code for creating flock.lock file into fs.CreateFlockFile 2019-08-13 01:46:20 +03:00
Aliaksandr Valialkin
73f866d874 lib/fs: atomically create file with the given contents on WriteFileAtomically
This should prevent corruption of the `transaction` and `metadata.json` files
on unclean shutdown such as OOM, `kill -9`, power loss, etc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/148
2019-08-12 15:02:04 +03:00
Aliaksandr Valialkin
ad5be625f8 deployment: update docker images 2019-08-06 16:10:03 +03:00
Aliaksandr Valialkin
4fb635b0c9 lib/storage: do not change timestamps to constant rate if values are constant or have constant delta
This breaks the original timestamps, which results in issues like
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/120 and
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/141 .
2019-08-06 15:40:17 +03:00
Aliaksandr Valialkin
f56c1298ad app/vmstorage: add vm_concurrent_addrows_* metrics for tracking concurrency for Storage.AddRows calls
Also track the number of rows dropped because the timeout was exceeded
while waiting on the concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`
2019-08-06 15:08:43 +03:00
Aliaksandr Valialkin
2d869c6d9b vendor: update github.com/VictoriaMetrics/metrics to v1.7.1 2019-08-05 19:21:53 +03:00
Aliaksandr Valialkin
8e05758ff5 app: add vm_concurrent_ metrics for visibility in concurrency limiters for vminsert and vmselect 2019-08-05 18:30:29 +03:00
Aliaksandr Valialkin
1258c9ef10 vendor: make vendor-update 2019-08-05 10:34:38 +03:00
Aliaksandr Valialkin
a3ecf3c1f7 lib/storage: properly reset partSearch.fetchData in partSearch.reset 2019-08-05 09:55:50 +03:00
Artem Navoiev
dd4ea63ed2 [deployment] add statefulset for vmselect (#140) 2019-08-04 23:34:05 +03:00
Aliaksandr Valialkin
a868f8607f deployment: update docker images to v1.24.0-cluster 2019-08-04 23:31:57 +03:00
Aliaksandr Valialkin
53c8f56436 app/vmselect: allow passing match[], start and time to /api/v1/label/<label_name>/values
`/api/v1/label/<label_name>/values?match[]=q` emulates the `label_values(q, <label_name>)`
call in Grafana templating.
2019-08-04 23:07:00 +03:00
Aliaksandr Valialkin
880b1d80b1 app/vmselect: optimize /api/v1/series by skipping storage data
Fetch and process only time series metainfo.
2019-08-04 23:00:46 +03:00
Aliaksandr Valialkin
7f5afae1e3 app/vmselect/prometheus: avoid fetching and scanning all the data on /api/v1/series call by default 2019-08-04 19:42:45 +03:00
Aliaksandr Valialkin
000c154641 app/vmselect/promql: tune automatic window adjustment
Increase the window adjustment for small scrape intervals,
since they usually have higher jitter.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/139
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
2019-08-04 19:34:11 +03:00
Aliaksandr Valialkin
1d4ddadbb1 app/vmselect/promql: further increase the allowed jitter for scrape interval
Real-world production data shows higher jitter than 1/8 of scrape interval.
This may result in gaps on the graph. So increase the allowed jitter to 1/4
of scrape interval in order to reduce the probability of gaps on the graphs
over time series with high jitter for scrape_interval.
2019-08-02 20:16:41 +03:00
Aliaksandr Valialkin
8ed84a4713 app/vminsert/influx: round automatically generated timestamp according to the given precision arg 2019-08-02 00:24:39 +03:00
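A rough Go sketch of what aligning a generated timestamp to a precision could look like. It is not the code from `app/vminsert/influx`: the function name is hypothetical, timestamps are assumed to be non-negative milliseconds, the precision values `s`, `m` and `h` follow the Influx line protocol convention, and truncation is assumed although the commit title says "round".

```go
package main

import "fmt"

// alignTimestampMillis is a hypothetical helper that truncates a non-negative
// millisecond timestamp down to the given precision. Precisions finer than or
// equal to a millisecond (for example "ns", "u", "ms") leave the value unchanged.
func alignTimestampMillis(tsMillis int64, precision string) int64 {
	var unit int64
	switch precision {
	case "s":
		unit = 1000
	case "m":
		unit = 60 * 1000
	case "h":
		unit = 3600 * 1000
	default:
		return tsMillis
	}
	return tsMillis - tsMillis%unit
}

func main() {
	ts := int64(1564694679123)
	for _, p := range []string{"ms", "s", "m", "h"} {
		fmt.Println(p, alignTimestampMillis(ts, p))
	}
}
```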
Aliaksandr Valialkin
ade7bc30db app/vmselect/promql: tolerate higher jitter in scrape interval
Allow jitter for up to 1/8 instead of 1/16 for the scrape interval.
This should improve graphs when `step` is smaller than the `scrape_interval`.
2019-08-01 23:25:53 +03:00
Aliaksandr Valialkin
a99e89945e lib/decimal: modernize tests a bit 2019-07-31 21:09:54 +03:00
Aliaksandr Valialkin
6fceedccce deployment: update docker images 2019-07-31 16:38:39 +03:00
Aliaksandr Valialkin
c994fbf500 app/vmselect/promql: add vm_slow_queries_total metric for counting slow queries
The query is slow if its execution time exceeds `-search.logSlowQueryDuration`
2019-07-31 03:36:45 +03:00
Aliaksandr Valialkin
071a122119 app/vmselect/promql: return NaN from histogram_quantile if at least a single bucket is broken 2019-07-31 01:18:11 +03:00
Aliaksandr Valialkin
b9a16b93e7 app/vmselect/promql: allow adjusting window for default rollup function
Default rollup function is `last_over_time`. It must support adjusting
the provided window in order to prevent gaps on the graph
for window values smaller than scrape interval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
2019-07-31 00:45:58 +03:00
Aliaksandr Valialkin
c901a6472f app/vmselect/promql: return NaN values if invalid bucket counts are passed to histogram_quantile
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/136
2019-07-30 22:05:55 +03:00
Aliaksandr Valialkin
b7c4b0c6d2 lib/storage: fix matching against tag filter with empty name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/137
2019-07-30 15:15:21 +03:00
Aliaksandr Valialkin
5b8526e925 app/vmselect/netstorage: improve error message when reading data blocks from storage
Mention the block number in the error. This should simplify troubleshooting in this code.
2019-07-28 12:17:33 +03:00
Aliaksandr Valialkin
b7089705b7 app/vminsert: add vm_rows_per_insert summary metric
This metric should help tune batch sizes on clients writing data to VictoriaMetrics
2019-07-27 13:28:20 +03:00
Aliaksandr Valialkin
1fd4e9fb5c app/vminsert: improve error messages for Influx, OpenTSDB and Graphite parsing
Include in the error message the line which failed to parse.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/127
2019-07-26 22:09:21 +03:00
Aliaksandr Valialkin
34b21a8671 deployment: update cluster image to v1.23.0-cluster 2019-07-26 20:07:47 +03:00
Aliaksandr Valialkin
8253790157 app/vmstorage: consistency renaming for ignored rows metrics
vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"}
vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}
2019-07-26 20:02:24 +03:00
Aliaksandr Valialkin
c6bec48927 lib/storage: add metrics for calculating skipped rows outside the retention
The metrics are:

    - vm_too_big_timestamp_rows_total
    - vm_too_small_timestamp_rows_total
2019-07-26 14:11:56 +03:00
Aliaksandr Valialkin
aac482517f app/vmselect/promql: return NaN from count() over zero time series
This aligns `count` behavior with Prometheus.
2019-07-25 22:02:34 +03:00
Aliaksandr Valialkin
0e52357f35 app/vmselect/promql: properly calculate incremental aggregations grouped by __name__
Previously the following query could fail when it matched multiple distinct metric names:

    sum(count_over_time{__name__!=''}) by (__name__)
2019-07-25 21:53:26 +03:00
Aliaksandr Valialkin
f2e8d54fb0 lib/encoding/zstd: go fmt 2019-07-25 01:37:57 +03:00
Aliaksandr Valialkin
97b5dc7122 lib/encoding/zstd: disable CRC checks in pure Go build
This should give slightly better compression and decompression performance.
Additionally this shaves off 4 bytes per compressed block.
2019-07-24 19:17:32 +03:00
Aliaksandr Valialkin
54f035d4ce all: small updates after PR #114 2019-07-24 17:43:43 +03:00
Aliaksandr Valialkin
7a133567fb lib/encoding: small fixes in tests after the PR #114 2019-07-24 17:43:39 +03:00
Roman Khavronenko
fcf09aaa3c all: add Pure Go build (pull request #114)
Updates #94
2019-07-24 17:43:32 +03:00
Aliaksandr Valialkin
dd7bba94a3 dashboards: use rate instead of irate, because irate doesn't capture spikes
See https://medium.com/@valyala/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832 for details
2019-07-20 15:55:48 +03:00
Aliaksandr Valialkin
3fae34eeb4 lib/encoding: improve gauge series detection
- Series with negative values are always gauges
- Counters may only have increasing values with possible counter resets

This should improve compression ratio for gauge series which
were previously mistakenly detected as counters.
2019-07-20 14:05:25 +03:00
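The two rules listed in this commit can be sketched in Go as follows. This is not the code from `lib/encoding`: the function name and the `maxResets` threshold are assumptions made for illustration, since the commit message does not say how many counter resets are tolerated.

```go
package main

import "fmt"

// looksLikeGauge is a hypothetical classifier based on the two rules above:
//   - any negative value means the series is a gauge;
//   - a counter only increases, apart from occasional resets (drops).
// maxResets is an assumed tolerance: more drops than this means "gauge".
func looksLikeGauge(values []int64, maxResets int) bool {
	drops := 0
	for i, v := range values {
		if v < 0 {
			return true
		}
		if i > 0 && v < values[i-1] {
			drops++
		}
	}
	return drops > maxResets
}

func main() {
	counterWithReset := []int64{1, 2, 3, 0, 1, 5}
	temperature := []int64{5, -1, 3}
	fmt.Println(looksLikeGauge(counterWithReset, 1))
	fmt.Println(looksLikeGauge(temperature, 1))
}
```

A series classified as a counter can then be encoded with a counter-oriented scheme, which is where the compression gain mentioned in the commit would come from.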
Aliaksandr Valialkin
b335a811c3 deployment: switch builder from go1.12.6 to go1.12.7 2019-07-20 12:14:05 +03:00
Jiri Tyr
0aed0e0b5d Adding Grafana dashboards for VM cluster (#105) 2019-07-20 10:25:09 +03:00
Aliaksandr Valialkin
cb8104cf77 app: clarify error messages when -storageNode arg is missing in vminsert and vmselect 2019-07-20 10:21:59 +03:00
Aliaksandr Valialkin
fab1962e02 deployment/k8s/helm: use correct default ports for -storageNode
Previously these ports were swapped. Correct ports are:

- vminsert: -storageNode=*:8400
- vmselect: -storageNode=*:8401
2019-07-20 01:24:32 +03:00
Aliaksandr Valialkin
e3dcfe5851 deployment/docker/docker-compose.yml: use default ports for vminsert and vmselect services
These ports were swapped. Correct default ports are:

- vminsert: -httpListenAddr=:8480, -storageNode=*:8400
- vmselect: -httpListenAddr=:8481, -storageNode=*:8401
2019-07-20 01:20:08 +03:00
Thor Anker Kvisgård Lange
f576b267eb Fixed small bug in vmstorage name template
Signed-off-by: Thor Anker Kvisgård Lange <thanl@mhivestasoffshore.com>
2019-07-17 13:30:23 +03:00
Aliaksandr Valialkin
76b947dcb4 deployment: update Docker images 2019-07-15 23:56:24 +03:00
Aliaksandr Valialkin
7abb96b454 lib/netutil: do not count timeouts as network errors 2019-07-15 23:06:13 +03:00
Aliaksandr Valialkin
2b4254d01f app/vminsert: use netutil.TCPListener for collecting network-related metrics for Graphite and OpenTSDB TCP traffic 2019-07-15 22:58:35 +03:00
Aliaksandr Valialkin
092c9b39a8 app/vmselect/promql: remove empty time series after applying filters like q > 0
This should reduce CPU and RAM usage for queries over a large number of time series.
2019-07-12 19:59:49 +03:00