diff --git a/README.md b/README.md
index e7c770df56..f62732c359 100644
--- a/README.md
+++ b/README.md
@@ -513,6 +513,10 @@ The cluster works in the following way when some of `vmstorage` nodes are unavai
   In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
   Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes.
 
+  `vmselect` also accepts `-replicationFactor=N` command-line flag. This flag instructs `vmselect` to return full response
+  if less than `-replicationFactor` vmstorage nodes are unavailable during querying, since it assumes that the remaining
+  `vmstorage` nodes contain the full data. See [these docs](#replication-and-data-safety) for details.
+
   `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
 
 Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
@@ -648,7 +652,8 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
 By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
 
 The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
-Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
+
+Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less than `-replicationFactor` vmstorage nodes are unavailable during the query. See [cluster availability docs](#cluster-availability) for details.
 
 The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
 
@@ -1117,7 +1122,7 @@ Below is the output for `/path/to/vmselect -help`:
      Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
      Supports an array of values separated by comma or specified via multiple flags.
   -replicationFactor int
-     How many copies of every time series is available on vmstorage nodes. vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses after adding new vmstorage nodes even if the replication is enabled at vminsert nodes (default 1)
+     How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
   -search.cacheTimestampOffset duration
      The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
   -search.denyPartialResponse
@@ -1198,7 +1203,7 @@ Below is the output for `/path/to/vmselect -help`:
   -search.setLookbackToStep
      Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
   -search.skipSlowReplicas
-     Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.
+     Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster
   -search.treatDotsAsIsInRegexps
      Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
   -selectNode array
diff --git a/app/vmselect/netstorage/netstorage.go b/app/vmselect/netstorage/netstorage.go
index 5efd9e407e..7c3950c49f 100644
--- a/app/vmselect/netstorage/netstorage.go
+++ b/app/vmselect/netstorage/netstorage.go
@@ -33,13 +33,12 @@ import (
 )
 
 var (
-	replicationFactor = flag.Int("replicationFactor", 1, "How many copies of every time series is available on vmstorage nodes. "+
-		"vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. "+
-		"It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses "+
-		"after adding new vmstorage nodes even if the replication is enabled at vminsert nodes")
-	skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip waiting for all replicas to respond during search query. "+
-		"Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. "+
-		"But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.")
+	replicationFactor = flag.Int("replicationFactor", 1, "How many copies of every time series is available on the provided -storageNode nodes. "+
+		"vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. "+
+		"See also -search.skipSlowReplicas")
+	skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. "+
+		"Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor "+
+		"copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster")
 	maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery")
 	maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries")
 	vmstorageDialTimeout = flag.Duration("vmstorageDialTimeout", 5*time.Second, "Timeout for establishing RPC connections from vmselect to vmstorage")
@@ -1729,17 +1728,6 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
 		result := <-snr.resultsCh
 		if err := f(result.data); err != nil {
 			snr.finishQueryTracer(result.qt, fmt.Sprintf("error: %s", err))
-			if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor {
-				// There is no need in waiting for the remaining results,
-				// because the collected results contain all the data according to the given -replicationFactor.
-				// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
-				// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
-				//
-				// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
-				snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d",
-					resultsCollected, len(sns), *replicationFactor))
-				return false, nil
-			}
 			var er *errRemote
 			if errors.As(err, &er) {
 				// Immediately return the error reported by vmstorage to the caller,
@@ -1767,6 +1755,17 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
 		}
 		snr.finishQueryTracer(result.qt, "")
 		resultsCollected++
+		if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor {
+			// There is no need in waiting for the remaining results,
+			// because the collected results contain all the data according to the given -replicationFactor.
+			// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
+			// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
+			//
+			// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
+			snr.finishQueryTracers(fmt.Sprintf("cancel request because -search.skipSlowReplicas is set and %d out of %d nodes already returned response "+
+				"according to -replicationFactor=%d", resultsCollected, len(sns), *replicationFactor))
+			return false, nil
+		}
 	}
 	if len(errsPartial) < *replicationFactor {
 		// Assume that the result is full if the the number of failing vmstorage nodes
diff --git a/docs/Cluster-VictoriaMetrics.md b/docs/Cluster-VictoriaMetrics.md
index b6503b8ad1..9f42641408 100644
--- a/docs/Cluster-VictoriaMetrics.md
+++ b/docs/Cluster-VictoriaMetrics.md
@@ -524,6 +524,10 @@ The cluster works in the following way when some of `vmstorage` nodes are unavai
   In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
   Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes.
 
+  `vmselect` also accepts `-replicationFactor=N` command-line flag. This flag instructs `vmselect` to return full response
+  if less than `-replicationFactor` vmstorage nodes are unavailable during querying, since it assumes that the remaining
+  `vmstorage` nodes contain the full data. See [these docs](#replication-and-data-safety) for details.
+
   `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
 
 Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
@@ -659,7 +663,8 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
 By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
 
 The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
-Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
+
+Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less than `-replicationFactor` vmstorage nodes are unavailable during the query. See [cluster availability docs](#cluster-availability) for details.
 
 The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
 
@@ -1128,7 +1133,7 @@ Below is the output for `/path/to/vmselect -help`:
      Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
      Supports an array of values separated by comma or specified via multiple flags.
   -replicationFactor int
-     How many copies of every time series is available on vmstorage nodes. vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses after adding new vmstorage nodes even if the replication is enabled at vminsert nodes (default 1)
+     How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
   -search.cacheTimestampOffset duration
      The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
   -search.denyPartialResponse
@@ -1209,7 +1214,7 @@ Below is the output for `/path/to/vmselect -help`:
   -search.setLookbackToStep
      Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
   -search.skipSlowReplicas
-     Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.
+     Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster
   -search.treatDotsAsIsInRegexps
      Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
   -selectNode array
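The decision logic that this change gives `collectResults` can be illustrated with a small standalone Go program. This is a minimal sketch, not the VictoriaMetrics implementation: the `nodeResult` type and the `collect` function are hypothetical, each vmstorage response is reduced to a success/failure flag listed in arrival order, and errors reported by vmstorage itself (`errRemote`), `-search.denyPartialResponse` and query tracing are not modeled. Returning an error when every node fails is also an assumption of the sketch. In the removed code the `-search.skipSlowReplicas` check ran only inside the error branch, so the early return could fire only when a node failed; after the change it runs after every successful response, which is what lets the slowest replicas actually be skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeResult is a hypothetical, simplified stand-in for one vmstorage
// response: it only records whether the node answered successfully.
type nodeResult struct {
	node string
	ok   bool
}

// collect is a hypothetical sketch of the decision logic in collectResults
// after this change. results are listed in arrival order. It returns how many
// results were waited for and whether the response is partial.
func collect(results []nodeResult, replicationFactor int, skipSlowReplicas bool) (waited int, isPartial bool, err error) {
	n := len(results)
	collected, failed := 0, 0
	for _, r := range results {
		waited++
		if !r.ok {
			// A failed node is only counted; it no longer triggers the early return.
			failed++
			continue
		}
		collected++
		// More than n-replicationFactor successful answers means at most
		// replicationFactor-1 nodes are still pending, so every series stored
		// with replicationFactor copies on distinct nodes has been seen.
		if skipSlowReplicas && collected > n-replicationFactor {
			return waited, false, nil
		}
	}
	if failed < replicationFactor {
		// Fewer than replicationFactor failures: the response is treated as full.
		return waited, false, nil
	}
	if failed == n {
		// Assumption of this sketch: report an error when no node answered.
		return waited, false, errors.New("all vmstorage nodes failed")
	}
	return waited, true, nil
}

func main() {
	// 5 vmstorage nodes queried by vmselect with -replicationFactor=2.
	allOK := []nodeResult{{"s1", true}, {"s2", true}, {"s3", true}, {"s4", true}, {"s5", true}}
	oneFailed := []nodeResult{{"s1", true}, {"s2", false}, {"s3", true}, {"s4", true}, {"s5", true}}
	twoFailed := []nodeResult{{"s1", true}, {"s2", false}, {"s3", true}, {"s4", false}, {"s5", true}}

	fmt.Println(collect(allOK, 2, true))      // 4 false <nil>: the slowest node is skipped
	fmt.Println(collect(oneFailed, 2, false)) // 5 false <nil>: full response despite one failure
	fmt.Println(collect(twoFailed, 2, false)) // 5 true <nil>: partial response
}
```

With 5 nodes and `-replicationFactor=2`, the sketch stops after 4 successful answers when skipping is enabled (it skips `replicationFactor-1` = 1 node), still reports a full response when a single node fails, and reports a partial response once 2 nodes fail.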