# Cluster version
<img alt="Victoria Metrics" src="logo.png">

VictoriaMetrics is a fast, cost-effective and scalable time series database. It can be used as a long-term remote storage for Prometheus.

It is recommended to use the [single-node version](https://github.com/VictoriaMetrics/VictoriaMetrics) instead of the cluster version
for ingestion rates lower than a million data points per second.
The single-node version [scales perfectly](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
with the number of CPU cores, RAM and available storage space.
The single-node version is easier to configure and operate compared to the cluster version, so think twice before choosing the cluster version.

Join [our Slack](http://slack.victoriametrics.com/) or [contact us](mailto:info@victoriametrics.com) with consulting and support questions.
## Prominent features

- Supports all the features of the [single-node version](https://github.com/VictoriaMetrics/VictoriaMetrics).
- Performance and capacity scale horizontally. See [these docs for details](#cluster-resizing-and-scalability).
- Supports multiple independent namespaces for time series data (aka multi-tenancy). See [these docs for details](#multitenancy).
- Supports replication. See [these docs for details](#replication-and-data-safety).
## Architecture overview

VictoriaMetrics cluster consists of the following services:

- `vmstorage` - stores the data
- `vminsert` - proxies the ingested data to `vmstorage` shards using consistent hashing
- `vmselect` - performs incoming queries using the data from `vmstorage`

Each service may scale independently and may run on the most suitable hardware.
`vmstorage` nodes don't know about each other, don't communicate with each other and don't share any data.
This is a [shared nothing architecture](https://en.wikipedia.org/wiki/Shared-nothing_architecture).
It increases cluster availability and simplifies cluster maintenance and scaling.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTvk2raU9kFgZ84oF-OKolrGwHaePhHRsZEcfQ1I_EC5AB_XPWwB392XshxPramLJ8E4bqptTnFn5LL/pub?w=1104&h=746">
## Multitenancy

VictoriaMetrics cluster supports multiple isolated tenants (aka namespaces).
Tenants are identified by `accountID` or `accountID:projectID`, which are put inside request urls.
See [these docs](#url-format) for details. Some facts about tenants in VictoriaMetrics:

* Each `accountID` and `projectID` is identified by an arbitrary 32-bit integer in the range `[0 .. 2^32)`.
  If `projectID` is missing, then it is automatically assigned to `0`. It is expected that other information about tenants
  such as auth tokens, tenant names, limits, accounting, etc. is stored in a separate relational database. This database must be managed
  by a separate service sitting in front of VictoriaMetrics cluster such as [vmauth](https://victoriametrics.github.io/vmauth.html).
  [Contact us](mailto:info@victoriametrics.com) if you need help with creating such a service.
* Tenants are automatically created when the first data point is written into the given tenant.
* Data for all the tenants is evenly spread among available `vmstorage` nodes. This guarantees even load among `vmstorage` nodes
  when different tenants have different amounts of data and different query load.
* VictoriaMetrics doesn't support querying multiple tenants in a single request.
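The tenant identifier rules above can be checked on the client side before building request URLs. The following is a minimal sketch; `tenant_id` is a hypothetical helper and is not part of VictoriaMetrics. It validates both IDs against the `[0 .. 2^32)` range and prints the full `accountID:projectID` form.

```bash
# Hypothetical helper: validate tenant identifiers and print accountID:projectID.
# $1 - accountID, $2 - optional projectID
tenant_id() {
    account_id=$1
    project_id=${2:-0}            # a missing projectID is treated as 0
    for id in "$account_id" "$project_id"; do
        case $id in
            ''|*[!0-9]*) echo "not a non-negative integer: $id" >&2; return 1 ;;
        esac
        # 4294967295 is 2^32 - 1; the length check guards against overflow
        if [ "${#id}" -gt 10 ] || [ "$id" -gt 4294967295 ]; then
            echo "out of range [0 .. 2^32): $id" >&2; return 1
        fi
    done
    echo "${account_id}:${project_id}"
}

tenant_id 42      # prints 42:0
tenant_id 42 7    # prints 42:7
```

The helper rejects values with more than 10 digits outright, so inputs with many leading zeros are refused even if numerically valid; that is a simplification of this sketch.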
## Binaries

Compiled binaries for the cluster version are available in the `assets` section of the [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
See archives containing the word `cluster`.

Docker images for the cluster version are available here:

- `vminsert` - https://hub.docker.com/r/victoriametrics/vminsert/tags
- `vmselect` - https://hub.docker.com/r/victoriametrics/vmselect/tags
- `vmstorage` - https://hub.docker.com/r/victoriametrics/vmstorage/tags
## Building from sources

Source code for the cluster version is available in the [cluster branch](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).

### Production builds

There is no need to install Go on the host system, since binaries are built
inside [the official docker container for Go](https://hub.docker.com/_/golang).
This makes builds reproducible.
So [install docker](https://docs.docker.com/install/) and run the following command:

```
make vminsert-prod vmselect-prod vmstorage-prod
```

Production binaries are statically linked. They are put into the `bin` folder with `-prod` suffixes:

```
$ make vminsert-prod vmselect-prod vmstorage-prod
$ ls -1 bin
vminsert-prod
vmselect-prod
vmstorage-prod
```
### Development builds

1. [Install go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make` from the repository root. It should build `vmstorage`, `vmselect`
   and `vminsert` binaries and put them into the `bin` folder.
### Building docker images

Run `make package`. It will build the following docker images locally:

* `victoriametrics/vminsert:<PKG_TAG>`
* `victoriametrics/vmselect:<PKG_TAG>`
* `victoriametrics/vmstorage:<PKG_TAG>`

`<PKG_TAG>` is an auto-generated image tag, which depends on the source code in the repository.
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package`.

By default images are built on top of the [alpine](https://hub.docker.com/_/alpine) image in order to improve debuggability.
It is possible to build an image on top of any other base image by setting it via the `<ROOT_IMAGE>` environment variable.
For example, the following command builds images on top of the [scratch](https://hub.docker.com/_/scratch) image:

```bash
ROOT_IMAGE=scratch make package
```
## Operation

## Cluster setup

A minimal cluster must contain the following nodes:

* a single `vmstorage` node with `-retentionPeriod` and `-storageDataPath` flags
* a single `vminsert` node with `-storageNode=<vmstorage_host>:8400`
* a single `vmselect` node with `-storageNode=<vmstorage_host>:8401`
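The node list above can be turned into concrete command lines. The following sketch only prints them, so it is safe to run anywhere. It assumes the production binaries from `bin`, and the host name, data path and retention value are placeholders rather than recommendations; `minimal_cluster_cmds` is a hypothetical helper.

```bash
# Sketch: print the command lines for a minimal cluster.
# $1 - host name of the vmstorage node (placeholder, defaults to localhost)
minimal_cluster_cmds() {
    storage_host=${1:-localhost}
    # The retention value and data path below are placeholders
    echo "bin/vmstorage-prod -retentionPeriod=1 -storageDataPath=/var/lib/vmstorage-data"
    # vminsert connects to port 8400 and vmselect to port 8401 on vmstorage
    echo "bin/vminsert-prod -storageNode=${storage_host}:8400"
    echo "bin/vmselect-prod -storageNode=${storage_host}:8401"
}

minimal_cluster_cmds vmstorage-1
```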

It is recommended to run at least two nodes for each service
for high availability purposes.

An http load balancer such as `nginx` must be put in front of `vminsert` and `vmselect` nodes:

- requests starting with `/insert` must be routed to port `8480` on `vminsert` nodes.
- requests starting with `/select` must be routed to port `8481` on `vmselect` nodes.

Ports may be altered by setting `-httpListenAddr` on the corresponding nodes.

It is recommended to set up [monitoring](#monitoring) for the cluster.
### Environment variables

Each flag value can be set through environment variables by following these rules:

- The `-envflag.enable` flag must be set
- Each `.` in flag names must be substituted by `_` (for example `-insert.maxQueueDuration <duration>` will translate to `insert_maxQueueDuration=<duration>`)
- For repeating flags, an alternative syntax can be used by joining the different values into one using `,` as separator (for example `-storageNode <nodeA> -storageNode <nodeB>` will translate to `storageNode=<nodeA>,<nodeB>`)
- It is possible to set a prefix for environment vars with `-envflag.prefix`. For instance, if `-envflag.prefix=VM_`, then env vars must be prepended with `VM_`
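The naming rules above are mechanical, so they can be sketched as two small shell helpers. Both `flag_to_env` and `join_values` are hypothetical names used only for illustration; they derive the variable name and the comma-joined value described in the list.

```bash
# Sketch: derive the environment variable name for a command-line flag.
# $1 - flag name such as -insert.maxQueueDuration
# $2 - optional prefix, i.e. the value passed to -envflag.prefix
flag_to_env() {
    name=${1#-}                                  # drop the leading dash
    name=$(printf '%s' "$name" | tr '.' '_')     # each "." becomes "_"
    printf '%s%s\n' "${2:-}" "$name"
}

# Sketch: join the values of a repeated flag with "," as separator.
join_values() {
    ( IFS=,; printf '%s\n' "$*" )
}

flag_to_env -insert.maxQueueDuration        # prints insert_maxQueueDuration
flag_to_env -insert.maxQueueDuration VM_    # prints VM_insert_maxQueueDuration
join_values nodeA:8400 nodeB:8400           # prints nodeA:8400,nodeB:8400
```

With these, a repeated `-storageNode` flag could be exported as `storageNode=$(join_values nodeA:8400 nodeB:8400)`, provided `-envflag.enable` is set on the process.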
## Monitoring

All the cluster components expose various metrics in Prometheus-compatible format at the `/metrics` page on the TCP port set in the `-httpListenAddr` command-line flag.
By default the following TCP ports are used:

- `vminsert` - 8480
- `vmselect` - 8481
- `vmstorage` - 8482

It is recommended to set up [vmagent](https://victoriametrics.github.io/vmagent.html)
or Prometheus to scrape `/metrics` pages from all the cluster components, so they can be monitored and analyzed
with [the official Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
or [an alternative dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11831).
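Before wiring up a scraper, it can help to check the `/metrics` pages by hand. The sketch below maps each component to its default port from the list above; `metrics_url` is a hypothetical helper, and the host names are placeholders. If `-httpListenAddr` has been changed, the ports here no longer apply.

```bash
# Sketch: build the /metrics URL for a component using the default ports.
# $1 - component name, $2 - host name
metrics_url() {
    case $1 in
        vminsert)  port=8480 ;;
        vmselect)  port=8481 ;;
        vmstorage) port=8482 ;;
        *) echo "unknown component: $1" >&2; return 1 ;;
    esac
    echo "http://$2:$port/metrics"
}

metrics_url vmstorage vmstorage-1    # prints http://vmstorage-1:8482/metrics

# Against a live node (not executed here):
#   curl -s "$(metrics_url vmstorage vmstorage-1)"
```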
## URL format

* URLs for data ingestion: `http://<vminsert>:8480/insert/<accountID>/<suffix>`, where:
  - `<accountID>` is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant). It is possible to set it as `accountID:projectID`,
    where `projectID` is also an arbitrary 32-bit integer. If `projectID` isn't set, then it equals `0`.
  - `<suffix>` may have the following values:
    - `prometheus` and `prometheus/api/v1/write` - for inserting data with [Prometheus remote write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write)
    - `influx/write` and `influx/api/v2/write` - for inserting data with [Influx line protocol](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/).
    - `opentsdb/api/put` - for accepting [OpenTSDB HTTP /api/put requests](http://opentsdb.net/docs/build/html/api_http/put.html).
      This handler is disabled by default. It is exposed on a distinct TCP address set via `-opentsdbHTTPListenAddr` command-line flag.
      See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#sending-opentsdb-data-via-http-apiput-requests) for details.
    - `prometheus/api/v1/import` - for importing data obtained via `api/v1/export` on `vmselect` (see below).
    - `prometheus/api/v1/import/native` - for importing data obtained via `api/v1/export/native` on `vmselect` (see below).
    - `prometheus/api/v1/import/csv` - for importing arbitrary CSV data. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-csv-data) for details.
    - `prometheus/api/v1/import/prometheus` - for importing data in [Prometheus text exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) and in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md). See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.

* URLs for [Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/): `http://<vmselect>:8481/select/<accountID>/prometheus/<suffix>`, where:
  - `<accountID>` is an arbitrary number identifying data namespace for the query (aka tenant)
  - `<suffix>` may have the following values:
    - `api/v1/query` - performs [PromQL instant query](https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries).
    - `api/v1/query_range` - performs [PromQL range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
    - `api/v1/series` - performs [series query](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers).
    - `api/v1/labels` - returns a [list of label names](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names).
    - `api/v1/label/<label_name>/values` - returns values for the given `<label_name>` according [to API](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values).
    - `federate` - returns [federated metrics](https://prometheus.io/docs/prometheus/latest/federation/).
    - `api/v1/export` - exports raw data in JSON line format. See [this article](https://medium.com/@valyala/analyzing-prometheus-data-with-external-tools-5f3e5e147639) for details.
    - `api/v1/export/native` - exports raw data in native binary format. It may be imported into another VictoriaMetrics via `api/v1/import/native` (see above).
    - `api/v1/export/csv` - exports data in CSV. It may be imported into another VictoriaMetrics via `api/v1/import/csv` (see above).
    - `api/v1/status/tsdb` - for time series stats. See [these docs](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats) for details.
      VictoriaMetrics accepts optional `topN=N` and `date=YYYY-MM-DD` query args for this handler, where `N` is the number of top entries to return in the response
      and `YYYY-MM-DD` is the date for collecting the stats. By default the stats are collected for the current day.
    - `api/v1/status/active_queries` - for currently executed active queries. Note that every `vmselect` maintains an independent list of active queries,
      which is returned in the response.
    - `api/v1/status/top_queries` - for listing the most frequently executed queries and the queries with the longest execution time.

* URLs for [Graphite Metrics API](https://graphite-api.readthedocs.io/en/latest/api.html#the-metrics-api): `http://<vmselect>:8481/select/<accountID>/graphite/<suffix>`, where:
  - `<accountID>` is an arbitrary number identifying data namespace for the query (aka tenant)
  - `<suffix>` may have the following values:
    - `render` - implements Graphite Render API. See [these docs](https://graphite.readthedocs.io/en/stable/render_api.html). This functionality is available in [Enterprise package](https://victoriametrics.com/enterprise.html).
    - `metrics/find` - searches Graphite metrics. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find).
    - `metrics/expand` - expands Graphite metrics. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-expand).
    - `metrics/index.json` - returns all the metric names. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-index-json).
    - `tags/tagSeries` - registers time series. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb).
    - `tags/tagMultiSeries` - registers multiple time series. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb).
    - `tags` - returns tag names. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
    - `tags/<tag_name>` - returns tag values for the given `<tag_name>`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
    - `tags/findSeries` - returns series matching the given `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
    - `tags/autoComplete/tags` - returns tags matching the given `tagPrefix` and/or `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#auto-complete-support).
    - `tags/autoComplete/values` - returns tag values matching the given `valuePrefix` and/or `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#auto-complete-support).
    - `tags/delSeries` - deletes series matching the given `path`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#removing-series-from-the-tagdb).

* URL for query stats across all tenants: `http://<vmselect>:8481/api/v1/status/top_queries`. It lists the most frequently executed queries and the queries with the longest execution time.

* URL for time series deletion: `http://<vmselect>:8481/delete/<accountID>/prometheus/api/v1/admin/tsdb/delete_series?match[]=<timeseries_selector_for_delete>`.
  Note that the `delete_series` handler should be used only in exceptional cases such as deletion of accidentally ingested incorrect time series. It shouldn't
  be used on a regular basis, since it carries non-zero overhead.

* `vmstorage` nodes provide the following HTTP endpoints on `8482` port:
  - `/internal/force_merge` - initiate [forced compactions](https://victoriametrics.github.io/#forced-merge) on the given `vmstorage` node.
  - `/snapshot/create` - create [instant snapshot](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282),
    which can be used for backups in background. Snapshots are created in `<storageDataPath>/snapshots` folder, where `<storageDataPath>` is the corresponding
    command-line flag value.
  - `/snapshot/list` - list available snapshots.
  - `/snapshot/delete?snapshot=<id>` - delete the given snapshot.
  - `/snapshot/delete_all` - delete all the snapshots.

  Snapshots may be created independently on each `vmstorage` node. There is no need to synchronize snapshot creation
  across `vmstorage` nodes.
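The URL templates in this section can be assembled with a few shell helpers. This is a sketch under the assumption that the default ports are in use; `insert_url`, `select_url` and `delete_url` are hypothetical names, and the host names and tenant `42` are placeholders.

```bash
# Sketch: build tenant-scoped URLs from the templates in this section.
# $1 - host, $2 - accountID or accountID:projectID, $3 - suffix
insert_url() { echo "http://$1:8480/insert/$2/$3"; }
select_url() { echo "http://$1:8481/select/$2/prometheus/$3"; }
# $1 - host, $2 - accountID or accountID:projectID
delete_url() { echo "http://$1:8481/delete/$2/prometheus/api/v1/admin/tsdb/delete_series"; }

insert_url vminsert-1 42 prometheus/api/v1/write
# prints http://vminsert-1:8480/insert/42/prometheus/api/v1/write
select_url vmselect-1 42:7 api/v1/query
# prints http://vmselect-1:8481/select/42:7/prometheus/api/v1/query

# Against a live cluster (not executed here):
#   curl -s -G "$(select_url vmselect-1 42 api/v1/query)" --data-urlencode 'query=up'
#   curl -s -G "$(delete_url vmselect-1 42)" --data-urlencode 'match[]=<timeseries_selector_for_delete>'
```

Passing the query and `match[]` values through `--data-urlencode` avoids quoting problems with characters such as `{`, `}` and `"` in selectors.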
## Cluster resizing and scalability

Cluster performance and capacity scale with adding new nodes.

* `vminsert` and `vmselect` nodes are stateless and may be added / removed at any time.
  Do not forget to update the list of these nodes on the http load balancer.
  Adding more `vminsert` nodes scales data ingestion rate. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/175#issuecomment-536925841)
  about ingestion rate scalability.
  Adding more `vmselect` nodes scales select queries rate.
* `vmstorage` nodes own the ingested data, so they cannot be removed without data loss.
  Adding more `vmstorage` nodes scales cluster capacity.

Steps to add a `vmstorage` node:

1. Start a new `vmstorage` node with the same `-retentionPeriod` as existing nodes in the cluster.
2. Gradually restart all the `vmselect` nodes with new `-storageNode` arg containing `<new_vmstorage_host>:8401`.
3. Gradually restart all the `vminsert` nodes with new `-storageNode` arg containing `<new_vmstorage_host>:8400`.
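Steps 2 and 3 above need the full `-storageNode` list, including the new node, with a different port for each service. A minimal sketch of generating both lists follows; `storage_node_flags` is a hypothetical helper and the host names are placeholders.

```bash
# Sketch: print one -storageNode flag per vmstorage host.
# $1 - port (8401 for vmselect, 8400 for vminsert), remaining args - hosts
storage_node_flags() {
    port=$1; shift
    flags=
    for host in "$@"; do
        flags="$flags -storageNode=${host}:${port}"
    done
    printf '%s\n' "${flags# }"    # strip the leading space
}

storage_node_flags 8401 vmstorage-1 vmstorage-2 vmstorage-new
# prints -storageNode=vmstorage-1:8401 -storageNode=vmstorage-2:8401 -storageNode=vmstorage-new:8401
storage_node_flags 8400 vmstorage-1 vmstorage-2 vmstorage-new
# prints -storageNode=vmstorage-1:8400 -storageNode=vmstorage-2:8400 -storageNode=vmstorage-new:8400
```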
## Updating / reconfiguring cluster nodes

All the node types - `vminsert`, `vmselect` and `vmstorage` - may be updated via graceful shutdown.
Send `SIGINT` signal to the corresponding process, wait until it finishes and then start the new version
with new configs.

The cluster should remain in working state if at least a single node of each type remains available during
the update process. See [cluster availability](#cluster-availability) section for details.
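The graceful shutdown step can be scripted roughly as follows. This is a sketch only: `graceful_stop` is a hypothetical helper, the 60 second default timeout is an arbitrary assumption, and it is meant for processes that were not started by the calling shell (for example a PID read from a pid file).

```bash
# Sketch: send SIGINT to a process and wait until it exits.
# $1 - process ID, $2 - optional timeout in seconds (assumed default: 60)
# Returns 0 once the process is gone, 1 if it is still running after the timeout.
graceful_stop() {
    pid=$1
    timeout=${2:-60}
    kill -INT "$pid" 2>/dev/null || return 0    # nothing to stop
    while kill -0 "$pid" 2>/dev/null; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}

# Intended use (not executed here; the pid file path is a placeholder):
#   graceful_stop "$(cat /var/run/vmstorage.pid)" 120 && echo "stopped, start the new version"
```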
## Cluster availability

* HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
* The cluster remains available if at least a single `vmstorage` node exists:
  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
  - `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.

Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
## Capacity planning

Each instance type - `vminsert`, `vmselect` and `vmstorage` - can run on the most suitable hardware.

### vminsert

* The recommended total number of vCPU cores for all the `vminsert` instances can be calculated from the ingestion rate: `vCPUs = ingestion_rate / 150K`.
* The recommended number of vCPU cores per each `vminsert` instance should equal the number of `vmstorage` instances in the cluster.
* The amount of RAM per each `vminsert` instance should be 1GB or more. RAM is used as a buffer for spikes in ingestion rate.
  The maximum amount of used RAM per `vminsert` node can be tuned with `-memory.allowedPercent` or `-memory.allowedBytes` command-line flags.
  For instance, `-memory.allowedPercent=20` limits the maximum amount of used RAM to 20% of the available RAM on the host system.
* Sometimes `-rpc.disableCompression` command-line flag on `vminsert` instances could increase ingestion capacity at the cost
  of higher network bandwidth usage between `vminsert` and `vmstorage`.
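As a worked example of the `vCPUs = ingestion_rate / 150K` rule, the sketch below rounds the result up to a whole core. The rounding and the helper name `vcpus_for_ingestion` are assumptions of this example, not part of the recommendation itself.

```bash
# Sketch: total vCPU cores for a given ingestion rate (data points per second),
# rounded up to the next whole core.
vcpus_for_ingestion() {
    echo $(( ($1 + 149999) / 150000 ))
}

vcpus_for_ingestion 1000000    # prints 7
vcpus_for_ingestion 150000     # prints 1
```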
### vmstorage

* The recommended total number of vCPU cores for all the `vmstorage` instances can be calculated from the ingestion rate: `vCPUs = ingestion_rate / 150K`.
* The recommended total amount of RAM for all the `vmstorage` instances can be calculated from the number of active time series: `RAM = 2 * active_time_series * 1KB`.
  A time series is active if it received at least a single data point during the last hour or if it has been queried during the last hour.
  The required RAM per each `vmstorage` should be multiplied by `-replicationFactor` if [replication](#replication-and-data-safety) is enabled.
  Additional RAM can be required for query processing.
  Calculated RAM requirements may differ from actual RAM requirements due to various factors:
  * The average number of labels per time series. More labels require more RAM.
  * The average length of label names and label values. Longer labels require more RAM.
  * The type of queries. Heavy queries that scan a big number of time series over long time ranges require more RAM.
* The recommended total amount of storage space for all the `vmstorage` instances can be calculated
  from the ingestion rate and retention: `storage_space = ingestion_rate * retention_seconds`.
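The RAM and storage formulas above can be evaluated with shell arithmetic. In this sketch `1KB` is assumed to mean 1024 bytes, and the storage formula is evaluated exactly as written, so its result is the number of data points ingested over the retention period; the conversion to bytes is not stated here. Both helper names are hypothetical.

```bash
# Sketch: RAM = 2 * active_time_series * 1KB, optionally multiplied by
# the replication factor. Assumes 1KB = 1024 bytes; the result is in bytes.
# $1 - active time series, $2 - optional -replicationFactor (default 1)
vmstorage_ram_bytes() {
    echo $(( 2 * $1 * 1024 * ${2:-1} ))
}

# Sketch: storage_space = ingestion_rate * retention_seconds, as written above.
# $1 - ingestion rate (data points per second), $2 - retention in seconds
vmstorage_storage_space() {
    echo $(( $1 * $2 ))
}

vmstorage_ram_bytes 1000000              # prints 2048000000
vmstorage_ram_bytes 1000000 2            # prints 4096000000
vmstorage_storage_space 100000 2592000   # prints 259200000000 (30 days of retention)
```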
### vmselect

The recommended hardware for `vmselect` instances highly depends on the type of queries. Lightweight queries over a small number of time series usually require
a small number of vCPU cores and a small amount of RAM on `vmselect`, while heavy queries over a big number of time series (>10K) usually require
a bigger number of vCPU cores and bigger amounts of RAM.
In general it is recommended to increase the number of vCPU cores and RAM per `vmselect` node for higher query performance,
and to add new `vmselect` nodes only when the existing nodes are overloaded with the incoming query stream.
## High availability

It is recommended to run all the components for a single cluster in the same subnetwork with high bandwidth, low latency and low error rates.
This improves cluster performance and availability.
It isn't recommended to spread components for a single cluster across multiple availability zones, since cross-AZ network usually has lower bandwidth, higher latency
and higher error rates compared to the network inside a single AZ.

If you need a multi-AZ setup, then it is recommended to run independent clusters in each AZ and to set up
[vmagent](https://victoriametrics.github.io/vmagent.html) in front of these clusters, so it could replicate incoming data
into all the clusters. Then [promxy](https://github.com/jacksontj/promxy) could be used for querying the data from multiple clusters.
## Helm

Helm chart simplifies managing cluster version of VictoriaMetrics in Kubernetes.
It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-charts) repository.

## Kubernetes operator

[K8s operator](https://github.com/VictoriaMetrics/operator) simplifies managing VictoriaMetrics components in Kubernetes.
## Replication and data safety

In order to enable application-level replication, `-replicationFactor=N` command-line flag must be passed to `vminsert`.
This guarantees that all the data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
For example, when `-replicationFactor=3` is passed to `vminsert`, then it replicates all the ingested data to 3 distinct `vmstorage` nodes.

When replication is enabled, `-replicationFactor=N` and `-dedup.minScrapeInterval=1ms` command-line flags must be passed to `vmselect` nodes.
The `-replicationFactor=N` improves query performance when some `vmstorage` nodes respond slowly and/or are temporarily unavailable.
The `-dedup.minScrapeInterval=1ms` de-duplicates replicated data during queries. It is OK if `-dedup.minScrapeInterval` exceeds 1ms
when [deduplication](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#deduplication) is used in addition to replication.

Note that [replication doesn't save from disaster](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883),
so it is recommended to perform regular backups. See [these docs](#backups) for details.
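To keep the `vminsert` and `vmselect` settings consistent, the flags described above can be generated from a single replication factor. This is a sketch; `replication_flags` is a hypothetical helper that only prints the flags and the resulting `N-1` fault tolerance.

```bash
# Sketch: print the replication-related flags for a given replication factor.
# $1 - replication factor N
replication_flags() {
    n=$1
    echo "vminsert: -replicationFactor=$n"
    echo "vmselect: -replicationFactor=$n -dedup.minScrapeInterval=1ms"
    # Data stays queryable with up to N-1 vmstorage nodes unavailable
    echo "tolerated unavailable vmstorage nodes: $((n - 1))"
}

replication_flags 3
```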

By default VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath`.
It is recommended to store data on [Google Compute Engine persistent disks](https://cloud.google.com/compute/docs/disks/#pdspecs),
since they are protected from data loss and data corruption. They also provide consistently high performance
and [may be resized](https://cloud.google.com/compute/docs/disks/add-persistent-disk) without downtime.
HDD-based persistent disks should be enough for the majority of use cases.

It is recommended to use durable replicated persistent volumes in Kubernetes.
## Backups

It is recommended to perform periodic backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
for protecting from user errors such as accidental data deletion.

The following steps must be performed for each `vmstorage` node for creating a backup:

1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create a snapshot and return its name.
2. Archive the created snapshot from `<storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vmbackup.html).
   The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time.
3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space.

There is no need to synchronize backups among all the `vmstorage` nodes.
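The backup steps above can be scripted per `vmstorage` node. The sketch below assumes that `/snapshot/create` responds with a JSON body of the form `{"status":"ok","snapshot":"<name>"}` and that `vmbackup` accepts `-storageDataPath`, `-snapshotName` and `-dst` flags; check both against the docs for your version. `snapshot_name` is a hypothetical helper, and only the parsing step is executed here.

```bash
# Sketch: extract the snapshot name from the /snapshot/create response.
# Assumes a JSON body of the form {"status":"ok","snapshot":"<name>"}.
snapshot_name() {
    sed -n 's/.*"snapshot"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

echo '{"status":"ok","snapshot":"20210211-abc"}' | snapshot_name    # prints 20210211-abc

# Intended use against a live node (not executed here; values in <> are placeholders):
#   name=$(curl -s http://vmstorage:8482/snapshot/create | snapshot_name)
#   vmbackup -storageDataPath=<storageDataPath> -snapshotName="$name" -dst=<destination>
#   curl -s "http://vmstorage:8482/snapshot/delete?snapshot=$name"
```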

Restoring from backup:

1. Stop `vmstorage` node with `kill -INT`.
2. Restore data from backup using [vmrestore](https://victoriametrics.github.io/vmrestore.html) into `-storageDataPath` directory.
3. Start `vmstorage` node.
## Profiling

All the cluster components provide the following handlers for [profiling](https://blog.golang.org/profiling-go-programs):

* `http://vminsert:8480/debug/pprof/heap` for memory profile and `http://vminsert:8480/debug/pprof/profile` for CPU profile
* `http://vmselect:8481/debug/pprof/heap` for memory profile and `http://vmselect:8481/debug/pprof/profile` for CPU profile
* `http://vmstorage:8482/debug/pprof/heap` for memory profile and `http://vmstorage:8482/debug/pprof/profile` for CPU profile

Example command for collecting CPU profile from `vmstorage`:

```bash
curl -s http://vmstorage:8482/debug/pprof/profile > cpu.pprof
```

Example command for collecting memory profile from `vminsert`:

```bash
curl -s http://vminsert:8480/debug/pprof/heap > mem.pprof
```
## Community and contributions

We are open to third-party pull requests provided they follow [KISS design principle](https://en.wikipedia.org/wiki/KISS_principle):

- Prefer simple code and architecture.
- Avoid complex abstractions.
- Avoid magic code and fancy algorithms.
- Avoid [big external dependencies](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d).
- Minimize the number of moving parts in the distributed system.
- Avoid automated decisions, which may hurt cluster availability, consistency or performance.

Adhering to the `KISS` principle simplifies the resulting code and architecture, so it can be reviewed, understood and verified by many people.

Due to `KISS`, cluster version of VictoriaMetrics lacks the following "features" popular in the distributed computing world:

- Fragile gossip protocols. See [failed attempt in Thanos](https://github.com/improbable-eng/thanos/blob/030bc345c12c446962225221795f4973848caab5/docs/proposals/completed/201809_gossip-removal.md).
- Hard-to-understand-and-implement-properly [Paxos protocols](https://www.quora.com/In-distributed-systems-what-is-a-simple-explanation-of-the-Paxos-algorithm).
- Complex replication schemes, which may go nuts in unforeseen edge cases. See [replication docs](#replication-and-data-safety) for details.
- Automatic data reshuffling between storage nodes, which may hurt cluster performance and availability.
- Automatic cluster resizing, which may cost you a lot of money if improperly configured.
- Automatic discovering and addition of new nodes in the cluster, which may mix data between dev and prod clusters :)
- Automatic leader election, which may result in split brain disaster on network errors.
## Reporting bugs

Report bugs and propose new features [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).

## Victoria Metrics Logo

[Zip](VM_logo.zip) contains three folders with different image orientations (main color and inverted version).

Files included in each folder:

* 2 JPEG Preview files
* 2 PNG Preview files with transparent background
* 2 EPS Adobe Illustrator EPS10 files

### Logo Usage Guidelines

#### Font used:

* Lato Black
* Lato Regular

#### Color Palette:

* HEX [#110f0f](https://www.color-hex.com/color/110f0f)
* HEX [#ffffff](https://www.color-hex.com/color/ffffff)

### We kindly ask:

- Please don't use any font other than the suggested ones.
- There should be sufficient clear space around the logo.
- Do not change spacing, alignment, or relative locations of the design elements.
- Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.