diff --git a/docs/Home.md b/docs/Home.md
index bcb28f838..116b338d8 100644
--- a/docs/Home.md
+++ b/docs/Home.md
@@ -8,4 +8,5 @@
 * [FAQ](FAQ)
 * [Cluster version](Cluster-VictoriaMetrics)
 * [Articles](Articles)
-
+* [vmbackup](vmbackup)
+* [vmrestore](vmrestore)
diff --git a/docs/vmbackup.md b/docs/vmbackup.md
new file mode 100644
index 000000000..247fb1acd
--- /dev/null
+++ b/docs/vmbackup.md
@@ -0,0 +1,181 @@
+## vmbackup
+
+`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
+
+Supported storage systems for backups:
+
+* [GCS](https://cloud.google.com/storage/). Example: `gcs:///`
+* [S3](https://aws.amazon.com/s3/). Example: `s3:///`
+* Any S3-compatible storage such as [MinIO](https://github.com/minio/minio). See the `-customS3Endpoint` command-line flag.
+* Local filesystem. Example: `fs://`
+
+Both incremental and full backups are supported. Incremental backups are created automatically if the destination path already contains data from the previous backup.
+Full backups can be sped up with `-origin` pointing to an already existing backup on the same remote storage. In this case `vmbackup` makes a server-side copy of the data shared
+between the existing backup and the new backup. This saves time and data transfer costs.
+
+The backup process can be interrupted at any time. It is automatically resumed from the interruption point when `vmbackup` is restarted with the same args.
+
+Backed up data can be restored with [vmrestore](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmrestore/README.md).
+
+See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
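Choosing between an incremental backup and a server-side copy from `-origin` comes down to comparing file listings. The Go sketch below illustrates that comparison. It is not `vmbackup`'s actual code. The names `FileInfo`, `Plan` and `planBackup` are hypothetical. The sketch also assumes two files are identical when both their relative path and size match, which is plausible only because snapshot files are immutable. Each file is placed in one of three groups, following the steps listed under "How does it work?": deleted from `-dst`, server-side copied from `-origin`, or uploaded from the snapshot.

```go
package main

import (
	"fmt"
	"sort"
)

// FileInfo is a hypothetical, simplified description of a snapshot file:
// a path relative to the snapshot root plus its size in bytes.
type FileInfo struct {
	Path string
	Size int64
}

// Plan lists the actions a backup run would take (illustration only).
type Plan struct {
	Delete []string // in -dst but no longer in the snapshot
	Copy   []string // missing in -dst, available in -origin: server-side copy
	Upload []string // missing in both -dst and -origin: upload from the snapshot
}

func toSet(files []FileInfo) map[FileInfo]bool {
	m := make(map[FileInfo]bool, len(files))
	for _, f := range files {
		m[f] = true
	}
	return m
}

// planBackup compares the snapshot, -dst and -origin listings.
func planBackup(snapshot, dst, origin []FileInfo) Plan {
	src, dstSet, originSet := toSet(snapshot), toSet(dst), toSet(origin)

	var p Plan
	// Files in -dst that are absent from the snapshot are stale: delete them.
	for _, f := range dst {
		if !src[f] {
			p.Delete = append(p.Delete, f.Path)
		}
	}
	// Files missing in -dst are copied from -origin when possible, else uploaded.
	for _, f := range snapshot {
		if dstSet[f] {
			continue // already backed up: nothing to transfer
		}
		if originSet[f] {
			p.Copy = append(p.Copy, f.Path)
		} else {
			p.Upload = append(p.Upload, f.Path)
		}
	}
	sort.Strings(p.Delete)
	sort.Strings(p.Copy)
	sort.Strings(p.Upload)
	return p
}

func main() {
	snapshot := []FileInfo{{"big/part1", 900}, {"small/new", 10}}
	dst := []FileInfo{{"small/old", 5}}
	origin := []FileInfo{{"big/part1", 900}}
	p := planBackup(snapshot, dst, origin)
	fmt.Println("delete:", p.Delete)
	fmt.Println("copy:", p.Copy)
	fmt.Println("upload:", p.Upload)
}
```

Running it prints `delete: [small/old]`, `copy: [big/part1]` and `upload: [small/new]`. The large file shared with the origin is copied on the storage side instead of being uploaded again.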
+
+
+### Use cases
+
+#### Regular backups
+
+A regular backup can be performed with the following command:
+
+```
+vmbackup -storageDataPath= -snapshotName= -dst=gcs:///
+```
+
+* `` is the path to VictoriaMetrics data pointed to by the `-storageDataPath` command-line flag in single-node VictoriaMetrics or in cluster `vmstorage`.
+  There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
+* `` is the snapshot to back up. See [how to create instant snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
+* `` is the name of an already existing [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
+* `` is the destination path where the new backup will be placed.
+
+
+#### Regular backups with server-side copy from existing backup
+
+If the destination GCS bucket already contains the previous backup at the `-origin` path, then the new backup can be sped up
+with the following command:
+
+```
+vmbackup -storageDataPath= -snapshotName= -dst=gcs:/// -origin=gcs:///
+```
+
+This saves time and network bandwidth costs by performing a server-side copy of the shared data from `-origin` to `-dst`.
+
+
+#### Incremental backups
+
+Incremental backups are performed if `-dst` points to an already existing backup. In this case only new data is uploaded to remote storage.
+This saves time and network bandwidth costs when working with big backups:
+
+```
+vmbackup -storageDataPath= -snapshotName= -dst=gcs:///
+```
+
+
+#### Smart backups
+
+Smart backups mean storing full daily backups in `YYYYMMDD` folders and creating incremental hourly backups in the `latest` folder:
+
+* Run the following command every hour:
+
+```
+vmbackup -snapshotName= -dst=gcs:///latest
+```
+
+Where `` is the latest [snapshot](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
+The command uploads only the changed data to `gcs:///latest`.
+
+* Run the following command once a day:
+
+```
+vmbackup -snapshotName= -dst=gcs:/// -origin=gcs:///latest
+```
+
+Where `` is the snapshot for the last day ``.
+
+
+This approach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup)
+or from any day (`YYYYMMDD` backups). Note that the hourly backup shouldn't run while the daily backup is being created.
+
+Do not forget to remove old snapshots and backups when they are no longer needed, in order to save storage costs.
+
+
+### How does it work?
+
+The backup algorithm is the following:
+
+1. Collect information about files in the `-snapshotName`, in the `-dst` and in the `-origin`.
+2. Determine files in `-dst` that are missing in `-snapshotName`, and delete them. These are usually small files that have already been merged into bigger files in the snapshot.
+3. Determine files from `-snapshotName` that are missing in `-dst`. These are usually small new files and bigger merged files.
+4. Determine files from step 3 that exist in the `-origin`, and perform a server-side copy of these files from `-origin` to `-dst`.
+   These are usually the biggest and the oldest files, which are shared between backups.
+5. Upload the remaining files from step 3 from `-snapshotName` to `-dst`.
+
+The algorithm splits source files into 100MB chunks in the backup. Each chunk is stored as a separate file in the backup.
+Such splitting minimizes the amount of data to re-transfer after temporary errors.
+
+`vmbackup` relies on [instant snapshot](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) properties:
+
+- All the files in the snapshot are immutable.
+- Old files are periodically merged into new files.
+- Smaller files have a higher probability of being merged.
+- Consecutive snapshots share many identical files.
+
+These properties allow performing fast and cheap incremental backups and server-side copying from `-origin` paths.
+See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
+`vmbackup` can work improperly or slowly when these properties are violated.
+
+
+### Troubleshooting
+
+* If the backup is slow, then try setting a higher value for the `-concurrency` flag. This increases the number of concurrent workers that upload data to backup storage.
+* If `vmbackup` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
+* If `vmbackup` has been interrupted due to a temporary error, then just restart it with the same args. It will resume the backup process.
+
+
+### Advanced usage
+
+Run `vmbackup -help` in order to see all the available options:
+
+```
+  -concurrency int
+        The number of concurrent workers. Higher concurrency may reduce backup duration (default 10)
+  -configFilePath string
+        Path to file with S3 configs. Configs are loaded from default location if not set.
+        See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
+  -configProfile string
+        Profile name for S3 configs (default "default")
+  -credsFilePath string
+        Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
+        See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
+  -customS3Endpoint string
+        Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
+  -dst string
+        Where to put the backup on the remote storage. Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
+        -dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
+  -loggerLevel string
+        Minimum level of errors to log. Possible values: INFO, ERROR, FATAL, PANIC (default "INFO")
+  -maxBytesPerSecond int
+        The maximum upload speed. There is no limit if it is set to 0
+  -memory.allowedPercent float
+        Allowed percent of system memory VictoriaMetrics caches may occupy (default 60)
+  -origin string
+        Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
+  -snapshotName string
+        Name for the snapshot to backup. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots
+  -storageDataPath string
+        Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
+  -version
+        Show VictoriaMetrics version
+```
+
+
+### How to build from sources
+
+It is recommended to use [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases). See the `vmutils-*` archives there.
+
+
+#### Development build
+
+1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
+2. Run `make vmbackup` from the root folder of the repository.
+   It builds the `vmbackup` binary and puts it into the `bin` folder.
+
+#### Production build
+
+1. [Install docker](https://docs.docker.com/install/).
+2. Run `make vmbackup-prod` from the root folder of the repository.
+   It builds the `vmbackup-prod` binary and puts it into the `bin` folder.
+
+#### Building docker images
+
+Run `make package-vmbackup`. It builds the `victoriametrics/vmbackup:` docker image locally.
+`` is an auto-generated image tag, which depends on the source code in the repository.
+The `` may be manually set via `PKG_TAG=foobar make package-vmbackup`.
diff --git a/docs/vmrestore.md b/docs/vmrestore.md
new file mode 100644
index 000000000..1e62d143c
--- /dev/null
+++ b/docs/vmrestore.md
@@ -0,0 +1,86 @@
+## vmrestore
+
+`vmrestore` restores data from backups created by [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md).
+VictoriaMetrics `v1.29.0` or newer must be used for working with the restored data.
+
+The restore process can be interrupted at any time. It is automatically resumed from the interruption point
+when `vmrestore` is restarted with the same args.
+
+
+### Usage
+
+VictoriaMetrics must be stopped during the restore process.
+
+```
+vmrestore -src=gcs:/// -storageDataPath=
+
+```
+
+* `` is the [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets) name.
+* `` is the path to the backup made with [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md) in the GCS bucket.
+* `` is the path to the folder where data will be restored. This folder must be passed
+  to VictoriaMetrics in the `-storageDataPath` command-line flag after the restore process is complete.
+
+The original `-storageDataPath` directory may contain old files. They will be substituted by the files from the backup.
+
+
+### Troubleshooting
+
+* If `vmrestore` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
+* If `vmrestore` has been interrupted due to a temporary error, then just restart it with the same args. It will resume the restore process.
+
+
+### Advanced usage
+
+Run `vmrestore -help` in order to see all the available options:
+
+```
+  -concurrency int
+        The number of concurrent workers. Higher concurrency may reduce restore duration (default 10)
+  -configFilePath string
+        Path to file with S3 configs. Configs are loaded from default location if not set.
+        See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
+  -configProfile string
+        Profile name for S3 configs (default "default")
+  -credsFilePath string
+        Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
+        See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
+  -customS3Endpoint string
+        Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
+  -loggerLevel string
+        Minimum level of errors to log. Possible values: INFO, ERROR, FATAL, PANIC (default "INFO")
+  -maxBytesPerSecond int
+        The maximum download speed. There is no limit if it is set to 0
+  -memory.allowedPercent float
+        Allowed percent of system memory VictoriaMetrics caches may occupy (default 60)
+  -src string
+        Source path with backup on the remote storage. Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
+  -storageDataPath string
+        Destination path where backup must be restored. VictoriaMetrics must be stopped when restoring from backup. -storageDataPath dir can be non-empty. In this case only missing data is downloaded from backup (default "victoria-metrics-data")
+  -version
+        Show VictoriaMetrics version
+```
+
+
+### How to build from sources
+
+It is recommended to use [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases). See the `vmutils-*` archives there.
+
+
+#### Development build
+
+1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
+2. Run `make vmrestore` from the root folder of the repository.
+   It builds the `vmrestore` binary and puts it into the `bin` folder.
+
+#### Production build
+
+1. [Install docker](https://docs.docker.com/install/).
+2. Run `make vmrestore-prod` from the root folder of the repository.
+   It builds the `vmrestore-prod` binary and puts it into the `bin` folder.
+
+#### Building docker images
+
+Run `make package-vmrestore`. It builds the `victoriametrics/vmrestore:` docker image locally.
+`` is an auto-generated image tag, which depends on the source code in the repository.
+The `` may be manually set via `PKG_TAG=foobar make package-vmrestore`.