diff --git a/docs/vmbackup.md b/docs/vmbackup.md index 2450dc241..58c1028aa 100644 --- a/docs/vmbackup.md +++ b/docs/vmbackup.md @@ -62,6 +62,10 @@ with the following command: ``` It saves time and network bandwidth costs by performing server-side copy for the shared data from the `-origin` to `-dst`. +Typical object storage just creates new names for already existing objects when performing server-side copy, +so this operation should be fast and inexpensive. Unfortunately, there are object storage systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/), +which make full copies for the copied objects during server-side copy. This may significantly slow down server-side copy +and make it very expensive. ### Incremental backups @@ -82,20 +86,24 @@ Smart backups mean storing full daily backups into `YYYYMMDD` folders and creati ./vmbackup -storageDataPath= -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs:///latest ``` -Where `` is the latest [snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots). -The command will upload only changed data to `gs:///latest`. +This command creates an [instant snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots) +and uploads it to `gs:///latest`. It uploads only the changed data (aka incremental backup). This saves network bandwidth costs and time +when backing up large amounts of data. * Run the following command once a day: ```console -./vmbackup -storageDataPath= -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs:/// -origin=gs:///latest +./vmbackup -storageDataPath= -origin=gs:///latest -dst=gs:/// ``` -Where `` is the snapshot for the last day ``. +This command creates server-side copy of the backup from `gs:///latest` to `gs:///`, were `` is the current +date like `20240125`. Server-side copy of the backup should be fast on most object storage systems, since it just creates new names for already +existing objects. The server-side copy can be slow on some object storage systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/), +since they may perform full object copy instead of creating new names for already existing objects. This may be slow and expensive. + +The `smart backups` approach described above saves network bandwidth costs on hourly backups (since they are incremental) +and allows recovering data from either the last hour (the `latest` backup) or from any day (`YYYYMMDD` backups). -This approach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup) -or from any day (`YYYYMMDD` backups). Because of this feature, it is not recommended to store `latest` data folder -in storages with expensive reads or additional archiving features (like [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/)). Note that hourly backup shouldn't run when creating daily backup. Do not forget to remove old backups when they are no longer needed in order to save storage costs. @@ -115,7 +123,9 @@ from `gs://bucket/foo` to `gs://bucket/bar`: The `-origin` and `-dst` must point to the same object storage bucket or to the same filesystem. The server-side backup copy is usually performed at much faster speed comparing to the usual backup, since backup data isn't transferred -between the remote storage and locally running `vmbackup` tool. +between the remote storage and locally running `vmbackup` tool. Object storage systems usually just make new names for already existing +objects during server-side copy. Unfortunately there are systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/), +which perform full object copy during server-side copying. This may be slow and expensive. If the `-dst` already contains some data, then its' contents is synced with the `-origin` data. This allows making incremental server-side copies of backups. diff --git a/docs/vmbackupmanager.md b/docs/vmbackupmanager.md index 025b57eb1..1b846b398 100644 --- a/docs/vmbackupmanager.md +++ b/docs/vmbackupmanager.md @@ -124,9 +124,11 @@ The result on the GCS bucket latest folder -Please note, `latest` data folder is used for [smart backups](https://docs.victoriametrics.com/vmbackup.html#smart-backups). -It is not recommended to store `latest` data folder in storages with expensive reads or additional archiving features -(like [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/)). +`vmbackupmanager` uses [smart backups](https://docs.victoriametrics.com/vmbackup.html#smart-backups) technique in order +to speed up backups and save both data transfer costs and data copying costs. This includes server-side copy of already existing +objects. Typical object storage systems implement server-side copy by creating new names for already existing objects. +This is very fast and efficient. Unfortunately there are systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/), +which perform full object copy during server-side copying. This may be slow and expensive. Please, see [vmbackup docs](https://docs.victoriametrics.com/vmbackup.html#advanced-usage) for more examples of authentication with different storage types.