Because CockroachDB is designed with high fault tolerance, backups are primarily needed for disaster recovery (i.e., if your cluster loses a majority of its nodes). Isolated issues (such as small-scale node outages) do not require any intervention. However, as an operational best practice, we recommend taking regular backups of your data.
There are two main types of backups:
You can use the BACKUP
statement to efficiently back up your cluster's schemas and data to popular cloud services such as AWS S3, Google Cloud Storage, or NFS, and the RESTORE
statement to efficiently restore schema and data as necessary. For more information, see Use Cloud Storage.
The BACKUP ... TO
and RESTORE ... FROM
syntax is deprecated as of v22.1 and will be removed in a future release.
We recommend using the BACKUP ... INTO {collectionURI}
syntax, which creates or adds to a backup collection in your storage location. For restoring backups, we recommend using RESTORE FROM {backup} IN {collectionURI}
with {backup}
being LATEST
or a specific subdirectory.
For guidance on the syntax for backups and restores, see the BACKUP
and RESTORE
examples.
We recommend using scheduled backups to automate daily backups of your cluster.
Supported products
The feature described on this page is available in CockroachDB Dedicated, CockroachDB Serverless, and CockroachDB Self-Hosted clusters when you are running customer-owned backups. For a full list of features, see Backup and restore product support.
Backup collections
A backup collection defines a set of backups and their metadata. The collection can contain multiple full backups and their subsequent incremental backups. The path to a backup is created using a date-based naming scheme and stored at the collection URI passed with the BACKUP
statement.
There are some specific cases where part of the collection data is stored at a different URI:
- A locality-aware backup. The backup collection will be stored according to the URIs passed with the
BACKUP
statement:BACKUP INTO LATEST IN {collectionURI}, {localityURI}, {localityURI}
. Here, thecollectionURI
represents the default locality. - As of v22.1, it is possible to store incremental backups at a different URI to the related full backup. This means that one or multiple storage locations can hold one backup collection.
By default, full backups are stored at the root of the collection's URI in a date-based path, and incremental backups are stored in the /incrementals
directory. The following example shows a backup collection created using these default values, where all backups reside in one storage bucket:
Collection URI:
|—— 2022
|—— 02
|—— 09-155340.13/
|—— Full backup files
[...]
|—— incrementals
|—— 2022
|—— 02
|—— 25-172907.21/
|—— 20220325
|—— 17921.23
|—— incremental backup files
SHOW BACKUPS IN {collectionURI}
will display a list of the full backup subdirectories at the collection's URI.
Alternately, the following directories also constitute a backup collection. There are multiple backups in two separate URIs. Each individual backup is a full backup and its related incremental backup(s). Despite using the incremental_location
option to store the incremental backup in an alternative location, that incremental backup is still part of this backup collection as it depends on the full backup in the first cloud storage bucket:
Collection URI
|—— 2022
|—— 02
|—— 09-155340.13/
|—— Full backup files
|—— 20220210/
|—— 155530.50/
|—— 16-143018.72/
|—— Full backup files
|—— incrementals
|—— 2022
|—— 02
|—— 25-172907.21/
|—— 20220325
|—— 17921.23
|—— incremental backup files
Explicit Incrementals URI
|—— 2022
|—— 02
|—— 25-172907.21/
|—— 20220325
|—— 17921.23
|—— incremental_location backup files
In the examples on this page, {collectionURI}
is a placeholder for the storage location that will contain the example backup.
Full backups
Full backups are now available to both core and Enterprise users.
Full backups contain an un-replicated copy of your data and can always be used to restore your cluster. These files are roughly the size of your data and require greater resources to produce than incremental backups. You can take full backups as of a given timestamp. Optionally, you can include the available revision history in the backup.
In most cases, it's recommended to take nightly full backups of your cluster. A cluster backup allows you to do the following:
- Restore table(s) from the cluster
- Restore database(s) from the cluster
- Restore a full cluster
Full cluster backups include Enterprise license keys. When you restore a full cluster backup that includes Enterprise license, the Enterprise license is also restored.
To set a target for the amount of backup data written to each backup file, use the bulkio.backup.file_size
cluster setting.
See the SET CLUSTER SETTING
page for more details on using cluster settings.
Take a full backup
To perform a full cluster backup, use the BACKUP
statement:
BACKUP INTO '{collectionURI}';
To restore a backup, use the RESTORE
statement, specifying what you want to restore as well as the collection's URI:
To restore the latest backup of a table:
RESTORE TABLE bank.customers FROM LATEST IN '{collectionURI}';
To restore the latest backup of a database:
RESTORE DATABASE bank FROM LATEST IN '{collectionURI}';
To restore the latest backup of your full cluster:
RESTORE FROM LATEST IN '{collectionURI}';
Note:A full cluster restore can only be run on a target cluster that has never had user-created databases or tables.
To restore a backup from a specific subdirectory:
RESTORE DATABASE bank FROM {subdirectory} IN '{collectionURI}';
To view the available backup subdirectories, use SHOW BACKUPS
.
Incremental backups
To take incremental backups, you need an Enterprise license.
If your cluster grows too large for daily full backups, you can take less frequent full backups (e.g., weekly) with daily incremental backups. Incremental backups are storage efficient and faster than full backups for larger clusters.
If you are taking backups on a regular cadence, we recommend creating a schedule for your backups.
Recommendations for incremental backup frequency
Incremental backups form chains between full backups. Each incremental backup contains only the data that has changed since a base set of backups. This base set of backups must include one full backup and can include many incremental backups, which are smaller and faster to produce than full backups. You can take incremental backups either as of a given timestamp or with full revision history.
Cockroach Labs recommends taking incremental backups every 10 minutes. CockroachDB supports up to 400 incremental backups between full backups. This may vary based on your specific use-case, so we recommend testing within your own environment and workloads. This can look like:
- A full backup taken daily with incrementals taken every hour for a total of 24 incremental backups.
- A full backup taken daily with incrementals taken every 10 minutes for a total of 144 incremental backups.
- A full backup taken daily with incrementals taken every 5 minutes for a total of 288 incremental backups.
- A full backup taken weekly with incrementals taken every hour for a total of 168 incremental backups.
- A full backup taken weekly with incrementals taken every 30 minutes for a total of 336 incremental backups.
Garbage collection and backups
Incremental backups with revision history are created by finding what data has been created, deleted, or modified since the timestamp of the last backup in the chain of backups. For the first incremental backup in a chain, this timestamp corresponds to the timestamp of the base (full) backup. For subsequent incremental backups, this timestamp is the timestamp of the previous incremental backup in the chain.
Garbage collection Time to Live (GC TTL) determines the period for which CockroachDB retains revisions of a key. If the GC TTL of the backup's target is shorter than the frequency at which you take incremental backups with revision history, then the revisions become susceptible to garbage collection before you have backed them up. This will cause the incremental backup with revision history to fail.
We recommend configuring the garbage collection period to be at least the frequency of incremental backups and ideally with a buffer to account for slowdowns. You can configure garbage collection periods using the ttlseconds
replication zone setting.
If an incremental backup is created outside of the garbage collection period, you will receive a protected ts verification error…
. To resolve this issue, see the Common Errors page.
If you are creating incremental backups as part of a backup schedule, protected timestamps will ensure the backup revision data is not garbage collected, which allows you to lower the GC TTL. See Protected timestamps and scheduled backups for more detail.
Take an incremental backup
Periodically run the BACKUP
command to take a full backup of your cluster:
> BACKUP INTO '{collectionURI}';
Then, create nightly incremental backups based off of the full backups you've already created. To append an incremental backup to the most recent full backup created at the collection's URI, use the LATEST
syntax:
> BACKUP INTO LATEST IN '{collectionURI}';
This will add the incremental backup to the default /incrementals
directory at the root of the backup collection's directory. With incremental backups in the /incrementals
directory, you can apply different lifecycle/retention policies from cloud storage providers to the /incrementals
directory as needed.
In v21.2 and earlier, incremental backups were stored in the same directory as their full backup (i.e., collectionURI/subdirectory
). If an incremental backup command points to a subdirectory with incremental backups created in v21.2 and earlier, v22.1 and later will write the incremental backup to the v21.2 default location. To use the v21.2 behavior on a backup that does not already contain incremental backups in the full backup subdirectory, use the incremental_location
option, as shown in this example.
If it's ever necessary, you can then use the RESTORE
statement to restore your cluster, database(s), and/or table(s). Restoring from incremental backups requires previous full and incremental backups.
To restore from the latest backup in the collection, stored in the default /incrementals
collection subdirectory, run:
RESTORE FROM LATEST IN '{collectionURI}';
To restore from a specific backup in the collection:
RESTORE FROM '{subdirectory}' IN '{collectionURI}';
For example:
RESTORE FROM '2023/03/23-213101.37' IN 's3://bucket/path?AUTH=implicit';
When you restore from an incremental backup, you're restoring the entire table, database, or cluster. CockroachDB uses both the latest (or a specific) incremental backup and the full backup during this process. You cannot restore an incremental backup without a full backup. Furthermore, it is not possible to restore over a table, database, or cluster with existing data. Refer to Restore types for detail on the types of backups you can restore.
RESTORE
will re-validate indexes when incremental backups are created from an older version (v20.2.2 and earlier or v20.1.4 and earlier), but restored by a newer version (v21.1.0+). These earlier releases may have included incomplete data for indexes that were in the process of being created.
Incremental backups with explicitly specified destinations
To explicitly control where your incremental backups go, use the incremental_location
option. By default, incremental backups are stored in the /incrementals
subdirectory at the root of the collection. However, there are some advanced cases where you may want to store incremental backups in a different storage location.
In the following examples, the {collectionURI}
specifies the storage location containing the full backup. The {explicit_incrementalsURI}
is the alternative location that you can store an incremental backup:
BACKUP INTO LATEST IN '{collectionURI}' AS OF SYSTEM TIME '-10s' WITH incremental_location = '{explicit_incrementalsURI}';
Although the incremental backup will be in a different storage location, it is still part of the logical backup collection.
A full backup must be present in the {collectionURI}
in order to take an incremental backup to the alternative {explicit_incrementalsURI}
. If there isn't a full backup present in {collectionURI}
when taking an incremental backup with incremental_location
, the error path does not contain a completed latest backup
will be returned.
For details on the backup directory structure when taking incremental backups with incremental_location
, see this incremental location directory structure example.
To take incremental backups that are stored in the same way as v21.2 and earlier, you can use the incremental_location
option. You can specify the same collectionURI
with incremental_location
and the backup will place the incremental backups in a date-based path under the full backup, rather than in the default /incrementals
directory:
BACKUP INTO LATEST IN '{collectionURI}' AS OF SYSTEM TIME '-10s' WITH incremental_location = '{collectionURI}';
When you append incrementals to this backup, they will continue to be stored in a date-based path under the full backup.
To restore an incremental backup that was taken using the incremental_location
option, you must run RESTORE
with the full backup's location and the incremental_location
option referencing the location passed in the original BACKUP
statement:
RESTORE TABLE movr.users FROM LATEST IN '{collectionURI}' WITH incremental_location = '{explicit_incrementalsURI}';
For details on cloud storage URLs, see Use Cloud Storage.
Examples
The examples in this section use one of the following storage URIs:
- External connections, which allow you to represent an external storage or sink URI. You can then specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the
CREATE EXTERNAL CONNECTION
page. - Amazon S3 connection strings with the default
AUTH=specified
parameter. For guidance on usingAUTH=implicit
authentication with Amazon S3 buckets instead, read Cloud Storage Authentication.
For guidance on connecting to other storage options or using other authentication parameters instead, read Use Cloud Storage.
Scheduled backups
You can use CREATE SCHEDULE FOR BACKUP
to set a recurring schedule for full and incremental backups. To create a schedule that includes incremental backups, you must have an Enterprise license.
Include the FULL BACKUP ALWAYS
clause for a schedule to take only full backups. For example, to create a schedule for taking full cluster backups:
CREATE SCHEDULE core_schedule_label
FOR BACKUP INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
RECURRING '@daily'
FULL BACKUP ALWAYS
WITH SCHEDULE OPTIONS first_run = 'now';
schedule_id | name | status | first_run | schedule | backup_stmt
---------------------+---------------------+--------+---------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
588799238330220545 | core_schedule_label | ACTIVE | 2020-09-11 00:00:00+00:00 | @daily | BACKUP INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' WITH detached
(1 row)
For more examples on how to schedule backups that take full and incremental backups, refer to CREATE SCHEDULE FOR BACKUP
.
Exclude a table's data from backups
In some situations, you may want to exclude a table's row data from a backup. For example, you have a table that contains high-churn data that you would like to garbage collect more quickly than the incremental backup schedule for the database or cluster holding the table. You can use the exclude_data_from_backup = true
parameter with a CREATE TABLE
or ALTER TABLE
statement to mark a table's row data for exclusion from a backup.
It is important to note that the backup produced contains an empty table with the data excluded. The backup process does not attempt to contact the KV storage layer ranges to retrieve data when exclude_data_from_backup = true
. Thus, backups will continue to succeed even when some ranges or regions are unavailable, if and only if, the unavailable ranges are in tables excluded from the backups.
Setting this parameter prevents the cluster or database backup from delaying GC TTL on the key span for this table, and it also respects the configured GC TTL. This is useful when you want to set a shorter garbage collection window for tables containing high-churn data to avoid an accumulation of unnecessary data and improves the reliability in partially unavailable clusters.
Using the movr
database as an example:
SHOW TABLES;
schema_name | table_name | type | owner | estimated_row_count | locality
--------------+----------------------------+-------+-------+---------------------+-----------
public | promo_codes | table | root | 1021 | NULL
public | rides | table | root | 730 | NULL
public | user_promo_codes | table | root | 58 | NULL
public | users | table | root | 211 | NULL
public | vehicle_location_histories | table | root | 10722 | NULL
public | vehicles | table | root | 69 | NULL
If the user_promo_codes
table's data does not need to be included in future backups, you can run the following to exclude the table's row data:
ALTER TABLE movr.user_promo_codes SET (exclude_data_from_backup = true);
Then back up the movr
database:
BACKUP DATABASE movr INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' AS OF SYSTEM TIME '-10s';
Restore the database with a new name:
RESTORE DATABASE movr FROM LATEST IN 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' WITH new_db_name = new_movr;
Move to the new database:
USE new_movr;
You'll find that the table schema is restored:
SHOW CREATE user_promo_codes;
table_name | create_statement
-------------------+------------------------------------------------------------------------------------------------------------------
user_promo_codes | CREATE TABLE public.user_promo_codes (
| city VARCHAR NOT NULL,
| user_id UUID NOT NULL,
| code VARCHAR NOT NULL,
| "timestamp" TIMESTAMP NULL,
| usage_count INT8 NULL,
| CONSTRAINT user_promo_codes_pkey PRIMARY KEY (city ASC, user_id ASC, code ASC),
| CONSTRAINT user_promo_codes_city_user_id_fkey FOREIGN KEY (city, user_id) REFERENCES public.users(city, id)
| ) WITH (exclude_data_from_backup = true)
However, the user_promo_codes
table has no row data:
SELECT * FROM user_promo_codes;
city | user_id | code | timestamp | usage_count
-----+---------+------+-----------+--------------
(0 rows)
To create a table with exclude_data_from_backup
, see Create a table with data excluded from backup.
Advanced examples
For examples of advanced BACKUP
and RESTORE
use cases, see:
- Incremental backups with a specified destination
- Backup with revision history and point-in-time restore
- Locality-aware backup and restore
- Encrypted backup and restore
- Restore into a different database
- Remove the foreign key before restore
- Restoring users from
system.users
backup - Show an incremental backup at a different location
- Exclude a table's data from backups
To take incremental backups, backups with revision history, locality-aware backups, and encrypted backups, you need an Enterprise license.