CockroachDB provides backup validation tools to check that backups you have in storage are restorable. Although a successful restore completely validates a backup, the validation tools offer a faster alternative and return an error message if a backup is not valid.
You can validate a backup of a cluster, database, or table backup with one of the following SHOW BACKUP
or RESTORE
statements. The three options result in increasing levels of backup validation:
SHOW BACKUP ... WITH check_files
: Check that all files belonging to a backup are in the expected location in storage. Refer to Validate backup files for an example.RESTORE ... WITH schema_only
: Restore the schema from the backup to verify that it is valid without restoring any rows. Refer to Validate a backup is restorable for an example.RESTORE ... WITH schema_only, verify_backup_table_data
: Run aschema_only
restore and have the restore read all rows from external storage, verify checksums, and discard the rows before writing them to disk. To useverify_backup_table_data
, you must includeschema_only
in the statement. Refer to Validate backup table data is restorable for an example.
The options that give the most validation coverage will increase the runtime of the check. That is, verify_backup_table_data
will take a longer time to validate a backup compared to check_files
or schema_only
alone. Despite that, each of these validation options provide a quicker way to validate a backup over running a "regular" restore.
You do not need an Enterprise license to run any of the backup validation tools described on this page.
Recommendations
Cockroach Labs recommends implementing the following validation plan to test your backups:
- Very frequent
schema_only
restores: Checks your credentials, schema issues, and version compatibility. - Frequent
verify_backup_table_data
restores: Tests that all data files are present and uncorrupted. - Somewhat frequent "full" restores: Validates a backup completely.
A schema_only
restore provides high confidence in the recoverability of your backup and a verify_backup_table_data
restore provides a slightly higher confidence as it checks for corruptions. Only a complete RESTORE
can provide full confidence in validating a backup. The actual frequency you should run backup validation tests depends on the criticality of your data, the rate of change in your database, and your organization's disaster recovery and business continuity requirements.
Supported products
The feature described on this page is available in CockroachDB Dedicated, CockroachDB Serverless, and CockroachDB Self-Hosted clusters when you are running customer-owned backups. For a full list of features, see Backup and restore product support.
Validate backup files
Using SHOW BACKUP
with the check_files
option, you can check that all SST and metadata files that belong to a backup are present in the storage location.
Take a backup that we'll use for each of the examples on this page:
BACKUP DATABASE movr INTO "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" AS OF SYSTEM TIME "-10s";
To find a specific backup to validate in the storage location, show the stored backups in the storage location:
SHOW BACKUPS IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}";
path ------------------------ 2022/09/19-134123.64 2022/09/19-134559.68 (2 rows)
Use
SHOW BACKUP ... check_files
with a backup for validation:SHOW BACKUP "2022/09/19-134123.64" IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH check_files;
This will return the following output after validating that the backup files are correct and present:
database_name | parent_schema_name | object_name | object_type | backup_type | start_time | end_time | size_bytes | rows | is_full_cluster | file_bytes ----------------+--------------------+----------------------------+-------------+-------------+------------+----------------------------+------------+-------+-----------------+------------- NULL | NULL | movr | database | full | NULL | 2022-09-19 13:41:23.645189 | NULL | NULL | f | NULL movr | NULL | public | schema | full | NULL | 2022-09-19 13:41:23.645189 | NULL | NULL | f | NULL movr | public | users | table | full | NULL | 2022-09-19 13:41:23.645189 | 31155 | 340 | f | 16598 movr | public | vehicles | table | full | NULL | 2022-09-19 13:41:23.645189 | 22282 | 113 | f | 12459 movr | public | rides | table | full | NULL | 2022-09-19 13:41:23.645189 | 261950 | 902 | f | 135831 movr | public | vehicle_location_histories | table | full | NULL | 2022-09-19 13:41:23.645189 | 742557 | 10850 | f | 318583 movr | public | promo_codes | table | full | NULL | 2022-09-19 13:41:23.645189 | 228320 | 1034 | f | 118376 movr | public | user_promo_codes | table | full | NULL | 2022-09-19 13:41:23.645189 | 9320 | 111 | f | 4832
The output will return
file_bytes
along with the columns you receive fromSHOW BACKUP
withoutcheck_files
. Thefile_bytes
column indicates the estimated bytes in external storage for a particular table object. For more detail on the output columns, see theSHOW BACKUP
Response table.If
SHOW BACKUP ... check_files
cannot read from a file, it will return an error message similar to the following:ERROR: The following files are missing from the backup: s3:/bucket-name/2022/09/19-134123.64/data/797981063156727810.sst
SHOW BACKUP ... check_files
will return up to ten file paths for incorrect or missing files.
Validate a backup is restorable
To validate that a backup is restorable, you can run RESTORE
with the schema_only
option, which will complete a restore without restoring any rows. This process is significantly faster than running a regular restore for the purposes of validation.
A schema_only
restore produces close to complete validation coverage on backups. However, this restore type does not read or write from any of the SST files, which store the backed-up rows. You can use SHOW BACKUP ... WITH check_files
in addition to a schema_only
restore to check that these SST files are present for a restore operation. Or, you can use schema_only
in combination with verify_backup_table_data
. Refer to Validate backup table data is restorable.
Validation of full-cluster restores with schema_only
must be run on an empty cluster in the same way as a complete full-cluster restore. Once you have successfully validated the restore, you can destroy the test cluster.
Run RESTORE
with the schema_only
option, specifying either LATEST
or the specific backup you would like to restore:
RESTORE DATABASE movr FROM "2022/09/19-134123.64" IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH schema_only;
job_id | status | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
797982663104856065 | succeeded | 1 | 0 | 0 | 0
(1 row)
You can also use the new_db_name
option to restore a database to a different name. For example, new_db_name = test_movr
.
Verify that the table schemas have been restored, but that the tables contain no rows:
SHOW TABLES FROM movr;
schema_name | table_name | type | owner | estimated_row_count | locality
--------------+----------------------------+-------+-------+---------------------+-----------
public | promo_codes | table | root | 0 | NULL
public | rides | table | root | 0 | NULL
public | user_promo_codes | table | root | 0 | NULL
public | users | table | root | 0 | NULL
public | vehicle_location_histories | table | root | 0 | NULL
public | vehicles | table | root | 0 | NULL
(6 rows)
You'll find the tables in place with no rows.
SELECT * FROM rides;
id | city | vehicle_city | rider_id | vehicle_id | start_address | end_address | start_time | end_time | revenue
-----+------+--------------+----------+------------+---------------+-------------+------------+----------+----------
(0 rows)
Once you have successfully validated the restore, you can revert the cluster to its pre-restore state by dropping the table or database:
DROP DATABASE movr CASCADE;
Validate backup table data is restorable
A restore with the verify_backup_table_data
option will perform a schema_only
restore, and:
- Read all rows from the storage location.
- Checksum the rows to ensure they are not corrupt.
- Discard the rows before they are written to disk.
Validation of full-cluster restores with schema_only
must be run on an empty cluster in the same way as a complete full-cluster restore. Once you have successfully validated the restore, you can destroy the test cluster.
Similarly, to just schema_only
restores, you'll find the table schemas restored. If a file is not present or unreadable in the backup, you'll receive an error.
Unlike a schema_only
restore, a verify_backup_table_data
restore also reads and checksums the rows to validate the backup.
It is necessary to include schema_only
when you run a restore with verify_backup_table_data
:
RESTORE DATABASE movr FROM LATEST IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH schema_only, verify_backup_table_data;