Because of CockroachDB's multi-active availability design, you can perform a "rolling upgrade" of your CockroachDB cluster. This means that you can upgrade nodes one at a time without interrupting the cluster's overall health and operations.
This page describes how to upgrade to the latest v21.1 release, v21.1.21.
Terminology
Before upgrading, review the CockroachDB release terminology:
- A new major release is performed every 6 months. The major version number indicates the year of release followed by the release number, which will be either 1 or 2. For example, the latest major release is v23.1 (also written as v23.1.0).
- Each supported major release is maintained across patch releases that fix crashes, security issues, and data correctness issues. Each patch release increments the major version number with its corresponding patch number. For example, patch releases of v23.1 use the format v23.1.x.
- All major and patch releases are suitable for production usage, and are therefore considered "production releases". For example, the latest production release is v23.1.13.
- Prior to an upcoming major release, alpha and beta releases and release candidates are made available. These "testing releases" are not suitable for production usage. They are intended for users who need early access to a feature before it is available in a production release. These releases append the terms
alpha
,beta
, orrc
to the version number.
There are no "minor releases" of CockroachDB.
Step 1. Verify that you can upgrade
Run cockroach sql
against any node in the cluster to open the SQL shell. Then check your current cluster version:
> SHOW CLUSTER SETTING version;
Before upgrading from v20.2 to v21.1, you must ensure that any previously decommissioned nodes are fully decommissioned. Otherwise, they will block the upgrade. For instructions, see Check decommissioned nodes.
To upgrade to v21.1.21, you must be running either:
Any earlier v21.1 release: v21.1.0-alpha.1 to v21.1.19.
A v20.2 production release: v20.2.0 to v20.2.19.
If you are running any other version, take the following steps before continuing on this page:
Version | Action(s) before upgrading to any v21.1 release |
---|---|
Pre-v21.1 testing release | Upgrade to a corresponding production release; then upgrade through each subsequent major release, ending with a v20.2 production release. |
Pre-v20.2 production release | Upgrade through each subsequent major release, ending with a v20.2 production release. |
v20.2 testing release | Upgrade to a v20.2 production release. |
When you are ready to upgrade to v21.1.21, continue to step 2.
Step 2. Prepare to upgrade
Before starting the upgrade, complete the following steps.
Check load balancing
Make sure your cluster is behind a load balancer, or your clients are configured to talk to multiple nodes. If your application communicates with a single node, stopping that node to upgrade its CockroachDB binary will cause your application to fail.
Check cluster health
Verify the overall health of your cluster using the DB Console. On the Overview:
Under Node Status, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as suspect or dead, identify why the nodes are offline and either restart them or decommission them before beginning your upgrade. If there are dead and non-decommissioned nodes in your cluster, it will not be possible to finalize the upgrade (either automatically or manually).
Under Replication Status, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to identify and resolve the cause of range under-replication and/or unavailability before beginning your upgrade.
In the Node List:
- Make sure all nodes are on the same version. If any nodes are behind, upgrade them to the cluster's current version first, and then start this process over.
- Make sure capacity and memory usage are reasonable for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. Also go to Metrics > Dashboard: Hardware and make sure CPU percent is reasonable across the cluster. If there's not enough headroom on any of these metrics, consider adding nodes to your cluster before beginning your upgrade.
Check decommissioned nodes
If your cluster contains partially-decommissioned nodes, they will block an upgrade attempt.
To check the status of decommissioned nodes, run the
cockroach node status --decommission
command:cockroach node status --decommission
In the output, verify that the value of the
membership
field of each node isdecommissioned
. If any node'smembership
value isdecommissioning
, that node is not fully decommissioned.If any node is not fully decommissioned, try the following:
- Before upgrading from v20.2 to v21.1, you must manually change the status of any
decommissioning
nodes todecommissioned
. To do this, runcockroach node decommission
on these nodes and confirm that they update todecommissioned
. - For a cluster running v21.1 and above:
- First, reissue the decommission command. The second command typically succeeds within a few minutes.
- If the second decommission command does not succeed, recommission and then decommission it again. Before continuing the upgrade, the node must be marked as
decommissioned
.
- Before upgrading from v20.2 to v21.1, you must manually change the status of any
Review breaking changes
Review the backward-incompatible changes in v21.1 and deprecated features. If any affect your deployment, make the necessary changes before starting the rolling upgrade to v21.1.
Two changes that are particularly important to note:
As of v21.1, CockroachDB always uses the Pebble storage engine. As such,
pebble
is the default and only option for the--storage-engine
flag on thecockroach start
command. RocksDB can no longer be used as the storage engine.- If your cluster currently uses RocksDB as the storage engine, before you upgrade to v21.1, restart each of your nodes, removing
--storage-engine=rocksdb
from thecockroach start
command. You can follow the same rolling process described in step 4 below, but do not change the binary; just remove the--storage-engine=rocksdb
flag and restart.
- If your cluster currently uses RocksDB as the storage engine, before you upgrade to v21.1, restart each of your nodes, removing
Interleaving data was deprecated in v20.2, disabled by default in v21.1, and removed from CockroachDB in v21.2. For migration steps, see the interleave deprecation notice.
- If your cluster includes interleaved data and you perform backups, first make sure you are running v20.2.10+; then update your
BACKUP
commands to use theINCLUDE_DEPRECATED_INTERLEAVES
option; and only then return to this page and upgrade to v21.1. Note that theINCLUDE_DEPRECATED_INTERLEAVES
option is a no-op in v20.2.10, but this sequence is the only way to prevent backups including interleaved data from failing on v21.1.
- If your cluster includes interleaved data and you perform backups, first make sure you are running v20.2.10+; then update your
Step 3. Decide how the upgrade will be finalized
This step is relevant only when upgrading from v20.2.x to v21.1. For upgrades within the v21.1.x series, skip this step.
By default, after all nodes are running the new version, the upgrade process will be auto-finalized. This will enable certain features and performance improvements introduced in v21.1. However, it will no longer be possible to perform a downgrade to v20.2. In the event of a catastrophic failure or corruption, the only option will be to start a new cluster using the old binary and then restore from one of the backups created prior to performing the upgrade. For this reason, we recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade, but note that you will need to follow all of the subsequent directions, including the manual finalization in step 5:
Upgrade to v20.2, if you haven't already.
Start the
cockroach sql
shell against any node in the cluster.Set the
cluster.preserve_downgrade_option
cluster setting:> SET CLUSTER SETTING cluster.preserve_downgrade_option = '20.2';
It is only possible to set this setting to the current cluster version.
Features that require upgrade finalization
When upgrading from v20.2 to v21.1, certain features and performance improvements will be enabled only after finalizing the upgrade, including but not limited to:
Improved multi-region features: After finalization, it will be possible to use new and improved multi-region features, such as the ability to set database regions, survival goals, and table localities. Internal capabilities supporting these features, such as non-voting replicas and non-blocking transactions, will be available after finalization as well.
Empty arrays in GIN indexes: After finalization, newly created GIN indexes will contain rows containing empty arrays in
ARRAY
columns, which allows the indexes to be used for more queries. Note, however, that rows containingNULL
values in an indexed column will still not be included in GIN indexes.Virtual computed columns: After finalization, it will be possible to use the
VIRTUAL
keyword to define virtual computed columns.Changefeed support for primary key changes: After finalization, changefeeds will detect primary key changes.
Step 4. Perform the rolling upgrade
For each node in your cluster, complete the following steps. Be sure to upgrade only one node at a time, and wait at least one minute after a node rejoins the cluster to upgrade the next node. Simultaneously upgrading more than one node increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability.
We recommend creating scripts to perform these steps instead of performing them manually. Also, if you are running CockroachDB on Kubernetes, see our documentation on single-cluster and/or multi-cluster orchestrated deployments for upgrade guidance instead.
These steps perform an upgrade to the latest v21.1 release, v21.1.21.
Drain and stop the node using one of the following methods:
- If the node was started with a process manager, gracefully stop the node by sending
SIGTERM
with the process manager. If the node is not shutting down after 1 minute, sendSIGKILL
to terminate the process. When usingsystemd
, for example, setTimeoutStopSec=60
in your configuration template and runsystemctl stop <systemd config filename>
to stop the node withoutsystemd
restarting it. - If the node was started using
cockroach start
and is running in the foreground, pressctrl-c
in the terminal. - If the node was started using
cockroach start
and the--background
and--pid-file
flags, runkill <pid>
, where<pid>
is the process ID of the node.
Note:The amount of time you should wait before sending
SIGKILL
can vary depending on your cluster configuration and workload, which affects how long it takes your nodes to complete a graceful shutdown. In certain edge cases, forcefully terminating the process before the node has completed shutdown can result in temporary data unavailability, latency spikes, uncertainty errors, ambiguous commit errors, or query timeouts. If you need maximum cluster availability, you can runcockroach node drain
prior to node shutdown and actively monitor the draining process instead of automating it.Verify that the process has stopped:
$ ps aux | grep cockroach
Alternately, you can check the node's logs for the message
server drained and shutdown completed
.- If the node was started with a process manager, gracefully stop the node by sending
Download and install the CockroachDB binary you want to use:
$ curl https://binaries.cockroachdb.com/cockroach-v21.1.21.darwin-10.9-amd64.tgz|tar -xzf -
$ curl https://binaries.cockroachdb.com/cockroach-v21.1.21.linux-amd64.tgz|tar -xzf -
If you use
cockroach
in your$PATH
, rename the outdatedcockroach
binary, and then move the new one into its place:i="$(which cockroach)"; mv "$i" "$i"_old
$ cp -i cockroach-v21.1.21.darwin-10.9-amd64/cockroach /usr/local/bin/cockroach
i="$(which cockroach)"; mv "$i" "$i"_old
$ cp -i cockroach-v21.1.21.linux-amd64/cockroach /usr/local/bin/cockroach
Start the node to have it rejoin the cluster.
Warning:For maximum availability, do not wait more than a few minutes before restarting the node with the new binary. See this open issue for context.
Without a process manager like
systemd
, re-run thecockroach start
command that you used to start the node initially, for example:$ cockroach start \ --certs-dir=certs \ --advertise-addr=<node address> \ --join=<node1 address>,<node2 address>,<node3 address>
If you are using
systemd
as the process manager, run this command to start the node:$ systemctl start <systemd config filename>
Verify the node has rejoined the cluster through its output to
stdout
or through the DB Console.If you use
cockroach
in your$PATH
, you can remove the old binary:$ rm /usr/local/bin/cockroach_old
If you leave versioned binaries on your servers, you do not need to do anything.
After the node has rejoined the cluster, ensure that the node is ready to accept a SQL connection.
Unless there are tens of thousands of ranges on the node, it's usually sufficient to wait one minute. To be certain that the node is ready, run the following command:
cockroach sql -e 'select 1'
The command will automatically wait to complete until the node is ready.
Repeat these steps for the next node.
Step 5. Finish the upgrade
This step is relevant only when upgrading from v20.2.x to v21.1. For upgrades within the v21.1.x series, skip this step.
If you disabled auto-finalization in step 3, monitor the stability and performance of your cluster for as long as you require to feel comfortable with the upgrade (generally at least a day). If during this time you decide to roll back the upgrade, repeat the rolling restart procedure with the old binary.
Once you are satisfied with the new version:
Start the
cockroach sql
shell against any node in the cluster.Re-enable auto-finalization:
> RESET CLUSTER SETTING cluster.preserve_downgrade_option;
Note:This statement can take up to a minute to complete, depending on the amount of data in the cluster, as it kicks off various internal maintenance and migration tasks. During this time, the cluster will experience a small amount of additional load.
Check the cluster version to confirm that the finalize step has completed:
> SHOW CLUSTER SETTING version;
Troubleshooting
After the upgrade has finalized (whether manually or automatically), it is no longer possible to downgrade to the previous release. If you are experiencing problems, we therefore recommend that you:
Run the
cockroach debug zip
command against any node in the cluster to capture your cluster's state.Reach out for support from Cockroach Labs, sharing your debug zip.
In the event of catastrophic failure or corruption, the only option will be to start a new cluster using the old binary and then restore from one of the backups created prior to performing the upgrade.