Why is my process hanging when I try to start nodes with the --background flag?
Cockroach Labs recommends against using the --background flag when starting a cluster. In production, operators usually use a process manager like systemd to start and manage the cockroach process on each node. Refer to Deploy CockroachDB On-Premises. When testing locally, starting nodes in the foreground is recommended so you can monitor the runtime closely.
If you do use --background, you should also set --pid-file. To stop or restart a cluster, send the SIGTERM signal to the process ID in the PID file.
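As a minimal Python sketch of that step, assuming the node was started with --pid-file and that the file contains only the numeric process ID, a script could read the PID and send SIGTERM. The function name and the example path in the comment are hypothetical, chosen for illustration:

```python
import os
import signal
from pathlib import Path


def send_sigterm_from_pid_file(pid_file: str) -> int:
    """Read a process ID from pid_file and send SIGTERM to that process.

    Assumes the file holds a single integer PID, optionally surrounded by
    whitespace. Returns the PID that was signaled.
    """
    pid = int(Path(pid_file).read_text().strip())
    os.kill(pid, signal.SIGTERM)  # ask the process to terminate
    return pid


# Example call (hypothetical path; use the value you passed to --pid-file):
# send_sigterm_from_pid_file("/var/run/cockroach.pid")
```

The sketch only delivers the signal; it does not wait for the process to exit.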
Check whether you have previously run a multi-node cluster using the same data directory. If you have not, refer to Troubleshoot Cluster Setup.
If you have previously started and stopped a multi-node cluster, and are now trying to bring it back up, note the following:
The --background flag of cockroach start causes the start command to wait until the node has fully initialized and is able to start serving queries. In addition, to keep your data consistent, CockroachDB waits until a majority of nodes are running. This means that if only one node of a three-node cluster is running, that one node will not be operational.
As a result, starting nodes with the --background flag will cause cockroach start to hang until a majority of nodes are fully initialized.
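The majority rule can be illustrated with a short Python sketch. It is a simplified, cluster-level model of the statement above, not a description of CockroachDB's internals, and the function names are hypothetical:

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def has_majority(running_nodes: int, total_nodes: int) -> bool:
    """True when the running nodes form a majority of the cluster."""
    return running_nodes >= majority(total_nodes)


# One running node out of three is not a majority; two out of three is.
print(has_majority(1, 3))  # False
print(has_majority(2, 3))  # True
```

Under this model, the first node of a three-node cluster started with --background keeps waiting until a second node is up.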
To restart your cluster, you should either:
- Use multiple terminal windows to start multiple nodes in the foreground.
- Start each node in the background using your shell's functionality (e.g., cockroach start &) instead of using the --background flag.
Why is memory usage increasing despite lack of traffic?
Like most databases, CockroachDB caches the most recently accessed data in memory so that it can provide faster reads, and its periodic writes of time-series data cause that cache size to increase until it hits its configured limit. For information about manually controlling the cache size, see Recommended Production Settings.
Why is disk usage increasing despite lack of writes?
By default, DB Console stores time-series cluster metrics within the cluster. By default, data is retained at 10-second granularity for 10 days, and at 30-minute granularity for 90 days. An automatic job periodically runs and prunes historical data. For the first several days of your cluster's life, the cluster's time-series data grows continually.
CockroachDB writes about 15 KiB per second per node to the time-series database. About half of that is optimized away by the storage engine, leaving roughly 8 KiB per second. The amount of data stored in the time-series database per node can therefore be estimated as:
8 KiB/second * 3600 seconds/hour * 24 hours/day * number of days
For the first 10 days of your cluster's life, you can expect storage per node to increase by about the following amount:
8 * 3600 * 24 * 10 = 6912000 KiB
or about 6.6 GiB. With on-disk compression, the actual disk usage is likely to be about 4 GiB.
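The same estimate can be reproduced with a small Python sketch. It assumes the rounded figure of 8 KiB per second per node from the text above and ignores on-disk compression; the function names are hypothetical:

```python
KIB_PER_GIB = 1024 * 1024
SECONDS_PER_DAY = 24 * 3600


def timeseries_kib(days: int, kib_per_second: int = 8) -> int:
    """Estimated uncompressed time-series data per node, in KiB."""
    return kib_per_second * SECONDS_PER_DAY * days


def kib_to_gib(kib: float) -> float:
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


ten_days = timeseries_kib(10)
print(ten_days)                         # 6912000
print(round(kib_to_gib(ten_days), 2))   # 6.59
```

The estimate applies to the initial growth period only, since older data is later pruned according to the retention settings.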
However, depending on your usage of time-series charts in the DB Console, you may prefer to reduce the amount of disk used by time-series data. To reduce the amount of time-series data stored, or to disable it altogether, refer to Can I reduce or disable the storage of time-series data?
What is the internal-delete-old-sql-stats process and why is it consuming my resources?
When a query is executed, a process records query execution statistics on system tables. This is done by recording SQL statement fingerprints.
The CockroachDB internal-delete-old-sql-stats process cleans up query execution statistics collected on system tables, including system.statement_statistics and system.transaction_statistics. These system tables have a default row limit of 1 million, set by the sql.stats.persisted_rows.max cluster setting. When this limit is exceeded, there is an hourly cleanup job that deletes all of the data that surpasses the row limit, starting with the oldest data first. For more information about the cleanup job, use the following query:
> SELECT * FROM crdb_internal.jobs WHERE job_type='AUTO SQL STATS COMPACTION';
In general, the internal-delete-old-sql-stats process is not expected to impact cluster performance. In a few cases, CPU usage has spiked because a very large amount of data was being processed; those cases were resolved through workload optimizations and general improvements over time.
Can I reduce or disable the storage of time-series data?
Yes, you can either reduce the interval for time-series storage or disable time-series storage entirely.
After reducing or disabling time-series storage, it can take up to 24 hours for time-series data to be deleted and for the change to be reflected in DB Console metrics.
Reduce the interval for time-series storage
To reduce the interval for storage of time-series data:
- For data stored at 10-second resolution, reduce the timeseries.storage.resolution_10s.ttl cluster setting to an INTERVAL value less than 240h0m0s (10 days).
For example, to change the storage interval for time-series data at 10s resolution to 5 days, run the following SET CLUSTER SETTING command:
> SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '120h0m0s';
> SHOW CLUSTER SETTING timeseries.storage.resolution_10s.ttl;
timeseries.storage.resolution_10s.ttl
+---------------------------------------+
120:00:00
(1 row)
This setting has no effect on time-series data aggregated at 30-minute resolution, which is stored for 90 days by default.
- For data stored at 30-minute resolution, reduce the timeseries.storage.resolution_30m.ttl cluster setting to an INTERVAL value less than 2160h0m0s (90 days).
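The interval values above follow a simple pattern: the number of days times 24, written as hours. As a sketch under that assumption, the following Python helper builds the interval string and the matching SET CLUSTER SETTING statement for either resolution. The helper names are hypothetical; the setting names and default retention periods are the ones given in this section:

```python
# Default retention in days for each resolution, as stated above.
DEFAULT_DAYS = {"10s": 10, "30m": 90}
SETTING_NAMES = {
    "10s": "timeseries.storage.resolution_10s.ttl",
    "30m": "timeseries.storage.resolution_30m.ttl",
}


def ttl_interval(days: int) -> str:
    """Format a whole number of days as an hours-based interval string."""
    if days < 0:
        raise ValueError("days must not be negative")
    return f"{days * 24}h0m0s"


def set_ttl_statement(resolution: str, days: int) -> str:
    """Build a statement that reduces retention below the default."""
    if days >= DEFAULT_DAYS[resolution]:
        raise ValueError("choose a value below the default retention")
    return (
        f"SET CLUSTER SETTING {SETTING_NAMES[resolution]} "
        f"= '{ttl_interval(days)}';"
    )


print(set_ttl_statement("10s", 5))
# SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '120h0m0s';
```

The helper only produces text; the statement still has to be run in a SQL shell.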
Cockroach Labs recommends that you avoid increasing the period of time that DB Console retains time-series metrics. If you need to retain this data for a longer period, consider using a third-party tool such as Prometheus to collect the cluster's metrics and disabling the DB Console's collection of time-series metrics. Refer to Monitoring and Alerting.
Disable time-series storage
Disabling time-series storage is recommended only if you exclusively use a third-party tool such as Prometheus for time-series monitoring. Prometheus and other such tools do not rely on CockroachDB-stored time-series data; instead, they ingest metrics exported by CockroachDB from memory and then store the data themselves.
When storage of time-series metrics is disabled, the Metrics dashboards in the DB Console are still available, but their visualizations are blank because the dashboards rely on data that is no longer available.
To disable the storage of time-series data, run the following command:
> SET CLUSTER SETTING timeseries.storage.enabled = false;
> SHOW CLUSTER SETTING timeseries.storage.enabled;
timeseries.storage.enabled
+----------------------------+
false
(1 row)
This setting only prevents the collection of new time-series data. To also delete all existing time-series data, change both the timeseries.storage.resolution_10s.ttl and timeseries.storage.resolution_30m.ttl cluster settings:
> SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '0s';
> SET CLUSTER SETTING timeseries.storage.resolution_30m.ttl = '0s';
Historical data is not deleted immediately, but is eventually removed by a background job within 24 hours.
What happens when a node runs out of disk space?
When a node runs out of disk space, it shuts down and cannot be restarted until space is freed up.
To prepare for this case, CockroachDB automatically creates an emergency ballast file in each node's storage directory that can be deleted to free up enough space to be able to restart the node.
For more information about troubleshooting disk usage issues, see storage issues.
In addition to using ballast files, it is important to actively monitor remaining disk space.
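As one way to monitor remaining disk space, the following Python sketch checks the free fraction of the filesystem that holds a given directory. The 15% warning threshold and the function names are assumptions for illustration, not CockroachDB recommendations:

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_disk(path: str, min_free_fraction: float = 0.15) -> bool:
    """True when free space falls below the chosen (hypothetical) threshold."""
    return free_fraction(path) < min_free_fraction


# Example: check the filesystem that holds the current directory.
if is_low_on_disk("."):
    print("warning: disk space is running low")
```

In practice you would point the check at each node's storage directory and feed the result into your alerting system.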
Why would increasing the number of nodes not result in more operations per second?
If queries operate on different data, then increasing the number of nodes should improve the overall throughput (transactions/second or QPS).
However, if your queries operate on the same data, you may be observing transaction contention. For details, see Transaction Contention.
Why does CockroachDB collect anonymized cluster usage details by default?
Cockroach Labs collects information about CockroachDB's real-world usage to help prioritize the development of product features. Reporting is enabled by default to strengthen the information collected, and we are careful to send only anonymous, aggregate usage statistics. For details on what information is collected and how to opt out, see Diagnostics Reporting.
What happens when node clocks are not properly synchronized?
CockroachDB requires moderate levels of clock synchronization to preserve data consistency. For this reason, when a node detects that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed, it spontaneously shuts down. The maximum offset defaults to 500ms but can be changed via the --max-offset flag when starting each node.
While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions. It's therefore important to prevent clocks from drifting too far by running NTP or other clock synchronization software on each node.
In very rare cases, CockroachDB can momentarily run with a stale clock. This can happen when using vMotion, which can suspend a VM running CockroachDB, migrate it to different hardware, and resume it. This will cause CockroachDB to be out of sync for a short period before it jumps to the correct time. During this window, it would be possible for a client to read stale data and write data derived from stale reads. By enabling the server.clock.forward_jump_check_enabled cluster setting, you can be alerted when the CockroachDB clock jumps forward, indicating it had been running with a stale clock. To protect against this on vMotion, however, use the --clock-device flag to specify a PTP hardware clock for CockroachDB to use when querying the current time. When doing so, you should not enable server.clock.forward_jump_check_enabled because forward jumps will be expected and harmless. For more information on how --clock-device interacts with vMotion, see this blog post.
Considerations
When setting up clock synchronization:
- All nodes in the cluster must be synced to the same time source, or to different sources that implement leap second smearing in the same way. For example, Google and Amazon have time sources that are compatible with each other (they implement leap second smearing in the same way), but are incompatible with the default NTP pool (which does not implement leap second smearing).
- For nodes running in AWS, we recommend Amazon Time Sync Service. For nodes running in GCP, we recommend Google's internal NTP service. For nodes running elsewhere, we recommend Google Public NTP. Note that the Google and Amazon time services can be mixed with each other, but they cannot be mixed with other time services (unless you have verified leap second behavior). Either all of your nodes should use the Google and Amazon services, or none of them should.
- If you do not want to use the Google or Amazon time sources, you can use chrony and enable client-side leap smearing, unless the time source you're using already does server-side smearing. In most cases, we recommend the Google Public NTP time source because it handles smearing the leap second. If you use a different NTP time source that doesn't smear the leap second, you must configure client-side smearing manually and do so in the same way on each machine.
- Do not run more than one clock sync service on VMs where cockroach is running.
- For new clusters using the multi-region SQL abstractions, Cockroach Labs recommends lowering the --max-offset setting to 250ms. This setting is especially helpful for lowering the write latency of global tables. Nodes can run with different values for --max-offset, but only for the purpose of updating the setting across the cluster using a rolling upgrade.
Tutorials
For guidance on synchronizing clocks, see the tutorial for your deployment environment:
Environment | Featured Approach |
---|---|
On-Premises | Use NTP with Google's external NTP service. |
AWS | Use the Amazon Time Sync Service. |
Azure | Disable Hyper-V time synchronization and use NTP with Google's external NTP service. |
Digital Ocean | Use NTP with Google's external NTP service. |
GCE | Use NTP with Google's internal NTP service. |
How can I tell how well node clocks are synchronized?
As explained in more detail in our monitoring documentation, each CockroachDB node exports a wide variety of metrics at http://<host>:<http-port>/_status/vars in the format used by the popular Prometheus time-series database. Two of these metrics report how close each node's clock is to the clocks of all other nodes:
Metric | Definition |
---|---|
clock_offset_meannanos | The mean difference between the node's clock and other nodes' clocks in nanoseconds |
clock_offset_stddevnanos | The standard deviation of the difference between the node's clock and other nodes' clocks in nanoseconds |
As described in the above answer, a node will shut down if the mean offset of its clock from the other nodes' clocks exceeds 80% of the maximum offset allowed. It's recommended to monitor the clock_offset_meannanos metric and alert if it's approaching the 80% threshold of your cluster's configured max offset.
You can also see these metrics in the Clock Offset graph on the DB Console.
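If you prefer to script the check, the following Python sketch parses the text returned by the /_status/vars endpoint and compares clock_offset_meannanos against the 80% shutdown threshold. It assumes the metric appears on its own line, optionally with a label set in braces, followed by a numeric value; the function names and the 75% warning fraction are hypothetical choices for illustration:

```python
import re
from typing import Optional

# Matches e.g. "clock_offset_meannanos 12345" or
# "clock_offset_meannanos{...} 1.2e+06" at the start of a line.
_MEAN_OFFSET = re.compile(
    r"^clock_offset_meannanos(?:\{[^}]*\})?\s+(\S+)", re.MULTILINE
)


def mean_offset_nanos(metrics_text: str) -> Optional[float]:
    """Extract the mean clock offset in nanoseconds, if present."""
    match = _MEAN_OFFSET.search(metrics_text)
    return float(match.group(1)) if match else None


def shutdown_threshold_nanos(max_offset_ms: float = 500.0) -> float:
    """80% of the maximum offset, converted to nanoseconds."""
    return max_offset_ms * 1_000_000 * 0.8


def is_near_threshold(
    metrics_text: str, max_offset_ms: float = 500.0, warn_fraction: float = 0.75
) -> bool:
    """True when the mean offset reaches warn_fraction of the threshold."""
    offset = mean_offset_nanos(metrics_text)
    if offset is None:
        return False
    limit = warn_fraction * shutdown_threshold_nanos(max_offset_ms)
    return abs(offset) >= limit
```

The metrics text itself can be fetched with any HTTP client, for example urllib.request.urlopen from the standard library. With the default 500ms maximum offset, the shutdown threshold is 400ms, so this sketch would warn at a mean offset of 300ms or more in either direction.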
How do I prepare for planned node maintenance?
Perform a node shutdown to temporarily stop a node that you plan to restart.