Distributed Dashboard

Warning:
CockroachDB v22.1 is no longer supported. For more details, see the Release Support Policy.

The Distributed dashboard lets you monitor important distribution layer health and performance metrics.

To view this dashboard, access the DB Console, click Metrics in the left-hand navigation, and then select Dashboard > Distributed.

Dashboard navigation

Use the Graph menu to display metrics for your entire cluster or for a specific node:

  • When set to Graph: Cluster, data is aggregated across all nodes in your cluster.
  • When set to Graph: {node}, only data for the selected node is shown.

To the right of the Graph and Dashboard menus, a time interval selector allows you to filter the view for a predefined or custom time interval. Use the navigation buttons to move to the previous, next, or current time interval. When you select a time interval, the same interval is selected in the SQL Activity pages. However, if you select 10 or 30 minutes, the interval defaults to 1 hour in SQL Activity pages.

When viewing graphs, a tooltip appears at your mouse cursor with further detail about the data under the cursor. Click anywhere within the graph to pin the tooltip in place, decoupling it from your mouse movements. Click anywhere within the graph again to make the tooltip follow your mouse once more.

Note:

All timestamps in the DB Console are shown in Coordinated Universal Time (UTC).

The Distributed dashboard displays the following time series graphs:

Batches

DB Console batches graph

The Batches graph displays various details about BatchRequest traffic in the Distribution layer.

Hovering over the graph displays values for the following metrics:

Metric | Description
Batches | The number of BatchRequests made, as tracked by the distsender.batches metric.
Partial Batches | The number of partial BatchRequests made, as tracked by the distsender.batches.partial metric.
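
The metrics named in this table can also be inspected outside the DB Console. The following is a minimal sketch, assuming the node exposes Prometheus-format metrics (for example at its /_status/vars endpoint) with the dots in metric names replaced by underscores, so that distsender.batches appears as distsender_batches. The sample text, its values, and the helper name parse_metrics are illustrative assumptions, not output captured from a real cluster.

```python
# Sketch: read distsender counters from Prometheus-style text output.
# SAMPLE is made-up illustrative data, not output from a real cluster.
SAMPLE = """\
# TYPE distsender_batches counter
distsender_batches 1200
# TYPE distsender_batches_partial counter
distsender_batches_partial 300
"""


def parse_metrics(text):
    """Return {metric_name: value}; a later line with the same name wins."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop any {label="..."} suffix
        metrics[name] = float(value)
    return metrics


metrics = parse_metrics(SAMPLE)
# Share of all batches that were partial batches, using the two counters.
partial_share = metrics["distsender_batches_partial"] / metrics["distsender_batches"]
print(f"partial batches: {partial_share:.0%} of all batches")
```

Parsing a text sample rather than calling a live endpoint keeps the sketch self-contained; against a real node, the same function could be applied to the body of an HTTP response.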

RPCs

DB Console RPCs graph

The RPCs graph displays various details about RPC traffic in the Distribution layer.

Hovering over the graph displays values for the following metrics:

Metric | Description
RPCs Sent | The number of RPC calls made, as tracked by the distsender.rpc.sent metric.
Local Fast-path | The number of local fast-path RPC calls made, as tracked by the distsender.rpc.sent.local metric.

RPC Errors

DB Console RPC errors graph

The RPC Errors graph displays various details about RPC errors encountered in the Distribution layer.

Hovering over the graph displays values for the following metrics:

Metric | Description
Replica Errors | The number of RPCs sent due to per-replica errors, as tracked by the distsender.rpc.sent.nextreplicaerror metric.
Not Leaseholder Errors | The number of NotLeaseHolderErrors logged, as tracked by the distsender.errors.notleaseholder metric.

KV Transactions

DB Console KV transactions graph

The KV Transactions graph displays various details about transactions in the Transaction layer.

Hovering over the graph displays values for the following metrics:

Metric | Description
Committed | The number of committed KV transactions (including fast-path), as tracked by the txn.commits metric.
Fast-path Committed | The number of committed one-phase KV transactions, as tracked by the txn.commits1PC metric.
Aborted | The number of aborted KV transactions, as tracked by the txn.aborts metric.

KV Transaction Durations: 99th percentile

DB Console KV transaction durations: 99th percentile graph

The KV Transaction Durations: 99th percentile graph displays the 99th percentile of transaction durations over a one-minute period.

Hovering over the graph displays values for the following metrics:

Metric | Description
<node> | The 99th percentile of transaction durations observed over a one-minute period for that node, as calculated from the txn.durations metric.
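
To make "the 99th percentile over a one-minute period" concrete, here is a minimal sketch using the nearest-rank method on raw samples. It is an illustration only: the sample durations are invented, nearest_rank_percentile is a hypothetical helper, and the DB Console derives its values from the txn.durations metric rather than from code like this.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest rank; pct is expected to be an integer from 1 to 100.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical transaction durations (ms) observed during one minute:
# 98 fast transactions and 2 slow ones.
durations_ms = [5] * 98 + [250, 900]

p90 = nearest_rank_percentile(durations_ms, 90)
p99 = nearest_rank_percentile(durations_ms, 99)
print(p90, p99)
```

With these made-up samples the 90th percentile stays at 5 ms while the 99th percentile is 250 ms, which is why the 90th and 99th percentile graphs can diverge when only a small fraction of transactions is slow.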

KV Transaction Durations: 90th percentile

DB Console KV transaction durations: 90th percentile graph

The KV Transaction Durations: 90th percentile graph displays the 90th percentile of transaction durations over a one-minute period.

Hovering over the graph displays values for the following metrics:

Metric | Description
<node> | The 90th percentile of transaction durations observed over a one-minute period for that node, as calculated from the txn.durations metric.

Node Heartbeat Latency: 99th percentile

DB Console node heartbeat latency: 99th percentile graph

The Node Heartbeat Latency: 99th percentile graph displays the 99th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period.

Hovering over the graph displays values for the following metrics:

Metric | Description
<node> | The 99th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period for that node, as calculated from the liveness.heartbeatlatency metric.

Node Heartbeat Latency: 90th percentile

DB Console node heartbeat latency: 90th percentile graph

The Node Heartbeat Latency: 90th percentile graph displays the 90th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period.

Hovering over the graph displays values for the following metrics:

Metric | Description
<node> | The 90th percentile of time elapsed between node liveness heartbeats on the cluster over a one-minute period for that node, as calculated from the liveness.heartbeatlatency metric.

Summary and events

Summary panel

A Summary panel of key metrics is displayed to the right of the time series graphs.

Metric | Description
Total Nodes | The total number of nodes in the cluster. Decommissioned nodes are not included in this count.
Capacity Used | The storage capacity used as a percentage of usable capacity allocated across all nodes.
Unavailable Ranges | The number of unavailable ranges in the cluster. A non-zero number indicates an unstable cluster.
Queries per second | The total number of SELECT, UPDATE, INSERT, and DELETE queries executed per second across the cluster.
P99 Latency | The 99th percentile of service latency.

Note:

If you are testing your deployment locally with multiple CockroachDB nodes running on a single machine (this is not recommended in production), you must explicitly set the store size per node in order to display the correct capacity. Otherwise, the machine's actual disk capacity will be counted as a separate store for each node, thus inflating the computed capacity.
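
The inflation described in this note is simple arithmetic. The following sketch uses made-up numbers (a 500 GiB disk shared by three local nodes) and a hypothetical function name; it only models the counting rule stated above, not how CockroachDB measures capacity.

```python
def reported_capacity_gib(disk_gib, node_count, store_size_gib=None):
    """Total capacity summed across the stores on one shared machine.

    Without an explicit per-node store size, every node counts the
    machine's full disk as its own store.
    """
    per_store = disk_gib if store_size_gib is None else store_size_gib
    return per_store * node_count


# Three nodes on one 500 GiB disk, no store size set: the disk is counted 3 times.
inflated = reported_capacity_gib(disk_gib=500, node_count=3)
# Same machine, with each node's store explicitly limited to 100 GiB.
explicit = reported_capacity_gib(disk_gib=500, node_count=3, store_size_gib=100)
print(inflated, explicit)
```

In this example the unconstrained setup reports 1500 GiB for a machine that has only 500 GiB, while the explicit setting reports 300 GiB. The per-node size is typically supplied through the size field of the --store flag when starting a node; check the documentation for your version for the exact syntax.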

Events panel

Underneath the Summary panel, the Events panel lists the 5 most recent events logged for all nodes across the cluster. To list all events, click View all events.

DB Console Events

The following types of events are listed:

  • Database created
  • Database dropped
  • Table created
  • Table dropped
  • Table altered
  • Index created
  • Index dropped
  • View created
  • View dropped
  • Schema change reversed
  • Schema change finished
  • Node joined
  • Node decommissioned
  • Node restarted
  • Cluster setting changed
