Performance Benchmarking with TPC-C

On this page Carat arrow pointing down
Warning:
CockroachDB v2.0 is no longer supported. For more details, see the Release Support Policy.

New in v2.0:This page walks you through TPC-C performance benchmarking on CockroachDB. It measures tpmC (new order transactions/minute) on two TPC-C datasets:

  • 1,000 warehouses (for a total dataset size of 200GB) on 3 nodes
  • 10,000 warehouses (for a total dataset size of 2TB) on 30 nodes

These two points on the spectrum show how CockroachDB scales from modest-sized production workloads to larger-scale deployments. This demonstrates how CockroachDB achieves high OLTP performance of over 128,000 tpmC on a TPC-C dataset over 2TB in size.

Benchmark a small cluster

Step 1. Create 3 Google Cloud Platform GCE instances

  1. Create 3 instances for your CockroachDB nodes. While creating each instance:

    • Use the n1-highcpu-16 machine type.

      For our TPC-C benchmarking, we use n1-highcpu-16 machines. Currently, we believe this (or higher vCPU count machines) is the best configuration for CockroachDB under high traffic scenarios.

    • Create and mount a local SSD using a SCSI interface.

      We attach a single local SSD to each virtual machine. Local SSDs are low latency disks attached to each VM, which maximizes performance. We chose this configuration because it best resembles what a bare metal deployment would look like, with machines directly connected to one physical disk each. We do not recommend using network-attached block storage.

    • Optimize the local SSD for write performance (see the Disable write cache flushing section).

    • To apply the Admin UI firewall rule you created earlier, click Management, disk, networking, SSH keys, select the Networking tab, and then enter cockroachdb in the Network tags field.

  2. Note the internal IP address of each n1-highcpu-16 instance. You'll need these addresses when starting the CockroachDB nodes.

  3. Create a fourth instance for running the TPC-C benchmark.

Warning:

This configuration is intended for performance benchmarking only. For production deployments, there are other important considerations, such as ensuring that data is balanced across at least three availability zones for resiliency. See the Production Checklist for more details.

Step 2. Start a 3-node cluster

  1. SSH to the first n1-highcpu-16 instance.

  2. Download the CockroachDB archive for Linux, extract the binary, and copy it into the PATH:

    icon/buttons/copy
    $ curl https://binaries.cockroachdb.com/cockroach-v2.0.7.linux-amd64.tgz \
    | tar -xz
    
    icon/buttons/copy
    $ cp -i cockroach-v2.0.7.linux-amd64/cockroach /usr/local/bin/
    

    If you get a permissions error, prefix the command with sudo.

  3. Run the cockroach start command:

    icon/buttons/copy
    $ cockroach start \
    --insecure \
    --advertise-host=<node1 internal address> \
    --join=<node1 internal address>:26257,<node2 internal address>:26257,<node3 internal address>:26257 \
    --cache=.25 \
    --max-sql-memory=.25 \
    --background
    
  4. Repeat steps 1 - 3 for the other two n1-highcpu-16 instances.

  5. From the fourth n1-highcpu-16 instance, run the cockroach init command:

    icon/buttons/copy
    $ cockroach init --insecure --host=localhost
    

    Each node then prints helpful details to the standard output, such as the CockroachDB version, the URL for the Web UI, and the SQL URL for clients.

Step 3. Load data for the benchmark

CockroachDB offers a pre-built workload binary for Linux that includes several load generators for simulating client traffic against your cluster. This step features CockroachDB's version of the TPC-C workload.

  1. SSH to the fourth instance (the one not running a CockroachDB node), download workload, and make it executable:

    icon/buttons/copy
    $ wget https://edge-binaries.cockroachdb.com/cockroach/workload.LATEST ; chmod 755 workload.LATEST
    
  2. Rename and copy workload into the PATH:

    icon/buttons/copy
    $ cp -i workload.LATEST /usr/local/bin/workload
    
  3. Start the TPC-C workload, pointing it at the connection string of a node and including any connection parameters:

    icon/buttons/copy
    $ ./workload.LATEST fixtures load tpcc \
    --warehouses=1000 \
    "postgres://root@<node1 address>:26257?sslmode=disable"
    

    This command runs the TPC-C workload against the cluster. This will take about an hour and loads 1,000 "warehouses" of data.

    Tip:

    For more tpcc options, use workload run tpcc --help. For details about other load generators included in workload, use workload run --help.

  4. To monitor the load generator's progress, follow along with the process on the Admin UI > Jobs table.

    Open the Admin UI by pointing a browser to the address in the admin field in the standard output of any node on startup. Follow along with the process on the Admin UI > Jobs table.

Step 4. Run the benchmark

Still on the fourth instance, run workload for five minutes against the other 3 instances:

icon/buttons/copy
$ ./workload.LATEST run tpcc \
--ramp=30s \
--warehouses=1000 \
--duration=300s \
--split \
--scatter \
"postgres://root@<node1 address>:26257?sslmode=disable postgres://root@<node2 address>:26257?sslmode=disable postgres://root@<node3 address>:26257?sslmode=disable [...space separated list]"

Step 5. Interpret the results

Once the workload has finished running, you should see a final output line:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  298.9s    13154.0 102.3%     75.1     71.3    113.2    130.0    184.5    436.2

You will also see some audit checks and latency statistics for each individual query. For this run, some of those checks might indicate that they were SKIPPED due to insufficient data. For a more comprehensive test, run workload for a longer duration (e.g., two hours). The tpmC (new order transactions/minute) number is the headline number and efc ("efficiency") tells you how close CockroachDB gets to theoretical maximum tpmC.

The TPC-C specification has p90 latency requirements in the order of seconds, but as you see here, CockroachDB far surpasses that requirement with p90 latencies in the hundreds of milliseconds.

Benchmark a large cluster

The methodology for reproducing CockroachDB's 30-node, 10,000 warehouse TPC-C result is similar to that for the 3-node, 1,000 warehouse example. The only difference (besides the larger node count and dataset) is that you will use CockroachDB's partitioning feature to ensure replicas for any given section of data are located on the same nodes that will be queried by the load generator for that section of data. Partitioning helps distribute the workload evenly across the cluster.

Before you start

Benchmarking a large cluster uses partitioning. You must have a valid enterprise license to use partitioning features. For details about requesting and setting a trial or full enterprise license, see Enterprise Licensing.

Step 1. Create 30 Google Cloud Platform GCE instances

  1. Create 30 instances for your CockroachDB nodes. While creating each instance:

    • Use the n1-highcpu-16 machine type.

      For our TPC-C benchmarking, we use n1-highcpu-16 machines. Currently, we believe this (or higher vCPU count machines) is the best configuration for CockroachDB under high traffic scenarios.

    • Create and mount a local SSD.

      We attach a single local SSD to each virtual machine. Local SSDs are low latency disks attached to each VM, which maximizes performance. We chose this configuration because it best resembles what a bare metal deployment would look like, with machines directly connected to one physical disk each. We do not recommend using network-attached block storage.

    • Optimize the local SSD for write performance (see the Disable write cache flushing section).

    • To apply the Admin UI firewall rule you created earlier, click Management, disk, networking, SSH keys, select the Networking tab, and then enter cockroachdb in the Network tags field.

  2. Note the internal IP address of each n1-highcpu-16 instance. You'll need these addresses when starting the CockroachDB nodes.

  3. Create a 31st instance for running the TPC-C benchmark.

Warning:

This configuration is intended for performance benchmarking only. For production deployments, there are other important considerations, such as ensuring that data is balanced across at least three availability zones for resiliency. See the Production Checklist for more details.

Step 2. Add an enterprise license

For this benchmark, you will use partitioning, which is an enterprise feature. For details about requesting and setting a trial or full enterprise license, see Enterprise Licensing.

To add an enterprise license to your cluster once it is started, use the built-in SQL client as follows:

  1. SSH to the 31st instance (the one not running a CockroachDB node) and launch the built-in SQL client:

    icon/buttons/copy
    $ cockroach sql --insecure
    
  2. Add your enterprise license:

    icon/buttons/copy
    > SET CLUSTER SETTING enterprise.license = '<secret>';
    
  3. Exit the interactive shell, using \q or ctrl-d.

Step 3. Start a 30-node cluster

  1. SSH to the first n1-highcpu-16 instance.

  2. Download the CockroachDB archive for Linux, extract the binary, and copy it into the PATH:

    icon/buttons/copy
    $ curl https://binaries.cockroachdb.com/cockroach-v2.0.7.linux-amd64.tgz \
    | tar -xz
    
    icon/buttons/copy
    $ cp -i cockroach-v2.0.7.linux-amd64/cockroach /usr/local/bin/
    

    If you get a permissions error, prefix the command with sudo.

  3. Run the cockroach start command:

    icon/buttons/copy
    $ cockroach start \
    --insecure \
    --advertise-host=<node1 internal address> \
    --join=<node1 internal address>:26257,<node2 internal address>:26257,<node3 internal address>:26257, [...] \
    --cache=.25 \
    --max-sql-memory=.25 \
    --locality=rack=1 \
    --background
    

    Each node will start with a locality that includes an artificial "rack number" (e.g., --locality=rack=1). Use 10 racks for 30 nodes so that every tenth node is part of the same rack (e.g., --locality=rack=2, --locality=rack=3, ...).

  4. Repeat steps 1 - 3 for the other 29 n1-highcpu-16 instances.

  5. From the 31st n1-highcpu-16 instance, run the cockroach init command:

    icon/buttons/copy
    $ cockroach init --insecure --host=localhost
    

    Each node then prints helpful details to the standard output, such as the CockroachDB version, the URL for the Web UI, and the SQL URL for clients.

Step 4. Load data for the benchmark

CockroachDB offers a pre-built workload binary for Linux that includes several load generators for simulating client traffic against your cluster. This step features CockroachDB's version of the TPC-C workload.

  1. Still on the 31st instance (the one not running a CockroachDB node), download workload, and make it executable:

    icon/buttons/copy
    $ wget https://edge-binaries.cockroachdb.com/cockroach/workload.LATEST ; chmod 755 workload.LATEST
    
  2. Rename and copy workload into the PATH:

    icon/buttons/copy
    $ cp -i workload.LATEST /usr/local/bin/workload
    
  3. Start the TPC-C workload, pointing it at the connection string of a node and including any connection parameters:

    icon/buttons/copy
    $  ./workload.LATEST fixtures load tpcc \
    --warehouses=10000 \
    "postgres://root@<node1 address>:26257?sslmode=disable"
    

    This command runs the TPC-C workload against the cluster. This will take at about an hour and loads 10,000 "warehouses" of data.

    Tip:

    For more tpcc options, use workload run tpcc --help. For details about other load generators included in workload, use workload run --help.

  4. To monitor the load generator's progress, follow along with the process on the Admin UI > Jobs table.

    Open the Admin UI by pointing a browser to the address in the admin field in the standard output of any node on startup.

Step 5. Increase the snapshot rate

To increase the snapshot rate, which helps speed up this large-scale data movement:

  1. Still on the 31st instance, launch the built-in SQL client:

    icon/buttons/copy
    $ cockroach sql --insecure
    
  2. Set the cluster setting to increase the snapshot rate:

    icon/buttons/copy
    > SET CLUSTER SETTING kv.snapshot_rebalance.max_rate='64MiB';
    
  3. Exit the interactive shell, using \q or ctrl-d.

Step 6. Partition the database

Next, partition your database to divide all of the TPC-C tables and indexes into ten partitions, one per rack, and then use zone configurations to pin those partitions to a particular rack.

  1. Still on the 31st instance, start the partitioning:

    icon/buttons/copy
    $ ulimit -n 10000 && workload.LATEST run tpcc \
    --partitions=10 \
    --split \
    --scatter \
    --warehouses=10000 \
    --duration=1s \
    "postgres://root@<node31 address>:26257?sslmode=disable"
    

    This command runs the TPC-C workload against the cluster for 1 second, long enough to add the partitions.

    Partitioning the data will take at least 12 hours. It takes this long because all of the data (over 2TB replicated for TPC-C-10K) needs to be moved to the right locations.

  2. To watch the progress, follow along with the process on the Admin UI > Metrics > Queues > Replication Queue graph. Change the timeframe to Last 10 Min to view a more granular graph.

    Open the Admin UI by pointing a browser to the address in the admin field in the standard output of any node on startup.

    Once the Replication Queue gets to 0 for all actions and stays there, the cluster should be finished rebalancing and is ready for testing.

Step 7. Run the benchmark

Still on the 31st instance, run workload for five minutes against the other 30 instances:

$ ulimit -n 10000 && ./workload.LATEST run tpcc \
--warehouses=10000 \
--ramp=30s \
--duration=300s \
--split \
--scatter \
"postgres://root@<node1 address>:26257?sslmode=disable postgres://root@<node2 address>:26257?sslmode=disable postgres://root@<node3 address>:26257?sslmode=disable [...space separated list]"

Step 8. Interpret the results

Once the workload has finished running, you should see a final output line similar to the output in Benchmark a small cluster. The tpmC should be about 10x higher, reflecting the increase in the number of warehouses:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  291.6s   131109.8 102.0%    115.3     88.1    184.5    268.4    637.5   4295.0

You will also see some audit checks and latency statistics for each individual query. For this run, some of those checks might indicate that they were SKIPPED due to insufficient data. For a more comprehensive test, run workload for a longer duration (e.g., two hours). The tpmC (new order transactions/minute) number is the headline number and efc ("efficiency") tells you how close CockroachDB gets to theoretical maximum tpmC.

The TPC-C specification has p90 latency requirements in the order of seconds, but as you see here, CockroachDB far surpasses that requirement with p90 latencies in the hundreds of milliseconds.

See also


Yes No
On this page

Yes No