DELETE

On this page Carat arrow pointing down
Warning:
CockroachDB v20.1 is no longer supported. For more details, see the Release Support Policy.

The DELETE statement deletes rows from a table.

Warning:
If you delete a row that is referenced by a foreign key constraint and has an ON DELETE action, all of the dependent rows will also be deleted or updated.
Note:
To delete columns, see DROP COLUMN.

Required privileges

The user must have the DELETE and SELECT privileges on the table.

Synopsis

Parameters

Parameter Description
common_table_expr See Common Table Expressions.
table_name The name of the table that contains the rows you want to update.
AS table_alias_name An alias for the table name. When an alias is provided, it completely hides the actual table name.
WHERE a_expr a_expr must be an expression that returns Boolean values using columns (e.g., <column> = <value>). Delete rows that return TRUE.

Without a WHERE clause in your statement, DELETE removes all rows from the table. To delete all rows in a table, we recommend using TRUNCATE instead of DELETE.
sort_clause An ORDER BY clause.

See Ordering of rows in DML statements for more details.
limit_clause A LIMIT clause. See Limiting Query Results for more details.
RETURNING target_list Return values based on rows deleted, where target_list can be specific column names from the table, * for all columns, or computations using scalar expressions.

To return nothing in the response, not even the number of rows updated, use RETURNING NOTHING.

Success responses

Successful DELETE statements return one of the following:

Response Description
DELETE int int rows were deleted.

DELETE statements that do not delete any rows respond with DELETE 0. When RETURNING NOTHING is used, this information is not included in the response.
Retrieved table Including the RETURNING clause retrieves the deleted rows, using the columns identified by the clause's parameters.

See an example.

Disk space usage after deletes

Deleting a row does not immediately free up the disk space. This is due to the fact that CockroachDB retains the ability to query tables historically.

If disk usage is a concern, the solution is to reduce the time-to-live (TTL) for the zone by setting gc.ttlseconds to a lower value, which will cause garbage collection to clean up deleted objects (rows, tables) more frequently.

Select performance on deleted rows

Queries that scan across tables that have lots of deleted rows will have to scan over deletions that have not yet been garbage collected. Certain database usage patterns that frequently scan over and delete lots of rows will want to reduce the time-to-live values to clean up deleted rows more frequently.

Sorting the output of deletes

To sort the output of a DELETE statement, use:

icon/buttons/copy
> WITH a AS (DELETE ... RETURNING ...)
  SELECT ... FROM a ORDER BY ...

For an example, see Sort and return deleted rows.

For more information about ordering query results in general, see Ordering Query Results and Ordering of rows in DML statements.

Force index selection for deletes

By using the explicit index annotation (also known as "index hinting"), you can override CockroachDB's index selection and use a specific index for deleting rows of a named table.

Note:

Index selection can impact performance, but does not change the result of a query.

The syntax to force a specific index for a delete is:

icon/buttons/copy
> DELETE FROM table@my_idx;

This is equivalent to the longer expression:

icon/buttons/copy
> DELETE FROM table@{FORCE_INDEX=my_idx};

To view how the index hint modifies the query plan that CockroachDB follows for deleting rows, use an EXPLAIN statement. To see all indexes available on a table, use SHOW INDEXES.

For examples, see Delete with index hints.

Batch deletes

To delete a large number of rows (i.e., tens of thousands of rows or more), we recommend iteratively deleting subsets of the rows that you want to delete, until all of the unwanted rows have been deleted. You can write a script to do this, or you can write a loop into your application.

In the sections below, we provide guidance on batch deleting with the DELETE query filter on an indexed column and on a non-indexed column. Filtering on an indexed column is both simpler to implement and more efficient, but adding an index to a table can slow down insertions to the table and may cause bottlenecks. Queries that filter on a non-indexed column must perform at least one full-table scan, a process that takes time proportional to the size of the entire table.

Warning:

Exercise caution when batch deleting rows from tables with foreign key constraints and explicit ON DELETE foreign key actions. To preserve DELETE performance on tables with foreign key actions, we recommend using smaller batch sizes, as additional rows updated or deleted due to ON DELETE actions can make batch loops significantly slower.

Batch delete on an indexed column

For high-performance batch deletes, we recommending filtering the DELETE query on an indexed column.

Note:

Having an indexed filtering column can make delete operations faster, but it might lead to bottlenecks in execution, especially if the filtering column is a timestamp. To reduce bottlenecks, we recommend using a hash-sharded index.

Each iteration of a batch-delete loop should execute a transaction containing a single DELETE query. When writing this DELETE query:

  • Use a WHERE clause to filter on a column that identifies the unwanted rows. If the filtering column is not the primary key, the column should have a secondary index. Note that if the filtering column is not already indexed, it is not beneficial to add an index just to speed up batch deletes. Instead, consider batch deleting on non-indexed columns.
  • To ensure that rows are efficiently scanned in the DELETE query, add an ORDER BY clause on the filtering column.
  • Use a LIMIT clause to limit the number of rows to the desired batch size. To determine the optimal batch size, try out different batch sizes (1,000 rows, 10,000 rows, 100,000 rows, etc.) and monitor the change in performance.
  • Add a RETURNING clause to the end of the query that returns the filtering column values of the deleted rows. Then, using the values of the deleted rows, update the filter to match only the subset of remaining rows to delete. This narrows each query's scan to the fewest rows possible, and preserves the performance of the deletes over time. This pattern assumes that no new rows are generated that match on the DELETE filter during the time that it takes to perform the delete.

For example, suppose that you want to delete all rows in the tpcc new_order table where no_w_id is less than 5, in batches of 5,000 rows. To do this, you can write a script that loops over batches of 5,000 rows, following the DELETE query guidance provided above. Note that in this case, no_w_id is the first column in the primary index, and, as a result, you do not need to create a secondary index on the column.

In Python, the script would look similar to the following:

icon/buttons/copy
#!/usr/bin/env python3

import psycopg2
import psycopg2.sql
import os

conn = psycopg2.connect(os.environ.get('DB_URI'))
filter = 4
lastrow = None

while True:
  with conn:
    with conn.cursor() as cur:
        if lastrow:
            filter = lastrow[0]
        query = psycopg2.sql.SQL("DELETE FROM new_order WHERE no_w_id <= %s ORDER BY no_w_id DESC LIMIT 5000 RETURNING no_w_id")
        cur.execute(query, (filter,))
        print(cur.statusmessage)
        if cur.rowcount == 0:
            break
        lastrow = cur.fetchone()

conn.close()

This script iteratively deletes rows in batches of 5,000, until all of the rows where no_w_id <= 4 are deleted. Note that at each iteration, the filter is updated to match a narrower subset of rows.

Batch delete on a non-indexed column

If you cannot index the column that identifies the unwanted rows, we recommend defining the batch loop to execute separate read and write operations at each iteration:

  1. Execute a SELECT query that returns the primary key values for the rows that you want to delete. When writing the SELECT query:

    • Use a WHERE clause that filters on the column identifying the rows.
    • Add an AS OF SYSTEM TIME clause to the end of the selection subquery, or run the selection query in a separate, read-only transaction with SET TRANSACTION AS OF SYSTEM TIME. This helps to reduce transaction contention.
    • Use a LIMIT clause to limit the number of rows queried to a subset of the rows that you want to delete. To determine the optimal SELECT batch size, try out different sizes (10,000 rows, 100,000 rows, 1,000,000 rows, etc.), and monitor the change in performance. Note that this SELECT batch size can be much larger than the batch size of rows that are deleted in the subsequent DELETE query.
    • To ensure that rows are efficiently scanned in the subsequent DELETE query, include an ORDER BY clause on the primary key.
  2. Write a nested DELETE loop over the primary key values returned by the SELECT query, in batches smaller than the initial SELECT batch size. To determine the optimal DELETE batch size, try out different sizes (1,000 rows, 10,000 rows, 100,000 rows, etc.), and monitor the change in performance. Where possible, we recommend executing each DELETE in a separate transaction.

For example, suppose that you want to delete all rows in the tpcc history table that are older than a month. You can create a script that loops over the data and deletes unwanted rows in batches, following the query guidance provided above.

In Python, the script would look similar to the following:

icon/buttons/copy
#!/usr/bin/env python3

import psycopg2
import os
import time

conn = psycopg2.connect(os.environ.get('DB_URI'))

while True:
    with conn:
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION AS OF SYSTEM TIME '-5s'")
            cur.execute("SELECT h_w_id, rowid FROM history WHERE h_date < current_date() - INTERVAL '1 MONTH' ORDER BY h_w_id, rowid LIMIT 20000")
            pkvals = list(cur)
    if not pkvals:
        return
    while pkvals:
        batch = pkvals[:5000]
        pkvals = pkvals[5000:]
        with conn:
            with conn.cursor() as cur:
                cur.execute("DELETE FROM history WHERE (h_w_id, rowid) = ANY %s", (batch,))
                print(cur.statusmessage)
    del batch
    del pkvals
    time.sleep(5)

conn.close()

At each iteration, the selection query returns the primary key values of up to 20,000 rows of matching historical data from 5 seconds in the past, in a read-only transaction. Then, a nested loop iterates over the returned primary key values in smaller batches of 5,000 rows. At each iteration of the nested DELETE loop, a batch of rows is deleted. After the nested DELETE loop deletes all of the rows from the initial selection query, a time delay ensures that the next selection query reads historical data from the table after the last iteration's DELETE final delete.

Batch-delete "expired" data

CockroachDB does not support Time to Live (TTL) on table rows. To delete "expired" rows, we recommend automating a batch delete process using a job scheduler like cron.

For example, suppose that every morning you want to delete all rows in the tpcc history table that are older than a month. To do this, you could use the example Python script that batch deletes on the non-indexed h_date column.

To run the script with a daily cron job:

  1. Make the file executable:

    icon/buttons/copy

    $ chmod +x cleanup.py
    
  2. Create a new cron job:

    icon/buttons/copy

    $ crontab -e
    
    icon/buttons/copy
    30 10 * * * DB_URI='cockroachdb://user@host:26257/bank' cleanup.py >> ~/cron.log 2>&1
    

Saving the cron file will install a new job that runs the cleanup.py file every morning at 10:30 A.M., writing the results to the cron.log file.

Preserving DELETE performance over time

CockroachDB relies on multi-version concurrency control (MVCC) to process concurrent requests while guaranteeing strong consistency. As such, when you delete a row, it is not immediately removed from disk. The MVCC values for the row will remain until the garbage collection period defined by the gc.ttlseconds variable in the applicable zone configuration has passed. By default, this period is 25 hours.

This means that with the default settings, each iteration of your DELETE statement must scan over all of the rows previously marked for deletion within the last 25 hours. If you try to delete 10,000 rows 10 times within the same 25 hour period, the 10th command will have to scan over the 90,000 rows previously marked for deletion.

To preserve performance over iterative DELETE queries, we recommend taking one of the following approaches:

  • At each iteration, update the WHERE clause to filter only the rows that have not yet been marked for deletion. For an example, see Batch-delete on an indexed filter above.
  • At each iteration, first use a SELECT statement to return primary key values on rows that are not yet deleted. Rows marked for deletion will not be returned. Then, use a nested DELETE loop over a smaller batch size, filtering on the primary key values. For an example, see Batch delete on a non-indexed column above.
  • To iteratively delete rows in constant time, using a simple DELETE loop, you can alter your zone configuration and change gc.ttlseconds to a low value like 5 minutes (i.e., 300), and then run your DELETE statement once per GC interval.

Examples

Setup

The following examples use MovR, a fictional vehicle-sharing application, to demonstrate CockroachDB SQL statements. For more information about the MovR example application and dataset, see MovR: A Global Vehicle-sharing App.

To follow along, run cockroach demo to start a temporary, in-memory cluster with the movr dataset preloaded:

icon/buttons/copy
$ cockroach demo

Delete rows using Primary Key/unique columns

Using columns with the Primary Key or Unique constraints to delete rows ensures your statement is unambiguous—no two rows contain the same column value, so it's less likely to delete data unintentionally.

In this example, code is our primary key and we want to delete the row where the code equals "about_stuff_city". Because we're positive no other rows have that value in the code column, there's no risk of accidentally removing another row.

icon/buttons/copy
> DELETE FROM promo_codes WHERE code = 'about_stuff_city';
DELETE 1

Delete rows using non-unique columns

Deleting rows using non-unique columns removes every row that returns TRUE for the WHERE clause's a_expr. This can easily result in deleting data you didn't intend to.

icon/buttons/copy
> DELETE FROM promo_codes WHERE creation_time > '2019-01-30 00:00:00+00:00';
DELETE 4

The example statement deleted four rows, which might be unexpected.

Return deleted rows

To see which rows your statement deleted, include the RETURNING clause to retrieve them using the columns you specify.

Use all columns

By specifying *, you retrieve all columns of the delete rows.

Use specific columns

To retrieve specific columns, name them in the RETURNING clause.

icon/buttons/copy
> DELETE FROM promo_codes WHERE creation_time > '2019-01-29 00:00:00+00:00' RETURNING code, rules;
           code          |                    rules
+------------------------+----------------------------------------------+
  box_investment_stuff   | {"type": "percent_discount", "value": "10%"}
  energy_newspaper_field | {"type": "percent_discount", "value": "10%"}
  simple_guy_theory      | {"type": "percent_discount", "value": "10%"}
  study_piece_war        | {"type": "percent_discount", "value": "10%"}
  tv_this_list           | {"type": "percent_discount", "value": "10%"}
(5 rows)

Change column labels

When RETURNING specific columns, you can change their labels using AS.

icon/buttons/copy
> DELETE FROM promo_codes WHERE creation_time > '2019-01-28 00:00:00+00:00' RETURNING code, rules AS discount;
         code         |                   discount
+---------------------+----------------------------------------------+
  chair_company_state | {"type": "percent_discount", "value": "10%"}
  view_reveal_radio   | {"type": "percent_discount", "value": "10%"}
(2 rows)

Sort and return deleted rows

To sort and return deleted rows, use a statement like the following:

icon/buttons/copy
> WITH a AS (DELETE FROM promo_codes WHERE creation_time > '2019-01-27 00:00:00+00:00' RETURNING *)
  SELECT * FROM a ORDER BY expiration_time;
             code            |                                                                                                  description                                                                                                   |       creation_time       |      expiration_time      |                    rules
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------------------------+----------------------------------------------+
  often_thing_hair           | Society right wish face see if pull. Great generation social bar read budget wonder natural. Somebody dark field economic material. Nature nature paper law worry common. Serious activity hospital wide none. | 2019-01-27 03:04:05+00:00 | 2019-01-29 03:04:05+00:00 | {"type": "percent_discount", "value": "10%"}
  step_though_military       | Director middle summer most create any.                                                                                                                                                                        | 2019-01-27 03:04:05+00:00 | 2019-01-29 03:04:05+00:00 | {"type": "percent_discount", "value": "10%"}
  own_whose_economy          | Social participant order this. Guy toward nor indeed police player inside nor. Model education voice several college art on. Start listen their maybe.                                                         | 2019-01-27 03:04:05+00:00 | 2019-01-30 03:04:05+00:00 | {"type": "percent_discount", "value": "10%"}
  crime_experience_certainly | Prepare right teacher mouth student. Trouble condition weight during scene something stand.                                                                                                                    | 2019-01-27 03:04:05+00:00 | 2019-01-31 03:04:05+00:00 | {"type": "percent_discount", "value": "10%"}
  policy_its_wife            | Player either she something good minute or. Nearly policy player receive. Somebody mean book store fire realize.                                                                                               | 2019-01-27 03:04:05+00:00 | 2019-01-31 03:04:05+00:00 | {"type": "percent_discount", "value": "10%"}
(5 rows)

Delete with index hints

Suppose you create a multi-column index on the users table with the name and city columns.

icon/buttons/copy
> CREATE INDEX ON users (name, city);

Now suppose you want to delete the two users named "Jon Snow". You can use the EXPLAIN (OPT) command to see how the cost-based optimizer decides to perform the delete:

icon/buttons/copy
> EXPLAIN (OPT) DELETE FROM users WHERE name='Jon Snow';
                                     text
-------------------------------------------------------------------------------
  delete users
   ├── scan users@users_name_city_idx
   │    └── constraint: /8/7/6: [/'Jon Snow' - /'Jon Snow']
   └── f-k-checks
        ├── f-k-checks-item: vehicles(city,owner_id) -> users(city,id)
        │    └── semi-join (hash)
        │         ├── with-scan &1
        │         ├── scan vehicles@vehicles_auto_index_fk_city_ref_users
        │         └── filters
        │              ├── city = vehicles.city
        │              └── id = owner_id
        ├── f-k-checks-item: rides(city,rider_id) -> users(city,id)
        │    └── semi-join (lookup rides@rides_auto_index_fk_city_ref_users)
        │         ├── with-scan &1
        │         └── filters (true)
        └── f-k-checks-item: user_promo_codes(city,user_id) -> users(city,id)
             └── semi-join (hash)
                  ├── with-scan &1
                  ├── scan user_promo_codes
                  └── filters
                       ├── city = user_promo_codes.city
                       └── id = user_id
(22 rows)

The output of the EXPLAIN statement shows that the optimizer scans the newly-created users_name_city_idx index when performing the delete. This makes sense, as you are performing a delete based on the name column.

Now suppose that instead you want to perform a delete, but using the id column instead.

icon/buttons/copy
> EXPLAIN (OPT) DELETE FROM users WHERE id IN ('70a3d70a-3d70-4400-8000-000000000016', '3d70a3d7-0a3d-4000-8000-00000000000c');
  delete users
   ├── select
   │    ├── scan users@users_name_city_idx
   │    └── filters
   │         └── users.id IN ('3d70a3d7-0a3d-4000-8000-00000000000c', '70a3d70a-3d70-4400-8000-000000000016')
   └── f-k-checks
        ├── f-k-checks-item: vehicles(city,owner_id) -> users(city,id)
        │    └── semi-join (hash)
        │         ├── with-scan &1
        │         ├── scan vehicles@vehicles_auto_index_fk_city_ref_users
        │         └── filters
        │              ├── city = vehicles.city
        │              └── id = owner_id
        ├── f-k-checks-item: rides(city,rider_id) -> users(city,id)
        │    └── semi-join (lookup rides@rides_auto_index_fk_city_ref_users)
        │         ├── with-scan &1
        │         └── filters (true)
        └── f-k-checks-item: user_promo_codes(city,user_id) -> users(city,id)
             └── semi-join (hash)
                  ├── with-scan &1
                  ├── scan user_promo_codes
                  └── filters
                       ├── city = user_promo_codes.city
                       └── id = user_id
(24 rows)

The optimizer still scans the newly-created users_name_city_idx index when performing the delete. Although scanning the table on this index could still be the most efficient, you may want to assess the performance difference between using users_name_city_idx and an index on the id column, as you are performing a delete with a filter on the id column.

If you provide an index hint (i.e., force the index selection) to use the primary index on the column instead, the CockroachDB will scan the users table using the primary index, on city, and id.

icon/buttons/copy
> EXPLAIN (OPT) DELETE FROM users@primary WHERE id IN ('70a3d70a-3d70-4400-8000-000000000016', '3d70a3d7-0a3d-4000-8000-00000000000c');
                                                     text
---------------------------------------------------------------------------------------------------------------
  delete users
   ├── select
   │    ├── scan users
   │    │    └── flags: force-index=primary
   │    └── filters
   │         └── users.id IN ('3d70a3d7-0a3d-4000-8000-00000000000c', '70a3d70a-3d70-4400-8000-000000000016')
   └── f-k-checks
        ├── f-k-checks-item: vehicles(city,owner_id) -> users(city,id)
        │    └── semi-join (hash)
        │         ├── with-scan &1
        │         ├── scan vehicles@vehicles_auto_index_fk_city_ref_users
        │         └── filters
        │              ├── city = vehicles.city
        │              └── id = owner_id
        ├── f-k-checks-item: rides(city,rider_id) -> users(city,id)
        │    └── semi-join (lookup rides@rides_auto_index_fk_city_ref_users)
        │         ├── with-scan &1
        │         └── filters (true)
        └── f-k-checks-item: user_promo_codes(city,user_id) -> users(city,id)
             └── semi-join (hash)
                  ├── with-scan &1
                  ├── scan user_promo_codes
                  └── filters
                       ├── city = user_promo_codes.city
                       └── id = user_id
(25 rows)

See also


Yes No
On this page

Yes No