Migrate from CSV

On this page Carat arrow pointing down
Warning:
Cockroach Labs will stop providing Assistance Support for v22.2 on June 5, 2024. Prior to that date, upgrade to a more recent version to continue receiving support. For more details, see the Release Support Policy.

This page has instructions for migrating data from CSV files into CockroachDB using IMPORT INTO.

The examples on this page use the employees data set that is also used in the MySQL docs.

The examples pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs, dumped as a set of CSV files.

Tip:

For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.

Note:

As of v22.1, certain IMPORT TABLE statements that defined the table schema inline are not supported. See Import — Considerations for more details. To import data into a new table, use CREATE TABLE followed by IMPORT INTO. For an example, read Import into a new table from a CSV file.

Step 1. Export data to CSV

Refer to the documentation of your database for instructions on exporting data to CSV.

IMPORT INTO requires that you export one file per table with the following attributes:

  • Files must be in valid CSV (comma-separated values) or TSV (tab-separated values) format.
  • The delimiter must be a single character. Use the delimiter option to set a character other than a comma (such as a tab, for TSV format).
  • Files must be UTF-8 encoded.
  • If one of the following characters appears in a field, the field must be enclosed by double quotes:
    • Delimiter (, by default).
    • Double quote ("). Because the field will be enclosed by double quotes, escape a double quote inside a field by preceding it with another double quote. For example: "aaa","b""bb","ccc".
    • Newline (\n).
    • Carriage return (\r).
  • If a column is of type BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with \x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.

Step 2. Host the files where the cluster can access them

Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for more information on the types of storage IMPORT INTO can pull from, see the following:

Tip:

We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.

Step 3. Import the CSV

You will need to write a CREATE TABLE statement that matches the schema of the table data you're importing.

For example, to import the data from employees.csv into an employees table, issue the following statement to create the table:

icon/buttons/copy
CREATE TABLE employees (
  emp_no INT PRIMARY KEY,
  birth_date DATE NOT NULL,
  first_name STRING NOT NULL,
  last_name STRING NOT NULL,
  gender STRING NOT NULL,
  hire_date DATE NOT NULL
      );

Next, use IMPORT INTO to import the data into the new table:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz'
     );
       job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes   
--------------------+-----------+--------------------+--------+---------------+----------------+----------
 381866942129111041 | succeeded |                  1 | 300024 |             0 |              0 | 13258389
(1 row)

Repeat this process for each CSV file you want to import.

Before importing CSV data, consider the following:

  • The column order in your schema must match the column order in the file being imported.
  • You will need to run ALTER TABLE ... ADD CONSTRAINT to add any foreign key relationships.

Configuration Options

The following options are available to IMPORT ... CSV:

Column delimiter

The delimiter option is used to set the Unicode character that marks where each column ends. Default: ,.

Example usage:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH delimiter = e'\t';

Comment syntax

The comment option determines which Unicode character marks the rows in the data to be skipped.

Example usage:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH comment = '#';

Skip header rows

The skip option determines the number of header rows to skip when importing a file.

Example usage:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH skip = '2';

Null strings

The nullif option specifies a column value that should be converted to NULL.

Note:

To match the nullif setting, a CSV input value must be unquoted. For details, see IMPORT INTO.

Example usage:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH nullif = '';

File compression

The compress option defines which decompression codec should be used on the CSV file to be imported. Options include:

  • gzip: Uses the gzip algorithm to decompress the file.
  • bzip: Uses the bzip algorithm to decompress the file.
  • none: Disables decompression.
  • auto: Default. Guesses based on file extension ('none' for .csv, 'gzip' for .gz, 'bzip' for .bz and .bz2).

Example usage:

icon/buttons/copy
IMPORT INTO employees (emp_no, birth_date, first_name, last_name, gender, hire_date)
     CSV DATA (
       'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz')
        WITH compress = 'gzip';

See also


Yes No
On this page

Yes No