When you create an Enterprise changefeed, you can include the initial_scan = 'only'
option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a succeeded
status. As a result, you can create a changefeed with initial_scan = 'only'
to EXPORT
data out of your database.
You can also schedule a changefeed to use a changefeed initial scan for exporting data on a regular cadence.
The benefits of using changefeeds for this use case instead of export, include:
- Changefeeds are jobs, which can be paused, resumed, cancelled, scheduled, and altered.
- There is observability into a changefeed job using
SHOW CHANGEFEED JOBS
and the Changefeeds Dashboard in the DB Console. - Changefeed jobs have built-in checkpointing and retries.
- Changefeed sinks provide additional endpoints for your data.
- You can use the
format=csv
option withinitial_scan= 'only'
to emit messages in CSV format.
Changefeeds emit the same CSV format as EXPORT
. In v22.1, changefeeds emitted CSV data that wrapped some values in single quotes, which were not wrapped when exporting data with the EXPORT
statement.
Message formats
By default, changefeeds emit messages in JSON format. You can use a different format by creating a changefeed with the format
option and specifying one of the following:
json
csv
avro
parquet
Examples
Export data with a changefeed
To create a changefeed that will only complete an initial scan of a table(s), run the following:
CREATE CHANGEFEED FOR TABLE movr.users INTO '{scheme}://{host}:{port}?{query_parameters}' WITH initial_scan = 'only', format=csv;
Or, use CDC queries to filter the data that your changefeed emits:
CREATE CHANGEFEED INTO '{scheme}://{host}:{port}?{query_parameters}'
WITH initial_scan = 'only', format=csv AS SELECT name, city FROM movr.users;
The job will return a job ID once it has started. You can use SHOW CHANGEFEED JOBS
to check on the status:
SHOW CHANGEFEED JOB {job ID};
When the scan has completed you will find the output shows succeeded
in the status
field.
Create a scheduled changefeed to export filtered data
This example creates a nightly export of some filtered table data with a scheduled changefeed that will run just after midnight every night. The changefeed uses CDC queries to query the table and filter the data it will send to the sink:
CREATE SCHEDULE sf_skateboard FOR CHANGEFEED INTO 'external://cloud-sink' WITH format=csv
AS SELECT current_location, id, type, status FROM vehicles
WHERE city = 'san francisco' AND type = 'skateboard'
RECURRING '1 0 * * *' WITH SCHEDULE OPTIONS on_execution_failure=retry, on_previous_running=start;
The schedule options control the schedule's behavior:
- If it runs into a failure,
on_execution_failure=retry
will ensure that the schedule retries the changefeed immediately. - If the previous scheduled changefeed is still running,
on_previous_running=start
will start a new changefeed at the defined cadence.