JOIN expressions

On this page Carat arrow pointing down
Warning:
Cockroach Labs will stop providing Assistance Support for v22.2 on June 5, 2024. Prior to that date, upgrade to a more recent version to continue receiving support. For more details, see the Release Support Policy.

A JOIN expression, also called a join, combines the results of two or more table expressions based on conditions on the values of particular columns (such as equality conditions). A join is a particular kind of table expression.

A JOIN expression defines a data source in the FROM sub-clause of a SELECT clause or as parameter to a TABLE clause.

Tip:

The cost-based optimizer supports hint syntax to force the use of a specific join algorithm. For more information, see Join hints.

Synopsis

( joined_table ) table_ref CROSS opt_join_hint NATURAL FULL LEFT RIGHT OUTER INNER opt_join_hint JOIN table_ref FULL LEFT RIGHT OUTER INNER opt_join_hint JOIN table_ref USING ( name , ) ON a_expr

Parameters

Parameter Description
joined_table A join expression.
table_ref A table expression.
opt_join_hint A join hint.
a_expr A scalar expression to use as an ON join condition.
name A column name to use as a USING join condition.

Supported join types

CockroachDB supports the following join types:

Inner joins

Only the rows from the left and right operand that match the condition are returned.

<table expr> [ INNER ] JOIN <table expr> ON <val expr>
<table expr> [ INNER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL [ INNER ] JOIN <table expr>
<table expr> CROSS JOIN <table expr>

Left outer joins

For every left row where there is no match on the right, NULL values are returned for the columns on the right.

<table expr> LEFT [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> LEFT [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL LEFT [ OUTER ] JOIN <table expr>

Right outer joins

For every right row where there is no match on the left, NULL values are returned for the columns on the left.

<table expr> RIGHT [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> RIGHT [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL RIGHT [ OUTER ] JOIN <table expr>

Full outer joins

For every row on one side of the join where there is no match on the other side, NULL values are returned for the columns on the non-matching side.

<table expr> FULL [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> FULL [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL FULL [ OUTER ] JOIN <table expr>

Supported join conditions

CockroachDB supports the following conditions to match rows in a join:

  • No condition with CROSS JOIN: each row on the left is considered to match every row on the right.
  • ON predicates: a Boolean scalar expression is evaluated to determine whether the operand rows match.
  • USING: the named columns are compared pairwise from the left and right rows; left and right rows are considered to match if the columns are equal pairwise.
  • NATURAL: generates an implicit USING condition using all the column names that are present in both the left and right table expressions.

    Note:
    NATURAL is supported for compatibility with PostgreSQL; Cockroach Labs discourages use in new applications because its results can silently change in unpredictable ways when new columns are added to one of the join operands.

Join algorithms

CockroachDB supports the following algorithms for performing a join:

Merge joins

To perform a merge join of two tables, both tables must be indexed on the equality columns, and any indexes must have the same ordering. Merge joins offer better computational performance and more efficient memory usage than hash joins. When tables and indexes are ordered for a merge, CockroachDB chooses to use merge joins over hash joins, by default. When merge conditions are not met, CockroachDB resorts to the slower hash joins. Merge joins can be used only with distributed query processing.

Merge joins are performed on the indexed columns of two tables as follows:

  1. CockroachDB checks for indexes on the equality columns and that they are ordered the same (i.e., ASC or DESC).
  2. CockroachDB takes one row from each table and compares them.
    • For inner joins:
      • If the rows are equal, CockroachDB returns the rows.
      • If there are multiple matches, the Cartesian product of the matches is returned.
      • If the rows are not equal, CockroachDB discards the lower-value row and repeats the process with the next row until all rows are processed.
    • For outer joins:
      • If the rows are equal, CockroachDB returns the rows.
      • If there are multiple matches, the Cartesian product of the matches is returned.
      • If the rows are not equal, CockroachDB returns NULL for the non-matching column and repeats the process with the next row until all rows are processed.

Hash joins

If a merge join cannot be used, CockroachDB uses a hash join. Hash joins are computationally expensive and require additional memory.

Hash joins are performed on two tables as follows:

  1. CockroachDB reads both tables and attempts to pick the smaller table.
  2. CockroachDB creates an in-memory hash table on the smaller table. If the hash table is too large, it will spill over to disk storage (which could affect performance).
  3. CockroachDB then scans the large table, looking up each row in the hash table.

Lookup joins

The cost-based optimizer decides when it would be beneficial to use a lookup join. Lookup joins are used when there is a large imbalance in size between the two tables, as it only reads the smaller table and then looks up matches in the larger table. A lookup join requires that the right-hand (i.e., larger) table be indexed on the columns involved in the join condition. A partial index can only be used if it contains the subset of rows being looked up.

Lookup joins are performed on two tables as follows:

  1. CockroachDB reads each row in the small table.
  2. CockroachDB then scans (or "looks up") the larger table for matches to the smaller table and outputs the matching rows.

The optimizer imposes some restrictions on the usage of inequalities in lookup join conditions:

  1. If the lookup condition contains no equalities (i.e., is composed only of an inequality), either the input of the join must return only one row or the join must have a LOOKUP hint. This prevents poor performance of the current lookup join implementation.
  2. If the index column is DESC and the inequality is of the form idxCol < inputCol or equivalently inputCol > idxCol, the column type must be countable in order to support retrieving the immediate previous value. This allows types like BOOL, FLOAT, and INT, but disallows types like STRING or BYTES.

You can override the use of lookup joins using join hints.

Note:

To make the optimizer prefer lookup joins to merge joins when performing foreign key checks, set the prefer_lookup_joins_for_fks session variable to on.

The output of EXPLAIN (VERBOSE) shows whether equality cols are key for lookup joins, which means that the lookup columns form a key in the target table such that each lookup has at most one result.

Inverted joins

Inverted joins force the optimizer to use a join using a GIN index on the right side of the join. Inverted joins can only be used with INNER and LEFT joins.

<table expr> INNER INVERTED JOIN <table expr> ON <val expr>
<table expr> LEFT INVERTED JOIN <table expr> ON <val expr>

See the cost-based optimizer examples for statements that use inverted joins.

LATERAL joins

CockroachDB supports LATERAL subquery joins for INNER and LEFT cross joins. For more information about LATERAL subqueries, see Lateral subqueries.

Apply joins

Apply join is the operator that executes a lateral join if the optimizer is not able to de-correlate it (i.e., rewrite the query to use a regular join). Most of the time, the optimizer can de-correlate most queries. However, there are some cases where the optimizer cannot perform this rewrite, and apply-join would show up in the EXPLAIN output for the query. The optimizer also replaces correlated subqueries with apply joins, and therefore apply-join may appear in the EXPLAIN output even if LATERAL was not used.

Apply joins are inefficient because they must be executed one row at a time. The left side row must be used to construct the right side row, and only then can the execution engine determine if the two rows should be output by the join. This corresponds to an O(n*m) time complexity.

Other types of joins supported by CockroachDB (e.g., hash join, merge join, and lookup join) are generally much more efficient. For example, with a hash join, a hash table is constructed using rows from the smaller side of the join, and then the larger side of the join is used to probe into the hash table using the ON conditions of the join. This corresponds to an O(n+m) time complexity.

If you see an apply-join, it means the optimizer was not able to perform de-correlation, and you should probably try to rewrite your query in a different way in order to get better performance.

Performance best practices

  • When no indexes can be used to satisfy a join, CockroachDB may load all the rows in memory that satisfy the condition one of the join operands before starting to return result rows. This may cause joins to fail if the join condition or other WHERE clauses are insufficiently selective.
  • Outer joins (i.e., left outer joins, right outer joins, and full outer joins) are generally processed less efficiently than inner joins. Use inner joins whenever possible. Full outer joins are the least optimized.
  • Use EXPLAIN over queries containing joins to verify that indexes are used.
  • Use indexes for faster joins.

See also


Yes No
On this page

Yes No