Introduction
There are many kinds of joins in SQL. I often have trouble keeping them straight. This article divides the joins into pairs of opposites, giving brief definitions of each along with their opposites. For the rest of this article, we'll be joining two tables, which we'll call the "left table" and the "right table". We'll be using these tables:create table rel1 ( id int primary key, letter varchar(1) ); create table rel2 ( id int primary key, fk int, foreign key (fk) references rel1(id), mychar varchar(1) ); select * from rel1; +----+--------+ | id | letter | +----+--------+ | 1 | a | | 2 | b | | 3 | c | | 4 | d | +----+--------+ select * from rel2; +----+--------+--------+ | id | fk | mychar | +----+--------+--------+ | 1 | 1 | z | | 3 | 4 | o | | 5 | 2 | m | | 8 | 2 | q | +----+--------+--------+`rel1` will be used as the left table, and `rel2` as the right table unless otherwise noted.
Inner vs. outer
Inner joins return only matching pairs of rows between the two tables. Outer joins also return rows from one or both tables that match 0 rows from the other table. (Note that `on rel1.id = rel2.id` is the "join predicate" in these examples)select * from rel1 inner join rel2 on rel1.id = rel2.id; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 3 | c | 3 | 4 | o | +----+--------+----+------+--------+ select * from rel1 outer join rel2 on rel1.id = rel2.id; +------+--------+------+------+--------+ | id | letter | id | fk | mychar | +------+--------+------+------+--------+ | 1 | a | 1 | 1 | z | | 2 | b | NULL | NULL | NULL | | 3 | c | 3 | 4 | o | | 4 | d | NULL | NULL | NULL | | NULL | NULL | 5 | 2 | m | | NULL | NULL | 8 | 2 | q | +------+--------+------+------+--------+
Outer join: left and right outer join
Left outer joins return, in addition to the rows returned by the corresponding inner join, rows from the left table that don't match any rows in the right table. Right outer joins are identical to inner joins except that they also return unmatched rows from the right table.select * from rel1 left outer join rel2 on rel1.id = rel2.id; +----+--------+------+------+--------+ | id | letter | id | fk | mychar | +----+--------+------+------+--------+ | 1 | a | 1 | 1 | z | | 2 | b | NULL | NULL | NULL | | 3 | c | 3 | 4 | o | | 4 | d | NULL | NULL | NULL | +----+--------+------+------+--------+ select * from rel1 right outer join rel2 on rel1.id = rel2.id; +------+--------+----+------+--------+ | id | letter | id | fk | mychar | +------+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 3 | c | 3 | 4 | o | | NULL | NULL | 5 | 2 | m | | NULL | NULL | 8 | 2 | q | +------+--------+----+------+--------+Note that the union of a left join and a right join is a full outer join; their intersection is an inner join.
Semi vs. anti
"Semi"-joins return rows from the left table that match rows in the right table. "Anti"-joins return rows from the left table that don't match rows in the right table. Both of these join forms act as a sort of filter for the left table, since no columns from the right table are returned.-- all rows in rel1 that have a matching row (by fk value) in rel2: select * from rel1 where id in (select fk from rel2); +----+--------+ | id | letter | +----+--------+ | 1 | a | | 2 | b | | 4 | d | +----+--------+ -- all rel1 rows that don't have a matching row (by fk value) in rel2: select * from rel1 where id not in (select fk from rel2); +----+--------+ | id | letter | +----+--------+ | 3 | c | +----+--------+Note that there are multiple ways to implement semi- and anti-joins in SQL:
-- "semi join" implemented using "inner join" and "distinct" select distinct rel1.* from rel1 inner join rel2 on rel1.id = rel2.fk; +----+--------+ | id | letter | +----+--------+ | 1 | a | | 2 | b | | 4 | d | +----+--------+ -- "anti join" implemented using "left join" and "is null" select distinct rel1.* from rel1 left join rel2 on rel1.id = rel2.fk where rel2.id is null; +----+--------+ | id | letter | +----+--------+ | 3 | c | +----+--------+
Equi vs. theta
"Equi"-joins are those where the join predicate is an equality. "Theta"-joins are those where the join predicate is an inequality.-- equi-join select * from rel1 inner join rel2 on rel1.id = rel2.id; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 3 | c | 3 | 4 | o | +----+--------+----+------+--------+ -- theta-join select * from rel1 inner join rel2 on rel1.id != rel2.id; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 2 | b | 1 | 1 | z | | 3 | c | 1 | 1 | z | | 4 | d | 1 | 1 | z | | 1 | a | 3 | 4 | o | | 2 | b | 3 | 4 | o | | 4 | d | 3 | 4 | o | | 1 | a | 5 | 2 | m | | 2 | b | 5 | 2 | m | | 3 | c | 5 | 2 | m | | 4 | d | 5 | 2 | m | | 1 | a | 8 | 2 | q | | 2 | b | 8 | 2 | q | | 3 | c | 8 | 2 | q | | 4 | d | 8 | 2 | q | +----+--------+----+------+--------+Note that the union of an equi-join and a theta-join is a cartesian product, if the same tables and join predicate are used.
Implicit vs. explicit
source Explicit joins use one of the "join" keywords, such as "inner join" or "left join", combined with a join predicate:select * from rel1 inner join rel2 on rel1.id = rel2.id; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 3 | c | 3 | 4 | o | +----+--------+----+------+--------+Implicit joins do not use a "join" keyword, simply listing the tables in the "from" clause, and place the join predicate in the "where" clause:
select * from rel1, rel2 where rel1.id = rel2.id; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 3 | c | 3 | 4 | o | +----+--------+----+------+--------+Explicit joins are generally recommended, as they are thought to be more maintainable and more clearly express the programmer's intent. Note that this is purely a syntactic difference.
Cross join
This produces a cartesian product of the tables, joining every row in the left table with every row in the right table.select * from rel1 inner join rel2 on 1; +----+--------+----+------+--------+ | id | letter | id | fk | mychar | +----+--------+----+------+--------+ | 1 | a | 1 | 1 | z | | 2 | b | 1 | 1 | z | | 3 | c | 1 | 1 | z | | 4 | d | 1 | 1 | z | | 1 | a | 3 | 4 | o | | 2 | b | 3 | 4 | o | | 3 | c | 3 | 4 | o | | 4 | d | 3 | 4 | o | | 1 | a | 5 | 2 | m | | 2 | b | 5 | 2 | m | | 3 | c | 5 | 2 | m | | 4 | d | 5 | 2 | m | | 1 | a | 8 | 2 | q | | 2 | b | 8 | 2 | q | | 3 | c | 8 | 2 | q | | 4 | d | 8 | 2 | q | +----+--------+----+------+--------+
Natural join
The result is the set of rows, from the two tables, that have the same values for columns of the same name. If the tables share all columns names, the result is an intersection. If the tables share no column names, the result is a cartesian product. The number of columns in the result is the number of distinct column names in the tables.select * from rel1 natural join rel2; +----+--------+------+--------+ | id | letter | fk | mychar | +----+--------+------+--------+ | 1 | a | 1 | z | | 3 | c | 4 | o | +----+--------+------+--------+ select * from rel1 inner join rel2 using(id); +----+--------+------+--------+ | id | letter | fk | mychar | +----+--------+------+--------+ | 1 | a | 1 | z | | 3 | c | 4 | o | +----+--------+------+--------+ select rel1.*, rel2.fk, rel2.mychar from rel1 inner join rel2 on rel1.id = rel2.id; +----+--------+------+--------+ | id | letter | fk | mychar | +----+--------+------+--------+ | 1 | a | 1 | z | | 3 | c | 4 | o | +----+--------+------+--------+This join is not useful in practice, as it can be completely replaced by queries using (more easily understandable) inner joins. Additionally, it may produce non-intuitive results in that tables are not necessarily joined on foreign keys, which to my mind is the actual "natural" way to join tables. (Recall that rel1.id <=> rel2.fk was the foreign key relationship we defined).
Self join
This is just a special case of a join where the same table is used for both sides of the join. It can be combined with any of the preceding "join" forms.select * from rel2 a left join rel2 b on a.id < b.fk; +----+------+--------+------+------+--------+ | id | fk | mychar | id | fk | mychar | +----+------+--------+------+------+--------+ | 1 | 1 | z | 3 | 4 | o | | 1 | 1 | z | 5 | 2 | m | | 1 | 1 | z | 8 | 2 | q | | 3 | 4 | o | 3 | 4 | o | | 5 | 2 | m | NULL | NULL | NULL | | 8 | 2 | q | NULL | NULL | NULL | +----+------+--------+------+------+--------+Note the table aliases, used to distinguish between the two uses of the same table.
Summary
A convenient way to understand joins is in pairs, as I showed:join type | opposite | major point of difference |
---|---|---|
inner | outer | do/don't keep non-matching rows |
left outer | right outer | augment inner join with unmatched rows from left/right table |
semi | anti | filter rows from a single table, keeping/dropping those that match rows in another table |
equi | theta | join predicate is equality/inequality |
implicit | explicit | syntax |
Sources
article about joinsWikipedia's "join" article
Nice post. I don't understand why equi/theta joins are distinct from inner joints.
ReplyDeleteThanks for your comment. The point I was trying to make was that a single SQL statement employing a join may be classified several ways -- for instance, you could have an explicit, inner equi-join all at once -- but you couldn't have both a semi- and an anti-join.
ReplyDelete