SQL Code

SQL, Chapter 3: FROM Clause and JOIN Subclause

posted in: Software Engineering, SQL | 0

To see how the FROM clause and JOIN subclause fit into the SELECT statement, please refer back to Chapter 2.

The FROM Clause

The FROM clause has this basic syntax:

FROM { from_item [ [ AS ] alias ] [, …] | select_statement }

The FROM clause defines the set of data from which the resultset will be selected. If the FROM clause is used, at least one from_item or one select_statement is required.

The FROM clause in its simplest form refers to a single table. That table may be joined to other tables with a join subclause to create a composite of data from multiple tables, from which data can be selected.

The FROM clause can also contain a SELECT statement of its own, called a subquery. The results of the subquery become the dataset from which the outside SELECT statement selects its data. This process can be infinitely repeated, although performance degrades pretty quickly with multiple nested subqueries. (Chapter 5 covers subqueries in more detail.)

The Join Subclause

Joins are defined in the JOIN clause, which is a subclause of the FROM clause. There are several types of joins, and multiple tables can be joined with multiple JOIN clauses in a single FROM clause. (For an explanation of how the JOIN clause fits into the FROM clause syntactically, see Chapter 2.)

The syntax for the JOIN subclause is:

[{ join_type from_item ON join_expression ] […] | CROSS JOIN from_item }]

The join_type can be any of these:

  • [ INNER ] JOIN
  • LEFT [ OUTER ] JOIN
  • RIGHT [ OUTER ] JOIN
  • FULL [ OUTER ] JOIN
  • CROSS JOIN

A join expression is an expression with a result of type boolean that specifies how a row from one table matches a row from the other. It is preceded by the ON keyword, so it is sometimes called the ON clause.

In join expressions, the terms INNER and OUTER are usually omitted in practice: JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN. For the purpose of this explanation, I’ll leave them in.

It can be helpful to express the different types of joins in terms of set theory. In terms of set theory, and in terms of two sets A and B:

  • An inner join is the intersection of A and B, or A ∩ B.
  • A left outer join is the union of the intersection of A and B and A‘s relative complement of B (i.e. those members of A that are not included in B), or (A ∩ B) ∪ (A - B).
  • A right outer join is the the union of the intersection of A and B and B‘s relative complement of A, or (A ∩ B) ∪ (B - A).
  • A full outer join is the union of A and B, or A ∪ B.
  • A cross join is the Cartesian product of A and B, or A × B.

To give an example of each type of join, I’ll use these two tables:

fruits
id name color_id
1 orange 4
2 cherry 1
3 lemon 2
4 date null
5 pear 2
colors
id color
1 red
2 yellow
3 blue
4 orange

INNER JOIN

INNER JOIN includes only rows that have matching values as specified in the ON clause (in this case fruits.color_id and colors.id).

This SQL statement:

Will have this result:

name color
orange orange
cherry red
lemon yellow
pear yellow

LEFT OUTER JOIN

LEFT OUTER JOIN includes all the rows that INNER JOIN does, as well as all other rows in the table that the FROM clause references (the “FROM table,” fruits in this case). Rows in the FROM table that do not have matching values in the JOIN table (the colors table in this case) have null values for all columns in the JOIN table.

This SQL statement:

Will have this result:

name color
orange orange
cherry red
lemon yellow
date
pear yellow

RIGHT OUTER JOIN

RIGHT OUTER JOIN includes all the rows that INNER JOIN does, as well as any other rows in the table that the JOIN clause references. Rows in the JOIN table that do not have matching values in the FROM table have null values for all columns in the FROM table.

This SQL statement:

Will have this result:

name color
orange orange
cherry red
lemon yellow
pear yellow
blue

FULL OUTER JOIN

FULL OUTER JOIN returns a combination of the rows returned by LEFT OUTER JOIN and RIGHT OUTER JOIN.

This SQL statement:

Will have this result:

name color
orange orange
cherry red
lemon yellow
date
pear yellow
blue

CROSS JOIN

CROSS JOIN returns the specified columns from the Cartesian product of the two tables. A Cartesian product is the set of all possible combinations of each of the elements in two sets. For example, a deck of cards is the Cartesian product of a set of ranks and a set of suits.

This SQL statement:

Will have this result:

name color
orange red
orange yellow
orange blue
orange orange
cherry red
cherry yellow
cherry blue
cherry orange
lemon red
lemon yellow
lemon blue
lemon orange
date red
date yellow
date blue
date orange
pear red
pear yellow
pear blue
pear orange
Using Cross Joins to Get Missing Combinations

Cross joins can be used to get a list of all members of two entities in many-to-many relationship that have no relation to each other.

A practical example will show why this is useful. Suppose I have some customers in a customers table, and I have some products in a products table. I also keep a list of which customers have bought which products in a customers_products table. I want a list of customers and the products that they haven’t bought yet. In other words, I want a list of all the combinations of customers and products that aren’t in the customers_products table.

Here are tables with some sample data:

customers
id name
1 Anne
2 Bob
3 John
products
id name
1 widget
2 frammis
3 gewgaw
customers_products
cust_id prod_id
1 1
1 2
2 3
3 1
3 2
3 3

Now, this query:

Will have this result:

cust prod
Anne gewgaw
Bob widget
Bob frammis

Which is what we want: all of the products that each customer hasn’t bought. We are selecting the Cartesian product of customers and products, minus the combinations that exist in the customers_products table.

For more on the WHERE NOT EXISTS predicate, see Chapter 6.