where I’ll use these tables in the examples:
fruits
id | name | color_id |
---|---|---|
1 | orange | 5 |
2 | cherry | 1 |
3 | lemon | 2 |
4 | grape | 6 |
5 | lime | 4 |
6 | date | null |
7 | pear | 2 |
colors
id | color |
---|---|
1 | red |
2 | yellow |
3 | blue |
4 | green |
5 | orange |
6 | purple |
WHERE
Clause
The WHERE
clause takes this form:
where The WHERE
clause can contain any expression that evaluates to type boolean
. The WHERE
clause filters the resultset such that it includes only those rows where condition evaluates to true
, when the corresponding values in the row being evaluated are substituted for any column references in condition.
Using the fruits
and colors
tables listed above, this SQL statement:
1 2 3 4 5 6 |
SELECT name FROM fruits f JOIN colors c ON f.color_id = c.id WHERE c.color = 'yellow' |
Will have this result:
name |
---|
lemon |
pear |
GROUP BY
Clause
The GROUP BY
clause takes this form:
GROUP BY
is used to group a set of columns together, typically performing some sort of aggregate function (count
, sum
, avg
, etc.) on them while doing so. GROUP BY
will condense into a single row those rows that have equivalent values in the grouping_element list. While doing the condensing, it will calculate the value of any aggregate functions that are specified, for the rows in the group being condensed.
Using the fruits
and colors
tables listed above, this SQL statement:
1 2 3 4 5 6 |
SELECT c.color, count(f.color_id) FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color; |
Will have this result:
color | count |
---|---|
yellow | 2 |
green | 1 |
blue | 0 |
red | 1 |
purple | 1 |
orange | 1 |
(If we don’t want colors with zero fruits, then we would do an inner join instead of a left join.)
Any columns that show in the resultset (or the primary key of the table containing those columns) must be listed in the GROUP BY
clause. Failure to do so will cause an error.
If no aggregate function is invoked, then GROUP BY
simply eliminates duplicates in the resultset. This is more concisely handled by using DISTINCT
.
HAVING
Clause
The HAVING
clause takes this form:
The HAVING
clause is used in conjunction with GROUP BY
to filter groups, usually by evaluating the results of aggregate functions. Groups that do not meet the criteria specified in condition are eliminated. In this, it is similar to the WHERE
clause, but while the WHERE
clause eliminates individual rows that do not match its specified criteria, the HAVING
clause eliminates groups that do not match.
The specification for condition in the HAVING
clause is the same as its specification in the WHERE
clause: it can contain any expression that evaluates to type boolean
. (While it may — and usually does — include references to the results of aggregate function calculations in its condition, the WHERE
condition may not reference aggregate functions.)
Using the fruits
and colors
tables listed above, this SQL statement:
1 2 3 4 5 6 7 |
SELECT c.color, count(f.color_id) FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color HAVING count(f.color_id) > 1; |
Will have this result:
color | count |
---|---|
yellow | 2 |
ORDER BY
Clause
The ORDER BY
clause takes this form:
The ORDER BY
clause sorts the resultset based on expression. Typically, expression is a column or comma-separated list of columns (including, possibly, columns that are the result of calculations, such as list_price - cost
). If there is more than one column, the resultset is sorted by the first column in the list, then columns with equal values for that column are sorted by the second column in the list, and so on.
Columns may be sorted in either ascending or descending order, by providing the ASC
or DESC
keyword. If this keyword is not provided, ASC
is assumed by default.
A resultset may be sorted by a column containing values from an aggregate function. The name of the column as shown in the resultset is the value that must be used in the ORDER BY
expression, so it is often advisable to name this column using the AS
keyword.
Using the fruits
and colors
tables listed above, this SQL statement:
1 2 3 4 5 6 7 |
SELECT c.color, count(f.color_id) FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color ORDER BY c.color; |
Will have this result:
color | count |
---|---|
blue | 0 |
green | 1 |
orange | 1 |
purple | 1 |
red | 1 |
yellow | 2 |
And, this SQL statement:
1 2 3 4 5 6 7 |
SELECT c.color, count(f.color_id) FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color ORDER BY count DESC, c.color; |
Will have this result:
color | count |
---|---|
yellow | 2 |
green | 1 |
orange | 1 |
purple | 1 |
red | 1 |
blue | 0 |
In this case, it is possible to directly reference the count
column, and count
is probably the best name for the column as well. However, this is not always the case. This SQL statement:
1 2 3 4 5 6 7 |
SELECT c.color, count(f.color_id) + 3 FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color ORDER BY '?column?' |
Will have this result:
color | ?column? |
---|---|
blue | 3 |
green | 4 |
red | 4 |
purple | 4 |
orange | 4 |
yellow | 5 |
It’s possible to enclose ?column?
in quotes to reference it (ORDER BY '?column?'
), but it would be better to assign a more descriptive name:
1 2 3 4 5 6 7 8 |
SELECT c.color, count(f.color_id) + 3 AS color_plus_3 FROM colors c LEFT JOIN fruits f ON c.id = f.color_id GROUP BY c.color ORDER BY color_plus_3 |
With this result:
color | color_plus_3 |
---|---|
blue | 3 |
green | 4 |
red | 4 |
purple | 4 |
orange | 4 |
yellow | 5 |