SQL Code

SQL, Chapter 4: WHERE, GROUP BY, HAVING and ORDER BY Clauses

posted in: Software Engineering, SQL | 0

where I’ll use these tables in the examples:

fruits
id name color_id
1 orange 5
2 cherry 1
3 lemon 2
4 grape 6
5 lime 4
6 date null
7 pear 2
colors
id color
1 red
2 yellow
3 blue
4 green
5 orange
6 purple

WHERE Clause

The WHERE clause takes this form:

[ WHERE condition ]

where The WHERE clause can contain any expression that evaluates to type boolean. The WHERE clause filters the resultset such that it includes only those rows where condition evaluates to true, when the corresponding values in the row being evaluated are substituted for any column references in condition.

Using the fruits and colors tables listed above, this SQL statement:

Will have this result:

name
lemon
pear

GROUP BY Clause

The GROUP BY clause takes this form:

[ GROUP BY grouping_element [, …] ]

GROUP BY is used to group a set of columns together, typically performing some sort of aggregate function (count, sum, avg, etc.) on them while doing so. GROUP BY will condense into a single row those rows that have equivalent values in the grouping_element list. While doing the condensing, it will calculate the value of any aggregate functions that are specified, for the rows in the group being condensed.

Using the fruits and colors tables listed above, this SQL statement:

Will have this result:

color count
yellow 2
green 1
blue 0
red 1
purple 1
orange 1

(If we don’t want colors with zero fruits, then we would do an inner join instead of a left join.)

Any columns that show in the resultset (or the primary key of the table containing those columns) must be listed in the GROUP BY clause. Failure to do so will cause an error.

If no aggregate function is invoked, then GROUP BY simply eliminates duplicates in the resultset. This is more concisely handled by using DISTINCT.

HAVING Clause

The HAVING clause takes this form:

[ HAVING condition [, …] ]

The HAVING clause is used in conjunction with GROUP BY to filter groups, usually by evaluating the results of aggregate functions. Groups that do not meet the criteria specified in condition are eliminated. In this, it is similar to the WHERE clause, but while the WHERE clause eliminates individual rows that do not match its specified criteria, the HAVING clause eliminates groups that do not match.

The specification for condition in the HAVING clause is the same as its specification in the WHERE clause: it can contain any expression that evaluates to type boolean. (While it may — and usually does — include references to the results of aggregate function calculations in its condition, the WHERE condition may not reference aggregate functions.)

Using the fruits and colors tables listed above, this SQL statement:

Will have this result:

color count
yellow 2

ORDER BY Clause

The ORDER BY clause takes this form:

[ ORDER BY expression [ ASC | DESC ] [, …] ]

The ORDER BY clause sorts the resultset based on expression. Typically, expression is a column or comma-separated list of columns (including, possibly, columns that are the result of calculations, such as list_price - cost). If there is more than one column, the resultset is sorted by the first column in the list, then columns with equal values for that column are sorted by the second column in the list, and so on.

Columns may be sorted in either ascending or descending order, by providing the ASC or DESC keyword. If this keyword is not provided, ASC is assumed by default.

A resultset may be sorted by a column containing values from an aggregate function. The name of the column as shown in the resultset is the value that must be used in the ORDER BY expression, so it is often advisable to name this column using the AS keyword.

Using the fruits and colors tables listed above, this SQL statement:

Will have this result:

color count
blue 0
green 1
orange 1
purple 1
red 1
yellow 2

And, this SQL statement:

Will have this result:

color count
yellow 2
green 1
orange 1
purple 1
red 1
blue 0

In this case, it is possible to directly reference the count column, and count is probably the best name for the column as well. However, this is not always the case. This SQL statement:

Will have this result:

color ?column?
blue 3
green 4
red 4
purple 4
orange 4
yellow 5

It’s possible to enclose ?column? in quotes to reference it (ORDER BY '?column?'), but it would be better to assign a more descriptive name:

With this result:

color color_plus_3
blue 3
green 4
red 4
purple 4
orange 4
yellow 5