WHERE, GROUP BY, HAVING and ORDER BY Clauses in SQL

where I’ll use these tables in the examples:

fruits

id	name	color_id
1	orange	5
2	cherry	1
3	lemon	2
4	grape	6
5	lime	4
6	date	null
7	pear	2

colors

id	color
1	red
2	yellow
3	blue
4	green
5	orange
6	purple

`WHERE` Clause

The WHERE clause takes this form:

[ WHERE condition ]

where The WHERE clause can contain any expression that evaluates to type boolean. The WHERE clause filters the resultset such that it includes only those rows where condition evaluates to true, when the corresponding values in the row being evaluated are substituted for any column references in condition.

Using the fruits and colors tables listed above, this SQL statement:


SELECT name
  FROM fruits f
  JOIN colors c ON f.color_id = c.id
 WHERE c.color = 'yellow'

SELECT name

FROM fruits f

JOIN colors c ON f.color_id = c.id

WHERE c.color = 'yellow'

Will have this result:

name
lemon
pear

`GROUP BY` Clause

The GROUP BY clause takes this form:

[ GROUP BY grouping_element [, …] ]

GROUP BY is used to group a set of columns together, typically performing some sort of aggregate function (count, sum, avg, etc.) on them while doing so. GROUP BY will condense into a single row those rows that have equivalent values in the grouping_element list. While doing the condensing, it will calculate the value of any aggregate functions that are specified, for the rows in the group being condensed.

Using the fruits and colors tables listed above, this SQL statement:


SELECT c.color, count(f.color_id)
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color;

SELECT c.color, count(f.color_id)

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color;

Will have this result:

color	count
yellow	2
green	1
blue	0
red	1
purple	1
orange	1

(If we don’t want colors with zero fruits, then we would do an inner join instead of a left join.)

Any columns that show in the resultset (or the primary key of the table containing those columns) must be listed in the GROUP BY clause. Failure to do so will cause an error.

If no aggregate function is invoked, then GROUP BY simply eliminates duplicates in the resultset. This is more concisely handled by using DISTINCT.

`HAVING` Clause

The HAVING clause takes this form:

[ HAVING condition [, …] ]

The HAVING clause is used in conjunction with GROUP BY to filter groups, usually by evaluating the results of aggregate functions. Groups that do not meet the criteria specified in condition are eliminated. In this, it is similar to the WHERE clause, but while the WHERE clause eliminates individual rows that do not match its specified criteria, the HAVING clause eliminates groups that do not match.

The specification for condition in the HAVING clause is the same as its specification in the WHERE clause: it can contain any expression that evaluates to type boolean. (While it may — and usually does — include references to the results of aggregate function calculations in its condition, the WHERE condition may not reference aggregate functions.)

Using the fruits and colors tables listed above, this SQL statement:


SELECT c.color, count(f.color_id)
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color
HAVING count(f.color_id) > 1;

SELECT c.color, count(f.color_id)

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color

HAVING count(f.color_id) > 1;

Will have this result:

color	count
yellow	2

`ORDER BY` Clause

The ORDER BY clause takes this form:

[ ORDER BY expression [ ASC | DESC ] [, …] ]

The ORDER BY clause sorts the resultset based on expression. Typically, expression is a column or comma-separated list of columns (including, possibly, columns that are the result of calculations, such as list_price - cost). If there is more than one column, the resultset is sorted by the first column in the list, then columns with equal values for that column are sorted by the second column in the list, and so on.

Columns may be sorted in either ascending or descending order, by providing the ASC or DESC keyword. If this keyword is not provided, ASC is assumed by default.

A resultset may be sorted by a column containing values from an aggregate function. The name of the column as shown in the resultset is the value that must be used in the ORDER BY expression, so it is often advisable to name this column using the AS keyword.

Using the fruits and colors tables listed above, this SQL statement:


SELECT c.color, count(f.color_id)
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color
 ORDER BY c.color;

SELECT c.color, count(f.color_id)

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color

ORDER BY c.color;

Will have this result:

color	count
blue	0
green	1
orange	1
purple	1
red	1
yellow	2

And, this SQL statement:


SELECT c.color, count(f.color_id)
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color
 ORDER BY count DESC, c.color;

SELECT c.color, count(f.color_id)

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color

ORDER BY count DESC, c.color;

Will have this result:

color	count
yellow	2
green	1
orange	1
purple	1
red	1
blue	0

In this case, it is possible to directly reference the count column, and count is probably the best name for the column as well. However, this is not always the case. This SQL statement:


SELECT c.color, count(f.color_id) + 3
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color
 ORDER BY '?column?'

SELECT c.color, count(f.color_id) + 3

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color

ORDER BY '?column?'

Will have this result:

color	?column?
blue	3
green	4
red	4
purple	4
orange	4
yellow	5

It’s possible to enclose ?column? in quotes to reference it (ORDER BY '?column?'), but it would be better to assign a more descriptive name:


SELECT c.color, 
       count(f.color_id) + 3 AS color_plus_3
  FROM colors c
  LEFT JOIN fruits f ON c.id = f.color_id
 GROUP BY c.color
 ORDER BY color_plus_3

SELECT c.color,

count(f.color_id) + 3 AS color_plus_3

FROM colors c

LEFT JOIN fruits f ON c.id = f.color_id

GROUP BY c.color

ORDER BY color_plus_3

With this result:

color	color_plus_3
blue	3
green	4
red	4
purple	4
orange	4
yellow	5

Robert Rodes

Software Developer

Turning ideas into software...

Robert Rodes

SQL, Chapter 4: WHERE, GROUP BY, HAVING and ORDER BY Clauses

fruits

colors

`WHERE` Clause

`GROUP BY` Clause

`HAVING` Clause

`ORDER BY` Clause

fruits

colors

WHERE Clause

GROUP BY Clause

HAVING Clause

ORDER BY Clause

`WHERE` Clause

`GROUP BY` Clause

`HAVING` Clause

`ORDER BY` Clause