SQL Code

SQL, Chapter 5: the WHERE Clause’s IN Predicate, and Subqueries

posted in: Software Engineering, SQL | 0

The IN Predicate

The IN predicate takes this form:

WHERE expression [NOT] IN (list)
  • expression is usually a column, but can also be a calculated value such as column1 + column2.
  • list is a list of scalar values.

The WHERE IN clause evaluates to true if expression is equal to any value in list. The WHERE NOT IN clause evaluates to true if expression is not equal to any value in list.

Suppose I have this table:

Family

id name
1 Anne
2 Bob
3 John
4 David
5 Peter
6 Paul
7 Mary

Using this table, this query:

Will have this result:

id name
5 Peter
6 Paul
7 Mary

And this query:

Will have this result:

id name
1 Anne
2 Bob
3 John
4 David

Subqueries

Subqueries, as the name implies, are queries within queries. They are usually an alternative way to do a query, meaning that there are usually ways to get the same data that a statement with a subquery returns without using subqueries. Subqueries are usually found in the WHERE clause, although they can be in the FROM clause as well.

For example, using these tables:

Films

id title year duration genre_id
1 Die Hard 1988 132 7
2 Casablanca 1942 102 5
3 The Conversation 1974 113 1
4 1984 1956 90 4
5 The Godfather 1972 175 2
6 Tinker Tailor Soldier Spy 2011 127 6
7 The Birdcage 1996 118 3

Genres

id name
1 thriller
2 crime
3 comedy
4 scifi
5 drama
6 espionage
7 action

I can get the oldest film like this:

(LIMIT 1 will return the first value in a resultset in PostgreSQL or MySQL. Different RDBMSs have different ways of getting the first value in a resultset. See this for more information.)

I can also get it this way, using a subquery:

Subqueries can often take the place of joins. Suppose I want the genre of the oldest film. I can do it like this:

Or I could nest a couple of subqueries:

Or I could combine the two approaches:

The IN Predicate with Subqueries

The list value in the IN predicate can also be the resultset of a subquery. The resultset must have exactly one column, and that column becomes the list of scalar values that IN evaluates.

Suppose, along with films and genres, I also have this table:

Directors

id name
1 Mike Nichols
2 Michael Anderson
3 John McTiernan
4 Tomas Alfredson
5 Francis Ford Coppola
6 Michael Curtiz

Now, I want to find all the directors who have directed a sci-fi film. I can do this:

Now, if I don’t know the genre_id (as I probably don’t), I can use a subquery in the inner WHERE clause to look it up from the name value:

And again, I can get the same data using joins:

In a large database, it’s worth benchmarking the different ways to get the result you want, because one way will often outperform the others.