The IN Predicate
The IN predicate takes this form:
- expression is usually a column, but can also be a calculated value such as
column1 + column2. - list is a list of scalar values.
The WHERE IN clause evaluates to true if expression is equal to any value in list. The WHERE NOT IN clause evaluates to true if expression is not equal to any value in list.
Suppose I have this table:
Family
| id | name |
|---|---|
| 1 | Anne |
| 2 | Bob |
| 3 | John |
| 4 | David |
| 5 | Peter |
| 6 | Paul |
| 7 | Mary |
Using this table, this query:
|
1 2 3 4 |
SELECT id, name FROM family WHERE id IN (5, 6, 7) |
Will have this result:
| id | name |
|---|---|
| 5 | Peter |
| 6 | Paul |
| 7 | Mary |
And this query:
|
1 2 3 4 |
SELECT id, name FROM family WHERE id IN (5, 6, 7) |
Will have this result:
| id | name |
|---|---|
| 1 | Anne |
| 2 | Bob |
| 3 | John |
| 4 | David |
Subqueries
Subqueries, as the name implies, are queries within queries. They are usually an alternative way to do a query, meaning that there are usually ways to get the same data that a statement with a subquery returns without using subqueries. Subqueries are usually found in the WHERE clause, although they can be in the FROM clause as well.
For example, using these tables:
Films
| id | title | year | duration | genre_id |
|---|---|---|---|---|
| 1 | Die Hard | 1988 | 132 | 7 |
| 2 | Casablanca | 1942 | 102 | 5 |
| 3 | The Conversation | 1974 | 113 | 1 |
| 4 | 1984 | 1956 | 90 | 4 |
| 5 | The Godfather | 1972 | 175 | 2 |
| 6 | Tinker Tailor Soldier Spy | 2011 | 127 | 6 |
| 7 | The Birdcage | 1996 | 118 | 3 |
Genres
| id | name |
|---|---|
| 1 | thriller |
| 2 | crime |
| 3 | comedy |
| 4 | scifi |
| 5 | drama |
| 6 | espionage |
| 7 | action |
I can get the oldest film like this:
|
1 2 3 4 5 |
SELECT * FROM films ORDER BY year LIMIT 1 |
(LIMIT 1 will return the first value in a resultset in PostgreSQL or MySQL. Different RDBMSs have different ways of getting the first value in a resultset. See this for more information.)
I can also get it this way, using a subquery:
|
1 2 3 4 5 6 |
SELECT * FROM films WHERE year = ( SELECT min(year) FROM films ); |
Subqueries can often take the place of joins. Suppose I want the genre of the oldest film. I can do it like this:
|
1 2 3 4 5 6 |
SELECT g.name FROM films f JOIN genres g ON f.genre_id = g.id ORDER BY f.year LIMIT 1; |
Or I could nest a couple of subqueries:
|
1 2 3 4 5 6 7 8 9 10 |
SELECT name FROM genres WHERE id = ( SELECT genre_id FROM films WHERE year = ( SELECT min(year) FROM films ) ); |
Or I could combine the two approaches:
|
1 2 3 4 5 6 7 |
SELECT g.name FROM films f JOIN genres g ON f.genre_id = g.id WHERE f.year = ( SELECT min(year) FROM films ); |
The IN Predicate with Subqueries
The list value in the IN predicate can also be the resultset of a subquery. The resultset must have exactly one column, and that column becomes the list of scalar values that IN evaluates.
Suppose, along with films and genres, I also have this table:
Directors
| id | name |
|---|---|
| 1 | Mike Nichols |
| 2 | Michael Anderson |
| 3 | John McTiernan |
| 4 | Tomas Alfredson |
| 5 | Francis Ford Coppola |
| 6 | Michael Curtiz |
Now, I want to find all the directors who have directed a sci-fi film. I can do this:
|
1 2 3 4 5 6 7 |
SELECT name FROM directors WHERE id IN ( SELECT director_id FROM films WHERE genre_id = 4 ); |
Now, if I don’t know the genre_id (as I probably don’t), I can use a subquery in the inner WHERE clause to look it up from the name value:
|
1 2 3 4 5 6 7 8 9 10 |
SELECT name FROM directors WHERE id IN ( SELECT director_id FROM films WHERE genre_id = ( SELECT id FROM genre WHERE name = 'scifi' ) ); |
And again, I can get the same data using joins:
|
1 2 3 4 5 6 |
SELECT name FROM directors d JOIN films f ON d.id = f.director_id JOIN genres g on g.id = f.genre_id WHERE g.name = 'scifi' |
In a large database, it’s worth benchmarking the different ways to get the result you want, because one way will often outperform the others.
