The IN
Predicate
The IN
predicate takes this form:
- expression is usually a column, but can also be a calculated value such as
column1 + column2
. - list is a list of scalar values.
The WHERE IN
clause evaluates to true
if expression is equal to any value in list. The WHERE NOT IN
clause evaluates to true
if expression is not equal to any value in list.
Suppose I have this table:
Family
id | name |
---|---|
1 | Anne |
2 | Bob |
3 | John |
4 | David |
5 | Peter |
6 | Paul |
7 | Mary |
Using this table, this query:
1 2 3 4 |
SELECT id, name FROM family WHERE id IN (5, 6, 7) |
Will have this result:
id | name |
---|---|
5 | Peter |
6 | Paul |
7 | Mary |
And this query:
1 2 3 4 |
SELECT id, name FROM family WHERE id IN (5, 6, 7) |
Will have this result:
id | name |
---|---|
1 | Anne |
2 | Bob |
3 | John |
4 | David |
Subqueries
Subqueries, as the name implies, are queries within queries. They are usually an alternative way to do a query, meaning that there are usually ways to get the same data that a statement with a subquery returns without using subqueries. Subqueries are usually found in the WHERE
clause, although they can be in the FROM
clause as well.
For example, using these tables:
Films
id | title | year | duration | genre_id |
---|---|---|---|---|
1 | Die Hard | 1988 | 132 | 7 |
2 | Casablanca | 1942 | 102 | 5 |
3 | The Conversation | 1974 | 113 | 1 |
4 | 1984 | 1956 | 90 | 4 |
5 | The Godfather | 1972 | 175 | 2 |
6 | Tinker Tailor Soldier Spy | 2011 | 127 | 6 |
7 | The Birdcage | 1996 | 118 | 3 |
Genres
id | name |
---|---|
1 | thriller |
2 | crime |
3 | comedy |
4 | scifi |
5 | drama |
6 | espionage |
7 | action |
I can get the oldest film like this:
1 2 3 4 5 |
SELECT * FROM films ORDER BY year LIMIT 1 |
(LIMIT 1
will return the first value in a resultset in PostgreSQL or MySQL. Different RDBMSs have different ways of getting the first value in a resultset. See this for more information.)
I can also get it this way, using a subquery:
1 2 3 4 5 6 |
SELECT * FROM films WHERE year = ( SELECT min(year) FROM films ); |
Subqueries can often take the place of joins. Suppose I want the genre of the oldest film. I can do it like this:
1 2 3 4 5 6 |
SELECT g.name FROM films f JOIN genres g ON f.genre_id = g.id ORDER BY f.year LIMIT 1; |
Or I could nest a couple of subqueries:
1 2 3 4 5 6 7 8 9 10 |
SELECT name FROM genres WHERE id = ( SELECT genre_id FROM films WHERE year = ( SELECT min(year) FROM films ) ); |
Or I could combine the two approaches:
1 2 3 4 5 6 7 |
SELECT g.name FROM films f JOIN genres g ON f.genre_id = g.id WHERE f.year = ( SELECT min(year) FROM films ); |
The IN
Predicate with Subqueries
The list value in the IN
predicate can also be the resultset of a subquery. The resultset must have exactly one column, and that column becomes the list of scalar values that IN
evaluates.
Suppose, along with films
and genres
, I also have this table:
Directors
id | name |
---|---|
1 | Mike Nichols |
2 | Michael Anderson |
3 | John McTiernan |
4 | Tomas Alfredson |
5 | Francis Ford Coppola |
6 | Michael Curtiz |
Now, I want to find all the directors who have directed a sci-fi film. I can do this:
1 2 3 4 5 6 7 |
SELECT name FROM directors WHERE id IN ( SELECT director_id FROM films WHERE genre_id = 4 ); |
Now, if I don’t know the genre_id
(as I probably don’t), I can use a subquery in the inner WHERE
clause to look it up from the name
value:
1 2 3 4 5 6 7 8 9 10 |
SELECT name FROM directors WHERE id IN ( SELECT director_id FROM films WHERE genre_id = ( SELECT id FROM genre WHERE name = 'scifi' ) ); |
And again, I can get the same data using joins:
1 2 3 4 5 6 |
SELECT name FROM directors d JOIN films f ON d.id = f.director_id JOIN genres g on g.id = f.genre_id WHERE g.name = 'scifi' |
In a large database, it’s worth benchmarking the different ways to get the result you want, because one way will often outperform the others.