MOO WebTech
Subqueries & CTEtheory-practiceintermediate

Subqueries

Writing subqueries in WHERE, FROM, and SELECT clauses; correlated subqueries and EXISTS.

80 minL34

๐ŸŽฏLearning Objectives

  • Explain what a subquery is and where it can appear in a SQL statement
  • Write non-correlated subqueries using IN and single-value comparisons
  • Build derived tables in the FROM clause and query them like regular tables
  • Use scalar subqueries in the SELECT clause to compute context values
  • Write correlated subqueries that reference the outer query
  • Use EXISTS and NOT EXISTS to test for the presence or absence of related rows

๐Ÿ“–Theory

1. What Is a Subquery

A subquery is a SELECT statement nested inside another SQL statement. The outer statement treats the result of the inner SELECT as if it were a table, a list, or a single value.

Subqueries appear in three main positions:

  • WHERE clause โ€” filter rows based on a condition derived from another query
  • FROM clause โ€” treat a query result as a temporary table (called a derived table)
  • SELECT clause โ€” compute a scalar value for each output row
SQL
-- Shape of each position
SELECT ... FROM t WHERE col IN (SELECT col FROM t2);           -- WHERE
SELECT ... FROM (SELECT ... FROM t) AS alias;                  -- FROM
SELECT col, (SELECT MAX(x) FROM t2) AS max_val FROM t;         -- SELECT

2. Non-Correlated Subquery in WHERE with IN

A non-correlated subquery runs once and its result is passed to the outer query. The IN operator tests whether a value exists in that result list.

SQL
-- Films in the 'Action' category
SELECT title, rental_rate
FROM film
WHERE film_id IN (
    SELECT film_id FROM film_category
    WHERE category_id = (SELECT category_id FROM category WHERE name = 'Action')
);

The innermost subquery resolves 'Action' to a category_id. The middle subquery returns all film_id values for that category. The outer query filters film by those IDs.

Tip: IN (subquery) is equivalent to = ANY (subquery) in PostgreSQL. Both accept a multi-row subquery result.

3. Single-Row Subquery

When you know a subquery will return exactly one row and one column, you can use = instead of IN. If the subquery unexpectedly returns more than one row, PostgreSQL raises an error.

SQL
-- The customer who spent the most overall
SELECT first_name, last_name
FROM customer
WHERE customer_id = (
    SELECT customer_id
    FROM payment
    GROUP BY customer_id
    ORDER BY SUM(amount) DESC
    LIMIT 1
);

The inner query returns a single customer_id โ€” the top spender. The outer query looks up that customer's name.

4. Derived Table in the FROM Clause

A subquery in the FROM clause is called a derived table or inline view. It must always have an alias. You query it exactly like a regular table.

SQL
-- Average film length by rating, filtered to ratings above 100 minutes
SELECT rating, avg_length
FROM (
    SELECT rating, AVG(length) AS avg_length
    FROM film
    GROUP BY rating
) AS film_stats
WHERE avg_length > 100;

The inner query computes averages; the outer query filters them. You cannot put a WHERE clause with an aggregate condition directly inside the inner GROUP BY without a HAVING clause, but wrapping in a derived table gives you that flexibility cleanly.

5. Scalar Subquery in SELECT

A scalar subquery in the SELECT list is evaluated for every row in the outer query. It must return exactly one row and one column.

SQL
-- Each film's rental rate and how far it is from the overall average
SELECT
    title,
    rental_rate,
    (SELECT AVG(rental_rate) FROM film) AS overall_avg,
    rental_rate - (SELECT AVG(rental_rate) FROM film) AS diff_from_avg
FROM film
ORDER BY diff_from_avg DESC
LIMIT 5;

Tip: PostgreSQL caches the result of a stable scalar subquery, so (SELECT AVG(rental_rate) FROM film) is typically computed once, not once per row.

6. Correlated Subquery

A correlated subquery references a column from the outer query. It is re-evaluated for every row in the outer result โ€” once per row. This makes it powerful but potentially slow on large datasets.

SQL
-- For each customer, show their total number of rentals
SELECT
    c.first_name,
    c.last_name,
    (SELECT COUNT(*) FROM rental r WHERE r.customer_id = c.customer_id) AS rental_count
FROM customer c
ORDER BY rental_count DESC
LIMIT 10;

The inner SELECT COUNT(*) runs once for each customer row, using c.customer_id from the outer query. The result is different for every customer.

7. EXISTS and NOT EXISTS

EXISTS tests whether a subquery returns at least one row. It stops scanning as soon as the first matching row is found โ€” making it efficient for existence checks.

SQL
-- Customers who currently have an unreturned rental
SELECT first_name, last_name
FROM customer c
WHERE EXISTS (
    SELECT 1 FROM rental r
    WHERE r.customer_id = c.customer_id
      AND r.return_date IS NULL
);

The SELECT 1 inside EXISTS is a convention โ€” only the existence of a row matters, not its value.

NOT EXISTS is the opposite: it keeps the outer row only when the subquery returns nothing.

SQL
-- Films that have never been rented
SELECT title
FROM film f
WHERE NOT EXISTS (
    SELECT 1 FROM inventory i
    JOIN rental r ON i.inventory_id = r.inventory_id
    WHERE i.film_id = f.film_id
);

8. Subquery vs JOIN: When to Use Each

SituationPrefer
Need columns from both tables in outputJOIN
Just testing existence or membershipEXISTS / IN (subquery)
Computing an aggregate for contextScalar subquery or CTE
Complex filtering that reads naturally as nestedSubquery
Performance-critical path on large tablesJOIN (usually faster)

Subqueries often express intent more clearly. JOINs are often faster for large datasets because the query planner has more optimisation options. Use the approach that is clearest first; optimise with JOINs or CTEs if needed.

๐Ÿ’ปCode Examples

Example A โ€” Films in a Given Category (Nested IN Subqueries)

SQL
SELECT f.title,
       f.rental_rate,
       f.rating
FROM film f
WHERE f.film_id IN (
    SELECT fc.film_id
    FROM film_category fc
    WHERE fc.category_id = (
        SELECT c.category_id
        FROM category c
        WHERE c.name = 'Horror'
    )
)
ORDER BY f.rental_rate DESC;

This reads naturally: "give me films whose ID appears in the list of film IDs that belong to the Horror category." No JOINs are needed; the nesting communicates the logic directly.

Example B โ€” Customers Above Their Store's Average Payment (Derived Table)

SQL
SELECT c.first_name,
       c.last_name,
       cust_stats.total_paid,
       store_avg.avg_payment
FROM customer c
INNER JOIN (
    SELECT customer_id, SUM(amount) AS total_paid
    FROM payment
    GROUP BY customer_id
) AS cust_stats ON c.customer_id = cust_stats.customer_id
CROSS JOIN (
    SELECT AVG(amount) AS avg_payment FROM payment
) AS store_avg
WHERE cust_stats.total_paid > store_avg.avg_payment
ORDER BY cust_stats.total_paid DESC
LIMIT 10;

Two derived tables are used: one aggregates per-customer totals, and one computes the overall average. The outer query compares them.

Example C โ€” Film Stats With Context (Scalar Subqueries in SELECT)

SQL
SELECT
    f.title,
    f.rental_rate,
    f.length,
    (SELECT AVG(rental_rate) FROM film)  AS avg_rate,
    (SELECT AVG(length)      FROM film)  AS avg_length,
    ROUND(f.rental_rate - (SELECT AVG(rental_rate) FROM film), 2) AS rate_delta,
    ROUND(f.length      - (SELECT AVG(length)      FROM film), 2) AS length_delta
FROM film f
WHERE f.rating = 'PG'
ORDER BY rate_delta DESC
LIMIT 10;

Each row shows the film's values alongside the overall averages, making it easy to see how PG-rated films compare to the full catalogue.

โœ๏ธPractice Tasks

Task 1Films Longer Than Average
EASY

Use a single-row subquery in the WHERE clause to find all films whose length is greater than the average film length. Return title, length, and rating. Order by length descending. Limit to 15 rows.

Hint

The subquery is (SELECT AVG(length) FROM film). Use > to compare.

SQL
SELECT title, length, rating
FROM film
WHERE length > (SELECT AVG(length) FROM film)
ORDER BY length DESC
LIMIT 15;
Task 2Top Renter Per Store
MEDIUM

For each store, find the customer who made the most rentals. Return store_id, customer_id, first_name, last_name, and rental_count.

Use a derived table that counts rentals per customer per store, then join it to customer.

Hint

The derived table groups by store_id and customer_id. After joining to customer, filter each store to its maximum rental count using another subquery or by ranking. One clean approach:

SQL
SELECT s.store_id, c.customer_id, c.first_name, c.last_name, rs.rental_count
FROM (
    SELECT cu.store_id,
           r.customer_id,
           COUNT(*) AS rental_count,
           RANK() OVER (PARTITION BY cu.store_id ORDER BY COUNT(*) DESC) AS rnk
    FROM rental r
    JOIN customer cu ON r.customer_id = cu.customer_id
    GROUP BY cu.store_id, r.customer_id
) AS rs
JOIN customer c ON rs.customer_id = c.customer_id
JOIN store s    ON rs.store_id    = s.store_id
WHERE rs.rnk = 1
ORDER BY s.store_id;
Task 3Customers With Overdue Rentals and Their Debt
HARD

An overdue rental is one where return_date IS NULL and NOW() - rental_date > INTERVAL '7 days'.

Write a query that returns each customer's name, email, the number of overdue rentals they have, and their total unpaid rental fees estimated as overdue_count * 1.99. Use a correlated subquery or derived table. Order by overdue_count descending.

Hint

Use a derived table that counts overdue rentals per customer, then join to customer:

SQL
SELECT c.first_name,
       c.last_name,
       c.email,
       od.overdue_count,
       ROUND(od.overdue_count * 1.99, 2) AS estimated_fee
FROM customer c
INNER JOIN (
    SELECT customer_id, COUNT(*) AS overdue_count
    FROM rental
    WHERE return_date IS NULL
      AND NOW() - rental_date > INTERVAL '7 days'
    GROUP BY customer_id
) AS od ON c.customer_id = od.customer_id
ORDER BY od.overdue_count DESC;

โš ๏ธCommon Mistakes

Returning more than one row with =. Using = (subquery) when the subquery returns multiple rows causes an error: "more than one row returned by a subquery used as an expression." Use IN when multiple rows are possible.

Forgetting to alias a derived table. Every subquery in the FROM clause must have an alias. FROM (SELECT ...) AS alias โ€” without AS alias, PostgreSQL raises a syntax error.

Correlated subquery with poor performance. A correlated subquery runs once per outer row. On a table with 100,000 rows, that is 100,000 executions. For aggregation, a JOIN + GROUP BY or CTE is almost always faster.

*EXISTS with SELECT . WHERE EXISTS (SELECT * FROM ...) works but is a minor anti-pattern. Use SELECT 1 โ€” it is conventional, slightly more readable, and makes the intent clear.

Scalar subquery returning NULL. If a scalar subquery finds no rows, it returns NULL. Arithmetic with NULL propagates NULL. Use COALESCE((subquery), 0) when a default makes sense.

Nesting too deeply. Three or more levels of nested subqueries become hard to read and debug. Beyond two levels, refactor with CTEs.

๐ŸŽ“Instructor Notes

Timing breakdown:

  • 0โ€“5 min: Warm-up โ€” show a query with a hardcoded ID and ask "how do we avoid this?"
  • 5โ€“20 min: Non-correlated WHERE subquery โ€” IN vs single-value, live demo
  • 20โ€“30 min: Derived table in FROM โ€” show aggregation then filtering
  • 30โ€“40 min: Scalar subquery in SELECT โ€” live demo with the average comparison
  • 40โ€“50 min: Correlated subquery โ€” emphasise one-execution-per-row behaviour; draw the loop
  • 50โ€“60 min: EXISTS / NOT EXISTS โ€” start with the "films never rented" problem
  • 60โ€“80 min: Students work on Tasks 1โ€“3

Live-coding cues:

  • Run the inner subquery first, alone, to show its result. Then wrap it in the outer query. This builds the mental model of "inside-out" execution.
  • For correlated subqueries, use EXPLAIN to show the "SubPlan" node and spark a discussion about cost.
  • Deliberately use = (SELECT ...) where multiple rows are possible, let it fail, then switch to IN.

Discussion questions:

  • "When would you choose EXISTS over IN? Does it matter for correctness? For performance?"
  • "What is the danger of a scalar subquery that sometimes returns zero rows?"
  • "Can a derived table in FROM be joined to another table? Can it reference an outer query?"