Table of Contents
A subquery is a query nested inside another query. Think of it like asking a question within a question: "Show me all employees whose salary is above (what is the average salary?)." The part in parentheses is the subquery — MySQL runs it first, gets the answer, then uses that answer in the outer query.
10.1 — Scalar Subquery in SELECT
A scalar subquery returns exactly one value (one row, one column). You can use it as a "computed column" inside your SELECT list. MySQL runs the subquery once and pastes that single value into every row of your result.
SELECT
c.customer_id,
c.name,
c.city,
(SELECT COUNT(*) FROM orders) AS total_orders_in_system,
(SELECT ROUND(AVG(order_count), 1)
FROM (
SELECT COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS avg_tbl
) AS avg_orders_per_customer
FROM customers c
LIMIT 5;| customer_id | name | city | total_orders_in_system | avg_orders_per_customer |
|---|---|---|---|---|
| 1 | Alice Johnson | New York | 1284 | 4.2 |
| 2 | Bob Smith | Chicago | 1284 | 4.2 |
| 3 | Carol Davis | Houston | 1284 | 4.2 |
| 4 | David Wilson | Phoenix | 1284 | 4.2 |
| 5 | Elena Martin | Seattle | 1284 | 4.2 |
Notice: the scalar subquery values (1284 and 4.2) are the same on every row because they are computed once.
10.2 — Subquery in WHERE with Comparison
You can use a subquery inside a WHERE clause with comparison operators like =, >, <. The subquery computes a single value, and the outer query filters rows against it. Think of it as: "First, figure out the average — then show me everything above that number."
-- Find customers who have more orders than the average customer
SELECT
c.customer_id,
c.name,
c.city,
COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city
HAVING COUNT(o.order_id) > (
-- This subquery calculates the average number of orders per customer
SELECT AVG(cnt)
FROM (
SELECT COUNT(*) AS cnt
FROM orders
GROUP BY customer_id
) AS sub
)
ORDER BY order_count DESC;| customer_id | name | city | order_count |
|---|---|---|---|
| 12 | Rachel Green | Denver | 11 |
| 3 | Carol Davis | Houston | 9 |
| 7 | Frank Miller | Boston | 8 |
| 1 | Alice Johnson | New York | 7 |
| 15 | Tom Baker | Portland | 6 |
Average is 4.2, so only customers with more than 4 orders appear.
10.3 — Subquery in WHERE with IN
When the subquery returns multiple rows (a list of values), you cannot use = — you need IN. It is like saying: "Give me all customers whose ID is in this list." The subquery generates the list.
SELECT
c.customer_id,
c.name,
c.email,
c.city
FROM customers c
WHERE c.customer_id IN (
-- Subquery: get IDs of customers with priority orders
SELECT DISTINCT o.customer_id
FROM orders o
WHERE o.priority = 'Urgent'
)
ORDER BY c.name;| customer_id | name | city | |
|---|---|---|---|
| 3 | Carol Davis | carol@example.com | Houston |
| 7 | Frank Miller | frank@example.com | Boston |
| 1 | Alice Johnson | alice@example.com | New York |
| 12 | Rachel Green | rachel@example.com | Denver |
10.4 — Correlated Subquery
A correlated subquery is special because it references a column from the outer query. This means the subquery cannot run on its own — it depends on the outer query. MySQL runs it once for every row in the outer query. It is slower but very powerful when you need per-row calculations.
Think of it like a manager reviewing reports: for each employee (outer row), they check that employee's sales history (subquery). They cannot review all employees at once because each check is different.
Regular Subquery
Correlated Subquery
SELECT
c.customer_id,
c.name,
c.city,
o.order_id,
o.order_date,
o.priority
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (
-- Correlated: references c.customer_id from the OUTER query
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = c.customer_id -- <-- this links to outer query!
)
ORDER BY o.order_date DESC;| customer_id | name | city | order_id | order_date | priority |
|---|---|---|---|---|---|
| 12 | Rachel Green | Denver | 1047 | 2025-02-20 | Urgent |
| 7 | Frank Miller | Boston | 1045 | 2025-02-18 | Standard |
| 3 | Carol Davis | Houston | 1042 | 2025-02-15 | Urgent |
| 1 | Alice Johnson | New York | 1039 | 2025-02-12 | Standard |
| 15 | Tom Baker | Portland | 1036 | 2025-02-08 | Express |
10.5 — Derived Table (Subquery in FROM)
A derived table is a subquery placed in the FROM clause. It acts like a temporary table that exists only for that one query. You must give it an alias name. This is great for calculating summaries first, then filtering or joining against them.
-- Step 1: The subquery creates a "temporary table" of department order counts
-- Step 2: The outer query filters it to find high-volume departments
SELECT
dept_stats.department,
dept_stats.manager_name,
dept_stats.order_count,
dept_stats.urgent_orders
FROM (
-- This whole subquery IS the derived table
SELECT
d.name AS manager_name,
d.department,
COUNT(o.order_id) AS order_count,
SUM(CASE WHEN o.priority = 'Urgent' THEN 1 ELSE 0 END) AS urgent_orders
FROM departments d
JOIN orders o ON d.department_id = o.department_id
GROUP BY d.name, d.department
) AS dept_stats -- <-- alias is REQUIRED
WHERE dept_stats.order_count > 10
ORDER BY dept_stats.order_count DESC;| department | manager_name | order_count | urgent_orders |
|---|---|---|---|
| Sales | Linda Park | 47 | 18 |
| Marketing | James Taylor | 35 | 5 |
| Engineering | Sarah Chen | 28 | 9 |
| Support | Mike Brown | 22 | 12 |
| Operations | Nancy White | 15 | 3 |
10.6 — EXISTS Subquery
EXISTS checks whether a subquery returns any rows at all. It does not care about the actual values — it just asks "does at least one matching row exist?" It returns TRUE or FALSE. This is often faster than IN because it stops searching as soon as it finds the first match.
SELECT
c.customer_id,
c.name,
c.email,
c.city
FROM customers c
WHERE EXISTS (
-- Does this customer have at least one high-value order?
SELECT 1 -- value doesn't matter, EXISTS just checks for rows
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.total > 500
)
ORDER BY c.name;| customer_id | name | city | |
|---|---|---|---|
| 3 | Carol Davis | carol@example.com | Houston |
| 7 | Frank Miller | frank@example.com | Boston |
| 4 | David Wilson | david@example.com | Phoenix |
| 12 | Rachel Green | rachel@example.com | Denver |
| 1 | Alice Johnson | alice@example.com | New York |
SELECT 1 is a common convention to make this clear.
10.7 — NOT EXISTS
NOT EXISTS is the opposite of EXISTS. It returns TRUE when the subquery produces zero rows. This is perfect for finding "orphan" records — rows that have no matching rows in another table. For example, customers who have never placed an order.
SELECT
c.customer_id,
c.name,
c.email,
c.city
FROM customers c
WHERE NOT EXISTS (
-- Is there any order at all for this customer?
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
)
ORDER BY c.name;| customer_id | name | city | |
|---|---|---|---|
| 20 | Kevin Adams | kevin@example.com | Austin |
| 18 | Laura Bennett | laura@example.com | Miami |
| 21 | Olivia Foster | olivia@example.com | Atlanta |
These 3 customers are registered but have never placed an order.
- EXISTS is usually faster when the subquery table is large, because it stops at the first match.
- IN is simpler to read and works well when the subquery returns a small list.
- NOT EXISTS handles NULLs correctly (unlike NOT IN, which can give unexpected results if the subquery returns NULLs).
Set operations let you combine the results of two or more SELECT queries into one result set. Think of it like combining two lists. All set operations require that both queries return the same number of columns with compatible data types.
11.1 — UNION (Combine + Deduplicate)
UNION takes the results of two SELECT statements and stacks them on top of each other, then removes any duplicate rows. It is like merging two attendance lists and crossing off any names that appear on both.
-- Query 1: Customers with orders from the Online channel
SELECT c.customer_id, c.name, c.city, 'Online' AS source
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.channel = 'Online'
UNION -- removes duplicates
-- Query 2: Customers with Urgent priority orders
SELECT c.customer_id, c.name, c.city, 'Urgent Order' AS source
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.priority = 'Urgent';Query 1 returns:
| customer_id | name | city | source |
|---|---|---|---|
| 1 | Alice Johnson | New York | Online |
| 3 | Carol Davis | Houston | Online |
| 7 | Frank Miller | Boston | Online |
| 12 | Rachel Green | Denver | Online |
Query 2 returns:
| customer_id | name | city | source |
|---|---|---|---|
| 3 | Carol Davis | Houston | Urgent Order |
| 7 | Frank Miller | Boston | Urgent Order |
| 12 | Rachel Green | Denver | Urgent Order |
| 15 | Tom Baker | Portland | Urgent Order |
UNION result (combined, duplicates removed):
| customer_id | name | city | source |
|---|---|---|---|
| 1 | Alice Johnson | New York | Online |
| 3 | Carol Davis | Houston | Online |
| 3 | Carol Davis | Houston | Urgent Order |
| 7 | Frank Miller | Boston | Online |
| 7 | Frank Miller | Boston | Urgent Order |
| 12 | Rachel Green | Denver | Online |
| 12 | Rachel Green | Denver | Urgent Order |
| 15 | Tom Baker | Portland | Urgent Order |
Note: rows are unique because the "source" column differs. UNION deduplicates based on ALL columns together.
11.2 — UNION ALL (Combine Keeping Duplicates)
UNION ALL works just like UNION, but it does not remove duplicates. It simply stacks all rows from both queries. It is faster than UNION because it skips the deduplication step. Use it when you know there are no duplicates, or when you deliberately want to keep them.
-- All Electronics orders from January
SELECT o.customer_id, p.name AS product_name, o.total, o.order_date
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Electronics'
AND o.order_date BETWEEN '2025-01-01' AND '2025-01-31'
UNION ALL -- keeps ALL rows, even if identical
-- All Clothing orders from January
SELECT o.customer_id, p.name AS product_name, o.total, o.order_date
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Clothing'
AND o.order_date BETWEEN '2025-01-01' AND '2025-01-31';| customer_id | product_name | total | order_date |
|---|---|---|---|
| 1 | Wireless Headphones | 149.99 | 2025-01-05 |
| 3 | Bluetooth Speaker | 89.50 | 2025-01-10 |
| 7 | USB-C Hub | 45.00 | 2025-01-12 |
| 12 | Laptop Stand | 129.00 | 2025-01-20 |
| 1 | Winter Jacket | 199.00 | 2025-01-05 |
| 3 | Running Shoes | 120.50 | 2025-01-10 |
| 7 | Denim Jeans | 65.00 | 2025-01-12 |
| 15 | Wool Sweater | 85.00 | 2025-01-22 |
11.3 — INTERSECT (Common Rows Only)
INTERSECT returns only the rows that appear in both query results. It is the overlap area in a Venn diagram. Available in MySQL 8.0.31 and later.
-- Customers who bought Electronics
SELECT o.customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Electronics'
INTERSECT
-- Customers who bought Clothing
SELECT o.customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Clothing';| customer_id |
|---|
| 1 |
| 3 |
| 7 |
| 12 |
Only customers who appear in both result sets are returned.
11.4 — EXCEPT (Rows in First But Not Second)
EXCEPT returns rows from the first query that are not present in the second query. Think of it as "subtraction" for result sets. Available in MySQL 8.0.31 and later.
-- Customers who bought Electronics
SELECT o.customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Electronics'
EXCEPT
-- Remove those who also bought Clothing
SELECT o.customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Clothing';| customer_id |
|---|
| 4 |
| 15 |
| 18 |
These customers bought Electronics but never bought Clothing.
A EXCEPT B gives different results than B EXCEPT A. The first query is always the "starting set" and the second query defines what gets removed.
An index is like the index at the back of a textbook. Without it, to find the word "inventory," you would have to flip through every single page. With an index, you look up "inventory" in the back, see "page 142," and go right to it. Database indexes work the same way — they help MySQL find rows without scanning the entire table.
Without Index (Full Table Scan)
Must check EVERY row to find customer_id = 7
With Index (Direct Lookup)
Index says: customer_id 7 is at row position 7. Jump there!
12.1 — CREATE INDEX
CREATE INDEX adds a basic index on one column. It is like adding a tab to your binder for a specific subject. MySQL will use it to quickly look up rows by that column instead of scanning the whole table.
Without Index
"Find all orders for customer_id = 7"
Read EVERY page of the book from start to finish (full table scan: 50,000 rows checked)
With Index
"Find all orders for customer_id = 7"
Look up "customer_id: 7" in the index → pages 142, 305, 410 (only 3 rows checked!)
-- Syntax
CREATE INDEX index_name ON table_name (column_name);
-- Example: speed up lookups by customer_id on orders
CREATE INDEX idx_orders_customer
ON orders (customer_id);| Result | |
|---|---|
| Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 |
idx_ followed by the table and column name. Example: idx_orders_customer. This makes it easy to identify what the index is for.
12.2 — CREATE UNIQUE INDEX
A unique index works like a regular index but also enforces uniqueness — no two rows can have the same value in the indexed column(s). It is both a performance tool and a data integrity constraint. Attempting to insert a duplicate value will produce an error.
-- Ensure no two customers have the same email address
CREATE UNIQUE INDEX idx_customers_email
ON customers (email);
-- Now try inserting a duplicate:
INSERT INTO customers (name, email, city)
VALUES ('Test Customer', 'alice@example.com', 'Dallas');
-- ERROR 1062 (23000): Duplicate entry 'alice@example.com' for key 'idx_customers_email'| Statement | Result |
|---|---|
| CREATE UNIQUE INDEX | Query OK, 0 rows affected |
| INSERT (duplicate email) | ERROR 1062: Duplicate entry 'alice@example.com' |
12.3 — CREATE FULLTEXT INDEX
A fulltext index is designed for searching within text columns (like product descriptions or review comments). Regular indexes match exact values; fulltext indexes can find words within large blocks of text. You use it with the MATCH(...) AGAINST(...) syntax.
-- Create a fulltext index on the description column
CREATE FULLTEXT INDEX idx_products_description
ON products (description);
-- Search for products mentioning "lightweight" or "portable"
SELECT
product_id,
name,
description,
MATCH(description) AGAINST('lightweight portable') AS relevance_score
FROM products
WHERE MATCH(description) AGAINST('lightweight portable')
ORDER BY relevance_score DESC
LIMIT 5;| product_id | name | description | relevance_score |
|---|---|---|---|
| 287 | Travel Backpack | Lightweight portable backpack, perfect for daily commute | 2.8341 |
| 142 | Folding Desk | Portable folding desk, lightweight aluminum frame | 2.1205 |
| 395 | Bluetooth Speaker | Portable speaker with lightweight design for outdoor use | 1.9876 |
| 501 | Laptop Sleeve | Slim lightweight sleeve, fits most portable laptops | 1.2043 |
| 89 | Power Bank | Lightweight charger, portable enough for travel | 1.1502 |
Results are ranked by relevance. Rows containing both "lightweight" AND "portable" score higher.
12.4 — Composite Index (Multi-Column)
A composite index covers multiple columns. Column order matters! MySQL can use the index for queries that filter on the leftmost columns. Think of a phone book sorted by (last_name, first_name) — you can look up by last_name alone, or by last_name + first_name, but you cannot efficiently look up by first_name alone.
-- Index on (customer_id, order_date) for orders
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);| Result | |
|---|---|
| Query OK, 0 rows affected (0.22 sec) |
Index on: (customer_id, order_date, product_id)
12.5 — DROP INDEX
DROP INDEX removes an index from a table. The data in the table is not affected — only the index structure is deleted. Queries that relied on that index will become slower (they will fall back to full table scans).
-- Remove the index we created earlier
DROP INDEX idx_orders_customer ON orders;
-- Alternative syntax (MySQL-specific)
ALTER TABLE orders DROP INDEX idx_orders_customer;| Result | |
|---|---|
| Query OK, 0 rows affected (0.08 sec) |
12.6 — SHOW INDEXES FROM table
SHOW INDEXES displays all the indexes that exist on a table, including their names, which columns they cover, whether they are unique, and more. It is essential for understanding how a table is currently optimized.
SHOW INDEXES FROM orders;| Table | Non_unique | Key_name | Seq_in_index | Column_name | Index_type |
|---|---|---|---|---|---|
| orders | 0 | PRIMARY | 1 | order_id | BTREE |
| orders | 1 | idx_orders_customer_date | 1 | customer_id | BTREE |
| orders | 1 | idx_orders_customer_date | 2 | order_date | BTREE |
| orders | 1 | idx_orders_product | 1 | product_id | BTREE |
| orders | 1 | idx_orders_notes | 1 | notes | FULLTEXT |
Non_unique=0 means unique index (like PRIMARY). Non_unique=1 means duplicates allowed. Seq_in_index shows column order in composite indexes.
12.7 — EXPLAIN SELECT
EXPLAIN shows MySQL's execution plan for a query — how it plans to find your data. It tells you whether it will use an index, how many rows it expects to scan, and what join strategy it will use. This is the most important tool for diagnosing slow queries.
EXPLAIN SELECT *
FROM orders
WHERE customer_id = 7
AND order_date > '2025-01-01';| id | select_type | table | type | possible_keys | key | key_len | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | orders | range | idx_orders_customer_date | idx_orders_customer_date | 8 | 5 | 100.00 | Using index condition |
- type — How rows are accessed. From best to worst:
const→eq_ref→ref→range→index→ALL(full scan = bad!) - possible_keys — Which indexes MySQL considered using
- key — Which index MySQL actually chose (NULL = no index used!)
- rows — Estimated number of rows MySQL needs to examine. Lower = better.
- Extra —
Using indexis great (covering index).Using filesortorUsing temporarycan indicate performance issues.
Comparison: same query with and without index
EXPLAIN Without Index
| type | key | rows | Extra |
|---|---|---|---|
| ALL | NULL | 52,480 | Using where |
Full table scan! Checks every row.
EXPLAIN With Index
| type | key | rows | Extra |
|---|---|---|---|
| range | idx_orders_customer_date | 5 | Using index condition |
Only 5 rows examined! 10,000x faster.
12.8 — EXPLAIN ANALYZE SELECT
EXPLAIN ANALYZE goes beyond regular EXPLAIN — it actually runs the query and shows real execution statistics: actual time, actual rows, and the number of loops. This is invaluable for finding exactly where a query spends its time. Available in MySQL 8.0.18+.
EXPLAIN ANALYZE
SELECT c.name, c.city, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > '2025-01-01'
GROUP BY c.customer_id, c.name, c.city
ORDER BY order_count DESC
LIMIT 5;-> Limit: 5 row(s) (actual time=1.245..1.248 rows=5 loops=1)
-> Sort: order_count DESC, limit input to 5 row(s) per chunk
(actual time=1.243..1.245 rows=5 loops=1)
-> Table scan on <temporary>
(actual time=1.102..1.138 rows=18 loops=1)
-> Aggregate using temporary table
(actual time=1.098..1.098 rows=18 loops=1)
-> Nested loop join (cost=42.15 rows=87)
(actual time=0.085..0.892 rows=87 loops=1)
-> Filter: (o.order_date > '2025-01-01')
(actual time=0.052..0.415 rows=87 loops=1)
-> Index range scan on o
using idx_orders_customer_date
(actual time=0.048..0.328 rows=87 loops=1)
-> Single-row index lookup on c
using PRIMARY (customer_id=o.customer_id)
(actual time=0.004..0.004 rows=1 loops=87)- actual time — Two numbers: time to return first row .. time to return all rows (in milliseconds)
- rows — Actual number of rows processed at this step
- loops — How many times this step was executed. For a nested loop join, the inner side runs once per outer row.
- Read from the innermost indentation outward to understand the execution flow.
- Look for steps with high
actual timeor highrows— those are your bottlenecks.
Good Candidates for Indexes
- Columns used in WHERE clauses frequently
- Columns used in JOIN conditions (foreign keys)
- Columns used in ORDER BY or GROUP BY
- Columns with high cardinality (many unique values, like customer_id or email)
- Large tables (thousands+ of rows)
Bad Candidates for Indexes
- Small tables (under a few hundred rows) — full scan is fast enough
- Columns with few unique values (like gender or yes/no flags)
- Tables with heavy INSERT/UPDATE/DELETE — indexes slow down writes because they must be updated too
- Columns rarely used in WHERE, JOIN, or ORDER BY
- Wide columns (very long text) — use prefix indexes or fulltext instead