MySQL 8.0 Learning Guide — Part 5

← Back to Main Guide

Part 5: Subqueries, Set Operations & Indexes

Master nested queries, combine result sets, and turbocharge performance with indexes — all with real business database examples.

Table of Contents

Section 10 — Subqueries & Derived Tables
Section 11 — Set Operations
Section 12 — Indexes & Performance
10
Subqueries & Derived Tables
Intermediate

A subquery is a query nested inside another query. Think of it like asking a question within a question: "Show me all employees whose salary is above (what is the average salary?)." The part in parentheses is the subquery — MySQL runs it first, gets the answer, then uses that answer in the outer query.

10.1 — Scalar Subquery in SELECT

A scalar subquery returns exactly one value (one row, one column). You can use it as a "computed column" inside your SELECT list. MySQL runs the subquery once and pastes that single value into every row of your result.

SQL — Scalar Subquery in SELECT
SELECT c.customer_id, c.name, c.city, (SELECT COUNT(*) FROM orders) AS total_orders_in_system, (SELECT ROUND(AVG(order_count), 1) FROM ( SELECT COUNT(*) AS order_count FROM orders GROUP BY customer_id ) AS avg_tbl ) AS avg_orders_per_customer FROM customers c LIMIT 5;
customer_idnamecitytotal_orders_in_systemavg_orders_per_customer
1Alice JohnsonNew York12844.2
2Bob SmithChicago12844.2
3Carol DavisHouston12844.2
4David WilsonPhoenix12844.2
5Elena MartinSeattle12844.2

Notice: the scalar subquery values (1284 and 4.2) are the same on every row because they are computed once.

10.2 — Subquery in WHERE with Comparison

You can use a subquery inside a WHERE clause with comparison operators like =, >, <. The subquery computes a single value, and the outer query filters rows against it. Think of it as: "First, figure out the average — then show me everything above that number."

SQL — Customers with Orders Above Average
-- Find customers who have more orders than the average customer SELECT c.customer_id, c.name, c.city, COUNT(o.order_id) AS order_count FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name, c.city HAVING COUNT(o.order_id) > ( -- This subquery calculates the average number of orders per customer SELECT AVG(cnt) FROM ( SELECT COUNT(*) AS cnt FROM orders GROUP BY customer_id ) AS sub ) ORDER BY order_count DESC;
customer_idnamecityorder_count
12Rachel GreenDenver11
3Carol DavisHouston9
7Frank MillerBoston8
1Alice JohnsonNew York7
15Tom BakerPortland6

Average is 4.2, so only customers with more than 4 orders appear.

10.3 — Subquery in WHERE with IN

When the subquery returns multiple rows (a list of values), you cannot use = — you need IN. It is like saying: "Give me all customers whose ID is in this list." The subquery generates the list.

SQL — Customers Who Have Priority (Urgent) Orders
SELECT c.customer_id, c.name, c.email, c.city FROM customers c WHERE c.customer_id IN ( -- Subquery: get IDs of customers with priority orders SELECT DISTINCT o.customer_id FROM orders o WHERE o.priority = 'Urgent' ) ORDER BY c.name;
customer_idnameemailcity
3Carol Daviscarol@example.comHouston
7Frank Millerfrank@example.comBoston
1Alice Johnsonalice@example.comNew York
12Rachel Greenrachel@example.comDenver

10.4 — Correlated Subquery

A correlated subquery is special because it references a column from the outer query. This means the subquery cannot run on its own — it depends on the outer query. MySQL runs it once for every row in the outer query. It is slower but very powerful when you need per-row calculations.

Think of it like a manager reviewing reports: for each employee (outer row), they check that employee's sales history (subquery). They cannot review all employees at once because each check is different.

Regular Subquery vs. Correlated Subquery
Regular Subquery
Inner query runs ONCE
Produces a single result (e.g., 4.2)
Outer query uses that result for ALL rows
Correlated Subquery
Outer query: Row 1 (Alice)
↓ passes customer_id=1
Inner query runs for Alice → 2025-12-01
——————
Outer query: Row 2 (Bob)
↓ passes customer_id=2
Inner query runs for Bob → 2025-01-15
——————
Repeats for EVERY row...
SQL — Each Customer's Most Recent Order (Correlated)
SELECT c.customer_id, c.name, c.city, o.order_id, o.order_date, o.priority FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date = ( -- Correlated: references c.customer_id from the OUTER query SELECT MAX(o2.order_date) FROM orders o2 WHERE o2.customer_id = c.customer_id -- <-- this links to outer query! ) ORDER BY o.order_date DESC;
customer_idnamecityorder_idorder_datepriority
12Rachel GreenDenver10472025-02-20Urgent
7Frank MillerBoston10452025-02-18Standard
3Carol DavisHouston10422025-02-15Urgent
1Alice JohnsonNew York10392025-02-12Standard
15Tom BakerPortland10362025-02-08Express
Performance note: Correlated subqueries can be slow on large tables because the inner query runs once per outer row. If you have 10,000 customers, the subquery runs 10,000 times. Consider using JOINs or window functions as alternatives when performance matters.

10.5 — Derived Table (Subquery in FROM)

A derived table is a subquery placed in the FROM clause. It acts like a temporary table that exists only for that one query. You must give it an alias name. This is great for calculating summaries first, then filtering or joining against them.

SQL — Department Stats Then Filter High-Volume
-- Step 1: The subquery creates a "temporary table" of department order counts -- Step 2: The outer query filters it to find high-volume departments SELECT dept_stats.department, dept_stats.manager_name, dept_stats.order_count, dept_stats.urgent_orders FROM ( -- This whole subquery IS the derived table SELECT d.name AS manager_name, d.department, COUNT(o.order_id) AS order_count, SUM(CASE WHEN o.priority = 'Urgent' THEN 1 ELSE 0 END) AS urgent_orders FROM departments d JOIN orders o ON d.department_id = o.department_id GROUP BY d.name, d.department ) AS dept_stats -- <-- alias is REQUIRED WHERE dept_stats.order_count > 10 ORDER BY dept_stats.order_count DESC;
departmentmanager_nameorder_counturgent_orders
SalesLinda Park4718
MarketingJames Taylor355
EngineeringSarah Chen289
SupportMike Brown2212
OperationsNancy White153

10.6 — EXISTS Subquery

EXISTS checks whether a subquery returns any rows at all. It does not care about the actual values — it just asks "does at least one matching row exist?" It returns TRUE or FALSE. This is often faster than IN because it stops searching as soon as it finds the first match.

SQL — Customers Who Have High-Value Orders
SELECT c.customer_id, c.name, c.email, c.city FROM customers c WHERE EXISTS ( -- Does this customer have at least one high-value order? SELECT 1 -- value doesn't matter, EXISTS just checks for rows FROM orders o WHERE o.customer_id = c.customer_id AND o.total > 500 ) ORDER BY c.name;
customer_idnameemailcity
3Carol Daviscarol@example.comHouston
7Frank Millerfrank@example.comBoston
4David Wilsondavid@example.comPhoenix
12Rachel Greenrachel@example.comDenver
1Alice Johnsonalice@example.comNew York
Why SELECT 1? Inside an EXISTS subquery, the actual column you select does not matter. MySQL only cares whether any row is returned, not what values it contains. Writing SELECT 1 is a common convention to make this clear.

10.7 — NOT EXISTS

NOT EXISTS is the opposite of EXISTS. It returns TRUE when the subquery produces zero rows. This is perfect for finding "orphan" records — rows that have no matching rows in another table. For example, customers who have never placed an order.

SQL — Customers With No Orders
SELECT c.customer_id, c.name, c.email, c.city FROM customers c WHERE NOT EXISTS ( -- Is there any order at all for this customer? SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id ) ORDER BY c.name;
customer_idnameemailcity
20Kevin Adamskevin@example.comAustin
18Laura Bennettlaura@example.comMiami
21Olivia Fosterolivia@example.comAtlanta

These 3 customers are registered but have never placed an order.

EXISTS vs. IN — When to use which?
  • EXISTS is usually faster when the subquery table is large, because it stops at the first match.
  • IN is simpler to read and works well when the subquery returns a small list.
  • NOT EXISTS handles NULLs correctly (unlike NOT IN, which can give unexpected results if the subquery returns NULLs).
11
Set Operations
Intermediate

Set operations let you combine the results of two or more SELECT queries into one result set. Think of it like combining two lists. All set operations require that both queries return the same number of columns with compatible data types.

Visual Overview — All Four Set Operations
UNION — Everything (no dupes)
dupes kept
UNION ALL — Everything (with dupes)
INTERSECT — Only overlap
EXCEPT — Left only (not in right)

11.1 — UNION (Combine + Deduplicate)

UNION takes the results of two SELECT statements and stacks them on top of each other, then removes any duplicate rows. It is like merging two attendance lists and crossing off any names that appear on both.

SQL — UNION: Customers from Online and Wholesale channels (no duplicates)
-- Query 1: Customers with orders from the Online channel SELECT c.customer_id, c.name, c.city, 'Online' AS source FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.channel = 'Online' UNION -- removes duplicates -- Query 2: Customers with Urgent priority orders SELECT c.customer_id, c.name, c.city, 'Urgent Order' AS source FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.priority = 'Urgent';

Query 1 returns:

customer_idnamecitysource
1Alice JohnsonNew YorkOnline
3Carol DavisHoustonOnline
7Frank MillerBostonOnline
12Rachel GreenDenverOnline

Query 2 returns:

customer_idnamecitysource
3Carol DavisHoustonUrgent Order
7Frank MillerBostonUrgent Order
12Rachel GreenDenverUrgent Order
15Tom BakerPortlandUrgent Order

UNION result (combined, duplicates removed):

customer_idnamecitysource
1Alice JohnsonNew YorkOnline
3Carol DavisHoustonOnline
3Carol DavisHoustonUrgent Order
7Frank MillerBostonOnline
7Frank MillerBostonUrgent Order
12Rachel GreenDenverOnline
12Rachel GreenDenverUrgent Order
15Tom BakerPortlandUrgent Order

Note: rows are unique because the "source" column differs. UNION deduplicates based on ALL columns together.

11.2 — UNION ALL (Combine Keeping Duplicates)

UNION ALL works just like UNION, but it does not remove duplicates. It simply stacks all rows from both queries. It is faster than UNION because it skips the deduplication step. Use it when you know there are no duplicates, or when you deliberately want to keep them.

SQL — UNION ALL: Combine recent Electronics and Clothing orders
-- All Electronics orders from January SELECT o.customer_id, p.name AS product_name, o.total, o.order_date FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Electronics' AND o.order_date BETWEEN '2025-01-01' AND '2025-01-31' UNION ALL -- keeps ALL rows, even if identical -- All Clothing orders from January SELECT o.customer_id, p.name AS product_name, o.total, o.order_date FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Clothing' AND o.order_date BETWEEN '2025-01-01' AND '2025-01-31';
customer_idproduct_nametotalorder_date
1Wireless Headphones149.992025-01-05
3Bluetooth Speaker89.502025-01-10
7USB-C Hub45.002025-01-12
12Laptop Stand129.002025-01-20
1Winter Jacket199.002025-01-05
3Running Shoes120.502025-01-10
7Denim Jeans65.002025-01-12
15Wool Sweater85.002025-01-22
UNION vs UNION ALL speed: UNION ALL is always faster because it skips the sort + deduplication step. If you are certain there will be no duplicates (e.g., the two queries pull from different product categories), always prefer UNION ALL.

11.3 — INTERSECT (Common Rows Only)

INTERSECT returns only the rows that appear in both query results. It is the overlap area in a Venn diagram. Available in MySQL 8.0.31 and later.

INTERSECT — Only the Overlap
Q1 Q2 BOTH
SQL — INTERSECT: Customers who bought from BOTH Electronics and Clothing
-- Customers who bought Electronics SELECT o.customer_id FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Electronics' INTERSECT -- Customers who bought Clothing SELECT o.customer_id FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Clothing';
customer_id
1
3
7
12

Only customers who appear in both result sets are returned.

11.4 — EXCEPT (Rows in First But Not Second)

EXCEPT returns rows from the first query that are not present in the second query. Think of it as "subtraction" for result sets. Available in MySQL 8.0.31 and later.

EXCEPT — Left Side Minus Overlap
KEPT Q2 removed
SQL — EXCEPT: Customers who bought Electronics but NOT Clothing
-- Customers who bought Electronics SELECT o.customer_id FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Electronics' EXCEPT -- Remove those who also bought Clothing SELECT o.customer_id FROM orders o JOIN products p ON o.product_id = p.product_id WHERE p.category = 'Clothing';
customer_id
4
15
18

These customers bought Electronics but never bought Clothing.

Order matters with EXCEPT! A EXCEPT B gives different results than B EXCEPT A. The first query is always the "starting set" and the second query defines what gets removed.
12
Indexes & Performance
Intermediate

An index is like the index at the back of a textbook. Without it, to find the word "inventory," you would have to flip through every single page. With an index, you look up "inventory" in the back, see "page 142," and go right to it. Database indexes work the same way — they help MySQL find rows without scanning the entire table.

Query WITHOUT Index vs. WITH Index
Without Index (Full Table Scan)

Must check EVERY row to find customer_id = 7

1
2
3
4
5
6
7
8
9
10
11
12
Checked 12 of 12 rows (red = scanned, green = match)
vs
With Index (Direct Lookup)

Index says: customer_id 7 is at row position 7. Jump there!

1
2
3
4
5
6
7
8
9
10
11
12
Checked 1 of 12 rows (grey = skipped, green = match)

12.1 — CREATE INDEX

CREATE INDEX adds a basic index on one column. It is like adding a tab to your binder for a specific subject. MySQL will use it to quickly look up rows by that column instead of scanning the whole table.

Book Index Analogy
Without Index

"Find all orders for customer_id = 7"

Read EVERY page of the book from start to finish (full table scan: 50,000 rows checked)

With Index

"Find all orders for customer_id = 7"

Look up "customer_id: 7" in the index → pages 142, 305, 410 (only 3 rows checked!)

SQL — CREATE INDEX
-- Syntax CREATE INDEX index_name ON table_name (column_name); -- Example: speed up lookups by customer_id on orders CREATE INDEX idx_orders_customer ON orders (customer_id);
Result
Query OK, 0 rows affected (0.15 sec)
Records: 0   Duplicates: 0   Warnings: 0
Naming convention: Prefix index names with idx_ followed by the table and column name. Example: idx_orders_customer. This makes it easy to identify what the index is for.

12.2 — CREATE UNIQUE INDEX

A unique index works like a regular index but also enforces uniqueness — no two rows can have the same value in the indexed column(s). It is both a performance tool and a data integrity constraint. Attempting to insert a duplicate value will produce an error.

SQL — CREATE UNIQUE INDEX
-- Ensure no two customers have the same email address CREATE UNIQUE INDEX idx_customers_email ON customers (email); -- Now try inserting a duplicate: INSERT INTO customers (name, email, city) VALUES ('Test Customer', 'alice@example.com', 'Dallas'); -- ERROR 1062 (23000): Duplicate entry 'alice@example.com' for key 'idx_customers_email'
StatementResult
CREATE UNIQUE INDEXQuery OK, 0 rows affected
INSERT (duplicate email)ERROR 1062: Duplicate entry 'alice@example.com'

12.3 — CREATE FULLTEXT INDEX

A fulltext index is designed for searching within text columns (like product descriptions or review comments). Regular indexes match exact values; fulltext indexes can find words within large blocks of text. You use it with the MATCH(...) AGAINST(...) syntax.

SQL — CREATE FULLTEXT INDEX + Search
-- Create a fulltext index on the description column CREATE FULLTEXT INDEX idx_products_description ON products (description); -- Search for products mentioning "lightweight" or "portable" SELECT product_id, name, description, MATCH(description) AGAINST('lightweight portable') AS relevance_score FROM products WHERE MATCH(description) AGAINST('lightweight portable') ORDER BY relevance_score DESC LIMIT 5;
product_idnamedescriptionrelevance_score
287Travel BackpackLightweight portable backpack, perfect for daily commute2.8341
142Folding DeskPortable folding desk, lightweight aluminum frame2.1205
395Bluetooth SpeakerPortable speaker with lightweight design for outdoor use1.9876
501Laptop SleeveSlim lightweight sleeve, fits most portable laptops1.2043
89Power BankLightweight charger, portable enough for travel1.1502

Results are ranked by relevance. Rows containing both "lightweight" AND "portable" score higher.

12.4 — Composite Index (Multi-Column)

A composite index covers multiple columns. Column order matters! MySQL can use the index for queries that filter on the leftmost columns. Think of a phone book sorted by (last_name, first_name) — you can look up by last_name alone, or by last_name + first_name, but you cannot efficiently look up by first_name alone.

SQL — Composite Index
-- Index on (customer_id, order_date) for orders CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
Result
Query OK, 0 rows affected (0.22 sec)
Leftmost Prefix Rule — Which queries can use this index?

Index on: (customer_id, order_date, product_id)

WHERE customer_id = 7
customer_id
order_date
product_id
Yes, uses index
WHERE customer_id = 7 AND order_date = ...
customer_id
order_date
product_id
Yes, uses index
WHERE customer_id = 7 AND order_date = ... AND product_id = ...
customer_id
order_date
product_id
Yes, full index used
WHERE order_date = '2025-01-15'
customer_id
order_date
product_id
No! Skips leftmost column
WHERE product_id = 5
customer_id
order_date
product_id
No! Skips leftmost columns

12.5 — DROP INDEX

DROP INDEX removes an index from a table. The data in the table is not affected — only the index structure is deleted. Queries that relied on that index will become slower (they will fall back to full table scans).

SQL — DROP INDEX
-- Remove the index we created earlier DROP INDEX idx_orders_customer ON orders; -- Alternative syntax (MySQL-specific) ALTER TABLE orders DROP INDEX idx_orders_customer;
Result
Query OK, 0 rows affected (0.08 sec)

12.6 — SHOW INDEXES FROM table

SHOW INDEXES displays all the indexes that exist on a table, including their names, which columns they cover, whether they are unique, and more. It is essential for understanding how a table is currently optimized.

SQL — SHOW INDEXES
SHOW INDEXES FROM orders;
TableNon_uniqueKey_nameSeq_in_indexColumn_nameIndex_type
orders0PRIMARY1order_idBTREE
orders1idx_orders_customer_date1customer_idBTREE
orders1idx_orders_customer_date2order_dateBTREE
orders1idx_orders_product1product_idBTREE
orders1idx_orders_notes1notesFULLTEXT

Non_unique=0 means unique index (like PRIMARY). Non_unique=1 means duplicates allowed. Seq_in_index shows column order in composite indexes.

12.7 — EXPLAIN SELECT

EXPLAIN shows MySQL's execution plan for a query — how it plans to find your data. It tells you whether it will use an index, how many rows it expects to scan, and what join strategy it will use. This is the most important tool for diagnosing slow queries.

SQL — EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 7 AND order_date > '2025-01-01';
idselect_typetabletypepossible_keyskeykey_lenrowsfilteredExtra
1SIMPLEordersrangeidx_orders_customer_dateidx_orders_customer_date85100.00Using index condition
Key EXPLAIN Columns Explained
  • type — How rows are accessed. From best to worst: consteq_refrefrangeindexALL (full scan = bad!)
  • possible_keys — Which indexes MySQL considered using
  • key — Which index MySQL actually chose (NULL = no index used!)
  • rows — Estimated number of rows MySQL needs to examine. Lower = better.
  • ExtraUsing index is great (covering index). Using filesort or Using temporary can indicate performance issues.

Comparison: same query with and without index

EXPLAIN Without Index
typekeyrowsExtra
ALLNULL52,480Using where

Full table scan! Checks every row.

EXPLAIN With Index
typekeyrowsExtra
rangeidx_orders_customer_date5Using index condition

Only 5 rows examined! 10,000x faster.

12.8 — EXPLAIN ANALYZE SELECT

EXPLAIN ANALYZE goes beyond regular EXPLAIN — it actually runs the query and shows real execution statistics: actual time, actual rows, and the number of loops. This is invaluable for finding exactly where a query spends its time. Available in MySQL 8.0.18+.

SQL — EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT c.name, c.city, COUNT(o.order_id) AS order_count FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date > '2025-01-01' GROUP BY c.customer_id, c.name, c.city ORDER BY order_count DESC LIMIT 5;
Output (tree format)
-> Limit: 5 row(s) (actual time=1.245..1.248 rows=5 loops=1) -> Sort: order_count DESC, limit input to 5 row(s) per chunk (actual time=1.243..1.245 rows=5 loops=1) -> Table scan on <temporary> (actual time=1.102..1.138 rows=18 loops=1) -> Aggregate using temporary table (actual time=1.098..1.098 rows=18 loops=1) -> Nested loop join (cost=42.15 rows=87) (actual time=0.085..0.892 rows=87 loops=1) -> Filter: (o.order_date > '2025-01-01') (actual time=0.052..0.415 rows=87 loops=1) -> Index range scan on o using idx_orders_customer_date (actual time=0.048..0.328 rows=87 loops=1) -> Single-row index lookup on c using PRIMARY (customer_id=o.customer_id) (actual time=0.004..0.004 rows=1 loops=87)
Reading EXPLAIN ANALYZE Output
  • actual time — Two numbers: time to return first row .. time to return all rows (in milliseconds)
  • rows — Actual number of rows processed at this step
  • loops — How many times this step was executed. For a nested loop join, the inner side runs once per outer row.
  • Read from the innermost indentation outward to understand the execution flow.
  • Look for steps with high actual time or high rows — those are your bottlenecks.
When to Use Indexes — Guidelines
Good Candidates for Indexes
  • Columns used in WHERE clauses frequently
  • Columns used in JOIN conditions (foreign keys)
  • Columns used in ORDER BY or GROUP BY
  • Columns with high cardinality (many unique values, like customer_id or email)
  • Large tables (thousands+ of rows)
Bad Candidates for Indexes
  • Small tables (under a few hundred rows) — full scan is fast enough
  • Columns with few unique values (like gender or yes/no flags)
  • Tables with heavy INSERT/UPDATE/DELETE — indexes slow down writes because they must be updated too
  • Columns rarely used in WHERE, JOIN, or ORDER BY
  • Wide columns (very long text) — use prefix indexes or fulltext instead
Remember: Every index speeds up reads but slows down writes (INSERT, UPDATE, DELETE). Each index must be updated whenever the data changes. Do not create indexes on every column — only on the ones that actually appear in your slow queries. Use EXPLAIN to verify your indexes are being used.
← Back to Main Guide | ↑ Back to Top