Part 2: Window Functions, CTEs & Aggregates

Section 2 — Window / Ranking Functions
Section 3 — CTEs (Common Table Expressions)
Section 4 — Aggregate Functions

Section 2 — Window / Ranking Functions

Window functions let you do calculations across a set of rows that are related to the current row — without collapsing rows like GROUP BY does. Think of it like looking through a sliding window at your data: you can see surrounding rows and compute things like rankings, running totals, or the previous row's value.

1. ROW_NUMBER() OVER (ORDER BY ...)

Plain English: Give every row a unique number, starting at 1, based on the order you choose. Even if two rows have the same value, they still get different numbers.

-- Number all products from cheapest to most expensive
SELECT
    ROW_NUMBER() OVER (ORDER BY price) AS row_num,
    product_name,
    price
FROM products;

Output:

row_num	product_name	price
1	USB Cable	9.99
2	Mouse Pad	14.50
3	Wireless Mouse	29.99
4	Mechanical Keyboard	89.00
5	Monitor Stand	129.99

2. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

Plain English: Same as above, but the numbering restarts at 1 for each group. It is like numbering students within each classroom separately.

-- Number each customer's orders, most recent first, within each customer
SELECT
    ROW_NUMBER() OVER (
        PARTITION BY customer_name
        ORDER BY order_date DESC
    ) AS rn,
    customer_name,
    product_name,
    order_date
FROM orders;

Output:

rn	customer_name	product_name	order_date
1	Alice Johnson	Monitor Stand	2025-03-10
2	Alice Johnson	Wireless Mouse	2025-02-14
1	Bob Wilson	Mechanical Keyboard	2025-03-12
2	Bob Wilson	Mouse Pad	2025-03-01
3	Bob Wilson	Wireless Mouse	2025-01-20
1	Jane Doe	USB Cable	2025-03-05

3. RANK() OVER (ORDER BY ...)

Plain English: Like a sports ranking — if two people tie for 2nd place, they both get rank 2, and the next person gets rank 4 (it skips 3). There are gaps when ties happen.

-- Rank employees by number of sales closed (ties allowed, with gaps)
SELECT
    RANK() OVER (ORDER BY sales_closed DESC) AS rnk,
    employee_name,
    sales_closed
FROM employees;

Output:

rnk	employee_name	sales_closed
1	Diana Prince	120
2	Bob Wilson	98
2	Charlie Brown	98
4	John Smith	85
5	Jane Doe	72

4. DENSE_RANK() OVER (ORDER BY ...)

Plain English: Same as RANK, but with no gaps. If two people tie for 2nd, the next person gets rank 3 (not 4). The ranks stay consecutive.

-- Dense-rank employees by sales closed (no gaps on ties)
SELECT
    DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk,
    employee_name,
    sales_closed
FROM employees;

Output:

d_rnk	employee_name	sales_closed
1	Diana Prince	120
2	Bob Wilson	98
2	Charlie Brown	98
3	John Smith	85
4	Jane Doe	72

5. Side-by-Side: ROW_NUMBER vs RANK vs DENSE_RANK

Plain English: This is the best way to see the difference. All three functions run on the same data. Notice how ties are handled differently by each one.

SELECT
    employee_name,
    sales_closed,
    ROW_NUMBER() OVER (ORDER BY sales_closed DESC) AS row_num,
    RANK()       OVER (ORDER BY sales_closed DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk
FROM employees;

Output:

employee_name	sales_closed	row_num	rnk	d_rnk
Diana Prince	120	1	1	1
Bob Wilson	98	2	2	2
Charlie Brown	98	3	2	2
John Smith	85	4	4	3
Jane Doe	72	5	5	4

Key takeaway: The tied rows (98 each, highlighted yellow) all share the same RANK and DENSE_RANK, but ROW_NUMBER always gives unique numbers. After the tie, RANK skips to 4 while DENSE_RANK continues to 3.

6. NTILE(n) OVER (ORDER BY ...)

Plain English: Divide all your rows into n roughly equal buckets. If you say NTILE(3), it splits the data into 3 groups. Useful for creating quartiles, percentiles, etc.

-- Split products into 3 price tiers
SELECT
    product_name,
    price,
    NTILE(3) OVER (ORDER BY price) AS price_tier
FROM products;

Output:

product_name	price	price_tier
USB Cable	9.99	1
Mouse Pad	14.50	1
Wireless Mouse	29.99	2
Mechanical Keyboard	89.00	2
Monitor Stand	129.99	3

7. PERCENT_RANK() OVER (ORDER BY ...)

Plain English: Shows where each row falls as a percentage. The formula is (rank - 1) / (total_rows - 1). The lowest value gets 0 and the highest gets 1. Think of it as: "What percentage of rows are below me?"

SELECT
    product_name,
    price,
    PERCENT_RANK() OVER (ORDER BY price) AS pct_rank
FROM products;

Output:

product_name	price	pct_rank
USB Cable	9.99	0.0000
Mouse Pad	14.50	0.2500
Wireless Mouse	29.99	0.5000
Mechanical Keyboard	89.00	0.7500
Monitor Stand	129.99	1.0000

8. CUME_DIST() OVER (ORDER BY ...)

Plain English: Cumulative distribution — the fraction of rows that have a value less than or equal to the current row. Formula: count of rows ≤ current / total rows. Unlike PERCENT_RANK, the first row is NOT zero.

SELECT
    product_name,
    price,
    CUME_DIST() OVER (ORDER BY price) AS cume
FROM products;

Output:

product_name	price	cume
USB Cable	9.99	0.2000
Mouse Pad	14.50	0.4000
Wireless Mouse	29.99	0.6000
Mechanical Keyboard	89.00	0.8000
Monitor Stand	129.99	1.0000

9. LAG(col, n) OVER (ORDER BY ...)

Plain English: Look at the value from a previous row. LAG(price, 1) means "give me the price from 1 row before this one." If there is no previous row, you get NULL. Super useful for comparing to the previous period.

-- Compare each month's sales total to the previous month
SELECT
    month_name,
    total_sales,
    LAG(total_sales, 1) OVER (ORDER BY month_num) AS prev_month,
    total_sales - LAG(total_sales, 1) OVER (ORDER BY month_num) AS change
FROM monthly_sales;

Output:

month_name	total_sales	prev_month	change
January	320	NULL	NULL
February	285	320	-35
March	410	285	125
April	398	410	-12
May	445	398	47

10. LAG() with PARTITION BY

Plain English: Same idea as LAG, but it looks at the previous row within each group. The "memory" resets when you move to a new group. Perfect for comparing an employee's current performance to their previous performance.

-- Each employee's monthly sales compared to their previous month
SELECT
    employee_name,
    sale_date,
    amount,
    LAG(amount, 1) OVER (
        PARTITION BY employee_name
        ORDER BY sale_date
    ) AS prev_amount
FROM sales;

Output:

employee_name	sale_date	amount	prev_amount
Bob Wilson	2025-01-15	4500	NULL
Bob Wilson	2025-02-20	3800	4500
Bob Wilson	2025-03-18	5200	3800
Jane Doe	2025-01-22	2900	NULL
Jane Doe	2025-03-01	3100	2900

11. LEAD(col, n) OVER (ORDER BY ...)

Plain English: The opposite of LAG — it peeks at the next row's value. LEAD(price, 1) means "give me the price from 1 row after this one." NULL if there is no next row.

-- Show the next scheduled delivery for each order
SELECT
    customer_name,
    order_date,
    LEAD(order_date, 1) OVER (
        ORDER BY order_date
    ) AS next_order
FROM orders;

Output:

customer_name	order_date	next_order
Charlie Brown	2025-03-01	2025-03-03
Alice Johnson	2025-03-03	2025-03-05
Bob Wilson	2025-03-05	2025-03-07
Diana Prince	2025-03-07	2025-03-10
Jane Doe	2025-03-10	NULL

12. FIRST_VALUE(col) OVER (...)

Plain English: Grabs the very first value in the window. For example, "which was the cheapest product?" or "what was the earliest date?" Every row in the result shows that same first value, so you can compare each row to it.

-- Show each product alongside the cheapest product's name
SELECT
    product_name,
    price,
    FIRST_VALUE(product_name) OVER (
        ORDER BY price
    ) AS cheapest_product
FROM products;

Output:

product_name	price	cheapest_product
USB Cable	9.99	USB Cable
Mouse Pad	14.50	USB Cable
Wireless Mouse	29.99	USB Cable
Mechanical Keyboard	89.00	USB Cable
Monitor Stand	129.99	USB Cable

13. LAST_VALUE(col) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Plain English: Grabs the very last value in the window. But there is a catch! By default, the window frame only goes up to the current row, so LAST_VALUE would just return the current row's value. You must add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to tell MySQL "look at ALL the rows, from the very first to the very last."

-- Show each product alongside the most expensive product's name
SELECT
    product_name,
    price,
    LAST_VALUE(product_name) OVER (
        ORDER BY price
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS most_expensive
FROM products;

Output:

product_name	price	most_expensive
USB Cable	9.99	Monitor Stand
Mouse Pad	14.50	Monitor Stand
Wireless Mouse	29.99	Monitor Stand
Mechanical Keyboard	89.00	Monitor Stand
Monitor Stand	129.99	Monitor Stand

Without the frame clause, LAST_VALUE would return the current row's own product_name for each row — not the actual last value. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause expands the window frame to cover all rows.

14. NTH_VALUE(col, n) OVER (...)

Plain English: Get the value from the nth row in the window. NTH_VALUE(product_name, 2) means "give me the product name from the 2nd row." You usually need the same frame clause as LAST_VALUE to see all rows.

-- Show the 2nd cheapest product alongside each row
SELECT
    product_name,
    price,
    NTH_VALUE(product_name, 2) OVER (
        ORDER BY price
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_cheapest
FROM products;

Output:

product_name	price	second_cheapest
USB Cable	9.99	Mouse Pad
Mouse Pad	14.50	Mouse Pad
Wireless Mouse	29.99	Mouse Pad
Mechanical Keyboard	89.00	Mouse Pad
Monitor Stand	129.99	Mouse Pad

Visual: What Is a "Window"?

Plain English: A "window" is the set of rows that a window function can see when it calculates a value for the current row. The window can be all rows, rows in the same group (PARTITION BY), or a sliding range of rows (frame clause). Below is a visual showing how the window changes for each row.

Window Frame: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

For each "current row" (yellow), the window includes 1 row before and 1 row after (blue). Gray rows are outside the window.

When current row = Row 1

Row	product_name	price
1	USB Cable	9.99
2	Mouse Pad	14.50
3	Wireless Mouse	29.99
4	Mechanical Keyboard	89.00
5	Monitor Stand	129.99

When current row = Row 3

Row	product_name	price
1	USB Cable	9.99
2	Mouse Pad	14.50
3	Wireless Mouse	29.99
4	Mechanical Keyboard	89.00
5	Monitor Stand	129.99

When current row = Row 5

Row	product_name	price
1	USB Cable	9.99
2	Mouse Pad	14.50
3	Wireless Mouse	29.99
4	Mechanical Keyboard	89.00
5	Monitor Stand	129.99

Current row In the window Outside the window

Section 3 — CTEs (Common Table Expressions)

A CTE is like giving a name to a temporary result set. Instead of nesting subqueries deep inside each other (which gets hard to read), you write WITH name AS (query) at the top, and then use that name like a table. It exists only for the duration of that single SQL statement.

1. Basic CTE (WITH ... AS)

Plain English: Think of it as creating a temporary "mini-table" that you define at the top and then use in your main query. It makes complex queries easier to read.

-- Find customers whose total spending exceeds 1000
WITH customer_totals AS (
    SELECT
        customer_name,
        SUM(total) AS total_spent,
        COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_name
)
SELECT *
FROM customer_totals
WHERE total_spent > 1000;

Output:

customer_name	total_spent	order_count
Bob Wilson	1200.00	3
Alice Johnson	1150.00	2

2. Multiple CTEs Chained Together

Plain English: You can define several CTEs separated by commas, and later CTEs can even reference earlier ones. It is like building blocks — each CTE builds on top of the previous one.

-- CTE 1: total sales per department
-- CTE 2: average across departments
-- Final: departments above average
WITH dept_counts AS (
    SELECT
        department,
        COUNT(*) AS total_sales
    FROM sales
    GROUP BY department
),
avg_sales AS (
    SELECT AVG(total_sales) AS avg_count
    FROM dept_counts
)
SELECT
    d.department,
    d.total_sales,
    ROUND(a.avg_count, 1) AS overall_avg
FROM dept_counts d
CROSS JOIN avg_sales a
WHERE d.total_sales > a.avg_count;

Output:

department	total_sales	overall_avg
Engineering	245	186.7
Marketing	210	186.7

3. Recursive CTE

Plain English: A recursive CTE calls itself repeatedly. It starts with a "base" query (the anchor), then keeps running the "recursive" part that references the CTE, joining new rows each time, until no more rows are found. Perfect for hierarchies like org charts or category trees.

Company Hierarchy (What we are querying)

  Diana Prince (CEO)
  ├── Bob Wilson (VP Engineering)
  │   ├── John Smith (Senior Developer)
  │   └── Jane Doe (Senior Developer)
  ├── Charlie Brown (VP Marketing)
  │   └── Alice Johnson (Marketing Manager)
  └── Sarah Lee (VP Sales)
      └── Mike Chen (Sales Manager)

-- Build the full hierarchy from a self-referencing table
WITH RECURSIVE org_chart AS (
    -- Anchor: start with the top-level (no manager)
    SELECT
        employee_id,
        employee_name,
        title,
        manager_id,
        0 AS level,
        CAST(employee_name AS CHAR(500)) AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find employees whose manager is in the previous level
    SELECT
        e.employee_id,
        e.employee_name,
        e.title,
        e.manager_id,
        o.level + 1,
        CONCAT(o.path, ' > ', e.employee_name)
    FROM employees e
    INNER JOIN org_chart o
        ON e.manager_id = o.employee_id
)
SELECT
    CONCAT(REPEAT('  ', level), employee_name) AS hierarchy,
    title,
    level,
    path
FROM org_chart
ORDER BY path;

Output:

hierarchy	title	level	path
Diana Prince	CEO	0	Diana Prince
Bob Wilson	VP Engineering	1	Diana Prince > Bob Wilson
Jane Doe	Senior Developer	2	Diana Prince > Bob Wilson > Jane Doe
John Smith	Senior Developer	2	Diana Prince > Bob Wilson > John Smith
Charlie Brown	VP Marketing	1	Diana Prince > Charlie Brown
Alice Johnson	Marketing Manager	2	Diana Prince > Charlie Brown > Alice Johnson
Sarah Lee	VP Sales	1	Diana Prince > Sarah Lee
Mike Chen	Sales Manager	2	Diana Prince > Sarah Lee > Mike Chen

Section 4 — Aggregate Functions

Aggregate functions take many rows and crunch them down into a single value (or one value per group if you use GROUP BY). They are the bread and butter of reporting.

1. COUNT(*)

Plain English: Counts the total number of rows. It counts everything — even rows with NULL values. Use it when you want to know "how many records are there?"

-- How many orders were placed in total?
SELECT
    COUNT(*) AS total_orders
FROM orders;

Output:

total_orders
12

2. COUNT(column)

Plain English: Counts only the rows where that specific column is NOT NULL. If a customer has no email on file (NULL), COUNT(email) will not count that row.

-- How many customers have a phone number on file?
SELECT
    COUNT(phone) AS with_phone,
    COUNT(*) AS total_customers
FROM customers;

Output:

with_phone	total_customers
4	5

Diana Prince has no phone number (NULL), so COUNT(phone) returns 4 while COUNT(*) returns 5.

3. COUNT(DISTINCT column)

Plain English: Counts only the unique values. If "Wireless Mouse" appears 5 times, it only counts as 1.

-- How many different products have been ordered?
SELECT
    COUNT(DISTINCT product_name) AS unique_products,
    COUNT(product_name) AS total_order_lines
FROM orders;

Output:

unique_products	total_order_lines
5	12

4. SUM()

Plain English: Adds up all the values in a column. Like using the SUM function in a spreadsheet. NULL values are ignored.

-- Total revenue per department
SELECT
    department,
    SUM(amount) AS total_revenue
FROM sales
GROUP BY department;

Output:

department	total_revenue
Engineering	8,750.00
Marketing	12,300.00
Sales	3,450.00

5. AVG()

Plain English: Calculates the average (mean). It adds everything up and divides by the count of non-NULL values.

-- Average sale amount by department
SELECT
    department,
    ROUND(AVG(amount), 2) AS avg_amount
FROM sales
GROUP BY department;

Output:

department	avg_amount
Engineering	350.00
Marketing	550.00
Sales	150.00

6. MIN()

Plain English: Finds the smallest value. Works with numbers, dates, and strings (alphabetically first).

-- Earliest order date and smallest sale per department
SELECT
    department,
    MIN(amount) AS smallest_sale,
    MIN(sale_date) AS earliest_date
FROM sales
GROUP BY department;

Output:

department	smallest_sale	earliest_date
Engineering	350.00	2025-01-05
Marketing	200.00	2025-01-08
Sales	150.00	2025-01-12

7. MAX()

Plain English: Finds the largest value. The opposite of MIN.

-- Largest sale and latest date per department
SELECT
    department,
    MAX(amount) AS largest_sale,
    MAX(sale_date) AS latest_date
FROM sales
GROUP BY department;

Output:

department	largest_sale	latest_date
Engineering	350.00	2025-03-15
Marketing	800.00	2025-03-18
Sales	150.00	2025-03-10

8. GROUP_CONCAT(col SEPARATOR ',')

Plain English: Takes all the values in a group and glues them together into one comma-separated string. Instead of 5 rows for 5 products, you get 1 row with all product names joined together.

-- List all products each customer has ordered, in one cell
SELECT
    customer_name,
    GROUP_CONCAT(product_name SEPARATOR ', ') AS products_ordered
FROM orders
GROUP BY customer_name;

Output:

customer_name	products_ordered
Bob Wilson	Wireless Mouse, Mouse Pad, Mechanical Keyboard
Jane Doe	USB Cable, Wireless Mouse
Charlie Brown	Monitor Stand, Wireless Mouse
Alice Johnson	Monitor Stand, Mechanical Keyboard, Mouse Pad
Diana Prince	USB Cable, Mouse Pad

9. GROUP_CONCAT with ORDER BY and DISTINCT

Plain English: You can sort the values before concatenating, and use DISTINCT to remove duplicates. This ensures a clean, alphabetically ordered list with no repeats.

-- Unique products per category, sorted alphabetically
SELECT
    category,
    GROUP_CONCAT(
        DISTINCT product_name
        ORDER BY product_name ASC
        SEPARATOR ' | '
    ) AS available_products
FROM products
GROUP BY category;

Output:

category	available_products
Accessories	Mouse Pad \| USB Cable
Peripherals	Mechanical Keyboard \| Monitor Stand \| Wireless Mouse

10. JSON_ARRAYAGG(col)

Plain English: Like GROUP_CONCAT, but instead of a comma-separated string, it produces a proper JSON array. Useful when your application consumes JSON data.

-- List each customer's orders as a JSON array
SELECT
    customer_name,
    JSON_ARRAYAGG(product_name) AS products_json
FROM orders
GROUP BY customer_name;

Output:

customer_name	products_json
Bob Wilson	["Wireless Mouse", "Mouse Pad", "Mechanical Keyboard"]
Jane Doe	["USB Cable", "Wireless Mouse"]
Charlie Brown	["Monitor Stand", "Wireless Mouse"]
Alice Johnson	["Monitor Stand", "Mechanical Keyboard", "Mouse Pad"]
Diana Prince	["USB Cable", "Mouse Pad"]

11. JSON_OBJECTAGG(key, value)

Plain English: Creates a JSON object (key-value pairs) from two columns. The first column becomes the key, the second becomes the value. Great for creating lookup maps.

-- Build a JSON price list for each category
SELECT
    category,
    JSON_OBJECTAGG(product_name, price) AS price_list
FROM products
GROUP BY category;

Output:

category	price_list
Accessories	{"USB Cable": 9.99, "Mouse Pad": 14.50}
Peripherals	{"Wireless Mouse": 29.99, "Mechanical Keyboard": 89.00, "Monitor Stand": 129.99}

12. STD() / STDDEV()

Plain English: Standard deviation measures how spread out the values are. A low standard deviation means values are clustered close to the average; a high one means they are widely spread. STD() and STDDEV() are synonyms in MySQL.

-- How consistent are sale amounts within each department?
SELECT
    department,
    COUNT(*) AS num_sales,
    ROUND(AVG(amount), 2) AS avg_amount,
    ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;

Output:

department	num_sales	avg_amount	std_dev
Engineering	25	350.00	0.00
Marketing	18	548.33	247.22
Sales	15	150.00	0.00

Engineering and Sales each have a consistent sale amount, so their standard deviation is 0 — no variation. Marketing has three different sale tiers (200, 650, 800), producing a higher standard deviation.

13. VARIANCE()

Plain English: Variance is the standard deviation squared. It also measures how spread out values are, but in squared units. If the standard deviation is 247.22, the variance is 247.22 × 247.22 = 61,117.73. Some statistical formulas need variance instead of standard deviation.

-- Variance of sale amounts per department
SELECT
    department,
    ROUND(AVG(amount), 2) AS avg_amount,
    ROUND(VARIANCE(amount), 2) AS amount_variance,
    ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;

Output:

department	avg_amount	amount_variance	std_dev
Engineering	350.00	0.00	0.00
Marketing	548.33	61117.73	247.22
Sales	150.00	0.00	0.00

MySQL 8.0 Complete Guide — Part 2: Window Functions, CTEs & Aggregates
← Back to Main Guide

Part 2: Window Functions, CTEs & Aggregates

Table of Contents

1. ROW_NUMBER() OVER (ORDER BY ...)

2. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

3. RANK() OVER (ORDER BY ...)

4. DENSE_RANK() OVER (ORDER BY ...)

5. Side-by-Side: ROW_NUMBER vs RANK vs DENSE_RANK

6. NTILE(n) OVER (ORDER BY ...)

7. PERCENT_RANK() OVER (ORDER BY ...)

8. CUME_DIST() OVER (ORDER BY ...)

9. LAG(col, n) OVER (ORDER BY ...)

10. LAG() with PARTITION BY

11. LEAD(col, n) OVER (ORDER BY ...)

12. FIRST_VALUE(col) OVER (...)

13. LAST_VALUE(col) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

14. NTH_VALUE(col, n) OVER (...)

Visual: What Is a "Window"?

1. Basic CTE (WITH ... AS)

2. Multiple CTEs Chained Together

3. Recursive CTE

1. COUNT(*)

2. COUNT(column)

3. COUNT(DISTINCT column)

4. SUM()

5. AVG()

6. MIN()

7. MAX()

8. GROUP_CONCAT(col SEPARATOR ',')

9. GROUP_CONCAT with ORDER BY and DISTINCT

10. JSON_ARRAYAGG(col)

11. JSON_OBJECTAGG(key, value)

12. STD() / STDDEV()

13. VARIANCE()