Window functions let you do calculations across a set of rows that are related to the current row — without collapsing rows like GROUP BY does. Think of it like looking through a sliding window at your data: you can see surrounding rows and compute things like rankings, running totals, or the previous row's value.
1. ROW_NUMBER() OVER (ORDER BY ...)
Plain English: Give every row a unique number, starting at 1, based on the order you choose. Even if two rows have the same value, they still get different numbers.
-- Number all products from cheapest to most expensive
SELECT
ROW_NUMBER() OVER (ORDER BY price) AS row_num,
product_name,
price
FROM products;
Output:
| row_num | product_name | price |
| 1 | USB Cable | 9.99 |
| 2 | Mouse Pad | 14.50 |
| 3 | Wireless Mouse | 29.99 |
| 4 | Mechanical Keyboard | 89.00 |
| 5 | Monitor Stand | 129.99 |
2. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)
Plain English: Same as above, but the numbering restarts at 1 for each group. It is like numbering students within each classroom separately.
-- Number each customer's orders, most recent first, within each customer
SELECT
ROW_NUMBER() OVER (
PARTITION BY customer_name
ORDER BY order_date DESC
) AS rn,
customer_name,
product_name,
order_date
FROM orders;
Output:
| rn | customer_name | product_name | order_date |
| 1 | Alice Johnson | Monitor Stand | 2025-03-10 |
| 2 | Alice Johnson | Wireless Mouse | 2025-02-14 |
| 1 | Bob Wilson | Mechanical Keyboard | 2025-03-12 |
| 2 | Bob Wilson | Mouse Pad | 2025-03-01 |
| 3 | Bob Wilson | Wireless Mouse | 2025-01-20 |
| 1 | Jane Doe | USB Cable | 2025-03-05 |
3. RANK() OVER (ORDER BY ...)
Plain English: Like a sports ranking — if two people tie for 2nd place, they both get rank 2, and the next person gets rank 4 (it skips 3). There are gaps when ties happen.
-- Rank employees by number of sales closed (ties allowed, with gaps)
SELECT
RANK() OVER (ORDER BY sales_closed DESC) AS rnk,
employee_name,
sales_closed
FROM employees;
Output:
| rnk | employee_name | sales_closed |
| 1 | Diana Prince | 120 |
| 2 | Bob Wilson | 98 |
| 2 | Charlie Brown | 98 |
| 4 | John Smith | 85 |
| 5 | Jane Doe | 72 |
4. DENSE_RANK() OVER (ORDER BY ...)
Plain English: Same as RANK, but with no gaps. If two people tie for 2nd, the next person gets rank 3 (not 4). The ranks stay consecutive.
-- Dense-rank employees by sales closed (no gaps on ties)
SELECT
DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk,
employee_name,
sales_closed
FROM employees;
Output:
| d_rnk | employee_name | sales_closed |
| 1 | Diana Prince | 120 |
| 2 | Bob Wilson | 98 |
| 2 | Charlie Brown | 98 |
| 3 | John Smith | 85 |
| 4 | Jane Doe | 72 |
5. Side-by-Side: ROW_NUMBER vs RANK vs DENSE_RANK
Plain English: This is the best way to see the difference. All three functions run on the same data. Notice how ties are handled differently by each one.
SELECT
employee_name,
sales_closed,
ROW_NUMBER() OVER (ORDER BY sales_closed DESC) AS row_num,
RANK() OVER (ORDER BY sales_closed DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk
FROM employees;
Output:
| employee_name | sales_closed | row_num | rnk | d_rnk |
| Diana Prince | 120 | 1 | 1 | 1 |
| Bob Wilson | 98 | 2 | 2 | 2 |
| Charlie Brown | 98 | 3 | 2 | 2 |
| John Smith | 85 | 4 | 4 | 3 |
| Jane Doe | 72 | 5 | 5 | 4 |
Key takeaway: The tied rows (98 each, highlighted yellow) all share the same RANK and DENSE_RANK, but ROW_NUMBER always gives unique numbers. After the tie, RANK skips to 4 while DENSE_RANK continues to 3.
6. NTILE(n) OVER (ORDER BY ...)
Plain English: Divide all your rows into n roughly equal buckets. If you say NTILE(3), it splits the data into 3 groups. Useful for creating quartiles, percentiles, etc.
-- Split products into 3 price tiers
SELECT
product_name,
price,
NTILE(3) OVER (ORDER BY price) AS price_tier
FROM products;
Output:
| product_name | price | price_tier |
| USB Cable | 9.99 | 1 |
| Mouse Pad | 14.50 | 1 |
| Wireless Mouse | 29.99 | 2 |
| Mechanical Keyboard | 89.00 | 2 |
| Monitor Stand | 129.99 | 3 |
7. PERCENT_RANK() OVER (ORDER BY ...)
Plain English: Shows where each row falls as a percentage. The formula is (rank - 1) / (total_rows - 1). The lowest value gets 0 and the highest gets 1. Think of it as: "What percentage of rows are below me?"
SELECT
product_name,
price,
PERCENT_RANK() OVER (ORDER BY price) AS pct_rank
FROM products;
Output:
| product_name | price | pct_rank |
| USB Cable | 9.99 | 0.0000 |
| Mouse Pad | 14.50 | 0.2500 |
| Wireless Mouse | 29.99 | 0.5000 |
| Mechanical Keyboard | 89.00 | 0.7500 |
| Monitor Stand | 129.99 | 1.0000 |
8. CUME_DIST() OVER (ORDER BY ...)
Plain English: Cumulative distribution — the fraction of rows that have a value less than or equal to the current row. Formula: count of rows ≤ current / total rows. Unlike PERCENT_RANK, the first row is NOT zero.
SELECT
product_name,
price,
CUME_DIST() OVER (ORDER BY price) AS cume
FROM products;
Output:
| product_name | price | cume |
| USB Cable | 9.99 | 0.2000 |
| Mouse Pad | 14.50 | 0.4000 |
| Wireless Mouse | 29.99 | 0.6000 |
| Mechanical Keyboard | 89.00 | 0.8000 |
| Monitor Stand | 129.99 | 1.0000 |
9. LAG(col, n) OVER (ORDER BY ...)
Plain English: Look at the value from a previous row. LAG(price, 1) means "give me the price from 1 row before this one." If there is no previous row, you get NULL. Super useful for comparing to the previous period.
-- Compare each month's sales total to the previous month
SELECT
month_name,
total_sales,
LAG(total_sales, 1) OVER (ORDER BY month_num) AS prev_month,
total_sales - LAG(total_sales, 1) OVER (ORDER BY month_num) AS change
FROM monthly_sales;
Output:
| month_name | total_sales | prev_month | change |
| January | 320 | NULL | NULL |
| February | 285 | 320 | -35 |
| March | 410 | 285 | 125 |
| April | 398 | 410 | -12 |
| May | 445 | 398 | 47 |
10. LAG() with PARTITION BY
Plain English: Same idea as LAG, but it looks at the previous row within each group. The "memory" resets when you move to a new group. Perfect for comparing an employee's current performance to their previous performance.
-- Each employee's monthly sales compared to their previous month
SELECT
employee_name,
sale_date,
amount,
LAG(amount, 1) OVER (
PARTITION BY employee_name
ORDER BY sale_date
) AS prev_amount
FROM sales;
Output:
| employee_name | sale_date | amount | prev_amount |
| Bob Wilson | 2025-01-15 | 4500 | NULL |
| Bob Wilson | 2025-02-20 | 3800 | 4500 |
| Bob Wilson | 2025-03-18 | 5200 | 3800 |
| Jane Doe | 2025-01-22 | 2900 | NULL |
| Jane Doe | 2025-03-01 | 3100 | 2900 |
11. LEAD(col, n) OVER (ORDER BY ...)
Plain English: The opposite of LAG — it peeks at the next row's value. LEAD(price, 1) means "give me the price from 1 row after this one." NULL if there is no next row.
-- Show the next scheduled delivery for each order
SELECT
customer_name,
order_date,
LEAD(order_date, 1) OVER (
ORDER BY order_date
) AS next_order
FROM orders;
Output:
| customer_name | order_date | next_order |
| Charlie Brown | 2025-03-01 | 2025-03-03 |
| Alice Johnson | 2025-03-03 | 2025-03-05 |
| Bob Wilson | 2025-03-05 | 2025-03-07 |
| Diana Prince | 2025-03-07 | 2025-03-10 |
| Jane Doe | 2025-03-10 | NULL |
12. FIRST_VALUE(col) OVER (...)
Plain English: Grabs the very first value in the window. For example, "which was the cheapest product?" or "what was the earliest date?" Every row in the result shows that same first value, so you can compare each row to it.
-- Show each product alongside the cheapest product's name
SELECT
product_name,
price,
FIRST_VALUE(product_name) OVER (
ORDER BY price
) AS cheapest_product
FROM products;
Output:
| product_name | price | cheapest_product |
| USB Cable | 9.99 | USB Cable |
| Mouse Pad | 14.50 | USB Cable |
| Wireless Mouse | 29.99 | USB Cable |
| Mechanical Keyboard | 89.00 | USB Cable |
| Monitor Stand | 129.99 | USB Cable |
13. LAST_VALUE(col) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
Plain English: Grabs the very last value in the window. But there is a catch! By default, the window frame only goes up to the current row, so LAST_VALUE would just return the current row's value. You must add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to tell MySQL "look at ALL the rows, from the very first to the very last."
-- Show each product alongside the most expensive product's name
SELECT
product_name,
price,
LAST_VALUE(product_name) OVER (
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS most_expensive
FROM products;
Output:
| product_name | price | most_expensive |
| USB Cable | 9.99 | Monitor Stand |
| Mouse Pad | 14.50 | Monitor Stand |
| Wireless Mouse | 29.99 | Monitor Stand |
| Mechanical Keyboard | 89.00 | Monitor Stand |
| Monitor Stand | 129.99 | Monitor Stand |
Without the frame clause, LAST_VALUE would return the current row's own product_name for each row — not the actual last value. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause expands the window frame to cover all rows.
14. NTH_VALUE(col, n) OVER (...)
Plain English: Get the value from the nth row in the window. NTH_VALUE(product_name, 2) means "give me the product name from the 2nd row." You usually need the same frame clause as LAST_VALUE to see all rows.
-- Show the 2nd cheapest product alongside each row
SELECT
product_name,
price,
NTH_VALUE(product_name, 2) OVER (
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_cheapest
FROM products;
Output:
| product_name | price | second_cheapest |
| USB Cable | 9.99 | Mouse Pad |
| Mouse Pad | 14.50 | Mouse Pad |
| Wireless Mouse | 29.99 | Mouse Pad |
| Mechanical Keyboard | 89.00 | Mouse Pad |
| Monitor Stand | 129.99 | Mouse Pad |
Visual: What Is a "Window"?
Plain English: A "window" is the set of rows that a window function can see when it calculates a value for the current row. The window can be all rows, rows in the same group (PARTITION BY), or a sliding range of rows (frame clause). Below is a visual showing how the window changes for each row.
Window Frame: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
For each "current row" (yellow), the window includes 1 row before and 1 row after (blue). Gray rows are outside the window.
When current row = Row 1
| Row | product_name | price |
| 1 | USB Cable | 9.99 |
| 2 | Mouse Pad | 14.50 |
| 3 | Wireless Mouse | 29.99 |
| 4 | Mechanical Keyboard | 89.00 |
| 5 | Monitor Stand | 129.99 |
When current row = Row 3
| Row | product_name | price |
| 1 | USB Cable | 9.99 |
| 2 | Mouse Pad | 14.50 |
| 3 | Wireless Mouse | 29.99 |
| 4 | Mechanical Keyboard | 89.00 |
| 5 | Monitor Stand | 129.99 |
When current row = Row 5
| Row | product_name | price |
| 1 | USB Cable | 9.99 |
| 2 | Mouse Pad | 14.50 |
| 3 | Wireless Mouse | 29.99 |
| 4 | Mechanical Keyboard | 89.00 |
| 5 | Monitor Stand | 129.99 |
Current row
In the window
Outside the window
A CTE is like giving a name to a temporary result set. Instead of nesting subqueries deep inside each other (which gets hard to read), you write WITH name AS (query) at the top, and then use that name like a table. It exists only for the duration of that single SQL statement.
1. Basic CTE (WITH ... AS)
Plain English: Think of it as creating a temporary "mini-table" that you define at the top and then use in your main query. It makes complex queries easier to read.
-- Find customers whose total spending exceeds 1000
WITH customer_totals AS (
SELECT
customer_name,
SUM(total) AS total_spent,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_name
)
SELECT *
FROM customer_totals
WHERE total_spent > 1000;
Output:
| customer_name | total_spent | order_count |
| Bob Wilson | 1200.00 | 3 |
| Alice Johnson | 1150.00 | 2 |
2. Multiple CTEs Chained Together
Plain English: You can define several CTEs separated by commas, and later CTEs can even reference earlier ones. It is like building blocks — each CTE builds on top of the previous one.
-- CTE 1: total sales per department
-- CTE 2: average across departments
-- Final: departments above average
WITH dept_counts AS (
SELECT
department,
COUNT(*) AS total_sales
FROM sales
GROUP BY department
),
avg_sales AS (
SELECT AVG(total_sales) AS avg_count
FROM dept_counts
)
SELECT
d.department,
d.total_sales,
ROUND(a.avg_count, 1) AS overall_avg
FROM dept_counts d
CROSS JOIN avg_sales a
WHERE d.total_sales > a.avg_count;
Output:
| department | total_sales | overall_avg |
| Engineering | 245 | 186.7 |
| Marketing | 210 | 186.7 |
3. Recursive CTE
Plain English: A recursive CTE calls itself repeatedly. It starts with a "base" query (the anchor), then keeps running the "recursive" part that references the CTE, joining new rows each time, until no more rows are found. Perfect for hierarchies like org charts or category trees.
Company Hierarchy (What we are querying)
Diana Prince (CEO)
├── Bob Wilson (VP Engineering)
│ ├── John Smith (Senior Developer)
│ └── Jane Doe (Senior Developer)
├── Charlie Brown (VP Marketing)
│ └── Alice Johnson (Marketing Manager)
└── Sarah Lee (VP Sales)
└── Mike Chen (Sales Manager)
-- Build the full hierarchy from a self-referencing table
WITH RECURSIVE org_chart AS (
-- Anchor: start with the top-level (no manager)
SELECT
employee_id,
employee_name,
title,
manager_id,
0 AS level,
CAST(employee_name AS CHAR(500)) AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find employees whose manager is in the previous level
SELECT
e.employee_id,
e.employee_name,
e.title,
e.manager_id,
o.level + 1,
CONCAT(o.path, ' > ', e.employee_name)
FROM employees e
INNER JOIN org_chart o
ON e.manager_id = o.employee_id
)
SELECT
CONCAT(REPEAT(' ', level), employee_name) AS hierarchy,
title,
level,
path
FROM org_chart
ORDER BY path;
Output:
| hierarchy | title | level | path |
| Diana Prince | CEO | 0 | Diana Prince |
| Bob Wilson | VP Engineering | 1 | Diana Prince > Bob Wilson |
| Jane Doe | Senior Developer | 2 | Diana Prince > Bob Wilson > Jane Doe |
| John Smith | Senior Developer | 2 | Diana Prince > Bob Wilson > John Smith |
| Charlie Brown | VP Marketing | 1 | Diana Prince > Charlie Brown |
| Alice Johnson | Marketing Manager | 2 | Diana Prince > Charlie Brown > Alice Johnson |
| Sarah Lee | VP Sales | 1 | Diana Prince > Sarah Lee |
| Mike Chen | Sales Manager | 2 | Diana Prince > Sarah Lee > Mike Chen |
Aggregate functions take many rows and crunch them down into a single value (or one value per group if you use GROUP BY). They are the bread and butter of reporting.
1. COUNT(*)
Plain English: Counts the total number of rows. It counts everything — even rows with NULL values. Use it when you want to know "how many records are there?"
-- How many orders were placed in total?
SELECT
COUNT(*) AS total_orders
FROM orders;
Output:
2. COUNT(column)
Plain English: Counts only the rows where that specific column is NOT NULL. If a customer has no email on file (NULL), COUNT(email) will not count that row.
-- How many customers have a phone number on file?
SELECT
COUNT(phone) AS with_phone,
COUNT(*) AS total_customers
FROM customers;
Output:
| with_phone | total_customers |
| 4 | 5 |
Diana Prince has no phone number (NULL), so COUNT(phone) returns 4 while COUNT(*) returns 5.
3. COUNT(DISTINCT column)
Plain English: Counts only the unique values. If "Wireless Mouse" appears 5 times, it only counts as 1.
-- How many different products have been ordered?
SELECT
COUNT(DISTINCT product_name) AS unique_products,
COUNT(product_name) AS total_order_lines
FROM orders;
Output:
| unique_products | total_order_lines |
| 5 | 12 |
4. SUM()
Plain English: Adds up all the values in a column. Like using the SUM function in a spreadsheet. NULL values are ignored.
-- Total revenue per department
SELECT
department,
SUM(amount) AS total_revenue
FROM sales
GROUP BY department;
Output:
| department | total_revenue |
| Engineering | 8,750.00 |
| Marketing | 12,300.00 |
| Sales | 3,450.00 |
5. AVG()
Plain English: Calculates the average (mean). It adds everything up and divides by the count of non-NULL values.
-- Average sale amount by department
SELECT
department,
ROUND(AVG(amount), 2) AS avg_amount
FROM sales
GROUP BY department;
Output:
| department | avg_amount |
| Engineering | 350.00 |
| Marketing | 550.00 |
| Sales | 150.00 |
6. MIN()
Plain English: Finds the smallest value. Works with numbers, dates, and strings (alphabetically first).
-- Earliest order date and smallest sale per department
SELECT
department,
MIN(amount) AS smallest_sale,
MIN(sale_date) AS earliest_date
FROM sales
GROUP BY department;
Output:
| department | smallest_sale | earliest_date |
| Engineering | 350.00 | 2025-01-05 |
| Marketing | 200.00 | 2025-01-08 |
| Sales | 150.00 | 2025-01-12 |
7. MAX()
Plain English: Finds the largest value. The opposite of MIN.
-- Largest sale and latest date per department
SELECT
department,
MAX(amount) AS largest_sale,
MAX(sale_date) AS latest_date
FROM sales
GROUP BY department;
Output:
| department | largest_sale | latest_date |
| Engineering | 350.00 | 2025-03-15 |
| Marketing | 800.00 | 2025-03-18 |
| Sales | 150.00 | 2025-03-10 |
8. GROUP_CONCAT(col SEPARATOR ',')
Plain English: Takes all the values in a group and glues them together into one comma-separated string. Instead of 5 rows for 5 products, you get 1 row with all product names joined together.
-- List all products each customer has ordered, in one cell
SELECT
customer_name,
GROUP_CONCAT(product_name SEPARATOR ', ') AS products_ordered
FROM orders
GROUP BY customer_name;
Output:
| customer_name | products_ordered |
| Bob Wilson | Wireless Mouse, Mouse Pad, Mechanical Keyboard |
| Jane Doe | USB Cable, Wireless Mouse |
| Charlie Brown | Monitor Stand, Wireless Mouse |
| Alice Johnson | Monitor Stand, Mechanical Keyboard, Mouse Pad |
| Diana Prince | USB Cable, Mouse Pad |
9. GROUP_CONCAT with ORDER BY and DISTINCT
Plain English: You can sort the values before concatenating, and use DISTINCT to remove duplicates. This ensures a clean, alphabetically ordered list with no repeats.
-- Unique products per category, sorted alphabetically
SELECT
category,
GROUP_CONCAT(
DISTINCT product_name
ORDER BY product_name ASC
SEPARATOR ' | '
) AS available_products
FROM products
GROUP BY category;
Output:
| category | available_products |
| Accessories | Mouse Pad | USB Cable |
| Peripherals | Mechanical Keyboard | Monitor Stand | Wireless Mouse |
10. JSON_ARRAYAGG(col)
Plain English: Like GROUP_CONCAT, but instead of a comma-separated string, it produces a proper JSON array. Useful when your application consumes JSON data.
-- List each customer's orders as a JSON array
SELECT
customer_name,
JSON_ARRAYAGG(product_name) AS products_json
FROM orders
GROUP BY customer_name;
Output:
| customer_name | products_json |
| Bob Wilson | ["Wireless Mouse", "Mouse Pad", "Mechanical Keyboard"] |
| Jane Doe | ["USB Cable", "Wireless Mouse"] |
| Charlie Brown | ["Monitor Stand", "Wireless Mouse"] |
| Alice Johnson | ["Monitor Stand", "Mechanical Keyboard", "Mouse Pad"] |
| Diana Prince | ["USB Cable", "Mouse Pad"] |
11. JSON_OBJECTAGG(key, value)
Plain English: Creates a JSON object (key-value pairs) from two columns. The first column becomes the key, the second becomes the value. Great for creating lookup maps.
-- Build a JSON price list for each category
SELECT
category,
JSON_OBJECTAGG(product_name, price) AS price_list
FROM products
GROUP BY category;
Output:
| category | price_list |
| Accessories | {"USB Cable": 9.99, "Mouse Pad": 14.50} |
| Peripherals | {"Wireless Mouse": 29.99, "Mechanical Keyboard": 89.00, "Monitor Stand": 129.99} |
12. STD() / STDDEV()
Plain English: Standard deviation measures how spread out the values are. A low standard deviation means values are clustered close to the average; a high one means they are widely spread. STD() and STDDEV() are synonyms in MySQL.
-- How consistent are sale amounts within each department?
SELECT
department,
COUNT(*) AS num_sales,
ROUND(AVG(amount), 2) AS avg_amount,
ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;
Output:
| department | num_sales | avg_amount | std_dev |
| Engineering | 25 | 350.00 | 0.00 |
| Marketing | 18 | 548.33 | 247.22 |
| Sales | 15 | 150.00 | 0.00 |
Engineering and Sales each have a consistent sale amount, so their standard deviation is 0 — no variation. Marketing has three different sale tiers (200, 650, 800), producing a higher standard deviation.
13. VARIANCE()
Plain English: Variance is the standard deviation squared. It also measures how spread out values are, but in squared units. If the standard deviation is 247.22, the variance is 247.22 × 247.22 = 61,117.73. Some statistical formulas need variance instead of standard deviation.
-- Variance of sale amounts per department
SELECT
department,
ROUND(AVG(amount), 2) AS avg_amount,
ROUND(VARIANCE(amount), 2) AS amount_variance,
ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;
Output:
| department | avg_amount | amount_variance | std_dev |
| Engineering | 350.00 | 0.00 | 0.00 |
| Marketing | 548.33 | 61117.73 | 247.22 |
| Sales | 150.00 | 0.00 | 0.00 |