Table of Contents

Section 2 — Window / Ranking Functions

Window functions let you do calculations across a set of rows that are related to the current row — without collapsing rows like GROUP BY does. Think of it like looking through a sliding window at your data: you can see surrounding rows and compute things like rankings, running totals, or the previous row's value.

1. ROW_NUMBER() OVER (ORDER BY ...)

Plain English: Give every row a unique number, starting at 1, based on the order you choose. Even if two rows have the same value, they still get different numbers.
-- Number all products from cheapest to most expensive
SELECT
    ROW_NUMBER() OVER (ORDER BY price) AS row_num,
    product_name,
    price
FROM products;
Output:
row_numproduct_nameprice
1USB Cable9.99
2Mouse Pad14.50
3Wireless Mouse29.99
4Mechanical Keyboard89.00
5Monitor Stand129.99

2. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

Plain English: Same as above, but the numbering restarts at 1 for each group. It is like numbering students within each classroom separately.
-- Number each customer's orders, most recent first, within each customer
SELECT
    ROW_NUMBER() OVER (
        PARTITION BY customer_name
        ORDER BY order_date DESC
    ) AS rn,
    customer_name,
    product_name,
    order_date
FROM orders;
Output:
rncustomer_nameproduct_nameorder_date
1Alice JohnsonMonitor Stand2025-03-10
2Alice JohnsonWireless Mouse2025-02-14
1Bob WilsonMechanical Keyboard2025-03-12
2Bob WilsonMouse Pad2025-03-01
3Bob WilsonWireless Mouse2025-01-20
1Jane DoeUSB Cable2025-03-05

3. RANK() OVER (ORDER BY ...)

Plain English: Like a sports ranking — if two people tie for 2nd place, they both get rank 2, and the next person gets rank 4 (it skips 3). There are gaps when ties happen.
-- Rank employees by number of sales closed (ties allowed, with gaps)
SELECT
    RANK() OVER (ORDER BY sales_closed DESC) AS rnk,
    employee_name,
    sales_closed
FROM employees;
Output:
rnkemployee_namesales_closed
1Diana Prince120
2Bob Wilson98
2Charlie Brown98
4John Smith85
5Jane Doe72

4. DENSE_RANK() OVER (ORDER BY ...)

Plain English: Same as RANK, but with no gaps. If two people tie for 2nd, the next person gets rank 3 (not 4). The ranks stay consecutive.
-- Dense-rank employees by sales closed (no gaps on ties)
SELECT
    DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk,
    employee_name,
    sales_closed
FROM employees;
Output:
d_rnkemployee_namesales_closed
1Diana Prince120
2Bob Wilson98
2Charlie Brown98
3John Smith85
4Jane Doe72

5. Side-by-Side: ROW_NUMBER vs RANK vs DENSE_RANK

Plain English: This is the best way to see the difference. All three functions run on the same data. Notice how ties are handled differently by each one.
SELECT
    employee_name,
    sales_closed,
    ROW_NUMBER() OVER (ORDER BY sales_closed DESC) AS row_num,
    RANK()       OVER (ORDER BY sales_closed DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY sales_closed DESC) AS d_rnk
FROM employees;
Output:
employee_namesales_closedrow_numrnkd_rnk
Diana Prince120111
Bob Wilson98222
Charlie Brown98322
John Smith85443
Jane Doe72554
Key takeaway: The tied rows (98 each, highlighted yellow) all share the same RANK and DENSE_RANK, but ROW_NUMBER always gives unique numbers. After the tie, RANK skips to 4 while DENSE_RANK continues to 3.

6. NTILE(n) OVER (ORDER BY ...)

Plain English: Divide all your rows into n roughly equal buckets. If you say NTILE(3), it splits the data into 3 groups. Useful for creating quartiles, percentiles, etc.
-- Split products into 3 price tiers
SELECT
    product_name,
    price,
    NTILE(3) OVER (ORDER BY price) AS price_tier
FROM products;
Output:
product_namepriceprice_tier
USB Cable9.991
Mouse Pad14.501
Wireless Mouse29.992
Mechanical Keyboard89.002
Monitor Stand129.993

7. PERCENT_RANK() OVER (ORDER BY ...)

Plain English: Shows where each row falls as a percentage. The formula is (rank - 1) / (total_rows - 1). The lowest value gets 0 and the highest gets 1. Think of it as: "What percentage of rows are below me?"
SELECT
    product_name,
    price,
    PERCENT_RANK() OVER (ORDER BY price) AS pct_rank
FROM products;
Output:
product_namepricepct_rank
USB Cable9.990.0000
Mouse Pad14.500.2500
Wireless Mouse29.990.5000
Mechanical Keyboard89.000.7500
Monitor Stand129.991.0000

8. CUME_DIST() OVER (ORDER BY ...)

Plain English: Cumulative distribution — the fraction of rows that have a value less than or equal to the current row. Formula: count of rows ≤ current / total rows. Unlike PERCENT_RANK, the first row is NOT zero.
SELECT
    product_name,
    price,
    CUME_DIST() OVER (ORDER BY price) AS cume
FROM products;
Output:
product_namepricecume
USB Cable9.990.2000
Mouse Pad14.500.4000
Wireless Mouse29.990.6000
Mechanical Keyboard89.000.8000
Monitor Stand129.991.0000

9. LAG(col, n) OVER (ORDER BY ...)

Plain English: Look at the value from a previous row. LAG(price, 1) means "give me the price from 1 row before this one." If there is no previous row, you get NULL. Super useful for comparing to the previous period.
-- Compare each month's sales total to the previous month
SELECT
    month_name,
    total_sales,
    LAG(total_sales, 1) OVER (ORDER BY month_num) AS prev_month,
    total_sales - LAG(total_sales, 1) OVER (ORDER BY month_num) AS change
FROM monthly_sales;
Output:
month_nametotal_salesprev_monthchange
January320NULLNULL
February285320-35
March410285125
April398410-12
May44539847

10. LAG() with PARTITION BY

Plain English: Same idea as LAG, but it looks at the previous row within each group. The "memory" resets when you move to a new group. Perfect for comparing an employee's current performance to their previous performance.
-- Each employee's monthly sales compared to their previous month
SELECT
    employee_name,
    sale_date,
    amount,
    LAG(amount, 1) OVER (
        PARTITION BY employee_name
        ORDER BY sale_date
    ) AS prev_amount
FROM sales;
Output:
employee_namesale_dateamountprev_amount
Bob Wilson2025-01-154500NULL
Bob Wilson2025-02-2038004500
Bob Wilson2025-03-1852003800
Jane Doe2025-01-222900NULL
Jane Doe2025-03-0131002900

11. LEAD(col, n) OVER (ORDER BY ...)

Plain English: The opposite of LAG — it peeks at the next row's value. LEAD(price, 1) means "give me the price from 1 row after this one." NULL if there is no next row.
-- Show the next scheduled delivery for each order
SELECT
    customer_name,
    order_date,
    LEAD(order_date, 1) OVER (
        ORDER BY order_date
    ) AS next_order
FROM orders;
Output:
customer_nameorder_datenext_order
Charlie Brown2025-03-012025-03-03
Alice Johnson2025-03-032025-03-05
Bob Wilson2025-03-052025-03-07
Diana Prince2025-03-072025-03-10
Jane Doe2025-03-10NULL

12. FIRST_VALUE(col) OVER (...)

Plain English: Grabs the very first value in the window. For example, "which was the cheapest product?" or "what was the earliest date?" Every row in the result shows that same first value, so you can compare each row to it.
-- Show each product alongside the cheapest product's name
SELECT
    product_name,
    price,
    FIRST_VALUE(product_name) OVER (
        ORDER BY price
    ) AS cheapest_product
FROM products;
Output:
product_namepricecheapest_product
USB Cable9.99USB Cable
Mouse Pad14.50USB Cable
Wireless Mouse29.99USB Cable
Mechanical Keyboard89.00USB Cable
Monitor Stand129.99USB Cable

13. LAST_VALUE(col) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Plain English: Grabs the very last value in the window. But there is a catch! By default, the window frame only goes up to the current row, so LAST_VALUE would just return the current row's value. You must add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to tell MySQL "look at ALL the rows, from the very first to the very last."
-- Show each product alongside the most expensive product's name
SELECT
    product_name,
    price,
    LAST_VALUE(product_name) OVER (
        ORDER BY price
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS most_expensive
FROM products;
Output:
product_namepricemost_expensive
USB Cable9.99Monitor Stand
Mouse Pad14.50Monitor Stand
Wireless Mouse29.99Monitor Stand
Mechanical Keyboard89.00Monitor Stand
Monitor Stand129.99Monitor Stand
Without the frame clause, LAST_VALUE would return the current row's own product_name for each row — not the actual last value. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause expands the window frame to cover all rows.

14. NTH_VALUE(col, n) OVER (...)

Plain English: Get the value from the nth row in the window. NTH_VALUE(product_name, 2) means "give me the product name from the 2nd row." You usually need the same frame clause as LAST_VALUE to see all rows.
-- Show the 2nd cheapest product alongside each row
SELECT
    product_name,
    price,
    NTH_VALUE(product_name, 2) OVER (
        ORDER BY price
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_cheapest
FROM products;
Output:
product_namepricesecond_cheapest
USB Cable9.99Mouse Pad
Mouse Pad14.50Mouse Pad
Wireless Mouse29.99Mouse Pad
Mechanical Keyboard89.00Mouse Pad
Monitor Stand129.99Mouse Pad

Visual: What Is a "Window"?

Plain English: A "window" is the set of rows that a window function can see when it calculates a value for the current row. The window can be all rows, rows in the same group (PARTITION BY), or a sliding range of rows (frame clause). Below is a visual showing how the window changes for each row.
Window Frame: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

For each "current row" (yellow), the window includes 1 row before and 1 row after (blue). Gray rows are outside the window.

When current row = Row 1
Rowproduct_nameprice
1USB Cable9.99
2Mouse Pad14.50
3Wireless Mouse29.99
4Mechanical Keyboard89.00
5Monitor Stand129.99
When current row = Row 3
Rowproduct_nameprice
1USB Cable9.99
2Mouse Pad14.50
3Wireless Mouse29.99
4Mechanical Keyboard89.00
5Monitor Stand129.99
When current row = Row 5
Rowproduct_nameprice
1USB Cable9.99
2Mouse Pad14.50
3Wireless Mouse29.99
4Mechanical Keyboard89.00
5Monitor Stand129.99
Current row    In the window    Outside the window
Section 3 — CTEs (Common Table Expressions)

A CTE is like giving a name to a temporary result set. Instead of nesting subqueries deep inside each other (which gets hard to read), you write WITH name AS (query) at the top, and then use that name like a table. It exists only for the duration of that single SQL statement.

1. Basic CTE (WITH ... AS)

Plain English: Think of it as creating a temporary "mini-table" that you define at the top and then use in your main query. It makes complex queries easier to read.
-- Find customers whose total spending exceeds 1000
WITH customer_totals AS (
    SELECT
        customer_name,
        SUM(total) AS total_spent,
        COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_name
)
SELECT *
FROM customer_totals
WHERE total_spent > 1000;
Output:
customer_nametotal_spentorder_count
Bob Wilson1200.003
Alice Johnson1150.002

2. Multiple CTEs Chained Together

Plain English: You can define several CTEs separated by commas, and later CTEs can even reference earlier ones. It is like building blocks — each CTE builds on top of the previous one.
-- CTE 1: total sales per department
-- CTE 2: average across departments
-- Final: departments above average
WITH dept_counts AS (
    SELECT
        department,
        COUNT(*) AS total_sales
    FROM sales
    GROUP BY department
),
avg_sales AS (
    SELECT AVG(total_sales) AS avg_count
    FROM dept_counts
)
SELECT
    d.department,
    d.total_sales,
    ROUND(a.avg_count, 1) AS overall_avg
FROM dept_counts d
CROSS JOIN avg_sales a
WHERE d.total_sales > a.avg_count;
Output:
departmenttotal_salesoverall_avg
Engineering245186.7
Marketing210186.7

3. Recursive CTE

Plain English: A recursive CTE calls itself repeatedly. It starts with a "base" query (the anchor), then keeps running the "recursive" part that references the CTE, joining new rows each time, until no more rows are found. Perfect for hierarchies like org charts or category trees.
Company Hierarchy (What we are querying)
  Diana Prince (CEO)
  ├── Bob Wilson (VP Engineering)
  │   ├── John Smith (Senior Developer)
  │   └── Jane Doe (Senior Developer)
  ├── Charlie Brown (VP Marketing)
  │   └── Alice Johnson (Marketing Manager)
  └── Sarah Lee (VP Sales)
      └── Mike Chen (Sales Manager)
-- Build the full hierarchy from a self-referencing table
WITH RECURSIVE org_chart AS (
    -- Anchor: start with the top-level (no manager)
    SELECT
        employee_id,
        employee_name,
        title,
        manager_id,
        0 AS level,
        CAST(employee_name AS CHAR(500)) AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find employees whose manager is in the previous level
    SELECT
        e.employee_id,
        e.employee_name,
        e.title,
        e.manager_id,
        o.level + 1,
        CONCAT(o.path, ' > ', e.employee_name)
    FROM employees e
    INNER JOIN org_chart o
        ON e.manager_id = o.employee_id
)
SELECT
    CONCAT(REPEAT('  ', level), employee_name) AS hierarchy,
    title,
    level,
    path
FROM org_chart
ORDER BY path;
Output:
hierarchytitlelevelpath
Diana PrinceCEO0Diana Prince
  Bob WilsonVP Engineering1Diana Prince > Bob Wilson
    Jane DoeSenior Developer2Diana Prince > Bob Wilson > Jane Doe
    John SmithSenior Developer2Diana Prince > Bob Wilson > John Smith
  Charlie BrownVP Marketing1Diana Prince > Charlie Brown
    Alice JohnsonMarketing Manager2Diana Prince > Charlie Brown > Alice Johnson
  Sarah LeeVP Sales1Diana Prince > Sarah Lee
    Mike ChenSales Manager2Diana Prince > Sarah Lee > Mike Chen
Section 4 — Aggregate Functions

Aggregate functions take many rows and crunch them down into a single value (or one value per group if you use GROUP BY). They are the bread and butter of reporting.

1. COUNT(*)

Plain English: Counts the total number of rows. It counts everything — even rows with NULL values. Use it when you want to know "how many records are there?"
-- How many orders were placed in total?
SELECT
    COUNT(*) AS total_orders
FROM orders;
Output:
total_orders
12

2. COUNT(column)

Plain English: Counts only the rows where that specific column is NOT NULL. If a customer has no email on file (NULL), COUNT(email) will not count that row.
-- How many customers have a phone number on file?
SELECT
    COUNT(phone) AS with_phone,
    COUNT(*) AS total_customers
FROM customers;
Output:
with_phonetotal_customers
45
Diana Prince has no phone number (NULL), so COUNT(phone) returns 4 while COUNT(*) returns 5.

3. COUNT(DISTINCT column)

Plain English: Counts only the unique values. If "Wireless Mouse" appears 5 times, it only counts as 1.
-- How many different products have been ordered?
SELECT
    COUNT(DISTINCT product_name) AS unique_products,
    COUNT(product_name) AS total_order_lines
FROM orders;
Output:
unique_productstotal_order_lines
512

4. SUM()

Plain English: Adds up all the values in a column. Like using the SUM function in a spreadsheet. NULL values are ignored.
-- Total revenue per department
SELECT
    department,
    SUM(amount) AS total_revenue
FROM sales
GROUP BY department;
Output:
departmenttotal_revenue
Engineering8,750.00
Marketing12,300.00
Sales3,450.00

5. AVG()

Plain English: Calculates the average (mean). It adds everything up and divides by the count of non-NULL values.
-- Average sale amount by department
SELECT
    department,
    ROUND(AVG(amount), 2) AS avg_amount
FROM sales
GROUP BY department;
Output:
departmentavg_amount
Engineering350.00
Marketing550.00
Sales150.00

6. MIN()

Plain English: Finds the smallest value. Works with numbers, dates, and strings (alphabetically first).
-- Earliest order date and smallest sale per department
SELECT
    department,
    MIN(amount) AS smallest_sale,
    MIN(sale_date) AS earliest_date
FROM sales
GROUP BY department;
Output:
departmentsmallest_saleearliest_date
Engineering350.002025-01-05
Marketing200.002025-01-08
Sales150.002025-01-12

7. MAX()

Plain English: Finds the largest value. The opposite of MIN.
-- Largest sale and latest date per department
SELECT
    department,
    MAX(amount) AS largest_sale,
    MAX(sale_date) AS latest_date
FROM sales
GROUP BY department;
Output:
departmentlargest_salelatest_date
Engineering350.002025-03-15
Marketing800.002025-03-18
Sales150.002025-03-10

8. GROUP_CONCAT(col SEPARATOR ',')

Plain English: Takes all the values in a group and glues them together into one comma-separated string. Instead of 5 rows for 5 products, you get 1 row with all product names joined together.
-- List all products each customer has ordered, in one cell
SELECT
    customer_name,
    GROUP_CONCAT(product_name SEPARATOR ', ') AS products_ordered
FROM orders
GROUP BY customer_name;
Output:
customer_nameproducts_ordered
Bob WilsonWireless Mouse, Mouse Pad, Mechanical Keyboard
Jane DoeUSB Cable, Wireless Mouse
Charlie BrownMonitor Stand, Wireless Mouse
Alice JohnsonMonitor Stand, Mechanical Keyboard, Mouse Pad
Diana PrinceUSB Cable, Mouse Pad

9. GROUP_CONCAT with ORDER BY and DISTINCT

Plain English: You can sort the values before concatenating, and use DISTINCT to remove duplicates. This ensures a clean, alphabetically ordered list with no repeats.
-- Unique products per category, sorted alphabetically
SELECT
    category,
    GROUP_CONCAT(
        DISTINCT product_name
        ORDER BY product_name ASC
        SEPARATOR ' | '
    ) AS available_products
FROM products
GROUP BY category;
Output:
categoryavailable_products
AccessoriesMouse Pad | USB Cable
PeripheralsMechanical Keyboard | Monitor Stand | Wireless Mouse

10. JSON_ARRAYAGG(col)

Plain English: Like GROUP_CONCAT, but instead of a comma-separated string, it produces a proper JSON array. Useful when your application consumes JSON data.
-- List each customer's orders as a JSON array
SELECT
    customer_name,
    JSON_ARRAYAGG(product_name) AS products_json
FROM orders
GROUP BY customer_name;
Output:
customer_nameproducts_json
Bob Wilson["Wireless Mouse", "Mouse Pad", "Mechanical Keyboard"]
Jane Doe["USB Cable", "Wireless Mouse"]
Charlie Brown["Monitor Stand", "Wireless Mouse"]
Alice Johnson["Monitor Stand", "Mechanical Keyboard", "Mouse Pad"]
Diana Prince["USB Cable", "Mouse Pad"]

11. JSON_OBJECTAGG(key, value)

Plain English: Creates a JSON object (key-value pairs) from two columns. The first column becomes the key, the second becomes the value. Great for creating lookup maps.
-- Build a JSON price list for each category
SELECT
    category,
    JSON_OBJECTAGG(product_name, price) AS price_list
FROM products
GROUP BY category;
Output:
categoryprice_list
Accessories{"USB Cable": 9.99, "Mouse Pad": 14.50}
Peripherals{"Wireless Mouse": 29.99, "Mechanical Keyboard": 89.00, "Monitor Stand": 129.99}

12. STD() / STDDEV()

Plain English: Standard deviation measures how spread out the values are. A low standard deviation means values are clustered close to the average; a high one means they are widely spread. STD() and STDDEV() are synonyms in MySQL.
-- How consistent are sale amounts within each department?
SELECT
    department,
    COUNT(*) AS num_sales,
    ROUND(AVG(amount), 2) AS avg_amount,
    ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;
Output:
departmentnum_salesavg_amountstd_dev
Engineering25350.000.00
Marketing18548.33247.22
Sales15150.000.00
Engineering and Sales each have a consistent sale amount, so their standard deviation is 0 — no variation. Marketing has three different sale tiers (200, 650, 800), producing a higher standard deviation.

13. VARIANCE()

Plain English: Variance is the standard deviation squared. It also measures how spread out values are, but in squared units. If the standard deviation is 247.22, the variance is 247.22 × 247.22 = 61,117.73. Some statistical formulas need variance instead of standard deviation.
-- Variance of sale amounts per department
SELECT
    department,
    ROUND(AVG(amount), 2) AS avg_amount,
    ROUND(VARIANCE(amount), 2) AS amount_variance,
    ROUND(STD(amount), 2) AS std_dev
FROM sales
GROUP BY department;
Output:
departmentavg_amountamount_variancestd_dev
Engineering350.000.000.00
Marketing548.3361117.73247.22
Sales150.000.000.00
MySQL 8.0 Complete Guide — Part 2: Window Functions, CTEs & Aggregates
← Back to Main Guide