MySQL allows you to easily organize data using GROUP BY clause. It is commonly used to summarize data by product, category, geography, etc. and other group variables. But sometimes, database developers may need to pick top N rows in each group. There is no built-in function or clause to support this requirement. So we will need to construct one. In this article, we will learn how to select top N rows per group in MySQL.
Why Select Top Rows Per Group
In this case, the requirement is to select top N rows for each distinct value of another column (GROUP BY), based on one or more criteria. For example, you may need to select top 3 sales reps for each region (GROUP) in sales data table. You may need to select top 3 products for each product category in product sales data.
As you can see, it is very important to be able to pick top N rows in each group from a table, based on a criteria. It is applicable in different business areas such as sales, orders, marketing, etc. Since there is no direct function or clause to support this, you need to construct an SQL query for this purpose.
How to Select Top N Rows Per Group
There are two ways to select top N rows per group in MySQL – using windows function and using sub query.
Let us say you have the following sales data table which lists quantity ordered per brand for 3 categories – laptop, mobile and car.
CREATE TABLE sales (
id INTEGER PRIMARY KEY auto_increment,
category varchar(255),
brand varchar(255),
quantity int
);
insert into sales(category,brand, quantity)
values('Laptop','Apple',100),
('Laptop','Samsung',200),
('Laptop','IBM',150),
('Mobile','Apple',900),
('Mobile','Samsung',500),
('Mobile','Sony',100),
('Car','Toyota',100),
('Car','Honda',250),
('Car','Hyundai',120);
select * from sales;
+----+----------+---------+----------+
| id | category | brand | quantity |
+----+----------+---------+----------+
| 1 | Laptop | Apple | 100 |
| 2 | Laptop | Samsung | 200 |
| 3 | Laptop | IBM | 150 |
| 4 | Mobile | Apple | 900 |
| 5 | Mobile | Samsung | 500 |
| 6 | Mobile | Sony | 100 |
| 7 | Car | Toyota | 100 |
| 8 | Car | Honda | 250 |
| 9 | Car | Hyundai | 120 |
+----+----------+---------+----------+
Let us say you want to get top 2 brands for each product category based on quantity column. Here are the two ways to do this.
1. Using row_number()
Row_number() is a window function. Please note, this window function is available only in MySQL 8+. For older MySQL versions, use the next solution.
Before we proceed further, it is important to understand what a window function is.
What is a window function
A windows function is used to perform certain calculation for rows that are related to the present row. It sequentially traverses the table, one row at a time, and in each iteration performs calculations across rows related to this present row. These calculations are generally aggregations or some other statistical function. Unlike GROUP BY clause, that results in a single row of result, window function stores the result of calculation for each row. It is commonly used to calculate running total and rank values. You can learn more about window functions here.
Here is the SQL query to select top 2 brands per category, based on quantity column.
WITH RankedSales AS (
SELECT
category,
brand,
quantity,
ROW_NUMBER() OVER(PARTITION BY category ORDER BY quantity DESC) AS rn
FROM sales
)
SELECT category,brand, quantity
FROM RankedSales
WHERE rn <= 2;
+----------+---------+----------+
| category | brand | quantity |
+----------+---------+----------+
| Car | Honda | 250 |
| Car | Hyundai | 120 |
| Laptop | Samsung | 200 |
| Laptop | IBM | 150 |
| Mobile | Apple | 900 |
| Mobile | Samsung | 500 |
+----------+---------+----------+
Let us look at the above queries in detail. We use 2 SQL queries for our purpose.
First of all, we use a common table expression (CTE) named RankedSale to simplify the query. It is optional though.
Next, we use the following line to assign a unique sequential integer to each row, within a partition.
ROW_NUMBER() OVER(PARTITION BY category ORDER BY quantity DESC
Lastly, we use an SQL query to filter only those rows from RankedSale where rn<=2, that is, where row number <=2.
2. Using Sub Query
Here is an SQL query to determine the top 2 brands in each category based on quantity field.
SELECT category,brand,quantity
FROM
(SELECT category,brand,quantity,
@category_rank := IF(@current_category = category, @category_rank + 1, 1) AS category_rank,
@current_category := category
FROM sales
ORDER BY category,quantity DESC
) ranked
WHERE category_rank <= 2;
+----------+---------+----------+
| category | brand | quantity |
+----------+---------+----------+
| Car | Honda | 250 |
| Car | Hyundai | 120 |
| Laptop | Samsung | 200 |
| Laptop | IBM | 150 |
| Mobile | Apple | 900 |
| Mobile | Samsung | 500 |
+----------+---------+----------+
Let us look at the above query in detail. Here we use a nested SQL sub query. The outer SQL query iterates over sales table.
The inner query counts the number of records where quantity column has a value greater than or equal to that of the present row. It calculates and stores the category_rank and current_category value for each row.
The condition <=2 will limit the result to only top 2 rows for each category.
In the above query, we use @current_category:=category session variable to store the present ranking category of each row. For each row, if the current_category variable’s value is same as category column’s value, we increment category_rank by 1. If it is the first row, we set it to 1.
If you only run inner SQL query, it will return the following result.
SELECT category,brand,quantity,
@category_rank := IF(@current_category = category, @category_rank + 1, 1) AS category_rank,
@current_category := category
FROM sales
ORDER BY category,quantity DESC;
+----------+---------+----------+---------------+-------------------------------+
| category | brand | quantity | category_rank | @current_category := category |
+----------+---------+----------+---------------+-------------------------------+
| Car | Honda | 250 | 1 | Car |
| Car | Hyundai | 120 | 2 | Car |
| Car | Toyota | 100 | 3 | Car |
| Laptop | Samsung | 200 | 1 | Laptop |
| Laptop | IBM | 150 | 2 | Laptop |
| Laptop | Apple | 100 | 3 | Laptop |
| Mobile | Apple | 900 | 1 | Mobile |
| Mobile | Samsung | 500 | 2 | Mobile |
| Mobile | Sony | 100 | 3 | Mobile |
+----------+---------+----------+---------------+-------------------------------+
From the above table, we simply select rows where category_rank<=2.
The main problem of this solution is that the inner query is executed for each row of the table and can be inefficient for large tables. It is useful for older versions of MySQL since it does not use window function or session variables.
Final Verdict
If you are using MySQL 8+, then use solution #1 with window functions since it is more efficient for large data sets. Also it is easier to understand.
If you are using MySQL<8, then you need to use solution #2 with sub query.
Conclusion
In this article, we have learnt how to select top N rows per group in MySQL. We have learnt how to do this using window function as well as using sub query. Using row_number() function with common table expression or sub query is the most efficient way to do this. However, using sub query is suitable and probably your only option in older MySQL (<8.0) systems. You can use either of these solutions as per your requirement.
FAQs
1. When to use Window Function
If you use newer MySQL versions (8.0+), then you can use window function since it is supported and more efficient than using sub queries.
2. When to use Sub query
If you use older version of MySQL (<8.0), then you should use sub query since it does not support window functions like row_number().
Also read:
How to Enable Remote Access in MySQL
How to Create Read Only MySQL User
How to Increase Max Connections in MySQL

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.