Last updated on September 11th, 2020 at 10:11 am
Calculating percentiles in MySQL can be a pain. MySQL versions before 8.0 have no built-in function for it, so you have to derive the percentile from ranks yourself. Percentiles are useful for ranking and grouping users or customers.
You can identify your most valuable users or customers and create special offers for them. Here’s a ready-made query to do it.
For example, suppose you have a table named order that contains all product orders for each user, and you want to calculate each user's percentile based on total sales.
order
+---------+---------+-------+
| user_id | product | sales |
+---------+---------+-------+
| 1       | Soap    | 10    |
| 4       | Perfume | 100   |
| 1       | Noodles | 20    |
| 3       | Deo     | 200   |
+---------+---------+-------+

percentiles
+---------+-------+------+------------+
| user_id | total | rank | percentile |
+---------+-------+------+------------+
| 1       | 30    | 3    | 33.33      |
| 4       | 100   | 2    | 66.67      |
| 3       | 200   | 1    | 100        |
+---------+-------+------+------------+
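Here’s a query you can use to calculate percentiles in MySQL based on totals. Just replace the column names (user_id, sales) and the table name (order) with your own; order is wrapped in backticks because it is a reserved word in MySQL. The query first aggregates the total sales for each user, then ranks the users by total sales, and finally calculates the percentile from the rank as 100 * (cnt - rank + 1) / cnt, where cnt is the number of users.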
-- `rank` is quoted because RANK is a reserved word in MySQL 8.0
SELECT user_id, total, `rank`,
       ROUND(100 * (cnt - `rank` + 1) / cnt, 2) AS percentile
FROM (
    -- number the users from highest total to lowest
    SELECT user_id, total, @curRank := @curRank + 1 AS `rank`
    FROM (SELECT user_id, SUM(sales) AS total FROM `order` GROUP BY user_id) p,
         (SELECT @curRank := 0) r
    ORDER BY total DESC
) AS dt,
(SELECT COUNT(DISTINCT user_id) AS cnt FROM `order`) AS ct;
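If you are on MySQL 8.0 or later, the same three steps (aggregate, rank, convert to a percentile) can also be sketched with window functions instead of a user variable. This is an alternative formulation, not a replacement the original query depends on; the CTE names totals and ranked and the aliases rnk and cnt are illustrative choices. It avoids relying on the order in which MySQL evaluates user variables, which the MySQL manual describes as undefined.

```sql
-- Sketch for MySQL 8.0+ (CTEs and window functions are not available in 5.7).
WITH totals AS (
    -- step 1: total sales per user
    SELECT user_id, SUM(sales) AS total
    FROM `order`
    GROUP BY user_id
),
ranked AS (
    -- step 2: rank users by total and count them
    SELECT user_id,
           total,
           ROW_NUMBER() OVER (ORDER BY total DESC) AS rnk,  -- 1 = highest total
           COUNT(*) OVER ()                        AS cnt   -- number of users
    FROM totals
)
-- step 3: turn the rank into a percentile
SELECT user_id,
       total,
       rnk AS `rank`,
       ROUND(100 * (cnt - rnk + 1) / cnt, 2) AS percentile
FROM ranked
ORDER BY total DESC;
```

ROW_NUMBER() gives users with tied totals different ranks, much like the user-variable counter does. If tied users should share a rank, RANK() can be used in its place.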
If your table already stores the total sales for each user (one row per user, with a total column), you can skip the aggregation step and rank the table directly:
-- `rank` is quoted because RANK is a reserved word in MySQL 8.0
SELECT user_id, total, `rank`,
       ROUND(100 * (cnt - `rank` + 1) / cnt, 2) AS percentile
FROM (
    SELECT user_id, total, @curRank := @curRank + 1 AS `rank`
    FROM `order` p, (SELECT @curRank := 0) r
    ORDER BY total DESC
) AS dt,
(SELECT COUNT(DISTINCT user_id) AS cnt FROM `order`) AS ct;
order
+---------+-------+
| user_id | total |
+---------+-------+
| 1       | 30    |
| 4       | 100   |
| 3       | 200   |
+---------+-------+

percentiles
+---------+-------+------+------------+
| user_id | total | rank | percentile |
+---------+-------+------+------------+
| 1       | 30    | 3    | 33.33      |
| 4       | 100   | 2    | 66.67      |
| 3       | 200   | 1    | 100        |
+---------+-------+------+------------+
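To see where these numbers come from, the formula 100 * (cnt - rank + 1) / cnt can be evaluated by hand for the three sample users (cnt = 3). The literal values below are simply the ranks from the output table plugged in; the column aliases are illustrative.

```sql
-- cnt = 3 users; rank 1 is the highest total
SELECT ROUND(100 * (3 - 1 + 1) / 3, 2) AS rank_1,  -- 100.00
       ROUND(100 * (3 - 2 + 1) / 3, 2) AS rank_2,  -- 66.67
       ROUND(100 * (3 - 3 + 1) / 3, 2) AS rank_3;  -- 33.33
```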
As you can see, the lowest-ranked user does not get a percentile of zero. This follows from the formula: because it divides by the number of users, either the top-ranked user gets 100 or the lowest-ranked user gets 0, but not both at the same time. If you want the lowest-ranked user to have a percentile of zero, use the following queries, which do not add 1 to the rank when calculating the percentile.
-- `rank` is quoted because RANK is a reserved word in MySQL 8.0
SELECT user_id, total, `rank`,
       ROUND(100 * (cnt - `rank`) / cnt, 2) AS percentile
FROM (
    -- number the users from highest total to lowest
    SELECT user_id, total, @curRank := @curRank + 1 AS `rank`
    FROM (SELECT user_id, SUM(sales) AS total FROM `order` GROUP BY user_id) p,
         (SELECT @curRank := 0) r
    ORDER BY total DESC
) AS dt,
(SELECT COUNT(DISTINCT user_id) AS cnt FROM `order`) AS ct;
Again, if your table already stores the total sales for each user, you can rank the table directly:
-- `rank` is quoted because RANK is a reserved word in MySQL 8.0
SELECT user_id, total, `rank`,
       ROUND(100 * (cnt - `rank`) / cnt, 2) AS percentile
FROM (
    SELECT user_id, total, @curRank := @curRank + 1 AS `rank`
    FROM `order` p, (SELECT @curRank := 0) r
    ORDER BY total DESC
) AS dt,
(SELECT COUNT(DISTINCT user_id) AS cnt FROM `order`) AS ct;
percentiles
+---------+-------+------+------------+
| user_id | total | rank | percentile |
+---------+-------+------+------------+
| 1       | 30    | 3    | 0          |
| 4       | 100   | 2    | 33.33      |
| 3       | 200   | 1    | 66.67      |
+---------+-------+------+------------+
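For comparison, MySQL 8.0 and later include two window functions that relate to these conventions. CUME_DIST() returns the fraction of rows with a value less than or equal to the current one, so the top user gets 1 and the bottom user gets 1/cnt, like the first formula when there are no ties. PERCENT_RANK() returns (rank - 1) / (rows - 1), so it does reach both 0 and 1, but only because it divides by one less than the number of rows. A minimal sketch, assuming the raw order table with a sales column; the aliases totals, pct_cume_dist and pct_percent_rank are illustrative:

```sql
-- Sketch for MySQL 8.0+; both functions return a value between 0 and 1.
SELECT user_id,
       total,
       ROUND(100 * CUME_DIST()    OVER (ORDER BY total), 2) AS pct_cume_dist,
       ROUND(100 * PERCENT_RANK() OVER (ORDER BY total), 2) AS pct_percent_rank
FROM (
    SELECT user_id, SUM(sales) AS total
    FROM `order`
    GROUP BY user_id
) AS totals
ORDER BY total DESC;
```

With the three sample users, pct_cume_dist should come out as 100, 66.67 and 33.33, and pct_percent_rank as 100, 50 and 0. Both functions give users with tied totals the same value.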
SQL to create the sample table order:
CREATE TABLE `order` (user_id INT, product VARCHAR(255), sales INT);
INSERT INTO `order` VALUES
    (1, 'Soap', 10),
    (4, 'Perfume', 100),
    (1, 'Noodles', 20),
    (3, 'Deo', 200);
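After loading the sample rows, one quick way to sanity-check them is to run the aggregation step on its own. This is just the inner subquery from the first example with an ORDER BY added:

```sql
SELECT user_id, SUM(sales) AS total
FROM `order`
GROUP BY user_id
ORDER BY total DESC;
-- Expected with the sample rows: (3, 200), (4, 100), (1, 30)
```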
Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.