Last updated on September 11th, 2020 at 10:17 am
Sometimes you may want to calculate percentile from frequency or counts in MySQL based on. E.g, to rank customers based on number of purchases or orders instead of total sales.There are no functions for it yet. Here’s a ready made query to do it.
E.g, You have a table orders which contains all product orders for each user. You want to calculate percentile from frequency of orders.
order +------+--------------+-------------+ | user_id | product | sales | +-----------+------------+----------+ | 1 | Soap | 10 | | 4 | Perfume | 100 | | 1 | Noodles | 20 | | 4 | Soap | 10 | | 1 | Glue | 20 | | 3 | Deo | 200 | +-----------+------------+----------+
percentiles +-----------+----------+---------+---------------+ | user_id | total | rank | percentile | +-----------+----------+---------+---------------+ | 1 | 3 | 1 | 100 | | 4 | 2 | 2 | 66.67 | | 3 | 1 | 3 | 33.33 | +-----------+----------+---------+---------------+
Here’s a query you can use to calculate percentile from frequency or count in MySQL. Just replace the columns – user_id, sales and table – order. It counts the number of orders for each user. Then it ranks them on number of orders. Finally, it calculates percentile using rank.
select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from (SELECT user_id,total,@curRank := @curRank + 1 AS rank FROM (select user_id,count(sales) as total from `order` group by user_id) p, (SELECT @curRank := 0) r ORDER BY total desc ) as dt,(select count(distinct user_id) as cnt from `order`) as ct
If you already have number of orders for each user in the table and want to directly use the table to calculate percentile from frequency or count, here’s a query
select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from (SELECT user_id,total,@curRank := @curRank + 1 AS rank FROM `order` p, (SELECT @curRank := 0) r ORDER BY total desc ) as dt,(select count(distinct user_id) as cnt from `order`) as ct
order +-----------+----------+ | user_id | total | +-----------+----------+ | 1 | 3 | | 4 | 2 | | 3 | 1 | +-----------+----------+
percentiles +-----------+----------+---------+---------------+ | user_id | total | rank | percentile | +-----------+----------+---------+---------------+ | 1 | 3 | 1 | 100 | | 4 | 2 | 2 | 66.67 | | 3 | 1 | 3 | 33.33 | +-----------+----------+---------+---------------+
As you can see the last ranked user does not have a zero percentile. This is the nature of percentile calculation. Either the first person can have 100 percentile or the last ranked can have zero. Both cannot happen at the same time. If you want to force the last ranked person to have a zero percentile, you can use the following queries. I am not adding 1 to the rank while calculating percentile.
select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from (SELECT user_id,total,@curRank := @curRank + 1 AS rank FROM (select user_id,count(sales) as total from `order` group by user_id) p, (SELECT @curRank := 0) r ORDER BY total desc ) as dt,(select count(distinct user_id) as cnt from `order`) as ct
If you already have total sales for each user in the table and want to directly use the table to calculate percentile, here’s a query
select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from (SELECT user_id,total,@curRank := @curRank + 1 AS rank FROM `order` p, (SELECT @curRank := 0) r ORDER BY total desc ) as dt,(select count(distinct user_id) as cnt from `order`) as ct
percentiles +-----------+----------+---------+---------------+ | user_id | total | rank | percentile | +-----------+----------+---------+---------------+ | 1 | 3 | 1 | 66.67 | | 4 | 2 | 2 | 33.33 | | 3 | 1 | 3 | 0 | +-----------+----------+---------+---------------+
SQL to create the sample table order:
Create table `order` (user_id int, product varchar(255),sales int); Insert into `order` values (1,'Soap', 10),(4,'Perfume',100),(1,'Noodles',20),(4,'Soap',10),(1,'Glue',20),(3,'Deo',200);
Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.