Calculate percentile from frequency in MySQL


Sometimes you may want to calculate percentile from frequency or counts in MySQL based on. E.g, to rank customers based on number of purchases or orders instead of total sales.There are no functions for it yet. Here’s a ready made query to do it.

E.g, You have a table orders which contains all product orders for each user. You want to calculate percentile from frequency of orders.

order
+------+--------------+-------------+
|  user_id  |   product  |   sales  |
+-----------+------------+----------+
|     1     |     Soap   |    10    |
|     4     |   Perfume  |   100    |
|     1     |   Noodles  |   20     |
|     4     |     Soap   |   10     |
|     1     |    Glue    |   20     |
|     3     |     Deo    |   200    |
+-----------+------------+----------+
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |     3    |    1    |      100      |
|     4     |     2    |    2    |     66.67     |
|     3     |     1    |    3    |     33.33     |
+-----------+----------+---------+---------------+

Here’s a query you can use to calculate percentile from frequency or count in MySQL. Just replace the columns – user_id, sales and table – order. It counts the number of orders for each user. Then it ranks them on number of orders. Finally, it calculates percentile using rank.

select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM      (select user_id,count(sales) as total from `order` group by user_id)
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct

If you already have number of orders for each user in the table and want to directly use the table to calculate percentile from frequency or count, here’s a query

select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM   `order`
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct
order
+-----------+----------+
|  user_id  |   total  |
+-----------+----------+
|     1     |    3     |
|     4     |    2     |
|     3     |    1     |
+-----------+----------+
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |    3     |    1    |      100      |
|     4     |    2     |    2    |     66.67     |
|     3     |    1     |    3    |     33.33     |
+-----------+----------+---------+---------------+

As you can see the last ranked user does not have a zero percentile. This is the nature of percentile calculation. Either the first person can have 100 percentile or the last ranked can have zero. Both cannot happen at the same time. If you want to force the last ranked person to have a zero percentile, you can use the following queries. I am not adding 1 to the rank while calculating percentile.

select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM      (select user_id,count(sales) as total from `order` group by user_id)
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct

If you already have total sales for each user in the table and want to directly use the table to calculate percentile, here’s a query

select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM   `order`
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |    3     |    1    |     66.67     |
|     4     |    2     |    2    |     33.33     |
|     3     |    1     |    3    |       0       |
+-----------+----------+---------+---------------+

SQL to create the sample table order:

Create table `order` (user_id int, product varchar(255),sales int);
Insert into `order` values  (1,'Soap', 10),(4,'Perfume',100),(1,'Noodles',20),(4,'Soap',10),(1,'Glue',20),(3,'Deo',200);
mm

About Sreeram Sreenivasan

Sreeram Sreenivasan is the Founder of Ubiq, a business dashboard & reporting platform for small & medium businesses. Ubiq makes it easy to build business dashboards & reports for your business. Try it for free today!