Calculate percentile from frequency in MySQL

Last updated on September 11th, 2020 at 10:17 am

Sometimes you may want to calculate percentile from frequency or counts in MySQL based on. E.g, to rank customers based on number of purchases or orders instead of total sales.There are no functions for it yet. Here’s a ready made query to do it.

E.g, You have a table orders which contains all product orders for each user. You want to calculate percentile from frequency of orders.

order
+------+--------------+-------------+
|  user_id  |   product  |   sales  |
+-----------+------------+----------+
|     1     |     Soap   |    10    |
|     4     |   Perfume  |   100    |
|     1     |   Noodles  |   20     |
|     4     |     Soap   |   10     |
|     1     |    Glue    |   20     |
|     3     |     Deo    |   200    |
+-----------+------------+----------+
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |     3    |    1    |      100      |
|     4     |     2    |    2    |     66.67     |
|     3     |     1    |    3    |     33.33     |
+-----------+----------+---------+---------------+

Here’s a query you can use to calculate percentile from frequency or count in MySQL. Just replace the columns – user_id, sales and table – order. It counts the number of orders for each user. Then it ranks them on number of orders. Finally, it calculates percentile using rank.

select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM      (select user_id,count(sales) as total from `order` group by user_id)
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct

If you already have number of orders for each user in the table and want to directly use the table to calculate percentile from frequency or count, here’s a query

select user_id,total,rank,round(100*(cnt-rank+1)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM   `order`
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct
order
+-----------+----------+
|  user_id  |   total  |
+-----------+----------+
|     1     |    3     |
|     4     |    2     |
|     3     |    1     |
+-----------+----------+
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |    3     |    1    |      100      |
|     4     |    2     |    2    |     66.67     |
|     3     |    1     |    3    |     33.33     |
+-----------+----------+---------+---------------+

As you can see the last ranked user does not have a zero percentile. This is the nature of percentile calculation. Either the first person can have 100 percentile or the last ranked can have zero. Both cannot happen at the same time. If you want to force the last ranked person to have a zero percentile, you can use the following queries. I am not adding 1 to the rank while calculating percentile.

select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM      (select user_id,count(sales) as total from `order` group by user_id)
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct

If you already have total sales for each user in the table and want to directly use the table to calculate percentile, here’s a query

select user_id,total,rank,round(100*(cnt-rank)/cnt,0) as percentile from   
(SELECT  user_id,total,@curRank := @curRank + 1 AS rank
FROM   `order`
p, (SELECT @curRank := 0) r
ORDER BY  total desc ) as dt,(select count(distinct user_id) as cnt from
`order`) as ct
percentiles
+-----------+----------+---------+---------------+
|  user_id  |   total  |  rank   |   percentile  |
+-----------+----------+---------+---------------+
|     1     |    3     |    1    |     66.67     |
|     4     |    2     |    2    |     33.33     |
|     3     |    1     |    3    |       0       |
+-----------+----------+---------+---------------+

SQL to create the sample table order:

Create table `order` (user_id int, product varchar(255),sales int);
Insert into `order` values  (1,'Soap', 10),(4,'Perfume',100),(1,'Noodles',20),(4,'Soap',10),(1,'Glue',20),(3,'Deo',200);