Calculate percentile from recency in MySQL

Last updated on September 21st, 2021 at 11:12 am

Sometimes you may want to calculate percentile from recency in MySQL, e.g. to rank customers by how recently they purchased or ordered instead of by total sales. This helps create special offers for people who have recently bought something on your website. MySQL versions before 8.0 have no built-in function for this. Here’s a ready-made query to do it.

E.g., you have a table `order` which contains all product orders for each user. You want to calculate each user's percentile from the recency of their orders.

order
+-----------+------------+------------------+
|  user_id  |   product  |   purchase_date  |
+-----------+------------+------------------+
|     1     |     Soap   |     2013-11-20   |
|     4     |   Perfume  |     2013-07-02   |
|     1     |   Noodles  |     2013-10-20   |
|     4     |     Soap   |     2013-11-11   |
|     1     |    Glue    |     2013-09-12   |
|     3     |     Deo    |     2013-10-20   |
+-----------+------------+------------------+

percentiles
+-----------+----------------+---------+---------------+
|  user_id  |   latest       |  rank   |   percentile  |
+-----------+----------------+---------+---------------+
|     1     |  2013-11-20    |    1    |      100      |
|     4     |  2013-11-11    |    2    |     66.67     |
|     3     |  2013-10-20    |    3    |     33.33     |
+-----------+----------------+---------+---------------+

Here’s a query you can use to calculate percentile from recency in MySQL. Just replace the columns (user_id, purchase_date) and the table (order) with your own. It gets the latest purchase date for each user. Then it ranks the users by that date, most recent first. Finally, it calculates the percentile from the rank.

-- `rank` is quoted because it is a reserved word in MySQL 8.0+
select user_id, latest, `rank`, round(100*(cnt-`rank`+1)/cnt, 2) as percentile
from (select user_id, latest, @curRank := @curRank + 1 as `rank`
      from (select user_id, max(purchase_date) as latest
            from `order` group by user_id) p,
           (select @curRank := 0) r
      order by latest desc) as dt,
     (select count(distinct user_id) as cnt from `order`) as ct;
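
On MySQL 8.0 or later, the same three steps (latest date per user, rank, percentile) can also be written with window functions. The following is only a sketch of that alternative, not the user-variable query above. It assumes the sample `order` table; the names latest_orders, ranked and rnk are made up for illustration.

```sql
-- Assumes MySQL 8.0+ (common table expressions and window functions).
WITH latest_orders AS (
    -- Step 1: latest purchase date for each user
    SELECT user_id, MAX(purchase_date) AS latest
    FROM `order`
    GROUP BY user_id
),
ranked AS (
    -- Step 2: rank users, most recent first, and count them
    SELECT user_id,
           latest,
           ROW_NUMBER() OVER (ORDER BY latest DESC) AS rnk,
           COUNT(*) OVER () AS cnt
    FROM latest_orders
)
-- Step 3: same percentile formula as the query above
SELECT user_id,
       latest,
       rnk AS `rank`,
       ROUND(100 * (cnt - rnk + 1) / cnt, 2) AS percentile
FROM ranked
ORDER BY rnk;
```

As with the user-variable version, users who share the same latest date still receive different ranks, and the order among them is arbitrary.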

If your table already holds one row per user with that user's latest purchase date, you can calculate percentile from recency directly on the table. Here’s the query:

select user_id, purchase_date, `rank`, round(100*(cnt-`rank`+1)/cnt, 2) as percentile
from (select user_id, purchase_date, @curRank := @curRank + 1 as `rank`
      from `order` p, (select @curRank := 0) r
      order by purchase_date desc) as dt,
     (select count(distinct user_id) as cnt from `order`) as ct;

order
+-----------+------------------+
|  user_id  |   purchase_date  |
+-----------+------------------+
|     1     |     2013-11-20   |
|     4     |     2013-11-11   |
|     3     |     2013-10-20   |
+-----------+------------------+

percentiles
+-----------+----------------+---------+---------------+
|  user_id  |   purchase_date|  rank   |   percentile  |
+-----------+----------------+---------+---------------+
|     1     |  2013-11-20    |    1    |      100      |
|     4     |  2013-11-11    |    2    |     66.67     |
|     3     |  2013-10-20    |    3    |     33.33     |
+-----------+----------------+---------+---------------+
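
On MySQL 8.0 or later, the built-in CUME_DIST() window function appears to give the same figures for this one-row-per-user case, since it returns the fraction of rows whose date is on or before the current row's date. The sketch below is an assumption-laden illustration rather than the query above: the three rows are inlined as a derived table named latest_per_user (a made-up name) so it runs on its own.

```sql
-- Assumes MySQL 8.0+ (window functions).
-- The derived table stands in for a table with one row per user.
SELECT user_id,
       purchase_date,
       ROUND(100 * CUME_DIST() OVER (ORDER BY purchase_date), 2) AS percentile
FROM (
    SELECT 1 AS user_id, DATE '2013-11-20' AS purchase_date
    UNION ALL SELECT 4, DATE '2013-11-11'
    UNION ALL SELECT 3, DATE '2013-10-20'
) AS latest_per_user
ORDER BY purchase_date DESC;
```

One difference to keep in mind: users with identical dates get the same percentile from CUME_DIST(), whereas the rank-based query gives each of them a different one.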

As you can see, the last-ranked user does not get a zero percentile. That follows from the formula: because it divides by the number of users, either the first-ranked user gets 100 or the last-ranked user gets 0, but not both at the same time. If you want to force the last-ranked user to a zero percentile, you can use the following queries, which do not add 1 to the rank when calculating the percentile.

select user_id, latest, `rank`, round(100*(cnt-`rank`)/cnt, 2) as percentile
from (select user_id, latest, @curRank := @curRank + 1 as `rank`
      from (select user_id, max(purchase_date) as latest
            from `order` group by user_id) p,
           (select @curRank := 0) r
      order by latest desc) as dt,
     (select count(distinct user_id) as cnt from `order`) as ct;

If your table already holds one row per user with that user's latest purchase date and you want to calculate the percentile directly on the table, here’s the query:

select user_id, purchase_date, `rank`, round(100*(cnt-`rank`)/cnt, 2) as percentile
from (select user_id, purchase_date, @curRank := @curRank + 1 as `rank`
      from `order` p, (select @curRank := 0) r
      order by purchase_date desc) as dt,
     (select count(distinct user_id) as cnt from `order`) as ct;

percentiles
+-----------+----------------+---------+---------------+
|  user_id  |   purchase_date|  rank   |   percentile  |
+-----------+----------------+---------+---------------+
|     1     |  2013-11-20    |    1    |     66.67     |
|     4     |  2013-11-11    |    2    |     33.33     |
|     3     |  2013-10-20    |    3    |       0       |
+-----------+----------------+---------+---------------+
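
To see the two formulas side by side, here is a small self-contained sketch that needs no table at all. The user count is hard-coded to 3 to match the sample data, and the column names top_is_100 and bottom_is_0 are made up for illustration.

```sql
-- cnt is fixed at 3 to match the three users in the sample data
SELECT r.`rank`,
       ROUND(100 * (3 - r.`rank` + 1) / 3, 2) AS top_is_100,   -- (cnt - rank + 1) / cnt
       ROUND(100 * (3 - r.`rank`) / 3, 2)     AS bottom_is_0   -- (cnt - rank) / cnt
FROM (SELECT 1 AS `rank` UNION ALL SELECT 2 UNION ALL SELECT 3) AS r;
```

For ranks 1, 2 and 3 the first column works out to 100, 66.67 and 33.33, and the second to 66.67, 33.33 and 0, which matches the two result tables in this article.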

SQL to create the sample table order:

create table `order` (user_id int, product varchar(255), purchase_date date);
insert into `order` values
  (1, 'Soap',    '2013-11-20'),
  (4, 'Perfume', '2013-07-02'),
  (1, 'Noodles', '2013-10-20'),
  (4, 'Soap',    '2013-11-11'),
  (1, 'Glue',    '2013-09-12'),
  (3, 'Deo',     '2013-10-20');