calculate percentile in postgresql

How to Calculate Percentile in PostgreSQL

Calculating Percentile helps you rank data for analysis as well as calculate median for a set of values. Here’s how to calculate percentile in PostgreSQL using percentile_cont window function. We will also look at how to calculate 50th percentile, 90th percentile, 95th percentile and nth percentile in PostgreSQL, since they are commonly used in data analysis.

 

How to Calculate Percentile in PostgreSQL

We will calculate percentile in PostgreSQL using percentile_cont window function

Let’s say you have the following table sales(order_date,amount) and you want to calculate 50th percentile.

postgres=# create table sales(order_date date,sale int);

postgres=# insert into sales values('2020-04-01',210),
           ('2020-04-02',125),('2020-04-03',150),('2020-04-04',230),
           ('2020-04-05',200),('2020-04-10',220),('2020-04-06',25),
           ('2020-04-07',215),('2020-04-08',300),('2020-04-09',250);

postgres=# select * from sales;
 order_date | sale
------------+------
 2020-04-01 |  210
 2020-04-02 |  125
 2020-04-03 |  150
 2020-04-04 |  230
 2020-04-05 |  200
 2020-04-10 |  220
 2020-04-06 |   25
 2020-04-07 |  215
 2020-04-08 |  300
 2020-04-09 |  250

Bonus Read : How to Fill Missing Dates in PostgreSQL

 

Here’s the query to calculate 50th percentile in PostgreSQL. You can use the same query to calculate median in PostgreSQL.

postgres=# SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale) FROM sales;
 percentile_cont
-----------------
           212.5

In the above query, we use percentile_cont window function that allows you to define what percentile (e.g 0.5 that is 50th percentile) you want to calculate. Then we also use WITHIN GROUP clause to calculate percentile within the group of sale values. We also use ORDER BY clause to sort the sale values before we calculate percentile in PostgreSQL.

When you use percentile_cont window function, PostgreSQL will interpolate the median value as the average of 2 middle values (210, 215), in case there are 2 middle values. That is why it returns 212.5

 

Bonus Read : How to Calculate Moving Average in PostgreSQL

 

When you can calculate percentile in PostgreSQL, if you don’t want PostgreSQL to interpolate but only return the closest value, use percentile_disc window function instead. Here’s an example

postgres=# SELECT PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY sale) FROM sales;
 percentile_disc
-----------------
             210

In this case, it will return the one of the middle values, in case there are 2 middle values.

 

Bonus Read : How to Import CSV File into PostgreSQL

 

How to Calculate 90th Percentile in PostgreSQL

Similarly, you can calculate 90th percentile in PostgreSQL using percentile_cont. We change 0.5 to 0.9 in percentile_cont function

postgres=# SELECT PERCENTILE_CONT(0.9) WITHIN GROUP(ORDER BY sale) FROM sales;
percentile_cont
-----------------
255

You can also calculate 90th percentile in PostgreSQL using percentile_disc

postgres=# SELECT PERCENTILE_DISC(0.9) WITHIN GROUP(ORDER BY sale) FROM sales;
percentile_disc
-----------------
250

 

How to Calculate 95th Percentile in PostgreSQL

Similarly, you can calculate 95th percentile in PostgreSQL, using percentile_cont

postgres=# SELECT PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY sale) FROM sales;
percentile_cont
-----------------
277.5

You can also calculate 95th percentile in PostgreSQL using percentile_disc

postgres=# SELECT PERCENTILE_DISC(0.95) WITHIN GROUP(ORDER BY sale) FROM sales;
percentile_disc
-----------------
300

 

How to Calculate nth Percentile in PostgreSQL

Similarly, you can calculate nth percentile in PostgreSQL using percentile_cont. We basically supply n/100 in percentile_cont function

postgres=# SELECT PERCENTILE_CONT(n/100) WITHIN GROUP(ORDER BY sale) FROM sales;

 

You can also calculate 90th percentile in PostgreSQL using percentile_disc

postgres=# SELECT PERCENTILE_DISC(n/100) WITHIN GROUP(ORDER BY sale) FROM sales;

 

After you calculate percentile in PostgreSQL, you can use a PostgreSQL Reporting Tool to plot the percentile values along with sale values on a line chart to analyze the difference and also do goal setting. For that we will join the calculated 50th percentile value with original sales table as shown below

postgres=# select * from sales,
(SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale) FROM sales) temp 
order by order_date;
 order_date | sale | percentile_cont
------------+------+-----------------
 2020-04-01 |  210 |           212.5
 2020-04-02 |  125 |           212.5
 2020-04-03 |  150 |           212.5
 2020-04-04 |  230 |           212.5
 2020-04-05 |  200 |           212.5
 2020-04-06 |   25 |           212.5
 2020-04-07 |  215 |           212.5
 2020-04-08 |  300 |           212.5
 2020-04-09 |  250 |           212.5
 2020-04-10 |  220 |           212.5

 

Here’s a line chart that shows daily sales along with percentile, plotted using Ubiq

daily sales with percentile

 

If you want to create charts, dashboards & reports from PostgreSQL database, you can try Ubiq. We offer a 14-day free trial.

mm

About Ubiq

Ubiq is a powerful dashboard & reporting platform. Build dashboards, charts & reports for your business in minutes. Try it for free!