How to Calculate Median in PostgreSQL

Last updated on May 17th, 2020 at 06:11 am

Calculating the median helps you find the middle value in your data and reduces the influence of extreme values. PostgreSQL has no built-in median function, so here’s how to calculate the median using percentile_cont.

 


The median is the 50th percentile of a set of values, so we will calculate it with PostgreSQL’s percentile functions.

Let’s say you have the following table sales(order_date, sale).

postgres=# create table sales(order_date date,sale int);

postgres=# insert into sales values('2020-04-01',210),
           ('2020-04-02',125),('2020-04-03',150),('2020-04-04',230),
           ('2020-04-05',200),('2020-04-10',220),('2020-04-06',25),
           ('2020-04-07',215),('2020-04-08',300),('2020-04-09',250);

postgres=# select * from sales;
 order_date | sale
------------+------
 2020-04-01 |  210
 2020-04-02 |  125
 2020-04-03 |  150
 2020-04-04 |  230
 2020-04-05 |  200
 2020-04-10 |  220
 2020-04-06 |   25
 2020-04-07 |  215
 2020-04-08 |  300
 2020-04-09 |  250


 

Here’s the query to find the median, that is, the 50th percentile, in PostgreSQL.

postgres=# SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale) FROM sales;
 percentile_cont
-----------------
           212.5

In the above query, we use the percentile_cont function, which takes the percentile you want as a fraction between 0 and 1 (e.g. 0.5 for the 50th percentile). percentile_cont is an ordered-set aggregate, so it needs to know how its input is sorted. The WITHIN GROUP (ORDER BY sale) clause tells PostgreSQL to sort the sale values in ascending order before it picks the percentile.
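The fraction is not limited to 0.5. As a small sketch against the same sales table, percentile_cont also accepts an array of fractions, so you can get several percentiles in one pass. The column alias and the explicit cast are our own additions for readability.

```sql
-- Quartiles and median in one query; the result is an array
-- with one value per requested fraction, in the same order.
SELECT PERCENTILE_CONT(ARRAY[0.25, 0.5, 0.75]::double precision[])
       WITHIN GROUP (ORDER BY sale) AS quartiles
FROM sales;
```

The middle element of the returned array is the median.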

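The table has an even number of rows (10), so there is no single middle value. In that case percentile_cont interpolates the median as the average of the two middle values in sorted order (210 and 215), which gives 212.5.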
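If you want to check the interpolation yourself, here is a minimal sketch that computes the median by hand: it numbers the sorted rows and averages the middle one or two. The names ranked, rn, n and manual_median are illustrative, not part of any PostgreSQL API.

```sql
-- Manual median: average the middle value(s) of the sorted sale column.
WITH ranked AS (
    SELECT sale,
           ROW_NUMBER() OVER (ORDER BY sale) AS rn,  -- position in sorted order
           COUNT(*) OVER ()                  AS n    -- total number of rows
    FROM sales
)
SELECT AVG(sale) AS manual_median
FROM ranked
-- Integer division: for odd n both expressions give the same row,
-- for even n they give the two middle rows.
WHERE rn IN ((n + 1) / 2, (n + 2) / 2);
```

For the 10 rows above this picks rows 5 and 6 (210 and 215), so the result should agree with the 212.5 returned by percentile_cont.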


 

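If you don’t want PostgreSQL to interpolate, and instead want a value that actually exists in your data, use percentile_disc. It returns the first value in sorted order whose position reaches the requested fraction of rows. Here’s an example: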

postgres=# SELECT PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY sale) FROM sales;
 percentile_disc
-----------------
             210
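To see the difference between the two functions at a glance, you can run both in the same query. This is just a sketch that combines the two queries above; the aliases median_cont and median_disc are our own.

```sql
-- Interpolated vs. discrete median over the same sorted values.
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sale) AS median_cont,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY sale) AS median_disc
FROM sales;
```

With the sample data, median_cont is 212.5 and median_disc is 210, matching the two results shown above.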

If you want to calculate the median in PostgreSQL for each group, use the GROUP BY clause as shown below. Let’s say your sales table also contains a product column with product names. In that case, you can calculate the median for each product using the following query.

postgres=# SELECT PRODUCT, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale) 
FROM sales 
GROUP BY PRODUCT;
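The sample sales table created earlier has no product column, so the query above will not run against it as is. Here is a self-contained sketch you can try instead. The table product_sales, the product names 'A' and 'B', and the way the sale values are split between them are all made up for illustration.

```sql
-- Hypothetical table: the same 10 sale values, split across two products.
CREATE TABLE product_sales(product text, sale int);

INSERT INTO product_sales VALUES
    ('A', 210), ('A', 125), ('A', 150), ('A', 230), ('A', 200),
    ('B', 220), ('B', 25),  ('B', 215), ('B', 300), ('B', 250);

-- One median per product; each group is sorted independently.
SELECT product,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sale) AS median_sale
FROM product_sales
GROUP BY product
ORDER BY product;
```

Each product has 5 rows here, so no interpolation is needed: the query returns a median of 200 for product A and 220 for product B.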


 

After you calculate the median in PostgreSQL, you can use a PostgreSQL Reporting Tool to plot the median along with the sale values on a line chart and analyze the difference. For that, we will join the calculated median with the original sales table as shown below. Because the subquery returns a single row, the join simply attaches the median to every row of sales.

postgres=# select * from sales,
(SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale) FROM sales) temp 
order by order_date;
 order_date | sale | percentile_cont
------------+------+-----------------
 2020-04-01 |  210 |           212.5
 2020-04-02 |  125 |           212.5
 2020-04-03 |  150 |           212.5
 2020-04-04 |  230 |           212.5
 2020-04-05 |  200 |           212.5
 2020-04-06 |   25 |           212.5
 2020-04-07 |  215 |           212.5
 2020-04-08 |  300 |           212.5
 2020-04-09 |  250 |           212.5
 2020-04-10 |  220 |           212.5
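If you also want the gap between each day’s sale and the median as its own column, a small variation of the same join can compute it. This is a sketch only; the aliases s, m, median_sale and diff_from_median are our own choices.

```sql
-- Attach the overall median to every row and compute the daily difference.
SELECT s.order_date,
       s.sale,
       m.median_sale,
       s.sale - m.median_sale AS diff_from_median
FROM sales s
CROSS JOIN (
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sale) AS median_sale
    FROM sales
) m
ORDER BY s.order_date;
```

The explicit CROSS JOIN does the same thing as the comma in the query above; it just makes the intent easier to read.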

 

Here’s a line chart that shows daily sales along with the median, plotted using Ubiq.

[Figure: line chart of daily sales and the median value]

 

If you want to create charts, dashboards & reports from a PostgreSQL database, you can try Ubiq. We offer a 14-day free trial.