How to Create Histogram in PostgreSQL

Histograms help you visualize frequency distribution in data analysis and reporting, and get insights quickly. Here’s how to create histogram in PostgreSQL. We will look at a couple of different ways to get frequency distribution in PostgreSQL.

 

How to Create Histogram in PostgreSQL

Here are the steps to create histogram in PostgreSQL.

Let’s say you have the following table sales(order_date, sale)

postgres=# create table sales(order_date date,sale int);

postgres=# insert into sales values('2020-04-01',210),
           ('2020-04-02',125),('2020-04-03',150),('2020-04-04',230),
           ('2020-04-05',200),('2020-04-10',220),('2020-04-06',250),
           ('2020-04-07',215),('2020-04-08',300),('2020-04-09',250);

postgres=# select * from sales;
 order_date | sale
------------+------
 2020-04-01 |  210
 2020-04-02 |  125
 2020-04-03 |  150
 2020-04-04 |  230
 2020-04-05 |  200
 2020-04-10 |  220
 2020-04-06 |  250
 2020-04-07 |  215
 2020-04-08 |  300
 2020-04-09 |  250

 

Bonus Read : Create User in PostgreSQL

 

Create Histogram in PostgreSQL

There are multiple PostgreSQL Histogram queries to create histogram in PostgreSQL. We will look at each of them.

 

Simplest way to create Histogram in PostgreSQL

If you want PostgreSQL Histogram bins to  have equal intervals (10-20, 20-30, 30-40, etc.) then here’s the SQL query to create frequency distribution in PostgreSQL.

postgres=# select floor(sale/10)*10 as bin_floor, count(*)
     from sales
     group by 1
     order by 1;
+-----------+----------+
| bin_floor | count(*) |
+-----------+----------+
|       120 |        1 |
|       150 |        1 |
|       200 |        1 |
|       210 |        2 |
|       220 |        1 |
|       230 |        1 |
|       250 |        2 |
|       300 |        1 |
+-----------+----------+

In the above query, we calculate the bin number by dividing each sale value with histogram bin size, that is, 10. If you want to change the bin size to 25, change floor(sale/10)*10 to floor(sale/25)*25

Bonus Read : PostgreSQL DROP VIEW

postgres=# select floor(sale/25)*25 as bin_floor, count(*)
     from sales
     group by 1
     order by 1;
+-----------+----------+
| bin_floor | count(*) |
+-----------+----------+
|       125 |        1 |
|       150 |        1 |
|       200 |        4 |
|       225 |        1 |
|       250 |        2 |
|       300 |        1 |
+-----------+----------+

The advantage of the above method is that it automatically changes the bins based on addition/deletion/modification of data. It also creates new bins for new data and/or removes bins in case their count becomes 0. So it allows you to dynamically create histogram in PostgreSQL.

Bonus Read : PostgreSQL Create Index

 

Most Flexible Way to Create Histrogram in PostgreSQL

If you already know the histogram bins to be created or if your histogram bucket sizes are different, then you can create frequency distribution using CASE statement. Here’s the Histogram query for PostgreSQL

postgresql=# select '100-150' as TotalRange,count(sale) as Count from sales
        where sale between 100 and 150
     union (
     select '150-200' as TotalRange,count(sale) as Count from sales
        where sale between 150 and 200)
     union (
     select '200-225' as TotalRange,count(sale) as Count from sales
        where sale between 200 and 225)
     union (
     select '225-250' as TotalRange,count(sale) as Count from sales
        where sale between 225 and 250);
+------------+-------+
| TotalRange | Count |
+------------+-------+
| 100-150    |     2 |
| 150-200    |     2 |
| 200-225    |     4 |
| 225-250    |     3 |
+------------+-------+

In this query, we assign bucket for each sale value using case statement. You can change the histogram buckets to any range you want, they don’t need to have uniform intervals.

This query has 2 advantages:

  1. You can change the bucket name to anything you want. For example, instead of displaying “100-150” you can show it as “between 100 and 150”
  2. You can have non uniform bucket size. For example, you can have a bucket as 100-150 with a bucket size of 50, and another bucket as 150-250 with a bucket size of 100.

 

Depending on your requirement, you can choose any of the above methods to create histogram in PostgreSQL.

Ubiq makes it easy to visualize data in minutes, and monitor in real-time dashboards. Try it Today!