How to Calculate Running Total in Redshift

Last updated on May 17th, 2020 at 06:18 am

Cumulative Total, also known as running total, is a useful KPI metric to monitor sum total values over time. If your data is stored in Amazon Redshift, then it is quite easy to calculate cumulative sum in Redshift. Let’s see how to calculate running total in Redshift.

How to Calculate Running Total in Redshift

Here are the steps to calculate running total in Redshift. Let’s say you have the following table that contains daily sales

postgres-#> create table sales(order_date date,sale int);

postgres-#> insert into sales values('2020-03-01',30),
('2020-03-02',35),('2020-03-03',25),('2020-03-04',40),('2020-03-05',30);

postgres-#> select * from sales;
+------------+------+
| order_date | sale |
+------------+------+
| 2020-03-01 |   30 |
| 2020-03-02 |   35 |
| 2020-03-03 |   25 |
| 2020-03-04 |   40 |
| 2020-03-05 |   30 |
+------------+------+

Let’s say you want to calculate cumulative sum of sales, like so,

+------------+------+-----------------+
| order_date | sale | cumulative_sale |
+------------+------+-----------------+
| 2020-03-01 |   30 |              30 |
| 2020-03-02 |   35 |              65 |
| 2020-03-03 |   25 |              90 |
| 2020-03-04 |   40 |             130 |
| 2020-03-05 |   30 |             160 |
+------------+------+-----------------+

Here’s the SQL query to calculate running total in Redshift. Since Redshift is basically PostgreSQL, all its functions and expressions are available in PostgreSQL. So we will use a Window Function, available in PostgreSQL, to do so.

postgres=# SELECT order_date
          , sum(sale) over (order by order_date asc rows between unbounded preceding and current row)
          FROM   sales
          ORDER  BY order_date;

 order_date | sum
------------+-----
 2020-03-01 |  30
 2020-03-02 |  65
 2020-03-03 |  90
 2020-03-04 | 130
 2020-03-05 | 160

Let’s look at the above query in detail. The SUM function above calculates the cumulative sum for each row. However, we use a Window Function OVER which helps us define the number of rows to be used for calculating running total in Redshift.

Bonus Read: How to Calculate Moving Average in Redshift

First, we order our data chronologically by order_date to ensure that we add up the right rows for each cumulative sum. Next, for each row, we tell Redshift to sum sale column for all preceding rows. We do that using window function OVER, and mention rows between unbounded preceding and current row. That is, for each row, consider only preceding rows for cumulative sum.

As the window frame changes for each row, only the preceding rows will be used to calculate running total.

If you want to filter your data before you calculate running total in Redshift, you can do so by adding WHERE clause, as shown below.

postgres=# SELECT order_date
          , sum(sale) over (order by order_date asc rows between unbounded preceding and current row)
          FROM   sales
          ORDER  BY order_date
          WHERE <condition>;

Bonus Read : How to Create Pivot Table in PostgreSQL

Also, since Redshift is basically PostgreSQL, you can also use the above query to calculate running total in PostgreSQL.

After you calculate cumulative sum in Redshift, you can use a Redshift data visualization tool to plot it on a line chart. Here’s an example of a line chart for cumulative total, created using Ubiq.