Last updated on May 17th, 2020 at 06:23 am
Rolling Average or Moving Average is a useful metric that helps you keep track of average value over a moving period (e.g average sales for past 7 days). Calculating moving average over time gives more reasonable trend, compared to plotting daily numbers. Since there is no built-in function to calculate moving average in Redshift, here’s the SQL query to do it.
How to Calculate Moving Average in Redshift
Here are the steps to calculate moving average in Redshift. Let’s say you have the following table that contains daily sales information in Redshift.
# create table sales(order_date date,sale int); # insert into sales values('2020-01-01',20), ('2020-01-02',25),('2020-01-03',15),('2020-01-04',30), ('2020-01-05',20),('2020-01-10',20),('2020-01-06',25), ('2020-01-07',15),('2020-01-08',30),('2020-01-09',20); # select * from sales; +------------+------+ | order_date | sale | +------------+------+ | 2020-01-01 | 20 | | 2020-01-02 | 25 | | 2020-01-03 | 15 | | 2020-01-04 | 30 | | 2020-01-05 | 20 | | 2020-01-10 | 20 | | 2020-01-06 | 25 | | 2020-01-07 | 15 | | 2020-01-08 | 30 | | 2020-01-09 | 20 | +------------+------+
Let’s say you want to calculate moving average in Redshift for past 5 days. Redshift (which is basically Postgresql) makes this really easy with the help of Redshift Window Functions. Here’s the SQL query to calculate moving average for past 5 days. We will look at it in detail below.
SELECT a.order_date,a.sale, AVG(a.sale) OVER(ORDER BY a.order_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_sales FROM sales a ;
If you want to round the results, you can use ROUND function as shown to calculate running average in Redshift
SELECT a.order_date,a.sale, round(AVG(a.sale) OVER(ORDER BY a.order_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW),2) AS avg_sales FROM sales a; order_date | sale | avg_sales ------------+------+--------------- 2020-01-01 | 20 | 20.00 2020-01-02 | 25 | 22.50 2020-01-03 | 15 | 20.00 2020-01-04 | 30 | 22.50 2020-01-05 | 20 | 22.00 2020-01-06 | 25 | 23.00 2020-01-07 | 15 | 21.00 2020-01-08 | 30 | 24.00 2020-01-09 | 20 | 22.00 2020-01-10 | 20 | 22.00
Let’s look at the above query in detail. AVG function calculates average value of sale column. However, when we use it along with WINDOW function OVER it calculates average value only for the window that we define.
First we use ORDER BY on our data to ensure that rows are sorted chronologically. Then we define our window for average using OVER function, and mention ROWS BETWEEN 4 PRECEDING AND CURRENT ROW. That is, for each row, calculate average for preceding 4 rows and current row. As the window frame changes for each row, only preceding 4 days and current date will be used.
You can also add filters by adding WHERE clause in the above SQL query.
SELECT a.order_date,a.sale, round(AVG(a.sale) OVER(ORDER BY a.order_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW),2) AS avg_sales FROM sales a WHERE condition;
If you want to calculate moving average in Redshift for past 30 days/1 month, modify the above query to use PRECEDING 29 ROWS AND CURRENT ROW
SELECT a.order_date,a.sale, round(AVG(a.sale) OVER(ORDER BY a.order_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),2) AS avg_sales FROM sales a;
Bonus Read : How to Create Pivot Table in PostgreSQL
How to Calculate Moving Average in Redshift for Past 3 Months
Let’s say you have monthly sales data instead of daily data, and want to calculate rolling average for past 3 months.
# create table monthly_sales(order_month date,sale int); postgres=# insert into monthly_sales values('2019-12-01',20), ('2020-01-30',25),('2020-02-28',15),('2020-03-31',30), ('2020-04-30',20),('2020-05-31',20),('2020-06-30',25), ('2020-07-31',15),('2020-08-31',30),('2020-09-30',20); postgres=# select * from monthly_sales; order_month | sale -------------+------ 2019-12-01 | 20 2020-01-30 | 25 2020-02-28 | 15 2020-03-31 | 30 2020-04-30 | 20 2020-05-31 | 20 2020-06-30 | 25 2020-07-31 | 15 2020-08-31 | 30 2020-09-30 | 20
We use the same logic to calculate moving average in Redshift, in this case. First ORDER BY order_month column to ensure the rows are chronologically sorted. Then calculate average for PRECEDING 2 ROWS AND CURRENT ROW
SELECT a.order_month,a.sale, round(AVG(a.sale) OVER(ORDER BY a.order_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS avg_sales FROM monthly_sales a ; order_month | sale | avg_sales -------------+------+----------- 2019-12-01 | 20 | 20.00 2020-01-30 | 25 | 22.50 2020-02-28 | 15 | 20.00 2020-03-31 | 30 | 23.33 2020-04-30 | 20 | 21.67 2020-05-31 | 20 | 23.33 2020-06-30 | 25 | 21.67 2020-07-31 | 15 | 20.00 2020-08-31 | 30 | 23.33 2020-09-30 | 20 | 21.67
Bonus Read : How to Calculate Running Total in Redshift
You can also add filters by including WHERE clause in the above SQL query.
SELECT a.order_month,a.sale, round(AVG(a.sale) OVER(ORDER BY a.order_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS avg_sales FROM monthly_sales a WHERE condition;
You can customize the above query to calculate moving average in Redshift, as per your requirements.
After you calculate moving average in Redshift, you can use a charting tool to plot it on a line chart and share it with your team. Here’s an example of a line chart that visualizes moving average, created using Ubiq.
If you want to create charts, dashboards & reports from Redshift database, you can try Ubiq. We offer a 14-day free trial.
Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.