Breaking BI: Predictive Analytics in Tableau Part 9: Automatic Time Series Analysis

Monday, February 24, 2014

Predictive Analytics in Tableau Part 9: Automatic Time Series Analysis

Today, we will talk about automatically performing time series analysis using Tableau 8.1's new R functionality. Up to now, we've looked at ways to determine which type of exponential or ARIMA model fits the data best. However, the number of models can be overwhelming to an analyst with less statistical exposure. Today, we'll look at how to let R choose the model for you. As usual, we will use the Superstore Sales sample data set in Tableau.

Let's imagine that we came up with a candidate list of models and want to see which ones are better for our type of data.

Too Many models

Here, we can see that charts can easily get very messy to see which models are best when there are a lot of models. So, what better way is there? Now, we're going to introduce two functions that will help. The function ets() will create an appropriate Exponential model for your data and auto.arima() will create an appropriate ARIMA model for your data. Like always, the code is identical until the last couple of lines. You can find the full code in the appendix at the end of this post. Here are the last couple of lines:

Exponential Model

fit <- ets(timeser)

c(rep(NA, len.new), forecast(fit)[[4]][1:hold.orig])

ARIMA Model

fit <- auto.arima(timeser)
(rep(NA, len.new), forecast(fit)[[4]][1:hold.orig])

Now, let's look at our chart.

Automatic Models

As you can see, these calculations quickly gave us two "good" candidate models to look at. You might notice that neither of these models fits the data extremely well. However, we could look at these models and see that the ARIMA model doesn't seem to be able to account for any of the variability, which is a very bad thing. The Exponential model seems much better, but still needs some tweaking. Therefore, we could throw out all of the ARIMA models and focus on creating a better Exponential model. In just a couple of minutes, we were able to narrow our modeling down to a single family! That's a huge time saver. Perhaps we could even improve this process by adding new functions and new techniques. But, that's a topic for another post. Thanks for reading. We hope you found this informative.

Brad Llewellyn
Data Analytics Consultant
Mariner, LLC
llewellyn.wb@gmail.com
https://www.linkedin.com/in/bradllewellyn

Appendix

ARIMA (Auto)

SCRIPT_REAL("
library(forecast)

## Creating vectors

hold.orig <- .arg4
len.orig <- length( hold.orig )
len.new <- len.orig - hold.orig[1]

year.orig <- .arg1
month.orig <- .arg2
sales.orig <- .arg3

## Sorting the Data

date.orig <- year.orig + month.orig / 12
dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
dat.new <- dat.orig[1:len.new,]

## Fitting the Time Series

timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
fit <- auto.arima(timeser)
c(rep(NA, len.new), forecast(fit)[[4]][1:hold.orig])
",

ATTR( MONTH( [Order Date] ) ), ATTR( YEAR( [Order Date] ) ), SUM( [Sales] ), [Months to Forecast] )

Exponential (Auto)

SCRIPT_REAL("
library(forecast)

## Creating vectors

hold.orig <- .arg4
len.orig <- length( hold.orig )
len.new <- len.orig - hold.orig[1]

year.orig <- .arg1
month.orig <- .arg2
sales.orig <- .arg3

## Sorting the Data

date.orig <- year.orig + month.orig / 12
dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
dat.new <- dat.orig[1:len.new,]

## Fitting the Time Series

timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
fit <- ets(timeser)
c(rep(NA, len.new), forecast(fit)[[4]][1:hold.orig])
",

ATTR( MONTH( [Order Date] ) ), ATTR( YEAR( [Order Date] ) ), SUM( [Sales] ), [Months to Forecast] )

4 comments:

prabaSeptember 14, 2014 at 2:29 PM
i am getting subscript error for this linedat.new <- dat.orig[1:len.new,]. i am new to r .
ReplyDelete
Replies
AnonymousJuly 16, 2015 at 11:56 AM
I am getting the following error:
Error in dat.orig[1:len.new, ] : subscript out of bounds
The only difference in my data set is that i am not summing sales. On the contrary i am counting the number of dates that appear in a particular month and forecasting that number moving forward.
ReplyDelete
Replies
AnonymousMay 4, 2016 at 2:37 PM
As is, the forecast is overlayed ontop of the original data. Is there away to "append" a forecast to the end of the data, so it is predicting future data points?
ReplyDelete
Replies

Add comment

About Me

Brad is a Service Engineer on Microsoft's FastTrack for Azure Team in Charlotte, NC. Brad helps individuals and organizations leverage Analytics and Azure to revolutionize themselves and their industries. He has an M.S. in Statistics from the University of South Carolina, MCSE Certification in Data Management and Analytics, MCSE Certification in Cloud Platform and Infrastructure and various MCSA Certifications in Business Intelligence and Advanced Analytics. Brad is an active blogger at breaking-bi.blogspot.com. He is also an organizer for the Charlotte BI Group, a local PASS chapter in Charlotte, NC. You can connect with him on LinkedIn at https://www.linkedin.com/in/bradllewellyn and on Twitter @BreakingBI.