Monday, February 3, 2014

Predictive Analytics in Tableau Part 6: Single Exponential Time Series

Before we start today's topic, we want to direct you towards a great source of information about Tableau 8.1's R functionality.  This page was compiled by members of the Tableau community and can be found here.  Now, on to the demonstration.

Today, we will talk about creating time series analyses using Tableau 8.1's new R functionality.  In our previous demonstrations, we used varying regression techniques to predict values.  These techniques took into account the relationships between the variables, but not across time.  Time Series Analyses approach from the opposite direction.  They attempt to predict new values by looking at previous (or future) values in time, but only considering one variable.  For this demonstration, we will use the Superstore Sales sample data set from Tableau.

As always, the first step is to look at the data we want to forecast.  Let's look at Total Sales per Month.
Sales by Month
We can see that there is quite a bit of variation in this data.  Now, we run into our first conundrum.  I don't know of an easy way to allow Tableau to see dates outside of what's in the data.  Therefore, we can only forecast for dates we already have in the data.  This is actually a good and bad thing for us, as we'll soon see.  If you have a system that includes budget data for the future, then the dates will already be available and you won't have this problem.  Alas, we do not have such luck.  So, let's first hold out the last six months of our data and create the model using only data up to June 2012.  That way, we have something to compare our forecasts to.  In order to do this, we create an integer parameter.
Months to Forecast
Now, we can dynamically set the number of months we want to forecast.  Next, we need to create the code to do this.  But, before we do that, we need to install the "forecast" package on the R Server.
Install Packages
If you don't have the access to do this, then you will probably need to talk to the administrator on your R server.  Fortunately, we host the server on our local machine; so, there is no issue.  Now, back to the forecasting.

This code is long and somewhat complex.  So, we'll only touch on the important parts.  The code for Sales (Forecast) is as follows:

SCRIPT_REAL("
    library(forecast)

    ## Creating vectors

    hold.orig <- .arg4
    len.orig <- length( hold.orig )
    len.new <- len.orig - hold.orig[1]

    year.orig <- .arg1
    month.orig <- .arg2
    sales.orig <- .arg3

    ## Sorting the Data

    date.orig <- year.orig + month.orig / 12
    dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
    dat.new <- dat.orig[1:len.new,]

    ## Fitting the Time Series

    timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1],
                     dat.new[len.new,2]), frequency = 12)
    fit <- HoltWinters(timeser, beta=FALSE, gamma=FALSE)
    c(rep(NA,len.new), forecast(fit, hold.orig[1])$mean)
"
, ATTR( YEAR( [Order Date] ) ), ATTR( MONTH( [Order Date] ) ), SUM( [Sales] ),
 [Months to Forecast] )

First, we need to install the forecast library using library(forecast).  Then, we need to sort the data by date, and hold out the last few months.  How many months get held out depends on the value of our [Months to Forecast] parameter.  Finally, we fit our time series as a single exponential model and output the results.  Let's see what it looks like.
Sales by Month (with Forecast)
As you can see, the forecast is pretty close to the true values.  This type of model cannot pick up on the trend or seasonality though.  We'll deal with that in a later post.  For now, let's see how well this model fits by also plotting the 80% and 95% confidence intervals.  You should remember confidence intervals from our posts about regression.  The code for the intervals are identical to the code for the forecast, with the exception of the last line of code.  I'll post the final lines here.  If you want to see the full code, please refer to the appendix at the end of this post.

Sales (Forecast) (Lower 80%)

c(rep(NA,len.new), forecast(fit, hold.orig[1])$lower[,1])

Sales (Forecast) (Lower 95%)

c(rep(NA,len.new), forecast(fit, hold.orig[1])$lower[,2])

Sales (Forecast) (Upper 80%)

c(rep(NA,len.new), forecast(fit, hold.orig[1])$upper[,1])

Sales (Forecast) (Upper 95%)

c(rep(NA,len.new), forecast(fit, hold.orig[1])$upper[,2])

Now, let's see what the intervals look like
Sales by Month (with Forecast and Bounds)
We can see that all of the actual values fall well within the 95% bound.  This means that our model fits our data reasonably well.  Now, let's talk about how to use this model to predict future values.  The simplest way we could find to do this is to add some filler lines to your data set.  In our case, we're using an Excel spreadsheet.  So, we add the following lines to our data.
Adding Empty Dates
You should note that if you are doing forecasts for a specific type of product, customer, etc., you will need to duplicate these dates for every combination of dimensions you want to forecast over.  Now, when we refresh our worksheet, we have our forecasts.
Sales by Month (with Future Forecast and Bounds)
One very important thing to notice about these forecasts is that the bounds are extremely far from the graph.  This would seem to imply that the model is not a good fit for the data.  Therefore, we would want to use a different type of time series model for it.  Unfortunately, that's going to have to wait until the next post.  Thanks for reading.  We hope you found this informative.

P.S.

If anybody can think of a better way to get future forecasts for dates that don't exist in the data, please let us know in the comments.  Also, this method is not robust against missing data.  If you're data has some missing months/week/etc., you may have to alter the method slightly.

Brad Llewellyn
Data Analytics Consultant
Mariner, LLC
llewellyn.wb@gmail.com
https://www.linkedin.com/in/bradllewellyn

APPENDIX:

The code for Sales (Forecast) (Lower 80%) is as follows:

SCRIPT_REAL("
    library(forecast)

    ## Creating vectors

    hold.orig <- .arg4
    len.orig <- length( hold.orig )
    len.new <- len.orig - hold.orig[1]

    year.orig <- .arg1
    month.orig <- .arg2
    sales.orig <- .arg3

    ## Sorting the Data

    date.orig <- year.orig + month.orig / 12
    dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
    dat.new <- dat.orig[1:len.new,]

    ## Fitting the Time Series

    timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
    fit <- HoltWinters(timeser, beta=FALSE, gamma=FALSE)
    c(rep(NA,len.new), forecast(fit, hold.orig[1])$lower[,1])
"
, ATTR( YEAR( [Order Date] ) ), ATTR( MONTH( [Order Date] ) ), SUM( [Sales] ),
 [Months to Forecast] )

The code for Sales (Forecast) (Lower 95%) is as follows:

SCRIPT_REAL("
    library(forecast)

    ## Creating vectors

    hold.orig <- .arg4
    len.orig <- length( hold.orig )
    len.new <- len.orig - hold.orig[1]

    year.orig <- .arg1
    month.orig <- .arg2
    sales.orig <- .arg3

    ## Sorting the Data

    date.orig <- year.orig + month.orig / 12
    dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
    dat.new <- dat.orig[1:len.new,]

    ## Fitting the Time Series

    timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
    fit <- HoltWinters(timeser, beta=FALSE, gamma=FALSE)
    c(rep(NA,len.new), forecast(fit, hold.orig[1])$lower[,2])
"
, ATTR( YEAR( [Order Date] ) ), ATTR( MONTH( [Order Date] ) ), SUM( [Sales] ),

 [Months to Forecast] )

The code for Sales (Forecast) (Upper 80%) is as follows:

SCRIPT_REAL("
    library(forecast)

    ## Creating vectors

    hold.orig <- .arg4
    len.orig <- length( hold.orig )
    len.new <- len.orig - hold.orig[1]

    year.orig <- .arg1
    month.orig <- .arg2
    sales.orig <- .arg3

    ## Sorting the Data

    date.orig <- year.orig + month.orig / 12
    dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
    dat.new <- dat.orig[1:len.new,]

    ## Fitting the Time Series

    timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
    fit <- HoltWinters(timeser, beta=FALSE, gamma=FALSE)
    c(rep(NA,len.new), forecast(fit, hold.orig[1])$upper[,1])
"
, ATTR( YEAR( [Order Date] ) ), ATTR( MONTH( [Order Date] ) ), SUM( [Sales] ),

 [Months to Forecast] )

The code for Sales (Forecast) (Upper 95%) is as follows:

SCRIPT_REAL("
    library(forecast)

    ## Creating vectors

    hold.orig <- .arg4
    len.orig <- length( hold.orig )
    len.new <- len.orig - hold.orig[1]

    year.orig <- .arg1
    month.orig <- .arg2
    sales.orig <- .arg3

    ## Sorting the Data

    date.orig <- year.orig + month.orig / 12
    dat.orig <- cbind(year.orig, month.orig, sales.orig)[sort(date.orig, index.return = TRUE)$ix,]
    dat.new <- dat.orig[1:len.new,]

    ## Fitting the Time Series

    timeser <- ts(dat.new[,3], start = c(dat.new[1,1], dat.new[1,2]), end = c(dat.new[len.new,1], dat.new[len.new,2]), frequency = 12)
    fit <- HoltWinters(timeser, beta=FALSE, gamma=FALSE)
    c(rep(NA,len.new), forecast(fit, hold.orig[1])$upper[,2])
"
, ATTR( YEAR( [Order Date] ) ), ATTR( MONTH( [Order Date] ) ), SUM( [Sales] ),

 [Months to Forecast] )

4 comments:

  1. Alex Macharia
    Create a shift date calculated field for future dates by
    dateadd('month', [Months to Forecast],[Order Date])

    ReplyDelete
  2. How can you alter the code to only forecast by year? I keep getting an error: Error in dat.orig[1:len.new, ] : subscript out of bounds. Thank you very much for any help

    ReplyDelete
  3. Hello,
    This blog is really helpful. I tried using the forecasting, but whenever I used the Sales(forecast) field it shows me the following error: Error in library(forecast) : there is no package called 'forecast'.
    I have installed the forecast package in R.
    Can you please help me solve this error.
    Thank you.

    ReplyDelete
  4. After adding the blank lines, Tableau retuns: "An error occurred while communication with the Rserve service", "Error in hw(p, beta, gamma) : NA/NaN/Inf in foreign function call (arg 1)"

    Any ideas on a fix?

    ReplyDelete