Monday, January 27, 2014

Predictive Analytics in Tableau Part 5: Polynomial, Exponential and Piecewise Regression

Today, we will talk about some more types of regression using Tableau 8.1's new R functionality.  We previously talked about prediction using Linear Regression.  But, what if the relationship isn't linear?  In the real world, linearity is rarely the truth.  For this demonstration, we will use the same data set we used in Parts 1, 2, and 3.

First, let's look at our data.
 DJIA vs. Foreign by Year
As you can see, this data is not related linearly.  There is very little change in DJIA when Foreign is below 140.  However, when Foreign surpasses 140, DJIA skyrockets.  If we tried to use a line to predict these values, we would get something like this:
 DJIA vs. Foreign by Year (with Linear Trend)
This line is pretty far off from most of those points.  Fortunately, Tableau offers some more built-in trend lines.  Here are some good ones:
 DJIA vs. Foreign by Year (with Quadratic Trend)
 DJIA vs. Foreign by Year (with Exponential Trend)
As you can see, both of these models fit the data pretty well.  Now, let's see how to do them using R.
 DJIA (Exponential)
We see that all we have to do is make a new variable with whatever function we want and add it to the model.  As far as R knows, these are two completely different variables.  We could add all of the other variables to the mix as well if we wanted to.  We could easily model Foreign as a quadratic while modeling Consumer as a logarithm.  One important thing to note is that if you include a polynomial, you should include all lesser degrees.  Simply put, if you have a x^2, then you must also have x.  If you have x^3, you must also have x^2 and x.  We won't go into detail about why you should do this at this time.

Looking back at the earlier trends, we're not happy with either of these trends.  The quadratic trend has that troubling curvature at the left side while the exponential trend doesn't seem strong enough to capture the upward curvature.  Now, let's make a new model that Tableau doesn't even have!

What if we believe that there are actually two trends here?  Let's imagine that the relationship is linear for small values of Foreign (less than 135) and a different linear relationship for larger values of Foreign?  No problem!  We can combine our models!

SCRIPT_REAL("
djia <- .arg1
fore <- .arg2
th <- 135
fore2 <- fore^2

smallfore <- fore[fore<th]
smalldjia <- djia[fore<th]

largefore <- fore[fore>=th]
largedjia <- djia[fore>=th]

smallfit <- lm( smalldjia ~ smallfore )
largefit <- lm( largedjia ~ largefore )

c(smallfit\$fitted, largefit\$fitted)
",
SUM( [DJIA] ), SUM( [FOREIGN] ) )

Finally, let's see the results.
 DJIA vs. Foreign by Year (with Piecewise Prediction)
This model fits our data so much better!  This method is called Piecewise Regression.  The amazing thing about R is that there's a method for predicting anything you want.  You can even create your own prediction method if you need to.  Thanks for reading.  We hope you found this informative.

P.S.

We're curious how Tableau keeps track of which value in the output vector corresponds to each input.  This was a huge obstacle in Part 3 as well.  If you have ideas, let us know in the comments.

Data Analytics Consultant
Mariner, LLC
llewellyn.wb@gmail.com

1. Hi Brad, this is a great series! To answer your question about keeping track of the values, my understanding is that the ordering of what is sent to R from Tableau is based on the sort order of the addressing dimension(s). That creates a vector for each argument, and on the return from R to Tableau the assignment is in the same order. So the 1st address aka row in the partition gets the first value of the return vector, the 2nd address gets the 2nd value, and so on.

1. Thanks for the input Jonathan. That's what I suspected was the case. That concerns me when dealing with Nulls and functions where Time Series are involved. Perhaps I will have to investigate this a little further.

2. Thanks a lot Brad. The posts on Predictive Analytics are really helpful.

3. Daer Brad. This post are very interessting. Could you please help me to write the right syntax in Tableau for this R-syntax:

reg.quad <- nls (av~a*uv^2^2+ b*uv^2+c, data=Datensatz, start=list (a=51.75045, b=77.30759 , c=45.07692))

I am so sorry because i am a tableau beginner.