Breaking BI: Predictive Analytics in Tableau Part 1: Simple Linear Regression

Monday, December 30, 2013

Predictive Analytics in Tableau Part 1: Simple Linear Regression

Today, we will begin the next series of posts about performing predictive analysis via Tableau 8.1's new R functionality. More specifically, we'll be talking about Simple Linear Regression. Some of you may remember our previous post on this topic, Performing Simple Linear Regression in Tableau. That procedure utilized Table Calculations, which despite being powerful in their own right, were not quite meant for such complex mathematics. The new R integration makes this task significantly easier, as we are about to see. For this procedure, we will use a sample data set from a collegiate source.

The first question you might ask is "Why is this important?" Well, regression is one of the many ways in which you can predict new observations. Want to know what your sales will be next month for a particular product line? Regression can help. Want to know how many new customers you will acquire in the next 3 months? Regression can help. The list goes on and on. Now, all we need is a good foundation. Then, addressing many of your business problems would be within our grasp.

The first step of any regression model is determining which variables are going to be your predictors and which variables are going to be your responses. In a simple linear regression model, we can only have one predictor and one response. We also assume that they are related in a "linear" fashion, which is easiest understand via a picture.

Linear vs. Nonlinear

Now, we can see that the relationship between Foreign and Consumer is approximately linear. However, some of you might ask, "What are these values?" These are U.S. Economic figures for the years between 1976 and 1987. Foreign is "Foreign Investments / Billions of Dollars" and Consumer is "Consumer Debt / Billions of Dollars." Now, let's try to predict how much we will have in foreign investments given that we know what consumer debt will be.

Foreign (Predicted)

The code for creating a linear regression model is extremely simple, nothing more than two lines of actual code. Now, let's see what values we get.

Consumer, Foreign, and Predicted Foreign by Year

Voila! We have predictions. However, it's difficult to see the relationship in a text table. Let's create a % Difference calculation and make this into a highlight table.

Consumer vs. Foreign (Highlight Table)

Now we can easily see which predictions were close, and which were not so close. This was a bit too easy though, let's try something else. When you use a regression model to predict a value, you don't just get a single value, you actually get a range that the value is likely to be in. This is called a prediction interval. If you want a more rigorous definition, you can check out this article. Now, what if we were to use R to calculate these intervals? Let's see!

Foreign (Predicted Lower)

Foreign (Predicted Upper)

These calculations leverage a neat function called predict() that we won't go into detail on at this time. However, it's one of R's many multi-purpose functions. Now, we just need a way to look at these. A little imagination goes a long way in these types of charts.

Consumer vs. Foreign (Banded Scatterplot)

As you can see, all of our values fall within the bounds. This is a good thing. It means that our model fit our data pretty well.

The really cool part about this new R integration is definitely in the prediction and forecasting scenarios. As we just saw, it's really easy to get some cool predictions and display them. There's WAY more to do here. We could have gotten more technical by looking at residual plots or QQ plots. Don't think that linear regression was the right model? No problem! We could have used a time series, artificial neural network, or even a Bayesian model. You're only limited by your imagination. We hope you found this informative. Thanks for reading.

Data Analytics Consultant

Mariner, LLC

brad.llewellyn@mariner-usa.com

http://www.linkedin.com/in/bradllewellyn
http://breaking-bi.blogspot.com

11 comments:

AnonymousJune 3, 2014 at 5:57 AM
Hi Brad,

Great post around regression modelling. I just have a small questions. can we see the beta coefficients of all independent variables with dependent variable

Regards,
Ankit
ReplyDelete
Replies
yyyrJuly 6, 2015 at 5:50 PM
Hello Brad,

I have a requirement using R an tableau, The details are as follows,

can we pass the arguments(.arg1,arg2....) as parameters, so that user can select the X and Y variables in tableau using the parameter control and the visualization changes accordingly.?

Lets consider a Simple linear regression R script in tableau:

SCRIPT_int('
mydata <- data.frame(rooms <- .arg1, ok <-.arg2, af <- .arg3, rating <- .arg4, initial<-.arg5, time <- .arg6 );
lrmodel <- lm(time ~ as.factor(ok) + rooms + as.factor(af) + rating, data = mydata);
lrmodel$fitted
prob <- predict(lrmodel, newdata = mydata, type = "response")
',
ATTR([Number of Rooms]),
ATTR([OK or SUI]),
ATTR([AF Counted]),
ATTR([Rating]),
ATTR([Initial or Makeup]),
ATTR([total TIME])
)
Now the requirement is instead passing the variables directly for X and Y as arguments , pass the arguments as parameters??

Kindly let me know if this question makes sense, or if more information is needed?. Thanks awaiting for the reply.

Thanks and Regards
Yashwanth
ReplyDelete
Replies
UnknownDecember 12, 2015 at 7:27 AM
how to get the percent of difference and how to calculate it?
ReplyDelete
Replies
UnknownDecember 16, 2015 at 12:56 PM
Hi Brad,
I'm using a similar formula and getting a strange error that I can't seem to sort out:

Unexpected number of results returned by SCRIPT function. Function expected 5206 values; 2268 values returned

Any thoughts on how to fix this? Here's my code

SCRIPT_REAL("

fit <- lm(.arg1 ~ .arg2 + .arg3)

fit$fitted
"
,
SUM([Applications (2015)]),
SUM([Applications (2014)]),
SUM([Applications (2013)])

)
ReplyDelete
Replies
AnonymousJanuary 7, 2016 at 2:57 PM
How can we run the new data to predict the dependent variable basis on already established predictive model?
ReplyDelete
Replies
AnonymousFebruary 22, 2016 at 1:09 PM
I am not able to run the lm function in the code. Do I need to install any package related to it in R first?
ReplyDelete
Replies
JuraMarch 3, 2016 at 1:44 PM
Hi Brad, nice post, congratulation.
witch sample data set from a collegiate source did you pick? Please
ReplyDelete
Replies
UnknownMay 30, 2016 at 6:50 AM
What are the technical features available in Predictive analysis of BI????
ReplyDelete
Replies
SupreethMekaJune 28, 2016 at 10:07 PM
Hi is there a way that we can save the fit method some where (more precisely in Rserve session)and use the predict method for the calculated field ?
ReplyDelete
Replies

About Me

Brad is a Service Engineer on Microsoft's FastTrack for Azure Team in Charlotte, NC. Brad helps individuals and organizations leverage Analytics and Azure to revolutionize themselves and their industries. He has an M.S. in Statistics from the University of South Carolina, MCSE Certification in Data Management and Analytics, MCSE Certification in Cloud Platform and Infrastructure and various MCSA Certifications in Business Intelligence and Advanced Analytics. Brad is an active blogger at breaking-bi.blogspot.com. He is also an organizer for the Charlotte BI Group, a local PASS chapter in Charlotte, NC. You can connect with him on LinkedIn at https://www.linkedin.com/in/bradllewellyn and on Twitter @BreakingBI.