When plotting residuals, check that your table calculation has the correct "Compute Using" setting. It may default to Table (Across) when you need it to be set to the key that identifies each row, which in the example above is YEAR.
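To see why the wrong Compute Using can produce zero residuals, here is a minimal R sketch that imitates what the script receives. The x and y values are hypothetical stand-ins for the data Tableau would send. When every row arrives in a single call, lm() has real residuals. When each call receives only one row (which can happen if the table calculation ends up partitioned per row), lm() fits that single point exactly, so every residual is 0 regardless of the data. That would also explain why duplicating a row with different y values changes nothing.

```r
# Hypothetical stand-ins for the values Tableau would pass to the script.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Compute Using set to the row key (e.g. YEAR): one call sees every row.
fit_all <- lm(y ~ x)
res_all <- residuals(fit_all)

# Compute Using partitioned so that each call sees a single row:
# lm() fits one point exactly, so each residual comes back as 0.
res_single <- sapply(seq_along(y), function(i) {
  fit_i <- lm(y[i] ~ x[i])
  residuals(fit_i)
})

res_all     # non-zero residuals
res_single  # all zeros: a "perfect" fit
```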

I keep getting a perfect model fit & my residuals are zero. What is wrong with my calculation? Thanks

I keep getting a perfect model fit & my residuals are zero. I have a large data set & purposely duplicated a row, keeping all of its variables identical but giving the two copies different y values... & it STILL predicts perfectly. I'm obviously doing something wrong. Can you help?

How can I show the regression model's coefficients in Tableau? E.g., in R we can use something like summary(fit)$coefficients[,1] to show the coefficient estimates, but how do we show them in Tableau? Thanks!

All right, thanks a lot.
If I make it work I will definitely post it here!

Gabriel

My mistake. I misunderstood your data structure. So, you have a data set with X rows and columns Score, A, B, C, D, E, F, G + some more dimensions. The issue here is that you would traditionally place your ID field (unique ID for each row/observation) onto the Detail Shelf along with AVG( [Score] ), AVG( [A] ), etc. This would mean that your Scatterplot has X points on the chart, not the 7 that you would want in order to plot a point for each Regressor. To simplify the story substantially, what you're asking for is possible, yet extremely complex. It would require an expert knowledge of data structure and Table Calculations as well as a decent amount of time to work through it. After that, it would likely not perform very well given any reasonable amount of data.

If you still wish to attempt this, I'll give you one piece of advice: "The R Script is nothing more than a table calculation, and you can treat it as such."
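One possible way to act on that advice, offered as an untested sketch rather than a recommended recipe: because the result must be as long as the input, the k coefficient estimates (and, in a second calculated field, their p-values) can be placed in the first k positions and the rest padded with NA. In Tableau, the rows whose INDEX() is at most k would then carry the values, in the order lm() reports them (intercept first). The .arg vectors below are hypothetical stand-ins for AVG([Score]), AVG([A]) and AVG([B]), and the sketch assumes there are more rows than coefficients.

```r
# Hypothetical stand-ins for the vectors Tableau would send (one value per row).
.arg1 <- c(10, 5, 8, 7, 9, 6, 4, 8, 7, 9)   # Score
.arg2 <- c(9, 4, 7, 7, 8, 5, 3, 9, 6, 8)    # A
.arg3 <- c(9, 2, 8, 6, 9, 5, 4, 7, 7, 10)   # B

fit <- lm(.arg1 ~ .arg2 + .arg3)
est <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

n <- length(.arg1)
k <- nrow(est)              # number of estimated coefficients, intercept included

# First k positions hold the values; NA padding makes the result length n,
# as SCRIPT_REAL requires. Each vector would be returned by its own field.
coef_out <- c(est[, "Estimate"], rep(NA_real_, n - k))
pval_out <- c(est[, "Pr(>|t|)"], rep(NA_real_, n - k))
```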

Sorry I didn't have better news. If you do figure it out, please post it up here. It would be a great learning experience for me and the other readers.

Cheers,

Brad Llewellyn

Good to know it's the most frustrating part, I thought maybe I was the only one facing this issue!
 
I am not sure what you mean by doing rep( P, 8 ). What I am trying to achieve is to get those values (the 8 coefficients and 8 p-values) in a single table in Tableau. The best way would be to get a matrix from R but since we get only a single value I am wondering how I can modify my code in order to get those values into Tableau. 

The visualization I'm looking for is a bubble chart:
X axis = average Score 
Y axis = Coefficient 
The bubbles would be A through G, placed according to their average score and coefficient.
And I would filter to p-value < 0.05 to show only the attributes that are statistically significant.

Tell me if I'm not being clear enough, I can send you an example of what I'm trying to achieve.

Thanks,

Gabriel

Thanks for commenting! You've run into the most frustrating part of the Tableau/R integration. The data that R returns must have the same length as the data you send to it. For instance, if you send in vector #1, which has 8 values, you must return 8 values. So, if you wanted to return the p-value, you would have to do rep( P, 8 ). Does this make sense?
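A minimal R sketch of that rule, using hypothetical stand-ins for the two vectors Tableau would pass as .arg1 and .arg2. The single p-value is repeated to match the input length; using length(.arg1) instead of a hard-coded 8 keeps the script working if the number of rows changes.

```r
# Hypothetical stand-ins for two vectors sent by Tableau, 8 values each.
.arg1 <- c(10, 5, 8, 7, 9, 6, 4, 8)   # response
.arg2 <- c(9, 4, 7, 7, 8, 5, 3, 9)    # predictor

fit <- lm(.arg1 ~ .arg2)
P <- summary(fit)$coefficients[2, 4]  # p-value for the slope on .arg2

# rep(P, 8) only works for exactly 8 rows; length(.arg1) adapts to any count.
result <- rep(P, length(.arg1))
result
```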

Hi Brad,
Your blog is awesome, it's helping me a lot.

I'm having issues with the multiple regression procedure in Tableau.

I am using the following code:

SCRIPT_REAL("
Score <- .arg1
A <- .arg2
B <- .arg3
C <- .arg4
D <- .arg5
E <- .arg6
F <- .arg7
G <- .arg8

fit <- lm( Score ~ A + B + C + D + E + F + G )
fit$fitted
"
, avg([Score]), avg([A]), avg([B]), avg([C]), avg([D]), avg([E]), avg([F]), avg([G]))

What I actually need is the coefficient and the p-value for each of those arguments (A to G).

So I tried replacing fit$fitted with something like this:

fit$coefficients

Also, the data frame I am using comes from a survey database. It looks like this:

Respondent | Score | A | B | C | D | E | F | G
#1         | 10    | 9 | 9 | 8 | 7 | 6 | 7 | 5
#2         | 5     | 4 | 2 | 3 | 5 | 7 | 8 | 6
...

So what I am trying to achieve is to get the coefficients and p-value on a single sheet for all those arguments (A to G) in order to make a scatter plot.

Thanks a lot


In short, no. The data you get back from R has to be one-to-one with the data you send in. So, if you send in 20 rows of data (in a tabular view), you will get 20 values back. You can get each value back individually if you'd like and display them on a chart, but you can't see the R output itself (such as the printed summary) within Tableau.
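As a sketch of getting individual values back, the numeric pieces of summary(lm) can each be returned from their own calculated field, repeated to the input length. The input vectors are hypothetical stand-ins; r.squared and adj.r.squared are standard components of the object that summary() returns for an lm fit.

```r
# Hypothetical stand-ins for .arg1 (response) and .arg2 (predictor).
.arg1 <- c(10, 5, 8, 7, 9, 6, 4, 8)
.arg2 <- c(9, 4, 7, 7, 8, 5, 3, 9)

fit <- lm(.arg1 ~ .arg2)
s <- summary(fit)   # a list; printing it gives the familiar text report

# Each of these would be the return value of a separate SCRIPT_REAL field.
r2     <- rep(s$r.squared, length(.arg1))
adj_r2 <- rep(s$adj.r.squared, length(.arg1))
```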

Thank you Brad. Your blog was helpful. Is it possible to view a summary of the linear regression? I am thinking of something like the R function summary(lm) in Tableau.

Thanks for commenting! Your problem is likely due to the "Compute Using". The R interface is actually just a table calculation. So, when you change your dimension from Year to Company, you will probably need to change your Compute Using as well. Hopefully this post will help you.

http://breaking-bi.blogspot.com/2013/07/introduction-to-table-calculations.html

Thanks very much for the informative session. I'm having a bit of a problem. If I would like to see each of the predicted y values, how would I be able to do that? It seems like Tableau is putting everything into the SUM, and if I want to see it by company or by field rather than by year, it stops working.

Edward,
Thanks for commenting! As far as I can see, there are two issues with your code. First, in the ## Defining Variables section, you shouldn't end the lines with commas; R ends a statement at the line break, so no separator is needed. Second, the order of the variables in the ## Defining Variables section needs to match the order of the SUM() arguments in the last line of the code. Otherwise, you will get incorrect results.
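A corrected version, written as a plain R sketch so it can be run outside Tableau. The .arg vectors are hypothetical stand-ins for SUM([GVA]), SUM([Emp]), SUM([Surv]) and SUM([CP]), passed in that order. Note that the parse error itself points at the square brackets: [GVA] is Tableau field syntax, not a valid R name, so inside the script the variables need plain names such as GVA. In Tableau, everything after the stand-ins would go between the quotes of SCRIPT_REAL, followed by the four SUM() arguments in the same order.

```r
# Hypothetical stand-ins for the four arguments, in the order they would be
# passed: SUM([GVA]), SUM([Emp]), SUM([Surv]), SUM([CP]).
.arg1 <- c(120, 135, 150, 160, 172, 185, 198, 210)
.arg2 <- c(10, 11, 13, 13, 15, 16, 17, 18)
.arg3 <- c(0.80, 0.82, 0.85, 0.83, 0.88, 0.90, 0.91, 0.93)
.arg4 <- c(5, 6, 6, 7, 8, 8, 9, 10)

## Defining Variables (plain R names, one per line, no trailing commas)
GVA  <- .arg1
Emp  <- .arg2
Surv <- .arg3
CP   <- .arg4

## Fitting the Model
fit <- lm(GVA ~ Emp + CP + Surv)
result <- fitted(fit)   # one fitted value per input row
result
```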

Thanks!

Good day Sir,
I am new to the R language; I only started teaching myself this year. I was trying to fit a linear model with multiple variables in Tableau. My syntax is:

SCRIPT_REAL( "

## Defining Variables

[GVA]<- .arg1,
[Emp]<- .arg2,
[Surv]<- .arg3,
[CP]<- .arg4,

## Fitting the Model

fit <- lm( GVA ~ Emp + CP + Surv)
fit$fitted
"
,SUM( [GVA]), SUM([CP]), SUM([Emp]), SUM([Surv]))

but I received an error saying:

Error in base::parse(text = .cmd) : :5:5: unexpected '['
4:
5: [
^
What seems to be the problem? Thank you so much.

That's a really good question. When an interval is requested, the predict function returns a matrix with three columns: the prediction (fit), the lower bound (lwr), and the upper bound (upr). Since we need to pull out one column at a time, we give it the number of the column to pull out. We could also have used [,1], but that would give us the same values we get from fit$fitted.
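A short R sketch of what those columns hold, assuming the calculation used interval = "prediction" and using hypothetical stand-in vectors. The matrix that predict() returns has columns named fit, lwr and upr, so [,2] and [,3] are the lower and upper prediction bounds, and [,1] matches fitted(fit).

```r
# Hypothetical stand-ins for .arg1 (response) and .arg2 (predictor).
.arg1 <- c(10, 5, 8, 7, 9, 6, 4, 8)
.arg2 <- c(9, 4, 7, 7, 8, 5, 3, 9)
fit <- lm(.arg1 ~ .arg2)

# Without newdata, predict() warns that prediction intervals on the
# fitted data refer to future responses; the warning is expected here.
pred <- suppressWarnings(predict(fit, interval = "prediction"))

colnames(pred)      # "fit" "lwr" "upr"
lower <- pred[, 2]  # same as pred[, "lwr"]
upper <- pred[, 3]  # same as pred[, "upr"]
```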

Thanks for this series. I do have a question, however. When writing the calc for the prediction intervals, what do the [,2] and [,3] represent? They obviously give the lower and upper bounds of the interval, but what specifically do they mean?

Thanks,