EDIT: This instance is definitely a case of "foot-in-mouth" syndrome. Later in this post, I use the incorrect formula to calculate the P-value. However, the rest of the procedure is valid, just not the P-value. Even so, this is still a good example of showing the many uses of table calculations. Enjoy!
Step 1:
- Determine what two sets of data you would like to compare
For our example, we chose profits for 2011 and 2012, rolled up to the customer level. We can see these distributions below.
Histograms |
Step 2:
- Aggregate your data and divide it into your two samples.
- Create the following calculated fields.
2011 Profit |
2012 Profit |
- Find the means and standard deviations of your two sets.
- Create the following calculated fields.
2011 Profit Sample Mean |
2012 Profit Sample Mean |
2011 Profit Standard Deviation |
2012 Profit Standard Deviation |
Step 4:
- Calculate your pooled standard deviation
- Create the following calculated fields
2011 Customers |
2012 Customers |
Pooled Profit Standard Deviation |
Step 5:
- Calculate the P-value
- Create the following calculated field
P-value |
EDIT: I had a big DUH! moment earlier today. The formula I used to calculate the p-value is incorrect. It should be an integral, which Tableau is incapable of calculating. In spite of this, the Z-statistic is still valid, you just have to use some other means to calculate a rejection region and/or a P-value. On the bright side, maybe some type of statistical package integration is in the works. Cheers!
Final Table |
The one caveat is that you have to put [Customer] somewhere in the canvas in order for this method to work. We thought the level of detail would be the easiest. As we suspected, the P-value is extremely high, meaning that we don't have enough evidence that say that these two samples come from populations with different means.
WHEW! Wasn't that a lot of work? We think so too. It seems that we've shown that Tableau is capable of basic statistical testing. However, you are literally doing all of the work from scratch. It would probably be much easier to combine Tableau's graphical capabilities with a more specialized tool, such as R or SAS. On the bright side, there's an idea on the Tableau forums to put some sort of R integration in Tableau. Here's to hoping!
In all seriousness though, if you need advanced statistical procedures, you're probably better off using a specialized tool and exporting the values manually. I hope you found this informative. Thanks for reading.
Brad Llewellyn
Associate Consultant
Mariner, LLC
llewellyn.wb@gmail.com
https://www.linkedin.com/in/bradllewellyn
https://www.linkedin.com/in/bradllewellyn
Brad,
ReplyDeleteI wish to know how to explain use of these statistical test to laymen ? Can you pls let me know bit more of below line ?
As we suspected, the P-value is extremely high, meaning that we don't have enough evidence that say that these two samples come from populations with different means.
Harihara,
DeleteThat description is as simple as I feel comfortable saying. Some people might say
"Since the P-value is extremely high, the means are the same."
These people would be wrong. Alas, I digress.
The P-value is the first thing taught in an introductory statistics course, which is now mandatory in many universities. If your business users don't understand them, then it's probably best not to use the procedure at all. If they really want to know whether the distributions look the same, but don't understand P-values, just show them a histogram and let them decide that way.
I apologize if this sounds "high and mighty", but I don't feel comfortable spreading misinformation.
Thanks,
Brad Llewellyn
Associate Consultant
Mariner, LLC
brad.llewellyn@mariner-usa.com
http://www.linkedin.com/in/bradllewellyn
http://breaking-bi.blogspot.com
Yes. Bit clear. Imagine i new to world of statistics and histograms.
ReplyDeleteHow well it brings decision making capabilities or should i use these kind of graphs for decision making.
Why not comparative analysis or my basic bar and line or 23 types of tableau charts. I want to understand its core and basic use. Can you help ?
I studied a lot of statistics, p value, normal distribution etc. where does it really used in real life and how it will be benefitted to clients. from your exp, i wish to know.
It's simply a method of "proving" a concept. Anyone can look at a graph and see something. The underlying question is always, "Is that really true?" Statistics gives you a quantifiable measure to determine whether or not you are seeing what you think you are seeing. Most business users won't have any use for statistics. You would have to find an audience that is ready for the next level of analytics, and these audiences are not very common.
DeleteBrad, Thanks for the awesome blog. I was wondering if Tableau would allow us to do these tests at different levels with the same data set. Now, the data is at customer level and say, we add one more categorical variable like Location (North, South being two values).
ReplyDeleteSo, I need Z statistic for the following pairs:
2011 Profit of North Vs 2012 Profit of North
2011 Profit of South Vs 2012 Profit of South
2011 Profit Vs 2012 Profit
*I'll have a drop down to select North and South.
Appreciate your help! Expecting a reply soon!
Thanks,
Surya
Surya,
DeleteTake note of the beginning of the post where we create a calculation using
SUM(
IF YEAR( [Order Date] ) = 2012 THEN [Profit] END
)
You could replace that IF statement with anything you want. You could even create a parameter to allow the user to select the value via a dropdown! On a different note, Tableau 8.1 has an integration with R that will allow you to do this type of thing much easier.
Thanks,
Brad Llewellyn
Brad,
DeleteThanks a ton for the response! And it worked perfectly. I need help on just one more doubt which arose on solving this problem.
Now, as we are working in the client environment.and the client doesn't have Tableau 8.1 neither does he has R. We took help from the blog below to perform t-test
http://community.tableausoftware.com/docs/DOC-1428
As mentioned earlier, the initial problem was to provide t-test result (p-value) even when a user is making changes in filter selections. We have solved this issue now.
The difficulty which we are facing (and I think we might face even after integrating Tableau with R) is using p-value to colour the bar chart. To understand the issue better please have a look at the tableau file which I have shared with you on your official email.
The workbook attached has two sheets
o T-test table
o Desired chart
- Using the aforementioned blog we are able to perform t-test (in t-test table). The p-value is getting changed if we change any/all of these filters (problem 1 solved)
- But the client requirement is to colour the chart (aggregated at retailer name level) in ‘desired chart’ using p-value. Eg. - Significant (p<0.05) as green and non-significant as red
- As p-value is a table calculation, we are struggling to change colour of bar graph.
- Any guidance on this would be extremely helpful
Thanks,
Surya
That's an interesting question. Unfortunately, I don't have access to that email address at the moment. Changing the color shouldn't be such a difficult task if you understand Compute Using. Perhaps this post will help
Deletehttp://breaking-bi.blogspot.com/2013/07/introduction-to-table-calculations.html
Thanks,
Brad Llewellyn