Comments on Breaking BI: Performing K-Means Clustering in Tableau
Comment feed for the Breaking BI blog (last updated 2024-03-28; 15 comments, newest first)

Comment posted 2016-07-28:
Hi, when I tried to run the code below, I got the following error, even though I had turned off "Aggregate Measures" under Analysis:

    Error in sample.int(m, k) :
      cannot take a sample larger than the population when 'replace = FALSE'

    SCRIPT_INT("
      ## Sets the seed
      set.seed(.arg6[1])

      ## Studentizes the variables
      Overdue_Amount <- (.arg1 - mean(.arg1)) / sd(.arg1)
      Days_Late_paid <- (.arg2 - mean(.arg2)) / sd(.arg2)
      Credit_Limit <- (.arg3 - mean(.arg3)) / sd(.arg3)
      DSO_Days <- (.arg4 - mean(.arg4)) / sd(.arg4)
      dat <- cbind(Overdue_Amount, Days_Late_paid, Credit_Limit, DSO_Days)
      num <- .arg5[1]

      ## Creates the clusters
      kmeans(dat, num)$cluster
    ",
    MAX( [A Total Overdue Open Inv Amt] ), MAX( [Avg DL Paid Invs] ), MAX( [Cash Cus Credit Limit] ),
    MAX( [Cash Cus Dso Days] ),
    [Number of Clusters], [Seed]
    )

Can anyone help?
Posted by Anonymous

Comment posted 2016-07-15:
Hi, I am new to statistics and Tableau. I am not quite getting the SD shading right. If you recall, did you leave the defaults? My shading covers the entire range. Did you choose a scope of Per Pane? Did you pick sample or population?
Posted by Cord Thomas

Comment posted 2015-07-30:
Also, I want to understand the use of aggregate functions like SUM, MAX, and MIN in SCRIPT functions, since I have seen different code using different functions. It would be great if you could explain why you chose the MAX function in your example.
Posted by Anonymous

Comment posted 2015-07-30:
Hi,
I tried standardizing the data as described in your post, but I get an NA/NaN error when running k-means clustering on a demographic dataset. I want to standardize the data because my variables have different units. The problem is similar to the one you mention in your post.

Below is the code I am using to normalize the variables for k-means clustering:

    age <- ( .arg1 - mean(.arg1) ) / sd(.arg1)
    income <- ( .arg2 - mean(.arg2) ) / sd(.arg2)
    experience <- ( .arg3 - mean(.arg3) ) / sd(.arg3)

Posted by Anonymous

Comment posted 2015-06-23:
I wanted to recreate the example but couldn't find the data. Could you please add the link (or am I missing something)?

Thank you
Posted by vonjd

Comment posted 2014-11-10:
Steven,

Thanks for commenting! I think you're already there. If the vector you return to Tableau is [X Center], but its values are duplicated, you could place your [X Centers] field on the Columns shelf, with your arguments on the Detail shelf. This should create a one-dimensional scatterplot with duplicate marks at each [X Center] value. Then use the filter

    FIRST() == 0

with Compute Using set to [Cluster] to remove all of the duplicates. This method requires that you also return the cluster number to Tableau, which shouldn't be an issue since you've already computed it in your R code. Does this help?
Posted by Breaking BI

Comment posted 2014-11-10:
Thank you for this very interesting post.

I'm also working on a Tableau + R integration using the k-means model.

I'm trying to bring the centers back into the view, as I would normally do in R using points(cl$centers, pch = 17, cex = 2). But since Tableau only accepts a returned vector of the same length as its input, I created two separate fields for the X and Y components:

    SCRIPT_REAL('
    set.seed(1234)
    param <- max(.arg3)
    result <- kmeans(x = data.frame(.arg1,.arg2), param)
    df <- data.frame(.arg1,.arg2)
    df2 <- cbind(df, result$cluster)
    colnames(df2)[3] <- "Cluster"
    df3 <- cbind(result$centers, c(1:param))
    colnames(df3)[3] <- "Cluster"
    df4 <- merge(df2, df3, by="Cluster")
    df4[,4]',
    SUM([Petal#Length]),SUM([Petal#Width]),[Parameter 1])

Now I would like to return only the distinct center values and plot them, but I can't use

    IF FIRST()==0 THEN WINDOW_SUM(COUNTD([X Centers])) END

because [X Centers] is already an aggregated field. Any idea how I should proceed?
Posted by Steven

Comment posted 2014-05-12:
Thank you so much for the advice, sir. I had an extra letter in the parameter names: "Number of Clusters" instead of "Number of Cluster", and "seedd" instead of "seeds". R is really quite a challenge to master.
Posted by Anonymous

Comment posted 2014-05-11:
In my workbook, I created parameters for Number of Clusters and Seed, which is why I was able to use them in the calculation.
Did you also create these parameters?
Posted by Breaking BI

Comment posted 2014-05-11:
Hi Sir,

I'm a high school student and very new to R scripts and Tableau, so this is a question from a novice. I am trying to perform k-means clustering, and my script is:

    SCRIPT_INT("
      ## Sets the seed
      set.seed( .arg6[1] )

      ## Studentizes the variables
      CP <- ( .arg1 - mean(.arg1) ) / sd(.arg1)
      Emp <- ( .arg2 - mean(.arg2) ) / sd(.arg2)
      GVA <- ( .arg3 - mean(.arg3) ) / sd(.arg3)
      Surv <- ( .arg4 - mean(.arg4) ) / sd(.arg4)
      dat <- cbind(CP, Emp, GVA, Surv)

      num <- .arg5[1]

      ## Creates the clusters
      kmeans(dat, num)$cluster
    ",
    MAX([CP]), MAX([Emp]), MAX([GVA]),
    MAX( [Surv] ),
    [Number of Clusters],[Seed]
    )

I got these errors:

    Reference to undefined field [Number of Cluster].
    Reference to undefined field [seeds].

Do you have any advice on where or how I went wrong? Thanks so much for the help.
Posted by Anonymous

Comment posted 2014-03-24:
Thanks for commenting! I'm not sure I understand your question. If you try using "Duplicate Sheet as Crosstab", you might find an answer. Does this make sense, or am I misunderstanding?
Posted by Breaking BI

Comment posted 2014-03-24:
Hi Brad,
How do you extract the clusters in table form once you have run R's kmeans in Tableau?
Thanks
Posted by Anonymous

Comment posted 2014-01-16:
Thanks for commenting! Glad you figured out the issue.
Posted by Breaking BI

Comment posted 2014-01-15:
Brad,

All I needed to do was turn off "Aggregate Measures" under Analysis.
Posted by CoreyT

Comment posted 2014-01-15:
Hey Brad,

Great post!

I am trying to replicate this to identify members of a particular cluster, but I am receiving an error from the kmeans function:

    Error in sample.int(m, k) :
      cannot take a sample larger than the population when 'replace = FALSE'

My cluster calculation is below:

    SCRIPT_INT(
    "
    ## Sets the seed
    set.seed( .arg5[1] )

    ## Studentizes the variables
    fte <- ( .arg1 - mean(.arg1) ) / sd(.arg1)
    fr <- ( .arg2 - mean(.arg2) ) / sd(.arg2)
    below <- ( .arg3 - mean(.arg3) ) / sd(.arg3)
    dat <- cbind(fte, fr, below)

    num <- .arg4[1]

    ## Creates the clusters
    kmeans(dat, num)$cluster
    ",
    MAX( [Special Ed FTE Fixed] ), MAX( [FR Status Fixed] ), MAX( [Below Standard] ), [Number of Clusters], [Seed]
    )

Any ideas on why I would receive this error?

Thanks for your time
Posted by CoreyT
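A possible explanation for the "cannot take a sample larger than the population" error in CoreyT's comment (and in the 2016-07-28 comment): when kmeans() is given a number of clusters k, it picks k of the input rows at random as starting centers, so the call fails whenever R receives fewer rows than k. That is consistent with CoreyT's fix, since with Aggregate Measures on, or perhaps with a Compute Using setting that partitions the marks into small groups, each SCRIPT_INT call may see only a handful of rows. The R sketch below reproduces the error and shows one way a script could guard against it. `safe_kmeans_cluster` is a hypothetical helper, not part of the original post, and returning NA for too-small inputs is just one possible choice.

```r
# Sketch (assumption): guard kmeans() against receiving fewer distinct rows
# than the requested number of clusters. safe_kmeans_cluster is a
# hypothetical helper, not from the original post.
safe_kmeans_cluster <- function(dat, k, seed = 1) {
  dat <- as.matrix(dat)
  # kmeans() samples k starting rows, so it needs at least k distinct rows.
  if (nrow(unique(dat)) < k) {
    # Return one NA per row so the result still matches the input length.
    return(rep(NA_integer_, nrow(dat)))
  }
  set.seed(seed)
  kmeans(dat, k)$cluster
}

# One row (e.g. a single aggregated mark) but three clusters requested:
one_row <- matrix(c(0.5, 1.2, -0.3), nrow = 1)
err <- tryCatch(kmeans(one_row, 3), error = function(e) conditionMessage(e))
print(err)

# Two well-separated groups of three points each:
two_groups <- rbind(c(0, 0), c(0.1, 0.1), c(0, 0.1),
                    c(10, 10), c(10.1, 10), c(10, 10.1))
print(safe_kmeans_cluster(one_row, 3))     # NA
print(safe_kmeans_cluster(two_groups, 2))  # one label per row
```

In Tableau the practical fix is still to make sure each call receives one row per mark; the guard only makes the failure visible as NA instead of an R error.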
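On the 2015-07-30 question about NA/NaN errors after standardizing: the formula (x - mean(x)) / sd(x) used throughout this thread returns NA or NaN in a few degenerate cases that can occur when R receives very short vectors. sd() of a single value is NA; sd() of a column whose values are all equal is 0, so the division gives NaN; and any missing value in the input makes mean() and sd() NA. Any of these leaves NA/NaN in the matrix passed to kmeans(), which then stops with an NA/NaN/Inf error. The R sketch below checks for these cases first. `studentize` is a hypothetical helper name, and stopping with a message is one design choice among several (dropping a constant column is another).

```r
# Hypothetical helper (not from the original post): z-score a vector, but
# stop with a clear message in the cases where the plain formula would
# silently produce NA or NaN.
studentize <- function(v) {
  s <- sd(v)
  # sd() is NA for fewer than 2 values or when v contains NA,
  # and 0 when all values are equal (0 / 0 gives NaN).
  if (is.na(s) || s == 0) {
    stop("cannot standardize: need 2+ non-missing values that are not all equal")
  }
  (v - mean(v)) / s
}

# The degenerate cases, using the plain formula from the thread:
print(sd(5))                                        # NA: only one value
constant <- c(3, 3, 3)
print((constant - mean(constant)) / sd(constant))   # NaN NaN NaN

# A well-behaved column:
print(studentize(c(1, 2, 3)))                       # -1 0 1
```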
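Regarding Steven's SCRIPT_REAL calculation (2014-11-10): merge() sorts its result by the `by` column by default (sort = TRUE), so df4 comes back in cluster order rather than in the order of the rows Tableau passed in, and df4[,4] may be paired with the wrong marks. Indexing the centers matrix by the cluster vector gives one center coordinate per input row, in input order, without a merge. The R sketch below uses small made-up data (not the iris values from the comment) to show the difference; it is an illustration under that assumption, not a tested Tableau calculation.

```r
# Illustrative data (made up, not from the comment): two groups, interleaved.
x <- c(1.0, 9.0, 1.2, 9.3, 0.8, 8.9)
y <- c(0.2, 3.0, 0.3, 3.1, 0.1, 2.9)
k <- 2
set.seed(1234)
result <- kmeans(data.frame(x, y), k)

# Row i of centers[cluster, ] is the center of the cluster that row i
# belongs to, so these vectors line up with the input rows.
x_center <- unname(result$centers[result$cluster, "x"])
y_center <- unname(result$centers[result$cluster, "y"])

# The merge() approach from the comment: merge() sorts by "Cluster",
# so rows come back grouped by cluster, not in input order.
df2 <- data.frame(x, y, Cluster = result$cluster)
df3 <- data.frame(result$centers, Cluster = seq_len(k))
merged <- merge(df2, df3, by = "Cluster")

print(x)           # input order
print(merged$x.x)  # same values, regrouped by cluster
print(x_center)    # one X center per input row, in input order
```

Returning x_center and y_center from two SCRIPT_REAL fields would keep each center aligned with its own mark, which the FIRST() == 0 filtering suggested in the reply relies on.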