Monday, October 29, 2018

Data Science in Power BI: Creating Custom R Visuals

Today, we're going to talk about Creating Custom R Visuals within Power BI.  If you haven't read the earlier posts in this series, Introduction, Getting Started with R Scripts, Clustering, Time Series Decomposition, Forecasting and Correlations, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let's move on to the core of this post, Creating Custom R Visuals.

In Power BI, it's possible to create custom visuals using TypeScript.  It's also possible to have these visuals leverage custom R code.  You can find out more about this process here.  While extremely interesting, these is NOT the type of visuals we'll be talking about today.

Over the last few posts, we've shown how to use custom R visuals built by others.  Today, we're going to build our own using the Custom R Visual available in Power BI Desktop.  If you haven't read the second post in this series, Getting Started with R Scripts, it is highly recommended you do so now, as it provides necessary context for how to link Power BI to your local R ISE.

In the previous post, we created a bunch of log-transformed measures to find good predictors for Revenue.  We're going to use these same measures today to create a basic linear regression model to predict Revenue.  If you want to follow along, the dataset can be found here.  Here's the custom DAX we used to create the necessary measures.

<CODE START>

Revenue Current Month (Log) = IFERROR( LN( CALCULATE( [RevenueTY], 'Date'[YearPeriod] = "201411" ) ), BLANK() )

Revenue Previous Month (Log) = IFERROR( LN( CALCULATE( [RevenueTY], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

COGS Previous Month (Log) = IFERROR( LN( CALCULATE( [Total COGS], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

Labor Cost Previous Month (Log) = IFERROR( LN( CALCULATE( [Sum of Labor Costs Variable], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

Third Party Costs Previous Month (Log) = IFERROR( LN( CALCULATE( [Sum of Cost Third Party], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

Travel Expenses Previous Month (Log) = IFERROR( LN( CALCULATE( [Sum of Travel Expenses], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

Rev for Exp Travel Previous Month (Log) = IFERROR( LN( CALCULATE( [Sum of Rev for Exp Travel], 'Date'[YearPeriod] = "201410" ) ), BLANK() )

<CODE END>

Next, we create an empty "Custom R Visual".
Custom R Visual
Then, we add "Name" from the Customer table and all of our new measures to the Values Shelf for the Custom R Visual.
Values Shelf
Now, we could begin writing our script directly in Power BI.  However, it's very difficult to test and debug within Power BI.  So, we recommend using an external ISE, such as RStudio.  Fortunately, Power BI has a button that exports the data to a csv and loads it into the ISE we specified in the options.
Edit Script in External ISE
In our ISE, the first thing we want to do is take a look at our dataset using the following command:
head(dataset)
Dataset Head
We can see that we have nulls (R calls them NAs) in our dataset.  In our case, we can assume that all of these nulls are actually 0's.  Please note that this is not always the case, but we are assuming it in this case.  We've done a number of posts in the past on different ways to handle "imputation".  Check them out here and here.  There are probably some more further back if you want to dig.  For now, all we want to do is replace all NA's with 0's.  We can do this using the following command:
dataset[is.na(dataset)] <- 0
head(dataset)
Imputed Dataset Head
Now, we're ready to build a basic linear regression model.  However, we don't want to include the Name column, as it provides no analytical value.  We can do this using the following command:
reg <- lm(`Revenue Current Month (Log)` ~ . - `Name` , data = dataset )
summary(reg)
Regression Results
Those with experience in linear regression will notice that an "Adjusted R-Squared" of .08 and a "P-Value" of .15 mean that this model is terrible.  However, the purpose of this point is on visualizations, not regression models!  We have a previous post that touches on the finer points of regression.

Now that we have a regression model, we can plot some interesting things about it.  Our original dataset contained seven columns, not counting "Name".  We can't plot seven dimensions on a scatterplot.  So, we need another way to look at our regression model.  One of the ways to do this is with a Residuals vs. Fitted plot.  Basically, this plot shows how "far off" the regression model was.  A better model would have residual values tightly clustered around zero, whereas a worse model would have larger residual values.  We can make one using the following code:
plot(x = reg$fitted.values, y = reg$residuals, xlab = "Fitted Values", ylab = "Residuals")
Residuals vs. Fitted Plot
While this chart is interesting, it's tough to see where zero is for the residuals.  Let's add a horizontal line to make it more obvious.  We can accomplish this using the following code:
abline(h = 0)
Residuals vs. Fitted (Horizontal Line)
This is becoming easier to understand.  Now we can see that the majority of our points are slightly above the zero line, with a few points very far below.  It's pretty obvious that these three points are having a major impact on our model.  This is the value of using a Residuals vs. Fitted plot.  Alas, this post is not about improving our model.  Let's add one final detail, the R-Squared value.  We can use the following code to add the R-Squared value to the title:
plot(x = reg$fitted.values, y = reg$residuals, xlab = "Fitted Values", ylab = "Residuals", main = paste( "Predicting Revenue Current Month (Log): R-Squared = ", round(summary(reg)$r.squared, digits = 3), sep=""))
abline(h = 0)
Now we have a plot that gives us some interesting information about our data.  The final step is to take the R code we've created and copy it into the Custom R Visual in Power BI.  We don't need any of the heads or summaries since Power BI won't output them anyway.  Here's all the code we need:
dataset[is.na(dataset)] <- 0
reg <- lm(`Revenue Current Month (Log)` ~ . - `Name` , data = dataset )
plot(x = reg$fitted.values, y = reg$residuals, xlab = "Fitted Values", ylab = "Residuals",
      main = paste( "Predicting Revenue Current Month (Log): R-Squared = ", round(summary(reg)$r.squared, digits = 3), sep=""))
abline(h = 0)
Now for the final piece.  One of the main reasons this R functionality is so powerful is that you can filter this R visual by using other Power BI charts or slicers.  For instance,  we can see the difference in our regression before and after excluding certain products.
Custom R Visual (All Products)
Custom R Visual (Some Products)
We can see that our R-Squared increased by .1 when we removed the (Blank) and Sova products.  This would be a great way for us to see which products may follow a different sales pattern.

Hopefully, this post opened your minds to the possibilities of creating your own R visuals within Power BI.  Obviously, we only scratched the surface of this incredibly complex topic.  There are many excellent blog posts around the internet showcasing some of the coolest visuals available in R.  We encourage you to try some of them out.  Stay tuned for the next post where we'll dig into R Scripts in Power Query to see what they have to offer.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Senior Analytics Associate - Data Science
Syntelli Solutions
@BreakingBI
www.linkedin.com/in/bradllewellyn
llewellyn.wb@gmail.com

No comments:

Post a Comment