Monday, February 18, 2019

Data Science in Power BI: Stream Analytics and Azure Machine Learning Studio

Today, we're going to talk about combining Stream Analytics with Azure Machine Learning Studio within Power BI.  If you haven't read the earlier posts in this series, Introduction, Getting Started with R Scripts, Clustering, Time Series Decomposition, Forecasting, Correlations, Custom R Visuals, R Scripts in Query Editor, Python, Azure Machine Learning Studio and Stream Analytics, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let's move on to the core of this post, Stream Analytics and Azure Machine Learning Studio.

This post is going to build directly on what we created in the two previous posts, Azure Machine Learning Studio and Stream Analytics.  As such, we recommend that you read them before proceeding.

In our previous post on Azure Machine Learning Studio, we built a basic predictive model that predicts the income of new customers based on their demographic information.  This predictive model is currently exposed to us as a web service.  In the previous post on Stream Analytics, we built a basic data stream of new customers to show how the Power BI Service can be leveraged to visualize streaming data.  Obviously, the next step here is to combine these two features to create a data stream that predicts Income in real-time.  We start by opening up our existing Stream Analytics query in the Azure Portal.
Stream Analytics Query
We see that this query simply passes through the data with no additional transformations.  So, how do we leverage Azure Machine Learning Studio here?  There's one more element in the "Job Topology" pane that we haven't mentioned yet, Functions.
Create Azure ML Function
DataScienceInPowerBI Function
By navigating the to "Functions" pane and selecting "Add -> Azure ML", it's extremely easy to create an Azure ML function that connects to our existing web service.  It's also possible to create Javascript UDFs and UDAs, but those are topics for another post.
Function Details
Once we've created the function, we can open it again to see the details.  Specifically, this allows us to see exactly what fields are expected by the web service.  At this point, it's important to note that this web service requires us to pass in the "id" and "income" fields.  We left these in because they open up some interesting possibilities for analysis or visualizations that we may use at some point.  In production use cases, we recommend removing these.  Next, we need to alter our query to leverage this new function.
Updated Query
<CODE START>


WITH T AS (
SELECT *
,DataScienceInPowerBI(
1, [age], [workclass], [fnlwgt], [education], [education-num]
,[marital-status], [occupation], [relationship], [race], [sex]
,[capital-gain], [capital-loss], [hours-per-week], [native-country]
,''
) AS [PredictedIncomeRecord]
FROM [CustomersSASource]
)

SELECT *
,[PredictedIncomeRecord].[Scored Labels] AS PredictedIncome
,[PredictedIncomeRecord].[Scored Probabilities] AS ScoredProbability
INTO [CustomersStream]
FROM T

<CODE END>

As it turns out, the "DataScienceInPowerBI" function returns a value of type "record".  This means that we can't simply call the function and pass the results to Power BI.  So, we needed to make a CTE to capture the result in the "PredictedIncomeRecord" field.  Then, we pulled out the "Scored Labels" and "Scored Probabilities" fields in the final SQL statement.  It's important to note that we hard-coded 1 and '' (empty string) into the function call as our values for "id" and "income", as these fields do not exist in our data stream.  As we mentioned earlier, these fields should typically be removed in production scenarios.

As a side note, testing functions can be very difficult in Stream Analytics if you are using a Power BI sink output.  As we were developing this, we used a temporary blob storage container as our output sink, which allowed us to see the results of our queries in raw JSON format.  This is the recommended approach when developing more complex Stream Analytics queries.

Finally, let's clean up from our previous runs and start the stream with fresh data.  This will give us a dashboard that shows real-time streaming predictions.
Streaming Dashboard with Predictions
Hopefully, this post open your eyes a little to the possibilities of combining streaming data and machine learning within Power BI.  There's a tremendous amount of content here that we can't possibly cover in this blog.  Feel free to check it out on your own to see how it can help you solve some of the problems you're currently facing.  Stay tuned for the next post where we'll dig into the world of Big Data with HDInsight Hive.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Service Engineer - FastTrack for Azure
Microsoft
@BreakingBI
www.linkedin.com/in/bradllewellyn
llewellyn.wb@gmail.com