|Experiment So Far|
|Automobile Price Data (Clean) 1|
|Automobile Price Data (Clean) 2|
In the previous post, we calculated several evaluation statistics for our regression models, R Squared being the most important. However, we left out a key concept known as normalization.
Many statistical algorithms (including some regression algorithms) attempt to determine the "best" model by reducing the variance of something (often the residuals). However, this can be a problem when we are dealing with features on massively different scales. Let's start by considering the calculation for variance. The calculation starts by taking an individual value and subtracting the mean (also known as the average). This means that for very large values (like "Price" in our dataset), this difference will tend to be very large, while for small values (like "Stroke" and "Bore" in our dataset), this difference will be very small. Then, we square this value, which makes large differences even larger, makes differences smaller than 1 even smaller, and always gives a non-negative result. Finally, we repeat this process for the rest of the values in the column, then add them together and divide by the number of records.
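To make the scale problem concrete, here is a minimal Python sketch of the variance calculation just described. The function name `population_variance` and the sample numbers are our own assumptions for illustration; they are made-up values in the rough range of "Price" and "Stroke", not rows from the actual dataset.

```python
def population_variance(values):
    """Variance as described above: mean of the squared differences from the mean."""
    mean = sum(values) / len(values)
    # Subtract the mean from each value, then square the difference.
    squared_diffs = [(v - mean) ** 2 for v in values]
    # Add the squared differences together and divide by the number of records.
    return sum(squared_diffs) / len(values)


# Hypothetical values for illustration only (not taken from the dataset).
price = [13495.0, 16500.0, 13950.0, 17450.0]
stroke = [2.68, 3.47, 3.40, 3.40]

price_var = population_variance(price)
stroke_var = population_variance(stroke)

print("Variance of price: ", price_var)
print("Variance of stroke:", stroke_var)
```

Even with only four rows, the variance of the price-like column is many orders of magnitude larger than that of the stroke-like column, purely because of the units involved.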
So, if we asked an algorithm to minimize this value across a number of different features, we would find that it would almost always focus on the variance of the features with the largest scales, while largely ignoring the features with small scales. Therefore, it would be extremely helpful if we could take all of our features and put them on the same scale. This is what normalization does. Let's take a look at the module in Azure ML.
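Outside of Azure ML, the idea can be sketched in a few lines of Python. This example uses z-score normalization (subtract the mean, divide by the standard deviation), which is one common way of putting features on the same scale; treating it as the method used in this experiment is an assumption on our part. The function name `zscore` and the sample values are hypothetical.

```python
import math


def zscore(values):
    """Rescale values to have mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical values for illustration only (not taken from the dataset).
price = [13495.0, 16500.0, 13950.0, 17450.0]
stroke = [2.68, 3.47, 3.40, 3.40]

price_scaled = zscore(price)
stroke_scaled = zscore(stroke)

print("Scaled price: ", [round(v, 3) for v in price_scaled])
print("Scaled stroke:", [round(v, 3) for v in stroke_scaled])
```

After this transformation, both columns are centered on zero with the same spread, so neither one dominates a variance-based calculation simply because of its units.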
|Normalize Data (Visualization)|
|Ordinary Least Squares Linear Regression|
|Online Gradient Descent Linear Regression|
|Boosted Decision Tree Regression|
With this in mind, we can conclusively say that Poisson Regression (without normalization) created the best model for our situation. Hopefully, this experiment has shown you some of the ways in which you can use regression in your organization. Regression truly is one of the easiest techniques to use in order to gain tremendous value. Thanks for reading. We hope you found this informative.