## Monday, May 1, 2017

### Azure Machine Learning: Regression Using Poisson Regression

Today, we're going to continue our walkthrough of Sample 4: Cross Validation for Regression: Auto Imports Dataset.  In the previous posts, we walked through the Initial Data Load, Imputation, Ordinary Least Squares Linear Regression, Online Gradient Descent Linear Regression and Boosted Decision Tree Regression phases of the experiment.
 Experiment So Far
Let's refresh our memory on the data set.
 Automobile Price Data (Clean) 1

 Automobile Price Data (Clean) 2
We can see that this data set contains a bunch of text and numeric data about each vehicle, as well as its price.  The goal of this experiment is to attempt to predict the price of the car based on these factors.  Specifically, we're going to be walking through the Poisson Regression algorithm.  Let's start by talking about Poisson Regression.

Poisson Regression is used to predict values that follow a Poisson Distribution, i.e. counts of events within a given timeframe.  For example, the number of customers that enter a store on a given day may follow a Poisson Distribution.  Given that these values are counts, there are a few caveats.  First, the counts cannot be negative.  Second, the counts could theoretically extend to infinity.  Finally, the counts must be Whole Numbers.
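
To make those three properties concrete, here is a minimal sketch of the Poisson probability formula in plain Python.  The rate of 4 customers per day is a made-up number for illustration, not something from our data set.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when the average rate is lam."""
    if k < 0:
        return 0.0  # counts cannot be negative
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 4.0  # hypothetical average number of customers per day

# Counts are whole numbers with no upper limit, but the probabilities
# shrink so quickly that the first 60 counts hold almost all of the mass.
probs = [poisson_pmf(k, lam) for k in range(60)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))

print(f"total probability: {total:.6f}, mean count: {mean:.6f}")
```

The probabilities add up to (almost exactly) 1 and the average count comes back out as the rate we put in.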

Just by looking at these three criteria, it may seem like Poisson Regression is theoretically appropriate for this data set.  However, the issue comes when we consider the mathematical underpinning of the Poisson Distribution.  Basically, the Poisson Distribution assumes that each entity being counted operates independently of the other entities.  Back to our earlier example, we assume that each customer entering the store on a given day does so without considering whether the other customers will be going to the store on that day as well.  Comparing this to our vehicle price data, that would be akin to saying that when a car is bought, each dollar independently decides whether it wants to jump out of the buyer's pocket and into the seller's hand.  Obviously, this is a ludicrous notion.  However, we're not theoretical purists and love bending rules (as long as they produce good results).  For us, the true test comes from the validation portion of the experiment, which we'll cover in a later post.  If you want to learn more about Poisson Regression, read this and this.  Let's take a look at the parameters for this module.
 Poisson Regression
The Poisson Regression algorithm uses an optimization technique known as Limited Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS).  Basically, this technique tries to find the "best" set of parameters to fill in our Poisson Regression equation, which is described in detail here.  In practice, the smaller we make "Optimization Tolerance", the longer the algorithm will take to train and the more accurate the results should be.  This value can be optimized using the "Tune Model Hyperparameters" module.
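
To get a feel for what a tolerance does, here is a rough sketch that fits a one-feature Poisson Regression on five made-up points.  It uses plain gradient ascent as a stand-in for L-BFGS, and it assumes the tolerance is a threshold on how much the fit improves between iterations.  We don't know the exact stopping rule Azure ML uses, so treat this as an illustration only.

```python
import math

def mean_log_likelihood(b0, b1, xs, ys):
    # Poisson log-likelihood with a log link, dropping the constant log(y!) term.
    total = sum(y * (b0 + b1 * x) - math.exp(b0 + b1 * x) for x, y in zip(xs, ys))
    return total / len(xs)

def fit_poisson(xs, ys, tolerance, step=0.02, max_iter=200_000):
    """Gradient ascent (NOT L-BFGS) that stops once the improvement is tiny."""
    b0 = b1 = 0.0
    ll = mean_log_likelihood(b0, b1, xs, ys)
    for iteration in range(1, max_iter + 1):
        # Gradient of the mean log-likelihood: the average of (actual - predicted).
        residuals = [y - math.exp(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += step * sum(residuals) / len(xs)
        b1 += step * sum(r * x for r, x in zip(residuals, xs)) / len(xs)
        new_ll = mean_log_likelihood(b0, b1, xs, ys)
        if abs(new_ll - ll) < tolerance:
            return b0, b1, iteration
        ll = new_ll
    return b0, b1, max_iter

# Hypothetical toy data: one numeric feature and a count to predict.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1, 2, 3, 4, 7]

loose_b0, loose_b1, loose_iters = fit_poisson(xs, ys, tolerance=1e-3)
tight_b0, tight_b1, tight_iters = fit_poisson(xs, ys, tolerance=1e-12)
print(f"loose tolerance: {loose_iters} iterations, tight tolerance: {tight_iters} iterations")
```

The tighter tolerance keeps iterating long after the looser one has stopped, which is the training time versus accuracy trade-off described above.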

Without going into too much depth, the "L1 Regularization Weight" and "L2 Regularization Weight" parameters penalize complex models.  If you want to learn more about Regularization, read this and this.  As with "Optimization Tolerance", Azure ML will choose these values for us.
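
As a small sketch of what "penalizing complex models" means, the function below adds the usual L1 and L2 penalty terms to a loss.  The exact scaling Azure ML applies to these weights is an assumption on our part, and the loss and coefficient values are made up.

```python
def penalized_loss(loss, weights, l1_weight, l2_weight):
    """Add L1 (sum of absolute values) and L2 (sum of squares) penalties to a loss."""
    l1_penalty = sum(abs(w) for w in weights)
    l2_penalty = sum(w * w for w in weights)
    return loss + l1_weight * l1_penalty + l2_weight * l2_penalty

# Two hypothetical models with the same raw loss but different coefficients.
small_model = [0.5, -0.5]
large_model = [3.0, -4.0]

small_total = penalized_loss(1.0, small_model, l1_weight=1.0, l2_weight=1.0)
large_total = penalized_loss(1.0, large_model, l1_weight=1.0, l2_weight=1.0)
print(small_total, large_total)
```

With the same raw loss, the model with the larger coefficients ends up with the worse penalized loss, so the optimizer is nudged toward simpler models.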

"Memory Size for L-BFGS" specifies the amount of memory allocated to the L-BFGS algorithm.  We can't find much more information about what effect changing this value will have.  Through some testing, we did find that this value had very little impact on our model, regardless of how large or small we made it (the minimum value we could provide is 1).  However, if our data set had an extremely large number of columns, we may find that this parameter becomes more significant.  Once again, we do not have to choose this value ourselves.

The "Random Number Seed" parameter allows us to create reproducible results for presentation/demonstration purposes.  Oddly enough, we'd expect this value to play a role in the L-BFGS algorithm, but it doesn't seem to.  We were unable to find any impact caused by changing this value.
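
For the curious, here is a compact textbook-style sketch of L-BFGS in plain Python.  It is not Azure ML's implementation, and the function being minimized is a made-up quadratic.  In this textbook form, the "memory" is the number of recent position and gradient changes the optimizer keeps around to estimate curvature.  This version also contains no random numbers at all, which is at least consistent with the seed having no visible effect.

```python
from collections import deque

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_direction(grad, history):
    """Two-loop recursion: build a search direction from the stored (s, y) pairs."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):                      # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        alphas.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    if history:                                         # scale using the newest pair
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(history, reversed(alphas)):  # oldest pair first
        beta = rho * dot(y, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]

def minimize_lbfgs(f, grad_f, x0, memory_size, tolerance=1e-6, max_iter=1000):
    history = deque(maxlen=memory_size)   # only the newest memory_size pairs survive
    x, g = list(x0), grad_f(x0)
    for iteration in range(max_iter):
        if dot(g, g) ** 0.5 < tolerance:  # stop once the gradient is nearly zero
            return x, iteration
        d = lbfgs_direction(g, history)
        fx, slope, step = f(x), dot(g, d), 1.0
        # Backtracking line search: halve the step until the function decreases enough.
        while step > 1e-12 and f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        new_x = [xi + step * di for xi, di in zip(x, d)]
        new_g = grad_f(new_x)
        s = [a - b for a, b in zip(new_x, x)]
        y = [a - b for a, b in zip(new_g, g)]
        if dot(s, y) > 0.0:               # skip pairs without positive curvature
            history.append((s, y))
        x, g = new_x, new_g
    return x, max_iter

# Hypothetical test function: a stretched bowl with its minimum at `target`.
scales = [1.0, 10.0, 100.0]
target = [1.0, -2.0, 3.0]

def bowl(x):
    return sum(c * (xi - t) ** 2 for c, xi, t in zip(scales, x, target))

def bowl_gradient(x):
    return [2.0 * c * (xi - t) for c, xi, t in zip(scales, x, target)]

for memory_size in (1, 5):
    solution, iterations = minimize_lbfgs(bowl, bowl_gradient, [0.0, 0.0, 0.0], memory_size)
    print(memory_size, iterations, [round(v, 4) for v in solution])
```

On a problem this small, both memory sizes land on the same answer.  That lines up with what we saw in our testing, although we can't say how closely this sketch mirrors what Azure ML does internally.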

Finally, we can choose to deselect "Allow Unknown Categorical Levels".  When we train our model, we do so using a specific data set known as the training set.  This allows the model to predict based on values it has seen before.  For instance, our model has seen "Num of Doors" values of "two" and "four".  So, what happens if we try to use the model to predict the price for a vehicle with a "Num of Doors" value of "three" or "five"?  If we leave this option selected, then this new vehicle will have its "Num of Doors" value thrown into an "Unknown" category.  This would mean that if we had a vehicle with three doors and another vehicle with five doors, they would both be thrown into the same "Num of Doors" category.  To see exactly how this works, check out our previous post, Regression Using Linear Regression (Ordinary Least Squares).
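
Here is a rough sketch of the idea using a simple indicator encoding.  The function names are our own, and raising an error when the option is deselected is an assumption about the behavior, not something we verified in Azure ML.

```python
def fit_levels(training_values):
    """Remember the distinct categorical values seen in the training set."""
    return sorted(set(training_values))

def encode(value, levels, allow_unknown=True):
    """Turn one categorical value into indicator columns, one per known level."""
    columns = levels + ["Unknown"] if allow_unknown else list(levels)
    if value not in levels:
        if not allow_unknown:
            # Assumed behavior when the option is deselected.
            raise ValueError(f"unseen level: {value!r}")
        value = "Unknown"
    return [1 if value == column else 0 for column in columns]

levels = fit_levels(["two", "four", "four", "two"])   # levels seen during training

print(encode("two", levels))     # a level the model has seen
print(encode("three", levels))   # unseen, lands in the extra "Unknown" column
print(encode("five", levels))    # also unseen, same encoding as "three"
```

Notice that "three" and "five" end up with identical encodings, so the model has no way to tell them apart.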

The options for the "Create Trainer Mode" parameter are "Single Parameter" and "Parameter Range".  When we choose "Single Parameter", we supply one value for each parameter and the algorithm builds a single model.  When we choose "Parameter Range", we instead supply a list of values for each parameter and the algorithm will build multiple models based on the lists.  These multiple models must then be whittled down to a single model using the "Tune Model Hyperparameters" module.  This can be really useful if we have a list of candidate models and want to be able to compare them quickly.  We don't have a list of candidate models, but that actually makes "Tune Model Hyperparameters" more useful.  We have no idea what the best set of parameters would be for this data.  So, let's use it to choose our parameters for us.
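
Conceptually, a parameter sweep looks something like the sketch below.  It assumes an exhaustive grid over every combination, which may not be the sweep strategy the module actually uses, and the parameter values and scoring function are entirely made up so that the example can run on its own.

```python
from itertools import product

def sweep(parameter_ranges, score):
    """Score every combination of parameter values and rank them, best first."""
    names = list(parameter_ranges)
    results = []
    for values in product(*(parameter_ranges[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score(params), params))
    results.sort(key=lambda result: result[0], reverse=True)
    return results

def made_up_score(params):
    # Purely illustrative stand-in for "train a model and measure it".
    return 0.9 - 0.01 * abs(params["l2_weight"] - 1.0) - 0.001 * params["l1_weight"]

parameter_ranges = {
    "l1_weight": [0.0, 0.1, 1.0],
    "l2_weight": [0.01, 1.0, 10.0],
    "optimization_tolerance": [1e-7, 1e-4],
}

ranked = sweep(parameter_ranges, made_up_score)
best_score, best_params = ranked[0]
print(len(ranked), "candidate models; best:", best_params)
```

Every combination becomes a candidate model, and the one at the top of the ranked list is the one we would keep.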
 Tune Model Hyperparameters
 Tune Model Hyperparameters (Visualization)
We can see that there is very little difference between the top models using Coefficient of Determination, also known as R Squared.  This is a great sign because it means that our model is very robust and we don't have to sweat over choosing the perfect parameters.
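
As a quick refresher, the Coefficient of Determination compares our model's squared errors against the squared errors we'd get by always guessing the average.  Here is a minimal sketch using made-up prices.

```python
def r_squared(actual, predicted):
    """Coefficient of Determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total

# Hypothetical prices, not taken from the Automobile Price Data.
actual_prices = [13000.0, 16500.0, 17450.0, 23875.0]
close_predictions = [13500.0, 16000.0, 18000.0, 23000.0]
mean_price = sum(actual_prices) / len(actual_prices)

print(r_squared(actual_prices, close_predictions))
print(r_squared(actual_prices, [mean_price] * len(actual_prices)))
```

A perfect model scores 1, while always guessing the average price scores 0.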

On a side note, there is a display issue causing some values for the "Optimization Tolerance" parameter to display as 0 instead of whatever extremely small value they actually are.  This is disappointing as it limits our ability to manually type these values into the "Poisson Regression" module.  One of the outputs from the "Tune Model Hyperparameters" module is the Trained Best Model, i.e. whichever model appears at the top of the list based on the metric we chose.  This means that we can use it as an input into other modules like "Score Model".  However, it does mean that we cannot use these parameters in conjunction with the "Cross Validate Model" module, as that requires an Untrained Model as an input.  Fortunately, this is not a huge deal because we see that "Optimization Tolerance" does not have a very large effect on the resulting model.
 All Regression Models Complete
Hopefully we've laid the groundwork for you to understand Poisson Regression and utilize it in your work.  Stay tuned for the next post where we'll be talking about Regression Model Evaluation.  Thanks for reading.  We hope you found this informative.