Monday, August 18, 2014

Data Mining in Excel Part 16: More Classification

Today, we're going to continue our demonstration of the Classify algorithm.
 Classify
In our previous post, we showed how to use this algorithm to build a basic classification tree.  However, that tree only branched once, which is not very useful.  Let's recreate the model, changing one of the parameters so that we get a bigger tree for demonstration purposes.
 Complexity of .25
The Complexity Parameter affects how large the tree can grow.  By lowering the value to .25, we allow the tree to grow more than before.  Let's check out the new results.
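To get a feel for why a lower complexity value yields a bigger tree, here is a toy sketch in Python.  It is not the algorithm the add-in uses.  It assumes a simplified rule where a node only splits when the entropy gain of its best split exceeds a `penalty` threshold.  The rows, the column names, and the `penalty` parameter are all made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, column, value) for the best 'column == value' split, or None."""
    base = entropy(labels)
    best = None
    for col in sorted(rows[0]):
        for value in sorted({r[col] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[col] == value]
            right = [l for r, l in zip(rows, labels) if r[col] != value]
            if not left or not right:
                continue  # a split must send rows to both sides
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, col, value)
    return best

def count_nodes(rows, labels, penalty):
    """Count nodes in a tree that splits only when the best gain exceeds `penalty`.

    `penalty` is a hypothetical stand-in for the Complexity Parameter.
    """
    split = best_split(rows, labels)
    if split is None or split[0] <= penalty:
        return 1  # leaf: no split is worth the added complexity
    _, col, value = split
    left = [(r, l) for r, l in zip(rows, labels) if r[col] == value]
    right = [(r, l) for r, l in zip(rows, labels) if r[col] != value]
    return 1 + sum(
        count_nodes([r for r, _ in part], [l for _, l in part], penalty)
        for part in (left, right)
    )

# Made-up rows, loosely shaped like the bike buyer data.
rows = [
    {"cars": 0, "age": "young"}, {"cars": 0, "age": "middle"},
    {"cars": 1, "age": "middle"}, {"cars": 1, "age": "young"},
    {"cars": 2, "age": "middle"}, {"cars": 2, "age": "old"},
    {"cars": 0, "age": "old"}, {"cars": 1, "age": "old"},
]
labels = ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No"]

print("high penalty:", count_nodes(rows, labels, 1.0), "node(s)")
print("low penalty: ", count_nodes(rows, labels, 0.0), "node(s)")
```

With the high penalty the toy tree stays a single node, while the low penalty lets it keep splitting.  That is the same direction of effect we rely on when we lower the Complexity Parameter.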
 Browse (.25 Complexity)
Right away, we can see that this tree is more complex than before.  We also see another attribute being displayed now.  Notice how some of the nodes are darker shades of blue than the others?  The darker the node, the more rows it holds.  We can see that Node 3 (Age >= 39 and < 67) is much darker than the rest of the nodes.  See the + at the edge of the middle nodes in the last column?  It indicates that there are more nodes beyond that point.  To display these nodes, we can either click the + or move the slider at the top of the window.
 Expanding the Tree
Now, we have a few variables at play in this tree.  Let's click on the Dependency Network tab and see what's over there.
 Dependency Network
This graph shows us all of the variables in our tree and whether they are used as predictors, responses, or both.  Since our model only has one response, it's a pretty clean network.  But these can get much more complex as the models grow.  Let's see what happens when we click on the response, Purchased Bike.
 Dependency Network (Purchased Bike)
See that all of the predictors turn red, just like the legend at the bottom says.  Now, what happens if we select a predictor?
 Dependency Network (Children)
We see that Purchased Bike turns blue, just like the legend says.  Unfortunately, our model doesn't allow for a variable to be both a predictor and a response, so we don't get to see purple.  Notice the slider on the far left of the window?  What happens if we slide that down?
With every notch we slide it down, another variable drops off, starting with the ones that predict Purchased Bike least well.  If we slide it halfway down, we are left with Cars, Age, and Commute Distance, which are the same three variables we saw in the first few levels of the classification tree.  This gives you a good idea of which variables are important for your predictions and which aren't.
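The slider's behavior can be imitated with a small Python sketch.  It assumes each predictor has a single strength score and that every notch hides the weakest remaining link.  The scores below are invented for illustration, and the function name `visible_predictors` is hypothetical.  The viewer does not expose its numbers this way.

```python
def visible_predictors(strengths, notches_down):
    """Return the predictors still shown after lowering the slider.

    `strengths` maps predictor name -> hypothetical link strength.
    Each notch hides the weakest predictor that is still visible.
    """
    ranked = sorted(strengths, key=strengths.get, reverse=True)
    keep = max(len(ranked) - notches_down, 0)
    return ranked[:keep]

# Invented scores; only the ordering matters for this sketch.
strengths = {"Cars": 0.9, "Age": 0.7, "Commute Distance": 0.5, "Children": 0.2}

for notches in range(3):
    print(notches, visible_predictors(strengths, notches))
```

Reading the slider this way, whatever survives near the bottom is the short list of predictors to pay the most attention to.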

Let's say that you accidentally closed the browser and didn't get to look at your model.  You can revisit the browser for the model any time you wish by using the Browse tool.
 Browse