## Monday, November 17, 2014

### Data Mining in Excel Part 29: Profit Chart

Today, we're going to talk about the next in the Microsoft suite of Model Validation tools, Profit Chart.
 Profit Chart
 Select Structure
First, we need to select our structure or model.  Just like the previous two posts on Accuracy Charts and Classification Matrices, this chart cannot be used on models that were built using the Association Rules or Time Series algorithms.  Let's move on.
 Select Parameters
Here, we can input the cost of our campaign.  First, we need to decide which value we're looking for.  In our case, we want to find people who will buy a bike.  Next, you input the number of potential customers, 50000 in our case.  Then, you input the fixed cost of creating the campaign, \$5000 for us.  Now, you input the individual cost to advertise to a person, \$10 in this example.  Lastly, you need to define how much money you would make it you sold a bike, \$15 per bike.  This is a great start.  However, in the real world, we get discounts for dealing in bulk.  So, what if an advertiser will give you a discount to create a certain amount of advertisements.  If we click the "..." beside the "Individual Cost" box, we can input that.
Here, we're saying that the first 1000 flyers will cost \$10 per flyer, the next 9000 will cost \$7.50 per flyer and everything above that costs \$5 per flyer.  Let's keep going.
 Select Data Source
Finally, we need to choose our data source.  As usual, we strongly advise against testing the model using the same data that you used to train the model.  Let's check out the results.
 Profit Chart (Purchased Bike)
This chart requires a little background.  Imagine that you used these algorithms to assign a probability to each prospective customer (which we haven't learned how to do yet).  This probability tells you how likely that customer is to buy a bike from you.  Then, you order these customers according to those probabilities, highest probability to lower probability.  This ordering is how the horizontal (X) axis of this chart works.  Let's pick 30%.  This point says that if you advertised to the top 30% of your prospective customers as defined by this model, you would make Y dollars.  So, the goal is to see what percentage of our population we need to advertise to in order to maximize our profit.  We see that the blue line (random) is basically the worst performer out of the bunch, followed closely by the red line (Decision Tree Model).  The rest of the models seem to be pretty tightly packed.  The algorithm gives us an easier chart to read.
 Maximum Profit per Model
This table was originally on a single line in the Excel worksheet.  We moved it around for easy readability.  We see that the best model is the Neural Network (again).  Followed closely by each of the other models.  This table also tells us what percentage of the population we would have to advertise to in order to make that amount of money.  Compare the Neural Network Model (Middle Right) to the Neural Network 10 HNR Model (Bottom Right).  If we advertise to 5% more people, we make \$2500 more.  This isn't an incredibly large amount.  If there's some subtle reason why we wouldn't want to advertise to that many people, then perhaps we should pick the Neural Network 10 HNR Model despite the fact that it's not worth as much.

Now that we've gone through three different Model Validation procedures, we should have a good grasp on how to analyze these models to see which perform the best.  Stay tuned for our next post where we'll be talking about Cross-Validation.  Thanks for reading.  We hope you found this informative.