Monday, September 29, 2014

Data Mining in Excel Part 22: More Forecasting

Today, we're going to talk some more about the Forecast algorithm within the Data Mining Add-in for Excel.
Forecast
In the previous post, we built a simple forecasting algorithm to predict future sales values.  However, the model's predictions were not very good.  So, let's see if we can't use some of the parameters to create a better model.
Parameters
Here are all the parameters we have access to.  For more information on what these parameters do, read this.  As we talked about in the previous post, this algorithm is made up of two distinct models, ARIMA and ARTXP.  The Prediction Smoothing parameter affects how much weight is applied to each model.  To get a starting point, let's set Prediction Smoothing to 0 and then to 1 to see how the pure ARTXP and pure ARIMA models fare.  To be clear, Prediction Smoothing = 0 is pure ARTXP and Prediction Smoothing = 1 is pure ARIMA.
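If it helps to picture what that weight is doing, here's a minimal Python sketch (our own illustration, not Microsoft's internal code) of blending two forecasts with a single smoothing weight.  The function name and the forecast numbers are made up; as we understand it, the real algorithm also shifts the blend toward ARIMA the further out it predicts.

```python
# A minimal sketch (not Microsoft's internal implementation) of how a single
# Prediction Smoothing weight could blend two forecasts for the same series.
# The forecast values below are made up purely for illustration.

def blend_forecasts(artxp_forecast, arima_forecast, prediction_smoothing):
    """Blend two forecasts: 0 = pure ARTXP, 1 = pure ARIMA."""
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    return [
        (1.0 - prediction_smoothing) * artxp + prediction_smoothing * arima
        for artxp, arima in zip(artxp_forecast, arima_forecast)
    ]

artxp = [105.0, 98.0, 110.0]   # hypothetical ARTXP predictions for the next 3 months
arima = [101.0, 103.0, 104.0]  # hypothetical ARIMA predictions for the same months

print(blend_forecasts(artxp, arima, 0.0))  # pure ARTXP
print(blend_forecasts(artxp, arima, 1.0))  # pure ARIMA
print(blend_forecasts(artxp, arima, 0.5))  # an even mix
```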
ARTXP Model
The first thing we notice about this model is that the Confidence Bands (the vertical dotted lines) are enormous, far too big to have any confidence in a prediction.  This presents a very serious issue for using this model.  Also, the model doesn't account for much of the variability across months.  Let's see how the ARIMA model fares.
ARIMA Model
Immediately, we notice that the confidence bands on this model are extremely small, which is a good thing.  The model also seems to be accounting for the massive variability of the data.  However, the model is always choosing the wrong direction of gain or loss.  It thinks the value will go up when it actually goes down, and vice-versa.  We're pretty confident that we want to use the ARIMA model because of the tightness of the predictions; however, let's see if we can use some more parameters to finesse it a little.

Parameters
Looking back at the parameters, we have Auto Detect Periodicity.  A periodicity is a distinct pattern that repeats over time in the data.  For instance, monthly data typically has a periodicity of 12 because customer habits tend to be similar around the same time every year.  On the same note, daily data might have a periodicity of 7 because of weekly shopping habits, as well as 15 and 30 because that's when people get paid.  Many promotions begin and end on Tuesdays, which also seems to create a weekly periodicity in retail/sales data.  Let's see what happens when we set this value to .3 and .8.
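Before we do, a quick aside: one common way to hunt for periodicities automatically is to look at the frequency spectrum of the series.  We're not claiming this is exactly what Auto Detect Periodicity does under the hood, but this little Python sketch (with a made-up monthly series) shows how a 12-month cycle can be "detected" without anyone telling the algorithm it's there.

```python
# A hedged sketch of automatic periodicity detection using a Fourier transform.
# This is our own illustration, not the add-in's internal code.
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(120)                     # 10 years of made-up monthly data
sales = 100 + 20 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 120)

detrended = sales - sales.mean()
spectrum = np.abs(np.fft.rfft(detrended))   # strength of each frequency
freqs = np.fft.rfftfreq(len(detrended))     # cycles per month

dominant = np.argmax(spectrum[1:]) + 1      # skip the zero-frequency term
print("Detected periodicity: about", round(1 / freqs[dominant]), "months")
```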
ARIMA Model (.3 ADP)
ARIMA Model (.8 ADP)
Looking at both of these models, we see no discernible difference.  For curiosity's sake, let's see what happens if we use Auto Detect Periodicity = 0.
ARIMA Model (0 ADP)
This model appears to be predicting using the overall mean or something like it.  It doesn't matter much because it doesn't seem that we can use this parameter to fix our issue.  Let's go back.
Parameters
There's another periodicity parameter called Periodicity Hint.  This parameter allows you to tell the algorithm what kind of periodicity you think your data has.  Our data has some definite peaks that appear in the middle and at the end of every year.  So, why don't we set this value to {6, 12}?
ARIMA Model ({6, 12} PH)
This model seems to account for the highs and lows much better than the previous ones.  However, another issue now seems apparent.  Our data has a very recent upward trend, and our historic predictions don't seem to be accounting for it.  The historic prediction model is built using two parameters, Historic Model Count and Historic Model Gap.
Parameters
Historic Model Count dictates how many distinct historic models are built and Historic Model Gap dictates how many months each model will predict.  For instance, the defaults were 1 and 10.  This means that 1 model was built for the previous 10 months, using only data from more than 10 months ago.  If we set these values to 2 and 5, we would get two models.  The first model would see data more than 10 months ago and predict data until 5 months ago.  The second model would see data more than 5 months ago and predict data until the final month.  This would let us see how the model predicts for observations in the near future.  Let's set these values to 3 and 2 and see what happens.
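Before we do, if the arithmetic behind those cutoffs is hard to picture, here's a small Python sketch of our reading of the two parameters (our interpretation, not the add-in's code): each historic model gets cut off some multiple of the gap before the end of the series and is then asked to predict the months it wasn't allowed to see.  The 36-month series length is just a stand-in.

```python
# Our reading of how Historic Model Count and Historic Model Gap carve up the
# series: each historic model is cut off some multiple of the gap before the
# end, then asked to "predict" the months it was not allowed to see.

def historic_model_cutoffs(total_months, model_count, model_gap):
    """Return (last month the model can see, months it must predict) pairs."""
    plans = []
    for k in range(model_count, 0, -1):
        cutoff = total_months - k * model_gap          # last visible month
        predict_until = total_months - (k - 1) * model_gap
        plans.append((cutoff, list(range(cutoff + 1, predict_until + 1))))
    return plans

# The defaults from the post: 1 model, 10-month gap, on (say) 36 months of data.
print(historic_model_cutoffs(36, 1, 10))
# The example from the post: 2 models with a gap of 5.
print(historic_model_cutoffs(36, 2, 5))
# The values we're about to try: 3 models with a gap of 2.
print(historic_model_cutoffs(36, 3, 2))
```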
ARIMA Model ({6, 12} PH, 3 HMC, 2 HMG)
As you can see, these historic predictions are FAR better than what we had seen before.  This leads us to believe that our model is pretty good, but only for a short period into the future.  If we want to see further than that, we either need to use a different type of model (which would require a different tool) or get more data (which would require waiting).

This was a good investigation into the Microsoft Time Series Algorithm.  Stay tuned for our next post where we will be talking about creating multiple models under a single structure.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit
llewellyn.wb@gmail.com
http://www.linkedin.com/in/bradllewellyn

Monday, September 22, 2014

Data Mining in Excel Part 21: Forecasting

Today, we'll be talking about a very cool algorithm that absolutely everyone can make use of, Forecast.
Forecast
Put simply, forecasting is the process of predicting future values.  This can range anywhere from "How many units will I sell next month?" to "Which customers are likely to be profitable next year?".  As long as you have data over time, Forecast can help you gain some serious insight.

We've worked with a fair number of companies and they all seem to budget the same way.  Compare this month's sales to last month's sales, and the sales of the same month last year.  While this technique is not bad by any means, it lacks any hint of sophistication.  If you want to really get some good insights, you've got to use some math.  But....math is hard!  What if Excel could do it for us?  Let's dive in.
Select Data Source
Of course, the first step is to select your data.  You can also choose to use an External SQL source if you would like.  Moving on.
Column Selection
Next, we need to choose which field is our Time Stamp.  If you are using data from Excel, the data source needs to be sorted in Ascending Order.  We're not sure if the same applies if you are connecting to an Analysis Services data source.  Feel free to let us know in the comments.  Now, we need to select which columns we want to forecast.  Let's check out the parameters.
Parameters
There are quite a few parameters here.  The really important one is Forecast Method.  This algorithm supports two distinct methods, ARIMA and ARTXP.  We recommend using the MIXED option unless you have a good reason not to.  These methods are quite complex.  For more information about them, you can visit ARIMA and ARTXP.  The main difference that should be noted is that the ARIMA algorithm is Univariate and the ARTXP algorithm is Multivariate.  This means that the ARIMA algorithm will not consider values from the other columns when it calculates its predictions.  The ARTXP algorithm will.  The process of using one series to predict another series is called "Cross-Prediction".  This makes the ARTXP algorithm a good option for predicting one value ahead, while the ARIMA algorithm is typically better for predicting many future values.  For more information on the rest of the parameters, read this.  Let's keep going.
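Before we do, to make "Cross-Prediction" a little more concrete, here's a rough Python sketch.  It uses plain lagged linear regression rather than the actual ARTXP tree, and the two series are made up, but it shows the payoff of letting one series borrow information from another.

```python
# A rough sketch of "cross-prediction": predicting one series from the lagged
# values of another series as well as its own.  This is plain lagged linear
# regression, not the actual ARTXP tree, and the data is made up.
import numpy as np

rng = np.random.default_rng(1)
europe = rng.normal(100, 10, 50)
north_america = 0.6 * np.roll(europe, 1) + rng.normal(40, 5, 50)  # lags Europe

# Build lag-1 features: the previous month of each series.
y = north_america[1:]
own_lag = north_america[:-1]
cross_lag = europe[:-1]

univariate = np.column_stack([np.ones_like(own_lag), own_lag])
multivariate = np.column_stack([np.ones_like(own_lag), own_lag, cross_lag])

for name, X in [("own lags only", univariate), ("own + Europe lags", multivariate)]:
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((X @ coef - y) ** 2))
    print(f"{name}: in-sample RMSE = {rmse:.2f}")
```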
Create Model
Since a forecasting model needs the full, unbroken series (holding data out for testing would leave gaps in the timeline), we do not have to define Training and Testing Sets.  We jump right to the model creation.  Let's check out the results.
Forecasted Percentages
This chart shows us the known values, as well as the predicted values for a certain number of steps into the future.  Oddly enough, the default is to show the percentage difference from the starting value in the series.  This allows you to visualize many different series without having to worry about their scales.  However, it doesn't show the real values.  Fortunately, we can easily toggle to show absolute values.
Forecasted Absolutes
By clicking the button in red at the top of the screen, you can toggle the display between Percentages and Absolute Values.  Let's try selecting the "Show Historic Predictions" option.
Show Historic Predictions
This option shows you some of the predictions that were made for periods where the actual values are already known.  This is great for checking the accuracy of the model.  The closer the predicted values are to the actual values, the better the model is at predicting.  Unfortunately, this model doesn't seem to fit too well.  No need to worry about that, we can tweak the parameters later.  Let's check out the "Show Deviations" option.
Show Deviations
Selecting this option made the chart a little too cluttered.  So, we used the filter (highlighted in brown) to only show the North America data.  The "Show Deviations" option shows us how much deviation is expected between the actuals and the predictions.  If our actuals fall within the line, then the prediction was good.  As you can see, this is usually not the case.  This is a great indication that our model needs some tweaking.  Let's look at our last option, "Prediction Steps".
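Before we do, if you'd rather put a number on "usually not the case", the same check can be phrased as a coverage count: how often did the actual land inside the band?  Here's a tiny Python sketch with made-up predictions, deviations, and actuals.

```python
# A hedged sketch of the "Show Deviations" check: how often did the actual
# value land inside the predicted band?  All numbers below are made up.
predictions = [100, 105, 110, 108, 115, 120]
deviations  = [  4,   4,   5,   5,   6,   6]   # half-width of the band
actuals     = [ 97, 112, 104, 109, 124, 118]

hits = sum(
    abs(actual - predicted) <= deviation
    for predicted, deviation, actual in zip(predictions, deviations, actuals)
)
print(f"{hits} of {len(actuals)} actuals fell inside the band "
      f"({hits / len(actuals):.0%}); the higher, the better the fit.")
```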
Prediction Steps
We can use this option to choose how many steps into the future we would like to predict.  In our case, these steps are Months.  Please note that adding more steps does not change the predictions for previous values.  In other words, adding a sixth prediction doesn't change the previous five predictions.  However, the further into the future you predict, the less reliable your predictions will become.  Let's back up a little and look at the values on the right side of the screen.
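Before we do, as a generic illustration of that last point (this is textbook random-walk math, not the add-in's actual confidence band calculation), the uncertainty around a forecast tends to grow with the square root of the number of steps ahead.

```python
# A generic illustration (not the add-in's exact math) of why forecasts get
# less reliable further out: for a simple random-walk model, the uncertainty
# around a forecast grows with the square root of the number of steps ahead.
sigma = 5.0   # made-up one-step-ahead standard deviation

for steps_ahead in [1, 3, 6, 12]:
    band = 1.96 * sigma * steps_ahead ** 0.5   # approximate 95% band half-width
    print(f"{steps_ahead:>2} steps ahead: roughly +/- {band:.1f}")
```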
Predicted Values
The box on the far right of the window will show you the actuals and/or predictions corresponding to the spot on the graph you select.  For instance, we selected 200407 on the chart and these values were populated.  This is the only way to get these predictions from this chart.  Let's move on to the Model.
North America Model
Since the ARTXP model is a type of decision tree, we can visualize it the same way we did for our previous decision trees.  It is interesting to note that we are looking at the North American Amounts (as indicated in the red box at the top of the window), while the tree is splitting based on the European Amount.  When you select a node, the right side of the window (highlighted in brown) will show the Tree node equation (ARTXP) and the ARIMA equation for the selected node.  If you want to see the other series, you can select them from the red box.

This was a good example of how you can use the Forecast feature to develop some mathematically sound predictions based on your own data.  Stay tuned for the next post where we'll attempt to tweak this model to be a little more relevant.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit
llewellyn.wb@gmail.com
http://www.linkedin.com/in/bradllewellyn

Monday, September 15, 2014

Data Mining in Excel Part 20: Association

Today, we'll be talking about one of the more complex data mining algorithms, Association.
Associate
The association algorithm is commonly known as Market Basket Analysis or Shopping Basket Analysis.  It attempts to discern which events tend to happen together.  For example, "Which items do customers tend to buy together?" or "Which websites do people tend to surf to in the same visit?".  If you have groups of anything, this algorithm can help you determine which items are "associated".  Let's check it out.
Select Data Source
The first step is always to select your data source.  Interestingly enough, you can't connect to an external SQL Source using this algorithm.  In a later post, we'll talk about ways to work around this.  Let's move on.
Transaction and Item Identification
Here, we need to tell the algorithm which fields correspond to Transaction ID and Item.  The transaction is the simple one, usually an Order ID or a Visit Number.  The item is a bit more interesting.  For a retailer, you'd think that an item would be an individual product.  However, if you were to use Product for Item, you would get a bunch of silly rules saying "People who buy Pet 2% milk tend to buy Hershey's Cookies and Cream Cereal Puffs".  That rule is way too specific to be of any use.  A more reasonable rule would be "People who buy Milk tend to buy Cereal".  This is why we use Category instead to get more useful rules.

Next, we need to tell the algorithm what the minimum support and minimum rule probability need to be.  The support of an itemset is the number of transactions it appears in.  The probability of a rule is the percentage of transactions containing the first item that also contain the second.  Let's say you have a rule that says "People who buy Milk tend to buy Cereal".  If 10 people bought Milk and 5 of those also bought Cereal, then the support of {Milk} would be 10, the support of {Milk, Cereal} would be 5, and the probability of the rule would be 5 / 10 = .50 = 50%.  We're fine with using the default values here.  Let's check out the parameters.
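Before we do, if you like seeing the arithmetic spelled out, here's a small Python sketch of those two numbers computed from a handful of made-up orders that match the Milk and Cereal example above.

```python
# A small sketch of support and rule probability on some made-up orders.
orders = [
    {"Milk", "Cereal", "Eggs"},
    {"Milk", "Cereal"},
    {"Milk", "Bread"},
    {"Milk", "Cereal", "Bread"},
    {"Milk"},
    {"Milk", "Eggs"},
    {"Milk", "Cereal", "Eggs"},
    {"Milk", "Bread"},
    {"Milk", "Cereal"},
    {"Milk", "Bread", "Eggs"},
    {"Bread", "Eggs"},
]

def support(itemset):
    """Number of orders containing every item in the itemset."""
    return sum(itemset <= order for order in orders)

milk = support({"Milk"})
milk_and_cereal = support({"Milk", "Cereal"})
print("Support of {Milk}:", milk)                       # 10
print("Support of {Milk, Cereal}:", milk_and_cereal)    # 5
print("P(Cereal | Milk):", milk_and_cereal / milk)      # 0.5
```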
Parameters
Here, we have a bunch of values similar to the support and percentage we talked about earlier.  We're okay with keeping these defaults for now.  For more information, read this.  Moving on.
Create Model
Finally, we need to create our mining structure and model.  Once we do this, we can browse the model.
Rules Browser
This is the Rules Browser.  It lets you see all of the rules that the algorithm built from your data, as well as their probability and importance.  We already talked about what probability is.  Importance is new for us though.  Importance, which is closely related to the Lift measure, tells us how, for lack of a better word, "important" the rule is, i.e., how much the rule tells us beyond what we already knew.  For instance, if every customer buys Milk and Eggs, then a rule saying "People who buy Milk also buy Eggs" isn't very useful.  You could put Eggs and Milk on completely different sides of the store and it wouldn't matter because everyone is going to buy them anyway.  However, if only 40% of your customers buy Eggs overall, yet everybody who buys Milk also buys Eggs, then that rule becomes much more useful.  Therefore, importance is typically what you want to look for.  Let's zoom in on a couple areas of this browser.
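Before we zoom in, to put a number on that intuition, here's a tiny Python sketch using the classic Lift measure.  Microsoft's exact Importance formula is a bit more involved, so treat this as the idea rather than the add-in's math; the percentages echo the Milk and Eggs example above.

```python
# A sketch of the idea behind Importance using the classic Lift measure:
# how much more likely is the "then" item, given the "if" item, compared to
# its overall popularity?  (Microsoft's exact Importance formula differs,
# but the intuition is the same.)  The percentages below are made up.

def lift(p_then_given_if, p_then_overall):
    return p_then_given_if / p_then_overall

# Everyone buys Eggs anyway, so "Milk -> Eggs" tells us nothing new.
print(lift(p_then_given_if=1.00, p_then_overall=1.00))   # 1.0, not useful

# Only 40% of customers buy Eggs overall, but every Milk buyer does.
print(lift(p_then_given_if=1.00, p_then_overall=0.40))   # 2.5, much more useful
```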
Probability and Importance Filters
These two filters will allow you to filter out rules that are too uncommon or unimportant for your analysis.
Rule and Display Filters
Here, you can filter for rules that contain certain keywords and choose what you would like to have displayed.
Rules
Finally, we can see some rules.  Let's check out the first rule.  It has a pretty high probability of .7 and a high importance of .855.  The rule says "If the customer buys Socks and Mountain Bikes, they are also likely to buy Fenders."  The second rule says "If the customer buys Vests and Mountain Bikes, they are also likely to buy Fenders."  These rules can provide extraordinary insight into your business if you approach them with some business knowledge in hand.  A good business analyst will look at some of these rules and say "Duh!  I already knew that".  The real value here lies in the rules that make the analyst stop and say "Hmm.  I didn't know that.  Let me dig into this a little deeper."  Let's look at the Itemset Browser.
Itemsets Browser
This view shows you something a little different.  It shows you which items are more commonly bought together.  This is very useful if you want to know which combinations of items are selling, without having to worry about things like importance.  Let's check out the Dependency Network.
Dependency Network
We've seen a couple of these before.  So, we won't spend too much time talking about it.  But this does give us an idea about how different products relate to one another.  For instance, there are 5 different categories that predict the purchase of Tires and Tubes.  In reality, the Itemsets Browser showed us that Tires and Tubes is by far the most popular category.  Therefore, this isn't much of a surprise.  However, did you know that people who buy Gloves or Touring Bikes tend to also buy Helmets?  These are the kinds of insights you find if you put forth the effort to find them.

The association algorithm is actually one of the most complex algorithms commonly used today.  The good thing is that Excel makes it pretty easy to work with.  Stay tuned for our next post where we'll be talking about Forecasting.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit
llewellyn.wb@gmail.com
http://www.linkedin.com/in/bradllewellyn

Monday, September 8, 2014

Data Mining in Excel Part 19: More Clustering

Today, we're going to continue our analysis of the Clustering algorithm.
Clustering
If you haven't read the previous post, we highly recommend you read it because this picks up right where it left off.  We just finished looking at the Cluster Profiles.  Let's move on to the Cluster Characteristics.
Cluster Characteristics
While the previous views allowed us to compare clusters to each other, this view allows us to see what makes up each cluster individually.  This view is great for naming our clusters.  We could select the first few variables and use those as names.  So, we could call Cluster 1 (Older, Married, North American Home Owners with 2 Cars).  That's a pretty good description of a group of people that took almost no effort to get.  We could repeat this process for the remaining clusters if we wanted to.  But, let's move to the final view, Cluster Discrimination.
Cluster Discrimination (1 vs. 2)
This view is very similar to what we just saw.  However, this view shows us what's important in each cluster, as compared to another cluster.  For instance, we can see what really distinguishes Cluster 1 from Cluster 2.  Imagine you had a cluster that buys your products and another that doesn't.  We could use this view to come up with a list of candidate attributes that may have an impact on customers' buying habits.  We can even use this view to compare a cluster to everything that isn't in that cluster.  This is known as a complement.
Cluster Discrimination (1 vs. Not 1)
This view is perfect for determining which attributes uniquely define a cluster.  You might be wondering how this differs from the Cluster Characteristics.  The Cluster Characteristics view shows you what's in a cluster.  The Cluster Discrimination view shows you what's in a cluster that's NOT in other clusters.  So, you could use this view to develop unique names for your clusters.

Wait a minute!  That's the third view in a row that we could use to name our clusters.  Which view should you use?  That's up to personal choice and how the names will be used.  If you want a 1-stop shop for all of your information in a graphical format, the Cluster Profiles is a great place to start.  It also looks nice if you were ever presenting your results.  If you want to let the algorithm determine which features are important for your naming convention, then use the Cluster Characteristics or the Cluster Discrimination.  Personally, we think the Cluster Discrimination view is the most statistically sound way to do it.  Alas, the choice is yours.

In Statistics, there's a concept called "Robustness".  Basically, a robust model doesn't change very much if you try to tweak it.  Robustness is a very good thing that every model should have.  Imagine that you're a baseball coach.  Would you rather have a pitcher that can play well in all conditions, or a pitcher that can only play well when the sun's out, the temperature is 75 degrees and he's facing West?  It's pretty obvious; you want consistency, in your pitchers and your statistical models.  So, how do we make sure that our model is robust?  Let's check out the parameters.
Parameters
The first parameter we should notice is Cluster Seed.  This parameter sets the seed for the random process the algorithm uses to build its initial clusters.  If you try a few different values here and the clusters don't change much, then the model is pretty robust.

The second parameter we should notice is Clustering_Method.  This parameter determines which of four different clustering methods gets used to create the clusters.  The primary methods are 1 (E-M) and 3 (K-Means).  If you change this parameter and the clusters don't change much, then the model is pretty robust.
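The add-in doesn't hand us a robustness score, but outside of Excel you could put a number on "the clusters didn't change much".  Here's a hedged Python sketch (assuming scikit-learn and some stand-in data) that clusters the same data twice and compares the assignments with the adjusted Rand index, which is 1.0 when two clusterings agree perfectly and drops toward 0 as they diverge.

```python
# A hedged sketch (outside of Excel, assuming scikit-learn is available) of
# one way to quantify robustness: cluster the same data with different seeds
# and methods, then compare the assignments with the adjusted Rand index.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # stand-in data

kmeans_a = KMeans(n_clusters=4, random_state=1, n_init=10).fit_predict(X)
kmeans_b = KMeans(n_clusters=4, random_state=99, n_init=10).fit_predict(X)
em = GaussianMixture(n_components=4, random_state=1).fit(X).predict(X)

print("K-Means, two seeds:", adjusted_rand_score(kmeans_a, kmeans_b))
print("K-Means vs E-M:    ", adjusted_rand_score(kmeans_a, em))
```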

Within the add-in itself, the question is "How do we know if the clusters changed?"  Unfortunately, we're not that far along yet.  We're still looking at the algorithms.  Have no fear, we'll soon start talking about how to take these models and get tangible results out of them.  Keep an eye out for the next post where we'll be talking about Association.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit
llewellyn.wb@gmail.com
http://www.linkedin.com/in/bradllewellyn

Monday, September 1, 2014

Data Mining in Excel Part 18: Clustering

Today, we're going to talk about the next in the set of Data Mining algorithms, Clustering.
Clustering
Clustering is our all-time favorite statistical procedure.  It requires no knowledge of the data at all and gives great insight as to what is actually in the data.  To our knowledge, it's one of the only statistical procedures that doesn't require you to ask a question.  For instance, regression algorithms require you to ask "What would my profit be if I ordered this new product?".  With a clustering algorithm, you simply throw data at it and the algorithm tells you what's important and what's not.  Let's see it in action.  As usual, we will be using the Data Mining Sample data set from Microsoft.
Select Data Source
Of course, the first step is to select our data source.  We could use an external SQL source if we wished, but we'll go ahead and use the table.
Column Selection
The first thing we need to choose is the number of clusters, or segments, that we want.  Typically, it's not a good idea to specify the number of clusters.  The algorithm has different criteria it uses to choose this value for us.  So, we'll let the algorithm do the hard work.  Also, we never want to use IDs in our analyses.  Let's check out what kinds of parameters we have available.
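Before we do, for the curious, here's a rough Python sketch of what "letting the data pick the number of clusters" can look like.  We're using the silhouette score purely as an illustration; it's not necessarily the criterion the add-in uses internally, and the data is a made-up stand-in.

```python
# A hedged sketch of letting the data pick the number of clusters.  This uses
# the silhouette score, which is our illustrative choice and not necessarily
# the criterion the add-in uses internally.  Requires scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=3, random_state=42)  # stand-in data

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("Silhouette by cluster count:", {k: round(v, 2) for k, v in scores.items()})
print("Best number of clusters:", best_k)
```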
Parameters
The clustering algorithm has more parameters than the decision tree algorithm.  The most important one to notice is the Clustering Method parameter.  Changing this value will likely have a serious impact on how your clusters are designed.  Perhaps a little foreshadowing here, but we may see that parameter again.  For more information on these parameters, look here.  Let's move on.
Training Set Selection
All of these algorithms require us to set aside a portion of the data for testing purposes.  We can keep the default of 30%.
Create Model
Finally, we need to create a structure and model to house the results.  Let's get to the analysis!
Cluster Diagram
The cluster diagram shows the relationships between the clusters.  The more similar two clusters are, the "stronger" the link between them will be.  The stronger the link, the darker the line.  You can show more or fewer links by using the slider on the left side of the window (enclosed in red below).  You can also examine the clusters by how dark they appear.  The shading of the clusters is determined by the box at the top of the screen (enclosed in brown below).
Cluster Diagram (Links and Population)
The shading variable is Population.  This means that the darkness of the cluster corresponds to the number of rows within it.  You can hover your cursor over any cluster to see its size.  You can also change the shading value to see how certain values are distributed across clusters.  This is neat in some ways, but we prefer the Cluster Profiles for this type of analysis.
Cluster Profiles
We've created clusters in a number of programs.  We've even used Tableau in all of its splendor.  However, the Microsoft Cluster Profiles view is by far the best view we've ever seen for inspecting clusters.  Let's zoom in on a couple pieces so we can see some things.
Cluster Profiles Zoom
In this view, each column is a cluster and each row is a variable.  The first cluster is actually the entire population.  They even take the liberty of naming it "Population" for us.  However, the real attraction here is the comparison of clusters.  With this view, we can scroll across a single row to see which variables distinguish each cluster.  For instance, Cluster 2 has mostly young people, while Clusters 1 and 3 are older.  Cluster 2 has quite a few people with zero cars, while Clusters 1 and 3 have mostly people with 2 cars.  We can repeat this process for all of the clusters and all of the variables to intuitively name our clusters if we choose to.  For instance, we could call Cluster 2 "Young Adults with No Cars".  However, it gets slightly more complex because there are quite a few other variables here that we have to look at as well.  If your data has too many variables, you can use the parameters to force the algorithm to choose only the most distinct variables.

This is a great place to stop for today.  We've seen that the Clustering algorithm is a great way to get a good feeling for what's in your data, without having to do any manual investigation.  Stay tuned for the next post where we'll continue with our analysis and maybe even make some alterations.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit
llewellyn.wb@gmail.com
http://www.linkedin.com/in/bradllewellyn