Monday, May 5, 2014

Data Mining in Excel Part 1: Introduction

Today, we're beginning a new series about Data Mining.  Unlike my previous posts, this series will not focus on Tableau, but instead will focus on everyone's favorite analytics tool, Microsoft Excel.  The terms "Data Mining" and "Predictive Analytics" have been tossed around quite a bit recently and a lot of you might be wondering exactly what they mean.  In my mind, they revolve around predicting "unknown values."  This unknown value can range anywhere from "How many units will I ship next month?" to "Will this customer buy my product?" and even as far as "Which presidential party is likely to win the 2040 election?"

These questions are not new to industry by any means.  In fact, people have probably been budgeting since trade first began.  However, even with all of the current technology, quite a few companies still budget by getting a bunch of account managers in the room and basically guessing what they will earn next year.  This guess is based off of business knowledge, but typically with very little mathematical thought.  We won't go into a diatribe about human perception, but we will say that people have been shown to be terrible estimators.

So, why don't we use some of these really cool tools we have to do the math for us?  Well, the algorithms have been around for decades.  We think the issue is that they've remained solely in the realm of Mathematicians and Computer Programmers.  It doesn't take long for a businessman to throw away a tool like R or SAS when they see that it requires years of mathematical education and learning an entirely new programming language to even get started.  This is where we think the Data Mining tools within Microsoft Excel shine.  They require almost no knowledge to get started and are as easy as clicking a few buttons on your Excel ribbon.  However, it seems that almost none of the business users we've spoken even know that they exist.  For some reason, they think that data mining requires a multi-million dollar investment in software and resources.

We're here to show you differently.  Throughout this series, we'll show you that all it takes to get real results using your data is a few minutes of basic data prep and the imagination to ask the questions that are important to your business.  Another great bonus of using these tools within Excel is that they play nicely with Power Query and Power Pivot, the data integration and analytics tools within Microsoft's Power BI stack.

In order to use these tools, you will need access to SQL Server Analysis Services (SSAS) 2008 or higher.  Unfortunately, I do not think you can find a free copy of SSAS.  However, if your company has anything resembling an IT department, it's very likely that they have at least one instance.  All you need is access to the instance and be allowed to create mining models.  They don't even need to give you access to the production server or anything.

Once you have access to SSAS, the rest of the tools are free downloads.

Data Mining Add-in for Excel (Excel 2007, 2010, 2013):
http://www.microsoft.com/en-us/download/details.aspx?id=7294

Data Mining Sample Data:
https://dataminingaddins.codeplex.com/releases/view/87029

Power Pivot (Excel 2010, 2013):
http://office.microsoft.com/en-us/excel/download-power-pivot-HA101959985.aspx

Power Query (Excel 2013):
http://www.microsoft.com/en-us/download/details.aspx?id=39379

It should be noted that Microsoft's official stance is that there is no interaction between Power Pivot and the Data Mining Add-ins.  They are technically correct but we'll show you some very simple ways to combine them.  There's also a book on this topic that we found to be extremely helpful.

Data Mining with Microsoft SQL Server 2008:
http://www.amazon.com/Data-Mining-Microsoft-Server-2008/dp/0470277742

We look forward to exploring these tools with you.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Data Analytics Consultant
Mariner, LLC
llewellyn.wb@gmail.com
https://www.linkedin.com/in/bradllewellyn

4 comments:

  1. I tried data mining at work, no luck.

    Microsoft does sell SQL Server Developer Edition for $50 which has SSAS, but I haven't tried data mining at home yet, just ssrs, and building a basic cube.

    ReplyDelete
    Replies
    1. That's a shame! Hopefully the $50 edition will work!

      Delete
    2. Yeah, I think the SSAS server has to be set to allow data mining. Haven't had time to try it though, Tableau is too fun ;P

      Delete
    3. Yet another hurdle! Thump your IT guys on the head and tell them that you want Data Mining and you want it now! Cheers.

      Delete