Monday, December 22, 2014

Using the In/Out Tools in Alteryx

We've recently been exposed to a Data Analysis Tool called Alteryx.  It exists in the gaping middle ground where Excel is too weak and SQL is too cumbersome.  With complete honesty, we can say that Alteryx lets us do analysis in less than half the time it would take in SQL or Excel.  To make things even better, Alteryx can output data directly to a Tableau Data Extract for visualization in Tableau.  This makes it 10x better in our book.  If you haven't seen Alteryx in action, you can check out their On-Demand Training Videos here.

Today, we're going to walk through the In/Out Tools.

Input Tool 1: Input Data
Input Data
This is the second most common tool you will use in Alteryx.  It is the beginning of almost every analysis and requires very little explanation.  This is where you specify your data set.  This tool can take in quite a few different types of files.
Input File Types
Basically, this covers most of the files you are going to find.  If you need to connect to a database, they allow you to create your own connections, similarly to the way Excel does it.
Database Connection
There are certainly some databases missing from this, Hadoop to name a big one.  However, in the database world, there's almost always a way to connect to your database using a generic connector, such as ODBC or OleDB.  If you have one of these systems, you may need to get IT to help you create the connection.

Input Tool 2: Map Input
Map Input
This is probably the coolest input tool, albeit the least useful for real analysis.  It allows you to select point, draw lines, or create polygons by drawing with your mouse.  You can even use a map as the base to draw polygons around geographic areas.  For instance, if you wanted to point out key locations on a map, or draw a polygon around a location or set of locations.
Statue of Liberty
You can even import your own Alteryx Database (YXDB) files to use as a start.
Ad Areas
This tool is really useful for cool presentations.  But, we don't see it having much analytical use other than estimating locations of things for geocoding purposes.

Input Tool 3: Text Input
Text Input
This tool is really useful when you need to relabel or append data on a one-off basis.  Let's say that you have a file with all of your customers and you need to Categorize them according to a complex list.  You could create that list with a Text Input and join it to your original data.
Customer Types
If you have the ability to change the original data, you should definitely add this there.  You could also create a separate Excel file with just this table so that you can use it in other places as well.  If for some reason you can't do any of that, then the Text Input tool is a quick alternative.

We actually had a case where we created a complex R macro that always required at least 1 row of data.  However, some times we would not have any data to give it.  So, we added a Text Input with a blank row and appended it to the data source.  This way, there was always at least 1 row in the data set.  Alas, this is a very situational tool.

Input Tool 4: Directory

This tool allows you to input the file names, paths, and a myriad of other data about the files within a certain folder on your hard drive or server.  
Directory Properties
This is useful for determining the last time certain files are updated and how often people are writing files to a directory.  Combined with the Dynamic Input tool, you can use this to import multiple files within multiple directories using very complex logic.  This could allow to keep track of all your data in a single place without having to archive it in a single folder.  There's a Sample Module in Alteryx that uses this tool to a similar purpose.

Input Tool 5: Date Time Now
Date Time Now
This tool hardly needs an explanation.  You tell it the format, and it outputs the Date and Time that your Alteryx Module runs.
Date Time Now Properties
This is great if you want to track the times that a module or finding which pieces of a macro or module are taking a long time.  Interestingly enough, the Formula tool can give you this same information in a more efficient way.  Therefore, we can't think of too many cases where this tool would be useful.

Output Tool 1: Browse
This is the most common tool you will use in Alteryx.  It allows to you to view the data being passed through any section of your Alteryx module.  It can even view charts, visualizations, and maps.
Pet Store Monthly Sales
If you want to save this data for later, you can copy it to the clipboard or save it to one of the output file types.
Save Browse

Output Tool 2: Output Data
Output Data
This tool is great if you want to output files without the manual work of using the Browse tool.  It can output a pretty comprehensive list of file types.
Output File Types
You can even name the file using a field within the data if you want to use this tool within some type of Iterative or Batch Macro.  The big thing to note here is that Alteryx can output proprietary file types like Tableau Data Extracts, QlikView data eXchanges, and ESRI SHP Files.

As you can see, Alteryx is pretty sophisticated when it comes to reading in and writing out different types of data.  The real power of the tool comes when you quickly join the data together and create advanced analyses.  Stay tuned for the next post where we'll be talking about the Field Preparation tools.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit

Monday, December 1, 2014

Data Mining in Excel Part 31: Querying your Model

Today, after months of creating and validating models, we get to the final piece of the Data Mining puzzle, Querying your Model.  In addition to Querying, we'll also briefly touch on the other components of the Model Usage and Management segments.
Model Usage and Management
First, the Browse option allows you see the Result sets we've been looking at all along.  For instance, let's check out the Results from our Classify Purchased Bike Model.
Now, what if you needed to provide documentation of the model you built to allow other people to reference it later?  The "Document Model" option does exactly that.
Summary Information
Complete Information
You can choose to get a short amount of Summary Information or you can choose to get a long list of Complete Information that has basically all of the information about your model.  This can be very helpful in a large-scale production environment.

Next, you can use the Manage Models option to rename, delete, clear, process, export, and import models.  This can be really helpful when you are using these models quite a bit over long periods of time.
Manage Structures and Models
Finally, let's get to the Querying.  You be wondering "Why did you make us wait 31 weeks to get this information?"  Well, querying a model is one of the easiest thing to do.  However, that's only the very last piece of an analysis.  What truly separates good data miners from bad ones is how well they analyze and validate their models.  That's why we spent such a large amount of effort in those areas.  Now, let's click on the "Query" button and see what happens.
Select Model
First, we need to select the Model that we would like to query.  Let's say that we want to use our Forecasting models.  Remember that we chose the Forecast Sales ARIMA 6 12 PH 3 HMC 2 HMG as our best model.  Let's keep going.
Choose Output
Next, we need to choose how many months into the future and which series we would like to predict.
Choose Destination
Finally, we need to choose where we would like to store these predictions.  Let's check them out.
North America Amount Prediction
See how easy that was?  Remember when we chose our Naive Bayes Model as the best model for predicting who is going to buy a bike?  Let's try it with that model.
Select Data Source
When we queried our forecasting model, we didn't have to specify an input because the model always predicts based on previous values.  However, the classification predicts based on other values for that observation.  This means that we have to specify some input data.  Let's use the Source Data, which contains ALL of the customers.  Let's keep going.
Specify Relationship
Since the input data may be different from the training data, we need to make sure that all of the columns match up.  Since this data is formatted the same way, we have the same column names.  Let's move on.
Choose Output (Empty)
Here, we have a slightly different Choose Output window.  We need to decide which types of output we would like by clicking the "Add Output" button.
Add Output
Here, we can can decide which of the five types of output we would like.  Just to see them, let's choose them all, including the Advanced Functions that can be seen by checking the box in the bottom-left corner.
Choose Output (Complete)
Let's move on.
Choose Destination (Purchased Bike)
Here, we can choose to append the results to our data set.  Let's check them out!
We have so much information in front of us now.  The really important pieces are the Predictions and Probabilities.  Now, we can compare known Bike Buyer values to our Predictions to see how well our model actually works.  We can also see that larger probabilities correspond to Yes values while smaller probabilities correspond to No values.  We won't go any further into detail about this area.  But we will say that there is a TON more to do here.  This functionality also exists in SQL Server Analysis Services and can be used for even more cool stuff.  We'll leave you with one final piece.  In a few of the windows we looked at, there's an "Advanced" button.  Clicking on it will give you a snippet of a programming language called DMX.
Query Editor
You can alter this to create almost anything you want.  You can even use this as a good way to learn DMX.  We hope you enjoyed this series and want to take what you've learned here and apply it to your data.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Director, Consumer Sciences
Consumer Orbit