We performed some unusual transformations in the last post for the sake of showcasing functionality, but they didn't have any effect on the overall project. So, we'll delete all of the Data Preparation Steps past the initial "Reference Dataflow" step.
However, when we view the statistics of a numeric feature, we get some very useful mathematical values. There are a few points of interest here. First, the median and the mean are very close, which suggests that our data is not heavily skewed. We can also see the minimum and maximum values, as well as the upper and lower quartiles. These will let us know if we have "heavy tails" (lots of extreme observations) or even if we have impossible data, such as an Age of 500 or -1. In this case, nothing jumps out at us from these values. However, wouldn't it be easier if we could see this visually? Cue the histogram!
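For readers who want to reproduce these checks outside the tool, here is a minimal sketch in plain Python. It is not how AML Workbench computes its statistics; the function names `summarize` and `out_of_range`, the plausible range of 0 to 120, and the sample ages are all made up for illustration.

```python
import statistics

def summarize(values):
    """Return min, quartiles, median, mean and max for a numeric column."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(ordered),
        "q3": q3,
        "max": ordered[-1],
    }

def out_of_range(values, low, high):
    """Return the values that fall outside a plausible range."""
    return [v for v in values if v < low or v > high]

# Hypothetical ages, including two impossible entries (500 and -1)
ages = [23, 35, 31, 42, 28, 500, 39, -1, 30, 27]

stats = summarize(ages)
for name, value in stats.items():
    print(f"{name:>6}: {value}")

# A mean far from the median hints at skew or bad values
print("gap between mean and median:", stats["mean"] - stats["median"])
print("impossible values:", out_of_range(ages, 0, 120))
```

In this made-up sample the single value of 500 drags the mean far above the median, which is exactly the kind of gap the statistics view helps us spot.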
We also have the option of hovering over the bars to see the exact range they represent. We can even select the bars, then select the Filter icon in the top-right corner (look for the red box in the above screenshot) of the histogram to see a detailed view.
Finally, we can select the "Edit" button in the top-right of the histogram to edit the settings.
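To make the idea of bars and their ranges concrete, here is a rough sketch of equal-width bucketing in plain Python. We are assuming equal-width buckets for illustration; the function name `histogram`, the `buckets` parameter, and the sample values are hypothetical and do not necessarily match the tool's own settings.

```python
def histogram(values, buckets=5):
    """Split the value range into equal-width buckets and count each one."""
    low, high = min(values), max(values)
    # Fall back to a width of 1 when every value is identical
    width = (high - low) / buckets or 1
    counts = [0] * buckets
    for v in values:
        # The maximum value belongs in the last bucket, not one past it
        index = min(int((v - low) / width), buckets - 1)
        counts[index] += 1
    # Each entry is (range start, range end, count)
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(buckets)]

# Made-up measurements, purely for illustration
values = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]

bars = histogram(values, buckets=4)
for start, end, count in bars:
    # Print the range each bar covers, followed by a text bar
    print(f"{start:.1f} to {end:.1f}: {'#' * count}")
```

Changing `buckets` in this sketch shows how the same data can look smoother or spikier depending on how finely it is binned.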
At this point in this post, we wanted to showcase the "Pattern Frequency" chart. However, our dataset doesn't have a feature appropriate for this. Instead, we pulled a screenshot from Microsoft Docs (source).
Hopefully, this post showed you how Inspectors can add some much-needed visualizations to the data science process in AML Workbench. Visualizations are one of the most important parts of the Data Cleansing and Feature Engineering process, so any data science tool needs robust visualization capabilities. While we are not currently impressed by the breadth of Inspectors within AML Workbench, we expect that Microsoft will invest heavily in this area before the tool reaches general availability (GA). Stay tuned for the next post, where we'll walk through the Python notebook that predicts Species within the "Classifying Iris" project. Thanks for reading. We hope you found this informative.
Senior Analytics Associate - Data Science