In the early phases of a data mining project, when selecting, exploring and preparing the data, the analyst looks at quite a few graphs to get a quick understanding of what can be used and where further preparation and transformation are needed. Those graphs are disposable in most cases and need neither fancy axis labels nor alignment with the corporate design. What matters here is speed, ease of use and readability for the informed power user. The leading data mining suites, such as PASW Modeler (aka Clementine), SAS Enterprise Miner and also the open source workbench KNIME[1] from the University of Konstanz, are well prepared for this kind of mass production and disposal of graphs. This phase of data preparation and step-by-step building of understanding is often quite time-consuming. The ergonomics and efficiency gains provided here are among the raisons d’être of the mining suites, as opposed to isolated tools.
But from time to time, one or more of these disposable graphs turn out to be nuggets: valuable discoveries or insights that deserve to be shared with a wider audience. The graph (say, a histogram comparing the number of contacts for a few segments) is already there, and capturing this by-product should not slow us down on our way towards the model we are heading for. Unfortunately, the histogram’s layout, axes and captions do not comply with the visualisation standards and rules we have been preaching and teaching[2]. A histogram with tick marks at 0.8, 1.8 and so on up to 19.8, for data that could hardly be more integer-valued, is simply not presentation-ready. Too bad.
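To make the tick-mark problem concrete: for integer-valued data such as contact counts, bins of width 1 centred on each whole number keep every bar, and therefore every tick, on an integer. The following is a minimal sketch in plain Python under that assumption; the function names and the sample `contacts` list are hypothetical illustrations, not features of any of the suites mentioned. The resulting edges could be passed as the `bins` argument of a plotting routine such as matplotlib's `hist`, which accepts a sequence of bin edges.

```python
from collections import Counter


def integer_bin_edges(values):
    """Return bin edges of width 1 centred on each integer from min to max.

    Assumes `values` is a non-empty sequence of ints (hypothetical helper).
    """
    lo, hi = min(values), max(values)
    # Edges at k - 0.5 put each integer k in the middle of its own bin.
    return [k - 0.5 for k in range(lo, hi + 2)]


def integer_histogram(values):
    """Count occurrences of each integer, including zero counts for gaps."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    return {k: counts.get(k, 0) for k in range(lo, hi + 1)}


# Hypothetical number of contacts per customer in one segment.
contacts = [0, 1, 1, 2, 2, 2, 5]
print(integer_bin_edges(contacts))
print(integer_histogram(contacts))
```

Centring the bins on integers, rather than letting a default algorithm pick bin boundaries from the data range, is what keeps labels such as 0.8 or 1.8 from appearing on the axis.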
Starting up another tool and reproducing the selection, the small transformation and the histogram itself will likely take a few minutes and distract us from our core task: the model to be built.
So in the end, we might be inclined to limit the celebration of the nugget to showing it to the colleague who happens to be having a coffee two meters away. Anything more would be too much hassle and not as goal-oriented as we ought to be, at least as long as data mining suites and presentable graphics remain two different stories.
PS: This actually holds more for Clementine and KNIME, which aspire to be exhaustive and fully integrated. With SAS, depending on where the data come from, two tools (Guide and Miner) will often be needed anyway.
[1] The three tools are not an exhaustive list; they simply happen to be the ones I have used.
[2] See, for example, http://www.perceptualedge.com/blog/ or various publications on data visualisation for guidelines on how presentable graphs should look.