Monday, November 21, 2011

Reaction: Information Visualization for Text Analysis.

The subject introduced me to a field that was completely new to me: "text mining". I felt that this reading presents a few good case studies of text mining and critiques a couple of real-world examples. It is always enjoyable to learn about interfaces and the purpose behind their creation. Identifying key elements within a text and showing their interconnections can be used in pattern recognition and pattern analysis. Especially when the text is ambiguous, a natural language processing system can use text mining to surface patterns and make predictions.

The chapter is, however, limited in scope. The authors discuss applications of information visualization in only three areas (text mining, document concordances and word frequencies, and literature and citation relationships), with basic examples for each. In the section on document concordances and word frequencies, the authors again prefer critiquing a few ideas to going in depth and giving the reader a detailed analysis of how to think about visualizing documents and word frequencies. The examples seem outdated, some going as far back as 1994. The positive takeaway is the neat categorization of examples, from tag clouds to TextArc to bar charts. The variety of examples is good, and it certainly helped me decide which visualization to choose for a given requirement. The baby-names example has always been my favorite.

The literature and citation relationships section points to a couple of possible applications, such as detecting plagiarism. Suppose several papers all cite a common earlier paper, and a different paper cites that whole set but not the common source. This hints that its authors may not have acknowledged the original work done previously. Also, the importance of a paper can be estimated from the degree of its node: a high in-degree means that many papers cite it, which suggests it is of high importance. I found it interesting to note the shift in analysis from nodes and links towards interacting with the links. This approach makes it easier to drill down into a particular time frame or author, for example to see which papers an author cited most often.
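The two graph ideas in this paragraph can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the chapter's method: the citation graph is a hypothetical dict mapping each paper to the set of papers it cites, importance is taken to be in-degree, and `missing_ancestor_citations` (a hypothetical helper) flags a paper when at least `min_shared` of its references cite a common source that the paper itself does not cite.

```python
from collections import Counter

# Hypothetical citation graph: each paper maps to the set of papers it cites.
citations = {
    "P": set(),
    "A": {"P"},
    "B": {"P"},
    "C": {"P"},
    "X": {"A", "B", "C"},  # X cites A, B and C but not their common source P
}


def in_degree(graph):
    """Count how many papers cite each paper (its in-degree)."""
    counts = Counter({paper: 0 for paper in graph})  # include uncited papers
    for cited in graph.values():
        counts.update(cited)  # +1 for every paper in this reference list
    return counts


def missing_ancestor_citations(graph, min_shared=2):
    """Return (paper, ancestor, n) where n >= min_shared of the paper's
    references cite `ancestor`, but the paper itself does not."""
    flags = []
    for paper, refs in graph.items():
        shared = Counter()
        for ref in refs:
            shared.update(graph.get(ref, set()))  # what each reference cites
        for ancestor, n in shared.items():
            if n >= min_shared and ancestor not in refs and ancestor != paper:
                flags.append((paper, ancestor, n))
    return flags


print(in_degree(citations).most_common(1))    # [('P', 3)]
print(missing_ancestor_citations(citations))  # [('X', 'P', 3)]
```

A flag like this is only a cue for a human to look closer; papers often omit indirect sources for legitimate reasons, and the `min_shared` threshold is an arbitrary choice for illustration.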

Overall, I felt that the paper is very basic, and its conclusion does not explain why the authors chose to critique examples and tools rather than help the reader understand along what lines good visualizations can be built.