As the title suggests, this paper discusses the application of visualization to the analysis of text and documents. This topic was very new to me, so I had a lot to learn from it. I gained good insight into its applications in text mining.
The authors say that the most common strategy in text mining is to identify important entities within the text and show the connections among them. Through a series of examples, including the TAKMI system, the Jigsaw system, and the BETA system from IBM's WebFountain project, they explain what it means to identify entities and how the connections between them are shown. But I wanted to know the underlying principles: how do these systems decide whether an entity is important, and how do they decide whether two entities should be connected?
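To make my question concrete, here is a minimal sketch of one simple way it could be answered. It is not the method used by TAKMI, Jigsaw, or BETA, which the paper does not spell out. It assumes the entities are already known (a hypothetical hand-written list), scores an entity's importance by the number of documents that mention it, and connects two entities when they appear together in at least `min_cooccur` documents. The function name `entity_graph` and the sample documents are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def entity_graph(docs, entities, min_cooccur=2):
    """Return (importance, edges) for a known entity list.

    Assumptions for illustration only: importance is document
    frequency, and an edge exists when two entities co-occur in at
    least `min_cooccur` documents. Matching is a crude
    case-insensitive substring test.
    """
    importance = Counter()
    pair_counts = Counter()
    for text in docs:
        lowered = text.lower()
        # Entities mentioned in this document, sorted so each pair
        # always appears in the same order.
        present = sorted(e for e in entities if e.lower() in lowered)
        importance.update(present)
        pair_counts.update(combinations(present, 2))
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_cooccur}
    return importance, edges


docs = [
    "Alice met Bob in Paris.",
    "Bob flew to Paris with Carol.",
    "Alice wrote to Carol.",
]
importance, edges = entity_graph(docs, ["Alice", "Bob", "Carol", "Paris"])
print(importance)
print(edges)
```

With these three sample documents, only Bob and Paris appear together twice, so that is the only connection kept. A real system would presumably use a proper entity extractor and a more careful association measure.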
The authors then discuss the TRIST tool in detail. I guess they were trying to underline the importance of categorizing the extracted data along different dimensions and giving users and analysts more flexibility. Moving on, they present methods for visualizing document concordances and word frequencies. Here they talk about alphabetical indexing and contexts. I agree with the idea of sticking to basics like these, as they are very easy for users to figure out. Here again they provide a lot of examples, such as DocuBurst, Word Tree, tag clouds, and word clouds. The paper is really easy to follow because of the number of detailed examples it provides. But I believe that is too many examples for one paper, and the reader can easily get distracted.
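The two basics mentioned above, word frequencies and a concordance (each occurrence of a word shown with its surrounding context), can be sketched in a few lines. This is only an illustration under my own assumptions, not code from the paper or from any of the tools it describes: the tokenizer is a simple regular expression, the function names are hypothetical, and sorting the concordance by the words that follow the keyword is my own choice.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes
    # (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top=3):
    # Counts of this kind are what a tag cloud or word cloud
    # maps to font size.
    return Counter(tokenize(text)).most_common(top)


def concordance(text, keyword, width=2):
    # Keyword-in-context: each hit with `width` words on either side.
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    # Order hits alphabetically by the context that follows the keyword.
    return sorted(hits, key=lambda hit: hit[2])


sample = "the cat sat on the mat and the dog sat on the rug"
print(word_frequencies(sample))
for left, word, right in concordance(sample, "sat"):
    print(f"{left:>10} [{word}] {right}")
```

Seeing how little is needed here makes me appreciate why the authors stick to these basics: the visualization only has to present counts and contexts that are cheap to compute.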