The paper is an interesting read for anyone interested in information retrieval from large document collections. The author points out the importance of context and structure in information access over such collections, and argues from this that the pattern of distribution of the terms making up a text is a critical structural aspect. The author played a vital role in emphasizing distributional information, which had received little attention at the time of the paper. The proposed visualization tool, TileBars, shows the distributional characteristics of terms within a document, along with other properties relevant to retrieval.
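To make the idea of term distribution concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `term_distribution` and the fixed-size word windows are my own assumptions, used only to show how one might count where a term falls within a document.

```python
def term_distribution(text, term, segment_size=5):
    """Count occurrences of `term` in each consecutive window of words.

    Assumption: fixed-size windows stand in for real text segments;
    this is an illustration, not the segmentation used in the paper.
    """
    words = text.lower().split()
    segments = [words[i:i + segment_size]
                for i in range(0, len(words), segment_size)]
    return [segment.count(term.lower()) for segment in segments]


doc = ("ranking helps retrieval but ranking hides structure while tiles "
       "show where each term occurs in the text")

ranking_counts = term_distribution(doc, "ranking")  # [2, 0, 0, 0]
text_counts = term_distribution(doc, "text")        # [0, 0, 0, 1]
```

The two lists differ in shape: "ranking" is concentrated at the start of the document while "text" appears only at the end, which is the kind of information a single relevance score does not convey.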
The author reviews the standard information retrieval methods in use at the time before briefly describing the visualization tool, which is meant to give users insight into how texts are retrieved for Boolean queries. In Boolean retrieval, a query is stated in terms of disjunctions, conjunctions, and negations among sets of documents that contain particular words and phrases. TileBars relies on the author's TextTiling algorithm and tries to improve text analysis by overcoming the drawbacks of traditional ranking-based approaches. It also makes use of various visualization properties to provide a better user interface. The author follows the explanation of the approach with a couple of examples and closes with proposals for future work.
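Boolean retrieval as described above can be sketched with plain set operations. This is a toy illustration under my own assumptions (the `build_index` helper and the three sample documents are made up), not code from the paper.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample collection, for illustration only.
docs = {
    1: "information retrieval from large text collections",
    2: "visualization of term distribution in text",
    3: "boolean retrieval with ranking",
}
index = build_index(docs)
all_ids = set(docs)

# Conjunction (AND) is set intersection.
retrieval_and_text = index["retrieval"] & index["text"]
# Disjunction (OR) is set union.
retrieval_or_visualization = index["retrieval"] | index["visualization"]
# Negation (NOT) is the complement with respect to the whole collection.
not_ranking = all_ids - index["ranking"]
```

Each query returns an unordered set of matching documents, with no indication of how or where the terms occur in them.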