Tuesday, November 22, 2011

TileBars: Visualization of Term Distribution Information in Full Text Information Access

The purpose of an information access system is to retrieve the information most relevant to the user's request. Many approaches to information retrieval have been proposed. This paper presents a promising one: a visualization paradigm called “TileBars”, which makes use of the structure of full-text documents.

The authors begin with a brief overview of query-based information retrieval and the problems this approach faces. They emphasize the need to analyze the data a query retrieves. They then highlight the main features of TileBars, chief among them the ability to view three things at once: the length of each retrieved document, the frequency of the query terms, and how those terms are distributed within the document. The visualization helps the user understand the role each query term plays in the retrieved documents, and see where standard information retrieval methods succeed or fail.
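To make the idea more concrete, here is a rough sketch of a TileBar-like display. It is not the paper's implementation. It splits a document into fixed-size word windows, whereas the paper relies on the documents' own text structure. It then shades each cell by how often a query term occurs in that window, so the width of the bar reflects document length and each row shows where one term appears. The names `tilebar`, `render`, and `SHADES` are hypothetical.

```python
# Hypothetical shading scale: index = term count in a segment (capped at the darkest shade).
SHADES = " .:#"

def segment(words, size):
    """Split a word list into fixed-size windows (a simplifying assumption, not the paper's method)."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def tilebar(text, term_sets, size=50):
    """Return one row per term set; each cell counts that set's terms in one segment."""
    # Normalize words: drop surrounding punctuation and ignore case.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    segments = segment(words, size)
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        rows.append([sum(1 for w in seg if w in wanted) for seg in segments])
    return rows

def render(rows, labels):
    """Draw each row as a bar of shade characters; longer documents give wider bars."""
    lines = []
    for label, counts in zip(labels, rows):
        cells = "".join(SHADES[min(c, len(SHADES) - 1)] for c in counts)
        lines.append(f"{label:>12} |{cells}|")
    return "\n".join(lines)
```

For example, `tilebar("web pages web links visualization of data graph layout web visualization tools", [["web"], ["visualization"]], size=4)` gives one row per term, `[[2, 0, 1], [0, 1, 1]]`. Passing that to `render` shows at a glance that "web" clusters at the start while "visualization" appears only in the second half.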

The paper is well structured. The authors give a short account of standard retrieval techniques and their drawbacks. TileBars is then introduced as a remedy for these drawbacks, and the design is explained clearly in terms of three main hypotheses. The paper concludes with a discussion of related work and possible future extensions.

I feel this approach matters to everyone who uses the internet to search for and retrieve documents. I often find myself confused and lost when I search with a generic keyword such as “web visualization” and most of the retrieved results are barely relevant to it. Users have to be quite creative in choosing the right combination of keywords to get useful results. TileBars is effective here because it presents the results visually, so the user can open the documents that are highly relevant to the search terms and ignore the rest. Besides reducing visual clutter, it also saves the user a great deal of time.