Tuesday, November 29, 2011

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper introduces TileBars, a new visualization for searching full-text documents rather than just their abstracts. It lets the user compare retrieved documents systematically. It also uses the structure of the text to show explicit term distribution information for Boolean-type queries. I had not come across this approach to full-text retrieval before.
The algorithm in TileBars splits each paper into "tiles". A tile is basically a subtopic-level segment of the full article, and for each tile the system records how often each query term is used in it. The results are presented visually, so the user can see the relevance of each article from the distribution and frequency of the terms. This is very effective, because the user can open the documents that look highly relevant to the search terms and ignore the others.
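To make the idea concrete, here is a minimal sketch of counting query terms per tile and drawing the result as a row of characters. It is my own illustration, not the paper's implementation: the function names (`split_into_tiles`, `term_counts_per_tile`, `render_tilebar`) are hypothetical, and I simply group a fixed number of paragraphs into each tile as a stand-in for real subtopic segmentation.

```python
import re
from collections import Counter


def split_into_tiles(text, paragraphs_per_tile=2):
    """Group consecutive paragraphs into fixed-size tiles.

    Assumption: fixed-size grouping stands in for real subtopic segmentation.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        " ".join(paragraphs[i:i + paragraphs_per_tile])
        for i in range(0, len(paragraphs), paragraphs_per_tile)
    ]


def term_counts_per_tile(tiles, term_sets):
    """Return one row per term set, holding the number of hits in each tile."""
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = []
        for tile in tiles:
            words = Counter(re.findall(r"[a-z0-9]+", tile.lower()))
            row.append(sum(words[t] for t in wanted))
        rows.append(row)
    return rows


def render_tilebar(rows, shades=" .:#"):
    """Map each count to a character; denser characters mean more hits."""
    top = len(shades) - 1
    return "\n".join("".join(shades[min(c, top)] for c in row) for row in rows)


doc = (
    "Cats chase mice.\n\nCats sleep a lot.\n\n"
    "Dogs bark loudly.\n\nDogs and cats can share a home."
)
tiles = split_into_tiles(doc, paragraphs_per_tile=2)
rows = term_counts_per_tile(tiles, [["cats"], ["dogs"]])
print(rows)
print(render_tilebar(rows))
```

Each row corresponds to one set of query terms and each column to one tile, so a document whose rows are dense in the same columns is one where the terms appear together in the same part of the text.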
The author makes an interesting point: information access methods like this are not isolated, and depend on factors such as document subsets. I would be particularly interested in seeing the results of applying this technique to very long documents, and how much clutter and time it can save the user. Overall, it is a very good article for anyone interested in data mining and information retrieval.