Friday, November 25, 2011

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

     Marti Hearst presents an interesting visualization for displaying Boolean-style search results over full-text documents. The "TileBars" paradigm divides each document into subtopic segments called TextTiles. The Boolean search is run against each TextTile, and a small graphic is shown beside each search result to help the user see roughly how many subtopics a paper has, how frequently the search terms occur in each subtopic, and how the terms are distributed across the paper. This lets the reader see whether their search terms are scattered throughout the paper or concentrated in one or two subtopics. I'm curious whether the algorithm includes the abstract as a 'subtopic'.
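     To make the idea concrete, here is a minimal sketch of the kind of per-segment term grid a TileBar displays. It is not the paper's method: it assumes fixed-size word windows in place of real TextTiling subtopic boundaries, and the names `split_into_tiles`, `term_grid`, and `render` are my own, chosen only for illustration.

```python
import re
from typing import Dict, List


def split_into_tiles(text: str, words_per_tile: int = 50) -> List[List[str]]:
    """Split text into fixed-size word windows.

    Assumption: this is only a stand-in for real subtopic segmentation,
    which would place boundaries where the vocabulary shifts.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [words[i:i + words_per_tile] for i in range(0, len(words), words_per_tile)]


def term_grid(tiles: List[List[str]], terms: List[str]) -> Dict[str, List[int]]:
    """Count how often each query term occurs in each tile."""
    return {t.lower(): [tile.count(t.lower()) for tile in tiles] for t in terms}


def render(grid: Dict[str, List[int]]) -> str:
    """Draw one row per term; a darker character means more hits in that tile."""
    shades = " .:#"  # 0 hits, 1 hit, 2 hits, 3 or more hits
    width = max((len(t) for t in grid), default=0)
    lines = []
    for term, counts in grid.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        lines.append(f"{term.ljust(width)} |{cells}|")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = "cat dog cat bird dog dog fish cat"
    tiles = split_into_tiles(doc, words_per_tile=4)
    print(render(term_grid(tiles, ["cat", "dog"])))
```

     Each row is one search term and each column is one segment, so a term that shows up in only one or two columns is concentrated in a few parts of the document, while a term that fills the row is spread throughout.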
     This visualization seems very useful for full-text searches, such as those done for research. I don't see it being very useful for general web searches or having many other uses. I would be very interested in reading more about the author's TextTiling algorithm and how she evaluated it. To me that seems to be a very important aspect of the TileBars system, and yet the author went into very little detail about it. She says it is "serviceable" for TileBars, but how did she come to that conclusion?
     I did appreciate her analysis of the difference between the abstract of an article and its full text. Everything made sense and was intuitive, but I had never thought about the difference from a search perspective, so the simple analysis was refreshing.