Monday, November 21, 2011

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper is an early approach to text analysis of full documents. It starts by pointing out that, up until the time the article was written, most text analysis of documents focused on titles and abstracts. The paper introduces a system that analyzes the entire document and makes the case for why that would be preferable.

This paper seemed to focus on the unique structure of academic papers and how that could be exploited for relevance searches. I thought that it was an interesting idea to incorporate, but wasn't sure how it could be applied more generally, since many documents don't follow the same structure as academic papers. Indeed, the TextTiling algorithm would only be useful on academic papers and not on, say, a novel.

Overall this was a good introduction to the issues involved in analyzing text documents. I think it was a little too specialized, but the tool they introduced seemed to do a fairly good job at the analysis it was attempting. The best point I took away from this paper was in the related work section, where they talk about how difficult document content information is to display with existing graphical interface techniques. In that case, it would make sense to break the problem into sections as best you could and deal with the smaller problems, as opposed to tackling the big problem all at once.