Tuesday, November 22, 2011

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The paper is more than a decade old. I don't know whether there is any analogue of Moore's law for visualization over time, but the paper was still an interesting read and fairly informative.

Here, the author discusses TileBars, a visualization approach that shows how query terms are distributed across a full-text document. At the start, the author explains how information is retrieved by running queries and lists the pros and cons of that approach.

I found the TileBars approach quite interesting, and it makes clear why running such queries matters. About 16 years after this paper was published, there are a number of real examples today that have inherited from this approach.

The author also describes an interesting algorithm called TextTiling, which splits a document into segments so that the frequency distribution of the query terms across the text can be examined and any patterns spotted. However, I think this approach is quite old now. We certainly have better text summarization and information extraction methods based on word clusters and ranking algorithms. I'm not sure whether TextTiling was the tipping point for such algorithms.
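
As a rough illustration of the idea, here is a minimal Python sketch of a TileBars-style display. It is not the paper's implementation: instead of TextTiling it simply cuts the text into fixed-size word windows, and the function names (`split_into_segments`, `term_distribution`, `render_rows`), the window size, and the shading characters are all my own assumptions.

```python
import re
from collections import Counter


def split_into_segments(text, words_per_segment=20):
    """Cut text into fixed-size word windows.

    Assumption: this is only a crude stand-in for TextTiling, which
    looks for subtopic boundaries rather than using a fixed length.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [words[i:i + words_per_segment]
            for i in range(0, len(words), words_per_segment)]


def term_distribution(text, query_terms, words_per_segment=20):
    """Return {term: [count in segment 0, count in segment 1, ...]}."""
    segments = split_into_segments(text, words_per_segment)
    counts = [Counter(segment) for segment in segments]
    return {term: [c[term.lower()] for c in counts] for term in query_terms}


def render_rows(distribution, shades=" .:#"):
    """Draw one text row per term; a darker character means more hits."""
    rows = {}
    for term, freqs in distribution.items():
        # Clamp each count to the darkest available shade.
        rows[term] = "".join(shades[min(f, len(shades) - 1)] for f in freqs)
    return rows


if __name__ == "__main__":
    sample = "cat dog cat bird fish dog"
    dist = term_distribution(sample, ["cat", "dog"], words_per_segment=3)
    for term, row in render_rows(dist).items():
        print(f"{term:>5} |{row}|")
```

Each row corresponds to one query term and each column to one segment, so a glance shows where in the document the terms occur and whether they occur together.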

Overall, I learned a couple of useful things from this paper, and I think it would be very useful for anybody working in data mining and information extraction.