Monday, November 28, 2011

TileBars: Visualization of Term Distribution Information in Full Text Information Access

                This paper presents a new visualization method, paired with a text segmentation algorithm, for searching full-text articles rather than only abstracts or short documents, in a way that lets you compare different articles systematically and without bias. TileBars uses a segmentation algorithm (TextTiling, in the paper) to split each document into “tiles”, where each tile represents a subtopic of the full article, and it records how often each search term occurs within each tile. The visualization then lets you quickly see the relative relevance of each article based on the distribution and frequency of the terms.
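                To make the idea concrete, here is a minimal sketch in Python of the term-distribution step. It is not the paper's implementation: it assumes fixed-size word windows as a crude stand-in for real subtopic segmentation, and the function names (split_into_tiles, term_distribution, render_tilebar) and the text shading are my own hypothetical choices for illustration.

```python
import re


def split_into_tiles(text, words_per_tile=50):
    """Split text into fixed-size word windows.

    Assumption: fixed windows stand in for true subtopic segmentation.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [words[i:i + words_per_tile]
            for i in range(0, len(words), words_per_tile)]


def term_distribution(text, terms, words_per_tile=50):
    """Count how often each search term occurs in each tile."""
    tiles = split_into_tiles(text, words_per_tile)
    return {term: [tile.count(term.lower()) for tile in tiles]
            for term in terms}


def render_tilebar(rows, shades=" .:#"):
    """Draw one row per term; a darker character means more hits in that tile."""
    lines = []
    for term, counts in rows.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        lines.append(f"{term:>12} |{cells}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy document: both terms early on, only one at the end.
    doc = "liver toxicity " * 2 + "filler " * 4 + "liver " * 3
    rows = term_distribution(doc, ["liver", "toxicity"], words_per_tile=4)
    print(render_tilebar(rows))
```

                Each row is one search term and each cell is one tile, so a document where both rows are dark in the same cells discusses the terms together, while dark cells that never line up suggest the terms only appear in unrelated passages.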

                Though this paper was written in 1995, I have not seen many text mining systems that incorporate the positional information of terms within a document and compare it across documents. This could be very useful in the early stages of information retrieval for certain research projects, because it gives a quick visual estimate of each article's relative importance in a search. When you search through something like the PubMed database, where there are literally millions of articles, the more refined your search becomes, the more likely you are to miss critical papers. This technique would allow you to relax the precision of a search and return a larger collection of papers. Normally that would be problematic, since more papers returned (and sometimes there are thousands) means more papers to read. TileBars instead provides a visual cue as to which articles are most important to read first. Even though this approach produces no true ranking, the display makes it easy to judge how relevant a paper is likely to be from its profile, which saves time and effort. This technique would be welcome in the research that I do at the EPA, and I will most likely recommend looking into it.