Saturday, November 19, 2011

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

In keeping with the other papers in this set, this paper also aims to make the analysis of information easier. A given text or data set is analysed for various reasons and against various parameters. Although reports are still widely used for such analysis, they do not always provide a comprehensive picture: analysis based on reports alone may fail to identify the entities, the relationships between them, and other information. Hence the authors propose a tool called “Jigsaw”, which provides an in-depth analysis of the information.

As the name suggests, Jigsaw first identifies the entities from a text and then relates them to each other, like the pieces of a jigsaw puzzle. One point the authors stress is that they still believe in and support the use of reports for analysis, but propose Jigsaw as an additional tool for better understanding. Jigsaw gives the analyst a multi-view perspective and represents the data with an interactive visualization. The information is presented in four views: a tabular connections view, a semantic graph view, a scatter plot view and a text view. An action on an entity in one view is passed to the other views as an event and reflected there for analysis. The tool lets analysts query or search for keywords in the data, and analysts can also draw diagrams while inferring information from the text, as Jigsaw integrates with Microsoft OneNote. However, the authors state that Jigsaw may need to be spread across several screens for the data to be viewed properly, which can be an overhead for the tool.

In the second half of the paper, the authors discuss the implementation of Jigsaw. Jigsaw is built in Java and accepts XML as input. It is designed following the MVC architecture. As for visualization, the tool presents the analysed data as a list, a graph, text and a scatter plot. These are well illustrated in the paper with examples. As can be inferred from the paper, Jigsaw is primarily meant for investigative purposes, but the tool can be worked on further to extend its usage. Also, in my opinion, the authors should aim to increase the number of views in which the tool can present the data.
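To make the "action in one view becomes an event in the others" idea concrete, here is a minimal sketch of how linked views could share a model. It is written in Python for brevity rather than in Jigsaw's Java, and the class and method names (EntityModel, View, on_entity_selected) are my own assumptions, not Jigsaw's actual API.

```python
class EntityModel:
    """Shared model: remembers selected entities and notifies registered views."""

    def __init__(self):
        self._listeners = []
        self.selected = set()

    def add_listener(self, listener):
        self._listeners.append(listener)

    def select(self, entity, source=None):
        # Record the selection, then forward it as an event to every other view.
        self.selected.add(entity)
        for listener in self._listeners:
            if listener is not source:
                listener.on_entity_selected(entity)


class View:
    """Hypothetical view: keeps a list of the entities it currently highlights."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.highlighted = []
        model.add_listener(self)

    def click(self, entity):
        # A user action in this view: highlight locally, then tell the model.
        self.highlighted.append(entity)
        self.model.select(entity, source=self)

    def on_entity_selected(self, entity):
        # Event raised by another view through the shared model.
        self.highlighted.append(entity)


model = EntityModel()
list_view = View("list", model)
graph_view = View("graph", model)
list_view.click("Person: J. Smith")
```

After the click in the list view, the graph view highlights the same entity without the two views knowing about each other, which is roughly the benefit one would expect from an MVC-style design.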

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

I liked the start of this paper, which stresses the need to analyse the data retrieved by a query. The idea suggested here gives more thought to the structure and properties of the data retrieved by a query, in particular the length, frequency and other properties of the words. The tool the author suggests in this paper, “TileBars”, promises to provide an efficient solution for this.

I particularly liked the structure of the paper. The author first presents the standard information retrieval techniques. In particular, there is a discussion of the “similarity search” method for understanding the similarity between information retrieved from multiple document sources. The author points out some drawbacks of similarity search techniques, such as ranking, the placement of information and the way information is discussed throughout the text. There is also a difference between comparing against the full text and against the abstract. TileBars is presented as a solution to these problems.

TileBars provides a good visual summary of the data retrieved by a query. In particular, I liked how the length of each bar varies with the document length. It also gives a good graphical representation, using changes in brightness to keep track of the frequency of the terms. The method seems quite useful in the example medical scenarios presented in the paper, though a discussion of a few more instances or examples would have demonstrated the usability of the tool better.
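A rough sketch of the idea, as I understand it: split a document into segments, count each query term per segment, and map the counts to shade levels, so that longer documents get longer bars. The fixed-size word segments, the function name tilebar and the four shade levels are my own simplifying assumptions, not the paper's actual method.

```python
def tilebar(text, terms, segment_size=50, levels=4):
    """Return one row per query term; each row holds one shade level per segment.

    Assumption: segments are fixed-size runs of whitespace-separated words,
    and punctuation is not stripped.
    """
    words = text.lower().split()
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    rows = []
    for term in terms:
        counts = [segment.count(term.lower()) for segment in segments]
        # Clamp the count so it fits the available shade levels (0 = blank).
        rows.append([min(count, levels - 1) for count in counts])
    return rows


SHADES = " .:#"  # index 0 is the lightest shade, index 3 the darkest


def render(rows):
    """Draw the grid as text, one line per query term."""
    return "\n".join("".join(SHADES[level] for level in row) for row in rows)


# Toy 30-word document: "cancer" clusters at the start, "research" near the end.
doc = " ".join(["cancer"] * 3 + ["x"] * 17 + ["research"] + ["x"] * 9)
rows = tilebar(doc, ["cancer", "research"], segment_size=10)
print(render(rows))
```

Here rows comes out as [[3, 0, 0], [0, 0, 1]]: three columns because the document is three segments long, with the darkest tile where "cancer" is concentrated.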

This is the second work by Marti A. Hearst that has been reviewed in this set. I would say that the author gives a lot of importance to information visualization and to understanding the visuals presented. “TileBars”, presented in this paper, serves the same purpose.

Reaction: Tag Clouds and the Case for Vernacular Visualization

To start with, this article was very short and precise. It expects the reader to be familiar with several concepts, such as tag clouds, word visualization and other visualization techniques. However, the authors present numerous illustrations to support their discussion. The first example of a word cloud that came to my mind after reading this article is the one on our course blog, which is built along similar lines.

The concept of tag clouds has been around for quite some time, and the article explains it well. The history of the idea and its first instance, by social psychologist Stanley Milgram, are well explained, and we can find a similar implementation among the tag cloud examples seen in The New York Times today. The authors coin the term “vernacular visualization” for tag clouds because the idea was born outside the world of computers and the research community. Data is depicted in tag clouds using size- or color-differentiated text, timeline-based text, or two-word phrases as in IBM’s ManyEyes. The authors also focus on some of the theoretical problems of tag clouds: the need for the words to be ordered alphabetically, related words ending up far apart (for example, east and west), and words gaining importance because of their size. These factors are discussed with the help of experiments. In spite of all these factors, the authors attribute the success of tag clouds to the friendly appearance they present to viewers. I support this argument: tag clouds do give the viewer a quick sense of the information present in a text or a set of related words.

Reaction: Information visualization for text analysis

The main theme of this paper is the analysis of data in text, which the authors present using visualization. What I liked about this paper is that the aim of the work is clearly defined at the start: text mining and the use of visualization for it; visualization of words or phrases and building concordances of them; and finally visualizing the relationships between words and their usage in the language.

The paper presents various techniques, each of which corresponds to one of the visualization techniques discussed in class. For text mining, the systems discussed include the TAKMI text mining system, the JIGSAW system, the BETA system and the triage system. Of the four, I too found the triage system to be effective for visualization, as the paper discusses: its icons and its grouping of related items prove effective for analysts studying the data. The paper next turns to visualizing concordances and word frequencies in text. Of the methods described for this, I liked the Sunburst (modified as DocuBurst) method the most. It gives a good view of the text treemap and effectively links the words in a text. Baby Name Voyager is another example that has been discussed many times in class and is pointed out as an effective way to visualize text for analysis. Towards the end of the paper, the author discusses methods for visualizing the numbers of citations and the relations between them. Many works are used as references and cited widely, and this kind of visualization helps in identifying the most cited work or field of work across the references. The linked bubble graph used here serves this purpose well.

Thus this work gives us an overview of using visualization as a tool for analysing the text and data. It has provided us with an example for each of the techniques discussed in the class.

Viz: Fukushima radiation spread: wide dispersion and localized hot spots

Reaction: Tag clouds and the case for vernacular visualization.

The article describes the problems of a "vernacular" visualization tool such as tag clouds, and presents an analysis of when the first tag cloud was created and how Web 2.0 used it to aggregate unstructured information.

In my opinion, this article describes the problems of tag clouds and helps the reader realize that, even though they have many problems, people still use tag clouds because they are a simple way to visualize text. The article finishes by asking theorists to rethink their visualization tools. I agree with the article that tag clouds have all the problems it describes (i.e. long words get unnoticed, finding a word can be difficult, font size doesn't give concrete information about the word), but they can be a simple way to understand the text you are analyzing, and can be a first step towards understanding your whole text corpus.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper, like the previous three, introduces a new visualization system for investigative analysis, called Jigsaw. Jigsaw provides analysts with several different views of a document collection. It focuses primarily on displaying connections between entities in the documents. The authors claim that each view provides a different perspective on the data, and the views thus become the main theme of the paper.

It was interesting to learn that Jigsaw is written in Java and is based on the MVC pattern. The paper provides some technical background on Jigsaw in the preceding section. The focus of the article then shifts to the different views provided by Jigsaw, namely:

  • List View: shows connections between sets of entities, and what needs to be done depending on the situation (the list being too long or too short).
  • Graph View: represents reports and their entities in a traditional node-link graph/network visualization. Jigsaw is compared here with other graph visualizations such as GreenLand, and I agree with the authors' point about Jigsaw drawing the view incrementally rather than depicting everything in one large layout.
  • Scatter Plot View: highlights pairwise connections between entities. There was not much in this view that I didn’t already know about.
  • Text View: based on the actual text of the reports. The authors claim that multiple reports can be loaded into one Text View, and I like the idea of showing them using multiple tabs within the same window.

The paper then describes a scenario in which Jigsaw was applied, and its results. It concludes with the limitations of the system, the main one being the lack of a way to represent the estimated likelihoods or probabilities that investigative analysis often takes into consideration. Overall, a descriptive read.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This is a very old paper, again from Marti Hearst, which focuses mainly on introducing a new visualization paradigm called TileBars. The paper starts with a taxonomy of a few terms involved in visualizing full-length text articles, followed by a list of the problems faced in similarity search, the ranking problem, and how searching abstracts can sometimes be misleading.

The paper then moves to its main theme, TileBars, and the next few sections describe how they help in searching for terms in full-length text.

I also noticed a few important cues for visualizing TileBars, such as:
  • Variation in position, size, value or texture imposes an order which is universal and immediately perceptible. I agree with this point completely, because varying just one of these parameters won't have the same visual impact as varying them in combination.
  • Varying the shading implies less or more within a region, and this works far better than varying color.

Although I knew that document search relies on keyword frequency and distribution, this way of representing that fact was new to me, and it helped me get to know TileBars in detail.

Friday, November 18, 2011

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper is very informative, in the sense that it first explains what vernacular visualization is: basically, visualization that comes from non-academic sources, the tag cloud being one example. The paper then covers the various ways to represent a tag cloud, and the advantages and limitations of this kind of visualization, among other useful points. After a brief introduction to tag clouds, described as a visualization that came to light with the advent of Web 2.0, when there was tremendous activity on websites and coping with it was left to tag clouds, the paper points out their history, which is largely factual and not relevant to the main theme of the paper.

The paper then jumps to the various ways of representing tag clouds. Depending on the text being examined, different tag clouds can be plotted, namely:
  • Timeline-based tag cloud, where a timeline is associated with the tag cloud
  • Color-varied or size-varied tag cloud, where words vary in color and size based on their usage/frequency
  • Two-word view of IBM’s, which groups words in pairs and depicts a two-word view, among others.
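The size-varied variant is simple enough to sketch. The snippet below counts words and scales each count linearly into a font size; the function name tag_sizes, the point range and the linear scaling are my own assumptions for illustration, not something the paper prescribes.

```python
from collections import Counter


def tag_sizes(text, min_pt=10, max_pt=36, top_n=20):
    """Map the most frequent words to font sizes in points (linear scaling)."""
    counts = Counter(word for word in text.lower().split() if word.isalpha())
    top = counts.most_common(top_n)
    if not top:
        return {}
    lowest = min(count for _, count in top)
    highest = max(count for _, count in top)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: round(min_pt + (count - lowest) * (max_pt - min_pt) / span)
        for word, count in top
    }


sizes = tag_sizes("east east east west west north")
# Alphabetical layout, as in many tag clouds.
for word in sorted(sizes):
    print(word, sizes[word])
```

With this toy input, "east" gets 36 pt, "west" 23 pt and "north" 10 pt. Sorting alphabetically puts "north" between "east" and "west", a small instance of related words being pulled apart by the ordering.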

Finally, the paper describes the theoretical problems faced by tag clouds, mainly:
  • Long words getting undue importance
  • Closely related words like east and west being separated by alphabetical ordering

There are research studies that support this view.

Marti Hearst, whose paper on information visualization for text mining I read prior to this, suggested that even if tag clouds have limitations of their own, they serve as social signifiers that create a friendly atmosphere and provide a point of entry into a complex site, which makes them a valuable asset when summarizing the activities of people on such sites.
Overall, a very good paper; I learnt that there are indeed classifications, or I should say types, when it comes to visualizing unstructured text in a so-called tag cloud.

This paper is well followed up by an article I read some time back, which gives a complete picture of tag clouds -


The paper provides several examples of tools used in text mining and shows how they visualize their results to make them easier to understand. The paper is self-explanatory and easy to read; I liked the way it is organized into:

  1. Visualization for text mining, where the author explains how visualizing text mining results is becoming a promising tool, citing several examples along the way, such as the TAKMI text mining system and the BETA system of IBM’s Web Fountain project
  2. Visualization in documents and websites, where Marti explains that word frequencies, i.e. concordance visualization, along with other methods like tag clouds on websites and theme rivers on sites covered earlier in class, viz. NameVoyager, help in understanding a text collection
  3. Visualization in literature and citations, where Marti explains that, although the text is semi-structured, visualization can help in literary analysis.

I agree with Marti’s view that visualization has also been applied to online conversations and other forms of social interaction that have textual components, and I believe such analysis helps in getting acquainted with the trends in day-to-day activities. IBM’s should get a special mention in this type of visualization, and the author gives it one.

Overall, I feel it was a good paper: short, precise and to the point.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

The first thing that stood out to me as I read this article is the authors' consideration of the cost of analyzing documents and determining relationships. This also raised the question of how verification of the analysis will be carried out. Finally, will this system be able to replace the process of manually forming a mental model of the information presented in the papers?

As I kept reading, I realized the Jigsaw system is a much more intuitive application than I had imagined. Although limited to a multi-monitor set-up, it definitely has practical uses in academic and professional settings. In my opinion, Jigsaw's graph view is the most useful of all the views presented in the paper. It provides users with the simplest method of visualizing the relationships between the reports being analyzed. The other views seem more involved and would require the user to do further investigation, whereas the graph view could allow the user to form a mental model at a quick glance.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The problems this paper describes with meaningful text search over lengthy documents are ones I can relate to this semester. I have had trouble finding research papers when searching for a broad topic such as "web engineering" and have had to get creative in determining additional keywords to narrow the search. I have begun to notice that many paper abstracts are accompanied by a comma-delimited list of keywords that the authors believe may prove significant and useful in a database search. A major limitation of this technique is: how does one know which keywords the author has selected?

The TileBars interface is quite intuitive. By providing a visual indication of query prominence, it lets a searcher fly through pages of papers that are insignificant to what he or she is looking for. I have been using Inspec and Compendex to search for papers related to web engineering this semester without this ability. I find myself wasting time by opening a link for every abstract of every paper that has a title that may potentially be related to the specific topic I am searching for. The number of tabs in my browser becomes difficult to handle, and I often close tabs that contain what I'm looking for because the pages for each search result look identical. An improvement that I think would enhance the experience of TileBars would be to sort the results based on which bar has the most "blotting." This would allow the searcher to see the most prominent papers first.
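The sorting I have in mind could look something like the sketch below. It assumes each result already comes with a grid of shade levels (one row per query term, one number per document segment, 0 meaning blank), and it simply sums those levels as a stand-in for "blotting". The data layout and the name rank_by_blotting are hypothetical, not part of the TileBars paper.

```python
def blotting(rows):
    """Total darkness of one bar: the sum of all shade levels in its grid."""
    return sum(sum(row) for row in rows)


def rank_by_blotting(results):
    """Return document titles ordered from most to least blotting.

    results maps a title to its grid of shade levels (hypothetical layout).
    """
    return sorted(results, key=lambda title: blotting(results[title]), reverse=True)


results = {
    "Paper A": [[0, 1], [0, 0]],
    "Paper B": [[3, 2], [1, 0]],
    "Paper C": [[1, 1], [1, 0]],
}
ranking = rank_by_blotting(results)
print(ranking)
```

One caveat with a plain sum is that longer documents have more tiles and could outrank short, focused ones, so dividing by the number of tiles might be a fairer score.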

Guest: Billy Houghteling at crit on Nov 28

Hey folks,

We will be visited by Billy Houghteling, director of the Office of Tech Transfer at NC State, and founding director of Springboard @ NC State, a university incubator. He'll offer his input on your critiques.

Looks like he'll also be visiting us during final project presentations. More on those soon.



Thursday, November 17, 2011

Find: in defence of bad graphics

See in particular the "word clouds harmful" link and video.

Find: NY Times cross-platform ads in HTML5

NYT goes html5 across its sites.

The New York Times runs a cross-platform interactive ad campaign in HTML5
via Nieman Journalism Lab by Megan Garber on 11/15/11

Reaction: Tag Clouds and the Case for Vernacular Visualization

This short paper discusses the history, application, and apparent contradictions of the word cloud style. I found it an interesting and informative read. The history of the word cloud is particularly interesting to me in that such styles of interface originally had no real purpose, much like mathematical results that, at the time of discovery, were mere computational constructions but decades later provided the foundations for solving real-world problems, such as Maxwell’s equations for electricity and magnetism, or number theory, which has such strong applications in computing. Something that strikes me as particularly peculiar, most likely because I am an American, is that the Eiffel Tower is not as well known as the Seine.

Tag clouds, being densely constructed, can provide a wealth of information about word use, including word association. What word clouds tend to fail at, however, is directly comparing words of different lengths, where a simple ordered list of words is more effective, and efficient ordered search (except in the case of an ordered tag cloud). At a glance, these clouds seem to show a divide between purpose and goals within the visualization community: they violate many rules, yet provide what some would call a vital function.

Reaction: Tree-Maps: A space filling approach to visualization of hierarchical information structures

Tree maps are one of the most widely used visualizations for representing hierarchical data. One of the prominent examples are The paper does well in explaining the pros and cons of using a tree map. Tree maps may be one of the better ways of displaying hierarchical information in an aesthetically pleasing way, but they have their own limitations on space and on representing additional data.

The article compares a tree listing, a Venn diagram and a tree map using a file listing as an example. This example portrays one of the underlying strengths of a tree map: showing size/importance together with the necessary text. We can use size, color and text to show these three dimensions. For the same data, a Venn diagram would have taken much more space because it uses circular shapes. However, a tree map loses its charm when there are too many blocks, and it becomes difficult to grasp relative importance from area or color because the areas look similar or are very small. In the newsmap.js example, if a particular news article is not all that important, it is very difficult to show its headline in the area it gets. Also, since humans are not as good at perceiving area as they are at perceiving length or angle, comparing areas is difficult.
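To see why the blocks shrink so quickly, it helps to sketch how a tree map divides space. Below is a minimal "slice-and-dice" layout: each node splits its rectangle among its children in proportion to their sizes, alternating the split direction at each level. The nested-dict tree format and the function names are my own assumptions, and I am not claiming this matches the paper's algorithm line for line.

```python
def total_size(node):
    """Size of a leaf, or the summed size of all leaves below a folder."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, w, h, depth=0):
    """Return (name, x, y, width, height) for every leaf under node."""
    if "children" not in node:
        return [(node["name"], x, y, w, h)]
    rects = []
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node["children"]:
        share = total_size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side along the width.
            rects += slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1)
        else:
            # Odd depth: stack children along the height.
            rects += slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return rects


# Hypothetical file listing: one file and one folder holding two files.
tree = {"name": "root", "children": [
    {"name": "a.txt", "size": 50},
    {"name": "docs", "children": [
        {"name": "b.txt", "size": 25},
        {"name": "c.txt", "size": 25},
    ]},
]}
rects = slice_and_dice(tree, 0, 0, 100, 100)
for rect in rects:
    print(rect)
```

On a 100 by 100 canvas, a.txt takes the left half and b.txt and c.txt split the right half. Every extra level divides an already divided rectangle, which is why small files end up with areas too tiny to label.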

The paper acknowledges some of these difficulties and notes alternative ways in which tree maps can be used. I think a better approach to visualizing a large amount of information is to split it into parts and let each part be visualized separately by a tree map. From what I have read, I believe this paper aims to introduce the tree map as a potential visualization, and therefore there are not too many statistical references or examples.

Reaction: Balancing Systematic and Flexible Exploration of Social Networks

Visualization is not just about analyzing discrete entities; it can involve analyzing data from a group perspective as well. Visualizing groups has become really important and indispensable, as groups give a different view of the data and can reveal critical information. For example, an email inbox can be used to visualize individual emails, but it can also be used to see the impact of a social group or a group of friends on the receiver. The paper also cites different examples where social analysis is very important.

The article endorses SocialAction, which it introduces as unique, especially because of the way it uses criteria and ranking. It aims to achieve good analytical results using this data and helps users make discoveries that would not be possible without aggregating. I feel SocialAction is an excellent tool for visualizing aggregations and social networks because of the features it implements and the flexibility its ranking system and filtering provide. The tool shows the most promise when the graph looks too cluttered, where it helps in disambiguating the information.
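As a rough illustration of how ranking and filtering can declutter a graph, the sketch below ranks nodes by degree (number of connections) and keeps only the edges among the top-ranked nodes. Degree is just one criterion I picked for the example, and the function names are hypothetical; this is not SocialAction's actual implementation.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (node, degree) pairs, highest degree first, ties broken by name."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


def filter_top(edges, keep):
    """Keep only the edges whose two endpoints are both among the top nodes."""
    top = {node for node, _ in rank_by_degree(edges)[:keep]}
    return [(a, b) for a, b in edges if a in top and b in top]


# Toy friendship network (made-up names).
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("dan", "eve")]
ranking = rank_by_degree(edges)
core = filter_top(edges, keep=3)
print(ranking)
print(core)
```

Filtering to the top three nodes leaves only the ann, bob and cat triangle, which is the kind of simplification that makes a cluttered graph readable.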

It is interesting to see the patterns in the output graph after enabling communities on the network graph, and for me that is the standout feature. I feel this feature will help in understanding the relationship a group in the community has with other groups, as well as its own internal structure.

It would also be interesting to see how this type of graph could be applied to social networking sites and used by users to see their activity in terms of group interactions, and how their contribution to a group has changed over time.

I feel this tool is a great step forward in analyzing groups, given the complexity of analyzing aggregated data and of using that data to find correlations between groups.