Saturday, December 3, 2011

Find: Navigation with way-finding graphics: with Tony Howard

Part of the inspiration for the Metro mobile UI. Nice discussion of color.


Tony Howard, the Managing Director of the London-based Transport Design Consultancy, discusses his approach to way-finding signage.


Thursday, December 1, 2011

Tool: JavaScript Client Library for Google APIs Alpha version released



By Brendan O’Brien and Antonio Fuentes, Google Developer Team
Today we reached another milestone in our efforts to provide infrastructure and tools to make it easier for developers to use Google APIs: we have released the Google APIs Client Library for JavaScript in Alpha. This client library is the latest addition to our suite of client libraries, which already includes Python, PHP, and Java.

Wednesday, November 30, 2011

Reaction: Information Visualization for Text Analysis

This was an interesting read. It deals with applications in the field of text mining: visualizing connections among entities within and across documents, visualizing occurrences of words or phrases within documents, and visualizing relationships between words, both in their usage in language and in lexical ontologies.

As discussed in the first section of the article, one of the most common strategies in text mining is to identify the major entities within the text and then show the connections among them. This is illustrated with the example in Figure 11.1. I think this is the pattern most commonly followed in the field of visualization: whenever you look at a social networking site, or any other site that deals with lots of objects, people generally identify the major entities and the relationships among them.

This article has lots of examples and pictures, so it was very easy to connect with, as it's always easier to understand graphical visualizations. I found the examples in Figures 11.3 and 11.4 the most interesting. Figure 11.3 shows the BETA system for exploring document collections: results listings for the query "web fountain" appear on the right, augmented with TileBars, and entity frequency information is plotted along the left-hand side. Figure 11.4 shows the TRIST interface.

Overall, I think it was an interesting chapter that gives a good overview of the various visualizations used to represent textual data.

Find: You're reading bad fonts online, says leading typeface designer

Tuesday, November 29, 2011

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

I found this paper to be the most interesting. It describes a system called Jigsaw that represents documents and their entities visually, to help analysts examine reports more efficiently and develop theories about potential actions more quickly. Jigsaw places special emphasis on visually illustrating connections between entities across different documents.

The system provides four views. Two of them are mainly textual and report-based: the text view and the scatter view. The text view shows a document with its entities highlighted, drawing attention to the region of interest. The scatter view provides sliders for focusing on specific entities in a report and their connections. The other two views are the list view and the semantic view. As the name suggests, the list view provides lists of entities and the relations between them, whereas the semantic view is a graphical representation of the entities and their relationships. The text view is especially interesting because it lets the raw report be viewed with highlighted words grouped by color.

Overall, it was a very interesting paper. The scenario explained at the end of the paper through Figure 6 captures the basic gist of the paper and is a good practical example of all the views described earlier.

Reaction: TileBars: visualization of term distribution information in full text information access

This paper introduces TileBars, which demonstrate the usefulness of explicit term distribution information in Boolean-type queries.

The paper also discusses ranking. The author argues that a ranking should give users results that are informative and easily comprehensible. The standard approach to document ranking is opaque: users cannot see what role their query terms played in the ranking of the retrieved documents, and an ordered list of titles and probabilities is underinformative. TileBars are compactly arranged and indicate relative document length, query term frequency, and query term distribution. The technique helps in ranking the documents based on various features of the term sets.

In my view, TileBars are certainly an effective visualization. Each document is represented as a rectangle whose columns correspond to sections (segments) of the document. Each row of the rectangle corresponds to one term of the multi-word query, and the frequency of that term in each section is shown through color coding: the darker the tile, the more often the term occurs.
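To make the row/column layout concrete, here is a minimal Python sketch of a TileBar-style grid. It assumes the document has already been split into segments (plain strings here, not segments produced by the author's TextTiling algorithm), counts each query term in each segment, and uses ASCII characters as a stand-in for gray-level shading. The function names and the shading scale are my own, not from the paper.

```python
import re

# Characters from "no occurrences" to "many occurrences": an ASCII stand-in for gray levels.
SHADES = " .:#"


def tile_counts(segments, terms):
    """One row per query term, one column per segment: how often the term occurs there."""
    tokenized = [re.findall(r"[a-z]+", seg.lower()) for seg in segments]
    return [[tokens.count(term.lower()) for tokens in tokenized] for term in terms]


def shade(count, max_count):
    """Map a count to a shading character; denser characters mean more occurrences."""
    if count == 0:
        return SHADES[0]
    steps = len(SHADES) - 2  # number of steps above the lightest non-empty shade
    level = 1 + (count - 1) * steps // max(max_count - 1, 1)
    return SHADES[level]


def render_tilebar(segments, terms):
    """Render a TileBar-like grid; the bar gets wider as the document has more segments."""
    counts = tile_counts(segments, terms)
    max_count = max((c for row in counts for c in row), default=0)
    rows = []
    for term, row in zip(terms, counts):
        bar = "".join(shade(c, max_count) for c in row)
        rows.append(f"{term:>10} [{bar}]")
    return "\n".join(rows)


segments = [
    "The network protocol sends packets.",
    "Packets are queued, then packets are sent.",
    "An unrelated paragraph about a cooking recipe.",
    "Network congestion and network delay.",
]
print(render_tilebar(segments, ["network", "packets"]))
```

Reading the two rows together shows at a glance that "packets" is concentrated near the start of this toy document while "network" appears at both ends, which is the kind of distribution information a ranked list of titles hides.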

I found Figures 4 and 5 quite interesting. They show the results of a query with three term sets in a version of the interface that lets the user restrict which documents are displayed according to several constraints, such as the minimum number of hits for each term set and the minimum distribution.

Reaction: Tag Clouds and the Case for Vernacular Visualization

The tag cloud has a very long history, going as far back as the early 1900s. Of course, in those days it was used for a different purpose, but this paper shows how the technique has evolved over the years and is now used in information visualization and graphics.

These days, tag clouds are used as aggregators, especially on social networking sites, to display the text or messages exchanged in a network. In spite of their drawbacks, tag clouds have a very special place in information visualization, as the paper suggests. One drawback is that longer words get undue emphasis, while shorter ones can look as if they don't exist! And when the words are arranged alphabetically, related words get scattered, which is not what we want. So for analytical purposes, tag clouds are not very effective.
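Both drawbacks are easy to see in a minimal Python sketch of a tag cloud layout. This is not the paper's method: it assumes font size scales linearly with word frequency and that words are sorted alphabetically, and it estimates the space a word takes with a rough, hypothetical average glyph width of 0.6 em.

```python
import re
from collections import Counter


def tag_cloud(text, min_size=10, max_size=40, top_n=20):
    """(word, font_size) pairs for the top_n most frequent words, in alphabetical
    order, with font size scaled linearly from min_size to max_size by frequency."""
    counts = Counter(re.findall(r"[a-z']+", text.lower())).most_common(top_n)
    if not counts:
        return []
    low = min(c for _, c in counts)
    high = max(c for _, c in counts)
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    sized = [(w, min_size + (c - low) * (max_size - min_size) / span) for w, c in counts]
    return sorted(sized)  # alphabetical, as most tag clouds are laid out


def approx_width(word, size, glyph_em=0.6):
    """Rough horizontal space a word takes up; glyph_em is an assumed average glyph width."""
    return len(word) * size * glyph_em


cloud = tag_cloud("map map map visualization visualization data")
print(cloud)  # [('data', 10.0), ('map', 40.0), ('visualization', 25.0)]
for word, size in cloud:
    print(f"{word:<14} size {size:>4.0f}  width ~{approx_width(word, size):.0f}")
```

Even though "visualization" is less frequent than "map", it takes up more room here (roughly 195 versus 72 units), which is exactly the undue emphasis on long words described above. The alphabetical order also puts "data" and "visualization" far apart regardless of how related they are.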

Tag clouds are currently used on a large scale. Although their main use in visualization originated with Flickr, a lot of people have since adopted this graphical technique.

A tag cloud is truly a “vernacular” technique: it does not come from the visualization community, and it violates some of the major rules of visualization design. But the widespread popularity and flexibility of tag clouds suggest that they pass the test of applicability.

Reaction:Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This is one of the interesting papers on text and document visualization; it discusses visualization tools and their importance for investigative analysts. This form of analysis is widely used in academia to find relevant information in a particular field of study, and its significance lies in saving a lot of human labor. Another factor that makes this kind of tool very interesting is that people are limited in how far they can draw conclusions unaided from the information in a large number of documents.

Sense-making has been defined as ‘A motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively’, and much of the work of investigative analysts is sense-making. The paper describes the authors' work on helping analysts search, review, and understand documents better. The authors drew on the intelligence analysis process described by Pirolli and Card, which is organized around foraging and sense-making. The primary unit of analysis that Jigsaw extracts from documents is the entity; Jigsaw does not incorporate computational linguistics analysis techniques.

It provides various views (List, Graph, Scatter plot, Document, Calendar, Document Cluster, Shoebox), each offering a different perspective on the data. A user interaction in one view is translated into an event and communicated to all other views, which then update themselves appropriately, so that documents can be examined simultaneously from different perspectives. The authors explain the design and infrastructure of the system in detail and provide plenty of useful information. This is definitely a very good read on text and document visualization.

Reaction: TileBars: visualization of term distribution information in full text information access

According to the author, TileBars demonstrate the usefulness of explicit term distribution information in Boolean-type queries. They simultaneously and compactly indicate relative document length, query term frequency, and query term distribution.

The paper is an interesting read for people interested in information retrieval from large data collections. The author points out the importance of context and structure in accessing information in such large data sets, arguing that the pattern of distribution of the terms in a text is a critical structural aspect. The author plays a vital role in emphasizing distributional information, which had received little attention at the time the paper was written. The proposed visualization tool, TileBars, shows the distributional characteristics of the query terms within each retrieved document.

Before presenting his visualization tool, the author briefly explains the standard information retrieval methodologies, so that readers gain insight into how texts are retrieved using Boolean queries. In Boolean retrieval, a query is stated in terms of disjunctions, conjunctions, and negations among sets of documents that contain particular words and phrases. The author's TextTiling algorithm, used by his visualization tool TileBars, tries to improve text analysis by overcoming the drawbacks of traditional ranking approaches. TileBars also exploit various visual properties to provide a better user interface. The author follows up the explanation of his approach with a couple of examples and outlines proposed directions for future work.
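Boolean retrieval as described here maps directly onto set operations. Below is a minimal Python sketch, assuming a toy collection and a simple inverted index that maps each word to the set of ids of the documents containing it; the documents and the query are made up for illustration.

```python
import re


def build_index(docs):
    """Inverted index: map each word to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


docs = {
    1: "Network protocol design",
    2: "Wireless network protocol",
    3: "Router configuration",
    4: "Cooking with wine",
}
index = build_index(docs)


def docs_with(word):
    """Documents containing `word` (empty set if the word never occurs)."""
    return index.get(word.lower(), set())


# (network AND protocol) OR router, AND NOT wireless:
# conjunction is &, disjunction is |, negation is set difference.
result = ((docs_with("network") & docs_with("protocol")) | docs_with("router")) - docs_with("wireless")
print(sorted(result))  # [1, 3]
```

Note that the result is just an unordered set of matches, with no indication of where or how often the terms occur in each document; that is the kind of information TileBars add on top of the query.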

Reaction: TIMELINES: Tag clouds and the case for vernacular visualization

The paper is a very interesting read, giving insight into how, why, when, and where tag clouds have been used. The authors state that the primary purpose of a tag cloud is to present a visual overview of a collection of text, which I agree with without any doubt. They cite examples of tag clouds dating back to the 1970s. According to the paper, tag clouds gained significance from early 2001, when social networking sites started to use them, followed by various other websites. The authors stress that although aggregating tags is a significant feature of tag clouds, it is not their sole purpose. Different examples of tag clouds and their features are given throughout the paper. The paper suggests that although tag clouds sometimes fail to provide quantifiable benefits, they convey a friendly atmosphere and offer a point of entry into complex information. The paper argues, with evidence, that it is always better to order tag clouds alphabetically, and it clearly explains what alphabetical ordering really means. The authors also point out that some tag clouds are portraits of individuals rather than groups. Finally, the paper explains that tag clouds derive from vernacular techniques of data analysis rather than from established visualization techniques, and hence they work in practice even if not in the theory of visualization.

Reaction: Information Visualization for text analysis.

This is a reaction to one of the chapters of a textbook. The chapter discusses how visualization can be used as a medium for analyzing and understanding large collections of data. It strives to show the significance of visualization in literary analysis through numerous examples. The author makes it very clear that most people working in data analytics have not properly weighed the importance of visualization as a search interface, and I concur with the author on that.

This chapter of the textbook explains data collections from an analytics point of view, describing the relations and connections among different entities within a collection. It also covers the relationships between words, both in their usage in language and in lexical ontologies.

The significance of this read is that it explains which factors need to be taken into consideration when working on information visualization for text mining, document concordances, word frequencies, literature, and citation relationships. The examples given for visualization in text mining were TAKMI, Jigsaw, and BETA. This read helped me understand how concordances can be turned into data/information visualizations; it uses the SeeSoft and TextArc visualizations to explain how this is achieved. DocuBurst and WordTree are among the most commonly referenced techniques, and the chapter covers them too. It also discusses citation analysis graphs and the ThemeRiver visualization.

This chapter is a comprehensive survey of visualization techniques for text analysis and explains them through different examples. However, although examples of different systems were given, it failed to explain most of the important concepts in detail.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

Jigsaw is a tool that helps document analysts create visualizations to aid their analysis. My first thought when I read this was of our previous reading about taking notes, and writing on paper in general, as an external cognition aid, and sure enough, the authors note this in the third paragraph. They also do a good job of motivating the usefulness of a tool like Jigsaw later in the introduction.

It looks like Jigsaw's primary document input should be short reports, and Jigsaw concentrates primarily on drawing connections between entities.

I appreciate the discussion of their implementation. They went with an MVC architecture and an XML file structure, which is interesting to see.

Beyond that, I think we've seen all of these views before. I think we've seen the list view, but I know we've seen graph views and scatterplots. The text view is a good addition though, since it links the more abstract visualizations back to the original source. That's good stuff.

Anyway, I've been inspired to at least look for their website, here. It's somewhat disappointing that apparently they make you email somebody to get a copy of Jigsaw, and they don't seem to offer the source code.

Reaction: TileBars: visualization of term distribution information in full text information access

This article talks about TileBars, a visualization that I had never seen before. After reading through the paper, it looks like something that could be fairly useful for quick late-night research. TileBars give a quick overview of how relevant a document is to a given search, and also let you tell where inside that document the most relevant area might be. Skipping directly to the most useful part of a document could help a user decide whether the whole document is worth perusing for their current purpose. Beyond the visualization itself, it sounds like the author tries to tackle indexing methods that allow the visualization to be created quickly. It sounds like they're trying to avoid natural language processing, though, and they do not do any cross-referencing of documents to see how many times a document has been cited, or how famous the authors are, or anything like that.

Still, I can't help but wonder why I do not see this anywhere, and I wonder if it has to do with indexing.

Reaction: TIMELINES: Tag clouds and the case for vernacular visualization

This was an interesting piece about the origin and usefulness of tag clouds. The article mainly talks about the rise of tag clouds and almost seems more promotional than purely informational. The evidence presented for their effectiveness as visualizations seems rather circumstantial, but it's convincing enough. It touts them as a visualization that gives perspective on text, offering a simple overview of text content from many different sources: blogs, political speeches, psychological experiments, and, apparently, fiction writing. The designation of "unstructured text" as an area for potential new visualization research is interesting, although the authors mention that since people want to use tag clouds for analysis, maybe researchers should "rethink the purpose and goals of their creations." I thought easing analysis was already a large part of information visualization.

Reaction: Information visualization for text analysis

This web article is a good overview of various information visualizations for text analysis. At the outset, the authors mention a distinction between textual search tasks and analysis, although they refer back to previous discussions to actually define the difference. It's probably easy enough to guess.

Text mining is discussed, and we see a reference to Jigsaw, from a different article we've read. Good descriptions of a few other applications are given as well. Word frequencies and concordances are discussed. I find it interesting that the author shows that tag clouds are not as useful as lists for search, but I guess that seems obvious. Citation analysis is mentioned as well, which is something I've found useful in the past, if not with a fancy visualization. I kind of wish Google Scholar offered a visualization like that (unless it already does, and I'm missing it).

This was a great overview, as I mentioned. I think I would like to read some of the other chapters in the series that this author mentions.

Tool: VisIt

VisIt is a parallel visualization and graphical analysis tool for viewing scientific data on Unix and Windows machines. It is completely free and is available for download along with its source code.
It was developed at Lawrence Livermore National Laboratory (LLNL) to analyze the results of terascale simulations.

It can be used to visualize scalar and vector fields defined on unstructured meshes, and its main strength is its ability to handle both 2D and 3D models equally well.

The tool can be downloaded here and the rich features that it can provide can be found here.

Following are screenshots of the tool being used to analyze data from an aerodynamics model.

VisIt provides many kinds of plots (contour, pseudocolor, volume, vector, subset, molecule, label, etc.) and can read input data in various formats. The file formats the tool can read are listed here. Importantly, it can read image data as well and plot contour information according to the user's requirements, which makes image analysis possible. The label plot lets users tag different objects within the input model and then perform analysis on them.

Viz: Visualization of Food Consumption across the world

This is a visualization of the world's food consumption. It presents the calories consumed and the income spent on food in various places across the world, giving a close estimate of the world's consumption rate. It was made by Food Service Warehouse using HTML5 and jQuery. It is a simple visualization that gives insight into the countries that dominate consumption on the world scale, compared with those that consume very little. It also sheds light on how much income is spent on food.

You can find the visualization here.

Following are the screenshots of the visualization:

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

In this paper the author sheds light on how Jigsaw provides multiple views of document entities. Personally, I found it interesting how Jigsaw provides multiple views of the document. Moreover, the flexibility of the Jigsaw model to incorporate other models is useful too. It provides a number of views, such as text, document, graph, and scatter plot. Of these, I found the graph view particularly interesting. It is close to one of the papers we read, which had the example of the InfoViz site in it. I feel this kind of connectivity between entities has a wide range of uses, especially in the form of the graph view.

On the whole, Jigsaw is a nice way to show connectivity between entities; however, I feel that choosing the appropriate view is a serious challenge.

Jigsaw: Supporting Investigative Analysis through Interactive Visualization

Investigative analysts try to uncover hidden plots by reading many documents and making connections between the entities and events mentioned in them. This is a very memory-intensive task and plays a very important role in crime control and military intelligence. The authors have built an elaborate new visualization tool called Jigsaw to assist analysts in their work. The tool is not based on complex algorithms; instead, it leaves the intelligent work to the analyst and just aims to act as an aid.

Jigsaw has four different views, each intended to be shown on a different screen and each providing a different perspective to the analyst. The list view and the semantic view are graphical representations. The list view shows all the entities and events of interest mentioned in the documents as lists and connects their occurrences across different documents. It is like a bipartite node-link graph, and it is simple and easy to comprehend. The semantic view presents a node-link diagram of the connections between entities; these links span documents. The other two views provide a more textual representation. The scatter plot view lets the analyst explore the documents using sliders and a scatter plot of the entities in the documents. The last view is a text view that highlights the parts of the document under consideration.

When the analyst interacts with one of the views, the resulting events are propagated to all the views, and each view responds to them in its own specific way. However, the analyst can turn off listening to these events for a particular view, which provides greater flexibility.
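The event propagation described above is essentially the observer pattern. Here is a minimal Python sketch of the idea, including a per-view switch for turning listening off. The class and method names are hypothetical; this is not Jigsaw's actual code, which I haven't seen.

```python
class View:
    """A hypothetical view that highlights entities selected anywhere in the system."""

    def __init__(self, name):
        self.name = name
        self.listening = True  # the analyst can turn this off per view
        self.highlighted = []

    def on_select(self, entity):
        # A real view would respond in its own way (redraw a graph, scroll text...);
        # here we just record the entity.
        self.highlighted.append(entity)


class EventBus:
    """Broadcasts an interaction in one view to all other listening views."""

    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def select(self, source, entity):
        source.on_select(entity)  # the originating view always updates itself
        for view in self.views:
            if view is not source and view.listening:
                view.on_select(entity)


bus = EventBus()
list_view, graph_view, text_view = View("list"), View("graph"), View("text")
for v in (list_view, graph_view, text_view):
    bus.register(v)

text_view.listening = False  # the analyst mutes the text view
bus.select(list_view, "Person: J. Smith")  # hypothetical entity
print(graph_view.highlighted)  # ['Person: J. Smith']
print(text_view.highlighted)   # []
```

Keeping the views decoupled behind a single broadcaster is what makes it cheap to add a new view later: it only has to register and implement its own response.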

TileBars: visualization of term distribution information in full text information access

The author discusses the assumptions and strategies for retrieving information from full-length texts, and how this differs from conventional methods that work with titles and abstracts. The author presents a new visualization, TileBars, which indicates the frequency of the queried terms and their distribution and overlap within each document.

TileBars act as an analytic tool, and the visualization also lets users see the relative lengths of the retrieved documents. It helps users see how well information is retrieved for different queries.

It was interesting to read about similarity search and Boolean search, and about how difficult it is to present precise and important information to the user convincingly based on the results of these retrieval techniques. TileBars seem a great approach to these problems, as they take into consideration long texts, the need for a coherent compact representation, and passage context. The visualization shows a large rectangle representing the document itself, and the squares within it represent segments of the text. Frequency is indicated by color, which is very intuitive to the user. It is interesting to note that the visualization exploits the natural pattern recognition capabilities of human perception so that the user can easily view and understand the information.

Thus, this visualization lets users intuitively view the distribution of query words and other properties so that they can judge the relevance of each document.

Reaction: Jigsaw - Supporting Investigative Analysis through Interactive Visualization

This paper talks about Jigsaw, which is built to analyse documents and the entities within them so as to create visualizations that help in effective analysis. The system is built around the concept of entities in the documents. The author suggests that the entities mentioned in the documents being analysed are connected, and Jigsaw aims to map these connections effectively.

Jigsaw provides four kinds of views to the analyst once the required documents have been processed: a tabular view, consisting of lists of entities in different tables with relations portrayed by links between them; a graph view, which shows the connections between entities and reports in a node-link diagram; a scatter-plot view, which lets the analyst view relations between entities; and a text view, which shows the text with the entities under consideration (i.e., words) highlighted.

The author then explains each view systematically and how useful it can be for analysis. I think providing such different views and mapping the interactions in one view to the others is an exceptional property of this application. Normally, visualizations don't keep track of what the user has done in another view, but with this functionality Jigsaw maintains consistency between views, which is very important in any analytic process. The option of having each view on its own screen is also great.

In a nutshell, Jigsaw is an excellent tool for textual investigative analysis with a lot of scope for improvement and practical applications.

TIMELINES: Tag clouds and the case for vernacular visualization

The paper discusses "vernacular visualizations": new, intuitive ways of representing information that emerge outside the research community. It describes tag clouds in detail and how they are used to present information.

It was fascinating to run through the history of tag clouds, from how Fortune magazine used colors and font sizes to convey company statistics to how Flickr used them to represent the tags attached to photos. Also interesting is the representation by Many Eyes that displays two-word clouds so that users can get a better idea of the information.

The paper also discusses the shortcomings of tag clouds, such as longer words getting more emphasis and the difficulty of making overall sense of the data just by looking at the cloud. Nevertheless, a tag cloud still seems like a simple way to present the gist of what is going on, and it is an easy, quick, and attractive format for users. Thus, it clicks in practice, even though in theory it may be cumbersome and unclear.

Information visualization for text analysis

This article talks about text mining, its visualization and its implications. The authors provide many examples of visualization of text mining and how most of them identify the important entities and then display the relationship among the entities or group visually related items for analysis.

The next representation of textual information is the concordance, where a word of interest is centered and its context in the document is presented visually. Word clouds and tag clouds are well-known forms of text mining visualization.
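A concordance in its simplest form is a keyword-in-context (KWIC) listing. Here is a minimal Python sketch, assuming simple word tokenization and a fixed number of context words on each side (both my own choices, not from the article). The left context is right-aligned so that every occurrence of the keyword lines up in one column.

```python
import re


def kwic(text, keyword, width=3, pad=30):
    """Keyword-in-context lines: each occurrence of `keyword` with up to `width`
    words on either side, left context right-aligned so the keywords line up."""
    words = re.findall(r"\w+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>{pad}} [{word}] {right}")
    return lines


for line in kwic("The cat sat on the mat. The dog saw the cat run.", "cat", width=2):
    print(line)
```

Lining the keyword up in a single column is what lets a reader scan the surrounding words for patterns, which visualizations like WordTree then take further by merging repeated contexts.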

I found it particularly interesting that the article points out how difficult text is to visualize, given that it is not a nominal or categorical attribute. One way this is handled is with visualizations that plot the frequency of usage of a term (word) against a time series.
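Plotting a term's frequency against time mostly comes down to counting occurrences per time period. Here is a minimal Python sketch, assuming documents arrive as (year, text) pairs. It uses raw counts of exact word matches (no stemming, so "tags" is not counted as "tag"), although normalizing by the number of words per year would often be fairer, and the text bars stand in for a real chart.

```python
import re
from collections import Counter


def term_frequency_over_time(dated_docs, term):
    """Count occurrences of `term` per year, given (year, text) pairs."""
    counts = Counter()
    for year, text in dated_docs:
        words = re.findall(r"[a-z]+", text.lower())
        counts[year] += words.count(term.lower())  # years with zero hits are kept too
    return dict(sorted(counts.items()))


docs = [
    (2010, "Tag clouds are everywhere."),
    (2009, "A tag here."),
    (2010, "Tag, tag, and more tags."),
    (2011, "Nothing relevant."),
]
for year, n in term_frequency_over_time(docs, "tag").items():
    print(year, "#" * n)
```

Keeping the zero-count years matters for a time series: dropping them would make the line jump straight from one mention to the next and hide the gaps.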

An important use of text mining visualization is representing citation relations and links between research papers. I found the visualization presented by PaperLens both simple and interesting in its use of brushing and linking between views.

In conclusion, the article was a great read, very clear and easy to understand and conveying the importance of visualization for text mining analysis.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper focuses on a new application for text visualization called TileBars. The author points out that existing textual visualization applications do not take the structure of the document into consideration during information retrieval. TileBars takes this structure into account before building relations and presenting results to the user.

The author gives document structure prime importance because, according to him, an extremely long text contains multiple subtopics and paragraphs that are related to each other in many ways. Thus, a user should be able to search for the query within such sub-parts rather than across the entire document. To me this makes complete sense. Long chapters or papers contain subtopics that might focus on the query itself, so a search over the complete paper might not surface them, as opposed to a search over its subtopics.

TileBars uses the TextTiling algorithm, developed by the author, to partition a document into multi-paragraph segments, and shows the user the length of the document and the position of each segment within it. The size of a TileBar depicts the document's length, the position of a tile shows where in the document the query terms were found, and color shows how frequently they occur. I think that, with structure used as a property, this is an excellent attempt at depicting the retrieval of information about a query from documents.
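For the curious, here is a much-simplified Python sketch of the TextTiling idea: compare the vocabulary of adjacent blocks and place segment boundaries where lexical similarity dips sharply. It is not the author's implementation. It assumes paragraphs as the basic unit (rather than fixed-size token sequences), uses plain cosine similarity on word counts with no stopword removal or smoothing, and borrows the commonly described cutoff of mean minus half a standard deviation of the "depth" scores.

```python
import math
import re
import statistics
from collections import Counter


def _bag(paragraphs):
    """Bag of lowercase words for a block of paragraphs."""
    bag = Counter()
    for p in paragraphs:
        bag.update(re.findall(r"[a-z]+", p.lower()))
    return bag


def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(paragraphs, k=1):
    """Split paragraphs into segments where lexical similarity between the
    k paragraphs before a gap and the k paragraphs after it dips."""
    if len(paragraphs) < 2:
        return [paragraphs]
    gaps = range(len(paragraphs) - 1)  # gap i lies between paragraph i and i + 1
    scores = [_cosine(_bag(paragraphs[max(0, i + 1 - k):i + 1]),
                      _bag(paragraphs[i + 1:i + 1 + k])) for i in gaps]
    depths = []
    for i, s in enumerate(scores):
        left = s
        for j in range(i - 1, -1, -1):  # climb to the peak on the left
            if scores[j] < left:
                break
            left = scores[j]
        right = s
        for j in range(i + 1, len(scores)):  # climb to the peak on the right
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - s) + (right - s))
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    boundaries = [i for i, d in enumerate(depths) if d > 0 and d > cutoff]
    segments, start = [], 0
    for b in boundaries:
        segments.append(paragraphs[start:b + 1])
        start = b + 1
    segments.append(paragraphs[start:])
    return segments


paras = [
    "tiles bars query terms", "query terms tiles display", "tiles query display bars",
    "cooking pasta sauce garlic", "garlic sauce pasta boil", "pasta boil garlic cooking",
]
for seg in segment(paras):
    print(seg)
```

On this toy input the similarity drops to zero only between the third and fourth paragraphs, so the sketch returns two segments, one per topic. Each resulting segment would then become one column of a TileBar.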

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

Traditional information retrieval systems work well for titles and abstracts, which are compact and focused on the main topics. However, full-text documents can have varied organizational structures, which means traditional approaches may be inadequate or incomplete for information retrieval. The author therefore presents TileBars, a visualization that displays the relative frequency and distribution of the query terms in visual form.

During the discussion, the author describes ranking techniques and makes an important point: ranking should give the user informative, compact results that are easy to comprehend. TileBars give the user a visual representation that indicates the relative lengths of the documents, and then use shading to indicate the frequency and distribution of the search terms. The author stresses identifying the structure of full-text documents, which may contain subtopics or dropout topics along with the main topics. The figures in the paper could have been organized better; for the examples in particular, you may have to scroll up and down too often while reading.

This technique can be very useful for filtering a large body of research literature down to the relevant papers. Other techniques, such as ranking, can then be applied to this filtered list. However, the effectiveness of this technique needs to be studied through in-depth testing. Overall, this paper was a good read for anyone whose interests lie in information analysis or data mining.

Reaction: Jigsaw: Supporting Investigative Analysis Through Interactive Visualization

This paper discusses Jigsaw, a tool that visualizes the textual contents of documents. Jigsaw attempts to highlight and communicate connections and relationships between entities in reports. The authors propose several visual forms, such as a graph view, tables, a scatter plot view, and a text view. Jigsaw presents entity relations and links in an easily perceivable form, which helps an analyst get a better and broader understanding of the events and facts documented in the reports. However, the authors acknowledge that Jigsaw is not a substitute for careful analysis of the reports; rather, it acts as a visual index that can guide further actions.
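Jigsaw's notion of a "connection" can be illustrated with a minimal sketch, assuming (as I understand the paper) that two entities are connected when they appear in the same document. Entity extraction is assumed to have already happened, and the report names and entities below are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def connections(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Link two entities whenever they appear in the same document.

    `docs` maps a (hypothetical) document id to the entities found in it.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for entities in docs.values():
        # Every pair of entities that co-occur in one report becomes an edge.
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


reports = {
    "report1": {"Alice", "Acme Corp", "Boston"},
    "report2": {"Alice", "Bob"},
    "report3": {"Bob", "Chicago"},
}
graph = connections(reports)
print(sorted(graph["Alice"]))  # ['Acme Corp', 'Bob', 'Boston']
```

Alice and Chicago never share a report, so they are not directly connected, but an analyst could still reach Chicago from Alice through Bob, which is the kind of indirect link a graph view makes visible.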

Jigsaw's usability still needs to be evaluated, especially its usefulness to real analysts. Also, as the document collection grows, the tool's effectiveness may suffer. For larger report collections, in which the entities in a category number in the thousands or more, some form of dynamic filtering may be required.

Overall, Jigsaw promises to be an interesting tool for exploratory data analysis. However, more study is needed on the tool's effectiveness and scalability.

Reaction: Information Visualization for Text Analysis

This article gives an overview of different text analysis techniques in the areas of text mining, concordance, and textual ontology. It explains different visualization techniques for each of these areas. Each technique comes with a visual example, which is very helpful for understanding.

Text mining is the interesting field of extracting required information from available resources. The author explains simple systems like TAKMI, Jigsaw, and TileBars, and introduces more sophisticated techniques like BETA and TRIST. The next section considers visualization techniques for concordances, that is, alphabetical indexes of the words in a text together with their contexts. This section includes some interesting visualizations like DocuBurst and NameVoyager, and it also discusses the popular concept of tag clouds in detail. The third section covers the very interesting field of textual ontology, which visualizes relationships between words and texts, as in citation analysis. The graph created by Small has likely inspired present-day search engines.
This is a great article for beginners, providing an excellent introduction to current techniques and also giving researchers further directions for their study.
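A concordance in this sense (every word listed alphabetically alongside its context) is easy to sketch. The following minimal keyword-in-context index is my own illustration, not code from the chapter; `width` is an assumed parameter giving the number of context words kept on each side.

```python
import re


def concordance(text: str, width: int = 2) -> dict[str, list[str]]:
    """Alphabetical index of words, each occurrence shown with its context."""
    words = re.findall(r"[a-z']+", text.lower())
    index: dict[str, list[str]] = {}
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        # Bracket the keyword so it stands out from its context.
        index.setdefault(word, []).append(f"{left} [{word}] {right}".strip())
    # Sorting the keys gives the alphabetical order of a printed concordance.
    return dict(sorted(index.items()))


print(concordance("the cat sat on the mat")["the"])
# ['[the] cat sat', 'sat on [the] mat']
```

Visualizations like the word tree can be seen as graphical versions of this kind of index, where the shared context around a keyword is drawn as branches instead of repeated lines.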

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper focuses on tag clouds, which the authors term vernacular visualizations. Presently, websites use tag clouds as aggregators to summarize the activity on social websites or blogs. The authors also point out other uses of tag clouds, such as analyzing political speeches and academic articles. Tag clouds often act as an entry point to websites and hence have become popular among casual users. However, they have a number of shortcomings in terms of word length and word organization, so tag clouds may not be very effective for analytical purposes.

This article draws some important conclusions. As the authors put it, 'tag clouds work in practice but not in theory'. Casual users find them a good way to visualize and analyze unstructured text. Thus, they have become popular even though they violate some rules of traditional visualization. It was interesting to read how practical applications drive theory in visualization, as in some other technical fields.


This was a very interesting read. Using the entire text, and not just the abstract, to perform information visualization was a new concept to me. The concept put forward in this paper is "tiles": each paper goes through the tiling algorithm, after which the papers can be compared.

However, I felt that the resulting information can be very confusing for users to interpret.

Still, I think using all of the information in the papers for comparison is a very nice idea, though newer methods could be devised to make it easier for end users.

Reaction: Jigsaw: Supporting Investigative Analysis Through Interactive Visualization

Jigsaw is a tool that helps in exploratory analysis by providing multiple views of a document and its entities. While reading a document, we usually form a mental picture or model, which becomes difficult to maintain as the document grows. In such situations, Jigsaw can help us get a clear understanding of the structure and content of the document.

I liked the idea of representing trillions of bytes of data visually; it is certainly better than the text summarizations available today. Also, Jigsaw can help in data analysis, since text can be presented in the form of graphs, charts, or reports. However, the scalability limitation for reports with thousands of entities in a category remains. In that case, there would be a large number of labels, nodes, etc., and the user would need to scroll a lot. It would be interesting to see how this tool gets modified to adapt to such scenarios.

Reactions: Tag clouds and the case for vernacular visualization

This paper provides a good history of tag clouds and their unconventional rise in popularity. It then describes some problematic aspects of tag clouds and how they violate traditional visualization design. For example, related words are scattered because the cloud is organized alphabetically.

Personally, as a user, I use tag clouds simply to access related posts with similar tags. I rarely paid attention to the visualization aspect.

Reactions: TileBars

TileBars is a visualization tool that aims to exploit text structure. Unlike today's internet search engines, TileBars presents results based on query term distribution, document length, and query term frequency. One inherent problem with this approach is that it's often difficult to detect where the discussion of the term(s) begins and ends. TileBars aims to resolve this problem by using the passage the text showed up in.

Unfortunately, the visualization that resulted from TileBars was disappointing. The lack of color and the small pixels make it very hard for the user to determine which documents are potentially more useful. In addition, several items in the reading, like the descriptions of InfoCrystal and the Cube of Contents, were extremely confusing.

Reactions: Jigsaw

The paper begins by describing the underlying difficulties of making discoveries in large datasets. Jigsaw, the research tool this paper revolves around, aims to solve this problem. By analyzing documents, Jigsaw can generate different views to help the user digest information. The paper then elaborates on the four views and frames them in a scenario for clarity.

I feel that by being able to generate different views, we take advantage of the inherent strengths of each visualization type. This in turn provides a good platform for the user to scan, analyze, and try to discover relationships and correlations between documents. Beyond investigative analysts, this tool could also be useful to researchers in biology and chemistry.

Reactions: Information Visualization for Text Analysis

This paper surveys different visualizations that involve text analysis. Many examples have either been discussed in class or shown up in later readings. Of the group, I find NameVoyager to be the most effective visualization and tag clouds the least. DocuBurst is perhaps the most visually appealing, but it fares poorly for words with few occurrences. A similar problem occurs with TextArc.

Overall, text-based visualization fares pretty poorly against other visualizations we have seen in the past.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper discusses a new visualization paradigm called TileBars for searching full-text documents, instead of searching just the abstracts. This allows systematic comparison between different articles. It also employs explicit term distribution information in Boolean-type queries by making use of text structure. This was something I had not come across before in the retrieval of full-text documents.
The algorithm in TileBars parses each paper into "tiles". A tile is basically a subtopic segment of the full article, shaded by how often each query term is used within that segment. The results are presented visually, so the user can judge the relevance of each article from the distribution and frequency of terms. This makes it very effective, as the user can open the documents that are highly relevant to the search and ignore the others.
The author makes an interesting point: such information access methods are not isolated and depend on factors such as document subsets. I would be particularly interested in the results of applying this paradigm to huge documents and in how much clutter and time it can save the user. Nevertheless, it is a very good article for anyone interested in data mining and information retrieval.

Jigsaw: supporting investigative analysis through interactive visualization

The number of documents, and the number of concepts and entities within them, keeps growing, so sense-making becomes more and more difficult for analysts.

"Jigsaw mainly represents documents and their entities visually in order to help analysts examine them more efficiently and develop theories about potential actions more quickly".

Jigsaw presents information about documents and entities through multiple distinct visualizations, called views. Each view gives the user (mainly analysts) something unique, offering a different perspective. There are several views:

1) List View
2) Graph View
3) Scatter Plot View
4) Document View
5) Calendar View
6) Document Cluster View
7) Shoebox

The Document View resembles IBM's Many Eyes or, to some extent, a tag cloud: it highlights words with color and also encodes information with size. The Document Cluster View is also interesting, with its rectangular shapes and the control it gives the user.

One of the applications the authors discuss is using Jigsaw to explore online articles, such as web news reports or blogs on topics like political debates. The user could build a set of loosely related documents about a particular topic or person and then use Jigsaw to find connections between entities across the documents. This can be really helpful.

In a way, it is really good to have different views for the user to choose from, but that also makes it difficult to choose the appropriate one each time.

Reaction: Information Visualization for Text Analysis

This textbook chapter gives an overview of the state of the art in tools and ideas for visualizing text, with the goal of helping readers understand text collections. The different visual examples and graphical illustrations make it easier to understand the differences between the visualizations. Also, the examples and diagrams within a topic help in understanding the different representations of a visualization. I especially found TRIST interesting because it represents search results as document icons and supports multiple linked dimensions that help find correlations among the documents.
Overall, it is a very good article on different tools, but the material should have been elaborated further to save readers time researching a particular tool of interest. Even so, it was good to read about and recall the different visualization tools taught in class, as well as go over interesting topics such as Many Eyes and SeeSoft.

TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper mainly argues for making use of text structure when retrieving from full-text documents, and presents a visualization paradigm called TileBars. TileBars seems to be an analytical tool for understanding the results of various types of queries: it visualizes the role of the query terms within the retrieved documents. When we make a query, there can be multiple documents of interest, and we then need to focus on how similar they are, how often the terms occur, or how the occurrences rank.

Figure 3 gives an initial preview of what TileBars is about. It provides a compact and informative iconic representation of the documents' contents with respect to the query terms.

TileBars allow users to make informed decisions about not only which documents to view, but also which passages of those documents, based on the distributional behavior of the query terms in the documents.
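To make the idea of choosing passages concrete, here is a minimal sketch of a passage-level Boolean AND over term sets, assuming the document has already been split into segments and that each term set is a group of lowercase alternatives. The function name and the example term sets are hypothetical, and the sketch does no ranking or drawing.

```python
import re


def matching_passages(segments: list[str], term_sets: list[set[str]]) -> list[int]:
    """Indices of segments in which every term set has at least one hit.

    Terms in each set are expected to be lowercase.
    """
    hits = []
    for i, segment in enumerate(segments):
        words = set(re.findall(r"[a-z]+", segment.lower()))
        # A term set is satisfied if any one of its terms occurs in the segment.
        if all(words & terms for terms in term_sets):
            hits.append(i)
    return hits


segments = [
    "Image compression for video",
    "Network protocols and routing",
    "Streaming images over the network",
]
print(matching_passages(segments, [{"image", "images"}, {"network", "networking"}]))  # [2]
```

Both topics appear somewhere in this three-segment document, but only the last segment discusses them together, which is exactly the distinction a document-level match would miss.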

The three main aspects are:

1) The relative length of the document,
2) The frequency of the term sets in the document, and
3) The distribution of the term sets with respect to the document and to each other.

The bars are not ordered in any fashion; they just have a number. Looking at the bars, it is easy to compare the documents and also the occurrences (how exact the match is). One drawback is that if many documents have query hits with small variations, those are difficult to represent with the gradient (black to gray to white). Also, as we increase the number of term sets (1, 2, 3, ...), each bar is divided further and again becomes hard to compare.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper explains the information visualization technique of tag clouds. The authors describe how important tag clouds are and how they have transformed the way textual information is analyzed. They give a brief history of tag clouds, describing how font size was used as a property for portraying information about a word. Over time, tag clouds became more and more popular and came into use in magazines and newspapers.

The authors then focus on the importance of tag clouds as an information medium. In this discussion, they mention the theoretical faults of this visualization. They point out that font sizes that are close to each other are extremely difficult to compare. This is very true in practice: people cannot tell the difference between font sizes of 12 and 13. Another problem is that longer words get a lot more emphasis than shorter words in the tag cloud.
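The 12-versus-13 problem follows directly from how many tag clouds assign sizes. A common approach (assumed here, not taken from the paper) is to scale font size linearly with word frequency between a minimum and a maximum; the pixel bounds below are arbitrary.

```python
def font_sizes(counts: dict[str, int], min_px: int = 12, max_px: int = 36) -> dict[str, int]:
    """Scale each word's font size linearly with its frequency."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_px + (count - lo) / span * (max_px - min_px))
        for word, count in counts.items()
    }


print(font_sizes({"reactions": 30, "tool": 4, "find": 3}))
# {'reactions': 36, 'tool': 13, 'find': 12}
```

With one dominant word, the two rare words come out at 13 and 12 pixels, the kind of near-identical sizes the authors say readers cannot compare.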

Even with these problems, the authors say tag clouds are still used in a lot of places because of their advantages in social media and their personal appeal. Furthermore, tag clouds are the best tool for getting the gist of a huge text document or body of data. This makes tag clouds a powerful tool. This paper is a great introduction to the power of tag clouds.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper introduces the history and applications of tag clouds, which are used for visualizing textual information. It is a very interesting and informative article. The author, while describing the advent of tag clouds, presents the various forms in which they were used to provide and derive useful information from a collection of text.

Currently, tag clouds are used on many websites and are not limited to traditional ones like Flickr. The paper mentions single-word and two-word clouds. Even our class website presents an interesting example of tag cloud usage, where the heavy font of the word 'reactions' denotes that the largest number of posts on the blog have that tag. Tag clouds can also be timeline-based. In my opinion, the effectiveness of a tag cloud depends heavily on its layout, how fonts are applied to the words inside it, the colors used, and the ordering of the words. An interesting example was presented in class yesterday by a group working on tag clouds for their project.

The paper also mentions some problems with tag clouds, such as the difficulty of finding a specific word and the extra visual weight that long words receive. Personally, I feel that tag clouds are one of the simplest and most useful methods for analyzing large amounts of text, although they may take some time to get used to. They can definitely be used as a first step in that direction.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper discusses Jigsaw, an analyst tool that provides an analyst with multiple perspectives on a document collection. The system's primary focus is on displaying connections between entities in the documents.

Jigsaw presents information about documents and entities through four distinct visualizations: a tabular connections view containing multiple reorderable lists of entities, in which connections between entities are shown by coloring related entities and drawing links between them; a graph view displaying connections between entities and reports in a node-link diagram; a scatter plot view giving an overview of the relationships between any two entity categories; and a text view displaying the original documents with entities highlighted.

I think all these visualizations would be very useful. I like the idea that when the user interacts through "select" and "show" events, the interaction propagates to all the views. That way, the analyst can see how an event plays out across all the views. Integrating Microsoft OneNote to capture thoughts would also be very helpful.
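The select-and-show propagation is essentially the observer pattern sitting on a shared model, which fits an MVC design. Here is a minimal sketch in Python for brevity (Jigsaw itself is written in Java); the class and method names are hypothetical, not Jigsaw's actual API.

```python
from typing import Protocol


class View(Protocol):
    def on_select(self, entity: str) -> None: ...


class ListView:
    def __init__(self) -> None:
        self.highlighted: set[str] = set()

    def on_select(self, entity: str) -> None:
        # A list view might highlight the selected entity in its column.
        self.highlighted.add(entity)


class GraphView:
    def __init__(self) -> None:
        self.shown: list[str] = []

    def on_select(self, entity: str) -> None:
        # A graph view might add the entity's node to the display.
        self.shown.append(entity)


class SelectionModel:
    """Shared model: a selection made in one view is broadcast to every view."""

    def __init__(self) -> None:
        self.views: list[View] = []

    def register(self, view: View) -> None:
        self.views.append(view)

    def select(self, entity: str) -> None:
        for view in self.views:
            view.on_select(entity)


model = SelectionModel()
list_view, graph_view = ListView(), GraphView()
model.register(list_view)
model.register(graph_view)
model.select("Alice")  # both views now reflect the selection
```

Because the views only talk to the model, adding a new view (say, a scatter plot) means registering it once rather than wiring it to every existing view.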

But I suppose that information analysis from multiple documents in real-time scenarios is much more complex. Jigsaw, which is written in Java with an MVC architecture, is certainly a great start, and I see no flaw in its software architecture. However, the authors would probably want to interact with analysts and conduct a survey, which would help them incorporate features that support analysts in real-time situations.

Reaction: TileBars: visualization of term distribution information in full text information access

Document visualization is one of the challenging and hot topics in information visualization.

Document visualization revolves around two main concepts:

Relevance - what is directly interesting for me
Serendipity - what thing I wasn't searching for could also interest me

TileBars is certainly an efficient and interesting approach to document visualization: there is a rectangle for each document, and each column represents a section (chapter, paragraph, etc.). Frequency is represented by color coding, and each row in the rectangle shows the visualization for one word of the multi-word search query.
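The row-and-column layout described above can be sketched as a count matrix plus a crude text rendering, assuming the document has already been segmented. The shading characters stand in for the gray levels of the real interface, and the function names are hypothetical.

```python
import re


def tilebar(segments: list[str], terms: list[str]) -> list[list[int]]:
    """Count matrix: one row per query term, one column per segment."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in segments]
    return [[words.count(term.lower()) for words in tokenized] for term in terms]


def render(grid: list[list[int]], terms: list[str]) -> str:
    """Text rendering: darker characters for higher counts."""
    shades = " .:#"  # 0 hits, 1 hit, 2 hits, 3 or more hits
    width = max(len(t) for t in terms)
    rows = []
    for term, counts in zip(terms, grid):
        cells = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        rows.append(f"{term:<{width}} |{cells}|")
    return "\n".join(rows)


segments = ["network routing network", "image coding image image", "network image"]
terms = ["network", "image"]
print(render(tilebar(segments, terms), terms))
# network |: .|
# image   | #.|
```

The bar's width grows with the number of segments, so a longer document simply produces a longer rectangle, and only the last column shows both terms in the same passage.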

This makes it possible to visualize explicit term distribution information in a full-text information access system. The representation simultaneously and compactly indicates relative document length, query term frequency, and query term distribution.

The TileBars representation can easily be extended to other media types. TileBars addresses not only the "Relevance" factor but also the "Serendipity" factor.

The paper is certainly a very good read, with helpful diagrams to illustrate the explained concepts.

Reaction: Tag clouds and the case for Vernacular visualization

This paper discusses various instances of the use of tag clouds, a visualization that was born outside the research domain and is hence termed a "vernacular visualization".

The paper discusses the history of tag clouds, including how the idea was initially conceived during Soviet Constructivism. Later, with Web 2.0, tag clouds caught on in the real world, and several companies like Flickr and Forbes have used tag clouds to visualize their activity over several years.

Several instances of how tag clouds were used are presented with clear images. I think tag clouds are a good visualization, but not very intuitive for large datasets: with many words in the sample, it would be really hard for us to compare their relative sizes. Tag clouds do give a very good glimpse of the most frequent words, but, as the paper points out, they are not theoretically suited to all situations.

Overall, the paper is a very good read, but its scope is very limited.

Reaction: Information Visualization for Text Analysis

This chapter aims to introduce the reader to several information visualization tools for text mining. The chapter is organized into three sections. The first section discusses tools that help analysts visualize connections among entities within and across documents. The second section discusses tools that help analysts visualize concordances. The final section discusses visualizing relationships between words in their usage in language and in lexical ontologies.

The chapter is helpful for introducing beginners in information visualization to several available text mining tools. Several tools I discussed in other reactions, like tag clouds, TileBars, and Jigsaw, appear in the chapter. Dr. Watson explained and illustrated several of these tools in his lectures at the beginning of the semester, so nothing in this chapter surprised me.

Reaction: Information Visualization for Text Analysis

This chapter focuses on giving a brief introduction to and explanation of the different visualizations in the field of text visualization. The chapter is an excellent resource on how different types of visualizations can be applied to mine blocks or documents of text.

The chapter further discusses visualizations used for text mining. Such visualizations are very interesting because they establish relations between two or more documents that share a specific string or phrase. The next kind of visualizations relates to word counts and word frequencies. In my view, these are the most useful in today's digital world. With the expansion of digital content and media, there is an ever-growing need for efficient ways to analyze a lot of information quickly. Word clouds, or tag clouds, let us visualize digital text efficiently by highlighting the most important or most frequently mentioned words or phrases. Wordle is an excellent example of such visualizations.

Finally, the chapter discusses visualizations related to citations and literature relations. These visualizations are a bit complex and difficult to understand, but nonetheless extremely useful. The chapter could have explained the related examples in more depth, as I couldn't quite see how useful the information portrayed by such applications could be.

Overall, the chapter gives a very good explanation of the various visualizations of textual data, and it is very interesting to learn about the different approaches taken to the same 'dataset' and how each differs in its own unique way.

Reaction: Information visualization for text analysis

The chapter provides a brief introduction to several tools and ideas used for visualizing textual data. It is a good overview of all the visualizations we have seen and discussed in class. The chapter would've been a little hard to understand without the previous discussions. It has enough graphical representations to jog your memory about the important concepts behind each visualization technique or tool. It was interesting to read about SeeSoft and how it was originally designed for software development. As usual, Many Eyes was one of the most interesting topics to go over again.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The paper presented aspects of document retrieval that I had not considered before, especially regarding retrieval of full-text documents. I realized that the distribution of the queried terms is an important factor to consider. The information about document structure was interesting to read, and structure is a necessary factor to consider when searching documents that lack an abstract.
I noticed that the search results weren't in any sorted order. The paper acknowledges this, but I wondered whether presenting the results in sorted order would make the user's life much easier. The user experience seems somewhat shaky because of the several implications the user has to keep in mind while looking at the results.

Reaction: Information Visualization for Text Analysis

The chapter is a survey of various approaches used in visualizing text mining, along with their pros and cons. The authors start by mentioning a dashboard based on the brushing-and-linking technique, which looks a lot like the list view of the Jigsaw tool. Then they cover Jigsaw, TileBars for visualizing documents, and so on. They talk about the TRIST tool and its unique way of depicting sets of documents: TRIST uses icons to depict documents, and the properties of each icon represent the properties of the document. They then discuss the SeeSoft visualization, its depiction of document size and document attributes, and accessing documents using the cursor. TextArc is a circular version of the SeeSoft bars. Other well-known visualizations discussed briefly include the word tree visualization and tag clouds.

Then they discuss citation and relationship visualization in documents. They talk about the standard node-link representation and briefly discuss some of its alternatives, like the PaperLens visualization and the legal citation analysis visualization.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The TileBars representation shows data varying in frequency, distribution, and length. Using tiles within a rectangular bar helps in analyzing documents in their entirety, so that no part of the paper is neglected and every part is given its due importance. Representing information as tiles and using gray scales to distinguish frequency is like visualizing in four dimensions. Once the visualization is displayed, the user's ability to drill down on particular results by clicking on the tiles can help locate specific information.
The technique helps in ranking documents based on various features of the term sets. I think analysis based on the distribution of the queried terms helps locate what the user is looking for in the document set. Presenting results in sorted order by relative importance is a structured and progressive approach to text analysis. Representing articles as term sets ranked by importance would be helpful, when combined with other data mining techniques, for narrowing down user queries.

Reaction: Tag Clouds and the Case for Vernacular Visualization

The paper discusses the history of tag clouds, which were born outside the academic community and are hence called vernacular visualizations. They are used by many famous magazines and websites for purposes such as depicting word frequency, or word importance measured in some other way, such as tf-idf. However, tag clouds have theoretical problems: long words get a bias, it is cumbersome to search for a word, and semantically related words might end up far apart when the words are arranged alphabetically. Despite all these flaws, tag clouds are popular, and one reason may be that they are used as a point of entry to different parts of a website, especially on Web 2.0 social tagging websites. It has also been observed that tag clouds provide a rough idea of what a large body of text is about, which is another reason for their popularity.
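For the tf-idf variant, a tag cloud would size words by tf-idf weight instead of raw frequency. Below is a minimal sketch of one common formulation (raw count times the log of inverse document frequency); other weightings exist, and the paper does not prescribe this exact formula.

```python
import math
import re
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """tf-idf weight of every word in every document.

    tf = raw count in the document; idf = log(N / number of docs containing the word).
    """
    bags = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    n = len(bags)
    # Document frequency: in how many documents each word appears at least once.
    doc_freq = Counter(word for bag in bags for word in bag)
    return [
        {word: count * math.log(n / doc_freq[word]) for word, count in bag.items()}
        for bag in bags
    ]


weights = tf_idf(["the cat sat", "the dog sat", "the cat cat ran"])
print(max(weights[1], key=weights[1].get))  # dog
```

A word that appears in every document, like "the" here, gets a weight of zero, so a tf-idf cloud emphasizes what is distinctive about a text rather than what is merely common.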

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

Jigsaw is an intelligent technique that can be used to visualize textual data in a variety of ways by constructing views from the document sets. Jigsaw does not actually analyze the documents; rather, it helps in analyzing them by visualizing the data in an interesting manner. The four views provided by the authors are very ingenious at retrieving points of interest from a given set of documents. Generating small reports, as in the first two views, could help analysts focus on relationships, since these views point to a narrower range instead of wasting a lot of time.

Representing data in the graph view and list view with different symbols and sizes is a good way of showing data in multiple dimensions. I think the scatter plot view shows relationships between various documents well, as it can represent how similar they are. The propagation of actions from one view to another is an interesting part of the tool, as this is what makes the tool efficient to use. This tool helps in drawing good relationships between entities when dealing with a big dataset.

Monday, November 28, 2011

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The authors start by talking about the problems with visualizing search results. They point out that a lot of information, such as the relative length of documents and the frequency and distribution of query terms, needs to be conveyed. They further discuss the problems of visualizing so much information. To solve this problem, they propose the idea of TileBars.

TileBars is a visualization involving a set of rectangles whose size depicts the relative length of the documents in the result set. It uses a gray scale from 0 to 8 to indicate the presence or absence of the query terms, by showing tiles with different gray levels inside each rectangle. This kind of visualization also provides information about the distribution of query terms in the document. It also gives the user better feedback about why a particular document was included in the list. Further, the authors provide a feature to jump to a particular part of the document by simply clicking on the corresponding tile.
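The reaction says the gray scale runs from 0 to 8. Assuming nine evenly spaced levels (my assumption about how counts are binned; the paper may use a different mapping), a term's hit count in a tile could be turned into a gray level like this:

```python
def gray_level(count: int, max_count: int, levels: int = 9) -> int:
    """Map a tile's hit count onto a gray level 0..levels-1 (0 = white, no hits)."""
    if count <= 0 or max_count <= 0:
        return 0
    # Scale so the busiest tile is darkest and any hit is at least level 1.
    return max(1, round(count / max_count * (levels - 1)))


print([gray_level(c, 12) for c in [0, 1, 6, 12]])  # [0, 1, 4, 8]
```

Forcing any nonzero count up to at least level 1 keeps a single hit visible instead of letting it round down to white.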

Reaction: Information Visualization for Text Analysis

Visualization aids used in text mining help a lot in drawing correlations that are more semantically meaningful and in achieving the analysis objective. Visualizations are especially helpful when the document collection is very large and relations need to be drawn between various things in the documents. The high dimensionality of categorical data is an inherent problem when visualizing word frequencies, since a visualization cannot graphically convey all the information in a text. Tag clouds showing word frequencies help in understanding the most important terms on a particular website or document and are effective at conveying how often words are used. This can also help in analytics.
ThemeRiver is an interesting type of visualization because it helps relate various topics and can be used over a large set of documents to analyze context. Analyzing literature and citation patterns can help identify good-quality pages on the web, since such pages tend to have many backlinks, which signals their quality.
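The backlink idea can be made concrete with a toy in-link count, in the spirit of citation counting. This is only the simplest possible proxy (not PageRank or any particular search engine's method), and the page names are made up.

```python
from collections import Counter


def backlink_counts(links: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each page."""
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):  # a page linking twice still counts once
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


links = {"a": ["b", "c"], "b": ["c"], "d": ["c", "b", "b"]}
print(backlink_counts(links).most_common(1))  # [('c', 3)]
```

Raw counts are easy to inflate, which is why real link analysis also weighs who is doing the linking.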

Reaction: Tag Clouds and the Case for Vernacular Visualization

Tag clouds have become a popular way of visualizing word frequencies in various situations. Tag clouds, or more appropriately word clouds, can be used as navigation aids on websites; they attract users and are very attractive to look at. As the authors say, tag clouds are a complete violation of visualization design guidelines, but they still play an important role in showing interesting trends of a website when used to analyze trends, real-time statistics, etc.

I think further research in the field of tag clouds in visualizing content relating them to the semantics of the words would be worthwhile in creating meaningful visualizations. I contradict with the point that tag clouds are a vernacular visualizations only as they may have evolved in a much different way, but further study of tag clouds along with analytics and using them in fields like text analysis and natural language processing would significantly improve the understanding of those subjects.

Reaction: Information Visualization for Text Analysis

The chapter discusses the contents of text collections from a more analytical point of view, through text mining challenges and examples, document concordances, and word frequencies. One can conclude that the user of a visualization makes a huge difference to its effectiveness. I agree with Veerasamy and Heikes's point that a searcher's interests differ from those of a computational linguist.

In this chapter we could see how the readings on Jigsaw and TileBars were incorporated. The chapter covered almost all the major examples we could think of in word and tag visualization, generally known as tag clouds or word clouds. The transition from historical ways of representing text to the new era was an interesting read. The chapter made it very clear that visualizing textual data is becoming more important day by day. It was essentially a summary of the different tools and visualizations we read about and reacted to in previous readings, with sufficient images and explanation. Overall, the chapter was a good read.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The part where the paper says "if the user prefers a dense discussion of images and ... tangential reference to networking ..." reminded me of a secondary keyword search within a keyword search. As depicted in the figures, TileBars supports searching on multiple keyword sets at once.
In Figure 3, what do the numbers next to the rectangles mean? At first they seemed like an accumulated count, but as I read further, I learned that the numbers are document IDs.
Are the results of the query in Figure 3 sorted? Judging by Figure 5, they are definitely not. In fact, the paper later says the results are not sorted by anything other than document ID. It seems the author had a hard time figuring out how to sort the results. I also wonder why some rectangles have no dark squares inside. In Figure 3, if the queries are "law And legal attorney lawsuit" and "network lan", why do the first two results have almost no dark squares in their rectangles? Although the paper mentions that blending the corresponding tiles together may produce lightly colored tiles and large areas of white space for scattered discussions, a mostly white rectangle is not informative.
However, it is clever to put the vertical scroll bar on the left, whereas scroll bars are generally on the right side of the page.
Overall, it is a system that achieves its goal of visualizing the relative length of each document, the frequency of the term sets, and the distribution of the term sets.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

In this paper the author discusses a new visualization technique called TileBars, which incorporates the length of a document and the frequency and distribution of terms within it.

As the author mentions, TileBars help in understanding Boolean-type queries. Major challenges in information retrieval, namely problems with ranking and the importance of document structure, are addressed by TileBars. The author describes the advantages of TileBars for similarity search, Boolean search, and so on, which was very informative.

I felt that if the author had used color images, the topic would have been much easier to understand. Even though the author describes everything in detail, the paper lacked images to support the description. The description was not easy to follow, and I found it a bit confusing. The importance of InfoCrystal and the cube could have been made clearer, in terms of how they relate to the data analysis, since they are not useful for showing frequency or distribution information. But overall the idea looked interesting.

Reaction: Tag Clouds and the Case for Vernacular Visualization

The paper discusses the well-known topic of tag clouds: their history, the variety of inputs to them, and their usage patterns. The historical evolution of tag clouds, and their use in aggregating tags and analytical data, were good to read about. I agree with the authors that when tag clouds began to be used for analyzing documents they were more like word clouds, and that tag clouds help users search for specific items in the display.
I believe that tag clouds have changed information visualization deeply, as the authors suggest, but considering that they appeared long ago, there has not been significant improvement in the field. I think this is because of the complexity of word representation. Even though tag clouds do not fully belong to the visualization field, I would like them to be part of infovis, as they help analyze huge amounts of data in a summarized way. Though it is a vernacular visualization, I believe it has moved the entire infovis industry.

Tag Clouds and the Case for Vernacular Visualization

Fernanda B. Viégas and Martin Wattenberg from IBM Research start with the history of "tag clouds" and then argue the "vernacular" case that they put forward. They walk us through 90 years, from Soviet Constructivism to Stanley Milgram in 1976, and from Fortune magazine to Flickr. The simple point they make is that tag clouds are not an output of academic visualization research but something picked up from outside.

I feel the Fortune magazine representation was in a way a breakthrough that influenced all subsequent ones. In the second part, they try to understand why "tag clouding" has become popular even though it violates almost all the golden rules of visualization.

They attribute some of the popularity of tag clouds to their serving as social signifiers and conveying a friendly atmosphere. The reason for the word “vernacular” is that tag clouds do not come from the visualization community and violate its basic principles. Even researchers confirm that it is difficult to read and understand the words in a tag cloud.

They conclude that the increasing demand for tag clouds shows there is an important class of data that users want to visualize, namely unstructured text, even though tag clouds violate the very basics of visualization!

Information visualization for text analysis

This chapter mainly focuses on the visualization of textual data. This is something we have been revisiting throughout the course in our discussions, and many of the initial visualizations presented by the class covered quite a few of the visualizations cited in the chapter.

Text mining mainly focuses on showing the significant entities in a document and their relationships with one another. The chapter is organized into Visualization for Text Mining, Visualizing Document Concordances and Word Frequencies, and Visualizing Literature and Citation Relationships. The IBM WebFountain display looks good, but it uses a lot of gradient colors that are hard to differentiate. In the third bar graph, for example, it is very difficult to tell Sergey Brin and Bill Gates apart (both have almost the same color). The tag cloud looks cool, but it is really hard to read all the words because of their orientation.

I think the New York Times representation is a really good one: it is simple and clear, showing frequency by size and arranging the words in a line, which makes them easy to compare.

Overall it is a good chapter to go through as we have been visiting these intermittently during our lectures.

Reaction: Jigsaw - Supporting Investigative Analysis Through Interactive Visualization

The paper discusses Jigsaw and its role as an external cognition aid. Jigsaw connects actions in one view to the other views, so that, true to its name, the system resembles a puzzle.
The author discusses the data structures, system architecture, and event messaging of the visualization in detail, which gives more theoretical knowledge about the Jigsaw tool.

They try to highlight and communicate connections and relationships between entities. I agree that this would be visually easier for the global community to understand. They do this through tabular connections, a semantic graph, a scatter plot, and a text view. The filters in the list view, the incremental nature of the graph view, the readable scatter plot with range sliders, and the color coding in the text view are very helpful for the large amounts of data that analysts come across in their daily work.

Overall, Jigsaw meets most of the requirements a data analyst has, though it cannot represent estimates and probabilities. The author mentions that it is not meant for careful analysis; keeping that in mind, I think the views it creates would be useful to data analysts and people working on data visualization.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

The paper describes the concepts, implementation, and evaluation of a tool called Jigsaw. Jigsaw is intended to help in the investigative analysis of documents to uncover hidden plots. This is a very intensive task in which experts pore over many documents to make connections and uncover plots. To assist experts in such a task, the researchers have worked on natural and simple visualizations of the documents being considered for a case.

The main application provides four views. Two are report-based visualizations, meaning they are mostly textual and shown in the context of the report: the text view and the scatter view. In the text view, documents are shown with highlighted entities so the analyst can focus on a specific region. The scatter view provides sliders for examining specific entities in a report and their connections. There are two more graphical representations: a list view and a semantic view. The list view provides a list of entities and the connections between them. The semantic view shows the entities and connections as a node-link visualization. If an event (such as a selection) occurs in one view, it is propagated to the others. This can also be turned off, giving users greater flexibility over what they see on each screen.

This is a really great idea, as it leaves the intelligence to the expert and just acts as an aid in his work. However, since it has multiple views and all of them are equally important, this concept can only be used effectively in an elaborate multi-monitor setup.

Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper discusses the initial Jigsaw system for visualizing textual data in a variety of ways.  The system takes a collection of reports, extracts entities (people, places, dates, organizations) from the text, and determines connections between entities.  The system targets smaller reports, providing an interactive visualization that shows connections and allows analysts to browse and explore reports and information.  The system provides multiple ways for analysts to see the data.  The first is a two-column list with space in the middle for connecting links to be drawn.  Each column can be sorted independently to provide different ways of clumping the data.  It also provides a graph view, with entities as bubbles and connections drawn as lines, which is effective for small to moderate size data sets.  The scatter plot appears busy, and the data easily gets squished into a hard-to-read state.  The text view is interesting because it shows the raw report with highlighted words grouped by color.

This paper does a good job of examining the flaws in the system in its current state, which speaks to the paper's thoroughness.  First, as the number of reports grows, tracing connections and relevant information becomes difficult.  Also, a collection must initially be static; at the time of the paper, the system could not adapt to new reports being added.  The authors themselves propose ideas that I had while reading, such as the ability for analysts to write and store notes and thoughts, and crowd-sourced annotations on different data visualizations.  When it was written, this system was a good first step into the difficult world of analytic text visualization.  I would be interested in how it has evolved over time.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This article makes two separate points: it explores and expands on the world of tag/word clouds, and it makes the case that the success of tag clouds is evidence that so-called "vernacular visualizations" should not be overlooked.

The paper begins by examining how tag clouds have existed since before computers, from maps of Paris landmarks to the web, before finally becoming a mainstream tool used in many types of media.  Tag clouds were brought into the mainstream by Flickr, which used them to visualize image tags.  The heart of a tag cloud is summarizing textual data by making the words that appear important stand out from the rest.  Importance can be determined by something as simple as frequency or as complex as meshing the issues in a Presidential speech with the concerns of the American public.  IBM introduced two-word phrases, whereas earlier tag clouds had only used single words.  Further, tag clouds can draw out data beneath the surface or condense large data sets to isolate what is important.  Patterns can be detected this way, and tag clouds are routinely used for this with political speeches.  However, tag clouds are not perfect.  Long words can receive undue emphasis, large tag clouds make it harder to find specific words, and humans have difficulty detecting differences in font size, so some of the relative sizing intended to show intensity is lost.  Lastly, if the ordering is simply alphabetical, as it is in most tag clouds, there is no sense of relationship.  For example, east and west are similar in concept but would be placed far apart.
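The size mapping and the ordering problem described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not how any particular tag cloud tool works: font size is scaled linearly between a hypothetical minimum and maximum point size (real tools often use other scales), only the `top_n` most frequent words are kept, and the words are returned in alphabetical order.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_cloud_sizes(words: Iterable[str], min_pt: float = 10.0,
                    max_pt: float = 40.0, top_n: int = 50) -> Dict[str, float]:
    """Map the top_n most frequent words to font sizes, ordered alphabetically."""
    counts = Counter(word.lower() for word in words)
    top = counts.most_common(top_n)
    if not top:
        return {}
    low = min(count for _, count in top)
    high = max(count for _, count in top)
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    # Linear scaling (an assumption): the rarest kept word gets min_pt, the commonest max_pt.
    sizes = {word: min_pt + (count - low) * (max_pt - min_pt) / span
             for word, count in top}
    # Alphabetical order, as in most tag clouds: related words can end up far apart.
    return dict(sorted(sizes.items()))


if __name__ == "__main__":
    print(tag_cloud_sizes("east west east north east west".split()))
```

With that input, east gets 40 pt, west 25 pt and north 10 pt, and the alphabetical order places north between east and west, which is the loss of relationship the article points out.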

The other side of the paper brings forth the notion of a vernacular visualization - one not brought about by academia but arising from use in practice.  By showing the success of tag clouds, the authors intend to show that not all visualizations need to come from academia.  However, the research community should still vet and explore vernacular visualizations to provide scientific foundations.  I tend to agree with this approach and often find that everyday experiences bring ideas to light that rigorous science would not otherwise.  However, research should still be performed after the fact to confirm findings rigorously.

Reaction: Information Visualization for Text Analysis

This article explores how to analyze text in condensed form using information visualization techniques, and how various systems do such work.  Before information visualization techniques, search was the primary text analysis tool, but search is a poor tool when analytic examination needs to be performed.  The paper proceeds by going through various systems that showcase different techniques.

The first technique identifies entities, discerns the connections between them, and presents that information visually through lines connecting the words.  This concept was expanded upon by the Jigsaw system, which provides two columns of ordered lists with connections drawn through the middle.  Other systems use rows and groupings to categorize search results for browsing.  SeeSoft takes lines of text and highlights related words using colors to indicate matching groups.  SeeSoft was originally written for programmers, who spend most of their time in a text-based world, to support code reviews and refactoring.  It was later extended to fiction and the Bible.  The last systems covered were generic word clouds and the NameVoyager system.  NameVoyager, as we have seen in class, shows baby names in a stacked area graph that allows filtering down to see more or less condensed data.  All in all, this article provides a good overview for anyone who wants to see what is out there in the world of text analysis.  It feeds into the next selection of readings, which explore some of the individual systems mentioned here.


This paper discusses the different types of visualization used in analyzing documents or texts. Its primary focus is understanding text collections from an analytical point of view. To achieve this, the author discusses the application of visualization to text mining, to concordances, and to the relationships between words in their usage in language and in lexical ontologies. Text mining focuses on identifying important entities within the text and showing connections among them. Examples include the TAKMI text mining system, the Jigsaw system, the BETA system, and the triage system. Visualizing concordances and word frequencies involves placing the word of interest in the center, with the related text around it sorted in some way. I particularly like the TextArc visualization, as its coxcomb-type radial layout is intuitive as well as flexible enough to suit users' expectations, and it provides an effective visual display of text treemaps and word linkage in a document. Tag clouds and NameVoyager, which we discussed in class, analyze a collection of texts by extracting concordances. I agree with the notion that the categorical nature of text and its high dimensionality make it very challenging to display content graphically, since words have no inherent ordering. Finally, the author discusses literature and citation analysis to assess the importance of authors, papers, and topics in a field. Node-link graphs and the PaperLens interface illustrate the concept. Overall, it is a very informative read, and the examples cited make the underlying concepts easy to understand.
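The concordance layout, with the word of interest in the center and its surrounding text arranged around it, is essentially a keyword-in-context (KWIC) listing. Below is a minimal Python sketch; the tokenizer is a crude regular expression, the context width is a hypothetical parameter, and sorting by the right-hand context is just one common choice of "sorted in some way".

```python
import re
from typing import List


def kwic(text: str, keyword: str, width: int = 3) -> List[str]:
    """Keyword-in-context lines: `width` words of context on each side of every hit."""
    tokens = re.findall(r"\w+", text.lower())  # crude tokenizer; drops punctuation
    key = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, right))
    hits.sort(key=lambda hit: hit[1])  # order lines by their right-hand context
    pad = max((len(left) for left, _ in hits), default=0)
    # Right-align the left context so every keyword lines up in one column.
    return [f"{left.rjust(pad)} [{key}] {right}" for left, right in hits]


if __name__ == "__main__":
    for line in kwic("A cat runs home. A dog runs fast.", "runs", width=1):
        print(line)
```

Aligning every occurrence in a single column is what lets a reader scan the surrounding words quickly, which is the point of the concordance view.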

Reaction: Jigsaw: Supporting Investigative Analysis Through Interactive Visualization

This paper starts off rather ambiguously in defining the obstacles the project aims to overcome and how the system works. It eventually moves into more concrete detail about how the system works and provides several examples. This software could be highly useful in situations ranging from counter-terrorism to political dirt digging. Though nothing really new is presented in this paper, the organizational structure seems very clean, and its way of linking names across data sources is interesting. I have not seen a structure like this before, and I wonder: could this type of technique be used to detect plagiarism within research articles by finding links among authorships?

As a research tool, Jigsaw could potentially be useful for extracting connections between gene regulations. Suppose an entity, a gene called gene1, is mentioned in the same paper as gene2, and gene2 is mentioned in a second paper that also mentions gene3. Suddenly gene1 and gene3 are linked. Most data mining software I have seen today cannot link terms in such a distinctive, visually appealing, and informative way. If this software could be adapted for this purpose, gene regulatory networks might be developed more quickly, with less need to read every paper that mentions a gene. It would also automatically construct the links needed to find additional papers worth reading.
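The gene1/gene2/gene3 scenario can be sketched as a small co-occurrence graph. This is a hypothetical illustration of the idea in this reaction, not Jigsaw's implementation: it assumes entity extraction has already been done, treats two entities as connected when they appear in the same paper, and then reports pairs that were never mentioned together but share a common neighbour.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_graph(docs: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Link two entities whenever they are mentioned in the same document."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for entities in docs.values():
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def indirect_links(graph: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Pairs never mentioned together but both co-mentioned with some middle entity."""
    links: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for middle, neighbours in graph.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in graph.get(a, set()):  # skip pairs already directly linked
                links[(a, b)].add(middle)
    return dict(links)


if __name__ == "__main__":
    # Hypothetical entity lists, one per paper.
    papers = {"paper1": ["gene1", "gene2"], "paper2": ["gene2", "gene3"]}
    print(indirect_links(cooccurrence_graph(papers)))
```

For these two papers, the only indirect link reported is (gene1, gene3) through gene2; the set of middle entities is what would point a reader to the papers worth opening.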

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper is a comprehensive visual abstraction of the various facets of investigative analysis of textual reports. The tool described in this paper is not, per se, a substitute for investigative reporting, but it can provide visual hints about how to interrelate and comprehend short textual information related to an investigation.
The four-view, interactive, inter-communicating tool, developed in Java with a view-model architecture, can be broadly divided into two parts:
a. Text-based components, comprising the Text View and Scatter Plot
b. Graphic-based “entity-connected components”, comprising the Graph View and List View
I would prefer to call the Text View a visual one-dimensional component.  It scans for entities such as ‘who’, ‘when’, and ‘where’ and renders them visually in different colors. Scatter Plots are two-dimensional, since occurrences of entities in the text are plotted along horizontal and vertical ‘axes’.  Visually positioning the occurrences adds further value.

The connections between entities are visualized with different colours, sizes, and bar densities in the Graph View and List View.  I would term this a dynamic dimension, forming various end points of communication from the textual visuals.

The authors use a Java-based architecture with event-based messaging libraries.  The XML tree structure also receives a relevant mention in this paper.

Can we create a JML (Jigsaw Markup Language) with a defined XML Schema to ensure standardization as research in this area advances?

Literature Survey:
During a literature survey on the internet, I found a PDF presentation by Stefan Lorenz of Universität des Saarlandes titled “Jigsaw: supporting investigative analysis through interactive visualization”.  It may be worthwhile for readers to go through this presentation as excellent supplementary material.

TileBars: Visualization of Term Distribution Information in Full Text Information Access

In this paper, the author discusses a new visualization method, along with a new algorithm, for searching full-text articles rather than just abstracts or short articles, in a way that allows one to systematically compare different articles without bias. TileBars uses a segmentation algorithm to parse each paper into “tiles”, where each tile represents a subtopic of the full article and shows how frequently the query terms are used within that subtopic. The visualization then lets you quickly see the relative relevance of each article based on this distribution and frequency of terms.

Though this paper was written in 1995, I have not seen many text mining systems that incorporate the positional information of terms within a document and compare it across documents. This could be very useful in the early stages of information retrieval for certain research projects, by giving a quick sense of each article's relative importance in a search. When you search a database like PubMed, where there are literally millions of articles, the more refined your search becomes, the more likely you are to miss critical papers. This technique would let you run a less precise search that returns a larger collection of papers. Normally this would be problematic, since more papers returned (and sometimes there are thousands) means more papers to read, but TileBars provides a visual clue as to which articles to read first. Even though this approach provides no true ranking, the visual display is very good at conveying what relevance a paper could have based on its profile, thus saving time and effort. This technique would be welcome in the research I do at the EPA, and I will most likely recommend looking into it.