Web Visualization @ NCSU: 2011-11-20

Saturday, November 26, 2011

Viz: What topics science lovers link to the most

Pretty, but occlusion a problem. Good thing there aren't many nodes!


Reactions: TileBars: visualization of term distribution information in full text information access.

This paper presents TileBars, a visualization technique for displaying information about, and relationships between, the full text of multiple documents. It shows the user the relative length of each document, the frequency of the query terms, and the distribution of those terms across the document (also with respect to other documents).

The paper notes that older information retrieval techniques work only with titles and abstracts. It describes the most common approach to text retrieval, similarity search, which uses the vector space model and probabilistic models to determine how close documents are to each other. It also describes Boolean retrieval, where documents are returned in ranked order once they satisfy the constraints given by the user. These approaches suffered from numerous drawbacks, so the author proposed TileBars.

TileBars helps the user decide which documents to view, and goes further by indicating which passages of those documents are relevant. TileBars displays the search results using square tiles. The shade of each tile visualizes term frequency, and the length of each bar visualizes the length of the document. This is achieved using ‘TextTiling’. The paper’s discussion of the ‘TextTiling’ algorithm is very short, but it gives a brief overview of how it works: TextTiling finds the boundaries between subtopics using term repetition in the document.
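To make the idea of finding subtopic boundaries from term repetition more concrete, here is a minimal sketch. It is not the TextTiling algorithm from the paper; it is a simplified stand-in that places a boundary wherever two adjacent sentences share almost no words. The function names, the sentence-level granularity, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def subtopic_boundaries(sentences, threshold=0.1):
    """Return each index i where a boundary falls between sentence i-1 and i.

    Hypothetical simplification: compares single adjacent sentences,
    and the threshold is an arbitrary illustrative value.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    return [i for i in range(1, len(bags))
            if cosine(bags[i - 1], bags[i]) < threshold]

sentences = ["cats chase mice", "mice fear cats",
             "stocks fell today", "stocks rose today"]
print(subtopic_boundaries(sentences))  # [2]
```

In this toy input the first two sentences repeat "cats" and "mice" and the last two repeat "stocks" and "today", so the only low-similarity gap is before the third sentence.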

I agree with the author’s point that the columns of TileBars can be scanned and understood more easily than the previously described techniques. Overall I find the paper very well structured, though some sections are somewhat hard to understand. Even though it is old, I think it could still be used today on news or information websites (like blogs). I also see some similarity (the use of tiles and color) between this visualization and Map of the Market, which we discussed in class.

Reactions: Jigsaw: Supporting Investigative Analysis through Interactive Visualization.

This paper presents the Jigsaw system, which helps analysts retrieve information from documents and reports more easily. The authors propose this system to overcome the difficulty of understanding and making sense of a long and ever-growing collection of documents. Jigsaw provides multiple visualizations of the data and of the links between the contents of reports. Each view has its own advantages. Using these views interactively, an analyst can put the pieces of information together to reach a conclusion (mainly, to find hidden information). It has the following views: (1) a List view with connections between objects in different columns; (2) a Graph view displaying links between reports and entities; (3) a Scatter plot for closer investigation of the relationship between any two categories; and (4) the original text report with highlighting.

I agree with the authors’ point that Jigsaw is a system to assist the analyst in sense-making activities, not a substitute for careful analysis of the reports. The paper is well structured and describes the tool very well. It also compares Jigsaw with similar tools such as GeoTime, WebTAS, TRIST and SANDBOX. Jigsaw differs from the other systems because it focuses on (1) exploring relationships among the entities in documents and (2) representations of those relationships. I feel the screenshots in the paper give a really good picture of the Jigsaw views. I feel the tool can be very useful because each view can overcome the disadvantages of the other views.

Reactions: TIMELINES: Tag clouds and the case for vernacular visualization

This paper starts with the history of tag clouds, tracing their first use to 1976, when a psychologist captured a mental map of Paris using landmark names. It places their first use in the computer field in 1995, by Douglas Coupland. The first major use of a tag cloud was in Fortune Magazine, to display 500 corporations. Interest then kept growing over time, especially with their use on websites. The paper discusses two types of tag cloud: the traditional one-word tag cloud and the two-word tag cloud. In a two-word tag cloud, visual emphasis is given to the most frequently used two-word phrases.
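The difference between the one-word and two-word tag clouds described above comes down to what gets counted. The sketch below is only an illustration, not the method used in the paper: the function name `tag_weights`, the linear scaling, and the font-size range are assumptions.

```python
from collections import Counter

def tag_weights(text, n=1, min_size=10, max_size=40):
    """Map each n-word phrase to a font size proportional to its count.

    n=1 gives a one-word cloud, n=2 a two-word cloud.
    Sizes and the linear scaling are illustrative assumptions.
    """
    words = text.lower().split()
    phrases = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(phrases)
    if not counts:
        return {}
    top = max(counts.values())
    return {p: min_size + (max_size - min_size) * c / top
            for p, c in counts.items()}

speech = "health care reform health care costs"
print(tag_weights(speech, n=1)["health"])       # 40.0
print(tag_weights(speech, n=2)["health care"])  # 40.0
print(tag_weights(speech, n=2)["care reform"])  # 25.0
```

In the two-word version "health care" stands out as a phrase, while the one-word version only shows that "health" and "care" are each frequent.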

The most interesting fact the paper states is that tag clouds do not follow many of the theoretical rules of visualization, yet they have been adopted by a great many websites. The paper lists a few problems with tag clouds: long words get more visual attention than shorter words, alphabetical ordering can place unrelated words close together, and they are not as good for overall sense making as some other techniques. That is the theory, however; in practice tag clouds are still attracting interest even though they break some golden rules of visualization. This is because a tag cloud provides a friendly atmosphere and an easy entry into complex sites such as social websites and blogs. I like the content of the paper overall, as it contains many interesting facts and defines the subtle advantages and theoretical disadvantages very clearly. I also liked the authors’ point that we may need to look again for new possibilities, and at old knowledge, since tag clouds succeed even without complying with best practices. This could be very beneficial for finding new techniques, because the tag cloud, one of the most curious and widely used visualizations, was not found through research but through unconventional ways.

Reactions: Information visualization for text analysis.

This chapter from Search User Interfaces is about information visualization for text analysis. It covers multiple ways and tools for visualizing mined text. The first section discusses strategies such as text mining to find the important words in a text and the relations or connections between words. Several tools that provide this functionality are covered very briefly: TAKMI for text mining of call center data; the Jigsaw system for analyzing relationships between entities or words; and the BETA system for IBM fountain, which supports data exploration using TileBars. However, all of them are described in very little detail, with some screenshots.

The chapter then discusses analyzing words and text by extracting concordances and displaying them in context. It notes that a very common way to view a concordance is to display the word of interest in the center with the other words around it by relevance. For this it covers tools like SeeSoft and TextArc, which differ in how they display concordances. SeeSoft displays the concordance in columns, and TextArc displays it in a spiral with connections to words. It also covers tag clouds used on websites and a few other ways of displaying concordances and relationships between words. The chapter also discusses displaying relationships between citations in the literature. One of the visualizations discussed uses a graph with nodes and links to display relationships between authors and documents.
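A concordance with the word of interest in the center can be sketched in a few lines. This is a generic keyword-in-context listing, not the method of SeeSoft or TextArc; the function name `concordance` and the `width` parameter are assumptions for illustration.

```python
def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence.

    width is the number of words of context kept on each side
    (an illustrative parameter, not taken from the chapter).
    """
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # Ignore case and surrounding punctuation when matching.
        if w.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, w, right))
    return lines

for left, word, right in concordance("The cat sat on the mat. A cat ran.", "cat", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>10} [{word}] {right}")
```

Lining the keyword up in a column is what lets a reader scan the surrounding words for patterns.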

I feel the chapter gives only very basic, introductory information about visualization for text mining and its tools. It covers the tools without much detail and just provides screenshots, which I feel are of very little use if I want to create a tool for this purpose. The chapter would have been more informative if it had discussed the detailed process of mining information and how the tools process information for visualization.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

     Stasko, Gorg, Liu, and Singhal present an interesting visualization system to aid in the discovery and review of a collection of full text documents. Their stated goal is to "develop visual representations of the information within text documents and report collections in order to help search review and understand the reports" and I would agree they have accomplished most of that. Their Jigsaw system is a Java-based system that provides four different views of the relationships between documents and entities (people, places, dates, and organizations). I see the benefit of having four different views, since it allows you to see things in different ways, but it seems odd for a tool such as this to have four completely different views whose visualization methods really do not relate to one another. This requires the user to readjust to each view and perhaps requires a bit more of a 'context' switch between views. With this decision I am left wondering if there are any other 'views' that would be helpful to users of the system.
     I'm not certain how easy the tool is to use with large data sets. Many of the screenshots, I feel, show a visualization that would not scale well to thousands of documents. For example, the "List View" does not seem to have an easy way to filter each side of the list. While the paper discusses a search option, a simple filter would be nice. Their original stated goal is to help "understand" the reports, but I'm not sure how well they do that. I could see it helping in discovering related documents, which may help in understanding, but more as a by-product.
     Overall the tool seems like it would be good for exploring small to medium sized collections of documents.

Friday, November 25, 2011

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper presents a visual analytic system called Jigsaw that represents documents and their entities visually in order to help analysts examine reports more efficiently and develop theories about potential actions more quickly. It provides multiple coordinated views of document entities with a special emphasis on visually illustrating connections between entities across the different documents. It helps investigative analysts who are faced with the challenging task of assessing and making sense of large bodies of information.

Jigsaw is designed to assist analysts with foraging and sense-making activities across collections of textual reports. Through interactive exploration, analysts are able to browse the entities and connections to help form mental models about the plans and activities suggested by the report data but it is not a substitute for careful analysis of the reports. Instead, it acts as a visual index that presents entity relations and links in forms that are more easily perceived, thus suggesting relevant reports to examine next. Since Jigsaw uses other available tools to extract the entities from a document, without determining the accuracy of these tools, it is not possible to judge if Jigsaw presents the information correctly.

Reaction: INFORMATION VISUALIZATION FOR TEXT ANALYSIS

This chapter is a very interesting read for data analysts and people interested in the field of text mining. It describes ideas that have been put forward for understanding the contents of text collections from a more analytical point of view. It discusses applications in the field of text mining, which usually involve visualizing connections among entities within and across documents. It also discusses methods for visualizing occurrences of words or phrases within documents, and various attempts to visualize relationships between words in their usage in language and in lexical ontologies.

I really liked the TRIST system, which uses entity extraction to identify the people, places, and organizations that occur within the retrieved documents. Things of interest can also be dragged to the workspace below the search results, and documents can be grouped by clustering or automated categorization.

Each visualization or system is accompanied by a very engaging image, which makes the visualization much easier to understand. All the applications listed are pretty interesting, like the Jigsaw system and the TAKMI text mining interface. Social visualization sites like IBM's manyeyes.com, and other tools, continue to make visualization generally accessible to non-programmers. Overall a good read.

Reaction: TileBars - Visualization of Term Distribution Information in Full Text Information Access

This paper presents a new visualization pattern called TileBars, which shows the usefulness of explicit term distribution information in Boolean-type queries by making use of text structure when retrieving from full text documents.

It is a very useful analytical tool for understanding the results of Boolean-type queries. The author argues that the output of a full text search should also convey the frequency and distribution of words in the document, and introduces an algorithm called "TextTiling" to automatically determine the document structure. The search results are displayed using rectangles and shaded squares that indicate the length of the document and its subcategories, with the shades denoting the frequency of the word in that subcategory. It's a very novel method of analyzing search results of Boolean queries, and it also provides a condensed view containing the document length, key term frequency and distribution.

TileBars is very effective because it presents a visual display of the results, so the user can open the documents that are highly relevant to the search terms and ignore the others. Apart from reducing the visual clutter, it also saves the user a lot of time. This paper is a must read for people interested in the field of data mining and information retrieval.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper shows how tag clouds have evolved over time and found their way into social media and Web 2.0. I totally agree with the authors' point that a tag cloud is truly a “vernacular” technique, i.e. a visualization that does not come from the visualization community. The paper shows how tag clouds are used and how they became very popular and useful because they are so visual, engaging, flexible and easy to read.

The paper cites the instances where tags proved useful, from Jim Flanagan's use of tags to show the popular search terms that led people to his website, to Flickr showing tagged images, to psychological experiments. Since then many sites, like del.icio.us and Technorati, have followed this idea. It also says that showing a collection of tags is not the only use of tag clouds; these days they are also used for analytical tasks, in the form of word clouds. Overall this paper is short and to the point.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

     Marti Hearst presents an interesting visualization for presenting Boolean-style search results on full text documents. The "TileBars" paradigm involves dividing up each document into subtopics called TextTiles. The Boolean search is performed on each TextTile, and a small graph is presented beside each search result that helps the user understand roughly how many subtopics a paper has, the frequency of the search terms across each subtopic, and the distribution of them across the paper. This enables the reader to see if their search terms were scattered throughout the paper, or concentrated in one or two subtopics. I'm curious if the algorithm includes the abstract as a 'subtopic'.
     This visualization seems very useful for doing full text searches, such as for research. I don't see it being very useful for general web searches or having many other uses. I would be very interested in reading more about the author's TextTiling algorithm and how she evaluated it. To me that seems to be a very important aspect of the TileBars system, and yet the author went into very little detail about it. She says it is "Serviceable" for TileBars, but how did she come to that conclusion?
     I did find her analysis of the difference between the abstract of an article and its full text interesting. Everything made sense and was intuitive, but I had never thought about the difference from a search perspective, and so the simple analysis was refreshing.
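The small graph beside each search result can be pictured as a grid: one row per set of query terms, one column per subtopic, and a darker cell where the terms occur more often. The sketch below is only a rough illustration under assumptions. The segments are supplied by hand rather than produced by TextTiling, and the names `tilebar`, `render`, and `SHADES` are hypothetical.

```python
SHADES = " .:#"  # lightest to darkest; an arbitrary four-level scale

def tilebar(segments, term_sets):
    """Count hits per (term set, segment). Rows are term sets, columns segments."""
    grid = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        grid.append([sum(1 for w in seg.lower().split() if w in wanted)
                     for seg in segments])
    return grid

def render(grid):
    """Turn counts into one text row per term set, capping at the darkest shade."""
    return ["".join(SHADES[min(c, len(SHADES) - 1)] for c in row) for row in grid]

segments = ["law and order", "network law network", "the network"]
grid = tilebar(segments, [["law"], ["network"]])
print(grid)          # [[1, 1, 0], [0, 2, 1]]
print(render(grid))  # ['.. ', ' :.']
```

The width of each row equals the number of segments, so a longer document gives a longer bar, and a column that is dark in both rows marks a passage where the two term sets overlap.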

Wednesday, November 23, 2011

Reaction: Tag Clouds and the Case for Vernacular Visualization

     This article details the history of tag clouds, what they are, and their strengths and weaknesses. Overall I thought this was a fairly good article detailing why tag clouds can do a poor job at summarizing large sets of data in most cases. I was a little confused about why there needed to be such a large distinction between visualization methods originating inside or outside academia. All visualization methods should be evaluated for their effectiveness and clarity, so it seems silly to make such a distinction, and it almost comes off as somewhat condescending.
     The article's examples of the one-word tag cloud versus the two-word tag cloud were interesting. While the two-word tag cloud provided more insight into what the speech was actually about, most phrases seem to be either extremely large (only one), large (about two), or everything else. This does not seem to indicate frequency as well as the one-word cloud, but probably gives a better indication of content. More details about this would be interesting.
     Overall the article is fairly good and comes down pretty hard against word clouds, which I would mostly agree with, especially for things like speeches. I did like how they gave examples of word clouds that were more beneficial, such as tags of a user's photos that give an overview of that user.

Reaction: Information visualization for text analysis (Chapter 11)

     The goal of this chapter is to describe ideas that have been presented for understanding large text collections. It seems to cover three main aspects: text mining, occurrence of words, and word relationships. In all cases it doesn't seem to do a very good job.
     Mostly it seems that this is a list of tools, not of visualization ideas for any of the three categories it claims to cover. It presents the tools poorly, most with a one or two sentence description without much depth. The few times it does go into more detail about a tool, it does not evaluate it to any great degree.
     I am not one hundred percent sure who this article is aimed at, as it does not seem to attempt to accomplish its goals, nor does it seem to be anything more than a starting point for someone looking into tools for representing large text collections, and at this it will probably become outdated quickly.

Guests: final presentations, December 12

Folks,

We will have several guests for final presentations. As I've said before, I will not be there:

Pat Fitzgerald, Prof. Art & Design and director of the Advanced Media Lab
Billy Houghteling of OTT and Springboard
Clayton Coleman of IBM

I am working on other guests, more when they are confirmed.

Best,

Ben

Tool: "Jonathan Stark on Mobile" from The Web Ahead podcast

Good info on HTML5 for mobiles and PhoneGap.

Jonathan Stark joins Jen Simmons to talk about web apps vs. native apps, when to use which mobile technology, how to plan a good mobile experience, touch events, and more.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

It was a really interesting paper to read. The authors discuss a novel method for analyzing large volumes of text and thereby aiding analysts in the investigation process. For this the authors have developed a tool called Jigsaw, which they use to represent text documents in a visual fashion. One thing that was interesting about this paper was that, unlike other papers, they talk about the cost of analyzing the documents and how the relationships are determined. One concern is how the analysis can be verified.

It gives me a feeling that having a tool like Jigsaw for large documents is very important in text summarizing, as the amount of text data we have is enormous and having it represented as a visualization ought to be very helpful. But I don't know how difficult and complex it would be for large documents, as the authors have shown it only for short ones.

Another concern is having to cope with the use of multiple monitors. I have personally used two monitors at a time and it's really helpful, but I guess it might be very easy to lose focus if we increase the number of monitors one has to look at simultaneously.

Further the authors have used various visual forms like tables, graph, text view, scatter view etc. to represent the document. I believe the advantage of having multiple representations is to provide a flexible and effective understanding of the document under study.

It would be really interesting to see some future work where they try to scale the use of this tool to larger documents.

Find: User Experience and Experience Design

A great intro to ux and xd. How will your design be woven into
people's narratives? What personal needs does it meet?

http://www.interaction-design.org/encyclopedia/user_experience_and_experience...

Tuesday, November 22, 2011

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization.

The paper provides a good introduction to the investigative tool called Jigsaw. The tool helps the user to understand connections between entities: persons, places, dates and organizations using 4 different views: List View, Graph View, Scatterplot View, and Text View.
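One simple way to think about connections between entities is co-occurrence. The sketch below assumes that two entities are connected whenever they appear in the same report; that rule, the function name `entity_connections`, and the sample data are assumptions for illustration, not a description of how Jigsaw is implemented.

```python
from collections import defaultdict
from itertools import combinations

def entity_connections(doc_entities):
    """Map each pair of entities to the set of documents mentioning both.

    doc_entities: {document id: list of entity names}. Hypothetical input
    format; entity extraction itself is assumed to have been done elsewhere.
    """
    links = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        # Sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)].add(doc_id)
    return dict(links)

reports = {
    "r1": ["Alice", "Paris"],
    "r2": ["Alice", "Paris", "Acme"],
    "r3": ["Acme"],
}
links = entity_connections(reports)
print(sorted(links[("Alice", "Paris")]))  # ['r1', 'r2']
```

A table like this could feed any of the views: a list view would draw a line per pair, and a graph view would use the pairs as edges.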

In my opinion, each of those views complements the others, and each will be helpful in its own way. However, it would be interesting to see a geographic view that presents a map with different colors depending on how many mentions a country or region had.

Also, I don't think that the tool will be useful when you have a large number of entities, because you would have many labels and nodes, illegible axes, or the need to scroll a lot, so it would be interesting to see how the tool would adapt to large amounts of text and entities.

Reaction: TileBars: visualization of term distribution information in full text information access.

This is an interesting paper because it explains the differences between performing text retrieval on titles and abstracts, the way it was done before, and performing it on full texts once that information became available.

Since titles and abstracts are shorter than full texts, information retrieval on them was performed in a different way. The author introduces a tool called TileBars that performs information retrieval on full texts while helping the user by showing how long each text is and how the terms are distributed across the structure of the document. It also provides a way for the user to decide how to perform the search (how many term sets to provide, and the Boolean connectors between them).

The tool is useful because the user can understand how the search was performed, which helps them refine their searches in order to get better results. Nowadays, whenever we search with Google, we need to "trust" how the search is being performed, and the algorithm is not displayed to the user in a graphic way, so I think that the tool was useful. The paper also provides 3 different examples of how the search is performed using different term sets, and what the results of the search are.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This is another article in the line of correlating separate documents and their entities and seeing how significant their coupling is in terms of concepts. The article warns the reader that the tool is simply an aid to understanding the documents and correlating them, and is not intended to replace the documents. I feel this article is related to the TileBars paper as far as understanding the inherent relationships between documents.

Jigsaw offers a large number of different views that are handy for analyzing the data from different angles, and it also connects these visualizations, as explained in the scenario. The user can switch between views to get detailed information about certain facets of the information. The scatterplot shows relationships between entities and documents, and the graph shows an incremental view of the same information. The list view allows selecting a particular node after arranging the list in a particular order, and the text view gives the document content.

I feel this visualization is more comprehensive than TileBars; however, in perspective, the two are meant for different kinds of analysis. I feel where Jigsaw really stands out is in the different views it presents. The visualizations gain tremendous utility from small widgets for zooming, arranging the graph in a geospatial region, node expansion, timelines and sorting. In addition, it supports querying.

The paper also states the limitations of the Jigsaw system, namely scalability and scrolling for a large list of data. It does handle scalability in the case of network graphs, however.

I feel this tool is of great use to people trying to understand the documents and corroborate findings using different views.

Reaction: Information Visualization for Text Analysis

I found that this chapter gives some insights into quite an interesting area of research, i.e. text analysis. Coupled with an effective data visualization scenario, the topic can be of great interest to any computational linguistics researcher, I believe.

The author clearly explains the heuristics of applying data visualization to text mining at the beginning. The examples and tools mentioned helped me understand how text analysis actually works. I especially liked the TRIST tool, as it represents search results as document icons and also supports multiple linked dimensions that allow for finding characteristics and correlations among the documents.

Also, according to the author, tag clouds are not a very good way to turn a textual context into a visualization. However, a few years back there was a wave of bloggers adding tag clouds to their blogs. I feel tag clouds are effective, but with too much information users may get overwhelmed. I think that's probably the reason Twitter now shows only a few of the popular hashtags.

In the last section of the chapter the author discusses visualizing literature and citation relationships, both of which are closely related to the field of text mining and text concordance analysis. Some of the examples were discussed in class too (the NameVoyager visualization).

I certainly feel that the chapter is useful for anyone who is into textual analysis and wants to learn some effective visual representations as an addendum. Even the most popular social networks these days are coming up with innovative approaches to visually representing the content that users add (Facebook timeline), and sites like IBM's manyeyes.com make visualizations accessible to non tech-savvy users.

Reaction: TIMELINES: Tag clouds and the case for vernacular visualization

I found this paper to be precise enough to explain the history of tag clouds and how they evolved over time. Tag clouds are quite a modern approach in the era of Web 2.0; however, their evolution certainly takes us back in time.

According to the authors, tag clouds are not a very formal way of visualizing textual data, yet they are still quite popular in vernacular media. The authors also mention that tag clouds may work in practice but not in theory. I think that, with so much content generated each day, it is probably an enormous task for anyone to visualize the information that is represented by such tag clouds.

I have been seeing tag clouds for a number of years now. As a novice researcher in visualization, for me a tag cloud represents the niche of a system. Anyone can easily tell what a blog or a website (or any other medium) is all about by glancing through its tag cloud.

Interestingly, if we look at the tag cloud of this blog, "reactions" is the most popular tag, followed by "data" and "graphs", which certainly tells any anonymous user about the niche of this website. I think that's the reason why they have linked vernacular visualizations to tag clouds.

I feel there's a lot of scope for vernacular visualizations. Tag clouds are probably most people's first encounter with such visualizations.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The paper is more than a decade old. I don't know if there's any analogy to Moore's law for visualizations and time, but the paper was still an interesting read and a bit informative too.

Here, the author discusses the TileBars visualization approach, which is an effective way of showing the structure of text in full text documents. In the beginning, the author explains how we can retrieve information by running queries and also lists the pros and cons of this approach.

I found the TileBars approach quite interesting, as it lets anyone see the importance of the terms in such queries. I think that, about 16 years after this paper was published, there are a number of actual examples today that are inherited from this approach.

The author also proposes an interesting algorithm called TextTiling to query and learn the frequency distribution of the text and possibly determine any patterns. However, I think this approach is quite old now. We certainly have better text summarization and information extraction methods, which are based on word clusters and ranking algorithms. I'm not sure if TextTiling was the tipping point for such algorithmic needs.

Overall, I learned a couple of useful things from this paper and feel that it would be very useful for anybody who is into the data mining and information extraction field.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This is a very interesting research paper that discusses a novel approach to assessing information from a huge chunk of textual data. For this, the authors have created a tool, Jigsaw, which is used to visually represent the textual content of the documents.

I certainly feel that text summarization is a very interesting topic, especially today when gazillions of bytes of data are created every day. There are a number of tools available today for text summarizing; however, representing the textual data as some sort of visualization is indeed a very good idea!

The authors argue that it is possible to break down any document into perceptive visual forms like graphs, tables, a scatter view, a text view etc. This is helpful in the field of data analytics, where the data is in the form of text and not represented as reports.

In the end, the paper also discusses the limitations of such a system: it is difficult to evaluate from a usability perspective and through trial use by real analysts. Another challenge it faces is scalability, as for larger report collections the number of entities in a category can grow into the thousands or beyond. Their approach probably needs some form of dynamic update and filtering.

Overall I feel that Jigsaw is not a substitute for careful analysis of the data reports; however, it certainly gives some insights through visual indexes that present entity relations in forms that are easily perceived.

Announcement: final project presentations

Hi folks,

Unfortunately my work takes me out of town on December 12, the day of our final. So here is how the final project will work:

You must create a screencast demo of your visualization. It should essentially be a version of your final project presentation. Put it on your site, where visitors can view it to learn about your site.

There are several free tools that you could use, including camstudio, screen2avi, jing, screencast-o-matic, microsoft expression encoder, screenpresso, and fraps.
You should put it online no later than December 15. If you submit it December 8 or earlier, I will give you feedback that you can use to improve your live presentation.

On the day of the final (December 12), you will give your presentation to several visitors, including NCSU faculty and corporate visitors. They will give you feedback about your presentations and projects.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This paper explains the Jigsaw system, an alternative technique for assessing and making sense of large bodies of information. I believe it acts as a visual index that presents entity relations and links in forms that are more easily perceived, thus suggesting relevant reports to examine next. Further, the authors suggest that for better visual perception it is more efficient to go with a graph-based approach than a scatter plot, as in the latter it becomes very difficult for the user to discern and extract critical information. Decision making for business process intelligence can be thought of as one of the important beneficiaries of this scheme, because I feel there is a day-to-day need for information summaries to support the transition from goals to actual business realization. The authors also talk about generating analytic reports in a visually perceptive way and mention that reports should not be replaced by tools. Other systems sometimes put too much information into a single complex view, with the result that, though the information may be present, it is harder to discern and much less accessible from the analyst's viewpoint. With Jigsaw, I feel it is the synthesis of all the views and their interactive capabilities that provides an environment for aiding investigative analysis.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper introduces a new visualization technique, "TileBars", for full-text search. Usually when a Boolean query is performed, the output is the set of documents that contain the word, and the ranking of documents in the list depends on the frequency of the word in each document. These documents are usually presented as a title and abstract. The author says that the output should also show the frequency and distribution of the query words in the document and the length of the document. TileBars serve this purpose, which helps in analyzing the search results of Boolean-type queries. The author introduces an algorithm called "TextTiling" to automatically determine the document structure. The search results are displayed using rectangles and shaded squares: the rectangle indicates the length of the document, the squares its subsections, and the shade the frequency of the word in that subsection. I find it a new and useful method of analyzing the search results of Boolean queries, and it provides a compact view of document length, key term frequency and distribution.
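To make the display idea concrete, here is a minimal Python sketch of one TileBar. It is only an illustration, not the paper's implementation: the segments are split by hand (the paper would get them from TextTiling), and the names `tile_counts` and `render_tilebar` and the character shades are assumptions made for this example.

```python
from collections import Counter

def tile_counts(segments, terms):
    """Count how often each query term occurs in each text segment.

    Returns one row per term and one column per segment.
    """
    rows = []
    for term in terms:
        row = []
        for segment in segments:
            words = Counter(w.strip(".,;:!?").lower() for w in segment.split())
            row.append(words[term.lower()])
        rows.append(row)
    return rows

def render_tilebar(rows, shades=" .:#"):
    """Turn counts into characters; a denser character means more hits."""
    top = len(shades) - 1
    return ["".join(shades[min(count, top)] for count in row) for row in rows]

# Hand-made segments standing in for TextTiling output (an assumption).
segments = [
    "Tag clouds show word frequency.",
    "Search results list documents. Search engines rank documents.",
    "Word frequency drives font size in tag clouds.",
]
rows = tile_counts(segments, ["search", "frequency"])
for line in render_tilebar(rows):
    print("[" + line + "]")
```

The width of each printed bar corresponds to the number of segments, so a longer document would give a longer bar, and each row shows where one query term is concentrated.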

Reaction- Tag Clouds and the Case for Vernacular Visualization

The paper starts with the history and evolution of tag clouds but what was really interesting to find was that the concept of tag cloud started very early around 1976. It was simply unbelievable for me. I always thought of it being fairly new. Anyway I never got to read about them in detail but this paper allowed me to do so.

I agree with the point that tag clouds are not restricted to just websites anymore and have application in a number a varied domains. It is one of the effective tools used for text analysis. It was good to see so many different types of example tag clouds used in different contexts.

I think that it will be too cluttered as the size of the cloud grows and hence it might lose its benefits of finding useful information when all the words are of different sizes. I don't understand the relevance in that case as some important but just slightly smaller words might not get noticed at all. I totally agree with the argument about alphabetical ordering versus clustering. It is really difficult to pick one over the other as both have their own advantages.

Overall it was a simple, short, well compiled paper really easy to read and understand. They have done a good job in including so much information in such a short paper along with figures and examples, history and evolution thereby providing a good understanding for Tag Clouds.

Reaction: Tag Clouds and the Case for Vernacular Visualization

In this paper the author points to a very interesting concept called tag cloud visualization. I like how the author mentions that tag clouds existed well before they gained their current popularity. It's interesting to see how a concept that has long been known is only now becoming popular.

Another interesting aspect of the paper is how the author highlights the pros and cons of tag cloud visualization. He stresses how the tag cloud gives importance to certain words based on serialization and that this may not necessarily be the right way.

Personally I feel tag cloud visualization needs more research before it can be used. The page looks too cluttered with words and is not ideal for reading. It may be good for a few fields of visualization, but it is not yet fit to apply to all of them.

Reaction: Tag Clouds and the Case for Vernacular Visualization

Tag clouds are commonly used to provide an overview of some textual information. The paper describes different scenarios where tag clouds were used and how they have evolved to their present state. Their usage is no longer limited to displaying popular words in a blog or on sites like Flickr. Several possible variations of tag clouds, such as timeline-based, single-word and two-word clouds, are shown in the paper. Another interesting variation is the word cloud, where the frequency of words is also shown. The major disadvantage of single-word clouds is that the words often appear out of context. IBM has come up with a two-word tag cloud which shows frequently used phrases rather than just words. I find this to be more intuitive than the single-word approach.

The layout of tag clouds, the font sizes of words and the ordering of words are important aspects in determining the usefulness of the visualization. Studies have shown that tag clouds can be used for browsing but not when looking for specific answers. In my opinion, having font sizes proportional to the frequency of word occurrence is a good way of representing unstructured text, and I find it more intuitive than alphabetical ordering. Also, depending on the context, it makes sense to have two-word clouds. Overall, the paper is a good read and gives us a general overview of tag clouds and their evolution.
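As a rough illustration of font size proportional to frequency, the Python sketch below scales word counts linearly into a point-size range. The function name `font_sizes` and the 10 to 36 point range are assumptions for this example; real tag cloud tools may use other scalings.

```python
from collections import Counter

def font_sizes(text, min_pt=10, max_pt=36):
    """Map each word to a font size that grows linearly with its count."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_pt + (count - lo) * (max_pt - min_pt) / span)
        for word, count in counts.items()
    }

sizes = font_sizes("cloud cloud cloud tag tag word")
# Alphabetical ordering, as in many tag clouds; size carries the frequency.
for word in sorted(sizes):
    print(f"{word}: {sizes[word]}pt")
```

Ordering and sizing are independent here: the words are listed alphabetically while the size alone encodes frequency, which is the combination the alphabetical-versus-frequency discussion is about.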

Reaction: Information Visualization for Text Analysis

This chapter is basically an overview of three different ideas for text visualization. One approach is text mining, which represents documents as a collection of entities and shows the connections between entities, as in Jigsaw. The chapter gives an example of mapping products to the associated complaints received in a call center. Such a representation would help identify which products are having problems without having to skim through reports on them. IBM's WebFountain combines the TileBars approach with the document-entity relation approach, and I find the results impressive. Also, in TRIST I liked the idea of representing each search result as a document icon. But with a search returning hundreds of documents, the screen can appear cluttered, affecting readability.

The second approach is concordance analysis. A concordance is an index of all the words that appear in a text, showing those words in the contexts in which they appear. The words are alphabetically sorted. The examples presented in this section weren't easy to follow, and there was too much information in the visualizations. In the DocuBurst visualization, the orientation of the words affected the readability. I found only the Word Tree visualization to be readable.
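A concordance as defined above can be sketched in a few lines of Python. This is a minimal illustration, not any of the tools from the chapter; the function name `concordance` and the `width` parameter (how many context words to keep on each side) are assumptions.

```python
def concordance(text, width=2):
    """Build an alphabetical index of words with their surrounding context.

    Each word maps to a list of (left context, right context) pairs,
    one pair per occurrence, in order of appearance.
    """
    tokens = [w.strip(".,;:!?").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    index = {}
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index.setdefault(token, []).append((left, right))
    return dict(sorted(index.items()))

index = concordance("The cat saw the dog. The dog ran.", width=1)
for word, contexts in index.items():
    for left, right in contexts:
        print(f"{left:>5} [{word}] {right}")
```

Printing every occurrence with its neighbours is the plain-text form of what the concordance visualizations try to present more compactly.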

The third approach is to visualize relations between authors and citations. I find this technique useful in the field of research and can see its applicability. But the sample graph created by Small does not appear to be very intuitive. I liked the PaperLens visualization and found it easier to understand.

Announcement: last reactions due next Tuesday 29th

Folks,

Since we will have no class this Wednesday and a visitor for critique on Monday, our last lecture will take place next week Wednesday the 29th. Your reactions for the current set of readings are therefore due Tuesday the 28th.

Venkata Manda and Lavanya Mohanan will present our four readings on text, dividing between them so each has two.

Best,

Ben

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

The authors have proposed a tool, Jigsaw, which represents documents and the corresponding document objects visually, helping in the analysis of the document as a whole. I agree with the statement that as a document grows large, the reader's understanding keeps dropping. I have experienced this myself many times. I have read several technical papers with an average length of 10-12 pages. These papers mainly consist of two-column text and not many diagrams. I am able to concentrate for the first half and then I stop reading because I cannot digest it after that.

I think Jigsaw will be helpful since it gives a visual perception. I liked the idea of breaking up the document into entities and representing them in tabular or graphical form. These are powerful tools of viz. Of all the views I would go with the graph view. This is because it allows expanding and contracting of the data. In contrast to it the scatter plot and list view are not so intuitive and will be difficult for the user to handle.

I agree with the authors' view that analytic reports should not be replaced by tools. Reports are very important in Business Intelligence and Analytics. I too understand that reports are irreplaceable. However, the authors want to say that adding visual effects to these reports will make the task of analysis easier for the analysts.

One final note is that in the field of analytics, data pours into your data warehouses every day. You will have Extract, Transform, Load (ETL) tools which cleanse the data and generate reports for you. We should strike a balance between the two: large textual data and an effort to represent it in a visual way. Neither of them is complete on its own. They aid, or we can say complement, each other.

TileBars: Visualization of Term Distribution Information in Full Text Information Access

The purpose of an information access system is to retrieve the most relevant information as requested by the user. There have been many approaches to information retrieval, and this paper presents a promising approach using a visualization paradigm called “TileBars” that makes use of the text structure of full-text documents.

To begin with, the authors present a brief overview of information retrieval using queries and the issues faced with this approach. They emphasize the need to analyze the data retrieved by a query and highlight the main features of TileBars, including simultaneous viewing of the length of the retrieved documents, the frequency of the query terms, and their distributional properties with respect to the document. The visualization approach helps the user better understand the role of each query term within the documents retrieved and where other standard information retrieval methods succeed or fail.

The paper is well structured. The authors present a glimpse of the standard retrieval techniques and their drawbacks. TileBars is then introduced as a solution to these drawbacks, and the approach is well explained as a reaction to three main hypotheses. The paper concludes by describing related work and future extensions.

I feel this approach is of extreme importance to every user who uses the internet for searching and retrieving documents. Often, I find myself confused and lost when I am trying to search for data using a generic keyword such as “web visualization” and most of the results retrieved have less than 1% relevance to my keyword. Users need to be really creative to give the right combination of keywords to ensure apt search results. The approach presented in this paper, “TileBars”, is very effective as it presents a visual display of the results, so the user can open the documents that have a high relevance to the search term and ignore the others. Apart from reducing the visual clutter, it also saves the user a lot of time.

Reaction- Information Visualization For Text Analysis

As the title suggests, this paper talks about the application of visualization to the analysis of text and documents. This topic was very new to me, so I had a lot to learn from it. I got good insight into the applications in text mining.

The authors say that the most common strategy in text mining is to identify important entities within the text and show connections among them. Through a series of examples, including the TAKMI system, the Jigsaw system, the BETA system of IBM WebFountain etc., they explain what it means to identify entities and how connections are shown between them. But I wanted to know the underlying principles: how they identify whether an entity is important or not, and how they decide whether two entities should be connected or not.

The authors then talk in detail about the TRIST tool. I guess they were trying to underline the importance of categorizing the extracted data into different dimensions and giving the user/analyst more flexibility. Moving on, they present methods for visualizing document concordances and word frequencies. Here they talk about alphabetical indexing and contexts. I agree with the notion of sticking to basics like these, as they are very easy for users to figure out. Here again they provide a lot of examples like DocuBurst, Word Tree, tag clouds, word clouds etc. It is really easy to understand this paper and what it is talking about due to the number of detailed examples provided. But I believe there are so many examples in the paper that the reader can easily get distracted.

Reaction: TIMELINES: Tag clouds and the case for vernacular visualization

This paper digs back into the origin of tag cloud visualization and how it evolved over time. Tag clouds are typically used to depict metadata tags. These tags are single-word texts whose font size and color vary according to their importance, which helps in identifying the prominent ones. Tag clouds were first introduced by a social psychologist and a fictional character. They took a better-known form when Flickr used the technique to show the popularity of various tags. Following Flickr, many sites adopted tag clouds and they soon became a hallmark of the web. John Meda declared the tag cloud "The Greatest Diagram of 2004". Apart from aggregating tags in one overall view, tag clouds can be used for analytical purposes too; they are mostly used for a quick scan of a collection of documents. IBM has brought out a new version of the tag cloud that uses two-word phrases; I don't think this is much of a change. Tag clouds are now a renowned InfoViz technique even though they do not follow the rules of visualization and were not born from the visualization community. The author says that tag clouds are one of the best and most painless tools for analyzing an individual's personality; for example, a tag cloud of political speeches can aid in analyzing the politician's personality.

Reaction- TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper talks about textual analysis of the entire document so as to give results that match the searched items more closely. Prior to this, most textual search was based on the title and abstract of the paper. I personally feel that this really affects the search results. There have been instances in the past when I was looking for an academic paper and the search results were so broad that I was overwhelmed by the number of results that were of no use to me. I also wonder how the search results are ranked and how legitimate that ordering is.

This paper proposes a new and quite intuitive style of display called TileBars. What they have tried to do is to provide the user with a relative view of the length of the document, and allow them to query term frequencies and distributional properties. They also talk about TextTiling to determine the structure of the document. In my opinion this is fine for general style papers of limited size following a specified structure e.g. Academic papers but I don't think it would be very optimal to use this approach for very lengthy documents like novels etc.

I agree with the author that information access mechanisms should not be thought of as retrieval in isolation as there are a lot of dependencies on other things like document subsets. But I would still like to see how this approach is extended to large documents and how the TileBars will look in that case. Overall it was an interesting read as it introduced me to something new.

Reaction: Tag Clouds and the Case for Vernacular Visualization

The tag cloud is the fastest growing visualization and can be used with almost any dataset. Its primary subject is word frequencies and the relations between words. This paper, published by IBM Research, explains the details of tag clouds.

This visualization is growing because the basic look of a tag cloud is a combination of many different type sizes in a single view. It can be used in a variety of fields, ranging from tags on blog websites to maps of countries. It can be made more powerful by adding different versions. IBM's own Many Eyes system, which lets users upload and visualize data in a variety of ways, features two flavors of tag clouds: the traditional one-word view and an unconventional two-word view. This method provides an easier way to understand the content.

Tag clouds are one of the best visualizations for analyzing the relations between objects. This paper is a must-read for all visualization engineers.
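The difference between the one-word and two-word views comes down to what gets counted. The Python sketch below counts runs of `n` adjacent words; it is only a guess at the general idea, not how Many Eyes works, and the name `phrase_counts` is an assumption. For simplicity it ignores sentence boundaries.

```python
from collections import Counter

def phrase_counts(text, n=2):
    """Count every run of n adjacent words (n=1 gives plain word counts)."""
    tokens = [w.strip(".,;:!?").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

text = "tag clouds are fun and tag clouds are everywhere"
one_word = phrase_counts(text, n=1)
two_word = phrase_counts(text, n=2)
print(one_word.most_common(3))
print(two_word.most_common(3))
```

In the two-word counts, "tag clouds" survives as a unit, which is why the two-word view can read more naturally than isolated words.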

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper was published in 1995 and is now quite old. It discusses the analysis of full-text documents. At that time, most text analysis concentrated on abstracts and titles. However, with more and more full-text documents flooding the web, the title-and-abstract technique would no longer work. The author has proposed TileBars to help with this.

I had not heard about TileBars before reading this paper. It was interesting to read how TileBars address the issues of relative document length, query term frequency and the distribution of terms in the text. The author has proposed a TextTiling algorithm. However, the results of the algorithm cannot be determined, which is a bit concerning to me when the author says that this is a fully implemented algorithm.

The most appealing thing about TileBars is that, since it is a visual technique for viewing the frequency and distribution of words, it will greatly help users skim through long texts and in turn reduce the time they spend searching long documents. I too have experienced this problem many times.

Reaction: Information visualization for text analysis

This chapter is about the application of visualization to text mining. Text mining is an interesting topic in web analytics, used to derive valuable information from text. The chapter starts by going through some existing visualization systems for text mining such as Jigsaw and TRIST. TRIST seems interesting, especially because the icon size varies with the document size, with large icons corresponding to large documents; other features such as draggable entities and clustering of documents make it appealing for the user. Next the author presents some visualization tools used for document concordance ("concordance: an alphabetical index of all the words in a text, showing those words in the contexts in which they appear") and word frequencies. Tools such as concordance software, SeeSoft and TextArc are presented. Among those, TextArc looks very bad at interaction. The Word Tree visualization looks appealing and easy to read. The Baby Name Wizard is interesting as it makes intelligent use of colors, pink for girl names and blue for boy names. Then the author presents some tools for literature and citation relationships. These tools are used to visualize how authors cite each other and how the importance of papers varies over time. PaperLens is an interesting tool: it shows the user the citation frequency of all authors and the popular topics of the year, with a different view for each. Visualization has great application in analyzing text in various forms. The visualizations should be understandable to all kinds of users, not just analysts who know what they are looking for.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

In the earlier paper, Hearst gave a small introduction to the Jigsaw system. In overview, it visually represents documents and their entities in order to help analysts examine reports more efficiently. This paper goes into the details of the system, explaining the various places where it can be used and the various views available within it.

The Jigsaw system has a list view, a graph view, a scatter plot view and a text view. These views show how Jigsaw can be used in various fields. A growing number of research and commercial systems are using visualization and visual analytic techniques to help support investigative analysis, as Jigsaw does.

One of the most interesting ways of using this system is to link all the views it provides; looking at the dataset through linked views helps extract more from it, and this is the ultimate use of Jigsaw. To understand a person, one can use the list view first, then the text view to sort the words, and then the scatter plot view to find the word frequencies. This is what makes the system stand out from other systems.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

Jigsaw is a tool that provides multiple views of documents and their entities. Such a tool can help in exploratory analysis. Usually, people form a mental model of a document as they are reading. But as the size of the document grows, it becomes difficult to have a clear understanding of the document structure and the presented facts. I agree with the author that using visual representations would help expand working memory.

In Jigsaw, an entity is chosen to be the primary unit. An entity can be a place, organization, date or person. The relationships between entities are represented in four distinct views: tabular, semantic, scatter plot and textual. Jigsaw can also be used with a pen tablet for making notes, which can be very useful. The tool is written in Java and is based on the model-view-controller architecture. Jigsaw extracts entities and represents them as nodes in XML. With longer reports, the list view would not be very suitable, as it would involve frequent scrolling. To circumvent this issue, the selected entity and the connected entities are moved to the top of the list. Also, I find the scatter plot view shown in the paper to be cluttered, and it is hard to read the labels on the axes. Even with sliders, it wouldn't be very helpful. As the size of the document grows, the usability of the tool is impacted. I found only the graph view to be useful, as it allows expanding and collapsing of content. The text view is also a neat way of representing entities.
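The list-view behaviour described above (selected entity and its connections moved to the top) can be sketched as follows. This is an illustration in Python, not Jigsaw's Java code. It assumes that two entities count as connected when they appear in the same report; the function names and the tiny `reports` dataset are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

def connections(reports):
    """Link every pair of entities that appear in the same report (assumed rule)."""
    linked = defaultdict(set)
    for entities in reports.values():
        for a, b in combinations(sorted(set(entities)), 2):
            linked[a].add(b)
            linked[b].add(a)
    return linked

def list_view(all_entities, linked, selected):
    """Order a list so the selected entity and its connections come first."""
    connected = sorted(linked[selected])
    rest = sorted(
        e for e in all_entities if e != selected and e not in linked[selected]
    )
    return [selected] + connected + rest

# Hypothetical reports, each reduced to the entities extracted from it.
reports = {
    "report1": ["Alice", "Paris", "Acme"],
    "report2": ["Bob", "Paris"],
    "report3": ["Carol", "Acme"],
}
linked = connections(reports)
everyone = {e for entities in reports.values() for e in entities}
print(list_view(everyone, linked, "Paris"))
```

Reordering keeps the relevant entities visible without scrolling, but the list still grows with the collection, which is the scalability concern raised in this reaction.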

It would be worthwhile to see if Jigsaw has the same effectiveness in analyzing longer reports. Also, Jigsaw uses other available tools to extract the entities from a document. Without determining the accuracy of these tools, it is not possible to judge if Jigsaw presents the information correctly.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The author has proposed a new display style 'TileBars' which provides users a comprehensive view of the length of the document, query term frequencies and distributional properties. I agree with the author's point of view that users should be able to see the relationship between their query and the documents returned by their query. Moreover, the role of the query terms in determining the ranking of the document is usually not shown. e.g. In keyword searches, only the resulting documents are listed without any information as to why the document was retrieved. I often find it difficult to scan through hundreds of documents returned by search results without knowing what caused the document to be shown in the first place. Displaying the query term frequency and other related quantitative information would definitely help in such situations.

The author says that it is necessary to separate the main topic and the subtopics, and that querying against any of the subtopics should also lead to the full-text document being retrieved. The possible relationships between words in the main topic and subtopics are shown. The texttile is an interesting way to represent the frequency of occurrence of terms in the document. I feel such a pattern would definitely help people identify the information easily. Overall, the paper is well written, with diagrams to supplement the author's explanations.

Reaction: INFORMATION VISUALIZATION FOR TEXT ANALYSIS

Visualization can be used effectively for text collection and literary analysis. This paper gives enormous details to prove that. In this chapter, Hearst explains about the existing tools which involve in exploring the text collections which can be used by both normal computer users and the non-computer analysts.

Hearst explains many systems, including Jigsaw, Many Eyes by IBM, Tirage, TRIST, TextArc and more. These systems help in analyzing word frequencies and word concordances and in text mining. Text mining is an upcoming field in social analysis and is expected to grow as more people join social networks.

I feel that the TextArc system has the most impressive way of exploring text collections. The paper by Dr. Collins, "Document Content Visualization Using Language Structure", explains more about the TextArc system. This system arranges the lines of text in a spiral and places frequently occurring words within the center of the spiral. This shows the relations between the words more clearly, and many details can be extracted. With many such methods, this paper is useful for people who are in the field of text mining.

Reaction: Tag Clouds and the Case for Vernacular Visualization

It was interesting to read that the tag cloud started as early as 1976, with the naming of landmarks in Paris. I did not know the tag cloud was a concept that originated in the late 70s. I thought it was an offspring of Web 2.0.

I agree with the author's idea that the tag cloud has no longer remained just the tag cloud but has been applied to many different domains and is not just restricted to website tags. You will get a number of tag clouds on the web which help the analysis. As rightly said instead of tag cloud we could use the term 'word cloud'. Authors have also given the example of many eyes which is a really powerful viz tool which I too have tried for one of the visualizations.

I think the last page of the paper is very important in terms of the arguments the authors make for tag clouds. I agree with their view (this was also presented in the previous text mining paper) that users typically like lists over tag clouds in specific scenarios. When the requirement of the user is to have an ordered list then tag clouds will not serve that purpose. The authors have given an apt example of "East" and "Easter". However, there are many other fields like analytics etc where tag clouds can be really helpful as stated by the authors. I agree with the authors that whenever your perspective for looking at the words is of some kind of analysis then tag clouds prove to be helpful over plain old lists.

One last comment about the paper -This paper was really well presented and supported with apt examples. The only one thing I would suggest as an improvement is the 3 column structure that the authors have used. It was difficult for me to read this paper in this 3 column structure. I had to scroll up and down three times to finish a page. Barring this the paper is an excellent read.

Reaction: INFORMATION VISUALIZATION FOR TEXT ANALYSIS

This paper aims at describing how effectively we can incorporate visualization into text. This would not just be simple text records but the text that the analysts are looking for. After the initial paragraphs it made me think about this idea. We always say that pictures speak louder than words. However there are some areas where you have just the text and not the pictures. The examples provided by the authors are apt like the telephone log or results returned by a search. In both the above cases you will just have 'boring' text before the analysts. Idea of the paper is that what if we could apply visualizations to the text mining domain?

To continue, the first section of the paper talks exactly about this: how visualization can be applied to text mining. I could figure it out a bit, but this section is cluttered with far too many examples. I would have liked some introductory text and then examples to fortify the authors' views. Instead the authors have just cited the examples and the section ends.

Word clouds are also discussed in the second part of the paper. Here the authors state that tag clouds are not effective in terms of overall representation and finding any particular word. I have worked on some projects related specifically to words, and there I came across the same thing: words can be searched easily and are better represented in a word list than in a fancy cloud or tag cloud. Next it also discusses categorical and nominal variables. I agree with their statement that these are difficult to fit into some visualizations because of their non-ordered nature. However, the NameVoyager is a cool viz, and we have gone through it in class before.

In the end the authors discuss citation analysis, which is related to the above two ideas of text mining and concordance analysis. This paper was a good read for people who are new to text mining; however, I personally think that the authors could have presented their thoughts better. Instead of giving too many examples, they could have explained the concepts more clearly and then cited the examples.

Reaction: TileBars: visualization of term distribution information in full text information access.

It is impressive that the paper was written in the mid-1990s and that the concepts it discusses are still in use today. I find a strong connection between the ideas in the paper and the search engines we use today. However, I do not see today's search engines giving due importance to showing the weight of the search terms that produced the displayed results. Showing this would definitely help the user improve his search terms so that the results can be altered correspondingly. The author clearly states the paper's objective, which is to make use of text structure when retrieving from full-length text documents.

I like the way the paper has been written, especially the numerous examples given as a supplement to each subtopic, which help the reader's understanding. TileBars is, I feel, more than just a tool; it lays the basis for search engines by describing the key points to be kept in mind when performing a search. However, I am strongly inclined to say that these ideas do not represent the whole; they are just part of a larger set. For example, we should also keep in mind the user's previous search results for a particular term, because the user would like to see the links that he has visited in the past when doing the same search in the future. This should also play an important role in ranking the search results.

I also liked the fact that the authors have thought about clustering the search results from two different tools. This is preceded by an acknowledgement to the fact that information access mechanisms should not be visualized in isolation but rather they have to be weaved together.

There is no hesitation in saying that the paper presents valuable research which played in important role in the design of early search engines. I also read one of the reference papers titled "Automatic text processing", which is a good read too.

Reaction: Information visualization for text analysis.

In this chapter we learn about the various tools that can be used for text analysis. A number of tools are presented, which helps readers get a decent understanding of what is available.

In this chapter the author has tried to explain each tool with an example. However, I felt that even though examples are given, it is still a little difficult to get a detailed understanding of how these tools are used. To use any of them, the user needs to do a little more research personally before beginning. This chapter serves as good material for introducing users to different tools for analyzing text.

The author makes some useful points about the value of these tools for analyzing textual information, as this is a field that is going to be very useful and popular for a long time.

Overall the paper is informative, but the knowledge gained from it is not sufficient to start using these tools.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

In this paper the author discusses Jigsaw, a system for assessing large amounts of information. The paper gives a good deal of information about why to choose a grouping method when visualizing data rather than other methods such as scatter plots. This method is mainly useful for analyzing several large documents at a time.

Adding a visual element to any document that I read helps me understand it better and make sure that I haven't missed any important points the author wants to make. I strongly agree with the author's viewpoint that reports are irreplaceable entities in a business context. The author also gives practical applications of this system, which created a sense of direction in my mind while reading. More information about the different views in Jigsaw should have been given.

I don't agree with the point that Jigsaw alone will give the best visual perception. It should be blended with other techniques, such as a list view, to give a better one. I have always felt that visualizing helps a reader retain content in memory for a longer time. In terms of human-computer interaction, I would say that visualizations maintain states in memory longer than textual representations do.

More technical information on the processing of large documents should have been given to help us better understand the process. As the efficiency of this model is highly dependent on those methods, information about them is necessary for the reader to appreciate it. It would be interesting to see how information is shown when the size of Jigsaw itself becomes huge. The sample space for the experiment is small, which does not add fidelity to the author's conclusion. More information regarding the type of input (the documents) should have been given. Overall this paper strengthened my understanding of the role of visualization in the analysis of documents.

Monday, November 21, 2011

Reaction: Tag Clouds and the Case for Vernacular Visualization.

The article was a great read. I never knew that tag clouds had a long history dating back to the late 1900s, especially the varying flavors in which tag clouds come. Another interesting point I noted was that tag clouds became popular in the early 2000s, which is more than 30 years after their initial conception. The applications of tag clouds are very diverse too, ranging from simple blogging material to analyzing court documents.

I totally agree with the authors' point of view that tag clouds can make it difficult for users to find useful text, especially when all the words are of different sizes. Lexicographic ordering of the words provides some ease, but things can still get messy when there are just too many words. Of all the different styles presented, I liked "Money makes the world go round" the most because it has an added dimension of grouping together elements which users expect to occur nearby. Also, the coloring of bubbles can be used to indicate an additional correlation factor, something simple tag clouds fail to do well.

The authors have put in good effort to argue the theoretical point of view regarding tag clouds and how it differs from practical applications. As the title of the paper says, tag clouds are "vernacular", meaning non-academic (not standardized in some way), and their practical application often defeats a theorist's view of their limited scope and readability. As mentioned in the paper, a tag cloud totally violates traditional visualization technique, and, powered by Web 2.0, its applications are numerous, which brings with it the unearthing of an important class of data called "unstructured text".

The authors did a great job in writing the article, and by keeping arguments concise they have been able to include as much information as possible in a few pages. The flow of the text is great, with the history described at the beginning of the article, then slowly moving on to today's applications and finishing off with the gist of why we need tag clouds. I would rate the article 10/10 as I thoroughly enjoyed reading and learning from it.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

In this paper, the author discusses a visualization called TileBars and its role in searching full text. The paper actually kicks off by explaining the jargon involved, which gave me some familiarity with the topic. The drawbacks of other methods such as similarity search and ranking helped me understand the reasoning behind the idea and implementation of a visualization such as TileBars.

As we know, searching for a specific target with general words is very difficult, even though the frequency of those general words in that specific target may be very high. So visualizations that help in searching for these kinds of terms in full text are very important, and a lot of research has been done and will be done in the future. I personally haven't seen much practical implementation of these visualizations, though.

More details about the implementation should have been provided to make this paper more interesting, but given the date on which the paper was published, this expectation seems a bit too much. A little googling on this topic directed me to websites that describe it in much more depth. Overall the paper presented the role of visualization in searching full texts in a good and in-depth manner, and it also gave a flavor of how demand drives innovation and creativity, as the paper was written in the early days of this field.

Reaction: Tag Clouds and the Case for Vernacular Visualization

In this paper the author discusses one of the most interesting and popular types of visualization, the tag cloud. Though it has its limitations, the advantages gained from its use and the visualization it provides make it one of the most popular visualization techniques. The paper starts off with a detailed explanation of the history of the tag cloud. Visualizing text with a tag cloud is very simple, and that simplicity is the backbone of its popularity and continued existence. The author tries to emphasize this point throughout the paper.

He presented the content in an interesting way. At every point he showed the downside of the tag cloud as well as explaining why it is preferred over other visualizations. This helps the reader open his mind in all directions and question the conclusions he had developed while going through the paper.

There is a lot of scope for research in this field, as can be seen from the content of the paper. The font size could be made more meaningful in the tag cloud, which may help in visualizing text much better. The way longer words are represented in the tag cloud could also be changed to improve the visualization. It looks as if longer words have some preference in a normal tag cloud even though they actually don't in that context. The size of the text adds that flavor to the word.

The paper also gives a good deal of information on the different ways of visualizing tag clouds. The impact that tag clouds made on Web 2.0 was an interesting topic to ponder. It will be interesting to look for the efficient algorithm behind this. I think there should be a lot of parsing involved to remove unnecessary duplicates. The grammatical distinction between forms of the same word also adds an interesting flavor to this visualization. Overall the paper was specific in its scope but gave a great deal of explanation about the visualization.
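To make the parsing and font-size idea concrete, here is a minimal sketch of how a basic tag cloud might be computed. It is not the algorithm from the paper or from any particular website; the function name tag_cloud_sizes and the parameters min_pt, max_pt and top_n are made up for illustration, and the linear scaling is just one simple choice.

```python
import re
from collections import Counter


def tag_cloud_sizes(text, min_pt=10, max_pt=36, top_n=5):
    """Map the most frequent words in `text` to font sizes in points.

    Hypothetical helper for illustration only.
    """
    # Lowercase and keep alphabetic runs so "Cloud" and "cloud," count as one word.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        # Linear interpolation between min_pt and max_pt.
        # If every word has the same count, they all get max_pt.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


sample = "Cloud cloud, cloud tags tags visualization"
print(tag_cloud_sizes(sample))
```

Note that the size here depends only on the count, so a long word still takes up more screen area than a short word with the same count, which is the bias described above. Merging grammatical variants such as "tag" and "tags" would need an extra stemming step that this sketch leaves out.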

Reaction: Information Visualization for Text Analysis.

The subject presents a completely new field to me called "text mining". I felt that this read presents a few good case studies of text mining and critiques a couple of real world examples. It is always enjoyable to learn about interfaces and the purpose behind their creation. Identifying key elements within a text and showing their interconnections can be used in pattern recognition and pattern analysis. Especially when the text is unclear, a natural language processing system can use text mining to discover predictions.

The chapter is, however, limited in scope, with the author specifically covering the applications of information visualization in three areas (text mining, document concordances and word frequencies, and literature and citation relationships) and giving basic examples for all three. Coming to document concordances and word frequencies - again the authors prefer to critique a few ideas rather than going in depth to provide a detailed analysis of how to think about visualizing documents and word frequencies. The examples seem outdated, with some going back all the way to 1994. The positive takeaway is the neat categorization of examples, from tag clouds to TextArc to bar charts. The variation in the examples presented is good, and it certainly helped me decide which visualization to choose depending on the requirement. Baby names has always been my favorite.

The literature and citation relationships section uncovers a couple of possible applications, like detecting plagiarism. Suppose a set of nodes maps to a single node, indicating that all of them have a common citation, and a different node maps to the set of nodes but not to the parent node; then it is evident that the authors did not acknowledge original work done previously. Also, the importance of a paper can be determined by analyzing the degree of its node. If the degree is high, it means that many papers have referenced this node and thus it is of high importance. I found it interesting to note the shift in analysis from nodes and links towards linking interactions. This approach certainly helps in drilling down into a certain time frame or a certain author to tell us which papers the author referenced the most.

Overall, I felt that the paper is very basic, and the conclusion does not make a strong statement about why the authors prefer to critique the examples/tools rather than helping the reader understand along what lines good visualizations can be made.
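The degree idea can be sketched in a few lines. This is only an illustration under my own assumptions: the paper labels A to D are invented, citation_counts is a hypothetical helper, and a raw citation count is a crude stand-in for importance rather than the method used by any tool in the chapter.

```python
from collections import Counter


def citation_counts(citations):
    """Count how many distinct papers cite each paper (its in-degree).

    `citations` is an iterable of (citing, cited) pairs.
    Hypothetical helper for illustration only.
    """
    # Deduplicate so a repeated (citing, cited) pair is only counted once.
    unique_edges = set(citations)
    return Counter(cited for _citing, cited in unique_edges)


# Invented example: B, C and D all cite A; D also cites B.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("C", "A")]
ranked = citation_counts(edges).most_common()
print(ranked)
```

Sorting by this count puts A first, which matches the intuition that a node with a high degree is the more important paper.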

Reaction: Information Visualization for Text Analysis

This chapter presents a summary of a large number of tools and products that aid in visualizing textual information, text mining, documents and websites, and it also covers a good deal about mining papers based on their citation relationships. Each tool described has an associated example, but the explanations lack detail at a more granular level. Presenting this many tools in a single chapter gives the reader multiple options to choose from for further research, but the descriptions given were very brief.

Most of the visualizations presented in the chapter were already discussed in class, and the chapter acted as a good refresher on those concepts. This area of research is of high importance given the growth in the amount of data in this fast-paced world. We need good visualizations to represent this huge amount of text. The chapter kindled a thought in me about the difficulty of visualizing text compared to numerical data.

I completely agree with theDude on the subject of citations. It is difficult to infer relationships between papers based on citations. There are a lot of factors involved, which should have been considered in the explanation. Overall, the chapter was a good read but would have made much more sense if the tools had been described in much more detail.

Find: Typographic Design in the Digital Domain: with Erik Spiekermann

On design and legibility in the small.

Typographic Design in the Digital Domain: with Erik Spiekermann

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

This article discusses a novel way for analysts to search written text as an aid in the investigative process. The emphasis was on aiding researchers, not replacing them. The focus of the system they developed seemed limited. They focused on very short (1-5 paragraph) documents, and I wish they had made a case for why that was a good thing to focus on. Indeed, it seemed like the data that their system could take in needed to be very specific.

I think the idea of using multiple views to help in analyzing documents is a good idea, but having to have four monitors in order to use the system effectively seems a little excessive. I know that in my own work it helps to have a second monitor, but I would imagine there's a point where having a certain number of monitors would start to hamper and not help productive analysis. The paper goes through the different views that Jigsaw has, but there isn't a lot of discussion of why they are using those views. I would have liked the view choices to have been backed up by some current analytical techniques, instead of a focus on how the views were created.

The scenario that the paper goes through seemed helpful, but rather contrived and abstract. It would have been better, in my opinion, to see the tool used in a real life setting instead of this example that seemed like the person using it already knew what to look for. There was not a lot of reason given for why the analyst looked for certain connections, and how that situation was true to the real world. The article concedes this point at the end when they talk about how the system hasn't been evaluated. I would be very interested in seeing how the system fared in a real world evaluation. Overall, though, I think the article had some interesting takes on document analysis.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper is an early approach to text analysis of full documents. It starts out by pointing out that, up until the point the article was written, most text analysis of documents focused on titles and abstracts. This paper introduces a system to analyze the entire document and makes the case for why that would be preferable.

This paper seemed to focus on the unique structure of academic papers and how that could be exploited for relevance searches. I thought that it was an interesting idea to incorporate, but wasn't sure how it could be applied more generally, since many documents don't follow the same structure as academic papers. Indeed, the TextTiling algorithm would only be useful on academic papers and not on, say, a novel.

Overall this was a good introduction to the issues involved in analyzing text documents. I think it was a little too specialized, but the tool they introduced seemed to do a fairly good job at the analysis it was attempting. I thought the best point that I took away from this paper was in the related works section where they talk about how difficult document content information is to display in existing graphical interface techniques. In that case, it would make sense to try to section the problem as best you could and deal with smaller problems as opposed to tackling the big problem.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This article looks at tag clouds in the sense that they were visualizations that developed outside of the academic world, more in response to the large amount of collective information that people were dealing with on the web. The article went through the history of tag clouds, and it was interesting to me that at one point tag clouds were declared "The Greatest Diagram" of any year, because my impression from other sources is that tag clouds are mostly viewed as poor visualizations. The end of the article pointed that out when it talked about the theoretical problems with word clouds. I thought it was interesting that they said word clouds might work in practice, but not in theory.

One point that I think the article suggested, but didn't explore enough was when they mentioned that there's a difference between a tag cloud and a word cloud. I think they are definitely different things and I think it would be useful to look at them in different contexts because they serve different purposes. Tag clouds seem, to me, to be useful for indexing and serve as an entry point to a website, whereas word clouds would be more useful for analysis of the given text.

Reaction: Ch. 11: Information Visualization for Text Analysis

This chapter is about applying visualization techniques to text geared more towards text analysis. The paper goes through the basic reasons people would want to perform a text analysis on a document and discusses some of the currently available software for performing specific types of analytical tasks. There wasn't a lot of context given and at some points it seemed like the chapter was just listing different functionalities of various software programs.

The chapter only briefly touches on the reasons why people would want to perform a certain analysis of text. I would have liked to know more about each of the reasons mentioned (text mining, concordances/word frequencies, etc) for why people would analyze text and maybe where the programs they talked about fell short. Overall, it was a good general introduction to visualization of text.

Reaction: Information Visualization for Text Analysis

This chapter puts forth the applications of visualization in text-mining, the methods used in these visualizations to show correlations among objects across documents, and how visualization techniques are being used to extract meaning across syntactic and lexical ontologies.

The first section seems to focus solely on simple products that amount to examples within the field. Within my own work in text and data mining, these specific examples seem to provide a poor overview of the distribution of lexical analysis types and their probable use. They also seem to be quite limited in their capabilities to analyze associations and extensively expand on the implemented techniques used.

The second section, which re-covers many of the visualizations we have previously gone over, seems similar to the first section in that it is limited in its breadth of visualization types and techniques. It is, however, helpful for my group’s particular project in that it discusses the pros and cons of these various techniques in some detail, and some of them we have looked at as possible project visualizations.

The third and final section is even more limited in its scope. Data analysis in text mining is a huge area of research, especially in mining scientific papers for previously unknown information elucidated through connections between possibly disparate articles and authors, and this paper limits itself simply to author citation analysis. This technique, when including a world view, is often incredibly unhelpful as citation techniques vary wildly from country to country. In China, citations are often of other Chinese papers which are plagiarized from non-Chinese writers, where the original authorship is obfuscated. In Russia, specifically Moscow, where data falsification is especially rampant, citations are often minimized to reduce the chances of falsified data being discovered, resulting in a falsely low connection rate.

Overall, as this is a chapter in a book, the material contained within could have been greatly expanded to cover more topics in a broader way.

Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization

Jigsaw is a visual analytics tool that helps investigative analysts build conceptual models quickly by studying multiple coordinated views of document entities and the amount of coupling and correlation among them. The views are coordinated so that actions within one view are reflected in the others. The author specifically mentions that it is a human-centered system which puts the analyst in charge of the analysis, as opposed to automated algorithms or techniques. Document entities are represented in four coordinated views (tabular, semantic graph, scatter plot and text view), wherein changes in one are immediately communicated to the others. Jigsaw offers many customization features to support users' different analysis requirements, such as the authoring view. It's interesting to learn that Jigsaw was designed using familiar technologies like Java and XML and is based on the Model-View-Controller architecture.

The list view efficiently handles overload by providing a mode in which all selected and connected entities are automatically moved to the top of the list. The incremental approach of the graph view definitely makes it useful, systematic and informative. However, logic should have been designed for positioning entities systematically, not randomly. The scatter plot view looks clustered; however, multiple visual representations, color coding and shapes would offer a slightly clearer picture. I believe that scatter plots are less informative than graphs or tabular views. The text view appears to be an overall summary view with links and mappings to the other views.

The author supports the findings with the help of a scenario. As part of future work, the author proposes to replace the current batch-oriented model with a more dynamic one in which entity extraction is integrated on the fly. Overall, it's a good read. Jigsaw is an effective way to visualize large text documents, but it would be interesting to do some stress testing on Jigsaw to learn more about its real scope and limitations.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

This paper discusses TileBars, an effective analytical tool for evaluating the importance of a document based on relative document length, query term frequency and term distribution. I agree that context and structure should both play an important role in information access from full-text document collections. The author also mentions other text retrieval techniques, like similarity search, which ranks documents according to query term proximity, and Boolean retrieval, which uses set operations to group or differentiate documents. But their effectiveness fades because the user is unaware of the contribution of the query terms to the ranking of the retrieved documents. Structuring the query to include search by main topic and subtopics would definitely improve full-text retrieval, as different distributions of terms have different semantics.

To supplement this thought, the author introduces a new algorithm, TextTiling, which is intuitive, but its results are not aligned with expectations. TileBars uses deviation from standard behaviour and appropriate shade mapping as a measure of its visualization properties. TileBars also allows users to customize their search and presents results according to distribution pattern, based on variations of the interface. The author supports the findings with the help of an application in the medical field. This approach will definitely improve text retrieval techniques. Overall, the paper is well organized and a very informative read.

Reaction: Tag Clouds and the Case for Vernacular Visualization

This paper is a very interesting read as it talks about tag clouds, which have applications in many technical and non-technical areas. The author first gives some background on vernacular visualization and then talks about how tag clouds, which originated in the Web 2.0 environment, can be applied to a wide range of data inputs and patterns. The history of tag clouds suggests that they have been around for a very long time, though their wide range of uses has been realized only lately. The primary purpose of tag clouds is to provide a visual overview of a collection and the importance of the text in it. The author explains this with the help of Milgram's experiments, followed by Flanagan's Search Referral Zeitgeist and Flickr. I agree with the author's view that "Tag Clouds function as aggregators of activity being carried out by thousands of users, summarizing the action that happens beneath the surface of socially oriented websites". Overall it's an interesting read.

Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access

The article focuses on querying documents and indicates that the TileBars approach has significant applications when the documents retrieved are huge. The article gives a background of existing document search techniques and claims that these approaches are not transparent or flexible, and for these reasons they might not give the best results. The paper also acknowledges the importance of not just a proper search technique but also a visualization system for fast interpretation of results.

Document structure becomes extremely important in cases like these because this approach is very systematic and discovers the document structure before actually searching for keywords. We can see that TileBars provides good visual cues for understanding the search results. The paper identifies three important factors in searching documents (length of the document, frequency of terms and distribution of terms in the document) and tries to bring this information out visually in its search results. I liked this approach because this information set can be huge, and I think displaying it visually is the best way to do it. One thing which I thought should have been explained is whether the tool allows filtering of results and also searching in specific areas of the document, if necessary, rather than always the entire document. One of the striking features of this tool is that it is really comprehensive in showing the results and gives not just visual information but also statistical data to back it up. The author aptly describes it as something that "weaves together interface, presentation and search in a mutually reinforcing fashion". This will definitely help users understand the relative importance of documents for the search query.

I found the paper to be well organized and self-explanatory. The flow of ideas was really good, starting with the existing implementations and their limitations and then carrying forward to the TileBars approach. It would be really interesting to read about TIPSTER, which also includes user feedback and judgement.
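To see how the three factors could end up in one picture, here is a rough sketch of a TileBars-like grid. It is my own simplification, not the paper's implementation: fixed-size word windows stand in for the TextTiling segments, characters stand in for shades of gray, and the names tilebar_rows, render and segment_words are hypothetical.

```python
import re


def tilebar_rows(text, term_sets, segment_words=50):
    """Return one row of per-segment hit counts for each set of query terms.

    Assumption: the text is cut into fixed windows of `segment_words` words,
    which is a stand-in for the subtopic segments TextTiling would find.
    """
    words = re.findall(r"[a-z]+", text.lower())
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        # One count per segment: how often any term of this set occurs there.
        rows.append([sum(1 for w in seg if w in wanted) for seg in segments])
    return rows


def render(rows, shades=" .:#"):
    """Draw each row as text, with denser characters for higher counts."""
    top = len(shades) - 1
    return ["".join(shades[min(count, top)] for count in row) for row in rows]


doc = "cancer research " * 3 + "funding policy " * 3
rows = tilebar_rows(doc, [["cancer"], ["funding"]], segment_words=4)
for line in render(rows):
    print("|" + line + "|")
```

The length of each printed bar reflects the document length, the density of a character reflects term frequency in that segment, and the position of the dense characters shows where in the document the terms occur and whether the two term sets overlap.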