Saturday, November 26, 2011
Viz: What topics science lovers link to the most
Reactions: TileBars: visualization of term distribution information in full text information access.
This paper describes TileBars, a visualization technique for displaying information about, and relations between, the full text of multiple documents. It shows the user the relative length of each document, the frequency of the query terms, and the distribution of those terms across the document (also with respect to other documents).
The paper notes that older information retrieval techniques worked only with titles and abstracts. It describes the most common approach to text retrieval, similarity search, which uses the vector space model or a probabilistic model to determine how close documents are to each other. It also covers Boolean retrieval, in which documents are returned in ranked order once they satisfy the constraints given by the user. These approaches suffered from numerous drawbacks, so the author proposed TileBars.
TileBars helps the user decide which documents to view, and goes further by indicating which passages of those documents are relevant. TileBars displays retrieval results as bars made of square tiles. The shade of each tile visualizes term frequency, and the size of the bar visualizes the length of the document. The tiles are obtained using ‘TextTiling’. The paper’s discussion of the TextTiling algorithm is very short, but it does give a brief account of how it works: TextTiling places boundaries between subtopics based on term repetition in the document.
I agree with the author’s point that the columns of TileBars can be scanned and understood more easily than the previously described techniques. Overall I find the paper very well structured, but some sections are somewhat hard to understand. Even though it is old, I think it could still be used today on news or information websites (like blogs). I also see some similarity (the use of tiles and color) between this visualization and Map of the Market, which we discussed in class.
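To make the idea of finding subtopic boundaries from term repetition more concrete, here is a minimal Python sketch. It is not the paper's TextTiling algorithm: it simply compares the vocabulary of adjacent sentences with cosine similarity and places a boundary where the overlap drops below a cutoff. The function names, the sentence-level blocks, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def subtopic_boundaries(sentences, threshold=0.1):
    """Return each index i where a boundary falls between sentence i-1 and i.

    Hypothetical simplification: a boundary is placed wherever two adjacent
    sentences share almost no terms (similarity below `threshold`).
    """
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    return [
        i for i in range(1, len(bags)) if cosine(bags[i - 1], bags[i]) < threshold
    ]


sentences = [
    "The cat sat on the mat",
    "The cat chased the mouse",
    "Stock prices fell sharply today",
    "Investors sold stock as prices fell",
]
print(subtopic_boundaries(sentences))  # [2]
```

In this toy input the only boundary falls before the third sentence, where the vocabulary switches from cats to stocks. A real segmenter would work on larger blocks of text and would need stop-word handling, which this sketch leaves out.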
Reactions: Jigsaw: Supporting Investigative Analysis through Interactive Visualization.
This paper describes the Jigsaw system, which helps analysts retrieve information from documents and reports more easily. The authors propose this system to overcome the difficulty of understanding and making sense of a long and ever-growing number of documents. Jigsaw provides multiple visualizations of the data and of the links between the contents of reports. Each view has its own advantages. Using these views interactively, an analyst can put the pieces of information together to reach a conclusion (mainly, to find hidden information). It has the following views: (1) a List view showing connections between objects in different columns; (2) a Graph view displaying links between reports and entities; (3) a Scatter plot for closer investigation of any two categories; and (4) the original report text with highlighting.
I agree with the authors’ point that Jigsaw is a system to assist the analyst in sense-making activities, not a substitute for careful analysis of reports. The paper is well structured and describes the tool very well. It also compares Jigsaw with similar tools such as GeoTime, WebTAS, TRIST and SANDBOX. Jigsaw differs from the other systems because it focuses on (1) exploring relationships among the entities in documents and (2) representations of those relationships. I feel the screenshots provided in the paper help a lot in getting a good picture of the Jigsaw views. I feel the tool can be very useful because each view can compensate for the disadvantages of the other views.
Reactions: TIMELINES: Tag clouds and the case for vernacular visualization
This paper starts with the history of tag clouds, tracing their first use to 1976, when a psychologist used one to capture a mental map of Paris using landmark names. It reports their first use in the computer field in 1995, by Douglas Coupland. The first major use of a tag cloud was by Fortune magazine, to display 500 corporations. Interest then kept growing over time, especially once tag clouds appeared on websites. The paper discusses two types of tag cloud: the traditional one-word tag cloud and the two-word tag cloud. In a two-word tag cloud, the most frequently used two-word phrases are given visual emphasis.
The most interesting fact the paper states is that tag clouds break many of the theoretical rules of visualization and yet have been adopted by a great many websites. A few problems the paper identifies with tag clouds are: long words get an undue advantage because they draw more visual attention than shorter words; alphabetical ordering can place unrelated words next to each other; and tag clouds are not as good for overall sense making as some other techniques. That is the theory, however; in practice tag clouds keep attracting interest even though they break some golden rules of visualization. This is because a tag cloud provides a friendly atmosphere and an easy entry point into complex sites such as social websites and blogs. I like the content of the paper overall, as it contains many interesting facts, and both the subtle advantages and the theoretical disadvantages are laid out very clearly. I also liked the authors’ point that we may need to look for new possibilities and take a fresh look at old knowledge, since the tag cloud succeeds even without complying with best practices. This could be very beneficial for finding new techniques, because the tag cloud, one of the most curious and widely used visualizations, was found not through research but through unconventional routes.
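The basic mechanics behind a one-word tag cloud can be sketched in a few lines of Python: count the words and map each count to a font size. The linear scaling, the point sizes, and the `tag_sizes` name are assumptions chosen for illustration; the paper does not prescribe a particular formula.

```python
import re
from collections import Counter


def tag_sizes(text, min_pt=10, max_pt=40, top_n=5):
    """Map the top_n most frequent words to font sizes in points.

    Assumed scheme: sizes grow linearly from min_pt (least frequent of the
    selected words) to max_pt (most frequent).
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower())).most_common(top_n)
    if not counts:
        return {}
    hi, lo = counts[0][1], counts[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in counts}


sizes = tag_sizes("data data data data map map chart")
print(sizes)  # {'data': 40.0, 'map': 20.0, 'chart': 10.0}
```

Note that this sizing says nothing about word length, so a long word still occupies more area than a short word with the same count, which is one of the problems the paper raises.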
Reactions: Information visualization for text analysis.
This chapter from Search User Interfaces covers information visualization for text analysis. It describes multiple approaches and tools for visualizing mined text. The first section discusses strategies such as text mining to find important words in a text and the relations or connections between words. Several tools that provide this functionality are mentioned very briefly: TAKMI, for text mining of call center data; the Jigsaw system, for analyzing relationships between entities or words; and the BETA system from IBM's WebFountain project, for data exploration using TileBars. However, all of them are covered in very little detail, with just a few screenshots.
The chapter then discusses analyzing words and text by extracting concordances and displaying them in context. It explains that a very common way to view a concordance is to display the word of interest in the center, with the other words arranged around it by relevance. Here it covers tools like SeeSoft and TextArc, which differ in how they display concordances: SeeSoft displays them in columns, while TextArc displays them in a spiral with connections to the words. It also covers tag clouds as used on websites and a few other ways of displaying concordances and the relatedness of words. The chapter also discusses displaying relations between citations in the literature. One of the visualizations discussed uses a graph with nodes and links to display relationships between authors and documents.
I feel the chapter gives only basic, introductory information about visualization for text mining and its tools. It mentions tools but not in detail, and just provides screenshots, which I feel are of very little use if I want to create a tool for this purpose. The chapter would have been more informative if it had discussed the detailed process of mining information and how the tools process that information for visualization.
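As a small illustration of the concordance idea described above (the word of interest in the center, with its context on either side), here is a minimal keyword-in-context sketch in Python. The `concordance` function, the `width` parameter, and the sample text are hypothetical; the tools named in the chapter use much richer displays.

```python
def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of words kept on each side of the keyword.
    """
    words = text.split()
    rows = []
    for i, w in enumerate(words):
        if w.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, w, right))
    return rows


text = "The cloud shows tags. A tag cloud sizes words. Every cloud needs data."
for left, kw, right in concordance(text, "cloud"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12} | {kw} | {right}")
```

Aligning the keyword in a single column is what lets a reader scan the surrounding words quickly, which is the point of showing occurrences in context.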
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization
I'm not certain how easy the tool is to use with large data sets. Many of the screenshots, I feel, show a visualization that would not scale well to thousands of documents. For example, the "List View" does not seem to have an easy way to filter each side of the list. While the paper discusses a search option, a simple filter would be nice. Their original stated goal is to help "understand" the reports, but I'm not sure how well they do that. I could see it helping in discovering related documents, which may help in understanding, but more as a by-product.
Overall the tool seems like it would be good for exploring small to medium sized collections of documents.
Friday, November 25, 2011
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization
This paper presents a visual analytic system called Jigsaw that represents documents and their entities visually in order to help analysts examine reports more efficiently and develop theories about potential actions more quickly. It provides multiple coordinated views of document entities with a special emphasis on visually illustrating connections between entities across the different documents. It helps investigative analysts who are faced with the challenging task of assessing and making sense of large bodies of information.
Jigsaw is designed to assist analysts with foraging and sense-making activities across collections of textual reports. Through interactive exploration, analysts are able to browse the entities and connections to help form mental models about the plans and activities suggested by the report data but it is not a substitute for careful analysis of the reports. Instead, it acts as a visual index that presents entity relations and links in forms that are more easily perceived, thus suggesting relevant reports to examine next. Since Jigsaw uses other available tools to extract the entities from a document, without determining the accuracy of these tools, it is not possible to judge if Jigsaw presents the information correctly.
Reaction: INFORMATION VISUALIZATION FOR TEXT ANALYSIS
This chapter is a very interesting read for data analysts and people interested in the field of text mining. It describes ideas that have been put forward for understanding the contents of text collections from a more analytical point of view. It discusses applications in the field of text mining, which usually involve visualizing connections among entities within and across documents. It also discusses methods for visualizing occurrences of words or phrases within documents, and various attempts to visualize relationships between words in their usage in language and in lexical ontologies.
I really liked the TRIST system, which uses entity extraction to identify the people, places, and organizations that occur within the retrieved documents. Items of interest can be dragged to the workspace below the search results, and documents can be grouped by clustering or automated categorization.
Each visualization/system has been supported by a very engaging image of itself which makes the visualization much clearer to understand. All the applications listed are pretty interesting like Jigsaw system, TAKMI text mining interface etc. Social visualization sites like IBM's manyeyes.com, and other tools continue to make visualization generally accessible to non programmers. Overall a good read.
Reaction: TileBars - Visualization of Term Distribution Information in Full Text Information Access
This paper presents a new visualization paradigm called TileBars, which shows the usefulness of explicit term distribution information in Boolean-type queries by making use of text structure when retrieving from full-text documents.
It is a very useful analytical tool for understanding the results of Boolean-type queries. The idea behind TileBars is that the output of a full-text search should also convey the frequency and distribution of the query words in each document; an algorithm called "TextTiling" is used to automatically determine the document's subtopic structure. The search results are displayed as rectangles made of shaded squares: the rectangle indicates the length of the document, the squares indicate its subsections, and the shade of each square denotes the frequency of the word in that subsection. It's a very novel method of analyzing the search results of Boolean queries, and it provides a condensed view of document length, key term frequency and term distribution.
TileBars is very effective because it presents a visual display of the results, so the user can open the documents that are highly relevant to the search terms and ignore the others. Apart from reducing visual clutter, it also saves the user a lot of time. This paper is a must read for people interested in the field of data mining and information retrieval.
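The display described in this reaction (one row per term set, one square per document segment, darker squares for more hits) can be approximated with a short Python sketch. It assumes fixed-size word windows in place of the TextTiling segments the paper uses, and the `tilebar` and `render` names, the tile size, and the character shades are all made up for illustration.

```python
import re


def tilebar(text, term_sets, tile_words=5):
    """Count hits per tile: one row per term set, one column per tile.

    Assumption: tiles are fixed windows of `tile_words` words, standing in
    for the subtopic segments a real segmenter would produce.
    """
    words = re.findall(r"[a-z]+", text.lower())
    tiles = [words[i:i + tile_words] for i in range(0, len(words), tile_words)]
    return [[sum(tile.count(t) for t in terms) for tile in tiles]
            for terms in term_sets]


def render(grid, shades=" .:#"):
    """Turn counts into characters; denser characters mean more hits."""
    return ["".join(shades[min(c, len(shades) - 1)] for c in row) for row in grid]


doc = ("law law court judge ruling network network router packet cable "
       "law network court cable judge")
grid = tilebar(doc, [["law", "court"], ["network"]])
print(grid)  # [[3, 0, 2], [0, 2, 1]]
for row in render(grid):
    print("[" + row + "]")
```

The number of columns reflects document length, and a column where both rows are dark marks a passage where the two term sets overlap, which is the kind of passage the user would want to open first.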
Reaction: Tag Clouds and the Case for Vernacular Visualization
This paper shows how tag clouds have evolved over time and found their way into social media and Web 2.0. I totally agree with the authors' point that the tag cloud is truly a “vernacular” technique, i.e. a visualization that does not come from the visualization community. The paper shows how tag clouds are used and why they are so popular and useful: they make things visual, engaging, flexible and easy to read.
The paper cites many instances in which tags proved useful, from Jim Flanagan's use of tags to show the popular search terms that led people to his website, to Flickr showing tagged images, to psychological experiments. Since then many sites, like del.icio.us and Technorati, have followed this idea. It also says that displaying a collection of tags is not the only use of tag clouds; they are also used these days for analytical tasks, in the form of word clouds. Overall this paper is short and to the point.
Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access
This visualization seems very useful for doing full text searches, such as for research. I don't see it being very useful for general web searches or having many other uses. I would be very interested in reading more about the author's TextTiling algorithm and how it was evaluated. To me that seems to be a very important aspect of the TileBars system, and yet the author went into very little detail about it. The author says it is "Serviceable" for TileBars, but how was that conclusion reached?
I did find the analysis of the difference between the abstract of an article and its full text worthwhile. Everything made sense and was intuitive, but I had never thought about the difference from a search perspective, so the simple analysis was refreshing.
Wednesday, November 23, 2011
Reaction: Tag Clouds and the Case for Vernacular Visualization
The article's examples of the one-word tag cloud versus the two-word tag cloud were interesting. While the two-word tag cloud provided more insight into what the speech was actually about, most phrases seem to be either extremely large (only one), large (about two), or everything else. This does not seem to indicate frequency as well as the one-word cloud does, but it probably gives a better indication of content. More details about this would be interesting.
Overall the article is fairly good and comes down pretty hard against word clouds, which I would mostly agree with, especially for things like speeches. I did like how they gave examples of word clouds that were more beneficial, such as tags of a user's photos that give an overview of that user.
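One way to see why a two-word cloud tends to have a single dominant phrase and a long tail of equally small ones is to count one-word and two-word phrases over the same text. The Python sketch below does this on a made-up sentence; the `phrase_counts` function and the sample text are assumptions, and the sketch ignores sentence boundaries and stop words.

```python
import re
from collections import Counter


def phrase_counts(text, n):
    """Count every run of n consecutive words (n=1 words, n=2 two-word phrases)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


speech = ("health care reform matters because health care costs rise "
          "and health care access falls")
one = phrase_counts(speech, 1)
two = phrase_counts(speech, 2)
print(one.most_common(2))  # [('health', 3), ('care', 3)]
print(two.most_common(1))  # [('health care', 3)]
```

In this toy example every two-word phrase other than "health care" occurs exactly once, so a cloud sized by these counts would have one large phrase and everything else at the minimum size.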
Reaction: Information visualization for text analysis (Chapter 11)
Mostly it seems that this is a list of tools, not of visualization ideas for any of the three categories it claims to cover. It presents the tools poorly, most with a one- or two-sentence description without much depth. The few times it does go into more detail about a tool, it does not evaluate it to any great degree.
I am not one hundred percent sure what this chapter is aimed at, as it does not seem to attempt to accomplish its goals, nor does it seem to be anything more than a starting point for someone looking into tools for representing large text collections, and even for that it will probably become outdated quickly.
Guests: final presentations, December 12
We will have several guests for final presentations. As I've said before, I will not be there:
- Pat Fitzgerald, Prof. Art & Design and director of the Advanced Media Lab
- Billy Houghteling of OTT and Springboard
- Clayton Coleman of IBM
Tool: "Jonathan Stark on Mobile" from The Web Ahead podcast
Jonathan Stark joins Jen Simmons to talk about web apps vs. native apps, when to use which mobile technology, how to plan a good mobile experience, touch events, and more.
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization
It was a really interesting paper to read. The authors discuss a novel method for analyzing a large volume of text and thereby aiding analysts in the investigation process. For this the authors have developed a tool called Jigsaw, which they use to represent text documents visually. One thing that was interesting about this paper was that, unlike other papers, they talk about the cost of analyzing the documents and how the relationships are determined. One concern is how the analysis can be verified.
It gives me the feeling that a tool like Jigsaw is very important for summarizing large documents, since the amount of text data we have is enormous and having it represented as a visualization ought to be very helpful. But I don't know how difficult and complex it would be for large documents, as the authors have shown it only for short ones.
Another concern is having to cope with the use of multiple monitors. I have personally used two monitors at a time and it's really helpful, but I guess it might be very easy to lose focus if we increase the number of monitors one has to look at simultaneously.
Further, the authors have used various visual forms, like tables, graphs, a text view, a scatter plot view, etc., to represent the documents. I believe the advantage of having multiple representations is that they provide a flexible and effective understanding of the documents under study.
It would be really interesting to see some future work where they try to scale this tool to larger documents.
Find: User Experience and Experience Design
people's narratives? What personal needs does it meet? http://www.interaction-design.org/encyclopedia/user_experience_and_experience...
Tuesday, November 22, 2011
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization.
The paper provides a good introduction to the investigative tool called Jigsaw. The tool helps the user to understand connections between entities: persons, places, dates and organizations using 4 different views: List View, Graph View, Scatterplot View, and Text View.
In my opinion, each of those views complements the others, and each will be helpful in its own way. However, it would be interesting to see a geographic view that presents a map with different colors depending on how many mentions a country or region had.
Also, I don't think that the tool will be useful when you have a large number of entities, because you would have many labels and nodes, illegible axes, or the need to scroll a lot, so it would be interesting to see how the tool would adapt to large amounts of text and entities.
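To give a rough idea of what "connections between entities" could mean computationally, here is a minimal Python sketch that links two entities whenever they are mentioned in the same report. Treating co-occurrence in a report as a connection is an assumption made for this sketch, and the `connections` function, the report ids, and the entity names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def connections(reports):
    """Map each pair of entities to the ids of the reports mentioning both."""
    links = defaultdict(set)
    for report_id, entities in reports.items():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)].add(report_id)
    return links


reports = {
    "r1": ["Alice", "Boston", "Acme Corp"],
    "r2": ["Alice", "Boston"],
    "r3": ["Bob", "Acme Corp"],
}
links = connections(reports)
print(sorted(links[("Alice", "Boston")]))  # ['r1', 'r2']
```

The number of possible pairs grows quadratically with the number of entities in a report, which is one reason to expect cluttered views on large collections.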
Reaction: TileBars: visualization of term distribution information in full text information access.
This is an interesting paper because it explains how text retrieval as it was being done on titles and abstracts differs from retrieval on full texts, once full texts became available.
Since titles and abstracts are shorter than full texts, information retrieval was performed on them in a different way. The author introduces a tool called TileBars that performs information retrieval on full texts and helps the user by showing how long each text is and how the terms are distributed across the structure of the document. It also lets the user decide how to perform the search (how many term sets to provide, and the Boolean connectors between them).
The tool is useful because the user can understand how the search was performed, which helps them refine their searches in order to get better results. Nowadays, whenever we search with Google, we need to "trust" how the search is being performed, and the algorithm is not displayed to the user in a graphical way, so I think that the tool was useful. The paper also provides 3 different examples of how the search is performed using different term sets, and what the results of the search are.
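The term-set style of query mentioned in this reaction can be sketched as a simple Boolean filter in Python. The sketch assumes one common combination, OR within a term set and AND between term sets; the `matches` function, the document ids, and the sample documents are hypothetical and are not taken from the paper.

```python
def matches(doc_words, term_sets):
    """True if every term set has at least one of its terms in the document.

    Assumed connectors: OR inside a term set, AND across term sets.
    """
    present = set(doc_words)
    return all(any(t in present for t in terms) for terms in term_sets)


docs = {
    "d1": "the court ruled on network neutrality".split(),
    "d2": "the network router failed".split(),
    "d3": "the judge cited an older law".split(),
}
query = [["law", "court", "judge"], ["network", "router"]]
hits = [doc_id for doc_id, words in docs.items() if matches(words, query)]
print(hits)  # ['d1']
```

A filter like this only says which documents qualify; it gives no hint of where in the document the term sets occur together, which is the information the TileBars display adds.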
Announcement: final project presentations
Unfortunately my work takes me out of town on December 12, the day of our final. So here is how the final project will work:
- You must create a screencast demo of your visualization. It should essentially be a version of your final project presentation. Put it on your site, where visitors can view it to learn about your site.
- There are several free tools that you could use, including camstudio, screen2avi, jing, screencast-o-matic, microsoft expression encoder, screenpresso, and fraps.
- You should put it online no later than December 15. If you submit it December 8 or earlier, I will give you feedback that you can use to improve your live presentation.
- On the day of the final (December 12), you will give your presentation to several visitors, including ncsu faculty and corporate visitors. They will give you feedback about your presentations and projects.
Reaction- Tag Clouds and the Case for Vernacular Visualization
The paper starts with the history and evolution of tag clouds but what was really interesting to find was that the concept of tag cloud started very early around 1976. It was simply unbelievable for me. I always thought of it being fairly new. Anyway I never got to read about them in detail but this paper allowed me to do so.
I agree with the point that tag clouds are not restricted to just websites anymore and have applications in a number of varied domains. They are one of the effective tools used for text analysis. It was good to see so many different types of example tag clouds used in different contexts.
I think that it will be too cluttered as the size of the cloud grows and hence it might lose its benefits of finding useful information when all the words are of different sizes. I don't understand the relevance in that case as some important but just slightly smaller words might not get noticed at all. I totally agree with the argument about alphabetical ordering versus clustering. It is really difficult to pick one over the other as both have their own advantages.
Overall it was a simple, short, well compiled paper really easy to read and understand. They have done a good job in including so much information in such a short paper along with figures and examples, history and evolution thereby providing a good understanding for Tag Clouds.
Reaction: Tag clouds and the case for vernacular visualization
Another interesting fact about the paper is how the author highlights the pros and cons of tag cloud visualization. He stresses how the tag cloud gives importance to certain words based on serialization, and that this may not necessarily be the right way.
Personally I feel tag cloud visualization still needs some more research before it can be used. The page looks too cluttered with words and is not ideal for reading. It may be good for a few fields of visualization, but it is not yet fit to apply to all of them.
Announcement: last reactions due next Tuesday 29th
Since we will have no class this Wednesday and a visitor for critique on Monday, our last lecture will take place next week, on Wednesday the 30th. Your reactions for the current set of readings are therefore due Tuesday the 29th.
Venkata Manda and Lavanya Mohanan will present our four readings on text, dividing between them so each has two.
Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access
The purpose of an information access system is to retrieve the information most relevant to the user's request. There have been many approaches to information retrieval, and this paper presents a promising one: a visualization paradigm called “TileBars” that makes use of the text structure of full text documents.
To begin with, the author presents a brief overview of query-based information retrieval and the issues faced with this approach. The paper emphasizes the need to analyze the data retrieved by a query and highlights the main features of TileBars, including simultaneous viewing of the length of the retrieved documents, the frequency of the query terms, and their distributional properties with respect to the document. The visualization helps the user better understand the role of each query term within the retrieved documents, and where other standard information retrieval methods succeed or fail.
The paper is well structured. The author gives a glimpse of the standard retrieval techniques and their drawbacks. TileBars is then introduced as a solution to these drawbacks, and the approach is well explained as a response to three main hypotheses. The paper concludes with related work and future extensions.
I feel this approach is of extreme importance to every user who uses the internet for searching and retrieving documents. Often, I find myself confused and lost when I search for data using a generic keyword such as “web visualization” and most of the results retrieved have less than 1% relevance to my keyword. Users need to be really creative to come up with the right combination of keywords to ensure apt search results. The approach presented in this paper, “TileBars”, is very effective because it presents a visual display of the results, so the user can open the documents that are highly relevant to the search terms and ignore the others. Apart from reducing visual clutter, it also saves the user a lot of time.
Reaction- Information Visualization For Text Analysis
As the title suggests, this chapter is about the application of visualization to the analysis of text and documents. This topic was very new to me, so I had a lot to learn from it. I got good insight into the applications in text mining.
The author says that the most common strategy in text mining is to identify important entities within the text and show the connections among them. Through a series of examples, including the TAKMI system, the Jigsaw system, the BETA system from IBM's WebFountain project, etc., the chapter explains what identifying entities means and how the connections between them are shown. But I wanted to know the underlying principles: how do these systems decide whether an entity is important or not, and how do they decide whether two entities should be connected?
The author then talks in detail about the TRIST tool. I guess the aim was to underline the importance of categorizing the extracted data into different dimensions and giving the user or analyst more flexibility. Moving on, the chapter presents methods for visualizing document concordances and word frequencies, including alphabetical indexing and contexts. I agree with the notion of sticking to basics like these, as they are very easy for users to figure out. Here again it provides a lot of examples, like DocuBurst, Word Tree, tag clouds, word clouds, etc. It is really easy to understand what this chapter is talking about because of the number of detailed examples provided. But I believe that is too many for one chapter, and the reader can easily get distracted.
Reaction- TileBars: Visualization of Term Distribution Information in Full Text Information Access
This paper is about analyzing the text of the entire document so as to give results that are better and much closer to what was searched for. Prior to this, most textual search was based on the title and abstract of a paper. I personally feel that this really affects the search results. There have been instances in the past when I was looking for an academic paper and the search results were so broad that I was overwhelmed by the number of results that were of no use to me. I also wonder how the search results are ranked and how legitimate that ordering is.
This paper proposes a new and quite intuitive style of display called TileBars. What the author has tried to do is give the user a relative view of the length of the document, along with the query term frequencies and their distributional properties. The paper also describes using TextTiling to determine the structure of the document. In my opinion this is fine for general papers of limited size that follow a specified structure, e.g. academic papers, but I don't think this approach would work well for very lengthy documents like novels.
Reaction: TileBars: visualization of term distribution information in full text information access.
I like the way the paper has been written, especially the numerous examples given as a supplement to each subtopic, which help the reader's understanding. TileBars is, I feel, more than just a tool: it lays a basis for search engines by describing the key points to keep in mind when performing a search. However, I am strongly inclined to say that these ideas do not represent the whole; they are just a part of a larger set. For example, we should also keep in mind the user's previous search results for a particular term, because the user would like to see the links that he has visited in the past when doing the same search in the future. This should also play an important role in ranking the search results.
I also liked the fact that the author has thought about clustering the search results from two different tools. This is preceded by an acknowledgement that information access mechanisms should not be considered in isolation but rather have to be woven together.
There is no hesitation in saying that the paper presents valuable research which played an important role in the design of early search engines. I also read one of the referenced works, titled "Automatic text processing", which is a good read too.
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization
The addition of a visual representation to any document that I read helps me understand it better and helps me make sure that I haven't missed any important points that the author wants to state. I strongly agree with the authors' viewpoint that reports are irreplaceable in a business context. The authors also give practical applications of this system, which gave me a sense of direction while reading. More information about the different views in Jigsaw should have been given.
I don't agree with the point that Jigsaw alone will give the best visual representation. It should be blended with other techniques, like the list view, to give a better picture. I have always felt that visualizing helps a reader store the content in memory for a longer time. In terms of human computer interaction, I would say that visualizations stay in memory longer than textual representations.
More technical information on the processing of large documents should have been given to help us better understand the process. As the efficiency of this model is highly dependent on those methods, information about them is a necessity for the reader to appreciate it. It would be interesting to see how information is shown when the Jigsaw display itself becomes huge. The sample used in the experiment is small, which does not lend credibility to the conclusions. More information regarding the type of input (the documents) should have been given. Overall this paper reinforced my understanding of the role of visualization in the analysis of documents.
Monday, November 21, 2011
Reaction: Tag Clouds and the Case for Vernacular Visualization.
I totally agree with the authors' point of view that tag clouds can make it difficult for users to find useful text, especially when all the words are of different sizes. Lexicographic ordering of the words provides some ease, but things can still get messy when there are just too many words. Out of all the different styles presented, I liked "Money makes the world go round" the most because it has the added dimension of grouping together elements which users expect to occur nearby. Also, the coloring of bubbles can be used to indicate an additional correlation factor, something at which simple tag clouds fail to do a good job.
The authors have put in good effort to argue the theoretical point of view regarding tag clouds and how it differs from practical applications. As the title of the paper says, "vernacular" means non-academic (not standardized in some way), and the practical application of tag clouds often defeats a theorist's view of their limited scope and readability. As mentioned in the paper, a tag cloud totally violates traditional visualization technique, yet powered by Web 2.0 its applications are numerous, and with them comes the unearthing of an important class of data called "unstructured text".
The authors did a great job in writing the article and, by keeping arguments concise, they have been able to include as much information as possible in a few pages. The flow of the text is great, with the description of the history at the beginning of the article, then slowly moving on to today's applications and finishing off with a gist of why we need tag clouds. I would rate the article 10/10 as I thoroughly enjoyed reading and learning from it.
Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access
As we know, searching for a specific target with generalized words is very difficult, yet the frequency of those general words in that specific target may be very high. So visualizations that help in searching for these kinds of terms in full text are very important, and a lot of research has been done and will be done in the future. I personally haven't seen many practical implementations of these visualizations, though.
More details about the implementation should have been provided to make this paper more interesting, but looking at the date on which this paper was published, this expectation seems a bit much. A little bit of googling on this topic directed me to websites that describe it in much more depth. Overall the paper presented the topic of visualization's role in searching full texts in a good and in-depth manner, which also gave a flavor of how demand drives innovation and creativity, as this paper was written in the early days of this field.
Reaction: Tag Clouds and the Case for Vernacular Visualization
The authors presented the content in an interesting way. At every point they showed the downside of the tag cloud as well as explaining why it is preferred over other visualizations. This helps the reader open his mind in all directions and question the conclusions he had developed while going through the paper.
There is a lot of scope for research in this field, as can be seen from the content of the paper. The font size can also be made more meaningful in the tag cloud, which may help in visualizing text much better. The way longer words are represented in the tag cloud can also be changed to make the visualization better. It looks as if longer words have some preference in a normal tag cloud even though they actually don't in that context. The size of the text is adding that flavor to the word.
The paper also gives a good deal of information on the different ways of visualizing tag clouds. The impact that tag clouds made on Web 2.0 was an interesting topic to ponder. It will be interesting to look for the efficient algorithm behind this. I think there should be a lot of parsing involved to remove unnecessary duplicates. The grammatical distinction between forms of the same word also adds an interesting flavor to this visualization. Overall the paper was specific to its scope but explained the visualization in great detail.
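To make my guess about the parsing step a bit more concrete, here is a minimal Python sketch of how a tag cloud might count words and pick font sizes. This is not the algorithm from the paper: the function names `count_terms` and `font_sizes`, the lowercasing used to merge duplicates, and the linear scaling between 10 and 40 points are all my own assumptions for illustration.

```python
import re
from collections import Counter


def count_terms(text):
    """Count words after lowercasing, so 'Cloud' and 'cloud' merge into one tag."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(counts, min_pt=10, max_pt=40):
    """Map each count linearly onto a font size between min_pt and max_pt.

    The 10-40 point range is an arbitrary assumption, not a value from the paper.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # When every word has the same count, fall back to the smallest size.
        scale = (n - lo) / span if span else 0.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Toy input: three spellings of "tag" collapse into a single entry.
counts = count_terms("Tag tag TAG cloud Cloud text")
sizes = font_sizes(counts)
```

Note that the size here depends only on the count, so a long word still takes up more room on screen than a short word with the same count, which is the effect I mentioned about longer words.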
Reaction: Information Visualization for Text Analysis.
The chapter is, however, limited in scope, with the author specifically mentioning the applications of information visualization in three areas (text mining, document concordances and word frequencies, and literature and citation relationships) and giving basic examples for all three. Coming to document concordances and word frequencies - again the authors prefer to critique a few ideas rather than going in depth to provide a detailed analysis for the reader about how to think about visualizing documents and word frequencies. The examples seem outdated, with some going back all the way to 1994. The positive takeaway is the neat categorization of examples, from tag clouds to TextArc to bar charts. The variation in the examples presented is good, and it certainly helped me decide which visualization to choose depending on the requirement. Baby names has always been my favorite.
The literature and citation relationships section uncovers a couple of possible applications, like detecting plagiarism. Suppose a set of nodes maps to a single node, indicating that all of them share a common citation, and a different node maps to the set of nodes but not to the parent node; in this case it is evident that the authors did not acknowledge original work done previously. Also, the importance of a paper can be determined by analyzing the degree of the node. If the degree is high, it means that more papers have referenced this node and thus it is of high importance. I found it interesting to note the shift in analysis from nodes and links towards linking interactions. This approach certainly helps in drilling down into a certain time frame or a certain author to tell us which papers the author referenced the most.
Overall, I felt that the paper is very basic and the conclusion does not make a strong statement about why the authors prefer to critique the examples/tools rather than helping the reader understand along what lines good visualizations can be done.
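The degree idea can be sketched in a few lines of Python. This is only my own illustration, assuming the citation network is stored as (citing paper, cited paper) pairs; the function name `citation_counts` and the toy paper labels are hypothetical and do not come from the chapter.

```python
from collections import Counter


def citation_counts(citations):
    """Return how many distinct papers cite each paper (the node's in-degree)."""
    unique_edges = set(citations)  # ignore repeated (citing, cited) pairs
    return Counter(cited for _citing, cited in unique_edges)


# Hypothetical toy data: each pair is (citing paper, cited paper).
citations = [
    ("B", "A"),
    ("C", "A"),
    ("D", "A"),
    ("D", "B"),
    ("D", "B"),  # duplicate entry, counted once
]

in_degree = citation_counts(citations)
most_cited = max(in_degree, key=in_degree.get)
```

Here paper A is cited by three papers and B by one, so A would be ranked as the more important node under this simple measure.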
Reaction: Information Visualization for Text Analysis
Most of the visualizations presented in the chapter were already discussed in class, so it acted as a good refresher on those concepts. This area of research is of high importance, as we can see from the growth in the amount of data in this fast-paced world. We need good visualizations to represent this huge amount of text. The chapter kindled a thought in me about the difficulty of visualizing text compared to numerical data.
I completely agree with theDude in the context of the citations. It is difficult to generate relationships to a paper based on the citations. There will be a lot of factors involved in this case, which should have been considered in the explanation. Overall, the chapter was a good read but would have made much more sense if the tools had been described in much more detail.
Find: Typographic Design in the Digital Domain: with Erik Spiekermann
Reaction: Jigsaw: Supporting Investigative Analysis through Interactive Visualization
I think using multiple views to help in analyzing documents is a good idea, but having to have four monitors in order to use the system effectively seems a little excessive. I know that in my own work it helps to have a second monitor, but I would imagine there's a point where having a certain number of monitors would start to hamper rather than help productive analysis. The paper goes through the different views that Jigsaw has, but there isn't a lot of discussion of why they are using those views. I would have liked the view choices to have been backed up by some current analytical techniques, instead of a focus on how the views were created.
The scenario that the paper goes through seemed helpful, but rather contrived and abstract. It would have been better, in my opinion, to see the tool used in a real-life setting instead of this example, in which the person using it seemed to already know what to look for. There was not a lot of reason given for why the analyst looked for certain connections, or for how that situation was true to the real world. The article concedes this point at the end when they talk about how the system hasn't been evaluated. I would be very interested in seeing how the system fared in a real-world evaluation. Overall, though, I think the article had some interesting takes on document analysis.
Reaction: TileBars: Visualization of Term Distribution Information in Full Text Information Access
This paper seemed to focus on the unique structure of academic papers and how that could be exploited for relevance searches. I thought that it was an interesting idea to incorporate, but wasn't sure how it could be applied more generally, since many documents don't follow the same structure as academic papers. Indeed, the TextTiling algorithm would only be useful on academic papers and not on, say, a novel.
Overall this was a good introduction to the issues involved in analyzing text documents. I think it was a little too specialized, but the tool they introduced seemed to do a fairly good job at the analysis it was attempting. The best point that I took away from this paper was in the related works section, where they talk about how difficult document content information is to display with existing graphical interface techniques. In that case, it would make sense to try to section the problem as best you could and deal with smaller problems as opposed to tackling the big problem.
Reaction: Tag Clouds and the Case for Vernacular visualization
One point that I think the article suggested, but didn't explore enough was when they mentioned that there's a difference between a tag cloud and a word cloud. I think they are definitely different things and I think it would be useful to look at them in different contexts because they serve different purposes. Tag clouds seem, to me, to be useful for indexing and serve as an entry point to a website, whereas word clouds would be more useful for analysis of the given text.
Reaction: Ch. 11: Information Visualization for Text Analysis
The chapter only briefly touches on the reasons why people would want to perform a certain analysis of text. I would have liked to know more about each of the reasons mentioned (text mining, concordances/word frequencies, etc) for why people would analyze text and maybe where the programs they talked about fell short. Overall, it was a good general introduction to visualization of text.