Author's Abstract
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothesis which states that relevant documents tend to be more similar to each other than non-relevant documents. We demonstrate that users are able to use this system close to its full potential.
Additional Comments
Scatter/Gather is a cluster-based document browsing method, an alternative to ranked titles for the organization and viewing of retrieval results. Clustering in Scatter/Gather is dynamic, based only on the 250 top-scoring documents for a query. The algorithm divides the retrieved documents into 5 clusters and lists the common keywords for each cluster's articles at the top. By examining these identifying keywords and considering the number of documents in each cluster, the user can determine which of the document clusters best fits their information need. The user study and experimental study both supported clustering as a useful and usable tool. Even though this is not a graphical interface, the underlying framework is important to graphical interfaces, especially since concepts like "scatter," "gather" and "clustering" are all visual metaphors for text-based presentations.
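The cluster-then-summarize step described in this comment can be illustrated with a minimal sketch. This is not the clustering procedure used by Scatter/Gather itself: a plain k-means over word counts with cosine similarity is swapped in for illustration, and the names `vectorize`, `cosine` and `scatter`, the seeding rule, and the three-keyword summary are all assumptions.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for one document (lowercased, whitespace split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def scatter(docs, k, iterations=10):
    """Group docs into k clusters; return member indices and top keywords per cluster."""
    if len(docs) < k:
        raise ValueError("need at least k documents")
    vectors = [vectorize(d) for d in docs]
    # Assumption: seed centroids with evenly spaced documents so runs are repeatable.
    step = len(vectors) // k
    centroids = [Counter(vectors[i * step]) for i in range(k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid as the summed counts of its members
        # (cosine similarity ignores scale, so a sum works as well as a mean).
        for c in range(k):
            total = Counter()
            for v, a in zip(vectors, assignment):
                if a == c:
                    total.update(v)
            if total:
                centroids[c] = total
    clusters = []
    for c in range(k):
        ids = [i for i, a in enumerate(assignment) if a == c]
        total = Counter()
        for i in ids:
            total.update(vectors[i])
        # The most frequent terms act as the cluster's identifying keywords.
        clusters.append({"docs": ids, "keywords": [t for t, _ in total.most_common(3)]})
    return clusters
```

Calling `scatter(results, 5)` on a list of retrieved texts would give five groups, each with a size (`len(cluster["docs"])`) and a keyword summary, which are the two cues the comment says users rely on when choosing a cluster.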
Marti A. Hearst, Xerox Palo Alto Research Center
Author's Abstract
The field of information retrieval has traditionally focused on textbases consisting of titles and abstracts. As a consequence, many underlying assumptions must be altered for retrieval from full-length text collections. This paper argues for making use of text structure when retrieving from full text documents, and presents a visualization paradigm, called TileBars, that demonstrates the usefulness of explicit term distribution information in Boolean-type queries. TileBars simultaneously and compactly indicate relative document length, query term frequency, and query term distribution. The patterns in a column of TileBars can be quickly scanned and deciphered, aiding users in making judgments about the potential relevance of the retrieved documents.
Additional Comments
TileBars is frequently cited in the literature on information visualization systems. It supplements the traditional ranked list of results with a graphical representation of each article's correspondence to the search query.
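A text-only sketch can make the idea of a TileBar concrete: one row per set of query terms, one cell per text segment, with a darker cell where the terms occur more often. The sketch rests on several assumptions that are not in the paper: fixed-size word windows stand in for the text segments, four ASCII characters stand in for grey levels, and the name `tilebar` and its parameters are hypothetical.

```python
SHADES = " .:#"  # assumption: four intensity levels, lightest to darkest

def tilebar(text, term_sets, segment_size=5):
    """Return one string per term set; each character shades one segment of the text."""
    tokens = text.lower().split()
    # Assumption: fixed-size windows approximate the document's segments.
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens), segment_size)]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = ""
        for seg in segments:
            hits = sum(1 for tok in seg if tok in wanted)
            # Cap the count so it always maps onto an available shade.
            row += SHADES[min(hits, len(SHADES) - 1)]
        rows.append(row)
    return rows
```

The length of each row grows with the document, so a column of such bars conveys relative document length, term frequency and term distribution at the same time, which are the three properties named in the abstract.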
Aravindan Veerasamy, Shamkant Navathe, College of Computing, Georgia Institute of Technology
Author's Abstract
We describe the design of a User Interface for a ranked output Information Retrieval system that integrates querying, navigation and visualization in a seamless fashion. Highlights of the system include the following:
Additional Comments
This system is an earlier version of the more widely cited system by Veerasamy. The name "Tkinq" is rarely cited, although Veerasamy's name appears in many bibliographies and the system is quite original and potentially useful as a visualization tool.
.pdf available from ACM-SIGIR '96
Aravindan Veerasamy, College of Computing, Georgia Institute of Technology
Nicholas J. Belkin, School of Communication, Information & Library Studies, Rutgers University
Author's Abstract
We report on the design and evaluation of a visualization tool for Information Retrieval (IR) systems that aims to help the end user in the following respects:
Additional Comments
This concise yet information-dense graphical display of relevancy results from queries is novel and effective. "The presence or absence of specific significant words in any document can be quickly seen, and it is possible to identify sequences of documents which do, or do not have important contributions from specific query words." This interface differs from the much-cited TileBars system because the graphical results display the characteristics of many whole documents simultaneously, rather than focusing (as TileBars does) on characteristics of individual documents in a set.
Aravindan Veerasamy, College of Computing, Georgia Institute of Technology
Russell Heikes, Statistics Center, School of Industrial Systems and Engineering, Georgia Institute of Technology
Author's Abstract
We present the design of a visualization tool that graphically displays the strength of query concepts in the retrieved documents. Graphically displaying document surrogate information enables set-at-a-time perusal of documents, rather than document-at-a-time perusal of textual displays. By providing additional relevance information about the retrieved documents, the tool aids the user in accurately identifying relevant documents. Results of an experiment evaluating the tool show that when users have the tool they are able to identify relevant documents in a shorter period of time than without the tool, and with increased accuracy. We have evidence to believe that appropriately designed graphical displays can enable users to better interact with the system.
Additional Comments
This system attempts to present information retrieval results in a novel and helpful way to save the user both time and effort in finding relevant material (and eliminating irrelevant material). A combination of a visualization of result relevance with a title display allows the user to skim only those titles which seem relevant from the visual summary. The system was experimentally user-tested and found to be more efficient than a traditional title display. References to similar visualization systems and a comparison to TileBars are included.
Matthew Chalmers and Paul Chitson, Rank Xerox Cambridge EuroPARC
Author's Abstract
We describe work on the visualization of bibliographic data and, to aid in this task, the application of numerical techniques for multidimensional scaling.
Many areas of scientific research involve complex multivariate data. One example of this is Information Retrieval. Document comparisons may be done using a large number of variables. Such conditions do not favour the more well-known methods of visualization and graphical analysis, as it is rarely feasible to map each variable onto one aspect of even a three-dimensional, coloured and textured space.
Bead is a prototype system for the graphically-based exploration of information. In this system, articles in a bibliography are represented by particles in 3-space. By using physically-based modelling techniques to take advantage of fast methods for the approximation of potential fields, we represent the relationships between articles by their relative spatial positions. Inter-particle forces tend to make similar articles move closer to one another and dissimilar ones move apart. The result is a 3D scene which can be used to visualize patterns in the high-D information space.
Additional Comments
Bead is one of the first graphical prototypes for information retrieval results. This innovative system inspired much subsequent research and development on systems for displaying retrieved documents graphically. While the display is not impressive by today's graphical standards, Bead is a truly novel interface.
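The force model described in the abstract, where similar articles drift together and dissimilar ones move apart, can be sketched with a simple spring layout. This is only an illustration under stated assumptions: it computes every pairwise force directly instead of using the fast potential-field approximations the abstract mentions, and the function name `layout`, the step count, the rate, and the random seeding are hypothetical choices.

```python
import math
import random

def layout(dissimilarity, dims=3, steps=500, rate=0.05, seed=0):
    """Place n items in `dims`-space so distances approach the given dissimilarities."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n = len(dissimilarity)
    pos = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            force = [0.0] * dims
            for j in range(n):
                if i == j:
                    continue
                delta = [pos[j][d] - pos[i][d] for d in range(dims)]
                dist = math.sqrt(sum(x * x for x in delta)) or 1e-9
                # Positive stretch: the pair is too far apart, so pull i toward j.
                # Negative stretch: the pair is too close, so push i away from j.
                stretch = dist - dissimilarity[i][j]
                for d in range(dims):
                    force[d] += stretch * delta[d] / dist
            for d in range(dims):
                pos[i][d] += rate * force[d]
    return pos
```

Given a symmetric matrix in which two articles have a small dissimilarity and a third is far from both, the first two should end up close together with the third standing apart, which is the kind of spatial pattern the abstract describes.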
Anton Leouski and James Allan, Center for Intelligent Information Retrieval
Department of Computer Science, University of Massachusetts
Author's Abstract
none
Additional Comments
This study investigates an interactive visualization technique where retrieved documents are placed in a 3-dimensional space and positioned according to the similarity among them. "Although our system does not explicitly create any clusters, we observed that relevant documents tend to appear in close proximity to each other, often forming tight 'clumps' that stand apart from the rest of the material." Two added features incorporate the user's feedback into the visualization (after the user has marked some documents as relevant or non-relevant). "Warping" makes known relevant objects move closer together and attract other relevant material. "Restraining" makes known relevant and non-relevant objects move apart, and the rest of the documents "stretch" between these two groups. The system in the study can visualize documents in 1, 2 and 3 dimensions. "We observed that 1-dimensional visualization was generally inferior to higher dimensional presentation, however we found almost no difference between 2- and 3-dimensional pictures." Some user testing is implied but not described in the wording of the brief report, and no additional information has been published.
R.J. Hendley, N.S. Drew, A.M. Wood & R. Beale
University of Birmingham, UK
Author's Abstract
It is becoming increasingly important that support is provided for users who are dealing with complex information spaces. The need is driven by the growing number of domains where there is a requirement for users to understand, navigate and manipulate large sets of computer based data; by the increasing size and complexity of this information and by the pressures to use this information efficiently. The paradigmatic example is the World Wide Web, but other domains include software systems, information systems and concurrent engineering. One approach to providing this support is to provide sophisticated visualization tools which will lead the users to form an intuitive understanding of the structure and behavior of their domain and which will provide mechanisms which allow them to manipulate objects within their systems. This paper describes such a tool and a number of visualisation techniques that it implements.
Additional Comments
This site includes several screen shots of the system with descriptions. The VR-VIBE system is applied to collaborative visualizations in a project called Populated Information Terrains, which is described at http://www.crg.cs.nott.ac.uk/research/applications/pits/.
Chris Brown, Steve Benford and Dave Snowdon, Communications Research Group, University of Nottingham
Author's Abstract
We begin by reviewing techniques for visualizing large scale hypermedia databases. We present a definition of large scale databases, introduce a scoping technique to handle them, and discuss collaboration support. This leads to a discussion of the implementation; we discuss browsing and searching, and the embodiment of database users in the visualization. Finally, we present an example application of these techniques: The Internet Foyer.
.pdf available from the ACM Digital Library - SIGCHI '96
Author's Abstract
LifeLines provide a general visualization environment for personal histories that can be applied to medical and court records, professional histories and other types of biographical data. A one screen overview shows multiple facets of the records. Aspects, for example medical conditions or legal cases, are displayed as individual time lines, while icons indicate discrete events, such as physician consultations or legal reviews. Line color and thickness illustrate relationships or significance, rescaling tools and filters allow users to focus on part of the information. LifeLines reduce the chances of missing information, facilitate spotting anomalies and trends, streamline access to details, while remaining tailorable and easily transferable between applications. The paper describes the use of LifeLines for youth records of the Maryland Department of Juvenile Justice and also for medical records. User's feedback was collected using a Visual Basic prototype for the youth record.
Additional Comments
Additional publications are available at http://www.cs.umd.edu/hcil/lifelines/
Author's Abstract
This paper introduces a novel user interface that integrates search and browsing of very large category hierarchies with their associated text collections. A key component is the separate but simultaneous display of the representations of the categories and the retrieved documents. Another key component is the display of multiple selected categories simultaneously, complete with their hierarchical content. The prototype implementation uses animation and a three-dimensional graphical workspace to accommodate the category hierarchy and to store intermediate search results. Query specification in this 3D environment is accomplished via a novel method for painting Boolean queries over a combination of category labels and free text. Examples are shown on a collection of medical texts.
Additional Comments
While combining existing 3D animation with information retrieval for MedLine, this system attempts
Author's Abstract
Users often must browse hierarchies with thousands of nodes in search of those that best match their information needs. The PDQ Tree-Browser (Pruning with Dynamic Queries) visualization tool was specified, designed and developed for this purpose. This tool presents trees in two tightly-coupled views, one a detailed view and the other an overview. Users can use dynamic queries, a method for rapidly filtering data, to filter nodes at each level of the tree. The dynamic query panels are user-customizable. Sub-trees of unselected nodes are pruned out, leading to compact views of relevant nodes. Usability testing of the PDQ Tree-browser, done with eight subjects, helped assess strengths and identify possible improvements. A controlled experiment, with 24 subjects, showed that pruning significantly improved performance speed and subjective user satisfaction. Future research directions are suggested.
Additional Comments
The Pruning with Dynamic Queries (PDQ) Tree-browser allows users to view hierarchical data in a detailed view and an overview concurrently. At each hierarchical level, users can select three attributes to query on, and various widgets are provided to specify these queries. Results are dynamically updated as the sliders and menus are changed for each attribute. Nodes that do not match the query are greyed out, and the subtrees of these nodes are pruned completely and not shown. The idea is that with the irrelevant information no longer displayed, the user can examine the relevant information more quickly and easily and make a more informed decision. The overview of related research was quite detailed and included references to many similar projects, such as fisheye views and FilmFinder.
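The pruning behaviour described in this comment can be sketched as a recursive filter over a tree. The data layout is an assumption made for illustration (each node is a dict with `name`, `attrs` and `children`), as are the function name `prune` and the idea of passing one predicate per tree level; none of this is taken from the PDQ Tree-browser's implementation.

```python
def prune(node, predicates, depth=0):
    """Copy the tree, greying out non-matching nodes and dropping their subtrees.

    `predicates` maps a tree level (0 = root) to a function that takes a node's
    attribute dict and returns True when the node matches the query at that level.
    Levels without a predicate accept every node.
    """
    matches = predicates.get(depth, lambda attrs: True)
    if not matches(node.get("attrs", {})):
        # Non-matching node: keep it visible but greyed out, and hide its subtree.
        return {"name": node["name"], "greyed": True, "children": []}
    return {
        "name": node["name"],
        "greyed": False,
        "children": [prune(child, predicates, depth + 1)
                     for child in node.get("children", [])],
    }
```

Re-running `prune` with a new predicate each time a slider or menu changes would mimic the dynamic-query updates, although an interactive tool would presumably update the view incrementally.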