Development and Testing of a
Visual Information Retrieval Environment

Proposal to the University of Illinois Campus Research Board

P. Bryan Heidorn

Graduate School of Library and Information Science

Introduction

A visual information retrieval environment provides visualization features that help users manage the large result sets that are typical in many information retrieval environments. There are two main objectives for the current research. The first objective is to develop a multimedia information retrieval environment that can serve as a platform for a series of studies in visual information retrieval beginning with those proposed here. The second objective is to use the environment to conduct a pilot study that is a good example of the type of research that can be performed in this environment. This environment and the pilot study would provide the proof of concept needed for external funding. The multimedia information retrieval environment will be an extension of a current system, the Visual Information Browsing Environment (VIBE) (Olsen et al., 1993; Korfhage, 1997). The initial pilot study and the example future studies will provide performance data on three features of the visualization interface: dimensionality, document structure and image indexing. This data will inform the design of the proposal for outside funding which would examine these features and others. The proposal provides a brief introduction to Visual Information Retrieval Interfaces (VIRI), followed by a description of VIBE. Traditional measures of recall and precision are not sufficient for measuring the system performance of interest in this work. Appendix A provides a discussion of recall effort as a measure of retrieval effectiveness. The final sections provide a description of the three experiments. The first experiment investigates dimensionality and individual differences (see Appendix B). The second and third experiments provide an indication of future direction and applicability of this multimedia information retrieval environment. The final section details the system development effort. Appendix H provides a timeline, personnel responsibilities and detailed budget. 
Support for this proposal would provide funding for a system port, system enhancements, and this first experiment.

The volume of information that can be retrieved has become a barrier to the usefulness of the information. One approach to the problem is to provide new forms of information space visualization that are tuned to the skills and experience of the individual. The information environment proposed here allows searchers to represent hundreds or thousands of documents simultaneously on a computer screen and to manipulate and navigate through this information space. This approach gives the user dynamic control of the information space. This is qualitatively different from the relatively static picture of information space provided by concept mapping techniques such as Kohonen self-organizing maps (SOM).

While the VIRI approach has promise, little work has been done on evaluating its effectiveness for information retrieval tasks. In addition, these visualization interfaces frequently provide many features simultaneously, making it difficult to determine which combination of features is responsible for the observed strengths and weaknesses of the systems.

The vision behind this work is to produce a retrieval visualization environment containing features that have been verified, alone and in concert, as contributing positively to retrieval task performance. The first test arena for the technology will be the field of botanical informatics. There are hundreds of thousands of botanical descriptions spread over dozens of databases around the world. Many countries are funding projects to expand these collections. At no point in the near future will there be retrieval mechanisms and standards available to allow for precise federated searching of these collections. This state of database development is consistent with the status of many scientific and non-scientific disciplines. In this type of information landscape, facilities for information exploration and visualization become more important than standard retrieval precision.

The first feature to be evaluated, and the only one to be funded by this proposal, is 1-D (lists) versus 2-D (ordered scattergrams) computer displays. VIBE already supports a 2-D display, but performance with 2-D displays has been inconsistent (Koshman, 1996; Morse and Lewis, 1997; Morse et al., 1998). This experiment will control the system and task parameters to identify the causes of this inconsistency. In the course of conducting this work, we will be familiarizing University of Illinois students with the system development efforts required to build this type of system and with the experimental procedures required to perform this type of research.

While this first study is being conducted, system development personnel will move ahead with developing new system functionality. The functionality, as discussed later in the proposal, includes the addition of faceted retrieval to support XML and Z39.50 developments in the underlying databases. It also includes image exemplar based retrieval.

To summarize, this proposal seeks support to fund the modification and extension of an information space visualization system, VIBE. These modifications include usage tracking functions. They also include a new 1-D, ranked list, result display interface. This new interface will be used to perform a pilot study and then full experiment comparing the effect of result display dimensionality on retrieval performance. At the same time the experiment will evaluate the interaction of individual differences in spatial ability with display dimensionality. This experiment will not only shed light on potential merits of the newer multidimensional document display systems but will also serve as a milestone and deliverable for the system development component of the proposal. A detailed experimental design may be found in Appendix B. This system will serve as the foundation for a set of future studies on visualizing information spaces of multimedia data. Two such applications are briefly outlined below to help clarify the general applicability of the approach. These include visualization of faceted retrieval results and the use of faceted document similarity. Some facets will include image features derived from text descriptions.

Background

While it is useful to investigate a variety of factors related to features of Visual Information Retrieval Interfaces (VIRI), such as selection of color (a feature of VIBE) or screen density, it is impractical to evaluate them all simultaneously. Here we are interested in issues that are characteristic of VIRI rather than in those common to general computer systems or even to other character-based information retrieval systems. There is no attempt here to review the literature on human factors in database systems or information retrieval systems. There are a number of excellent recent reviews of the design of information systems (e.g. Allen, 1996; Marchionini, 1997; Marchionini and Komlodi, in press; Shneiderman, et al., 1997).

VIRI

There are many information retrieval tasks where large amounts of information are retrieved and displayed as a result of user queries. For example, many web search engines may return hundreds of thousands of items. Commercial services such as Dialog, Educational Resources Information Center (ERIC), or Medline also frequently return large data sets. These are traditionally displayed in one-dimensional lists ordered by some criterion. These displays convey little information to the user about the relationship of the documents to one another or to the query. In contrast, new visual displays such as Bead (Chalmers and Chitson, 1992), BIRD--Browsing Interface for Retrieval of Documents (Kim and Korfhage, 1994), InfoCrystal (Spoerri, 1993), Space IR (Newby, 1992), LyberWorld (Hemmje et al., 1994), and VIBE--Visual Information Browsing Environment (Olsen et al., 1993) can represent the relationships in 2-D or 3-D layouts. The addition of more dimensions leads to the loss of bibliographic information such as the title and replaces it with spatial information. The effect of this interface characteristic has not been systematically investigated in IR.

Cugini and colleagues (1996) present three different visual displays of results from a modified version of the NIST statistical text retrieval system called PRISE. The visual displays include a document spiral that converts the ranked list into an iconic display with the one-dimensional line of rank order twisted into a spiral. This allows visualization of the entire 1-D ranking space. The second display type is a three-keyword axis display where each keyword or group of keywords may be assigned to one of three axes. The final display type is a nearest neighbor sequence, which uses a nearest neighbor algorithm to project the high dimensional keyword space into a three dimensional space while keeping documents that were near one another in keyword space, near one another in the lower dimensional space. Unfortunately, the displays have not yet been tested for usability.

NSF-funded research has looked at the relative merits of different VIRI types (Morse & Lewis, 1997; Morse et al., 1998). On a two-term Boolean task, performance was highest for text-based lists and icon lists and lowest for graph and spring display (VIBE) interfaces (Morse et al., 1998). Users preferred the spring display over text in spite of its relatively poor performance. The authors point out that this may be due to the Hawthorne effect. There was a cross-condition learning effect for graph and spring displays that had not leveled off by the end of the five trials used in the study. This indicates that familiarity with text systems may have contributed to the findings. That research group is currently investigating more complex tasks and learning effects.

Since it is not possible to train everyone on the use of information retrieval (IR) systems, it is essential to design interfaces that can adapt to needs and strengths of particular users. It is necessary to determine which aspects of the interface are responsible for performance differences. Sugar (1995) has noted that in order to design effective systems, we must "identify human characteristics that lead to performance differences and the interface factors with which they interact." In the study proposed here we will investigate the relationship between cognitive skills and display format.

VIBE

This section describes the operation of VIBE in its current implementation. The System Development Effort section of this proposal will discuss the modifications that will be made to VIBE in the course of the proposed work. VIBE, Visual Information Browsing Environment (Olsen et al., 1993; Korfhage, 1997), represents documents on a two-dimensional display rather than in the one-dimensional display that is typical of commercial retrieval engines. The placement of the documents is determined by their relationship to reference points or points of interest (POIs). The ratio of the similarities between the documents and the POIs determines the location of document icons. An example screen from VIBE is provided in Figure 1: VIBE. In this example the user has selected four POIs, "contract", "lender", "member" and "director". These are represented as the labeled round icons. Documents are represented as rectangular icons. The size of the document icons represents the size of the documents.

All documents are retrieved as a Boolean OR of the POI terms. Any document which contains one or more of the POIs is retrieved and placed on the display. A threshold may be set to filter additional documents from the display. The placement of document icons in VIBE is determined by the ratio of similarities between the documents and the POIs. If a document is more like one POI than another, it is placed on the screen closer to the POI it is more similar to. The location of document icons on the display is defined for n reference points and similarity measures s1, s2,...,sn where si is the similarity between a given document and reference point i. Where pi is the position vector for reference point i, the location p of the document icon is defined by the similarity-weighted average of the reference point positions,

p = (s1 p1 + s2 p2 + ... + sn pn) / (s1 + s2 + ... + sn)

The similarity measure may be any measure appropriate to the document representation, such as document term frequency. It need not be defined in terms of simple word frequency only. Where the POIs and documents are text, the cosine or another vector measure of similarity may define s. Where the documents are images, image analysis measures (e.g. color histogram similarity) may be used. p may be defined in two or more dimensions. In the current study two dimensions are used, but versions of VIBE have been produced which work in three dimensions (Benford et al., 1995). Later work may explore the three-dimensional representation after a viable research environment has been built through this project and the results determined for the one-dimensional versus two-dimensional display.
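The placement rule can be illustrated with a minimal sketch. It assumes that the icon position is the similarity-weighted average of the POI positions, which is one reading of the "ratio of similarities" description above; the function name place_document and the coordinates are hypothetical and are not taken from the VIBE source code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def place_document(similarities: Sequence[float],
                   poi_positions: Sequence[Point]) -> Point:
    """Return a 2-D icon position as the similarity-weighted
    average of the POI positions (assumed placement rule)."""
    if len(similarities) != len(poi_positions):
        raise ValueError("one similarity per POI is required")
    total = sum(similarities)
    if total <= 0:
        # A document with no similarity to any POI is not retrieved,
        # so it has no position on the display.
        raise ValueError("document matches no POI")
    x = sum(s * p[0] for s, p in zip(similarities, poi_positions)) / total
    y = sum(s * p[1] for s, p in zip(similarities, poi_positions)) / total
    return (x, y)


# Two POIs on a horizontal line; the document is three times more
# similar to the first POI, so it lands closer to that POI.
pois = [(0.0, 0.0), (10.0, 0.0)]
print(place_document([3.0, 1.0], pois))
```

Under this assumption a document equally similar to both POIs would sit at the midpoint, and moving a POI shifts every icon that has nonzero similarity to it.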

Figure 1: VIBE

The searcher may expand any of the document icons of interest. This is accomplished by drawing a box around the document(s). This selection process is depicted in Figure 2: Document Selection. Document titles for all selected documents are displayed in a list. The user may click on any title to open a window containing the full document. The document display may be found in Figure 3. In the proposed version of VIBE, the system will record when documents are opened. Radio buttons will also be added to the document display so that the user may mark it as the desired document or not (as described below in the listing of system modifications).

Figure 2: Document Selection

Figure 3: Document Display

In VIBE searchers may add, move, remove or deactivate POIs on the display. A searcher adds a POI by selecting it from the POI list (Index terms). The searcher then moves the mouse pointer to the desired location on the screen and clicks the right mouse button to place the POI. The system automatically selects documents from the database that match the POI at a level above the threshold. These are located on the screen according to the formula described above. The locations of document icons already on the screen are adjusted appropriately. A searcher may select a POI with the mouse and move it to another location on the screen. When a POI is removed or deactivated, documents with similarity above threshold for only that POI are removed from the screen and then the position of the remaining documents is adjusted.
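The effect of the threshold and of deactivating a POI can be sketched as a filter over per-POI similarity scores. This is an illustration under assumptions: the function name visible_documents, the dictionary layout and the scores are hypothetical, and "above threshold" is read as strictly greater than the threshold.

```python
from typing import Dict, List, Set


def visible_documents(sims: Dict[str, Dict[str, float]],
                      active_pois: Set[str],
                      threshold: float = 0.0) -> List[str]:
    """Return ids of documents whose similarity to at least one
    active POI is above the threshold (a Boolean OR over POIs)."""
    return sorted(
        doc_id
        for doc_id, by_poi in sims.items()
        if any(by_poi.get(poi, 0.0) > threshold for poi in active_pois)
    )


# Hypothetical similarity scores for three documents.
sims = {
    "d1": {"contract": 0.8},
    "d2": {"contract": 0.2, "lender": 0.6},
    "d3": {"lender": 0.4},
}
print(visible_documents(sims, {"contract", "lender"}))
# After "contract" is deactivated, d1 matched only that POI and drops out.
print(visible_documents(sims, {"lender"}))
```

Documents that remain visible would then be repositioned using only the POIs that are still active.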

WebVIBE (Morse & Lewis, 1997) is the basis for the current work. In that research the investigators demonstrated that defeaturing the VIBE display to produce WebVIBE reduced some of the performance and preference problems that had been reported with VIBE (Koshman, 1996). Defeaturing removes all but the essential features from the interface. The defeaturing result raises the question of which features of VIRI, and of VIBE in particular, lead to performance and preference differences. The University of Pittsburgh development team is studying the individual features of VIBE. The work proposed here is coordinated with and complements theirs.

A multidimensional Image Browser derived from VIBE (Cinque et al., 1998) was developed at the Università di Roma. POIs in this system, ImageVIBE, may be user sketches. Similarity between a POI and an image icon is determined by a set of geometric scoring functions such as minimum enclosing rectangle, signature, orientation of axis and three other properties. User evaluation has not been performed on ImageVIBE. The last experiments outlined as "future work" at the end of this proposal will differ from ImageVIBE in that the geometric properties will be automatically extracted from the data set and will include domain specific properties which will be contextualized and more psychologically accessible to novice users.

Yet another version of VIBE, VR-VIBE (Benford et al., 1995), developed at the University of Nottingham, is a 3-D version of VIBE. The main difference is the addition of a third dimension. LyberWorld, another 3-D reference point system, was developed at the Darmstadt University of Technology (Hemmje et al., 1994). There was no user evaluation or evaluation of the utility of individual features. The current 1-D versus 2-D study developed here could be extended to 3-D in a straightforward manner, but this will be left to future research.

Cognitive Factors

There are a number of studies which indicate that subjects differ in their ability to use information in text displays, graphical displays and displays with multiple dimensionality. The verbal scores of students in library and information science are higher than those of the general student population, but their spatial abilities are lower (Allen & Allen, 1993). Allen (1994) showed, by manipulating a text-retrieval system feature, that those users with higher perceptual speed were able to identify a subject heading, learn from it, minimize response time and retrieve information faster. Leitheiser and Munro (1995) demonstrated that a GUI provided performance benefits over a command line interface for both users with high spatial ability and users with low spatial ability. However, users with high spatial ability benefited more from the use of a GUI than users with low spatial ability. In other studies spatial ability appears to be less important than experience with systems using spatial displays (Swan et al., 1998). Steinberg and colleagues (1995) have shown that users were able to adapt more quickly to the task of monitoring missiles and airplanes when using a 3-D interface rather than a 2-D visual display. To extend these findings to the design of multimedia IR systems, it is necessary to understand the relationship between spatial ability and system features such as dimensionality.

Using VIBE, Koshman (1996) investigated the interaction between level of user expertise (novice, online expert, and visual expert) and interface type (graphical and text). Koshman found that subjects classified as visual experts searched faster using the graphical interface and, more importantly, these visual experts exhibited difficulties when using the traditional text retrieval system. People who were familiar with text based systems had difficulty with graphical interfaces. Characteristics of the subjects interacted with system features. Koshman's study compared a Microsoft Windows version of VIBE with the Windows based commercial system, AskSam. Query formation between the systems is similar in that in both systems users select the search terms from a list. The systems' primary difference is in the output display formatting. As in this proposal, VIBE was 2-D while AskSam was 1-D. We expect to find results consistent with those found by Koshman except that we will be controlling system differences more directly. We also will evaluate spatial ability with a cognitive evaluation tool rather than defining spatial ability by experience levels. In our study query formation (term selection) and the subjects' relevance judgement mechanism will be identical between the 1-D and 2-D result display conditions since the same system will be used for both. Also, more instrumentation will be provided in VIBE so that we can examine the amount of time spent in different activities such as term selection, POI movement, and document reading. A more detailed evaluation of the cognitive processes involved in the task can help to identify potential relevant subject and system characteristics.

The information retrieval task can be characterized as one of information overload. Large retrieval sets are common. Much of the retrieved information is irrelevant to the information searcher's individual need. The users must, therefore, scan the retrieval set for cues that will help them recognize relevant items. The cues which are traditionally made available in retrieval systems include document order, title and author of document. Retrieval sets are usually presented in a list that is ordered by some measure of similarity between the document and the user's query. These features (order, author and title) constitute the cues that the user has to categorize documents as relevant or irrelevant to the information task.

In this study we investigate the impact of varying the cues which subjects have in visualizing retrieval sets. The traditional text based cues of author and title are replaced with spatial cues for keyword similarity. The tradeoff is between information that may be conveyed by an author’s name and words in a title but which consumes substantial screen space versus an iconic representation that presents relative keyword similarities at the expense of text.

Textual cues require a substantial amount of screen space. An author and title even when truncated as in standard displays may consume 50 characters of display space. This means that few (about 10) members of a retrieval set may be displayed at any one time. There is evidence that users do not look beyond the first list of items. A recent study showed that 58% of users do not look beyond the first 10 titles and 77% do not look beyond the first 20 (Jansen et al., 1998). Conversely, an iconic display allows each item to be displayed in approximately one character space. In addition, many icons may be superimposed without loss of information intrinsic to the icon. Therefore, hundreds of documents may be represented on the screen simultaneously. Which of these strategies is better depends on the relative efficiency with which individuals can exploit the different cue types: text versus space. We hypothesize that individuals with strong verbal abilities will be able to exploit title/author information, while individuals with strong spatial abilities will be able to exploit iconic position information. In the proposed study we will measure for both versions the amount of time searchers spend browsing the document title lists and in VIBE the amount of time in POI manipulation. If the VIBE browsing display is helpful, then the time spent in POI manipulation should be compensated for by savings in time searching title lists because VIBE title lists will be shorter than in the 1-D display condition.

Experimental Design

Research Questions

Future Research Questions

Experiments

The purpose of these studies is to investigate the performance of subjects with a retrieval system while varying interface features. The first study examines performance differences with interface dimensionality: a one-dimensional rank-ordered list and a two-dimensional visual information browsing environment (VIBE) display (see Appendix B). Future studies will focus on faceted retrieval. One version of the retrieval engine will allow full document searches, while the other version will allow searches within particular text fields. Another version of the system allows searches using image exemplars.

Subjects

The subject pool will be the same for all three studies. Subjects will be paid twelve dollars per hour. Experiment 1 should require approximately two hours for each subject to complete. The number of stimuli will be adjusted to fit into this time frame. Experiments 2 and 3 will require approximately one and a half hours since there is no administration of a spatial reasoning survey. Because of the findings on library student cognition discussed above (Allen & Allen, 1993), one-half of the subjects will be selected from the Graduate School of Library and Information Science (GSLIS) and one-half from other departments of the University of Illinois at Urbana-Champaign. An advertisement for subjects will be posted on the GSLIS general electronic bulletin board. Paper fliers with the same advertisement will be posted on bulletin boards in the LIS building and in the Illini Union. An entrance interview will be used to exclude subjects with knowledge, such as botanical degrees, that might allow them to perform the tasks without using the retrieval environment. Only students who report being proficient with the use of the Netscape or Internet Explorer web browsers will be accepted as subjects. No individual subject will participate in more than one study. The first study will include 40 subjects. This will allow for evaluation of statistical significance in the 2x2 design. The remaining two studies are pilot studies to verify proper system functioning and the proper experimental parameters. We will seek funding from outside sources to investigate the performance of larger samples in these studies. See Appendix C for the preliminary questionnaire that will be administered to students.

Document Collection

The collection will be the same for all three studies. The search collection will consist of approximately 600 species files extracted from 290 genus descriptions of plants described in the first three volumes of the Flora of North America. Each genus includes an image file representing all of the species in the genus. The collection size is large enough to make it realistic and to make serial browsing of the collection impractical. The keys will not be provided although automatic indexing and structuring of the keys would be a useful future project. The treatments include Pteridophytes, Gymnosperms, Magnoliidae, and Hamamelidae. While all members will be included in the collection, the target stimuli (queries) will be limited to species that may be identified without the aid of microscopic analysis or other laboratory equipment.

Appendix G - Zamia contains an example genus file. This file includes a genus description and descriptions for seven species of the genus.

Experiment 1: Retrieval Set Dimensionality

The design of this experiment is discussed in detail in Appendix B. In this experiment subjects will be asked to perform an object identification task. Two groups are given the description of an object and use one of two interfaces to perform the search. The interfaces are identical except that one group's result set will be displayed in a conventional list format while the other group's results will be displayed in a two-dimensional VIBE browsing environment. The system will record the time spent using different parts of the interface as well as correct and incorrect responses. Cognitive factors research instruments will be used to evaluate the subjects' perceptual speed and spatial scanning. We expect to find differences between the groups. We also hope to explain some of these differences by evaluating where people spend their time in the interface. For example, subjects who use a standard result display list may spend more time in query reformulation and may open more documents. We also expect to find that subjects with higher scores on the cognitive factors will be able to perform the task more efficiently regardless of the display format and that these individuals will show the greatest advantage on the two-dimensional display.

Future Work

Experiment 2: Faceted Retrieval

The design of this experiment is identical to that of the prior study, but it is designed to develop and test a proposed new feature of VIBE, Faceted Retrieval. VIBE currently retrieves documents only for individual POIs as defined by the indexers. However, documents in the FNA collection and in many others are composed of discrete parts. Scientific journal articles, for example, are composed of a bibliographic section (author, title, and publication), abstract, introduction, methods, results and others. Programs were written to identify the facets or parts of documents in the FNA collection. A large number of the facets represent the parts of the plants being described. These facets are listed in the following table. Note that some facets are treated as synonyms and collapsed into one entry for indexing (e.g. "branch").

Facet               Source Field    Facet             Source Field
bark                bark            Seed              Seed
branches            branch          seed_cone_base    Seed
branchlet           branch          seed_cones        Seed
branchlet_sprays    branch          seeds             Seed
branchlets          branch          seedlings         Seedlings
lateral_branches    branch          shoot_system      Shoot
lower_branches      branch          long_shoots       Shoot
twigs               branch          shrubs            Shrubs
buds                buds            spach             Spach
terminal_buds       buds            stem              Stem
cones               cones           stems             Stem
crown               crown           trees             Trees
leaf_blades         leaf            shrubby_trees     Trees
leaves              leaf            common_name       Common_name
adult_leaves        leaf            open_text         Open_text
scalelike_leaves    leaf            source            Source
ovule               ovule           filename          Filename
ovules              ovule           genus             Genus
pollen              pollen          species           Species
pollen_cones        pollen_cones    variant           Variant
roots               roots           reference         Reference
lateral_roots       roots           range             Range
                                    description       Description

In Experiment 2 we will develop and test a new version of VIBE that supports faceted search. For this study, the facets will be individually indexed. The two groups in this study will be the "Full Index" group and the "Faceted" group. The "Full Index" group will select POIs that have scope over entire documents, while the "Faceted" group will pick a facet first and then a term that may apply to that facet. For the Faceted condition, POI to document similarity will be calculated over the scope of the facet only. For example, the term "Oval" might be used as a POI by both groups, but in the Faceted group its scope may be limited to the "Leaves" facet only. Both groups will be given a two-dimensional display for results.
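The difference between the two conditions can be sketched as a change in the scope over which a term is counted. The sketch below is illustrative only: the function name term_frequency, the representation of a document as a mapping from facet name to text, and the sample description are assumptions made for this example and are not drawn from the FNA collection.

```python
from collections import Counter
from typing import Dict, Optional


def term_frequency(doc: Dict[str, str], term: str,
                   facet: Optional[str] = None) -> int:
    """Count occurrences of a term over the whole document
    (Full Index condition) or within one facet (Faceted condition)."""
    if facet is None:
        text = " ".join(doc.values())
    else:
        text = doc.get(facet, "")
    # Very simple tokenizer: lowercase, strip commas and periods.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return Counter(tokens)[term.lower()]


# Made-up species description split into facets.
doc = {
    "leaves": "leaves oval, leathery",
    "seeds": "seeds oval to round",
}
print(term_frequency(doc, "Oval"))                  # scope: entire document
print(term_frequency(doc, "Oval", facet="leaves"))  # scope: Leaves facet only
```

A POI for "Oval" restricted to the Leaves facet would therefore ignore a mention of oval seeds, which is the kind of narrowing the Faceted condition is meant to test.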

All dependent variables will remain the same as in Experiment 1 above.

Experiment 3: Faceted Similarity "exemplar" Retrieval

For the third experiment we will develop and test another feature for visual interfaces for VIBE. This is the ability to use the values of individual facets as exemplars that may act as complex POIs. The research question is: will similarity-based faceted retrieval facilitate retrieval as measured by the dependent variables defined for Experiment 1?

This exemplar feature will support an image-browsing environment. So, for example, a user may select an image of a leaf of a particular species to serve as a POI. Individual features of the image similarity would be selectable by the user. Because of the metric nature of image similarity measures, image retrieval systems such as QBIC (Flickner, et al., 1995), Virage (Gupta et al., 1997), VisualSEEk (Smith & Chang, 1996a; 1996b), and MARS (Huang, et al., 1997) display retrieval sets as ordered lists. There is generally no definition for exact match as there is in database management systems. Rather, as is the case in text matching, it is a graded measure. The current version of VIBE processes text documents only, yet there is an abundance of documents that contain images and other types of information. VIBE defines query-document similarity in terms of relative term frequency. To produce an image-based system we will modify VIBE to handle image similarity based on the relative strength of visual features. This can only be done after the work proposed to support Experiment 1 above. Visual features would include color and texture as well as shape features such as cross-section "roundness", dimensionality, axis curvature and other perceptually motivated features. Users will be able to place these features on the screen and images will gravitate toward them based on the similarity ratios.

Current versions of VIBE rely on POIs that are defined by the developers. These are usually single terms. In the new version of VIBE the subjects will be able to select facets of particular species documents and use these as POIs. For example, a searcher may select the "Stem" facet of the Zamia integrifolia to act as a POI. From the example in Appendix G - Zamia, this includes the term list, "subterranean, or leaf-bearing apex exposed". In essence, the similarity between the POI and individual documents will be determined for the entire facet rather than by individual terms. Of course similarity calculations would be limited to term frequencies within the facet. This effect could be replicated with the standard interface by placing multiple POIs, one for each term, on top of one another. The selection of the correct term set however would be difficult.
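A facet exemplar can be pictured as a stack of single-term POIs at one screen location. The sketch below assumes that stacked POIs combine by summing their term frequencies within the candidate's facet, which holds if placement is a similarity-weighted average of POI positions; the function names, the tokenizer and the candidate descriptions are hypothetical. Only the exemplar text is taken from the Zamia example quoted above.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on whitespace, commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def exemplar_similarity(exemplar_facet: str, candidate_facet: str) -> int:
    """Sum, over each distinct term in the exemplar facet, that term's
    frequency in the candidate document's facet (assumed combination rule)."""
    candidate_counts = Counter(tokenize(candidate_facet))
    exemplar_terms = set(tokenize(exemplar_facet))
    return sum(candidate_counts[term] for term in exemplar_terms)


# Exemplar: the "Stem" facet text quoted from Appendix G.
stem_exemplar = "subterranean, or leaf-bearing apex exposed"
# Hypothetical candidate "Stem" facets.
print(exemplar_similarity(stem_exemplar, "stems subterranean, apex exposed"))
print(exemplar_similarity(stem_exemplar, "stems erect, woody"))
```

A working system would presumably drop function words such as "or" before matching; that step is omitted here to keep the sketch short.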

As in the other studies there are two groups. In this case they are the "Faceted group", as in Experiment 2, and the "Similarity group", which is new to this study. On experimental grounds Experiments 2 and 3 could be combined, but for practical development reasons they cannot. We wish to test the features in Experiment 2 as early as possible on a pilot basis so that the results can be included in an outside grant proposal. We would like to be able to report progress toward exemplar similarity matching at the writing of that proposal.

System Development Effort

To conduct this research we propose to modify and instrument VIBE. Parsing algorithms must be constructed to restructure the FNA data set, which currently combines genus and species information into a single file. Modifications include the ability to present results in a ranked list, much like traditional retrieval systems. This will allow comparison of the ranked list to the two-dimensional ordered scattergram normally used in VIBE. As part of the development effort VIBE will be modified to allow for faceted definition of POIs. Currently the system implementers define POIs in VIBE. We propose to allow searchers to restrict similarity calculations to specific values in specific fields of the source documents. VIBE will also be modified to allow images to serve as POIs. Similarity in this version will still be defined by the word frequency distributions for text describing the images. New instrumentation will track system usage parameters such as the time spent in different parts of the interface, and will record the identity of the documents viewed and the time spent viewing them. This will enable effectiveness evaluations based on the relevance of viewed documents.

Some of the modifications relate to particular types of experiments to be performed with the system. These include one-dimensional ranked lists. In order to conduct the main experiment of this study a new display format will be added to VIBE to produce output that is more similar to the traditional output of results of a search. The Cosine measure will be used to rank and order document titles into a list. In the current system this list is generated only by selecting sets of icons (representing documents) from the visual display.

Document lists in VIBE currently include only title information as seen in Figure 3: Document Display. This will be modified so that both 1-D VIBE and 2-D VIBE will contain Genus, species and common name in the title field.

Currently searchers are able to click on a document title and view the full text in a scrolling display window. The system will be altered so that users can mark documents using radio buttons labeled irrelevant, probably relevant, or relevant. The system will be modified to keep track of the correctness of the judgements. If the item is correctly classified as relevant with either the relevant or probably relevant button, the trial will end and the decision will be recorded along with the time. If the document describes the target species and is given a relevance score of 0 (irrelevant), the system will record the mistake and allow the subject to continue exploration without notification. If the user incorrectly identifies a species as relevant, the system will notify them of the mistake and allow them to continue with the search.
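The judgement handling described above can be sketched as a small decision function. This is an illustration under stated assumptions: only the score of 0 for "irrelevant" comes from the text, so the codes 1 and 2, the outcome labels and the function name are hypothetical, and the fourth case (a non-target document marked irrelevant) is not specified in the proposal and is assumed here to pass silently.

```python
from enum import Enum


class Judgement(Enum):
    IRRELEVANT = 0         # score of 0, as in the text
    PROBABLY_RELEVANT = 1  # assumed code
    RELEVANT = 2           # assumed code


def handle_judgement(doc_id, target_id, judgement):
    """Return (outcome, trial_ends, notify_subject) for one judgement."""
    marked_relevant = judgement in (Judgement.PROBABLY_RELEVANT,
                                    Judgement.RELEVANT)
    is_target = doc_id == target_id
    if is_target and marked_relevant:
        # Correct identification: record the decision and end the trial.
        return ("correct", True, False)
    if is_target:
        # Target judged irrelevant: record silently, let the search continue.
        return ("missed_target", False, False)
    if marked_relevant:
        # Wrong species submitted: notify the subject, let the search continue.
        return ("false_target", False, True)
    # Non-target judged irrelevant (assumed: nothing to report).
    return ("correct_rejection", False, False)
```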

Instrumentation will be added to the VIBE code so that time stamps and program states will be logged as searchers move from one part of the program to another. This will include time spent manipulating the POIs, time spent with full text displays on the screen and other dependent variables.
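A minimal sketch of this kind of instrumentation is given below. It is not VIBE code: the class name, the state labels and the injectable clock are assumptions made for illustration. Each state change is stored with a time stamp, and the time spent in each state is derived afterwards from consecutive stamps.

```python
import time


class UsageLog:
    """Records (timestamp, state) pairs and totals the time per state."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the log can be tested
        self.events = []     # (timestamp, state), in order of occurrence

    def enter(self, state):
        """Log that the searcher has moved into a new part of the program."""
        self.events.append((self._clock(), state))

    def time_per_state(self, end_time=None):
        """Sum the seconds spent in each state up to end_time (default: now)."""
        end = self._clock() if end_time is None else end_time
        closing = self.events[1:] + [(end, None)]
        totals = {}
        for (start, state), (stop, _) in zip(self.events, closing):
            totals[state] = totals.get(state, 0.0) + (stop - start)
        return totals
```

A session would call, for example, log.enter("poi_manipulation") and log.enter("document_view") at each transition; measures such as Document Viewing Time then fall out of time_per_state().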

The PI and GA will make other modifications to the system as opportunities arise. While there is no experiment in this proposal to test these features in a formal setting, they will be added when the associated modules are modified. External funding will be sought to complete and test these modifications. The more system enhancement that can be accomplished as part of this proposal, the stronger later proposals will be. These auxiliary development efforts can best be understood in the context of the future experiments. One of these is faceted retrieval, a study of the use of visually supported faceted retrieval. Documents in this collection and in many others are composed of discrete parts. Scientific journal articles, for example, are composed of a bibliographic section (author, title, and publication), abstract, introduction, methods, results and other sections. Programs have already been written to identify the facets or parts of documents in the FNA collection. In large part the facets represent the parts of the plants being described. These facets are listed in the table, which is included in the discussion above.

The next major modification is faceted exemplar retrieval. This experiment includes the development and testing of another visual interface feature for VIBE: the ability to use the values of individual facets as exemplars, which may act as complex POIs. The research question is whether similarity-based faceted retrieval will facilitate retrieval as measured by the dependent variables defined for the experiment in this proposal.

Summary

This proposal is for a project that will include the development and testing of features for a Visual Information Browsing Environment (VIBE). While the experimentation is on this modified VIBE, the experimental parameters are intended to allow generalization to other systems with similar interface features such as one-dimensional lists versus two-dimensional icon-based displays. A significant amount of the effort will go into system development. The resulting system and results from the pilot study will serve as the basis for future proposals for outside funding to agencies such as NSF (which supported previous work on VIBE).

Bibliography

Allen, B. (1994). Perceptual Speed, Learning and Information Retrieval Performance. Proceedings of the Seventh Annual International ACM-SIGIR Conference on Research and Development in Information retrieval, 71-80.

Allen, B. (1996). Information Tasks: Toward a User-Centered Approach to Information Systems. New York, N.Y.: Academic Press.

Allen, B., & Allen, G. (1993). Cognitive abilities of academic librarians and their patrons. College & Research Libraries, 54(1), 67-73.

Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and Theory. Part II. Results of a design study. Journal of Documentation, 38, 61-71, 145-164.

Benford, S. D., Snowdon, D N., Greenhalgh, C M., Ingram, R J., Knox, I. and Brown, C C., VR-VIBE: A Virtual Environment for Co-operative Information Retrieval, Computer Graphics Forum, 14, (3), pp. 349-360, 1995, NCC Blackwell. [also Proc. Eurographics '95]

Bruner, J.S. (1992). Another Look at New Look 1. American Psychologist, June, 47, 780-783.

Bruner, J.S. (1973). On perceptual readiness. In J.M. Anglin (Ed.), Beyond the Information Given. New York: Norton, 7-42.

Chalmers, M. and Chitson, P. (1992) Bead: Explorations in Information Visualization. In Proceedings of SIGIR '92, Copenhagen, Denmark. ACM Press. 330-337.

Cobb, B. (1963). Ferns. Boston, Houghton Mifflin.

Cugini, J., Piatko, C., Laskowski, S. (1996). Interactive 3D Visualization for Document Retrieval. Paper presented at the Conference on Information, Knowledge and Management (CIKM 96) in the Workshop on New Paradigms in Information Visualization and Manipulation. (http://zing.ncsl.nist.gov/~cugini/uicd/viz.html) [Last Accessed: February 16, 1999]

Dervin, B. (1983). Information as a User Construct: The Relevance of Perceived Information Needs to Synthesis and Interpretation. In S.A. Ward & L.J. Reed (Eds.), Knowledge Structure and Use: Implications for Synthesis and Interpretation. Philadelphia, PA: Temple University Press, 153-183.

Ekstrom, R. B., French, J.W., Harman, H.H. and Dermen, D. (1976). The Kit of Factor-Referenced Cognitive Tests. Educational Testing Service, Princeton, NJ. (Perceptual Speed and Spatial Scanning tests)

Ellis, D. (1996). The Dilemma of Measurement in Information Retrieval Research. Journal of the American Society for Information Science, 47(1), 23-36.

Ellis, D. (1984). Theory and explanation in information retrieval research. Journal of Information Science, 8, 25-38.

Gupta, A., Santini, S., and Jain, R. (1997). In search of information in visual media, Communications of the ACM, 40(12), 35-42.

Harter, S. P. (1996). Variations in Relevance Assessments and the Measurement of Retrieval Effectiveness. Journal of the American Society for Information Science, 47(1), 37-49.

Harter, S. P. (1992). Psychological Relevance and Information Science. Journal of the American Society for Information Science, 43(9), 602-615.

Heflin et al. WebTOC: Evaluation of a Hierarchical Browsing Interface for the World Wide Web. (http://www.otal.umd.edu/SHORE/bs11/)[Last Accessed: February 21, 1999].

Hemmje, M., Kunkel, C., Willett, A. (1994). LyberWorld - A visualization User Interface Supporting Fulltext Retrieval. In Proceedings of ACM SIGIR '94, Dublin, 1994.

Huang, T., Mehrotra, S., and Ramchandran, K. (1997). Multimedia Analysis and Retrieval System (MARS) project. In P. B. Heidorn and B. Namachchivaya (Eds.), Digital Image Access and Retrieval (papers presented at the Clinic on Library Applications of Data Processing, March 24-26, 1996). Urbana: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.

Jansen, Major Bernard J., Amanda Spink, Judy Bateman and Tefko Saracevic. (1998) Real Life Information Retrieval: A Study of User Queries on the Web, SIGIR Forum, 32 (1) 1-17.

Keen, E. M. (1971). Evaluation parameters. In Salton, Gerard (Ed.). The SMART retrieval system; experiments in automatic document processing. Englewood Cliffs, N.J., Prentice-Hall.

Kim, H. and Korfhage, R. R. (1994). BIRD: Browsing Interface for the Retrieval of Documents. Proceedings of IEEE Symposium on Visual Languages, St. Louis, 176-177.

Korfhage, R. R. (1997). Information Storage and Retrieval. New York, N.Y.: Wiley & Sons, Inc.

Koshman, S. L. (1996). User Testing of a Prototype Visualization-Based Information Retrieval System. Doctoral dissertation. University of Pittsburgh.

Leitheiser, R. L., and Munro, D. (1995). An Experimental Study of the Relationship Between Spatial Ability and the Learning of a Graphical User Interface. Proceedings of the First Americas Conference on Information Systems, Association for Information Systems, August 25-27, 1995. Pittsburgh, Pennsylvania, U.S.A., pp. 122-124. (http://hsb.baylor.edu/ramsower/acis/papers/leitheis.htm)[Last Accessed: January 28, 1998].

Little, A. L. (1996). The Audubon Society field guide to North American trees: eastern region. New York, N.Y.: Alfred A. Knopf.

Marchionini, G. (1997). Information Seeking in Electronic Environments. New York, N.Y.: Cambridge University Press.

Marchionini, G. and Komlodi, A. (in press). Design of interfaces for information seeking. In: Williams, Martha., ed. Annual Review of Information Science and Technology: Volume 33, 1998. Medford, NJ: Information Today, Inc. for the American Society for Information Science.

Mizzaro, S. (1997). Relevance: the whole history. Journal of the American Society for Information Science, 48(9), 810-832.

Morse, E., and Lewis, M. (1997). Why information visualizations sometimes fail. Proceedings of IEEE International Conference on Systems Man and Cybernetics, Orlando, FL. October 12-15, 1997.

Morse, E., Lewis, M., Korfhage, R. and Olsen, K. (1998). Evaluation of text, numeric and graphical presentations for information retrieval interfaces: User preference and task performance measures. Proceedings of IEEE International Conference on Systems Man and Cybernetics, San Diego, CA. October 11-14, 1998.

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P. (1995). Query by image and video content: The QBIC system. Computer, 28 (9), 23-30.

Newby, G. (1992). Towards Navigation for Information Retrieval. Unpublished doctoral dissertation. Syracuse University.

Niering, W. A. and Olmstead, N. C. (1979). The Audubon Society field guide to North American wildflowers: Eastern region. A Chanticleer Press Edition. New York, N.Y.: Alfred A. Knopf, Publ., 887 p.

Olsen, K.A., Korfhage, R. R., Sochats, K.M., Spring, M. B. and Williams, J.G. (1993). Visualization of a document collection: The VIBE system. Information Processing and Management, 29 (1), 69-81.

Schamber, L., Eisenberg, M. B., & Nilan, M. S. (1990). A re-examination of relevance: Toward a dynamic, situational definition. Information Processing and Management, 26, 755-776.

Schamber, L. (1994). Relevance and Information Behavior. Annual Review of Information Science and Technology (ARIST), 29, 3-48.

Shneiderman, B., Byrd, D., Croft, W. B. (1997). Clarifying Search: A User-Interface Framework for Text Searches. D-Lib Magazine. 1997 January. ISSN: 1082-9873. (http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html)[Last Accessed: February 21, 1999].

Smith, J. R. and Chang S-F, (1996a). VisualSEEK: a fully automated content-based image query system. In Proceedings of the ACM International Conference on Multimedia, ACM, Boston, MA.

Smith, J. R. and Chang, S-F. (1996b). Tools and techniques for color image retrieval. In Proceedings Storage & Retrieval for Image and Video Databases IV, IS&T/SPIE, Vol. 2670.

Sperber, D., & Wilson, D. (1986). Relevance: Communication and Cognition. Cambridge, MA: Harvard University Press.

Spoerri, A. 1993. Visual tools for information retrieval. Proceedings of the 1993 IEEE Symposium on Visual Languages. Bergen, Norway. Los Alamitos, CA: IEEE Computer Society Press, 160-168.

Steinberg, D, DePlachett, C. Pathak, K and Strickland, D. 3-D Displays for Real-Time Monitoring of Air Traffic. CHI ’95 Proceedings.
(http://www.acm.org/sigchi/chi95/Electronic/documnts/intpost/rks_bdy.htm) [Last Accessed February 21, 1999]

Sugar, W. (1995). User-centered perspective of information retrieval research and analysis methods. In M. Williams. (Ed.). Annual Review of Information Science and Technology. Vol. 30, pp. 77-109.

Swan, R., Allan, J., Byrd, D. (1998). Evaluating a Visual Retrieval Interface: AspInquery at TREC-6. Position paper for the CHI '98 Workshop on Innovation and Evaluation in Information Exploration Interfaces (Los Angeles, April 1998). (http://www.fxpal.com/CHI98IE/submissions/long/swan/index.htm) [Last Accessed: January 28, 1999]

Swanson, D. (1988). Historical Note: Information Retrieval and the Future of an Illusion. Journal of the American Society for Information Science, 39(2), 92-98.

Swanson, D. (1976). Information Retrieval as Trial-and-Error. Library Quarterly. 47, 128-148.

Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing and Management 28, 467-490.

Taylor, R. S. (1986). Value-added processes in information systems. Norwood, N.J.: Ablex Pub. Corp.

Appendix A

Information Retrieval Effectiveness and Relevance

The type of information retrieval with which we are concerned is the object identification task. This task requires that the user determine the identity of an unlabeled object. It differs from both typical database retrieval and full text retrieval in several important respects. Unlike database retrieval, there is no unique key that is accessible to the system user. Unlike typical text retrieval problems, where many objects are potentially relevant, in the object identification problem only one document in the collection is relevant, that is, identifies the unknown. Typical examples of object identification include the identification of unknown plants, animals or people. In the experimental condition being studied here, there is a collection of six hundred vascular plant species descriptions. The subject's task is to identify ten of these species from photographs. The prototype system aids them in the identification task by facilitating access to the collection.

Under these test conditions the traditional measures of recall and precision are poor measures of retrieval effectiveness. Recall effort and task completion time, as defined below, better represent the system's ability to facilitate task completion. The task conditions are such that in nearly all trials the target will be found after examining some small fraction of the collection, so recall, the proportion of relevant documents retrieved, will almost always be 100%. Precision, the proportion of retrieved documents that are relevant, might be slightly more representative, except that in VIBE most documents are activated or displayed on the screen as icons, so the numerator will always be 1 and the denominator will be constant, as is the case with most browsing interfaces. Document icons may be opened by the users to examine the underlying documents. The opened set might be defined as the retrieved set to give a more meaningful denominator, but this is just a degenerate case of recall effort.

Recall effort is a much more appropriate measure of system effectiveness. This is one of the "user-oriented" measures proposed by Keen (1971). This measure takes into account the number of documents examined by the user in order to find the desired document. In the object identification case the number desired is always one (assuming target recognition), as discussed below. For the test collection the recall effort may vary from one, where the target is the first item selected, to N, the number of documents in the collection, in the unlikely case where the relevant document is the last examined.

This measure assumes that the collection contains a sufficient number of desired documents (1) and that the system permits the user to search long enough to find it. Both conditions are true here, with the task failure caveat.

From the user's perspective, a closely related measure of system effectiveness is the task completion time. Where the time to examine documents is constant (which is approximately the case here since document size varies little), the measures of task completion time and recall effort will be perfectly correlated. However, in this system subjects also spend time organizing the display, moving POIs, and turning features such as "star" and "color" on and off. Under some test conditions users may spend much more time performing these operations. In the extreme, some system configurations may allow the user to manipulate the display so that only the correct document is examined, but the manipulation time might be prohibitive. In the current situation such an effect might be found between the component-based condition and the full text (aggregate) condition described in the Future Work section below. In the component-based condition the user must select and manipulate both the components (plant parts) and the terms to describe them, while in the aggregate condition only the terms (but from a larger set) need to be selected and manipulated.

Recall effort and task completion time are good measures of system effectiveness only where the user is able to successfully complete the task, that is, retrieve and recognize the relevant document. This is not the case in the three conditions for task failure. All three conditions will be reported and analyzed. First, the searcher may retrieve and examine a relevant document but not recognize it. In this case the search would continue until the user abandons the search. In the second case, the searcher may incorrectly identify an incorrect item as the target, prematurely ending the search. In the final case, the searcher may abandon the search before the relevant document is examined. These task failures are informative. The recognition failure rate should be constant across conditions since there is nothing about the task that might cause lack of recognition (except perhaps fatigue). The failure due to task abandonment may be higher for the poorer system and would prevent calculation of recall effort and time to completion but is itself a measure of effectiveness or lack of effectiveness.

There is also the possibility that the user will examine and not recognize a target but later return to it and recognize it. In this case, task completion time and recall effort will be calculated as time to recognition rather than time to first examination. It is assumed that the frequency of occurrence of this condition will be independent of experimental conditions but the frequency will be recorded.
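The two measures, including the rule that a target examined earlier but recognized only on a later visit is scored at the time of recognition, can be sketched as follows. The log format (seconds since trial start, document identifier, whether the subject submitted the document as a match) and the function name are assumptions for illustration, as is the choice to count each distinct document once when it is opened repeatedly.

```python
def score_trial(views, target_id):
    """Return (recall_effort, completion_seconds), or None on task failure.

    views is a chronological list of (seconds, doc_id, submitted) tuples,
    a hypothetical log format. Recall effort counts the distinct documents
    examined up to and including the view at which the target is recognized.
    """
    seen = set()
    for seconds, doc_id, submitted in views:
        seen.add(doc_id)
        if submitted and doc_id == target_id:
            return len(seen), seconds
    return None  # target never recognized: unrecognized target or abandonment
```

For a trial in which the subject opens one other document, opens the target without recognizing it, opens a third document and then returns to the target and submits it, the recall effort is three and the completion time is the time of that final submission.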

In order to help control the time required for the experiment, a time limit will be placed on each identification task. This limit will be determined in the pilot study. Where this time is exceeded, the trial will be coded as a task failure. Object misidentification will also be coded as a failure, but the subject will be notified of the problem and encouraged to continue searching.

Appendix B

Experiment 1: Dimensionality

Task Description: The objective of the task is for subjects to search through a collection of documents to find one which will allow them to correctly identify an unknown item. Subjects will identify five unknown items. In each trial subjects will be given a photograph of a single plant specimen and a description of the plant from a non-FNA flora. The subjects will use the VIBE system to identify the species. The groups will differ in output presentation: 1-D or 2-D.

Stimulus Selection: Images and descriptions will be used as stimuli (queries). The query images and descriptions will be selected from the National Audubon Society Field Guides to North American flora, including the guides to wildflowers (Niering & Olmstead, 1979), trees (Little, 1996) and ferns (Cobb, 1963). Entries will be randomly selected from the FNA database. Species that require microscopic or other laboratory analysis for identification will be excluded. This will be determined using the identification keys provided with each genus in FNA. If an image and description are available from a field guide, the field guide data will be added to the stimulus set. Species will be selected from the FNA database until 40 stimuli are identified in the field guides. Twenty lists of five species each will be created from this set by random selection. The actual number of query species may vary slightly depending on the results of pilot studies. Species will not repeat within a list. Each of the twenty subjects in each group will be assigned one of these lists, so that each list is used by exactly one subject in each group.
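The list construction described above can be sketched as follows. The function name, parameters and seed handling are hypothetical; the only constraints taken from the text are the pool of 40 stimuli, twenty lists of five species, and no repetition within a list.

```python
import random


def build_stimulus_lists(stimuli, n_lists=20, list_length=5, seed=None):
    """Draw n_lists random lists of list_length species from the stimulus pool.

    random.sample draws without replacement, so a species cannot repeat
    within a list, although it may appear in more than one list. The order
    within each list is itself random.
    """
    rng = random.Random(seed)
    return [rng.sample(stimuli, list_length) for _ in range(n_lists)]
```

List i would then be given to subject i of the 1-D group and to subject i of the 2-D group, so that the two groups see the same twenty lists.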

Groups: Screened subjects will be randomly assigned to one of the two groups, with the restriction that equal numbers of library and information science students are assigned to each group. Each group will have twenty subjects. One member of each group will be assigned to each of the twenty stimulus sets (of five species each). Each group will have a different output display. The "One-Dimensional Display" group will see results displayed in a list containing Genus and Species information, with the best matches at the top of the list. Rank is defined by the Cosine measure of document similarity and will take into account the terms present and the matching term frequencies. The "Two-Dimensional Display" group will see results in a VIBE-style display. POIs (query terms) will be plotted on the screen in a circle formation to use the maximum screen real estate. Icons representing individual database records will be located on the screen according to the relative strength of their relationship to the keywords. The details of the spatial layout calculations are discussed in the VIBE section above.
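As a rough illustration of the 2-D layout, the sketch below places POIs evenly on a circle and positions each icon at the score-weighted average of the POI positions. This weighted-centroid rule is an assumption standing in for the layout calculation given in the VIBE section, which governs; the function names and the unit radius are likewise hypothetical.

```python
import math


def poi_circle(terms, radius=1.0):
    """Place the POIs evenly around a circle centered on the origin."""
    n = len(terms)
    return {
        term: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, term in enumerate(terms)
    }


def icon_position(scores, poi_xy):
    """Position an icon at the score-weighted average of the POI positions.

    scores maps a term to the strength of the document's relationship to
    that POI. Returns None when the document is related to no POI on screen.
    """
    total = sum(scores.get(term, 0.0) for term in poi_xy)
    if total == 0:
        return None
    x = sum(scores.get(term, 0.0) * xy[0] for term, xy in poi_xy.items()) / total
    y = sum(scores.get(term, 0.0) * xy[1] for term, xy in poi_xy.items()) / total
    return (x, y)
```

Under this rule a document related to a single POI sits on that POI, and a document related equally to two POIs sits midway between them, which is why only the ratios of the scores matter.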

Equipment: The modified WebVIBE database will reside on a Silicon Graphics Indy running an Apache server. Netscape 4.5 will be used as the browser on the client end. For the display interface, subjects will use either the system monitor or a PC on the same sub-LAN. A non-system monitor will be used only if network delays for images are consistent and minimal (< 3 seconds). This will be determined after implementation. In any case all subjects will use the same monitor. The monitor is a 17-inch, 24-bit color display at 1280 x 1024 resolution. A standard mouse and keyboard will be used.

Procedure:

Potential subjects will phone, email or personally contact the PI to determine if they are eligible for the study. By phone or email, subjects will be screened for student status and botanical experience. Qualified subjects will be informed where and when to report.

The testing room is a private office containing two desks and the computer equipment used in the study. The researcher will repeat the participation requirements. Those without substantial botanical experience will be informed of their rights as subjects and will be asked to read and sign the consent form (see Appendix F). The experimenter will assign a subject number at this time and look up the experimental condition. Subjects will then be asked to fill out the subject data form included in Appendix C - Preliminary User Questionnaire. Two spatial abilities, perceptual speed and spatial scanning, will be assessed using the Kit of Factor-Referenced Cognitive Tests (Ekstrom et al., 1976).

The interface features that the participants will see will be similar to those seen in the screen snapshot in Figure 1: VIBE. This interface will be de-featured. Features will be removed in order to isolate some of the critical issues in the difference between a 1-D and a 2-D display. There are aspects of the VIBE interface that make it more flexible but these features take time to learn. For example, the user may assign color to individual keywords or POIs. The color of document icons is then determined by the color of the POIs that have influence over the location of the icon. This feature may be used to disambiguate the influence of POIs on icons. Since color is not central to the 1-D versus 2-D distinction it can be eliminated, as will be done for this study. This however has the potential side effect of reducing the 2-D functionality. With this reservation the following de-featuring will be used.

The "FILE" option will be disabled since there will be only one database to search. In the 2-D condition the "SELECT ICON", "LENS" and "SELECT GROUP" options will be removed from the "OPTIONS" menu. The default behavior will be the same as "SELECT GROUP". The result of a "SELECT GROUP" is a 1-D list of Species Names. This is analogous to the query result list in the 1-D display condition. This will allow the list lengths to be compared as discussed below in the Anticipated Results section.

The next toolbar button, "VIEW," and the associated menu will be removed in both conditions. All view options ("Show Toolbar", "Show Statusbar", "Flyby Tips" and "ToolTips") will be set to on in the 2-D condition; they are not applicable to the 1-D condition. The "TOOLS" button will be removed in both conditions. As stated above, the "COLORS" button will be removed in both conditions.

Based on the experimental condition the experimenter will enter the appropriate URL to run the version of VIBE that is appropriate for the condition. The experimenter will enter a subject ID and begin a ten-minute training session. In the training session the subject will be shown a sample identification task performed by the experimenter. The subject will then perform another identification task under the direction of the experimenter. There will be no discussion of the strategy for selecting keywords. In the training examples keywords/POIs will be selected so that the correct target's abstraction is accessible. The target abstraction is the species name in the 1-D interface and an icon in the 2-D interface. Participants will be shown example records from the database but not the records for any images that will be targets for experimental identification tasks for the subject. Details of the interface are discussed below. Subjects may ask questions any time throughout the training session.

After training the subject will be given the first of five test images and descriptions. The order of the search tasks will be randomized to make it possible to evaluate the potential learning effect between trials. The subjects will be instructed to attempt to find the Latin and common names for the image and description in the database. The subject may ask for assistance at any time. When asked, the researcher may provide information about the use of the interface but will not suggest search strategies or query terms.

Subjects will begin each search task with the interface screen appropriate for the condition. Subjects in both conditions will select keywords/POIs from a list of terms that are sorted in alphabetical order. The subject may use the mouse to select items from the list. The selected term is added to a current search term display window. Users may delete terms from the cells in the window. When an item is selected, the system will conduct a search based on that item. In the 1-D display a ranked list of items will be presented in a scrollable window similar to that displayed in Figure 2: Document Selection. The Latin Name and first Common Name will be presented in this window. Documents will be retrieved with a logical OR of the search terms and ranked by the Cosine Measure. The length of the retrieval list will be limited to 100 documents. When an item is selected in the 2-D display condition, the mouse cursor will become a POI with this term attached and the user may place it anywhere on the display surface. Any document containing this word will be added to the display set (an OR of the search terms) and the individual document icon positions will be updated to reflect the influence of the new POI.
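The 1-D retrieval step described above (logical OR of the query terms, Cosine ranking, a list of at most 100 documents) can be sketched as follows. The weighting is an assumption: query terms are given a weight of 1 and documents are represented by raw term frequencies, which may differ from the weights VIBE actually uses. The function name and data layout are hypothetical.

```python
import math


def rank_documents(query_terms, doc_term_freqs, limit=100):
    """Return the ids of matching documents, best cosine score first.

    doc_term_freqs maps a document id to a dict of term frequencies.
    """
    query = set(query_terms)
    query_norm = math.sqrt(len(query))  # each query term has weight 1
    scored = []
    for doc_id, tf in doc_term_freqs.items():
        matches = [tf[t] for t in query if tf.get(t, 0) > 0]
        if not matches:
            continue  # logical OR: at least one query term must occur
        doc_norm = math.sqrt(sum(v * v for v in tf.values()))
        scored.append((sum(matches) / (query_norm * doc_norm), doc_id))
    # Highest score first; ties are broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]
```

Because the measure normalizes by document length, a short record dominated by the query terms can outrank a long record that mentions them only in passing.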

After selecting new terms in either experimental condition, the subject may decide to either refine the query further (add or delete terms) or to browse the current retrieval set. In the 1-D condition, subjects will be able to read species names on the display list and scroll to items later in the list. A subject may view a document by clicking the mouse on the list item for that document. The document will be displayed in a scrollable "Document Display Window" much as in the window in Figure 3: Document Display with the exception that there will be two buttons, "Submit as Match" and "Close". The keywords in the query will be highlighted in the document.

In the 2-D condition the subject may lasso one or more documents. If one document is selected, it will be displayed in a display window identical to that of the 1-D condition. If more than one is selected, the documents will be displayed in a 1-D list identical to that in the 1-D condition. The 2-D condition subjects may select individual items for display from this list. In either condition, after evaluating displayed documents, the subject will click on a "Submit as Match" button on the document display window when they believe that they have found the target. The system will indicate if they are correct. If they are not correct, they will be able to continue searching. Subjects will be given fifty cents for each correct selection made within the hour. A 10-cent penalty will be imposed for each incorrect selection, with the total loss not to exceed 50 cents for any one trial. The accumulated reward and penalty total will be reported on the display screen at all times during the study. This reward is independent of the hourly pay for participation. When a subject finds the matching item or chooses to abandon a search, the researcher will give them the next trial item. The screen will be cleared of all query terms and documents. This will continue until all items are used. The session will also end if an hour has passed without the subject finding all items.
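The reward arithmetic can be illustrated with a short sketch. It reads the 50-cent limit as a cap on the penalties accumulated within a single trial, which is one interpretation of the rule above; the function name is hypothetical.

```python
def trial_payout_cents(incorrect_submissions, found_target):
    """Net reward for one trial, in cents.

    Each incorrect submission costs 10 cents, with the penalty for a trial
    capped at 50 cents (assumed reading of the cap); a correct
    identification earns 50 cents.
    """
    penalty = min(10 * incorrect_submissions, 50)
    reward = 50 if found_target else 0
    return reward - penalty
```

The running total shown on screen would be the sum of this value over the trials completed so far. For example, a subject who finds the target after two wrong submissions nets 30 cents for that trial.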

After the identification task is completed subjects will be given the exit questionnaire (see Appendix E) and will be paid. The experimenter will then conduct the exit interview (see Appendix D). The experimenter will answer any questions about the experiment at this time but will not explain the two display conditions. The experimenter will offer to send copies of the results of the study to the subject once they have been written.

Independent Variable

  1. Interface visual display: 1-dimensional list or 2-dimensional VIBE

Dependent Variables

  1. Recall Effort - the number of items that must be viewed before the desired item is identified. This includes the Number of Documents examined.
  2. Task Failure Conditions - unrecognized target, false target and abandoned search
  3. Mean Task Completion Time (seconds) – used by Heflin et al. to compute the average time needed to complete each task.
  4. Icon Browsing Time - the amount of time spent looking at and manipulating the icon display. This time only exists for experimental conditions which include a two-dimensional display.
  5. Document List Browsing Time - the amount of time spent searching through document surrogate lists (genus, species, and common name).
  6. Query Length at Time of Retrieval - the number of query terms at the time when the matching document is recognized.
  7. Document Viewing Time - the time spent with full text documents on the display screen.
  8. Number of Query Reformulations - the number of times that terms are added or deleted.
  9. Number of Correct Selections - the number of times the subject correctly identified the document that matched the unknown target.
  10. Qualitative Measures: Satisfaction - a five-point scale of how willing subjects would be to use the system again.
  11. Subject Variables - including Perceptual Speed and Scanning Speed from the Kit of Factor-Referenced Tests, Major (LIS and non-LIS), Age, Retrieval Experience.
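
Several of these measures could be derived from a time-stamped log of each trial. The sketch below is only one possible operationalization: the `Event` record, its field names, and the event kinds are hypothetical, and it assumes that every document opening (including the target and any repeat viewings) counts toward Recall Effort and that every logged query change counts as a reformulation.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One logged action in a trial (hypothetical log format)."""
    time: float            # seconds since the start of the trial
    kind: str              # "query_change", "open_doc", "close_doc" or "submit"
    doc_id: str = ""       # document involved, if any
    correct: bool = False  # only meaningful for "submit" events


def trial_measures(events):
    """Compute a few dependent variables for one trial from its event log."""
    docs_viewed = 0        # Recall Effort (assumed: every opening counts)
    reformulations = 0     # Number of Query Reformulations
    viewing_time = 0.0     # Document Viewing Time, in seconds
    opened_at = None       # time the currently open document was opened
    completion_time = None # stays None if the target was never found

    for ev in events:
        if ev.kind == "query_change":
            reformulations += 1
        elif ev.kind == "open_doc":
            docs_viewed += 1
            opened_at = ev.time
        elif ev.kind == "close_doc" and opened_at is not None:
            viewing_time += ev.time - opened_at
            opened_at = None
        elif ev.kind == "submit" and ev.correct:
            # A correct submission ends the trial; close out the open document.
            if opened_at is not None:
                viewing_time += ev.time - opened_at
            completion_time = ev.time
            break

    return {
        "recall_effort": docs_viewed,
        "query_reformulations": reformulations,
        "document_viewing_time": viewing_time,
        "task_completion_time": completion_time,
    }
```

A trial in which the target is never correctly submitted returns `None` for the completion time, which would correspond to one of the task failure conditions.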

Anticipated Results

Data will be analyzed using an analysis of variance (ANOVA). There are two main conditions of Display Type (1-D and 2-D). The dependent variables are listed above. Recall Effort is expected to differ between the two displays. The 1-D list, through the ranking mechanism, provides an indication of the strength of the relationship between each document and the query as a whole. No information is provided about the relationship between individual query terms and the document. The 2-D interface contains a greater amount of information about the relationship of documents to the retrieval terms. It is hoped that the 2-D display format will help users deal with the information overload that is typical of many retrieval environments. Because we cannot predict the direction of the effect in this study, the analysis will be two-tailed.
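
As an illustration of the planned comparison, the following sketch computes a one-way ANOVA F statistic for a single dependent variable across the two display conditions. It is a plain-Python illustration rather than the analysis package that would actually be used, and the function name is an assumption. With only two groups, the F statistic equals the square of the pooled-variance t statistic, so the test is non-directional, in keeping with the two-tailed analysis described above.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of conditions (2 in this study)
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The resulting F would be compared against the F distribution with the returned degrees of freedom to obtain a p-value.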

The display with the lowest Mean Task Completion Time may be said to be superior in some ways to the other display format. This measure is the sum of the time that subjects spend on the different subtasks. The distribution of time among these subtasks may be even more informative than the overall Mean Task Completion Time. The relative effectiveness of the two systems is a tradeoff among the time spent browsing the 1-D list displays in both conditions, the time spent viewing documents, and, in the 2-D condition only, the time spent manipulating the POI positions. Since the 1-D condition does not have this last process, the only way for the Mean Task Completion Time in the 2-D condition to be lower is if the list browsing time and/or the document viewing time is shorter than its counterpart in the 1-D condition. This might be the case, since the lists are expected to be shorter in the 2-D case and the total number of documents examined is expected to be smaller. These savings may or may not be sufficient to offset the additional time spent manipulating POIs. Information about these relationships will help to focus effort for future development.
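
The tradeoff can be stated compactly. The symbols below are introduced here only for illustration: T is the task completion time, L the list browsing time, D the document viewing time, and P the time spent manipulating POIs in the 2-D condition. Time spent formulating queries is left out as a simplifying assumption.

```latex
\begin{align*}
T_{\text{1-D}} &= L_{\text{1-D}} + D_{\text{1-D}} \\
T_{\text{2-D}} &= L_{\text{2-D}} + D_{\text{2-D}} + P \\
T_{\text{2-D}} < T_{\text{1-D}} &\iff
  \left(L_{\text{1-D}} - L_{\text{2-D}}\right) + \left(D_{\text{1-D}} - D_{\text{2-D}}\right) > P
\end{align*}
```

In words, the 2-D display is faster only when its combined savings in list browsing and document viewing exceed the time spent manipulating POIs.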

There may be an ordering effect that has direct bearing on the training time and system familiarity. In both conditions the time needed to complete tasks may be less on average for later trials than for earlier trials. The order of the tasks is randomized so such an effect is not attributable to differences in task difficulty. However, since users are generally unfamiliar with 2-D interfaces, it may take a longer time to learn to use this display interface. Since the 2-D interface has more configuration options and is less familiar, the learning effect should be greatest for this condition. Therefore the improvement might be expected to level off in later trials of the more familiar 1-D interface while the users of the 2-D interface may still be learning effective strategies for 2-D exploration. If this turns out to be the case, there is an argument for repeating the study with a longer training period and additional trials. Subjects will become faster across trials as they become familiar with the systems.

Koshman (1996) found that experience with text-based Boolean systems led to lower performance with VIBE. The same effect is expected in this study. The finding in the Koshman study may, however, be confounded in part with other subject variables. Koshman's subject pool was library science students. These subjects may have lower spatial scores and higher verbal scores than the population at large (Allen & Allen, 1993). Another study found that spatial visualization is a significant factor in the use of a GUI versus a command line. Subjects with high spatial visualization scores took less time to complete tasks using a GUI, but there was no significant difference in scores for command line tasks (Leitheiser & Munro, 1995). Another study (Swan et al., 1998) found no difference in retrieval performance with a 3-D system between librarians and the general student population, but did find a significant difference in retrieval effectiveness between a 3-D iconic interface and a more traditional ranked list system. The same study found that librarians overwhelmingly (7-1) preferred a system with ranked lists while the general population preferred a 3-D display (6-2). These conflicting findings may indicate that there is a complex interaction among familiarity with traditional systems, spatial visualization skills, and profession. While this is not the primary focus of this study, we expect that the library students will have lower spatial ability than the general population. Subjects with high spatial ability are expected to have higher performance scores using the 2-D interface but will not differ significantly in their performance with the 1-D system.

Limitations

These results need to be interpreted carefully when judging their generality. When interpreting Recall Effort and Mean Task Completion Time, no matter which condition turns out to be "superior", it might be possible to add features to the "inferior" interface that would reverse the results. For example, more information might be added to the 1-D list display to make it easier to determine the nature of the contents without opening the document; the keywords that matched might be added to the 1-D Name List. The results may also be sensitive to characteristics of the data set, in particular its size. The relative merits of the display formats may depend on the size of the retrieval set, which in turn depends in part on the size of the collection. Larger collections generally lead to larger retrieval sets given the same query. The 2-D display format might have an advantage for larger retrieval sets. This issue is left to future work.

There is also a question about generalizability to Boolean systems. This question is independent of the current study, which focuses on result display formats and not query formats. If a Boolean search were allowed in the 1-D condition, then to make the display formats comparable it would be necessary to add a Boolean query filter in front of the 2-D display condition as well. There is potentially an interaction between the use of a Boolean query format and the result set. Assuming that the subjects know how to use Boolean queries effectively, a Boolean query may on average produce a smaller retrieval set than a vector query. On the other hand, people generally have difficulty forming effective Boolean queries, and the effect might be offset by a larger collection.

There is also a question about generalizability to other collections. A list of journal articles in a retrieval set would generally include the author name and article title. For a particular group of users, this might provide additional information about the potential contents of the article, allowing the user to reject some documents on the basis of the title. This would eliminate the need to open and review some potential matches, leading to improved Recall Effort and Mean Task Completion Time. This advantage would also exist in the 2-D condition but to a smaller extent. The 1-D list generated when subjects lasso a set of documents would also contain this extended information, but these lists are generally shorter than the list in the 1-D condition. These issues of generalizability are left to future work.

There is a question of the validity and reliability of psychological measures such as perceptual speed in information retrieval tasks. These tests may not be measuring the characteristics that the experimenter believes they are measuring. The scores may be correlated with general intelligence or other factors. Indeed, these tests and ones like them are still the subject of research in psychology. Whatever they are measuring, the results of the study will indicate whether these factors interact with retrieval effectiveness. This issue is discussed elsewhere (Allen, 1994). Generally, the results seem to be consistent inasmuch as the scores for different groups are the same (where subjects are randomly assigned).

Materials

  1. Subject variable sheet – a questionnaire designed to collect information that describes subject characteristics such as age, grade, gender, and searching experience (see Appendix C).
  2. Training and instructional materials - (to be developed)
  3. Retrieval System – The experimental treatments will use the same VIBE interface with different retrieval displays (ranked list or 2-D iconic display). The Boolean operators supported will be AND/OR.
  4. Retrieval Stimuli – These are species descriptions from field guides.
  5. Database – Flora of North America treatments including Pteridophytes, Gymnosperms, Magnoliidae, and Hamamelidae. (see Appendix G)
  6. Institutional Review Board Forms from the University of Illinois – these primarily relate to subject permission forms. (see Appendix F)

Appendix C – Preliminary User Questionnaire

Visual Information Retrieval

Preliminary Questionnaire

Instructions: Where choices are listed, circle the response that applies.

  1. Age: ________
  2. Year of study: Freshman Sophomore Junior Senior Graduate
  3. Major: ___________________________
  4. Field of Interest in Major: ___________________________
  5. Gender: Female Male
  6. How many years have you been using computers?
     Never Used    Less than 1 year    1-3 years    More than 4 years

  7. Rate your familiarity with online information retrieval databases
     Novice    Intermediate    Advanced

    If you answered Novice to question 7, please skip the rest of the questionnaire.

  8. How long have you been searching the World Wide Web/Internet?
     Never Used    Less than 1 year    1-3 years    More than 4 years

  9. How long have you been using Email?
     Never Used    Less than 1 year    1-3 years    More than 4 years

    If you answered "Never used" to questions 8 and 9, please skip question 10.

  10. Have you ever used?
      1) Dialog                  Yes    No
      2) Lexis Nexis             Yes    No
      3) Dow Jones               Yes    No
      4) UIUC Library Gateway    Yes    No

  11. List the five online information retrieval systems or databases that you most frequently use:
      1. ________________________
      2. ________________________
      3. ________________________
      4. ________________________
      5. ________________________

Subject ID: ______________________________

Date: __________________________________

Thank you for your participation!

Appendix D – Exit Interview

In this interview, the experimenter in both conditions will review some of the decisions that the user made in order to evaluate psychological relevance.

Subjects who made false positive identifications will be reminded of a task and then shown the query state at one point in the retrieval process along with examples of documents that this user had selected as relevant but which did not match the query item. These documents were those that were viewed immediately prior to a query reformulation. The documents from the last trials will be studied first. The number of documents evaluated will be determined during the pilot study but is expected to be from one to three since more would require using tasks that were too long ago to allow accurate memory from the subjects. The subject will be asked, "After viewing this document, you changed the words that you were using in your search. Is there anything about this document that led you to make that change?" The responses will be recorded on audio tape.

Appendix E – User Satisfaction Survey

Thank you for your participation in this experiment. Please take a few minutes to complete this survey to help us evaluate the results.

1) Rate the difficulty of the set of tasks you were asked to perform (Circle one):

            (easy)                   (difficult)
Task 1        1      2      3      4      5
Task 2        1      2      3      4      5
Task 3        1      2      3      4      5
Task 4        1      2      3      4      5
Task 5        1      2      3      4      5

2) How easy or difficult was it for you to learn the system you were using for this experiment?

(easy)                   (difficult)
  1      2      3      4      5

3) Do you think that this was a useful interface for doing this task?

(useful)                 (not useful)
  1      2      3      4      5

4) Please give any comments you have about the experiment below:
_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

Subject ID: ___________

Appendix F – Sample Consent Form

CONSENT TO PARTICIPATE IN AN EXPERIMENTAL STUDY

Visual Information Retrieval

Investigators:

P. Bryan Heidorn, Assistant Professor, Graduate School of Library and Information Science, 221 LIS Building, 501 E. Daniel St., Champaign, IL (217) 244-7792

Sarai Lastra, Graduate Student, Graduate School of Library and Information Science, 501 E. Daniel St., Champaign, IL (217) 244-2757

DESCRIPTION: The goal of this study is to better understand the ways in which people use visual information retrieval interfaces. The experiment will last approximately two hours. You will be paid twelve dollars per hour for participation in the study.

RISKS AND BENEFITS: The methods used in this study present no danger to you. It is hoped that the results from these studies will lead to improvements in the design of information retrieval interfaces.

CONFIDENTIALITY: Your performance data will be kept confidential. No personally identifying information will be included with the data records for any individual. Published accounts of the research will focus on group averages and refer to individual responses using codes (e.g., Subject 1, Subject 2, etc.) that preserve your anonymity.

RIGHT TO REFUSE OR END PARTICIPATION:

Your participation is entirely voluntary. You may stop participation in the experiment at any time for any reason.

VOLUNTARY CONSENT: I certify that I have read the preceding and that I understand its contents. The research staff has answered any questions I have pertaining to this research. A copy of this consent form will be given to me. My signature below means that I have freely agreed to participate in this research study.

x____________________   __________   ______________________
Participant             Date         Witness

INVESTIGATOR'S CERTIFICATION: I certify that I have explained to the above individual the nature and purpose, the potential benefits, and the possible risks associated with participating in this research study, have answered any questions that may have been raised, and have witnessed the above signature.

x____________________   __________   ______________________
Investigator            Date         Witness

Appendix G – Zamia

Zamia Linnaeus
Garrie P. Landry

1. ZAMIA Linnaeus, Sp. Pl. ed. 2, 2: 659. 1762 - [Derivation equivocal, perhaps from misreading of Latin azania, a kind of pine cone, or from Latin zamia, loss, from the "sterile appearance" of the pollen cones]

Stems often branched, subterranean to aboveground. Leaves broadly oblong-elliptic; leaflets entire to coarsely dentate, without midribs, venation dichotomous but appearing parallel. Cones distinctly peduncled. Pollen cones more slender than seed cones. x = 8.

Species ca. 30 (1 in the flora): subtropics and tropics, North America, Mexico, West Indies, Central America, South America.

1. Zamia integrifolia Linnaeus f. in Aiton, Hort. Kew. 3: 478. 1789 - Coontie, Florida arrowroot, conti hateka (Seminole)

Zamia floridana A. de Candolle; Z. silvicola Small; Z. umbrosa Small

Stem subterranean, or leaf-bearing apex exposed. Leaves 2--10 dm; petiole unarmed; leaflets 6--17 cm × 2--18 mm, linear, often twisted, very stiff, dark glossy green, 7--23-veined; margins often revolute, entire or with small teeth to slight denticulations near apex. Pollen cones generally 2--5 per plant, narrowly cylindric, 5--16 cm, tapering slightly at apex. Seed cones cylindric-ellipsoid, 5--19 cm, blunt at apex; ovules 2 per sporophyll. Seeds drupelike, oblong to ovoid, somewhat angular, 1.5--2 cm, outer coat bright orange. 2n = 16.

Period of receptivity and maturation of seeds December--March. Hammocks, pine-oak woodlands, scrub, and shell mounds; 0--30 m; Fla., Ga.; West Indies.

Once common to locally abundant, Zamia integrifolia is becoming increasingly uncommon as its habitats are being destroyed. The species is now considered "endangered" in Florida.

The choice of specific epithet to use for our species follows the conclusion reached by D. W. Stevenson (1987).

Controversy has long existed over the classification of Zamia in Florida. Recent researchers, however, have concluded that only one species is present in the flora. The several binomials applied to our Zamia reflect variability in plant vigor, leaf shape, leaflet width, number of marginal teeth and veins per leaflet, and geographic distribution. Forms with wide leaflets---"Zamia umbrosa"---are restricted to coastal hammocks of northeastern Florida and southeastern Georgia and appear to be quite distinct from plants of the remainder of Florida---Z. integrifolia and "Z. floridana." Especially robust forms have been described as "Zamia silvicola." Studies by D. B. Ward (n.d.) indicate that these features have a genetic basis, but formal recognition of these different phases as species does not lead to better understanding of the complex. The variants in Florida may have originated from introductions of divergent forms of Zamia from elsewhere. The starchy stems, after treatment to remove a poisonous principle, were a significant part of aboriginal diets, and the plants were presumably dispersed by aborigines.

Zamia angustifolia Jacquin, a species thought to be restricted to the Bahamas and eastern Cuba, was reported in southern Florida by J. K. Small (1933). No voucher specimens were cited or are known to exist. Small also reported Zamia pumila Linnaeus from Florida, although erroneously.

Copyright (c) 1996 Flora of North America