Semantics, Complexity and Capability:The Use of Integrated Navigational Tools
for Information Finding in Hypertext Document Space
By
Wasu Chaopanon
B.S.E.E., Khon Kaen University, Thailand, 1987
M.S., Computer Science, New York University, New York, 1996
Submitted to the Graduate Faculty of Information Sciences in partial fulfillment
of the requirements for the Degree of Doctor of Philosophy
University of Pittsburgh
2001
Semantics, Complexity and Capability:The Use of Integrated Navigational Tools
for Information Finding in Hypertext Document Space
Wasu Chaopanon, Ph.D.
University of Pittsburgh, 2001
This study examines the performance of navigational tools in information finding tasks based
on the complexity of the hypertext space, and the degree of “information scent” available through the
tools. Operational metrics for Web site complexity were examined and analyzed. Information
scent was measured empirically. A 3x2x2 within-subjects factorial design was employed. A
browser, a graphical overview, and an integrated tool were examined. Questions were created,
measured for information scent, and classified as high or low information-scent questions
across six Web sites: three low complexity Web sites and three high complexity Web sites. The number
of tasks completed, the number of answers found, time spent on task, and the number of pages viewed
were measured.
Performance in the information finding tasks was different when using different tools in the
various conditions. The results showed that there were significant interactions between tool, Web site
complexity, and question type in performance measurements. Three-way interactions were found in
the number of tasks completed and the number of revisited page views. Two-way interactions
between tool and Web site complexity were found in the number of page views, the number of pages,
and the number of extra page views. Two-way interactions between tool and question type were
found in the number of answers found, time spent on task, the number of pages, and the number of
extra page views. Web site complexity and information scent showed strong effects on task
performance.
Although the integrated tool had more capabilities than either single tool alone, it did not
provide higher performance. Rather, the integrated tool appeared to moderate the performance
differences between the single tools. There was also an indication that the integrated tool imposed
greater cognitive overhead.
TABLE OF CONTENTS
1 INTRODUCTION...........................................................................................................................1
1.1 Overview.................................................................................................................................1
1.2 Problem Statement..................................................................................................................2
1.3 Motivation and Goal of this Research.....................................................................................4
1.4 Definition of Terms.................................................................................................................5
1.5 Scope and Limitations.............................................................................................................6
2 LITERATURE REVIEW................................................................................................................7
2.1 Document space and the WWW.............................................................................................7
2.1.1 Document Spaces............................................................................................................7
2.1.2 Hypertext.........................................................................................................................8
2.1.3 The World Wide Web (WWW)......................................................................................9
2.2 Navigation and Information-Finding Tasks..........................................................................11
2.2.1 The Questions Answered by Navigational Tools..........................................................12
2.2.2 Navigation in Document Space.....................................................................................13
2.2.3 Problems of Navigation in Document Space................................................................16
2.2.4 WWW Usage.................................................................................................................18
2.3 Navigational tools in Document Spaces................................................................................19
2.4 Integrated Document Space Navigational tools....................................................................24
2.4.1 Tool Integration.............................................................................................................24
2.4.2 Evaluation of Integrated Navigational tools in Hypertext.............................................28
3 RESEARCH METHODOLOGY..................................................................................................31
3.1 Introduction...........................................................................................................................31
3.1.1 Document Space............................................................................................................31
3.1.2 Web Site Metrics...........................................................................................................31
3.1.3 Study of Web sites structure..........................................................................................36
3.1.4 Task and semantic relatedness......................................................................................49
3.1.5 Navigational tools and Integration................................................................................50
3.1.6 Summary.......................................................................................................................53
3.2 Hypotheses............................................................................................................................53
3.3 Participants............................................................................................................................55
3.4 Material.................................................................................................................................55
3.4.1 Web Sites.......................................................................................................................55
3.4.2 Questions and their Information Scent..........................................................................56
3.4.3 Software.........................................................................................................................60
3.5 Experimental Design.............................................................................................................60
3.6 Experimental Task.................................................................................................................61
3.7 Procedure...............................................................................................................................61
3.8 Data Collection and Measurement........................................................................................62
4 RESULTS AND DISCUSSION...................................................................................................64
4.1 Demographic Data of Recruited Subjects.............................................................................64
4.2 Results...................................................................................................................................65
4.2.1 Tool usage.....................................................................................................................65
4.2.2 Task completion............................................................................................................80
4.2.3 Number of answers found.............................................................................................82
4.2.4 Task performance..........................................................................................................87
4.2.5 Web complexity, Question type and their interaction...................................................98
4.3 Summary task performance at each condition......................................................................98
4.3.1 Low complexity Web sites with high information-scent questions..............................99
4.3.2 High complexity Web sites with high information-scent questions..............................99
4.3.3 Low complexity Web sites with low information-scent questions.............................100
4.3.4 High complexity Web sites with low information-scent questions.............................100
4.4 User satisfaction..................................................................................................................101
4.5 Support for Hypotheses.......................................................................................................103
5 CONCLUSIONS AND FUTURE STUDY................................................................................105
5.1 Review of the research........................................................................................................105
5.2 Summary finding.................................................................................................................106
5.3 Comparison to prior research results...................................................................................107
5.4 Issues to reconsider.............................................................................................................109
5.5 Future research....................................................................................................................110
Appendix A : Web visualize tools.......................................................................................................111
Appendix B : URI in HTML tags........................................................................................................113
Appendix C : Stratum formula............................................................................................................113
Appendix D : Web site structure statistic............................................................................................114
Appendix E : Web Sites in the experiment and their properties.........................................................116
Appendix F : Information Scent experiment.......................................................................................117
Appendix G : The main experiment instruction sheet.........................................................................127
Appendix H : Questionnaires..............................................................................................................135
H.1 Demographics, Computer and World Wide Web Experience form.........................................135
H.2 Web sites familiarity score.......................................................................................................136
H.3 User satisfaction Questionnaire................................................................................................137
Appendix I : Statistical Analysis results..............................................................................................139
I.1 Tool usage statistic.....................................................................................................................139
I.2 Task completion statistic............................................................................................................140
I.3 Number of answer found statistic..............................................................................................141
I.4 Outliers: Extreme cases..............................................................................................................144
I.5 Time spent on task statistic........................................................................................................146
I.6 Number of page viewed statistic................................................................................................147
I.7 Tools performances comparisons...............................................................................................151
I.8 Web complexity by question type interaction............................................................................153
I.9 User satisfaction statistic............................................................................................................154
I.10 Web site familiarity statistic....................................................................................................155
Reference List......................................................................................................................................156
List of Tables
Table 1: Content types of scanned URLs..............................................................................................38
Table 2: Tags-attributes of links............................................................................................................39
Table 3: Descriptive Statistics of number of nodes and links...............................................................42
Table 4: Descriptive statistic of Web site properties.............................................................................44
Table 5: Number of Web Sites by their complexity..............................................................................47
Table 6: Questions classification based on their information scents.....................................................58
Table 7: Summary of the information scent of the selected questions..................................................59
Table 8: Summary of the minimum pages required to find the selected target nodes...........................59
Table 9: Summary of subjects’ demographic data................................................................................64
Table 10: Summary of subjects’ computer experience data..................................................................65
Table 11: Summary statistic of time between anchor clicks in the browser.........................................66
Table 12: ANOVA on ln(time between anchor clicks) of the browser.................................................67
Table 13: Pairwise comparison between ln(time between anchor clicks), Bonferroni adjustment......67
Table 14: Summary statistic of time between icon clicks of the graphical overview...........................68
Table 15: ANOVA on ln(time between icon clicks) of the graphical overview...................................68
Table 16: Pairwise comparison between ln(time between icon clicks), Bonferroni adjustment..........69
Table 17: Frequency Distribution for tool usage based on location of navigation actions...................70
Table 18: Summary statistic of BNAR and BTUR grouped by Web site complexity conditions and
question type conditions................................................................................................................72
Table 19: ANOVA on Browser Navigation Action Ratio....................................................................72
Table 20: State transition probability in using the integrated tool........................................................74
Table 21: Time between state transitions in using the integrated tool..................................................75
Table 22: ANOVA on ln(time between clicking) when using the integrated tool................................76
Table 23: ANOVA on ln(time between anchor clicks) comparing the browser and the integrated
tool.................................................................................................................................................77
Table 24: ANOVA on ln(time between icon clicks) comparing the graphical overview and the
integrated tool................................................................................................................................77
Table 25: Summary statistic of adjusted time spent on tool..................................................................79
Table 26: Summary statistic of number of tasks completed.................................................................80
Table 27: ANOVA on number of tasks completed, lower bound correction........................................81
Table 28: ANOVA on number of tasks completed in the high complexity Web site condition with
lower-bound correction.................................................................................................................82
Table 29: ANOVA on number of tasks completed in the low complexity Web site condition with
lower-bound correction.................................................................................................................82
Table 30: Summary statistics of the number of answers found............................................................83
Table 31: Summary of the numbers of answers found, answers not found, and timed-out tasks grouped
by Web site complexity and question type....................................................................................84
Table 32: ANOVA on the number of answers found............................................................................86
Table 33: Summary statistic of time spent on tasks (sec.)....................................................................87
Table 34: ANOVA on ln(time spent on task)........................................................................................88
Table 35: Descriptive statistics of the number of page views and the number of pages.......................90
Table 36: Descriptive statistics of the number of revisited page views and the number of extra page
views..............................................................................................................................................91
Table 37: Number of tasks where the extra page views were zero.......................................................92
Table 38: ANOVA on ln(number of page views), ln(number of pages), ln(number of revisited page
views), and ln(number of extra page views).................................................................................93
Table 39: ANOVA on ln(number of revisited page views) only in the high information-scent question
type................................................................................................................................................96
Table 40: ANOVA on ln(number of revisited page views) only in the low information-scent question
type................................................................................................................................................96
Table 41: Summary of tools difference in Web site complexity and question type condition.............98
Table 42: Questionnaire descriptive statistics.....................................................................................102
Table 43: ANOVA on PSSUQ score with lower-bound correction....................................................102
Table 44: Correlations between numbers of nodes.............................................................................114
Table 45: Correlations between numbers of links...............................................................................114
Table 46: Distance measurement correlation......................................................................................114
Table 47: Correlation between the Web site metrics..........................................................................115
Table 48: Questions, target Web page and selected Web pages for information scent experiment.....121
Table 49: Information-scent score.......................................................................................................126
Table 50: Pairwise Comparisons ln(time between clicking) of the integrated tool............................139
Table 51: Mauchly's Test of Sphericity on number of tasks completed.............................................140
Table 52: Pairwise Comparisons on number of tasks completed between tools in question type
conditions, only in the high complexity Web site condition........................................................140
Table 53: Number of tasks where the subject visited the target node but submitted another node or
timed out.......................................................................................................................................141
Table 54: Number of answers grouped by question.............................................................................141
Table 55: Task submitted only the answer not found.........................................................................142
Table 56: Mauchly's Test of Sphericity on number of answers found................................................142
Table 57: Pairwise comparisons on number of answers found between tools in question type
conditions....................................................................................................................................143
Table 58: Pairwise Comparisons on number of answers found between tools in Web site complexity
conditions....................................................................................................................................143
Table 59: Number of extreme cases.....................................................................................................145
Table 60: Mauchly's Test of Sphericity on ln(time spent on task)......................................................146
Table 61: Pairwise comparisons on ln(time spent on task).................................................................146
Table 62: Mauchly's Test of Sphericity for number of pages statistic................................................147
Table 63: Pairwise comparisons on Ln(number of pages) between tools in Web complexity conditions
.....................................................................................................................................................148
Table 64: Pairwise Comparisons on Ln(number of pages) between tools in question type conditions
.....................................................................................................................................................149
Table 65: Pairwise comparisons on Ln(number of re-visited pages) between tools in Web complexity
conditions only in the high information-scent question type......................................................150
Table 66: Pairwise comparisons on Ln(number of re-visited pages) between tools only in the low
information-scent question type..................................................................................................150
Table 67: Tools performances comparisons........................................................................................151
Table 68: Pairwise Comparisons between question types in Web site complexity conditions...........153
Table 69: Mauchly's Test of Sphericity on PSSUQ score...................................................................154
Table 70: Pairwise Comparisons on PSSUQ score between tools......................................................154
Table 71: Subject's Web site familiarity.............................................................................................155
Table 72: Tasks performed by subjects who had visited the Web sites prior to the experiment, grouped
by tool, Web site complexity, and question type..........................................................................155
List of Figures
Figure 1: Graphical overview and Browser............................................................................................2
Figure 2: (a) Frequency of navigation actions as a percentage of the total navigation events and (b)
details of the Open URL action.....................................................................................................18
Figure 3: Process model of information seeking using Web (transition probability)...........................19
Figure 4: A taxonomy of multiple window coordination (North & Shneiderman, 1997).....................26
Figure 5: Proportion of navigational tool usage in exploratory and directed tasks...............................29
Figure 6: Summary of URLs found.......................................................................................................38
Figure 7: Links summary......................................................................................................................39
Figure 8: Number of URLs...................................................................................................................41
Figure 9: Number of links.....................................................................................................................41
Figure 10: Histogram of number of HTML nodes and number of connections....................................42
Figure 11: Total URLs versus total links and HTML node versus connections of each site................43
Figure 12: Histogram of #connections per #HTML node-1..................................................................45
Figure 13: Histogram of connected ratio...............................................................................................45
Figure 14: Histogram of stratum...........................................................................................................46
Figure 15: Histograms of distances.......................................................................................................46
Figure 16: Mean directed distance and bi-directional distance versus number of HTML nodes..........48
Figure 17: Scatter plots between Web site parameters..........................................................................48
Figure 18: The browser screen snapshot...............................................................................................51
Figure 19: The graphical overview and text viewer screen snapshot....................................................52
Figure 20: The graphical overview and the browser.............................................................................52
Figure 21: Information scent score........................................................................................................58
Figure 22: Cell line chart of mean (time between anchor clicks) when using the browser..................67
Figure 23: Cell line chart of mean (time between icon clicks) when using the graphical overview....68
Figure 24: Histogram of browser navigation action ratio in the integrated tool...................................71
Figure 25: Histogram of browser time usage ratio in the integrated tool..............................................71
Figure 26: State transition probability in using the integrated tool.......................................................73
Figure 27: Cell line chart of mean (time between clicking) when using the integrated tool................76
Figure 28: Cell line chart of mean ln(time between anchor-anchor clicking) when using the browser
and using the integrated tool.........................................................................................................78
Figure 29: Cell line chart of mean ln(time between icon-icon clicking) when using the graphical
overview and using the integrated tool..........................................................................................78
Figure 30: Cell line chart of mean number of tasks completed grouped by tool, Web site complexity,
and question type, showing interactions........................................................................................81
Figure 31: The percentage of answers found, answers not found, and incomplete tasks for each question.
.......................................................................................................................................................85
Figure 32: Histogram of submitted pages for each question, including only tasks that did not time out
and where the target node was not found......................................................................................85
Figure 33: Cell line charts of mean number of answers found showing tool by Web site complexity
interaction and tool by question type interaction..........................................................................86
Figure 34: Histogram of time spent on task..........................................................................................88
Figure 35: Cell line chart of mean ln(time spent on task) grouped by tool and question type, showing
the tool by question type interaction..............................................................................................89
Figure 36: Histograms of the number of page views, the number of pages, the number of revisited
page views, and the number of extra page views by tasks............................................................91
Figure 37: Cell line chart of mean ln(page views) showing the tool by Web site complexity interaction...94
Figure 38: Cell line charts for ln(number of pages) showing the tool by Web site complexity interaction
and the tool by question type interaction.......................................................................................95
Figure 39: Cell line chart of mean ln(number of revisited page views) showing the tool by Web
complexity interaction....................................................................................................................95
Figure 40: Cell line charts of mean ln(number of extra page views) showing the tool by Web site
complexity interaction and the tool by question type interaction..................................................97
Figure 41: Web browser with a distortion technique tool...................................................................111
Figure 42: Web browser with a zoom technique tool.........................................................................111
Figure 43: Web browser with an expanding outline technique tool....................................................112
Figure 44: Demographic data screen...................................................................................................135
Figure 45: Web site familiarity screen................................................................................................136
Figure 46: User Satisfaction Questionnaire screen.............................................................................137
Semantics, Complexity and Capability:
The Use of Integrated Navigational Tools for Information Finding in
Hypertext Document Space
1 INTRODUCTION
1.1 Overview
This research examined the use of integrated navigational tools to find information located
within a single Web site of the World Wide Web (WWW). An empirical experiment was conducted
in order to understand the use of navigational tools in document spaces of varying complexity and
with varying levels of semantic information.
Ease of access has made the WWW a common source of information. The number of Web
pages already exceeds 800 million (Lawrence & Giles, 1999). It has been growing at an exponential
rate and is expected to double in the next five years (Nielsen, 1999). It is sometimes difficult to find
information in this massive information space and improvements in Web page structure and
navigational tools are needed.
From the library at Alexandria to the electronic repositories on the WWW, browsing has been
a method people use to find information. The process of browsing is easy to understand using
metaphors of space, place, and movement. A sense of location and place can easily be obtained by
most users with little conscious attention (McKnight, Dillon, & Richardson, 1991). Navigation is the
activity that allows browsing of a document space. The design of improved navigational tools will
contribute to the overall efficiency and effectiveness of browsing activities.
While the capability of the tools available is one factor in navigation, it is not the only factor.
The structure of the hypertext or document space also influences the navigation process. For instance,
a typical Web browser in a space that is a linear linked list will require a visit to all nodes before
accessing the end node. In a mesh structure, all nodes will be one link apart. Thus, link following is
strongly influenced by the underlying structure. On the other hand, using a graphical overview (see
Figure 1), any visible node can be selected directly without regard to the structure.
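To make the contrast concrete, the number of link-following steps needed to reach a target node can be counted directly from the structure. The following minimal sketch (in Python, with a purely hypothetical node count) compares the two cases:

    # Steps needed to reach the last of n nodes, starting from the first node.
    n = 10                # hypothetical number of nodes in the space
    chain_steps = n - 1   # linear linked list: every intermediate node must be visited
    mesh_steps = 1        # mesh: every node is one link away from every other node
    print(chain_steps, mesh_steps)   # prints: 9 1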
Figure 1: Graphical overview and Browser
There are yet other factors beyond tool capability and space complexity that interact with the
navigation process. In an information-finding task, the match between the information need and
information provided by a navigational tool becomes a significant factor in selecting the path to travel
through a document space. At a simple level, labeling of link anchors or nodes in a graphical
overview can provide cues as to where to go.
In summary, navigation to find information is dependent upon the complexity of the space in
which the information is located, the nature of the navigational tools available, and the richness of the
information available about the space.
1.2 Problem Statement
Researchers are looking at tools to manage the various information spaces. This paper focuses
on one subset of tools, navigational tools, as one method of finding information. Further, the focus is
on one type of information space -- a document space. The World Wide Web (WWW) was selected
as the subject of the investigation because it is widely used and has hypertext features; hypertext is a
generic class of document spaces. A document space that has a list structure or a hierarchical
structure is a special case of a network.
There are already many kinds of navigational tools. In order to improve their effectiveness,
integration among tools is proposed as a key factor. The idea comes from Spring, Morse, & Heo
(1996) who discussed a set of interrelated tools that play a role in different phases of navigation.
Two navigational tools will be examined: a Web browser and a graphical overview. A Web
browser is the most common mode of navigation in the WWW (e.g. Internet Explorer or Netscape).
A graphical overview is a traditional hypertext tool, and in the literature is often called a “Browser”.
In the early hypertext literature, a browser provides a structural overview of a hypertext. In
this study, however, the term “browser” refers to the Web browser.
A Web browser provides navigation capability as well as document content presentation. Only
one document, one page or one node at a time is presented by a Web browser. It is similar to
navigating in an egocentric view. In contrast, a graphical overview presents a view of the overall
structure of a hypertext, an exocentric view. Depending on the size of the Web site, a graphical
overview may present only a local overview of space. With a scroll bar, other areas can be shown. A
graphical overview navigates a Web site via active graphical objects. The integrated tool in this thesis
is the combination of the graphical overview and the browser.
Navigation in a Web browser is tied to the structure of the Web pages because the two
common methods of navigation in a Web browser are following a link and going back. On the other
hand, navigation by a graphical overview allows jumps to any node in the space with equal ease. A
graphical overview has its own problems in navigation. It cannot show very much information about
the nodes, i.e. only a label, part of a label, or some data encoded via the color, size, or shape of an
icon. This is due to the size of the structure. It may present a Web page as a very small icon, and the
links can quickly overwhelm the display. Display techniques such as zooming, focus+context
schemes, grouping nodes into a single node, and combining multiple link lines into a single thick line
may help. An integrated tool might perform better than a single tool alone since it has the capabilities
of each individual tool.
Empirical studies show that using an additional navigational tool, specifically an overview
map, yields mixed results in the efficiency and effectiveness of the navigation process. Monk, Walsh, & Dix
(1988) show significant improvement when the overview map is provided. Hammond & Allinson
(1989) report a small, statistically non-significant improvement in task efficiency. Heo (2000) reports
lower performance in a navigation task when an integrated tool was used with the Web browser, i.e.
response times were higher when compared to using the Web browser alone. Details of these studies
will be presented in section 2.4.2. Many of the new navigational tools in the literature are presented
without an accompanying usability study.
One of the goals of navigation in the WWW is to find useful information. The information
need drives the navigation process. In navigating, decisions are made to select the path. These
decisions depend on the information need and the information that is provided by the environment,
i.e. information presented by the interface used for navigation. The relation between the information
need and the information provided by the tool is defined as “semantic relatedness”, “residual
information”, and “information scent.” For example, suppose the information need is some person’s
office address. We navigate the WWW looking for the person’s name. If the Web page contains an
anchor with that person’s name, it has a high information scent. The anchor may or may not lead to
the person’s address information. If the Web page contains an anchor with “personnel” or “staff”, it
has a lower information scent. On the other hand, if the Web page contains nothing related to the person
at all, it has low information scent. Semantic relatedness is discussed further in section 2.2.2.
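As a minimal illustration of this idea (and not the empirical procedure used in this study, which measured information scent with human judgments), a crude lexical-overlap score between an information need and a set of hypothetical anchor texts could be sketched as follows:

    def scent_score(information_need, anchor_text):
        # Fraction of the need's terms that appear in the anchor text:
        # 1.0 suggests high scent, 0.0 suggests little or none.
        need = set(information_need.lower().split())
        anchor = set(anchor_text.lower().split())
        return len(need & anchor) / len(need)

    need = "john smith office address"   # hypothetical information need
    for anchor in ("John Smith", "Staff Directory", "Campus Map"):
        print(anchor, scent_score(need, anchor))
    # "John Smith" scores 0.5; the other two score 0.0, even though a reader would
    # judge "Staff Directory" to carry some scent. Capturing that kind of semantic
    # relatedness is precisely why scent was measured empirically in this study.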
1.3 Motivation and Goal of this Research
Furnas (1997) provided a framework to determine the effectiveness of a view of a space. A
view is a presentation of the information space via a user interface. The view can be analyzed in terms
of view traversal and view navigation components. The view traversal refers to the ability to move
the view around within an information structure. The traversability of the view can be described in
terms of “out-degree of vertices” and the distance between pairs of vertices. The vertices are the
active items in the view. The out-degree of a vertex refers to the number of vertices that its links lead
to. View navigation refers to the information in each view that describes other views. The
view that is effective should have a low out-degree of vertices, low distance between pairs of vertices
and high “residue” of view information in all other views. The “residue” concept is similar to
“information scent”.
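Out-degree and inter-vertex distance are directly computable from a view's link structure; residue, being semantic, is not. A minimal sketch over a hypothetical four-node site, using breadth-first search for link distances:

    from collections import deque

    def link_distances(adj, start):
        # Shortest number of link traversals from start to each reachable vertex.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    # Hypothetical site: each vertex maps to the vertices its links lead to.
    adj = {"home":   ["staff", "news"],
           "staff":  ["home", "person"],
           "news":   ["home"],
           "person": ["home"]}

    out_degree = {u: len(vs) for u, vs in adj.items()}
    print(out_degree)                   # out-degree of each vertex
    print(link_distances(adj, "home"))  # {'home': 0, 'staff': 1, 'news': 1, 'person': 2}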
Furnas’s framework indicates two components that might be used to improve navigational
performance when using the Web browser: the structural properties of the Web site and the semantic
relatedness between the information need and the information provided by the Web page. The
navigational performance when using the Web browser with a graphical overview should be different
from using the Web browser alone because the view of the information space is different.
This study explored the suitability of selected types of navigational tools for different spaces
(Web sites). Navigation in an information-finding task was a major concern. The structure of each
Web site was analyzed in terms of several metrics (number of nodes, number of links, mean distance
between nodes, etc.) as an indicator of its complexity. Information scent was also measured as the
relation between the information sought in an information-finding task and information provided
through the interface. In this study, Web site structure and information scent were controlled.
Two navigational tools were tested and their performance was compared across different complexity
and information scent conditions. The performance of the integrated navigational tool was compared
to that of each single tool.
Based on the findings, new tools may be recommended for certain types of document spaces
or modifications in the design of spaces may be suggested when appropriate tools are unavailable.
Finally, we believe the study may reveal the conditions under which integrated navigational tools
contribute to navigation and conditions under which they simply add noise and unnecessary cognitive
overhead to the task.
1.4 Definition of Terms
Anchor: an active text or graphic area in a hypertext system indicating a link. It is
used in a link following interface to navigate to the link’s destination.
Closed Hypertext: a self-contained hypertext system.
Document: “A document is an identifiable entity, having some durable form,
produced by a person or persons toward the goal of communication and may take a
number of forms, but must have at least one symbolic manifestation that can be
comprehended by humans." Spring, 1991 (p.8)
Document Space: a collection of documents, which have some common attributes.
Graphical overview: a graphical user interface that provides an overview of a set of
linked nodes.
Hypertext: a document or documents with explicitly defined relationships between
documents or document components.
Information Scent: “the (imperfect) perception of the value, cost, or access path of
information sources obtained from proximal cues” (Pirolli & Card, 1999).
Link: an explicitly defined relationship between nodes in a hypertext system.
Navigation: a process of moving in space, including virtual movement through
cognitive space.
Navigational tools: tools that help us in navigation. These include tools to navigate
and tools that give information for navigation.
Node: a basic unit of reference in a hypertext system. A node contains content.
Open Hypertext: a hypertext system that is linked to other hypertext systems.
System: a coordinated and integrated set of tools.
Tool: a modular program that provides a specific presentation and interaction and
fulfills a special function.
Typed Link: a link that provides additional information about the relationships
between the linked components.
Web browser: a user interface that presents a single node, with the capability to
display anchors that may be used as navigation links. Internet Explorer and Netscape
Navigator are examples of Web browsers.
Web Site: a set of Web pages that is provided by a Web Server.
World Wide Web (WWW): an open hypertext system implemented using the HTTP
protocol, HTML, and other markup languages, and URL links.
1.5 Scope and Limitations
Many structural properties of a hypertext have been investigated, including number of nodes,
number of links, and topology. Research has shown the relation between structure and navigation
performance, as is discussed in section 3.1.3. It would be an advantage to predict navigation
performance of a Web site in advance of constructing the Web site and to use these metrics as an
assessment tool. However, there are many metrics and the interaction effects between these metrics
and navigation performance are unknown. Some metrics are subjective. The main concern in this
study was to classify the document space (Web site) into high complexity and low complexity
rather than evaluate the metrics. The selected Web site metrics might not be a good representation of
complexity of the Web site’s structure. As a consequence, the metrics selected here might not be a
good predictor of navigation performance.
There are a wide variety of navigational tools. Two navigational tools were selected in this
study, a Web browser and a graphical overview. The results of this study might not generalize to
other types of navigational tools. Moreover, the interfaces used in the experiment represent only one
instance each of a Web browser and a graphical overview, so the results might not generalize even
within these two classes of navigational tools. The performance might
also depend on other factors such as data encoding schemes or interaction techniques.
Many tasks are performed with the WWW, including finding information, reading, learning, and so
forth. The navigation process is a sub-task of these and is reviewed in section 2.2. The
information-finding task was addressed in this thesis because it is a common task in the WWW
environment. However, navigational tools that achieve high performance for the navigation process in
an information-finding task may not facilitate other tasks. For instance, a navigational tool that makes
it easy to remember documents may not have a significant effect on navigation in new and unknown
environments, but it may improve the navigation process for re-visiting documents.
The study assumes users are of average skill and engaged in an information-finding task. Results
within a controlled experimental setting may vary from those in a real environment.
2 LITERATURE REVIEW
2.1 Document space and the WWW
There are many definitions of a document. Efforts to define what a document is, and more
generally, what information is, have been discussed in detail by Buckland (1991). He points out that
definitions for a document have ranged from any text object to any informative thing, including living
animals in a zoo. To narrow the scope of this study, the document definition given by Spring (1991)
will be used. It is stated as follows:
“A document is an identifiable entity, having some durable form, produced by a person or
persons toward the goal of communication and may take a number of forms, but must have at
least one symbolic manifestation that can be comprehended by humans." (p.8)
Documents include text, graphics, images and sounds in various combinations. Documents
may be produced on demand, based on what customers need and when they need it. Using the
WWW, the contents of a document can be constructed based on a user’s request. Many news Web
pages are “live documents,” i.e., the content of the document is dynamic. New document types, such
as active documents that search for users instead of waiting to be found by a user, are beginning to
emerge.
2.1.1 Document Spaces
Benedikt (1992) investigated physical space to develop guidelines for designing artificial
spaces. He discussed space in terms of its topological properties, including dimensionality, continuity,
limits, and density. From these space properties, seven principles were proposed for designing a
cyberspace. The principles concentrated on what it would look like and how it would be effectively
presented. A space's dimensions may be described as extrinsic and intrinsic. Generally, an extrinsic
dimension controls the location of objects in space-time. An intrinsic dimension is a property of an
object. A space may be bounded or unbounded, as well as discrete or continuous. In part this depends
on the nature of the data type mapped to the spatial dimensions. Theoretically, some spaces have
unbounded dimensions. For example, the dimension formed by an integer attribute, such as file size,
has no upper limit. Practically, however, there is a finite number of documents in some given scope.
A space can be bounded on some values but still be extensible, i.e. a bounded space may have infinite
resolution (e.g. the rational numbers between integers). The density of a space refers to how many
objects and sub-spaces can be contained within the space. The density will be reflected in the scale of
the space and in movement through it.
Document space is used to refer to a collection of documents with some common attributes.
It is possible that some attributes are specified only in some documents. In general, orthogonal
attributes are used as dimensions. A space is defined by its dimensions. A space implies all possible
objects in it with respect to the dimensions. In this view, a document space is not the same as a
perceived physical space. However, it can be projected so as to be presented in a perceived pseudo
physical space.
Given a space, documents are objects within the space. (There are also other possibilities for
transformation of a document mapped to a non-object, such as a vector field or a force, but these cases
are rare). For the purposes of this discussion, document-objects are projected into some location in a
space, based on attribute values that conform to the dimensions of the space. The perception of a
document object is controlled by space properties.
As a corollary of the definition of a space, it is useful to define the laws that apply to all
objects in the space. In this paper, space is often defined in terms of the properties of the objects in
the space. Objects may belong to a space if they contain an existence property. For instance, a query
will result in the creation of a sub-space, and only documents that match a query belong in that sub-
space. Other laws would include the notion that position and distance are created by a space itself,
and that there is a Universe, the space that covers all spaces. General laws may be defined in the
design of a space. In physical space, the laws of physics govern. For example, two objects cannot
coexist at a given location; i.e. only one object can exist at a single location. However, this and other
laws may be relaxed in an artificial space.
2.1.2 Hypertext
In a hypertext system, a document is no longer a single integrated unit, but may consist of a
network of components. A document is no longer linear but consists of a graph of “nodes” and
“links.” One may consider a hypertext as a set of documents, where each path through the nodes may
be defined as one document. Further, because users can choose any path when reading or can create
new links, the structure of the document is both dynamic and extensible, publicly and privately.
A node contains content and anchors. A link is defined as the relation between two anchors.
In general, a link joins a source anchor and a destination anchor. In implementation, a link also
contains source node identification and destination node identification. The scope of an anchor is
bound in a node. A link may contain other attributes such as link types and directions. Links may be
managed by a link manager to maintain consistency when a node is moved or deleted. More details
about the concepts and implementations of hypertext systems can be found in Conklin (1987).
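One possible rendering of this node/anchor/link model as data structures is sketched below; the field names are illustrative and not drawn from any particular hypertext system:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Anchor:
        anchor_id: str
        node_id: str        # the anchor's scope is bound to this node
        text: str           # the active text or graphic region

    @dataclass
    class Node:
        node_id: str
        content: str
        anchors: List[Anchor] = field(default_factory=list)

    @dataclass
    class Link:
        source: Anchor                 # a link joins a source anchor...
        destination: Anchor            # ...to a destination anchor
        link_type: str = "untyped"     # optional attribute: link type
        directed: bool = True          # optional attribute: direction

Under this model, a link manager's job is to update or delete Link records whenever a Node is moved or deleted, so that no link refers to a missing anchor.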
Hypertext was first envisioned by Vannevar Bush (1945). The memex (memory extension) he
envisioned contained a very large library and personal notes. It was used to make links to related
documents, thereby joining them into a trail. The system was optimized for scientific use and the
primary goals were to support making notes and browsing documents. Douglas Engelbart (1963)
developed the first operational computer-based hypertext system, NLS (oN Line System). Ted Nelson
(1987) is considered by many to be the spiritual father of a global hypertext system which he called
Xanadu. In Xanadu, related documents would be linked together on a large scale where everything
would be in a single system. Further, he envisioned a document being archived with a history of its
development -- versioning.
2.1.3 The World Wide Web (WWW)
The World Wide Web originated as a distributed hypertext system. It consists of an address
system (Uniform Resource Locators: URLs), a network protocol (HyperText Transfer Protocol:
HTTP), and a markup language (HyperText Markup Language: HTML) (Berners-Lee, Cailliau,
Luotonen, Nielsen, & Secret, 1994). A WWW system is composed of one or more WWW servers and
one or more WWW browsers. The first widely used WWW browser, Mosaic, was able to view
HTML documents and pictures. In addition to using HTTP, it was capable of using GOPHER and
FTP protocols.
The Uniform Resource Locator (URL) (Internet Engineering Task Force [IETF], 1994
[RFC1738]) standard specifies mechanisms for locating resources. URL is a subset of the Uniform
Resource Identifier (URI) (IETF, 1998 [RFC2396]). The URL standard specifies the syntax and
semantics in the context of the Internet. It comprises a syntax for protocol names, host Internet
addresses, and internal file names. The “query operator” may be applied to a URL as a mechanism to
pass state parameters through a URL.
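The parts of this syntax can be seen by decomposing a URL. A small sketch using Python's standard urllib.parse module (the URL itself is hypothetical):

    from urllib.parse import urlparse, parse_qs

    url = "http://www.example.edu/dept/search?name=smith&year=2001"
    parts = urlparse(url)
    print(parts.scheme)           # 'http'            -- protocol name
    print(parts.netloc)           # 'www.example.edu' -- host Internet address
    print(parts.path)             # '/dept/search'    -- internal file name
    print(parse_qs(parts.query))  # {'name': ['smith'], 'year': ['2001']}
                                  # state parameters passed via the query operator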
The HTTP protocol is stateless (IETF, 1999 [RFC2616]). HTTP 1.1 offers nine operations of
which “GET” and “POST” are the most frequently used. Resources can be obtained from or stored
on a server. It also provides a flexible scheme for transferring many types of data.
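For illustration, the sketch below issues a GET request with Python's standard http.client module; because HTTP is stateless, each such request is self-contained. The host and path are hypothetical:

    import http.client

    conn = http.client.HTTPConnection("www.example.edu")
    conn.request("GET", "/dept/index.html")  # GET obtains a resource; POST would send data to one
    response = conn.getresponse()
    print(response.status, response.reason)  # e.g. 200 OK
    body = response.read()                   # the resource itself, e.g. an HTML page
    conn.close()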
HTML (World Wide Web Consortium, 1999), while considered by many to be a markup
language in its own right, is in reality an instantiation of one Document Type Definition (DTD) under
the Standard Generalized Markup Language (SGML). It provides the syntax of markup in an HTML
document. HTML specifies the syntax for specifying hypertext links. The browser is able to
recognize an anchor and traverse a link embedded within an HTML document. The distinction
between links and anchors is collapsed into a single anchor tag using the HREF (Hypertext
REFerence) attribute. It is a unidirectional, untyped, direct link. (HTML version 4 proposes the
capability for link types and direction). WWW client-software is required to comprehend an HTML
document. Current WWW client software also has the ability to present a variety of document
formats. HTML is currently being superseded by the eXtensible Markup Language (XML), which,
like SGML, is a standard that allows for the definition of multiple document types.
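Because the anchor tag carries the HREF attribute, extracting a page's links reduces to scanning for anchor start tags. A sketch using Python's standard html.parser module:

    from html.parser import HTMLParser

    class HrefCollector(HTMLParser):
        # Collects the HREF attribute of every anchor tag encountered.
        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    parser = HrefCollector()
    parser.feed('<p>See the <a href="staff.html">staff</a> page.</p>')
    print(parser.hrefs)   # prints: ['staff.html']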
According to Conklin's definition of hypertext (Conklin, 1987), the WWW is a weak example
of hypertext. It lacks the node and link manager aspect of many of the early hypertext systems. There
is nothing to prevent the dissolution of links or the creation of invalid links. Conklin suggested that an
essential component of hypertext was a “browser.” The “browser,” used to display the network
graphically for navigation, does not exist as a standard part of the WWW. In the WWW, the term
“browser” is used to refer to a tool for viewing a node.
The WWW uses the concept of a page (a hypertext node). A document is not specifically
defined. It may be a single page or a set of pages. Multiple documents could be included on one page.
Because a page can be pointed to by a URL, most components (e.g. server, client, and search engine)
use a page as a basic unit for service. The HTML standard is flexible; metadata may be used to
describe a set of pages as a document.
In the WWW, relationships between pages are explicitly defined by links as in a hypertext
system. The frame feature in HTML creates a complex relation between pages allowing new kinds of
implicit relations. On the presentation level, frames create an effect of state. The view is dependent on
which combinations of nodes are used to fill a frame. A frame creates a structured display area that
can show multiple HTML files. Activating links in one frame area can cause another frame area to
display a different HTML file.
Many features were added to HTML versions 3 and 4.01 to support a variety of interactions
for WWW clients. These include applets, intrinsic event declaration, and scripting. With add-on
technology and improvement of Web browsers, current WWW content may also include programs,
e.g. JavaScript. As a result, the interface of the WWW is equivalent to that of an interactive program, not
simply a text and image viewer.
As reported by Lawrence and Giles (1999), in February 1999, the estimated number of Web
servers was 2.8 million. Based on a sampling of the number of pages in thousands of servers, they
reported that the mean number of Web pages per server was about 300 and that the distribution was
skewed. The estimated total number of Web pages was about 800 million.
A comprehensive summary of WWW data can be found in Pitkow (1998). The summary
includes a characterization of client, proxy and gateways, server, and WWW. Woodruff, Aoki,
Brewer, Gauthier, & Rowe (1996) and Bray (1996) studied Web page characteristics, showing that the
mean page sizes were 4.4 KB and 6.5 KB respectively, the median was 2 KB, and the page size distribution had high
deviation with a long tail. Bray also reported that over 50% of pages contain more than one image.
The HTML format is used in 76% of all nodes, and nearly 95% of HTML pages had the HREF
attribute with an average of 14 anchors per document (Woodruff, Aoki, Brewer, Gauthier, & Rowe,
1996). The number of links between sites was small, nearly 80% of sites had no links to other sites,
and 80% of sites had 1-10 links pointing to them (Bray, 1996). These figures indicate that only a small number of major sites provided links for navigation to other sites. A WWW structural analysis
can be found in Broder et al. (2000). This study indicated that the distribution of in-degree and out-
degree follows a power law. The study showed that there is a 75% chance that there are no paths
between two random nodes and if there is a path, on average there will be 16 links in the path. The
life span of nodes is around 50 days (Pitkow, 1998). It should be noted that this does not include
pages generated dynamically; these pages have a life span equal to the length of time they are viewed.
More details about Web page lifetime and rate of change can be found in Brewington & Cybenko
(2000).
2.2 Navigation and Information-Finding Tasks
Jul and Furnas (1997) indicate that navigation is a process of moving something, i.e.
locomotion of either navigator or object, and making decisions about where to move. These processes
take place within a context, i.e. within an information environment or set of locations. Locomotion
assumes the concepts of location and direction. The decisions that are made sometimes follow a plan
and sometimes respond to the environment according to some goal. They depend on both declarative
and procedural knowledge and frequently require coordination of knowledge in different forms
(orientation). Thus, navigation is an incremental real-time process that integrates these two
components (locomotion and decision-making). In the process of navigation, a mental or physical
map of space is built. Jul and Furnas discuss situated navigation, in contrast to plan-based navigation,
models of navigation, and other issues including characteristics of the space, task, strategy, and user
knowledge.
Spence (1998) divides navigation activity into browsing, context modeling, gradient
perception, and strategy formulation. These activities are driven by intention.
The navigational process relies on knowledge about space. Knowledge about physical space
primarily comes from the senses, directly from the environment, or indirectly through a map or some
other representational aid. There are differences between small physical spaces, those within the line
of sight, and large spaces. Both route and survey perspectives are commonly used to communicate
spatial knowledge. A route perspective may use observers as a frame of reference, i.e. an egocentric
perspective. Alternatively, the environment can be described by the relative direction of a landmark to
an observer. The survey perspective takes a view from above, an exocentric frame of reference, and
describes environments relative to one another. On a small scale, such as a tabletop view, or on a very
large scale, such as a state level or global level, we get spatial knowledge from survey perspectives.
We look from above onto some representation or map, because, either we cannot be within the
environment, or the environment is too large to obtain route information.
Tversky, Franklin, Taylor, & Bryant (1994) indicated that the perspective information, either
route or survey, is not encoded in spatial mental models. Knowledge of route and survey perspectives
can be translated into each other equally well. However, a human “cognitive map” is not as accurate
as a physical map. Hirtle and Jonides (1985) reported on evidence of hierarchical relations in the
recognition of places.
In understanding a physical space, some objects are considered landmarks. Landmarks are
special objects and are different from other objects in that environment. Landmarks are used as points
of reference in a space.
2.2.1 The Questions Answered by Navigational Tools
Navigational tools assist in navigation. Functionally, a tool is a navigational tool when it
helps to answer one or more of the following questions:
Where am I?
Where is my destination?
How can I go there?
To answer these questions, the information that helps us identify paths can be received by answering
the following questions:
What are the conditions of alternative paths?
Where have I been?
Where can I go next?
Navigational tools also give us information about space itself.
How are objects in a space related to each other?
Why are the objects in that place?
These questions were mentioned by Grice (1989), cited by Mackinlay (1986), and Fleming
(1998). Fleming addressed navigation in Web page design by dividing user goals and expectations
into three tiers. Similar to the above questions, general navigation questions comprise the first tier. He added purpose-oriented questions as the second tier and product- or audience-oriented questions as the third tier.
Whitaker (1997) suggested that navigation is different within structured and unstructured
environments. In a structured environment (e.g. towns and corridors), navigation is primarily based
on landmarks and standard structures of the environment. In the unstructured environment (e.g.
natural or off-road environment), four strategies are used in navigation: prediction, recovery, catching
features, and aiming off. He suggested that these strategies might be applied to the WWW
environment as problem-solving strategies. Prediction, the ability to predict what will come next,
might be used in path selection. Recovery is the ability to recover from loss, i.e. to backtrack.
Catching features are features indicating that a given activity will move us too far from the goal
location. Aiming off is a strategy of following a well-known path, which is not directly toward the
goal location, but not far off either, then moving to the goal location later.
There are many navigation levels, which may be derived from the size of the space. For
example, state maps are used for traveling interstate. These maps show which interstate highways
should be followed. Once in the city, a city map provides more detail about which city road to use.
While driving on highways, road signs usually show how far it is to the next exit. On the other hand,
in cities where the junctions are close together and the speed limit is lower than on highways, road
signs usually show street names. Just as different tools are used to navigate physical spaces, it would
make sense to use a variety of navigational tools, depending on the size and type of the document
space.
2.2.2 Navigation in Document Space
While document spaces are no less real than physical spaces, they are less likely to be the
comfortable three-dimensional space we are used to navigating. They may be one-dimensional as is
the case for an ordered list or n-dimensional as in the case of a vector retrieval system. A document
space may have many presentations. Navigational tools will vary based on the presentation of the
space.
Locomotion in a document space can be complex. When a user clicks on a link or an icon in the interface, a new display may appear; this may be considered as “go to” or “get it.” Observers may
move in a space, or the space may move and change its appearance around the observers.
The continuity of motion in a physical space may not apply in a document space.
Jumping from place to place is more common than walking along a continuous path. Travel in the
physical world occurs in an egocentric view where an observer is moving in an egocentric frame of
reference. The navigational information, such as map recognition or route knowledge, will be
transformed and used in this viewpoint. However, in a document space interface, traveling can use
both egocentric and exocentric views. Locomotion in document space can be relative or absolute.
Relative locomotion occurs when the next location is relative to a current position, while absolute
locomotion does not need a notion of current position. The common desktop metaphor views objects
on a display as if the view were from above a desk; navigation does not take place “in” a space but
“on” a space.
In physical space, one’s own location is a single point in space. In a document space, it is
possible to have many interfaces of radically different types open in multiple windows. For example,
Microsoft Windows Explorer allows selection of multiple files. The navigation metaphor does not fit
well in this situation because it involves different places at the same time. One may argue, however, that observers still have one central point of focus in a particular view or window.
While navigation in physical space is concerned with place and location, with where to go
and how to get there, in a document space, the major concern is the information need. The high level
goal of navigation is the finding and use of information. According to Jul and Furnas (1997), tasks
can be identified as either searching or browsing, and tactics as either querying or navigation. The
definitions are given as follows:
“Searching – The task of looking for a known target.
Browsing – The task of looking to see what is available in the world.
Querying – Submitting a description of the object being sought (for instance, using
keyword) to a search engine which will return relevant content or information.
Navigation – Moving oneself sequentially around an environment, deciding at each step
where to go.” (Jul & Furnas, 1997)
The task and tactics are combined, i.e. searching by querying, searching by navigation, browsing by
querying and browsing by navigation.
Navigational activities are classified by Maurer (1996) in the following five categories:
“Scanning: covering a large area without depth.
Browsing: following a path by association until one’s interest caught.
Searching: striving to find an explicit goal.
Exploring: finding out the extent of the information space.
Wandering: ambling along in a purposeless, unstructured manner.” (Maurer, 1996)
Czerwinski and Larson (1998) discuss Web design and tools according to the following tasks:
Targeted revisitation: finding a Web document that you know exists and that you have
visited before.
Targeted search: finding a Web document that you know exists but that you have never
seen before.
Comprehensive browsing: finding a Web document and most of the pages related to it on
a particular topic.
Satisficing during browsing: finding a Web document on a topic that is “close enough” to
the subject at hand.
Navigation in information space is often accomplished by using an interface, a combination
of data presentations and interactions. The knowledge about a space is derived from a presentation
and interaction through an interface. To present data from document space in a display, the physical
dimensions of the display must be used. An object on the screen represents some data from the
document space. There are many ways to encode data into the physical dimensions of the screen and
to specify interactions. The screen encoding may not encode anything from the data; e.g., when users can freely move objects on the screen, location is not used for encoding. The screen encoding may instead encode attributes in one dimension; e.g., objects are spatially displayed in
some sorted order. The notion of “place” of presentation may differ from “place” in the document
space. The place on the screen can be changed dramatically; the distance relationship between
objects may not be preserved while interacting with the interface.
Documents can be classified by type. Document types might include such categories as
fiction or non-fiction, book, text, periodical, journal, novel, news, and so forth. The content of each
document type has some expected structure. For instance, a scientific paper is normally structured in
some order; for example abstract, general discussion, experiment method, experiment result,
discussion, and conclusion. Dillon (1994) has shown that users can predict the location of information
in a journal article with a high level of accuracy. The type of document is also differentiated by how it
is read. A novel may be read only once but a textbook may be read repeatedly. The overall structure
of a document collection of each document type is different. For instance, a book may be referred to by title and author, whereas a newspaper may be referred to by its date of printing.
Furnas (1997) described the “navigability of a view” as the outer-link information. The outer-
link in a Web page is an anchor and the content that surrounds it. For effective navigability, the outer-
link information should not only describe the next node but also the whole set of nodes that the link
leads to. In other words, a node must have a good “residue” at every other node.
Pirolli and Card (1999) use the term “information scent”, defined as “the (imperfect)
perception of the value, cost, or access path of information sources obtained from proximal cues”
(p.646). The information scent is comparable to “residue” in Furnas’ work. Pirolli, Card, & Wege
(2000) developed and used an information scent score in comparing two navigational tools, the
Hyperbolic browser and the Explorer. The Hyperbolic browser makes use of distortion techniques.
The Explorer uses an expandable tree technique. The information scent score had an effect on
reducing task completion time in a retrieval task. Both navigational tools had lower task completion
time in high information scent score conditions than in low information scent score conditions. The
Hyperbolic browser had lower completion time than the Explorer in high information scent score
conditions but higher completion time at low information scent score conditions.
Not only does the data in navigation come from the structure of the document space itself, it
also comes from information about where users have been traveling through space, the current
position in space, and perhaps user’s plan and alternative paths to some other place.
2.2.3 Problems of Navigation in Document Space
Problems of disorientation and cognitive overhead were reported by Conklin (1987). The
terms “lost in space” and disorientation are used in hypertext. These terms are based on the problems
of not knowing where you are in the network of hypertext and how to get to some other places that
you know (or think) exist in the network. The problems include the decision of where to go next, and
whether it is worth going to.
Nielsen (1990) investigated the homogeneity problem of an information space. On-line text
always looks the same. Thus, places and sense of location are not easily recognized or understood,
which is part of the disorientation problem. He also suggested that the problem in navigation lies not only in the “context-in-the-large,” which addresses the entire hypertext structure, but also in the “context-in-the-small,” which addresses reading hypertext nodes. The problem is “losing track of [how] the text one is currently reading is related to the immediately preceding or following text” (Nielsen, 1990).
Mackinlay (1986) studied the use of hypertext to search for information. Two classes of
problems were encountered in using hypertext: category troubles and navigation troubles. The
category troubles, created by “the lack of shared literal meanings of categories,” manifested
themselves in terms of subject confusion. The experiment showed that in 39% of the searches,
subjects had category troubles. While being confused by a context which was not related to searching
topics, subjects in 27% of the searches still expected and hoped to find useful information this way.
Subjects also refused to accept that the category was different from their own understanding; this is
indicated by their going through the same path repeatedly, as shown in 31% of the searches.
Three kinds of navigational troubles were reported: linearity assumptions, becoming lost in
space, and linked navigation breakdown. The linearity assumption is a misconception about the
nonlinear structure of hypertext. Subjects were surprised when they ended up at an unexpected place
when non-sequential links were used. Subjects expressed these perceptions in 30% of the searches.
The lost in space troubles occurred in non-sequential link traversal and in poorly chosen non-literal
sequential links series. The linked navigation breakdown was caused by the fact that the subjects had
no certainty about what they had explored and what they had not. This problem occurred because the
size of the hypertext was unknown to the subjects. Subjects navigated through hypertext by
“wandering around aimlessly.” Gray also reported that subjects overestimated the size of hypertext.
After a two-hour session, subjects reported from 16 to 1,000 screens; the mean of estimation was
219.19 screens and the deviation was 325.41 from the actual 68 screens.
Dillon's experiment (Dillon, 1994) on estimating a document size provided similar results. In
a hypertext environment, users had difficulty estimating the number of nodes, while in a linear
condition, reading from paper and word processor, estimated page counts were more accurate.
Dillon’s hypothesis was that the hypertext version, which did not provide a structure of the document
space, would lead to a problem in estimating document size and would be difficult to navigate. In his
experiment, a navigation problem was indicated by time spent on contents index as percent of total
time. The result showed that hypertext navigation has a significantly higher usage of contents index
than the linear text condition.
From 1994 to 1998, the Graphics, Visualization & Usability (GVU) Center at the Georgia
Institute of Technology conducted user surveys of the WWW (GVU, 1998). According to GVU’s
WWW user survey question “What do you find to be the biggest problems in using the Web?” only a small number of those responding reported the problem of “Not being able to determine where I am (i.e., 'lost in hyperspace' problem),” 3.7% - 6.4% of the cases in the fifth to the tenth surveys. The
problem of “Not being able to visualize where I have been and where I can go (e.g., view portions of
a Web site, view clickstream)” is also low, 6.5% - 11.1% of the cases. Finding new information is
more problematic, 45.4% – 49.5% of cases, from the eighth to the tenth surveys. Finding a page that
is known to be out there is reported as a problem in 28.4% - 32.4% of the cases, and revisiting pages is reported in 12.2% - 17.8% of the cases. The biggest problem is a concern with speed, reported by
61.4% - 80.9% of the respondents. The problems in navigation (i.e. being lost, visualizing location,
finding new information, finding a page, and revisiting a page) show different responses by gender,
age group and experience. The differences are consistent across multiple surveys. Females report
more problems than males. Being lost or unable to visualize location are more of a problem for the young (11-20) and old (50+) than for the 20-50 age groups. The finding-information problem is more
frequently reported from the young group. In general, those with more experience in Web usage
report fewer problems in navigation.
2.2.4 WWW Usage
Pitkow (1999) summarized the characteristics of Web usage. There are notions of popularity
in usage: requested files showed a Zipf distribution in both client usage and requested files from
servers, and 25% of sites were responsible for 80 to 95% of accesses.
Tauscher and Greenberg (1997b) studied navigation on the WWW. They found that there is a
58% chance that the next page will be a page that has already been visited. However, users visit only
a few pages frequently. Many pages are only visited once (60%) or twice (19%). The classification of
navigation actions and their frequency of usage are shown in Figure 2. Following an anchor and
“Back” button are common ways in navigation using a Web browser.
Figure 2: (a) Frequency of navigation actions as a percentage of the total navigation events and (b) details of the Open URL action.
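
Tauscher and Greenberg's recurrence statistic can be computed directly from a navigation log. The following is a minimal Python sketch, assuming the log is a plain list of visited URLs (the sample log is hypothetical); a visit counts as a recurrence when its URL appeared earlier in the log.

    # Sketch: recurrence rate of page visits from a navigation log.
    # The log below is a hypothetical example.

    def recurrence_rate(visits):
        """Fraction of visits whose URL was already seen earlier in the log."""
        seen = set()
        revisits = 0
        for url in visits:
            if url in seen:
                revisits += 1
            seen.add(url)
        return revisits / len(visits) if visits else 0.0

    log = ["a.html", "b.html", "a.html", "c.html", "b.html", "a.html"]
    print(f"recurrence rate: {recurrence_rate(log):.0%}")  # 50% for this toy log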
According to the GVU’s tenth WWW user survey (1998), about 70% of users report that they
use the WWW for finding specific information “most of the time.” About 33% and 55% of users report that they use the WWW to have “fun” and explore “most of the time” and “some of the time,” respectively.
Hölscher and Strube (2000) investigated user behavior in information searching tasks using
the WWW. Twenty-four participants used the WWW to answer 5 questions within 10 minutes. The
usage actions were captured by a Web proxy. The result was presented as the transition probability
shown in Figure 3. The study shows a difference in searching behavior between experts and novices
in terms of Web experience and domain knowledge. The usage data show that in information searching on the WWW, users use a search engine more often than browsing an index hierarchy and going directly to a known Web site combined. When a user browses a Web site from the results, there is a 70% chance that the user will continue to navigate within that Web site.
Figure 3: Process model of information seeking using Web (transition probability)
2.3 Navigational tools in Document Spaces
According to Nielsen, “.. [a] hypertext system has two navigational dimensions; a linear
dimension used to move back and forth among the text pages within a given node, and a non linear
dimension used for hypertext jumps.” (Nielsen, 1990). In addition to a link follower, the following
tools were suggested for navigation in hypertext:
Overview diagram of the global information space and the local neighborhood of the current
node.
Backtracking facility tools for going back to a previous page.
Interaction history including timestamps, footprints, and breadcrumbs. Timestamps record
time and user movement and show when pages were visited. Footprints provide check marks
in an overview diagram of visited pages. Breadcrumbs show check marks in an anchor of
visited pages.
Gloor (1997) classifies navigational tools that are related to hypermedia documents into seven
categories as follows:
Linking - links in hypertext. Links are also classified as static links or dynamic links.
Searching - a full-text search engine such as WAIS.
Sequentialization - helps navigate by making a sequential path such as a guided tour.
Hierarchy - a hierarchical display of hypertext structure in various forms.
Similarity - a display based on document similarity.
Mapping - overview map of hyperdocuments.
Agents - artificial intelligence based techniques.
The Web browser is a common presentation of hypertext and the WWW. One node or Web
page is presented with active anchor areas. Clicking on an anchor will lead to the linked node, which
will be displayed, replacing the current page. There are many approaches to providing more information about links to aid the user in the anchor selection process. Campbell and Maglio (1999) add a “traffic light,” a small image indicating connection speed, in front of an anchor. Weinreich and Lamersdorf (2000) implemented a prototype, the HyperScout system, which provides information about the anchor's link (i.e. title, author, object size, etc.) via a small pop-up window.
Many hypertext systems provide an active trail list. A history list in Netscape is shown as a
list of visited sites including the title of page, the URL, the first visit date, the last visit date, the visit
count, and so forth. An active trail list can be ordered by those attributes. In Internet Explorer version
5, a history list is shown by date, by site, by most visited, and by order visited today. In “by date”
and “by site” views, history is shown as a two-level expandable/collapsible tree. Items are grouped
by either site name or date. It is interesting that the date grouping is non-linear -- today, days in a
week, last week and last 2 weeks.
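
The “by site” grouping can be sketched in a few lines of Python, assuming history entries are plain URLs (the entries below are hypothetical); each site name forms the expandable first level and its pages form the second level.

    # Sketch: grouping a history list into a two-level site -> pages tree.
    from collections import defaultdict
    from urllib.parse import urlparse

    history = [
        "http://www.pitt.edu/index.html",
        "http://www.pitt.edu/admissions.html",
        "http://www.sis.pitt.edu/programs.html",
    ]

    by_site = defaultdict(list)
    for url in history:
        by_site[urlparse(url).netloc].append(url)

    for site, pages in by_site.items():
        print(site)              # first (expandable) level: site name
        for page in pages:
            print("   ", page)   # second level: pages visited at that site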
Instead of showing a trail using an ordered list, a trail may be viewed based on visited nodes.
In a Web Browser, an anchor may change color when it has been visited. In a graphical overview map
of a document collection, nodes and links visited while navigating may be highlighted. One may
present only visited nodes and links, similar to an overview map. This is a presentation of visited sub-
space. Examples of such systems are WebNet (Cockburn & Jones, 1996), Footprint Site Map, and Footprint Paths (Wexelblat & Maes, 1999). Controlled experiments (Cockburn & Jones, 1996; Wexelblat & Maes, 1999) were conducted and gave positive results on the utility of the tools.
The trail list of a current session shows a list of visited pages. The trail list can be shown by
expanding the back button (in many Web Browsers). The top item is the most recently visited page –
the destination of a click on the back button. There are many schemes that might be used to create
this list. The stack-based scheme is commonly used. Dix and Mancini (1998) investigated six history
and backtracking mechanisms. The formal definitions are provided. They indicate that the back
button is used in different ways in many applications. In general, the linear traversal of links will
give the same results. However, when the list includes a node that has been visited several times,
each mechanism treats the visits differently. As a result, “go back” will go to different positions.
Tauscher & Greenberg (1997a) found that a trail list that presents the last 10 URLs with duplicates
saved only in the last position would be more predictive and usable than a stack based system.
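
The contrast between the two schemes can be illustrated with a short Python sketch. The class names are hypothetical; one class models a simple visit stack whose back button pops the current page, and the other models the recency-ordered trail suggested by Tauscher & Greenberg, in which a duplicate URL is kept only in its latest position.

    # Sketch: stack-based back button vs. a recency-ordered trail list.

    class StackHistory:
        def __init__(self):
            self.stack = []

        def visit(self, url):
            self.stack.append(url)

        def back(self):
            if len(self.stack) > 1:
                self.stack.pop()       # discard the current page
            return self.stack[-1]      # the new current page

    class RecencyTrail:
        def __init__(self, size=10):
            self.size = size
            self.trail = []

        def visit(self, url):
            if url in self.trail:
                self.trail.remove(url) # keep a duplicate only in its latest position
            self.trail.append(url)
            self.trail = self.trail[-self.size:]

    h, t = StackHistory(), RecencyTrail()
    for url in ["a", "b", "c", "b"]:
        h.visit(url)
        t.visit(url)
    print(h.back())   # "c" -- stack-based back returns to the previous entry
    print(t.trail)    # ['a', 'c', 'b'] -- recency list without duplicates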
A bookmark is a list of marked locations. In the WWW environment, a bookmark list is
shown as a title list of marked pages. Most Web browsers provide a hierarchical organization of
bookmark items. MS Windows implements a bookmark item as a linked file. While bookmarks are
still accessible from the menu, MS File Manager can also access them. In some implementations,
bookmarks are built into an HTML file.
Document usage data is also useful, especially for managing a document space. Animation
of the number of accesses per day on a given Web site could be very effective for identifying new hot
pages on a site. It could also show pages that, over time, are cooling down or becoming of less
interest. Similarly, document usage data makes it possible to see general growth patterns and clusters
of activity. Animation of visitors of a group to Web pages can be found in Minar and Donath (1999).
The graphical overview diagram is common in early hypertext systems, i.e. NoteCards,
Intermedia and WE (Conklin, 1987). The graphical overview is one promising tool for aiding
navigation in complex Web space (Czerwinski & Larson, 1998; Nielsen, 1999). There are many
graphical overviews implemented in the WWW environment, such as HyperSpace (Wood, Drew,
Beale, & Hendley, 1995), Hyperbolic Browser (Lamping, Rao, & Pirolli, 1995), WWW3D
(Snowdon, Fahlen, & Stenius, 1996), WebTOC (Nation, Plaisant, Marchionini, & Komlodi, 1997),
MAPA (Durand & Kahn, 1998), Microsoft FrontPage, CLEARWeb (CLEARWeb, Inc.), HoTMetal
(SoftQuad Software, Inc.), InContext WebAnalyzer (Geac Computer Corporation Limited), Ixsite
Web Analyzer (Ixacta, Inc.), Site Manager (Silicon Graphics, Inc.), and so forth. Many of these
systems are designed for Web site management. Only a few of them, for example the MAPA system, provide a client-side viewer. The process of scanning a Web site's structure takes time, which may make an overview system inappropriate as a navigation or browsing aid. Pre-scanning a Web site structure
in some way will be important if graphical overviews are to be used for navigation.
Chen and Rada (1996) performed a meta-analysis, which showed that a graphical overview
diagram, a visualization of the organization of hypertext, is significantly useful. Graphical overviews or maps provide an exocentric view of the space. They help users make sense of what the whole space looks like, how the space is organized, and how objects are related.
The simple graphical overview diagram shows the structure of a document space. It shows an
explicitly defined set of relations, such as the hierarchical structure of a file system or the network
structure of hypertext. Diagrams use spatial dimensions in a partially ordered manner. Relationships
among objects in diagrams are often presented by connection lines. The objects are represented in
some simple symbolic form. The layout of objects in diagrams conveys information such as a
nearness relation and a group-cluster relation. Many algorithms are used to create diagrams. Display constraints have been set in order to create a nice-looking network, for instance, to maintain a minimum number of cross-links, to avoid overlapping nodes, and to keep link lengths minimal.
Algorithms for optimizing diagram layout with many constraints are intractable, NP-complete
(Brandenburg, 1987). Heuristic methods and relaxed constraints are common in implementation. The
classification of graph structure topology, with extensive treatment of the aesthetics of diagram
construction, can be found in Beccaria, Bertolazzi, Battista, & Liotta (1991).
Another type of overview diagram is a semantic map (Lin, Soergel, & Marchionini, 1991;
Fowler, Fowler, & Wilson, 1991; Fowler, Kumar, & Williams, 1996; Kohonen, 1998). Words in
documents are processed into a semantic map. The process may not be truly semantic, but rather an
attempt to capture the semantic aspects of the documents. For example, a semantic map may be
created by projecting a set of documents into 2D or 3D space and optimizing the distance between
them so that similar documents will be clustered. Document similarity may be measured by a
distance vector method. Alternatively, similarity may be determined by classification mapping.
However, Ankerst, Berchtold, & Keim (1998) have proven that the optimal spatial arrangement
problem by similarity of multiple variables is NP-complete.
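
The projection idea can be sketched as follows, under simplifying assumptions: documents become term-count vectors, pairwise cosine similarity defines a desired 2D distance, and a crude spring-style update nudges points until similar documents sit near each other. Real systems use multidimensional scaling or self-organizing maps (Kohonen, 1998); the toy documents here are hypothetical.

    # Sketch: projecting documents into 2D so that similar ones cluster.
    import math, random

    docs = ["web navigation tools", "navigation in hypertext", "cooking pasta recipes"]
    vocab = sorted({w for d in docs for w in d.split()})
    vecs = [[d.split().count(w) for w in vocab] for d in docs]

    def norm(u):
        return math.sqrt(sum(a * a for a in u))

    def cosine(u, v):
        return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

    random.seed(0)
    pos = [[random.random(), random.random()] for _ in docs]
    for _ in range(200):                              # crude iterative layout
        for i in range(len(docs)):
            for j in range(len(docs)):
                if i == j:
                    continue
                target = 1.0 - cosine(vecs[i], vecs[j])   # desired distance
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                step = 0.05 * (dist - target) / dist      # move toward target distance
                pos[i][0] += step * dx
                pos[i][1] += step * dy

    for d, p in zip(docs, pos):
        print(f"({p[0]:.2f}, {p[1]:.2f})  {d}")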
The graphical overviews or maps are also classified as global or local views. The global view
presents an overall view of the space. It is relative to the size of the pertinent space. For instance, a
state map may be considered a global map if one is concerned with travel only within a city.
Similarly, a map of all the Web pages in a single Web site may be considered the global view even though there are many other Web sites. The local overview/map view shows the neighborhood around a local focus. It can indicate “where we can go next” from a top view. A local view may be a “zoom-in” of a
global view.
An author of a Web Site may create Web pages that serve as a map. For instance, the “Site
Map” page, table of contents or index pages might be viewed as a map of a Web Site.
Hypertext uses a network model for relations among nodes. A hypertext collection is
presented as a network diagram. However, network relations are sometimes simplified as a
hierarchical structure. There are various forms for presenting a hierarchical structure. The overview of
a hierarchical structure is normally presented as a tree diagram. Expanding and collapsing sub-trees
operate as a general strategy to avoid too much node information in a view.
Conklin (1987) reported several problems with graphical overviews. The problems
mentioned included difficulties in presenting a large number of nodes and/or links; difficulties in
dealing with a frequently changing hypertext network; and difficulty in overcoming slow response time in user interaction. Other problems Conklin reported included an insufficient visual differentiation
among nodes or links and the fact that disorientation problems still exist for non-visually oriented
users.
The design of the display is always a tradeoff between data that can be displayed and data
that will not be visible at the moment. In order to display all of the data at once on a limited display
space, a data point has to be reduced to a very small point. On the other hand, if data are visible at a size that is readable or selectable by a mouse, some data points will not be visible due to
occlusions from other data points, or due to being out of the boundaries of the display. A large data
set also slows down an interaction process. Many strategies are used to solve these problems,
including the following:
The occlusion of objects may be allowed and interaction techniques, such as local
manipulation of the viewpoint, may be used to see them.
Panning of a virtual display space is allowed when the space to be displayed is larger than
will fit on the view area.
Multiple levels of display may be used, where more details of the object may be shown by
zooming in.
Context+focus addresses the problem of details versus overview by showing both of them at the same time. At the point of focus, details of objects are shown, and an overview is shown in the rest of the area.
Some interaction techniques are:
Dynamic Queries: Dynamic queries technique, developed by Shneiderman (1994), is an
interactive display with controllers for direct-manipulation of queries and results. Controllers
are created which are bound to a range of values of interest corresponding to an attribute. The
presented data changes dynamically as the controller is manipulated within a bounded range.
Mural: Mural is a scheme that provides an overview presentation to fit on the screen (Jerding
& Stasko, 1995). The Mural view is a miniature of larger content that cannot be viewed
without losing detail or is not readable in a single display space. The display space is
condensed. Therefore, a single dot or Mural view may present multiple data points from the
original display. A secondary encoding, such as the color dimension of pixels, may be used to
present additional data.
Magic Lenses: A Magic Lens uses the concept of spatial sensitivity. It mimics a magnifying
lens. Magic lenses are areas which are superimposed on top of another presentation. Many
functions can be applied to lenses such as showing more detail or filtering. What is shown on
the lens is a function of the lens's position (Stone, Fishkin, & Bier, 1994).
Pad and Pad++: Pad uses zooming and panning as main interactions (Perlin & Fox, 1993;
Bederson & Hollan, 1994). The data first appear at a certain magnification factor. Zooming in
shows objects at a bigger size, with more detail, or with different detail (semantic zooming).
An object in Pad space is 3D, having both an X-Y coordinate and depth. At a certain zoom
factor, objects that have a certain depth will appear.
Furnas' Fisheye: Furnas (1982) provides a view combination showing both overview and detail. With zooming, the overview structure is not visible while viewing detail; with multiple views, it is difficult to relate the information across views. The fisheye provides a combination of overview and detail in a single view. The objects of interest are visible and change dynamically according to the focus point. By using the distance metaphor, the focal object appears closer than other objects. Furnas' Fisheye is a basic concept of spatial distortion presentation by “degree of interest,” also called “Detail+Context” or “Focus+Context.” A formal description and framework of “Focus+Context” appears in Björk, Holmquist, & Redström (1999).
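
Furnas' degree-of-interest calculation can be sketched for a small tree, assuming the a priori importance (API) of a node is the negative of its depth and the distance function is the tree path distance to the focus: DOI(x | focus) = API(x) - D(x, focus). Nodes whose DOI falls below a threshold are elided from the fisheye view; the tree and threshold below are hypothetical.

    # Sketch: Furnas-style degree of interest over a small tree.
    tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
            "a1": [], "a2": [], "b1": []}
    depth = {"root": 0, "a": 1, "b": 1, "a1": 2, "a2": 2, "b1": 2}
    parent = {child: p for p, children in tree.items() for child in children}

    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain

    def distance(x, y):
        """Tree path distance via the lowest common ancestor."""
        common = set(ancestors(y))
        lca = next(n for n in ancestors(x) if n in common)
        return depth[x] + depth[y] - 2 * depth[lca]

    def doi(x, focus):
        api = -depth[x]                   # the root is most important a priori
        return api - distance(x, focus)

    focus = "a1"
    visible = [n for n in tree if doi(n, focus) >= -4]   # threshold elides b1
    print(sorted(visible))                # ['a', 'a1', 'a2', 'b', 'root']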
2.4 Integrated Document Space Navigational tools
2.4.1 Tool Integration
Each document space navigational tool is designed and optimized for specific tasks. It is not
possible to meaningfully present at one time all the information about documents at all the levels it
might be presented. This failure is a result of the limitations of physical display devices and human
perceptual abilities. Supporting navigation requires a combination of different presentations in an
appropriate order. For example, it may be useful at the beginning to get an overview of a space – to
understand the structure of that space. During traversal of space, specific detail, obtained by zooming
to some local map with more detail on objects, may be useful. At the final stages, a content viewer is
needed to examine the content. This idea is similar to the one in Spring, Morse, & Heo (1996), as well as being consistent with Shneiderman's view that a user interface should provide an “Overview first, zoom and filter, then details on demand” (Shneiderman, 1998).
While it is possible to add many features to an application, i.e. to add a variety of document
presentation tools, a user might not choose to use these additional features. According to Albers
(1997), learning new methods and options requires additional work and remembering. Users tend to
work to optimize their cognitive resources rather than maximizing their work output.
Tools can be simple or complex, single-purpose or multi-purpose. Tools can be application specific or generic, i.e. tools that are used by a variety of other programs. The term tool is used here
to refer to the simple generic type. Tools are modular programs that provide a specific presentation
and a specific interaction and fulfill a special function. A coordinated and integrated set of tools will
be called a system.
Navigational tools use document-related data and a presentation or display of that data.
Different tools may use the same data with different presentations or use the same presentation on
different data.
In terms of presentation, the integration of navigational tools needs to address how different
presentations can be viewed on a single screen or in a specific sequence. The concerns of integration
include multiple windows, graphical object combinations, and interaction schemes.
At the code level, the integration of navigational tools is a matter of sharing or exchanging
state data and content data between program modules. When the user is interacting with a system of
navigational tools, tool state information, such as current position, will be of use when switching
between tools. When the user is using a search tool, passing the search terms allows the system to
provide result texts with the query words highlighted in context.
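
As a minimal sketch of this kind of state sharing, suppose each tool reads and writes a small shared-state object (all class names here are hypothetical). A search tool records its query terms and resulting position, and a content viewer uses the shared terms to highlight them in context.

    # Sketch: tools exchanging state through a shared object.
    class ToolState:
        def __init__(self):
            self.current_node = None
            self.query_terms = []

    class SearchTool:
        def __init__(self, state):
            self.state = state

        def search(self, terms):
            self.state.query_terms = terms
            self.state.current_node = "results.html"   # hypothetical hit

    class ContentViewer:
        def __init__(self, state):
            self.state = state

        def show(self, text):
            for term in self.state.query_terms:        # highlight query words
                text = text.replace(term, f"**{term}**")
            print(f"[{self.state.current_node}] {text}")

    state = ToolState()
    SearchTool(state).search(["navigation"])
    ContentViewer(state).show("Tools that support navigation in hypertext.")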
To this point, two types of data have been described – document data and navigation data.
One might imagine document data to be public and navigation data to be personal. One might further
imagine document data to be stored remotely and navigation data to be stored locally. It is not hard to
find counter examples for these cases. For example, link traversal or navigation data on a Web site
might be collected to find high traffic paths. Similarly, file system data on a PC may be considered
personal – and only stored locally. Below, some of the issues pertaining to document and navigation
data are outlined:
Most GUI technology is organized around “windows” as a basic unit. The window is a
rectangular area for display that has some degree of automated functionality provided by a window
manager program. These include the reporting of various events such as: window events (e.g.
exposure, resizing, etc.); user events (e.g. mouse clicks, keyboard actions, etc); and system events
(e.g. OS interrupts, signals, inter application messaging, etc.) Window systems allow -- require --
windows to be organized within other windows -- creating a hierarchy. Within X-window
terminology, all graphical objects are windows, including icons and scroll bars. From this viewpoint,
each navigational tool may be defined as having its own window. The placement of windows and the indication of relations among windows are a major concern when integrating tools.
The layout of windows may be static or dynamic. Windows may be laid out side-by-side or
overlapping. The advantage of side-by-side windows is that all of the information is presented in a
single view. The disadvantage is that the territory of all the windows must be less than the territory of
the display itself. Using overlapping windows, the total territory in all the windows may be many
times greater than the display territory. The disadvantage of overlapping windows is that some of the
information in the windows will be hidden from view at any given time. Interaction can resolve this
problem by allowing the user to select which window should be up front. X-Windows provides a
policy to do an automatic “bring to front” when the mouse enters the window area. MS Windows 95
uses a “task bar” to aid this process. Some applications provide a “stay on top” option to avoid
occlusion by other windows. Gaylin (1986) has shown that cycling through windows is the most
frequent window action (i.e. more than moving or resizing).
Many applications, which use a multi-window scheme, provide an automatic layout as either
tiled or overlapping. Window layout is dependent on the tasks being performed. Tasks that require
little window manipulation can be performed faster in tiled than in overlapping windows (Bly &
Rosenberg, 1986).
North & Shneiderman (1997) provide a taxonomy of multiple window coordination. It is a
two dimensional taxonomy. The first dimension relates to the data in the two windows, which is
either the same or different. (The same data might have a different presentation in each window.)
Different data should have explicitly defined relationships among them. (They may be an aggregate
of data items.) The second dimension is the function of the window. It is suggested that the windows
might be either selection or navigation windows. Given two windows, both might be navigational in function, both might be selection-oriented in function, or they may be split.
The navigation functions include scrolling, zooming, following links, opening files, and so forth. The six cases of combination are shown in Figure 4. Shneiderman and North have reviewed the
advantages in presenting data with multiple windows with coordination between them. They
implemented a “snap-together visualization” which allows a user to define the coordination of
windows (North & Shneiderman, 1999).
Figure 4: A taxonomy of multiple window coordination (North & Shneiderman, 1997)
Multiple window systems might encounter difficulty in presenting the relationship between
the windows on the screen. Many windows from different applications may be shown on the same
screen. As more windows are added, the screen becomes crowded. The windows may be shown to be
related by presenting them within a single application window. Relationships among windows may
be shown by the synchronization of changes. Interactions in one window can be used to change other
windows. For instance, when the data from one window is changed by interaction (e.g. selection), the
contents of another window (e.g. a contents window) can be changed. The relationship may be
explicit -- shown by some presentation such as a line and an arrow-- or unannounced.
A pop-up window is a window that only appears after some interaction with a main window.
A pop-up window generally captures control or focus from the main window. A common type of
pop-up window becomes active with a mouse click action and disappears when some button on the
window is clicked. Some variations, such as Balloon Help in the Apple system and ToolTips in MS Windows, are activated when the mouse pointer has been over some specific area for a specified amount of time. A pop-up window may be kept open via some holding action. A “Pin” or “tear off”
capability is used in some systems to keep a pop-up menu open even after the pointer has left the
menu area.
The location of a pop-up window varies; positioning it at the center of the screen is a
common practice. Many applications position a pop-up window under the active area to avoid
obstructing the active area. A pop-up window shifts the focus of attention from the main window.
The second window may be the same size and in the same position as the current window,
but with a transparency property. Data is drawn on the transparent window, which is layered on top of
the other window. The data from a new layer may block the view of the layers beneath. This scheme
is used when spatial encoding of both views is similar. The interaction of views should be coherent
in both layers. The magic lens uses a scheme where a second, smaller window is positioned in the larger window, and the contents of the smaller window are a function of its location within the main window.
Changing the mode of a window may be considered the same as creating a second window to replace the first window. When changing windows, the interaction may be the same or it may use a new set of interactions in the new window. This approach has the disadvantage of requiring the user to associate information from a previous display with a new one. It requires cognitive overhead to recognize the change in mode.
Windows may be synchronized in three ways. The first is one-way synchronization, in which a change in one window propagates to cause another window to change state; changes in the second window do not affect the first. The second is two-way synchronization: changes in either window will propagate to the other. The third possibility is that both windows maintain their internal states independently, with no synchronization.
Regardless of whether navigational tools are presented at the same or at different times and in
the same or different windows, the navigational tools should be capable of being synchronized to
each other. This synchronization can be done by passing display data or tool state information or
both. The synchronization may be in terms of:
Place
Selection
Boundary or view
Attributes
For instance, if three windows are used, one showing a global tree map, another the files in a
directory list, and one showing file contents, the global tree map could have an indicator to show
which files are being displayed in the file directory list. The file content viewer may display the file
that is selected in the file list. In this case, the navigation might be done in either the tree map or the
file list. Navigation via the tree map should change the file list contents.
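
A minimal sketch of this coordination follows, assuming one-way synchronization and a hypothetical toy file system: selecting a directory in the tree map updates the file list, and selecting a file in the list updates the content viewer, while neither downstream window changes its upstream source.

    # Sketch: one-way synchronization across three coordinated windows.
    filesystem = {"/docs": ["intro.txt", "methods.txt"], "/src": ["main.py"]}
    contents = {"intro.txt": "Introduction ...", "methods.txt": "Methods ...",
                "main.py": "print('hello')"}

    class Viewer:
        def display(self, name):
            print("viewer:", contents[name])

    class FileList:
        def __init__(self, viewer):
            self.viewer = viewer
            self.files = []

        def show_directory(self, path):
            self.files = filesystem[path]
            print("file list:", self.files)

        def select(self, name):
            self.viewer.display(name)            # one-way: list -> viewer

    class TreeMap:
        def __init__(self, file_list):
            self.file_list = file_list

        def select(self, path):
            print("tree map focus:", path)
            self.file_list.show_directory(path)  # one-way: map -> list

    viewer = Viewer()
    files = FileList(viewer)
    tree = TreeMap(files)
    tree.select("/docs")        # updates the file list ...
    files.select("intro.txt")   # ... and selecting a file updates the viewer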
It is conceivable that each navigational tool has its own internal representation of document
space and navigation data. In order to communicate between tools, only common data can be
interchanged. The representation of the current place, or current selection, will have to be converted
to some shared representation.
2.4.2 Evaluation of Integrated Navigational tools in Hypertext
Monk, Walsh, & Dix (1988) compared three types of interface, i.e. hypertext browser,
scrolling text and folding text in two experiments. The task was to answer questions about a program.
The first experiment results showed that there were significant differences in time used between
scrolling text and hypertext browser. The hypertext browser was significantly slower than scrolling
text. There was no difference in task correctness. In a second experiment, they showed that a static
overview map of the program structure improved the hypertext browser navigation as indicated by the
reduction in response time. In a control condition, using the hypertext browser while showing a list of titles similar to those shown in an overview map had little effect. Note that the document
contained 12 nodes. In the hypertext browser condition, two windows presented two nodes at the
same time. Subjects had no experience in using a mouse-based system.
Hammond & Allinson (1989) compared a hypertext browser alone to a hypertext browser
used with a map, a hypertext browser used with an index, a hypertext browser used with tours, and a
hypertext browser with all three of these navigational tools. Two task types, exploratory and directed,
were studied. The exploratory task was to read documents for subsequent testing. The directed task
was to use documents to answer a series of questions. All subjects were novices to the tools. The
document contained 39 screens with up to 6 navigation screens. There were three map screens for the
map tool. The results show that there are no significant differences in task performance, i.e. accuracy
score and time to complete task, for each tool. When using the hypertext browser with a single
navigational tool, subjects used the additional facilities to navigate separately; the map was used 31%
of the time, the index 23%, and the tours 49% of the total transitions. Usage of a hypertext browser
with all three navigational tools is as shown in Figure 5. When using the hypertext browser alone,
subjects viewed fewer screens and fewer different screens in terms of the total than the other
conditions. The new-to-old screen ratio in the hypertext browser alone condition was significantly
smaller than the other conditions.
[Figure 5 data -- Exploratory: Hypertext 54%, Tour 28%, Index 6%, Map 12%. Directed: Hypertext 59%, Tour 8%, Index 17%, Map 16%.]
Figure 5: Proportion of navigational tool usage in exploratory and directed tasks
Nielsen (1989) summarized usability studies in hypertext and reported several factors that
affected performance. Most of the effect sizes were small. Only 17 of 92 effect factors were higher
than 2. Two significant issues reported are individual differences among users (i.e. age groups,
activity level, and expertise) and the effect of different tasks (i.e. exploration vs. directed task). These
factors affected usability measurements to a greater degree than other factors studied.
Wright & Lickorish (1990) studied the effect of two navigation systems, i.e. index navigation
and page navigation, on various types of question answering. In index navigation, an index page is provided, and navigation to other pages can only be achieved by clicking on an item in the index page.
This index page may be viewed as an overview map of the hypertext. In page navigation, each page
contained active anchors that allowed jumps to various places, i.e. navigate by using a hypertext
browser. The results showed that the page navigation group performed tasks by using more clicks
than the index group. However, time spent for the task depended on the question. For finding a page,
there was no significant difference in the time spent performing the task. The error rate was not
affected by navigation systems.
Heo (2000) studied Web visualization techniques. Web visualization techniques were
classified into four categories: distortion, three-dimensional layout, zoom and expanding outline. The
usability study was conducted to examine the performance of users in information-finding tasks using
four different navigational tools in two sizes of Web space. The experiments used a Web browser
(Internet Explorer as a control condition), a Web browser with a distortion technique tool (Site
Analyst - using Hyperbolic distortion), a Web browser with a zoom technique tool (MerzScope), and
a Web browser with an expanding outline technique tool (LiveIndex) (see Appendix A, Figure 41,
Figure 42, and Figure 43). Two Web sites were used; a small Web site containing 50 pages and a
large Web site containing 583 pages. The performances were measured by response time and
accuracy. The results showed that there was a significant difference in response time between tools
but no significant difference in accuracy. The Web browser with the zoom technique tool took more
time than a Web browser alone and the Web browser with the expanding outline technique tool. In
general, the mean response time, when using a Web browser with the tools, was higher than when
using a Web browser alone. The results showed that users took more time to complete tasks and answered the questions less accurately on the large Web site than on the small Web site. However, there
was no interaction effect between tools and size of Web site in subject’s performance.
In summary, the findings about performance are mixed. The research would seem to indicate
that performance is highly dependent upon the user and the task. Less attention has been paid to the
characteristics of the space being navigated. This current study looks to examine the effectiveness of
navigational tools for information finding in hypertext spaces of varying complexity and “scent.”
3 RESEARCH METHODOLOGY
3.1 Introduction
Document space properties are one of many factors that contribute to navigation efficiency.
The WWW environment is selected because it is a widely available document space. Web site
complexity is selected as a simple classification of the properties of the space. A number of
complexity measures are investigated and selected as predictors of navigation performance based on
Web site complexity.
Navigation behavior depends on the given tasks. Navigation for information-finding tasks is
used in this study. Navigating for information finding is a common task in the WWW environment.
The selection of a single task type is a simplifying condition for this research.
The Web browser is used as a base condition because it has become a common tool for
navigation in the WWW. The graphical overview diagram of a Web site is chosen for study because of its historical use in hypertext systems and its theoretical potential as a navigational aid. It shows the
structure of a document space visually. Theories of navigation support the utility of a graphical
overview of the space.
3.1.1 Document Space
A document space may be implicitly or explicitly linked or not linked at all, i.e. a simple
collection. Hypertext generally, and the WWW specifically, is a document space with a network
structure. A network structure can be a complex mesh, a hierarchical structure or a simple list
structure.
The WWW is an open and variably complex mesh environment, i.e. the user can travel from
collection to collection. This study addresses isolated sites. Links to other sites are disabled. An
individual Web Site is considered a closed hypertext system.
3.1.2 Web Site Metrics
This section examines data collected to describe the complexity of a Web site. The WWW is
a single heterogeneous hypertext. While all nodes can be identified as belonging to the WWW, it may
be the case that they are from several hypertexts. The WWW may also be defined as a collection of
Web sites.
A Web site may be viewed as a hierarchical collection according to its logical construction or
Domain Name structure. For example, department Web sites may be considered to be part of a
university Web site, i.e. www.sis.pitt.edu and www.cis.pitt.edu belong to www.pitt.edu, as their
primary domain names are the same.
A Web site may also be defined as synonymous with a Web server. As a further
complication, it should be noted that the implementation of many Web servers, i.e. Apache, MS IIS,
allows a single server to support multiple Web sites with different Internet addresses. Thus, while a
Web site may be defined as one or more servers managed by a single organization, other definitions
such as those above are possible.
There are many ways to gather information about a Web site. One convenient way is using
spiders (also called crawlers, robots and bots). This kind of program can be structured to return what
a user would encounter when navigating links. Both a linked graph, and a hierarchical structure map
of the site can be constructed using a Web spider to capture the structure of the Web site. However,
an unconnected component cannot be captured by a Web spider. (It would be possible, if a given
server allowed directory listings, to construct a spider that could find unconnected components in directories where connected components existed.)
When scanning a Web site, a Web spider will encounter files in many different formats.
Many file formats, such as MS Office and Adobe PDF, are proprietary standards. Others, such as VRML and VML, proposed by the W3C, are public. Special software, i.e. a “plug-in,” is used to present special formats. In order to simplify the Web spider program, special file formats are ignored and
only HTML file formats are scanned. Scanning only HTML file types will cover the majority of
documents in a Web site. Woodruff, Aoki, Brewer, Gauthier, & Rowe (1996) reported that 75% of all
URLs point to files that are HTML. Bray (1996) reported that 44.7% of all URLs point to files with
no extension and 36.5% of all URLs point to html. Bray concludes that 80% of URLs are likely to use
the HTML file format.
A Web page from a Web site may be a static file or dynamically generated by a program
from a Web server. Common Gateway Interface (CGI), Active Server Pages (ASP), Internet Server
Application Programming Interface (ISAPI), PHP Hypertext Preprocessor, and Server Side Includes
(SSI) are examples of methods for creating dynamic Web pages. In practice, a URL of a dynamic Web page uses a specific file type (i.e. “.cgi”, “.asp”, “.dll”, etc.). The request from the client (Web
Browser) for a dynamic page commonly contains parameters that determine the specific content to be
created. The parameters may come from user entry on an input form, events from an active
component, state information kept by a cookie or state information kept at the sever site. In some
cases, it is impossible to determine the number of pages or the content of a dynamic Web page
because of the non-deterministic nature of the inputs. Iteration over dynamic Web page parameters
may not be feasible. A Web spider program is able to capture only a snapshot of a generated Web
page.
The HTML version 4 standard specifies a number of tags that use URIs. The details are shown in
Appendix A. One URI that is used in navigation by a Web browser is presented after the HREF
attribute in the “A” tag. However, HTML provides other mechanisms to navigate, such as
JavaScript and the “ismap” attribute of the “IMG” tag. These are more difficult to find and follow
using a Web spider. The script and “ismap” tags were ignored.
A Web site can be represented by a directed-graph. A number of attributes may be used to
describe a Web site including the number of nodes, the number of links, and the topology. The
topology may be described in terms of connections between nodes or in terms of the average distance
between nodes.
The size of a Web site may be defined in terms of the number of nodes or the size of the
nodes. There are many kinds of nodes in the WWW environment. The target URI may be an HTML
file, a text file, an image file or a proprietary file type.
Nodes may be classified as follows:
o HTML pages
o Embedded objects such as graphics (<BODY background > and <IMG src>)
o Other file types, which are referenced by anchor
Lawrence & Giles (1999) reported that the mean number of pages per server was 289, with an
extreme skew. According to Huberman & Adamic (1999), the distribution of the number of pages per
Web site may be predicted by a universal power law.
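As a sketch of that form (the exponent β is empirically fitted and its value is not given here):

$$P(\text{site has } n \text{ pages}) \propto n^{-\beta}$$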
In the WWW environment, anchors and links are combined. The number of links is equal to
the number of tags and attributes that refer to a URI. In this study, a link is defined as those tags and
attributes that can be used in navigation to another node by the Web Browser. These include <A href>
and <AREA href>. The targets of the links, in HTML, can be classified as follows: internal node
links, internal Web site links and external links. This study is concerned only with internal Web site
links.
Other common derived attributes of the node such as the number of incoming-links and the
number of outgoing-links can be computed. The true number of incoming links to a node is unknown
in the WWW environment. In the scope of a Web site, the number of incoming links can be
determined by counting when the node is a target of a link from another node at the Web site. The
number of outgoing-links can be determined by counting referent tags of the node. The global
structure of a Web site may be represented by the average number of links and their deviation (e.g. of
incoming and outgoing links).
Two global structure measurements of the Web site developed by Botafogo, Rivlin, &
Shneiderman (1992) are compactness and stratum. The compactness indicates reachability of nodes.
It is defined as

$$C_p = \frac{Max - \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}}{Max - Min}$$

where $Max = (n^2 - n)\,C$ and $Min = n^2 - n$. The computation is applied to a Converted Distance
Matrix (CDM), a distance matrix which defines the distance of a non-reachable node pair with some
constant, K, rather than an infinite value. $d_{ij}$ is the distance between nodes i and j, n is the number of
nodes, and C is the maximum value a CDM entry can assume, usually C = K. For fully disconnected
nodes, $C_p = 0$, and for fully connected nodes, $C_p = 1$.
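As a minimal sketch of this computation (assuming the CDM is stored as an integer matrix with unreachable pairs already set to K; the method and names are illustrative, not the author's implementation):

```java
/** Compactness (Botafogo, Rivlin, & Shneiderman, 1992) from a converted
 *  distance matrix in which unreachable pairs are set to the constant K. */
public static double compactness(int[][] cdm, int k) {
    int n = cdm.length;
    double max = (double) (n * n - n) * k; // every pair at the maximum distance
    double min = n * n - n;                // every pair at distance 1
    double sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j) sum += cdm[i][j];
    return (max - sum) / (max - min);      // 0 = fully disconnected, 1 = fully connected
}
```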
Stratum is a metric that suggests whether there is an order for reading the hypertext. A linear
hypertext can only be read one way; its stratum value is one. On the other hand, in a cyclic
hypertext there is no difference in ordering regardless of which node reading starts from; the stratum
value is zero. The detail of the Stratum formula is shown in Appendix B.
There are other ways to simplify a directed-graph distance matrix in order to avoid an infinite
distance. For example, the distance matrix of a directed graph can be treated as that of an undirected
graph and computed with an all-pair shortest path algorithm between nodes.
In the case of a Web site, the distance between nodes may be defined as the number of
“clicks” when navigating using a browser. For instance, nodes a and b which have no direct link have
a distance equal to the links from a to the root plus the links from the root to b. In general, this would be the
distance from the root node to the target node plus one, assuming that it is possible to jump from the source
node to the root by one “click”. The source node to root node distance is set to one by assuming that there is a
backtrack mechanism. This distance metric depends on the root node.
Average distance from the root node is a metric that may be useful in an information-finding
task. Based on the fact that the root node is a convenient entry point to the Web site, the distance from
the root node is an optimized path condition. If the information need is randomly distributed
throughout the structure, the average distance will be a suitable predictor for the number of nodes that
need to be visited to find the information. However, it is known that in information seeking tasks, the
user may use a search engine. The search results will provide a direct access mechanism. Also, there
is a notion of node popularity, i.e. information may not be distributed evenly across the Web pages in
the site. Many Web sites provide an index page for use as a navigation aid. These are not target
nodes. Finally, users may not use the shortest path or may use some other strategy in navigation such
as aim off.
Boyle & Teh (1992) showed that increasing the number of links in a preserved hypertext
structure would decrease the average number of nodes visited in an information-finding task. They
also found that the total time for the completion of a task and the number of errors were not affected
significantly by the number of links.
Schoon (1997) showed that different hypertext structures -- linear, hierarchical, star, and
arbitrary -- had a significant effect on navigation in finding the location of the answer in a closed Web
site. The star and hierarchy structures were more navigationally efficient than the linear structure. The
arbitrary structure was significantly less efficient than the others. Efficiency was measured by
Navigation Action Efficiency (NAE).
There was no significant difference in NAE value between groups with different levels of experience.
(It should be noted that the experience rating score was self-reported.) There was a significant
difference in NAE value between genders only in the arbitrary structure; males had lower NAE
values than females. In the other structure types, there was no significant gender difference in NAE
value.
Larson & Czerwinski (1998) measured reaction time and “lostness” in three Web site
structures, each with 512 bottom-level nodes: 8x8x8 (8 top-level categories, each with 8 sub-levels
and 8 content level categories under each sub-level), 16x32 (16 top-level categories, each with 32
content level categories), and 32x16 (32 top-level categories, each with 16 content level categories)
hierarchies. The 8x8x8 hierarchy had significantly higher reaction time and “lostness” than the
other two in an information-finding task. (The answer for the task was in the bottom level of each
structure type.) The 16x32 structure had a better reaction time than the 32x16, but the difference was not
significant. Regarding “lostness”, the 16x32 was better than the 32x16, but the difference was only
marginally significant. “Lostness” was computed from the number of unique and total links visited in
comparison to the “optimal path.” The information scent was controlled by the editor to make each
level of category appear natural. The depth of the Web structure contributed to both reaction time and
lostness.
Nakayama, Kato, & Yamane (2000) studied Web site usage logs in order to improve Web
site design. Access co-occurrence, measured by the cosine similarity of access logs, was used as one
metric. The path length was computed using the shortest path between two nodes. The study showed
a negative correlation between access co-occurrence and the path length between nodes, i.e.
nodes that are near each other are likely to be accessed together.
Frame construction in HTML creates the appearance of a single Web page by combining
HTML files. In this case, the Web browser is not a single-node text viewer. Also, navigation under a
“frames” condition is different in that some combination of anchors from several nodes is visible and
may be used. In this study, a node is defined as a single HTML file. The frame tag, i.e. <frame
src>, is considered to be a zero-length link to the target node when used to compute the distance
between nodes. However, it was not counted as a navigational link.
Various Web structure metrics, including number of nodes, number of links and other derived
information that summarizes the connection between nodes, may determine performance in Web
navigation.
3.1.3 Study of Web Site Structure
As discussed in the last section, there are many metrics that may be used to describe the
structure of a Web site. Some of them have been shown to be correlated to navigation performance in
information finding tasks, i.e. number of nodes, average distance from root node, and topology of a
Web site. In order to select metrics for representation of a Web site structure, a study was conducted
on a set of Web sites. A suitable set of metrics should differentiate Web sites with high and low
complexity. The research will test whether these metrics have a high correlation with navigation
performance using different tools. The purpose of this preliminary study was to gather a suitable
set of Web sites for analysis. This set will serve as the basis for the selection of sites with high and
low complexity for use in the experiment.
The Web sites within the University of Pittsburgh domain were selected. Within this domain
are a large number of department and program sites that were identified and scanned. The local
university’s Web sites were selected in order to minimize Web site scanning response time and
minimize out-going network traffic. The department and program Web sites are managed by each
department; thus, they should have a variety of structural properties. However, because these sites were
similar in subject matter (i.e. they introduce programs and provide academic information) many
similarities of structure were noticeable. For instance, most of the main pages contain links to lists of
staff and faculty. These similarities were considered beneficial in that they would provide an additional
form of control in the final experiment. That is, while the sites would vary in complexity, they would
be similar in scope and content. However, it is true that the University Web sites are not
representative of the Web sites in the WWW. The majority of Web sites in the WWW are commercial
Web sites, which differ from the University Web sites in their objectives and content.
The Web sites list came from the University of Pittsburgh departments and program list page,
http://www.pitt.edu/academics.html. Web sites were also gathered by scanning all IP addresses in the
“pitt.edu” domain. The result of scanning all the IP addresses, from 2562 possible IP addresses, found
that a total of 636 hosts responded to an HTTP call at port 80. The first page of all these sites was
scanned. The Web sites were investigated based on their first page. Of the 636 addresses responding,
57% (365) were either printer manager Web sites, default Web server software package pages, or
device manager Web sites. Of the 636 responses, including the printers and other device managers,
there were 308 unique first pages. Beyond the device manager sites, there were a few sites with the
same text because the hosts responded to multiple IP addresses. This analysis led to the identification of
232 sites that might be scanned for content. This list was then compared with a list generated by
scanning the main Web pages of the University to develop a list of 83 sites for further analysis.
The data was collected by a Spider program. Major components of the spider program came
from the public domain. The Web spider core code was written by Jef Poskanzer, www.acme.com.
The HTML parser was written by Arthur Do, (http://www-cs-students.stanford.edu/~do/
htmlstreamer.html). These two components were modified and merged together. An interface and
database functions were added.
The spider program uses a given URL as a starting point for scanning a Web site. The spider
program follows only URLs that have the same prefix as the given URL. This given URL prefix is the
boundary of the Web site, i.e. only URLs with the same host and sub-directory path are in the same
site. Only URLs from anchors (i.e. “A href”, “AREA href”) and frames (i.e. “FRAME src”, and
“IFRAME src”) were followed. Other sources of URL are stored but not further scanned. Only the
target-URLs that are identified to be HTML file type were parsed to extract additional URLs.
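The scanning policy can be summarized by the sketch below: a queue-based crawl bounded by the given URL prefix. The HtmlFetcher interface and all names are hypothetical stand-ins for the modified Poskanzer/Do components, not the actual spider code.

```java
import java.util.*;

/** Illustrative prefix-bounded crawl loop; HTTP and HTML parsing are stubbed out. */
public class SpiderSketch {
    /** Hypothetical stand-in for the HTTP and parsing components. */
    public interface HtmlFetcher {
        boolean isHtml(String url);
        List<String> extractLinks(String url, String... tags);
    }

    public static Set<String> crawl(String startUrl, HtmlFetcher fetcher) {
        String prefix = startUrl;               // site boundary: same host and sub-directory path
        Set<String> discovered = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!discovered.add(url)) continue;    // already seen
            if (!url.startsWith(prefix)) continue; // stored, but off-site URLs are not followed
            if (!fetcher.isHtml(url)) continue;    // only text/html pages are parsed
            // follow only anchor and frame targets: A href, AREA href, FRAME src, IFRAME src
            queue.addAll(fetcher.extractLinks(url, "A", "AREA", "FRAME", "IFRAME"));
        }
        return discovered;
    }
}
```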
A directed graph representing the Web site’s structure was created from the scanned Web site
data. Nodes of the graph were the HTML files within the Web site. An edge was present when an
HTML file had a link to a target HTML file within the same Web site. The edge weights,
i.e. distances between nodes, were initialized to 1 when nodes were linked by an anchor and to
0 when they were linked within a frame.
The Web sites were scanned between June and September, 2000. Of the 83 Web sites scanned,
three Web sites were removed because they caused an error in the spider program. Given 80 starting
URLs, a total of 45,984 URLs were discovered from the 12,007 HTML files parsed. The summary of
URLs found is shown in Figure 6. A total of 40,856 URLs (88.85% of the total) used the HTTP
protocol. Other protocols found were: mailto - 4,186 (9.10%), javascript - 656 (1.43%) and ftp - 98
(0.21%). The remaining 188 URLs (0.41%) were typing errors, or system-specific protocols such as
“gopher”, “file”, “news” and so forth.
Figure 6: Summary of URLs found
The Spider successfully scanned 30,627 URLs (74.96% of HTTP URLs) that used the
HTTP protocol. Of the 10,229 URLs (25.04% of HTTP URLs) that were not scanned, 8,627 URLs
were not in the Web site (21.12% of HTTP URLs), 1,117 URLs resulted in a server response of error
or access denied (2.73% of HTTP URLs), and 485 URLs were linked by other tag types (1.19% of
HTTP URLs).
Of the 30,627 URLs that were scanned, the content types identified by the server were as
shown in Table 1. The content type “text/html” was identified for 12,007 URLs. These were parsed
to extract URLs; URLs that had other content types were ignored.
Table 1: Content types of scanned URLs
Content type   Number of scanned URLs   Percent
text/html               12,007            39.20%
image/gif               11,807            38.55%
image/jpeg               5,565            18.17%
Others                   1,248             4.07%
Total                   30,627
The files of “text/html” content type, as identified by the server, had the following file
extensions: “.html” 7,648 files (63.70% of expected HTML files), “.htm” 3,581 files (29.82%), no
file type 731 files (6.09%), and other file types, including “.lasso”, “.asp” and “.map”, 27 files (0.39%).
All of these files were scanned and parsed for tags.
There were 219 HTML files that contained a “FRAME” tag (1.82 % of HTML files), 160
files that contained “FRAME” and “BODY” tags (1.33%), 1,791 files that contained programs, i.e.
“SCRIPT” tag (14.92%), 360 files that contained “ISMAP” tags (3.00%). No files used “IFRAME”
tags.
Figure 7: Links summary
A total of 277,890 links were found (see Figure 7). A link is a tag-attribute that contains a
URL. Of the 277,890 links, 191,737 were “tag-connections” where a connection is defined by
ignoring multiple links between source-target pairs with identical tag-attributes. The tags-attributes
that created links are shown in Table 2. Of the 277,890 links, 252,527 links (90.87% of the links)
were to nodes within the site. There were 171,652 tag-connections (89.52 % of the tag-connections)
that pointed to nodes within the same site. Of the links within the site, 3,664 nodes had 33,191 links
(11.94 % of total links, 13.14% of the links within site) to themselves. There were 25,363 links
(9.13% of the links) that pointed to nodes in other sites. Of the 191,737 tag-connections, 20,085 of the
tag-connections (10.48% of the tag-connections) pointed to nodes outside the site.
Table 2: Tags-attributes of links
Tag-attribute      #Tag-connections       %    #Links        %
A href                      107,251  55.94%   160,408   57.72%
Img src                      68,578  35.77%    94,419   33.98%
Area href                     6,450   3.36%     6,883    2.48%
Body background               4,002   2.09%     4,189    1.51%
Link href                     1,987   1.04%     2,329    0.84%
Img usemap                    1,638   0.85%     7,806    2.81%
Form action                     695   0.36%       710    0.26%
Frame src                       583   0.30%       592    0.21%
Script src                      535   0.28%       535    0.19%
Applet codebase                   9   0.00%        10    0.00%
Object classid                    3   0.00%         3    0.00%
Input src                         3   0.00%         3    0.00%
Object codebase                   2   0.00%         2    0.00%
Script for                        1   0.00%         1    0.00%
Total                       191,737           277,890
In the navigation process by a Web browser, only the anchor links, i.e. “A href” and “AREA
href”, are shown as active areas. In this paper, these types of links will be defined as “navigational
links”. From the data, there are 167,291 navigational links (60.20 % of all links). The situation is
made more complicated by the fact that while the user does not “navigate” links that are a part of the
frame source structure, these do represent connections. “FRAME src” and “IFRAME src” links are
defined, for definitional clarity, as structural links.
In the graph structure representation of a Web site, anchors and frames were used to
connect HTML files. In this paper, “connections” are defined as the number of connected pairs of
HTML nodes within the site, regardless of how many navigational links and structural links connect
them, pointed to by the “A href,” “AREA href,” “FRAME src” and “IFRAME src” tags-attributes.
There were 72,651 connections. Figure 8 shows a breakdown of the URLs identified at each site
classified as the total number of URLs discovered, the number of URLs pointing to nodes within the
site, and the number of in site nodes that were classed as HTML nodes. The graph is ordered by site’s
rank of total URLs. The graph shows that the rank versus total URLs distribution can be described by
a power law. Note that the number of URLs is in log scale. The distribution is similar to the
Huberman & Adamic (1999) prediction.
The distribution of the number of links at each Web site is shown in Figure 9. The Figure
shows the total number of links, internal and external at the site along with the number of
navigational links to other nodes within the site, i.e. navigational links to external sites are not
counted. Finally, the figure shows the number of connections to other nodes in the site. Note that the
graph uses a log scale.
Figure 8: Number of URLs
Figure 9: Number of links
Descriptive statistics of the number of URLs and links are shown in Table 3. Note that the
numbers of URLs/site and links/site have high standard deviation and skewness values.
The histogram of the number of HTML nodes and connected links is shown in Figure 10. There are high
correlations between the total number of URLs discovered, the number of URLs pointing to nodes
within the site, and the number of nodes in a site (r = 0.935 - 0.992, detail in Appendix D, Table 44).
These show that the ratio of HTML nodes to URLs within site is approximately constant (mean 0.27,
SD 0.13). There are also high correlations between each of the link type groups (r = 0.960 – 0.969,
detail in Appendix D, Table 45). These show that the ratio of connected links to total links within site
(mean 0.23, SD 0.12) and ratio of navigation links to total links within site (mean 0.64, SD 0.17) are
approximately constant.
Table 3: Descriptive Statistics of number of nodes and links
                  Minimum   Maximum    Mean      Std. Dev.   Skewness
Total URLs           13       8476      601.64    1225.34     5.011
URLs within site     13       8035      465.16    1128.67     5.489
HTML nodes            2       1842      150.09     256.03     4.341
Total links          16      48652     3472.52    7848.87     4.709
Navigation links      8      26911     2090.81    4107.55     4.085
Connections           1      14678      908.14    2162.28     4.648
Figure 10: Histogram of number of HTML nodes and number of connections
The total URLs versus total links and HTML nodes versus connections were plotted and are
shown in Figure 11. The plot is in log-log scale. Note that the number of links is at least the number
of nodes minus one because of the scanning process using the spider software: only connected
components were discovered. The upper limit of the number of connections is n² − n where n is the
number of HTML nodes. There is a high correlation between HTML nodes and connections (Pearson
Correlation r = 0.82, p < 0.0001).
Figure 11: Total URLs versus total links and HTML node versus connections of each site
With HTML files as nodes of the graph and links as the edges of the graph, an all-pair-
shortest-path algorithm was applied to compute all distances between nodes. The distance metrics
were computed on three graphs: the directed graph, the bi-directional graph and the “jump to root” graph.
Given the nature of HTML links, the hypertext network structure is a directed graph. The bi-directional
graph assumes no direction in the hypertext structure. The distance computed on the bi-directional
graph is the lowest distance between nodes that the structure allows. The “jump to root”
graph is computed by assuming the distance of a non-connected pair of nodes, in the directed graph, to be
the distance from the root to the target node plus one, i.e. the jump from the source to the root node.
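A minimal sketch of these distance computations (Floyd–Warshall for the all-pairs shortest paths; the matrix layout and names are illustrative). For the bi-directional graph, the matrix is first symmetrized with dist[i][j] = min(dist[i][j], dist[j][i]):

```java
/** All-pairs shortest paths (Floyd-Warshall, O(n^3)). dist[i][j] starts as the
 *  edge weight (1 for anchor links, 0 for frame links), 0 on the diagonal,
 *  and Double.POSITIVE_INFINITY where no edge exists. */
public static void floydWarshall(double[][] dist) {
    int n = dist.length;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

/** "Jump to root": an unreachable pair (i, j) is assigned the distance from
 *  the root to j plus one, the assumed one-click jump back to the root. */
public static void jumpToRoot(double[][] dist, int root) {
    int n = dist.length;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j && Double.isInfinite(dist[i][j]))
                dist[i][j] = dist[root][j] + 1;
}
```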
The following data and metrics were collected or computed to provide an indication of the
Web site structure:
Number of HTML pages
Number of connections
Number of connections per number of HTML node - 1
Connection ratio – defined as the number of connected node pairs per number of all possible
connected node pairs (see the formula after this list)
Mean of directed distance – computed based only on connected node pairs
Median of directed distance
Standard deviation of directed distance
Skew factor of directed distance
Compactness
Stratum
Mean bi-direction distance
Mean jump-to-root distance
Mean of distance from root node
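In notation (introduced here for clarity, not taken from the original), the connection ratio listed above counts the ordered node pairs reachable in the directed graph:

$$CR = \frac{\left|\{(i,j) : i \neq j,\ d_{ij} < \infty\}\right|}{n^{2} - n}$$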
Descriptive statistics for the data are shown in Table 4.
Table 4: Descriptive statistic of Web site properties
                                      N    Min     Max      Mean     Std. Dev.   Median   Skewness
HTML nodes                           80    2       1842     150.09    256.03      61       4.341
Connections                          80    1       14678    908.14   2162.28     215.5     4.648
Connections per HTML node - 1 ratio  80    1.0000  22.9394    4.9901    4.7036     3.113    2.060
Connected ratio                      80     .0298   1.0000     .6333     .3031      .705    -.551
Compactness                          80     .0296    .9912     .6146     .2938      .676    -.532
Stratum                              79     .0017   1.0000     .0930     .1243      .072    5.227
Directed distance                    80     .8000   6.5762    2.7563    1.2331     2.536    1.121
Bi-direction distance                80     .7000   7.3026    2.5345    1.0782     2.275    1.419
Jump to root distance                80    1.0000   8.1699    2.9804    1.2398     2.889    1.268
Root distance                        80     .5000   7.3783    2.4147    1.0528     2.229    1.676
The distribution of connections per HTML node − 1 is shown in Figure 12. Only a small
number of sites have a connections per HTML node − 1 ratio higher than 10.
However, nodes are connected within a Web site more than between Web sites, as reported by
Broder et al. (2000). Broder further found that 75% of randomly selected pairs of nodes would not
have a directed connection path. The connected ratio showed that, on an average site, 63% of the
pairs of nodes are connected; thus, 37% have no connection. Moreover, from the distribution of the
connected ratio, shown in Figure 13, and the skewness of the connected ratio, there is an indication of
a trend toward a high connected ratio. The data also show a very high correlation between connected
ratio and compactness (Pearson Correlation r = 0.998, p < 0.001). There are high correlations
between connections per HTML node − 1 and both connected ratio and compactness (r = 0.594, p < 0.001
and r = 0.619, p < 0.001 respectively).
Figure 12: Histogram of #connections per #HTML node − 1 (Mean = 5.0, Std. Dev. = 4.70, N = 80)
Figure 13: Histogram of connected ratio (Mean = .63, Std. Dev. = .30, N = 80)
Stratum values of most Web sites are small. The distribution of stratum values is shown in
Figure 14. This suggests that most Web sites have several paths to travel between nodes.
The mean distance between nodes in Web sites has a distribution as shown in Figure 15. Note
that the directed distance mean is computed based on connected nodes only. The bi-direction distance
mean is computed from the mean distance of all pairs of nodes. On average, the mean distance
between nodes within a Web site is 2.75 if they are connected. The mean bi-direction distance, on
average, is 2.53. The mean jump-to-root distance, on average, is 2.98. The mean distance from the root
node, on average, is 2.41. Most of the sites have mean distances within 4. The mean value of the
distance from root indicates that most Web sites are shallow. Some of the mean distance values are
less than one because a zero distance is used between “Frame” nodes. The data show high
correlations between the distance measurements (Appendix D, Table 46).
Figure 14: Histogram of stratum (Mean = .09, Std. Dev. = .12, N = 79)
Figure 15: Histograms of distances (directed distance: Mean = 2.76, Std. Dev. = 1.23; bi-direction distance: Mean = 2.53, Std. Dev. = 1.08; jump to root distance: Mean = 2.98, Std. Dev. = 1.24; root distance: Mean = 2.41, Std. Dev. = 1.05; N = 80)
It is expected that when the number of nodes increases, the distance between nodes in the
Web site will increase. The relationship between mean directed distance and number of nodes is
shown in Figure 16. This graph shows a tendency toward a higher distance between nodes when the
number of nodes is high. However, some Web sites had a shorter distance between nodes compared
to other sites with a similar number of nodes. The correlation between the number of HTML nodes and
the distance metrics was not high (r = 0.507-0.647, Appendix D, Table 47).
The scatter plots between all pairs of metrics are shown in Figure 17 and the details of the
correlations between all pairs of metrics are shown in Appendix D, Table 47.
In conclusion, these preliminary studies show that there is a high correlation between number
of nodes and number of links, but a low correlation between nodes and distance. Also, structure as
measured by number of connections per HTML nodes − 1, stratum, connected ratio and compactness
appears to provide additional data about the complexity of a Web site.
In this study, complex sites, those that might be predicted to benefit from a tool to visualize
a site, might be characterized by large size, far distances and a small connected ratio. Those that would
not require such a tool would be small in size, with close distances and a high connected ratio. From the data,
“small size” Web sites are defined as Web sites with fewer than 60 HTML nodes (the lower 50% of the
rank), and “large size” Web sites as Web sites with more than 60 HTML nodes. “Close
distance” Web sites are defined as Web sites with a mean root distance less than 2.4 (the mean of root
distance), and “far distance” Web sites as Web sites with a mean root distance more than
2.4. “Small connected ratio” Web sites are defined as Web sites with a connected ratio less than 0.704
(the median of connected ratio), and “high connected ratio” Web sites as Web sites with a
connected ratio more than 0.704. The numbers of Web sites in each category given by these
classifications are shown in Table 5 (a classification sketch follows the table).
Table 5: Number of Web Sites by their complexity
Root distance             Close                        Far
Connected ratio           High    Small   Total        High    Small   Total    Grand Total
Size   Small              19*     14      33           4       3       7        40
       Large              5       6       11           12      17**    29       40
Grand Total               24      20      44           16      20      36       80
* low complexity
** high complexity
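Read as a rule, the Table 5 classification can be sketched as follows (thresholds from the text: 60 HTML nodes, 2.4 mean root distance, 0.704 connected ratio; the method and names are illustrative):

```java
/** Classifies a site by the thresholds derived in this section. */
public static String complexity(int htmlNodes, double meanRootDistance, double connectedRatio) {
    boolean large = htmlNodes > 60;
    boolean far = meanRootDistance > 2.4;
    boolean smallRatio = connectedRatio < 0.704;
    if (!large && !far && !smallRatio) return "low complexity";  // small, close, high ratio
    if (large && far && smallRatio) return "high complexity";    // large, far, small ratio
    return "mixed";  // the remaining Table 5 cells were not used in the experiment
}
```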
Figure 16: Mean directed distance and bi-direction distance versus Number of HTML nodes
Figure 17: Scatter plots between Web site parameters (#nodes, #connected links, links per node, connected ratio, compactness, stratum, directed distance, bi-direction distance, jump distance, root distance)
3.1.4 Task and Semantic Relatedness
This study examines navigation in support of information finding tasks, specifically, a
targeted search task. Subjects will be asked to answer questions that may or may not be answerable
based on where the information is located in a Web site that is new to them. The extent, organization,
and content will be new. This kind of task is one frequently undertaken in the WWW context (GVU,
1998). For the targeted search task, only answerable questions will be used in the controlled
experiment. A question that is not answerable would require a great deal of effort from subjects; they might
have to look at all the pages in the Web site.
The “information scent”, or semantic relatedness between information needs (i.e. given
questions) and information provided by the site through tools (node names, the content of Web pages
and anchor text), plays a major role in navigation. If there are no hints from the node name in a
graphical overview, the advantage of being “one click away” cannot be used. In contrast, in a highly
complex environment, with multiple links between nodes and minimal intermediate path information,
browsers may offer significant advantages, both in the short run, i.e. an information-finding task, and
in the long run, i.e. a space structure modeling task. In this study, the information scent will be
measured and controlled.
The operational information scent score can be found in Pirolli, Card, & Wege (2000). It is
defined as follows:
“Information scent = the proportion of participants who correctly identified the location of the
task answer from looking at upper branches in the tree.” (Pirolli, Card, & Wege, 2000) p. 5.
The tree browser, i.e. Microsoft Window Explorer, was used in their experiment.
In this paper, because a Web site has a network structure, a different technique is used for
measuring the information scent. It is not practical to measure the information scent of all pages in a Web
site because of its size. The shortest path from the root page was chosen as representative of the
information scent. The root page is a common entry point to the Web site and it is the entry point for
the main experiment. For a given target page of a specific question, the shortest path from the root page
to the target page was identified. Many shortest paths are possible but only one is selected. The
pages in the selected shortest path were presented to subjects, who selected the anchor(s) which they
believed would lead to the target node. An anchor was weighted positively if it led toward the target page, and
negatively if it moved farther from the target node. The information scent of the question and the
Web site is defined as the sum of the average scores of the pages within the shortest path. In order to compare
between questions and Web sites, the information scent is normalized by path length. The average of
information scent from 10 subjects was used.
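In notation (introduced here for clarity, not taken from the original), the scent of question q on a site is the path-length-normalized sum of per-page average scores:

$$scent(q) = \frac{1}{|P_q|} \sum_{p \in P_q} \bar{s}(p)$$

where $P_q$ is the set of pages on the selected shortest path and $\bar{s}(p)$ is the average anchor-selection score of page p over subjects.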
3.1.5 Navigational Tools and Integration
Two common navigational tools that are frequently used in a network-structured document
space are: the browser, with visible anchors connected to links; and the graphical overview, a graph showing
link structure and nodes. The browser is the most common navigational tool in the WWW. Many
graphical overviews have been developed in hypertext research. There is theoretical support that they
will be useful in WWW navigation.
A browser provides navigation capability as well as document content presentation. A
browser presents only one page, or node, at a time. It is similar to navigating with an egocentric view. In
contrast, a graphical overview presents a view of the overall structure of a hypertext, an exocentric
view. Depending on the size of the document space, a graphical overview may present only a local
overview of the space. With a scroll bar, other areas can be shown. A graphical overview navigates a
document space via active graphical objects. In order to access the content of a node, a graphical
overview has to be integrated with a text viewer or browser.
Browsers and graphical overviews are different tools. One would expect that they would be
better for navigation of one or another kind of space. In general, the browser may be better for a
highly structured document space. For instance, a linear list hypertext may be visited in order, using a
browser. In contrast, a highly complicated space might be better navigated with a graphical overview.
Every node, regardless of its distance from the current node, is only one click away in a graphical
overview. More abstractly, a graphical overview will be of more use where the user either has or is
able to construct a simple mental model of the document space.
The anchor in a browser generally provides more information about nodes and links than
does a graphical overview. It can be used to provide more semantic information about the target node
in a limited display space. The semantic information in an anchor combined with the surrounding text
information allows a more accurate path selection than is possible with a graphical overview. A
graphical overview may show only the structure and a brief node name.
While many implementations of browsers and graphical overviews exist, experiment-specific
navigation software was developed in order to automatically capture user interaction with the system.
The software was written in Java. The browser program was written using the “Web Browser”
object provided by the Microsoft development platform. The interface of the browser is a simplification
of the Web browser (i.e. Internet Explorer); the features were minimized to those needed to navigate in a
Web site. A sample browser screen is shown in Figure 18. The graphical overview was
written by modifying the generic graph viewing program from Visualizing Graphs with Java (VGJ,
1998). The program was modified to read Web site structure data from the Spider. There
are many ways to present a Web site structure and many interaction techniques, as previously
discussed. A simple presentation is used, showing the Web structure as a 2D planar graph. A tree
layout algorithm is used, and the Web site graph is simplified by a breadth-first search. The graphical
overview uses the text viewer to present the detail of the selected node. A sample of the graphical
overview and its text viewer screen is shown in Figure 19. The graphical overview also provides
panning and zooming functions.
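A minimal sketch of that simplification (the adjacency-list representation and names are assumptions): a breadth-first search keeps, for each node, only the edge on which it is first discovered, yielding a spanning tree for the tree layout.

```java
import java.util.*;

/** Reduces the site graph to a spanning tree by breadth-first search. */
public static int[] bfsTree(List<List<Integer>> adj, int root) {
    int[] parent = new int[adj.size()];
    Arrays.fill(parent, -1);                 // -1 = not yet reached from the root
    Deque<Integer> queue = new ArrayDeque<>();
    parent[root] = root;
    queue.add(root);
    while (!queue.isEmpty()) {
        int u = queue.poll();
        for (int v : adj.get(u)) {
            if (parent[v] == -1) {           // first discovery wins; later edges are dropped
                parent[v] = u;
                queue.add(v);
            }
        }
    }
    return parent;                           // tree edges (v, parent[v]) feed the tree layout
}
```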
As described previously, there are many ways to integrate navigational tools. This study used
a synchronized condition. The browser and the graphical overview were synchronized, i.e. navigation
using either tool causes a change in the other view, and the selected object is the same in both
tools. The display layout of the tools was side-by-side, without the ability to adjust it. This layout was chosen
because it made both tools available to the user all the time. Multiple independent windows can cause
distraction in performing experimental tasks, i.e. adjusting the window size or moving a window to
the front or back. The integrated browser and graphical overview screen is shown in Figure 20.
Figure 18: The browser screen snapshot
Figure 19: The graphical overview and text viewer screen snapshot
Figure 20: The graphical overview and the browser
3.1.6 Summary
In conclusion, it was hypothesized that navigational tool performance depends on the
semantic relatedness between the information presented and the information needed, and on the complexity
of the document space. The complexity of a document space may be described by the document space’s size
and its structure. However, document structure as described in Schoon’s taxonomy is subjective. The
number of HTML nodes, mean root distance and connected ratio are proposed for this study as
sufficient metrics.
In a space where there is a strong semantic relationship between the question and the
content or node name, a user knows where to navigate. In the case of the browser, the structure of the
document space will have an impact on navigation performance. For example, a star configuration
will allow one-step access to information. In contrast, a linear list will require n steps, where n is the
distance between the original node and the target node. In the case of the graphical overview, it is
possible to find the answer node in one jump.
In a space where there is a weak semantic relationship between question and node context,
navigation may be considered a random walk through the space. Size, average distance between
nodes, and variation of distance between nodes will cause differences in navigation performance.
However, the effect may not be equal to that of using the graphical overview and the browser
together.
3.2 Hypotheses
This research examines the effect of integrated navigational tools on information finding in
closed hypertext. Navigational tools may operate at different levels of performance in different
environments. Integrated tools may be useful in selected environments under given predictable
conditions.
The Null hypothesis of this research is:
H0: There is no difference in user performance in information-finding tasks between integrated
navigational tools and individual navigational tools.
The working hypothesis is:
H1: There are significant differences in user performance in information-finding tasks when using
different navigational tools in certain kinds of environments.
Two navigational tools will be used, the graphical overview and the browser. The study will
assess three navigational tool conditions:
The browser alone, with a “back” facility (in essence, a history list)
The graphical overview with a text display window (no link following capability in a text
display)
The browser and the graphical overview with both tools synchronized.
Further, the closed hypertext, where information-finding tasks are carried out, will be
controlled in terms of the “complexity” of the hypertext and the “information scent.”
H1a: Integrated navigational tools, i.e. the browser and the graphical overview, will provide higher
performance in information-finding tasks and navigation within complex Web site spaces with high
information scent than will the browser or the graphical overview alone.
Performance will be measured by the following:
Number of tasks completed
Number of answers found
Time to complete the task
Total number of page views
Total number of pages
Total number of re-visited page views
Total number of extra page views
The Web sites will be measured by using the following criteria:
Number of HTML nodes small/large
Mean root distance small/large
Connected ratio small/large
Web site complexity is defined as high for sites with a large number of HTML nodes, a high mean root
distance and a small connected ratio, and as low for sites with a small number of HTML
nodes, a small mean root distance and a high connected ratio. Two sets of Web sites were selected
based on the low/high complexity measurements.
The information-finding tasks were conducted by giving subjects questions and asking them
to find the Web pages that contain the answers to the given questions.
The Web sites’ “information scent,” or the relationship between the question asked and the
amount of semantic information in the presentation, was measured as described in section 3.4.2, and
each question at each Web site was classified as having low/high information scent.
Additional Hypotheses are:
H2: Subjects will perform better when using the browser than when using the graphical overview in
simple structured Web sites with little information scent.
Auxiliary Hypotheses
H3: Subject performance when using integrated navigational tools will degrade with the simplicity
of the hypertext, as the tool becomes a noise contributor rather than an information provider.
The ceiling effect and familiarity with a navigational tool were also of concern. The ceiling effect
might occur when a single navigational tool can already be used at the highest performance; thus, the
improvement in performance due to an integrated navigational tool might not be able to be shown.
Regarding familiarity, the single familiar navigational tool in an integrated environment might be the
only one used and, as a result, no performance improvement would be achieved.
3.3 Participants
108 subjects (54 men and 54 women, using the Internet for more than one year) were recruited from
the University student population. Subjects were randomly assigned to each condition. Subjects were
paid (15 $US). Each subject performed a total of 27 information-finding tasks, using three navigational
tool conditions. The experiment was expected to be completed in 90 minutes. Individual differences
such as gender and experience can affect an experiment, as shown in the previous discussion. The
gender groups and experience were controlled in the recruitment process. This experience level (using the
Internet for more than one year) represents the majority of Web users (GVU, 1998).
3.4 Material
3.4.1 Web Sites
Eight Web sites from the preparatory study were used in this experiment: three high
complexity Web sites and three low complexity Web sites were used for the main tasks. This selection was
based on their complexity properties and additional properties, such as small numbers of pages that
contain programs and frames, a small number of error pages and a high variety of title text. Two Web
sites were selected for practicing. The list of selected Web sites and their properties is shown in
Appendix E. The selected Web sites were re-scanned in October, 2000 and copied into local storage. The
links to other sites were removed. The search facility, input forms, server-side queries and Java applets
were removed.
3.4.2 Questions and their Information Scent
Questions used in the experiment were prepared and controlled by the following conditions:
The questions were related to the content of the Web site.
The questions were specific. The answer to the question appears in the textual content of the
Web page. There was no need to derive information from the Web page to answer the
question.
The questions were prepared by randomly selecting a Web page as a target of searching. The
given Web page was used for generating questions based on the content of the page. Six questions
were prepared for each Web site. The selected target Web pages and questions are shown in
Appendix F, Table 48. There were some pages that were difficult to create questions for, i.e. pages
that only contained links to other pages and pages that contained “The information you have
requested is being compiled and is not yet ready for posting here.” These pages, root pages, and
already-selected pages were skipped and a new page randomly selected.
The graphical overview screen snapshot of the Web site was generated. The label in the
graphical overview was extracted from the title of the Web page. If the Web page did not contain a title,
one was generated from the heading or headings on the page or from the file name.
The Information scent or semantic relatedness between the question asked and presentation
was measured by the experiment discussed below. From the result of the information scent
experiment, four questions (two high information scent questions and two low information scent
questions) were selected for each selected Web site.
Information Scent Measurement Experiment
Ten subjects were recruited from the University student population. Subjects were given a
total of 36 questions (6 questions for each of the 6 selected Web sites). For each question, one graphical
overview and a set of Web pages were presented to the subject. The Web pages were on the shortest path
from the starting page to the page that precedes the answer page. The pages selected for the information
scent experiment and the questions are shown in Appendix F, Table 48.
For each question, subjects were asked to select and rank 3 of the labels in the graphical
overview and 3 anchors on the Web pages that they believed would lead to the answer page. The number of Web pages for
each question varied. A total of 74 Web pages were used.
A program was written to present the questions, graphical overviews and Web pages, and to collect
the data. The instruction sheet used is shown in Appendix F.1. Subjects were not allowed to go back to
previous pages or questions. There was no time limit. The ordering of the questions presented to each subject
was random. The Web pages of each question were presented in the same order as they would be accessed by the
browser from the root node to the target node.
The graphical overview’s information scent score is a weighted total of the number of users
who selected the target Web page label. Weights of 1, 0.5 and 0.3 are assigned to the first, second and
third selection, respectively.
The browser’s information scent score is computed as follows (a code sketch follows this list):
The selected anchor from each page is weighted by its order. Weights of 1, 0.5 and
0.3 are assigned to the first, second and third selection, respectively.
An anchor is also weighted by +1 if the anchor leads toward the target node and -1 if the
anchor leads farther from the target node.
The sum of the selection-order weight and the anchor weight is the Web page information scent
score.
The average of the Web pages’ information scent scores is used to represent the Web site
information scent score of each question.
The average of the subjects’ scores represents the scent of the question in the Web site.
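A sketch of one subject's browser scent score under one reading of these rules, in which the rank weight is signed by the anchor's direction (the data layout, names, and this product interpretation of combining the order weight with the anchor weight are assumptions):

```java
/** One subject's browser scent for one question. towardTarget[p][r] is true
 *  if the subject's rank-r anchor choice on shortest-path page p leads
 *  toward the target page, false if it leads farther away. */
public static double browserScent(boolean[][] towardTarget) {
    double[] rankWeight = {1.0, 0.5, 0.3};   // first, second, third selection
    double total = 0;
    for (boolean[] page : towardTarget) {
        double pageScore = 0;
        for (int r = 0; r < page.length && r < 3; r++)
            pageScore += rankWeight[r] * (page[r] ? 1 : -1);
        total += pageScore;
    }
    return total / towardTarget.length;      // normalized by path length
}
```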
The information scent score of each question is the average of the graphical overview scent
score and the browser scent score. The results from the experiment are shown in Appendix F, Table
49 and Figure 21. The information scent score was used to select and classify questions. Four
questions per Web site, the two with the lowest information scent scores and the two with the highest,
were selected for use in the main experiment. The selected questions are listed in
Appendix F, Table 48.
Using the mean of the browser scent score and the mean of the graphical overview scent score,
questions were classified into four groups: browser low/high scent score and graphical overview
low/high scent score, as shown in Table 6. High information scent questions had a high information
scent score in both the graphical overview and the browser. Low information scent questions had a
low information scent score in both the graphical overview and the browser.
Figure 21: Information scent score for each question (average map information scent, average pages information scent, and average information scent)
Table 6: Questions classification based on their information scents

                          Map information scent
Web information scent     < 0.39 (low)                            >= 0.39 (high)
< 0.22 (low)              1Q4L1*, 1Q5L2*, 2Q4L1*, 2Q5L2*,         1Q2H2, 1Q6L3, 3Q1H1,
                          3Q4L1*, 3Q6L3*, 4Q4L1*, 4Q5L2*,         5Q1H1, 5Q6L3
                          4Q6L3, 4Q1H1, 5Q4L1*, 5Q5L2*,
                          7Q1H1, 7Q3L1*, 7Q4L2, 9Q5L2*, 9Q6L3*
>= 0.22 (high)            2Q2H2, 2Q3H3, 3Q5L2                     1Q1H1**, 1Q3H3**, 2Q1H1**,
                                                                  2Q6L3**, 3Q2H2**, 3Q3H3**,
                                                                  4Q2H2**, 4Q3H3**, 5Q2H2**,
                                                                  5Q3H3**, 9Q1H1**, 9Q2H2**,
                                                                  9Q3H3, 9Q4L1

* Selected as low information scent.
** Selected as high information scent.
A reliability analysis (scale alpha) was applied to the selected questions with the ten
subjects’ information scent scores. The 12 high information scent questions showed high reliability (alpha
= 0.7730) and the 12 low information scent questions showed low reliability (alpha = 0.1403).
Reliability analysis might be useful for classifying questions by their information scent.
The summary of the selected questions’ information scent scores, grouped by Web site
complexity, is shown in Table 7. The overall information scent scores in the high complexity Web sites
and the low complexity Web sites were similar.
The summary of the minimum pages required to find the target page of the selected questions,
grouped by Web site complexity and question type, is shown in Table 8. The minimum number of pages
required to find the target page is used in calculating the number of extra page views, as will be
discussed later in this chapter. The minimum number of pages required to find the target page was high
in the high complexity Web sites because the randomly picked target pages in high complexity Web sites
tended to have a large distance from the root node.
Table 7: Summary of the information scent of the selected questions
Web site     Question   Map information scent   Pages information scent   Overall information scent
complexity   type       Avg.     Std. Dev.      Avg.     Std. Dev.        Avg.     Std. Dev.
High         High       0.52     0.253          0.49     0.431            0.51     0.291
             Low        0.19     0.241          -0.04    0.298            0.07     0.225
Low          High       0.74     0.265          0.35     0.306            0.54     0.244
             Low        0.11     0.145          0.08     0.267            0.09     0.153
Table 8: Summary of the minimum pages required to find the selected target nodes

Web site complexity   Question type   Sum of minimum pages   Average of minimum pages
High                  High            28                     4.67
                      Low             29                     4.83
High Total                            57                     4.75
Low                   High            17                     2.83
                      Low             18                     3.00
Low Total                             35                     2.92
Grand Total                           92                     3.83
3.4.3 Software
The study assessed three navigational tool conditions:
The browser alone, with a “back” and “forward” facility (in essence, a history list)
The graphical overview with a text display window (no link-following capability in a text
display)
The browser and graphical overview with both tools synchronized and displayed on the
screen.
A footprint facility, i.e. the color of an icon label changed when the corresponding Web page
was visited, was provided. Other navigational facilities such as the history list, content index, and
search were not used. The display of the browser and graphical overview had a fixed size and
structure.
The experimental software was a combination of the browser, the graphical overview, and the
integrated tool. Practice tasks were presented first by the software. Then, a predetermined sequence of
questions, Web sites, and tools was presented in order. The experimental software captured events
generated when the user clicked the mouse. The events related to the navigation process were
recorded in a database. The software also recorded the identification of the submitted page. The
next task in the sequence was automatically activated after the result was submitted. The subject could
not go back to previous questions. Between each tool’s task set, there was a one-minute waiting
screen.
The timer was shown in the interface. When the time limit expired, a dialog box was
presented to the subject; the answer was recorded as incorrect and the time was set equal to the time
limit. After the experimental task was completed, a questionnaire screen was shown. The instructions
and screen snapshots of the software are shown in Appendix G.
3.5 Experimental Design
The three independent variables are tool (3 levels), question (2 levels) and Web site (2 levels).
Tools are the browser, the graphical overview and the integrated tool. Questions are classified and
selected based on information scent score as high information scent and low information scent. Web
sites are classified as high complexity and low complexity. The experiment is a full factorial 3 x 2 x 2
within-subjects design.
Browser                     Graphical overview          Browser + Graphical overview
Web l        Web h          Web l        Web h          Web l        Web h
IS l  IS h   IS l  IS h     IS l  IS h   IS l  IS h     IS l  IS h   IS l  IS h
Within-subjects testing was selected in order to minimize the effects of individual
differences. Each subject performed tasks with three tools, two question types, and two Web site types.
Each subject performed a total of 3 x 2 x 2 = 12 conditions. There were two questions for each
condition, repeated for more reliability, for a total of 24 information finding tasks for each subject.
In order to minimize knowledge about a Web site gained while browsing, which would help in
navigation, there were a total of 6 Web sites: 3 Web sites for each Web site condition. Each Web site
was seen four times in the information-finding tasks, i.e. two questions in the low information scent
condition and two questions in the high information scent condition. In order to eliminate a Web site
difference versus tool condition effect, each set of Web sites was arranged in a Latin square block. Each
pair of a high complexity Web site and a low complexity Web site was also blocked. For instance, high
complexity Web site w1 was paired with low complexity sites w2, w4 and w6.
However, this design might lead to a sequence effect. The sequence effect was compensated for by
the fact that the ordering of navigational tools was counterbalanced by using all sequences. The
order of the Web sites in each navigational tool condition was counterbalanced by using all sequences. The
ordering of the four questions in each Web site was random.
Using a power analysis for a 3x2x2 factorial design with the significance level at 0.05 (non-
directional), a small effect size (i.e. f = 0.01), and power at 0.8, at least 82 subjects per cell
were indicated. To develop a fully counterbalanced design, 108 subjects were decided
upon.
3.6 Experimental Task
The information-finding task was simplified in this experiment. Subjects were given a direct
question and told to find a Web page that contained the answer within the time limit for each task.
3.7 Procedure
Subjects were randomly assigned to experimental conditions. Subjects were briefed about the
experiment’s objectives. Subjects were asked to perform tasks as fast as possible with correct
results. Subjects were trained for 2 minutes in the use of each navigational tool. Subjects were then
allowed to practice using all three navigational tools with a dummy hypertext and 3 practice questions.
In the experimental session, subjects used each tool to find the answers to four questions
on the assigned Web site. Subjects were limited to finding the answer within a 6-minute period. If
the subject could not find the target page within the time allotted, the answer was assumed incorrect
and the time was recorded as 6 minutes.
After finishing the experiment, subjects filled out a questionnaire which collected
demographic information, a Web site familiarity score form, and subjective evaluation information.
3.8 Data Collection and Measurement
The navigation activity logs were used to capture measurement data. The navigation activity
log contains the subject’s identification number, the identification numbers of the visited
nodes (Web pages), and the time stamps. It was generated by the software used in the experiment.
In addition to the navigation activity log, the software also captured the source of each navigation
action. For instance, the browser has three methods of navigation: following links, “back” and
“forward.” In the graphical overview, there are three methods of navigation: clicking on an
icon, “back” and “forward”. In the integrated tool, the navigation methods include following links, “back”
and “forward”, and clicking on an icon. With the integrated tool, the number of navigational actions
made with the graphical overview and with the browser was reported.
Time spent in each part of the tool was recorded. This was approximated as the total time the
mouse cursor was in each tool’s area. The timer for each tool started counting when there was a
mouse “button down” action on the tool and stopped when another tool got a mouse
action. Mouse actions included scroll bar movement.
The pages viewed by the subject are defined as follows:
Page views – the total number of page views in the browser by the subject.
One page viewed three times would constitute three “page views.”
Pages – the total number of unique pages; duplicate viewings are not counted.
Revisited page views – the number of viewings of pages beyond the initial
view. Revisited page views would be three if one page was viewed four times or if
three pages were viewed twice.
Extra page views – when using the browser, the extra page views is the number of
page views minus the shortest path length. Whether the page views include the shortest path
pages is orthogonal to the calculation. When using the graphical overview or the
integrated tool, the extra page views is the number of page views minus two.
The relation between page views, pages, and revisited page views is the following:
number of page views = number of pages + number of revisited page views
The number of extra page views is, in theory, the number of page views that were not necessary for navigation. It is calculated by subtracting the number of pages necessary to perform the task from the number of page views. In the graphical overview and the integrated tool, only two pages are necessary for the task: the first page and the target page. For the browser, the number of necessary nodes is equal to the distance from the root page to the target page, which depends on the question. The number of extra page views is highly correlated with the number of page views.
In an information-finding task, fewer page views may be considered more efficient. In general, revisited page views represent a loss in the navigation process.
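These four measures follow directly from the definitions above; the sketch below computes them from an ordered list of visited node identifiers (the function name and input layout are illustrative).

    def page_view_metrics(visits, minimum_pages):
        # visits: node ids in the order viewed during one task.
        # minimum_pages: shortest-path length for the browser, or 2 for
        # the graphical overview and the integrated tool (first page plus
        # target page).
        page_views = len(visits)
        pages = len(set(visits))
        return {
            "page_views": page_views,
            "pages": pages,
            "revisited_page_views": page_views - pages,
            "extra_page_views": page_views - minimum_pages,
        }

    # One page viewed three times: 3 page views, 1 page, 2 revisited views.
    print(page_view_metrics(["p1", "p1", "p1"], minimum_pages=1))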
Demographic data was collected with the form shown in Appendix H.1. The Web site familiarity score form is shown in Appendix H.2. User preference was measured by a subjective satisfaction questionnaire, shown in Appendix H.3, based on the Post-Study System Usability Questionnaire (PSSUQ) (Lewis, 1995).
4 RESULTS AND DISCUSSION
4.1 Demographic Data of Recruited Subjects
A total of 111 subjects (55 male, 56 female) who had at least one year's experience using a Web browser (i.e., Internet Explorer or Netscape) were recruited from the University of Pittsburgh student population. Three subjects did not complete the experiment due to personal time constraints and a software problem, leaving a total of 108 for whom results are reported. The demographic data of the subjects are shown in Table 9; their computer and Web experience are shown in Table 10.
Table 9: Summary of subjects' demographic data

Category      Range                                 Frequency  Percent
Gender        Male                                  54         50%
              Female                                54         50%
Age (Years)   16-20                                 33         30.6%
              21-25                                 30         27.8%
              26-30                                 25         23.1%
              31-35                                  9          8.3%
              36-40                                  6          5.6%
              41-45                                  2          1.9%
              46-50                                  3          2.8%
Education     Freshman                              16         14.8%
              Sophomore                             18         16.7%
              Junior                                14         13.0%
              Senior                                20         18.5%
              Graduate School                       40         37.0%
Major         Arts and Sciences                     39         36.1%
              Information Science                   32         29.6%
              Business                              11         10.2%
              Engineering                           10          9.3%
              Law                                    3          2.8%
              Education                              2          1.9%
              Health and Rehabilitation Services     1          0.9%
              Nursing                                1          0.9%
              Pharmacy                               1          0.9%
              Public and International Affairs       1          0.9%
              Public Health                          1          0.9%
              Other                                  6          5.6%
Table 10: Summary of subjects' computer experience data

Category                  Range         Frequency  Percent
Computer Experience       3-5           31         28.7%
(Years)                   6-8           23         21.3%
                          9-11          30         27.8%
                          12-14         13         12.0%
                          15 or more    11         10.2%
Web Experience            1-3           18         16.7%
(Years)                   4-6           66         61.1%
                          7-10          24         22.2%
Web Usage                 0-1            3          2.8%
(Hours/week)              2-4           11         10.2%
                          5-6            6          5.6%
                          7-9            4          3.7%
                          10-20         53         49.1%
                          21-40         27         25.0%
                          > 41           4          3.7%
Web browser familiarity   Novice         9          8.3%
                          Intermediate  58         53.7%
                          Expert        41         38.0%
The subjects were equally balanced between men and women, a result of a controlled recruiting process intended to counter the problem reported in GVU (1998) that women reported more navigation problems than men. The average age of the subjects was 25 years, with the majority being undergraduates (73% of the total). They came from a variety of disciplines and reported extensive computer experience (average 8.9 years), Web experience1, and Web usage (average Web experience 5 years, average Web usage 17 hours per week).
4.2 Results
Each of the 108 subjects completed 24 tasks (3 tool types x 2 Web site complexity conditions x 2 question types x 2 questions). The tool, Web site complexity, and question type conditions were applied using a fully sequence-counterbalanced within-subjects design. 23 subjects had seen some of the experiment's Web sites before; the details and impact are shown in Appendix I.10.
1 24 subjects claimed to have been using the Web for longer than it has been available to the general population. This raises a question about exaggerated numbers in this kind of self-report data.
4.2.1 Tool usage
Each tool was used in a total of 864 tasks; each of the 108 subjects performed 8 tasks per tool (2 Web site conditions x 2 question types x 2 questions). Tool usage was indicated by the navigation actions generated by the tool and the time spent in the tool. Navigation actions are mouse clicks on icons on the overview map and mouse clicks on anchors. The time spent in each tool was recorded: timing began when a mouse action occurred in the tool window and ended when another mouse action occurred in another tool window. Mouse actions included navigation actions and scrolling actions.
Browser
Using the browser, subjects navigated by clicking on anchors, the back button, or the forward button. A total of 11,867 navigation actions were collected from the 864 browser tasks. Navigation by clicking on anchors happened 9,006 times (75.9% of total navigation actions), by clicking the back button 2,831 times (23.9%), and by clicking the forward button 30 times (0.3%). The number of navigation actions using the browser is similar to the number of total pages viewed, which is analyzed later in this chapter. Back and forward navigation are one source of re-visited pages; the other source is clicking on anchors that lead to pages already visited.
From the navigational action log, the time between anchor clicks was computed. Statistics on the time between anchor clicks, grouped by Web site complexity and information-scent question type, are shown in Table 11.
Table 11: Summary statistics of time between anchor clicks in the browser

Web complexity  Question type  Count  Mean   Median  Std. Dev.
High            High           1533    8.65  7.0      7.381
High            Low            4526    9.47  7.0      8.658
Low             High            288    7.95  5.0      7.665
Low             Low            1736   11.58  8.0     10.071
The time between anchor clicks followed an exponential distribution, so a log transformation was applied. The ln(time between anchor clicks) was analyzed in an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 12). The time between anchor clicks depended on a two-way interaction between Web complexity and question type (Figure 22). Pairwise comparison indicated that all conditions were significantly different from each other (Table 13). The high information-scent question type had significantly lower time between anchor clicks.
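The analysis pipeline for these click intervals can be sketched as follows; note that, unlike the dissertation's within-subjects ANOVA, this simplified version treats the intervals as independent observations, and the column names are assumptions.

    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    def anova_ln_gaps(df: pd.DataFrame) -> pd.DataFrame:
        # df: one row per click interval, with columns 'gap' (seconds),
        # 'web' and 'question' (each 'High' or 'Low').
        df = df.assign(ln_gap=np.log(df["gap"]))
        model = ols("ln_gap ~ C(web) * C(question)", data=df).fit()
        # Type III sums of squares, matching the style of Table 12.
        return anova_lm(model, typ=3)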
Table 12: ANOVA on ln(time between anchor clicks) of the browser

Source          Type III Sum of Squares  df    Mean Square  F        Sig.
WEB                  .030                   1     .030          .053  .819
QUESTION           67.588                   1   67.588       117.219  .000
WEB * QUESTION     21.842                   1   21.842        37.881  .000
Error            4596.634                7972     .577
Table 13: Pairwise comparison between ln(time between anchor clicks), Bonferroni adjustment

Web complexity, Question type    High, Low   Low, High   Low, Low
High, High   Mean Diff.          -0.1175      0.1695     -0.2881
             P                   <0.0001      0.0038     <0.0001
High, Low    Mean Diff.                       0.287      -0.1706
             P                               <0.0001     <0.0001
Low, High    Mean Diff.                                  -0.4576
             P                                           <0.0001
Figure 22: Cell line chart of mean (time between anchor clicks) when using the browser
Graphical Overview
Using the graphical overview, subjects navigated by clicking on icons in the map view or on the back or forward buttons in the text viewer. A total of 7,848 navigation actions were collected from 864 tasks. Navigation using the map occurred 7,787 times. Only 59 navigation actions were generated by clicking the back button and only 2 by clicking the forward button, accounting for 0.75% and 0.03% of total navigation actions with the graphical overview. On average, subjects spent 75.72% of the total time using the graphical overview on the map and 24.28% on the text viewer.
From the navigational action log, the time between icon clicks was computed. Statistics on the time between icon clicks, grouped by Web site complexity and information-scent question type, are shown in Table 14.
Table 14: Summary statistics of time between icon clicks of the graphical overview

Web complexity  Question type  Count  Mean   Median  Std. Dev.
High            High           1022   11.73  7.0     12.361
High            Low            3078   11.24  6.0     12.753
Low             High            250   12.08  9.0     10.746
Low             Low            2239    8.94  6.0      9.019
The time between icon clicks followed an exponential distribution, so a log transformation was applied. The ln(time between icon clicks) was analyzed in an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 15). The time between icon clicks depended on a two-way interaction between Web complexity and question type (Figure 23).
Table 15: ANOVA on ln(time between icon clicks) of the graphical overview

Source          Type III Sum of Squares  df    Mean Square  F       Sig.
WEB                 3.249                   1    3.249        3.648  .056
QUESTION           17.233                   1   17.233       19.348  .000
WEB * QUESTION      5.820                   1    5.820        6.534  .011
Error            5769.780                6478     .891
Figure 23: Cell line chart of mean (time between icon clicks) when using the graphical overview
Pairwise comparison indicated that in the low complexity Web site with the low information-scent questions, the time between icon clicks was significantly lower than in the other conditions (Table 16); the remaining conditions did not differ significantly from one another. This indicates that subjects tried to visit many pages in the low complexity Web site when the question was low information scent.
Table 16: Pairwise comparison between ln(time between icon clicks), Bonferroni adjustment

Web complexity, Question type    High, Low   Low, High   Low, Low
High, High   Mean Diff.           0.06702    -0.04761     0.2225
             P                    0.4316      1.0000     <0.0001
High, Low    Mean Diff.                      -0.1146      0.1555
             P                                0.4403     <0.0001
Low, High    Mean Diff.                                   0.2701
             P                                            0.0002
Integrated tool
Using the integrated tool, subjects navigated by clicking on icons in the map view, on the back or forward buttons, or on anchors in the browser. The back and forward buttons were considered part of the browser. Overall, there were 4,490 browser navigation actions and 4,392 graphical overview navigation actions, 50.55% and 49.45% of the total navigation actions (8,882), respectively. Navigation by clicking on anchors happened 3,313 times (37.3% of total navigation actions, 73.8% of browser navigation actions), by clicking the back button 1,152 times (13.0% of total, 25.7% of browser actions), and by clicking the forward button 25 times (0.3% of total, 0.6% of browser actions). The total number of navigation actions with the integrated tool was lower than with the browser alone but higher than with the graphical overview alone. On average, subjects spent 52.18% of the total time on the browser part and 47.82% on the graphical overview part of the integrated tool.
While on average subjects used both parts of the integrated tool, further analysis reveals more detail. In 403 of 864 tasks, subjects used both the browser part and the graphical overview part within the same task (mixed mode). On the other hand, 227 tasks were navigated using the browser alone and 227 tasks using the graphical overview alone; single-tool navigation thus accounted for 52.54% of the total tasks using the integrated tool. However, subjects alternated which individual tool they used across tasks: only three subjects navigated using the browser alone for all eight tasks, and only one subject navigated using the graphical overview alone. There was one notable exception -- seven tasks were conducted without any navigation action from either navigational tool (i.e., the subjects submitted the first page).
A summary of integrated tool navigation actions grouped by Web site condition and question type condition is shown in Table 17. In the low complexity Web site with the high information-scent questions, single-tool usage may have been high because tasks in this condition were simple enough that a single tool was sufficient to finish them. In the high complexity Web site with the high information-scent questions, subjects used the browser alone more often than the graphical overview alone, but in the low complexity Web site with the low information-scent questions, subjects used the graphical overview alone more often than the browser alone. Note that, in the integrated tool, subjects saw the Web site map, which revealed the complexity of the Web site.
Table 17: Frequency distribution of tool usage based on the location of navigation actions

Web         Question  Browser  Percent  Graphical        Percent  Mix  Percent  Total
complexity  type      alone    (row)    overview alone   (row)         (row)
High        High       73      34.4%     38              17.9%    101  47.6%    212
High        Low        29      13.6%     21               9.8%    164  76.6%    214
Low         High       81      37.5%     97              44.9%     38  17.6%    216
Low         Low        44      20.5%     71              33.0%    100  46.5%    215
Total                 227      26.5%    227              26.5%    403  47.0%    857
* 7 tasks did not use either tool
The Browser Navigation Action Ratio (BNAR) for the integrated tool indicates how much a subject used the browser relative to the overall navigation actions within a task. It is computed as the number of browser navigation actions divided by the total number of navigation actions in the task. The ratio is 0 when all navigation actions in the integrated tool come from the graphical overview and 1 when they all come from the browser. The distribution of the browser navigation action ratio, shown in Figure 24, peaks sharply at zero and one and is roughly uniform in between.
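A sketch of the BNAR computation for a single task follows; the action labels are illustrative assumptions.

    def browser_navigation_action_ratio(actions):
        # actions: one task's navigation actions, e.g. 'map' for overview
        # icon clicks and 'anchor'/'back'/'forward' for browser clicks.
        browser = sum(a in ("anchor", "back", "forward") for a in actions)
        return browser / len(actions) if actions else float("nan")

    # Half of the actions come from the browser part -> BNAR = 0.5.
    print(browser_navigation_action_ratio(["map", "map", "anchor", "back"]))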
Figure 24: Histogram of browser navigation action ratio in the integrated tool
Figure 25: Histogram of browser time usage ratio in the integrated tool
Two subjects spent all of their time on the browser part of the integrated tool in all 8 tasks; no subject spent all of their time on the graphical overview in all 8 tasks. In 90 tasks (10.42% of the total), the subject spent all of the time on the graphical overview without any interaction with the browser, not even scrolling it. In 185 tasks (21.41% of the total), the browser alone was used and no time was detected on the graphical overview part. When subjects used the browser alone to navigate (227 tasks), there were 42 tasks (18.5%) in which they spent some time on the graphical overview. When subjects used the graphical overview alone to navigate (227 tasks), there were 137 tasks (60.53%) in which they spent some time in the browser.
The browser time usage ratio (BTUR) for the integrated tool is computed as the time spent on the browser part divided by the total time spent on both parts. The histogram of the browser time usage ratio is shown in Figure 25. The histogram shows heavy usage of the browser alone (i.e., browser time usage ratio = 1). The browser navigation action ratio and browser time usage ratio are highly correlated (r = 0.824, p < 0.001).
Table 18 (a) and (b) provide summary statistics for the browser navigation action ratio (BNAR) and browser time usage ratio (BTUR). Using the integrated tool, the standard deviations of BNAR and BTUR were very high because of the single-tool usage shown in Table 18 (a). However, the average values for overall usage of the integrated tool and for mixed-mode usage were similar (Table 18 (b)). On average, the graphical overview map was used slightly more than the browser part, with a BNAR below 0.5.
Table 18: Summary statistics of BNAR and BTUR grouped by Web site complexity and question type conditions

a) Overall usage
Web complexity  Question type  Avg. BNAR  Std Dev.  Avg. BTUR  Std Dev.
High            High           0.59       0.374     0.55       0.366
High            Low            0.48       0.325     0.47       0.280
Low             High           0.46       0.458     0.55       0.416
Low             Low            0.38       0.403     0.58       0.320
Overall                        0.48       0.399     0.53       0.351

b) Mixed mode only
Web complexity  Question type  Avg. BNAR  Std Dev.  Avg. BTUR  Std Dev.
High            High           0.51       0.181     0.42       0.244
High            Low            0.45       0.243     0.42       0.205
Low             High           0.49       0.163     0.57       0.238
Low             Low            0.38       0.275     0.61       0.217
Overall                        0.45       0.235     0.48       0.237
The browser navigation action ratio was analyzed using an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 19). Web complexity and question type showed significant main effects (F = 9.522, p = 0.003 and F = 16.284, p < 0.001, respectively) without interaction (F = 0.231, p = 0.632).
Table 19: ANOVA on Browser Navigation Action Ratio

Source          Type III Sum of Squares  df   Mean Square  F       Sig.
WEB                2.732                   1    2.732        9.522  .003
QUESTION           1.863                   1    1.863       16.284  .000
WEB * QUESTION      .032                   1     .032         .231  .632
Error             14.730                 107     .138
Pairwise comparison indicated that the browser navigation action ratio was significantly higher in the high complexity Web sites than in the low complexity Web sites (Bonferroni, Mean Diff. = 0.112, p = 0.003) and also significantly higher with the high information-scent questions than with the low information-scent questions (Bonferroni, Mean Diff. = 0.093, p < 0.001). Subjects used the browser part of the integrated tool for navigation more when Web sites were high complexity, 53% of the time compared to 42% in the low complexity Web sites. Subjects also used the browser part more when questions were high information scent, 52% of the time compared to 43% with the low information-scent questions.
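Pairwise comparisons of this kind can be sketched as paired t-tests with a manual Bonferroni correction; the dissertation's SPSS-style adjusted comparisons may differ in detail, and the input layout below is an assumption.

    from itertools import combinations
    from scipy.stats import ttest_rel

    def pairwise_bonferroni(groups):
        # groups: dict mapping a condition label to a subject-aligned list
        # of scores; every pair is tested and each p-value is multiplied
        # by the number of comparisons (Bonferroni correction).
        pairs = list(combinations(groups, 2))
        results = {}
        for a, b in pairs:
            t, p = ttest_rel(groups[a], groups[b])
            results[(a, b)] = min(1.0, p * len(pairs))
        return results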
From the navigational activity log, the state transition probabilities of integrated tool usage were calculated and are shown in Figure 26 and Table 20. Times between state transitions are summarized in Table 21 (forward actions were not included); they followed an exponential distribution.
The state transition probabilities show that using a single tool consecutively was more common than switching between tools. At the beginning of the tasks, the graphical overview and the browser were selected with equal frequency.
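A sketch of how first-order transition probabilities can be estimated from per-task action sequences follows; the action labels are illustrative.

    from collections import Counter, defaultdict

    def transition_probabilities(sequences):
        # sequences: per-task action lists such as
        # ['start', 'map', 'anchor', 'back', 'submit'].
        counts = defaultdict(Counter)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                counts[prev][nxt] += 1
        # Normalize each row of counts into probabilities.
        return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
                for prev, c in counts.items()}

    print(transition_probabilities([["start", "map", "map", "anchor",
                                     "submit"]]))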
Using the back button after using the map view was rare compared to using it after clicking on an anchor. Using the back button to navigate was less frequent with the high information-scent questions. Consecutive back-clicking was more common in the high complexity Web sites than in the low complexity Web sites, and the average time between consecutive clicks of the back button was very short, 1.76 sec.
Figure 26: State transition probability in using the integrated tool (state diagram over Start, Map action, Anchor action, Back action, Forward action, Submit, and Time out)
Table 20: State transition probability in using the integrated tool
(next-action columns, in order: Map, Anchor, Back, Forward, Submit, Time out)

Overall
  Start    .50  .49  .01
  Map      .77  .11  .02  .08  .01
  Anchor   .13  .51  .24  .11  .01
  Back     .10  .63  .23  .02  .02  .01
  Forward  .32  .36  .20  .08  .04

High complexity, High information scent
  Start    .46  .52  .02
  Map      .64  .18  .04  .13  <.01
  Anchor   .11  .59  .07  .22  <.01
  Back     .24  .35  .35  .01  .05
  Forward  1.00

High complexity, Low information scent
  Start    .43  .56  .01
  Map      .80  .12  .02  .03  .02
  Anchor   .15  .57  .22  .04  .01
  Back     .09  .56  .31  .02  .01  .01
  Forward  .29  .41  .18  .06  .06

Low complexity, High information scent
  Start    .60  .40
  Map      .37  .16  .03  .44
  Anchor   .09  .36  .06  .49
  Back     .04  .61  .17  .09  .09
  Forward  .50  .50

Low complexity, Low information scent
  Start    .51  .49  .00
  Map      .84  .05  .01  .09  <.01
  Anchor   .13  .33  .46  .08  <.01
  Back     .08  .82  .07  .01  .02  <.01
  Forward  .20  .20  .40  .20
Table 21: Time between state transitions in using the integrated tool (sec)
(next-action columns, in order: Map, Anchor, Back, Submit, Time out)

Overall
  Start   Avg.    17.43  10.99   5.00
          StdDev. 19.13  14.08   5.86
  Map     Avg.     9.93  11.23  15.51   9.48  18.36
          StdDev. 13.19  13.71  22.89  12.01  22.23
  Anchor  Avg.    24.35  10.63   9.43   8.27  18.62
          StdDev. 20.12  11.97  11.28   9.00  22.37
  Back    Avg.    17.55   4.65   1.76  10.55  13.11
          StdDev. 17.22   6.83   2.91  11.67  15.37

High complexity, High information scent
  Start   Avg.    17.83   9.48   7.25
          StdDev. 14.11  13.22   7.14
  Map     Avg.    11.77   8.59  10.33  10.06  26.00
          StdDev. 17.66   7.78  10.32  13.61      .
  Anchor  Avg.    23.97   8.86  13.23   5.62  50.50
          StdDev. 18.46   8.81  15.96   6.39  50.20
  Back    Avg.    16.54   6.09   1.44  20.00
          StdDev. 19.25   4.93   2.06  19.12

High complexity, Low information scent
  Start   Avg.    31.60  14.87   1.00
          StdDev. 30.12  18.16   0.00
  Map     Avg.    11.31  10.75  18.51  14.45  18.55
          StdDev. 14.77  12.72  28.30  21.21  22.80
  Anchor  Avg.    27.02  11.06   8.75  11.30  16.48
          StdDev. 21.40  12.48  11.67  12.04  19.34
  Back    Avg.    19.09   4.84   1.69   8.25  14.25
          StdDev. 18.38   7.74   3.05   3.95  16.02

Low complexity, High information scent
  Start   Avg.     9.56   5.56
          StdDev.  8.12   5.64
  Map     Avg.     9.67  10.90  21.67   8.12
          StdDev. 10.29   7.27  31.30   7.06
  Anchor  Avg.    12.85   7.76  10.54   7.81
          StdDev.  7.43   7.84   8.79   5.75
  Back    Avg.    12.00   6.79   3.50   7.50
          StdDev.      .  6.09   3.70   2.12

Low complexity, Low information scent
  Start   Avg.    14.47  12.59   4.00
          StdDev. 13.25  12.89      .
  Map     Avg.     7.64  16.23  11.13   8.07  12.67
          StdDev.  8.64  21.72   7.41   7.31  16.86
  Anchor  Avg.    19.69  12.44   9.71  11.56  13.50
          StdDev. 17.79  14.31  10.14  12.23  17.68
  Back    Avg.    15.55   4.18   2.42   7.00   4.00
          StdDev. 13.19   5.83   2.56   7.71      .
The natural logarithmic transformation was applied to the time between clicks, and the transformed values were analyzed using an ANOVA with treatment (Web complexity x question type x event type) as a within-subjects factor (Table 22). There was a significant three-way interaction between Web site complexity, question type, and event type (F = 3.094, p = 0.026) (Figure 27).
The time between an anchor click and an icon click was significantly higher than the time between other click pairs, i.e., icon to icon, anchor to anchor, and icon to anchor, in most cases (Appendix I.1, Table 50). There was one exception: in the low complexity Web site with high information-scent questions, the anchor-to-icon time was not significantly different from the others. There were no significant differences between the icon-to-anchor and anchor-to-anchor times. This would seem to indicate that the time to reorient to the map was higher than the time to reorient to the browser.
Table 22: ANOVA on ln(time between clicking) when using the integrated tool

Source                  Type III Sum of Squares  df    Mean Square  F       Sig.
WEB                        1.691                    1    1.691        2.320  .128
QUESTION                   4.464                    1    4.464        6.122  .013
WEB * QUESTION              .356                    1     .356         .488  .485
WEB * EVENT                9.987                    3    3.329        4.566  .003
QUESTION * EVENT          29.909                    3    9.970       13.675  .000
WEB * QUESTION * EVENT     6.767                    3    2.256        3.094  .026
Error                   4770.929                 6544     .729
Figure 27: Cell line chart of mean (time between clicking) when using the integrated tool
To compare usage of the integrated tool with the individual tools, two ANOVAs were applied to the natural-log-transformed time between clicks. The ln(time between anchor clicks) when using the browser alone was compared to that when using the browser part of the integrated tool in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor (Table 23). Likewise, the ln(time between icon clicks) when using the graphical overview alone was compared to that when using the graphical overview part (map viewer) of the integrated tool in an ANOVA with the same treatment structure (Table 24).
Table 23: ANOVA on ln(time between anchor clicks) comparing the browser and the integrated tool

Source                 Type III Sum of Squares  df     Mean Square  F        Sig.
TOOL                     13.699                    1    13.699       23.351  .000
WEB                        .130                    1      .130         .221  .638
QUESTION                 67.545                    1    67.545      115.135  .000
TOOL * WEB                 .059                    1      .059         .100  .752
TOOL * QUESTION            .141                    1      .141         .241  .623
WEB * QUESTION           14.009                    1    14.009       23.879  .000
TOOL * WEB * QUESTION      .924                    1      .924        1.575  .209
Error                  6056.022                10323      .587
Table 24: ANOVA on ln(time between icon clicks) comparing the graphical overview and the integrated tool

Source                 Type III Sum of Squares  df    Mean Square  F       Sig.
TOOL                      1.455                    1    1.455        1.636  .201
WEB                       4.463                    1    4.463        5.018  .025
QUESTION                 21.229                    1   21.229       23.870  .000
TOOL * WEB                 .027                    1     .027         .031  .861
TOOL * QUESTION            .461                    1     .461         .518  .472
WEB * QUESTION            7.161                    1    7.161        8.052  .005
TOOL * WEB * QUESTION      .150                    1     .150         .169  .681
Error                  8696.946                 9779     .889
There was a significant difference in the time between anchor clicks when using the browser alone and when using the browser part of the integrated tool (Figure 28); tool was a significant main effect (F = 23.351, p < 0.001). There were no significant interactions between tool and Web site complexity or question type, and the effects of Web site complexity, question type, and their interaction were similar to those when using the browser alone. Pairwise comparison indicated that the time between anchor clicks when using the integrated tool was significantly higher than when using the browser alone (Mean Diff. = 1.532, p < 0.001 in ln(sec)). One possible reason is the size of the browser part in the integrated tool: it was half the size of the browser alone, which may have caused subjects to scroll more. Another possible reason is that subjects obtained information from the graphical overview part of the integrated tool.
Figure 28: Cell line chart of mean ln(time between anchor-anchor clicking) when using the
browser and using the integrated tool
On the other hand, there was no significant difference in the time between icon clicks when using the graphical overview alone and when using the graphical overview part of the integrated tool (Figure 29). There was no tool effect, either as a main effect or as an interaction with Web complexity and question type. The effects of Web site complexity, question type, and their interaction were similar to those when using the graphical overview alone.
Adjusted time spent on tool
Measuring time spent on a tool by using mouse actions on the tool as the indication of usage suffers from several possible errors. First, the time around a transition may belong partly to each tool. Second, even without a transition, the time attributed to one tool may in fact have been split between the two tools with no action occurring in the second tool. A simple example of such a split is a subject who clicks only in the overview but looks at the browser to determine whether the target page has been reached.
The tool usage time was therefore recomputed from the activity log using the following rules (a sketch implementing them appears after the list):
1. The time from the start to the first click goes to the tool of the first click.
2. All additional time is allocated as follows:
   a. If the tool at click n is the same as the tool at click n-1, the time goes to that tool.
   b. If the tool at click n is not the same as the tool at click n-1, 50% of the time goes to each tool.
3. When the time between icon clicks exceeds 9 seconds, 50% of the excess time goes to the browser.
4. When the time between anchor clicks exceeds 8 seconds, 50% of the excess time goes to the map.
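A minimal sketch of these four rules, assuming time-ordered (timestamp, tool) click records with 'map' for icon clicks and 'browser' for anchor clicks:

    def adjusted_tool_time(clicks):
        # clicks: list of (timestamp_sec, tool) pairs for one task,
        # in time order; tool is 'map' or 'browser'.
        time = {"browser": 0.0, "map": 0.0}
        if not clicks:
            return time
        # Rule 1: time from the start to the first click.
        time[clicks[0][1]] += clicks[0][0]
        for (t0, tool0), (t1, tool1) in zip(clicks, clicks[1:]):
            gap = t1 - t0
            if tool0 != tool1:
                # Rule 2b: a transition splits the interval evenly.
                time[tool0] += gap / 2
                time[tool1] += gap / 2
            else:
                # Rules 2a, 3 and 4: long same-tool gaps donate half of
                # the excess (over 9 s for icons, 8 s for anchors) to
                # the other tool.
                limit = 9.0 if tool0 == "map" else 8.0
                excess = max(0.0, gap - limit)
                other = "browser" if tool0 == "map" else "map"
                time[tool0] += gap - excess / 2
                time[other] += excess / 2
        return time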
The adjusted time spent on each tool and the adjusted browser time usage ratio were computed. Summary statistics comparing the original and adjusted time spent on tool are shown in Table 25.
Table 25: Summary statistics of adjusted time spent on tool

             Time spent on tool                Adjusted time spent on tool
             Browser   Graphical   BTUR        Browser   Graphical   Adj.
                       Overview                          overview    BTUR
Mean         57.495    52.707      .534        57.579    62.850      .527
Std. Dev.    71.521    72.402      .352        67.428    69.187      .346
Std. Error    2.433     2.463      .012         2.303     2.363      .012
There was a high correlation between the initial calculation of time spent on tool and the adjusted calculation. The correlation coefficient between the original and adjusted time on the browser was 0.913, p < 0.001; between the original and adjusted time on the graphical overview, 0.908, p < 0.001; and between the original and adjusted browser time usage ratio, 0.844, p < 0.001. A paired t-test indicated no significant difference between the original and adjusted time spent on the browser. The adjusted time spent on the graphical overview was significantly higher than the original, and there was no significant difference between the original BTUR and the adjusted BTUR.
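Comparisons of this kind can be sketched with scipy's correlation and paired t-test; subject-aligned input sequences are an assumption.

    from scipy.stats import pearsonr, ttest_rel

    def compare_measures(original, adjusted):
        # original, adjusted: aligned sequences of time spent (sec),
        # one value per task, in the same order for both measures.
        r, p_r = pearsonr(original, adjusted)
        t, p_t = ttest_rel(original, adjusted)
        return {"r": r, "p_r": p_r, "t": t, "p_t": p_t}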
4.2.2 Task completion
Each subject performed 24 information-finding tasks and completed, on average, 21.42 of them within the six-minute time limit. Overall, 240 of the 2,592 tasks (9.26%) exceeded the time limit. A summary of the number of tasks completed, grouped by tool, Web site complexity, and question type, is shown in Table 26. Subjects completed all tasks within the time limit in the low complexity Web sites with high information-scent questions. Most of the tasks that exceeded the time limit were in the high complexity Web sites with low information-scent questions, indicating that these tasks were difficult and required more time than the limit allowed.
Table 26: Summary statistics of the number of tasks completed

Tool        Web complexity  Question type  Tasks completed                   Not completed
                                           N      Percent(1)  Avg.(2)  S.D.
Browser     High            High            212    98.15%     1.96     0.190    4
            High            Low             150    69.44%     1.39     0.734   66
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             211    97.69%     1.95     0.252    5
Graphical   High            High            206    95.37%     1.91     0.291   10
overview    High            Low             158    73.15%     1.46     0.647   58
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             210    97.22%     1.94     0.230    6
Integrated  High            High            213    98.61%     1.97     0.165    3
            High            Low             134    62.04%     1.24     0.722   82
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             210    97.22%     1.94     0.268    6
Total                                     2,352                1.81     0.470  240
(1) Percent of tasks completed for each condition (out of 216 = 108 subjects x 2 replicates).
(2) Average number of tasks completed per subject (2 tasks per condition).
The number of tasks completed was analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor. The sphericity assumption was not met (Appendix I.1, Table 51), so the lower-bound correction was applied (Table 27). The number of tasks completed depended on a three-way interaction between tool, Web complexity, and question type (F = 4.428, p = 0.038) (Figure 30).
Table 27: ANOVA on number of tasks completed, lower-bound correction

Source                 Type III Sum of Squares  df       Mean Square  F        Sig.
TOOL                      .421                   1.000     .421        1.696   .196
WEB                     32.744                   1.000   32.744      162.282   .000
QUESTION                32.744                   1.000   32.744      178.850   .000
TOOL * WEB                .381                   1.000     .381        1.752   .189
TOOL * QUESTION          1.122                   1.000    1.122        4.063   .046
WEB * QUESTION          22.827                   1.000   22.827      115.360   .000
TOOL * WEB * QUESTION    1.113                   1.000    1.113        4.428   .038
Error                   26.887                 107.000     .251
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 30: Cell line chart of mean number of tasks completed grouped by tool, Web site complexity, and question type, showing the interactions
The number of tasks completed was divided into two groups, the high complexity and low complexity Web site conditions, and each was analyzed with an ANOVA with tool by question type as a within-subjects factor. In the high complexity condition, the tool by question type interaction was significant (F = 4.798, p = 0.031, Table 28), but in the low complexity condition it was not (F = 0.050, p = 0.952, Table 29).
Table 28: ANOVA on number of tasks completed in the high complexity Web site condition, lower-bound correction

Source            Type III Sum of Squares  df       Mean Square  F        Sig.
TOOL                  .799                   1.000     .799        1.980   .162
QUESTION            55.125                   1.000   55.125      157.816   .000
TOOL * QUESTION      2.231                   1.000    2.231        4.798   .031
Error               49.769                 107.000     .465
Table 29: ANOVA on number of tasks completed in the low complexity Web site condition, lower-bound correction

Source            Type III Sum of Squares  df       Mean Square  F       Sig.
TOOL                  .003                   1.000     .003        .050   .824
QUESTION              .446                   1.000     .446      14.088   .000
TOOL * QUESTION       .003                   2         .002        .050   .952
Error                6.664                 107.000     .062
In the high complexity Web site condition, pairwise comparisons between tools within each question type showed no significant differences in the number of tasks completed between tools for the high information-scent questions; for the low information-scent questions, however, the graphical overview had a significantly higher number of tasks completed than the integrated tool (Appendix I.1, Table 52).
In the low complexity Web site condition, question type had a main effect: the low information-scent questions showed a significantly lower number of tasks completed than the high information-scent questions (Bonferroni, Mean Diff. = 0.052, p < 0.001). There was no tool effect.
A task that exceeded the time limit shows that the subject could not find the answer, but the fact that a subject submitted an answer page does not indicate that the correct answer was found: in this experimental setting, subjects could give up searching and submit a page at any time.
4.2.3 Number of answers found
The answer was counted as found if a subject clicked "submit" while the target Web page was presented, within the time limit. The answer was found in 1,765 tasks (68.1% of the total) and not found in 827 tasks (31.9%). On average, a subject found answers in 16.3 of the 24 tasks. A summary of the number of answers found, grouped by tool, Web complexity, and question type, is shown in Table 30.
Table 30: Summary statistics of the number of answers found

Tool        Web site complexity  Question type  Answers found
                                                N     %(1)     Avg.(2)  StdDev.
Browser     High                 High           176   81.48%   1.630    0.5895
            High                 Low             44   20.37%   0.407    0.5806
            Low                  High           202   93.52%   1.870    0.3375
            Low                  Low            144   66.67%   1.333    0.8428
Graphical   High                 High           171   79.17%   1.583    0.6575
overview    High                 Low             60   27.78%   0.556    0.6604
            Low                  High           203   93.98%   1.880    0.3269
            Low                  Low            177   81.94%   1.639    0.6479
Integrated  High                 High           184   85.19%   1.704    0.5842
            High                 Low             56   25.93%   0.519    0.6187
            Low                  High           199   92.13%   1.843    0.3906
            Low                  Low            149   68.98%   1.380    0.8284
Total                                          1765
(1) Percent of answers found for each condition (out of 216 = 108 subjects x 2 questions).
(2) Average number of answers found per subject (2 tasks per condition).
The 240 tasks that exceeded the time limit were counted as "answer not found," accounting for 29.0% of the answers not found. The navigation log indicated that in 71 of the 827 tasks with answers not found (8.6%), the subjects visited the target page but submitted some other page or continued searching until the time limit was exceeded (Appendix I.3, Table 53). In the remaining 756 tasks, subjects exceeded the time limit or submitted a page without visiting the target page.
The number of answers not found in the high complexity Web sites with the low information-scent questions was 488, 75.3% of the tasks in that condition (Table 31), accounting for 59.0% of the total answers not found (827). These results would be easier to accept if all 488 had resulted from the time-out condition. The question that arises is whether one or more of the questions were poorly worded, resulting in the submission of incorrect target pages that could arguably be considered correct.
Table 31: Summary of answers found, answers not found, and timed out, grouped by Web site complexity and question type

Web site     Question  Answer found      Answer not found                                  Grand
complexity   type                        Not timed out    Timed out       Total            Total
                       N      %(row)     N      %(row)    N      %(row)   N      %(row)
High         High       531   81.9%       100   15.4%      17     2.6%     117   18.1%      648
High         Low        160   24.7%       282   43.5%     206    31.8%     488   75.3%      648
High Total              691   53.3%       382   29.5%     223    17.2%     605   46.7%     1296
Low          High       604   93.2%        44    6.8%                       44    6.8%      648
Low          Low        470   72.5%       161   24.8%      17     2.6%     178   27.5%      648
Low Total              1074   82.9%       205   15.8%      17     1.3%     222   17.1%     1296
Grand Total            1765   68.1%       587   22.6%     240     9.3%     827   31.9%     2592
Further analysis of the distribution of the number of answers found, answers not found, and answers not found due to time-out was conducted (Figure 31 and Appendix I.3, Table 54). The distribution supports the conclusion that there was no "bad question" skewing the results: for every question, some subjects found the target answer, some submitted a wrong page, and some timed out. The second possibility is that a secondary legitimate target page was being selected. To test this, the frequency with which incorrect target pages were submitted was summarized (Figure 32 and Appendix I.3, Table 55). The number of unique non-target pages submitted showed an exponential distribution, indicating that for each question many subjects submitted the same non-target pages while many other non-target pages were submitted by only one subject. The pages submitted by the largest numbers of subjects were investigated; they were, in fact, not secondary target pages but pages that partially matched a keyword in the question. For instance, one question asked the subject to find the abstract of "A latent variable model for multivariate discretization." The target page contained the exact title, and 14 subjects submitted it; however, another 14 subjects submitted the page containing "Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data."
Figure 31: The percent of answers found, answers not found, and tasks incomplete for each
question.
Figure 32: Histogram of pages submitted for each question, counting only tasks that did not time out and in which the target node was not found
The number of answers found was analyzed using an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor (Table 32). The sphericity assumption was met (Appendix I.3, Table 56). No three-way interaction was found. The Web site complexity by question type interaction was significant (F = 156.794, p < 0.001) and the tool by question type interaction was significant (F = 6.212, p = 0.002) (Figure 33). The tool by Web site complexity interaction was not significant (F = 2.897, p = 0.057). The Web site complexity by question type interaction is discussed later in this section.
Table 32: ANOVA on the number of answers found

Source                 Type III Sum of Squares  df   Mean Square  F        Sig.
TOOL                      2.344                   2     1.172       3.053  .049
WEB                     113.186                   1   113.186     373.823  .000
QUESTION                196.779                   1   196.779     503.658  .000
TOOL * WEB                1.955                   2      .978       2.897  .057
TOOL * QUESTION           3.576                   2     1.788       6.212  .002
WEB * QUESTION           43.340                   1    43.340     156.794  .000
TOOL * WEB * QUESTION      .144                   2      .072        .245  .783
Error                    62.690                 214      .293
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 33: Cell line charts of mean number of answers found showing tool by Web site
complexity interaction and tool by question type interaction
For the low information-scent questions, the number of answers found using the graphical overview was significantly higher than with the browser (Mean Diff. = 0.227, p < 0.001, Appendix I.3, Table 57), but not significantly higher than with the integrated tool (Mean Diff. = 0.148, p = 0.053). There were no significant differences in the number of answers found between tools for the high information-scent questions.
In the low complexity Web sites, the number of answers found using the graphical overview was significantly higher than with the browser (Mean Diff. = 0.157, p = 0.012, Appendix I.3, Table 58) and with the integrated tool (Mean Diff. = 0.148, p = 0.035).
In other words, the graphical overview was more effective for finding answers as the tasks became more difficult, whether in the high complexity Web sites or with the low information-scent questions, compared to the browser and the integrated tool.
4.2.4 Task performance
Task performance was measured by the time spent on the task and the number of pages viewed.
Time spent on task
Time spent on task was measured from when the subject clicked the start button to when the subject clicked submit or the 6-minute (360-second) limit was reached. A summary of the time spent on task, grouped by tool, Web site complexity, and question type, is shown in Table 33. The time-spent distributions differed across groups. The high complexity, low information-scent group had a negative skew because it contained many tasks that timed out -- a ceiling effect of the time limit -- while the low complexity, high information-scent group had a high positive skew. A histogram of time spent is shown in Figure 34; the 240 tasks that exceeded the time limit appear in the 360-second bar.
Table 33: Summary statistics of time spent on tasks (sec.)

Tool        Site  Question  Mean      Std. Dev.  Median    Skewness
Browser     High  High       77.275    83.878     40.333    1.962
            High  Low       227.832   120.484    228.960   -0.205
            Low   High       22.563    27.913     13.559    4.226
            Low   Low       117.145    88.176     98.167    1.092
Graphical   High  High       92.708    91.468     58.501    1.634
overview    High  Low       228.890   113.012    234.140   -0.177
            Low   High       31.658    33.180     20.425    2.887
            Low   Low       118.416    86.805     93.345    1.251
Integrated  High  High       72.802    77.544     42.710    2.127
            High  Low       254.932   115.492    300.135   -0.644
            Low   High       27.754    33.570     17.445    3.860
            Low   Low       121.292    88.993     98.535    1.183
Total                       116.105   115.491     69.585    1.064
Figure 34: Histogram of time spent on task
Because the time-spent data were not normally distributed, a natural log transformation was applied to the time spent on task. Extreme data points were detected and removed (Appendix I.4); the tasks that exceeded the time limit were included in this analysis. The transformed time spent on task was analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor, with missing data replaced by the group mean. The sphericity assumption was met (Appendix I.5, Table 60). The Web complexity by question type interaction was significant (F = 7.435, p = 0.007), as was the tool by question type interaction (F = 3.591, p = 0.029) (Figure 35).
Table 34: ANOVA on ln(time spent on task)

Source                 Type III Sum of Squares  df   Mean Square  F         Sig.
TOOL                      12.471                  2      6.236      12.911  .000
WEB                      527.273                  1    527.273    1192.078  .000
QUESTION                1324.265                  1   1324.265    1898.668  .000
TOOL * WEB                 1.464                  2       .732       1.806  .167
TOOL * QUESTION            4.807                  2      2.403       3.591  .029
WEB * QUESTION             3.734                  1      3.734       7.435  .007
TOOL * WEB * QUESTION      1.265                  2       .632        .876  .418
Error                    154.480                214       .722
Hs = High information-scent question type, Ls = Low information-scent question type
B = Browser, G = Graphical overview, I = Integrated tool
Figure 35: Cell line chart of mean ln(time spent on task) grouped by tool and question type, showing the tool by question type interaction
Pairwise comparisons between tools for the high and low information-scent questions showed that subjects spent significantly more time on the task using the graphical overview with the high information-scent questions than when using the browser or the integrated tool for the same question type (Mean Diff. = 0.241, p < 0.001 and Mean Diff. = 0.141, p = 0.021, respectively, in ln(time) units; Appendix I.5, Table 61). For the low information-scent questions, time on task was significantly lower when using the browser than the integrated tool (Mean Diff. = 0.146, p = 0.015); there was no significant difference between the graphical overview and either the browser or the integrated tool. The question type by tool interaction was ordinal, and the question type effect was stronger than the tool effect: the low information-scent questions took more time to complete than the high information-scent questions.
The absence of a tool by Web site complexity interaction in time spent on task was interesting: there was no significant difference in time spent on task between the three tools in either the high or low complexity Web sites. The main effects of Web site complexity and question type, and their interaction, are discussed later in this chapter.
Number of pages viewed
The number of pages viewed may be measured in terms of page views, pages, revisited page views, and extra page views. The number of page views includes repeated pages; the number of pages does not count repeated viewings; the number of revisited page views counts only views beyond the first; and the number of extra page views is the number of page views minus the minimum number of pages required to complete the task.
Summaries of the number of page views, pages, revisited page views, and extra page views are shown in Table 35 and Table 36. The distribution of the number of page views was highly skewed for the high information-scent questions; the distributions of all four measures were exponential (Figure 36).
There were 22 tasks, by 17 subjects, in which subjects viewed all the pages in the Web site. This happened only in the one Web site that had 16 pages, in the low complexity condition. One such task was in the high information-scent condition using the integrated tool; the other 21 were in the low information-scent condition, with 12 tasks using the graphical overview (5.6% of the tasks in that condition) and 9 tasks using the integrated tool (4.2%). This ceiling effect did not produce a significant difference in the mean number of pages.
There were 35 tasks in which the number of extra page views was less than zero. These occurred when subjects submitted the first page when using the graphical overview or the integrated tool, or when subjects visited fewer pages than the number on the shortest path from the first page to the target node. In all of these cases the answer was not found, and these occurrences were treated as extreme cases (Appendix I.4).
Table 35: Descriptive statistics of the number of page views and the number of pages

                                           Number of page views                Number of pages
Tool        Web complexity  Question type  Mean    Std. Dev.  Median  Skew.    Mean    Std. Dev.  Median  Skew.
Browser     High            High            9.824   9.362      6      2.525     7.81    5.469      5      2.089
            High            Low            29.56   20.2       24      0.502    18.09   10.52      16      0.481
            Low             High            3.62    2.516      3      3.936     3.134   1          3      2.273
            Low             Low            15.94   10.66      15      1.16      8.125   4.121      8      0.544
Graphical   High            High            7.167   9.062      3      2.736     6.111   7.579      3      3.196
overview    High            Low            17.32   13.79      13.5    1.194    14.28   11.46      11      1.439
            Low             High            3.245   2.172      2      3.015     2.903   1.737      2      3.516
            Low             Low            12.61    7.133     12      1.252    10.67    4.478     12     -0.033
Integrated  High            High            6.759   7.199      5      4.717     5.727   4.585      5      3.255
            High            Low            21.84   15.46      19      1.229    15.83   10.29      14      1.347
            Low             High            3.236   2.813      3      5.376     2.88    1.859      2      5.503
            Low             Low            13.29    8.393     13      0.95      9.458   4.503     10     -0.167
Table 36: Descriptive statistics of the number of revisited page views and the number of extra page views

                                           Number of revisited page views     Number of extra page views
Tool        Web complexity  Question type  Mean    Std. Dev.  Median  Skew.    Mean    Std. Dev.  Median  Skew.
Browser     High            High            2.014   4.279      0      3.217     5.157   9.33       1      2.523
            High            Low            11.47   10.66       8      0.751    24.73   19.96      19      0.489
            Low             High            0.486   1.739      0      5.084     0.787   2.442      0      4.151
            Low             Low             7.81    7.377      7      1.969    12.94   10.66      12      1.16
Graphical   High            High            1.056   2.653      0      4.768     5.167   9.062      1      2.736
overview    High            Low             3.037   5.052      1      3.088    15.32   13.79      11.5    1.194
            Low             High            0.343   0.859      0      4.444     1.245   2.172      0      3.015
            Low             Low             1.935   3.917      0      2.766    10.61    7.133     10      1.252
Integrated  High            High            1.032   3.438      0      7.617     4.759   7.199      3      4.717
            High            Low             6.009   6.686      4      1.573    19.84   15.46      17      1.229
            Low             High            0.356   1.268      0      5.678     1.236   2.813      1      5.376
            Low             Low             3.829   5.38       1      2.115    11.29    8.393     11      0.95
Figure 36: Histograms of the number of page views, the number of pages, the number of
revisited page views, and the number of extra page views by tasks
There were 638 tasks (24.6% of the total) in which the number of extra page views was zero (Table 37). In these tasks, the answer was found in 576 cases and not found in 62. When the number of extra page views was zero and the answer was found, the subject navigated through the Web site by the shortest possible path to the target node. This situation accounted for 83% of the tasks using the browser in the low complexity Web sites with the high information-scent questions.
Table 37: Number of tasks where the extra page views were zero

Tool        Web complexity  Question type  Answer found  Not found  Total  %(1)
Browser     High            High            86             7          93    43.1%
            High            Low              1             3           4     1.9%
            Low             High           174             6         180    83.3%
            Low             Low              9             4          13     6.0%
Graphical   High            High            64            10          74    34.3%
overview    High            Low              1             6           7     3.2%
            Low             High           113             3         116    53.7%
            Low             Low              7             4          11     5.1%
Integrated  High            High            15             2          17     7.9%
            High            Low              1             2           3     1.4%
            Low             High           100             7         107    49.5%
            Low             Low              5             8          13     6.0%
Total                                      576            62         638    24.6%*
(1) Percent of the 216 tasks in that condition.
* Percent of the total 2,592 tasks.
Because the distributions of the number of pages viewed were not normal, the natural log transformation was applied to the number of page views, pages, revisited page views, and extra page views. Extreme data points were detected and removed (Appendix I.4). The transformed variables were analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor, with missing data replaced by the group mean. The sphericity assumption was met (Appendix I.5, Table 62), though for the number of pages Mauchly's test of sphericity was marginally non-significant; since the ANOVA with the lower-bound correction was consistent with the sphericity assumption, only the ANOVA under the sphericity assumption is reported (Table 38). Post-hoc analysis was conducted using the Bonferroni adjustment for multiple comparisons (Appendix I.6, Table 63 and Table 64).
The main effects of Web site complexity and question type, and the Web site complexity by question type interaction, are discussed later in section 4.2.5.
Table 38: ANOVA on ln(number of page views), ln(number of pages), ln(number of revisited page views), and ln(number of extra page views)

Source                 Measure  Type III Sum of Squares  df   Mean Square  F         Sig.
TOOL                   TOTAL       48.918                  2    24.459       60.441  .000
                       DIFF        12.929                  2     6.464       20.849  .000
                       REVISIT    157.645                  2    78.823      117.521  .000
                       EXTRA       10.749                  2     5.374        8.092  .000
WEB                    TOTAL      168.719                  1   168.719      464.568  .000
                       DIFF       168.165                  1   168.165      609.668  .000
                       REVISIT     54.030                  1    54.030       92.348  .000
                       EXTRA      218.299                  1   218.299      395.896  .000
QUESTION               TOTAL      958.208                  1   958.208     1420.319  .000
                       DIFF       659.229                  1   659.229     1303.342  .000
                       REVISIT    661.235                  1   661.235      718.385  .000
                       EXTRA     1773.994                  1  1773.994     1850.155  .000
TOOL * WEB             TOTAL       19.243                  2     9.622       24.266  .000
                       DIFF        28.342                  2    14.171       46.296  .000
                       REVISIT       .520                  2      .260         .490  .613
                       EXTRA       18.028                  2     9.014       15.230  .000
TOOL * QUESTION        TOTAL         .324                  2      .162         .343  .710
                       DIFF        12.880                  2     6.440       18.712  .000
                       REVISIT    111.144                  2    55.572       72.403  .000
                       EXTRA       23.773                  2    11.886       14.369  .000
WEB * QUESTION         TOTAL        6.511                  1     6.511       16.256  .000
                       DIFF         3.105                  1     3.105       12.038  .001
                       REVISIT      1.999                  1     1.999        2.766  .099
                       EXTRA       12.876                  1    12.876       19.195  .000
TOOL * WEB * QUESTION  TOTAL         .120                  2      .060         .142  .868
                       DIFF         1.486                  2      .743        2.358  .097
                       REVISIT      6.187                  2     3.093        5.555  .004
                       EXTRA        1.131                  2      .565         .733  .482
Error                  TOTAL    22789.012                214   106.491
                       DIFF      9488.493                214    44.339
                       REVISIT   4786.623                214    22.367
                       EXTRA     1233.883                214     5.766

TOTAL = ln(number of page views), DIFF = ln(number of pages)
REVISIT = ln(number of revisited page views), EXTRA = ln(number of extra page views)
For the number of page views, there were two significant two-way interactions: tool by Web site complexity (F = 24.266, p < 0.001) (Figure 37) and Web site complexity by question type (F = 16.256, p < 0.001). The tool by question type interaction was not significant (F = 0.343, p = 0.710). Pairwise comparisons between tools in each Web site condition (Appendix I.6, Table 63) showed no significant difference between the graphical overview and the integrated tool in the low complexity Web sites (Mean Diff. = 0.15, p = 1.000, in ln units), but a significant difference between them in the high complexity Web sites (Mean Diff. = 0.251, p = 0.046, in ln units). The increase in the number of page views as the Web site became more complex was smaller when using the graphical overview than when using the browser or the integrated tool.
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 37: Cell line chart of mean ln(pages views) shows tool by Web site complexity interaction
For the number of pages, there were three two-way interactions: tool by Web site complexity (F = 46.296, p < 0.001), tool by question type (F = 18.712, p < 0.001) (Figure 38), and Web site complexity by question type (F = 12.038, p = 0.001). Pairwise comparisons between tools in each Web site condition (Appendix I.6, Table 63) showed that the graphical overview produced a significantly higher number of pages than the browser and the integrated tool in the low complexity Web sites (Mean Diff. = 0.084, p = 0.030, and Mean Diff. = 0.077, p = 0.019, respectively, in ln(number of pages) units), and a significantly lower number of pages than both in the high complexity Web sites (Mean Diff. = -0.427, p < 0.001 and Mean Diff. = -0.218, p < 0.001, respectively). The browser and the integrated tool were not significantly different in the low complexity Web sites, but in the high complexity Web sites the browser produced a significantly higher number of pages than the integrated tool (Mean Diff. = 0.218, p < 0.001, in ln(number of pages) units).
The tool by question type interaction was indicated by the fact that there was no significant difference in the number of pages between the three tools with the low information-scent questions, while with the high information-scent questions all three were significantly different from each other (Appendix I.6, Table 64): the number of pages was highest with the browser, intermediate with the integrated tool, and lowest with the graphical overview.
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 38: Cell line charts for ln(number of pages) shows tool by Web site complexity
interaction and tool by question type interaction
For the number of revisited page views there was a three-way interaction: tool by Web site complexity by question type (F = 5.555, p = 0.004) (Figure 39). Separate ANOVAs for the high and low information-scent questions were applied. The tool by Web site complexity interaction was significant for the high information-scent questions (F = 4.053, p = 0.019, Table 39) but not for the low information-scent questions (F = 2.602, p = 0.110, Table 40).
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 39: Cell line chart of mean ln(number of revisited page views) showing the tool by Web complexity interaction
Table 39: ANOVA on ln(number of revisited page views), high information-scent questions only

Source       Type III Sum of Squares  df   Mean Square  F       Sig.
TOOL             3.501                  2    1.750        4.576  .011
WEB             17.622                  1   17.622       58.574  .000
TOOL * WEB       2.928                  2    1.464        4.053  .019
Error           32.191                107     .301
Table 40: ANOVA on ln(number of revisited page views), low information-scent questions only

Source       Type III Sum of Squares  df   Mean Square  F        Sig.
TOOL           265.289                  2   132.644      125.635  .000
WEB             38.407                  1    38.407       38.146  .000
TOOL * WEB       3.778                  2     1.889        2.602  .110
Error          155.407                214      .726
For the high information-scent questions, there was no significant difference among tools in
the low complexity Web site. In the high complexity Web site, using the browser resulted in a
significantly higher number of revisited page views than the graphical overview and the integrated
tool, which were not significantly different from each other (Mean diff. = 0.194, p = 0.046 and
Mean diff. = 0.222, p = 0.010, in ln(number of pages) unit, Appendix I.6, Table 65).
For the low information-scent questions, there was no significant interaction between tool
and Web site complexity (Appendix I.6, Table 66). All the tools were significantly different from each
other in the number of revisited page views, indicating a tool main effect. The number of revisited
page views was highest when using the browser, lower when using the integrated tool, and lowest
when using the graphical overview.
When using the browser, 58.7% of revisited page views were generated with the back and
forward buttons. When using the graphical overview, only a small number of revisited pages was
produced with the back and forward buttons (4.7% of revisited page views). When using the
integrated tool, 53.4% of revisited page views came from the back and forward buttons. The browser
part and the graphical overview part of the integrated tool generated similar numbers of revisited page
views (23.0% and 23.3% of revisited page views, respectively).
For the number of extra page views, there were three two-way interactions: tool by Web site
complexity (F = 15.23, p < 0.001), tool by question type (F = 14.369, p < 0.001),
and Web site complexity by question type (F = 19.195, p < 0.001) (Figure 40). The pairwise comparisons
between tools in each Web site condition (Appendix I.6, Table 63) showed no significant difference
in the number of extra page views between tools in the low complexity Web site.
Using the graphical overview produced a significantly lower number of extra page views in the high
complexity Web site than using the browser or the integrated tool (Mean diff. = 0.234, p =
0.001, and Mean diff. = 0.336, p < 0.001).
Hs = High information-scent question type, Ls = Low information-scent question type; Hc = High complexity Web site, Lc = Low complexity Web site; B = Browser, G = Graphical overview, I = Integrated tool
Figure 40: Cell line charts of mean ln(number of extra page views) show the tool by Web site
complexity interaction and the tool by question type interaction
The pairwise comparisons between tools in each question type condition (Appendix I.6,
Table 64) show that using the graphical overview with the low information-scent questions
produced a significantly lower number of extra page views than using the browser or the integrated
tool (Mean diff. = 0.250, p < 0.001 and Mean diff. = 0.148, p = 0.026, respectively, in ln(number of
pages) unit). There was no significant difference between using the browser and the integrated tool.
With the high information-scent questions, the integrated tool generated a significantly higher
number of extra page views than the browser or the graphical overview (Mean diff. = 0.304, p
< 0.001 and Mean diff. = 0.164, p = 0.008, respectively, in ln(number of pages) unit). There was no
significant difference between the browser and the graphical overview.
There was a significant interaction between tool and question type in the number of pages and
the number of revisited page views but the interaction was not significant in the number of page
views. Note that the number of page views is equal to the number of pages plus the number of
revisited page views.
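To make these dependent measures concrete, the sketch below derives them from a single task's navigation log. It is a minimal illustration under stated assumptions, not the instrumented software used in the experiment: the log is an invented ordered list of pages viewed, and extra page views are taken to be page views beyond the length of the shortest path to the target, which matches how the text interprets them.

    # Sketch: deriving the four page-view measures from one task's navigation log.
    # Assumptions (not from the dissertation's instrumentation): `visited` is the
    # ordered list of pages viewed; extra page views are page views beyond the
    # shortest path to the target.

    def page_view_measures(visited, shortest_path_len):
        page_views = len(visited)               # total page views, revisits included
        pages = len(set(visited))               # distinct pages viewed
        revisited = page_views - pages          # views of already-seen pages
        extra = page_views - shortest_path_len  # views beyond the minimum path
        return page_views, pages, revisited, extra

    # Example: a subject wanders off a 3-page shortest path and backtracks once.
    log = ["home", "courses", "faculty", "courses", "target"]
    print(page_view_measures(log, shortest_path_len=3))  # (5, 4, 1, 2)

Note that the identity page views = pages + revisited page views holds by construction (5 = 4 + 1 in the example).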
4.2.5 Web complexity, Question type and their interaction
The Web site complexity by question type interaction was significant in the number of
tasks completed, the number of answers found, the time spent on task, the number of page views,
the number of pages, and the number of extra page views. It was not significant only in the number of
revisited page views, which showed a three-way interaction between tool, Web site complexity, and
question type. All interactions between Web site complexity and question type were ordinal
interactions. More complex Web sites combined with low information-scent questions affected the
measures as follows (detailed in Appendix I.8, Table 68):
o They lower the number of tasks completed within the time limit.
o They lower the number of answers found.
o They lengthen the time spent on the tasks.
o They raise the number of page views.
o They raise the number of pages.
o They raise the number of extra page views.
The interactions showed that the magnitude of change in these measures depended on
the level of Web site complexity and the questions' information-scent score.
4.3 Summary of task performance in each condition
The pairwise comparisons between tools for each Web site complexity and question type are
summarized in Table 41 (detailed in Appendix I.7, Table 67).
Table 41: Summary of tool differences by Web site complexity and question type condition.

Web site complexity   Measure                High information scent   Low information scent
High                  Tasks completed        B=G=I                    G>I, G=B, B=I
                      Answers found          B=G=I                    G>B, G=I, B=I
                      Time spent on task     B=G=I                    B<I, I=G, G=B
                      Page views             G<I<B                    G<I<B
                      Pages                  G<I<B                    G<(B=I)
                      Revisited page views   (G=I)<B                  G<I<B
                      Extra page views       (B=G)<I                  G<I<B
Low                   Tasks completed        B=G=I                    B=G=I
                      Answers found          B=G=I                    G>(B=I)
                      Time spent on task     B<(G=I)                  B=G=I
                      Page views             (G=I)<B                  B=G=I
                      Pages                  (G=I)<B                  B<I<G
                      Revisited page views   B=G=I                    G<I<B
                      Extra page views       B<(G=I)                  B=G=I

B = Browser, G = Graphical overview, I = Integrated tool.
> significantly greater at the .05 level; < significantly less at the .05 level; = no significant difference at the .05 level.
4.3.1 Low complexity Web sites with high information-scent questions
In the low complexity Web site with the high information-scent questions, the tasks were the
easiest for the subjects. Subjects finished all the tasks within the time limit and found the target pages
in 93% of the tasks. 90% of the tasks were done within 1 minute, with an average of
31 sec. and a median of 16 sec. Time spent on task was significantly less with the browser (mean =
22.7 sec., median = 13 sec.) than with the graphical overview (mean = 31.7 sec., median = 20 sec.) or
the integrated tool (mean = 27.7 sec., median = 17.4 sec.).
With the browser, the number of page views and the number of pages were significantly higher
than with the graphical overview or the integrated tool. This follows from the fact that, when using
the browser, subjects had to follow the links page by page. Subjects were able to use the browser to
follow the shortest path, as indicated by the average number of revisited page views and the average
number of extra page views. Subjects did not obtain the "single click away" advantage of the
graphical overview: the average number of extra page views was 1.2 pages, which was worse than with
the browser. The performance of the integrated tool in the low complexity Web site with the high
information-scent questions was the average of using the browser part of the integrated tool alone and
using the graphical overview part of the integrated tool alone.
4.3.2 High complexity Web sites with high information-scent questions
In the high complexity Web site with high information-scent questions, 97% of the tasks
were done within the time limit and the answers were found in 82% of the tasks. There was no
significant difference between tools in the number of tasks completed or the number of answers
found, nor in the time spent on task (mean = 84 sec. and median = 47 sec.).
The number of page views when using the browser was significantly higher than when using
the graphical overview or the integrated tool. The number of pages when using the browser was
significantly higher than when using the integrated tool, and when using the integrated tool it was
significantly higher than when using the graphical overview. When using the browser,
the mean number of revisited page views was two pages and the median was zero pages; the
mean number of extra page views was five pages and the median was one page. This indicates that
subjects got off the shortest path on more tasks than in the low complexity Web site
with high information-scent questions. In the high complexity Web site with the high
information-scent questions, the average path length to the target node was longer than in the low
complexity Web site with the high information-scent questions: the minimum number of pages required
to reach the target page was 4.8 and 2.8 pages, respectively. When using the graphical overview, subjects
viewed more pages before locating the target node, as indicated by the number of extra page views
(mean = 5.1 pages and median = 1 page).
When using the integrated tool, the number of revisited page views was similar to that of
the graphical overview. The number of extra page views when using the integrated tool was
significantly higher than with the browser and the graphical overview because the way extra
page views were computed was biased toward using the graphical overview part.
4.3.3 Low complexity Web sites with low information-scent questions
In the low complexity Web site with low information-scent questions, 97% of the tasks were
done within the time limit and the answers were found in 73% of the tasks. There was no significant
difference in the number of tasks completed between tools. Using the graphical overview, the number
of answers found (83% of the tasks in this condition) was significantly higher than with the browser
(66% of the tasks) and the integrated tool (69% of the tasks). There was no significant difference
between tools in time spent on tasks (overall mean = 116 sec. and median = 96 sec.).
There was no significant difference between tools in the number of page views (overall mean
= 13 pages). The number of pages was significantly different between tools: the highest was with
the graphical overview (mean = 10.6 pages), the second with the integrated tool (mean = 9.4 pages),
and the lowest with the browser (mean = 8.1 pages). The number of revisited page views was the
reverse of the number of pages. There was a significant difference in the number of revisited page
views between tools: the highest was with the browser (mean = 7.8 pages), the second with the
integrated tool (mean = 3.8 pages), and the lowest with the graphical overview (mean = 1.9 pages).
The number of extra page views was high (overall mean = 11 pages, median = 11 pages), indicating
that most subjects did not navigate along the shortest path. The three Web sites in the low complexity
condition had 16, 27, and 29 HTML pages. On average, in the low complexity Web site with the low
information-scent question type, 41% of the total pages in the Web sites were viewed.
4.3.4 High complexity Web sites with low information-scent questions
In the high complexity Web site with low information-scent questions, the tasks were
difficult. Only 68% of the tasks in this condition were done within the time limit and the answers
were found in only 24% of the total tasks. The graphical overview was significantly better than the
browser and the integrated tool in the number of tasks completed and the number of answers found.
Using the graphical overview, 73% of the tasks in this condition were done within the time limit and
the target pages were found for 27% of the tasks. These figures were higher than with the
browser (tasks completed = 69% of the tasks, answers found = 20% of the tasks) and with the
integrated tool (tasks completed = 62% of the tasks, answers found = 26% of the tasks). Using the
integrated tool, the time spent on task (mean = 257 sec., median = 304 sec.) was significantly higher than
when using the graphical overview (mean = 230 sec., median = 239 sec.) or the browser (mean = 225
sec., median = 224 sec.). There was no significant difference in the time spent on task between the
browser and the graphical overview.
There were significant differences between tools in the number of page views. The number of
page views was highest when using the browser (mean = 29.6 pages, median = 24 pages), second
when using the integrated tool (mean = 21.8 pages, median = 19 pages), and lowest when using the
graphical overview (mean = 17.3 pages, median = 13.5 pages). There was no significant difference
in the number of pages between the browser (mean = 17.8 pages, median = 16 pages) and the
integrated tool (mean = 15.7 pages, median = 14 pages). When using the graphical overview, the
number of pages (mean = 14.4 pages, median = 11 pages) was significantly lower than with the others.
The number of revisited page views differed significantly between tools: the browser produced the
highest number (mean = 10.9 pages, median = 8 pages), the integrated tool was second (mean =
6 pages, median = 4 pages), and the graphical overview was lowest (mean = 3.1 pages, median = 2
pages). The number of extra page views indicates that the shortest path to the target node was
difficult to follow.
4.4 User satisfaction
User satisfaction was determined using a subjective questionnaire, the Post-Study
System Usability Questionnaire (PSSUQ) (Lewis, 1995). The overall satisfaction score (OVERALL)
was computed as the arithmetic mean of 19 questions. The sub-category scores, system usefulness
(SYSUSE), information quality (INFOQUAL), and interface quality (INTERQUAL), were
computed as the arithmetic means of question items 1-8, 9-15, and 16-18, respectively. The PSSUQ uses
a 7-point scale in which, given the anchors used, a higher score is better than a lower score.
The scores of N/A answers were disregarded.
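As a concrete illustration of this scoring scheme, the sketch below computes the OVERALL score and the three sub-scores from one subject's 19 item responses, ignoring N/A items. The item groupings (1-8, 9-15, 16-18) are taken from the text; the response values themselves are invented for the example.

    # Sketch of the PSSUQ scoring described above: arithmetic means over item
    # groups, with N/A (None) responses disregarded. Responses are invented.

    def mean_ignoring_na(items):
        answered = [x for x in items if x is not None]
        return sum(answered) / len(answered) if answered else None

    def pssuq_scores(responses):  # responses: list of 19 values in 1..7, or None
        return {
            "OVERALL":   mean_ignoring_na(responses),        # items 1-19
            "SYSUSE":    mean_ignoring_na(responses[0:8]),   # items 1-8
            "INFOQUAL":  mean_ignoring_na(responses[8:15]),  # items 9-15
            "INTERQUAL": mean_ignoring_na(responses[15:18]), # items 16-18
        }

    subject = [6, 5, 7, 6, None, 5, 6, 6, 4, 5, 5, None, 6, 5, 4, 6, 5, 6, 7]
    print(pssuq_scores(subject))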
Five subjects did not complete this questionnaire because of personal time
constraints. Incomplete answers and outliers (scores falling more than 1.5 times the interquartile
range below the lower quartile) were detected and removed (6 scores), leaving 97 scores for analysis.
The average user satisfaction scores are shown in Table 42. The PSSUQ score and sub-scores were
analyzed in an ANOVA with tool as a within-subjects factor. The sphericity assumption was not met
(Appendix I.9, Table 69), so an ANOVA with a lower-bound correction was used. There was a
significant difference in user satisfaction scores
between tools for all scores (Table 43). The pairwise comparisons showed significant differences in the
overall score and all sub-categories between the graphical overview and the browser, and between the
graphical overview and the integrated tool (detailed in Appendix I.9, Table 70). The graphical
overview received a lower overall score and lower scores in all sub-categories compared to the
browser and the integrated tool. There was also a significant difference in the information quality score
between the browser and the integrated tool: the integrated tool received a significantly
higher information quality score than the browser.
Table 42: Questionnaire descriptive statistics
Score       Tool                 Mean*   Std. Deviation
OVERALL     Browser              5.08     .984
            Graphical overview   4.23    1.284
            Integrated           5.19    1.084
SYSUSE      Browser              5.34    1.050
            Graphical overview   4.26    1.376
            Integrated           5.35    1.160
INFOQUAL    Browser              4.83    1.112
            Graphical overview   4.29    1.313
            Integrated           5.08    1.100
INTERQUAL   Browser              4.98    1.270
            Graphical overview   4.10    1.396
            Integrated           5.06    1.156
* Score values range from 1 to 7.
Table 43: ANOVA on PSSUQ scores with lower-bound correction
Source   Measure     Type III Sum of Squares   df       Mean Square   F        Sig.
TOOL     OVERALL      53.588                    1.000    53.588       43.841   .000
         SYSUSE       75.846                    1.000    75.846       46.193   .000
         INFOQUAL     31.985                    1.000    31.985       26.597   .000
         INTERQUAL    55.138                    1.000    55.138       28.137   .000
Error    OVERALL     117.344                   96.000     1.222
         SYSUSE      157.627                   96.000     1.642
         INFOQUAL    115.448                   96.000     1.203
         INTERQUAL   188.126                   96.000     1.960
There was some evidence that the browser's score may not be accurate. 22 subjects
wrote comments about features that were not provided by the browser. The problem was that many
subjects thought the browser in the first of the three PSSUQs was the integrated tool, and the software
used in collecting the questionnaire data did not allow subjects to go back to a previous
questionnaire. The scores for the graphical overview and integrated tool conditions were more accurate.
When the PSSUQ scores were analyzed without these 22 cases, the information quality score difference
between the browser and the integrated tool was not significant.
4.5 Support for Hypotheses
The hypotheses of this research are:
H0: There is no difference in user performance in information-finding tasks between integrated
navigational tools and individual navigational tools.
H1: There are significant differences in user performance in information-finding tasks when using
different navigational tools in certain kinds of environments.
The null hypothesis is rejected and H1 is accepted. The results from the experiment show that
user performance, in terms of the number of tasks completed, the number of answers found, the time
spent on task, and the numbers of pages viewed, differed when using different tools in certain
kinds of environments. The environments are classified by the complexity of the Web sites and the
semantic relatedness between the question and the information provided by the tools.
H1a: Integrated navigational tools, i.e. the browser and the graphical overview, will provide higher
performance in information-finding tasks and navigation within complex Web site spaces with high
information scent than will the browser or the graphical overview alone.
H1a is rejected. The performance of the integrated navigational tool was not higher than that of the
browser or the graphical overview in the high complexity Web sites with the high information-scent
question type. There was no significant difference in the time spent on task between tools; however,
when using the integrated tool the number of page views was significantly higher than when using the
graphical overview. It was expected that the integrated tool would have an advantage in navigation
within the complex Web space with high information scent because it presented both the map view
and the browser.
H2: Subjects will perform better when using the browser than when using the graphical overview in
simple structured Web sites with little information scent.
H2 is rejected. The number of answers found using the browser was less than using the
graphical overview in the low complexity Web site with the low information-scent question type.
The browser was expected to provide more information than the graphical overview in the low
information-scent situation, and the browser's performance was expected to be better
when the Web sites were simple. It appears that the information provided by the browser, e.g. the
sentences surrounding an anchor, did not contribute to performance. This may be because the low
information-scent questions had very little relation to the overall information on the Web pages. There
was no significant difference in time spent on task or total pages viewed between tools in this
condition, but with the graphical overview more pages were viewed than with the other tools. This
might explain why more answers were found when using the graphical overview.
H3: Subject performance when using integrated navigational tools will degrade with the simplicity
of the hypertext, as the tool becomes a noise contributor rather than an information provider.
H3 is accepted. The time spent on task using the integrated tool was significantly higher than
using the browser alone in the low complexity Web site with the low information-scent questions.
The integrated tool did help improve efficiency in this condition, as shown by the number of revisited
page views. Overall, however, the performance of the integrated tool in the low complexity Web site
was in between the browser and the graphical overview.
5 CONCLUSIONS AND FUTURE STUDY
This study set out to examine the performance of various tools with Web site complexity and
information scent controlled. It was motivated by the belief that these factors have a significant
impact on tool performance. This conclusion appears to be supported: Web site complexity and
information scent do have an impact on navigation for the purpose of information finding. There is
evidence supporting the conclusion that integrated tools add a level of cognitive overhead to the
task; this is seen in the longer time used in the high complexity Web site with the low
information-scent condition and in the transition time from anchor to icon. While the data do not
warrant any firm conclusions beyond those described in the results, the experimenter believes the
research also supports the conclusions that:
Information scent may be the single biggest factor in improving Web site browsing.
Experiments assessing "new" navigational tools will continue to be biased by user preference
for the tools with which they are already familiar.
5.1 Review of the research
This research sought to understand the use of integrated navigational tools to find
information in a Web site. An empirical experiment was conducted. Three navigational tools were
investigated: a browser, a graphical overview, and an integrated tool. The environments were varied
in terms of Web site complexity and level of information scent. Task performance was measured in
terms of the tasks completed within the time limit, the number of answers found, the time spent on
task, and the numbers of pages viewed. The numbers of pages viewed were calculated as the total
number of page views, the number of pages, the number of revisited page views, and the number of
extra page views.
In order to classify the Web sites into high and low complexity categories,
measurements of Web site structure were investigated. A sample of 83 Web sites was analyzed in
terms of their structure. Three structural measurements were used as indicators of Web site
complexity: the number of HTML nodes, the mean root distance, and the connected ratio. Six Web
sites were selected for the main experiment: three low complexity Web sites and three high
complexity Web sites.
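As an illustration of one of these structural indicators, the sketch below computes the mean root distance of a small site graph by breadth-first search from the root page. This is a minimal reading of the metric, assuming "mean root distance" is the average shortest-path distance from the root to every reachable HTML node; the example graph is invented.

    # Sketch: mean root distance of a site graph via BFS from the root page.
    # Assumption (not spelled out in the text): the metric is the average
    # shortest-path distance from the root to each reachable node.
    from collections import deque

    def mean_root_distance(links, root):
        dist = {root: 0}
        queue = deque([root])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, []):
                if nxt not in dist:            # first visit = shortest distance
                    dist[nxt] = dist[page] + 1
                    queue.append(nxt)
        reachable = [d for page, d in dist.items() if page != root]
        return sum(reachable) / len(reachable)

    site = {"index": ["fac", "grad"], "fac": ["fac-a", "fac-b"], "grad": ["grad-ma"]}
    print(mean_root_distance(site, "index"))  # (1+1+2+2+2)/5 = 1.6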
Questions were created from randomly selected pages of the selected Web sites. The
questions were classified in terms of their information scent by an information scent experiment. Two
sets of questions were selected based on their information-scent scores: the high information-scent
questions and the low information-scent questions.
A full factorial (3 tools x 2 Web site complexities x 2 question types), repeated-measures
within-subjects experiment was conducted. The 108 subjects were recruited from students at the
University of Pittsburgh.
Subjects used the integrated tool by alternating between the individual tools in different tasks or
by mixing the use of the tools within the same task. The performance of the integrated tool was not
superior to the individual tools alone. There appeared to be cognitive overhead associated with the
integrated tool: the results indicated that extra time was needed when switching from the browser to the
map overview, which showed as a significantly longer time spent on tasks when using the
integrated tool in the low information-scent question type condition. This finding is similar to the result
of Olsen and Nilsen (1987) that adding more features to a system lowers task performance.
The graphical overview provided an effective way to navigate in a Web site, as indicated by
the number of revisited page views, which was significantly lower when using the graphical overview
and the integrated tool in comparison to the browser.
The experiment showed that there were interactions between tool, Web site complexity, and
question type. As a consequence, tool performance differed across environments. For
instance, there was no difference in time spent on task between the browser and the integrated
tool in the high information-scent question condition, but there was a significant difference in the low
information-scent question condition.
Both Web site complexity and information scent had an effect on navigation performance.
The results also show an interaction between Web site complexity and information scent: low
information scent combined with a high complexity Web site degraded performance more than either
factor alone.
5.2 Summary of findings
The experimental results may be summarized as follows:
o Subjects used both navigational tools within the integrated tool.
o The number of tasks completed within the time limit was higher when using the graphical overview.
o The number of answers found was higher when using the graphical overview.
o Time spent on the tasks was less when using the browser.
o The number of page views was highest when using the browser, second when using the integrated tool, and lowest when using the graphical overview.
o Using the browser, more pages were viewed than with the integrated tool and the graphical overview, except when the Web site was of low complexity.
o More pages were revisited when using the browser.
o Subjects were more satisfied using the browser and the integrated tool than using the graphical overview.
o The performance of the integrated tool was in between the browser and the graphical overview, except for the time spent on task in the high complexity Web site with low information-scent questions and the extra page views in the high complexity Web site with high information-scent questions.
The following interactions were found:
Three-way interactions, tool by Web site complexity by question type, were found in the
number of tasks completed and the number of revisited page views.
Two-way interactions between tool and Web site complexity were found in the number
of page views, the number of pages, and the number of extra page views. The interaction
for the number of answers found was marginally non-significant.
Two-way interactions between tool and question type were found in the number of
answers found, time spent on tasks, the number of page views, the number of pages, and
the number of extra page views. The interaction was found in the number of tasks
completed in the high complexity Web site condition but was not
significant in the low complexity Web site condition.
Two-way interactions between Web site complexity and question type were found in the
number of answers found, time spent on task, the number of page views, the number of
pages, and the number of extra page views. The interaction was also found in the number
of revisited page views for the high information-scent questions but was not significant
for the low information-scent questions.
5.3 Comparison to prior research results
Information finding task performance differed across question types classified
by information-scent score and across Web site complexity. There was an interaction between tool and Web
site complexity and another between tool and question type. These findings are consistent
with Furnas's framework (Furnas, 1997) for determining the effectiveness of a view of a space. The
graphical overview and the browser present different views of the same space, and the difference
between the views affected performance. Information finding task performance was affected by
the semantic relatedness between the question and the information provided by the tool -- the "residual" (Furnas,
1997) or "information scent" (Pirolli, Card, & Wege, 2000).
The experiment conducted by Monk, Walsh, & Dix (1988), which indicated that a static
map aided hypertext navigation performance, predicted that the integrated tool should perform better,
in terms of lower response time, in the low complexity Web site condition. The results of this experiment
showed otherwise: the browser had a lower response time than the integrated tool. It might be
argued that subjects in the Monk et al. experiment had no experience using the browser, so that
browser performance there was low.
In comparison to the Hammond & Allinson (1989) experiment, the current research showed
that subjects used the map overview part more than the browser part of the integrated tool in the low
complexity Web site (comparable to the Hammond and Allinson experiment, which had 32
information screens). This difference in tool use contrasts with the Hammond and Allinson
experiment, which reported subjects using the map in the hypertext-with-map condition 39% of the time.
However, in their experiment the map had to be explicitly invoked, while in the current experiment the map
was presented side by side with the browser. The new-to-old ratio is the number of pages divided by
the number of page views (for example, 7 distinct pages seen in 10 page views gives a ratio of 0.7).
Their new-to-old ratios for the directed task (similar to our information finding tasks) using
hypertext (comparable to the browser) and hypertext with map (comparable to the integrated tool) were
0.27 and 0.47, respectively. In the current experiment, for the low complexity Web site with the low
information-scent question type, the new-to-old ratio was 0.51 when using the browser and 0.71 when
using the integrated tool. The conclusions were similar: the browser was
less efficient in the number of pages viewed compared to the integrated tool. However, our
experiment indicated the browser performed faster in the low complexity Web site with the high
information-scent question type and no differently than the integrated tool with the low information-scent
question type, whereas there was no significant difference in time performance between tools in their
experiment. One possible explanation may be that their questions were, on average, in the low
information-scent category.
The experiment by Wright & Lickorish (1990) indicated that the number of pages when using an
index page (comparable to the graphical overview) was lower than when using page navigation
(comparable to the browser). This is consistent with the current experiment, which found that subjects
using the browser in general viewed more pages than with the other tools. However, they reported
no significant difference in time performance between the two navigational tools for the direct finding
question type, whereas the current results did show a difference in time performance.
The results of the current experiment related to the integrated tool's time spent on task were
consistent with Heo (2000), in which the integrated tool took more time than the browser to complete
the information finding task. However, the differences in time spent on task were detected in specific
environments: the high complexity Web site with the low information-scent question type and the
low complexity Web site with the high information-scent question type. Heo's experiment did not
detect an interaction between the size of the Web site and the tools in time spent on task, but the
current experiment showed an interaction between Web site complexity and tools. It might be the
case that the size of the Web site was not the parameter that interacted with the tools. Heo's
experiment reported no significant difference in task accuracy between the tools (i.e. the browser and
three integrated tools), consistent with the current experiment, in which there was no significant
difference in the number of answers found between the integrated tool and the browser. However, in
the current experiment, the graphical overview performed better in terms of the number of answers
found.
5.4 Issues to reconsider
One major consideration in the experiment was the time limit and its impact on the results.
Initially, the task time limit was set at 10 minutes. A pilot study was conducted, and subjects in the
pilot study suggested that this was too long. The time limit was reduced to 6 minutes, based on the
fact that 90% of the answers found in the pilot study were found within that time.
However, the experimenter failed to detect that this had a substantial impact on tasks in the high
complexity Web site with low information-scent questions. As a consequence, the number of tasks
completed in this condition was low and produced a ceiling effect on the other performance
measurements.
The questions in the low information-scent group had very low scores. The questions in this
condition were not difficult questions; the information-scent score was low because the Web pages
and the graphical map did not provide the needed information in a way that made the answer to the
questions easy to find. When the information scent was too low, it was very difficult
to find the answer, particularly in the high complexity Web sites. As a result, in the high complexity
Web site with low information scent, too many tasks exceeded the time limit and many of the
answers were not found. In the WWW environment, there are many redundant sources of
information: when a user fails to find information in a Web site, changing Web sites may be easier
than continuing to explore the same site, but this strategy was not available to the subjects in this
study.
Information scent had a significant impact on navigational performance, and its measurement
might be improved. The browser information-scent score calculation was simplified by using the
arithmetic mean of the Web page scores on the shortest path to the target node. The score might be
more theoretically grounded by using a conditional probability instead of an arithmetic mean.
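The contrast can be made concrete with a small sketch. Assuming each page on the shortest path has a scent score in [0, 1] interpreted as the probability of choosing the correct link there (an assumption for illustration; the study's empirical scores can be negative), the mean-based score used here and a conditional-probability alternative diverge sharply on long paths:

    # Sketch: two ways to aggregate per-page scent scores along the shortest path.
    # Scores are assumed to lie in [0, 1] and, for the product version, to act as
    # independent probabilities of following the correct link; both assumptions
    # are illustrative, not this study's definition.
    from math import prod

    path_scores = [0.9, 0.8, 0.5, 0.8]  # invented per-page scores along the path

    mean_scent = sum(path_scores) / len(path_scores)  # method used in this study
    prob_scent = prod(path_scores)                    # conditional-probability view

    print(f"mean = {mean_scent:.2f}, product = {prob_scent:.2f}")
    # mean = 0.75, product = 0.29 -- one weak page hurts the product far more

The product view penalizes a single low-scent page on the path much more heavily, which may better reflect the chance of actually completing the traversal.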
5.5 Future research
The first question asked by many subjects was whether the tools had a search capability.
Searching, whether by a search engine or a search function within the tools, is important to the
information finding task, and integrating a search function might help overall performance. For
instance, in a simple Web structure with high information-scent questions, simple browsing might
perform well without a search function, and the search function might detract from the main finding
task. For low information-scent questions, a search function might be more beneficial.
The results showed that subjects used the integrated tool in an integrated way. However, the
appearance of the tools in the experiment was fixed on screen; they could not be moved or closed. The
results might be different if the tools did not appear side by side.
The integrated tool's performance was not better than that of the single tools, and this research
did not have enough information to explain why. The integrated tool did provide some improvement
over the browser alone, e.g. it reduced the number of revisited pages and allowed more answers to be
found.
The experiment showed that the question type, based on information-scent score, and the Web
site complexity had a strong impact on performance. These parameters could be used as
performance predictors in the Web site design process. If this is to be done, the process for obtaining
information scent may have to be refined to be practical. The difficult task will be to predict user and
customer information needs in general; information scent also depends on prior knowledge and
common knowledge about the subject of a Web site. Small-scale Web site usability testing with
focus-group subjects might be useful.
The Web site complexity metric, on the other hand, is more objective and easier to automate.
This property is useful in an iterative design methodology. The relation between Web site complexity
and navigational performance should be investigated in more detail.
Appendix A : Web visualization tools
Figure 41: Web browser with a distortion technique tool
Figure 42: Web browser with a zoom technique tool
Figure 43: Web browser with an expanding outline technique tool
Appendix B : URI in HTML tags
The following tag-attribute pairs from the HTML (version 4) DTD specification take a URI as the attribute value:
A href
APPLET codebase
AREA href
BASE href
BLOCKQUOTE cite
BODY background
DEL cite
FORM action
FRAME longdesc
FRAME src
HEAD profile
IFRAME longdesc
IFRAME src
IMG longdesc
IMG src
IMG usemap
INPUT src
INPUT usemap
INS cite
LINK href
OBJECT classid
OBJECT codebase
OBJECT data
OBJECT usemap
Q cite
SCRIPT for
SCRIPT src
The "ismap" attribute of the IMG tag indicates that the client software should capture the clicked pointer
location, i.e. the x and y coordinates, and derive the URI request from the parent A tag by appending '?'
followed by the x and y values passed to the server.
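A link extractor based on this tag-attribute list can be sketched briefly. The following is a minimal illustration using Python's standard html.parser, covering a subset of the pairs listed above (the full table would extend the dictionary); it is not the software used in this study.

    # Sketch: extracting URIs from HTML using (a subset of) the tag-attribute
    # pairs listed above. Extend URI_ATTRS with the remaining pairs as needed.
    from html.parser import HTMLParser

    URI_ATTRS = {
        "a": {"href"}, "area": {"href"}, "base": {"href"}, "link": {"href"},
        "img": {"src", "longdesc", "usemap"}, "frame": {"src", "longdesc"},
        "iframe": {"src", "longdesc"}, "form": {"action"}, "script": {"src"},
        "body": {"background"}, "q": {"cite"}, "blockquote": {"cite"},
    }

    class URIExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.uris = []

        def handle_starttag(self, tag, attrs):
            wanted = URI_ATTRS.get(tag, set())
            for name, value in attrs:
                if name in wanted and value:
                    self.uris.append(value)

    parser = URIExtractor()
    parser.feed('<a href="fac.html">Faculty</a><img src="logo.gif">')
    print(parser.uris)  # ['fac.html', 'logo.gif']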
Appendix C : Stratum formula
Let $D$ be a directed graph with $n$ nodes and let $d(u,v)$ be the distance between $u$ and $v$ in $D$.
Let $a_i$ be the distance sum over all $u$ in $D$ for which $d(v_i,u)$ is finite, i.e. the sum of the finite entries in the $i$-th row of the distance matrix of $D$:
$a_i = \sum_{u \,:\, d(v_i,u) < \infty} d(v_i,u)$
Let $b_i$ be the distance sum over all $u$ in $D$ for which $d(u,v_i)$ is finite, i.e. the sum of the finite entries in the $i$-th column of the distance matrix of $D$:
$b_i = \sum_{u \,:\, d(u,v_i) < \infty} d(u,v_i)$
The linear absolute prestige (LAP) is given by
$LAP = n^3/4$ if $n$ is even, and $LAP = (n^3 - n)/4$ if $n$ is odd,
where $n$ is the number of nodes in $D$.
The total absolute prestige (TAP) is given by
$TAP = \sum_{i=1}^{n} |a_i - b_i|$
The stratum (St) is defined as
$St = TAP / LAP$
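A brief sketch of this computation, under the reconstruction above (all-pairs shortest paths, finite row and column sums, TAP normalized by LAP), is shown below; the example graph is invented.

    # Sketch: stratum of a small directed graph, following the formulas above.
    # Distances are all-pairs shortest path lengths; unreachable pairs are skipped.
    from collections import deque

    def distances_from(links, start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in links.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def stratum(links):
        nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
        n = len(nodes)
        d = {u: distances_from(links, u) for u in nodes}
        a = {u: sum(d[u].values()) for u in nodes}                  # row sums
        b = {u: sum(d[w].get(u, 0) for w in nodes) for u in nodes}  # column sums
        tap = sum(abs(a[u] - b[u]) for u in nodes)
        lap = n**3 / 4 if n % 2 == 0 else (n**3 - n) / 4
        return tap / lap

    # A 4-node chain is maximally stratified, so its stratum is 1.
    chain = {"p1": ["p2"], "p2": ["p3"], "p3": ["p4"]}
    print(stratum(chain))  # 1.0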
Appendix D : Web site structure statistic
Table 44: Correlations between numbers of nodes
Total URLs URLs within site HTML nodes
Total URLs 1.000 .992(**) .942(**)
URLs within site .992(**) 1.000 .935(**)
HTML nodes .942(**) .935(**) 1.000
Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 45: Correlations between numbers of links
Total links Navigation links Connections
Total links 1.000 .967(**) .960(**)
Navigation links .967(**) 1.000 .969(**)
Connections .960(**) .969(**) 1.000
Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 46: Distance measurement correlation
                             Directed         Bi-direction     Jump to root     Root
                             distance mean    distance mean    distance mean    distance mean
Directed distance mean       1.000            .759(**)         .932(**)         .848(**)
Bi-direction distance mean   .759(**)         1.000            .901(**)         .792(**)
Jump to root distance mean   .932(**)         .901(**)         1.000            .923(**)
Root distance mean           .848(**)         .792(**)         .923(**)         1.000

Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 47: Correlation between the Web site metrics

Columns: (1) HTML nodes, (2) Connections, (3) Connections per HTML node-1 ratio, (4) Connected ratio, (5) Compactness.

                                      (1)         (2)         (3)         (4)         (5)
HTML nodes                            1.000       .820(**)    .138        -.004       .019
Connections                           .820(**)    1.000       .505(**)    .263(**)    .288(**)
Connections per HTML node-1 ratio     .138        .505(**)    1.000       .594(**)    .619(**)
Connected ratio                       -.004       .263(**)    .594(**)    1.000       .998(**)
Compactness                           .019        .288(**)    .619(**)    .998(**)    1.000
Stratum                               -.367(**)   -.267(**)   -.213(*)    .020        .003
Directed distance mean                .586(**)    .297(**)    -.069       -.035       -.024
Bi-direction distance mean            .507(**)    .109        -.309(**)   -.528(**)   -.520(**)
Jump to root distance mean            .617(**)    .258(*)     -.174       -.296(**)   -.284(**)
Root distance mean                    .647(**)    .346(**)    .048        -.137       -.122

Table 47 (Cont.)

Columns: (6) Stratum, (7) Directed distance mean, (8) Bi-direction distance mean, (9) Jump to root distance mean, (10) Root distance mean.

                                      (6)         (7)         (8)         (9)         (10)
HTML nodes                            -.367(**)   .586(**)    .507(**)    .617(**)    .647(**)
Connections                           -.267(**)   .297(**)    .109        .258(*)     .346(**)
Connections per HTML node-1 ratio     -.213(*)    -.069       -.309(**)   -.174       .048
Connected ratio                       .020        -.035       -.528(**)   -.296(**)   -.137
Compactness                           .003        -.024       -.520(**)   -.284(**)   -.122
Stratum                               1.000       -.325(**)   -.370(**)   -.381(**)   -.352(**)
Directed distance mean                -.325(**)   1.000       .759(**)    .932(**)    .848(**)
Bi-direction distance mean            -.370(**)   .759(**)    1.000       .901(**)    .792(**)
Jump to root distance mean            -.381(**)   .932(**)    .901(**)    1.000       .923(**)
Root distance mean                    -.352(**)   .848(**)    .792(**)    .923(**)    1.000

Pearson Correlation. * Correlation is significant at the 0.05 level (1-tailed). ** Correlation is significant at the 0.01 level (1-tailed).
Appendix E : Web Sites in the experiment and their properties
Id   Type       Name and URL                                              HTML nodes   Mean root distance   Connected ratio
9    High       Intelligent Systems Program                               261          3.9000               0.6442
                http://www.isp.pitt.edu/
2    High       Dept. of Surgery                                          327          3.0951               0.0265
                http://www.surgery.upmc.edu/
1    High       Women's Studies Program                                   105          2.5192               0.5034
                http://www.pitt.edu/~womnst/
4    Low        Dept. of Hispanic Languages and Literature                29           1.8571               1.0000
                http://www.pitt.edu/~hispan/
5    Low        Dept. of Statistics                                       27           1.9231               0.8376
                http://www.stat.pitt.edu/
3    Low        Film Studies Program                                      16           1.8000               1.0000
                http://www.pitt.edu/~filmst/
7    Practice   Dept. of History of Art and Architecture                  62           2.4918               0.8072
                http://www.pitt.edu/~arthome/index.html
8    Practice   Dept. of Information Science and Telecommunications       118          2.0000               0.8020
                http://www.sis.pitt.edu/~dist/

Scan date: 02-Nov-2000
Appendix F : Information Scent experiment
F.1 Information scent experiment instruction sheet
Information scent experiment Instruction sheet
Thank you for participating in this experiment. The objective of this information scent
experiment is to measure the semantic information, i.e. the information scent, between a given question
and a Web site. The values obtained will be used as part of the Ph.D. thesis, Semantics, Complexity
and Capability: The Use of Integrated Navigational Tools for Information Finding in Hypertext
Document Space.
The experiment will use special software to present material and collect data. Your task is to select
the pages or anchors that you think are most likely to contain, or lead to, information that answers
the given question. In total, there are 150 screens in this experiment.
Two types of screen will be presented. The first is a Graphical Overview, which shows
all the pages in a Web site as icons. In this view, your task is to select three of the icons (pages) that
are most likely to contain the answer to the given question. You should order the selections, putting the
most likely first. A screen snapshot of the Graphical Overview is shown as Figure 1.
Figure 1: Graphical Overview
The overview can be manipulated using the scroll bars at the bottom (1) and right (2). The
small map may also be used to navigate, by clicking or dragging the small box on it (3), which
represents what you are seeing. The view can be zoomed in or out with the + and - buttons or the scroll
bar on the left near the +, - buttons (4).
The second screen type is similar to a Web browser, as shown in Figure 2. A given page
will be presented. Your task is to select up to three anchors that are most likely to lead to the page
that you think will contain the answer, or to be closer to the page containing the answer. You must
select at least one, but you can select up to three. If you select more than one anchor, the first should
be the one most likely to move you to, or toward, the page with the answer. Keep in mind that for a
given page, you may only be selecting a link that moves you toward the answer. Note that, in the
browser screen, anchors are usually represented by blue, underlined text; however,
sometimes image areas are also selectable anchors. One indication of an anchor on an image is a
change in the cursor shape when the cursor is over an anchor area.
Figure 2: Browser
For both tasks, the selected pages or anchors will be shown in a list box like the one in
Figure 3. The order of the selected items can be re-arranged by selecting an item and using the ordering
buttons. You may delete an item by selecting it and pressing the delete button.
Figure 3: Selected anchors or pages
After finishing the experiment, there will be two questionnaires to collect demographic data
and Web site familiarity information.
Your participation in this research study is completely voluntary. You do not have to take
part in this research study and, should you change your mind, you can withdraw from the study at any
time.
There are no direct risks or benefits to participation. Indirectly, you will have a chance to be
exposed to a state-of-the-art technology and learn more about navigational tools. You will also
contribute indirectly to the development of such technology.
There will be no cost to you. The experiment will take approximately one hour. Upon
completion, you will receive a $25 payment for your participation, or $7 per hour.
F.2 Questions in the information scent experiment and results
Table 48: Questions, target Web page and selected Web pages for information scent experiment
Site 9, Question 9Q1H1** -> 9Q2H2
  Target: http://www.isp.pitt.edu/courses/2710.html
  Question: What was the course description of ISSP 2710?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Information/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/complete.html

Site 9, Question 9Q2H2** -> 9Q1H1
  Target: http://www.isp.pitt.edu/courses/3540.html
  Question: What was the course description of ISSP 3540?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Information/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/complete.html

Site 9, Question 9Q3H3
  Target: http://www.isp.pitt.edu/~whq/wangresume.html
  Question: What papers has graduate student HaiQin Wang published?
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~whq/

Site 9, Question 9Q4L1
  Target: http://www.isp.pitt.edu/program/specstud.html
  Question: Does ISSP accept special students?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Map/map.html

Site 9, Question 9Q5L2* -> 9Q4L2
  Target: http://www.isp.pitt.edu/~smonti/HTML/DOCUMENTS/ais99.html
  Question: Find the abstract of "A latent variable model for multivariate discretization."
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~smonti/index.html
    http://www.isp.pitt.edu/~smonti/HTML/publications.html

Site 9, Question 9Q6L3* -> 9Q3L1
  Target: http://www.isp.pitt.edu/~carenini/storage/new-papers-frame.html
  Question: Who wrote "Describing Complex Charts in Natural Language: A caption Generation System"?
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/people-frame.html
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~carenini/

Site 2, Question 2Q1H1** -> 2Q1H1
  Target: http://www.surgery.upmc.edu/contact/plastic/Russavage/Russavageeducation.htm
  Question: Where was James M. Russavage (professor of plastic surgery) educated and what was his training in?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Russavage/Russavagebio.htm

Site 2, Question 2Q2H2
  Target: http://www.surgery.upmc.edu/resident/general/awards.htm
  Question: Find the list of Resident Research Awards/Grants in the General Surgery Residency Training Program.
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResGen.htm
    http://www.surgery.upmc.edu/resident/general/research.htm

Site 2, Question 2Q3H3
  Target: http://www.surgery.upmc.edu/resident/pediatric/application.htm
  Question: What is the street address for submitting an application for the Pediatric Surgery Resident Program?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResPed.htm

Site 2, Question 2Q4L1* -> 2Q4L2
  Target: http://www.surgery.upmc.edu/resident/oncology/research.htm
  Question: Who worked on the research about "Identification of Tumor Vasculature Binding Peptides Using an E. coli Peptide Display Library"?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResOnc.htm

Site 2, Question 2Q5L2* -> 2Q3L1
  Target: http://www.surgery.upmc.edu/contact/plastic/Shestak/Shestaklicense.htm
  Question: What professional and scientific societies does Kenneth C. Shestak belong to?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/NAVcontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Shestak/Shestakbio.htm

Site 2, Question 2Q6L3** -> 2Q2H2
  Target: http://www.surgery.upmc.edu/contact/plastic/Manders/Mandershours.htm
  Question: What are Ernest Manders's outpatient clinic hours?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSsplash.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Manders/Mandersbio.htm

Site 1, Question 1Q1H1** -> 1Q1H1
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/vote.html
  Question: In 1998, what was the status of the Pennsylvania Women's Vote project?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 1, Question 1Q2H2
  Target: http://www.pitt.edu/~womnst/contactus/contactus.html
  Question: Find the web page for adding your name to the Women's Studies program's mailing list.
  Test pages:
    http://www.pitt.edu/~womnst/index.html

Site 1, Question 1Q3H3** -> 1Q2H2
  Target: http://www.pitt.edu/~womnst/newsletters/newsfall96/w7.html
  Question: What are the discussions at the UN Women's Conference "ONE YEAR LATER", in Fall 1996?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsfall96/newsltr1.html

Site 1, Question 1Q4L1* -> 1Q3L1
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/grat.html
  Question: Who were the patrons of the women's study program, in March 1998?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 1, Question 1Q5L2* -> 1Q4L2
  Target: http://www.pitt.edu/~womnst/newsletters/newsfall96/call.html
  Question: When was the due date to send the abstracts for the George Washington University conference on Cultural Violence, March 7-9, 1997?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsfall96/newsltr1.html

Site 1, Question 1Q6L3
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/eholmes.html
  Question: Who is Erin Holmes?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 4, Question 4Q1H1
  Target: http://www.pitt.edu/~hispan/related.html
  Question: What are the Spanish Language Periodicals suggested by the Hispanic Languages and Literatures program?
  Test pages:
    http://www.pitt.edu/~hispan/index.html

Site 4, Question 4Q2H2** -> 4Q1H1
  Target: http://www.pitt.edu/~hispan/fac-jbra.html
  Question: What are Jerome Branche's specialties?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q3H3** -> 4Q2H2
  Target: http://www.pitt.edu/~hispan/grad-ma.html
  Question: How many credits are required for the Master of Arts (MA) in Hispanic Languages & Literatures?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/grad.html

Site 4, Question 4Q4L1* -> 4Q3L1
  Target: http://www.pitt.edu/~hispan/fac-mm.html
  Question: Who wrote "Literatura y cultura nacional en Hispanoamérica" (1910-1940)?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q5L2* -> 4Q4L2
  Target: http://www.pitt.edu/~hispan/fac-leeman.html
  Question: Who has research interests in interaction in second language acquisition, feedback and negative evidence in SLA, task-based language learning and teaching?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q6L3
  Target: http://www.pitt.edu/~hispan/fac-tp.html
  Question: Who wrote the Ph.D. dissertation on "The Production and Perception of Vowel Sounds: A Case Study of Peruvian students learning English as a foreign language"?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 5, Question 5Q1H1
  Target: http://www.stat.pitt.edu/abstracts.html
  Question: Find the abstracts of the "PERFECT SAMPLING: AN INTRODUCTION" seminar (Thursday, October 26, 2000).
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/news.html

Site 5, Question 5Q2H2** -> 5Q2H2
  Target: http://www.stat.pitt.edu/pfenning.html
  Question: What is one technique Dr. Pfenning is interested in using to enhance student involvement in her courses?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q3H3** -> 5Q1H1
  Target: http://www.stat.pitt.edu/block.html
  Question: What class was Prof. Henry W. Block teaching in Spring 2000?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q4L1* -> 5Q4L2
  Target: http://www.stat.pitt.edu/ds.html
  Question: Who has research interests in time series, spatial statistics, longitudinal data analysis and applications to medicine, epidemiology, molecular biology and computer vision?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q5L2* -> 5Q3L1
  Target: http://www.stat.pitt.edu/ts.html
  Question: Who has research interests in reliability theory, applied probability theory, stochastic processes, and dependence concepts?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q6L3
  Target: http://www.stat.pitt.edu/students.html
  Question: Who is Robert Buck?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/graduate.html
    http://www.stat.pitt.edu/ci.html
    http://www.stat.pitt.edu/grad.html

Site 3, Question 3Q1H1
  Target: http://www.pitt.edu/~filmst/pittfaculty.html
  Question: Who are the faculty members in Film Studies?
  Test pages:
    http://www.pitt.edu/~filmst/index.html

Site 3, Question 3Q2H2** -> 3Q1H1
  Target: http://www.pitt.edu/~filmst/pittevents.html
  Question: What talk was given on WEDNESDAY OCTOBER 25TH and where?
  Test pages:
    http://www.pitt.edu/~filmst/index.html

Site 3, Question 3Q3H3** -> 3Q2H2
  Target: http://www.pitt.edu/~filmst/pittgradcourse.html
  Question: What courses were required by graduate film study?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittgrad.html
    http://www.pitt.edu/~filmst/pittgradcourse.html

Site 3, Question 3Q4L1* -> 3Q3L1
  Target: http://www.pitt.edu/~filmst/pittugcourses.html
  Question: What is the title of the advertisement taken from Motion Picture Magazine 1913?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html

Site 3, Question 3Q5L2
  Target: http://www.pitt.edu/~filmst/ugcatone.html
  Question: What is the course number for "The World of China: Chinese National Cinema"?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html
    http://www.pitt.edu/~filmst/pittugcourses.html

Site 3, Question 3Q6L3* -> 3Q4L2
  Target: http://www.pitt.edu/~filmst/pittugmajor.html
  Question: Find the Web page that shows the patent for Edison's Kinetograph.
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html

* Selected as a low information scent question. ** Selected as a high information scent question.
-> Question ID used in the main experiment.
Table 49: Information-scent score
Web Site Question ID
Avg. of Graphical
overview scent score Std. Dev.
Avg. of Browser scent score Std. Dev.
Avg. of information scent Std. Dev.
High Complex
9Q2H2** 0.88 0.249 0.84 0.219 0.86 0.1669Q1H1** 0.90 0.316 0.81 0.189 0.86 0.1939Q3H3 0.50 0.471 0.69 0.328 0.59 0.3259Q4L1 0.55 0.438 0.31 0.352 0.43 0.3319Q6L3* 0.13 0.219 0.05 0.263 0.09 0.2149Q5L2* 0.00 0.000 0.15 0.396 0.07 0.1982Q1H1** 0.55 0.497 0.80 0.270 0.68 0.2932Q6L3** 0.40 0.516 0.36 0.212 0.38 0.2912Q2H2 0.30 0.483 0.40 0.355 0.35 0.3462Q3H3 0.10 0.316 0.49 0.626 0.30 0.3092Q5L2* 0.03 0.105 0.06 0.285 0.05 0.1572Q4L1* 0.00 0.000 -0.41 0.298 -0.21 0.1491Q1H1** 0.50 0.471 0.54 0.524 0.52 0.3871Q3H3** 0.53 0.502 0.40 0.345 0.47 0.3141Q6L3 0.55 0.497 -0.29 0.430 0.13 0.4241Q2H2 0.43 0.439 -0.56 0.486 -0.06 0.3481Q4L1* 0.00 0.000 -0.17 0.448 -0.09 0.2241Q5L2* 0.03 0.105 -0.44 0.468 -0.20 0.243
Low Complex
4Q2H2** 0.93 0.211 0.48 0.079 0.70 0.1084Q3H3** 0.85 0.337 0.39 0.161 0.62 0.2384Q6L3 0.30 0.483 -0.04 0.347 0.13 0.3274Q1H1 0.22 0.334 0.00 0.000 0.11 0.1674Q4L1* 0.10 0.316 -0.09 0.204 0.00 0.2184Q5L2* 0.00 0.000 -0.03 0.129 -0.01 0.0655Q3H3** 0.87 0.281 0.28 0.236 0.58 0.1905Q2H2** 0.83 0.272 0.28 0.172 0.56 0.1565Q6L3 0.40 0.370 0.18 0.186 0.29 0.2325Q1H1 0.40 0.459 0.11 0.314 0.26 0.3595Q5L2* 0.00 0.000 0.15 0.218 0.08 0.1095Q4L1* 0.05 0.158 0.05 0.327 0.05 0.1923Q2H2** 0.95 0.158 0.93 0.237 0.94 0.1353Q3H3** 0.63 0.350 0.66 0.388 0.64 0.3143Q1H1 0.95 0.158 0.00 0.000 0.48 0.0793Q5L2 0.05 0.158 0.73 0.188 0.39 0.1343Q4L1* 0.05 0.158 -0.07 0.395 -0.01 0.2233Q6L3* 0.00 0.000 -0.15 0.344 -0.08 0.172
Overall 0.39 0.340 0.22 0.382 0.30 0.317* Selected as low information scent question. ** Selected as high information scent question.
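For readers who want to see how averages like those in Table 49 could be produced, the sketch below shows one plausible aggregation of individual rater scores; it is a minimal illustration, not the original analysis code, and the file name, column names, and cutoffs are all assumptions.

```python
# A minimal sketch (not the original analysis code) of how per-question
# scent averages like those in Table 49 could be computed from rater scores.
# The file name, column names, and cutoff values are assumptions.
import pandas as pd

ratings = pd.read_csv("scent_ratings.csv")  # one row per rater x question x tool

# Mean and SD of the scent score per question for each tool condition.
by_tool = ratings.groupby(["question_id", "tool"])["scent_score"].agg(["mean", "std"])

# Overall information scent per question, pooling both tool conditions.
overall = ratings.groupby("question_id")["scent_score"].agg(["mean", "std"])

# Candidate high- and low-scent questions (illustrative cutoffs only).
print("high scent:", sorted(overall.index[overall["mean"] >= 0.5]))
print("low scent:", sorted(overall.index[overall["mean"] <= 0.1]))
```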
Appendix G: The main experiment instruction sheet
Navigational Tools Experiment Instruction Sheet
Thank you for participating in this experiment. The objective of this navigational tools
experiment is to measure and compare navigation performance between three navigational tools in
various conditions. The data obtained will be used as part of the Ph.D. thesis, Semantics, Complexity
and Capability: The Use of Integrated Navigational Tools for Information Finding in Hypertext
Document Space.

The experiment uses special software to present tasks and collect data. Your task is to find a
Web page that provides information answering a given question. In total, there are 30 tasks in this
experiment. There will also be a follow-up questionnaire to gather demographic data, user
satisfaction information, etc.

There are three navigational tools used in the experiment: a Browser similar to ones you have
used before, a Graphical Overview which provides a graph of Web pages, and a tool which combines the
Browser and the Graphical Overview.

IF YOU HAVE ANY QUESTIONS, PLEASE ASK THE EXPERIMENTER AT ANY TIME. THANK YOU.
Navigational Tools
Browser
The Browser is a simplified version of a Web browser such as Internet Explorer. It is shown in
Figure 1. You are limited to navigating within a given Web site. Navigation can be done by clicking
on an anchor (1), the back button (2), or the forward button (3).

Figure 1: Browser

In the browser screen, anchors are usually represented by blue, underlined text.
However, sometimes image areas are also selectable anchors. One indication of an anchor on an
image is that the cursor changes to a hand-shaped pointer when moving over the image. This means
the image is an anchor that may be clicked.
Graphical Overview
The Graphical Overview, shown in Figure 2, provides a map view and a text viewer. The map
shows the whole Web site: each Web page is represented by an icon on the map, and a link between pages is
represented by a line. A selected Web page is shown in the text viewer. Your view of the Web site can be
changed by using the scroll bars at the bottom (1) and right (2), or by dragging the main map area. The
small map may also be used to navigate, by clicking or dragging the small box on it (3); the area in
the box represents what you are seeing. The view can be zoomed in or out with the + and - buttons, or with
the scroll bar on the left near the + and - buttons (4). Clicking on any icon (5) on the map will show the content of
that page in the text viewer (6). The back and forward buttons (7) can be used to go back to a previous
view and forward again.

When a page has been visited, the text color of its icon changes to purple. The current page in the
text viewer is indicated by a red icon, and the location of that selected page is shown in the small map
as a red dot. In this mode, there are no links in the text viewer.
Figure 2: Graphical Overview
Browser and Graphical Overview
The Browser and Graphical Overview is an integrated tool, shown in Figure 3. The
navigational functions of both the Browser and the Graphical Overview work. You may navigate to a
page by clicking on an icon in the Graphical Overview or an anchor in the Browser. Both tools are
synchronized: when you select a page in the Browser, the map will scroll automatically and show the
icon of the selected page in red; when you select an icon in the Graphical Overview, the Browser
will show that page.
Figure 3: Browser and Graphical Overview
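The synchronized behavior described above is essentially a mediator between the two views. The sketch below illustrates the idea in Python; it is a minimal illustration only, with invented class and method names, and is not taken from the experiment's actual software.

```python
# A minimal sketch of the Browser/Graphical Overview synchronization described
# above. All class and method names are illustrative assumptions.

class Browser:
    def __init__(self):
        self.on_anchor_clicked = None   # callback set by the mediator

    def load(self, url):
        print(f"browser: displaying {url}")


class GraphicalOverview:
    def __init__(self):
        self.on_icon_clicked = None     # callback set by the mediator

    def scroll_to(self, url):
        print(f"overview: scrolling map to the icon for {url}")

    def mark_current(self, url):
        print(f"overview: showing the icon for {url} in red")


class IntegratedTool:
    """Keeps both views pointed at the same page, whichever view navigated."""

    def __init__(self, browser, overview):
        self.browser = browser
        self.overview = overview
        browser.on_anchor_clicked = self.show_page
        overview.on_icon_clicked = self.show_page

    def show_page(self, url):
        self.browser.load(url)           # load the page in the browser pane
        self.overview.scroll_to(url)     # keep the map in step
        self.overview.mark_current(url)  # highlight the selected page's icon


tool = IntegratedTool(Browser(), GraphicalOverview())
tool.show_page("http://www.pitt.edu/~filmst/index.html")
```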
Software for the experiment
You will need to take the following steps as you move through the experiment.

1. You will fill in a user ID, which will be assigned to you by the experimenter.

2. After entering the user ID, you will be shown an introduction page and brief pages for each step in
the experiment. Read the information and click on “Next >” (1) to continue to the next page.
3. There will be 2 sessions, one for practice and one for the experiment proper.

The practice session is designed to allow you to practice using the navigational tools and
become familiar with the tasks. There will be 3 practice tasks with each navigational tool. Take your
time and play with the functions of each navigational tool. There is no time limit for the practice
session.

The experimental session will then be conducted to collect data. Each task is limited to
6 minutes. The remaining time is shown by the small clock below; when the circle is completely black,
the time has expired.
There are a total of 30 questions in the experimental session.
4. When a task page appears, the question or instruction will be shown at the top of the screen (1).
Click start (2) to begin the task. The clock (3) will start counting.

5. The navigational tool will not appear until you press start. Use the navigational tool to
navigate to the page that contains the information to answer the given question. When you have found
the page you want, click on submit (1).

6. If you cannot find the answer within the time limit (6 min.), the navigational tool will disappear. Do
not worry about not completing the task; your goal should simply be to do your best. Some target pages
will be difficult to find. Click on submit to continue to the next question.

7. You may take a break at any time. However, to make the results of the experiment as consistent as
possible, please take breaks between tasks, i.e., after clicking submit and before clicking start.
After finishing the experiment, there will be a questionnaire to collect demographic data,
Web site familiarity information, etc. Answer all the questions. When you finish, the “done” button
at the bottom will be enabled; click it to go to the next part. Some parts have multiple pages;
use the “<Previous” or “Next>” buttons to change pages, or use a page tab to go to a
certain page.
Your participation in this research study is completely voluntary. You do not have to take
part in this research study and, should you change your mind, you can withdraw from the study at any
time.
There are no direct risks or benefits to participation. Indirectly, you will have a chance to be
exposed to state-of-the-art technology and learn more about navigational tools. You will also
contribute indirectly to the development of such technology.
There will be no cost to you. The experiment will take approximately one and a half hours.
Upon completion, you will receive a $15 payment for your participation.
Appendix H: Questionnaires
H.1 Demographics, Computer and World Wide Web Experience form

Figure 44 shows the screen that was used to obtain demographic, computer, and World Wide Web experience data.
Figure 44: Demographic data screen
H.2 Web site familiarity score

Figure 45 shows the screen that was used to obtain the Web site familiarity scores. There were 7
pages; the same question was asked with different Web site names and pictures.
Figure 45: Web site familiarity screen
H.3 User satisfaction Questionnaire

The questionnaire was based on the Post-Study System Usability Questionnaire (PSSUQ)
(Lewis, 1995). The software showed the following instructions on the first screen:
This questionnaire gives you an opportunity to tell us your reactions to having used Browser,
Graphical Overview and Browser + Graphical Overview. Your responses will help us understand
what aspects of software you are particularly concerned about and the aspects that satisfy you.
To as great a degree as possible, think about all the tasks you just performed while you
answer these questions. Please read each statement carefully and indicate how strongly you agree
or disagree with the statement by checking a number on the scale.
If you are certain that a statement does not apply to you, check N/A.
The three sets of four-page questionnaires are shown in Figure 46.
Figure 46: User Satisfaction Questionnaire screen
List of questions that appeared in the user satisfaction questionnaire (Figure 46):
1. Overall, I am satisfied with how easy it is to use Browser
2. It was simple to use Browser
3. I can effectively complete my work using Browser
4. I am able to complete my work quickly using Browser
5. I am able to efficiently complete my work using Browser
6. I feel comfortable using Browser
7. It was easy to learn to use Browser
8. I believe I became productive quickly using Browser
9. Browser gives error messages that clearly tell me how to fix problems
10. Whenever I make a mistake using Browser, I recover easily and quickly
11. The information (such as online help, on-screen messages, and other documentation) provided
with Browser is clear
12. It is easy to find the information I needed
13. The information provided for Browser is easy to understand
14. The information is effective in helping me complete the tasks and scenarios
15. The organization of information on Browser screens is clear
16. The interface of Browser is pleasant
17. I like using the interface of Browser
18. Browser has all the functions and capabilities I expect it to have
19. Overall, I am satisfied with Browser
List the most negative aspect(s) of Browser:
1: __________________________________________________________________
2: __________________________________________________________________

List the most positive aspect(s) of Browser:
1: __________________________________________________________________
2: __________________________________________________________________
All occurrences of the word “Browser” in the questions were replaced with “Graphical Overview” in the
second questionnaire and with “Browser + Graphical Overview” in the third questionnaire screen.
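For reference, PSSUQ responses are conventionally summarized as item means over four overlapping sub-scales (Lewis, 1995). The sketch below shows one plausible scoring routine; the function and the example responses are illustrative assumptions, though the item groupings follow the published PSSUQ.

```python
# A minimal sketch of PSSUQ sub-scale scoring per Lewis (1995): item means,
# with N/A responses ignored. Item groupings follow the published PSSUQ
# (SYSUSE items 1-8, INFOQUAL items 9-15, INTERQUAL items 16-18,
# OVERALL items 1-19); the example responses below are hypothetical.
def pssuq_scores(responses):
    """responses: dict of item number -> rating (1-7), or None for N/A."""
    def mean(items):
        vals = [responses[i] for i in items if responses.get(i) is not None]
        return sum(vals) / len(vals) if vals else None

    return {
        "SYSUSE": mean(range(1, 9)),
        "INFOQUAL": mean(range(9, 16)),
        "INTERQUAL": mean(range(16, 19)),
        "OVERALL": mean(range(1, 20)),
    }

example = {i: 5 for i in range(1, 20)}
example[9] = None  # item 9 answered N/A
print(pssuq_scores(example))
```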
Appendix I: Statistical Analysis results
I.1 Tool usage statistics
Table 50: Pairwise comparisons of ln(time between clicking) for the integrated tool

Web site complexity  Question type  (I) Event  (J) Event  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  High  I-I  I-A  .046  .098  1.000  -.212  .304
High  High  I-I  A-I  -.876*  .110  .000  -1.167  -.585
High  High  I-I  A-A  .066  .069  1.000  -.117  .248
High  High  I-A  A-I  -.922*  .129  .000  -1.261  -.582
High  High  I-A  A-A  .020  .096  1.000  -.233  .272
High  High  A-I  A-A  .941*  .108  .000  .655  1.227
High  Low  I-I  I-A  -.234*  .056  .000  -.383  -.086
High  Low  I-I  A-I  -1.021*  .056  .000  -1.168  -.874
High  Low  I-I  A-A  -.223*  .036  .000  -.318  -.127
High  Low  I-A  A-I  -.787*  .072  .000  -.977  -.597
High  Low  I-A  A-A  .011  .058  1.000  -.142  .165
High  Low  A-I  A-A  .798*  .058  .000  .646  .950
Low  High  I-I  I-A  -.154  .166  1.000  -.592  .284
Low  High  I-I  A-I  -.351  .212  .588  -.911  .209
Low  High  I-I  A-A  .214  .137  .710  -.147  .574
Low  High  I-A  A-I  -.197  .229  1.000  -.802  .408
Low  High  I-A  A-A  .368  .162  .138  -.059  .794
Low  High  A-I  A-A  .565*  .209  .042  .012  1.117
Low  Low  I-I  I-A  -.553*  .099  .000  -.814  -.292
Low  Low  I-I  A-I  -.988*  .085  .000  -1.212  -.764
Low  Low  I-I  A-A  -.522*  .049  .000  -.651  -.392
Low  Low  I-A  A-I  -.435*  .124  .003  -.763  -.106
Low  Low  I-A  A-A  .031  .103  1.000  -.241  .304
Low  Low  A-I  A-A  .466*  .091  .000  .227  .705
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
I-I icon-icon clicking, I-A icon-anchor clicking, A-I anchor-icon clicking, A-A anchor-anchor clicking.
I.2 Task completion statistics
Table 51: Mauchly's Test of Sphericity on number of tasks completed

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .984  1.698  2  .428  .984  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .962  4.077  2  .130  .964  .981  .500
TOOL * QUESTION  .926  8.139  2  .017  .931  .947  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .993  .743  2  .690  .993  1.000  .500
Table 52: Pairwise comparisons on number of tasks completed between tools in question type
conditions, high complexity Web sites only

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -.074  .081  1.000  -.271  .122
High  B  I  .148  .094  .351  -.080  .376
High  G  I  .222*  .079  .018  .029  .415
Low  B  G  .056  .034  .327  -.028  .139
Low  B  I  -.009  .021  1.000  -.060  .041
Low  G  I  -.065  .030  .103  -.138  .009
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.3 Number of answers found statistics
Table 53: Number of tasks where the subject visited the target node but submitted another node or timed out

Web site complexity  Question type  N  % answers not found
High  High  2  1.7%
High  Low  19  3.9%
High  Total  21  3.5%
Low  High  13  29.5%
Low  Low  37  20.8%
Low  Total  50  22.5%
Grand Total  -  71  8.6%
Table 54: Number of answers grouped by question

"Answer not found" is split into tasks that were not timed out and tasks that timed out; percentages are row percentages, and "-" indicates no tasks.

Web site complexity  Question type  Question  Answer found N (%)  Not timed out N (%)  Timed out N (%)
High  High  1Q1H1  79 (73.1%)  25 (23.1%)  4 (3.7%)
High  High  1Q2H2  66 (61.1%)  42 (38.9%)  -
High  High  2Q1H1  96 (88.9%)  9 (8.3%)  3 (2.8%)
High  High  2Q2H2  87 (80.6%)  13 (12.0%)  8 (7.4%)
High  High  9Q1H1  101 (93.5%)  6 (5.6%)  1 (0.9%)
High  High  9Q2H2  102 (94.4%)  5 (4.6%)  1 (0.9%)
High  High Total  -  531 (81.9%)  100 (15.4%)  17 (2.6%)
High  Low  1Q3L1  27 (25.0%)  60 (55.6%)  21 (19.4%)
High  Low  1Q4L2  21 (19.4%)  46 (42.6%)  41 (38.0%)
High  Low  2Q3L1  65 (60.2%)  38 (35.2%)  5 (4.6%)
High  Low  2Q4L2  17 (15.7%)  49 (45.4%)  42 (38.9%)
High  Low  9Q3L1  16 (14.8%)  37 (34.3%)  55 (50.9%)
High  Low  9Q4L2  14 (13.0%)  52 (48.1%)  42 (38.9%)
High  Low Total  -  160 (24.7%)  282 (43.5%)  206 (31.8%)
High Total  -  -  691 (53.3%)  382 (29.5%)  223 (17.2%)
Low  High  3Q1H1  108 (100.0%)  -  -
Low  High  3Q2H2  79 (73.1%)  29 (26.9%)  -
Low  High  4Q1H1  101 (93.5%)  7 (6.5%)  -
Low  High  4Q2H2  104 (96.3%)  4 (3.7%)  -
Low  High  5Q1H1  106 (98.1%)  2 (1.9%)  -
Low  High  5Q2H2  106 (98.1%)  2 (1.9%)  -
Low  High Total  -  604 (93.2%)  44 (6.8%)  -
Low  Low  3Q3L1  59 (54.6%)  45 (41.7%)  4 (3.7%)
Low  Low  3Q4L2  59 (54.6%)  45 (41.7%)  4 (3.7%)
Low  Low  4Q3L1  84 (77.8%)  17 (15.7%)  7 (6.5%)
Low  Low  4Q4L2  76 (70.4%)  30 (27.8%)  2 (1.9%)
Low  Low  5Q3L1  91 (84.3%)  17 (15.7%)  -
Low  Low  5Q4L2  101 (93.5%)  7 (6.5%)  -
Low  Low Total  -  470 (72.5%)  161 (24.8%)  17 (2.6%)
Low Total  -  -  1074 (82.9%)  205 (15.8%)  17 (1.3%)
Grand Total  -  -  1765 (68.1%)  587 (22.6%)  240 (9.3%)
Table 55: Tasks where a node other than the target was submitted (answer not found)

Columns: number of tasks where a non-target page was submitted; number of distinct pages submitted; subjects submitting per page (Avg., SD); and, for the most frequently submitted non-target page, the number of tasks (N), that number as a percentage of answers not found, and as a percentage of answers found. "-" indicates no non-target submissions.

Web site complexity  Question type  Question  Tasks submitted not target  Pages submitted  Avg.  SD  N  % of answers not found  % of answers found
High  High  1Q1H1  25  16  1.56  1.31  9  36.0%  11.4%
High  High  1Q2H2  42  8  5.25  7.61  35  83.3%  53.0%
High  High  2Q1H1  9  8  1.13  0.35  2  22.2%  2.1%
High  High  2Q2H2  13  13  1.00  0.00  1  7.7%  1.1%
High  High  9Q1H1  6  4  1.50  1.00  3  50.0%  3.0%
High  High  9Q2H2  5  5  1.00  0.00  1  20.0%  1.0%
High  Low  1Q3L1  60  25  2.40  4.15  23  38.3%  85.2%
High  Low  1Q4L2  46  20  2.30  1.87  9  19.6%  42.9%
High  Low  2Q3L1  38  22  1.73  2.05  10  26.3%  15.4%
High  Low  2Q4L2  49  32  1.53  1.67  10  20.4%  58.8%
High  Low  9Q3L1  37  25  1.48  1.12  5  13.5%  31.3%
High  Low  9Q4L2  52  23  2.26  3.15  14  26.9%  100.0%
Low  High  3Q1H1  -  -  -  -  -  -  -
Low  High  3Q2H2  29  7  4.14  5.52  16  55.2%  20.3%
Low  High  4Q1H1  7  5  1.40  0.89  3  42.9%  3.0%
Low  High  4Q2H2  4  3  1.33  0.58  2  50.0%  1.9%
Low  High  5Q1H1  2  2  1.00  0.00  1  50.0%  0.9%
Low  High  5Q2H2  2  2  1.00  0.00  1  50.0%  0.9%
Low  Low  3Q3L1  45  9  5.00  6.93  23  51.1%  39.0%
Low  Low  3Q4L2  45  11  4.09  3.21  10  22.2%  16.9%
Low  Low  4Q3L1  17  10  1.70  1.25  5  29.4%  6.0%
Low  Low  4Q4L2  30  10  3.00  3.77  11  36.7%  14.5%
Low  Low  5Q3L1  17  7  2.43  3.78  11  64.7%  12.1%
Low  Low  5Q4L2  7  5  1.40  0.55  2  28.6%  2.0%
Table 56: Mauchly's Test of Sphericity on number of answers found

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .991  1.005  2  .605  .991  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .982  1.889  2  .389  .983  1.000  .500
TOOL * QUESTION  .991  .963  2  .618  .991  1.000  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .980  2.150  2  .341  .980  .998  .500
Table 57: Pairwise comparisons on number of answers found between tools in question type conditions

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  .019  .047  1.000  -.096  .133
High  B  I  -.023  .046  1.000  -.136  .089
High  G  I  -.042  .052  1.000  -.169  .085
Low  B  G  -.227*  .060  .001  -.373  -.081
Low  B  I  -.079  .065  .682  -.236  .079
Low  G  I  .148  .061  .053  -.001  .298
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 58: Pairwise comparisons on number of answers found between tools in Web site complexity conditions

WEB  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -.051  .058  1.000  -.192  .090
High  B  I  -.093  .054  .273  -.225  .039
High  G  I  -.042  .059  1.000  -.186  .102
Low  B  G  -.157*  .053  .012  -.287  -.028
Low  B  I  -.009  .063  1.000  -.164  .145
Low  G  I  .148*  .058  .035  .008  .289
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.4 Outliers: Extreme cases

Extreme cases were excluded from the data analyses. The following cases were classified as
extreme:
In the high complexity Web sites with low information-scent questions, there were 6 tasks
where subjects spent an unreasonably short time on the task: less than 3.7 sec., where the mean
was 237.2 sec. and the SD was 116.8 sec.

In the high complexity Web sites, there were 10 tasks where subjects spent a short time on the
task: less than 10 sec., where the mean was 159.1 sec. and the SD was 128.6 sec.

In the high information-scent questions, there were 18 tasks where subjects spent the full
360 sec., where the mean was 54.1 sec. and the SD was 69.4.

In the high complexity Web sites, there were 3 tasks where subjects viewed more than 75
pages, where the mean was 15.4 pages and the SD was 15.6.

In the low complexity Web sites, there were 4 tasks where subjects viewed more than 50 pages,
where the mean was 8.6 pages and the SD was 8.4.

When using the graphical overview or the integrated tool, there were 12 tasks where the
number of pages equaled zero: subjects used the map for some time and then submitted
without viewing any page besides the start page.

In the low complexity Web sites, there were 5 tasks where subjects revisited many pages: the
number of revisited page views was more than 25, where the mean was 2.4 and the SD was 4.9.

In the high complexity Web sites, there were 3 tasks where subjects revisited many pages: the
number of revisited page views was more than 35, where the mean was 4.1 and the SD was 7.1.

There were 35 tasks where the number of extra nodes was less than zero; subjects viewed
fewer pages than the number required to perform the tasks.
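In code, the exclusion rules above amount to a set of boolean filters over a task-level log. The following is a minimal sketch under assumed file and column names, not the original analysis scripts; in practice each analysis applied only the criteria relevant to its own measure, rather than one combined mask.

```python
# A minimal sketch (assumed file name and column names, not the original
# analysis scripts) of the exclusion criteria listed above.
import pandas as pd

tasks = pd.read_csv("task_log.csv")   # hypothetical: one row per task

high = tasks["web_complexity"] == "high"
low = tasks["web_complexity"] == "low"
high_scent = tasks["question_type"] == "high"
low_scent = tasks["question_type"] == "low"
map_tool = tasks["tool"].isin(["graphical_overview", "integrated"])

extreme = (
    (high & low_scent & (tasks["time_sec"] < 3.7))    # unreasonably fast
    | (high & (tasks["time_sec"] < 10))
    | (high_scent & (tasks["time_sec"] == 360))       # ran out the clock
    | (high & (tasks["pages_viewed"] > 75))
    | (low & (tasks["pages_viewed"] > 50))
    | (map_tool & (tasks["pages_viewed"] == 0))       # map only, no pages
    | (low & (tasks["revisited_pages"] > 25))
    | (high & (tasks["revisited_pages"] > 35))
    | (tasks["extra_pages"] < 0)                      # fewer pages than required
)
print(f"excluded {extreme.sum()} of {len(tasks)} tasks")
kept = tasks[~extreme]
```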
Some tasks were extreme under more than one criterion. A total of 27 extreme cases out of 2,592 were
excluded from the time spent on task analysis; 45 extreme cases were excluded from the analyses of the
number of page views, the number of pages, and the number of revisited page views; and 68 extreme cases
were excluded from the extra page views analysis. The number of extreme cases, broken down by tool type,
Web site complexity, and question type, is shown in Table 59.
Table 59: Number of extreme cases

Tool  Web complexity  Question type  Time N  %(1)  Page N  %(1)  Extra N  %(1)
Browser  High  High  0  0.0%  4  1.9%  12  5.6%
Browser  High  Low  4  1.9%  4  1.9%  13  6.0%
Browser  Low  High  1  0.5%  1  0.5%  1  0.5%
Browser  Low  Low  6  2.8%  6  2.8%  12  5.6%
Graphical Overview  High  High  4  1.9%  14  6.5%  14  6.5%
Graphical Overview  High  Low  2  0.9%  2  0.9%  2  0.9%
Graphical Overview  Low  High  0  0.0%  0  0.0%  0  0.0%
Graphical Overview  Low  Low  1  0.5%  1  0.5%  1  0.5%
Integrated  High  High  5  2.3%  8  3.7%  8  3.7%
Integrated  High  Low  2  0.9%  3  1.4%  3  1.4%
Integrated  Low  High  0  0.0%  0  0.0%  0  0.0%
Integrated  Low  Low  2  0.9%  2  0.9%  2  0.9%
Total  -  -  27  1.0%*  45  1.7%*  68  2.6%*
(1) Percent of the 216 total tasks in the condition. * Percent of the 2,592 total tasks.
Time – the time spent on task analysis. Page – the number of page views, the number of pages, and the number of revisited page views analysis. Extra – the number of extra pages analysis.
I.5 Time spent on task statistics
Table 60: Mauchly's Test of Sphericity on ln(time spent on task)

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .997  .371  2  .831  .997  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .990  1.030  2  .598  .990  1.000  .500
TOOL * QUESTION  .998  .215  2  .898  .998  1.000  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .994  .656  2  .720  .994  1.000  .500
Table 61: Pairwise comparisons on ln(time spent on task)

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -0.241*  0.056  0.000  -0.377  -0.105
High  B  I  -0.100  0.052  0.173  -0.226  0.027
High  G  I  0.141*  0.051  0.021  0.017  0.265
Low  B  G  -0.085  0.047  0.217  -0.200  0.029
Low  B  I  -0.146*  0.051  0.015  -0.270  -0.022
Low  G  I  -0.061  0.052  0.747  -0.188  0.067
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
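The pairwise comparisons above come from SPSS's Bonferroni-adjusted comparisons of estimated marginal means. A rough equivalent is a set of paired t-tests on ln(time) with each p-value multiplied by the number of comparisons; the sketch below illustrates this with hypothetical per-subject values, and is an approximation rather than the procedure SPSS runs internally.

```python
# A minimal sketch of Bonferroni-adjusted pairwise comparisons on
# ln(time spent on task). The per-subject times below are hypothetical
# illustrative values, not data from the experiment.
import numpy as np
from scipy import stats

browser = np.array([212.0, 180.5, 240.3, 199.1, 230.8, 175.2])
graphical = np.array([245.6, 210.2, 250.9, 231.4, 260.3, 190.7])
integrated = np.array([230.1, 205.8, 236.7, 228.2, 251.0, 188.4])

pairs = {"B-G": (browser, graphical),
         "B-I": (browser, integrated),
         "G-I": (graphical, integrated)}

for name, (a, b) in pairs.items():
    t, p = stats.ttest_rel(np.log(a), np.log(b))   # paired t-test on ln(time)
    p_adj = min(1.0, p * len(pairs))               # Bonferroni correction
    print(f"{name}: mean diff {np.mean(np.log(a) - np.log(b)):+.3f}, "
          f"adjusted p = {p_adj:.3f}")
```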
I.6 Number of pages viewed statistics
Table 62: Mauchly's Test of Sphericity for number of pages statistics

Within Subjects Effect  Measure  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  TOTAL  .991  .997  2  .607  .991  1.000  .500
TOOL  DIFF  .981  1.992  2  .369  .982  1.000  .500
TOOL  REVISIT  .970  3.246  2  .197  .971  .988  .500
TOOL  EXTRA  .974  2.830  2  .243  .974  .992  .500
WEB  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
WEB  DIFF  1.000  .000  0  .  1.000  1.000  1.000
WEB  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
WEB  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  DIFF  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  TOTAL  .953  5.137  2  .077  .955  .972  .500
TOOL * WEB  DIFF  .953  5.090  2  .078  .955  .972  .500
TOOL * WEB  REVISIT  .997  .347  2  .841  .997  1.000  .500
TOOL * WEB  EXTRA  .951  5.359  2  .069  .953  .970  .500
TOOL * QUESTION  TOTAL  .991  .906  2  .636  .992  1.000  .500
TOOL * QUESTION  DIFF  .979  2.283  2  .319  .979  .997  .500
TOOL * QUESTION  REVISIT  .991  1.002  2  .606  .991  1.000  .500
TOOL * QUESTION  EXTRA  .972  3.034  2  .219  .973  .990  .500
WEB * QUESTION  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  DIFF  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  TOTAL  .977  2.460  2  .292  .978  .996  .500
TOOL * WEB * QUESTION  DIFF  .946  5.920  2  .052  .948  .965  .500
TOOL * WEB * QUESTION  REVISIT  .954  4.963  2  .084  .956  .973  .500
TOOL * WEB * QUESTION  EXTRA  .995  .548  2  .760  .995  1.000  .500
Table 63: Pairwise comparisons on ln(number of pages) between tools in Web complexity conditions

Measure  Web complexity  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
TOTAL  High  B  G  .540*  .053  .000  .411  .669
TOTAL  High  B  I  .290*  .045  .000  .181  .398
TOTAL  High  G  I  -.251*  .046  .000  -.363  -.138
TOTAL  Low  B  G  .123*  .038  .004  .031  .215
TOTAL  Low  B  I  .139*  .040  .002  .042  .236
TOTAL  Low  G  I  .015  .034  1.000  -.068  .098
DIFF  High  B  G  .427*  .048  .000  .310  .544
DIFF  High  B  I  .218*  .041  .000  .120  .317
DIFF  High  G  I  -.209*  .042  .000  -.310  -.107
DIFF  Low  B  G  -.084*  .032  .030  -.162  -.006
DIFF  Low  B  I  -.007  .033  1.000  -.086  .072
DIFF  Low  G  I  .077*  .028  .019  .010  .145
REVISIT  High  B  G  .617  .061  .000  .470  .764
REVISIT  High  B  I  .352  .054  .000  .220  .484
REVISIT  High  G  I  -.265  .051  .000  -.388  -.142
REVISIT  Low  B  G  .580  .048  .000  .463  .697
REVISIT  Low  B  I  .385  .053  .000  .255  .514
REVISIT  Low  G  I  -.196  .048  .000  -.313  -.078
EXTRA  High  B  G  .234*  .063  .001  .080  .388
EXTRA  High  B  I  -.102  .056  .217  -.238  .035
EXTRA  High  G  I  -.336*  .055  .000  -.470  -.201
EXTRA  Low  B  G  -.115  .049  .065  -.235  .005
EXTRA  Low  B  I  -.092  .053  .259  -.221  .037
EXTRA  Low  G  I  .023  .044  1.000  -.085  .131
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 64: Pairwise comparisons on ln(number of pages) between tools in question type conditions

Measure  Question type  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
TOTAL  High  B  G  .341  .045  .000  .231  .451
TOTAL  High  B  I  .241  .038  .000  .148  .334
TOTAL  High  G  I  -.100  .041  .047  -.199  -.001
TOTAL  Low  B  G  .323  .048  .000  .207  .439
TOTAL  Low  B  I  .187  .049  .001  .068  .307
TOTAL  Low  G  I  -.136  .048  .017  -.252  -.019
DIFF  High  B  G  .340*  .039  .000  .245  .434
DIFF  High  B  I  .224*  .032  .000  .145  .303
DIFF  High  G  I  -.115*  .037  .007  -.205  -.026
DIFF  Low  B  G  .003  .042  1.000  -.099  .106
DIFF  Low  B  I  -.013  .041  1.000  -.111  .086
DIFF  Low  G  I  -.016  .042  1.000  -.117  .085
REVISIT  High  B  G  .092  .047  .165  -.023  .207
REVISIT  High  B  I  .122*  .043  .014  .019  .226
REVISIT  High  G  I  .031  .036  1.000  -.056  .117
REVISIT  Low  B  G  1.106*  .069  .000  .938  1.274
REVISIT  Low  B  I  .614*  .073  .000  .436  .792
REVISIT  Low  G  I  -.492*  .067  .000  -.655  -.328
EXTRA  High  B  G  -.140  .066  .112  -.301  .022
EXTRA  High  B  I  -.304*  .062  .000  -.454  -.153
EXTRA  High  G  I  -.164*  .053  .008  -.293  -.035
EXTRA  Low  B  G  .259*  .056  .000  .123  .394
EXTRA  Low  B  I  .110  .059  .195  -.033  .254
EXTRA  Low  G  I  -.148*  .056  .026  -.284  -.013
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 65: Pairwise comparisons on ln(number of re-visited pages) between tools in Web
complexity conditions, high information-scent question type only

WEB  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  .194*  .079  .046  .002  .385
High  B  I  .222*  .074  .010  .043  .401
High  G  I  .028  .063  1.000  -.125  .181
Low  B  G  -.010  .045  1.000  -.120  .099
Low  B  I  .023  .041  1.000  -.076  .122
Low  G  I  .033  .037  1.000  -.058  .124
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 66: Pairwise comparisons on ln(number of re-visited pages) between tools, low
information-scent question type only

(I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
B  G  1.106*  .069  .000  .938  1.274
B  I  .614*  .073  .000  .436  .792
G  I  -.492*  .067  .000  -.655  -.328
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.7 Tool performance comparisons
Table 67: Tool performance comparisons

Measure  Web complexity  Question type  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Task completed  High  High  B  G  .056  .034  .327  -.028  .139
Task completed  High  High  B  I  -.009  .021  1.000  -.060  .041
Task completed  High  High  G  I  -.065  .030  .103  -.138  .009
Task completed  High  Low  B  G  -.074  .081  1.000  -.271  .122
Task completed  High  Low  B  I  .148  .094  .351  -.080  .376
Task completed  High  Low  G  I  .222*  .079  .018  .029  .415
Task completed  Low  High  B  G  .000  .000  .  .000  .000
Task completed  Low  High  B  I  .000  .000  .  .000  .000
Task completed  Low  High  G  I  .000  .000  .  .000  .000
Task completed  Low  Low  B  G  .009  .034  1.000  -.072  .091
Task completed  Low  Low  B  I  .009  .036  1.000  -.078  .097
Task completed  Low  Low  G  I  .000  .032  1.000  -.078  .078
Answer found  High  High  B  G  .046  .082  1.000  -.152  .245
Answer found  High  High  B  I  -.074  .076  1.000  -.260  .112
Answer found  High  High  G  I  -.120  .087  .508  -.332  .091
Answer found  High  Low  B  G  -.148  .074  .145  -.329  .032
Answer found  High  Low  B  I  -.111  .078  .475  -.301  .079
Answer found  High  Low  G  I  .037  .074  1.000  -.144  .218
Answer found  Low  High  B  G  -.009  .047  1.000  -.122  .104
Answer found  Low  High  B  I  .028  .052  1.000  -.098  .154
Answer found  Low  High  G  I  .037  .043  1.000  -.069  .143
Answer found  Low  Low  B  G  -.306*  .096  .006  -.540  -.072
Answer found  Low  Low  B  I  -.046  .101  1.000  -.293  .200
Answer found  Low  Low  G  I  .259  .094  .021  .030  .489
Time spent on task, ln(sec)  High  High  B  G  -.149  .083  .224  -.350  .052
Time spent on task, ln(sec)  High  High  B  I  -.019  .084  1.000  -.224  .187
Time spent on task, ln(sec)  High  High  G  I  .130  .082  .348  -.069  .329
Time spent on task, ln(sec)  High  Low  B  G  -.061  .057  .863  -.200  .078
Time spent on task, ln(sec)  High  Low  B  I  -.172*  .065  .027  -.329  -.015
Time spent on task, ln(sec)  High  Low  G  I  -.111  .061  .215  -.260  .038
Time spent on task, ln(sec)  Low  High  B  G  -.332*  .065  .000  -.490  -.174
Time spent on task, ln(sec)  Low  High  B  I  -.181*  .064  .017  -.337  -.025
Time spent on task, ln(sec)  Low  High  G  I  .152  .064  .061  -.005  .308
Time spent on task, ln(sec)  Low  Low  B  G  -.110  .082  .542  -.308  .089
Time spent on task, ln(sec)  Low  Low  B  I  -.120  .081  .419  -.316  .076
Time spent on task, ln(sec)  Low  Low  G  I  -.010  .076  1.000  -.196  .176
Total pages viewed, ln(pages)  High  High  B  G  .551*  .081  .000  .354  .748
Total pages viewed, ln(pages)  High  High  B  I  .332*  .064  .000  .176  .487
Total pages viewed, ln(pages)  High  High  G  I  -.219*  .074  .011  -.398  -.040
Total pages viewed, ln(pages)  High  Low  B  G  .530*  .069  .000  .361  .698
Total pages viewed, ln(pages)  High  Low  B  I  .248*  .074  .004  .067  .428
Total pages viewed, ln(pages)  High  Low  G  I  -.282*  .074  .001  -.462  -.103
Total pages viewed, ln(pages)  Low  High  B  G  .131*  .038  .002  .038  .223
Total pages viewed, ln(pages)  Low  High  B  I  .151*  .039  .001  .055  .246
Total pages viewed, ln(pages)  Low  High  G  I  .020  .042  1.000  -.084  .123
Total pages viewed, ln(pages)  Low  Low  B  G  .116  .062  .190  -.034  .266
Total pages viewed, ln(pages)  Low  Low  B  I  .127  .061  .118  -.021  .275
Total pages viewed, ln(pages)  Low  Low  G  I  .011  .054  1.000  -.121  .143
Different pages viewed, ln(pages)  High  High  B  G  .536*  .071  .000  .364  .709
Different pages viewed, ln(pages)  High  High  B  I  .305*  .054  .000  .173  .438
Different pages viewed, ln(pages)  High  High  G  I  -.231*  .066  .002  -.392  -.071
Different pages viewed, ln(pages)  High  Low  B  G  .318*  .062  .000  .166  .469
Different pages viewed, ln(pages)  High  Low  B  I  .131  .064  .128  -.024  .287
Different pages viewed, ln(pages)  High  Low  G  I  -.186*  .068  .023  -.353  -.020
Different pages viewed, ln(pages)  Low  High  B  G  .143*  .029  .000  .071  .214
Different pages viewed, ln(pages)  Low  High  B  I  .143*  .030  .000  .070  .216
Different pages viewed, ln(pages)  Low  High  G  I  .000  .036  1.000  -.087  .088
Different pages viewed, ln(pages)  Low  Low  B  G  -.311*  .053  .000  -.439  -.183
Different pages viewed, ln(pages)  Low  Low  B  I  -.157*  .053  .010  -.285  -.029
Different pages viewed, ln(pages)  Low  Low  G  I  .154*  .042  .001  .050  .257
Re-visited pages, ln(pages)  High  High  B  G  .194*  .079  .046  .002  .385
Re-visited pages, ln(pages)  High  High  B  I  .222*  .074  .010  .043  .401
Re-visited pages, ln(pages)  High  High  G  I  .028  .063  1.000  -.125  .181
Re-visited pages, ln(pages)  High  Low  B  G  1.041*  .090  .000  .822  1.259
Re-visited pages, ln(pages)  High  Low  B  I  .482*  .100  .000  .239  .725
Re-visited pages, ln(pages)  High  Low  G  I  -.559*  .085  .000  -.765  -.353
Re-visited pages, ln(pages)  Low  High  B  G  -.010  .045  1.000  -.120  .099
Re-visited pages, ln(pages)  Low  High  B  I  .023  .041  1.000  -.076  .122
Re-visited pages, ln(pages)  Low  High  G  B  .010  .045  1.000  -.099  .120
Re-visited pages, ln(pages)  Low  High  G  I  .033  .037  1.000  -.058  .124
Re-visited pages, ln(pages)  Low  Low  B  G  1.171*  .085  .000  .964  1.379
Re-visited pages, ln(pages)  Low  Low  B  I  .747*  .092  .000  .523  .970
Re-visited pages, ln(pages)  Low  Low  G  I  -.425*  .092  .000  -.649  -.201
Extra pages viewed, ln(pages)  High  High  B  G  -.005  .113  1.000  -.280  .269
Extra pages viewed, ln(pages)  High  High  B  I  -.357*  .100  .002  -.599  -.114
Extra pages viewed, ln(pages)  High  High  G  I  -.351*  .092  .001  -.576  -.127
Extra pages viewed, ln(pages)  High  Low  B  G  .473*  .083  .000  .270  .675
Extra pages viewed, ln(pages)  High  Low  B  I  .153  .086  .231  -.055  .361
Extra pages viewed, ln(pages)  High  Low  G  I  -.320*  .083  .001  -.523  -.117
Extra pages viewed, ln(pages)  Low  High  B  G  -.274*  .061  .000  -.422  -.127
Extra pages viewed, ln(pages)  Low  High  B  I  -.251*  .063  .000  -.405  -.097
Extra pages viewed, ln(pages)  Low  High  G  I  .023  .060  1.000  -.122  .168
Extra pages viewed, ln(pages)  Low  Low  B  G  .044  .076  1.000  -.142  .230
Extra pages viewed, ln(pages)  Low  Low  B  I  .067  .075  1.000  -.114  .249
Extra pages viewed, ln(pages)  Low  Low  G  I  .023  .065  1.000  -.135  .182
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.8 Web complexity by question type interaction
Table 68: Pairwise comparisons between question types in Web site complexity conditions

Measure  Web complexity  (I) Question type  (J) Question type  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Task completed  High  High  Low  .583*  .046  .000  .491  .675
Task completed  Low  High  Low  .052*  .014  .000  .025  .080
Answer found  High  High  Low  1.145*  .041  .000  1.063  1.227
Answer found  Low  High  Low  .414*  .049  .000  .316  .511
Time spent on task  High  High  Low  -1.354*  .046  .000  -1.444  -1.263
Time spent on task  Low  High  Low  -1.505*  .040  .000  -1.585  -1.426
Total pages viewed  High  High  Low  -1.116*  .045  .000  -1.206  -1.026
Total pages viewed  Low  High  Low  -1.316*  .035  .000  -1.386  -1.246
Different pages viewed  High  High  Low  -.939*  .039  .000  -1.016  -.863
Different pages viewed  Low  High  Low  -1.078*  .030  .000  -1.136  -1.019
Extra pages viewed  High  High  Low  -1.514*  .055  .000  -1.624  -1.404
Extra pages viewed  Low  High  Low  -1.796*  .044  .000  -1.883  -1.708
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
I.9 User satisfaction statistics
Table 69: Mauchly's Test of Sphericity on PSSUQ scores

Within Subjects Effect  Measure  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  OVERALL  .861  14.801  2  .001  .878  .893  .500
TOOL  SYSUSE  .867  14.132  2  .001  .883  .897  .500
TOOL  INFOQUAL  .851  15.959  2  .000  .870  .885  .500
TOOL  INTERQUAL  .792  23.141  2  .000  .828  .840  .500
Table 70: Pairwise comparisons on PSSUQ scores between tools

Measure  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.
OVERALL  B  G  .849*  .133  .000
OVERALL  B  I  -.113  .095  .714
OVERALL  G  I  -.962*  .106  .000
SYSUSE  B  G  1.081*  .152  .000
SYSUSE  B  I  -.005  .107  1.000
SYSUSE  G  I  -1.085*  .127  .000
INFOQUAL  B  G  .542*  .132  .000
INFOQUAL  B  I  -.253*  .099  .036
INFOQUAL  G  I  -.795*  .100  .000
INTERQUAL  B  G  .882*  .170  .000
INTERQUAL  B  I  -.077  .134  1.000
INTERQUAL  G  I  -.960*  .116  .000
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.10 Web site familiarity statistics

Web site familiarity was determined by a questionnaire to ascertain whether subjects had previously
visited any of the Web sites in the experiment, since subjects who had seen a Web site might perform
the experimental tasks better than others. Of the 105 subjects who responded to the Web familiarity
questionnaire, 23 had visited one or more of the Web sites in the experiment: three subjects had
visited two of the Web sites, and 20 subjects had visited one (Table 71). There were 92 tasks
performed by subjects who were familiar with a Web site (3.55% of the total 2,592 tasks). This
group of tasks was not distributed evenly across the tool x Web complexity x question type conditions
(Table 72). The average task performance of this group, both the time spent on task and the number of
page views, was within one standard deviation of the overall means (Table 33 and Table 35), suggesting
that there was no significant effect on the overall results.
Table 71: Subject's Web site familiarity

Counts are per Web site (IDs 1, 2, 3, 4, 5, 7, 9, in that order where given), followed by the row total.

Seen before  Last time visited  Freq. of visit  Counts  Total
Yes  Yesterday  Many times a day  1 1  2
Yes  Within the last week  Once a week  1  1
Yes  Within the last week  Not very often  1  1
Yes  Within the last month  Once a week  1 1 1 3 1  7
Yes  More than a month ago  Once a week  1 1  2
Yes  More than a month ago  Not very often  2 1 3 3 4  13
Yes Total  -  -  1 2 3 2 8 3 7  26
No  -  -  102 101 100 101 95 100 96  695
No data  -  -  5 5 5 5 5 5 5  35
Grand Total  -  -  108 108 108 108 108 108 108  756
Table 72: Tasks performed by subjects who had visited Web sites prior to the experiment, grouped by tool, Web site complexity, and question type

Tool  Web site complexity  Question type  Num. of tasks  % (/216)  Task completion  Answer found  Avg. time spent  Avg. total pages viewed
Browser  High  High  6  2.8%  6  6  69.15  10.83
Browser  High  Low  6  2.8%  5  4  233.28  38.17
Browser  Low  High  10  4.6%  10  10  13.70  3.10
Browser  Low  Low  10  4.6%  10  8  70.07  14.40
Graphical overview  High  High  12  5.6%  12  12  102.53  5.58
Graphical overview  High  Low  12  5.6%  6  2  277.65  21.58
Graphical overview  Low  High  8  3.7%  8  8  36.00  3.75
Graphical overview  Low  Low  8  3.7%  8  7  130.25  8.88
Integrated  High  High  2  0.9%  2  1  59.10  6.00
Integrated  High  Low  2  0.9%  0  0  360.00  37.50
Integrated  Low  High  8  3.7%  8  8  22.60  3.13
Integrated  Low  Low  8  3.7%  8  7  77.23  12.13
Reference List
Albers, M. J. (1997). Cognitive strain as a factor in effective document design. Proceedings of the 15th
Annual International Conference on Computer Documentation (pp. 1-6). ACM.
Ankerst, M., Berchtold, S., & Keim, D. A. (1998). Similarity clustering of dimensions for an enhanced
visualization of multidimensional data. IEEE Symposium on Information Visualization (pp. 52-60,153).
Beccaria, M., Bertolazzi, P., Battista, G. D., & Liotta, G. (1991). A tailorable and extensible automatic
layout facility. IEEE Workshop on Visual Languages (pp. 68-73). IEEE.
Bederson, B. B. & Hollan, J. D. (1994). Pad++: A zooming graphical interface for exploring alternate
interface physics. Proceedings of the ACM Symposium on User Interface Software and Technology (pp. 17-
26). ACM.
Benedikt, M. (1992). Cyberspace: some proposals. In M. Benedikt (Ed.), Cyberspace: First steps.
Cambridge, Massachusetts: The MIT Press.
Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H. F., & Secret, A. (1994). The World-Wide
Web. Communications of the ACM, 37, 76-82.
Björk, S., Holmquist, L. E., & Redström, J. (1999). A framework for focus+context visualization.
IEEE Symposium on Information Visualization (InfoVis '99) (pp. 53-56,145). IEEE.
Bly, S. A. & Rosenberg, J. K. (1986). A comparison of tiled and overlapping windows. Conference
Proceedings on Human Factors in Computing Systems CHI86 (pp. 101-106). ACM.
Botafogo, R. A., Rivlin, E., & Shneiderman, B. (1992). Structural analysis of hypertexts: Identifying
hierarchies and useful metrics. ACM Transactions on Information Systems, 19, 142-180.
Boyle, C. & Teh, S. H. (1992). To link or not to link: An empirical comparison of Hypertext linking
strategies. Proceedings of the 10th Annual International Conference on Systems Documentation SIGDOC'92
(pp. 221-231). ACM.
Brandenburg, F. J. (1987). Nice drawings of graphs are computationally hard. In P. Gorny & M. J.
Tauber (Eds.), Visualization in Human-Computer Interaction (pp. 1-15). New York: Springer-Verlag.
Bray, T. (1996). Measuring the Web. Proceedings of the Fifth International World Wide Web
Conference Amsterdam, Netherlands: Elsevier Science.
Brewington, B. E. & Cybenko, G. (2000). How dynamic is the web? The Ninth International World
Wide Web Conference (WWW9) Amsterdam.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., & Wiener,
J. (2000). Graph structure in the Web. Proceedings of the Ninth International World-Wide Web Conference
WWW9 Amsterdam.
Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information
Science, 42, 351-360.
Bush, V. (1945). As we may think. The Atlantic Monthly, 101-108.
Campbell, C. S. & Maglio, P. P. (1999). Facilitating navigation in information spaces: Road-signs on
the World Wide Web. International Journal of Human-Computer Studies, 50, 307-327.
Chen, C. & Rada, R. (1996). Interacting with hypertext: A meta-analysis of experimental studies.
Human-Computer Interaction, 11, 125-156.
Cockburn, A. & Jones, S. (1996). Which way now? Analysing and easing inadequacies in WWW
navigation. International Journal of Human-Computer Studies, 45, 105-129.
Conklin, J. (1987). Hypertext: An introduction and survey. IEEE Computer, 20, 17-41.
Czerwinski, M. & Larson, K. (1998). Business: trends in future Web designs: what's next for the HCI
professional? Interactions, 5, 9-14.
Dillon, A. (1994). Designing usable electronic text ergonomic aspects of human information usage.
Bristol, PA: Taylor & Francis Inc.
Dix, A. & Mancini, R. (1998). Specifying history and backtracking mechanisms. In P. Palanque & F.
Paternò (Eds.), Formal Methods in Human-Computer Interaction (pp. 1-23). London: Springer.
Durand, D. & Kahn, P. (1998). MAPA: a system for inducing and visualizing hierarchy in websites.
Proceedings of the Ninth ACM Conference on Hypertext (pp. 66-76).
Engelbart, D. C. (1963). A conceptual framework for the augmentation of man's intellect. In P. W.
Howerton (Ed.), Vistas in information handling (pp. 1-29). Washington, D.C.: Spartan Books.
Fleming, J. (1998). Web Navigation: Designing the user experience. Sebastopol, CA: O'Reilly.
Fowler, R. H., Fowler, W. A. L., & Wilson, B. A. (1991). Integrating Query, Thesaurus, and
Documents through a Common Visual Representation. Proceedings of the Fourteenth Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 142-151).
Fowler, R. H., Kumar, A., & Williams, J. L. (1996). Visualizing and browsing WWW semantic
content. Emerging Technologies and Applications in Communications, 1996. Proceedings., First Annual
Conference on (pp. 110-113). IEEE.
Furnas, G. W. (1982). The FISHEYE view: A new look at structured files (Rep. No. Technical
Memorandum, #82-11221-22). Murray Hill, N.J.: Bell Laboratories.
Furnas, G. W. (1997). Effective View Navigation. Conference Proceedings on Human Factors in
Computing Systems CHI'97 (pp. 367-374). ACM.
Gaylin, K. B. (1986). How are windows used? Some notes on creating an empirically-based
windowing benchmark task. Conference Proceedings on Human Factors in Computing Systems CHI'86 (pp. 96-
100). ACM.
Gloor, P. A. (1997). Elements of hypermedia design: Techniques for navigation and visualization in
cyberspace. Boston: Birkhauser.
Graphics, Visualization & Usability (GVU) Center (1998). GVU's 10th WWW user survey. (n.d.).
Retrieved January, 2000, from http://www.gvu.gatech.edu/gvu/user_surveys/
Hammond, N. & Allinson, L. (1989). Extending hypertext for learning: An investigation of access and
guidance tools. Proc. BCS HCI'89 (pp. 293-304). Nottingham,U.K.
Heo, M. (2000). A Usability Study on Web Visualization Techniques and User Mental Models. Ph.D.
dissertation, University of Pittsburgh.
Hirtle, S. C. & Jonides, J. (1985). Evidence of hierarchies in cognitive maps. Memory & Cognition,
13, 208-217.
Hölscher, C. & Strube, G. (2000). Web Search Behavior of Internet Experts and Newbies. 9th
International World Wide Web Conference Amsterdam.
Huberman, B. A. & Adamic, L. A. (1999). Evolutionary Dynamics of the World Wide Web Palo Alto,
CA: Internet Ecologies Group, Xerox Palo Alto Research Center.
Internet Engineering Task Force (IETF) (1994). Uniform Resource Locators (URL) (Rep. No.
RFC1738). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc1738.txt
Internet Engineering Task Force (IETF) (1998). Uniform Resource Identifiers (URI) (Rep. No.
RFC2396). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc2396.txt
Internet Engineering Task Force (IETF) (1999). Hypertext Transfer Protocol -- HTTP/1.1 (Rep. No.
RFC2616). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc2616.txt
Jerding, D. F. & Stasko, J. T. (1995). The Information Mural: A technique for displaying and
navigating large information spaces. IEEE Information Visualization Symposium IEEE Computer Society
Press.
Jul, S. & Furnas, G. W. (1997). Navigation in electronic worlds: A CHI 97 Workshop. SIGCHI Bulletin, 29.
Kohonen, T. (1998). Self-organization of very large document collections: State of the art. Proceedings
of ICANN98, the 8th International Conference on Artificial Neural Networks (pp. 65-74). London: Springer.
Lamping, J., Rao, R., & Pirolli, P. (1995). A Focus+Context Technique Based on Hyperbolic
Geometry for Visualizing Large Hierarchies. Proceedings of ACM CHI'95 Conference on Human Factors in
Computing Systems (pp. 401-408).
Larson, K. & Czerwinski, M. (1998). Web Page Design: Implications of Memory, Structure and Scent
for Information Retrieval. Proceedings of ACM CHI 98 Conference on Human Factors in Computing Systems
(pp. 25-32).
Lawrence, S. & Giles, C. L. (1999). Accessibility of information on the web. Nature, 400, 107-
109.
Lewis, J. R. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation
and Instructions for Use. International Journal of Human-Computer Interaction, 7, 57-78.
Lin, X., Soergel, D., & Marchionini, G. (1991). A Self-Organizing Semantic Map for Information
Retrieval. Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (pp. 262-269).
Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM
Transactions on Graphics, 5, 110-141.
Maurer, H. (1996). Hyper-G now Hyperwave: The next generation Web solution. New York: Addison-
Wesley Publishing.
McKnight, C., Dillon, A., & Richardson, J. (1991). Hypertext in Context. Cambridge; New York:
Cambridge University Press.
Minar, N. & Donath, J. (1999). Visualizing the crowds at a Web site. Human Factors in Computing
Systems CHI'99 Extended Abstracts (pp. 186-187). ACM SIGCHI.
Monk, A. F., Walsh, P., & Dix, A. J. (1988). A comparison of hypertext, scrolling and folding
mechanisms for program browsing. In People and Computers IV (pp. 421-435). Cambridge University Press.
Nakayama, T., Kato, H., & Yamane, Y. (2000). Discovering the Gap Between Web Site Designers'
Expectations and Users' Behavior. The Ninth International World Wide Web Conference (WWW9): The Web:
The Next Generation Amsterdam.
Nation, D. A., Plaisant, C., Marchionini, G., & Komlodi, A. (1997). Visualizing websites using a
hierarchical table of contents browser: WebTOC. Human Factors and the Web Conferences Colorado.
Nelson, T. H. (1987). Literary Machines. (87.1 ed.) Published by the author.
Nielsen, J. (1989). The Matters that Really Matter for Hypertext Usability. Hypertext'89 Proceedings
(pp. 239-248). ACM.
Nielsen, J. (1990). The Art of Navigation through Hypertext. Communications of the ACM, 33, 296-310.
Nielsen, J. (1999). User Interface directions for the Web. Communications of the ACM, 42, 65-72.
North, C. & Shneiderman, B. (1997). A Taxonomy of Multiple Window Coordinations (Rep. No. CS-
TR-3854). University of Maryland, College Park, Dept of Computer Science.
North, C. & Shneiderman, B. (1999). Snap-Together visualization: Coordinating multiple views to
explore information (Rep. No. CS-TR-4020). University of Maryland, College Park, Dept of Computer Science.
Olson, J.R., & Nielsen, E. (1987). Analysis of the cognition involved in spreadsheet software
interaction. Human-Computer Interaction, 3(4), 309-349.
Perlin, K. & Fox, D. (1993). Pad - An alternative approach to the computer interface. Proceedings of
the 20th Annual Conference on Computer Graphic, SIGGRAPH '93 (pp. 57-64). ACM.
Pirolli, P., Card, S. K., & Wege, M. M. V. D. (2000). The Effect of Information Scent on Searching
Information Visualizations of Large Tree Structures (Rep. No. UIR-R-2000-04-Pirolli-AVI2000-
InfoScentAndHBSearch ). Xerox PARC User Interface Research Group.
Pirolli, P. & Card, S. (1999). Information Foraging. Psychological Review, 106, 643-675.
Pitkow, J. E. (1998). Summary of WWW characterizations. The Seventh International World Wide
Web Conference Brisbane, Australia.
Pitkow, J. E. (1999). Summary of WWW characterizations. World Wide Web, 2, 3-13.
Schoon, P. L. (1997). World Wide Web Hypertext linkage patterns. Ph.D. dissertation, Illinois State University.
Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11, 70-77.
Shneiderman, B. (1998). Designing the user interface: Strategies for effective human-computer
interaction. (Third ed.) Reading MA.: Addison-Wesley.
Snowdon, D., Fahlen, L., & Stenius, M. (1996). WWW3D: A 3D multi-user Web browser. WebNet 96
Proceedings California USA: AACE.
Spence, R. (1998). Navigation in real and virtual worlds. Workshop on Personalised and Social
Navigation in Information Space (pp. 69-76). IFIP and Navigation SIG of Rsprit's i3-net.
Spring, M. B. (1991). Electronic printing and publishing: the document processing revolution. New
York: Marcel Dekker, Inc.
Spring, M. B., Morse, E., & Heo, M. (1996). Multi-level navigation of a document space. Leveraging
Cyberspace Conference Palo Alto, CA.
Stone, M. C., Fishkin, K., & Bier, E. A. (1994). The movable filter as a user interface tool. Proceedings
of CHI 94 (pp. 306-312). ACM: New York.
Tauscher, L. & Greenberg, S. (1997a). How people revisit Web pages: Empirical findings and
implications for the design of history systems. International Journal of Human-Computer Studies, 47, 97-137.
Tauscher, L. & Greenberg, S. (1997b). Revisitation patterns in World Wide Web navigation.
Proceedings of ACM CHI 97 Conference on Human Factors in Computing Systems (pp. 399-406).
Tversky, B., Franklin, N., Taylor, H. A., & Bryant, D. J. (1994). Spatial mental models from
descriptions. Journal of the American Society for Information Science, 45, 656-668.
Weinreich, H. & Lamersdorf, W. (2000). Concepts for Improved Visualization of Web Link
Attributes. Proceedings of the 9th International World Wide Web Conference Amsterdam, The Netherlands.
Wexelblat, A. & Maes, P. (1999). Footprints: history-rich tools for information foraging. CHI 99
Conference Proceedings (pp. 270-277). New York: ACM.
Whitaker, L. A. (1997). Human navigation. In C. Forsythe, E. Grose, & J. Ratner (Eds.), Human
Factors and Web Development (pp. 63-71). NJ: Lawrence Erlbaum Associates.
Wood, A., Drew, N., Beale, R., & Hendley, B. (1995). HyperSpace: Web browsing with visualisation.
Technology, Tools and Applications, the Third International World Wide Web Conference Darmstadt,
Germany.
Woodruff, A., Aoki, P. M., Brewer, E., Gauthier, P., & Rowe, L. A. (1996). An investigation of
documents from the World Wide Web. Fifth International World Wide Web Conference.
World Wide Web Consortium (1999). HTML 4.01 Specification (Rep. No. REC-html401-19991224).
Retrieved January, 2000, From http://www.w3.org/TR/1999/REC-html401-19991224/
Wright, P. & Lickorish, A. (1990). An empirical comparison of two navigation systems for two
hypertexts. In R. McAleese & C. Green (Eds.), Hypertext: State of the Art (pp. 84-93). Oxford, England:
Intellect.