Semantics, Complexity and Capability:The Use of Integrated Navigational Tools
for Information Finding in Hypertext Document Space
By
Wasu Chaopanon
B.S.E.E., Khon Kaen University, Thailand, 1987
M.S., Computer Science, New York University, New York, 1996
Submitted to the Graduate Faculty of Information Sciences in partial fulfillment
of the requirements for the Degree of Doctor of Philosophy
University of Pittsburgh
2001
Semantics, Complexity and Capability:The Use of Integrated Navigational Tools
for Information Finding in Hypertext Document Space
Wasu Chaopanon, Ph.D.
University of Pittsburgh, 2001
This study examines the performance of navigational tools in information finding tasks based
on the complexity of the hypertext space, and the degree of “information scent” available through the
tools. Operational metrics for Web site complexity were examined and analyzed. Information
scent was measured empirically. A 3x2x2 within-subjects factorial design was employed. A
browser, a graphical overview, and an integrated tool were examined. Questions were created,
measured for information scent, and classified as high or low information-scent questions
across six Web sites: three low complexity Web sites and three high complexity Web sites. The number
of tasks completed, the number of answers found, time spent on task, and the number of pages viewed
were measured.
Performance in the information finding tasks was different when using different tools in the
various conditions. The results showed that there were significant interactions between tool, Web site
complexity, and question type in performance measurements. Three-way interactions were found in
the number of tasks completed and the number of revisited page views. Two-way interactions
between tool and Web site complexity were found in the number of page views, the number of pages,
and the number of extra page views. Two-way interactions between tool and question type were
found in the number of answers found, time spent on task, the number of pages, and the number of
extra page views. Web site complexity and information scent showed strong effects on task
performance.
Although the integrated tool had more capabilities than either single tool alone, it did not
provide higher performance. Rather, the integrated tool appeared to moderate the performance
differences between the single tools. There was also an indication that the integrated tool imposed
greater cognitive overhead.
TABLE OF CONTENTS
1 INTRODUCTION...........................................................................................................................1
1.1 Overview.................................................................................................................................1
1.2 Problem Statement..................................................................................................................2
1.3 Motivation and Goal of this Research.....................................................................................4
1.4 Definition of Terms.................................................................................................................5
1.5 Scope and Limitations.............................................................................................................6
2 LITERATURE REVIEW................................................................................................................7
2.1 Document space and the WWW.............................................................................................7
2.1.1 Document Spaces............................................................................................................7
2.1.2 Hypertext.........................................................................................................................8
2.1.3 The World Wide Web (WWW)......................................................................................9
2.2 Navigation and Information-Finding Tasks..........................................................................11
2.2.1 The Questions Answered by Navigational Tools..........................................................12
2.2.2 Navigation in Document Space.....................................................................................13
2.2.3 Problems of Navigation in Document Space................................................................16
2.2.4 WWW Usage.................................................................................................................18
2.3 Navigational tools in Document Spaces................................................................................19
2.4 Integrated Document Space Navigational tools....................................................................24
2.4.1 Tool Integration.............................................................................................................24
2.4.2 Evaluation of Integrated Navigational tools in Hypertext.............................................28
3 RESEARCH METHODOLOGY..................................................................................................31
3.1 Introduction...........................................................................................................................31
3.1.1 Document Space............................................................................................................31
3.1.2 Web Site Metrics...........................................................................................................31
3.1.3 Study of Web sites structure..........................................................................................36
3.1.4 Task and semantic relatedness......................................................................................49
3.1.5 Navigational tools and Integration................................................................................50
3.1.6 Summary.......................................................................................................................53
3.2 Hypotheses............................................................................................................................53
3.3 Participants............................................................................................................................55
3.4 Material.................................................................................................................................55
3.4.1 Web Sites.......................................................................................................................55
3.4.2 Questions and their Information Scent..........................................................................56
3.4.3 Software.........................................................................................................................60
3.5 Experimental Design.............................................................................................................60
3.6 Experimental Task.................................................................................................................61
3.7 Procedure...............................................................................................................................61
3.8 Data Collection and Measurement........................................................................................62
4 RESULTS AND DISCUSSION...................................................................................................64
4.1 Demographic Data of Recruited Subjects.............................................................................64
4.2 Results...................................................................................................................................65
4.2.1 Tool usage.....................................................................................................................65
4.2.2 Task completion............................................................................................................80
4.2.3 Number of answers found.............................................................................................82
4.2.4 Task performance..........................................................................................................87
4.2.5 Web complexity, Question type and their interaction...................................................98
4.3 Summary task performance at each condition......................................................................98
4.3.1 Low complexity Web sites with high information-scent questions..............................99
4.3.2 High complexity Web sites with high information-scent questions..............................99
4.3.3 Low complexity Web sites with low information-scent questions.............................100
4.3.4 High complexity Web sites with low information-scent questions.............................100
4.4 User satisfaction..................................................................................................................101
4.5 Support for Hypotheses.......................................................................................................103
5 CONCLUSIONS AND FUTURE STUDY................................................................................105
5.1 Review of the research........................................................................................................105
5.2 Summary finding.................................................................................................................106
5.3 Comparison to prior research results...................................................................................107
5.4 Issues to reconsider.............................................................................................................109
5.5 Future research....................................................................................................................110
Appendix A : Web visualize tools.......................................................................................................111
Appendix B : URI in HTML tags........................................................................................................113
Appendix C : Stratum formula............................................................................................................113
Appendix D : Web site structure statistic............................................................................................114
Appendix E : Web Sites in the experiment and their properties.........................................................116
Appendix F : Information Scent experiment.......................................................................................117
Appendix G : The main experiment instruction sheet.........................................................................127
Appendix H : Questionnaires..............................................................................................................135
H.1 Demographics, Computer and World Wide Web Experience form.........................................135
H.2 Web sites familiarity score.......................................................................................................136
H.3 User satisfaction Questionnaire................................................................................................137
Appendix I : Statistical Analysis results..............................................................................................139
I.1 Tool usage statistic.....................................................................................................................139
I.2 Task completion statistic............................................................................................................140
I.3 Number of answer found statistic..............................................................................................141
I.4 Outliers: Extreme cases..............................................................................................................144
I.5 Time spent on task statistic........................................................................................................146
I.6 Number of page viewed statistic................................................................................................147
I.7 Tools performances comparisons...............................................................................................151
I.8 Web complexity by question type interaction............................................................................153
I.9 User satisfaction statistic............................................................................................................154
I.10 Web site familiarity statistic....................................................................................................155
Reference List......................................................................................................................................156
List of Tables
Table 1: Content types of scanned URLs..............................................................................................38
Table 2: Tags-attributes of links............................................................................................................39
Table 3: Descriptive Statistics of number of nodes and links...............................................................42
Table 4: Descriptive statistic of Web site properties.............................................................................44
Table 5: Number of Web Sites by their complexity..............................................................................47
Table 6: Questions classification based on their information scents.....................................................58
Table 7: Summary of the information scent of the selected questions..................................................59
Table 8: Summary of the minimum pages required to find the selected target nodes...........................59
Table 9: Summary of subjects’ demographic data................................................................................64
Table 10: Summary of subjects’ computer experience data..................................................................65
Table 11: Summary statistic of time between anchor clicks in the browser.........................................66
Table 12: ANOVA on ln(time between anchor clicks) of the browser.................................................67
Table 13: Pairwise comparison between ln(time between anchor clicks), Bonferroni adjustment......67
Table 14: Summary statistic of time between icon clicks of the graphical overview...........................68
Table 15: ANOVA on ln(time between icon clicks) of the graphical overview...................................68
Table 16: Pairwise comparison between ln(time between icon clicks), Bonferroni adjustment..........69
Table 17: Frequency Distribution for tool usage based on location of navigation actions...................70
Table 18: Summary statistic of BNAR and BTUR grouped by Web site complexity conditions and
question type conditions................................................................................................................72
Table 19: ANOVA on Browser Navigation Action Ratio....................................................................72
Table 20: State transition probability in using the integrated tool........................................................74
Table 21: Time between state transitions in using the integrated tool..................................................75
Table 22: ANOVA on ln(time between clicking) when using the integrated tool................................76
Table 23: ANOVA on ln(time between anchor clicks) comparing the browser and the integrated
tool.................................................................................................................................................77
Table 24: ANOVA on ln(time between icon clicks) comparing the graphical overview and the
integrated tool................................................................................................................................77
Table 25: Summary statistic of adjusted time spent on tool..................................................................79
Table 26: Summary statistic of number of tasks completed.................................................................80
Table 27: ANOVA on number of tasks completed, lower bound correction........................................81
Table 28: ANOVA on number of tasks completed in the high complexity Web site condition with
lower-bound correction.................................................................................................................82
Table 29: ANOVA on number of tasks completed in the low complexity Web site condition with
lower-bound correction.................................................................................................................82
Table 30: Summary statistics of the number of answers found............................................................83
Table 31: Summary of the numbers of answers found, answers not found, and timed-out tasks grouped
by Web site complexity and question type....................................................................................84
Table 32: ANOVA on the number of answers found............................................................................86
Table 33: Summary statistic of time spent on tasks (sec.)....................................................................87
Table 34: ANOVA on ln(time spent on task)........................................................................................88
Table 35: Descriptive statistics of the number of page views and the number of pages.......................90
Table 36: Descriptive statistics of the number of revisited page views and the number of extra page
views..............................................................................................................................................91
Table 37: Number of tasks where the extra page views were zero.......................................................92
Table 38: ANOVA on ln(number of page views), ln(number of pages), ln(number of revisited page
views), and ln(number of extra page views).................................................................................93
Table 39: ANOVA on ln(number of revisited page views) only in the high information-scent question
type................................................................................................................................................96
Table 40: ANOVA on ln(number of revisited page views) only in the low information-scent question
type................................................................................................................................................96
Table 41: Summary of tools difference in Web site complexity and question type condition.............98
Table 42: Questionnaire descriptive statistics.....................................................................................102
Table 43: ANOVA on PSSUQ score with lower-bound correction....................................................102
Table 44: Correlations between numbers of nodes.............................................................................114
Table 45: Correlations between numbers of links...............................................................................114
Table 46: Distance measurement correlation......................................................................................114
Table 47: Correlation between the Web site metrics..........................................................................115
Table 48: Questions, target Web page and selected Web pages for information scent experiment.....121
Table 49: Information-scent score.......................................................................................................126
Table 50: Pairwise Comparisons ln(time between clicking) of the integrated tool............................139
Table 51: Mauchly's Test of Sphericity on number of tasks completed.............................................140
Table 52: Pairwise Comparisons on number of tasks completed between tools in question type
conditions, only in the high complexity Web site condition........................................................140
Table 53: Number of tasks where the subject visited the target node but submitted another node or
timed out.......................................................................................................................................141
Table 54: Number of answers grouped by question.............................................................................141
Table 55: Task submitted only the answer not found.........................................................................142
Table 56: Mauchly's Test of Sphericity on number of answers found................................................142
Table 57: Pairwise comparisons on number of answers found between tools in question type
conditions....................................................................................................................................143
Table 58: Pairwise Comparisons on number of answers found between tools in Web site complexity
conditions....................................................................................................................................143
Table 59: Number of extreme cases.....................................................................................................145
Table 60: Mauchly's Test of Sphericity on ln(time spent on task)......................................................146
Table 61: Pairwise comparisons on ln(time spent on task).................................................................146
Table 62: Mauchly's Test of Sphericity for number of pages statistic................................................147
Table 63: Pairwise comparisons on Ln(number of pages) between tools in Web complexity conditions
.....................................................................................................................................................148
Table 64: Pairwise Comparisons on Ln(number of pages) between tools in question type conditions
.....................................................................................................................................................149
Table 65: Pairwise comparisons on Ln(number of re-visited pages) between tools in Web complexity
conditions only in the high information-scent question type......................................................150
Table 66: Pairwise comparisons on Ln(number of re-visited pages) between tools only in the low
information-scent question type..................................................................................................150
Table 67: Tools performances comparisons........................................................................................151
Table 68: Pairwise Comparisons between question types in Web site complexity conditions...........153
Table 69: Mauchly's Test of Sphericity on PSSUQ score...................................................................154
Table 70: Pairwise Comparisons on PSSUQ score between tools......................................................154
Table 71: Subject's Web site familiarity.............................................................................................155
Table 72: Tasks performed by subjects who had visited the Web sites prior to the experiment, grouped
by tool, Web site complexity, and question type..........................................................................155
List of Figures
Figure 1: Graphical overview and Browser............................................................................................2
Figure 2: (a) Frequency of navigation actions as a percentage of the total navigation events and (b)
details of the Open URL action.....................................................................................................18
Figure 3: Process model of information seeking using Web (transition probability)...........................19
Figure 4: A taxonomy of multiple window coordination (North & Shneiderman, 1997).....................26
Figure 5: Proportion of navigational tool usage in exploratory and directed tasks...............................29
Figure 6: Summary of URLs found.......................................................................................................38
Figure 7: Links summary......................................................................................................................39
Figure 8: Number of URLs...................................................................................................................41
Figure 9: Number of links.....................................................................................................................41
Figure 10: Histogram of number of HTML nodes and number of connections....................................42
Figure 11: Total URLs versus total links and HTML node versus connections of each site................43
Figure 12: Histogram of #connections per #HTML node-1..................................................................45
Figure 13: Histogram of connected ratio...............................................................................................45
Figure 14: Histogram of stratum...........................................................................................................46
Figure 15: Histograms of distances.......................................................................................................46
Figure 16: Mean directed distance and bi-directional distance versus number of HTML nodes..........48
Figure 17: Scatter plots between Web site parameters..........................................................................48
Figure 18: The browser screen snapshot...............................................................................................51
Figure 19: The graphical overview and text viewer screen snapshot....................................................52
Figure 20: The graphical overview and the browser.............................................................................52
Figure 21: Information scent score........................................................................................................58
Figure 22: Cell line chart of mean (time between anchor clicks) when using the browser..................67
Figure 23: Cell line chart of mean (time between icon clicks) when using the graphical overview....68
Figure 24: Histogram of browser navigation action ratio in the integrated tool...................................71
Figure 25: Histogram of browser time usage ratio in the integrated tool..............................................71
Figure 26: State transition probability in using the integrated tool.......................................................73
Figure 27: Cell line chart of mean (time between clicking) when using the integrated tool................76
Figure 28: Cell line chart of mean ln(time between anchor-anchor clicking) when using the browser
and using the integrated tool.........................................................................................................78
Figure 29: Cell line chart of mean ln(time between icon-icon clicking) when using the graphical
overview and using the integrated tool..........................................................................................78
Figure 30: Cell line chart of mean number of tasks completed grouped by tool, Web site complexity,
and question type, showing interactions........................................................................................81
Figure 31: The percentage of answers found, answers not found, and incomplete tasks for each question.
.......................................................................................................................................................85
Figure 32: Histogram of submitted pages for each question, including only tasks that did not time out
and where the target node was not found......................................................................................85
Figure 33: Cell line charts of mean number of answers found showing tool by Web site complexity
interaction and tool by question type interaction..........................................................................86
Figure 34: Histogram of time spent on task..........................................................................................88
Figure 35: Cell line chart of mean ln(time spent on task) grouped by tool and question type, showing
the tool by question type interaction..............................................................................................89
Figure 36: Histograms of the number of page views, the number of pages, the number of revisited
page views, and the number of extra page views by tasks............................................................91
Figure 37: Cell line chart of mean ln(page views) showing the tool by Web site complexity interaction...94
Figure 38: Cell line charts for ln(number of pages) showing the tool by Web site complexity interaction
and the tool by question type interaction.......................................................................................95
Figure 39: Cell line chart of mean ln(number of revisited page views) showing the tool by Web
complexity interaction....................................................................................................................95
Figure 40: Cell line charts of mean ln(number of extra page views) showing the tool by Web site
complexity interaction and the tool by question type interaction..................................................97
Figure 41: Web browser with a distortion technique tool...................................................................111
Figure 42: Web browser with a zoom technique tool.........................................................................111
Figure 43: Web browser with an expanding outline technique tool....................................................112
Figure 44: Demographic data screen...................................................................................................135
Figure 45: Web site familiarity screen................................................................................................136
Figure 46: User Satisfaction Questionnaire screen.............................................................................137
Semantics, Complexity and Capability:
The Use of Integrated Navigational Tools for Information Finding in
Hypertext Document Space
1 INTRODUCTION
1.1 Overview
This research examined the use of integrated navigational tools to find information located
within a single Web site of the World Wide Web (WWW). An empirical experiment was conducted
in order to understand the use of navigational tools in document spaces of varying complexity and
with varying levels of semantic information.
Ease of access has made the WWW a common source of information. The number of Web
pages already exceeds 800 million (Lawrence & Giles, 1999). It has been growing at an exponential
rate and is expected to double in the next five years (Nielsen, 1999). It is sometimes difficult to find
information in this massive information space and improvements in Web page structure and
navigational tools are needed.
From the library at Alexandria to the electronic repositories on the WWW, browsing has been
a method people use to find information. The process of browsing is easy to understand using
metaphors of space, place, and movement. A sense of location and place can easily be obtained by
most users with little conscious attention (McKnight, Dillon, & Richardson, 1991). Navigation is the
activity that allows browsing of a document space. The design of improved navigational tools will
contribute to the overall efficiency and effectiveness of browsing activities.
While the capability of the tools available is one factor in navigation, it is not the only factor.
The structure of the hypertext or document space also influences the navigation process. For instance,
a typical Web browser in a space that is a linear linked list will require a visit to all nodes before
accessing the end node. In a mesh structure, all nodes will be one link apart. Thus, link following is
strongly influenced by the underlying structure. On the other hand, using a graphical overview (see
Figure 1), any visible node can be selected directly without regard to the structure.
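To make the contrast concrete, the number of link-following steps needed to reach a target node can be counted directly from the structure. The following minimal sketch (in Python, with a purely hypothetical node count) compares the two cases:

    # Steps needed to reach the last of n nodes, starting from the first node.
    n = 10                # hypothetical number of nodes in the space
    chain_steps = n - 1   # linear linked list: every intermediate node must be visited
    mesh_steps = 1        # mesh: every node is one link away from every other node
    print(chain_steps, mesh_steps)   # prints: 9 1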
Figure 1: Graphical overview and Browser
There are yet other factors beyond tool capability and space complexity that interact with the
navigation process. In an information-finding task, the match between the information need and
information provided by a navigational tool becomes a significant factor in selecting the path to travel
through a document space. At a simple level, labeling of link anchors or nodes in a graphical
overview can provide cues as to where to go.
In summary, navigation to find information is dependent upon the complexity of the space in
which the information is located, the nature of the navigational tools available, and the richness of the
information available about the space.
1.2 Problem Statement
Researchers are looking at tools to manage the various information spaces. This paper focuses
on one subset of tools, navigational tools, as one method of finding information. Further, the focus is
on one type of information space -- a document space. The World Wide Web (WWW) was selected
as the subject of the investigation because it is widely used and has hypertext features; hypertext is a
generic class of document spaces. A document space that has a list structure or a hierarchical
structure is a special case of a network.
There are already many kinds of navigational tools. In order to improve their effectiveness,
integration among tools is proposed as a key factor. The idea comes from Spring, Morse, & Heo
(1996) who discussed a set of interrelated tools that play a role in different phases of navigation.
Two navigational tools will be examined: a Web browser and a graphical overview. A Web
browser is the most common mode of navigation in the WWW (e.g. Internet Explorer or Netscape).
A graphical overview is a traditional hypertext tool, and in the literature is often called a “Browser”.
In the early hypertext literature, a browser provides a structural overview of a hypertext. In
this study, however, the term “browser” refers to the Web browser.
A Web browser provides navigation capability as well as document content presentation. Only
one document, one page or one node at a time is presented by a Web browser. It is similar to
navigating in an egocentric view. In contrast, a graphical overview presents a view of the overall
structure of a hypertext, an exocentric view. Depending on the size of the Web site, a graphical
overview may present only a local overview of space. With a scroll bar, other areas can be shown. A
graphical overview navigates a Web site via active graphical objects. The integrated tool in this thesis
is the combination of the graphical overview and the browser.
Navigation in a Web browser is tied to the structure of the Web pages because the two
common methods of navigation in a Web browser are following a link and going back. On the other
hand, navigation by a graphical overview allows jumps to any node in the space with equal ease. A
graphical overview has its own problems in navigation. It cannot show very much information about
the nodes, i.e. only a label, part of a label, or some data encoded via the color, size, or shape of an
icon. This is due to the size of the structure. It may present a Web page as a very small icon, and the
links can quickly overwhelm the display. Display techniques such as zooming, focus+context
schemes, grouping nodes into a single node, and combining multiple link lines into a single thick line
may help. An integrated tool might perform better than a single tool alone since it has the capabilities
of each individual tool.
Empirical studies show that using an additional navigational tool, specifically an overview
map, yields mixed results in the efficiency and effectiveness of the navigation process. Monk, Walsh, & Dix
(1988) show significant improvement when the overview map is provided. Hammond & Allinson
(1989) report a small, statistically non-significant improvement in task efficiency. Heo (2000) reports
lower performance in a navigation task when an integrated tool was used with the Web browser, i.e.
response times were higher when compared to using the Web browser alone. Details of these studies
will be presented in section 2.4.2. Many of the new navigational tools in the literature are presented
without an accompanying usability study.
One of the goals of navigation in the WWW is to find useful information. The information
need drives the navigation process. In navigating, decisions are made to select the path. These
decisions depend on the information need and the information that is provided by the environment,
i.e. information presented by the interface used for navigation. The relation between the information
need and the information provided by the tool is defined as “semantic relatedness”, “residual
information”, and “information scent.” For example, suppose the information need is some person’s
office address. We navigate the WWW looking for the person’s name. If the Web page contains an
anchor with that person’s name, it has a high information scent. The anchor may or may not lead to
the person’s address information. If the Web page contains an anchor with “personnel” or “staff”, it
has a lower information scent. On the other hand, if the Web page contains nothing related to the person
at all, it has low information scent. Semantic relatedness is discussed further in section 2.2.2.
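As a minimal illustration of this idea (and not the empirical procedure used in this study, which measured information scent with human judgments), a crude lexical-overlap score between an information need and a set of hypothetical anchor texts could be sketched as follows:

    def scent_score(information_need, anchor_text):
        # Fraction of the need's terms that appear in the anchor text:
        # 1.0 suggests high scent, 0.0 suggests little or none.
        need = set(information_need.lower().split())
        anchor = set(anchor_text.lower().split())
        return len(need & anchor) / len(need)

    need = "john smith office address"   # hypothetical information need
    for anchor in ("John Smith", "Staff Directory", "Campus Map"):
        print(anchor, scent_score(need, anchor))
    # "John Smith" scores 0.5; the other two score 0.0, even though a reader would
    # judge "Staff Directory" to carry some scent. Capturing that kind of semantic
    # relatedness is precisely why scent was measured empirically in this study.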
1.3 Motivation and Goal of this Research
Furnas (1997) provided a framework to determine the effectiveness of a view of a space. A
view is a presentation of the information space via a user interface. The view can be analyzed in terms
of view traversal and view navigation components. The view traversal refers to the ability to move
the view around within an information structure. The traversability of the view can be described in
terms of “out-degree of vertices” and the distance between pairs of vertices. The vertices are the
active items in the view. The out-degree of a vertex refers to the number of vertices that its links lead
to. View navigation refers to the information in each view that describes other views. The
view that is effective should have a low out-degree of vertices, low distance between pairs of vertices
and high “residue” of view information in all other views. The “residue” concept is similar to
“information scent”.
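Out-degree and inter-vertex distance are directly computable from a view's link structure; residue, being semantic, is not. A minimal sketch over a hypothetical four-node site, using breadth-first search for link distances:

    from collections import deque

    def link_distances(adj, start):
        # Shortest number of link traversals from start to each reachable vertex.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    # Hypothetical site: each vertex maps to the vertices its links lead to.
    adj = {"home":   ["staff", "news"],
           "staff":  ["home", "person"],
           "news":   ["home"],
           "person": ["home"]}

    out_degree = {u: len(vs) for u, vs in adj.items()}
    print(out_degree)                   # out-degree of each vertex
    print(link_distances(adj, "home"))  # {'home': 0, 'staff': 1, 'news': 1, 'person': 2}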
Furnas’s framework indicates two components that might be used to improve navigational
performance when using the Web browser: the structural properties of the Web site and the semantic
relatedness between the information need and the information provided by the Web page. The
navigational performance when using the Web browser with a graphical overview should be different
from using the Web browser alone because the view of the information space is different.
This study explored the suitability of selected types of navigational tools for different spaces
(Web sites). Navigation in an information-finding task was a major concern. The structure of each
Web site was analyzed in terms of several metrics (number of nodes, number of links, mean distance
between nodes, etc.) as an indicator of its complexity. Information scent was also measured as the
relation between the information sought in an information-finding task and information provided
through the interface. In this study, Web site structure and information scent were controlled.
Two navigational tools were tested and their performance was compared across different complexity
and information scent conditions. The performance of the integrated navigational tool was compared
to that of each single tool.
Based on the findings, new tools may be recommended for certain types of document spaces
or modifications in the design of spaces may be suggested when appropriate tools are unavailable.
Finally, we believe the study may reveal the conditions under which integrated navigational tools
contribute to navigation and conditions under which they simply add noise and unnecessary cognitive
overhead to the task.
1.4 Definition of Terms
Anchor: an active text or graphic area in a hypertext system indicating a link. It is
used in a link following interface to navigate to the link’s destination.
Closed Hypertext: a self-contained hypertext system.
Document: “A document is an identifiable entity, having some durable form,
produced by a person or persons toward the goal of communication and may take a
number of forms, but must have at least one symbolic manifestation that can be
comprehended by humans." Spring, 1991 (p.8)
Document Space: a collection of documents, which have some common attributes.
Graphical overview: a graphical user interface that provides an overview of a set of
linked nodes.
Hypertext: a document or documents with explicitly defined relationships between
documents or document components.
Information Scent: “the (imperfect) perception of the value, cost, or access path of
information sources obtained from proximal cues” (Pirolli & Card, 1999).
Link: an explicitly defined relationship between nodes in a hypertext system.
Navigation: a process of moving in space, including virtual movement through
cognitive space.
Navigational tools: tools that help us in navigation. These include tools to navigate
and tools that give information for navigation.
Node: a basic unit of reference in a hypertext system. A node contains content.
Open Hypertext: a hypertext system that is linked to other hypertext systems.
System: a coordinated and integrated set of tools.
Tool: a modular program that provides a specific presentation and interaction and
fulfills a special function.
Typed Link: a link that provides additional information about the relationships
between the linked components.
Web browser: a user interface that presents a single node, with the capability to
display anchors that may be used as navigation links. Internet Explorer and Netscape
Navigator are examples of Web browsers.
Web Site: a set of Web pages that is provided by a Web Server.
World Wide Web (WWW): an open hypertext system implemented using the HTTP
protocol, HTML, and other markup languages, and URL links.
1.5 Scope and Limitations
Many structural properties of a hypertext have been investigated, including number of nodes,
number of links, and topology. Research has shown the relation between structure and navigation
performance, as is discussed in section 3.1.3. It would be an advantage to predict navigation
performance of a Web site in advance of constructing the Web site and to use these metrics as an
assessment tool. However, there are many metrics and the interaction effects between these metrics
and navigation performance are unknown. Some metrics are subjective. The main concern in this
study was to classify the document space (Web site) into high complexity and low complexity
rather than evaluate the metrics. The selected Web site metrics might not be a good representation of
complexity of the Web site’s structure. As a consequence, the metrics selected here might not be a
good predictor of navigation performance.
There are a wide variety of navigational tools. Two navigational tools were selected in this
study, a Web browser and a graphical overview. The results of this study might not generalize to
other types of navigational tools. Moreover, the interfaces used in the experiment represent only one
instance each of a Web browser and a graphical overview, so the results might not generalize even
within these two classes of navigational tools. The performance might
also depend on other factors such as data encoding schemes or interaction techniques.
Many tasks are performed with the WWW, including finding information, reading, learning, and so
forth. The navigation process is a sub-task of these and is reviewed in section 2.2. The
information-finding task was addressed in this thesis because it is a common task in the WWW
environment. However, navigational tools that achieve high performance for the navigation process in
an information-finding task may not facilitate other tasks. For instance, a navigational tool that makes
it easy to remember documents may not have a significant effect on navigation in new and unknown
environments, but it may improve the navigation process for re-visiting documents.
The study assumes users are of average skill and engaged in an information-finding task. Results
within a controlled experimental setting may vary from those in a real environment.
2 LITERATURE REVIEW
2.1 Document space and the WWW
There are many definitions of a document. Efforts to define what a document is, and more
generally, what information is, have been discussed in detail by Buckland (1991). He points out that
definitions for a document have ranged from any text object to any informative thing, including living
animals in a zoo. To narrow the scope of this study, the document definition given by Spring (1991)
will be used. It is stated as follows:
“A document is an identifiable entity, having some durable form, produced by a person or
persons toward the goal of communication and may take a number of forms, but must have at
least one symbolic manifestation that can be comprehended by humans." (p.8)
Documents include text, graphics, images and sounds in various combinations. Documents
may be produced on demand, based on what customers need and when they need it. Using the
WWW, the contents of a document can be constructed based on a user’s request. Many news Web
pages are “live documents,” i.e., the content of the document is dynamic. New document types, such
as active documents that search for users instead of waiting to be found by a user, are beginning to
emerge.
2.1.1 Document Spaces
Benedikt (1992) investigated physical space to develop guidelines for designing artificial
spaces. He discussed space in terms of its topological properties, including dimensionality, continuity,
limits, and density. From these space properties, seven principles were proposed for designing a
cyberspace. The principles concentrated on what it would look like and how it would be effectively
presented. A space's dimensions may be described as extrinsic and intrinsic. Generally, an extrinsic
dimension controls the location of objects in space-time. An intrinsic dimension is a property of an
object. A space may be bounded or unbounded, as well as discrete or continuous. In part this depends
on the nature of the data type mapped to the spatial dimensions. Theoretically, some spaces have
unbounded dimensions. For example, the dimension formed by an integer attribute, such as file size,
has no upper limit. Practically, however, there is a finite number of documents in some given scope.
A space can be bounded on some values but still be extensible, i.e. a bounded space may have infinite
resolution (e.g. the rational numbers between integers). The density of a space refers to how many
objects and sub-spaces can be contained within the space. The density will be reflected in the scale of
the space and in movement through it.
Document space is used to refer to a collection of documents with some common attributes.
It is possible that some attributes are specified only in some documents. In general, orthogonal
attributes are used as dimensions. A space is defined by its dimensions. A space implies all possible
objects in it with respect to the dimensions. In this view, a document space is not the same as a
perceived physical space. However, it can be projected so as to be presented in a perceived pseudo
physical space.
Given a space, documents are objects within the space. (There are also other possibilities for
transformation of a document mapped to a non-object, such as a vector field or a force, but these cases
are rare). For the purposes of this discussion, document-objects are projected into some location in a
space, based on attribute values that conform to the dimensions of the space. The perception of a
document object is controlled by space properties.
As a corollary of the definition of a space, it is useful to define the laws that apply to all
objects in the space. In this paper, space is often defined in terms of the properties of the objects in
the space. Objects may belong to a space if they contain an existence property. For instance, a query
will result in the creation of a sub-space, and only documents that match a query belong in that sub-
space. Other laws would include the notion that position and distance are created by a space itself,
and that there is a Universe, the space that covers all spaces. General laws may be defined in the
design of a space. In physical space, the laws of physics govern. For example, two objects cannot
coexist at a given location; i.e. only one object can exist at a single location. However, this and other
laws may be relaxed in an artificial space.
2.1.2 Hypertext
In a hypertext system, a document is no longer a single integrated unit, but may consist of a
network of components. A document is no longer linear but consists of a graph of “nodes” and
“links.” One may consider a hypertext as a set of documents, where each path through the nodes may
be defined as one document. Further, because users can choose any path when reading or can create
new links, the structure of the document is both dynamic and extensible, publicly and privately.
A node contains content and anchors. A link is defined as the relation between two anchors.
In general, a link joins a source anchor and a destination anchor. In implementation, a link also
contains source node identification and destination node identification. The scope of an anchor is
bound in a node. A link may contain other attributes such as link types and directions. Links may be
managed by a link manager to maintain consistency when a node is moved or deleted. More details
about the concepts and implementations of hypertext systems can be found in Conklin (1987).
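One possible rendering of this node/anchor/link model as data structures is sketched below; the field names are illustrative and not drawn from any particular hypertext system:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Anchor:
        anchor_id: str
        node_id: str        # the anchor's scope is bound to this node
        text: str           # the active text or graphic region

    @dataclass
    class Node:
        node_id: str
        content: str
        anchors: List[Anchor] = field(default_factory=list)

    @dataclass
    class Link:
        source: Anchor                 # a link joins a source anchor...
        destination: Anchor            # ...to a destination anchor
        link_type: str = "untyped"     # optional attribute: link type
        directed: bool = True          # optional attribute: direction

Under this model, a link manager's job is to update or delete Link records whenever a Node is moved or deleted, so that no link refers to a missing anchor.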
Hypertext was first envisioned by Vannevar Bush (1945). The memex (memory extension) he
envisioned contained a very large library and personal notes. It was used to make links to related
documents, thereby joining them into a trail. The system was optimized for scientific use and the
primary goals were to support making notes and browsing documents. Douglas Engelbart (1963)
developed the first operational computer-based hypertext system, NLS (oN Line System). Ted Nelson
(1987) is considered by many to be the spiritual father of a global hypertext system which he called
Xanadu. In Xanadu, related documents would be linked together on a large scale where everything
would be in a single system. Further, he envisioned a document being archived with a history of its
development -- versioning.
2.1.3 The World Wide Web (WWW)
The World Wide Web originated as a distributed hypertext system. It consists of an address
system (Uniform Resource Locators: URLs), a network protocol (HyperText Transfer Protocol:
HTTP), and a markup language (HyperText Markup Language: HTML) (Berners-Lee, Cailliau,
Luotonen, Nielsen, & Secret, 1994). A WWW system is composed of one or more WWW servers and
one or more WWW browsers. The first widely used WWW browser, Mosaic, was able to view
HTML documents and pictures. In addition to using HTTP, it was capable of using GOPHER and
FTP protocols.
The Uniform Resource Locator (URL) (Internet Engineering Task Force [IETF], 1994
[RFC1738]) standard specifies mechanisms for locating resources. URL is a subset of the Uniform
Resource Identifier (URI) (IETF, 1998 [RFC2396]). The URL standard specifies the syntax and
semantics in the context of the Internet. It comprises a syntax for protocol names, host Internet
addresses, and internal file names. The “query operator” may be applied to a URL as a mechanism to
pass state parameters through a URL.
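The parts of this syntax can be seen by decomposing a URL. A small sketch using Python's standard urllib.parse module (the URL itself is hypothetical):

    from urllib.parse import urlparse, parse_qs

    url = "http://www.example.edu/dept/search?name=smith&year=2001"
    parts = urlparse(url)
    print(parts.scheme)           # 'http'            -- protocol name
    print(parts.netloc)           # 'www.example.edu' -- host Internet address
    print(parts.path)             # '/dept/search'    -- internal file name
    print(parse_qs(parts.query))  # {'name': ['smith'], 'year': ['2001']}
                                  # state parameters passed via the query operator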
The HTTP protocol is stateless (IETF, 1999 [RFC2616]). HTTP 1.1 offers nine operations of
which “GET” and “POST” are the most frequently used. Resources can be obtained from or stored
on a server. It also provides a flexible scheme for transferring many types of data.
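For illustration, the sketch below issues a GET request with Python's standard http.client module; because HTTP is stateless, each such request is self-contained. The host and path are hypothetical:

    import http.client

    conn = http.client.HTTPConnection("www.example.edu")
    conn.request("GET", "/dept/index.html")  # GET obtains a resource; POST would send data to one
    response = conn.getresponse()
    print(response.status, response.reason)  # e.g. 200 OK
    body = response.read()                   # the resource itself, e.g. an HTML page
    conn.close()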
HTML (World Wide Web Consortium, 1999), while considered by many to be a markup
language in its own right, is in reality an instantiation of one Document Type Definition (DTD) under
the Standard Generalized Markup Language (SGML). It provides the syntax of markup in an HTML
document. HTML specifies the syntax for specifying hypertext links. The browser is able to
recognize an anchor and traverse a link embedded within an HTML document. The distinction
between links and anchors is collapsed into a single anchor tag using the HREF (Hypertext
REFerence) attribute. It is a unidirectional, untyped, direct link. (HTML version 4 proposes the
capability for link types and direction). WWW client-software is required to comprehend an HTML
document. Current WWW client software also has the ability to present a variety of document
formats. HTML is currently being superseded by the eXtensible Markup Language (XML), which,
like SGML, is a standard that allows for the definition of multiple document types.
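Because the anchor tag carries the HREF attribute, extracting a page's links reduces to scanning for anchor start tags. A sketch using Python's standard html.parser module:

    from html.parser import HTMLParser

    class HrefCollector(HTMLParser):
        # Collects the HREF attribute of every anchor tag encountered.
        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    parser = HrefCollector()
    parser.feed('<p>See the <a href="staff.html">staff</a> page.</p>')
    print(parser.hrefs)   # prints: ['staff.html']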
According to Conklin's definition of hypertext (Conklin, 1987), the WWW is a weak example
of hypertext. It lacks the node and link manager aspect of many of the early hypertext systems. There
is nothing to prevent the dissolution of links or the creation of invalid links. Conklin suggested that an
essential component of hypertext was a “browser.” The “browser,” used to display the network
graphically for navigation, does not exist as a standard part of the WWW. In the WWW, the term
“browser” is used to refer to a tool for viewing a node.
The WWW uses the concept of a page (a hypertext node). A document is not specifically
defined. It may be a single page or a set of pages. Multiple documents could be included on one page.
Because a page can be pointed to by a URL, most components (e.g. server, client, and search engine)
use a page as a basic unit for service. The HTML standard is flexible; metadata may be used to
describe a set of pages as a document.
In the WWW, relationships between pages are explicitly defined by links as in a hypertext
system. The frame feature in HTML creates a complex relation between pages allowing new kinds of
implicit relations. On the presentation level, frames create an effect of state. The view is dependent on
which combinations of nodes are used to fill a frame. A frame creates a structured display area that
can show multiple HTML files. Activating links in one frame area can cause another frame area to
display a different HTML file.
Many features were added to HTML versions 3 and 4.01 to support a variety of interactions
for WWW clients. These include applets, intrinsic event declaration, and scripting. With add-on
technology and improvement of Web browsers, current WWW content may also include programs,
e.g. JavaScript. As a result, the interface of the WWW is equivalent to that of an interactive program, not
simply a text and image viewer.
As reported by Lawrence and Giles (1999), in February 1999, the estimated number of Web
servers was 2.8 million. Based on a sampling of the number of pages in thousands of servers, they
reported that the mean number of Web pages per server was about 300 and that the distribution was
skewed. The estimated total number of Web pages was about 800 million.
A comprehensive summary of WWW data can be found in Pitkow (1998). The summary
includes a characterization of client, proxy and gateways, server, and WWW. Woodruff, Aoki,
Brewer, Gauthier, & Rowe (1996) and Bray (1996) studied Web page characteristics, showing that the
mean page sizes were 4.4 KB and 6.5 KB respectively, the median was 2 KB, and the page size distribution had high
deviation with a long tail. Bray also reported that over 50% of pages contain more than one image.
The HTML format is used in 76% of all nodes, and nearly 95% of HTML pages had the HREF
attribute with an average of 14 anchors per document (Woodruff, Aoki, Brewer, Gauthier, & Rowe,
1996). The number of links between sites was small, nearly 80% of sites had no links to other sites,
and 80% of sites had 1-10 links pointing to them (Bray, 1996). These figures indicate that only a small number of major sites provided links for navigation to other sites. A WWW structural analysis
can be found in Broder et al. (2000). This study indicated that the distribution of in-degree and out-
degree follows a power law. The study showed that there is a 75% chance that there are no paths
between two random nodes and if there is a path, on average there will be 16 links in the path. The
life span of nodes is around 50 days (Pitkow, 1998). It should be noted that this does not include
pages generated dynamically; these pages have a life span equal to the length of time they are viewed.
More details about Web page lifetime and rate of change can be found in Brewington & Cybenko
(2000).
2.2 Navigation and Information-Finding Tasks
Jul and Furnas (1997) indicate that navigation is a process of moving something, i.e.
locomotion of either navigator or object, and making decisions about where to move. These processes
take place within a context, i.e. within an information environment or set of locations. Locomotion
assumes the concepts of location and direction. The decisions that are made sometimes follow a plan
and sometimes respond to the environment according to some goal. They depend on both declarative
and procedural knowledge and frequently require coordination of knowledge in different forms
(orientation). Thus, navigation is an incremental real-time process that integrates these two
components (locomotion and decision-making). In the process of navigation, a mental or physical
map of space is built. Jul and Furnas discuss situated navigation, in contrast to plan-based navigation,
models of navigation, and other issues including characteristics of the space, task, strategy, and user
knowledge.
Spence (1998) divides navigation activity into browsing, context modeling, gradient
perception, and strategy formulation. These activities are driven by intention.
The navigational process relies on knowledge about space. Knowledge about physical space
primarily comes from the senses, directly from the environment, or indirectly through a map or some
other representational aid. There are differences between small physical spaces, those within the line
of sight, and large spaces. Both route and survey perspectives are commonly used to communicate
spatial knowledge. A route perspective may use observers as a frame of reference, i.e. an egocentric
perspective. Alternatively, the environment can be described by the relative direction of a landmark to
an observer. The survey perspective takes a view from above, an exocentric frame of reference, and
describes environments relative to one another. On a small scale, such as a tabletop view, or on a very
large scale, such as a state level or global level, we get spatial knowledge from survey perspectives.
We look from above onto some representation or map, because, either we cannot be within the
environment, or the environment is too large to obtain route information.
Tversky, Franklin, Taylor, & Bryant (1994) indicated that the perspective information, either
route or survey, is not encoded in spatial mental models. Knowledge of route and survey perspectives
can be translated into each other equally well. However, a human “cognitive map” is not as accurate
as a physical map. Hirtle and Jonides (1985) reported on evidence of hierarchical relations in the
recognition of places.
In understanding a physical space, some objects are considered landmarks. Landmarks are
special objects and are different from other objects in that environment. Landmarks are used as points
of reference in a space.
2.2.1 The Questions Answered by Navigational Tools
Navigational tools assist in navigation. Functionally, a tool is a navigational tool when it
helps to answer one or more of the following questions:
Where am I?
Where is my destination?
How can I go there?
To answer these questions, the information that helps us identify paths can be received by answering
the following questions:
What are the conditions of alternative paths?
Where have I been?
Where can I go next?
Navigational tools also give us information about space itself.
How are objects in a space related to each other?
Why are the objects in that place?
These questions were mentioned by Grice (1989), cited by Mackinlay (1986), and Fleming
(1998). Fleming addressed navigation in Web page design by dividing user goals and expectations
into three tiers. Similar to the above questions, general navigation questions comprise the first tier. He added purpose-oriented questions as the second tier and product- or audience-oriented questions as the third tier.
Whitaker (1997) suggested that navigation is different within structured and unstructured
environments. In a structured environment (e.g. towns and corridors), navigation is primarily based
on landmarks and standard structures of the environment. In the unstructured environment (e.g.
natural or off-road environment), four strategies are used in navigation: prediction, recovery, catching
features, and aiming off. He suggested that these strategies might be applied to the WWW
environment as problem-solving strategies. Prediction, the ability to predict what will come next,
might be used in path selection. Recovery is the ability to recover from loss, i.e. to backtrack.
Catching features are features indicating that a given activity will move us too far from the goal
location. Aiming off is a strategy of following a well-known path, which is not directly toward the
goal location, but not far off either, then moving to the goal location later.
There are many navigation levels, which may be derived from the size of the space. For
example, state maps are used for traveling interstate. These maps show which interstate highways
should be followed. Once in the city, a city map provides more detail about which city road to use.
While driving on highways, road signs usually show how far it is to the next exit. On the other hand,
in cities where the junctions are close together and the speed limit is lower than on highways, road
signs usually show street names. Just as different tools are used to navigate physical spaces, it would
make sense to use a variety of navigational tools, depending on the size and type of the document
space.
2.2.2 Navigation in Document Space
While document spaces are no less real than physical spaces, they are less likely to be the
comfortable three-dimensional space we are used to navigating. They may be one-dimensional as is
the case for an ordered list or n-dimensional as in the case of a vector retrieval system. A document
space may have many presentations. Navigational tools will vary based on the presentation of the
space.
Locomotion in a document space can be complex. When a user clicks on a link or an icon in the interface, a new display may appear; this may be considered as “go to” or “get it.” Observers may
move in a space, or the space may move and change its appearance around the observers.
The continuity of motion in a physical space may not apply in a document space.
Jumping from place to place is more common than walking along a continuous path. Travel in the
physical world occurs in an egocentric view where an observer is moving in an egocentric frame of
reference. The navigational information, such as map recognition or route knowledge, will be
transformed and used in this viewpoint. However, in a document space interface, traveling can use
both egocentric and exocentric views. Locomotion in document space can be relative or absolute.
Relative locomotion occurs when the next location is relative to a current position, while absolute
locomotion does not need a notion of current position. The common desktop metaphor views objects
on a display as if the view were from above a desk; navigation does not take place “in” a space but
“on” a space.
In physical space, one’s own location is a single point in space. In a document space, it is
possible to have many interfaces of radically different types open in multiple windows. For example,
Microsoft Windows Explorer allows selection of multiple files. The navigation metaphor does not fit
well in this situation because it involves different places at the same time. One may argue, however, that observers still have one central point of focus in a particular view or window.
While navigation in physical space is concerned with place and location, with where to go
and how to get there, in a document space, the major concern is the information need. The high level
goal of navigation is the finding and use of information. According to Jul and Furnas (1997), tasks
can be identified as either searching or browsing, and tactics as either querying or navigation. The
definitions are given as follows:
“Searching – The task of looking for a known target.
Browsing – The task of looking to see what is available in the world.
Querying – Submitting a description of the object being sought (for instance, using
keyword) to a search engine which will return relevant content or information.
Navigation – Moving oneself sequentially around an environment, deciding at each step
where to go.” (Jul & Furnas, 1997)
The task and tactics are combined, i.e. searching by querying, searching by navigation, browsing by
querying and browsing by navigation.
Navigational activities are classified by Maurer (1996) in the following five categories:
“Scanning: covering a large area without depth.
Browsing: following a path by association until one’s interest caught.
Searching: striving to find an explicit goal.
Exploring: finding out the extent of the information space.
Wandering: ambling along in a purposeless, unstructured manner.” (Maurer, 1996)
Czerwinski and Larson (1998) discuss Web design and tools according to the following tasks:
Targeted revisitation: finding a Web document that you know exists and that you have
visited before.
Targeted search: finding a Web document that you know exists but that you have never
seen before.
Comprehensive browsing: finding a Web document and most of the pages related to it on
a particular topic.
Satisficing during browsing: finding a Web document on a topic that is “close enough” to
the subject at hand.
Navigation in information space is often accomplished by using an interface, a combination
of data presentations and interactions. The knowledge about a space is derived from a presentation
and interaction through an interface. To present data from document space in a display, the physical
dimensions of the display must be used. An object on the screen represents some data from the
document space. There are many ways to encode data into the physical dimensions of the screen and
to specify interactions. The screen encoding may not encode anything from the data; e.g., when users can freely move objects on the screen, location is not used for encoding. The screen encoding may instead encode attributes in one dimension; e.g., objects are spatially displayed in
some sorted order. The notion of “place” of presentation may differ from “place” in the document
space. The place on the screen can be changed dramatically; the distance relationship between
objects may not be preserved while interacting with the interface.
Documents can be classified by type. Document types might include such categories as
fiction or non-fiction, book, text, periodical, journal, novel, news, and so forth. The content of each
document type has some expected structure. For instance, a scientific paper is normally structured in
some order; for example abstract, general discussion, experiment method, experiment result,
discussion, and conclusion. Dillon (1994) has shown that users can predict the location of information
in a journal article with a high level of accuracy. The type of document is also differentiated by how it
is read. A novel may be read only once but a textbook may be read repeatedly. The overall structure
of a document collection of each document type is different. For instance, a book may be referred to by title and author, whereas a newspaper may be referred to by its date of printing.
Furnas (1997) described the “navigability of a view” as the outer-link information. The outer-
link in a Web page is an anchor and the content that surrounds it. For effective navigability, the outer-
link information should not only describe the next node but also the whole set of nodes that the link
leads to. In other words, a node must have a good “residue” at every other node.
Pirolli and Card (1999) use the term “information scent”, defined as “the (imperfect)
perception of the value, cost, or access path of information sources obtained from proximal cues”
(p.646). The information scent is comparable to “residue” in Furnas’ work. Pirolli, Card, & Wege
(2000) developed and used an information scent score in comparing two navigational tools, the
Hyperbolic browser and the Explorer. The Hyperbolic browser makes use of distortion techniques.
The Explorer uses an expandable tree technique. The information scent score had an effect on
reducing task completion time in a retrieval task. Both navigational tools had lower task completion
time in high information scent score conditions than in low information scent score conditions. The
Hyperbolic browser had lower completion time than the Explorer in high information scent score
conditions but higher completion time at low information scent score conditions.
Not only does the data in navigation come from the structure of the document space itself, it
also comes from information about where users have been traveling through space, the current
position in space, and perhaps user’s plan and alternative paths to some other place.
2.2.3 Problems of Navigation in Document Space
Problems of disorientation and cognitive overhead were reported by Conklin (1987). The
terms “lost in space” and disorientation are used in hypertext. These terms are based on the problems
of not knowing where you are in the network of hypertext and how to get to some other places that
you know (or think) exist in the network. The problems include the decision of where to go next, and
whether it is worth going to.
Nielsen (1990) investigated the homogeneity problem of an information space. On-line text
always looks the same. Thus, places and sense of location are not easily recognized or understood,
which is part of the disorientation problem. He also suggested that the problem in navigation lies not only in the “context-in-the-large,” which addresses the entire hypertext structure, but also in the “context-in-the-small,” which addresses reading hypertext nodes. The problem is “losing track of [how] the text one is currently reading is related to the immediately preceding or following text” (Nielsen, 1990).
Mackinlay (1986) studied the use of hypertext to search for information. Two classes of
problems were encountered in using hypertext: category troubles and navigation troubles. The
category troubles, created by “the lack of shared literal meanings of categories,” manifested
themselves in terms of subject confusion. The experiment showed that in 39% of the searches,
subjects had category troubles. While being confused by a context which was not related to searching
topics, subjects in 27% of the searches still expected and hoped to find useful information this way.
Subjects also refused to accept that the category was different from their own understanding; this is
indicated by their going through the same path repeatedly, as shown in 31% of the searches.
Three kinds of navigational troubles were reported: linearity assumptions, becoming lost in
space, and linked navigation breakdown. The linearity assumption is a misconception about the
nonlinear structure of hypertext. Subjects were surprised when they ended up at an unexpected place
when non-sequential links were used. Subjects expressed these perceptions in 30% of the searches.
The lost in space troubles occurred in non-sequential link traversal and in poorly chosen non-literal
sequential links series. The linked navigation breakdown was caused by the fact that the subjects had
no certainty about what they had explored and what they had not. This problem occurred because the
size of the hypertext was unknown to the subjects. Subjects navigated through hypertext by
“wandering around aimlessly.” Gray also reported that subjects overestimated the size of hypertext.
After a two-hour session, subjects reported from 16 to 1,000 screens; the mean of estimation was
219.19 screens and the deviation was 325.41 from the actual 68 screens.
Dillon's experiment (Dillon, 1994) on estimating a document size provided similar results. In
a hypertext environment, users had difficulty estimating the number of nodes, while in a linear
condition, reading from paper and word processor, estimated page counts were more accurate.
Dillon’s hypothesis was that the hypertext version, which did not provide a structure of the document
space, would lead to a problem in estimating document size and would be difficult to navigate. In his
experiment, a navigation problem was indicated by time spent on contents index as percent of total
time. The result showed that hypertext navigation has a significantly higher usage of contents index
than the linear text condition.
From 1994 to 1998, the Graphics, Visualization & Usability (GVU) Center at the Georgia
Institute of Technology conducted user surveys of the WWW (GVU, 1998). According to GVU’s
WWW user survey question “What do you find to be the biggest problems in using the Web?” only a small number of those responding reported the problem of “Not being able to determine where I am (i.e., 'lost in hyperspace' problem),” 3.7% - 6.4% of the cases in the fifth to the tenth surveys. The
problem of “Not being able to visualize where I have been and where I can go (e.g., view portions of
a Web site, view clickstream)” is also low, 6.5% - 11.1% of the cases. Finding new information is
more problematic, 45.4% – 49.5% of cases, from the eighth to the tenth surveys. Finding a page that
is known to be out there is reported as a problem in 28.4% - 32.4% of the cases, and revisiting pages is reported in 12.2% - 17.8% of the cases. The biggest problem is a concern with speed, reported by
61.4% - 80.9% of the respondents. The problems in navigation (i.e. being lost, visualizing location,
finding new information, finding a page, and revisiting a page) show different responses by gender,
age group and experience. The differences are consistent across multiple surveys. Females report
more problems than males. Being lost or unable to visualize location are more of a problem for the young (11-20) and old (50+) than for the 20-50 age groups. The finding-information problem is more
frequently reported from the young group. In general, those with more experience in Web usage
report fewer problems in navigation.
2.2.4 WWW Usage
Pitkow (1999) summarized the characteristics of Web usage. There are notions of popularity
in usage: requested files showed a Zipf distribution in both client usage and requested files from
servers, and 25% of sites were responsible for 80 to 95% of accesses.
Tauscher and Greenberg (1997b) studied navigation on the WWW. They found that there is a
58% chance that the next page will be a page that has already been visited. However, users visit only
a few pages frequently. Many pages are only visited once (60%) or twice (19%). The classification of
navigation actions and their frequency of usage are shown in Figure 2. Following an anchor and
“Back” button are common ways in navigation using a Web browser.
Figure 2: (a) Frequency of navigation actions as a percentage of the total navigation events and (b) details of the Open URL action.
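
Tauscher and Greenberg's recurrence statistic can be computed directly from a navigation log. The following is a minimal Python sketch, assuming the log is a plain list of visited URLs (the sample log is hypothetical); a visit counts as a recurrence when its URL appeared earlier in the log.

    # Sketch: recurrence rate of page visits from a navigation log.
    # The log below is a hypothetical example.

    def recurrence_rate(visits):
        """Fraction of visits whose URL was already seen earlier in the log."""
        seen = set()
        revisits = 0
        for url in visits:
            if url in seen:
                revisits += 1
            seen.add(url)
        return revisits / len(visits) if visits else 0.0

    log = ["a.html", "b.html", "a.html", "c.html", "b.html", "a.html"]
    print(f"recurrence rate: {recurrence_rate(log):.0%}")  # 50% for this toy log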
According to the GVU’s tenth WWW user survey (1998), about 70% of users report that they
use the WWW for finding specific information “most of the time.” About 33% and 55% of users report that they use the WWW to have “fun” and explore “most of the time” and “some of the time,” respectively.
Hölscher and Strube (2000) investigated user behavior in information searching tasks using
the WWW. Twenty-four participants used the WWW to answer 5 questions within 10 minutes. The
usage actions were captured by a Web proxy. The result was presented as the transition probability
shown in Figure 3. The study shows a difference in searching behavior between experts and novices
in terms of Web experience and domain knowledge. The usage data show that in information searching on the WWW, users use a search engine more often than browsing an index hierarchy and going directly to a known Web site combined. When a user browses a Web site from the results, there is a 70% chance that the user will continue to navigate within that Web site.
Figure 3: Process model of information seeking using Web (transition probability)
2.3 Navigational tools in Document Spaces
According to Nielsen, “.. [a] hypertext system has two navigational dimensions; a linear
dimension used to move back and forth among the text pages within a given node, and a non linear
dimension used for hypertext jumps.” (Nielsen, 1990). In addition to a link follower, the following
tools were suggested for navigation in hypertext:
Overview diagram of the global information space and the local neighborhood of the current
node.
Backtracking facility tools for going back to a previous page.
Interaction history including timestamps, footprints, and breadcrumbs. Timestamps record
time and user movement and show when pages were visited. Footprints provide check marks
in an overview diagram of visited pages. Breadcrumbs show check marks in an anchor of
visited pages.
Gloor (1997) classifies navigational tools that are related to hypermedia documents into seven
categories as follows:
Linking - links in hypertext. Links are also classified as static links or dynamic links.
Searching - a full-text search engine such as WAIS.
Sequentialization - helps navigate by making a sequential path such as a guided tour.
Hierarchy - a hierarchical display of hypertext structure in various forms.
Similarity - a display based on document similarity.
Mapping - overview map of hyperdocuments.
Agents - artificial intelligence based techniques.
The Web browser is a common presentation of hypertext and the WWW. One node or Web
page is presented with active anchor areas. Clicking on an anchor will lead to the linked node, which
will be displayed, replacing the current page. There are many approaches to providing more information about links to aid the user in the anchor selection process. Campbell and Maglio (1999) add a “traffic light,” a small image indicating connection speed, in front of an anchor. Weinreich and Lamersdorf (2000) implemented a prototype, the HyperScout system, which provides information about the anchor's link (i.e. title, author, object size, etc.) via a small pop-up window.
Many hypertext systems provide an active trail list. A history list in Netscape is shown as a
list of visited sites including the title of page, the URL, the first visit date, the last visit date, the visit
count, and so forth. An active trail list can be ordered by those attributes. In Internet Explorer version
5, a history list is shown by date, by site, by most visited, and by order visited today. In “by date”
and “by site” views, history is shown as a two-level expandable/collapsible tree. Items are grouped
by either site name or date. It is interesting that the date grouping is non-linear -- today, days in a
week, last week and last 2 weeks.
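
The “by site” grouping can be sketched in a few lines of Python, assuming history entries are plain URLs (the entries below are hypothetical); each site name forms the expandable first level and its pages form the second level.

    # Sketch: grouping a history list into a two-level site -> pages tree.
    from collections import defaultdict
    from urllib.parse import urlparse

    history = [
        "http://www.pitt.edu/index.html",
        "http://www.pitt.edu/admissions.html",
        "http://www.sis.pitt.edu/programs.html",
    ]

    by_site = defaultdict(list)
    for url in history:
        by_site[urlparse(url).netloc].append(url)

    for site, pages in by_site.items():
        print(site)              # first (expandable) level: site name
        for page in pages:
            print("   ", page)   # second level: pages visited at that site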
Instead of showing a trail using an ordered list, a trail may be viewed based on visited nodes.
In a Web Browser, an anchor may change color when it has been visited. In a graphical overview map
of a document collection, nodes and links visited while navigating may be highlighted. One may
present only visited nodes and links, similar to an overview map. This is a presentation of visited sub-
space. Examples of such systems are WebNet (Cockburn & Jones, 1996), Footprint Site Map, and Footprint Paths (Wexelblat & Maes, 1999). Controlled experiments (Cockburn & Jones, 1996; Wexelblat & Maes, 1999) were conducted and gave positive results on the utility of the tools.
The trail list of a current session shows a list of visited pages. The trail list can be shown by
expanding the back button (in many Web Browsers). The top item is the most recently visited page –
the destination of a click on the back button. There are many schemes that might be used to create
this list. The stack-based scheme is commonly used. Dix and Mancini (1998) investigated six history
and backtracking mechanisms. The formal definitions are provided. They indicate that the back
button is used in different ways in many applications. In general, the linear traversal of links will
give the same results. However, when the list includes a node that has been visited several times,
each mechanism treats the visits differently. As a result, “go back” will go to different positions.
Tauscher & Greenberg (1997a) found that a trail list that presents the last 10 URLs with duplicates
saved only in the last position would be more predictive and usable than a stack based system.
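
The contrast between the two schemes can be illustrated with a short Python sketch. The class names are hypothetical; one class models a simple visit stack whose back button pops the current page, and the other models the recency-ordered trail suggested by Tauscher & Greenberg, in which a duplicate URL is kept only in its latest position.

    # Sketch: stack-based back button vs. a recency-ordered trail list.

    class StackHistory:
        def __init__(self):
            self.stack = []

        def visit(self, url):
            self.stack.append(url)

        def back(self):
            if len(self.stack) > 1:
                self.stack.pop()       # discard the current page
            return self.stack[-1]      # the new current page

    class RecencyTrail:
        def __init__(self, size=10):
            self.size = size
            self.trail = []

        def visit(self, url):
            if url in self.trail:
                self.trail.remove(url) # keep a duplicate only in its latest position
            self.trail.append(url)
            self.trail = self.trail[-self.size:]

    h, t = StackHistory(), RecencyTrail()
    for url in ["a", "b", "c", "b"]:
        h.visit(url)
        t.visit(url)
    print(h.back())   # "c" -- stack-based back returns to the previous entry
    print(t.trail)    # ['a', 'c', 'b'] -- recency list without duplicates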
A bookmark is a list of marked locations. In the WWW environment, a bookmark list is
shown as a title list of marked pages. Most Web browsers provide a hierarchical organization of
bookmark items. MS Windows implements a bookmark item as a linked file. While bookmarks are
still accessible from the menu, MS File Manager can also access them. In some implementations,
bookmarks are built into an HTML file.
Document usage data is also useful, especially for managing a document space. Animation
of the number of accesses per day on a given Web site could be very effective for identifying new hot
pages on a site. It could also show pages that, over time, are cooling down or becoming of less
interest. Similarly, document usage data makes it possible to see general growth patterns and clusters
of activity. Animation of visitors of a group to Web pages can be found in Minar and Donath (1999).
The graphical overview diagram is common in early hypertext systems, i.e. NoteCards,
Intermedia and WE (Conklin, 1987). The graphical overview is one promising tool for aiding
navigation in complex Web space (Czerwinski & Larson, 1998; Nielsen, 1999). There are many
graphical overviews implemented in the WWW environment, such as HyperSpace (Wood, Drew,
Beale, & Hendley, 1995), Hyperbolic Browser (Lamping, Rao, & Pirolli, 1995), WWW3D
(Snowdon, Fahlen, & Stenius, 1996), WebTOC (Nation, Plaisant, Marchionini, & Komlodi, 1997),
MAPA (Durand & Kahn, 1998), Microsoft FrontPage, CLEARWeb (CLEARWeb, Inc.), HoTMetal
(SoftQuad Software, Inc.), InContext WebAnalyzer (Geac Computer Corporation Limited), Ixsite
Web Analyzer (Ixacta, Inc.), Site Manager (Silicon Graphics, Inc.), and so forth. Many of these
systems are designed for Web site management. Only a few of them, for example the MAPA system, provide a client-side viewer. The process of scanning a Web site's structure takes time, which may make an overview system inappropriate as a navigation or browsing aid. Pre-scanning a Web site structure
in some way will be important if graphical overviews are to be used for navigation.
Chen and Rada (1996) performed a meta-analysis, which showed that a graphical overview
diagram, a visualization of the organization of hypertext, is significantly useful. Graphical overviews or maps provide an exocentric view of the space. They help users make sense of what the whole space looks like, how the space is organized, and how objects are related.
The simple graphical overview diagram shows the structure of a document space. It shows an
explicitly defined set of relations, such as the hierarchical structure of a file system or the network
structure of hypertext. Diagrams use spatial dimensions in a partially ordered manner. Relationships
among objects in diagrams are often presented by connection lines. The objects are represented in
some simple symbolic form. The layout of objects in diagrams conveys information such as a
nearness relation and a group-cluster relation. Many algorithms are used to create diagrams. Display constraints have been set in order to create a nice-looking network, for instance, to maintain a minimum number of cross-links, to avoid overlapping nodes, and to keep link lengths minimal.
Algorithms for optimizing diagram layout with many constraints are intractable, NP-complete
(Brandenburg, 1987). Heuristic methods and relaxed constraints are common in implementation. The
classification of graph structure topology, with extensive treatment of the aesthetics of diagram
construction, can be found in Beccaria, Bertolazzi, Battista, & Liotta (1991).
Another type of overview diagram is a semantic map (Lin, Soergel, & Marchionini, 1991;
Fowler, Fowler, & Wilson, 1991; Fowler, Kumar, & Williams, 1996; Kohonen, 1998). Words in
documents are processed into a semantic map. The process may not be truly semantic, but rather an
attempt to capture the semantic aspects of the documents. For example, a semantic map may be
created by projecting a set of documents into 2D or 3D space and optimizing the distance between
them so that similar documents will be clustered. Document similarity may be measured by a
distance vector method. Alternatively, similarity may be determined by classification mapping.
However, Ankerst, Berchtold, & Keim (1998) have proven that the optimal spatial arrangement
problem by similarity of multiple variables is NP-complete.
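
The projection idea can be sketched as follows, under simplifying assumptions: documents become term-count vectors, pairwise cosine similarity defines a desired 2D distance, and a crude spring-style update nudges points until similar documents sit near each other. Real systems use multidimensional scaling or self-organizing maps (Kohonen, 1998); the toy documents here are hypothetical.

    # Sketch: projecting documents into 2D so that similar ones cluster.
    import math, random

    docs = ["web navigation tools", "navigation in hypertext", "cooking pasta recipes"]
    vocab = sorted({w for d in docs for w in d.split()})
    vecs = [[d.split().count(w) for w in vocab] for d in docs]

    def norm(u):
        return math.sqrt(sum(a * a for a in u))

    def cosine(u, v):
        return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

    random.seed(0)
    pos = [[random.random(), random.random()] for _ in docs]
    for _ in range(200):                              # crude iterative layout
        for i in range(len(docs)):
            for j in range(len(docs)):
                if i == j:
                    continue
                target = 1.0 - cosine(vecs[i], vecs[j])   # desired distance
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                step = 0.05 * (dist - target) / dist      # move toward target distance
                pos[i][0] += step * dx
                pos[i][1] += step * dy

    for d, p in zip(docs, pos):
        print(f"({p[0]:.2f}, {p[1]:.2f})  {d}")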
The graphical overviews or maps are also classified as global or local views. The global view
presents an overall view of the space. It is relative to the size of the pertinent space. For instance, a
state map may be considered a global map if one is concerned with travel only within a city.
Similarly, a map of all the Web pages in a single Web site may be considered the global view even though there are many other Web sites. The local overview/map view shows the neighborhood around a local focus. It can indicate “where we can go next” from a top view. A local view may be a “zoom-in” of a
global view.
An author of a Web Site may create Web pages that serve as a map. For instance, the “Site
Map” page, table of contents or index pages might be viewed as a map of a Web Site.
Hypertext uses a network model for relations among nodes. A hypertext collection is
presented as a network diagram. However, network relations are sometimes simplified as a
hierarchical structure. There are various forms for presenting a hierarchical structure. The overview of
a hierarchical structure is normally presented as a tree diagram. Expanding and collapsing sub-trees
operate as a general strategy to avoid too much node information in a view.
Conklin (1987) reported several problems with graphical overviews. The problems
mentioned included difficulties in presenting a large number of nodes and/or links; difficulties in
dealing with a frequently changing hypertext network; and difficulty in overcoming slow response time in user interaction. Other problems Conklin reported included an insufficient visual differentiation
among nodes or links and the fact that disorientation problems still exist for non-visually oriented
users.
The design of the display is always a tradeoff between data that can be displayed and data
that will not be visible at the moment. In order to display all of the data at once on a limited display
space, a data point has to be reduced to a very small point. On the other hand, if data are visible at a size that is readable or selectable by a mouse, some data points will not be visible due to
occlusions from other data points, or due to being out of the boundaries of the display. A large data
set also slows down an interaction process. Many strategies are used to solve these problems,
including the following:
The occlusion of objects may be allowed and interaction techniques, such as local
manipulation of the viewpoint, may be used to see them.
Panning of a virtual display space is allowed when the space to be displayed is larger than
will fit on the view area.
Multiple levels of display may be used, where more details of the object may be shown by
zooming in.
Context+focus addresses the problem of details versus overview by showing both of them at the same time. At the point of focus, details of objects are shown, and an overview is shown in the rest of the area.
Some interaction techniques are:
Dynamic Queries: Dynamic queries technique, developed by Shneiderman (1994), is an
interactive display with controllers for direct-manipulation of queries and results. Controllers
are created which are bound to a range of values of interest corresponding to an attribute. The
presented data changes dynamically as the controller is manipulated within a bounded range.
Mural: Mural is a scheme that provides an overview presentation to fit on the screen (Jerding
& Stasko, 1995). The Mural view is a miniature of larger content that cannot be viewed
without losing detail or is not readable in a single display space. The display space is
condensed. Therefore, a single dot or Mural view may present multiple data points from the
original display. A secondary encoding, such as the color dimension of pixels, may be used to
present additional data.
Magic Lenses: A Magic Lens uses the concept of spatial sensitivity. It mimics a magnifying
lens. Magic lenses are areas which are superimposed on top of another presentation. Many
functions can be applied to lenses such as showing more detail or filtering. What is shown on
the lens is a function of the lens's position (Stone, Fishkin, & Bier, 1994).
Pad and Pad++: Pad uses zooming and panning as main interactions (Perlin & Fox, 1993;
Bederson & Hollan, 1994). The data first appear at a certain magnification factor. Zooming in
shows objects at a bigger size, with more detail, or with different detail (semantic zooming).
An object in Pad space is 3D, having both an X-Y coordinate and depth. At a certain zoom
factor, objects that have a certain depth will appear.
Furnas' Fisheye: Furnas (1982) provides a view combination showing both overview and detail. With zooming, the overview structure is not visible while viewing detail; with multiple views, it is difficult to relate the information across views. The fisheye provides a combination of overview and detail in a single view. The objects of interest are visible and change dynamically according to the focus point. By using the distance metaphor, the focal object appears closer than other objects. Furnas' Fisheye is a basic concept of spatial distortion presentation by “degree of interest,” also called “Detail+Context” or “Focus+Context.” A formal description and framework of “Focus+Context” appears in Björk, Holmquist, & Redström (1999).
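
Furnas' degree-of-interest calculation can be sketched for a small tree, assuming the a priori importance (API) of a node is the negative of its depth and the distance function is the tree path distance to the focus: DOI(x | focus) = API(x) - D(x, focus). Nodes whose DOI falls below a threshold are elided from the fisheye view; the tree and threshold below are hypothetical.

    # Sketch: Furnas-style degree of interest over a small tree.
    tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
            "a1": [], "a2": [], "b1": []}
    depth = {"root": 0, "a": 1, "b": 1, "a1": 2, "a2": 2, "b1": 2}
    parent = {child: p for p, children in tree.items() for child in children}

    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain

    def distance(x, y):
        """Tree path distance via the lowest common ancestor."""
        common = set(ancestors(y))
        lca = next(n for n in ancestors(x) if n in common)
        return depth[x] + depth[y] - 2 * depth[lca]

    def doi(x, focus):
        api = -depth[x]                   # the root is most important a priori
        return api - distance(x, focus)

    focus = "a1"
    visible = [n for n in tree if doi(n, focus) >= -4]   # threshold elides b1
    print(sorted(visible))                # ['a', 'a1', 'a2', 'b', 'root']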
2.4 Integrated Document Space Navigational tools
2.4.1 Tool Integration
Each document space navigational tool is designed and optimized for specific tasks. It is not
possible to meaningfully present at one time all the information about documents at all the levels it
might be presented. This failure is a result of the limitations of physical display devices and human
perceptual abilities. Supporting navigation requires a combination of different presentations in an
appropriate order. For example, it may be useful at the beginning to get an overview of a space – to
understand the structure of that space. During traversal of space, specific detail, obtained by zooming
to some local map with more detail on objects, may be useful. At the final stages, a content viewer is
needed to examine the content. This idea is similar to the one in Spring, Morse, & Heo (1996), as well as being consistent with Shneiderman's view that a user interface should provide an “Overview first, zoom and filter, then details on demand” (Shneiderman, 1998).
While it is possible to add many features to an application, i.e. to add a variety of document
presentation tools, a user might not choose to use these additional features. According to Albers
(1997), learning new methods and options requires additional work and remembering. Users tend to
work to optimize their cognitive resources rather than maximizing their work output.
Tools can be simple or complex, single-purpose or multi-purpose. Tools can be application specific or generic, i.e. tools that are used by a variety of other programs. The term tool is used here
to refer to the simple generic type. Tools are modular programs that provide a specific presentation
and a specific interaction and fulfill a special function. A coordinated and integrated set of tools will
be called a system.
Navigational tools use document-related data and a presentation or display of that data.
Different tools may use the same data with different presentations or use the same presentation on
different data.
In terms of presentation, the integration of navigational tools needs to address how different
presentations can be viewed on a single screen or in a specific sequence. The concerns of integration
include multiple windows, graphical object combinations, and interaction schemes.
At the code level, the integration of navigational tools is a matter of sharing or exchanging
state data and content data between program modules. When the user is interacting with a system of
navigational tools, tool state information, such as current position, will be of use when switching
between tools. When the user is using a search tool, passing the search terms allows the system to
provide result texts with the query words highlighted in context.
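
As a minimal sketch of this kind of state sharing, suppose each tool reads and writes a small shared-state object (all class names here are hypothetical). A search tool records its query terms and resulting position, and a content viewer uses the shared terms to highlight them in context.

    # Sketch: tools exchanging state through a shared object.
    class ToolState:
        def __init__(self):
            self.current_node = None
            self.query_terms = []

    class SearchTool:
        def __init__(self, state):
            self.state = state

        def search(self, terms):
            self.state.query_terms = terms
            self.state.current_node = "results.html"   # hypothetical hit

    class ContentViewer:
        def __init__(self, state):
            self.state = state

        def show(self, text):
            for term in self.state.query_terms:        # highlight query words
                text = text.replace(term, f"**{term}**")
            print(f"[{self.state.current_node}] {text}")

    state = ToolState()
    SearchTool(state).search(["navigation"])
    ContentViewer(state).show("Tools that support navigation in hypertext.")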
To this point, two types of data have been described – document data and navigation data.
One might imagine document data to be public and navigation data to be personal. One might further
imagine document data to be stored remotely and navigation data to be stored locally. It is not hard to
find counter examples for these cases. For example, link traversal or navigation data on a Web site
might be collected to find high traffic paths. Similarly, file system data on a PC may be considered
personal – and only stored locally. Below, some of the issues pertaining to document and navigation
data are outlined:
Most GUI technology is organized around “windows” as a basic unit. The window is a
rectangular area for display that has some degree of automated functionality provided by a window
manager program. These include the reporting of various events such as: window events (e.g.
exposure, resizing, etc.); user events (e.g. mouse clicks, keyboard actions, etc); and system events
(e.g. OS interrupts, signals, inter application messaging, etc.) Window systems allow -- require --
windows to be organized within other windows -- creating a hierarchy. Within X-window
terminology, all graphical objects are windows, including icons and scroll bars. From this viewpoint,
each navigational tool may be defined as having its own window. The placement of windows and the indication of relations among windows are a major concern when integrating tools.
The layout of windows may be static or dynamic. Windows may be laid out side-by-side or
overlapping. The advantage of side-by-side windows is that all of the information is presented in a
single view. The disadvantage is that the territory of all the windows must be less than the territory of
the display itself. Using overlapping windows, the total territory in all the windows may be many
times greater than the display territory. The disadvantage of overlapping windows is that some of the
information in the windows will be hidden from view at any given time. Interaction can resolve this
problem by allowing the user to select which window should be up front. X-Windows provides a
policy to do an automatic “bring to front” when the mouse enters the window area. MS Windows 95
uses a “task bar” to aid this process. Some applications provide a “stay on top” option to avoid
occlusion by other windows. Gaylin (1986) has shown that cycling through windows is the most
frequent window action (i.e. more than moving or resizing).
Many applications, which use a multi-window scheme, provide an automatic layout as either
tiled or overlapping. Window layout is dependent on the tasks being performed. Tasks that require
little window manipulation can be performed faster in tiled than in overlapping windows (Bly &
Rosenberg, 1986).
North & Shneiderman (1997) provide a taxonomy of multiple window coordination. It is a
two dimensional taxonomy. The first dimension relates to the data in the two windows, which is
either the same or different. (The same data might have a different presentation in each window.)
Different data should have explicitly defined relationships among them. (They may be an aggregate
of data items.) The second dimension is the function of the window. It is suggested that the windows
might be either selection or navigation windows. Given two windows, both might be navigational in function, both might be selection-oriented in function, or they may be split.
The navigation functions include scrolling, zooming, following links, opening files, and so forth. The six cases of combination are shown in Figure 4. Shneiderman and North have reviewed the
advantages in presenting data with multiple windows with coordination between them. They
implemented a “snap-together visualization” which allows a user to define the coordination of
windows (North & Shneiderman, 1999).
Figure 4: A taxonomy of multiple window coordination (North & Shneiderman, 1997)
Multiple window systems might encounter difficulty in presenting the relationship between
the windows on the screen. Many windows from different applications may be shown on the same
screen. As more windows are added, the screen becomes crowded. The windows may be shown to be
related by presenting them within a single application window. Relationships among windows may
be shown by the synchronization of changes. Interactions in one window can be used to change other
windows. For instance, when the data from one window is changed by interaction (e.g. selection), the
contents of another window (e.g. a contents window) can be changed. The relationship may be
explicit -- shown by some presentation such as a line and an arrow-- or unannounced.
A pop-up window is a window that only appears after some interaction with a main window.
A pop-up window generally captures control or focus from the main window. A common type of
pop-up window becomes active with a mouse click action and disappears when some button on the
window is clicked. Some variations, such as Balloon Help in the Apple system and ToolTips in MS Windows, are activated when the mouse pointer has been over some specific area for a specified amount of time. A pop-up window may be kept open via some holding action. A “Pin” or “tear off”
capability is used in some systems to keep a pop-up menu open even after the pointer has left the
menu area.
The location of a pop-up window varies; positioning it at the center of the screen is a
common practice. Many applications position a pop-up window under the active area to avoid
obstructing the active area. A pop-up window shifts the focus of attention from the main window.
The second window may be the same size and in the same position as the current window,
but with a transparency property. Data is drawn on the transparent window, which is layered on top of
the other window. The data from a new layer may block the view of the layers beneath. This scheme
is used when spatial encoding of both views is similar. The interaction of views should be coherent
in both layers. The magic lens uses a scheme where a second, smaller window is positioned in the larger window, and the contents of the smaller window are a function of its location within the main window.
Changing the mode of a window may be considered the same as creating a second window to replace the first window. When changing windows, the interaction may be the same or it may use a new set of interactions in the new window. This approach has the disadvantage of requiring the user to associate information from a previous display with a new one. It requires cognitive overhead to recognize the change in mode.
Windows may be synchronized in three ways. The first is one-way synchronization, in which a change in one window propagates to cause another window to change state; changes in the second window do not affect the first. The second is two-way synchronization: changes in either window will propagate to the other. The third possibility is that both windows maintain their internal states independently, with no synchronization.
Regardless of whether navigational tools are presented at the same or at different times and in
the same or different windows, the navigational tools should be capable of being synchronized to
each other. This synchronization can be done by passing display data or tool state information or
both. The synchronization may be in terms of:
Place
Selection
Boundary or view
Attributes
For instance, if three windows are used, one showing a global tree map, another the files in a
directory list, and one showing file contents, the global tree map could have an indicator to show
which files are being displayed in the file directory list. The file content viewer may display the file
that is selected in the file list. In this case, the navigation might be done in either the tree map or the
file list. Navigation via the tree map should change the file list contents.
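
A minimal sketch of this coordination follows, assuming one-way synchronization and a hypothetical toy file system: selecting a directory in the tree map updates the file list, and selecting a file in the list updates the content viewer, while neither downstream window changes its upstream source.

    # Sketch: one-way synchronization across three coordinated windows.
    filesystem = {"/docs": ["intro.txt", "methods.txt"], "/src": ["main.py"]}
    contents = {"intro.txt": "Introduction ...", "methods.txt": "Methods ...",
                "main.py": "print('hello')"}

    class Viewer:
        def display(self, name):
            print("viewer:", contents[name])

    class FileList:
        def __init__(self, viewer):
            self.viewer = viewer
            self.files = []

        def show_directory(self, path):
            self.files = filesystem[path]
            print("file list:", self.files)

        def select(self, name):
            self.viewer.display(name)            # one-way: list -> viewer

    class TreeMap:
        def __init__(self, file_list):
            self.file_list = file_list

        def select(self, path):
            print("tree map focus:", path)
            self.file_list.show_directory(path)  # one-way: map -> list

    viewer = Viewer()
    files = FileList(viewer)
    tree = TreeMap(files)
    tree.select("/docs")        # updates the file list ...
    files.select("intro.txt")   # ... and selecting a file updates the viewer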
It is conceivable that each navigational tool has its own internal representation of document
space and navigation data. In order to communicate between tools, only common data can be
interchanged. The representation of the current place, or current selection, will have to be converted
to some shared representation.
2.4.2 Evaluation of Integrated Navigational tools in Hypertext
Monk, Walsh, & Dix (1988) compared three types of interface, i.e. hypertext browser,
scrolling text and folding text in two experiments. The task was to answer questions about a program.
The first experiment results showed that there were significant differences in time used between
scrolling text and hypertext browser. The hypertext browser was significantly slower than scrolling
text. There was no difference in task correctness. In a second experiment, they showed that a static
overview map of the program structure improved the hypertext browser navigation as indicated by the
reduction in response time. In a control condition, using the hypertext browser while showing a list of titles similar to those shown in an overview map had little effect. Note that the document
contained 12 nodes. In the hypertext browser condition, two windows presented two nodes at the
same time. Subjects had no experience in using a mouse-based system.
Hammond & Allinson (1989) compared a hypertext browser alone to a hypertext browser
used with a map, a hypertext browser used with an index, a hypertext browser used with tours, and a
hypertext browser with all three of these navigational tools. Two task types, exploratory and directed,
were studied. The exploratory task was to read documents for subsequent testing. The directed task
was to use documents to answer a series of questions. All subjects were novices to the tools. The
document contained 39 screens with up to 6 navigation screens. There were three map screens for the
map tool. The results show that there are no significant differences in task performance, i.e. accuracy
score and time to complete task, for each tool. When using the hypertext browser with a single
navigational tool, subjects used the additional facilities to navigate separately; the map was used 31%
of the time, the index 23%, and the tours 49% of the total transitions. Usage of a hypertext browser
with all three navigational tools is as shown in Figure 5. When using the hypertext browser alone,
subjects viewed fewer screens and fewer different screens in terms of the total than the other
conditions. The new-to-old screen ratio in the hypertext browser alone condition was significantly
smaller than the other conditions.
[Figure 5 data -- Exploratory: Hypertext 54%, Tour 28%, Index 6%, Map 12%. Directed: Hypertext 59%, Tour 8%, Index 17%, Map 16%.]
Figure 5: Proportion of navigational tool usage in exploratory and directed tasks
Nielsen (1989) summarized usability studies in hypertext and reported several factors that
affected performance. Most of the effect sizes were small. Only 17 of 92 effect factors were higher
than 2. Two significant issues reported are individual differences among users (i.e. age groups,
activity level, and expertise) and the effect of different tasks (i.e. exploration vs. directed task). These
factors affected usability measurements to a greater degree than other factors studied.
Wright & Lickorish (1990) studied the effect of two navigation systems, i.e. index navigation
and page navigation, on various types of question answering. In index navigation, an index page is provided, and navigation to other pages can only be achieved by clicking on an item in the index page.
This index page may be viewed as an overview map of the hypertext. In page navigation, each page
contained active anchors that allowed jumps to various places, i.e. navigate by using a hypertext
browser. The results showed that the page navigation group performed tasks by using more clicks
than the index group. However, time spent for the task depended on the question. For finding a page,
there was no significant difference in the time spent performing the task. The error rate was not
affected by navigation systems.
Heo (2000) studied Web visualization techniques. Web visualization techniques were
classified into four categories: distortion, three-dimensional layout, zoom and expanding outline. The
usability study was conducted to examine the performance of users in information-finding tasks using
four different navigational tools in two sizes of Web space. The experiments used a Web browser
(Internet Explorer as a control condition), a Web browser with a distortion technique tool (Site
Analyst - using Hyperbolic distortion), a Web browser with a zoom technique tool (MerzScope), and
a Web browser with an expanding outline technique tool (LiveIndex) (see Appendix A, Figure 41,
Figure 42, and Figure 43). Two Web sites were used; a small Web site containing 50 pages and a
large Web site containing 583 pages. The performances were measured by response time and
accuracy. The results showed that there was a significant difference in response time between tools
but no significant difference in accuracy. The Web browser with the zoom technique tool took more
time than a Web browser alone and the Web browser with the expanding outline technique tool. In
general, the mean response time, when using a Web browser with the tools, was higher than when
using a Web browser alone. The results showed that users took more time to complete tasks and answered the questions less accurately on the large Web site than on the small Web site. However, there
was no interaction effect between tools and size of Web site in subject’s performance.
In summary, the findings about performance are mixed. The research would seem to indicate
that performance is highly dependent upon the user and the task. Less attention has been paid to the
characteristics of the space being navigated. This current study looks to examine the effectiveness of
navigational tools for information finding in hypertext spaces of varying complexity and “scent.”
3 RESEARCH METHODOLOGY
3.1 Introduction
Document space properties are one of many factors that contribute to navigation efficiency.
The WWW environment is selected because it is a widely available document space. Web site
complexity is selected as a simple classification of the properties of the space. A number of
complexity measures are investigated and selected as predictors of navigation performance based on
Web site complexity.
Navigation behavior depends on the given tasks. Navigation for information-finding tasks is
used in this study. Navigating for information finding is a common task in the WWW environment.
The selection of a single task type is a simplifying condition for this research.
The Web browser is used as a base condition because it has become a common tool for
navigation in the WWW. The graphical overview diagram of a Web site is chosen for study because of its historical use in hypertext systems and its theoretical potential as a navigational aid. It shows the
structure of a document space visually. Theories of navigation support the utility of a graphical
overview of the space.
3.1.1 Document Space
A document space may be implicitly or explicitly linked or not linked at all, i.e. a simple
collection. Hypertext generally, and the WWW specifically, is a document space with a network
structure. A network structure can be a complex mesh, a hierarchical structure or a simple list
structure.
The WWW is an open and variably complex mesh environment, i.e. the user can travel from
collection to collection. This study addresses isolated sites. Links to other sites are disabled. An
individual Web Site is considered a closed hypertext system.
3.1.2 Web Site Metrics
This section examines data collected to describe the complexity of a Web site. The WWW is
a single heterogeneous hypertext. While all nodes can be identified as belonging to the WWW, it may
be the case that they are from several hypertexts. The WWW may also be defined as a collection of
Web sites.
A Web site may be viewed as a hierarchical collection according to its logical construction or
Domain Name structure. For example, department Web sites may be considered to be part of a
university Web site, i.e. www.sis.pitt.edu and www.cis.pitt.edu belong to www.pitt.edu, as their
primary domain names are the same.
A Web site may also be defined as synonymous with a Web server. As a further
complication, it should be noted that the implementation of many Web servers, i.e. Apache, MS IIS,
allows a single server to support multiple Web sites with different Internet addresses. Thus, while a
Web site may be defined as one or more servers managed by a single organization, other definitions
such as those above are possible.
There are many ways to gather information about a Web site. One convenient way is using
spiders (also called crawlers, robots and bots). This kind of program can be structured to return what
a user would encounter when navigating links. Both a linked graph, and a hierarchical structure map
of the site can be constructed using a Web spider to capture the structure of the Web site. However,
an unconnected component cannot be captured by a Web spider. (It would be possible, if a given
server allowed directory listings, to construct a spider that could find unconnected components in directories where connected components existed.)
When scanning a Web site, a Web spider will encounter files in many different formats.
Many file formats, such as MS Office and Adobe PDF, are proprietary standards. Others, such as VRML and VML, proposed by the W3C, are public. Special software, i.e. a “plug-in,” is used to present special formats. In order to simplify the Web spider program, special file formats are ignored and
only HTML file formats are scanned. Scanning only HTML file types will cover the majority of
documents in a Web site. Woodruff, Aoki, Brewer, Gauthier, & Rowe (1996) reported that 75% of all
URLs point to files that are HTML. Bray (1996) reported that 44.7% of all URLs point to files with
no extension and 36.5% of all URLs point to html. Bray concludes that 80% of URLs are likely to use
the HTML file format.
A Web page from a Web site may be a static file or dynamically generated by a program
from a Web server. Common Gateway Interface (CGI), Active Server Pages (ASP), Internet Server
Application Programming Interface (ISAPI), PHP Hypertext Preprocessor, and Server Side Includes
(SSI) are examples of methods for creating dynamic Web pages. In practice, a URL of a dynamic Web page uses a specific file type (i.e. “.cgi”, “.asp”, “.dll”, etc.). The request from the client (Web
Browser) for a dynamic page commonly contains parameters that determine the specific content to be
created. The parameters may come from user entry on an input form, events from an active
component, state information kept by a cookie or state information kept at the sever site. In some
cases, it is impossible to determine the number of pages or the content of a dynamic Web page
because of the non-deterministic nature of the inputs. Iteration over dynamic Web page parameters
may not be feasible. A Web spider program is able to capture only a snapshot of a generated Web
page.
The HTML version 4 standard specifies a number of tags that use URIs. The details are shown in
Appendix A. One URI that is used in navigation by a Web browser is presented after the HREF
attribute in the “A” tag. However, HTML provides other mechanisms to navigate, such as
JavaScript and the “ismap” attribute of the “IMG” tag. These are more difficult to find and follow
using a Web spider. The script and “ismap” tags were ignored.
A Web site can be represented by a directed-graph. A number of attributes may be used to
describe a Web site including the number of nodes, the number of links, and the topology. The
topology may be described in terms of connections between nodes or in terms of the average distance
between nodes.
The size of a Web site may be defined in terms of the number of nodes or the size of the
nodes. There are many kinds of nodes in the WWW environment. The target URI may be an HTML
file, a text file, an image file or a proprietary file type.
Nodes may be classified as follows:
o HTML pages
o Embedded objects such as graphics (<BODY background > and <IMG src>)
o Other file types, which are referenced by anchor
Lawrence & Giles (1999) reported that the mean number of pages per server was 289, with an
extreme skew. According to Huberman & Adamic (1999), the distribution of the number of pages per
Web site may be predicted by a universal power law.
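As a sketch of that form (the exponent β is empirically fitted and its value is not given here):

$$P(\text{site has } n \text{ pages}) \propto n^{-\beta}$$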
In the WWW environment, anchors and links are combined. The number of links is equal to
the number of tags and attributes that refer to a URI. In this study, a link is defined as those tags and
attributes that can be used in navigation to another node by the Web Browser. These include <A href>
and <AREA href>. The targets of the links, in HTML, can be classified as follows: internal node
links, internal Web site links and external links. This study is concerned only with internal Web site
links.
Other common derived attributes of the node such as the number of incoming-links and the
number of outgoing-links can be computed. The true number of incoming links to a node is unknown
in the WWW environment. In the scope of a Web site, the number of incoming links can be
determined by counting when the node is a target of a link from another node at the Web site. The
number of outgoing-links can be determined by counting referent tags of the node. The global
structure of a Web site may be represented by the average number of links and their deviation (e.g. of
incoming and outgoing links).
Two global structure measurements of the Web site developed by Botafogo, Rivlin, &
Shneiderman (1992) are compactness and stratum. The compactness indicates reachability of nodes.
It is defined as

$$C_p = \frac{Max - \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}}{Max - Min}$$

where $Max = (n^2 - n)\,C$ and $Min = n^2 - n$. The computation is applied to a Converted Distance
Matrix (CDM), a distance matrix which defines the distance of a non-reachable node pair with some
constant, K, rather than an infinite value. $d_{ij}$ is the distance between nodes i and j, n is the number of
nodes, and C is the maximum value a CDM entry can assume, usually C = K. For fully disconnected
nodes, $C_p = 0$, and for fully connected nodes, $C_p = 1$.
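As a minimal sketch of this computation (assuming the CDM is stored as an integer matrix with unreachable pairs already set to K; the method and names are illustrative, not the author's implementation):

```java
/** Compactness (Botafogo, Rivlin, & Shneiderman, 1992) from a converted
 *  distance matrix in which unreachable pairs are set to the constant K. */
public static double compactness(int[][] cdm, int k) {
    int n = cdm.length;
    double max = (double) (n * n - n) * k; // every pair at the maximum distance
    double min = n * n - n;                // every pair at distance 1
    double sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j) sum += cdm[i][j];
    return (max - sum) / (max - min);      // 0 = fully disconnected, 1 = fully connected
}
```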
Stratum is a metric that suggests whether there is an order for reading the hypertext. A linear
hypertext can only be read one way; its stratum value is one. On the other hand, in a cyclic
hypertext there is no difference in ordering regardless of which node reading starts from; the stratum
value is zero. The detail of the Stratum formula is shown in Appendix B.
There are other ways to simplify a directed-graph distance matrix in order to avoid an infinite
distance. For example, the distance matrix of a directed graph can be treated as that of an undirected
graph and computed with an all-pair shortest path algorithm between nodes.
In the case of a Web site, the distance between nodes may be defined as the number of
“clicks” when navigating using a browser. For instance, nodes a and b which have no direct link have
a distance equal to the links from a to the root plus the links from the root to b. In general, this would be the
distance from the root node to the target node plus one, assuming that it is possible to jump from the source
node to the root by one “click”. The source node to root node distance is set to one by assuming that there is a
backtrack mechanism. This distance metric depends on the root node.
Average distance from the root node is a metric that may be useful in an information-finding
task. Based on the fact that the root node is a convenient entry point to the Web site, the distance from
the root node is an optimized path condition. If the information need is randomly distributed
throughout the structure, the average distance will be a suitable predictor for the number of nodes that
need to be visited to find the information. However, it is known that in information seeking tasks, the
user may use a search engine. The search results will provide a direct access mechanism. Also, there
is a notion of node popularity, i.e. information may not be distributed evenly across the Web pages in
the site. Many Web sites provide an index page for use as a navigation aid. These are not target
nodes. Finally, users may not use the shortest path or may use some other strategy in navigation such
as aim off.
Boyle & Teh (1992) showed that increasing the number of links in a preserved hypertext
structure would decrease the average number of nodes visited in an information-finding task. They
also found that the total time for the completion of a task and the number of errors were not affected
significantly by the number of links.
Schoon (1997) showed that different hypertext structures -- linear, hierarchical, star, and
arbitrary -- had a significant effect on navigation in finding the location of the answer in a closed Web
site. The star and hierarchy structures were more navigationally efficient than the linear structure. The
arbitrary structure was significantly less efficient than the others. Efficiency was measured by
Navigation Action Efficiency (NAE).
There was no significant difference in NAE value between groups with different levels of experience.
(It should be noted that the experience rating score was self-reported.) There was a significant
difference in NAE value between genders only in the arbitrary structure; males had lower NAE
values than females. In the other structure types, there was no significant gender difference in NAE
value.
Larson & Czerwinski (1998) measured reaction time and “lostness” in three Web site
structures, each with 512 bottom-level nodes: 8x8x8 (8 top-level categories, each with 8 sub-levels
and 8 content level categories under each sub-level), 16x32 (16 top-level categories, each with 32
content level categories), and 32x16 (32 top-level categories, each with 16 content level categories)
hierarchies. The 8x8x8 hierarchy had significantly higher reaction time and “lostness” than the
other two in an information-finding task. (The answer for the task was in the bottom level of each
structure type.) The 16x32 structure had a better reaction time than the 32x16, but the difference was not
significant. Regarding “lostness”, the 16x32 was better than the 32x16, but the difference was only
marginally significant. “Lostness” was computed from the number of unique and total links visited in
comparison to the “optimal path.” The information scent was controlled by the editor to make each
level of category appear natural. The depth of the Web structure contributed to both reaction time and
lostness.
Nakayama, Kato, & Yamane (2000) studied Web site usage logs in order to improve Web
site design. Access co-occurrence, measured by the cosine similarity of access logs, was used as one
metric. The path length was computed using the shortest path between two nodes. The study showed
a negative correlation between access co-occurrence and the path length between nodes, i.e.
nodes that are near each other are likely to be accessed together.
Frame construction in HTML creates the appearance of a single Web page by combining
HTML files. In this case, the Web browser is not a single-node text viewer. Also, navigation under a
“frames” condition is different in that some combination of anchors from several nodes is visible and
may be used. In this study, a node is defined as a single HTML file. The frame tag, i.e. <frame
src>, is considered to be a zero-length link to the target node when used to compute the distance
between nodes. However, it was not counted as a navigational link.
Various Web structure metrics, including number of nodes, number of links and other derived
information that summarizes the connection between nodes, may determine performance in Web
navigation.
3.1.3 Study of Web Site Structure
As discussed in the last section, there are many metrics that may be used to describe the
structure of a Web site. Some of them have been shown to be correlated to navigation performance in
information finding tasks, i.e. number of nodes, average distance from root node, and topology of a
Web site. In order to select metrics for representation of a Web site structure, a study was conducted
on a set of Web sites. A suitable set of metrics should differentiate Web sites with high and low
complexity. The research will test whether these metrics have a high correlation with navigation
performance using different tools. The purpose of this preliminary study was to gather a suitable
set of Web sites for analysis. This set will serve as the basis for the selection of sites with high and
low complexity for use in the experiment.
The Web sites within the University of Pittsburgh domain were selected. Within this domain
are a large number of department and program sites that were identified and scanned. The local
university’s Web sites were selected in order to minimize Web site scanning response time and
minimize out-going network traffic. The department and program Web sites are managed by each
department; thus, they should have a variety of structural properties. However, because these sites were
similar in subject matter (i.e. they introduce programs and provide academic information) many
similarities of structure were noticeable. For instance, most of the main pages contain links to lists of
staff and faculty. These similarities were considered beneficial in that they would provide an additional
form of control in the final experiment. That is, while the sites would vary in complexity, they would
be similar in scope and content. However, it is true that the University Web sites are not
representative of the Web sites in the WWW. The majority of Web sites in the WWW are commercial
Web sites, which differ from the University Web sites in their objectives and content.
The Web sites list came from the University of Pittsburgh departments and program list page,
http://www.pitt.edu/academics.html. Web sites were also gathered by scanning all IP addresses in the
“pitt.edu” domain. The result of scanning all the IP addresses, from 2562 possible IP addresses, found
that a total of 636 hosts responded to an HTTP call at port 80. The first page of all these sites was
scanned. The Web sites were investigated based on their first page. Of the 636 addresses responding,
57% (365) were either printer manager Web sites, default Web server software package pages, or
device manager Web sites. Of the 636 responses, including the printers and other device managers,
there were 308 unique first pages. Beyond the device manager sites, there were a few sites with the
same text because the hosts responded to multiple IP addresses. This analysis led to the identification of
232 sites that might be scanned for content. This list was then compared with a list generated by
scanning the main Web pages of the University to develop a list of 83 sites for further analysis.
The data was collected by a Spider program. Major components of the spider program came
from the public domain. The Web spider core code was written by Jef Poskanzer, www.acme.com.
The HTML parser was written by Arthur Do, (http://www-cs-students.stanford.edu/~do/
htmlstreamer.html). These two components were modified and merged together. An interface and
database functions were added.
The spider program uses a given URL as a starting point for scanning a Web site. The spider
program follows only URLs that have the same prefix as the given URL. This given URL prefix is the
boundary of the Web site, i.e. only URLs with the same host and sub-directory path are in the same
site. Only URLs from anchors (i.e. “A href”, “AREA href”) and frames (i.e. “FRAME src”, and
“IFRAME src”) were followed. Other sources of URL are stored but not further scanned. Only the
target-URLs that are identified to be HTML file type were parsed to extract additional URLs.
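The scanning policy can be summarized by the sketch below: a queue-based crawl bounded by the given URL prefix. The HtmlFetcher interface and all names are hypothetical stand-ins for the modified Poskanzer/Do components, not the actual spider code.

```java
import java.util.*;

/** Illustrative prefix-bounded crawl loop; HTTP and HTML parsing are stubbed out. */
public class SpiderSketch {
    /** Hypothetical stand-in for the HTTP and parsing components. */
    public interface HtmlFetcher {
        boolean isHtml(String url);
        List<String> extractLinks(String url, String... tags);
    }

    public static Set<String> crawl(String startUrl, HtmlFetcher fetcher) {
        String prefix = startUrl;               // site boundary: same host and sub-directory path
        Set<String> discovered = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!discovered.add(url)) continue;    // already seen
            if (!url.startsWith(prefix)) continue; // stored, but off-site URLs are not followed
            if (!fetcher.isHtml(url)) continue;    // only text/html pages are parsed
            // follow only anchor and frame targets: A href, AREA href, FRAME src, IFRAME src
            queue.addAll(fetcher.extractLinks(url, "A", "AREA", "FRAME", "IFRAME"));
        }
        return discovered;
    }
}
```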
A directed graph representing the Web site’s structure was created from the scanned Web site
data. Nodes of the graph were the HTML files within the Web site. An edge was present when an
HTML file had a link to a target HTML file within the same Web site. The edge weights,
i.e. distances between nodes, were initialized to 1 when nodes were linked by an anchor and to
0 when they were linked within a frame.
The Web sites were scanned between June and September, 2000. Of the 83 Web sites scanned,
three Web sites were removed because they caused an error in the spider program. Given 80 starting
URLs, a total of 45,984 URLs were discovered from the 12,007 HTML files parsed. The summary of
URLs found is shown in Figure 6. A total of 40,856 URLs (88.85% of the total) used the HTTP
protocol. Other protocols found were: mailto - 4,186 (9.10%), javascript - 656 (1.43%) and ftp - 98
(0.21%). The remaining 188 URLs (0.41%) were typing errors, or system-specific protocols such as
“gopher”, “file”, “news” and so forth.
Figure 6: Summary of URLs found
The Spider successfully scanned 30,627 URLs (74.96% of HTTP URLs) that used the
HTTP protocol. Of the 10,229 URLs (25.04% of HTTP URLs) that were not scanned, 8,627 URLs
were not in the Web site (21.12% of HTTP URLs), 1,117 URLs resulted in a server response of error
or access denied (2.73% of HTTP URLs), and 485 URLs were linked by other tag types (1.19% of
HTTP URLs).
Of the 30,627 URLs that were scanned, the content types identified by the server were as
shown in Table 1. The content type “text/html” was identified for 12,007 URLs. These were parsed
to extract URLs; URLs that had other content types were ignored.
Table 1: Content types of scanned URLs
Content type   Number of scanned URLs   Percent
text/html               12,007            39.20%
image/gif               11,807            38.55%
image/jpeg               5,565            18.17%
Others                   1,248             4.07%
Total                   30,627
The files of “text/html” content type, as identified by the server, had the following file
extensions: “.html” 7,648 files (63.70% of expected HTML files), “.htm” 3,581 files (29.82%), no
file type 731 files (6.09%), and other file types, including “.lasso”, “.asp” and “.map”, 27 files (0.39%).
All of these files were scanned and parsed for tags.
There were 219 HTML files that contained a “FRAME” tag (1.82 % of HTML files), 160
files that contained “FRAME” and “BODY” tags (1.33%), 1,791 files that contained programs, i.e.
“SCRIPT” tag (14.92%), 360 files that contained “ISMAP” tags (3.00%). No files used “IFRAME”
tags.
Figure 7: Links summary
A total of 277,890 links were found (see Figure 7). A link is a tag-attribute that contains a
URL. Of the 277,890 links, 191,737 were “tag-connections” where a connection is defined by
ignoring multiple links between source-target pairs with identical tag-attributes. The tags-attributes
that created links are shown in Table 2. Of the 277,890 links, 252,527 links (90.87% of the links)
were to nodes within the site. There were 171,652 tag-connections (89.52 % of the tag-connections)
that pointed to nodes within the same site. Of the links within the site, 3,664 nodes had 33,191 links
(11.94 % of total links, 13.14% of the links within site) to themselves. There were 25,363 links
(9.13% of the links) that pointed to nodes in other sites. Of the 191,737 tag-connections, 20,085 of the
tag-connections (10.48% of the tag-connections) pointed to nodes outside the site.
Table 2: Tags-attributes of links
Tag-attribute      #Tag-connections       %    #Links        %
A href                      107,251  55.94%   160,408   57.72%
Img src                      68,578  35.77%    94,419   33.98%
Area href                     6,450   3.36%     6,883    2.48%
Body background               4,002   2.09%     4,189    1.51%
Link href                     1,987   1.04%     2,329    0.84%
Img usemap                    1,638   0.85%     7,806    2.81%
Form action                     695   0.36%       710    0.26%
Frame src                       583   0.30%       592    0.21%
Script src                      535   0.28%       535    0.19%
Applet codebase                   9   0.00%        10    0.00%
Object classid                    3   0.00%         3    0.00%
Input src                         3   0.00%         3    0.00%
Object codebase                   2   0.00%         2    0.00%
Script for                        1   0.00%         1    0.00%
Total                       191,737           277,890
In the navigation process by a Web browser, only the anchor links, i.e. “A href” and “AREA
href”, are shown as active areas. In this paper, these types of links will be defined as “navigational
links”. From the data, there are 167,291 navigational links (60.20 % of all links). The situation is
made more complicated by the fact that while the user does not “navigate” links that are a part of the
frame source structure, these do represent connections. “FRAME src” and “IFRAME src” links are
defined, for definitional clarity, as structural links.
In the graph structure representation of a Web site, anchors and frames were used to
connect HTML files. In this paper, “connections” are defined as the number of connected pairs of
HTML nodes within the site, regardless of how many navigational links and structural links connect
them, pointed to by the “A href,” “AREA href,” “FRAME src” and “IFRAME src” tags-attributes.
There were 72,651 connections. Figure 8 shows a breakdown of the URLs identified at each site
classified as the total number of URLs discovered, the number of URLs pointing to nodes within the
site, and the number of in site nodes that were classed as HTML nodes. The graph is ordered by site’s
rank of total URLs. The graph shows that the rank versus total URLs distribution can be described by
a power law. Note that the number of URLs is in log scale. The distribution is similar to the
Huberman & Adamic (1999) prediction.
The distribution of the number of links at each Web site is shown in Figure 9. The Figure
shows the total number of links, internal and external at the site along with the number of
navigational links to other nodes within the site, i.e. navigational links to external sites are not
counted. Finally, the figure shows the number of connections to other nodes in the site. Note that the
graph uses a log scale.
Figure 8: Number of URLs
Figure 9: Number of links
Descriptive statistics of the number of URLs and links are shown in Table 3. Note that the
numbers of URLs/site and links/site have high standard deviation and skewness values.
The histogram of the number of HTML nodes and connected links is shown in Figure 10. There are high
correlations between the total number of URLs discovered, the number of URLs pointing to nodes
within the site, and the number of nodes in a site (r = 0.935 - 0.992, detail in Appendix D, Table 44).
These show that the ratio of HTML nodes to URLs within site is approximately constant (mean 0.27,
SD 0.13). There are also high correlations between each of the link type groups (r = 0.960 – 0.969,
detail in Appendix D, Table 45). These show that the ratio of connected links to total links within site
(mean 0.23, SD 0.12) and ratio of navigation links to total links within site (mean 0.64, SD 0.17) are
approximately constant.
Table 3: Descriptive Statistics of number of nodes and links
                  Minimum   Maximum    Mean      Std. Dev.   Skewness
Total URLs           13       8476      601.64    1225.34     5.011
URLs within site     13       8035      465.16    1128.67     5.489
HTML nodes            2       1842      150.09     256.03     4.341
Total links          16      48652     3472.52    7848.87     4.709
Navigation links      8      26911     2090.81    4107.55     4.085
Connections           1      14678      908.14    2162.28     4.648
Figure 10: Histogram of number of HTML nodes and number of connections
The total URLs versus total links and HTML nodes versus connections were plotted and are
shown in Figure 11. The plot is in log-log scale. Note that the number of links is at least the number
of nodes minus one because of the scanning process using the spider software: only connected
components were discovered. The upper limit of the number of connections is n² − n where n is the
number of HTML nodes. There is a high correlation between HTML nodes and connections (Pearson
Correlation r = 0.82, p < 0.0001).
Figure 11: Total URLs versus total links and HTML node versus connections of each site
With HTML files as nodes of the graph and links as the edges of the graph, an all-pair-
shortest-path algorithm was applied to compute all distances between nodes. The distance metrics
were computed on three graphs: the directed graph, the bi-directional graph and the “jump to root” graph.
Given the nature of HTML links, the hypertext network structure is a directed graph. The bi-directional
graph assumes no direction in the hypertext structure. The distance computed on the bi-directional
graph is the lowest distance between nodes that the structure allows. The “jump to root”
graph is computed by assuming the distance of a non-connected pair of nodes, in the directed graph, to be
the distance from the root to the target node plus one, i.e. the jump from the source to the root node.
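A minimal sketch of these distance computations (Floyd–Warshall for the all-pairs shortest paths; the matrix layout and names are illustrative). For the bi-directional graph, the matrix is first symmetrized with dist[i][j] = min(dist[i][j], dist[j][i]):

```java
/** All-pairs shortest paths (Floyd-Warshall, O(n^3)). dist[i][j] starts as the
 *  edge weight (1 for anchor links, 0 for frame links), 0 on the diagonal,
 *  and Double.POSITIVE_INFINITY where no edge exists. */
public static void floydWarshall(double[][] dist) {
    int n = dist.length;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

/** "Jump to root": an unreachable pair (i, j) is assigned the distance from
 *  the root to j plus one, the assumed one-click jump back to the root. */
public static void jumpToRoot(double[][] dist, int root) {
    int n = dist.length;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j && Double.isInfinite(dist[i][j]))
                dist[i][j] = dist[root][j] + 1;
}
```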
The following data and metrics were collected or computed to provide an indication of the
Web site structure:
Number of HTML pages
Number of connections
Number of connections per number of HTML node - 1
Connection ratio – defined as the number of connected node pairs per number of all possible
connected node pairs (see the formula after this list)
Mean of directed distance – computed based only on connected node pairs
Median of directed distance
Standard deviation of directed distance
Skew factor of directed distance
Compactness
Stratum
Mean bi-direction distance
Mean jump-to-root distance
Mean of distance from root node
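In notation (introduced here for clarity, not taken from the original), the connection ratio listed above counts the ordered node pairs reachable in the directed graph:

$$CR = \frac{\left|\{(i,j) : i \neq j,\ d_{ij} < \infty\}\right|}{n^{2} - n}$$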
Descriptive statistics for the data are shown in Table 4.
Table 4: Descriptive statistic of Web site properties
                                      N    Min     Max      Mean     Std. Dev.   Median   Skewness
HTML nodes                           80    2       1842     150.09    256.03      61       4.341
Connections                          80    1       14678    908.14   2162.28     215.5     4.648
Connections per HTML node - 1 ratio  80    1.0000  22.9394    4.9901    4.7036     3.113    2.060
Connected ratio                      80     .0298   1.0000     .6333     .3031      .705    -.551
Compactness                          80     .0296    .9912     .6146     .2938      .676    -.532
Stratum                              79     .0017   1.0000     .0930     .1243      .072    5.227
Directed distance                    80     .8000   6.5762    2.7563    1.2331     2.536    1.121
Bi-direction distance                80     .7000   7.3026    2.5345    1.0782     2.275    1.419
Jump to root distance                80    1.0000   8.1699    2.9804    1.2398     2.889    1.268
Root distance                        80     .5000   7.3783    2.4147    1.0528     2.229    1.676
The distribution of connections per HTML node − 1 is shown in Figure 12. Only a small
number of sites have a connections per HTML node − 1 ratio higher than 10.
However, nodes are connected within a Web site more than between Web sites, as reported by
Broder et al. (2000). Broder further found that 75% of randomly selected pairs of nodes would not
have a directed connection path. The connected ratio showed that, on an average site, 63% of the
pairs of nodes are connected; thus, 37% have no connection. Moreover, from the distribution of the
connected ratio, shown in Figure 13, and the skewness of the connected ratio, there is an indication of
a trend toward a high connected ratio. The data also show a very high correlation between connected
ratio and compactness (Pearson Correlation r = 0.998, p < 0.001). There are high correlations
between connections per HTML node − 1 and both connected ratio and compactness (r = 0.594, p < 0.001
and r = 0.619, p < 0.001 respectively).
Figure 12: Histogram of #connections per #HTML node − 1 (Mean = 5.0, Std. Dev. = 4.70, N = 80)
Figure 13: Histogram of connected ratio (Mean = .63, Std. Dev. = .30, N = 80)
Stratum values of most Web sites are small. The distribution of stratum values is shown in
Figure 14. This suggests that most Web sites have several paths to travel between nodes.
The mean distance between nodes in Web sites has a distribution as shown in Figure 15. Note
that the directed distance mean is computed based on connected nodes only. The bi-direction distance
mean is computed from the mean distance of all pairs of nodes. On average, the mean distance
between nodes within a Web site is 2.75 if they are connected. The mean bi-direction distance, on
average, is 2.53. The mean jump-to-root distance, on average, is 2.98. The mean distance from the root
node, on average, is 2.41. Most of the sites have mean distances within 4. The mean value of the
distance from root indicates that most Web sites are shallow. Some of the mean distance values are
less than one because a zero distance is used between “Frame” nodes. The data show high
correlations between the distance measurements (Appendix D, Table 46).
Figure 14: Histogram of stratum (Mean = .09, Std. Dev. = .12, N = 79)
Figure 15: Histograms of distances (directed distance: Mean = 2.76, Std. Dev. = 1.23; bi-direction distance: Mean = 2.53, Std. Dev. = 1.08; jump to root distance: Mean = 2.98, Std. Dev. = 1.24; root distance: Mean = 2.41, Std. Dev. = 1.05; N = 80)
It is expected that when the number of nodes increases, the distance between nodes in the
Web site will increase. The relationship between mean directed distance and number of nodes is
shown in Figure 16. This graph shows a tendency toward a higher distance between nodes when the
number of nodes is high. However, some Web sites had a shorter distance between nodes compared
to other sites with a similar number of nodes. The correlation between the number of HTML nodes and
the distance metrics was not high (r = 0.507-0.647, Appendix D, Table 47).
The scatter plots between all pairs of metrics are shown in Figure 17 and the details of the
correlations between all pairs of metrics are shown in Appendix D, Table 47.
In conclusion, these preliminary studies show that there is a high correlation between number
of nodes and number of links, but a low correlation between nodes and distance. Also, structure as
measured by number of connections per HTML nodes − 1, stratum, connected ratio and compactness
appears to provide additional data about the complexity of a Web site.
In this study, complex sites, those that might be predicted to benefit from a tool to visualize
a site, might be characterized by large size, far distances and a small connected ratio. Those that would
not require such a tool would be small in size, with close distances and a high connected ratio. From the data,
“small size” Web sites are defined as Web sites with fewer than 60 HTML nodes (the lower 50% of the
rank), and “large size” Web sites as Web sites with more than 60 HTML nodes. “Close
distance” Web sites are defined as Web sites with a mean root distance less than 2.4 (the mean of root
distance), and “far distance” Web sites as Web sites with a mean root distance more than
2.4. “Small connected ratio” Web sites are defined as Web sites with a connected ratio less than 0.704
(the median of connected ratio), and “high connected ratio” Web sites as Web sites with a
connected ratio more than 0.704. The numbers of Web sites in each category given by these
classifications are shown in Table 5 (a classification sketch follows the table).
Table 5: Number of Web Sites by their complexity
Root distance             Close                        Far
Connected ratio           High    Small   Total        High    Small   Total    Grand Total
Size   Small              19*     14      33           4       3       7        40
       Large              5       6       11           12      17**    29       40
Grand Total               24      20      44           16      20      36       80
* low complexity
** high complexity
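Read as a rule, the Table 5 classification can be sketched as follows (thresholds from the text: 60 HTML nodes, 2.4 mean root distance, 0.704 connected ratio; the method and names are illustrative):

```java
/** Classifies a site by the thresholds derived in this section. */
public static String complexity(int htmlNodes, double meanRootDistance, double connectedRatio) {
    boolean large = htmlNodes > 60;
    boolean far = meanRootDistance > 2.4;
    boolean smallRatio = connectedRatio < 0.704;
    if (!large && !far && !smallRatio) return "low complexity";  // small, close, high ratio
    if (large && far && smallRatio) return "high complexity";    // large, far, small ratio
    return "mixed";  // the remaining Table 5 cells were not used in the experiment
}
```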
Figure 16: Mean directed distance and bi-direction distance versus Number of HTML nodes
Figure 17: Scatter plots between Web site parameters (#nodes, #connected links, links per node, connected ratio, compactness, stratum, directed distance, bi-direction distance, jump distance, root distance)
3.1.4 Task and Semantic Relatedness
This study examines navigation in support of information finding tasks, specifically, a
targeted search task. Subjects will be asked to answer questions that may or may not be answerable
based on where the information is located in a Web site that is new to them. The extent, organization,
and content will be new. This kind of task is one frequently undertaken in the WWW context (GVU,
1998). For the targeted search task, only answerable questions will be used in the controlled
experiment. A question that is not answerable would require a great deal of effort from subjects; they might
have to look at all the pages in the Web site.
The “information scent”, or semantic relatedness between information needs (i.e. given
questions) and information provided by the site through tools (node names, the content of Web pages
and anchor text), plays a major role in navigation. If there are no hints from the node name in a
graphical overview, the advantage of being “one click away” cannot be used. In contrast, in a highly
complex environment, with multiple links between nodes and minimal intermediate path information,
browsers may offer significant advantages, both in the short run, i.e. an information-finding task, and
in the long run, i.e. a space structure modeling task. In this study, the information scent will be
measured and controlled.
The operational information scent score can be found in Pirolli, Card, & Wege (2000). It is
defined as follows:
“Information scent = the proportion of participants who correctly identified the location of the
task answer from looking at upper branches in the tree.” (Pirolli, Card, & Wege, 2000) p. 5.
The tree browser, i.e. Microsoft Window Explorer, was used in their experiment.
In this paper, because a Web site has a network structure, a different technique is used for
measuring the information scent. It is not practical to measure the information scent of all pages in a Web
site because of its size. The shortest path from the root page was chosen as representative of the
information scent. The root page is a common entry point to the Web site and it is the entry point for
the main experiment. For a given target page of a specific question, the shortest path from the root page
to the target page was identified. Many shortest paths are possible but only one is selected. The
pages in the selected shortest path were presented to subjects, who selected the anchor(s) which they
believed would lead to the target node. An anchor was weighted positively if it led toward the target page, and
negatively if it moved farther from the target node. The information scent of the question and the
Web site is defined as the sum of the average scores of the pages within the shortest path. In order to compare
between questions and Web sites, the information scent is normalized by path length. The average of
information scent from 10 subjects was used.
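In notation (introduced here for clarity, not taken from the original), the scent of question q on a site is the path-length-normalized sum of per-page average scores:

$$scent(q) = \frac{1}{|P_q|} \sum_{p \in P_q} \bar{s}(p)$$

where $P_q$ is the set of pages on the selected shortest path and $\bar{s}(p)$ is the average anchor-selection score of page p over subjects.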
3.1.5 Navigational Tools and Integration
Two common navigational tools that are frequently used in a network-structured document
space are: the browser, with visible anchors connected to links; and the graphical overview, a graph showing
link structure and nodes. The browser is the most common navigational tool in the WWW. Many
graphical overviews have been developed in hypertext research. There is theoretical support that they
will be useful in WWW navigation.
A browser provides navigation capability as well as document content presentation. A
browser presents only one page, or node, at a time. It is similar to navigating with an egocentric view. In
contrast, a graphical overview presents a view of the overall structure of a hypertext, an exocentric
view. Depending on the size of the document space, a graphical overview may present only a local
overview of the space. With a scroll bar, other areas can be shown. A graphical overview navigates a
document space via active graphical objects. In order to access the content of a node, a graphical
overview has to be integrated with a text viewer or browser.
Browsers and graphical overviews are different tools. One would expect that they would be
better for navigation of one or another kind of space. In general, the browser may be better for a
highly structured document space. For instance, a linear list hypertext may be visited in order, using a
browser. In contrast, a highly complicated space might be better navigated with a graphical overview.
Every node, regardless of its distance from the current node, is only one click away in a graphical
overview. More abstractly, a graphical overview will be of more use where the user either has or is
able to construct a simple mental model of the document space.
The anchor in a browser generally provides more information about nodes and links than
does a graphical overview. It can be used to provide more semantic information about the target node
in a limited display space. The semantic information in an anchor combined with the surrounding text
information allows a more accurate path selection than is possible with a graphical overview. A
graphical overview may show only the structure and a brief node name.
While many implementations of browsers and graphical overviews exist, experiment-specific
navigation software was developed in order to automatically capture user interaction with the system.
The software was written in Java. The browser program was written using the “Web Browser”
object provided by the Microsoft development platform. The interface of the browser is a simplification
of the Web browser (i.e. Internet Explorer); the features were minimized to those needed to navigate in a
Web site. A sample browser screen is shown in Figure 18. The graphical overview was
written by modifying the generic graph viewing program from Visualizing Graphs with Java (VGJ,
1998). The program was modified to read Web site structure data from the Spider. There
are many ways to present a Web site structure and many interaction techniques, as previously
discussed. A simple presentation is used, showing the Web structure as a 2D planar graph. A tree
layout algorithm is used, and the Web site graph is simplified by a breadth-first search. The graphical
overview uses the text viewer to present the detail of the selected node. A sample of the graphical
overview and its text viewer screen is shown in Figure 19. The graphical overview also provides
panning and zooming functions.
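A minimal sketch of that simplification (the adjacency-list representation and names are assumptions): a breadth-first search keeps, for each node, only the edge on which it is first discovered, yielding a spanning tree for the tree layout.

```java
import java.util.*;

/** Reduces the site graph to a spanning tree by breadth-first search. */
public static int[] bfsTree(List<List<Integer>> adj, int root) {
    int[] parent = new int[adj.size()];
    Arrays.fill(parent, -1);                 // -1 = not yet reached from the root
    Deque<Integer> queue = new ArrayDeque<>();
    parent[root] = root;
    queue.add(root);
    while (!queue.isEmpty()) {
        int u = queue.poll();
        for (int v : adj.get(u)) {
            if (parent[v] == -1) {           // first discovery wins; later edges are dropped
                parent[v] = u;
                queue.add(v);
            }
        }
    }
    return parent;                           // tree edges (v, parent[v]) feed the tree layout
}
```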
As described previously, there are many ways to integrate navigational tools. This study used
a synchronized condition. The browser and the graphical overview were synchronized, i.e. navigation
using either tool causes a change in the other view, and the selected object is the same in both
tools. The display layout of the tools was side-by-side, without the ability to adjust it. This layout was chosen
because it made both tools available to the user all the time. Multiple independent windows can cause
distraction in performing experimental tasks, i.e. adjusting the window size or moving a window to
the front or back. The integrated browser and graphical overview screen is shown in Figure 20.
Figure 18: The browser screen snapshot
Figure 19: The graphical overview and text viewer screen snapshot
Figure 20: The graphical overview and the browser
3.1.6 Summary
In conclusion, it was hypothesized that navigational tool performance depends on the
semantic relatedness between the information presented and the information needed, and on the complexity
of the document space. The complexity of a document space may be described by the document space’s size
and its structure. However, document structure as described in Schoon’s taxonomy is subjective. The
number of HTML nodes, mean root distance and connected ratio are proposed for this study as
sufficient metrics.
In a space where there is a strong semantic relationship between the question and the
content or node name, a user knows where to navigate. In the case of the browser, the structure of the
document space will have an impact on navigation performance. For example, a star configuration
will allow one-step access to information. In contrast, a linear list will require n steps, where n is the
distance between the original node and the target node. In the case of the graphical overview, it is
possible to find the answer node in one jump.
In a space where there is a weak semantic relationship between question and node context,
navigation may be considered a random walk through the space. Size, average distance between
nodes, and variation of distance between nodes will cause differences in navigation performance.
However, the effect may not be equal to that of using the graphical overview and the browser
together.
3.2 Hypotheses
This research examines the effect of integrated navigational tools on information finding in
closed hypertext. Navigational tools may operate at different levels of performance in different
environments. Integrated tools may be useful in selected environments under given predictable
conditions.
The Null hypothesis of this research is:
H0: There is no difference in user performance in information-finding tasks between integrated
navigational tools and individual navigational tools.
The working hypothesis is:
H1: There are significant differences in user performance in information-finding tasks when using
different navigational tools in certain kinds of environments.
Two navigational tools will be used, the graphical overview and the browser. The study will
assess three navigational tool conditions:
The browser alone, with a “back” facility (in essence, a history list)
The graphical overview with a text display window (no link following capability in a text
display)
The browser and the graphical overview with both tools synchronized.
Further, the closed hypertext, where information-finding tasks are carried out, will be
controlled in terms of the “complexity” of the hypertext and the “information scent.”
H1a: Integrated navigational tools, i.e. the browser and the graphical overview, will provide higher
performance in information-finding tasks and navigation within complex Web site spaces with high
information scent than will the browser or the graphical overview alone.
Performance will be measured by the following:
Number of tasks completed
Number of answers found
Time to complete the task
Total number of page views
Total number of pages
Total number of re-visited page views
Total number of extra page views
The Web sites will be measured by using the following criteria:
Number of HTML nodes small/large
Mean root distance small/large
Connected ratio small/large
Web site complexity is defined as high for sites with a large number of HTML nodes, a high mean root
distance and a small connected ratio, and as low for sites with a small number of HTML
nodes, a small mean root distance and a high connected ratio. Two sets of Web sites were selected
based on the low/high complexity measurements.
The information-finding tasks were conducted by giving subjects questions and asking them
to find the Web pages that contain the answers to the given questions.
The Web sites’ “information scent,” or the relationship between the question asked and the
amount of semantic information in the presentation, was measured as described in section 3.4.2, and
each question at each Web site was classified as having low/high information scent.
Additional Hypotheses are:
H2: Subjects will perform better when using the browser than when using the graphical overview in
simple structured Web sites with little information scent.
Auxiliary Hypotheses
H3: Subject performance when using integrated navigational tools will degrade with the simplicity
of the hypertext, as the tool becomes a noise contributor rather than an information provider.
The ceiling effect and familiarity with a navigational tool were also of concern. The ceiling effect
might occur when a single navigational tool can already be used at the highest performance; thus, the
improvement in performance due to an integrated navigational tool might not be able to be shown.
Regarding familiarity, the single familiar navigational tool in an integrated environment might be the
only one used and, as a result, no performance improvement would be achieved.
3.3 Participants
108 subjects (54 men and 54 women, using the Internet for more than one year) were recruited from
the University student population. Subjects were randomly assigned to each condition. Subjects were
paid (15 $US). Each subject performed a total of 27 information-finding tasks, using three navigational
tool conditions. The experiment was expected to be completed in 90 minutes. Individual differences
such as gender and experience can affect an experiment, as shown in the previous discussion. The
gender groups and experience were controlled in the recruitment process. This experience level (using the
Internet for more than one year) represents the majority of Web users (GVU, 1998).
3.4 Material
3.4.1 Web Sites
Eight Web sites from the preparatory study were used in this experiment: three high
complexity Web sites and three low complexity Web sites were used for the main tasks. This selection was
based on their complexity properties and additional properties, such as small numbers of pages that
contain programs and frames, a small number of error pages and a high variety of title text. Two Web
sites were selected for practicing. The list of selected Web sites and their properties is shown in
Appendix E. The selected Web sites were re-scanned in October, 2000 and copied into local storage. The
links to other sites were removed. The search facility, input forms, server-side queries and Java applets
were removed.
3.4.2 Questions and their Information Scent
Questions used in the experiment were prepared and controlled by the following conditions:
The questions were related to the content of the Web site.
The questions were specific. The answer to the question appears in the textual content of the
Web page. There was no need to derive information from the Web page to answer the
question.
The questions were prepared by randomly selecting a Web page as a target of searching. The
given Web page was used for generating questions based on the content of the page. Six questions
were prepared for each Web site. The selected target Web pages and questions are shown in
Appendix F, Table 48. There were some pages that were difficult to create questions for, i.e. pages
that only contained links to other pages and pages that contained “The information you have
requested is being compiled and is not yet ready for posting here.” These pages, root pages, and
already-selected pages were skipped and a new page randomly selected.
The graphical overview screen snapshot of the Web site was generated. The label in the
graphical overview was extracted from the title of the Web page. If the Web page did not contain a title,
one was generated from the heading or headings on the page or from the file name.
The Information scent or semantic relatedness between the question asked and presentation
was measured by the experiment discussed below. From the result of the information scent
experiment, four questions (two high information scent questions and two low information scent
questions) were selected for each selected Web site.
Information Scent Measurement Experiment
Ten subjects were recruited from the University student population. Subjects were given a
total of 36 questions (6 questions for each of the 6 selected Web sites). For each question, one graphical
overview and a set of Web pages were presented to the subject. The Web pages were on the shortest path
from the starting page to the page that precedes the answer page. The pages selected for the information
scent experiment and the questions are shown in Appendix F, Table 48.
For each question, subjects were asked to select and rank 3 of the labels in the graphical
overview and 3 anchors on the Web pages that they believed would lead to the answer page. The number of Web pages for
each question varied. A total of 74 Web pages were used.
A program was written to present the questions, graphical overviews and Web pages, and to collect
the data. The instruction sheet used is shown in Appendix F.1. Subjects were not allowed to go back to
previous pages or questions. There was no time limit. The ordering of the questions presented to each subject
was random. The Web pages of each question were presented in the same order as they would be accessed by the
browser from the root node to the target node.
The graphical overview’s information scent score is a weighted total of the number of users
who selected the target Web page label. Weights of 1, 0.5 and 0.3 are assigned to the first, second and
third selection, respectively.
The browser’s information scent score is computed as follows (a code sketch follows this list):
The selected anchor from each page is weighted by its order. Weights of 1, 0.5 and
0.3 are assigned to the first, second and third selection, respectively.
An anchor is also weighted by +1 if the anchor leads toward the target node and -1 if the
anchor leads farther from the target node.
The sum of the selection-order weight and the anchor weight is the Web page information scent
score.
The average of the Web pages’ information scent scores is used to represent the Web site
information scent score of each question.
The average of the subjects’ scores represents the scent of the question in the Web site.
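A sketch of one subject's browser scent score under one reading of these rules, in which the rank weight is signed by the anchor's direction (the data layout, names, and this product interpretation of combining the order weight with the anchor weight are assumptions):

```java
/** One subject's browser scent for one question. towardTarget[p][r] is true
 *  if the subject's rank-r anchor choice on shortest-path page p leads
 *  toward the target page, false if it leads farther away. */
public static double browserScent(boolean[][] towardTarget) {
    double[] rankWeight = {1.0, 0.5, 0.3};   // first, second, third selection
    double total = 0;
    for (boolean[] page : towardTarget) {
        double pageScore = 0;
        for (int r = 0; r < page.length && r < 3; r++)
            pageScore += rankWeight[r] * (page[r] ? 1 : -1);
        total += pageScore;
    }
    return total / towardTarget.length;      // normalized by path length
}
```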
The information scent score of each question is the average of the graphical overview scent
score and the browser scent score. The results from the experiment are shown in Appendix F, Table
49 and Figure 21. The information scent score was used to select and classify questions. Four
questions per Web site, the two with the lowest information scent scores and the two with the highest,
were selected for use in the main experiment. The selected questions are listed in
Appendix F, Table 48.
Using the mean of the browser scent score and the mean of the graphical overview scent score,
questions were classified into four groups: browser low/high scent score and graphical overview
low/high scent score, as shown in Table 6. High information scent questions had a high information
scent score in both the graphical overview and the browser. Low information scent questions had a
low information scent score in both the graphical overview and the browser.
Figure 21: Information scent score for each question (average map information scent, average pages information scent, and average information scent)
Table 6: Questions classification based on their information scents

                          Map information scent
Web information scent     < 0.39 (low)                            >= 0.39 (high)
< 0.22 (low)              1Q4L1*, 1Q5L2*, 2Q4L1*, 2Q5L2*,         1Q2H2, 1Q6L3, 3Q1H1,
                          3Q4L1*, 3Q6L3*, 4Q4L1*, 4Q5L2*,         5Q1H1, 5Q6L3
                          4Q6L3, 4Q1H1, 5Q4L1*, 5Q5L2*,
                          7Q1H1, 7Q3L1*, 7Q4L2, 9Q5L2*, 9Q6L3*
>= 0.22 (high)            2Q2H2, 2Q3H3, 3Q5L2                     1Q1H1**, 1Q3H3**, 2Q1H1**,
                                                                  2Q6L3**, 3Q2H2**, 3Q3H3**,
                                                                  4Q2H2**, 4Q3H3**, 5Q2H2**,
                                                                  5Q3H3**, 9Q1H1**, 9Q2H2**,
                                                                  9Q3H3, 9Q4L1

* Selected as low information scent.
** Selected as high information scent.
A reliability analysis (scale alpha) was applied to the selected questions with the ten
subjects’ information scent scores. The 12 high information scent questions showed high reliability (alpha
= 0.7730) and the 12 low information scent questions showed low reliability (alpha = 0.1403).
Reliability analysis might be useful for classifying questions by their information scent.
The summary of the selected questions’ information scent scores, grouped by Web site
complexity, is shown in Table 7. The overall information scent scores in the high complexity Web sites
and the low complexity Web sites were similar.
The summary of the minimum pages required to find the target page of the selected questions,
grouped by Web site complexity and question type, is shown in Table 8. The minimum number of pages
required to find the target page is used in calculating the number of extra page views, as will be
discussed later in this chapter. The minimum number of pages required to find the target page was high
in the high complexity Web sites because the randomly picked target pages in high complexity Web sites
tended to have a large distance from the root node.
Table 7: Summary of the information scent of the selected questions
Web site     Question   Map information scent   Pages information scent   Overall information scent
complexity   type       Avg.     Std. Dev.      Avg.     Std. Dev.        Avg.     Std. Dev.
High         High       0.52     0.253          0.49     0.431            0.51     0.291
             Low        0.19     0.241          -0.04    0.298            0.07     0.225
Low          High       0.74     0.265          0.35     0.306            0.54     0.244
             Low        0.11     0.145          0.08     0.267            0.09     0.153
Table 8: Summary of the minimum pages required to find the selected target nodes

Web site complexity   Question type   Sum of minimum pages   Average of minimum pages
High                  High            28                     4.67
                      Low             29                     4.83
High Total                            57                     4.75
Low                   High            17                     2.83
                      Low             18                     3.00
Low Total                             35                     2.92
Grand Total                           92                     3.83
3.4.3 Software
The study assessed three navigational tool conditions:
The browser alone, with a “back” and “forward” facility (in essence, a history list)
The graphical overview with a text display window (no link-following capability in a text
display)
The browser and graphical overview with both tools synchronized and displayed on the
screen.
A footprint facility, i.e. the color of an icon label changed when the corresponding Web page
was visited, was provided. Other navigational facilities such as the history list, content index, and
search were not used. The display of the browser and graphical overview had a fixed size and
structure.
The experimental software was a combination of the browser, the graphical overview, and the
integrated tool. Practice tasks were presented first by the software. Then, a predetermined sequence of
questions, Web sites, and tools was presented in order. The experimental software captured events
generated when the user clicked the mouse. The events related to the navigation process were
recorded in a database. The software also recorded the identification of the submitted page. The
next task in the sequence was automatically activated after the result was submitted. The subject could
not go back to previous questions. Between each tool’s task set, there was a one-minute waiting
screen.
The timer was shown in the interface. When the time limit expired, a dialog box was
presented to the subject; the answer was recorded as incorrect and the time was set equal to the time
limit. After the experimental task was completed, a questionnaire screen was shown. The instructions
and screen snapshots of the software are shown in Appendix G.
3.5 Experimental Design
The three independent variables are tool (3 levels), question (2 levels) and Web site (2 levels).
Tools are the browser, the graphical overview and the integrated tool. Questions are classified and
selected based on information scent score as high information scent and low information scent. Web
sites are classified as high complexity and low complexity. The experiment is a full factorial 3 x 2 x 2
within-subjects design.
Browser                     Graphical overview          Browser + Graphical overview
Web l        Web h          Web l        Web h          Web l        Web h
IS l  IS h   IS l  IS h     IS l  IS h   IS l  IS h     IS l  IS h   IS l  IS h
Within-subjects testing was selected in order to minimize the effects of individual
differences. Each subject performed tasks with three tools, two question types, and two Web site types.
Each subject performed a total of 3 x 2 x 2 = 12 conditions. There were two questions for each
condition, repeated for more reliability, for a total of 24 information finding tasks for each subject.
In order to minimize knowledge about a Web site gained while browsing, which would help in
navigation, there were a total of 6 Web sites: 3 Web sites for each Web site condition. Each Web site
was seen four times in the information-finding tasks, i.e. two questions in the low information scent
condition and two questions in the high information scent condition. In order to eliminate a Web site
difference versus tool condition effect, each set of Web sites was arranged in a Latin square block. Each
pair of a high complexity Web site and a low complexity Web site was also blocked. For instance, high
complexity Web site w1 was paired with low complexity sites w2, w4 and w6.
However, this design might lead to a sequence effect. The sequence effect was compensated for by
the fact that the ordering of navigational tools was counterbalanced by using all sequences. The
order of the Web sites in each navigational tool condition was counterbalanced by using all sequences. The
ordering of the four questions in each Web site was random.
Using a power analysis for a 3x2x2 factorial design with the significance level at 0.05 (non-
directional), a small effect size (i.e. f = 0.01), and power at 0.8, at least 82 subjects per cell
were indicated. To develop a fully counterbalanced design, 108 subjects were decided
upon.
3.6 Experimental Task
The information-finding task was simplified in this experiment. Subjects were given a direct
question and told to find a Web page that contained the answer within the time limit for each task.
3.7 Procedure
Subjects were randomly assigned to experimental conditions. Subjects were briefed about the
experiment’s objectives. Subjects were asked to perform tasks as fast as possible with correct
results. Subjects were trained for 2 minutes in the use of each navigational tool. Subjects were then
allowed to practice using all three navigational tools with a dummy hypertext and 3 practice questions.
In the experimental session, subjects used each tool to find the answers to four questions
on the assigned Web site. Subjects were limited to finding the answer within a 6-minute period. If
the subject could not find the target page within the time allotted, the answer was assumed incorrect
and the time was recorded as 6 minutes.
After finishing the experiment, subjects filled out a questionnaire which collected
demographic information, a Web site familiarity score form, and subjective evaluation information.
3.8 Data Collection and Measurement
The navigation activity logs were used to capture measurement data. The navigation activity
log contains the subject’s identification number, the identification numbers of the visited
nodes (Web pages), and the time stamps. It was generated by the software used in the experiment.
In addition to the navigation activity log, the software also captured the source of each navigation
action. For instance, the browser has three methods of navigation: following links, “back” and
“forward.” In the graphical overview, there are three methods of navigation: clicking on an
icon, “back” and “forward”. In the integrated tool, the navigation methods include following links, “back”
and “forward”, and clicking on an icon. With the integrated tool, the number of navigational actions
made with the graphical overview and with the browser was reported.
Time spent in each part of the tool was recorded. This was approximated as the total time the
mouse cursor was in each tool’s area. The timer for each tool started counting when there was a
mouse “button down” action on the tool and stopped when another tool got a mouse
action. Mouse actions included scroll bar movement.
The pages viewed by the subject are defined as follows:
Page views – the total number of page views in the browser by the subject.
One page viewed three times would constitute three “page views.”
Pages – the total number of unique pages; duplicate viewings are not counted.
Revisited page views – the number of viewings of pages beyond the initial
view. Revisited page views would be three if one page was viewed four times or if
three pages were viewed twice.
Extra page views – when using the browser, the extra page views is the number of
page views minus the shortest path length. Whether the page views include the shortest path
pages is orthogonal to the calculation. When using the graphical overview or the
integrated tool, the extra page views is the number of page views minus two.
The relation between page views, pages, and revisited page views is the following:
number of page views = number of pages + number of revisited page views
The number of extra page views is, in theory, the number of page views that were not necessary for navigation. It is calculated by subtracting the number of pages necessary to perform the task from the number of page views. In the graphical overview and the integrated tool, only two pages are necessary for the task: the first page and the target page. For the browser, the number of necessary nodes is equal to the distance from the root page to the target page, which depends on the question. The number of extra page views is highly correlated with the number of page views.
In an information-finding task, fewer page views may be considered more efficient. In general, revisited page views represent a loss in the navigation process.
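These four measures follow directly from the definitions above; the sketch below computes them from an ordered list of visited node identifiers (the function name and input layout are illustrative).

    def page_view_metrics(visits, minimum_pages):
        # visits: node ids in the order viewed during one task.
        # minimum_pages: shortest-path length for the browser, or 2 for
        # the graphical overview and the integrated tool (first page plus
        # target page).
        page_views = len(visits)
        pages = len(set(visits))
        return {
            "page_views": page_views,
            "pages": pages,
            "revisited_page_views": page_views - pages,
            "extra_page_views": page_views - minimum_pages,
        }

    # One page viewed three times: 3 page views, 1 page, 2 revisited views.
    print(page_view_metrics(["p1", "p1", "p1"], minimum_pages=1))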
Demographic data was collected with the form shown in Appendix H.1. The Web site familiarity score form is shown in Appendix H.2. User preference was measured by a subjective satisfaction questionnaire, shown in Appendix H.3, based on the Post-Study System Usability Questionnaire (PSSUQ) (Lewis, 1995).
4 RESULTS AND DISCUSSION
4.1 Demographic Data of Recruited Subjects
A total of 111 subjects (55 male, 56 female) who had at least one year's experience using a Web browser (i.e., Internet Explorer or Netscape) were recruited from the University of Pittsburgh student population. Three subjects did not complete the experiment due to personal time constraints and a software problem, leaving a total of 108 for whom results are reported. The demographic data of the subjects are shown in Table 9; their computer and Web experience are shown in Table 10.
Table 9: Summary of subjects' demographic data

Category      Range                                 Frequency  Percent
Gender        Male                                  54         50%
              Female                                54         50%
Age (Years)   16-20                                 33         30.6%
              21-25                                 30         27.8%
              26-30                                 25         23.1%
              31-35                                  9          8.3%
              36-40                                  6          5.6%
              41-45                                  2          1.9%
              46-50                                  3          2.8%
Education     Freshman                              16         14.8%
              Sophomore                             18         16.7%
              Junior                                14         13.0%
              Senior                                20         18.5%
              Graduate School                       40         37.0%
Major         Arts and Sciences                     39         36.1%
              Information Science                   32         29.6%
              Business                              11         10.2%
              Engineering                           10          9.3%
              Law                                    3          2.8%
              Education                              2          1.9%
              Health and Rehabilitation Services     1          0.9%
              Nursing                                1          0.9%
              Pharmacy                               1          0.9%
              Public and International Affairs       1          0.9%
              Public Health                          1          0.9%
              Other                                  6          5.6%
Table 10: Summary of subjects' computer experience data

Category                  Range         Frequency  Percent
Computer Experience       3-5           31         28.7%
(Years)                   6-8           23         21.3%
                          9-11          30         27.8%
                          12-14         13         12.0%
                          15 or more    11         10.2%
Web Experience            1-3           18         16.7%
(Years)                   4-6           66         61.1%
                          7-10          24         22.2%
Web Usage                 0-1            3          2.8%
(Hours/week)              2-4           11         10.2%
                          5-6            6          5.6%
                          7-9            4          3.7%
                          10-20         53         49.1%
                          21-40         27         25.0%
                          > 41           4          3.7%
Web browser familiarity   Novice         9          8.3%
                          Intermediate  58         53.7%
                          Expert        41         38.0%
The subjects were equally balanced between men and women, a result of a controlled recruiting process intended to counter the problem reported in GVU (1998) that women reported more navigation problems than men. The average age of the subjects was 25 years, with the majority being undergraduates (73% of the total). They came from a variety of disciplines and reported extensive computer experience (average 8.9 years), Web experience1, and Web usage (average Web experience 5 years, average Web usage 17 hours per week).
4.2 Results
Each of the 108 subjects completed 24 tasks (3 tool types x 2 Web site complexity conditions x 2 question types x 2 questions). The tool, Web site complexity, and question type conditions were applied using a fully sequence-counterbalanced within-subjects design. 23 subjects had seen some of the experiment's Web sites before; the details and impact are shown in Appendix I.10.
1 24 subjects claimed to have been using the Web for longer than it has been available to the general population. This raises a question about exaggerated numbers in this kind of self-report data.
4.2.1 Tool usage
Each tool was used in a total of 864 tasks; each of the 108 subjects performed 8 tasks per tool (2 Web site conditions x 2 question types x 2 questions). Tool usage was indicated by the navigation actions generated by the tool and the time spent in the tool. Navigation actions are mouse clicks on icons on the overview map and mouse clicks on anchors. The time spent in each tool was recorded: timing began when a mouse action occurred in the tool window and ended when another mouse action occurred in another tool window. Mouse actions included navigation actions and scrolling actions.
Browser
Using the browser, subjects navigated by clicking on anchors, the back button, or the forward button. A total of 11,867 navigation actions were collected from the 864 browser tasks. Navigation by clicking on anchors happened 9,006 times (75.9% of total navigation actions), by clicking the back button 2,831 times (23.9%), and by clicking the forward button 30 times (0.3%). The number of navigation actions using the browser is similar to the number of total pages viewed, which is analyzed later in this chapter. Back and forward navigation are one source of re-visited pages; the other source is clicking on anchors that lead to pages already visited.
From the navigational action log, the time between anchor clicks was computed. Statistics on the time between anchor clicks, grouped by Web site complexity and information-scent question type, are shown in Table 11.
Table 11: Summary statistics of time between anchor clicks in the browser

Web complexity  Question type  Count  Mean   Median  Std. Dev.
High            High           1533    8.65  7.0      7.381
High            Low            4526    9.47  7.0      8.658
Low             High            288    7.95  5.0      7.665
Low             Low            1736   11.58  8.0     10.071
The time between anchor clicks followed an exponential distribution, so a log transformation was applied. The ln(time between anchor clicks) was analyzed in an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 12). The time between anchor clicks depended on a two-way interaction between Web complexity and question type (Figure 22). Pairwise comparison indicated that all conditions were significantly different from each other (Table 13). The high information-scent question type had significantly lower time between anchor clicks.
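The analysis pipeline for these click intervals can be sketched as follows; note that, unlike the dissertation's within-subjects ANOVA, this simplified version treats the intervals as independent observations, and the column names are assumptions.

    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    def anova_ln_gaps(df: pd.DataFrame) -> pd.DataFrame:
        # df: one row per click interval, with columns 'gap' (seconds),
        # 'web' and 'question' (each 'High' or 'Low').
        df = df.assign(ln_gap=np.log(df["gap"]))
        model = ols("ln_gap ~ C(web) * C(question)", data=df).fit()
        # Type III sums of squares, matching the style of Table 12.
        return anova_lm(model, typ=3)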
Table 12: ANOVA on ln(time between anchor clicks) of the browser

Source          Type III Sum of Squares  df    Mean Square  F        Sig.
WEB                  .030                   1     .030          .053  .819
QUESTION           67.588                   1   67.588       117.219  .000
WEB * QUESTION     21.842                   1   21.842        37.881  .000
Error            4596.634                7972     .577
Table 13: Pairwise comparison between ln(time between anchor clicks), Bonferroni adjustment

Web complexity, Question type    High, Low   Low, High   Low, Low
High, High   Mean Diff.          -0.1175      0.1695     -0.2881
             P                   <0.0001      0.0038     <0.0001
High, Low    Mean Diff.                       0.287      -0.1706
             P                               <0.0001     <0.0001
Low, High    Mean Diff.                                  -0.4576
             P                                           <0.0001
Figure 22: Cell line chart of mean (time between anchor clicks) when using the browser
Graphical Overview
Using the graphical overview, subjects navigated by clicking on icons in the map view or on the back or forward buttons in the text viewer. A total of 7,848 navigation actions were collected from 864 tasks. Navigation using the map occurred 7,787 times. Only 59 navigation actions were generated by clicking the back button and only 2 by clicking the forward button, accounting for 0.75% and 0.03% of total navigation actions with the graphical overview. On average, subjects spent 75.72% of the total time using the graphical overview on the map and 24.28% on the text viewer.
From the navigational action log, the time between icon clicks was computed. Statistics on the time between icon clicks, grouped by Web site complexity and information-scent question type, are shown in Table 14.
Table 14: Summary statistics of time between icon clicks of the graphical overview

Web complexity  Question type  Count  Mean   Median  Std. Dev.
High            High           1022   11.73  7.0     12.361
High            Low            3078   11.24  6.0     12.753
Low             High            250   12.08  9.0     10.746
Low             Low            2239    8.94  6.0      9.019
The time between icon clicks followed an exponential distribution, so a log transformation was applied. The ln(time between icon clicks) was analyzed in an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 15). The time between icon clicks depended on a two-way interaction between Web complexity and question type (Figure 23).
Table 15: ANOVA on ln(time between icon clicks) of the graphical overview

Source          Type III Sum of Squares  df    Mean Square  F       Sig.
WEB                 3.249                   1    3.249        3.648  .056
QUESTION           17.233                   1   17.233       19.348  .000
WEB * QUESTION      5.820                   1    5.820        6.534  .011
Error            5769.780                6478     .891
Figure 23: Cell line chart of mean (time between icon clicks) when using the graphical overview
Pairwise comparison indicated that in the low complexity Web site with the low information-scent questions, the time between icon clicks was significantly lower than in the other conditions (Table 16); the remaining conditions did not differ significantly from one another. This indicates that subjects tried to visit many pages in the low complexity Web site when the question was low information scent.
Table 16: Pairwise comparison between ln(time between icon clicks), Bonferroni adjustment

Web complexity, Question type    High, Low   Low, High   Low, Low
High, High   Mean Diff.           0.06702    -0.04761     0.2225
             P                    0.4316      1.0000     <0.0001
High, Low    Mean Diff.                      -0.1146      0.1555
             P                                0.4403     <0.0001
Low, High    Mean Diff.                                   0.2701
             P                                            0.0002
Integrated tool
Using the integrated tool, subjects navigated by clicking on icons in the map view, on the back or forward buttons, or on anchors in the browser. The back and forward buttons were considered part of the browser. Overall, there were 4,490 browser navigation actions and 4,392 graphical overview navigation actions, 50.55% and 49.45% of the total navigation actions (8,882), respectively. Navigation by clicking on anchors happened 3,313 times (37.3% of total navigation actions, 73.8% of browser navigation actions), by clicking the back button 1,152 times (13.0% of total, 25.7% of browser actions), and by clicking the forward button 25 times (0.3% of total, 0.6% of browser actions). The total number of navigation actions with the integrated tool was lower than with the browser alone but higher than with the graphical overview alone. On average, subjects spent 52.18% of the total time on the browser part and 47.82% on the graphical overview part of the integrated tool.
While on average subjects used both parts of the integrated tool, further analysis reveals more detail. In 403 of 864 tasks, subjects used both the browser part and the graphical overview part within the same task (mixed mode). On the other hand, 227 tasks were navigated using the browser alone and 227 tasks using the graphical overview alone; single-tool navigation thus accounted for 52.54% of the total tasks using the integrated tool. However, subjects alternated which individual tool they used across tasks: only three subjects navigated using the browser alone for all eight tasks, and only one subject navigated using the graphical overview alone. There was one notable exception -- seven tasks were conducted without any navigation action from either navigational tool (i.e., the subjects submitted the first page).
A summary of integrated tool navigation actions grouped by Web site condition and question type condition is shown in Table 17. In the low complexity Web site with the high information-scent questions, single-tool usage may have been high because tasks in this condition were simple enough that a single tool was sufficient to finish them. In the high complexity Web site with the high information-scent questions, subjects used the browser alone more often than the graphical overview alone, but in the low complexity Web site with the low information-scent questions, subjects used the graphical overview alone more often than the browser alone. Note that, in the integrated tool, subjects saw the Web site map, which revealed the complexity of the Web site.
Table 17: Frequency distribution of tool usage based on the location of navigation actions

Web         Question  Browser  Percent  Graphical        Percent  Mix  Percent  Total
complexity  type      alone    (row)    overview alone   (row)         (row)
High        High       73      34.4%     38              17.9%    101  47.6%    212
High        Low        29      13.6%     21               9.8%    164  76.6%    214
Low         High       81      37.5%     97              44.9%     38  17.6%    216
Low         Low        44      20.5%     71              33.0%    100  46.5%    215
Total                 227      26.5%    227              26.5%    403  47.0%    857
* 7 tasks did not use either tool
The Browser Navigation Action Ratio (BNAR) for the integrated tool indicates how much a subject used the browser relative to the overall navigation actions within a task. It is computed as the number of browser navigation actions divided by the total number of navigation actions in the task. The ratio is 0 when all navigation actions in the integrated tool come from the graphical overview and 1 when they all come from the browser. The distribution of the browser navigation action ratio, shown in Figure 24, peaks sharply at zero and one and is roughly uniform in between.
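A sketch of the BNAR computation for a single task follows; the action labels are illustrative assumptions.

    def browser_navigation_action_ratio(actions):
        # actions: one task's navigation actions, e.g. 'map' for overview
        # icon clicks and 'anchor'/'back'/'forward' for browser clicks.
        browser = sum(a in ("anchor", "back", "forward") for a in actions)
        return browser / len(actions) if actions else float("nan")

    # Half of the actions come from the browser part -> BNAR = 0.5.
    print(browser_navigation_action_ratio(["map", "map", "anchor", "back"]))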
Figure 24: Histogram of browser navigation action ratio in the integrated tool
Figure 25: Histogram of browser time usage ratio in the integrated tool
Two subjects spent all of their time on the browser part of the integrated tool in all 8 tasks; no subject spent all of their time on the graphical overview in all 8 tasks. In 90 tasks (10.42% of the total), the subject spent all of the time on the graphical overview without any interaction with the browser, not even scrolling it. In 185 tasks (21.41% of the total), the browser alone was used and no time was detected on the graphical overview part. When subjects used the browser alone to navigate (227 tasks), there were 42 tasks (18.5%) in which they spent some time on the graphical overview. When subjects used the graphical overview alone to navigate (227 tasks), there were 137 tasks (60.53%) in which they spent some time in the browser.
The browser time usage ratio (BTUR) for the integrated tool is computed as the time spent on the browser part divided by the total time spent on both parts. The histogram of the browser time usage ratio is shown in Figure 25. The histogram shows heavy usage of the browser alone (i.e., browser time usage ratio = 1). The browser navigation action ratio and browser time usage ratio are highly correlated (r = 0.824, p < 0.001).
Table 18 (a) and (b) provide summary statistics for the browser navigation action ratio (BNAR) and browser time usage ratio (BTUR). Using the integrated tool, the standard deviations of BNAR and BTUR were very high because of the single-tool usage shown in Table 18 (a). However, the average values for overall usage of the integrated tool and for mixed-mode usage were similar (Table 18 (b)). On average, the graphical overview map was used slightly more than the browser part, with a BNAR below 0.5.
Table 18: Summary statistics of BNAR and BTUR grouped by Web site complexity and question type conditions

a) Overall usage
Web complexity  Question type  Avg. BNAR  Std Dev.  Avg. BTUR  Std Dev.
High            High           0.59       0.374     0.55       0.366
High            Low            0.48       0.325     0.47       0.280
Low             High           0.46       0.458     0.55       0.416
Low             Low            0.38       0.403     0.58       0.320
Overall                        0.48       0.399     0.53       0.351

b) Mixed mode only
Web complexity  Question type  Avg. BNAR  Std Dev.  Avg. BTUR  Std Dev.
High            High           0.51       0.181     0.42       0.244
High            Low            0.45       0.243     0.42       0.205
Low             High           0.49       0.163     0.57       0.238
Low             Low            0.38       0.275     0.61       0.217
Overall                        0.45       0.235     0.48       0.237
The browser navigation action ratio was analyzed using an ANOVA with treatment (Web complexity x question type) as a within-subjects factor (Table 19). Web complexity and question type showed significant main effects (F = 9.522, p = 0.003 and F = 16.284, p < 0.001, respectively) without interaction (F = 0.231, p = 0.632).
Table 19: ANOVA on Browser Navigation Action Ratio

Source          Type III Sum of Squares  df   Mean Square  F       Sig.
WEB                2.732                   1    2.732        9.522  .003
QUESTION           1.863                   1    1.863       16.284  .000
WEB * QUESTION      .032                   1     .032         .231  .632
Error             14.730                 107     .138
Pairwise comparison indicated that the browser navigation action ratio was significantly higher in the high complexity Web sites than in the low complexity Web sites (Bonferroni, Mean Diff. = 0.112, p = 0.003) and also significantly higher with the high information-scent questions than with the low information-scent questions (Bonferroni, Mean Diff. = 0.093, p < 0.001). Subjects used the browser part of the integrated tool for navigation more when Web sites were high complexity, 53% of the time compared to 42% in the low complexity Web sites. Subjects also used the browser part more when questions were high information scent, 52% of the time compared to 43% with the low information-scent questions.
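Pairwise comparisons of this kind can be sketched as paired t-tests with a manual Bonferroni correction; the dissertation's SPSS-style adjusted comparisons may differ in detail, and the input layout below is an assumption.

    from itertools import combinations
    from scipy.stats import ttest_rel

    def pairwise_bonferroni(groups):
        # groups: dict mapping a condition label to a subject-aligned list
        # of scores; every pair is tested and each p-value is multiplied
        # by the number of comparisons (Bonferroni correction).
        pairs = list(combinations(groups, 2))
        results = {}
        for a, b in pairs:
            t, p = ttest_rel(groups[a], groups[b])
            results[(a, b)] = min(1.0, p * len(pairs))
        return results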
From the navigational activity log, the state transition probabilities of integrated tool usage were calculated and are shown in Figure 26 and Table 20. Times between state transitions are summarized in Table 21 (forward actions were not included); they followed an exponential distribution.
The state transition probabilities show that using a single tool consecutively was more common than switching between tools. At the beginning of the tasks, the graphical overview and the browser were selected with equal frequency.
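A sketch of how first-order transition probabilities can be estimated from per-task action sequences follows; the action labels are illustrative.

    from collections import Counter, defaultdict

    def transition_probabilities(sequences):
        # sequences: per-task action lists such as
        # ['start', 'map', 'anchor', 'back', 'submit'].
        counts = defaultdict(Counter)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                counts[prev][nxt] += 1
        # Normalize each row of counts into probabilities.
        return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
                for prev, c in counts.items()}

    print(transition_probabilities([["start", "map", "map", "anchor",
                                     "submit"]]))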
Using the back button after using the map view was rare compared to using it after clicking on an anchor. Using the back button to navigate was less frequent with the high information-scent questions. Consecutive back-clicking was more common in the high complexity Web sites than in the low complexity Web sites, and the average time between consecutive clicks of the back button was very short, 1.76 sec.
Figure 26: State transition probability in using the integrated tool (state diagram over Start, Map action, Anchor action, Back action, Forward action, Submit, and Time out)
Table 20: State transition probability in using the integrated tool
(next-action columns, in order: Map, Anchor, Back, Forward, Submit, Time out)

Overall
  Start    .50  .49  .01
  Map      .77  .11  .02  .08  .01
  Anchor   .13  .51  .24  .11  .01
  Back     .10  .63  .23  .02  .02  .01
  Forward  .32  .36  .20  .08  .04

High complexity, High information scent
  Start    .46  .52  .02
  Map      .64  .18  .04  .13  <.01
  Anchor   .11  .59  .07  .22  <.01
  Back     .24  .35  .35  .01  .05
  Forward  1.00

High complexity, Low information scent
  Start    .43  .56  .01
  Map      .80  .12  .02  .03  .02
  Anchor   .15  .57  .22  .04  .01
  Back     .09  .56  .31  .02  .01  .01
  Forward  .29  .41  .18  .06  .06

Low complexity, High information scent
  Start    .60  .40
  Map      .37  .16  .03  .44
  Anchor   .09  .36  .06  .49
  Back     .04  .61  .17  .09  .09
  Forward  .50  .50

Low complexity, Low information scent
  Start    .51  .49  .00
  Map      .84  .05  .01  .09  <.01
  Anchor   .13  .33  .46  .08  <.01
  Back     .08  .82  .07  .01  .02  <.01
  Forward  .20  .20  .40  .20
Table 21: Time between state transitions in using the integrated tool (sec)
(next-action columns, in order: Map, Anchor, Back, Submit, Time out)

Overall
  Start   Avg.    17.43  10.99   5.00
          StdDev. 19.13  14.08   5.86
  Map     Avg.     9.93  11.23  15.51   9.48  18.36
          StdDev. 13.19  13.71  22.89  12.01  22.23
  Anchor  Avg.    24.35  10.63   9.43   8.27  18.62
          StdDev. 20.12  11.97  11.28   9.00  22.37
  Back    Avg.    17.55   4.65   1.76  10.55  13.11
          StdDev. 17.22   6.83   2.91  11.67  15.37

High complexity, High information scent
  Start   Avg.    17.83   9.48   7.25
          StdDev. 14.11  13.22   7.14
  Map     Avg.    11.77   8.59  10.33  10.06  26.00
          StdDev. 17.66   7.78  10.32  13.61      .
  Anchor  Avg.    23.97   8.86  13.23   5.62  50.50
          StdDev. 18.46   8.81  15.96   6.39  50.20
  Back    Avg.    16.54   6.09   1.44  20.00
          StdDev. 19.25   4.93   2.06  19.12

High complexity, Low information scent
  Start   Avg.    31.60  14.87   1.00
          StdDev. 30.12  18.16   0.00
  Map     Avg.    11.31  10.75  18.51  14.45  18.55
          StdDev. 14.77  12.72  28.30  21.21  22.80
  Anchor  Avg.    27.02  11.06   8.75  11.30  16.48
          StdDev. 21.40  12.48  11.67  12.04  19.34
  Back    Avg.    19.09   4.84   1.69   8.25  14.25
          StdDev. 18.38   7.74   3.05   3.95  16.02

Low complexity, High information scent
  Start   Avg.     9.56   5.56
          StdDev.  8.12   5.64
  Map     Avg.     9.67  10.90  21.67   8.12
          StdDev. 10.29   7.27  31.30   7.06
  Anchor  Avg.    12.85   7.76  10.54   7.81
          StdDev.  7.43   7.84   8.79   5.75
  Back    Avg.    12.00   6.79   3.50   7.50
          StdDev.      .  6.09   3.70   2.12

Low complexity, Low information scent
  Start   Avg.    14.47  12.59   4.00
          StdDev. 13.25  12.89      .
  Map     Avg.     7.64  16.23  11.13   8.07  12.67
          StdDev.  8.64  21.72   7.41   7.31  16.86
  Anchor  Avg.    19.69  12.44   9.71  11.56  13.50
          StdDev. 17.79  14.31  10.14  12.23  17.68
  Back    Avg.    15.55   4.18   2.42   7.00   4.00
          StdDev. 13.19   5.83   2.56   7.71      .
The natural logarithmic transformation was applied to the time between clicks, and the transformed values were analyzed using an ANOVA with treatment (Web complexity x question type x event type) as a within-subjects factor (Table 22). There was a significant three-way interaction between Web site complexity, question type, and event type (F = 3.094, p = 0.026) (Figure 27).
The time between an anchor click and an icon click was significantly higher than the time between other click pairs, i.e., icon to icon, anchor to anchor, and icon to anchor, in most cases (Appendix I.1, Table 50). There was one exception: in the low complexity Web site with high information-scent questions, the anchor-to-icon time was not significantly different from the others. There were no significant differences between the icon-to-anchor and anchor-to-anchor times. This would seem to indicate that the time to reorient to the map was higher than the time to reorient to the browser.
Table 22: ANOVA on ln(time between clicking) when using the integrated tool

Source                  Type III Sum of Squares  df    Mean Square  F       Sig.
WEB                        1.691                    1    1.691        2.320  .128
QUESTION                   4.464                    1    4.464        6.122  .013
WEB * QUESTION              .356                    1     .356         .488  .485
WEB * EVENT                9.987                    3    3.329        4.566  .003
QUESTION * EVENT          29.909                    3    9.970       13.675  .000
WEB * QUESTION * EVENT     6.767                    3    2.256        3.094  .026
Error                   4770.929                 6544     .729
Figure 27: Cell line chart of mean (time between clicking) when using the integrated tool
To compare usage of the integrated tool with the individual tools, two ANOVAs were applied to the natural-log-transformed time between clicks. The ln(time between anchor clicks) when using the browser alone was compared to that when using the browser part of the integrated tool in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor (Table 23). Likewise, the ln(time between icon clicks) when using the graphical overview alone was compared to that when using the graphical overview part (map viewer) of the integrated tool in an ANOVA with the same treatment structure (Table 24).
Table 23: ANOVA on ln(time between anchor clicks) comparing the browser and the integrated tool

Source                 Type III Sum of Squares  df     Mean Square  F        Sig.
TOOL                     13.699                    1    13.699       23.351  .000
WEB                        .130                    1      .130         .221  .638
QUESTION                 67.545                    1    67.545      115.135  .000
TOOL * WEB                 .059                    1      .059         .100  .752
TOOL * QUESTION            .141                    1      .141         .241  .623
WEB * QUESTION           14.009                    1    14.009       23.879  .000
TOOL * WEB * QUESTION      .924                    1      .924        1.575  .209
Error                  6056.022                10323      .587
Table 24: ANOVA on ln(time between icon clicks) comparing the graphical overview and the integrated tool

Source                 Type III Sum of Squares  df    Mean Square  F       Sig.
TOOL                      1.455                    1    1.455        1.636  .201
WEB                       4.463                    1    4.463        5.018  .025
QUESTION                 21.229                    1   21.229       23.870  .000
TOOL * WEB                 .027                    1     .027         .031  .861
TOOL * QUESTION            .461                    1     .461         .518  .472
WEB * QUESTION            7.161                    1    7.161        8.052  .005
TOOL * WEB * QUESTION      .150                    1     .150         .169  .681
Error                  8696.946                 9779     .889
There was a significant difference in the time between anchor clicks when using the browser alone and when using the browser part of the integrated tool (Figure 28); tool was a significant main effect (F = 23.351, p < 0.001). There were no significant interactions between tool and Web site complexity or question type, and the effects of Web site complexity, question type, and their interaction were similar to those when using the browser alone. Pairwise comparison indicated that the time between anchor clicks when using the integrated tool was significantly higher than when using the browser alone (Mean Diff. = 1.532, p < 0.001 in ln(sec)). One possible reason is the size of the browser part in the integrated tool: it was half the size of the browser alone, which may have caused subjects to scroll more. Another possible reason is that subjects obtained information from the graphical overview part of the integrated tool.
Figure 28: Cell line chart of mean ln(time between anchor-anchor clicking) when using the
browser and using the integrated tool
On the other hand, there was no significant difference in the time between icon clicks when using the graphical overview alone and when using the graphical overview part of the integrated tool (Figure 29). There was no tool effect, either as a main effect or as an interaction with Web complexity and question type. The effects of Web site complexity, question type, and their interaction were similar to those when using the graphical overview alone.
Adjusted time spent on tool
Measuring time spent on a tool by using mouse actions on the tool as the indication of usage suffers from several possible errors. First, the time around a transition may belong partly to each tool. Second, even without a transition, the time attributed to one tool may in fact have been split between the two tools with no action occurring in the second tool. A simple example of such a split is a subject who clicks only in the overview but looks at the browser to determine whether the target page has been reached.
The tool usage time was therefore recomputed from the activity log using the following rules (a sketch implementing them appears after the list):
1. The time from the start to the first click goes to the tool of the first click.
2. All additional time is allocated as follows:
   a. If the tool at click n is the same as the tool at click n-1, the time goes to that tool.
   b. If the tool at click n is not the same as the tool at click n-1, 50% of the time goes to each tool.
3. When the time between icon clicks exceeds 9 seconds, 50% of the excess time goes to the browser.
4. When the time between anchor clicks exceeds 8 seconds, 50% of the excess time goes to the map.
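A minimal sketch of these four rules, assuming time-ordered (timestamp, tool) click records with 'map' for icon clicks and 'browser' for anchor clicks:

    def adjusted_tool_time(clicks):
        # clicks: list of (timestamp_sec, tool) pairs for one task,
        # in time order; tool is 'map' or 'browser'.
        time = {"browser": 0.0, "map": 0.0}
        if not clicks:
            return time
        # Rule 1: time from the start to the first click.
        time[clicks[0][1]] += clicks[0][0]
        for (t0, tool0), (t1, tool1) in zip(clicks, clicks[1:]):
            gap = t1 - t0
            if tool0 != tool1:
                # Rule 2b: a transition splits the interval evenly.
                time[tool0] += gap / 2
                time[tool1] += gap / 2
            else:
                # Rules 2a, 3 and 4: long same-tool gaps donate half of
                # the excess (over 9 s for icons, 8 s for anchors) to
                # the other tool.
                limit = 9.0 if tool0 == "map" else 8.0
                excess = max(0.0, gap - limit)
                other = "browser" if tool0 == "map" else "map"
                time[tool0] += gap - excess / 2
                time[other] += excess / 2
        return time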
The adjusted time spent on each tool and the adjusted browser time usage ratio were computed. Summary statistics comparing the original and adjusted time spent on tool are shown in Table 25.
Table 25: Summary statistics of adjusted time spent on tool

             Time spent on tool                Adjusted time spent on tool
             Browser   Graphical   BTUR        Browser   Graphical   Adj.
                       Overview                          overview    BTUR
Mean         57.495    52.707      .534        57.579    62.850      .527
Std. Dev.    71.521    72.402      .352        67.428    69.187      .346
Std. Error    2.433     2.463      .012         2.303     2.363      .012
There was a high correlation between the initial calculation of time spent on tool and the adjusted calculation. The correlation coefficient between the original and adjusted time on the browser was 0.913, p < 0.001; between the original and adjusted time on the graphical overview, 0.908, p < 0.001; and between the original and adjusted browser time usage ratio, 0.844, p < 0.001. A paired t-test indicated no significant difference between the original and adjusted time spent on the browser. The adjusted time spent on the graphical overview was significantly higher than the original, and there was no significant difference between the original BTUR and the adjusted BTUR.
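Comparisons of this kind can be sketched with scipy's correlation and paired t-test; subject-aligned input sequences are an assumption.

    from scipy.stats import pearsonr, ttest_rel

    def compare_measures(original, adjusted):
        # original, adjusted: aligned sequences of time spent (sec),
        # one value per task, in the same order for both measures.
        r, p_r = pearsonr(original, adjusted)
        t, p_t = ttest_rel(original, adjusted)
        return {"r": r, "p_r": p_r, "t": t, "p_t": p_t}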
4.2.2 Task completion
Each subject performed 24 information-finding tasks and completed, on average, 21.42 of them within the six-minute time limit. Overall, 240 of the 2,592 tasks (9.26%) exceeded the time limit. A summary of the number of tasks completed, grouped by tool, Web site complexity, and question type, is shown in Table 26. Subjects completed all tasks within the time limit in the low complexity Web sites with high information-scent questions. Most of the tasks that exceeded the time limit were in the high complexity Web sites with low information-scent questions, indicating that these tasks were difficult and required more time than the limit allowed.
Table 26: Summary statistics of the number of tasks completed

Tool        Web complexity  Question type  Tasks completed                   Not completed
                                           N      Percent(1)  Avg.(2)  S.D.
Browser     High            High            212    98.15%     1.96     0.190    4
            High            Low             150    69.44%     1.39     0.734   66
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             211    97.69%     1.95     0.252    5
Graphical   High            High            206    95.37%     1.91     0.291   10
overview    High            Low             158    73.15%     1.46     0.647   58
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             210    97.22%     1.94     0.230    6
Integrated  High            High            213    98.61%     1.97     0.165    3
            High            Low             134    62.04%     1.24     0.722   82
            Low             High            216   100.00%     2.00     0.000    0
            Low             Low             210    97.22%     1.94     0.268    6
Total                                     2,352                1.81     0.470  240
(1) Percent of tasks completed for each condition (out of 216 = 108 subjects x 2 replicates).
(2) Average number of tasks completed per subject (2 tasks per condition).
The number of tasks completed was analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor. The sphericity assumption was not met (Appendix I.1, Table 51), so the lower-bound correction was applied (Table 27). The number of tasks completed depended on a three-way interaction between tool, Web complexity, and question type (F = 4.428, p = 0.038) (Figure 30).
Table 27: ANOVA on number of tasks completed, lower-bound correction

Source                 Type III Sum of Squares  df       Mean Square  F        Sig.
TOOL                      .421                   1.000     .421        1.696   .196
WEB                     32.744                   1.000   32.744      162.282   .000
QUESTION                32.744                   1.000   32.744      178.850   .000
TOOL * WEB                .381                   1.000     .381        1.752   .189
TOOL * QUESTION          1.122                   1.000    1.122        4.063   .046
WEB * QUESTION          22.827                   1.000   22.827      115.360   .000
TOOL * WEB * QUESTION    1.113                   1.000    1.113        4.428   .038
Error                   26.887                 107.000     .251
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 30: Cell line chart of mean number of tasks completed grouped by tool, Web site complexity, and question type, showing the interactions
The number of tasks completed was divided into two groups, the high complexity and low complexity Web site conditions, and each was analyzed with an ANOVA with tool by question type as a within-subjects factor. In the high complexity condition, the tool by question type interaction was significant (F = 4.798, p = 0.031, Table 28), but in the low complexity condition it was not (F = 0.050, p = 0.952, Table 29).
Table 28: ANOVA on number of tasks completed in the high complexity Web site condition, lower-bound correction

Source            Type III Sum of Squares  df       Mean Square  F        Sig.
TOOL                  .799                   1.000     .799        1.980   .162
QUESTION            55.125                   1.000   55.125      157.816   .000
TOOL * QUESTION      2.231                   1.000    2.231        4.798   .031
Error               49.769                 107.000     .465
Table 29: ANOVA on number of tasks completed in the low complexity Web site condition, lower-bound correction

Source            Type III Sum of Squares  df       Mean Square  F       Sig.
TOOL                  .003                   1.000     .003        .050   .824
QUESTION              .446                   1.000     .446      14.088   .000
TOOL * QUESTION       .003                   2         .002        .050   .952
Error                6.664                 107.000     .062
In the high complexity Web site condition, pairwise comparisons between tools within each question type showed no significant differences in the number of tasks completed between tools for the high information-scent questions; for the low information-scent questions, however, the graphical overview had a significantly higher number of tasks completed than the integrated tool (Appendix I.1, Table 52).
In the low complexity Web site condition, question type had a main effect: the low information-scent questions showed a significantly lower number of tasks completed than the high information-scent questions (Bonferroni, Mean Diff. = 0.052, p < 0.001). There was no tool effect.
A task that exceeded the time limit shows that the subject could not find the answer, but the fact that a subject submitted an answer page does not indicate that the correct answer was found: in this experimental setting, subjects could give up searching and submit a page at any time.
4.2.3 Number of answers found
The answer was counted as found if a subject clicked "submit" while the target Web page was presented, within the time limit. The answer was found in 1,765 tasks (68.1% of the total) and not found in 827 tasks (31.9%). On average, a subject found answers in 16.3 of the 24 tasks. A summary of the number of answers found, grouped by tool, Web complexity, and question type, is shown in Table 30.
Table 30: Summary statistics of the number of answers found

Tool        Web site complexity  Question type  Answers found
                                                N     %(1)     Avg.(2)  StdDev.
Browser     High                 High           176   81.48%   1.630    0.5895
            High                 Low             44   20.37%   0.407    0.5806
            Low                  High           202   93.52%   1.870    0.3375
            Low                  Low            144   66.67%   1.333    0.8428
Graphical   High                 High           171   79.17%   1.583    0.6575
overview    High                 Low             60   27.78%   0.556    0.6604
            Low                  High           203   93.98%   1.880    0.3269
            Low                  Low            177   81.94%   1.639    0.6479
Integrated  High                 High           184   85.19%   1.704    0.5842
            High                 Low             56   25.93%   0.519    0.6187
            Low                  High           199   92.13%   1.843    0.3906
            Low                  Low            149   68.98%   1.380    0.8284
Total                                          1765
(1) Percent of answers found for each condition (out of 216 = 108 subjects x 2 questions).
(2) Average number of answers found per subject (2 tasks per condition).
The 240 tasks that exceeded the time limit were counted as "answer not found," accounting for 29.0% of the answers not found. The navigation log indicated that in 71 of the 827 tasks with answers not found (8.6%), the subjects visited the target page but submitted some other page or continued searching until the time limit was exceeded (Appendix I.3, Table 53). In the remaining 756 tasks, subjects exceeded the time limit or submitted a page without visiting the target page.
The number of answers not found in the high complexity Web sites with the low information-scent questions was 488, 75.3% of the tasks in that condition (Table 31), accounting for 59.0% of the total answers not found (827). These results would be easier to accept if all 488 had resulted from the time-out condition. The question that arises is whether one or more of the questions were poorly worded, resulting in the submission of incorrect target pages that could arguably be considered correct.
Table 31: Summary of answers found, answers not found, and timed out, grouped by Web site complexity and question type

Web site     Question  Answer found      Answer not found                                  Grand
complexity   type                        Not timed out    Timed out       Total            Total
                       N      %(row)     N      %(row)    N      %(row)   N      %(row)
High         High       531   81.9%       100   15.4%      17     2.6%     117   18.1%      648
High         Low        160   24.7%       282   43.5%     206    31.8%     488   75.3%      648
High Total              691   53.3%       382   29.5%     223    17.2%     605   46.7%     1296
Low          High       604   93.2%        44    6.8%                       44    6.8%      648
Low          Low        470   72.5%       161   24.8%      17     2.6%     178   27.5%      648
Low Total              1074   82.9%       205   15.8%      17     1.3%     222   17.1%     1296
Grand Total            1765   68.1%       587   22.6%     240     9.3%     827   31.9%     2592
Further analysis of the distribution of the number of answers found, answers not found, and answers not found due to time-out was conducted (Figure 31 and Appendix I.3, Table 54). The distribution supports the conclusion that there was no "bad question" skewing the results: for every question, some subjects found the target answer, some submitted a wrong page, and some timed out. The second possibility is that a secondary legitimate target page was being selected. To test this, the frequency with which incorrect target pages were submitted was summarized (Figure 32 and Appendix I.3, Table 55). The number of unique non-target pages submitted showed an exponential distribution, indicating that for each question many subjects submitted the same non-target pages while many other non-target pages were submitted by only one subject. The pages submitted by the largest numbers of subjects were investigated; they were, in fact, not secondary target pages but pages that partially matched a keyword in the question. For instance, one question asked the subject to find the abstract of "A latent variable model for multivariate discretization." The target page contained the exact title, and 14 subjects submitted it; however, another 14 subjects submitted the page containing "Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data."
Figure 31: The percent of answers found, answers not found, and tasks incomplete for each
question.
Figure 32: Histogram of pages submitted for each question, counting only tasks that did not time out and in which the target node was not found
The number of answers found was analyzed using an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor (Table 32). The sphericity assumption was met (Appendix I.3, Table 56). No three-way interaction was found. The Web site complexity by question type interaction was significant (F = 156.794, p < 0.001) and the tool by question type interaction was significant (F = 6.212, p = 0.002) (Figure 33). The tool by Web site complexity interaction was not significant (F = 2.897, p = 0.057). The Web site complexity by question type interaction is discussed later in this section.
Table 32: ANOVA on the number of answers found

Source                 Type III Sum of Squares  df   Mean Square  F        Sig.
TOOL                      2.344                   2     1.172       3.053  .049
WEB                     113.186                   1   113.186     373.823  .000
QUESTION                196.779                   1   196.779     503.658  .000
TOOL * WEB                1.955                   2      .978       2.897  .057
TOOL * QUESTION           3.576                   2     1.788       6.212  .002
WEB * QUESTION           43.340                   1    43.340     156.794  .000
TOOL * WEB * QUESTION      .144                   2      .072        .245  .783
Error                    62.690                 214      .293
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 33: Cell line charts of mean number of answers found showing tool by Web site
complexity interaction and tool by question type interaction
For the low information-scent questions, the number of answers found using the graphical overview was significantly higher than with the browser (Mean Diff. = 0.227, p < 0.001, Appendix I.3, Table 57), but not significantly higher than with the integrated tool (Mean Diff. = 0.148, p = 0.053). There were no significant differences in the number of answers found between tools for the high information-scent questions.
In the low complexity Web sites, the number of answers found using the graphical overview was significantly higher than with the browser (Mean Diff. = 0.157, p = 0.012, Appendix I.3, Table 58) and with the integrated tool (Mean Diff. = 0.148, p = 0.035).
In other words, the graphical overview was more effective for finding answers as the tasks became more difficult, whether in the high complexity Web sites or with the low information-scent questions, compared to the browser and the integrated tool.
4.2.4 Task performance
Task performance was measured by the time spent on the task and the number of pages viewed.
Time spent on task
Time spent on task was measured from when the subject clicked the start button to when the subject clicked submit or the 6-minute (360-second) limit was reached. A summary of the time spent on task, grouped by tool, Web site complexity, and question type, is shown in Table 33. The time-spent distributions differed across groups. The high complexity, low information-scent group had a negative skew because it contained many tasks that timed out -- a ceiling effect of the time limit -- while the low complexity, high information-scent group had a high positive skew. A histogram of time spent is shown in Figure 34; the 240 tasks that exceeded the time limit appear in the 360-second bar.
Table 33: Summary statistics of time spent on tasks (sec.)

Tool        Site  Question  Mean      Std. Dev.  Median    Skewness
Browser     High  High       77.275    83.878     40.333    1.962
            High  Low       227.832   120.484    228.960   -0.205
            Low   High       22.563    27.913     13.559    4.226
            Low   Low       117.145    88.176     98.167    1.092
Graphical   High  High       92.708    91.468     58.501    1.634
overview    High  Low       228.890   113.012    234.140   -0.177
            Low   High       31.658    33.180     20.425    2.887
            Low   Low       118.416    86.805     93.345    1.251
Integrated  High  High       72.802    77.544     42.710    2.127
            High  Low       254.932   115.492    300.135   -0.644
            Low   High       27.754    33.570     17.445    3.860
            Low   Low       121.292    88.993     98.535    1.183
Total                       116.105   115.491     69.585    1.064
Figure 34: Histogram of time spent on task
Because the time-spent data were not normally distributed, a natural log transformation was applied to the time spent on task. Extreme data points were detected and removed (Appendix I.4); the tasks that exceeded the time limit were included in this analysis. The transformed time spent on task was analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor, with missing data replaced by the group mean. The sphericity assumption was met (Appendix I.5, Table 60). The Web complexity by question type interaction was significant (F = 7.435, p = 0.007), as was the tool by question type interaction (F = 3.591, p = 0.029) (Figure 35).
Table 34: ANOVA on ln(time spent on task)

Source                 Type III Sum of Squares  df   Mean Square  F         Sig.
TOOL                      12.471                  2      6.236      12.911  .000
WEB                      527.273                  1    527.273    1192.078  .000
QUESTION                1324.265                  1   1324.265    1898.668  .000
TOOL * WEB                 1.464                  2       .732       1.806  .167
TOOL * QUESTION            4.807                  2      2.403       3.591  .029
WEB * QUESTION             3.734                  1      3.734       7.435  .007
TOOL * WEB * QUESTION      1.265                  2       .632        .876  .418
Error                    154.480                214       .722
Hs = High information-scent question type, Ls = Low information-scent question type
B = Browser, G = Graphical overview, I = Integrated tool
Figure 35: Cell line chart of mean ln(time spent on task) grouped by tool and question type, showing the tool by question type interaction
Pairwise comparisons between tools for the high and low information-scent questions showed that subjects spent significantly more time on the task using the graphical overview with the high information-scent questions than when using the browser or the integrated tool for the same question type (Mean Diff. = 0.241, p < 0.001 and Mean Diff. = 0.141, p = 0.021, respectively, in ln(time) units; Appendix I.5, Table 61). For the low information-scent questions, time on task was significantly lower when using the browser than the integrated tool (Mean Diff. = 0.146, p = 0.015); there was no significant difference between the graphical overview and either the browser or the integrated tool. The question type by tool interaction was ordinal, and the question type effect was stronger than the tool effect: the low information-scent questions took more time to complete than the high information-scent questions.
The absence of a tool by Web site complexity interaction in time spent on task was interesting: there was no significant difference in time spent on task between the three tools in either the high or low complexity Web sites. The main effects of Web site complexity and question type, and their interaction, are discussed later in this chapter.
Number of pages viewed
The number of pages viewed may be measured in terms of page views, pages, revisited page views, and extra page views. The number of page views includes repeated pages; the number of pages does not count repeated viewings; the number of revisited page views counts only views beyond the first; and the number of extra page views is the number of page views minus the minimum number of pages required to complete the task.
Summaries of the number of page views, pages, revisited page views, and extra page views are shown in Table 35 and Table 36. The distribution of the number of page views was highly skewed for the high information-scent questions; the distributions of all four measures were exponential (Figure 36).
There were 22 tasks, by 17 subjects, in which subjects viewed all the pages in the Web site. This happened only in the one Web site that had 16 pages, in the low complexity condition. One such task was in the high information-scent condition using the integrated tool; the other 21 were in the low information-scent condition, with 12 tasks using the graphical overview (5.6% of the tasks in that condition) and 9 tasks using the integrated tool (4.2%). This ceiling effect did not produce a significant difference in the mean number of pages.
There were 35 tasks in which the number of extra page views was less than zero. These occurred when subjects submitted the first page when using the graphical overview or the integrated tool, or when subjects visited fewer pages than the number on the shortest path from the first page to the target node. In all of these cases the answer was not found, and these occurrences were treated as extreme cases (Appendix I.4).
Table 35: Descriptive statistics of the number of page views and the number of pages

                                           Number of page views                Number of pages
Tool        Web complexity  Question type  Mean    Std. Dev.  Median  Skew.    Mean    Std. Dev.  Median  Skew.
Browser     High            High            9.824   9.362      6      2.525     7.81    5.469      5      2.089
            High            Low            29.56   20.2       24      0.502    18.09   10.52      16      0.481
            Low             High            3.62    2.516      3      3.936     3.134   1          3      2.273
            Low             Low            15.94   10.66      15      1.16      8.125   4.121      8      0.544
Graphical   High            High            7.167   9.062      3      2.736     6.111   7.579      3      3.196
overview    High            Low            17.32   13.79      13.5    1.194    14.28   11.46      11      1.439
            Low             High            3.245   2.172      2      3.015     2.903   1.737      2      3.516
            Low             Low            12.61    7.133     12      1.252    10.67    4.478     12     -0.033
Integrated  High            High            6.759   7.199      5      4.717     5.727   4.585      5      3.255
            High            Low            21.84   15.46      19      1.229    15.83   10.29      14      1.347
            Low             High            3.236   2.813      3      5.376     2.88    1.859      2      5.503
            Low             Low            13.29    8.393     13      0.95      9.458   4.503     10     -0.167
Table 36: Descriptive statistics of the number of revisited page views and the number of extra page views

                                           Number of revisited page views     Number of extra page views
Tool        Web complexity  Question type  Mean    Std. Dev.  Median  Skew.    Mean    Std. Dev.  Median  Skew.
Browser     High            High            2.014   4.279      0      3.217     5.157   9.33       1      2.523
            High            Low            11.47   10.66       8      0.751    24.73   19.96      19      0.489
            Low             High            0.486   1.739      0      5.084     0.787   2.442      0      4.151
            Low             Low             7.81    7.377      7      1.969    12.94   10.66      12      1.16
Graphical   High            High            1.056   2.653      0      4.768     5.167   9.062      1      2.736
overview    High            Low             3.037   5.052      1      3.088    15.32   13.79      11.5    1.194
            Low             High            0.343   0.859      0      4.444     1.245   2.172      0      3.015
            Low             Low             1.935   3.917      0      2.766    10.61    7.133     10      1.252
Integrated  High            High            1.032   3.438      0      7.617     4.759   7.199      3      4.717
            High            Low             6.009   6.686      4      1.573    19.84   15.46      17      1.229
            Low             High            0.356   1.268      0      5.678     1.236   2.813      1      5.376
            Low             Low             3.829   5.38       1      2.115    11.29    8.393     11      0.95
Figure 36: Histograms of the number of page views, the number of pages, the number of
revisited page views, and the number of extra page views by tasks
There were 638 tasks (24.6% of the total) in which the number of extra page views was zero (Table 37). In these tasks, the answer was found in 576 cases and not found in 62. When the number of extra page views was zero and the answer was found, the subject navigated through the Web site by the shortest possible path to the target node. This situation accounted for 83% of the tasks using the browser in the low complexity Web sites with the high information-scent questions.
Table 37: Number of tasks where the extra page views were zero

Tool        Web complexity  Question type  Answer found  Not found  Total  %(1)
Browser     High            High            86             7          93    43.1%
            High            Low              1             3           4     1.9%
            Low             High           174             6         180    83.3%
            Low             Low              9             4          13     6.0%
Graphical   High            High            64            10          74    34.3%
overview    High            Low              1             6           7     3.2%
            Low             High           113             3         116    53.7%
            Low             Low              7             4          11     5.1%
Integrated  High            High            15             2          17     7.9%
            High            Low              1             2           3     1.4%
            Low             High           100             7         107    49.5%
            Low             Low              5             8          13     6.0%
Total                                      576            62         638    24.6%*
(1) Percent of the 216 tasks in that condition.
* Percent of the total 2,592 tasks.
Because the distributions of the number of pages viewed were not normal, the natural log transformation was applied to the number of page views, pages, revisited page views, and extra page views. Extreme data points were detected and removed (Appendix I.4). The transformed variables were analyzed in an ANOVA with treatment (tool x Web complexity x question type) as a within-subjects factor, with missing data replaced by the group mean. The sphericity assumption was met (Appendix I.5, Table 62), though for the number of pages Mauchly's test of sphericity was marginally non-significant; since the ANOVA with the lower-bound correction was consistent with the sphericity assumption, only the ANOVA under the sphericity assumption is reported (Table 38). Post-hoc analysis was conducted using the Bonferroni adjustment for multiple comparisons (Appendix I.6, Table 63 and Table 64).
The main effects of Web site complexity and question type, and the Web site complexity by question type interaction, are discussed later in section 4.2.5.
Table 38: ANOVA on ln(number of page views), ln(number of pages), ln(number of revisited page views), and ln(number of extra page views)

Source                 Measure  Type III Sum of Squares  df   Mean Square  F         Sig.
TOOL                   TOTAL       48.918                  2    24.459       60.441  .000
                       DIFF        12.929                  2     6.464       20.849  .000
                       REVISIT    157.645                  2    78.823      117.521  .000
                       EXTRA       10.749                  2     5.374        8.092  .000
WEB                    TOTAL      168.719                  1   168.719      464.568  .000
                       DIFF       168.165                  1   168.165      609.668  .000
                       REVISIT     54.030                  1    54.030       92.348  .000
                       EXTRA      218.299                  1   218.299      395.896  .000
QUESTION               TOTAL      958.208                  1   958.208     1420.319  .000
                       DIFF       659.229                  1   659.229     1303.342  .000
                       REVISIT    661.235                  1   661.235      718.385  .000
                       EXTRA     1773.994                  1  1773.994     1850.155  .000
TOOL * WEB             TOTAL       19.243                  2     9.622       24.266  .000
                       DIFF        28.342                  2    14.171       46.296  .000
                       REVISIT       .520                  2      .260         .490  .613
                       EXTRA       18.028                  2     9.014       15.230  .000
TOOL * QUESTION        TOTAL         .324                  2      .162         .343  .710
                       DIFF        12.880                  2     6.440       18.712  .000
                       REVISIT    111.144                  2    55.572       72.403  .000
                       EXTRA       23.773                  2    11.886       14.369  .000
WEB * QUESTION         TOTAL        6.511                  1     6.511       16.256  .000
                       DIFF         3.105                  1     3.105       12.038  .001
                       REVISIT      1.999                  1     1.999        2.766  .099
                       EXTRA       12.876                  1    12.876       19.195  .000
TOOL * WEB * QUESTION  TOTAL         .120                  2      .060         .142  .868
                       DIFF         1.486                  2      .743        2.358  .097
                       REVISIT      6.187                  2     3.093        5.555  .004
                       EXTRA        1.131                  2      .565         .733  .482
Error                  TOTAL    22789.012                214   106.491
                       DIFF      9488.493                214    44.339
                       REVISIT   4786.623                214    22.367
                       EXTRA     1233.883                214     5.766

TOTAL = ln(number of page views), DIFF = ln(number of pages)
REVISIT = ln(number of revisited page views), EXTRA = ln(number of extra page views)
For the number of page views, there were two significant two-way interactions: tool by Web site complexity (F = 24.266, p < 0.001) (Figure 37) and Web site complexity by question type (F = 16.256, p < 0.001). The tool by question type interaction was not significant (F = 0.343, p = 0.710). Pairwise comparisons between tools in each Web site condition (Appendix I.6, Table 63) showed no significant difference between the graphical overview and the integrated tool in the low complexity Web sites (Mean Diff. = 0.15, p = 1.000, in ln units), but a significant difference between them in the high complexity Web sites (Mean Diff. = 0.251, p = 0.046, in ln units). The increase in the number of page views as the Web site became more complex was smaller when using the graphical overview than when using the browser or the integrated tool.
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 37: Cell line chart of mean ln(pages views) shows tool by Web site complexity interaction
For the number of pages, there were three two-way interactions: tool by Web site complexity (F = 46.296, p < 0.001), tool by question type (F = 18.712, p < 0.001) (Figure 38), and Web site complexity by question type (F = 12.038, p = 0.001). Pairwise comparisons between tools in each Web site condition (Appendix I.6, Table 63) showed that the graphical overview produced a significantly higher number of pages than the browser and the integrated tool in the low complexity Web sites (Mean Diff. = 0.084, p = 0.030, and Mean Diff. = 0.077, p = 0.019, respectively, in ln(number of pages) units), and a significantly lower number of pages than both in the high complexity Web sites (Mean Diff. = -0.427, p < 0.001 and Mean Diff. = -0.218, p < 0.001, respectively). The browser and the integrated tool were not significantly different in the low complexity Web sites, but in the high complexity Web sites the browser produced a significantly higher number of pages than the integrated tool (Mean Diff. = 0.218, p < 0.001, in ln(number of pages) units).
The tool by question type interaction was indicated by the fact that there was no significant difference in the number of pages between the three tools with the low information-scent questions, while with the high information-scent questions all three were significantly different from each other (Appendix I.6, Table 64): the number of pages was highest with the browser, intermediate with the integrated tool, and lowest with the graphical overview.
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 38: Cell line charts for ln(number of pages) shows tool by Web site complexity
interaction and tool by question type interaction
For the number of revisited page views there was a three-way interaction: tool by Web site complexity by question type (F = 5.555, p = 0.004) (Figure 39). Separate ANOVAs for the high and low information-scent questions were applied. The tool by Web site complexity interaction was significant for the high information-scent questions (F = 4.053, p = 0.019, Table 39) but not for the low information-scent questions (F = 2.602, p = 0.110, Table 40).
Hs = High information-scent question type, Ls = Low information-scent question type
Hc = High complexity Web site, Lc = Low complexity Web site
B = Browser, G = Graphical overview, I = Integrated tool
Figure 39: Cell line chart of mean ln(number of revisited page views) showing the tool by Web complexity interaction
Table 39: ANOVA on ln(number of revisited page views), high information-scent questions only

Source       Type III Sum of Squares  df   Mean Square  F       Sig.
TOOL             3.501                  2    1.750        4.576  .011
WEB             17.622                  1   17.622       58.574  .000
TOOL * WEB       2.928                  2    1.464        4.053  .019
Error           32.191                107     .301
Table 40: ANOVA on ln(number of revisited page views), low information-scent questions only

Source       Type III Sum of Squares  df   Mean Square  F        Sig.
TOOL           265.289                  2   132.644      125.635  .000
WEB             38.407                  1    38.407       38.146  .000
TOOL * WEB       3.778                  2     1.889        2.602  .110
Error          155.407                214      .726
For the high information-scent questions, there was no significant difference among tools in
the low complexity Web site. In the high complexity Web site, using the browser resulted in a
significantly higher number of revisited page views than the graphical overview and the integrated
tool, which were not significantly different from each other (Mean diff. = 0.194, p = 0.046 and
Mean diff. = 0.222, p = 0.010, in ln(number of pages) unit, Appendix I.6, Table 65).
For the low information-scent questions, there was no significant interaction between tool
and Web site complexity (Appendix I.6, Table 66). All the tools were significantly different from each
other in the number of revisited page views, indicating a tool main effect. The number of revisited
page views was highest when using the browser, lower when using the integrated tool, and lowest
when using the graphical overview.
When using the browser, 58.7% of revisited page views were generated with the back and
forward buttons. When using the graphical overview, only a small number of revisited pages was
produced with the back and forward buttons (4.7% of revisited page views). When using the
integrated tool, 53.4% of revisited page views came from the back and forward buttons. The browser
part and the graphical overview part of the integrated tool generated similar numbers of revisited page
views (23.0% and 23.3% of revisited page views, respectively).
For the number of extra page views, there were three two-way interactions: tool by Web site
complexity (F = 15.23, p < 0.001), tool by question type (F = 14.369, p < 0.001),
and Web site complexity by question type (F = 19.195, p < 0.001) (Figure 40). The pairwise comparisons
between tools in each Web site condition (Appendix I.6, Table 63) showed no significant difference
in the number of extra page views between tools in the low complexity Web site.
Using the graphical overview produced a significantly lower number of extra page views in the high
complexity Web site than using the browser or the integrated tool (Mean diff. = 0.234, p =
0.001, and Mean diff. = 0.336, p < 0.001).
Hs = High information-scent question type, Ls = Low information-scent question type; Hc = High complexity Web site, Lc = Low complexity Web site; B = Browser, G = Graphical overview, I = Integrated tool
Figure 40: Cell line charts of mean ln(number of extra page views) show the tool by Web site
complexity interaction and the tool by question type interaction
The pairwise comparisons between tools in each question type condition (Appendix I.6,
Table 64) show that using the graphical overview with the low information-scent questions
produced a significantly lower number of extra page views than using the browser or the integrated
tool (Mean diff. = 0.250, p < 0.001 and Mean diff. = 0.148, p = 0.026, respectively, in ln(number of
pages) unit). There was no significant difference between using the browser and the integrated tool.
With the high information-scent questions, the integrated tool generated a significantly higher
number of extra page views than the browser or the graphical overview (Mean diff. = 0.304, p
< 0.001 and Mean diff. = 0.164, p = 0.008, respectively, in ln(number of pages) unit). There was no
significant difference between the browser and the graphical overview.
There was a significant interaction between tool and question type in the number of pages and
the number of revisited page views but the interaction was not significant in the number of page
views. Note that the number of page views is equal to the number of pages plus the number of
revisited page views.
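To make these dependent measures concrete, the sketch below derives them from a single task's navigation log. It is a minimal illustration under stated assumptions, not the instrumented software used in the experiment: the log is an invented ordered list of pages viewed, and extra page views are taken to be page views beyond the length of the shortest path to the target, which matches how the text interprets them.

    # Sketch: deriving the four page-view measures from one task's navigation log.
    # Assumptions (not from the dissertation's instrumentation): `visited` is the
    # ordered list of pages viewed; extra page views are page views beyond the
    # shortest path to the target.

    def page_view_measures(visited, shortest_path_len):
        page_views = len(visited)               # total page views, revisits included
        pages = len(set(visited))               # distinct pages viewed
        revisited = page_views - pages          # views of already-seen pages
        extra = page_views - shortest_path_len  # views beyond the minimum path
        return page_views, pages, revisited, extra

    # Example: a subject wanders off a 3-page shortest path and backtracks once.
    log = ["home", "courses", "faculty", "courses", "target"]
    print(page_view_measures(log, shortest_path_len=3))  # (5, 4, 1, 2)

Note that the identity page views = pages + revisited page views holds by construction (5 = 4 + 1 in the example).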
4.2.5 Web complexity, Question type and their interaction
The Web site complexity by question type interaction was significant in the number of
tasks completed, the number of answers found, the time spent on task, the number of page views,
the number of pages, and the number of extra page views. It was not significant only in the number of
revisited page views, which showed a three-way interaction between tool, Web site complexity, and
question type. All interactions between Web site complexity and question type were ordinal
interactions. More complex Web sites combined with low information-scent questions affected the
measures as follows (detailed in Appendix I.8, Table 68):
o They lower the number of tasks completed within the time limit.
o They lower the number of answers found.
o They lengthen the time spent on the tasks.
o They raise the number of page views.
o They raise the number of pages.
o They raise the number of extra page views.
The interactions showed that the magnitude of change in these measures depended on
the level of Web site complexity and the questions' information-scent score.
4.3 Summary of task performance in each condition
The pairwise comparisons between tools for each Web site complexity and question type are
summarized in Table 41 (detailed in Appendix I.7, Table 67).
Table 41: Summary of tool differences by Web site complexity and question type condition.

Web site complexity   Measure                High information scent   Low information scent
High                  Tasks completed        B=G=I                    G>I, G=B, B=I
                      Answers found          B=G=I                    G>B, G=I, B=I
                      Time spent on task     B=G=I                    B<I, I=G, G=B
                      Page views             G<I<B                    G<I<B
                      Pages                  G<I<B                    G<(B=I)
                      Revisited page views   (G=I)<B                  G<I<B
                      Extra page views       (B=G)<I                  G<I<B
Low                   Tasks completed        B=G=I                    B=G=I
                      Answers found          B=G=I                    G>(B=I)
                      Time spent on task     B<(G=I)                  B=G=I
                      Page views             (G=I)<B                  B=G=I
                      Pages                  (G=I)<B                  B<I<G
                      Revisited page views   B=G=I                    G<I<B
                      Extra page views       B<(G=I)                  B=G=I

B = Browser, G = Graphical overview, I = Integrated tool.
> significantly greater at the .05 level; < significantly less at the .05 level; = no significant difference at the .05 level.
4.3.1 Low complexity Web sites with high information-scent questions
In the low complexity Web site with the high information-scent questions, the tasks were the
easiest for the subjects. Subjects finished all the tasks within the time limit and found the target pages
in 93% of the tasks. 90% of the tasks were done within 1 minute, with an average of
31 sec. and a median of 16 sec. Time spent on task was significantly less with the browser (mean =
22.7 sec., median = 13 sec.) than with the graphical overview (mean = 31.7 sec., median = 20 sec.) or
the integrated tool (mean = 27.7 sec., median = 17.4 sec.).
With the browser, the number of page views and the number of pages were significantly higher
than with the graphical overview or the integrated tool. This follows from the fact that, when using
the browser, subjects had to follow the links page by page. Subjects were able to use the browser to
follow the shortest path, as indicated by the average number of revisited page views and the average
number of extra page views. Subjects did not obtain the "single click away" advantage of the
graphical overview: the average number of extra page views was 1.2 pages, which was worse than with
the browser. The performance of the integrated tool in the low complexity Web site with the high
information-scent questions was the average of using the browser part of the integrated tool alone and
using the graphical overview part of the integrated tool alone.
4.3.2 High complexity Web sites with high information-scent questions
In the high complexity Web site with high information-scent questions, 97% of the tasks
were done within the time limit and the answers were found in 82% of the tasks. There was no
significant difference between tools in the number of tasks completed or the number of answers
found, nor in the time spent on task (mean = 84 sec. and median = 47 sec.).
The number of page views when using the browser was significantly higher than when using
the graphical overview or the integrated tool. The number of pages when using the browser was
significantly higher than when using the integrated tool, and when using the integrated tool it was
significantly higher than when using the graphical overview. When using the browser,
the mean number of revisited page views was two pages and the median was zero pages; the
mean number of extra page views was five pages and the median was one page. This indicates that
subjects got off the shortest path on more tasks than in the low complexity Web site
with high information-scent questions. In the high complexity Web site with the high
information-scent questions, the average path length to the target node was longer than in the low
complexity Web site with the high information-scent questions: the minimum number of pages required
to reach the target page was 4.8 and 2.8 pages, respectively. When using the graphical overview, subjects
viewed more pages before locating the target node, as indicated by the number of extra page views
(mean = 5.1 pages and median = 1 page).
When using the integrated tool, the number of revisited page views was similar to that of
the graphical overview. The number of extra page views when using the integrated tool was
significantly higher than with the browser and the graphical overview because the way extra
page views were computed was biased toward using the graphical overview part.
4.3.3 Low complexity Web sites with low information-scent questions
In the low complexity Web site with low information-scent questions, 97% of the tasks were
done within the time limit and the answers were found in 73% of the tasks. There was no significant
difference in the number of tasks completed between tools. Using the graphical overview, the number
of answers found (83% of the tasks in this condition) was significantly higher than with the browser
(66% of the tasks) and the integrated tool (69% of the tasks). There was no significant difference
between tools in time spent on tasks (overall mean = 116 sec. and median = 96 sec.).
There was no significant difference between tools in the number of page views (overall mean
= 13 pages). The number of pages was significantly different between tools: the highest was with
the graphical overview (mean = 10.6 pages), the second with the integrated tool (mean = 9.4 pages),
and the lowest with the browser (mean = 8.1 pages). The number of revisited page views was the
reverse of the number of pages. There was a significant difference in the number of revisited page
views between tools: the highest was with the browser (mean = 7.8 pages), the second with the
integrated tool (mean = 3.8 pages), and the lowest with the graphical overview (mean = 1.9 pages).
The number of extra page views was high (overall mean = 11 pages, median = 11 pages), indicating
that most subjects did not navigate along the shortest path. The three Web sites in the low complexity
condition had 16, 27, and 29 HTML pages. On average, in the low complexity Web site with the low
information-scent question type, 41% of the total pages in the Web sites were viewed.
4.3.4 High complexity Web sites with low information-scent questions
In the high complexity Web site with low information-scent questions, the tasks were
difficult. Only 68% of the tasks in this condition were done within the time limit and the answers
were found in only 24% of the total tasks. The graphical overview was significantly better than the
browser and the integrated tool in the number of tasks completed and the number of answers found.
Using the graphical overview, 73% of the tasks in this condition were done within the time limit and
the target pages were found for 27% of the tasks. These figures were higher than with the
browser (tasks completed = 69% of the tasks, answers found = 20% of the tasks) and with the
integrated tool (tasks completed = 62% of the tasks, answers found = 26% of the tasks). Using the
integrated tool, the time spent on task (mean = 257 sec., median = 304 sec.) was significantly higher than
when using the graphical overview (mean = 230 sec., median = 239 sec.) or the browser (mean = 225
sec., median = 224 sec.). There was no significant difference in the time spent on task between the
browser and the graphical overview.
There were significant differences between tools in the number of page views. The number of
page views was highest when using the browser (mean = 29.6 pages, median = 24 pages), second
when using the integrated tool (mean = 21.8 pages, median = 19 pages), and lowest when using the
graphical overview (mean = 17.3 pages, median = 13.5 pages). There was no significant difference
in the number of pages between the browser (mean = 17.8 pages, median = 16 pages) and the
integrated tool (mean = 15.7 pages, median = 14 pages). When using the graphical overview, the
number of pages (mean = 14.4 pages, median = 11 pages) was significantly lower than with the others.
The number of revisited page views differed significantly between tools: the browser produced the
highest number (mean = 10.9 pages, median = 8 pages), the integrated tool was second (mean =
6 pages, median = 4 pages), and the graphical overview was lowest (mean = 3.1 pages, median = 2
pages). The number of extra page views indicates that the shortest path to the target node was
difficult to follow.
4.4 User satisfaction
User satisfaction was determined using a subjective questionnaire, the Post-Study
System Usability Questionnaire (PSSUQ) (Lewis, 1995). The overall satisfaction score (OVERALL)
was computed as the arithmetic mean of 19 questions. The sub-category scores, system usefulness
(SYSUSE), information quality (INFOQUAL), and interface quality (INTERQUAL), were
computed as the arithmetic means of question items 1-8, 9-15, and 16-18, respectively. The PSSUQ uses
a 7-point scale in which, given the anchors used, a higher score is better than a lower score.
The scores of N/A answers were disregarded.
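As a concrete illustration of this scoring scheme, the sketch below computes the OVERALL score and the three sub-scores from one subject's 19 item responses, ignoring N/A items. The item groupings (1-8, 9-15, 16-18) are taken from the text; the response values themselves are invented for the example.

    # Sketch of the PSSUQ scoring described above: arithmetic means over item
    # groups, with N/A (None) responses disregarded. Responses are invented.

    def mean_ignoring_na(items):
        answered = [x for x in items if x is not None]
        return sum(answered) / len(answered) if answered else None

    def pssuq_scores(responses):  # responses: list of 19 values in 1..7, or None
        return {
            "OVERALL":   mean_ignoring_na(responses),        # items 1-19
            "SYSUSE":    mean_ignoring_na(responses[0:8]),   # items 1-8
            "INFOQUAL":  mean_ignoring_na(responses[8:15]),  # items 9-15
            "INTERQUAL": mean_ignoring_na(responses[15:18]), # items 16-18
        }

    subject = [6, 5, 7, 6, None, 5, 6, 6, 4, 5, 5, None, 6, 5, 4, 6, 5, 6, 7]
    print(pssuq_scores(subject))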
Five subjects did not complete this questionnaire because of personal time
constraints. Incomplete answers and outliers (scores falling more than 1.5 times the interquartile
range below the lower quartile) were detected and removed (6 scores), leaving 97 scores for analysis.
The average user satisfaction scores are shown in Table 42. The PSSUQ score and sub-scores were
analyzed in an ANOVA with tool as a within-subjects factor. The sphericity assumption was not met
(Appendix I.9, Table 69), so an ANOVA with a lower-bound correction was used. There was a
significant difference in user satisfaction scores
between tools for all scores (Table 43). The pairwise comparisons showed significant differences in the
overall score and all sub-categories between the graphical overview and the browser, and between the
graphical overview and the integrated tool (detailed in Appendix I.9, Table 70). The graphical
overview received a lower overall score and lower scores in all sub-categories compared to the
browser and the integrated tool. There was also a significant difference in the information quality score
between the browser and the integrated tool: the integrated tool received a significantly
higher information quality score than the browser.
Table 42: Questionnaire descriptive statistics
Score       Tool                 Mean*   Std. Deviation
OVERALL     Browser              5.08     .984
            Graphical overview   4.23    1.284
            Integrated           5.19    1.084
SYSUSE      Browser              5.34    1.050
            Graphical overview   4.26    1.376
            Integrated           5.35    1.160
INFOQUAL    Browser              4.83    1.112
            Graphical overview   4.29    1.313
            Integrated           5.08    1.100
INTERQUAL   Browser              4.98    1.270
            Graphical overview   4.10    1.396
            Integrated           5.06    1.156
* Score values range from 1 to 7.
Table 43: ANOVA on PSSUQ scores with lower-bound correction
Source   Measure     Type III Sum of Squares   df       Mean Square   F        Sig.
TOOL     OVERALL      53.588                    1.000    53.588       43.841   .000
         SYSUSE       75.846                    1.000    75.846       46.193   .000
         INFOQUAL     31.985                    1.000    31.985       26.597   .000
         INTERQUAL    55.138                    1.000    55.138       28.137   .000
Error    OVERALL     117.344                   96.000     1.222
         SYSUSE      157.627                   96.000     1.642
         INFOQUAL    115.448                   96.000     1.203
         INTERQUAL   188.126                   96.000     1.960
There was some evidence that the browser's score may not be accurate. 22 subjects
wrote comments about features that were not provided by the browser. The problem was that many
subjects thought the browser in the first of the three PSSUQs was the integrated tool, and the software
used in collecting the questionnaire data did not allow subjects to go back to a previous
questionnaire. The scores for the graphical overview and integrated tool conditions were more accurate.
When the PSSUQ scores were analyzed without these 22 cases, the information quality score difference
between the browser and the integrated tool was not significant.
4.5 Support for Hypotheses
The hypotheses of this research are:
H0: There is no difference in user performance in information-finding tasks between integrated
navigational tools and individual navigational tools.
H1: There are significant differences in user performance in information-finding tasks when using
different navigational tools in certain kinds of environments.
The null hypothesis is rejected and H1 is accepted. The results from the experiment show that
user performance, in terms of the number of tasks completed, the number of answers found, the time
spent on task, and the numbers of pages viewed, differed when using different tools in certain
kinds of environments. The environments are classified by the complexity of the Web sites and the
semantic relatedness between the question and the information provided by the tools.
H1a: Integrated navigational tools, i.e. the browser and the graphical overview, will provide higher
performance in information-finding tasks and navigation within complex Web site spaces with high
information scent than will the browser or the graphical overview alone.
H1a is rejected. The performance of the integrated navigational tool was not higher than that of the
browser or the graphical overview in the high complexity Web sites with the high information-scent
question type. There was no significant difference in the time spent on task between tools; however,
when using the integrated tool the number of page views was significantly higher than when using the
graphical overview. It was expected that the integrated tool would have an advantage in navigation
within the complex Web space with high information scent because it presented both the map view
and the browser.
H2: Subjects will perform better when using the browser than when using the graphical overview in
simple structured Web sites with little information scent.
H2 is rejected. The number of answers found using the browser was less than using the
graphical overview in the low complexity Web site with the low information-scent question type.
The browser was expected to provide more information than the graphical overview in the low
information-scent situation, and the browser's performance was expected to be better
when the Web sites were simple. It appears that the information provided by the browser, e.g. the
sentences surrounding an anchor, did not contribute to performance. This may be because the low
information-scent questions had very little relation to the overall information on the Web pages. There
was no significant difference in time spent on task or total pages viewed between tools in this
condition, but with the graphical overview more pages were viewed than with the other tools. This
might explain why more answers were found when using the graphical overview.
H3: Subject performance when using integrated navigational tools will degrade with the simplicity
of the hypertext, as the tool becomes a noise contributor rather than an information provider.
H3 is accepted. The time spent on task using the integrated tool was significantly higher than
using the browser alone in the low complexity Web site with the low information-scent questions.
The integrated tool did help improve efficiency in this condition, as shown by the number of revisited
page views. Overall, however, the performance of the integrated tool in the low complexity Web site
was in between the browser and the graphical overview.
5 CONCLUSIONS AND FUTURE STUDY
This study set out to examine the performance of various tools with Web site complexity and
information scent controlled. It was motivated by the belief that these factors have a significant
impact on tool performance. This conclusion appears to be supported: Web site complexity and
information scent do have an impact on navigation for the purpose of information finding. There is
evidence supporting the conclusion that integrated tools add a level of cognitive overhead to the
task; this is seen in the longer time used in the high complexity Web site with the low
information-scent condition and in the transition time from anchor to icon. While the data do not
warrant any firm conclusions beyond those described in the results, the experimenter believes the
research also supports the conclusions that:
Information scent may be the single biggest factor in improving Web site browsing.
Experiments assessing "new" navigational tools will continue to be biased by user preference
for the tools with which they are already familiar.
5.1 Review of the research
This research sought to understand the use of integrated navigational tools to find
information in a Web site. An empirical experiment was conducted. Three navigational tools were
investigated: a browser, a graphical overview, and an integrated tool. The environments were varied
in terms of Web site complexity and level of information scent. Task performance was measured in
terms of the tasks completed within the time limit, the number of answers found, the time spent on
task, and the numbers of pages viewed. The numbers of pages viewed were calculated as the total
number of page views, the number of pages, the number of revisited page views, and the number of
extra page views.
In order to classify the Web sites into high and low complexity categories,
measurements of Web site structure were investigated. A sample of 83 Web sites was analyzed in
terms of their structure. Three structural measurements were used as indicators of Web site
complexity: the number of HTML nodes, the mean root distance, and the connected ratio. Six Web
sites were selected for the main experiment: three low complexity Web sites and three high
complexity Web sites.
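As an illustration of one of these structural indicators, the sketch below computes the mean root distance of a small site graph by breadth-first search from the root page. This is a minimal reading of the metric, assuming "mean root distance" is the average shortest-path distance from the root to every reachable HTML node; the example graph is invented.

    # Sketch: mean root distance of a site graph via BFS from the root page.
    # Assumption (not spelled out in the text): the metric is the average
    # shortest-path distance from the root to each reachable node.
    from collections import deque

    def mean_root_distance(links, root):
        dist = {root: 0}
        queue = deque([root])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, []):
                if nxt not in dist:            # first visit = shortest distance
                    dist[nxt] = dist[page] + 1
                    queue.append(nxt)
        reachable = [d for page, d in dist.items() if page != root]
        return sum(reachable) / len(reachable)

    site = {"index": ["fac", "grad"], "fac": ["fac-a", "fac-b"], "grad": ["grad-ma"]}
    print(mean_root_distance(site, "index"))  # (1+1+2+2+2)/5 = 1.6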
Questions were created from randomly selected pages of the selected Web sites. The
questions were classified in terms of their information scent by an information scent experiment. Two
sets of questions were selected based on their information-scent scores: the high information-scent
questions and the low information-scent questions.
A full factorial (3 tools x 2 Web site complexities x 2 question types), repeated-measures
within-subjects experiment was conducted. The 108 subjects were recruited from students at the
University of Pittsburgh.
Subjects used the integrated tool by alternating between the individual tools in different tasks or
by mixing the use of the tools within the same task. The performance of the integrated tool was not
superior to the individual tools alone. There appeared to be cognitive overhead associated with the
integrated tool: the results indicated that extra time was needed when switching from the browser to the
map overview, which showed as a significantly longer time spent on tasks when using the
integrated tool in the low information-scent question type condition. This finding is similar to the result
of Olsen and Nilsen (1987) that adding more features to a system lowers task performance.
The graphical overview provided an effective way to navigate in a Web site, as indicated by
the number of revisited page views, which was significantly lower when using the graphical overview
and the integrated tool in comparison to the browser.
The experiment showed that there were interactions between tool, Web site complexity, and
question type. As a consequence, tool performance differed across environments. For
instance, there was no difference in time spent on task between the browser and the integrated
tool in the high information-scent question condition, but there was a significant difference in the low
information-scent question condition.
Both Web site complexity and information scent had an effect on navigation performance.
The results also show an interaction between Web site complexity and information scent: low
information scent combined with a high complexity Web site degraded performance more than either
factor alone.
5.2 Summary of findings
The experimental results may be summarized as follows:
o Subjects used both navigational tools within the integrated tool.
o The number of tasks completed within the time limit was higher when using the graphical overview.
o The number of answers found was higher when using the graphical overview.
o Time spent on the tasks was less when using the browser.
o The number of page views was highest when using the browser, second when using the integrated tool, and lowest when using the graphical overview.
o Using the browser, more pages were viewed than with the integrated tool and the graphical overview, except when the Web site was of low complexity.
o More pages were revisited when using the browser.
o Subjects were more satisfied using the browser and the integrated tool than using the graphical overview.
o The performance of the integrated tool was in between the browser and the graphical overview, except for the time spent on task in the high complexity Web site with low information-scent questions and the extra page views in the high complexity Web site with high information-scent questions.
The following interactions were found:
Three-way interactions, tool by Web site complexity by question type, were found in the
number of tasks completed and the number of revisited page views.
Two-way interactions between tool and Web site complexity were found in the number
of page views, the number of pages, and the number of extra page views. The interaction
for the number of answers found was marginally non-significant.
Two-way interactions between tool and question type were found in the number of
answers found, time spent on tasks, the number of page views, the number of pages, and
the number of extra page views. The interaction was found in the number of tasks
completed in the high complexity Web site condition but was not
significant in the low complexity Web site condition.
Two-way interactions between Web site complexity and question type were found in the
number of answers found, time spent on task, the number of page views, the number of
pages, and the number of extra page views. The interaction was also found in the number
of revisited page views for the high information-scent questions but was not significant
for the low information-scent questions.
5.3 Comparison to prior research results
Information finding task performance differed across question types classified
by information-scent score and across Web site complexity. There was an interaction between tool and Web
site complexity and another between tool and question type. These findings are consistent
with Furnas's framework (Furnas, 1997) for determining the effectiveness of a view of a space. The
graphical overview and the browser present different views of the same space, and the difference
between the views affected performance. Information finding task performance was affected by
the semantic relatedness between the question and the information provided by the tool -- the "residual" (Furnas,
1997) or "information scent" (Pirolli, Card, & Wege, 2000).
The experiment conducted by Monk, Walsh, & Dix (1988), which indicated that a static
map aided hypertext navigation performance, predicted that the integrated tool should perform better,
in terms of lower response time, in the low complexity Web site condition. The results of this experiment
showed otherwise: the browser had a lower response time than the integrated tool. It might be
argued that subjects in the Monk et al. experiment had no experience using the browser, so that
browser performance there was low.
In comparison to the Hammond & Allinson (1989) experiment, the current research showed
that subjects used the map overview part more than the browser part of the integrated tool in the low
complexity Web site (comparable to the Hammond and Allinson experiment, which had 32
information screens). This difference in tool use contrasts with the Hammond and Allinson
experiment, which reported subjects using the map in the hypertext-with-map condition 39% of the time.
However, in their experiment the map had to be explicitly invoked, while in the current experiment the map
was presented side by side with the browser. The new-to-old ratio is the number of pages divided by
the number of page views (for example, 7 distinct pages seen in 10 page views gives a ratio of 0.7).
Their new-to-old ratios for the directed task (similar to our information finding tasks) using
hypertext (comparable to the browser) and hypertext with map (comparable to the integrated tool) were
0.27 and 0.47, respectively. In the current experiment, for the low complexity Web site with the low
information-scent question type, the new-to-old ratio was 0.51 when using the browser and 0.71 when
using the integrated tool. The conclusions were similar: the browser was
less efficient in the number of pages viewed compared to the integrated tool. However, our
experiment indicated the browser performed faster in the low complexity Web site with the high
information-scent question type and no differently than the integrated tool with the low information-scent
question type, whereas there was no significant difference in time performance between tools in their
experiment. One possible explanation may be that their questions were, on average, in the low
information-scent category.
The experiment by Wright & Lickorish (1990) indicated that the number of pages when using an
index page (comparable to the graphical overview) was lower than when using page navigation
(comparable to the browser). This is consistent with the current experiment, which found that subjects
using the browser in general viewed more pages than with the other tools. However, they reported
no significant difference in time performance between the two navigational tools for the direct finding
question type, whereas the current results did show a difference in time performance.
The results of the current experiment related to the integrated tool's time spent on task were
consistent with Heo (2000), in which the integrated tool took more time than the browser to complete
the information finding task. However, the differences in time spent on task were detected in specific
environments: the high complexity Web site with the low information-scent question type and the
low complexity Web site with the high information-scent question type. Heo's experiment did not
detect an interaction between the size of the Web site and the tools in time spent on task, but the
current experiment showed an interaction between Web site complexity and tools. It might be the
case that the size of the Web site was not the parameter that interacted with the tools. Heo's
experiment reported no significant difference in task accuracy between the tools (i.e. the browser and
three integrated tools), consistent with the current experiment, in which there was no significant
difference in the number of answers found between the integrated tool and the browser. However, in
the current experiment, the graphical overview performed better in terms of the number of answers
found.
5.4 Issues to reconsider
One major consideration in the experiment was the time limit and its impact on the results.
Initially, the task time limit was set at 10 minutes. A pilot study was conducted, and subjects in the
pilot study suggested that this was too long. The time limit was reduced to 6 minutes, based on the
fact that 90% of the answers found in the pilot study were found within that time.
However, the experimenter failed to detect that this had a substantial impact on tasks in the high
complexity Web site with low information-scent questions. As a consequence, the number of tasks
completed in this condition was low and produced a ceiling effect on the other performance
measurements.
The questions in the low information-scent group had very low scores. The questions in this
condition were not difficult questions; the information-scent score was low because the Web pages
and the graphical map did not provide the needed information in a way that made the answer to the
questions easy to find. When the information scent was too low, it was very difficult
to find the answer, particularly in the high complexity Web sites. As a result, in the high complexity
Web site with low information scent, too many tasks exceeded the time limit and many of the
answers were not found. In the WWW environment, there are many redundant sources of
information: when a user fails to find information in a Web site, changing Web sites may be easier
than continuing to explore the same site, but this strategy was not available to the subjects in this
study.
Information scent had a significant impact on navigational performance, and its measurement
might be improved. The browser information-scent score calculation was simplified by using the
arithmetic mean of the Web page scores on the shortest path to the target node. The score might be
more theoretically grounded by using a conditional probability instead of an arithmetic mean.
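The contrast can be made concrete with a small sketch. Assuming each page on the shortest path has a scent score in [0, 1] interpreted as the probability of choosing the correct link there (an assumption for illustration; the study's empirical scores can be negative), the mean-based score used here and a conditional-probability alternative diverge sharply on long paths:

    # Sketch: two ways to aggregate per-page scent scores along the shortest path.
    # Scores are assumed to lie in [0, 1] and, for the product version, to act as
    # independent probabilities of following the correct link; both assumptions
    # are illustrative, not this study's definition.
    from math import prod

    path_scores = [0.9, 0.8, 0.5, 0.8]  # invented per-page scores along the path

    mean_scent = sum(path_scores) / len(path_scores)  # method used in this study
    prob_scent = prod(path_scores)                    # conditional-probability view

    print(f"mean = {mean_scent:.2f}, product = {prob_scent:.2f}")
    # mean = 0.75, product = 0.29 -- one weak page hurts the product far more

The product view penalizes a single low-scent page on the path much more heavily, which may better reflect the chance of actually completing the traversal.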
5.5 Future research
The first question asked by many subjects was whether the tools had a search capability.
Searching, whether by a search engine or a search function within the tools, is important to the
information finding task, and integrating a search function might help overall performance. For
instance, in a simple Web structure with high information-scent questions, simple browsing might
perform well without a search function, and the search function might detract from the main finding
task. For low information-scent questions, a search function might be more beneficial.
The results showed that subjects used the integrated tool in an integrated way. However, the
appearance of the tools in the experiment was fixed on screen; they could not be moved or closed. The
results might be different if the tools did not appear side by side.
The integrated tool's performance was not better than that of the single tools, and this research
did not have enough information to explain why. The integrated tool did provide some improvement
over the browser alone, e.g. it reduced the number of revisited pages and allowed more answers to be
found.
The experiment showed that the question type, based on information-scent score, and the Web
site complexity had a strong impact on performance. These parameters could be used as
performance predictors in the Web site design process. If this is to be done, the process for obtaining
information scent may have to be refined to be practical. The difficult task will be to predict user and
customer information needs in general; information scent also depends on prior knowledge and
common knowledge about the subject of a Web site. Small-scale Web site usability testing with
focus-group subjects might be useful.
The Web site complexity metric, on the other hand, is more objective and easier to automate.
This property is useful in an iterative design methodology. The relation between Web site complexity
and navigational performance should be investigated in more detail.
Appendix A : Web visualization tools
Figure 41: Web browser with a distortion technique tool
Figure 42: Web browser with a zoom technique tool
Figure 43: Web browser with an expanding outline technique tool
Appendix B : URI in HTML tags
The following tag-attribute pairs from the HTML (version 4) DTD specification take a URI as the attribute value:
A href
APPLET codebase
AREA href
BASE href
BLOCKQUOTE cite
BODY background
DEL cite
FORM action
FRAME longdesc
FRAME src
HEAD profile
IFRAME longdesc
IFRAME src
IMG longdesc
IMG src
IMG usemap
INPUT src
INPUT usemap
INS cite
LINK href
OBJECT classid
OBJECT codebase
OBJECT data
OBJECT usemap
Q cite
SCRIPT for
SCRIPT src
The "ismap" attribute of the IMG tag indicates that the client software should capture the clicked pointer
location, i.e. the x and y coordinates, and derive the URI request from the parent A tag by appending '?'
followed by the x and y values passed to the server.
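A link extractor based on this tag-attribute list can be sketched briefly. The following is a minimal illustration using Python's standard html.parser, covering a subset of the pairs listed above (the full table would extend the dictionary); it is not the software used in this study.

    # Sketch: extracting URIs from HTML using (a subset of) the tag-attribute
    # pairs listed above. Extend URI_ATTRS with the remaining pairs as needed.
    from html.parser import HTMLParser

    URI_ATTRS = {
        "a": {"href"}, "area": {"href"}, "base": {"href"}, "link": {"href"},
        "img": {"src", "longdesc", "usemap"}, "frame": {"src", "longdesc"},
        "iframe": {"src", "longdesc"}, "form": {"action"}, "script": {"src"},
        "body": {"background"}, "q": {"cite"}, "blockquote": {"cite"},
    }

    class URIExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.uris = []

        def handle_starttag(self, tag, attrs):
            wanted = URI_ATTRS.get(tag, set())
            for name, value in attrs:
                if name in wanted and value:
                    self.uris.append(value)

    parser = URIExtractor()
    parser.feed('<a href="fac.html">Faculty</a><img src="logo.gif">')
    print(parser.uris)  # ['fac.html', 'logo.gif']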
Appendix C : Stratum formula
Let $D$ be a directed graph with $n$ nodes and let $d(u,v)$ be the distance between $u$ and $v$ in $D$.
Let $a_i$ be the distance sum over all $u$ in $D$ for which $d(v_i,u)$ is finite, i.e. the sum of the finite entries in the $i$-th row of the distance matrix of $D$:
$a_i = \sum_{u \,:\, d(v_i,u) < \infty} d(v_i,u)$
Let $b_i$ be the distance sum over all $u$ in $D$ for which $d(u,v_i)$ is finite, i.e. the sum of the finite entries in the $i$-th column of the distance matrix of $D$:
$b_i = \sum_{u \,:\, d(u,v_i) < \infty} d(u,v_i)$
The linear absolute prestige (LAP) is given by
$LAP = n^3/4$ if $n$ is even, and $LAP = (n^3 - n)/4$ if $n$ is odd,
where $n$ is the number of nodes in $D$.
The total absolute prestige (TAP) is given by
$TAP = \sum_{i=1}^{n} |a_i - b_i|$
The stratum (St) is defined as
$St = TAP / LAP$
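A brief sketch of this computation, under the reconstruction above (all-pairs shortest paths, finite row and column sums, TAP normalized by LAP), is shown below; the example graph is invented.

    # Sketch: stratum of a small directed graph, following the formulas above.
    # Distances are all-pairs shortest path lengths; unreachable pairs are skipped.
    from collections import deque

    def distances_from(links, start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in links.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def stratum(links):
        nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
        n = len(nodes)
        d = {u: distances_from(links, u) for u in nodes}
        a = {u: sum(d[u].values()) for u in nodes}                  # row sums
        b = {u: sum(d[w].get(u, 0) for w in nodes) for u in nodes}  # column sums
        tap = sum(abs(a[u] - b[u]) for u in nodes)
        lap = n**3 / 4 if n % 2 == 0 else (n**3 - n) / 4
        return tap / lap

    # A 4-node chain is maximally stratified, so its stratum is 1.
    chain = {"p1": ["p2"], "p2": ["p3"], "p3": ["p4"]}
    print(stratum(chain))  # 1.0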
Appendix D : Web site structure statistic
Table 44: Correlations between numbers of nodes
Total URLs URLs within site HTML nodes
Total URLs 1.000 .992(**) .942(**)
URLs within site .992(**) 1.000 .935(**)
HTML nodes .942(**) .935(**) 1.000
Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 45: Correlations between numbers of links
Total links Navigation links Connections
Total links 1.000 .967(**) .960(**)
Navigation links .967(**) 1.000 .969(**)
Connections .960(**) .969(**) 1.000
Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 46: Distance measurement correlation
                             Directed         Bi-direction     Jump to root     Root
                             distance mean    distance mean    distance mean    distance mean
Directed distance mean       1.000            .759(**)         .932(**)         .848(**)
Bi-direction distance mean   .759(**)         1.000            .901(**)         .792(**)
Jump to root distance mean   .932(**)         .901(**)         1.000            .923(**)
Root distance mean           .848(**)         .792(**)         .923(**)         1.000

Pearson Correlation. ** Correlation is significant at the 0.01 level (1-tailed).
Table 47: Correlation between the Web site metrics

Columns: (1) HTML nodes, (2) Connections, (3) Connections per HTML node-1 ratio, (4) Connected ratio, (5) Compactness.

                                      (1)         (2)         (3)         (4)         (5)
HTML nodes                            1.000       .820(**)    .138        -.004       .019
Connections                           .820(**)    1.000       .505(**)    .263(**)    .288(**)
Connections per HTML node-1 ratio     .138        .505(**)    1.000       .594(**)    .619(**)
Connected ratio                       -.004       .263(**)    .594(**)    1.000       .998(**)
Compactness                           .019        .288(**)    .619(**)    .998(**)    1.000
Stratum                               -.367(**)   -.267(**)   -.213(*)    .020        .003
Directed distance mean                .586(**)    .297(**)    -.069       -.035       -.024
Bi-direction distance mean            .507(**)    .109        -.309(**)   -.528(**)   -.520(**)
Jump to root distance mean            .617(**)    .258(*)     -.174       -.296(**)   -.284(**)
Root distance mean                    .647(**)    .346(**)    .048        -.137       -.122

Table 47 (Cont.)

Columns: (6) Stratum, (7) Directed distance mean, (8) Bi-direction distance mean, (9) Jump to root distance mean, (10) Root distance mean.

                                      (6)         (7)         (8)         (9)         (10)
HTML nodes                            -.367(**)   .586(**)    .507(**)    .617(**)    .647(**)
Connections                           -.267(**)   .297(**)    .109        .258(*)     .346(**)
Connections per HTML node-1 ratio     -.213(*)    -.069       -.309(**)   -.174       .048
Connected ratio                       .020        -.035       -.528(**)   -.296(**)   -.137
Compactness                           .003        -.024       -.520(**)   -.284(**)   -.122
Stratum                               1.000       -.325(**)   -.370(**)   -.381(**)   -.352(**)
Directed distance mean                -.325(**)   1.000       .759(**)    .932(**)    .848(**)
Bi-direction distance mean            -.370(**)   .759(**)    1.000       .901(**)    .792(**)
Jump to root distance mean            -.381(**)   .932(**)    .901(**)    1.000       .923(**)
Root distance mean                    -.352(**)   .848(**)    .792(**)    .923(**)    1.000

Pearson Correlation. * Correlation is significant at the 0.05 level (1-tailed). ** Correlation is significant at the 0.01 level (1-tailed).
Appendix E : Web Sites in the experiment and their properties
Id   Type       Name and URL                                              HTML nodes   Mean root distance   Connected ratio
9    High       Intelligent Systems Program                               261          3.9000               0.6442
                http://www.isp.pitt.edu/
2    High       Dept. of Surgery                                          327          3.0951               0.0265
                http://www.surgery.upmc.edu/
1    High       Women's Studies Program                                   105          2.5192               0.5034
                http://www.pitt.edu/~womnst/
4    Low        Dept. of Hispanic Languages and Literature                29           1.8571               1.0000
                http://www.pitt.edu/~hispan/
5    Low        Dept. of Statistics                                       27           1.9231               0.8376
                http://www.stat.pitt.edu/
3    Low        Film Studies Program                                      16           1.8000               1.0000
                http://www.pitt.edu/~filmst/
7    Practice   Dept. of History of Art and Architecture                  62           2.4918               0.8072
                http://www.pitt.edu/~arthome/index.html
8    Practice   Dept. of Information Science and Telecommunications       118          2.0000               0.8020
                http://www.sis.pitt.edu/~dist/

Scan date: 02-Nov-2000
Appendix F : Information Scent experiment
F.1 Information scent experiment instruction sheet
Information scent experiment Instruction sheet
Thank you for participating in this experiment. The objective of this information scent
experiment is to measure the semantic information, i.e. the information scent, between a given question
and a Web site. The values obtained will be used as part of the Ph.D. thesis, Semantics, Complexity
and Capability: The Use of Integrated Navigational Tools for Information Finding in Hypertext
Document Space.
The experiment will use special software to present material and collect data. Your task is to select
the pages or anchors that you think are most likely to contain, or lead to, information that answers
the given question. In total, there are 150 screens in this experiment.
Two types of screen will be presented. The first is a Graphical Overview, which shows
all the pages in a Web site as icons. In this view, your task is to select three of the icons (pages) that
are most likely to contain the answer to the given question. You should order the selections, putting the
most likely first. A screen snapshot of the Graphical Overview is shown as Figure 1.
Figure 1: Graphical Overview
The overview can be manipulated using the scroll bars at the bottom (1) and right (2). The
small map may also be used to navigate, by clicking or dragging the small box on it (3), which
represents what you are seeing. The view can be zoomed in or out with the + and - buttons or the scroll
bar on the left near the +, - buttons (4).
The second screen type is similar to a Web browser, as shown in Figure 2. A given page
will be presented. Your task is to select up to three anchors that are most likely to lead to the page
that you think will contain the answer, or to be closer to the page containing the answer. You must
select at least one, but you can select up to three. If you select more than one anchor, the first should
be the one most likely to move you to, or toward, the page with the answer. Keep in mind that for a
given page, you may only be selecting a link that moves you toward the answer. Note that, in the
browser screen, anchors are usually represented by blue, underlined text; however,
sometimes image areas are also selectable anchors. One indication of an anchor on an image is a
change in the cursor shape when the cursor is over an anchor area.
Figure 2: Browser
For both tasks, the selected pages or anchors will be shown in a list box like the one in
Figure 3. The order of the selected items can be re-arranged by selecting an item and using the ordering
buttons. You may delete an item by selecting it and pressing the delete button.
Figure 3: Selected anchors or pages
After finishing the experiment, there will be two questionnaires to collect demographic data
and Web site familiarity information.
Your participation in this research study is completely voluntary. You do not have to take
part in this research study and, should you change your mind, you can withdraw from the study at any
time.
There are no direct risks or benefits to participation. Indirectly, you will have a chance to be
exposed to a state-of-the-art technology and learn more about navigational tools. You will also
contribute indirectly to the development of such technology.
There will be no cost to you. The experiment will take approximately one hour. Upon
completion, you will receive a $25 payment for your participation, or $7 per hour.
F.2 Questions in the information scent experiment and results
Table 48: Questions, target Web page and selected Web pages for information scent experiment
Site 9, Question 9Q1H1** -> 9Q2H2
  Target: http://www.isp.pitt.edu/courses/2710.html
  Question: What was the course description of ISSP 2710?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Information/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/complete.html

Site 9, Question 9Q2H2** -> 9Q1H1
  Target: http://www.isp.pitt.edu/courses/3540.html
  Question: What was the course description of ISSP 3540?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Information/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/classes-frame.html
    http://www.isp.pitt.edu/new/Information/Classes/complete.html

Site 9, Question 9Q3H3
  Target: http://www.isp.pitt.edu/~whq/wangresume.html
  Question: What papers has graduate student HaiQin Wang published?
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~whq/

Site 9, Question 9Q4L1
  Target: http://www.isp.pitt.edu/program/specstud.html
  Question: Does ISSP accept special students?
  Test pages:
    http://www.isp.pitt.edu/index.html
    http://www.isp.pitt.edu/new/Map/map.html

Site 9, Question 9Q5L2* -> 9Q4L2
  Target: http://www.isp.pitt.edu/~smonti/HTML/DOCUMENTS/ais99.html
  Question: Find the abstract of "A latent variable model for multivariate discretization."
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~smonti/index.html
    http://www.isp.pitt.edu/~smonti/HTML/publications.html

Site 9, Question 9Q6L3* -> 9Q3L1
  Target: http://www.isp.pitt.edu/~carenini/storage/new-papers-frame.html
  Question: Who wrote "Describing Complex Charts in Natural Language: A caption Generation System"?
  Test pages:
    http://www.isp.pitt.edu/
    http://www.isp.pitt.edu/new/directory/people-frame.html
    http://www.isp.pitt.edu/new/directory/students/student-webpage-frame.html
    http://www.isp.pitt.edu/new/directory/students/directory.html
    http://www.isp.pitt.edu/~carenini/

Site 2, Question 2Q1H1** -> 2Q1H1
  Target: http://www.surgery.upmc.edu/contact/plastic/Russavage/Russavageeducation.htm
  Question: Where was James M. Russavage (professor of plastic surgery) educated and what was his training in?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Russavage/Russavagebio.htm

Site 2, Question 2Q2H2
  Target: http://www.surgery.upmc.edu/resident/general/awards.htm
  Question: Find the list of Resident Research Awards/Grants in the General Surgery Residency Training Program.
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResGen.htm
    http://www.surgery.upmc.edu/resident/general/research.htm

Site 2, Question 2Q3H3
  Target: http://www.surgery.upmc.edu/resident/pediatric/application.htm
  Question: What is the street address for submitting an application for the Pediatric Surgery Resident Program?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResPed.htm

Site 2, Question 2Q4L1* -> 2Q4L2
  Target: http://www.surgery.upmc.edu/resident/oncology/research.htm
  Question: Who worked on the research about "Identification of Tumor Vasculature Binding Peptides Using an E. coli Peptide Display Library"?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSresident.htm
    http://www.surgery.upmc.edu/resident/FSResTraining.htm
    http://www.surgery.upmc.edu/resident/FSResOnc.htm

Site 2, Question 2Q5L2* -> 2Q3L1
  Target: http://www.surgery.upmc.edu/contact/plastic/Shestak/Shestaklicense.htm
  Question: What professional and scientific societies does Kenneth C. Shestak belong to?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/NAVcontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Shestak/Shestakbio.htm

Site 2, Question 2Q6L3** -> 2Q2H2
  Target: http://www.surgery.upmc.edu/contact/plastic/Manders/Mandershours.htm
  Question: What are Ernest Manders's outpatient clinic hours?
  Test pages:
    http://www.surgery.upmc.edu/old.htm
    http://www.surgery.upmc.edu/FSsplash.htm
    http://www.surgery.upmc.edu/contact/FScontact.htm
    http://www.surgery.upmc.edu/contact/plastic/facplas.htm
    http://www.surgery.upmc.edu/contact/plastic/Manders/Mandersbio.htm

Site 1, Question 1Q1H1** -> 1Q1H1
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/vote.html
  Question: In 1998, what was the status of the Pennsylvania Women's Vote project?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 1, Question 1Q2H2
  Target: http://www.pitt.edu/~womnst/contactus/contactus.html
  Question: Find the web page for adding your name to the Women's Studies program's mailing list.
  Test pages:
    http://www.pitt.edu/~womnst/index.html

Site 1, Question 1Q3H3** -> 1Q2H2
  Target: http://www.pitt.edu/~womnst/newsletters/newsfall96/w7.html
  Question: What are the discussions at the UN Women's Conference "ONE YEAR LATER", in Fall 1996?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsfall96/newsltr1.html

Site 1, Question 1Q4L1* -> 1Q3L1
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/grat.html
  Question: Who were the patrons of the women's study program, in March 1998?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 1, Question 1Q5L2* -> 1Q4L2
  Target: http://www.pitt.edu/~womnst/newsletters/newsfall96/call.html
  Question: When was the due date to send the abstracts for the George Washington University conference on Cultural Violence, March 7-9, 1997?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsfall96/newsltr1.html

Site 1, Question 1Q6L3
  Target: http://www.pitt.edu/~womnst/newsletters/newsf98/eholmes.html
  Question: Who is Erin Holmes?
  Test pages:
    http://www.pitt.edu/~womnst/index.html
    http://www.pitt.edu/~womnst/newsletters/news.html
    http://www.pitt.edu/~womnst/newsletters/newsf98/contents.html

Site 4, Question 4Q1H1
  Target: http://www.pitt.edu/~hispan/related.html
  Question: What are the Spanish Language Periodicals suggested by the Hispanic Languages and Literatures program?
  Test pages:
    http://www.pitt.edu/~hispan/index.html

Site 4, Question 4Q2H2** -> 4Q1H1
  Target: http://www.pitt.edu/~hispan/fac-jbra.html
  Question: What are Jerome Branche's specialties?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q3H3** -> 4Q2H2
  Target: http://www.pitt.edu/~hispan/grad-ma.html
  Question: How many credits are required for the Master of Arts (MA) in Hispanic Languages & Literatures?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/grad.html

Site 4, Question 4Q4L1* -> 4Q3L1
  Target: http://www.pitt.edu/~hispan/fac-mm.html
  Question: Who wrote "Literatura y cultura nacional en Hispanoamérica" (1910-1940)?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q5L2* -> 4Q4L2
  Target: http://www.pitt.edu/~hispan/fac-leeman.html
  Question: Who has research interests in interaction in second language acquisition, feedback and negative evidence in SLA, task-based language learning and teaching?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 4, Question 4Q6L3
  Target: http://www.pitt.edu/~hispan/fac-tp.html
  Question: Who wrote the Ph.D. dissertation on "The Production and Perception of Vowel Sounds: A Case Study of Peruvian students learning English as a foreign language"?
  Test pages:
    http://www.pitt.edu/~hispan/index.html
    http://www.pitt.edu/~hispan/fac.html

Site 5, Question 5Q1H1
  Target: http://www.stat.pitt.edu/abstracts.html
  Question: Find the abstracts of the "PERFECT SAMPLING: AN INTRODUCTION" seminar (Thursday, October 26, 2000).
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/news.html

Site 5, Question 5Q2H2** -> 5Q2H2
  Target: http://www.stat.pitt.edu/pfenning.html
  Question: What is one technique Dr. Pfenning is interested in using to enhance student involvement in her courses?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q3H3** -> 5Q1H1
  Target: http://www.stat.pitt.edu/block.html
  Question: What class was Prof. Henry W. Block teaching in Spring 2000?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q4L1* -> 5Q4L2
  Target: http://www.stat.pitt.edu/ds.html
  Question: Who has research interests in time series, spatial statistics, longitudinal data analysis and applications to medicine, epidemiology, molecular biology and computer vision?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q5L2* -> 5Q3L1
  Target: http://www.stat.pitt.edu/ts.html
  Question: Who has research interests in reliability theory, applied probability theory, stochastic processes, and dependence concepts?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/people.html

Site 5, Question 5Q6L3
  Target: http://www.stat.pitt.edu/students.html
  Question: Who is Robert Buck?
  Test pages:
    http://www.stat.pitt.edu/index.html
    http://www.stat.pitt.edu/graduate.html
    http://www.stat.pitt.edu/ci.html
    http://www.stat.pitt.edu/grad.html

Site 3, Question 3Q1H1
  Target: http://www.pitt.edu/~filmst/pittfaculty.html
  Question: Who are the faculty members in Film Studies?
  Test pages:
    http://www.pitt.edu/~filmst/index.html

Site 3, Question 3Q2H2** -> 3Q1H1
  Target: http://www.pitt.edu/~filmst/pittevents.html
  Question: What talk was given on WEDNESDAY OCTOBER 25TH and where?
  Test pages:
    http://www.pitt.edu/~filmst/index.html

Site 3, Question 3Q3H3** -> 3Q2H2
  Target: http://www.pitt.edu/~filmst/pittgradcourse.html
  Question: What courses were required by graduate film study?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittgrad.html
    http://www.pitt.edu/~filmst/pittgradcourse.html

Site 3, Question 3Q4L1* -> 3Q3L1
  Target: http://www.pitt.edu/~filmst/pittugcourses.html
  Question: What is the title of the advertisement taken from Motion Picture Magazine 1913?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html

Site 3, Question 3Q5L2
  Target: http://www.pitt.edu/~filmst/ugcatone.html
  Question: What is the course number for "The World of China: Chinese National Cinema"?
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html
    http://www.pitt.edu/~filmst/pittugcourses.html

Site 3, Question 3Q6L3* -> 3Q4L2
  Target: http://www.pitt.edu/~filmst/pittugmajor.html
  Question: Find the Web page that shows the patent for Edison's Kinetograph.
  Test pages:
    http://www.pitt.edu/~filmst/index.html
    http://www.pitt.edu/~filmst/pittundergrad.html

* Selected as a low information scent question. ** Selected as a high information scent question.
-> Question ID used in the main experiment.
Table 49: Information-scent score
Web Site Question ID
Avg. of Graphical
overview scent score Std. Dev.
Avg. of Browser scent score Std. Dev.
Avg. of information scent Std. Dev.
High Complex
9Q2H2** 0.88 0.249 0.84 0.219 0.86 0.1669Q1H1** 0.90 0.316 0.81 0.189 0.86 0.1939Q3H3 0.50 0.471 0.69 0.328 0.59 0.3259Q4L1 0.55 0.438 0.31 0.352 0.43 0.3319Q6L3* 0.13 0.219 0.05 0.263 0.09 0.2149Q5L2* 0.00 0.000 0.15 0.396 0.07 0.1982Q1H1** 0.55 0.497 0.80 0.270 0.68 0.2932Q6L3** 0.40 0.516 0.36 0.212 0.38 0.2912Q2H2 0.30 0.483 0.40 0.355 0.35 0.3462Q3H3 0.10 0.316 0.49 0.626 0.30 0.3092Q5L2* 0.03 0.105 0.06 0.285 0.05 0.1572Q4L1* 0.00 0.000 -0.41 0.298 -0.21 0.1491Q1H1** 0.50 0.471 0.54 0.524 0.52 0.3871Q3H3** 0.53 0.502 0.40 0.345 0.47 0.3141Q6L3 0.55 0.497 -0.29 0.430 0.13 0.4241Q2H2 0.43 0.439 -0.56 0.486 -0.06 0.3481Q4L1* 0.00 0.000 -0.17 0.448 -0.09 0.2241Q5L2* 0.03 0.105 -0.44 0.468 -0.20 0.243
Low Complex
4Q2H2** 0.93 0.211 0.48 0.079 0.70 0.1084Q3H3** 0.85 0.337 0.39 0.161 0.62 0.2384Q6L3 0.30 0.483 -0.04 0.347 0.13 0.3274Q1H1 0.22 0.334 0.00 0.000 0.11 0.1674Q4L1* 0.10 0.316 -0.09 0.204 0.00 0.2184Q5L2* 0.00 0.000 -0.03 0.129 -0.01 0.0655Q3H3** 0.87 0.281 0.28 0.236 0.58 0.1905Q2H2** 0.83 0.272 0.28 0.172 0.56 0.1565Q6L3 0.40 0.370 0.18 0.186 0.29 0.2325Q1H1 0.40 0.459 0.11 0.314 0.26 0.3595Q5L2* 0.00 0.000 0.15 0.218 0.08 0.1095Q4L1* 0.05 0.158 0.05 0.327 0.05 0.1923Q2H2** 0.95 0.158 0.93 0.237 0.94 0.1353Q3H3** 0.63 0.350 0.66 0.388 0.64 0.3143Q1H1 0.95 0.158 0.00 0.000 0.48 0.0793Q5L2 0.05 0.158 0.73 0.188 0.39 0.1343Q4L1* 0.05 0.158 -0.07 0.395 -0.01 0.2233Q6L3* 0.00 0.000 -0.15 0.344 -0.08 0.172
Overall 0.39 0.340 0.22 0.382 0.30 0.317* Selected as low information scent question. ** Selected as high information scent question.
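For readers who want to see how averages like those in Table 49 could be produced, the sketch below shows one plausible aggregation of individual rater scores; it is a minimal illustration, not the original analysis code, and the file name, column names, and cutoffs are all assumptions.

```python
# A minimal sketch (not the original analysis code) of how per-question
# scent averages like those in Table 49 could be computed from rater scores.
# The file name, column names, and cutoff values are assumptions.
import pandas as pd

ratings = pd.read_csv("scent_ratings.csv")  # one row per rater x question x tool

# Mean and SD of the scent score per question for each tool condition.
by_tool = ratings.groupby(["question_id", "tool"])["scent_score"].agg(["mean", "std"])

# Overall information scent per question, pooling both tool conditions.
overall = ratings.groupby("question_id")["scent_score"].agg(["mean", "std"])

# Candidate high- and low-scent questions (illustrative cutoffs only).
print("high scent:", sorted(overall.index[overall["mean"] >= 0.5]))
print("low scent:", sorted(overall.index[overall["mean"] <= 0.1]))
```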
Appendix G: The main experiment instruction sheet
Navigational Tools Experiment Instruction Sheet
Thank you for participating in this experiment. The objective of this navigational tools
experiment is to measure and compare navigation performance between three navigational tools in
various conditions. The data obtained will be used as part of the Ph.D. thesis, Semantics, Complexity
and Capability: The Use of Integrated Navigational Tools for Information Finding in Hypertext
Document Space.

The experiment uses special software to present tasks and collect data. Your task is to find a
Web page that provides information answering a given question. In total, there are 30 tasks in this
experiment. There will also be a follow-up questionnaire to gather demographic data, user
satisfaction information, etc.

There are three navigational tools used in the experiment: a Browser similar to ones you have
used before, a Graphical Overview which provides a graph of Web pages, and a tool which combines the
Browser and the Graphical Overview.

IF YOU HAVE ANY QUESTIONS, PLEASE ASK THE EXPERIMENTER AT ANY TIME. THANK YOU.
Navigational Tools
Browser
The Browser is a simplified version of a Web browser such as Internet Explorer. It is shown in
Figure 1. You are limited to navigating within a given Web site. Navigation can be done by clicking
on an anchor (1), the back button (2), or the forward button (3).

Figure 1: Browser

In the browser screen, anchors are usually represented by blue, underlined text.
However, sometimes image areas are also selectable anchors. One indication of an anchor on an
image is that the cursor changes to a hand-shaped pointer when moving over the image. This means
the image is an anchor that may be clicked.
Graphical Overview
The Graphical Overview, shown in Figure 2, provides a map view and a text viewer. The map
shows the whole Web site: each Web page is represented by an icon on the map, and a link between pages is
represented by a line. A selected Web page is shown in the text viewer. Your view of the Web site can be
changed by using the scroll bars at the bottom (1) and right (2), or by dragging the main map area. The
small map may also be used to navigate, by clicking or dragging the small box on it (3); the area in
the box represents what you are seeing. The view can be zoomed in or out with the + and - buttons, or with
the scroll bar on the left near the + and - buttons (4). Clicking on any icon (5) on the map will show the content of
that page in the text viewer (6). The back and forward buttons (7) can be used to go back to a previous
view and forward again.

When a page has been visited, the text color of its icon changes to purple. The current page in the
text viewer is indicated by a red icon, and the location of that selected page is shown in the small map
as a red dot. In this mode, there are no links in the text viewer.
Figure 2: Graphical Overview
Browser and Graphical Overview
The Browser and Graphical Overview is an integrated tool, shown in Figure 3. The
navigational functions of both the Browser and the Graphical Overview work. You may navigate to a
page by clicking on an icon in the Graphical Overview or an anchor in the Browser. Both tools are
synchronized: when you select a page in the Browser, the map will scroll automatically and show the
icon of the selected page in red; when you select an icon in the Graphical Overview, the Browser
will show that page.
Figure 3: Browser and Graphical Overview
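The synchronized behavior described above is essentially a mediator between the two views. The sketch below illustrates the idea in Python; it is a minimal illustration only, with invented class and method names, and is not taken from the experiment's actual software.

```python
# A minimal sketch of the Browser/Graphical Overview synchronization described
# above. All class and method names are illustrative assumptions.

class Browser:
    def __init__(self):
        self.on_anchor_clicked = None   # callback set by the mediator

    def load(self, url):
        print(f"browser: displaying {url}")


class GraphicalOverview:
    def __init__(self):
        self.on_icon_clicked = None     # callback set by the mediator

    def scroll_to(self, url):
        print(f"overview: scrolling map to the icon for {url}")

    def mark_current(self, url):
        print(f"overview: showing the icon for {url} in red")


class IntegratedTool:
    """Keeps both views pointed at the same page, whichever view navigated."""

    def __init__(self, browser, overview):
        self.browser = browser
        self.overview = overview
        browser.on_anchor_clicked = self.show_page
        overview.on_icon_clicked = self.show_page

    def show_page(self, url):
        self.browser.load(url)           # load the page in the browser pane
        self.overview.scroll_to(url)     # keep the map in step
        self.overview.mark_current(url)  # highlight the selected page's icon


tool = IntegratedTool(Browser(), GraphicalOverview())
tool.show_page("http://www.pitt.edu/~filmst/index.html")
```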
Software for the experiment
You will need to take the following steps as you move through the experiment.

1. You will fill in a user ID, which will be assigned to you by the experimenter.

2. After entering the user ID, you will be shown an introduction page and brief pages for each step in
the experiment. Read the information and click on “Next >” (1) to continue to the next page.
3. There will be 2 sessions, one for practice and one for the experiment proper.

The practice session is designed to allow you to practice using the navigational tools and
become familiar with the tasks. There will be 3 practice tasks with each navigational tool. Take your
time and play with the functions of each navigational tool. There is no time limit for the practice
session.

The experimental session will then be conducted to collect data. Each task is limited to
6 minutes. The remaining time is shown by the small clock below; when the circle is completely black,
the time has expired.
There are a total of 30 questions in the experimental session.
4. When a task page appears, the question or instruction will be shown at the top of the screen (1).
Click start (2) to begin the task. The clock (3) will start counting.

5. The navigational tool will not appear until you press start. Use the navigational tool to
navigate to the page that contains the information to answer the given question. When you have found
the page you want, click on submit (1).

6. If you cannot find the answer within the time limit (6 min.), the navigational tool will disappear. Do
not worry about not completing the task; your goal should simply be to do your best. Some target pages
will be difficult to find. Click on submit to continue to the next question.

7. You may take a break at any time. However, to make the results of the experiment as consistent as
possible, please take breaks between tasks, i.e., after clicking submit and before clicking start.
After finishing the experiment, there will be a questionnaire to collect demographic data,
Web site familiarity information, etc. Answer all the questions. When you finish, the “done” button
at the bottom will be enabled; click it to go to the next part. Some parts have multiple pages;
use the “<Previous” or “Next>” buttons to change pages, or use a page tab to go to a
certain page.
Your participation in this research study is completely voluntary. You do not have to take
part in this research study and, should you change your mind, you can withdraw from the study at any
time.
There are no direct risks or benefits to participation. Indirectly, you will have a chance to be
exposed to state-of-the-art technology and learn more about navigational tools. You will also
contribute indirectly to the development of such technology.
There will be no cost to you. The experiment will take approximately one and a half hours.
Upon completion, you will receive a $15 payment for your participation.
Appendix H: Questionnaires
H.1 Demographics, Computer and World Wide Web Experience form

Figure 44 shows the screen that was used to obtain demographic, computer, and World Wide Web experience data.
Figure 44: Demographic data screen
H.2 Web site familiarity score

Figure 45 shows the screen that was used to obtain the Web site familiarity scores. There were 7
pages; the same question was asked with different Web site names and pictures.
Figure 45: Web site familiarity screen
H.3 User satisfaction Questionnaire

The questionnaire was based on the Post-Study System Usability Questionnaire (PSSUQ)
(Lewis, 1995). The software showed the following instructions on the first screen:
This questionnaire gives you an opportunity to tell us your reactions to having used Browser,
Graphical Overview and Browser + Graphical Overview. Your responses will help us understand
what aspects of software you are particularly concerned about and the aspects that satisfy you.
To as great a degree as possible, think about all the tasks you just performed while you
answer these questions. Please read each statement carefully and indicate how strongly you agree
or disagree with the statement by checking a number on the scale.
If you are certain that a statement does not apply to you, check N/A.
The three sets of four-page questionnaires are shown in Figure 46.
Figure 46: User Satisfaction Questionnaire screen
List of questions that appeared in the user satisfaction questionnaire (Figure 46):
1. Overall, I am satisfied with how easy it is to use Browser
2. It was simple to use Browser
3. I can effectively complete my work using Browser
4. I am able to complete my work quickly using Browser
5. I am able to efficiently complete my work using Browser
6. I feel comfortable using Browser
7. It was easy to learn to use Browser
8. I believe I became productive quickly using Browser
9. Browser gives error messages that clearly tell me how to fix problems
10. Whenever I make a mistake using Browser, I recover easily and quickly
11. The information (such as online help, on-screen messages, and other documentation) provided
with Browser is clear
12. It is easy to find the information I needed
13. The information provided for Browser is easy to understand
14. The information is effective in helping me complete the tasks and scenarios
15. The organization of information on Browser screens is clear
16. The interface of Browser is pleasant
17. I like using the interface of Browser
18. Browser has all the functions and capabilities I expect it to have
19. Overall, I am satisfied with Browser
List the most negative aspect(s) of Browser:
1: __________________________________________________________________
2: __________________________________________________________________

List the most positive aspect(s) of Browser:
1: __________________________________________________________________
2: __________________________________________________________________
All occurrences of the word “Browser” in the questions were replaced with “Graphical Overview” in the
second questionnaire and with “Browser + Graphical Overview” in the third questionnaire screen.
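For reference, PSSUQ responses are conventionally summarized as item means over four overlapping sub-scales (Lewis, 1995). The sketch below shows one plausible scoring routine; the function and the example responses are illustrative assumptions, though the item groupings follow the published PSSUQ.

```python
# A minimal sketch of PSSUQ sub-scale scoring per Lewis (1995): item means,
# with N/A responses ignored. Item groupings follow the published PSSUQ
# (SYSUSE items 1-8, INFOQUAL items 9-15, INTERQUAL items 16-18,
# OVERALL items 1-19); the example responses below are hypothetical.
def pssuq_scores(responses):
    """responses: dict of item number -> rating (1-7), or None for N/A."""
    def mean(items):
        vals = [responses[i] for i in items if responses.get(i) is not None]
        return sum(vals) / len(vals) if vals else None

    return {
        "SYSUSE": mean(range(1, 9)),
        "INFOQUAL": mean(range(9, 16)),
        "INTERQUAL": mean(range(16, 19)),
        "OVERALL": mean(range(1, 20)),
    }

example = {i: 5 for i in range(1, 20)}
example[9] = None  # item 9 answered N/A
print(pssuq_scores(example))
```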
Appendix I: Statistical Analysis results
I.1 Tool usage statistics
Table 50: Pairwise comparisons of ln(time between clicking) for the integrated tool

Web site complexity  Question type  (I) Event  (J) Event  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  High  I-I  I-A  .046  .098  1.000  -.212  .304
High  High  I-I  A-I  -.876*  .110  .000  -1.167  -.585
High  High  I-I  A-A  .066  .069  1.000  -.117  .248
High  High  I-A  A-I  -.922*  .129  .000  -1.261  -.582
High  High  I-A  A-A  .020  .096  1.000  -.233  .272
High  High  A-I  A-A  .941*  .108  .000  .655  1.227
High  Low  I-I  I-A  -.234*  .056  .000  -.383  -.086
High  Low  I-I  A-I  -1.021*  .056  .000  -1.168  -.874
High  Low  I-I  A-A  -.223*  .036  .000  -.318  -.127
High  Low  I-A  A-I  -.787*  .072  .000  -.977  -.597
High  Low  I-A  A-A  .011  .058  1.000  -.142  .165
High  Low  A-I  A-A  .798*  .058  .000  .646  .950
Low  High  I-I  I-A  -.154  .166  1.000  -.592  .284
Low  High  I-I  A-I  -.351  .212  .588  -.911  .209
Low  High  I-I  A-A  .214  .137  .710  -.147  .574
Low  High  I-A  A-I  -.197  .229  1.000  -.802  .408
Low  High  I-A  A-A  .368  .162  .138  -.059  .794
Low  High  A-I  A-A  .565*  .209  .042  .012  1.117
Low  Low  I-I  I-A  -.553*  .099  .000  -.814  -.292
Low  Low  I-I  A-I  -.988*  .085  .000  -1.212  -.764
Low  Low  I-I  A-A  -.522*  .049  .000  -.651  -.392
Low  Low  I-A  A-I  -.435*  .124  .003  -.763  -.106
Low  Low  I-A  A-A  .031  .103  1.000  -.241  .304
Low  Low  A-I  A-A  .466*  .091  .000  .227  .705
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
I-I icon-icon clicking, I-A icon-anchor clicking, A-I anchor-icon clicking, A-A anchor-anchor clicking.
I.2 Task completion statistics
Table 51: Mauchly's Test of Sphericity on number of tasks completed

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .984  1.698  2  .428  .984  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .962  4.077  2  .130  .964  .981  .500
TOOL * QUESTION  .926  8.139  2  .017  .931  .947  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .993  .743  2  .690  .993  1.000  .500
Table 52: Pairwise comparisons on number of tasks completed between tools in question type
conditions, high complexity Web sites only

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -.074  .081  1.000  -.271  .122
High  B  I  .148  .094  .351  -.080  .376
High  G  I  .222*  .079  .018  .029  .415
Low  B  G  .056  .034  .327  -.028  .139
Low  B  I  -.009  .021  1.000  -.060  .041
Low  G  I  -.065  .030  .103  -.138  .009
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.3 Number of answers found statistics
Table 53: Number of tasks where the subject visited the target node but submitted another node or timed out

Web site complexity  Question type  N  % answers not found
High  High  2  1.7%
High  Low  19  3.9%
High  Total  21  3.5%
Low  High  13  29.5%
Low  Low  37  20.8%
Low  Total  50  22.5%
Grand Total  -  71  8.6%
Table 54: Number of answers grouped by question

"Answer not found" is split into tasks that were not timed out and tasks that timed out; percentages are row percentages, and "-" indicates no tasks.

Web site complexity  Question type  Question  Answer found N (%)  Not timed out N (%)  Timed out N (%)
High  High  1Q1H1  79 (73.1%)  25 (23.1%)  4 (3.7%)
High  High  1Q2H2  66 (61.1%)  42 (38.9%)  -
High  High  2Q1H1  96 (88.9%)  9 (8.3%)  3 (2.8%)
High  High  2Q2H2  87 (80.6%)  13 (12.0%)  8 (7.4%)
High  High  9Q1H1  101 (93.5%)  6 (5.6%)  1 (0.9%)
High  High  9Q2H2  102 (94.4%)  5 (4.6%)  1 (0.9%)
High  High Total  -  531 (81.9%)  100 (15.4%)  17 (2.6%)
High  Low  1Q3L1  27 (25.0%)  60 (55.6%)  21 (19.4%)
High  Low  1Q4L2  21 (19.4%)  46 (42.6%)  41 (38.0%)
High  Low  2Q3L1  65 (60.2%)  38 (35.2%)  5 (4.6%)
High  Low  2Q4L2  17 (15.7%)  49 (45.4%)  42 (38.9%)
High  Low  9Q3L1  16 (14.8%)  37 (34.3%)  55 (50.9%)
High  Low  9Q4L2  14 (13.0%)  52 (48.1%)  42 (38.9%)
High  Low Total  -  160 (24.7%)  282 (43.5%)  206 (31.8%)
High Total  -  -  691 (53.3%)  382 (29.5%)  223 (17.2%)
Low  High  3Q1H1  108 (100.0%)  -  -
Low  High  3Q2H2  79 (73.1%)  29 (26.9%)  -
Low  High  4Q1H1  101 (93.5%)  7 (6.5%)  -
Low  High  4Q2H2  104 (96.3%)  4 (3.7%)  -
Low  High  5Q1H1  106 (98.1%)  2 (1.9%)  -
Low  High  5Q2H2  106 (98.1%)  2 (1.9%)  -
Low  High Total  -  604 (93.2%)  44 (6.8%)  -
Low  Low  3Q3L1  59 (54.6%)  45 (41.7%)  4 (3.7%)
Low  Low  3Q4L2  59 (54.6%)  45 (41.7%)  4 (3.7%)
Low  Low  4Q3L1  84 (77.8%)  17 (15.7%)  7 (6.5%)
Low  Low  4Q4L2  76 (70.4%)  30 (27.8%)  2 (1.9%)
Low  Low  5Q3L1  91 (84.3%)  17 (15.7%)  -
Low  Low  5Q4L2  101 (93.5%)  7 (6.5%)  -
Low  Low Total  -  470 (72.5%)  161 (24.8%)  17 (2.6%)
Low Total  -  -  1074 (82.9%)  205 (15.8%)  17 (1.3%)
Grand Total  -  -  1765 (68.1%)  587 (22.6%)  240 (9.3%)
Table 55: Tasks where a node other than the target was submitted (answer not found)

Columns: number of tasks where a non-target page was submitted; number of distinct pages submitted; subjects submitting per page (Avg., SD); and, for the most frequently submitted non-target page, the number of tasks (N), that number as a percentage of answers not found, and as a percentage of answers found. "-" indicates no non-target submissions.

Web site complexity  Question type  Question  Tasks submitted not target  Pages submitted  Avg.  SD  N  % of answers not found  % of answers found
High  High  1Q1H1  25  16  1.56  1.31  9  36.0%  11.4%
High  High  1Q2H2  42  8  5.25  7.61  35  83.3%  53.0%
High  High  2Q1H1  9  8  1.13  0.35  2  22.2%  2.1%
High  High  2Q2H2  13  13  1.00  0.00  1  7.7%  1.1%
High  High  9Q1H1  6  4  1.50  1.00  3  50.0%  3.0%
High  High  9Q2H2  5  5  1.00  0.00  1  20.0%  1.0%
High  Low  1Q3L1  60  25  2.40  4.15  23  38.3%  85.2%
High  Low  1Q4L2  46  20  2.30  1.87  9  19.6%  42.9%
High  Low  2Q3L1  38  22  1.73  2.05  10  26.3%  15.4%
High  Low  2Q4L2  49  32  1.53  1.67  10  20.4%  58.8%
High  Low  9Q3L1  37  25  1.48  1.12  5  13.5%  31.3%
High  Low  9Q4L2  52  23  2.26  3.15  14  26.9%  100.0%
Low  High  3Q1H1  -  -  -  -  -  -  -
Low  High  3Q2H2  29  7  4.14  5.52  16  55.2%  20.3%
Low  High  4Q1H1  7  5  1.40  0.89  3  42.9%  3.0%
Low  High  4Q2H2  4  3  1.33  0.58  2  50.0%  1.9%
Low  High  5Q1H1  2  2  1.00  0.00  1  50.0%  0.9%
Low  High  5Q2H2  2  2  1.00  0.00  1  50.0%  0.9%
Low  Low  3Q3L1  45  9  5.00  6.93  23  51.1%  39.0%
Low  Low  3Q4L2  45  11  4.09  3.21  10  22.2%  16.9%
Low  Low  4Q3L1  17  10  1.70  1.25  5  29.4%  6.0%
Low  Low  4Q4L2  30  10  3.00  3.77  11  36.7%  14.5%
Low  Low  5Q3L1  17  7  2.43  3.78  11  64.7%  12.1%
Low  Low  5Q4L2  7  5  1.40  0.55  2  28.6%  2.0%
Table 56: Mauchly's Test of Sphericity on number of answers found

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .991  1.005  2  .605  .991  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .982  1.889  2  .389  .983  1.000  .500
TOOL * QUESTION  .991  .963  2  .618  .991  1.000  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .980  2.150  2  .341  .980  .998  .500
Table 57: Pairwise comparisons on number of answers found between tools in question type conditions

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  .019  .047  1.000  -.096  .133
High  B  I  -.023  .046  1.000  -.136  .089
High  G  I  -.042  .052  1.000  -.169  .085
Low  B  G  -.227*  .060  .001  -.373  -.081
Low  B  I  -.079  .065  .682  -.236  .079
Low  G  I  .148  .061  .053  -.001  .298
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 58: Pairwise comparisons on number of answers found between tools in Web site complexity conditions

WEB  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -.051  .058  1.000  -.192  .090
High  B  I  -.093  .054  .273  -.225  .039
High  G  I  -.042  .059  1.000  -.186  .102
Low  B  G  -.157*  .053  .012  -.287  -.028
Low  B  I  -.009  .063  1.000  -.164  .145
Low  G  I  .148*  .058  .035  .008  .289
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.4 Outliers: Extreme cases

Extreme cases were excluded from the data analyses. The following cases were classified as
extreme:
In the high complexity Web sites with low information-scent questions, there were 6 tasks
where subjects spent an unreasonably short time on the task: less than 3.7 sec., where the mean
was 237.2 sec. and the SD was 116.8 sec.

In the high complexity Web sites, there were 10 tasks where subjects spent a short time on the
task: less than 10 sec., where the mean was 159.1 sec. and the SD was 128.6 sec.

In the high information-scent questions, there were 18 tasks where subjects spent the full
360 sec., where the mean was 54.1 sec. and the SD was 69.4.

In the high complexity Web sites, there were 3 tasks where subjects viewed more than 75
pages, where the mean was 15.4 pages and the SD was 15.6.

In the low complexity Web sites, there were 4 tasks where subjects viewed more than 50 pages,
where the mean was 8.6 pages and the SD was 8.4.

When using the graphical overview or the integrated tool, there were 12 tasks where the
number of pages equaled zero: subjects used the map for some time and then submitted
without viewing any page besides the start page.

In the low complexity Web sites, there were 5 tasks where subjects revisited many pages: the
number of revisited page views was more than 25, where the mean was 2.4 and the SD was 4.9.

In the high complexity Web sites, there were 3 tasks where subjects revisited many pages: the
number of revisited page views was more than 35, where the mean was 4.1 and the SD was 7.1.

There were 35 tasks where the number of extra nodes was less than zero; subjects viewed
fewer pages than the number required to perform the tasks.
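In code, the exclusion rules above amount to a set of boolean filters over a task-level log. The following is a minimal sketch under assumed file and column names, not the original analysis scripts; in practice each analysis applied only the criteria relevant to its own measure, rather than one combined mask.

```python
# A minimal sketch (assumed file name and column names, not the original
# analysis scripts) of the exclusion criteria listed above.
import pandas as pd

tasks = pd.read_csv("task_log.csv")   # hypothetical: one row per task

high = tasks["web_complexity"] == "high"
low = tasks["web_complexity"] == "low"
high_scent = tasks["question_type"] == "high"
low_scent = tasks["question_type"] == "low"
map_tool = tasks["tool"].isin(["graphical_overview", "integrated"])

extreme = (
    (high & low_scent & (tasks["time_sec"] < 3.7))    # unreasonably fast
    | (high & (tasks["time_sec"] < 10))
    | (high_scent & (tasks["time_sec"] == 360))       # ran out the clock
    | (high & (tasks["pages_viewed"] > 75))
    | (low & (tasks["pages_viewed"] > 50))
    | (map_tool & (tasks["pages_viewed"] == 0))       # map only, no pages
    | (low & (tasks["revisited_pages"] > 25))
    | (high & (tasks["revisited_pages"] > 35))
    | (tasks["extra_pages"] < 0)                      # fewer pages than required
)
print(f"excluded {extreme.sum()} of {len(tasks)} tasks")
kept = tasks[~extreme]
```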
Some tasks were extreme under more than one criterion. A total of 27 extreme cases out of 2,592 were
excluded from the time spent on task analysis; 45 extreme cases were excluded from the analyses of the
number of page views, the number of pages, and the number of revisited page views; and 68 extreme cases
were excluded from the extra page views analysis. The number of extreme cases, broken down by tool type,
Web site complexity, and question type, is shown in Table 59.
Table 59: Number of extreme cases

Tool  Web complexity  Question type  Time N  %(1)  Page N  %(1)  Extra N  %(1)
Browser  High  High  0  0.0%  4  1.9%  12  5.6%
Browser  High  Low  4  1.9%  4  1.9%  13  6.0%
Browser  Low  High  1  0.5%  1  0.5%  1  0.5%
Browser  Low  Low  6  2.8%  6  2.8%  12  5.6%
Graphical Overview  High  High  4  1.9%  14  6.5%  14  6.5%
Graphical Overview  High  Low  2  0.9%  2  0.9%  2  0.9%
Graphical Overview  Low  High  0  0.0%  0  0.0%  0  0.0%
Graphical Overview  Low  Low  1  0.5%  1  0.5%  1  0.5%
Integrated  High  High  5  2.3%  8  3.7%  8  3.7%
Integrated  High  Low  2  0.9%  3  1.4%  3  1.4%
Integrated  Low  High  0  0.0%  0  0.0%  0  0.0%
Integrated  Low  Low  2  0.9%  2  0.9%  2  0.9%
Total  -  -  27  1.0%*  45  1.7%*  68  2.6%*
(1) Percent of the 216 total tasks in the condition. * Percent of the 2,592 total tasks.
Time – the time spent on task analysis. Page – the number of page views, the number of pages, and the number of revisited page views analysis. Extra – the number of extra pages analysis.
I.5 Time spent on task statistics
Table 60: Mauchly's Test of Sphericity on ln(time spent on task)

Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  .997  .371  2  .831  .997  1.000  .500
WEB  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  .990  1.030  2  .598  .990  1.000  .500
TOOL * QUESTION  .998  .215  2  .898  .998  1.000  .500
WEB * QUESTION  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  .994  .656  2  .720  .994  1.000  .500
Table 61: Pairwise comparisons on ln(time spent on task)

QUESTION  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  -0.241*  0.056  0.000  -0.377  -0.105
High  B  I  -0.100  0.052  0.173  -0.226  0.027
High  G  I  0.141*  0.051  0.021  0.017  0.265
Low  B  G  -0.085  0.047  0.217  -0.200  0.029
Low  B  I  -0.146*  0.051  0.015  -0.270  -0.022
Low  G  I  -0.061  0.052  0.747  -0.188  0.067
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
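The pairwise comparisons above come from SPSS's Bonferroni-adjusted comparisons of estimated marginal means. A rough equivalent is a set of paired t-tests on ln(time) with each p-value multiplied by the number of comparisons; the sketch below illustrates this with hypothetical per-subject values, and is an approximation rather than the procedure SPSS runs internally.

```python
# A minimal sketch of Bonferroni-adjusted pairwise comparisons on
# ln(time spent on task). The per-subject times below are hypothetical
# illustrative values, not data from the experiment.
import numpy as np
from scipy import stats

browser = np.array([212.0, 180.5, 240.3, 199.1, 230.8, 175.2])
graphical = np.array([245.6, 210.2, 250.9, 231.4, 260.3, 190.7])
integrated = np.array([230.1, 205.8, 236.7, 228.2, 251.0, 188.4])

pairs = {"B-G": (browser, graphical),
         "B-I": (browser, integrated),
         "G-I": (graphical, integrated)}

for name, (a, b) in pairs.items():
    t, p = stats.ttest_rel(np.log(a), np.log(b))   # paired t-test on ln(time)
    p_adj = min(1.0, p * len(pairs))               # Bonferroni correction
    print(f"{name}: mean diff {np.mean(np.log(a) - np.log(b)):+.3f}, "
          f"adjusted p = {p_adj:.3f}")
```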
I.6 Number of pages viewed statistics
Table 62: Mauchly's Test of Sphericity for number of pages statistics

Within Subjects Effect  Measure  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  TOTAL  .991  .997  2  .607  .991  1.000  .500
TOOL  DIFF  .981  1.992  2  .369  .982  1.000  .500
TOOL  REVISIT  .970  3.246  2  .197  .971  .988  .500
TOOL  EXTRA  .974  2.830  2  .243  .974  .992  .500
WEB  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
WEB  DIFF  1.000  .000  0  .  1.000  1.000  1.000
WEB  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
WEB  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  DIFF  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
QUESTION  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB  TOTAL  .953  5.137  2  .077  .955  .972  .500
TOOL * WEB  DIFF  .953  5.090  2  .078  .955  .972  .500
TOOL * WEB  REVISIT  .997  .347  2  .841  .997  1.000  .500
TOOL * WEB  EXTRA  .951  5.359  2  .069  .953  .970  .500
TOOL * QUESTION  TOTAL  .991  .906  2  .636  .992  1.000  .500
TOOL * QUESTION  DIFF  .979  2.283  2  .319  .979  .997  .500
TOOL * QUESTION  REVISIT  .991  1.002  2  .606  .991  1.000  .500
TOOL * QUESTION  EXTRA  .972  3.034  2  .219  .973  .990  .500
WEB * QUESTION  TOTAL  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  DIFF  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  REVISIT  1.000  .000  0  .  1.000  1.000  1.000
WEB * QUESTION  EXTRA  1.000  .000  0  .  1.000  1.000  1.000
TOOL * WEB * QUESTION  TOTAL  .977  2.460  2  .292  .978  .996  .500
TOOL * WEB * QUESTION  DIFF  .946  5.920  2  .052  .948  .965  .500
TOOL * WEB * QUESTION  REVISIT  .954  4.963  2  .084  .956  .973  .500
TOOL * WEB * QUESTION  EXTRA  .995  .548  2  .760  .995  1.000  .500
Table 63: Pairwise comparisons on ln(number of pages) between tools in Web complexity conditions

Measure  Web complexity  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
TOTAL  High  B  G  .540*  .053  .000  .411  .669
TOTAL  High  B  I  .290*  .045  .000  .181  .398
TOTAL  High  G  I  -.251*  .046  .000  -.363  -.138
TOTAL  Low  B  G  .123*  .038  .004  .031  .215
TOTAL  Low  B  I  .139*  .040  .002  .042  .236
TOTAL  Low  G  I  .015  .034  1.000  -.068  .098
DIFF  High  B  G  .427*  .048  .000  .310  .544
DIFF  High  B  I  .218*  .041  .000  .120  .317
DIFF  High  G  I  -.209*  .042  .000  -.310  -.107
DIFF  Low  B  G  -.084*  .032  .030  -.162  -.006
DIFF  Low  B  I  -.007  .033  1.000  -.086  .072
DIFF  Low  G  I  .077*  .028  .019  .010  .145
REVISIT  High  B  G  .617  .061  .000  .470  .764
REVISIT  High  B  I  .352  .054  .000  .220  .484
REVISIT  High  G  I  -.265  .051  .000  -.388  -.142
REVISIT  Low  B  G  .580  .048  .000  .463  .697
REVISIT  Low  B  I  .385  .053  .000  .255  .514
REVISIT  Low  G  I  -.196  .048  .000  -.313  -.078
EXTRA  High  B  G  .234*  .063  .001  .080  .388
EXTRA  High  B  I  -.102  .056  .217  -.238  .035
EXTRA  High  G  I  -.336*  .055  .000  -.470  -.201
EXTRA  Low  B  G  -.115  .049  .065  -.235  .005
EXTRA  Low  B  I  -.092  .053  .259  -.221  .037
EXTRA  Low  G  I  .023  .044  1.000  -.085  .131
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 64: Pairwise comparisons on ln(number of pages) between tools in question type conditions

Measure  Question type  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
TOTAL  High  B  G  .341  .045  .000  .231  .451
TOTAL  High  B  I  .241  .038  .000  .148  .334
TOTAL  High  G  I  -.100  .041  .047  -.199  -.001
TOTAL  Low  B  G  .323  .048  .000  .207  .439
TOTAL  Low  B  I  .187  .049  .001  .068  .307
TOTAL  Low  G  I  -.136  .048  .017  -.252  -.019
DIFF  High  B  G  .340*  .039  .000  .245  .434
DIFF  High  B  I  .224*  .032  .000  .145  .303
DIFF  High  G  I  -.115*  .037  .007  -.205  -.026
DIFF  Low  B  G  .003  .042  1.000  -.099  .106
DIFF  Low  B  I  -.013  .041  1.000  -.111  .086
DIFF  Low  G  I  -.016  .042  1.000  -.117  .085
REVISIT  High  B  G  .092  .047  .165  -.023  .207
REVISIT  High  B  I  .122*  .043  .014  .019  .226
REVISIT  High  G  I  .031  .036  1.000  -.056  .117
REVISIT  Low  B  G  1.106*  .069  .000  .938  1.274
REVISIT  Low  B  I  .614*  .073  .000  .436  .792
REVISIT  Low  G  I  -.492*  .067  .000  -.655  -.328
EXTRA  High  B  G  -.140  .066  .112  -.301  .022
EXTRA  High  B  I  -.304*  .062  .000  -.454  -.153
EXTRA  High  G  I  -.164*  .053  .008  -.293  -.035
EXTRA  Low  B  G  .259*  .056  .000  .123  .394
EXTRA  Low  B  I  .110  .059  .195  -.033  .254
EXTRA  Low  G  I  -.148*  .056  .026  -.284  -.013
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 65: Pairwise comparisons on ln(number of re-visited pages) between tools in Web
complexity conditions, high information-scent question type only

WEB  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
High  B  G  .194*  .079  .046  .002  .385
High  B  I  .222*  .074  .010  .043  .401
High  G  I  .028  .063  1.000  -.125  .181
Low  B  G  -.010  .045  1.000  -.120  .099
Low  B  I  .023  .041  1.000  -.076  .122
Low  G  I  .033  .037  1.000  -.058  .124
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
Table 66: Pairwise comparisons on ln(number of re-visited pages) between tools, low
information-scent question type only

(I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
B  G  1.106*  .069  .000  .938  1.274
B  I  .614*  .073  .000  .436  .792
G  I  -.492*  .067  .000  -.655  -.328
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.7 Tool performance comparisons
Table 67: Tool performance comparisons

Measure  Web complexity  Question type  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Task completed  High  High  B  G  .056  .034  .327  -.028  .139
Task completed  High  High  B  I  -.009  .021  1.000  -.060  .041
Task completed  High  High  G  I  -.065  .030  .103  -.138  .009
Task completed  High  Low  B  G  -.074  .081  1.000  -.271  .122
Task completed  High  Low  B  I  .148  .094  .351  -.080  .376
Task completed  High  Low  G  I  .222*  .079  .018  .029  .415
Task completed  Low  High  B  G  .000  .000  .  .000  .000
Task completed  Low  High  B  I  .000  .000  .  .000  .000
Task completed  Low  High  G  I  .000  .000  .  .000  .000
Task completed  Low  Low  B  G  .009  .034  1.000  -.072  .091
Task completed  Low  Low  B  I  .009  .036  1.000  -.078  .097
Task completed  Low  Low  G  I  .000  .032  1.000  -.078  .078
Answer found  High  High  B  G  .046  .082  1.000  -.152  .245
Answer found  High  High  B  I  -.074  .076  1.000  -.260  .112
Answer found  High  High  G  I  -.120  .087  .508  -.332  .091
Answer found  High  Low  B  G  -.148  .074  .145  -.329  .032
Answer found  High  Low  B  I  -.111  .078  .475  -.301  .079
Answer found  High  Low  G  I  .037  .074  1.000  -.144  .218
Answer found  Low  High  B  G  -.009  .047  1.000  -.122  .104
Answer found  Low  High  B  I  .028  .052  1.000  -.098  .154
Answer found  Low  High  G  I  .037  .043  1.000  -.069  .143
Answer found  Low  Low  B  G  -.306*  .096  .006  -.540  -.072
Answer found  Low  Low  B  I  -.046  .101  1.000  -.293  .200
Answer found  Low  Low  G  I  .259  .094  .021  .030  .489
Time spent on task, ln(sec)  High  High  B  G  -.149  .083  .224  -.350  .052
Time spent on task, ln(sec)  High  High  B  I  -.019  .084  1.000  -.224  .187
Time spent on task, ln(sec)  High  High  G  I  .130  .082  .348  -.069  .329
Time spent on task, ln(sec)  High  Low  B  G  -.061  .057  .863  -.200  .078
Time spent on task, ln(sec)  High  Low  B  I  -.172*  .065  .027  -.329  -.015
Time spent on task, ln(sec)  High  Low  G  I  -.111  .061  .215  -.260  .038
Time spent on task, ln(sec)  Low  High  B  G  -.332*  .065  .000  -.490  -.174
Time spent on task, ln(sec)  Low  High  B  I  -.181*  .064  .017  -.337  -.025
Time spent on task, ln(sec)  Low  High  G  I  .152  .064  .061  -.005  .308
Time spent on task, ln(sec)  Low  Low  B  G  -.110  .082  .542  -.308  .089
Time spent on task, ln(sec)  Low  Low  B  I  -.120  .081  .419  -.316  .076
Time spent on task, ln(sec)  Low  Low  G  I  -.010  .076  1.000  -.196  .176
Total pages viewed, ln(pages)  High  High  B  G  .551*  .081  .000  .354  .748
Total pages viewed, ln(pages)  High  High  B  I  .332*  .064  .000  .176  .487
Total pages viewed, ln(pages)  High  High  G  I  -.219*  .074  .011  -.398  -.040
Total pages viewed, ln(pages)  High  Low  B  G  .530*  .069  .000  .361  .698
Total pages viewed, ln(pages)  High  Low  B  I  .248*  .074  .004  .067  .428
Total pages viewed, ln(pages)  High  Low  G  I  -.282*  .074  .001  -.462  -.103
Total pages viewed, ln(pages)  Low  High  B  G  .131*  .038  .002  .038  .223
Total pages viewed, ln(pages)  Low  High  B  I  .151*  .039  .001  .055  .246
Total pages viewed, ln(pages)  Low  High  G  I  .020  .042  1.000  -.084  .123
Total pages viewed, ln(pages)  Low  Low  B  G  .116  .062  .190  -.034  .266
Total pages viewed, ln(pages)  Low  Low  B  I  .127  .061  .118  -.021  .275
Total pages viewed, ln(pages)  Low  Low  G  I  .011  .054  1.000  -.121  .143
Different pages viewed, ln(pages)  High  High  B  G  .536*  .071  .000  .364  .709
Different pages viewed, ln(pages)  High  High  B  I  .305*  .054  .000  .173  .438
Different pages viewed, ln(pages)  High  High  G  I  -.231*  .066  .002  -.392  -.071
Different pages viewed, ln(pages)  High  Low  B  G  .318*  .062  .000  .166  .469
Different pages viewed, ln(pages)  High  Low  B  I  .131  .064  .128  -.024  .287
Different pages viewed, ln(pages)  High  Low  G  I  -.186*  .068  .023  -.353  -.020
Different pages viewed, ln(pages)  Low  High  B  G  .143*  .029  .000  .071  .214
Different pages viewed, ln(pages)  Low  High  B  I  .143*  .030  .000  .070  .216
Different pages viewed, ln(pages)  Low  High  G  I  .000  .036  1.000  -.087  .088
Different pages viewed, ln(pages)  Low  Low  B  G  -.311*  .053  .000  -.439  -.183
Different pages viewed, ln(pages)  Low  Low  B  I  -.157*  .053  .010  -.285  -.029
Different pages viewed, ln(pages)  Low  Low  G  I  .154*  .042  .001  .050  .257
Re-visited pages, ln(pages)  High  High  B  G  .194*  .079  .046  .002  .385
Re-visited pages, ln(pages)  High  High  B  I  .222*  .074  .010  .043  .401
Re-visited pages, ln(pages)  High  High  G  I  .028  .063  1.000  -.125  .181
Re-visited pages, ln(pages)  High  Low  B  G  1.041*  .090  .000  .822  1.259
Re-visited pages, ln(pages)  High  Low  B  I  .482*  .100  .000  .239  .725
Re-visited pages, ln(pages)  High  Low  G  I  -.559*  .085  .000  -.765  -.353
Re-visited pages, ln(pages)  Low  High  B  G  -.010  .045  1.000  -.120  .099
Re-visited pages, ln(pages)  Low  High  B  I  .023  .041  1.000  -.076  .122
Re-visited pages, ln(pages)  Low  High  G  B  .010  .045  1.000  -.099  .120
Re-visited pages, ln(pages)  Low  High  G  I  .033  .037  1.000  -.058  .124
Re-visited pages, ln(pages)  Low  Low  B  G  1.171*  .085  .000  .964  1.379
Re-visited pages, ln(pages)  Low  Low  B  I  .747*  .092  .000  .523  .970
Re-visited pages, ln(pages)  Low  Low  G  I  -.425*  .092  .000  -.649  -.201
Extra pages viewed, ln(pages)  High  High  B  G  -.005  .113  1.000  -.280  .269
Extra pages viewed, ln(pages)  High  High  B  I  -.357*  .100  .002  -.599  -.114
Extra pages viewed, ln(pages)  High  High  G  I  -.351*  .092  .001  -.576  -.127
Extra pages viewed, ln(pages)  High  Low  B  G  .473*  .083  .000  .270  .675
Extra pages viewed, ln(pages)  High  Low  B  I  .153  .086  .231  -.055  .361
Extra pages viewed, ln(pages)  High  Low  G  I  -.320*  .083  .001  -.523  -.117
Extra pages viewed, ln(pages)  Low  High  B  G  -.274*  .061  .000  -.422  -.127
Extra pages viewed, ln(pages)  Low  High  B  I  -.251*  .063  .000  -.405  -.097
Extra pages viewed, ln(pages)  Low  High  G  I  .023  .060  1.000  -.122  .168
Extra pages viewed, ln(pages)  Low  Low  B  G  .044  .076  1.000  -.142  .230
Extra pages viewed, ln(pages)  Low  Low  B  I  .067  .075  1.000  -.114  .249
Extra pages viewed, ln(pages)  Low  Low  G  I  .023  .065  1.000  -.135  .182
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.8 Web complexity by question type interaction
Table 68: Pairwise comparisons between question types in Web site complexity conditions

Measure  Web complexity  (I) Question type  (J) Question type  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Task completed  High  High  Low  .583*  .046  .000  .491  .675
Task completed  Low  High  Low  .052*  .014  .000  .025  .080
Answer found  High  High  Low  1.145*  .041  .000  1.063  1.227
Answer found  Low  High  Low  .414*  .049  .000  .316  .511
Time spent on task  High  High  Low  -1.354*  .046  .000  -1.444  -1.263
Time spent on task  Low  High  Low  -1.505*  .040  .000  -1.585  -1.426
Total pages viewed  High  High  Low  -1.116*  .045  .000  -1.206  -1.026
Total pages viewed  Low  High  Low  -1.316*  .035  .000  -1.386  -1.246
Different pages viewed  High  High  Low  -.939*  .039  .000  -1.016  -.863
Different pages viewed  Low  High  Low  -1.078*  .030  .000  -1.136  -1.019
Extra pages viewed  High  High  Low  -1.514*  .055  .000  -1.624  -1.404
Extra pages viewed  Low  High  Low  -1.796*  .044  .000  -1.883  -1.708
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
I.9 User satisfaction statistics
Table 69: Mauchly's Test of Sphericity on PSSUQ scores

Within Subjects Effect  Measure  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TOOL  OVERALL  .861  14.801  2  .001  .878  .893  .500
TOOL  SYSUSE  .867  14.132  2  .001  .883  .897  .500
TOOL  INFOQUAL  .851  15.959  2  .000  .870  .885  .500
TOOL  INTERQUAL  .792  23.141  2  .000  .828  .840  .500
Table 70: Pairwise comparisons on PSSUQ scores between tools

Measure  (I) TOOL  (J) TOOL  Mean Difference (I-J)  Std. Error  Sig.
OVERALL  B  G  .849*  .133  .000
OVERALL  B  I  -.113  .095  .714
OVERALL  G  I  -.962*  .106  .000
SYSUSE  B  G  1.081*  .152  .000
SYSUSE  B  I  -.005  .107  1.000
SYSUSE  G  I  -1.085*  .127  .000
INFOQUAL  B  G  .542*  .132  .000
INFOQUAL  B  I  -.253*  .099  .036
INFOQUAL  G  I  -.795*  .100  .000
INTERQUAL  B  G  .882*  .170  .000
INTERQUAL  B  I  -.077  .134  1.000
INTERQUAL  G  I  -.960*  .116  .000
Based on estimated marginal means.
* The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Bonferroni.
B = Browser, G = Graphical overview, I = Integrated tool
I.10 Web site familiarity statistics

Web site familiarity was determined by a questionnaire to ascertain whether subjects had previously
visited any of the Web sites in the experiment, since subjects who had seen a Web site might perform
the experimental tasks better than others. Of the 105 subjects who responded to the Web familiarity
questionnaire, 23 had visited one or more of the Web sites in the experiment: three subjects had
visited two of the Web sites, and 20 subjects had visited one (Table 71). There were 92 tasks
performed by subjects who were familiar with a Web site (3.55% of the total 2,592 tasks). This
group of tasks was not distributed evenly across the tool x Web complexity x question type conditions
(Table 72). The average task performance of this group, both the time spent on task and the number of
page views, was within one standard deviation of the overall means (Table 33 and Table 35), suggesting
that there was no significant effect on the overall results.
Table 71: Subject's Web site familiarity

Counts are per Web site (IDs 1, 2, 3, 4, 5, 7, 9, in that order where given), followed by the row total.

Seen before  Last time visited  Freq. of visit  Counts  Total
Yes  Yesterday  Many times a day  1 1  2
Yes  Within the last week  Once a week  1  1
Yes  Within the last week  Not very often  1  1
Yes  Within the last month  Once a week  1 1 1 3 1  7
Yes  More than a month ago  Once a week  1 1  2
Yes  More than a month ago  Not very often  2 1 3 3 4  13
Yes Total  -  -  1 2 3 2 8 3 7  26
No  -  -  102 101 100 101 95 100 96  695
No data  -  -  5 5 5 5 5 5 5  35
Grand Total  -  -  108 108 108 108 108 108 108  756
Table 72: Tasks performed by subjects who had visited Web sites prior to the experiment, grouped by tool, Web site complexity, and question type

Tool  Web site complexity  Question type  Num. of tasks  % (/216)  Task completion  Answer found  Avg. time spent  Avg. total pages viewed
Browser  High  High  6  2.8%  6  6  69.15  10.83
Browser  High  Low  6  2.8%  5  4  233.28  38.17
Browser  Low  High  10  4.6%  10  10  13.70  3.10
Browser  Low  Low  10  4.6%  10  8  70.07  14.40
Graphical overview  High  High  12  5.6%  12  12  102.53  5.58
Graphical overview  High  Low  12  5.6%  6  2  277.65  21.58
Graphical overview  Low  High  8  3.7%  8  8  36.00  3.75
Graphical overview  Low  Low  8  3.7%  8  7  130.25  8.88
Integrated  High  High  2  0.9%  2  1  59.10  6.00
Integrated  High  Low  2  0.9%  0  0  360.00  37.50
Integrated  Low  High  8  3.7%  8  8  22.60  3.13
Integrated  Low  Low  8  3.7%  8  7  77.23  12.13
Reference List
Albers, M. J. (1997). Cognitive strain as a factor in effective document design. Proceedings of the 15th
Annual International Conference on Computer Documentation (pp. 1-6). ACM.
Ankerst, M., Berchtold, S., & Keim, D. A. (1998). Similarity clustering of dimensions for an enhanced
visualization of multidimensional data. IEEE Symposium on Information Visualization (pp. 52-60,153).
Beccaria, M., Bertolazzi, P., Battista, G. D., & Liotta, G. (1991). A tailorable and extensible automatic
layout facility. IEEE Workshop on Visual Languages (pp. 68-73). IEEE.
Bederson, B. B. & Hollan, J. D. (1994). Pad++: A zooming graphical interface for exploring alternate
interface physics. Proceedings of the ACM Symposium on User Interface Software and Technology (pp. 17-
26). ACM.
Benedikt, M. (1992). Cyberspace: some proposals. In M. Benedikt (Ed.), Cyberspace: First steps.
Cambridge, Massachusetts: The MIT Press.
Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H. F., & Secret, A. (1994). The World-Wide
Web. Communications of the ACM, 37, 76-82.
Björk, S., Holmquist, L. E., & Redström, J. (1999). A framework for focus+context visualization.
IEEE Symposium on Information Visualization (InfoVis '99) (pp. 53-56,145). IEEE.
Bly, S. A. & Rosenberg, J. K. (1986). A comparison of tiled and overlapping windows. Conference
Proceedings on Human Factors in Computing Systems CHI86 (pp. 101-106). ACM.
Botafogo, R. A., Rivlin, E., & Shneiderman, B. (1992). Structural analysis of hypertexts: Identifying
hierarchies and useful metrics. ACM Transactions on Information Systems, 19, 142-180.
Boyle, C. & Teh, S. H. (1992). To link or not to link: An empirical comparison of Hypertext linking
strategies. Proceedings of the 10th Annual International Conference on Systems Documentation SIGDOC'92
(pp. 221-231). ACM.
Brandenburg, F. J. (1987). Nice drawings of graphs are computationally hard. In P. Gorny & M. J.
Tauber (Eds.), Visualization in Human-Computer Interaction (pp. 1-15). New York: Springer-Verlag.
Bray, T. (1996). Measuring the Web. Proceedings of the Fifth International World Wide Web
Conference Amsterdam, Netherlands: Elsevier Science.
Brewington, B. E. & Cybenko, G. (2000). How dynamic is the web? The Ninth International World
Wide Web Conference (WWW9) Amsterdam.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., & Wiener,
J. (2000). Graph structure in the Web. Proceedings of the Ninth International World-Wide Web Conference
WWW9 Amsterdam.
Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information
Science, 42, 351-360.
Bush, V. (1945). As we may think. The Atlantic Monthly, 101-108.
Campbell, C. S. & Maglio, P. P. (1999). Facilitating navigation in information spaces: Road-signs on
the World Wide Web. International Journal of Human-Computer Studies, 50, 307-327.
Chen, C. & Rada, R. (1996). Interacting with hypertext: A meta-analysis of experimental studies.
Human-Computer Interaction, 11, 125-156.
Cockburn, A. & Jones, S. (1996). Which way now? Analysing and easing inadequacies in WWW
navigation. International Journal of Human-Computer Studies, 45, 105-129.
Conklin, J. (1987). Hypertext: An introduction and survey. IEEE Computer, 20, 17-41.
Czerwinski, M. & Larson, K. (1998). Business: trends in future Web designs: what's next for the HCI
professional? Interactions, 5, 9-14.
Dillon, A. (1994). Designing usable electronic text ergonomic aspects of human information usage.
Bristol, PA: Taylor & Francis Inc.
Dix, A. & Mancini, R. (1998). Specifying history and backtracking mechanisms. In P. Palanque & F.
Paternò (Eds.), Formal Methods in Human-Computer Interaction (pp. 1-23). London: Springer.
Durand, D. & Kahn, P. (1998). MAPA: a system for inducing and visualizing hierarchy in websites.
Proceedings of the Ninth ACM Conference on Hypertext (pp. 66-76).
Engelbart, D. C. (1963). A conceptual framework for the augmentation of man's intellect. In P. W.
Howerton (Ed.), Vistas in information handling (pp. 1-29). Washington, D.C.: Spartan Books.
Fleming, J. (1998). Web Navigation: Designing the user experience. Sebastopol, CA: O'Reilly.
Fowler, R. H., Fowler, W. A. L., & Wilson, B. A. (1991). Integrating Query, Thesaurus, and
Documents through a Common Visual Representation. Proceedings of the Fourteenth Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 142-151).
Fowler, R. H., Kumar, A., & Williams, J. L. (1996). Visualizing and browsing WWW semantic
content. Emerging Technologies and Applications in Communications, 1996. Proceedings., First Annual
Conference on (pp. 110-113). IEEE.
Furnas, G. W. (1982). The FISHEYE view: A new look at structured files (Rep. No. Technical
Memorandum, #82-11221-22). Murray Hill, N.J.: Bell Laboratories.
Furnas, G. W. (1997). Effective View Navigation. Conference Proceedings on Human Factors in
Computing Systems CHI'97 (pp. 367-374). ACM.
Gaylin, K. B. (1986). How are windows used? Some notes on creating an empirically-based
windowing benchmark task. Conference Proceedings on Human Factors in Computing Systems CHI'86 (pp. 96-
100). ACM.
Gloor, P. A. (1997). Elements of hypermedia design: Techniques for navigation and visualization in
cyberspace. Boston: Birkhauser.
Graphics, Visualization & Usability (GVU) Center (1998). GVU's 10th WWW user survey. (n.d.).
Retrieved January, 2000, from http://www.gvu.gatech.edu/gvu/user_surveys/
Hammond, N. & Allinson, L. (1989). Extending hypertext for learning: An investigation of access and
guidance tools. Proc. BCS HCI'89 (pp. 293-304). Nottingham,U.K.
Heo, M. (2000). A Usability Study on Web Visualization Techniques and User Mental Models. Ph.D.
dissertation, University of Pittsburgh.
Hirtle, S. C. & Jonides, J. (1985). Evidence of hierarchies in cognitive maps. Memory & Cognition,
13, 208-217.
Hölscher, C. & Strube, G. (2000). Web Search Behavior of Internet Experts and Newbies. 9th
International World Wide Web Conference Amsterdam.
Huberman, B. A. & Adamic, L. A. (1999). Evolutionary Dynamics of the World Wide Web Palo Alto,
CA: Internet Ecologies Group, Xerox Palo Alto Research Center.
Internet Engineering Task Force (IETF) (1994). Uniform Resource Locators (URL) (Rep. No.
RFC1738). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc1738.txt
Internet Engineering Task Force (IETF) (1998). Uniform Resource Identifiers (URI) (Rep. No.
RFC2396). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc2396.txt
Internet Engineering Task Force (IETF) (1999). Hypertext Transfer Protocol -- HTTP/1.1 (Rep. No.
RFC2616). Retrieved January, 2000, from http://www.ietf.org/rfc/rfc2616.txt
Jerding, D. F. & Stasko, J. T. (1995). The Information Mural: A technique for displaying and
navigating large information spaces. IEEE Information Visualization Symposium IEEE Computer Society
Press.
Jul, S. & Furnas, G. W. (1997). Navigation in electronic worlds: A CHI 97 Workshop. SIGCHI Bulletin, 29.
Kohonen, T. (1998). Self-organization of very large document collections: State of the art. Proceedings
of ICANN98, the 8th International Conference on Artificial Neural Networks (pp. 65-74). London: Springer.
Lamping, J., Rao, R., & Pirolli, P. (1995). A Focus+Context Technique Based on Hyperbolic
Geometry for Visualizing Large Hierarchies. Proceedings of ACM CHI'95 Conference on Human Factors in
Computing Systems (pp. 401-408).
Larson, K. & Czerwinski, M. (1998). Web Page Design: Implications of Memory, Structure and Scent
for Information Retrieval. Proceedings of ACM CHI 98 Conference on Human Factors in Computing Systems
(pp. 25-32).
Lawrence, S. & Giles, C. L. (1999). Accessibility of information on the web. Nature, 400, 107-
109.
Lewis, J. R. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation
and Instructions for Use. International Journal of Human-Computer Interaction, 7, 57-78.
Lin, X., Soergel, D., & Marchionini, G. (1991). A Self-Organizing Semantic Map for Information
Retrieval. Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (pp. 262-269).
Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM
Transactions on Graphics, 5, 110-141.
Maurer, H. (1996). Hyper-G now Hyperwave: The next generation Web solution. New York: Addison-
Wesley Publishing.
McKnight, C., Dillon, A., & Richardson, J. (1991). Hypertext in Context. Cambridge; New York:
Cambridge University Press.
Minar, N. & Donath, J. (1999). Visualizing the crowds at a Web site. Human Factors in Computing
Systems CHI'99 Extended Abstracts (pp. 186-187). ACM SIGCHI.
Monk, A. F., Walsh, P., & Dix, A. J. (1988). A comparison of hypertext, scrolling and folding
mechanisms for program browsing. In People and Computers IV (pp. 421-435). Cambridge University Press.
Nakayama, T., Kato, H., & Yamane, Y. (2000). Discovering the Gap Between Web Site Designers'
Expectations and Users' Behavior. The Ninth International World Wide Web Conference (WWW9): The Web:
The Next Generation Amsterdam.
Nation, D. A., Plaisant, C., Marchionini, G., & Komlodi, A. (1997). Visualizing websites using a
hierarchical table of contents browser: WebTOC. Human Factors and the Web Conferences Colorado.
Nelson, T. H. (1987). Literary Machines. (87.1 ed.) Published by the author.
Nielsen, J. (1989). The Matters that Really Matter for Hypertext Usability. Hypertext'89 Proceedings
(pp. 239-248). ACM.
Nielsen, J. (1990). The Art of Navigation through Hypertext. Communications of the ACM, 33, 296-310.
Nielsen, J. (1999). User Interface directions for the Web. Communications of the ACM, 42, 65-72.
North, C. & Shneiderman, B. (1997). A Taxonomy of Multiple Window Coordinations (Rep. No. CS-
TR-3854). University of Maryland, College Park, Dept of Computer Science.
North, C. & Shneiderman, B. (1999). Snap-Together visualization: Coordinating multiple views to
explore information (Rep. No. CS-TR-4020). University of Maryland, College Park, Dept of Computer Science.
Olson, J.R., & Nielsen, E. (1987). Analysis of the cognition involved in spreadsheet software
interaction. Human-Computer Interaction, 3(4), 309-349.
Perlin, K. & Fox, D. (1993). Pad - An alternative approach to the computer interface. Proceedings of
the 20th Annual Conference on Computer Graphic, SIGGRAPH '93 (pp. 57-64). ACM.
Pirolli, P., Card, S. K., & Wege, M. M. V. D. (2000). The Effect of Information Scent on Searching
Information Visualizations of Large Tree Structures (Rep. No. UIR-R-2000-04-Pirolli-AVI2000-
InfoScentAndHBSearch ). Xerox PARC User Interface Research Group.
Pirolli, P. & Card, S. (1999). Information Foraging. Psychological Review, 106, 643-675.
Pitkow, J. E. (1998). Summary of WWW characterizations. The Seventh International World Wide
Web Conference Brisbane, Australia.
Pitkow, J. E. (1999). Summary of WWW characterizations. World Wide Web, 2, 3-13.
Schoon, P. L. (1997). World Wide Web Hypertext linkage patterns. Ph.D. dissertation, Illinois State University.
Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11, 70-77.
Shneiderman, B. (1998). Designing the user interface: Strategies for effective human-computer
interaction. (Third ed.) Reading MA.: Addison-Wesley.
Snowdon, D., Fahlen, L., & Stenius, M. (1996). WWW3D: A 3D multi-user Web browser. WebNet 96
Proceedings California USA: AACE.
Spence, R. (1998). Navigation in real and virtual worlds. Workshop on Personalised and Social
Navigation in Information Space (pp. 69-76). IFIP and Navigation SIG of Rsprit's i3-net.
Spring, M. B. (1991). Electronic printing and publishing: the document processing revolution. New
York: Marcel Dekker, Inc.
Spring, M. B., Morse, E., & Heo, M. (1996). Multi-level navigation of a document space. Leveraging
Cyberspace Conference Palo Alto, CA.
Stone, M. C., Fishkin, K., & Bier, E. A. (1994). The movable filter as a user interface tool. Proceedings
of CHI 94 (pp. 306-312). ACM: New York.
Tauscher, L. & Greenberg, S. (1997a). How people revisit Web pages: Empirical findings and
implications for the design of history systems. International Journal of Human-Computer Studies, 47, 97-137.
Tauscher, L. & Greenberg, S. (1997b). Revisitation patterns in World Wide Web navigation.
Proceedings of ACM CHI 97 Conference on Human Factors in Computing Systems (pp. 399-406).
Tversky, B., Franklin, N., Taylor, H. A., & Bryant, D. J. (1994). Spatial mental models from
descriptions. Journal of the American Society for Information Science, 45, 656-668.
Weinreich, H. & Lamersdorf, W. (2000). Concepts for Improved Visualization of Web Link
Attributes. Proceedings of the 9th International World Wide Web Conference Amsterdam, The Netherlands.
Wexelblat, A. & Maes, P. (1999). Footprints: history-rich tools for information foraging. CHI 99
Conference Proceedings (pp. 270-277). New York: ACM.
Whitaker, L. A. (1997). Human navigation. In C. Forsythe, E. Grose, & J. Ratner (Eds.), Human
Factors and Web Development (pp. 63-71). NJ: Lawrence Erlbaum Associates.
Wood, A., Drew, N., Beale, R., & Hendley, B. (1995). HyperSpace: Web browsing with visualisation.
Technology, Tools and Applications, the Third International World Wide Web Conference Darmstadt,
Germany.
Woodruff, A., Aoki, P. M., Brewer, E., Gauthier, P., & Rowe, L. A. (1996). An investigation of
documents from the World Wide Web. Fifth International World Wide Web Conference.
World Wide Web Consortium (1999). HTML 4.01 Specification (Rep. No. REC-html401-19991224).
Retrieved January, 2000, From http://www.w3.org/TR/1999/REC-html401-19991224/
Wright, P. & Lickorish, A. (1990). An empirical comparison of two navigation systems for two
hypertexts. In R. McAleese & C. Green (Eds.), Hypertext: State of the Art (pp. 84-93). Oxford, England:
Intellect.