1
Internet and WWW for finding scientific information
Paul Nieuwenhuysen
Vrije Universiteit Brussel,
Pleinlaan 2, B-1050 Brussel
More information in digital form at
http://www.vub.ac.be/BIBLIO/personal/nieuwenhuysen/courses/opsporen/
IPAVUB
2
Schedule for the day: morning
• About “information”
• The information market
• Classification and thesaurus systems
• Information retrieval (+ practice in query formulation)
• Computer networks and the Internet in particular; telnet and ftp
• World-Wide Web (+ practice in “browsing” + “saving”)
• Online accessible information sources!
»Global Internet directories (+ demonstration of Yahoo!)
[Lunch]
3
Schedule for the day: afternoon (part 1)
• Online accessible information sources! (continued)
»Internet indexes (+ practice with AltaVista and NL)
»Bookshop databases (+ practice)
»Fee-based databases (+ Gale Directory of Databases)
»Free databases with titles of journal articles (+ practice with Ingenta)
»Finding illustrations (+ practice with AltaVista)
• Citation searching (+ practice with Web of Science)
• Document delivery
4
Schedule for the day: afternoon (part 2)
• (Client-server architecture)
• (Evaluation of search results)
• [Free individual use of the Internet]
5
About “information”
Introductory concepts about information
6
Information sources: evaluation criteria
• authority
• authenticity
• accuracy
• objectivity
• currency / up to date
• wide coverage
• format / lay-out of the information
• reliability
• distribution medium (print, e-mail, online, ...)
• price / costs
• stability
• ...
7
The flow of documentary information
Reader / User
Primary sources / systems: mainly journal articles / books / electronic mail / online sources / ...
Secondary sources / systems: mainly reference works (printed, CD-ROM, online), library catalogues, including OPACs, ...
Author / Sender
!? Question !? Task !? Problem !?
Why is secondary information created?
8
9
Retrospective searching versus current awareness
[Diagram: timeline with Past, Now, and Future, carrying the labels “Retrospective searching” and “Current awareness”]
10
Information retrieval: evolution of storage and distribution media
• 1450 print
• 1975 + online access databases
• 1985 + CD-ROM
• 1990 + World-Wide Web (in Internet)
11
Information retrieval: end user or intermediaries
End-user
Information intermediary (broker or library or ...)
Information
12
About “information”
Computer- and network-based information
13
Information: from bits to meaningful information
Digital computer data = bits (0 or 1)
Program code: meaningful for and to be interpreted / executed by a suitable / compatible computer
Information = “documents”: meaningful for and to be interpreted by human beings
14
Information: digitally stored and managed information
Categories of digital, computer readable information / data, forming electronic “documents”, understandable by human beings:
0 / 1: text + numbers + images + video + sounds
combined: multimedia
15
Information: types of digital information
Digital information (0 / 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Programs for computers
• Multimedia / Hypermedia
16
Some publication media compared
[Chart: printed, CD-ROM, and online / networked media compared by update speed and volume]
17
The digital / electronic / virtual library
[Diagram: three components combine into the digital / electronic / virtual library]
• Structures & practices of physical libraries and archives
• Communication capabilities of electronic networking
• Computing
18
Scientific publishing in Utopia: an ideal scheme
Many authors
Many readers / users
Many editors / publishers
Online remote access multimedia database server
Many database search clients and user interfaces
One global, international computer data communication network
author = reader in science
19
!? Question !? Task !? Problem !?
Indicate the differences between reality
and that simplified, ideal scheme of the information flow.
20
!? Question !? Task !? Problem !?
Which basic problems hinder people from finding information?
21
Information retrieval: basic problems (Part 1)
• In many cases it is not completely clear to the user of an information retrieval system which information is in fact needed/required.
• In many cases the need for information cannot be expressed completely in the form of a query.
One of the reasons is that the complete context of the information need should ideally be expressed, including the knowledge and background of the searcher.
22
Information retrieval: basic problems (Part 2)
• Language and vocabulary problems
»People use different languages and different terms (vocabularies) to describe a similar concept.
»The fluidity of concepts and vocabularies: meanings of terms depend on their context and may change over time.
23
Information retrieval: basic problems (Part 3)
• Too many imperfect retrieval systems
»To retrieve and access the information which is in principle available, many different retrieval systems must be available and be mastered.
»Furthermore, perfect information retrieval software does not exist.
24
Information retrieval: basic problems (Part 4)
• Information overload
Users are often overwhelmed by the amount of available information and by the large influx of new information.
• The price (or inaccessibility) of particular information
A lot of information cannot be obtained or at least not free of charge.
25
The information industry and the information market
26
The components of the information industry
• Authors
• Publishers
• Distributors
• Users
• Related organizations
27
Online access from an institute to information: methods
Online access from an institute to information on
• an external online computer host / server system:
»supermarket model: select and pay for selected information only
»fixed price per year, paid by the institute to the information distributor
• a local server computer, maintained by the institute for its users
»for a fixed price per year, paid by the institute to the information distributor
28
Increase in the number of scientific and technical serial publications
[Plot: number of serial publications on a logarithmic scale from 1 to 1 000 000, against the years 1650 to 2000]
29
Evolution of information industry: measures
• Number of living databases.
• Number of database producers.
• Number of database vendors (including online services).
• Number of database records / documents.
• Number of online searches per year.
• ...
30
The information market: growth in the database industry
[Plot: number of living databases, number of database producers, and number of vendors, from 1975 to 1995; vertical axis from 0 to 10 000]
Source: Williams, in: Gale Directory of Databases, 1998.
31
The information industry / market: future trends (Part 1)
• Growth in the production of databases.
• Less analogue / hard-copy production = more digital production, storage, and distribution of information.
• More integration of information types into multimedia and hypermedia.
32
The information industry / market: future trends (Part 2)
• Growth in the number of
»producers and distributors,
»end-users searching databases due to easier use and lower costs of information technology
33
Knowledge organisation: classifications, and thesaurus systems
34
Classification systems: examples of universal systems
• Universal means here: covering all subjects
• Examples
»Universal Decimal Classification = UDC
used mainly outside U.S.A.
»Dewey Classification
used mainly in U.S.A.
»Library of Congress Classification
used mainly in U.S.A.
»...
35
Thesaurus: description
• Thesaurus (contents) =
»system to control a vocabulary (= words and phrases + their relations)
»the contents of this vocabulary
• Thesaurus program =
program to create, manage, modify and/or search a thesaurus using a computer
36
Thesaurus relations
[Diagram: a term and its relations]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
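The four relation types (BT, NT, RT, UF) can be sketched as a small data structure that expands one search term into a list of candidate terms. This is only an illustration: the class name, the function name, and the sample entry for "pollution" are invented here and do not come from any real thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the four thesaurus relation types."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


def expand_term(entry: ThesaurusEntry, include_broader: bool = False) -> list[str]:
    """Return the term plus synonyms, narrower and related terms, without duplicates."""
    candidates = [entry.term] + entry.used_for + entry.narrower + entry.related
    if include_broader:
        candidates += entry.broader
    seen: set[str] = set()
    result = []
    for term in candidates:
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


# Hypothetical entry, invented for illustration only.
pollution = ThesaurusEntry(
    term="pollution",
    broader=["environmental problems"],
    narrower=["water pollution", "air pollution"],
    related=["waste"],
    used_for=["contamination"],
)

print(expand_term(pollution))
```

Broader terms are left out by default because they tend to widen a search considerably; pass `include_broader=True` when too little is found.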
37
Thesaurus systems covering all subjects: examples
• General systems / universal systems / on all subjects = broad and shallow, horizontal systems
• Examples:
»Library of Congress Subject Headings (LCSH)
»thesaurus systems built into word processing software
»...
38
!? Question !? Task !? Problem !?
Try to find more suitable search terms to retrieve “general reviews about monitoring seawater pollution”
by using the thesaurus included in the program for word processing
that you use.
Limit yourself here to the concept of “pollution.”
39
Databases and computerised information retrieval
40
What is a database?
A database is a collection of similar data records stored in a common file (or collection of files).
41
Types of databases: examples
The databases that form the basis for
»catalogues of books or other types of documents
»computerised bibliographies
»address directories
»a full text newspaper, newsletter, magazine, journal + collections of these
»WWW and Internet search engines
»intranet search engines
»...
42
Hints on how to use information sources: overview (Part 1)
• Know the purpose and motivation for each search.
• Do not be lazy: search on your own, before bothering experts with requests for advice.
• Plan your search in advance.
• Choose the best source(s) for each search.
• Use the right tools for each job (a suitable communication program for instance, in the case of online searches).
• Do not focus on a single source.
43
Hints on how to use information sources: overview (Part 2)
• Consider citation indexes besides subject-oriented databases, as useful secondary information sources.
• Use the available tools for subject searching well.
• Try to cope with the language problems.
• Match your search strategy with the type of source.
• In computer-based retrieval systems, combine search terms when appropriate, using
»Boolean operators
»proximity operators (for instance “near”,...)
44
Hints on how to use information sources: overview (Part 3)
• Work cost-effectively.
• Use special care when searching for names.
• Work iteratively.
• Keep a record of your work.
• Be critical: not all information is correct or useful.
• Stop searching when “enough is enough”
• Give up if necessary... (Not all questions have an answer.)
• ...
45
Hints on how to use information sources: subject searching
• When you search for information on a particular topic / subject: investigate whether the database producer offers
»a subject classification scheme, and/or
»a list of controlled / approved / accepted subject terms, and/or
»a subject thesaurus
• Exploit these, if they are available.
• Use synonyms, broader and related terms, if appropriate.
• Use narrower terms also, in most cases.
46
Hints on how to use information sources: Boolean combinations (1)
In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible.
(term x1 OR term x2 OR term x3)
AND
(term y1 OR term y2 OR term y3)
AND
(term z1 OR term z2 OR term z3)
AND ...
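The pattern of OR within a concept and AND between concepts can be sketched as a small helper that builds a generic query string. The function name and the sample terms are invented for illustration. Quoting multi-word terms as phrases is an assumption, since the exact syntax differs from one retrieval system to another.

```python
def build_boolean_query(concepts: list[list[str]]) -> str:
    """OR the terms within each concept, then AND the concepts together."""
    groups = []
    for terms in concepts:
        # Quote multi-word terms so they are treated as phrases (an assumption).
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Illustrative terms only; choose your own per concept.
concepts = [
    ["pollution", "contamination"],
    ["seawater", "sea water", "marine"],
    ["monitoring", "surveillance"],
]
print(build_boolean_query(concepts))
# (pollution OR contamination) AND (seawater OR "sea water" OR marine) AND (monitoring OR surveillance)
```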
47
Hints on how to use information sources: Boolean combinations (2)
Most text search systems understand the basic Boolean operators typed in capital characters:
AND
OR
48
!? Question !? Task !? Problem !?
How many (and which) concepts do you see in a search for
“general reviews about monitoring seawater pollution”?
49
!? Question !? Task !? Problem !?
Prepare off-line, on paper, a suitable search query in a generic format, to find
“general reviews about monitoring seawater pollution” as the basis for later, concrete searches in databases.
(Limit yourself perhaps to 1 concept.)
50
Hints on how to use information sources: work iteratively
Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search.
[Diagram: cycle of Query, Searching, Results, and Feedback]
51
Hints on how to use information sources: work iteratively: example
When you search a database with subject keywords from a controlled list, added to each record:
1. Search with search terms that you know
2. Investigate the results and select good, relevant items
3. Look for the keywords added to these items
4. Select the good, relevant keywords
5. Formulate a new search with these keywords added
6. Execute the new search
7. Repeat the procedure
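The seven steps can be sketched as a loop over a toy in-memory database. All records, keywords, and function names below are invented for illustration. The relevance judgement, which a human searcher makes in practice, is passed in as a function.

```python
# Toy records with controlled keywords; all data is invented for illustration.
RECORDS = [
    {"id": 1, "title": "Monitoring seawater pollution",
     "keywords": {"marine pollution", "monitoring"}},
    {"id": 2, "title": "Oil spills and coastal ecosystems",
     "keywords": {"marine pollution", "oil spills"}},
    {"id": 3, "title": "Tracking oil slicks by satellite",
     "keywords": {"oil spills", "remote sensing"}},
    {"id": 4, "title": "Air quality in cities",
     "keywords": {"air pollution"}},
]


def search(terms: set[str]) -> list[dict]:
    """Return records whose title or keywords match any term (an OR search)."""
    hits = []
    for record in RECORDS:
        text = record["title"].lower()
        if any(t in text or t in record["keywords"] for t in terms):
            hits.append(record)
    return hits


def iterative_search(start_terms: set[str], is_relevant, max_rounds: int = 5) -> set[int]:
    """Search, select relevant items, harvest their keywords, and search again."""
    terms = set(start_terms)
    found: set[int] = set()
    for _ in range(max_rounds):
        relevant = [r for r in search(terms) if is_relevant(r)]   # steps 1-2
        found |= {r["id"] for r in relevant}
        harvested = set().union(*(r["keywords"] for r in relevant))
        new_terms = harvested - terms                              # steps 3-4
        if not new_terms:       # nothing new to add: stop
            break
        terms |= new_terms      # step 5; the next round covers steps 6-7
    return found


def about_the_sea(record: dict) -> bool:
    """Stand-in for the human relevance judgement."""
    return "air pollution" not in record["keywords"]


print(sorted(iterative_search({"seawater"}, about_the_sea)))   # [1, 2, 3]
```

Starting from the single known term "seawater", the loop reaches records 2 and 3 only through keywords harvested from earlier results.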
52
Hints on how to use information sources: when to stop searching?
Develop a feel for the “curve of diminishing returns”:
If you spend too much time, effort, and/or money with too few benefits, you should stop.
[Plot: payoff against time / effort / money; the curve flattens, which raises the question “Time to stop?”]
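One way to make the "curve of diminishing returns" concrete is a simple stopping rule on the number of new relevant items found per search round. The function name and both thresholds are assumptions chosen for illustration, not a recommended standard.

```python
def should_stop(new_hits_per_round: list[int], min_gain: int = 2, patience: int = 2) -> bool:
    """Stop when each of the last `patience` rounds yielded fewer than `min_gain` new items."""
    if len(new_hits_per_round) < patience:
        return False
    return all(n < min_gain for n in new_hits_per_round[-patience:])


print(should_stop([12, 7, 3]))      # False: the last rounds still pay off
print(should_stop([12, 7, 1, 0]))   # True: two rounds in a row with almost nothing new
```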
53
Computer networks, data communication and Internet
54
Data communication: a definition
• Interpersonal communication
» Telecommunication
—Broadcast
—Telephone
—Data communication
–Remote login
–File transfer
–Hypertext transfer
–Electronic mail
–...
55
Data communication: which ‘data’?
Digital information (0 / 1):
• Linear text / Hypertext
• Static images / Video
• Sound
• Programs for computers
• Multimedia / Hypermedia
56
Data communication: applications (Part 1)
• Hard-copy transfer (Fax)
• Online use of the processing power of a remote computer
• Online access to information sources !
»library catalogues,
»bookshop catalogues,
»publishers’ catalogues,
»campus-wide and community information systems,
»(text or multimedia) databases,
»network-based journals, ...
57
Data communication: applications (Part 2)
• Software-downloading
• Electronic mail from a person to one or several persons
• Computer-network based interest groups
• Online talking / chatting (IRC,...)
• Video conferencing (CU-SeeMe, ...)
• Selling, shopping, buying, ...
• ...
58
Data communication: modems
• description: MODulator-DEModulator: device to convert digital data signals into a suitable form for transmission along a telecommunications channel, and to convert them back upon receipt into machine readable form.
• types
»(Acoustic coupler)
»Free standing box
»Board / card to plug into a microcomputer
59
Computer network protocols: definition
• When 2 computer systems communicate via a network, they do that by exchanging messages.
• The structure of network messages varies from network to network.
• Thus the message structure in a particular network is agreed upon a priori and is described in a set of rules, each defined in a protocol.
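A minimal sketch of what "agreeing on a message structure" means: both sides use the same fixed layout to encode and decode a message. The layout below (1 byte for the type, 2 bytes for the length, then the payload) is a hypothetical example invented here and is not a real Internet protocol.

```python
import struct

# Hypothetical message layout, invented for illustration:
#   1 byte  message type
#   2 bytes payload length (big-endian, "network byte order")
#   N bytes payload
HEADER = struct.Struct("!BH")


def encode(msg_type: int, payload: bytes) -> bytes:
    """Sender side: put the agreed header in front of the payload."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(message: bytes) -> tuple[int, bytes]:
    """Receiver side: read the header, then take exactly `length` payload bytes."""
    msg_type, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, payload


wire = encode(1, b"hello")
print(wire)           # b'\x01\x00\x05hello'
print(decode(wire))   # (1, b'hello')
```

If the receiver assumed a different layout, for instance a 4-byte length field, the same bytes would be misread; that is why the structure has to be fixed in advance.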
60
Data communication with a server in a Local Area Network
• (Terminal)
• Microcomputer with serial line communications software /terminal emulation software
• Microcomputer with network card and network software
[Diagram: each of these connects via the network to the network server]
61
!? Question !? Task !? Problem !?
Which applications do you know
of server computers in a LAN?
62
Applications of server computers in a LAN
• Extra personal disk space for the users
• Common files with programs and/or data for many users (e.g. an intranet)
• Making files available from the institute to external users over the Internet (e.g. using ftp, gopher, WWW)
• Executing programs on the server (e.g. using Unix or Windows NT or Windows 2000, in multitasking / multiuser mode)
• Electronic mail servers; Usenet servers;...
63
!? Question !? Task !? Problem !?
Do you have personal disk space available, through the LAN of your institute?
64
National Wide Area Networks
• Public access national packet switching networks
• Research computer networks
• Public access made available by Internet Service Providers
• ...
65
National research computer networks: examples
• Belgium: BELNET
• Finland: FUNET
• Germany: DFN
• The Netherlands: Surfnet
• United Kingdom: JANET (Joint Academic Network)
• ...
66
International computer networks: examples
• National public data communication networks linked together
• Internet
• FidoNet
• Bitnet / EARN
• Usenet
• ...
67
!? Question !? Task !? Problem !?
What is the Internet?
68
The Internet data communications network (Part 1)
• “Internet” is not well-defined.
• A network of smaller networks: the global collection of interconnected local area, regional and wide-area (national backbone) networks which use the TCP/IP suite of data communication protocols.
69
The Internet data communications network (Part 2)
• Links computers of various types.
• Is constantly growing.
• The analogy of a superhighway has been used to describe the emerging system of networked computers.
• The Internet has no owner, and is not managed by one organization.
70
The Internet: access from your Local Area Network
Your microcomputer
Local Area Network (LAN)
One of the national networks
The global Internet
71
Data communication: some services provided by Internet
Service: protocols used
• remote login: telnet
• file transfer services: ftp (or http)
• gopher menus to information space: gopher
• hypertext transfer in WWW !!: http
• WAIS or Z39.50 database searches: WAIS, Z39.50
• electronic mail !: smtp, pop, imap, …
• Usenet News = Netnews: nntp
• ...
72
Host computers in the Internet: definition
• A host (computer) is a domain name that has a unique IP address record associated with it.
• Could be any computer connected to the Internet by any means.
• For instance: www.vub.ac.be
73
Internet data communication layers and protocols (Part 1)
different physical nets
IP = Internet protocol
TCP = host to host transfer control protocol
Application protocols: smtp, pop, imap, nntp, telnet, ftp, gopher, http, cu-seeme, ...
Application programs
USER
74
Internet data communication layers and protocols (Part 2)
Client side (top to bottom): USER, client application programs, application protocols, TCP, IP, computer
Server side (top to bottom): server application programs, application protocols, TCP, IP, computer
Both sides are linked through different physical nets
75
Transmission Control Protocol / Internet Protocol (TCP/IP)
• the main suite of transport protocols used on the Internet for connectivity and transmission of data across heterogeneous systems
• “glue that holds the Internet together”
• an open standard
• available on most Unix systems, VMS and other minicomputer systems, many mainframe and supercomputing systems and some microcomputer and PC systems
76
Internet: addresses of computers with the Domain Name System
• Internet style = Domain name system
• The Internet naming scheme consists of a hierarchical sequence of names from the most specific to the most general (left to right), separated by dots.
computer.subdomain.domain.(country if not USA) OR
n1.n2.n3.n4 where n is a natural number (8-bit)
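Both address forms can be illustrated in a few lines. The first helper splits a domain name into its labels, most specific first. The second combines the four 8-bit numbers of an n1.n2.n3.n4 address into one 32-bit integer and checks the result against Python's standard `ipaddress` module. The function names are invented here, and 192.0.2.1 is a reserved documentation address used as a placeholder.

```python
import ipaddress


def split_domain_name(name: str) -> list[str]:
    """Split a domain name into labels, most specific first."""
    return name.lower().rstrip(".").split(".")


def dotted_quad_to_int(address: str) -> int:
    """Combine the four 8-bit numbers n1.n2.n3.n4 into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid n1.n2.n3.n4 address: {address!r}")
    n1, n2, n3, n4 = parts
    return (n1 << 24) | (n2 << 16) | (n3 << 8) | n4


print(split_domain_name("www.vub.ac.be"))   # ['www', 'vub', 'ac', 'be']

value = dotted_quad_to_int("192.0.2.1")
print(value == int(ipaddress.IPv4Address("192.0.2.1")))   # True
```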
77
Internet: growth in number of hosts worldwide: linear plot
[Plot: number of hosts worldwide, from 0 to 20 000 000, in January of each year from 1993 to 1998]
78
Internet Service Provider = ISP
Internet Service Providers provide their clients with access to the Internet + in many cases
»software tools to start
»training
»technical support
»an accessible location for a WWW site of the client
»assistance with WWW site design and promotion
79
Online communication: remote login and file transfer
80
Remote terminal log-in / access: definition
The ability to access a computer from outside the building in which it is housed. This requires communications hardware, software, and actual physical links, although this can be as simple as common carrier (telephone) lines or as complex as telnet login to another computer across the Internet.
81
Microcomputer -- external computer: some ways of data communication
[Diagram: routes between the microcomputer (internal) and the external computer. Labels: LAN; modem; telephone; voice telecommunication network; ISDN; local PAD; TelePAD; public data communication network; leased, fixed communication line; gateway computer system; private / academic data communication network (e.g. Internet)]
82
Telnet: description
• The Internet standard protocol for remote terminal connection service; on top of the TCP/IP protocol suite
• Allows a user at one site to interact with a remote timesharing system at another site as if the user's terminal was connected directly to the remote computer
• Includes VT100 terminal emulation
83
Data communication: downloading
The electronic transfer of information (whole file or fragments) from one computer to another, generally from a larger computer to a smaller one (such as from a server under Unix to a microcomputer).
84
Data communication: downloading by copying a fragment
Capturing a small fragment of the information displayed:
1. select information on the display,
2. copy, and
3. paste in a document managed by another program.
85
Data communication: file transfer
• Copying + downloading / transfer of a whole file
• Requires a transfer protocol with error correction
86
ftp: file transfer protocol in the Internet
• A high-level protocol = application protocol in the Internet.
• For transferring files from one computer to another (free of errors).
• Provides the capability
»1. to connect to a remote computer,
»2. to execute a few simple tasks (such as listing the directory),
»3. to copy files to or from the remote computer quickly.
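The three capabilities map onto a few calls in Python's standard `ftplib` module. The sketch below only defines a function and does not contact any server. The function name and its parameters are invented for illustration, and anonymous login is an assumption that not every server allows.

```python
from ftplib import FTP


def fetch_file(host: str, directory: str, filename: str, local_path: str) -> list[str]:
    """Connect, list a directory, and download one file in binary mode."""
    with FTP(host) as ftp:          # 1. connect to the remote computer
        ftp.login()                 #    anonymous login (an assumption)
        ftp.cwd(directory)
        names = ftp.nlst()          # 2. a simple task: list the directory
        with open(local_path, "wb") as out:
            # 3. copy a file from the remote computer to the local disk
            ftp.retrbinary(f"RETR {filename}", out.write)
    return names
```

A call would look like `fetch_file("ftp.example.org", "/pub", "README", "README.txt")`, where the host, directory, and file name are placeholders.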
87
World-Wide Web = WWW
88
WWW: example of a welcome page
!? Question !? Task !? Problem !?
Indicate some differences between telnet and the World-Wide Web.
90
URL = Uniform Resource Locator
• = draft standard for specifying an object on the Internet
• the structure is in most cases protocol://computer_address[/path_name/file_name]
• examples:
»telnet://biblio.vub.ac.be
»ftp://ftp.vub.ac.be/
»gopher://gopher.vub.ac.be/
»http://www.vub.ac.be/BIBLIO/index.html
»news:comp.infosystems.www
91
URL: format / structure
1. The first part of a URL, before the colon “:”, specifies the access method = protocol.
2. The second part of the URL, after the colon “:”, is interpreted specific to the access method. In general, two slashes after the colon indicate a machine / computer name.
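The two parts of a URL can be inspected with Python's standard `urllib.parse` module. This small sketch splits the example URLs from the previous slide into access method, computer address, and path. Note that the `news:` URL has no `//` and therefore no computer address.

```python
from urllib.parse import urlsplit

examples = [
    "telnet://biblio.vub.ac.be",
    "ftp://ftp.vub.ac.be/",
    "gopher://gopher.vub.ac.be/",
    "http://www.vub.ac.be/BIBLIO/index.html",
    "news:comp.infosystems.www",
]

for url in examples:
    parts = urlsplit(url)
    # scheme = access method; netloc = computer address (only present after "//")
    print(f"{parts.scheme:7} host={parts.netloc or '-':20} path={parts.path or '-'}")
```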
!? Question !? Task !? Problem !?
What is the difference between Internet and the World-Wide Web?
93
WWW is an Internet application
Data communication
Internet
WWW
94
WWW: the essential elements
• Information delivery and access using hypertext / hypermedia documents / objects
»html documents
»http protocol: between http clients and http servers
• Integration of protocols in the Internet:
»http servers offering html documents including links to other http servers, telnet servers, ftp servers, nntp servers, gopher servers, ...
95
WWW: hyperlinks
Hyperlinks can link a part of a hypermedia document to
• another part of the same document file
• another document file on the same server computer
• another document file on a server computer located elsewhere in the world
[Diagram: hyperlinks between documents on Computer 1 and Computer 2]
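The three kinds of link target can be illustrated by resolving link references against the address of the page that contains them, using `urljoin` from Python's standard library. The fragment name `#contact` and the relative path are hypothetical, chosen only to show the mechanism.

```python
from urllib.parse import urljoin, urlsplit

base = "http://www.vub.ac.be/BIBLIO/index.html"   # page that contains the links

links = [
    "#contact",                # another part of the same document (hypothetical)
    "courses/opsporen/",       # another file on the same server (hypothetical path)
    "http://www.yahoo.com/",   # a file on a server elsewhere
]

for href in links:
    target = urljoin(base, href)
    same_server = urlsplit(target).netloc == urlsplit(base).netloc
    print(f"{href!r:28} -> {target}  (same server: {same_server})")
```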
96
WWW: hypertext mark-up language = HTML
• Hypertext mark-up language = HTML = the system of codes used by authors to build the hypertext pages / files in WWW, for instance to create a title or an anchor.
• The codes are invisible / transparent for the user / reader.
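To make the codes visible, the sketch below feeds a tiny hand-written page to Python's standard `html.parser` and pulls out the title and the anchors. The page content and the class name are invented for illustration.

```python
from html.parser import HTMLParser

# A tiny hand-written page, invented for illustration.
PAGE = """<html><head><title>Library</title></head>
<body><h1>Welcome</h1>
<p>See the <a href="http://www.vub.ac.be/BIBLIO/index.html">library page</a>.</p>
</body></html>"""


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and the href of every <a> anchor."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = TitleAndLinks()
parser.feed(PAGE)
print(parser.title)   # Library
print(parser.links)   # ['http://www.vub.ac.be/BIBLIO/index.html']
```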
97
WWW: hypertext transfer protocol = HTTP
• Hypertext transfer protocol = HTTP = the software conventions used by client and server programs for WWW to request and transfer hypermedia documents.
• The protocol need not be known by the user / reader = the protocol is invisible / transparent for the user.
• Analogous to the telnet, ftp and gopher protocols.
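For the curious, the conventions are plain text. The sketch below formats a minimal HTTP/1.0 request and reads the status line of a response. It works without network access: the response is a canned example written by hand, and both function names are invented for illustration.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.0 GET request as sent over a TCP connection."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into version, status code, and reason."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("www.vub.ac.be", "/BIBLIO/index.html"))

# A canned response, written by hand for illustration.
canned = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(canned))   # ('HTTP/1.0', 200, 'OK')
```

A browser does all of this behind the scenes, which is why the protocol stays invisible to the reader.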
!? Question !? Task !? Problem !?
Briefly compare TCP/IP and HTTP.
99
WWW: pages and forms
• Pages
Many documents developed for WWW are kept small and are named “pages”.
These often refer to several other “pages”.
• Forms = gateways to services and databases on server computers in WWW
Some pages contain electronic forms, to be filled in by the user.
100
WWW: client / browser programs
• To access the WWW, you run a browser program.
• The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers which browsers can get documents from.
• The browser can display hypertext documents. Hypertext is text with pointers to other text. The browsers let you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.
101
WWW: examples of browsers for your own computer
Browsers are available for many platforms; in particular: browsers for Windows + Winsock:
»Netscape Navigator and Communicator
»Microsoft Internet Explorer
»...
102
Netscape Navigator 4 for Windows 95: screen shot
103
MS Internet Explorer 4 for Windows 95: screen shot
104
!? Question !? Task !? Problem !?
Which client program do YOU use or will YOU use
to access the WWW?
105
!? Question !? Task !? Problem !?
Browse the WWW, using an available
browser client program.
106
!? Question !? Task !? Problem !?
Visualise the HTML source code of a WWW page,
using a WWW client program.
107
!? Question !? Task !? Problem !?
Exploit the possibility to open more than one window, using a WWW client program
in Windows.
108
!? Question !? Task !? Problem !?
Why would you want to open more than one window
on WWW servers, using a WWW client program?
109
WWW: applications
Analogous to gopher applications:
• Access to online public access catalogues
• Campus-wide information systems
• Access to subject-oriented information
• Access to computer file archives
• Traveling / navigating through the Internet via linked html-pages
• Access to intranets within institutes / companies
110
WWW: How to save information from WWW?
Information displayed by your WWW browser/client program can be saved,
• by select, copy, paste in another document (and save)
• by saving a complete page to your disk
»in separate files (for instance 1 HTML file + some image files)
»in 1 file, using Microsoft Internet Explorer 5
• by copying the information into an e-mail message that you send to your own e-mail account
111
!? Question !? Task !? Problem !?
Copy some text fragment from WWW and paste it into another document
on your computer.
112
!? Question !? Task !? Problem !?
Save a text from WWW to disk, as HTML,
using a browser program.
113
!? Question !? Task !? Problem !?
Display an HTML file that you have saved
from the WWW to your disk, in a program for word processing.
Is the file displayed properly?
114
!? Question !? Task !? Problem !?
Check if the program that you use can copy a picture from WWW,
so that you can directly paste it into a document
in another program on your PC.
115
!? Question !? Task !? Problem !?
Save a picture from WWW to disk,
using a browser program.
116
!? Question !? Task !? Problem !?
Check if the program that you use for word processing
allows you to insert a picture that you saved to disk
into your word processing document.
117
WWW: How to save an HTML document including pictures?
Saving a complete HTML document including pictures can be done by using the appropriate software.
For instance:
»Microsoft Internet Explorer 4 with Frontpage Express
»Microsoft Internet Explorer 5
»Netscape Page Composer (included in the Netscape software suite)
118
WWW: growing number of WWW servers
[Plot: number of WWW servers, from 0 to 7 000 000, for the years 1993 to 2000]
119
Online access information sources and services
Introduction
120
Primary versus secondary computer sources / systems / services
• Primary sources /systems /services
directly useful
• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services”, ...
121
Online access to information: avoid network traffic jams
To access online information sources in the US from Europe, work when the lines are not saturated
(the morning is better than the afternoon).
122
Internet based information sources: problems
• too much of it / redundancy
• too few of some
• no order / no quality checks / no requirement to register new information offered
• constantly changing / increasing / growing
• authenticity?
123
Internet based information sources: how many? how much?
In 2000:
• about 1 000 million unique URLs in the total Internet
• about 10 terabyte (= 10 000 gigabyte) of text data
124
Types of online access information systems: “free” versus “fee”
Public access information sources free of charge
Fee-based online information services / Databanks
125
Online access information sources and services
Internet-based encyclopedias
126
Encyclopedias accessible through Internet and WWW
Some encyclopedias and dictionaries are available through the WWW free of charge.
127
Encyclopedias accessible through Internet and WWW: examples
• Encyclopædia Britannica including Merriam-Webster dictionary + links to selected WWW sites: http://www.britannica.com/
• Encarta Concise Free Encyclopedia: http://encarta.msn.com/
• A list of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm
• Other lists of encyclopedias on the Internet can be found as a part of browsable Internet directories.
128
Online access information sources and services
Internet directories and indexes
129
Internet: meta-information about Internet information sources
• in printed manuals and guides:
»not always possible to get a copy fast
»costs money to get a copy
»soon out of date
• offered on the WWW!:
»can become overloaded by too many simultaneous users
»cheaper to get a copy than in the case of printed material
» in most cases regularly updated
• (“intelligent agent” software on client PC)
130
Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»subject hypertext directories = subject guides
»key word indexes, generated automatically, for searching
»collections of links or forms to the above
»(multi-threaded search systems)
131
Internet global subject directories: introduction
• They are virtual libraries with open shelves, manually generated, man-made, for browsing.
• They cover only a small number of selected WWW sites.
• They are suitable mainly for broad, not very specific searches that are difficult to formulate in words.
• They can include an index on the contents of the directory (which may be confusing, as they are not real Internet indexes)
132
Internet global subject directories: Yahoo!
• A hypertext global subject directory can be found at http://www.yahoo.com/
and at many other sites, including http://www.yahoo.co.uk/
• Entries are not rated.
• Yahoo! also offers access to an Internet index created by Inktomi, for searching with search statements, when the more limited directory of Yahoo! does not provide results.
• Accessible free of charge.
Example
133
!? Question !? Task !? Problem !?
Try to find Internet sources which are relevant for you,
by using an Internet subject directory.
134
Internet local subject directories: examples in Belgium
• http://yellow.advalvas.be/weblist.html
• http://search.msn.be/exploring/exploring.asp
• The guide developed by the public libraries in Flanders: http://www.bib.vlaanderen.be/webwijzer
Examples
135
Internet indexes: automated search tools
• Several systems allow you to search for and locate many items (addressable resources) in the Internet in a more systematic way than by only browsing/navigating.
• Each of these search systems is based on:
»a database of links to URLs, continuously collected from the Internet by a “robot” and incorporated in their big index, that is “machine-made”
»a search system with a user interface in a WWW form, to allow the user to search through that database
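The two components above can be illustrated with a minimal sketch in Python. It is not how any particular Internet index is built: the pages in `PAGES` are hypothetical stand-ins for what a robot would collect, and the names `tokenize`, `build_index` and `search_all` are invented for this illustration.

```python
from collections import defaultdict
import re

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Monitoring of seawater pollution: a review",
    "http://example.org/b": "Seawater chemistry for beginners",
    "http://example.org/c": "Air pollution monitoring networks",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search_all(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search_all(INDEX, "seawater pollution")))
```

The index is built once, ahead of time, so that a search only has to look up words instead of reading every page again.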
136
Internet indexes: scheme
User searching for Internet based information
Internet client hardware and software
user interface to a search engine Internet information source
Internet index search engine Internet crawler and indexing system
database of Internet files, including an index
137
Internet indexes: AltaVista
Example
The primary search interface can be found in the US:
http://www.altavista.digital.com/
http://www.altavista.com/
http://www.av.com/
(These addresses all lead to the same information.)
Mirror site in UK:
http://www.altavista.co.uk/
138
Internet indexes: AltaVista: features
• Allows full text searching
»of WWW, with a good coverage
»(of Usenet newsgroups archives)
• Allows advanced Boolean searching (in “Advanced” mode)
• Offers relevance ranking of search results
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find images, sounds,… (multimedia) in the Internet
Example
139
!? Question !? Task !? Problem !?
Read the online help, and search AltaVista
in simple search mode.
140
!? Question !? Task !? Problem !?
Read the online help offered by AltaVista
for its advanced search mode, and perform a Boolean search
with more than 1 concept using this advanced search mode.
141
!? Question !? Task !? Problem !?
When is it better to use the advanced search mode
rather than just the simple one?
142
Internet indexes: AltaVista simple versus advanced
• “Simple” is suited for instance for searches
»with only 1 concept expressed as a series of synonyms, narrower terms,... such as a search for a person, a company, an institute,...
»when ranking is important
• “Advanced” is suited for instance for searches
»with more than 1 concept so that an AND combination is useful, besides an OR combination
»when ranking is not important
Example
143
!? Question !? Task !? Problem !?
Develop a suitable search query and apply this with AltaVista to find
“general reviews about monitoring seawater pollution”
Tips: use synonyms, narrower terms, truncation,
and Boolean combinations.
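One possible way to assemble such a query is sketched below in Python: the terms for each concept are combined with OR, and the concepts are then combined with AND. The operator spelling, the `*` truncation sign and the quoting of phrases are assumptions about a generic Boolean syntax, so check the online help of the search system you use. The term lists in `CONCEPTS` and the names `quote` and `build_query` are only illustrations.

```python
def quote(term):
    """Put multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(concepts):
    """OR the terms within each concept, then AND the concepts together."""
    groups = []
    for terms in concepts:
        groups.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(groups)


# One list of synonyms / narrower terms / truncated stems per concept.
CONCEPTS = [
    ["review*", "overview*", "survey*"],
    ["monitor*", "measur*"],
    ["seawater", "sea water", "marine", "ocean*"],
    ["pollut*", "contaminat*"],
]

print(build_query(CONCEPTS))
```

Keeping one list per concept makes it easy to broaden a search (add a synonym to a list) or to narrow it (add a further concept).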
144
Internet indexes: Northern Light
• The search interface can be found at http://www.northernlight.com/
• You can search
»free Web content and
»some other publications (full text articles) (but obtaining the text is not free of charge)
Example
145
Internet indexes: coverage / size of each index
The indexes grow and their “size ranking” is variable. In 1999:
1. AltaVista; Fast = All the Web; Northern Light
2. Google; Infoseek;... indexes based on the Inktomi database: Anzwers; Canada.com; Geocities; GoTo; Hotbot; MSN Web Search; Snap!; search system offered by Yahoo!, besides the index of the Yahoo directory;...
3. Excite; Lycos; Euroseek;...
4. Webcrawler;...
146
!? Question !? Task !? Problem !?
Try to find Internet sources which are relevant for you, by using an Internet index.
147
Internet information sources
Coverage of Internet directories and Internet indexes
A global Internet index
A global Internet directory
148
Global Internet search tools: a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches
Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches
Multi-threaded search engines
• Gets information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
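As a rough illustration of how a multi-threaded search engine might combine what it gets from several directories and indexes, the Python sketch below merges ranked result lists and drops duplicates. The round-robin interleaving is an assumption chosen for simplicity, not the documented behaviour of any real system, and the URLs are hypothetical.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several sources, dropping duplicates.

    Results are interleaved round-robin, so the top hit of every source
    appears before the second hit of any source.
    """
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Hypothetical output of one directory and one index for the same query.
directory_hits = ["http://example.org/guide", "http://example.org/a"]
index_hits = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

print(merge_results([directory_hits, index_hits]))
```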
149
Finding multimedia files on the Internet: introduction
Several public access search systems are available free of charge to search the Internet for multimedia files:
»sound / audio files (music, speeches,...)
»images / pictures (artwork, photos, or both)
»video
150
Finding images on the Internet: introduction
• Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet.
• The search results offer not only links, but also directly small versions of the images (= “thumbnails”)
151
Finding images on the Internet: examples of image search engines
• http://www.altavista.com/ (also audio and video): in the user interface, choose IMAGES instead of the normal text search
• (http://ipix.yahoo.com/)
• http://scour.net/ (also audio and video)
• http://www.ditto.com/
• ...
Examples
152
!? Question !? Task !? Problem !?
Use a specialised search engine to find images
about a particular subject on the Internet.
153
Online access information sources and services
Public access book databases
154
Public access book databases: an overview
• (Databases by publishers.)
• Databases by book distributors / bookshops!
• Online public access library catalogues
• (Databases of computer-based versions of books.)
155
Examples of free public access bibliographic book databases (Part 1)
• Amazon.com (US): http://www.amazon.com/ and http://www.amazon.co.uk/ (note: amazon, NOT amazone)
• Barnes and Noble (US): http://www.bn.com/
• Blackwell’s on the Internet (International): http://www.blackwell.co.uk/
Examples
156
!? Question !? Task !? Problem !?
Search for titles of books which are relevant for you,
using an online database provided by a book publisher or bookshop.
157
Online access information sources and services
Online Public Access Catalogues = OPACs
158
Online access library catalogues: The British Library
• In 1997: http://opac97.bl.uk/
• Accessible online via WWW
• Access free of charge
Example
159
Online access library catalogues: The British Library: screenshot
Example
160
Online access information sources and services
Fee-based online public access information services
161
Fee-based online access services: examples (Part 1)
Name                  Location of the computer(s)
America On Line       U.S.A.
OCLC                  U.S.A.
Ovid Technologies     U.S.A.
Compuserve            U.S.A.
Cambridge             U.S.A., Taiwan, UK
Data-Star             Switzerland
Dialog                U.S.A.
EBSCO                 U.S.A.
...
Examples
162
Fee-based online access services: examples (Part 2)
Location of the computer(s)
U.S.A.
U.S.A.
U.S.A.
U.S.A., The Netherlands,...
Germany - U.S.A. - Japan
The Netherlands
...
Name
Elsevier ScienceDirect
ISI (Web of Science, JCR,…)
MSN (Microsoft)
Prodigy
SilverPlatter
STN
Swets (e-journals)
...
Examples
163
Online information services: total size of their databases
In 1999:
The big host systems and the public access WWW pages offer a comparable quantity of information:
• WWW offers about 8 terabytes (= 8 000 gigabytes) of text data (according to Lawrence and Lee Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998)
»6 billion pages of text
»3 million images
164
Databases of online public access databases
• Examples
»I’M Guide = Information Market Guide
»Gale directory of databases !
• Their coverage
»online access databases
»(databases accessible on CD-ROM)
»...
165
Databases of databases: Gale
• Produced in U.S.A.
• Not free of charge
• Available in various formats:
»printed
»on CD-ROM
»online via the host systems Data-Star, Dialog, with a payment required for each use
»online through the Internet through various hosts, for a fixed price per year to be paid in advance
166
Online access information sources and services
Online access databases about journal articles
167
Online access databases about journal articles: overview
• Thousands of fee-based online access databases offer bibliographies or full texts of journal articles in particular subject domains.
• Only a few large databases offer access to bibliographies of articles published in journals, free of charge.
168
Online access databases about journal articles: Northern Light
• Northern Light allows searching for
»WWW documents
»the full text of articles from many journals/magazines
• Searching is free of charge.
• Available from
http://www.northernlight.com/
http://www.nlsearch.com/
• Payment is required to receive the full text of an article.
169
Online access databases about journal articles: Ingenta (1)
• Ingenta Journals allows you to search a bibliographic database of millions of journal articles, including titles, authors, and in many cases abstracts.
• Searching is free of charge.
170
Online access databases about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Ingenta acquired Uncover in 2000.
• Available from
http://www.ingenta.co.uk/
http://www.ingenta.com/
171
Online access databases about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic database, NOT full-text (Journal articles, Journal issues, Books, Reports or Conferences, doctoral dissertations) at the Institut de l'Information Scientifique et Technique, France.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.
172
Online access databases about journal articles: Eric
• Eric allows searching a bibliographic database of articles and other documents in the fields of information science and education
• Searching is free of charge
• Available from http://ericir.syr.edu/Eric/
• Payment is required to receive the full text of a document.
173
Online access databases about journal articles: Medline
• Medline allows searching a bibliographic database of articles in the field of medicine.
• free of charge
• available from many sites, including Ingenta
174
!? Question !? Task !? Problem !?
Search for titles of journal articles which are relevant for you,
in a database provided free of charge.
175
Online access databases about journal articles: Web of Science
• The Web of Science offers access through the WWW to a database of bibliographic descriptions of scientific journal articles in all subject domains.
• This database is (only) available to members of institutes/companies that pay a yearly fee to the producer/publisher of the database.
• This database is not only suitable for subject searching, but also for citation searching.
!? Question !? Task !? Problem !?
When the Web of Science database is available to you:
use it for subject searching.
177
Online access information sources and services
Future trends
178
Online access information: future trends
• Increasing amount of information available online.
• Increasing quality of server and client software.
• Increasing number of end-users searching for information online.
179
Citation searching
180
Information retrieval: subject searching & citation searching
• Information retrieval is in most cases carried out in the form of searching for a particular subject/concept by using words and terms: subject searching
• A complementary method is searching for relevant documents among the documents cited in a known, identified, relevant, good “seed” document: citation searching
181
Information retrieval: using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information: starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant publications (+ snow-ball effect)
»citation indexes allow you to identify more recent publications which contain citations to that particular document
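The two directions of citation searching can be sketched in Python on a small citation graph. The data in `CITES` are hypothetical, and the functions `cited_by_seed` and `citing` are invented for this illustration; a real citation index holds this kind of data for millions of articles.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
    "new3": ["new1"],
}


def cited_by_seed(cites, seed):
    """Backward (snowball) search: follow reference lists to older work."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            queue.extend(cites.get(doc, []))
    return found


def citing(cites, target):
    """Forward search, as a citation index allows: which documents cite the target?"""
    return sorted(doc for doc, refs in cites.items() if target in refs)


print(cited_by_seed(CITES, "seed"))  # older publications reached from the seed
print(citing(CITES, "seed"))         # more recent publications citing the seed
```

The backward search needs only the reference lists of the documents themselves, while the forward search needs an index over the reference lists of all other documents, which is what a citation index provides.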
182
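Information retrieval: using citations (scheme)
[Scheme: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document into the past; citation indexing leads from the seed document towards the future.]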
183
Information retrieval: citation indexes
Citation indexes are produced by the company I.S.I., the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index
»…
184
Information retrieval: publication formats of citation indexes
• Printed
• On CD-ROM
»for use on a single microcomputer, or
»for implementation on a local network
• Online
» through online access information services
»all together as the Web of Science, implemented on many sites, accessible by all users of an institute/company that pays a fixed fee per year
185
Information retrieval: cited reference searching
An introduction to cited reference searching is offered through WWW by the Institute for Scientific Information:
186
!? Question !? Task !? Problem !?
For which kinds of subjects / research can citation searching
be particularly useful / appropriate?
187
Information retrieval: when is citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that cite a particular document or author, for instance to study the impact in science of particular ideas
»in that document or
»of that author (for instance in literary studies)
• less appropriate when you search for papers on a more general subject
!? Question !? Task !? Problem !?
When the Web of Science database is available to you:
use it for subject searching, and
for citation searching.
189
Document delivery and interlibrary lending
190
Document delivery: related activities and names
• Document delivery
• Document supply
• Interlibrary lending
• Information delivery
• Information supply
• Resource sharing
• Full text retrieval from external databases
• ...
191
Document delivery: some definitions
• Interlibrary lending = lending of books by one library to a requesting library
• Document delivery = Document supply = supply of copies of documents to a requesting person or institute, by libraries as well as by specialized, commercial centers
192
Document delivery: various actors
End-user / Person requesting an item
Institute and library of end-user
Intermediate institute
Library or document delivery service delivering the requested document
Direct delivery, endangering the existence of libraries
193
Document delivery: future trends
• Increasing number of requests for documents.
• Document supply centers supply scanned documents over the networks.
• Electronic journals
»Publishers store their documents on server computers and deliver electronically.
»Intermediaries (for instance traditional journal distributors) collect documents from publishers and store them on computer servers to provide access.
194
Evaluations in information retrieval
195
Evaluations in information retrieval: introduction
• The quality of the results of any search, using any retrieval system, depends on many components / factors.
• These components can be evaluated and modified more or less independently to increase the quality of the results.
196
Evaluations in information retrieval: important factors
• The information retrieval system ( = contents + system)
• The user of the retrieval system and the search strategy applied to the system
Result of a search
197
Evaluations in information retrieval: the simple Boolean model
Boolean model:
# items in database = # items selected + # items not selected
# items selected = # relevant items + # irrelevant items
Binary distinctions: Relevant / Irrelevant; Yes / No; 1 / 0; In / Out
198
Selecting relevant items
Dependent on the aims, independent of the search strategy
Selected and relevant!
Selected but not relevant
Not selected but relevant
Not selected and not relevant
Dependent on the aims and dependent on the search strategy
199
Recall: definition and meaning
• Definition: “Recall” = (# of selected relevant items / total # of relevant items in database) * 100%
• Aim: high recall
• Problem: in most practical cases, the total # of relevant items in a database cannot be measured.
200
Selecting relevant items: recall
Selected and relevant!
Selected but not relevant
Not selected but relevant
Not selected and not relevant
201
Precision: definition and meaning
• Definition: “Precision” = (# of selected relevant items / total # of selected items) * 100%
• Aim: high precision
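The definitions of recall and precision can be checked with a small worked example in Python. The item numbers are made up, and the sketch assumes what is rarely true in practice: that the complete set of relevant items in the database is known. The function name `evaluate` is invented for this illustration.

```python
def evaluate(selected, relevant, database):
    """Count the four groups of items and compute recall and precision in %.

    Assumes that `selected` and `relevant` are subsets of `database`.
    """
    selected, relevant, database = set(selected), set(relevant), set(database)
    hits = selected & relevant             # selected and relevant
    noise = selected - relevant            # selected but not relevant
    missed = relevant - selected           # not selected but relevant
    rest = database - selected - relevant  # not selected and not relevant
    recall = 100.0 * len(hits) / len(relevant) if relevant else None
    precision = 100.0 * len(hits) / len(selected) if selected else None
    return {
        "hits": len(hits),
        "noise": len(noise),
        "missed": len(missed),
        "rest": len(rest),
        "recall": recall,
        "precision": precision,
    }


# A hypothetical database of 20 items, 5 of them relevant, 6 selected by a search.
DATABASE = range(1, 21)
RELEVANT = {1, 2, 3, 4, 5}
SELECTED = {1, 2, 3, 10, 11, 12}

print(evaluate(SELECTED, RELEVANT, DATABASE))
```

In this example the search selects 3 of the 5 relevant items (recall 60%), and 3 of its 6 selected items are relevant (precision 50%).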
202
Selecting relevant items: precision
Selected and relevant!
Selected but not relevant
Not selected but relevant
Not selected and not relevant
203
Relation between recall and precision of searches
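[Plot: recall (0 to 100%) on the vertical axis against precision (0 to 100%) on the horizontal axis. Points represent searches (results). The ideal point, 100% recall together with 100% precision, is impossible to reach in most systems.]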
204
Recall and precision should be considered together
Examples:
• Increase in retrieved number of relevant items may be accompanied by an impractical decrease in precision.
• Precision of a search close to 100% may NOT be ideal, because the recall of the search may be too low. Make the search / query broader to increase recall!
• Poor (low) precision is more noticeable than bad (low) recall.
205
Evaluation in the case of systems offering relevance ranking
• Many modern information retrieval systems offer output with relevance ranking.
• This is more complicated than simple Boolean retrieval, and the simple concepts of recall and precision cannot be applied.
• To compare retrieval systems or search strategies, decide to compare a particular number of the items ranked highest in each output. This leads, for instance, to “first-20 precision”.
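A “first-k precision” can be computed as in the Python sketch below. It divides by k even when fewer than k items are returned; that is one common convention and an assumption here. The ranked lists and relevance judgements are hypothetical, and k = 5 is used instead of 20 only to keep the example short.

```python
def precision_at_k(ranked, relevant, k=20):
    """Percentage of relevant items among the first k ranked results.

    Divides by k, so an output shorter than k items is penalised.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    return 100.0 * sum(1 for item in top if item in relevant) / k


# Hypothetical relevance judgements and the ranked output of two systems.
RELEVANT_DOCS = {"d1", "d3", "d6"}
SYSTEM_A = ["d1", "d2", "d3", "d4", "d5", "d6"]
SYSTEM_B = ["d6", "d1", "d3", "d2", "d4", "d5"]

print(precision_at_k(SYSTEM_A, RELEVANT_DOCS, k=5))
print(precision_at_k(SYSTEM_B, RELEVANT_DOCS, k=5))
```

Both systems return the same six items, but system B ranks the relevant ones higher and therefore scores better on first-5 precision.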
206
!? Question !? Task !? Problem !?
Give examples of retrieval systems that offer relevance ranking.