Searching & Utilizing of Information 陈贵梧 Chen Gui-wu September, 2006.

Post on 15-Dec-2015


Searching & Utilizing of Information

陈贵梧 Chen Gui-wu

September, 2006

Outline

Ⅰ About the Course

Ⅱ Rules of Information

Ⅲ Basic Concepts

Ⅳ Information Sources

Ⅴ Query Languages

Ⅵ Questions & Answers

Questions/comments are always welcome!

Ⅰ About the Course

Goals:

1. Introduce students to the basic academic information sources.

2. Acquaint students with the strategies and techniques of information searching.

3. Develop students' information competency, including information identification, acquisition, evaluation, and utilization.

About the Course

Competencies:

Identify, understand, and effectively use a variety of information sources in print and electronic formats.

Become familiar with, and master, the most commonly used search strategies.

Acquire, organize, and evaluate academic information at a basic level.

Apply information searching techniques in practice.

About the Course

Activities: Lectures & Presentations, Practices, Workgroup, Homework, Examination

Ⅱ Rules of Information

Rule 1: Go Where It Is

Rule 2: The Answer You Get Depends on the Questions You Ask

Rule 3: The Answer Should Match the Information Need

Rule 4: Question Your Answers - Information May Be True But Still Wrong

Rule 5: Research Is a Multi-Stage Process

Ⅲ Basic Concepts

1. Information Chain

2. Database

3. Electronic Journal

4. Fields & Records

5. Search Engine Vs. Web Directory

6. Impact Factor

1. Information Chain

A rough way of measuring the usual materials is to classify them as primary, secondary, or tertiary.

Primary sources

These are original materials which have not been filtered through interpretation, consideration, or, often, even evaluation by a second party.

e.g. a journal article, monograph, report, patent, dissertation, or reprint of an article.

Information Chain

Secondary sources

A secondary source is information about primary, or original, information which usually has been modified, selected, or rearranged for a specific purpose or audience.

e.g. indexes, abstracts

The neat distinction between primary and secondary sources is not always apparent.

Information Chain

Tertiary sources

These consist of information which is a distillation and collection of primary and secondary sources.

Twice removed from the original, they include almost all the source types of reference.

e.g. encyclopedias, reviews, bibliographical sources, fact books, and almanacs.

Information Chain

The definitions of primary, secondary, and tertiary sources are useful only in that they indicate:

relative currency (primary sources tend to be more current than secondary sources), and

relative accuracy of materials (primary sources will generally be more accurate than secondary sources, only because they represent unfiltered, original ideas; but conversely, a secondary source may correct errors in the primary source).

2. Database

A set of related files that is created and managed by a database management system (DBMS).

Today, DBMSs can manage any form of data, including text, images, sound, and video.

Databases can be categorized into index databases, abstracts databases, and full-text databases; or classified as general databases and subject-specific databases.

Database Types

Index Database

An index database lists the author, title, date, volume, and source for an article.

Abstracts Database

An abstracts database gives all of that information, as well as an abstract, which is a short summary of the article.

Full Text Database

Increasingly, databases are including the full-text of articles.

The whole article can be printed, e-mailed, or saved to a disk for later use.

However, not everything is available in full-text.

Database Types

Some are general, such as EBSCO, SDOS, Lexis-Nexis, or UMI ProQuest,

while others are subject-specific, such as BA, CA, Ovid, Medline, or PubMed.

3. Electronic Journal

Any journal available over the Internet can be called an "electronic journal" or "e-journal".

In many cases e-journals are counterparts to familiar print publications, although an increasing number of titles exist only in electronic format.

Frequently e-journals appear on the screen exactly as they do in print with similar page design and typeface.

Electronic Journal

Often e-journal issues may be available before the print counterpart is on the library shelf.

Some e-journals even provide advance copy of articles accepted for publication but not yet scheduled for a print issue.

In most cases, the electronic equivalent of a print journal only exists for the most recent volumes; older issues still need to be read in print.

E-journals Vs. Databases

Full-text e-journals may be viewed as individual issues that correspond to their print counterparts.

Typically, you will use these on the publishers' site where you may browse the table of contents of an issue, scan the abstracts of articles, and view the full-text of articles.

Full-text article databases are collections of articles, not complete issues of journals.

Often these collections bring together articles on a particular subject, such as medicine or biology studies; they are usually searched by subject.

4. Fields & Records

A field is a unit of data. Examples of fields are title (TI), keyword, author (AU), source (SO), address (AD), and abstract (AB).

A collection of fields makes up a record.

In databases, searchable fields are sometimes called search options.
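The relationship between fields, records, and field-restricted searching can be sketched in a few lines of Python. This is only an illustration: the record contents and the `search_field` helper are invented here, and only the field tags (TI, AU, SO, AB) come from the slide.

```python
# Hypothetical bibliographic records; each record is a collection of fields.
records = [
    {"TI": "Transgenic mice as disease models", "AU": "Smith J",
     "SO": "Example Journal A", "AB": "A short summary about mice."},
    {"TI": "Liver function in rats", "AU": "Lee K",
     "SO": "Example Journal B", "AB": "A short summary about transgenic rats."},
]

def search_field(records, field, term):
    """Return the records whose given field contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r.get(field, "").lower()]

# The same term gives different results depending on the field searched.
title_hits = search_field(records, "TI", "transgenic")
abstract_hits = search_field(records, "AB", "transgenic")
print([r["AU"] for r in title_hits])
print([r["AU"] for r in abstract_hits])
```

Choosing the field is what a "search option" amounts to in this sketch: the term is the same, but the title search and the abstract search return different records.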

[Figure: Sample Record, with its Fields labeled]

5. Search Engine Vs. Web Directory

A search engine is a program designed to help find files stored on a computer. The best-known search engine is Google.

A search engine allows one to ask for media content meeting specific criteria (typically content containing a given word or phrase) and retrieves a list of files that match those criteria.

A search engine often uses a previously made, and regularly updated index to look for files after the user has entered search criteria.
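The idea of a previously made index can be illustrated with a small inverted index in Python. This is a minimal sketch under simplifying assumptions (words are split on whitespace, and the file names and contents are invented); real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical file contents standing in for the documents to be searched.
files = {
    "a.txt": "information retrieval basics",
    "b.txt": "searching academic information",
    "c.txt": "web directory listing",
}

def build_index(files):
    """Map each word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

# The index is built once, before any query arrives.
index = build_index(files)

def lookup(index, word):
    """Answer a query from the index, without rescanning the files."""
    return sorted(index.get(word.lower(), set()))

print(lookup(index, "information"))
```

Updating the index "regularly" would correspond to calling `build_index` again whenever the files change.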

Search Engine Vs. Web Directory

A web directory is a directory on the World Wide Web that specializes in linking to other web sites and categorizing those links.

A web directory:
- has a pre-defined list of websites
- is compiled by human editors
- is categorized according to subject/topic
- is selective

Search Engine Vs. Web Directory

Web directories do not rely on automated software programs to gather sites. They often allow site owners to submit their web sites for inclusion. Editors review and organize qualified web sites by subject into categories.

The most popular directories are Yahoo and the Open Directory Project (http://dmoz.org/).

Search Engine Vs. Web Directory

Consider using the Directory instead of search engines whenever you want to:
- Familiarize yourself with a topic.
- Get suggestions for ways to narrow your search.
- Find ideas for query terms.
- Figure out the scope of a given category, e.g., the number of newspapers in California.
- View only pages that have been evaluated by a human editor.

6. Impact Factor

The Journal Impact Factor comes from Journal Citation Reports (JCR), a product of Thomson ISI (Institute for Scientific Information).

It is a measure of the frequency with which the "average article" in a journal has been cited in a particular year.

It is calculated by dividing the number of current citations to articles published in the two previous years by the total number of articles published in the two previous years.

It will help you evaluate a journal’s relative importance, especially when you compare it to others in the same field.
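The calculation described above can be written out as a short Python function. The function name and the numbers in the example are made up for illustration; they do not describe any real journal.

```python
def impact_factor(citations_to_prev_two_years, articles_in_prev_two_years):
    """Citations received this year by articles from the two previous years,
    divided by the number of articles published in those two years."""
    if articles_in_prev_two_years <= 0:
        raise ValueError("article count must be positive")
    return citations_to_prev_two_years / articles_in_prev_two_years

# Made-up example for the year 2006:
# 600 citations in 2006 to articles published in 2004-2005,
# and 200 articles published in 2004-2005.
print(impact_factor(600, 200))
```

With these numbers the "average article" from 2004-2005 was cited three times in 2006.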

Ⅳ Information Sources

General Databases: EBSCO, SDOS, John Wiley Online Journals, ProQuest-ARL

Specific Databases: PubMed, Medline, OVID, MD Consult, Cell Press

Index Databases: BA, CA, SciFinder Scholar, EI, Web of Science (SCI, BP, ISTP)

Internet Resources: BioMed Central, HighWire Press

Search Engines: Google

Web Directories: Yahoo, Open Directory Project

Ⅴ Query Languages

1. Keyword-based Querying

2. Boolean Operators

3. Proximity Operators

4. Other Operators

5. Truncation

6. Parentheses

1. Keyword-based Querying

Single-word queries: what is a word?

A word is a sequence of letters delimited by separators. Some characters may be treated as non-splitting, e.g. the hyphen in "on-line"; the database decides.

Text documents are assumed to be essentially long sequences of words.

The result of a word query is the set of documents containing at least one of the words of the query.

Word queries are intuitive, easy to express, and allow fast ranking. Words can be highlighted in the output. The exact positions where a word appears in the text may be required.
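A word query of this kind can be sketched in Python as follows. The documents are invented, and the tokenizing rule (letters form words, a hyphen between letters does not split) is just one possible choice; as noted above, each database decides for itself.

```python
import re

# Hypothetical documents.
docs = {
    1: "On-line databases enhance retrieval.",
    2: "Print sources remain useful.",
    3: "Retrieval of print and on-line sources.",
}

# Assumption: letters form words, and a hyphen between letters is non-splitting.
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")

def tokenize(text):
    """Split a text into lowercase words."""
    return WORD.findall(text.lower())

def word_query(docs, query):
    """Return ids of documents containing at least one word of the query."""
    wanted = set(tokenize(query))
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted & set(tokenize(text)))

print(word_query(docs, "enhance"))
print(word_query(docs, "on-line print"))
print(word_query(docs, "line"))  # "on-line" is one word here, so "line" alone finds nothing
```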

Keyword-based Querying

Context queries: ensure that the words are related.

Phrase: a sequence of words; normally, the exact phrase must be matched.

“enhance retrieval”

Some systems allow separators and stopwords within the phrase, so the query above also matches “enhance the retrieval”.

Proximity: a sequence of single words or phrases is given, together with a maximum allowed distance between them.

“enhance the quality of information retrieval”

Distance may be measured in words or letters. The words may be required to appear in the same order as in the query, or not.
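Phrase matching that tolerates separators and stopwords can be sketched like this in Python. The stopword list and the helper names are assumptions made for the example; actual systems use their own lists and rules.

```python
import re

# Assumed, illustrative stopword list.
STOPWORDS = {"the", "of", "a", "an"}

def content_words(text):
    """Lowercase the text, split on non-letters, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def phrase_match(text, phrase):
    """True if the phrase's words occur consecutively and in order in the text."""
    words, target = content_words(text), content_words(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))

print(phrase_match("We enhance the retrieval step.", "enhance retrieval"))
print(phrase_match("Retrieval can enhance results.", "enhance retrieval"))
```

Because stopwords and punctuation are removed before comparison, “enhance retrieval” matches a text containing “enhance the retrieval”, while the same two words in a different order do not match.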

2. Boolean Operators : AND

AND is used to find documents which contain both of the search terms linked by the operator and to eliminate documents which contain only one or neither of the search terms.

e1 AND e2

-- select all documents which satisfy both e1 and e2

e.g., to find documents on transgenic mice:

transgenic AND mice

Boolean Operators : OR

OR is used to find documents which contain either one or both of the search terms:

e1 OR e2

-- select all documents which satisfy e1 or e2

e.g., to find all documents referring to the kidney, the liver or both organs:

kidney OR liver

Boolean Operators : NOT

NOT is used to exclude documents from a retrieved set.

e1 NOT e2

-- select all documents which satisfy e1 but not e2

e.g., to find documents on rodents which do not deal with rats:

rodents NOT rats
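The three Boolean operators correspond to set intersection, union, and difference. The following Python sketch shows this on a handful of invented documents; the `having` helper is hypothetical and matches whole words only.

```python
# Hypothetical documents.
docs = {
    1: "transgenic mice in kidney research",
    2: "liver disease in rats and other rodents",
    3: "wild rodents such as mice",
    4: "transgenic plants",
}

def having(term):
    """Set of ids of documents whose text contains the term as a word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

and_hits = having("transgenic") & having("mice")  # transgenic AND mice
or_hits = having("kidney") | having("liver")      # kidney OR liver
not_hits = having("rodents") - having("rats")     # rodents NOT rats

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```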

3. Proximity Operators

Proximity operators allow you to locate one word within a certain distance of another. The symbols generally used in this type of search are w and n.

The w stands for "within" and the n stands for "near." This type of search is not available in all databases.

This can be useful to narrow down a search when searching for a sequence of words, if the exact sequence is not known, or if no other means is available to indicate that the keywords should be treated as a phrase.

Proximity Operators

Near Operator (Nx) -- finds words within x number of words from each other, regardless of the order in which they occur.

e.g.: television n2 violence would find "television violence" or "violence on television," but not "television may be the culprit in recent high school violence."

Within Operator (Wx) -- finds words within x number of words from each other, in the order they are entered in the search.

e.g.: Franklin w2 Roosevelt would find "Franklin Roosevelt" or "Franklin Delano Roosevelt" or "Franklin D. Roosevelt", but would not find "Roosevelt Franklin".
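The two operators can be approximated in Python as shown below. This sketch assumes that "within x words" means the two terms are at most x word positions apart; databases differ in exactly how they count, so treat the rule and the function names as illustrative only.

```python
import re

def tokens(text):
    """Lowercase words of the text, split on non-letters."""
    return re.findall(r"[a-z]+", text.lower())

def positions(words, term):
    """Indexes at which the term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term.lower()]

def near(text, a, b, x):
    """a Nx b: the terms are at most x positions apart, in either order."""
    words = tokens(text)
    return any(abs(i - j) <= x
               for i in positions(words, a) for j in positions(words, b))

def within(text, a, b, x):
    """a Wx b: b follows a, at most x positions later."""
    words = tokens(text)
    return any(0 < j - i <= x
               for i in positions(words, a) for j in positions(words, b))

print(near("violence on television", "television", "violence", 2))
print(within("Franklin D. Roosevelt", "Franklin", "Roosevelt", 2))
print(within("Roosevelt Franklin", "Franklin", "Roosevelt", 2))
```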

4. Other operators

A number of other operators are commonly permitted by retrieval systems and can be used to refine searches carried out in the simple search mode.

The most useful of these are:

+ - ~ “”

(to be discussed in Google Searching)

5. Truncation

Certain symbols, often * or ?, may be used in some search systems as wildcards to signify one or more characters.

Their use is most frequently permitted only for truncation at the end of a search term.

e.g. sul*ur will retrieve both sulfur and sulphur, while sulph* will retrieve sulphuric, sulphurous, sulphate, sulphite, etc., but not sulfuric, sulfurous, sulfate, sulfite, etc.

Truncation can result in too many irrelevant retrievals.

e.g., the truncated term diet* will retrieve documents containing the words diet, dietary, dietetic, dietician, but also any references to diethyl compounds.
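The effect of truncation can be tried out with Python's standard `fnmatch` module, in which `*` matches any run of characters, including none. That is an assumption about wildcard behavior chosen to agree with the examples above (diet* retrieving diet); individual search systems may define their symbols differently. The word list is invented.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary of indexed words.
vocabulary = ["sulfur", "sulphur", "sulphate", "sulfate",
              "diet", "dietary", "diethyl"]

def expand(pattern, words):
    """Return the words matched by a pattern in which * stands for any run of characters."""
    return [w for w in words if fnmatchcase(w, pattern)]

print(expand("sul*ur", vocabulary))
print(expand("sulph*", vocabulary))
print(expand("diet*", vocabulary))  # also picks up the unrelated "diethyl"
```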

6. Parentheses

The operators within a pair of parentheses are treated as a single unit which is processed first.

e.g. to find documents which mention cell culture or tissue culture:

cultur* AND (cell* OR tissue)

Note the use of truncation to cover variants such as cultured, culturing, cultures and cells.
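Why the parentheses matter can be seen by evaluating the query with and without them. The Python sketch below uses invented documents and a hypothetical `having_prefix` helper for the truncated terms; the ungrouped version assumes a system that applies AND before OR, which is common but not universal.

```python
# Hypothetical documents.
docs = {
    1: "cell culture methods",
    2: "tissue cultures in vitro",
    3: "tissue damage after injury",
    4: "cultural studies",
}

def having_prefix(prefix):
    """Ids of documents containing a word that starts with the prefix (truncation)."""
    return {d for d, text in docs.items()
            if any(w.startswith(prefix) for w in text.split())}

def having(term):
    """Ids of documents containing the exact word."""
    return {d for d, text in docs.items() if term in text.split()}

cultur, cell, tissue = having_prefix("cultur"), having_prefix("cell"), having("tissue")

grouped = cultur & (cell | tissue)    # cultur* AND (cell* OR tissue)
ungrouped = (cultur & cell) | tissue  # AND applied first when parentheses are omitted

print(sorted(grouped), sorted(ungrouped))
```

Without the parentheses, document 3 is retrieved even though it never mentions culture.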

Ⅵ Questions?

Email: chengw21th@sina.com

(MSN) BernersChen@hotmail.com

Tel: 85220285

Thank you!