Assessing The Retrieval


A.I. Lab, 2007.01.20, 박동훈

Contents

• 4.1 Personal Assessment of Relevance
• 4.2 Extending the Dialog with RelFbk
• 4.3 Aggregated Assessment : Search Engine Performance
• 4.4 RAVE : A Relevance Assessment Vehicle
• 4.5 Summary

4.1 Personal Assessment of Relevance

• 4.1.1 Cognitive Assumptions
  – Users trying to do "object recognition"
  – Comparison with respect to a prototypic document
  – Reliability of user opinions?
  – Relevance scale
  – RelFbk is nonmetric

Relevance Scale

• Users naturally provide only preference information

• Not a (metric) measurement of how relevant a retrieved document is!

RelFbk is nonmetric

4.2 Extending the Dialog with RelFbk

RelFbk Labeling of the Retr Set

Query Session, Linked by RelFbk

4.2.1 Using RelFbk for Query Refinement

4.2.2 Document Modifications due to RelFbk

• Fig 4.7
• Change the documents!?
• Make each document more/less like the query that successfully/unsuccessfully matches it

4.3 Aggregated Assessment : Search Engine Performance

• 4.3.1 Underlying Assumptions
  – RelFbk(q,di) assessments are independent
  – Users' opinions will all agree with a single 'omniscient' expert's

4.3.2 Consensual relevance

(Figure: the consensually relevant documents)

4.3.4 Basic Measures

• Relevant versus Retrieved Sets

Contingency table

                  Relevant        ¬Relevant
  Retrieved       Rel ∩ Retr      ¬Rel ∩ Retr      NRet
  ¬Retrieved      Rel ∩ ¬Retr     ¬Rel ∩ ¬Retr     NNRet
                  NRel            NNRel            NDoc

• NRel : the number of relevant documents

• NNRel : the number of irrelevant documents

• NDoc : the total number of documents

• NRet : the number of retrieved documents

• NNRet : the number of documents not retrieved

4.3.4 Basic Measures (cont)

• Precision = |Rel ∩ Retr| / NRet

• Recall = |Rel ∩ Retr| / NRel

4.3.4 Basic Measures (cont)

• Fallout = |¬Rel ∩ Retr| / NNRel

4.3.5 Ordering the Retr set

• Each document is assigned a hitlist rank Rank(di), in descending order of Match(q,di)
  – Rank(di) < Rank(dj) ⇔ Match(q,di) > Match(q,dj)
  – Rank(di) < Rank(dj) ⇔ Pr(Rel(di)) > Pr(Rel(dj))
• Coordination level: a document's rank in Retr is given by the number of keywords it shares with the query
• Goal: the Probability Ranking Principle (see the sketch below)
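
The ranking rule itself is easy to sketch. Below is an illustrative Python fragment, assuming a toy corpus format (a dict mapping doc_id to a term list), that uses coordination level as Match(q, di) and sorts the hitlist in descending match order; it is not the book's code.

```python
def coordination_match(query_terms, doc_terms):
    """Coordination level: number of keywords shared by document and query."""
    return len(set(query_terms) & set(doc_terms))


def rank_hitlist(query_terms, docs):
    """Return (doc_id, match) pairs in descending Match(q, di) order.

    docs: mapping doc_id -> iterable of terms (an illustrative corpus format).
    Rank(di) < Rank(dj)  <=>  Match(q, di) > Match(q, dj).
    """
    scored = [(doc_id, coordination_match(query_terms, terms))
              for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {"d1": ["cat", "dog"], "d2": ["cat", "dog", "fish"], "d3": ["bird"]}
print(rank_hitlist(["cat", "fish"], docs))   # d2 first (2 shared terms)
```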

• A tale of two retrievals

(Figures: hitlists for Query 1 and Query 2; recall/precision curves for each query; the retrieval envelope)

4.3.6 Normalized recall

• ri : hitlist rank of the i-th relevant document

(Figure: worst-case and best-case orderings of the relevant documents)
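
The slide gives only the symbols; normalized recall is commonly written as Rnorm = 1 − (Σri − Σi) / (n·(NDoc − n)), where r1..rn are the hitlist ranks of the n relevant documents. The sketch below assumes that formula; the names are illustrative.

```python
def normalized_recall(relevant_ranks, ndoc):
    """Normalized recall, assuming Rnorm = 1 - (sum(r_i) - sum(i)) / (n * (NDoc - n)).

    relevant_ranks : hitlist ranks (1-based) of the relevant documents
    ndoc           : total number of documents NDoc
    """
    n = len(relevant_ranks)
    if n == 0 or n == ndoc:
        return 1.0                    # degenerate cases: nothing to reorder
    actual = sum(relevant_ranks)
    ideal = sum(range(1, n + 1))      # best case: relevant docs at ranks 1..n
    return 1.0 - (actual - ideal) / (n * (ndoc - n))


print(normalized_recall([1, 2, 3], ndoc=10))   # best case  -> 1.0
print(normalized_recall([8, 9, 10], ndoc=10))  # worst case -> 0.0
```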

4.3.8 One-Parameter Criteria

• Combining recall and precision
• Classification accuracy
• Sliding ratio
• Point alienation

Combining recall and precision

• F-measure
  – [Jardine & van Rijsbergen, 1971]
  – [Lewis & Gale, 1994]

• Effectiveness
  – [van Rijsbergen, 1979]

• E = 1 − F, with α = 1/(β² + 1)
• α = 0.5 ⇒ harmonic mean of precision & recall

F = (β² + 1) · Precision · Recall / (β² · Precision + Recall)

E = 1 − 1 / (α · (1/Precision) + (1 − α) · (1/Recall))
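
As a quick numeric check of the F and E definitions reconstructed above, here is a small Python sketch; β and α are free parameters, and the example values are made up.

```python
def f_measure(precision, recall, beta=1.0):
    """F = (beta^2 + 1) * P * R / (beta^2 * P + R); beta = 1 gives the harmonic mean."""
    denom = beta ** 2 * precision + recall
    return (beta ** 2 + 1) * precision * recall / denom if denom else 0.0


def effectiveness(precision, recall, alpha=0.5):
    """van Rijsbergen's E = 1 - 1 / (alpha/P + (1 - alpha)/R); E = 1 - F when alpha = 1/(beta^2 + 1)."""
    if precision == 0 or recall == 0:
        return 1.0
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)


p, r = 0.75, 0.6
print(f_measure(p, r))              # harmonic mean = 2*0.75*0.6/1.35 ≈ 0.667
print(1.0 - effectiveness(p, r))    # same value: E = 1 - F at alpha = 0.5 (beta = 1)
```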

Classification accuracy

• Accuracy
• Correct identification of relevant and irrelevant documents

Accuracy = (|Rel ∩ Retr| + |¬Rel ∩ ¬Retr|) / NDoc
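
Expressed over sets of document IDs, accuracy credits both kinds of correct decision (relevant retrieved and irrelevant not retrieved). A minimal sketch, with illustrative names:

```python
def classification_accuracy(relevant, retrieved, all_docs):
    """Accuracy = (|Rel ∩ Retr| + |¬Rel ∩ ¬Retr|) / NDoc, over sets of document IDs."""
    rel_retr = len(relevant & retrieved)               # relevant and retrieved
    nrel_nretr = len(all_docs - relevant - retrieved)  # irrelevant and not retrieved
    return (rel_retr + nrel_nretr) / len(all_docs)


all_docs = set(range(1, 101))
print(classification_accuracy({1, 2, 3, 7, 9}, {1, 2, 3, 4}, all_docs))  # (3 + 94) / 100 = 0.97
```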

Sliding ratio

• Imagine a nonbinary, metric Rel(di) measure
• Rank1, Rank2 computed by two separate systems (a sketch follows below)
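
One common reading of the sliding ratio (hedged here, since the slide gives no formula) is the ratio, at each hitlist cutoff, of the cumulative metric relevance accumulated by one ranking to that accumulated by the other. The sketch below assumes that reading; all names are illustrative.

```python
def sliding_ratio(rel, rank1, rank2, cutoff):
    """Ratio of cumulative relevance in rank1 vs. rank2 over the top `cutoff` positions.

    rel          : dict doc_id -> nonbinary relevance score Rel(di)
    rank1, rank2 : the two systems' hitlists (lists of doc_ids, best first)
    """
    top1 = sum(rel.get(d, 0.0) for d in rank1[:cutoff])
    top2 = sum(rel.get(d, 0.0) for d in rank2[:cutoff])
    return top1 / top2 if top2 else float("inf")


rel = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
print(sliding_ratio(rel, ["a", "b", "c", "d"], ["d", "c", "b", "a"], cutoff=2))  # 5.0 / 1.0 = 5.0
```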

Point alienation

• Developed to measure human preference data
• Capturing the fundamental nonmetric nature of RelFbk

4.3.9 Test corpora

• More data required for a "test corpus"
• Standard test corpora
• TREC: Text REtrieval Conference
• TREC's refined queries
• TREC constantly expanding, refining tasks

More data required for “test corpus”

• Documents
• Queries
• Relevance assessments Rel(q,d)
• Perhaps other data too
  – Classification data (Reuters)
  – Hypertext graph structure (EB5)

Standard test corpora

TREC constantly expanding, refining tasks

• Ad hoc query task
• Routing/filtering task
• Interactive task

Other Measure

• Expected search length (ESL)
  – Length of the "path" as the user walks down the hitlist
  – ESL = number of irrelevant documents seen before each relevant document
  – ESL for random retrieval
  – ESL reduction factor
  (a simplified sketch follows below)
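
A simplified sketch of the search-length idea, assuming a fixed ranking: count the irrelevant documents passed before the desired number of relevant ones is found. The full expected search length also averages over random orderings of documents with tied match scores, which this sketch omits; the names are illustrative.

```python
def search_length(hitlist, relevant, wanted):
    """Number of irrelevant documents seen before finding `wanted` relevant ones.

    A simplified, deterministic version of expected search length: the full ESL
    also averages over random orderings of documents with tied match scores.
    """
    found = irrelevant_seen = 0
    for doc in hitlist:
        if doc in relevant:
            found += 1
            if found == wanted:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen   # ran out of hitlist before finding enough relevant docs


print(search_length(["d3", "d1", "d4", "d2"], relevant={"d1", "d2"}, wanted=2))  # 2 irrelevant seen
```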

4.5 Summary

• Discussed both metric and nonmetric relevance feedback

• The difficulties in getting users to provide relevance judgments for documents in the retrieved set

• Quantified several measures of system performance