Iterative residual rescaling: An analysis and generalization of LSI
Rie Kubota Ando & Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. In Proceedings of the 24th Annual International ACM SIGIR Conference (SIGIR 2001), 2001.
Presenter: 游斯涵
Introduction
• The disadvantage of the VSM:
  – Documents that do not share terms are mapped to orthogonal vectors even if they are clearly related. For example, $d_i = (0,1,0)$ and $d_j = (1,0,1)$ have inner product 0, so the VSM judges them completely dissimilar.
• LSI attempts to overcome this shortcoming by projecting the term-document matrix onto a lower-dimensional subspace.
Introduction of IRR
• LSI: the weighted term-document matrix $A$ (terms $\times$ documents) is decomposed by the SVD into $U \Sigma V^T$; the leading left singular vectors (eigenvectors of $AA^T$, with the squared singular values as eigenvalues) define the subspace.
• IRR: the same construction, except that the singular vectors are computed one at a time, with the residual vectors rescaled between iterations.
Frobenius norm and matrix 2-norm
For $X \in \mathbb{R}^{m \times n}$ with $h = rank(X)$:
• Frobenius norm: $\|X\|_F \overset{def}{=} \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} X[i,j]^2}$
• 2-norm: $\|X\|_2 \overset{def}{=} \max_{\|y\|_2 = 1} \|Xy\|_2$
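As a quick numerical sanity check (my own addition, not from the slides), both norms can be computed directly with NumPy and compared against `np.linalg.norm`; the matrix `X` below is an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # an arbitrary m-by-n matrix

# Frobenius norm: square root of the sum of all squared entries.
fro = np.sqrt((X ** 2).sum())

# 2-norm: max over unit vectors y of ||Xy||, i.e. the largest singular value.
two = np.linalg.svd(X, compute_uv=False)[0]

assert np.isclose(fro, np.linalg.norm(X, 'fro'))
assert np.isclose(two, np.linalg.norm(X, 2))
assert two <= fro + 1e-12  # the 2-norm never exceeds the Frobenius norm
```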
Analyzing LSI
• Topic-based similarities
  – C: an n-document collection
  – D: the m-by-n term-document matrix
  – k: the number of underlying topics (k < n)
  – Relevance score $rel(t,d)$ for each document $d$ and each topic $t$, normalized so that for each document
    $\sum_{t \in topics(C)} rel(t,d)^2 = 1$
  – True topic-based similarity between $d$ and $d'$:
    $sim(d,d') = \sum_{t \in topics(C)} rel(t,d)\, rel(t,d')$
  – Collecting the entries $S[d,d'] = sim(d,d')$, we get an n-by-n similarity matrix S.
(Figure: S depicted as a topic-by-document relevance matrix multiplied by its transpose.)
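The construction above can be sketched in NumPy; the relevance scores below are made-up random values for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 8                      # k topics, n documents
R = rng.random((k, n))           # hypothetical relevance scores rel(t, d)

# Normalize so that sum_t rel(t, d)^2 = 1 for every document d.
R /= np.linalg.norm(R, axis=0)

# sim(d, d') = sum_t rel(t, d) * rel(t, d')  ->  S = R^T R (n-by-n).
S = R.T @ R

assert np.allclose(np.diag(S), 1.0)   # each document is fully similar to itself
assert np.allclose(S, S.T)            # S is symmetric
```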
The optimum subspace
• Given a subspace $\Theta$ of $\mathbb{R}^m$, let $B$ be a matrix whose columns form an orthonormal basis of $\Theta$.
• Projection onto $\Theta$: $P(x) \overset{def}{=} BB^T x$
• $\cos(\Theta, x) \overset{def}{=} \|B^T x\| / \|x\|$
  For $x$ with $P(x) \neq 0$, $\cos(\Theta, x) = \cos(P(x), x)$; for $P(x) = 0$, $x$ is orthogonal to $\Theta$.
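A minimal sketch of these projection facts, using a random 2-dimensional subspace (my own example, with the basis obtained by QR factorization):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
# Orthonormal basis B of a 2-dimensional subspace of R^m (via QR).
B, _ = np.linalg.qr(rng.standard_normal((m, 2)))

def project(x):
    """P(x) = B B^T x: orthogonal projection onto the span of B's columns."""
    return B @ (B.T @ x)

x = rng.standard_normal(m)
p = project(x)

assert np.allclose(project(p), p)      # a projection is idempotent
assert abs((x - p) @ p) < 1e-9         # the residual is orthogonal to the subspace

# cos(Theta, x) = ||B^T x|| / ||x|| agrees with the cosine between x and P(x).
cos1 = np.linalg.norm(B.T @ x) / np.linalg.norm(x)
cos2 = (x @ p) / (np.linalg.norm(x) * np.linalg.norm(p))
assert np.isclose(cos1, cos2)
```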
The optimum subspace
• We have the m-by-n term-document matrix $D = [d_1\ d_2\ \ldots\ d_n]$.
• The projection of D onto $\Theta$ is $P(D) = [P(d_1)\ \ldots\ P(d_n)]$.
• $\cos(P(d_i), P(d_j)) = \dfrac{P(d_i)^T P(d_j)}{\|P(d_i)\|\,\|P(d_j)\|}$
The optimum subspace
• Deviation matrix: $diff_{S,D}(\Theta) \overset{def}{=} S - P(D)^T P(D)$
  We seek a subspace for which the entries of this matrix are small.
• The optimum subspace: $\Theta_{opt} = \arg\min_{\Theta \subseteq range(D)} \|diff_{S,D}(\Theta)\|_2$
• Optimum error: $\|diff_{S,D}(\Theta_{opt})\|_2$
  If the optimum error is high, then we cannot expect even the optimum subspace to fully reveal the topic dominances.
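A toy experiment of my own (not from the paper) that evaluates the deviation $\|S - P(D)^T P(D)\|_2$ on a synthetic single-topic collection; for tractability the search is restricted to subspaces spanned by leading left singular vectors, rather than the true argmin over all subspaces:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 20, 10, 2
# Toy single-topic collection: documents cluster around k topic directions.
topics = rng.random((m, k))
assign = rng.integers(0, k, size=n)
D = topics[:, assign] + 0.1 * rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)                    # length-normalize documents
R = np.eye(k)[:, assign].astype(float)            # rel(t, d): one topic per doc
S = R.T @ R                                       # true topic-based similarity

U, _, _ = np.linalg.svd(D)

def deviation(h):
    """||diff_{S,D}(Theta)||_2 for the h-dim span of leading left singular vectors."""
    B = U[:, :h]
    PD = B @ (B.T @ D)
    return np.linalg.norm(S - PD.T @ PD, 2)

# The deviation tends to be smallest near the true number of topics.
errs = {h: deviation(h) for h in range(1, n + 1)}
best_h = min(errs, key=errs.get)
```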
The singular value decomposition and LSI
• SVD: $Z = U \Sigma V^T$
  The left singular vectors span Z's range, and $\sigma_1 = \|Z\|_2$.
• Intuition for the left singular vectors, via the following observation: let $proj^{(i)}(d_j)$ be the projection of $d_j$ onto the span of $u_1, u_2, \ldots, u_i$, and let $r_j^{(i)}$ be the residual vector
  $r_j^{(i)} = d_j - proj^{(i-1)}(d_j)$
  Each $u_i$ is the unit vector onto which the residuals have the largest total squared projection.
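These properties are easy to verify numerically; the following is a sanity check of my own on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(Z)

# The largest singular value equals the matrix 2-norm.
assert np.isclose(s[0], np.linalg.norm(Z, 2))

# u_1 maximizes the sum of squared projections of the columns:
# sum_j (z_j . x)^2 = ||Z^T x||^2, which equals sigma_1^2 at x = u_1.
def score(x):
    return np.linalg.norm(Z.T @ x) ** 2

assert np.isclose(score(U[:, 0]), s[0] ** 2)
for _ in range(200):                  # random unit vectors never beat u_1
    x = rng.standard_normal(6)
    x /= np.linalg.norm(x)
    assert score(x) <= s[0] ** 2 + 1e-9

# After removing the span of u_1, the residual matrix's top singular
# value is sigma_2.
R = Z - np.outer(U[:, 0], U[:, 0]) @ Z
assert np.isclose(np.linalg.norm(R, 2), s[1])
```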
Analysis of LSI
Non-uniformity and LSI
• A crucial quantity in our analysis is the dominance $\delta_t$ of a given topic $t$:
  $\delta_t = \sum_{d \in C} rel(t,d)^2$
Non-uniformity and LSI
• Topic mingling:
  $\mu(C) = \sum_{t,t' \in topics(C),\, t \neq t'}\ \sum_{d \in C} \big( rel(t,d)\, rel(t',d) \big)^2$
• If the topic mingling is high, each document has high similarity to several different topics, and the topics will be fairly difficult to distinguish.
• In the single-topic-document case, $\mu(C) = 0$.
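Both quantities are straightforward to compute from a relevance matrix; the scores below are hypothetical values for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
k, n = 3, 6
R = rng.random((k, n))
R /= np.linalg.norm(R, axis=0)        # sum_t rel(t,d)^2 = 1 per document

# Topic dominance: delta_t = sum_d rel(t,d)^2.
dominance = (R ** 2).sum(axis=1)
assert np.isclose(dominance.sum(), n)  # dominances always sum to n

# Topic mingling: mu(C) = sum_{t != t'} sum_d (rel(t,d) rel(t',d))^2.
mingling = sum(
    ((R[t] * R[t2]) ** 2).sum()
    for t in range(k) for t2 in range(k) if t != t2
)

# Single-topic documents (one nonzero rel per column) give mu(C) = 0.
R1 = np.eye(k)[:, rng.integers(0, k, size=n)].astype(float)
m1 = sum(((R1[t] * R1[t2]) ** 2).sum()
         for t in range(k) for t2 in range(k) if t != t2)
assert m1 == 0.0
```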
Non-uniformity and LSI
• Let $\lambda_i$ be the ith largest singular value of $P_{opt}(D)$. Then
  $\lambda_i^2 \approx \delta_i$
  (the ith largest topic dominance), with approximation error controlled by the optimum error and the topic mingling $\mu(C)$.
• Recall the quantities involved:
  $\delta_t = \sum_{d \in C} rel(t,d)^2$, $diff_{S,D}(\Theta) = S - P(D)^T P(D)$, the optimum error $\|diff_{S,D}(\Theta_{opt})\|_2$,
  $\mu(C) = \sum_{t \neq t'} \sum_{d \in C} (rel(t,d)\, rel(t',d))^2$, and $\sum_{i=1}^{n} \sum_t rel(t,d_i)^2 = n$.
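In the idealized case where every document is exactly its topic's unit vector, both the mingling and the optimum error are zero, and $\lambda_i^2 = \delta_i$ holds exactly; a small constructed example (my own, not from the paper):

```python
import numpy as np

# Idealized single-topic collection: each document is the unit vector of
# its topic, so rel(t,d) is 0/1, mu(C) = 0 and the optimum error is 0.
counts = [5, 3, 2]                     # documents per topic = topic dominances
m = len(counts)
cols = [np.eye(m)[:, t] for t, c in enumerate(counts) for _ in range(c)]
D = np.stack(cols, axis=1)             # m-by-n term-document matrix

lam = np.linalg.svd(D, compute_uv=False)

# lambda_i^2 equals the ith largest dominance delta_i exactly here.
assert np.allclose(np.sort(lam ** 2)[::-1], sorted(counts, reverse=True))
```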
Non-uniformity and LSI
• Define $\delta_{max} = \delta_1$ and $\delta_{min} = \delta_h$, where h is the dimension of $\Theta_{opt}$.
• We can then form the non-uniformity ratio $\delta_{max} / \delta_{min}$:
  the more the largest topic dominates the collection, the higher this ratio will tend to be.
Non-uniformity and LSI
• Original error:
  $E_{VSM} = \|S - \tilde{A}^T \tilde{A}\|_2$
  Let $\Theta_{VSM}$ denote the VSM space; then $P_{\Theta_{VSM}}(\tilde{A}) = \tilde{A}$, so $E_{VSM}$ is exactly the deviation incurred at $\Theta = \Theta_{VSM}$.
• Root original error ("input error"): $\hat{\varepsilon}_{VSM} = \sqrt{E_{VSM}}$
Non-uniformity and LSI
• Let $\Theta_{LSI}$ be the h-dimensional LSI subspace spanned by the first h left singular vectors of D.
• If $\hat{\delta}_{min} > \hat{\varepsilon}_{vsm}$ (writing $\hat{\delta} = \sqrt{\delta}$), then
  $\tan(\angle(\Theta_{LSI}, \Theta_{opt})) \le \dfrac{\hat{\varepsilon}_{vsm}\, \hat{\delta}_{max}}{\hat{\delta}_{min}^2 \big(1 - (\hat{\varepsilon}_{vsm}/\hat{\delta}_{min})^2\big)}$
• Hence $\Theta_{LSI}$ must be close to $\Theta_{opt}$ when the topic-document distribution is relatively uniform.
Notation for related values
• $\mu(C)$ is the topic mingling.
• For $x_1 \ge x_2 \ge x_3 \ge \ldots \ge x_n \ge 0$ and $y_1 \ge y_2 \ge y_3 \ge \ldots \ge y_n \ge 0$, we write $x_i \approx y_i$ when both error measures
  $E_{max} = \max_i |x_i^2 - y_i^2|$ and $E_{avg} = \frac{1}{n} \sum_{i=1}^{n} |x_i^2 - y_i^2|$
  are small.
• For the optimum subspace these errors are
  $E_{max} = \|diff_{S,D}(\Theta_{opt})\|_2$ and $E_{avg} = \|diff_{S,D}(\Theta_{opt})\|_F / n$
• The approximation becomes closer as the optimum error (or the average optimum error) becomes smaller.
Ando’s IRR algorithm
• IRR algorithm (basic iteration):
  $R^{(1)} = \tilde{D}$
  $u_i = \arg\max_{\|x\|_2 = 1} \sum_{j=1}^{n} \big( \|r_j^{(i)}\| \cos(r_j^{(i)}, x) \big)^2$
  $R^{(i+1)} = R^{(i)} - proj(R^{(i)}, u_i)$
  where $r_j^{(i)}$ is the jth column of $R^{(i)}$ and $proj(R, u)$ projects each column of R onto u.
Introduction of IRR
Ando’s IRR algorithm
• With rescaling, define $pow(r, q) \overset{def}{=} r\, \|r\|^q$.
  $R^{(1)} = \tilde{D}$
  $u_i = \arg\max_{\|x\|_2 = 1} \sum_{j=1}^{n} \big( \|pow(r_j^{(i)}, q)\| \cos(pow(r_j^{(i)}, q), x) \big)^2$
  (find the unit vector x which best approximates the rescaled residuals of R)
  $R^{(i+1)} = R^{(i)} - proj(R^{(i)}, u_i)$
  The output subspace is $\Theta_{IRR}(q, \{u_i\})$.
Ando’s IRR algorithm
IRR(q, l):
  R := D
  for j := 1, 2, ..., l:
    for i := 1, 2, ..., n:
      r̂_i := r_i ||r_i||^q                       (rescale each residual)
    b_j := argmax_{||x||_2 = 1} sum_{i=1}^{n} ( ||r̂_i|| cos(r̂_i, x) )^2
    for i := 1, 2, ..., n:
      subtract from r_i its projection onto b_j
  return B B^T D, where B = [b_1 b_2 ... b_l]
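The pseudocode above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: each $b_j$ is obtained as the top left singular vector of the rescaled residual matrix, which is exactly the stated argmax. With $q = 0$ (no rescaling) the procedure reduces to LSI:

```python
import numpy as np

def irr(D, q, l):
    """Sketch of Ando's IRR: pick l basis vectors, rescaling each residual
    column by the q-th power of its length before each direction is chosen."""
    R = D.astype(float).copy()
    basis = []
    for _ in range(l):
        # Rescale: r_i ||r_i||^q.
        norms = np.linalg.norm(R, axis=0)
        Rs = R * norms ** q
        # b = argmax_{||x||=1} sum_i (||rs_i|| cos(rs_i, x))^2
        #   = top left singular vector of the rescaled matrix.
        U, _, _ = np.linalg.svd(Rs, full_matrices=False)
        b = U[:, 0]
        basis.append(b)
        # Subtract each column's projection onto b.
        R = R - np.outer(b, b @ R)
    B = np.stack(basis, axis=1)
    return B, B @ (B.T @ D)

rng = np.random.default_rng(6)
D = rng.random((12, 8))
D /= np.linalg.norm(D, axis=0)
B, P = irr(D, q=0.0, l=3)

assert np.allclose(B.T @ B, np.eye(3), atol=1e-8)  # basis is orthonormal
# With q = 0, IRR reduces to LSI: span(B) matches the span of the first
# 3 left singular vectors of D.
U3 = np.linalg.svd(D)[0][:, :3]
assert np.allclose(B @ B.T, U3 @ U3.T, atol=1e-6)
```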
Auto-scale method
• Automatic scaling factor determination:
  $f(D) \overset{def}{=} \|\tilde{D}^T \tilde{D}\|_F^2 / n$
• Why this measures non-uniformity: since $\tilde{D}^T \tilde{D}$ approximates S,
  $\|\tilde{D}^T \tilde{D}\|_F^2 \approx \|S\|_F^2 = \sum_{d,d' \in C} \Big( \sum_{t \in topics(C)} rel(t,d)\, rel(t,d') \Big)^2$
  When the documents are approximately single-topic, the cross-topic terms vanish, so
  $\|S\|_F^2 \approx \sum_{t=1}^{k} \Big( \sum_{d \in C} rel(t,d)^2 \Big)^2 = \sum_{t=1}^{k} \delta_t^2$
  while $\sum_{i=1}^{n} \sum_t rel(t,d_i)^2 = \sum_t \delta_t = n$.
  Thus $f(D) \approx \frac{1}{n} \sum_t \delta_t^2$, which grows as the topic dominances become less uniform.
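A small demonstration of my own that $f(D)$ responds to non-uniformity, using toy single-topic collections with balanced versus skewed topic sizes:

```python
import numpy as np

def f(D):
    """Non-uniformity estimate f(D) = ||D^T D||_F^2 / n (D length-normalized)."""
    n = D.shape[1]
    return np.linalg.norm(D.T @ D, 'fro') ** 2 / n

def collection(counts, m, seed=0):
    """Toy single-topic collection with counts[t] documents on topic t."""
    rng = np.random.default_rng(seed)
    cols = [np.eye(m)[:, t] + 0.05 * rng.standard_normal(m)
            for t, c in enumerate(counts) for _ in range(c)]
    D = np.stack(cols, axis=1)
    return D / np.linalg.norm(D, axis=0)

uniform = collection([5, 5], m=4)     # balanced topics
skewed = collection([9, 1], m=4)      # one dominant topic

# f grows with non-uniformity, motivating a larger scaling factor q.
assert f(skewed) > f(uniform)
```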
Auto-scale method
• To implement auto-scale, we set q to a linear function of f(D):
  $q = \alpha f(D) + \beta$
Dimension selection
• Stopping criterion: the residual ratio (effective for both LSI and IRR)
  $\|R^{(j)}\|_F^2 / n$
  Since the document vectors are length-normalized, $\|R^{(1)}\|_F^2 = n$, so this is the fraction of the collection not yet captured by the chosen basis vectors.
Evaluation Metrics
• Kappa average precision:
  – Pair-wise average precision: the measured similarity for any two intra-topic documents (sharing at least one topic) should be higher than for any two cross-topic documents, which have no topics in common.
  – Let $p_j$ denote the document pair with the jth largest measured cosine. Then
    $prec(p_j) = \#\{\text{intra-topic pairs } p_i \text{ such that } i \le j\}\, /\, j$
  – $chance = \#\text{intra-topic pairs}\, /\, \#\text{document pairs}$
    (so $1 - chance$ is the probability that a random pair is not intra-topic)
  – Kappa average precision: the average over intra-topic pairs $p_j$ of
    $\dfrac{prec(p_j) - chance}{1 - chance}$
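The metric can be sketched as follows (my own implementation of the definition above, with a toy perfect ranking as the usage example):

```python
import numpy as np

def kappa_avg_precision(cosines, intra):
    """Kappa average precision over document pairs.

    cosines: measured similarity per pair; intra: True if the pair shares
    a topic. prec(p_j) is evaluated at the rank of each intra-topic pair."""
    order = np.argsort(-np.asarray(cosines))
    intra_sorted = np.asarray(intra)[order]
    chance = intra_sorted.mean()            # #intra-topic pairs / #pairs
    precs = []
    for j in range(len(order)):
        if intra_sorted[j]:                 # average only over intra pairs
            precs.append(intra_sorted[: j + 1].mean())
    avg_prec = float(np.mean(precs))
    return (avg_prec - chance) / (1.0 - chance)

# Perfect ranking: every intra-topic pair scores above every cross-topic pair.
assert kappa_avg_precision([0.9, 0.8, 0.2, 0.1],
                           [True, True, False, False]) == 1.0
```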
Evaluation Metrics
• Clustering:
  – Let C be a cluster-topic contingency table: $C[i,j]$ is the number of documents in cluster i that are relevant to topic j.
  – Define:
    $N_{i,j} = C[i,j]$ if $C[i,j]$ is the unique maximum in both its row and its column; otherwise $N_{i,j} = 0$.
    $S(C) = \sum_{i,j} N_{i,j}\, /\, n$
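A direct implementation of this score (my own sketch of the definition), with a near-diagonal table where clusters 0 and 1 cleanly capture topics 0 and 1:

```python
import numpy as np

def clustering_score(C, n):
    """S(C): sum of entries that are the unique maximum of both their row
    and their column of the cluster-topic table, divided by n documents."""
    C = np.asarray(C)
    total = 0
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            v = C[i, j]
            unique_row_max = (C[i, :] < v).sum() == C.shape[1] - 1
            unique_col_max = (C[:, j] < v).sum() == C.shape[0] - 1
            if unique_row_max and unique_col_max:
                total += v
    return total / n

C = [[8, 1],
     [2, 9]]
# Only the entries 8 and 9 dominate both their row and column.
assert clustering_score(C, n=20) == (8 + 9) / 20
```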
Experimental setting
• (1) Choose two TREC topics (more than two can be chosen).
• (2) Specify seven distribution types:
  – (25,25), (30,20), (35,15), (40,10), (43,7), (45,5), (46,4)
  – Each document was relevant to exactly one of the pre-selected topics.
• (3) Extract single-word stemmed terms using TALENT and remove stop-words.
• (4) Create the term-document matrix and length-normalize the document vectors.
• (5) Implement AUTO-SCALE: set $q = \alpha f(D) + \beta$, with $\alpha = 3.5$ and $\beta = 0$.
Controlled-distribution results
• The chosen scaling factor increases on average as the non-uniformity $\delta_{max} / \delta_{min}$ goes up.
Controlled-distribution results
(Figure: clustering results across distribution types; annotations mark the configurations with the highest and lowest S(C).)
Controlled-distribution results
Conclusion
• Provided a new theoretical analysis of LSI, showing a precise relationship between LSI's performance and the uniformity of the underlying topic-document distribution.
• Extended Ando's IRR algorithm.
• IRR provides very good performance in comparison to LSI.
IRR on summarization
• For summarization, the term-document matrix is replaced by a term-sentence matrix, and IRR (in the role of the SVD's U and V^T factors) supplies the subspace.
• All of the documents together are used as a query, and similarity to this query is computed for each sentence.
Put all document as a query to countthe similarity