An NLA Look at σ min Universality (& the Stoch Diff Operator)
Transcript of "An NLA Look at σ min Universality (& the Stoch Diff Operator)"
![Page 1: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/1.jpg)
An NLA Look at σmin Universality (& the Stoch Diff Operator)
Alan Edelman and Po-Ru Loh, MIT Applied Mathematics
Random Matrices
October 10, 2010
![Page 2: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/2.jpg)
Outline
• History of σmin universality
• Proof idea of Tao-Vu (in NLA language)
• What the proof doesn't prove
• Do stochastic differential operators say more?
![Page 3: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/3.jpg)
A bit of history
• E ('89): Explicit formula (for finite iid Gaussian n x n matrices) for the distribution of σn
– Analytic techniques: integration of a joint density function, Tricomi functions, Kummer's differential equation, etc.
• Striking convergence to the limiting distribution even in the non-Gaussian case
– Parallel Matlab in the MIT computer room – "Please don't turn off" (no way to save work in the background on those old Sun workstations)
![Page 4: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/4.jpg)
Empirical smallest singular value distribution: n x n Gaussian entries
![Page 5: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/5.jpg)
Empirical smallest singular value distribution: n x n ±1 entries
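The experiments behind these two plots can be reproduced with a short Monte Carlo sketch (illustrative sizes and trial counts, not the original parallel-Matlab runs): sample √n·σmin for Gaussian and ±1 matrices and compare the empirical distributions.

```python
# Sketch of the Monte Carlo behind these histograms (assumed, illustrative
# sizes): sample sqrt(n) * sigma_min over many random n x n matrices.
import numpy as np

def scaled_sigma_min(sampler, n, trials, seed):
    rng = np.random.default_rng(seed)
    vals = np.empty(trials)
    for i in range(trials):
        A = sampler(rng, n)
        # smallest singular value, scaled so the distribution has a limit
        vals[i] = np.sqrt(n) * np.linalg.svd(A, compute_uv=False)[-1]
    return vals

gauss = lambda rng, n: rng.standard_normal((n, n))
pm_one = lambda rng, n: rng.choice([-1.0, 1.0], size=(n, n))

g = scaled_sigma_min(gauss, n=50, trials=200, seed=0)
p = scaled_sigma_min(pm_one, n=50, trials=200, seed=1)
print(np.median(g), np.median(p))  # already close at modest n
```

Even at n = 50 the two empirical distributions are hard to tell apart, which is the "striking convergence" the previous slide mentions.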
![Page 6: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/6.jpg)
Extending to non-Gaussians: How?
• Central limit theorem is a mathematical statement and a "way of life"
– Formally: a (series of) theorems – with assumptions (e.g., iid) – and if the assumptions are not met, the theorems don't apply
– Way of life: "When a bunch of random variables are mixed up enough, they behave as if Gaussian"
– Example from our discussions: Does the square root of a sum of squares of (almost) iid random variables go to χn? Probably an application of the CLT, but not precisely the CLT without some tinkering (what does "go to" mean when n is changing?)
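A tiny illustration of that "way of life" point (a sketch with a deliberately non-Gaussian choice of summand): take n iid variance-1 uniform variables and look at the square root of the sum of squares. A χn variable concentrates near √n, and so does this non-Gaussian version.

```python
# Toy version of the chi_n question: uniform (non-Gaussian) summands,
# same concentration near sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
n = 10000
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)  # mean 0, variance 1
r = np.sqrt(np.sum(x**2))   # "square root of a sum of squares"
print(r, np.sqrt(n))        # both near 100
```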
![Page 7: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/7.jpg)
Outline
• History of σmin universality
• Proof idea of Tao-Vu (in NLA language)
• What the proof doesn't prove
• Do stochastic differential operators say more?
![Page 8: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/8.jpg)
Tao-Vu ('09): "the rigorous proof"!
• Basic idea (NLA reformulation)... Consider a 2x2 block QR decomposition of M:

M = (M1 M2) = QR = (Q1 Q2) ( R11 R12 ; 0 R22 )

where M1 is n x (n-s), M2 is n x s, R11 is (n-s) x (n-s), and R22 is s x s. Note: Q2^T M2 = R22.

1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn!
2. R22 (viewed as the product Q2^T M2) is roughly s x s Gaussian
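A quick numerical sanity check of the estimator (a sketch with illustrative sizes; the factor √(s/n) applied to σmin(R22) is the scaling that puts the s x s and n x n universal distributions on the same footing):

```python
# Sketch: compare sqrt(s/n) * sigma_min(R22) against sigma_min(M) directly.
import numpy as np

rng = np.random.default_rng(0)
n, s, trials = 200, 100, 20
ratios = []
for _ in range(trials):
    M = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(M)
    R22 = R[n - s:, n - s:]                     # bottom-right corner of R
    est = np.sqrt(s / n) * np.linalg.svd(R22, compute_uv=False)[-1]
    truth = np.linalg.svd(M, compute_uv=False)[-1]
    ratios.append(est / truth)
print(np.median(ratios))  # hovers around 1
```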
![Page 9: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/9.jpg)
Basic idea part 1: σs·√(s/n) ≈ σn
• The smallest singular value of M is the reciprocal of the largest singular value of M^-1
• Singular values of R22 are exactly the inverse singular values of an s-row subsample of M^-1
• The largest singular value of an approximately low-rank matrix reliably shows up in a random sample (Vempala et al.; Rokhlin, Tygert et al.)
– Referred to as "property testing" in theoretical CS terminology
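The second bullet is an exact matrix identity, easy to verify at small size: the singular values of R22 are the reciprocals of the singular values of the last s rows of M^-1 (since those rows equal R22^-1 Q2^T).

```python
# Exact check: sv(R22) = 1 / sv(last s rows of M^{-1}).
import numpy as np

rng = np.random.default_rng(2)
n, s = 12, 5
M = rng.standard_normal((n, n))
Q, R = np.linalg.qr(M)
R22 = R[n - s:, n - s:]
S = np.linalg.inv(M)[n - s:, :]          # the s-row subsample of M^{-1}
sv_R22 = np.linalg.svd(R22, compute_uv=False)
sv_S = np.linalg.svd(S, compute_uv=False)
print(np.allclose(np.sort(sv_R22), np.sort(1.0 / sv_S)))  # True
```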
![Page 10: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/10.jpg)
Basic idea part 2: R22 ≈ Gaussian
• Recall R22 = Q2^T M2
• Note that Q1 is determined by M1 and thus independent of M2
• Q2 can be any orthogonal completion of Q1
• Thus, multiplying by Q2^T "randomly stirs up" the entries of the (independent) n x s matrix M2
• Any "rough edges" of M2 should be smoothed away in the s x s result R22
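The "stirring" can be illustrated (not proved) with a quick moment check. Here Q2 is taken as a Haar-ish random orthonormal n x s matrix, an assumption for the demo; in the proof Q2 is the orthogonal completion coming from M1's QR.

```python
# Illustration: Q2^T pushes +/-1 entries toward Gaussian (kurtosis check).
import numpy as np

rng = np.random.default_rng(3)
n, s = 400, 40
M2 = rng.choice([-1.0, 1.0], size=(n, s))
Q2, _ = np.linalg.qr(rng.standard_normal((n, s)))  # orthonormal columns
R22 = Q2.T @ M2

# Excess kurtosis: exactly -2 for +/-1 entries, 0 for a Gaussian.
raw_excess = np.mean(M2**4) / np.mean(M2**2)**2 - 3
z = R22.ravel()
stirred_excess = np.mean(z**4) / np.mean(z**2)**2 - 3
print(raw_excess, stirred_excess)  # -2.0 vs. approximately 0
```

The entries of R22 are not truly independent, but the fourth-moment gap between ±1 and Gaussian disappears after one multiplication by Q2^T.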
![Page 11: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/11.jpg)
Basic idea (recap)
1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn!
2. R22 (viewed as the product Q2^T M2) ≈ s x s Gaussian
• We feel comfortable (from our CLT “way of life”) that part 2 works well
• How well does part 1 work?
![Page 12: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/12.jpg)
Outline
• History of σmin universality
• Proof idea of Tao-Vu (in NLA language)
• What the proof doesn't prove
• Do stochastic differential operators say more?
![Page 13: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/13.jpg)
How good is the s x s estimator?
Fluctuates within about ±10% of the truth (at least for this matrix)
![Page 14: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/14.jpg)
How good is the s x s estimator?
A few more tries...
![Page 15: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/15.jpg)
More s x s estimator experiments: Gaussian entries, ±1 entries
A lot more tries... again, accurate to say 10%
![Page 16: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/16.jpg)
More s x s estimator experiments: n = 100 vs. n = 200
A lot more tries, now comparing matrix sizes; s = 10 to 50% (of n)... n = 200 a bit better
![Page 17: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/17.jpg)
How good is the s x s estimator?
• On one hand, surprisingly good, especially when not expecting any such result
– "Did you know you can get the smallest singular value to within 10% just by looking at a corner of the QR?"
• On the other hand, still clearly an approximation: n would need to be huge in order to reach human-indistinguishable agreement
![Page 18: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/18.jpg)
Bounds from the proof
• "C is a sufficiently large const (10^4 suffices)"
• Implied constants in O(...) depend on E|ξ|^C
– For ξ = Gaussian, this is 9999!! (a double factorial)
• s = n^(500/C)
– To get s = 10, n ≈ 10^20?
• Various tail bounds go as n^(-1/C)
– To get 1% chance of failure, n ≈ 10^20000??
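The arithmetic behind those two eye-watering estimates, with C = 10^4 as quoted:

```python
# s = n^(500/C): solving 10 = n^(500/C) gives n = 10^(C/500).
# failure probability ~ n^(-1/C): 0.01 = n^(-1/C) gives n = 100^C = 10^(2C).
C = 10**4
n_exponent = C / 500      # so n ~ 10^20 to get s = 10
fail_exponent = 2 * C     # so n ~ 10^20000 for a 1% failure chance
print(n_exponent, fail_exponent)
```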
![Page 19: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/19.jpg)
Reality check: n x n Gaussian entries, revisited
![Page 20: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/20.jpg)
Reality check: n x n ±1 entries, revisited
![Page 21: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/21.jpg)
What the proof doesn't prove
• The s x s estimator is pretty nifty...
• … but the truth is far stronger than what the approximation can tell us
![Page 22: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/22.jpg)
Outline
• History of σmin universality
• Proof idea of Tao-Vu (in NLA language)
• What the proof doesn't prove
• Do stochastic differential operators say more?
![Page 23: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/23.jpg)
Can another approach get us closer to the truth?
• Recall the standard numerical SVD algorithm starting with Householder bidiagonalization
• In the case of Gaussian random matrices, each Householder step puts χ-distributed entries on the bidiagonal and leaves the remaining subrectangle Gaussian
• At each stage, all χ's and Gaussians in the entries are independent of each other (due to isotropy of multivariate Gaussians)
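A minimal check of the first step of this process (a sketch): the (1,1) bidiagonal entry produced by the first Householder reflection is the norm of the first column, which for Gaussian entries is a χn variable.

```python
# The first bidiagonal entry is ||first column|| ~ chi_n; E[chi_n^2] = n.
import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 2000
samples = np.array([np.linalg.norm(rng.standard_normal(n))
                    for _ in range(trials)])
print(np.mean(samples**2), n)  # mean of chi_n^2 samples vs. n
```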
![Page 24: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/24.jpg)
Bidiagonalization process for n x n Gaussian matrix
![Page 25: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/25.jpg)
A stochastic operator connection
• E ('03) argued that the bidiagonal of χ's can be viewed as a discretization of a stochastic Bessel operator
– √x d/dx + "noise"/√2
• As n grows, the discretization becomes smoother, and the (scaled) singular value distributions of the matrices ought to converge to those of the operator
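A Python rendering of the discretization idea (an assumption-laden sketch, matching the k = 0 case of the experiment on the next slide): replace each χk on the bidiagonal by its Gaussian approximation √k + N(0, 1/2) and look at √n·σmin.

```python
# Discretized "chi bidiagonal" with Gaussian noise standing in for the chi's.
import numpy as np

def sigma_min_bidiagonal(n, rng):
    d = np.sqrt(np.arange(n, 0, -1)) + rng.standard_normal(n) / np.sqrt(2)
    e = np.sqrt(np.arange(n - 1, 0, -1)) + rng.standard_normal(n - 1) / np.sqrt(2)
    B = np.diag(d) + np.diag(e, 1)       # upper bidiagonal
    return np.linalg.svd(B, compute_uv=False)[-1]

rng = np.random.default_rng(5)
n, trials = 200, 200
vals = np.array([np.sqrt(n) * sigma_min_bidiagonal(n, rng)
                 for _ in range(trials)])
print(np.median(vals))
```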
![Page 26: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/26.jpg)
A stochastic operator connection
![Page 27: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/27.jpg)
How close are we if we use k x k chi's in the bottom, rest Gaussian?

[Figure: empirical smallest singular value curves for k = 0, 1, 2, 3, 4, 5, 6, 7..10, and k = inf, comparing n = 100 and n = 200, with an area of detail near the hard edge (x ≈ 0 to 0.2). 1 million trials in each experiment.]

(Probably as n→inf, there is still a little upswing for finite k?)

The Matlab driving the experiment (reconstructed from the garbled transcript; the k loop bounds and plotting calls are inferred, and bidsvd is the authors' helper returning the singular values of the bidiagonal):

```matlab
n = 200; t = 1000000;
x = sqrt(n:-1:1); y = sqrt(n-1:-1:1);   % means of the chi bidiagonal entries
for k = 0:10                            % corner size (loop bounds inferred)
  v = zeros(t,1);
  endx = n-k+1:n;  endy = n-k+1:n-1;    % bottom-corner positions
  dofx = k:-1:1;   dofy = (k-1):-1:1;   % chi degrees of freedom
  for i = 1:t
    xx = x + randn(1,n)/sqrt(2);        % Gaussian approximations to chi's
    yy = y + randn(1,n-1)/sqrt(2);
    xx(endx) = sqrt(chi2rnd(dofx));     % exact chi's in the bottom k entries
    yy(endy) = sqrt(chi2rnd(dofy));
    v(i) = min(bidsvd(xx,yy));          % smallest singular value of bidiagonal
    if rem(i,500) == 0, [i k], end
  end
  v = v*sqrt(n);                        % scale to the universal distribution
end
```
![Page 28: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/28.jpg)
A stochastic operator connection
• Ramírez and Rider ('09) produced a proof
• In further work with Virág, they have applied the SDO machinery to obtain similar convergence results for largest eigenvalues of beta ensembles, etc.
![Page 29: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/29.jpg)
Extending to non-Gaussians: How?
• The bidiagonalization mechanism shouldn't care too much about the difference...
• Each Householder spin "stirs up" the entries of the remaining subrectangle, making them "more Gaussian" (by Berry-Esseen, q^T x is close to Gaussian as long as the entries of q are evenly distributed)
• Almost-Gaussians combine into (almost-independent) almost-χ's
• The original n^2 entries compress to 2n-1
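The compression claim is concrete: an unblocked Householder bidiagonalization (an illustrative sketch, not a library routine) leaves only 2n-1 nonzero entries while preserving every singular value, even starting from ±1 entries.

```python
# Unblocked Golub-Kahan-style bidiagonalization via Householder reflections.
import numpy as np

def householder_bidiagonalize(A):
    """Reduce square A to upper bidiagonal form by left/right reflections."""
    A = A.astype(float).copy()
    n = A.shape[0]

    def reflector(v):
        u = v.copy()
        u[0] += np.sign(u[0]) * np.linalg.norm(u)   # stable sign choice
        return u / np.linalg.norm(u)

    for j in range(n - 1):
        u = reflector(A[j:, j])                     # zero below the diagonal
        A[j:, :] -= 2.0 * np.outer(u, u @ A[j:, :])
        if j < n - 2:
            w = reflector(A[j, j + 1:])             # zero right of superdiagonal
            A[:, j + 1:] -= 2.0 * np.outer(A[:, j + 1:] @ w, w)
    return A

rng = np.random.default_rng(6)
A = rng.choice([-1.0, 1.0], size=(8, 8))
B = householder_bidiagonalize(A)
same = np.allclose(np.linalg.svd(A, compute_uv=False),
                   np.linalg.svd(B, compute_uv=False))
print(same)  # True: orthogonal transforms preserve singular values
```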
![Page 30: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/30.jpg)
SDO mechanism
• Old intuition: non-Gaussian n x n matrices act like Gaussian n x n matrices (which we understand)
• New view: non-Gaussian and Gaussian n x n matrices are both discretizations of the same object
• Non-random discretizations have graininess in step size, where to take finite differences, etc.
• SDO discretizations have issues of their own, like almost-independence... but can these be overcome?
![Page 31: An NLA Look at σ min Universality (& the Stoch Diff Operator)](https://reader034.fdocument.pub/reader034/viewer/2022051402/56815a88550346895dc7f9a2/html5/thumbnails/31.jpg)
Some grand questions
• Could an SDO approach circumvent property testing (sampling the bottom-right s x s) and thereby get closer to the truth?
• Does the mathematics of today have enough technology for this? (If not, can someone invent the new technology we need?)