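A Statistical-Mechanics Approach to Sparse Linear Regression: Astronomical Data as an Example
Tomoyuki Obuchi (1), Department of Mathematical and Computing Science, School of Computing, Tokyo Institute of Technology (1)
Collaborators: Yoshiyuki Kabashima (1), Yoshinori Nakanishi (2), Masato Okada (3), Makoto Uemura (4)
University of Tokyo, Komaba (2); University of Tokyo, Graduate School of Frontier Sciences (3); Hiroshima University (4)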
Underdetermined Linear Equation $y = Ax$
E.g. M=2, N=3 ($A$ has M rows and N columns):
$y_1 = A_{11}x_1 + A_{12}x_2 + A_{13}x_3$
$y_2 = A_{21}x_1 + A_{22}x_2 + A_{23}x_3$
Given y and A, compute x ← not unique
• The solution is "sparse" (many zeros)
• A representative: $\ell_0$ regularization
$||x||_0$: number of nonzero components
Standard form: $\min_x \frac{1}{2}||y - Ax||_2^2$ s.t. $||x||_0 \le K = N\rho$
Lagrange form: $\hat{x}(\lambda) = \arg\min_x \left\{ \frac{1}{2}||y - Ax||_2^2 + \lambda ||x||_0 \right\}$
Variants of regularizations
• General regularization: $\hat{x} = \arg\min_x \left\{ \frac{1}{2}||y - Ax||_2^2 + R(x) \right\}$
• $\ell_p$ regularization: $||x||_p^p = \sum_i |x_i|^p$
• Cusps lead to sparseness (x=0)
• p=1 is common (LASSO)
 - p>1: non-sparse
 - p<1: nonconvex, with local minima (computationally difficult)
Today's contents
1. Approximate cross validation (CV) for p=1
 • TO and YK: J. Stat. Mech. (2016) (LASSO)
2. Is p=0 really difficult? Typical performance of algorithms and phase transitions
 • TO, YN, YK, and MO: In preparation
 ✤ Solving p=0 by Monte Carlo (simulated annealing (SA))
 • TO and YK: J. Phys.: Conf. Ser. 699 (2016) 012017(1-12)
3. Hyperparameter selection by SA + naive CV
 • TO and YK: arXiv:1603.01399 (EUSIPCO2016 proceedings)
LASSO and
Approximate Cross Validation
C.f. TO and Y. Kabashima, J. Stat. Mech. (2016)
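LASSO and hyperparameter selection
• LASSO (Lagrange form): $\hat{x}(\lambda) = \arg\min_x \left\{ \frac{1}{2}||y - Ax||_2^2 + \lambda ||x||_1 \right\}$, where the expression in braces is the cost function $H(x)$
• How should the regularization parameter (λ) be chosen?
• Several schools of thought
 • Information criteria
 • (Empirical) Bayes methods
 • Cross validation (CV)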
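Cross validation (CV)
• Split the data into a training set and a test set
• Train the parameters on the training set
• Measure the predictive performance on the test set
• CV error = average of the test errors over the folds = estimate of the prediction error
• Splitting the data into k equal parts is called k-fold CV: k=10 and k=M (= leave-one-out) are commonly used
 - A larger k gives better accuracy but costs more computation
[Figure: 4-fold CV. In each of Fold 1 to Fold 4 a different quarter of the data is the test set and the rest is the training set.]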
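The k-fold procedure on this slide can be sketched as follows. This is a minimal illustration, not the speaker's code: the function names `k_fold_cv_error` and `least_squares_fit` are hypothetical, and plain least squares is used as a stand-in for the actual estimator (LASSO in the talk). The 1/2 factor in the error follows the convention of the cost function used throughout the slides.

```python
import numpy as np

def k_fold_cv_error(A, y, fit, k=4, seed=0):
    """Average test error over k folds; `fit(A_train, y_train)` returns x_hat."""
    M = A.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(M), k)  # k nearly equal index sets
    errors = []
    for test_idx in folds:
        train_mask = np.ones(M, dtype=bool)
        train_mask[test_idx] = False               # hold out the test fold
        x_hat = fit(A[train_mask], y[train_mask])  # train on the rest
        resid = y[test_idx] - A[test_idx] @ x_hat  # predict the held-out data
        errors.append(0.5 * np.mean(resid ** 2))
    return float(np.mean(errors))

def least_squares_fit(A_train, y_train):
    """Stand-in estimator (assumption): ordinary least squares."""
    return np.linalg.lstsq(A_train, y_train, rcond=None)[0]
```

Setting `k` equal to the number of rows of `A` gives leave-one-out CV, at the price of one fit per data point.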
We want to be able to perform CV at low computational cost
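LOOCV and the linear-response approximation
• Leave-one-out CV (LOOCV)
 $\hat{x}^{\backslash\mu}(\lambda) = \arg\min_x \left\{ \frac{1}{2} \sum_{\nu (\neq \mu)} \left( y_\nu - \sum_i A_{\nu i} x_i \right)^2 + \lambda ||x||_1 \right\}$ ← M optimizations (high computational cost)
 $\epsilon_{LOO}(\lambda) = \frac{1}{2M} \sum_\mu \left( y_\mu - \sum_i A_{\mu i} \hat{x}_i^{\backslash\mu}(\lambda) \right)^2$
• Linear-response approximation
 • Derivation of the perturbation: expand the cost function in $d = \hat{x} - \hat{x}^{\backslash\mu}$
  $H(\hat{x}) - H(\hat{x} - d) \approx \sum_\mu h_\mu(\hat{x}) \cdot d$
  $\hat{x}^{\backslash\mu} \approx \hat{x} - \chi^{\backslash\mu} h_\mu(\hat{x})$, $\chi^{\backslash\mu} = \frac{\partial \hat{x}^{\backslash\mu}}{\partial h}$
 • Compute the response coefficient (susceptibility) $\chi^{\backslash\mu}$: described later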
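Approximate cross-validation formula
• Linear approximation + linearity of the model + definition of the LOOE
 $\hat{x}^{\backslash\mu} \approx \hat{x} - \chi^{\backslash\mu} h_\mu(\hat{x})$
 $\chi^{-1} = (\chi^{\backslash\mu})^{-1} + a_\mu a_\mu^\top$
 $\epsilon_{LOO} = \frac{1}{2M} \sum_{\mu=1}^{M} \left( y_\mu - \sum_i A_{\mu i} \hat{x}_i^{\backslash\mu} \right)^2$
 ⇒ $\epsilon_{LOO} \approx \frac{1}{2M} \sum_\mu \left( 1 - \sum_{i,j} A_{\mu i} A_{\mu j} \chi_{ij} \right)^{-2} \left( y_\mu - \sum_i A_{\mu i} \hat{x}_i \right)^2$
• What remains is whether the susceptibility can be computed
 • If the regularization term has no singularity?
  • Inverse of the cost-function Hessian: $\chi = (A^\top A)^{-1}$
 • In the LASSO case
  • Inverse of the cost-function Hessian restricted to the active variables: $\chi_{S_A S_A} = \left( A_{*S_A}^\top A_{*S_A} \right)^{-1}$
  • The set of active variables is assumed not to change under the perturbation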
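The active-set formula above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `approx_loo_error` and the threshold `tol` used to decide which components count as active are hypothetical, and `x_hat` is assumed to come from some LASSO solver run on the full data.

```python
import numpy as np

def approx_loo_error(A, y, x_hat, tol=1e-10):
    """Approximate LOO error from a single full-data fit.

    chi is the inverse Hessian restricted to the active set S_A
    (components with |x_hat_i| > tol, an assumed criterion).
    """
    M = A.shape[0]
    active = np.abs(x_hat) > tol
    A_act = A[:, active]                       # A_{*S_A}
    chi = np.linalg.inv(A_act.T @ A_act)       # (A_{*S_A}^T A_{*S_A})^{-1}
    # sum_{i,j in S_A} A_{mu i} chi_{ij} A_{mu j} for every row mu
    quad = np.einsum("mi,ij,mj->m", A_act, chi, A_act)
    residual = y - A @ x_hat
    return float(np.sum((residual / (1.0 - quad)) ** 2) / (2.0 * M))
```

When all variables are active and no regularization is applied (ordinary least squares with M > N), this expression coincides with the exact leave-one-out error, which makes a convenient sanity check.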
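Two approximate cross-validation formulas
• Approx. 1
 $\epsilon_{LOO} \approx \frac{1}{2M} \sum_{\mu=1}^{M} \left( 1 - \sum_{i,j \in S_A} A_{\mu i} \left[ \left( A_{*S_A}^\top A_{*S_A} \right)^{-1} \right]_{ij} A_{\mu j} \right)^{-2} \left( y_\mu - \sum_i A_{\mu i} \hat{x}_i \right)^2$
• Approx. 2 = Approx. 1 + random matrix theory
 $1 - \sum_{i,j} A_{\mu i} A_{\mu j} \chi_{ij} \approx \left( \frac{\alpha}{\alpha - \rho(\lambda)} \right)^{-1}$
 $\epsilon_{LOO} \approx \left( \frac{\alpha}{\alpha - \rho} \right)^2 \frac{1}{2M} ||y - A\hat{x}||_2^2 = \left( \frac{\alpha}{\alpha - \rho} \right)^2 \epsilon_{Train}$
 $\alpha = \frac{M}{N}$, $\rho(\lambda) = \frac{||\hat{x}||_0}{N}$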
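Approx. 2 needs only the training error and the fraction of nonzero components, so it reduces to a few lines. The sketch below is illustrative; the function name `approx2_loo_error` and the nonzero threshold `tol` are assumptions, not part of the original slides.

```python
import numpy as np

def approx2_loo_error(A, y, x_hat, tol=1e-10):
    """LOO error estimate: (alpha / (alpha - rho))^2 * training error."""
    M, N = A.shape
    alpha = M / N
    rho = np.count_nonzero(np.abs(x_hat) > tol) / N   # ||x_hat||_0 / N
    if rho >= alpha:
        raise ValueError("formula requires rho < alpha")
    train_error = np.sum((y - A @ x_hat) ** 2) / (2.0 * M)
    return float((alpha / (alpha - rho)) ** 2 * train_error)
```

Unlike Approx. 1, no matrix inversion is involved, which is consistent with it being the cheaper of the two formulas.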
DEMO: Application to
SuperNovae Data Analysis
Application to SuperNovae data analysis
• Type Ia supernovae produce a consistent peak luminosity (absolute magnitude at maximum) determined by the "Chandrasekhar limit".
 – To the zeroth approximation
• "Standard candle" of the Universe -> distance
• "Accelerating expansion of the Universe", Nobel Prize in 2011
[Figure: plot versus redshift, Riess et al. (2001)]
Application to SuperNovae data analysis
• However, in reality, the peak luminosity varies owing to various factors.
• Two known major factors
 – "Light curve width" (radioactive decay)
 – "Color index" (interstellar extinction)
[Figure: light curves versus time (days), annotated with the light curve width and the peak luminosity (magnitude at maximum), credited as Napolinano97; color index, credited as Candonau+87]
Application to SuperNovae data analysis
• Calibration of the peak luminosity
 $M_\mu \approx M_0 + A_{\mu,1}\beta_1 + A_{\mu,2}\beta_2 + A_{\mu,3}\beta_3 + \dots + A_{\mu,276}\beta_{276}$
 ($M_\mu$: μth data; $M_0$: constant; the explanatory variables include the color index and the light curve width, plus other candidates)
• Berkeley supernova database: http://heracles.astro.berkeley.edu/sndb/
• M=78, N=276 (cf. M. Uemura, K. S. Kawabata, S. Ikeda, K. Maeda (2015))
[Figure: CV errors (0.05 to 0.2) versus log λ (-6 to -1) for Approx 1, Approx 2, and 10-fold CV, shown in separate panels and overlaid]
• Minimum + one-sigma rule gives K=6
• Actual computational time:
 10-fold   App. 1   App. 2
 31.6 s    3.2 s    2.85 s
Summary
• Leave-one-out CV in LASSO (+TV) has been examined
• An efficient approximation of the LOOE has been derived
 • Perturbation theory, assuming M and N are large enough
 • It reproduces the direct CV result in a much shorter time
Is $\ell_0$ really difficult? Typical performance of algorithms and phase transitions
& a Monte Carlo based algorithm
C.f.
TO and YK, J. Phys.: Conf. Ser. 699 (2016) 012017(1-12);
Y. Nakanishi, TO, M. Okada, YK, J. Stat. Mech. (2016) 063302;
another paper in preparation
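Information statistical mechanics
• Information statistical mechanics: analyzing problems in information science with the methods of statistical mechanics
Timeline (1980, 1990, 2000, 2010, to the present):
✤ Neural networks
 • Evaluation of associative-memory capacity, Boltzmann machines
 • cf. Hopfield 1982, Hinton 1985, Gardner-Derrida 1988
✤ Learning theory
 • Evaluation of generalization ability, Bayesian inference
 • cf. Levin 1990, Gyorgyi-Tishby 1990
✤ Codes, communication, combinatorial optimization problems
 • Typical-case performance evaluation, solution-search algorithms, phase transitions and solvability
 • cf. Kabashima-Saad 1999, Tanaka 2001, Cocco-Monasson 2001, Mezard et al. 2003
✤ Sparse estimation, inverse problems, machine learning
 • cf. Kabashima et al. 2009, Krzakala et al. 2012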
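Information statistical mechanics
• Strengths of information statistical mechanics?
 • Algorithms and computational methods originating from statistical mechanics
  • Mean-field approximation, cavity method, Monte Carlo methods
  • Handling systems with many degrees of freedom has been in mind from the outset
 • Original ways of characterizing problems and evaluating performance
  • Phase transitions and critical phenomena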
Information statistical mechanics
• Original ways of characterizing problems and evaluating performance
 • Phase transitions and critical phenomena
• Compressed sensing as an example
 • Consider the following sparse estimation problem:
  $P(p)$: $\min_x \left\{ \frac{1}{2} ||y - Ax||_2^2 \right\}$ subj. to $||x||_p^p \le \hat{K} = N\hat{\rho}$, with $y = Ax_0$, $||x_0||_0 = K_0 = N\rho_0$
 • When can the sparse signal $x_0$ be perfectly recovered from the observation y?
[Figure: recovery phase diagram with the $\ell_1$ limit and the $\ell_0$ limit plotted against $\rho_0$; analytical solution (Kabashima et al. (2009)) and numerical solution (Donoho-Tanner (2006))]
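Information statistical mechanics
We want to enlarge the recoverable region
• If we consider p<1 ... ★ Can a reasonable algorithm be constructed?
• p=0
 • Greedy methods (OMP etc.)
  • They do not come close to the scaling limit at all
 • Bayes and message passing (Krzakala et al. 2012)
  • The $\ell_0$ limit can be reached in polynomial time!
  • For real problems ... (being Bayesian, it requires many assumptions)
 • Monte Carlo sampling! (TO et al. 2016~)
  • Formulated as an optimization problem rather than in a Bayesian way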
Introduction of Support Indicator
• $y = Ax$ with N = 6, M = 4
• Example: $c_1 = 0$, $c_2 = 1$, $c_3 = 0$, $c_4 = 1$, $c_5 = 0$, $c_6 = 1$, so that $y \approx x_2 a_2 + x_4 a_4 + x_6 a_6$
• $\sum_i c_i = 3 = 6 \times \frac{1}{2}$ (that is, $N\rho$ with $\rho = 1/2$)
• Introduce binary variables c expressing which variables are used
• We call c the sparse weight, and S={i|c_i=1} the support.
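Solution on Fixed Support
• Shrunken expression on the support: $A_c$ and $x_c$ keep only the columns and components with $c_i = 1$ (e.g. $c = (0, 1, 0, 1, 0, 1)$)
 $\min_x \frac{1}{2}||y - Ax||_2^2$ s.t. $||x||_0 \le K$
 $= \min_{c:|c|=K} \min_{x_c} \frac{1}{2}||y - A_c x_c||_2^2$
 $= \min_{c:|c|=K} \left\{ H(c) = \frac{1}{2} y^T \left( I - A_c (A_c^T A_c)^{-1} A_c^T \right) y \right\}$
 $\hat{x}_c = (A_c^T A_c)^{-1} A_c^T y$
• $H(c)$ plays the role of the Hamiltonian
• Ground-state search problem (combinatorial optimization); the worst case is NP-hard (Natarajan 95)
• Exhaustive search is infeasible → approximate search → Monte Carlo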
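The energy $H(c)$ on a fixed support is the least-squares residual on the selected columns, since $I - A_c(A_c^T A_c)^{-1}A_c^T$ is a projector. A minimal sketch follows; the name `support_energy` is hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit inverse for numerical stability. The support is assumed to be non-empty with at most M linearly independent columns.

```python
import numpy as np

def support_energy(A, y, c):
    """Return (H(c), x_c): half the squared residual of least squares on support c."""
    A_c = A[:, np.asarray(c, dtype=bool)]            # keep columns with c_i = 1
    x_c = np.linalg.lstsq(A_c, y, rcond=None)[0]     # (A_c^T A_c)^{-1} A_c^T y
    residual = y - A_c @ x_c                         # (I - projector) y
    return 0.5 * float(residual @ residual), x_c
```

If y lies in the span of the selected columns, the energy vanishes and `x_c` returns the coefficients on the support.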
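Monte Carlo: Metropolis Algorithm
• Energy difference, Boltzmann distribution, Metropolis criterion
 $\Delta(c \to c') = \frac{1}{2} y^T \left\{ A_c (A_c^T A_c)^{-1} A_c^T - A_{c'} (A_{c'}^T A_{c'})^{-1} A_{c'}^T \right\} y$
 $P(c; \mu, \rho) = \frac{1}{Z} \delta(|c| - N\rho) e^{-\mu H(c)}$
 $P_{accept}(c \to c') = \min\left\{ 1, e^{-\mu \Delta(c \to c')} \right\}$
• Single column flip: $c = (0, 1, 0, 1, 0, 1)^T \to c' = (0, 1, 0, 1, 1, 0)^T$ (one used column is exchanged for an unused one)
 • The number of degrees of freedom K=|c| is automatically kept constant
• Note: the computational cost of Δ is O(MK) ~ O(N^2) thanks to the matrix inversion formula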
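One Metropolis update with the column exchange described above might look like the following sketch. It is deliberately naive: the energy is recomputed from scratch by least squares instead of using the matrix inversion formula mentioned on the slide, so each step costs more than O(MK). The names `support_energy` and `metropolis_swap_step` are hypothetical, `c` is assumed to be an integer NumPy array containing at least one 0 and one 1, and `rng` is a `numpy.random.Generator`.

```python
import numpy as np

def support_energy(A, y, c):
    """H(c): half the squared least-squares residual on the support of c."""
    A_c = A[:, c.astype(bool)]
    x_c = np.linalg.lstsq(A_c, y, rcond=None)[0]
    r = y - A_c @ x_c
    return 0.5 * float(r @ r)

def metropolis_swap_step(A, y, c, mu, rng):
    """Propose exchanging one used column for one unused column; accept by Metropolis."""
    on = np.flatnonzero(c == 1)
    off = np.flatnonzero(c == 0)
    c_new = c.copy()
    c_new[rng.choice(on)] = 0     # remove one column from the support
    c_new[rng.choice(off)] = 1    # add one column, so |c| stays equal to K
    delta = support_energy(A, y, c_new) - support_energy(A, y, c)
    if delta <= 0 or rng.random() < np.exp(-mu * delta):
        return c_new, True        # accepted
    return c, False               # rejected, keep the old support
```

Because every proposal removes one column and adds another, the constraint |c| = K never has to be enforced separately.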
Sampling at T=0: Simulated Annealing (SA)
• Simulated Annealing (SA): an optimization solver using MCMC
 • Start from high T=1/µ
 • Update the sparse weights c while gradually decreasing T
 • At T~0, the sampled c is the minimum of $\epsilon(c)$
 • Schedule condition shown on the slide: $T(t) > \frac{A(N)}{\log(t+2)}$
[Figure: SA and phase space at high T, middle T, low T, and T≒0]
Benchmark
• Benchmark = test on random data
 $y = Ax_0 + \xi$, $A_{\mu i} \sim N(0, N^{-1})$, $\xi_\mu \sim N(0, \sigma_\xi^2)$
 $P(x_{0i}) = \rho_0 N(0, \sigma_x^2) + (1 - \rho_0)\delta(x_{0i})$
• Annealing schedule
 $\mu_a = T_a^{-1} = 10^{-8} + r^{a-1} - 1$, $r = 1.1$ $(a = 1, \dots, 100)$, $\tau_a = \tau$
• Scaling of the problem
 $M = \alpha N$, $N = 100, 200, 400$, $\epsilon = \langle H \rangle / N$, $\alpha, \rho_0, \rho = O(1)$
• On this setup, the performance of simulated annealing (SA) for $\min_{c:|c|=K} \{H(c)\}$ is examined
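The benchmark ensemble and the annealing schedule above can be generated as in the sketch below. The function names `make_benchmark` and `inverse_temperatures` are hypothetical, and rounding $\alpha N$ to the nearest integer for M is an assumption.

```python
import numpy as np

def make_benchmark(N, alpha, rho0, sigma_x, sigma_xi, rng):
    """Draw (A, x0, y) with y = A x0 + xi from the random ensemble of the slide."""
    M = int(round(alpha * N))                           # M = alpha * N (rounded)
    A = rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))  # A_{mu i} ~ N(0, 1/N)
    support = rng.random(N) < rho0                      # nonzero with probability rho0
    x0 = np.where(support, rng.normal(0.0, sigma_x, size=N), 0.0)
    xi = rng.normal(0.0, sigma_xi, size=M)              # noise, variance sigma_xi^2
    return A, x0, A @ x0 + xi

def inverse_temperatures(n_points=100, r=1.1):
    """mu_a = 1e-8 + r**(a-1) - 1 for a = 1, ..., n_points."""
    a = np.arange(1, n_points + 1)
    return 1e-8 + r ** (a - 1) - 1.0
```

Setting `sigma_xi=0.0` gives the noiseless limit, in which y is exactly `A @ x0`.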
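Before the numerical experiments: understanding algorithm performance through statistical-mechanical analysis
Statistical-mechanical analysis
• Energy of Ising (binary) spins
• Training error = Hamiltonian
 $H(c) = \frac{1}{2} y^\top \left( I - A_c (A_c^\top A_c)^{-1} A_c^\top \right) y = N\epsilon(c)$
• Energy landscape → Entropy
 $s(\epsilon) = \frac{1}{N} \log \mathrm{Tr}_c\, \delta(\epsilon - \epsilon(c))$
• If y and A are Gaussian, the typical entropy can be computed by the replica method
• Note: we fix |c|=Nρ ⇔ the magnetization is fixed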
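Relation between the entropy and the difficulty of the solution search
• Featureless system: the entropy $s(\epsilon)$ is analytic and convex upward
• Second-order transition: a discrepancy appears in higher derivatives of $s(\epsilon)$; causes critical slowing down
• First-order transition: $s(\epsilon)$ has multiple peaks; moving between peaks is extremely hard
• Spin-glass transition: not visible from the shape of $s(\epsilon)$; causes a serious slowdown
[Figure: four sketches of $s(\epsilon)$ versus $\epsilon$, one per case]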
Details in Analysis
• Introduce the generating function
• Take the average over the data and the basis matrix: replica method
 • Two different kinds of replicas, n and ν, are needed
  • Due to the two different dynamical variables c and x
• Hamiltonian in β→∞
Phase Diagram (T-ρ)
[Figure: phase diagrams in the T-ρ plane for α=0.5, ρ0=0.25 and for α=0.5, ρ0=0.2, each with an enlarged low-temperature panel; the curves are T_AT, T_EC, T_SP, and T_F, and ρ0 is marked on the ρ axis]
• T_AT: AT instability temp.
• T_EC: Entropy crisis (RS level) temp.
• T_SP: Spinodal temp.
• T_F: 1st ord. transition (FE equal) temp.
Phase space picture (Noiseless case)
[Figure: entropy curves $s(\epsilon)$ versus $\epsilon$ and sketches of the configuration space around $x_0$, arranged along the ρ axis with marks at $\rho_0$, $\rho_{AT}$, and $\rho_{CR}$. Region labels: Hard (probably poly.), Easy (no perfect), Easy (perfect), Hard (probably exp. by naive MC)]
Numerical experiments (Noiseless limit)
(i) $\sigma_x = 1$, $\sigma_\xi = 0$, $\tau = 5$ MCS (1 MCS = N random column flips)
Easy case ($\tau = 5$ MCS)
• Agreement with the analytical curve: SA follows the equilibrium state
• $\epsilon_x = \frac{1}{2N} \left\langle ||x_0 - x||_2^2 \right\rangle$
[Figure: $\epsilon(T)$ and $\epsilon_x(T)$ from SA compared with the analytical curves, with the annealing path drawn on the T-ρ phase diagram for α=0.5, ρ0=0.2]
Easy but $\ell_1$ cannot ($\tau = 5$ MCS)
• Good agreement with the analytical curve → SA outperforms the $\ell_1$ relaxation
[Figure: $\epsilon(T)$ and $\epsilon_x(T)$ with the annealing path on the phase diagram for α=0.5, ρ0=0.2]
1st ord. tr. prevents ($\tau = 5$ MCS)
• A metastable state traps the system
[Figure: $\epsilon(T)$ and $\epsilon_x(T)$ with the annealing path on the phase diagram for α=0.4, ρ0=0.2]
Avoid 1st ord. tr.
[Figure: $\epsilon(T)$ and $\epsilon_x(T)$ for $\tau = 5$ MCS and $\tau = 100$ MCS, with the annealing path on the phase diagram for α=0.4, ρ0=0.2]
• The computational time of SA scales as $O(N_\mu \tau N N_{MC}) = O(\alpha \rho \tau N_\mu N^3)$ if τ is O(1). (The same order as the convex solvers)
Summary
• The performance of SA was examined in SAP
 • Analytical result → benchmark test
• Rapid SA works fairly well in several situations
 • It outperforms the $\ell_1$ relaxation
 • But it fails to find the solution when metastable states appear
• Finding a good value of K=Nρ is crucial
• Future work
 • Application to real data
 • Diminish the disadvantages
  • Extended ensemble, seeding trick, good initial condition ($\ell_p$ relaxation?)
 • T>0 solutions can be useful in noisy cases
 • SA-based cross validation
TO, YK: J. Phys.: Conf. Ser. 699 (2016) 012017
Hyperparameter selection by SA + naive CV
C.f.
TO and YK: EUSIPCO2016 Proceedings, arXiv:1603.01399
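LOO data, solution, and prediction
• Leave-one-out (LOO) data: remove the µth datum (the figure shows µ=1, where $y \approx A_c x_c$ becomes $y^{\backslash 1} \approx A_c^{\backslash 1} x_c$)
 $\min_{c:|c|=K} \min_{x_c} \frac{1}{2} ||y^{\backslash\mu} - A_c^{\backslash\mu} x_c||_2^2$
 $\hat{x}_c^{\backslash\mu} = \left( (A_c^{\backslash\mu})^T A_c^{\backslash\mu} \right)^{-1} (A_c^{\backslash\mu})^T y^{\backslash\mu}$
 $H^{\backslash\mu}(c) = \frac{1}{2} (y^{\backslash\mu})^T \left\{ I - A_c^{\backslash\mu} \left( (A_c^{\backslash\mu})^T A_c^{\backslash\mu} \right)^{-1} (A_c^{\backslash\mu})^T \right\} y^{\backslash\mu}$
 $c^{\backslash\mu}_{SA}$: the optimal solution for the µth LOO system
• Prediction quality for the µth data (test data prediction):
 $\epsilon_\mu(K) \equiv \frac{1}{2} \left( y_\mu - a_\mu^T \hat{x}^{\backslash\mu}_{c^{\backslash\mu}} \right)^2$
 $a_\mu$: the µth row vector (the figure shows µ=1)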
LOO error
• We define the LOO CV error (LOOE) as the mean
 $\epsilon_{LOO}(K) = \frac{1}{M} \sum_{\mu=1}^{M} \epsilon_\mu(K)$
• Issue: computational cost
 • Not cheap: (rapid) SA needs to be rerun M times
 • The cost scales as $O(M |S_K| K_{max} N_{SA})$
  • $|S_K|$: number of trial values of K
  • $K_{max}$: maximum of K
  • $N_{SA}$: computational cost of one SA run, O(MNK)
 • But it is easy to parallelize! The significant part is just $O(N_{SA})$.
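The LOO loop can be sketched as below. To keep the example self-contained and deterministic, the support search is done by brute force over all size-K supports, which is only feasible for very small N; this is a stand-in for SA, not the method of the talk. The names `exhaustive_support` and `loo_error`, and the `select_support` argument through which an SA routine could be passed instead, are hypothetical.

```python
import itertools
import numpy as np

def exhaustive_support(A, y, K):
    """Stand-in for SA (assumption): brute-force the size-K support with lowest H(c)."""
    best, best_energy = None, np.inf
    for cols in itertools.combinations(range(A.shape[1]), K):
        cols = list(cols)
        x = np.linalg.lstsq(A[:, cols], y, rcond=None)[0]
        energy = 0.5 * np.sum((y - A[:, cols] @ x) ** 2)
        if energy < best_energy:
            best, best_energy = cols, energy
    return best

def loo_error(A, y, K, select_support=exhaustive_support):
    """Mean over mu of 0.5 * (y_mu - a_mu . x_hat^{\\mu})^2."""
    M = A.shape[0]
    total = 0.0
    for mu in range(M):
        keep = np.arange(M) != mu                       # remove the mu-th datum
        cols = select_support(A[keep], y[keep], K)      # support for this LOO system
        x = np.linalg.lstsq(A[keep][:, cols], y[keep], rcond=None)[0]
        total += 0.5 * (y[mu] - A[mu, cols] @ x) ** 2   # predict the held-out datum
    return total / M
```

The M iterations are independent of each other, which is why the slide notes that the procedure is easy to parallelize.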
Benchmark on Synthetic Data
• Random data, 100 temp. points, 5 MCS at each temp.
 $y = Ax_0 + \xi$, $A_{\mu i} \sim N(0, N^{-1})$, $\xi_\mu \sim N(0, \sigma_\xi^2)$
 $P(x_{0i}) = \rho_0 N(0, \sigma_x^2) + (1 - \rho_0)\delta(x_{0i})$
• Randomly generated data ⇐ the analytical solution is available
[Figure: training error ε (left) and LOO error $\epsilon_{LOO}$, $\epsilon_g$ (right) versus ρ (0 to 0.2) for α=0.5, ρ0=0.1, σx²=10, σξ²=0.1; SA results for N=100, 200, 400 compared with the analytical RS and AT curves, with ρ0 and ρ* marked]
• The match in the training error is very good, while the test error deviates from the analytical solution for large ρ
- Why is the LOO error so different from the RS T=0 prediction?
 - Trapping effect at T=T_AT
  - In rapid annealing, the system's state almost ceases to evolve
  - It does not reach the equilibrium solution
 - There are many states at T<T_AT for large ρ
  - Their training error values are almost the same
  - While the test error changes considerably as T is lowered ← overfitting
 → Thus, trapping is favorable in the present context
Application to SuperNovae data analysis
• SA-based CV result (100 temp. points, 5 MCS at each temp.)
• Berkeley supernova database; M=78, N=276 (cf. M. Uemura, K. S. Kawabata, S. Ikeda, K. Maeda (2015))

 K                     1       2       3       4       5       6
 ε (training err.)     0.0308  0.0202  0.0176  0.0155  0.0138  0.0122
 ε_LOO (LOO err.)      0.0328  0.0239  0.0281  0.0331  0.0334  0.0362

• K=2 is the best model
• Comp. time: ~5 sec. for each LOO system
M. Uemura, K. S. Kawabata, S. Ikeda, K. Maeda (2015); Database: http://heracles.astro.berkeley.edu/sndb/
Application to SuperNovae data analysis
Variables selected over the 78 LOO systems (top five per K, with the number of times each was selected; * = none):

 K=1   variables        2    *    *    *    *
       times selected   78   0    0    0    0
 K=2   variables        2    1    275  *    *
       times selected   78   77   1    0    0
 K=3   variables        2    1    233  14   69
       times selected   78   76   69   3    2
 K=4   variables        2    1    233  94   225
       times selected   78   59   56   49   13
 K=5   variables        2    36   223  225  6
       times selected   78   37   33   31   27

The slide marks the two leading variables (2 and 1) as "Color index" and "Light curve width".
[Figure: CV errors (0.05 to 0.2) versus log λ (-6 to -1) for Approx 1, Approx 2, and 10-fold CV, annotated with K=9, K=6, and one-σ]
Application to SuperNovae data analysis
• Comparison with LASSO ($\ell_1$ relaxation)
 • With the one-sigma rule, the LASSO result says K=6.
 • However, Uemura et al. finally concluded K=2
  • through careful analysis of several statistical models and their expert knowledge
• Our result reproduces the conclusion without such special knowledge or treatment!
Summary
• Simulated annealing (SA) based cross validation (CV) in sparse linear regression (SLR) was investigated
 • For low ρ, the SA result fits the analytical RS result well.
 • For large ρ, the SA result deviates from the RS result.
  • The RSB effect becomes more significant and the dynamics (almost) ceases to evolve at T_AT in the rapid schedule
  • But this is advantageous for giving a smaller CV error, since for large ρ the solutions at low temperatures overfit to noise.
• Application to the SuperNovae data
 • The well-accepted model is reproduced naturally.
 • Reliability criterion by the distribution of variables?
• Messages
 • Typically, SLR is not so difficult (even though it is NP-hard in the worst case)
 • Relaxation ($\ell_1$ etc.) should be used more carefully
TO and YK: arXiv:1603.01399; EUSIPCO2016 proceedings