
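Statistical-Mechanical Approach to Sparse Linear Regression: Examples from Astronomical Data

Tomoyuki Obuchi (1), Department of Mathematical and Computing Science, School of Computing, Tokyo Institute of Technology (1)

Collaborators: Yoshiyuki Kabashima (1), Yoshinori Nakanishi (2), Masato Okada (3), Makoto Uemura (4)

(2) University of Tokyo, Komaba; (3) University of Tokyo, Graduate School of Frontier Sciences; (4) Hiroshima University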

• The solution is "sparse" (many zeros) • A representative: ℓ0 regularization

Underdetermined linear equation y = Ax, e.g. M = 2, N = 3:

y1 = A11 x1 + A12 x2 + A13 x3

y2 = A21 x1 + A22 x2 + A23 x3

(M equations, N unknowns)

Given y and A, compute x ← not unique

||x||_0: number of nonzero components

$$\min_x \frac{1}{2}\|y - Ax\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le K = N\rho \qquad \text{(standard form)}$$

$$\hat{x}(\lambda) = \arg\min_x \left\{ \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_0 \right\} \qquad \text{(Lagrange form)}$$

Variants of regularizations

• General regularization

$$\hat{x} = \arg\min_x \left\{ \frac{1}{2}\|y - Ax\|_2^2 + R(x) \right\}$$

• ℓp regularization

$$\|x\|_p^p = \sum_i |x_i|^p$$

Cusps lead to sparseness (at x = 0)

- p = 1 is common (LASSO) - p > 1: non-sparse - p < 1: nonconvex with local minima (computationally difficult)

Today's topics

1. Approximate cross validation (CV) for p = 1

• TO and YK: J. Stat. Mech. (2016) (LASSO)

2. Is p = 0 really hard? Typical performance of algorithms and phase transitions • TO, YN, YK, and MO: in preparation

✤ Solving p = 0 by Monte Carlo (simulated annealing, SA) • TO and YK: J. Phys.: Conf. Ser. 699 (2016) 012017 (1-12)

3. Hyperparameter selection by SA + brute-force CV

• TO and YK: arXiv:1603.01399 (EUSIPCO2016 proceedings)

LASSO and Approximate Cross Validation

C.f. TO and Y. Kabashima, J. Stat. Mech. (2016)

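LASSO and hyperparameter selection • LASSO (Lagrange form)

$$\hat{x}(\lambda) = \arg\min_x \left\{ \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1 \right\} \equiv \arg\min_x H(x)$$

• How should the regularization parameter λ be chosen?

• There are several schools of thought:

• Information criteria

• (Empirical) Bayes methods

• Cross validation (CV)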
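The slides do not say which LASSO solver was used. As one concrete possibility, here is a minimal cyclic coordinate-descent sketch for the Lagrange form above; the function names are ours (hypothetical), and each coordinate update is a soft-thresholding step on the partial residual.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal map of lam * |x|: shrink z toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(A, y, lam, n_sweeps=500):
    """Cyclic coordinate descent for min_x 1/2 ||y - A x||_2^2 + lam ||x||_1.

    Assumes no column of A is identically zero.
    """
    _, N = A.shape
    x = np.zeros(N)
    col_sq = np.sum(A ** 2, axis=0)        # ||a_i||_2^2 for every column i
    r = np.array(y, dtype=float)           # residual y - A x (x = 0 at the start)
    for _ in range(n_sweeps):
        for i in range(N):
            r += A[:, i] * x[i]            # residual with column i removed
            x[i] = soft_threshold(A[:, i] @ r, lam) / col_sq[i]
            r -= A[:, i] * x[i]            # add the updated column i back
    return x
```

At convergence the KKT conditions hold: A^⊤(y − Ax̂) equals λ·sign(x̂_i) on the active variables and has magnitude at most λ elsewhere, which is a convenient check.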

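Cross validation (CV)

[Figure: the data split into training and test parts, with the test block rotating over Folds 1–4]

• Split the data into a training set and a test set • Train the parameters on the training set • Measure the predictive ability on the test set

CV error = average over folds = estimate of the prediction error

Splitting the data into k equal parts is called k-fold CV; k = 10 and k = M (= leave-one-out) are common choices (the figure shows k = 4). A larger k gives a more accurate estimate but costs more computation.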
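A minimal sketch of k-fold CV as described above, written for a generic estimator `fit` (a hypothetical callable supplied by the caller, for example a LASSO solver at fixed λ). Setting k = M gives LOOCV, which needs M separate refits; that cost is what motivates a cheaper approximation.

```python
import numpy as np


def k_fold_cv_error(A, y, fit, k, seed=0):
    """k-fold CV estimate of the prediction error (1/2M) sum_mu (y_mu - a_mu^T x)^2.

    fit(A_train, y_train) -> x is any estimator supplied by the caller.
    k = M (one datum per fold) gives leave-one-out CV.
    """
    M = len(y)
    order = np.random.default_rng(seed).permutation(M)
    folds = np.array_split(order, k)                   # k (nearly) equal parts
    sq_err = np.empty(M)
    for test in folds:
        train = np.setdiff1d(order, test)              # all data outside this fold
        x = fit(A[train], y[train])                    # train on the other k - 1 folds
        sq_err[test] = (y[test] - A[test] @ x) ** 2    # score on the held-out fold
    return sq_err.mean() / 2.0                         # averaged over all held-out data
```

For ordinary least squares, the k = M result can be checked against the closed-form leave-one-out residuals r_μ / (1 − P_μμ), where P is the hat matrix.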


Goal: make CV feasible at low computational cost

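LOOCV and the linear response approximation • Leave-one-out CV (LOOCV)

$$\hat{x}^{\setminus\mu}(\lambda) = \arg\min_x \left\{ \frac{1}{2}\sum_{\nu(\neq\mu)} \Big(y_\nu - \sum_i A_{\nu i}x_i\Big)^2 + \lambda\|x\|_1 \right\}$$

$$\epsilon_{\rm LOO}(\lambda) = \frac{1}{2M}\sum_\mu \Big(y_\mu - \sum_i A_{\mu i}\hat{x}_i^{\setminus\mu}(\lambda)\Big)^2$$

← requires M optimizations (high computational cost)

• Linear response approximation • Derivation of the perturbation: expand the cost function in $d = \hat{x} - \hat{x}^{\setminus\mu}$ • Compute the response coefficient (susceptibility): see below

$$H(\hat{x}) - H(\hat{x} - d) \approx h^\mu(\hat{x}) \cdot d$$

$$\hat{x}^{\setminus\mu} \approx \hat{x} - \chi^{\setminus\mu} h^\mu(\hat{x}), \qquad \chi^{\setminus\mu} = \frac{\partial \hat{x}^{\setminus\mu}}{\partial h}$$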

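Approximate CV formula • Linear approximation + linearity of the model + definition of the LOOE (with a_µ the µth row of A and $\hat{x}^{\setminus\mu} \approx \hat{x} - \chi^{\setminus\mu} h^\mu(\hat{x})$)

$$\chi^{-1} = (\chi^{\setminus\mu})^{-1} + a_\mu a_\mu^\top$$

$$\epsilon_{\rm LOO} = \frac{1}{2M}\sum_{\mu=1}^{M}\Big(y_\mu - \sum_i A_{\mu i}\hat{x}_i^{\setminus\mu}\Big)^2 \approx \frac{1}{2M}\sum_\mu \Big(1 - \sum_{i,j}A_{\mu i}A_{\mu j}\chi_{ij}\Big)^{-2}\Big(y_\mu - \sum_i A_{\mu i}\hat{x}_i\Big)^2$$

• What remains is whether the susceptibility χ can be computed • If the regularizer has no singularity?

• χ is the inverse of the cost-function Hessian: $\chi = \left(A^\top A + \nabla^2 R(\hat{x})\right)^{-1}$, which reduces to $(A^\top A)^{-1}$ without regularization • For LASSO:

• χ is the inverse of the cost-function Hessian restricted to the active variables • The active set S_A is assumed not to change under the perturbation

$$\chi_{S_A S_A} = \left(A_{*S_A}^\top A_{*S_A}\right)^{-1}$$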
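The step from the linear response relation to the approximate formula can be spelled out. Assume (our reading; the slides do not define h^µ explicitly) that h^µ(x̂) is the gradient of the LOO cost at x̂, i.e. h^µ(x̂) = r_µ a_µ with r_µ = y_µ − a_µ^⊤x̂, and write ℓ_µ = a_µ^⊤χa_µ = Σ_{i,j} A_µi A_µj χ_ij. The Sherman–Morrison formula applied to χ^{-1} = (χ^{\µ})^{-1} + a_µa_µ^⊤ then gives:

$$\chi^{\setminus\mu} = \left(\chi^{-1} - a_\mu a_\mu^\top\right)^{-1} = \chi + \frac{\chi a_\mu a_\mu^\top \chi}{1 - \ell_\mu}, \qquad a_\mu^\top \chi^{\setminus\mu} a_\mu = \ell_\mu + \frac{\ell_\mu^2}{1 - \ell_\mu} = \frac{\ell_\mu}{1 - \ell_\mu}$$

$$y_\mu - a_\mu^\top \hat{x}^{\setminus\mu} \approx r_\mu + r_\mu\, a_\mu^\top \chi^{\setminus\mu} a_\mu = \frac{r_\mu}{1 - \ell_\mu}$$

$$\epsilon_{\rm LOO} \approx \frac{1}{2M}\sum_{\mu=1}^{M} \frac{r_\mu^2}{(1 - \ell_\mu)^2}$$

For unregularized least squares this is the exact leave-one-out identity; for LASSO it is exact only as long as the active set and signs do not change when the µth datum is removed.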

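Two approximate CV formulas

• Approx. 1

$$\epsilon_{\rm LOO} \approx \frac{1}{2M}\sum_{\mu=1}^{M}\Big(1 - \sum_{i,j\in S_A}A_{\mu i}\big[(A_{*S_A}^\top A_{*S_A})^{-1}\big]_{ij}A_{\mu j}\Big)^{-2}\Big(y_\mu - \sum_i A_{\mu i}\hat{x}_i\Big)^2$$

• Approx. 2 = Approx. 1 + random matrix theory

$$1 - \sum_{i,j}A_{\mu i}A_{\mu j}\chi_{ij} \approx \left(\frac{\alpha}{\alpha - \rho(\lambda)}\right)^{-1}$$

$$\epsilon_{\rm LOO} \approx \left(\frac{\alpha}{\alpha - \rho}\right)^2 \frac{1}{2M}\|y - A\hat{x}\|_2^2 = \left(\frac{\alpha}{\alpha - \rho}\right)^2 \epsilon_{\rm Train}, \qquad \alpha = \frac{M}{N}, \quad \rho(\lambda) = \frac{\|\hat{x}\|_0}{N}$$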
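Given one LASSO fit x̂, both formulas need only the residuals and the Gram matrix of the active columns. Below is a minimal NumPy sketch (function name hypothetical; x̂ may come from any LASSO solver; it assumes |S_A| < M, so that α > ρ and 1 − ℓ_µ > 0). A quick way to see where the Approx. 2 factor comes from: summed over µ, the Approx. 1 correction terms form the trace of a projection, Σ_µ Σ_{i,j∈S_A} A_µi χ_ij A_µj = |S_A|, so their average is ρ/α, and replacing every term by that average gives (α/(α − ρ))². This trace argument is our gloss; the slides attribute Approx. 2 to random matrix theory.

```python
import numpy as np


def approx_loo_lasso(A, y, x_hat, tol=1e-10):
    """Approx. 1 and Approx. 2 LOO errors from a single LASSO fit x_hat.

    Assumes the active set S_A = {i : |x_hat_i| > tol} does not change when one
    datum is removed, so chi is the inverse Hessian restricted to S_A.
    """
    M, N = A.shape
    active = np.abs(x_hat) > tol                       # active set S_A
    resid = y - A @ x_hat                              # r_mu = y_mu - a_mu^T x_hat
    A_S = A[:, active]
    if active.any():
        chi = np.linalg.inv(A_S.T @ A_S)               # chi_{S_A S_A}
        lev = np.einsum("mi,ij,mj->m", A_S, chi, A_S)  # sum_{i,j in S_A} A_mi chi_ij A_mj
    else:
        lev = np.zeros(M)
    eps_approx1 = np.mean((resid / (1.0 - lev)) ** 2) / 2.0
    alpha, rho = M / N, active.sum() / N               # alpha = M/N, rho = ||x_hat||_0 / N
    eps_train = resid @ resid / (2.0 * M)
    eps_approx2 = (alpha / (alpha - rho)) ** 2 * eps_train
    return eps_approx1, eps_approx2
```

For ordinary least squares (all variables active, M > N), Approx. 1 coincides with brute-force LOOCV, which makes a convenient sanity check.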

DEMO: Application to Supernova Data Analysis

Application to supernova data analysis • Type Ia supernovae produce a consistent peak luminosity (absolute magnitude at maximum) determined by the "Chandrasekhar limit" – to zeroth approximation

• "Standard candle" of the Universe → distance • "Accelerating expansion of the Universe", Nobel Prize in 2011

[Figure: Riess et al. (2001); horizontal axis: redshift]

Application to supernova data analysis

[Figure: light curve (time in days) annotated with the peak luminosity (magnitude at maximum), the light-curve width, and the color index; sources: Napolinano97, Candonau+87]

• However, in reality the peak luminosity varies owing to various factors.

• Two known major factors – "Light curve width" (radioactive decay) – "Color index" (interstellar extinction)

[Figure: CV errors vs. log λ (−6 to −1) for Approx 1, Approx 2, and 10-fold CV]

Application to supernova data analysis • Calibration of the peak luminosity

$$M_\mu \approx M_0 + A_{\mu,1}\beta_1 + A_{\mu,2}\beta_2 + A_{\mu,3}\beta_3 + \cdots + A_{\mu,276}\beta_{276}$$

(annotations: µth datum; const; color index; light curve width; other candidates)

• Berkeley supernova database (http://heracles.astro.berkeley.edu/sndb/) • M = 78, N = 276 (cf. M. Uemura, K. S. Kawabata, S. Ikeda, K. Maeda (2015))

[Figure: CV errors vs. log λ (−6 to −1) for Approx 1, Approx 2, and 10-fold CV, overlaid and in separate panels]






[Figure: CV errors vs. log λ (−6 to −1) for the supernova data, with the minimum + one-sigma choice marked]

Minimum + one-sigma rule gives K = 6

Actual computational time: 10-fold CV 31.6 s; Approx. 1 3.2 s; Approx. 2 2.85 s
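One common reading of the "minimum + one-sigma" rule (often called the one-standard-error rule): among the candidate models, pick the sparsest one whose CV error lies within one standard error of the minimum CV error. The sketch assumes per-model CV means and standard errors are already available (for example from the fold-to-fold spread); the function name is hypothetical and the numbers in the check below are made up for illustration.

```python
import numpy as np


def one_sigma_rule(K_values, cv_mean, cv_se):
    """Sparsest K whose CV error is within one standard error of the minimum CV error."""
    K_values = np.asarray(K_values)
    cv_mean = np.asarray(cv_mean, dtype=float)
    cv_se = np.asarray(cv_se, dtype=float)
    i_min = int(np.argmin(cv_mean))                  # model with the smallest CV error
    threshold = cv_mean[i_min] + cv_se[i_min]        # minimum + one sigma
    acceptable = cv_mean <= threshold
    return int(K_values[acceptable].min())           # prefer the sparsest acceptable model
```

For LASSO, the candidates are indexed by λ, and K is read off as the number of nonzeros of the corresponding solution.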

Summary • Leave-one-out CV in LASSO (+TV) has been examined

• An efficient approximation of the LOOE has been derived

• Perturbative, assuming M and N are large enough • Reproduces the direct CV result in much shorter time

Is ℓ0 really hard? Typical performance of algorithms and phase transitions

& a Monte Carlo-based algorithm

C.f.

TO and YK, J. Phys.: Conf. Ser. 699 (2016) 012017 (1-12);

Y. Nakanishi, TO, M. Okada, YK, J. Stat. Mech. (2016) 063302;

another paper in preparation

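Statistical mechanics of information • Statistical mechanics of information: analyzing problems in information science with the methods of statistical mechanics

Timeline: 1980 → 1990 → 2000 → 2010 → present

✤ Neural networks • Evaluating associative-memory capacity, Boltzmann machines • cf) Hopfield 1982, Hinton 1985, Gardner-Derrida 1988

✤ Learning theory • Evaluating generalization ability, Bayesian inference • cf) Levin 1990, Gyorgyi-Tishby 1990

✤ Coding, communication, and combinatorial optimization problems • Typical-performance evaluation, solution-search algorithms, phase transitions and solvability • cf) Kabashima-Saad 1999, Tanaka 2001, Cocco-Monasson 2001, Mezard et al. 2003

✤ Sparse estimation, inverse problems, machine learning • cf) Kabashima et al. 2009, Krzakala et al. 2012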

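Statistical mechanics of information • What are its strengths?

• Algorithms and computational methods from statistical mechanics • Mean-field approximation, cavity method, Monte Carlo methods

• Designed from the outset to handle systems with many degrees of freedom • Its own ways of characterizing problems and evaluating performance

• Phase transitions and critical phenomena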

Statistical mechanics of information

• Example: compressed sensing

• Its own ways of characterizing problems and evaluating performance • Phase transitions and critical phenomena

• Consider the following sparse estimation problem:

$$P(p):\quad \min_x \frac{1}{2}\|y - Ax\|_2^2 \quad \text{subj. to} \quad \|x\|_p^p \le \hat{K} = N\hat{\rho}, \qquad y = Ax^0, \quad \|x^0\|_0 = K_0 = N\rho_0$$

• When can the sparse signal x^0 be perfectly recovered from the observation y?

[Figure: perfect-recovery boundaries (ℓ1 limit and ℓ0 limit) as functions of ρ0; analytical solution (Kabashima et al. (2009)) and numerical solution (Donoho-Tanner (2006))]

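Statistical mechanics of information

We want to enlarge the recoverable region • What about p < 1? ★ Can a sound algorithm be built?

• p = 0 • Greedy methods (OMP, etc.)

• Fall far short of the scaling limit • Bayes and message passing (Krzakala et al. 2012)

• Reach the ℓ0 limit in polynomial time! • For real problems, though... (being Bayesian, it needs many assumptions)

• Monte Carlo sampling! (TO et al. 2016~) • Formulated as an optimization problem rather than a Bayesian one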

Introduction of Support Indicator

y = Ax, with M = 4 and N = 6

c1 = 0, c2 = 1, c3 = 0, c4 = 1, c5 = 0, c6 = 1 ⇒ y ≈ x2 a2 + x4 a4 + x6 a6

$$\sum_i c_i = 3 = 6 \times \frac{1}{2} = N\rho$$

• Introduce binary variables c expressing which variables are used • We call c the sparse weights, and S = {i | c_i = 1} the support.

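Solution on Fixed Support

Ground-state search problem (combinatorial optimization); the worst case is NP-hard (Natarajan 95.)

Exhaustive search is infeasible → approximate search → Monte Carlo on the Hamiltonian H(c)

• Shrunken expression on the support (A_c and x_c: the columns of A and the components of x with c_i = 1)

$$\min_x \frac{1}{2}\|y - Ax\|_2^2 \ \text{s.t.}\ \|x\|_0 \le K \;=\; \min_{c:|c|=K}\min_{x_c}\frac{1}{2}\|y - A_c x_c\|_2^2 \;=\; \min_{c:|c|=K}\left\{ H(c) = \frac{1}{2}y^\top\left(I - A_c(A_c^\top A_c)^{-1}A_c^\top\right)y \right\}$$

$$\hat{x}_c = (A_c^\top A_c)^{-1}A_c^\top y$$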
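The two expressions for H(c) above (projection form and minimized residual at x̂_c) should agree; a quick numerical check on the 6-variable example, with hypothetical helper names and random A and y:

```python
import numpy as np


def energy_projection(A, y, c):
    """H(c) = 1/2 y^T (I - A_c (A_c^T A_c)^{-1} A_c^T) y for a boolean support mask c."""
    A_c = A[:, c]
    P = A_c @ np.linalg.solve(A_c.T @ A_c, A_c.T)   # projector onto the span of used columns
    return 0.5 * y @ (y - P @ y)


def energy_residual(A, y, c):
    """min_{x_c} 1/2 ||y - A_c x_c||^2, evaluated at x_c = (A_c^T A_c)^{-1} A_c^T y."""
    A_c = A[:, c]
    x_c = np.linalg.solve(A_c.T @ A_c, A_c.T @ y)
    r = y - A_c @ x_c
    return 0.5 * r @ r
```

Both agree because I − P is symmetric and idempotent, so y^⊤(I − P)y = ||(I − P)y||².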

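Monte Carlo: Metropolis Algorithm

• Energy difference, Boltzmann distribution, Metropolis criterion

• Note: the cost of computing Δ is O(MK) (~N²) thanks to the matrix inversion formula

• Single column flip: c = (0, 1, 0, 1, 0, 1)^T → c' = (0, 1, 0, 1, 1, 0)^T

$$\Delta(c \to c') = \frac{1}{2}y^\top\left\{A_c(A_c^\top A_c)^{-1}A_c^\top - A_{c'}(A_{c'}^\top A_{c'})^{-1}A_{c'}^\top\right\}y$$

$$P(c;\mu,\rho) = \frac{1}{Z}\,\delta(|c| - N\rho)\,e^{-\mu H(c)}$$

$$P_{\rm accept}(c \to c') = \min\left\{1,\ e^{-\mu\Delta(c \to c')}\right\}$$

• The number of degrees of freedom K = |c| is automatically kept constant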

Sampling at T = 0: Simulated Annealing (SA)

• Simulated annealing (SA): an optimization solver using MCMC

• Start from a high temperature T = 1/µ • Update the sparse weights c while gradually decreasing T • At T ≈ 0, the sampled c is the minimizer of ε(c)

[Figure: SA and phase space at high T, middle T, low T, and T ≈ 0]

• Logarithmic schedule: $T(t) > A(N)/\log(t+2)$
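A minimal sketch of SA over the sparse weights c, under these assumptions (ours, not the authors' code): a "column flip" swaps one used and one unused column, as in the c → c' example, so |c| = K stays fixed; the inverse temperatures follow µ_a = 10⁻⁸ + r^{a−1} − 1 from the benchmark setup; H(c) is recomputed from scratch by least squares (O(MK²) per move, rather than the O(MK) update via the matrix inversion formula); and the best configuration seen is tracked as an extra safeguard. Function names are hypothetical.

```python
import numpy as np


def energy(A, y, support):
    """H(c) = min_{x_c} 1/2 ||y - A_c x_c||^2 for a boolean mask or index list."""
    A_c = A[:, support]
    x_c, *_ = np.linalg.lstsq(A_c, y, rcond=None)
    r = y - A_c @ x_c
    return 0.5 * r @ r


def simulated_annealing_l0(A, y, K, n_temps=100, r=1.1, tau=5, seed=0):
    """SA with |c| = K fixed; 1 MCS = N proposed column swaps, tau MCS per temperature."""
    rng = np.random.default_rng(seed)
    _, N = A.shape
    c = np.zeros(N, dtype=bool)
    c[rng.choice(N, size=K, replace=False)] = True     # random initial support
    H = energy(A, y, c)
    best_c, best_H = c.copy(), H
    for a in range(1, n_temps + 1):
        mu = 1e-8 + r ** (a - 1) - 1.0                 # inverse temperature mu_a
        for _ in range(tau * N):
            i = rng.choice(np.flatnonzero(c))          # used column to drop
            j = rng.choice(np.flatnonzero(~c))         # unused column to add
            c_new = c.copy()
            c_new[i], c_new[j] = False, True
            H_new = energy(A, y, c_new)
            delta = H_new - H
            # Metropolis criterion: accept with probability min(1, exp(-mu * delta))
            if delta <= 0 or rng.random() < np.exp(-mu * delta):
                c, H = c_new, H_new
                if H < best_H:
                    best_c, best_H = c.copy(), H
    return best_c, best_H
```

On a small noiseless instance the returned energy can be compared against exhaustive search over all supports of size K.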

Benchmark • Benchmark = test on random data

$$y = Ax^0 + \xi, \quad A_{\mu i} \sim \mathcal{N}(0, N^{-1}), \quad \xi_\mu \sim \mathcal{N}(0, \sigma_\xi^2), \quad P(x^0_i) = \rho_0\,\mathcal{N}(0, \sigma_x^2) + (1 - \rho_0)\,\delta(x^0_i)$$

• Scaling of the problem: M = αN, N = 100, 200, 400, with α, ρ0, ρ = O(1) and ε = ⟨H⟩/N

• Annealing schedule: $\mu_a = T_a^{-1} = 10^{-8} + r^{a-1} - 1$, r = 1.1 (a = 1, ..., 100), with τ_a = τ

• On this setup, the performance of simulated annealing (SA) on $\min_{c:|c|=K}\{H(c)\}$ is examined
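A sketch of the benchmark generator and annealing schedule, following the distributions listed above (function names are hypothetical). The spike-and-slab prior P(x⁰_i) is sampled by drawing each entry as nonzero independently with probability ρ0.

```python
import numpy as np


def make_instance(N, alpha, rho0, sigma_x2, sigma_xi2, seed=0):
    """Random instance y = A x0 + xi with M = alpha * N, as in the benchmark setup."""
    rng = np.random.default_rng(seed)
    M = int(round(alpha * N))
    A = rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))            # A_mu_i ~ N(0, 1/N)
    nonzero = rng.random(N) < rho0                                 # each entry nonzero w.p. rho0
    x0 = np.where(nonzero, rng.normal(0.0, np.sqrt(sigma_x2), size=N), 0.0)
    xi = rng.normal(0.0, np.sqrt(sigma_xi2), size=M)               # xi_mu ~ N(0, sigma_xi^2)
    return A, x0, A @ x0 + xi


def annealing_schedule(n_temps=100, r=1.1):
    """Inverse temperatures mu_a = 1e-8 + r**(a-1) - 1 for a = 1, ..., n_temps."""
    a = np.arange(1, n_temps + 1)
    return 1e-8 + r ** (a - 1) - 1.0
```

The first inverse temperature is essentially zero (T ≈ 10⁸), and with r = 1.1 the last one is roughly 1.1⁹⁹ ≈ 10⁴.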

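Before the numerical experiments: understanding algorithm performance through statistical-mechanical analysis

Statistical-mechanical analysis

• Energy of Ising (binary) spins • Energy landscape → entropy

• Training error = Hamiltonian

$$s(\epsilon) = \frac{1}{N}\log \mathrm{Tr}_c\, \delta\big(\epsilon - \epsilon(c)\big)$$

• If y and A are Gaussian, the typical entropy can be computed by the replica method • Note: we fix |c| = Nρ ⇔ the magnetization is fixed

$$H(c) = \frac{1}{2}y^\top\left(I - A_c(A_c^\top A_c)^{-1}A_c^\top\right)y = N\epsilon(c)$$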
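The entropy s(ε) counts supports at a given training error. For a tiny instance the count can be taken directly: enumerate every c with |c| = K, compute ε(c) = H(c)/N, and histogram (the bins stand in for the δ-function). This is a single-instance illustration only, not the replica computation of the typical entropy, and exhaustive enumeration is feasible only for very small N; the function name is hypothetical.

```python
import numpy as np
from itertools import combinations


def empirical_entropy(A, y, K, n_bins=20):
    """Binned s(eps) = (1/N) log #{c : |c| = K, eps(c) in bin}, with eps(c) = H(c)/N."""
    _, N = A.shape
    eps = []
    for support in combinations(range(N), K):
        A_c = A[:, list(support)]
        x_c, *_ = np.linalg.lstsq(A_c, y, rcond=None)
        r = y - A_c @ x_c
        eps.append(0.5 * (r @ r) / N)                  # eps(c) = H(c) / N
    counts, edges = np.histogram(eps, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                                  # log 0 = -inf: drop empty bins
    return centers[keep], np.log(counts[keep]) / N, counts
```

The counts are returned as well, so that they can be checked against the total number of supports, C(N, K).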

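Relationship between the entropy and the difficulty of searching for solutions

• Nothing happens: the entropy is analytic and concave

• Spin-glass transition: causes severe slowing down that cannot be seen from the shape of the entropy

• Second-order transition: a jump in higher derivatives; causes critical slowing down

• First-order transition: the entropy has multiple peaks; moving between them is extremely hard

[Figure: s(ε) vs. ε for each of the four cases]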

Details in Analysis • Introduce the generating function

• Take the average over the data and the basis matrix: replica method

• Two different kinds of replicas, n and ν, are needed, because there are two different dynamical variables, c and x

• Hamiltonian in the limit β → ∞

Phase Diagram (T–ρ)

[Figure: T–ρ phase diagrams for α = 0.5, ρ0 = 0.25 and α = 0.5, ρ0 = 0.2, showing the lines T_AT, T_EC, T_SP, and T_F, with magnified views of the region T ≤ 0.01]

• T_AT: AT instability temperature • T_EC: entropy-crisis temperature (RS level) • T_SP: spinodal temperature • T_F: first-order transition temperature (free energies equal)

Phase space picture (Noiseless case)

[Figure: schematic entropy curves s(ε) and phase-space pictures along ρ, with ρ0, ρ_AT, ρ_CR and the true signal x0 marked; regions labeled Easy (no perfect recovery), Easy (perfect recovery), Hard (probably polynomial), and Hard (probably exponential by naive MC)]

Numerical experiments (noiseless limit)

(i) σ_x² = 1, σ_ξ² = 0

(1 MCS = N random column flips)


τ = 5 MCS

• Agreement with the analytical curve: SA follows the equilibrium state

$$\epsilon_x = \frac{1}{2N}\left\langle \|x^0 - x\|_2^2 \right\rangle$$

Easy case

[Figure: ε_x(T) vs. ρ (ρ0 marked), and the annealing path on the T–ρ phase diagram (α = 0.5, ρ0 = 0.2) with T_AT, T_EC, T_SP, T_F and ρ0, ρ_AT, ρ_CR]



τ = 5 MCS

[Figure: ε_x(T) vs. ρ with the ℓ1 limit marked, and the same phase diagram (α = 0.5, ρ0 = 0.2)]

• Easy, but the ℓ1 relaxation cannot recover x0

• Good agreement with the analytical curve ↓ SA outperforms the ℓ1 relaxation

τ = 5 MCS

• A metastable state traps the system: the first-order transition prevents recovery of x0

[Figure: ε_x(T) vs. ρ (ρ0 marked) and the annealing path on the T–ρ phase diagram (α = 0.4, ρ0 = 0.2)]

τ = 100 MCS vs. τ = 5 MCS

• Avoiding the first-order transition

[Figure: ε_x(T) for τ = 5 MCS and τ = 100 MCS, and the annealing path on the T–ρ phase diagram (α = 0.4, ρ0 = 0.2)]

• The computational time of SA scales as

$$O(N_\mu \tau N N_{\rm MC}) = O(\alpha\rho\,\tau N_\mu N^3)$$

if τ is O(1) (the same order as convex solvers); here N_µ is the number of temperature points and N_MC = O(MK) is the cost of one column flip.

Summary • The performance of SA was examined in SAP

• Analytical result → benchmark test • Rapid SA works fairly well in several situations

• It outperforms the ℓ1 relaxation • But it fails to find x0 when a metastable state appears

• Finding a good value of K = Nρ is crucial • Future work

• Application to real data • Diminishing the disadvantages

• Extended ensembles, seeding trick, good initial conditions (ℓ1, ℓp relaxation?)

• T > 0 solutions can be useful in noisy cases • SA-based cross validation

TO, YK: J. Phys.: Conf. Ser. 699 (2016) 012017

Hyperparameter selection by SA + brute-force CV

C.f.

TO and YK: EUSIPCO2016 Proceedings, arXiv:1603.01399

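LOO data, solution, and prediction • Leave-one-out (LOO) data

Remove the µ = 1st datum: $y \approx A_c x_c \;\rightarrow\; y^{\setminus 1} \approx A_c^{\setminus 1} x_c$

$$\min_{c:|c|=K}\min_{x_c}\frac{1}{2}\|y^{\setminus\mu} - A_c^{\setminus\mu}x_c\|_2^2, \qquad \hat{x}_c^{\setminus\mu} = \left((A_c^{\setminus\mu})^\top A_c^{\setminus\mu}\right)^{-1}(A_c^{\setminus\mu})^\top y^{\setminus\mu}$$

$$H^{\setminus\mu}(c) = \frac{1}{2}\left(y^{\setminus\mu}\right)^\top\left\{I - A_c^{\setminus\mu}\left((A_c^{\setminus\mu})^\top A_c^{\setminus\mu}\right)^{-1}(A_c^{\setminus\mu})^\top\right\}y^{\setminus\mu}$$

$c^{\setminus\mu}_{\rm SA}$: the optimal solution for the µth LOO system (found by SA)

• Prediction quality for the µth datum (test-data prediction), with a_µ the µth row vector of A:

$$\epsilon_\mu(K) \equiv \frac{1}{2}\left(y_\mu - a_\mu^\top \hat{x}^{\setminus\mu}_{c^{\setminus\mu}_{\rm SA}}\right)^2$$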

LOO error • We define the LOO CV error (LOOE) as the mean

$$\epsilon_{\rm LOO}(K) = \frac{1}{M}\sum_{\mu=1}^{M}\epsilon_\mu(K)$$

• Issue: computational cost • Not cheap: (rapid) SA must be rerun M times

• The scale is $O(M\,|S_K|\,K_{\max}\,N_{\rm SA})$ • |S_K|: number of trial values of K • K_max: maximum of K • N_SA: computational cost of one SA run, O(MNK)

• But it is easy to parallelize! The significant part is just O(N_SA).
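The brute-force LOO CV over K can be written directly from the definitions above. In this sketch, exhaustive best-subset search stands in for one SA run (feasible only for tiny N); with SA one would instead call the annealer on each LOO system. Function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def best_subset(A, y, K):
    """Exact min over |c| = K of 1/2 ||y - A_c x_c||^2 (a stand-in for one SA run)."""
    best_H, best_cols, best_x = np.inf, None, None
    for support in combinations(range(A.shape[1]), K):
        cols = list(support)
        x_c, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        r = y - A[:, cols] @ x_c
        H = 0.5 * (r @ r)
        if H < best_H:
            best_H, best_cols, best_x = H, cols, x_c
    return best_cols, best_x


def loo_error(A, y, K):
    """eps_LOO(K) = (1/M) sum_mu eps_mu(K), eps_mu = 1/2 (y_mu - a_mu^T xhat_mu)^2."""
    M = len(y)
    eps = np.empty(M)
    for mu in range(M):
        keep = np.arange(M) != mu                          # LOO data: drop row mu
        cols, x_c = best_subset(A[keep], y[keep], K)       # solve the mu-th LOO system
        eps[mu] = 0.5 * (y[mu] - A[mu, cols] @ x_c) ** 2   # predict the held-out datum
    return eps.mean()
```

Scanning K over a candidate set S_K and taking the minimizer of ε_LOO(K) reproduces the selection procedure; each of the M LOO systems is independent, which is why the computation parallelizes easily.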

Benchmark on Synthetic Data

• Randomly generated data ⇐ the analytical solution is available

$$y = Ax^0 + \xi, \quad A_{\mu i} \sim \mathcal{N}(0, N^{-1}), \quad \xi_\mu \sim \mathcal{N}(0, \sigma_\xi^2), \quad P(x^0_i) = \rho_0\,\mathcal{N}(0, \sigma_x^2) + (1 - \rho_0)\,\delta(x^0_i)$$

• Random data, 100 temperature points, 5 MCS at each temperature

[Figure: training error ε (left) and LOO error ε_LOO with generalization error ε_g (right) vs. ρ for α = 0.5, ρ0 = 0.1, σ_x² = 10, σ_ξ² = 0.1; SA results for N = 100, 200, 400 compared with the analytical curves (legend: RS, AT), with ρ0 and ρ* marked]

• The match in training error is very good, while the LOO (test) error deviates from the analytical solution for large ρ

- Why is the LOO error so different from the RS T = 0 prediction? - Trapping effect at T = T_AT

- In rapid annealing, the system's state almost ceases to evolve - It does not reach the equilibrium solution

- For large ρ there are many states at T < T_AT - Their training errors are almost the same - The test error, however, changes considerably as T is lowered ← overfitting → Thus, trapping is favorable in the present context







• SA-based CV result (100 temperature points, 5 MCS at each temperature)

K                    1       2       3       4       5       6
ε (training err.)    0.0308  0.0202  0.0176  0.0155  0.0138  0.0122
ε_LOO (LOO err.)     0.0328  0.0239  0.0281  0.0331  0.0334  0.0362

• K = 2 is the best model • Comp. time: ~5 s for each LOO system

M. Uemura, K. S. Kawabata, S. Ikeda, K. Maeda (2015). Database: http://heracles.astro.berkeley.edu/sndb/


Variables selected in the LOO systems (times selected out of M = 78)

K=1: variables 2, *, *, *, *; times selected 78, 0, 0, 0, 0

K=2: variables 2, 1, 275, *, *; times selected 78, 77, 1, 0, 0

K=3: variables 2, 1, 233, 14, 69; times selected 78, 76, 69, 3, 2

K=4: times selected 78, 59, 56, 49, 13

K=5: times selected 78, 37, 33, 31, 27

(Labels in the figure: "Color index", "Light curve width")

[Figure: LASSO CV errors vs. log λ (−6 to −1), with K = 9, K = 6, and the one-σ level marked]



• Comparison with LASSO (ℓ1 relaxation) • With the one-sigma rule, the LASSO result says K = 6 • However, Uemura et al. finally concluded K = 2 • By careful analysis of several statistical models and their expert knowledge

• Our result reproduces this conclusion without such special knowledge or treatment!

Summary • Simulated annealing (SA)-based cross validation (CV) in sparse linear regression (SLR) was investigated • For low ρ, the SA result fits the analytical RS result well • For large ρ, the SA result deviates from the RS result

• The RSB effect becomes more significant, and with the rapid schedule the dynamics (almost) ceases to evolve at T_AT

• But this is advantageous, giving a smaller CV error, since for large ρ the solutions at low temperatures overfit the noise

• Application to the supernova data • The well-accepted model is reproduced naturally • A reliability criterion from the distribution of selected variables?

• Messages • Typically, SLR is not so difficult (even though it is NP-hard in the worst case) • Relaxations (ℓ1 etc.) should be used more carefully

TO and YK: arXiv:1603.01399; EUSIPCO2016 proceedings
