双眼鏡総合カタログ...双眼鏡総合カタログ 双眼鏡 / フィールドスコープ / デジスコーピングシステム / ゴルフ用レーザー距離計 ネイチャースコープ
Sugiyama Lab, NII - 隣接代数と双対平坦構造を 用い …Nov.20–23,2019 IBIS2019...
Transcript of Sugiyama Lab, NII - 隣接代数と双対平坦構造を 用い …Nov.20–23,2019 IBIS2019...
Nov. 20–23, 2019IBIS 2019
隣接代数と双対平坦構造を用いた学習
杉山 麿人(国立情報学研究所,JSTさきがけ研究者)
Matrix Balancingp₁₁ p₁₂
p₂₁ p₂₂
1/41
Matrix Balancing
r₁s₁ p₁₁ r₁s₂ p₁₂
r₂s₁ p₂₁ r₂s₂ p₂₂
p₁₁ p₁₂
p₂₁ p₂₂
s₁ 0
0 s₂
r₁ 0
0 r₂
∑j r₁sj p₁j = 1
∑j r₂sj p₂j = 1
∑i ris₁ pi₁ = 1 ∑i ris₂ pi₂ = 1
=
Find r and s:(Make doublystochasticmatrix)
1/41
Sinkhorn-Knopp Algorithm• Alternately rescale all rows and columnsof a matrix 𝑃 to sum to 1
• Commonly used to compute entropy-regularizedOptimal transport (Wasserstein distance)
2/41
Revisit Matrix Balancing [ICML2017]
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
3/41
Introduce 𝜼
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
4/41
Introduce 𝜼
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
4/41
Introduce 𝜼
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
4/41
Introduce 𝜼
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
4/41
Introduce 𝜼
3
2
1
2 1p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
4/41
Introduce 𝜽
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
θ₂₂ θ₂₃
θ₃₂ θ₃₃
θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁
–+
5/41
Introduce 𝜽
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
θ₂₂ θ₂₃
θ₃₂ θ₃₃
θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁
–+
5/41
Introduce 𝜽
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
θ₂₂ θ₂₃
θ₃₂ θ₃₃
θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁
–+
5/41
Introduce 𝜽
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
θ₂₂ θ₂₃
θ₃₂ θ₃₃
θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁
–+
5/41
Balancing as Constraints on 𝜼 and 𝜽
3
2
1
2 1p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
η₁₁
η₂₁
η₃₁
η₁₂ η₁₃
θ₂₂ θ₂₃
θ₃₂ θ₃₃
θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁
–+
Matrix balancing ⇔Satisfy ηi₁ = η₁i = 3 – i + 1with keeping all θij
6/41
Natural Gradient• Given 𝑃 ∈ ℝ𝑛×𝑛, introduce (𝜃, 𝜂) aslog𝑝𝑖𝑗 =
∑
𝑘≤𝑖
∑
𝑙≤𝑗𝜃𝑘𝑙, 𝜂𝑖𝑗 =
∑
𝑘≥𝑖
∑
𝑙≥𝑗𝑝𝑘𝑙
• Let 𝐼 = {11, 12,… , 1𝑛, 21,… , 𝑛1}, 𝜽 = (𝜃𝑖)𝑇𝑖∈𝐼 , 𝜼 = (𝜂𝑖)𝑇𝑖∈𝐼• Using Fisher information matrix 𝐺 ∈ ℝ|𝐼|×|𝐼| s.t.𝑔(𝑖𝑗)(𝑘𝑙) = 𝜂max{𝑖,𝑘}max{𝑗,𝑙} − 𝜂𝑖𝑗𝜂𝑘𝑙, update formula is𝜽next = 𝜽 − 𝐺−1(𝜼 − 𝜼correct)
7/41
Results on Hessenberg MatrixN
umbe
r of i
tera
tion
100 500 5000100102104106
nnRu
nnin
g tim
e (s
ec.)
100 500 5000
10410210010–2
106108
> x1000faster
NaturalBNEWTSinkhorn
8/41
Introduce Partial Order Structure
p₁₁ p₁₂ p₁₃
p₂₁ p₂₂ p₂₃
p₃₁ p₃₂ p₃₃
s(p(s), θₛ, ηₛ)
9/41
Partially Ordered Sets (Posets)
∅
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
2{a,b,c} ℕ² Network
10/41
Incidence Algebra• Incidence algebra is defined over a poset (𝑆,≤)
– (Closed) Interval [𝑎, 𝑏] = {𝑠 ∈ 𝑆 ∣ 𝑎 ≤ 𝑠 ≤ 𝑏}
• Members of the incidence algebra arefunctions 𝑓(𝑎, 𝑏) from intervals [𝑎, 𝑏] to a scalar with(𝑓 + 𝑔)(𝑎, 𝑏) = 𝑓(𝑎, 𝑏) + 𝑔(𝑎, 𝑏)
(𝑓𝑔)(𝑎, 𝑏) =∑
𝑎≤𝑥≤𝑏𝑓(𝑎, 𝑥)𝑔(𝑥, 𝑏) (convolution)
11/41
Analogy to Matrix Multiplication• For [𝑎, 𝑏], define 𝒇,𝒈 ∈ ℝ|[𝑎,𝑏]| as
𝒇 =⎡⎢⎣
𝑓(𝑎, 𝑎)⋮
𝑓(𝑎, 𝑏)
⎤⎥⎦, 𝒈 =
⎡⎢⎣
𝑔(𝑎, 𝑎)⋮
𝑔(𝑏, 𝑎)
⎤⎥⎦
• For 𝑓 and 𝑔,(𝑓𝑔)(𝑎, 𝑏) = 𝒇𝑇𝒈
12/41
Special Elements• Delta function 𝛿:
𝛿(𝑎, 𝑏) = { 1 if 𝑎 = 𝑏0 otherwise
• Zeta function 𝜁: (integral)
𝜁(𝑎, 𝑏) = { 1 if 𝑎 ≤ 𝑏0 otherwise
• Möbius function 𝜇 = 𝜁−1: 𝜁𝜇 = 𝛿 (differential)13/41
Möbius Inversion Formula• Given a poset 𝑆, for any functions 𝑓, 𝑔 ∶ 𝑆 → ℝ,the Möbius inversion formula is given as⎧⎪
⎨⎪⎩
𝑔(𝑥) =∑
𝑠∈𝑆𝜁(𝑠, 𝑥)𝑓(𝑠) =
∑
𝑠≤𝑆𝜁(𝑠, 𝑥)𝑓(𝑠) =
∑
𝑠≤𝑥𝑓(𝑠)
𝑓(𝑥) =∑
𝑠∈𝑆𝜇(𝑠, 𝑥)𝑔(𝑠) =
∑
𝑠≤𝑆𝜇(𝑠, 𝑥)𝑔(𝑠)
14/41
E.g.1: Inclusion-Exclusion Principle
A∩B∩C
A∪B∪C
A∩B A∩C B∩C
A B C
A∩B∩C
A∪B∪C
A∩B A∩C B∩C
A B C f (X) = |X| g(X) = |X \ ⋃ Y|Y ⊂ X
f (X) = ∑Y≤X g(X)g(X) = ∑Y≤X μ(Y,X) f (Y)
|A∪B∪C|= |A|+|B|+|C|–|A∪B|–|A∪C|–|B∪C|+|A∩B∩C|
15/41
E.g.2: Divisibility• Divisibility poset: 𝑎 ≤ 𝑏 ⇐⇒ 𝑏|𝑎 (𝑎 divides 𝑏)• The Möbius function: 𝑛 = 𝑏∕𝑎 and
𝜇(𝑛) = { (−1)𝑘 if 𝑛 = 𝑝1𝑝2…𝑝𝑘 for 𝑘 distinct primes0 otherwise
– 𝜇(𝑎, 𝑏) = 𝜇(𝑏|𝑎), the Möbius function in number theory• The Riemann zeta function 𝜁 is given by1∕𝜁(𝑠) =
∑∞
𝑛=1𝜇(𝑛)∕𝑛𝑠
16/41
Log-Linear Model on Poset [ICML2017]
• For probability 𝑝∶𝑆 → (0, 1) with∑𝑥∈𝑆 𝑝(𝑥) = 1,introduce 𝜽 and 𝜼 as𝜃𝑥 =
∑𝑠∈𝑆
𝜇(𝑠, 𝑥) log𝑝(𝑠),
𝜂𝑥 =∑
𝑠∈𝑆𝜁(𝑥, 𝑠)𝑝(𝑠) =
∑𝑠≥𝑥
𝑝(𝑠)
• From the Möbius inversion formula, log-linear model is:log𝑝(𝑥) =
∑𝑠∈𝑆
𝜁(𝑠, 𝑥)𝜃𝑠 =∑
𝑠≤𝑥𝜃𝑠
17/41
Log-Linear Model on Poset
x
(p(x), θx, ηx)
18/41
Log-Linear Model on Poset
x
(p(x), θx, ηx)
log p(x) = ∑s≤x θs
18/41
Log-Linear Model on Poset
x
(p(x), θx, ηx)
log p(x) = ∑s≤x θs
ηx = ∑s≥x p(s)
18/41
Exponential Family• The log-linear model on posets belongs tothe exponential family
• 𝜽 : Natural parameter• 𝜼 : Expectation parameter
19/41
Binary Log-Linear Model [AAAI2019](= Boltzmann machine)
∅
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
2{a,b,c}
20/41
Binary Log-Linear Model [AAAI2019](= Boltzmann machine)
∅
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
2{a,b,c}2{a,b,c}
∅
{a,b,c}
{a,b} {a,c}
{a} {b}
{a,b,c}
{a,b} {a,c}
{a} {b}
{b,c}
{c}{c}
log p(x) = ∑s≤x θs
ηx = ∑s≥x p(s)
20/41
Binary Log-Linear Model [AAAI2019](= Boltzmann machine)
000
100
110 101
111
000
100
110 101
111{0, 1}3
011011
001001010010
21/41
Binary Log-Linear Model [AAAI2019](= Boltzmann machine)
000
100
110 101
111
000
100
110 101
111{0, 1}3
011011
001001010010
log p(x) = ∑s≤x θs
= –ψ + ∑i θixi + ∑i,jθijxixj + ···
For x with xi₁ = ··· = xik = 1,ηx = ∑s≥x p(s)
= �[xi₁···xik] = Pr(xi₁ = ··· = xik = 1)000
100
110 101
111
000
100
110 101
111{0, 1}3
011
001001010010
21/41
Dually Flat Structure• Let 𝜓(𝜃) = −𝜃(⊥) (convex, partition function)
𝜓(𝜃)Legendre transformation,,,,,,,,,,,,,,,,,,,,,,→ 𝜙(𝜂) =
∑
𝑥∈𝑆𝑝(𝑥) log𝑝(𝑥)
• (𝜓(𝜃), 𝜙(𝜂)) leads to dually flat coordinate system (𝜃, 𝜂):
∇𝜓(𝜃) = 𝜂, 𝜕𝜕𝜃𝑥
𝜓(𝜃) = 𝜂𝑥
∇𝜙(𝜂) = 𝜃, 𝜕𝜕𝜂𝑥
𝜙(𝜂) = 𝜃𝑥22/41
Riemannian Metric (Fisher Information)
𝜕𝜕𝜃𝑥
𝜕𝜕𝜃𝑦
𝜓(𝜃) = 𝜕𝜕𝜃𝑥
𝜂𝑦 =∑
𝑠∈𝑆𝜁(𝑥, 𝑠)𝜁(𝑦, 𝑠)𝑝(𝑠) − 𝜂𝑥𝜂𝑦
𝜕𝜕𝜂𝑥
𝜕𝜕𝜂𝑦
𝜙(𝜂) = 𝜕𝜕𝜂𝑥
𝜃𝑦 =∑
𝑠∈𝑆𝜇(𝑠, 𝑥)𝜇(𝑠, 𝑦)𝑝(𝑠)−1
𝔼𝑠 [𝜕𝜕𝜃𝑥
log𝑝(𝑠) 𝜕𝜕𝜂𝑦
log𝑝(𝑠)] = 𝛿(𝑥, 𝑦)
23/41
Riemannian Metric (Fisher Information)
𝔼𝑠 [𝜕𝜕𝜃𝑥
log𝑝(𝑠) 𝜕𝜕𝜃𝑦
log𝑝(𝑠)] =∑
𝑠∈𝑆𝜁(𝑥, 𝑠)𝜁(𝑦, 𝑠)𝑝(𝑠) − 𝜂𝑥𝜂𝑦
𝔼𝑠 [𝜕𝜕𝜂𝑥
log𝑝(𝑠) 𝜕𝜕𝜂𝑦
log𝑝(𝑠)] =∑
𝑠∈𝑆𝜇(𝑠, 𝑥)𝜇(𝑠, 𝑦)𝑝(𝑠)−1
𝔼𝑠 [𝜕𝜕𝜃𝑥
log𝑝(𝑠) 𝜕𝜕𝜂𝑦
log𝑝(𝑠)] = 𝛿(𝑥, 𝑦)
24/41
Mixed Coordinate System• Many problems are formulated as coordinate mixing
P = ( θ1, θ2, ..., θi–1, θi, θi+1, ..., θₙ )
Q = ( η1, η2, ..., ηi–1, θi, θi+1, ..., θₙ )
R = ( η1, η2, ..., ηi–1, ηi, ηi+1, ..., ηₙ )
e-projection(MLE)m-projection
Pythagorean theorem:KL(P, R) = KL(P, Q) + KL(Q, R)
(Q is always unique)
25/41
Mixed Coordinate System (Example)• Many problems are formulated as coordinate mixing
P = ( , , ..., , , , ..., )
Q = ( η1, η2, ..., ηi–1, , , ..., )
R = ( η1, η2, ..., ηi–1, ηi, ηi+1, ..., ηₙ )
Pythagorean theorem:KL(P, R) = KL(P, Q) + KL(Q, R)
(Q is always unique)
0 0 0 0 0 0
0 0 0
Uniform dist.
Empirical dist.
26/41
Two Submanifolds
P = (θ1,θ2,...,θi–1,θi,θi+1,...,θₙ)
R = (η1,η2,...,ηi–1,ηi,ηi+1,...,ηₙ)
Q = (η1,η2,...,ηi–1,θi,θi+1,...,θₙ)
Fix
Fix
e-projection= MLE (Model manifold)
(Data manifold)27/41
Gradient methods for e-projection• e-projection is convex optimization• Gradient descent (first-order):𝜽next ← 𝜽 − 𝜀(𝜼 − �̂�target)
• Natural gradient (second-order)𝜽next ← 𝜽 − 𝐺−1(𝜼 − �̂�target)– 𝐺 is Fisher information matrix w.r.t. 𝜃
• Coordinate descent [IBIS2019]28/41
Boltzmann Machine Training
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace
29/41
Boltzmann Machine Training
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace θ = 0
η = η
29/41
Matrix Balancing
θ = θ
η = 3 – i + 1
3x3 matrixas poset:
30/41
Legendre Decomposition [NeurIPS 2018]
3x3x3 tensoras poset:
θ = 0η = η
31/41
Introducing Hidden Variables in BM
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111BM withhidden variables
Samplespace
32/41
Introducing Hidden Variables in BM
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111BM withhidden variables
Samplespace η0101 = ?
32/41
Introducing Hidden Variables in BM
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111BM withhidden variables
Samplespace θ = 0
η = η
Need to estimate ηvia EM nonconvexNeed to estimate ηvia EM nonconvex
32/41
Introducing Hidden States
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace
33/41
Introducing Hidden States
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace
Hiddenstates
34/41
Introducing Hidden States
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace
Hiddenstates
Optimization isstill convex!
ηx
x
35/41
Introducing Hidden States
00001000 0100
1100
1110
1111
00001000 0100
1100 10101010
1110
1111Boltzmannmachine
Samplespace θ = 0
η = η
36/41
Blind Source Separation [arXiv]
SourceMixing Received
e.g. image
37/41
Blind Source Separation [arXiv]
SourceMixing Received
e.g. image
θ = 0η = η37/41
Results of BSS
0.270320.436300.621950.37167
IGBSSFastICANMFDicLearn
Source
Received
Reconst.
Method RMSE
38/41
Relationship to Homology• Möbius function is Euler characteristics• Consider order complex ∆(𝑆) of a poset 𝑆 with ⊥,⊤ ∈ 𝑆
(i) Vertices of ∆(𝑆) are elements of 𝑆(ii) Faces of ∆(𝑆) are chains of 𝑆
• For the Euler characteristic 𝜒(∆(𝑆)),𝜇(⊥,⊤) = 𝜒(∆(𝑆)) + 1– Two spaces are homotopy equivalent⇒ Euler characteristics are the same
39/41
Summary: Recipe for Poset-LogLinear1. Treat the target as a poset
2. Introduce the log-linear model on poset
3. Formulate the objective as coordinate mixing
4. Solve it by a gradient method
(This slide is at https://mahito.nii.ac.jp/)
40/41
Acknowledgment• Tsuda, K. (UTokyo), Nakahara, H. (RIKEN CBS)• Yamada, R. (KyotoU), Mimura, K. (HiroshimaCU)• Luo, S., Azizi, L. (USydney)• Hayashi, S., Matsushima, S. (UTokyo)• Borgwardt, K. and his lab members (ETH Zürich)• Lab members at NII
41/41