Partial Order Structure and Information Geometry - Mahito · November 16, 2016 IBIS2016...
November 16, 2016, IBIS2016
Partial Order Structure and Information Geometry
Mahito Sugiyama (ISIR: The Institute of Scientific and Industrial Research, Osaka University; JST PRESTO)

Today’s Model on Poset (S, ≤)
$$\log p(x) = \sum_{s \in S} \zeta(s, x)\,\theta(s)$$
$$p(x) = \sum_{s \in S} \mu(x, s)\,\eta(s)$$
• p: Probability
• θ: Coefficient of log-linear model (bias/weight in Boltzmann machines; natural parameter of exponential family)
• ζ: Zeta function
• µ: Möbius function
• η: Expectation (frequency in pattern mining; sufficient statistics in exponential family)

Outcome
• Given a poset (S, ≤), consider distributions on S
– The least element ⊥ ∈ S is assumed
1. KL divergence decomposition:
$$D_{\mathrm{KL}}[P, R] = D_{\mathrm{KL}}[P, Q] + D_{\mathrm{KL}}[Q, R]$$
with Q s.t. $\theta_Q(x) = \theta_R(x)$ or $\eta_Q(x) = \eta_P(x)$ for all x ∈ S \ {⊥}
2. The set of probability distributions on (S, ≤) is a dually flat manifold w.r.t. θ and η
– p, θ, and η are coordinate systems
– θ and η are orthogonal
– θ introduces the structure of exponential family
– η introduces the structure of mixture family

Partially Ordered Sets
• Power set (ordered by inclusion): ∅; {x}, {y}, {z}; {x, y}, {x, z}, {y, z}; {x, y, z}
• Positive integers: 0, 1, 2, 3
• Prefixes: λ; 0, 1; 00, 01, 10, 11; 000, 001, 010, 011, 100, 101, 110, 111
• Directed Acyclic Graph

Posets with Probability Distribution
• Probability distribution on posets (partially ordered sets)
• Information geometry: decomposition in the log-linear model, $\log p(x) = \sum_{s} \zeta(s, x)\,\theta(s)$
• Binary data (e.g. Neurons, SNPs, ...) placed on the lattice ∅; {x}, {y}, {z}; {x, y}, {x, z}, {y, z}; {x, y, z}
• Numerical score (KL divergence) and the p-value for higher-order interactions

Binary vectors (transaction database); the three items are written a, b, c here, one per column:

| ID | a | b | c |
|----|---|---|---|
| 1  | 1 | 1 | 0 |
| 2  | 1 | 1 | 1 |
| 3  | 1 | 1 | 0 |
| 4  | 1 | 1 | 1 |
| 5  | 1 | 1 | 0 |
| 6  | 1 | 0 | 1 |
| 7  | 1 | 0 | 1 |
| 8  | 1 | 1 | 1 |
| 9  | 1 | 0 | 0 |
| 10 | 0 | 1 | 0 |

Poset (itemset lattice), with the frequency and the probability of each node:

| Node      | Frequency | Probability |
|-----------|-----------|-------------|
| ∅         | 1.0       | 0.0         |
| {a}       | 0.9       | 0.1         |
| {b}       | 0.7       | 0.1         |
| {c}       | 0.5       | 0.0         |
| {a, b}    | 0.6       | 0.3         |
| {a, c}    | 0.5       | 0.2         |
| {b, c}    | 0.3       | 0.0         |
| {a, b, c} | 0.3       | 0.3         |


• η: Frequency, p: Probability, θ: Coefficient of log-linear model
• Upward = Pattern mining, e.g. for a pair {a, b} in the lattice over items a, b, c:
$$\eta(\{a, b\}) = p(\{a, b\}) + p(\{a, b, c\})$$
• Downward = Log-linear analysis:
$$\log p(\{a, b\}) = \theta(\{a, b\}) + \theta(\{a\}) + \theta(\{b\}) + \theta(\emptyset)$$

• Log-linear model on the lattice:
$$\log p(x) = \sum_{s} \zeta(s, x)\,\theta(s)$$
• Exponential family (e.g. Gaussian), with natural parameter θ:
$$p(x) = \exp\Big( \sum_{s} \theta(s) F_s(x) - \psi(\theta) \Big)$$
• Sufficient statistics of exponential family:
$$\eta(x) = \mathbb{E}[F_x(s)], \qquad \eta(x) = \sum_{s} \zeta(x, s)\,p(s)$$

Möbius Inversion on Posets
• Zeta function ζ: S × S → {0, 1}:
$$\zeta(s, x) = \begin{cases} 1 & \text{if } s \le x, \\ 0 & \text{otherwise} \end{cases}$$
• Möbius function µ: S × S → ℤ, defined as $\mu = \zeta^{-1}$:
$$\mu(x, y) = \begin{cases} 1 & \text{if } x = y, \\ -\sum_{x \le s < y} \mu(x, s) & \text{if } x < y, \\ 0 & \text{otherwise} \end{cases}$$
• The Möbius inversion formula [Rota (1964)]:
$$g(x) = \sum_{s \in S} \zeta(s, x) f(s) \iff f(x) = \sum_{s \in S} \mu(s, x) g(s)$$
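
The recursive definition of µ and the inversion formula can be checked numerically on a small poset. The following is a minimal sketch, not code from the talk: the poset (the divisors of 12 ordered by divisibility), the test values `f`, and all function names are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical example poset: the divisors of 12, ordered by divisibility.
S = [1, 2, 3, 4, 6, 12]

def leq(a, b):
    """Partial order of this example: a <= b iff a divides b."""
    return b % a == 0

def zeta(s, x):
    """Zeta function: 1 if s <= x, 0 otherwise."""
    return 1 if leq(s, x) else 0

@lru_cache(maxsize=None)
def mobius(x, y):
    """Moebius function, computed with the recursion from the slide."""
    if x == y:
        return 1
    if leq(x, y):
        # x < y: minus the sum of mu(x, s) over all s with x <= s < y
        return -sum(mobius(x, s) for s in S if s != y and leq(x, s) and leq(s, y))
    return 0

# Arbitrary test values f, their zeta transform g, and the inverse transform.
f = {s: float(i + 1) for i, s in enumerate(S)}
g = {x: sum(zeta(s, x) * f[s] for s in S) for x in S}
f_back = {x: sum(mobius(s, x) * g[s] for s in S) for x in S}
print(f_back == f)
```

On this particular poset, `mobius(1, n)` coincides with the number-theoretic Möbius function of n, which gives an independent sanity check of the recursion.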

Möbius Function Is a Generalization of the Inclusion-Exclusion Principle
• For sets A, B, C,
$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |B \cap C| - |A \cap C| + |A \cap B \cap C|$$
• In general, for $A_1, A_2, \dots, A_n$,
$$\Big| \bigcup_{i} A_i \Big| = \sum_{J \subseteq \{1, \dots, n\},\ J \ne \emptyset} (-1)^{|J| - 1} \Big| \bigcap_{j \in J} A_j \Big|$$
• The Möbius function µ is the generalization of “$(-1)^{|J| - 1}$”
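
The link between the two statements can be made concrete: on a power set ordered by inclusion, the recursively defined µ(x, y) equals (−1)^(|y| − |x|), which is the sign pattern of inclusion-exclusion. The sketch below checks both facts; the item names and the example sets `A`, `B`, `C` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

def power_set(items):
    """All subsets of `items` as frozensets, smallest first."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

@lru_cache(maxsize=None)
def mobius(x, y):
    """Moebius function of the power set ordered by inclusion (recursive form)."""
    if x == y:
        return 1
    if not x < y:            # x is not a proper subset of y
        return 0
    extra = sorted(y - x)
    # every s with x <= s < y: x plus a proper subset of the extra elements
    between = [x | frozenset(c) for r in range(len(extra))
               for c in combinations(extra, r)]
    return -sum(mobius(x, s) for s in between)

def union_size(sets):
    """|A_1 U ... U A_n| as the signed sum over nonempty index sets J."""
    total = 0
    for r in range(1, len(sets) + 1):
        for J in combinations(range(len(sets)), r):
            inter = set.intersection(*(sets[j] for j in J))
            total += (-1) ** (r - 1) * len(inter)
    return total

# Hypothetical example sets.
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}
print(union_size([A, B, C]), len(A | B | C))
```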

Log-linear Model with Möbius Inversion
• Log-linear model and its sufficient statistics:
$$\log p(x) = \sum_{s \in S} \zeta(s, x)\,\theta(s) = \sum_{s \le x} \theta(s), \qquad \eta(x) = \sum_{s \in S} \zeta(x, s)\,p(s) = \sum_{s \ge x} p(s)$$
– Generalization of the log-linear model on binary vectors:
$$\log p(x) = \sum_{i} \theta_i x_i + \sum_{i < j} \theta_{ij} x_i x_j + \dots + \theta_{1 \dots n} x_1 x_2 \dots x_n$$
• From the Möbius inversion formula,
$$\theta(x) = \sum_{s \in S} \mu(s, x) \log p(s), \qquad p(x) = \sum_{s \in S} \mu(x, s)\,\eta(s)$$

Triple (p, η, θ) for each node of the two-item lattice (the two items are written a and b here):

| Node   | p   | η   | θ     |
|--------|-----|-----|-------|
| ∅      | 0.1 | 1.0 | -2.30 |
| {a}    | 0.3 | 0.5 | 1.10  |
| {b}    | 0.4 | 0.6 | 1.39  |
| {a, b} | 0.2 | 0.2 | -1.79 |

$$\eta(x) = \sum_{s \ge x} p(s), \qquad \log p(x) = \sum_{s \le x} \theta(s)$$
• Probability distribution is a “point” in 3D space, with coordinates over ({a}, {b}, {a, b}):
– p = (0.3, 0.4, 0.2)
– η = (0.5, 0.6, 0.2)
– θ = (1.10, 1.39, -1.79)
• The coordinate systems p, η, and θ are one-to-one
• θ and η are dually orthogonal
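
The triples on this slide can be reproduced from the probabilities alone. The sketch below assumes the two-item lattice with p = 0.1, 0.3, 0.4, 0.2; the item labels `a` and `b` are placeholders. It computes η by summing upward and θ by solving log p(x) = Σ_{s ≤ x} θ(s) from the bottom up.

```python
import math

# Two-item lattice; the labels "a" and "b" are placeholders.
BOT, A, B, AB = frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")
S = [BOT, A, B, AB]   # ordered so that lower elements come first
p = {BOT: 0.1, A: 0.3, B: 0.4, AB: 0.2}

# eta(x) = sum of p(s) over all s >= x
eta = {x: sum(p[s] for s in S if x <= s) for x in S}

# theta(x) = log p(x) - sum of theta(s) over all s < x, solved bottom-up
theta = {}
for x in S:
    theta[x] = math.log(p[x]) - sum(theta[s] for s in S if s < x)

# Going back: p(x) = exp(sum of theta(s) over all s <= x)
p_back = {x: math.exp(sum(theta[s] for s in S if s <= x)) for x in S}

for x in S:
    print(sorted(x), p[x], round(eta[x], 2), round(theta[x], 2))
```

Solving for θ bottom-up is equivalent to applying the Möbius function to log p, but avoids building µ explicitly.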

Orthogonality of θ and η
• From Möbius inversion,
$$\sum_{s \in S} \zeta(x, s)\,\mu(s, y) = \delta_{x,y}, \qquad \delta_{x,y} = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise} \end{cases}$$
• θ and η are dually orthogonal:
$$\mathbb{E}\left[ \frac{\partial}{\partial \theta(x)} \log p(s)\, \frac{\partial}{\partial \eta(y)} \log p(s) \right] = \sum_{s \in S} \zeta(x, s)\,\mu(s, y) = \delta_{x,y}$$
• Partial order structure leads to the same dually flat structure as the exponential family
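
The orthogonality relation can be checked numerically. The sketch below is an illustration under stated assumptions: the poset (a subset of a power set) and the distribution `p` are hypothetical, and the two partial derivatives are taken as ∂ log p(s)/∂θ(x) = ζ(x, s) − η(x) and ∂ log p(s)/∂η(y) = µ(s, y)/p(s). These follow from differentiating the two model equations with θ(⊥) acting as the normalizer and η(⊥) = 1 fixed, so x and y range over S \ {⊥}.

```python
from functools import lru_cache

# Hypothetical poset: a subset of the power set of {x, y, z}, ordered by inclusion.
S = [frozenset(t) for t in ("", "x", "y", "xy", "yz", "xyz")]
# Hypothetical strictly positive distribution on S (sums to 1).
p = dict(zip(S, (0.1, 0.2, 0.15, 0.25, 0.1, 0.2)))
BOTTOM = frozenset()

def zeta(s, x):
    return 1 if s <= x else 0

@lru_cache(maxsize=None)
def mobius(x, y):
    if x == y:
        return 1
    if x < y:
        return -sum(mobius(x, s) for s in S if x <= s < y)
    return 0

eta = {x: sum(zeta(x, s) * p[s] for s in S) for x in S}

def expected_product(x, y):
    """E[ d log p(s)/d theta(x) * d log p(s)/d eta(y) ] under p."""
    total = 0.0
    for s in S:
        d_theta = zeta(x, s) - eta[x]     # derivative w.r.t. theta(x)
        d_eta = mobius(s, y) / p[s]       # derivative w.r.t. eta(y)
        total += p[s] * d_theta * d_eta
    return total

coords = [x for x in S if x != BOTTOM]    # coordinates exclude the bottom element
gram = {(x, y): expected_product(x, y) for x in coords for y in coords}
print(all(abs(gram[x, y] - (x == y)) < 1e-12 for x in coords for y in coords))
```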

Existing Approach Limited To Power Set
• Power set: ∅; {x}, {y}, {z}; {x, y}, {x, z}, {y, z}; {x, y, z}

Our Approach Applies To Any Poset
• Subset of power set: ∅; {x}, {y}; {x, y}, {y, z}; {x, y, z}
• Positive integers: 0, 1, 2, 3
• Prefixes: λ; 0, 1; 00, 01, 10, 11; 000, 001, 010, 011, 100, 101, 110, 111
• Directed Acyclic Graph

KL Divergence Decomposition
• KL divergence decomposition:
$$D_{\mathrm{KL}}[P, R] = D_{\mathrm{KL}}[P, Q] + D_{\mathrm{KL}}[Q, R]$$
with Q s.t. $\theta_Q(x) = \theta_R(x)$ or $\eta_Q(x) = \eta_P(x)$ for all x ∈ S \ {⊥}
– Q is called the mixed distribution of (P, R)
– It is known as the (generalized) Pythagorean theorem in Information Geometry
• We can derive from Möbius inversion:
$$D_{\mathrm{KL}}[P, Q] + D_{\mathrm{KL}}[Q, R] - D_{\mathrm{KL}}[P, R] = \sum_{s \in S} \big(\eta_Q(s) - \eta_P(s)\big)\big(\theta_Q(s) - \theta_R(s)\big)$$

Example on the two-item lattice (items written a and b here): triples (p, η, θ) of Dist. P, Dist. R, and their mixed distribution Q. The remaining probability sits on ∅.

| Node   | Dist. P (p, η, θ) | Dist. R (p, η, θ) | Mixed distribution Q (p, η, θ) |
|--------|-------------------|-------------------|--------------------------------|
| {a}    | (0.3, 0.5, 1.10)  | (0.7, 0.8, 1.95)  | (0.62, 0.82, 1.95)             |
| {b}    | (0.4, 0.6, 1.39)  | (0.1, 0.2, 0.0)   | (0.089, 0.29, 0.0)             |
| {a, b} | (0.2, 0.2, -1.79) | (0.1, 0.1, -1.95) | (0.2, 0.2, -1.13)              |

• Choose η: Q takes η = 0.2 at {a, b} from P
• Choose θ: Q takes θ = 1.95 at {a} and θ = 0.0 at {b} from R
• Nonnegative decomposition of the KL divergence:
KL[P, R] = KL[P, Q] + KL[Q, R], i.e. 0.4390 = 0.3946 + 0.0444
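
The numbers of this example can be recomputed directly. The sketch below assumes the two-item lattice with placeholder labels `a` and `b`, takes p(∅) = 0.1 for R as the remaining probability mass, and builds Q in closed form for this specific choice of constraints (θ from R at the singletons, η from P at the top node); a general poset would need an iterative solver instead.

```python
import math

BOT, A, B, AB = frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")
S = [BOT, A, B, AB]   # lower elements first

def eta_of(p):
    return {x: sum(p[s] for s in S if x <= s) for x in S}

def theta_of(p):
    theta = {}
    for x in S:       # bottom-up solution of log p(x) = sum_{s <= x} theta(s)
        theta[x] = math.log(p[x]) - sum(theta[s] for s in S if s < x)
    return theta

def kl(p, q):
    return sum(p[s] * math.log(p[s] / q[s]) for s in S)

P = {BOT: 0.1, A: 0.3, B: 0.4, AB: 0.2}
R = {BOT: 0.1, A: 0.7, B: 0.1, AB: 0.1}   # p(BOT) assumed to be the remaining mass

# Mixed distribution Q: theta_Q = theta_R at {a}, {b}; eta_Q = eta_P at {a, b}.
t_R, e_P = theta_of(R), eta_of(P)
q_top = e_P[AB]                            # at the top node, eta equals p
q_bot = (1.0 - q_top) / (1.0 + math.exp(t_R[A]) + math.exp(t_R[B]))
Q = {BOT: q_bot, A: q_bot * math.exp(t_R[A]), B: q_bot * math.exp(t_R[B]), AB: q_top}

# Residual of the Pythagorean relation, two ways.
lhs = kl(P, Q) + kl(Q, R) - kl(P, R)
e_Q, t_Q = eta_of(Q), theta_of(Q)
rhs = sum((e_Q[s] - e_P[s]) * (t_Q[s] - t_R[s]) for s in S)

print(round(kl(P, R), 4), round(kl(P, Q), 4), round(kl(Q, R), 4))
```

Each term of `rhs` vanishes because, at every node, either the η values of Q and P agree or the θ values of Q and R agree, which is exactly the condition defining the mixed distribution.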

Example on the two-item lattice (items written a and b here): triples (p, η, θ) of Dist. P, the uniform dist. P0, and their mixed distribution Q. The remaining probability sits on ∅.

| Node   | Dist. P (p, η, θ) | Uniform dist. P0 (p, η, θ) | Mixed distribution Q (p, η, θ) |
|--------|-------------------|----------------------------|--------------------------------|
| {a}    | (0.3, 0.5, 1.10)  | (0.25, 0.5, 0.0)           | (0.2, 0.5, 0.0)                |
| {b}    | (0.4, 0.6, 1.39)  | (0.25, 0.5, 0.0)           | (0.3, 0.6, 0.405)              |
| {a, b} | (0.2, 0.2, -1.79) | (0.25, 0.25, 0.0)          | (0.3, 0.3, 0.0)                |

• Log-linear model: $\log p(x) = \sum_{s \le x} \theta(s)$
• Choose η: Q takes η = 0.5 at {a} and η = 0.6 at {b} from P
• Choose θ (KNOCK DOWN): Q takes θ = 0.0 at {a, b} from P0
• Contribution of the node = KL[P, Q] = 0.086
• The statistic λ = 2·[sample size]·KL[P, Q] follows the χ²-distribution with d.f. [#nodes − 1] ⇒ p-value can be obtained!
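
A sketch of the test statistic follows. It takes P and Q from the slide, checks that Q satisfies the stated constraints, and converts λ into a p-value. The sample size `n_samples` is hypothetical (the slide does not give one for this toy example), the degrees of freedom follow the slide's "#nodes − 1" rule and are passed in explicitly, and `chi2_sf` is a small standard-library helper written for this sketch rather than a library function.

```python
import math

BOT, A, B, AB = frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")
S = [BOT, A, B, AB]
P = {BOT: 0.1, A: 0.3, B: 0.4, AB: 0.2}
Q = {BOT: 0.2, A: 0.2, B: 0.3, AB: 0.3}   # mixed distribution from the slide

def kl(p, q):
    return sum(p[s] * math.log(p[s] / q[s]) for s in S)

def chi2_sf(x, df):
    """Upper tail P(X > x) of the chi-square distribution, integer df >= 1.

    Uses the regularized upper incomplete gamma function Q(a, z) with
    a = df / 2, z = x / 2, and Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                  # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))       # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Q keeps the eta of P at {a} and {b}, and has theta = 0 at {a, b}.
eta_P_A, eta_Q_A = P[A] + P[AB], Q[A] + Q[AB]
eta_P_B, eta_Q_B = P[B] + P[AB], Q[B] + Q[AB]
theta_Q_top = math.log(Q[AB] * Q[BOT] / (Q[A] * Q[B]))

n_samples = 100        # hypothetical sample size, for illustration only
df = len(S) - 1        # the slide's rule: #nodes - 1
lam = 2.0 * n_samples * kl(P, Q)
p_value = chi2_sf(lam, df)
print(round(kl(P, Q), 3), lam, p_value)
```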

Poset of Subgraphs

Log-Linear Model on Subgraphs
• Log-linear model, with θ the natural parameter of exponential family:
$$\log p(x) = \sum_{s \sqsubseteq x} \theta(s)$$
• Sufficient statistics of exponential family:
$$\eta(x) = \sum_{s \sqsupseteq x} p(s)$$

Information of Each Subgraph
• P: Empirical distribution
– Freq.: η0, η1, η2, η3, η4, η5, ηk
– Coef.: θ0, θ1, θ2, θ3, θ4, θ5, θk
• Q: Null distribution
– Freq.: η0, η1, η2, η3, ?, η5, ηk
– Coef.: ?, ?, ?, ?, 0, ?, ?
• R: Uniform distribution
– Freq.: 1.0, ?, ?, ?, ?, ?, ?
– Coef.: θ0’, 0, 0, 0, 0, 0, 0
• KL(P, R) = KL(P, Q) + KL(Q, R)

Make a Poset from Data
• Dataset: the ten binary vectors (ID 1 to ID 10) of the transaction database, with the frequency and the probability of each node of the itemset lattice as before
• Number of nodes = $2^{\#\text{features}}$ ⇒ combinatorial explosion!
• Keep only the nodes with Probability ≥ 0.2 (user specified threshold)
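
The construction can be sketched in a few lines: count how often each itemset occurs exactly (probability) and how often it is contained in a transaction (frequency), then keep the nodes above the threshold. The ten rows are the binary vectors of the dataset; the item names `a`, `b`, `c` are placeholders for the three columns. The sketch enumerates the full lattice for clarity, which is exactly the combinatorial explosion the slide warns about, so it is only suitable for a toy example.

```python
from itertools import combinations

# The ten binary vectors of the dataset; columns are the placeholder items a, b, c.
rows = ["110", "111", "110", "111", "110", "101", "101", "111", "100", "010"]
items = "abc"
transactions = [frozenset(i for i, bit in zip(items, r) if bit == "1") for r in rows]
n = len(transactions)

# Full itemset lattice: 2 ** len(items) nodes.
nodes = [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# p(x): fraction of transactions that are exactly x
p = {x: sum(t == x for t in transactions) / n for x in nodes}
# eta(x): fraction of transactions that contain x (frequency in pattern mining)
eta = {x: sum(x <= t for t in transactions) / n for x in nodes}

threshold = 0.2   # user specified threshold from the slide
kept = [x for x in nodes if p[x] >= threshold]

for x in nodes:
    print(sorted(x), eta[x], p[x])
print([sorted(x) for x in kept])
```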

Remove Nodes with Probability 0
• Remaining poset for the dataset (items a, b, c = columns 1 to 3): ∅; {a, b}, {a, c}; {a, b, c}

| Node      | p                  | η   | θ     |
|-----------|--------------------|-----|-------|
| ∅         | 0.2 (= 1 − ∑ p(s)) | 1.0 | -1.61 |
| {a, b}    | 0.3                | 0.6 | 0.405 |
| {a, c}    | 0.2                | 0.5 | 0.0   |
| {a, b, c} | 0.3                | 0.3 | 0.0   |

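
The triples on the reduced poset can be recomputed in the same way as on a full lattice, which illustrates that nothing in the model requires a power set. The sketch below assumes the three kept nodes with their probabilities, uses placeholder item names `a`, `b`, `c`, and assigns the leftover mass to the bottom element as on the slide.

```python
import math

# Nodes kept after thresholding, with their probabilities (placeholder item names).
kept = {frozenset("ab"): 0.3, frozenset("ac"): 0.2, frozenset("abc"): 0.3}
bottom = frozenset()

# p(bottom) = 1 - sum of the kept probabilities
p = {bottom: 1.0 - sum(kept.values()), **kept}

# Smaller sets first, so every element comes after the elements below it.
S = sorted(p, key=len)

eta = {x: sum(p[s] for s in S if x <= s) for x in S}
theta = {}
for x in S:
    theta[x] = math.log(p[x]) - sum(theta[s] for s in S if s < x)

for x in S:
    print(sorted(x), round(p[x], 3), round(eta[x], 3), round(theta[x], 3))
```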

Example on Real Data (kosarak)
• Sample size: 990,002
• # features: 41,270
• # nodes: 3,253 (Threshold: $10^{-5}$)
• # significant interactions: 583
– Single feature: 537
– Pairwise interactions: 41
– Triple interactions: 5
• Total runtime: 4.95 seconds

Example on Real Data (accidents)
• Sample size: 340,183
• # features: 468
• # nodes: 281 (Threshold: $5 \times 10^{-6}$)
• # significant interactions: 280
– # features in each interaction is between 26 and 41
• Total runtime: 4.95 seconds

Conclusion
• A close connection between the partial order structure and information geometry
– Möbius inversion leads to the dually flat manifolds
◦ M. Sugiyama, H. Nakahara, K. Tsuda, Information Decomposition on Structured Space, IEEE ISIT (2016)
◦ S. Amari, Information geometry on hierarchy of probability distributions, IEEE Trans. Info. Theory (2001)
◦ H. Nakahara, S. Amari, Information-geometric measure for neural spikes, Neural Computation (2002)
• We can decompose the KL divergence and assess the significance on any poset