Improving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive Flow
Jan. 19, 2017
Tatsuya Shirakawa ([email protected])
Paper by: Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)
Variational Autoencoder (VAE)

log p(x) ≥ E_{q(z|x)}[log p(x, z) − log q(z|x)]
         = log p(x) − D_KL(q(z|x) ∥ p(z|x))
         = E_{q(z|x)}[log p(x|z)] − D_KL(q(z|x) ∥ p(z))
        =: ℒ(x; θ)
Model: z ~ p(z; η), x ~ p(x|z; η)
Optimization: maximize over η of (1/N) Σ_{n=1}^{N} log p(x_n; η)

Inference model: z ~ q(z|x; ν)
Optimization: maximize over θ = (η, ν) the ELBO (1/N) Σ_{n=1}^{N} ℒ(x_n; θ)

[Figure: maximizing the ELBO minimizes D_KL(q(z|x; ν) ∥ p(z|x)), pulling the approximate posterior q(z|x; ν) toward the true posterior p(z|x).]
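To make the ELBO optimization concrete, here is a minimal numpy sketch of a single-sample Monte Carlo estimate of ℒ(x; θ). It assumes a diagonal-Gaussian q(z|x), a standard-normal prior p(z), and a toy unit-variance likelihood p(x|z) = N(z, I); these modeling choices are illustrative assumptions, not anything specified in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(x, mu, log_sigma):
    """Single-sample estimate of L(x) = E_q[log p(x|z)] - D_KL(q(z|x) || p(z)).

    Assumes (for illustration only) q(z|x) = N(mu, diag(sigma^2)),
    prior p(z) = N(0, I), and toy likelihood p(x|z) = N(z, I).
    """
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(mu.shape)
    z = mu + sigma * eps  # reparameterization trick: z is differentiable in (mu, sigma)
    log_px_given_z = -0.5 * np.sum((x - z) ** 2 + np.log(2 * np.pi))
    # KL(N(mu, diag(sigma^2)) || N(0, I)) in closed form
    kl = 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * log_sigma)
    return log_px_given_z - kl

val = elbo_estimate(np.zeros(4), mu=np.zeros(4), log_sigma=np.zeros(4))
```

In a real VAE, mu and log_sigma would be the outputs of the encoder network and η would parameterize the decoder; here everything is collapsed to keep the estimate runnable.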
Requirements for the Inference Model q(z|x)

Computational tractability:
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation

Accuracy:
4. Sufficiently flexible to match the true posterior p(z|x)
Previous Designs of q(z|x)

Basic designs:
- Diagonal Gaussian distribution
- Full-covariance Gaussian distribution

Designs based on change of variables:
- NICE (L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014)
- Normalizing Flow (D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015)

Designs based on adding auxiliary variables:
- Hamiltonian Flow / Hamiltonian Variational Inference (T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014)
Diagonal / Full-Covariance Gaussian Distribution

Diagonal: efficient but not flexible
q(z|x) = Π_i N(z_i | μ_i(x), σ_i(x))

Full covariance: not efficient and not flexible (unimodal)
q(z|x) = N(z | μ(x), Σ(x))

1. Computationally cheap to compute and differentiate: ✓ / ✗
2. Computationally cheap to sample from: ✓ / ✗
3. Parallel computation: ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
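The tractability gap between the two Gaussian designs already shows up at sampling time. A small numpy sketch (not from the slides): the diagonal case is an elementwise O(D) operation, while the full-covariance case needs a Cholesky factor of Σ, which costs O(D³) to compute.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
mu = np.zeros(D)

# Diagonal Gaussian: elementwise reparameterized sample, O(D)
sigma = np.ones(D)
z_diag = mu + sigma * rng.standard_normal(D)

# Full-covariance Gaussian: factor Sigma = L L^T first, O(D^3),
# then z = mu + L eps gives a sample with covariance Sigma.
A = rng.standard_normal((D, D))
Sigma = A @ A.T + D * np.eye(D)  # an arbitrary well-conditioned covariance
L = np.linalg.cholesky(Sigma)
z_full = mu + L @ rng.standard_normal(D)
```

The diagonal sampler also parallelizes trivially across dimensions, which is requirement 3 above; the triangular solve structure of the full-covariance case does not.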
ChangeofVariablesbasedmethods
Transoform𝑞 𝑧Z 𝑥 tomakemorepowerful distribution𝑞 𝑧\ 𝑥 viasequentialapplicationofchangeofvariables
𝒛𝒕 = 𝑓 𝒛𝒕_𝟏
𝑞 𝒛𝒕 𝒙 = 𝑞 𝒛𝒕_𝟏 𝒙 det𝑑𝑓 𝒛𝒕_𝟏𝑑𝒛𝒕_𝟏
_G
⇒ log 𝑞 𝒛𝑻 𝒙 = log 𝑞 𝒛𝟎 𝒙 −Blog det𝑑𝑓 𝒛𝒕_𝟏𝑑𝒛𝒕_𝟏
�
^
• NiceL.Dinh etal.,“Nice:non-linearindependentcomponentsestimation”,2014
• NormalizingFlowD.J.Rezende etal.,“Variational inferencewithnormalizingflows”,ICML2015
6
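The log-density recursion above can be sketched directly in code. This is an illustrative numpy sketch: `flow_log_density` and the elementwise `affine` maps are made-up stand-ins for the invertible steps, not transformations from either paper.

```python
import numpy as np

def flow_log_density(z0, log_q0, steps):
    """Apply invertible maps sequentially and accumulate
    log q(z_T) = log q(z_0) - sum_t log |det df_t/dz|.

    `steps` is a list of (f, log_abs_det_jac) pairs, both functions of z.
    """
    z, log_q = z0, log_q0
    for f, log_det in steps:
        log_q -= log_det(z)
        z = f(z)
    return z, log_q

def affine(a, b):
    """Elementwise map z -> a*z + b; its Jacobian is diag(a)."""
    return (lambda z: a * z + b,
            lambda z: float(np.sum(np.log(np.abs(a)))))

# Example: doubling every coordinate subtracts D*log(2) from the log-density.
z0 = np.array([0.5, -1.0])
log_q0 = -0.5 * np.sum(z0 ** 2) - np.log(2 * np.pi)  # log N(z0; 0, I) in 2-D
zT, log_qT = flow_log_density(z0, log_q0, [affine(np.array([2.0, 2.0]), np.zeros(2))])
```

Because the log-determinants only ever need to be accumulated, the art in flow design is choosing f so that each log-det term is cheap; the following slides compare the main options.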
Normalizing Flow

Transformation via z_t = z_{t−1} + u_t f(w_tᵀ z_{t−1} + b_t)

Key features:
- Determinants are computable

Drawbacks:
- Information goes through a single bottleneck, the scalar w_tᵀ z_{t−1} + b_t

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗

[Figure: z_{t−1} → ⊕ → z_t; the additive update u_t f(w_tᵀ z_{t−1} + b_t) passes through the single scalar bottleneck w_tᵀ z_{t−1} + b_t.]
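The determinant of this planar transformation is cheap because of the matrix determinant lemma: det(I + u ψᵀ) = 1 + uᵀψ, with ψ(z) = f′(wᵀz + b) w. A minimal numpy sketch, assuming f = tanh (the slide leaves f unspecified, so that choice is mine):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar normalizing-flow step z -> z + u * tanh(w.z + b).

    Returns the transformed z and log |det dz_new/dz|, which costs O(D)
    via the matrix determinant lemma: det(I + u psi^T) = 1 + u.psi.
    """
    a = w @ z + b                       # the scalar "bottleneck"
    z_new = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w   # tanh'(a) * w
    log_det = np.log(np.abs(1.0 + u @ psi))
    return z_new, log_det
```

The bottleneck drawback is visible in the code: however large D is, all information about z enters the update through the single scalar `a`.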
Hamiltonian Flow / Hamiltonian Variational Inference

ELBO with auxiliary variables y:
log p(x) ≥ log p(x) − D_KL(q(z|x) ∥ p(z|x)) − D_KL(q(y|x, z) ∥ r(y|x, z)) =: ℒ(x)

Drawing (y, z) via HMC:
(y_t, z_t) ~ HMC(y_t, z_t | y_{t−1}, z_{t−1})

Key features:
- Capability to sample from the exact posterior

Drawbacks:
- Long mixing time and lower ELBO

1. Computationally cheap to compute and differentiate: ✗
2. Computationally cheap to sample from: ✗
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
NICE

Transforms only half of z at each step:
z_t = (z_t^α, z_t^β) = (z_{t−1}^α, z_{t−1}^β + f(x, z_{t−1}^α))

Key features:
- The determinant of the Jacobian, det(∂z_t/∂z_{t−1}), is always 1

Drawbacks:
- Limited form of transformation
- Less powerful than Normalizing Flow

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
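The additive coupling update and its inverse can be sketched in a few lines of numpy. Note that `f` need not be invertible at all (inversion only ever subtracts it); using tanh for `f` below is my illustrative assumption, not the paper's choice.

```python
import numpy as np

def nice_coupling(z, f):
    """One NICE additive coupling step: transform only the second half of z.

    z = (z_a, z_b) -> (z_a, z_b + f(z_a)).  The Jacobian is triangular with
    unit diagonal, so its determinant is exactly 1.
    """
    d = len(z) // 2
    za, zb = z[:d], z[d:]
    return np.concatenate([za, zb + f(za)])

def nice_coupling_inverse(z, f):
    """Invert the coupling by subtraction; f itself is never inverted."""
    d = len(z) // 2
    za, zb = z[:d], z[d:]
    return np.concatenate([za, zb - f(za)])
```

Because z_a passes through unchanged, successive layers typically swap which half is transformed, so that every coordinate is eventually updated.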
Autoregressive Flow (proposed)

Autoregressive Flow (with ∂μ_{t,i}/∂z_{t,j} = ∂σ_{t,i}/∂z_{t,j} = 0 for i ≤ j):
z_{t,i} = μ_{t,i}(z_{t,0:i−1}) + σ_{t,i}(z_{t,0:i−1}) ⊙ z_{t−1,i}

Key features:
- Powerful
- Easy to compute det(∂z_t/∂z_{t−1}) = Π_i σ_{t,i}(z_{t−1})

Drawbacks:
- Difficult to parallelize

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
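The parallelization drawback is easiest to see in code: computing z_t requires a loop over coordinates, because each z_{t,i} depends on the already-computed z_{t,0:i−1}. In this numpy sketch, `mu_fn` and `sigma_fn` are illustrative stand-in functions of the prefix, assumptions in place of the real autoregressive networks.

```python
import numpy as np

def af_step(z_prev, mu_fn, sigma_fn):
    """One Autoregressive Flow step, computed coordinate by coordinate.

    z[i] = mu_i(z[:i]) + sigma_i(z[:i]) * z_prev[i]: each output depends on
    the outputs before it, so this loop cannot be parallelized across i.
    """
    z = np.empty_like(z_prev)
    for i in range(len(z_prev)):
        z[i] = mu_fn(z[:i], i) + sigma_fn(z[:i], i) * z_prev[i]
    return z

# Illustrative stand-ins for the autoregressive networks (my assumption):
mu_fn = lambda prefix, i: 0.1 * prefix.sum()
sigma_fn = lambda prefix, i: np.exp(0.05 * prefix.sum())
z_new = af_step(np.array([1.0, 2.0, 3.0]), mu_fn, sigma_fn)
```

The log-determinant, by contrast, is trivial: it is just Σ_i log σ_{t,i}, since the Jacobian is triangular.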
Inverse Autoregressive Flow (proposed)

Inverting the AF (μ_t and σ_t are still autoregressive):
z_t = (z_{t−1} − μ_t(z_{t−1})) / σ_t(z_{t−1})

Key features:
- Equally powerful as AF
- Easy to compute det(∂z_t/∂z_{t−1}) = 1 / Π_i σ_{t,i}(z_{t−1})
- Parallelizable

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✓
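The inversion is what restores parallelism: μ_t and σ_t are functions of z_{t−1}, which is fully known before the step, so they can be produced in one forward pass and the update is elementwise. A minimal sketch, with μ and σ passed in precomputed (how they are produced, via MADE, is the next slide):

```python
import numpy as np

def iaf_step(z_prev, mu, sigma):
    """One Inverse Autoregressive Flow step.

    mu and sigma are the outputs of an autoregressive network applied to
    z_prev; since z_prev is fully known, they come from a single parallel
    pass, and the update below is elementwise.  Returns the new z and
    log |det dz/dz_prev| = -sum_i log sigma_i.
    """
    z = (z_prev - mu) / sigma
    log_det = -np.sum(np.log(sigma))
    return z, log_det
```

Compare with `af_step` above: same family of transformations, but no per-coordinate loop.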
IAF through a Masked Autoencoder (MADE)

Modeling the autoregressive μ_t and σ_t with MADE:
- Masks remove the paths from future inputs in the autoencoder
- MADE is itself a probabilistic model: p(x) = Π_i p(x_i | x_{0:i−1})
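The masks can be built from "degrees" assigned to every unit, following the MADE construction (Germain et al., 2015): a hidden unit may see inputs with degree at most its own, and output i may see only hidden units with degree below i. A numpy sketch for a one-hidden-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)

def made_masks(n_in, n_hidden):
    """Binary masks enforcing the autoregressive property in a
    one-hidden-layer MADE.

    Inputs get degrees 1..D; hidden units get random degrees in [1, D-1].
    Connection input->hidden is kept iff m(hidden) >= degree(input);
    connection hidden->output i is kept iff i > m(hidden).  Composing the
    two, output i can only depend on inputs with degree < i.
    """
    deg_in = np.arange(1, n_in + 1)
    deg_hidden = rng.integers(1, n_in, size=n_hidden)  # values in 1..D-1
    mask_w = (deg_hidden[:, None] >= deg_in[None, :]).astype(float)  # hidden x in
    mask_v = (deg_in[:, None] > deg_hidden[None, :]).astype(float)   # out x hidden
    return mask_w, mask_v
```

Multiplying these masks elementwise into the weight matrices makes the network's Jacobian strictly triangular, which is exactly what the μ_t, σ_t of AF/IAF require.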
Experiments

IAF is evaluated on image generative models.

Models for MNIST:
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked, with the ordering reversed alternately

Models for CIFAR-10 (much more complicated)
IAF in 1 Slide

[Figure: starting from q(z_0|x; ν_0), each flow step t produces q(z_t|x; ν_t), ending at q(z_T|x; ν_T); minimizing D_KL(q ∥ p) pulls q(z_T|x; ν_T) toward the true posterior p(z|x; μ). Autoregressive Flow vs. Inverse Autoregressive Flow.]

IAF is:
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible