Improving Variational Inference with Inverse Autoregressive Flow

Post on 24-Jan-2017


Jan. 19, 2017

Tatsuya Shirakawa (tatsuya@abeja.asia)

Paper authors: Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)

1

Variational Autoencoder (VAE)

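Model: $\mathbf{z} \sim p(\mathbf{z}; \eta)$, $\mathbf{x} \sim p(\mathbf{x} \mid \mathbf{z}; \eta)$

Optimization (maximum likelihood):

$$\max_{\eta} \; \frac{1}{N} \sum_{n=1}^{N} \log p(\mathbf{x}_n; \eta)$$

Inference model: $\mathbf{z} \sim q(\mathbf{z} \mid \mathbf{x}; \nu)$

Optimization (ELBO), with $\theta = (\eta, \nu)$:

$$\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}(\mathbf{x}_n; \theta)$$

ELBO:

$$\begin{aligned}
\log p(\mathbf{x}) &\ge \mathbb{E}_{q(\mathbf{z} \mid \mathbf{x})}\left[\log p(\mathbf{x}, \mathbf{z}) - \log q(\mathbf{z} \mid \mathbf{x})\right] \\
&= \log p(\mathbf{x}) - D_{KL}\left(q(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\right) \\
&= \mathbb{E}_{q(\mathbf{z} \mid \mathbf{x})}\left[\log p(\mathbf{x} \mid \mathbf{z})\right] - D_{KL}\left(q(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\right) \\
&=: \mathcal{L}(\mathbf{x}; \theta)
\end{aligned}$$

[Figure: q(z|x; ν) and p(z|x; η), and their optimized counterparts q(z|x; ν*) and p(z|x; η*), separated by the gap D_KL(q ‖ p)]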
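The ELBO identity on this slide can be checked numerically on a toy model where everything is Gaussian. The sketch below is an illustration only, not the model from the paper: it assumes z ~ N(0, 1) and x|z ~ N(z, 1), so that p(x) = N(0, 2) and the true posterior is N(x/2, 1/2). The function names `log_normal` and `elbo` are made up for this example.

```python
import math

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def elbo(x, m, s2):
    """ELBO for the toy model with q(z|x) = N(m, s2).

    Both terms are closed-form because every distribution is Gaussian:
    E_q[log p(x|z)] with p(x|z) = N(x; z, 1), and KL(q || N(0, 1)).
    """
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    kl_to_prior = 0.5 * (s2 + m ** 2 - 1.0 - math.log(s2))
    return expected_log_lik - kl_to_prior

x = 1.3
log_px = log_normal(x, 0.0, 2.0)      # exact marginal: x ~ N(0, 2)
loose = elbo(x, m=0.0, s2=1.0)        # q set to the prior: bound is not tight
tight = elbo(x, m=x / 2.0, s2=0.5)    # q set to the true posterior: bound is tight
print(log_px, loose, tight)
```

The gap `log_px - loose` is exactly the KL divergence between q and the true posterior, which is why a more flexible q can raise the bound.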

2

Requirements for the inference model q(z|x)

Computational tractability
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation

Accuracy
4. Sufficiently flexible to match the true posterior p(z|x)

[Figure: q(z|x; ν) and p(z|x; η), and their optimized counterparts q(z|x; ν*) and p(z|x; η*), separated by the gap D_KL(q ‖ p)]

3

Previous Designs of q(z|x)

Basic designs
- Diagonal Gaussian distribution
- Full-covariance Gaussian distribution

Designs based on change of variables
- NICE
  L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow
  D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015

Designs based on adding auxiliary variables
- Hamiltonian Flow / Hamiltonian Variational Inference
  T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014

4

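Diagonal / Full-Covariance Gaussian Distribution

Diagonal: efficient but not flexible

$$q(\mathbf{z} \mid \mathbf{x}) = \prod_i \mathcal{N}\left(z_i \mid \mu_i(\mathbf{x}), \sigma_i^2(\mathbf{x})\right)$$

Full covariance: not efficient and not flexible (unimodal)

$$q(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}\left(\mathbf{z} \mid \boldsymbol{\mu}(\mathbf{x}), \boldsymbol{\Sigma}(\mathbf{x})\right)$$

Checklist (diagonal / full covariance)
1. Computationally cheap to compute and differentiate: ✓ / ✗
2. Computationally cheap to sample from: ✓ / ✗
3. Parallel computation: ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗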
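To make the "efficient" part of the diagonal case concrete, here is a minimal sketch of sampling and density evaluation for a diagonal Gaussian q(z|x). The values of `mu` and `sigma` are arbitrary stand-ins for encoder outputs, and the helper names are made up for this example.

```python
import math
import random

def sample_diag_gaussian(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

def log_q_diag(z, mu, sigma):
    """log q(z|x) = sum_i log N(z_i | mu_i, sigma_i^2).

    Each dimension contributes independently, so the cost is O(D)
    and the terms could be evaluated in parallel.
    """
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((zi - m) / s) ** 2
        for zi, m, s in zip(z, mu, sigma)
    )

rng = random.Random(0)
mu = [0.5, -1.0, 2.0]      # stand-in for mu(x)
sigma = [1.0, 0.5, 2.0]    # stand-in for sigma(x)
z, eps = sample_diag_gaussian(mu, sigma, rng)
print(z, log_q_diag(z, mu, sigma))
```

A full-covariance Gaussian would instead need a D x D factor (for example a Cholesky factor) for both sampling and the log-determinant, which is where the extra cost comes from.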

5

Change-of-Variables-Based Methods

Transform $q(\mathbf{z}_0 \mid \mathbf{x})$ into a more powerful distribution $q(\mathbf{z}_T \mid \mathbf{x})$ via sequential application of the change-of-variables formula:

$$\mathbf{z}_t = f_t(\mathbf{z}_{t-1})$$

$$q(\mathbf{z}_t \mid \mathbf{x}) = q(\mathbf{z}_{t-1} \mid \mathbf{x}) \left| \det \frac{d f_t(\mathbf{z}_{t-1})}{d \mathbf{z}_{t-1}} \right|^{-1}$$

$$\Rightarrow \quad \log q(\mathbf{z}_T \mid \mathbf{x}) = \log q(\mathbf{z}_0 \mid \mathbf{x}) - \sum_{t=1}^{T} \log \left| \det \frac{d f_t(\mathbf{z}_{t-1})}{d \mathbf{z}_{t-1}} \right|$$

• NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014

• Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
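The log-density bookkeeping in the formula above can be illustrated with the simplest possible flow. The sketch below assumes 1-D affine steps z_t = a_t * z_{t-1} + b_t (a choice made only for this example), for which the Jacobian determinant is a_t and the exact density of z_T is known in closed form.

```python
import math

def log_std_normal(z):
    """Log density of N(0, 1), used here as q(z_0|x)."""
    return -0.5 * (math.log(2.0 * math.pi) + z * z)

def flow_log_density(z0, steps):
    """Push z0 through affine steps and track log q(z_t|x).

    Each step subtracts log|det df_t/dz_{t-1}| = log|a_t|
    from the running log-density.
    """
    z, log_q = z0, log_std_normal(z0)
    for a, b in steps:
        z = a * z + b
        log_q -= math.log(abs(a))
    return z, log_q

steps = [(2.0, 1.0), (-0.5, 3.0), (1.5, -2.0)]   # hypothetical (a_t, b_t) pairs
zT, log_qT = flow_log_density(0.7, steps)

# The composition of affine maps is affine, so z_T ~ N(shift, scale^2) exactly.
scale, shift = 1.0, 0.0
for a, b in steps:
    scale, shift = a * scale, a * shift + b
exact = -0.5 * math.log(2.0 * math.pi * scale ** 2) - 0.5 * ((zT - shift) / scale) ** 2
print(log_qT, exact)
```

The two printed values agree, which is the change-of-variables formula at work; the methods on the following slides differ only in which f_t they use and how cheaply its determinant can be computed.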

6

Normalizing Flow

Transformation via

$$\mathbf{z}_t = \mathbf{z}_{t-1} + \mathbf{u}_t \, f\left(\mathbf{w}_t^{\top} \mathbf{z}_{t-1} + b_t\right)$$

Key features
- Determinants are computable

Drawbacks
- Information goes through a single bottleneck

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗

[Figure: z_{t-1} is reduced to the scalar w_t^T z_{t-1} + b_t (the single bottleneck); u_t f(w_t^T z_{t-1} + b_t) is then added (⊕) to z_{t-1} to give z_t]
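As a rough illustration of why the determinant is computable here, the sketch below implements one step of the transformation on this slide with f = tanh (the slide does not fix f, so tanh is an assumption) and uses the matrix determinant lemma, det(I + u ψ^T) = 1 + u·ψ with ψ = f'(a) w. A finite-difference check in 2-D is included; all parameter values are arbitrary.

```python
import math

def planar_step(z, u, w, b):
    """One step z_t = z_{t-1} + u * tanh(w . z_{t-1} + b), with log|det Jacobian|."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b   # scalar bottleneck
    h = math.tanh(a)
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    # Matrix determinant lemma: det(I + u psi^T) = 1 + u . psi, psi = (1 - h^2) w
    log_abs_det = math.log(abs(1.0 + (1.0 - h * h) * u_dot_w))
    return z_new, log_abs_det

def numeric_log_abs_det_2d(z, u, w, b, eps=1e-6):
    """Central finite-difference estimate of log|det| for a 2-D input."""
    cols = []
    for j in range(2):
        hi = list(z)
        lo = list(z)
        hi[j] += eps
        lo[j] -= eps
        f_hi, _ = planar_step(hi, u, w, b)
        f_lo, _ = planar_step(lo, u, w, b)
        cols.append([(p - q) / (2.0 * eps) for p, q in zip(f_hi, f_lo)])
    det = cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]
    return math.log(abs(det))

z_prev, u, w, b = [0.3, -0.8], [0.5, 0.2], [1.0, -2.0], 0.1
z_next, analytic = planar_step(z_prev, u, w, b)
numeric = numeric_log_abs_det_2d(z_prev, u, w, b)
print(analytic, numeric)
```

The determinant costs O(D) rather than O(D^3), but every coordinate of z_t is updated through the single scalar `a`, which is the bottleneck named on the slide.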

7

Hamiltonian Flow / Hamiltonian Variational Inference

ELBO with auxiliary variables y:

$$\log p(\mathbf{x}) \ge \log p(\mathbf{x}) - D_{KL}\left(q(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\right) - D_{KL}\left(q(\mathbf{y} \mid \mathbf{x}, \mathbf{z}) \,\|\, r(\mathbf{y} \mid \mathbf{x}, \mathbf{z})\right) =: \mathcal{L}(\mathbf{x})$$

Drawing (y, z) via HMC:

$$(y_t, z_t) \sim \mathrm{HMC}(y_t, z_t \mid y_{t-1}, z_{t-1})$$

Key features
- Capability to sample from the exact posterior

Drawbacks
- Long mixing time and lower ELBO

1. Computationally cheap to compute and differentiate: ✗
2. Computationally cheap to sample from: ✗
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓

8

NICE

Transform only half of z at each step:

$$\mathbf{z}_t = \left(\mathbf{z}_t^{\alpha}, \mathbf{z}_t^{\beta}\right) = \left(\mathbf{z}_{t-1}^{\alpha}, \; \mathbf{z}_{t-1}^{\beta} + f\left(\mathbf{x}, \mathbf{z}_{t-1}^{\alpha}\right)\right)$$

Key features
- The determinant of the Jacobian, $\det \frac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}}$, is always 1

Drawbacks
- Limited form of transformation
- Less powerful than Normalizing Flow

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
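The additive coupling on this slide can be sketched in a few lines. In the example below the coupling function `shift_fn` is an arbitrary stand-in for f, and the dependence on x is dropped for brevity; both are assumptions of this sketch. Because the first half passes through unchanged and the second half is only shifted, the Jacobian is triangular with ones on the diagonal, and the inverse is obtained by subtracting the same shift.

```python
import math

def shift_fn(z_alpha):
    """Hypothetical coupling function f; any function of z_alpha is allowed."""
    return [math.sin(3.0 * a) + a * a for a in z_alpha]

def nice_forward(z):
    """z_alpha is copied; z_beta is shifted by f(z_alpha). log|det| = 0."""
    half = len(z) // 2
    z_alpha, z_beta = z[:half], z[half:]
    shift = shift_fn(z_alpha)
    return z_alpha + [b + s for b, s in zip(z_beta, shift)]

def nice_inverse(z):
    """Exact inverse: recompute the shift from the untouched half and subtract it."""
    half = len(z) // 2
    z_alpha, z_beta = z[:half], z[half:]
    shift = shift_fn(z_alpha)
    return z_alpha + [b - s for b, s in zip(z_beta, shift)]

z_in = [0.2, -1.0, 0.5, 1.5]
z_out = nice_forward(z_in)
z_back = nice_inverse(z_out)
print(z_out, z_back)
```

Since only half of the coordinates change per step, several such layers with alternating halves are needed before every coordinate has been transformed.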

9

Autoregressive Flow (proposed)

Autoregressive Flow ($\partial \mu_{t,i} / \partial z_{t,j} = \partial \sigma_{t,i} / \partial z_{t,j} = 0$ if $i \le j$):

$$z_{t,i} = \mu_{t,i}\left(\mathbf{z}_{t,0:i-1}\right) + \sigma_{t,i}\left(\mathbf{z}_{t,0:i-1}\right) \cdot z_{t-1,i}$$

Key features
- Powerful
- Easy to compute $\det \frac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}} = \prod_i \sigma_{t,i}\left(\mathbf{z}_{t,0:i-1}\right)$

Drawbacks
- Difficult to parallelize

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
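The sequential nature of this transformation is easiest to see in code. The sketch below uses a toy conditioner `mu_sigma` that is made up for this example (in practice a neural network would play this role); the only property that matters is that it looks at earlier coordinates of the new z_t only.

```python
import math

def mu_sigma(prefix):
    """Hypothetical autoregressive conditioner: depends only on earlier coordinates."""
    s = sum(prefix)
    return 0.5 * s, math.exp(0.3 * math.tanh(s))

def af_step(z_prev):
    """One autoregressive-flow step z_{t-1} -> z_t with log|det Jacobian|.

    Coordinate i needs the already-computed z_{t,0:i-1}, so the loop
    cannot be parallelized across coordinates.
    """
    z, log_det = [], 0.0
    for i in range(len(z_prev)):
        mu, sigma = mu_sigma(z[:i])
        z.append(mu + sigma * z_prev[i])
        log_det += math.log(sigma)     # triangular Jacobian: det = prod_i sigma_i
    return z, log_det

z_prev = [0.4, -1.1, 0.9]
z_t, log_det = af_step(z_prev)
print(z_t, log_det)
```

The determinant is cheap because the Jacobian is triangular, but the loop over coordinates is inherently serial, which is the drawback listed on the slide.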

10

Inverse Autoregressive Flow (proposed)

Inverting AF ($\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ are also autoregressive):

$$\mathbf{z}_t = \frac{\mathbf{z}_{t-1} - \boldsymbol{\mu}_t(\mathbf{z}_{t-1})}{\boldsymbol{\sigma}_t(\mathbf{z}_{t-1})}$$

Key features
- Equally powerful as AF
- Easy to compute $\det \frac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}} = 1 / \prod_i \sigma_{t,i}(\mathbf{z}_{t-1})$
- Parallelizable

1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✓
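A small sketch may help show why the inverse direction parallelizes. It uses a toy conditioner `mu_sigma` invented for this example, in which coordinate i sees only z_{t-1,0:i-1}. In `iaf_step` every (mu_i, sigma_i) is computed from the already-known z_{t-1}, so the D evaluations are independent of each other; `af_step` is the sequential inverse and is included only to check that the two maps undo each other.

```python
import math

def mu_sigma(prefix):
    """Hypothetical autoregressive conditioner: depends only on earlier coordinates."""
    s = sum(prefix)
    return 0.5 * s, math.exp(0.3 * math.tanh(s))

def iaf_step(z_prev):
    """IAF step: z_t = (z_{t-1} - mu(z_{t-1})) / sigma(z_{t-1})."""
    # All conditioner calls use z_prev only, so they could run in parallel.
    params = [mu_sigma(z_prev[:i]) for i in range(len(z_prev))]
    z = [(zp - mu) / sigma for zp, (mu, sigma) in zip(z_prev, params)]
    log_det = -sum(math.log(sigma) for _, sigma in params)
    return z, log_det

def af_step(z_prev):
    """Sequential inverse of iaf_step (the autoregressive flow direction)."""
    z, log_det = [], 0.0
    for i in range(len(z_prev)):
        mu, sigma = mu_sigma(z[:i])
        z.append(mu + sigma * z_prev[i])
        log_det += math.log(sigma)
    return z, log_det

z_prev = [0.4, -1.1, 0.9]
z_t, log_det_iaf = iaf_step(z_prev)
z_back, log_det_af = af_step(z_t)
print(z_t, log_det_iaf, z_back)
```

The list comprehension in `iaf_step` is written serially here, but nothing in it depends on a previous iteration, which is the property a vectorized or GPU implementation would exploit.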

11

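IAF through Masked Autoencoder (MADE)

Modeling autoregressive $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ with MADE

• MADE removes the paths from "future" inputs in an autoencoder by introducing masks on its weights

• MADE is a probabilistic model $p(\mathbf{x}) = \prod_i p\left(x_i \mid \mathbf{x}_{0:i-1}\right)$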
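The masking idea can be sketched with a single hidden layer. The construction below follows the usual degree-based recipe (each unit gets a degree, and a connection is kept only if it cannot carry information from a later input); the layer sizes, the cyclic degree assignment, and the random weights are assumptions of this example, not details from the slides.

```python
import math
import random

def made_masks(d, h):
    """Binary masks for a one-hidden-layer MADE with d inputs and h hidden units (d >= 2)."""
    in_deg = list(range(1, d + 1))
    hid_deg = [1 + (k % (d - 1)) for k in range(h)]   # hidden degrees cycle over 1..d-1
    mask_in = [[1.0 if hid_deg[k] >= in_deg[j] else 0.0 for j in range(d)]
               for k in range(h)]
    mask_out = [[1.0 if in_deg[i] > hid_deg[k] else 0.0 for k in range(h)]
                for i in range(d)]
    return mask_in, mask_out

def made_forward(x, w_in, w_out, mask_in, mask_out):
    """Masked two-layer network; output i depends only on inputs with index < i."""
    hidden = [math.tanh(sum(w * m * xj for w, m, xj in zip(w_row, m_row, x)))
              for w_row, m_row in zip(w_in, mask_in)]
    return [sum(w * m * hk for w, m, hk in zip(w_row, m_row, hidden))
            for w_row, m_row in zip(w_out, mask_out)]

rng = random.Random(0)
d, h = 4, 8
mask_in, mask_out = made_masks(d, h)
w_in = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(h)]
w_out = [[rng.uniform(-1.0, 1.0) for _ in range(h)] for _ in range(d)]

x = [0.3, -0.7, 1.2, 0.5]
x_perturbed = [0.3, -0.7, 5.0, 0.5]        # change only input index 2
out = made_forward(x, w_in, w_out, mask_in, mask_out)
out_perturbed = made_forward(x_perturbed, w_in, w_out, mask_in, mask_out)
print(out, out_perturbed)
```

Only the last output changes when input 2 is perturbed. For IAF, a network like this would produce two such output vectors, one for μ_t and one for σ_t, in a single pass.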

12

Experiments

IAF is evaluated on image generative models

Models for MNIST
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked, with the ordering reversed alternately

Models for CIFAR-10 (very complicated)
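The stacking with alternately reversed ordering can be sketched as follows. This is a toy version only: the conditioner `mu_sigma` is invented for the example and is shared by all steps, whereas a real model would use a separate learned network (for example a MADE) per step. Reversing the coordinate order on every other step means that coordinates which could not be influenced by the others in one step can be influenced in the next.

```python
import math

def mu_sigma(prefix):
    """Hypothetical autoregressive conditioner shared by all steps."""
    s = sum(prefix)
    return 0.5 * s, math.exp(0.3 * math.tanh(s))

def iaf_step(z_prev):
    """One IAF step with log|det Jacobian|."""
    params = [mu_sigma(z_prev[:i]) for i in range(len(z_prev))]
    z = [(zp - mu) / sigma for zp, (mu, sigma) in zip(z_prev, params)]
    return z, -sum(math.log(sigma) for _, sigma in params)

def iaf_chain(z0, num_steps):
    """Stack IAF steps, reversing the coordinate ordering on every other step."""
    z, total_log_det = list(z0), 0.0
    for t in range(num_steps):
        reverse = (t % 2 == 1)
        if reverse:
            z = z[::-1]               # a permutation has |det| = 1
        z, log_det = iaf_step(z)
        if reverse:
            z = z[::-1]               # restore the original coordinate order
        total_log_det += log_det
    return z, total_log_det

z0 = [0.4, -1.1, 0.9]
z1, log_det_1 = iaf_chain(z0, 1)
z2, log_det_2 = iaf_chain(z0, 2)
print(z1, z2, log_det_2)
```

With this toy conditioner the first step leaves coordinate 0 untouched and the reversed second step leaves the last coordinate untouched, so after two steps every coordinate has been transformed at least once. The accumulated `total_log_det` is the quantity subtracted from log q(z_0|x) to obtain log q(z_T|x).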

13

MNIST

14

CIFAR-10

15

IAF in 1 slide

[Figure: starting from q(z_0|x; ν_0), successive flow steps give q(z_t|x; ν_t) and finally q(z_T|x; ν_T), shown alongside the true posterior p(z|x; η); the optimized q(z|x; ν_T*) and p(z|x; η*) are separated by the gap D_KL(q ‖ p). Panel labels: "Autoregressive Flow" and "Inverse Autoregressive Flow".]

IAF is
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible

We are hiring! http://www.abeja.asia/

https://www.wantedly.com/companies/abeja