A Comparison of Methods and Environments for Running State Space Models


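Hiroki Ito (伊東宏樹), Forestry and Forest Products Research Institute (森林総合研究所)

March 16, 2014. 61st Annual Meeting of the Ecological Society of Japan (Hiroshima), T13: Use of State Space Models in Ecology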

Software covered today
• R packages
  • dlm
  • KFAS
• MCMC
  • BUGS language
  • Stan

Location of the sample code and related files

http://www001.upp.so-net.ne.jp/ito-hi/stat/2014ESJ/

Statistical Software for State Space Models

Commandeur et al. (2011)

Journal of Statistical Software 41(1)

State Space Models in R

Petris & Petrone (2011)

Journal of Statistical Software 41(4)

dlm
• Dynamic Linear Model
• Linear + normal distribution
• Kalman filter
• Parameter estimation
  • Maximum likelihood estimation / Bayesian estimation

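dlm notation

Data model:

$$y_t = F_t \theta_t + v_t, \quad v_t \sim N(0, V_t)$$

Process model:

$$\theta_t = G_t \theta_{t-1} + w_t, \quad w_t \sim N(0, W_t)$$

Initial state:

$$\theta_0 \sim N(m_0, C_0)$$

$$t = 1, \ldots, n$$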
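To make the notation concrete, here is a minimal base-R sketch of the Kalman filter recursion for the scalar, time-invariant case (constant F, G, V, W). The function name `kalman_filter_1d`, the simulated data, and the numeric values are illustrative assumptions; this is not code from the dlm package.

```r
# Scalar Kalman filter written in the dlm notation (F, G, V, W constant over time).
kalman_filter_1d <- function(y, FF = 1, GG = 1, V, W, m0 = 0, C0 = 1e7) {
  n <- length(y)
  m <- numeric(n)        # filtered means of theta_t given y_1..y_t
  C <- numeric(n)        # filtered variances
  loglik <- 0
  m_prev <- m0
  C_prev <- C0
  for (t in seq_len(n)) {
    a <- GG * m_prev             # prior mean of theta_t
    R <- GG^2 * C_prev + W       # prior variance of theta_t
    f <- FF * a                  # one-step-ahead forecast of y_t
    Q <- FF^2 * R + V            # forecast variance
    K <- R * FF / Q              # Kalman gain
    m[t] <- a + K * (y[t] - f)   # update with the forecast error
    C[t] <- R - K * FF * R
    loglik <- loglik + dnorm(y[t], f, sqrt(Q), log = TRUE)
    m_prev <- m[t]
    C_prev <- C[t]
  }
  list(m = m, C = C, loglik = loglik)
}

# Simulate a local level model (F = G = 1) with W = 1 and V = 4, then filter it.
set.seed(1)
n <- 200
theta <- cumsum(rnorm(n, 0, 1))   # process model: random walk
y <- theta + rnorm(n, 0, 2)       # data model: state plus observation noise
filt <- kalman_filter_1d(y, V = 4, W = 1)
```

The accumulated `loglik` is the quantity that maximum likelihood estimation optimizes over V and W.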

Changes in the flow of the Nile River
data(Nile)

Local Level Model with dlm (from Petris and Petrone (2011))

    library(dlm)

    ## Define the build function
    BuildLLM <- function(theta) {
      dlmModPoly(order = 1,
                 dV = theta[1],
                 dW = theta[2])
    }

Define a function like this in advance.

Local Level Model with dlm

    ## Maximum likelihood estimation of the parameters
    fit.llm <- dlmMLE(Nile, parm = c(100, 2),
                      build = BuildLLM,
                      lower = rep(1e-4, 2))

    ## Use the estimated parameters in the build function
    model.llm <- BuildLLM(fit.llm$par)

    ## Smoothing
    smooth.llm <- dlmSmooth(Nile, model.llm)
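As an independent cross-check that needs no extra packages, the same local level model can be fitted with `StructTS()` from base R's stats package. This is a sketch and not part of the original slides; the object names `fit.sts`, `vars.sts` and `smooth.sts` are hypothetical.

```r
# Local level model for the Nile data, fitted by maximum likelihood in base R.
fit.sts <- StructTS(Nile, type = "level")

# Estimated variances: "level" plays the role of W, "epsilon" the role of V.
vars.sts <- coef(fit.sts)

# Smoothed level, comparable in role to the output of dlmSmooth().
smooth.sts <- tsSmooth(fit.sts)
```

The two optimizers start from different values and use different parameterizations, so the estimates should be close to those from dlmMLE() but need not match exactly.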

Smoothing
dlmSmooth()

[Figure: smoothing result; annotation: start of construction of the Aswan Dam]

Regression model with dlm

    ## Indicator variable: before/after the start of Aswan Dam construction
    x <- matrix(c(rep(0, 27),
                  rep(1, length(Nile) - 27)),
                ncol = 1)

    ## Model definition
    model.reg <- dlmModReg(x, dW = c(1, 0))
    BuildReg <- function(theta) {
      V(model.reg) <- exp(theta[1])
      diag(W(model.reg))[1] <- exp(theta[2])
      return(model.reg)
    }

    ## Maximum likelihood estimation
    fit.reg <- dlmMLE(Nile,
                      parm = rep(0, 2),
                      build = BuildReg)
    model.reg <- BuildReg(fit.reg$par)
    smooth.reg <- dlmSmooth(Nile,
                            mod = model.reg)
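For orientation, the static counterpart of this model is an ordinary regression on the same 0/1 indicator. In the dlm version the intercept is allowed to drift (its W element is estimated), while the coefficient of x stays fixed. The sketch below uses only base R; `x.step` and `fit.lm` are hypothetical names.

```r
# Ordinary least squares on the same before/after indicator (no time-varying level).
x.step <- c(rep(0, 27), rep(1, length(Nile) - 27))
fit.lm <- lm(as.numeric(Nile) ~ x.step)

# Intercept: mean flow in the first 27 years; slope: shift in mean flow afterwards.
coef(fit.lm)
```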


[Figure: result of the regression model; annotation: start of construction of the Aswan Dam]

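dlm references
• Petris G, Petrone S, Campagnoli P (2009) "Dynamic Linear Models with R" Springer

• Wago H (supervising translator), Hagiwara J (translator) (2013) "Rによるベイジアン動的線形モデル" (Japanese edition of "Dynamic Linear Models with R") Asakura Shoten (朝倉書店)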

• Petris G (2010) An R package for dynamic linear models. Journal of Statistical Software 36(12)

KFAS

• Kalman Filter and Smoother for Exponential Family State Space Models

• Can handle distributions other than the normal (e.g., the Poisson distribution)

• Maximum likelihood estimation

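KFAS notation

Data model:

$$y_t = Z_t \alpha_t + \epsilon_t, \quad \epsilon_t \sim N(0, H_t)$$

Process model:

$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)$$

Initial state:

$$\alpha_1 \sim N(a_1, P_1)$$

$$t = 1, \ldots, n$$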

Van drivers killed or seriously injured in Great Britain
data(Seatbelts)

Poisson state space model with KFAS (from help(KFAS))

    library(KFAS)

    model.van <- SSModel(VanKilled ~ law +
                           SSMtrend(degree = 1,
                                    Q = list(matrix(NA))) +
                           SSMseasonal(period = 12,
                                       sea.type = "dummy",
                                       Q = matrix(NA)),
                         data = Seatbelts,
                         distribution = "poisson")

Poisson state space model with KFAS

    fit.van <- fitSSM(inits = c(-4, -7, 2),
                      model = model.van,
                      method = "BFGS")

    pred.van <- predict(fit.van$model,
                        states = 1:2)

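states = 1:2 uses only law and SSMtrend().

This gives predicted values with the seasonal variation removed.

[Figure: predicted values; annotation: introduction of compulsory seat belt wearing]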
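A static Poisson regression gives a rough baseline for the effect of law, ignoring the trend and seasonal components that the state space model accounts for. This is a base-R sketch, not part of the KFAS example; `seatbelts.df` and `fit.glm` are hypothetical names.

```r
# Poisson GLM with the law indicator only (no trend, no seasonality).
seatbelts.df <- as.data.frame(Seatbelts)
fit.glm <- glm(VanKilled ~ law, family = poisson, data = seatbelts.df)

# Multiplicative change in the expected monthly count once the law is in force.
rate.ratio <- exp(coef(fit.glm)[["law"]])
```

Because this model attributes every difference between the two periods to law, its estimate can differ from the coefficient obtained with SSModel().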

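BUGS
WinBUGS, OpenBUGS, JAGS

BUGS
• Bayesian estimation by MCMC
• Flexible modeling
• Models that the R packages cannot handle

Example
• Estimate the population size of an organism.
• Individuals are detected with a constant detection probability.

Based on Chapter 5 of Kéry & Schaub (2011) "Bayesian Population Analysis using WinBUGS: A hierarchical perspective".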

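Data generation

    set.seed(1234)
    n.t <- 50                       # number of observations
    N.lat <- rep(50, n.t)           # true population size
    p <- 0.7                        # detection probability
    N.obs <- rbinom(n.t, N.lat, p)  # observed counts

Generated data: Binomial(50, 0.7)

[Figure: true population size and observed counts]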
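A quick check of the simulated data shows why the counts cannot be read as population sizes: each count is at most the true size, and on average it is about N × p = 35. The snippet regenerates the data so that it runs on its own; `expected.count` is a hypothetical name.

```r
set.seed(1234)
n.t <- 50                       # number of observations
N.lat <- rep(50, n.t)           # true population size
p <- 0.7                        # detection probability
N.obs <- rbinom(n.t, N.lat, p)  # observed counts

expected.count <- N.lat[1] * p  # expected observed count under the binomial model
mean(N.obs)                     # sample mean of the counts, to compare with it
range(N.obs)                    # counts never exceed the true population size
```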

BUGS model

    var
      N,          # number of observations
      y[N],       # observed counts
      y_hat[N],   # estimate of the "true population size"
      lambda[N],  # log(y_hat)
      p,          # detection probability
      tau, sigma;

BUGS model

    model {
      ## Data model
      for (t in 1:N) {
        y[t] ~ dbin(p, y_hat[t]);
        y_hat[t] <- trunc(exp(lambda[t]));
      }
      ## Process model
      for (t in 2:N) {
        lambda[t] ~ dnorm(lambda[t - 1], tau);
      }
      ## Priors
      lambda[1] ~ dnorm(0, 1.0E-4);
      p ~ dbeta(2, 2);
      sigma ~ dunif(0, 100);
      tau <- 1 / (sigma * sigma);
    }
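One detail worth spelling out is that dnorm() in the BUGS language takes a precision (1 / variance) as its second argument, whereas R's dnorm() takes a standard deviation. The short base-R sketch below shows the conversion used in the model; the value of `sigma` is an arbitrary illustration.

```r
sigma <- 0.5                     # standard deviation of the process noise (example value)
tau <- 1 / (sigma * sigma)       # precision, as passed to dnorm() in BUGS

# Converting back: BUGS dnorm(mu, tau) matches R's dnorm(x, mean = mu, sd = 1 / sqrt(tau)).
sd.from.tau <- 1 / sqrt(tau)

# The prior lambda[1] ~ dnorm(0, 1.0E-4) therefore has a standard deviation of 100.
prior.sd <- 1 / sqrt(1.0E-4)
```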

Running with JAGS

    library(rjags)

    ## Initial values for three chains
    inits <- list()
    inits[[1]] <- list(p = 0.9, sigma = 1,
                       lambda = rep(log(max(N.obs) + 1), n.t))
    inits[[2]] <- list(p = 0.7, sigma = 3,
                       lambda = rep(log(max(N.obs) + 1), n.t))
    inits[[3]] <- list(p = 0.8, sigma = 5,
                       lambda = rep(log(max(N.obs) + 1), n.t))

    model <- jags.model("ks51.bug.txt",
                        data = list(N = n.t, y = N.obs),
                        inits = inits, n.chains = 3,
                        n.adapt = 100000)
    samp <- coda.samples(model,
                         variable.names = c("y_hat", "sigma",
                                            "p"),
                         n.iter = 3000000, thin = 3000)

Estimation results

[Figure: true population size, observed counts, and estimates of the "true population size"]

http://mc-stan.org/

Stan
• Bayesian estimation by MCMC
• Hamiltonian Monte Carlo (HMC)
• No-U-Turn Sampler (NUTS)
• Stan → C++ → native binary
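To give a feel for what HMC does, here is a toy sampler for a one-dimensional standard normal target, written in base R. It is a sketch under simplifying assumptions (fixed step size `eps` and a fixed number of leapfrog steps), not Stan's implementation; NUTS chooses the trajectory length adaptively. All function and variable names are hypothetical.

```r
# Target: standard normal. Log density (up to a constant) and its gradient.
log_dens <- function(q) -0.5 * q^2
grad_log_dens <- function(q) -q

# One HMC transition: simulate Hamiltonian dynamics with the leapfrog integrator,
# then accept or reject based on the change in total energy.
hmc_step <- function(q, eps = 0.1, n_leapfrog = 20) {
  p0 <- rnorm(1)                                       # fresh momentum
  q_new <- q
  p_new <- p0 + 0.5 * eps * grad_log_dens(q_new)       # initial half step for momentum
  for (i in seq_len(n_leapfrog)) {
    q_new <- q_new + eps * p_new                       # full step for position
    if (i < n_leapfrog) {
      p_new <- p_new + eps * grad_log_dens(q_new)      # full step for momentum
    }
  }
  p_new <- p_new + 0.5 * eps * grad_log_dens(q_new)    # final half step for momentum
  h_old <- -log_dens(q) + 0.5 * p0^2                   # energy at the start
  h_new <- -log_dens(q_new) + 0.5 * p_new^2            # energy at the end
  if (log(runif(1)) < h_old - h_new) q_new else q      # Metropolis correction
}

set.seed(42)
draws <- numeric(2000)
q <- 0
for (i in seq_along(draws)) {
  q <- hmc_step(q)
  draws[i] <- q
}
```

The gradient is what lets each proposal travel far from the current point while still being accepted with high probability.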

Stan
• CmdStan
  • From the command line
• RStan
  • From R
• PyStan
  • From Python

DLM with Stan
Uses data(Nile)

DLM with Stan: data

    // Written in the Stan syntax of 2014; newer Stan versions use `=` instead of `<-`.
    data {
      int<lower=0> N;
      matrix[1, N] y;
    }
    transformed data {
      matrix[1, 1] F;
      matrix[1, 1] G;
      vector[1] m0;
      cov_matrix[1] C0;

      F[1, 1] <- 1;
      G[1, 1] <- 1;
      m0[1] <- 0;
      C0[1, 1] <- 1.0e+6;
    }

The data are prepared in the same form as for dlm.

DLM with Stan: parameters

    parameters {
      real<lower=0> sigma[2];
    }
    transformed parameters {
      vector[1] V;
      cov_matrix[1] W;

      V[1] <- sigma[1] * sigma[1];
      W[1, 1] <- sigma[2] * sigma[2];
    }

The parameters are prepared in the same form as for dlm.

DLM with Stan: model

    model {
      y ~ gaussian_dlm_obs(F, G, V, W, m0, C0);
      sigma ~ uniform(0, 1.0e+6);
    }

DLM with Stan

    library(rstan)

    ## Store the result as `fit`, the name used by traceplot() and print() below
    fit <- stan("kalman.stan",
                data = list(y = matrix(c(Nile),
                                       nrow = 1),
                            N = length(Nile)),
                pars = c("sigma"),
                chains = 3,
                iter = 1500, warmup = 500,
                thin = 1)

MCMC traces

    traceplot(fit, pars = "sigma", inc_warmup = FALSE)

DLM with Stan

    > print(fit)
    Inference for Stan model: kalman.
    3 chains, each with iter=1500; warmup=500; thin=1;
    post-warmup draws per chain=1000, total post-warmup draws=3000.

               mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
    sigma[1]  121.2     0.5 13.8   92.6  112.7  121.5  130.3  148.4   889    1
    sigma[2]   45.5     0.6 17.6   18.3   32.7   43.2   55.7   85.2   833    1
    lp__     -541.6     0.0  1.1 -544.6 -542.0 -541.3 -540.9 -540.6   904    1

    Samples were drawn using NUTS(diag_e) at Sun Feb 9 06:06:42 2014.
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at
    convergence, Rhat=1).

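DLM with Stan

    ## Posterior means of sigma[1] and sigma[2]
    sigma <- apply(extract(fit, "sigma")$sigma, 2, mean)

    library(dlm)

    buildNile <- function(theta) {
      dlmModPoly(order = 1, dV = theta[1], dW = theta[2])
    }
    ## dlm expects variances, so the standard deviations are squared
    modNile <- buildNile(sigma^2)
    smoothNile <- dlmSmooth(Nile, modNile)

The parameters obtained by Bayesian estimation are used in dlm.

Smoothing

[Figure: smoothing result from dlm using the parameters estimated with Stan]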

State space model analysis with Stan
• gaussian_dlm_obs() does not always work well
• You can, of course, also write the model yourself

State space model analysis with Stan

    data {
      int<lower=0> N;
      real y[N];
    }
    parameters {
      real theta[N];
      real<lower=0> sigma[2];
    }

State space model analysis with Stan

    model {
      // Data model
      for (t in 1:N) {
        y[t] ~ normal(theta[t], sigma[1]);
      }

      // Process model
      for (t in 2:N) {
        theta[t] ~ normal(theta[t - 1], sigma[2]);
      }

      // Priors
      theta[1] ~ normal(0, 1.0e+4);
      sigma ~ uniform(0, 1.0e+6);
    }
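The hand-written model defines a joint log density over the states theta and the two standard deviations. The base-R sketch below writes out the same terms explicitly, which can help when checking what the sampler is exploring. The function name `log_joint` and the trial values are hypothetical, and Stan's reported lp__ may differ from this value by constant terms.

```r
# Joint log density of the hand-written local level model, term by term.
log_joint <- function(theta, sigma, y) {
  n <- length(y)
  sum(dnorm(y, theta, sigma[1], log = TRUE)) +                 # data model
    sum(dnorm(theta[-1], theta[-n], sigma[2], log = TRUE)) +   # process model
    dnorm(theta[1], 0, 1.0e+4, log = TRUE) +                   # prior on theta[1]
    sum(dunif(sigma, 0, 1.0e+6, log = TRUE))                   # priors on sigma
}

# Evaluate at a trial point: a flat state trajectory at the mean flow.
y.nile <- as.numeric(Nile)
lj.flat <- log_joint(theta = rep(mean(y.nile), length(y.nile)),
                     sigma = c(120, 45), y = y.nile)
```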

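Summary
Software that can handle state space models
• R packages: dlm, KFAS
  • Understand the meaning of the arguments passed to the functions.
• Bayesian estimation: BUGS, Stan
  • Flexible modeling is possible.
  • Computation takes time.
• Other software is also available.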
