Motivation Point Process Modelling Inference Data Analysis Summary
A Space-Time Conditional Intensity Model for Invasive Meningococcal Disease Occurrence
Sebastian Meyer1,3 Johannes Elias4 Michael Höhle2,3
1 Division of Biostatistics, Institute for Social & Preventive Medicine, Univ. of Zürich
2 Department for Infectious Disease Epidemiology, Robert Koch Institute, Berlin
3 (previously) Department of Statistics, Ludwig-Maximilians-Universität, München
4 German Reference Centre for Meningococci, University of Würzburg, Würzburg
QMUL – Institute of Zoology, London, United Kingdom
7 September 2012
Outline
1 Motivation
2 Space-Time Point Process Modelling
3 Inference
4 Data Analysis
5 Summary
Motivation and Aim
- Understanding the spread of an infectious disease is a step towards its control
- There is increased agreement that such dynamics are stochastic phenomena operating in a heterogeneous population
- The spatial and temporal resolution of infectious disease data is steadily improving
Aim
Establish a regression framework for point-referenced infectious disease surveillance data, where the transmission dynamics and their dependency on covariates can be quantified within the context of a spatio-temporal stochastic process.
Application: Invasive meningococcal disease (IMD)
Description
- Life-threatening infectious disease triggered by the bacterium Neisseria meningitidis (aka meningococcus)
- Involves meningitis (50%), septicemia (5–20%), pneumonia (5–15%)
- Transmission by mucous secretions, also airborne
Epidemiology
- Yearly incidence (Germany, 2001–2008): 0.5–1 infections per 100 000 inhabitants
- Mainly affected are infants and adolescents
- Lethality: 8.4%, for meningococcal sepsis: ≈ 40%
Available IMD data
- Two most common finetypes in Germany in 2002–2008: 336 cases of B:P1.7-2,4:F1-5, 300 cases of C:P1.5,2:F3-3
- Case variables: date, residence postcode, age, gender
[Figure: two time series panels, B:P1.7-2,4:F1-5 and C:P1.5,2:F3-3. x-axis: Time (month), 2002–2009; y-axis: Number of cases of the respective finetype, 0–16.]
Spatial distribution
[Figure: two maps of case locations, B:P1.7-2,4:F1-5 and C:P1.5,2:F3-3. Axes: 6°E–14°E, 48°N–54°N; legend scale from 0 to 4500.]
Scientific question: Do the finetypes spread differently?
My task: Quantify the transmission dynamics.
Relationship of IMD and influenza
[Figure: two panels, "Weekly numbers of SurvNet influenza cases" (y-axis: Number of influenza cases, 0–4000) and "Weekly numbers of SurvNet IMD cases" (y-axis: Number of IMD cases, 0–50). x-axis: Week, 0–50; one curve per year, 2002–2008.]
Scientific question: Do waves of influenza predispose to IMD accumulations?
Statistical solution: Quantify and test the local effect of (lagged) numbers of influenza cases on occurrences of IMD
Conditional intensity function (CIF)
A regular spatio-temporal point process $N$ on $\mathbb{R}_+ \times \mathbb{R}^2$ can be uniquely characterised by its left-continuous CIF $\lambda^*(t,s)$.
Definition
$$\lambda^*(t,s) = \lim_{\Delta t \to 0,\ |ds| \to 0} \frac{P\big(N([t, t+\Delta t) \times ds) = 1 \,\big|\, \mathcal{H}_{t-}\big)}{\Delta t\, |ds|}$$
- Instantaneous event rate at $(t,s)$ given all past events
- Key to modelling, likelihood analysis and simulation of evolutionary ("self-exciting") point processes
- In application, $N$ is only defined on a subset $(0, T] \times W \subset \mathbb{R}_+ \times \mathbb{R}^2$ (observation period and region)
Proposed additive-multiplicative continuous space-time intensity model (twinstim)
$$\lambda^*(t,s) = h(t,s) + e^*(t,s)$$
Inspiration
- Additive-multiplicative SIR (susceptible-infectious-recovered) compartmental model (Höhle, 2009) for a fixed population
- Spatio-temporal ETAS (epidemic-type aftershock-sequences) model (Ogata, 1998)
Multiplicative endemic component
$$h(t,s) = \exp\big(o_{\xi(s)} + \beta' z_{\tau(t),\xi(s)}\big)$$
- Piecewise constant function on a spatio-temporal grid $\{C_1, \dots, C_D\} \times \{A_1, \dots, A_M\}$ with time interval index $\tau(t)$ and region index $\xi(s)$
- Region-specific offset $o_{\xi(s)}$, e.g., log-population density
- Endemic linear predictor $\beta' z_{\tau(t),\xi(s)}$ includes discretised time trend and exogenous effects, e.g., influenza cases
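The lookup behind the piecewise constant endemic component can be sketched in base R. This is only an illustration under stated assumptions: the grid, the two regions, the covariates and all coefficient values below are hypothetical toy choices, not the IMD data or the twinstim implementation.

```r
# Hypothetical toy grid: 3 time blocks and 2 regions (all values are made up).
time_breaks <- c(0, 10, 20, 30)                  # block k covers (breaks[k], breaks[k+1]]
log_popdens <- c(A = log(250), B = log(1500))    # region-specific offset o_xi
beta <- c(intercept = -20, trend = -0.05)        # hypothetical coefficients

# Region index xi(s): here simply the left/right half of the unit square.
region_of <- function(s) if (s[1] < 0.5) "A" else "B"

# Time interval index tau(t) for left-open, right-closed blocks.
block_of <- function(t) findInterval(t, time_breaks, left.open = TRUE)

# Endemic intensity h(t, s) = exp(offset + beta' z), constant within each grid cell.
endemic_h <- function(t, s) {
  tau <- block_of(t)
  xi <- region_of(s)
  z <- c(1, time_breaks[tau] / 365)   # covariates: intercept, block start in years
  exp(log_popdens[[xi]] + sum(beta * z))
}

endemic_h(5, c(0.2, 0.2))   # same value anywhere in block 1, region A
endemic_h(5, c(0.8, 0.2))   # region B: scaled by the population density ratio
```

Because the offset enters on the log scale, the ratio of the two values equals the ratio of the population densities.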
Additive epidemic (self-exciting) component
$$e^*(t,s) = \sum_{j \in I^*(t,s;\epsilon,\delta)} e^{\eta_j}\, g_\alpha(t - t_j)\, f_\sigma(s - s_j)$$
- Individual infectivity weighting through linear predictor $\eta_j = \gamma' m_j$ based on the vector of unpredictable marks
- Positive parametric interaction functions, e.g.,
  $$f_\sigma(s) = \exp\left(-\frac{\|s\|^2}{2\sigma^2}\right) \quad\text{and}\quad g_\alpha(t) = e^{-\alpha t}$$
- Set of active infectives depends on fixed maximum temporal and spatial interaction ranges $\epsilon$ and $\delta$
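The epidemic sum can be evaluated directly for a small event history. The following base R sketch assumes that an event j is active at (t, s) when 0 < t - t_j <= eps and the distance to s_j is at most delta; the event table and all parameter values are hypothetical and serve only to illustrate the formula.

```r
# Hypothetical event history: times, coordinates and linear predictors eta_j.
events <- data.frame(t = c(1, 5, 40), x = c(0, 10, 3), y = c(0, 0, 4),
                     eta = c(-12, -12, -13))

g <- function(t, alpha) exp(-alpha * t)            # temporal interaction function
f <- function(d, sigma) exp(-d^2 / (2 * sigma^2))  # isotropic Gaussian in distance d

# Epidemic intensity e*(t, s): sum over the active infectives only.
epidemic_e <- function(t, s, events, alpha, sigma, eps, delta) {
  dt <- t - events$t
  d <- sqrt((s[1] - events$x)^2 + (s[2] - events$y)^2)
  active <- dt > 0 & dt <= eps & d <= delta        # assumed definition of the active set
  sum(exp(events$eta[active]) * g(dt[active], alpha) * f(d[active], sigma))
}

# At t = 6 the first two events contribute; the third has not happened yet.
epidemic_e(6, c(0, 0), events, alpha = 0.1, sigma = 5, eps = 30, delta = 200)
```

Before the first event the sum is empty and the function returns 0, so the total intensity reduces to the endemic component.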
Marked extension with event type
- Motivation: joint modelling of both finetypes of IMD
- Additional dimension $\mathcal{K} = \{1, \dots, K\}$ for event type $\kappa \in \mathcal{K}$
Marked CIF
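$$\lambda^*(t,s,\kappa) = \exp\big(\beta_{0,\kappa} + o_{\xi(s)} + \beta' z_{\tau(t),\xi(s)}\big) + \sum_{j \in I^*(t,s,\kappa;\epsilon,\delta)} q_{\kappa_j,\kappa}\, e^{\eta_j}\, g_\alpha(t - t_j \mid \kappa_j)\, f_\sigma(s - s_j \mid \kappa_j)$$
- Type-specific endemic intercept $\beta_{0,\kappa}$
- Type-specific transmission, $q_{k,l} \in \{0,1\}$, $k,l \in \mathcal{K}$
- Type-specific infection pressure $\eta_j = \gamma' m_j$, where $\kappa_j$ is part of $m_j$
- Type-specific interaction functions, e.g., variances $\sigma_\kappa^2$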
Log-likelihood of the proposed model
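- Observed spatio-temporal marked point pattern: $\{(t_i, s_i, m_i) : i = 1, \dots, n\}$
- Covariate information $z_{\tau,\xi}$ on a spatio-temporal grid

$$l(\theta) = \sum_{i=1}^{n} \log \lambda^*_\theta(t_i, s_i, \kappa_i) - \int_0^T \int_W \sum_{\kappa \in \mathcal{K}} \lambda^*_\theta(t,s,\kappa)\, ds\, dt, \qquad \theta = \big(\beta_0', \beta', \gamma', \sigma', \alpha'\big)'$$

Integration of the epidemic component $e^*_\theta(t,s,\kappa)$ involves

$$\int_0^{\min\{T - t_j;\, \epsilon\}} g_\alpha(t \mid \kappa_j)\, dt \quad\text{and}\quad \int_{[W \cap b(s_j;\delta)] - s_j} f_\sigma(s \mid \kappa_j)\, ds$$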
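For the example interaction functions $g_\alpha(t) = e^{-\alpha t}$ and $f_\sigma(s) = \exp(-\|s\|^2 / (2\sigma^2))$, the two integrals have simple forms in special cases. The following is a worked sketch, assuming $\alpha > 0$ and, for the spatial part, that the disc $b(s_j;\delta)$ lies entirely inside $W$ (otherwise the integral has to be evaluated numerically):

```latex
\begin{align*}
\int_0^{\min\{T - t_j;\,\epsilon\}} e^{-\alpha t}\, dt
  &= \frac{1 - e^{-\alpha \min\{T - t_j;\,\epsilon\}}}{\alpha}, \\
\int_{b(0;\delta)} \exp\!\left(-\frac{\|s\|^2}{2\sigma^2}\right) ds
  &= 2\pi \int_0^{\delta} r\, e^{-r^2/(2\sigma^2)}\, dr
   = 2\pi\sigma^2 \left(1 - e^{-\delta^2/(2\sigma^2)}\right).
\end{align*}
```

For a constant temporal interaction function $g \equiv 1$, the first integral reduces to $\min\{T - t_j;\, \epsilon\}$.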
Numerical log-likelihood maximisation
- For a polygonal region $R$, perform the approximation
  $$\int_R f_\sigma(s)\, ds \approx \sum_{j=1}^{n} w_j\, f_\sigma(s_j)$$
  with weights $w_j$ and fixed evaluation points $s_1, \dots, s_n$
- Benchmark experiment ⇒ two-dimensional midpoint rule with adaptive bandwidth choice depending on the value of $\sigma$ as best trade-off between accuracy and speed
- Rathbun (1996): existence, consistency and asymptotic normality of a local maximum $\hat\theta_{ML}$ as $T \to \infty$ for fixed $W$
- Newton-type algorithm using R's nlminb function with analytical score function and expected Fisher information
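A two-dimensional midpoint rule can be sketched in base R as follows. To keep the example checkable, it integrates over a rectangle (a simplification of the polygonal case) where the exact value is available through pnorm; the adaptive bandwidth choice mentioned above is not implemented, and the function names, region and sigma are hypothetical.

```r
# Gaussian spatial interaction function in Cartesian coordinates.
f_gauss <- function(x, y, sigma) exp(-(x^2 + y^2) / (2 * sigma^2))

# Midpoint rule: evaluate f at the cell centres of a regular nx-by-ny grid,
# each weighted by the cell area hx * hy.
midpoint_2d <- function(f, xlim, ylim, nx, ny, ...) {
  hx <- diff(xlim) / nx
  hy <- diff(ylim) / ny
  xm <- xlim[1] + (seq_len(nx) - 0.5) * hx
  ym <- ylim[1] + (seq_len(ny) - 0.5) * hy
  grid <- expand.grid(x = xm, y = ym)
  sum(f(grid$x, grid$y, ...)) * hx * hy
}

sigma <- 15
approx <- midpoint_2d(f_gauss, xlim = c(-50, 50), ylim = c(-30, 60),
                      nx = 100, ny = 90, sigma = sigma)

# Exact value on the rectangle: product of two one-dimensional Gaussian integrals.
exact <- 2 * pi * sigma^2 *
  (pnorm(50 / sigma) - pnorm(-50 / sigma)) *
  (pnorm(60 / sigma) - pnorm(-30 / sigma))

c(approx = approx, exact = exact)
```

The trade-off is visible in nx and ny: a finer grid is more accurate but requires more function evaluations in every likelihood evaluation, which is why a grid width tied to sigma is attractive.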
Goodness-of-fit and simulation
- Define residuals
  $$Y_i = \hat\Lambda^*(t_i) - \hat\Lambda^*(t_{i-1}), \quad i = 2, \dots, n,$$
  where $\hat\Lambda^*(t)$ is the estimated cumulative intensity function
- If the estimated CIF describes the true CIF well in the temporal dimension, then $U_i = 1 - \exp(-Y_i) \overset{iid}{\sim} U(0,1)$
- Use the Kolmogorov-Smirnov test and plot the empirical distribution function of the $U_i$'s to check for deviations
- Alternative: compare the observed epidemic with simulations from the model using Ogata's modified thinning (Daley & Vere-Jones, 2003, Algorithm 7.5.V.)
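The residual transformation can be tried out on a case where the correct model is known. The base R sketch below assumes a homogeneous Poisson process in time with a known rate, so the cumulative intensity is simply rate * t; the rate, time horizon and seed are arbitrary illustrative choices, not values from the IMD analysis.

```r
set.seed(1)
rate <- 0.25      # hypothetical constant intensity (events per day)
T_end <- 2000     # hypothetical length of the observation period

# Simulate a homogeneous Poisson process on (0, T_end].
n <- rpois(1, rate * T_end)
t <- sort(runif(n, 0, T_end))

# Cumulative intensity of the (here: correctly specified) model.
Lambda_hat <- function(t) rate * t

Y <- diff(Lambda_hat(t))     # Y_i = Lambda(t_i) - Lambda(t_{i-1}), i = 2, ..., n
U <- 1 - exp(-Y)             # approximately iid U(0,1) under a correct model

ks <- ks.test(U, "punif")    # Kolmogorov-Smirnov test against U(0,1)
ks$p.value
plot(ecdf(U), main = "Residual check")
abline(0, 1, lty = 2)        # reference line of the U(0,1) distribution function
```

With a misspecified cumulative intensity, for example a wrong rate, the empirical distribution function would drift away from the diagonal.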
Data representation: epidataCS class
R> library("surveillance")
R> # [... loads of data preparation ...]
R> imdepi <- as.epidataCS(events, stgrid, W = germany, qmatrix = diag(2))
R> print(imdepi, n=5, digits=5)

History of an epidemic
Observation period: 0 -- 2562
Observation window (bounding box): [4034.1, 4670.4] x [2686.7, 3543.2]
Spatio-temporal grid (not shown): 366 time blocks, 413 tiles
Types of events: 'B' 'C'
Overall number of events: 636

           coordinates ID  time  tile type eps.t eps.s age    sex BLOCK
103 (4112.19, 3202.79)  1  0.99 05554    B    30   200  17   male     1
402 (4122.51, 3076.97)  2  1.00 05382    C    30   200   3   male     1
312 (4412.47, 2915.94)  3  6.00 09574    B    30   200  34 female     1
314  (4202.64, 2879.7)  4  8.00 08212    B    30   200  15 female     2
629 (4128.33, 3223.31)  5 23.00 05554    C    30   200  15   male     4
    start popdensity influenza0 influenza1 influenza2 influenza3
103     0     260.86          0          0          0          0
402     0     519.36          0          0          0          0
312     0     209.45          0          0          0          0
314     7    1665.61          0          0          0          0
629    21     260.86          0          0          0          0
[....]
IMD model selection by AIC
- Joint analysis of the two finetypes
- Temporal interaction function g: constant
- ϵ = 30 days, δ = 200 km
- District-specific population density as endemic offset

Compare all models composed of subsets of the following terms:

- Common or finetype-specific endemic intercept
- Linear time trend and sine-cosine time-of-year effects
- Linear effect of the weekly number of influenza cases registered in the district of a point (lag 0 to lag 3)
- Epidemic predictor with gender, age (categorized as 0-2, 3-18, ≥19), finetype and age-finetype interaction
- Spatial interaction function ƒ: Gaussian or constant
Example code
R> fit <- twinstim(
+    endemic = ~ 1 + offset(log(popdensity)) + I(start/365) +
+        sin(start*2*pi/365) + cos(start*2*pi/365),
+    epidemic = ~ 1 + type + agegrp,
+    siaf = siaf.gaussian(1),
+    tiaf = tiaf.constant(),
+    data = imdepi, subset = !is.na(agegrp),
+    nCub = 36, nCub.adaptive = TRUE,
+    optim.args = list(par = startvalues), model = TRUE
+  )
Model summary (1)
R> toLatex(summary(fit), digits=2, withAIC=FALSE)
                        Estimate  Std. Error  z value  P(|Z| > |z|)
h.(Intercept)            -20.365       0.087   -233.5  < 2 · 10^-16
h.I(start/365)            -0.049       0.022     -2.2  0.03
h.sin(start*2*pi/365)      0.262       0.065      4.0  6 · 10^-05
h.cos(start*2*pi/365)      0.267       0.064      4.1  3 · 10^-05
e.(Intercept)            -12.575       0.313    -40.2  < 2 · 10^-16
e.typeC                   -0.850       0.257     -3.3  0.001
e.agegrp[3,19)             0.646       0.320      2.0  0.04
e.agegrp[19,Inf)          -0.187       0.432     -0.4  0.67
e.siaf.1                   2.829       0.082

endemic: common intercept, no influenza effect
epidemic: no gender effect, no age-finetype interaction, Gaussian ƒ

Basic reproduction numbers
μ̂_B = 0.25 (95% CI: 0.19 to 0.34)
μ̂_C = 0.11 (95% CI: 0.07 to 0.17)
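As a rough plausibility check, the order of magnitude of these numbers can be reproduced from the coefficient table. The sketch below rests on several assumptions that the slides do not state: that the expected number of secondary cases of an event is exp(eta) times the integrals of g and ƒ over the interaction ranges, that e.siaf.1 is the logarithm of the Gaussian standard deviation in km, and that boundary effects of the observation region can be ignored. It covers only the reference age group and is not the package's own computation.

```r
# Point estimates copied from the coefficient table.
e_intercept <- -12.575   # e.(Intercept): finetype B, reference age group
e_typeC     <- -0.850    # e.typeC
log_sigma   <-  2.829    # e.siaf.1 (ASSUMED to be log(sigma), sigma in km)

eps   <- 30              # maximum temporal interaction range (days)
delta <- 200             # maximum spatial interaction range (km)

sigma <- exp(log_sigma)

# Constant temporal interaction g = 1 integrates to eps over (0, eps].
int_g <- eps

# Gaussian kernel integrated over a disc of radius delta (boundary of W ignored).
int_f <- 2 * pi * sigma^2 * (1 - exp(-delta^2 / (2 * sigma^2)))

# Expected number of secondary cases for the reference age group.
mu_B_ref <- exp(e_intercept) * int_g * int_f
mu_C_ref <- exp(e_intercept + e_typeC) * int_g * int_f

round(c(sigma = sigma, mu_B_ref = mu_B_ref, mu_C_ref = mu_C_ref), 3)
```

The reported μ̂_B and μ̂_C presumably average over the marks of all observed events, so they need not coincide with these reference-category values.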
Model summary (2)
R> intensityplot(fit, which = "total intensity", aggregate = "time",
+    types = 1, col = "orangered", ylim = c(0,0.3))

[Figure: two panels, B:P1.7-2,4:F1-5 and C:P1.5,2:F3-3. x-axis: Time [days], 0–2500; y-axis: Fitted intensity process, 0.00–0.30; curves: total intensity and endemic intensity.]
Model summary (3)
[Figure, left: x-axis: Time, 2002–2008; y-axis: Multiplicative effect, 0.4–1.6; point estimate with 95% Wald CI.]
Typical IMD peak in late February and minimum in August
[Figure, right: x-axis: Distance $\|s - s_j\|$ from host, 0–200; y-axis: $e^{\hat\gamma_C I_C(\kappa_j)} f_{\hat\sigma}(\|s - s_j\|)$, 0.0–1.0; point estimates and 95% Wald CIs for types B and C.]
Effective interaction range ≈ 50 km
Goodness-of-fit (residual analysis)
R> checkResidualProcess(fit, plot=1)
[Figure: two panels, "deterministic tie-breaking" and "U(0,1)-scheme". x-axis: u(i), 0.0–1.0; y-axis: Cumulative distribution, 0.0–1.0.]
Goodness-of-fit (simulation)
R> simulate(fit, nsim = 100,
+    data = imdepi,
+    tiles = districts,
+    W = germany)

- Compare observed 7-year incidences with (2.5%, 97.5%) quantiles from 100 simulations from the fitted CIF model
- Many excess districts around Aachen at the border to the Netherlands
- Edge effects hide potential transmissions across the border

[Figure: map accompanying the comparison; legend scale from 0 to 10.]
Summary
- twinstim is a comprehensive framework for the modelling, inference and simulation of general self-exciting spatio-temporal point processes, e.g., epidemics, forest fires, residential burglaries, riots, ...
- Details in Meyer, Elias & Höhle (2012)

... and most importantly ...

The twinstim implementation is available in the popular R package surveillance (Höhle, Meyer & Paul, 2012)
Acknowledgements
- Michael Höhle, Robert Koch Institute, for fruitful collaboration
- Johannes Elias and Ulrich Vogel, University of Würzburg, for supplying the IMD data and for discussions on the microbiological aspects
- Ludwig Fahrmeir, Ludwig-Maximilians-Universität München, for providing helpful suggestions and comments
- Stephen Price for inviting me to this seminar and for providing an interesting application
- The Munich Center of Health Sciences and the Swiss National Science Foundation for financial support
Literature
Daley, D. J. & Vere-Jones, D. (2003). An introduction to the theory of point processes (2nd ed., Vol. I: Elementary Theory and Methods). New York: Springer-Verlag.
Höhle, M. (2009, December). Additive-multiplicative regression models for spatio-temporal epidemics. Biometrical Journal, 51(6), 961–978. doi: 10.1002/bimj.200900050
Höhle, M., Meyer, S. & Paul, M. (2012). surveillance: Temporal and spatio-temporal modeling and monitoring of epidemic phenomena [Computer software manual]. Retrieved from http://surveillance.r-forge.r-project.org/ (R package version 1.4-2)
Meyer, S., Elias, J. & Höhle, M. (2012). A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68(2), 607–616. doi: 10.1111/j.1541-0420.2011.01684.x
Ogata, Y. (1998, June). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2), 379–402. doi: 10.1023/A:1003403601725
Rathbun, S. L. (1996). Asymptotic properties of the maximum likelihood estimator for spatio-temporal point processes. Journal of Statistical Planning and Inference, 51(1), 55–74. doi: 10.1016/0378-3758(95)00070-4