Recent Developments in Statistical Reconstruction for Emission Tomography
Department of Electronic Engineering, Pai Chai University
Soo-Jin Lee
Email: [email protected]
Web: presto.pcu.ac.kr
Outline
• Background for Tomographic Reconstruction
• Deterministic vs. Statistical Approaches
• Maximum-Likelihood (ML) Approaches
• Expectation Maximization (EM) Algorithm
• Accelerated EM by Ordered Subsets (OSEM)
• Complete-Data OSEM (COSEM)
• Penalized-Likelihood (PL) Approaches with Local Regularizers
• Penalty Functions
• Convex Optimization
• Non-Convex Optimization
• PL Approaches with Non-Local Regularizers
• Application Examples (Use of Anatomical Side Information, Super Resolution)
What Object is Reconstructed?
In emission imaging, our aim is to image the radiotracer distribution.

At time t = 0, we inject the patient with some radiotracer containing a “large” number N of metastable atoms of some radionuclide.

Let $\mathbf{X}_k(t) \in \mathbb{R}^3$ denote the position of the kth tracer atom at time t. These positions are influenced by blood flow, patient physiology, and unpredictable phenomena such as Brownian motion.

$$\mathbf{X}_1(t) = \big(x_1(t), y_1(t), z_1(t)\big), \;\ldots,\; \mathbf{X}_N(t) = \big(x_N(t), y_N(t), z_N(t)\big)$$

The ultimate imaging device would provide an exact list of the spatial locations $\mathbf{X}_1(t), \ldots, \mathbf{X}_N(t)$ of all tracer atoms for the entire scan.

Image reconstruction is to estimate the emission density $f(\mathbf{x})$.
Deterministic Approaches

$$\mathbf{g} = \mathbf{H}\mathbf{f} \quad\text{or}\quad g_i = \sum_{j=1}^{N} h_{ij} f_j, \quad i = 1, \ldots, M$$

$$\begin{aligned}
h_{11} f_1 + h_{12} f_2 + h_{13} f_3 + \cdots + h_{1N} f_N &= g_1 \\
h_{21} f_1 + h_{22} f_2 + h_{23} f_3 + \cdots + h_{2N} f_N &= g_2 \\
&\;\;\vdots \\
h_{M1} f_1 + h_{M2} f_2 + h_{M3} f_3 + \cdots + h_{MN} f_N &= g_M
\end{aligned}$$

Each equation represents a hyperplane in an N-dimensional space.

• The numbers M and N are prohibitively large.
• A unique solution will not exist if M < N (under-determined).
• No solution may exist when M > N.
• The measurements $g_i$ are corrupted by noise.
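The Kaczmarz method of solving algebraic equations

$$\begin{aligned}
h_{11} f_1 + h_{12} f_2 + h_{13} f_3 + \cdots + h_{1N} f_N &= g_1 \\
h_{21} f_1 + h_{22} f_2 + h_{23} f_3 + \cdots + h_{2N} f_N &= g_2 \\
&\;\;\vdots \\
h_{M1} f_1 + h_{M2} f_2 + h_{M3} f_3 + \cdots + h_{MN} f_N &= g_M
\end{aligned}$$

[Figure: two-unknown example in the $(f_1, f_2)$ plane showing the lines $h_{11} f_1 + h_{12} f_2 = g_1$ and $h_{21} f_1 + h_{22} f_2 = g_2$; starting from the initial guess $\mathbf{f}^{0}$, the estimate is projected onto each line in turn ($\mathbf{f}^{1}$, ...) and approaches the solution at their intersection]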
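The projection scheme in the figure can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a small dense and consistent system; the function name `kaczmarz`, the sweep count, and the 2×2 example are hypothetical choices, not taken from the slides.

```python
import numpy as np

def kaczmarz(H, g, n_sweeps=50, f0=None):
    """Cyclically project the estimate onto each hyperplane h_i . f = g_i."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    f = np.zeros(H.shape[1]) if f0 is None else np.array(f0, dtype=float)
    row_norms = np.einsum("ij,ij->i", H, H)   # squared norm of each row
    for _ in range(n_sweeps):
        for i in range(H.shape[0]):
            if row_norms[i] == 0.0:
                continue                      # an all-zero row defines no hyperplane
            residual = g[i] - H[i] @ f
            f += (residual / row_norms[i]) * H[i]   # orthogonal projection step
    return f

# Hypothetical two-unknown system whose lines intersect at f = (1, 3).
H = np.array([[2.0, 1.0], [1.0, 3.0]])
g = np.array([5.0, 10.0])
f_hat = kaczmarz(H, g)
```

With noisy or inconsistent data the iterates do not settle on a single point, which is one motivation for the statistical methods that follow.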
Problems with Deterministic Approaches
- Ignores statistical noise.
- Yields negative intensity values.
- The ramp filter in FBP accentuates high-frequency noise.
[Figure: projection data (sinogram), FBP reconstruction, and statistical reconstruction]
Statistical Reconstruction Methods
Why statistical methods?

• Object constraints (e.g. nonnegativity)
• Accurate physical models (e.g. nonuniform attenuation)
• Appropriate statistical models
• Side information (e.g. MRI or CT boundaries)
• Nonstandard geometries (“missing” data)

Disadvantages?

• Computation time
• Model complexity
• Software complexity
• Less predictable (due to nonlinearities)

Remark: FBP has its faults, but its properties (good and bad) are very well understood and hence predictable, due to its linearity.
Emission Reconstruction Problem

Estimate the emission density vector $\mathbf{f} = (f_1, f_2, \ldots, f_{n_p})$ using:

$$G_i \sim \text{Poisson}\Big\{ \sum_{j=1}^{n_p} h_{ij} f_j + r_i \Big\}, \quad i = 1, \ldots, n_d$$

Notations:

• $\mathbf{H} = \{h_{ij}\}$: system matrix (determined by system models)
• F: random field for the underlying image (lexicographically ordered elements: $F_j$)
• G: random field for the projection (lexicographically ordered elements: $G_i$)
• f: instantaneous value of F (lexicographically ordered elements: $f_j$)
• g: instantaneous value of G (lexicographically ordered elements: $g_i$)
• Pr(F = f): probability that the random field F takes the value f.
Maximum Likelihood (ML) Approaches

The likelihood for the projection formation process is expressed as a product of independent Poisson distributions:

$$\Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) = \prod_i \frac{\bar{g}_i^{\,g_i} e^{-\bar{g}_i}}{g_i!}, \quad\text{where}\quad \bar{g}_i = \sum_j H_{ij} f_j + r_i$$

The maximum likelihood (ML) estimate $\hat{\mathbf{f}}$ attempts to find the object that is most likely to have given rise to the collected data g:

$$\hat{\mathbf{f}} = \arg\max_{\mathbf{f}} \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) = \arg\max_{\mathbf{f}} \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) = \arg\max_{\mathbf{f}} \sum_i \big( g_i \log \bar{g}_i - \bar{g}_i \big)$$
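As a concrete reading of the objective, the Poisson log-likelihood (without the constant term $\log g_i!$) can be evaluated directly. The sketch below assumes a small dense system matrix; the function name, the optional background argument `r`, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def poisson_log_likelihood(H, f, g, r=None):
    """Return sum_i (g_i log gbar_i - gbar_i), dropping the constant -log(g_i!)."""
    H = np.asarray(H, dtype=float)
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    gbar = H @ f + (0.0 if r is None else np.asarray(r, dtype=float))
    return float(np.sum(g * np.log(gbar) - gbar))   # requires gbar > 0

# Hypothetical 3-bin, 2-pixel system with noise-free data g = H f_true.
H = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.7]])
f_true = np.array([2.0, 3.0])
g = H @ f_true
L_true = poisson_log_likelihood(H, f_true, g)
L_other = poisson_log_likelihood(H, np.array([1.0, 1.0]), g)
```

For noise-free data the true object attains the largest value, because each term $g_i \log \bar{g}_i - \bar{g}_i$ peaks at $\bar{g}_i = g_i$.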
The ML-EM Algorithm

Derivation of ML-EM

It may be a difficult problem to directly maximize the likelihood $\Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f})$.

The EM approach assumes that G is the observable but “incomplete” data set.

The EM algorithm postulates a “complete data” or “missing data” set such that G is a function of the complete data.

The relationship between the complete data and the incomplete data must be a many-to-one mapping:

$$G_t = \sum_{ij} C_{t,ij}$$

C may be identified with the unobservable complete data set, G with the observable incomplete data set, and f with the parameter to be estimated.

Instead of estimating C directly, maximize the conditional expectation:

$$E_{\mathbf{C}}\big[ \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) \,\big|\, \mathbf{G}=\mathbf{g}, \mathbf{f} \big] = \sum_{\mathbf{C}} \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) \, \Pr(\mathbf{C}=\mathbf{c} \mid \mathbf{G}=\mathbf{g}, \mathbf{f})$$

In fact, the conditional expectation is reduced to the log likelihood:

$$\sum_{\mathbf{C}} \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) \, \Pr(\mathbf{C}=\mathbf{c} \mid \mathbf{G}=\mathbf{g}, \mathbf{f}) = \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) \sum_{\mathbf{C}} \Pr(\mathbf{C}=\mathbf{c} \mid \mathbf{G}=\mathbf{g}, \mathbf{f}) = \log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f})$$
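$$\hat{f}_j^{\,k+1} = \frac{\hat{f}_j^{\,k}}{\sum_i H_{ij}} \sum_i \frac{H_{ij}\, g_i}{\sum_n H_{in} \hat{f}_n^{\,k}}$$

PROJECTION: the inner sum $\sum_n H_{in} \hat{f}_n^{\,k}$ over pixels.
BACKPROJECTION: the outer sums $\sum_i H_{ij}(\cdot)$ over detector bins.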
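The ML-EM update maps directly onto one projection and one backprojection per iteration. The following is a minimal sketch, assuming a small dense system matrix with nonnegative entries and zero background; the name `ml_em`, the floor `eps`, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def ml_em(H, g, n_iter=50, eps=1e-12):
    """ML-EM for g ~ Poisson(H f), with the background term taken as zero."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    sensitivity = H.sum(axis=0)               # sum_i H_ij, assumed > 0 for all j
    f = np.ones(H.shape[1])                   # strictly positive starting image
    for _ in range(n_iter):
        projection = H @ f                    # PROJECTION: sum_n H_in f_n
        ratio = g / np.maximum(projection, eps)
        f = f / sensitivity * (H.T @ ratio)   # BACKPROJECTION of the data ratio
    return f

# Hypothetical toy problem: 12 detector bins, 4 pixels.
rng = np.random.default_rng(0)
H = rng.uniform(0.1, 1.0, size=(12, 4))
f_true = np.array([1.0, 4.0, 2.0, 3.0])
g = rng.poisson(H @ f_true)
f_hat = ml_em(H, g)
```

Two properties follow from the multiplicative form: the estimate stays nonnegative, and after every update the total projected counts equal the total measured counts.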
Accelerated EM by Ordered Subsets (OSEM)
OSEM subdivides projection data into several subsets (or blocks) and progressively processes each subset of projections by calculating projection and backprojection in each iteration.

The OSEM algorithm accelerates convergence by a factor proportional to the number of subsets (order-of-magnitude acceleration).

[Figure: projection data divided into 1st, 2nd, 3rd, and 4th blocks; OSEM (1st iteration) compared with standard EM (1st iteration)]
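A sketch of the subset scheme is given below. It assumes interleaved subsets of detector bins and zero background; the subset ordering, the name `osem`, and the toy data are illustrative assumptions rather than choices stated on the slides.

```python
import numpy as np

def osem(H, g, n_subsets=4, n_iter=10, eps=1e-12):
    """Ordered-subsets EM: one EM-like update per subset of detector bins."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    n_bins = H.shape[0]
    # Interleaved subsets (an assumption; other orderings are possible).
    subsets = [np.arange(s, n_bins, n_subsets) for s in range(n_subsets)]
    f = np.ones(H.shape[1])
    for _ in range(n_iter):
        for idx in subsets:
            Hs = H[idx]
            ratio = g[idx] / np.maximum(Hs @ f, eps)   # projection on this subset only
            f = f / Hs.sum(axis=0) * (Hs.T @ ratio)    # backprojection on this subset only
    return f

rng = np.random.default_rng(0)
H = rng.uniform(0.1, 1.0, size=(12, 4))     # hypothetical toy system
g = rng.poisson(H @ np.array([1.0, 4.0, 2.0, 3.0]))
f_os = osem(H, g, n_subsets=4, n_iter=5)
```

With `n_subsets=1` the loop reduces to standard ML-EM, so one full pass with four subsets performs four image updates for roughly the cost of one EM iteration.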
Problems with OSEM
OSEM is fast, but it does not optimize a well-defined objective function and has no theoretical convergence proof.

With many subsets, OSEM tends to approach a suboptimal limit cycle.

[Figure: objective function (×10^6) vs. iteration (0 to 100) for EM, COSEM-64, and OSEM-64]
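Re-derivation of ML-EM

Incomplete-data negative log-likelihood in ML-EM (constant term dropped, $r_i = 0$):

$$E_{inc}(\mathbf{f}) = -\log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{f}) = \sum_i \big( \bar{g}_i - g_i \log \bar{g}_i \big) = \sum_i \sum_j H_{ij} f_j - \sum_i g_i \log \sum_j H_{ij} f_j$$

The ML-EM estimation can be re-derived as an alternating minimization on the following objective function:

$$E_{cmp}(\mathbf{C}, \mathbf{f}, \boldsymbol{\eta}) = -\sum_{ij} C_{ij} \log H_{ij} f_j + \sum_{ij} H_{ij} f_j + \sum_{ij} C_{ij} \log C_{ij} + \sum_i \eta_i \Big( \sum_j C_{ij} - g_i \Big)$$

The Lagrange parameters $\eta_i;\ i = 1, \ldots, M$ express the complete-incomplete data constraint.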
Complete-Data OSEM (COSEM)
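The alternating minimization on C and f is performed by first minimizing $E_{cmp}(\mathbf{C}, \mathbf{f})$ with respect to C while keeping f fixed, and this leads to

$$\frac{\partial E_{cmp}(\mathbf{C}, \mathbf{f})}{\partial C_{ij}} = 0 \;\Rightarrow\; C_{ij} = g_i \frac{H_{ij} f_j}{\sum_n H_{in} f_n}$$

which is exactly the same as the E-step of the standard ML-EM algorithm.

Next, optimize $E_{cmp}(\mathbf{C}, \mathbf{f})$ w.r.t. f with fixed C to get

$$\frac{\partial E_{cmp}(\mathbf{C}, \mathbf{f})}{\partial f_j} = 0 \;\Rightarrow\; f_j = \frac{\sum_i C_{ij}}{\sum_i H_{ij}}$$

Therefore, at iteration k+1, the alternation becomes

Estimate C: $$\hat{C}_{ij}^{\,k+1} = g_i \frac{H_{ij} \hat{f}_j^{\,k}}{\sum_n H_{in} \hat{f}_n^{\,k}}$$

Estimate f: $$\hat{f}_j^{\,k+1} = \frac{\sum_i \hat{C}_{ij}^{\,k+1}}{\sum_i H_{ij}}$$

Identical to ML-EM:

$$\hat{f}_j^{\,k+1} = \frac{\hat{f}_j^{\,k}}{\sum_i H_{ij}} \sum_i \frac{H_{ij}\, g_i}{\sum_n H_{in} \hat{f}_n^{\,k}}$$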
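The alternation extends to ordered subsets by refreshing the complete data C for one subset at a time while the image update still sums C over all detector bins. The sketch below follows that reading of COSEM. The subset scheme is not spelled out on these slides, so the interleaved subsets, the storage of per-subset sums in `B`, and the function name are assumptions.

```python
import numpy as np

def cosem(H, g, n_subsets=4, n_iter=10, eps=1e-12):
    """Complete-data OSEM sketch: subset-wise C update, full-data f update."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    n_bins = H.shape[0]
    subsets = [np.arange(s, n_bins, n_subsets) for s in range(n_subsets)]
    sensitivity = H.sum(axis=0)                    # sum_i H_ij over ALL bins

    def summed_complete_data(idx, f):
        # sum over i in the subset of C_ij = g_i H_ij f_j / sum_n H_in f_n
        Hs = H[idx]
        return f * (Hs.T @ (g[idx] / np.maximum(Hs @ f, eps)))

    f = np.ones(H.shape[1])
    # B[l, j] holds sum_{i in subset l} C_ij; initialized from the starting image.
    B = np.array([summed_complete_data(idx, f) for idx in subsets])
    for _ in range(n_iter):
        for l, idx in enumerate(subsets):
            B[l] = summed_complete_data(idx, f)    # estimate C on subset l only
            f = B.sum(axis=0) / sensitivity        # estimate f from all of C
    return f

rng = np.random.default_rng(0)
H = rng.uniform(0.1, 1.0, size=(12, 4))            # hypothetical toy system
g = rng.poisson(H @ np.array([1.0, 4.0, 2.0, 3.0]))
f_cosem = cosem(H, g)
```

Unlike OSEM, every image update here divides by the full sensitivity $\sum_i H_{ij}$, which is what ties the iteration to the complete-data objective; with a single subset it reduces to ML-EM.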
Extension of EM to Regularized EM (MAP-EM)

Despite the order-of-magnitude acceleration of OS-EM, due to the inherent instability problem of ML-EM, the noise artifact of ML solutions is magnified beyond a critical number of iterations.

[Figure: RMSE vs. iteration]

The instability problem can be alleviated by extending EM to regularized EM in the context of a Bayesian maximum a posteriori (MAP) framework.

Bayesian MAP (a.k.a. penalized-likelihood (PL)) approaches allow the incorporation of suitable prior models to regularize the ill-posed nature of the tomographic inversion problem.
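Maximum A Posteriori (MAP) Approach
(Bayesian or regularized-EM approach)

Bayes’ Theorem:

$$\underbrace{\Pr(\mathbf{F}=\mathbf{f} \mid \mathbf{G}=\mathbf{g})}_{\text{posterior}} = \frac{\overbrace{\Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{F}=\mathbf{f})}^{\text{likelihood}} \; \overbrace{\Pr(\mathbf{F}=\mathbf{f})}^{\text{prior}}}{\underbrace{\Pr(\mathbf{G}=\mathbf{g})}_{\text{constant}}}$$

MAP Estimation:

$$\hat{\mathbf{f}} = \arg\max_{\mathbf{f}} \Pr(\mathbf{F}=\mathbf{f} \mid \mathbf{G}=\mathbf{g}) = \arg\max_{\mathbf{f}} \big[ \underbrace{\log \Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{F}=\mathbf{f})}_{\text{likelihood}} + \underbrace{\log \Pr(\mathbf{F}=\mathbf{f})}_{\text{prior}} \big]$$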
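For the likelihood, Poisson statistics are applied:

$$\Pr(\mathbf{G}=\mathbf{g} \mid \mathbf{F}=\mathbf{f}) = \prod_t \frac{\bar{g}_t^{\,g_t} e^{-\bar{g}_t}}{g_t!}, \quad\text{where}\quad \bar{g}_t = \sum_{ij} H_{t,ij} f_{ij} + r_t$$

The prior probability is modeled as a Gibbs distribution.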
Gibbs Prior Distributions (Local Regularization)

Gibbs distributions provide mathematically powerful machinery to model a class of priors that specify local spatial correlations.

$$\Pr(\mathbf{F}=\mathbf{f}) = \frac{1}{Z} \exp\big( -\lambda E_P(\mathbf{f}) \big)$$

• Z: normalizing function (also known as “partition function”)
• $\lambda$: positive constant
• $E_P$: prior energy function

[Figure: pixel $f_j$ and a neighboring pixel $f_k$]
Penalty Function

$$E_P(\mathbf{f}) = \sum_j \sum_{k \in N_j} \phi(f_j - f_k)$$

[Figure: 4-nearest neighbors; surface plot of a primate autoradiograph obtained with the benzodiazepine neuroreceptor agent Iomazenil (123I)]

• The potential function $\phi$ is usually defined so that its value is reduced as the difference between the values of two pixels in the neighborhood $N_j$ is reduced.
• Neighboring pixels in the underlying source are assumed, with few exceptions, to have similar intensities.
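For a 2-D image with the 4-nearest-neighbor system, the prior energy $E_P(\mathbf{f}) = \sum_j \sum_{k \in N_j} \phi(f_j - f_k)$ can be computed from horizontal and vertical differences. The sketch below assumes a symmetric potential and no wrap-around at the image border; the function name and the example images are illustrative.

```python
import numpy as np

def prior_energy(f, phi):
    """E_P(f) = sum_j sum_{k in N_j} phi(f_j - f_k), 4-nearest neighbors."""
    f = np.asarray(f, dtype=float)
    dv = f[1:, :] - f[:-1, :]      # vertical neighbor differences
    dh = f[:, 1:] - f[:, :-1]      # horizontal neighbor differences
    # The double sum visits each neighbor pair twice (j->k and k->j);
    # this shortcut assumes phi(-x) == phi(x).
    return 2.0 * float(phi(dv).sum() + phi(dh).sum())

def phi_quadratic(x):
    return x ** 2

flat = np.ones((3, 3))
edge = np.array([[0.0, 0.0], [1.0, 1.0]])
E_flat = prior_energy(flat, phi_quadratic)
E_edge = prior_energy(edge, phi_quadratic)
```

A flat image has zero energy and therefore the highest prior probability under $\exp(-\lambda E_P)$, while every intensity step adds to the energy.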
Penalty Function Considerations

• Computation
• Algorithm complexity
• Uniqueness of the maximum of $\Phi(\cdot)$
• Resolution properties (edge preserving?)
• # of adjustable (free) parameters
• Predictability of properties (resolution and noise)

Choices:

• Quadratic vs. nonquadratic
• Convex vs. nonconvex
Penalty Functions: Quadratic vs. Nonquadratic

Quadratic: $\phi_{QD}(\xi) = \xi^2$
• Simpler optimization
• Global smoothing

Nonquadratic:
• Edge preserving
• More complicated optimization (essentially solved in convex case)
• Unusual noise properties (and harder to predict moments)
• More adjustable (free) parameters

Example: Huber function

$$\phi_{HB}(\xi) = \begin{cases} \xi^2, & |\xi| \le \delta \\ 2\delta|\xi| - \delta^2, & |\xi| > \delta \end{cases}$$
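Representative Convex Non-Quadratic (CNQ) Penalty Functions

[Figure: qualitative shapes of $\phi(\xi)$ and of $d\phi/d\xi$]

$$\phi_{QD}(\xi) = \xi^2$$

$$\phi_{HB}(\xi) = \begin{cases} \xi^2, & |\xi| \le \delta \\ 2\delta|\xi| - \delta^2, & |\xi| > \delta \end{cases}$$

$$\phi_{GR}(\xi) = \log\cosh \xi$$

$$\phi_{BS}(\xi) = |\xi|^p, \quad 1 < p < 2$$

$$\phi_{LN}(\xi) = \delta^2 \big[ |\xi|/\delta - \log(1 + |\xi|/\delta) \big]$$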
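The penalty functions listed above are easy to compare numerically. The sketch below writes them in NumPy using the forms as reconstructed here: quadratic, Huber, log-cosh, power with exponent between 1 and 2, and the Lange-type logarithmic penalty. Scaling conventions differ between papers, so the exact constants should be treated as assumptions.

```python
import numpy as np

def phi_qd(x):
    return x ** 2

def phi_hb(x, delta=1.0):
    a = np.abs(x)
    return np.where(a <= delta, x ** 2, 2.0 * delta * a - delta ** 2)

def phi_gr(x):
    return np.log(np.cosh(x))          # fine for moderate |x|; cosh overflows for huge |x|

def phi_bs(x, p=1.5):
    return np.abs(x) ** p              # convex for 1 < p < 2

def phi_ln(x, delta=1.0):
    a = np.abs(x) / delta
    return delta ** 2 * (a - np.log1p(a))

xi = np.linspace(-5.0, 5.0, 201)
penalties = {
    "QD": phi_qd(xi), "HB": phi_hb(xi), "GR": phi_gr(xi),
    "BS": phi_bs(xi), "LN": phi_ln(xi),
}

def is_convex_on_grid(values, tol=1e-9):
    # Discrete second differences of a convex function are nonnegative.
    second = values[:-2] - 2.0 * values[1:-1] + values[2:]
    return bool(np.all(second >= -tol))

convexity = {name: is_convex_on_grid(v) for name, v in penalties.items()}
```

The nonquadratic penalties grow more slowly than $\xi^2$ for large differences, which is what lets them penalize edges less severely than the quadratic penalty does.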
[Figure: PET with detector gaps: sinogram and reconstructions by ML, MAP-QD, and MAP-CNQ]
Penalty Functions: Convex vs. Nonconvex
Convex:
• Easier to optimize
• Guaranteed unique extremum of $\Phi(\mathbf{f})$

Nonconvex:
• Greater degree of edge preservation
• Nice images for piecewise-constant objects
• Even more unusual noise properties
• Multiple extrema
• More complicated optimization (simulated annealing, deterministic annealing)
• Estimator $\hat{\mathbf{f}}$ becomes a discontinuous function of G.

Example: The “broken parabola” penalty function
Representative Convex Optimization Algorithms

Algorithm                 Convergence rate   Global convergence   Relaxation
ML-EM [1]                 Slow               Yes                  No
OSEM [2]                  Fast               No                   No
RAMLA/BSREM [3,4]         Fast               Yes/No               Yes
OS-SPS [5,6] (ECT/TCT)    Fast               Yes/No               Yes
TRIOT [7] (TCT)           Fast               Yes                  No
COSEM/MAP [8]             Moderate           Yes                  No
ACOSEM [9]                Fast               Yes                  No
[1] Shepp and Vardi, IEEE-TMI, Oct. 1982.
[2] Hudson and Larkin, IEEE-TMI, Dec. 1994.
[3] Browne and De Pierro, IEEE-TMI, Oct. 1996.
[4] De Pierro and Yamagishi, IEEE-TMI, Apr. 2001.
[5] De Pierro, IEEE-TMI, 1995.
[6] Erdogan and Fessler, PMB, Nov. 1999.
[7] Ahn, Fessler, Blatt and Hero, IEEE-TMI, Mar. 2006.
[8] Hsiao, Rangarajan, Khurd and Gindi, PMB, May 2004.
[9] Hsiao and Huang, PMB, Jan. 2010.
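Convex Optimization: Optimization Transfer Method
(Functional Substitution Method)

When faced with a convex objective function $\Phi(\mathbf{f})$ that is difficult to minimize (or maximize), at the k-th iteration, $\Phi(\mathbf{f})$ is replaced with a surrogate function $\phi(\mathbf{f}; \mathbf{f}^{(k)})$ that is easier to minimize (or maximize).

Two conditions for $\phi(\mathbf{f}; \mathbf{f}^{(k)})$ (written for minimization; the inequality is reversed for maximization):

i) $\Phi(\mathbf{f}) - \Phi(\mathbf{f}^{(k)}) \le \phi(\mathbf{f}; \mathbf{f}^{(k)}) - \phi(\mathbf{f}^{(k)}; \mathbf{f}^{(k)})$

ii) $\nabla \Phi(\mathbf{f}) \big|_{\mathbf{f}=\mathbf{f}^{(k)}} = \nabla \phi(\mathbf{f}; \mathbf{f}^{(k)}) \big|_{\mathbf{f}=\mathbf{f}^{(k)}}$

[Figure: objective $\Phi(\mathbf{f})$ with successive surrogates $\phi(\mathbf{f}; \mathbf{f}^{(k)})$; iterates $\mathbf{f}^{(k)}$, $\mathbf{f}^{(k+1)}$, $\mathbf{f}^{(k+2)}$ marked on the f axis]

Examples: OS-SPS (Separable Paraboloidal Surrogates), COSEM-MAP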
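A one-dimensional toy problem shows the surrogate idea in action. The sketch below minimizes a sum of Huber penalties $\sum_i \phi_{HB}(x - y_i)$ by repeatedly minimizing a quadratic surrogate whose curvature is $\phi'(t)/t$. This is a generic optimization-transfer illustration, not the OS-SPS or COSEM-MAP algorithms themselves; the data `y` and all names are hypothetical.

```python
import numpy as np

def huber(t, delta=1.0):
    a = np.abs(t)
    return np.where(a <= delta, t ** 2, 2.0 * delta * a - delta ** 2)

def huber_curvature(t, delta=1.0):
    # omega(t) = phi'(t) / t: equals 2 inside [-delta, delta], 2*delta/|t| outside.
    return 2.0 * delta / np.maximum(np.abs(t), delta)

def mm_minimize(y, delta=1.0, n_iter=30):
    """Minimize Phi(x) = sum_i huber(x - y_i) with quadratic surrogates."""
    y = np.asarray(y, dtype=float)
    x = float(y.mean())                    # starting point
    history = [float(huber(x - y, delta).sum())]
    for _ in range(n_iter):
        w = huber_curvature(x - y, delta)  # surrogate curvatures at the current x
        x = float(np.sum(w * y) / np.sum(w))   # closed-form surrogate minimizer
        history.append(float(huber(x - y, delta).sum()))
    return x, history

y = np.array([0.0, 0.2, -0.1, 0.1, 10.0])  # four inliers and one outlier
x_hat, history = mm_minimize(y)
```

Because each surrogate lies above the objective and touches it at the current iterate, the recorded objective values never increase from one iteration to the next.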
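NonConvex Penalty

[Equation: broken parabola penalty function]

Since the broken parabola function is nonconvex, the overall energy function $\Phi(\mathbf{f})$ that includes such a function may have numerous stable states.

[Figure: nonconvex energy $\Phi(\mathbf{f})$ plotted against f, showing multiple local minima]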
NonConvex Optimization: Deterministic Annealing Method

A sequence of energy functions is constructed by transforming the probability distributions to

$$\Pr(\mathbf{F}=\mathbf{f}) = \frac{1}{Z} \exp\big( -\beta E_P(\mathbf{f}) \big)$$

At small $\beta$, the new energy becomes a smooth version of the original energy function.

As $\beta \to \infty$, the new energy approaches the original energy function.

[Figure: energy functions for $\beta$ = 1, 10, 100, and 1000]

(The parameter $\beta$ may be identified as the inverse of a computational temperature used in conventional simulated annealing.)
Non-Local Regularization (NLR) Method
Self-Similarity in Medical Images
“Every small patch in an image has many similar patches in the same image”
NonLocal Means (NLM) Algorithm for Image Denoising

[Figure: noisy image; denoised image (using local method); denoised image (using NLM algorithm)]
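Principle of NonLocal Means Algorithm

$$E_P(\mathbf{f}) = \sum_j \sum_{k \in \Omega_j} \frac{w_{jk}}{W_j} \, \big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2, \qquad W_j = \sum_{k \in \Omega_j} w_{jk}$$

$\Omega_j$: nonlocal search window for pixel j

$$N_j(\mathbf{f}) \overset{\text{def}}{=} \{ f_{j'} \;;\; j' \in \text{patch } P_j \}$$

$$w_{jk} = \exp\left( -\frac{\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \|^2}{h^2} \right)$$

$$\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \|^2 = \sum_{p=1}^{N_p} \big( f_{j(p)} - f_{k(p)} \big)^2$$

$j(p)$: p-th pixel in patch $P_j$
$k(p)$: p-th pixel in patch $P_k$

[Figure: pixel j with candidate pixels $k_1$, $k_2$, $k_3$ in its search window and the corresponding weights $w_{jk_1}$, $w_{jk_2}$, $w_{jk_3}$]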
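The patch-based weights can be sketched for a single pixel as follows. The code assumes a 2-D image, square patches and search windows, reflective padding at the border, and exclusion of the pixel itself from its own search window; these choices, along with the function name and the test image, are assumptions for illustration.

```python
import numpy as np

def nlm_weights(f, j, search_radius=3, patch_radius=1, h=1.0):
    """Normalized weights w_jk / W_j for pixel j = (row, col)."""
    f = np.asarray(f, dtype=float)
    pr = patch_radius
    fp = np.pad(f, pr, mode="reflect")     # so that border pixels have full patches

    def patch(r, c):
        # Patch centered at (r, c) of the original image, flattened to a vector.
        return fp[r:r + 2 * pr + 1, c:c + 2 * pr + 1].ravel()

    rows, cols = f.shape
    r0, c0 = j
    Nj = patch(r0, c0)
    weights = {}
    for r in range(max(0, r0 - search_radius), min(rows, r0 + search_radius + 1)):
        for c in range(max(0, c0 - search_radius), min(cols, c0 + search_radius + 1)):
            if (r, c) == (r0, c0):
                continue                   # self excluded (assumption)
            d2 = np.sum((Nj - patch(r, c)) ** 2)   # squared patch distance
            weights[(r, c)] = np.exp(-d2 / h ** 2)
    W = sum(weights.values())              # normalizer W_j
    return {k: w / W for k, w in weights.items()}

# Hypothetical image: left half 0, right half 1.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
w = nlm_weights(image, (4, 1), search_radius=5)
```

Pixels whose surrounding patches look like the patch around j receive large weights even if they are far away, while nearby pixels across the edge receive small ones.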
Anatomical Priors for PET Reconstruction
[Figure: (a) MR image; (b) PET image; (c) PET-MRI co-registered image]
Incorporating Anatomical Side Information
$$E_P(\mathbf{f}) = \sum_j \sum_{k \in \Omega_j} \frac{w_{jk}}{W_j} \, \big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2, \qquad W_j = \sum_{k \in \Omega_j} w_{jk}$$

$\Omega_j$: nonlocal search window for pixel j

$$w_{jk}(\mathbf{f}, \mathbf{a}) = w_{jk}^{F}(\mathbf{f}) \, w_{jk}^{A}(\mathbf{f}, \mathbf{a})$$

The weight $w_{jk}(\mathbf{f}, \mathbf{a})$ reflects the similarity between the local neighborhoods $N_j$ and $N_k$ in the functional image and also reflects the similarity between the corresponding local neighborhoods $N_j$ and $N_k$ in the anatomical image.

$$w_{jk}^{F}(\mathbf{f}) = \exp\left( -\frac{\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \|^2}{h_F^2} \right), \qquad w_{jk}^{A}(\mathbf{f}, \mathbf{a}) = \exp\left( -\frac{\| N_j(\mathbf{a}) - N_k(\mathbf{a}) \|^2}{h_A^2} \right)$$

[Figure: functional image (f) and anatomical image (a) with pixels j and k marked; flowchart that selects the weight by testing $\| N_j(\mathbf{a}) - N_k(\mathbf{a}) \|$ against 0 and against $h_A$, with $w_{jk} = w_{jk}^{F}(\mathbf{f})$ in the branch where $w_{jk}^{A}(\mathbf{f}, \mathbf{a}) = 1$]
Nguyen and Lee, IEEE Trans. Image Proc., 22(10), 2013.
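The idea of letting both images shape the weight can be sketched as follows. The code assumes that the functional and anatomical similarity terms are Gaussian in the patch distance and are combined by multiplication; the branching logic of the flowchart on the slide is not reproduced, and the function name, bandwidths, and patch values are hypothetical.

```python
import numpy as np

def combined_weight(Nj_f, Nk_f, Nj_a, Nk_a, h_f=1.0, h_a=1.0):
    """Weight from functional patches (f) and anatomical patches (a).

    Assumption: the two similarity terms are multiplied.
    """
    Nj_f, Nk_f = np.asarray(Nj_f, float), np.asarray(Nk_f, float)
    Nj_a, Nk_a = np.asarray(Nj_a, float), np.asarray(Nk_a, float)
    w_f = np.exp(-np.sum((Nj_f - Nk_f) ** 2) / h_f ** 2)   # functional similarity
    w_a = np.exp(-np.sum((Nj_a - Nk_a) ** 2) / h_a ** 2)   # anatomical similarity
    return float(w_f * w_a)

# Hypothetical patches: similar function, same vs. different anatomy.
Nj_f, Nk_f = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
same_anatomy = combined_weight(Nj_f, Nk_f, [5.0, 5.0, 5.0], [5.0, 5.0, 5.0])
diff_anatomy = combined_weight(Nj_f, Nk_f, [5.0, 5.0, 5.0], [5.0, 5.0, 9.0])
```

When the anatomical patches agree, the anatomical factor equals 1 and the weight falls back to the purely functional one; an anatomical boundary between j and k shrinks the weight and so discourages smoothing across it.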
[Figure: functional image and anatomical image, with reconstructions by MAP-QD without side information, MAP-QD with anatomical boundary information, MAP-NLR without side information, and MAP-NLR with anatomical information]
Super-Resolution Reconstruction Using NonLocal & Local Regularizers

[Figure panels: (a) LR phantom, (b) LNQ, (c) LNQ+NLQ, (d) NLQ, (e) HR phantom, (f) LNQ, (g) LNQ+NLQ, (h) NLQ]

Low-resolution (LR) and high-resolution (HR) software phantoms and anecdotal reconstructions ((a)-(d) LR images, (e)-(h) HR images): (a) LR phantom; (b) LR image reconstructed by PL-LNQ (PE=28.42%); (c) LR image reconstructed by PL-NLQ+LNQ with τ=0.6 (PE=27.91%); (d) LR image reconstructed by PL-NLQ (PE=27.98%); (e) HR phantom; (f) HR image reconstructed by PL-LNQ (PE=25.89%); (g) HR image reconstructed by PL-NLQ+LNQ with τ=0.6 (PE=25.27%); (h) HR image reconstructed by PL-NLQ (PE=25.95%).