Recent Developments in Statistical Reconstruction for...

Recent Developments in Statistical Reconstruction for Emission Tomography

Department of Electronic Engineering, Pai Chai University

Soo-Jin Lee

Email: [email protected]
Web: presto.pcu.ac.kr

Outline

• Background for Tomographic Reconstruction
• Deterministic vs. Statistical Approaches
• Maximum-Likelihood (ML) Approaches
• Expectation Maximization (EM) Algorithm
• Accelerated EM by Ordered Subsets (OSEM)
• Complete-Data OSEM (COSEM)
• Penalized-Likelihood (PL) Approaches with Local Regularizers
• Penalty Functions
• Convex Optimization
• Non-Convex Optimization
• PL Approaches with Non-Local Regularizers
• Application Examples (Use of Anatomical Side Information, Super Resolution)

What Object is Reconstructed?

In emission imaging, our aim is to image the radiotracer distribution.

At time $t = 0$, we inject the patient with some radiotracer containing a "large" number $N$ of metastable atoms of some radionuclide.

Let $\mathbf{X}_k(t) \in \mathbb{R}^3$ denote the position of the $k$th tracer atom at time $t$. These positions are influenced by blood flow, patient physiology, and unpredictable phenomena such as Brownian motion.

$$\mathbf{X}_1(t) = \big(x_1(t), y_1(t), z_1(t)\big),\ \dots,\ \mathbf{X}_N(t) = \big(x_N(t), y_N(t), z_N(t)\big)$$

The ultimate imaging device would provide an exact list of the spatial locations $\mathbf{X}_1(t), \dots, \mathbf{X}_N(t)$ of all tracer atoms for the entire scan.

Image reconstruction is to estimate the emission density $f(\mathbf{x})$.

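Deterministic Approaches

$$\mathbf{g} = \mathbf{H}\mathbf{f}, \qquad g_i = \sum_{j=1}^{N} h_{ij} f_j, \quad i = 1, \dots, M$$

or

$$\begin{aligned}
h_{11} f_1 + h_{12} f_2 + h_{13} f_3 + \cdots + h_{1N} f_N &= g_1 \\
h_{21} f_1 + h_{22} f_2 + h_{23} f_3 + \cdots + h_{2N} f_N &= g_2 \\
&\ \ \vdots \\
h_{M1} f_1 + h_{M2} f_2 + h_{M3} f_3 + \cdots + h_{MN} f_N &= g_M
\end{aligned}$$

[Figure: projection measurement $g_i$, pixel value $f_j$, and the weight $h_{ij}$ linking pixel $j$ to measurement $i$.]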

Each equation represents a hyperplane in an N-dimensional space.

• The numbers M and N are prohibitively large.
• A unique solution will not exist if M < N (under-determined).
• No solution may exist when M > N.
• The measurements $g_i$ are corrupted by noise.

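The Kaczmarz method of solving algebraic equations

$$\begin{aligned}
h_{11} f_1 + h_{12} f_2 + h_{13} f_3 + \cdots + h_{1N} f_N &= g_1 \\
h_{21} f_1 + h_{22} f_2 + h_{23} f_3 + \cdots + h_{2N} f_N &= g_2 \\
&\ \ \vdots \\
h_{M1} f_1 + h_{M2} f_2 + h_{M3} f_3 + \cdots + h_{MN} f_N &= g_M
\end{aligned}$$

[Figure: for two unknowns, the lines $h_{11} f_1 + h_{12} f_2 = g_1$ and $h_{21} f_1 + h_{22} f_2 = g_2$ in the $(f_1, f_2)$ plane. Starting from an initial guess $\mathbf{f}^{(0)}$, successive projections onto the two lines ($\mathbf{f}^{(1)}$, ...) approach the solution at their intersection.]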
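The projection scheme described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the slides' own implementation: the function name `kaczmarz`, the number of sweeps, and the 2×2 example system are hypothetical. Each inner step projects the current estimate onto the hyperplane $\sum_j h_{ij} f_j = g_i$ of one equation.

```python
import numpy as np

def kaczmarz(H, g, n_sweeps=50, f0=None):
    """Cyclically project the estimate onto each hyperplane h_i . f = g_i."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    f = np.zeros(H.shape[1]) if f0 is None else np.array(f0, dtype=float)
    row_norms = np.sum(H * H, axis=1)        # squared norm of each row h_i
    for _ in range(n_sweeps):
        for i in range(H.shape[0]):
            if row_norms[i] == 0.0:
                continue                     # skip empty equations
            residual = g[i] - H[i] @ f       # how far f is from hyperplane i
            f += (residual / row_norms[i]) * H[i]
    return f

# Hypothetical two-equation example (two lines in the (f1, f2) plane).
H = np.array([[1.0, 2.0],
              [3.0, 1.0]])
g = np.array([5.0, 5.0])
f_hat = kaczmarz(H, g)
```

For a consistent system the iterates approach the intersection of the hyperplanes; with noisy or inconsistent data the iteration has no single point to converge to, which is one of the motivations for the statistical methods that follow.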

Problems with Deterministic Approaches
- Ignores statistical noise.
- Yields negative intensity values.
- The ramp filter in FBP accentuates high-frequency noise.

[Figure: projection data (sinogram), FBP reconstruction, and statistical reconstruction.]

Statistical Reconstruction Methods

Why statistical methods?

• Object constraints (e.g. nonnegativity)
• Accurate physical models (e.g. nonuniform attenuation)
• Appropriate statistical models
• Side information (e.g. MRI or CT boundaries)
• Nonstandard geometries ("missing" data)

Disadvantages?

• Computation time
• Model complexity
• Software complexity
• Less predictable (due to nonlinearities)

Remark: FBP has its faults, but its properties (good and bad) are very well understood and hence predictable, due to its linearity.

Emission Reconstruction Problem

Estimate the emission density vector $\mathbf{f} = (f_1, f_2, \dots, f_{n_p})$ using:

$$G_i \sim \text{Poisson}\Big\{ \sum_{j=1}^{n_p} h_{ij} f_j + r_i \Big\}, \quad i = 1, \dots, n_d$$

$\mathbf{H} = \{h_{ij}\}$: system matrix (determined by system models)

Notations:

• F: random field for the underlying image (lexicographically ordered elements: $F_j$)
• G: random field for the projection (lexicographically ordered elements: $G_i$)
• f: instantaneous value of F (lexicographically ordered elements: $f_j$)
• g: instantaneous value of G (lexicographically ordered elements: $g_i$)
• Pr(F = f): probability that the random field F takes the value f.

Maximum Likelihood (ML) Approaches

The likelihood for the projection formation process is expressed as a product of independent Poisson distributions:

$$\Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{f}) = \prod_i \frac{\bar{g}_i^{\,g_i} e^{-\bar{g}_i}}{g_i!}, \quad \text{where} \quad \bar{g}_i = \sum_j H_{ij} f_j + r_i$$

The maximum likelihood (ML) estimate $\hat{\mathbf{f}}$ attempts to find the object that is most likely to have given rise to the collected data g:

$$\hat{\mathbf{f}} = \arg\max_{\mathbf{f}} \Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{f}) = \arg\max_{\mathbf{f}} \log \Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{f}) = \arg\max_{\mathbf{f}} \sum_i \big\{ g_i \log \bar{g}_i(\mathbf{f}) - \bar{g}_i(\mathbf{f}) \big\}$$
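As a small worked illustration of the objective being maximized, the following Python sketch evaluates the Poisson log-likelihood with the constant term $-\sum_i \log g_i!$ dropped. The function name `poisson_log_likelihood` and the toy identity system are assumptions made for the example; they do not come from the slides.

```python
import numpy as np

def poisson_log_likelihood(f, H, g, r=None):
    """Return sum_i [g_i log(gbar_i) - gbar_i], omitting the constant -log(g_i!)."""
    f = np.asarray(f, dtype=float)
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    r = np.zeros_like(g) if r is None else np.asarray(r, dtype=float)
    g_bar = H @ f + r                        # expected counts per detector bin
    if np.any(g_bar <= 0):
        raise ValueError("expected counts must be positive")
    return float(np.sum(g * np.log(g_bar) - g_bar))

# Toy example: with H = I and r = 0, the likelihood is maximized at f = g.
H = np.eye(2)
g = np.array([3.0, 5.0])
ll_at_data = poisson_log_likelihood(g, H, g)
ll_elsewhere = poisson_log_likelihood(np.array([4.0, 4.0]), H, g)
```

In this toy case `ll_at_data` exceeds `ll_elsewhere`, matching the intuition that the ML estimate reproduces the measured counts when the system matrix is the identity.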

The ML-EM Algorithm

Derivation of ML-EM

• It may be a difficult problem to directly maximize the likelihood $\Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{f})$.
• The EM approach assumes that G is the observable but "incomplete" data set.
• The EM algorithm postulates a "complete data" or "missing data" set such that G is a function of the complete data.
• The relationship between the complete data and the incomplete data must be a many-to-one mapping.
• C may be identified with the unobservable complete data set, G with the observable incomplete data set, and f with the parameter to be estimated:

$$G_t = \sum_{ij} C_{t,ij}$$

[Figure: complete data $c_{t,ij}$, the contribution of pixel $ij$ to the detector measurement $g_t$.]

Accelerated EM by Ordered Subsets (OSEM)

OSEM subdivides projection data into several subsets (or blocks) and progressively processes each subset of projections by calculating projection and backprojection in each iteration.

The OSEM algorithm accelerates convergence by a factor proportional to the number of subsets (order-of-magnitude acceleration).
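The subset processing described above can be sketched as follows. This is a minimal illustration, assuming a small dense system matrix with strictly positive entries, no background term, and interleaved subsets; the function name `osem`, its parameters, and the 4×3 example are hypothetical. With `n_subsets=1` the update reduces to the standard ML-EM update.

```python
import numpy as np

def osem(H, g, n_subsets=2, n_iter=10, f0=None):
    """Ordered-subsets EM: apply the EM update using one block of projections at a time."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    M, N = H.shape
    # Interleaved subsets of projection indices (an assumed, common choice).
    subsets = [np.arange(s, M, n_subsets) for s in range(n_subsets)]
    f = np.ones(N) if f0 is None else np.array(f0, dtype=float)
    for _ in range(n_iter):
        for idx in subsets:
            Hs, gs = H[idx], g[idx]
            sens = Hs.sum(axis=0)            # subset sensitivity: sum_i H_ij
            ratio = gs / (Hs @ f)            # measured / projected counts
            f = f / sens * (Hs.T @ ratio)    # backproject the ratio and update
    return f

# Hypothetical noiseless example.
H = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.3, 1.0],
              [0.5, 0.5, 0.5]])
f_true = np.array([2.0, 5.0, 3.0])
g = H @ f_true
f_os = osem(H, g, n_subsets=2, n_iter=10)
```

Each sub-iteration touches only one block of projections, so one pass over the data performs `n_subsets` image updates instead of one, which is where the acceleration comes from.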

[Figure: projection data divided into four blocks (1st to 4th); OSEM after the 1st iteration compared with standard EM after the 1st iteration.]

Problems with OSEM

OSEM is fast, but it does not optimize a well-defined objective function and has no theoretical convergence proof.

With many subsets, OSEM tends to approach a suboptimal limit cycle.

[Figure: objective function (×10^6) vs. iteration (0 to 100) for EM, COSEM-64, and OSEM-64.]

Re-derivation of ML-EM

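Incomplete-Data Negative Log-Likelihood in MLEM:

$$E_{inc}(\mathbf{f}) = -\log \Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{f}) = \sum_i \bar{g}_i - \sum_i g_i \log \bar{g}_i = \sum_i \sum_j H_{ij} f_j - \sum_i g_i \log \sum_j H_{ij} f_j$$

where $\bar{g}_i = \sum_j H_{ij} f_j$ and the constant term $\sum_i \log g_i!$ is dropped.

The MLEM estimation can be re-derived as an alternating minimization on the following objective function:

$$E_{cmp}(\mathbf{C}, \mathbf{f}) = -\sum_i \sum_j C_{ij} \log (H_{ij} f_j) + \sum_i \sum_j H_{ij} f_j + \sum_i \sum_j C_{ij} \log C_{ij} + \sum_i \gamma_i \Big( \sum_j C_{ij} - g_i \Big)$$

The Lagrange parameters $\gamma_i;\ i = 1, \dots, M$ express the complete-incomplete data constraint.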

Complete-Data OSEM (COSEM)

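The alternating minimization on C and f is performed by first minimizing $E_{cmp}(\mathbf{C}, \mathbf{f})$ with respect to C while keeping f fixed, and this leads to

$$\frac{\partial E_{cmp}(\mathbf{C}, \mathbf{f})}{\partial C_{ij}} = 0 \quad \Rightarrow \quad C_{ij} = g_i \frac{H_{ij} f_j}{\sum_n H_{in} f_n}$$

which is exactly the same as the E-step of the standard ML-EM algorithm.

Next, optimize $E_{cmp}(\mathbf{C}, \mathbf{f})$ w.r.t. f with fixed C to get

$$\frac{\partial E_{cmp}(\mathbf{C}, \mathbf{f})}{\partial f_j} = 0 \quad \Rightarrow \quad f_j = \frac{\sum_i C_{ij}}{\sum_i H_{ij}}$$

Therefore, at iteration k+1, the alternation becomes

Estimate C:

$$\hat{C}_{ij}^{k+1} = g_i \frac{H_{ij} \hat{f}_j^{k}}{\sum_n H_{in} \hat{f}_n^{k}}$$

Estimate f:

$$\hat{f}_j^{k+1} = \frac{\sum_i \hat{C}_{ij}^{k+1}}{\sum_i H_{ij}}$$

Identical to ML-EM:

$$\hat{f}_j^{k+1} = \frac{\hat{f}_j^{k}}{\sum_i H_{ij}} \sum_i \frac{H_{ij}\, g_i}{\sum_n H_{in} \hat{f}_n^{k}}$$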
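The equivalence stated above can be checked numerically. The sketch below, assuming a small dense system with positive entries and no background term, runs the two-step alternation (estimate C, then estimate f) next to the one-line ML-EM update; the function names `mlem_alternation` and `mlem_direct` and the example system are illustrative only.

```python
import numpy as np

def mlem_alternation(H, g, n_iter=50):
    """Alternate between the complete-data estimate C and the image estimate f."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    sens = H.sum(axis=0)                         # sum_i H_ij
    f = np.ones(H.shape[1])
    for _ in range(n_iter):
        Hf = H * f                               # entries H_ij f_j
        g_bar = Hf.sum(axis=1)                   # sum_n H_in f_n
        C = g[:, None] * Hf / g_bar[:, None]     # estimate C (E-step)
        f = C.sum(axis=0) / sens                 # estimate f (M-step)
    return f

def mlem_direct(H, g, n_iter=50):
    """Standard multiplicative ML-EM update written in one line."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    sens = H.sum(axis=0)
    f = np.ones(H.shape[1])
    for _ in range(n_iter):
        f = f / sens * (H.T @ (g / (H @ f)))
    return f

# Hypothetical example system.
H = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.3, 1.0],
              [0.5, 0.5, 0.5]])
g = H @ np.array([2.0, 5.0, 3.0])
f_alt = mlem_alternation(H, g)
f_dir = mlem_direct(H, g)
```

The two results agree to floating-point precision. Storing C explicitly costs memory proportional to the size of H, which is why the combined form is what is normally implemented for plain ML-EM.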

Extension of EM to Regularized EM (MAP-EM)

Despite the order-of-magnitude acceleration of OS-EM, ML-EM is inherently unstable: the noise artifact of ML solutions is magnified beyond a critical number of iterations.

[Figure: reconstruction error vs. iteration.]

The instability problem can be alleviated by extending EM to regularized EM in the context of a Bayesian maximum a posteriori (MAP) framework.

Bayesian MAP (a.k.a. penalized-likelihood (PL)) approaches allow the incorporation of suitable prior models to regularize the ill-posed nature of the tomographic inversion problem.

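Maximum A Posteriori (MAP) Approach (Bayesian or regularized-EM approach)

Bayes' Theorem:

$$\underbrace{\Pr(\mathbf{F} = \mathbf{f} \mid \mathbf{G} = \mathbf{g})}_{\text{posterior}} = \frac{\overbrace{\Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{F} = \mathbf{f})}^{\text{likelihood}}\ \overbrace{\Pr(\mathbf{F} = \mathbf{f})}^{\text{prior}}}{\underbrace{\Pr(\mathbf{G} = \mathbf{g})}_{\text{constant}}}$$

MAP Estimation:

$$\hat{\mathbf{f}} = \arg\max_{\mathbf{f}} \Pr(\mathbf{F} = \mathbf{f} \mid \mathbf{G} = \mathbf{g}) = \arg\max_{\mathbf{f}} \big\{ \log \Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{F} = \mathbf{f}) + \log \Pr(\mathbf{F} = \mathbf{f}) \big\}$$

For the likelihood, Poisson statistics are applied:

$$\Pr(\mathbf{G} = \mathbf{g} \mid \mathbf{F} = \mathbf{f}) = \prod_t \frac{\bar{g}_t^{\,g_t} e^{-\bar{g}_t}}{g_t!}, \quad \text{where} \quad \bar{g}_t = \sum_{ij} H_{t,ij} f_{ij} + r_t$$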

The prior probability is modeled as a Gibbs distribution.

Gibbs Prior Distributions (Local Regularization)

Gibbs distributions provide mathematically powerful machinery to model a class of priors that specify local spatial correlations.

$$\Pr(\mathbf{F} = \mathbf{f}) = \frac{1}{Z} \exp\big( -\lambda E_P(\mathbf{f}) \big)$$

• $Z$: normalizing function (also known as "partition function")
• $\lambda$: positive constant
• $E_P$: prior energy function

Penalty Function

$$E_P(\mathbf{f}) = \sum_j \sum_{k \in N_j} \phi(f_j - f_k)$$

[Figure: pixel $f_j$ and a neighbor $f_k$ in the 4-nearest-neighbor system $N_j$.]

The potential function $\phi$ is usually defined so that its value is reduced as the difference between the values of two pixels in the neighborhood $N_j$ is reduced. Neighboring pixels in the underlying source are assumed, with few exceptions, to have similar intensities.

[Figure: surface plot of a primate autoradiograph obtained with the benzodiazepine neuroreceptor agent Iomazenil (123I).]
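To make the prior energy concrete, the sketch below evaluates the double sum over 4-nearest neighbors for a 2-D image. It assumes an even potential function, so that each neighboring pair contributes twice (once as (j, k) and once as (k, j)); the function name `prior_energy` and the tiny test images are hypothetical.

```python
import numpy as np

def prior_energy(f, potential):
    """Sum potential(f_j - f_k) over all pixels j and their 4-nearest neighbours k."""
    f = np.asarray(f, dtype=float)
    dv = f[1:, :] - f[:-1, :]     # vertical neighbour differences
    dh = f[:, 1:] - f[:, :-1]     # horizontal neighbour differences
    # For an even potential, pairs (j, k) and (k, j) give the same value,
    # so every unordered pair is counted twice in the double sum.
    return 2.0 * float(potential(dv).sum() + potential(dh).sum())

quadratic = lambda xi: xi ** 2

flat = np.full((4, 4), 7.0)            # constant image: zero energy
step = np.array([[0.0, 0.0],
                 [1.0, 1.0]])          # one horizontal edge
e_flat = prior_energy(flat, quadratic)
e_step = prior_energy(step, quadratic)
```

A constant image has zero energy and is therefore the most probable under this prior, while every edge raises the energy, which is the sense in which the prior favors locally smooth images.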

Penalty Function Considerations

• Computation
• Algorithm complexity
• Uniqueness of maximum of $\Phi(\mathbf{f})$
• Resolution properties (edge preserving?)
• # of adjustable (free) parameters
• Predictability of properties (resolution and noise)

Choices:

• Quadratic vs. nonquadratic
• Convex vs. nonconvex

Penalty Functions: Quadratic vs. Nonquadratic

Quadratic: $\phi_{QD}(\xi) = \xi^2$

• Simpler optimization
• Global smoothing

Nonquadratic:

• Edge preserving
• More complicated optimization (essentially solved in convex case)
• Unusual noise properties (and harder to predict moments)
• More adjustable (free) parameters

Example: Huber function

$$\phi_{HB}(\xi) = \begin{cases} \xi^2, & |\xi| \le \delta \\ 2\delta|\xi| - \delta^2, & |\xi| > \delta \end{cases}$$
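The difference between the two penalties is easiest to see numerically. The following sketch assumes the scaling in which the Huber function equals $\xi^2$ inside the threshold $\delta$ and continues linearly with matching value and slope outside it; the function names are illustrative.

```python
import numpy as np

def phi_quadratic(xi):
    """Quadratic penalty: grows without bound, so large edges are heavily penalized."""
    return np.asarray(xi, dtype=float) ** 2

def phi_huber(xi, delta):
    """Huber penalty: quadratic for |xi| <= delta, linear beyond (edge preserving)."""
    xi = np.asarray(xi, dtype=float)
    a = np.abs(xi)
    return np.where(a <= delta, xi ** 2, 2.0 * delta * a - delta ** 2)

xi = np.array([0.5, 1.0, 3.0])
quad_vals = phi_quadratic(xi)           # 0.25, 1.0, 9.0
huber_vals = phi_huber(xi, delta=1.0)   # 0.25, 1.0, 5.0
```

For the small difference the two penalties agree, while for the large difference (a likely edge) the Huber penalty is smaller, so edges are smoothed less aggressively than under the quadratic penalty.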

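Representative Convex Non-Quadratic (CNQ) Penalty Functions

[Figure: qualitative shapes of $\phi(\xi)$ and of $d\phi/d\xi$ for the penalty functions below.]

$$\phi_{QD}(\xi) = \xi^2$$

$$\phi_{HB}(\xi) = \begin{cases} \xi^2, & |\xi| \le \delta \\ 2\delta|\xi| - \delta^2, & |\xi| > \delta \end{cases}$$

$$\phi_{GR}(\xi) = \log\cosh(\xi)$$

$$\phi_{BS}(\xi) = |\xi|^p, \quad 1 \le p \le 2$$

$$\phi_{LN}(\xi) = \delta^2 \big[\, |\xi/\delta| - \log\big(1 + |\xi/\delta|\big) \big]$$

[Figure: PET with detector gaps. Sinogram and reconstructions by ML, MAP-QD, and MAP-CNQ.]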

Penalty Functions: Convex vs. Nonconvex

Convex:

• Easier to optimize
• Guaranteed unique extremum of $\Phi(\mathbf{f})$

Nonconvex:

• Greater degree of edge preservation
• Nice images for piecewise-constant objects
• Even more unusual noise properties
• Multiple extrema
• More complicated optimization (simulated annealing, deterministic annealing)
• Estimator $\hat{\mathbf{f}}$ becomes a discontinuous function of G.

Example: The "broken parabola" penalty function

Representative Convex Optimization Algorithms

Convergence rate Global Convergence Relaxation

ML-EM [1]OSEM [2]RAMLA/BSREM [3 4]

SlowFastFast

YesNo

Yes/No

NoNoYesRAMLA/BSREM [3,4] Fast

Fast

Yes/No

Yes

Yes

No

OS-SPS [5,6] (ECT/TCT) Fast No Yes

TRIOT [7] (TCT)COSEM/MAP [8] Moderate NoYesACOSEM [9] Fast NoYes

[1] Shepp and Vardi, IEEE-TMI, Oct. 1982.
[2] Hudson and Larkin, IEEE-TMI, Dec. 1994.
[3] Browne and De Pierro, IEEE-TMI, Oct. 1996.
[4] De Pierro and Yamagishi, IEEE-TMI, Apr. 2001.
[5] De Pierro, IEEE-TMI, 1995.
[6] Erdogan and Fessler, PMB, Nov. 1999.
[7] Ahn, Fessler, Blatt and Hero, IEEE-TMI, Mar. 2006.
[8] Hsiao, Rangarajan, Khurd and Gindi, PMB, May 2004.
[9] Hsiao and Huang, PMB, Jan. 2010.

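Convex Optimization: Optimization Transfer Method (Functional Substitution Method)

When faced with a convex objective function $\Phi(\mathbf{f})$ that is difficult to minimize (or maximize), at the $k$-th iteration $\Phi(\mathbf{f})$ is replaced with a surrogate function $\tilde{\Phi}(\mathbf{f}; \mathbf{f}^{(k)})$ that is easier to minimize (or maximize).

Two conditions for $\tilde{\Phi}(\mathbf{f}; \mathbf{f}^{(k)})$ (stated for minimization; the inequality is reversed for maximization):

i) $\Phi(\mathbf{f}^{(k)}) - \Phi(\mathbf{f}) \ge \tilde{\Phi}(\mathbf{f}^{(k)}; \mathbf{f}^{(k)}) - \tilde{\Phi}(\mathbf{f}; \mathbf{f}^{(k)})$

ii) $\nabla \Phi(\mathbf{f}) \big|_{\mathbf{f} = \mathbf{f}^{(k)}} = \nabla \tilde{\Phi}(\mathbf{f}; \mathbf{f}^{(k)}) \big|_{\mathbf{f} = \mathbf{f}^{(k)}}$

[Figure: objective $\Phi(\mathbf{f})$ and successive surrogate functions, with the iterates $\mathbf{f}^{(k)}$, $\mathbf{f}^{(k+1)}$, $\mathbf{f}^{(k+2)}$ marked on the $\mathbf{f}$ axis.]

Examples: OS-SPS (Separable Paraboloidal Surrogates), COSEM-MAP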
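The surrogate idea can be illustrated on a one-dimensional toy problem that is unrelated to tomography but small enough to follow by hand. The sketch below minimizes $\sum_i \sqrt{(x - a_i)^2 + \epsilon}$ (a smoothed sum of absolute deviations) by repeatedly minimizing a quadratic surrogate that touches the objective at the current iterate and lies above it elsewhere. The function names, the data `a`, and the smoothing constant `eps` are assumptions chosen for the example.

```python
import numpy as np

def objective(x, a, eps=1e-6):
    """Smoothed sum of absolute deviations."""
    return float(np.sum(np.sqrt((x - a) ** 2 + eps)))

def mm_minimize(a, x0=0.0, n_iter=100, eps=1e-6):
    """Minimize the objective by optimization transfer with quadratic surrogates."""
    a = np.asarray(a, dtype=float)
    x = float(x0)
    history = [objective(x, a, eps)]
    for _ in range(n_iter):
        # Weights of the quadratic surrogate built at the current iterate:
        # sqrt(u) <= sqrt(u_k) + (u - u_k) / (2 sqrt(u_k)) since sqrt is concave.
        w = np.sqrt((x - a) ** 2 + eps)
        # The surrogate sum_i (x - a_i)^2 / (2 w_i) + const is minimized in closed form.
        x = float(np.sum(a / w) / np.sum(1.0 / w))
        history.append(objective(x, a, eps))
    return x, history

a = np.array([1.0, 2.0, 3.0, 10.0, 50.0])
x_hat, history = mm_minimize(a)
```

Because each surrogate lies above the objective and matches it at the current iterate, the recorded objective values never increase, and the iterate approaches the median of the data. The same monotonicity argument underlies the surrogate-based reconstruction algorithms named above.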

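NonConvex Penalty

Example: the broken parabola, a truncated quadratic that grows as $\xi^2$ for small differences and is constant beyond a threshold.

Since the broken parabola function is nonconvex, the overall energy function $\Phi(\mathbf{f})$ that includes such a function may have numerous stable states.

[Figure: a nonconvex energy function $\Phi(\mathbf{f})$ with several local minima, plotted against $\mathbf{f}$.]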

NonConvex Optimization: Deterministic Annealing Method

A sequence of energy functions is constructed by transforming the probability distribution

$$\Pr(\mathbf{F} = \mathbf{f}) = \frac{1}{Z} \exp\big( -\lambda E_P(\mathbf{f}) \big)$$

into a family of distributions indexed by a control parameter $\beta$.

• At small $\beta$, the new energy becomes a smooth version of the original energy function.
• As $\beta \to \infty$, the new energy approaches the original energy function.

[Figure: the new energy function for $\beta$ = 1, 10, 100, and 1000.]

(The parameter $\beta$ may be identified as the inverse of a computational temperature used in conventional simulated annealing.)

Non-Local Regularization (NLR) Method

Self-Similarity in Medical Images

“Every small patch in an image has many similar patches in the same image”

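NonLocal Means (NLM) Algorithm for Image Denoising

[Figure: noisy image, image denoised using a local method, and image denoised using the NLM algorithm.]

Principle of NonLocal Means Algorithm

$$E_P(\mathbf{f}) = \sum_j \sum_{k \in W_j} \omega_{jk}\, \big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2$$

$W_j$: nonlocal search window for pixel $j$

Patch: $P_j = N_j(\mathbf{f}) \overset{\text{def}}{=} \{ f_{j'} ;\ j' \in N_j \}$

$$\omega_{jk} = \exp\left( -\frac{\big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2}{h^2} \right)$$

$$\big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2 = \sum_{p=1}^{N_p} \big( f_{j(p)} - f_{k(p)} \big)^2$$

$j(p)$: $p$-th pixel in patch $P_j$
$k(p)$: $p$-th pixel in patch $P_k$

[Figure: patch around pixel $j$ and candidate patches around pixels $k_1$, $k_2$, $k_3$ inside the search window $W_j$, with weights $\omega_{jk_1}$, $\omega_{jk_2}$, $\omega_{jk_3}$.]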
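The patch-based weights can be sketched as follows. This is a minimal illustration under stated assumptions: square patches, a square search window clipped at the image border, reflective padding so that border pixels have full patches, and the weight $\exp(-d^2/h^2)$ with $d^2$ the sum of squared patch differences. The function name `nlm_weights`, its parameters, and the test image are hypothetical.

```python
import numpy as np

def nlm_weights(f, j, patch_radius=1, search_radius=3, h=1.0):
    """Return {k: w_jk} for every pixel k in the search window of pixel j = (row, col)."""
    f = np.asarray(f, dtype=float)
    pad = patch_radius
    fp = np.pad(f, pad, mode="reflect")          # full patches at the border

    def patch(r, c):
        # Patch centred on pixel (r, c) of the unpadded image.
        return fp[r:r + 2 * pad + 1, c:c + 2 * pad + 1]

    rj, cj = j
    patch_j = patch(rj, cj)
    weights = {}
    for rk in range(max(0, rj - search_radius), min(f.shape[0], rj + search_radius + 1)):
        for ck in range(max(0, cj - search_radius), min(f.shape[1], cj + search_radius + 1)):
            if (rk, ck) == (rj, cj):
                continue
            d2 = np.sum((patch_j - patch(rk, ck)) ** 2)   # squared patch distance
            weights[(rk, ck)] = float(np.exp(-d2 / h ** 2))
    return weights

# Test image: left half 0, right half 1.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
w = nlm_weights(img, j=(4, 1))
```

A pixel whose surrounding patch looks like the patch at `j` receives a weight near 1 even if it is far away, while a pixel next to the edge receives a small weight. This is what distinguishes the nonlocal weights from a purely distance-based local neighborhood.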

Anatomical Priors for PET Reconstruction

[Figure: (a) MR image, (b) PET image, (c) PET-MRI co-registered image.]

Incorporating Anatomical Side Information

$$E_P(\mathbf{f}) = \sum_j \sum_{k \in W_j} \omega_{jk}\, \big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2$$

$W_j$: nonlocal search window for pixel $j$

$$\omega_{jk} = \omega_{jk}^{F}(\mathbf{f})\; \omega_{jk}^{A}(\mathbf{f}, \mathbf{a})$$

[Figure: pixels $j$ and $k$ shown in the functional image (f) and at the corresponding positions in the anatomical image (a).]

The weight $\omega_{jk}(\mathbf{f}, \mathbf{a})$ reflects the similarity between the local neighborhoods $N_j$ and $N_k$ in the functional image and also reflects the similarity between the corresponding local neighborhoods $N_j$ and $N_k$ in the anatomical image.

$$\omega_{jk}^{F}(\mathbf{f}) = \exp\left( -\frac{\big\| N_j(\mathbf{f}) - N_k(\mathbf{f}) \big\|^2}{h_F^2} \right)$$

The anatomical factor $\omega_{jk}^{A}(\mathbf{f}, \mathbf{a})$ is computed from the anatomical patch difference $N_j(\mathbf{a}) - N_k(\mathbf{a})$ with its own smoothing parameter $h_A$. It is applied selectively: in the cases where it is set to 1, the weight reduces to $\omega_{jk} = \omega_{jk}^{F}(\mathbf{f})$. The full definition is given in the reference below.

Nguyen and Lee, IEEE Trans. Image Proc., 22(10), 2013.

[Figure: two examples, each showing a functional image and an anatomical image, with reconstructions by MAP-QD without side information, MAP-QD with anatomical boundary information, MAP-NLR without side information, and MAP-NLR with anatomical information.]

Super-Resolution Reconstruction Using NonLocal & Local Regularizers

[Figure panels: (a) LR phantom, (b) LNQ, (c) LNQ+NLQ, (d) NLQ; (e) HR phantom, (f) LNQ, (g) LNQ+NLQ, (h) NLQ.]

Low-resolution (LR) and high-resolution (HR) software phantoms and anecdotal reconstructions ((a)-(d) LR images, (e)-(h) HR images): (a) LR phantom; (b) LR image reconstructed by PL-LNQ (PE=28.42%); (c) LR image reconstructed by PL-NLQ+LNQ with τ=0.6 (PE=27.91%); (d) LR image reconstructed by PL-NLQ (PE=27.98%); (e) HR phantom; (f) HR image reconstructed by PL-LNQ (PE=25.89%); (g) HR image reconstructed by PL-NLQ+LNQ with τ=0.6 (PE=25.27%); (h) HR image reconstructed by PL-NLQ (PE=25.95%).
