Fuentes 2005

download Fuentes 2005

of 10

Transcript of Fuentes 2005

  • 8/11/2019 Fuentes 2005

    1/10

    Biometrics 61, 3645March 2005

    Model Evaluation and Spatial Interpolation by BayesianCombination of Observations with Outputs from Numerical Models

    Montserrat Fuentes

    Statistics Department, North Carolina State University, Raleigh, North Carolina 27695-8203, U.S.A.email: [email protected]

    and

    Adrian E. Raftery

    Statistics Department, University of Washington, Seattle, Washington 98195-4320, U.S.A.email: [email protected]

    Summary. Constructing maps of dry deposition pollution levels is vital for air quality management, andpresents statistical problems typical of many environmental and spatial applications. Ideally, such mapswould be based on a dense network of monitoring stations, but this does not exist. Instead, there are twomain sources of information for dry deposition levels in the United States: one is pollution measurementsat a sparse set of about 50 monitoring stations called CASTNet, and the other is the output of the regionalscale air quality models, called Models-3. A related problem is the evaluation of these numerical modelsfor air quality applications, which is crucial for control strategy selection. We develop formal methodsfor combining sources of information with different spatial resolutions and for the evaluation of numericalmodels. We specify a simple model for both the Models-3 output and the CASTNet observations in termsof the unobserved ground truth, and we estimate the model in a Bayesian way. This provides improvedspatial prediction via the posterior distribution of the ground truth, allows us to validate Models-3 via theposterior predictive distribution of the CASTNet observations, and enables us to remove the bias in theModels-3 output. We apply our methods to data on SO2 concentrations, and we obtain high-resolution SO2distributions by combining observed data with model output. We also conclude that the numerical models

    perform worse in areas closer to power plants, where the SO2 values are overestimated by the models.

    Key words: Air pollution; Deterministic simulation models; Environmental statistics; Geostatistics;Spatial statistics.

    1. Introduction

    Emission reductions were mandated in the Clean Air ActAmendments of 1990 with the expectation that they wouldresult in major reductions in the concentrations of atmospher-ically transported pollutants. Maps of dry deposition and con-centration levels of pollutants are useful for discovering when,where, and to what extent the pollution load is increasing

    or declining. Ideally, such maps would be based on a densenetwork of monitoring stations, covering most of the UnitedStates, at which dry deposition and concentrations of air pol-lutants would be measured on a regular basis. Unfortunately,such a network does not exist. Instead, there are two mainsources of information about dry deposition pollution levelsin the United States, and two resulting ways of constructingpollution maps. The first is asparseset of about 50 irregularlyspaced sites in the eastern United States, the Clean Air Statusand Trends Network (CASTNet), at which the EnvironmentalProtection Agency (EPA) regularly measures concentrationsand fluxes of different atmospheric pollutants (see Figure 1A).It would be possible to use an interpolation method to pro-

    duce a pollution map. However, the air pollutants fluxes andconcentrations are functions of terrain, atmospheric turbu-lence, vegetation, the rate of growth of the vegetation, andother soil and surface conditions. Because these factors varyabruptly in space and time and because the monitoring sta-tions are too far from each other, interpolation of the CAST-Net monitoring data is recognized to be inadequate for the

    problem (Clarke and Edgerton, 1997).The second source of information is pollution emissions

    data. The point and area sources of emissions are availablefrom known sources of p ollution such as chemical plants, gen-erally in the form of annual totals. If the emissions data wereaccurate and available at a fine time resolution, and if we hadprecise information about local weather, land use and cover,and pollution transport dynamics, we could in principle workout pollution levels at each point in time and space quite ac-curately. This ideal is far from being attained. However, theavailable emissions data have been combined with numeri-cal models of local weather (the Mesoscale Model version 5[MM5]), the emissions process (the Sparse Matrix Operator

    36

  • 8/11/2019 Fuentes 2005

    2/10

    Spatial Interpolation by Bayesian Combination 37

    1.51

    5.71

    1.8

    3.44

    0.15

    3.56

    0.33

    3.29

    0.18

    1.42

    0.950.18

    1.68

    0.57

    1.94

    1.06

    1.97

    1.61

    4.81

    1.97

    1.91

    2.75

    3.671.62

    1.27

    0.9

    0.31

    4.043

    1.79

    1.77

    1.13

    0.57

    3.14

    2.63

    1.02

    2.53

    0.19

    SO2concentrations (CASTNet )

    -100 -90 -80 -70 -60

    25

    30

    35

    40

    45

    50

    0

    10

    20

    3

    0

    40

    Models-3: SO2Concentrations

    (A) (B)

    Figure 1. (A) Weekly average of SO2 concentrations (parts per billion; ppb) at the Clean Air Status and Trends Network(CASTNet) sites, for the week of July 11, 1995. (B) Output of Models-3, weekly average of SO2 concentrations (ppb), for theweek of July 11, 1995.

    Kernel Emissions [SMOKE] model), as well as informationabout land use and cover, to estimate pollution levels in spaceand time (the Community Multiscale Air Quality [CMAQ]output) and to produce maps (Dennis et al., 1996). Theseare not statistical models, but rather numerical deterministicsimulation models based on systems of differential equationsthat attempt to represent the underlying chemistry; they takethe form of huge blocks of computer code. The combination

    of these models is referred to as Models-3 (models genera-tion 3). The models are run by the EPA and individual U.S.states, and they provide estimates of pollutant concentrationsand fluxes on regular grids in parts of the United States (seeFigure 1B).

    The output of Models-3 generates averaged concentrations/fluxes over regions of size 36 36 km. This approach mayalso be unsatisfactory for two main reasons. First, the under-lying emissions data are often not of high quality (Dolwicket al., 2001). Second, the underlying models may be inade-quate in various ways. It seems clear that combining the twomain approaches and sources of information, the model es-timates and the point measurements, could lead to a bettersolution. So far, efforts to do this have focused on model evalu-

    ation, in which model predictions are compared with measure-ments, and the models are revised and the outputs adjusted ifdiscrepancies are found (Dennis et al., 1990). The final mapsare still based on the model output alone.

    The evaluation of physically based computer models forair quality applications is crucial to assist in control strat-egy selection. Selecting the wrong control strategy has costlyeconomic and social consequences. The objective comparisonof modeled concentrations with observed field data is one ap-proach to assessment of model performance. Early evaluationsof model performance usually relied on linear least-squaresanalysis of observed versus modeled values, using scatterplotsof the values.

    Further development of these proposed statistical evalua-tion procedures is needed, and we propose a Bayesian ap-proach. Statistical assessment is tricky in this case, becausethe model predictions and the observations do not refer tothe same spatial locations, and indeed are on different spa-tial scales. The fact that they are on different spatial scalesis called the change of support problem. The model predic-tions are averages over grid squares, while the observations are

    at points in space; the two are thus not directly comparable.One approach to making them comparable is to apply inter-polation and extrapolation methods to the CASTNet pointmeasurements so as to produce empirical estimates of gridsquare averages, and then compare those to the model pre-dictions (Sampson and Guttorp, 1998). One difficulty withthis is that the interpolated grid square averages can bepoor because of the sparseness of the CASTNet network,and so treating them as ground truth for model evaluation isquestionable.

    A related problem is that the comparison does not take intoaccount the uncertainty in the interpolated values. In this arti-cle, we develop a new approach to the model-evaluation prob-lem, and show how it can also be used to remove the bias in

    model output and to produce maps that combine model pre-dictions with observations in a coherent way. Combining bothsources of information versus using just the sparse data fieldor the unreliable model output should lead to improved mapsof air quality. We specify a simple model for both Models-3predictions and CASTNet observations in terms of the unob-served ground truth, and estimate it in a Bayesian way. Solu-tions to all the problems considered here follow directly. Modelevaluation then consists of comparing the CASTNet observa-tions with their predictive distributions given the Models-3output. Bias removal follows from estimation of the bias pa-rameters in the model. Maps of pollution levels and of theuncertainty about them, taking into account all the available

  • 8/11/2019 Fuentes 2005

    3/10

    38 Biometrics, March2005

    information, are based directly on the posterior distribution ofthe (unobserved) ground truth. The resulting approach takesaccount of and estimates the bias in the atmospheric models,the lack of spatial stationarity in the data, the ways in whichspatial structure and dependence change with locations, thechange of support problem, and the uncertainty about these

    factors. It can be viewed as an instance of the Bayesian meld-ing framework for inference about deterministic simulationmodels (Poole and Raftery, 2000), and its implementation isquite straightforward.

    A similar approach has been developed by Cowles andZimmerman (2002, 2003), who have used systematic samplingand standard numeric integration techniques rather thanMonte Carlo integration to combine point and areal data.Another relevant method was proposed by Best, Ickstadt,and Wolpert (2000), who related different spatially varyingquantities to an underlying unobservable random field for a re-gression analysis of health and exposure data. In our methodwe also have an underlying unobserved process, but the sta-tistical model we propose is novel. We present a new way to

    relate air pollution variables to an underlying process repre-senting the true air pollution values, as well as a new model fornonstationarity. Wikle et al. (2001) presented an approach tocombining data from different sources to improve the predic-tion of wind fields. Wikle et al.s approach is a conditional onein which all the spatial quantities are defined through a seriesof statistical conditional models. In their approach, the out-put of the numerical models is treated as a prior process. Herewe present a simultaneous representation of the data and theoutput of numerical models in terms of the underlying truth.Our method is different from that of Wikle et al., because wedo not treat the output of the numerical models as a priorprocess, but rather as another source of data. Therefore, we

    write the output of the models in terms of the underlyingtruth, taking into account the potential bias of the numericalmodels. The way we estimate these bias parameters is alsonovel. To our knowledge, this is the first time that statisti-cal methods that take into account nonstationarity, change ofsupport, and the uncertainty about these factors have beenused for the evaluation of the regional scale air quality nu-merical models.

    In Section 2 we describe the statistical model, and in Sec-tion 3 we show some of our results for model evaluation andmap construction using the combined data for the air pollu-tion problem. Section 4 presents some final remarks.

    2. The Statistical Model

    2.1 Statistical Models for CASTNet and Models-3 OutputWe do not consider CASTNet measurements to be theground truth, because there is measurement error. Instead,we assume that there is an underlying (unobserved) fieldZ(s),whereZ(s) measures the true concentration/flux of the pol-lutant at location s. At station s we denote the CASTNetobservation by Z(s), and we assume that

    Z(s) = Z(s) +e(s), (1)

    where e(s) represents the measurement error at location sand e() N(0, 2e) is a white noise process. The processe(s) is independent ofZ(s). In some applications, it mightbe necessary to add a termB (s) in the observation equation

    (1) to explain the potential measurement bias in the data.Here we considered this bias to be negligible and ignored it,following the recommendation of our EPA collaborators.

    The true underlying process Z is assumed to follow themodel

    Z(s) = (s) +(s), (2)

    where Z(s) has a spatial trend, (s), that is a polynomialfunction of s with coefficients . We assume that Z(s) haszero-mean correlated errors(s). The process (s) has a pos-sibly nonstationary covariance with parameter vector thatmight change with location.

    We model the output of the EPA physical models as follows:

    Z(s) = a(s) +b(s)Z(s) +(s), (3)

    where the parameter functiona(s) measures the additive biasof the air quality models at location s and the parameterfunction b(s) accounts for the multiplicative bias in the airquality models. The process (s) explains the random devia-tion at location s with respect to the underlying true processZ(s), and we take it to be a white noise process, namely, () N(0, 2). The process (s) is independent ofZ(s) and e(s),which is the error term for CASTNet. Because the outputs ofModels-3 are not point measurements but areal estimationsin subregionsB 1, . . . , Bmthat cover the domain D, we have

    Z(Bi) =

    Bi

    a(s) ds +b

    Bi

    Z(s) ds +

    Bi

    (s)ds (4)

    for i = 1, . . . , m. According to the EPA modelers with whomwe have communicated and to some preliminary analyses wehave done, the bias is mostly additive, and their experience

    suggests treating the function b(s) as constant over space.Therefore, we modelb(s) as an unknown constant term andthe functiona(s) as a polynomial in s with a vector of coeffi-cients, a0.

    For spatial prediction we simulate values ofZfrom its pos-terior predictive distribution as

    P(Z| Z, Z). (5)

    For model evaluation we simulate values of CASTNet givenModels-3, assuming that Models-3 output is unbiased, i.e.,from the following posterior predictive distribution:

    P(Z| Z, a0= 0, b= 1). (6)

    For bias removal, we use values of the parameters a0 and bestimated from their posterior distribution

    P(a0, b | Z, Z). (7)

    2.2 Methods for Combining Data at DifferentSpatial Resolutions

    We first describe the change of support problem that oc-curs when we combine data sources with different supports,or when the supports of predictand and data are not thesame. This problem is treated in detail by Gotway and Young(2002). Here, we have point measurements at the CASTNet

  • 8/11/2019 Fuentes 2005

    4/10

    Spatial Interpolation by Bayesian Combination 39

    sites, and then we observe the output of Models-3 averagedover grid cells, B1, . . . , Bm. In this section we discuss algo-rithms to calculate the covariance for areal measurementsand the posterior predictive distribution of a random pro-cess at a point location Z(x0) given data on block averages,Z(B1), . . . , Z(Bm), where some of the blocks might be just

    points. The covariance for the block averages is

    cov(Z(Bi), Z(Bj)) =

    Bi

    Bj

    C(u, v,) dudv/|Bi||Bj |, (8)

    where C(u, v, ) = cov(Z(u), Z(v)), C being a possiblynonstationary covariance spatial function. Gelfand, Zhu, andCarlin (2001) approximated the integral in (8) using a randomsample over Bj. Here, we prefer a systematic sample becauseof the computational benefits of having the data on a regulargrid.

    We now deduce the joint distribution ofZand Zcondition-ing on the value of the parameters in models (1) and (4). Wecould in principle write this distribution as a function of the

    parameters and calculate the maximum likelihood estimate(MLE) for the parameters in models (1) and (4). Because inpractice this calculation will be hard, we present a Bayesianapproach to estimate the parameters. We have

    Z

    Z

    N

    a+b

    ,

    C CM

    CM M

    , (9)

    where

    = ((s1), . . . , (sn))T,

    a=

    B1

    a(s) ds, . . . ,

    Bm

    a(s)ds

    T,

    and

    =

    B1

    (s) ds, . . . ,

    Bm

    (s) ds

    T.

    In (9) C is the covariance ofZ (CASTNet), which is thecovariance of Zplus measurement error variance, M is thecovariance ofZ (Models-3), and CM is the crosscovariancebetween the point measurements Zand the block averagesZ.We write to denote the covariance matrix of (ZT,ZT)T, sothat is an (n + m) (n+ m) matrix with elements{i,j}given by

    i1,i2 = cov(Z(si1 ),Z(si2 ))

    = C(si1 , si2 ,) + 1{i1=i2}2e for i1, i2 n,

    n+j,i = i,n+j = cov(Z(si), Z(Bj))

    = b

    Bj

    C(si, v,) dv/|Bj |,

    for i = 1, . . . , n and j= 1, . . . , m, and

    n+j1,n+j2 = cov(Z(Bj1 ),Z(Bj2 ))

    = b2

    Bj1

    Bj2

    C(u,v,)dudv

    |Bj1 ||Bj2 | + 1{j1=j2}

    2|Bj1 |,

    forj1,j2= 1, . . . , m, where the function1A(x) is the indicatorfunction of the set A, taking the value 1 when x A and 0otherwise.

    The goal is to predict the value of Z at location x0 giventhe data. Thus we need the conditional distribution ofZ(x0)given the observations, assuming that all the parameters are

    known. We use classical results of multivariate analysis toderive the joint distribution ofZ(x0), and Z = (ZT,ZT)T. We

    define= cov(Z(x0), Z), an (n+ m)-dimensional vector withcomponents

    i = cov{Z(x0), Zi} = cov{Z(x0), Z(si)}

    = C(x0, si,), for i = 1, . . . , n ,

    n+j = cov{Z(x0), Zn+j} = cov{Z(x0), Z(Bj)}

    = b

    Bj

    C(x0, v,) dv/|Bj |, for j = 1, . . . ,m,

    where Zi denotes the ith component of Z. We then de-duce that the conditional distribution ofZ(x0) given {Z,Z}is normal with mean (x0) +T1(Z ), where =(, a+b)T, and variance 20

    T1.When the goal is to predictZat a location x0, the Bayesian

    solution is the predictive distribution ofZ(x0) given the ob-servationsZ,

    p(Z(x0) |Z) =

    p(Z(x0) |Z,)p( |Z) d, (10)

    where = (2e, a0, b, 2, , ). A Gibbs sampling approach

    is used to simulate mvalues from the posterior distributionof the vector parameter . The predictive distribution is ap-proximated by theRao-Blackwellized estimatoras follows:

    p(Z(x0) |Z) = 1T

    Tt=1

    pZ(x0) |Z,

    (t), (11)

    where (t) is the tth draw from the posterior distribution.

    2.3 Modeling a Nonstationary Covariance

    The spatial patterns shown by the air pollutant fluxes andconcentrations change with location, in the sense that thespatial covariance is different at different locations, as shownby Guttorp and Sampson (1994), Haas (1995), and Hollandet al. (1999), among others. Therefore, the underlying processZin (2) is nonstationary and the standard methods of spatialmodeling and interpolation are inadequate. In this section wereview the methodology of Fuentes (2001, 2002) and Fuentes

    and Smith (2001), which we will use to model the covariance ofthe processZ. We represent the process locally as a stationaryisotropic random field with some parameters that describe thelocal spatial structure. These parameters are allowed to varyacross space and reflect the lack of stationarity of the process.

    A broad class of stationary Gaussian processes may be rep-resented in the form

    Z(x) =

    D

    K(x s)Z(s)(x) ds, (12)

    where K is a kernel function and Z(x), x D is a family of(independent) stationary Gaussian processes indexed by .The parameter is allowed to vary across space to reflect the

  • 8/11/2019 Fuentes 2005

    5/10

    40 Biometrics, March2005

    lack of stationarity of the process. The stochastic integral (12)is defined as a limit (in mean square) of approximating sums(e.g., Cressie, 1993, p. 107). Each stationary process Z(s)(x)has a mean function s that is constant, i.e., s does notdepend on x. We propose a parametric model for the mean ofZ,

    E{Z(x)} =(x;);

    here we take to be a polynomial function of x with coeffi-cients , but other choices are possible.

    The covariance ofZ(s) is stationary with parameter (s),

    cov{Z(s)(s1), Z(s)(s2)} =C(s)(s1 s2).

    We take the process Z(s) to have a Matern stationary covari-ance (Matern, 1960):

    C(s)(x) = s

    2s1(s)

    21/2s |x|/s

    sKs

    21/2s |x|/s

    , (13)

    where Ks is a modified Bessel function and (s) = (s, s,s). The parameter s measures how the correlation decayswith distance; generally this parameter is called the range.The parametersis the variance of the random field, i.e.,s =var(Z(s)(x)), usually referred to as thesill. The parametersmeasures the degree of smoothness of the processZ(s). Thehigher the value ofs the smoother Z(s) would be.

    The covariance ofZ, C(s1, s2; ), is a convolution of thelocal covariances C(s)(s1 s2),

    C(s1, s2;) =

    D

    K(s1 s)K(s2 s)C(s)(s1 s2)ds. (14)

    In (14) every entry requires an integration. Because eachsuch integration is actually an expectation with respect to

    a uniform distribution, we propose Monte Carlo integration.We draw a systematic sample of locationsvj ,j = 1, 2, . . . , Mover D. Hence, we replace C(s1, s2; ) with

    CM(s1, s2;)

    =M1Mj=1

    K(s1 vj)K(s2 vj)C(vj)(s1 s2).(15)

    This is a Monte Carlo integration, which can be made arbi-trarily accurate and does not involve the dataZ. The samplingpoints vj ,j = 1, 2, . . . , M, determine subregions of local sta-tionarity for the processZ. We increase the value of Muntilconvergence is achieved.

    2.4 Algorithm for Estimation and PredictionIn our Gibbs sampling approach there are three stages. Wealternate between the parameters that measure the lack ofstationarity, (,(s)) (Stage 1), the parameters that measurethe bias of Models-3 and the measurement error of CASTNet(Stage 2), and the unobserved true values of Z at all theCASTNet sites and at the blocks where we have the Models-3output (Stage 3). We obtain the conditional posterior distri-bution of the parameters that measure the lack of stationarity,(, (s)), given the values ofZthat are updated in Stage 3.The posterior distribution of (, (s)) will be completelyspecified once we define the priors for (, (s)), because(Z|, ) is Gaussian.

    We use a MetropolisHastings step for blocks of parame-ters, after performing partial marginalizations of the full con-ditionals. We treat as a block , which are the three covarianceparameters, namely, the sill, the range, and the smoothness.In our experience, this scheme produces a chain with bettermixing properties than that obtained by independent sam-

    pling of the full conditional distributions of the sill, range,and smoothness parameters. We use a gamma-prior distri-bution for all the covariance parameters, except for the sillparameter; we use a uniform prior for the logarithm of thesill. For the parameter, we use uniform priors.

    3. Application: Air Pollution Data

    We model the covariance function for the processZ, the trueSO2 values, using equation (14). We estimate the covarianceparameters of the process Z, given the CASTNet data andModels-3 output, taking into consideration the change-of-support problem. We calculate the covariances involving blockaverages by drawing a systematic sample ofLlocations in eachpixel, so thatMis equal to L times the number of pixels. Thisis an approximation that can be made arbitrarily accurate byincreasing the value ofL. In this application L = 4 seemed tobe large enough to achieve a sufficiently good approximation.We sample systematically rather than randomly because itallows us to use the fast Fourier transform, with which thecovariance calculations are faster and easier.

    We implement the nonstationary model (12) withweight function K(u s) = 1

    h2K0(

    u sh ), where K0(u) is the

    quadratic weight function

    K0(u) = 3

    4

    1 u1

    2

    +

    34

    1 u2

    2

    +, (16)

    for u = (u1, u2). The bandwidth parameter h is defined as

    (1 + )l/2, wherelis the distance between neighboring sam-ple points v1, . . . , vM in (15), and is a value between 0 and1. For we use a uniform prior in the interval [0, 1]. The pa-rameter determines the overlap between the subregions ofstationarity centered at the sampling points v1, . . . , vM, andhcan be interpreted as the diameter of the subregions of sta-tionarity. We use gamma priors for all the Matern covarianceparameters, except for the sill parameter for which we usep() 1, which is a uniform prior for log(). The meanand variance of the gamma priors are determined using pre-vious knowledge and results from the analysis of similar data.For the smoothness parameter the prior mean is 0.5 and thevariance 1, and for the range parameter the prior mean is 100km and the prior standard deviation is 45 km.

    In Figure 2 we plot the posterior distribution of (x) atsome CASTNet sites x. The spatial locations at which weexamine the covariance parameters in Figure 2 have nothingto do with the choice of the nodes v1, . . . , vM in equation(15). We are assuming that the covariance parameters areapproximately constant between nodes; otherwise we increasethe number of nodes,M. Thus, the value of at some locationxof interest is calculated as the value of at the node vj thatis closest to x.

    The sill parameter changes with location as illustrated bythe variation in the distributions in Figure 2. The mean andstandard deviation of the posterior distributions for the sillparameter at six selected sites located in Maine, Illinois, North

  • 8/11/2019 Fuentes 2005

    6/10

    Spatial Interpolation by Bayesian Combination 41

    0

    20

    40

    60

    80 Maine

    0 2 4 6 8

    Illinois

    North Carolina

    0

    20

    40

    60

    80Indiana0

    20

    40

    60

    80 Florida Michigan

    0 2 4 6 8

    Posterior distributions

    PercentofTotal

    Sill

    Figure 2. Posterior distributions of the sill parameter of the Matern covariance for the SO2 concentrations of Z, for theweek starting July 11, 1995, at six selected locations.

    Carolina, Indiana, Florida, and Michigan, are, respectively,0.23 (0.03), 2.89 (0.85), 0.78 (0.23), 3.40 (0.99), 0.21 (0.04),and 0.54 (0.08). The range parameter does not change muchwith location. The means and standard deviations of the pos-terior distributions for the range parameter at the same sixsites are 97.7 (12.8), 95.7 (28.5), 119.0 (35.8), 91.9 (27.3),114.9 (21.9), and 96.2 (15.5). The smoothing parameter doesnot change much with location either, and is always closeto 1/2 (exponential). The mode of the posterior distributionof the parameter that measures the measurement error forCASTNet is 0.8, and for Models-3 it is 0.1.

    We use a uniform prior distribution for all the additive biasparameters, and a normal prior for the multiplicative biasparameter b. The mode of the posterior distribution for theparameter that measures the multiplicative bias for Models-3

    Table 1

    Columns25 in this table show the modes, standard deviations, and90% credibleintervals of the posterior predictive distribution(equation [5]) for the underlying process Z

    measuring the true SO2 concentrations at the six selected sites. Column6 shows theCASTNet values(Z). Columns79 show the modes and the corresponding90% credible

    intervals of the posterior predictive distribution(equation [6]) for model evaluation.

    Site Mode SD 90% CI CASTNet Models-3 90% CI

    Maine 0.18 0.02 0.15 0.25 0.15 0.33 0.10 0.43Illinois 2.80 0.25 2.55 3.41 3.29 3.33 2.17 5.03North Carolina 1.98 0.29 1.18 2.08 0.90 5.32 3.67 6.67Indiana 0.98 1.59 0.74 5.63 3.14 9.59 4.20 20.50Florida 0.56 0.05 0.50 0.69 0.57 0.52 0.20 0.80Michigan 0.83 0.12 0.79 1.14 1.02 1.04 0.53 1.70

    is 0.5 with a standard error of 0.5, and for the additive bias wehave a polynomial of degreeq. We treatqas a hyperparameterand use a reversible jump Markov Chain Monte Carlo (Green,1995; Denison, Mallick, and Smith, 1998) algorithm for modelselection. In this applicationq= 4 seemed to be large enough.

    In Table 1 we show the modes, standard deviations, and90% credible intervals of the p osterior predictive distribution(5) for SO2 at the six selected sites. We get very high vari-ability at the Indiana site. This site is very close to severalcoal power plants, and so the SO2 levels can be very high orvery low depending on wind speed, wind direction, and theatmospheric stability. The sites in Maine and Florida havethe lowest SO2 levels and variability. The agricultural site inIllinois and the site in North Carolina have similar behaviorin terms of SO2 levels. The site in North Carolina is not far

  • 8/11/2019 Fuentes 2005

    7/10

    42 Biometrics, March2005

    Figure 3. Map of the estimated additive spatial bias of Models-3. The bias parameters are estimated by the means of their

    posterior distributions.

    from the Tennessee power plants, and the site in Illinois is alsorelatively close to some midwestern power plants. The site inMichigan, which is very close to Lake Michigan and relativelyfar from power plants also has low SO2 levels.

    In Table 1 we also show the CASTNet values, to judgewhether the generated data are similar to the CASTNet data.The CASTNet values in Table 1 at the six sites represent the5th, 91st, 1st, 68th, 14th, and 79th percentiles of the corre-sponding posterior predictive distributions from the Bayesianmelding approach. For the North Carolina site the CASTNetvalues are low relative to the posterior predictive distribution.This is due to the higher altitude (and lower pollution levels)of this particular site in relation to the nearby locations.

    The last three columns in Table 1 are the modes and thecorresponding 90% credible intervals of the posterior predic-tive distribution (6) for model evaluation. We can clearly ap-preciate the bias in the numerical models by comparing thesevalues with CASTNet. For instance, for the site in NorthCarolina, the interpolated value of Models-3 is 5.32 ppb (SE =3.00 ppb), while the CASTNet value is only 0.90 ppb. Wecan remove the bias in these interpolated Models-3 values bytaking into account the additive bias measured by a(x) (apolynomial of degree 4 with coefficients a0) and the multi-plicative bias. At each site we calculate the posterior meansofa0and b, using the posterior distribution (7), and we obtain

    the following adjusted Models-3 values (adjusted value(s) =((Models-3)(s) a(s))/b) at the six selected sites: 0.12, 2.88,1.09, 3.12, 0.44, and 1.01, where a is a fourth-degree poly-nomial with coefficients that are the posterior means of a0,and b is the posterior mean of b. These adjusted values arevery similar to CASTNet. Figure 3 shows a contourplot forthe additive bias of Models-3, with the bias parameters, a0,estimated with the mean of their posterior distribution. Thisbias seems to be larger in a northsouth ridge covering someof the midwestern and the southeastern parts of our domain.The week of July 11 (1995) was very dry with a heat waveaffecting the midwest and southeast of the United States,this probably affected the ability of the numerical models to

    estimate correctly the mixing heights (the depth of the un-stable air in the boundary layer) that determines pollutanttrajectories.

    Figure 4A shows the predicted values of SO2 at differentlocations on a regular grid, using our Bayesian melding ap-proach for prediction to combine CASTNet and Models-3data. The predicted values in Figure 4A are the means ofthe posterior predictive distribution (equation [5]) for theSO2 data. This graph looks similar to the output of Models-3shown in Figure 1B. However, the SO2 values in Figure 4Aare between 0 and 8 ppb, while in Figure 1B the SO2 valuesare between 0 and 40 ppb. The range of values in Figure 4Ais more reasonable, and it is closer to the range of valuesfor CASTNet shown in Figure 1A. This illustrates the effec-tiveness of the Bayesian melding approach for correcting thebias in Models-3. Figure 4B shows the standard error of theposterior predictive distribution at each point. There is higheruncertainty in areas close to power plants, where there is morevariability due in part to the effect of metereological variablessuch as wind.

    Figure 5 is an illustration of the performance of ourBayesian melding approach. In this figure we plot the CAST-Net values Zi (at each site i) versus the mean of the poste-rior predictive distribution of the truth Z at site igiven themodels output Z and the CASTNet values Zi, at all sites

    except at the site i that we are predicting. The dotted linesshow the 90% pointwise credible bounds for CASTNet, andthe solid line has slope 1 and intercept 0. The average lengthof the 90% credible intervals for the SO2 values using theBayesian melding approach over all CASTNet sites is 7 ppband the standard deviation of the length of the intervals is 8ppb. There is quite a lot of variability in the lengths of theintervals, mainly due to the lack of stationarity of the spatialcovariance. We compare this mean interval length to that ofthe 90% credible intervals for the SO2 values at all CAST-Net sites, this time given only the Models-3 output, and notconditional on the CASTNet values. This is only 3.5 ppb,half of the average interval length given Models-3 and the

  • 8/11/2019 Fuentes 2005

    8/10

    Spatial Interpolation by Bayesian Combination 43

    -100 -90 -80 -70 -60

    25

    30

    35

    40

    45

    50

    0

    2

    4

    6

    8

    SO2Concentrations (Bayesian melding)

    -100 -90 -80 -70 -60

    25

    30

    35

    40

    45

    50

    0

    1

    2

    3

    4

    Standard Error of Bayesian Prediction

    (A) (B)

    Figure 4. Predicted SO2 concentrations using the Bayesian melding approach to combine CASTNet with Models-3 data:(A) mean and (B) standard deviation of the posterior predictive distribution of the underlying processZgiven CASTNet andModels-3.

    CASTNet values. These intervals are too overoptimistic,mainly due to the fact that the output of Models-3 is smootherthan the real data. The calibration of the intervals based onModels-3 alone is poor: for Models-3 alone, only 55% of the90% credible intervals contain the CASTNet values. We im-prove the calibration to 80% by including CASTNet sites (seeFigure 5).

    4. Discussion

    Air quality numerical models are used to examine the responseof the air pollution network to different control strategies un-der various high-pollution scenarios. To establish their cred-ibility, however, it is essential that they should accurately

    reproduce observed measurements when applied to grounddata. Our objectives are model evaluation and bias removal

    Bayesian cross-validation predictive values

    CASTNetsites

    0 5 10 15 20

    0

    5

    10

    15

    20

    Figure 5. CASTNet values of SO2 versus the mean of thepredictive posterior distribution of the true SO2 values givenModels-3 and CASTNet values at all sites, except for the siteat which the predictions are being made. The dotted linesshow the 90% pointwise crossvalidation predictive crediblebands for CASTNet.

    for the air quality numerical models, and construction of reli-able maps of air pollution combining the output of numericalmodels with air pollution measurements at monitoring sites.We evaluate air quality models by obtaining the posteriorpredictive distribution of the measurements at the monitor-ing sites given the numerical models output. We remove thebias in the air quality models by obtaining the posterior dis-tribution of the bias parameters given the measurements atthe monitoring sites and the numerical models output. Weconstruct maps of air pollutants simulating values from theposterior predictive distribution of the true values (underly-ing process) given the measurements at the monitoring sitesand the numerical models output.

    The Bayesian approach provides a natural way to combinedata from different sources taking into account the differentuncertainties, and it also provides posterior distributions ofquantities of interest that can be used for scientific inference.However, our approach could also be implemented using a geo-statistical approach by predicting the truth with the optimalpredictor, namely, the mean of the conditional distributionof the truth given the data and the parameters. For this, wewould use an iterative approach with two steps. In step 1, wewould obtain the predictor of the truth given the data andsome initial values of the parameters. In step 2, the predictedvalues obtained in step 1 combined with the data would beused to estimate the bias and covariance parameters, using a

    likelihood approach. We would then iterate between steps 1and 2.Another approach to model evaluation is to use spatiotem-

    poral models for monitoring data to provide estimates of aver-age concentrations over grid cells corresponding to model pre-diction (Dennis et al., 1990; Chu, 1996; Sampson and Guttorp,1998; Davis, Nychka, and Barley, 2000). This approach is rea-sonable when the monitoring data are dense enough that wecan fit an appropriate spatiotemporal model to the data. Insituations such as the one presented here, with few and sparsedata points that show a lack of stationarity, the interpolatedgrid square averages would be poor because of the sparsenessof the CASTNet network, and so treating them as groundtruth for model evaluation would be questionable.

  • 8/11/2019 Fuentes 2005

    9/10

    44 Biometrics, March2005

    We model the underlying process Z, with the true valuesof fluxes/concentrations of air pollution, as a spatial processwith nonstationary covariance. Misspecification of the covari-ance would lead to poor estimates of the predictive standarddeviation, and possibly also biased predictions. For instance,suppose we assume that the sill is constant, while it is actually

    changing with location. Then, we would tend to overestimatethe prediction error in areas with smaller sill than the pro-posed fixed value, and we would probably underestimate theprediction error in areas with larger sill than the assumedfixed value. Here we represent the process locally as a sta-tionary isotropic random field, but allow the parameters ofthe stationary random field to vary across space. With thismodel we are able to make inferences about the nonstationaryrandom field with only one realization of the process. We haveused a model that assumes the spatial covariance structure tobe approximately piecewise constant between nodes. An al-ternative approach would be to model the spatial covarianceparameter using splines, when we would have a continuousfunction of space. However, when the number of nodes M is

    large enough, our results using splines are similar to thosefrom the simpler approach we have presented here.

    We have overcome some limitations of earlier approaches tothe evaluation of numerical models and mapping of air quality,but other limitations remain. In our analysis we have mod-eled the true underlying pollution process as a Gaussian field,which seemed to be a reasonable assumption because we wereworking with weekly averages of hourly values. However, forhourly pollution levels the logarithmic transformation couldbe more appropriate. Based on the experience of our EPA col-laborators, the main problem with air quality models seemsto be the presence of additive bias. This is the reason for ourrepresentation of the numerical models as a linear function

    of the truth. In other applications, one might need to writethe output of a numerical model as a nonlinear function ofthe truth. We have also ignored the potential additive bias ofthe observations. Preliminary analysis showed that this biaswas negligible compared to the bias of the Models-3 output.We assumed independence between the truth and the mea-surement error. To our knowledge, this assumption is reason-able for SO2 concentrations. But, it is more questionable forother air pollution variables, such as fluxes and depositionvelocities. Thus, when modeling other pollution variables, amore complicated dependency structure might be needed.

    The only monitoring network for dry deposition and con-centrations is CASTNet, and the CASTNet monitoring sitesare all in rural areas. Because Models-3 are regional models,

    the main interest is to capture regional patterns, rather thanisolated hot spots (probably located in urban areas). There-fore, CASTNet seems to be an appropriate network for eval-uation of these regional air quality models, despite the factthat all sites are rural. However, we plan to use in the futureurban sites whenever they become available.

    In this article we combine data by relating the spatiallyvarying variables to an underlying unobservable true air pol-lution process and we predict this latent process. This is differ-ent from the traditional geostatistical approaches of cokrigingor kriging with external drift (KED) for spatial interpolationwith auxiliary covariates, as used by Phillips et al. (1997). Incokriging or KED the data are represented as a function of

    covariates using a regression model. Therefore, if the observeddata and the output of numerical models are measured at dif-ferent spatial resolutions, we cannot compare the observeddata directly to the output of numerical models. In this ar-ticle we overcome that problem by relating each source ofinformation to the unobserved ground truth.

    An alternative approach to model evaluation would usekriging to predict the output of the numerical models at theCASTNet locations. We compared our proposed modelingapproach with this traditional stationary kriging predictionmethod that ignores the uncertainty in the covariance param-eters. Both methods tend to give similar predicted values; themain difference is in the prediction error, which is the criti-cal factor for any inference done with the data. Our approachgave larger standard errors than kriging for the Models-3-predicted values, especially in areas close to power plants.This reflects the difficulty in estimating the covariance pa-rameters in these areas. The kriging approach reported thesame standard errors everywhere. For the evaluation of thephysical models, we studied where CASTNet values lay with

    respect to a confidence interval for the SO2 predicted valuesfrom Models-3. Using kriging we considerably underestimatedthe lengths of the prediction intervals leading to wrong con-clusions about the performance of Models-3.

    Acknowledgements

    Fuentess research was supported by National Science Foun-dation grant DMS 0002790 and by U.S. EPA award R-8287801. Rafterys research was supported by NIH grant 8R01 EB002137-02. The research of both authors was sup-ported by the Department of Defense Multidisciplinary Re-

    search Program of the University Research Initiative undergrant N-00014-01-10745. The authors would like to acknowl-edge the helpful insight and assistance provided by the U.S.EPA Office of Research Development, RTP, NC, in partic-ular by Peter Finkelstein, John Pleim, and Robin Dennis.The authors are also grateful to Tilmann Gneiting for helpfulcomments.

    Resume

    La construction de cartes de niveaux de pollution par depotssecs est fondamentale pour la gestion de la qualite de lair, etpresente des problemes statistiques typiques de nombreusesapplications spatiales et environnementales. Ces cartes de-vraient idealement reposer sur un reseau dense de stationsde suivi, mais ce nest jamais le cas. Aux Etats-Unis, il existedeux principales sources dinformation sur les depots secs: lereseau de stations de mesures de pollution CASTNet (reseaulache de 50 stations environ), et les sorties dun modeleregional de prediction de la qualite de lair appele Models-3. Levaluation de ces modeles numeriques pour leurs ap-plications en qualite de lair constitue un probleme annexemais crucial pour le choix de strategies de controle. Nousdeveloppons simultanement des methodes formelles de combi-naison de sources dinformation a differentes resolutions spa-tiales et devaluation des modeles numeriques. Nous ecrivonsun modele simple combinant les sorties de Models-3 et lesobservations de CASTNet sous forme dune donnee terrainreelle non observee, et nous estimons le modele par une

  • 8/11/2019 Fuentes 2005

    10/10

    Spatial Interpolation by Bayesian Combination 45

    methode bayesienne. Ceci permet (1) une prediction spatialeamelioree, (2) la validation des predictions de Models-3 via ladistribution predictive a p osteriori des observations de CAST-Net, et (3) lelimination du biais dans les sorties de Models-3.Nous appliquons notre methode a des donnees de concentra-tion de SO2, ce qui nous permet de produire des cartes de dis-tribution a haute resolution du SO2 en combinant les donnees

    et les predictions du modele numerique. Nous concluons aussique le modele numerique se comporte moins bien a proximitedes centrales thermiques, ou il surestime les valeurs de SO2.

    References

    Best, N. G., Ickstadt, K., and Wolpert, R. L. (2000). SpatialPoisson regression for health and exposure data mea-sured at disparate resolutions. Journal of the AmericanStatistical Association95, 10761088.

    Chu, S.-H. (1996). ROM2.2 model performance in the July 311, 1988 ozone episode. Transactions of the 1996 AMS-AWMA Joint Meeting on ROM Evaluation.

    Clarke, J. F. and Edgerton, E. S. (1997). Dry deposition cal-culations for the Clean Air Status and Trends Network.Atmospheric Environment31, 36673678.

    Cowles, M. K. and Zimmerman, D. L. (2002). CombiningSnow Water Equivalent data from multiple sources toestimate spatio-temporal trends and compare measure-ment systems.Journal of Agricultural, Biological and En-vironmental Statistics7, 536557.

    Cowles, M. K. and Zimmerman, D. L. (2003). A Bayesianspace-time analysis of acid deposition data combinedfrom two monitoring networks. Journal of GeophysicalResearch 108, 9006.

    Cressie, N. A. (1993). Statistics for Spatial Data, revised edi-tion. New York: Wiley.

    Davis, J. M., Nychka, D., and Bailey, B. (2000). A comparison

    of regional oxidant model (ROM) output with observedozone data. Atmospheric Environment34, 24131423.

    Denison, D. G. T., Mallick, B. K., and Smith, A. G. M. (1998).Automatic Bayesian curve fitting. Journal of the RoyalStatistical Society B60, 333350.

    Dennis, R. L., Barchet, W. R., Clark, T. L., and Seilkop, S. K.(1990). Evaluation of regional acidic deposition models(Part 1), NAPAP SAS/T Report 5. In National AcidPrecipitation Assessment Program: State of Science and

    Technology, Volume 1. Washington, D.C.: National AcidPrecipitation Assessment Program.

    Dennis, R. L., Byun, D. W., Novak, J. H., Galluppi, K. L.,Coats, C. J., and Vouk, M. A. (1996). The next genera-tion of integrated air quality modelling: EPAs Models-3.Atmospheric Environment30, 19252938.

    Dolwick, P. D., Jang, C., Possiel, N., Timin, B., Gipson, G.,and Godowitch, J. (2001). Summary of results from aseries of Models-3/CMAQ simulations of ozone in theWestern United States. 94th Annual A & WMA Confer-ence and Exhibition, Orlando, Florida.

    Fuentes, M. (2001). A new high frequency kriging ap-proach for nonstationary environmental processes. En-vironmetrics12, 469483.

    Fuentes, M. (2002). Spectral methods for nonstationary spa-tial processes.Biometrika89, 197210.

    Fuentes, M. and Smith, R. (2001). A new class of nonsta-

    tionary models. Technical report, Institute of StatisticsMimeo Series 2534, North Carolina State University.Gelfand, A. E., Zhu, L., and Carlin, B. P. (2001). On the

    change of support problem for spatio-temporal data.Bio-statistics2, 3145.

    Gotway, C. A. and Young, L. J. (2002). Combining incom-patible spatial data. Journal of the American StatisticalAssociation97, 632648.

    Green, P. J. (1995). Reversible jump Markov Chain MonteCarlo computation and Bayesian model determination.Biometrika82, 711732.

    Guttorp, P. and Sampson, P. D. (1994). Methods for esti-mating heterogeneous spatial covariance functions withenvironmental applications. In Handbook of Statistics 12,

    G. P. Patil and C. R. Rao (eds), 661689. Amsterdam:Elsevier Science B.V.

    Haas, T. C. (1995). Local prediction of a spatio-temporal pro-cess with an application to wet sulfate deposition. Jour-nal of the American Statistical Association90,11891199.

    Holland, D., Saltzman, N., Cox, L. H., and Nychka, D. (1999).Spatial prediction of sulfur dioxide in the eastern UnitedStates. In geoENV II: Geostatistics for EnvironmentalApplications, J. Gomez-Hernandez, A. O. Soares, andR. Froidevaux (eds), 6576. Dordrecht: Kluwer.

    Matern, B. (1960). Spatial variation. In Meddelandenfr an Statens Skogsforskningsinstitut, Volume 49, No. 5.Stockholm: Almaenna Foerlaget. Berlin: Springer-

    Verlag, 2nd edition, 1986.Phillips, D. L., Lee, H. E., Herstrom, A. A., Hogsett,W. E., and Tingey, D. T. (1997). Use of auxiliary datafor spatial interpolation of ozone exposure in southeast-ern forests. Environmetrics8, 4361.

    Poole, D. and Raftery, A. E. (2000). Inference for determinis-tic simulation models: The Bayesian melding approach.Journal of the American Statistical Association95, 12441255.

    Sampson, P. D. and Guttorp, P. (1998). Operational Evalu-ation of Air Quality Models. Proceedings of a NovartisFoundation Symposium on Environmental Statistics.

    Wikle, C. K., Milliff, R. F., Nychka, D., and Berliner, M.(2001). Spatiotemporal hierarchical Bayesian modeling:

    Tropical ocean surface winds. Journal of the AmericanStatistical Association95, 10761987.

    Received August2003. Revised May2004.Accepted May2004.