Empirical Bayes Quantile-Prediction
aka E-B Prediction under Check-loss
Lawrence D. Brown, Wharton School, Univ. of Pennsylvania
BFF4, May 2, 2017
Joint work with Gourab Mukherjee and Paat Rusmevichientong, Marshall School, U.S.C.
Preprint available on ArXiv.
NOTE: In several places the symbol $\nu$ should read $\nu^2$. This correction should be clear from the context.
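Multiple Independent Gaussian Problems
• n independent problems, indexed by i = 1, .., n
• Observe $X_i \sim N(\theta_i, \nu_{P,i}^2)$
  [$\theta_i$ unknown; $\nu_{P,i}$ known]
  [In applications, $X_i$ might be a sample mean, $X_i = \bar{X}_{i\cdot}$, with $X_{ik} \sim$ iid $N(\theta_i, \sigma_i^2)$, $k = 1, .., K_i$, so that $\nu_{P,i}^2 = \sigma_i^2 / K_i$.]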
Quantile Prediction
• For each i ∃ a potential future observation $Y_i \sim N(\theta_i, \nu_{F,i}^2)$. [Again this might be an average of several iid var's.]
• Note that for each i, the mean $\theta_i$ is the same for both Past and Future.
• Fix $b_i \in (0, 1)$, i = 1, .., n. Goal is to predict the $b_i$-th quantile of the dist. of $Y_i$ for each index i. This quantile is
  $q_{0,i} = \theta_i + \nu_{F,i} \Phi^{-1}(b_i)$.
• The naïve coordinate-wise predictor is $q_{0,i} = X_i + \nu_{F,i} \Phi^{-1}(b_i)$.
  NOTE: This is formal Bayes for uniform prior.
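The quantile and its naïve predictor can be sketched in a few lines. This is only an illustration: the function names and the numeric inputs are hypothetical, and `nu_f` is treated as the standard deviation of the future observation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; inv_cdf plays the role of Phi^{-1}


def true_quantile(theta, nu_f, b):
    """b-th quantile of Y ~ N(theta, nu_f**2); nu_f is a standard deviation."""
    return theta + nu_f * STD_NORMAL.inv_cdf(b)


def naive_predictor(x, nu_f, b):
    """Coordinate-wise predictor: the observation x replaces the unknown mean."""
    return x + nu_f * STD_NORMAL.inv_cdf(b)


# Hypothetical inputs, chosen only for illustration.
theta, x, nu_f, b = 1.0, 1.4, 2.0, 0.9
q_true = true_quantile(theta, nu_f, b)
q_naive = naive_predictor(x, nu_f, b)
```

The two values differ by exactly `x - theta`, so the naïve predictor inherits all of the noise in the single observation.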
Shrinkage Estimation
• Generally beneficial in multi-mean problems (and many other multivariate settings).
• But the shrinkage needs to be properly implemented.
• For homoscedastic problems this is well understood. (Stein estimation)
• But the current problem is quite heteroscedastic: $\nu_{P,i}$, $\nu_{F,i}$, $b_i$ can all depend on i.
• For such problems minimax shrinkage may dramatically differ from Empirical Bayes shrinkage;
• And, well-implemented E-B shrinkage is usually preferable.
See Efron and Morris (1973, 1975) … and Xie, Kou, Brown (2012, 2015)
Parametric Empirical Bayes
• We follow the hierarchical conjugate-prior paradigm:
  Efron and Morris (op. cit.), Stein (1962), Lindley (1962), Lindley and Smith (1972) …
• $\theta_1, .., \theta_n \sim$ iid $N(\eta, \tau^2)$
• $\eta, \tau^2$ are unknown hyper-parameters to be estimated from the data.
• Let $\hat\eta, \hat\tau^2$ denote the estimates.
For simplicity of exposition and notation:
• Consider (for the talk) the case where $\eta = 0$ is known. (Paper allows unknown $\eta$ and estimates it along with $\tau^2$.)
Quantile Prediction for Known Hyper-parameters
• If $\tau^2$ is known then the posterior distribution of $\vec\theta$ is known (& Gaussian).
• This yields a known, Gaussian predictive distribution for each future $Y_i$: let $\alpha_i = \tau^2 / (\tau^2 + \nu_{P,i}^2)$; then
  $F_{Y_i}(y \mid \tau^2, X_i) = N(\alpha_i X_i,\ \nu_{F,i}^2 + \alpha_i \nu_{P,i}^2)$.
  So the quantile prediction is
  $q(X_i; \tau^2) = \alpha_i X_i + (\nu_{F,i}^2 + \alpha_i \nu_{P,i}^2)^{1/2}\, \Phi^{-1}(b_i)$.
• The role of the E-B prior structure is only to motivate this family of quantile predictors.
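A minimal sketch of this predictor family, assuming $\eta = 0$ and treating `var_p` and `var_f` as the past and future variances (the $\nu^2$ quantities), so that the predictive standard deviation is the square root of `var_f + alpha * var_p`. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def eb_quantile(x, tau2, var_p, var_f, b):
    """Quantile predictor q(x; tau2) under the prior theta ~ N(0, tau2)."""
    alpha = tau2 / (tau2 + var_p)         # shrinkage factor in [0, 1)
    pred_sd = sqrt(var_f + alpha * var_p)  # sd of the predictive distribution
    return alpha * x + pred_sd * NormalDist().inv_cdf(b)


# Hypothetical inputs: small tau2 shrinks hard toward 0, large tau2 barely shrinks.
q_strong_shrink = eb_quantile(x=2.0, tau2=0.1, var_p=1.0, var_f=1.0, b=0.9)
q_weak_shrink = eb_quantile(x=2.0, tau2=100.0, var_p=1.0, var_f=1.0, b=0.9)
```

Every member of the family is indexed by the single number `tau2`, which is why the rest of the talk concentrates on how to choose it.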
“Direct” E-B quantile prediction
• Could use data + any plausible estimate of $\tau^2$, the hyper-parameter(s), and plug in.
• i.e., could use the marginal MLE $\hat\tau^2$ and get prediction $q(X_i; \hat\tau^2)$.
• Or could similarly use a method-of-moments (MofM) estimate instead of the MLE.
• These (and other) plausible estimates of $\tau^2$ are different & give different predictors.
• They’re not “bad”.
• BUT none of these needs to be the best choice.
See Xie, Kou, Brown (2012) about the more familiar estimation of means problem.
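The two plug-in estimates mentioned above can be sketched as follows, assuming $\eta = 0$ so that marginally $X_i \sim N(0, \tau^2 + \nu_{P,i}^2)$. The moment estimator shown is one simple moment-matching version, the MLE is found by a plain grid search, and all names and data are hypothetical.

```python
from math import log


def tau2_mom(xs, var_ps):
    """Moment matching: E[X_i^2] = tau2 + var_p_i, truncated at zero."""
    n = len(xs)
    return max(0.0, sum(x * x - v for x, v in zip(xs, var_ps)) / n)


def neg_log_marginal_lik(tau2, xs, var_ps):
    """Negative log-likelihood of X_i ~ N(0, tau2 + var_p_i), constants dropped."""
    return 0.5 * sum(log(tau2 + v) + x * x / (tau2 + v)
                     for x, v in zip(xs, var_ps))


def tau2_mle_grid(xs, var_ps, grid):
    """Marginal MLE of tau2 restricted to the candidate values in grid."""
    return min(grid, key=lambda t: neg_log_marginal_lik(t, xs, var_ps))


# Hypothetical heteroscedastic data.
xs = [0.3, -1.2, 2.1, 0.4, -0.7]
var_ps = [1.0, 0.5, 2.0, 1.0, 0.25]
grid = [k * 0.01 for k in range(0, 1001)]
tau2_hat_mom = tau2_mom(xs, var_ps)
tau2_hat_mle = tau2_mle_grid(xs, var_ps, grid)
```

When all the `var_ps` are equal the two estimates coincide up to the grid spacing; with unequal variances they generally differ, which is the point of the slide.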
Formulation with Loss Function
To investigate the optimal choice of $\hat\tau^2$ in $q(X_i; \hat\tau^2)$, impose a predictive loss function under which $q_\tau = q(X_i; \tau^2)$ is the natural predictor of q when $\tau^2$ is known.
Then find the ‘best’ (or, just ‘good’) $\hat\tau^2 = \hat\tau^2(\vec X)$ under this loss. Use $\hat q = q(X_i; \hat\tau^2)$ for each i.
• Check-Loss (aka pinball loss or quantile loss). Let b, h > 0; normalize by b + h = 1 (w.l.o.g.). The i-th component of predictive loss is
  $\ell_i(Y_i, q) = b_i (Y_i - q)^+ + h_i (q - Y_i)^+$.
Note that $b_i$, $h_i$ are known and allowed to depend on i.
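A small sketch of the check loss, using the convention in which the weight b falls on under-prediction (Y above q) and h = 1 - b on over-prediction. Under that convention the b-th quantile minimizes the expected loss. The sample and the candidate grid below are hypothetical.

```python
def check_loss(y, q, b):
    """Check (pinball) loss with weights b and h = 1 - b."""
    h = 1.0 - b
    return b * max(y - q, 0.0) + h * max(q - y, 0.0)


def average_loss(ys, q, b):
    """Average check loss of the single prediction q over the sample ys."""
    return sum(check_loss(y, q, b) for y in ys) / len(ys)


# Hypothetical sample 1, 2, ..., 9 and b = 0.7.
ys = [float(k) for k in range(1, 10)]
best_q = min(ys, key=lambda q: average_loss(ys, q, 0.7))
# best_q is 7.0, an empirical 0.7-quantile of the sample
```

Swapping the roles of b and h would instead target the (1 - b)-th quantile, so the orientation of the two terms matters.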
Plot of Check Loss
[Figure not reproduced: check loss for b = 0.7, plotted against an axis in Y and q.]
This loss has been introduced here as a device to evaluate potential Quantile-predictors. But it has motivation in various applications. For example:
[Example slide not reproduced.]
“Improved” E-B quantile prediction: Conceptual Idea
• Define average predictive risk as
  $R(\vec\theta, \hat{\vec q}) = n^{-1} \sum_i E_{\theta_i}[\ell_i(Y_i, \hat q_i(\vec X))]$.
• When using a particular hyper-parameter estimator, $\hat\tau^2 = \hat\tau^2(\vec X)$, this is
  $R(\vec\theta; \hat\tau^2) = n^{-1} \sum_i E_{\theta_i}[\ell_i(Y_i, q_i(X_i; \hat\tau^2(\vec X)))]$.
• Goal is to create a good/best estimator $\hat\tau^2$.
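• KEY is to find a good and (nearly) unbiased estimator of
  $n^{-1} \sum_i E_{\theta_i}[\ell_i(Y_i, q_i(X_i; \tau^2))]$.
  [Note what happened to $\tau^2$: it is now a fixed value, not an estimator computed from $\vec X$.]
• Call it $\widehat{RE}(\vec X; \tau^2)$. [$\widehat{RE}$ is a fnct. of $\vec X$, but not of $\vec Y$.]
• This will have
  (1) $E_{\theta_i}[\widehat{RE}(\vec X; \tau^2)] \approx n^{-1} \sum_i E_{\theta_i}[\ell_i(Y_i, q_i(X_i; \tau^2))]$.
• Then choose
  $\hat\tau^2_{RE}(\vec X) = \arg\min_{\tau^2} \{ \widehat{RE}(\vec X; \tau^2) \}$.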
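Why does this yield the best $\hat\tau^2$ (as $n \to \infty$)? Because there is a best $\hat\tau^2$ (as $n \to \infty$).
Oracle “Predictor”
• For every $\vec\theta$ let
  $\tau^2_{OR} = \tau^2_{OR}(\vec X; \vec\theta) = \arg\min_{\tau^2} R(\vec\theta; \tau^2)$.
• Then show [as $n \to \infty$ for every ($L_1$ b’nded) seq $\vec\theta$]
  (2) $\hat\tau^2_{RE}(\vec X) \to \tau^2_{OR}(\vec X; \vec\theta)$ in prob, &
  (3) $R(\vec\theta; \hat\tau^2_{RE}) \to R(\vec\theta; \tau^2_{OR})$.
• This conceptual scheme involves two major steps: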
Two Major Steps
• Step 1: Create the (asymptotic) risk estimator $\widehat{RE}(\tau^2)$ to yield a suitable approximation (1).
• Step 2: Prove it has the desired convergence and risk properties (2) and (3).
The two steps are interrelated:
• $\widehat{RE}(\tau^2)$ needs to produce a satisfactory approximation in (1), and be computationally feasible.
• AND it also needs to enable Step 2.
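To make the oracle target in (2) and (3) concrete, here is a rough sketch that evaluates the risk $R(\vec\theta; \tau^2)$ exactly for a fixed (non-random) $\tau^2$ and minimizes it over a grid. It is not the paper's procedure. It assumes $\eta = 0$, treats `var_ps` and `var_fs` as variances, puts the weight b on under-prediction, and uses the fact that q - Y is Gaussian when $\tau^2$ is fixed. All names and inputs are hypothetical, and the oracle needs the true means, which are unavailable in practice.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def G(w, b):
    """G(w, b) = phi(w) + w * Phi(w) - b * w."""
    return Z.pdf(w) + w * Z.cdf(w) - b * w


def risk_fixed_tau2(tau2, thetas, var_ps, var_fs, bs):
    """Average predictive check-loss risk of q(X_i; tau2) for a fixed tau2."""
    total = 0.0
    for theta, var_p, var_f, b in zip(thetas, var_ps, var_fs, bs):
        alpha = tau2 / (tau2 + var_p)
        shift = sqrt(var_f + alpha * var_p) * Z.inv_cdf(b)
        mean_gap = (alpha - 1.0) * theta + shift   # mean of q - Y
        sd_gap = sqrt(var_f + alpha ** 2 * var_p)  # sd of q - Y
        total += sd_gap * G(mean_gap / sd_gap, b)
    return total / len(thetas)


def oracle_tau2(thetas, var_ps, var_fs, bs, grid):
    """Grid minimizer of the fixed-tau2 risk; requires the true thetas."""
    return min(grid, key=lambda t: risk_fixed_tau2(t, thetas, var_ps, var_fs, bs))


# Hypothetical configuration.
thetas = [5.0, -5.0, 4.0, -6.0]
var_ps = [1.0, 0.5, 2.0, 1.0]
var_fs = [1.0, 1.0, 1.0, 1.0]
bs = [0.7, 0.9, 0.6, 0.8]
grid = [k * 0.05 for k in range(0, 201)]
tau2_or = oracle_tau2(thetas, var_ps, var_fs, bs, grid)
```

A data-driven risk estimator is useful exactly to the extent that its minimizer tracks this oracle value without access to `thetas`.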
Step 1: Asymptotic Risk Estimator
• Need to satisfy
  (1) $E_{\theta_i}[\widehat{RE}(\vec X; \tau^2)] \approx n^{-1} \sum_i E_{\theta_i}[\ell_i(Y_i, q_i(X_i; \tau^2))]$,
  where $\ell_i$ is check-loss.
• If $\ell_i$ were squared error then one could use SURE.
• But $\ell_i$ is not differentiable. So something like SURE seems a big stretch!
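• HOWEVER,
  $\ell_i^*(\theta_i, q) \triangleq E^{Y_i}_{\theta_i}[\ell_i(Y_i, q)] = \nu_{F,i}\, G\!\left(\frac{q - \theta_i}{\nu_{F,i}},\ b_i\right)$, where
  $G(w, b) = \phi(w) + w\Phi(w) - bw$.
• G is a smooth, $C^\infty$, function. And $\ell_i^*(\theta_i, q) \ge 0$ plays the role of a conventional loss function.
• Here’s a picture of G for b = 0.7: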
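The identity between the expected check loss and G can be checked numerically. The sketch below compares the closed form with a trapezoid-rule integral of the loss against the $N(\theta, \nu_F^2)$ density, using the loss with weight b on $(Y - q)^+$ and $1 - b$ on $(q - Y)^+$. Function names, the integration settings and the inputs are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def G(w, b):
    """G(w, b) = phi(w) + w * Phi(w) - b * w."""
    return Z.pdf(w) + w * Z.cdf(w) - b * w


def expected_loss_closed_form(theta, q, nu_f, b):
    """nu_f * G((q - theta) / nu_f, b), with nu_f a standard deviation."""
    return nu_f * G((q - theta) / nu_f, b)


def expected_loss_numeric(theta, q, nu_f, b, m=20001, width=10.0):
    """Trapezoid-rule integral of the check loss against the N(theta, nu_f^2) density."""
    dist = NormalDist(theta, nu_f)
    lo = theta - width * nu_f
    step = 2.0 * width * nu_f / (m - 1)
    total = 0.0
    for k in range(m):
        y = lo + k * step
        loss = b * max(y - q, 0.0) + (1.0 - b) * max(q - y, 0.0)
        weight = 0.5 if k in (0, m - 1) else 1.0
        total += weight * loss * dist.pdf(y)
    return total * step


# Hypothetical inputs.
closed = expected_loss_closed_form(theta=1.0, q=2.5, nu_f=2.0, b=0.7)
numeric = expected_loss_numeric(theta=1.0, q=2.5, nu_f=2.0, b=0.7)
```

Since the derivative of G in w is $\Phi(w) - b$, G is minimized at $w = \Phi^{-1}(b)$, which recovers the true quantile $\theta_i + \nu_{F,i}\Phi^{-1}(b_i)$ as the best constant prediction.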
Plot of the function G
[Figure not reproduced: G for b = 0.7, annotated “Approximation theory based non-linear approximation of the loss” and, in two places, “Linear tail approximation”.]
Creation of the Asymptotic Risk Estimator $\widehat{ARE}$
For $\nu_{P,i} = 1$ (for simplicity):
(a) Approximate $G(w, b_i)$ by Taylor expansion (“TE”) to K(i) terms.
(b) Substitute $q_i(X_i; \tau) - \theta_i = w$.
(c) The resulting expectation is, say, $E_{\theta_i}[TE(X_i, \theta_i)]$. It includes powers of $\theta_i$ times functions of $X_i$.
(d) Integrate by parts to create an equivalent expectation that contains only functions of $X_i$ & no terms with $\theta_i$.
(e) Step (d) could be understood as a version of SURE generalized to higher powers of $\theta_i$.
(f) Actual computation is greatly facilitated by use of Hermite polynomials.
(g) The resulting expectand is a function of $X_i$ that is an unbiased estimate of the expected Taylor approximation. That’s the core of our $\widehat{ARE}$.
(h) There are some additional (clever) steps needed to handle large values of w, for which Taylor expansion is not good.
(i) Just to impress you, here’s the core of $\widehat{ARE}$ without the additional steps in (h): (In the following $U_i(\tau)$ is a minor truncation/modification of $X_i$ and $d_i(\tau)$ is a simple rational function of $\tau, \nu_{P,i}, b_i$. This is copied from the paper, which uses the notation “$\tau$” in place of “$\tau^2$”.)
    [Displayed formula not reproduced.]
(j) Choose $K_n(i)$, the number of terms in the Taylor expansion. This varies with i, but is $O(\log n)$.
(k) Then add over i to get $\widehat{ARE}$.
(l) Finally, find the desired $\arg\min_\tau(\widehat{ARE})$ via a discrete grid search.
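One standard fact that makes steps (d) to (f) plausible: for X ~ N(θ, 1), the probabilists' Hermite polynomial $He_k$ satisfies $E_\theta[He_k(X)] = \theta^k$, so $He_k(X)$ is an unbiased estimate of $\theta^k$ built from X alone. The sketch below checks this numerically. It illustrates only this identity, not the paper's actual $\widehat{ARE}$ formula, and the function names and integration settings are hypothetical.

```python
from math import exp, pi, sqrt


def hermite_he(k, x):
    """Probabilists' Hermite polynomial He_k(x) via He_{j+1} = x*He_j - j*He_{j-1}."""
    prev, curr = 1.0, x
    if k == 0:
        return prev
    for j in range(1, k):
        prev, curr = curr, x * curr - j * prev
    return curr


def mean_of_hermite(k, theta, m=40001, width=12.0):
    """E[He_k(X)] for X ~ N(theta, 1), computed by the trapezoid rule."""
    lo = theta - width
    step = 2.0 * width / (m - 1)
    total = 0.0
    for i in range(m):
        x = lo + i * step
        density = exp(-0.5 * (x - theta) ** 2) / sqrt(2.0 * pi)
        weight = 0.5 if i in (0, m - 1) else 1.0
        total += weight * hermite_he(k, x) * density
    return total * step


# Hypothetical mean; each entry pairs E[He_k(X)] with theta**k for comparison.
theta = 1.5
pairs = [(mean_of_hermite(k, theta), theta ** k) for k in range(5)]
```

Replacing each power of θ in a Taylor polynomial by the matching Hermite polynomial in X therefore gives a function of X with the same expectation, which is the sense in which the construction resembles SURE for higher powers.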
Step 2: Prove the desired asymptotic properties
There are some clues in Xie, Kou and Brown (2012). But parts of the proof here differ in key respects from what’s there, and the details here are much more complex.
Simulation #1
Homoscedastic ($\sigma^2 = 1$). Two types of $(\theta_i, b_i)$ pairs:
  (0.58, 0.51) with prob 0.9
  (5.1, 0.9) with prob 0.1
Comparison of pred. risks of three pred. methods

  # coord’s →   n = 20                         n = 50
  Method ↓      Efficiency   Ave. of τ²        Efficiency   Ave. of τ²
  ARE           86%          1.21              99%          0.344
  ML/MM         68%          0.037             69%          0.037
  Oracle        100%         0.296                          0.296

Note: Because the model is homoscedastic, Max Lik and Meth Mom are the same.
Simulations #3
This was a set of 6 simulations involving different scenarios, all with heteroscedastic data and random choices for parameters and $b_i$. E.g., in the first setup (the simplest)
  $\nu_P^2 \sim U(0.1, 0.33)$, $\nu_F^2 = 1$, $\theta \sim U(0, 1)$, $b \sim U(0.5, 0.99)$, all indep.
In these 6 scenarios the efficiency of our ARE predictor relative to the oracle was
  For n = 20: 88% < Eff. < 98%
  For n = 100: 91% < Eff. < 99%.
[Note: The last scenario involved uniformly distributed observations, rather than normally distributed ones. But it isn’t reported in the manuscript whether the oracle knew and used that fact. I’ll need to ask Gourab and Paat.]
Recap
• Problem involves multivariate normal means
• Goal is quantile predictions of future observations
• A shrinkage predictor is produced
  o Shrinkage is produced by a hierarchical conjugate setup.
  o Quality of quantile prediction is measured through predictive check-loss, which is the natural loss for quantile prediction.
• Methodology involves creation and use of a complicated but computable Asymptotic Risk Estimate ($\widehat{ARE}$) involving Hermite polynomials.
• Method is asymptotically justified. And
• Appears to work well in simulations for moderate n.
Concluding Remarks for BFF4
1. Quantile predictor methodology is motivated from a Bayesian hierarchical structure.
2. The form of the predictor then drives the methodology.
3. The ARE method is a CONDITIONAL FREQUENTIST notion.
4. BUT: The proposed ARE predictor is NOT BAYESIAN. It doesn’t result from any prior I can think of, not even asymptotically.
5. Well known, but often overlooked: Prediction is different from parameter estimation. An estimate of the sample quantile doesn’t directly lead to quantile prediction.