DIPARTIMENTO DI INFORMATICA E SCIENZE DELL'INFORMAZIONE
UNIVERSITA' DI GENOVA - ITALY
Technical Report DISI-TR-00-4 (v 1.1)

Discontinuous and Intermittent Signal Forecasting: A Hybrid Approach

Francesco Masulli
Istituto Nazionale per la Fisica della Materia &
DISI - Dipartimento di Informatica e Scienze dell'Informazione, Universita di Genova, Via Dodecaneso 35, I-16146 Genova, Italy
E-mail: masulli@disi.unige.it

Giovambattista Cicioni
Istituto di Ricerca sulle Acque, Consiglio Nazionale delle Ricerche, Via Reno 1, I-00198 Roma, Italy
E-mail: cicioni@irsa.rm.cnr.it

Leonard Studer
Institut de Physique des Hautes Energies, Universite de Lausanne, CH-1015 Dorigny, Switzerland
E-mail: leonard.studer@iphe.unil.ch
Abstract
A constructive methodology for shaping a neural model of a non-linear process, supported by results and prescriptions related to the Takens-Mane theorem, has been recently proposed. Following this approach, the measurement of the first minimum of the mutual information of the output signal and the estimation of the embedding dimension using the method of global false nearest neighbors permit to design the input layer of a neural network or a neuro-fuzzy system to be used as a predictor. In this paper we present an extension of this prediction methodology to discontinuous or intermittent signals. As the universal function approximation theorems for neural networks and fuzzy systems require the continuity of the function to be approximated, we apply the Singular-Spectrum Analysis (SSA) to the original signal, in order to obtain a family of time series components that are more regular than the original signal and can be, in principle, individually predicted using the mentioned methodology. On the basis of the properties of SSA, the prediction of the original series can be recovered as the sum of those of all the individual series components. We show an application of this prediction approach to a hydrology problem concerning the forecasting of daily rainfall intensity series, using a database collected over a period of 10 years from 135 stations distributed in the Tiber river basin.
Chapter 1
Introduction
In the last few years, neural networks and fuzzy systems have been widely used in non-linear dynamic systems modeling and forecasting. Those applications are supported by the universal function approximation properties holding for both neural networks and fuzzy systems [3, 12, 33]. However, from these theorems no information can be obtained in order to define the structure of the learning machine, while the application of this kind of systems, such as Multi-Layer Perceptrons (MLPs), to the problem of time series forecasting implies the setting of: the number of units in the input layer, the sampling time of the series, and the structure and dimension of the hidden layers. The neural network theory gives only general suggestions in order to choose these quantities. The specificities of the data set have to be taken into account at this level to tailor the MLP to the time series which has to be forecasted.
As shown by our group in [26, 22], a constructive methodology for shaping a supervised neural model of a non-linear process can be based on the results and prescriptions related to the Takens-Mane theorem and on the Global False Nearest Neighbors (FNN) method proposed by Abarbanel [1]. In [10] a similar methodology has been independently proposed.
In this paper we present an extension of this methodology to the prediction of discontinuous or intermittent signals, based on an ensemble method using a decomposition based on the Singular-Spectrum Analysis (SSA) [31]. The SSA decomposes the raw signal into reconstructed components. Each reconstructed component can be, in principle, predicted separately using the approach presented in [22]. Then, the prediction of the original series can be recovered as the sum of those of all the individual series components. The approach followed in the paper is based on the prediction of reconstructed waves, which are sums of disjoint groups of reconstructed components, and on the following recomposition of the forecasting of the raw signal by addition of the predictions of the individual reconstructed waves.
We also present an application of the presented methods to the forecasting of rainfall intensity series collected in the Tiber basin.
In the next chapter, we present the Multi-Layer Perceptrons and discuss their properties relevant to series forecasting. In Ch. 3 we give the basis of the methodology for time series forecasting. In Ch. 4 we introduce the Singular Spectrum Analysis and the
ensemble method based on SSA decomposition. In Ch. 5 we present the application to rainfall forecasting and the obtained results. The conclusions of the work are drawn in Ch. 6.
Chapter 2
Learning Time Series using Multi-Layer Perceptrons
Theoretical and experimental results support the use of neural networks in many applicative tasks. In particular, it has been shown that such systems can perform function approximation [3, 12], Bayesian classification [24], unsupervised vector quantization clustering of inputs [14], content-addressable memories [11], linear and non-linear principal component analysis and independent component analysis [9].
Artificial neural networks are made up of simple nodes or neurons interconnected to one another. Generally speaking, a node of a neural network can be regarded as a block that measures the similarity between the input vector and the parameter vector, or weight vector, associated to the node, followed by another block that computes an activation function, normally non-linear [18, 9]. The transfer function of an artificial neuron is given by the equation:
    y = H( ∑_i w_i x_i − θ )                                        (2.1)

where y is the output of the neuron, H is the activation function, the w_i are the weights, the x_i are the inputs, and θ is the threshold.

The most widely used neural network is the Multi-Layer Perceptron (MLP), which is a feed-forward model based on layers of neurons. Nodes of each layer are interconnected with all nodes of the following layer. In this way Multi-Layer Perceptrons perform non-linear maps from an input space to an output space.
Moreover, as demonstrated by the Universal Approximation Theorem [3, 12], an MLP with a single hidden layer (the output nodes constitute the output of the MLP; the remaining nodes constitute the hidden layers of the network), using sigmoid activation functions H(x) = 1 / (1 + exp(−ax)), where a is the slope parameter of the sigmoid function, is sufficient to uniformly approximate any continuous function with support in a unit hypercube.

The non-linear map can be automatically learned from data by an MLP through supervised learning techniques based on the minimization of a cost function, such as
the Root Mean Square (RMS) error. The most diffused learning technique is the Error Back-Propagation, which is an efficient application of the Gradient Descent method [25, 9].
The universal approximation property implies that, if the non-linear dynamical process can be represented by a continuous function, an efficient non-linear model can be built from data using a Multi-Layer Perceptron. Using MLPs, the costly detailed design step of the first-principles model usually implemented in non-linear system identification is transformed into a simpler structuring step of the MLP, plus an optional pre-processing (possibly driven by any understanding of the physical model of the process) of the raw data coming from the field.
Even if, in principle, the function approximation property of MLPs guarantees the feasibility of data-based models of non-linear dynamical systems, the neural network theory does not give any suggestion about many details: for example, no general prescriptions are available concerning the dimension of the data window (i.e., the input layer of the MLP), the sampling rate of the input data, the dimension of the hidden layers, and the dimension of the training set, and then most of the time those fundamental design parameters have to be obtained by experiments and heuristics.
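The training procedure sketched in this chapter (a feed-forward MLP with sigmoid hidden units, fitted by gradient descent on a squared-error cost) can be illustrated in a few lines of numpy. This is a minimal sketch, not the authors' code: the layer sizes, learning rate, and toy sine task are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x, a=1.0):
    # H(x) = 1 / (1 + exp(-a x)), with slope parameter a, as in Ch. 2
    return 1.0 / (1.0 + np.exp(-a * x))

class MLP:
    """One-hidden-layer MLP with a linear output unit (illustrative sizes)."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        self.h = sigmoid(X @ self.W1.T + self.b1)   # hidden activations
        return self.h @ self.w2 + self.b2           # linear output unit

    def train_step(self, X, y, lr=0.1):
        # one full-batch gradient-descent step on the (half) mean squared error
        out = self.forward(X)
        err = out - y
        grad_w2 = self.h.T @ err / len(y)
        grad_b2 = err.mean()
        dh = np.outer(err, self.w2) * self.h * (1.0 - self.h)
        grad_W1 = dh.T @ X / len(y)
        grad_b1 = dh.mean(axis=0)
        self.w2 -= lr * grad_w2; self.b2 -= lr * grad_b2
        self.W1 -= lr * grad_W1; self.b1 -= lr * grad_b1
        return float(np.sqrt((err ** 2).mean()))    # RMS error before the step

# toy task (an assumption): predict s[t+1] from the previous 3 samples of a sine
s = np.sin(0.3 * np.arange(200))
X = np.stack([s[i:i + 3] for i in range(len(s) - 3)])
y = s[3:]
net = MLP(n_in=3, n_hidden=8)
rms = [net.train_step(X, y) for _ in range(2000)]   # RMS error per epoch
```

The loop is plain gradient descent; the error back-propagation with momentum used later in Ch. 5 would add a velocity term to each parameter update.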
Chapter 3
Hints from Dynamical Systems Theory
3.1 Dynamical Systems and Chaos Theory

3.1.1 State Space
A deterministic dynamical system is described by a set of differential equations. Its evolution is represented by the trajectory in state space (of dimension n) of the vector Q = (x, ẋ, y, ẏ, z, ż, …), where x, ẋ, y, ẏ, z, ż, … are the variables of the system and their derivatives. The figure made in state space by Q is the attractor of the system.
For non-linear systems, the dynamical variables (x, y, z, …) are coupled. The evolution of one variable (say x) is not independent of all the other ones (y, z, …). Except for a few simple phenomena, the set of differential equations is unknown. Often, even the whole set of relevant effective dynamical variables is not well defined. But, as the variables are interdependent, the observation of only one of them brings information (maybe in an implicit way) on the other ones and consequently on the complete dynamical system. This is the reason why time series of non-linear dynamic systems are so useful.
3.1.2 Embedding Theorem
The question is now: "How to reconstruct the complete dynamical system with only the one-variable time series s_1, s_2, s_3, …?" Here the theory of dynamical systems gives an answer. In particular, the Embedding Theorem proposed independently in 1981 by Takens and Mane [27, 17] gives an answer to the above question.

In the Takens-Mane theorem we consider an augmented vector S built with d elements of the time series. The dimension d of the vector has to be greater than two times the box-counting dimension D_0 of the attractor of the system:

    d > 2 D_0                                                       (3.1)
A vector S satisfying the Takens-Mane bound cited in the previous paragraph will evolve in a reconstructed state space, and its evolution will be in a diffeomorphic relation with the original Q state space point (a diffeomorphism is a smooth one-to-one relation). In other words, for every practical purpose the evolution of S is a fair copy of the evolution of Q.
It is worth noting that there is a distinction between the order of the differential equation (n), which is the dimension of the state space where the true state vector Q lives, and the sufficient dimension of a reconstructed state space (d), where the reconstructed vector S lives.
3.1.3 An Example
In order to elucidate the Embedding Theorem, let us consider a sine wave s_t = A sin(t). In d = 1 (i.e., the s_t space) this wave oscillates in the interval [−A, A]. Two points which are close in the sense of the Euclidean (or another) distance may have quite different values of the derivative ṡ_t. In this way two "close" points may move in opposite directions along the single spatial axis. In a two-dimensional space (s_t, s_{t+T}), where T is a time lag, the ambiguity of the dynamics of the points is resolved. The system evolves on a figure (in general an ellipse) that is topologically equivalent to a circle. If we draw the sine wave in the three dimensions (s_t, s_{t+T}, s_{t+2T}), no further unfolding occurs and the sine is represented as a new ellipse.
3.1.4 The Method of Embedding
In order to reconstruct the dynamical system we can use the time delay embedding method [1]. This method consists in building d-dimensional state vectors S_i = (s_i, s_{i+T}, …, s_{i+(d−1)T}). In principle, it suffices that d ≥ n. But the effective dimension d is not directly related to the dynamical dimension n, as in the case of weakly coupled variables.
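The construction of the delay vectors S_i = (s_i, s_{i+T}, …, s_{i+(d−1)T}) can be sketched as follows; the test series and the parameter values are illustrative assumptions, not values from the text.

```python
import numpy as np

def delay_embed(s, d, T):
    """Build the d-dimensional time-delay state vectors S_i with lag T.

    Row i is (s_i, s_{i+T}, ..., s_{i+(d-1)T}); one row per admissible i.
    """
    s = np.asarray(s)
    n_vectors = len(s) - (d - 1) * T
    if n_vectors <= 0:
        raise ValueError("series too short for this (d, T)")
    return np.stack([s[i : i + (d - 1) * T + 1 : T] for i in range(n_vectors)])

# illustrative use: a sine series embedded in d = 3 with lag T = 7 (assumptions)
s = np.sin(0.1 * np.arange(100))
S = delay_embed(s, d=3, T=7)     # each row is one reconstructed state vector
```

For a series of 100 samples, d = 3 and T = 7 leave 100 − (3 − 1)·7 = 86 state vectors, the first being (s_0, s_7, s_14).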
3.2 Choosing the time delay
The time delay T (or time lag) used in the embedding has to be chosen carefully. If it is too long, the samples s_i, s_{i+T}, …, s_{i+(d−1)T} are not correlated¹ and then, in general, the dynamical system cannot be reconstructed. If it is too short, every sample is essentially a copy of the previous one, bringing very little information on the dynamical system.
We usethe Shannon’s mutual informationto quantify the amountof informationsharedby two samplesin order to get an usefulestimationof the time lag T. Let’sdefinedtheaverage mutualinformationbetweenmeasurementsai drawn from thesetA andmeasurementsbi drawn from setB. Thesetof measurementsA is madeof thevaluesof theobservablesi andthesetB is madeof thevaluessi t (t is a time interval).
¹This happens in particular for chaotic systems, for which even two initially close chaotic trajectories will diverge exponentially in time.
The average mutual information is then:

    I(t) = ∑_{s_i ∈ A, s_{i+t} ∈ B} P(s_i, s_{i+t}) log_2 [ P(s_i, s_{i+t}) / ( P(s_i) P(s_{i+t}) ) ]      (3.2)

where the P are probability distributions based on frequency observations.
It has been suggested [6, 5, 29, 1] to take the time T where the first minimum of I(t) occurs as the value to use as the time delay in the phase space reconstruction. In this way the values of s_n and s_{n+T} are the most independent of each other in an information-theoretic sense.
Moreover, the first minimum of the average mutual information is a good candidate for the interval between the components of the state vectors that will be input to the neural network model of the non-linear dynamical process.
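A histogram-based estimate of Eq. 3.2, together with the first-minimum rule for choosing T, can be sketched as follows. This is a minimal sketch: the bin count, the demo sine series, and the maximum lag are assumptions for illustration, not prescriptions from the text.

```python
import numpy as np

def average_mutual_information(s, t, bins=16):
    """Estimate I(t) of Eq. 3.2 from a 2-D histogram of the pairs (s_i, s_{i+t})."""
    a, b = s[:-t], s[t:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                      # joint frequencies P(s_i, s_{i+t})
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal P(s_i)
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal P(s_{i+t})
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

def first_minimum_lag(s, max_lag=30):
    """Return the lag of the first local minimum of I(t), as suggested in the text."""
    ami = [average_mutual_information(s, t) for t in range(1, max_lag + 1)]
    for t in range(1, len(ami) - 1):
        if ami[t] < ami[t - 1] and ami[t] <= ami[t + 1]:
            return t + 1                            # lags are 1-based
    return int(np.argmin(ami)) + 1                  # fallback: global minimum

# illustrative use (an assumption): a sine of period 40 samples
s_demo = np.sin(2 * np.pi * np.arange(2000) / 40.0)
lag = first_minimum_lag(s_demo)
```

For a periodic signal the first minimum falls near a quarter period, where consecutive components of the state vector are least redundant.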
3.3 Evaluating the Global Embedding Dimension
From the Embedding Theorem, the box counting dimension D_0 should be evaluated. In principle, it can be estimated directly from the time series itself, but this task is very sensitive to the noise and needs a large set of data points (of the order of 10^D_0 data points) [1].
In order to avoid those problems, we can estimate the embedding dimension d_E, defined as the lowest (integer) dimension which unfolds the attractor, i.e., the minimal dimension for which foldings due to the projection of the attractor into a lower dimensional space are avoided. The embedding dimension is a global dimension and in general is different from the local dimension of the underlying dynamics.
The Embedding Theorem guarantees that if the dimension of the attractor is D_0, then we can unfold the attractor in a space of dimension d_E (d_E > 2 D_0). It is worth noting that this bound on d_E is not a necessary condition for unfolding, but a sufficient one.
The dimension of the input layer of the Multi-Layer Perceptron will then be high enough that the deterministic part of the dynamics of the system is unfolded.
3.3.1 Global False Nearest Neighbors
In practice, the method of Global False Nearest Neighbors proposed by Abarbanel [1] can be used to evaluate the embedding dimension d_E. Given a data space reconstruction in dimension d, with data vectors S_i = (s_i, s_{i+T}, …, s_{i+(d−1)T}), where the time delay T is the first minimum of the average mutual information (Eq. 3.2), let S_i^NN = (s_i^NN, s_{i+T}^NN, …, s_{i+(d−1)T}^NN) be the nearest neighbor vector in phase space. If the vector S_i^NN is a false neighbor (FNN) of S_i, having arrived in its neighborhood by projection from a higher dimension because the present dimension d does not unfold the attractor, then by going to the next dimension d+1 we may move this point out of the neighborhood of S_i.
We define the distance ξ_i between points when seen in dimension d+1 relative to the distance in dimension d as

    ξ_i = sqrt( [ R²_{d+1}(i) − R²_d(i) ] / R²_d(i) )               (3.3)

then

    ξ_i = | s_{i+dT} − s^NN_{i+dT} | / R_d(i)                       (3.4)

As suggested by Abarbanel [1], S_i^NN and S_i can be classified as false neighbors if ξ_i is greater than a threshold θ (ξ_i > θ). In many applications a good value for θ is 15.
In the case of clean data from a dynamical system, we expect that the percentage of FNNs will drop from nearly 100% in dimension one to close to zero when d_E is reached.

It is worth noting that, as we go to higher dimensional spaces, the volume available for the data grows as the distance to the power of the dimension, and no near neighbor will be classified as a close neighbor. In this case we can modify Eq. 3.4 as

    ξ_i = | s_{i+dT} − s^NN_{i+dT} | / R_A                          (3.5)

where R_A is the nominal "radius" of the attractor, defined as the Root Mean Square (RMS) value of the data about its mean, e.g.:

    R_A = (1/N) ∑_{i=1}^{N} | s_i − s_av |                          (3.6)

    s_av = (1/N) ∑_{i=1}^{N} s_i                                     (3.7)
In [20] a very efficient implementation of the FNN algorithm is presented. This algorithm is based on the work by Nene and Nayar [21].
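The FNN test of Eq. 3.4 can be sketched with a brute-force nearest-neighbor search. This is a minimal sketch, not the fast implementation of [20, 21]: the criterion used is the ratio of Eq. 3.4 with the threshold θ = 15 from the text, and the test signal and parameters are illustrative assumptions.

```python
import numpy as np

def fnn_fraction(s, d, T, theta=15.0):
    """Fraction of false nearest neighbors in embedding dimension d (Eq. 3.4)."""
    s = np.asarray(s, float)
    n = len(s) - d * T                      # we need s[i + d*T] for the test
    # d-dimensional delay vectors S_i = (s_i, s_{i+T}, ..., s_{i+(d-1)T})
    S = np.stack([s[i : i + d * T : T] for i in range(n)])
    false = 0
    for i in range(n):
        dist = np.linalg.norm(S - S[i], axis=1)
        dist[i] = np.inf
        j = int(np.argmin(dist))            # nearest neighbor in dimension d
        Rd = dist[j]
        # growth of the distance when the (d+1)-th coordinate is added
        extra = abs(s[i + d * T] - s[j + d * T])
        if extra / max(Rd, 1e-12) > theta:  # Eq. 3.4 criterion, xi_i > theta
            false += 1
    return false / n

# illustrative use (assumptions): a sine needs d = 2 to unfold, so the FNN
# fraction should fall sharply from d = 1 to d = 2
s_demo = np.sin(0.3 * np.arange(500))
f1 = fnn_fraction(s_demo, d=1, T=5)
f2 = fnn_fraction(s_demo, d=2, T=5)
```

The normalization by the attractor radius R_A of Eq. 3.5 would add a second criterion for sparse high-dimensional reconstructions; it is omitted here for brevity.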
It is worth noting that there are two main arguments that can suggest sizing the input layer of a predictor based on MLPs smaller than the evaluation obtained using the FNN method. In fact this evaluation is still an upper bound, and moreover, for an assigned size of the training set, a limitation of the complexity of the learning machine can lead to better generalization.
3.3.2 Bells, Whistles, and Pitfalls of FNN

- The global FNN calculation is simple and fast.
- The FNN calculation applied to signals coming from two different outputs of the same dynamical system gives, in general, two different values of d_E. Then from each signal we will obtain different reconstructed coordinate systems, but both consistent with the original dynamical system.
- The FNN method is valid even if the signal of interest results from a filtered output of a dynamical system [1, 4].
- If the signal is contaminated by noise (assumed to be generated by a high dimensional system), it may be that the contamination will dominate the signal of interest and FNN will show the dimension required to unfold the contamination. Here, a simple byproduct of the FNN calculation is an indication of the noise level in a signal.
Chapter 4
Ensemble Method based on Singular Spectrum Analysis Decomposition
4.1 Singular Spectrum Analysis
The methodology described in the previous section has been successfully applied in the design of Multi-Layer Perceptrons and Neuro-Fuzzy systems for the forecasting of simulated non-linear and chaotic systems [26, 22] and of real world problems, such as the modeling of the vibration dynamics of a real system consisting in a 150 MW Siemens steam turbine [19].
The proposed methodology cannot be directly applied to forecasting discontinuous or intermittent signals, as the universal function approximation theorems for neural networks [3] and fuzzy systems [33] require the continuity of the function to be approximated.
In order to avoid the effect of the discontinuities of a signal, we can apply the Singular-Spectrum Analysis (SSA) [15, 23, 31, 16] to the signal to be forecasted. In SSA the state vector S_i = (s_i, s_{i+1}, …, s_{i+M−1}) is a temporal window (augmented vector) of the series s, made up of a given number of samples M.

The cornerstone of SSA is the Karhunen-Loeve expansion, or Principal Component Analysis (PCA) [28], which is based on the eigenvalue problem of the lagged covariance matrix Z_s. Z_s has a Toeplitz structure, i.e., constant diagonals corresponding to equal lags:

          | c_0      c_1     ...   c_{M-1} |
    Z_s = | c_1      c_0     ...   ...     |                        (4.1)
          | ...      ...     ...   c_1     |
          | c_{M-1}  ...     c_1   c_0     |
In the absence of prior information about the signal, it has been suggested [31] to use the following estimate for Z_s:

    c_j = (1/(N−j)) ∑_{i=1}^{N−j} s_i s_{i+j}                       (4.2)
The original series can be expanded with respect to the orthonormal basis corresponding to the eigenvectors of Z_s:

    s_{i+j} = ∑_{k=1}^{M} p^k_i u^k_j,    1 ≤ j ≤ M, 0 ≤ i ≤ N − M      (4.3)

where the p^k_i are called principal components (PCs), the eigenvectors u^k_j are called the empirical orthogonal functions (EOFs), and the orthonormality property

    ∑_{k=1}^{M} u^k_j u^k_l = δ_{jl},    1 ≤ j ≤ M, 1 ≤ l ≤ M           (4.4)

holds. It is worth noting that SSA does not resolve periods longer than the window length M. Hence, if we want to reconstruct a strange attractor, whose spectrum includes periods of arbitrary length, the larger M the better, while avoiding exceeding M = N/3 (otherwise statistical errors could dominate the last values of the auto-covariance function).
In [30, 8, 31, 13, 16, 7] many applications of Singular Spectrum Analysis have been presented, including noise reduction, detrending, spectral estimation, and prediction.
Concerning the application of SSA to prediction, which is the main interest of the present paper, it is supported by the following argument. Since the PCs are filtered versions of the signal, and typically band-limited, their behavior is more regular than that of the raw series s, and hence more predictable.
Vautard and Ghil in [31] fit an autoregressive (AR) model for each individual PC using the AR coefficient estimate of Burg [2], while Lisi, Nicolis and Sandri [16] used Multi-Layer Perceptrons in order to estimate filtered versions of the raw signal obtained using SSA.
In order to reduce the computational costs, we decompose the raw series s into reconstructed waves corresponding to SSA subspaces with similar explained variance, and we predict them using Multi-Layer Perceptrons combined with the independent evaluation of the time lag, using the first minimum of the mutual information, and of the embedding dimension, using the False Nearest Neighbors method.
4.2 Reconstructed components and reconstructed waves

Following Vautard and Ghil [31], suppose we want to reconstruct the original signal s_i starting from an SSA subspace K of k eigenvectors. By analogy with Eq. 4.3, the problem can be formalized as the search for a series s̃ of length N such that the quantity

    H²(s̃) = ∑_{i=0}^{N−M} ∑_{j=1}^{M} ( s̃_{i+j} − ∑_{k∈K} p^k_i u^k_j )²      (4.5)

is minimized. In other words, the optimal series s̃ is the one whose augmented version S̃ is the closest, in the least-squares sense, to the projection of the augmented series S onto the EOFs with indices belonging to K.
The solution of the least-squares problem of Eq. 4.5 is given by

    s̃_i = (1/M) ∑_{j=1}^{M} ∑_{k∈K} p^k_{i−j} u^k_j                      for M ≤ i ≤ N − M + 1

    s̃_i = (1/i) ∑_{j=1}^{i} ∑_{k∈K} p^k_{i−j} u^k_j                      for 1 ≤ i ≤ M − 1        (4.6)

    s̃_i = (1/(N − i + 1)) ∑_{j=i−N+M}^{M} ∑_{k∈K} p^k_{i−j} u^k_j        for N − M + 2 ≤ i ≤ N
When K consists of a single index k, the series s̃ is called the kth RC, and will be denoted by s^k.

RCs have additive properties, i.e.,

    s̃ = ∑_{k∈K} s^k                                                 (4.7)

In particular, the series s can be expanded as the sum of its RCs:

    s = ∑_{k=1}^{M} s^k                                              (4.8)

Note that, despite its linear aspect, the transform changing the series s into s^k is, in fact, non-linear, since the eigenvectors u^k depend non-linearly on s.

If we truncate this sum to an assigned number of RCs, the explained variance of the related augmented vector S is the sum of the eigenvalues associated to those RCs, while the estimation of the resulting reconstruction error is the sum of the eigenvalues corresponding to the remaining RCs. As a consequence, it is convenient to order the RCs following the value of the eigenvalues.

Let K_1, K_2, …, K_L be L disjoint subspaces; then a reconstructed wave (RW) Ω_l (l = 1, …, L) is defined as

    Ω_l = ∑_{k∈K_l} s^k,    1 ≤ l ≤ L                                (4.9)

Then, from Eqs. 4.8 and 4.9, one can obtain

    s = ∑_{l=1}^{L} Ω_l                                              (4.10)

which says that the original series s can be recovered as the sum of all the individual RWs.
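The pipeline of this chapter can be sketched end-to-end: lagged covariances (Eq. 4.2), the Toeplitz matrix (Eq. 4.1), EOFs and PCs (Eq. 4.3), and reconstructed components obtained by diagonal averaging as in Eq. 4.6, whose sum restores the original series (Eq. 4.8). This is a minimal sketch; the test signal and window length are assumptions, and zero-based indexing is used.

```python
import numpy as np

def ssa_decompose(s, M):
    """Return the M reconstructed components (RCs) of the series s."""
    s = np.asarray(s, float)
    N = len(s)
    # lagged covariance estimate c_j (Eq. 4.2) and its Toeplitz matrix (Eq. 4.1)
    c = np.array([np.dot(s[: N - j], s[j:]) / (N - j) for j in range(M)])
    Z = np.array([[c[abs(j - l)] for l in range(M)] for j in range(M)])
    eigval, U = np.linalg.eigh(Z)               # columns of U are the EOFs
    order = np.argsort(eigval)[::-1]            # order by decreasing eigenvalue
    U = U[:, order]
    # augmented vectors (rows) and principal components p_i^k (Eq. 4.3)
    S = np.stack([s[i : i + M] for i in range(N - M + 1)])
    P = S @ U
    # diagonal averaging (Eq. 4.6): each sample is the average of its
    # reconstructions over all windows that cover it
    counts = np.zeros(N)
    for i in range(N - M + 1):
        counts[i : i + M] += 1
    rcs = np.zeros((M, N))
    for k in range(M):
        comp = np.zeros(N)
        for i in range(N - M + 1):
            comp[i : i + M] += P[i, k] * U[:, k]
        rcs[k] = comp / counts
    return rcs

# illustrative use (assumptions): a two-frequency signal, window M = 30
s = np.sin(0.2 * np.arange(300)) + 0.1 * np.cos(0.7 * np.arange(300))
rcs = ssa_decompose(s, M=30)                    # rcs.sum(axis=0) restores s
```

Summing disjoint groups of rows of `rcs` yields the reconstructed waves Ω_l of Eq. 4.9.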
4.3 Hybrid Approach to Complex Signal Prediction
In order to design a predictor for complex signals, such as discontinuous and/or intermittent signals, we can apply the following approach, which combines an unsupervised step and a supervised one, building up in such a way an ensemble of learning machines:

- Unsupervised decomposition: using the Singular Spectrum Analysis, decompose the original signal S into reconstructed waves (RWs), corresponding to subspaces with equal explained variance;
- Supervised learning: prepare a predictor for each RW using the methodology described in Ch. 3;
- Operational phase: the prediction of the original signal S is then obtained as the sum of the predictions of the individual RWs, i.e., using Eq. 4.10.

It is worth noting that sometimes the most complex waves (in general those corresponding to the low eigenvalues) cannot be satisfactorily predicted using the available data. Following the criterion of the best prediction [16], we can exclude them from the sum in Eq. 4.10 if, when included, they worsen the overall prediction.
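The operational phase above can be sketched with stand-in per-wave predictors. Here simple least-squares one-step AR models replace the MLPs of Ch. 3 for brevity, and the two sinusoidal stand-in waves and the order p are assumptions; the point illustrated is the recomposition of the forecast as the sum of the individual RW forecasts (Eq. 4.10).

```python
import numpy as np

def fit_ar_predictor(w, p=5):
    """Least-squares one-step predictor for a single wave from p past samples."""
    X = np.stack([w[i : i + p] for i in range(len(w) - p)])
    y = w[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ensemble_forecast(waves, p=5):
    """Predict each RW separately, then sum the predictions (Eq. 4.10)."""
    preds = []
    for w in waves:
        coef = fit_ar_predictor(w, p)
        X = np.stack([w[i : i + p] for i in range(len(w) - p)])
        preds.append(X @ coef)
    return np.sum(preds, axis=0)

# stand-ins for two reconstructed waves (assumptions)
t = np.arange(400)
waves = [np.sin(0.05 * t), 0.5 * np.sin(0.3 * t)]
pred = ensemble_forecast(waves)        # one-step forecast of the summed signal
```

Excluding a poorly predicted wave, as discussed above, amounts to dropping its entry from `preds` before the final sum.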
Chapter 5
Application to Rainfall Forecasting
5.1 Data Set and Methods
One of our applications of the previously described forecasting approach concerns the forecasting of daily rainfall intensity series of 3652 samples each, collected by 135 stations located in the Tiber river basin (see Fig. 5.1) in the period 01/01/1958 - 12/31/1967.
The data processing started by considering the series of the Mean Station (MS), defined as the average of all 135 rainfall intensity series (Fig. 5.2). In Fig. 5.3 a window on the period 07/01/66 - 12/30/66 is presented in order to better show the discontinuity and intermittence of the studied signal. Fig. 5.4 shows the graph of the mutual information of the MS's time series. Its first minimum gives T = 7. This value has been used as the time lag for the computation of the Global False Nearest Neighbors. The graph of FNN is shown in Fig. 5.5. Up to d = 6 the curve decreases with growing dimension, and then reaches a plateau of 20%. The embedding dimension is then d_E = 6.

Following the constructive approach described in Ch. 3, we designed a predictor based on a Multi-Layer Perceptron. The MLP was made up of two hidden layers of 5 units and an input layer of 6 inputs spaced by a time lag of 7 days. The results obtained in such a way are poor, due to the discontinuity of the hydrological variable.
In order to reduce the effects of the discontinuities, we used the SSA decomposition ensemble method shown in the previous chapter.
We applied the Singular-Spectrum Analysis (SSA) to a signal corresponding to the first 3000 samples of the MS series. The window width used for the SSA was M = 182, i.e., 6 months, which is a period sufficient to take into account the seasonal periodicities of the related physical phenomena.
Fig. 5.6 shows the ordered list of eigenvalues and the explained variance of the reconstructed signal using an increasing number of RCs.
Then, from the original MS series we obtained 10 waves Ω_1, …, Ω_10, reconstructed from 10 disjoint sub-spaces, each of them representing 10% of the explained variance
Figure 5.1: Distribution of the 135 stations in the Tiber river basin.
Figure 5.2: Mean Station: daily rain (mm). Period 01/01/1958 - 12/31/1967.
Figure 5.3: Mean Station: daily rain (mm). Period 07/01/66 - 12/30/66.
Figure 5.4: Mean Station: mutual information. The first minimum is at t = 7.
Figure 5.5: Mean Station: Global False Nearest Neighbors.
Figure 5.6: Mean Station: eigenvalue spectrum (top) and explained variance of the augmented vectors related to an increasing number of RCs (bottom).
Table 5.1: Reconstructed waves (RWs) from disjoint SSA subspaces (each of them explaining 10% of the variance) and the corresponding reconstructed components (RCs). The SSA is performed using a window of 182 days.

RW     RCs
Ω1     1-4
Ω2     5-11
Ω3     12-19
Ω4     20-28
Ω5     29-39
Ω6     40-52
Ω7     53-70
Ω8     71-93
Ω9     94-126
Ω10    127-182
(Tab. 5.1). Waves Ω_1, …, Ω_6 (corresponding to the first 52 RCs) are more regular than the remaining waves, which correspond to subspaces with low eigenvalues and are more complex (Fig. 5.7).
Fig. 5.8 shows the mutual information for each RW, while Fig. 5.9 shows the corresponding Global False Neighbors plots. The evaluations of the first minimum of the mutual information and of d_E for each RW are presented in Tab. 5.2.
Then, we designed a neural predictor based on an MLP for each individual wave of the MS, following the constructive approach described in Ch. 3, implementing, in such a way, an SSA decomposition ensemble of learning machines.
The best results for each RW have been obtained using as inputs windows of 5 consecutive elements and two hidden layers with the dimensions described in Tab. 5.3. As each wave contains 3652 daily samples, in our case for each wave we obtained a data set of 3646 associative couples, each of them consisting of a window of 5 consecutive elements, as input, and the next day rainfall intensity, as output.
Each MLP was trained using the first 2000 associative couples (training set), using the error back-propagation algorithm with momentum [32] and a batch presentation of the samples. The following 1000 associative couples (validation set) were used in order to implement an early stopping of the training procedure. The remaining 646 were used for measuring the quality of the forecasting of the reconstructed wave (test set).
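The windowing and chronological split described above can be sketched as follows. The random stand-in series is an assumption; note that with a window of 5, a series of 3652 samples yields 3647 windows under this convention, one more than the count quoted in the text, which may use a slightly different windowing.

```python
import numpy as np

def make_couples(w, window=5):
    """Sliding windows of `window` consecutive samples as input,
    the next-day value as the target."""
    X = np.stack([w[i : i + window] for i in range(len(w) - window)])
    y = w[window:]
    return X, y

# stand-in for one reconstructed wave of 3652 daily samples (an assumption)
w = np.random.default_rng(1).normal(size=3652)
X, y = make_couples(w)

# chronological split as in the text: 2000 train, 1000 validation, rest test
X_tr, y_tr = X[:2000], y[:2000]
X_val, y_val = X[2000:3000], y[2000:3000]
X_te, y_te = X[3000:], y[3000:]
```

A chronological (rather than shuffled) split matters here: the validation and test windows must lie strictly after the training period for the forecasting error to be meaningful.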
Table 5.2: First minimum of the mutual information (T) and embedding dimension (d_E) computed using T and other time lags for each reconstructed wave.

RW     T    dE(T)  dE(7)  dE(1)
Ω1     22   4      3      2
Ω2     9    18     14     3
Ω3     4    10     7      4
Ω4     5    18     14     4
Ω5     4    14     9      4
Ω6     3    5      6      4
Ω7     2    4      4      5
Ω8     2    4      4      6
Ω9     2    5      5      4
Ω10    5    10     8      4
5.2 Results and Discussion
The prediction results for each reconstructed wave are presented in Tab. 5.3 and in Fig. 5.10.
The predictions obtained using the SSA decomposition ensemble of learning machines (i.e., the sum of the predictions of the 10 waves) at 1 day ahead are very satisfactory: for the resulting MS prediction, the Root Mean Square (RMS) error on the test set is .95 mm of rain, while the Maximum Absolute (MAXA) error is 6.47 mm, i.e., the predicted signal is substantially coincident with the measured MS rainfall intensity signal.
As shown in Figs. 5.11, 5.12, and 5.13, the predictions of the MS rainfall intensity signal are substantially coincident with the measured MS.¹
It is worth noting that the design of the ensemble learning machine is critical. Choosing a window M = 182 for the SSA, the best prediction results were obtained using MLPs with four or five inputs and two hidden layers. Using MLP predictors with four inputs we obtained slightly worse results: in this case the RMS for the MS is 1.05 mm and the MAXA is 8.05 mm. We notice that the Maximum Absolute error occurs on the same day (11/05/1967) as for the architecture using MLPs with five inputs. A different window for the SSA can give results of inferior quality. E.g., using M = 256 as the window for the SSA, we obtained good prediction performances only for the waves Ω_1, …, Ω_6, corresponding to 60% of the explained variance (the first 76 RCs). The resulting generalization of the SSA decomposition ensem-
¹Note that in the comparison shown in Fig. 5.11 the predicted signal is clamped to zero.
Table 5.3: Size of the hidden layers (L1 and L2), Root Mean Square (RMS) error and Maximum Absolute (MAXA) error for each reconstructed wave. Size of the MLPs' input layer = 5.

RW     L1   L2   RMS   MAXA
Ω1     6    4    .02   .05
Ω2     8    5    .03   .12
Ω3     6    4    .04   .15
Ω4     8    4    .04   .11
Ω5     8    5    .06   .14
Ω6     8    4    .15   .40
Ω7     4    4    .15   .38
Ω8     6    4    .64   1.92
Ω9     3    4    .75   2.40
Ω10    3    4    .29   .90
ble was poor, even leaving out of Eq. 4.10 the predictions of Ω_7, …, Ω_10, which, if included in the sum, worsen the overall prediction.
We underline that the dimension of the optimal input layer (i.e., 5) is smaller than the d_E evaluated with the FNN method (Tab. 5.2). This choice is supported by the generalization trade-off due to the complexity of the learning machine and the limited size of the training set (see the discussion in Ch. 3.3.1). Concerning the time lag between the inputs, we investigated different values, as the first minimum of the mutual information is only a prescription and not a theoretical result (see [26]).
The plateau in the FNN plot of Fig. 5.5 is a symptom of the presence of high dimensional noise [1]. After the SSA decomposition we can notice that the noise is concentrated mainly in RW10 and also in RW3, RW5, and RW9, as shown by the plateaus in their FNN plots (Fig. 5.9).
Figure 5.7: Reconstructed waves RW1-RW10. Period 07/01/1966 - 12/30/1966.
Figure 5.8: Reconstructed waves: mutual information (panels RW1-RW10).
Figure 5.9: Reconstructed waves: Global False Neighbors using T = 7 (panels RW1-RW10).
Figure 5.10: Reconstructed waves: scatter plots on the test set, using MLPs with 5 inputs (panels RW1-RW10).
Figure 5.11: Mean Station: 1 day ahead forecasting in the period 07/01/66 - 12/30/66 using the ensemble of 10 MLPs with 5 inputs.
Figure 5.12: Mean Station: 1 day ahead forecasting. Errors in the period 07/01/66 - 12/30/66 using the ensemble of 10 MLPs with 5 inputs.
Figure 5.13: Mean Station: scatter plot of the 1 day ahead forecasting on the test set using the ensemble of 10 MLPs with 5 inputs.
Chapter 6
Conclusions
In this paper we presented an extension of a methodology for signal forecasting [1, 26, 22] to the case of discontinuous and intermittent signals.
As in [22], we used predictors based on Multi-Layer Perceptrons or Neuro-Fuzzy Systems characterized by the universal function approximation property. The input layers of those predictors are shaped using results and suggestions from the theory of dynamical systems linked to the Takens-Mane theorem [27, 17] about the sufficient dimension of the reconstruction vector for the dynamics of an attractor.
In order to avoid the effect of the discontinuities, in this paper we have proposed an Ensemble Method based on Singular Spectrum Analysis [15, 23, 31, 16] decomposition, following these design steps:

- Unsupervised decomposition: using Singular Spectrum Analysis, decompose the original signal S into reconstructed waves (RWs), corresponding to subspaces with equal explained variance;
- Supervised learning: prepare a predictor for each RW using the methodology described in Ch. 3;
- Operational phase: the prediction of the original signal S is then obtained as the sum of the predictions of the individual RWs, i.e. using Eq. 4.10.
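The decomposition step can be sketched as follows. This is a minimal toy implementation, not the report's code: `ssa_decompose` performs SSA via an SVD of the trajectory matrix followed by diagonal averaging (Hankelization), and the per-RW MLP predictors of Ch. 3 are omitted.

```python
import numpy as np

def ssa_decompose(x, window):
    """Toy SSA: split x into reconstructed waves (RWs) via SVD of the
    trajectory matrix, then recover each series by diagonal averaging."""
    n = len(x)
    k = n - window + 1
    traj = np.array([x[i:i + window] for i in range(k)])   # k x window
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    rws = []
    for j in range(len(s)):
        comp = s[j] * np.outer(u[:, j], vt[j])             # rank-1 piece
        rw = np.zeros(n)
        counts = np.zeros(n)
        for i in range(k):                                 # anti-diagonal averaging
            rw[i:i + window] += comp[i]
            counts[i:i + window] += 1
        rws.append(rw / counts)
    return rws

x = np.sin(0.2 * np.arange(100)) + 0.1 * np.random.default_rng(0).standard_normal(100)
rws = ssa_decompose(x, window=10)
# Additivity underlying the operational phase: the RWs sum back to the signal
assert np.allclose(sum(rws), x)
```

Because the RWs sum exactly to the original signal, summing per-RW forecasts (Eq. 4.10) yields a forecast of the original series once each RW is predicted individually.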
The presented methodology has been successfully applied to the forecasting of rainfall intensity series collected by 135 stations distributed in the Tiber river basin over a period of 10 years.
Moreover, preliminary results for the forecasting of the rainfall series of an individual station are also in good agreement with the data [20].
Acknowledgments
This work was supported by IRSA-CNR, Progetto Finalizzato Madess II CNR, INFM, and Universita di Genova. We thank Fabio Montarsolo and Daniela Baratta for the programming support.
Bibliography
[1] H.D.I. Abarbanel. Analysis of Observed Chaotic Data. Springer, New York, USA, 1996.

[2] J.P. Burg. Maximum entropy spectral analysis. In D.G. Childers, editor, Modern Spectrum Analysis, IEEE Press, page 34, New York, 1978.

[3] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2:303-314, 1989.

[4] M.E. Dave. Reconstruction of attractors from filtered time series. Physica D, 101:195-206, 1997.

[5] A. Fraser. Information theory and strange attractors. PhD thesis, University of Texas, Austin, 1989.

[6] A. Fraser and H. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review A, 33:1134-1140, 1986.

[7] M. Ghil. The SSA-MTM toolkit: Applications to analysis and prediction of time series. In B. Bosacchi, J.C. Bezdek, and D.B. Fogel, editors, Application of Soft Computing, volume 3165 of Proceedings of SPIE, pages 216-230, Bellingham, WA, 1997.

[8] M. Ghil and R. Vautard. Rapid disintegration of the Wordie ice shelf in response to atmospheric warming. Nature, 350:324, 1991.

[9] S. Haykin. Neural Networks: A Comprehensive Foundation (Second Edition). Prentice Hall, Upper Saddle River, NJ, 1999.

[10] S. Haykin and J. Principe. Making sense of a complex world: Using neural networks to dynamically model chaotic events such as sea clutter. IEEE Signal Processing Magazine, 15, 1998.

[11] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554-2558, 1982.

[12] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359-366, 1989.
[13] C.L. Keppenne and M. Ghil. Adaptive filtering and prediction of noisy multivariate signals: An application to subannual variability in atmospheric angular momentum. International Journal of Bifurcation and Chaos, 3:625-634, 1993.

[14] T. Kohonen. Self-Organization and Associative Memory. Springer, Berlin, third edition, 1989.

[15] R. Kumaresan and D.W. Tufts. Data-adaptive principal component signal processing. In IEEE Proc. Conf. on Decision and Control, page 949, Albuquerque, USA, 1980. IEEE.

[16] F. Lisi, O. Nicolis, and M. Sandri. Combining Singular-Spectrum Analysis and neural networks for time series forecasting. Neural Processing Letters, 2:6-10, 1995.

[17] R. Mane. On the dimension of the compact invariant sets of certain non-linear maps. In D.A. Rand and L.-S. Young, editors, Dynamical Systems and Turbulence, volume 898 of Lecture Notes in Mathematics, pages 230-242, Warwick 1980. Springer-Verlag, Berlin, 1981.

[18] F. Masulli. Bayesian classification by feedforward connectionist systems. In F. Masulli, P.G. Morasso, and A. Schenone, editors, Neural Networks in Biomedicine - Proceedings of the Advanced School of the Italian Biomedical Physics Association - Como (Italy) 1993, pages 145-162, Singapore, 1994. World Scientific.

[19] F. Masulli, R. Parenti, and L. Studer. Neural modeling of non-linear processes: Relevance of the Takens-Mane theorem. International Journal on Chaos Theory and Applications, 4:59-74, 1999.

[20] F. Montarsolo. A toolkit for discontinuous series forecasting. Laurea thesis in computer science (in Italian), DISI - Department of Computer and Information Sciences, University of Genova, Genova (Italy), 1998.

[21] S.A. Nene and S.K. Nayar. A simple algorithm for nearest neighbor search in high dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 1997.

[22] R. Parenti, F. Masulli, and L. Studer. Control of non-linear processes by neural networks: Benefits of using the Takens-Mane theorem. In Proceedings of the ICSC Symposium on Intelligent Industrial Automation, IIA'97, pages 44-50, Millet, Canada, 1997. ICSC.

[23] E.R. Pike, J.G. McWhirter, M. Bertero, and C. de Mol. Generalized information theory for inverse problems in signal processing. IEE Proceedings, 59:660-667, 1984.

[24] D.W. Ruck, S.K. Rogers, M. Kabrisky, M.E. Oxley, and B.W. Suter. The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, 1:296-298, 1990.
[25] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 8, pages 318-362. MIT Press, Cambridge, 1986.

[26] L. Studer and F. Masulli. Building a neuro-fuzzy system to efficiently forecast chaotic time series. Nuclear Instruments and Methods in Physics Research, Section A, 389:264-667, 1997.

[27] F. Takens. Detecting strange attractors in turbulence. In D.A. Rand and L.-S. Young, editors, Dynamical Systems and Turbulence, volume 898 of Lecture Notes in Mathematics, pages 366-381, Warwick, 1981. Springer-Verlag, Berlin.

[28] C.W. Therrien. Decision, Estimation, and Classification: An Introduction to Pattern Recognition and Related Topics. Wiley, New York, 1989.

[29] J. Vastano and L. Rahman. Information transport in spatio-temporal chaos. Physical Review Letters, 72:241-275, 1989.

[30] R. Vautard and M. Ghil. Singular-spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Physica D, 35:395-424, 1989.

[31] R. Vautard, P. Yiou, and M. Ghil. Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. Physica D, 58:95-126, 1992.

[32] T.P. Vogl, J.K. Mangis, A.K. Rigler, W.T. Zink, and D.L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257-263, 1988.

[33] L. Wang and J.M. Mendel. Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Trans. on Neural Networks, 5:807-814, 1992.