Post on 26-May-2020
Lagrange regularisation approach to compare nested data sets and determine objectively financial bubbles' inceptions
G. Demos†,1, D. Sornette†,‡
† ETH Zurich, Dept. of Management, Technology and Economics, Zurich, Switzerland.
‡ Swiss Finance Institute, c/o University of Geneva, Geneva, Switzerland.
ABSTRACT
Inspired by the question of identifying the start time $\tau$ of financial bubbles, we address the calibration of time series in which the inception of the latest regime of interest is unknown. By taking into account the tendency of a given model to overfit data, we introduce the Lagrange regularisation of the normalised sum of the squared residuals, $\chi^2_{np}(\Phi)$, to endogenously detect the optimal fitting window size $w^* \in [\tau : t_2]$ that should be used for calibration purposes for a fixed pseudo present time $t_2$. The performance of the Lagrange regularisation of $\chi^2_{np}(\Phi)$, denoted $\chi^2_\lambda(\Phi)$, is exemplified on a simple linear regression problem with a change point and compared against the Residual Sum of Squares (RSS) $:= \chi^2(\Phi)$ and RSS/(N−p) $:= \chi^2_{np}(\Phi)$, where $N$ is the sample size and $p$ is the number of degrees of freedom. Applied to synthetic models of financial bubbles with a well-defined transition regime and to a number of financial time series (US S&P500, Brazil IBovespa and China SSEC Indices), the Lagrange regularisation $\chi^2_\lambda(\Phi)$ is found to provide well-defined, reasonable determinations of the starting times of major bubbles, such as the bubbles ending with the 1987 Black Monday and the 2008 sub-prime crisis, as well as of minor speculative bubbles on other indices, without any further exogenous information. It thus allows one to endogenise the determination of the beginning time of bubbles, a problem that had not previously received a systematic, objective solution.
Keywords: Financial bubbles, Time Series Analysis, Numerical Simulation, Sub-Sample Selection, Overfitting, Goodness-of-Fit, Cost Function, Optimization.
JEL classification: C32, C53, G01, G1.
1. Introduction
There is an inverse relationship between the tendency of a model to overfit data and the sample size used. In other words, the smaller the data sample, the larger the number of degrees of freedom relative to the number of data points, and the larger the possibility of overfitting (Loscalzo et al., 2009). Because of this, one cannot directly compare goodness-of-fit metrics of statistical models calibrated over samples of unequal size for a given parametrisation $\Phi$, such as the Residual Sum of Squares (RSS) $:= \chi^2(\Phi)$ or its normalised version RSS/(N−p) $:= \chi^2_{np}(\Phi)$. Here, $N$ denotes the sample size and $p$ the number of degrees of freedom of the model. This is particularly problematic when one is specifically interested in selecting the optimal sub-sample of a dataset to calibrate a model. It is a common problem when calibrating time series for which the model is only valid in a specific time window that is unknown a priori. Our motivation stems from the question of determining the beginning of a financial bubble, but the question applies more generally to time series exhibiting regime shifts that one wishes to localise precisely.

¹ Corresponding author: gdemos@ethz.ch
In the literature, there are solutions for proper model selection such as the Lasso (Tibshirani, 1996) and Ridge regressions (Ng, 2004), where the cost function contains an additional penalisation for large values of the estimated parameters. Well-known metrics such as the AIC and BIC are also standard tools for quantifying the goodness-of-fit of different models (Akaike, 1974) and for selecting the one with the best compromise between goodness-of-fit and complexity. However, the results of these methodologies are only comparable within the same data set.
There seems to be a gap in the literature regarding the proper procedure to follow when comparing goodness-of-fit metrics of a model calibrated on different batches of a given data set. To fill this gap, we propose a novel metric for calibrating endogenised end points and comparing nested data sets. The method empirically computes the tendency of a model to overfit a data set via what we term the "Lagrange regulariser term" $\lambda$. Once $\lambda$ has been estimated empirically, the cost function can be corrected accordingly as a function of sample size, giving the Lagrange regularisation of $\chi^2_{np}(\Phi)$. As the number of data points, or equivalently the beginning or end point of the window, is now endogenised, the optimal sample length can then be determined. We empirically test the performance of the Lagrange regularisation of $\chi^2_{np}(\Phi)$, which we denote $\chi^2_\lambda(\Phi)$ and refer to as the regularised Residual Sum of Squares, in comparison with the naive $\chi^2(\Phi)$ and with $\chi^2_{np}(\Phi)$ itself, using both linear and non-linear models as well as synthetic and real-world time series.

Preprint submitted to Elsevier. arXiv:1707.07162v1 [q-fin.ST], 22 Jul 2017.
This paper is structured as follows. Section (2) explains the motivation behind the proposed Lagrange regularising term and provides details of the derivation of $\lambda$, as well as the analytical expression for computing the tendency of a model to overfit data. In Section (3), we use a simple OLS regression to test the empirical performance of the Lagrange regularisation of $\chi^2_{np}(\Phi)$ on the problem of optimal sub-sample selection. Section (4) shows how the regulariser can be used alongside the LPPLS model of financial bubbles to diagnose the beginning of financial bubbles. Empirical findings are given in Sec. (4.2) and Section (5) concludes.
2. Formulation of calibration with varying window sizes: How to endogenize $t_1$ and make different window sizes comparable
Let us consider the normalised mean-squared residuals, defined as the sum of squares of the residuals divided by the number $t_2 - t_1$ of points in the sum, corrected by the number of degrees of freedom $p$ of the model,

$$\chi^2_{np}(\Phi) := \frac{1}{(t_2 - t_1) - p} \sum_{i=t_1}^{t_2} r_i(\Phi)^2 , \qquad (1)$$

with

$$r_i(\Phi) = y^{data}_i - y^{model}_i(\Phi) , \qquad (2)$$

where $\Phi$ denotes the set of model parameters to fit, including a priori the left end point $t_1$ of the calibration window. The term $y^{model}_i(\Phi)$ corresponds to the theoretical model and $y^{data}_i$ is the empirical value of the time series at time $i$.
For a fixed right end point $t_2$ of the calibration window, we are interested in comparing the results of the fit of the model to the empirical data for various left end points $t_1$ of the calibration window. The standard approach assumes a fixed calibration window $[t_1, t_2]$ with $N = t_2 - t_1 + 1$ data points. To relate the two problems, we view the minimisation of $\chi^2_{np}(\Phi)$ at fixed $t_2 - t_1$ (for a fixed $t_2$) as a more general minimisation problem involving $t_1$ as a fit parameter, augmented by the condition that $t_2 - t_1 + 1 = N$ is fixed. This reads

$$\min \chi^2_\lambda(\Phi) , \qquad (3)$$

with

$$\chi^2_\lambda(\Phi) := \frac{1}{(t_2 - t_1) - p} \sum_{i=t_1}^{t_2} r_i(\Phi)^2 + \lambda (t_2 - t_1) , \qquad (4)$$

where we have introduced the Lagrange parameter $\lambda$, which is conjugate to the constraint $t_2 - t_1 + 1 = N$. Once the parameters $\Phi$ are determined, $\lambda$ is obtained from the condition that the constraint $t_2 - t_1 + 1 = N$ is satisfied.
Since data points are discrete, the minimisation of (4) with respect to $t_1$ reads

$$
\begin{aligned}
0 &= \chi^2_\lambda(\Phi)(t_1+1) - \chi^2_\lambda(\Phi)(t_1) \\
&= \frac{1}{t_2 - t_1 - p - 1} \sum_{i=t_1+1}^{t_2} r_i(\Phi)^2 - \frac{1}{t_2 - t_1 - p} \sum_{i=t_1}^{t_2} r_i(\Phi)^2 - \lambda \\
&= \frac{1}{t_2 - t_1 - p} \left( 1 + \frac{1}{t_2 - t_1 - p} + O\!\left( \frac{1}{(t_2 - t_1 - p)^2} \right) \right) \sum_{i=t_1+1}^{t_2} r_i(\Phi)^2 - \frac{1}{t_2 - t_1 - p} \sum_{i=t_1}^{t_2} r_i(\Phi)^2 - \lambda \\
&= -\frac{r_{t_1}(\Phi)^2}{t_2 - t_1 - p} \left( 1 + O\!\left( \frac{1}{t_2 - t_1 - p} \right) \right) + \frac{\chi^2_{np}(\Phi)}{t_2 - t_1 - p} \left( 1 + O\!\left( \frac{1}{t_2 - t_1 - p} \right) \right) - \lambda .
\end{aligned}
\qquad (5)
$$

Multiplying by $t_2 - t_1 - p$ and neglecting the small terms $O\left(\frac{1}{t_2 - t_1 - p}\right)$ leads to

$$\chi^2_{np}(\Phi) = r_{t_1}(\Phi)^2 + \lambda (t_2 - t_1 - p) . \qquad (6)$$

Expression (6) has the following implications. Consider the case where all squared terms $r_i(\Phi)^2$ in the sum (1) defining $\chi^2_{np}(\Phi)$ are approximately the same and independent of $t_1$, which occurs when the residuals have a thin-tailed distribution and the model is well specified. Then

$$r_i(\Phi)^2 \approx r^2 , \quad \forall i , \quad \text{including } r_{t_1}(\Phi)^2 = r^2 , \qquad (7)$$

and thus

$$\chi^2_{np}(\Phi) \approx r^2 . \qquad (8)$$

Inserting the estimate (7) into (6) yields

$$\chi^2_{np}(\Phi) \approx r^2 + \lambda (t_2 - t_1 - p) . \qquad (9)$$

Comparing with (8), this suggests that varying $t_1$ is expected in general to introduce a linear bias in the normalised sum $\chi^2_{np}(\Phi)$ of squared residuals, which is proportional to the size of the calibration window (up to the small correction by the number $p$ of degrees of freedom of the model). To compare calibrations over different window sizes, we need to correct for this bias.
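The discrete step at the heart of Eq. (5) can be checked numerically. Writing $D = t_2 - t_1 - p$ and holding the residuals fixed (fixed $\Phi$) while $t_1$ moves, dropping the first point changes $\chi^2_{np}$ by exactly ($\chi^2_{np}$ evaluated on $[t_1+1, t_2]$ minus $r_{t_1}^2$) divided by $D$. The short Python sketch below verifies this identity; the residual values, the seed and $p = 3$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

def chi2_np(residuals, p):
    """Eq. (1): sum of r_i^2 over [t1, t2] divided by (t2 - t1) - p.

    `residuals` holds r_{t1}, ..., r_{t2}, so t2 - t1 = len(residuals) - 1.
    """
    span = len(residuals) - 1
    return np.sum(residuals ** 2) / (span - p)

rng = np.random.default_rng(0)
r = rng.normal(size=50)   # illustrative residuals, held fixed (Phi fixed) as t1 moves
p = 3                     # illustrative number of degrees of freedom
D = (len(r) - 1) - p      # t2 - t1 - p for the window starting at t1

# Exact change of chi2_np when t1 -> t1 + 1 (the first residual is dropped) ...
exact_step = chi2_np(r[1:], p) - chi2_np(r, p)
# ... equals (chi2_np on [t1 + 1, t2] - r_{t1}^2) / D, the structure of Eq. (5).
identity = (chi2_np(r[1:], p) - r[0] ** 2) / D
print(exact_step, identity)
```

Adding the $-\lambda$ contributed by the term $\lambda(t_2 - t_1)$ of Eq. (4) and setting the total step to zero gives the stationarity condition of Eq. (5).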
More specifically, rather than fixing the window size $t_2 - t_1 + 1 = N$, we want to determine the 'best' $t_1$, thus comparing calibrations for varying window sizes for a fixed right end point $t_2$. As a consequence, the Lagrange multiplier $\lambda$ is no longer fixed to ensure that the constraint $t_2 - t_1 + 1 = N$ holds, but now quantifies the average bias or "cost" associated with changing the window size. This bias is appreciable for small sample sizes and vanishes asymptotically as $N \to \infty$, i.e. $\lim_{N \to \infty} \lambda = 0$.
In statistical physics, this is analogous to the change from the canonical to the grand canonical ensemble, where the condition of a fixed number of particles (a fixed number of points in a window of fixed size) is relaxed to a varying number of particles with an energy cost per particle determined by the chemical potential (the Lagrange parameter $\lambda$) (Gibbs, 1902). It is well known that the canonical ensemble is recovered from the grand canonical ensemble by fixing the chemical potential (Lagrange multiplier) so that the number of particles equals the imposed constraint. The same holds here.
How, then, should the crucial Lagrange parameter $\lambda$ be determined? We propose an empirical approach. When plotting $\chi^2_{np}(\Phi)$ as a function of $t_1$ for various instances, we observe that a linearly decreasing function of $t_1$ provides a good approximation, as predicted by (6) (for $\lambda > 0$). The slope can then be interpreted as quantifying the average bias of the scaled goodness-of-fit $\chi^2_{np}(\Phi)$ due to the reduced number of data points as $t_1$ increases. This average bias clearly depends on the data and on the model used to calibrate it. We can thus interpret the average linear trend observed empirically as determining the effective Lagrange regulariser term $\lambda$, which quantifies the impact on the goodness-of-fit of adding data points to the calibration, given the specific realisation of the data and the model to calibrate. Thus, to make all the calibrations performed for different $t_1$ comparable for the determination of the optimal window size, we propose to correct expression (1) by subtracting the term $\lambda(t_2 - t_1)$ from the normalised sum of squared residuals $\chi^2_{np}(\Phi)$ given by Eq. (1), where $\lambda$ is estimated empirically as the large-scale linear trend. Here, we omit the $p$ correction since it leads to a constant translation for a given model with a given number of degrees of freedom. Such a large-scale linear trend of $\chi^2_{np}(\Phi)$ as a function of $t_1$ has been reported for a number of financial bubble calibrations in (Demos and Sornette, 2017). Our proposed procedure thus amounts simply to detrending $\chi^2_{np}(\Phi)$, which has the effect of making the minima of $\chi^2_{np}(\Phi)$ more pronounced, as we shall see below for different models.
To summarise, endogenising $t_1$ in the set of parameters to calibrate requires minimising

$$
\begin{aligned}
\chi^2_\lambda(\Phi) &= \chi^2_{np}(\Phi) - \lambda (t_2 - t_1) \qquad (10) \\
&= \frac{1}{(t_2 - t_1) - p} \sum_{i=t_1}^{t_2} r_i(\Phi)^2 - \lambda (t_2 - t_1) , \qquad (11)
\end{aligned}
$$

with

$$r_i(\Phi) = y^{data}_i - y^{model}_i(\Phi) , \qquad (12)$$

where $\lambda$ is determined empirically so that $\chi^2_{np}(\Phi) - \lambda(t_2 - t_1)$ has zero drift as a function of $t_1$ over the set of scanned values. The empirical value of $\lambda$ obtained in this way can be used as a diagnostic parameter quantifying the tendency of the model to overfit the data. We can thus also refer to $\lambda$ as the "overfit measure". When it is large, the goodness-of-fit $\chi^2(\Phi)$ changes a lot with the number of data points, indicating a poor overall ability of the model to account for the data. Demos and Sornette (2017) observed other cases where $\chi^2(\Phi)$ is constant as a function of $t_1$ (corresponding to a vanishing $\lambda$), which can be interpreted as a regime where the model fits the data robustly, "synchronising" on its characteristic features in a way mostly independent of the number of data points.
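The detrending step can be sketched in a few lines of Python. The sketch assumes that $\chi^2_{np}(\Phi)$ has already been computed for every scanned $t_1$ at fixed $t_2$. It estimates $\lambda$ as the ordinary least-squares slope of $\chi^2_{np}$ against the window size $t_2 - t_1$ (one simple way to obtain a zero-drift residual; the text does not prescribe a specific fitting routine), then returns $\chi^2_\lambda(\Phi)$ of Eq. (10) and the minimising $t_1$. The function name `lagrange_regularise` and the toy input are hypothetical.

```python
import numpy as np

def lagrange_regularise(t1_grid, chi2_np_values, t2):
    """Detrend chi2_np over the scanned t1 values, following Eq. (10).

    Returns (lam, chi2_lambda, t1_best); lam is the fitted slope of chi2_np
    versus the window size t2 - t1, i.e. the 'overfit measure'.
    """
    t1_grid = np.asarray(t1_grid, dtype=float)
    chi2_np_values = np.asarray(chi2_np_values, dtype=float)
    window = t2 - t1_grid
    # Linear least-squares fit: chi2_np ~ lam * (t2 - t1) + const.
    lam, _intercept = np.polyfit(window, chi2_np_values, 1)
    chi2_lambda = chi2_np_values - lam * window
    t1_best = t1_grid[np.argmin(chi2_lambda)]
    return lam, chi2_lambda, t1_best

# Toy input (hypothetical): a linear bias with window size hides a dip at t1 = 40.
t2 = 100
t1_grid = np.arange(0, 100)
toy_chi2_np = 1.0 + 0.01 * (t2 - t1_grid)
toy_chi2_np[40] -= 0.5
lam, chi2_lambda, t1_best = lagrange_regularise(t1_grid, toy_chi2_np, t2)
print(f"lambda = {lam:.4f}, best t1 = {t1_best:.0f}")
```

On this toy series the raw $\chi^2_{np}$ is smallest at the shortest window ($t_1 = 99$), whereas the detrended metric recovers the dip at $t_1 = 40$, which is the qualitative effect the procedure is meant to have.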
3. Application of the Lagrange regularisation method to a simple linear-regression problem
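Consider the following linear model:

$$\mathbf{Y} = \beta \mathbf{X} + \boldsymbol{\varepsilon} , \qquad (13)$$

with explanatory variable of length $(N \times 1)$ denoted by $\mathbf{X} = \{x_1, x_2, \ldots, x_N\}$, regressand $\mathbf{Y} = \{y_1, y_2, \ldots, y_N\}$ and error vector $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2)$. Bold variables denote either matrices or vectors. Fitting Eq. (13) to a given data set $\mathbf{Y}^{data}$ consists in solving the quadratic minimisation problem

$$\hat\beta = \arg\min_{\beta} \chi^2(\Phi) , \qquad (14)$$

where $\Phi = \{\beta\}$ are the parameters to be estimated and the objective function $\chi^2(\Phi)$ is given by

$$
\begin{aligned}
\chi^2(\Phi) &= \sum_{i=1}^{N} \left( Y^{data}_i - \beta X_i \right)^2 \qquad (15) \\
&= \left\| \mathbf{Y}^{data} - \beta \mathbf{X} \right\|_2^2 . \qquad (16)
\end{aligned}
$$

The solution of Eq. (14) with (16) for a given data set of length $N$ reads

$$\hat\beta = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}' \mathbf{Y} . \qquad (17)$$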
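With a single regressor and no intercept, Eq. (17) reduces to $\hat\beta = \sum_i X_i Y_i / \sum_i X_i^2$. The NumPy sketch below computes this closed form together with $\chi^2(\Phi)$ of Eq. (15) and cross-checks it against `numpy.linalg.lstsq`. The helper name `ols_through_origin`, the seed and the single-regime test data are illustrative assumptions.

```python
import numpy as np

def ols_through_origin(x, y):
    """Closed-form OLS for Y = beta * X + eps (Eq. 17 with a single column)."""
    beta_hat = np.dot(x, y) / np.dot(x, x)
    rss = np.sum((y - beta_hat * x) ** 2)      # chi^2(Phi) of Eq. (15)
    return beta_hat, rss

rng = np.random.default_rng(1)
x = np.arange(-200, 2, dtype=float)            # X := t in [-200, +1]
y = 0.6 * x + rng.normal(0.0, 1.0, x.size)     # illustrative single-regime data
beta_hat, rss = ols_through_origin(x, y)

# Cross-check against the general solution (X'X)^{-1} X'Y via least squares.
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
print(beta_hat, beta_lstsq, rss)
```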
Let $w^* \subseteq \mathbf{Y}^{data}$ be a sub-sample of length $\le N$. The window $w^* = [\tau : t_2]$ thus denotes the optimal window one should use for fitting a model to a data set of length $N$, for a fixed end point $t_2$ and an optimal starting point $\tau$.
To show how the goodness-of-fit metric $\chi^2(\Phi)$ fails to flag the $\tau$-portion of the data set where the regime of interest exists, and how unreliable $\chi^2_{np}(\Phi)$ is for diagnosing the true value of the transition time $\tau$, 20000 synthetic realisations of the process (13) were generated with $X := t \in [-200, +1]$, such that $\mathbf{Y}^{data}$ displays a sudden change of regime at $\tau = -100$. In the first half of the data set, $[-200, -100]$, the data points are generated with $\beta = 0.3$. In the second half, $[-101, 0]$, the data points are generated with $\beta = 0.6$. After the addition of random noise $\varepsilon \sim \mathcal{N}(0, 1)$, each resulting time series was fitted for a fixed end time $t_2 = 1$ while shrinking the left-most portion of the data ($t_1$) towards $t_2$, starting at $t_1 = -200, -199, \ldots, t_2 - 3$. For the largest window, with $t_1 = -200$, there are $t_2 - t_1 + 1 = 1 - (-200) + 1 = 202$ data points to fit. For the smallest window, with $t_1 = t_2 - 3$, there are $t_2 - t_1 + 1 = 4$ data points to fit. For each window size $w$, the process of generating synthetic data and fitting the model was repeated 20000 times, allowing us to obtain confidence intervals.
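The shrinking-window experiment can be reproduced in outline with the vectorised sketch below. It is not the authors' appendix script: the regime boundary ($\beta = 0.3$ for $t < -100$ and $\beta = 0.6$ for $t \ge -100$), the 200 realisations instead of 20000, the seed, and the use of suffix sums to refit the no-intercept OLS in every window are all assumptions made for brevity. It computes $\chi^2(\Phi)$, $\chi^2_{np}(\Phi)$ with $p = 1$, and $\chi^2_\lambda(\Phi)$, with $\lambda$ taken as the least-squares slope of $\chi^2_{np}$ against $t_2 - t_1$ in each realisation, and reports the $t_1$ minimising each averaged metric.

```python
import numpy as np

T = np.arange(-200, 2, dtype=float)     # X := t in [-200, +1]; t2 = T[-1] = 1
TAU, P, MIN_POINTS, N_REAL = -100, 1, 4, 200

def simulate(rng):
    """One realisation of Eq. (13) with a change of slope at TAU (assumed boundary)."""
    beta = np.where(T < TAU, 0.3, 0.6)
    return beta * T + rng.normal(0.0, 1.0, T.size)

def suffix_sum(a):
    """Element k holds a[k:].sum(), i.e. the sum over the window [T[k], t2]."""
    return np.cumsum(a[::-1])[::-1]

def window_metrics(y):
    """chi2, chi2_np and chi2_lambda for every window [t1, t2] with >= MIN_POINTS points."""
    sxx, sxy, syy = suffix_sum(T * T), suffix_sum(T * y), suffix_sum(y * y)
    k = np.arange(T.size - MIN_POINTS + 1)     # t1 = T[k] runs from -200 to t2 - 3
    rss = syy[k] - sxy[k] ** 2 / sxx[k]        # RSS of the OLS refit in each window
    span = T[-1] - T[k]                        # t2 - t1
    chi2_np = rss / (span - P)                 # Eq. (1)
    lam = np.polyfit(span, chi2_np, 1)[0]      # large-scale linear trend
    return rss, chi2_np, chi2_np - lam * span  # Eq. (10)

rng = np.random.default_rng(42)
runs = [window_metrics(simulate(rng)) for _ in range(N_REAL)]
rss, chi2_np, chi2_lam = (np.mean(m, axis=0) for m in zip(*runs))
t1_grid = T[: T.size - MIN_POINTS + 1]
for name, metric in [("chi2", rss), ("chi2_np", chi2_np), ("chi2_lambda", chi2_lam)]:
    print(f"{name:12s} minimised at t1 = {t1_grid[np.argmin(metric)]:.0f}")
```

Because refitting on a nested, smaller window can never increase the RSS, $\chi^2(\Phi)$ is always minimised by the shortest window, which is the failure mode described in the text.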
As depicted in Fig. (1), the proposed methodology correctly diagnoses the optimal starting point $\tau$ associated with the change of slope. While the $\chi^2(\Phi)$ metric decreases monotonically and the $\chi^2_{np}(\Phi)$ metric plateaus from $t = -100$ onwards, $\chi^2_{np}(\Phi) - \lambda(t_2 - t_1)$ increases monotonically over the same interval, thus marking a clear minimum. The variance of the metric $\chi^2_\lambda(\Phi)$ also increases over this interval. Specifically, the metric $\chi^2(\Phi)$ tends to favour the smallest windows, so that overfitting is prone to develop and remain undetected. The metric $\chi^2_{np}(\Phi)$ suggests $\tau \approx -90$ after 20000 simulations, which is 10% away from the true value $\tau = -100$. Moreover, $\chi^2_{np}(\Phi)$ is so flat as a function of $t_1$ for $t_1 \in [-100 : -40]$ that no value of $\tau$ within this period can be singled out with statistical significance. In this simulation study, $\chi^2_{np}(\Phi)$ ranges from 0.134 to 0.135 for $t_1 \in [-100 : -60]$, making it almost indistinguishable over this interval of possible $\tau$ values. As we shall see later on, the performance of $\chi^2_{np}(\Phi)$ degrades further, resembling that of the $\chi^2(\Phi)$ metric, when dealing with more complex nonlinear models such as the LPPLS model. In contrast, our proposed correction via the Lagrange regulariser $\lambda$ provides a simple and effective method to identify the change of regime and the largest window size compatible with the second regime. The minimum is very pronounced and clear, which is not the case for $\chi^2_{np}(\Phi)$.
4. Using the Lagrange regularisation method for Detecting the Beginning of Financial Bubbles
In the previous Section, we proposed a novel goodness-of-fit metric for inferring the optimal beginning point or change point $\tau$ (for a fixed end point $t_2$) in the calibration of a simple linear model. The Lagrange regulariser $\lambda$ allowed us to find the optimal window length $w^* = [\tau : t_2]$ for fitting the model by enabling the comparison of goodness-of-fit across different values of $w$. We now extend the methodology to a more complex non-linear model, which requires comparing fits across different window sizes in order to diagnose bubble periods in financial instruments such as equity prices and price indices.
4.1. The LPPLS model
The LPPLS (log-periodic power law singularity) model introduced by Johansen et al. (2000) provides a flexible set-up for diagnosing periods of price exuberance (Shiller, 2000) in financial instruments. It highlights the role of herding behaviour, which translates into positive feedbacks in the price dynamics during the formation of bubbles. This is reflected in faster-than-exponential growth of the price of financial instruments. Such explosive behaviour is completely unsustainable, and bubbles usually end with a crash or a progressive correction. Here, we use the LPPLS model combined with the Lagrange regulariser $\lambda$ to detect the beginning of financial bubbles.
In the LPPLS model, the expectation of the logarithm of the price of an asset is written in the form

$$f_{LPPLS}(\phi, t) = A + B f + C_1 g + C_2 h , \qquad (18)$$

where $\phi = \{A, B, C_1, C_2, m, \omega, t_c\}$ is a $(1 \times 7)$ vector of parameters to be determined and

$$
\begin{aligned}
f &\equiv (t_c - t)^m , \qquad (19) \\
g &\equiv (t_c - t)^m \cos(\omega \ln(t_c - t)) , \qquad (20) \\
h &\equiv (t_c - t)^m \sin(\omega \ln(t_c - t)) . \qquad (21)
\end{aligned}
$$

Note that the power-law singularity $(t_c - t)^m$ embodies the faster-than-exponential growth. Log-periodic oscillations, represented by the cosine and sine of $\ln(t_c - t)$, model the long-term volatility dynamics decorating the accelerating price. Expression (18) uses the formulation of Filimonov and Sornette (2013) in terms of 4 linear parameters $A, B, C_1, C_2$ and 3 nonlinear parameters $m, \omega, t_c$.
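Eqs. (18) to (21) translate directly into code. The sketch below evaluates the expected log-price $f_{LPPLS}(\phi, t)$ with NumPy. It assumes $t < t_c$ for every input (the power law and the logarithm are undefined otherwise); the function name `lppls` is a hypothetical choice, and the example parameter values are those quoted in Section 4.2.1 for the synthetic Black Monday bubble.

```python
import numpy as np

def lppls(t, tc, m, omega, A, B, C1, C2):
    """Expected log-price of Eq. (18): A + B f + C1 g + C2 h, valid for t < tc."""
    dt = tc - np.asarray(t, dtype=float)
    if np.any(dt <= 0):
        raise ValueError("LPPLS is only defined for t < tc")
    f = dt ** m                               # Eq. (19): power-law acceleration
    g = f * np.cos(omega * np.log(dt))        # Eq. (20): log-periodic cosine term
    h = f * np.sin(omega * np.log(dt))        # Eq. (21): log-periodic sine term
    return A + B * f + C1 * g + C2 * h

params = dict(tc=1194, m=0.44, omega=6.5, A=1.8259, B=-0.0094, C1=-0.0001, C2=0.0005)
t = np.arange(1, 1101)
log_price = lppls(t, **params)
print(log_price[0], log_price[-1])
```

One day before $t_c$, $(t_c - t)^m = 1$ and $\ln(t_c - t) = 0$, so the expression reduces to $A + B + C_1$, which is a convenient sanity check.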
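Fitting Eq. (18) to the log-price time series amounts to searching for the parameter set $\phi^*$ that yields the smallest $N$-dimensional distance between realisation and theory. Mathematically, using the $L^2$ norm, we form the following sum of squared residuals

$$F(t_c, m, \omega, A, B, C_1, C_2) = \sum_{i=1}^{N} \left[ \ln P(t_i) - A - B f_i - C_1 g_i - C_2 h_i \right]^2 , \qquad (22)$$

for $i = 1, \ldots, N$. We proceed in two steps. First, enslaving the linear parameters $\{A, B, C_1, C_2\}$ to the remaining nonlinear parameters $\phi = \{t_c, m, \omega\}$ yields the cost function $\chi^2(\phi)$

$$
\begin{aligned}
\chi^2(\phi) &:= F_1(t_c, m, \omega) \qquad (23) \\
&= \min_{\{A, B, C_1, C_2\}} F(t_c, m, \omega, A, B, C_1, C_2) \qquad (24) \\
&= F(t_c, m, \omega, \hat A, \hat B, \hat C_1, \hat C_2) , \qquad (25)
\end{aligned}
$$

where the hat symbol indicates estimated parameters. These are obtained by solving the optimisation problem

$$\{\hat A, \hat B, \hat C_1, \hat C_2\} = \arg\min_{\{A, B, C_1, C_2\}} F(t_c, m, \omega, A, B, C_1, C_2) , \qquad (26)$$

which can be done analytically by solving the following system of equations, with $y_i \equiv \ln P(t_i)$:

$$
\begin{pmatrix}
N & \sum f_i & \sum g_i & \sum h_i \\
\sum f_i & \sum f_i^2 & \sum f_i g_i & \sum f_i h_i \\
\sum g_i & \sum f_i g_i & \sum g_i^2 & \sum g_i h_i \\
\sum h_i & \sum f_i h_i & \sum g_i h_i & \sum h_i^2
\end{pmatrix}
\begin{pmatrix} \hat A \\ \hat B \\ \hat C_1 \\ \hat C_2 \end{pmatrix}
=
\begin{pmatrix} \sum y_i \\ \sum y_i f_i \\ \sum y_i g_i \\ \sum y_i h_i \end{pmatrix} . \qquad (27)
$$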
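For fixed $(t_c, m, \omega)$, the system (27) is the 4×4 normal-equation system of an ordinary linear regression of $y_i = \ln P(t_i)$ on the columns $(1, f_i, g_i, h_i)$. The sketch below builds and solves it with `numpy.linalg.solve`; it assumes noiseless data generated from known parameters so that the recovery can be checked, and the helper names `lppls_basis` and `solve_linear_params` are hypothetical.

```python
import numpy as np

def lppls_basis(t, tc, m, omega):
    """Columns (1, f, g, h) of Eqs. (19)-(21) for fixed nonlinear parameters."""
    dt = tc - np.asarray(t, dtype=float)
    f = dt ** m
    return np.column_stack([np.ones_like(dt), f,
                            f * np.cos(omega * np.log(dt)),
                            f * np.sin(omega * np.log(dt))])

def solve_linear_params(t, y, tc, m, omega):
    """Solve Eq. (27) for (A, B, C1, C2) given (tc, m, omega)."""
    X = lppls_basis(t, tc, m, omega)
    lhs = X.T @ X   # matrix of sums: N, sum f_i, sum g_i, ..., sum h_i^2
    rhs = X.T @ y   # vector of sums: sum y_i, sum y_i f_i, sum y_i g_i, sum y_i h_i
    return np.linalg.solve(lhs, rhs)

t = np.arange(1, 1101)
true_linear = np.array([1.8259, -0.0094, -0.0001, 0.0005])   # A, B, C1, C2
y = lppls_basis(t, 1194, 0.44, 6.5) @ true_linear            # noiseless log-prices
print(solve_linear_params(t, y, 1194, 0.44, 6.5))
```

Forming $X'X$ explicitly mirrors Eq. (27); a QR-based solver such as `numpy.linalg.lstsq` on $X$ itself is numerically gentler when the columns become nearly collinear.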
Second, we solve the nonlinear optimisation problem involving the remaining nonlinear parameters $m, \omega, t_c$:

$$\{\hat t_c, \hat m, \hat \omega\} = \arg\min_{\{t_c, m, \omega\}} F_1(t_c, m, \omega) . \qquad (28)$$

The model is thus calibrated on the data by the least-squares method, providing estimates of all parameters $t_c, \omega, m, A, B, C_1, C_2$ in a given time window of analysis.
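The two-step procedure can be sketched as follows: for each candidate $(t_c, m, \omega)$, the linear parameters are enslaved by a least-squares solve (equivalent to Eq. (27)), and $F_1$ of Eq. (23) is minimised over the nonlinear parameters. To keep the sketch deterministic and dependency-free, the nonlinear optimiser of Eq. (28) is replaced here by a coarse grid search; this is a swapped-in technique, the grid bounds and step sizes are assumptions, and a practical implementation would refine the grid optimum with a local optimiser.

```python
import itertools
import numpy as np

def lppls_basis(t, tc, m, omega):
    """Columns (1, f, g, h) of Eqs. (19)-(21)."""
    dt = tc - t
    f = dt ** m
    return np.column_stack([np.ones_like(dt), f,
                            f * np.cos(omega * np.log(dt)),
                            f * np.sin(omega * np.log(dt))])

def F1(t, y, tc, m, omega):
    """Cost of Eq. (23): residual sum of squares with A, B, C1, C2 enslaved."""
    X = lppls_basis(t, tc, m, omega)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), coef

def calibrate_grid(t, y, tc_grid, m_grid, omega_grid):
    """Step two, Eq. (28), approximated by exhaustive search over a grid."""
    best = (np.inf, None, None)
    for tc, m, omega in itertools.product(tc_grid, m_grid, omega_grid):
        if tc <= t.max():              # LPPLS requires t < tc on the whole window
            continue
        cost, coef = F1(t, y, tc, m, omega)
        if cost < best[0]:
            best = (cost, (tc, m, omega), coef)
    return best

# Noiseless synthetic log-prices built from the parameters quoted in Section 4.2.1.
t = np.arange(1, 1101, dtype=float)
y = lppls_basis(t, 1194.0, 0.44, 6.5) @ np.array([1.8259, -0.0094, -0.0001, 0.0005])
cost, (tc, m, omega), coef = calibrate_grid(
    t, y,
    tc_grid=np.arange(1174.0, 1215.0, 4.0),
    m_grid=np.round(np.arange(0.12, 0.91, 0.04), 2),
    omega_grid=np.arange(6.0, 13.01, 0.5),
)
print(f"tc={tc:.0f}, m={m:.2f}, omega={omega:.1f}, cost={cost:.2e}")
```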
For each fixed end point $t_2$ (corresponding to a fictitious "present" up to which the data are recorded), we fit the price time series in shrinking windows $(t_1, t_2)$ of length $dt := t_2 - t_1$ decreasing from 1600 trading days to 30 trading days. We shift the start date $t_1$ in steps of 3 trading days, giving 514 windows to analyse for each $t_2$. To minimise calibration problems and address the sloppiness of the model with respect to some of its parameters (in particular $t_c$), we use a number of filters to select the solutions. For further information about the sloppiness of the LPPLS model, we refer to (Bree et al., 2013; Sornette et al., 2015; Demos and Sornette, 2017; Filimonov et al., 2017). The filters used here are $\{0.1 < m < 0.9,\; 6 < \omega < 13,\; t_2 - [t_2 - t_1] < t_c < t_2 + [t_2 - t_1]\}$, so that only calibrations that meet these conditions are considered valid and the others are discarded. These filters derive from the empirical evidence gathered in investigations of previous bubbles (Zhou and Sornette, 2003; Zhang et al., 2015; Sornette et al., 2015).
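The filtering step amounts to a simple validity check on each calibrated $(m, \omega, t_c)$ triplet. The sketch below encodes the bounds quoted above; the function name `passes_filters` is hypothetical, and all times are assumed to be numbers on a common axis (for example, trading-day indices).

```python
def passes_filters(m, omega, tc, t1, t2):
    """Return True if a calibration satisfies the LPPLS filters used in the text."""
    window = t2 - t1
    return (0.1 < m < 0.9) and (6 < omega < 13) and (t2 - window < tc < t2 + window)

# Hypothetical calibrations in a 500-trading-day window ending at t2 = 1000.
print(passes_filters(m=0.44, omega=6.5, tc=1050, t1=500, t2=1000))   # accepted
print(passes_filters(m=0.95, omega=6.5, tc=1050, t1=500, t2=1000))   # m out of range
```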
Previous calibrations of the JLS model have further shown the value of additional constraints imposed on the nonlinear parameters to remove spurious calibrations (false-positive identification of bubbles) (Demos and Sornette, 2017; Bree et al., 2013; Geraskin and Fantazzini, 2011). For our purposes, we do not consider them here.
4.2. Empirical analysis
We apply our novel goodness-of-fit metric to the problem of finding the beginning times of financial bubbles, defined as the optimal starting time $t_1$ obtained by endogenising $t_1$ and calibrating it. We first illustrate and test the method on synthetic time series and then apply it to real-world financial bubbles. A Python implementation of the algorithm is provided in the appendix.
4.2.1. Construction of synthetic LPPLS bubbles
To gain insight into the application of our proposed calibration methodology in a controlled framework, and thus establish a solid background for our empirical analysis, we generate synthetic price time series that mimic the salient properties of financial bubbles, namely a power-law-like acceleration decorated by oscillations. The synthetic price time series are obtained by using formula (18) with parameters given by the best LPPLS fit within the window $w \in$ [$t_1$ = 1 Jan. 1981 : $t_2$ = 30 Aug. 1987] of the bubble that ended with the Black Monday crash of 19 Oct. 1987. These parameters are $m = 0.44$, $\omega = 6.5$, $C_1 = -0.0001$, $C_2 = 0.0005$, $A = 1.8259$, $B = -0.0094$, $t_c = 1194$ (corresponding to 1987/11/14), where days are counted from an origin set at $t_1$ = Jan. 1981. To the deterministic component describing the expected log-price, given by expression (18) and denoted by $f_{LPPLS}(\phi, t)$, we add a stochastic element to obtain the synthetic price time series

$$\ln P(t) = f_{LPPLS}(\phi, t) + \sigma \varepsilon(t) , \qquad (29)$$

where $\varepsilon(t) \sim \mathcal{N}(0, \sigma_0)$ is a noise term, $\sigma_0 = 0.03$ and $t = [1, \ldots, N = 1100]$.

To create a price time series with a well-defined transition point corresponding to the beginning of a bubble, we take the first 500 points generated with expression (29) and mirror them via a $t \to t_1 - t$ reflection across the time $t_1$ = 1 Jan. 1981. We concatenate this reflected sequence of 500 prices to the 1100 prices obtained with (29) for $t \ge t_1$, so that the true transition point corresponding to the start of the bubble described by the LPPLS pattern is $t_1$ = 1 Jan. 1981. The black stochastic line at the top of Figure (2) represents this union of the two time series. This union constitutes the whole synthetic time series on which we apply our Lagrange regularisation $\chi^2_\lambda(\Phi)$ in an attempt to recover the true start time, which is relabelled as the hypothetical date $t_1$ = 1 Jul. 1911.
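The construction of the synthetic series can be sketched as below, using the parameter values listed above. Assumptions: $\varepsilon(t)$ is drawn from $\mathcal{N}(0, 1)$ and scaled by $\sigma_0 = 0.03$ (one reading of Eq. (29)), the seed is arbitrary, calendar dates are omitted, and the mirrored segment is simply the first 500 generated log-prices in reverse order, placed before $t_1$.

```python
import numpy as np

PARAMS = dict(tc=1194.0, m=0.44, omega=6.5, A=1.8259, B=-0.0094, C1=-0.0001, C2=0.0005)
SIGMA0, N_BUBBLE, N_MIRROR = 0.03, 1100, 500

def lppls(t, tc, m, omega, A, B, C1, C2):
    """Deterministic LPPLS log-price, Eq. (18), valid for t < tc."""
    dt = tc - t
    f = dt ** m
    phase = omega * np.log(dt)
    return A + B * f + C1 * f * np.cos(phase) + C2 * f * np.sin(phase)

rng = np.random.default_rng(2020)                   # arbitrary seed
t = np.arange(1, N_BUBBLE + 1, dtype=float)         # days counted from t1
bubble = lppls(t, **PARAMS) + SIGMA0 * rng.standard_normal(N_BUBBLE)   # Eq. (29)

# Mirror the first N_MIRROR log-prices across t1 and prepend them, so that the
# LPPLS pattern starts exactly at index N_MIRROR of the concatenated series.
pre_bubble = bubble[:N_MIRROR][::-1]
log_price = np.concatenate([pre_bubble, bubble])
true_start_index = N_MIRROR
print(log_price.shape, true_start_index)
```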
For each synthetic bubble price time series, we calibrated Eq. (18) by minimising expression (1) in windows $w = [t_1, t_2]$, varying $t_2$ from 1912/07/01 to 1913/01/01, with $t_1$ scanned from Jan. 1910 up to 30 business days before $t_2$, i.e. up to $t_{1,max} = t_2 - 30$ for each fixed $t_2$. The goal is to determine whether the transition point $\tau$ we determine is close (or even equal) to the true hypothetical value $t_1$ = 01 Jul. 1911 for different maturation times $t_2$ of the bubble. The number of degrees of freedom used for this exercise, as well as for the real-world time series, is $p = 8$, which includes the 7 parameters of the LPPLS model augmented by the extra parameter $t_1$.
4.2.2. Real-world data: analysing bubble periods of different financial Indices
The real-world data sets used consist of bubble periods that occurred on the following major indices: S&P-500², IBovespa³ and SSEC⁴. For each data set and for each fixed pseudo present time $t_2$, depicted by red vertical dashed lines in Fig. (2), our search for the bubble beginning time $\tau$ consists in fitting the LPPLS model using a shrinking estimation window $w$ with $t_1 = [t_2 - 30 : t_2 - 1600]$ and an incremental step size of 3 business days. This yields a total of 514 fits per $t_2$.
4.2.3. Analysis
Let us start with the analysis of the synthetic time series⁵ depicted in Fig. (3). For the earliest $t_2$ = 1912/07/01, our proposed goodness-of-fit scheme is already capable of roughly diagnosing the bubble beginning time correctly, finding the optimal $\tau$ to be ≈ May 1911. In contrast, the competing metric ($\chi^2_{np}(\Phi)$) is degenerate as $t_1 \to t_2$ and is thus blind to the beginning of the bubble. For $t_2$ closer to the end of the bubble, $\chi^2_{np}(\Phi)$ continues to deliver very small optimal windows, suggesting the incorrect conclusion that the bubble started very recently (i.e. close to the pseudo present time $t_2$). This is a signature of strong overfitting, which is quantified via $\lambda$ and reported in the title of each panel of the figure, alongside the bubble beginning time and $t_2$. The Lagrange regularisation of $\chi^2_{np}(\Phi)$ locks into the true value $\tau \approx$ Jul. 1911 as $t_2 \to t_c$, i.e. as $t_2$ moves closer and closer to January 1913 and the LPPLS signal becomes stronger.

² S&P-500: $t_2$'s = {1987.07.15; 1997.06.01; 2000.01.01; 2007.06.01}
³ IBovespa: $t_2$'s = {2000.01.01; 2004.01.01; 2006.01.01; 2007.12.01}
⁴ SSEC: $t_2$'s = {2000.08.01; 2007.05.01; 2009.07.01; 2015.05.01}
⁵ Synthetic: $t_2$'s = {1912.07.01; 1912.10.01; 1912.11.15; 1913.01.01}
We now turn to the real-world time series. For the S&P-500 Index (see Fig. (4)), the results obtained are even more pronounced. While again $\chi^2_{np}(\Phi)$ is unable to diagnose the optimal starting date $\tau \equiv t_1$ of a faster-than-exponential log-price growth, the Lagrange regularisation of $\chi^2_{np}(\Phi)$, depicted by blank triangles in the lower box of the figure, overcomes the tendency of the model to overfit data as $t_1 \to t_2$. Specifically, the method diagnoses the start of the Black Monday bubble at $t_1 \approx$ March 1984 and the beginning of the sub-prime bubble at ≈ Aug. 2003, in accordance with (Zhou and Sornette, 2005).

We also picked two pseudo present times $t_2$ at random in order to check the consistency of the results. To our delight, the method is found capable of capturing, in an endogenous manner, the different time-scales of bubble formation. For $t_2$ = 1997.06.01, the method suggests the presence of a bubble that nucleated almost five years earlier. This recovers the bubble and change of regime in September 1992, documented in Chapter 9 of (Sornette, 2003) as a "false alarm" in the sense that it was not followed by a crash. Nevertheless, it was a genuine change of regime, as the market stopped its ascent and plateaued for the following three months. For $t_2$ = 2000.01.01, $\chi^2_\lambda(\Phi)$ diagnoses a bubble of shorter duration, which started in November 1998. This starting time is coherent with the recovery after the so-called Russian crisis of August-September 1998, when the US stock markets dropped by about 20%. This bubble is nothing but the echo in the S&P500 of the huge dotcom bubble that crashed in March-April 2000. More generally, by scanning $t_2$ and different intervals for $t_1$, the Lagrange regularisation of $\chi^2_{np}(\Phi)$ can endogenously identify a hierarchy of bubbles of different time-scales, reflecting their multi-scale structure (Sornette, 2003; Filimonov et al., 2017).
For the IBovespa and SSEC Indices (Figures (5) and (6), respectively), the clear superiority of the Lagrange regularisation of $\chi^2_{np}(\Phi)$ over the $\chi^2_{np}(\Phi)$ metric is again obvious. For each of the four chosen $t_2$'s in each figure, $\chi^2_\lambda(\Phi)$ exhibits a well-marked minimum corresponding to a well-defined starting time for the corresponding bubble. These objectively identified $t_1$ values agree well with what the eye would have chosen: they pass the "smell test" (Solow, 2010). In contrast, the $\chi^2_{np}(\Phi)$ metric provides essentially no guidance for the determination of $t_1$.
5. Conclusion
We have presented a novel goodness-of-fit metric aimed at comparing goodness-of-fit across a nested hierarchy of data sets of shrinking sizes. This is motivated by the question of identifying the start time of financial bubbles, but applies more generally to any calibration of time series in which the start time of the latest regime of interest is unknown. We have introduced a simple and physically motivated way to correct for the overfitting bias associated with shrinking data sets, which we refer to as the Lagrange regularisation of $\chi^2_{np}(\Phi) := \frac{1}{N-p}\,\mathrm{SSR}$.
We have suggested that the bias can be captured by a Lagrange regularisation parameter $\lambda$. In addition to helping remove or alleviate the bias, this parameter can be used as a diagnostic parameter, or "overfit measure", quantifying the tendency of the model to overfit the data. It is a function both of the specific realisation of the data and of how well the model matches the data-generating process.
Applying the Lagrange regularisation of $\chi^2_{np}(\Phi)$ to simple linear regressions with a change point, to synthetic models of financial bubbles with a well-defined transition regime and to a number of financial time series (US S&P500, Brazil IBovespa and China SSEC Indices), we document its impressive superiority compared with the $\chi^2_{np}(\Phi)$ metric. In an absolute sense, the Lagrange regularisation of $\chi^2_{np}(\Phi)$ is found to provide very reasonable and well-defined determinations of the starting times of major bubbles, such as the bubbles ending with the 1987 Black Monday and the 2008 sub-prime crisis, as well as of minor speculative bubbles on other indices, without any further exogenous information.
Appendix
[Figure 1 panels: data $Y_t = \beta X_t + \varepsilon_t$ and the metrics $\chi^2$, $\chi^2/(N-p)$ and $\chi^2/(N-p) - \lambda(N)$ plotted against the window size $N$ (200 down to 0); the lower panel zooms on $N$ from 100 down to 10.]

Figure 1: Different goodness-of-fit measures applied to a shrinking-window linear regression problem (Eq. 13) in order to diagnose the optimal calibration window length. We simulated synthetic time series of length N = 200 (white circles) using expression (13) with a sudden change of regime at t = −100. We then fitted the same model (13) within shrinking windows (from left to right), i.e. for a fixed $t_2 = 1$, we shrink $t_1$ from $t_1 = -200$ to $t_1 = -3$, and show the values of the $\chi^2(\Phi)$ (blue), $\chi^2_{np}(\Phi)$ (green) and $\chi^2_\lambda(\Phi)$ (red) metrics as a function of this shrinking estimation window. For each pair $[t_2 : t_1]$ (i.e. for each N), the process of generating synthetic data and fitting the model was repeated 20000 times, resulting in confidence bounds for each metric. For t = [−200 : −100], $Y_t$ was simulated with β = 0.3, while for t = [−100 : 1], β = 0.6 was used. Without loss of generality, both the data and the cost functions were divided by their respective maximum values in order to be bounded within the interval [0, 1]. A Python script for generating the figure and performing all calculations can be found in the Appendix.
[Figure 2 panels: ln(P_t) versus time for the Synthetic series (1910 to 1913), the S&P-500 (1982 to 2014), the IBovespa (1998 to 2016) and the SSEC (1999 to 2017), with the chosen $t_2$'s marked.]

Figure 2: Synthetic and real-world time series used in this study for measuring the performance of different goodness-of-fit metrics at different $t_2$'s (red dashed vertical lines): synthetic time series and S&P-500, IBovespa and SSEC Indices with $t_2$'s = {1912.07.01; 1912.10.01; 1912.11.15; 1913.01.01}, {1987.07.15; 1997.06.01; 2000.01.01; 2007.06.01}, {2000.01.01; 2004.01.01; 2006.01.01; 2007.12.01} and {2000.08.01; 2007.05.01; 2009.07.01; 2015.05.01}, respectively.
[Figure 3 panels (upper: ln(P_t) with LPPLS fit; lower: normalised χ² metrics). Panel titles: t1 = 1911-04-17, t2 = 1912-07-15, λ = 1.24e−09 / t1 = 1911-04-21, t2 = 1912-09-15, λ = 8.82e−10 / t1 = 1911-07-20, t2 = 1912-11-15, λ = 1.62e−09 / t1 = 1911-07-22, t2 = 1913-01-01, λ = 1.79e−09]
Figure 3: Diagnosing the beginning of financial bubbles by comparing two goodness-of-fit metrics, $\chi^2_{np}(\Phi)$ vs. $\chi^2_\lambda(\Phi)$, using the LPPLS model on synthetic time series. In the lower plot, $\chi^2_{np}(\Phi)$ is depicted by open circles, while our proposed metric is depicted by open triangles. The dashed black vertical lines denote the minimum of each goodness-of-fit metric and therefore represent the optimal τ ≡ t1 for $\chi^2_{np}(\Phi)$ and $\chi^2_\lambda(\Phi)$. For a fixed t2, the log-price time series was fitted using a shrinking window with t1 = [t2 − 30 : t2 − 1600], sampled every 3 days. For a fixed t2 and t1, we display the resulting fit of the LPPLS model (red line) obtained with the parameters solving Eq. (28).
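The window grid and the selection rule in the Figure 3 caption can be sketched as follows. This is a minimal sketch assuming the lags are calendar days. The LPPLS calibration of Eq. (28) that would produce one cost value per window is not reproduced here. The helpers `t1_grid` and `pick_tau` are hypothetical names.

```python
import numpy as np


def t1_grid(t2, min_lag=30, max_lag=1600, step=3):
    """Candidate start dates t1 = t2 - lag, lag = min_lag, min_lag + step, ... <= max_lag.

    Lags are taken to be calendar days (an assumption about the caption's units).
    """
    t2 = np.datetime64(t2, "D")
    lags = np.arange(min_lag, max_lag + 1, step)
    return t2 - lags.astype("timedelta64[D]")


def pick_tau(t1s, cost):
    """Return the t1 at which the goodness-of-fit metric is smallest (tau = argmin)."""
    return t1s[int(np.argmin(cost))]
```

For t2 = 1987-07-15, `t1_grid` returns 524 candidate start dates, beginning 30 days before t2. `pick_tau` then applies the same argmin rule to whichever metric, $\chi^2_{np}(\Phi)$ or $\chi^2_\lambda(\Phi)$, is evaluated on those windows.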
[Figure 4 panels (upper: ln(P_t) with LPPLS fit; lower: normalised χ² metrics). Panel titles: t1 = 1984-11-25, t2 = 1987-07-15, λ = 1.69e−08 / t1 = 1992-08-26, t2 = 1997-06-01, λ = 5.37e−09 / t1 = 1998-11-06, t2 = 2000-01-01, λ = 1.97e−08 / t1 = 2003-09-05, t2 = 2007-06-01, λ = 7.85e−09]
Figure 4: Same as Figure 3 for the US S&P 500 index.
[Figure 5 panels (upper: ln(P_t) with LPPLS fit; lower: normalised χ² metrics). Panel titles: t1 = 1999-01-21, t2 = 2000-01-01, λ = 1.13e−07 / t1 = 2003-02-22, t2 = 2004-01-01, λ = 3.29e−08 / t1 = 2004-05-25, t2 = 2006-01-01, λ = 1.5e−08 / t1 = 2004-05-20, t2 = 2007-12-01, λ = 1.59e−08]
Figure 5: Same as Figure 3 for the Brazilian IBovespa index.
[Figure 6 panels (upper: ln(P_t) with LPPLS fit; lower: normalised χ² metrics). Panel titles: t1 = 1999-07-02, t2 = 2000-08-01, λ = 3.14e−08 / t1 = 2005-05-07, t2 = 2007-05-01, λ = 4.11e−08 / t1 = 2008-10-25, t2 = 2009-07-01, λ = 6.44e−08 / t1 = 2013-06-22, t2 = 2015-05-01, λ = 2.34e−08]
Figure 6: Same as Figure 3 for the Chinese SSEC index.
# Python script for computing the Lambda regulariser metric - OLS case.
# Copyright: G.Demos @ ETH-Zurich - Jan.2017
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
########################
def simulateOLS():
    """ Generate synthetic OLS data as presented in the paper """
    nobs = 200
    X = np.arange(0, nobs, 1)
    e = np.random.normal(0, 10, nobs)
    beta = 0.5
    Y = beta * X + e                  # linear trend plus Gaussian noise
    Y[:100] = Y[:100] + 4 * e[:100]   # first regime: amplified noise
    Y[100:200] = Y[100:200] * 8       # second regime: series scaled by 8
    return X, Y
########################
def fitDataViaOlsGetBetaAndLine(X, Y):
    """ Fit synthetic OLS through the origin and return the fitted line """
    X = np.asarray(X, dtype=float)
    beta_hat = np.dot(X, Y) / np.dot(X, X)  # get beta
    Yhat = beta_hat * X                      # generate fit
    return Yhat
########################
def getSSE(Y, Yhat, p=1, normed=False):
    """
    Obtain SSE (chi^2)
    p -> No. of parameters
    Y -> Data
    Yhat -> Model
    normed -> if True, return SSE/(N-p)
    """
    error = (np.asarray(Y) - np.asarray(Yhat))**2.
    obj = np.sum(error)
    if normed:
        obj = obj / float(len(Y) - p)
    return obj
########################
def getSSE_and_SSEN_as_a_func_of_dt(normed=False, plot=False):
    """ Obtain SSE and SSE/(N-p) for a given shrinking fitting window w """
    # Simulate initial data
    X, Y = simulateOLS()
    # Get a piece of it: shrinking window
    _sse = []
    _ssen = []
    for i in range(len(X) - 10):  # loop t1 until t1 = (t2 - 10)
        xBatch = X[i:-1]
        yBatch = Y[i:-1]
        YhatBatch = fitDataViaOlsGetBetaAndLine(xBatch, yBatch)
        sse = getSSE(yBatch, YhatBatch, normed=False)
        sseN = getSSE(yBatch, YhatBatch, normed=True)
        _sse.append(sse)
        _ssen.append(sseN)
    _sse = np.array(_sse)
    _ssen = np.array(_ssen)
    if plot:
        f, ax = plt.subplots(1, 1, figsize=(6, 3))
        ax.plot(_sse, color='k')
        a = ax.twinx()
        a.plot(_ssen, color='b')
        plt.tight_layout()
    if not normed:
        return _sse, _ssen, X, Y  # returns results + data
    else:
        # returns results normalised by their maximum + data
        return _sse / _sse.max(), _ssen / _ssen.max(), X, Y
########################
def LagrangeMethod(sse):
    """ Obtain the Lagrange regulariser for a given SSE/(N-p) """
    # Fit the decreasing trend of the cost function
    slope = calculate_slope_of_normed_cost(sse)
    return slope[0]
########################
def calculate_slope_of_normed_cost(sse):
    # Create linear regression object using the scikit-learn package
    regr = linear_model.LinearRegression(fit_intercept=False)
    # Create x range (window index) for the cost values
    x_sse = np.arange(len(sse)).reshape(len(sse), 1)
    # Train the model using the training sets
    res = regr.fit(x_sse, sse)
    return res.coef_
########################
def obtainLagrangeRegularizedNormedCost(X, Y, slope):
    """ Obtain the Lagrange-regularised SSE/(N-p), Pt. III """
    Yhat = fitDataViaOlsGetBetaAndLine(X, Y)  # get model fit
    ssrn_reg = getSSE(Y, Yhat, normed=True)   # classical SSE/(N-p)
    ssrn_lgrn = ssrn_reg - slope * len(Y)     # SSE Lagrange: subtract lambda * N
    return ssrn_lgrn
########################
def GetSSEREGvectorForLagrangeMethod(X, Y, slope):
    """
    X and Y used for calculating the original SSEN
    slope is the beta of fitting OLS to the SSEN
    """
    # Estimate the cost function penalised by lambda using a shrinking window
    _ssenReg = []
    for i in range(len(X) - 10):
        xBatch = X[i:-1]
        yBatch = Y[i:-1]
        regLag = obtainLagrangeRegularizedNormedCost(xBatch, yBatch, slope)
        _ssenReg.append(regLag)
    return _ssenReg
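In the script above, `calculate_slope_of_normed_cost` obtains λ from scikit-learn's `LinearRegression(fit_intercept=False)`, which regresses the cost on the index 0, 1, …, n−1 through the origin. That regression has the closed form λ = Σ i c_i / Σ i². The sketch below uses only NumPy and hypothetical function names. It checks the closed form against `np.linalg.lstsq` on a one-column design matrix, then forms $\chi^2_\lambda = \chi^2_{np} - \lambda N$ in the same way as `obtainLagrangeRegularizedNormedCost`.

```python
import numpy as np


def lagrange_slope(cost):
    """Closed-form slope of a no-intercept least-squares fit of cost on 0, 1, ..., n-1."""
    cost = np.asarray(cost, dtype=float)
    idx = np.arange(cost.size, dtype=float)
    return np.dot(idx, cost) / np.dot(idx, idx)


def lagrange_slope_lstsq(cost):
    """Same slope via a generic least-squares solver (single column, no intercept)."""
    cost = np.asarray(cost, dtype=float)
    design = np.arange(cost.size, dtype=float).reshape(-1, 1)
    coef, *_ = np.linalg.lstsq(design, cost, rcond=None)
    return coef[0]


def lagrange_cost(cost_np, n_obs, lam):
    """chi^2_lambda = chi^2_np - lambda * N for each window."""
    return np.asarray(cost_np, dtype=float) - lam * np.asarray(n_obs, dtype=float)
```

The costs are positive and the regressor is non-negative, so this through-the-origin slope is always positive. As a result, the −λN term lowers the cost of long windows more than that of short ones.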
6. Bibliography
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723.
Bree, D. S., Challet, D., and Peirano, P. P. (2013). Prediction accuracy and sloppiness of log-periodic functions. Quantitative Finance, 13(2):275–280.
Demos, G. and Sornette, D. (2017). Birth or burst of financial bubbles: which one is easier to diagnose? Quantitative Finance, 17(5):657–675.
Filimonov, V., Demos, G., and Sornette, D. (2017). Modified profile likelihood inference and interval forecast of the burst of financial bubbles. Quantitative Finance, 17(8):1167–1186.
Filimonov, V. and Sornette, D. (2013). A Stable and Robust Calibration Scheme of the Log-Periodic Power Law Model. Physica A: Statistical Mechanics and its Applications, 392(17):3698–3707.
Geraskin, P. and Fantazzini, D. (2011). Everything you always wanted to know about log-periodic power laws for bubble modeling but were afraid to ask. The European Journal of Finance, 19(5):366–391.
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. Dover Books on Physics. Dover Publications.
Johansen, A., Ledoit, O., and Sornette, D. (2000). Crashes as critical points. International Journal of Theoretical and Applied Finance, 3(2):219–255.
Loscalzo, S., Yu, L., and Ding, C. (2009). Consensus group stable feature selection. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 567–576.
Ng, A. Y. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-first International Conference on Machine Learning, page 78. ACM.
Shiller, R. (2000). Irrational Exuberance. Princeton University Press, Princeton, NJ.
Solow, R. (2010). Building a science of economics for the real world. House Committee on Science and Technology; Subcommittee on Investigations and Oversight (July 20).
Sornette, D. (2003). Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press, New Jersey.
Sornette, D., Demos, G., Zhang, Q., Cauwels, P., Filimonov, V., and Zhang, Q. (2015). Real-Time Prediction and Post-Mortem Analysis of the Shanghai 2015 Stock Market Bubble and Crash. Journal of Investment Strategies, 4(4):77–95.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288.
Zhang, Q., Zhang, Q., and Sornette, D. (2015). Early warning signals of financial crises with multi-scale quantile regressions of Log-Periodic Power Law Singularities. http://ssrn.com/abstract=2674128.
Zhou, W.-X. and Sornette, D. (2003). Evidence of a worldwide stock market log-periodic anti-bubble since mid-2000. Physica A: Statistical Mechanics and its Applications, 330(3–4):543–583.
Zhou, W.-X. and Sornette, D. (2005). Testing the stability of the 2000–2003 US stock market “antibubble”. Physica A: Statistical Mechanics and its Applications, 348:428–452.