CHAPTER-8 : TEST OF HYPOTHESIS (transcript of gauss.stat.su.se/gu/e/slides/Time Series/Forecast.pdf)
FORECASTING: When estimates of future conditions are made on a systematic basis, the
process is referred to as 'forecasting', and the figure or statement obtained is known as a
forecast. Forecasting is a service whose purpose is to offer the best available basis for
management expectations of the future and to help management understand the implications,
for the firm's future, of the courses of action open to it at present.
Forecasting is concerned with two main tasks: first, the determination of the best basis
available for the formation of intelligent managerial expectations; and second, the handling of
uncertainty about the future, so that the implications of decisions become explicit.
Main Functions of Forecasting: The following are the main functions of forecasting:
(i) The creation of plans of action. It is impossible to evolve a worthwhile system of
business control without an acceptable system of forecasting.
(ii) The second general use of forecasting is found in monitoring the continuing
progress of plans based on forecasts.
(iii) The forecast provides a warning system of the critical factors to be monitored
regularly because they might drastically affect the performance of the plan.
Steps in forecasting: The forecasting of business fluctuations consists of the following steps.
(i) Understanding why changes in the past have occurred: Forecasters should use the
data on past performance to get a speedometer reading of the current rate and of
how far the rate is increasing or decreasing.
(ii) Determining which phases of business activity must be measured: It is necessary
to measure certain phases of business activity in order to predict what changes
will probably follow the present level of activity.
(iii) Selecting and compiling data to be used as measuring devices: There is an
interdependent relationship between the selection of statistical data and the
determination of why business fluctuations occur.
(iv) Analysis of data: Data are analyzed in the light of one's understanding of the
reasons why changes occur.
Methods of Forecasting: The following are some of the important methods of forecasting:
1. Historical Analogy Method;
2. Field survey and opinion poll;
3. Extrapolation
4. Regression Analysis
5. Econometric models
6. Lead-Lag Analysis
7. Exponential smoothing
8. Input-Output Analysis
9. Time series Analysis
TIME SERIES ANALYSIS
Time series: An arrangement of statistical data in accordance with the time of occurrence is known
as a time series. A time series may be expressed mathematically by the functional relationship
Yt = f(t), where Yt is the value of the variable under consideration at time t.
There are two main goals of time series analysis: (a) Identifying the nature of the
phenomenon represented by the sequence of observations, and (b) Forecasting (predicting
future values of the time series variable). Both of these goals require that the pattern of
observed time series data is identified and more or less formally described. Once the pattern
is established, we can interpret and integrate it with other data (i.e., use it in our theory of the
investigated phenomenon, e.g., seasonal commodity prices). Regardless of the depth of our
understanding and the validity of our interpretation (theory) of the phenomenon, we can
extrapolate the identified pattern to predict future values.
Role of Time series analysis: Time series analysis is of great significance in decision-making for the following reasons.
(i) It helps in understanding past behavior: By observing data over a period of
time, one can easily understand what changes have taken place in the past. Such
analysis is extremely helpful in predicting future behavior.
(ii) It helps in planning future operations: If the regularity of occurrence of any
feature over a sufficiently long period can be clearly established, then, within
limits, prediction of probable future variations becomes possible.
(iii) It helps in evaluating current accomplishments: The actual performance can be
compared with the expected performance and the causes of variation analyzed. For
example, if expected sales for 2006-07 were 20,000 colour TV sets and the actual
sales were only 19,000, one can investigate the cause of the shortfall in
achievement.
(iv) It facilitates comparison: Different time series are often compared and important
conclusions drawn therefrom.
Components of Time Series: Changes in the data with the change of time are due to a number of causes; these causes
are known as the components of a time series. The common components of a time series are:
1. Trend (long-term movement or secular trend) is the long-run direction of the
time series.
2. Seasonal Variation is the pattern in a time series within a year. These patterns tend to
repeat themselves from year to year.
3. Cyclical variation is the fluctuation above and below the trend line.
4. Irregular or Random variation is divided into two components. Episodic
variations are unpredictable but can usually be identified, such as a flood or
hurricane. Residual variations are random in nature and cannot be identified.
Two General Aspects of Time Series Patterns
Most time series patterns can be described in terms of two basic classes of components: trend
and seasonality. The former represents a general systematic linear or (most often) nonlinear
component that changes over time and does not repeat or at least does not repeat within the
time range captured by our data (e.g., a plateau followed by a period of exponential growth).
The latter may have a formally similar nature (e.g., a plateau followed by a period of
exponential growth), however, it repeats itself in systematic intervals over time. Those two
general classes of time series components may coexist in real-life data. For example, sales of
a company can rapidly grow over years but they still follow consistent seasonal patterns (e.g.,
as much as 25% of yearly sales each year are made in December, whereas only 4% in
August).
This general pattern is well illustrated in a "classic" Series G data set (Box and Jenkins, 1976,
p. 531) representing monthly international airline passenger totals (measured in thousands) in
twelve consecutive years from 1949 to 1960 (see example data file G.sta and graph above). If
you plot the successive observations (months) of airline passenger totals, a clear, almost
linear trend emerges, indicating that the airline industry enjoyed a steady growth over the
years (approximately 4 times more passengers traveled in 1960 than in 1949). At the same
time, the monthly figures will follow an almost identical pattern each year (e.g., more people
travel during holidays than during any other time of the year). This example data file also
illustrates a very common general type of pattern in time series data, where the amplitude of
the seasonal changes increases with the overall trend (i.e., the variance is correlated with the
mean over the segments of the series). This pattern, which is called multiplicative
seasonality, indicates that the relative amplitude of seasonal changes is constant over time
and is thus related to the trend.
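This behavior can be illustrated with a small synthetic example (hypothetical numbers, not the Series G data): a series with exponential growth and a fixed multiplicative seasonal pattern shows widening seasonal swings in the raw data, while a log transform makes the swings constant in amplitude.

```python
import math

# Hypothetical series: 5% growth per period with a fixed multiplicative
# seasonal pattern that repeats every 12 periods.
season = [1.0, 1.2, 0.9, 1.1, 0.8, 1.0, 1.3, 0.9, 1.0, 1.1, 0.7, 1.0]
series = [100 * 1.05 ** t * season[t % 12] for t in range(24)]

# In the raw series the seasonal swings widen as the level rises:
amp1 = max(series[:12]) - min(series[:12])
amp2 = max(series[12:]) - min(series[12:])

# After a log transform the swings have constant amplitude, the
# signature of multiplicative seasonality.
logs = [math.log(y) for y in series]
log_amp1 = max(logs[:12]) - min(logs[:12])
log_amp2 = max(logs[12:]) - min(logs[12:])
```

Year two of the log series is just year one shifted up by a constant, so its within-year amplitude is identical; the raw series' amplitude grows with the level.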
Trend Analysis
There are no proven "automatic" techniques to identify trend components in the time series
data; however, as long as the trend is monotonous (consistently increasing or decreasing) that
part of data analysis is typically not very difficult. If the time series data contain considerable
error, then the first step in the process of trend identification is smoothing.
Smoothing always involves some form of local averaging of data such that the nonsystematic
components of individual observations cancel each other out. The most common technique is
moving average smoothing which replaces each element of the series by either the simple or
weighted average of n surrounding elements, where n is the width of the smoothing
"window". Medians can be used instead of means. The main advantage of median as
compared to moving average smoothing is that its results are less biased by outliers (within
the smoothing window). Thus, if there are outliers in the data (e.g., due to measurement
errors), median smoothing typically produces smoother or at least more "reliable" curves than
moving average based on the same window width. The main disadvantage of median
smoothing is that in the absence of clear outliers it may produce more "jagged" curves than
moving average and it does not allow for weighting.
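The two smoothers described above can be sketched as follows (a minimal illustration; the data and window width are hypothetical):

```python
import statistics

def moving_average(series, window):
    """Simple (unweighted) moving average; one value per full window."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def moving_median(series, window):
    """Median smoothing: less sensitive to outliers inside the window."""
    return [statistics.median(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Hypothetical data with one outlier (e.g., a measurement error) at index 3.
data = [10, 12, 11, 90, 13, 12, 14]
ma = moving_average(data, 3)   # the outlier inflates three windows
med = moving_median(data, 3)   # the outlier is absorbed by the median
```

With this data the median-smoothed curve stays near the typical level, while every moving-average window containing the outlier is pulled far upward.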
Fitting a function. Many monotonous time series data can be adequately approximated by a
linear function; if there is a clear monotonous nonlinear component, the data first need to be
transformed to remove the nonlinearity. Usually a logarithmic, exponential, or (less often)
polynomial function can be used.
Analysis of Seasonality
Seasonal dependency (seasonality) is another general component of the time series pattern.
The concept was illustrated in the example of the airline passengers’ data above. It is
formally defined as correlational dependency of order k between each i'th element of the
series and the (i-k)'th element and measured by autocorrelation (i.e., a correlation between the
two terms); k is usually called the lag. If the measurement error is not too large, seasonality
can be visually identified in the series as a pattern that repeats every k elements.
Autocorrelation correlogram. Seasonal patterns of time series can be examined via
correlograms. The correlogram (autocorrelogram) displays graphically and numerically the
autocorrelation function (ACF), that is, serial correlation coefficients (and their standard
errors) for consecutive lags in a specified range of lags (e.g., 1 through 30). Ranges of two
standard errors for each lag are usually marked in correlograms but typically the size of auto
correlation is of more interest than its reliability (see Elementary Concepts) because we are
usually interested only in very strong (and thus highly significant) autocorrelations.
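The quantity behind a correlogram, the lag-k autocorrelation, can be computed directly (a minimal sketch on a hypothetical series with period 4; the estimator takes deviations from the overall mean and divides by the total sum of squares, a standard ACF convention):

```python
def autocorrelation(series, lag):
    """Serial correlation between y_t and y_(t-lag)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((y - mean) ** 2 for y in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

# Hypothetical series that repeats every 4 elements: the correlogram
# (ACF over lags 1 through 8) should peak at lag 4.
series = [5, 9, 7, 3] * 6
acf_values = [autocorrelation(series, k) for k in range(1, 9)]
```

For this perfectly periodic series the largest coefficient is at lag 4 (and a smaller echo appears at lag 8), which is exactly the pattern one looks for in a correlogram.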
Examining correlograms. While examining correlograms, you should keep in mind that
autocorrelations for consecutive lags are formally dependent. Consider the following
example. If the first element is closely related to the second, and the second to the third, then
the first element must also be somewhat related to the third one, etc. This implies that the
pattern of serial dependencies can change considerably after removing the first order auto
correlation (i.e., after differencing the series with a lag of 1).
Partial autocorrelations. Another useful method to examine serial dependencies is to
examine the partial autocorrelation function (PACF) - an extension of autocorrelation, where
the dependence on the intermediate elements (those within the lag) is removed. In other
words, the partial autocorrelation is similar to the autocorrelation, except that when calculating it,
the (auto)correlations with all the elements within the lag are partialled out. If a lag of 1 is
specified (i.e., there are no intermediate elements within the lag), then the partial
autocorrelation is equivalent to auto correlation. In a sense, the partial autocorrelation
provides a "cleaner" picture of serial dependencies for individual lags (not confounded by
other serial dependencies).
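The relationship between the two functions can be checked numerically. The sketch below estimates the partial autocorrelation by fitting an AR(lag) model through the Yule-Walker equations and returning its last coefficient (one standard estimator; the helper names are ours). At lag 1 it reduces exactly to the ordinary autocorrelation, as stated above.

```python
def acf(series, lag):
    """Autocorrelation at the given lag (deviations from the overall mean)."""
    n = len(series)
    m = sum(series) / n
    var = sum((y - m) ** 2 for y in series)
    return sum((series[t] - m) * (series[t - lag] - m)
               for t in range(lag, n)) / var

def pacf(series, lag):
    """Partial autocorrelation: last coefficient of an AR(lag) model fitted
    via the Yule-Walker equations (solved here by Gaussian elimination)."""
    r = [acf(series, k) for k in range(lag + 1)]       # r[0] == 1
    n = lag
    A = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    rhs = [r[k] for k in range(1, n + 1)]
    for col in range(n):                               # forward elimination
        piv = A[col][col]
        for j in range(col, n):
            A[col][j] /= piv
        rhs[col] /= piv
        for row in range(col + 1, n):
            f = A[row][col]
            for j in range(col, n):
                A[row][j] -= f * A[col][j]
            rhs[row] -= f * rhs[col]
    phi = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        phi[i] = rhs[i] - sum(A[i][j] * phi[j] for j in range(i + 1, n))
    return phi[-1]

# With no intermediate elements at lag 1, PACF equals ACF exactly.
data = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0, 10.0, 13.0]
```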
Removing serial dependency. Serial dependency for a particular lag k can be removed by
differencing the series, that is, converting each i'th element of the series into its difference
from the (i-k)'th element. There are two major reasons for such transformations.
First, we can identify the hidden nature of seasonal dependencies in the series. Remember
that, as mentioned in the previous paragraph, autocorrelations for consecutive lags are
interdependent. Therefore, removing some of the autocorrelations will change other auto
correlations, that is, it may eliminate them or it may make some other seasonalities more
apparent.
The other reason for removing seasonal dependencies is to make the series stationary which
is necessary for ARIMA and other techniques.
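Differencing at lag k can be sketched as follows (hypothetical quarterly data; differencing at the seasonal lag removes both the repeating pattern and the linear trend, leaving a constant, i.e. stationary, series):

```python
def difference(series, lag=1):
    """Replace each element by its difference from the element lag steps back."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: linear trend plus a seasonal pattern of period 4.
season = [0, 5, 2, 7]
series = [10 + 2 * t + season[t % 4] for t in range(16)]

# Differencing at the seasonal lag (k = 4) cancels the seasonal component and
# reduces the trend to its constant 4-period increment.
d4 = difference(series, lag=4)
```

Every value of `d4` equals 8 (the trend gain over four periods), so the differenced series is constant, hence stationary.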
Systematic Pattern and Random Noise
As in most other analyses, in time series analysis it is assumed that the data consist of a
systematic pattern (usually a set of identifiable components) and random noise (error) which
usually makes the pattern difficult to identify. Most time series analysis techniques involve
some form of filtering out noise in order to make the pattern more salient.
Models of Time Series: A time series may be affected by one or more components simultaneously. Two different
models are commonly assumed in time series analysis.
A. The additive model: According to the additive model, a time series can be expressed as
Yt =Tt+St+Ct+It
Where Yt = Time series value at time t
Tt = Trend values at time t
St = Seasonal variation at time t
Ct = Cyclical variation at time t
It = Irregular variation at time t
B. The multiplicative model: In the classical or traditional approach, it is assumed that there is
a multiplicative relationship among the four components.
Any particular value Yt is considered to be the product of trend (Tt), seasonal variation (St),
cyclical variation (Ct) and irregular variation (It). Thus Yt = Tt St Ct It.
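The two models can be contrasted numerically (all component values below are hypothetical; in the multiplicative form the seasonal, cyclical and irregular components are indexes around 1, while in the additive form they are deviations in the units of the series):

```python
# Multiplicative model: Y_t = T_t * S_t * C_t * I_t
T_t = 200.0   # trend level
S_t = 1.10    # seasonal index (10% above normal for this season)
C_t = 0.95    # cyclical index (5% below the trend because of the cycle)
I_t = 1.02    # irregular index (a small random shock)
y_mult = T_t * S_t * C_t * I_t

# Additive model: Y_t = T_t + S_t + C_t + I_t, with components expressed
# as deviations in the same units as the series.
T_a, S_a, C_a, I_a = 200.0, 20.0, -10.0, 4.0
y_add = T_a + S_a + C_a + I_a
```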
1. Trend (Tt): By trend we mean the general tendency of the data to increase or decrease
during a long period of time. This is true of most series of business and economic
statistics. For example, an upward tendency is seen in data pertaining to population,
agricultural production, currency in circulation, etc., while a downward tendency is
noticed in data on birth rates, death rates, etc.
2. Seasonal Variation (St): Seasonal variations are the periodic and regular movements in a
time series with a period of less than one year, for example the demand for umbrellas in the
rainy season, the demand for warm clothes in the winter, and the demand for cold drinks in
the summer. The factors that cause seasonal variations are:
(i) Climate and weather conditions
(ii) Customs, traditions and habits etc.
3. Cyclical variations (Ct): The oscillatory movements in a time series with a period of
oscillation of more than one year are termed cyclical fluctuations. One complete period is
called a 'cycle'. The cyclical movements in a time series are generally attributed to the so-
called business cycle. There are four well-defined periods or phases in the business cycle,
namely prosperity, recession (decline), depression and recovery, and a cycle normally lasts
from seven to eleven years.
4. Irregular variation (It): Besides trend, seasonal variations and cyclical variations, there
are other factors which cause variations in a time series. These variations are purely random
and unpredictable, and are due to irregular circumstances beyond human control. These
irregular but powerful fluctuations are caused by floods, famines, revolutions, political
unrest, droughts, etc.
Method of Measuring Trend
Trend can be measured by the following methods:
1. The free hand or graphic method;
2. The semi-average method;
3. The method of moving average;
4. The least squares method;
1. The graphic method: A freehand smooth curve obtained by plotting the values Yt
against t enables us to form an idea about the general trend of the series.
This method is simple and does not require mathematical skill, but different researchers
may obtain different trend lines for the same set of data. Forecasting by this method is
risky unless the researcher is efficient and experienced.
2. Method of semi-averages: In this method the whole data set is divided into two parts with
respect to time. If the number of observations is odd, the two parts are obtained by omitting
the value corresponding to the middle of the series. Next we compute the arithmetic mean for
each part and plot these two averages against the mid-points of the respective periods covered
by each part. The line obtained on joining these two points is the required trend line.
This method is simpler to understand than the moving average method and the method
of least squares. However, it assumes a straight-line relationship between the plotted points
regardless of whether that relationship actually exists.
3. Method of moving averages: In this method 3-, 4-, or 5-year moving averages of the
variable values are first obtained. The arithmetic mean of the first three years' values is
computed and placed against the middle of those years. Then, excluding the first year's value,
the arithmetic mean of the 2nd, 3rd and 4th years' values is calculated and placed against
their middle year. In this way 4-year and 5-year moving averages can be computed. The
graph obtained on plotting the moving averages against time gives the trend.
Merits: * Long-term trend determination is easy by the moving average method.
** If an appropriate moving average is taken, the irregular movement is reduced
to a great extent.
Limitations: * Trend values cannot be estimated for all points of the time series by the
method of moving averages; some values at the start and some values at the end are not
obtainable.
** Moving averages are affected by extreme values.
*** This method cannot be used for forecasting future trend, which is the main objective of
the time series analysis.
4. Least squares method: This method is widely used in practice. When this method is
applied, a trend line is fitted to the data in such a manner that the following two conditions
are satisfied:
(i) Σ(Y - Yc) = 0; (ii) Σ(Y - Yc)² is the least.
The straight line is represented by the equation
Yc = a + bX
where Yc denotes the trend values, Y the actual values, a is the intercept, and b is the slope of
the line, i.e., the change in the Y variable associated with a change of one unit in the X variable.
The long-term (linear) trend equation estimated by least squares for time t is
Ŷ = â + b̂t. On the basis of this trend line, values of Y can be obtained for different values of
t, and prediction of future values can be done.
Example: The owner of Strong Homes would like a forecast for the next couple of years of
new homes that will be constructed in the Pittsburgh area. Listed below are the sales of new
homes constructed in the area for the last 5 years.
Year Sales
1997 4.3
1998 5.6
1999 7.8
2000 9.2
2001 9.7
Total 36.6
Year Sales t Sales*t t2
1997 4.3 1 4.3 1
1998 5.6 2 11.2 4
1999 7.8 3 23.4 9
2000 9.2 4 36.8 16
2001 9.7 5 48.5 25
Total 36.6 15 124.2 55
Develop a trend equation using the least squares method by letting 1997 be the time period 1.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [124.2 - (15)(36.6)/5] / [55 - (15)²/5]
  = 14.4 / 10 = 1.44
a = ΣY/n - b(Σt/n)
  = 36.6/5 - 1.44 × (15/5)
  = 7.32 - 4.32 = 3.00
The time series equation is: Y’ = 3.00 + 1.44t
The forecast for the year 2003 is: Y’ = 3.00 + 1.44(7) = 13.08
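The computation in this example can be reproduced with a short script (a sketch of the least squares formulas used above; the function name is ours):

```python
def fit_linear_trend(t, y):
    """Least-squares slope b and intercept a for the trend line Y = a + b*t."""
    n = len(t)
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    b = (sty - st * sy / n) / (stt - st * st / n)
    a = sy / n - b * st / n
    return a, b

# Strong Homes data, with 1997 taken as time period 1.
t = [1, 2, 3, 4, 5]
sales = [4.3, 5.6, 7.8, 9.2, 9.7]
a, b = fit_linear_trend(t, sales)
forecast_2003 = a + b * 7          # 2003 corresponds to t = 7
```

This reproduces a = 3.00, b = 1.44, and the 2003 forecast of 13.08 shown above.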
If the trend is not linear but rather the increases tend to be a constant percent, the Y values
are converted to logarithms, and a least squares equation is determined using the logs.
Method of Moving Averages: This consists of measuring trend by smoothing out the
fluctuations of the data by means of a moving average. A moving average of extent (or
period) m is a series of successive averages (arithmetic means) of m terms at a time, starting
with the 1st, 2nd, 3rd term, etc. Thus the first average is the mean of the first m terms, the
second is the mean of the m terms from the 2nd to the (m+1)th term, and so on. Each moving
average is placed against the middle value of the time interval it covers. When m is even, the
moving average does not coincide with an original time period, and the moving averages are
synchronized with the original data by centering them, which consists of taking a moving
average of extent two of these moving averages and placing these values against the middle
time period. The graph obtained on plotting the moving averages against time gives the
trend.
Example: The data on the rice production during 1990-2000 in a certain region are given
below:
Year: 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Production
(Ton) 280 300 325 420 315 360 400 450 350 420 460
Determine the trend by method of moving average.
Solution: It is clear from the data that a 4-year cycle is present here. So 4-year moving
averages are computed.
Year  Production (Ton)  4-year moving total  4-year moving average  4-year moving average (centered)
1990  280
1991  300
                        1325                 331.25
1992  325                                                           335.63
                        1360                 340.00
1993  420                                                           347.50
                        1420                 355.00
1994  315                                                           364.38
                        1495                 373.75
1995  360                                                           377.50
                        1525                 381.25
1996  400                                                           385.63
                        1560                 390.00
1997  450                                                           397.50
                        1620                 405.00
1998  350                                                           412.50
                        1680                 420.00
1999  420
2000  460
log(Y′) = log(a) + [log(b)]·t  (the least squares equation fitted to the logs when the trend shows constant percentage increases, as noted above)
The trend line is estimated by plotting the 4-year moving averages along the y-axis against
the corresponding year plotted along the x-axis.
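The centered moving averages in the table can be reproduced as follows (a minimal sketch; note the first centered value, 335.625, rounds to the 335.63 shown in the table):

```python
def centered_moving_average(series, window=4):
    """4-period moving averages, then averaged in pairs to center them."""
    ma = [sum(series[i:i + window]) / window
          for i in range(len(series) - window + 1)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# Rice production (tons), 1990-2000.
production = [280, 300, 325, 420, 315, 360, 400, 450, 350, 420, 460]
centered = centered_moving_average(production)
```

Eleven observations yield eight 4-year averages and seven centered values, which is why the table has no trend value for the first two and last two years.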
The moving-average method is used to smooth out a time series. This is accomplished by
“moving” the arithmetic mean through the time series.
The moving-average is the basic method used in measuring the seasonal fluctuation.
To apply the moving-average method to a time series, the data should follow a fairly linear
trend and have a definite rhythmic pattern of fluctuations.
The method most commonly used to compute the typical seasonal pattern is called the ratio-
to-moving-average method.
It eliminates the trend, cyclical, and irregular components from the original data (Y).
The numbers that result are called the typical seasonal indexes.
Step 1: Determine the moving total for the time series.
Step 2: Determine the moving average for the time series.
Step 3: The moving averages are then centered.
Step 4: The specific seasonal for each period is then computed by dividing the Y values by
the centered moving averages.
Step 5: Organize the specific seasonals in a table.
Step 6: Apply the correction factor.
The resulting series (sales) is called deseasonalized sales or seasonally adjusted sales.
The reason for deseasonalizing a series (sales) is to remove the seasonal fluctuations so that
the trend and cycle can be studied. A set of typical indexes is very useful in adjusting a series
(sales, for example).
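Steps 1-6 can be sketched on a small hypothetical quarterly series (the sales figures below are invented purely for illustration):

```python
# Hypothetical quarterly sales: 3 years, a mild trend plus a fixed seasonal pattern.
sales = [10, 20, 15, 5,
         12, 22, 17, 7,
         14, 24, 19, 9]

# Steps 1-3: 4-quarter moving totals, moving averages, then centering.
totals = [sum(sales[i:i + 4]) for i in range(len(sales) - 3)]
averages = [t / 4 for t in totals]
centered = [(averages[i] + averages[i + 1]) / 2 for i in range(len(averages) - 1)]

# Step 4: specific seasonals = Y divided by the centered moving average.
# The first centered value aligns with the 3rd observation (index 2).
specific = {q: [] for q in range(4)}
for i, cma in enumerate(centered):
    obs = i + 2
    specific[obs % 4].append(sales[obs] / cma)

# Steps 5-6: average the specific seasonals per quarter, then apply the
# correction factor so the four typical indexes sum to 4.0.
means = {q: sum(v) / len(v) for q, v in specific.items()}
correction = 4.0 / sum(means.values())
seasonal_index = {q: m * correction for q, m in means.items()}
```

With this data the second quarter (the seasonal peak) gets the largest index and the fourth quarter (the trough) the smallest, and dividing each observation by its quarter's index would give the deseasonalized series.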
Example: The data on rice production during 1990-2000 in a large agricultural area:
Year : 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Production: 28 30 32 37 39 35 38 40 45 43 52
(Tons)
(a) Fit a trend line by the method of 3-yearly moving averages;
(b) Fit a trend line by the method of least squares and comment;
(c) Estimate the production for the year 2003.
1. Below are given the figures of food requirement (in million tons) for a country:
Year Food grain Requirement
(Million tons)
1996 20.22
1997 20.58
… …
…. …
… …
2006 23.03
(i) Fit a straight line by the "Least Squares Method" and tabulate the trend values.
(ii) What is the monthly increase in food requirement for this country?
(iii) Estimate the food requirement (in million tons) for Bangladesh in the year 2015.
Solution: (i) Computation of Trend value
Computational Table
Year   Food grain Requirement (Y)   t = Year - 1995   t²    tY       Trend Values (million tons)   Elimination of Trend
1996   20.22                        1                 1     20.22    20.33                         -0.12
1997   20.58                        2                 4     41.16    20.61                         -0.03
1998   20.94                        3                 9     62.83    20.89                          0.05
1999   21.21                        4                 16    84.83    21.17                          0.03
2000   21.49                        5                 25    107.45   21.45                          0.04
2001   21.77                        6                 36    130.63   21.74                          0.04
2002   22.09                        7                 49    154.66   22.02                          0.08
2003   22.35                        8                 64    178.81   22.30                          0.05
2004   22.55                        9                 81    202.94   22.58                         -0.03
2005   22.86                        10                100   228.56   22.86                          0.00
2006   23.03                        11                121   253.33   23.14                         -0.11
Total  239.09                       66                506   1465.41
Let the trend equation (time series equation) be Y = a + bt.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [1465.41 - (66)(239.09)/11] / [506 - (66)²/11]
  = 0.28078
a = ΣY/n - b(Σt/n) = 239.09/11 - 0.28078 × (66/11) = 20.0505
So the trend equation (time series equation) is Y = 20.0505 + 0.28078t
(ii) The yearly increase in food requirement given by the linear trend is 0.28078 million tons,
i.e., 280.78 thousand tons. So the monthly increase in food requirement is 280.78/12 =
23.398 thousand tons.
(iii) The estimated food requirement (in million tons) in the year 2015 (t = 20) is
Y = 20.0505 + 0.28078 × 20 = 25.666
Estimated food requirement is 25.666 million tons
2. Below are given the figures of mid-year population (in millions) for a country:
Year Mid-Year Population
(Million)
1996 122.10
1997 124.30
… …
… …
… …
2006 139.10
(i) Fit a straight line by the "Least Squares Method" and tabulate the trend values.
(ii) What is the monthly increase in population for this country?
(iii) Estimate the population of Bangladesh in the year 2015.
Solution: (i) Computation of Trend value
Computational Table
Year   Population in million (Y)   t = Year - 1995   tY        t²    Trend Values (million)   Elimination of Trend
1996   122.10                      1                 122.10    1     122.80                   -0.70
1997   124.30                      2                 248.60    4     124.50                   -0.20
1998   126.50                      3                 379.50    9     126.19                    0.31
1999   128.10                      4                 512.40    16    127.89                    0.21
2000   129.80                      5                 649.00    25    129.59                    0.21
2001   131.50                      6                 789.00    36    131.28                    0.22
2002   133.45                      7                 934.15    49    132.98                    0.47
2003   135.00                      8                 1080.00   64    134.67                    0.33
2004   136.20                      9                 1225.80   81    136.37                   -0.17
2005   138.05                      10                1380.50   100   138.07                   -0.02
2006   139.10                      11                1530.10   121   139.76                   -0.66
Total  1444.10                     66                8851.15   506
Let the trend equation (time series equation) be Y = a + bt.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [8851.15 - (66)(1444.1)/11] / [506 - (66)²/11]
  = 1.69591
a = ΣY/n - b(Σt/n) = 1444.1/11 - 1.69591 × (66/11) = 121.106
So the trend equation (time series equation) is Y = 121.106 + 1.69591t
(ii) The yearly increase in population given by the linear trend is 1.69591 million. So the
monthly increase in population is 1.69591 million / 12 ≈ 141,326 people.
(iii) The estimated population for the country in the year 2015 (t = 20) is
Y = 121.106 + 1.69591 × 20 = 155.0242 million
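The arithmetic of this exercise can be checked from the raw table values (a sketch; tiny differences from the printed figures are rounding effects):

```python
# Mid-year population (millions), 1996-2006, from the computational table.
population = [122.10, 124.30, 126.50, 128.10, 129.80, 131.50,
              133.45, 135.00, 136.20, 138.05, 139.10]
t = list(range(1, 12))                 # t = Year - 1995

n = len(t)
sum_t, sum_y = sum(t), sum(population)
sum_tt = sum(ti * ti for ti in t)
sum_ty = sum(ti * yi for ti, yi in zip(t, population))

# Least squares slope and intercept, as in the worked solution.
b = (sum_ty - sum_t * sum_y / n) / (sum_tt - sum_t ** 2 / n)
a = sum_y / n - b * sum_t / n

forecast_2015 = a + b * 20             # 2015 corresponds to t = 20
monthly_increase = b / 12              # millions of people per month
```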
Exponential Smoothing
General Introduction
Exponential smoothing has become very popular as a forecasting method for a wide variety
of time series data. Historically, the method was independently developed by Brown and
Holt. Brown worked for the US Navy during World War II, where his assignment was to
design a tracking system for fire-control information to compute the location of submarines.
Later, he applied this technique to the forecasting of demand for spare parts (an inventory
control problem). He described those ideas in his 1959 book on inventory control. Holt's
research was sponsored by the Office of Naval Research; independently, he developed
exponential smoothing models for constant processes, processes with linear trends, and for
seasonal data.
Simple Exponential Smoothing
A simple and pragmatic model for a time series would be to consider each observation as
consisting of a constant (b) and an error component (epsilon), that is: Xt = b + εt. The
constant b is relatively stable in each segment of the series, but may change slowly over time.
If appropriate, then one way to isolate the true value of b, and thus the systematic or
predictable part of the series, is to compute a kind of moving average, where the current and
immediately preceding ("younger") observations are assigned greater weight than the
respective older observations. Simple exponential smoothing accomplishes exactly such
weighting, where exponentially smaller weights are assigned to older observations. The
specific formula for simple exponential smoothing is:
St = α*Xt + (1 - α)*St-1
When applied recursively to each successive observation in the series, each new smoothed
value (forecast) is computed as the weighted average of the current observation and the
previous smoothed observation; the previous smoothed observation was computed in turn
from the previous observed value and the smoothed value before the previous observation,
and so on. Thus, in effect, each smoothed value is the weighted average of the previous
observations, where the weights decrease exponentially depending on the value of the
parameter α (alpha). If α is equal to 1 (one), then the previous observations are ignored
entirely; if α is equal to 0 (zero), then the current observation is ignored entirely, and the
smoothed value consists entirely of the previous smoothed value (which in turn is computed
from the smoothed observation before it, and so on; thus all smoothed values will be equal to
the initial smoothed value S0). Values of α in between will produce intermediate results.
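The recursion above can be sketched directly, including the two boundary cases for α (the data and S0 below are hypothetical):

```python
def exponential_smoothing(series, alpha, s0):
    """Simple exponential smoothing: S_t = alpha*X_t + (1 - alpha)*S_(t-1)."""
    smoothed = []
    s = s0
    for x in series:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

data = [3.0, 5.0, 4.0, 6.0, 5.0]

# alpha = 1 reproduces the series itself (previous observations ignored);
# alpha = 0 keeps the initial value S0 forever; in-between values mix the two.
all_data = exponential_smoothing(data, 1.0, s0=4.0)
all_s0 = exponential_smoothing(data, 0.0, s0=4.0)
mixed = exponential_smoothing(data, 0.3, s0=4.0)
```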
Even though significant work has been done to study the theoretical properties of (simple and
complex) exponential smoothing, the method has gained popularity mostly because of its
usefulness as a forecasting tool. Thus, regardless of the theoretical model for the process
underlying the observed time series, simple exponential smoothing will often produce quite
accurate forecasts.
Choosing the Best Value for Parameter α (alpha)
Gardner (1985) discusses various theoretical and empirical arguments for selecting an
appropriate smoothing parameter. Obviously, α should fall into the interval between 0 (zero)
and 1 (one). Among practitioners, an α smaller than 0.30 is usually recommended. However, in the
study by Makridakis (1982), α values above 0.30 frequently yielded the best forecasts.
Page# 13
Estimating the best α value from the data. In practice, the smoothing parameter α is often
chosen by a grid search of the parameter space; that is, different solutions for α are tried,
starting, for example, with α = 0.1 through α = 0.9, with increments of 0.1. Then α is chosen
so as to produce the smallest sums of squares (or mean squares) for the residuals (i.e., observed
values minus one-step-ahead forecasts; this mean squared error is also referred to as ex post
mean squared error, ex post MSE for short).
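Such a grid search can be sketched as follows (hypothetical data; the one-step-ahead forecast of Xt is taken to be St-1, and S0 is set to the first observation, one common convention):

```python
def one_step_forecasts(series, alpha, s0):
    """Forecast for each X_t is the smoothed value from observations before t."""
    s, out = s0, []
    for x in series:
        out.append(s)                      # forecast X_t before seeing it
        s = alpha * x + (1 - alpha) * s    # then update the smoothed value
    return out

def ex_post_mse(series, alpha, s0):
    forecasts = one_step_forecasts(series, alpha, s0)
    return sum((x - f) ** 2 for x, f in zip(series, forecasts)) / len(series)

data = [12.0, 13.0, 12.5, 14.0, 13.5, 15.0, 14.5, 16.0]

# Grid search: try alpha = 0.1, 0.2, ..., 0.9 and keep the value with the
# smallest ex post MSE.
grid = [round(0.1 * k, 1) for k in range(1, 10)]
best_alpha = min(grid, key=lambda a: ex_post_mse(data, a, s0=data[0]))
```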
Indices of Lack of Fit (Error)
The most straightforward way of evaluating the accuracy of the forecasts based on a
particular α value is to simply plot the observed values and the one-step-ahead forecasts.
This plot can also include the residuals (scaled against the right Y-axis), so that regions of
better or worse fit can also easily be identified.
This visual check of the accuracy of forecasts is often the most powerful method for
determining whether or not the current exponential smoothing model fits the data. In
addition, besides the ex post MSE criterion (see previous paragraph), there are other
statistical measures of error that can be used to determine the optimum α parameter (see
Makridakis, Wheelwright, and McGee, 1983):
Mean error: The mean error (ME) value is simply computed as the average error value
(average of observed minus one-step-ahead forecast). Obviously, a drawback of this measure
is that positive and negative error values can cancel each other out, so this measure is not a
very good indicator of overall fit.
Mean absolute error: The mean absolute error (MAE) value is computed as the average
absolute error value. If this value is 0 (zero), the fit (forecast) is perfect. As compared to the
mean squared error value, this measure of fit will "de-emphasize" outliers, that is, unique or
rare large error values will affect the MAE less than the MSE value.
Sum of squared error (SSE), Mean squared error. These values are computed as the sum
(or average) of the squared error values. This is the most commonly used lack-of-fit indicator
in statistical fitting procedures.
Percentage error (PE). All the above measures rely on the actual error value. It may seem
reasonable to rather express the lack of fit in terms of the relative deviation of the one-step-
ahead forecasts from the observed values, that is, relative to the magnitude of the observed
values. For example, when trying to predict monthly sales that may fluctuate widely (e.g.,
seasonally) from month to month, we may be satisfied if our prediction "hits the target" with
about ±10% accuracy. In other words, the absolute errors may be not so much of interest as
Page# 14
are the relative errors in the forecasts. To assess the relative error, various indices have been
proposed (see Makridakis, Wheelwright, and McGee, 1983). The first one, the percentage
error value, is computed as:
PEt = 100*(Xt - Ft )/Xt
where Xt is the observed value at time t, and Ft is the forecast (smoothed value) at time t.
Mean percentage error (MPE). This value is computed as the average of the PE values.
Mean absolute percentage error (MAPE). As is the case with the mean error value (ME,
see above), a mean percentage error near 0 (zero) can be produced by large positive and
negative percentage errors that cancel each other out. Thus, a better measure of relative
overall fit is the mean absolute percentage error. Also, this measure is usually more
meaningful than the mean squared error. For example, knowing that the average forecast is
"off" by ±5% is a useful result in and of itself, whereas a mean squared error of 30.8 is not
immediately interpretable.
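As a concrete illustration, the error measures above can be computed in a few lines. The sketch below is not tied to any particular package; `forecast_errors` is a hypothetical helper, and it assumes no observed value is zero (so the percentage errors are defined):

```python
import numpy as np

def forecast_errors(observed, forecast):
    """Compute the error measures discussed above for a series of
    one-step-ahead forecasts (assumes no observed value is zero)."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = observed - forecast            # error at each time t
    pe = 100.0 * e / observed          # percentage error PEt
    return {
        "ME":   e.mean(),              # mean error (signs can cancel out)
        "MAE":  np.abs(e).mean(),      # mean absolute error
        "SSE":  (e ** 2).sum(),        # sum of squared errors
        "MSE":  (e ** 2).mean(),       # mean squared error
        "MPE":  pe.mean(),             # mean percentage error
        "MAPE": np.abs(pe).mean(),     # mean absolute percentage error
    }
```

Note how two errors of +10 and -10 give ME = 0 but MAE = 10, which is exactly the cancellation problem described above.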
Automatic search for best parameter. A quasi-Newton function minimization procedure (the same as in ARIMA) is used to minimize either the mean squared error, mean absolute
error, or mean absolute percentage error. In most cases, this procedure is more efficient than
the grid search (particularly when more than one parameter must be determined), and the
optimum parameter can quickly be identified.
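The idea of an automatic search can be sketched as follows. Here a bounded scalar minimizer from SciPy stands in for the quasi-Newton routine mentioned above; `ses_mse` and `best_alpha` are hypothetical names, and the initial value S0 is simply taken to be the first observation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ses_mse(alpha, series):
    """One-step-ahead mean squared error of simple exponential smoothing
    for a given parameter alpha, with S0 set to the first observation."""
    s = series[0]
    sq_errors = []
    for x in series:
        sq_errors.append((x - s) ** 2)   # the forecast for x is the current S
        s = alpha * x + (1 - alpha) * s  # smoothing update
    return float(np.mean(sq_errors))

def best_alpha(series):
    """Search the interval (0, 1) for the alpha minimizing the ex post MSE."""
    result = minimize_scalar(ses_mse, bounds=(0.001, 0.999),
                             args=(series,), method="bounded")
    return result.x, result.fun
```

For a series with more than one free parameter (e.g., seasonal models), the same idea applies with a multivariate minimizer instead of a scalar one.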
The first smoothed value S0. A final issue that we have neglected up to this point is the
problem of the initial value, or how to start the smoothing process. If you look back at the
formula above, it is evident that you need an S0 value in order to compute the smoothed value
(forecast) for the first observation in the series. Depending on the choice of the parameter α (i.e., when α is close to zero), the initial value for the smoothing process can affect the quality of the forecasts for many observations. As with most other aspects of exponential smoothing, it is recommended to choose the initial value that produces the best forecasts. On
the other hand, in practice, when there are many leading observations prior to a crucial actual
forecast, the initial value will not affect that forecast by much, since its effect will have long
"faded" from the smoothed series (due to the exponentially decreasing weights, the older an
observation the less it will influence the forecast).
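The "fading" of the initial value can be demonstrated directly. The sketch below (with a made-up flat series) shows that two very different choices of S0 produce forecasts that agree closely after a few dozen observations, because the weight of S0 decays as (1 - α) raised to the power t:

```python
def exp_smooth(series, alpha, s0):
    """Simple exponential smoothing: the forecast for each observation is
    the current smoothed value S, which is then updated as
    S = alpha*X + (1 - alpha)*S."""
    forecasts = []
    s = s0
    for x in series:
        forecasts.append(s)              # one-step-ahead forecast for x
        s = alpha * x + (1 - alpha) * s  # smoothing update
    return forecasts

series = [20.0] * 30                            # a flat illustrative series
bad  = exp_smooth(series, alpha=0.3, s0=0.0)    # deliberately poor S0
good = exp_smooth(series, alpha=0.3, s0=20.0)   # well-chosen S0
# early forecasts differ substantially, but the gap decays as 0.7**t
```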
Seasonal and Non-Seasonal Models With or Without Trend
The discussion above in the context of simple exponential smoothing introduced the basic
procedure for identifying a smoothing parameter, and for evaluating the goodness-of-fit of a
model. In addition to simple exponential smoothing, more complex models have been
developed to accommodate time series with seasonal and trend components. The general idea
here is that forecasts are not only computed from consecutive previous observations (as in
simple exponential smoothing), but an independent (smoothed) trend and seasonal
component can be added. Gardner (1985) discusses the different models in terms of
seasonality (none, additive, or multiplicative) and trend (none, linear, exponential, or
damped).
Additive and multiplicative seasonality. Many time series data follow recurring seasonal
patterns. For example, annual sales of toys will probably peak in the months of November
and December, and perhaps during the summer (with a much smaller peak) when children are
on their summer break. This pattern will likely repeat every year; however, the relative amount of increase in sales during December may slowly change from year to year. Thus, it may be useful to smooth the seasonal component independently with an extra parameter, usually denoted as δ (delta).
Seasonal components can be additive in nature or multiplicative. For example, during the
month of December the sales for a particular toy may increase by 1 million dollars every
year. Thus, we could add to our forecasts for every December the amount of 1 million dollars
(over the respective annual average) to account for this seasonal fluctuation. In this case, the
seasonality is additive.
Alternatively, during the month of December the sales for a particular toy may increase by
40%, that is, increase by a factor of 1.4. Thus, when the sales for the toy are generally weak, then the absolute (dollar) increase in sales during December will be relatively small (but the percentage increase will be constant); if the sales of the toy are strong, then the absolute (dollar)
increase in sales will be proportionately greater. Again, in this case the sales increase by a
certain factor, and the seasonal component is thus multiplicative in nature (i.e., the
multiplicative seasonal component in this case would be 1.4).
In plots of the series, the distinguishing characteristic between these two types of seasonal
components is that in the additive case, the series shows steady seasonal fluctuations,
regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies, depending on the overall level of the series.
The seasonal smoothing parameter δ. In general, the one-step-ahead forecasts are computed as follows (for no-trend models; for linear and exponential trend models, a trend component is added to the model; see below):
Additive model:
Forecastt = St + It-p
Multiplicative model:
Forecastt = St*It-p
In this formula, St stands for the (simple) exponentially smoothed value of the series at time t,
and It-p stands for the smoothed seasonal factor at time t minus p (the length of the season).
Thus, compared to simple exponential smoothing, the forecast is "enhanced" by adding or
multiplying the simple smoothed value by the predicted seasonal component. This seasonal
component is derived, analogously to the St value in simple exponential smoothing, as:
Additive model:
It = It-p + δ*(1-α)*et
Multiplicative model:
It = It-p + δ*(1-α)*et/St
Put into words, the predicted seasonal component at time t is computed as the respective seasonal component in the last seasonal cycle plus a portion of the error (et; the observed minus the forecast value at time t). Considering the formulas above, it is clear that parameter δ can assume values between 0 and 1. If it is zero, then the seasonal component for a particular point in time is predicted to be identical to the predicted seasonal component for the respective time during the previous seasonal cycle, which in turn is predicted to be identical to that from the previous cycle, and so on. Thus, if δ is zero, a constant unchanging seasonal component is used to generate the one-step-ahead forecasts. If the δ parameter is equal to 1, then the seasonal component is modified "maximally" at every step by the respective forecast error (times (1-α), which we will ignore for the purpose of this brief introduction). In most cases, when seasonality is present in the time series, the optimum δ parameter will fall somewhere between 0 (zero) and 1 (one).
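Putting the pieces together, the additive no-trend model can be sketched as a short loop. This is an illustrative implementation, not the exact recursions of any particular package: the level update is written in error-correction form (S = S + α·e), the seasonal update follows the formula It = It-p + δ(1-α)et given above, and the initial level and seasonal factors are simply assumed to be known:

```python
def additive_seasonal_smooth(series, alpha, delta, p, init_seasonal):
    """Exponential smoothing with additive seasonality and no trend.
    forecast_t = S + I_{t-p}; the level and the seasonal factor are then
    corrected by a portion of the forecast error e_t."""
    seasonal = list(init_seasonal)       # one factor per position in the cycle
    level = series[0] - seasonal[0]      # crude initial level (an assumption)
    forecasts = []
    for t, x in enumerate(series):
        f = level + seasonal[t % p]      # one-step-ahead forecast
        forecasts.append(f)
        e = x - f                        # forecast error e_t
        level += alpha * e               # level update (error-correction form)
        seasonal[t % p] += delta * (1 - alpha) * e  # I_t = I_{t-p} + delta*(1-alpha)*e_t
    return forecasts
```

With a perfectly regular series (e.g., a level of 10 with seasonal swings of ±5 and p = 2), the forecasts reproduce the series exactly and every error is zero.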
Linear, exponential, and damped trend. To remain with the toy example above, the sales
for a toy can show a linear upward trend (e.g., each year, sales increase by 1 million dollars),
exponential growth (e.g., each year, sales increase by a factor of 1.3), or a damped trend
(during the first year, sales increase by 1 million dollars; during the second year the increase is only 80% of that, i.e., $800,000; during the next year the increase is again 80% of the previous year's increase, i.e., $800,000 * .8 = $640,000; etc.). Each type of trend leaves a clear
"signature" that can usually be identified in the series; shown below in the brief discussion of
the different models are icons that illustrate the general patterns. In general, the trend factor
may change slowly over time, and, again, it may make sense to smooth the trend component
with a separate parameter (denoted γ [gamma] for linear and exponential trend models, and φ [phi] for damped trend models).
The trend smoothing parameters γ (linear and exponential trend) and φ (damped trend). Analogous to the seasonal component, when a trend component is included in the exponential smoothing process, an independent trend component is computed for each time, and modified as a function of the forecast error and the respective parameter. If the γ parameter is 0 (zero), then the trend component is constant across all values of the time series (and for all forecasts). If the γ parameter is 1, then the trend component is modified "maximally" from observation to observation by the respective forecast error. Parameter values that fall in between represent mixtures of those two extremes. Parameter φ is a trend modification parameter that affects how strongly changes in the trend will affect estimates of the trend for subsequent forecasts, that is, how quickly the trend will be "damped" or increased.
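As an illustration, the linear-trend case can be written in the same error-correction style, with γ controlling how strongly the forecast error updates the trend component. This is a sketch under stated assumptions (known initial level and trend; `holt_linear` is a hypothetical name), not the exact recursions of any specific package:

```python
def holt_linear(series, alpha, gamma, level0, trend0):
    """Exponential smoothing with a linear trend (error-correction form):
    forecast = level + trend; both components are then corrected by a
    portion of the forecast error."""
    level, trend = level0, trend0
    forecasts = []
    for x in series:
        f = level + trend                  # one-step-ahead forecast
        forecasts.append(f)
        e = x - f                          # forecast error
        level = level + trend + alpha * e  # level update
        trend = trend + alpha * gamma * e  # gamma = 0 keeps the trend constant
    return forecasts
```

On a perfectly linear series the errors are all zero, and with gamma = 0 the trend component stays fixed at trend0, matching the description of the γ = 0 case above.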