CHAPTER-8 : TEST OF HYPOTHESIS (transcript of gauss.stat.su.se/gu/e/slides/Time Series/Forecast.pdf)
FORECASTING: When estimates of future conditions are made on a systematic basis, the
process is referred to as 'forecasting', and the figure or statement obtained is known as a
forecast. Forecasting is a service whose purpose is to offer the best available basis for
management expectations of the future and to help management understand the implications,
for the firm's future, of the courses of action open to it at present.
Forecasting is concerned with two main tasks: first, the determination of the best basis
available for the formation of intelligent managerial expectations; and second, the handling of
uncertainty about the future, so that the implications of decisions become explicit.
Main Functions of Forecasting: The following are the main functions of forecasting:
(i) The creation of plans of action. It is impossible to evolve a worthwhile system of
business control without an acceptable system of forecasting.
(ii) The second general use of forecasting is found in monitoring the continuing
progress of plans based on forecasts.
(iii) The forecast provides a warning system of the critical factors to be monitored
regularly because they might drastically affect the performance of the plan.
Steps in forecasting: The forecasting of business fluctuations consists of the following steps.
(i) Understanding why changes in the past have occurred: Forecasters should use the
data on past performance to get a speedometer reading of the current rate and of
how far the rate is increasing or decreasing.
(ii) Determining which phases of business activity must be measured: It is necessary
to measure certain phases of business activity in order to predict what changes
will probably follow the present level of activity.
(iii) Selecting and compiling data to be used as measuring devices: There is an
interdependent relationship between the selection of statistical data and the
determination of why business fluctuations occur.
(iv) Analysis of data: Data are analyzed in the light of one's understanding of the
reasons why changes occur.
Methods of Forecasting: The following are some of the important methods of forecasting:
1. Historical Analogy Method;
2. Field survey and opinion poll;
3. Extrapolation
4. Regression Analysis
5. Econometric models
6. Lead-Lag Analysis
7. Exponential smoothing
8. Input-Output Analysis
9. Time series Analysis
TIME SERIES ANALYSIS
Time series: An arrangement of statistical data in accordance with the time of occurrence is known
as a time series. A time series may be expressed mathematically by the functional relationship
Yt = f(t), where Yt is the value of the variable under consideration at time t.
There are two main goals of time series analysis: (a) Identifying the nature of the
phenomenon represented by the sequence of observations, and (b) Forecasting (predicting
future values of the time series variable). Both of these goals require that the pattern of
observed time series data is identified and more or less formally described. Once the pattern
is established, we can interpret and integrate it with other data (i.e., use it in our theory of the
investigated phenomenon, e.g., seasonal commodity prices). Regardless of the depth of our
understanding and the validity of our interpretation (theory) of the phenomenon, we can
extrapolate the identified pattern to predict future values.
Role of Time series analysis: Time series analysis is of great significance in decision-making for the following reasons.
(i) It helps in understanding past behavior: By observing data over a period of
time, one can easily understand what changes have taken place in the past. Such
analysis is extremely helpful in predicting future behavior.
(ii) It helps in planning future operations: If the regularity of occurrence of any
feature over a sufficiently long period can be clearly established, then, within
limits, prediction of probable future variations becomes possible.
(iii) It helps in evaluating current accomplishments: The actual performance can be
compared with the expected performance and the causes of variation analyzed. For
example, if expected sales for 2006-07 were 20,000 colour TV sets and the actual
sales were only 19,000, one can investigate the cause of the shortfall in
achievement.
(iv) It facilitates comparison: Different time series are often compared and important
conclusions drawn therefrom.
Components of Time Series: Changes in the data with the change of time are due to a number of causes; these causes
are known as the components of a time series. The common components of a time series are:
1. Trend (long-term movement or secular trend) is the long-run direction of the
time series.
2. Seasonal Variation is the pattern in a time series within a year. These patterns tend to
repeat themselves from year to year.
3. Cyclical variation is the fluctuation above and below the trend line.
4. Irregular or Random variation is divided into two components. Episodic
variations are unpredictable but can usually be identified, such as a flood or
hurricane. Residual variations are random in nature and cannot be identified.
Two General Aspects of Time Series Patterns
Most time series patterns can be described in terms of two basic classes of components: trend
and seasonality. The former represents a general systematic linear or (most often) nonlinear
component that changes over time and does not repeat or at least does not repeat within the
time range captured by our data (e.g., a plateau followed by a period of exponential growth).
The latter may have a formally similar nature (e.g., a plateau followed by a period of
exponential growth), however, it repeats itself in systematic intervals over time. Those two
general classes of time series components may coexist in real-life data. For example, sales of
a company can rapidly grow over years but they still follow consistent seasonal patterns (e.g.,
as much as 25% of yearly sales each year are made in December, whereas only 4% in
August).
This general pattern is well illustrated in a "classic" Series G data set (Box and Jenkins, 1976,
p. 531) representing monthly international airline passenger totals (measured in thousands) in
twelve consecutive years from 1949 to 1960 (see example data file G.sta and graph above). If
you plot the successive observations (months) of airline passenger totals, a clear, almost
linear trend emerges, indicating that the airline industry enjoyed a steady growth over the
years (approximately 4 times more passengers traveled in 1960 than in 1949). At the same
time, the monthly figures will follow an almost identical pattern each year (e.g., more people
travel during holidays than during any other time of the year). This example data file also
illustrates a very common general type of pattern in time series data, where the amplitude of
the seasonal changes increases with the overall trend (i.e., the variance is correlated with the
mean over the segments of the series). This pattern, which is called multiplicative
seasonality, indicates that the relative amplitude of seasonal changes is constant over time
and is thus related to the trend.
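This behavior can be illustrated with a small synthetic example (hypothetical numbers, not the Series G data): a series with exponential growth and a fixed multiplicative seasonal pattern shows widening seasonal swings in the raw data, while a log transform makes the swings constant in amplitude.

```python
import math

# Hypothetical series: 5% growth per period with a fixed multiplicative
# seasonal pattern that repeats every 12 periods.
season = [1.0, 1.2, 0.9, 1.1, 0.8, 1.0, 1.3, 0.9, 1.0, 1.1, 0.7, 1.0]
series = [100 * 1.05 ** t * season[t % 12] for t in range(24)]

# In the raw series the seasonal swings widen as the level rises:
amp1 = max(series[:12]) - min(series[:12])
amp2 = max(series[12:]) - min(series[12:])

# After a log transform the swings have constant amplitude, the
# signature of multiplicative seasonality.
logs = [math.log(y) for y in series]
log_amp1 = max(logs[:12]) - min(logs[:12])
log_amp2 = max(logs[12:]) - min(logs[12:])
```

Year two of the log series is just year one shifted up by a constant, so its within-year amplitude is identical; the raw series' amplitude grows with the level.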
Trend Analysis
There are no proven "automatic" techniques to identify trend components in the time series
data; however, as long as the trend is monotonous (consistently increasing or decreasing) that
part of data analysis is typically not very difficult. If the time series data contain considerable
error, then the first step in the process of trend identification is smoothing.
Smoothing always involves some form of local averaging of data such that the nonsystematic
components of individual observations cancel each other out. The most common technique is
moving average smoothing which replaces each element of the series by either the simple or
weighted average of n surrounding elements, where n is the width of the smoothing
"window". Medians can be used instead of means. The main advantage of median as
compared to moving average smoothing is that its results are less biased by outliers (within
the smoothing window). Thus, if there are outliers in the data (e.g., due to measurement
errors), median smoothing typically produces smoother or at least more "reliable" curves than
moving average based on the same window width. The main disadvantage of median
smoothing is that in the absence of clear outliers it may produce more "jagged" curves than
moving average and it does not allow for weighting.
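The two smoothers described above can be sketched as follows (a minimal illustration; the data and window width are hypothetical):

```python
import statistics

def moving_average(series, window):
    """Simple (unweighted) moving average; one value per full window."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def moving_median(series, window):
    """Median smoothing: less sensitive to outliers inside the window."""
    return [statistics.median(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Hypothetical data with one outlier (e.g., a measurement error) at index 3.
data = [10, 12, 11, 90, 13, 12, 14]
ma = moving_average(data, 3)   # the outlier inflates three windows
med = moving_median(data, 3)   # the outlier is absorbed by the median
```

With this data the median-smoothed curve stays near the typical level, while every moving-average window containing the outlier is pulled far upward.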
Fitting a function. Many monotonous time series data can be adequately approximated by a
linear function; if there is a clear monotonous nonlinear component, the data first need to be
transformed to remove the nonlinearity. Usually a logarithmic, exponential, or (less often)
polynomial function can be used.
Analysis of Seasonality
Seasonal dependency (seasonality) is another general component of the time series pattern.
The concept was illustrated in the example of the airline passengers’ data above. It is
formally defined as correlational dependency of order k between each i'th element of the
series and the (i-k)'th element and measured by autocorrelation (i.e., a correlation between the
two terms); k is usually called the lag. If the measurement error is not too large, seasonality
can be visually identified in the series as a pattern that repeats every k elements.
Autocorrelation correlogram. Seasonal patterns of time series can be examined via
correlograms. The correlogram (autocorrelogram) displays graphically and numerically the
autocorrelation function (ACF), that is, serial correlation coefficients (and their standard
errors) for consecutive lags in a specified range of lags (e.g., 1 through 30). Ranges of two
standard errors for each lag are usually marked in correlograms but typically the size of auto
correlation is of more interest than its reliability (see Elementary Concepts) because we are
usually interested only in very strong (and thus highly significant) autocorrelations.
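The quantity behind a correlogram, the lag-k autocorrelation, can be computed directly (a minimal sketch on a hypothetical series with period 4; the estimator takes deviations from the overall mean and divides by the total sum of squares, a standard ACF convention):

```python
def autocorrelation(series, lag):
    """Serial correlation between y_t and y_(t-lag)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((y - mean) ** 2 for y in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

# Hypothetical series that repeats every 4 elements: the correlogram
# (ACF over lags 1 through 8) should peak at lag 4.
series = [5, 9, 7, 3] * 6
acf_values = [autocorrelation(series, k) for k in range(1, 9)]
```

For this perfectly periodic series the largest coefficient is at lag 4 (and a smaller echo appears at lag 8), which is exactly the pattern one looks for in a correlogram.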
Examining correlograms. While examining correlograms, you should keep in mind that
autocorrelations for consecutive lags are formally dependent. Consider the following
example. If the first element is closely related to the second, and the second to the third, then
the first element must also be somewhat related to the third one, etc. This implies that the
pattern of serial dependencies can change considerably after removing the first order auto
correlation (i.e., after differencing the series with a lag of 1).
Partial autocorrelations. Another useful method to examine serial dependencies is to
examine the partial autocorrelation function (PACF) - an extension of autocorrelation, where
the dependence on the intermediate elements (those within the lag) is removed. In other
words, the partial autocorrelation is similar to the autocorrelation, except that when calculating it,
the (auto)correlations with all the elements within the lag are partialled out. If a lag of 1 is
specified (i.e., there are no intermediate elements within the lag), then the partial
autocorrelation is equivalent to auto correlation. In a sense, the partial autocorrelation
provides a "cleaner" picture of serial dependencies for individual lags (not confounded by
other serial dependencies).
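The relationship between the two functions can be checked numerically. The sketch below estimates the partial autocorrelation by fitting an AR(lag) model through the Yule-Walker equations and returning its last coefficient (one standard estimator; the helper names are ours). At lag 1 it reduces exactly to the ordinary autocorrelation, as stated above.

```python
def acf(series, lag):
    """Autocorrelation at the given lag (deviations from the overall mean)."""
    n = len(series)
    m = sum(series) / n
    var = sum((y - m) ** 2 for y in series)
    return sum((series[t] - m) * (series[t - lag] - m)
               for t in range(lag, n)) / var

def pacf(series, lag):
    """Partial autocorrelation: last coefficient of an AR(lag) model fitted
    via the Yule-Walker equations (solved here by Gaussian elimination)."""
    r = [acf(series, k) for k in range(lag + 1)]       # r[0] == 1
    n = lag
    A = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    rhs = [r[k] for k in range(1, n + 1)]
    for col in range(n):                               # forward elimination
        piv = A[col][col]
        for j in range(col, n):
            A[col][j] /= piv
        rhs[col] /= piv
        for row in range(col + 1, n):
            f = A[row][col]
            for j in range(col, n):
                A[row][j] -= f * A[col][j]
            rhs[row] -= f * rhs[col]
    phi = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        phi[i] = rhs[i] - sum(A[i][j] * phi[j] for j in range(i + 1, n))
    return phi[-1]

# With no intermediate elements at lag 1, PACF equals ACF exactly.
data = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0, 10.0, 13.0]
```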
Removing serial dependency. Serial dependency for a particular lag k can be removed by
differencing the series, that is, converting each i'th element of the series into its difference
from the (i-k)'th element. There are two major reasons for such transformations.
First, we can identify the hidden nature of seasonal dependencies in the series. Remember
that, as mentioned in the previous paragraph, autocorrelations for consecutive lags are
interdependent. Therefore, removing some of the autocorrelations will change other auto
correlations, that is, it may eliminate them or it may make some other seasonalities more
apparent.
The other reason for removing seasonal dependencies is to make the series stationary which
is necessary for ARIMA and other techniques.
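Differencing at lag k can be sketched as follows (hypothetical quarterly data; differencing at the seasonal lag removes both the repeating pattern and the linear trend, leaving a constant, i.e. stationary, series):

```python
def difference(series, lag=1):
    """Replace each element by its difference from the element lag steps back."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: linear trend plus a seasonal pattern of period 4.
season = [0, 5, 2, 7]
series = [10 + 2 * t + season[t % 4] for t in range(16)]

# Differencing at the seasonal lag (k = 4) cancels the seasonal component and
# reduces the trend to its constant 4-period increment.
d4 = difference(series, lag=4)
```

Every value of `d4` equals 8 (the trend gain over four periods), so the differenced series is constant, hence stationary.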
Systematic Pattern and Random Noise
As in most other analyses, in time series analysis it is assumed that the data consist of a
systematic pattern (usually a set of identifiable components) and random noise (error) which
usually makes the pattern difficult to identify. Most time series analysis techniques involve
some form of filtering out noise in order to make the pattern more salient.
Models of Time Series: A time series may be affected by one or more components simultaneously. Two different
models are commonly assumed in time series analysis.
A. The additive model: According to the additive model, a time series can be expressed as
Yt =Tt+St+Ct+It
Where Yt = Time series value at time t
Tt = Trend values at time t
St = Seasonal variation at time t
Ct = Cyclical variation at time t
It = Irregular variation at time t
B. The multiplicative model: In the classical or traditional approach, it is assumed that there is
a multiplicative relationship among the four components.
Any particular value Yt is considered to be the product of trend (Tt), seasonal variation (St),
cyclical variation (Ct) and irregular variation (It). Thus Yt = Tt St Ct It.
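The two models can be contrasted numerically (all component values below are hypothetical; in the multiplicative form the seasonal, cyclical and irregular components are indexes around 1, while in the additive form they are deviations in the units of the series):

```python
# Multiplicative model: Y_t = T_t * S_t * C_t * I_t
T_t = 200.0   # trend level
S_t = 1.10    # seasonal index (10% above normal for this season)
C_t = 0.95    # cyclical index (5% below the trend because of the cycle)
I_t = 1.02    # irregular index (a small random shock)
y_mult = T_t * S_t * C_t * I_t

# Additive model: Y_t = T_t + S_t + C_t + I_t, with components expressed
# as deviations in the same units as the series.
T_a, S_a, C_a, I_a = 200.0, 20.0, -10.0, 4.0
y_add = T_a + S_a + C_a + I_a
```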
1. Trend (Tt): By trend we mean the general tendency of the data to increase or decrease
during a long period of time. This is true of most series of business and economic
statistics. For example, an upward tendency is seen in data pertaining to population,
agricultural production, currency in circulation, etc., while a downward tendency is
noticed in data on birth rates, death rates, etc.
2. Seasonal Variation (St): Seasonal variations are the periodic and regular movements in a
time series with a period of less than one year, for example the demand for umbrellas in the
rainy season, the demand for warm clothes in the winter, and the demand for cold drinks in
the summer. The factors that cause seasonal variations are:
(i) Climate and weather conditions
(ii) Customs, traditions and habits etc.
3. Cyclical variations (Ct): The oscillatory movements in a time series with a period of
oscillation of more than one year are termed cyclical fluctuations. One complete period is
called a 'cycle'. The cyclical movements in a time series are generally attributed to the so-
called business cycle. There are four well-defined periods or phases in the business cycle,
namely prosperity, recession (decline), depression and recovery, and a cycle normally lasts
from seven to eleven years.
4. Irregular variation (It): Besides trend, seasonal variations and cyclical variations, there
are other factors which cause variations in a time series. These variations are purely random
and unpredictable, and are due to irregular circumstances beyond human control. These
irregular but powerful fluctuations are caused by floods, famines, revolutions, political
unrest, droughts, etc.
Method of Measuring Trend
Trend can be measured by the following methods:
1. The free hand or graphic method;
2. The semi-average method;
3. The method of moving average;
4. The least squares method;
1. The graphic method: A freehand smooth curve obtained by plotting the values Yt
against t enables us to form an idea about the general trend of the series.
This method is simple and does not require mathematical skill, but different researchers
may obtain different trend lines for the same set of data. Forecasting by this method is
risky unless the researcher is efficient and experienced.
2. Method of semi-averages: In this method the whole data set is divided into two parts with
respect to time. If the number of observations is odd, the two parts are obtained by omitting
the value corresponding to the middle of the series. Next we compute the arithmetic mean for
each part and plot these two averages against the mid-points of the respective periods covered
by each part. The line obtained on joining these two points is the required trend line.
This method is simpler to understand than the moving average method and the method
of least squares. However, it assumes a straight-line relationship between the plotted points
regardless of whether that relationship actually exists.
3. Method of moving averages: In this method 3-, 4-, or 5-year moving averages of the
variable values are first obtained. The arithmetic mean of the first three years' values is
computed and placed against the middle of those years. Then, excluding the first year's value,
the arithmetic mean of the 2nd, 3rd and 4th years' values is calculated and placed against
their middle year. In this way 4-year and 5-year moving averages can be computed. The
graph obtained on plotting the moving averages against time gives the trend.
Merits: * Long-term trend determination is easy by the moving average method.
** If an appropriate moving average is taken, the irregular movement is reduced
to a great extent.
Limitations: * Trend values cannot be estimated for all points of the time series by the
method of moving averages; some values at the start and some values at the end are not
obtainable.
** Moving averages are affected by extreme values.
*** This method cannot be used for forecasting future trend, which is the main objective of
the time series analysis.
4. Least squares method: This method is widely used in practice. When this method is
applied, a trend line is fitted to the data in such a manner that the following two conditions
are satisfied:
(i) Σ(Y - Yc) = 0; (ii) Σ(Y - Yc)² is the least.
The straight line is represented by the equation
Yc = a + bX
where Yc denotes the trend values, Y the actual values, a is the intercept, and b is the slope of
the line, i.e., the change in the Y variable associated with a change of one unit in the X variable.
The long-term (linear) trend equation estimated by least squares for time t is
Ŷ = â + b̂t. On the basis of this trend line, values of Y can be obtained for different values of
t, and prediction of future values can be done.
Example: The owner of Strong Homes would like a forecast for the next couple of years of
new homes that will be constructed in the Pittsburgh area. Listed below are the sales of new
homes constructed in the area for the last 5 years.
Year Sales
1997 4.3
1998 5.6
1999 7.8
2000 9.2
2001 9.7
Total 36.6
Year Sales t Sales*t t2
1997 4.3 1 4.3 1
1998 5.6 2 11.2 4
1999 7.8 3 23.4 9
2000 9.2 4 36.8 16
2001 9.7 5 48.5 25
Total 36.6 15 124.2 55
Develop a trend equation using the least squares method by letting 1997 be the time period 1.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [124.2 - (15)(36.6)/5] / [55 - (15)²/5]
  = 14.4 / 10 = 1.44
a = ΣY/n - b(Σt/n)
  = 36.6/5 - 1.44 × (15/5)
  = 7.32 - 4.32 = 3.00
The time series equation is: Y’ = 3.00 + 1.44t
The forecast for the year 2003 is: Y’ = 3.00 + 1.44(7) = 13.08
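The computation in this example can be reproduced with a short script (a sketch of the least squares formulas used above; the function name is ours):

```python
def fit_linear_trend(t, y):
    """Least-squares slope b and intercept a for the trend line Y = a + b*t."""
    n = len(t)
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    b = (sty - st * sy / n) / (stt - st * st / n)
    a = sy / n - b * st / n
    return a, b

# Strong Homes data, with 1997 taken as time period 1.
t = [1, 2, 3, 4, 5]
sales = [4.3, 5.6, 7.8, 9.2, 9.7]
a, b = fit_linear_trend(t, sales)
forecast_2003 = a + b * 7          # 2003 corresponds to t = 7
```

This reproduces a = 3.00, b = 1.44, and the 2003 forecast of 13.08 shown above.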
If the trend is not linear but rather the increases tend to be a constant percent, the Y values
are converted to logarithms, and a least squares equation is determined using the logs.
Method of Moving Averages: This consists of measuring trend by smoothing out the
fluctuations of the data by means of a moving average. A moving average of extent (or
period) m is a series of successive averages (arithmetic means) of m terms at a time, starting
with the 1st, 2nd, 3rd term, etc. Thus the first average is the mean of the first m terms, the
second is the mean of the m terms from the 2nd to the (m+1)th term, and so on. Each moving
average is placed against the middle value of the time interval it covers. When m is even, the
moving average does not coincide with an original time period, and the moving averages are
synchronized with the original data by centering them, which consists of taking a moving
average of extent two of these moving averages and placing these values against the middle
time period. The graph obtained on plotting the moving averages against time gives the
trend.
Example: The data on the rice production during 1990-2000 in a certain region are given
below:
Year: 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Production
(Ton) 280 300 325 420 315 360 400 450 350 420 460
Determine the trend by method of moving average.
Solution: It is clear from the data that a 4-year cycle is present here. So 4-year moving
averages are computed.
Year  Production (Ton)  4-year moving total  4-year moving average  4-year moving average (centered)
1990  280
1991  300
                        1325                 331.25
1992  325                                                           335.63
                        1360                 340.00
1993  420                                                           347.50
                        1420                 355.00
1994  315                                                           364.38
                        1495                 373.75
1995  360                                                           377.50
                        1525                 381.25
1996  400                                                           385.63
                        1560                 390.00
1997  450                                                           397.50
                        1620                 405.00
1998  350                                                           412.50
                        1680                 420.00
1999  420
2000  460
log(Y′) = log(a) + [log(b)]·t  (the least squares equation fitted to the logs when the trend shows constant percentage increases, as noted above)
The trend line is estimated by plotting the 4-year moving averages along the y-axis against
the corresponding year plotted along the x-axis.
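The centered moving averages in the table can be reproduced as follows (a minimal sketch; note the first centered value, 335.625, rounds to the 335.63 shown in the table):

```python
def centered_moving_average(series, window=4):
    """4-period moving averages, then averaged in pairs to center them."""
    ma = [sum(series[i:i + window]) / window
          for i in range(len(series) - window + 1)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# Rice production (tons), 1990-2000.
production = [280, 300, 325, 420, 315, 360, 400, 450, 350, 420, 460]
centered = centered_moving_average(production)
```

Eleven observations yield eight 4-year averages and seven centered values, which is why the table has no trend value for the first two and last two years.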
The moving-average method is used to smooth out a time series. This is accomplished by
“moving” the arithmetic mean through the time series.
The moving-average is the basic method used in measuring the seasonal fluctuation.
To apply the moving-average method to a time series, the data should follow a fairly linear
trend and have a definite rhythmic pattern of fluctuations.
The method most commonly used to compute the typical seasonal pattern is called the ratio-
to-moving-average method.
It eliminates the trend, cyclical, and irregular components from the original data (Y).
The numbers that result are called the typical seasonal indexes.
Step 1: Determine the moving total for the time series.
Step 2: Determine the moving average for the time series.
Step 3: The moving averages are then centered.
Step 4: The specific seasonal for each period is then computed by dividing the Y values by
the centered moving averages.
Step 5: Organize the specific seasonals in a table.
Step 6: Apply the correction factor.
The resulting series (sales) is called deseasonalized sales or seasonally adjusted sales.
The reason for deseasonalizing a series (sales) is to remove the seasonal fluctuations so that
the trend and cycle can be studied. A set of typical indexes is very useful in adjusting a series
(sales, for example).
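Steps 1-6 can be sketched on a small hypothetical quarterly series (the sales figures below are invented purely for illustration):

```python
# Hypothetical quarterly sales: 3 years, a mild trend plus a fixed seasonal pattern.
sales = [10, 20, 15, 5,
         12, 22, 17, 7,
         14, 24, 19, 9]

# Steps 1-3: 4-quarter moving totals, moving averages, then centering.
totals = [sum(sales[i:i + 4]) for i in range(len(sales) - 3)]
averages = [t / 4 for t in totals]
centered = [(averages[i] + averages[i + 1]) / 2 for i in range(len(averages) - 1)]

# Step 4: specific seasonals = Y divided by the centered moving average.
# The first centered value aligns with the 3rd observation (index 2).
specific = {q: [] for q in range(4)}
for i, cma in enumerate(centered):
    obs = i + 2
    specific[obs % 4].append(sales[obs] / cma)

# Steps 5-6: average the specific seasonals per quarter, then apply the
# correction factor so the four typical indexes sum to 4.0.
means = {q: sum(v) / len(v) for q, v in specific.items()}
correction = 4.0 / sum(means.values())
seasonal_index = {q: m * correction for q, m in means.items()}
```

With this data the second quarter (the seasonal peak) gets the largest index and the fourth quarter (the trough) the smallest, and dividing each observation by its quarter's index would give the deseasonalized series.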
Example: The data on rice production during 1990-2000 in a large agricultural area:
Year : 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Production: 28 30 32 37 39 35 38 40 45 43 52
(Tons)
(a) Fit a trend line by the method of 3-yearly moving averages;
(b) Fit a trend line by the method of least squares and comment;
(c) Estimate the production for the year 2003.
1. Below are given the figures of food requirement (in million tons) for a country:
Year Food grain Requirement
(Million tons)
1996 20.22
1997 20.58
… …
…. …
… …
2006 23.03
(i) Fit a straight line by the "Least Squares Method" and tabulate the trend values.
(ii) What is the monthly increase in food requirement for this country?
(iii) Estimate the food requirement (in million tons) for Bangladesh in the year 2015.
Solution: (i) Computation of Trend value
Computational Table
Year   Food grain Requirement (Y)   t = Year - 1995   t²    tY       Trend Values (million tons)   Elimination of Trend
1996   20.22                        1                 1     20.22    20.33                         -0.12
1997   20.58                        2                 4     41.16    20.61                         -0.03
1998   20.94                        3                 9     62.83    20.89                          0.05
1999   21.21                        4                 16    84.83    21.17                          0.03
2000   21.49                        5                 25    107.45   21.45                          0.04
2001   21.77                        6                 36    130.63   21.74                          0.04
2002   22.09                        7                 49    154.66   22.02                          0.08
2003   22.35                        8                 64    178.81   22.30                          0.05
2004   22.55                        9                 81    202.94   22.58                         -0.03
2005   22.86                        10                100   228.56   22.86                          0.00
2006   23.03                        11                121   253.33   23.14                         -0.11
Total  239.09                       66                506   1465.41
Let the trend equation (time series equation) be Y = a + bt.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [1465.41 - (66)(239.09)/11] / [506 - (66)²/11]
  = 0.28078
a = ΣY/n - b(Σt/n) = 239.09/11 - 0.28078 × (66/11) = 20.0505
So the trend equation (time series equation) is Y = 20.0505 + 0.28078t
(ii) The yearly increase in food requirement given by the linear trend is 0.28078 million tons,
i.e., 280.78 thousand tons. So the monthly increase in food requirement is 280.78/12 =
23.398 thousand tons.
(iii) The estimated food requirement (in million tons) in the year 2015 (t = 20) is
Y = 20.0505 + 0.28078 × 20 = 25.666
Estimated food requirement is 25.666 million tons
2. Below are given the figures of mid-year population (in millions) for a country:
Year Mid-Year Population
(Million)
1996 122.10
1997 124.30
… …
… …
… …
2006 139.10
(i) Fit a straight line by the "Least Squares Method" and tabulate the trend values.
(ii) What is the monthly increase in population for this country?
(iii) Estimate the population of Bangladesh in the year 2015.
Solution: (i) Computation of Trend value
Computational Table
Year   Population in million (Y)   t = Year - 1995   tY        t²    Trend Values (million)   Elimination of Trend
1996   122.10                      1                 122.10    1     122.80                   -0.70
1997   124.30                      2                 248.60    4     124.50                   -0.20
1998   126.50                      3                 379.50    9     126.19                    0.31
1999   128.10                      4                 512.40    16    127.89                    0.21
2000   129.80                      5                 649.00    25    129.59                    0.21
2001   131.50                      6                 789.00    36    131.28                    0.22
2002   133.45                      7                 934.15    49    132.98                    0.47
2003   135.00                      8                 1080.00   64    134.67                    0.33
2004   136.20                      9                 1225.80   81    136.37                   -0.17
2005   138.05                      10                1380.50   100   138.07                   -0.02
2006   139.10                      11                1530.10   121   139.76                   -0.66
Total  1444.10                     66                8851.15   506
Let the trend equation (time series equation) be Y = a + bt.
b = [ΣtY - (Σt)(ΣY)/n] / [Σt² - (Σt)²/n]
  = [8851.15 - (66)(1444.1)/11] / [506 - (66)²/11]
  = 1.69591
a = ΣY/n - b(Σt/n) = 1444.1/11 - 1.69591 × (66/11) = 121.106
So the trend equation (time series equation) is Y = 121.106 + 1.69591t
(ii) The yearly increase in population given by the linear trend is 1.69591 million. So the
monthly increase in population is 1.69591 million / 12 ≈ 141,326 people.
(iii) The estimated population for the country in the year 2015 (t = 20) is
Y = 121.106 + 1.69591 × 20 = 155.0242 million
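The arithmetic of this exercise can be checked from the raw table values (a sketch; tiny differences from the printed figures are rounding effects):

```python
# Mid-year population (millions), 1996-2006, from the computational table.
population = [122.10, 124.30, 126.50, 128.10, 129.80, 131.50,
              133.45, 135.00, 136.20, 138.05, 139.10]
t = list(range(1, 12))                 # t = Year - 1995

n = len(t)
sum_t, sum_y = sum(t), sum(population)
sum_tt = sum(ti * ti for ti in t)
sum_ty = sum(ti * yi for ti, yi in zip(t, population))

# Least squares slope and intercept, as in the worked solution.
b = (sum_ty - sum_t * sum_y / n) / (sum_tt - sum_t ** 2 / n)
a = sum_y / n - b * sum_t / n

forecast_2015 = a + b * 20             # 2015 corresponds to t = 20
monthly_increase = b / 12              # millions of people per month
```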
Exponential Smoothing
General Introduction
Exponential smoothing has become very popular as a forecasting method for a wide variety
of time series data. Historically, the method was independently developed by Brown and
Holt. Brown worked for the US Navy during World War II, where his assignment was to
design a tracking system for fire-control information to compute the location of submarines.
Later, he applied this technique to the forecasting of demand for spare parts (an inventory
control problem). He described those ideas in his 1959 book on inventory control. Holt's
research was sponsored by the Office of Naval Research; independently, he developed
exponential smoothing models for constant processes, processes with linear trends, and for
seasonal data.
Simple Exponential Smoothing
A simple and pragmatic model for a time series would be to consider each observation as
consisting of a constant (b) and an error component (epsilon), that is: Xt = b + εt. The
constant b is relatively stable in each segment of the series, but may change slowly over time.
If appropriate, then one way to isolate the true value of b, and thus the systematic or
predictable part of the series, is to compute a kind of moving average, where the current and
immediately preceding ("younger") observations are assigned greater weight than the
respective older observations. Simple exponential smoothing accomplishes exactly such
weighting, where exponentially smaller weights are assigned to older observations. The
specific formula for simple exponential smoothing is:
St = α*Xt + (1 - α)*St-1
When applied recursively to each successive observation in the series, each new smoothed
value (forecast) is computed as the weighted average of the current observation and the
previous smoothed observation; the previous smoothed observation was computed in turn
from the previous observed value and the smoothed value before the previous observation,
and so on. Thus, in effect, each smoothed value is the weighted average of the previous
observations, where the weights decrease exponentially depending on the value of the
parameter α (alpha). If α is equal to 1 (one), then the previous observations are ignored
entirely; if α is equal to 0 (zero), then the current observation is ignored entirely, and the
smoothed value consists entirely of the previous smoothed value (which in turn is computed
from the smoothed observation before it, and so on; thus all smoothed values will be equal to
the initial smoothed value S0). Values of α in between will produce intermediate results.
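The recursion above can be sketched directly, including the two boundary cases for α (the data and S0 below are hypothetical):

```python
def exponential_smoothing(series, alpha, s0):
    """Simple exponential smoothing: S_t = alpha*X_t + (1 - alpha)*S_(t-1)."""
    smoothed = []
    s = s0
    for x in series:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

data = [3.0, 5.0, 4.0, 6.0, 5.0]

# alpha = 1 reproduces the series itself (previous observations ignored);
# alpha = 0 keeps the initial value S0 forever; in-between values mix the two.
all_data = exponential_smoothing(data, 1.0, s0=4.0)
all_s0 = exponential_smoothing(data, 0.0, s0=4.0)
mixed = exponential_smoothing(data, 0.3, s0=4.0)
```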
Even though significant work has been done to study the theoretical properties of (simple and
complex) exponential smoothing, the method has gained popularity mostly because of its
usefulness as a forecasting tool. Thus, regardless of the theoretical model for the process
underlying the observed time series, simple exponential smoothing will often produce quite
accurate forecasts.
Choosing the Best Value for Parameter α (alpha)
Gardner (1985) discusses various theoretical and empirical arguments for selecting an
appropriate smoothing parameter. Obviously, α should fall into the interval between 0 (zero)
and 1 (one). Among practitioners, an α smaller than 0.30 is usually recommended. However, in the
study by Makridakis (1982), α values above 0.30 frequently yielded the best forecasts.
Page# 13
Estimating the best α value from the data. In practice, the smoothing parameter α is often
chosen by a grid search of the parameter space; that is, different solutions for α are tried,
starting, for example, with α = 0.1 through α = 0.9, with increments of 0.1. Then α is chosen
so as to produce the smallest sums of squares (or mean squares) for the residuals (i.e., observed
values minus one-step-ahead forecasts; this mean squared error is also referred to as ex post
mean squared error, ex post MSE for short).
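Such a grid search can be sketched as follows (hypothetical data; the one-step-ahead forecast of Xt is taken to be St-1, and S0 is set to the first observation, one common convention):

```python
def one_step_forecasts(series, alpha, s0):
    """Forecast for each X_t is the smoothed value from observations before t."""
    s, out = s0, []
    for x in series:
        out.append(s)                      # forecast X_t before seeing it
        s = alpha * x + (1 - alpha) * s    # then update the smoothed value
    return out

def ex_post_mse(series, alpha, s0):
    forecasts = one_step_forecasts(series, alpha, s0)
    return sum((x - f) ** 2 for x, f in zip(series, forecasts)) / len(series)

data = [12.0, 13.0, 12.5, 14.0, 13.5, 15.0, 14.5, 16.0]

# Grid search: try alpha = 0.1, 0.2, ..., 0.9 and keep the value with the
# smallest ex post MSE.
grid = [round(0.1 * k, 1) for k in range(1, 10)]
best_alpha = min(grid, key=lambda a: ex_post_mse(data, a, s0=data[0]))
```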
Indices of Lack of Fit (Error)
The most straightforward way of evaluating the accuracy of the forecasts based on a
particular α value is to simply plot the observed values and the one-step-ahead forecasts.
This plot can also include the residuals (scaled against the right Y-axis), so that regions of
better or worse fit can also easily be identified.
This visual check of the accuracy of forecasts is often the most powerful method for
determining whether or not the current exponential smoothing model fits the data. In
addition, besides the ex post MSE criterion (see previous paragraph), there are other
statistical measures of error that can be used to determine the optimum α parameter (see
Makridakis, Wheelwright, and McGee, 1983):
Mean error: The mean error (ME) value is simply computed as the average error value
(average of observed minus one-step-ahead forecast). Obviously, a drawback of this measure
is that positive and negative error values can cancel each other out, so this measure is not a
very good indicator of overall fit.
Mean absolute error: The mean absolute error (MAE) value is computed as the average
absolute error value. If this value is 0 (zero), the fit (forecast) is perfect. As compared to the
mean squared error value, this measure of fit will "de-emphasize" outliers, that is, unique or
rare large error values will affect the MAE less than the MSE value.
Sum of squared error (SSE), Mean squared error. These values are computed as the sum
(or average) of the squared error values. This is the most commonly used lack-of-fit indicator
in statistical fitting procedures.
Percentage error (PE). All the above measures rely on the actual error value. It may seem
reasonable to rather express the lack of fit in terms of the relative deviation of the one-step-
ahead forecasts from the observed values, that is, relative to the magnitude of the observed
values. For example, when trying to predict monthly sales that may fluctuate widely (e.g.,
seasonally) from month to month, we may be satisfied if our prediction "hits the target" with
about ±10% accuracy. In other words, the absolute errors may be not so much of interest as
Page# 14
are the relative errors in the forecasts. To assess the relative error, various indices have been
proposed (see Makridakis, Wheelwright, and McGee, 1983). The first one, the percentage
error value, is computed as:
PEt = 100*(Xt - Ft )/Xt
where Xt is the observed value at time t, and Ft is the forecast (smoothed value) at time t.
Mean percentage error (MPE). This value is computed as the average of the PE values.
Mean absolute percentage error (MAPE). As is the case with the mean error value (ME,
see above), a mean percentage error near 0 (zero) can be produced by large positive and
negative percentage errors that cancel each other out. Thus, a better measure of relative
overall fit is the mean absolute percentage error. Also, this measure is usually more
meaningful than the mean squared error. For example, knowing that the average forecast is
"off" by ±5% is a useful result in and of itself, whereas a mean squared error of 30.8 is not
immediately interpretable.
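As a concrete illustration, the error measures above can be computed in a few lines. The sketch below is not tied to any particular package; `forecast_errors` is a hypothetical helper, and it assumes no observed value is zero (so the percentage errors are defined):

```python
import numpy as np

def forecast_errors(observed, forecast):
    """Compute the error measures discussed above for a series of
    one-step-ahead forecasts (assumes no observed value is zero)."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = observed - forecast            # error at each time t
    pe = 100.0 * e / observed          # percentage error PEt
    return {
        "ME":   e.mean(),              # mean error (signs can cancel out)
        "MAE":  np.abs(e).mean(),      # mean absolute error
        "SSE":  (e ** 2).sum(),        # sum of squared errors
        "MSE":  (e ** 2).mean(),       # mean squared error
        "MPE":  pe.mean(),             # mean percentage error
        "MAPE": np.abs(pe).mean(),     # mean absolute percentage error
    }
```

Note how two errors of +10 and -10 give ME = 0 but MAE = 10, which is exactly the cancellation problem described above.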
Automatic search for best parameter. A quasi-Newton function minimization procedure (the same as in ARIMA) is used to minimize either the mean squared error, mean absolute
error, or mean absolute percentage error. In most cases, this procedure is more efficient than
the grid search (particularly when more than one parameter must be determined), and the
optimum parameter can quickly be identified.
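The idea of an automatic search can be sketched as follows. Here a bounded scalar minimizer from SciPy stands in for the quasi-Newton routine mentioned above; `ses_mse` and `best_alpha` are hypothetical names, and the initial value S0 is simply taken to be the first observation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ses_mse(alpha, series):
    """One-step-ahead mean squared error of simple exponential smoothing
    for a given parameter alpha, with S0 set to the first observation."""
    s = series[0]
    sq_errors = []
    for x in series:
        sq_errors.append((x - s) ** 2)   # the forecast for x is the current S
        s = alpha * x + (1 - alpha) * s  # smoothing update
    return float(np.mean(sq_errors))

def best_alpha(series):
    """Search the interval (0, 1) for the alpha minimizing the ex post MSE."""
    result = minimize_scalar(ses_mse, bounds=(0.001, 0.999),
                             args=(series,), method="bounded")
    return result.x, result.fun
```

For a series with more than one free parameter (e.g., seasonal models), the same idea applies with a multivariate minimizer instead of a scalar one.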
The first smoothed value S0. A final issue that we have neglected up to this point is the
problem of the initial value, or how to start the smoothing process. If you look back at the
formula above, it is evident that you need an S0 value in order to compute the smoothed value
(forecast) for the first observation in the series. Depending on the choice of the parameter α (i.e., when α is close to zero), the initial value for the smoothing process can affect the quality of the forecasts for many observations. As with most other aspects of exponential smoothing, it is recommended to choose the initial value that produces the best forecasts. On
the other hand, in practice, when there are many leading observations prior to a crucial actual
forecast, the initial value will not affect that forecast by much, since its effect will have long
"faded" from the smoothed series (due to the exponentially decreasing weights, the older an
observation the less it will influence the forecast).
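The "fading" of the initial value can be demonstrated directly. The sketch below (with a made-up flat series) shows that two very different choices of S0 produce forecasts that agree closely after a few dozen observations, because the weight of S0 decays as (1 - α) raised to the power t:

```python
def exp_smooth(series, alpha, s0):
    """Simple exponential smoothing: the forecast for each observation is
    the current smoothed value S, which is then updated as
    S = alpha*X + (1 - alpha)*S."""
    forecasts = []
    s = s0
    for x in series:
        forecasts.append(s)              # one-step-ahead forecast for x
        s = alpha * x + (1 - alpha) * s  # smoothing update
    return forecasts

series = [20.0] * 30                            # a flat illustrative series
bad  = exp_smooth(series, alpha=0.3, s0=0.0)    # deliberately poor S0
good = exp_smooth(series, alpha=0.3, s0=20.0)   # well-chosen S0
# early forecasts differ substantially, but the gap decays as 0.7**t
```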
Seasonal and Non-Seasonal Models With or Without Trend
The discussion above in the context of simple exponential smoothing introduced the basic
procedure for identifying a smoothing parameter, and for evaluating the goodness-of-fit of a
model. In addition to simple exponential smoothing, more complex models have been
developed to accommodate time series with seasonal and trend components. The general idea
here is that forecasts are not only computed from consecutive previous observations (as in
simple exponential smoothing), but an independent (smoothed) trend and seasonal
component can be added. Gardner (1985) discusses the different models in terms of
seasonality (none, additive, or multiplicative) and trend (none, linear, exponential, or
damped).
Additive and multiplicative seasonality. Many time series data follow recurring seasonal
patterns. For example, annual sales of toys will probably peak in the months of November
and December, and perhaps during the summer (with a much smaller peak) when children are
on their summer break. This pattern will likely repeat every year; however, the relative amount of increase in sales during December may slowly change from year to year. Thus, it may be useful to smooth the seasonal component independently with an extra parameter, usually denoted as δ (delta).
Seasonal components can be additive in nature or multiplicative. For example, during the
month of December the sales for a particular toy may increase by 1 million dollars every
year. Thus, we could add to our forecasts for every December the amount of 1 million dollars
(over the respective annual average) to account for this seasonal fluctuation. In this case, the
seasonality is additive.
Alternatively, during the month of December the sales for a particular toy may increase by
40%, that is, increase by a factor of 1.4. Thus, when the sales for the toy are generally weak, then the absolute (dollar) increase in sales during December will be relatively small (but the percentage increase will be constant); if the sales of the toy are strong, then the absolute (dollar)
increase in sales will be proportionately greater. Again, in this case the sales increase by a
certain factor, and the seasonal component is thus multiplicative in nature (i.e., the
multiplicative seasonal component in this case would be 1.4).
In plots of the series, the distinguishing characteristic between these two types of seasonal
components is that in the additive case, the series shows steady seasonal fluctuations,
regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies, depending on the overall level of the series.
The seasonal smoothing parameter δ. In general, the one-step-ahead forecasts are computed as follows (for no-trend models; for linear and exponential trend models, a trend component is added to the model; see below):
Additive model:
Forecastt = St + It-p
Multiplicative model:
Forecastt = St*It-p
In this formula, St stands for the (simple) exponentially smoothed value of the series at time t,
and It-p stands for the smoothed seasonal factor at time t minus p (the length of the season).
Thus, compared to simple exponential smoothing, the forecast is "enhanced" by adding or
multiplying the simple smoothed value by the predicted seasonal component. This seasonal
component is derived, analogously to the St value in simple exponential smoothing, as:
Additive model:
It = It-p + δ*(1-α)*et
Multiplicative model:
It = It-p + δ*(1-α)*et/St
Put into words, the predicted seasonal component at time t is computed as the respective seasonal component in the last seasonal cycle plus a portion of the error (et; the observed minus the forecast value at time t). Considering the formulas above, it is clear that parameter δ can assume values between 0 and 1. If it is zero, then the seasonal component for a particular point in time is predicted to be identical to the predicted seasonal component for the respective time during the previous seasonal cycle, which in turn is predicted to be identical to that from the previous cycle, and so on. Thus, if δ is zero, a constant unchanging seasonal component is used to generate the one-step-ahead forecasts. If the δ parameter is equal to 1, then the seasonal component is modified "maximally" at every step by the respective forecast error (times (1-α), which we will ignore for the purpose of this brief introduction). In most cases, when seasonality is present in the time series, the optimum δ parameter will fall somewhere between 0 (zero) and 1 (one).
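Putting the pieces together, the additive no-trend model can be sketched as a short loop. This is an illustrative implementation, not the exact recursions of any particular package: the level update is written in error-correction form (S = S + α·e), the seasonal update follows the formula It = It-p + δ(1-α)et given above, and the initial level and seasonal factors are simply assumed to be known:

```python
def additive_seasonal_smooth(series, alpha, delta, p, init_seasonal):
    """Exponential smoothing with additive seasonality and no trend.
    forecast_t = S + I_{t-p}; the level and the seasonal factor are then
    corrected by a portion of the forecast error e_t."""
    seasonal = list(init_seasonal)       # one factor per position in the cycle
    level = series[0] - seasonal[0]      # crude initial level (an assumption)
    forecasts = []
    for t, x in enumerate(series):
        f = level + seasonal[t % p]      # one-step-ahead forecast
        forecasts.append(f)
        e = x - f                        # forecast error e_t
        level += alpha * e               # level update (error-correction form)
        seasonal[t % p] += delta * (1 - alpha) * e  # I_t = I_{t-p} + delta*(1-alpha)*e_t
    return forecasts
```

With a perfectly regular series (e.g., a level of 10 with seasonal swings of ±5 and p = 2), the forecasts reproduce the series exactly and every error is zero.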
Linear, exponential, and damped trend. To remain with the toy example above, the sales
for a toy can show a linear upward trend (e.g., each year, sales increase by 1 million dollars),
exponential growth (e.g., each year, sales increase by a factor of 1.3), or a damped trend
(during the first year, sales increase by 1 million dollars; during the second year the increase is only 80% of that, i.e., $800,000; during the next year the increase is again 80% of the previous year's increase, i.e., $800,000 * .8 = $640,000; etc.). Each type of trend leaves a clear
"signature" that can usually be identified in the series; shown below in the brief discussion of
the different models are icons that illustrate the general patterns. In general, the trend factor
may change slowly over time, and, again, it may make sense to smooth the trend component
with a separate parameter (denoted γ [gamma] for linear and exponential trend models, and φ [phi] for damped trend models).
The trend smoothing parameters γ (linear and exponential trend) and φ (damped trend). Analogous to the seasonal component, when a trend component is included in the exponential smoothing process, an independent trend component is computed for each time, and modified as a function of the forecast error and the respective parameter. If the γ parameter is 0 (zero), then the trend component is constant across all values of the time series (and for all forecasts). If the γ parameter is 1, then the trend component is modified "maximally" from observation to observation by the respective forecast error. Parameter values that fall in between represent mixtures of those two extremes. Parameter φ is a trend modification parameter that affects how strongly changes in the trend will affect estimates of the trend for subsequent forecasts, that is, how quickly the trend will be "damped" or increased.
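As an illustration, the linear-trend case can be written in the same error-correction style, with γ controlling how strongly the forecast error updates the trend component. This is a sketch under stated assumptions (known initial level and trend; `holt_linear` is a hypothetical name), not the exact recursions of any specific package:

```python
def holt_linear(series, alpha, gamma, level0, trend0):
    """Exponential smoothing with a linear trend (error-correction form):
    forecast = level + trend; both components are then corrected by a
    portion of the forecast error."""
    level, trend = level0, trend0
    forecasts = []
    for x in series:
        f = level + trend                  # one-step-ahead forecast
        forecasts.append(f)
        e = x - f                          # forecast error
        level = level + trend + alpha * e  # level update
        trend = trend + alpha * gamma * e  # gamma = 0 keeps the trend constant
    return forecasts
```

On a perfectly linear series the errors are all zero, and with gamma = 0 the trend component stays fixed at trend0, matching the description of the γ = 0 case above.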