DETERMINANTS OF THE GOLD PRICE - Ghent University€¦ · UNIVERSITEIT GENT FACULTEIT ECONOMIE EN...
Transcript of DETERMINANTS OF THE GOLD PRICE - Ghent University€¦ · UNIVERSITEIT GENT FACULTEIT ECONOMIE EN...
UNIVERSITEIT GENT
FACULTEIT ECONOMIE EN BEDRIJFSKUNDE
ACADEMIEJAAR 2011 – 2012
DETERMINANTS OF THE GOLD PRICE
Masterproef voorgedragen tot het bekomen van de graad van
Master in de Toegepaste Economische Wetenschappen: Handelsingenieur
Bernard Dierinck
onder leiding van:
Prof. dr. Michael Frömmel
Prof. dr. ir. Benjamin Schrauwen
Begeleiders: ir. Michiel Hermans ir. Francis wyffels
UNIVERSITEIT GENT
FACULTEIT ECONOMIE EN BEDRIJFSKUNDE
ACADEMIEJAAR 2011 – 2012
DETERMINANTS OF THE GOLD PRICE
Masterproef voorgedragen tot het bekomen van de graad van
Master in de Toegepaste Economische Wetenschappen: Handelsingenieur
Bernard Dierinck
onder leiding van:
Prof. dr. Michael Frömmel
Prof. dr. ir. Benjamin Schrauwen
Begeleiders: ir. Michiel Hermans ir. Francis wyffels
Permission Undersigned declares that the content of this master thesis may be consulted and
reproduced when referencing to it.
Bernard Dierinck
I
I Foreword I always have had a sincere interest in the financial markets and saw it as an opportunity to be able
to write my thesis about something which is located on the border between economics and artificial
intelligence.
I would like to take this opportunity to thank Prof. dr. Frömmel and Prof. dr. ir. Schrauwen since they
offered me the possibility to do this thesis. I would also like to thank ir. Francis wyffels and ir. Michiel
Hermans who invested their time and effort to teach me techniques, which are seldom used by
students with an economic background. I also thank Prof. dr. Gerdie Everaert who helped me with
some econometrical questions.
Besides these people, I would also like to thank my family and friends since they provide me with an
ideal environment to work and live.
II
II Table of contents I Foreword ................................................................................................................................... I
II Table of contents .................................................................................................................... II
III Abbreviations ....................................................................................................................... III
III.I General abbreviations .................................................................................................... III
III.II Mathematical abbreviations ......................................................................................... IV
IV List of tables .......................................................................................................................... V
V List of figures ........................................................................................................................ VII
VI List of graphs ..................................................................................................................... VIII
VII Summary (Dutch) ................................................................................................................ IX
1.Introduction ............................................................................................................................. 1
2.Gold and its history ................................................................................................................. 2
2.1.The history of gold............................................................................................................ 2
2.2.Reasons for central banks to invest in gold ..................................................................... 3
2.3.The evolution of the gold price ........................................................................................ 4
2.4.London gold market fixing ............................................................................................... 5
2.5.Supply and demand .......................................................................................................... 6
3.Literature Review .................................................................................................................... 9
3.1.Macroeconomic approach ............................................................................................... 9
3.2.Speculation approach ....................................................................................................... 9
3.3.Inflation hedge approach ............................................................................................... 10
3.4.The artificial intelligence approach ................................................................................ 11
3.5.Conclusion ...................................................................................................................... 12
4.Determinants of the gold price ............................................................................................. 13
4.1.Short run supply and demand ........................................................................................ 13
4.2.Factors affecting the gold price in the long run ............................................................. 17
5.Data and research questions ................................................................................................. 18
5.1.Description of the variables ........................................................................................... 20
6.Regression ............................................................................................................................. 27
6.1.Technique ....................................................................................................................... 27
6.2.Results ............................................................................................................................ 27
7.Recurrent neural networks ................................................................................................... 40
7.1.Reservoir computing ...................................................................................................... 40
7.2.Backpropagation through time ...................................................................................... 48
8.Gaussian processes for machine learning ............................................................................. 59
8.1.Technique ....................................................................................................................... 59
8.2.Results ............................................................................................................................ 60
9.Discussion .............................................................................................................................. 63
10.Conclusion ........................................................................................................................... 70
VIII References .......................................................................................................................... XI
III
III Abbreviations
III.I General abbreviations ADF Augmented Dickey Fuller
ADL Autoregressive Distributed Lag
BPTT Backpropagation Through Time
CBGA Central Bank Gold Agreement
CLI Composite Leading Indicator
CPI Consumer Price Index
CRDP Credit Risk Default Premium
DF Dickey Fuller
ECM Error Correction Model
ER Exchange Rate
GP Gaussian Processes
MAE Mean Absolute Error
MSE Mean Squared Error
NMSE Normalised Mean Square Error
OECD Organisation for Economic Co-operation and Development
OLS Ordinary Least Squares
RC Reservoir Computing
USGS United States Geological Survey
WGC World Gold Council
IV
III.II Mathematical abbreviations Gold Price (end of month)
Gold Lease Rate
Trade Weighted Exchange Rate Index
Beta
CPI US
Inflation US
Volatility US Inflation
CPI World
Inflation World
Volatility World Inflation
CRDP
CLI
V
IV List of tables
TABLE 1: GOLD PRODUCING COUNTRIES (MINING) (2010) ....................................................... 7
TABLE 2: JEWELLERY CONSUMPTION (2010) ............................................................................. 8
TABLE 3: INFLUENCE OF AN INCREASE OF THE DETERMINANTS ON SUPPLY AND PRICE ....... 17
TABLE 4: INFLUENCE OF AN INCREASE OF THE DETERMINANTS ON DEMAND AND PRICE .... 17
TABLE 5: VARIABLES ................................................................................................................. 18
TABLE 6: THEORETICAL IMPACT ON THE GOLD PRICE OF AN INCREASE IN THE VARIABLES... 19
TABLE 7: UNIT ROOT TESTING, WITH X BEING THE ORDER OF INTEGRATION ........................ 27
TABLE 8: REGRESSION RESULTS OF THE LONG RUN RELATIONSHIP (P-VALUE BETWEEN BRACKETS) ................................................................................................................................ 29
TABLE 9: ADF TEST RESULTS ON THE RESIDUALS (P-VALUE BETWEEN BRACKETS) ................. 29
TABLE 10: TABLE OF MACKINNON ........................................................................................... 29
TABLE 11: GRANGER REPRESENTATION THEOREM REGRESSION RESULTS (P-VALUE BETWEEN BRACKETS) ................................................................................................................................ 30
TABLE 12: GOLD AS A HEDGE (1971-1995) .............................................................................. 32
TABLE 13: GOLD AS A HEDGE (1971-2011) .............................................................................. 32
TABLE 14: ADL REGRESSION RESULTS WITHOUT DUMMY VARIABLES (P-VALUE BETWEEN BRACKETS) ................................................................................................................................ 34
TABLE 15: ADL REGRESSION RESULTS WITH DUMMY VARIABLES (P-VALUE BETWEEN BRACKETS) ................................................................................................................................ 34
TABLE 16: REGRESSION RESULTS ADL MODEL ......................................................................... 36
TABLE 17: TEST RESULTS FOR THE ADL MODEL (2010-2011) .................................................. 38
TABLE 18: ESTIMATION RESULTS FOR THE ADL MODEL (1990-2009) ..................................... 38
TABLE 19: VALIDATION ERROR RESERVOIR COMPUTING ........................................................ 44
TABLE 20: OPTIMAL COMBINATION OF META-PARAMETERS FOR RC .................................... 45
TABLE 21: TRAINING ERROR FOR RC ........................................................................................ 45
TABLE 22: TEST ERROR FOR RC ................................................................................................ 45
TABLE 23: TEST ERROR OF BEST MODEL FOR RC ..................................................................... 45
TABLE 24: VALIDATION ERROR FOR RC (4 FOLDS) ................................................................... 46
TABLE 25: TEST ERROR FOR RC (4 FOLDS) ................................................................................ 47
TABLE 26: ERROR MEASURES FOR BPTT-MODEL 1 .................................................................. 50
TABLE 27: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 1 ...................... 50
TABLE 28: ERROR OF THE BEST BPTT-MODEL 1 ....................................................................... 50
TABLE 29: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 1 ......................................................................................................................... 51
TABLE 30: INDEPENDENT VARIABLES ....................................................................................... 52
TABLE 31: ERROR MEASURES FOR BPTT-MODEL 2 .................................................................. 53
TABLE 32: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 2 ...................... 53
TABLE 33: ERROR OF THE BEST BPTT-MODEL 2 ....................................................................... 53
TABLE 34: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 2 ......................................................................................................................... 54
TABLE 35: ERROR MEASURES FOR BPTT-MODEL 3 .................................................................. 56
TABLE 36: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 3 ...................... 56
TABLE 37: ERROR OF THE BEST BPTT-MODEL 3 ....................................................................... 56
VI
TABLE 38: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 3 ......................................................................................................................... 57
TABLE 39: RESULTS FOR GAUSSIAN PROCESSES ...................................................................... 60
TABLE 40: DIFFERENT ERROR MEASURES FOR THE DIFFERENT MODELS (BEST CELL PER LINE IN GREY) .................................................................................................................................... 63
TABLE 41: COMPUTATIONAL POWER RANKING ...................................................................... 65
TABLE 42: OVERVIEW OF THE IMPORTANCE OF THE DIFFERENT VARIABLES (IMPORTANT VARIABLE: CELL IS GREY) .......................................................................................................... 67
VII
V List of figures
FIGURE 1: GOLD DEMAND 2011................................................................................................. 6 FIGURE 2: GOLD IN CIRCULATION (TOTAL=165600 TONNES) .................................................... 8 FIGURE 3: SHORT RUN SUPPLY (THE EFFECT OF AN INCREASE IN THE STATED VARIABLES HAS EITHER A POSITIVE OR A NEGATIVE IMPACT ON THE SUPPLY) ................................................ 14 FIGURE 4: SHORT RUN DEMAND (THE EFFECT OF AN INCREASE IN THE STATED VARIABLES HAS EITHER A POSITIVE OR A NEGATIVE IMPACT ON THE DEMAND) ..................................... 16 FIGURE 5: EXAMPLE OF A RESERVOIR WITH IN- AND OUTPUT ............................................... 41 FIGURE 6: EXAMPLE OF A NETWORK WITH IN- AND OUTPUT (BPTT) ..................................... 48
VIII
VI List of graphs
GRAPH 1: GOLD PRICE (1971-2011) ........................................................................................... 4
GRAPH 2: GOLD PRICE (1990-2011) ........................................................................................... 5
GRAPH 3: EVOLUTION OF THE GOLD DEMAND ......................................................................... 6
GRAPH 4: EVOLUTION OF THE GOLD SUPPLY ............................................................................ 7
GRAPH 5: PRICE OF GOLD AND CONSUMER PRICE INDEX (CPI) US ......................................... 20
GRAPH 6: PRICE OF GOLD AND CPI WORLD ............................................................................. 21
GRAPH 7: COMPARE CPI'S (11/1990 = 100) ............................................................................. 21
GRAPH 8: PRICE OF GOLD AND INFLATION US ........................................................................ 22
GRAPH 9: PRICE OF GOLD AND INFLATION WORLD ................................................................ 22
GRAPH 10: PRICE OF GOLD AND VOLATILITY OF INFLATION US .............................................. 23
GRAPH 11: PRICE OF GOLD AND VOLATILITY OF INFLATION WORLD ..................................... 23
GRAPH 12: PRICE OF GOLD AND CLI ........................................................................................ 24
GRAPH 13: PRICE OF GOLD AND US-WORLD EXCHANGE RATE INDEX .................................... 24
GRAPH 14: PRICE OF GOLD AND GOLD LEASE RATE ................................................................ 25
GRAPH 15: PRICE OF GOLD AND GOLD’S BETA ........................................................................ 25
GRAPH 16: PRICE OF GOLD AND CRDP .................................................................................... 26
GRAPH 17: PREDICTED VS ACTUAL GOLD PRICE (ADL-MODEL) ............................................... 38
GRAPH 18: PREDICTED VS ACTUAL GOLD PRICE FOR 200 NEURONS (RC) .............................. 46
GRAPH 19: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 1) .......... 51
GRAPH 20: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 1) ....................................................................................................................... 52
GRAPH 21: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 1) ....................................................................................................................... 52
GRAPH 22: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 2) .......... 54
GRAPH 23: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 2) ....................................................................................................................... 54
GRAPH 24: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 2) ....................................................................................................................... 55
GRAPH 25: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 3) .......... 57
GRAPH 26: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 3) ....................................................................................................................... 58
GRAPH 27: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 3) ....................................................................................................................... 58
GRAPH 28: GAUSSIAN PROCESSES (DOTTED LINE: INDIVIDUAL GAUSSIANS; FULL LINE: LINEAR COMBINATION OF GAUSSIANS) ............................................................................................... 59
GRAPH 29: PREDICTED VS ACTUAL GOLD PRICE (GP) .............................................................. 61
GRAPH 30: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES (GP) .............................. 62
GRAPH 31: PREDICTED VS ACTUAL GOLD PRICE (BEST PREDICTION) ...................................... 65
IX
VII Summary (Dutch) De mens heeft altijd al een zekere fascinatie voor goud gehad. De Grieken, de Romeinen, de
Egyptenaren, allen hebben ze goud gebruikt als handelsmunt, als teken van rijkdom of als
vluchthaven in tijden van crisis. Door de recente economische problemen, is goud terug prominent
aanwezig in de wereld van de beleggers. Dit leidde de laatste jaren tot enorme prijsstijgingen,
waarbij de goudprijs in nominale waarde nieuwe hoogtes verkent. Als we echter zouden kijken naar
de goudprijs in constante 2011 waarde, dan zien we dat de maandelijkse goudprijs zijn hoogste
niveau haalde in januari 1980 met een goudprijs van $1958.85.
De goudprijs modeleren is nooit eenvoudig geweest. In tegenstelling tot andere activa, zoals
aandelen of obligaties, biedt goud geen dividenden of interest coupons. Daardoor kan de goudprijs
niet berekend worden door toekomstige kasstromen te gaan verdisconteren. Een andere manier om
naar de goudprijs te kijken, is de drijvers van vraag en aanbod te gaan analyseren. Eenmaal men
weet wat de goudprijs drijft, kan men daaruit de impact van een verandering in een drijver op de
goudprijs gaan bekijken. Deze thesis zal dan ook de verschillende modellerings- en
voorspellingstechnieken voor de goudprijs onderzoeken, zoals lineair regressie, reservoir computing,
backpropagation through time en Gaussian processes. Daarnaast zal ook ingegaan worden op de
determinanten die de goudprijs drijven. Deze thesis wordt geschreven vanuit de ideeën van de
inflatoire stroming waarbij goud bescherming moet bieden tegen inflatie.
De beste techniek is backpropagation through time. Binnen deze modellen presteert het model
waarbij de metaparameters worden geoptimaliseerd en de recurrente gewichten worden getraind,
het beste. Reservoir computing is ook zeer belovend, maar enkel wanneer het correct uitgevoerd
wordt. Dit wil zeggen, datapunten na de validatie set weglaten. Lineaire regressie is een van de
betere technieken omdat het goede voorspellingen geeft en vrij eenvoudig is om te interpreteren.
Twee belangrijke opmerkingen kunnen wel over lineaire regressie gemaakt worden. Ten eerste, voor
de periode november 1990 tot december 2011, waren we niet in staat om een cointegratie relatie te
vinden tussen de goudprijs en het algemeen prijsniveau in de VS. Dit wil hoegenaamd niet zeggen dat
goud daarom geen goede bescherming tegen inflatie kan zijn. Aangezien er geen cointegratie
aanwezig was, hebben we dan ook een ADL-model gebruikt voor de goudprijs. De tweede opmerking
heeft te maken met het feit dat wanneer het lineair karakter weg is, lineaire modellen zwakker
presteren dan niet-lineaire modellen.
Betreffende het gebruiken van technieken voor tijdreeks analyse, kunne twee suggesties gegeven
worden. Indien het qua tijd en financieel budget mogelijk is, gebruikt men best zoveel mogelijk
X
technieken. Wanneer de verschillende technieken elkaar bevestigen, dan zal de waarschijnlijkheid
dat de goudprijs zo zal bewegen, hoger zijn in vergelijk met de situatie waar ze elkaar tegenspreken.
De tweede suggestie heeft te maken met de keuze van techniek. Als men door omstandigheden niet
meerdere technieken kan toepassen, dan moet men ervoor zorgen dat men de juiste kiest. Het is
namelijk zo dat elke techniek voor- en nadelen heeft, maar het komt er op neer de juiste techniek te
vinden die het beste aansluit bij bepaalde specifieke karakteristieken van een probleem.
Om een antwoord te vinden op de tweede onderzoeksvraag, werd het belang van de verschillende
variabelen in de verschillende modellen nagegaan. Hieruit bleek dat acht van de twaalf variabelen
belangrijk zijn bij de goudprijsmodellering. Van deze acht, hebben vier een meer directe, lineaire
impact: de goudprijs in de vorige periode, de inflatie in de VS, de wisselkoers en het prijsniveau in de
wereld. De andere vier variabelen hebben een meer indirecte, niet-lineaire impact: de bèta van goud,
de leasing interest van goud, de volatiliteit van de wereld inflatie en het prijsniveau in de VS. Dit alles
kan vrij eenvoudig samengevat worden: de goudprijs in de huidige periode is gelijk aan de goudprijs
in de vorige periode plus wat extra’s.
Gebaseerd op de resultaten van deze studie, zal de goudprijs minstens nog drie jaar toenemen,
waarna een daling kan plaatsvinden. Zes van de acht variabelen die een impact hebben op de
goudprijs, wijzen allen op een stijging van de goudprijs. Voor de andere twee variabelen is het
moeilijker om een uitspraak te doen. Een potentiële maximum goudprijs voor de komende twee à
drie jaar is $2465.57 per ounce, aangezien dit de goudprijs is in constante 2011 termen die bereikt
werd op 21 januari 1980.
1
1.Introduction In mankind’s history different nations, like the Romans, the Greeks, the Egyptians and others wanted
to posses as much gold as possible. They used gold as a way to pay for resources, as a token of
wealth, as a safe-haven in times of war or political turmoil... Each nation had its reasons to own as
much gold as possible. The last decade the interest in gold has significantly increased, gold is popular
anno 2012. The demand for gold is high while the supply is decreasing year by year which leads to
quotations of more than $1500 per ounce.
Before investing in gold, one needs to get an idea of the fair value of gold. Modelling the value of this
asset is considered to be difficult. When one invests in assets that offer recurrent yields like bonds
and shares, it is less difficult to get an idea of the value. The many determinants of the gold price, the
absence of recurrent cashflows, the psychological aspect of gold... all contribute to the fact that gold
is hard to value. The price of gold is now modelled via linear regression techniques like cointegration
that use as input variables the US inflation, the beta of gold... However these techniques are not
perfect. In case of a breach in the model during training, for example going from a linear to a non-
linear model, machine learning techniques can add value to the modelling task. The purpose of this
thesis is not only to get a clear view on the most important determinants of the gold price but also to
know which modelling and forecasting technique is the most promising one.
This thesis will begin with a short explanation about gold and its history. After this explanation an
overview of the literature is presented, where there is a short elaboration on the different views of
the drivers of the gold price. In the chapter following this overview, the different determinants of the
gold price are presented. The influence of every determinant is first explained in a more qualitative
way after which there will be an empirical argumentation in the following chapter. In that chapter,
we will also present the data that is used and its sources. The thesis will then continue with three
more quantitative chapters. The first of these three chapters will try to model the gold price with
classic linear regression techniques, while the second one will use artificial intelligence, more specific
reservoir computing and backpropagation through time to model the gold price. The third of these
three chapters will make use of Gaussian processes to get an idea of the gold price model. After
these three chapters, there will be a chapter that summarizes and compares the results of the three
preceding chapters. The final chapter of this thesis consists of a thorough conclusion.
2
2.Gold and its history
2.1.The history of gold Already early on in history, mankind used gold to trade, as a token of wealth… The history of gold
started around 5000 years ago in Egypt and Nubia (which is an area in the northern part of Sudan). In
1500 BC, merchants in the Middle-East recognized gold as the standard medium of exchange. In 1091
BC golden leaflets were legalized as money in China. The Romans started to use gold in a frequent
manner in the first century BC. They used the Aureus, which was a golden coin with a weight of
around 8 grams. During the period where Egypt and Rome were flourishing, annual gold production
was around one tonne annually.
In 1377 AD England developed a monetary system which was based on gold and silver. During these
Dark- and Middle-Ages gold production dropped to an amount of less than one tonne annually. From
the 15th century on, Africa started to excavate gold which increased annual gold production to
around five to eight tonnes annually. Annual gold production arrived at a new height with the
discovery of North- and South-America. In 1851, thanks to the huge gold output of California, annual
gold production was around 77 tonnes. With the discovery of gold in Australia, gold production even
soared to 280 tonnes in 1852.
In 1844 the Bank of England was the first central bank to adopt the gold standard. The gold standard
is a monetary system where the issued amount of paper is tightly or loosely tied to the central bank
gold reserve. In this system countries all have fixed exchange rates with each other. To maintain
these fixed exchange rates, central banks were obliged to adjust their interest rates. By the year
1890, most of the world had adopted the gold standard. The increasing popularity of the gold
standard can be explained by its network effects. First of all, countries that refused in the beginning
to adapt to the gold standard were facing difficulties in attracting money. Secondly, companies that
were doing business in a country with the gold standard, were able to do better investment planning
because of lower inflation volatility. A major drawback of the system was the potential
destabilization of countries with a trade deficit. If there was a trade deficit between two countries,
central banks were obliged to solve it by exchanging the deficit in gold. When inhabitants of a
country with a trade deficit, knew that there was a trade deficit, they often did a bank run to
exchange their paper money in gold before it was transferred to the other country. This could
destabilize countries that already suffered from lower economic activity (Eichengreen & Flandreau,
1997).
3
In 1914, with the outbreak of World War I, the United Kingdom decided to leave the gold standard.
By doing so, they were able to finance the war by printing more paper money. After the first World
War the US and UK did some serious attempts to get back to the gold standard. However none of the
attempts was successful.
By the end of the second World War, a new monetary system was established between 44 countries:
the Bretton Woods System. In the Bretton Woods System, all currencies are tied to the dollar and the
dollar is tied to gold. The first established parity was 35 dollars per ounce. The Bretton Woods
System came to an end by 1971 because the US kept on printing dollars to finance its war in Vietnam.
This caused other countries to lose their trust in the maintainability of the parity.
Since the seventies, the price of gold is free to move, it increases or decreases as a result of changes
in supply or demand. Nevertheless central banks intervened once more on 26/9/1999. With the
Central Bank Agreement On Gold (CBGA) they stressed the importance of gold as a reserve asset.
This agreement limited the sale of gold by central banks by 400 tonnes a year and 2000 tonnes in the
next five years. It also limited the amount of gold leased to the level of previous years. This
agreement was renewed after five years: on 8/3/2004 the central banks decided to renew it, but
changed the amounts to 500 tonnes annually and 2500 tonnes in the next five years.
2.2.Reasons for central banks to invest in gold According to the World Gold Council there are multiple different reasons why central banks invest in
gold:
Gold acts as a hedge against inflation.
Central banks get income from leasing gold.
Gold offers worldwide confidence.
Gold offers a diversification benefit.
Changes in the world monetary system do not affect gold.
Gold provides physical safety in case other assets are blocked on accounts.
The price of gold is unaffected by bad government decisions, unlike currencies that are prone
to bad decisions made by governments.
4
2.3.The evolution of the gold price The nominal gold price (GRAPH 1) moved from $37.87 in January 1971 to $1652.31 in December
2011. There was a serious increase in the gold price around 1980, that was caused by the oil crisis,
floating exchange rates and the silver cornering. The silver cornering was an attempt by the Hunt
Brothers to manipulate the value of silver. There is the possibility to corner the market if you control
the price of an asset, like for example silver. The day when the Brothers were unable to meet the
margin calls on their futures, was named Silver Thursday. Due to the serious decline in the silver
price, other assets that previously also benefitted from the surge, now collapsed. The last decade the
price of gold reached new heights (in nominal terms). This was caused by financial turmoil, increasing
demand coming from developing countries, the CBGA …
GRAPH 1: GOLD PRICE (1971-2011)
Gold is often perceived as being a hedge against inflation. The inflation hedge gold price (GRAPH 1)
shows us that if an investor bought gold when the Bretton Woods system stopped (1971), the
investor would have been able to maintain his wealth. This is because the nominal gold price is
always higher than the inflation hedge gold price. The gold price in constant 2011 terms shows that
an investor investing in gold at any time period and holding his position, would never have lost
wealth except for the period around 1980. The gold price in January 1980 in constant 2011 terms is
$1958.85, meaning that even the gold price peak of the last year does not yet reach the peak from
1980.
$-
$500,00
$1.000,00
$1.500,00
$2.000,00
$2.500,00
jan
/71
mrt
/74
mei
/77
jul/
80
sep
/83
no
v/8
6
jan
/90
mrt
/93
mei
/96
jul/
99
sep
/02
no
v/0
5
jan
/09
Nominal gold price
Inflation hedge gold price 1971
Gold price in constant 2011 terms
5
If an investor bought gold in December 1990 (GRAPH 2), he had to wait to sell the asset until April
2006. Selling before April 2006 would mean that the investor lost wealth because the price of gold
was insufficient to meet inflation, the nominal gold price is lower than the inflation hedge gold price.
Buying gold at any period and holding the position until the end of 2011 offered an increase in value
that was more than sufficient to meet inflation demands. The gold price in constant 2011 terms
(GRAPH 2) is almost never higher than the nominal gold price at the end of 2011.
GRAPH 2: GOLD PRICE (1990-2011)
2.4.London gold market fixing Since 12 September 1919, London is the centre in the world for fixing the gold price. On the 12th of
September 1919 the five member banks, N.M. Rotschild & Sons, Mocatta & Goldsmid, Samuel
Montagu & Co, Pixley & Abell and Sharps & Wilkings, fixed for the first time the gold price at £4.94
per ounce. Until 1968, the fix was in Sterling’s but from then on, it was quoted in dollars. Around that
same period, an afternoon fix was also introduced to better serve the American market.
Anno 2012 the London fix is established two times per day by five members, once at 10.30 and once
at 15.00. The five current members are Scotiabank, HSBC, Barclays Capital, Deutsche Bank and
Société Générale Corporate & Investment Banking.
At the start of every fixing the chairman announces a price. After this, the five members relay this
price back to their customers. Then the members present themselves as sellers or buyers. If there are
sellers and buyers, the banks are asked how many bars they would like to trade at the quoted price.
If there are no sellers or buyers, or the amount of bars at which there can be a trade is not within 50
bars, a new price is quoted. This procedure will be repeated as long as there is no balance. The fix,
the price where there is a balance, will serve as benchmark for valuing gold derivatives, like futures.
6
2.5.Supply and demand This section will explain in detail the supply and demand for gold (World Gold Council). It will also
give an overview of the top gold consuming and producing countries. It finishes with an overview of
all the gold that is ever excavated.
GRAPH 3: EVOLUTION OF THE GOLD DEMAND
In 2011, the total demand for gold was 3993.3 tonnes, which is almost 4% less than a year before
when demand was 4151.4 tonnes. Despite this, the price of gold increased from $1224.5 in 2010 to
$1571.5 in 2011 (annual averages). This increase of the gold price can partly be explained by the
decrease in the total supply of gold. The rising price of gold is the main driver for the lower gold
demand for jewellery (GRAPH 3). Since 2003 demand for gold for investment reasons kept on rising
notwithstanding the higher prices and reached a new height of 1640.7 tonnes in 2011. Gold appears
to act as a kind of safe-haven for investors in times of economical and financial turmoil. The gold
demand for industry reasons remained stable over the last decade, around 500 tonnes annually. The
category “other” is a correction factor to make sure that supply equals demand. In FIGURE 1 it is
clear that the gold demand for jewellery is still higher than the demand for investment reasons, but
as can be seen in GRAPH 3 the gap is closing.
FIGURE 1: GOLD DEMAND 2011
-500
0
500
1000
1500
2000
2500
3000 2
00
2
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
ton
ne
Industry
Investment
Jewellery
Other
Industry: 464 tonnes
Investment: 1641 tonnes
Jewellery: 1963 tonnes
7
Gold that is being offered on the market, comes from three sources: mining, recycling and official
sector sales. Mining was able to supply 2821.7 tonnes of gold in 2011 (GRAPH 4). The increase in gold
coming from mines only started three years ago, which is two to three years after the start of the
boom in the gold price. The reason for this slow reaction is the required time to open a new mine for
gold digging.
GRAPH 4: EVOLUTION OF THE GOLD SUPPLY
In GRAPH 4, the gold coming from recycling shows a minor drop of around 2.5% in 2011. There is
however a clear difference per continent. In India and China, gold recycling dropped while in the EU
and US, where gold recycling is only a recent phenomena, it increased (WGC). Since the financial
crisis, governments and central banks (official sector) stopped liquidating their gold reserves and
even started to increase their gold reserves. Central banks try to diversify their assets as much as
possible and try to reduce their assets in Euros or Dollars, because they fear a fierce depreciation in
the future. Where central banks by the end of the nineties still needed a CBGA to limit their sales of
gold, they now are net buyers of gold.
Rank Country Gold production
2010 (tonnes)
1 China 345
2 Australia 255
3 United States 230
4 Russia 190
5 South Africa 190
6 Peru 170
7 Indonesia 120
8 Ghana 100
9 Canada 90
10 Uzbekistan 90 TABLE 1: GOLD PRODUCING COUNTRIES (MINING) (2010)
-500
0
500
1000
1500
2000
2500
3000
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
ton
ne Mining
Recycling
Official sector
8
The top ten of gold producing countries (according to the United States Geological Survey, USGS) is
presented in TABLE 1 (for 2010). An interesting number one is China because often the conception is
that South Africa is the number one in gold mining.
Rank Country Jewellery consumption
2010 (tonnes)
1 India 657.4
2 China 451.8
3 USA 128.6
4 Turkey 70.6
5 Saudi Arabia 67.6
6 Russia 66
7 UAE 61.6
8 Egypt 53.4
9 Italy 34.9
10 Indonesia 32.8 TABLE 2: JEWELLERY CONSUMPTION (2010)
An interesting fact that arises when comparing the top gold producing countries with jewellery
consumption is the high demand coming from India (WGC). India uses for jewellery 657.4 tonnes of
gold per year but has almost no gold production at all. Importing all this gold will affect the trade
account of India.
Mankind in its entire history, has excavated 165600 tonnes of gold, of which more than half is used in
jewels (FIGURE 2). Another 29600 tonnes is kept for investment reasons by investors, while central
banks own another 28900 tonnes. According to USGS there is still 50000 tonnes of gold beneath the
earth crust that can be excavated in an economical viable way. This amount changes with changing
gold prices and changing production costs and techniques. If the price of gold stays the same and
mankind keeps on excavating at a paste of 2500 tonnes per year, gold will be depleted in less than 20
years.
FIGURE 2: GOLD IN CIRCULATION (TOTAL=165600 TONNES)
Industry: 19800 tonnes
Investment: 29600 tonnes
Juwellery: 83700 tonnes
Official sector: 28900 tonnes
Other: 3600 tonnes
9
3.Literature Review In the literature, different studies have been conducted that try to explain and model the gold price
and its determinants. These studies can be categorized into four main groups. The first approach
tries to model the variation in the gold price in terms of variation in the main macroeconomic
variables, such as exchange rates, world income, political shocks… The second approach will focus
more on gold as a way to speculate. This approach will study the rationality or irrationality of gold
price movements. The third approach considers investing in gold more as a way to hedge a portfolio
against inflation. They often distinguish short term and long term movements. The last and most
recent group of studies are these studies that use artificial intelligence to model the gold price.
3.1.Macroeconomic approach Dooley, Isard and Taylor (1995) conducted a variety of empirical tests to prove that the gold price has
explanatory power in predicting exchange rate movements in addition to the movements in
monetary fundamentals and other variables that enter standard exchange rate models. The core idea
of their research is that they consider gold to be an asset without a country. Any shock that reduces
the attractiveness of net claims on A will increase the demand for other assets like gold or net claims
on B. This will lead to changes in the equilibrium prices.
Sjaastad and Scacciallani (1996) investigated the relationship between the gold price and the foreign
exchange market for the period 1982-1990. The main result of their study is the significant influence
of an appreciation or depreciation of a European currency on the gold price. They also found that the
US dollar only has a minor impact on the gold price. According to the authors fluctuations in the real
exchange rates among the major currencies explain almost half of the variation in the gold price.
3.2.Speculation approach Diba and Grossman (1984) conducted a research to find out if there were rational bubbles in the
relative gold price. They conclude with three findings. First of all they found that de relative price of
gold, meaning relative to a general index of prices of goods and service, corresponds with
fundamental market variables, like real interest rates. Their second conclusion is the fact that the
process generating the first differences of market fundamentals is stationary. Their last conclusion is
that actual gold price movements do not involve rational bubbles.
Pindyck (1993) tried to develop a present value model for the gold price based on futures. For
commodities like copper, lumber and oil, the model performed well, but for gold the results of the
present value model were rather poor. According to Pindyck, this is due to the fact that gold has not
the same convenience yield like the other commodities.
10
3.3.Inflation hedge approach Kolluri (1981) did some empirical research to find out the role of gold for inflation hedging. To do so,
Kolluri followed two approaches. The first approach modelled the relationship between the return on
gold investments and the anticipated inflation or variants of it estimated via the iterative procedure
of Cochrane-Orcutt. For this, Kolluri used monthly gold prices from 1968-1980. In the second
approach Kolluri first modeled the return of shares and bonds between 1926-1978, which he then
used as minimum required return for gold investments. Kolluri concluded that gold was a good hedge
against inflation in the period 1968-1980.
Sherman (1983) used multiple regression to identify the important determinants of the gold price.
With an R² of 76.03% he found that the tension index, the interest rate, the US trade weighted
exchange rate, the GDP, the excess liquidity and the unanticipated inflation were important drivers
for the gold price. However when Sherman tried to correct the serial correlation via an adjustment
procedure, certain variables became insignificant, like the tension index while others became more
significant, like unanticipated inflation.
Via simulation, Moore (1990) was able to earn an annual return on gold investments of 18-20% by
tracking the buy and sell signals of the inflation index of Columbia University (during 1970-1988). If
an investor would have followed a buy and hold strategy during the same period, the return would
be 13.9% while if he had held stocks and bonds, he would have earned 11.2% or 8.7% per year.
Mahdavi and Zhou (1997) performed research to find out if gold or other commodities were
indicators for inflation. They also investigated the possibility to predict the inflation via error-
correction models. They concluded that commodities are good indicators for inflation since they are
co-integrated with the consumer price index. According to the authors, the weak results concerning
the prediction of inflation are caused by the strong volatility of the gold price in the short run.
Ghosh, Levin, Macmillan and Wright (2004) studied the long term relation between the gold price
and the average price level in the US. In their research they also looked for an explanation for the
short run deviations from this long run relationship. For their research they used the monthly gold
prices from 1976 until 1999. The long run relationship was found via cointegration regression
techniques. This long run relationship had an elasticity of one. They also found via error-correction
models that the deviations in the short run are caused by changes in the interest rates to lease gold,
the beta of gold and the US/World exchange rate.
Ranson and Wainwright (2005) concluded that resources and more specifically gold, were the best
hedge against inflation. Their research proved that an increase in the gold price is on average two to
three times larger than an increase in inflation. According to the authors, the optimal amount to
11
hedge your portfolio against inflation, was to invest 18% of your portfolio in gold. The only drawback
could be a decrease in inflation or a deflation, which could affect your return on the entire portfolio.
Levin and Wright (2006) proved that there was a long run relationship between the price of gold and
the average price level in the US for the period 1975-2006. They showed that an increase of 1% in the
average US level caused an increase of 1% in the gold price. They modelled the long run relationship
via cointegration and the short run dynamics via error correction models. The main drivers of the
gold price in the short run were changes in US inflation, inflation volatility, credit risk, the interest
rate to lease gold and the US trade weighted exchange rate. They also proved that 66% of a deviation
of the long run relationship will disappear within five years after the shock that caused the deviation.
This research added value to the research of Ghosh and others (2000) by including a variable
“political risk in oil producing countries”.
In his research, Lampinen (2007) tried to confirm the results from the study of Levin and Wright
(2006) but he extends the research period with one year. Lampinen found similar results in his
research. The biggest difference was the number of included time dummy variables. Levin and
Wright (2006) only needed 10 time dummy variables while Lampinen needed 19 time dummy
variables. These time dummy variables were added in order to prevent autocorrelation in the
residuals.
3.4.The artificial intelligence approach
This approach is the youngest of the four approaches. However it is maybe not a new approach in
the sense that it defines a new scope of determinants. The newness of this approach lies in the new
techniques that help in modelling the gold price. The determinants that are used in this artificial
modelling are not limited to one of the above approaches.
There are only a limited number of papers available concerning the subject of artificial intelligence in
combination with the gold price. It is however remarkable to see that these papers are often written
by engineers or mathematicians with no or limited knowledge about the drivers of the gold price.
This can be seen in the kind of drivers they use in their models. They often use general determinants
like the S&P 500 index and less, the actual drivers of the gold price.
McCann and Kalman (1994) developed a forecasting mechanism for the gold price based on a simple
recurrent neural network. The simple recurrent neural network was trained to discover turning-
points in the gold price. Based on these turning points, an investor decided to either go short or long
in the gold price. The data that were used to train the recurrent neural network were they daily gold
prices over a period of five years. The authors tested their network on a test period of nine months
after these five years and during these nine months they were able to realize a return of 21%.
12
Mirmirani (2004) found inspiration in the fact that economic theories were not able to offer a
satisfying explanation for the dynamic gold price movements. Previous studies used linear
techniques to model the gold price, which is of course arbitrary. Neural networks in combination
with genetic algorithms offer the advantage that these techniques can be used without having a clear
view on the underlying structure of the problem. The author used back-propagation in combination
with genetic algorithms to predict the gold price. The results showed that the past gold price, 36 days
before, exercised a strong influence on the gold price that needed to be predicted.
Parisi, Parisi and Diaz (2008) used rolling and recursive neural networks to predict sign variation.
Their study showed that rolling networks outperformed the recursive and feed forward networks in
predicting gold price sign variation. The block bootstrap method realized an average sign prediction
of 60.68% with a standard deviation of 2.82%.
3.5.Conclusion Most research is performed in the area of the inflation hedge approach. This thesis is also based on
the inflation hedge approach although some techniques that are used in this thesis, come from the
artificial intelligence approach.
13
4.Determinants of the gold price
4.1.Short run supply and demand Since the value of gold cannot be modelled by discounting future expected cashflows, another
method to value the price of gold needs to be determined. The short run price of gold is determined
by the supply and the demand for gold, while the long run price will move more in line with the
general price level. This section combines the insights from the inflation hedge approach into one
understandable gold price model.
Short run supply
There are different variables that alter the market supply of gold ( in the short run. Buyers get
gold coming from two sources, producers (mining or recycling companies) and central banks
( . Central banks started leasing gold in the early 1980’s. Mining companies can choose to supply
their clients by leasing gold from central banks or by extracting it from their mines. Recycling
companies can choose to supply their customers by leasing gold from central banks or by recycling
gold. So the total supply of gold (FIGURE 3) in the short run equals the gold supplied by producers
plus the gold supplied by central banks.
The quantity that producers supply from their mines or recycling installations to their customers is
determined by the price in an earlier period, because producers react with a lag to an increase in
price. There is a time lag because in general it takes a lot of time to start up a new mine or to build an
new recycling installation. Producers will also supply more to the market coming from their inventory
if prices in the current period are high. The quantity of gold supplied to the market coming from
producers is negatively related to the amount of gold produced to repay for leased gold from a
previous period plus the physical interest rate if this needs be repaid in gold. To conclude, the total
supply to the market coming from production is positively related to the current and lagged gold
prices, negatively related the amount of gold produced to repay leased gold from the previous period
and negatively related to the gold lease rate in the previous period.
Central banks will be leasing gold if they receive a physical interest rate which is at least the
convenience yield plus a compensation for the default risk (Levin and Wright, 2006). The
convenience yield is the benefit a bank has from physically owning the gold. When the market is in
equilibrium, the lease rate is equal to the convenience yield plus a default risk. An increase in the
physical interest rate (lease rate), a decrease in the default risk or a decrease in the convenience
14
yield will increase the gold supplied via leasing from central banks. When prices in the current period
are high, central banks will also be more inclined to sell their gold, which positively affects supply.
FIGURE 3: SHORT RUN SUPPLY (THE EFFECT OF AN INCREASE IN THE STATED VARIABLES HAS EITHER A POSITIVE OR A NEGATIVE IMPACT ON THE SUPPLY)
There is however one important remark before there can be a general conclusion about the
determinants that drive the overall short run supply of gold. Although the current lease rate may
positively affect the supply by central banks, the lagged lease rate causes a negative effect. So the
factors driving the lagged lease rate, like the lagged default risk and lagged convenience yield will
have a positive relationship with current supply, because if these variables were high in previous
periods, central banks were less inclined to lease gold in the previous period. This results in the fact
that less produced gold needs to be used to repay central banks.
Therefore the short run supply is positively affected by the current and lagged values of the gold
price, positively affected by the current gold lease rate, lagged default risk and lagged convenience
yield while it will be negatively affected by the current default risk, current convenience yield and
lagged lease rate.
15
Short run demand
There are different drivers that alter the short run demand for gold ( ). The short run demand for
gold consists out of two categories. First of all there is the so called use demand for gold ( . In this
category, gold is used for jewellery, medals, electrical components… The second category is the asset
demand for gold ( , where one will buy gold for investment reasons. So the total demand for gold
(FIGURE 4) equals the use demand for gold plus the asset demand for gold.
The use demand for gold is affected by two variables. If the price of gold increases in the current
period, use demand for gold will decrease. Gold price volatility in the current period has a negative
impact on the current use demand for gold. If the price of gold is volatile, users will delay their
purchase of gold (The effects of gold price volatility are too short run to be included in this research.).
The asset demand for gold is based on a number of factors that have an impact on the demand side,
like exchange rate, inflation, fear, return on other assets and the correlation with other assets. If the
dollar becomes stronger, foreign investors will be less inclined to buy gold. For these foreign
investors it becomes more expensive to buy gold which is quoted in dollars. If investors expect
inflation to rise, investors will try to counter this by investing a part of their portfolio in gold. Another
variable that has a serious impact on the gold price is fear, during times of economic and financial
turmoil. The more fear is present in the market, the higher the price of gold will rise. Investors will
also buy gold to diversify their portfolio in order to reduce the overall portfolio risk. In such a case,
gold can be an interesting asset because of its negative or even zero beta. In the used data, the
average beta was -0.0765. In times of underperforming stock markets, beta will be more negative
than in more general calm markets, where there will be almost no correlation. If the beta of gold
increases, asset demand will lower but rises again when the beta returns to its average. So the asset
demand is negatively related to the current beta, but positively to the lagged betas.
If an investor buys gold, he precludes interest that he could earn by investing his money in other
assets. The real interest rate is the opportunity cost of holding gold. In general a higher interest rate
will decrease the asset demand for gold, which in turn lowers the price of gold. However in some
cases the gold price and interest rate may move together if the underlying reason is the same, for
example in case of an expected higher inflation.
16
FIGURE 4: SHORT RUN DEMAND (THE EFFECT OF AN INCREASE IN THE STATED VARIABLES HAS EITHER A POSITIVE OR A
NEGATIVE IMPACT ON THE DEMAND)
The short run demand for gold is negatively affected by the gold price, gold price volatility, exchange
rate, real interest rate, current beta, but positively affected by inflation, fear and lagged values of
beta.
Equilibrium
The market price of gold is determined when supply equals demand. As explained above, different
factors drive supply and demand and therefore the price of gold. There exists however an arbitrage
relationship between two of these factors, the gold lease rate and the real interest rate. According to
Ghosh and others (2004) “mines are indifferent between extracting gold now and selling the mined
gold or leasing gold now, selling the leased gold now, investing the proceeds of the sale in a bond,
selling the bond in one year and using the proceeds including interest to pay for extracting the gold
plus the physical interest rate.” So if the cost of extraction rises at the rate of inflation, the real
interest rate equals the gold lease rate.
TABLE 3 and TABLE 4 give not only a general conclusion about the impact of an increase of the
variables on the supply or demand, but also the impact on the gold price. If an increasing factor
Gold price (t): -
Gold price volatility (t): -
(t)
Exchange rate (t): -
Inflation (t): +
Fear (t): +
Real interest rate (t): -
Beta (t): -
Beta (t-1): +
17
causes the supply to increase, it will decrease the price or vice versa. For example an increase in the
current gold lease rate, will increase the supply, which will cause prices to drop. If an increasing
factor causes the demand to increase, it will increase the price or vice versa. For example an increase
in the real interest rate will lower demand and as a consequence, lower demand will cause prices to
drop.
Influence on supply Influence on gold price
Current gold lease rate + -
Lagged gold lease rate - +
Current default risk - +
Lagged default risk + -
Current convenience yield - +
Lagged convenience yield + - TABLE 3: INFLUENCE OF AN INCREASE OF THE DETERMINANTS ON SUPPLY AND PRICE
Influence on demand Influence on gold price
Gold price volatility - -
Exchange rate - -
Real interest rate - -
Current beta - -
Lagged beta + +
Inflation + +
Fear + + TABLE 4: INFLUENCE OF AN INCREASE OF THE DETERMINANTS ON DEMAND AND PRICE
4.2.Factors affecting the gold price in the long run In the long run, the gold price is expected to rise in line with the general price level and as such act as
an inflation hedge. According to Ghosh and others (2004) the main reason behind this moving
together, is because the long run price of gold is related to the marginal cost of extraction. So if the
cost of production increases with inflation, the price of gold will also have to increase with inflation.
In the long run, the gold price will be equal to the marginal cost of extraction if the market is
competitive or the price will be proportional with the marginal cost of extraction if the producers
have market power. In both cases the price will rise at the general rate of inflation.
18
5.Data and research questions
This thesis will cover two main research questions:
1. Compare the modelling and forecasting power of classic linear regression techniques against
more advanced non-linear machine learning techniques.
2. What are the most important determinants of the gold price according to the different
methods that were used for modelling the gold price?
The training and validation period for the different methods will be from November 1990 until
December 2009. The forecasting period will be from January 2010 until December 2011. The reason
for starting on November 1990 is the lack of data for the gold lease rate before this date. The
variables that are used in this thesis are presented in TABLE 5. In TABLE 5 the mnemonics, the source
and the way in which they are calculated, are also listed. TABLE 6 covers the impact on the gold price
of an increase in the different variables from TABLE 5. Three important remarks need to be made
about the different variables. Firstly, from TABLE 3 and TABLE 4 only 1 variable is not included in the
research, the gold price volatility because its impact is expected to be too short run for this research.
The second remark is the fact that there are variables in TABLE 5 that are additional to the ones in
TABLE 3 and 4. However all these variables are in a way connected to the ones in TABLE 3 and 4. The
last remark is the relationship between the gold lease rate, the convenience yield and the default
risk. In this thesis we only use the gold lease rate since as explained, the lease rate is in equilibrium
equal to the default risk plus the convenience yield.
Mnemonic Source Calculation
Gold price (end of month)
World Gold Council /
Gold lease rate Datastream subtract UK GOFO 3months from US
interbank rate 3months
Trade weighted exchange rate index
Datastream /
Beta of gold Datastream regress the monthly gold return on the monthly S&P500 return using a sample
period of 36 months
CPI US Datastream / Inflation US Datastream (CPI(t)/CPI(t-12))-1
Volatility inflation US Datastream GARCH(1,1) volatility modelling CPI World Datastream / Inflation World Datastream (CPI(t)/CPI(t-12))-1 Volatility inflation World
Datastream GARCH(1,1) volatility modelling
CRDP Datastream (Moody's BAA - Moody's AAA)/Moody's
AAA Composite leading indicator
OECD /
TABLE 5: VARIABLES
19
Variables Expected influence on
gold price at time t
+
+
+
+
+
+
+
-
-
+
-
+
+ TABLE 6: THEORETICAL IMPACT ON THE GOLD PRICE OF AN INCREASE IN THE VARIABLES
20
5.1.Description of the variables
In this section I will describe the different variables that have an impact on the gold price. It is
however important to notify that this descriptive analysis may reveal spurious trends or correlations.
The graphs will start on November 1990, because that was the first month that the gold lease rate
was available.
The general price level in the US (US consumer price index) is included as an explanatory variable to
test whether the gold price moves together with the general price level in the long run so that gold
can be considered as a long run hedge against inflation. It is clear from GRAPH 5 that both variables
trend upwards, suggesting that there is a long run relationship and as such gold might be a long run
hedge against inflation. It is however important to see that there are significant deviations between
short run movement in both variables. Gold is therefore probably not a good hedge against inflation
in the short run.
GRAPH 5: PRICE OF GOLD AND CONSUMER PRICE INDEX (CPI) US
The consumer price index of the world (CPI world) is also included in the described long run
relationship to control for the fact that foreign investors also can have an impact on the gold price
movements. In GRAPH 6 we can observe that both variables trend upwards and as consequence gold
might be a long run hedge against inflation. If we compare both CPI’s, we see that the CPI world is
much steeper than the CPI US (GRAPH 7).
4,8
4,9
5
5,1
5,2
5,3
5,4
5,5
5
5,5
6
6,5
7
7,5
8
no
v/9
0
jun
/92
jan
/94
aug/
95
mrt
/97
okt
/98
mei
/00
dec
/01
jul/
03
feb
/05
sep
/06
apr/
08
no
v/0
9
jun
/11
ln C
PI U
S ($
)
ln g
old
pri
ce (
$)
ln nominal gold price
ln CPI US
21
GRAPH 6: PRICE OF GOLD AND CPI WORLD
GRAPH 7: COMPARE CPI'S (11/1990 = 100)
3
3,2
3,4
3,6
3,8
4
4,2
4,4
4,6
4,8
5
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
ln C
PI w
orl
d (
$)
ln g
old
pri
ce (
$)
ln nominal gold price
ln CPI world
100
105
110
115
120
125
130
135
140
145
150
no
v/9
0
dec
/91
jan
/93
feb
/94
mrt
/95
apr/
96
mei
/97
jun
/98
jul/
99
aug/
00
sep
/01
okt
/02
no
v/0
3
dec
/04
jan
/06
feb
/07
mrt
/08
apr/
09
mei
/10
jun
/11
Pri
ce le
vel
CPI US
CPI world
22
If inflation is high, the gold price should also be high. In some periods this relationship holds true, but
in other periods there is no evidence of such a relationship (GRAPH 8 and GRAPH 9). The last decade
inflation has almost always been less than 5% (which is historically speaking low), nevertheless the
price of gold has exploded.
GRAPH 8: PRICE OF GOLD AND INFLATION US
GRAPH 9: PRICE OF GOLD AND INFLATION WORLD
-3%
-2%
-1%
0%
1%
2%
3%
4%
5%
6%
7%
4,5
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
Infl
atio
n U
S
ln g
old
pri
ce (
$)
ln nominal gold price
Inflation US
0%
5%
10%
15%
20%
25%
30%
35%
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
Infl
atio
n w
orl
d
ln g
old
pri
ce (
$)
ln nominal gold price
Inflation world
23
Rising US or world inflation volatility should increase the gold price, but on GRAPH 10 and 11 there
are also periods that show a negative relationship.
GRAPH 10: PRICE OF GOLD AND VOLATILITY OF INFLATION US
GRAPH 11: PRICE OF GOLD AND VOLATILITY OF INFLATION WORLD
0
0,01
0,02
0,03
0,04
0,05
0,06
5
5,5
6
6,5
7
7,5
8 n
ov/
90
jun
/92
jan
/94
aug/
95
mrt
/97
okt
/98
mei
/00
dec
/01
jul/
03
feb
/05
sep
/06
apr/
08
no
v/0
9
jun
/11
Vo
lati
lity
US
infl
atio
n
ln g
old
pri
ce (
$)
ln nominal gold price
Volatility US inflation
0
0,05
0,1
0,15
0,2
0,25
5
5,5
6
6,5
7
7,5
8
no
v/9
0
jul/
92
mrt
/94
no
v/9
5
jul/
97
mrt
/99
no
v/0
0
jul/
02
mrt
/04
no
v/0
5
jul/
07
mrt
/09
no
v/1
0
Vo
lati
lity
wo
rld
infl
atio
n
ln g
old
pri
ce (
$)
ln nominal gold price
Volatility world inflation
24
The composite leading indicator (CLI) of the OECD provides early signals of turning points between
expansions and slowdowns of economic activity. In general the relationship between the composite
leading indicator and the gold price is expected to be positive (GRAPH 12). During good economic
times, demand for jewellery and investments should have to increase which leads to higher prices for
gold.
GRAPH 12: PRICE OF GOLD AND CLI
The US-World exchange rate (ER) is included as a variable to control for gold price movements that
are not caused by changes in the price level in the US or the world. If the dollar becomes stronger,
investment demand by foreign investors will lower which causes prices to drop (GRAPH 13).
GRAPH 13: PRICE OF GOLD AND US-WORLD EXCHANGE RATE INDEX
4,44
4,49
4,54
4,59
4,64
5
5,5
6
6,5
7
7,5
8
no
v/9
0
jun
/92
jan
/94
aug/
95
mrt
/97
okt
/98
mei
/00
dec
/01
jul/
03
feb
/05
sep
/06
apr/
08
no
v/0
9
jun
/11
ln C
LI
ln g
old
pri
ce (
$)
ln nominal gold price
ln CLI
4,2
4,3
4,4
4,5
4,6
4,7
4,8
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
ln E
R in
dex
ln g
old
pri
ce (
$)
ln nominal gold price
ln ER
25
The theory suggests that in case of a decreasing lease rate, gold prices should increase. This seems to
be the case from 2000 onwards, however in other periods, it is more difficult to find evidence for this
relationship. The spikes that can be seen in GRAPH 14 have been caused by sudden increases in the
real interest rate, convenience yield or default risk. As explained above, the real interest rate equals
the gold lease rate. This means for the last couple of years, that as long as real interest rates are
negative, prices of gold will continue to rise because investors are not able to maintain their wealth
by parking money at a bank.
GRAPH 14: PRICE OF GOLD AND GOLD LEASE RATE
The relationship between the gold price and the beta of gold is expected to be negative for the
current values of beta and positive for the lagged values of beta. In some periods evidence is found
for this negative relationship while in others, the relationship is more positive (GRAPH 15).
GRAPH 15: PRICE OF GOLD AND GOLD’S BETA
-1%
0%
1%
2%
3%
4%
5%
6%
7%
8%
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
Go
ld le
ase
rate
ln g
old
pri
ce (
$)
ln nominal gold price
gold lease rate
-0,6
-0,4
-0,2
0
0,2
0,4
0,6
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
Go
ld's
bet
a
ln g
old
pri
ce (
$)
ln nominal gold price
Gold's beta
26
The credit risk default premium (CRDP) can be found in GRAPH 16. It is expected to show a positive
relationship with the price of gold. In periods of lower credit risk, central banks will be willing to lease
more gold which causes gold prices to decrease because of increased supply. During the recent crisis
the CRDP has increased which as the theory suggested, caused a rise in the gold price.
GRAPH 16: PRICE OF GOLD AND CRDP
The flight to safety, meaning investing in gold during periods of turmoil, that took place in recent
years, will be modelled by the CRDP and the volatility in inflation in the US and the world, since these
variables are a kind of proxy for economic and financial risk. During periods of financial crises, the
exchange rate volatility will be higher (Coudert, Couharde, Mignon, 2010) which will lead to a higher
inflation volatility. Especially in emerging markets where there is the fear of floating (Calvo, Reinhart,
2000), a crisis will have a larger impact on inflation volatility because of liability dollarization and
depreciations of the local currency. It is therefore that we expect that a crisis will have more impact
via the volatility of the world inflation on the gold price, than via the volatility of the US inflation.
In order to even better catch the flight to safety aspect of gold, additional variables can be added like
for example a political risk factor. In contrast to the study of Levin and Wright (2006) this thesis does
not include a political risk factor because it was not available for free. However this variable can be
bought from the PRS group.
0
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
5
5,5
6
6,5
7
7,5
8
no
v/9
0
apr/
92
sep
/93
feb
/95
jul/
96
dec
/97
mei
/99
okt
/00
mrt
/02
aug/
03
jan
/05
jun
/06
no
v/0
7
apr/
09
sep
/10
CD
RP
ln g
old
pri
ce (
$)
ln nominal gold price
CRDP
27
6.Regression
6.1.Technique The technique that will be used to model the gold price, is called cointegration regression.
Cointegration regression is used in economics to model financial series that cannot be modelled by
more basic forms of regression. The cointegration regression technique was developed by Clive W.J.
Granger by the end of the eighties. Granger (1987) proved that applying statistical methods that
were developed for stationary series on non-stationary series resulted in spurious conclusions. A
time series is called stationary if its moments, like average, variance… are independent of the
moment of observation. Granger’s research resulted in the remarkable observation that if individual
time series are non-stationary, certain linear combinations of these non-stationary time series are
stationary. These time series can then be called cointegrated.
When two time series are cointegrated, these two series are related in the long run. So if an event
causes a deviation of this long run relationship, there will be a return to the long run relation. As such
the existence of a long run equilibrium has its implications for the short run behaviour of the
variables. The relationship between two series can be described via a so called error correction
model or Granger representation theorem.
6.2.Results Unit root testing
The dynamic testing procedure of Enders (2004) is first applied to get a clear view if there are any
unit roots. If there is no unit root (the order of integration is zero = I(0)), then the variable is
stationary (TABLE 7).
Variables I(x)
0
0
2
2
1
0
0
1
0
0
1
0 TABLE 7: UNIT ROOT TESTING, WITH X BEING THE ORDER OF INTEGRATION
28
In what follows, we will continue assuming that all variables are I(1) and as consequence are
stationary on first differencing. Ghosh and others (2004) and Levin and Wright (2006) also found
variables with different orders and they also continued under the assumption that all variables are
I(1). According to Ghosh and others (2004) there is nothing wrong with following such a strategy for
four reasons. Unit root tests have low power and practitioners often rely on theoretical arguments to
construct models based on cointegration (Engle and Granger, 1987). The second reason is the most
obvious one. Since this study is following the inflation hedge approach from the literature review
(supra), the possibility of a long run relationship between the price of gold and the general price level
needs to be investigated. An important condition to test for this cointegration is that both variables
need to be non-stationary or I(1). The third is that models that mix variables in levels with variables in
first differences often fail residual tests. The final reason is that such models that mix, are difficult to
interpret in terms of the hypotheses.
Tests for cointegration
Engle-Granger two step approach
Assume the following equation with two time series and
We do not include
since it is more than 95% correlated with
So the long run relationship between the
gold price and the general price level in the US will be researched.
Since both variables are I(1), a cointegration analysis can be executed. This thesis will estimate this
model in levels using Ordinary Least Squares (OLS) and if is I(0), OLS is super consistent. To test
whether is I(0) and as such and
are cointegrated, an Augmented Dickey-Fuller
(ADF) test is performed on the residuals. The Augmented Dickey-Fuller test tests for a unit root in the
estimated residuals using the standard Dickey-Fuller (DF) specification:
As explained above, TABLE 8 shows the estimation of the model in levels using OLS. The following
step consists of an Augmented Dickey-Fuller test on the residuals, as previously explained. The
critical values produced by Eviews cannot be used, instead the appropriate critical values are
calculated via the formula of MacKinnon.
29
Where T is the sample size and the other variables can be found in the table of MacKinnon (1991)
(TABLE 10). The critical value at the 5% significance level equals -3.4. Since -0.599759 (TABLE 9) is
larger than the critical value, the null hypothesis of no cointegration cannot be rejected. According to
the Augmented Dickey-Fuller test, the price of gold and the price level in the US are not cointegrated.
Because there is no cointegration observed, the second step of the Engle-Granger two step
approach, estimate an Error Correction Model (ECM) to analyze the short run dynamics, is not
executed. Some important remarks about the Engle-Granger two step approach:
OLS is biased in finite samples.
OLS is not normally distributed, not even asymptotically.
The ADF test has low power.
The Engle-Granger approach is only capable of handling a single cointegrating vector.
The ECM model only allows for unidirectional causality.
Since the result is not in line with the expectations, a second cointegration test is executed.
Dependent variable
Coefficient of constant -2.9330 (0.00)
Coefficient of 1.7379 (0.00)
Adjusted R² 44% TABLE 8: REGRESSION RESULTS OF THE LONG RUN RELATIONSHIP (P-VALUE BETWEEN BRACKETS)
Dependent variable
Coefficient of constant 0.0009 (0.7527)
Coefficient of -0.0061(0.5493)
Adjusted R² 0.16%
ADF-test statistic -0.5998 TABLE 9: ADF TEST RESULTS ON THE RESIDUALS (P-VALUE BETWEEN BRACKETS)
TABLE 10: TABLE OF MACKINNON
30
Granger representation theorem
The Granger representation theorem is the second cointegration test. It states that if there exists an
error correction model for a set of I(1) variables, then this set of variables is cointegrated. Thus if
and
are I(1) and there exists an error correction model of the general form:
=
+ … +
+
then and
are cointegrated with cointegrating vector (1, ). We do not
include since it is more than 95% correlated with
The direct test for cointegration:
Since the t-test statistic of 1.8 (TABLE 11) is larger than -3.4, which is the critical value calculated via
the table and formula of MacKinnon (TABLE 10), the null hypothesis of no cointegration cannot be
rejected. It is even surprising to see that the coefficient of the correction term is positive where it
should be negative. This signals a model that is more explosive than one that returns to its long term
equilibrium as expected. This cointegration test shows again that there is no cointegration.
Dependent variable
Coefficient of constant -0.0868 (0.0913)
Coefficient of
-0.2593 (0.0004)
Coefficient of
0.0591 (0.1510)
Coefficient of 0.0903 (0.5260)
Coefficient of 1.8560 (0.0143)
Coefficient of -0.0870 (0.8876)
Coefficient of
0.0022 (0.6785)
Coefficient of 0.5613 (0.4745)
Coefficient of -0.7055 (0.0000)
Coefficient of -0.0115 (0.9899)
Coefficient of 0.4961 (0.4055)
Coefficient of
0.01246(0.0722)
Adjusted R² 12.36%
Prob(F-statistic) 0
Durbin-Watson stat 1.9222
T-test statistic of 1.8065 TABLE 11: GRANGER REPRESENTATION THEOREM REGRESSION RESULTS (P-VALUE BETWEEN BRACKETS)
Also for this method some important remarks need to be taken into consideration:
Under this test has a non-normal distribution, meaning that the MacKinnon critical values
need to be used although they are only a proxy.
This method is only capable of handling a single cointegrating vector.
It also assumes unidirectional causality.
31
Even though the Granger representation theorem still has some disadvantages, it is found to have
more power compared to the residual based tests. It avoids common factor restrictions and the
dynamics are not ignored when the cointegrating vector is estimated within the ECM.
The two preceding tests prove that there is no long run relationship between the price of gold and
the general price level in the US. Ghosh and others (2004) also found no conclusive proof of
cointegration and assumed for their research that the two variables were cointegrated. Other
authors like Levin and Wright (2006) did find proof for cointegration however their research was only
until August 2005, meaning that the boom in the gold price was not included. The assumed long run
relationship between the price of gold and the general price level in the US is no longer there. It
appears that since 2002 the gold price is increasing at a faster pace than the general price level in the
US, hence the positive coefficient of the error correction term. Since the gold price is apparently
more determined by short run dynamics, comparing linear regression with artificial intelligence
techniques that are good in capturing such dynamics, might be very interesting.
If there is cointegration between the two variables, the gold price and the price level, then gold is a
long run hedge against inflation. If there is no cointegration, this does not mean that gold is no long
run hedge against inflation. In GRAPH 1 it is clear that if an investor bought gold and kept it in his
portfolio until January 2012 he would almost always be hedged against inflation, except for two
periods. The first period is the boom around the beginning of the eighties while the second period is
the real short term, meaning the last six months. This finding supports the idea that gold is a long run
hedge against inflation while it is more difficult to hedge against inflation in the short run.
If an investor was offered the possibility from January 1971 (when the gold price was free to move)
onwards to invest on a monthly basis in gold for 1, 2, 4, 8 or 16 years (with a sliding window), the
picture looks different (TABLE 12). For example: an investor with a time horizon of 16 years could buy
gold from January 1971 until December 1995 and keep it in portfolio for 16 years or an investor with
a time horizon of 1 year could also buy gold from January 1971 until December 1995 and keep it in
portfolio for 1 year (the 1 year investor can only buy until December 1995 because the results for the
investors with different time horizons need to be comparable). It seems that the investor with the
short run horizon is in 48.33% of the cases protected against inflation, while for the investor with the
16 years horizon this is only 43%. Nevertheless in both cases it is less than 50%, meaning that in less
than 1 out of 2 times an investor was protected against inflation.
32
1y 2y 4y 8y 16y
Sum 145 133 125 101 129
Total 300 300 300 300 300
Hedge? 48.33% 44.33% 41.67% 33.67% 43.00% TABLE 12: GOLD AS A HEDGE (1971-1995)
In TABLE 13 all the months are included, even the ones where the time horizon can no longer be
respected. So an investor with a time horizon of 16 years who buys gold in January 2000 and keeps it
until December 2011 (which is only 12 years) will positively contribute to the hedging result. By
including these numbers the results are more in favour of a long run hedge even though investors
with a time horizon of maximum 1 year do not underperform investors with a team horizon of 8
years.
1y 2y 4y 8y 16y
Sum 276 271 278 276 317
Total 492 492 492 492 492
Hedge? 56.10% 55.08% 56.50% 56.10% 64.43% TABLE 13: GOLD AS A HEDGE (1971-2011)
A suggestion for future work is to apply a Johansen test which is a procedure for testing cointegration
of several variables. This test permits the existence of more than one cointegration relationship
which makes it more general than the two previous tests.
33
Autoregressive distributed lag model (ADL)
ADL models are used to explain one variable from its own past and current and/or lagged values of
other variables. As such the dynamic impact of variables on the gold price can be modelled. It is
however important that the variables are stationary. Since our variables are all I(1), the first
difference is taken to make the variables stationary. There are different reasons why it might be
interesting to include dynamics:
Psychological reasons: investors do not change their habits rapidly
Technological reasons: sometimes technology inhibits investors to change
Institutional reasons: investors might be bound to regulations
A simple example of an ADL model is presented:
Where and are assumed to be stationary and a white noise process. The presented model is
an ADL(1,1), because one lagged value of the variables is included. For this thesis the variables are
the first differences in order to be stationary, so equals , equals
…
In order to know the number of lags, different models are tested. It is however important to know
that all the variables will have the same number of lags, so there will only be ADL(0…0), ADL(1…1)…
models and no mix in the number of lags. The reason for this simplification is the high number of
variables which would complicate everything unnecessarily.
To know the number of lags, the Schwarz criterion for the different models is calculated. The most
negative value of Schwarz shows the ideal number of lags. Once the Schwarz value increases, it keeps
on increasing.
The first model ADL(0…0) gives a Schwarz criterion of -3.37.
The second model ADL(1…1) gives a Schwarz criterion of -3.27.
Since the Schwarz criterion of an ADL(0…0) model is more negative than the Schwarz criterion of an
ADL(1…1) model and the criterion will keep on increasing, the ideal model is one with no lags
included. The model that will be estimated is similar to the ADL(0…0) model except for one variable,
the delta of the lagged gold price that will be included (TABLE 14).
34
Dependent variable
Coefficient of constant 0.0055(0.0441)
Coefficient of
-0.2320 (0.0014)
Coefficient of
0.0590 (0.1539)
Coefficient of 0.0796 (0.5779)
Coefficient of 1.7943 (0.0183)
Coefficient of -0.0541 (0.9302)
Coefficient of
0.0026 (0.6253)
Coefficient of 0.5153 (0.5134)
Coefficient of -0.7146 (0.0000)
Coefficient of -0.0305 (0.9736)
Coefficient of 0.5027 (0.4018)
Adjusted R² 11.45%
Prob(F-statistic) 0
Durbin-Watson stat 1.9122 TABLE 14: ADL REGRESSION RESULTS WITHOUT DUMMY VARIABLES (P-VALUE BETWEEN BRACKETS)
To increase performance and to make sure that the model performs well on residual tests, two
dummy variables are included. The first one is September 1999 because then the Central Bank Gold
Agreement was signed, which caused an enormous increase in the gold price which cannot be
explained by any of the above variables. The second dummy variable is October 2008, the start of the
global financial crisis with the failure of Lehman.
The final model including the dummy variables is presented in TABLE 15.
Dependent variable
Coefficient of constant 0.0055 (0.0354)
Coefficient of
-0.1994 (0.0049)
Coefficient of
0.0492 (0.2145)
Coefficient of 0.1414 (0.3112)
Coefficient of 1.3150 (0.3112)
Coefficient of 0.2505 (0.6765)
Coefficient of
-0.0043 (0.4595)
Coefficient of 0.0211 (0.9778)
Coefficient of -0.5766 (0.0005)
Coefficient of -0.2105 (0.8119)
Coefficient of 0.3037 (0.6002)
Coefficient of Dummy 1999/9 0.1552 (0.0005)
Coefficient of Dummy 2008/10 -0.1408 (0.0016)
Adjusted R² 19.7%
Prob(F-statistic) 0
Durbin-Watson stat 1.8674 TABLE 15: ADL REGRESSION RESULTS WITH DUMMY VARIABLES (P-VALUE BETWEEN BRACKETS)
35
The OLS estimator for the parameters in the ADL model:
Is biased as the error term and the explanatory variables are not independent
Is consistent as long as the error term is uncorrelated with and
Necessary conditions for this to hold true are:
o The error term is white noise, that is not correlated with
o is exogenous such that are not correlated with the error term
So to conclude, the main limitations are that the explanatory variables need to be exogenous and all
variables need to be stationary. In this thesis, the explanatory variables are assumed to be exogenous
while all variables are stationary due to the first differencing.
Two residual tests were executed. The Jarque-Bera test for normality showed that the residuals are
normally distributed at the 5% significance level. The correlogram in levels of the residuals also
revealed that there is no autocorrelation in the residuals.
36
Results and interpretation
TABLE 16 gives an overview of the null hypotheses and the alternative hypotheses based on TABLE 6.
It also shows the observed signs corresponding to the variables, that can be found in TABLE 15. The
p-value for a one sided t-test can also be found in TABLE 16, it is just half of the p-value that can be
found in TABLE 15 where the included p-value is for a two sided t-test. The adjusted R² for the model
is 19.7%, meaning that 19.7% of the variation in the change of the gold price can be explained by
changes in the explanatory variables. The R² is comparable to the one found by Lampinen (2007), he
found an R² of 19.5% for the period September 1989 until August 2006 (however he included 19 time
dummy variables).
In the following section, the five variables that are significant at the 5% level and for which the null
hypothesis can be rejected, will be explained in more detail (TABLE 16). This does not mean that the
other variables are not relevant. Therefore it is preferable to keep these variables included even
though they are not significant. The variables in TABLE 16, that have a different sign than expected,
are the change in the US inflation volatility and the change in the beta of gold. However none of
these variables are significant.
The significant negative coefficient of
causes a kind of mean reversion. It is not because
there is no longer a long run relationship between the price of gold and the general price level in the
US that the gold price can now explode. If the gold price increased in a previous period, it is likely to
decrease in the subsequent period. If there has been an increase of 1% from t-2 to t-1 in the gold
price, it will be likely that there will be a decrease in the gold price with almost 0.2% in the following
period (keeping all other variables constant).
Coefficient of Observed sign of coefficient
p-value
= 0 <0 - 0.0025
= 0 <0 + 0.1073
= 0 >0 + 0.1556
= 0 >0 + 0.0356
= 0 >0 + 0.3383
= 0 <0 - 0.2298
= 0 >0 + 0.4889
= 0 <0 - 0.0003
= 0 >0 - 0.4060
= 0 >0 + 0.3001
Dummy1999Sept = 0 >0 + 0.0003
Dummy2008Okt = 0 <0 - 0.0008 TABLE 16: REGRESSION RESULTS ADL MODEL
37
If the inflation in the US rises with 100 basis points, then the gold price will rise with 1.3% (keeping all
other variables constant). This means that although some people who claim that gold is not a hedge
against inflation in the short run, might be wrong. In TABLE 12 and TABLE 13 it was also clear that in
general, gold is not such a bad hedge against inflation in the short run. However protecting your
wealth in the short run against inflation by going long in gold is difficult because it is important to
pick the right moment. If the inflation in the US rises with 1%, then the gold price will rise with only
0.08% (keeping all other variables constant).
If the dollar index shows a stronger dollar by increasing 1%, the price of gold will decrease with 0.5%
(keeping all other variables constant). If we compare the three previous variables, we see that the
exchange rate has the biggest impact, an increase by 1% causes a decrease of 0.5%.
Both dummy variables show the impact as was expected. The CBGA in September 1999 caused the
gold price to increase with more than 16%. The start of the financial crisis in October 2008 caused a
decrease in the gold price with more than 13%. This negative sign may not seem logical in the first
place, but after the failure of Lehman a lot of hedge funds needed to deleverage or liquidate in order
to meet margin calls on positions. One of the ways to do so, was selling their gold which caused
prices to fall, even though the world was at the beginning of a new crisis.
Error measures
To compare the different techniques we use as an error measure the NMSE.
With being the desired value and the outputted value. Two other error measures that are also
calculated, are the MAE and the hitrate. However these two error measures are only given for the
linear regression part. Because the NMSE is the most often used error measure in time series
modelling, I will only compare the techniques via the NMSE.
With being the desired value and the outputted value. The hitrate is a percentage describing how
often to model could correctly predict an increase or decrease in the gold price.
Predicting
Once the gold price model was established for the period November 1990 until December 2009, it
was used to predict the gold price from January 2010 until December 2011 with the actual values of
the input variables from that period (GRAPH 17).
38
GRAPH 17: PREDICTED VS ACTUAL GOLD PRICE (ADL-MODEL)
As can be seen on GRAPH 17, the predicted gold price is close to the actual gold price. The error
measures corresponding to GRAPH 17 can be found in TABLE 17 in the column with the title “Gold
Price”. The error measures in the other columns will be used to compare these results with the
results of the other methods.
ln gold price Gold price
Normalised ln gold price
NMSE test 0.1287 0.1500 0.1287
HITRATE test 54.17% 54.17% 54.17%
MAE test 0.0448 64.5922 0.0887 TABLE 17: TEST RESULTS FOR THE ADL MODEL (2010-2011)
When comparing the results from estimating (TABLE 18) with these from testing (TABLE 17), it is as
expected that the results from estimating are better than the results from testing. The only exception
is the hitrate where the result from the test period is higher than the one from estimating. It is also
interesting to see that normalizing data does not affect the NMSE measure, the NMSE measure in the
first column equals the NMSE measure in the third column. Surprising to see is the fact that the
hitrate during estimating is less than 50%, this means that the model was unable to correctly specify
(less than 50%) in which direction the gold price would go in the next month.
ln gold price Gold price Normalised ln gold price
NMSE estimation 0.0101 0.0136 0.0101
HITRATE estimation 47.78% 47.78% 47.78%
MAE estimation 0.0322 16.6499 0.0636 TABLE 18: ESTIMATION RESULTS FOR THE ADL MODEL (1990-2009)
The NMSE validation based on 4 folds (parts) during the period 51-180 equals to 0.0119 (for the
normalised ln gold price). An ADL-model has a low validation error if during the period where it was
validated, the gold price moved in a more linear way. However if the gold price moves more
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Predicted Gold Price
Actual Gold Price
39
exponential or non-linear, the ADL-model faces more difficulties in getting good results. So an ADL-
model will give poor results if it is estimated on a linear part and tested on a non-linear part.
Robustness check
To check the robustness of the model, the quadratic and cubic forms of the variables were included
in the ADL-model. This resulted in a lower estimation error, namely a NMSE of 0.007, compared to
the situation where they were not included. However the test error increased, going from 0.1287 to
0.18. This can be caused by the fact that there is an in-sample overfit while the prediction out-sample
is worse.
When estimating an ADL-model on the period 1998-2009, the volatility in the world inflation became
also a significant variable, as expected and explained in section 5.1. It shows that the recent crisis is a
reason for the gold price increase.
40
7.Recurrent neural networks Some problems are difficult to solve with classical linear regression techniques. In such cases, it
might be interesting to use more advanced, non-linear machine learning techniques. One subpart of
these non-linear machine learning techniques makes use of neural networks. These are abstract
mathematical models of how the human brain functions. It consists of a number of connected, non-
linear neurons that receive input and pass output to other neurons. In the neurons, the sum of the
incoming values are processed using a non-linear function, in this thesis this will be a hyperbolic
tangent.
In general, neural networks have two categories: feed-forward neural networks and recurrent neural
networks (Jaeger, 2005). Feed-forward neural networks do not have feedback between the neurons
and the layers: there is a direct mapping from the input to the output. In contrast, recurrent neural
networks have feedback, which brings in the aspect of memory.
There are different ways to train a recurrent neural network. Two of these techniques will be
explained in more detail: reservoir computing and backpropagation through time. This chapter is
based on the tutorial of Herbert Jaeger (2005): “A tutorial on training recurrent neural networks
covering BPTT, RTRL, EKF and the echo state network approach”.
7.1.Reservoir computing Reservoir computing is a recent technique to train recurrent neural networks. It uses less
computational power than backpropagation through time. Reservoir computing is a collection of
similar techniques: echo state neural networks, liquid state machines… In this thesis the echo state
neural network will be used.
Technique
This thesis will first explain an easy setup after which certain features that add complexity but also
yield better results will be added. It is important to know that for the recurrent neural networks, all
data was normalized. Normalizing data makes the influence of every variable equal, it also decreases
the likelihood that the neurons will be saturated due to high input values. A second important factor
in reservoir computing is the fact that all reservoirs are initiated randomly and only the output-
weights are trained (FIGURE 5). The fact that only the output-weights are trained can be seen in
FIGURE 5 by the dotted lines. In contrast to the output-weights, the other weights remain fixed and
untrained.
41
Input Reservoir Output
FIGURE 5: EXAMPLE OF A RESERVOIR WITH IN- AND OUTPUT
Before the model is explained, some notations are given:
Input at time k
Reservoir states at time k
Actual output at time k
Predicted output at time k
: Weight matrix from input to reservoirrandom and fixed
Weight matrix between neurons in the reservoirrandom and fixed
Weight matrix from bias to reservoirrandom and fixed
Weight matrix from input to outputtrained
Weight matrix from reservoir to outputtrained
Weight matrix from bias to outputtrained
The matrices , and are first generated randomly. As soon as these matrices are generated
the reservoir states can be determined.
Two remarks can be made for this formula. First of all, the states of the neurons at time k+1 will be
determined by the input at time k+1. Secondly, the function that will be used in this thesis will be the
hyperbolic tangent. After the states are determined, the output-weight matrices are calculated such
that the quadratic error is minimized.
42
Where A is a matrix where each column contains the input, neuron states and bias for the different
observations. As soon as the output weights are determined, the output can be forecasted.
Meta-parameters/features that can be added
Regularization meta-parameter
In order to avoid that the output weights become too big, a regularization meta-parameter can be
added. This is equivalent to adding noise to the reservoir states, it makes a more general model
instead of a model that overfits on the training data. When using a regularization meta-parameter,
the output weights need to be calculated differently.
In order to determine the ideal regularization meta-parameter, cross-validation has to be executed.
When doing cross-validation the training data is divided into N different parts. Then for each
regularization meta-parameter we train on N-1 parts to test on the part that was not included for
training. For every regularization meta-parameter the error measure on the test part is stored. For
every regularization meta-parameter this process is repeated N times, after which an average error is
calculated per regularization meta-parameter. The regularization meta-parameter yielding the
minimum average error measure is the optimal one (in this thesis the error measure is the NMSE).
Input scaling, spectral radius and leak rate
The input scaling determines the degree of non-linearity. When the input scaling is low, the
reservoir is almost linear, while when the input scaling is high, the neurons are saturated and as such
switch between two extreme values which is non-linear.
After initiating , it is rescaled by dividing it by its largest, absolute eigenvalue. After this it will be
multiplied by the spectral radius. The spectral radius is the largest absolute eigenvalue of the matrix
. In general the spectral radius is smaller than or close to one, meaning that the neuron system is
on the edge of stability.
The dynamics of the reservoir can be slowed down by adding a leak rate . The closer the leak rate is
to zero, the slower the system will adapt. If the leak rate equals one, then there is no change
compared to the normal situation.
43
In order to determine the ideal combination of input scaling, spectral radius and leak rate, cross-
validation has to be executed. However if the ideal combination of these three meta-parameters has
to be found, it will be a double cross-validation that needs to be executed, since for every
combination the ideal regularization meta-parameter per part also has to be found.
Warm-up drop
To avoid that transient effects of the states impact the training of the neural network, a warm-up
drop is performed. During the warm-up drop, the first states of the neurons are omitted. To
determine the number of omitted states, the graph describing the activity of the neurons has to be
observed. The optimal number of omitted states is the number where the transient effects
disappear. It is important to know that, the more states are dropped, the less data is available to
train. This can be a crucial factor when only a limited amount of data is available.
Number of neurons
The number of neurons can have an impact on the results. If there is no regularization, an optimal
number of neurons can be found: there is a minimum error for a certain number of neurons because
increasing the neurons further, will increase the error (overfitting). However if there is regularization,
more neurons will yield better results. Nevertheless the advantage of adding more neurons will
decrease. A disadvantage of a large number of neurons is the increasing computational power that is
needed.
44
Results
The reservoir uses 12 input variables to determine the :
,
,
,
,
and . The data ranges from
11/1990 until 12/2011 and these 254 months will be divided into three subparts:
1-50: warm-up drop
51-230: training with double cross-validation
231-254: testing
For the double cross-validation there will be 30 parts (folds) of each six months. The double cross-
validation will be repeated 10 times per number of neurons in order to avoid that accidental results
would make the system spurious. The number of neurons that will be used, will vary from 50 neurons
up to 200 neurons with steps of 50 neurons.
When after the double cross-validation the ideal combination of input scaling, leak rate and spectral
radius is determined, another cross-validation is performed on the entire training set to get the
optimal regularization meta-parameter for testing. During the double cross-validation the
regularization meta-parameter was only determined on 29 sets because there still needed to be one
set available for testing. In contrast, in this case the regularization meta-parameter is determined on
30 sets while the test will be performed on the period ranging from month 231 until 254. Testing will
be performed at least 20 times so that there is an average test and training error plus standard
deviation on the test and training error. Other important remarks have to be made before looking at
the results:
There is no recursive prediction, meaning that the actual gold price from the previous period
will be an input variable instead of the forecasted gold price from the previous period.
The input variables for modelling the gold price at time t, are also observed at time t. In
order to use this model to forecast gold prices, investors will need predictions of these input
variables. In this thesis during testing, the actual observations of the input variables are used
as proxies for the predicted input variables.
As can be seen in TABLE 19 the model is well validated and as expected the validation error
decreases with an increasing number of neurons.
Neurons Average NMSE validation Std NMSE validation
50 0.0133 0.0011
100 0.0131 0.0008
150 0.0128 0.0009
200 0.0125 0.0010 TABLE 19: VALIDATION ERROR RESERVOIR COMPUTING
45
The optimal combinations of meta-parameters (input scaling, leak rate and spectral radius) for the
different number of neurons are presented in TABLE 20.
Leak Rate Input scaling Spectral radius
50 0.0631 0.0156 1.25
100 0.0631 0.0221 1.1
150 0.0316 0.0221 1
200 0.0316 0.0156 1 TABLE 20: OPTIMAL COMBINATION OF META-PARAMETERS FOR RC
It is clear in the gold price model that the leak rate is close to zero meaning that the model will slowly
adapt. The spectral radius is close to one which is common in reservoir computing. The input scaling
for the gold price model is low. Since the input scaling is a degree for non-linearity, a low input
scaling shows that the reservoir is more linear and as such the neurons less saturated.
The average training error is a bit smaller than the average validation error (TABLE 21).
Neurons Average NNSE train Std NMSE train
50 0.0077 0.0014
100 0.0057 0.0017
150 0.0052 0.0015
200 0.0053 0.0015 TABLE 21: TRAINING ERROR FOR RC
When looking at the average test error of 20 tests and its standard deviation (TABLE 22), it appears
that there is too much variation in testing. However the 20 tests show that these results fall apart
into two groups, one group with on average a NMSE of 0.1 and another group where the NMSE is
above two. Before looking into ways to solve this problem, the model that yielded the best results is
presented in TABLE 23 and GRAPH 18.
Neurons Average NMSE test Std NMSE test
50 1.8946 6.9254
100 3.7645 5.5540
150 7.1580 11.4967
200 4.1852 5.9896 TABLE 22: TEST ERROR FOR RC
Neurons NMSE test HIT test MAE test Regularization parameter
50 0.0997 65.2174% 0.0715 0.0032
100 0.2507 65.2174% 0.1263 0.0100
150 0.0988 60.8696% 0.0683 0.0032
200 0.1029 60.8696% 0.0778 0.0032 TABLE 23: TEST ERROR OF BEST MODEL FOR RC
46
GRAPH 18: PREDICTED VS ACTUAL GOLD PRICE FOR 200 NEURONS (RC)
A possible reason for the variation in the test results is the fact that during cross-validation the model
could already recognize the relations between variables that were coming in the future, but were
unseen in the past. If for example the sixth set of thirty was used for validation, the model could
learn from relationships behind number six but that were unseen until set five. This explains why the
model has a low validation error and a non-stable test error.
There are two different ways to get better and more stable test results. The first suggested
improvement is to leave out a certain amount of data after each validation set in order that the
model cannot learn from future patters that are closely located in time to the validation set. A
second way to improve the model is by decreasing the number of folds. If the number of folds are
decreased, the validation set will become larger. This makes it more difficult for the model to learn
from relationships in the short run that are located right behind a validation set. If this statement
holds true, the validation error should increase. To test this hypothesis in the thesis, the number of
folds was decreased from 30 to 4. The results are presented in TABLE 24. As can be seen from TABLE
24, the average validation error is four times larger than the average validation error when 30 folds
were used. This might be an indication that the hypothesis was correct. The test results with four
folds in TABLE 25 show a smaller average NMSE error with also a smaller standard deviation in
comparison when 30 folds were used.
Neurons Average NMSE validation
50 0.0506
100 0.0517
150 0.0460
200 0.0475 TABLE 24: VALIDATION ERROR FOR RC (4 FOLDS)
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Predicted gold price
Actual gold price
47
Cross-testing is a technique comparable to cross-validation that can be used to test if the suggested
reason for the variation in the test results, is true. Cross-testing is often suggested in theory but
seldom applied in practice. However for financial time series analysis, cross-testing might have its
use. This technique can help to determine if models learn too much from information located just
behind the validation set.
Neurons Average NMSE test std NMSE test
50 0.3067 0.2750
100 0.3080 0.2727
150 0.4279 0.3784
200 0.4409 0.3727 TABLE 25: TEST ERROR FOR RC (4 FOLDS)
A way to improve the model from reservoir computing is to not only train the output weights but
also training the input and recurrent weights. One of the ways to do this, is called backpropagation
through time. Another advantage of applying backpropagation through time is the fact that it is
possible to determine the most important variables of a model. This feature can help in answering
the second research question, since reservoir computing is not capable of determining the most
important variables in a model.
It would be interesting to research if reservoir computing can add value to classical linear regression
by trying to model the residuals coming from linear regression, using the same input variables that
were used in developing the linear model. If the reservoir computing model can model the residuals,
then the added value of reservoir computing becomes clear, since it is able to help explain things that
were not explained by linear regression.
48
7.2.Backpropagation through time
Backpropagation through time is an often used classical gradient method to train recurrent neural
networks. In comparison to reservoir computing it requires more computational power. However in
general the results it yields are more accurate and stable.
Technique
Before the technique is explained in more detail, some important notations are given (FIGURE 6):
: Weight matrix from neural network and input to outputzero initialised and trained
: Weight matrix from input to neural networkrandomly initialised and trained
: Weight matrix between neurons in the neural networkrandomly initialized and trained (optional)
Input V Recurrent neural network U Output FIGURE 6: EXAMPLE OF A NETWORK WITH IN- AND OUTPUT (BPTT)
Initially a maximum number of iterations is determined. For every iteration until the maximum
number of iterations is reached, the network is trained and the states of the neurons are
determined. Each iteration results in an output and as such in an error which is the deviation from
the desired output. Per iteration, a gradient is calculated of the mean squared error with respect to
the different weight matrices. After calculating the gradient, the different weight matrices are
updated by subtracting the multiplication of the gradient with a learning rate .
In this thesis, backpropagation through time is applied in combination with cross-validation. The
training data is, as explained in the reservoir computing part, divided in different parts, where one
will serve as a validation set. For every validation fold, , and are reinitialised. Since there is a
gold price prediction for every iteration, there is also a validation error available. The optimal
49
number of iterations is the number where the validation error is minimal. This is because there can
be overfitting, which results in an increasing error when increasing the number of iterations.
The optimal combination of input scaling, learning rate, spectral radius and leak rate will be
determined by choosing the combination where the validation error for the optimal number of
iterations is minimal. The described process with the optimal meta-parameter combination is then
repeated on the entire training set, so there is no distinction between training and validation, until
the optimal number of iterations is reached. For the optimal number of iterations the gold price will
be predicted on the test set.
Results
The neural network uses 12 input variables to determine the :
,
,
,
,
and . The data ranges
from 11/1990 until 12/2011 and these 254 months will be divided into three subparts:
1-50: warm-up drop
51-230: training with double cross-validation
231-254: testing
For the cross-validation there will be four parts of each 45 months. The number of neurons that will
be used in this thesis is 100. The ideal combination of input scaling, leak rate, learning rate and
spectral radius is found where the validation error for the optimal number of iterations is minimal.
Testing will be performed at least five times so that there is an average test and training error plus
standard deviation on the test and training error. The two remarks that were made for reservoir
computing, are also important for backpropagation through time.
Backpropagation through time will be executed in three different ways. The first model will take the
optimal input scaling, spectral radius and leak rate from the best model in reservoir computing. This
model will only optimize and . The recurrent weights are not yet trained since they are also not
trained in reservoir computing. The second model will perform cross-validation for different meta-
parameters to get the optimal combination of input scaling, spectral radius, learning rate and leak
rate. This model will also only optimize and . The third model will perform cross-validation to get
the optimal combination of input scaling, spectral radius, learning rate and leak rate. However this
model will not only optimize and but also .
50
Optimal meta-parameters coming from RC without training (BPTT-model 1)
The optimal meta-parameters that were used in this model, were the optimal meta-parameters
coming from reservoir computing with 200 neurons and 30 folds:
Learning rate = 0.0005
Spectral radius = 1
Leak rate = 10^-1.5 = 0.03162
Input scaling = 2^-6 = 0.01563
The average performance of the model with the input meta-parameters coming from reservoir
computing, is presented in TABLE 26. The standard deviation on the different NMSE’s is presented in
TABLE 27. TABLE 26 and TABLE 27 show that the average result is in line with what was found in
reservoir computing when four folds were used. However in this specific case the best result of
backpropagation through time with four folds (TABLE 28) does not reach the NMSE test measure of
0.1 as was reached in the best case with reservoir computing on 30 folds (TABLE 23).
Number of neurons 100
Average NMSE test 0.5816
Average NMSE train 0.0066
Average NMSE validation 0.0468 TABLE 26: ERROR MEASURES FOR BPTT-MODEL 1
Number of neurons 100
Std NMSE test 0.1141
Std NMSE train 0.0004
Std NMSE validation 0.0083 TABLE 27: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 1
Number of neurons 100
NMSE test 0.4559
NMSE train 0.0061 TABLE 28: ERROR OF THE BEST BPTT-MODEL 1
In GRAPH 19 the actual en predicted gold price are presented. The predicted gold price comes from
the best model (TABLE 28).
51
GRAPH 19: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 1)
As described earlier in this thesis, a major advantage of backpropagation through time is its ability to
demonstrate the importance of different variables. Backpropagation through time is one of the
techniques a researcher can use to indicate if an explanatory variable has a more linear relationship
with the dependent variable or a more non-linear relationship.
To look how important the presence of the neural network, previous gold price and other input
variables is, the standard deviation on the part of the gold price that was predicted by one of these
three categories, is calculated. TABLE 29 shows that the impact of the neural network on the
predicted gold price is even higher than the impact of the previous gold price on the predicted gold
price. This proofs that there is a substantial amount of gold price prediction that comes from the
neural network which contains non-linear neurons.
Std in predicted gold price
Impact of neural network 0.4419
Impact of previous gold price 0.3702
Impact of other input variables 0.3077 TABLE 29: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 1
It is also interesting to see which variables impact the predicted gold price via which way. There are
two options for a variable to have an impact on the predicted gold price. The first way is via the
neural network, . The impact that is caused by the neural network is substantial as showed by
TABLE 29. Variables can however also have a direct impact on the predicted gold price without
passing through the neural network (a part of , which is a linear relationship since there are no
neurons involved. In GRAPH 20 the importance of the 12 independent variables via the direct way is
presented. We assume that the importance of a variable is approximated by the norm of the vector
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Actual gold price
Predicted gold price
52
representing the variable in the model. In TABLE 30 the variables belonging to the different numbers
are shown.
1
2
3
4
5
6
7
8
9
10
11
12
TABLE 30: INDEPENDENT VARIABLES
GRAPH 20: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 1)
The three variables having an important direct impact are the gold price in the previous period, the
dollar-world exchange rate and the CPI level of the world (GRAPH 20).
The four variables having an important impact via the neural network are the beta of gold, the
inflation in the US, the dollar-world exchange rate and the gold lease rate (GRAPH 21).
GRAPH 21: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 1)
0,001
0,01
0,1
1
10
100
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via U)
0
0,5
1
1,5
2
2,5
3
3,5
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via V)
53
Optimizing the different meta-parameters via cross-validation without training (BPTT-model 2)
Cross-validation for different meta-parameters, resulted in an optimal combination of meta-
parameters that yielded the lowest validation error. The optimal combination of meta-parameters:
Learning rate = 0.0005
Spectral radius = 1.2
Leak rate = 1/125 = 0.008
Input scaling = 0.2
The average performance during training, validation and testing is presented in TABLE 31. It is
interesting to see that the average NMSE during the test period is as low as the best model of
reservoir computing. Another positive aspect of this model is its low standard deviation for the
different NMSE’s (TABLE 32). This model is good at predicting the gold price but on top is capable of
doing this in a consistent way (low standard deviation).
Number of neurons 100
Average NMSE test 0.0927
Average NMSE train 0.0071
Average NMSE validation 0.0533 TABLE 31: ERROR MEASURES FOR BPTT-MODEL 2
Number of neurons 100
Std NMSE test 0.0013
Std NMSE train 0.0001
Std NMSE validation 0.0014 TABLE 32: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 2
The outcome of the best model is presented in TABLE 33 and GRAPH 22.
Number of neurons 100
NMSE test 0.0908
NMSE train 0.0070 TABLE 33: ERROR OF THE BEST BPTT-MODEL 2
54
GRAPH 22: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 2)
The average optimal number of iterations where the validation error is minimal equals 12027 with a
standard deviation of 1014.4. As can be seen in TABLE 34, the impact of the previous gold price
increased in comparison with the previous model. The standard deviation went up from 0.3702 to
0.4356. At the same time the impact of the eleven other input variables decreased to 0.2287.
Std in predicted gold price
Impact of neural network 0.4304
Impact of previous gold price 0.4356
Impact of other input variables 0.2287 TABLE 34: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 2
The variables with the highest importance to impact the gold price in the direct way are the same as
in the previous model. So the three variables having an important direct impact are the gold price in
the previous period, the dollar-world exchange rate and the CPI level of the world (GRAPH 23). TABLE
30 gives an overview of the variables corresponding with the different numbers.
GRAPH 23: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 2)
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Actual gold price
Predicted gold price
0,0001
0,001
0,01
0,1
1
10
100
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via U)
55
The four most important variables to influence the gold price via the neural network, meaning the
non-linear system, are somewhat different compared to the previous model. The first variable that
has an important impact, is the gold price in the previous period. Apparently this variable does not
only have a significant direct impact but also an important indirect one. The second variable is the
price level in the US. The third and fourth variable are the gold lease rate, which is a proxy for the
real interest rate and the volatility of the world inflation (GRAPH 24).
GRAPH 24: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 2)
0
5
10
15
20
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via V)
56
Optimizing the different meta-parameters via cross-validation in combination with training
(BPTT-model 3)
Cross-validation for different meta-parameters, resulted in an optimal combination of meta-
parameters that yielded the lowest validation error. The optimal combination of meta-parameters:
Learning rate = 0.0005
Spectral radius = 1.2
Leak rate = 1/125 = 0.008
Input scaling = 0.04
The average performance during training, validation and testing is presented in TABLE 35. Since the
weights inside the neural network are also trained, it is expected that this model should have to
return a lower validation error than the previous one. In TABLE 35 the average validation error equals
0.0356 which is lower than the average validation error of TABLE 31, 0.0533.
Number of neurons 100
Average NMSE test 0.1059
Average NMSE train 0.0072
Average NMSE validation 0.0356 TABLE 35: ERROR MEASURES FOR BPTT-MODEL 3
The results in TABLE 35 also have a low standard deviation which proves that this model is stable,
just like the previous one (TABLE 36).
Number of neurons 100
Std NMSE test 0.0071
Std NMSE train 0.0000
Std NMSE validation 0.0007 TABLE 36: STANDARD DEVIATION ON ERROR MEASURES FOR BPTT-MODEL 3
The NMSE on the train and test set of the best model can be seen in TABLE 37. The corresponding
predicted gold price for this model is presented in GRAPH 25.
Number of neurons 100
NMSE test 0.0947
NMSE train 0.0071 TABLE 37: ERROR OF THE BEST BPTT-MODEL 3
57
GRAPH 25: PREDICTED VS ACTUAL GOLD PRICE FOR 100 NEURONS (BPTT-MODEL 3)
The average optimal number of iterations where the validation error is minimal equals 19111 with a
standard deviation of 811.1. Because in this model, also the weights in the neural network need to
be trained, it takes more iterations to reach to optimal point in comparison with the previous two
models. TABLE 38 shows that in this model the impact of the eleven other input variables becomes
larger compared to the previous model. There is an increase of almost 15%. The impact of the neural
network remains almost stable even though the recurrent weights are trained. There is only a minor
increase of only 2.7% compared to the situation where is not trained. The impact of the previous
gold price decreases, although not substantially.
Std in predicted gold price
Impact of neural network 0.4423
Impact of previous gold price 0.4161
impact of other input variables 0.26 TABLE 38: IMPACT OF THE DIFFERENT CATEGORIES ON THE PREDICTED GOLD PRICE FOR BPTT-MODEL 3
The variables with the highest importance to impact the gold price in a direct way, are the same as in
the previous model. So the three variables having an important direct impact are the gold price in the
previous period, the dollar-world exchange rate and the CPI level of the world (GRAPH 26). An
overview of the corresponding variables with the different numbers can be found in TABLE 30.
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Actual gold price
Predicted gold price
58
GRAPH 26: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE DIRECT WAY U (BPTT-MODEL 3)
Three of the four variables that have an important indirect impact also had an import impact in the
previous model even though then the recurrent weights were not yet trained. The previous model is
already capable of determining the more important variables. The three variables are the price level
in the US, the gold lease rate and the volatility of the world inflation. The fourth variable to have an
important impact, is the inflation in the US (GRAPH 27).
GRAPH 27: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES VIA THE INDIRECT WAY V (BPTT-MODEL 3)
0,0001
0,001
0,01
0,1
1
10
100
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via U)
0
2
4
6
8
10
12
14
1 2 3 4 5 6 7 8 9 10 11 12
Importance (via V)
59
8.Gaussian processes for machine learning
8.1.Technique Gaussian processes is another machine learning technique to model the gold price. Via Gaussian
processes, certain non-linear relationships between variables can be modelled, where linear
regression would face difficulties in modelling. This chapter will first explain the concept of Gaussian
processes in a qualitative way, after which it will be presented in a quantitative, generalising manner.
This chapter is based on the book “Gaussian Processes for machine learning” by Carl Edward
Rasmussen and Christopher K.I. Williams.
To explain the concept of Gaussian processes, I will only model the relationship between two
variables, y and x. For each data point x, we generate a Gaussian, centred on x. The Gaussian process
then approximates y by taking a linear combination of said Gaussians. The objective of Gaussian
processes is to minimize the mean squared error between every outputted value and the desired
output. It is in fact similar to more classical linear regression. The Gaussian process can then be used
to predict output values for new values of x. In GRAPH 28 the grey line shows how to get the
predicted value and compares it with the actual value of y.
GRAPH 28: GAUSSIAN PROCESSES (DOTTED LINE: INDIVIDUAL GAUSSIANS; FULL LINE: LINEAR COMBINATION OF GAUSSIANS)
The process when having multiple independent variables is similar to the one described above. The
only thing that changes, is the number of dimensions.
60
If there are N data points available for training, the relation between y and x: y = can be
determined by using the formula’s below. Assuming that the number of independent variables is ,
the formula’s are:
With being point i in the training set and being a new point. The different values of can be
found by minimizing the MSE. A Gaussian process is specified by a mean function and a
covariance function . The covariance function that will be used in this thesis, is the
multivariate Gaussian.
With being the component j of . To optimize the hyper-parameter of the covariance function,
automatic relevance determination will be used. The objective of using this technique, is to get a
clear view on the importance of a dimension (Rasmussen & Williams, 2006).
8.2.Results The model uses 12 input variables to determine the
:
,
,
,
,
and . The data ranges from
11/1990 until 12/2011 and these 254 months will be divided into two subparts:
1-230: training with cross-validation
231-254: testing
For the cross-validation there will be 5 parts of each 46 months. The optimal initial starting value of
the hyper-parameter for the covariance function will be determined via cross-validation. Cross-
validation is executed to avoid local optima.
The training results for the Gaussian processes (TABLE 39) are better than the training results coming
from linear regression, TABLE 18. This is as expected, since Gaussian processes include the training
data in their solution.
NMSE test 0.6156
NMSE train 0.0081
NMSE validation 0.0263 TABLE 39: RESULTS FOR GAUSSIAN PROCESSES
61
The test NMSE is however somewhat disappointing since it is underperforming most of the other
gold pricing models. This is due to the fact that the dynamics that can be observed during the last
two years were not present in the training set.
In GRAPH 29 one can compare the actual gold price against the predicted gold price for the test
period. The model underestimated the gold price during testing. This points to the fact that the
recent surge in the gold price of the last two years is extraordinary and unexpected.
GRAPH 29: PREDICTED VS ACTUAL GOLD PRICE (GP)
The importance of the different variables is depicted in GRAPH 30. The higher a bar is, the higher the
importance of the variable. The importance of a variable is equal to the maximum average of the
logarithm of the length scale that was found minus the average logarithm of the length scale for a
variable. An overview of the different variables can be found in TABLE 30. According to this model,
the most important variables are the gold price in the previous period, the beta of gold, the US-world
exchange rate, the gold lease rate and the price level in the world. It is interesting to see that the all
the models from this thesis have a similar idea about the importance of the different variables.
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
mrt
/10
mei
/10
jul/
10
sep
/10
no
v/1
0
jan
/11
mrt
/11
mei
/11
jul/
11
sep
/11
no
v/1
1
Actual gold price
Predicted gold price
62
GRAPH 30: IMPORTANCE OF DIFFERENT INDEPENDENT VARIABLES (GP)
0
1
2
3
4
5
6
7
8
9
10
1 2 3 4 5 6 7 8 9 10 11 12
Importance
63
9.Discussion
This part of the thesis will respond to the two research questions by relying on the findings from the
previous chapters.
1. Compare the modelling and forecasting power of classic regression techniques against more
advanced non-linear machine learning techniques.
To compare the modelling and forecasting power of different techniques, one should, in theory look
at the average test errors since this is the most objective way. However since no cross-testing was
performed, a better option is to compare the average validation error even though there might be
some overfitting on the meta-parameters. The validation error is validated on the entire training set,
as such there is less chance involved when having a low error on the test period. If only the results of
the test period would be taken into account, it would be possible to achieve an excellent test result,
being just a lucky shot. In this thesis, also the result on the test period is taken into account to check
for the robustness of a certain model, in order to analyse whether the technique also performs well
on unknown data. In TABLE 40 an overview of the different error measures including their standard
deviation is presented. A naive model is included, which relies only on the gold price from the
previous time step to predict for the current time step.
Naive
ADL-4 folds
RC-30 folds
RC-4 folds
BPTT-model1
BPTT-model2
BPTT-model3
Gaussian processes
Average NMSE test 0.1479 0.1287 4.1852 0.4409 0.5816 0.0927 0.1059 0.6156
Best NMSE test NA NA 0.1029 0.3727 0.4559 0.0908 0.0947 NA
Std NMSE test NA NA 5.9896 0.3727 0.1141 0.0013 0.0071 NA
Average NMSE validation 0.0120 0.0119 0.0125 0.0475 0.0468 0.0533 0.0356 0.0263
Std NMSE validation NA NA 0.001 0.0045 0.0083 0.0014 0.0007 NA TABLE 40: DIFFERENT ERROR MEASURES FOR THE DIFFERENT MODELS (BEST CELL PER LINE IN GREY)
As can be seen in TABLE 40, the ADL-model, which represents the classic regression technique,
performs adequately, having a good test error compared to the other techniques and the best
validation error. The ADL-model performs well on the validation error measure if, during the period
where it is validated, the gold price moves in the same way as when estimated. However if the gold
price moves more exponential or non-linear during the testing/validating period but was estimated
on a linear part, the ADL-model faces more difficulties in achieving good results. The ADL-model is
based upon the underlying assumption of a linear system, this is not the case when using recurrent
neural networks, explaining the good result of the RC and BPTT-models on the test period. Since
during the test period (2010-2011) the gold price exploded, RC and BPTT perform well and face less
difficulties in such circumstances. There are three important advantages that make the ADL-model so
popular. The first reason is that this technique is easy to understand and to implement. A second
64
advantage is the low computational power it demands. For a researcher it is possible to obtain a
result within a few minutes without requiring serious demands regarding computational power. The
third and most important reason, is the easy interpretability of the model. If for example the dollar
appreciates by 1%, the gold price decreases with 0.5%. This kind of conclusions are difficult to make
using the other techniques. Nevertheless, next to these advantages there are some disadvantages. A
first one is that multiple conditions need to be satisfied before running the regression. Another
disadvantage is the inability of the technique to deal with non-linear effects. To overcome these
disadvantages, this thesis uses more advanced, non-linear machine learning techniques to model the
gold price.
The first applied machine learning technique is reservoir computing. A validation error of 0.0125 is
obtained when using 30 folds. The test results are not robust, given the high standard deviation. A
possible reason for this is that during cross-validation it is easy to recognize relations situated around
the validation set and achieve a good validation error. Though, on the test set, such information is
missing, since there are no data available after the test period, which makes it harder to obtain a
good result for the test period. This thesis suggests two ways to solve the problem. The first and best
way to do so, is by omitting a certain number of data points situated right behind the validation set,
making it more difficult to just interpolate for the validation set. A second option is to decrease the
number of folds. If the number of folds is lower, the number of data points per fold increases, making
it difficult to just extract certain relations. As can be seen in TABLE 40 the validation error is higher
when using less folds. The latter is to be expected since it is now more difficult to learn from unseen
relationships. The average test result also decreases, however it is still larger than the test error
resulting from linear regression.
To get an idea of the importance of the different variables and to train not only the output weights,
backpropagation through time is applied. The first model of backpropagation through time takes the
optimal meta-parameters coming from reservoir computing with 30 folds. This model only trains the
input and output weights, not the recurrent weights. The validation and test errors are somewhat
comparable to the result in reservoir computing with 4 folds. The high error confirms once again that
the cross-validation in reservoir computing is not performed well. Acknowledging this, it seems more
interesting to optimize the meta-parameters within the model itself instead of using the reservoir
computing meta-parameters. The second BPTT-model optimizes the meta-parameters without
training the recurrent weights. The results for the validation and test errors are good compared to
the other techniques. The third BPTT-model optimizes the meta-parameters and trains the recurrent
weights. This model has the lowest validation error of the three BPTT-models in combination with a
65
good test result. A disadvantage of training the recurrent weights as well, is that there is a higher
number of optimal iterations needed, implying that more computational power is required.
The validation error for Gaussian processes (GP) is good and outperforming backpropagation through
time. However the test result is somewhat disappointing. The predicted gold prices are too smooth
compared to the actual ones. This is caused by the cross-validation, as explained before. The
predicted gold prices for the last two years, according to GP, are lower than the actual gold prices.
During these two years an increase in the gold price of more than 50% took place. Based upon the
historical relationship between the input parameters and the gold price in the training period and the
input parameters in the test period, GP predicts lower gold prices. This confirms the believes that
certain relationships between the gold price and some input variables were, in the lost two years, no
longer present or that the gold price surged, induced by a huge amount of fear. Relying on this
insight, one can pose the question whether the current gold prices are overvalued?
Based upon the test and validation error, it can be concluded that the best technique is
backpropagation through time with optimizing the meta-parameters and training the recurrent
weights. Reservoir computing is promising but only when it is properly executed. Furthermore, linear
regression is a good technique, since it gives good predictions and is easy to understand and
interpret. An important remark concerning linear regression is that during periods without any
linearity, the technique performs worse than non-linear techniques.
Two important suggestions can be made when applying these techniques in practice. A first
recommendation is to use as many techniques as possible when allowed by the circumstances
(timing). This way different techniques can be used as a kind of confirmation on how the gold price
will behave, which is interesting when there is a lot of money at stake. When conditions do not allow
for the previously described approach, a second suggestion is to consider and analyse all the
different techniques in order to pick the most appropriate one. Each technique has its advantages
and disadvantages, implying that choosing the right technique for a specific problem can improve the
results. To put it metaphorically: “You do not use a butchers knife to put butter on your sandwich!”.
GRAPH 31 presents the actual and predicted gold prices.
To conclude this section, the different techniques are ranked according to their use of computational
power (TABLE 41). The computational power used by a certain technique is approximated by the
time a technique requires to generate a solution.
1 ADL-Model
2 Gaussian Processes
3 Reservoir Computing
4 BPTT TABLE 41: COMPUTATIONAL POWER RANKING
65
GRAPH 31: PREDICTED VS ACTUAL GOLD PRICE (BEST PREDICTION)
$ 1.000,00
$ 1.100,00
$ 1.200,00
$ 1.300,00
$ 1.400,00
$ 1.500,00
$ 1.600,00
$ 1.700,00
$ 1.800,00
$ 1.900,00
jan
/10
feb
/10
mrt
/10
apr/
10
mei
/10
jun
/10
jul/
10
aug/
10
sep
/10
okt
/10
no
v/1
0
dec
/10
jan
/11
feb
/11
mrt
/11
apr/
11
mei
/11
jun
/11
jul/
11
aug/
11
sep
/11
okt
/11
no
v/1
1
dec
/11
Actual Gold Price
Predicted Gold Price (ADL model)
Predicted Gold Price (RC)
Predicted Gold Price (BPPT-model 1)
Predicted Gold Price (BPPT-model 2)
Predicted Gold Price (BPPT-model 3)
Predicted Gold Price (Gaussian Processes)
66
2. What are the most important determinants of the gold price according to the different
methods that were used for modelling the gold price?
An overview of the importance of the different variables can be found in TABLE 42. It is interesting to
see that four variables were not appointed as being important for any model. The first one is the
composite leading indicator, providing early signals of turning points between expansions and
slowdowns of economic activity. It is somewhat surprising that there was no model that indicated
this variable as important. A possible explanation could be that these turning points are already
captured by the other variables in the system, like for example inflation. The second and third
variable are the world inflation and the volatility of the US inflation. The reason for their
unimportance could be once again that their information is already captured by the other variables,
like the volatility of the world inflation and the US inflation. The fourth and final variable is the CRDP,
a measure for economic and financial risk. Though most people think of this variable as a crucial
factor, determining the gold price, no evidence for its importance was found.
A common factor in every model is the presence of a variable which is connected with the price level
in the US or the world. So it is important to have at least one of the three following variables included
in the model: the price level in the US, the price level in the world or the inflation in the US. In every
machine learning model the general price level in the world is important, sometimes in combination
with the US price level or inflation.
Interesting to notice is the consistency of the important variables that have a direct impact in the
BPTT- models. Each of the three models considers the previous gold price, the exchange rate and the
price level in the world to be important in a direct, linear way. Of these three variables two were
important in the linear ADL-model, the previous gold price and the exchange rate.
Surprisingly the gold lease rate, being not important according to the ADL-model, is important
according to the machine learning models. Assignable reason for this is that the impact of the gold
lease rate is somewhat more indirect, suggesting that the impact is more non-linear, through which it
cannot be captured by the ADL-model.
Two final remarks can be made about the beta of gold and the volatility of the world inflation. Firstly,
the beta of gold is found to be important by some machine learning models but not by the ADL
model. Nevertheless the beta in the ADL-model is almost significant on the 10% level, which, in the
end, makes it the fourth most important variable in the ADL-model. The importance of the
characteristics of the beta of gold in practice cannot be underestimated since it is one of the most
important reasons for investors to buy gold, in order to diversify their portfolio. The final remark
handles about the importance of the volatility of the world inflation via the indirect way in the two
67
BPTT-models. This volatility, which is higher during crisis, has an important impact which is not
captured by the ADL-model trained on the entire training set. However when the ADL-model is
estimated on a period from 1998 until 2009, the volatility in the world inflation becomes significant.
This shows that artificial intelligence models outperform the linear techniques in modelling when the
importance of the underlying variables of a system changes.
Variables ADL-model
BPTT-model 1 BPTT-model 2 BPTT-model 3 Gaussian Processes Direct Indirect Direct Indirect Direct Indirect
TABLE 42: OVERVIEW OF THE IMPORTANCE OF THE DIFFERENT VARIABLES (IMPORTANT VARIABLE: CELL IS GREY)
In conclusion, eight out of twelve variables are considered to be important in the gold price
modelling. Four of them have a more direct, linear impact: the gold price in the previous period, the
inflation in the US, the exchange rate and the price level in the world. The other four variables will
impact the gold price in a more indirect, non-linear way: the beta of the gold price, the gold lease
rate, the volatility of the world inflation and the price level in the US.
In the research of Ghosh and others (2004) and Levin and Wright (2006) similar determinants are
found to be important, which adds credibility to the results found in this thesis. Ghosh and others
(2004) found that lagged gold prices are important as well as current and lagged gold lease rates.
Furthermore they found three other variables to be important in their study, namely the US-World
exchange rate, the current and lagged price levels in the US and the current and lagged betas of gold.
Levin and Wright (2006) found that lagged gold prices, US inflation, volatility of the US inflation, the
US-World exchange rate, the lagged gold lease rate and the lagged CRDP are important in explaining
gold prices. Both studies used cointegretion regression as technique. This research was not able to
find papers that used any of the other mentioned techniques in combination with relevant input
parameters.
68
Future evolution
Based on the results of this study, the gold price will keep on increasing for at least 3 years, after
which there might be a decrease in the gold price. Six of the eight variables, that determine the
direction of how the gold price will evolve, are described in this section: the inflation in the US, the
US price level, the world price level, the volatility of the world inflation, the exchange rate and the
real interest rate (which is equal to the gold lease rate). The evolution of the other two variables that
have an impact on the gold price, is difficult to determine. Therefore, these variables will not be
described in this section.
Based on the low to moderate inflation expectations in the US, the gold price should remain stable or
increase slowly. However in the media, it is often stated that because of money creation by central
banks, hyper-inflation could occur. There are four reasons why this will probably not be the case.
First of all, it is true that the reserves of the banks increased by more than 4000% in the last 5 years.
However the explosion of reserves resulted in an increase of only 25% in money supply. Reason for
this is that the Federal Reserve pays interests on funds that were placed with them. Banks find it
more interesting to give their excess funds to the Fed instead of lending it to households and
corporations. Secondly the high unemployment and the negative output-gap observed in the US,
puts pressure on wages and prices of products, which will favour a low inflation. Third, the thesis
finds proof supporting the idea that more reserves do not cause higher inflation. Japan tried to use
quantitative easing to fight deflation around 2001, but they never succeeded. The last reason why
excessive reserves will not lead to a high level of inflation, is the power of the Federal Reserve. When
the Federal Reserve observes an increasing trend in inflation, they find a way to manipulate the
excessive reserves in various ways so that inflation can be kept at acceptable levels (Feldstein, 2012).
Based on the expected US-world exchange rate, the gold price is expected to increase. There are four
reasons why the dollar is expected to depreciate, causing an increase in the gold price. First of all, the
bad US trade balance will put downward pressure on the value of the dollar, inducing it to
depreciate. Secondly, according to the purchasing-power parity, cfr. the Big Mac index (The
Economist) on the 12th of January 2012, the dollar is overvalued when comparing with for example
China, Turkey, Russia, India... A the third reason is the fact that the International Monetary Fund and
the UN are looking for a new global currency. This new currency could temper global demand for
dollars, forcing the US dollar into a depreciation. Lastly, increasingly countries are using their
domestic currency to trade with each other, which has once again a negative effect on the demand
side of the US currency. For example China and Japan no longer trade with each other using the
dollar, but now use their own, domestic currency.
69
Since real interest rates are expected to remain low, even negative, investors find it attractive to buy
gold, implying an increase in the demand for gold. This rise in demand will translate itself into higher
gold prices. The real interest rate equals the nominal interest rate minus inflation. Since the Federal
Reserve announced to keep nominal interest rates historically low at least until the end of 2014 and
because inflation is also low, real interest rates are expected to stay close to zero for another three
years.
Price level in the US and the world are expected to increase at slow to moderate paces, leading to
higher gold prices. The volatility of the world inflation is expected to remain high as long as there is a
crisis (supra). Higher volatility of the world inflation will cause a higher demand for gold, which will
result in a higher gold price.
Combining these six elements, suggests a future increase in the gold price. A potential, maximum
price that will be reached in the coming 2 to 3 years, is $2465.57 per ounce, since this is the gold
price in constant 2011 terms that was reached on January 21 in 1980. As soon as the gold price
reaches this psychological border and one of the underlying drivers of the gold price changes, the
gold price will rapidly decrease since gold is nowadays a liquid asset due to the existence of ETF’s. For
example, an underlying driver that can change, is the real interest rate. If the economy picks up over
3 years time and interest rates start to rise, investors will realize that gold offers no annual return like
shares with dividends. In the words of Warren Buffett: “If you offer me the choice of looking at some
67 foot cube of gold all day or having all the farm land in the US plus 7 Exxon Mobil’s plus a trillion
dollars, I would choose the second option! You can maybe fondle the cube but it won’t produce
anything.” (De Tijd, 2012)
70
10.Conclusion
Throughout the history of mankind, gold has always been an intriguing asset. The Greeks, the
Romans, the Egyptians… all used gold as a trade currency, a token of wealth or a safe-haven in times
of political or financial turmoil. Due to the recent economic problems, gold is prominently back in the
world of investing, leading to new heights in the nominal gold price. In constant 2011 terms the gold
price is however not yet reaching its top, which was reached in January 1980 with a monthly gold
price of $1958.85.
Modelling the price of gold is a difficult job. Unlike assets like shares or bonds, gold does not offer
any dividends or interest payments. As such, the price cannot be calculated by discounting future
cashflows. Another way to look at the problem is by investigating the drivers of supply and demand
of gold and their impact on the expected gold price. This thesis researched different modelling and
forecasting techniques for the gold price, but also the diverse determinants constituting the gold
price. The different techniques that were investigated, were linear regression, reservoir computing,
backpropagation through time and Gaussian processes. The approach that is followed to answer
these questions is known as the inflation-hedge approach.
The best technique is backpropagation through time with optimizing the meta-parameters and
training the recurrent weights. Another promising technique is reservoir computing, but only when it
is properly executed, meaning leaving data points out after each validation set. Linear regression
provides good results and is easy to understand and interpret. Two important remarks need to be
made concerning linear regression. First, for the period November 1990 until December 2011 no long
run relationship between the gold price and the general price level in the US was found. However
this does not mean that gold cannot be used to hedge against inflation on the long run. Since there
was no long run relationship, this thesis used an ADL-model instead of a cointegration regression.
The second remark concerning linear regression is the fact that during periods where the linearity is
absent, linear regression performs worse than non-linear techniques.
Two suggestions can be made when applying these techniques for financial time series. If time allows
so to do, use as many techniques as possible. If they confirm each other then the likelihood that the
gold price will indeed move in the suggested direction, will be higher as compared to a situation
where all techniques tend to disagree on gold price movements. The second suggestion is to think
before acting. When solving a problem, start with analysing the different characteristics of the
problem. Based upon this investigation, choose the technique which fits best. Every technique has its
advantages and disadvantages which makes it more suitable for solving a specific problem.
71
To answer the second research question, the importance of the different variables in the different
techniques was investigated. Eight of the twelve variables are considered to be important in the gold
price modelling. Of these eight, four have a direct, linear impact: the gold price in the previous
period, the inflation in the US, the exchange rate and the price level in the world. The other four
variables will impact the gold price in an indirect, non-linear way. These four variables are: the beta
of the gold price, the gold lease rate, the volatility of the world inflation and the price level in the US.
To put it in a simplistic way, the gold price at time t equals to the gold price at time t-1 plus some
extra’s.
Based on the results of this study, the gold price is expected to increase for at least 3 years, after
which there might be a decrease in the gold price. Six of the eight important variables, that
determine the direction of how the gold price will evolve, show an evolution that induces an increase
of the gold price. The evolution of the other two variables that have an impact on the gold price, is
difficult to determine. Therefore, these variables were not described. A potential, maximum price
that will be reached in the coming 2 to 3 years, is $2465.57 per ounce, since this is the gold price in
constant 2011 terms that was reached on January 21 in 1980.
Lastly some suggestions for future research can be made. An important step would be to include
cross-testing for all the different models, so that the different techniques can be compared in an
objective way. It would also be interesting to research if non-linear machine learning techniques can
add value to classical linear regression by trying to model the residuals coming from linear
regression, using the same input variables that were used in developing the linear model.
Furthermore, linear regression can be improved by using the Johansen test, which allows more
variables to be cointegrated and as such is able to capture more dynamics. A last point of further
improvement for future research is the possibility to include a political risk variable.
XI
VIII References Bollen, J., Mao, H., Zeng, X., 2010, Twitter mood predicts the stock market. Journal of
Computational Science, vol. 2, pp. 1-8
Cai, C.X., Cheung, Y.-L., Wong, M.C.S., 2001, What moves the gold market? Journal of
Futures Markets 21, pp. 257–278
Calvo, G.C., Reinhart, C.M., 2002, Fear of floating, The Quarterly Journal of Economics, vol.
117, pp. 379-408
Coudert, V., Couharde, C., Mignon, V., 2010, Exchange Rate Flexibility across Financial
Crises, CEPII
Diba, B. and Grossman, H., 1984, Rational Bubbles in the Price of Gold, NBER Working Paper:
1300. Cambridge, MA, National Bureau of Economic Research
Dooley, M.P., Isard, P. and Taylor, M.P., 1995, Exchange Rates, Country-specific Shocks and
Gold, Applied Financial Economics, vol. 5, pp. 121-129
Eichengreen, B., Flandreau, M., 1997, Gold Standard In Theory & History, Routledge, Taylor
& Francis Group
Enders, W., 2004, Applied Econometric Time Series, Wiley, pp. 213
Engle, R.F. and Granger, C.W.J., 1987, Cointegration and Error-Correction: Representation,
Estimation and Testing, Econometrica, vol. 55, pp. 251-276
Ghosh, D.P., E.J. Levin, P. Macmillan and R.E. Wright and, 2004, Gold as an Inflation Hedge?,
Studies in Economics and Finance, vol. 22, no. 1, pp. 1-25
Ismail, Z., Yahya, A., Shabri, A., 2009, Forecasting gold prices using multiple linear regression
method, American Journal of Applied Sciences 6, no. 8, pp. 1509-1514
Jaeger, H., 2005, A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF
and the “echo state network” approach
Kolluri, B.R., 1981, Gold as a hedge against inflation: an empirical investigation. Quarterly
Review of Economics and Business 21, pp. 13–24
Lampinen, A., 2007, Gold Investments and short- and long-run price determinants of the
price of gold, Working paper, LUT, School of Business
Levin, E.J., Wright R.E., 2006, Short run and long run determinants of the gold price. World
Gold Council, London
Mahdavi, S., Zhou, S., 1997, Gold and commodity prices as leading indicators of inflation:
tests of long run relationships and predictive performance. Journal of Economics and
Business 49, pp. 475–489
XII
McCann, P., Kalman, B., 1994, A neural network model for the gold market, Washington
University
Mirmirani, S., Li, H., 2004, Gold Price, neural networks and genetic algorithm,
Computational Economics 23, no. 4, pp. 193-200
Moore, G., 1990, Gold Prices and a Leading Index of Inflation, Challenge, vol. 33, pp. 52-56
Parisi, A., Parisi, F., Diaz, D., 2008, Forecasting gold price changes : rolling and recursive
neural network models, Journal of multinational financial management 18, no. 5, pp. 477-
487
Pindyck, R.S., 1993, The Present Value Model of Rational Commodity Pricing, Economic
Journal, vol. 103, pp. 511-530
Ranson, D., 2005b, Inflation Protection: Why Gold Works Better Than “Linkers”, London,
World Gold Council
Rasmussen, C., Williams, C., 2006, Gaussian Processes for Machine Learning, the MIT Press
Shafiee, S., Topal, E., 2010, An overview of global gold market and gold price forecasting.
Resources Policy, vol. 35, no. 3, pp. 178-189
Sherman, E.J, 1983, A Gold Pricing Model, Journal of Portfolio Management, vol. 9, pp. 68-70
Sjaastad, L.A. and F. Scacciallani, 1996, The Price of Gold and the Exchange Rate, Journal of
Money and Finance, vol. 15, pp. 879-897
Websites:
World Gold Council, 2002, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2003, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2004, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2005, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2006, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2007, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2008, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
XIII
World Gold Council, 2009, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2010, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
World Gold Council, 2011, Gold Demand Trends, Retrieved October 15, 2011,
<http://www.gold.org/investment/research/regular_reports/gold_demand_trends/>
US Geological Survey, 2011, Mineral Commodity Summaries, Retrieved October 15, 2011,
< http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2011-gold.pdf>
Newspaper articles:
Feldstein, M., 2012, Fed Policy and Inflation Risk, De Tijd
Waarom wantrouwen? (2012, March 8). De Tijd