METHODS OF SPATIAL ECONOMIC ANALYSIS
LECTURE 02
Dr. Marie-Noëlle Duquenne, Associate Professor, [email protected], Tel. 24210-74438, Office Γ.6
UNIVERSITY OF THESSALY, FACULTY OF ENGINEERING
DEPARTMENT OF PLANNING AND REGIONAL DEVELOPMENT
MASTER «EUROPEAN REGIONAL DEVELOPMENT STUDIES»
METHODS OF SPATIAL ECONOMIC ANALYSIS: STATISTICAL TREATMENT OF SPATIAL DATA, A FIRST APPROACH
OBJECTIVE OF THE LECTURE
1. Exploratory statistical analysis of regional data.
2. Familiarization with Eurostat regional data.
3. Familiarization with statistical treatment through SPSS.
Main terms used in Statistics
Population: Complete set of data elements. Ex. Census, Register.
Sample: Portion of elements selected from a reference population.
Parameter: Measured characteristic of the whole population.
Statistic: Estimated characteristic of the sample.
STATISTICAL TREATMENT
Types of Data
Categorical data:
  Non-ordinal: family status, employment status, etc. (no measurement meaning).
  Ordinal: rating-score variable (Likert scale). In this case the measurement has meaning.
Numeric data: the values have a clear meaning as measurements.
  Discrete data
  Continuous data
Most of the data used in regional analysis are numeric, allowing a cartographic visualization.
Data Visualization
Gross Domestic Expenditure on R&D (% of GDP)
Source: Eurostat, EU 2020 indicators
See LECTURE_02_DATA.xls
STATISTICAL TREATMENT
Representation of Likert scale
A specific case of ordinal data: the Likert items
Initially, the Likert scale is a psychometric scale measuring the level of agreement or disagreement.
The scale also has a more general use: it allows characteristics to be evaluated according to objective or subjective criteria.
The most commonly used scales have five, seven, nine and sometimes eleven levels.
Likert, R. (1932), "A Technique for the Measurement of Attitudes". Archives of Psychology 140: 1–55.
Typical five-level psychometric scale
Five-level scale for regions' classification
DATA FOR ANALYSIS
Presentation of Data for Statistical Treatment
Source: Eurostat, EU 2020 indicators
See LECTURE_02_DATA.xls
Two sheets:
1. Analytical Data
2. Data_SPSS
The second sheet has the appropriate format for opening the data with SPSS, i.e.:
The 1st row contains the variables' names.
The following 28 rows concern the 28 countries (without the EU28 aggregate).
Each column concerns one variable.
STATISTICAL TREATMENT
Central Parameters [01]: Central parameters for total R&D expenditures (% of GDP)
The two variables examined are RD_TOT04 and RD_TOT12, i.e. the total R&D expenditures as % of GDP, in 2004 and 2012.
Arithmetic Mean: Sum of all elements of the data set divided by the number of elements.
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Weighted Mean: Sum of the weighted scores.
$\bar{X}_w = \sum_{i=1}^{n} w_i \cdot X_i$, where $\sum_{i=1}^{n} w_i = 1$
Geometric Mean: The nth root of the product of the data elements.
$\bar{X}_g = \sqrt[n]{\prod_{i=1}^{n} X_i}$
Conclusions:______________________________________
______________________________________
______________________________________
______________________________________
______________________________________
Statistical Analysis with Excel
Measure            Excel formula        2004    2012
E.U. 28                                 1,82    2,07
Arithmetic Mean    =AVERAGE(.. , ..)    1,34    1,67
Weighted Mean                           1,56    1,83
Geometric Mean     =GEOMEAN(.. , ..)    1,08    1,41
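The three means defined above can be reproduced outside Excel. The following is a minimal Python sketch, not the lecture's own method: the RD_TOT12 values are copied from the country table of this lecture (written with decimal points), while the three-region population example used for the weighted mean is purely hypothetical, since the population column is not reproduced in these notes.

```python
import math

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]


def arithmetic_mean(values):
    """Sum of all elements divided by the number of elements."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of w_i * X_i, after rescaling the weights so that they sum to 1."""
    total = sum(weights)
    return sum((w / total) * x for x, w in zip(values, weights))


def geometric_mean(values):
    """nth root of the product, computed through logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


mean_2012 = arithmetic_mean(rd_tot12)
geo_2012 = geometric_mean(rd_tot12)

# Hypothetical example (NOT lecture data): three regions and their populations.
example_values = [1.0, 2.0, 3.0]
example_population = [10.0, 30.0, 60.0]
wmean_example = weighted_mean(example_values, example_population)

print(round(mean_2012, 2), round(geo_2012, 2), round(wmean_example, 2))
```

With the lecture data, the arithmetic and geometric means should agree with the 2012 column of the Excel table after rounding to two decimals.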
STATISTICAL TREATMENT
Central Parameters [02]: Examples
Mode: The most frequent value of the variable, i.e. the observed value that occurs most often. The mode is not necessarily a single value.
Median: The value of the variable (with the elements arranged in order of magnitude) below which 50% of the elements fall (50% of the elements have a value lower than the median).
Median = Arithmetic Mean when the distribution follows the Laplace-Gauss distribution (Normal distribution).
Be careful: when the mode is not a single value, the Excel "MODE" function returns only one of the tied values (here the highest one).
In 2012, the mode is indeed not a single value: 0,90, 1,30, 1,72 and 2,98 each occur twice.
The mode is of very limited interest.
Country           RD_TOT12
Cyprus            0,46
Romania           0,49
Bulgaria          0,64
Latvia            0,66
Greece            0,69
Croatia           0,75
Slovakia          0,82
Malta             0,84
Lithuania         0,90
Poland            0,90
Italy             1,27
Spain             1,30
Hungary           1,30
Luxembourg        1,46
Portugal          1,50
Ireland           1,72
United Kingdom    1,72
Czech Republic    1,88
Netherlands       2,16
Estonia           2,18
Belgium           2,24
France            2,29
Slovenia          2,80
Austria           2,84
Denmark           2,98
Germany           2,98
Sweden            3,41
Finland           3,55
Statistical Analysis with Excel
Measure    Excel formula       2004    2012
Mode       =MODE(.. , ..)      0,51    2,98
Median     =MEDIAN(.. , ..)    1,08    1,48
In 2012, the mean is 1,67% of GDP, while the median (1,48%) is noticeably smaller.
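As a cross-check of the mode and median, here is a short Python sketch using only the standard library (an illustration, not part of the lecture material). `statistics.multimode` returns every value tied for the highest frequency, which makes a non-unique mode visible instead of hiding it behind a single number.

```python
import statistics

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

# All values that share the highest frequency (here each occurs twice).
modes = statistics.multimode(rd_tot12)

# With an even number of elements the median is the average of the two middle values.
median = statistics.median(rd_tot12)
mean = statistics.mean(rd_tot12)

print("modes:", modes)
print("median:", round(median, 2), "mean:", round(mean, 2))
```

A median below the mean, as found here, is a first hint that the distribution has a longer tail on the side of the high values.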
STATISTICAL TREATMENT
Measures of dispersion [01]: Examples
Range: Difference between the highest and the lowest data element.
$Range = X_{\max} - X_{\min}$
Dispersion Ratio: Quotient between the highest and the lowest data element.
$DR = \frac{X_{\max}}{X_{\min}}$
Percentile (p%): The value of the variable below which p% of the elements fall. For dispersion analysis, the 5% and 95% percentiles are very useful.
Statistical Analysis with Excel
Measure     Excel formula     2004    2012
Minimum     =MIN(.. , ..)     0,37    0,46
Maximum     =MAX(.. , ..)     3,58    3,55
Range       =MAX - MIN        3,21    3,09
DR ratio    =MAX / MIN        9,68    7,72
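The range, the dispersion ratio and the 5% and 95% percentiles can be sketched in Python as follows. This is an illustration under one assumption: percentiles are interpolated with the "inclusive" method of `statistics.quantiles`, which treats the minimum as the 0th and the maximum as the 100th percentile. Other software may use a different interpolation rule and give slightly different percentile values.

```python
import statistics

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

x_min, x_max = min(rd_tot12), max(rd_tot12)
value_range = x_max - x_min        # Range = Xmax - Xmin
dispersion_ratio = x_max / x_min   # DR = Xmax / Xmin

# n=20 splits the data into 20 equal-probability groups: the 19 cut points
# are the 5%, 10%, ..., 95% percentiles.
cut_points = statistics.quantiles(rd_tot12, n=20, method="inclusive")
p05, p95 = cut_points[0], cut_points[-1]

print(round(value_range, 2), round(dispersion_ratio, 2))
print(round(p05, 4), round(p95, 4))
```

Comparing p95 / p05 with the dispersion ratio shows how much of the max/min gap is driven by the one or two most extreme countries.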
Conclusions:______________________________________
______________________________________
______________________________________
______________________________________
______________________________________
STATISTICAL TREATMENT
Measures of dispersion [02]: Examples
Variance: The average squared distance of each score from the mean.
$V[X] = \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$
Weighted Variance: The weighted average squared distance of each score from the mean.
$V_w[X] = \sigma_w^2 = \sum_{i=1}^{n} w_i \cdot (X_i - \bar{X})^2$, where $\sum_{i=1}^{n} w_i = 1$
Standard deviation: σ = square root of the variance.
Coefficient of Variation (CV):
$CV = \frac{\sigma}{\bar{X}}$
Conclusions:______________________________________
______________________________________
______________________________________
______________________________________
______________________________________
Statistical Analysis with Excel
Measure               Excel formula        2004     2012
Arithmetic Mean       =AVERAGE(.. , ..)    1,34     1,67
Variance              =VAR(.. , ..)        0,809    0,880
Standard Deviation    =STDEV(.. , ..)      0,900    0,938
CV coefficient        =STDEV / AVERAGE     0,67     0,56
                                           67%      56%
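The following Python sketch computes the variance, standard deviation and coefficient of variation for the RD_TOT12 values listed in this lecture. It is an illustration only. One point worth noting: Excel's VAR and STDEV divide by n - 1 (sample formulas), which corresponds to `statistics.variance` and `statistics.stdev`, while a definition that divides by n corresponds to `statistics.pvariance`.

```python
import statistics

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

mean = statistics.mean(rd_tot12)

sample_variance = statistics.variance(rd_tot12)       # divides by n - 1, like Excel VAR
population_variance = statistics.pvariance(rd_tot12)  # divides by n
sample_stdev = statistics.stdev(rd_tot12)             # like Excel STDEV

# Coefficient of variation: standard deviation relative to the mean.
cv = sample_stdev / mean

print(round(sample_variance, 3), round(sample_stdev, 3))
print(f"CV = {cv:.2f} ({cv:.0%})")
```

Because the CV is a pure ratio, it allows the dispersion of 2004 and 2012 to be compared even though the mean level changed between the two years.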
STATISTICAL TREATMENT
Measures of dispersion [03]: Examples
Weighted Coefficient of Variation wCV:
$wCV = \frac{\sqrt{\sum_{i=1}^{n} w_i \cdot (X_i - \bar{X})^2}}{\bar{X}}$, with $w_i = \frac{Pop_i}{Pop}$
With spatial units, w_i is generally the population weight of spatial unit i in the total area under examination.
Considering the 28 EU countries: Pop_i = population of country i, Pop = EU population.
Conclusions:______________________________________
______________________________________
______________________________________
______________________________________
______________________________________
Statistical Analysis with Excel
Measure               Excel formula        2004     2012
Arithmetic Mean       =AVERAGE(.. , ..)    1,34     1,67
Variance              =VAR(.. , ..)        0,809    0,880
Standard Deviation    =STDEV(.. , ..)      0,900    0,938
CV coefficient        =STDEV / AVERAGE     0,67     0,56
                                           67%      56%
Measure                   Excel formula                          2004     2012
Arithmetic Mean           =AVERAGE(.. , ..)                      1,32     1,57
Weighted variance         See calculation in columns J & K       0,662    0,680
Weighted St. Deviation                                           0,813    0,825
wCV                       =wSTDEV / AVERAGE                      0,607    0,494
                                                                 61%      49%
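A possible Python sketch of the weighted coefficient of variation follows. The country populations are not reproduced in these notes, so the three-region data set below is hypothetical and serves only to show the mechanics. The function follows the formula wCV = wSTDEV / AVERAGE, with deviations taken from the unweighted arithmetic mean; the parameter `reference_mean` is an illustrative addition that lets a different reference value (for instance the weighted mean) be used instead.

```python
import math


def weighted_cv(values, populations, reference_mean=None):
    """Weighted coefficient of variation with weights w_i = Pop_i / Pop.

    reference_mean defaults to the unweighted arithmetic mean of the values.
    """
    total_population = sum(populations)
    weights = [p / total_population for p in populations]  # weights sum to 1
    if reference_mean is None:
        reference_mean = sum(values) / len(values)
    weighted_variance = sum(
        w * (x - reference_mean) ** 2 for x, w in zip(values, weights)
    )
    return math.sqrt(weighted_variance) / reference_mean


# Hypothetical example (NOT lecture data): indicator values and populations
# of three regions.
example_values = [1.0, 2.0, 3.0]
example_population = [10.0, 30.0, 60.0]

wcv_example = weighted_cv(example_values, example_population)
print(round(wcv_example, 3))
```

Weighting by population means that a small region with an extreme value contributes little to the measured disparity, which is usually the desired behaviour when comparing spatial units of very different size.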
STATISTICAL TREATMENT
Normal / Gaussian Distribution: Representation [01]
Perfectly symmetric distribution of the random variable around the mean value.
Mean = Median = Mode.
Standard Normal Distribution:
If $X \sim N(\mu, \sigma^2)$ (Normal distribution), then the standardized variable $Z \sim N(0, 1)$, where:
$Z = \frac{X - \mu}{\sigma}$
P(X < μ) = 0,5 (50%)
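The standardization and the property P(X < μ) = 0,5 can be checked numerically with Python's `statistics.NormalDist`. In this sketch the parameters μ = 1.67 and σ = 0.938 are borrowed from the 2012 figures purely as illustrative numbers; nothing here claims that the R&D data are actually Normal.

```python
from statistics import NormalDist

# Illustrative parameters only (assumption: a Normal variable with these values).
mu, sigma = 1.67, 0.938
X = NormalDist(mu=mu, sigma=sigma)
Z = NormalDist(mu=0.0, sigma=1.0)  # standard Normal distribution


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# Half of the probability mass lies below the mean.
p_below_mean = X.cdf(mu)

# Probabilities are unchanged by standardization: P(X < x) = P(Z < z).
x = 2.5
z = standardize(x, mu, sigma)
p_x = X.cdf(x)
p_z = Z.cdf(z)

print(p_below_mean, round(z, 3), round(p_x, 4), round(p_z, 4))
```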
STATISTICAL TREATMENT
Normal / Gaussian Distribution: Representation [02]
The shape of the distribution of a Normal variable depends on the specific values of its two parameters: mean and variance.
High value of variance → flattened curve (see blue curve): there is no concentration of values around the mean.
Small value of variance → high concentration around the mean value, low degree of variability (see red curve).
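The effect of the variance on the shape of the curve can be illustrated numerically. The sketch below compares two Normal distributions with the same mean and two arbitrarily chosen standard deviations (0.5 and 2.0, both assumptions made for the illustration): the height of the density at the mean and the share of values within one unit of the mean.

```python
from statistics import NormalDist

# Two Normal variables with the same mean; the sigmas are arbitrary illustrations.
narrow = NormalDist(mu=0.0, sigma=0.5)  # small variance
wide = NormalDist(mu=0.0, sigma=2.0)    # high variance

# Height of the density curve at the mean: higher peak for the small variance.
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)

# Share of values within one unit of the mean (concentration around the mean).
share_narrow = narrow.cdf(1.0) - narrow.cdf(-1.0)
share_wide = wide.cdf(1.0) - wide.cdf(-1.0)

print(round(peak_narrow, 3), round(peak_wide, 3))
print(round(share_narrow, 3), round(share_wide, 3))
```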
STATISTICAL TREATMENT
Confidence Interval and Confidence Level
Confidence interval: An estimated range of values, calculated from a given set of sample data, which is likely to include an unknown population parameter.
Confidence limits: The lower and upper boundaries of a confidence interval, that is, the values which define its range.
The confidence interval is very informative because its width gives us some idea of how uncertain we are about the unknown parameter.
Confidence level: The probability value (1-α) associated with a confidence interval. If α = 5%, the confidence level is (1-0,05) = 0,95, i.e. a 95% confidence level.
Statistical Analysis with Excel
Measure                               Excel formula                           2004     2012
Arithmetic Mean                       =AVERAGE(.. , ..)                       1,34     1,67
Margin of Error                       =CONFIDENCE(α, STDEV, sample size)      0,333    0,347
Confidence Interval: Lower bound                                              1,008    1,322
Confidence Interval: Upper bound                                              1,674    2,016
$CI_{(1-\alpha)} = \bar{X} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
In this example, we choose α = 0,05 (5%), i.e. a 95% CI, with sample size n = 28.
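The confidence interval for the 2012 mean can be sketched in Python as below. It is an illustration built on the Normal approximation with the two-sided critical value z for α/2 (about 1.96 for α = 0.05) and the sample standard deviation; the data are the RD_TOT12 values listed in this lecture.

```python
import math
import statistics
from statistics import NormalDist

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

alpha = 0.05                        # 95% confidence level
n = len(rd_tot12)                   # sample size (28)
mean = statistics.mean(rd_tot12)
s = statistics.stdev(rd_tot12)      # sample standard deviation (n - 1)

# Two-sided critical value of the standard Normal distribution.
z = NormalDist().inv_cdf(1 - alpha / 2)

margin_of_error = z * s / math.sqrt(n)
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

print(round(margin_of_error, 3), round(lower_bound, 3), round(upper_bound, 3))
```

With only 28 observations, a Student t critical value would give a slightly wider interval; the Normal value is used here to stay close to the formula on the slide.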
STATISTICAL TREATMENT
Measures of Trends
Kurtosis [a4]: A measure of the "peakedness" of the probability distribution of a random variable.
$a_4 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{(n-1) \cdot s^4} - 3$
a4 = 0: Normal distribution
a4 > 0: Peaked distribution
a4 < 0: Flat distribution
Skewness [a3]: A measure of the asymmetry of the probability distribution of a random variable.
$a_3 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{(n-1) \cdot s^3}$
a3 = 0: symmetric distribution, as in the Normal distribution
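The two shape coefficients can be computed directly from their definitions. The Python sketch below divides by (n - 1) and uses the sample standard deviation s, as in the formulas of this slide; statistical packages often apply additional small-sample corrections, so their output may differ slightly. The data are the RD_TOT12 values listed in this lecture.

```python
import statistics

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]


def skewness(values):
    """a3 = sum((X_i - mean)^3) / ((n - 1) * s^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum((x - mean) ** 3 for x in values) / ((n - 1) * s ** 3)


def kurtosis(values):
    """a4 = sum((X_i - mean)^4) / ((n - 1) * s^4) - 3."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum((x - mean) ** 4 for x in values) / ((n - 1) * s ** 4) - 3


a3 = skewness(rd_tot12)
a4 = kurtosis(rd_tot12)
print(round(a3, 2), round(a4, 2))
```

A positive a3 indicates a longer tail towards high values, and a negative a4 indicates a distribution flatter than the Normal one.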
STATISTICAL TREATMENT
Measures of Correlation: Examples
Question: To what extent are the R&D expenditures in 2012 correlated with the R&D expenditures in 2004?
We expect a strongly positive coefficient: countries with initially high expenditures will tend to continue to have high expenditures.
Pearson coefficient of correlation r_p: It indicates the strength and the direction of a linear relationship between two random variables (X and Y).
$r_p = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X \cdot \sigma_Y}$
The correlation coefficient does not indicate a cause-and-effect relationship.
Spearman coefficient of correlation r_s: It indicates the strength and the direction of a relationship (not necessarily linear) between two random variables.
$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where d_i is the difference between the ranks of X_i and Y_i.
Statistical Analysis with Excel
Pearson Correlation    =CORREL(D2:D29,E2:E29)    0,914
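Both coefficients can be sketched in a few lines of Python. The 2004 values are not reproduced in these notes, so the example uses a small hypothetical data set in which Y is the square of X: the relationship is perfectly increasing but not linear. The Spearman function uses the simple rank-difference formula and therefore assumes that there are no tied values.

```python
import math


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covariance / (sigma_x * sigma_y)


def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical example (NOT lecture data): a monotonic, non-linear relationship.
x_example = [1.0, 2.0, 3.0, 4.0, 5.0]
y_example = [x ** 2 for x in x_example]

r_p = pearson(x_example, y_example)
r_s = spearman(x_example, y_example)
print(round(r_p, 3), r_s)
```

Here the Spearman coefficient equals 1 because the ranks agree perfectly, while the Pearson coefficient stays below 1 because the points do not lie on a straight line.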
INTRODUCTION TO METHODS OF SPATIAL ECONOMIC ANALYSIS: USING SPSS
DATA FOR ANALYSIS
From EXCEL to SPSS
Source: Eurostat, EU 2020 indicators
The Excel file LECTURE_02_DATA.xls has to be closed.
The worksheet with the appropriate format is: Data_SPSS
The data that we are going to open through SPSS are in the range A1:M29.
The 1st row has to contain the names of the variables.
The names of the variables cannot contain special characters such as space, %, @, $, *, / etc.
It is suggested to use short names for the variables, because each variable can be described in detail in the Label column of the sheet describing the variables. Example: Population in 2004 = POP04 (POP 04 is not allowed because of the space).
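Before opening the file, the header row can be screened for problematic names. The Python sketch below applies a deliberately conservative rule derived only from the advice above (a letter first, then letters, digits or underscores); it is an assumption made for illustration and not a reproduction of the full naming rules of SPSS. The function name `is_safe_variable_name` is hypothetical.

```python
import re

# Conservative pattern: starts with a letter, then letters, digits or underscores.
# This is an illustrative rule, not the complete SPSS naming specification.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_safe_variable_name(name):
    """Return True if the name contains no spaces or special characters."""
    return SAFE_NAME.fullmatch(name) is not None


# Names taken from the text above, plus two deliberately invalid ones.
candidates = ["POP04", "POP 04", "RD_TOT12", "R&D%"]
for name in candidates:
    print(f"{name!r}: {'ok' if is_safe_variable_name(name) else 'rename'}")
```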
DATA FOR ANALYSIS
From EXCEL to SPSS
Command: File > Open > Data
1. Select the Excel type of file.
2. Then select the file LECTURE02_DATA.xls.
3. Open.
4. A new window appears where we can select the appropriate worksheet (Data_SPSS). You will also have to check the range.
DATA FOR ANALYSIS
Data in SPSS
As you can observe, the names of the variables are in the initial, unnumbered row. Consequently, row 1 contains the data of the first country (Belgium).
Each SPSS data file has two sheets:
  Data View, with the data.
  Variable View, where you can enter information about your data and specify the nature and the meaning of the data.
DATA FOR ANALYSIS
Statistical Treatment with SPSS
How can we obtain the most important statistical parameters of our variables in order to proceed to an exploratory analysis (descriptive statistics)?
Use the following command:
Analyze > Descriptive Statistics > Explore
DATA FOR ANALYSIS
Statistical Treatment with SPSS
1. Select the variables to be explored from the left-hand list. It is possible to select more than one variable and to produce all the results for the selected variables.
2. Move the variables to the right pane: Dependent List.
With Explore, the statistical parameters are calculated, as well as the box plot through which we can detect the presence of outliers.
In some cases, we will examine the statistical parameters of one or more variables for sub-groups of the total population. In this case, we have to move the variable defining the sub-groups to the pane: Factor List.
DATA FOR ANALYSIS
Results from Explore
The results appear in a new sheet, Output, which is completely independent from the data sheet.
This sheet can be saved or converted to Word, Excel, etc.
All the results are summarized in the table.
DATA FOR ANALYSIS
Results from Explore
With Explore, we also obtain the box plot for each variable.
This diagram allows us to verify to what extent the variables present a roughly “Normal” distribution, and it also allows us to detect the presence of outliers (values that are below or above the accepted thresholds).
In this case, there is no outlier.
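The outlier thresholds of a box plot can be approximated in Python. The sketch below assumes the common rule of 1.5 times the interquartile range beyond the first and third quartiles, with quartiles interpolated by the "inclusive" method of `statistics.quantiles`; SPSS may compute its hinges slightly differently, so borderline cases can differ. The function name `box_plot_outliers` is hypothetical, and the data are the RD_TOT12 values listed in this lecture.

```python
import statistics

# RD_TOT12 values (% of GDP) for the 28 countries, copied from the lecture table.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]


def box_plot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_threshold = q1 - k * iqr
    upper_threshold = q3 + k * iqr
    return [v for v in values if v < lower_threshold or v > upper_threshold]


outliers_2012 = box_plot_outliers(rd_tot12)

# Adding an artificial extreme value (9.0, purely illustrative) triggers the rule.
outliers_with_extreme = box_plot_outliers(rd_tot12 + [9.0])

print(outliers_2012, outliers_with_extreme)
```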