METHODS OF SPATIAL ECONOMIC ANALYSIS
LECTURE 02

Dr. Marie-Noëlle Duquenne, Associate Professor, mdyken@prd.uth.gr
Tel. 24210-74438, Office Γ.6

UNIVERSITY OF THESSALY
FACULTY OF ENGINEERING

DEPARTMENT OF PLANNING AND REGIONAL DEVELOPMENT

MASTER «EUROPEAN REGIONAL DEVELOPMENT STUDIES»


METHODS OF SPATIAL ECONOMIC ANALYSIS: STATISTICAL TREATMENT OF SPATIAL DATA, A FIRST APPROACH


OBJECTIVE OF THE LECTURE

1. Exploratory statistical analysis of regional data.

2. Familiarization with Eurostat regional data.

3. Familiarization with statistical treatment through SPSS.

Main terms used in Statistics

Population: the complete set of data elements (e.g. a census).

Sample: a portion of elements selected from a reference population.

Parameter: a measured characteristic of the whole population.

Statistic: an estimated characteristic of the sample.


STATISTICAL TREATMENT: Types of Data

Categorical data:

Non-ordinal: family status, employment status, etc. (the categories have no measurement meaning).

Ordinal: rating-score variable (Likert scale). In this case the ranking of the values has meaning.

Numeric data: the values have a clear meaning as measurements.

Discrete data

Continuous data

Most of the data used in regional analysis are numeric, which allows a cartographic visualization.

Data Visualization

[Figure: Gross Domestic Expenditure on R&D (% of GDP)]

Source: Eurostat, EU 2020 indicators

See LECTURE_02_DATA.xls


STATISTICAL TREATMENT: A specific case of ordinal data, the Likert items

The Likert scale was originally a psychometric scale measuring the level of agreement or disagreement.

The scale is now used more generally to evaluate characteristics according to objective or subjective criteria.

The most commonly used scales have five, seven, nine and sometimes eleven levels.

Likert, R. (1932), "A Technique for the Measurement of Attitudes". Archives of Psychology 140: 1–55.

Representation of Likert scale

[Figure: Typical five-level psychometric scale]

[Figure: Five-level scale for the classification of regions]


DATA FOR ANALYSIS: Presentation of Data for Statistical Treatment

Source: Eurostat, EU 2020 indicators

See LECTURE_02_DATA.xls

The file contains two sheets:

1. Analytical Data

2. Data_SPSS

The second sheet has the appropriate format for opening the data with SPSS, i.e.:

The 1st row contains the variables' names.

The following 28 rows concern the 28 countries (the EU28 aggregate is excluded).

Each column concerns one variable.


STATISTICAL TREATMENT: Central Parameters [01]

The two variables examined are RD_TOT04 and RD_TOT12, i.e. the total R&D expenditures as % of GDP in 2004 and 2012.

Arithmetic Mean: sum of all elements of the data set divided by the number of elements.

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Weighted Mean: sum of the weighted scores.

$\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, with $\sum_{i=1}^{n} w_i = 1$

Geometric Mean: the nth root of the product of the data elements.

Central parameters for total R&D expenditures (% of GDP)

Statistical Analysis with Excel

                                       2004    2012
E.U. 28                                1,82    2,07
Arithmetic Mean   =AVERAGE(.. , ..)    1,34    1,67
Weighted Mean                          1,56    1,83
Geometric Mean    =GEOMEAN(.. , ..)    1,08    1,41

Conclusions:
______________________________________
______________________________________
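The three means can be sketched in a few lines of Python. The three regional values and the weights below are hypothetical and serve only to illustrate the definitions; the weights are assumed to sum to 1, as population shares would.

```python
import math

# Hypothetical R&D shares (% of GDP) and population weights for three regions.
values = [1.0, 2.0, 4.0]
weights = [0.5, 0.3, 0.2]  # assumption: weights sum to 1


def arithmetic_mean(xs):
    """Sum of the elements divided by their number."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Sum of the weighted scores; dividing by sum(ws) also covers unnormalized weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def geometric_mean(xs):
    """nth root of the product, computed through logarithms (positive values only)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


print(arithmetic_mean(values))
print(weighted_mean(values, weights))
print(geometric_mean(values))
```

In this toy example the weighted mean is pulled toward the first region because it carries half of the total weight, which is the same mechanism that separates the weighted mean from the simple country average in the table.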


STATISTICAL TREATMENT: Central Parameters [02]

Mode: the observed value that occurs most frequently, i.e. the most frequent value of the variable. The mode is not necessarily a single value.

Be careful: when the mode is not a single value, the "MODE" command returns only one of the modal values (here, the highest one). In 2012, the mode is indeed not a single value.

The mode is of very limited interest.

Median: the value of the variable (arranged in order of magnitude) below which 50% of the elements fall (50% of the elements have a value lower than the median).

Median = Arithmetic Mean when the distribution follows the Laplace-Gauss distribution (Normal distribution).

Examples

Country           RD_TOT12
Cyprus            0,46
Romania           0,49
Bulgaria          0,64
Latvia            0,66
Greece            0,69
Croatia           0,75
Slovakia          0,82
Malta             0,84
Lithuania         0,90
Poland            0,90
Italy             1,27
Spain             1,30
Hungary           1,30
Luxembourg        1,46
Portugal          1,50
Ireland           1,72
United Kingdom    1,72
Czech Republic    1,88
Netherlands       2,16
Estonia           2,18
Belgium           2,24
France            2,29
Slovenia          2,80
Austria           2,84
Denmark           2,98
Germany           2,98
Sweden            3,41
Finland           3,55

Statistical Analysis with Excel

                             2004    2012
Mode     =MODE(.. , ..)      0,51    2,98
Median   =MEDIAN(.. , ..)    1,08    1,48

In 2012, the mean is 1,67% of GDP, whereas the median (1,48%) is noticeably smaller.
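As a cross-check outside Excel, the RD_TOT12 values listed on this slide can be fed to Python's standard `statistics` module. This is only a sketch: the decimal commas of the slide are written as decimal points, and `multimode` is used because it returns every modal value rather than a single one.

```python
import statistics

# RD_TOT12 values (% of GDP) as listed on the slide, sorted in ascending order.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

mean = statistics.mean(rd_tot12)
median = statistics.median(rd_tot12)     # average of the 14th and 15th values
modes = statistics.multimode(rd_tot12)   # all values sharing the highest frequency

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"modes  = {modes}")
```

Four values occur twice, so the list of modes has four entries, which illustrates why a single "mode" figure says little about this variable.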


STATISTICAL TREATMENT: Measures of dispersion [01]

Range: difference between the highest and the lowest data element.

$Range = X_{max} - X_{min}$

Dispersion Ratio: quotient of the highest and the lowest data element.

$DR = \frac{X_{max}}{X_{min}}$

Percentile (p%): the value of the variable below which p% of the elements fall. For dispersion analysis, the 5% and 95% percentiles are very useful.

Examples

Statistical Analysis with Excel

                             2004    2012
Minimum    =MIN(.. , ..)     0,37    0,46
Maximum    =MAX(.. , ..)     3,58    3,55
Range      =MAX - MIN        3,21    3,09
DR ratio   =MAX / MIN        9,68    7,72

Conclusions:
______________________________________
______________________________________
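The range and dispersion ratio for 2012 can be reproduced from the RD_TOT12 values given earlier in the lecture. The sketch below also computes the 5% and 95% percentiles with `statistics.quantiles`; percentile values depend on the interpolation rule, so other software may return slightly different figures.

```python
import statistics

# RD_TOT12 values (% of GDP) from the lecture, decimal commas written as points.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

x_min, x_max = min(rd_tot12), max(rd_tot12)
value_range = x_max - x_min          # Range = Xmax - Xmin
dispersion_ratio = x_max / x_min     # DR = Xmax / Xmin

# n=20 yields 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(rd_tot12, n=20)
p05, p95 = cuts[0], cuts[-1]

print(f"range = {value_range:.2f}, DR = {dispersion_ratio:.2f}")
print(f"5th percentile = {p05:.2f}, 95th percentile = {p95:.2f}")
```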


STATISTICAL TREATMENT: Measures of dispersion [02]

Variance: the average squared distance of each score from the mean.

$V[X] = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Weighted Variance: the weighted average squared distance of each score from the mean.

$V_w[X] = \sum_{i=1}^{n} w_i (X_i - \bar{X})^2$, with $\sum_{i=1}^{n} w_i = 1$

Standard deviation: σ = square root of the variance.

Coefficient of Variation (CV):

$CV = \frac{\sigma}{\bar{X}}$

Examples

Statistical Analysis with Excel

                                          2004     2012
Arithmetic Mean      =AVERAGE(.. , ..)    1,34     1,67
Variance             =VAR(.. , ..)        0,809    0,880
Standard Deviation   =STDEV(.. , ..)      0,900    0,938
CV coefficient       =STDEV / AVERAGE     0,67     0,56
                                          67%      56%

Conclusions:
______________________________________
______________________________________
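The 2012 variance and standard deviation can be recomputed from the RD_TOT12 values given earlier in the lecture. The sketch below assumes the sample versions (divisor n - 1), which is what Excel's VAR and STDEV compute, and then forms the coefficient of variation as standard deviation divided by the arithmetic mean.

```python
import statistics

# RD_TOT12 values (% of GDP) from the lecture, decimal commas written as points.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

mean = statistics.mean(rd_tot12)
variance = statistics.variance(rd_tot12)   # sample variance, divisor n - 1
std_dev = statistics.stdev(rd_tot12)       # square root of the sample variance
cv = std_dev / mean                        # coefficient of variation

print(f"mean = {mean:.2f}, variance = {variance:.3f}, st. dev. = {std_dev:.3f}")
print(f"CV = {cv:.2f} ({cv:.0%})")
```

Because the CV is a ratio, it has no unit and can be compared across variables measured on different scales.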


STATISTICAL TREATMENT: Measures of dispersion [03]

Weighted Coefficient of Variation (wCV):

$wCV = \frac{\sqrt{\sum_{i=1}^{n} w_i (X_i - \bar{X})^2}}{\bar{X}}$

With spatial units, $w_i$ is generally the population weight of spatial unit i in the total area under examination:

$w_i = \frac{Pop_i}{Pop}$

Considering the 28 EU countries, $Pop_i$ = population of country i and $Pop$ = EU population.

Examples

Statistical Analysis with Excel

Unweighted measures:

                                          2004     2012
Arithmetic Mean      =AVERAGE(.. , ..)    1,34     1,67
Variance             =VAR(.. , ..)        0,809    0,880
Standard Deviation   =STDEV(.. , ..)      0,900    0,938
CV coefficient       =STDEV / AVERAGE     0,67     0,56
                                          67%      56%

Weighted measures:

                                                           2004     2012
Arithmetic Mean      =AVERAGE(.. , ..)                     1,32     1,57
Weighted variance    (see calculation on columns J & K)    0,662    0,680
Weighted St. Deviation                                     0,813    0,825
wCV                  =wSTDEV / AVERAGE                     0,607    0,494
                                                           61%      49%

Conclusions:
______________________________________
______________________________________
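A minimal sketch of the weighted coefficient of variation follows. The three values and populations are hypothetical, and the function assumes that deviations are taken from the unweighted arithmetic mean, in line with the "wCV = wSTDEV / AVERAGE" row; the exact spreadsheet calculation in columns J and K is not shown in the lecture and may differ.

```python
import math


def weighted_cv(values, populations):
    """Weighted CV with population-share weights w_i = Pop_i / Pop."""
    total_pop = sum(populations)
    weights = [p / total_pop for p in populations]
    mean = sum(values) / len(values)  # assumption: unweighted arithmetic mean
    w_variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(w_variance) / mean


# Hypothetical example: the two extreme regions are small, the average one is large.
values = [1.0, 2.0, 3.0]
populations = [10, 80, 10]

print(f"wCV = {weighted_cv(values, populations):.3f}")
```

With these numbers the extreme regions hold only 20% of the population, so the weighted measure is much lower than an unweighted one would be. This is the same effect that makes the wCV smaller than the CV in the Excel example.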


STATISTICAL TREATMENT: Normal / Gaussian Distribution

Representation [01]

Perfectly symmetric distribution of the random variable around the mean value.

Mean = Median = Mode.

Standard Normal Distribution:

If $X \sim N(\mu, \sigma^2)$ (Normal distribution), then the standardized variable $Z \sim N(0, 1)$, where:

$Z = \frac{X - \mu}{\sigma}$

$P(X < \mu) = 0,5$ (50%)
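Standardization can be illustrated with `statistics.NormalDist` from the Python standard library. The parameters below are illustrative only: treating the R&D variable as exactly Normal with these values is an assumption made for the sketch, not a result from the lecture.

```python
from statistics import NormalDist

# Illustrative parameters (assumed for this sketch).
mu, sigma = 1.67, 0.938
X = NormalDist(mu, sigma)
Z = NormalDist(0.0, 1.0)  # standard Normal distribution


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


p_below_mean = X.cdf(mu)          # half of the distribution lies below the mean
z = standardize(2.98, mu, sigma)  # position of 2.98 in standard-deviation units

# Probabilities are unchanged by standardization: P(X < x) = P(Z < z).
same_probability = abs(X.cdf(2.98) - Z.cdf(z)) < 1e-12

print(p_below_mean, round(z, 2), same_probability)
```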


STATISTICAL TREATMENT: Normal / Gaussian Distribution

Representation [02]

The shape of the distribution of a Normal variable depends on the specific values of its two parameters: mean and variance.

High value of variance → flattened curve (see blue curve): there is no concentration of values around the mean.

Small value of variance → high concentration around the mean value, low degree of variability (see red curve).


STATISTICAL TREATMENT: Confidence Interval

Confidence interval: an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

Confidence limits: the lower and upper boundaries of a confidence interval, that is, the values which define the range of a confidence interval.

The confidence interval is very informative because its width gives us some idea of how uncertain we are about the unknown parameter.

Confidence Level

Confidence level: the probability value (1-α) associated with a confidence interval. If α = 5%, the confidence level is (1-0,05) = 0,95, i.e. a 95% confidence level.

$CI_{(1-\alpha)} = \bar{X} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Statistical Analysis with Excel

                                                           2004     2012
Arithmetic Mean       =AVERAGE(.. , ..)                    1,34     1,67
Margin of Error       =CONFIDENCE(α, STDEV, sample size)   0,333    0,347
Confidence Interval   Lower bound                          1,008    1,322
                      Upper bound                          1,674    2,016

In this example, we choose α = 0,05 (5%), i.e. a 95% CI, with sample size = 28.
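The 2012 margin of error and confidence limits can be recomputed from the RD_TOT12 values given earlier in the lecture. The sketch uses the Normal quantile, as Excel's CONFIDENCE function does; with only 28 observations a Student t quantile would give a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist

# RD_TOT12 values (% of GDP) from the lecture, decimal commas written as points.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

alpha = 0.05                               # 95% confidence level
n = len(rd_tot12)
mean = statistics.mean(rd_tot12)
s = statistics.stdev(rd_tot12)             # sample standard deviation

z = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided Normal quantile, about 1.96
margin = z * s / math.sqrt(n)              # margin of error
lower, upper = mean - margin, mean + margin

print(f"margin of error = {margin:.3f}")
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")
```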


STATISTICAL TREATMENT: Measures of Trends

Kurtosis [a4]: a measure of the "peakedness" of the probability distribution of a random variable.

$a_4 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{(n-1) \cdot \sigma^4} - 3$

a4 = 0: Normal distribution
a4 > 0: Peaked distribution
a4 < 0: Flat distribution

Skewness [a3]: a measure of the asymmetry of the probability distribution of a random variable.

$a_3 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{(n-1) \cdot \sigma^3}$

a3 = 0: symmetric distribution (as for the Normal distribution)
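Both shape measures can be sketched directly from their moment-based definitions. The version below assumes a divisor of (n - 1) and subtracts 3 from the kurtosis so that a Normal distribution scores 0; Excel's SKEW and KURT apply additional small-sample corrections, so their results can differ slightly. The two small data sets are hypothetical.

```python
import statistics


def skewness(xs):
    """a3: sum of cubed deviations over (n - 1) * s^3."""
    n, m, s = len(xs), statistics.mean(xs), statistics.stdev(xs)
    return sum((x - m) ** 3 for x in xs) / ((n - 1) * s ** 3)


def excess_kurtosis(xs):
    """a4: sum of fourth-power deviations over (n - 1) * s^4, minus 3."""
    n, m, s = len(xs), statistics.mean(xs), statistics.stdev(xs)
    return sum((x - m) ** 4 for x in xs) / ((n - 1) * s ** 4) - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, long right tail

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric), excess_kurtosis(right_skewed))
```

A positive a3 signals a long right tail (a few very high values), a negative a3 a long left tail.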


STATISTICAL TREATMENT: Measures of Correlation

Question: to what extent are the R&D expenditures in 2012 correlated with the R&D expenditures in 2004?

We would normally expect a strongly positive coefficient: countries with high initial expenditures tend to keep high expenditures.

Pearson coefficient of correlation rp: it indicates the strength and the direction of a linear relationship between two random variables (X and Y).

$r_p = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(X_i - \bar{X})(Y_i - \bar{Y})}{s_X \, s_Y}$

The correlation coefficient does not indicate a cause-and-effect relationship.

Spearman coefficient of correlation rs: it indicates the strength and the direction of a monotonic relationship (not necessarily linear) between two random variables.

Examples

Statistical Analysis with Excel

Pearson Correlation   =CORREL(D2:D29,E2:E29)   0,914
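The difference between the two coefficients can be sketched in plain Python. The Spearman coefficient is computed here as the Pearson coefficient of the ranks (with average ranks for ties), which is one standard way to obtain it. The data are hypothetical, since the 2004 values are not listed in the lecture.

```python
import math


def pearson(xs, ys):
    """Pearson r; the 1/(n - 1) factors of covariance and deviations cancel out."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(xs):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rs as the Pearson coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y grows with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```

Here the ordering of the two variables is identical, so the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1 because the relationship is not a straight line.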


INTRODUCTION TO METHODS OF SPATIAL ECONOMIC ANALYSIS: USING SPSS


DATA FOR ANALYSIS: From EXCEL to SPSS

Source: Eurostat, EU 2020 indicators

The Excel file LECTURE_02_DATA.xls has to be closed.

The worksheet with the appropriate format is: Data_SPSS

The data that we are going to open through SPSS are in the range: A1:M29

The 1st row has to contain the names of the variables.

The names of the variables cannot contain special characters such as space, %, @, $, *, / etc.

It is advisable to use short names for the variables, because each variable can be described in detail in the Label column of the sheet describing the variables. Example: Population in 2004 = POP04 (POP 04 is not allowed because of the space).
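Before importing, the header row can be screened for names that break the rule above. The helper below is a hypothetical sketch that applies a deliberately conservative convention (letters, digits and underscores only, starting with a letter); it does not reproduce SPSS's exact naming rules.

```python
import re

# Conservative convention assumed for this sketch: a letter first,
# then only letters, digits or underscores.
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_safe_name(name):
    """True if the name contains no space or special character."""
    return _SAFE_NAME.fullmatch(name) is not None


def suggest_name(label):
    """Hypothetical helper: drop every character outside the safe set."""
    return re.sub(r"[^A-Za-z0-9_]", "", label)


header = ["COUNTRY", "POP04", "POP 04", "RD%GDP", "RD_TOT12"]
for name in header:
    status = "ok" if is_safe_name(name) else f"rename, e.g. {suggest_name(name)}"
    print(f"{name!r}: {status}")
```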


DATA FOR ANALYSIS: From EXCEL to SPSS

Command: File → Open → Data

1. Select the Excel file type.

2. Then select the file LECTURE_02_DATA.xls.

3. Click Open.

A new window opens, in which we can select the appropriate worksheet (Data_SPSS).

You will also have to check the range.


DATA FOR ANALYSIS: Data in SPSS

As you can observe, the names of the variables appear in the initial, unnumbered row. Consequently, row 1 contains the data of the first country (Belgium).

Each SPSS data file has two sheets:

Data View, with the data.

Variable View, where you can enter information about your data and specify the nature and the meaning of each variable.


DATA FOR ANALYSIS: Statistical Treatment with SPSS

How do we obtain the most important statistical parameters of our variables in order to proceed to an exploratory analysis (descriptive statistics)?

Use the following command:

Analyze → Descriptive Statistics → Explore


DATA FOR ANALYSIS: Statistical Treatment with SPSS

1. Select the variables to be explored from the left-hand list. Several variables can be selected at once, and the results are produced for each of them.

2. Move the variables to the right pane: Dependent List.

With Explore, the statistical parameters are calculated together with the Box-Plot, through which we can detect the presence of outliers.

In some cases, we want to examine the statistical parameters of one or more variables for sub-groups of the total population. In that case, the variable defining the sub-groups has to be moved to the Factor List pane.


DATA FOR ANALYSIS: Results from Explore

The results appear in a new sheet, Output, which is completely independent of the data sheet.

This sheet can be saved or exported to Word, Excel, etc.

All the results are summarized in the table.


DATA FOR ANALYSIS: Results from Explore

With Explore, we also obtain the Box-Plot for each variable.

This diagram allows us to check to what extent the variables follow an approximately Normal distribution, and it also allows us to detect the presence of outliers (values that are below or above the accepted thresholds).

In this case, there is no outlier.
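The outlier thresholds of a box plot can be approximated with the common 1.5 × IQR rule. The sketch below applies it to the RD_TOT12 values given earlier in the lecture; which variable the slide's box plot shows is not stated in the transcript, and SPSS computes its quartiles (hinges) slightly differently, so the thresholds are indicative only.

```python
import statistics

# RD_TOT12 values (% of GDP) from the lecture, decimal commas written as points.
rd_tot12 = [
    0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
    1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
    2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55,
]

q1, q2, q3 = statistics.quantiles(rd_tot12, n=4)  # quartiles; q2 is the median
iqr = q3 - q1                                     # interquartile range (box height)

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in rd_tot12 if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1:.3f}, median = {q2:.3f}, Q3 = {q3:.3f}")
print(f"thresholds = [{lower_fence:.3f}, {upper_fence:.3f}]")
print(f"outliers = {outliers}")
```

For these values every observation falls inside the thresholds, so the list of outliers is empty.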