
Page 1: Psych 5510/6510

1

Psych 5510/6510

Chapter Five

Simple Models: Statistical Inferences about Parameter Values

Spring, 2008

Page 2: Psych 5510/6510

2

Contents

1. Understanding what we are doing
2. Computing PRE and estimating η²
3. Determining whether the PRE is statistically significant (i.e. whether the additional parameter of Model A is 'worthwhile')
4. Confidence interval of the parameter
5. PRE as effect size
6. Power

Page 3: Psych 5510/6510

3

1) Understanding What We Are Doing

Page 4: Psych 5510/6510

4

Example

The mean score on a math test among third graders at some school has been 65 for the last several years. A third grade teacher tries a new teaching method, and the 15 students in her class earn a mean score of 78.1 on the test. The principal would like to determine whether the population represented by those 15 students (i.e. the population of students who may be taught with the new teaching method) has a mean that is different from 65.

Page 5: Psych 5510/6510

5

Single-group t test (from last semester)

The hypotheses (two-tailed) for the population of students taught with the new method:

H0: μ = 65
HA: μ ≠ 65

Draw the sampling distribution of the mean assuming H0 is true.

Set up rejection regions, d.f. = N − 1 = 14,

tcritical = ±2.145 (for the two-tailed test)

Page 6: Psych 5510/6510

6

t test (continued)

Y = 72, 52, 93, 86, 96, 46, 55, 74, 129, 61, 57, 115, 79, 89, 68

Ȳ = 78.13   est. σY = 23.63   est. σȲ = est. σY / √n = 23.63 / √15 = 6.10

tobtained = (Ȳ − 65) / est. σȲ = (78.13 − 65) / 6.10 = 2.15

Since |tobtained| = 2.15 > tcritical = 2.145, Reject H0
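If you want to check these numbers outside of SPSS, a minimal Python sketch of the same single-group t test (assuming scipy is available; the variable names are mine) would be:

```python
from scipy import stats

# Math-test scores for the 15 students taught with the new method
y = [72, 52, 93, 86, 96, 46, 55, 74, 129, 61, 57, 115, 79, 89, 68]

# Two-tailed single-group t test against the old mean of 65
t_obtained, p_value = stats.ttest_1samp(y, popmean=65)
print(round(t_obtained, 2), round(p_value, 4))   # roughly 2.15 and 0.0496
```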

Page 7: Psych 5510/6510

7

Doing the same thing with our new model comparison approach

Advantages:
– Uses an approach that generalizes to other experimental designs.
– Gives us an estimate of the size of the effect.
– Impresses your friends.

Page 8: Psych 5510/6510

8

Quick Review

Yi = Ŷi + ei

• Simplest model, no parameters: Ŷi = B0, where B0 equals some constant. B0 is not considered a parameter in this approach because it is not estimated from the current data.

• Next simplest model, one parameter that makes a non-conditional prediction (the same prediction for everyone): Ŷi = β0, where β0 equals μ. The estimate of μ will come from the sample.

Page 9: Psych 5510/6510

9

Our Models

MODEL C (compact model): Ŷi = B0, where B0 = 65.  PC = 0

MODEL A (augmented model): Ŷi = β0, where β0 = μ.  PA = 1

Page 10: Psych 5510/6510

10

Hypotheses

Model C: Ŷi = B0, where B0 = 65
Model A: Ŷi = β0, where β0 = μ

We will start off with a two-tailed test. There are several equivalent ways of expressing our hypotheses. We could use:

H0: β0 = B0
HA: β0 ≠ B0

In the model comparison approach it is always the case that if H0 is true then Model A is the same as Model C.

Page 11: Psych 5510/6510

11

Hypotheses

Model C: Ŷi = B0, where B0 = 65
Model A: Ŷi = β0, where β0 = μ

H0: β0 = B0
HA: β0 ≠ B0

Given that B0 = 65 and β0 = μ, we could also state the hypotheses as follows (which is how they appear in the t test for a single group mean):

H0: μ = 65
HA: μ ≠ 65

Page 12: Psych 5510/6510

12

Hypotheses

Model C: Ŷi = B0, where B0 = 65
Model A: Ŷi = β0, where β0 = μ

Remember that η² represents the actual reduction in error from moving from Model C to Model A in the population from which we are sampling. One way of expressing our hypotheses that works for every use of the model comparison approach is:

H0: η² = 0
HA: η² > 0

Page 13: Psych 5510/6510

13

HypothesesThis gives us at least three equivalent ways of

expressing the hypotheses of this experiment.

H0: β0 = B0

HA: β0 B0

H0: μ = 65

HA: μ 65

H0: η2=0 HA: η2>0

Page 14: Psych 5510/6510

14

What We Are Doing

Model C: Ŷi = B0, where B0 = 65

Model A: Ŷi = β0, where β0 = μ

If we find it is ‘worthwhile’ to go to Model A then we would be saying that it is better to use the mean of the population than it is to use ’65’ as our model. This implies that the mean of the population must not be 65 (which is what we are trying to determine).

Page 15: Psych 5510/6510

15

How We Are Doing It

Model C: Ŷi = B0, where B0 = 65

Model A: Ŷi = β0, where β0 = μ

To determine whether it is worthwhile to move to Model A we need to examine the error that results from applying each model to our sample. For Model A, however, we do not know the actual value of μ, so we will estimate the value of μ using the data from our sample.

Model A: Ŷi = b0, where b0 = est. μ = mean of our sample = 78.1

Page 16: Psych 5510/6510

16

MODEL C: Ŷi = 65          MODEL A: Ŷi = 78.1

SSE = Σ(Yi − Ŷi)²

SSE(C) = Σ(Yi − 65)²      SSE(A) = Σ(Yi − 78.1)²

SSR = SSE(C) − SSE(A)

PRE = SSR / SSE(C)

Q: Does Model A reduce error enough to be worthwhile? This will be determined by whether or not the PRE is statistically significant.
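A minimal Python sketch of these definitions (not part of the original slides; numpy assumed) shows how SSE(C), SSE(A), SSR, and PRE come directly from the raw scores:

```python
import numpy as np

y = np.array([72, 52, 93, 86, 96, 46, 55, 74, 129, 61, 57, 115, 79, 89, 68])

sse_c = np.sum((y - 65) ** 2)         # error of Model C, which predicts 65 for everyone
sse_a = np.sum((y - y.mean()) ** 2)   # error of Model A, which predicts the sample mean
ssr = sse_c - sse_a                   # error reduced by moving to Model A
pre = ssr / sse_c                     # proportional reduction in error

print(sse_c, round(sse_a, 1), round(ssr, 1), round(pre, 3))
# about 10403, 7815.7, 2587.3, 0.249; the slides round the mean to 78.1 and so
# report the slightly different values 10391.48, 7817.28, 2574.2, and 0.248
```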

Page 17: Psych 5510/6510

17

Why SSE(A) is Always ≤ SSE(C)

Look at the formula for SSE(A) and remember from last semester that the sample mean always leads to the smallest possible SS. If the constant proposed in Model C equals the sample mean then SSE(C) will equal SSE(A); otherwise SSE(A) will be smaller than SSE(C).

SSE(C) = Σ(Yi − 65)²

SSE(A) = Σ(Yi − Ȳ)² = Σ(Yi − 78.1)²

Page 18: Psych 5510/6510

18

Continuing This Thought

Model C: Ŷi = B0, where B0 = 65
Model A: Ŷi = β0, where β0 = μ

But we actually test:
Model A: Ŷi = b0, where b0 = est. μ = the sample mean

Our hypotheses are
H0: β0 = B0 or μ = 65
HA: β0 ≠ B0 or μ ≠ 65

It might be that H0 is true and μ = 65, yet the sample mean will not exactly equal 65 due to chance; SSE(A) will still be less than SSE(C), and thus PRE will be greater than 0. This is one reason why we need to use significance testing to determine whether the PRE is enough greater than zero to reject H0.

Page 19: Psych 5510/6510

19

2) Computing PRE and Estimating η²

Page 20: Psych 5510/6510

20

Definitional Formulas for SS

SSE(C) = Σ(Yi − ŶCi)²

SSE(A) = Σ(Yi − ŶAi)²

SSR = SSE(C) − SSE(A) = Σ from i = 1 to n of (ŶCi − ŶAi)²

Page 21: Psych 5510/6510

21

Definitional Formulas (cont.)

You can get SPSS to do these formulas but it is tedious. You need to have SPSS create a new variable that equals the actual Y scores minus the predicted scores, then have SPSS create a second variable that equals the squared values of the first, and then finally have SPSS give you the sum of that variable.

SSE(C) = Σ(Yi − ŶCi)²

SSE(A) = Σ(Yi − ŶAi)²

Page 22: Psych 5510/6510

22

Definitional Formulas (cont.)

While there is no reason to use the right-most part of this formula, it does shed light on SSR. For each subject you subtract what Model A predicts their score will be from what Model C predicts their score will be, square those differences, and add them up. From this we can see that the more Model A differs from Model C, the more Model A reduces error (which is what SSR measures).

SSR = SSE(C) − SSE(A) = Σ from i = 1 to n of (ŶCi − ŶAi)²

Page 23: Psych 5510/6510

23

Computational Formulas

While SPSS will often give us what we need this semester, it does not directly provide the values we need for this particular use of the model comparison approach (performing the equivalent of the t test for a single group mean). SPSS can still be used, however, to do most of the number crunching for us.

Page 24: Psych 5510/6510

24

Computing SSE(A)

SSE(A) in this case is what we simply called the ‘SS’ last semester, the sum of the squared deviations from the mean. SPSS won’t give us the SS of a variable but it will give us the ‘variance’ of the variable (actually this is the estimate of the population variance based upon the sample).

MODEL A is: Ŷi = Ȳ, so SSE(A) = Σ(Yi − Ȳ)²

variance = Σ(Yi − Ȳ)² / (n − 1), so Σ(Yi − Ȳ)² = (variance)(n − 1)

Page 25: Psych 5510/6510

25

Computing SSE(A)

If we ask SPSS to find the 'variance' of variable Y (available through the 'Descriptive Statistics' item in the 'Analyze Data' menu), we find that the variance of the Y scores equals 558.38. With N = 15, we find:

SSE(A) = Σ(Yi − Ȳ)² = (variance)(n − 1) = (558.38)(15 − 1) = 7817.28

Page 26: Psych 5510/6510

26

Computing SSR

SSE(C) is not so easy to compute but we can get there by first computing SSR, which is easy to compute. In this context (doing the equivalent of a t test for a single group) the formula for SSR reduces to:

SSR = SSE(C) − SSE(A) = Σ from i = 1 to n of (ŶCi − ŶAi)²

SSR = n(B0 − Ȳ)²

Page 27: Psych 5510/6510

27

Computing SSR

In our example SSR equals:

SSR = n(B0 − Ȳ)² = 15(65 − 78.1)² = 15(−13.1)² = 15(171.61) = 2574.2

Page 28: Psych 5510/6510

28

Computing SSE(C)

Now that we have SSE(A) and SSR finding SSE(C) is easy.

Since SSR=SSE(C)-SSE(A), then:

SSE(C) = SSE(A) + SSR

SSE(C) = 7817.28 + 2574.2 = 10391.48
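The computational shortcut on the last few pages is easy to mirror in Python; a sketch (numpy assumed, and using the unrounded mean, so the values differ slightly from the slides' rounded ones) is:

```python
import numpy as np

y = np.array([72, 52, 93, 86, 96, 46, 55, 74, 129, 61, 57, 115, 79, 89, 68])
n, b0 = len(y), 65

variance = y.var(ddof=1)              # the 'variance' SPSS reports (n - 1 in the denominator)
sse_a = variance * (n - 1)            # SSE(A) = (variance)(n - 1)
ssr = n * (b0 - y.mean()) ** 2        # SSR = n(B0 - sample mean)^2
sse_c = sse_a + ssr                   # SSE(C) = SSE(A) + SSR

print(round(sse_a, 1), round(ssr, 1), round(sse_c, 1))
# about 7815.7, 2587.3, 10403.1 with the unrounded mean of 78.133;
# the slides' rounded inputs give 7817.28, 2574.2, and 10391.48
```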

Page 29: Psych 5510/6510

29

Computing PRE

PRE = SSR / SSE(C) = 2574.2 / 10391.48 = 0.248

Model A led to about a 25% reduction in error compared to Model C.

Page 30: Psych 5510/6510

30

Estimating η²

PRE measures how much error was reduced by Model A in our sample. PRE is a biased estimate of how much Model A would reduce error if applied to our population (η²). PRE tends to be greater than η² . The following formula gives us an unbiased estimate of η² based upon our sample.

Page 31: Psych 5510/6510

31

Unbiased Estimate of η²

Note the last piece in the adjustment. The adjustment becomes bigger when PA is a lot larger than PC (i.e. when Model A adds lots of extra parameters), and the adjustment also becomes bigger as PA and PC approach n (the maximum number of parameters we can have). Finally, note that when PA = n we divide by zero (which is undefined); this makes sense because PRE will always equal 1 when n = PA, so there is no way to estimate the true value of η².

est. η² = 1 − (1 − PRE)(n − PC) / (n − PA)

est. η² = 1 − (1 − .248)(15 − 0) / (15 − 1) = 0.194
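As a small illustration of the adjustment formula (the function name below is mine, not from the text):

```python
def est_eta_squared(pre, n, pa, pc):
    """Unbiased estimate of eta-squared from a sample PRE (the adjustment shown above)."""
    return 1 - (1 - pre) * (n - pc) / (n - pa)

print(round(est_eta_squared(pre=0.248, n=15, pa=1, pc=0), 3))   # about 0.194
```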

Page 32: Psych 5510/6510

32

Our Computations So Far

SSE(C) = 10391.48   PC = 0
SSE(A) = 7817.28    PA = 1
SSR = 2574.2
PRE = .248
est. η² = .194

Page 33: Psych 5510/6510

33

3) Determining whether the PRE is statistically significant (i.e. whether the parameter of Model A is ‘worthwhile’).

Page 34: Psych 5510/6510

34

Worthwhileness

Model C: Ŷi = B0 where B0 = 65

Model A: Ŷi = β0 where β0 = μ

Moving from Model C to Model A reduced error by about 25%. If that reduction is statistically significant then we will reject Model C in favor of Model A, saying that the parameter of Model A is 'worthwhile' to add to our model: a model that uses the mean of the population is better than a model that uses the value 65, which would imply that the mean of the population must not be 65.

Page 35: Psych 5510/6510

35

Testing the Statistical Significance of the PRE

Three equivalent methods:

A. See if the PREobtained exceeds the PREcritical. This is the most direct way given the model comparison approach.

B. Change the PRE into a value of Fobtained (using what I call the 'PRE to F Method'), and see if it exceeds Fcritical. This approach is the most conceptually clear of the three.

C. Change the results into Mean Squares and from those compute a value of Fobtained (using what I call the 'Mean Square to F Method'), and see if it exceeds Fcritical. This approach best fits what is available in SPSS.

Page 36: Psych 5510/6510

36

Hypotheses (again)

Model C: Ŷi = B0, where B0 = 65
Model A: Ŷi = β0, where β0 = μ

We have various ways of expressing the hypotheses of this experiment.

H0: β0 = B0 or μ = 65 or η² = 0
HA: β0 ≠ B0 or μ ≠ 65 or η² > 0

To understand our first approach we will focus on η².

Page 37: Psych 5510/6510

37

A) Comparing PREobt to PREc

H0: η2=0 (There is no real reduction in error from Model A, the extra parameter of Model A is not worth incorporating into our model)

HA: η2>0 (There is a real reduction in error from Model A, the extra parameter of Model A is worth incorporating into our model)

So we need to look at the PRE from the sample to see if it is large enough to conclude that η² (i.e. the PRE in the population) is actually greater than zero. But we know that PRE probably won't exactly equal η², and in fact is usually greater than η², so we need to determine the value of PRE at or above which there is only a 5% chance of the obtained PRE falling if H0 is true.

Page 38: Psych 5510/6510

38

PREcritical Values

See the handout on PRE critical values:
PA = the number of parameters in MODEL A = 1
PC = the number of parameters in MODEL C = 0
N = the number of observations = 15

PREcritical = .247

If H0 is true (η² = 0) then there is a 95% chance that PREobt will be between 0 and .247, and only a 5% chance that PREobt will be .247 or above. If PREobt ≥ PREcritical then reject H0 (p < .05).

PREobt = .248, so we reject H0: we conclude that it is worthwhile to move to Model A, which in this context means it is worth estimating μ, for μ is not what was proposed by Model C (i.e. μ is not 65).

Page 39: Psych 5510/6510

39

Using the PRE Tool

We could, instead, use the 'PRE Tool'. Plugging in the values for PC, PA, N, and PRE, we find that the p value for a PRE of 0.248 is p = 0.0496, which is less than or equal to our significance level of .05, so we reject H0.
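If the PRE Tool is not at hand, the same p value and PREcritical can be sketched in Python from the F distribution (this uses the PRE-to-F relation introduced on the next slides; scipy assumed):

```python
from scipy import stats

pre, pa, pc, n = 0.248, 1, 0, 15

# Convert the PRE to an F statistic, then get its p value from the F distribution
f_star = (pre / (pa - pc)) / ((1 - pre) / (n - pa))
p_value = stats.f.sf(f_star, dfn=pa - pc, dfd=n - pa)

# PREcritical is the PRE that corresponds to Fcritical at alpha = .05
f_crit = stats.f.ppf(0.95, dfn=pa - pc, dfd=n - pa)
pre_crit = f_crit * (pa - pc) / (f_crit * (pa - pc) + (n - pa))

print(round(f_star, 2), round(p_value, 4), round(pre_crit, 3))   # about 4.62, 0.0496, 0.247
```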

Page 40: Psych 5510/6510

40

PRE and F*

For the second, equivalent approach to testing the statistical significance of the PRE we will translate the PRE value into an F value. Note: the book uses ‘F*’ to represent Fobtained.

F* and PRE are related, to know one is to know the other (and to love one is to love the other).

Page 41: Psych 5510/6510

41

B) PRE to F* Approach

First, calculate the PRE per parameter added.

PRE per parameter added = PRE / (# parameters added) = PRE / (PA − PC) = .248 / 1 = .248

Page 42: Psych 5510/6510

42

Second, calculate the ‘remaining proportion of error per parameters remaining’.

(1-PRE): remaining proportion of error (after incorporating MODEL A).

(N-PA): maximum number of possible parameters that could be added after incorporating MODEL A.

Remaining proportion of error per parameter remaining:

(1 − PRE) / (n − PA) = (1 − .248) / (15 − 1) = .752 / 14 = .054

Page 43: Psych 5510/6510

43

F*

If adding parameters to MODEL A only helped as much as we would expect due to chance, then the numerator will equal the denominator and F* will approximately equal one*. If the parameters helped more than chance then the numerator will be greater than the denominator and F* will be greater than one.

* (actually, when only chance is operating, the expected value of F is (n − PA)/(n − PA − 2), which is approximately 1)

F* = (PRE per parameter added) / (remaining proportion of error per parameter remaining)

F* = [PRE / (PA − PC)] / [(1 − PRE) / (n − PA)]

Page 44: Psych 5510/6510

44

Our Example

F* = [.248 / (1 − 0)] / [.752 / (15 − 1)] = .248 / .0537 = 4.62

Obtain Fcritical from the F critical table, or from the 'F Tool', with: d.f. numerator = (PA − PC) = 1, and d.f. denominator = (n − PA) = 14.

Fcritical = 4.60 and F* = 4.62, so reject H0 (p < .05).

In the 'F Tool', p = 0.0496, the same p value as we found for the PRE.
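As a quick cross-check (not shown in the slides), scipy can confirm Fcritical and the p value, and with one numerator d.f., F* is simply the square of the t statistic from the earlier single-group t test:

```python
from scipy import stats

f_star, df_num, df_den = 4.62, 1, 14

f_critical = stats.f.ppf(0.95, dfn=df_num, dfd=df_den)   # about 4.60
p_value = stats.f.sf(f_star, dfn=df_num, dfd=df_den)     # about 0.049

# With one numerator df, F* is the square of the single-group t statistic (2.15^2 is about 4.62)
print(round(f_critical, 2), round(p_value, 4), round(f_star ** 0.5, 2))
```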

Page 45: Psych 5510/6510

45

C) Mean Squares to F* Approach

Source        SS        df         MS
Regression    SSR       PA − PC    MSRegression = SSR / (PA − PC)
Residual      SSE(A)    n − PA     MSResidual = SSE(A) / (n − PA)
Total         SSE(C)    n − PC

F* = MSRegression / MSResidual

Note: MSResidual is a.k.a. MSerror.
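A short sketch of the mean-squares route, using the (unrounded) sums of squares computed earlier; the table on the next page rounds these values slightly:

```python
# Mean-squares route to F*, using the sums of squares computed earlier
ssr, sse_a = 2587.3, 7815.7
pa, pc, n = 1, 0, 15

ms_regression = ssr / (pa - pc)     # MS for the regression (Model A's extra parameter)
ms_residual = sse_a / (n - pa)      # MS for the residual, a.k.a. MS error
f_star = ms_regression / ms_residual

print(round(ms_regression, 1), round(ms_residual, 1), round(f_star, 2))
# about 2587.3, 558.3, 4.63; the table on the next slide rounds these to 2587, 558, and 4.64
```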

Page 46: Psych 5510/6510

46

ANOVA summary table

Source (text)   Source (ANOVA)     SS      df   MS     F*     p
SSR             Regression         2587    1    2587   4.64   .0496
SSE(A)          Residual (error)   7816    14   558
SSE(C)          Total              10403   15

Exact p value available through the F Tool, with a table you could just say p<.05.

Page 47: Psych 5510/6510

47

Summary

All three approaches (looking up the PREcritical value, turning the PRE into an F* value, or computing MS's) lead to exactly the same p value of .0496, so your decision would be to 'reject H0'. If there are no confounding variables then you can conclude that the extra parameter of Model A is worthwhile, and in turn that the population of students taught using the new method has a different mean than those taught using the old method (i.e. that μ ≠ 65).

Page 48: Psych 5510/6510

48

If p had been greater than .05, you would have decided to ‘not reject H0’, and said that you were unable to show that the extra parameter of Model A was worthwhile, that you were unable to show that the mean of the population of students taught using the new method differs from 65.

Remember that you should not infer that H0 is true, only that you failed to reject it. You have not proven that the parameters of Model A are worthless; you have only failed to prove they are worthwhile. Showing that you have a great deal of power in the experiment, on the other hand, might allow you to infer that H0 is true.

Page 49: Psych 5510/6510

49

Performing a One-Tail Test

The approach we have used so far is for a two-tailed hypothesis:
H0: β0 = B0 or μ = 65
HA: β0 ≠ B0 or μ ≠ 65

We can, with a little extra work, also test a one-tailed hypothesis:

Page 50: Psych 5510/6510

50

One-Tailed Hypotheses

Remember that HA states the theory we are trying to prove. Thus if the theory predicts that math scores will increase, then:
H0: β0 ≤ B0 or μ ≤ 65
HA: β0 > B0 or μ > 65

If the theory predicts that math scores will decrease, then:
H0: β0 ≥ B0 or μ ≥ 65
HA: β0 < B0 or μ < 65

Page 51: Psych 5510/6510

51

Making Your Decision

If the data move in the direction predicted by HA then you divide the two-tail p value in half. Using the stat tools we found the two-tail p value =.0496, so in this case p=.0248, reject H0.

Page 52: Psych 5510/6510

52

Making Your Decision

If the data move in the opposite direction than that predicted by HA then:

p = 1 – (two tail p / 2).

Do not reject H0

p = 1 − (.0496 / 2) = 0.975

Page 53: Psych 5510/6510

53

SPSS

If you don’t have access to the stat tools to get a specific p value for your PRE or F*, then the simplest route would be to use SPSS to do a t test for a single mean to get the p value. Doing the t test for a single group mean is the exact equivalent to this use of the model comparison approach.

Page 54: Psych 5510/6510

54

4) Confidence Interval of β0

Page 55: Psych 5510/6510

55

Confidence Interval of β0

H0: β0 = B0
HA: β0 ≠ B0

Null hypothesis significance testing (NHST) is designed to let us determine whether we can say that β0 ≠ B0. As we discussed last semester, we might be more interested in knowing what the value of β0 is than in knowing whether it equals B0 or not. A confidence interval for the true value of β0 would be useful in that endeavor.

Page 56: Psych 5510/6510

56

Confidence Interval

β0 = b0 ± √(Fcritical;1,n−1;α × MSE / n)

β0 = 78.1 ± √((4.6)(558.3) / 15) = 78.1 ± 13.08

65.02 ≤ β0 ≤ 91.18

This gives us more information about the true value of β0 than simply knowing whether or not it likely equals 65. And, if we want to perform NHST, any hypothesis that proposes a value for β0 that does not fall in the interval can be rejected (including H0 in this case).
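A Python sketch of this confidence interval (numpy and scipy assumed; because it uses the unrounded mean and variance, the endpoints differ from the slide's by a few hundredths):

```python
import numpy as np
from scipy import stats

y = np.array([72, 52, 93, 86, 96, 46, 55, 74, 129, 61, 57, 115, 79, 89, 68])
n = len(y)

b0 = y.mean()                              # sample estimate of beta0 (i.e. of mu)
mse = y.var(ddof=1)                        # here MSE is just SSE(A)/(n - 1), the sample variance
f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 1)

half_width = np.sqrt(f_crit * mse / n)
print(round(b0 - half_width, 2), round(b0 + half_width, 2))   # about 65.05 and 91.22
```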

Page 57: Psych 5510/6510

57

Confidence Interval

65.02 ≤ β0 ≤ 91.18

Why do we care? In this use of the model comparison approach β0 = μ, so this is also the 95% confidence interval of the true value of μ, which is what we are probably really interested in:

65.02 ≤ μ ≤ 91.18

The easiest way to get the confidence interval for β0 from SPSS is to have SPSS give you the confidence interval of μ.

Page 58: Psych 5510/6510

58

5) PRE as Effect Size

Page 59: Psych 5510/6510

59

PRE as Effect Size

PRE is a measure of effect size: it tells us what proportion of the error is removed when we move from Model C to Model A in our sample. PRE is computed in every test we do using the model comparison approach, so we will always have a measure of effect size without having to do any additional work.

est. η² provides us an estimate of the effect size in the population from which we sampled.

Page 60: Psych 5510/6510

60

Effect Size

In this example:
• PRE = 0.248 (applying the model to the sample)
• est. η² = .194 (applying the model to the population)

These are measures of how much we gain in our model by assuming the mean of the population differs from what is proposed by H0 (in this case they reflect what we gain by assuming that the new teaching method has changed the mean score on the math test).

Page 61: Psych 5510/6510

61

6) Power

Page 62: Psych 5510/6510

62

Power

(from last semester)

Power = the probability that you will reject H0 when H0 is actually false.

(in the model comparison approach)

Power = the probability that you will reject MODEL C in favor of MODEL A when the extra parameters of MODEL A really are worth adding to the model.


Page 63: Psych 5510/6510

63

Computing Power

We will again be using GPower 3 to compute the power of our experiments (GPower 3 can be downloaded for free for both Macs and PCs...see the 'Course Materials' page).

The use of the model comparison approach to perform the equivalent of a t test for a single group is an atypical use of the approach. While GPower 3 can usually be used to compute power based upon PRE, it cannot in this case, as it will not accept PC = 0 as a possibility. To compute power with GPower 3 for this use, just treat it as a t test for a single group; GPower 3 knows how to handle that.
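If you prefer a scriptable alternative to GPower 3 (this is my suggestion, not something used in the course), statsmodels can compute the same single-group t-test power; the effect size here is assumed to be Cohen's d computed from the sample:

```python
from statsmodels.stats.power import TTestPower

# Effect size assumed to be Cohen's d: (sample mean - H0 mean) / est. standard deviation
d = (78.13 - 65) / 23.63

power = TTestPower().power(effect_size=d, nobs=15, alpha=0.05, alternative='two-sided')
print(round(power, 2))   # roughly 0.5, i.e. about an even chance of rejecting H0 for an effect this size
```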

Page 64: Psych 5510/6510

64

Assumptions

The assumptions underlying the model comparison approach focus on the residuals (error terms) of the model. For the use of the approach in this chapter the relevant assumptions are that:

1. The residuals are normally distributed.
2. The residuals are independent of each other.

We will take a closer look at the assumptions after we have looked at models that involve at least one predictor variable (they involve both the assumptions above plus a couple more).