Ch10-4page

8/22/2019 Ch10-4page


10 Hypothesis Testing

Hypothesis testing is used to investigate whether or not data are

consistent with some theory, when that theory can be quantified

through a particular value of a (population) parameter.

10.1 Four Basic Elements of a Hypothesis Test

Null hypothesis, $H_0$

Alternative hypothesis, $H_a$

Test statistic

Rejection region, RR (also called critical region)


1. The Null Hypothesis: denoted by $H_0$.

The null hypothesis $H_0$ is an assertion about the population (parameter). The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data.

E.g. An experiment on a new antidepressant drug: Ten people suffering from depression were sampled and treated with the new drug, and the level of depression of all subjects was measured after 12 weeks, denoted by $Y_1, \ldots, Y_n$. We want to compare the mean depression level $\mu$ of the patients taking the drug with that of patients not taking any drug (known to be $\mu_0 = 6$, say).

The null hypothesis would be designated by the following symbols: $H_0: \mu = 6$.

In this course, we will usually consider the simple null hypothesis; that is, there is only one possible value of $\mu$ under $H_0$.


2. The Alternative Hypothesis: denoted by $H_a$.

The alternative hypothesis $H_a$ describes values of $\mu$ other than those specified in $H_0$. It is usually the hypothesis that we seek to support based on the information contained in the sample data set.

In the previous example, if the researcher believes that the mean depression level for patients taking the new drug is

smaller than $\mu_0 = 6$, i.e. the drug is effective in reducing the depression level, the researcher will use $H_a: \mu < 6$ (left-tailed)

larger than $\mu_0 = 6$, i.e. the drug is not effective, the researcher will use $H_a: \mu > 6$ (right-tailed)

different from $\mu_0 = 6$, the researcher will use $H_a: \mu \ne 6$ (two-tailed)


Remark: Notice that $H_0$ being false doesn't necessarily mean that $H_a$ is true (unless the union of $H_0$ and $H_a$ constitutes the entire parameter space).

In general, the alternative is chosen to reflect the researcher's belief about the parameter. Therefore, $H_a$ is sometimes called the researcher's hypothesis. The aim is to see if the researcher's hypothesis is supported by the data set.

3. The Test Statistic: the test statistic is a statistic that is used to test $H_0$ versus $H_a$. We make our decision by comparing the observed value of the test statistic to its sampling distribution under $H_0$. Recall that a statistic is a function of the observed data $Y_1, \ldots, Y_n$ used for inference about (population) parameters.

For the antidepressant drug example, a test statistic would be based on the unbiased estimator of $\mu$:

$T = \bar{Y}$



Most often a test statistic is based on an MVUE or MLE of the parameter of interest that describes $H_0$.

If the observed value of the test statistic is consistent with its sampling distribution under $H_0$, then there is not enough evidence for $H_a$.

If the observed value of the test statistic is not consistent with its sampling distribution under $H_0$, and is in the direction of the sampling distribution specified under $H_a$, then there is enough evidence to reject $H_0$ (and support $H_a$).

4. The Rejection Region (RR): The rejection region (RR) specifies the values of the test statistic for which $H_0$ is rejected.

If $t_{obs} \in RR$, reject $H_0$.

If $t_{obs} \notin RR$, do not reject $H_0$.

The RR is usually located in the tails of the sampling distribution of the test statistic derived under $H_0$.

In the depression example, suppose $RR = \{\bar{Y} \le 4\}$ and that $\bar{y} = 4$ is observed. Then we reject $H_0$ and conclude that there is evidence that the drug, on average, reduces depression levels. (The sample mean, $\bar{y} = 4$, is significantly lower than the hypothetical mean, $\mu_0 = 6$.)

The above four elements: $H_0$, $H_a$, test statistic $T$, and RR are the building blocks of a statistical test. Any statistical test should include all of the above four elements.


Another Hypothesis Testing Example

Assessing ESP (Extra-sensory perception) ability

(fabricated data)

Hypotheses:

H0: Rachael does not have ESP (=random guessing)

Ha: Rachael has ESP

Experiment:

A deck of 52 cards has 26 red and 26 black. The cards are shuffled

and one selected at random. Rachael guesses the color of the card.

Data from experiment:

Carry out n = 20 repetitions, shuffling each time, observing correct or not correct.

Test statistic:

T = Number of correct responses out of n

Distribution of Test Statistic under $H_0$:

$T \sim \mathrm{Bin}(n = 20, p)$, where p is the success rate, with $H_0: p = 0.5$. (Trials assumed independent.)

Restate hypotheses in terms of parameters:

$H_0: p = 0.5$ [pure guessing]

$H_a: p > 0.5$ [responses informed by ESP]

Rejection Region: $RR = \{t_{obs} \ge 15\}$.

Outcome of the experiment: $t_{obs} = 16$

Conclusion:

Since $t_{obs} = 16 \in \{t_{obs} \ge 15\}$, the observed value falls in RR, contradicting the null hypothesis $H_0$. We reject $H_0$ as implausible and conclude that the observed rate of success ($\hat{p} = 0.8$) is significantly higher than $p_0 = 0.5$. There is evidence of ESP.



Hypothesis Testing: Errors

Because we are choosing between $H_0$ and $H_a$ based on the sample data, there is a chance that we make an error.

                 t_obs in RR           t_obs not in RR
                 (Reject H0)           (Do not reject H0)
H0 true          Type I error          correct decision
H0 false         correct decision      Type II error

Two types of errors can be made in reaching a decision.

Type I error: Reject $H_0$ when $H_0$ is true.

Type II error: Fail to reject $H_0$ when $H_0$ is false.


Error Probabilities

Probability of making a Type I error:

$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$

$\alpha$ is called the level of significance or the level of the test.

Probability of making a Type II error (at a particular/given value $\theta = \theta_a$ in $H_a$):

$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a).$


Error Probabilities

Example 10.1.1 In the ESP example, calculate $\alpha$, and $\beta$ when p = 0.7.

The Type I error probability:



The Type II error probability for p = 0.7:

$\beta(0.7) = P(\text{fail to reject } H_0 \mid p = 0.7) = P(T < 15 \mid p = 0.7) = P(T \le 14 \mid p = 0.7) = B(14; n = 20, p = 0.7) = 0.584.$

Practice: calculate $\beta(0.9)$.
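The two error probabilities in the ESP example are binomial tail sums, so they can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_cdf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(T <= k) for T ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, cutoff = 20, 15                           # RR = {T >= 15}
alpha = 1 - binom_cdf(cutoff - 1, n, 0.5)    # P(T >= 15 | p = 0.5)
beta_07 = binom_cdf(cutoff - 1, n, 0.7)      # P(T <= 14 | p = 0.7)
print(f"alpha = {alpha:.4f}, beta(0.7) = {beta_07:.3f}")
```

Changing the last argument of `binom_cdf` to 0.9 gives the practice quantity.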



Error Probabilities

Type I error probability $\alpha$ (also referred to as the level of significance, or less formally, the false positive rate).

Type II error probability (also referred to as the false negative rate):

$\beta = \beta(\theta_a) = P(\text{fail to reject } H_0 \mid \theta = \theta_a).$

Power: $\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a) = P(\text{reject } H_0 \mid \theta = \theta_a).$

Remark: Ideally we would like to reduce both $\alpha$ and $\beta$. However, with a fixed sample size n, we cannot reduce both of them. So we generally fix $\alpha$ at a small value (say 0.05 or 0.01), and construct an RR so that $\beta$ is minimized.


Example 10.1.2 Let $Y \sim \mathrm{Uniform}(\theta, \theta + 1)$. Consider $H_0: \theta = 0$ versus $H_a: \theta > 0$. The test procedure: reject $H_0$ if $Y > 0.95$.

1. Calculate the level of significance of this test.

2. Calculate $\beta$ when $\theta = 0.5$.
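One way to work this example, offered as a sketch: both probabilities are lengths of intervals, because a Uniform$(\theta, \theta + 1)$ density equals 1 on its support.

```latex
\alpha = P(Y > 0.95 \mid \theta = 0) = \int_{0.95}^{1} 1 \, dy = 0.05,
\qquad
\beta(0.5) = P(Y \le 0.95 \mid \theta = 0.5) = \int_{0.5}^{0.95} 1 \, dy = 0.45 .
```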


10.2 Large Sample Tests

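Suppose $Y_1, \ldots, Y_n$ is a rs with n large and from a distribution with parameter of interest $\theta$. Let $\hat\theta$ denote an estimator with large sample distribution $N(\theta, \sigma_{\hat\theta}^2)$.

For example, suppose $\hat\theta$ is a consistent and unbiased estimator of $\theta$. By the CLT, we often have, for large n,

$\hat\theta \overset{d}{\approx} N(\theta, \sigma_{\hat\theta}^2).$

We assume that $\sigma_{\hat\theta}^2$ is known or can be estimated by a consistent estimator $\hat\sigma_{\hat\theta}^2$.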


Large Sample Tests

Notational convention: $SD(\hat\theta) = \sigma_{\hat\theta}$ and $\widehat{SD}(\hat\theta) = \hat\sigma_{\hat\theta}$, $V(\hat\theta) = \sigma_{\hat\theta}^2$.

Null and alternative hypotheses: $H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$, where the value $\theta_0$ is specified.

Test statistic: $T = \hat\theta$

$RR = \{\hat\theta : \hat\theta > k\}$ (value of k to be determined)

Specify the level of significance, $\alpha$ (say 0.05 or 0.01), i.e., the largest Type I error probability that can be tolerated.

We determine the value of k so that the corresponding Type I error probability equals the pre-specified level $\alpha$. That is, we determine k by solving the equation:

$\alpha = P(\hat\theta > k \mid \theta = \theta_0)$



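$$\alpha \overset{set}{=} P(\hat\theta > k \mid \theta = \theta_0) = P\left(\frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > \frac{k - \theta_0}{\sigma_{\hat\theta}} \,\Big|\, \theta = \theta_0\right) \approx P\left(Z > \frac{k - \theta_0}{\sigma_{\hat\theta}}\right) = P(Z > z_\alpha).$$

Therefore, to solve this equation for k, we need to set

$$\frac{k - \theta_0}{\sigma_{\hat\theta}} = z_\alpha \;\Rightarrow\; k = \theta_0 + z_\alpha \sigma_{\hat\theta}$$

Thus,

$$RR = \{\hat\theta : \hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta}\} = \left\{\hat\theta : \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > z_\alpha\right\}.$$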


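Large Sample Test: Summary Procedure

State hypotheses (right-tailed alternative):

$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$

Test Statistic:

$Z = \dfrac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]

Rejection region:

$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.

If left-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$, then $RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.

If two-tailed test: $H_0: \theta = \theta_0$ versus $H_a: \theta \ne \theta_0$, then $RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.

Note that the alternatives are paired with the corresponding RRs.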


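Large Sample Test: Unknown Variance

Suppose that $\sigma_{\hat\theta}$ is unknown and that

$\hat\sigma_{\hat\theta} \xrightarrow{p} \sigma_{\hat\theta}$,

i.e. $\hat\sigma_{\hat\theta}$ is a consistent estimator of $\sigma_{\hat\theta}$. Then

$$Z = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}} = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} \cdot \frac{\sigma_{\hat\theta}}{\hat\sigma_{\hat\theta}}.$$

Since

$\dfrac{\sigma_{\hat\theta}}{\hat\sigma_{\hat\theta}} \xrightarrow{p} 1$, and $\dfrac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} \xrightarrow{d} N(0, 1)$,

it follows by Slutsky's Theorem that

$$Z = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}} \xrightarrow{d} N(0, 1).$$

Therefore, the same test procedure follows.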


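Large Sample Test: Unknown Variance

Use a consistent estimator of $\sigma_{\hat\theta}$, $\hat\sigma_{\hat\theta}$, in the Z-test statistic.

Test Statistic:

$Z = \dfrac{\hat\theta - \theta_0}{\widehat{SD}(\hat\theta)}$, [$Z \overset{approx}{\sim} N(0, 1)$ when $\theta = \theta_0$]

Right-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta > \theta_0$. $RR = \{z_{obs} : z_{obs} > z_\alpha\}$.

Left-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta < \theta_0$. $RR = \{z_{obs} : z_{obs} < -z_\alpha\}$.

Two-tailed test: $H_0: \theta = \theta_0$ vs $H_a: \theta \ne \theta_0$. $RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$.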



Large Sample Test: Examples

Example 10.2.1 A company wishes to test its claim that the average lifetime of the tire they sell is 20,000 miles. The experiment yields n = 36 observations with the sample mean $\bar{y} = 19{,}375$ miles. Carry out the hypothesis test and conclude at significance level $\alpha = 0.01$.

Suggested procedures:

Parameter of interest: $\theta = \mu$ = mean lifetime for the population

Construct null and alternative hypotheses:

$H_0: \mu = 20{,}000$ vs $H_a: \mu < 20{,}000$

Compute the test statistic:

Construct the rejection region at level 0.01:


Conclude:

Q: What if $H_a: \mu \ne 20{,}000$?


Example 10.2.2 (10.19 of WMS) A sample of 40 independent

readings on the voltage for this circuit gave a sample mean of 128.6,

and standard deviation 2.1. Test the hypothesis that the mean output

voltage is 130 against the alternative that it is less than 130. Use a

test with level 0.05.
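As a hedged illustration of the large-sample procedure applied to Example 10.2.2, the sketch below plugs the sample standard deviation in for the unknown $\sigma$ (justified by n = 40 being large) and uses the left-tailed rejection region. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n, ybar, s = 40, 128.6, 2.1        # sample size, sample mean, sample SD from the example
mu0, alpha = 130.0, 0.05

z_obs = (ybar - mu0) / (s / sqrt(n))         # s stands in for the unknown sigma
z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper-alpha quantile of N(0, 1), about 1.645
reject = z_obs < -z_alpha                    # left-tailed RR: z_obs < -z_alpha
print(f"z_obs = {z_obs:.2f}, cutoff = {-z_alpha:.3f}, reject H0: {reject}")
```

Here the observed statistic falls well inside the rejection region, so the data support a mean output voltage below 130.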


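Large Sample Test: Proportions

Data: $Y \sim \mathrm{Bin}(n, p)$

Hypotheses: $H_0: p = p_0$ vs $H_a: p > p_0$ for a given value of $p_0 \in (0, 1)$. The level $\alpha$ is specified.

Parameter of interest: $\theta = p$

Unbiased and consistent estimator: $\hat\theta = \hat{p} = Y/n$

$\sigma_{\hat\theta}^2 = V(Y/n) = p(1 - p)/n$. So under $H_0$, $\sigma_{\hat\theta}^2 = p_0(1 - p_0)/n$.

Test statistic:

$$Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \quad [\text{by CLT, } Z \xrightarrow{d} N(0, 1) \text{ under } H_0]$$

$RR = \{z_{obs} : z_{obs} > z_\alpha\}$.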



Large Sample Test: Proportion

Example 10.2.3 Each member of a panel of 100 tasters was presented with three glasses of beer in random order, one of which was different from the other two (e.g. AAB). Each taster was asked to identify which beer was different. Let p = P(B is correctly identified).

If the tasters are unable to distinguish between the beers we would expect p = 1/3. If they are able to distinguish we expect p > 1/3.

Suppose among 100 tasters, 40 answered correctly. Are tasters able to

distinguish? Carry out the hypothesis test at level 0.05.

Construct the appropriate null and alternative hypotheses:


Calculate the test statistic value:

Construct the rejection region at level 0.05

Conclude:

(BTW, what is the population here?)
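The steps listed for the beer-tasting example can be sketched numerically as follows. This is an illustration using the large-sample proportion test stated earlier, with the variance evaluated under $H_0$; it also reports the one-sided p-value.

```python
from math import sqrt
from statistics import NormalDist

n, y, p0, alpha = 100, 40, 1 / 3, 0.05
p_hat = y / n

z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # variance uses p0, the value under H0
z_alpha = NormalDist().inv_cdf(1 - alpha)        # about 1.645
reject = z_obs > z_alpha                         # right-tailed RR
p_value = 1 - NormalDist().cdf(z_obs)            # P(Z > z_obs)
print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With 40 correct answers the statistic stays below the 0.05 cutoff, so at this level there is not enough evidence that tasters can distinguish the beers.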


Large Sample Test: Other

Table 8.1 on page 397:


Example 10.2.4 Let $X_1, \ldots, X_n$ be a random sample from the exponential distribution with pdf

$f(x) = \dfrac{1}{\theta} e^{-x/\theta}, \quad 0 < x < +\infty, \; 0 < \theta < +\infty.$

1. Find the large sample distribution of the method of moments estimator of $\theta$, $\hat\theta$.

2. Set up a large sample test of $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$ using level $\alpha$. Specify the rejection region.



3. Using a random sample of size n = 64 and level $\alpha = 0.05$, test the hypothesis $H_0: \theta = 10$ versus $H_a: \theta < 10$ when the sample mean is $\bar{x} = 7.7$. State your conclusion.
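A possible numerical check of part 3, under the following assumptions: the method of moments estimator is $\hat\theta = \bar{X}$, and since an exponential distribution with mean $\theta$ has variance $\theta^2$, the CLT gives $\bar{X} \approx N(\theta, \theta^2/n)$, so the standard deviation under $H_0$ is $\theta_0/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, theta0, alpha = 64, 7.7, 10.0, 0.05

se0 = theta0 / sqrt(n)                      # SD of the sample mean under H0: theta0 / sqrt(n)
z_obs = (xbar - theta0) / se0
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject = z_obs < -z_alpha                   # left-tailed RR
p_value = NormalDist().cdf(z_obs)           # P(Z < z_obs)
print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```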


Large Sample Test

Example: Two-sample Z-test

Example 10.2.5 Samples of 36 males and 40 females were tested to determine their temperature preference. Assume that variances are known.

Samples:

$Y_{1,1}, Y_{1,2}, \ldots, Y_{1,n_m}$, and $Y_{2,1}, Y_{2,2}, \ldots, Y_{2,n_f}$.

Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$

Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$

Do females and males differ with respect to their temperature preferences? Conduct the test at level $\alpha = 0.01$.

Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and females, respectively.


Solution:


Solution, continued.

Note that if $\sigma_m^2$ and $\sigma_f^2$ are unknown, since both sample sizes are large, we would substitute their estimators from each sample, $s_m^2$ and $s_f^2$, where, for example, $S_m^2 = \dfrac{1}{n_m - 1} \sum_{j=1}^{n_m} (Y_{1,j} - \bar{Y}_m)^2$. We may do this because $S_m^2$ (and $S_f^2$) are consistent estimators of $\sigma_m^2$ (and $\sigma_f^2$).



Difference between two population proportions

Example: 10.33 of WMS. A political researcher believes that the fraction $p_1$ of Republicans strongly in favor of the death penalty is greater than the fraction $p_2$ of Democrats strongly in favor of the death penalty. He acquired independent random samples of 200 Republicans and 200 Democrats and found 46 Republicans and 34 Democrats strongly favoring the death penalty. Does this evidence provide statistical support for the researcher's belief? Use $\alpha = 0.05$.
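A sketch of how this two-proportion test might be carried out. One assumption not spelled out on the slide: because $H_0: p_1 - p_2 = 0$ implies a common proportion, the standard error below uses the pooled estimate of that common value.

```python
from math import sqrt
from statistics import NormalDist

n1, y1 = 200, 46      # Republicans: sample size, number strongly in favor
n2, y2 = 200, 34      # Democrats
alpha = 0.05

p1_hat, p2_hat = y1 / n1, y2 / n2
p_pool = (y1 + y2) / (n1 + n2)                        # pooled estimate under H0: p1 = p2
se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs = (p1_hat - p2_hat) / se0
reject = z_obs > NormalDist().inv_cdf(1 - alpha)      # right-tailed RR for Ha: p1 > p2
print(f"z_obs = {z_obs:.2f}, reject H0: {reject}")
```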


10.3 Sample Size and Power

Suppose $Y_1, Y_2, \ldots, Y_n$ is a rs from $N(\mu, \sigma^2)$. We wish to test $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.

Question: can we find a sample size n which guarantees that the Type I and Type II error probabilities will not exceed $\alpha$ and $\beta$ respectively? Here $\alpha$ and $\beta$ are prespecified values.


Recall the level $\alpha$ test procedure for this hypothesis testing problem.

Note that under $H_0: \mu = \mu_0$,

$Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$

Therefore, from the large sample test constructed earlier, we reject $H_0$ when

$z_{obs} > z_\alpha.$

That is, the Rejection Region (RR) is:

$RR = \{\bar{Y} : \bar{Y} > k\}$, where $k = \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n}}$.


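Therefore, the Type II error probability is:

$$\beta(\mu_a) = P(\text{Do not reject } H_0 \mid \mu = \mu_a) = P(\bar{Y} \le k \mid \mu = \mu_a) = P\left(\bar{Y} \le k \mid \bar{Y} \sim N(\mu_a, \sigma^2/n)\right) = P\left(\frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} \le \frac{k - \mu_a}{\sigma/\sqrt{n}}\right) = P\left(Z \le \frac{k - \mu_a}{\sigma/\sqrt{n}}\right).$$

For this probability to equal $\beta$ we need $\dfrac{k - \mu_a}{\sigma/\sqrt{n}} = -z_\beta$. Substituting $k = \mu_0 + z_\alpha \sigma/\sqrt{n}$ and solving for n gives

$$n = \left[\frac{(z_\alpha + z_\beta)\sigma}{\mu_a - \mu_0}\right]^2 \quad (\mu_a - \mu_0 > 0 \text{ for right-tailed test}).$$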


Sample Size Determination: Summary

For upper- or lower-tailed level $\alpha$ tests, to control the Type II error probability at $\beta$ when $\mu = \mu_a$ under $H_a$, the required sample size is

$$n = \left[\frac{(z_\alpha + z_\beta)\sigma}{\mu_0 - \mu_a}\right]^2$$

For two-tailed level $\alpha$ tests, the required sample size is:

$$n = \left[\frac{(z_{\alpha/2} + z_\beta)\sigma}{\mu_0 - \mu_a}\right]^2$$

Remark: the above conclusions hold for z-type tests based on the normal distribution, which apply when the test statistic is normally distributed (or approximately so for large samples).


Sample Size Determination: Example

Example 10.3.1 An SAT prep course claims to increase average

Verbal SAT scores.

To test the claim, n candidates will be selected at random to receive

the training and take the test.

It is known that in the population under study, SAT-V scores are distributed

$N(\mu = 565, \sigma^2 = 40^2)$

We want the probability of making a Type II error, when there is a 15 point mean increase, to be 0.1 (or less), when the level is $\alpha = 0.05$.

Determine how many candidates should be selected.
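The sample size formula from the summary can be evaluated directly for the SAT-V setting. The function name `sample_size_one_tailed` is a hypothetical helper introduced for this sketch; the result is rounded up because n must be an integer that keeps $\beta$ at or below the target.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_tailed(mu0, mua, sigma, alpha, beta):
    """Smallest n with Type II error <= beta at mua for a one-tailed level-alpha z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu0 - mua)) ** 2)

# SAT-V example: mu0 = 565, a 15 point increase (mua = 580), sigma = 40
n_required = sample_size_one_tailed(565, 580, 40, alpha=0.05, beta=0.10)
print(n_required)
```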


Power of the Test

The power of a test at a particular value in the alternative, $\mu = \mu_a$, is defined as

$\mathrm{Power}(\mu_a) = P\{\text{Reject } H_0 \mid \mu = \mu_a\} = 1 - \beta(\mu_a)$

In the SAT example, we find that if n is 61 and the true mean SAT-V score of people taking the SAT prep course is 580 (an increase of 15 points), then the power of the test is

$\mathrm{Power}(580) = 1 - 0.1 = 0.9$



Power curve for SAT-V example (for n = 61)


Power curve for SAT-V example (for $\mu_a = 580$)


Another example: Beer tasting

Refer to Example 10.2.3: $H_0: p = 1/3$ versus $H_a: p > 1/3$.

RR for a level $\alpha = 0.05$ test is:

$$RR: z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > 1.645,$$

where $p_0 = 1/3$, $\hat{p} = Y/n$, and Y is the number of tasters that answered correctly. That is,

$$RR: \hat{p} > k, \text{ where } k = p_0 + z_\alpha \sqrt{\frac{p_0(1 - p_0)}{n}}$$

Calculate $\beta(0.5)$, the Type II error rate when $p_a = 0.5$.

Recall n = 100.
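One way to carry out this calculation, as a sketch under the normal approximation: compute the cutoff k from the null distribution, then standardize k using the mean and variance of $\hat{p}$ under $p_a = 0.5$.

```python
from math import sqrt
from statistics import NormalDist

n, p0, pa, alpha = 100, 1 / 3, 0.5, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)
k = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)      # reject H0 when p_hat > k
se_a = sqrt(pa * (1 - pa) / n)                  # SD of p_hat when p = pa
beta = NormalDist().cdf((k - pa) / se_a)        # P(p_hat <= k | p = pa), normal approximation
print(f"k = {k:.4f}, beta(0.5) = {beta:.4f}, power = {1 - beta:.4f}")
```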


Solution:



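Q: Calculate the minimum sample size n to control $\beta(0.5)$ at 0.01.

We need to solve for n such that

$$\frac{k - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta, \quad (1)$$

where $k = p_0 + z_\alpha \sqrt{\dfrac{p_0(1 - p_0)}{n}}$ also depends on n. Solving

$$\frac{p_0 + z_\alpha \sqrt{p_0(1 - p_0)/n} - p_a}{\sqrt{p_a(1 - p_a)/n}} = -z_\beta \;\Rightarrow\; n = \left[\frac{z_\beta \sqrt{p_a(1 - p_a)} + z_\alpha \sqrt{p_0(1 - p_0)}}{p_0 - p_a}\right]^2$$

Thus for this example, $p_0 = 1/3$, $p_a = 0.5$, $z_\alpha = 1.645$, $z_\beta = 2.33$, so

$n \approx 135.6$, and rounding up gives n = 136.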


Power curve for Beer example (for n = 100)


Type II error rate for Beer example (for pa = 0.5)


10.4 Test/confidence interval relationship

Hypothesis Testing and Confidence Intervals

Hypothesis testing has a close connection to confidence intervals (CI) in the sense that confidence intervals are often the complement of rejection regions.

The complement of the RR is sometimes called the acceptance region.

Consider the problem of two-tailed alternatives:

$H_0: \theta = \theta_0$, $H_a: \theta \ne \theta_0$

Test statistic: $Z = \dfrac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$

Rejection region: $RR = \{Z : |Z| > z_{\alpha/2}\}$



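From the above, the Acceptance Region is:

$$\overline{RR} = \{|Z| \le z_{\alpha/2}\} = \{\hat\theta - z_{\alpha/2}\sigma_{\hat\theta} \le \theta_0 \le \hat\theta + z_{\alpha/2}\sigma_{\hat\theta}\}.$$

Notice that a $100(1 - \alpha)\%$ CI for $\theta$ is:

$\hat\theta \pm z_{\alpha/2}\sigma_{\hat\theta}.$

Therefore,

Reject $H_0$ if and only if $\theta_0 \notin$ CI.

In other words, testing at level $\alpha$ against a two-tailed alternative is equivalent to checking if the hypothesized value of $\theta$ ($= \theta_0$) lies in the $100(1 - \alpha)\%$ CI for $\theta$.

A similar relationship exists between one-sided alternative hypotheses and one-sided confidence intervals.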


Test-confidence interval relationship: Example

Refer to Example 10.2.5 (Male and Female Temperature Preference).

Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$

Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$

Let $\mu_m$ and $\mu_f$ denote the mean temperature preference of males and females, respectively.

Construct a 99% confidence interval for $\mu_m - \mu_f$. Is the value $\mu_m - \mu_f = 0$ contained in the confidence interval? Based on the interval, should we reject $H_0: \mu_m - \mu_f = 0$?
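The test and the interval for the temperature example can be computed side by side, which makes the equivalence concrete. This is a sketch with known variances, as the example assumes; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_m, var_m, ybar_m = 36, 4.0, 74.6     # males: size, known variance, sample mean
n_f, var_f, ybar_f = 40, 2.5, 76.5     # females
alpha = 0.01

diff = ybar_m - ybar_f
se = sqrt(var_m / n_m + var_f / n_f)               # SD of the difference in sample means
z_obs = diff / se                                  # test statistic for H0: mu_m - mu_f = 0
z_half = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}, about 2.576

ci = (diff - z_half * se, diff + z_half * se)      # 99% CI for mu_m - mu_f
reject = abs(z_obs) > z_half                       # two-tailed RR
zero_in_ci = ci[0] <= 0 <= ci[1]
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(f"z_obs = {z_obs:.2f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

The two decisions agree: $H_0$ is rejected exactly when 0 lies outside the interval.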


10.5 The p-value

Observed significance level

Often misunderstood

P(these data or more extreme; $H_0$ is true)

Reject $H_0$ at level $\alpha$ $\Leftrightarrow$ p-value $< \alpha$

Definition: the smallest level of significance at which $H_0$ can be rejected.


Refer to Example 10.2.4.

$H_0: \theta = 10$ versus $H_a: \theta < 10$; $z_{obs} = -1.84$

If $\alpha = 0.1$, $z_\alpha = 1.28$, RR: $z_{obs} < -1.28$, conclusion: Reject $H_0$

If $\alpha = 0.05$, $z_\alpha = 1.65$, RR: $z_{obs} < -1.65$, conclusion: Reject $H_0$

...

If $\alpha = 0.025$, $z_\alpha = 1.96$, RR: $z_{obs} < -1.96$, conclusion: Do not reject $H_0$

Questions:

Calculate $P(Z < -1.28)$ and $P(Z < -1.85)$.

What is the smallest $\alpha$ so that $H_0$ will be rejected?

For what values of $\alpha$ can we reject $H_0$? For any $\alpha > 0.03$.

Normal Calculator:

http://www.stat.tamu.edu/west/applets/normaldemo.html




Calculation of p-values

The p-value is the probability of obtaining a test statistic value at least as extreme as the observed value, calculated assuming $H_0$ is true.

Consider testing $H_0: \theta = \theta_0$. Suppose the test statistic $Z \sim N(0, 1)$ (or approximately) under $H_0$.

Left-tailed test $H_a: \theta < \theta_0$, p-value $= P(Z < z_{obs})$

Right-tailed test $H_a: \theta > \theta_0$, p-value $= P(Z > z_{obs})$

Two-tailed test $H_a: \theta \ne \theta_0$, p-value $= P(Z < -|z_{obs}| \text{ OR } Z > |z_{obs}|) = 2P(Z > |z_{obs}|)$

The more extreme the observed test statistic value $\Rightarrow$ the smaller the p-value $\Rightarrow$ the more evidence to reject $H_0$


Calculation of p-value

Example 10.5.1 Refer to the Beer Tasting example 10.2.3.

$H_0: p = 1/3$, $H_a: p > 1/3$, $\alpha = 0.05$

y = 40 out of n = 100 correct identifications observed. Calculate the p-value and conclude.


Example 10.5.2 (10.57 of WMS)

A publisher of a newsmagazine has found through past experience that 60% of subscribers renew their subscription. In a recent random sample of n = 200 subscribers, 108 indicated that they planned to renew. What is the p-value associated with the test that the current rate of renewal differs from the rate previously experienced? State your conclusion using $\alpha = 0.05$. How about $\alpha = 0.1$?

BTW, does the total number of subscribers, N, matter?
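For the renewal example, a two-tailed p-value can be sketched as follows, using the large-sample proportion statistic with the variance evaluated at the hypothesized rate 0.6.

```python
from math import sqrt
from statistics import NormalDist

n, y, p0 = 200, 108, 0.60
p_hat = y / n

z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))   # two-tailed: 2 P(Z > |z_obs|)

for alpha in (0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f}, {decision}")
```

The p-value lands between the two levels, so the conclusion changes depending on which $\alpha$ is used.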


Example 10.5.3 Refer to Example 10.2.5 (Male and Female Temperature Preference).

Test $H_0: \mu_m - \mu_f = 0$ versus $H_a: \mu_m - \mu_f \ne 0$. Calculate the p-value and conclude using $\alpha = 0.01$.

Males: $n_m = 36$, $\sigma_m^2 = 4.0$, $\bar{y}_m = 74.6$

Females: $n_f = 40$, $\sigma_f^2 = 2.5$, $\bar{y}_f = 76.5$



10.6 Testing means in small samples (normal)

Recall: The z-test

Testing Means in Normal Samples with Known Variances

Assumption: $Y_1, \ldots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma$ is known

$H_0: \mu = \mu_0$ versus

$H_a: \mu \ne \mu_0$ (two-tailed test)

$H_a: \mu < \mu_0$ (lower-tailed test)

$H_a: \mu > \mu_0$ (upper-tailed test)

Test statistic:

$Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$

Under $H_0$: $Z \sim N(0, 1)$

Two-tailed test: $H_a: \mu \ne \mu_0$

$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$

p-value $= P(|Z| > |z_{obs}|) = 2P(Z > |z_{obs}|)$

Right-tailed test: $H_a: \mu > \mu_0$

$RR = \{z_{obs} : z_{obs} > z_\alpha\}$

p-value $= P(Z > z_{obs})$

Left-tailed test: $H_a: \mu < \mu_0$

$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$

p-value $= P(Z < z_{obs})$


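Recall: The large-sample z-test

Testing Means in Large Samples with Unknown Variances

Assumption: $Y_1, \ldots, Y_n$ a rs with common mean $\mu$ and unknown variance $\sigma^2$, where n is large.

$H_0: \mu = \mu_0$ versus

$H_a: \mu \ne \mu_0$ (two-tailed test)

$H_a: \mu < \mu_0$ (left-tailed test)

$H_a: \mu > \mu_0$ (right-tailed test)

Test statistic: $Z = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}}$, where $S^2$ is the sample variance.

Under $H_0$: $Z \overset{approx}{\sim} N(0, 1)$

RR and p-value calculations are the same as for the previous z-test:

Two-tailed test: $H_a: \mu \ne \mu_0$

$RR = \{z_{obs} : |z_{obs}| > z_{\alpha/2}\}$

p-value $= P(|Z| > |z_{obs}|) = 2P(Z > |z_{obs}|)$

Right-tailed test: $H_a: \mu > \mu_0$

$RR = \{z_{obs} : z_{obs} > z_\alpha\}$

p-value $= P(Z > z_{obs})$

Left-tailed test: $H_a: \mu < \mu_0$

$RR = \{z_{obs} : z_{obs} < -z_\alpha\}$

p-value $= P(Z < z_{obs})$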



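The t-test

Testing Means in Small Normal Samples

$Y_1, \ldots, Y_n$ a rs from $N(\mu, \sigma^2)$, $\sigma^2$ unknown and n is small

$H_0: \mu = \mu_0$ versus

$H_a: \mu \ne \mu_0$ (two-tailed test)

$H_a: \mu < \mu_0$ (lower-tailed test)

$H_a: \mu > \mu_0$ (upper-tailed test)

Significance level: $\alpha$

Test statistic:

$T = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}}$,

where $\bar{Y}$ and $S^2$ are the sample mean and variance.

Under $H_0$, $T \sim t_{n-1}$

Two-tailed test: $H_a: \mu \ne \mu_0$

$RR = \{t_{obs} : |t_{obs}| > t_{n-1,\alpha/2}\}$

p-value $= P(|T_{n-1}| > |t_{obs}|) = 2P(T_{n-1} > |t_{obs}|)$,

where $T_{n-1}$ is a rv following the $t_{n-1}$ distribution.

Right-tailed test: $H_a: \mu > \mu_0$

$RR = \{t_{obs} : t_{obs} > t_{n-1,\alpha}\}$

p-value $= P(T_{n-1} > t_{obs})$

Left-tailed test: $H_a: \mu < \mu_0$

$RR = \{t_{obs} : t_{obs} < -t_{n-1,\alpha}\}$

p-value $= P(T_{n-1} < t_{obs})$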

T distribution calculator:

http://www.stat.tamu.edu/west/applets/tdemo.html


Example: IQ Test

Example 10.6.1 Ten sampled students aged 18-21 years received special training. They are given an IQ test that is $N(100, 10^2)$ in the general population. Let $\mu$ be the mean IQ of these students who received special training. The observed IQ scores:

121, 98, 95, 94, 102, 106, 112, 120, 108, 109

Test if the special training improves the IQ score using significance level $\alpha = 0.05$.


Solution, continued

. . . p = .029, so that at level $\alpha = .05$, $H_0$ is rejected, and the observed sample mean, $\bar{y}$, is significantly greater than $\mu_0 = 100$.
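The intermediate numbers for this example can be reproduced with a short sketch. The standard library has no t distribution, so the critical value $t_{9, 0.05} = 1.833$ below is hard-coded from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
mu0 = 100
n = len(scores)

ybar, s = mean(scores), stdev(scores)        # sample mean and sample SD (divisor n - 1)
t_obs = (ybar - mu0) / (s / sqrt(n))
t_crit = 1.833                               # t_{9, 0.05}, taken from a t table
reject = t_obs > t_crit                      # right-tailed RR for Ha: mu > 100
print(f"ybar = {ybar}, s = {s:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject}")
```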



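Small Sample Tests: Two-Sample t-test

Independent random samples of size $n_1$ and $n_2$ from populations $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where $\sigma$ is unknown.

$H_0: \mu_1 - \mu_2 = D_0$ (e.g. $D_0 = 0$) versus $H_a: \mu_1 - \mu_2 \ne D_0$ (or $\mu_1 - \mu_2 < D_0$ or $\mu_1 - \mu_2 > D_0$)

Significance level $\alpha$

Test statistic:

$$T = \frac{\bar{Y}_1 - \bar{Y}_2 - D_0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means and $S_1^2$ and $S_2^2$ are the sample variances from the two groups, and the pooled estimator of the common variance $\sigma^2$ is

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Under $H_0$: $T \sim t_{n_1 + n_2 - 2}$

Two-tailed test: $H_a: \mu_1 - \mu_2 \ne D_0$

$RR = \{t_{obs} : |t_{obs}| > t_{\alpha/2, n_1 + n_2 - 2}\}$

p-value $= P(|T_{n_1 + n_2 - 2}| > |t_{obs}|) = 2P(T_{n_1 + n_2 - 2} > |t_{obs}|)$, where $T_{n_1 + n_2 - 2}$ is a rv following the $t_{n_1 + n_2 - 2}$ distribution.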


Example: Recovery time for new drug

Example 10.6.2 Twenty subjects were randomized to two groups, n = 10 each. The recovery time for patients taking a new drug (or placebo) is measured in days. Data follow:

with drug (1): 15 10 13 7 9 8 21 9 14 8

placebo (2): 15 14 12 8 14 7 16 10 15 12

Assume that the data are normally distributed and that $\sigma_1 = \sigma_2$. Use $\alpha = 0.05$ to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$
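A sketch of the pooled two-sample t-test on the recovery-time data. As the standard library has no t distribution, the critical value $t_{18, 0.05} = 1.734$ is hard-coded from a standard t table.

```python
from math import sqrt
from statistics import mean, variance

drug = [15, 10, 13, 7, 9, 8, 21, 9, 14, 8]
placebo = [15, 14, 12, 8, 14, 7, 16, 10, 15, 12]
n1, n2, d0 = len(drug), len(placebo), 0

# pooled estimator of the common variance
sp2 = ((n1 - 1) * variance(drug) + (n2 - 1) * variance(placebo)) / (n1 + n2 - 2)
t_obs = (mean(drug) - mean(placebo) - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = 1.734                       # t_{18, 0.05}, taken from a t table
reject = t_obs < -t_crit             # left-tailed RR for Ha: mu1 - mu2 < 0
print(f"sp^2 = {sp2:.2f}, t_obs = {t_obs:.3f}, reject H0: {reject}")
```

With these data the statistic is far from the rejection region, so the sample does not show a shorter mean recovery time under the drug.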


Two-Sample t-test (Unequal variances)

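Basic Assumptions

1. $X_1, \ldots, X_m$ is a random sample from $N(\mu_1, \sigma_1^2)$, and $\sigma_1$ is unknown.

2. $Y_1, \ldots, Y_n$ is a random sample from $N(\mu_2, \sigma_2^2)$, and $\sigma_2$ is unknown.

3. The X and Y samples are independent of each other.

The standardized variable

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$$

has approximately a t distribution with degrees of freedom $\nu$ estimated from the data by

$$\nu = \frac{\left(\frac{s_1^2}{m} + \frac{s_2^2}{n}\right)^2}{\frac{(s_1^2/m)^2}{m - 1} + \frac{(s_2^2/n)^2}{n - 1}} \quad \text{(round down)}$$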



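Null hypothesis: $H_0: \mu_1 - \mu_2 = D_0$.

Test statistic value:

$$t_{obs} = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}.$$

Alternative Hypothesis          Rejection Region for Level $\alpha$ Test

$H_a: \mu_1 - \mu_2 > D_0$          $t_{obs} \ge t_{\nu,\alpha}$

$H_a: \mu_1 - \mu_2 < D_0$          $t_{obs} \le -t_{\nu,\alpha}$

$H_a: \mu_1 - \mu_2 \ne D_0$          $t_{obs} \ge t_{\nu,\alpha/2}$ OR $t_{obs} \le -t_{\nu,\alpha/2}$

The p-values can be calculated as in the two-sample t-test with equal variance.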

Sign Test - Nonparametric Test

Random sample of size n from a continuous distribution with median $\eta$.

$H_0: \eta = \eta_0$ versus $H_a: \eta > \eta_0$

Significance level $\alpha$

Test statistic: S = number of observations $> \eta_0$

Under $H_0$: $S \sim \mathrm{Bin}(n, 0.5)$

Form of $RR = \{s_{obs} \ge k\}$

p-value $= P(S \ge s_{obs})$


Sign Test

Example: IQ test

Refer to Example 10.6.1:

The observed IQ scores:

121, 98, 95, 94, 102, 106, 112, 120, 108, 109

$H_0: \eta = 100$ versus $H_a: \eta > 100$

$\alpha = 0.05$

Observed test statistic value: $s_{obs} = 7$

Under $H_0$: $S \sim \mathrm{Bin}(10, 0.5)$

p-value $= P(S \ge 7) = 1 - P(S \le 6) = 1 - 0.828 = 0.172.$

Conclusion: do not reject $H_0$
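The sign test p-value above is an exact binomial tail probability, so it can be verified with a few lines. This is a minimal sketch; the name `eta0` for the hypothesized median is an illustrative choice.

```python
from math import comb

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
eta0, alpha = 100, 0.05
n = len(scores)

s_obs = sum(x > eta0 for x in scores)                            # observations above eta0
p_value = sum(comb(n, k) for k in range(s_obs, n + 1)) / 2**n    # P(S >= s_obs), S ~ Bin(n, 0.5)
reject = p_value < alpha
print(f"s_obs = {s_obs}, p-value = {p_value:.3f}, reject H0: {reject}")
```

Note the contrast with the t-test on the same scores: the sign test uses only the signs of the deviations from 100, so it has less power here.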


10.7 Testing Variances in Small Normal Samples

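Random sample from $N(\mu, \sigma^2)$. Consider

$H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \ne \sigma_0^2$ (or $\sigma^2 < \sigma_0^2$ or $\sigma^2 > \sigma_0^2$)

Significance level $\alpha$

Test statistic:

$\chi^2 = \dfrac{(n - 1)S^2}{\sigma_0^2}$ ($\sim \chi^2_{n-1}$ under $H_0$)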



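Two-tailed test: $H_a: \sigma^2 \ne \sigma_0^2$

RR: $\chi^2_{obs} > \chi^2_{\alpha/2, n-1}$ or $\chi^2_{obs} < \chi^2_{1-\alpha/2, n-1}$

p-value $= 2\min\{P(\chi^2 > \chi^2_{obs}), P(\chi^2 < \chi^2_{obs})\}$

Right-tailed test: $H_a: \sigma^2 > \sigma_0^2$

RR: $\chi^2_{obs} > \chi^2_{\alpha, n-1}$

p-value $= P(\chi^2 > \chi^2_{obs})$

Left-tailed test: $H_a: \sigma^2 < \sigma_0^2$

RR: $\chi^2_{obs} < \chi^2_{1-\alpha, n-1}$

p-value $= P(\chi^2 < \chi^2_{obs})$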

Example: IQ test

Example 10.7.1 IQ Example 10.6.1: $H_0: \sigma^2 = 100$ versus $H_a: \sigma^2 > 100$. Use significance level $\alpha = 0.05$.

Recall: $\bar{y} = 106.5$, $s = 9.5$.

Solution:

Observed test statistic value:

Rejection region:

p-value=
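The observed statistic and the rejection decision for this example can be sketched as below. The standard library has no chi-square distribution, so the critical value $\chi^2_{0.05, 9} = 16.919$ is hard-coded from a standard chi-square table and the p-value is left to a table or calculator.

```python
from statistics import variance

scores = [121, 98, 95, 94, 102, 106, 112, 120, 108, 109]
sigma0_sq = 100
n = len(scores)

chi2_obs = (n - 1) * variance(scores) / sigma0_sq   # (n - 1) S^2 / sigma0^2
chi2_crit = 16.919                                  # chi^2_{0.05, 9}, taken from a table
reject = chi2_obs > chi2_crit                       # right-tailed RR for Ha: sigma^2 > 100
print(f"chi2_obs = {chi2_obs:.3f}, critical value = {chi2_crit}, reject H0: {reject}")
```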


Two Sample Variance Tests

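Suppose that $S_1^2$ and $S_2^2$ are the sample variances for two independent random samples of size $n_1$ and $n_2$ from distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. All parameters are unknown.

The quantities

$U_j = \dfrac{(n_j - 1)S_j^2}{\sigma_j^2}, \quad j = 1, 2$

are independent $\chi^2$ random variables with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom, respectively, so that

$$F = \frac{U_1/(n_1 - 1)}{U_2/(n_2 - 1)} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} \sim F_{n_1 - 1, n_2 - 1}.$$

Thus, when $\sigma_1^2 = \sigma_2^2$,

$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1 - 1, n_2 - 1}$$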

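To test the equality of the two population variances:

$H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$ (or $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$)

Significance level $\alpha$

Test statistic: $F = S_1^2/S_2^2$

RR (two-tailed test): $F > F_{n_1 - 1, n_2 - 1, \alpha/2}$ or $F < F_{n_1 - 1, n_2 - 1, 1 - \alpha/2} = (F_{n_2 - 1, n_1 - 1, \alpha/2})^{-1}$

RR (right-tailed test): $F > F_{n_1 - 1, n_2 - 1, \alpha} = (F_{n_2 - 1, n_1 - 1, 1 - \alpha})^{-1}$

RR (left-tailed test): $F < F_{n_1 - 1, n_2 - 1, 1 - \alpha} = (F_{n_2 - 1, n_1 - 1, \alpha})^{-1}$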



Two Sample Variance Tests: Example

Example 10.7.2 Compare the variances of the amount of active ingredients in generic and brand-name drugs. Random samples of size 20 (generic) and 30 (brand-name). Data: $s_g^2 = 0.00109\ \mathrm{mg}^2$, $s_b^2 = 0.000384\ \mathrm{mg}^2$. Use level $\alpha = 0.05$ to test

$H_0: \sigma_g^2 = \sigma_b^2$ versus $H_a: \sigma_g^2 > \sigma_b^2$

10.8 Neyman-Pearson - MP $\alpha$-level test

Consider a test involving a parameter $\theta$ with test statistic W and rejection region RR. The power of the test:

$\mathrm{Power}(\theta) = P(\text{reject } H_0 \text{ when the parameter value is } \theta) = P(W \in RR \text{ when the parameter value is } \theta)$

Relationship between power and $\alpha$, $\beta$:

Suppose $H_0: \theta = \theta_0$, and $\theta_a$ is a parameter value under $H_a$. Then

$\mathrm{Power}(\theta_0) = \alpha = P(\text{Reject } H_0 \text{ when } H_0 \text{ is true})$

$\mathrm{Power}(\theta_a) = 1 - \beta(\theta_a)$

We would like to choose a level $\alpha$ (Type I error) RR to maximize $\mathrm{Power}(\theta)$ for $\theta$ in $H_a$, i.e. find the Most Powerful (MP) $\alpha$-level test.


Neyman-Pearson Lemma

MP $\alpha$-level Tests

$Y_1, \ldots, Y_n$ is a rs from a distribution with parameter $\theta$ and likelihood $L(\theta)$. We wish to test:

$H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_a$,

using level of significance $\alpha$, where $\theta_0$ and $\theta_a$ are given.

Theorem 1 (The Neyman-Pearson Lemma) For the given level of significance, $\alpha$, the test that maximizes $\mathrm{Power}(\theta_a)$ has a RR of the form:

$RR: \dfrac{L(\theta_0)}{L(\theta_a)} < k,$

where k is chosen to ensure that the level (Type I error probability) is $\alpha$. The Most Powerful (MP) $\alpha$-level test is sometimes called the best test.


Remark: The Neyman-Pearson Lemma gives the Rejection Region, RR, that maximizes the power:

$\mathrm{Power}(\theta_a) = P(RR \mid \theta = \theta_a)$

given that

$P(RR \mid \theta = \theta_0) = \alpha$



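MP $\alpha$-level Test

Example: Beta($\theta$, 1) (n = 1)

Example 10.8.1 One observation (n = 1), Y, from Beta($\theta$, 1) with pdf:

$f(y \mid \theta) = \theta y^{\theta - 1}, \quad 0 < y < 1$

(a) Use the N-P lemma to find the $\alpha = .05$ MP test of

$H_0: \theta = 2$ versus $H_a: \theta = 1$

(b) For the MP 0.05-level test derived above, calculate Power(1).

Solution:

Likelihood: $L(\theta) = f(y \mid \theta) = \theta y^{\theta - 1}$. Therefore, in this case

$$\frac{L(\theta_0)}{L(\theta_a)} = \frac{L(2)}{L(1)} = \frac{2y}{1 \cdot y^0} = 2y, \quad 0 < y < 1$$

By the N-P Lemma, the rejection region for the MP test is:

$RR = \{2Y < k\}$ or $\{Y < k/2\}$

Determining k:

$$\alpha = P(RR \mid \theta = \theta_0 = 2) = P(Y < k/2 \mid \theta = 2) = \int_0^{k/2} 2y \, dy = y^2 \Big|_0^{k/2} = (k/2)^2$$

$\Rightarrow k/2 = \sqrt{\alpha} \Rightarrow k = 2\sqrt{\alpha}$

Therefore, for this problem, the MP 0.05-level test has the rejection region

$RR: Y < \sqrt{0.05} = 0.2236$


(b) Solution (compute Power(1)):

$$\mathrm{Power}(1) = P(RR \mid \theta = \theta_a = 1) = P(Y < 0.2236 \mid \theta = 1) = \int_0^{0.2236} 1 \cdot y^0 \, dy = \int_0^{0.2236} 1 \, dy = y \Big|_0^{0.2236} = 0.2236.$$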


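MP $\alpha$-level Test

Normal Sample, $\sigma^2$ known

$Y_1, \ldots, Y_n$ rs from $N(\mu, \sigma^2)$, $\sigma^2$ known. Test $H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$

The pdf of $Y_i$ is

$$f(y \mid \mu) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < y < +\infty.$$

Use the N-P lemma to find the MP $\alpha$-level test procedure.

Solution:

From the pdf, we obtain the likelihood for $\mu$:

$$L(\mu) = f(y_1 \mid \mu) f(y_2 \mid \mu) \cdots f(y_n \mid \mu) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n \exp\left\{-\sum_{i=1}^n \frac{(y_i - \mu)^2}{2\sigma^2}\right\}$$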


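By the N-P Lemma, the FORM of the MP $\alpha$-level test rejection region is:

$RR: \dfrac{L(\mu_0)}{L(\mu_a)} < k.$

$$\frac{L(\mu_0)}{L(\mu_a)} = \frac{\left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n \exp\left\{-\sum_{i=1}^n \frac{(y_i - \mu_0)^2}{2\sigma^2}\right\}}{\left(\frac{1}{\sqrt{2\pi}\sigma}\right)^n \exp\left\{-\sum_{i=1}^n \frac{(y_i - \mu_a)^2}{2\sigma^2}\right\}} < k$$

$$\Leftrightarrow \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2\right]\right\} < k$$

$$\Leftrightarrow -\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2\right] < \ln(k)$$

$$\Leftrightarrow \sum_{i=1}^n (y_i - \mu_0)^2 - \sum_{i=1}^n (y_i - \mu_a)^2 > -2\sigma^2 \ln(k)$$

Expanding the squares:

$$RR: \left(\sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_0 + n\mu_0^2\right) - \left(\sum_{i=1}^n y_i^2 - 2n\bar{y}\mu_a + n\mu_a^2\right) > -2\sigma^2 \ln(k)$$

$$\Leftrightarrow 2n\bar{y}(\mu_a - \mu_0) + n\mu_0^2 - n\mu_a^2 > -2\sigma^2 \ln(k) \;\Leftrightarrow\; 2n\bar{y}(\mu_a - \mu_0) > -2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2$$

Since $\mu_a - \mu_0 > 0$,

$$RR: \bar{y} > \frac{-2\sigma^2 \ln(k) - n\mu_0^2 + n\mu_a^2}{2n(\mu_a - \mu_0)}.$$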

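Note that the right-hand side of the above does not involve the data, so the inequality is equivalent to $\bar{y} > k^*$ for some constant $k^*$. So the form of the RR becomes:

$RR: \bar{y} > k^*$

To determine $k^*$, we set

$$\alpha = P(\bar{Y} \in RR \mid \mu = \mu_0) = P(\bar{Y} > k^* \mid \mu = \mu_0) = P\left(\frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > \frac{k^* - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_0\right) = P\left(Z > \frac{k^* - \mu_0}{\sigma/\sqrt{n}}\right)$$

Therefore,

$$\frac{k^* - \mu_0}{\sigma/\sqrt{n}} = z_\alpha \;\Rightarrow\; k^* = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$$

Thus, the MP $\alpha$-level test of

$H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$

has the rejection region:

$RR: \bar{y} > \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n}}$

or equivalently:

$RR: Z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha.$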



Uniformly Most Powerful (UMP) $\alpha$-level test

In the Normal sample example with known $\sigma$, note that the test does not depend on $\mu_a$ (except that we need the assumption $\mu_a > \mu_0$).

Because of this, the test is the most powerful level $\alpha$ test for every $H_a: \mu = \mu_a$, where $\mu_a > \mu_0$. We call such a test the Uniformly Most Powerful (UMP) $\alpha$-level test of

$H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.

Simple hypothesis: a hypothesis that uniquely specifies the distribution of the population from which the sample is taken.

Composite hypothesis: not a simple hypothesis.

E.g. for the above normal sample example with known $\sigma$, $H: \mu = \mu_0$ is simple, while $H: \mu > \mu_0$ is composite.

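MP $\alpha$-level test

Exponential Sample

Example 10.8.2 $Y_1, \ldots, Y_n$ rs from Exp($\theta$) with pdf:

$f(y) = \dfrac{1}{\theta} e^{-y/\theta}, \quad 0 < y < +\infty$

(1) Use the N-P Lemma to construct the MP $\alpha$-level test for

$H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_a$, where $\theta_a > \theta_0$

Hint: $\sum_{i=1}^n Y_i \sim \mathrm{Gamma}(n, \theta)$.


(2) Construct the UMP $\alpha$-level test for

$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$.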



(3) For n = 36, we wish to test

$H_0: \theta = 1$ versus $H_a: \theta = 2$ (or $\theta > 1$)

using level $\alpha = 0.01$. Use StaTable to find the critical value at

http://www.cytel.com/Products/StaTable/ or

http://mcsp.wartburg.edu/nmb/fall10/math313/seeingstats/Chpt4/gammaProb.html

Gamma($\alpha$, $\beta$): $\alpha$ is the shape parameter and $\beta$ is the scale parameter.

(4) (Large Sample Approach). Use the CLT to obtain an approximate test for the hypotheses in (3).


Summary: Neyman-Pearson Lemma

MP $\alpha$-level Tests

The N-P Lemma provides the test statistic and form of the RR for the MP test. The constant (critical value) must be determined to ensure that the test is level $\alpha$.

It is not always possible to find a UMP test.

The N-P Lemma cannot be applied if there are unknown parameters other than $\theta$.

If the rvs in the random sample are discrete, then it is usually not possible to achieve a given level of significance, $\alpha$.


10.9 Likelihood Ratio Test

Likelihood Ratio Test (LRT)

An approach for developing tests when either or both of the hypotheses are composite.

Can be used when the model for the data has more than one parameter:

$\theta_1, \theta_2, \ldots, \theta_k$

We will refer to the vector of parameters:

$\Theta = (\theta_1, \theta_2, \ldots, \theta_k)$

The likelihood of the random sample:

$L(\theta_1, \theta_2, \ldots, \theta_k) = L(\Theta)$



LRT: Normal Example

Let $Y_1, \ldots, Y_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Then

$\Theta = (\mu, \sigma^2)$.

We wish to test

$H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$

Note that the null hypothesis is actually:

$H_0: \mu = \mu_0, \ \sigma^2 > 0$,

i.e. $H_0$ is not simple (it is composite).

In this situation, $H_0$ states that the parameters fall in a particular set, i.e. $\Theta$ is in the set $\Omega_0$. We write $\Omega_0$ for the parameter space under the null hypothesis.

For our example,

$H_0: \Theta \in \Omega_0 = \{(\mu, \sigma^2): \mu = \mu_0, \ \sigma^2 > 0\}$

The alternative states:

$H_a: \Theta \in \{(\mu, \sigma^2): \mu \neq \mu_0, \ \sigma^2 > 0\} = \Omega_a$,

where $\Omega_a$ denotes the parameter space under the alternative hypothesis.


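We denote the union of the sets $\Omega_0$ and $\Omega_a$ by $\Omega$, i.e.

$\Omega_0 \cup \Omega_a = \Omega$

In our example:

$\Omega = \Omega_0 \cup \Omega_a = \{(\mu, \sigma^2): \mu = \mu_0, \ \sigma^2 > 0\} \cup \{(\mu, \sigma^2): \mu \neq \mu_0, \ \sigma^2 > 0\} = \{(\mu, \sigma^2): -\infty < \mu < +\infty, \ \sigma^2 > 0\}$

That is, $\Omega$ = set of all possible values of the parameters, without regard to the hypotheses.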



Notations:

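$L(\hat\Omega_0) = \max_{\Theta \in \Omega_0} L(\Theta)$

denotes the maximum of the likelihood over the parameter values in $\Omega_0$ (under $H_0$).

$L(\hat\Omega) = \max_{\Theta \in \Omega} L(\Theta)$

denotes the maximum of the likelihood over the parameter values in $\Omega$.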




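This is always true:

$L(\hat\Omega_0) \le L(\hat\Omega)$,

since the space $\Omega$ contains $\Omega_0$. If the maximum over $\Omega$ falls in $\Omega_0$ then

$L(\hat\Omega_0) = L(\hat\Omega)$

Evidence that $H_0$ is false (and $H_a$ is true) is that

$L(\hat\Omega_0) < L(\hat\Omega)$, i.e. the likelihood ratio $\lambda = L(\hat\Omega_0) / L(\hat\Omega)$ is small.

For the normal example, with

$\Omega = \{(\mu, \sigma^2): -\infty < \mu < +\infty, \ \sigma^2 > 0\}$,

use the LRT method to find the RR for a level $\alpha$ test.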
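One way to carry out this derivation is sketched below as a worked calculation, assuming the two-sided alternative of the normal example and writing $\hat\sigma_0^2$, $\hat\sigma^2$ for the maximum likelihood estimates of the variance under the restricted and unrestricted parameter spaces (these symbols are introduced here for illustration).

```latex
\begin{align*}
\text{Restricted (under } H_0\text{):}\quad
  \hat\sigma_0^2 &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\mu_0)^2,
  & L(\hat\Omega_0) &= (2\pi\hat\sigma_0^2)^{-n/2}\,e^{-n/2},\\
\text{Unrestricted:}\quad
  \hat\mu = \bar y,\ \ \hat\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar y)^2,
  & L(\hat\Omega) &= (2\pi\hat\sigma^2)^{-n/2}\,e^{-n/2}.
\end{align*}
\begin{align*}
\lambda &= \frac{L(\hat\Omega_0)}{L(\hat\Omega)}
         = \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2},
\qquad \hat\sigma_0^2 = \hat\sigma^2 + (\bar y-\mu_0)^2,\\
\lambda^{2/n} &= \frac{1}{1+(\bar y-\mu_0)^2/\hat\sigma^2}
               = \frac{1}{1+t^2/(n-1)},
\qquad t = \frac{\sqrt{n}\,(\bar y-\mu_0)}{S},\quad
S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar y)^2 .
\end{align*}
```

Since $\lambda$ is a decreasing function of $t^2$, rejecting for small $\lambda$ is the same as rejecting for large $|t|$, so the level $\alpha$ RR is $|t| \ge t_{\alpha/2}$ with $n - 1$ degrees of freedom, i.e. the two-sided one-sample t-test.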





LRT: Large Sample RR

Theorem 2 Let $Y_1, \ldots, Y_n$ have a joint likelihood $L(\Theta)$. Let

$r_0$ = # free parameters in $\Omega_0$

$r$ = # free parameters in $\Omega$

Assuming that certain regularity conditions hold, then under $H_0$ and for large $n$, $-2\ln(\lambda)$ has approximately a $\chi^2$ distribution with $r - r_0$ degrees of freedom.


LRT: Normal Example

Use Theorem 2 to derive the level $\alpha$ large-sample RR.
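For the normal example, $r = 2$ and $r_0 = 1$, so the large-sample RR compares $-2\ln(\lambda)$ with the upper $\alpha$ point of a $\chi^2$ distribution with 1 degree of freedom. A minimal sketch follows, assuming the two-sided alternative; the data are hypothetical (and $n$ is too small for the approximation to be reliable, it only illustrates the arithmetic), and the function names are not from the course.

```python
import math


def neg2_log_lambda_normal(y, mu0):
    """-2 ln(lambda) for H0: mu = mu0 vs Ha: mu != mu0, sigma^2 unknown."""
    n = len(y)
    ybar = sum(y) / n
    sigma2_hat = sum((v - ybar) ** 2 for v in y) / n   # variance MLE over Omega
    sigma2_0_hat = sum((v - mu0) ** 2 for v in y) / n  # variance MLE over Omega_0
    # lambda = (sigma2_hat / sigma2_0_hat)^(n/2), so -2 ln(lambda) is:
    return n * math.log(sigma2_0_hat / sigma2_hat)


def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


CHI2_1_UPPER_05 = 3.841  # upper 5% point of chi-square with 1 df (table value)

# Hypothetical sample, for illustration only.
sample = [5.1, 6.3, 4.8, 5.9, 6.1, 5.4, 4.9, 5.7, 6.0, 5.2]
stat = neg2_log_lambda_normal(sample, mu0=6.0)
p_value = chi2_sf_df1(stat)
reject = stat > CHI2_1_UPPER_05

print(f"-2 ln(lambda) = {stat:.3f}, approx p-value = {p_value:.4f}, reject H0: {reject}")
```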



LRT Example 2: Poisson Dispersion Test

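Let $X_1, X_2, \ldots, X_n$ be independent rvs with $X_i \sim \text{Poisson}(\theta_i)$, $i = 1, 2, \ldots, n$, with pmf:

$P(X_i = x_i) = \frac{e^{-\theta_i}\,\theta_i^{x_i}}{x_i!}, \quad x_i = 0, 1, \ldots; \ \theta_i > 0$

We wish to test

$H_0: \theta_i = \theta$, $i = 1, 2, \ldots, n$ versus

$H_a$: the $\theta_i$ are not all equal.

Assume that $n$ is large. Construct an approximate level $\alpha$ LRT.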

Example 2: Poisson Dispersion Test

NIST test of asbestos fibers, number of fibers on 23 squares of a grid:

31 29 19 18 31 28 34 27 34 30 16 18 26 27 27 18 24 22 28 24 21 17 24
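The approximate LRT can be applied to these counts as sketched below. Under $H_0$ the common mean is estimated by the sample mean, while under the full parameter space each mean is estimated by its own count, which gives $-2\ln(\lambda) = 2\sum x_i \ln(x_i/\bar{x})$ with $n - 1$ degrees of freedom. The function names are illustrative, and the $\chi^2$ tail probability uses the closed form available for even degrees of freedom rather than a table.

```python
import math


def poisson_dispersion_lrt(counts):
    """Return (-2 ln(lambda), degrees of freedom) for H0: all Poisson means equal."""
    n = len(counts)
    xbar = sum(counts) / n
    # A zero count contributes 0 (x * ln(x) -> 0 as x -> 0), so it is skipped.
    stat = 2.0 * sum(x * math.log(x / xbar) for x in counts if x > 0)
    return stat, n - 1


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = math.exp(-half)  # j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return total


fibers = [31, 29, 19, 18, 31, 28, 34, 27, 34, 30, 16, 18,
          26, 27, 27, 18, 24, 22, 28, 24, 21, 17, 24]

stat, df = poisson_dispersion_lrt(fibers)
p_value = chi2_sf_even_df(stat, df)
alpha = 0.05  # assumed level for illustration; the slide does not fix one

print(f"-2 ln(lambda) = {stat:.2f} on {df} df, approx p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```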


Example 3: Binomials

Let $X_i \sim \text{Binomial}(n_i, p_i)$, $i = 1, 2$, be independent. We wish to test

$H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$

$\Omega_0 = \{(p_1, p_2): 0 < p_1 = p_2 < 1\}$

$\Omega_a = \{(p_1, p_2): 0 < p_1 < 1, \ 0 < p_2 < 1, \ p_1 \neq p_2\}$

Both hypotheses are composite. Suppose the $n_i$ are large. Carry out an approximate level $\alpha$ LRT.


Example 3: Binomials

Clinical Trial: Allergy Medicine versus Placebo

Randomization of 3774 subjects:

Allergy medicine group: n1 = 2103, x1 = 547 reported headaches

Placebo group: n2 = 1671, x2 = 368 reported headaches

Test whether the proportion of those reporting headaches is different in the two groups, using significance level $\alpha = 0.05$.

Ref: Michael Sullivan, III. (2004) Statistics: Informed Decisions Using Data.
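A sketch of the approximate LRT for these trial counts follows. The unrestricted estimates are the two sample proportions, the restricted estimate is the pooled proportion, and $r - r_0 = 2 - 1 = 1$, so $-2\ln(\lambda)$ is compared with a $\chi^2$ distribution with 1 degree of freedom. The function names are illustrative, and the code assumes each count lies strictly between 0 and its sample size.

```python
import math


def binom_loglik(x, n, p):
    """Binomial log-likelihood without the binomial coefficient (it cancels in lambda)."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def two_binomial_lrt(x1, n1, x2, n2):
    """-2 ln(lambda) for H0: p1 = p2 versus Ha: p1 != p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2          # MLEs over the full parameter space
    p_pooled = (x1 + x2) / (n1 + n2)           # MLE under H0
    unrestricted = binom_loglik(x1, n1, p1_hat) + binom_loglik(x2, n2, p2_hat)
    restricted = binom_loglik(x1, n1, p_pooled) + binom_loglik(x2, n2, p_pooled)
    return 2.0 * (unrestricted - restricted)


def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


# Counts from the allergy medicine trial above.
stat = two_binomial_lrt(x1=547, n1=2103, x2=368, n2=1671)
p_value = chi2_sf_df1(stat)
alpha = 0.05

print(f"-2 ln(lambda) = {stat:.2f}, approx p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```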



Summary: LRT

The likelihood ratio approach does not guarantee an optimum test (unlike the N-P Lemma).

Using the likelihood ratio approach will customarily provide an acceptable test.

Unlike the N-P Lemma, the likelihood ratio approach can be applied where the underlying model has nuisance parameters (parameters not of particular interest).

Summary of Chapter 10

Four Basic Elements of a Statistical Test: (1) $H_0$; (2) $H_a$; (3) test statistic; (4) rejection region

Error probabilities:

Type I error probability (level): $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$ (sending an innocent person to jail)

Type II error probability: $\beta(\theta_a) = P(\text{fail to reject } H_0 \mid H_a \text{ is true with } \theta = \theta_a)$ (setting a guilty person free)

Power($\theta$) = $P(\text{reject } H_0 \text{ with parameter value } \theta)$

Large sample tests

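Suppose $\hat\theta \sim N(\theta, \sigma^2_{\hat\theta})$, exactly or approximately. For instance, apply the CLT when $\hat\theta$ is consistent and unbiased for $\theta$, and $n$ is large.

$\sigma^2_{\hat\theta}$ is known or can be estimated consistently by $\hat\sigma^2_{\hat\theta}$

$H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$

Test statistic: $T = \hat\theta$ or $Z = (\hat\theta - \theta_0)/\sigma_{\hat\theta}$

RR: $\frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > z_\alpha$ or equivalently $\hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta}$

P-value = $P(Z > z_{obs})$ (for right-tailed z-test)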

Typical examples:

Test on population means (one-sample z-test)
Test on two population mean difference (two-sample z-test)
Test on population proportions (one-sample z-test)
Test on two proportion difference (two-sample z-test)

Calculation of $\beta(\theta_a)$ and Power for a level $\alpha$ test procedure

Determination of sample size to control the Type II error probability at level $\beta$ for a level $\alpha$ test procedure. Examples discussed include

Test on means with one-sample z test (SAT prep course)
Test on means with two-sample z test (WMS 10.44)
Test on population proportions with one-sample z test (Beer-tasting)
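The Type II error and sample-size calculations can be sketched for the right-tailed one-sample z-test on a mean with known standard deviation. The formulas used are $\beta(\mu_a) = \Phi\left(z_\alpha - (\mu_a - \mu_0)\sqrt{n}/\sigma\right)$ and $n = (z_\alpha + z_\beta)^2 \sigma^2 / (\mu_a - \mu_0)^2$. The numbers below are hypothetical and are not taken from the SAT, WMS 10.44 or beer-tasting examples; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type2_error(mu0, mu_a, sigma, n, alpha):
    """beta(mu_a) for the right-tailed z-test of H0: mu = mu0 (requires mu_a > mu0)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (mu_a - mu0) * math.sqrt(n) / sigma
    return STD_NORMAL.cdf(z_alpha - shift)


def required_sample_size(mu0, mu_a, sigma, alpha, beta):
    """Smallest n whose Type II error at mu_a is at most beta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    n = ((z_alpha + z_beta) * sigma / (mu_a - mu0)) ** 2
    return math.ceil(n)


# Hypothetical setting, for illustration only.
mu0, mu_a, sigma, alpha, beta = 0.0, 0.5, 1.0, 0.05, 0.10
n_needed = required_sample_size(mu0, mu_a, sigma, alpha, beta)
beta_at_n = type2_error(mu0, mu_a, sigma, n_needed, alpha)

print(f"required n = {n_needed}")
print(f"beta at that n = {beta_at_n:.4f}, power = {1.0 - beta_at_n:.4f}")
```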

Test-CI relationship: CIs correspond to the complements of RRs (these complements are also called acceptance regions)

p-value

Observed significance level

Smallest level of significance, $\alpha$, for which the observed value indicates that $H_0$ should be rejected

Tail area captured by the observed test statistic

Reject $H_0$ when p-value < $\alpha$

Testing mean in small normal samples

One-sample t-test

Two-sample t-test (assuming equal variance)

Testing variance in small normal samples

$\chi^2$-test for one population variance

F-test for testing the equality of two population variances

Neyman-Pearson Lemma

Use the N-P Lemma to construct the MP $\alpha$-level test for testing simple hypotheses $H_0$ and $H_a$

Based on the MP test, construct the uniformly MP $\alpha$-level test for composite hypotheses

Only one unknown parameter is involved

Likelihood-ratio test

Test statistic: $\lambda = \frac{L(\hat\Omega_0)}{L(\hat\Omega)}$

$L(\hat\Omega_0)$ (restricted likelihood): the maximum value of the likelihood when the parameters are restricted (and reduced in number) based on the assumption of $H_0$

$L(\hat\Omega)$ (unrestricted likelihood): the maximum value of the likelihood when the parameters are unrestricted, i.e. obtained under the entire parameter space

Construct the RR for a level $\alpha$ test based on the sampling distribution of $\lambda$

Large sample procedure: for large $n$, when the distribution satisfies some regularity conditions, $-2\ln(\lambda)$ is approximately $\chi^2_{r - r_0}$ under $H_0$

$r$: number of free parameters in the whole parameter space (unrestricted)

$r_0$: number of free parameters in the restricted parameter space (under $H_0$)
