ch11-solns-all_skuce_2e

8/12/2019 ch11-solns-all_skuce_2e


Instructors Solutions Manual - Chapter 11

Chapter 11 Solutions

Develop Your Skills 11.1

1. These data are collected on a random sample of days. They should be independent, unless the locations are close enough to each other that the foot traffic at each would be affected by the same factors. We will assume this is not the case.

Histograms show approximate normality.

[Figure: histograms of daily foot traffic. Recoverable panel title: Daily Foot Traffic at Location 1. Axes: Number of People (horizontal), Number of Days (vertical).]


Copyright 2011 Pearson Canada Inc. 275




[Figure: histogram of daily foot traffic; panel title not recoverable. Axes: Number of People (horizontal), Number of Days (vertical).]


The histogram for foot traffic at location 1 shows some right-skewness, but sample sizes are reasonable, and close to the same, so we will assume the population data are normally distributed.

The largest variance is 478.7 (for location 2), and the smallest is 257.2 (location 1). The largest variance is less than twice as large as the smallest. So, following our rule, we will assume the population variances are approximately equal.

Therefore, these data meet the required conditions for one-way ANOVA.
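The variance check used here can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the three variances are the sample values reported for the locations in this chapter, and the cut-off `max_ratio=4.0` is an assumption taken from the "less than four" rule quoted in other solutions in this chapter.

```python
def variance_ratio(variances):
    """Ratio of the largest sample variance to the smallest."""
    return max(variances) / min(variances)


def variances_roughly_equal(variances, max_ratio=4.0):
    """Rule-of-thumb check; max_ratio is an assumed cut-off, not a formal test."""
    return variance_ratio(variances) < max_ratio


# Sample variances for locations 1, 2 and 3, as reported in the solutions.
location_variances = [257.1795, 478.7310, 333.5595]

ratio = variance_ratio(location_variances)
print(f"largest / smallest = {ratio:.2f}")
print("approximately equal variances:", variances_roughly_equal(location_variances))
```

A ratio below 2 also satisfies the stricter "less than twice as large" wording used in this solution.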





2. Because we don't know the details of how the cashiers made their sample selection, we cannot know if the sample was truly random or independent. We will assume that the sample data were properly collected.

Histograms suggest normality.

[Figure: three histograms of winery purchases by customer age group (Under 30 Years of Age, Aged 30-50, Over 50 Years of Age). Axes: Value of Purchase (horizontal), Number of Purchases (vertical).]





The largest variance is 652.9, and the smallest is 555.1, so clearly the sample variances are fairly close in value. We will assume that the population variances are approximately equal.

These data appear to meet the requirements for one-way ANOVA.

3. We will presume that the college collected the sample data appropriately, so the data are independent and truly random.

The histograms suggest normality.

[Figure: four histograms of annual salaries of graduates (Marketing, Accounting, Human Resources, General Business). Axes: Annual Salary (horizontal), Number of Graduates (vertical).]

The largest variance is 159,729,974, and the smallest is 70,826,421. The ratio of the largest to the smallest is about 2.3, which meets the requirement (less than four). These data appear to meet the requirements for one-way ANOVA.





4. It appears the data are randomly selected, and independent.

The data sets are too small for histograms, but stem-and-leaf displays suggest normality.

Route 1

3 | 3 6

4 | 0 5 6 8

5 | 1 4 7

6 | 0

Route 2

2 | 2 8 8

3 | 2 3 5 5 8

4 | 6 9

Route 3

3 | 1 6

4 | 3 6 9

5 | 3 5 6 7

6 | 1

The largest variance is 94, the smallest is 67, for a ratio of largest-to-smallest of about 1.4. This is within the accepted range, so we will assume the population variances are approximately equal.

These data appear to meet the requirements for one-way ANOVA.
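As a rough check, the stem-and-leaf displays can be expanded back into observations and the sample variances recomputed. The sketch below assumes that each stem is the tens digit and each leaf the units digit of a commuting time; the dictionary layout and the helper name `expand` are illustrative choices, not part of the original solution.

```python
from statistics import variance

# Stem (assumed tens digit) -> leaves (assumed units digits), copied from the displays.
stem_and_leaf = {
    "Route 1": {3: [3, 6], 4: [0, 5, 6, 8], 5: [1, 4, 7], 6: [0]},
    "Route 2": {2: [2, 8, 8], 3: [2, 3, 5, 5, 8], 4: [6, 9]},
    "Route 3": {3: [1, 6], 4: [3, 6, 9], 5: [3, 5, 6, 7], 6: [1]},
}


def expand(display):
    """Turn a stem-and-leaf display into a flat list of observations."""
    return [10 * stem + leaf for stem, leaves in display.items() for leaf in leaves]


samples = {route: expand(display) for route, display in stem_and_leaf.items()}
sample_variances = {route: variance(values) for route, values in samples.items()}
ratio = max(sample_variances.values()) / min(sample_variances.values())

for route, var in sample_variances.items():
    print(f"{route}: n = {len(samples[route])}, sample variance = {var:.4f}")
print(f"largest / smallest = {ratio:.2f}")
```

Under that reading of the displays, the recomputed variances agree with the rounded values of 94 and 67 quoted in the solution.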

5. The histograms appear approximately normal. We have to be a bit cautious about assuming these are random samples. For example, one class may be mostly Accounting students, one may be mostly Marketing students, etc. The students who have selected these programs may have different levels of interest and aptitudes for statistics. We will assume that the classes are approximately randomly selected, in the absence of other information, but should note the caution.

The largest variance is not much larger than the smallest variance, so we will assume the population variances are approximately equal.





Develop Your Skills 11.2

6. H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 85, $n_1$ = 27, $n_2$ = 30, $n_3$ = 28, k = 3

$\bar{x}_1$ = 50.5556, $\bar{x}_2$ = 56.6, $\bar{x}_3$ = 74.3214

$s_1^2$ = 257.1795, $s_2^2$ = 478.7310, $s_3^2$ = 333.5595

$SS_{between}$ = 8475.2497, $SS_{within}$ = 29,575.9738

We have already checked for normality and equality of variances.

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(n_T-k)} = \frac{8475.2497/2}{29575.9738/82} = \frac{4237.6249}{360.6826} = 11.749$$

The F-distribution has 2, 82 degrees of freedom. The closest we can come in the table is 2, 80. We see that the p-value is < 1% (Excel provides a p-value of 0.00003). Reject H0. There is sufficient evidence to conclude that at least one of the locations has a different average number of daily passersby than the others.
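The F-score and its p-value can be reproduced from the two sums of squares. The sketch below is an illustration with hypothetical function names. The p-value helper relies on the closed form for the upper tail of an F distribution whose numerator has exactly 2 degrees of freedom, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$; it is not a general F-distribution routine and should not be used when k is not 3.

```python
def f_statistic(ss_between, ss_within, k, n_total):
    """Return (F, MS_between, MS_within) for a one-way ANOVA."""
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_between, ms_within


def p_value_two_numerator_df(f, df_within):
    """Upper-tail area of F(2, df_within); valid only for 2 numerator df."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


# Values from exercise 6: three locations, 85 days in total.
f, ms_between, ms_within = f_statistic(8475.2497, 29575.9738, k=3, n_total=85)
p = p_value_two_numerator_df(f, df_within=85 - 3)

print(f"MS_between = {ms_between:.4f}, MS_within = {ms_within:.4f}")
print(f"F = {f:.3f}, p-value = {p:.2e}")
```

The result agrees with the Excel p-value quoted in the solution, which is far below the 5% level of significance.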

The Excel output for this data set is shown below.

Anova: Single Factor

SUMMARY

Groups       Count   Sum    Average    Variance
Location 1   27      1365   50.5556    257.1794872
Location 2   30      1698   56.6000    478.7310
Location 3   28      2081   74.3214    333.5595238

ANOVA

Source of Variation   SS           df   MS            F        P-value
Between Groups        8475.2497    2    4237.6249     11.749   3.26E-05
Within Groups         29575.9738   82   360.6826074
Total                 38051.2235   84





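7. H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 150, $n_1$ = 50, $n_2$ = 50, $n_3$ = 50, k = 3

$\bar{x}_1$ = 77.5684, $\bar{x}_2$ = 119.6708, $\bar{x}_3$ = 132.4674

$s_1^2$ = 652.9145, $s_2^2$ = 555.0899, $s_3^2$ = 625.7846

$SS_{between}$ = 82504.4210, $SS_{within}$ = 89855.6606

F = 67.5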

The F-distribution has 2, 147 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that customers in different age groups make different average purchases.
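When only the group sizes, means and variances are available, the sums of squares can be rebuilt from those summaries. The following is a minimal sketch with a hypothetical function name, using the exercise 7 figures; small differences from the Excel output are expected because the means and variances are rounded to four decimal places.

```python
def anova_from_summary(counts, means, variances):
    """Rebuild the one-way ANOVA quantities from per-group summary statistics."""
    n_total = sum(counts)
    k = len(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means))
    ss_within = sum((n - 1) * v for n, v in zip(counts, variances))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": k - 1,
        "df_within": n_total - k,
        "f": ms_between / ms_within,
    }


# Exercise 7: purchases for the three age groups, 50 customers each.
result = anova_from_summary(
    counts=[50, 50, 50],
    means=[77.5684, 119.6708, 132.4674],
    variances=[652.9145, 555.0899, 625.7846],
)
print(result)
```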

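8. H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.025

$n_T$ = 80, $n_1$ = 20, $n_2$ = 20, $n_3$ = 20, $n_4$ = 20, k = 4

$\bar{x}_1$ = 51,395, $\bar{x}_2$ = 71,170, $\bar{x}_3$ = 56,100, $\bar{x}_4$ = 53,885

$s_1^2$ = 159,729,973.68, $s_2^2$ = 70,826,421.05, $s_3^2$ = 116,576,842.11, $s_4^2$ = 76,859,236.84

$SS_{between}$ = 4,750,850,500, $SS_{within}$ = 8,055,857,000

F = 14.9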

The F-distribution has 3, 76 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that at least one of the program streams had an average salary for graduates that differs from that of the other program streams.





Anova: Single Factor

SUMMARY

Groups             Count   Sum       Average   Variance
Marketing          20      1027900   51395     159729973.68
Accounting         20      1423400   71170     70826421.05
Human Resources    20      1122000   56100     116576842.11
General Business   20      1077700   53885     76859236.84

ANOVA

Source of Variation   SS            df   MS         F             P-value
Between Groups        4750850500    3    1.58E+09   14.94004664   9.77E-08
Within Groups         8055857000    76   1.06E+08
Total                 12806707500   79

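9. H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 30, $n_1$ = 10, $n_2$ = 10, $n_3$ = 10, k = 3

$\bar{x}_1$ = 47, $\bar{x}_2$ = 34.6, $\bar{x}_3$ = 48.7

$s_1^2$ = 78.4444, $s_2^2$ = 67.1556, $s_3^2$ = 94.0111

F = 7.4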

The F-distribution has 2, 27 degrees of freedom. Excel provides a p-value of 0.0027. Reject H0. There is sufficient evidence to conclude that the average commuting time for at least one of the routes is different from the others.

The Excel output is shown below.





Anova: Single Factor

SUMMARY

Groups    Count   Sum   Average   Variance
Route 1   10      470   47        78.44444
Route 2   10      346   34.6      67.15556
Route 3   10      487   48.7      94.01111

ANOVA

Source of Variation   SS           df   MS         F          P-value
Between Groups        1184.86667   2    592.4333   7.417436   0.002708
Within Groups         2156.5       27   79.87037
Total                 3341.36667   29
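The ANOVA table for the three routes can also be rebuilt directly from the raw observations. The sketch below assumes the commuting times are the values read off the stem-and-leaf displays in exercise 4 (stem as tens digit, leaf as units digit); the function name `one_way_anova` is a hypothetical helper, not an Excel or library routine.

```python
from statistics import mean

# Commuting times, assumed to be the values shown in the stem-and-leaf displays.
routes = {
    "Route 1": [33, 36, 40, 45, 46, 48, 51, 54, 57, 60],
    "Route 2": [22, 28, 28, 32, 33, 35, 35, 38, 46, 49],
    "Route 3": [31, 36, 43, 46, 49, 53, 55, 56, 57, 61],
}


def one_way_anova(groups):
    """Compute the entries of a single-factor ANOVA table from raw samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": f,
    }


table = one_way_anova(list(routes.values()))
for name, value in table.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```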

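10. H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 135, $n_1$ = 45, $n_2$ = 45, $n_3$ = 45, k = 3

$\bar{x}_1$ = 70.1111, $\bar{x}_2$ = 56.6889, $\bar{x}_3$ = 54.0667

$s_1^2$ = 212.1010, $s_2^2$ = 226.5828, $s_3^2$ = 218.0182

F = 15.2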

The F-distribution has 2, 132 degrees of freedom. Excel provides a p-value of approximately zero. Reject H0. There is sufficient evidence to conclude that differences in the use of the online software are associated with differences in final grades.

We should be cautious about interpreting the results, because although there is evidence of a difference in the average grades, we cannot necessarily attribute the differences in the use of the online software as the cause. There are many potential confounding factors, that is, other factors which could have an effect on the final grades.





Develop Your Skills 11.3

11. Completed Excel templates are shown below.

For locations 1 and 3:

Tukey-Kramer Confidence Interval

Was the null hypothesis rejected in the ANOVA test? yes

xbari 50.5556

xbarj 74.3214

ni 27

nj 28

q (from Appendix 7) 3.4

MSwithin 360.682607

Upper Confidence Limit -11.4505171

Lower Confidence Limit -36.0812289



For locations 2 and 3:

Tukey-Kramer Confidence Interval

Was the null hypothesis rejected in the ANOVA test? yes

xbari 56.6000

xbarj 74.3214

ni 30

nj 28


MSwithin 360.682607









For locations 1 and 2:

Tukey-Kramer Confidence Interval

xbari 50.5556

xbarj 56.6000

ni 27

nj 30


MSwithin 360.682607



The first two confidence intervals do not contain zero, so it appears that the average number of people passing by location 3 is greater than at the other two locations.
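The arithmetic behind the templates can be sketched as follows. The function name is hypothetical, and the formula is the usual Tukey-Kramer interval for unequal sample sizes, $(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$. The q-score of 3.4 is taken from the first template; reusing it for the other two pairs is an assumption, since their q rows did not survive in the templates above.

```python
from math import sqrt


def tukey_kramer_interval(xbar_i, xbar_j, n_i, n_j, q, ms_within):
    """Return (lower, upper) limits for the difference xbar_i - xbar_j."""
    diff = xbar_i - xbar_j
    margin = q * sqrt((ms_within / 2.0) * (1.0 / n_i + 1.0 / n_j))
    return diff - margin, diff + margin


MS_WITHIN = 360.682607  # from the ANOVA output for exercise 6
Q_SCORE = 3.4           # from the first template; assumed for the other pairs

pairs = {
    "locations 1 and 3": (50.5556, 74.3214, 27, 28),
    "locations 2 and 3": (56.6000, 74.3214, 30, 28),
    "locations 1 and 2": (50.5556, 56.6000, 27, 30),
}

intervals = {
    label: tukey_kramer_interval(xi, xj, ni, nj, Q_SCORE, MS_WITHIN)
    for label, (xi, xj, ni, nj) in pairs.items()
}
for label, (lower, upper) in intervals.items():
    verdict = "contains zero" if lower < 0 < upper else "does not contain zero"
    print(f"{label}: ({lower:.2f}, {upper:.2f}) {verdict}")
```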

12. Completed Excel templates are shown below (to save space, the row checking for rejection of the null hypothesis in ANOVA is not shown).

For under 30 and over 50:

Tukey-Kramer Confidence Interval

xbari 77.568

xbarj 132.467

ni 50

nj 50


MSwithin 611.2629973







For under 30 and 30-50:


xbari 77.568

xbarj 119.671

ni 50

nj 50





For 30-50 and over 50:

Tukey-Kramer Confidence Interval

xbari 119.671

xbarj 132.467

ni 50

nj 50





None of these confidence intervals contains zero. Certainly the highest average purchase is with those over 50.

13. Completed Excel templates are shown below (to save space, the row checking for rejection of the null hypothesis in ANOVA is not shown).

Marketing and Accounting:


xbari 51395.000

xbarj 71170.000

ni 20

nj 20


MSwithin 105998118.4210530







Accounting and General:


xbari 71170.000

xbarj 53885.000

ni 20

nj 20


MSwithin 105998118.4210530



Accounting and Human Resources:


xbari 71170.000

xbarj 56100.000

ni 20

nj 20


MSwithin 105998118.4210530



Marketing and Human Resources:


xbari 51395.000

xbarj 56100.000

ni 20

nj 20


MSwithin 105998118.4210530



At this point, no further comparisons are necessary. Since this interval contains zero, there does not appear to be a significant difference between the average salaries of Marketing graduates and Human Resources graduates. The differences between the sample means for all other pairs are smaller than for this pair, and so we know there will not be a significant difference for the other pairs.

To summarize: We have 95% confidence that the interval

(-$28,385.05, -$11,164.95) contains the average difference in the salaries of Marketing graduates, compared to Accounting graduates (in other words, the average salary of Accounting graduates is likely at least $11,164.95 higher)

($8,674.95, $25,895.05) contains the average difference in the salaries of Accounting graduates, compared to General Business graduates

($6,459.95, $23,680.05) contains the average difference in the salaries of Accounting graduates, compared to Human Resources graduates.

The differences between the average salaries of Human Resources, General Business, and Marketing graduates are not significant.

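14. Because of the balanced design, these calculations simplify to:

$$(\bar{x}_i - \bar{x}_j) \pm q\text{-score}\sqrt{\frac{MS_{within}}{n}} = (\bar{x}_i - \bar{x}_j) \pm 3.49\sqrt{\frac{79.8703704}{10}} = (\bar{x}_i - \bar{x}_j) \pm 9.86321$$

For route 2 and route 3:

$$(34.6 - 48.7) \pm 9.86321 = -14.1 \pm 9.86321 = (-23.96, -4.24)$$

For route 1 and route 2:

$$(47 - 34.6) \pm 9.86321 = 12.4 \pm 9.86321 = (2.54, 22.26)$$

For route 1 and 3:

$$(47 - 48.7) \pm 9.86321 = -1.7 \pm 9.86321 = (-11.56, 8.16)$$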





Route 2 would be the recommended route.
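With equal sample sizes the margin is the same for every pair, so all of the comparisons can be run in one loop. The sketch below uses the q-score, MSwithin and sample means from this solution; the variable names are illustrative only.

```python
from itertools import combinations
from math import sqrt

route_means = {"Route 1": 47.0, "Route 2": 34.6, "Route 3": 48.7}
Q_SCORE = 3.49          # q-score used in the solution
MS_WITHIN = 79.8703704  # from the ANOVA output for exercise 9
N_PER_GROUP = 10        # balanced design: 10 trips per route

# One shared margin because every group has the same size.
margin = Q_SCORE * sqrt(MS_WITHIN / N_PER_GROUP)

intervals = {}
for a, b in combinations(route_means, 2):
    diff = route_means[a] - route_means[b]
    intervals[(a, b)] = (diff - margin, diff + margin)

# A pair differs significantly when its interval excludes zero.
significant = [pair for pair, (lower, upper) in intervals.items() if lower > 0 or upper < 0]

print(f"margin = {margin:.5f}")
for (a, b), (lower, upper) in intervals.items():
    print(f"{a} minus {b}: ({lower:.2f}, {upper:.2f})")
print("significant pairs:", significant)
```

Route 2 appears in both significant pairs with the lower sample mean, which is consistent with the recommendation above.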

15. We have to be careful NOT to answer this question merely by inspection! First we recall that the F-test for ANOVA indicated a rejection of the null hypothesis. We have sample evidence that the population means are not all the same.

The completed Excel templates are shown below.

For assigned quizzes and sample tests only:

Tukey-Kramer Confidence Interval

xbari 70.1111

xbarj 54.0667

ni 45

nj 45

MSwithin 218.900673

We have 95% confidence that the interval (8.6, 23.5) contains the amount by which the average mark for all those who used the online software for assigned quizzes differs from the average mark for all those who used sample tests only. Thus it appears that the average mark is at least 8.6 percentage points higher for those who use the online software for assigned quizzes.





For assigned quizzes for marks, and quizzes for no marks:

Tukey-Kramer Confidence Interval

xbari 70.1111

xbarj 56.6889

ni 45

nj 45

MSwithin 218.900673

Once again, it appears that the average marks are higher when the online software is used for assigned quizzes for marks, compared with quizzes for no marks. We have 95% confidence that the interval (6.0, 20.8) contains the amount by which the average marks are higher when the online software is used for assigned quizzes for marks.

We cannot conclude that there is a difference in the average marks when the online software is used for quizzes (no marks) or sample tests only. The confidence interval shown below contains zero.

Tukey-Kramer Confidence Interval

xbari 56.6889

xbarj 54.0667

ni 45

nj 45

q (from Appendix 7) 3.36

MSwithin 218.900673







We have evidence that assigning quizzes for marks results in the best average marks for students. However, as we cautioned before, we cannot be certain of the cause-and-effect relationship here, because there are many potentially confounding variables.

Chapter Review Exercises

1. The histograms appear approximately normal, although there is some skewness in each one. However, with the large sample sizes, it is not unreasonable to assume the normality requirements are met.

2. The largest variance is 590.65, and the smallest is 370.02. The ratio of the largest to the smallest is not above 4, so it is reasonable to assume that population variances are approximately equal.

3. The completed output, with the missing values filled in, is shown below.

SUMMARY

Groups     Count   Sum    Average       Variance
Class #1   95      5840   61.47368421   370.0179171
Class #2   95      5088   53.55789474   590.6535274
Class #3   95      6075   63.94736842   415.5823068

ANOVA

Source of Variation   SS            df    MS            F
Between Groups        5596.133333   2     2798.066667   6.099311258
Within Groups         129367.8526   282   458.7512505
Total                 134963.986    284

4. The appropriate F-distribution has 2, 282 degrees of freedom. We refer to the area in the F table for 2, 120 degrees of freedom and see that an F-score of 6.1 has a p-value less than 0.010. Excel provides a more accurate value of 0.0026.





5. Because of the balanced design, these calculations simplify to:

$$(\bar{x}_i - \bar{x}_j) \pm q\text{-score}\sqrt{\frac{MS_{within}}{n}} = (\bar{x}_i - \bar{x}_j) \pm 3.31\sqrt{\frac{458.7512505}{95}} = (\bar{x}_i - \bar{x}_j) \pm 7.273691$$

For Class 2 and Class 3:

$$(53.5579 - 63.9474) \pm 7.273691 = (-17.7, -3.1)$$

We have 95% confidence that the interval (-17.7, -3.1) contains the difference between the average marks of Class 2 and Class 3. In other words, it appears that the average marks of those with the Class 3 professor are at least 3 percentage points higher than the average mark for those with the Class 2 professor.

For Class 1 and Class 2:

$$(61.4737 - 53.5579) \pm 7.273691 = (0.6, 15.2)$$

We have 95% confidence that the interval (0.6, 15.2) contains the difference between the average marks of Class 1 and Class 2. In other words, it appears that the average marks of those with the Class 1 professor are at least 0.6 percentage points higher than the average mark for those with the Class 2 professor.

For Class 1 and Class 3:

$$(61.4737 - 63.9474) \pm 7.273691 = (-9.7, 4.8)$$

In this case, the interval contains zero, and so there does not appear to be a significant difference between the average marks of those with the Class 1 professor and those with the Class 3 professor.

From these comparisons, it appears that the average marks are lower for the Class 2 professor's classes, and so this class should be avoided. There is no significant difference between the average marks for Class 1 and Class 3. The choice should then be: any professor but the one who led Class 2.





However, this is not a valid method of choosing classes, because there could be many explanations for why the Class 2 marks were significantly lower. It could have to do with the teacher's expertise, and evaluation methods. But it could also have arisen because of other factors: the students in Class 2 might have been less well-prepared, they may have worked more, or had family responsibilities that prevented them from studying, the class times might have been inconvenient, etc.

6. The conditions for ANOVA are not met, given the information in these three samples. The distribution of monthly balances for Mastercard owners is quite skewed to the left. The distribution of monthly balances for American Express owners is quite skewed to the right. As well, the variance of the American Express data is less than four times as large as the variance for the Mastercard data. It would not be appropriate to use ANOVA techniques in this case. The Kruskal-Wallis test could be used to compare these samples and draw conclusions about the populations (this technique is not covered in this text).

7. The requirement for equal variances is met. The largest variance is 14.757, which is only 2.3 times as large as the smallest variance, which is 6.314.

The completed output, with the missing values filled in, is shown below.

SUMMARY

Groups       Count   Sum   Average    Variance
Employee 1   35      404   11.54286   6.314286
Employee 2   37      462   12.48649   14.75676
Employee 3   32      357   11.15625   10.32964
Employee 4   42      377   8.97619    13.536

ANOVA

Source of Variation   SS         df    MS         F
Between Groups        264.6295   3     88.20984   7.726613
Within Groups         1621.124   142   11.41637
Total                 1885.753   145

8. The F-distribution will have 3, 142 degrees of freedom. The closest we can come in the table is 3, 120. The closest entry in the table is 3.95, and so we know that the p-value is < 0.01. At the 5% level of significance, the data do suggest that there are differences in the average number of minutes each employee spends with a customer before making a sale.





9. The completed Excel templates are shown below.

Employee 4 and Employee 2:

Tukey-Kramer Confidence Interval



xbari 8.97619048

xbarj 12.4864865

ni 42

nj 37


MSwithin 11.4163655

Upper Confidence Limit -1.52792567


We have 95% confidence that the interval (-5.5, -1.5) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 2. In other words, we expect the average time spent by Employee 4 is at least 1.5 minutes less than Employee 2.



Employee 4 and Employee 1:

Tukey-Kramer Confidence Interval



xbari 8.97619048

xbarj 11.5428571

ni 42


MSwithin 11.4163655







We have 95% confidence that the interval (-4.5, -0.5) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 1. In other words, we expect the average time spent by Employee 4 is at least 0.5 minutes less than Employee 1.



Employee 4 and Employee 3:

Tukey-Kramer Confidence Interval



xbari 8.97619048

xbarj 11.15625

ni 42


MSwithin 11.4163655



We have 95% confidence that the interval (-4.2, -0.1) contains the number of minutes by which the average time spent with customers before making a sale for Employee 4 differs from the average time spent by Employee 3. In other words, we expect the average time spent by Employee 4 is at least 0.1 minutes less than Employee 3.







Employee 2 and Employee 3:

Tukey-Kramer Confidence Interval


xbari 12.4864865

xbarj 11.15625

ni 37

nj 32


MSwithin 11.4163655



Since this interval contains zero, we conclude there is no significant difference between the average number of minutes Employees 2 and 3 spend with customers before making a sale.

At this point, we can conclude that there are no significant differences between the average number of minutes Employees 1, 2 and 3 spend with customers before making a sale (the differences in the sample means are all less than the difference for Employees 2 and 3). This means that the average amount of time spent by Employee 4 is less than the average amount of time spent by the other employees.





10. Without further information, we cannot comment on whether the data are independent random samples. In practice, we should never take this on faith. We will assume this condition is met, with a caution that if it isn't, the results may not be reliable.

Histograms of the sample data reassure us that the population data are probably normally distributed.

[Figure: three histograms of the number of factory accidents, one each for Training Method #1, #2 and #3. Axes: Number of Accidents (horizontal), Frequency (vertical).]

The largest variance is 16.5, which is less than twice as large as the smallest variance of 8.3, so we will assume the population variances are approximately equal.

It appears that the conditions for one-way ANOVA are met.

11. The Excel output is shown below.

Anova: Single Factor

SUMMARY

Groups                                     Count   Sum   Average    Variance
Number of Accidents, Training Method #1    30      281   9.366667   8.309195
Number of Accidents, Training Method #2    30      331   11.03333   9.757471
Number of Accidents, Training Method #3    30      362   12.06667   16.47816

ANOVA

Source of Variation   SS         df   MS         F          P-value
Between Groups        111.3556   2    55.67778   4.835263   0.010205
Within Groups         1001.8     87   11.51494
Total                 1113.156   89





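H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.025

$n_T$ = 90, $n_1$ = 30, $n_2$ = 30, $n_3$ = 30

$\bar{x}_1$ = 9.3667, $\bar{x}_2$ = 11.0333, $\bar{x}_3$ = 12.0667

$s_1^2$ = 8.3092, $s_2^2$ = 9.7575, $s_3^2$ = 16.4782

F = 4.835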

Excel provides a p-value of 0.010205. Reject H0. There is sufficient evidence to conclude that the average number of factory accidents is different, according to the training method. However, we cannot be certain that it is the training method that caused these differences. There may be other factors involved.

12. Comparing training method #1 and #3:


Tukey-Kramer Confidence Interval

xbari 9.366667

xbarj 12.06667

ni 30

nj 30


MSwithin 11.51494



We have 95% confidence that the interval (-4.8, -0.6) contains the amount by which the average number of factory accidents for training method #1 differs from the average number of factory accidents for training method #3. In other words, it appears that training method #1 is associated with at least 0.6 fewer accidents, on average.





Comparing training method #2 and #3:


Tukey-Kramer Confidence Interval


xbari 11.033

xbarj 12.067

ni 30

nj 30


MSwithin 11.515



Since this confidence interval contains zero, there is not a significant difference in the average number of factory accidents associated with training methods #2 and #3.

Comparing training method #1 and #2:


Tukey-Kramer Confidence Interval


xbari 9.36666667

xbarj 11.0333333

ni 30

nj 30


MSwithin 11.5149425


Lower Confidence Limit -3.77310707

Since this confidence interval contains zero, there is no significant difference between the average number of accidents that are associated with training methods #1 and #2.





Training method #1 compares favourably to training method #3, but otherwise the differences are not significant. This suggests that training method #3 is the "worst". Again, we should be cautious, because there may be other explanatory factors.

13. Histograms of the sample data show significant skewness for some of the connection times. The data for early morning and late afternoon connection times appear skewed to the right, and the connection times for the evening are skewed to the left. Sample sizes are also relatively small. As a result, it would probably not be wise to proceed with ANOVA here, as the required conditions do not appear to be met.

[Figure: five histograms of Connection Times to Online Mutual Fund Account, one each for Late Afternoon, Evening, Early Afternoon, Early Morning and Mid-Day. Axes: Times in Seconds (horizontal), Frequency (vertical).]





14. We are told the data were collected on a random sample of days. Histograms are shown below.

[Figure: three histograms of commuting times. Recoverable panel title: Commuting Times, 6 a.m. Departure. Axes: Number of Minutes (horizontal), Frequency (vertical).]


The histograms appear approximately normal. The Excel ANOVA output is shown below.





Anova: Single Factor

SUMMARY

Groups                                         Count   Sum    Average    Variance
Commuting Time in Minutes, 6 a.m. Departure    24      1097   45.70833   172.3895
Commuting Time in Minutes, 7 a.m. Departure    22      1002   45.54545   175.4026
Commuting Time in Minutes, 8 a.m. Departure    27      1063   39.37037   197.5499

ANOVA

Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        667.0442   2    333.5221   1.826131   0.168624   3.127676
Within Groups         12784.71   70   182.6387
Total                 13451.75   72

We see from the output that the variances are fairly close in value, and certainly the largest is less than four times as large as the smallest. It appears that the conditions for ANOVA are met.

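H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 73, $n_1$ = 24, $n_2$ = 22, $n_3$ = 27, k = 3

$\bar{x}_1$ = 45.7, $\bar{x}_2$ = 45.5, $\bar{x}_3$ = 39.4

$s_1^2$ = 172.4, $s_2^2$ = 175.4, $s_3^2$ = 197.5

F = 1.826 (the critical value reported by Excel is 3.1)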

Excel provides a p-value of 0.17. Fail to reject H0. There is not enough evidence to conclude that the mean commuting times are not all equal.





15. First, check conditions. The data are not actually random samples, but could perhaps be considered to be (see the explanation in the exercise). Histograms of the data are shown below.

[Figure: three histograms of final grades, one each for Classes Scheduled at 8 a.m. Thursday, Classes Scheduled at 4 p.m. Friday, and Classes Scheduled at 2 p.m. Wednesday. Axes: Final Grade (horizontal), Frequency (vertical).]





The histograms appear reasonably normal. The Excel ANOVA output is shown below.

Anova: Single Factor

SUMMARY

Groups                                            Count   Sum    Average    Variance
Marks of Class Scheduled for 8 a.m. Thursdays     20      1257   62.85      268.0289
Marks of Class Scheduled for 4 p.m. Fridays       23      1650   71.73913   305.2016
Marks of Class Scheduled for 2 p.m. Wednesday     25      1691   67.64      263.99

ANOVA

Source of Variation   SS         df   MS         F          P-value
Between Groups        845.314    2    422.657    1.514253   0.22763
Within Groups         18142.74   65   279.1192
Total                 18988.06   67

We can see from the output that the variances are sufficiently similar to allow us to assume the requirements for ANOVA are met (population variances approximately equal).

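H0: $\mu_1 = \mu_2 = \mu_3$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.01

$n_T$ = 68, $n_1$ = 20, $n_2$ = 23, $n_3$ = 25, k = 3

$\bar{x}_1$ = 62.85, $\bar{x}_2$ = 71.74, $\bar{x}_3$ = 67.64

$s_1^2$ = 268.03, $s_2^2$ = 305.20, $s_3^2$ = 263.99

F = 1.514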





Excel provides a p-value of 0.23. Fail to reject H0. There is not enough evidence to conclude that the mean grades for the students in classes for all three schedules are not equal. It does not appear that the scheduled time for classes affects the marks. However, we should be cautious, because there are many other factors that could be affecting marks. If we could control for them, we would be in a better position to investigate the effects of class schedule on student grades.

16. The first thing to note is that the data are not completely randomly selected. The information is provided by those who enter the contest. These customers may not represent all drugstore customers. Therefore, we must be cautious in interpreting the results. We would need more information about whether most customers entered the contest, before we could apply the results to all customers.

As well, we have no way to be sure that the data are correct. Some people may have misrepresented their age or the value of their most recent purchase.

With these caveats, we will proceed, but mostly for the practice!

Histograms of the data appear approximately normal, and sample sizes, at 45, are fairly large.

[Figure: six histograms of Most Recent Drugstore Purchase, one for each age group. Recoverable panel titles: Customers Under 18 Years Old, Customers 18-25 Years Old, Customers 75 or More Years Old. Axes: Amount of Purchase (horizontal), Frequency (vertical).]





Excel's ANOVA output is shown below.

Anova: Single Factor

SUMMARY

Groups        Count   Sum       Average    Variance
Under 18      45      1055.7    23.46      106.4338
18-25         45      1246.36   27.69689   83.09607
26-34         45      1567.82   34.84044   57.77471
35-49         45      1604.26   35.65022   147.776
50-74         45      1647.04   36.60089   121.0066
75 and over   45      1172.11   26.04689   78.81046

ANOVA

Source of Variation   SS         df    MS         F          P-value    F crit
Between Groups        7179.96    5     1435.992   14.48308   1.53E-12   2.248208
Within Groups         26175.49   264   99.1496
Total                 33355.46   269

The largest variance is 147.8, and the smallest is 57.8, so the largest variance is less than four times the smallest variance. We will assume that the population variances are sufficiently equal to proceed with ANOVA.





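17. H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 270, $n_1$ = 45, $n_2$ = 45, $n_3$ = 45, $n_4$ = 45, $n_5$ = 45, $n_6$ = 45, k = 6

$\bar{x}_1$ = 23.46, $\bar{x}_2$ = 27.70, $\bar{x}_3$ = 34.84, $\bar{x}_4$ = 35.65, $\bar{x}_5$ = 36.60, $\bar{x}_6$ = 26.05

$s_1^2$ = 106.43, $s_2^2$ = 83.10, $s_3^2$ = 57.77, $s_4^2$ = 147.78, $s_5^2$ = 121.01, $s_6^2$ = 78.81

F = 14.5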

Excel provides a p-value of approximately zero. Reject H0. There is enough evidence to conclude that the mean purchases of customers in different age groups are not all equal, when we consider the most recent purchases of those who entered the contest.

18. Because there are so many age groups in this data set, it is not as easy to see where the greatest differences in sample means are, simply by inspection. The easiest way to proceed is to create a table showing the differences in sample means. This is fairly easily constructed in Excel. See an example of such a table, below. Notice that the table shows the absolute value of the differences.

              Under 18   18-25   26-34   35-49   50-74    75 and over
Under 18      0
18-25         4.237      0.000
26-34         11.380     7.144   0.000
35-49         12.190     7.953   0.810   0.000
50-74         13.141     8.904   1.760   0.951   0.000
75 and over   2.587      1.650   8.794   9.603   10.554   0

By inspection of the table, we can see that we should start first by comparing the differences of purchases for customers under 18 and 50-74, then under 18 and 35-49, then under 18 and 26-34, and so on.
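The ordering of the comparisons can be generated rather than read off the table. A minimal sketch, using the sample means from the Excel output for exercise 16; the function name `ranked_differences` is hypothetical.

```python
from itertools import combinations

# Sample means by age group, from the Excel ANOVA summary.
sample_means = {
    "Under 18": 23.46,
    "18-25": 27.69689,
    "26-34": 34.84044,
    "35-49": 35.65022,
    "50-74": 36.60089,
    "75 and over": 26.04689,
}


def ranked_differences(means):
    """All pairs of groups, sorted by absolute difference in sample means (largest first)."""
    pairs = [(a, b, abs(means[a] - means[b])) for a, b in combinations(means, 2)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


ranked = ranked_differences(sample_means)
for group_a, group_b, diff in ranked:
    print(f"{group_a} vs {group_b}: {diff:.3f}")
```

Working down this list and stopping at the first interval that contains zero follows the approach taken in the templates for this exercise.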

We need the q-value for 6, 264 degrees of freedom. We will use the value for 6, 120 degrees of freedom, as the closest entry in Appendix 7.





The completed templates are shown below.

Under 18 and 50-74:


Tukey-Kramer Confidence Interval

xbari 23.46

xbarj 36.6008889

ni 45

nj 45


MSwithin 99.1496022



We have 95% confidence that the interval (-$19.23, -$7.06) contains the amount by which the average most recent purchase of customers under 18 differs from those aged 50-74 (for those who entered the contest).

Under 18 and 35-49:


Tukey-Kramer Confidence Interval

xbari 23.46

xbarj 35.6502222

ni 45

nj 45

MSwithin 99.1496022

Upper Confidence Limit -6.10434638







Under 18 and 26-34:


Tukey-Kramer Confidence Interval

Was the null hypothesis rejected in the ANOVA test? yes

xbari 23.46

xbarj 34.8404444

ni 45

nj 45

MSwithin 99.1496022


75 and over and 50-74:


Tukey-Kramer Confidence Interval


xbari 26.0468889

xbarj 36.6008889

ni 45

nj 45


MSwithin 99.1496022



We have 95% confidence that the interval (-$16.64, -$4.47) contains the amount by which the average most recent purchase of customers 75 and over differs from those aged 50-74 (for those who entered the contest).







75 and over and 35-49:

Tukey-Kramer Confidence Interval

Was the null hypothesis rejected in the ANOVA test? yes

xbari 26.0468889

xbarj 35.6502222

ni 45

nj 45


MSwithin 99.1496022



We have 95% confidence that the interval (-$15.69, -$3.52) contains the amount by which the average most recent purchase of customers 75 and over differs from those aged 35-49 (for those who entered the contest).



50-74 and 26-34:

Tukey-Kramer Confidence Interval

Was the null hypothesis rejected in the ANOVA test? yes

xbari 36.6008889

xbarj 34.8404444

ni 45

nj 45


MSwithin 99.1496022



At this point, we see the confidence interval contains zero. For this and all the remaining comparisons, there is not a significant difference in the average purchases (for those who entered the contest).





19. This question has already been answered, in the discussion of exercise 16. We proceeded, for practice, but these data do not represent a random sample of data about the drugstore customers.

20. Generally speaking, these data do not meet the requirements for ANOVA. The data sets are non-normal, and quite significantly skewed. The histograms for Canada-wide data are shown below.

[Figure: three histograms of wages and salaries for Canadians by highest level of schooling: Secondary School Graduation Certificate, Trades Certificate or Diploma, College Certificate or Diploma. Axes: Wages and Salaries (horizontal), Frequency (vertical).]





21. The professor has selected random samples, from large classes, and there is no immediately obvious reason why the observations would not be independent.

The sample data appear to be approximately normally distributed, as the histograms below illustrate.

[Figure: histograms of Final Mark in Microeconomics for students grouped by hours worked per week; panel titles are only partly recoverable ("Students Working ..."). Axes: Final Mark in Microeconomics (horizontal), Frequency (vertical).]




The ANOVA output shows the largest variance as 355.40, and the smallest as 251.44, and so the largest variance is less than four times as large as the smallest. We will presume that the population variances are approximately equal.

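H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$

H1: At least one $\mu$ differs from the others.

$\alpha$ = 0.05

$n_T$ = 153, $n_1$ = 32, $n_2$ = 34, $n_3$ = 36, $n_4$ = 27, $n_5$ = 24, k = 5

$\bar{x}_1$ = 64.88, $\bar{x}_2$ = 65.21, $\bar{x}_3$ = 55.14, $\bar{x}_4$ = 57.67, $\bar{x}_5$ = 52.54

$s_1^2$ = 355.40, $s_2^2$ = 251.44, $s_3^2$ = 305.32, $s_4^2$ = 284.00, $s_5^2$ = 256.43

F = 3.39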

Excel provides a p-value of 0.010. Reject H0. There is enough evidence to suggest that the mean marks are not all equal.

Again, because there are so many possible comparisons, it is useful to calculate all differences in sample means, so we can see which is largest, second-largest, and so on. Such a summary table is shown below (absolute values of differences are shown).

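[Table: absolute differences in sample means by hours worked per week. Only the first column heading, Less Than 5 Hours Per Week, is recoverable.]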




So, the first comparison will be the marks of students who work 20 or more hours a week and those who work 5 -




For the marks of those who work 20 or more hours a week, and those who work
