ch13-solns-all_skuce_2e
8/12/2019 ch13-solns-all_skuce_2e
Instructor's Solutions Manual - Chapter 13
Chapter 13 Solutions
Develop Your Skills 13.1
1. The scatter diagram is shown below.
[Figure: Hendrick Software Sales. Total Sales ($000) vs. Number of Sales Contacts, with trendline y = 6.6519x + 4.7013]
The least-squares regression line is:
total sales ($000) = 6.6519(number of sales contacts) + 4.7013
Interpretation: Each additional sales contact is associated with an increase in sales of approximately $6,652.
The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 sales contacts.
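The numbers 6.6519 and 4.7013 come from the least-squares formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The minimal plain-Python sketch below computes them. The function name `least_squares` and the small data set are hypothetical. The Hendrick Software data are not reproduced in this solution, so the output will not match the exercise.

```python
def least_squares(xs, ys):
    """Return (b1, b0) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of cross-products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b1, b0


# Hypothetical data: number of sales contacts and total sales ($000)
contacts = [5, 8, 10, 12, 15]
sales = [38, 58, 71, 84, 104]
b1, b0 = least_squares(contacts, sales)
print(f"total sales ($000) = {b1:.4f}(number of sales contacts) + {b0:.4f}")
```

Total sales are recorded in $000. A fitted slope of 6.6519 therefore corresponds to about 6.6519 × $1,000 ≈ $6,652 per additional sales contact.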
2. The equation of the least-squares regression line is
monthly spending on restaurant meals = 0.024144(monthly income) + $44.90
Interpretation: Each additional dollar of monthly income is associated with an increase of about 2.4¢ in monthly spending on restaurant meals.
Copyright 2011 Pearson Canada Inc. 351
3. A scatter diagram is shown below.
[Figure: Smith and Klein Manufacturing. Sales vs. Promotion Expenditure, with trendline y = 30.21x - 148,770]
The least-squares regression line is:
annual sales = 30.21(annual promotion spending) - $148,770
Interpretation: Each additional dollar of promotion spending is associated with an increase in annual sales of approximately $30.21.
The y-intercept should not be interpreted, since the sample data did not contain any observations of $0 annual promotion spending.
4. The response variable is the semester average mark, and the explanatory variable is the total number of hours spent working during the semester. The relationship is unlikely to be positive.
The equation y = 0.1535x + 90.241 suggests that a student who worked no hours would get a mark of 90%, which seems a little high. It may not be reasonable to interpret the intercept this way, however, depending on the range of hours worked in the sample data.
It also suggests that for each hour worked, the student's mark would increase by 0.1535, which seems unlikely. It is more likely that the student's mark would decrease for each hour worked.
5. Because of the way the researcher has posed the question, the response variable is revenues, and the explanatory variable is the number of employees.
The scatter diagram is shown below:
[Figure: Top 25 Global Research Organizations, 2007. Global Research Revenues (US$ Millions) vs. Full-Time Employees, with trendline y = 0.1338x + 140.56]
The least-squares regression line is:
revenue (US$ millions) = 0.1338(number of full-time employees) + US$140.56 million
Interpretation: Each additional full-time employee is associated with an increase in revenue of approximately US$0.1338 million (or US$133,800).
The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 employees.
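The units behind this interpretation can be made explicit. Revenue is measured in US$ millions and the explanatory variable is a head count, so the slope is in US$ millions per employee. The helper below simply scales the Exercise 5 slope. `revenue_change_usd` is an illustrative name, not part of the original solution.

```python
SLOPE_MILLIONS_PER_EMPLOYEE = 0.1338  # Exercise 5 slope: US$ millions per full-time employee


def revenue_change_usd(additional_employees, slope=SLOPE_MILLIONS_PER_EMPLOYEE):
    """Estimated change in revenue, in US dollars, for a change in head count."""
    return additional_employees * slope * 1_000_000


for extra in (1, 1_000):
    print(f"{extra:>5} more employees: about US${revenue_change_usd(extra):,.0f}")
```

One additional employee thus corresponds to about US$133,800. One thousand additional employees correspond to about US$133.8 million.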
Develop Your Skills 13.2
6. The scatter diagram showed an apparently linear relationship between software sales and the number of sales contacts (see Develop Your Skills 13.1, Exercise 1).
[Figure: Number of Sales Contacts Residual Plot. Residuals vs. Number of Sales Contacts]
The residual plot shows residuals centred on zero, with fairly constant variability. There is no indication that the error terms are not independent. The data were collected over a random sample of months, but the dates of collection are not included, so it is not possible to check for independence of the residuals over time. A histogram of the residuals appears to be approximately normal.
[Figure: Hendrick Software Sales Residuals. Histogram, Frequency vs. Residual]
A check of the scatter diagram and the standardized residuals does not reveal any outliers. There are no obvious influential observations. It appears that the sample data meet the requirements of the theoretical model.
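A screen for large standardized residuals is easy to sketch. The version below divides each residual by the standard error of the estimate, s_e = sqrt(SSE / (n - 2)), and flags values at or beyond ±2. This is a simplifying assumption: Excel and the textbook may standardize residuals slightly differently, so treat the cutoff as a rule of thumb. The function names and the data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope b1 and intercept b0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


def standardized_residuals(xs, ys):
    """Each residual divided by s_e = sqrt(SSE / (n - 2))."""
    b1, b0 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (len(xs) - 2))
    return [e / s_e for e in residuals]


def flag_outliers(xs, ys, cutoff=2.0):
    """Indices whose standardized residual is <= -cutoff or >= +cutoff."""
    return [i for i, z in enumerate(standardized_residuals(xs, ys)) if abs(z) >= cutoff]


# Hypothetical data: y = x, except that the point at x = 5 is pushed up by 10
xs = list(range(1, 11))
ys = [x + 10 if x == 5 else x for x in xs]
print(flag_outliers(xs, ys))
```

A flagged point is a prompt to investigate, not an automatic reason to delete it (see Chapter Review Exercise 5).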
7. The scatter diagram does not show much of a pattern, but if there is a relationship, it appears to be linear.
[Figure: Spending on Restaurant Meals and Income. Monthly Spending on Restaurant Meals vs. Monthly Income, with trendline y = 0.0241x + 44.903]
[Figure: Monthly Income Residual Plot. Residuals vs. Monthly Income]
The residual plot shows fairly constant variability, although the residuals appear to be a little larger on the positive side (except in the area of monthly incomes of around $3,500). There is no obvious dependence among the residuals.
A histogram of the residuals appears to be approximately normal.
[Figure: Residuals for Model of Restaurant Spending and Monthly Income. Histogram, Frequency vs. Residual]
A check of the scatter diagram and the standardized residuals reveals six points that could be considered outliers. They are circled on the scatter diagram below.
[Figure: Spending on Restaurant Meals and Income. Monthly Spending on Restaurant Meals vs. Monthly Income, with the six possible outliers circled]
However, at this point in the analysis, it would be useful to go back to the beginning. It does not appear that monthly income is a strong predictor of monthly restaurant spending. There is too much variability in the restaurant spending data, at the various income levels, for us to develop a useful model.
8. The scatter diagram shows the points arranged in a linear fashion. However, the scatter around the regression line appears to widen as the amount of promotional spending increases. This shows quite clearly in the residual plot.
[Figure: Promotion Expenditure Residual Plot. Residuals vs. Promotion Expenditure]
At this point, it is clear that the data do not meet the requirements of the theoretical model. [For completeness, we will continue to check the other requirements.]
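The widening scatter can also be summarized with a number, as a rough companion to the residual plot. The sketch below orders the residuals by the explanatory variable (or by year, for the plot of residuals over time). It splits them into lower and upper halves and compares their standard deviations. The function name and the sample residuals are hypothetical. The ratio is a descriptive check, not a formal test of constant variance.

```python
import statistics


def spread_ratio(xs, residuals):
    """Std. dev. of residuals in the upper half of x divided by that in the lower half."""
    if len(xs) != len(residuals) or len(xs) < 4:
        raise ValueError("need at least 4 paired observations")
    pairs = sorted(zip(xs, residuals))      # order residuals by x (or by time)
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]   # with odd n, the middle point is skipped
    return statistics.stdev(upper) / statistics.stdev(lower)


# Hypothetical residuals that fan out as x increases
xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0]
print(f"spread ratio: {spread_ratio(xs, residuals):.1f}")
```

A ratio near 1 is what constant variability would look like. A ratio far above 1 matches the fan shape described for the Smith and Klein data.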
This is time-series data, and so the residuals should be plotted against time. The resulting plot shows a definite pattern over time, with the residuals widening in more recent years. This again indicates a problem: the current model does not meet the requirements of the theoretical model.
[Figure: Residuals Over Time, Smith and Klein Manufacturing. Residuals by year, 1980 to 2010]
At this point, it is clear that the model should be re-specified. Introducing time as an explanatory variable would probably be of interest.
9. With the two erroneous data points removed, the scatter diagram looks as shown below.
[Figure: Hours of Work and Semester Marks. Semester Average Mark vs. Total Hours at Paid Job During Semester, with trendline y = -0.144x + 89.175]
The relationship appears to be linear. The residual plot is shown below.
[Figure: Total Hours at Paid Job During Semester Residual Plot. Residuals vs. Total Hours at Paid Job During Semester]
The residuals appear centred on zero, with fairly constant variability, although variability seems greatest in the middle of the range of hours worked.
There is no indication that the residuals are dependent.
A histogram of the residuals is shown below.
[Figure: Residuals for Semester Mark and Hours of Work Data. Histogram, Frequency vs. Residual]
The histogram is quite normal in shape.
A check of the standardized residuals does not reveal any that are ≤ -2 or ≥ +2. There is, however, one observation with a standardized residual of -1.99: the observation (72, 65). [If we could, we would check this data point to make sure that it is accurate.] This point is quite obvious in both the scatter diagram and the residual plot (the point is circled in these two graphs).
There are no obvious influential observations, except perhaps for the almost-outlier. Removing this point from the data set does not affect the least-squares regression line significantly.
Despite the one troublesome point, the data set does appear to meet the requirements of the theoretical model.
10. The relationship between revenues and number of employees appears to be linear.
The residual plot is shown below.
[Figure: Full-Time Employees Residual Plot. Residuals vs. Full-Time Employees]
The residuals do not appear to be centred on zero, and the variability is not constant. At this point, it appears that this sample data set does not meet the requirements of the theoretical model.
A histogram of the residuals is shown below.
[Figure: Residuals for Top 25 Global Research Organizations. Histogram, Frequency vs. Residuals]
The histogram of residuals confirms what we saw in the residual plot. The residuals are highly skewed to the right.
There is one observation with a standardized residual of 3.8. The corresponding point is circled on the residual plot above.
Develop Your Skills 13.3
11. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the number of sales contacts and sales)
H1: β1 > 0 (that is, there is a positive linear relationship between the number of sales contacts and sales)
α = 0.05
From the Excel output, t = 7.64. The p-value is 9.38E-08, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these if in fact there is no linear relationship between the number of sales contacts and sales. Therefore, we can confidently reject the null hypothesis and conclude that there is evidence of a positive linear relationship between the number of sales contacts and sales for the Hendrick Software Sales Company.
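Excel reports a two-tailed p-value for the slope. The conversion to the one-tailed p-value used above fits in a small function; the name `one_tailed_p` is hypothetical. Halving is valid only when t has the sign that H1 predicts. Otherwise, the one-tailed p-value is 1 - p/2.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert Excel's two-tailed p-value for the slope into a one-tailed p-value.

    alternative: "greater" for H1: beta1 > 0, "less" for H1: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    on_predicted_side = t_stat > 0 if alternative == "greater" else t_stat < 0
    # Half the two-tailed area lies in the tail that H1 points to
    return two_tailed_p / 2 if on_predicted_side else 1 - two_tailed_p / 2


ALPHA = 0.05
p = one_tailed_p(9.38e-08, 7.64, "greater")   # Exercise 11 figures from the Excel output
print(p, "reject H0" if p < ALPHA else "do not reject H0")
```

For Exercise 14 (H1: β1 < 0), the corresponding call is `one_tailed_p(2.47086e-12, -10.01, "less")`.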
12. We already expect that the model will not be particularly useful. The number of data points with standardized residuals ≥ +2 or ≤ -2 is a concern. However, the hypothesis test provides some evidence that there is a linear relationship between monthly income and monthly spending on restaurant meals.
H0: β1 = 0 (that is, there is no linear relationship between monthly income and monthly spending on restaurant meals)
H1: β1 > 0 (that is, there is a positive linear relationship between monthly income and monthly spending on restaurant meals)
α = 0.05
From the Excel output, t = 4.6. The p-value on the output is 1.338E-05, and the p-value for the one-tailed test is half of this. Reject H0 and conclude there is evidence of a positive linear relationship between monthly income and monthly spending on restaurant meals.
13. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.
14. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the number of hours worked during the semester and the semester average grade)
H1: β1 < 0 (that is, there is a negative linear relationship between the number of hours worked during the semester and the semester average grade)
α = 0.05
From the Excel output, t = -10.01. The p-value is 2.47086E-12, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these if in fact there is no linear relationship between the number of hours worked during the semester and the semester average grade. Therefore, we can confidently reject the null hypothesis and conclude there is evidence of a negative linear relationship between the number of hours worked during the semester and the semester average grade.
15. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.
Develop Your Skills 13.4
16. From the Excel output, R² = 0.72. This means that 72% of the variation in sales is explained by the number of sales contacts. This suggests a fairly strong linear association between the two variables, which is not surprising.
Assuming the original data were collected correctly, it is possible that the other factors affecting sales have been randomized. In such a case, it would seem reasonable to conclude that increasing sales contacts would lead to increased sales. However, there will likely be limits to the positive impact that could be created. Presumably, salespeople contact their best prospective clients first, so additional contacts may not be as productive. As well, increasing the number of contacts may reduce the amount of time spent with each contact, which could have a detrimental effect on sales.
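R² is the share of the total variation in y that the fitted line accounts for: R² = 1 - SSE/SST. A minimal sketch with hypothetical data follows (the function name is illustrative). As the discussion above stresses, a high value describes association. It does not by itself establish cause and effect.

```python
def r_squared(xs, ys):
    """Coefficient of determination, 1 - SSE/SST, for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    return 1 - sse / sst


# Hypothetical data
print(f"R^2 = {r_squared([1, 2, 3, 4], [2, 4, 5, 8]):.4f}")
```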
17. The R² value for this data set is only 0.18. This is not surprising, because the scatter diagram of the relationship revealed scarcely any perceivable pattern. Only 18% of the variation in monthly spending on restaurant meals is explained by income. Earlier investigations suggested this model was not worth pursuing, and the low R² value reinforces that.
18. The R² value is fairly high, at 0.83. This means that 83% of the variation in Smith and Klein's sales is explained by sales promotion spending. However, while there is a strong association between the two variables, the linear regression model is not a good one, because the data do not meet the requirements of the theoretical model.
19. The R² value, at 0.72, suggests that 72% of the variation in semester average marks is explained by hours spent working during the semester. (Note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9.) Obviously, there are many factors that affect semester average marks, for example, ability, study habits, past educational experience, and so on. If the original data were collected in a truly random fashion, these factors may have been randomized.
It seems reasonable to conclude that students who work less will have more time for their studies, and it seems reasonable to think that marks improve with time spent studying. However, this data set does not guarantee that reducing work will lead to improved marks.
20. The R² value is 0.93, which looks very promising. Remember, though, that the model did not meet the requirements of the theoretical model. A high R² value does not guarantee a cause-and-effect relationship or a useful model.
Develop Your Skills 13.5
21. Since the requirements are met, it is appropriate to create a confidence interval. The Excel output is shown below.
Confidence Interval and Prediction Intervals Calculations
Confidence Level (%): 98%
Point Number: 1    Number of Sales Contacts: 10
                       Lower limit    Upper limit
Prediction Interval    44.96826       97.471443
Confidence Interval    66.068659      76.37104
With 98% confidence, the interval ($66,069, $76,371) contains the average sales for 10 sales contacts.
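Both intervals in the Excel output should be centred on the point estimate ŷ at 10 sales contacts, with the prediction interval wider. The quick check below uses the rounded coefficients from Exercise 1, so agreement is only to about three decimal places.

```python
# Coefficients from Exercise 1 (total sales in $000); rounded, so expect small differences
B0, B1 = 4.7013, 6.6519
x0 = 10
y_hat = B0 + B1 * x0

# Limits copied from the Excel output above
prediction_interval = (44.96826, 97.471443)
confidence_interval = (66.068659, 76.37104)

for name, (lo, hi) in [("prediction", prediction_interval), ("confidence", confidence_interval)]:
    centre = (lo + hi) / 2
    half_width = (hi - lo) / 2
    print(f"{name}: centre {centre:.3f}, half-width {half_width:.3f}, y-hat {y_hat:.3f}")
```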
22. We have already established that this is not a good model. However, even if it were a good model, we would not use it to predict monthly spending on restaurant meals based on a monthly income of $6,000. The highest monthly income in the sample data set is $4,056, and so we should not rely on our model to make predictions for a monthly income of $6,000.
23. Since the requirements are not met, it is not appropriate to create a confidence interval.
24. The Excel output is shown below (note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9).
Confidence Interval and Prediction Intervals Calculations
Confidence Level (%): 95%
Point Number: 1    Total Hours at Paid Job During Semester: 200
                       Lower limit    Upper limit
Prediction Interval    46.027952      74.74128231
Confidence Interval    58.1586403     62.61059452
With 95% confidence, the interval (58.2, 62.6) contains the mean semester average mark for students who work 200 hours in paid employment during the semester.
25. Since the requirements are not met, it is not appropriate to construct a prediction interval.
Chapter Review Exercises
1. The hypothesis test is only valid if the required conditions are met. If you don't check the conditions, you may rely on a hypothesis test when it is misleading.
2. Regression prediction intervals are wider than confidence intervals because the prediction interval has to account for the distribution of individual y-values around the regression line. The regression confidence interval has to take into account only that the sample regression line may not match the true population regression line.
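The difference shows up directly in the usual textbook formulas. Let h = 1/n + (x0 - x̄)²/Sxx. The confidence interval for the mean response is then ŷ ± t·s_e·sqrt(h), and the prediction interval for a single new value is ŷ ± t·s_e·sqrt(1 + h). The extra 1 represents the scatter of individual y-values around the line. The sketch below assumes the t critical value (n - 2 degrees of freedom) is looked up from a table and passed in. The function name and the data are hypothetical.

```python
import math


def intervals(xs, ys, x0, t_crit):
    """(confidence interval for mean y, prediction interval for one new y) at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))                      # standard error of the estimate
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s_e * math.sqrt(h)               # uncertainty about the line only
    pi_half = t_crit * s_e * math.sqrt(1 + h)           # plus scatter of individual y-values
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Hypothetical data; 4.303 is the two-tailed 95% t value for 2 degrees of freedom
ci, pi = intervals([1, 2, 3, 4], [2, 4, 5, 8], x0=2.5, t_crit=4.303)
print("CI:", ci)
print("PI:", pi)
```

Both intervals share the same centre ŷ. The prediction interval can never be the narrower of the two, because sqrt(1 + h) > sqrt(h).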
3. A lower standard error means that confidence and prediction intervals will be narrower. Predictions made with the model will therefore be more useful.
4. You should not make predictions outside the range of the sample data on which the regression relationship is based, because the relationship may be very different there. For example, a linear model may provide a good approximation of a portion of a relationship that is actually a curved line. However, if the line is extended beyond this portion, it could be quite misleading.
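One way to enforce this advice in code is to refuse any prediction outside the observed x-range. This is a hypothetical helper, not a feature of Excel or of any library. The coefficients are those of the restaurant-meals trendline. The income list is invented, except for its maximum: the $4,056 highest monthly income mentioned in Develop Your Skills 13.5, Exercise 22.

```python
def predict_within_range(b0, b1, x0, observed_xs):
    """Point prediction from y-hat = b0 + b1*x, refusing x0 outside the sample's x-range."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x0 <= hi:
        raise ValueError(f"x0={x0} is outside the sampled range [{lo}, {hi}]; "
                         "the linear relationship may not hold there")
    return b0 + b1 * x0


# Hypothetical monthly incomes; only the maximum ($4,056) comes from the exercise
incomes = [1_150, 2_300, 3_100, 4_056]
print(predict_within_range(44.903, 0.0241, 3_000, incomes))
try:
    predict_within_range(44.903, 0.0241, 6_000, incomes)
except ValueError as err:
    print("Not predicted:", err)
```

Staying inside the sampled range does not make a poor model good. It only avoids the additional risk of extrapolation.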
5. It is always tempting to just remove problem data points. However, if you do this, you will often find that the remaining data points also have outliers. If you persist in the practice of removing troublesome data points, you may not have much data left!
Careful thinking is a better approach. The outlier may be telling you something really important about the actual relationship between the explanatory and response variables. You wouldn't want to miss this important clue to what is really going on.
6. The scatter diagram is shown below.
[Figure: List Price and Odometer Reading for 2006 Honda Civic Sedan (as of Fall 2008). List Price vs. Odometer Reading, with trendline y = -0.0374x + 18,017]
The relationship is:
list price = -0.0374(odometer reading in kilometres) + $18,017
For this small car, the base asking price is $18,017, which is reduced by about 3.7¢ for every kilometre on the odometer. However, note that this base asking price should not be trusted for any cars with fewer than 8,600 kilometres, since no cars in the data set had odometer readings below that.
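The unit conversions behind this interpretation are simple to check. The sketch uses the fitted coefficients from the equation above; `list_price` is an illustrative name.

```python
# Exercise 6 line: list price = -0.0374(odometer km) + 18,017
B0, B1 = 18_017, -0.0374


def list_price(km):
    """Predicted list price (dollars) for a given odometer reading (km)."""
    return B0 + B1 * km


cents_per_km = -B1 * 100             # price reduction per kilometre, in cents
drop_per_10000_km = -B1 * 10_000     # price reduction per 10,000 km, in dollars
print(f"{cents_per_km:.2f} cents per km; ${drop_per_10000_km:,.0f} per 10,000 km")
print(f"Predicted list price at 50,000 km: ${list_price(50_000):,.0f}")
```

The predicted price of about $16,147 at 50,000 km lies close to the centre of the 95% intervals reported for 50,000 km in Exercise 7.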
7. We have already examined the scatter diagram, which suggests a negative linear relationship.
The residual plot is shown below. It has the desired appearance of constant variability, with the residuals centred on zero.
[Figure: Odometer Residual Plot. Residuals vs. Odometer]
A histogram of the residuals is shown below. The histogram is not perfectly normal in shape, but it is approximately so.
[Figure: Residuals for Honda Civic List Price Model, Based on Odometer. Histogram, Frequency vs. Residual]
There are no standardized residuals ≥ +2 or ≤ -2.
It appears the sample data meet the requirements of the theoretical model, and so it would be appropriate to use odometer readings to predict the list prices of these used cars.
A 95% prediction interval for the list price of one of these cars with 50,000 kilometres on the odometer is ($12,683, $19,608). The Excel output is shown below.
Confidence Interval and Prediction Intervals Calculations
Confidence Level (%): 95%
Point Number: 1    Odometer: 50000
                       Lower limit    Upper limit
Prediction Interval    12683.4909     19607.9242
Confidence Interval    15259.8312     17031.584
8. A scatter diagram showing the two stock market indexes is shown below. Note that the data used are the "adjusted close" figures. You must take care to match the dates: there are a few instances when one market is open and the other is not. Observations that did not have a match were removed from the data set.
[Figure: TSX and DJI, January to June 2009. S&P/TSX Composite Index vs. Dow Jones Industrial Average, with trendline y = 1.2553x - 894.84]
The estimated relationship is as follows:
TSX Composite Index = 1.255(DJI) - 895
Note that the choice of variable on the x or y axis is somewhat arbitrary here. Because Canada's economy is so dependent on exports to the US, the DJI is placed as the "explanatory" variable, but the cause and effect is not direct.
9. The coefficient of determination for the TSX and the DJI over the first six months of 2009 is 0.72. This measure suggests that 72% of the variation in the TSX is explained by variation in the DJI.
10. This data set is not a random sample, because it includes all matched observations over the period studied. Could the period itself be considered a random sample of how the two indexes usually behave? Probably not. The credit crisis and the recession that were affecting the stock markets in the first six months of 2009 made this period unreliable as a model of how the two indexes behave during more normal times. However, it is interesting to examine the patterns in the indexes over the period.
The indexes were more closely related at the beginning of 2009 than they were later in the period. A time-series plot reveals this quite clearly.
[Figure: TSX and DJI, January to June 2009. Index values over time, 2 Jan 09 to 19 Jun 09; series: DJI, TSX]
The required conditions are not met (as we might expect, given the graph above).
The residual plot clearly shows non-constant variability.
[Figure: DJI Residual Plot. Residuals vs. DJI]
As well, the histogram of residuals shows marked negative skewness.
[Figure: Residuals, TSX and DJI Data, January to June 2009. Histogram, Frequency vs. Residual]
A plot of the residuals over time clearly shows a time-related pattern.
[Figure: Residuals Over Time, TSX and DJI Data, January to June 2009. Residuals by date, 2 Jan 09 to 19 Jun 09]
11. A scatter diagram is shown below.
[Figure: Student Marks in Statistics. Mark on Final Exam vs. Mark on Test #2, with trendline y = 0.9586x + 0.4464]
The estimated relationship is as follows:
Mark on final exam = 0.9586(Mark on Test #2) + 0.4464
In other words, it appears the mark on the final exam is about 96% of the mark on Test #2.
12. The residual plot has the desired appearance.
[Figure: Mark on Test #2 Residual Plot. Residuals vs. Mark on Test #2]
A histogram of the residuals appears approximately normally distributed.
[Figure: Residuals for Final Exam Marks Prediction Model. Histogram, Frequency vs. Residual]
There are no obvious influential observations or outliers. It appears that the sample data conform to the requirements of the theoretical model.
13. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics)
H1: β1 > 0 (that is, there is a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics)
α = 0.05
From the Excel output, t = 16.5.
The p-value is 2.96E-14, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < 5%. In other words, there is almost no chance of getting sample results like these if in fact there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics. Therefore, reject H0 and conclude there is strong evidence of a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics.
14a. The Excel output is shown below.
                       Lower limit    Upper limit
Prediction Interval    51.78719489    73.7293732
Confidence Interval    60.5028627     65.013705
b. The 95% confidence interval estimate for the average exam mark of students who had a mark of 65% on the second test in the Statistics course is (60.5, 65.0).
c. The 95% prediction interval estimate for the exam mark of a student who had a mark of 65% on the second test in the Statistics course is (51.8, 73.7). This interval is wider because it has to take into account the variability in the individual marks of the students. The regression prediction interval is always wider than the confidence interval, because the prediction interval has to account for the distribution of exam marks around the regression line.
16. As the scatter diagram created for Exercise 15 indicates, there appears to be a fairly strong positive linear relationship between the recorded and audited inventory values.
The residual plot is shown below.
[Figure: Recorded Parts Inventory Value Residual Plot. Residuals vs. Recorded Parts Inventory Value, with two unusual residuals circled]
The residual plot shows residuals fairly randomly distributed around zero, with about the same variability for all x-values. Two residuals are unusually large in magnitude. They are circled in the plot.
The data were all collected at about the same point in time, so there is no need to check residuals against time.
A review of the standardized residuals reveals two outliers, observation #1 and observation #25 (these are the two points that are circled in the residual plot). Since the auditor has realized that he misread the written records for both data points, we will amend the data and redo the analysis.
The residual plot for the amended data set is shown below.
[Figure: Recorded Parts Inventory Value Residual Plot (amended data). Residuals vs. Recorded Parts Inventory Value]
The residual plot for the amended data set looks acceptable.
A histogram of the residuals for the amended data set is shown below.
[Figure: Residuals for Aries Car Parts Model. Histogram, Frequency vs. Residual]
The histogram of residuals shows some positive skewness, and this is a cause for concern, suggesting caution in the use of the model.
A check of the standardized residuals does not reveal any outliers. There are no obviously influential observations. It appears the corrected data set meets the requirements for the linear regression model, although the distribution of the residuals is not as normal in shape as is desired.
17. While we have some concern about the distribution of residuals, we will proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the recorded inventory values and the audited inventory values)
H1: β1 ≠ 0 (that is, there is a linear relationship between the recorded inventory values and the audited inventory values)
α = 0.05
An excerpt of Excel's regression output is shown below.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.995213711
R Square              0.99045033
Adjusted R Square     0.990160946
Standard Error        16.61634358
Observations          35

ANOVA
              df    SS             MS             F
Regression     1    944994.372     944994.372     3422.616936
Residual      33    9111.394836    276.1028738
Total         34    954105.7668

                                  Coefficients    Standard Error    t Stat         P-value
Intercept                         25.22708893     8.612571593       2.929100636    0.006122286
Recorded Parts Inventory Value    0.978281557     0.016721865       58.50313612    6.47389E-35
From the Excel output, t = 58.503. The p-value is 6.47389E-35, which is very small, and certainly < 5%. In other words, there is almost no chance of getting sample results like these if in fact there is no linear relationship between the recorded inventory values and the audited inventory values. Therefore, reject the null hypothesis and conclude there is evidence of a linear relationship between the recorded and audited inventory values.
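Several numbers in Excel's regression output are tied together by standard identities for simple linear regression: t = coefficient / standard error, F = t² (with one explanatory variable), R² = SSR/SST = (Multiple R)², and s_e = sqrt(SSE / df). The sketch below rechecks these identities using only the values reported above. It does not refit the model, since the raw inventory data are not reproduced here.

```python
import math

# Values copied from the Excel SUMMARY OUTPUT for Exercise 17
coef, se, t_stat = 0.978281557, 0.016721865, 58.50313612
ss_reg, ss_res, ss_tot = 944994.372, 9111.394836, 954105.7668
df_res, f_stat = 33, 3422.616936
multiple_r, r_square, std_error = 0.995213711, 0.99045033, 16.61634358

checks = {
    "t = coefficient / standard error": math.isclose(coef / se, t_stat, rel_tol=1e-6),
    "F = t^2 (simple regression)": math.isclose(t_stat ** 2, f_stat, rel_tol=1e-6),
    "R^2 = SSR / SST": math.isclose(ss_reg / ss_tot, r_square, rel_tol=1e-6),
    "R^2 = (Multiple R)^2": math.isclose(multiple_r ** 2, r_square, rel_tol=1e-6),
    "s_e = sqrt(SSE / df)": math.isclose(math.sqrt(ss_res / df_res), std_error, rel_tol=1e-6),
    "SST = SSR + SSE": math.isclose(ss_reg + ss_res, ss_tot, rel_tol=1e-6),
}
for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```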
20. A scatter diagram for the data is shown below.
[Figure: Performance of Graduates on Test Given During Job Interview. Score on Test Given During Job Interview vs. Final Overall Average Grade, with trendline y = 0.6421x + 4.9775, R² = 0.7989]
It appears there is a positive linear relationship between the final overall average grade and the score on the test given during the job interview. The regression relationship is as follows:
score on test given during job interview = 0.6421(final overall average grade) + 4.98
This is promising. Since the grades are marked out of 100 and the test scores are out of 70, the slope would be 0.70 if the relationship were perfect.
21. As discussed in Exercise 20 above, there appears to be a positive linear relationship between the final overall average grade and the score on the test given during the job interview.
The residual plot is shown below.
[Figure: Final Average Mark Residual Plot. Residuals vs. Final Average Mark]
The residuals appear randomly distributed around zero, with the same variability for all x-values.
A histogram of the residuals is shown below.
[Figure: Residuals for Test Score Model. Histogram, Frequency vs. Residual]
The residuals appear approximately normally distributed.
24. Refer back to the output shown above in the solution to Exercise 23.
With 98% confidence, we estimate that the interval (44.0, 62.3) contains the test score of a student with an overall average mark of 75.
It is difficult to decide if the company should continue to administer its own test. The answer depends on how reliable a predictor of future performance the test has been, and what the costs of administering the tests have been. If the company test makes a major distinction between the predicted performance of someone with a test score of 44 and someone with a test score of 62, then the overall average grade may not be a good substitute. However, there is a fairly strong relationship between the two variables. Perhaps the company could pilot using the overall average grade with a random sample of graduates, to see how well they do.
25. No, it would not be appropriate to use package weight as a predictor of shipping cost. We can see from the residual plot that variability increases as package weight increases.
26. It is often suggested that the Canadian stock market is very closely tied to the price of oil. A data set of weekly values for the Toronto Stock Exchange Composite Index (TSX) and the Canadian spot price of oil in dollars per barrel for the period from January 2000 to June 2009 was examined. The scatter diagram (shown below) suggests that while there may be a relationship between the two variables, it is not linear.
[Figure: TSX and Canadian Oil Prices. S&P/TSX Composite Index vs. Weekly Canadian Par Spot Price (Dollars per Barrel), with trendline y = 76.584x + 6039.5, R² = 0.6902]
The non-linearity is evident in the residual analysis, as well.
[Figure: Weekly Canadian Par Spot Price FOB (Dollars per Barrel) Residual Plot. Residuals vs. Weekly Canadian Par Spot Price FOB]
[Figure: Residuals Over Time, TSX and Oil Price Model. Weekly residuals, 2000 to 2009]
There appears to be a time-related pattern in the residuals. This is also apparent in the pattern of extreme residuals (those with standardized residuals ≥ +2 or ≤ -2), which cluster in August 2000, January to July 2007, July 2008, and September to October 2008. While the model could probably be improved by the addition of a time variable, it is not clear how this could be used for predictive purposes. It would probably be more useful to investigate what other explanatory variables were affecting the stock market over this period. As well, non-linear models could be explored.