ch13-solns-all_skuce_2e


  • 8/12/2019 ch13-solns-all_skuce_2e


    Instructor's Solutions Manual - Chapter 13

    Chapter 13 Solutions

    Develop Your Skills 13.1

    1. The scatter diagram is shown below.

    [Figure: scatter diagram, "Hendrick Software Sales". x-axis: Number of Sales Contacts; y-axis: Total Sales ($000); fitted line: y = 6.6519x + 4.7013]

    The least-squares regression line is:

    total sales ($000) = 6.6519(number of sales contacts) + 4.7013

    Interpretation: Each new sales contact results in an increase in sales of approximately $6,652.

    The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 sales contacts.
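As a quick check of the interpretation above, here is a minimal Python sketch that evaluates the fitted line from Exercise 1. The helper name `predict_sales` and the inputs of 10 and 11 contacts are illustrative assumptions, not part of the original solution. Both inputs sit inside the range shown on the scatter diagram's x-axis.

```python
# Fitted line from Exercise 1: total sales ($000) = 6.6519 * contacts + 4.7013
SLOPE = 6.6519       # change in sales ($000) per additional sales contact
INTERCEPT = 4.7013   # not interpreted: the sample has no months with 0 contacts


def predict_sales(contacts: float) -> float:
    """Return predicted total sales in $000 (hypothetical helper)."""
    return SLOPE * contacts + INTERCEPT


at_10 = predict_sales(10)
at_11 = predict_sales(11)
print(f"Predicted sales at 10 contacts: ${at_10 * 1000:,.0f}")
print(f"Change for one more contact:    ${(at_11 - at_10) * 1000:,.0f}")
```

The second line of output is simply the slope expressed in dollars, which is where the $6,652 figure comes from.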

    2. The equation of the least-squares regression line is:

    monthly spending on restaurant meals = 0.024144(monthly income) + $44.90

    Interpretation: Each new dollar in monthly income increases spending on restaurant meals by about 2.4¢.

    Copyright 2011 Pearson Canada Inc. 351

    3. A scatter diagram is shown below.

    [Figure: scatter diagram, "Smith and Klein Manufacturing". x-axis: Promotion Expenditure ($0 to $50,000); y-axis: Sales ($0 to $1,600,000); fitted line: y = 30.21x - 148770]

    The least-squares regression line is:

    annual sales = 30.21(annual promotion spending) - $148,770

    Interpretation: Each new dollar in promotion spending results in an increase in annual sales of approximately $30.21.

    The y-intercept should not be interpreted, since the sample data did not contain any observations of $0 annual promotion spending.

    4. The response variable is the semester average mark, and the explanatory variable is the total number of hours spent working during the semester. The relationship is unlikely to be positive.

    y = 0.1535x + 90.241

    suggests that a student who worked no hours would get a mark of 90%, which seems a little high (but this intercept may not be reasonable to interpret this way, depending on the range of hours worked in the sample data).

    It also suggests that for each hour worked, the student's mark would increase by 0.1535, which seems unlikely. It is more likely that the student's mark would decrease for each hour worked.

    5. Because of the way the researcher has posed the question, the response variable is revenues, and the explanatory variable is the number of employees.

    The scatter diagram is shown below:

    [Figure: scatter diagram, "Top 25 Global Research Organizations, 2007". x-axis: Full-Time Employees; y-axis: Global Research Revenues (US$ Millions); fitted line: y = 0.1338x + 140.56]

    The least-squares regression line is:

    revenue (US$ millions) = 0.1338(number of full-time employees) + US$140.56 million

    Interpretation: Each additional employee results in increased revenue of US$0.1338 million (or US$133,800).

    The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 employees.

    Develop Your Skills 13.2

    6. The scatter diagram showed an apparently linear relationship between software sales and the number of sales contacts (see Develop Your Skills 13.1, Exercise 1).

    [Figure: residual plot, "Number of Sales Contacts Residual Plot". x-axis: Number of Sales Contacts; y-axis: Residuals]

    The residual plot shows residuals centred on zero, with fairly constant variability. There is no indication that the error terms are not independent. The data were collected over a random sample of months, but the dates of collection are not included, so it is not possible to check for independence of the residuals over time.

    A histogram of the residuals appears to be approximately normal.

    [Figure: histogram, "Hendrick Software Sales Residuals". x-axis: Residual; y-axis: Frequency]

    A check of the scatter diagram and the standardized residuals does not reveal any outliers. There are no obvious influential observations. It appears that the sample data meet the requirements of the theoretical model.

    7. The scatter diagram does not contain much of a pattern, but if there is a relationship, it appears to be linear.

    [Figure: scatter diagram, "Spending on Restaurant Meals and Income". x-axis: Monthly Income; y-axis: Monthly Spending on Restaurant Meals; fitted line: y = 0.0241x + 44.903]

    [Figure: residual plot, "Monthly Income Residual Plot". x-axis: Monthly Income; y-axis: Residuals]

    The residual plot shows a fairly constant variability, although the residuals appear to be a little larger on the positive side (except in the area of monthly incomes of around $3,500). There is no obvious dependence among the residuals.

    A histogram of the residuals appears to be approximately normal.

    [Figure: histogram, "Residuals for Model of Restaurant Spending and Monthly Income". x-axis: Residual; y-axis: Frequency]

    A check of the scatter diagram and the standardized residuals reveals six points that could be considered outliers. They are circled on the scatter diagram below.

    [Figure: scatter diagram, "Spending on Restaurant Meals and Income", with the possible outliers circled. x-axis: Monthly Income; y-axis: Monthly Spending on Restaurant Meals]

    However, at this point in the analysis, it would be useful to go back to the beginning. It does not appear that monthly income is a strong predictor of monthly restaurant spending. There is too much variability in the restaurant spending data, for the various income levels, for us to develop a useful model.

    8. The scatter diagram shows the points arranged in a linear fashion. However, the scatter around the regression line appears to widen as the amount of promotional spending increases. This shows quite clearly in the residual plot.

    [Figure: residual plot, "Promotion Expenditure Residual Plot". x-axis: Promotion Expenditure ($0 to $50,000); y-axis: Residuals]

    At this point, it is clear that the data do not meet the requirements of the theoretical model. [For completeness, we will continue to check the other requirements.]

    This is time-series data, and so the residuals should be plotted against time. The resulting plot shows a definite pattern over time, with the residuals widening in more recent years. This again indicates a problem; the current model does not meet the requirements of the theoretical model.

    [Figure: time-series plot, "Residuals Over Time, Smith and Klein Manufacturing". x-axis: year (1980 to 2010); y-axis: Residual]

    At this point, it is clear that the model should be re-specified. Introducing time as an explanatory variable would probably be of interest.

    9. With the two erroneous data points removed, the scatter diagram looks as shown below.

    [Figure: scatter diagram, "Hours of Work and Semester Marks". x-axis: Total Hours at Paid Job During Semester; y-axis: Semester Average Mark; fitted line: y = -0.144x + 89.175]

    The relationship appears to be linear. The residual plot is shown below.

    [Figure: residual plot, "Total Hours at Paid Job During Semester Residual Plot". x-axis: Total Hours at Paid Job During Semester; y-axis: Residuals]

    The residuals appear centred on zero, with fairly constant variability, although variability seems greatest in the middle of the range of hours worked.

    There is no indication that the residuals are dependent.

    A histogram of the residuals is shown below.

    [Figure: histogram, "Residuals for Semester Mark and Hours of Work Data". x-axis: Residual; y-axis: Frequency]

    The histogram is quite normal in shape.

    A check of the standardized residuals does not reveal any that are ≤ -2 or ≥ +2, although there is one observation with a standardized residual of -1.99. This is the observation (72, 65). [If we could, we would check this data point to make sure that it is accurate.] This point is quite obvious in both the scatter diagram and residual plot (the point is circled in these two graphs).

    There are no obvious influential observations, except perhaps for the almost-outlier. Removing this point from the data set does not affect the least-squares regression line significantly.

    Despite the one troublesome point, the data set does appear to meet the requirements of the theoretical model.

    10. The relationship between revenues and number of employees appears to be linear.

    The residual plot is shown below.

    [Figure: residual plot, "Full-Time Employees Residual Plot". x-axis: Full-Time Employees; y-axis: Residuals]

    The residuals do not appear to be centred on zero, and the variability is not constant. At this point, it appears that this sample data set does not meet the requirements of the theoretical model.

    A histogram of the residuals is shown below.

    [Figure: histogram, "Residuals for Top 25 Global Research Organizations". x-axis: Residuals; y-axis: Frequency]

    The histogram of residuals confirms what we saw in the residual plot. The residuals are highly skewed to the right.

    There is one observation with a standardized residual of 3.8. The corresponding point is circled on the residual plot above.

    Develop Your Skills 13.3

    11. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.

    H0: β1 = 0 (that is, there is no linear relationship between the number of sales contacts and sales)

    H1: β1 > 0 (that is, there is a positive linear relationship between the number of sales contacts and sales)

    α = 0.05

    From the Excel output, t = 7.64

    The p-value is 9.38E-08, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the number of sales contacts and sales. Therefore, we can (with confidence) reject the null hypothesis and conclude there is evidence of a positive linear relationship between the number of sales contacts and sales data for the Hendrick Software Sales Company.
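The halving step used above can be sketched in Python as follows. This is a minimal illustration, not part of the original solution: the function name `one_tailed_p` is hypothetical, and the sketch assumes the two-tailed p-value comes from a symmetric t distribution, as it does in Excel's regression output. Halving applies only when the sign of t agrees with the direction stated in H1.

```python
def one_tailed_p(two_tailed_p: float, t_stat: float, alternative: str) -> float:
    """Convert a two-tailed p-value for a slope t-test to a one-tailed one.

    alternative is "greater" (H1: slope > 0) or "less" (H1: slope < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    half = two_tailed_p / 2
    # If the sample slope points the wrong way, the one-tailed p-value is large.
    return half if agrees else 1 - half


ALPHA = 0.05
p = one_tailed_p(9.38e-08, 7.64, "greater")  # values from the Excel output
print(f"one-tailed p-value = {p:.3g}; reject H0: {p < ALPHA}")
```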

    12. We already expect that the model will not be particularly useful. The number of data points with standardized residuals either ≥ +2 or ≤ -2 is a concern. However, the hypothesis test provides some evidence that there is a linear relationship between monthly income and monthly spending on restaurant meals.

    H0: β1 = 0 (that is, there is no linear relationship between monthly income and monthly spending on restaurant meals)

    H1: β1 > 0 (that is, there is a positive linear relationship between monthly income and monthly spending on restaurant meals)

    α = 0.05

    From the Excel output, t = 4.6. The p-value on the output is 1.338E-05, and the p-value for the one-tailed test is half of this. Reject H0 and conclude there is evidence of a positive linear relationship between monthly income and monthly spending on restaurant meals.

    13. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.

    14. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.

    H0: β1 = 0 (that is, there is no linear relationship between the number of hours worked during the semester and the semester average grade)

    H1: β1 < 0 (that is, there is a negative linear relationship between the number of hours worked during the semester and the semester average grade)

    α = 0.05

    From the Excel output, t = -10.01

    The p-value is 2.47086E-12, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the number of hours worked during the semester and the semester average grade. Therefore, we can (with confidence) reject the null hypothesis and conclude there is evidence of a negative linear relationship between the number of hours worked during the semester and the semester average grade.

    15. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.

    Develop Your Skills 13.4

    16. From the Excel output, R² = 0.72. This means that 72% of the variation in sales is explained by the number of sales contacts. This suggests a fairly strong linear association between the two variables, which is not surprising.

    Assuming the original data was collected correctly, it is possible that the other factors affecting sales have been randomized. In such a case, it would seem reasonable to conclude that increasing sales contacts would lead to increased sales. However, there will likely be limits to the positive impact that could be created. Presumably, salespeople contact their best prospective clients first, so additional contacts may not be as productive. As well, increasing the number of contacts may reduce the quantity of time spent with each contact, which could have a detrimental effect on sales.

    17. The R² value for this data set is only 0.18. This is not surprising, because the scatter diagram of the relationship revealed scarcely any perceivable pattern. Only 18% of the variation in monthly spending on restaurant meals is explained by income. Earlier investigations suggested this model was not worth pursuing, and the low R² value reinforces that.

    18. The R² value is fairly high, at 0.83. This means that 83% of the variation in Smith and Klein's sales is explained by sales promotion spending. However, while there is a strong association between the two variables, the linear regression model is not a good one.

    19. The R² value, at 0.72, suggests that 72% of the variation in semester average marks is explained by hours spent working during the semester. (Note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9.) Obviously, there are many factors that affect semester average marks, for example, ability, study habits, past educational experience, and so on. If the original data were collected in a truly random fashion, these factors may have been randomized.

    It seems reasonable to conclude that students who work less will have more time for their studies, and it seems reasonable to think that marks improve with time spent studying. However, this data set does not guarantee that reducing work will lead to improved marks.

    20. The R² value is 0.93. Notice that this value looks very promising. Remember, though, that the sample data did not meet the requirements of the theoretical model. A high R² value does not guarantee a cause-and-effect relationship, or a useful model.

    Develop Your Skills 13.5

    21. Since the requirements are met, it is appropriate to create a confidence interval. The Excel output is shown below.

    Confidence Interval and Prediction Intervals Calculations

    Confidence level: 98%
    Point number: 1
    Number of Sales Contacts: 10

    Interval               Lower limit   Upper limit
    Prediction Interval    44.96826      97.471443
    Confidence Interval    66.068659     76.37104

    With 98% confidence, the interval ($66,069, $76,371) contains the average sales for 10 sales contacts.
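One way to sanity-check output like this is to confirm that both intervals are centred on the fitted value from the regression line. The Python sketch below does that with the numbers reported for 10 sales contacts. The variable names are illustrative assumptions, and small differences in the last digits are expected because the slope and intercept are rounded.

```python
# Values copied from the Excel output for 10 sales contacts (in $000).
pi_low, pi_high = 44.96826, 97.471443     # 98% prediction interval
ci_low, ci_high = 66.068659, 76.37104     # 98% confidence interval

point = 6.6519 * 10 + 4.7013              # fitted value from the regression line

pi_mid = (pi_low + pi_high) / 2
ci_mid = (ci_low + ci_high) / 2
pi_half = (pi_high - pi_low) / 2          # half-width of the prediction interval
ci_half = (ci_high - ci_low) / 2          # half-width of the confidence interval

print(f"fitted value   : {point:.2f}")
print(f"CI centre, +/- : {ci_mid:.2f}, {ci_half:.2f}")
print(f"PI centre, +/- : {pi_mid:.2f}, {pi_half:.2f}")
```

Both centres agree with the fitted value to within rounding, and the prediction interval is several times wider than the confidence interval.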

    22. We have already established this is not a good model. However, even if it were a good model, we would not use it to predict monthly spending on restaurant meals based on a monthly income of $6,000. The highest monthly income in the sample data set is $4,056, and so we should not rely on our model to make predictions for a monthly income of $6,000.
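The caution against extrapolating can be built into a prediction helper. The sketch below is illustrative only: `predict_in_range` is a hypothetical function, and the lower bound of $1,000 is an assumption (the solution states only the highest income in the sample, $4,056).

```python
def predict_in_range(x, slope, intercept, x_min, x_max):
    """Evaluate a fitted line, refusing to extrapolate beyond [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the sample range [{x_min}, {x_max}]"
        )
    return slope * x + intercept


# Restaurant-spending line from Develop Your Skills 13.1, Exercise 2.
SLOPE, INTERCEPT = 0.024144, 44.90
X_MAX = 4056    # highest monthly income in the sample (from the solution)
X_MIN = 1000    # ASSUMPTION: the true sample minimum is not given in the text

try:
    predict_in_range(6000, SLOPE, INTERCEPT, X_MIN, X_MAX)
except ValueError as err:
    print("Refusing to predict:", err)
```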

    23. Since the requirements are not met, it is not appropriate to create a confidence interval.

    24. The Excel output is shown below (note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9).

    Confidence Interval and Prediction Intervals Calculations

    Confidence level: 95%
    Point number: 1
    Total Hours at Paid Job During Semester: 200

    Interval               Lower limit   Upper limit
    Prediction Interval    46.027952     74.74128231
    Confidence Interval    58.1586403    62.61059452

    With 95% confidence, the interval (58.2, 62.6) contains the average semester average mark, when students work 200 hours in paid employment during the semester.

    25. Since the requirements are not met, it is not appropriate to construct a prediction interval.

    Chapter Review Exercises

    1. The hypothesis test is only valid if the required conditions are met. If you don't check conditions, you may rely on a hypothesis test when it is misleading.

    2. Regression prediction intervals are wider than confidence intervals because the interval has to account for the distribution of y-values around the regression line. The regression confidence interval has to take into account only that the sample regression line may not match the true population regression line.
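The point made in Exercise 2 can be seen in the standard textbook formulas for the two standard errors in simple linear regression. The Python sketch below is a minimal illustration under stated assumptions: the function name and every input value are hypothetical, chosen only to show the comparison, and do not come from any data set in this chapter.

```python
import math


def interval_std_errors(s, n, x0, x_bar, sxx):
    """Return (se_mean, se_pred) for a simple linear regression at x0.

    s     : standard error of the estimate
    n     : number of observations
    x_bar : sample mean of x
    sxx   : sum of squared deviations of x from x_bar
    """
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)        # drives the confidence interval
    se_pred = s * math.sqrt(1 + leverage)    # drives the prediction interval
    return se_mean, se_pred


# Hypothetical inputs, chosen only to illustrate the comparison.
se_mean, se_pred = interval_std_errors(s=10.0, n=25, x0=12, x_bar=10, sxx=200.0)
print(f"se for mean response      : {se_mean:.3f}")
print(f"se for individual response: {se_pred:.3f}")
```

The extra 1 under the square root is the variability of individual y-values around the line, so the prediction standard error always exceeds the one for the mean response.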

    3. A lower standard error means that confidence and prediction intervals will be narrower. Predictions made with the model will therefore be more useful.

    4. You should not make predictions outside the range of the sample data on which the regression relationship is based because the relationship may be very different there. For example, a linear model may provide a good approximation of a portion of a relationship that is actually a curved line. However, if the line is extended beyond this portion, it could be quite misleading.

    5. It is always tempting to just remove problem data points. However, if you do this, you will often find that the remaining data points also have outliers. If you persist in the practice of removing troublesome data points, you may not have much data left!

    Careful thinking is a better approach. The outlier may be telling you something really important about the actual relationship between the explanatory and response variables. You wouldn't want to miss this important clue to what is really going on.

    6. The scatter diagram is shown below.

    [Figure: scatter diagram, "List Price and Odometer Reading for 2006 Honda Civic Sedan (as of Fall 2008)". x-axis: Odometer Reading; y-axis: List Price; fitted line: y = -0.0374x + 18017]

    The relationship is:

    list price ($) = -0.0374(odometer reading in kilometres) + $18,017

    For this small car, the base asking price is $18,017, which is reduced by about 3.7¢ for every kilometre on the odometer. However, note that this base asking price should not be trusted for any cars with fewer than 8,600 kilometres, since no cars in the data set had odometer readings below that.

    7. We have already examined the scatter diagram, which suggests a negative linear relationship.

    The residual plot is shown below. It has the desired appearance of constant variability, with the residuals centred on zero.

    [Figure: residual plot, "Odometer Residual Plot". x-axis: Odometer; y-axis: Residuals]

    A histogram of the residuals is shown below. The histogram is not perfectly normal in shape, but it is approximately so.

    [Figure: histogram, "Residuals for Honda Civic List Price Model, Based on Odometer". x-axis: Residual; y-axis: Frequency]

    There are no standardized residuals ≥ +2 or ≤ -2.

    It appears the sample data meet the requirements of the theoretical model, and so it would be appropriate to use odometer readings to predict the list prices of these used cars.

    A 95% prediction interval for the list price for one of these cars with 50,000 kilometres on the odometer is ($12,683, $19,608). The Excel output is shown below.

    Confidence Interval and Prediction Intervals Calculations

    Confidence level: 95%
    Point number: 1
    Odometer: 50000

    Interval               Lower limit   Upper limit
    Prediction Interval    12683.4909    19607.9242
    Confidence Interval    15259.8312    17031.584

    8. A scatter diagram showing the two stock market indexes is shown below. Note that the data used are the "adjusted close" figures. You must take care to match the dates; there are a few instances when one market is open and the other is not. Observations that did not have a match were removed from the data set.

    [Figure: scatter diagram, "TSX and DJI, January to June, 2009". x-axis: Dow Jones Industrial Average; y-axis: S&P/TSX Composite Index; fitted line: y = 1.2553x - 894.84]

    The estimated relationship is as follows:

    TSX Composite Index = 1.255(DJI) - 895

    Note that the choice of variable on the x or y axis is somewhat arbitrary here. Because Canada's economy is so dependent on exports to the US, the DJI is placed as the "explanatory" variable, but the cause and effect is not direct.

    9. The coefficient of determination for the TSX and the DJI over the first six months of 2009 is 0.72. This measure suggests that 72% of the variation in the TSX is explained by variation in the DJI.

    10. This data set is not a random sample, because it includes all matched observations over the period studied. Could this be considered a random sample? Probably not. The credit crisis and the recession that were having impacts on the stock markets in the first six months of 2009 made this period unreliable as a model of how the two indexes behave during more normal times. However, it is interesting to examine the patterns in the indexes over the period.

    The indexes were more closely related at the beginning of 2009 than they were later in the period. A time-series plot reveals this quite clearly.

    [Figure: time-series plot, "TSX and DJI, January to June 2009". x-axis: date (02-Jan-09 to 19-Jun-09); y-axis: Index Values; series: DJI, TSX]

    The required conditions are not met (as we might expect, given the graph above).

    The residual plot clearly shows non-constant variability.

    [Figure: residual plot, "DJI Residual Plot". x-axis: DJI; y-axis: Residuals]

    As well, the histogram of residuals shows marked negative skewness.

    [Figure: histogram, "Residuals, TSX and DJI Data, January to June 2009". x-axis: Residual; y-axis: Frequency]

    A plot of the residuals over time clearly shows a time-related pattern.

    [Figure: time-series plot, "Residuals Over Time, TSX and DJI Data, January to June 2009". x-axis: date (02-Jan-09 to 19-Jun-09); y-axis: Residuals]

    11. A scatter diagram is shown below.

    [Figure: scatter diagram, "Student Marks in Statistics". x-axis: Mark on Test #2; y-axis: Mark on Final Exam; fitted line: y = 0.9586x + 0.4464]

    The estimated relationship is as follows:

    Mark on final exam = 0.9586(Mark on Test #2) + 0.4464

    In other words, it appears the mark on the final exam is about 96% of the mark on Test #2.

    12. The residual plot has the desired appearance.

    [Figure: residual plot, "Mark on Test #2 Residual Plot". x-axis: Mark on Test #2; y-axis: Residuals]

    A histogram of the residuals appears approximately normally-distributed.

    [Figure: histogram, "Residuals for Final Exam Marks Prediction Model". x-axis: Residual; y-axis: Frequency]

    There are no obvious influential observations or outliers. It appears that the sample data conform to the requirements of the theoretical model.

    13. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.

    H0: β1 = 0 (that is, there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics)

    H1: β1 > 0 (that is, there is a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics)

    α = 0.05

    From the Excel output, t = 16.5

    The p-value is 2.96E-14, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < 5%. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics. Therefore, reject H0 and conclude there is strong evidence of a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics.

    14a. The Excel output is shown below.

    Interval               Lower limit    Upper limit
    Prediction Interval    51.78719489    73.7293732
    Confidence Interval    60.5028627     65.013705

    b. The 95% confidence interval estimate for the average exam mark of students who had a mark of 65% on the second test in the Statistics course is (60.5, 65.0).

    c. The 95% prediction interval estimate for the exam mark of a student who had a mark of 65% on the second test in the Statistics course is (51.8, 73.7). This interval is wider, because it has to take into account the variability in individual marks of the students. The regression prediction interval is always wider than the confidence interval. The prediction interval has to take account of the distribution of exam marks around the regression line.

    16. As the scatter diagram created for Exercise 15 indicates, there appears to be a fairly strong positive linear relationship between the recorded and audited inventory values.

    The residual plot is shown below.

    [Figure: residual plot, "Recorded Parts Inventory Value Residual Plot". x-axis: Recorded Parts Inventory Value ($0 to $1,000); y-axis: Residuals]

    The residual plot shows residuals fairly randomly distributed around zero, with about the same variability for all x-values. There are two residuals that show unusual variability. They are circled in the plot.

    The data were all collected at about the same point in time, so there is no need to check residuals against time.

    A review of the standardized residuals reveals two outliers, observation #1 and observation #25 (these are the two points that are circled in the residual plot). Since the auditor has realized that he misread the written records for both data points, we will amend the data, and re-do the analysis.

    The residual plot for the amended data set is shown below.

    [Figure: residual plot for the amended data, "Recorded Parts Inventory Value Residual Plot". x-axis: Recorded Parts Inventory Value ($0 to $1,000); y-axis: Residuals]

    The residual plot for the amended data set looks acceptable.

    A histogram of the residuals for the amended data set is shown below.

    [Figure: histogram, Residuals for Aries Car Parts Model. x-axis: Residual; y-axis: Frequency.]

    The histogram of residuals shows some positive skewness, and this is a cause for concern, suggesting caution in the use of the model.

    A check of the standardized residuals does not reveal any outliers. There are no obviously influential observations. It appears the corrected data set meets the requirements for the linear regression model, although the distribution of the residuals is not as normal in shape as is desired.

    17. While we have some concern about the distribution of residuals, we will proceed with the hypothesis test.

    H0: β1 = 0 (that is, there is no linear relationship between the recorded inventory values and the audited inventory values)

    H1: β1 ≠ 0 (that is, there is a linear relationship between the recorded inventory values and the audited inventory values)

    α = 0.05

    An excerpt of Excel's regression output is shown below.

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.995213711
    R Square             0.99045033
    Adjusted R Square    0.990160946
    Standard Error       16.61634358
    Observations         35

    ANOVA
                  df    SS             MS             F
    Regression    1     944994.372     944994.372     3422.616936
    Residual      33    9111.394836    276.1028738
    Total         34    954105.7668

                                      Coefficients    Standard Error    t Stat         P-value
    Intercept                         25.22708893     8.612571593       2.929100636    0.006122286
    Recorded Parts Inventory Value    0.978281557     0.016721865       58.50313612    6.47389E-35

    From the Excel output, t = 58.503. The p-value is 6.47389E-35, which is very small, and certainly < 5%. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the recorded inventory values and the audited inventory values. Therefore, reject the null hypothesis and conclude there is evidence of a linear relationship between the recorded and audited inventory values.
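
    The figures in the Excel output are tied together by simple arithmetic, which makes a quick consistency check possible. The sketch below recomputes the test statistic and summary measures from the sums of squares and the slope estimate reported above; the variable names are illustrative, and the p-value is not recomputed because that would need a t-distribution routine.

```python
import math

# Values copied from the Excel regression output for Exercise 17.
n = 35
ss_regression = 944994.372
ss_residual = 9111.394836
ss_total = 954105.7668
slope = 0.978281557
slope_se = 0.016721865

df_residual = n - 2                        # 33 degrees of freedom
ms_residual = ss_residual / df_residual    # mean square error
t_stat = slope / slope_se                  # t statistic for H0: slope = 0
f_stat = ss_regression / ms_residual       # F statistic (regression df = 1)
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)         # standard error of estimate

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.1f}, F = {f_stat:.1f}")
print(f"R^2 = {r_square:.5f}, adjusted R^2 = {adj_r_square:.5f}")
print(f"standard error = {std_error:.4f}")
```

    In simple linear regression the F statistic equals the square of the slope's t statistic, so the two tests always lead to the same conclusion.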

    20. A scatter diagram for the data is shown below.

    [Figure: scatter diagram, Performance of Graduates on Test Given During Job Interview. x-axis: Final Overall Average Grade (50 to 100); y-axis: Score on Test Given During Job Interview (30 to 75). Fitted line: y = 0.6421x + 4.9775, R² = 0.7989.]

    It appears there is a positive linear relationship between the final overall average grade and the score on the test given during the job interview. The regression relationship is as follows:

    score on test given during job interview = 0.6421(final overall average grade) + 4.98

    This is promising. Since the grades are marked out of 100, and the test scores are out of 70, the slope would be 0.70 if the relationship were perfect.
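
    The fitted line can be used to produce point predictions, as in the short sketch below. The function name and the example grades are illustrative choices; the coefficients are the ones shown on the scatter diagram, and the grades are kept inside the range plotted there (50 to 100).

```python
# Coefficients from the fitted line on the scatter diagram for Exercise 20.
SLOPE = 0.6421
INTERCEPT = 4.9775

def predicted_test_score(final_average_grade):
    """Point prediction of the job-interview test score from the overall average grade."""
    return SLOPE * final_average_grade + INTERCEPT

# Illustrative grades within the range of the sample data.
for grade in (60, 75, 90):
    print(grade, round(predicted_test_score(grade), 1))
```

    The point prediction for a grade of 75 falls near the middle of the interval (44.0, 62.3) reported in Exercise 24, as expected for an interval centred on the regression line.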

    21. As discussed in Exercise 20 above, there appears to be a positive linear relationship between the final overall average grade and the score on the test given during the job interview.

    The residual plot is shown below.

    [Figure: Final Average Mark Residual Plot. x-axis: Final Average Mark (50 to 100); y-axis: Residuals.]

    The residuals appear randomly distributed around zero, with the same variability for all x-values.

    A histogram of the residuals is shown below.

    [Figure: histogram, Residuals for Test Score Model. x-axis: Residual; y-axis: Frequency.]

    The residuals appear approximately normally distributed.

    24. Refer back to the output shown above in the solution to Exercise 23.

    With 98% confidence, we estimate that the interval (44.0, 62.3) contains the test score of a student with an overall average mark of 75.

    It is difficult to decide if the company should continue to administer its own test. The answer depends on how reliable a predictor of future performance the test has been, and what the costs of administering the tests have been. If the company test makes a major distinction between the predicted performance of someone with a test score of 44 and someone with a test score of 62, then the overall average grade may not be a good substitute. However, there is a fairly strong relationship between the two variables. Perhaps the company could pilot using the overall average grade with a random sample of graduates, to see how well they do.

    25. No, it would not be appropriate to use package weight as a predictor of shipping cost. We can see from the residual plot that variability increases as package weight increases.
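
    A residual plot is the usual way to spot this problem, but a rough numerical check is also possible. The sketch below compares the spread of residuals for the lower and upper halves of the x-values; it is an informal illustration rather than a formal test, and the package weights and shipping costs are hypothetical values constructed so that the scatter grows with weight.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

def residual_spread_by_half(x, y):
    """Root-mean-square residual for the lower and upper halves of the x-values."""
    slope, intercept = fit_line(x, y)
    pairs = sorted(zip(x, y))
    residuals = [yi - (slope * xi + intercept) for xi, yi in pairs]
    half = len(residuals) // 2

    def rms(values):
        return math.sqrt(sum(v ** 2 for v in values) / len(values))

    return rms(residuals[:half]), rms(residuals[half:])

# Hypothetical data: deviations from the line grow in proportion to weight.
weights = list(range(1, 13))
costs = [3.0 + 2.0 * w + (-1) ** w * 0.2 * w for w in weights]

low, high = residual_spread_by_half(weights, costs)
print(round(low, 2), round(high, 2))
```

    A clearly larger spread in the upper half is the numerical counterpart of the fan shape seen in a residual plot, and signals that the constant-variability requirement is not met.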

    26. It is often suggested that the Canadian stock market is very closely tied to the price of oil. A data set of weekly values for the Toronto Stock Exchange Composite Index (TSX) and the Canadian spot price of oil in dollars per barrel for the period from January 2000 to June 2009 was examined. The scatter diagram (shown below) suggests that while there may be a relationship between the two variables, it is not linear.

    [Figure: scatter diagram, TSX and Canadian Oil Prices, January 2007-June 2009. x-axis: Weekly Canadian Par Spot Price (Dollars per Barrel), $0 to $160; y-axis: S&P TSX Composite Index, 4,000 to 16,000. Fitted line: y = 76.584x + 6039.5, R² = 0.6902.]

    The non-linearity is evident in the residual analysis, as well.

    [Figure: Weekly Canadian Par Spot Price FOB (Dollars per Barrel) Residual Plot. x-axis: Weekly Canadian Par Spot Price FOB (Dollars per Barrel), 0 to 160; y-axis: Residuals.]

    [Figure: Residuals Over Time, TSX and Oil Price Model. x-axis: date, 03/01/2000 to 03/03/2009; y-axis: Residual.]

    There appears to be a time-related pattern in the residuals. This is also apparent in the patterns of extreme residuals (those with standardized residuals at or beyond +2 or -2). They predictably occur in the period of August 2000, January-July 2007, July 2008 and September-October 2008. While the model could probably be improved by the addition of a time variable, it is not clear how this could be used for predictive purposes. It would probably be more useful to investigate what other explanatory variables were affecting the stock market over this period. As well, non-linear models could be explored.
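
    A time-related pattern in residuals can also be summarized numerically. The solution above relies on a plot of residuals over time; the sketch below instead uses the Durbin-Watson statistic, which is a different technique not used in the manual, and applies it to two short hypothetical residual series rather than the TSX data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation,
    well below 2 suggests positive autocorrelation, well above 2 negative."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, listed in time order.
trending = [1.0, 2.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -2.0, -1.0]  # long runs of one sign
alternating = [1.0, -1.0] * 5                                        # sign flips every period

print(round(durbin_watson(trending), 2), round(durbin_watson(alternating), 2))
```

    Residuals that drift in long runs above and below zero, as in the time plot for this exercise, give a statistic well below 2.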