통자발표2

Factors That Can Affect Model Performance Seonwoo Lee

Transcript of 통자발표2

Page 1: 통자발표2

Factors That Can Affect Model

Performance

Seonwoo Lee

Page 2: 통자발표2

Contents

1. Introduction

2. Type III Errors

3. Measurement Errors

4. Discretizing Continuous Outcomes

5. When Should You Trust Your Model’s Prediction?

6. The Impact of a Large Sample

Page 3: 통자발표2

Several of the preceding chapters have focused on technical pitfalls of predictive models,

such as over-fitting and class imbalances.

True success may depend on aspects of the problem that are not directly related to the model itself.

This chapter discusses several important aspects of creating and maintaining predictive

models.

Introduction

Page 4: 통자발표2

Type III Error

One of the most common mistakes in modeling is to develop a model that answers the

wrong question, otherwise known as a Type III error (Kimball, 1957).

There can be a tendency to focus on the technical details and inadvertently overlook the true nature of the problem.

It is very important to focus on the overall strategy of the problem at hand and not just

the technical tactics of the potential solution.

Type III Errors

Page 5: 통자발표2

Example Business Application

In business, the main goal is almost always to maximize profit.

When the outcome is categorical (e.g., purchase / no-purchase or churn / retention), it is

key to tie the model performance and class prediction back to the expected profit.
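A minimal sketch of tying class predictions back to expected profit. The payoffs and confusion-matrix counts are made-up illustrative numbers, not values from the presentation.

```python
# Hypothetical per-customer payoffs for a purchase / no-purchase model.
# All figures are illustrative assumptions.
PAYOFF = {
    "TP": 25.0,   # contacted and purchased: margin after contact cost
    "FP": -2.0,   # contacted but did not purchase: contact cost wasted
    "FN": -25.0,  # not contacted but would have purchased: lost margin
    "TN": 0.0,    # not contacted and would not have purchased
}

def expected_profit(counts, payoff=PAYOFF):
    """Average profit per customer implied by a confusion matrix."""
    total = sum(counts.values())
    return sum(counts[cell] * payoff[cell] for cell in counts) / total

# Two hypothetical models with the same accuracy (90%) but different errors.
model_a = {"TP": 50, "FP": 80, "FN": 20, "TN": 850}
model_b = {"TP": 20, "FP": 50, "FN": 50, "TN": 880}

profit_a = expected_profit(model_a)
profit_b = expected_profit(model_b)
print(f"model A: {profit_a:.2f} per customer, model B: {profit_b:.2f} per customer")
```

Under these assumed payoffs the two models are equally accurate, yet only model A is profitable, which is why accuracy alone can answer the wrong question.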

Type III Errors

Page 6: 통자발표2

Example Response Modeling

Recall the direct marketing example discussed in Chapter 11.

This campaign did not sample from the appropriate population.

It only utilized customers who had been contacted.

Any model built from these data is limited to predicting the probability of a purchase given that the customer was contacted.

Type III Errors

Page 7: 통자발표2

Example Response Modeling

Siegel (2011) outlines four possible cases:

Type III Errors

                         No contact
                         Response    Non response
Contact  Response        A           B
         Non response    C           D

To increase profits, a model that accurately predicts which customers are in cell B is the

most useful.

Page 8: 통자발표2

Example Response Modeling

Techniques that attempt to understand the impacts of customer response are known by several names:

1. Uplift modeling

2. True lift modeling

3. Net lift modeling

4. Incremental lift modeling

5. True response modeling
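The idea shared by these techniques is to compare response with and without contact. A minimal sketch follows, assuming randomized contact and hypothetical records; the segment names and data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (segment, contacted, responded).
records = [
    ("new", True, True), ("new", True, True), ("new", True, False),
    ("new", False, False), ("new", False, False), ("new", False, True),
    ("loyal", True, True), ("loyal", True, True), ("loyal", True, False),
    ("loyal", False, True), ("loyal", False, True), ("loyal", False, False),
]

def uplift_by_segment(rows):
    """Response rate when contacted minus response rate when not contacted."""
    # tallies[segment][contacted] = [responses, customers]
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, contacted, responded in rows:
        tallies[segment][contacted][0] += int(responded)
        tallies[segment][contacted][1] += 1
    return {
        seg: t[True][0] / t[True][1] - t[False][0] / t[False][1]
        for seg, t in tallies.items()
    }

uplift = uplift_by_segment(records)
print(uplift)
```

In this toy data the "loyal" segment responds equally often with or without contact, so contacting it adds nothing; the "new" segment is where contact changes behavior.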

Type III Errors

Page 9: 통자발표2

Measurement Errors

Measurement Error

Measurement error is the difference between the measured value of a quantity and its true value.

Measurement error can be divided into two components:

1. Random error

2. Systematic error
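The two components can be told apart in a small simulation. This is a sketch with assumed values for the true quantity, the bias, and the noise level.

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 10.0   # the quantity being measured (hypothetical)
BIAS = 0.5          # systematic error: same offset on every reading
NOISE_SD = 0.2      # random error: varies from reading to reading

readings = [TRUE_VALUE + BIAS + random.gauss(0.0, NOISE_SD)
            for _ in range(10_000)]
errors = [r - TRUE_VALUE for r in readings]

mean_error = statistics.mean(errors)   # estimates the systematic component
spread = statistics.stdev(errors)      # estimates the random component
print(f"mean error: {mean_error:.3f}, spread: {spread:.3f}")
```

Averaging repeated readings shrinks the random component but leaves the systematic offset untouched.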

Page 10: 통자발표2

Measurement Errors

Measurement Error in the Outcome

Measurement error in the outcome gives rise to an upper bound on model performance that no pre-processing, model complexity, or tuning can overcome.

If a measured categorical outcome is mislabeled in the training data 10% of the time, it is

unlikely that any model could truly achieve more than a 90% accuracy rate.
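A quick simulation of this ceiling, assuming the labels used for evaluation carry the same 10% error rate and that an idealized model always predicts the true class:

```python
import random

random.seed(1)

N = 20_000
FLIP_RATE = 0.10  # fraction of recorded labels that are wrong

true_labels = [random.random() < 0.5 for _ in range(N)]
# Recorded labels: the true label, flipped 10% of the time.
recorded = [(not y) if random.random() < FLIP_RATE else y for y in true_labels]

# An idealized model that always predicts the true label.
predictions = true_labels

accuracy = sum(p == r for p, r in zip(predictions, recorded)) / N
print(f"measured accuracy of a perfect model: {accuracy:.3f}")
```

Even a model that is never wrong about the true class scores about 90% against the recorded labels.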

Page 11: 통자발표2

Example Linear Regression Model

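$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$, where $\epsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$.

If we knew the true model structure, then $\sigma^2$ would represent the lowest possible error achievable, or the irreducible error.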

We do not usually know the true model structure, so this value becomes inflated to

include model error.

Measurement Errors

Page 12: 통자발표2

Example Linear Regression Model

If the outcome contains significant measurement error, the irreducible error is increased.

The RMSE and $R^2$ have respective lower and upper bounds due to this error.

The better we understand the measurement system and its limits, the better we can

foresee the limits of model performance.
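The floor on RMSE and the ceiling on R^2 can be illustrated by scoring the true model structure against a noisy outcome. The coefficients and noise levels below are assumptions chosen for the sketch.

```python
import math
import random

random.seed(2)

N = 20_000
SIGMA = 1.0     # sd of the model's own error term (irreducible error)
MEAS_SD = 0.75  # sd of measurement error in the recorded outcome (assumed)

def true_mean(x):
    return 3.0 + 2.0 * x  # hypothetical true model structure

xs = [random.uniform(0.0, 10.0) for _ in range(N)]
observed = [true_mean(x) + random.gauss(0, SIGMA) + random.gauss(0, MEAS_SD)
            for x in xs]

# Predict with the true structure, i.e. the best any model could do.
sq_err = [(y - true_mean(x)) ** 2 for x, y in zip(xs, observed)]
rmse = math.sqrt(sum(sq_err) / N)
rmse_floor = math.sqrt(SIGMA**2 + MEAS_SD**2)  # sqrt(1 + 0.5625) = 1.25

y_bar = sum(observed) / N
ss_tot = sum((y - y_bar) ** 2 for y in observed)
r2 = 1 - sum(sq_err) / ss_tot
print(f"RMSE {rmse:.3f} (floor {rmse_floor:.2f}), R^2 {r2:.3f}")
```

Because the two noise sources are independent, their variances add, so the best achievable RMSE grows with the measurement error and the best achievable R^2 falls.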

Measurement Errors

Page 13: 통자발표2

Measurement Errors

Page 14: 통자발표2

Measurement Error in the Outcome

There are two important take-aways:

1. No model can predict this type of error.

2. As error increases, the models become virtually indistinguishable in terms of their

predictive performance.

Measurement Errors

Page 15: 통자발표2

Measurement Errors

Measurement Error in the Predictors

Since many predictors are measured, they can contain some level of measurement error

associated with the measurement system.

Any error in the predictors is likely to propagate directly through the model prediction equation and result in poor performance.

Page 16: 통자발표2

Measurement Errors

Measurement Error in the Predictors

The effect of randomness in the predictors can be drastic, depending on several factors:

1. The amount of randomness

2. The importance of the predictors

3. The type of model being used
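For one specific case, simple linear regression with a single noisy predictor, the effect of the amount of randomness can be sketched as follows. The true slope and noise levels are assumptions; other model types can react differently.

```python
import random

random.seed(3)

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

N = 20_000
TRUE_SLOPE = 2.0
x_true = [random.gauss(0, 1) for _ in range(N)]
y = [TRUE_SLOPE * x + random.gauss(0, 0.5) for x in x_true]

slopes = {}
for noise_sd in (0.0, 0.5, 1.0):
    # The model only ever sees the predictor with measurement noise added.
    x_measured = [x + random.gauss(0, noise_sd) for x in x_true]
    slopes[noise_sd] = slope(x_measured, y)

for noise_sd, b in slopes.items():
    print(f"predictor noise sd {noise_sd}: fitted slope {b:.2f}")
```

The fitted slope shrinks toward zero as predictor noise grows (roughly 2.0, 1.6, and 1.0 here), so the model understates the predictor's effect.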

Page 17: 통자발표2

Measurement Errors

Page 18: 통자발표2

Measurement Errors

Measurement Error in the Predictors

Measurement error in the predictors can cause considerable issues, especially in terms

of reproducibility of the results on future data sets.

Future results may be poor because the underlying predictor data differ from the values used in the training set.

Page 19: 통자발표2

Introduction

In many fields, even if the original response is on a continuous scale, it may be desirable

to work with a categorical response.

This could be due to the fact that the underlying distribution of the response is truly

bimodal.

Discretizing Continuous Outcomes

Page 20: 통자발표2

Discretizing Continuous Outcomes

The left histogram is symmetric.

The right histogram is clearly bimodal.

Page 21: 통자발표2

Introduction

When the response is bimodal (or multimodal), categorizing the response is appropriate.

If the response follows a continuous distribution without distinct groups, then categorizing the response is difficult and induces a loss of information.
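The loss of information can be seen in a small simulation: splitting a unimodal response at its median weakens its relationship with a predictor. The data-generating model here is an assumption for illustration.

```python
import random
import statistics

random.seed(4)

def corr(a, b):
    """Pearson correlation, computed directly from its definition."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

N = 20_000
x = [random.gauss(0, 1) for _ in range(N)]
y = [xi + random.gauss(0, 1) for xi in x]   # unimodal continuous response

cutoff = statistics.median(y)
y_binary = [1.0 if yi > cutoff else 0.0 for yi in y]  # discretized response

corr_continuous = corr(x, y)
corr_binary = corr(x, y_binary)
print(f"continuous: {corr_continuous:.3f}, discretized: {corr_binary:.3f}")
```

The discretized response treats a value just above the cutoff the same as an extreme one, which is where the information is lost.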

Discretizing Continuous Outcomes

Page 22: 통자발표2

Reasons for Discretization

1. Practical reason

Decision makers may prefer to know whether or not a compound is predicted to be

soluble enough rather than the compound’s predicted log solubility value.

2. High degree of error

Scientists may believe that the continuous response contains a high degree of error, so much so that only response values in either extreme of the distribution are likely to be correctly categorized.

Discretizing Continuous Outcomes

Page 23: 통자발표2

Discretizing Continuous Outcomes

Working with the original scale provides

more accurate predictions for all models.

Page 24: 통자발표2

Introduction

The predictive modeling process assumes that the mechanism that generated the

current, existing data will continue to generate new data.

The new data will have similar characteristics and will occupy similar parts of the

predictor space as the data on which the model was built.

We have taken appropriate steps to create test sets that had similar properties across

the predictor space as the training set (Section 4.3).

When Should You Trust Your Model’s Prediction?

Page 25: 통자발표2

Introduction

If the new data are generated by the same mechanism as the training set, we can be confident that the model will make sensible predictions for the new data.

If the new data are not generated by the same mechanism, or if the training set was too small or sparse to adequately cover the predictor space, then predictions from the model may not be trustworthy.

When Should You Trust Your Model’s Prediction?

Page 26: 통자발표2

Extrapolation

Extrapolation is defined as using a model to predict samples that are outside the range

of the training data (Armitage and Berry, 1994).

There may be regions within the predictors’ range where no training data exist.

Extrapolated predictions may not be trustworthy and can lead to poor decision making.
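A simple first check flags new samples that fall outside the per-predictor ranges of the training data. The helper names and data below are hypothetical.

```python
def training_ranges(rows):
    """Per-predictor (min, max) observed in the training set."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def out_of_range(sample, ranges):
    """Indices of predictors whose value falls outside the training range."""
    return [i for i, (value, (lo, hi)) in enumerate(zip(sample, ranges))
            if value < lo or value > hi]

# Hypothetical training set with two predictors.
train = [(1.0, 10.0), (2.0, 12.0), (3.5, 9.0), (4.0, 15.0)]
ranges = training_ranges(train)

inside = out_of_range((2.5, 11.0), ranges)    # within both ranges
outside = out_of_range((5.0, 11.0), ranges)   # first predictor too large
print(inside, outside)
```

Passing this check is necessary but not sufficient: a sample can sit inside every individual range and still land in a region where no training data exist.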

When Should You Trust Your Model’s Prediction?

Page 27: 통자발표2

Similarity of the New Data to the Training data

Many times, though, the practitioner does not know whether the mechanism that generated the new data is the same as the one that generated the training data.

There are a few tools that can be employed to understand the similarity.

When Should You Trust Your Model’s Prediction?

Page 28: 통자발표2

Applicability Domain

The applicability domain of a model is the region of predictor space where the model

makes predictions with a given reliability (Netzeva et al., 2005).

If the new data being predicted are similar enough to the training set, the assumption

would be that these points would have reliability that is characterized by the model

performance estimates.

When Should You Trust Your Model’s Prediction?

Page 29: 통자발표2

Dimension Reduction Techniques

A gross comparison of the space covered by the predictors from the training set and the

new set can be made using routine dimension reduction techniques such as principal

components analysis or multidimensional scaling (Davison, 1983).

If the training data and new data are generated from the same mechanism, then the

projection of these data will overlap.

When Should You Trust Your Model’s Prediction?

Page 30: 통자발표2

When Should You Trust Your Model’s Prediction?

Page 31: 통자발표2

Quantifying the Likelihood

When projecting many predictors into two dimensions, intricate predictor relationships as

well as sparse and dense pockets of space can be masked.

Hastie et al. (2008) describe an approach for quantifying the likelihood that a new

sample is a member of the training data.

Rather than using the method exactly as introduced by Hastie et al., the authors propose two slight alterations to it.
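The general idea is to train a classifier to separate the training data from a reference sample and to use its predicted probability as a similarity score. The slides do not spell out the details, so this sketch makes two assumptions: the reference sample is built by permuting each training predictor independently, and a plain k-nearest-neighbour vote stands in for the classifier.

```python
import math
import random

random.seed(5)

# Hypothetical training predictors: x2 closely tracks x1.
train = []
for _ in range(200):
    x1 = random.random()
    train.append((x1, x1 + random.gauss(0, 0.05)))

# Reference sample: shuffle each column on its own, which keeps the
# marginal distributions but breaks the relationship between predictors.
columns = [list(col) for col in zip(*train)]
for col in columns:
    random.shuffle(col)
reference = list(zip(*columns))

# Label 1 = real training row, label 0 = permuted reference row.
labelled = [(p, 1) for p in train] + [(p, 0) for p in reference]

def prob_training_like(sample, k=15):
    """Share of the k nearest labelled points that are real training rows."""
    nearest = sorted(labelled, key=lambda item: math.dist(item[0], sample))[:k]
    return sum(label for _, label in nearest) / k

p_similar = prob_training_like((0.5, 0.5))     # follows the x1 ~ x2 pattern
p_dissimilar = prob_training_like((0.2, 0.8))  # inside both ranges, off-pattern
print(p_similar, p_dissimilar)
```

The second sample lies within the range of each predictor, so a range check would accept it, yet it receives a low score because it breaks the relationship between the predictors.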

When Should You Trust Your Model’s Prediction?

Page 32: 통자발표2

When Should You Trust Your Model’s Prediction?

Page 33: 통자발표2

Introduction

An underlying presumption is that the more samples we have, the better the model we can produce.

A large number of samples can be beneficial, especially if the samples contain

information throughout the predictor space.

Measurement errors can minimize any advantages that may be brought by an increase

in the number of samples.

An increase in the number of samples can also have less positive consequences.

The Impact of a Large Sample

Page 34: 통자발표2

Less Positive Consequences

1. Many predictive models carry significant computational burdens as the number of samples and predictors grow.

A single tree

Ensembles of trees

2. There are diminishing returns on adding more of the same data from the same

population.
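Diminishing returns can be illustrated with the standard error of a sample mean, which shrinks as 1/sqrt(n). This is a stand-in for model performance in general, not a claim about any particular model.

```python
import math

SIGMA = 1.0  # sd of a single observation (arbitrary units)

def standard_error(n, sigma=SIGMA):
    """Standard error of a sample mean based on n independent observations."""
    return sigma / math.sqrt(n)

sizes = [100, 1_000, 10_000, 100_000]
errors = {n: standard_error(n) for n in sizes}

# Absolute improvement bought by each tenfold increase in sample size.
gains = [errors[a] - errors[b] for a, b in zip(sizes, sizes[1:])]
for (a, b), g in zip(zip(sizes, sizes[1:]), gains):
    print(f"{a} -> {b} samples: standard error drops by {g:.4f}")
```

Each tenfold increase in data buys a smaller absolute improvement than the one before, while the computational cost keeps rising.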

The Impact of a Large Sample
