Overfitting and Its Avoidance

Overfitting and Its AvoidanceChapter 5

指導教授：徐立群教授學生： R16014101 陳怡齊

R16011234 吳年鑫

Overfitting

即「過適」、「超適」或稱「過度擬合」意指在調適一個 model時，使用過多參數。對比於可取得的資料總量來說，一個荒謬的 model只要足夠複雜，是可以完美地適應 (fit)資料。不合乎一般化 (Generalization)

違反奧卡姆剃刀（ Occam’s Razor ) 原則

Overfitting & Generalization

A extreme example –

Customer churn or non-churn

Training data & Holdout data

Overfitting Examined

• Holdout Data and Fitting Graphs -

Figure 1. A typical fitting graph.

A fitting graph shows the accuracy of a model as a function of complexity .

Overfitting Examined

Base rate - What would b be ?

Figure 2. A fitting graph for the customer churn (table) model.

Overfitting in Tree Induction

Decision tree induction overfitting starts to the “sweet spot” in the graph .

Figure 3. A typical fitting graph for tree induction.

Overfitting in Mathematical Functions

We add more Xi, the function becomes more and more complicated.

Each Xi has a corresponding Wi, which is a learned parameter of the model .

Two dimensions you can fit a line to any two points and in three dimensions you can fit a plane to any three points .

This concept generalizes: as you increase the dimensionality, you can perfectly fit larger and larger sets of arbitrary points .

Example: Overfitting Linear Functions

Data： sepal width, petal widthTypes ： Iris Setosa, Iris Versicolor

Two different separation lines：a. Logistic regressionb. Support vector machine Figure 4

Figure 4 Figure 5

Figure 6 Figure 7

From Holdout Evaluation to Cross-Validation

Holdout Evaluation Splits the data into only one training and one holdout set.

Cross-validation computes its estimates over all the data by performing multiple splits and systematically swapping out samples for testing. ( k folds, typically k would be 5 or 10. )

The Churn Dataset Revisited

Average accuracy of the folds with classification trees is 68.6%—significantly lower than our previous measurement of 73%. ( the standard deviation of the fold accuracies is 1.1 )

“Example: Addressing the Churn Problem with Tree Induction” in Chapter 3. The logistic regression models

show slightly lower average accuracy (64.1%) and with higher variation ( standard deviation of 1.3 )

Classification trees may be preferable to logistic regression because of their greater stability and performance.

Learning Curves The generalization performance of data-driven

modeling generally improves as more training data become available.

Overfitting Avoidance & Complexity Control

Concept in Tree Induction : Tree induction commonly uses two techniques to avoid overfitting. These

strategies are : (i) to stop growing the tree before it gets too complex, and (ii) to grow the tree until it is too large, then “prune” it back, reducing its size (and

thereby its complexity).

Methods in Tree Induction : To limit tree size is to specify a minimum number of instances that must be

present in a leaf. Hypothesis test ( P-value )

General Method for Avoiding Overfitting Compare the best model we can build from one family (say, classification

trees) against the best model from another family (say, logistic regression).

Training set

Test set( hold out )

Training subset

Validation set

Nested holdout testing Select the best model by assess by

having a complexity of 122 nodes ( the sweet spot).

Induce a new tree with 122 nodes from the whole, original training data.

Final hold out

Original data

Training set

Test set

Nested Cross-ValidationSequential Forward Selection

Overfitting and Its Avoidance

Documents

Transcript of Overfitting and Its Avoidance

Tax Avoidance, Human Capital Accumulation and Economic …

Patent Application Drafting & Infringement Avoidance ... · PADIAS Patent Application Drafting & Infringement Avoidance Strategies Courses in Tokyo 会場： 新宿Lタワー22階

pengaruh corporate governance terhadap tax avoidance pada ...

ABUSO Y ELUSIÓN ABUSE AND AVOIDANCE

ANALISIS TAX AVOIDANCE, FIRM SIZE GOOD CORPORATE ...

Learning Through Experience: A Self Learning Collision Avoidance … · 2018-02-19 · Among those is collision avoidance, or obstacle avoidance, which is a fundamental system that

Barclays Tax Avoidance Scm Censored Guardian 2009 BarclaysKnight

pengaruh spesialisasi keahlian kap terhadap tax avoidance ( studi ...

PENGARUH TAX AVOIDANCE JANGKA PANJANG TERHADAP …eprints.undip.ac.id/43388/1/10_SIMARMATA.pdf · PENGARUH TAX AVOIDANCE JANGKA PANJANG TERHADAP NILAI PERUSAHAAN DENGAN KEPEMILIKAN

PENGARUH TAX AVOIDANCE DAN PROFITABILITAS …eprints.umm.ac.id/43126/1/PENDAHULUAN.pdfPENGARUH TAX AVOIDANCE DAN PROFITABILITAS TERHADAP KESEMPATAN BERINVESTASI PADA PERUSAHAAN (Studi

PENGARUH NILAI BUDAYA UNCERTAINTY AVOIDANCE

Anti Tax Avoidance Directive (ATAD) › sites › default › files › events › ... · Anti Tax Avoidance Directive (ATAD) 28. února 2018 Stanislav Kouba, MFČR. Obsah •íl

ANALISIS PENGARUH TAX AVOIDANCE TERHADAP NILAI PERUSAHAAN

TCP Flow control & Congestion Avoidance

Avoidance of tropical cyclones

Duncan Unaccepted-foods-Socio-Cultural Approaches to Understanding Avoidance

pengaruh corporate governance terhadap tax avoidance

tax avoidance, tax evasion and tax havens - Arbeiterkammer · PDF fileFarny Otto, Franz Michael, Gerhartinger Philipp, Lunzer Gertraud, Neuwirth Martina, Saringer Martin TAX AVOIDANCE,

PENGUJIAN TAX AVOIDANCE DAN RISIKO …dosen.univpancasila.ac.id/dosenfile/... · teori stakeholderssebagai dasar penilaian perilaku tax avoidance dalam ... mempengaruhi resiko kebangkrutan,

ANALYSIS OF FACTORS AFFECTING TAX AVOIDANCE IN ...

Patent Application Drafting & Infringement Avoidance ... · PADIAS Patent Application Drafting & Infringement Avoidance Strategies Courses in Tokyo 会場：新宿Lタワー22階