
PAC Learning Framework

Dept. of CS&T, Tsinghua University

1 Questions for Learning Algorithms

2 Basis of PAC
   Introduction
   Basic Symbols
   Error of a Hypothesis
   PAC Learnability

3 Sample Complexity for Finite Hypothesis Spaces
   Consistent Learner
   Agnostic Learning and Inconsistent Hypotheses
   PAC-Learnability of Other Concept Classes

4 Sample Complexity for Infinite Hypothesis Spaces

5 Some More General Scenarios
   Papers in Recent Years

6 Criticisms of the PAC Model

Questions for Learning Algorithms

Sample ComplexityI How many training examples do we need to converge to a successful

hypothesis with a high probability?

Computational Complexity

I How much computational effort is needed to converge to a successfulhypothesis with a high probability?

Mistake Bound

I How many training examples will the learner misclassify beforeconverging to a successful hypothesis?

� (Dept. of CS&T, Tsinghua University) PAC Learning Framework 3 / 30


Questions for Learning Algorithms

What does it mean for the learner to be "successful"?
   The output hypothesis is identical to the target concept
   The output hypothesis agrees with the target concept most of the time

The answer depends on the particular learning model in mind



Introduction

Probably Approximately Correct (PAC) Learning

Analysis under the framework
   Sample complexity
   Computational complexity

The discussion is restricted to
   Learning boolean-valued concepts
   Learning from noise-free training data


Basic Symbols

Example: a student passes the course Applied Stochastic Process if they passed the mid-term exam and handed in all the homework, and either passed the final exam or handed in a reading report.

X: instance set, x ∈ X
   |X| = 2 × 2 × 2 × 2 = 16

C: concept set, c ∈ C
   C = {((1, 1, 1, 0), 1), ((1, 0, 1, 1), 1), ((1, 1, 1, 1), 1)}

H: the set of all possible hypotheses, h ∈ H
   |H| = 1 + 3 × 3 × 3 × 3 = 82

𝒟: instance distribution; D: training set

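A minimal Python sketch of this example, assuming the standard conjunctive representation in which each of the four boolean features (mid-term, homework, final, report; the names are illustrative) is constrained to 1, constrained to 0, or ignored ('?'):

   from itertools import product

   # Instances: 4 boolean features (mid-term, homework, final, report).
   X = list(product([0, 1], repeat=4))          # |X| = 2^4 = 16

   # Conjunctive hypotheses: each position is 1 (feature must hold),
   # 0 (feature must not hold), or '?' (don't care); plus one hypothesis
   # that rejects every instance, giving |H| = 1 + 3^4 = 82.
   H = list(product([0, 1, '?'], repeat=4))
   EMPTY = None                                 # the all-negative hypothesis

   def classify(h, x):
       """Apply a conjunctive hypothesis h to an instance x."""
       if h is EMPTY:
           return 0
       return int(all(c == '?' or c == xi for c, xi in zip(h, x)))

   print(len(X), len(H) + 1)                    # 16 82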

Basic Symbols

Model of learning from examples: each instance x is drawn randomly from X according to 𝒟


Error of a Hypothesis: True Error

Definition
The true error of hypothesis h with respect to target concept c and distribution 𝒟 is the probability that h will misclassify an instance drawn at random according to 𝒟, i.e.

   error_𝒟(h) ≡ P(c Δ h)

where A Δ B is the symmetric difference, A Δ B = (A ∪ B) \ (A ∩ B)

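Since error_𝒟(h) is just a probability under 𝒟, it can be estimated by sampling. A sketch, assuming a uniform 𝒟 over the instance set and reusing the hypothetical classify helper and X from the Basic Symbols sketch:

   import random

   def estimate_true_error(h, c, X, n_samples=100_000, rng=random):
       """Monte Carlo estimate of error(h) = P(h(x) != c(x)) for x ~ D.
       D is taken to be uniform over X here; any sampler for D would do."""
       wrong = 0
       for _ in range(n_samples):
           x = rng.choice(X)            # draw one instance according to D
           wrong += classify(h, x) != classify(c, x)
       return wrong / n_samples

With a finite X the error could of course be computed exactly by enumeration; sampling is what generalizes to arbitrary distributions.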

PAC Learnability

Learnability ("successful")

Requiring error_𝒟(h) = 0 is futile
   We would need to provide training examples corresponding to every possible instance in X
   Randomly drawn training examples always carry some nonzero probability of being misleading

PAC learnability
   We require only that h's true error be bounded by some constant ε that can be made arbitrarily small
   We require only that h's probability of failure be bounded by some constant δ that can be made arbitrarily small
   (1 − δ) → "probably"; ε → "approximately correct"
   We require only that the learner probably learns a hypothesis that is approximately correct, hence "PAC"


PAC Learnability

Definition
Consider a concept class C defined over a set of instances X of length n and a learner L using hypothesis space H. C is PAC-learnable by L using H if, for all c ∈ C, all distributions 𝒟 over X, all ε with 0 < ε < 1/2, and all δ with 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_𝒟(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).

   L → (1 − δ), ε
   L → polynomial time, polynomial samples
   X → n
   C → size(c)



Sample Complexity for Finite Hypothesis Spaces

Sample complexity
   The growth in the number of required training examples with problem size is called the sample complexity of the learning problem

Consistent Learner

Definition
A hypothesis h is consistent with a training set D if and only if h(x) = c(x) for every example in D, i.e.

   Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

A learner is consistent if it outputs hypotheses that perfectly fit the training data


Consistent Learner

Version Space

Definition
The version space: VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

Example: C = {(1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 1, 1)}, c = (1, 0, 1, 1),
D = {((1, 0, 1, 1), 1), ((1, 0, 0, 1), 0)}
⇒ VS_{H,D} = {(?, ?, 1, ?), ..., (1, 0, 1, 1), ..., (1, 1, 1, 0), ..., (1, 1, 1, 1)}

ε-exhausted
To bound the number of examples needed by any consistent learner, we need only bound the number of examples needed to assure that the version space contains no unacceptable hypotheses

Definition
VS_{H,D} is said to be ε-exhausted with respect to c and 𝒟 if every hypothesis h in VS_{H,D} has true error less than ε with respect to c and 𝒟, i.e.

   (∀h ∈ VS_{H,D}) error_𝒟(h) < ε

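For a finite H the version space can be computed directly by filtering. A sketch, again reusing the hypothetical H and classify from the Basic Symbols sketch:

   def consistent(h, D):
       """Consistent(h, D): h reproduces the label of every training example."""
       return all(classify(h, x) == y for x, y in D)

   def version_space(H, D):
       """VS_{H,D}: every hypothesis in H consistent with D."""
       return [h for h in H if consistent(h, D)]

   # Training set from the slide.
   D = [((1, 0, 1, 1), 1), ((1, 0, 0, 1), 0)]
   VS = version_space(H, D)
   print(len(VS))                          # hypotheses surviving both examples
   print(('?', '?', 1, '?') in VS)         # True: tests only the third feature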

Consistent Learner

ε-Exhausting the Version Space
The following bound on the probability that the version space is not ε-exhausted after a given number of training examples holds without knowing the identity of the target concept or the distribution from which training examples are drawn

Theorem
If the hypothesis space H is finite, and D is a sequence of m ≥ 1 independent randomly drawn examples of some target concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space VS_{H,D} is not ε-exhausted is less than or equal to

   |H| e^{−εm}

Consistent Learner

ε-Exhausting the Version Space: proof
Let h₁, h₂, ..., h_k be all the hypotheses in H that have true error greater than ε with respect to c. The probability that one such hypothesis is consistent with m randomly drawn examples is at most (1 − ε)^m, so the probability that at least one such hypothesis remains in VS_{H,D} is at most

   k(1 − ε)^m                                      (1)

   k(1 − ε)^m ≤ |H|(1 − ε)^m ≤ |H| e^{−εm}         (2)

To get m, require this failure probability to be at most δ:

   |H| e^{−εm} ≤ δ                                 (3)

   m ≥ (1/ε)(ln |H| + ln(1/δ))                     (4)


Consistent Learner

Conjunctions of Boolean Literals
With n = 4 features, |H| = 3⁴, ε = 0.1, and δ = 0.05:

   m ≥ (1/ε)(n ln 3 + ln(1/δ)) = 10 × (4 ln 3 + ln 20) ≈ 74

Note that |X| = 16 < m₀ = 74: the bound asks for more examples than there are distinct instances
The weakness of this bound is mainly due to the |H| term, which arises in the proof when summing, over all possible hypotheses, the probability that a single hypothesis could be unacceptable

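The arithmetic behind equation (4) and the 74-example figure is easy to check; a small sketch:

   from math import ceil, log

   def m_consistent(H_size, eps, delta):
       """Examples sufficient for a consistent learner over a finite H:
       m >= (1/eps) * (ln|H| + ln(1/delta))."""
       return ceil((log(H_size) + log(1 / delta)) / eps)

   # The slide's numbers: n = 4 literals, |H| = 3^4, eps = 0.1, delta = 0.05.
   print(m_consistent(3 ** 4, 0.1, 0.05))   # 74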

Agnostic Learning and Inconsistent Hypotheses

Agnostic learner: if H does not contain the target concept c, then a zero-error hypothesis cannot always be found

Definition
A learner that makes no assumption that the target concept is representable by H, and that simply finds the hypothesis with minimum training error, is often called an agnostic learner

Inconsistent hypotheses
   Let error_D(h) denote the training error of hypothesis h, and let h_best denote the hypothesis from H having the lowest training error over the training set
   The Hoeffding bound characterizes the deviation between the true probability of some event and its observed frequency over m independent trials. To ensure the true error satisfies error_𝒟(h_best) ≤ error_D(h_best) + ε:

   P((∃h ∈ H) error_𝒟(h) > error_D(h) + ε) ≤ |H| e^{−2mε²}

   m ≥ (1/(2ε²))(ln |H| + ln(1/δ))

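The agnostic bound differs from the consistent-learner bound only in replacing 1/ε by 1/(2ε²); a sketch showing what that costs on the same example:

   from math import ceil, log

   def m_agnostic(H_size, eps, delta):
       """Hoeffding-based sample size for an agnostic learner:
       m >= (1/(2*eps^2)) * (ln|H| + ln(1/delta))."""
       return ceil((log(H_size) + log(1 / delta)) / (2 * eps ** 2))

   # Same |H| = 3^4, eps = 0.1, delta = 0.05 as the consistent case:
   # the requirement jumps from 74 examples to 370.
   print(m_agnostic(3 ** 4, 0.1, 0.05))     # 370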

PAC-Learnability of Other Concept Classes

Unbiased learner
   The unbiased concept class C contains every teachable concept relative to X. If instances in X are defined by n boolean features, then |C| = 2^|X| = 2^(2^n), so

   m ≥ (1/ε)(2^n ln 2 + ln(1/δ))

   which is exponential in n: the unbiased class is not PAC-learnable

k-term DNF and k-CNF concepts
   k-term DNF has polynomial sample complexity, but it does not have polynomial computational complexity for a learner using H = C
   k-CNF has polynomial computational complexity per example and still has polynomial sample complexity
   k-term DNF ⊆ k-CNF, so k-term DNF becomes PAC-learnable by a learner using H = k-CNF

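The inclusion k-term DNF ⊆ k-CNF comes from distributing ∨ over ∧: a disjunction of k terms equals the conjunction, over every way of choosing one literal per term, of clauses with at most k literals. A hypothetical sketch with literals as strings:

   from itertools import product

   def dnf_to_cnf(terms):
       """Rewrite a k-term DNF (list of terms, each a list of literals) as an
       equivalent k-CNF: one clause per choice of one literal from each term."""
       return [sorted(set(choice)) for choice in product(*terms)]

   # (x1 AND x2) OR (~x1 AND x3): k = 2 terms, so every clause has <= 2 literals.
   for clause in dnf_to_cnf([['x1', 'x2'], ['~x1', 'x3']]):
       print(clause)   # ['x1', '~x1'], ['x1', 'x3'], ['x2', '~x1'], ['x2', 'x3']

The first clause is a tautology and could be dropped; the point is only that each clause has at most k literals, so the result lies in k-CNF.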


Sample Complexity for Infinite Hypothesis Spaces

Shattering

Definition
A set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy.

VC Dimension

Definition
The VC dimension VC(H) of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite subsets of X can be shattered by H, then VC(H) = ∞.

Shattering a set of d instances requires at least 2^d distinct hypotheses, so VC(H) = d = log₂ 2^d ≤ log₂ |H|

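Shattering can be checked by brute force for toy classes. A sketch using a hypothetical class of 1-D thresholds h_t(x) = [x ≥ t], which shatters any single point but no pair of points, so its VC dimension is 1:

   def shatters(hypotheses, S):
       """True iff every dichotomy of S is realized by some hypothesis."""
       realized = {tuple(h(x) for x in S) for h in hypotheses}
       return len(realized) == 2 ** len(S)

   # Thresholds h_t(x) = 1 iff x >= t, with enough t values for the test points.
   thresholds = [lambda x, t=t: int(x >= t) for t in (-1, 0.5, 1.5, 3)]
   print(shatters(thresholds, [1]))      # True:  both labelings of {1} occur
   print(shatters(thresholds, [1, 2]))   # False: the labeling (1, 0) never occurs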

Sample Complexity and the VC Dimension

Upper bound, sufficient to PAC-learn any target in C:

   m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε))

Lower bound, necessary for any successful learner:

   m ≥ max[(1/ε) log(1/δ), (VC(C) − 1)/(32ε)]

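A sketch evaluating both bounds; the VC value plugged in is an arbitrary illustration, not from the slides:

   from math import ceil, log, log2

   def m_upper(vc_H, eps, delta):
       """Sufficient: m >= (1/eps)(4 log2(2/delta) + 8 VC(H) log2(13/eps))."""
       return ceil((4 * log2(2 / delta) + 8 * vc_H * log2(13 / eps)) / eps)

   def m_lower(vc_C, eps, delta):
       """Necessary: m >= max((1/eps) log(1/delta), (VC(C) - 1) / (32 eps))."""
       return ceil(max(log(1 / delta) / eps, (vc_C - 1) / (32 * eps)))

   print(m_upper(3, 0.1, 0.05))   # 1899: the upper bound is loose in practice
   print(m_lower(3, 0.1, 0.05))   # 30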


Some More General Scenarios
Learning Real-Valued Target Functions

Definition
A concept class is a nonempty set C ⊆ 2^X of concepts. Assume X is a fixed set, either finite, countably infinite, [0, 1]^n, or E^n for some n ≥ 1. In the latter cases, we assume that each c ∈ C is a Borel set. For x = (x₁, ..., x_m) ∈ X^m, m ≥ 1, the m-sample of c ∈ C generated by x is given by sam_c(x) = (⟨x₁, I_c(x₁)⟩, ..., ⟨x_m, I_c(x_m)⟩). The sample space of C, denoted S_C, is the set of all m-samples over all c ∈ C and all x ∈ X^m, for all m ≥ 1. ...

Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 34(4), 929-965, 1989.

Natarajan, B. K. Machine Learning: A Theoretical Approach. San Mateo, CA: Morgan Kaufmann, 1991.

Some More General Scenarios
Learning from Certain Types of Noisy Data

Laird, P. Learning from Good and Bad Data. Dordrecht: Kluwer Academic Publishers, 1988.

Kearns, M. J., & Vazirani, U. V. An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press, 1994.

Papers in Recent Years

Auer, P., & Ortner, R. A new PAC bound for intersection-closed concept classes. Machine Learning, 2007.

Criticisms of the PAC Model

The worst-case emphasis in the model makes it unusable in practice

The notions of target concept and noise-free training data are unrealistic in practice

Without its computational part, the results are subsumed by statistical learning theory


References

Valiant, L. A theory of the learnable. Communications of the ACM, 27, 1984.

Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 34(4), 929-965, 1989.

Haussler, D. Overview of the Probably Approximately Correct (PAC) Learning Framework. An introduction to the topic.

References

Vidyasagar, M. A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems. London; New York: Springer, 1997.

Vapnik, V. N. Statistical Learning Theory. New York: Wiley, 1998.

Mitchell, T. M. Machine Learning (Chinese translation). Beijing: China Machine Press, 2003.

Thanks!
