Artificial Intelligence Systems (인공지능시스템)

LEARNING: Human and Computer Learning

최윤정

2

Learning: background and basic concepts of Machine Learning

Artificial neural networks

Data Mining
Association: market basket analysis (slide 51, supp. 87)
Clustering: KNN, K-means (slide 60)
Decision Tree (slide 70)

3

Learning

A change in the contents and organization of a system's knowledge that enables it to improve its performance on a task (Simon)

Acquire new knowledge from the environment
Organize current knowledge

Inductive Inference
Draw general conclusions from examples
Infer the association between input and output with some confidence

4

Learning - Human

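Attribute-based Learning: to a young child, every man is "daddy" and every four-legged animal is a "dog".

Learning by Model Averaging: in face recognition (gender and age differences), the decision comes from the overall impression rather than from any particular feature.

Stage theory of development (genetic epistemology): knowledge is organized, assimilated, and accommodated, but the underlying structure changes and qualitative differences appear only at certain ages.

Information processing theory: humans have an information-processing capability. The difference between a child and an adult is a difference in processing power.

Self-understanding and reflection: the ability to look back (backtracking). What went wrong? Reflect on it and apply the lesson to later behavior.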

6

Learning - Computer

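Performance must improve as time (experience) accumulates.

Speedup Learning: the system gets faster the more it is used.

Knowledge Learning: the number of rules grows, and performance improves as a result.

Quality Learning: the form of the rules cannot be given in advance, but the quality of the answers improves.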

7

Machine Learning

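Machine learning: an adaptive mechanism that lets a computer learn from experience, from samples, and by analogy.

Learning capability improves the performance of an intelligent system over time.

Approaches to machine learning

Artificial neural networks
Genetic algorithms

Example of machine learning: chess

A chess match between the computer Deep Blue and a human chess player
(Deep Blue: a computer built by IBM that can analyze 200 million positions per second)

What the chess match taught us

Performance must improve through experience. Experience here means failures and successes, data and knowledge, and rules.

A machine must have the ability to learn.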

8

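Artificial neural network

Artificial neural network: how the brain works
A reasoning model based on the human brain.
Neuron: the basic unit of information processing.
Key characteristic: adaptability.

Characteristics of the human brain
A densely interconnected set of roughly 10 billion neurons and 60 trillion synapses connecting them.

The human brain can perform its functions faster than any computer in existence.

The human brain can be regarded as a highly complex, nonlinear, and parallel information-processing system.
Information is stored and processed simultaneously throughout the whole network.
Through adaptability, connections between neurons that lead to the "wrong answer" are weakened, and connections that lead to the "right answer" are strengthened.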

9


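Characteristics of artificial neural networks
Modeled on the human brain.
Made up of very simple but densely interconnected processors called neurons.
Neurons are connected by weighted links.
Each neuron receives several input signals through its connections but produces only one output signal.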

10


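Learning in an artificial neural network
Learning proceeds by repeatedly adjusting the weights.
Neurons are connected by links, and each link has a numerical weight associated with it. The weight expresses the strength of the corresponding neuron input.

The network implements a "learning capability" by borrowing the adaptability of the human brain, but it still falls short of mimicking all of the brain's functions.

How the weights of an artificial neural network are adjusted

1. Choose the network architecture and decide which learning algorithm to use.
2. Initialize the weights of the network.
3. Train the network by updating the weights from a set of training examples.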

11

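Artificial neural network: the neuron

Characteristics of a neuron: a simple computing element
It receives several signals from its input links, computes a new activation level, and sends an output signal through its output links.
An input signal can be raw data or the output of another neuron.
An output signal can be the final solution to the problem or an input to another neuron.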

12

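Artificial neural network: how a neuron computes its output

Determining the neuron's output
Proposed by Warren McCulloch and Walter Pitts (1943).
Uses a transfer function, i.e. an activation function.

Steps for determining the output with the activation function
1. The neuron computes the weighted sum of its input signals and compares it with the threshold θ.
2. If the weighted sum is less than the threshold, the neuron's output is -1. If it is equal to or greater than the threshold, the neuron is activated and its output is +1.

Activation function (sign function):

$$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases} \tag{6.1}$$

Equation (6.1) has the form of a sign function, so the actual output of a neuron with a sign activation function is

$$Y = \operatorname{sign}\left[\sum_{i=1}^{n} x_i w_i - \theta\right] \tag{6.2}$$

X: weighted sum of the inputs to the neuron

x_i: value of input i

w_i: weight of input i

n: number of neuron inputs

Y: output of the neuron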
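As a minimal sketch of the output rule described by (6.1) and (6.2), the function below computes the weighted sum and applies the sign activation. The name `neuron_output` and the sample weights and threshold are illustrative assumptions, not values from the slides.

```python
def neuron_output(inputs, weights, theta):
    """Sign-activated neuron in the McCulloch-Pitts style."""
    # X: weighted sum of the inputs
    x = sum(x_i * w_i for x_i, w_i in zip(inputs, weights))
    # Y: +1 when X reaches the threshold theta, otherwise -1
    return 1 if x >= theta else -1


# Hypothetical two-input neuron: weights 0.5 and 0.5, threshold 0.75
print(neuron_output([1, 1], [0.5, 0.5], 0.75))  # weighted sum 1.0 -> 1
print(neuron_output([1, 0], [0.5, 0.5], 0.75))  # weighted sum 0.5 -> -1
```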

13

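Artificial neural network: examples of activation functions

The most common activation functions are the step, sign, linear, and sigmoid functions.

Step and sign functions: also called hard limit functions. They are mainly used in neurons that make decisions in classification and pattern recognition tasks.

Sigmoid function: maps any input between minus and plus infinity to a value between 0 and 1. Neurons with a sigmoid function are used in back-propagation networks.

Linear activation function: produces an output equal to the neuron's weighted input. Neurons with a linear function are mainly used for linear approximation.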
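The four activation functions can be sketched as follows. This is an illustration only: it assumes the argument `x` is the weighted input with the threshold already subtracted, and the function names are chosen here for readability.

```python
import math


def step(x):
    """Hard limiter with outputs 0 and 1."""
    return 1 if x >= 0 else 0


def sign(x):
    """Hard limiter with outputs -1 and +1."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Passes the weighted input through unchanged."""
    return x


for value in (-2.0, 0.0, 2.0):
    print(value, step(value), sign(value), round(sigmoid(value), 3), linear(value))
```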

14

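Learning in a single neuron: the perceptron

Frank Rosenblatt introduced the perceptron, an algorithm for training a simple artificial neural network (1958). (The operation of Rosenblatt's perceptron is based on the McCulloch-Pitts neuron model.)
The perceptron is the simplest form of a neural network: a single neuron made up of a linear combiner with adjustable synaptic weights, followed by a hard limiter.
Hard limiter: the weighted sum of the inputs is fed to the hard limiter, which outputs +1 if its input is positive and -1 if it is negative.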

15

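Characteristics of the perceptron

A basic perceptron divides the input space into two decision regions, using a straight line in two dimensions and a hyperplane in n-dimensional space. The hyperplane is defined by the linearly separable function

$$\sum_{i=1}^{n} x_i w_i - \theta = 0$$

With two inputs x1 and x2, the decision boundary appears as the thick straight line in part (a) of [Figure 6-6]. Point (1), to the right of the boundary, belongs to class A1, and point (2), to the left of the boundary, belongs to class A2. The threshold θ is used to shift the decision boundary.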

16

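Basic perceptron learning

Basic perceptron learning uses the error: the weights are adjusted to reduce the difference between the actual output and the desired output. Initial weights are usually assigned at random in the range [-0.5, 0.5] and then updated until the outputs agree with the training examples.

Error:

$$e(p) = Y_d(p) - Y(p), \qquad p = 1, 2, 3, \ldots$$

If the error e(p) is positive, the perceptron output Y(p) must be increased. If it is negative, Y(p) must be decreased.

Perceptron learning rule:

$$w_i(p+1) = w_i(p) + \alpha \, x_i(p) \, e(p)$$

For the p-th training example (p can also be read as time), Y(p) is the actual output and Y_d(p) is the desired output given by the example.

α is the learning rate, a positive constant less than 1.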

17

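Perceptron training algorithm for classification

A four-step algorithm for classification tasks

Step 1: Initialization

Set the initial weights w1, w2, ..., wn and the threshold θ to random values in the (suitably small) interval [-0.5, 0.5].

Step 2: Activation

Activate the perceptron by applying the inputs x1(p), x2(p), ..., xn(p) and the desired output Yd(p). Compute the actual output at iteration (time) p = 1:

$$Y(p) = \operatorname{step}\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$

n: number of perceptron inputs, step: step activation function

Step 3: Weight training

Update the perceptron weights:

$$w_i(p+1) = w_i(p) + \Delta w_i(p)$$

Δwi(p) is the weight correction at iteration p. It is computed with the delta rule:

$$\Delta w_i(p) = \alpha \, x_i(p) \, e(p)$$

Step 4: Iteration

Increase the iteration count p by 1, go back to Step 2, and repeat until convergence.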
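The four steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the slides' own implementation: the starting weights (0.3, -0.1), the fixed threshold 0.2, and the learning rate 0.1 are chosen here instead of being drawn at random, the training data is the logical AND function, and `Fraction` is used so that comparisons against the threshold are exact.

```python
from fractions import Fraction


def step(x):
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def train_perceptron(examples, weights, theta, alpha, max_epochs=100):
    """Apply the delta rule until one full epoch produces no errors."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, desired in examples:
            # Step 2: activation
            actual = step(sum(x * w for x, w in zip(inputs, weights)) - theta)
            # Step 3: weight training, delta_w_i = alpha * x_i * e
            e = desired - actual
            if e != 0:
                errors += 1
                weights = [w + alpha * x * e for x, w in zip(inputs, weights)]
        if errors == 0:
            return weights, epoch
    raise RuntimeError("no convergence within max_epochs")


# One epoch = the four input patterns of logical AND
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, epochs = train_perceptron(
    AND_EXAMPLES,
    weights=[Fraction(3, 10), Fraction(-1, 10)],  # assumed starting point
    theta=Fraction(1, 5),                         # threshold kept fixed here
    alpha=Fraction(1, 10),
)
print([float(w) for w in weights], epochs)  # [0.1, 0.1] 5
```

With these assumed starting values the weights settle at (0.1, 0.1), and the fifth epoch is the first one without errors. Other starting values can converge to a different but equally valid set of weights.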

18

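Learning basic logical operators with a perceptron

Learning basic logical operators
Train the perceptron to perform basic logical operations such as AND, OR, and Exclusive-OR. Truth tables for AND, OR, and Exclusive-OR: [Table 6-2]

Learning the AND and OR operators

Training plan

1. Activate the perceptron with a sequence of four input patterns, which together represent one epoch.
2. Update the perceptron weights after each activation.
3. Repeat the process until the weights converge to a consistent set of values.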

19

Perceptron learning results for the logical AND operator

20

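Limitations of single-layer perceptron learning

The Exclusive-OR operator cannot be learned because it is not linearly separable.
Two-dimensional plots of the AND, OR, and Exclusive-OR functions: (input points where the function output is 1 are shown in black, and points where the output is 0 in white.)

1. In (a) and (b) a straight line can be drawn that separates the black points from the white points, but the points in (c) cannot be separated by a straight line.
2. A perceptron can represent a function only if some straight line separates all the black points from all the white points.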

21



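A single-layer perceptron can learn only linearly separable functions.

For patterns that are shifted, scaled, overlapped, or rotated, the decision boundary keeps changing, so such patterns are not classified correctly.

There are not many linearly separable functions.

The computational limitations of the single-layer perceptron were analyzed mathematically by Minsky and Papert in "Perceptrons" (MIT Press, 1969), which proved that Rosenblatt's perceptron cannot make global generalizations.

Overcoming the limitations of single-layer perceptron learning

Multilayer neural networks can overcome the limitations of the single-layer perceptron.

More advanced forms of neural networks, such as multilayer perceptrons trained with the back-propagation algorithm, have been shown to overcome the limitations of Rosenblatt's perceptron.

Many further variant models have been proposed.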
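To illustrate why an extra layer helps, the sketch below wires two hidden step neurons (one acting as OR, one as AND) into an output neuron that computes Exclusive-OR. The weights and thresholds are set by hand for illustration. They are an assumption of this example, not the result of back-propagation training.

```python
def step(x):
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def xor_network(x1, x2):
    """Two-layer network with hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)    # hidden neuron 1: fires for OR
    h_and = step(x1 + x2 - 1.5)   # hidden neuron 2: fires for AND
    # Output fires when OR is true and AND is false
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden neuron draws one straight line in the input plane, and the output neuron combines the two half-planes, which a single line cannot do.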

22

Artificial neural network models

Multilayer neural networks

Hopfield networks

Kohonen self-organizing maps

Boltzmann machines

…

23

Learning : supervised vs. unsupervised

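Supervised Learning: learning with a teacher at one's side. Knowledge is extracted from tasks whose answers are already known and then applied to new information.

Example: classification in data mining, where data is sorted according to given classification criteria or patterns.

Unsupervised Learning: studying on one's own.

The system groups the data automatically according to some criterion and finds the rules by itself. It needs a threshold or other evaluation measures. Example: clustering in data mining.

Reinforcement Learning (sometimes classed as unsupervised learning, sometimes as a separate category): a teacher who intervenes only minimally. For example, the learner is told that it scored 7 out of 10 questions, but not which questions were wrong or why.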

24

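Classification of computer learning

Inductive Learning: the conclusion for a given task is drawn from what was learned from examples.

Sometimes the process by which the conclusion was reached cannot be explained.

Deductive Learning (Abduction): backward reasoning is performed on an assumption (premise), based on partial knowledge already held.

Once something is known, it is reached directly through a link (speedup). The process of reaching the conclusion is transparent.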

26

Classification of Inductive Learning

Supervised Learning : classification given training examples

correct input-output pairs

recover unknown function from data generated from the function

generalization ability for unseen data classification : function is discrete concept learning : output is binary

Unsupervised Learning : clustering

27

Classification of Inductive Learning

Supervised Learning Unsupervised Learning

No correct input-output pairs needs other source for determining correctness reinforcement learning : yes/no answer only

example : chess playing

Clustering : group into clusters of common characteristics Map Learning : explore unknown territory Discovery Learning : uncover new relationships

28

Learning = Generalization

H. Simon:

“Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time.”

The ability to perform a task in a situation which has never been encountered before

29

Learning = Generalization

Classification medical diagnosis; credit-card applications, handwritten letters;

Planning and Acting Navigation, Game playing, (chess, backgammon), driving a car

Skills balancing a pole, playing tennis

Common Sense Reasoning natural language interactions, visual interpretation, jokes

30

Supervised Learning

Given: examples (x, f(x)) of some unknown function f
Find: a good approximation to f

Unknown function: y = f(x1, x2, x3, x4)

Example  x1  x2  x3  x4  y
1        0   0   1   0   0
2        0   1   0   0   0
3        0   0   1   1   1
4        1   0   0   1   1
5        0   1   1   0   0
6        1   1   0   0   0
7        0   1   0   1   0

Complete truth table (? = output not observed):

x1  x2  x3  x4  y
0   0   0   0   ?
0   0   0   1   ?
0   0   1   0   0
0   0   1   1   1
0   1   0   0   0
0   1   0   1   0
0   1   1   0   0
0   1   1   1   ?
1   0   0   0   ?
1   0   0   1   1
1   0   1   0   ?
1   0   1   1   ?
1   1   0   0   0
1   1   0   1   ?
1   1   1   0   ?
1   1   1   1   ?

We can't figure out which function is correct until we've seen every possible input-output pair.
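A short count makes the point concrete. The sketch below enumerates the 16 possible inputs, removes the 7 observed ones, and counts how many Boolean functions still agree with every example. The variable names are illustrative.

```python
from itertools import product

# The seven observed examples: (x1, x2, x3, x4) -> y
EXAMPLES = {
    (0, 0, 1, 0): 0,
    (0, 1, 0, 0): 0,
    (0, 0, 1, 1): 1,
    (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0,
    (1, 1, 0, 0): 0,
    (0, 1, 0, 1): 0,
}

all_inputs = list(product([0, 1], repeat=4))
unseen = [x for x in all_inputs if x not in EXAMPLES]

# Each unseen input can map to 0 or 1 independently, so the number of
# functions consistent with the examples is 2 ** (number of unseen inputs).
consistent_functions = 2 ** len(unseen)
print(len(all_inputs), len(unseen), consistent_functions)  # 16 9 512
```

Without a restriction on the class of functions, the data alone cannot choose among these 512 candidates.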

31

What’s Good?

Learning problem: Find a function that best separates the data

What function? What’s best? How to find it?

A possibility: Define the learning problem to be: Find a (linear) function that best separates the data

32

Feature Space

Data are not separable in one dimension.
Not separable if you insist on using a specific class of functions.

33

Functions Can be Made Linear

x1 x2 x4 ∨ x2 x4 x5 ∨ x1 x3 x7

Space: X = x1, x2, …, xn

Input transformation

New space: Y = {y1, y2, …} = {xi, xi xj, xi xj xk}

Example: weather / whether

y3 ∨ y4 ∨ y7

The new discriminator is functionally simpler.

34

Interesting Applications

Reasoning (Inference, Decision Support)
Cartia ThemeScapes - http://www.cartia.com
6500 news stories from the WWW in 1997

Planning, Control
DC-ARM - http://www-kbs.ai.uiuc.edu
[Figure labels: Normal, Ignited, Engulfed, Destroyed, Extinguished, Fire Alarm, Flooding]

Database Mining
NCSA D2K - http://www.ncsa.uiuc.edu/STI/ALG

35

Rule and Decision Tree Learning

Example: Rule Acquisition from Historical Data

Data

– Patient 103 (time = 1): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: no, Previous-Premature-Birth: no, Ultrasound: unknown, Elective C-Section: unknown, Emergency-C-Section: unknown

– Patient 103 (time = 2): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: yes, Previous-Premature-Birth: no, Ultrasound: abnormal, Elective C-Section: no, Emergency-C-Section: unknown

– Patient 103 (time = n): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: no, Previous-Premature-Birth: no, Ultrasound: unknown, Elective C-Section: no, Emergency-C-Section: YES

Learned Rule

– IF no previous vaginal delivery, AND abnormal 2nd trimester ultrasound, AND malpresentation at admission, AND no elective C-Section THEN probability of emergency C-Section is 0.6

– Training set: 26/41 = 0.634
– Test set: 12/20 = 0.600

36

Why Study Learning?

Computer systems with new capabilities.
Develop systems that can automatically adapt and customize themselves to the needs of individual users through experience.
Discover knowledge and patterns in databases (database mining), e.g. discovering purchasing patterns for marketing purposes.

Understand human and biological learning.
Understand teaching better.

37

Some Issues in Machine Learning

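Which algorithm applies better, and when?
What factors affect accuracy, and how?

The amount of training data? The condition of the data, or the complexity of its representation?

Tolerance of errors

Noise: measurement errors, classification errors

Small split: inference from a very small amount of evidence

Support for various knowledge representations: usually attribute-value pairs. ILP uses logical formulas.

In real problems, the knowledge representation step can be the harder part. Graph-based learning, ...

Use of partial knowledge

Multi-Strategy Learning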

Data Mining

DEFINITION:

Extracting potentially useful information from huge data sources

Discovering previously unknown implicit knowledge

Automating the process of searching for interesting patterns in the data

39

Data Mining Researches

Interdisciplinary

Machine learning
Expert systems
Databases
Statistics
Visualization

KDD / Data Mining

40

Data Mining Tasks

Classification
Clustering
Association
Estimation, Prediction
Pattern Analysis
Text Mining
Web Mining / Web Usage Mining

41

DM Applications

Marketing & Retail
Finance / Banking / Stock
Insurance (Fraud Detection)
Bio-Informatics
Weather Forecast
CRM
Web Mining
Spatial / Temporal Data Mining

42

Data Mining Tasks(1) : Classification

Classification: assign objects to predefined classes (e.g. large, medium, small)

Examples
News ⇒ [international] [domestic] [sports] [culture] …
Credit application ⇒ [high] [medium] [low]

43

Data Mining Tasks(1) :Classification

Most real-world problems are classification problems.

Classification task

Knowledge takes the form of attribute-value pairs, and one special attribute is called the class.
The problem is to determine the value of the class attribute from the given information.

Find a function f where f(A1, A2, …, An) -> C (class), given (Ai, Vi) for all i (1..n), for sufficiently large data

Algorithms: decision trees, memory-based reasoning

44

Data Mining Tasks(2) : Estimation

Estimation: maps data (attr1, attr2, attr3, …) to a continuous value
cf. classification maps to discrete categories

Examples
Age, sex, blood pressure, … ⇒ remaining life expectancy
Age, sex, occupation, … ⇒ annual income
Region, water volume, population ⇒ pollution level

Algorithm: neural network
Estimating a future value is called prediction.

45

Data Mining Tasks(3) : Association

Association (Market basket analysis) - determine which things go together

Example

shopping list ⇒ Cross-Selling (supermarket (shelf, catalog, CF…) home-shopping, E-shopping…)

Association rules

46

Data Mining Tasks(4) : Clustering

Clustering: split a heterogeneous population into homogeneous subgroups (clusters) G1, G2, G3, G4

Examples
Symptom ⇒ Disease
Customer information ⇒ Selective sales
Soil (water quality) data

cf. classification: predefined categories
clustering: find new categories and explain them

47

Data Mining Tasks(4)

Clustering is useful for exception finding
- calling card fraud detection
- credit card fraud, etc.

Algorithm: K-means (K clusters)

Note: clustering depends on the features used
Playing-card example: number, color, suit, …

48

Text Data Mining

Classification of Texts Association between terminologies and contents Bio-Informatics Applications

Web Mining Classification of Web Contents Search Machine Applications

Big Data Mining..!

49

Web Mining

Web Contents Mining Text Mining Automatic Classification

Web Usage Mining Mining of Cookie Data Design of Web page Find Patterns of Visits Find Trends of Society

50

DM Algorithms

Association
K-Nearest Neighbor
Decision Tree
Neural Network
Bayesian Network
SVM (Support Vector Machine)
Clustering Algorithms: K-Means, Agglomerative

51

Market Basket Analysis (Associations) (1)

O: Orange Juice, M: Milk, S: Soda, W: Window Cleaner, D: Detergent

Customer  Items
1         O, S
2         M, O, W
3         O, D
4         O, D, S
5         W, S
…         …

52


Co-Occurrence Table

     O   W   M   S   D
O    4   1   1   2   2
W    1   2   1   1   0
M    1   1   1   0   0
S    2   1   0   3   1
D    2   0   0   1   2

53


{S, O}: co-occurrence of 2
R1: if S then O
R2: if O then S

Support: what percentage of all the data contains the items?
Confidence: what percentage of the records that match the LHS satisfy the rule?

e.g. Support of R1: 2/5 = 40%
Confidence of R1: 2/3 ≈ 67%
Confidence of R2: 2/4 = 50%

These measures determine how good the rule is.
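The support and confidence figures can be checked against the five listed baskets. The sketch below is illustrative: the helper names `count`, `support`, and `confidence` are assumptions of this example, and only the five baskets shown on the slide are used.

```python
# The five baskets from the market basket slide
BASKETS = [
    {"O", "S"},
    {"M", "O", "W"},
    {"O", "D"},
    {"O", "D", "S"},
    {"W", "S"},
]


def count(items, baskets):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(items, baskets):
    """Fraction of all baskets that contain the item set."""
    return count(items, baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Fraction of baskets containing the LHS that also contain the RHS."""
    return count(set(lhs) | set(rhs), baskets) / count(lhs, baskets)


print(support({"S", "O"}, BASKETS))       # 0.4
print(confidence({"S"}, {"O"}, BASKETS))  # R1: 2/3
print(confidence({"O"}, {"S"}, BASKETS))  # R2: 0.5
```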

54


Probability Table {A, B, C}

Combination  Probability (%)
A            45
B            42.5
C            40
A, B         25
A, C         20
B, C         15
A, B, C      5

55


R1: If A ∧ B then C
R2: If A ∧ C then B
R3: If B ∧ C then A

Support = 5 (for all three rules)

Confidence

Rule  P(condition)  P(condition ∧ result)  Confidence
R1    25            5                      0.20
R2    20            5                      0.25
R3    15            5                      0.33

56


R3 has the best confidence (0.33), but is it GOOD?
Note: R3 (If B ∧ C then A) has confidence 0.33, while P(A) = 0.45.
Example: a person with long hair ⇒ a woman?

Improvement -> How good is the rule compared to random guessing?

57


$$\text{improvement} = \frac{P(\text{condition and result})}{P(\text{condition})\,P(\text{result})}$$

Criterion: improvement > 1

Rule          Support  Confidence  Improvement
R1            5        0.20        0.50
R2            5        0.25        0.59
R3            5        0.33        0.74
If A then B   25       0.56        1.31
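The confidence and improvement columns follow directly from the probability table for {A, B, C}. The sketch below recomputes them. The dictionary layout and the function name `rule_stats` are assumptions of this example, and probabilities are written as fractions of 1 rather than percentages.

```python
# Probabilities from the {A, B, C} table, as fractions of 1
P = {
    "A": 0.45, "B": 0.425, "C": 0.40,
    "AB": 0.25, "AC": 0.20, "BC": 0.15,
    "ABC": 0.05,
}


def rule_stats(condition, result):
    """Return (support, confidence, improvement) for 'if condition then result'."""
    both = "".join(sorted(condition + result))
    support = P[both]
    confidence = P[both] / P[condition]
    # improvement = P(condition and result) / (P(condition) * P(result))
    improvement = confidence / P[result]
    return support, confidence, improvement


for name, cond, res in [("R1", "AB", "C"), ("R2", "AC", "B"),
                        ("R3", "BC", "A"), ("If A then B", "A", "B")]:
    s, c, i = rule_stats(cond, res)
    print(f"{name}: support={s:.2f} confidence={c:.2f} improvement={i:.2f}")
```

Only the last rule has improvement above 1, so it is the only one of the four that does better than guessing the result from its base rate.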

58


Some Issues

Overall algorithm: build co-occurrence matrices for 1 item, 2 items, 3 items, etc. -> complex!!
Pruning: e.g. minimum support pruning
Virtual items: season, store, geographic information combined with real items
e.g. If OJ ∧ Milk ∧ Friday then Beer

59


Level of description: how specific?
e.g. Drink > Soda > Coke

Weaknesses

Complex as data grows
Limited data types (attributes)
Difficult to determine the right number of items
Rare items --> pruned

Strengths

Explainability
Undirected data mining (unsupervised)
Variable-length data
Simple computation

60

Neural Net

Input layer: employed (1/0), age, monthly income, number of dependents, existing loans

Hidden layer

Output layer: good, average, poor

61

K-nearest Neighbor (K-NN)

Classify an item using the most similar training data
Simple machine learning algorithm
Instance-based learning
Find distance in the feature space (n-dimensional)
K is the parameter (number of examples)
k = 1: nearest neighbor
k is usually an odd number (why?)
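A minimal K-NN classifier can be sketched as follows. Euclidean distance, majority voting, and the toy training points are assumptions made for illustration. The slides do not fix a distance measure or a dataset.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training items by Euclidean distance to the query
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes
TRAIN = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
]

print(knn_classify(TRAIN, (2, 2), k=3))  # A
print(knn_classify(TRAIN, (5, 4), k=3))  # B
```

With two classes, an odd k guarantees that the vote cannot end in a tie.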

62

KNN Example

What class is the new data? K=3, K=5

63

Clustering Algorithm (1/2)

k-means method (MacQueen '67) - lots of variations

Algorithm steps
1. Choose k initial points (seeds)
2. Assign every point to its closest seed (initial clusters)
3. Find the centroid of each cluster
4. Go to step 2, using the centroids as the new seeds. Stop when there is no more change.

64

Clustering Algorithm (2/2)

Note:

Finding neighbors

Finding the centroid of the points (x1, y1), (x2, y2), (x3, y3), …, (xn, yn):

$$\left(\frac{x_1 + \cdots + x_n}{n},\ \frac{y_1 + \cdots + y_n}{n}\right)$$
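The k-means loop and the centroid formula can be sketched together. This is an illustration under assumptions: the seeds are passed in explicitly instead of being chosen at random, distance is Euclidean, and the sample points are made up for the example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, seeds, max_iter=100):
    """Alternate assignment and centroid update until the centers stop moving."""
    centers = list(seeds)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: recompute centroids (an empty cluster keeps its old center)
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        # Step 4: stop when nothing changes
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(POINTS, seeds=[(1, 1), (8, 8)])
print(centers)
print(clusters)
```

Different seeds can lead to different final clusters, which is why the method is described as sensitive to its initial parameters.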

65

Variation of k-means

1. Use probability density rather than simple distance, e.g. Gaussian mixture models
2. Weighted distance
3. Agglomeration method: hierarchical clustering

66

Agglomerative Algorithm

1. Start with every single record as its own cluster (N clusters)
2. Select the two closest clusters and combine them (N-1 clusters)
3. Go to step 2
4. Stop at the right level (number of clusters)

What is closest?

67

Distance between clusters

3 measures
1. Single linkage: distance between the closest members
2. Complete linkage: distance between the most distant members
3. Centroid: distance between the cluster centroids

68

Example

Given the following data, cluster them into 3 groups using the Agglomerative Algorithm.

A(2), B(3), C(6), D(7), E(11), F(14), G(21), H(22), I(31), J(35), K(36)

Single Linkage? Complete Linkage? Centroid?
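One way to work through this exercise is the sketch below, which repeatedly merges the two closest clusters until three remain. The function names are illustrative, and ties are broken by taking the first closest pair found, which is an assumption of this example.

```python
DATA = [2, 3, 6, 7, 11, 14, 21, 22, 31, 35, 36]  # A .. K


def single(a, b):
    """Distance between the closest members."""
    return min(abs(x - y) for x in a for y in b)


def complete(a, b):
    """Distance between the most distant members."""
    return max(abs(x - y) for x in a for y in b)


def centroid(a, b):
    """Distance between the cluster means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def agglomerate(values, k, linkage):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


for name, linkage in [("single", single), ("complete", complete), ("centroid", centroid)]:
    print(name, agglomerate(DATA, 3, linkage))
```

On this data, single and centroid linkage both give {2, 3, 6, 7, 11, 14}, {21, 22}, {31, 35, 36}, while complete linkage gives {2, 3, 6, 7}, {11, 14, 21, 22}, {31, 35, 36}.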

69

Clustering

Strengths
1. Undirected knowledge discovery
2. Suitable for categorical, numeric, and textual data
3. Easy to apply

Weaknesses
1. Can be difficult to choose the right (distance) measure and weights
2. Sensitive to initial parameters
3. Can be hard to interpret

70

Decision Tree(contact lens)

tear production
  reduced: none
  normal: astigmatism
    no: soft
    yes: spectacle prescription
      myope: hard
      hypermetrope: none

71

Concept Learning

e.g. red, good customer

Classification: input -> learning function -> Class 1, Class 2, …, Class n

Concept learning: input -> concept -> yes / no

decision tree

72

Weather data

outlook  temp  humid  wind  play
s        h     h      f     n
s        h     h      t     n
o        h     h      f     y
r        m     h      f     y
r        c     n      f     y
r        c     n      t     n
o        c     n      t     y
s        m     h      f     n
s        c     n      f     y
r        m     n      f     y
s        m     n      t     y
o        m     h      t     y
o        h     n      f     y
r        m     h      t     n

outlook: s = sunny, o = overcast, r = rainy
temp: h = hot, m = mild, c = cool
humid: h = high, n = normal
wind: t = true, f = false
play: y = yes, n = no

Each column is an attribute. Each row is an instance.

73

Decision Tree for weather (1/4)

outlook
  sunny: humidity
    high: no
    normal: yes
  overcast: yes
  rainy: windy
    true: no
    false: yes

If outlook = sunny and humidity = high then play = no

74


Note: temp and humid can be numeric data
temp > 30 (hot)
10 <= temp <= 30 (normal)
temp < 10 (cool)

75


Attribute types
nominal (categorical, discrete)
ordinal (numeric, continuous)
interval, e.g. [10, 20]
ratio: real numbers

Attributes: outlook, temp, humid, wind, play

76


Note: a leaf node doesn't have to be yes/no --> classification

Contact lens

tear
  reduced: none
  normal: astigmatism
    no: soft
    yes: hard

77

Prediction using Decision Tree

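[Figure: the data is split into a training set, a test set, and an evaluation set. Trees (A, B, C, ...) are built on the training set, the best tree is chosen on the test set, and the evaluation set is used to predict the expected performance on real data.]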

78

Box Diagram of Decision Tree

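[Figure: box diagram of the weather decision tree, partitioned by outlook (sunny, overcast, rain), humidity (high, normal), and windy (yes, no), with each instance marked y or n.]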

79

The effect of pruning

Some issues
Where to prune?
Depth too high -> unnecessarily complex (overfit)
Depth too low -> lose information
What to split on?

[Figure: error rate versus depth of tree, with one curve for training data and one for unseen data. The pruning point is marked "Prune here!"]

80

Error Rate

Adjusted error rate of a tree T:

$$AE(T) = E(T) + \alpha \cdot \text{leaf-count}(T)$$

Find a subtree α1 of T such that AE(α1) <= AE(T), then prune all the branches that are not part of α1.

Example: a leaf holding y y y n y y n and labeled y has error rate 2/7.

81

Possible sub trees for weather data (1/2)

First split?

(a) outlook
  sunny: y y n n n
  overcast: y y y y
  rainy: y y y n n

(b) temp
  hot: y y n n
  mild: y y y y n n
  cool: y y y n

82

Possible sub trees for weather data (2/2)

(c) humidity
  high: y y y n n n n
  normal: y y y y y y n

(d) windy
  false: y y y y y y n n
  true: y y y n n n

83

Information Theory & Entropy

info([2,3]) = 0.971 bit

info([4,0]) = 0.0 bit

info([3,2]) = 0.971 bit

-> info ([2,3], [4,0], [3,2])

= (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971

= 0.693 bit

gain(outlook) = info([9,5]) - info([2,3], [4,0],[3,2])

= 0.247 bits

gain(temp) = 0.029 bit

gain(humid) = 0.152 bit

gain(windy) = 0.048 bit
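The gain figures above can be reproduced from the 14 weather instances. The sketch below is illustrative: the compact row encoding and the function names `info` and `gain` are assumptions of this example, and logarithms are base 2 so that results are in bits.

```python
import math
from collections import Counter

# outlook, temp, humid, wind, play (same single-letter codes as the weather table)
WEATHER = [row.split() for row in [
    "s h h f n", "s h h t n", "o h h f y", "r m h f y", "r c n f y",
    "r c n t n", "o c n t y", "s m h f n", "s c n f y", "r m n f y",
    "s m n t y", "o m h t y", "o h n f y", "r m h t n",
]]


def info(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c)


def gain(rows, attr_index):
    """Information gain of splitting `rows` on the attribute at attr_index."""
    before = info(list(Counter(r[-1] for r in rows).values()))
    after = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r for r in rows if r[attr_index] == value]
        weight = len(subset) / len(rows)
        after += weight * info(list(Counter(r[-1] for r in subset).values()))
    return before - after


for index, name in enumerate(["outlook", "temp", "humid", "windy"]):
    print(name, round(gain(WEATHER, index), 3))
```

Outlook has the largest gain, so it is the attribute chosen for the first split.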

84

Calculating info(x) - entropy

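If either #yes or #no is 0, then info(x) = 0.
If #yes = #no, then info(x) takes its maximum value.
The measure can cover multi-class situations, e.g.

$$\text{info}([2,3,4]) = \text{info}([2,7]) + \tfrac{7}{9}\,\text{info}([3,4])$$

$$\text{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log p_1 - p_2 \log p_2 - \cdots - p_n \log p_n$$

$$\text{info}([2,3,4]) = \text{entropy}\left(\tfrac{2}{9}, \tfrac{3}{9}, \tfrac{4}{9}\right) = -\tfrac{2}{9}\log\tfrac{2}{9} - \tfrac{3}{9}\log\tfrac{3}{9} - \tfrac{4}{9}\log\tfrac{4}{9} = \frac{-2\log 2 - 3\log 3 - 4\log 4 + 9\log 9}{9}$$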

85

Data Mining Steps

86

Data Mining Evaluation

Cross Validation
Training sample vs. validation sample
Check whether the training was done correctly

Motivation
Limited data: collecting more may be hazardous, costly, or impossible

Instead of having various samples, use various partitions

87

Types of Cross Validation

Holdout Validation: a random portion is set aside for validation
Repeated Random Subsampling Validation: random split, iterate, average
K-fold Cross Validation (e.g. 10-fold): K splits, 1 split for validation and K-1 for training
Leave-One-Out Cross Validation: extreme case of K-fold where K = N (data size)
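The K-fold partitioning can be sketched as an index generator. This is a minimal illustration: the function name `k_fold_splits` is an assumption, and the indices are split in order without shuffling, which a real experiment would usually add.

```python
def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


for training, validation in k_fold_splits(10, 5):
    print(validation, training)

# Leave-one-out is the special case k == n_samples
print(len(list(k_fold_splits(4, 4))))  # 4
```

Every sample is used for validation exactly once, and the K validation scores are then averaged.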

89

Association Rule Mining: finding useful association patterns among items in a large database

Applications: market basket analysis, cross-marketing, catalog design, and similar tasks that place related items together

Examples
Form: predicate (item) → predicate (item) [support, confidence]

buys(x, "diapers") ⇒ buys(x, "beer") [0.5%, 60%]
major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]

90

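Association rules: basic concepts

Basic components: items and transactions. The database is made up of transactions.

(Example) Each transaction is a list of items: a customer's purchase list

Find all rules (using support and confidence)
(Example) Customers who buy a computer also buy financial management software at the same time

Place the computer and the financial management software in the same location

Place the computer and the financial management software at opposite ends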

91

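Association measures: Support and Confidence

92

Association rule mining: basic concepts

93

Single-dimensional vs. multidimensional association rules

94

Association rules and correlation analysis

"Are all the rules found by association rule mining (those that satisfy the minimum support and minimum confidence thresholds) useful enough to apply in practice?" No!
Objective measures for evaluating association rules: support and confidence

Ways to address this problem

1. Have domain users select from the discovered rules

2. Use lift (lift value, improvement)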

95

Example

97

Lift Value( improvement)

98

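This value is 0.89, which is less than 1, so game and video are negatively correlated.

So although its support and confidence exceed the thresholds, the rule is not an interesting association rule.

Association rules found with minimum support and minimum confidence are finally selected by taking the correlation between items (lift) into account.

Alternatively, setting the baseline with the help of a domain expert is also a good approach.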

99

Relevant Disciplines + Big Data
Artificial Intelligence
Bayesian Methods
Cognitive Science
Computational Complexity Theory
Control Theory
Information Theory
Neuroscience
Philosophy
Psychology
Statistics

Contributions to Machine Learning
Symbolic Representation, Planning/Problem Solving, Knowledge-Guided Learning
Bayes's Theorem, Missing Data Estimators
PAC Formalism, Mistake Bounds
Language Learning, Learning to Reason
Optimization, Learning Predictors, Meta-Learning
Entropy Measures, MDL Approaches, Optimal Codes
ANN Models, Modular Learning
Occam's Razor, Inductive Generalization
Power Law of Practice, Heuristic Learning
Bias/Variance Formalism, Confidence Intervals, Hypothesis Testing
