1.introduction(epoch#2)

41
Introduction to Machine Learning with Python 1. Introduction Hongdae Machine Learning Study Epoch #2 1

Transcript of 1.introduction(epoch#2)

Page 1: 1.introduction(epoch#2)

Introduction to

Machine Learning with Python

1. Introduction

Hongdae Machine Learning Study Epoch #2

1

Page 2: 1.introduction(epoch#2)

Contacts

Haesun Park

Email : [email protected]

Meetup: https://www.meetup.com/Hongdae-Machine-Learning-Study/

Facebook : https://facebook.com/haesunrpark

Blog : https://tensorflow.blog

2

Page 3: 1.introduction(epoch#2)

Book

ํŒŒ์ด์ฌ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผํ™œ์šฉํ•œ๋จธ์‹ ๋Ÿฌ๋‹, ๋ฐ•ํ•ด์„ .

(Introduction to Machine Learning with Python, Andreas

Muller & Sarah Guido์˜๋ฒˆ์—ญ์„œ์ž…๋‹ˆ๋‹ค.)

๋ฒˆ์—ญ์„œ์˜ 1์žฅ๊ณผ 2์žฅ์€๋ธ”๋กœ๊ทธ์—์„œ๋ฌด๋ฃŒ๋กœ์ฝ์„์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

์›์„œ์—๋Œ€ํ•œํ”„๋ฆฌ๋ทฐ๋ฅผ์˜จ๋ผ์ธ์—์„œ๋ณผ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

Github:

https://github.com/rickiepark/introduction_to_ml_with_python/

3

Page 4: 1.introduction(epoch#2)

์†Œ๊ฐœ

4

Page 5: 1.introduction(epoch#2)

๋จธ์‹ ๋Ÿฌ๋‹์ด๋ž€

๋ฐ์ดํ„ฐ์—์„œ์ง€์‹์„์ถ”์ถœํ•˜๋Š”์ž‘์—…์ž…๋‹ˆ๋‹ค.(vs ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹)

ํ†ต๊ณ„ํ•™, ์ธ๊ณต์ง€๋Šฅ, ์ปดํ“จํ„ฐ๊ณผํ•™์ด์–ด์šฐ๋Ÿฌ์ง„์—ฐ๊ตฌ๋ถ„์•ผ์ž…๋‹ˆ๋‹ค.

์˜ˆ์ธก๋ถ„์„predictive analytics์ด๋‚˜ํ†ต๊ณ„์ ๋จธ์‹ ๋Ÿฌ๋‹statistical learning์œผ๋กœ๋„๋ถˆ๋ฆฝ๋‹ˆ๋‹ค.

์œ„ํ‚คํ”ผ๋””์•„์˜์†Œ๊ฐœ: โ€œ์ธ๊ณต์ง€๋Šฅ์˜ํ•œ๋ถ„์•ผ๋กœ์ปดํ“จํ„ฐ๊ฐ€ํ•™์Šตํ• ์ˆ˜์žˆ๋„๋กํ•˜๋Š”

์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ๊ธฐ์ˆ ์„๊ฐœ๋ฐœํ•˜๋Š”๋ถ„์•ผโ€

Arthur Samuel โ€œField of study that gives computers the ability to learn without

being explicitly programmedโ€

๋ช…ํ™•ํ•œํ•ด๊ฒฐ๋ฐฉ๋ฒ•์„๋ชจ๋ฅด๊ฑฐ๋‚˜ํ•ด๊ฒฐ๋ฐฉ๋ฒ•์„์ฐพ๊ธฐ์–ด๋ ค์šด๋ฌธ์ œ๋ฅผ๋‹ค๋ฃน๋‹ˆ๋‹ค.

5

Page 6: 1.introduction(epoch#2)

์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜

์˜ํ™”, ์‡ผํ•‘, ์Œ์•…์ถ”์ฒœ

์‚ฌ์ง„๋ถ„๋ฅ˜, ๊ฒ€์ƒ‰, ์นœ๊ตฌ์ถ”์ฒœ, ๊ด‘๊ณ , ๋ฒˆ์—ญ, ์Œ์„ฑ์ธ์‹

์˜ํ•™, ์ž์œจ์ฃผํ–‰์ž๋™์ฐจ, ์ฒœ๋ฌธ, ์–‘์ž์—ญํ•™

6

์œ ๋Ÿฝ์ž…์ž๋ฌผ๋ฆฌ์—ฐ๊ตฌ์†ŒJames Beacham

Page 7: 1.introduction(epoch#2)

AI โŠƒ๋จธ์‹ ๋Ÿฌ๋‹ โŠƒ๋”ฅ๋Ÿฌ๋‹

7

Deep

Learning

Machine

Learning

AI

Page 8: 1.introduction(epoch#2)

์™œ๋จธ์‹ ๋Ÿฌ๋‹์ธ๊ฐ€?

๊ทœ์น™๊ธฐ๋ฐ˜์ „๋ฌธ๊ฐ€์‹œ์Šคํ…œrule based expert system์˜๋ฌธ์ œ์ (๊ณ ์ „์ ์ธ์ŠคํŒธํ•„ํ„ฐ)

๊ฒฐ์ •์—ํ•„์š”ํ•œ๋กœ์ง์ดํ•œ๋ถ„์•ผ๋‚˜์ž‘์—…์—๊ตญํ•œ, ์ž‘์—…๋ณ€๊ฒฝ์—๋”ฐ๋ผ์‹œ์Šคํ…œ๊ฐœ๋ฐœ์„

๋‹ค์‹œ์ˆ˜ํ–‰ํ• ์ˆ˜๋„์žˆ์Šต๋‹ˆ๋‹ค.

๊ทœ์น™์„์„ค๊ณ„ํ•˜๋ ค๋ฉด ๋ถ„์•ผ์˜์ „๋ฌธ๊ฐ€๋“ค์˜๊ฒฐ์ •๋ฐฉ์‹์—๋Œ€ํ•ด์ž˜์•Œ์•„์•ผํ•ฉ๋‹ˆ๋‹ค.

2001๋…„์ด์ „๊นŒ์ง€์ด๋ฏธ์ง€์—์„œ์–ผ๊ตด์„์ธ์‹ํ•˜๋Š”๋ฌธ์ œ๋ฅผํ’€์ง€๋ชปํ–ˆ์Œ

์ปดํ“จํ„ฐ์˜ํ”ฝ์…€๋‹จ์œ„๋Š”์‚ฌ๋žŒ์ด์ธ์‹ํ•˜๋Š”๋ฐฉ์‹๊ณผ๋‹ค๋ฆ„, ์–ผ๊ตด์ด๋ฌด์—‡์ธ์ง€์ผ๋ จ์˜

๊ทœ์น™์„๋งŒ๋“ค๊ธฐ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

๋จธ์‹ ๋Ÿฌ๋‹์„์‚ฌ์šฉํ•ด๋งŽ์€์–ผ๊ตด์ด๋ฏธ์ง€๋ฅผ์ œ๊ณตํ•˜๋ฉด์–ผ๊ตด์„ํŠน์ •ํ•˜๋Š”์š”์†Œ๋ฅผ์ฐพ์„์ˆ˜

์žˆ์Šต๋‹ˆ๋‹ค.

8

Page 9: 1.introduction(epoch#2)

์ง€๋„ํ•™์ŠตSupervised Learning

์•Œ๊ณ ๋ฆฌ์ฆ˜์—์ž…๋ ฅ๊ณผ๊ธฐ๋Œ€ํ•˜๋Š”์ถœ๋ ฅ์„์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜์€์ž…๋ ฅ์œผ๋กœ๋ถ€ํ„ฐ๊ธฐ๋Œ€ํ•˜๋Š”์ถœ๋ ฅ์„๋งŒ๋“œ๋Š”๋ฐฉ๋ฒ•์„์ฐพ์Šต๋‹ˆ๋‹ค. ํ˜น์€

๊ธฐ๋Œ€ํ•˜๋Š”์ถœ๋ ฅ์ด๋‚˜์˜ค๋„๋ก์•Œ๊ณ ๋ฆฌ์ฆ˜์„๊ฐ€๋ฅด์นฉ๋‹ˆ๋‹ค.

์ŠคํŒธ๋ฌธ์ œ์˜๊ฒฝ์šฐ์ด๋ฉ”์ผ(์ž…๋ ฅ)๊ณผ์ŠคํŒธ์—ฌ๋ถ€(๊ธฐ๋Œ€์ถœ๋ ฅ)์„์ œ๊ณตํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.

์ง€๋„ํ•™์Šต์˜์˜ˆ

์†๊ธ€์”จ์ˆซ์žํŒ๋ณ„: ์†๊ธ€์”จ์Šค์บ”์ด๋ฏธ์ง€(์ž…๋ ฅ), ์šฐํŽธ๋ฒˆํ˜ธ(๊ธฐ๋Œ€ ์ถœ๋ ฅ)

๋ฐ์ดํ„ฐ์ˆ˜์ง‘์—์ˆ˜์ž‘์—…์ด๋งŽ์Œ. ๋น„๊ต์ ์‰ฝ๊ณ ์ ์€๋น„์šฉ์†Œ๋ชจ.

์˜๋ฃŒ์˜์ƒ์—๊ธฐ๋ฐ˜ํ•œ์•”์ง„๋‹จ: ์˜์ƒ(์ž…๋ ฅ), ์ข…์–‘์—ฌ๋ถ€(๊ธฐ๋Œ€ ์ถœ๋ ฅ)

๋„๋•์ /๊ฐœ์ธ์ •๋ณด๋ฌธ์ œ๊ณ ๊ฐ€์˜์žฅ๋น„, ์ „๋ฌธ๊ฐ€์˜๊ฒฌํ•„์š”

์‹ ์šฉ์นด๋“œ๋ถ€์ •๊ฑฐ๋ž˜๊ฐ์ง€: ์‹ ์šฉ์นด๋“œ๊ฑฐ๋ž˜๋‚ด์—ญ(์ž…๋ ฅ), ๋ถ€์ •๊ฑฐ๋ž˜์—ฌ๋ถ€(๊ธฐ๋Œ€์ถœ๋ ฅ)

๊ณ ๊ฐ์—๊ฒŒ์‹ ๊ณ ๊ฐ€์˜ฌ๋•Œ๊นŒ์ง€๊ธฐ๋‹ค๋ฆฌ๋ฉด๋จ. 9

Page 10: 1.introduction(epoch#2)

๋น„์ง€๋„ํ•™์ŠตUnsupervised Learning

์•Œ๊ณ ๋ฆฌ์ฆ˜์—์ž…๋ ฅ์€์ฃผ์–ด์ง€์ง€๋งŒ์ถœ๋ ฅ์€์ œ๊ณต๋˜์ง€์•Š์Šต๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ๋น„์ง€๋„ํ•™์Šต์˜์ดํ•ดํ•˜๊ฑฐ๋‚˜ํ‰๊ฐ€ํ•˜๊ธฐ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

๋น„์ง€๋„ํ•™์Šต์˜์˜ˆ

๋ธ”๋กœ๊ทธ๊ธ€์˜์ฃผ์ œ๊ตฌ๋ถ„: ์‚ฌ์ „์—์–ด๋–ค์ฃผ์ œ๊ฐ€์žˆ๋Š”์ง€์–ผ๋งˆ๋‚˜๋งŽ์€์ฃผ์ œ๊ฐ€์žˆ๋Š”์ง€

๋ชจ๋ฆ…๋‹ˆ๋‹ค.

๊ณ ๊ฐ๋“ค์„์ทจํ–ฅ์—๋”ฐ๋ผ๊ทธ๋ฃน์œผ๋กœ๋ฌถ๊ธฐ: ๋ถ€๋ชจ, ๋…์„œ๊ด‘, ๊ฒŒ์ด๋จธ๋“ฑ์–ด๋–ค๊ทธ๋ฃน์ด์žˆ๋Š”์ง€

๋ฏธ๋ฆฌ์•Œ์ˆ˜์—†๊ณ ์–ผ๋งˆ๋‚˜๋งŽ์ด์žˆ๋Š”์ง€๋ชจ๋ฆ…๋‹ˆ๋‹ค.(ํ‰๊ฐ€ํ•˜๊ธฐ ์–ด๋ ต์Šต๋‹ˆ๋‹ค)

๋น„์ •์ƒ์ ์›น์‚ฌ์ดํŠธ์ ‘๊ทผํƒ์ง€: ๋น„์ •์ƒ์ ์ธํŒจํ„ด์€๊ฐ๊ธฐ๋‹ค๋ฅผ์ˆ˜์žˆ๊ณ ๊ฐ€์ง€๊ณ ์žˆ๋Š”

๋น„์ •์ƒ๋ฐ์ดํ„ฐ๊ฐ€์—†์„์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค. ํ˜„์žฌํŠธ๋ž˜ํ”ฝ๋งŒ๊ด€์ฐฐํ• ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

10

Page 11: 1.introduction(epoch#2)

๋ฐ์ดํ„ฐ, ํŠน์„ฑ

์ข‹์€์ž…๋ ฅ๋ฐ์ดํ„ฐ๋ฅผ๋งŒ๋“ค์–ด๋‚ด๋Š”์ผ์„ํŠน์„ฑ์ถ”์ถœfeature extraction, ํŠน์„ฑ๊ณตํ•™feature

engineering์ด๋ผ๊ณ ํ•ฉ๋‹ˆ๋‹ค.11

๋‚˜์ด ๊ตฌ๋งค๋นˆ๋„ ์ด๋ฉ”์ผ ...

37 10 xxx@gmail

33 15 yyy@gmail

ํŠน์„ฑfeatures, ๋…๋ฆฝ๋ณ€์ˆ˜, (์†์„ฑattribute)

์ƒ˜ํ”Œsample,

๋ฐ์ดํ„ฐํฌ์ธํŠธdata point

(์ธ์Šคํ„ด์Šคinstance)

ํƒ€๊นƒtarget, ์ข…์†๋ณ€์ˆ˜

์„ฑ๋ณ„

๋‚จ

์—ฌ

Page 12: 1.introduction(epoch#2)

๋ฌธ์ œ์™€๋ฐ์ดํ„ฐ์ดํ•ด

๋จธ์‹ ๋Ÿฌ๋‹ํ”„๋กœ์„ธ์Šค์—์„œ๋ฐ์ดํ„ฐ๋ฅผ์ดํ•ดํ•˜๊ณ ํ•ด๊ฒฐํ• ๋ฌธ์ œ์™€์–ด๋–ค๊ด€๋ จ์ด์žˆ๋Š”์ง€์ดํ•ดํ•˜๋Š”๊ฒƒ์ด๊ฐ€์žฅ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜๋งˆ๋‹ค์ž˜๋“ค์–ด๋งž๋Š”๋ฐ์ดํ„ฐ๋‚˜๋ฌธ์ œ์˜์ข…๋ฅ˜๊ฐ€๋‹ค๋ฆ…๋‹ˆ๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‚˜๋ฐฉ๋ฒ•๋ก ์€๋ฌธ์ œ๋ฅผํ‘ธ๋Š”์ „์ฒด๊ณผ์ •์ค‘์ผ๋ถ€์ผ๋ฟ์ž…๋‹ˆ๋‹ค.

๋จธ์‹ ๋Ÿฌ๋‹์†”๋ฃจ์…˜์„๋งŒ๋“ค๋™์•ˆ๋งˆ์Œ์—์ƒˆ๊ฒจ์•ผํ• ์งˆ๋ฌธ๋“ค

์–ด๋–ค์งˆ๋ฌธ์—๋Œ€๋‹ต์„์›ํ•˜๋Š”๊ฐ€? ์›ํ•˜๋Š”๋‹ต์„๋งŒ๋“ค์ˆ˜์žˆ๋Š”๋ฐ์ดํ„ฐ๋ฅผ๊ฐ€์ง€๊ณ ์žˆ๋Š”๊ฐ€?

๋จธ์‹ ๋Ÿฌ๋‹์˜๋ฌธ์ œ๋กœ๊ฐ€์žฅ์ž˜๊ธฐ์ˆ ํ• ์ˆ˜์žˆ๋Š”๋ฐฉ๋ฒ•์€๋ฌด์—‡์ธ๊ฐ€?

๋ฌธ์ œ๋ฅผํ’€๊ธฐ์—์ถฉ๋ถ„ํ•œ๋ฐ์ดํ„ฐ๊ฐ€์žˆ๋Š”๊ฐ€?

์ถ”์ถœํ•œ๋ฐ์ดํ„ฐ์˜ํŠน์„ฑ์€๋ฌด์—‡์ด๊ณ ์ข‹์€์˜ˆ์ธก์„์œ„ํ•œํŠน์„ฑ์„๊ฐ€์ง€๊ณ ์žˆ๋Š”๊ฐ€?

์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ์„ฑ๊ณผ๋ฅผ์–ด๋–ป๊ฒŒ์ธก์ •ํ• ๊ฒƒ์ธ๊ฐ€?

๋‹ค๋ฅธ์—ฐ๊ตฌ๋‚˜์ œํ’ˆ๊ณผ์–ด๋–ป๊ฒŒํ˜‘๋ ฅํ• ์ˆ˜์žˆ๋Š”๊ฐ€?12

Page 13: 1.introduction(epoch#2)

Why Python?

13

Page 14: 1.introduction(epoch#2)

์•„์ง๋„์ด๋Ÿฐ์งˆ๋ฌธ์„?

14

Page 15: 1.introduction(epoch#2)

ํŒŒ์ด์ฌ(Python)

๊ณผํ•™๋ถ„์•ผ๋ฅผ์œ„ํ•œํ‘œ์ค€ํ”„๋กœ๊ทธ๋ž˜๋ฐ์–ธ์–ด๊ฐ€๋˜์–ด๊ฐ€๊ณ ์žˆ์Šต๋‹ˆ๋‹ค.(vs Julia)

MATLAB, R ๊ฐ™์ด๋„๋ฉ”์ธ์—ํŠนํ™”๋œ์–ธ์–ด์™€ Java, C ๊ฐ™์€๋ฒ”์šฉ์–ธ์–ด์˜์žฅ์ ์„๋ชจ๋‘

๊ฐ€์ง€๊ณ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ†ต๊ณ„, ๋จธ์‹ ๋Ÿฌ๋‹, ์ž์—ฐ์–ด, ์ด๋ฏธ์ง€, ์‹œ๊ฐํ™”๋“ฑ์„ํฌํ•จํ•œํ’๋ถ€ํ•œ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€

์žˆ์Šต๋‹ˆ๋‹ค: Scikit-Learn, pandas, Numpy, Scipy, matplotlib, ...

๋ธŒ๋ผ์šฐ์ €๊ธฐ๋ฐ˜์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒํ”„๋กœ๊ทธ๋ž˜๋ฐํ™˜๊ฒฝ์ธ์ฃผํ”ผํ„ฐ๋…ธํŠธ๋ถJupyter Notebook์ด

์žˆ์Šต๋‹ˆ๋‹ค.

ํŒŒ์ด์ฌ์ฃผ๋„๋”ฅ๋Ÿฌ๋‹๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ: TensorFlow, Keras, PyTorch, Caffe2, CNTK,

MXNet, Theano, ...

15

Page 16: 1.introduction(epoch#2)

Scikit-Learn

์˜คํ”ˆ์†Œ์Šค: https://github.com/scikit-learn/scikit-learn

์†Œ์Šค์ฝ”๋“œ๋ฅผ๋ณด๊ณ ์–ด๋–ป๊ฒŒ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๊ตฌํ˜„๋˜์–ด์žˆ๋Š”์ง€ํ™•์ธํ• ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

ํšŒ๊ท€, ๋ถ„๋ฅ˜, ๊ตฐ์ง‘, ์ฐจ์›์ถ•์†Œ, ํŠน์„ฑ๊ณตํ•™, ์ „์ฒ˜๋ฆฌ, ๊ต์ฐจ๊ฒ€์ฆ, ํŒŒ์ดํ”„๋ผ์ธ๋“ฑ๋จธ์‹ ๋Ÿฌ๋‹์—

ํ•„์š”ํ•œ๋„๊ตฌ๋ฅผ๋‘๋ฃจ๊ฐ–์ถ”๊ณ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ’๋ถ€ํ•œ๋ฌธ์„œ (์˜๋ฌธ): http://scikit-learn.org/stable/documentation

ํ•™๊ต, ์‚ฐ์—…ํ˜„์žฅ์—์„œ๋„๋ฆฌ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

ํญ๋„“์€์ปค๋ฎค๋‹ˆํ‹ฐ์™€๋งŽ์€ํŠœํ† ๋ฆฌ์–ผ, ์˜ˆ์ œ์ฝ”๋“œ๊ฐ€์žˆ์Šต๋‹ˆ๋‹ค.

16

Page 17: 1.introduction(epoch#2)

Appleโ€™s Core ML Support

17

Page 18: 1.introduction(epoch#2)

Scikit-Learn ์„ค์น˜

NumPy, SciPy๋ฅผ๊ธฐ๋ฐ˜์œผ๋กœํ•ฉ๋‹ˆ๋‹ค.

๋Œ€ํ™”์‹ํ™˜๊ฒฝ์„์œ„ํ•ด์„œ๋Š” IPython ์ปค๋„๊ณผ Jupyter Notebook ์„ค์น˜๊ฐ€ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Anaconda(https://www.continuum.io/anaconda-overview)

๋ฌด๋ฃŒ, ๊ณผํ•™์ „๋ฌธํŒŒ์ด์ฌ๋ฐฐํฌํŒ, ์ˆ˜๋ฐฑ๊ฐœ์˜ํŒจํ‚ค์ง€, ๋งฅ/์œˆ๋„์šฐ/๋ฆฌ๋ˆ…์Šค ์ง€์›, ์ธํ…” MKL

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌํฌํ•จ

Enthought Canopy(https://www.enthought.com)

๊ณผํ•™์ „๋ฌธํŒŒ์ด์ฌ๋ฐฐํฌํŒ, ํ•™์ƒ, ๊ต์œก๊ธฐ๊ด€์—๋ฌด๋ฃŒ(scikit-learn ๋ฏธํฌํ•จ)

Python(x, y)

์œˆ๋„์šฐ์ฆˆ์ „์šฉ๊ณผํ•™ํŒŒ์ด์ฌ๋ฐฐํฌํŒ

18

Page 19: 1.introduction(epoch#2)

ํ•„์ˆ˜๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

19

Page 20: 1.introduction(epoch#2)

Jupyter Notebook

ํ”„๋กœ๊ทธ๋žจ์ฝ”๋“œ + ๋ฌธ์„œ + ๊ฒฐ๊ณผ

(ํ…์ŠคํŠธ,๊ทธ๋ž˜ํ”„)๋ฅผ์œ„ํ•œ๋Œ€ํ™”์‹๊ฐœ๋ฐœ

ํ™˜๊ฒฝ์ž…๋‹ˆ๋‹ค.

ํƒ์ƒ‰์ ๋ฐ์ดํ„ฐ๋ถ„์„์—์œ ๋ฆฌํ•˜์—ฌ

๋งŽ์€๊ณผํ•™์ž, ์—”์ง€๋‹ˆ์–ด๋“ค์ด

์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

https://jupyter.org

20

Page 21: 1.introduction(epoch#2)

IDE for Jupyter Notebook

PyCharm: ํŒŒ์ด์ฌ IDE๋กœ๋…ธํŠธ๋ถํŒŒ์ผ์„์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

Rodeo(https://www.yhat.com/products/rodeo) : ๋ฐ์ดํ„ฐ๋ถ„์„์šฉ์ „๋ฌธ IDE์ž…๋‹ˆ๋‹ค.

JupyterLab(https://github.com/jupyter/jupyterlab) : ์ฐจ์„ธ๋Œ€์ฃผํ”ผํ„ฐ๋…ธํŠธ๋ถํ™˜๊ฒฝ

Colaboratory(https://colab.research.google.com) : ๊ตฌ๊ธ€๋“œ๋ผ์ด๋ธŒ์™€์—ฐ๋™๋˜๋Š”์—ฐ๊ตฌ์™€๊ต์œก๋ชฉ์ ์˜๋…ธํŠธ๋ถํ™˜๊ฒฝ

21

Page 22: 1.introduction(epoch#2)

NumPy

๋‹ค์ฐจ์›๋ฐฐ์—ด์„์œ„ํ•œ๊ธฐ๋Šฅ๊ณผ์„ ํ˜•๋Œ€์ˆ˜๋ฅผ๋น„๋กฏํ•ด๊ณ ์ˆ˜์ค€์ˆ˜ํ•™ํ•จ์ˆ˜์™€์œ ์‚ฌ๋‚œ์ˆ˜

์ƒ์„ฑ๊ธฐ๋ฅผํฌํ•จํ•˜๊ณ ์žˆ์Šต๋‹ˆ๋‹ค.

scikit-learn์˜๊ธฐ๋ณธ๋ฐ์ดํ„ฐ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.(TensorFlow ๋„)

http://www.numpy.org

http://www.scipy-lectures.org/ ์˜ 1์žฅ

22

๋ชจ๋“ˆ์˜๋ณ„์นญ์„๋งŒ๋“ฆ

ndarray

๋™์ผํ•œ๋ฐ์ดํ„ฐํƒ€์ž…

Page 23: 1.introduction(epoch#2)

SciPy

์„ ํ˜•๋Œ€์ˆ˜, ์ตœ์ ํ™”, ํ†ต๊ณ„๋“ฑ๋งŽ์€๊ณผํ•™๊ณ„์‚ฐํ•จ์ˆ˜๋ฅผ๋ชจ์•„๋†“์€ํŒŒ์ด์ฌํŒจํ‚ค์ง€

scikit-learn์€์•Œ๊ณ ๋ฆฌ์ฆ˜๊ตฌํ˜„์— SciPy์—๋งŽ์ด์˜์กดํ•˜๊ณ ์žˆ์Šต๋‹ˆ๋‹ค.

0์ด๋งŽ์ดํฌํ•จ๋œํ–‰๋ ฌ์„ํšจ์œจ์ ์œผ๋กœํ‘œํ˜„ํ•˜๊ธฐ์œ„ํ•œํฌ์†Œํ–‰๋ ฌsparse matrix

scipy.sparse ํŒจํ‚ค์ง€๋ฅผ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

https://www.scipy.org/scipylib

http://www.scipy-lectures.org/

์˜ 2.5์ ˆ

23

๋‹จ์œ„ํ–‰๋ ฌ

Page 24: 1.introduction(epoch#2)

SciPy

24

๋Œ€๊ฐํ–‰๋ ฌ์˜์œ„์น˜

Compressed Sparse Row Format

Coordinate Format

์ธ๋ฑ์Šค๋ฅผ์ง€์ •ํ•˜์—ฌํฌ์†Œํ–‰๋ ฌ์„์ƒ์„ฑ

๋ฐ€์ง‘๋ฐฐ์—ด๋กœ๋ถ€ํ„ฐํฌ์†Œํ–‰๋ ฌ์„์ƒ์„ฑ(๋ฉ”๋ชจ๋ฆฌ๋ถ€์กฑ๊ฐ€๋Šฅ์„ฑ)

Page 25: 1.introduction(epoch#2)

matplotlib

๊ณผํ•™๊ณ„์‚ฐ์šฉ๊ทธ๋ž˜ํ”„๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค.

์„ , ํžˆ์Šคํ† ๊ทธ๋žจ, ์‚ฐ์ ๋„๋“ฑ๋‹ค์–‘ํ•œ

๊ทธ๋ž˜ํ”„๋ฅผ๊ทธ๋ฆด์ˆ˜์žˆ์œผ๋ฉฐ์ถœํŒ์ˆ˜์ค€์˜

๊ณ ํ’ˆ์งˆ์„์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

๋…ธํŠธ๋ถ์—๊ทธ๋ž˜ํ”„ํฌํ•จ: %matplotlib inline

https://matplotlib.org/

๊ทธ์™ธ๋Œ€ํ‘œ์ ์ธ๊ทธ๋ž˜ํ”„๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ: Bokeh, Seaborn,

Plotly

25

๋ณ„์นญ์ด๋ฆ„

Page 26: 1.introduction(epoch#2)

pandas

๋ฐ์ดํ„ฐ์ฒ˜๋ฆฌ์™€๋ถ„์„์„์œ„ํ•œ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

DataFrame inspired by Rโ€™s data.frame

์—‘์…€๊ณผ๋น„์Šทํ•˜๊ฒŒ์ด์ข…๋ฐ์ดํ„ฐํฌํ•จ

๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.(numpy์™€ํฌ๊ฒŒ๋‹ค๋ฅธ์ )

์›จ์Šค๋งคํ‚ค๋‹ˆWes Makinney์˜ โ€œํŒŒ์ด์ฌ

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผํ™œ์šฉํ•œ๋ฐ์ดํ„ฐ๋ถ„์„โ€

http://pandas.pydata.org

26

๋ณ„์นญ์ด๋ฆ„

CSV, SQL, ์—‘์…€๋“ฑ์—์„œ๋ฐ์ดํ„ฐ์ฝ์„์ˆ˜์žˆ์Œ

Page 27: 1.introduction(epoch#2)

Python 2 vs Python 3

์–ด๋–ค๊ฒƒ์„์จ์•ผํ• ์ง€์ž˜๋ชจ๋ฅด๊ฒ ๋‹ค๋ฉด Python 3์ด์ ์ ˆํ•ฉ๋‹ˆ๋‹ค.

์ด๋ฏธํŒŒ์ด์ฌ 2๋กœ์ž‘์„ฑํ•œ์ฝ”๋“œ๊ฐ€๋งŽ์ด์žˆ๊ฑฐ๋‚˜์‚ฌ์šฉํ•˜๊ณ ์žˆ๋Š”๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ํŒŒ์ด์ฌ 2

๋ฐ–์—์ง€์›ํ•˜์ง€์•Š๋Š”๊ฒฝ์šฐ์—๋Š”ํŒŒ์ด์ฌ 2๋ฅผ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

six ํŒจํ‚ค์ง€๋ฅผ์‚ฌ์šฉํ•ดํ˜ธํ™˜๋œ์ฝ”๋“œ๋ฅผ๋งŒ๋“ค์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.(https://pythonhosted.org/six/)

Python 2.7์€ 2020๋…„๊นŒ์ง€๋งŒ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

27

Page 28: 1.introduction(epoch#2)

์‚ฌ์šฉํ•˜๋Š” Version

Python 3.6

scikit-learn 0.19.x

matplotlib 2.0.x

NumPy 1.13.x

SciPy 0.19.x

pandas 0.20.x

28

Page 29: 1.introduction(epoch#2)

Iris Dataset

29

Page 30: 1.introduction(epoch#2)

๋ถ“๊ฝƒ์˜ํ’ˆ์ข…๋ถ„๋ฅ˜

setosa, versicolor, virginica ํ’ˆ์ข…(ํƒ€๊นƒ)์„๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.

ํŠน์„ฑ์€๊ฝƒ์žŽpetal, ๊ฝƒ๋ฐ›์นจsepal์˜ํญ๊ณผ๊ธธ์ด์ž…๋‹ˆ๋‹ค.

์‚ฌ์ „์—์ด๋ฏธ๋ถ„๋ฅ˜ํ•œ๋ฐ์ดํ„ฐ๋ฅผ์ด์šฉํ•˜๋ฏ€๋กœ์ง€๋„

ํ•™์Šตsupervised learning์ด๊ณ  3๊ฐœ์˜๋ถ“๊ฝƒํ’ˆ์ข…์—์„œ๊ณ ๋ฅด๋Š”

๋ถ„๋ฅ˜classification ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.

ํด๋ž˜์Šคclass: ๊ฐ€๋Šฅํ•œ์ถœ๋ ฅ๊ฐ’. ์ฆ‰์„ธ๊ฐœ์˜๋ถ“๊ฝƒํ’ˆ์ข…

๋ ˆ์ด๋ธ”label: ๋ฐ์ดํ„ฐํฌ์ธํŠธํ•˜๋‚˜์—๋Œ€ํ•œ์ถœ๋ ฅ

30

Page 31: 1.introduction(epoch#2)

๋ฐ์ดํ„ฐ์ ์žฌ

๋ถ“๊ฝƒ๋ฐ์ดํ„ฐ: sklearn.datasets.load_iris()

๊ทธ์™ธ load_digits(), load_boston(), load_breast_cancer(), load_diabetes() ๋„์žˆ์Šต๋‹ˆ๋‹ค.

31

๋”•์…”๋„ˆ๋ฆฌ์™€์œ ์‚ฌํ•œscikit-learn์˜ Bunch ํด๋ž˜์Šค๋ฐ˜ํ™˜

๋ฐ์ดํ„ฐ์…‹์—๋Œ€ํ•œ์„ค๋ช…iris_dataset.DESCR[:193] ๋„๊ฐ€๋Šฅ

Page 32: 1.introduction(epoch#2)

32

๋ถ“๊ฝƒํ’ˆ์ข…์˜์ด๋ฆ„

ํŠน์„ฑ์„ค๋ช…

๋ฐ์ดํ„ฐํฌ๊ธฐ(๊ฝƒ์žŽ๊ธธ์ด/ํญ, ๊ฝƒ๋ฐ›์นจ๊ธธ์ด/ํญ)

Page 33: 1.introduction(epoch#2)

33

ํ›ˆ๋ จ๋ฐ์ดํ„ฐ

ํƒ€๊นƒ๋ฐ์ดํ„ฐ

์ƒ˜ํ”Œ

ํŠน์„ฑ

Page 34: 1.introduction(epoch#2)

๊ธฐ๋ณธ 3:1 ๋น„์œจ (test_size ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ๋ณ€๊ฒฝ๊ฐ€๋Šฅ)

ํ›ˆ๋ จ๋ฐ์ดํ„ฐ์™€ํ…Œ์ŠคํŠธ๋ฐ์ดํ„ฐ

ํ›ˆ๋ จ์—์‚ฌ์šฉํ•œ๋ฐ์ดํ„ฐ๋Š”ํ…Œ์ŠคํŠธ(์ผ๋ฐ˜ํ™”)์—์‚ฌ์šฉํ•˜์ง€์•Š์Šต๋‹ˆ๋‹ค: ๋‚™๊ด€์ ์ธ์ถ”์ •๋ฐฉ์ง€

๋ฐ์ดํ„ฐ๋ถ„๋ฆฌ: ํ›ˆ๋ จ์„ธํŠธ, ํ…Œ์ŠคํŠธ์„ธํŠธ(ํ™€๋“œ์•„์›ƒholdout ์„ธํŠธ)

X: 2์ฐจ์›๋ฐฐ์—ด(ํ–‰๋ ฌ), ๋Œ€๋ฌธ์žํ‘œ๊ธฐ, y: 1์ฐจ์›๋ฐฐ์—ด(๋ฒกํ„ฐ), ์†Œ๋ฌธ์žํ‘œ๊ธฐ

34

์œ ์‚ฌ๋‚œ์ˆ˜์ƒ์„ฑ

Page 35: 1.introduction(epoch#2)

์‚ฐ์ ๋„Scatter Plot

๋‘๊ฐœ์˜ํŠน์„ฑ์„์ด์šฉ์ ์œผ๋กœ๋ฐ์ดํ„ฐ

ํ‘œ์‹œ(3์ฐจ์›์ด์ƒ์€ํ‘œํ˜„์ด์–ด๋ ค์›€)

๊ฐํŠน์„ฑ์˜์กฐํ•ฉ์—๋Œ€ํ•ด๋ชจ๋‘๊ทธ๋ฆฌ๋Š”

pandas์˜์‚ฐ์ ๋„ํ–‰๋ ฌScatter Matrix

์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

35

pandas ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ๋ณ€๊ฒฝ

Page 36: 1.introduction(epoch#2)

36

ํŠน์„ฑ์ด๋งŽ์„๊ฒฝ์šฐ์‚ฐ์ ๋„ํ–‰๋ ฌ์„๊ทธ๋ฆฌ๊ธฐ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

ํ•œ๊ทธ๋ž˜ํ”„์—๋ชจ๋“ ํŠน์„ฑ์ด๋‚˜ํƒ€๋‚˜๋Š”๊ฒƒ์ด์•„๋‹ˆ๋ฏ€๋กœ์„œ๋กœ๋‚˜๋‰˜์–ด์ง„๊ทธ๋ž˜ํ”„์‚ฌ์ด์—์ค‘์š”ํ•œ์„ฑ์งˆ์ด์žˆ์„์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

์ž๊ธฐ์ž์‹ (์ฃผ๋Œ€๊ฐ์„ )์€ํžˆ์Šคํ† ๊ทธ๋žจ์œผ๋กœํ‘œ์‹œ

๋จธ์‹ ๋Ÿฌ๋‹์œผ๋กœ์ž˜๊ตฌ๋ถ„ํ• ์ˆ˜์žˆ์„๊ฒƒ๊ฐ™์Œ

Page 37: 1.introduction(epoch#2)

First ML Application

37

Page 38: 1.introduction(epoch#2)

k-์ตœ๊ทผ์ ‘์ด์›ƒk-Nearest Neighbors, k-NN

์ด์•Œ๊ณ ๋ฆฌ์ฆ˜์€ํ›ˆ๋ จ๋ฐ์ดํ„ฐ๋ฅผ์ €์žฅํ•˜๋Š”๊ฒƒ์ดํ•™์Šต์˜์ „๋ถ€์ž…๋‹ˆ๋‹ค.

์ƒˆ๋ฐ์ดํ„ฐํฌ์ธํŠธ์—์„œ๊ฐ€์žฅ๊ฐ€๊นŒ์šดํ›ˆ๋ จ๋ฐ์ดํ„ฐํฌ์ธํŠธ(k๊ฐœ)๋ฅผ์ฐพ์•„์„œ๊ฐ€์žฅ๋†’์€

๋นˆ๋„์˜ํด๋ž˜์Šค๋ฅผ์˜ˆ์ธก์œผ๋กœ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋Œ€ํ‘œ์ ์ธ์ธ์Šคํ„ด์Šค๊ธฐ๋ฐ˜instance based ํ•™์Šต์•Œ๊ณ ๋ฆฌ์ฆ˜(vs model based)

38

์ด์›ƒ์˜๊ฐœ์ˆ˜: ๊ธฐ๋ณธ๊ฐ’ 5

fit() ๋ฉ”์†Œ๋“œ์—์„œ๋„knn ๊ฐ์ฒด๋ฐ˜ํ™˜ ํ›ˆ๋ จ๋ฐ์ดํ„ฐ์ฃผ์ž…

Page 39: 1.introduction(epoch#2)

์˜ˆ์ธก

ํ•™์Šต๋œ๋ชจ๋ธ(์ด์ „์Šฌ๋ผ์ด๋“œ์˜ knn ๊ฐ์ฒด)์„์‚ฌ์šฉํ•ด์ƒˆ๋กœ์šด๋ถ“๊ฝƒ์˜ํ’ˆ์ข…์„

๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.

39

sepal length, width petal length, width๊ฐ€์ƒ์œผ๋กœ์ƒˆ๋กœ์šด๋ฐ์ดํ„ฐ์ƒ์„ฑํŠน์„ฑ์€ํ•ญ์ƒ 2์ฐจ์›๋ฐฐ์—ด

์˜ˆ์ธก

์ˆซ์ž๋กœ๋œ๋ ˆ์ด๋ธ”์ด๋ฐ˜ํ™˜๋ฉ๋‹ˆ๋‹ค.

์‹ค์ œ๋กœ์ด๊ฐ’์ด๋งž์„๊นŒ์š”?

Page 40: 1.introduction(epoch#2)

ํ‰๊ฐ€

๋ชจ๋ธ์˜์„ฑ๋Šฅ(์ •ํ™•๋„)์„ํ‰๊ฐ€ํ•˜๊ธฐ์œ„ํ•ดํ›ˆ๋ จ์—์ƒ์š”ํ•˜์ง€์•Š์•˜๋˜ํ…Œ์ŠคํŠธ์„ธํŠธ๋ฅผ

ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

40

์˜ˆ์ธก๊ฐ’์„์‹ค์ œ๋ ˆ์ด๋ธ”๊ณผ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค.0 ๋˜๋Š” 1์˜๋ฐฐ์—ด

ํ‰๊ท 

ํ…Œ์ŠคํŠธ์„ธํŠธ์—๋Œ€ํ•œ์˜ˆ์ธก(ํƒ€๊นƒ์ด์—†์Šต๋‹ˆ๋‹ค)

์ƒˆ๋กœ์šด๋ฐ์ดํ„ฐ์—๋Œ€ํ•ด 97% ์ •ํ™•๋„๋ฅผ๊ธฐ๋Œ€ํ• ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

Page 41: 1.introduction(epoch#2)

์š”์•ฝ

๋ถ“๊ฝƒ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์€๋ถ„๋ฅ˜ํ•ด๋†“์€ํ’ˆ์ข…๋ฐ์ดํ„ฐ๋ฅผ์‚ฌ์šฉํ•œ์ง€๋„ํ•™์Šต์ž…๋‹ˆ๋‹ค.

์„ธ๊ฐœ์˜ํ’ˆ์ข…(ํด๋ž˜์Šค)์„๊ตฌ๋ถ„ํ•˜๋Š”๋ถ„๋ฅ˜classification ๋ฌธ์ œ(๋‹ค์ค‘๋ถ„๋ฅ˜)์ž…๋‹ˆ๋‹ค.

ํŠน์„ฑ๋ฐ์ดํ„ฐ๋ฅผ๋‹ด๊ณ ์žˆ๋Š” X(2์ฐจ์›)์™€๊ธฐ๋Œ€์ถœ๋ ฅ(๋ ˆ์ด๋ธ”)์„๊ฐ€์ง„ y(1์ฐจ์›)๋ฅผ NumPy

๋ฐฐ์—ด๋กœ๋‹ค๋ฃจ์—ˆ์Šต๋‹ˆ๋‹ค.

ํ›ˆ๋ จ์„ธํŠธ์—์„œํ•™์Šต(ํ›ˆ๋ จ)ํ•˜๊ณ ํ…Œ์ŠคํŠธ์„ธํŠธ๋กœ๋งŒ๋“ค์–ด์ง„๋ชจ๋ธ์„ํ‰๊ฐ€ํ–ˆ์Šต๋‹ˆ๋‹ค.

41

predict(X_new)