Prediction: Decision tree - stat.sm.u-tokai.ac.jp/~yama/lect/chuo/2017-05.pdf ·...

Data Mining Special Lecture: Prediction: Decision tree (Data Mining Special Topics, Session 5)


  • 6 1

    Prediction: Decision tree

  • 6 2

    (Decision tree)

  • 6 3

    The target variable is either qualitative (Classification tree) or quantitative (Regression tree).
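The distinction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `tree_type` and `leaf_prediction` are hypothetical, and treating "all values numeric" as "quantitative" is a simplifying assumption (class labels coded as 0/1 would need to be declared qualitative explicitly).

```python
from collections import Counter
from statistics import mean

def tree_type(target_values):
    """Return 'regression' when every target value is numeric, else 'classification'."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    return "regression" if numeric else "classification"

def leaf_prediction(target_values):
    """Predict the mean for a quantitative target, the most frequent class otherwise."""
    if tree_type(target_values) == "regression":
        return mean(target_values)
    return Counter(target_values).most_common(1)[0][0]
```

A leaf holding the targets `[10, 20, 30]` would predict their mean, while a leaf holding `["a", "b", "a"]` would predict the majority class `"a"`.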

  • 6 4

  • 6 5

    Divided by the 1st variable

    Divided by the 2nd variable
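Splitting first by one variable and then by a second one can be sketched as a small nested structure. The tree below is hypothetical: the cut points, the leaf labels A to D, and the convention "go left when the value is below the cut" are all assumptions made for illustration.

```python
# Hypothetical depth-2 tree: the root splits on variable 0 (the 1st variable),
# and each child then splits on variable 1 (the 2nd variable).
TREE = {
    "var": 0, "cut": 5.0,
    "left":  {"var": 1, "cut": 2.0, "left": "A", "right": "B"},
    "right": {"var": 1, "cut": 7.0, "left": "C", "right": "D"},
}

def predict(node, x):
    """Follow the splits until a leaf label (a plain string) is reached."""
    while isinstance(node, dict):
        branch = "left" if x[node["var"]] < node["cut"] else "right"
        node = node[branch]
    return node
```

Each leaf corresponds to one rectangle of the plane spanned by the two variables, so `predict(TREE, [3.0, 1.0])` lands in region `"A"`.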

  • 6 6

  • 6 7

  • 6 8

    CHAID C5.0 CART QUEST

    2 2

    *1

    *2

    *1 *2 10

    SAS: CHAID, CART, C4.5

    Answer Tree: CHAID, CART, QUEST

    Clementine: CART, C5.0


  • 6 9

  • 6 10

  • 6 11

    Regression tree: the target variable is quantitative and is predicted from the input variables.

  • 6 12

    IC

     8800  10  0  15  10  0  0  20
    15000  30  0  14  10  1  0  23
     6500  30  0  14  10  0  0  24.5
    20000  20  1  15  11  1  1  24
    22000  20  1  14  11  1  1  32
    20000  45  1  15  11  0  1  28
    20000  45  1  15  12  1  1  24
    22000  50  1  14  11  1  0  34
    15000  20  1  15  11  1  0  21
    14000  30  1  14  11  1  0  20
    22000  30  1  14  12  1  1  28
    20000  45  1  14  11  1  0  28
    12500  25  1  15  11  1  0  29.8
    14000  20  1  14  10  1  1  24
    25000  25  1  15  11  1  1  29
     9800  30  1  15  10  1  0  26
    15000  40  1  15  11  1  1  28
    22000  40  1  15  10  1  0  36
    18000   5  1  14  10  1  0  23.8
    15000  15  1  15  11  1  0  28
    15000   5  1  15  12  0  1  20

  • 6 13

    (setting of Regression tree)

    : ProbF: p-value of the F test (anova)

    : 2100:

    : : 1
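The ProbF criterion named above is based on an F test, and the statistic behind that test can be sketched for a single binary split of a quantitative target. This is an illustrative sketch, not the software's implementation: `f_statistic` is a hypothetical helper, and turning F into a p-value would additionally need the F distribution, which is left out here. For fixed degrees of freedom, a larger F corresponds to a smaller p-value.

```python
def f_statistic(left, right):
    """One-way ANOVA F statistic for a binary split of a quantitative target."""
    groups = [left, right]
    n = len(left) + len(right)
    grand_mean = (sum(left) + sum(right)) / n
    # Between-group sum of squares: distance of each child mean from the overall mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread that remains inside each child.
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between, df_within = len(groups) - 1, n - len(groups)
    # Assumes ss_within > 0, i.e. the children are not perfectly homogeneous.
    return (ss_between / df_between) / (ss_within / df_within)
```

A split that separates low targets from high targets, such as `[1, 2, 3]` against `[7, 8, 9]`, yields a much larger F than a split that mixes them.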

  • 6 14

  • 6 15

    (2)

  • 6 16

    (2)

  • 6 17

  • 6 18

  • 6 19

    id

  • 6 20

    pruning

  • 6 21

    hachioji-housing

    Variables: price, bus (0/1), parking (0/1), time, Bustime, walktime, chikunen (age of the building), heyasu (number of rooms), shikimen (lot area), tatemen (building floor area)

    price  bus  parking  time  Bustime  walktime  chikunen  heyasu  shikimen  tatemen
    1250   1    1        35    30       5         21        3       110.11    58.78
    1380   1    1        24    20       4         30        3       92.67     52.04
    1380   1    1        39    38       1         21        4       109.43    75.19
    1480   1    1        17    15       2         20        3       101.32    59.48
    1780   1    1        25    21       4         29        4       116.15    76.14
    1890   1    1        25    22       3         21        4       119.08    71.28
    1900   1    1        16    15       1         14        4       120.05    89.23
    1980   0    0        19    0        19        27        3       94.42     88.79
    1980   0    1        20    0        20        14        5       104.11    107.62
    1980   1    0        17    13       4         23        4       141       80.73
    2180   0    0        20    0        20        13        5       66.03     99.28
    2180   1    1        22    15       7         9         5       171.7     84.04
    2180   1    1        22    15       7         9         3       220.74    97.7
    2200   1    1        12    9        3         27        4       83.14     78.08
    2380   1    1        14    7        7         25        5       112.2     91.89
    2450   1    1        18    15       3         12        4       157.69    82.62
    2500   0    1        10    0        10        31        5       154.87    110.2
    2544   0    1        19    0        19        19        4       110.04    70.1
    2600   0    0        7     0        7         24        5       96.67     73.01
    2600   0    1        7     0        7         24        4       96.67     73.01
    2680   0    1        25    0        25        9         5       100       102.95
    2780   0    1        6     0        6         14        5       120.08    95.01
    2780   1    1        16    13       3         10        5       190.74    108.97
    2880   1    1        20    17       3         2         4       107.81    85.72
    2880   0    1        20    0        20        8         6       138.99    149.13
    3100   0    1        10    0        10        17        5       156.44    88.72
    3150   0    1        17    0        17        27        5       251.34    93.77
    3180   0    1        14    0        14        27        5       212.53    88.59
    3250   0    1        3     0        3         8         5       125       100.74
    3280   1    1        9     7        2         7         5       215.88    96.29
    3290   0    1        15    0        15        23        5       167.5     104.92
    3480   0    1        5     0        5         2         5       138.09    90.33
    3480   0    1        16    0        16        25        5       211.46    110.54
    3500   1    1        16    15       1         24        5       229.24    132.92
    3950   0    1        10    0        10        15        5       112.46    182.45
    4100   0    1        10    0        10        13        5       122.47    129.16
    4180   0    1        8     0        8         24        5       160.03    91.92
    4480   0    1        10    0        10        9         9       199.33    144.91
    5500   0    1        3     0        3         22        6       218.18    134.04
    6500   0    1        6     0        6         13        6       309.32    233.99
    8550   0    1        20    0        20        24        7       401.21    146.15
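As an illustration of how a regression tree could choose its first split on these data, the sketch below searches for the tatemen threshold that minimises the summed squared error of price in the two children. It is a simplified stand-in, not the procedure used in the lecture: only the first four and last four rows of the hachioji-housing data are used to keep the example short, and the helper names `sse` and `best_split` are hypothetical.

```python
# (tatemen, price) pairs copied from the first four and last four data rows.
ROWS = [
    (58.78, 1250), (52.04, 1380), (75.19, 1380), (59.48, 1480),
    (144.91, 4480), (134.04, 5500), (233.99, 6500), (146.15, 8550),
]

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Return (threshold, total_sse) minimising the SSE of the two children."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i - 1][0] == rows[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (rows[i - 1][0] + rows[i][0]) / 2
        total = sse([y for _, y in rows[:i]]) + sse([y for _, y in rows[i:]])
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

threshold, total = best_split(ROWS)
```

On this eight-row subset the chosen threshold falls between tatemen 75.19 and 134.04, which separates the four cheapest houses from the four most expensive ones. With all 41 rows the best split may differ.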



  • 6 22

    Classification tree: the target variable is qualitative; the input variables may be both qualitative and quantitative.

    SQL

  • 6 23

    ProbChisq: p-value of the Pearson chi-square test
    : C5.0
    Gini: Gini index (CART)
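Two of the quantities behind these criteria, the Gini index and the Pearson chi-square statistic, can be computed directly from the class counts in each branch. The sketch below is illustrative only: the function names are hypothetical, the counts in the usage note are made up, and converting the chi-square statistic into the p-value used by ProbChisq would additionally need the chi-square distribution.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a node, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent, children):
    """Impurity decrease: parent Gini minus the size-weighted Gini of the children."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - weighted

def pearson_chi2(children):
    """Pearson chi-square statistic of the branch-by-class contingency table."""
    row_tot = [sum(ch) for ch in children]
    col_tot = [sum(col) for col in zip(*children)]
    n = sum(row_tot)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for ch, r in zip(children, row_tot)
        for obs, c in zip(ch, col_tot)
    )
```

For a made-up parent node with class counts `[10, 10]` split into branches `[8, 2]` and `[2, 8]`, the Gini index drops from 0.5 to 0.32 and the chi-square statistic is 7.2. A split that leaves both branches at `[5, 5]` gives zero on both measures.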

  • 6 24

    Newschan

    (acredit)

  • 6 25

    (NewsChanJ)

  • 4 26

    531 67

    Enterprise Miner

  • 6 27

