Ch10 Logistic Regression


Page 1: Ch10   Logistic Regression

Ch10 Logistic Regression

Page 2: Ch10   Logistic Regression

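Regression analysis

Describes the relationship between one dependent variable and one (or more) predictor variables.

Assumptions that must hold:

Normality (this assumption does not apply to the independent variables)

Homogeneity of variance

Independence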

Page 3: Ch10   Logistic Regression

Uses of regression analysis:

Prediction (given x, find y)

Control (given y, find x)

Description

Page 4: Ch10   Logistic Regression

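Logistic Regression

An Introduction to Categorical Data Analysis, Alan Agresti, 1996

When the groups in a discriminant analysis do not satisfy the normality assumption, logistic regression can be used instead.

Logistic regression does not predict whether an event occurs; it predicts the probability that the event occurs.

When the dependent variable (Y) is discrete, with only two or a few categories, logistic regression is used for the analysis.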

Page 5: Ch10   Logistic Regression

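Logistic regression can examine how categorical and quantitative independent variables relate to a categorical dependent variable.

A consumer survey, for example, yields qualitative, categorical data on consumer behavior (whether or not to invest, purchase intention, whether an event occurred, and so on) together with the factors that influence it (age, income, place of origin, economic climate, weather, and preferences).

When the dependent variable takes two values or is otherwise qualitative, logistic or probit analysis is the more appropriate choice.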

Page 6: Ch10   Logistic Regression

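Logistic Regression: generalized linear models for binary data

Many categorical response variables have only two categories:

Vote (Democrat vs Republican)

Choice of car (imported vs domestic)

Diagnosis of breast cancer in women (absent vs present)

Let Y denote the binary response:

$P(Y=1) = \pi$ (success), $P(Y=0) = 1 - \pi$ (failure)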

Page 7: Ch10   Logistic Regression

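A binary response is also called a Bernoulli variable. Its distribution is determined by the probability of success $\pi$ and the probability of failure $1 - \pi$.

This distribution has mean $E(Y) = \pi$ and variance $\mathrm{Var}(Y) = \pi(1 - \pi)$.

If a binary response with the same parameter $\pi$ has $n$ independent observations, the number of successes follows a binomial distribution with index $n$ and parameter $\pi$.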

Page 8: Ch10   Logistic Regression

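Logistic regression function

Probability of success (nonlinear in x):

$$P = \frac{e^{f(x)}}{1 + e^{f(x)}}$$

Probability of failure (nonlinear in x):

$$1 - P = \frac{1}{1 + e^{f(x)}}$$

Odds:

$$\frac{P}{1 - P} = e^{f(x)}$$

Log odds (linear in the predictors):

$$\ln\left(\frac{P}{1 - P}\right) = f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$$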
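The relations on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides; the coefficient values and the single predictor value are hypothetical, chosen only for illustration.

```python
import math

def success_probability(f):
    """P = e^f / (1 + e^f): probability of success for linear predictor f."""
    return math.exp(f) / (1.0 + math.exp(f))

def odds(p):
    """Odds of success, P / (1 - P)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds, ln(P / (1 - P))."""
    return math.log(odds(p))

# Hypothetical coefficients and predictor value (not from the slides).
beta0, beta1 = -1.0, 0.5
x1 = 3.0

f = beta0 + beta1 * x1          # linear predictor f(x)
p = success_probability(f)      # probability of success
q = 1.0 / (1.0 + math.exp(f))   # probability of failure

print(f"P = {p:.4f}, 1 - P = {q:.4f}")
print(f"odds = {odds(p):.4f}, e^f = {math.exp(f):.4f}")
print(f"logit(P) = {logit(p):.4f}, f = {f:.4f}")
```

The printed pairs agree: the odds equal e^f(x), and the logit recovers the linear predictor.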

Page 9: Ch10   Logistic Regression

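[Figure: logistic curves of $\pi(x)$ against $x$, with $\pi(x)$ bounded above by 1. Panel (a): $\beta > 0$. Panel (b): $\beta < 0$.]

The nonlinear relationship between $\pi(x)$ and $x$ is monotonic: $\pi(x)$ either increases continuously or decreases continuously as $x$ increases.

After the logit transformation the relationship becomes linear:

$$\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$$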

Page 10: Ch10   Logistic Regression

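The parameter $\beta$ determines how quickly the curve rises or falls.

When $\beta > 0$, $\pi(x)$ increases as $x$ increases, as in (a).

When $\beta < 0$, $\pi(x)$ decreases as $x$ increases, as in (b).

When $\beta = 0$, the curve becomes a horizontal line. $\pi(x)$ is then constant in $x$, and Y is independent of x.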

Page 11: Ch10   Logistic Regression

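[Figure: panel (a), $\beta > 0$; $\pi(x)$ against $x$, with the steepest point of the logistic curve marked at $\pi(x) = 0.5$.]

Drawing a tangent to the curve in panel (a) at a particular x value describes the rate of change at that point. In terms of the logistic regression parameters, the slope at that point is

$$m = \beta\,\pi(x)\,(1 - \pi(x))$$

Ex: if $\pi(x) = 0.5$, then $m = \beta(0.5)(0.5) = 0.25\beta$.

As $\pi(x)$ approaches 1, $m$ approaches 0.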

Page 12: Ch10   Logistic Regression

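[Figure repeated: panel (a), $\beta > 0$, with the steepest point of the logistic curve marked at $\pi(x) = 0.5$.]

The curve is steepest at the x value where $\pi(x) = 0.5$, namely $x = -\alpha/\beta$.

Ex:

$$\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$$

Setting $\pi(x) = 0.5$:

$$\log\left(\frac{0.5}{0.5}\right) = \log 1 = 0 = \alpha + \beta x \quad\Rightarrow\quad x = -\frac{\alpha}{\beta}$$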
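The slope formula and the location of the steepest point can be verified with a finite-difference check. This is a minimal sketch under assumed values: the parameters alpha and beta below are hypothetical and do not come from the slides.

```python
import math

def pi(x, alpha, beta):
    """Success probability for the model logit(pi) = alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def tangent_slope(x, alpha, beta):
    """Slope of the logistic curve at x: beta * pi(x) * (1 - pi(x))."""
    p = pi(x, alpha, beta)
    return beta * p * (1.0 - p)

# Hypothetical parameters chosen for illustration.
alpha, beta = -2.0, 0.8

# The curve should be steepest where pi(x) = 0.5, at x = -alpha / beta.
x_steepest = -alpha / beta

# Central finite difference as an independent check of the slope formula.
h = 1e-6
numeric_slope = (pi(x_steepest + h, alpha, beta)
                 - pi(x_steepest - h, alpha, beta)) / (2 * h)

print(f"x at steepest point: {x_steepest}")
print(f"pi there: {pi(x_steepest, alpha, beta):.4f}")
print(f"formula slope: {tangent_slope(x_steepest, alpha, beta):.6f}")
print(f"numeric slope: {numeric_slope:.6f}")
```

At the steepest point the slope equals 0.25 times beta, and moving one unit in either direction gives a smaller slope.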

Page 13: Ch10   Logistic Regression

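Odds Ratio Interpretation

Odds vs. the odds ratio

$$\frac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x) = e^{\alpha}\,(e^{\beta})^{x}$$

This expression gives an interpretation of $\beta$:

The odds are multiplied by $e^{\beta}$ for every one-unit increase in x.

The log odds, $\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$, are the logit transform of $\pi(x)$ and are linear in x.

That is, each one-unit change in x raises or lowers the logit by $\beta$ units.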
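The multiplicative interpretation can be illustrated as follows. This is a sketch only; alpha, beta and the x values are hypothetical and serve to show that the ratio of odds for a one-unit increase does not depend on the starting point.

```python
import math

def odds_at(x, alpha, beta):
    """Odds pi(x) / (1 - pi(x)) = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# Hypothetical parameters chosen for illustration.
alpha, beta = -1.5, 0.4

# Ratio of odds at x + 1 to odds at x, for several starting points.
starting_points = (0.0, 2.0, 10.0)
ratios = [odds_at(x + 1, alpha, beta) / odds_at(x, alpha, beta)
          for x in starting_points]

# Change in the logit for a one-unit increase in x.
logit_changes = [math.log(odds_at(x + 1, alpha, beta))
                 - math.log(odds_at(x, alpha, beta))
                 for x in starting_points]

print("odds ratios per unit of x:", [round(r, 6) for r in ratios])
print("e^beta:", round(math.exp(beta), 6))
print("logit changes per unit of x:", [round(d, 6) for d in logit_changes])
```

Every ratio equals e^beta and every logit change equals beta, whatever the starting x.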

Page 14: Ch10   Logistic Regression

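Why logistic regression is preferred over other probability models (the case of case-control studies)

It suits retrospective sampling designs.

Ex: case-control study

Y = 1: cases

Y = 0: controls

Two samples are observed. If the distribution of x differs between cases and controls, there is an association between x and Y.

Logistic regression works with the odds and the odds ratio, so the model can be fitted to retrospective data and the effects in a case-control study can be estimated.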

Page 15: Ch10   Logistic Regression

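Inference for logistic regression: confidence intervals for effects

Statistical theory for the model parameters helps assess the significance and size of effects. For large samples, a confidence interval for $\beta$ in

$$\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$$

is

$$\hat{\beta} \pm z_{\alpha/2}\,(\mathrm{ASE})$$

Exponentiating the endpoints of this interval gives the corresponding interval for $e^{\beta}$, the multiplicative effect on the odds of a one-unit increase in x.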

Page 16: Ch10   Logistic Regression

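ASE: Asymptotic Standard Error

Ex: Is a female crab's width related to whether she has satellites? (Y = 1 if she has satellites, Y = 0 if not; the model predicts whether a female crab has satellites.)

$\hat{\beta} = 0.497$ and $\mathrm{ASE} = 0.102$

Sol:

A 95% confidence interval for $\beta$ is

$$\hat{\beta} \pm z_{\alpha/2}\,(\mathrm{ASE}) = 0.497 \pm 1.96\,(0.102) = (0.298,\ 0.697)$$

Exponentiating the endpoints gives roughly (1.35, 2.01) for $e^{\beta}$.

Inference: each additional centimetre of width raises the odds of having a satellite by at least 35%, and at most about doubles them.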
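The interval arithmetic in this example can be reproduced directly. The sketch below uses only the estimate (0.497), the ASE (0.102) and the 1.96 multiplier quoted on the slide; the variable names are illustrative. With these rounded inputs the lower endpoint comes out as 0.297 rather than the slide's 0.298, a difference that presumably reflects rounding of the inputs.

```python
import math

beta_hat = 0.497   # estimated width effect from the slide
ase = 0.102        # asymptotic standard error from the slide
z = 1.96           # standard normal multiplier for 95% confidence

# Confidence interval for beta on the logit scale.
lower = beta_hat - z * ase
upper = beta_hat + z * ase

# Exponentiate the endpoints: interval for the multiplicative
# effect on the odds of a one-unit (1 cm) increase in width.
odds_lower = math.exp(lower)
odds_upper = math.exp(upper)

print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
print(f"95% CI for e^beta: ({odds_lower:.2f}, {odds_upper:.2f})")
print(f"odds increase: {100 * (odds_lower - 1):.0f}% to {100 * (odds_upper - 1):.0f}%")
```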

Page 17: Ch10   Logistic Regression

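Logistic regression significance testing

$H_0: \beta = 0$ means the probability of success does not depend on x.

$H_a: \beta \neq 0$ means the probability of success depends on x.

Under $\beta = 0$, $z = \hat{\beta}/\mathrm{ASE}$ has a standard normal distribution (the test can be one-sided or two-sided).

For the two-sided alternative $\beta \neq 0$, $z^2$ is referred to a $\chi^2$ distribution with df = 1.

p-value: the right-tail probability of the $\chi^2$ distribution beyond the observed value.

For large samples, the test statistic is the parameter estimate divided by its standard error, squared. It is called the Wald statistic:

$$z^2 = \left(\frac{\hat{\beta}}{\mathrm{ASE}}\right)^2$$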
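As an illustration, the Wald statistic can be computed for the crab example quoted earlier in the slides (estimate 0.497, ASE 0.102). This is a sketch, not output from the original analysis. For df = 1 the chi-squared right-tail probability equals the two-sided standard normal tail, which the standard library's complementary error function provides.

```python
import math

def chi2_df1_right_tail(stat):
    """P(chi-squared with 1 df > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

beta_hat = 0.497   # estimate quoted in the crab example
ase = 0.102        # asymptotic standard error quoted in the crab example

z = beta_hat / ase         # approximately standard normal under H0
wald = z ** 2              # Wald statistic, chi-squared with df = 1 under H0
p_value = chi2_df1_right_tail(wald)

print(f"z = {z:.3f}")
print(f"Wald statistic = {wald:.2f}")
print(f"p-value = {p_value:.2e}")
```

The p-value is far below 0.05, so the hypothesis that the success probability does not depend on x is rejected for these values.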

Page 18: Ch10   Logistic Regression

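Another approach to model inference and checking: the likelihood ratio

The likelihood function is maximized in the following two situations, and the ratio of the two maxima is taken.

1) Under the restriction of $H_0$: maximize over all possible parameter values.

2) Under the full model, where either $H_0$ or $H_1$ may hold: maximize over all possible parameter values.

Let $l_1$ be the maximized value of the likelihood function under the full model.

Let $l_0$ be the maximized value under the simpler model given by $H_0$.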

Page 19: Ch10   Logistic Regression

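Ex: for the linear predictor $\alpha + \beta x$, test

$H_0: \beta = 0$ against $H_a: \beta \neq 0$

Then $l_0$ is the likelihood function evaluated, with $\beta = 0$, at the value of $\alpha$ most likely to have produced the observed data.

$l_1$ is the likelihood function evaluated at the combination $(\alpha, \beta)$ most likely to have produced the observed data.

$l_0$ is the maximum over a restricted subset of the parameter range that produces $l_1$, so $l_1$ is at least as large as $l_0$.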

Page 20: Ch10   Logistic Regression

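Likelihood-ratio test statistic

$$-2 \log\left(\frac{l_0}{l_1}\right) = -2\,[\log(l_0) - \log(l_1)] = -2\,[L_0 - L_1]$$

$L_0$ and $L_1$ denote the maximized log-likelihood values.

Under $H_0: \beta = 0$, this statistic has a large-sample $\chi^2$ distribution with df = 1.

In practice, the likelihood-ratio test is generally more reliable than the Wald test. The likelihood-ratio test compares the maximum $L_0$ of the log-likelihood function when $\beta = 0$ (i.e. when $\pi(x)$ is forced to be the same at all x values) with the maximum $L_1$ of the log-likelihood function when $\beta$ is unrestricted.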

Page 21: Ch10   Logistic Regression

The test statistic $-2\,(L_0 - L_1)$ has a large-sample $\chi^2$ distribution with df = 1.

Page 22: Ch10   Logistic Regression

EXHIBIT 10.1: Logistic regression analysis with one categorical variable as the independent variable

Response Variable: SUCCESS
Number of Observations: 24
Link Function: Logit
Response Levels: 2

Response Profile

Ordered Value   SUCCESS   Count
      1            1        12
      2            2        12

Page 23: Ch10   Logistic Regression

Exhibit 10.1 (continued)

Criteria for Assessing Model Fit

              Intercept   Intercept and
Criterion     Only        Covariates      Chi-Square for Covariates
AIC           35.271      21.864          .
SC            36.449      24.221          .
-2 LOG L      33.271      17.864          15.407 with 1 DF (p=0.0001)
Score         .           .               13.594 with 1 DF (p=0.0002)
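The -2 LOG L entries and the likelihood-ratio chi-square in this table can be reproduced by hand. The sketch below assumes the counts shown in the observation listing of this exhibit (10 events and 1 non-event at SIZE = 1, 2 events and 11 non-events at SIZE = 0) and uses the fact that, with a single binary predictor, the fitted probabilities equal the observed proportions. The helper name is illustrative.

```python
import math

def binomial_loglik(events, nonevents, p):
    """Log-likelihood of independent binary outcomes with event probability p."""
    return events * math.log(p) + nonevents * math.log(1.0 - p)

# H0 model (intercept only): one common event probability, 12 events in 24.
L0 = binomial_loglik(12, 12, 12 / 24)

# Full model (intercept and SIZE): a separate probability for each SIZE level.
L1 = binomial_loglik(10, 1, 10 / 11) + binomial_loglik(2, 11, 2 / 13)

# Likelihood-ratio statistic and its df = 1 chi-squared right-tail probability.
lr_stat = -2.0 * (L0 - L1)
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(f"-2 L0 = {-2 * L0:.3f}")
print(f"-2 L1 = {-2 * L1:.3f}")
print(f"LR chi-square = {lr_stat:.3f}, p = {p_value:.4f}")
```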

Page 24: Ch10   Logistic Regression

Exhibit 10.1 (continued)

Analysis of Maximum Likelihood Estimates

            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -1.7047     0.7687     4.9181       0.0266       .
SIZE         4.0073     1.3003     9.4972       0.0021       1.124514

Association of Predicted Probabilities and Observed Responses

Concordant = 76.4%     Somers' D = 0.750
Discordant =  1.4%     Gamma     = 0.964
Tied       = 22.2%     Tau-a     = 0.391
(144 pairs)            c         = 0.875
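With one binary predictor, the estimates in this table have closed forms in terms of the cell counts. The following sketch assumes the counts from the observation listing of this exhibit and the standard large-sample variance of a log odds (the sum of reciprocal cell counts); it is an illustration, not the procedure the software ran.

```python
import math

# Counts read from the observation listing of Exhibit 10.1
# (SUCCESS = 1 is the event).
events_size1, nonevents_size1 = 10, 1
events_size0, nonevents_size0 = 2, 11

# Intercept: log odds of the event when SIZE = 0.
intercept = math.log(events_size0 / nonevents_size0)

# Slope: log odds ratio between SIZE = 1 and SIZE = 0.
slope = math.log(events_size1 / nonevents_size1) - intercept

# Large-sample standard errors from reciprocal cell counts.
se_intercept = math.sqrt(1 / events_size0 + 1 / nonevents_size0)
se_slope = math.sqrt(1 / events_size1 + 1 / nonevents_size1
                     + 1 / events_size0 + 1 / nonevents_size0)

# Wald chi-squares: (estimate / standard error) squared.
wald_intercept = (intercept / se_intercept) ** 2
wald_slope = (slope / se_slope) ** 2

print(f"INTERCPT {intercept:.4f}  SE {se_intercept:.4f}  Wald {wald_intercept:.4f}")
print(f"SIZE     {slope:.4f}  SE {se_slope:.4f}  Wald {wald_slope:.4f}")
print(f"odds ratio for SIZE: {math.exp(slope):.1f}")
```

The odds ratio e^4.0073 is about 55: the odds of the event at SIZE = 1 are 10/1, against 2/11 at SIZE = 0.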

Page 25: Ch10   Logistic Regression

Exhibit 10.1 (continued)

Classification Table

                         Predicted
                     EVENT   NO EVENT     Total
                   +---------------------+
            EVENT  |   10         2      |   12
Observed           |                     |
         NO EVENT  |    1        11      |   12
                   +---------------------+
            Total      11        13          24

Sensitivity = 83.3%   Specificity = 91.7%   Correct = 87.5%
False Positive Rate = 9.1%   False Negative Rate = 15.4%

NOTE: An EVENT is an outcome whose ordered response value is 1.
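The summary rates under the classification table follow from its four cells. A minimal sketch is given below; the function name is illustrative. Note the convention implied by the figures in the exhibit: the false positive rate is taken over predicted events and the false negative rate over predicted non-events, not over observed classes.

```python
def classification_rates(event_as_event, event_as_nonevent,
                         nonevent_as_event, nonevent_as_nonevent):
    """Percentages summarising a 2x2 classification table (observed x predicted)."""
    total = (event_as_event + event_as_nonevent
             + nonevent_as_event + nonevent_as_nonevent)
    predicted_events = event_as_event + nonevent_as_event
    predicted_nonevents = event_as_nonevent + nonevent_as_nonevent
    return {
        # Share of observed events predicted as events.
        "sensitivity": 100 * event_as_event / (event_as_event + event_as_nonevent),
        # Share of observed non-events predicted as non-events.
        "specificity": 100 * nonevent_as_nonevent / (nonevent_as_event + nonevent_as_nonevent),
        # Share of all observations classified correctly.
        "correct": 100 * (event_as_event + nonevent_as_nonevent) / total,
        # Share of predicted events that were actually non-events.
        "false_positive_rate": 100 * nonevent_as_event / predicted_events,
        # Share of predicted non-events that were actually events.
        "false_negative_rate": 100 * event_as_nonevent / predicted_nonevents,
    }

# Cells of the classification table in Exhibit 10.1.
rates = classification_rates(10, 2, 1, 11)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```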

Page 26: Ch10   Logistic Regression

Exhibit 10.1 (continued)

OBS  SUCCESS  SIZE  PHAT         OBS  SUCCESS  SIZE  PHAT
  1     1       1   0.90909       13     2       1   0.90909
  2     1       1   0.90909       14     2       0   0.15385
  3     1       1   0.90909       15     2       0   0.15385
  4     1       1   0.90909       16     2       0   0.15385
  5     1       1   0.90909       17     2       0   0.15385
  6     1       1   0.90909       18     2       0   0.15385
  7     1       1   0.90909       19     2       0   0.15385
  8     1       1   0.90909       20     2       0   0.15385
  9     1       1   0.90909       21     2       0   0.15385
 10     1       1   0.90909       22     2       0   0.15385
 11     1       0   0.15385       23     2       0   0.15385
 12     1       0   0.15385       24     2       0   0.15385
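The association measures reported for this exhibit (concordant, discordant, tied, Somers' D, Gamma, Tau-a, c) can be recomputed from the PHAT column by comparing every event with every non-event. The sketch below assumes the usual pair-based definitions of these measures; the variable names are illustrative.

```python
# Predicted probabilities from the listing, split by observed response.
event_phat = [0.90909] * 10 + [0.15385] * 2       # SUCCESS = 1 (events)
nonevent_phat = [0.90909] * 1 + [0.15385] * 11    # SUCCESS = 2 (non-events)

pairs = len(event_phat) * len(nonevent_phat)

# A pair is concordant when the event has the higher predicted probability,
# discordant when it has the lower one, and tied otherwise.
concordant = sum(1 for e in event_phat for n in nonevent_phat if e > n)
discordant = sum(1 for e in event_phat for n in nonevent_phat if e < n)
tied = pairs - concordant - discordant

n_obs = len(event_phat) + len(nonevent_phat)

somers_d = (concordant - discordant) / pairs
gamma = (concordant - discordant) / (concordant + discordant)
tau_a = (concordant - discordant) / (n_obs * (n_obs - 1) / 2)
c = (concordant + 0.5 * tied) / pairs

print(f"pairs = {pairs}")
print(f"concordant = {100 * concordant / pairs:.1f}%")
print(f"discordant = {100 * discordant / pairs:.1f}%")
print(f"tied = {100 * tied / pairs:.1f}%")
print(f"Somers' D = {somers_d:.3f}, Gamma = {gamma:.3f}, "
      f"Tau-a = {tau_a:.3f}, c = {c:.3f}")
```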

Page 27: Ch10   Logistic Regression

Exhibit 10.2: Contingency Analysis Output

TABLE OF SUCCESS BY SIZE

SUCCESS      SIZE
Frequency |
Percent   |
Row Pct   |
Col Pct   |        1 |        2 |   Total
----------+----------+----------+
        1 |       10 |        2 |      12
          |    41.67 |     8.33 |   50.00
          |    83.33 |    16.67 |
          |    90.91 |    15.38 |
----------+----------+----------+
        2 |        1 |       11 |      12
          |     4.17 |    45.83 |   50.00
          |     8.33 |    91.67 |
          |     9.09 |    84.62 |
----------+----------+----------+
Total             11         13        24
               45.83      54.17    100.00

Page 28: Ch10   Logistic Regression

Exhibit 10.2 (continued)

STATISTICS FOR TABLE OF SUCCESS BY SIZE

Statistic                      DF     Value      Prob
------------------------------------------------------
Chi-Square                      1    13.594     0.000
Likelihood Ratio Chi-Square     1    15.407     0.000
Continuity Adj. Chi-Square      1    10.741     0.001

Statistic                   Value       ASE
------------------------------------------------------
Gamma                       0.964     0.046
Kendall's Tau-b             0.753     0.133
Stuart's Tau-c              0.750     0.134
Somers' D C|R               0.750     0.134
Somers' D R|C               0.755     0.132
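The three chi-square statistics in this exhibit can be recomputed from the 2x2 frequency table. The sketch below uses the textbook formulas for the Pearson, likelihood-ratio and continuity-adjusted (Yates) statistics; the variable names are illustrative. The Pearson and likelihood-ratio values match the Score and -2 LOG L chi-squares reported in Exhibit 10.1.

```python
import math

# Frequencies from TABLE OF SUCCESS BY SIZE (rows: SUCCESS, columns: SIZE).
table = [[10, 2],
         [1, 11]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected frequencies under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected.
pearson = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
              for i in range(2) for j in range(2))

# Likelihood-ratio chi-square: 2 * sum of observed * ln(observed / expected).
likelihood_ratio = 2 * sum(table[i][j] * math.log(table[i][j] / expected[i][j])
                           for i in range(2) for j in range(2))

# Continuity-adjusted chi-square for a 2x2 table.
(a, b), (c_, d) = table
denominator = row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
continuity_adj = n * (abs(a * d - b * c_) - n / 2) ** 2 / denominator

print(f"Chi-Square                  {pearson:.3f}")
print(f"Likelihood Ratio Chi-Square {likelihood_ratio:.3f}")
print(f"Continuity Adj. Chi-Square  {continuity_adj:.3f}")
```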

Page 29: Ch10   Logistic Regression

Exhibit 10.3: Logistic regression for categorical and continuous variables

Step 0. Intercept entered:

Analysis of Maximum Likelihood Estimates

            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    0           0.4082     0.0000       1.0000       .

Residual Chi-Square = 16.5512 with 2 DF (p=0.0003)

Page 30: Ch10   Logistic Regression

Exhibit 10.3 (continued)

Analysis of Variables Not in the Model

            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        13.5944      0.0002
FP          13.8301      0.0002

Step 1. Variable FP entered:

Analysis of Variables Not in the Model

            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        5.0283       0.0249

Page 31: Ch10   Logistic Regression

Exhibit 10.3 (continued)

Step 2. Variable SIZE entered:

Criteria for Assessing Model Fit

              Intercept   Intercept and
Criterion     Only        Covariates      Chi-Square for Covariates
AIC           35.271      17.789          .
SC            36.449      21.323          .
-2 LOG L      33.271      11.789          21.482 with 2 DF (p=0.0001)
Score         .           .               16.551 with 2 DF (p=0.0003)

Page 32: Ch10   Logistic Regression

Exhibit 10.3 (continued)

Analysis of Maximum Likelihood Estimates

            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -4.4450     1.8432     5.8159       0.0159       .
SIZE         3.0552     1.5981     3.6550       0.0559       0.857342
FP           1.9245     0.9116     4.4570       0.0348       1.139820

Page 33: Ch10   Logistic Regression

Exhibit 10.3 (continued)

Association of Predicted Probabilities and Observed Responses

Concordant = 95.8%     Somers' D = 0.917
Discordant =  4.2%     Gamma     = 0.917
Tied       =  0.0%     Tau-a     = 0.478
(144 pairs)            c         = 0.958

NOTE: All explanatory variables have been entered into the model.

Summary of Stepwise Procedure

        Variable             Number   Score        Wald         Pr >
Step    Entered    Removed   In       Chi-Square   Chi-Square   Chi-Square
1       FP                   1        13.8301      .            0.0002
2       SIZE                 2         5.0283      .            0.0249

Page 34: Ch10   Logistic Regression

Exhibit 10.3 (continued)

Classification Table

                         Predicted
                     EVENT   NO EVENT     Total
                   +---------------------+
            EVENT  |    9         3      |   12
Observed           |                     |
         NO EVENT  |    1        11      |   12
                   +---------------------+
            Total      10        14          24

Sensitivity = 75.0%   Specificity = 91.7%   Correct = 83.3%
False Positive Rate = 10.0%   False Negative Rate = 21.4%

Page 35: Ch10   Logistic Regression

Exhibit 10.3 (continued)

NOTE: An EVENT is an outcome whose ordered response value is 1.

OBS  SUCCESS  SIZE   FP    PHAT         OBS  SUCCESS  SIZE   FP    PHAT
  1     1       1   0.58   0.43202       13     2       1   2.28   0.95248
  2     1       1   2.80   0.98199       14     2       0   1.06   0.08278
  3     1       1   2.77   0.98094       15     2       0   1.08   0.08575
  4     1       1   3.50   0.99525       16     2       0   0.07   0.01325
  5     1       1   2.67   0.97699       17     2       0   0.16   0.01572
  6     1       1   2.97   0.98695       18     2       0   0.70   0.04319
  7     1       1   2.18   0.94297       19     2       0   0.75   0.04735
  8     1       1   3.24   0.99220       20     2       0   1.61   0.20641
  9     1       1   1.49   0.81421       21     2       0   0.34   0.02208
 10     1       1   2.19   0.94400       22     2       0   1.15   0.09692
 11     1       0   2.70   0.67939       23     2       0   0.44   0.02664
 12     1       0   2.57   0.62265       24     2       0   0.86   0.05787
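The PHAT column is the fitted event probability obtained by passing the linear predictor through the logistic function. The sketch below recomputes a few rows from the coefficient estimates reported for Exhibit 10.3 (INTERCPT -4.4450, SIZE 3.0552, FP 1.9245); because those estimates are rounded to four decimals, the results may differ from the listing in the last digit or two. The function name is illustrative.

```python
import math

# Coefficient estimates as printed in Exhibit 10.3.
INTERCEPT = -4.4450
B_SIZE = 3.0552
B_FP = 1.9245

def phat(size, fp):
    """Fitted probability of the event for given SIZE and FP values."""
    linear_predictor = INTERCEPT + B_SIZE * size + B_FP * fp
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# A few (OBS, SIZE, FP, listed PHAT) rows copied from the listing.
rows = [
    (1, 1, 0.58, 0.43202),
    (2, 1, 2.80, 0.98199),
    (11, 0, 2.70, 0.67939),
    (14, 0, 1.06, 0.08278),
]

for obs, size, fp, listed in rows:
    print(f"OBS {obs:2d}: computed {phat(size, fp):.5f}, listed {listed:.5f}")
```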

Page 36: Ch10   Logistic Regression

Exhibit 10.4: Discriminant analysis for data in Table 10.1

Canonical Discriminant Functions

                  Pct of      Cum      Canonical    After   Wilks'
Fcn  Eigenvalue   Variance    Pct      Corr         Fcn     Lambda    Chi-square   df   Sig
                                                  :   0     .310367     24.570      2   .0000
 1*    2.2220      100.00    100.00     .8304     :

* Marks the 1 canonical discriminant functions remaining in the analysis.

Unstandardized canonical discriminant function coefficients

                 Func 1
SIZE           1.8552118
FP              .9162471
(Constant)    -2.3834923
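The figures in the canonical discriminant table are linked by standard identities. The sketch below assumes a single discriminant function (two groups), for which Wilks' lambda is 1 / (1 + eigenvalue) and the canonical correlation is the square root of eigenvalue / (1 + eigenvalue), and it assumes Bartlett's chi-square approximation with n = 24 cases, p = 2 variables and g = 2 groups. These are textbook relations, not a description of how the software computed its output.

```python
import math

eigenvalue = 2.2220   # from the exhibit
n_cases = 24          # observations in the data set
n_vars = 2            # discriminating variables: SIZE and FP
n_groups = 2          # two outcome groups

# With a single discriminant function, Wilks' lambda depends only on its eigenvalue.
wilks_lambda = 1.0 / (1.0 + eigenvalue)

# Canonical correlation of the discriminant function.
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Bartlett's chi-square approximation for testing Wilks' lambda.
chi_square = -(n_cases - 1 - (n_vars + n_groups) / 2) * math.log(wilks_lambda)
df = n_vars * (n_groups - 1)

print(f"Wilks' Lambda  = {wilks_lambda:.6f}")
print(f"Canonical Corr = {canonical_corr:.4f}")
print(f"Chi-square     = {chi_square:.3f} with {df} df")
```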

Page 37: Ch10   Logistic Regression

Exhibit 10.4 (continued)

Classification results -

                  No. of     Predicted Group Membership
Actual Group      Cases           1             2
--------------    -------    -----------   -----------
Group 1             12           11             1
                                91.7%          8.3%
Group 2             12            1            11
                                 8.3%         91.7%

Percent of "grouped" cases correctly classified: 91.67%

Page 38: Ch10   Logistic Regression

Exhibit 10.5: Logistic Regression For Mutual Fund Data

Stepwise Selection Procedure

Criteria for Assessing Model Fit

              Intercept   Intercept and
Criterion     Only        Covariates      Chi-Square for Covariates
AIC           190.400     147.711         .
SC            193.327     165.275         .
-2 LOG L      188.400     135.711         52.689 with 5 DF (p=0.0001)
Score         .           .               44.034 with 5 DF (p=0.0001)

NOTE: All explanatory variables have been entered into the model.
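The AIC and SC rows of this table follow from the -2 LOG L row. The sketch below assumes the usual definitions, AIC = -2 log L + 2k and SC = -2 log L + k ln(n), where k counts the estimated parameters (1 for the intercept-only model, 6 with the five covariates) and n = 138 is the number of observations shown in the classification table of this exhibit. The function names are illustrative.

```python
import math

def aic(neg2_log_l, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return neg2_log_l + 2 * n_params

def sc(neg2_log_l, n_params, n_obs):
    """Schwarz criterion: -2 log L + k ln(n)."""
    return neg2_log_l + n_params * math.log(n_obs)

n_obs = 138                      # total in the classification table
neg2_log_l_intercept = 188.400   # intercept only
neg2_log_l_full = 135.711        # intercept and five covariates

print(f"AIC: {aic(neg2_log_l_intercept, 1):.3f}  {aic(neg2_log_l_full, 6):.3f}")
print(f"SC:  {sc(neg2_log_l_intercept, 1, n_obs):.3f}  {sc(neg2_log_l_full, 6, n_obs):.3f}")

# The chi-square for covariates is the drop in -2 LOG L (5 DF).
lr_chi_square = neg2_log_l_intercept - neg2_log_l_full
print(f"Chi-square for covariates: {lr_chi_square:.3f}")
```

Both criteria fall when the covariates are added, which agrees with the significant chi-square for covariates.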

Page 39: Ch10   Logistic Regression

Exhibit 10.5 (continued)

Summary of Stepwise Procedure

        Variable              Number   Score        Wald         Pr >
Step    Entered     Removed   In       Chi-Square   Chi-Square   Chi-Square
1       YIELD                 1        21.0379      .            0.0001
2       TOTRET                2        11.9103      .            0.0006
3       SIZE                  3         8.5928      .            0.0034
4       SCHARGE               4         4.1344      .            0.0420
5       EXPENRAT              5         5.5516      .            0.0185

Page 40: Ch10   Logistic Regression

Exhibit 10.5 (continued)

Analysis of Maximum Likelihood Estimates

            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -2.5902     1.2642      4.1981      0.0405       .
SIZE         0.8542     0.4773      3.2020      0.0735        0.236320
SCHARGE     -0.1394     0.0589      5.6088      0.0179       -0.302154
EXPENRAT    -1.4361     0.6793      4.4699      0.0345       -0.321113
TOTRET       0.8090     0.2509     10.3988      0.0013        0.402480
YIELD        0.0553     0.0124     19.9669      0.0001        0.694773

Page 41: Ch10   Logistic Regression

Exhibit 10.5 (continued)

Association of Predicted Probabilities and Observed Responses

Concordant = 85.5%     Somers' D = 0.711
Discordant = 14.4%     Gamma     = 0.712
Tied       =  0.1%     Tau-a     = 0.351
(4661 pairs)           c         = 0.856

Page 42: Ch10   Logistic Regression

Exhibit 10.5 (continued)

Classification Table

                         Predicted
                     EVENT   NO EVENT     Total
                   +---------------------+
            EVENT  |   45        14      |   59
Observed           |                     |
         NO EVENT  |   12        67      |   79
                   +---------------------+
            Total      57        81         138

Sensitivity = 76.3%   Specificity = 84.8%   Correct = 81.2%
False Positive Rate = 21.1%   False Negative Rate = 17.3%