Ch10 Logistic Regression
2
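Regression analysis
Used to describe the relationship between a response (dependent) variable and one (or more) predictor variables.
Assumptions that must hold: normality (no normality assumption is made for the independent variables), homogeneity of variance, and independence.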
3
Uses of regression analysis:
Prediction (given x, find y); control (given y, find x); description.
4
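Logistic Regression
An Introduction to Categorical Data Analysis, Alan Agresti, 1996
When the groups in a discriminant analysis do not satisfy the normality assumption, logistic regression can be used instead.
Logistic regression does not predict whether an event occurs; it predicts the probability of the event.
When the response variable (Y) is discrete with only two or a few categories, analyze it with logistic regression.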
5
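Logistic regression can examine how categorical and quantitative independent variables relate to a categorical response.
In a consumer survey, one obtains qualitative, categorical data on consumer behavior (will or will not invest, purchase intention, event occurred or did not occur, etc.) together with the factors that influence that classification (age, income, place of origin, economic climate, weather, and preferences).
When the response variable has two categories or is otherwise qualitative, logistic or probit analysis is more appropriate.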
6
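Logistic Regression: a generalized linear model for binary data
Many categorical response variables have only two categories:
Voting (Democrat vs. Republican)
Choice of car (imported vs. domestic)
Breast cancer diagnosis in women (absent vs. present)
Let Y denote the binary response: $P(Y=1) = \pi$ (success), $P(Y=0) = 1 - \pi$ (failure).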
7
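A binary response is also called a Bernoulli variable. Its distribution is determined by the success probability $\pi$ and the failure probability $1 - \pi$.
For this distribution: mean $E(Y) = \pi$, variance $\mathrm{Var}(Y) = \pi(1 - \pi)$.
If a binary response with parameter $\pi$ has n independent observations, the number of successes follows a binomial distribution with index n and parameter $\pi$.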
8
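Logistic regression function
$P = \dfrac{e^{f(x)}}{1 + e^{f(x)}}$   probability of success (nonlinear)
$1 - P = \dfrac{1}{1 + e^{f(x)}}$   probability of failure (nonlinear)
$\dfrac{P}{1 - P} = e^{f(x)}$   odds
$\ln\left(\dfrac{P}{1 - P}\right) = f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$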
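The three forms above (probability, odds, log odds) are the same quantity on different scales. A minimal Python sketch of the round trip, where the value chosen for f(x) is purely hypothetical:

```python
import math

def logistic(fx: float) -> float:
    """Success probability P = e^f / (1 + e^f)."""
    return math.exp(fx) / (1.0 + math.exp(fx))

def odds(p: float) -> float:
    """Odds P / (1 - P)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log odds ln(P / (1 - P))."""
    return math.log(odds(p))

fx = 0.8                      # hypothetical value of the linear predictor f(x)
p = logistic(fx)              # success probability
q = 1.0 / (1.0 + math.exp(fx))  # failure probability from the slide's second formula

print(f"P = {p:.4f}, 1 - P = {q:.4f}, odds = {odds(p):.4f}, logit = {logit(p):.4f}")
```

The printed logit should reproduce the chosen f(x), and the odds should equal e^{f(x)}, which is what makes the log-odds scale linear in the predictors.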
9
[Figure: $\pi(x)$ plotted against x, vertical axis up to 1. Panel (a): $\beta > 0$. Panel (b): $\beta < 0$.]
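The nonlinear relationship between $\pi(x)$ and x is monotonic:
$\pi(x)$ increases continuously as x increases, or
$\pi(x)$ decreases continuously as x increases.
$\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$: after this transformation the relationship is linear.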
10
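The parameter $\beta$ determines how fast the curve rises or falls.
When $\beta > 0$, $\pi(x)$ increases as x increases, as in (a).
When $\beta < 0$, $\pi(x)$ decreases as x increases, as in (b).
When $\beta = 0$, the curve becomes a horizontal line. $\pi(x)$ is then constant in x, and Y is independent of x.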
11
[Figure (a), $\beta > 0$: $\pi(x)$ plotted against x, with the steepest point of the logistic curve marked at $\pi(x) = 0.5$.]
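In figure (a), draw a tangent line at a particular x value to describe the rate of change at that point. The slope at that point is expressed through the logistic regression parameter $\beta$:
$m = \beta\,\pi(x)\,(1 - \pi(x))$
Ex: if $\pi(x) = 0.5$, then $m = \beta(0.5)(0.5) = 0.25\beta$.
As $\pi(x)$ approaches 1, m approaches 0.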
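The tangent-slope formula can be checked numerically against a finite-difference derivative. This is only a sketch; the values of alpha and beta are hypothetical and chosen so that the linear predictor is zero at x0.

```python
import math

def pi(x: float, alpha: float, beta: float) -> float:
    """Logistic success probability at x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def slope(x: float, alpha: float, beta: float) -> float:
    """Tangent slope beta * pi(x) * (1 - pi(x))."""
    p = pi(x, alpha, beta)
    return beta * p * (1.0 - p)

alpha, beta = -1.0, 2.0   # hypothetical parameters
x0 = 0.5                  # alpha + beta * x0 = 0, so pi(x0) = 0.5

# Central finite difference as an independent check of the formula.
h = 1e-6
numeric = (pi(x0 + h, alpha, beta) - pi(x0 - h, alpha, beta)) / (2 * h)

print(slope(x0, alpha, beta), numeric)  # both close to 0.25 * beta
```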
12
[Figure (a), $\beta > 0$: $\pi(x)$ plotted against x, with the steepest point of the logistic curve marked at $\pi(x) = 0.5$.]
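The curve is steepest at the x value where $\pi(x) = 0.5$, namely $x = -\alpha/\beta$.
Ex: $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$
$\log(0.5/0.5) = \alpha + \beta x$
$\log 1 = 0 = \alpha + \beta x$, so $x = -\alpha/\beta$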
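A small grid search can illustrate that the slope is largest at x = -alpha/beta. The parameters and the grid are hypothetical; this is a sketch, not part of the original slides.

```python
import math

alpha, beta = -3.0, 0.5   # hypothetical parameters

def pi(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def slope(x: float) -> float:
    p = pi(x)
    return beta * p * (1.0 - p)

x_star = -alpha / beta                        # analytic location of the steepest point
grid = [i / 100 for i in range(0, 1201)]      # x from 0.00 to 12.00
x_best = max(grid, key=slope)                 # grid point with the largest slope

print(x_star, x_best, pi(x_star))
```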
13
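Odds Ratio Interpretation
Odds vs. the odds ratio
$\dfrac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x) = e^{\alpha}\,(e^{\beta})^{x}$
This expression gives an interpretation of $\beta$:
The odds are multiplied by $e^{\beta}$ for every one-unit increase in x.
The log odds, $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$, is the logit transformation of $\pi(x)$ and is linear in x.
That is, each one-unit change in x changes the logit by $\beta$ units.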
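The multiplicative interpretation can be verified directly: the ratio of odds at x + 1 to odds at x should equal e^beta wherever x is. The parameter values and x values below are hypothetical.

```python
import math

alpha, beta = -2.0, 0.7   # hypothetical parameters

def odds(x: float) -> float:
    """Odds pi(x) / (1 - pi(x)) = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

xs = (0.0, 1.5, 4.0)      # arbitrary starting points
odds_ratios = [odds(x + 1) / odds(x) for x in xs]
logit_steps = [math.log(odds(x + 1)) - math.log(odds(x)) for x in xs]

print(odds_ratios, math.exp(beta))   # each ratio matches e^beta
print(logit_steps, beta)             # each logit change matches beta
```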
14
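Why logistic regression is preferred over other probability models (in the context of case-control studies):
It suits retrospective sampling designs.
Ex: case-control study
Y=1: cases
Y=0: controls
Two samples are observed. If the distribution of x differs between cases and controls, x and Y are associated.
Logistic regression works with the odds and the odds ratio, so the model can be fitted to retrospective data and used to estimate the effect comparing cases with controls.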
15
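Inference for logistic regression: confidence intervals for effects
Statistical theory for the model parameters helps assess the significance and size of effects. For large samples,
the confidence interval for $\beta$ in $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$ is
$\hat{\beta} \pm z_{\alpha/2}\,(\mathrm{ASE})$
Exponentiating the endpoints of this interval gives the corresponding interval for $e^{\beta}$, the multiplicative effect on the odds of a one-unit increase in x.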
16
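ASE (Asymptotic Standard Error)
Ex: is the width (gap) of a female crab related to whether she has satellites? (Y=1: has satellites, Y=0: none; predicting the number of female crabs with satellites)
$\hat{\beta} = 0.497$ and ASE = 0.102
Sol:
A 95% confidence interval for $\beta$ is:
$\hat{\beta} \pm z_{0.025}\,(\mathrm{ASE}) = 0.497 \pm 1.96\,(0.102) = (0.298, 0.697)$
Inference: each 1 cm increase in width raises the odds of having satellites by at least 35% and at most doubles them ($e^{0.298} \approx 1.35$, $e^{0.697} \approx 2.01$).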
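The interval arithmetic in this example can be reproduced in a few lines. Note that with the rounded inputs shown on the slide the lower endpoint comes out as 0.297 rather than 0.298; the slide's value presumably comes from unrounded estimates, which is an assumption on my part.

```python
import math

beta_hat = 0.497   # estimated slope from the slide
ase = 0.102        # asymptotic standard error from the slide
z = 1.96           # standard normal quantile for a 95% interval

lower = beta_hat - z * ase
upper = beta_hat + z * ase

# Exponentiate the endpoints to get the interval for the odds multiplier e^beta.
or_lower, or_upper = math.exp(lower), math.exp(upper)

print(f"beta: ({lower:.3f}, {upper:.3f})")
print(f"e^beta: ({or_lower:.2f}, {or_upper:.2f})")
```

An odds multiplier between roughly 1.35 and 2.01 is what the slide summarizes as "at least 35%, at most double".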
17
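Logistic regression significance testing
$H_0: \beta = 0$: the probability of success does not depend on x.
$H_a: \beta \neq 0$: the probability of success depends on x.
Under $\beta = 0$, $z = \hat{\beta} / \mathrm{ASE}$ has a standard normal distribution for large samples (one-sided or two-sided tests are possible).
For the two-sided alternative $\beta \neq 0$, $z^2$ has a $\chi^2$ distribution with df = 1 under $H_0$.
p-value: the right-tail probability of the $\chi^2$ distribution above the observed value.
For large samples, the test statistic is the parameter estimate divided by its standard error, then squared: $z^2 = (\hat{\beta} / \mathrm{ASE})^2$. This is called the Wald statistic.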
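As an illustration, the Wald statistic can be computed for the crab estimate quoted earlier (0.497 with ASE 0.102). The chi-squared right-tail probability for df = 1 is obtained here from the standard-library `math.erfc`, using the identity P(chi2_1 > w) = P(|Z| > sqrt(w)); the helper name `chi2_sf_df1` is mine.

```python
import math

def chi2_sf_df1(w: float) -> float:
    """Right-tail probability of a chi-squared variable with df = 1."""
    return math.erfc(math.sqrt(w / 2.0))

beta_hat, ase = 0.497, 0.102   # crab example from the earlier slide

z = beta_hat / ase             # approximately standard normal under H0
wald = z ** 2                  # Wald chi-squared statistic, df = 1
p_value = chi2_sf_df1(wald)

print(f"z = {z:.3f}, Wald = {wald:.2f}, p = {p_value:.2e}")
```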
18
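Another approach to model inference and checking: the likelihood ratio
Maximize the likelihood function under each of the following two conditions, then take the ratio.
1) Under the $H_0$ restriction: maximize over all possible parameter values that satisfy $H_0$.
2) Under the full model, where either $H_0$ or $H_1$ may hold: maximize over all possible parameter values.
Let $l_1$: the maximum of the likelihood function under the full model.
$l_0$: the maximum under the simpler model specified by $H_0$.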
19
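Ex: for the linear predictor $\alpha + \beta x$ with
$H_0: \beta = 0$, $H_a: \beta \neq 0$
$l_0$: the likelihood function evaluated, with $\beta = 0$, at the $\alpha$ value most likely to have produced the observed data.
$l_1$: the likelihood function evaluated at the $(\alpha, \beta)$ combination most likely to have produced the observed data.
$l_0$ is the maximum over a restricted subset of the parameter values that produce $l_1$, so $l_1$ is at least as large as $l_0$.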
20
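Likelihood-ratio test statistic
$-2 \log(l_0 / l_1) = -2\,[\log(l_0) - \log(l_1)] = -2\,[L_0 - L_1]$
$L_0$ and $L_1$ denote the maximized log-likelihood values.
Under $H_0: \beta = 0$, this statistic has a large-sample $\chi^2$ distribution with df = 1.
In practice, the likelihood-ratio test is generally more reliable than the Wald test.
The likelihood-ratio test compares the maximized log likelihood $L_0$ when $\beta = 0$ (i.e. $\pi(x)$ forced to be the same at every x value) with the maximized log likelihood $L_1$ for unrestricted $\beta$.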
21
The test statistic $-2\,(L_0 - L_1)$ has a large-sample $\chi^2$ distribution with df = 1.
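A minimal sketch of the likelihood-ratio computation for a single binary predictor. The counts are hypothetical. It relies on the fact that with one binary x the fitted probabilities of the full model equal the observed group proportions, so no iterative fitting is needed here; a continuous x would require a numerical optimizer.

```python
import math

# Hypothetical grouped data: x value -> (successes, trials).
groups = {0: (3, 10), 1: (8, 10)}

def loglik(successes: int, trials: int, p: float) -> float:
    """Bernoulli log likelihood for one group at success probability p."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

# H0 (beta = 0): one common probability, estimated by the pooled proportion.
total_s = sum(s for s, _ in groups.values())
total_n = sum(n for _, n in groups.values())
L0 = sum(loglik(s, n, total_s / total_n) for s, n in groups.values())

# Full model: each group gets its own fitted probability (its observed proportion).
L1 = sum(loglik(s, n, s / n) for s, n in groups.values())

lr = -2.0 * (L0 - L1)                      # likelihood-ratio statistic, df = 1
p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-squared right tail for df = 1

print(f"L0 = {L0:.4f}, L1 = {L1:.4f}, LR = {lr:.3f}, p = {p_value:.4f}")
```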
22
EXHIBIT 10.1: Logistic regression analysis with one categorical variable as the independent variable
Response Variable: SUCCESS
Response Levels: 2
Number of Observations: 24
Link Function: Logit

Response Profile
Ordered Value   SUCCESS   Count
1               1         12
2               2         12
23
Exhibit 10.1 (continued)
Criteria for Assessing Model Fit
             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          35.271      21.864          .
SC           36.449      24.221          .
-2 LOG L     33.271      17.864          15.407 with 1 DF (p=0.0001)
Score        .           .               13.594 with 1 DF (p=0.0002)
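The fit criteria in this exhibit are consistent with the usual definitions AIC = -2 log L + 2k and SC = -2 log L + k ln(n), where k is the number of estimated parameters and n = 24. The sketch below checks that; treating these as the formulas behind the output is an inference from the numbers, not something the exhibit states.

```python
import math

n = 24                          # number of observations
neg2logl_null = 33.271          # -2 LOG L, intercept only (k = 1)
neg2logl_full = 17.864          # -2 LOG L, intercept and SIZE (k = 2)

def aic(neg2logl: float, k: int) -> float:
    return neg2logl + 2 * k

def sc(neg2logl: float, k: int, n_obs: int) -> float:
    return neg2logl + k * math.log(n_obs)

# Likelihood-ratio chi-square for the covariate: difference in -2 log L.
lr_chi2 = neg2logl_null - neg2logl_full

print(f"AIC: {aic(neg2logl_null, 1):.3f} -> {aic(neg2logl_full, 2):.3f}")
print(f"SC:  {sc(neg2logl_null, 1, n):.3f} -> {sc(neg2logl_full, 2, n):.3f}")
print(f"LR chi-square: {lr_chi2:.3f} with 1 DF")
```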
24
Exhibit 10.1 (continued)
Analysis of Maximum Likelihood Estimates

            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -1.7047     0.7687     4.9181       0.0266       .
SIZE         4.0073     1.3003     9.4972       0.0021       1.124514

Association of Predicted Probabilities and Observed Responses

Concordant = 76.4%    Somers' D = 0.750
Discordant =  1.4%    Gamma     = 0.964
Tied       = 22.2%    Tau-a     = 0.391
(144 pairs)           c         = 0.875
25
Exhibit 10.1 (continued)
Classification Table

                        Predicted
                    EVENT   NO EVENT   Total
Observed EVENT        10        2        12
         NO EVENT      1       11        12
Total                 11       13        24

Sensitivity = 83.3%
Specificity = 91.7%
Correct = 87.5%
False Positive Rate = 9.1%
False Negative Rate = 15.4%
NOTE: An EVENT is an outcome whose ordered response value is 1.
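The summary rates can be recomputed from the four cell counts. One point worth noticing: the false positive and false negative rates reported here are fractions of the predicted events and predicted non-events (1/11 and 2/13), not fractions of the observed groups. The function and key names below are illustrative only.

```python
def classification_summary(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Rates as reported in the exhibit (all returned as proportions)."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),            # observed events predicted as events
        "specificity": tn / (tn + fp),            # observed non-events predicted as non-events
        "correct": (tp + tn) / total,
        "false_positive_rate": fp / (tp + fp),    # share of predicted events that are non-events
        "false_negative_rate": fn / (fn + tn),    # share of predicted non-events that are events
    }

# Cell counts from the classification table above.
summary = classification_summary(tp=10, fn=2, fp=1, tn=11)

for name, value in summary.items():
    print(f"{name}: {100 * value:.1f}%")
```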
26
Exhibit 10.1 (continued)
OBS  SUCCESS  SIZE  PHAT       OBS  SUCCESS  SIZE  PHAT
  1     1      1    0.90909     13     2      1    0.90909
  2     1      1    0.90909     14     2      0    0.15385
  3     1      1    0.90909     15     2      0    0.15385
  4     1      1    0.90909     16     2      0    0.15385
  5     1      1    0.90909     17     2      0    0.15385
  6     1      1    0.90909     18     2      0    0.15385
  7     1      1    0.90909     19     2      0    0.15385
  8     1      1    0.90909     20     2      0    0.15385
  9     1      1    0.90909     21     2      0    0.15385
 10     1      1    0.90909     22     2      0    0.15385
 11     1      0    0.15385     23     2      0    0.15385
 12     1      0    0.15385     24     2      0    0.15385
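The PHAT column and the association measures reported for Exhibit 10.1 can be reproduced from the parameter estimates (INTERCPT -1.7047, SIZE 4.0073) and the SIZE values in the listing. This is a sketch under the assumption that the measures are computed over all 12 x 12 = 144 event/non-event pairs, which agrees with the "(144 pairs)" note.

```python
import math
from itertools import product

alpha_hat, beta_hat = -1.7047, 4.0073   # estimates from Exhibit 10.1

def phat(size: int) -> float:
    """Predicted probability of the event (SUCCESS = 1)."""
    return 1.0 / (1.0 + math.exp(-(alpha_hat + beta_hat * size)))

# SIZE values by observed group, read from the listing above.
event_sizes = [1] * 10 + [0] * 2        # SUCCESS = 1 (events)
nonevent_sizes = [1] * 1 + [0] * 11     # SUCCESS = 2 (non-events)

pairs = list(product(event_sizes, nonevent_sizes))
concordant = sum(phat(e) > phat(ne) for e, ne in pairs)
discordant = sum(phat(e) < phat(ne) for e, ne in pairs)
tied = len(pairs) - concordant - discordant

n_obs = len(event_sizes) + len(nonevent_sizes)
c = (concordant + 0.5 * tied) / len(pairs)
somers_d = (concordant - discordant) / len(pairs)
gamma = (concordant - discordant) / (concordant + discordant)
tau_a = (concordant - discordant) / (n_obs * (n_obs - 1) / 2)
odds_ratio = math.exp(beta_hat)         # matches (10 * 11) / (2 * 1) = 55

print(phat(1), phat(0))
print(concordant, discordant, tied, c, somers_d, gamma, tau_a, odds_ratio)
```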
27
Exhibit 10.2: Contingency Analysis Output
TABLE OF SUCCESS BY SIZE

SUCCESS     SIZE
Frequency |
Percent   |
Row Pct   |
Col Pct   |       1 |       2 |   Total
----------+---------+---------+
        1 |      10 |       2 |      12
          |   41.67 |    8.33 |   50.00
          |   83.33 |   16.67 |
          |   90.91 |   15.38 |
----------+---------+---------+
        2 |       1 |      11 |      12
          |    4.17 |   45.83 |   50.00
          |    8.33 |   91.67 |
          |    9.09 |   84.62 |
----------+---------+---------+
Total            11        13        24
              45.83     54.17    100.00
28
Exhibit 10.2 (continued)
STATISTICS FOR TABLE OF SUCCESS BY SIZE

Statistic                      DF    Value     Prob
---------------------------------------------------
Chi-Square                      1    13.594    0.000
Likelihood Ratio Chi-Square     1    15.407    0.000
Continuity Adj. Chi-Square      1    10.741    0.001

Statistic            Value    ASE
---------------------------------
Gamma                0.964    0.046
Kendall's Tau-b      0.753    0.133
Stuart's Tau-c       0.750    0.134
Somers' D C|R        0.750    0.134
Somers' D R|C        0.755    0.132
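The three chi-square statistics can be recomputed from the 2 x 2 frequencies in Exhibit 10.2. The sketch below uses the textbook formulas (Pearson, likelihood-ratio G-squared, and the continuity-adjusted Pearson statistic); variable names are mine. The Pearson and likelihood-ratio values coincide with the Score and -2 LOG L chi-squares for covariates in Exhibit 10.1.

```python
import math

observed = [[10, 2],    # SUCCESS = 1: SIZE = 1, SIZE = 2
            [1, 11]]    # SUCCESS = 2: SIZE = 1, SIZE = 2

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

pearson = sum((o - e) ** 2 / e for o, e in cells)
likelihood_ratio = 2.0 * sum(o * math.log(o / e) for o, e in cells)
continuity_adj = sum(max(0.0, abs(o - e) - 0.5) ** 2 / e for o, e in cells)

print(f"{pearson:.3f} {likelihood_ratio:.3f} {continuity_adj:.3f}")
```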
29
Exhibit 10.3: Logistic regression for categorical and continuous variables
Step 0. Intercept entered:
Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    0           0.4082     0.0000       1.0000       .

Residual Chi-Square = 16.5512 with 2 DF (p=0.0003)
30
Exhibit 10.3 (continued)
Analysis of Variables Not in the Model
            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        13.5944      0.0002
FP          13.8301      0.0002

Step 1. Variable FP entered:
Analysis of Variables Not in the Model

            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        5.0283       0.0249
31
Exhibit 10.3 (continued)
Step 2. Variable SIZE entered:
Criteria for Assessing Model Fit

             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          35.271      17.789          .
SC           36.449      21.323          .
-2 LOG L     33.271      11.789          21.482 with 2 DF (p=0.0001)
Score        .           .               16.551 with 2 DF (p=0.0003)
32
Exhibit 10.3 (continued)
Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -4.4450     1.8432     5.8159       0.0159       .
SIZE         3.0552     1.5981     3.6550       0.0559       0.857342
FP           1.9245     0.9116     4.4570       0.0348       1.139820
33
Exhibit 10.3 (continued)
Association of Predicted Probabilities and Observed Responses
Concordant = 95.8%    Somers' D = 0.917
Discordant =  4.2%    Gamma     = 0.917
Tied       =  0.0%    Tau-a     = 0.478
(144 pairs)           c         = 0.958

NOTE: All explanatory variables have been entered into the model.

Summary of Stepwise Procedure

        Variable   Variable   Number   Score        Wald         Pr >
Step    Entered    Removed    In       Chi-Square   Chi-Square   Chi-Square
1       FP                    1        13.8301      .            0.0002
2       SIZE                  2         5.0283      .            0.0249
34
Exhibit 10.3 (continued)
Classification Table

                        Predicted
                    EVENT   NO EVENT   Total
Observed EVENT         9        3        12
         NO EVENT      1       11        12
Total                 10       14        24

Sensitivity = 75.0%
Specificity = 91.7%
Correct = 83.3%
False Positive Rate = 10.0%
False Negative Rate = 21.4%
35
Exhibit 10.3 (continued)
NOTE: An EVENT is an outcome whose ordered response value is 1.
OBS  SUCCESS  SIZE  FP    PHAT       OBS  SUCCESS  SIZE  FP    PHAT
  1     1      1    0.58  0.43202     13     2      1    2.28  0.95248
  2     1      1    2.80  0.98199     14     2      0    1.06  0.08278
  3     1      1    2.77  0.98094     15     2      0    1.08  0.08575
  4     1      1    3.50  0.99525     16     2      0    0.07  0.01325
  5     1      1    2.67  0.97699     17     2      0    0.16  0.01572
  6     1      1    2.97  0.98695     18     2      0    0.70  0.04319
  7     1      1    2.18  0.94297     19     2      0    0.75  0.04735
  8     1      1    3.24  0.99220     20     2      0    1.61  0.20641
  9     1      1    1.49  0.81421     21     2      0    0.34  0.02208
 10     1      1    2.19  0.94400     22     2      0    1.15  0.09692
 11     1      0    2.70  0.67939     23     2      0    0.44  0.02664
 12     1      0    2.57  0.62265     24     2      0    0.86  0.05787
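The PHAT values in this listing follow from the two-predictor estimates in Exhibit 10.3 (INTERCPT -4.4450, SIZE 3.0552, FP 1.9245). A short sketch that recomputes a few rows; small differences in the last digits are expected because the printed coefficients are rounded.

```python
import math

coef = {"INTERCPT": -4.4450, "SIZE": 3.0552, "FP": 1.9245}  # from Exhibit 10.3

def phat(size: int, fp: float) -> float:
    """Predicted probability of the event for given SIZE and FP."""
    eta = coef["INTERCPT"] + coef["SIZE"] * size + coef["FP"] * fp
    return 1.0 / (1.0 + math.exp(-eta))

# (OBS, SIZE, FP, listed PHAT) for a few rows copied from the listing above.
rows = [(1, 1, 0.58, 0.43202),
        (11, 0, 2.70, 0.67939),
        (14, 0, 1.06, 0.08278),
        (20, 0, 1.61, 0.20641)]

for obs, size, fp, listed in rows:
    print(f"OBS {obs:2d}: computed {phat(size, fp):.5f}, listed {listed:.5f}")
```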
36
Exhibit 10.4: Discriminant analysis for data in Table 10.1
Canonical Discriminant Functions
                  Pct of     Cum      Canonical     After   Wilks'
Fcn  Eigenvalue   Variance   Pct      Corr          Fcn     Lambda    Chi-square   df   Sig
                                                 :  0       .310367   24.570       2    .0000
1*   2.2220       100.00     100.00   .8304      :
* Marks the 1 canonical discriminant functions remaining in the analysis.
Unstandardized canonical discriminant function coefficients
Func 1
SIZE 1.8552118
FP .9162471
(Constant) -2.3834923
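With a single discriminant function, the printed quantities are tied together by standard identities: Wilks' lambda = 1 / (1 + eigenvalue) and canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)). The chi-square is consistent with Bartlett's approximation, -(n - 1 - (p + g)/2) ln(lambda), with n = 24 cases, p = 2 predictors and g = 2 groups; that this is the formula used by the software is my assumption, supported by the numbers. The inputs passed to the score function are hypothetical.

```python
import math

eigenvalue = 2.2220
n, p, g = 24, 2, 2      # cases, predictors, groups

wilks_lambda = 1.0 / (1.0 + eigenvalue)
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))
chi_square = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)   # df = p * (g - 1) = 2

def discriminant_score(size: float, fp: float) -> float:
    """Unstandardized canonical discriminant function from the exhibit."""
    return 1.8552118 * size + 0.9162471 * fp - 2.3834923

print(f"{wilks_lambda:.6f} {canonical_corr:.4f} {chi_square:.3f}")
print(discriminant_score(size=1, fp=2.0))   # hypothetical case
```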
37
Exhibit 10.4 (continued)
Classification results

                No. of    Predicted Group Membership
Actual Group    Cases     1              2
------------    ------    -----------    -----------
Group 1         12        11 (91.7%)     1 (8.3%)
Group 2         12        1 (8.3%)       11 (91.7%)

Percent of "grouped" cases correctly classified: 91.67%
38
Exhibit 10.5: Logistic Regression For Mutual Fund Data
Stepwise Selection Procedure
Criteria for Assessing Model Fit
             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          190.400     147.711         .
SC           193.327     165.275         .
-2 LOG L     188.400     135.711         52.689 with 5 DF (p=0.0001)
Score        .           .               44.034 with 5 DF (p=0.0001)

NOTE: All explanatory variables have been entered into the model.
39
Exhibit 10.5 (continued)
Summary of Stepwise Procedure

        Variable    Variable   Number   Score        Wald         Pr >
Step    Entered     Removed    In       Chi-Square   Chi-Square   Chi-Square
1       YIELD                  1        21.0379      .            0.0001
2       TOTRET                 2        11.9103      .            0.0006
3       SIZE                   3         8.5928      .            0.0034
4       SCHARGE                4         4.1344      .            0.0420
5       EXPENRAT               5         5.5516      .            0.0185
40
Exhibit 10.5 (continued)
Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -2.5902     1.2642      4.1981      0.0405       .
SIZE         0.8542     0.4773      3.2020      0.0735        0.236320
SCHARGE     -0.1394     0.0589      5.6088      0.0179       -0.302154
EXPENRAT    -1.4361     0.6793      4.4699      0.0345       -0.321113
TOTRET       0.8090     0.2509     10.3988      0.0013        0.402480
YIELD        0.0553     0.0124     19.9669      0.0001        0.694773
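Each Wald chi-square in this table is (estimate / standard error) squared, and exponentiating an estimate gives the multiplicative change in the odds per unit of that variable. The sketch below recomputes both from the printed values; because the printed estimates and standard errors are rounded, the recomputed Wald values only approximately match the table. The helper name `wald_summary` is mine.

```python
import math

# (estimate, standard error) pairs copied from the table above.
estimates = {
    "SIZE": (0.8542, 0.4773),
    "SCHARGE": (-0.1394, 0.0589),
    "EXPENRAT": (-1.4361, 0.6793),
    "TOTRET": (0.8090, 0.2509),
    "YIELD": (0.0553, 0.0124),
}

def wald_summary(estimate: float, std_error: float) -> dict:
    wald = (estimate / std_error) ** 2
    return {
        "wald": wald,
        "p_value": math.erfc(math.sqrt(wald / 2.0)),  # chi-squared right tail, df = 1
        "odds_ratio": math.exp(estimate),
    }

summary = {name: wald_summary(b, se) for name, (b, se) in estimates.items()}

for name, row in summary.items():
    print(f"{name:9s} Wald={row['wald']:.3f} p={row['p_value']:.4f} "
          f"odds ratio={row['odds_ratio']:.3f}")
```

Negative estimates (SCHARGE, EXPENRAT) give odds ratios below 1, meaning higher values of those variables lower the odds of the event.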
41
Exhibit 10.5 (continued)
Association of Predicted Probabilities and Observed Responses
Concordant = 85.5%    Somers' D = 0.711
Discordant = 14.4%    Gamma     = 0.712
Tied       =  0.1%    Tau-a     = 0.351
(4661 pairs)          c         = 0.856
42
Exhibit 10.5 (continued)
Classification Table
                        Predicted
                    EVENT   NO EVENT   Total
Observed EVENT        45       14        59
         NO EVENT     12       67        79
Total                 57       81       138

Sensitivity = 76.3%
Specificity = 84.8%
Correct = 81.2%
False Positive Rate = 21.1%
False Negative Rate = 17.3%