Binary Regression Models ((((二分二分类类类类变变变变量...
Transcript of Binary Regression Models ((((二分二分类类类类变变变变量...
9
Binary Regression Models ((((二分二分二分二分类类类类变变变变量量量量模型模型模型模型)))) 2.1 Binary Outcomes
– binary dependent variables have two values, typically coded as 0 for a negative outcome (i.e., the event did not occur) and 1 as a positive outcome (i.e., the event did occur)
2.2 Binary Regression Models: explore how each independent variable affects the
probability of the event occurring – binary logit model(二分胜算对数模型) – binary probit model(二分概率单元模型)
2.3 The Statistical Model 2.3.1 A Latent Variable Model (隐性变量模型):assume a latent or unobserved variable y* ranging from -∞ to ∞
– For example (女性参与就业市场) • y is observed in two states:
a woman is in the labor force she is not in the labor force
• The idea of a latent y* is that:
There is an underlying propensity that generates the observed event We cannot directly observe y*, but at some point a change in y* results in
a change in what we observe
– As the number of young children in the family increases, a woman’s propensity to “work” would decrease. At some point the propensity crosses a threshold that results in a decision to leave the labor force
-∞ ∞
0=τ
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y
y*
10
– Structural model
– Measurement model: the link between the observed binary y and the latent y* is made with a simple measurement equation:
• τ is called the threshold (临界点)or cutpoint(分界点) • the distribution of y* is shown by the bell-shaped curves which should be
thought of as coming out of the figure into a third dimension • when y* is larger than τ, indicated by the shaded area, we observe y=1. • Since y* is continuous, the model avoids the problems encountered with the
LPM. However, since the dependent variable is unobserved, the model cannot be estimated with OLS. Instead, we use ML estimation, which requires assumptions about the distribution of the errors.
• In general, the probit model and the logit model provide similar results except that the probit model assumes that error terms have a normal distribution and the logit model assumes that error terms have a logistic distribution.
• the size of the coefficients of the logit model is 1.7 to 1.8 larger than those of
the probit model
iii xy εβ +=*
≤>
=0 if 0
0 if 1*
*
i
i
i y
yy
11
2.3.2 A Nonlinear Probability Model(非线性概率模型): The BRM can also be derived without appealing to latent variable through specifying a nonlinear model relating the xs to the probability of an event
( )( )
( )( )x
x
x
x
1Pr1
1Pr
0Pr
1Pr)(
=−=
===
=y
y
y
yxΩΩΩΩ
since the probability is transformed into odds(胜算,或概率比), it indicates how often something happens (y=1) relative to how often it does not happen (y=0), and the values range from 0 when ( ) 0x|1yPr == to ∞ when ( ) 1x|1yPr == . The log (自然对
数)of the odds, or logit(胜算对数), ranges from -∞ to ∞. This suggests a model that is linear in the logit:
( )( )
kk xxx
y
yx
ββββ ++++==
=−=
=
...
1Pr1
1Prln)(ln
22110xβ
x
xΩΩΩΩ
2.3.2.1什么是胜算(odds)?
胜算是某事件发生的概率与该事件不发生的概率的比率
胜算( ΩΩΩΩ )=
概率(p)=
胜算=4,发生的概率为 0.8,不发生的概率为 0.2;发生的概率为不发生
的概率的 4倍 中文:五五波,六四开
– 胜算与概率相似处:
− 39
13.,.
1ge
p
p
i
i
5213
.,.geN
f
12
都以 0为最低值,无负数 当胜算与概率的值增大时,都表示某事件愈可能发生;概率愈大,胜算愈
大;概率愈小,胜算愈小
− 胜算与概率不同处: 胜算以该事件不发生的概率为分母,概率以所有发生与不发生的事件为分
母 概率最大值为 1 ,胜算可达无穷大,消除上界为 1的问题 概率趋近于 1时,概率少许的变动会对胜算有很大的影响
ip 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.99 ( )ip−1
0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01 胜算 0.01 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00 99.0
odds >1,发生的概率大于不发生的概率 0< odds <1,发生的概率小于不发生的概率 odds =1,发生的概率等于不发生的概率
– 胜算与概率的关系:
( )
( )
i
ii
iii
iiii
iiii
iii
i
ii
p
p
pp
pp
pp
p
p
ΩΩΩΩΩΩΩΩ
ΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩ
ΩΩΩΩ
ΩΩΩΩ
+=
+=+=−=
−×=−
=
1
1
1
1
2.3.2.2胜算比(odds ratio): 一个胜算可与另一个胜算来比较
22
11
2
1
1
1
pp
pp
−−==
ΩΩΩΩΩΩΩΩθ
13
如果胜算比大于 1,表示第一个事件组的胜算大于第二个事件组的胜算。胜算比
愈大,第一个事件组的胜算愈大。如果胜算比小于 1,第一个事件组的胜算小于
第二个事件组的胜算。 2.3.2.3什么是自然胜算对数(logit, the natural log of odds)? 自然胜算对数就是将胜算( iii pp −= 1ΩΩΩΩ )取自然对数,即
( )iii pp −= 1lnln ΩΩΩΩ
ip 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.99 ( )ip−1
0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01 胜算 0.01 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00 99.0 自然
胜算
对数 -4.60 -2.20 -1.39 -0.85 -0.41 0.00 0.41 0.85 1.39 2.20 4.60
从正无穷大到负无穷大,消除概率上下限为 0的问题
以概率为 0.5为界,成对称型
等量的概率变化会造成自然胜算对数不等量的改变
– 总之,
胜算大于 1,自然胜算对数为正值
胜算等于 1,自然胜算对数为 0
胜算大于 0但小于 1,自然胜算对数为负值
自然胜算对数以 5.0=ip 为界,成对称型
概率为 1或 0,相对的自然胜算对数不存在
14
2.3.2.4二分胜算对数模型
( )( )
kk xxx
y
yx
ββββ ++++==
=−=
=
...
1Pr1
1Prln)(ln
22110xβ
x
xΩΩΩΩ
• logit transformation linearizes the nonlinear relations • the model is linear in logit, the same change in x have constant effects • the logit implies a nonlinear relations between x and the probabilities
上式可在两边取指数函数(exponent),左侧以胜算表示依变量,右侧为自变
量。等式右侧呈现相乘关系,表示依变量与自变量是非线性关系。所以,二分
胜算对数模型将变量间的非线性关系,以自然胜算对数将之线性化(linear in logit)。
( )( )
kk
k
xxx
xxx
eeee
ey
y
ββββ
ββββ
...
)exp(1Pr1
1Pr
22110
22110 ...
=
===−
= ++++xβx
x
– Example: Labor force participation of women (file name: binlfp2 ) The sample consists of 753 white, married women between the ages of 30 and 60. The dependent variable lfp equals 1 if a woman is employed and 0 otherwise. The hypothesis is that the probability of a married woman in the paid labor force is affected by a series of factors, including the number of younger children (K5), the number of older children (K618), the wife’s age (age), the wife’s education (wc), the husband’s education (hc), log of wife’s estimated wage rate (lwg) and family income excluding wife’s wages (inc). The model to be estimated is
inclwghc
wcageKKFlfp
inclwghc
wcagekk
ββββββββ
+++++++== 6185()1Pr( 61850
− Estimation with STATA (a do-file)
15
*logit command, suppressing iteration log and stori ng the results logit lfp k5 k618 age wc hc lwg inc, nolog
(or statistics/binary outcomes/logistic regression)
estimates store logit (or statistics/general post-estimation/manage estimation results/store estimation results)
*probit command, suppressing iteration history by t he nolog option and storing the results probit lfp k5 k618 age wc hc lwg inc, nolog
(or statistics/binary outcomes/probit regression) estimates store probit *creating a table that combines the results estimates table logit probit, b(%9.3f) se label var width(30)
(or statistics/general post-estimation/manage estimation results/table of estimation results)
Output page . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates N umber of obs = 753 L R chi2(7) = 124.48 P rob > chi2 = 0.0000 Log likelihood = -452.63296 P seudo R2 = 0.1209 --------------------------------------------------- -------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- -------------------------- k5 | -1.462913 .1970006 -7.43 0.0 00 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.3 42 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.0 00 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.0 00 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.5 88 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.0 00 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.0 00 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.0 00 1.919188 4.445092 --------------------------------------------------- --------------------------- . estimates store logit . . *probit command, suppressing iteration history by the nolog option and storing > the results . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates N umber of obs = 753 L R chi2(7) = 124.36 P rob > chi2 = 0.0000 Log likelihood = -452.69496 P seudo R2 = 0.1208
16
--------------------------------------------------- -------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- -------------------------- k5 | -.8747112 .1135583 -7.70 0.0 00 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.3 40 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.0 00 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.0 00 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.6 45 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.0 00 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.0 00 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.0 00 1.172355 2.66449 --------------------------------------------------- --------------------------- . estimates store probit . *creating a table that combines the results . estimates table logit probit, b(%9.3f) t label va rwidth(30) --------------------------------------------------- ----- Variable | logit probi t -------------------------------+------------------- ----- # kids < 6 | -1.463 -0. 875 | 0.197 0. 114 # kids 6-18 | -0.065 -0. 039 | 0.068 0. 040 Wife's age in years | -0.063 -0. 038 | 0.013 0. 008 Wife College: 1=yes 0=no | 0.807 0. 488 | 0.230 0. 135 Husband College: 1=yes 0=no | 0.112 0. 057 | 0.206 0. 124 Log of wife's estimated wages | 0.605 0. 366 | 0.151 0. 088 Family income excluding wife's | -0.034 -0. 021 | 0.008 0. 005 Constant | 3.182 1. 918 | 0.644 0. 381 --------------------------------------------------- ----- legend: b/se
2.3.4 Hypothesis Testing with test and lrtest – If the assumptions of the model hold, the ML estimators are distributed
asymptotically normally and thus can be tested with the corresponding z-statistics in the output.
– The t-test and z-test are the same when the N is large. – The z-test included in the output of estimation command is a Wald test, which
can be computed using test . For example, to test 0: 50 =kH β , *significant test of a single coefficient test k5
(or statistics/general post-estimation/tests/tests parameters)
17
Output page .test k5 ( 1) k5 = 0 chi2( 1) = 55.14 Prob > chi2 = 0.0000
The effect of having young children on the probability of entering the labor force is significant at the .01 level
– test can be used to test multiple coefficients. For example, to test
0:0 == hcwcH ββ ,
*significant test of multiple coefficients test hc wc
Output page .test hc wc ( 1) hc = 0 ( 2) wc = 0 chi2( 2) = 17.66 Prob > chi2 = 0.0001
The hypothesis that the effects of the husband’s and the wife’s education are simultaneously equal to zero can be rejected at the .01 level
– Comparing competitive (nested) models using LR test: the LR test assesses a hypothesis by comparing the log likelihood from the full model (MU) (i.e., the model that does not include the constraints implied by H0) and a restricted model (MC) that imposes the constraints. If the constraints significantly reduce the log likelihood, the H0 (that the imposed constraints are true) is rejected.
• Consider the following models:
M1: )()|1Pr( 443322110 xxxxxy βββββ ++++Λ==
M2: )()|1Pr( 22110 xxxy βββ ++Λ==
M3: )()|1Pr( 44220 xxxy βββ ++Λ==
18
• The LR test can be defined as follows: 1. Mc with parameters βc is nested in Mu with βu. 2. H0 is that the constraints imposed to create Mc are true. 3. L(M u) is the value of the likelihood function at the unconstrained
estimates. 4. L(M c) is the value at the constrained estimated. 5. The likelihood ratio statistic equals
)(ln2)(ln2)|(2CUUC MLMLMMG −=
6. Under very general conditions, if H0 is true, then G2 is asymptotically
distributed as chi-square with degrees of freedom equal to the number of independent constraints
• Assume that a restricted model (without k5 and k618 , or
0: 61850 ==kk
H ββ ) is tested against a full model *comparing nested models using an LR test logit lfp k5 k618 age wc hc lwg inc, nolog estimates store fmodel logit lfp age wc hc lwg inc, nolog estimates store rmodel lrtest fmodel rmodel
(or statistics/general post-estimation/tests/likelihood-ratio test)
Output page . *comparing nested models using an LR test . logit lfp k5 k618 age wc hc lwg inc, nolog ( output omitted) . estimates store fmodel . . logit lfp age wc hc lwg inc, nolog ( output omitted) . estimates store rmodel . . lrtest fmodel rmodel likelihood-ratio test LR chi2(2) = 66.49 (Assumption: rmodel nested in fmodel) Prob > chi2 = 0.0000
19
Since the LR chi-square is statistically significant, we reject the null hypothesis that the coefficients of k5 and k618 are simultaneously equal to
zero (or, reject 0: 61850 ==kk
H ββ ) When conducting LR test: 1) the two models must be nested (i.e., a nested model is created by imposing constraints on the coefficients in the prior model), and 2) the two models must be fitted on exactly the same sample.
2.4 Interpretation: In general, the estimated parameters from the BRM do not provide directly useful information for understanding the relationship between the dependent and independent variables. Substantively meaningful interpretations are thus based on predicted probabilities and functions of those probabilities (e.g., ratios, differences). There are four approaches:
- Predictions can be computed for each observation in the sample using predict
- Predicted values for substantively meaningful “profiles” of the independent variables can be compared using prvalue or prtab
- The marginal or discrete change in the outcome can be computed at representative value of the independent variables using prchange
- The nonlinear model can be transformed to a model that is linear in some other outcome and listcoef can be used
2.4.1 Interpretation with Predicted Probability using predict : *logit command, suppressing iteration log logit lfp k5 k618 age wc hc lwg inc, nolog predict prlogit
(or statistics/general post-estimation/obtain predictions, residuals)
probit lfp k5 k618 age wc hc lwg inc, nolog predict prprobit summarize pr* *Correlating prlogit and prprobit pwcorr prlogit prprobit
Output page
. *logit command, suppressing iteration log . logit lfp k5 k618 age wc hc lwg inc, nolog
20
Logit estimates N umber of obs = 753 L R chi2(7) = 124.48 P rob > chi2 = 0.0000 Log likelihood = -452.63296 P seudo R2 = 0.1209 --------------------------------------------------- --------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- --------------------------- k5 | -1.462913 .1970006 -7.43 0.0 00 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.3 42 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.0 00 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.0 00 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.5 88 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.0 00 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.0 00 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.0 00 1.919188 4.445092 --------------------------------------------------- --------------------------- . predict prlogit (option p assumed; Pr(lfp)) . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates N umber of obs = 753 L R chi2(7) = 124.36 P rob > chi2 = 0.0000 Log likelihood = -452.69496 P seudo R2 = 0.1208 --------------------------------------------------- --------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- --------------------------- k5 | -.8747112 .1135583 -7.70 0.0 00 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.3 40 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.0 00 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.0 00 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.6 45 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.0 00 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.0 00 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.0 00 1.172355 2.66449 --------------------------------------------------- --------------------------- . predict prprobit (option p assumed; Pr(lfp)) . summarize pr* Variable | Obs Mean Std. Dev. Min Max -------------+------------------------------------- ------------------- prlogit | 753 .5683931 .1944213 .0139875 .9621198 prprobit | 753 .5705144 .1928416 .005691 .9744748
21
predict can be used to examine the range of predicted probabilities from the model. For instance, after storing the predicted probabilities in the corresponding variables, prlogit and prprobit , summarize shows the range of the predicted probabilities are from .014 to .962 and .006 to .974 respectively.
predict can also be used to demonstrate that the predictions from logit and probit models are essentially identical. As the following output shows,
Output page . *Correlating prlogit and prprobit . pwcorr prlogit prprobit | prlogit prprobit -------------+------------------ prlogit | 1.0000 prprobit | 0.9998 1.0000
2.4.2 Individual Predicted Probabilities with prvalue : a table of probabilities for ideal types of people (or countries, cows, etc.) can quickly summarize the effects of key variables. E.g., in the example of labor force participation Young, low income and low education families with young children Highly educated, middle aged couples with no children An “average family” defined as having the mean on all variables
*logit command, suppressing iteration log logit lfp k5 k618 age wc hc lwg inc, nolog *Individual Predicted Probabilities with prvalue *Young, low income and low education families with young children prvalue, x(age=35 k5=2 wc=0 hc=0 inc=15) rest(mean)
(or statistics/general post-estimation/table of adjusted means and proportions)
*Highly educated, middle aged couples with no child ren prvalue, x(age=50 k5=0 k618=0 wc=1 hc=1) rest(mean) *An “average family” defined as having the mean on all variables prvalue, rest(mean)
Output page *Young, low income and low education families with young children . prvalue, x(age=35 k5=2 wc=0 hc=0 inc=15) rest(mea n)
22
logit: Predictions for lfp Pr(y=inLF|x): 0.1318 95% ci: (0.0723,0.22 82) Pr(y=NotInLF|x): 0.8682 95% ci: (0.7718,0.92 77) k5 k618 age wc hc lwg inc x= 2 1.3532537 35 0 0 1.0971148 15 . . *Highly educated, middle aged couples with no chi ldren . prvalue, x(age=50 k5=0 k618=0 wc=1 hc=1) rest(mea n) logit: Predictions for lfp Pr(y=inLF|x): 0.7166 95% ci: (0.6266,0.79 21) Pr(y=NotInLF|x): 0.2834 95% ci: (0.2079,0.37 34) k5 k618 age wc hc lwg inc x= 0 0 50 1 1 1.0971148 20.128965 . . *An "average family" defined as having the mean o n all variables . prvalue, rest(mean) logit: Predictions for lfp Pr(y=inLF|x): 0.5778 95% ci: (0.5388,0.61 59) Pr(y=NotInLF|x): 0.4222 95% ci: (0.3841,0.46 12) k5 k618 age wc hc lwg inc x= .2377158 1.3532537 42.537849 .2815405 .39 176627 1.0971148 20.128965
With these predictions, we can summarize the results on factors affecting a wife’s labor force participation
Table of Predicted Probabilities at Selected Values Ideal Types Probabilities of LFP Young, low income and low education families with young children
.013
Highly educated, middle aged couples with no children
.72
An “average family” defined as having the mean on all variables
.58
23
2.4.3 Individual Predicted Probabilities with prtab : if the focus is one two or three categorical independent variables. Predictions for all combinations of the categories of these variables could be presented as
Table of Predicted Probabilities at Selected Values
The above table is created by prtab k5 wc, rest(mean)
Output page . *Individual Predicted Probabilities with prtab . prtab k5 wc, rest (mean) logit: Predicted probabilities of positive outcome for lfp ---------------------------- | Wife College: # kids < | 1=yes 0=no 6 | NoCol College ----------+----------------- 0 | 0.6069 0.7758 1 | 0.2633 0.4449 2 | 0.0764 0.1565 3 | 0.0188 0.0412 ---------------------------- k5 k618 age wc hc lwg inc x= .2377158 1.3532537 42.537849 .2815405 .39 176627 1.0971148 20.128965 2.4.4 Changes in Predicted Probabilities with prchange : this command computes
the marginal/discrete change at the values of the independent variables specified with x() or rest(). For instance,
*computing marginal change with all variables at th eir mean prchange, rest (mean)help *the above command can be supplemented by fromto op tion
Number of Young Children
Predicted Probability
Did not Attend Attended College Difference
0 .61 .78 0.17 1 .26 .44 0.18 2 .08 .16 0.08 3 .02 .04 0.02
24
prchange, fromto *(output omitted)
Output page prchange, rest (mean) help logit: Changes in Predicted Probabilities for lfp min->max 0->1 -+1/2 -+sd/2 MargE fct k5 -0.6361 -0.3499 -0.3428 -0.1849 -0.3 569 k618 -0.1278 -0.0156 -0.0158 -0.0208 -0.0 158 age -0.4372 -0.0030 -0.0153 -0.1232 -0.0 153 wc 0.1881 0.1881 0.1945 0.0884 0.1 969 hc 0.0272 0.0272 0.0273 0.0133 0.0 273 lwg 0.6624 0.1499 0.1465 0.0865 0.1 475 inc -0.6415 -0.0068 -0.0084 -0.0975 -0.0 084 NotInLF inLF Pr(y|x) 0.4222 0.5778 k5 k618 age wc hc lwg inc x= .237716 1.35325 42.5378 .281541 .391766 1.09711 20.129 sd(x)= .523959 1.31987 8.07257 .450049 .488469 .587556 11.6348 Pr(y|x): probability of observing each y for speci fied x values Avg|Chg|: average of absolute value of the change a cross categories Min->Max: change in predicted probability as x chan ges from its minimum to its maximum 0->1: change in predicted probability as x chan ges from 0 to 1 -+1/2: change in predicted probability as x chan ges from 1/2 unit below base value to 1/2 unit above -+sd/2: change in predicted probability as x chan ges from 1/2 standard dev below base to 1/2 standard dev above MargEfct: the partial derivative of the predicted p robability/rate with respect to a given independent variable
varying age (age) from its minimum of 30 to its maximum of 60 decreases the predicted probability by .44
changing family income (inc) from its minimum to its maximum decreases the probability of a woman being in the labor force by .64
for a woman who is average on all characteristics, an additional young child decreases the probability of employment by .3428
if a woman attends college, her probability of being in the labor force is .1881 greater than a woman who does not attend college, holding all other variables at their mean.
2.4.5 Interpretation using Odds Ratios (胜算比) with listcoef :
- For binary outcomes, we typically consider the odds of observing a positive outcome versus a negative one:
25
)1Pr(1
)1Pr(
)0Pr(
)1Pr(
=−==
===Ω
y
y
y
y
The log of odds is logit and the model is linear in the logit
kk xxxy
yx ββββ ++++=
=−==Ω ...
)1Pr(1)1Pr(
ln)(ln 22110
- Odds Ratio
• Taking the exponential (指数)of the logit equation:
[ ]kk
k
xxx
xxx
eeee
exxxββββ
βββββ...
)exp()(lnexp)(22110
22110 ...
===Ω=Ω ++++
• Let x2 change by 1:
kk
k
xxx
xxx
eeeee
exxxβββββ
βββββ...
)exp()1,(222110
22110 ...)1(
2
===+Ω +++++
• To compare the odds before and after the change of x2, we take odds ratio:
2
22110
222110
...
...
),(
)1,(
2
2 βββββ
βββββ
eeeee
eeeee
xx
xxkk
kk
xxx
xxx
==Ω
+Ω
• For a unit change in xk, the odds are expected to change by a factor of
exp(βk), holding all other variables constant For exp(βk) >1, the odds are “exp(βk) times larger” For exp(βk) <1, the odds are “exp(βk) times smaller” The effect does not depend on the level of xk or the level of any other variable.
Factor change (倍数改变)in odds ratios for both a unit change and a
standard deviation change of the independent variables can be obtained with listcoef :
* obtaining factor change in odds ratios listcoef, help
26
Output page * obtaining factor change in odds ratios listcoef, help logit (N=753): Factor Change in Odds Odds of: inLF vs NotInLF --------------------------------------------------- ------------------- lfp | b z P>|z| e^b e^bStdX SDofX -------------+------------------------------------- ------------------- k5 | -1.46291 -7.426 0.000 0.2316 0.4646 0.5240 k618 | -0.06457 -0.950 0.342 0.9375 0.9183 1.3199 age | -0.06287 -4.918 0.000 0.9391 0.6020 8.0726 wc | 0.80727 3.510 0.000 2.2418 1.4381 0.4500 hc | 0.11173 0.542 0.588 1.1182 1.0561 0.4885 lwg | 0.60469 4.009 0.000 1.8307 1.4266 0.5876 inc | -0.03445 -4.196 0.000 0.9661 0.6698 11.6348 --------------------------------------------------- ------------------- b = raw coefficient z = z-score for test of b=0 P>|z| = p-value for z-test e^b = exp(b) = factor change in odds for unit increase in X e^bStdX = exp(b*SD of X) = change in odds for SD i ncrease in X SDofX = standard deviation of X
For each additional young child, the odds of being employed decrease by a factor of 0.23, holding all other variables constant (在其它变项不变的情
形下,每增加一个小孩,已婚妇女参与就业市场的胜算(或概率比)就
会减少 0.23倍). For a standard deviation increase in the log of the wife’s expected wages, the odds of being employed are 1.43 times greater, holding all other variables constant. The interpretation of the odds ratio assumes that the other variables are held constant, but it does not require that they be held at any specific values. It is important to know that a constant factor change in the odds does not correspond to a constant change or constant factor change in the probability. For example, if the odds are 1/1 (probability=.5) and double to 2/1 (probability=0.333), the probability increases by 0.167
– Percent change in odds ratio can also be obtained with listcoef :
( ) 1exp100 −×δβk which is listed by lisfcoef with the percent option
27
Output page listcoef, percent logit (N=753): Percentage Change in Odds Odds of: inLF vs NotInLF --------------------------------------------------- ------------------- lfp | b z P>|z| % %StdX SDofX -------------+------------------------------------- ------------------- k5 | -1.46291 -7.426 0.000 -76.8 -53.5 0.5240 k618 | -0.06457 -0.950 0.342 -6.3 -8.2 1.3199 age | -0.06287 -4.918 0.000 -6.1 -39.8 8.0726 wc | 0.80727 3.510 0.000 124.2 43.8 0.4500 hc | 0.11173 0.542 0.588 11.8 5.6 0.4885 lwg | 0.60469 4.009 0.000 83.1 42.7 0.5876 inc | -0.03445 -4.196 0.000 -3.4 -33.0 11.6348 --------------------------------------------------- -------------------
For each additional young child, the odds of being employed decrease by 76.8%, holding all other variables constant (在其它变项不变的情形下,每
增加一个小孩,已婚妇女参与就业市场的胜算(或概率比)就会减少76.8%) For a standard deviation increase in the log of the wife’s expected wages, the odds of being employed increased 42.7%, holding all other variables constant
2.5 Residuals and Influence Using Predict 2.5.1 Detecting Residuals (残差值) − Residuals are the difference between a model’s predicted and observed outcome for
each observation in the sample. When the differences are large, the observations are considered outliers(离群值). For a binary model, define
)|1Pr()|( 1xyxyE iiii ===π
− The deviation iiy π− are heteroscedastistic, with
)1()|()|( iiiiiii xyVarxyVar πππ −==− − Which suggests the Pearson residual
)ˆ1(ˆ ii
iii
yr
πππ−
−=
28
− Large values of ir suggest a failure of the model to fit a given observation − Pregibon proposed the standardized Pearson residual:
)( i
ii
std
rVar
rr =
− An index plot is a useful way to examine residuals by simply plotting residuals against the observation number.
− There is no hard-and-fast rule for what counts as a “large” residual. One way to search for problematic residuals is to sort the residuals by the value of a variable that you think may be a problem for the model. If this variable is primarily responsible for the lack of fit of some observations, the plot would show a disproportionate number of cases with large residuals. The example below sorts the data by income before plotting.
*compute standardized residuals and plotting them a gainst index logit lfp k5 k618 age wc hc lwg inc, nolog predict rstd, rs
(or statistics/general post-estimation/obtain predictions, residuals)
sort inc generate index=_n graph twoway scatter rstd index, msymbol(none) mlab el(index) list rstd index if rstd>2.5|rstd<-2.5 Output page . //Residuals and influence using predict . logit lfp k5 k618 age wc hc lwg inc, nolog (output omitted) . predict rstd, rs . sort inc . generate index=_n . graph twoway scatter rstd index, msymbol(none) ml abel(index)
29
1
2
3
4567
8
9
10
11
121314
15
16
17
18
19
20
21
22
23
242526
27
282930
31
323334
35
36
3738
39
4041
42
43
44454647
48
49
50515253
54
5556
5758
59
6061
626364
65
6667
686970
71
727374
75
7677
78
79
80
8182
8384
85
86
8788
89
9091
92
93949596
97
9899100101
102
103104
105106107
108
109110111112113
114
115116117118
119
120
121122
123
124
125
126127128
129130
131132133134
135
136
137
138
139140141
142
143
144
145146
147
148149
150151152
153
154155156
157
158159
160161
162
163164
165
166
167
168169
170171172173
174
175
176
177178179180181
182
183
184
185
186
187
188
189
190191
192
193
194195196197198199
200201
202
203204
205206
207
208209
210
211
212
213214
215
216
217
218
219220221222223
224
225226
227228
229230231
232
233234235236237
238
239240
241
242243244
245
246247
248
249
250
251
252
253
254
255256
257
258
259
260
261262
263
264
265
266
267
268269270
271
272
273274
275
276
277
278279
280281
282
283
284
285
286
287
288
289290291
292
293
294295
296
297
298
299
300
301
302303304305
306307
308
309
310
311
312
313
314
315
316
317318319
320321
322323
324
325
326
327
328
329
330
331
332
333334
335
336337338339340
341
342
343
344
345
346347348
349
350
351
352
353
354355356
357
358359360
361
362363364365
366367368
369
370
371
372373
374375376
377
378
379380
381
382
383
384
385
386
387
388
389
390
391392393
394
395
396
397
398
399
400
401
402403
404405
406
407
408
409
410411
412
413
414415
416
417418419
420
421
422
423
424425
426
427
428
429
430
431432
433
434435
436
437
438
439
440441442443
444
445446447
448
449
450451
452
453
454
455
456
457
458
459460461
462
463
464
465
466467468469470
471472
473
474
475
476
477
478
479
480
481482
483
484485
486487488
489490491
492493
494
495
496
497
498499500501
502
503504
505
506
507
508
509510511
512
513
514
515
516517
518
519520
521522
523
524
525
526
527
528
529
530531
532
533534
535536
537538
539540541
542543
544
545546
547
548
549
550
551
552
553
554
555
556557
558
559
560561
562
563
564
565
566
567
568
569
570571
572573574
575
576577
578579
580
581
582
583
584
585
586
587
588589590
591
592
593
594595596
597
598599
600601602
603
604
605
606
607608
609610611
612
613
614
615
616617
618
619
620
621622
623
624
625
626
627
628
629
630
631632
633634635
636
637
638
639
640
641
642
643
644
645
646
647648
649
650651652
653
654655
656
657658
659
660661
662
663
664
665666
667
668669
670
671672
673
674
675
676677
678679680
681
682
683
684
685
686
687
688
689690691
692
693
694695
696697
698
699
700
701
702
703704705
706
707
708709710
711712
713714715
716
717
718719720
721
722
723
724
725
726
727
728
729730
731
732
733
734
735
736
737738
739
740741742
743
744745746
747
748749
750
751
752
753
-4-2
02
4st
anda
rdiz
ed P
ears
on r
esid
ual
0 200 400 600 800index
The above cases can be listed using the command list .
. list rstd index if rstd>2.5|rstd<-2.5 +-------------------+ | rstd index | |-------------------| 142. | 3.191524 142 | 345. | 2.873378 345 | 514. | -2.677243 514 | 554. | -2.871972 554 | 752. | 3.192648 752 | +-------------------+
2.5.2 Detecting Influential Cases(重要观察值)
− Influential cases are also called high-leverage points, which have a strong influence on the estimated parameters. They can be determined by examining
the change in the estimated β that occurs when the ith observation is deleted. This is the counterpart to Cook’s distance for the linear regression model, which is the dbeta in Stata and is defined as
2
2
)1( ii
iiii h
hrC
−=
* Detecting influential cases predict cook, dbeta
(or statistics/general post-estimation/obtain predictions, residuals)
30
generate index=_n graph twoway scatter cook index, msymbol(none) mlab el(index) The commands produce the following plot, which shows that cases 338, 534, and 635 merit further examination:
373340
108669
597540539114
466
13 229
459 513 708291 723362
6086394
159 302
44 566542421168704473 519
29 380543 7507 689
290
87 277706367
502
34
9073 29342
191
48420 538371 5818 369 699268217 732
14817679647564
729233214 429 474 676200 531524 571
406
379 431299
747
480576363123 176
52070073044650
632583147 440 7484144323
500451 59534561
4 29221694 207
680
734717522378 548
419
494667549 69147 66626996 678
334240 505407388
27652 653631 746
753536313 462
184
557 600450
534
13078 716523220 357 626
356 69043616
683611418
83751673
163115
22 681
672
448119
69430 132 389 688472 588 69856138 593336 658546335
70
744317
62
491213 423510
88 183 585310 580343 478 594437 56395 139481
188224
267281149 535498260721410
664
675
447
402
182 610545425374 467 578 69723613518
104 400 58450865 737735709718484232
643
64235590
442333193 370 630361
298
702
37756 715254 740341
166
434245
638
243103
489
348259 507174 3125866741145
488 692218725
331
138303318105
62398
150686
713415309 443 541 724
272
40964 316 736
172
604257 326197 607
727613562
288
187 246
338
625304
63740 486137 171
701
38751647738222563 438256
314705279
514215 719134722
372296
417 614616 726602238 649567199
662
720565
396
376226 475401189 745622582258 511 68745372 441 731587 714
30519521366
73843914168 412 645671204 636454 551
628
416212 560 684
20
555308
677
106 483 547330
222
86
235
392275 663
537496
127 227 359 624280
354
211
463156140 295 364180
283 460244237
196
449307265 347
247
36
37
509131 521517
530 621178 577452160 695
194 250
612 752659
742297 381 464 596
55
360177129
710
67749234
164126395
146 282285550
311 589365 527499 556 655615
422 558 696618
251646
294 66880
651
353 648242
346
53353 325170 707591 685569 620476 573165 458154 206 25311
328321
231
192
320
255
6933
118102 712122
60
629125202
544
27142
428 634
57601
52828 711
155
116329 733
323143
45
169161 44435018610 151 398 619
324424 572
121
627
641
266332
657228
252
99 327 552
46
209 504 74343405100
101263 445
239 512 739670173162 306 3492 273 506393
342
181 570644
485
351261
497
579 652532
92
3919 456654411
24433 617 656337 529
274
205 6821128579 455
471479
81
609
592 703728603300 59851 606403
221
45719
153 526
76
41
51 219
693492413284
262
660
52531
48291179
518157 344203230 48759 175 315
661
8449 38612413343097 210
31914 559
82 208190 501397
117 339 515111
674
554198 278599553384
287 426120
286136
568
35875 465390
633
383
493
20154110
241
399322
113
301
40418510977
89575
468355
503
495605
27112 289
408
490248
167 249
427
639
375
461
574
586223
15665640
435
32
264
15271
39 385
469
470
93144
128
352
7423
158
650
27026
368
25
635
1070.1
.2.3
Pre
gibo
n's
dbet
a
0 200 400 600 800index
2.6 Scalar Measures of Fit (适合度)using fitstat − A scalar measure of fit can be used to summarize the overall fit of a model. − A scalar measure of fit can be useful in comparing competing models. Within a
substantive area, measures of fit provide a rough index of whether a model is adequate. However, there is no convincing evidence that selecting a model that maximizes the value of a given measure of fit results in a model that is optimal in any sense other than the model having a larger value of that measure. Measures of fits thus only provide partial information that must be assessed within the context of the theory motivating the analysis.
2.6.1 Pseudo-R2 Based on R2 in the LRM − Several pseudo-R2’s for models with CDVs have been defined by analogy to the
formula for R2 in the LRM: McFadden’s R2, also know as the “likelihood-ratio index”, compares a model with just the intercept to a model with all parameters. It is defined as
31
)(ˆln
)(ˆln12
Intercept
FullMcF
ML
MLR −=
Maximum Likelihood R 2
N
Full
InterceptML
ML
MLR
2
2
)(
)(1
−=
Cragg & Uhler’s R2
N
Intercept
NFullIntercept
UC
ML
MLMLR 2
2
&2
)(1
)()(1
−
−=
Efron’s R2
∑∑
−−
−=2
22
)(
)ˆ(1
yy
yR
i
iiEfron
π
2.6.2 Information Measures
AIC: Akaike’s information criterion is defined as NPMLAIC k 2(ˆln2 +−= . Other things being equal, the model with smaller AIC is considered the better-fitting model. BIC: The Bayesian information criterion is defined as NdfMDBIC kkk ln)( −= . The more negative the BIC, the better the fit. When comparing two models, if BIC1 - BIC2 <0, then the first model is preferred. If BIC1 - BIC2 >0, then the second model is preferred. − Example: Consider two models,
• M1 has the original specification of independent variables • M2 adds a squared age term AGE2 and drops the variables K618, HC and
LWG
32
* Scalar Measures of Fit using fitstat gen age2=age*age logit lfp k5 k618 age wc hc lwg inc, nolog fitstat, save logit lfp k5 age age2 wc inc, nolog fitstat, dif
Output page . * Scalar Measures of Fit using fitstat . quietly logit lfp k5 k618 age wc hc lwg inc, nolo g . estimates store model1 . fitstat, save Measures of Fit for logit of lfp Log-Lik Intercept Only: -514.873 Log-Lik Fu ll Model: -452.633 D(745): 905.266 LR(7): 124.480 Prob > LR: 0.000 McFadden's R2: 0.121 McFadden's Adj R2: 0.105 Maximum Likelihood R2: 0.152 Cragg & Uh ler's R2: 0.204 McKelvey and Zavoina's R2: 0.217 Efron's R2 : 0.155 Variance of y*: 4.203 Variance o f error: 3.290 Count R2: 0.693 Adj Count R2: 0.289 AIC: 1.223 AIC*n: 921.266 BIC: -4029.663 BIC': -78.112 (Indices saved in matrix fs_0) . gen age2=age*age . quietly logit lfp k5 age age2 wc inc, nolog . estimates store model2 . estimates table model1 model2, b(%9.3f) t -------------------------------------- Variable | model1 model2 -------------+------------------------ k5 | -1.463 -1.380 | -7.43 -7.06 k618 | -0.065 | -0.95 age | -0.063 0.057 | -4.92 0.50 wc | 0.807 1.094 | 3.51 5.50 hc | 0.112 | 0.54 lwg | 0.605 | 4.01 inc | -0.034 -0.032 | -4.20 -4.18 age2 | -0.001 | -1.00 _cons | 3.182 0.979 | 4.94 0.40 -------------------------------------- legend: b/t
33
. fitstat, dif Measures of Fit for logit of lfp Current Sav ed Difference Model: logit log it N: 753 7 53 0 Log-Lik Intercept Only: -514.873 -514.8 73 0.000 Log-Lik Full Model: -461.653 -452.6 33 -9.020 D: 923.306(747) 905.2 66(745) 18.040(2) LR: 106.441(5) 124.4 80(7) 18.040(2) Prob > LR: 0.000 0.0 00 0.000 McFadden's R2: 0.103 0.1 21 -0.018 McFadden's Adj R2: 0.092 0.1 05 -0.014 Maximum Likelihood R2: 0.132 0.1 52 -0.021 Cragg & Uhler's R2: 0.177 0.2 04 -0.028 McKelvey and Zavoina's R2: 0.182 0.2 17 -0.035 Efron's R2: 0.135 0.1 55 -0.020 Variance of y*: 4.023 4.2 03 -0.180 Variance of error: 3.290 3.2 90 0.000 Count R2: 0.677 0.6 93 -0.016 Adj Count R2: 0.252 0.2 89 -0.037 AIC: 1.242 1.2 23 0.019 AIC*n: 935.306 921.2 66 14.040 BIC: -4024.871 -4029.6 63 4.791 BIC': -73.321 -78.1 12 4.791 Difference of 4.791 in BIC' provides positive su pport for saved model. Note: p-value for difference in LR is only valid if models are nested.
All pseudo-R2’s are slightly larger for M1 Both AIC and BIC statistics are smaller for M1, which provides support for that
model