Post on 15-Mar-2018
All of Statistics: A Concise Course in Statistical Inference
Brief Contents 1. Introduction…………………………………………………………11
Part I Probability
2. Probability…………………………………………………………...21
3. Random Variables…………………………….…………………….37
4. Expectation…………………………………………………………..69
5. Equalities…………………………………………………………….85
6. Convergence of Random Variables………………………………...89
Part II Statistical Inference
7. Models, Statistical Inference and Learning………………………105
8. Estimating the CDF and Statistical Functionals…………………117
9. The Bootstrap………………………………………………………129
10. Parametric Inference……………………………………………..145
11. Hypothesis Testing and p-values…………………………………179
12. Bayesian Inference………………………………………………..205
13. Statistical Decision Theory……………………………………….227
Part III Statistical Models and Methods
14. Linear Regression………………………………………………...245
15. Multivariate Models……………………………………………...269
16. Inference about Independence…………………………………..279
17. Undirected Graphs and Conditional Independence……………297
18. Loglinear Models…………………………………………………309
19. Causal Inference………………………………………………….327
20. Directed Graphs………………………………………………….343
21. Nonparametric curve Estimation……………………………….359
22. Smoothing Using Orthogonal Functions..………………………393
23. Classification……………………………………………………..425
24. Stochastic Processes………………………………………………473
25. Simulation Methods………………………………………………505
Appendix Fundamental Concepts in Inference
� � ��� ����
� � ������� � ������� �
�������! #"%$& %')(*�,+.-0/1 2 !34+5-6(* 87�9: <;2+.=,�>"?/�9@ #"%=A/B"%CD3E�!9: %F�G�=�+5G�7,9: !/B"H/�+.$5+5(JIK"%G�=-@(*"H(*+5-@(*+5CL-M'N !9O-P(*F�=,�LG#(*-�+.G�-P(D"H(:+.-@(:+.CQ-LRSCL !T�7�F,(:�L9U-:CQ+.�QG�CL��VW�Q-:7X�LCQ+Y"%$5$5I�=B"H(D" T�+.G[Z+.G��>"%G�=8T>"%CD��+.G,�\$5�]"%9@G�+.G,�E^_RBT>"H(*�,�LT>"H(*+5CL-LR�"%G�=K9@�L$."H(*�Q=`=�+5-:CQ+.7�$5+.G��Q-La)����+.-b/X [ !3CL <;!�Q9:-K"cTdF�CD�feb+5=��L9?9:"%G��!� %'\(* %7�+.CQ-?(:�B"%Gf"�(JI[7,+.C]"H$6+.G#(*9@ [=,F�CQ(: !9@I�(:�_g2(8 !GTh"i(*���QTh"H(:+.CL"%$j-P(D"H(:+.-@(:+.CQ-La)kl(m+5G�CL$5F�=��Q-nT� 2=��L9@G?(: !7�+5CL-b$.+53%�0G� !G�7�"%9*"%T��Q(:9:+5C6CLF�9P;!��L-P(*+.T>"H(:+. !GoRi/X [ %(:-@(:9*"%7�7,+.G��m"%G�=MCQ$Y"%-@-:+qpBC]"H(:+. !GjRi(: !7�+.CQ-�(:�B"H(&"H9:�rF,-:FB"%$5$5I09@�L$.�Q�#"H(:�L=(* 0'N !$.$5 <esZJF�7OCQ !F�9:-@�L-Qa)���,�m9@�]"%=��Q9S+.-)"%-:-@F�T��L=>(: t3EG� <euCL"%$.CQF�$.F,-S"%G�=v"0$.+q(:(:$.�r$5+.G[Z�]"%9n"%$5�!�L/,9*",a&wn U7�9:�_;[+5 !F�-b3EG� <eb$.�Q=��!�\ %'x7,9: !/B"H/�+.$5+5(JIh"%G�=?-@(*"H(*+5-@(*+5CL-�+5-�9:�LyEF�+59:�Q=oa�����0(*�_g2(n+.-�-@F�+5(*"%/�$5�6'N !9b"%=,;H"%G�CQ�L=8F�G�=��Q9:�!9:"%=�FB"i(*�L-m"%G�=8�!9:"%=�FB"H(:�z-@(:F�=��LG#(:-La
{j|<}[|�~��_|�~��#� RE� },|]}4��~W��~W��� "%G�= ��}B�i��~������N�2},�H��~W��� "H9:�n"%$.$BCQ !G�CQ�L9:G,�L=heb+q(*�CL !$5$.�QCQ(*+5G��4"HG�=�"%GB"H$5I2�L+5G��4=B"H(*",a �B !9t-: !T��>(:+.T��%Rx-P(D"H(:+.-P(*+.CQ-d9@�L-:�L"%9:CD�cer"%-MCL %G[Z=�F�C_(*�L=�+5Gh-P(D"H(:+.-@(:+.CQ-r=��Q7B"%9@(:T>�QG#(*-reb��+5$.�n=B"H(*"tT�+.G�+5G��d"HG�=vT>"%CD��+.G,�m$5�]"%9@G�+.G��\9:��Z-:�L"%9:CD�\es"%-�CL %G�=�F�C_(*�L=t+.G\CL !T�7�F,(:�L9�-@CL+.�QG�CL�)=��Q7B"%9@(:T>�QG#(*-La&�2(D"H(:+.-P(*+.CQ+Y"%G,-j(*�� %F��!�#((*�B"i(vCL %T>7�F[(*�L9>-:CQ+.�LG#(:+.-@(:-he��L9:� 9@�L+5GE;%�LG#(*+5G���(:���Keb���Q�L$�a��r !T>7,F,(*�Q9h-:CQ+.�QGE(:+.-P(*-(*�� %F��!�#(b(*�B"H(b-P(D"H(:+.-P(*+.CL"%$j(:���L !9PI�=�+5=�Go��(n"%7�7�$qIh(* U(*�,�L+.9�7,9: !/�$5�LT�-La
����+.G��%-U"%9:��CD�B"HG��!+.G,��a��2(*"H(*+5-@(:+.CL+."%G�-tG� <e�9:�QCL !�%G�+.�Q�v(*�B"H(MCL !T�7�F,(:�L9O-:CQ+.�QG[Z(*+5-@(*-r"%9:�mTh"H32+5G��\G� <;!�L$1CL !G#(:9:+./,F,(*+5 !G�-Seb��+.$5�mCQ !T>7,F,(*�Q9r-:CQ+.�QGE(:+.-P(*-rG, �ef9@�LCL %�!G�+.�Q�(*���0�!�QG��L9:"%$.+q(JIv %'�-@(*"H(*+5-@(:+.C]"H$1(*�,�L !9PI?"%G�=8T>�_(*�� 2=� !$5 !�%I%a&�r$.�_;!�Q9m=B"i(D"OT>+5G�+.G,�O"%$qZ�! !9@+5(*�,T>-z"H9:�UT> %9:�M-@C]"%$."%/�$.�M(:�B"%GA-@(*"H(*+5-@(*+5CL+."%G�-6�_;!�L9z(:�� !F��!�A7X !-:-@+./�$5�%aU�B !9:T>"%$-@(*"H(*+5-@(*+5C]"%$#(:���L !9PId+5-xT� !9:�)7X�L9@;H"%-@+5;%�r(*�B"HGMCL !T�7�F,(:�L9�-:CL+5�LG#(*+5-@(:-��B"%=O9@�]"%$5+.�L�Q=oa���$.$"%�!9@�L�U-@(:F�=��LG#(:-\eb�� 8=��L"%$xeb+5(:� (*����"%GB"H$5I2-:+5-6 %'�=B"H(D"�-@�� !F�$5=A/X�Oer�Q$.$��%9: !F�G,=��L=+.G?/B"%-@+.Cz7�9@ !/B"%/,+.$.+q(JIv"HG�=?Th"H(:���LT>"H(:+.C]"H$j-@(*"H(*+5-@(:+.CL-QaS�m-@+.G��O'W"%G�C_Iv(* 2 !$5-s$.+53%�zG��QF[Z
�!�
��� ��������� ����������������������� �!�
9*"H$1G��_(*-QR�/1 2 !-P(*+.G,�M"%G�=�-@F�7�7X !9@(r;!�LC_(* !9�T>"%CD��+5G��L-)eb+5(:�� !F,(�F,G�=��L9@-@(*"%G�=�+5G��O/B"%-@+.C-@(*"H(*+5-@(:+.CL-�+5-�$.+53%�z=� !+5G��O/�9:"%+.G?-:F,9:�!�Q9@I?/X�Q'N !9@�z32G, �eb+5G����� <e (* �F�-@�t"O/B"HG�=B"%+5=oa
"rF,(>eb���L9@� C]"%Gu-P(*F�=��QG#(*-�$.�L"%9:G /B"%-@+.C47�9@ !/B"%/,+.$.+q(JIc"%G�= -@(D"i(*+.-P(*+5CL-hyEF�+5CD3E$5I$#wm <eb���Q9:�Ha\�s(6$.�]"H-@(]RX(*�B"i(6es"H-6TdI`CL !G�CQ$.F�-@+. !G`eb���LG TdI8CL %T>7�F[(*�L9�-:CL+5�LG�CQ�OCL !$ Z$.�L"%�!F��Q-�3%�Q7,(�"%-:3E+.G,�nT��&%�'�( ���L9@�)CL"%G0k1-@�LG�=tTdI6-P(*F�=,�LG#(*-�(* ��!�_(�"s�! [ 2=0"�F�G�=��Q9PZ-@(*"%G�=�+5G��8 %'rT� 2=��L9@G�-P(D"H(:+.-@(:+.CQ-zy2F,+.CD3E$5I$#*) �����U(JI[7,+.C]"H$&Th"H(:���LT>"H(:+.C]"H$�-@(*"H(*+5-@(*+5CCL %F�9:-@�`-@71�QG�=�-h(: [ TMF�CD� (*+5T>�? !Gc(*�Q=�+. !F,-h"%G,= F�G,+.G�-@7�+.9@+.G�� (* %7�+.CQ-4VNCL !F�G#(:+.G��T��Q(*�, [=�-QR�(Jer �=�+5T>�QG�-:+5 !GB"%$1+.G#(*�Q�!9*"%$5-��Q(:C%a ^>"H(�(:���t��g[71�QG�-:�\ %'�CQ <;!�L9@+.G���T� [=��Q9:GCL %G�CL�Q7,(*->VW/X 2 %(*-P(*9*"H7�7�+.G,��RoCLF,9@;!�U�Q-@(*+5Th"i(*+. %GoRj�!9:"%7���+5C]"%$�T> 2=��Q$.-��_(*CHa ^�aU�[ ?km-@�Q( !F,(b(: >9@�L=��Q-:+.�%G` !F,9nF,G�=��L9@�!9*"H=�FB"H(:�t�� %G� !9:-nCL %F�9:-@�t !G87�9@ !/B"%/�+5$.+q(JIv"%G,=KT>"H(*����ZT>"H(*+5C]"%$�-P(D"H(:+.-@(:+.CQ-Lan����+.-m/1 2 !38"%9: %-:�t'N9@ !T (*�B"i(6CL %F�9:-@�%a�+n�L9:�t+.-�"h-:F�T�T>"%9@I? %'(*�,�\T>"%+5Gv'N�]"H(:F�9:�Q-n H'�(:��+.-�/X [ %3Xa� ab������/1 2 !3U+.-)-:F�+q(D"%/,$.�m'N %9��� %G� !9:-rF�G�=��Q9:�!9:"%=�FB"H(:�L-s+5GvT>"H(*�oR2-@(*"H(*+5-@(:+.CL-�"%G�=CQ !T>7,F,(*�Q9�-@CL+.�QG�CL�6"%-�e��L$5$j"%-��!9*"H=�FB"H(:��-@(:F�=��QGE(:-�+.G>CL !T�7�F,(:�L9r-@CL+5�LG�CQ�6"%G�= %(:���L9�yEFB"HGE(:+5(*"H(*+q;!�zpB�L$5=�-La
� abkvCQ <;!�L94"%=,;H"%G�CQ�L= (: !7�+.CQ-8(:�B"H(K"%9:� (*9:"%=�+q(*+. %GB"%$.$qI G� %(8(D"%F��%�E(K+.G " pB9@-@(CQ !F�9:-@�%a �B !9>��g,"%T>7,$.�%RrG� !G�7B"H9*"%T��Q(:9:+.C?9@�L�!9@�L-:-@+. !GjR�/1 2 %(*-P(*9:"%7�7�+5G���Rr=,�LG[Z-@+5(JIv�Q-@(:+.T>"H(*+5 !G?"%G�=8�!9*"H7���+.CL"%$jT� 2=��L$5-La
, abkO�B"];%�4 !T�+5(:(:�L= (* !7�+5CL->+.G 7,9: !/B"H/�+.$5+5(JI�(*��"H(v=� �G� %(v7,$Y"]I "�CL�QG#(*9*"H$�9: %$.�+5G�-@(*"H(*+5-@(*+5C]"%$)+.G['N�L9:�QG�CL�HaA� %9d�_g,"%T�7�$5�%R&CL %F�G#(*+.G,� T>�_(*�� 2=�-M"H9:�h;2+.9P(*FB"H$.$5I"%/,-:�LG#(La
- abkJG �!�LG,�L9*"H$�R)kt(*9PI�(* A"];! !+5= /X�L$."%/X !9:+5G��4(*�Q=�+. !F,-�CL"%$.CQF�$Y"i(*+. %G�-M+.Gc'W"];! !9U %'�QT>7���"%-:+5�L+.G,�OCL !G�CQ�L7,(:-La
. abk1CL <;!�Q9xG� !G,7B"%9*"HT>�_(*9:+5CS+.G,'N�Q9:�LG,CL�)/1�_'N !9:�)7B"%9:"%T>�_(*9@+.C&+.G['N�L9:�QG�CL�Hax����+5-�+5-�(*��� !7,71 !-@+5(:�& %',T� !-P(�-P(D"H(:+.-P(*+.CQ-�/1 2 !3E-�/�F,(�kj/X�L$5+.�_;!�S+5(�+5-o(*���)9:+5�!�#(�es"]Iz(* n=� b+5(La/�"%9*"%T��Q(:9:+5C\T� 2=��L$5-m"H9:�dF,G�9:�L"%$.+5-@(*+5C\"%G�=K7X�L=�"%�! !�!+5C]"%$5$5IhF�G�GB"H(:F�9*"H$�a\V0+n <ee� !F�$5=>e���3EG� <e (*���n�Q;%�L9@IE(:��+.G��U"H/1 !F[(�(:���m=,+.-@(:9:+5/�F,(*+5 !G��_g[CL�Q7,(s'N !9S !G��m !9(Je� M7B"%9:"%T��Q(*�Q9:-1#i^dk&+5G#(*9: 2=�F�CQ�m-P(D"H(:+.-P(*+.CL"%$B'NF�G�C_(*+. %GB"%$.-)"%G,=v/1 2 %(:-@(*9:"%7�7�+5G��;%�L9@I?�L"%9:$qI�"%G,=`-P(*F�=��QG#(*-bpBG�=?(*��+5-�y2F,+5(*�0GB"i(*F�9:"%$�a
2 abkt"%/B"%G,=� !Gc(*���?F�-@FB"%$3':�x+59:-@(M���Q9:T 45/&9: %/B"%/�+5$.+5(JI6) "HG�=7':�[�LCQ !G�= ���L9@T4��2(D"H(:+.-P(*+.CQ-8)8"%7�7,9: #"%CD�ja��[ !T���-P(*F�=,�LG#(*-\ !G�$qIK(D"H3%�O(*���OpB9@-@(t�B"%$q'r"HG�=�+q(
� ,
e� !F�$.=\/X�r"bCL9@+.T��)+q'[(:���QI0=,+.=\G� %(�-:�Q�s"%G#I0-@(*"H(*+5-@(:+.C]"H$%(*���Q !9@I%a��BF�9P(*���Q9:T� !9:�HR7�9@ !/B"%/�+5$.+q(JI`+5-6T� !9:�O�QG��#"%�!+5G��heb���QG�-P(*F�=,�LG#(*-\C]"%G -@�L�U+5(z7,F,(z(* �e� !9:34+5G(*�,�\CQ !G#(*��g[(m %'�-@(*"H(*+5-@(*+5CL-Qa
� ab����� CQ !F�9:-@� T> <;!�Q-v;!�Q9@IuyEF�+5CD32$qIu"%G,=uCL <;!�Q9:-8TMF�CD� Th"i(*�L9@+Y"%$�a��8I CL !$ Z$.�L"%�!F��Q-��@ !3%�U(*�B"i(0k�CL <;%�L9M"H$.$� %'�-@(*"H(*+5-@(:+.CL-z+5GA(:��+.-zCQ !F�9@-:�>"%G�=A���LG�CQ�>(:���(*+q(*$5�%a������sCL %F�9:-@��+.-�=��QTh"HG�=�+.G,�6/�F,(&k��B"];%�ser !9@3%�Q=>�B"H9:=M(* �Th"%3H�r(*�,�sTh"�Z(*�Q9:+."%$#"%-�+.G#(*F,+5(*+q;!�S"%-�7X !-:-@+./�$5�&-: m(:�B"H(�(*�,��T>"H(*�Q9:+."%$%+5-�;%�L9PItF,G�=��L9@-@(*"%G�=B"%/,$.�=��Q-:7�+q(*�z(*���z'W"%-P(n7B"%CL�HaS��G#IEes"]I!RX-:$5 <e CQ !F�9:-@�L-m"%9:�0/X !9:+5G���a
� ab��-��n+5CD�B"%9@= �B�QI2G�T>"%G 71 !+5G#(*�L= !F,(LRs9:+5�! !9U"%G�= CL$Y"H9:+5(JI�"%9:�8G� %(>-@I2G� !G#I#ZT� !F�-La�kn��"<;%�M(*9@+.�L=`(* v-P(*9:+53%�O">�! [ 2=4/B"H$Y"%G�CQ�%a6�� �"];! %+.= �!�_(:(*+5G��h/X !�!�%�L==� <ebGM+5GdF�G�+5G#(*�L9@�L-P(*+.G,�m(*�QCD��G�+5C]"%$2=��Q(*"%+.$5-LR�T>"%G#It9@�L-@F�$5(:-x"%9@�r-P(D"H(:�L=teb+q(*�� %F,(7�9@ [ %' a`������/�+5/�$.+5 !�!9*"H7���+.CO9@�Q'N�Q9:�LG,CL�L-U"i(d(*�,�v�LG�=� H'b�]"%CD��CD�B"H7,(*�Q9O71 !+5G#((*�,�\-P(*F�=,�LG#(m(: �"%7�7�9@ !7�9:+."H(*�z-@ !F�9@CL�L-Qa
a�6GdTdIze��L/,-:+5(:��"%9@�)pB$5�L-�eb+5(:����CL 2=��Seb��+.CD�M-P(*F�=,�LG#(*-�CL"%GdF,-:��'N %9�=� %+.G��m"%$.$(*�,��CL %T>7�F[(*+.G,��a�+n �e��Q;%�L9QR,(*���m/1 2 !3O+5-)G, %(�(:+.�Q=>(* � "%G�=v"HGEI�CQ !T>7,F,(*+5G��$Y"HG��!FB"%�%�6C]"%G8/X�\F,-:�L=ja
�����mpB9@-@(�7B"%9P(r %'1(*���n(*��g[(�+.-SCQ !G�CL�Q9:G��Q=veb+5(:�>7�9@ !/B"%/,+.$.+q(JId(:���L !9PI!RE(*���b'N !9@Th"%$$Y"%G,�!FB"%�!�z H'xF�G,CL�L9P(D"%+5G#(JI�eb��+.CD�8+.-s(:���0/B"%-:+5-b %'x-P(D"H(:+.-P(*+.CL"%$o+5G,'N�L9@�LG�CQ�%aS�����\/B"%-:+5C7�9: %/�$.�QT�(*��"H(ber�0-@(:F�=,I?+5G87,9: !/B"H/�+.$5+5(JI>+.- %
��� ������������������� ��� !"���#�$���&% !('*)+�-,�,+.0/21����3� !"�4�51���% !('0%6��!(�#�7�+,8'�9:�51��;'0<��>=)-'*?4�+,#@
�����)-:�LCQ !G�=\7B"%9P(� %'[(:���S/1 2 !3m+.-�"%/X !F,(�-P(D"H(:+.-@(:+.CL"%$%+5G,'N�L9@�LG�CQ��"HG�=t+q(*-�CQ$. !-@�&CL !F�-@+.G�-QR=B"H(*"0T�+.G,+.G��z"HG�=UTh"%CD�,+.G���$5�]"%9@G�+.G��,a������b/B"H-:+.C�7,9: !/�$5�LT %'X-@(D"i(*+.-P(*+5C]"%$[+.G,'N�Q9:�LG,CL�+.-s(:���0+.G#;!�Q9:-:�\ %'�7�9: %/B"%/�+5$.+5(JI %
��� �����A�51��B'0<��C)-'*?4�+,+.0/21����3)-���B/D�;,E�GF �IH6'*<����51���% !('*)+�-,�,��51����J�6����� !K=���C�-�L�51��A��������@
�����L-@�O+.=��L"%-6"%9@�d+5$.$.F,-@(*9:"H(*�Q=`+5G4�x+5�!F�9@� � a � a�/&9:�Q=�+.C_(*+. %GoR1CQ$Y"%-@-:+qpBC]"H(:+. !GjRXCQ$.F�- Z(*�Q9:+.G,�M"%G�=h�L-@(:+.T>"H(*+5 !G>"%9:�m"%$.$ -@71�QCL+Y"H$1CL"%-:�Q-s %'�-P(D"H(:+.-@(:+.CL"%$ +.G,'N�Q9:�LG,CL�%aNMz"i(D"d"HGB"%$qZI2-:+.-QR�T>"%CD��+.G,��$.�L"%9:G,+.G��d"%G�=v=B"i(D"dT�+.G�+5G��t"%9:��;H"%9@+. !F,-rGB"%T��L-��!+5;%�LGv(: d(*����7�9*"%C�Z(*+5CL�h H'�-@(*"H(*+5-@(*+5C]"%$S+5G,'N�L9@�LG�CQ�%R&=��Q71�QG�=�+.G,�4 !G�(:���hCQ !G#(*�_g2(La4�����v-@�LCL %G�=c7B"%9@(t %'
� - ��������� ����������������������� �!�
Mz"H(*"O�!�LG��Q9*"H(:+.G��U7�9@ [CQ�L-@- 6/�-:�Q9@;%�L=K=�"H(D"
/S9@ !/B"%/,+.$.+q(JI
kJG,'N�L9@�LG�CQ�d"HG�= Mz"i(D" �`+5G�+.G��
�x+5�!F�9:� � a � % /&9@ !/B"%/�+5$.+q(JIv"%G,=8+5G,'N�L9@�LG�CQ�%a
(*�,�K/X [ !3�CQ !G#(D"%+5G�-� !G��KT� !9@�`CD�B"H7,(*�Q9h !G�7,9: !/B"H/�+.$5+5(JI�(:�B"H(hCL <;!�Q9:-v-P(* 2CD�B"%-@(:+.C7�9@ [CQ�L-:-@�L-n+.G�CQ$.F�=,+.G�� �K"H9:3% <;?CD�B"H+.G�-Qa
kh�B"];%�A=,9*"]ebG ���L"];[+5$5I !G %(*���Q9�/1 2 !3E-?+.G T>"%G#I�7�$."%CL�Q-La �` !-P(8CD�B"%7,(:�L9:-CL %GE(*"%+.G "�-@�LC_(*+. %GfCL"%$.$5�L= "r+5$./�$5+. !�%9*"%7��,+.C �b�LT>"%9:3E-veb��+5CD�f-@�L9P;!�L-8/X %(*�u(* c"HC_Z3EG� <eb$.�Q=��!�8TdIc=��L/[(>(: � %(:���L9>"%F,(:�� !9:->"%G�=c(* �71 %+.G#(�9:�]"H=��L9@-�(: � %(:���L9UF�-@�Q'NF�$9:�_'N�L9@�LG�CQ�L-Laukter !F,$.= �L-:7X�LCQ+Y"%$5$5I�$5+.3H�?(: �T��LG#(:+. !G�(*�,�`/X [ %32-�/#I M6� � 9: 2 %(�"%G�=�[CD���Q9@;2+.-@� V ������� ^m"%G,= � 9@+.T�T>�_(:(m"%G�=A�E(*+.9@�]"%3H�L9UV ��6� � ^b'N9: %T eb�,+.CD� kn"%=B"%7[(*�L=T>"%G#Iv�_g,"%T�7�$5�L-n"%G�=8�_g[CL�Q9:CQ+.-:�Q-La
��.
� �����#�7,C�#�7)-,����N�������L� �6� ������� )+�#�7'*��� !(F
�2(D"H(:+.-P(*+.CQ+Y"%G,-b"%G�=KCQ !T�7�F,(*�Q9b-:CL+5�LG#(*+5-@(:-m %' (:�LG8F�-:�t=�+��1�L9@�LG#(n$Y"%G,�!FB"%�!�z'N !9�(:���-*"%T��K(:��+.G��,a +m�Q9:�4+5-v"�=�+5CQ(*+5 !GB"%9PI�(*�B"i(v(*���K9:�]"H=��L9vT>"]Ices"%G#(�(: �9:�_(*F�9@G�(* (*��9@ !F��!�, !F,(b(*���0CQ !F�9:-@�%a
� �����#�7,C�#�7)-, '*?�%I<��C��! � )��7����)-� ���-���6�$����L-P(*+.T>"H(:+. !G $.�L"%9:G�+5G�� F�-@+.G��U=B"i(D"M(* U�L-P(*+5Th"H(:�
"%G?F�G,32G, �ebG`yEFB"%G#(*+q(JICL$."%-:-@+5pBCL"H(*+5 !G -:F�7X�L9P;2+.-:�Q=K$5�]"%9@G�+.G�� 7�9@�L=�+5CQ(:+.G��U"U=�+.-@CL9@�Q(*�� 'N9: %T �����CL$5F�-@(:�L9:+5G�� F�G�-@F�71�Q9@;2+.-@�L=4$5�]"%9@G�+.G,� 7�F[(:(*+5G��O=B"H(*"�+5G#(* O�!9@ !F�7�-=B"H(*" (*9:"%+.G�+5G��O-*"HT>7�$5� V���������P^�����������V���������%^CL <;H"%9:+."H(*�Q- 'N�]"H(:F�9:�Q- (:��� �"!�� -CL$."%-:-@+5pB�Q9 �#I[7X %(:���L-@+.- "OT>"%7�'N9: !T�CL <;H"%9:+."H(*�Q-s(* � !F,(*CQ !T��L-�#I[7X %(:���L-@+.- # -@F�/�-:�_(� H'�"O7B"%9:"%T��Q(*�Q9�-:7B"HCL�%$CL !G[pB=��LG,CL�\+.G#(*�Q9@;H"%$ # +5GE(:�L9P;%"H$j(*�B"i(bCL !G#(D"H+.G�-bF�G�3EG� <ebG`y2F�"%G#(*+5(JI
eb+q(*�8"O7�9:�Q-:CL9@+./X�L=?'N9:�Qy2F,�LG�C_I=�+.9@�LC_(*�L=`"%C_I[CQ$.+5Cz�!9*"%7,� "s"]I!�Q-mG��_( TMF,$5(*+q;H"%9:+."H(*�n=�+.-P(*9@+./�F,(:+. !Gveb+q(*�
-@71�QCL+5p��L=8CL !G�=,+5(*+5 !GB"%$+5G�=��L7X�LG,=��LG�CQ�t9@�L$."H(*+5 !G�-
"s"]I!�Q-:+Y"HG`+5G,'N�L9@�LG�CQ� "s"]I!�Q-:+Y"HGK+.G['N�L9:�QG�CL� -P(D"H(:+.-@(:+.CL"%$jT��Q(:�� [=,-b'N %9�F�-:+5G��U=B"H(D"(: �F,71=B"i(*�0-:F�/��@�LCQ(:+5;%�t/X�L$5+.�Q'N-
'N9:�Qy2F,�LG#(*+5-@(m+.G,'N�Q9:�LG,CL� # -P(D"H(:+.-@(:+.CL"%$jT��Q(:�� [=,-b'N %9�7�9: 2=�F�CQ+.G��7X !+.G#(��Q-@(:+.T>"H(*�Q-m"HG�=?CL !G,p�=��LG�CQ�d+5GE(:�L9P;%"H$.-eb+q(*�?�!FB"%9:"%G#(*�Q�L-b !G?'N9:�LyEF��QG�CQI8/X�L�B"];2+. %9
$Y"%9@�!�6=��_;2+Y"H(:+. !G?/X !F�G�=�- /1�n� $5�]"%9@G�+.G,� F�G,+5'N !9@T�/X !F�G�=,-m !G?7�9@ !/B"%/,+.$.+q(JIh H'x�Q9:9: %9:-
� 2 ��������� ����������������������� �!�
� '������#�7'*�
� FI?�H6'�� ���-���6�$���� 9@�]"%$oGEF�Td/1�Q9:-+.G['������xV �1^ +5G,pBTMF�T %x(:���0$Y"%9@�!�L-P(bG2F,TM/1�Q9��h-:F,CD�`(*��"H(������V �1^�'N %9b"%$.$�� ���
(:��+.G�3> %'�(*��+5-b"%-�(*���0T�+.G�+5TMF�T %'�-:F,7 ����� �V �1^ -@F�7�9:�QTMF�T %�(*���0-@Th"%$5$.�Q-@(�GEF�TM/X�L9��v-@F�CD�`(:�B"H(�����xV �1^r'N !9b"%$5$�� ���
(:��+.G�3> %'�(*��+5-b"%-�(*���0T>"ig[+.TMF�T %'���� �! V �#" � ^ V �#" � ^ %$&$&$' , � �( � )+* �-,) ,/. �-0 )21 ,3)V54x^ � "%T�Th"t'NF,G�CQ(:+. !G76�89 �;: 0���<-0>=+? �@ !F[(*CL %T>�A -:"%T>7,$.�6-@7B"%CL�hVW-@�Q(n %'� !F,(:CL !T��L-*^� �_;!�LG#(MVW-@F�/�-:�_(� H' A ^B�xV @ ^ +5G�=�+.CL"H(* %9r'NF�G�C_(*+. %GDC � +5' @ �E� "%G�= � H(*���Q9@eb+.-@�F�VG�6^ 7�9@ !/B"%/,+.$.+q(JI� %'x�_;!�QGE(H�I � I GEF�TM/X�L9b %'�7X !+.G#(*-�+5G?-:�Q(��JLK CQF�TMF�$."H(*+q;!�z=�+5-@(*9@+./�F[(*+. %Gh'NF�G�C_(*+. %GMK 7�9@ !/B"%/,+.$.+q(JI�=��LG�-@+5(JI�VN !9�Th"H-:-D^�'NF,G�CQ(:+. !G�ONPJ � �B"%-b=�+5-@(:9:+./,F,(*+5 !GQJ�ONR � �B"%-b=��QG�-:+q(JI7� S4 � "%G�= �B"];!�0(*�,�\-:"%T��\=,+.-@(:9:+5/�F,(*+5 !G� � �������������TNPJ -:"%T>7,$.�6 %'x-:+5�L� � 'N9: !TUJV -P(D"%G�=�"%9:=`wm !9@Th"H$j7�9@ !/B"%/�+5$.+q(JIh=��QG�-:+q(JIW -P(D"%G�=�"%9:=`wm !9@Th"H$j=�+5-@(*9@+./�F[(*+. %Gv'NF�G�CQ(:+. !GX : F�7,71�Q9Y4�yEFB"HGE(:+.$5�z %'[Z�V � � � ^r+�a �%a\W 0�� V � " 4�^]6V �A^ 4^6_� ? JhV`�1^ ��g,7X�LC_(*�Q=`;H"%$.F,��VWT��]"%G ^r H'x9:"%G�=� !T�;%"H9:+Y"H/�$.� �]6VGa,V �A^@^ 4 6 a�V`�1^ ? JhV`�1^ ��g,7X�LC_(*�Q=`;H"%$.F,��VWT��]"%G ^r H'�a,V �A^bUV �A^ ;H"%9@+Y"%G�CQ�z %'�9*"%G�=, !T ;H"%9:+."%/�$5� � '�� V � � M^ CQ �;H"%9@+Y"%G,CL�z/X�Q(Je��L�LG � "HG�= � � ������������� =B"i(D"� -:"%T>7,$.�6-@+.�L�c"ed CQ !G#;!�L9@�!�LG,CL�d+5G?7�9: !/�"%/�+.$5+5(JIf CQ !G#;!�L9@�!�LG,CL�d+5G?=�+.-P(*9:+5/�F,(:+. !Ggih"ed CQ !G#;!�L9@�!�LG,CL�d+5G?y2F�"%=�9*"i(*+.CzT��]"HG���TjPZ�V k �2lDm� ^ V��"� " k�^onpl � f Z�V � � � ^
� �
� '������#� '0� '0���#�$�I<��-�� FI?�H�'�� ���-���6�$���� -P(D"H(:+.-P(*+.CL"%$oT� [=,�L$ C,"O-:�Q(n %'x=�+.-P(*9@+./�F,(:+. !G>'NF�G�CQ(:+. !G,-LR
=,�LG�-@+5(JI�'NF�G�C_(*+5 !G�-� !9b9:�Q�!9:�Q-:-@+. !G?'NF�G�C_(*+. %G�-� 7�"%9*"%T��Q(:�L9�� �Q-@(:+.T>"H(*�z %'�7B"%9:"%T��Q(*�Q9�MVGJd^ -P(D"H(:+.-P(*+.CL"%$j'NF�G,CQ(*+5 !GB"%$&VN(*�,�\T��]"HGoR,'N !9���g�"HT>7�$5�<^� �BV � ^ $5+.3H�L$.+5�� 2 [=h'NF,G�CQ(:+. !G��� 4��,V�� �#^ ��� n� � d ���� 4�vV�� �%^ I ��� n�� � I +.-�/X !F�G�=,�L=?'N !9�$Y"%9@�!� ��"� 4��� &V�� �#^ ��� n� � c"�d �
�"� 4�� SV�� �!^ I ��� n� � I +.-�/X !F�G�=,�L=K+5G�7�9: %/B"%/�+5$.+5(JI�'N !9�$."%9:�!� �
��� ��������� ����������������������� �!�
� , �G9 < � � ���51�����)+�C,
< � 4�� 8) � 9���) , 4 �� � � ��
m , � $&$&$� 8� �
) a � 4 � �� 0 � 'N !9 �� a �$.+5T ��� 8 ( ������ * � 4 < ������ � "%T>T>"\'NF,G�CQ(:+. !G�+5-s+.-s=,�QpBG��Q=?/EI#3�VG4�^ 4�6\89 � : 0�� < 0>= ? ��'N !9 4 � � a&kl'
4�� � (*���QG�3)V54x^ 4 V54 " � ^�3)V54 " � ^�aSkl' � +.-n"%G8+.G#(*�Q�!�L9s(:���LG 3)V � ^ 4 V �Q" � ^ � a�[ !T��z-:7X�LCQ+Y"%$1;%"H$.F��Q-m"H9:�&%�3)V � ^ 4 � "%G�= 3)V � n � ^ 4�� �&a
� ����� �
� ��� �� �� ������
���
� � ��� ��� � �
� ��� �� �� ������
��� �� ������������������� ���! #"%$&"('*)*',+.-/'*01+3254�67$8+32(4�67$8+3':9;$&)<)*$&=5>#?%$8>#4A@B &�DCE?%$&=<+3',@F-G':=5>H?(=59�4I�!+J$8'*=<+.-#KDL�49;$&=M$&N5N():-ON5�! #"%$&"('*)*',+.-P+Q254� &�!-R+Q H$TS5':U&4��!0Q4A0Q4I+1 &@(N5�Q &"5)*4I670IVW@B�! #6X9I #'*=/Y%':N5N5'*=(>Z+3 +3254/$8=%$&):-[0!'*0\ 8@]9I #6�N5?(+34I��$&):># #�Q',+3256�0�K1^�254O0_+J$&�_+3'*=(>`Na #':=<+Z'*0A+3 `0!Na4I9�':@F-b03$&6�N5):40QN%$89�4&V%+Q254/0Q4I+c &@]Nd #0Q0!'*"5):4O #?(+39I #6�4�0�K
��e� fgih j�k�l f�jmgi�clRnogi �� prq7lR �Dn^�254`s�t%u�vxwBy�s�v1t%z#y|{RVa':0c+Q254i0!4I+H &@DNd #0Q0!'*"5):4M &?(+39I #674I0P 8@�$&=}4e~GNa4I�Q'*6�4�=<+�K�] #':=<+30O��'*=�{�$&�!479�$&)*):4�S�s�t%u�v]w�y��d���Wz#�du�y[s� &�M�WyEt%w����<t(�����d�1s&���H�dyG�d�Wsb$&�!40Q?5"(0Q4I+Q0H &@x{RK�E�&�<�M�<� ���5�,���B���\�\ ¢¡�£J£T¤¦¥J¡¨§F©� ª��§«¥J�\ F¬%�© {¯®�°¨±m±³²J±|´T²Q´P±³²Q´T´/µG¶�· ¬%�c�¸&�©% B¬(¤& A F¬%��¹�ºJ£ A .¡�£J£�§*£O¬%�Q¤<»¨£/§*£T¼ ®½°¨±m±|²J±³´/µG¶\¾
�E�&�<�M�<� ���5�¿�¯À�� �ÂÁ �O B¬%�á¨Ä[ ¢¥J¡¨Åb�¦¡¢�M¤bÅb�Q¤¨£Ä[º3�Åb�©% Z¡¢�R£�¡¨Åb�ZÆ%¬[ÇW£§«¥Q¤&È�É�Ä5¤&©%Ê ª§F �Ç�Ë%�;¡¨ºM�!ÌE¤&ÅHÆdÈ*�JËx ¢�ÅHÆ%�º!¤& �ÄGº3� ¶7· ¬%�© {¯®ÎÍήÂÏ_ÐHÑX²eÑÓÒ�¶`· ¬%�i�¸&�©% D F¬(¤& � B¬%�Åb�Q¤¨£ÄGº3�Åb�©% ]§*£HÈ,¤&ºªÔ[�ºc B¬(¤&©³Õ[Ö Á Ä[ xÈ*�J£J£P B¬(¤&©m¡¨ºO�QÉ�Ä5¤&È× .¡¦ØEÙ`§*£�¼ ®ÂÏ ��Ú ²eÛ&ܨÝ.¶�¾�E�&�<�M�<� ���5�¿Þß�B�T�\�P .¡�£J£R¤b¥J¡¨§F©R�;¡¨º3�¸&�ºT F¬%�©| F¬%�T£�¤&ÅHÆdÈ*��£.Æ(¤[¥J�c§*£P B¬%�P§F©;¹�©%§F .��£��
{¯®áà%��®�ÏF��âe²!�]ã;²_�xä�²;å;å;å²eÒ²c�]æ�ç�°¨±³²Q´/µ%è1åÀ�� xé Á �� B¬%�`�¸&�©% � F¬(¤& \ F¬%��¹�º3£ �¬%�Q¤<»¤Æ<Æ%�Q¤&º3£¦¡¨©m F¬%�� F¬[§FºQ»b .¡�£J£ ¶�· ¬%�©é ®êàDÏF��âe²!�]ã;²!�]ä;²�å;å;å�²JÒ\ëP��â�®Î´T²_�xã\®ì´T²_�xä\®X±|²1�1æ1ç�°¨±³²Q´/µZ@B &�Hí�î¯Üaè1åH¾
Û �
Û#Û ������������ ������������������������
� ',U#4I=ß$8= 4U#4I=E+ ¼ VA)*4+ ¼� ® °;� ç½{�!¦�#"ç ¼ µ}S54I=5 &+Q4�+Q2549� &67N5):4�6�4�=<+ &@ ¼ K%$.=(@B &�Q67$&)*),-#V ¼ 9�$&= "a4Ã�Q4�$&S�$&0'&!=5 &+ ¼ K)(½^�254�9� #6�N5):4�6�4�=<+O &@\{ ':0R+32544�6�N(+.-0!4I+�*5K�^�254M?(=5'* #=� &@]4IU&4�=<+30 ¼ $&=5S,+á':0cS(4.-%=54IS ¼0/ +®½°;� ç {�!T� ç¼ #�D�¯ç1+� #�\� çm"d &+32 µ�2T25'*9J279;$&=�"a4T+32( #?5>#2<+� &@�$&03& ¼ #�4+bK5(1$ @ ¼ âe² ¼ ã;²;å�å;å'*0T$`0!4�CE?54�=(9�4i 8@D0Q4+30�+32(4�=67æ98aâ ¼ æ ®êà×� ç�{rë`�¯ç ¼ æ�@B #�T$8+c)*4�$&0!+T &=54/'_è1å
^�254`':=E+Q4��!0Q4�9+3': #= &@ ¼ $&=5S0+ '*0 ¼0: + ® °;� çß{�!R��ç ¼ $&=(S�� ç;+�µ7�!4;$&S& ¼ $8=5S1+bK5(=<G #6�4I+Q'*6�4�0>2�4?2T�Q',+34 ¼ :;+ $&0 ¼ +bK@$ @ ¼ âe² ¼ ã�²;å;å�åa':0O$�0Q4IC[?(4�=59I4 &@]0Q4+30T+3254I= 6Aæ98aâ ¼ æ ® à � ç�{rëÃ�¯ç ¼ æ�@B #�T$&)*)a' è åB�4I+ ¼ ÐC+®�°;�¯ë¦� ç ¼ ²!�D"çE+7µEKF$ @]4IU&4��_-�4�)*4I674I=<+T &@ ¼ ':0�$&):0Q `9� &=E+3$&'*=(4�S�'*=+D2�4�2T�!':+34 ¼=G + #�IVG4IC[?(':U8$&)*4I=<+3):-&VH+JI ¼ KF$ @ ¼ ':0\$K-%=5':+Q4P0!4I+�V()*4+ML ¼ L8S54�=( &+34+32(4�=E?56i"a4I�c 8@x4I)*4�6�4�=<+Q0c':= ¼ K�<G4I4�^]$&"5):4 � @B &�T$Ã0!?56�6b$&�_-#K
^]$&"5):4 � K4<($&6�N5)*4R0!N%$&9I4i$&=(S³4U#4�=<+Q0�K{ 03$&6�N5)*4H0QN%$&9I4� #?(+39I #6�4¼ 4IU#4I=<+`ÏB0Q?5"50!4I+P &@x{TÒL ¼ L =[?(6¦"a4I�T &@xNd #':=E+Q0�'*= ¼ Ï�',@ ¼ ':0N-%=5',+34�Ò¼ 9� #6�N5)*4I674I=<+� &@ ¼ ÏB=5 &+ ¼ Ò¼0/ + ?5=5'* &=�Ï ¼ #��+`Ò¼0: + #� ¼ + '*=<+34I�Q0Q4I9I+Q'* #=�Ï ¼ $&=5S,+`Ò¼ ÐO+ 0Q4I+cS5'QP 4I�Q4I=59�47Ï�Nd #':=E+Q0�'*= ¼ +32%$¨+c$&�Q4/=5 &+T':='+`Ò¼=G + 0Q4I+c':=59�):?50Q': #= Ï ¼ ':0P$`0!?5"50Q4+H 8@x #��4ICE?%$&)�+3 R+`Ò* =[?()*) 4U#4�=<+¦Ïª$&)Q2Z$;-[0�@�$&)*0!4�Ò{ +3�Q?(4�4U#4I=E+¦Ïª$8)S2\$�-[0�+3�!?54�Ò
L�4�0Q$;-R+32%$¨+ ¼ â² ¼ ã;²;å;å�å;$&�!4�T]��sVU��d���a�� #��$&�!4�u �����1t%w�wXW�yZY�z<w��]s;�F�dyT':@ ¼ æ :ß¼�[ ®*\2T254�=54U#4I�ií%]®_^%Ka`% #�O4~($&6�N5)*48V ¼ âH®cb Ú ² � Ò² ¼ ã/®cb � ²eÛ#Ò² ¼ ä/®cb¿Û[²JÜ<Ò²�å;å;å $8�Q4
� � ����������������� ����� Û&ÜS5'*0��! #'*=<+�K�� v1t5���W�B�����d�� &@1{Ó'*0�$M0Q4�CE?54I=59�4� &@�S5':0��! #':=<+Z0Q4+30 ¼ â² ¼ ã;²�å;å;å[0Q?59J2+325$8+/ 6æ98aâ ¼ æ ®r{RK � ':U&4�=|$&=|4IU&4�=<+ ¼ V%S54.-5=54O+3254����T1�ªz#tG���×���.�1�1z&�W���d� �� ¼ "E-
��xÏF�\Ò�® aÏF� ç ¼ Ò�®�� � ':@A�¯ç ¼Ú ':@\�="ç ¼ å� 0Q4�CE?54I=59�4³ &@c0Q4+30 ¼ âe² ¼ ã�²;å;å�åx'*0¦u��d�1�%���d��yÓ�ª�1z#�WyEt%s;����� ':@ ¼ â G ¼ ã G����� $&=5S 2�4³S54 -%=54�)*':6���� 6 ¼ � ® / 6æ98aâ ¼ æ.K�� 0Q4�CE?54I=59�4} &@H0!4I+Q0 ¼ âe² ¼ ã�²�å;å;åx':0u��d�1�%���d��yCT�y[z#�Wy[t5s;�ª��� ':@ ¼ â�I ¼ ãMI ����� $&=5Sm+Q254�= 2�4¦S54 -%=54`)*':6���� 6 ¼ �7®: 6æ98aâ ¼ æ K�$.=³4I':+Q254���9�$&0Q48V 2�4�2T'*):) 2T�Q',+34 ¼ ��� ¼ K�E�&�<�M�<� ���5���ÓÀ�� { ®ìÍ ¤&©d»�È*� d¼ æ ® b Ú ² � "8í.Ò �;¡¨º í]® � ²eÛ[²;å;å;å:¶Ã· ¬%�©?/ 6æ98aâ ¼ æ ®b Ú ² � Ò ¤&©d» : 6æ98aâ ¼ æD®�° Ú µG¶ �F�¦§F©(£ .�Q¤<»|�\�`»[�B¹�© �R¼ æ�® Ï Ú ² � "¨í.Ò F¬%�©E/ 6æ 8aâ ¼ æ�®Ï Ú ² � Ò ¤&©d»3: 6æ98aâ ¼ æ ® *׶\¾
���� �Â���! mg� ��;k��I�#"L�4�2Z$8=E+�+Q i$&0!0Q':>#=�$��!4;$&)%=E?56¦"d4��%$\Ï ¼ Òx+3 /4IU&4��!-�4U#4�=<+ ¼ VE9;$8)*)*4IS`+3254Tv1�¨�&('t)&]�ªw��B� W� 8@ ¼ KDL�4/$8)*0Q `9�$&)*)+*Â$¦v1�W�,&1t)&]�ªw��F� WOT]��sI���¨��&x���W�B�a�� #�Z$Ãv1�W�&]t)&]��wª�F� W.-0/214365879/ �<� :��<; 3 � 7=3?>3@1BA � � /?7 � 3C3@1 DE5 ��9F 7EA � A=1 � 1 / ; /?7 �<GW�9FH;�<GW� 5�/ ¼ 1 IJ/�K � 3 �<� >�<� � 3 �¨�MLI� 143 � �9F D �ON3QP L K � 3R/�K �S: K87 � �F �;�<�b� 1T5 �&� -U583C/ ���MVWN:��Ó� 3X3�14DY5 �9F 7EA � A=1 � >1 /Z1 � 3[/?7 � � 1 � 1 / ��VL8� � 3C3\7OI]3 � / L;�<� � ��V�R^ >`_ �8� V×� aG��� /�K �/ ��L Kb5=1 L;�<� �<�<�#� 5 V 1 �I�7 F�V<� / � 1 � 3 �
u�yEt%s;���Wyd��^1 �C[?5$&)*',@F-�$&0c$`N5�! #"%$&"5':)*',+.-#V)* 2%$&0�+3 Ã03$¨+3'*0_@F-b+Q25�Q4I4i$¨~G': #670Iëc�� _,5=1 /Z1 7E5 �5�0d]eá�eÄG© ¥ �§«¡¨© $ F¬(¤& 䨣J£§,Ô&©(£}¤ º3�Q¤&ÈP©%Ä[Å Á �º $\Ï ¼ Ò .¡ �Q¤[¥J¬�¸&�©% ¼á§*£/¤ v]�W�&]tE&x��wª�F� W T1��sI���¨�f&]���W���d� ¡¨º/¤ v1�W�,&1t)&]�ªw��F� W�u�yEt%s;���Wy § �§F �£�¤& �§*£B¹Z�J£/ F¬%���;¡¨ÈFÈ*¡¨��§F©[Ô� F¬[º3�J�i¤;Ì#§«¡¨Åi£�gh Y����du i g $\Ï ¼ Òkj Ú �;¡¨ºi�¸&�ºeÇM¼h Y����du l g $\Ϫ{TÒ�® �h Y����du m gZ�F�T¼ âe² ¼ ã�²;å;å;å ¤&º3�M»&§*£�n;¡¨§F©% � F¬%�©
$Ro 67æ98aâ ¼ æqp® 6r æ98aâ $�Ï ¼ æ�Òeå^�254��!4�$&�Q4A67$&=<-/'*=<+34I�QN5�!4I+J$¨+3'* &=50] 8@E$\Ï ¼ ÒKx^�2(4�+ 2� R9� #6�6� #=M':=<+34��!N5�Q4+J$8+Q'* #=(0$&�Q4Ã@B�Q4ICE?54�=59I'*4I0`$&=5S�S54�>#�!4�4I0i &@\"a4I)*'*4@B0�K $.= @B�Q4IC[?(4�=59-�':=E+Q4��!N5�Q4+J$8+Q'* #= V�$\Ï ¼ ÒO':0
�� ������������ ������������������������
+32(4M): #=5>Ã�Q?5=³N5�Q &Na #�_+3': #= &@x+Q'*6�4�0�+325$8+ ¼ ':0T+Q�Q?54�'*=��!4�Nd4I+3',+3': #=50�K�`% #�T4~($&6�N5)*48V':@ 2�4M0Q$�-+Q2%$8+T+32(4�N(�Q #"%$8"5'*):':+.-� 8@x254�$&S50c'*0 ��� Û[V 2T254�=³674�$&=|+Q2%$8+T',@2�4MY%':N+32549� &'*=}67$&=<-�+3':674I0c+Q254�=m+Q254¦N5�! #Na &�!+3': #=| &@�+3':674I0�2�4¦>#4+R254�$&S50H+Q4�=5S50P+3 ��� Û7$&0+32(4H=E?56¦"d4��Z 8@ +3 #0!0Q4I0�'*=59I�Q4;$80Q4�0IK �H=�':= -%=5',+34�),-Ã)* #=(>5VE?5=5N5�!4�S5':9I+J$8"5)*4P0Q4�CE?54I=59�4/ &@+3 &0Q0Q4I0�2T25 &0Q4H):'*6�':+Q'*=5>RN(�Q #Nd #�!+Q'* #=`+Q4�=5S50A+3 i$�9I #=50!+3$&=<+Z':0\$&=7'*S54�$&)*'��;$8+Q'* #= V&6¦?59J2)*'��&4�+3254`':S54;$� &@�$�0_+3�3$8'*>#2<+O):'*=54M'*=�>#4I #674+3�_-#K�^�254`S54�>&�Q4�4��. &@� "a4I)*':4I@�'*=<+Q4��QN(�Q4I+3$�+3': #= ':0M+325$8+�$\Ï ¼ ÒM674�$&0Q?(�Q4�0¦$&=� #"50!4��!U&4����¿0Ã0!+Q�Q4I=5>&+32 &@T"d4�):'*4@�+32%$8+ ¼ '*0�+3�Q?(4&K$.=}4I':+Q254��T'*=<+Q4��QN(�Q4I+3$8+3': #= V 2�4i�!4�CE?5'*�!4�+Q2%$8+ �c~G': #670 � +3 7Ü�25 &)*S K�^�254iS5'QP 4I�Q4I=59�4'*=�'*=<+34I�QN5�!4I+J$¨+3'* &=C2T'*):)�=5 8+`6b$¨+Q+34I�i6¦?(9J2 ?(=E+Q'*)�2\4�S54�$&)�2T',+32�0_+J$8+Q'*0!+Q'*9�$&)�':=(@B4�� �4�=(9�4&K�^�254��!4&V�+3254S('SPa4��Q':=5>m':=E+Q4��!N5�Q4+J$8+Q'* #=50i)*4;$8S +3 m+ 2� 0!9J25 [ #)*0` &@c':=(@B4��!4�=59I4&ë+32(4/@B�!4�CE?54I=E+Q'*0_+H$8=5S|+Q254��Z$;-#4I0Q' $8=³0!9J25 G &)*0�K�L�4/S54I@B4I�cS('*0Q9I?50Q0!'* #=|?5=<+3':) ) $¨+34��IK� =54/9;$&=|S54��!':U&4�67$&=<-N5�! #Nd4��!+Q'*4I0� &@ $�@B�Q #6á+3254�$¨~G'* &670IK��P4I�Q4M$&�Q4/$¦@B4.2/ë$\Ï *EÒ ® Ú
¼=G + ®�� $ZÏ ¼ Ò�� $�Ï +¦ÒÚ � $\Ï ¼ Ò�� �$\Ï ¼ Ò ® � Ð $ZÏ ¼ Ò
¼ A +½® * ®�� $�� ¼ 7 +��³® $\Ï ¼ Ò�� $\Ï +`Òeå Ï«ÛGK � Ò� )*4I0Q0T #"<U['* #?(0cN(�Q #Nd4��_+.-'*0T>#',U#4�=�':=�+Q254O@B #)*): 2T'*=5>%B�4I6767$(K�8�¨�M�H�¦�5� �"!1¡¨º�¤&©%Ç��¸&�©% B£c¼ ¤&©d» + Ë $�Ï ¼ / +`Ò�® $ZÏ ¼ Ò�� $ZÏ +`ÒxÐ $\Ï ¼ +`Ò�¶
�$# �%� `�KRL��!':+34 ¼0/ + ® Ï ¼ + Ò / Ï ¼ +`Ò / Ï ¼ +`Ò�$&=5S½=5 &+34�+Q2%$8+�+32(4�0Q44IU&4�=<+307$&�!4S5'*0��! #'*=<+;K&�P4I=59�48V�67$'�E'*=(>��Q4�Nd4;$¨+34�S ?50!4� &@T+3254�@�$&9+`+32%$¨+!* '*0¦$&S(�S5',+3':U&4R@B #��S5':0��! #':=E+Z4IU#4I=<+30�V 2�4M0!4�4O+32%$¨+$ � ¼ 7 + � ® $ � Ï ¼ + Ò 7 Ï ¼ +`Ò 7 Ï ¼ +`Ò �® $\Ï ¼ + Ò�� $ZÏ ¼ +`Ò�� $\Ï ¼ +¦Ò® $\Ï ¼ + Ò�� $ZÏ ¼ +`Ò�� $\Ï ¼ +¦Ò���$\Ï ¼ +`ÒxÐ $\Ï ¼ +¦Ò® * � Ï ¼ + Ò 7 Ï ¼ +`Ò � ��$ � Ï ¼ +`Ò 7 Ï ¼ +`Ò � Ð $\Ï ¼ +`Ò® $\Ï ¼ Ò���$\Ï +`ÒxÐ $\Ï ¼ +¦Òå}¾
�E�&�<���<� �/�5� ) · �\¡M¥J¡¨§F©� ¢¡�£J£��J£ ¶ À�� ±bâTÁ �T B¬%�O�¸&�©% � F¬(¤& ¬%�Q¤<»¨£R¡�¥J¥ÄGº3£R¡¨©b ¢¡�£J£PÕ¤&©d»RÈ*� ±`ãPÁ �Z F¬%�T�¸&�©% B¬(¤& d¬%�Q¤<»¨£c¡�¥J¥Ä[º3£P¡¨©Ã .¡�£J£ZØ ¶ �B�\¤&ÈFÈ%¡¨ÄG .¥J¡¨Åb�J£Z¤&º3�P�QÉ�Ä(¤&ÈFÈ Ç
���H����������������� �����_����� ����������� ������ � �@���� Û �È §��<�È ÇWË� F¬(¤& /§*£JË $�Ï¢°¨±bâe²J±`ã�µ8Òî $\Ï.°¨±�âJ²Q´ ã;µ8Ò`® $�Ï.°�´1â²3±ÃãIµ8Ò�® $ZÏ.°�´1âJ²Q´ ãIµ8Ò�®� "�� Ë� B¬%�© $\Ϫ±bâ / ±`ãeÒ�® $\Ϫ±bâ!Ò�� $\Ϫ±`ãeÒxÐ $�Ϫ±bâ!±`ãeÒ�® âã � âã Ð â
� ®rÜZ"��׶Z¾� K � 7 F �8� l �������+7E5�/Z1B5bP=1 / ; 7OI�� F 7EA � A=1 � 1 /Z1 � 3I��� �B�A¼ � � ¼Â F¬%�© $ZÏ ¼ �#Ò%� $\Ï ¼ Ò ¤¨£� � Ñ ¶
��� �!�#" $�<G?(N5Na &0Q4R+325$8+ ¼ �¦'*0\6� #=5 8+3 #=54H'*=59I�Q4;$80Q'*=(>¦0Q `+325$8+ ¼ â GÓ¼ ã G ����� KB�4I+ ¼ ® )*':6���� 6 ¼ � ® / 6æ98aâ ¼ æ K&%R4 -%=54 +/â® ¼ âV�+Pã}® °;� ç { ëê� ç¼ ã�²!� "ç ¼ âeµEV�+Hä�® °;� ç { ë � ç ¼ ä�²!� "ç ¼ ã�²!� "ç ¼ âeµE²�å;å;å4$ +9;$&=ì"d40Q25 2T= +32%$¨+K+�â² +Hã�²;å;å;åa$&�Q4`S('*0��! &'*=<+;V ¼ �b® / �æ98aâ ¼ æ�® / �æ 8aâ +cæD@B #�R4�$&9J2 � $&=5S/ 6æ98aâ +Pæ ® / 6æ98aâ ¼ æ«KcÏ <G4I4M4e~(9I4��!9�'*0!4 � K Ò%`%�Q #6 �c~G'* &6 Ü(V
$\Ï ¼ �#Ò�® $]o �7æ98aâ +Pæ p® �r æ98aâ $�Ï +cæ�Ò
$&=5S|254�=(9�4&V×?50!'*=5> �T~(': #6 Ü�$&><$8'*= V):'*6��� 6 $\Ï ¼ �#Ò�® ):'*6� � 6
�r æ98aâ $\Ï +PæFÒ�® 6r æ98aâ $\Ï +PæFÒ�® $ o 67æ98aâ +cæ p ® $\Ï ¼ Òå}¾��(' �Â���! mg� ��;k��I�#" �� )¯�; ��I�xl fg¦h j�kIl f�jmg¦�TlOn<G?5N5Nd #0Q4¦+Q2%$8+R+Q254¦0Q$&67N()*4i0!N%$&9I4Ã{ ®á°;��âe²;å�å;å�²!���[µÃ'*0�-5=5':+Q4&K�`% #�P4~($&6�N5)*48V':@ 2\4/+3 &0Q0c$�S('*4R+ 2T'*9I4&V5+Q254�=}{r2%$&0cÜ *Ã4�):4�6�4�=<+30IëA{¯®½°GÏBíQ² ^[Ò !cíQ²V^bç�° � ²;å�å;å+*[µ<µEK
$ @�4;$89J2m #?(+Q9� #6�4/'*0T4�CE?%$&):):-�):' �&4I):-&V5+3254I= $\Ï ¼ Ò\® L ¼ L "&Ü *?2T254I�Q4'L ¼ L[S54I=5 &+Q4�0H+Q254=E?56¦"d4��R &@A4I)*4�6�4�=<+Q0R'*= ¼ K/^�254`N5�Q &"%$&"5':)*':+.-+Q2%$8+R+Q254¦0!?56 &@�+Q254`S5':9�4¦':0 � � ':0Û "&Ü *`0!'*=59I4/+Q254��!4�$&�Q4O+ 2� 7 #?G+39� &674I0T+Q2%$8+T9I #�Q�!4�0QNd #=5S�+3 Ã+32('*0�4IU&4�=<+;K$.=³>#4I=54��Q$&)«V(':@x{r'*0N-5=5':+Q4/$&=5S|':@14;$&9J2³ #?(+Q9� #6�4O'*0�4IC[?5$&)*),-�)*'��&4I):-#VG+3254I=
$ZÏ ¼ Ò�® L ¼ LL {3L ²
2T25'*9J2³'*0�9�$&)*):4�S+3254��1�]��� �×�¨u v]�W�&]tE&x��wª�F� W T]��sI���¨��&x���W�B�d�A��^] Ã9� &67N5?G+34/N5�Q &"(�$&"5':)*':+Q'*4I0�V 2�4=54�4IS +Q 9� #?5=<+`+Q254�=[?(6¦"a4I�` &@cNa &'*=<+30`':= $&= 4IU#4I=<+ ¼ K-,³4+325 [S50@B #�\9I #?5=<+3':=5>`Nd #'*=<+30\$&�Q4H9;$&):)*4�S79� #6i"5'*=%$¨+3 #�!' $&)%6�4I+Q25 GS(0�KDL�4R=54�4IS5=�� +TS54�),U#4O':=E+Q +3254I0Q4`':=m$&=<-³>#�Q4�$8+HS(4I+J$8'*)«KTL�432T'*):)«V%25 2�4IU#4I��V�=54I4�S $7@B4 2�@�$&9I+Q0P@B�Q &6o9I #?5=<+3':=5>
Û * ������������ ������������������������
+32(4� #�_-�+32%$8+R2T'*):)�"a4|?50!4I@B?5)c) $8+Q4��IK � ',U#4I= � #" �!4�9+30IVT+Q254|=[?(6¦"a4I�b &@>2Z$;-[0� &@ #�!S54��!'*=5>/+3254I0Q4R &" �!4�9+30Z':0 � � ® � Ï � Ð � Ò�Ï � Ð�Û#Ò ����� Ü � Û � � K`% #��9I #=<U#4I=5'*4I=59�48V 2\4S54 -%=54 Ú � ® � KDL�4�$&)*0! `S54.-5=54 � � ��� ® � �� � Ï � Ð � Ò � ² Ï«ÛGK Û#Ò�Q4�$&S0& � 9J25 [ #0!4 � (5V 2T25':9J2Ã'*0x+Q254�=E?56¦"d4��� &@×S5':0!+Q'*=59+�2Z$;-[0� &@d9J25 [ #0Q':=5> � #" �!4I9I+30@B�Q &6 � K4`× &�\4~($&6�N5):4&V(',@ 2�4O2%$;U#4�$i9�)*$&0Q0\ &@xÛ Ú Nd4� #N5):4O$&=5S 2�4�2\$&=<+�+3 ¦0!4�)*4I9I+c$9� &676�':+!+34�4H &@DÜ`0_+3?5S54I=<+30�V×+32(4�=�+3254I�Q4�$&�Q4� Û ÚÜ � ® Û Ú �Ü � ��� � ® Û Ú� �������Ü � Û � � ® � � � ÚNd #0Q0!'*"5):4O9� #6�67',+Q+Q4�4�0IKxL�4�=( &+34O+32(4/@B &)*)* 2T':=5>¦N5�! #Na4I�!+Q'*4�0Ië� � Ú � ® � �
�� ® � $&=(S � � ��� ® � �
� Ð � � å���� �� ��mlOj�lO ��mlO �� prq�lO ���n$ @ 2\4cY%'*N�$�@�$&':�A9I #'*=�+ 2T'*9I4&VG+Q254�=�+Q254HN(�Q #"%$8"5'*):':+.-` &@ + 2\ ¦2(4;$&S50\'*0 âã � âã K]L�46¦?5),+3':N5):-�+3254MN5�Q &"%$&"5':)*':+Q'*4I0Z"a4I9;$&?(0Q432�4i�!4�><$&�!S³+Q254M+ 2� 7+Q #0Q0!4�0O$&0c'*=(S54�Nd4�=5S(4�=<+;K^�254O@B #�!6b$&)aS54.-5=5':+Q'* #=� &@]'*=5S(4�Nd4�=5S54I=59�4�'*0T$80T@B &)*)* 2T0IKc�� _ 5=1 /Z147E5 �5��� · �\¡��¸&�©% F£T¼ ¤&©d» + ¤&º3� ��� T1yEv�y[�T�y[�a� § �$\Ï ¼ +¦ÒA® $\Ï ¼ Ò $\Ï +`Ò ÏªÛGK Ü#Ò¤&©d»7�\����ºe§F .�c¼�� +�¶ e�£�� �¡¢�¦�¸&�©% B£ ° ¼ æ1ë�í�ç %µ §*£�§F©d»[�.Æ%�©d»[�©% �§ �
$ o Aæ���� ¼ æ p ®�� æ���� $ZÏ ¼ æFÒ�;¡¨º¦�¸&�ºeÇc¹�©%§F .�R£Ä Á £�� ��Ρ¢� a¶$.=5S54INa4I=5S54�=(9�4|9;$&= $&�!'*0Q4b'*=�+ 2� S5'*0_+3'*=(9I+32\$;-[0�K <G &674+3'*6�4�0IV 2�44~GN5)*':9�',+3):-$&0!0Q?56�4}+325$8+�+ 2� 4U#4I=E+Q0|$&�Q4m':=5S54INa4I=5S54�=<+�K `% #�74~($&6�N5)*48Vc':= +3 &0Q0Q':=5> $�9I #'*=
��� �;������4�� ���� �%� ��� �a�#� Û �+ 2T'*9I4&V 2�4�?(0Q?%$&):):-}$&0Q0!?56�4¦+3254¦+Q #0Q0!4�0�$&�Q4i'*=5S54INa4I=5S54I=E+K2T25':9J2 �Q4IY54�9I+Q0O+3254¦@�$&9++32%$¨+P+Q254i9I #'*=³2%$&0H=( �674I67 &�!- &@x+3254@-%�Q0_+H+3 &0Q0�K�$.=m &+Q254��P'*=50_+J$&=(9�4�0IV 2�4iS54I�Q',U#4'*=5S(4�Nd4�=5S54I=59�4Ã"<-}U&4��!':@F-['*=5>�+Q2%$8+ $�Ï ¼ +`Òc® $\Ï ¼ Ò?$\Ï +`Òc25 #)*S(0�K@`% #�H4e~5$867N5):4&V ':=+3 #0!0Q':=5>�$7@�$&':�PS5':4&V ):4I+ ¼ ®ê°8ÛG² �5² *[µÃ$&=5S�):4I+�+ ®á° � ²eÛ[²JÜ(² �(µEKP^�254�= V ¼0: + ®°8ÛG² �(µEV $ZÏ ¼ +`Ò³® Û " *ß® $\Ï ¼ Ò $\Ï +`Ò|® Ï � "&Û#Ò � Ï«Û "&Ü#Ò�$&=(SÓ0Q ¼ $&=5S + $&�!4'*=5S(4�Nd4�=5S54I=<+;K�$.=}+Q25'*0c9;$80Q4&V 2\4¦S('*S5=�� +H$&0Q0!?5674�+32%$¨+ ¼ $8=5SE+ $&�!4i':=5S54�Nd4�=(S54�=<+':+ �!?50_+T+3?5�!=54�S| #?(+�+Q2%$8+T+3254- 2�4��Q48K
<G?5N5Nd #0Q4�+32%$¨+ ¼ $&=5SC+ $&�!4�S('*0��! &'*=<+�4IU#4I=<+30�V�4�$&9J2C2T',+32 Na #0!':+Q':U#4ÃN5�! #"%$&"('*) �':+.-&K��Z$&=�+Q254I-�"a47'*=5S54INa4I=5S54I=E+��P (Km^�25'*0�@B #)*): 2T0i0Q':=59�4!$\Ï ¼ Ò $\Ï +`Ò7î Ú -&4I+$\Ï ¼ +¦Ò`® $�Ï *EÒ`® Ú K� x~G9�4�NG+Ã'*=�+Q25'*0¦0!Na4I9�'*$&)�9�$&0Q48V�+3254I�Q4'*0M=5 2\$�-�+3 ��!?5S(>#4'*=5S(4�Nd4�=5S54I=59�4�"<-�)* [ �E'*=5>¦$Ã+Q254/0Q4I+Q0T'*=|$���4�=5=³S5' $&>&�3$&6|K�E�&�<�M�<� ���5�,��� · ¡�£J£/¤��I¤&§FºM¥J¡¨§F©mÕ[Ö¦ ª§FÅb�J£ ¶ À1� �¼ ®�� ¤& �È*�Q¤¨£ \¡¨© �����Q¤<» ¶�� À�� ´ [Á �� B¬%�`�¸&�©% � F¬(¤& � «¤&§FÈ £`¡�¥J¥Ä[ºJ£`¡¨©} B¬%� ^���� .¡�£J£ ¶7· ¬%�©
$\Ï ¼ Ò ® � Ð $\Ï ¼ Ò® � Ð $\Ϫ$8)*) +J$8'*)*0QÒ® � Ð $\ÏB´�â.´aã ����� ´�â��Ò® � Ð $\ÏB´�â!Ò $ZÏF´ ãÒ ����� $\ÏB´�â��Ò ?(0Q'*=(>'*=5S54INa4I=5S54I=59�4® � Ð � �Û �
â��� å � � � å�¾
�E�&�<�M�<� ���5�,�&� · �\¡7Æ%�J¡JÆdÈ*� «¤ �<�� ªÄ[ºe©(£� �ºÇ¨§F©[Ô� .¡�£§F© � ¤ Á ¤¨£ �<� Á ¤&ÈFÈ\§F©% .¡ ¤ © � ¶� �ºJ£�¡¨©bÕH£Ä×¥J¥J�J�Q»¨£���§F F¬PÆdº3¡ Á ¤ Á §FÈ §F �ÇOÕ! aÙ/�x¬[§FÈ*�1Æ%�º3£�¡¨©ÃØO£Ä×¥J¥J�J�Q»¨£���§F F¬PÆdº3¡ Á ¤ Á §FÈ §F �ÇÕ! #" ¶%$ ¬(¤& T§*£¦ B¬%�RÆdº3¡ Á ¤ Á §FÈ §F �Ç B¬(¤& DÆ%�º3£�¡¨©�Õ�£Ä×¥J¥J�J�Q»¨£ Á �B�;¡¨º3�HÆ%�º3£�¡¨©�Ø'&�À�� Aé»[�© ¡¨ ¢�³ F¬%���¸&�©% á¢�|§F©% ¢�º3�J£ ¶ À1� H¼N[ Á �³ F¬%���¸&�©% M B¬(¤& ¦ F¬%�M¹�ºJ£ /£Ä×¥J¥J�J£J£³§*£Á Ç�Æ%�º3£�¡¨© Õ�¤&©d»} F¬(¤& P§F /¡�¥J¥ÄGº3£¡¨©� �ºe§�¤&È�©%ÄGÅ Á �º ^d¶)( ¡¨ .�� B¬(¤& �¼ â² ¼ ã;²;å;å�å ¤&º3�»&§*£�n;¡¨§F©% �¤&©d»b B¬(¤& ]é ® / 6[ 8aâ ¼N[ ¶ �M�© ¥J�JË
$\Ï é ÒA® 6r[ 8aâ $�Ï ¼N[ Òå
( ¡¨�xË $\Ï ¼ âQÒ�® � "8Ü%¶ ¼ ã ¡�¥J¥ÄGº3£M§ �/�\�/¬(¤&¸&�� B¬%�/£��QÉ�Ä%�© ¥J�iÕÅ`§*£J£��J£JËAØ�Å`§*£J£��J£JË\Õ£Ä×¥J¥J�J�Q»¨£ ¶ · ¬[§*£¦¬(¤¨£OÆdº3¡ Á ¤ Á §FÈ §F �Ç $\Ï ¼ ãÒO® ÏªÛ "&Ü<ÒIϪÜZ"��<Ò�Ï � "&Ü<ÒR® Ï � "#Û#ÒIÏ � "8Ü<Ò�¶ !1¡¨È Ê
Û ������������ ������������������������
È*¡¨��§F©[Ô7 F¬[§*£/È*¡QÔ&§«¥��\�R£��J�/ F¬(¤& $\Ï ¼N[ Ò�®ÂÏ � "#Û#Ò [�� â Ï � "&Ü<ÒI¶ �M�© ¥J�JË$\Ï é Ò�® 6r
[ 8aâ�Ü � �Û �
[�� â ® �Ü6r[ 8aâ
� �Û �[�� â ® ÛÜ å
���ºJ���\�OÄE£��Q»b F¬(¤& d�I¤[¥ A F¬(¤& BËA§ � Ú ����� � F¬%�©�� 6[ 8 � [ ® � "5Ï � Ð � Ò�¶Z¾
<[?56767$&�_-� &@$.=5S54INa4I=5S54�=(9�4� K ¼ $&=(S,+á$&�!4�':=5S54INa4I=5S54�=<+c':@�$\Ï ¼ +`Ò�® $\Ï ¼ Ò?$\Ï +`ÒeKÛ[K�$.=(S54�Nd4�=5S(4�=59I4i'*0T0! #674+3':674I0�$&0Q0!?56�4�S³$&=5S|0Q #6�4I+Q'*6�4�0�S54I�Q':U&4�S KÜGK %R':0��! #':=<+�4IU&4�=<+30�2T':+32�Nd #0Q',+3',U#4ON5�! #"%$&"5':)*',+.-b$8�Q4/=5 &+T':=5S54�Nd4�=(S54�=<+;K
���� ���� ����I�D���� mg¦k �Â���! mg� ��;k��I�#"�H0!0Q?56�'*=5>R+Q2%$8+%$\Ï +¦Ò\î Ú V 2�4TS54.-%=(4T+Q254�9� &=5S5':+Q'* #=5$&)GN(�Q #"%$8"5'*):':+.-� &@ ¼ >#',U#4�=+325$8+�+á25$&0T G9I9�?5�!�Q4�S³$&0�@B #):)* 2T0�Kc�� _ 5=1 /Z147E5 �5�,�;�¯�B� $\Ï +`Ò�î Ú B¬%�© B¬%� z#�d� T]�F�W�B�d�1t%wMv]�W�&]tE&x��wª�F� W ¡¢�`¼Ô&§F¸&�© + §*£ $ZÏ ¼ L +`Ò�® $\Ï ¼ +`Ò$ZÏ +`Ò å ϪÛGK �<Ò
^�25'*= �b &@�$\Ï ¼ L +`Ò\$80\+3254R@B�Q$&9I+Q'* #=b &@�+3'*6�4�0 ¼ G9I9�?5�!0T$&6� #=5>i+Q25 #0Q4O':= 2T25'*9J2+ [9�9I?5�Q0IK$�P4��!4O$&�!4R0Q #6�4P@�$89I+30\$&"d #?(+Z9I #=5S5',+3': #=%$&)%N5�! #"%$&"('*)*',+3':4�0�K`% #�Z$8=E-a-(~G4IS+ 0Q?(9J2Ã+Q2%$8+�$\Ï +`Ò\î Ú VO$\Ï � L +¦Ò1'*0D$HN5�Q &"%$&"5':)*':+.-/'ªK¿48Kx',+D03$¨+3'*0 -%4�0D+Q254\+Q25�Q4I4T$¨~G'* #6�0 &@�N(�Q #"%$8"5'*):':+.-#K�$.=�N%$8�!+3':9�?5)*$&��VW$\Ï ¼ L +`Ò6j Ú V $�Ï�{3L +`ÒT® � $&=5Sm',@ ¼ â² ¼ ã;²;å�å;å $8�Q4S5':0��! #':=E+a+3254I= $\Ï / 6æ98aâ ¼ æ L +`Ò�® � 6æ98aâ $ZÏ ¼ æ L +`ÒeK �\?G+�':+�':0 '*=O>#4I=54��Q$&)E���×��+Q�Q?54�+Q2%$8+$\Ï ¼ L + /� MÒO®�$\Ï ¼ L +`Ò �R$\Ï ¼ L� MÒKb^�2(47�!?5)*4I0M &@\N5�Q &"%$&"5':)*':+.-}$&N5N():-m+Q ³4U#4�=<+Q0
���Z� ����������� � ������� ��������������� ����� Û � #=}+Q254¦)*4@F+H 8@D+3254i"%$&��K�$.=�>#4I=54��Q$&)]',+H':0R���%�¦+3254M9;$&0!4i+325$8+6$\Ï ¼ L +`Ò\® $\Ï +,L ¼ ÒeK�]4I #N5)*4M>#4I+P+325':0P9I #=(@B?50!4�S�$8)*)�+3254�+3':6748K�`× &�c4e~($&67N()*4&V×+32(4iN5�! #"%$&"5':)*',+.-� &@�0!Na 8+30>#':U&4�=�-# #?|2%$;U#4�6�4;$&0!)*4�0�':0 � "(?(+�+3254/N5�! #"%$&"('*)*',+.-7+Q2%$8+�-& #?}2%$;U&4�6�4;$80Q)*4I0T>#':U&4�=+32%$¨+A-& #?b25$�U&4P0!Na 8+30A'*0�=5 &+ � K $.=�+Q25'*0�9;$80Q4&VE+32(4cS('SPa4��Q4I=59�4c"d4I+ 2�4�4�=�$ZÏ ¼ L +`Ò�$&=5S$\Ï +,L ¼ Ò1':0D #"<U['* #?(0x"5?(+x+3254I�Q4T$&�!4\9;$80Q4�0�2T2(4��Q4�',+D':0x)*4I0Q0D &"EU[': #?50�K]^�25'*0x67':0!+J$��&4\':06b$8S54H &@F+Q4�=4I=5 #?5>#2b'*=�):4�><$&)d9;$80Q4�0�+32%$8+Z':+�'*0\0! #6�4I+3':674I0�9�$&)*):4�Sb+Q254HN(�Q #0!4�9�?G+3 #���¿0@�$&)*)*$&9I-&K
�E�&�<�M�<� ���5�,�;Þ]e Åb�Q»&§«¥Q¤&È� .�J£ x�;¡¨º¤ »&§*£��Q¤¨£���� ¬(¤¨£�¡¨ÄG .¥J¡¨Åb�J£ � ¤&©d» Ц¶ · ¬%�Ædº3¡ Á ¤ Á §FÈ §F ª§«�J£M¤&º3��g� � � ¶ Ö<Ö��EÕ ¶ Ö��[Ö<ÖÐ ¶ Ö<Ö��[Ö ¶ �[Ö&Õ[Ö
! º3¡¨Å B¬%�|»[�B¹�©%§F ª§«¡¨©Ó¡¢�³¥J¡¨©d»&§F �§«¡¨©d¤&È�Ædº3¡ Á ¤ Á §FÈ §F �ÇWË $�Ï � L � Òî $\Ï �¦² � Ò "M$�Ï � Ò�®å Ú&Ú � "5Ï¢å Ú&Ú � ��å Ú&Ú#Ú � Ò1® å � ¤&©d» $\Ï_ÐaL � Ò�® $\Ï¢ÐM² � Ò "M$\Ï � Ò�®½å �#Ú ��Ú "5Ï_å �#Ú ��Ú �å Ú �#Ú#Ú Ò � å � ¶ eZÆ<Æ(¤&º3�©% �È ÇWË� B¬%� .�J£ �§*£`�I¤&§FºeÈ Ç ¤[¥J¥Ä[º!¤& ¢� ¶� §«¥ �mÆ%�J¡JÆdÈ*��Ǩ§«�È,»¯¤Æ%¡�£§F ª§F¸&��[Ö`Æ%�ºJ¥J�©% R¡¢�` F¬%�� �§FÅb�b¤&©d»�¬%�Q¤&È B¬[Ç`Æ%�J¡JÆdÈ*�`Ǩ§«�È,»�¤³© �«Ô<¤& �§F¸&�7¤ Á ¡¨Ä[ ��[ÖÆ%�º3¥J�©% A¡¢�T F¬%�H �§FÅb� ¶�� Ä�Æ<Æ%¡�£��cÇ<¡¨Ä`Ô[¡A�;¡¨ºP¤` .�J£ �¤&©d»OÔ[� D¤PÆ%¡�£§F ª§F¸&� ¶ $ ¬(¤& ]§*£P B¬%�Ædº3¡ Á ¤ Á §FÈ §F ªÇRÇ<¡¨Äi¬(¤&¸&�\ B¬%�T»&§*£��Q¤¨£�� &� ¡�£ (Æ%�J¡JÆdÈ*�Z¤&©(£�\�º ¶ �[Ö ¶¦· ¬%�T¥J¡¨ºº3�J¥ 1¤&©(£�\�º§*£ $\Ï � L �/Ò�® $\Ï �¦² � Ò "M$�Ï �/Ò�®½å Ú#Ú � "5Ï_å Ú#Ú � �må Ú �&Ú#Ú Òx®�å Ú å�· ¬%�TÈ*�J£J£�¡¨©�¬%�º3�T§*£ B¬(¤& ]ÇE¡¨Ä7© �J�Q»M .¡�¥J¡¨ÅHÆdÄ[ .�c B¬%�R¤&©(£�\�º�©%Ä[Åb�º§«¥Q¤&ÈFÈ Ç ¶�� ¡¨©�� ] �ºeÄE£ ]ÇE¡¨Ä[ºP§F©% �Ä[§F ª§«¡¨© ¶¾$ @ ¼ $&=5S'+á$&�Q4/'*=(S54�Nd4�=5S(4�=<+c4IU#4I=<+30P+Q254�=
$\Ï ¼ L +`Ò�® $ZÏ ¼ +`Ò$�Ï +`Ò ® $ZÏ ¼ Ò $ZÏ +`Ò$ZÏ +`Ò ® $\Ï ¼ Òå<G `$&=( &+3254I�\':=E+Q4��!N5�Q4+J$8+Q'* #=7 &@�'*=(S54�Nd4�=5S(4�=59I4O'*0�+Q2%$8+ �E=5 2T'*=5>%+ S5 G4I0Q=�� +�9J2%$8=5>#4+3254/N5�! #"%$&"('*)*',+.-b 8@ ¼ K
`×�! #6 +Q254 S54.-5=5':+Q'* #=Ó &@�9I #=5S5',+3'* &=%$&)HN(�Q #"%$8"5'*):':+.- 2\4 9�$&= 2T�!':+Q4 $ZÏ ¼ +`Ò�®$\Ï ¼ L +`Ò?$\Ï +`Òc$8=5S�$&):0Q $ZÏ ¼ +`Ò\®\$\Ï +'L ¼ Ò $�Ï ¼ ÒK � @F+34�= Vd+Q254�0!4i@B #�Q6i?5) $&4/>#',U#4�?50$`9� #=<U#4I=5'*4I=<+�2\$�-+Q Ã9� #6�N5?(+Q4 $ZÏ ¼ +`Ò�2T254I= ¼ $&=(S +á$&�Q4O=( &+c'*=5S54INa4I=5S54I=E+�K
Ü Ú ������������ ������������������������
�E�&�<���<� �/�5� � � � º!¤&� ª�\¡i¥Q¤&ºQ»¨£ �eºJ¡¨Å ¤¦»[�J¥ ��Ë1��§F B¬%¡¨Ä[ 1ºJ�.ÆdÈ,¤[¥J�Åb�©% ¶ À�� a¼ Á �T B¬%��¸&�©% � F¬(¤& 1 F¬%��¹�º3£ ]»&º!¤&��§*£ eH¥J�H¡¢���DÈ Ä Á £c¤&©d»/È*� + Á �T F¬%�H�¸&�©% � F¬(¤& 1 F¬%�Z£��J¥J¡¨©d»»&º!¤&��§*£��ZÄ×�J�©�¡¢� � §�¤&Åb¡¨©d»¨£ ¶M· ¬%�© $\Ï ¼ ² +`Ò�® $ZÏ ¼ Ò $ZÏ +,L ¼ Ò�®�Ï � " �#Û#Ò � Ï � " � � Ò�¶¾
<[?56767$&�_-� &@ �\ #=5S5',+3': #=%$&) ���! #"%$&"('*)*',+.-� K�$ @ $�Ï +`ÒZî Ú +Q254�= $\Ï ¼ L +¦Ò�® $\Ï ¼ +¦Ò$�Ï +`Ò åÛ[K2$\Ï � L +¦Ò�0Q$8+3':0 -%4I0]+32(4Z$¨~G'* &6701 &@%N5�! #"%$&"('*)*',+.-#V�@B #��-(~G4�Sa+bK$.=¦>#4I=54��Q$&)«VM$ZÏ ¼ L � ÒS( G4I0c=( &+c03$8+Q'*0!@F-7+3254�$¨~G': #670� 8@xN5�! #"%$&"('*)*',+.-#VE@B #�N-(~G4�S ¼ KÜGK�$.=|>#4I=54��Q$&)«VE$\Ï ¼ L +¦Ò�]® $\Ï +,L ¼ ÒK�(K ¼ $&=(S,+á$&�!4�':=5S54INa4I=5S54�=<+c':@x$&=5S| #=5),-�',@�$\Ï ¼ L +¦Ò�® $\Ï +`ÒeK
���� ��g6"�lOn����mlR����lOh�Z$;-#4I0 �<+3254I #�Q4I6�'*0D$/?(0Q4I@B?()%�Q4�0!?5):+�+Q2%$8+�':0x+3254T"%$80Q'*0� 8@N&_4~GNa4I�!+A0!-[0_+34�6�0 (M$&=5S
& �Z$;-&4�0 �×=54+30IK)(C`x'*�!0!+;V�2\4/=54I4�Sm$`N5�!4�):'*6�'*=%$&�_-��Q4�0!?5):+�K� K � 7 F �¨� l��fi� � � K � �8�Q: 7OI � 7O/ �<� � F 7EA � A=1 � 1 / ; ��� À�� c¼ â²;å;å;åI² ¼ �Á �³¤�Æ(¤&ºe �§F ª§«¡¨©¡¢� {/¶7· ¬%�©(Ë%�;¡¨ºM¤&©%Ç�¸&�©% + Ë $\Ï +`Ò�® � æ98aâ $�Ï +,L ¼ æBÒ?$\Ï ¼ æ�ÒI¶
��� �!�#" $!%R4.-5=54 [ ® + ¼N[ $&=5S=5 &+Q4H+325$8+ Pâe²�å;å;å�² R$&�!4RS5'*0��! #'*=<+\$&=5S�+Q2%$8++ ® / [ 8aâ [ K$�c4�=59I4&V
$\Ï +¦ÒA® r[$ZÏ [ Ò�® r
[$\Ï + ¼�[ Ò�® r
[$\Ï +'L ¼N[ Ò $ZÏ ¼N[ Ò
���H����������� �����������>� �;�� ,����� � Ü �0Q':=59�4 $ZÏ + ¼N[ Ò�® $\Ï +'L ¼N[ Ò $ZÏ ¼N[ Ò�@B�! #6�+32(4HS54 -%=5':+Q'* #=� &@�9I #=5S5',+3': #=%$&)%N5�! #"%$&"('*)*',+.-#K¾� K � 7 F �8� l �fi����� �<;I� 3� � K � 7 F �¨� ��� À1� (¼ âe²;å�å;å�² ¼ OÁ �\¤�Æ(¤&ºe ª§F �§«¡¨©�¡¢� { £Ä×¥J¬M F¬(¤& $\Ï ¼ æFÒZî Ú �;¡¨º¦�Q¤[¥J¬ í3¶ �B� $\Ï +¦Ò�î Ú F¬%�©(Ë×�;¡¨ºi�Q¤[¥J¬ í]® � ²;å;å�åI² � Ë
$\Ï ¼ æ L +¦ÒA® $\Ï +'L ¼ æFÒ?$�Ï ¼ æ�Ò� [ $\Ï +'L ¼N[ Ò?$\Ï ¼N[ Ò å Ï«Û[K��#Ò
�a�¨�R�9F� ��5� � ) $ �m¥Q¤&ÈFÈ $\Ï ¼ æBÒ B¬%� v]�¨�B�d��v]�W�&]tE&x��wª�F� W½�,� ¼ ¤&©d» $\Ï ¼ æ L +`Ò B¬%�v��dsI��yE�¨���×�bv1�W�&]t)&]��wª�F� W��� ¼ ¶��� �!�#" $�L�4\$&N5N5),-R+32(4\S54 -%=5':+Q'* #=� &@(9� #=5S(':+3': #=%$&)<N(�Q #"%$8"5'*):':+.-H+ 2T':9�48V8@B #):)* 2\4IS"<-+3254/) $ 2r &@1+3 &+3$&) N5�! #"%$&"5':)*',+.-aë
$ZÏ ¼ æ L +`Ò�® $\Ï ¼ æX+¦Ò$\Ï +¦Ò ® $\Ï +,L ¼ æBÒ $\Ï ¼ æFÒ$\Ï +`Ò ® $\Ï +,L ¼ æBÒ?$\Ï ¼ æBÒ� [ $\Ï +'L ¼N[ Ò $ZÏ ¼N[ Ò åm¾
�E�&�<�M�<� ���5�,��� �7»&§F¸�§�»[��Å`Ç �Å�¤&§FÈ�§F©% .¡} F¬[º3�J��¥Q¤& ¢�«Ô[¡¨ºe§«�J£�g¦¼ â�� � £.Æ(¤&ÅiË � ¼ ã��� È*¡¨��Ædºe§«¡¨ºe§F �Ç � ¤&©d»T¼ ä�� � ¬[§,Ô¨¬PÆdºe§«¡¨ºe§F �Ç ¶�� ! º3¡¨Å¯Ædº3�¸�§«¡¨ÄE£c�!ÌIÆ%�ºe§«�© ¥J���a¹�©d»O F¬(¤& $\Ï ¼ âQÒ�®½å � Ë $�Ï ¼ ãÒ�® å Û ¤&©d» $\Ï ¼ äeÒ�® å � ¶�� �O¥J¡¨Ä[º3£��JË å � ��å Û ��å � ® � ¶ À�� + Á � B¬%�M�¸&�©% � B¬(¤& � F¬%�i�Å�¤&§FÈ�¥J¡¨©% ¤&§F©(£R B¬%�R�\¡¨º!» � �eº3�J� ¶�� ! º3¡¨Å�Ædº3�¸�§«¡¨Ä<£M�!ÌIÆ%�ºe§«�© ¥J�JË$\Ï +,L ¼ â!Ò�® å � Ë $\Ï +,L ¼ ãeÒ�®�å Ú � Ë $\Ï +,L ¼ â!Ò\®�å Ú � ¶�� ( ¡¨ .��g å � �Îå Ú � �rå Ú � ]® � ¶�� �º3�J¥J�§F¸&�c¤&©³�Å�¤&§FÈG��§F B¬¦ B¬%�T�\¡¨º!» � �eº3�J� ¶�� $ ¬(¤& �§*£T F¬%�xÆdº3¡ Á ¤ Á §FÈ §F �Ç/ B¬(¤& �§F �§*£Z£.Æ(¤&Å�&� ¤&Ç<�J£ �E B¬%�J¡¨º3�ŠǨ§«�È,»¨£JË
$\Ï ¼ â L +¦ÒA® å �� å �Ï_å �� å � Ò��rÏ_å Ú � � å Û#Ò��XÏ¢å Ú � � å � Ò ®½å � � �[å ¾
���� � � �k������7��gij ���;� � lRh g¦�! n^�254�6b$8+Q4��!' $&)A'*= +Q25'*0`9J2%$8N(+34I�`'*0`0!+3$&=5S%$8�QS K %R4I+J$8'*)*0M9;$&= "d4�@B #?5=5S ':= $&=<-=E?56¦"d4��� &@1"d G �E0IK%�Z+�+3254O':=E+Q�Q [S5?59+3 #�_-b):4IU#4I)«V[+3254I�Q4O'*0 %R4 � �Q [ &+�$&=(S'<G9J254��_U['*0Q2
Ü<Û ������������ ������������������������
Ï«Û Ú#Ú Û#ÒeV�$8+O+3254Ã':=E+Q4��!674IS5' $8+Q4¦)*4U#4I)«V � �!'*6�674+Q+R$&=5SC<[+3':� �;$��&4��7Ï ��� Û#ÒP$&=5S���$8�Q�Ï � � � Ü<ÒeV×$&=5Sm$¨+P+Q254i$8S(U8$&=59�4ISm)*4U#4I)«V �\'*):)*':=5>#0Q):4I-}Ï � � � � Ò�$&=5S �\�Q4I'*67$&=�Ï ��� * ÒK�$$&S%$8N(+34IS 67$&=<-�4e~5$867N5):4�0/$&=(S N(�Q #"5):4�6�0R@B�Q #6 %R4 � �Q [ &+O$8=5S <[9J254��_UG':0Q2¯Ï«Û Ú#Ú Û#Ò$&=5S � �Q':676�4I+!+�$&=5S <[+Q'*� ��$'�&4I�MÏ ��� Û#ÒK���� lO� �� ����cgik�� j�j�lO ������� 4I=54��Q$&)*),-#V�',+�'*0�=5 &+�@B4�$&0Q':"5)*4x+Q c$&0Q0!'*>#=�N5�Q &"%$&"5':)*':+Q'*4I0a+3 c$&):)&0!?5"50Q4+30] &@5$�0Q$&67N()*40QN5$&9�4T{RK�$.=(0!+34�$&S V< #=(4��Q4�0_+3�!'*9I+Q0�$8+!+34�=<+Q'* #=¦+Q O$R0Q4+� &@d4IU#4I=<+30�9;$&):)*4IS`$ ^ 'Qt%wf�×y &1�¨t #�T$ ^ '�xy[w TC2T25'*9J2|'*0T$`9I) $&0!0�ê+325$8+c03$8+Q'*0 -54�0�ëÏ�' ÒN*�ç�VÏ�':'FÒ�',@ ¼ â² ¼ ã�²;å;å;åI²;ç á+3254I= / 6æ98aâ ¼ æ1ç��$&=5SÏ�':'*'FÒ ¼ ç��':67N()*'*4I0�+Q2%$8+ ¼ ç�K^�25470Q4I+Q0¦'*=� $&�!470Q$&'*S +3 }"d4�u�y[t5s;�1�Wt)&]wBydKL�4b9�$&)*)PϪ{R²��Ò�$³u�yEt%s;���¨tE&xwBys�v1t%z#y×��$ @ * '*0Z$¦N5�Q #"5$&"5'*):':+.-�6�4;$&0!?5�Q4OS54 -%=54IS³ #=� +Q254�=�Ϫ{R²�²�*iÒ�'*0Z9�$&)*):4�S�$v1�¨�&1t)&]�ªw��B� W s�v1t%z#y×�iLì254�= { ':0c+Q254¦�Q4�$&)1)*'*=(4&V 2\4i+3$'�&4� +3 �"d4M+Q254¦0Q67$&):)*4�0_+^ � -%4�):Sb+32%$¨+�9� #=<+3$&'*=50T$8)*)×+3254R &Na4I=|0!?5"50!4I+30IV 2T25'*9J2�':0Z9;$&):)*4ISb+3254��¦�×�Wy[w ^ '��]y[w T�K
����� p��³�TlO���P��nxlOn� K�`x':)*)¨'*=O+32(4AS(4I+J$8'*)*0 &@G+Q254�N5�Q [ &@G &@G^�254� #�!4�6½ÛGK K��H):0Q 5V¨N5�! WU&4�+3254�6� #=5 &+Q #=54S54I9��!4;$&0!'*=5>�9;$&0!4&KÛGKc���Q �U&4�+Q254/0!+J$¨+34�6�4�=<+Q0c':=�4�CE?%$8+Q'* #=�Ï«Û[K � ÒeKÜ(K�B�4I+M{Â"d4�$|0Q$&6�N5)*4`0!N%$&9�4b$&=5S )*4+ ¼ âe² ¼ ã;²;å;å;åI²�"a4�4IU&4�=<+30IK %R4.-%=(4 + �³®/ 6æ98E� ¼ æx$8=5S �/® : 6æ98E� ¼ æ«KÏ�$<Ò�<G25 2X+Q2%$8+>+�â�I +Hã�I ����� $&=5S�+32%$¨+ Pâ G +Hã G ����� KÏB"×Ò <G2( 2o+325$8+b� ç : 6� 8aâ + ��',@�$&=5SÓ #=5),- ':@R�á"d4�)* &=5>#07+Q $&=Ó'*=�-%=5':+Q4=E?56¦"d4��T &@1+32(4�4U#4I=E+Q0 ¼ âe² ¼ ã�²;å;å;å,KÏB9�ÒK<G25 2Â+Q2%$8+O� ç / 6� 8aâ ��':@\$&=5S #=5),-m',@��X"d4�)* &=5>#0O+3 |$&):)x+3254Ã4U#4�=<+Q0¼ âe² ¼ ã�²;å;å;å54e~G9�4�NG+HNd #0Q0!'*"5),-$%-%=5',+34O=E?56¦"d4��T &@1+325 &0Q4/4IU&4�=<+30IK
�5K�B�4I+P° ¼ æ1ë¦í�ç %µ/"d4R$i9I #)*):4�9I+Q'* #=� &@�4U#4I=E+Q0�2T254��!4 Ã'*0�$&=$&�!"5':+Q�3$&�_-Ã'*=5S54e~
�����Z����M������ �(� � Ü#Ü0Q4+;K�<G25 2 +Q2%$8+
o 7æ���� ¼ æ�p ® Aæ���� ¼ æ $&=5S o Aæ���� ¼ æ p ® 7æ���� ¼ æ�P':=<+;ë�`x':�Q0!+TN(�Q �U#4/+325':0Z@B #� `®�° � ²;å;å;å² � µEK
�GK><G?5N(Na #0!4R2�4Ã+3 #0!0i$@�$&'*�H9� #':=�?5=<+Q'*)�2�47>#4+�4~($&9I+Q):-m+ 2� }2(4;$&S50IK %R4I0Q9��!'*"d4+32(4i03$867N5):4/0QN%$&9I4ZKZLì2%$8+H'*0T+3254�N5�! #"%$&"5':)*',+.-b+325$8+H4e~($&9I+Q):- � +Q #0Q0!4�0O$&�!4�Q4ICE?5'*�!4�S *(K�B�4I+H{ή ° Ú ² � ²;å;å;å²�µEK����! WU&4i+325$8+P+Q254��!4iS5 [4�0P=5 &+P4~G'*0_+R$�?5=5',@B #�Q6 S5':0!+Q�Q' �"5?(+Q'* #=� #= {�'«K 4&K³':@k$\Ï ¼ Òi® $\Ï +¦ÒK2T254I=54IU&4��EL ¼ L�® L +,Ld+3254I= $Î9;$&=5=( &+03$¨+3'*0_@F-�+32(4M$W~(': #6�0Z &@]N5�Q #"5$&"5'*):':+.-&K� K�B�4I+ ¼ âJ² ¼ ã�²;å�å;å("d4/4IU#4I=<+30�KN<G25 2X+Q2%$8+
$ o 67� 8aâ ¼ � p � 6r � 8aâ $�Ï ¼ �<Ò å�P':=<+;ë %R4 -%=54M+ �¦® ¼ �/Ð / � � âæ98aâ ¼ æ.KZ^�254I=m0Q2( 2 +Q2%$8+c+3254M+ �7$&�!4MS5':0��! #':=<+$&=5S�+325$8+>/ 6� 8aâ ¼ �M® / 6� 8aâ + �GK
K><G?5N(Na #0!4�+Q2%$8+ $\Ï ¼ æBÒ�® � @B #��4�$&9J2³í!KA���! �U#4/+32%$8+$]o 6Aæ98aâ ¼ æ p® � å
� K�`% #�N-(~G4�S +á0!?59J2}+Q2%$8+ $\Ï +`ÒZî Ú V%0Q25 2 +Q2%$8+2$\Ï � L +`Ò\03$¨+3'*0 -%4�0T+32(4M$W~(': #6�0 &@]N5�Q &"%$&"5':)*':+.-&K��Ú K��A #? 25$�U&4N5�Q &"%$&"5),- 254;$&�!S�':+M"a4@B #�Q48K �P 2 -& #?�9;$&=�0! #):U&4b',+i�!'*># #�! #?50!):-#K
$ +`':0i9;$8)*)*4IS�+Q254 & ,³ &=E+.- �H$&):)A���! #"5):4�6|K)( � N(�Q' �I4b':0iN5)*$&9�4IS $8+¦�Q$&=5S5 #6"d4I+ 2�4�4�= #=54� &@c+325�!4�4�S5 [ #�Q0IK �A &? N('*9 ��$ S5 [ #��K�^1 �"d4�9� #=(9��Q4+348V\)*4+ � 00Q?(N5Na &0Q4Ã-# #?�$&)S2\$;-[0�N('*9 �mS5 [ #� � K �P 2 ,³ &=E+.- �P$&)*)]9J25 [ #0Q4I0M #=(4� 8@�+Q254 &+Q254��D+ 2� MS( G #�!0�V< #Nd4�=(0�':+�$&=(S�0!25 2T0A-& #?`+32%$¨+�':+�':0�4�6�N(+.-#K��c4Z+3254I=7>&':U#4I0-# &?�+Q254P &N5Na &�!+3?(=5':+.-M+3 ��&4�4IN7-& #?5��S5 [ #�� #��0 2T',+39J2Ã+3 /+3254T &+Q254���?5=5 &Na4I=54�S
Ü'� ������������ ������������������������
S5 [ #�IK�<G25 #?()*S�-# #?|0!+3$�-� #��0 2T',+39J2 $.=<+3?(':+3': #=�0Q?5>&>#4�0_+30c':+�S5 [4�0!=�� +P67$8+!+34��IK^�254/9I #�Q�!4�9I+H$&=50 2\4I�P':0Z+32%$¨+c-& #?}0!25 #?5):S�0 2T':+Q9J2 K����! �U#4i',+;KF$ +�2T'*):) 254I)*N�+3 0!Na4I9�':@F-b+3254/03$867N5):4/0QN%$&9I4i$8=5S|+Q254/�Q4I)*4IU8$&=<+c4IU&4�=<+30H9;$&�!4I@B?5):):-#K�^�2E?50�2T�Q',+34{ ®�°GÏF��âe²_�xãJÒ�ë¦�1æ1ç�° � ²eÛ[²JÜGµ<µ�2T254I�Q4O�Aâ�':0�2T254��!4�+Q254ON5�!' ��4O':0T$&=5S��]ãT'*0+Q254/S5 G &� ,³ #=<+.- #Nd4�=50IK� � K><G?(N5Na &0Q4M+Q2%$8+ ¼ $&=5S + $&�Q4�'*=(S54�Nd4�=5S(4�=<+/4IU&4�=<+30IKK<G25 2�+32%$¨+ ¼ $&=5SE+ $&�!4O'*=5S(4�Nd4�=5S54I=<+H4U#4�=<+Q0�K� ÛGKT^�254I�Q4�$&�!4¦+325�!4�4Ã9;$&�!S50�KÃ^�254%-%�!0!+�'*0O>&�Q4�4I= #=�"d &+Q2 0!'*S54I0�V�+32(4`0Q4�9I #=5S�'*0�!4�S� #=�"d &+32�0!'*S54I0Ã$&=5S +Q254b+32('*�QS�':0i>#�!4�4I= &=� #=54�0!'*S54$8=5S �Q4�S� &= +3254 &+Q254��IKiL�4�9J2( G #0!4�$�9�$&�QS�$8+O�Q$&=5S5 &6 $&=5SE2�4�0!4�4� #=54`0!'*S54³Ïª$8)*0Q 9J2( #0Q4I=$8+M�3$8=5S5 #6bÒK\$ @�+Q254b0!'*S54a2\4b0Q4�47'*0�>#�Q4I4�= V2T2%$8+¦':0�+Q254bN(�Q #"%$8"5'*):':+.-m+Q2%$8++Q254T &+3254I��0Q'*S(4c':0�$&)*0! />#�Q4I4�= ,³$&=<-`Na4I #N5):4c':=<+3?5',+3':U&4�),-¦$&=(0 2�4�� ��� ÛGK�<[25 2+Q2%$8+�+32(4�9I #�Q�!4�9+H$8=50 2�4��c'*0cÛ � Ü(K� Ü(K><G?(N5Na &0Q4P+Q2%$8+Z$�@�$&':�A9I #'*=7'*0A+3 #0!0Q4�Sb�Q4�Nd4;$¨+34�S():-7?5=<+3'*)×"d &+32b$i254�$&S$&=5S7+J$&':)2%$;U&4i$&N(Na4�$&�Q4IS}$8+T):4;$&0_+T #=59I4&K
Ï�$<Ò %R4�0!9��Q':"a4R+Q254/03$&6�N5):4O0QN%$&9I4i{RKÏB"×Ò�Lì2%$8+c'*0Z+Q254/N5�Q &"%$&"5':)*':+.-�+325$8+�+325�!4�4O+3 &0Q0Q4I0�2T':)*)a"a4/�Q4ICE?5'*�!4�S
� �5K><G2( 2á+325$8+`':@ $\Ï ¼ Òî Ú #� $�Ï ¼ Òî � +Q254�= ¼ ':0¦'*=5S(4�Nd4�=5S54I=<+7 8@P4U#4I�!- &+Q254���4IU&4�=<+;K�<G25 2ì+32%$¨+�':@ ¼ '*0\':=5S54INa4I=5S54�=<+T &@�',+30!4�):@ +3254I= $ZÏ ¼ Ò�':0\4I':+32(4��Ú #� � K� �GKT^�254\N5�Q #"5$&"5'*):':+.-O+Q2%$8+�$P9J25'*):S¦2%$&0]"5)*?54\4I-#4I0�'*0 ��� �5K �H0Q0!?56�4Z'*=5S(4�Nd4�=5S54I=59�4"d4I+ 2�4�4I=}9J25':)*S5�!4�= K �\ #=(0Q'*S(4��c$Ã@�$867':):-a2T':+Q2 �¦9J25':)*S5�!4�= K
Ï�$<Ò?$ @R':+Ã':0 �[=( 2T=ß+32%$8+7$8+�):4;$&0_+7 &=54³9J2('*)*S 2%$807"5):?54�4I-&4�0�VN2T25$8+b':0`+3254N5�! #"%$&"('*)*',+.-Ã+32%$8+P$8+T):4;$&0_+�+325�!4�4/9J25':)*S5�!4�=�2%$;U#4�"5)*?(4�4-#4I0ÏB"×Ò@$ @T',+¦'*0��E=5 2T=�+32%$8+M+32547-# #?5=(>#4�0_+Ã9J25'*):S�2%$&0i"5)*?5474I-&4�0�V�2T2%$8+i'*0�+3254N5�! #"%$&"('*)*',+.-Ã+32%$8+P$8+T):4;$&0_+�+325�!4�4/9J25':)*S5�!4�=�2%$;U#4�"5)*?(4�4-#4I0
� *(K><G2( 2r+32%$¨+ $�Ï ¼ +� MÒA® $ZÏ ¼ L +� iÒ $�Ï +,L iÒ $\Ï MÒå
�����Z����M������ �(� � Ü ���� K�<G?5N(Na #0!4 � 4IU#4I=<+30b@B #�!6 $�N5$&�!+Q':+3': #= &@H+Q254|03$&6�N5)*4�0!N%$&9�4}{RVZ'«K 4&Kß+3254-$&�!47S5':0��! #':=<+�$&=5S / æ98aâ ¼ æ\® {RK �P0Q0!?5674�+32%$¨+ $\Ï +`Ò¦î Ú K���Q �U&47+325$8+�':@$\Ï ¼ â L +`Ò � $ZÏ ¼ âQÒ�+Q254�= $\Ï ¼ æ L +`Ò\î $ZÏ ¼ æ�Ò�@B &��0Q #6�4/í1® ÛG²;å;å�å�² � K�� K�<G?5N(Na #0!4¦+32%$¨+/Ü Ú Nd4��Q9I4�=<+/ &@A9� &67N5?G+34��H 2T=(4��Q0/?50!4Ã$ ,}$&9I'*=<+3 #0!2 V � Ú ?(0Q4Lì'*=(S5 2T0]$8=5S¦Û Ú Nd4��!9�4I=E+]?50!4�B�'*=E?G~aK�<G?5N5Nd #0Q4�+Q2%$8+!* �ZNa4I�Q9I4�=<+x 8@G+Q254�,}$&9?50!4��Q0�25$�U&4R0Q?(9�9�?(6¦"a4IS+3 `$i9I #6�N5?(+34I��U['*�!?50�V ÛMNd4��!9�4�=<+T &@�+Q254PLì':=5S5 2T0?50!4��Q0\>#4I+�+3254cUG':�Q?50\$&=5S � Ú Nd4��Q9I4�=<+� &@�+Q254�B�':=[?[~b?(0Q4��!0Z>#4+\+32(4PU['*�!?50�KDL�40Q4I)*4I9I+T$¦Nd4��!0Q #=�$8+\�3$&=(S5 #6�$&=5S)*4�$&�Q=7+32%$8+Z254��Z0_-[0!+34I6 2\$&0Z'*=G@B4�9I+Q4�S 2T',+32+32(4/U[':�Q?50IK�Lì25$8+T'*0Z+Q254/N5�Q #"5$&"5'*):':+.-�+32%$¨+T0Q254�'*0T$¦Lì'*=(S5 2T0c?(0Q4�� ��� K � "d �~9� #=<+J$8'*=50 �¦9� &'*=50T$&=(S|4�$&9J2|2%$&0c$`S5'SPa4��!4�=<+TN5�Q &"%$&"5':)*':+.-� &@]0Q2( 2T':=5>254�$&S50�K�B�4I+�� âJ²;å;å;åI²����S54I=5 &+34m+Q254mN5�! #"%$&"('*)*',+.- &@/254;$8S50 #=Ó4;$89J2Î9I #'*= K
<G?5N(Na #0!4�+Q2%$8+�aâ�® Ú ²��5ãZ® � "��5²��%ä\® � "#Û[²�� � ®rÜZ"�� $8=5S����\® � å
B�4I+A± S54I=5 &+34 &!2(4;$&S50�':0� #"(+J$8'*=54IS (/$&=5SÃ)*4+ \æ×S54�=( &+34�+32(4c4U#4I=E+A+325$8+�9� #':=í�'*0�0!4�)*4I9I+Q4�S KϪ$#Ò�<G4�):4�9I+i$9� #':= $8+��3$&=5S( #6 $&=5S +3 &0Q0/':+�KR<G?5N5Nd #0Q47$�254;$&S '*0O &"(+J$&':=54�S KLì2%$8+]'*0�+Q254�Nd #0_+34��!'* #��N(�Q #"%$8"5'*):':+.-H+Q2%$8+19� &'*=�í 2\$&010Q4�):4�9+34�SÏ�í1® � ²�å;å;å�² �#Ò $.=| &+3254I��2\ #�!S50�V -5=5S $\Ï \æ L ±mÒ�@B #��í]® � ²;å�å;å�²��[KÏ�"%ÒP^] &0Q0R+32(4¦9� #':=�$&><$8'*= KHLì25$8+O'*0c+3254`N(�Q #"%$8"5'*):':+.-� &@A$&=( &+3254I�R254�$&S �$.= &+Q254���2� #�QS50�-%=(S $\Ϫ±`ã L ±�â¢Ò�2T254��!4i± [ ® &!254;$8S50T #=|+Q #0Q04^%K5(�P 2Ó0!?5N5Nd #0Q4T+325$8+�+3254T4~GNd4��!'*6�4�=<+�2\$&0�9;$8�Q�Q':4�S� #?(+�$&0�@B #):)* 2T0�K1L�4T0Q4I)*4I9I+$`9� #':=|$8+T�3$&=(S5 #6 $&=(S|+Q #0Q0T':+T?(=E+Q'*)�$`254�$&S|'*0� #"(+3$&'*=(4�S KÏ�9;Ò�`x'*=5S $\Ï \æ L + � Ò�2T254��!43+ � ® &�-%�Q0_+P2(4;$&S|'*0� #"G+J$&':=54�S� #=�+3 #0!0 �5K)(
Û Ú K�Ï �+7 ��� P�/ �9F��E�[�&�9F 1 �P� 5�/eK Ò <G?5N5Nd #0Q4 $�9I #'*=Ó2%$80N5�Q #"5$&"5'*):':+.-�Î &@/@�$&)*):'*=5>254�$&S50�K�$ @ 2\4OY%':N�+3254O9I #'*=67$&=<-b+Q'*6�4�0IV�2�4K2� #?5):S4~GNd4�9I+T+Q254ON5�Q &Na #�_+3': #= &@�254�$&S50\+Q ¦"d4R=54;$8���KxL�4�2T'*):)×6b$'�84H+32('*0�@B &�Q67$&)×) $8+Q4���Kx^x$'�84���® å Üi$&=5S� ® �;Ú#Ú#Ú $8=5Sm0Q':6¦?5)*$8+34 � 9I #'*=|Y%'*N(0�KR��): &+P+Q254iN(�Q #Nd #�!+Q'* #=³ &@�254�$&S50O$80R$@B?5=59+3': #=� &@ � K #c4INa4�$8+�@B #���b® å Ú Ü(K
Û � K�Ï �+7 ��� P�/ �9F��<�[�#�9F 1 �P� 5�/eK Ò%<G?5N5Nd #0!4N2\4�Y%':N`$R9� #':= � +3':674I0D$&=(S�):4I+��ÃS54�=5 8+34+32(47N5�! #"%$&"('*)*',+.-} 8@Z254�$&S50�K B�4+� "d4Ã+Q254�=[?(6¦"a4I�M 8@Z254�$&S50�K7L�4�9;$&):)
Ü * ������������ ������������������������
$ "5':=5 #6�' $&)T�Q$&=5S5 #6 U8$&�Q'*$&"5)*4'2T25'*9J2ì':0�S5'*0!9�?50!0Q4ISX'*=¯+Q254�=54e~G+�9J25$&N(+34I��K$.=<+3?(':+3': #=�0!?5>#>#4I0!+Q0¦+32%$8+ 2T'*):)�"d4�9�): #0Q4�+3 � ��K}^] �0!4�4':@\+325':0¦'*0�+3�Q?(4&V2�4¦9;$8=��Q4�Nd4;$¨+R+325':0H4e~(Nd4��!'*6�4�=<+O67$&=<-³+Q'*6�4�0H$&=5S�$;U#4I�3$&>&4¦+3254 U8$&):?54�0IK�Z$&�!�!-à #?G+Z$�0Q'*6i?5) $8+Q'* #=Ã$8=5Sb9I #67N5$&�Q4T+3254H$;U#4��Q$&>#4P &@ +Q254 � 0�+3 � � KD^]�_-+Q25'*0Z@B #����®½å¿ÜÃ$&=5S � ® ��Ú ² �;Ú#Ú ² ��Ú&Ú#Ú K
Û#ÛGKMÏ �#7 ��� P�/ �9FH�E�[�&�9F 1 �H� 5�/eK Ò �P4I�Q4 2�4 2T':)*)�>#4+¦0Q &674�4~GNd4��Q':4�=59I4�0Q':6¦?5)*$8+3':=5>9I #=5S5',+3'* &=%$&)dN5�Q #"5$&"5'*):':+Q'*4�0IK �\ &=50Q':S54���+3 &0Q0Q':=5>Ã$¦@�$&'*��S5'*48K�B�4I+ ¼ ®�°8ÛG² �5² *Gµ$&=(S?+ ®�° � ²eÛ[²JÜ(² �(µEK�^�254I= VO$\Ï ¼ Ò�® � "#Û[V9$\Ï +`Ò�®XÛ "&Üc$&=5S $\Ï ¼ +`Ò�® � "8Ü(K<G':=59�4 $\Ï ¼ +`Ò�® $\Ï ¼ Ò?$\Ï +¦ÒV<+3254T4U#4�=<+Q0 ¼ $&=5S + $&�!4T'*=5S54INa4I=5S54I=E+�K4<G'*6i?(�)*$8+34cS5�3$ 2T0A@B�Q #6 +32(4H03$867N5):4P0!N%$&9I4O$&=5SbU&4��!':@F-Ã+32%$¨+ �*�Ï ¼ +`Ò�® �*�Ï ¼ Ò �*bÏ +`Ò2T254I�Q4 �*�Ï ¼ Òi'*0¦+32(4N5�Q #Nd #�_+3'* &= &@c+3'*6�4�0 ¼ G9I9�?5�!�Q4�S '*= +Q2540Q':6¦?5)*$8+3': #=$&=(S70!'*6�'*) $8�Q):-O@B &� �*�Ï ¼ +`Ò�$&=5S �*7Ï +`ÒeK �P 2 -%=(S�+ 2� i4U#4�=<+Q0 ¼ $&=5S +�+Q2%$8+$&�!4R=5 &+T':=5S54INa4I=5S54�=<+�K �\ &67N5?G+34 �*�Ï ¼ Ò² �*7Ï +`Ò�$&=5S �*�Ï ¼ +¦ÒK �\ #6�N%$&�Q4P+32549�$&)*9I?5) $8+Q4�S�U8$&)*?(4�0O+3 �+3254I'*�R+Q254� #�!4I+Q'*9;$8)DU&$8)*?54I0�K #c4INa #�_+/-# #?5���Q4I0Q?5),+30�$&=5S':=E+Q4��!N5�Q4+;K
� � ��� ����
� � � � ��� � ��������� �����
��� � !#"%$'&'(�)+*+,-$'./(0"132546287:9;2<7:=>9%4@?BACAD46254�EF7G?H7:?HI�4@JLKM=/NO?B=/K/JL?HK/ACPQ7R2<SCAD462<4BTVUWNXPYAHN�PZK+[G7:?H\
9<4@E0]H[GK^9L]D4@=>K/9_46?HA`KbaOK/?c289d2<N0AD462<4cegf-SBKh[G7:?H\%7G9Q]HJ8N a37:AHK>A`icjk2<SBKh=>NO?H=/K>]B2_N6l'4J<4@?BAHNOE�a64@JL7m4@iH[GK@T
n'o>prqOs t<s uBqwvHxzy|{~}X�D�����r� �B�H}����B�����V�:�����0�b�c�r���3�+� �^� � � ���B�@����5�b�R�@�B�^���<�8�@ ¡�D¢3�0£b�b�¤��¥�¦Z§¨�ª©k�8�3«5�`©�¢¬�«5©����-¦d®
¯°o/±�²cqOs³±µ´c¶·¶ ¸/¹º´¼»m´½qµ¾¿cuBÀ Á�´�»ÂsôcÄc¶ oÅÀhÆXǪtÄOokÀdo/´�ÇÈÆc»�´cÄc¶ o@x%ÉBo>ot5²Xo tªo/±�²cqOs³±/´½¶�´½Êµ¾ÊOo6qX¿Bs ËÍÌ�uc»�¿co>t;´Osζ Ç/x
Ï 2Q4Ð=/K>JL2<4@7:?%]rNO7G?½2Z7:?%E0NO9;2¤]HJLNOiD4@iH7G[:7R2jw=/NOÑBJ898K>9/ÒB28SHK_984@EF]B[:Kd9L]D4@=>K_7:9ÓJ<4@JLK/[GjEFK>?c2<7:N@?HK/AÔ4@?BAÔPÓK-PZN@J8\ÐAB7:J8K>=>28[Gj^PQ7G2<SwJ84@?HAHN@E�a64@J87:4@iH[GK/9/T�ÕZÑB2'j@NOÑÔ98SHN@ÑH[:Aw\6K/K/]7:?�E07:?HA%28SD462Q2<SHKÖ984@EF]B[:K_9L]D4@=/K^7:9-JLKµ4@[G[Gj02<SHK>J8K6Òr[GÑHJ8\½7G?HIw7:?k2<SBK^iH4@=5\½IOJ8NOÑB?HA°T×½Ë@´cÀhÊc¶ o^vHxÙØÛÚÜ Ý�·�Þ�M«5©����ß��b���������5�/®Íà#�b�á��¥m¦Z§Í£b�Ð���D�Ð�D¢¬�0£b�b�Ô©ªâ^�D�8�cã��Ð���ß���D�� �8ä/¢å�b�æ«5�Z¦d®QÚ�©��Ð�Lç½�@�d�r :�5è'� âZ¦êéìëMëîíïëMëîíïëMëîíQí����D�b�F��¥�¦Z§'éñðH®¤ò×½Ë@´cÀhÊc¶ o^vHxÙvÛà#�b�r�Ûéôó¬¥�õ#ö<÷B§bø'õåùcúF÷3ùQûVü@ýÔ£b�¤���D�¤¢¬�D����ã@�:� «/®ÖþÓ©��B�b��ã3�b�-ã@�L�@ÿ'���3��_�D©����D���Â�@�'�8�@�rã3©����#âÈ�<©����Ö®����M�dÿ'�� � r�0��c�_���3�:�_��ã3�8�w��©��<�Ü�r�5�5«b�:� �d R�@��b�/®� {� �5�r� «8�@ D©�¢¬�«5©����Ó�:�頻â ���D�æ⵩��È�Û¦êé ¥�õ#ö<÷B§/®��°©����Q�Lç½�@�d�r :�5�-©ªâ �8�@�rã3©������@�È���c£/ :�5��@�<�Q��¥�¦Z§áé õ°è��%¥�¦Z§'é ÷åè��Ð¥�¦Z§'é õwú�÷Dè�� ¥�¦Z§áé�� õ ù úê÷ ù ®-ò
�Ö7RaOK/?^4-J<46?HAHNOE a64@J87:4@iH[:K � 4@?HA^4¤98ÑHiB98K>2��ÞN@l32<SBK'J8Kµ46[O[:7G?HK@ÒµABK��D?HK ��� �b¥ � §Üéóµ¦"!M�ì�Ô��¥�¦Z§�! � ý 4@?BA [GK>2
#%$
#�� ��������� ����������������� ����������� �! "
# ¥�� ! � §~é # ¥�� � � ¥ � §8§Üé # ¥óµ¦"!+�_øQ��¥m¦Z§ ! � ý6§# ¥���é õ §~é # ¥�� � � ¥�õ §L§Üé # ¥ªóµ¦ !+�_øQ��¥�¦Z§'éCõ¡ý6§%$
� AHK/?BN@2<K>9Q28SHKÖJ<4@?BAHNOE�a64@JL7m4@iH[GK¨4@?HA õ AHK/?HN62<K/9ï4Ô] NO9L987GiH[:Kda64@[GÑHK¨N@l � T
×½Ë@´cÀ^Êc¶ oÖvHx'& ÚÜ Ý�·�ô� «5©����ñ��ÿ'� «5���@�rã� :�b�¨� £b�+���D�ß�D¢¬�0£b�b��©ªâî�D�8�cã��/®)(°�D�b�Bè# ¥�� é+*c§-é # ¥ó íQí^ý6§Zé ü-,/.¬è # ¥�� éÅüX§¤é # ¥ªó�ëîíQöLídëßý6§ZéÅü0,�1+�@�rã # ¥�� é1O§Mé # ¥ªó�ëMëßý6§`é ü-,2.å®3(°�D�M�L�@�rã3©�� ���@�b���c£/ :�M�@�rãÞ�����gã@�:�b�Â�È��£/¢3�Â� ©��ô«8�@� £b��b¢3�w�0�@�b�54��8ãF���Ü⵩� � :©�ÿ��76
¦ # ¥ªóµ¦Qý6§ ��¥m¦Z§(8( 9%:�; <(>= 9%:�; 9=?( 9%:�; 9=�= 9%:�; @
õ # ¥�� éCõ §< ü-,2.9 ü-,�1@ ü-,2.
(H� �Ð�3�b�æ�b�8�@ Ý�54/���3�Ô���3�:�^�©BADCÓ�·�D�/®¤ò
���FE G .IH�$á& .7J+*`$'./(0" K *+"+,-$'./(0"LHNMÐ"+) O &'(?JPMQJ+.IRµ.>$TS K *+"+, U$'./(0"LH
�Ö7RaOK>?g4%J<4@?HABNOEYa64@JL7m4@iB[:K � ÒæPZKwABK��D?HKÔ46?+7:E0] NOJ;254@?c2Wl�ÑH?H=b2<7:N@?+=µ4@[G[:K/Aî2<SHK=/ÑBEÍÑH[m4�2<7Ga@KFAB7:9L28J87GiHÑB2<7GNO?gl�ÑH?H=>287:NO? ¥ NOJhAH7:9;2<JL7:iHÑB287:NO?gl�ÑH?H=b2<7GNO? § 7:?�28SHKFl�N@[:[:N PQ7G?HIPZ4 j@T
n'obp qOs t<s³uBqÍvHxWVX(°�D�ZY\[��][����_^X��� �_���a`b^ }����c[d^X��� �feg[��ZYh^X��� �ji>kZlnmZo �O�ì�p *Böµü7qQ©ªâ^���L�@�rã3©�� ���@�È���c£/ :�-� �:�Ðã3�srÜ�æ�8ã%£ �
mZoÖ¥�õ §Üé # ¥�� ûÛõ §%$
�����_� � � "!���� ���8Z�%�8�������Q��Z�%����"B����� � � ��� � � � �d�7�������Q��Z�%����" #�
* ü 1
ü
�� �
mZo¨¥�õ §
õT 1� T $
� 7GIOÑHJ8K # T ü@� i>k�l l�NOJ��D7:]B]H7:?HIÔ4Ô=/N@7:?k2PQ7:=>K ¥���� 46EF]H[GK # T ð T §
� NOÑ`E07:IOSc2QPÓNO?HAHK>JïPQScjîPÓKÐirN@28SHK/JQ2<NFAHK��D?BKh2<SBK i>kZl T � NOÑîPQ7G[:[¡98K>KÐ[:462<K>J2<SD4�2W28SHK i>kZl 7:9Q4ÔÑH9LK>l�ÑH[°l�ÑH?B=>2<7GNO? � 7R2QK�� K/=>287Ga@K/[Gj%=/NO?c25467:?H9W4@[G[ 2<SBK^7G?Bl�NOJLE�46287:NO?4@irNOÑB2-2<SBK^J84@?HAHN@E�a64@J87:4@iH[:K6T×½Ë@´cÀhÊc¶ o^vHx��ÛÚÜ Ý�·�Þ�Öâ>�@���Ô«5©�������ÿ'� «5�0�@�rãk :�b�á� £b�w���D�Í�D¢3�0£b�b�F©ªâÖ�D�8�cã��/® (°�D�b�# ¥���é *c§'é # ¥�� é 1O§'é�ü-,2.0�@�rã # ¥�� éü § éôü-,�1H® (°�D�¤ã@�:�b�Â�È��£/¢3�Â� ©��QâÈ¢3�æ«b�Â� ©���:�
mZo_¥�õ §Üé���� ���* õ�� *ü0,2. *ÔûCõ�ñü# ,2. ü^ûCõ� 1ü õ�� 1�$
(°�D�?i>kZl �:�%�5�D©�ÿ'� ����ÚÜ�R�@¢¬�<�! ½® 9c®�{d Ý���D©�¢@���ê���3�:�M�Lç½�@�d�r :�k�:���b���d�r :�5è¨�b�¢Bã ����_«8�@�5��âÈ¢3 � �@® iTk�l#" �%«8�@�º£b� �@�b� �M«5©��µâÈ¢½�b���3�c®%$^©��Â� «5�w���B�@�Q���D�¤âÈ¢3�æ«b�Â� ©��º�:�Í�È�R���3�«5©��D�Â���D¢D©�¢c�5èÖ�æ©��'&;ã3�5«b�<�8���b���3�+�@�rã+���B�@�¨���_�:�kã3�srÜ�æ�8ãÐ⵩��k�@ � #õ� �@�b�����D©�¢@���|���D��L�@�rã3©�� ���@�È���c£/ :�Í©��D �F� ��c�5� ���@ Ý¢D�5��*Böµü%�@�rã 1H®�(Í© �c©�¢�� �5�^ÿ�� � mF¥;ü�$W.½§'é $ $ eò
f-SHKwl�NO[:[GNXPQ7G?HI�JLK/9LÑH[G2/ÒæPQSH7:=5S+PÓKÔAHNk?HN@2¨]BJ8N aOK6Ò�9LSHN PQ9¨2<SD462¨28SHK i>kZl =>NOE�)]H[:Kb2<K>[Gj%ABK>2<K>J8E07:?HK>9-2<SHKÖAH7G9L2<JL7:iHѬ2<7:N@?kN@lá4wJ84@?HAHNOE�a@46J87m46iH[:K6T¯W²Xo>uc»mo6À+*-,/.Và#�b�'� �B� �@� i>k�l m~�@�rã :�b� ���B� �@�Bi>kZl 0F®%1�â mF¥�õ §ïé20�¥�õ §âµ©��h�@ � Bõ����D�b� # ¥�� ! � §'é # ¥ ��! � §Ü⵩��^�@ � � ® ¯°o/±�²cqOs³±µ´c¶·¶ ¸/¹43'oÍuBqc¶ ¸
²�´XÁXoVt5²�´ t # ¥�� !� §ôé # ¥ � ! � §Ì�u½»½obÁXo6»G¸dÀïoµ´XÇbÆc»m´½Äc¶ oo>ÁXo6q t � x
¯W²Xo>uc»mo6À+*-,65V{�âÈ¢3�æ«b�Â� ©�� m��0�b�c�r���3�Í���D�_�<�8�@ å Ý���æ�_�© p *Bö/ü7qá�:�¨� iTk�l¼âµ©��W� ©�����r�<©@£5�c£/�� Ý��� �Í���8���b¢3�<� # � â¨�@�rãF©��D �Í� âï�����/�@���:�sr¤�5�d���D��⵩� � :©�ÿ'���3�w���3�<�5�h«5©��rã@����� ©��B�76
.\* ��������� ����������������� ����������� �! "
� � m¼�:�^�æ©��'&;ã3�5«b�5�8���b���3�w��®��/®Óõ � � õ ù ���d�r Ý� �5�¨���B�@��mF¥�õ � §¤û mF¥�õ ù §>®� ��� Bm �:�^�æ©��b�0�@ Ý�54��8ã\6 [:7GE�� � ��� mF¥�õ §Üé *+�@�rã [:7GE�� � � m�¥�õ § éôü½®� ����� Bm �:�Ö�È�R���3� &8«5©��D�����D¢å©�¢c�5è ��®��/® mF¥�õ §Üé mF¥�õ��#§á⵩��h�@ � Hõ°è ÿ��D�b�<�
mF¥�õ � § é [:7:E���� �� � mF¥�÷B§%$
������� l�� 1¬ÑH]H]rNO98Kw28SD462 m 7G9Ö4 i>kZl T��#Kb2ÖÑH9Ö98SHN P�2<SD462 ¥ 7:7G7 § SHNO[:AB9/T��¡Kb2 õirKÍ4FJLKµ4@[�?½ÑHEÍirK/Jd4@?HA`[GK>2 ÷ � ö8÷ ù öI$7$I$ i KÐ4F98K��½ÑHK/?H=>KwN@l'JLKµ4@[�?½ÑHEÍirK/J89d98ÑH=5SM28SD462÷ ��� ÷ ù ��� � � 4@?HA�[G7:E�! ÷ ! éCõ T"�#Kb2 �#! é ¥%$�&ñö<÷ ! q 46?HAî[GK>2 � é ¥%$�&ñöLõ q T('WN@2<K2<SH462 � é*) �!,+ � �#!�4@?HAî4@[G98N0?HN@2<K¨28SD462 � �.- � ù -/� � � TZÕZK>=µ4@ÑB98Kh28SHKÖK>a@K/?c2<9¨46J8KE0NO?HN@28NO?HK@ÒB[:7:E0! # ¥ �#! § é # ¥ ) ! �1! § T f-S½ÑH9>Ò
m�¥�õ § é # ¥ � §'é #32�4 ! �#!65 é [G7:E! # ¥ �1! §Üé [G7:E! m�¥�÷ ! §'é m�¥�õ � §%$1¬SHN PQ7G?HI ¥ 7 § 4@?HA ¥ 7G7 § 7G9h9L7:E07:[m46J/T�7'J8N a37:?HI 2<SHK0N@28SHK/JÖAH7GJ8K/=b2<7GNO?�?H4@EFK>[Gj@Ò#28SD462h7Glm 98462<7G9 �DK>9 ¥ 7 § Ò ¥ 7G7 § 4@?BA ¥ 7:7:7 § 28SHK/? 7G2Q7G9Q4 i>kZl l�N@JW9LNOE0K¨J<4@?HABNOE�a64@J87:4@iH[GK@ÒBÑB98K/998N@EFK¨AHK>K/] 2<N3NO[:9¤7G? 4@?D4@[Rj¬9L7:9>T ò8 Ç;o>t s³Ç�±/uBÆcq t;´cÄc¶ o
s Ì s t s Ç prqOs tªo u½»s t ±/´½q ÄOo ÊcÆ t s·q´�uBqXob¾ tªu6¾u¬qXo|±/uc»Â»mob¾ÇÈÊOu¬qX¿co6qX±/o3Qs t5²ßt5²Xos·q tªo:9Oo�»mÇ>x�¯Q²XoMo>ÁXo�qqcÆcÀ^ÄOo�»mÇb¹ t5²Xo u3¿c¿qcÆcÀ^ÄOo�»mÇÜ´cqX¿^t5²Xo¨»m´ ¾t<s uBq�´c¶ Ç ´6»mo ±/uBÆcq t ¾´cÄc¶ o<;�t5²XoÔÇ;o>t¨u@ÌW»mo/´½¶qcÆcÀ^ÄOo�»mÇ`Ä@o>t 3'o>o6q>=´cqX¿êyîs³ÇFqXu@t^±/uBÆcq t ¾´cÄc¶ o@x
n'obp qOs t<s³uBqÍvHx@?ê� �:�ï��� `7YO}��\^ �+� âÖ��� � ��c�5�Í«5©�¢¬�D� �c£/ �0�0�@� �����@ Ý¢D�5�ó õ � ö8õ ù ö7$I$I$Ýý $
�`�hã3�srÜ�æ�¨���D�1A�}��å���H���Â���s^CB e�[��ZYh^X���r� ©��DA�}X�å���H�����Â� ^CBÞ�|� `7`�eg[��ZYh^X��� �⵩��-� £ � E
o_¥�õ §Üé # ¥�� éCõ §%$
f-S½ÑH9/ÒEo_¥�õ § � * l�NOJÐ4@[:[ õ !ì� 4@?HAGF ! E oÖ¥�õ ! §Íé ü T`f-SBK i>k�l N@l � 7:9
J8K>[m4628K/Ak2<NEo i½j
mZo_¥�õ §Üé # ¥�� ûÛõ §ÜéIH�CJLKM� E o¨¥�õ ! §%$1¬NOE0K>287:E0K/9ZPÓKÖPQJ87G28K
Eo 4@?BA m!o 987:E0]H[Rj%4@9
E4@?HA m T
�����_� � � "!���� ���8Z�%�8�������Q��Z�%����"B����� � � ��� � � � �d�7�������Q��Z�%����" .Hü
* ü 1
ü � � �Eo¨¥�õ §
õT 1� T
� 7:I@ÑHJ8K # T 1¬� 7'JLNOiD4@iH7G[:7R2jFl�ÑH?B=>2<7GNO?kl�NOJ��D7G]H]H7:?BIÔ40=>NO7:?k2PQ7:=>K ¥ ��� 4@EF]B[:K # T ð T §
×½Ë@´cÀhÊc¶ o^vHxRy = (°�D�¤�r�<©@£5�c£/�� Ý��� �WâÈ¢3�æ«b��� ©��Ð⵩����áç½�@�d�r :� ½®��Ô�:�
Eo_¥�õ §Üé
���� ���ü-,2. õké *ü-,�1 õké�üü-,2. õké 1* õ�,!|ó/*BöµüOö 13ý $
�°�5�_ÚÜ�R�@¢3�<� ½® @3®¤ò
n'o>prqOs t<s uBqwvHxzyOy|{ �8�@�rã3©�� ���@�È���c£/ :�Í� �:��YO� � ^X�Â�T[�� [Z`ß� âk���D�b�<�M�LçO�:�b���î�âÈ¢¬�æ«b��� ©��
Eo �b¢D«5�ß���B�@�
EoÖ¥�õ § � *F⵩��F�@ � °õ°è�� ���� E
o_¥�õ §��OõÞé üg�@�rãh⵩��� �@�b� �Fû��µè
# ¥ � � ���b§ é����� Eo_¥�õ §��Oõ!$ ¥ # T üX§
(°�D�'âÈ¢3�æ«b�Â� ©��Eo �:�w«8�@ � :�8ãF���D�#A�}X�å���B�����Â� ^CB����3�Z`µ� ^�B e�[��ZYh^X���r����� kZl�� ,
�M�¨�B� �@�¨���B�@�mZoÖ¥�õ §Üé � �
��� Eo_¥��L§����
�@�rãEo_¥�õ §Üé m��o ¥�õ §Í�@�Z�@ � ¬�D©����D���Qõ��@� ÿ��3� «5� m!o �:�Ðã@� �_�b�<�b�D�����c£/ :�/®
1¬NOE0K>2<7GEFK>9¤PZKÖ9LSD4@[:[°PQJL7G28K � E¥�õ §��Oõ NOJ-9L7:E0]H[Gj � E
28N0E0Kµ46? � ���� E¥�õ § �cõ T
. 1 ��������� ����������������� ����������� �! "
* ü
üm!o¨¥�õ §
õ� 7GIOÑHJLK # T # � i>k�l l�NOJ��ï?H7Rl�NOJ8E ¥ * Ò üX§ T
×½Ë@´cÀ^Êc¶ oÖvHxzy Ø �å¢X�c�D© � �Ö���B�@��� �B��� � k�lEo_¥�õ §Üé�� ü l�NOJ *0ûÛõ`ûVü
* N@2<SBK/JLPQ7G98K $þá :�8�@�È �Xè
Eo¨¥�õ §�� *+�@�rã � E
o_¥�õ § �cõ�é�ü½®Q{Å�L�@�rã3©�� ���@�È���c£/ :�Öÿ'����� ���3�:�Íã3�b�B�b��� ��:�_�/�@��ã��©w�B� �@�h� �W?H7Gl�NOJLE � < è 9 Fã@�:�b�Â�È��£/¢3�Â� ©��r®�(°�D��i>k�l �:�ï�@� �@�b�+£ �
m!o¨¥�õ §Üé�� � * õ� *
õ *ÔûÛõMûVüü õ � ü�$
�°�5�_ÚÜ�R�@¢3�<� ½®/ ½®¤ò
×½Ë@´cÀ^Êc¶ oÖvHxzy v �å¢X�c�D© � �Ö���B�@��� �B��� � k�lE¥�õ §'é � * l�N@J õ�� *
�� � � ����� N62<SHK>JLPQ7:9LK $�å���æ«5� � E
¥�õ § �cõkéü6è ���3�:�Ö�:�Ð��ÿZ�b � &;ã3�srÜ�æ�8ã� kZlꮤò �B}6��������� NO?c2<7G?½ÑHNOÑH9ÔJ<46?HAHNOE a@46J87m46iH[:K>9Í=µ4@?ê[GKµ4@Aº28Nß=>NO?Bl�ÑH9L7:NO?æT � 7GJ89L2/Ò
?HN@28Kw2<SD462Ö7Rl � 7G9¨=/NO?c2<7G?½ÑHNOÑH9Ö2<SHK>? # ¥�� é õ §¨é * l�N@J¨K>aOK>JLj õ���� NO?��Ã2¨2<J;jM2<N2<SB7:?H\MN6l
E¥�õ § 4@9 # ¥�� é õ § TFf-SH7G9ÖNO?H[GjMSHNO[:AB9Öl�N@J¨AH7:9L=/J8Kb2<K0J<4@?BAHNOE�a64@JL7m4@iH[GK/9>T� K_IOK>2¤]HJ8NOiH4@iH7:[G7G287:K/9'l�JLNOE�4 � kZl i½j�7:?c2<K>IOJ<4�2<7:?BIHT Ï � kZl =/4@?kirK_iH7:I@IOK/J 2<SD4@?
ü ¥ ÑB?H[:7G\@Kw4kEF4@989dl�ÑB?H=>287:NO? § T � NOJ_K � 4@EF]B[:K@Òæ7GlE¥�õ §dé l�NOJ õ ! p *Böµü-,� -q 4@?HA *
�����_� � � "!���� ���8Z�%�8�������Q��Z�%����"B����� � � ��� � � � �d�7�������Q��Z�%����" . #N@2<SBK/JLPQ7G98K6Òá2<SHK>?
E¥�õ § �n* 4@?HA � E
¥�õ §��Oõ é ü 98NM2<SB7:9Ð7G9Í4îPÓK/[G[ ) AHK��D?BK/A � k�lK>a@K/?º2<SBNOÑHIOS
E¥�õ §Ðé 7:?ß98NOE0K�]H[:4@=/K>9/T��?|l�4@=b2µÒá4 � k�l =µ46?ºi K0ÑH?½irNOÑH?HAHK>A°T� NOJhK � 4@EF]B[:K@Ò�7GlE¥�õ §^é ¥ 1�, # §ªõ � ����� l�N@J * � õ �¼ü 4@?HA
E¥�õ §hé * N@28SHK/J;PQ7:98K6Ò
2<SHK>? � E¥�õ § �Oõ�é�ü K>a@K/? 2<SHNOÑBIOS
E7G9-?HN@2Wi N@ÑH?HAHK>A°T
×½Ë@´cÀhÊc¶ o^vHxRy7& à#�b� E¥�õ §Üé�� * l�N@J õ� *
�� � � ��� N62<SHK>JLPQ7:9LK $(°�3�:�^�:�Ö�æ©��-� � kZl �b���æ«5� � E
¥�õ §��Oõ�é � �� �Oõ ,H¥;üÓúÞõ §Óé � �� ���T,�%é [GNOI ¥ &C§ é&V®¤ò
6o�ÀhÀd´wvHxzyIV à#�b��m�£b�¨���D��i>kZl�⵩��h�F�L�@�rã3©�� ���@�È���c£/ :�-��®?(°�D�b� 6� � # ¥���é õ §Üé mF¥�õ § $ mF¥�õ � §Öÿ��D�b�5��m�¥�õ � §Üé [:7GE �� � mF¥�÷B§5è� ��� # ¥�õ���� ûC÷B§'é mF¥�÷B§�$ mF¥�õ §5è� ����� # ¥�� � õ §Üéü $ m�¥�õ §5è� � � 1�â¤� �:�Ы5©��D�Â���D¢D©�¢½�^���D�b�# ¥ ��� ���b§ é # ¥��ûÛ� ���b§Üé # ¥ ��� û��b§Üé # ¥ FûÛ� û �b§%$
� 2ï7G9Q4@[:9LNÔÑH98Kbl�ÑH[°2<N0AHK��H?HK^28SHKÖ7:?caOK>J89LK i>kZl ¥ NOJD�½ÑD4@?c2<7G[:K_l�ÑH?B=>2<7GNO? § T
n'o>prqOs t<s uBqwvHxzy �Ûà#�b�#� £b�^�Ô�L�@�rã3©�� ���@�È���c£/ :�¨ÿ'����� iTk�l mF® (°�D�-�Â� �r�½}/`/�i>kZl ©��� T[��D� ^X�Â��� eg[��ZYh^X��� �C�:�Ðã3�srÜ�æ�8ã%£ �
m � � ¥��O§'é 7:?Bl�� õM� mF¥�õ §Zû����⵩���� ! p *¬öµü7q ® 1�â�m �:�¨�b�Â�È� «b�� �����æ«b�<�8���b���3���@�rã «5©��D�����D¢å©�¢c�^���D�b� m � � ¥��O§Ö�:����D�¨¢¬�D��ä/¢D�^�<�8�@ ¡�D¢¬�0£b�b�¤õÞ�b¢D«5�����B�@�cm�¥�õ §'é��B®
�ÃÌ%¸>uBÆ ´�»moñÆcq Ì�´cÀ^s·¶ ¾sô6» 3Qs t5²��;sÎq Ì��D¹��5ÆXǪtt5²OsÎq���u@ÌÛs t ´XÇ t5²XoÀ^s·qOsÎÀ^ÆcÀÍx
� K-=/4@[:[ m � �b¥;ü-,2.c§ 2<SHK �DJ89;2(�3ÑH4@JL287:[:K6Ò m � �>¥;ü0,�1O§ 28SHK-E0K/AH7:4@? ¥ N@Já98K>=/NO?HA��½ÑD4@J )2<7G[:K § 4@?HA m � � ¥ # ,2.½§ 2<SHK¨2<SB7:J8A �½ÑD4@J;2<7G[:K@T
.�. ��������� ����������������� ����������� �! "
f¤PZNgJ84@?HAHNOE a64@J87:4@iH[:K>9 � 4@?HA � 46J8K � >[#�D�w��� ���a`b^ }����c[d^X���r� � PQJL7G2828K/?� �é � � 7Rl m!o¨¥�õ §Óé m��Ó¥�õ § l�NOJQ4@[G[ õ TÜf-SH7:9QAHN3K/9Q?HN62ïE0Kµ46?�28SD462 � 4@?HA � 46J8KK:�½ÑD46[ T��ï4�2<SHK>J/ÒD7G2¤EFK/4@?H9-2<SD4�2d46[:[°]HJLNOiD4@iB7:[:7R2j09L254�2<K/E0K/?c289d4@irNOÑB2 � 4@?HA � PQ7:[G[irKÖ28SHKÖ9<4@E0K@T
���5� �k(� !�� �|(0&á$�MÍ"%$ G .IHá,W& d$� � MÍ"+)M(�� � MÍ&Ü.%UM JLR� H
�B}6��������%�'� [d^��î� ^ ��^X���r� � � 2'7G9�2<J84@AH7R2<7:N@?D4@[c2<NdPQJL7G28K ��� m 28N¨7:?HAH7G=µ4628K2<SH462 � SD4@9-AH7G9L28J87:iBÑB2<7GNO? m T'f-SH7:9-7G9¤ÑH?Bl�N@JL2<ÑB?D462<K¨?BN@2546287:NO?�987:?B=/K_2<SBK¨9Lj3EÍirNO[ �7:9á4@[:9LN¨ÑH98K>A028N¨AHK/?HN62<KW4@?Ô4@]H]BJ8N � 7:EF462<7GNO?°T�f-SHK-?HN@2<462<7GNO? ��� m 7:9á9LNÖ] K>JLa64@987RaOK2<SH462ZPÓK¨4@JLK_9L28ÑH=5\FPQ7R2<S%7R2µT �QKµ4@A ��� m 4@9�� � SD4@9¤AH7G9L28J87:iBÑB2<7GNO? m��w�#� ^ 4@9 �7:9Q46]H]HJ8N � 7:EF4628K/[Gj m T�! �" ���$#&%('*),+.-/-10#&-2'��(#&354('(#6��% �h� SH4@9¨4%]rNO7:?c2_EF4@9L9_AH7:9;2<JL7:iHÑB287:NO?`462 ÒHPQJ87R282<K>? ���76 � ÒD7Rl # ¥�� é 3§ éü 7G?kPQSH7:=5S =µ4@9LK
m�¥�õ §Üé � * õ��ü õ� $
f-SHKÖ]HJLNOiD4@iH7G[:7R2jFl�ÑH?B=>2<7GNO?�7:9E¥�õ §Üéü l�NOJ õ�é 4@?HA * N@2<SHK>JLPQ7G98K@T
�! �" 08#&- i � " ' ":9 %�# l � �(;<08#&-2'��(#&354('(#6��% � �¡K>2>= � ü i Kd4hIO7RaOK>?%7:?c2<K>IOK/J>T1¬ÑH]B] NO9LK¨2<SD462 � SD4@9Q]HJLNOiD4@iB7:[:7R2j�EF4@9L9¤l�ÑH?H=>287:NO?�IO7RaOK>?îicjE
¥�õ §Üé�� ü-, = l�NOJ õ�éôüOöI$I$7$/ö =* N@28SHK/J;PQ7:98K $
� K^984µj%2<SD4�2 � SD469ï4wÑH?B7Gl�NOJLE�AH7G9L2<JL7:iHѬ2<7:N@?kNO? ó½üOöI$I$I$>ö = ý T�! �"@?A" �(% �$4�BCBC#�0#&-2'��(#&354('(#6��% � �¡K>2 � J8K/]BJ8K/9LK/?c2 4|=>NO7:?��D7:]°T f-SHK>?
# ¥���é�üX§'éED 46?HA # ¥�� é *c§'é�ü $:D l�NOJ-98NOE0K D ! p *¬öµü7q T � KÖ9<4µj%28SD462 � SD4@940ÕZK/JL?HNOÑH[G[:7°AH7G9L28J87:iBÑB2<7GNO?%PQJ87R282<K>? ��� ÕZK/JL?HNOÑH[G[:7 ¥FDr§ TÓf-SBKh]HJLNOiD4@iB7:[:7R2jFl�ÑB?H=>287:NO?7:9
E¥�õ §ÜéED � ¥;ü $:Dr§ � � � l�NOJ õ !|ó/*Böµü6ý T�! �"G? #&% ��;H#I+.BJ08#&-2'��(#&354('(#6��% � 13ÑH]H]rNO98K+PÓK�SD4µa@K|4ê=/NO7G?CPQSH7:=5S l�4@[:[G9
SHK/4@AH9dPQ7R2<SM]HJLNOiD4@iB7:[:7R2j D l�NOJW98NOE0K *�û,D�û ü T � [:7:]�2<SBKÍ=/NO7G? A 287:E0K/9d4@?HAM[GK>2
��� ��� ">��� �� � ������ ���� � � ">������ ����������� ����������� �! " . � irK�2<SHK�?3ÑBEÍi K>JÍN@lQSHK/4@AH9>T Ï 989LÑHE0K�2<SD4�2Ð2<SBK�2<NO9L98K>9Ô4@J8KF7:?HAHK>] K>?HAHK>?½2/T �#Kb2E¥�õ §Üé # ¥�� éCõ § i K¨2<SBK^EF4@9L9¤l�ÑH?H=>287:NO?æT � 2ï=/4@? i KÖ98SBNXPQ? 2<SD4�2E
¥�õ §'é������ ��� D � ¥ªü1$ Dr§ � � � l�NOJ õké *Bö7$I$I$/ö A* N@28SHK/J;PQ7:9LK $
Ï J<46?HAHNOE a64@J87:4@iH[:KgPQ7G28Sñ2<SBKºE�46989`l�ÑH?B=>2<7GNO? 7G9M=µ4@[G[:K>A 4CÕZ7G?HNOE07m4@[¨J84@?HAHNOEa64@J87:4@iH[:K 4@?BAÛPZK`PQJL7G28K � � ÕZ7G?HNOE07m4@[ ¥aA�ö Dr§ T�� l � � � ÕZ7:?HN@EF7:4@[ ¥aA�ö D � § 4@?HA� ù � ÕZ7G?HNOE07m4@[ ¥sA�ö D ù § 2<SHK>? � � úê� ù � ÕZ7:?HN@EF7:4@[ ¥aA�ö D � ú D ù § T
�H}����Â���� �¡K>2hÑH9^2<4@\@KÔ2<SB7:9^NO]H]rNOJL28ÑH?H7G2j`2<N ]HJ8KbaOK/?c2w9LNOE0K0=>NO?Bl�ÑH9L7:NO?æT �7:9h4îJ84@?HAHNOE a64@JL7m4@iH[GK ø�õ AHK/?HN62<K/9Í4î]D4@J;2<7G=/ÑH[:4@J¨a@46[:ÑHK0N@l¤28SHK�J84@?HAHNOE a64@JL7m4@iH[GK øA 46?HA D 4@J8K A��H}X�D��� ^µ�3}/` Ò�2<SD4�2h7G9/Ò�� � K/AºJLKµ4@[Ü?½ÑHEÍirK/JL9/T�f-SHK�]H4@J<4@E0K>28K/J D 7G9ÑH98ÑH4@[:[Rj+ÑH?H\½?HN PQ?ê4@?HAßEÍÑH9;2ÐirK0K/9L287:EF462<K>Aßl�JLNOE AD462<4 ø 28SD462 �Ù9ÖPQSD462h9L2<462<7G9L2<7G=µ4@[7:?Bl�K>J8K>?H=/K_7G9Z46[:[r4@irNOÑB2µT �?�EFNO9;2¤9;2546287:9L287:=/4@[åEFN3AHK/[G9/Òc2<SBK/J8K_4@J8KïJ<4@?HABNOE a@46J87m46iH[:K>94@?HA ]D4@J84@EFKb2<K>J89 � ABNO?��Ã2ï=>NO?Bl�ÑH9LK^28SHK/E T�! �"�" ��; " '��(# i 0#&-/' �(#&354('(#6�$% �_� SD469_4�IOK>NOE0K>2<JL7:=ÖAH7G9L2<JL7:iHѬ2<7:N@? PQ7R2<S
]D4@J84@EFKb2<K>J D !ê¥a*BöµüX§ ÒBPQJ87R2828K/? ��� �¨K/N@E ¥ Dr§ ÒB7Gl# ¥�� é = §ÜéED�¥ªü1$ Då§� � � ö = �Vü�$
� KÖSD4µa@K^28SD462 �H + � # ¥�� é = §Üé D �H + � ¥;ü $:Då§�ïé Dü $ ¥;ü $:Då§ éôü�$
f-SH7:?B\�N@l � 4@9¤28SHK¨?½ÑHEÍirK/J-N@l �D7G]H9¤?HK>K/AHK>AîÑH?c287:[ 28SHK �DJL9L2QSHK/4@AH9-PQSHK/? �D7G]H]H7G?HI4w=/NO7G?°T�! �" ����#&- - ��% 08#&-2'��(#&3 4�' #6�$% �ê� SD4@9Ô4 7�NO7:9L98NO?�AH7:9;2<JL7:iHÑB287:NO?�PQ7G2<Sê]D4 )
J<4@E0K>28K/J �¡ÒHPQJ87R282<K>? ��� 7�NO7:9L98NO? ¥ � § 7RlE¥�õ §'é�� ��� � �õ�� õ � *_$'ïN@28K¨2<SD462 �H ��+ �
E¥�õ §Üé�� ��� �H ��+ � � �õ�� é�� ��� � � é�ü�$
.cð ��������� ����������������� ����������� �! "
f-SHK 7�NO7:9L98NO?07:9áN6l�2<K/?ÔÑH9LK/AF4@9Ü4_E0N¬AHK>[¬l�NOJ�=/NOÑH?c289 N6læJ84@J8K¤KbaOK>?½289Ó[G7:\6K-J<4@AH7GNc4@=b2<7Ga@KAHK>=µ4µjî4@?HAî28J<4��0=Ð4@=/=>7:AHK>?½289/T�� l � � � 7�NO7G9898N@? ¥aA�ö � � § 4@?HA � ù � 7�NO7:9L98NO? ¥sA�ö � ù §2<SBK/? � � úÞ� ù � 7�NO7G989LNO? ¥aA�ö � � ú � ù § T �B}6�������� � KÍAHK��H?HK/AgJ<46?HAHNOE¼a64@J87:4@iH[GK/9Q2<N%irKÍEF4@]H]H7G?HIO9Ql�J8NOEY4�984@EF]B[:K98]H4@=/K � 2<N � iBÑB2ïPÓKÐAB7:A`?BN@2dE0K/?c287:NO?�2<SHK^9<46EF]H[GKÖ98]D4@=>KÐ7G?M4@?cj�N@lá28SHK^AH7:9;2<J87 )iHÑB287:NO?B9d4@irN aOK6T Ï 9 �ZEFK>?½287:NO?BK/A`K/4@J8[G7:K>J/ÒH2<SBKh984@EF]B[:K^98]D4@=>KÐN@l�28K/? �LAH7:984@]H]rKµ4@JL9 �iHÑB2d7G2ï7:9WJ8K/4@[:[Rjk2<SHK>J8KÍ7G?î2<SBKÍiD4@=5\½IOJLNOÑH?HA°T �¡K>2 � 9_=>NO?H9L28J8ÑH=b2Ö4�984@EF]B[:K^98]D4@=>KÍK � )]H[G7:=/7R2<[Rj¨l�NOJ'4ÖÕZK/JL?HNOÑH[G[:7¬J<4@?HABNOE�a64@JL7m4@iH[GK@T��¡K>2 �Ûé p *Bö/ü7q 4@?BA0ABK��D?HK # 2<NÖ9<46287:9;l�j# ¥ p Dö � q�§Üé���$ l�NOJ *Ôû��û��dûVü T � 7 � D ! p *Böµü7q 4@?HA AHK��D?BK
�ê¥�¦Z§'é � ü ¦Ûû D* ¦ � Dd$
f-SHK>? # ¥�� é üX§ïé # ¥m¦ôû Dr§Wé # ¥ p *Bö D qm§ïé D 4@?BA # ¥�� é *c§Wé¼ü $ D Thf-S½ÑH9/Ò� � ÕZK/JL?HNOÑH[G[:7 ¥FDr§ T � KÐ=/NOÑH[GAMAHN%28SH7:9Wl�NOJ_4@[G[�28SHKÍAH7G9L28J87:iBÑB2<7GNO?H9WAHK��D?BK/A�4@i N a@K@T�?Í]HJ84@=>287:=>K@Ò6PÓKZ28SH7:?H\¨N6lr4ïJ<46?HAHNOEVa64@J87:4@iH[GK [G7:\6KZ4WJ<4@?HABNOEô?3ÑBEÍi K>J�iHѬ2�l�NOJLE�4@[G[Gj7G2¤7:9Q4ÔE�4@]B]H7:?HIwABK��D?HK>A`NO?�9LNOEFKÖ984@EF]B[:K_9L]D4@=/K6T����� �k(� !�� �|(0&'$cMÍ"%$���(0"%$'.µ"�*M(0*LH,� MÍ"+)M(�� � MÍ&Ü.%U
M JLR� H
�! �"19 %�# l � �(;70#&-2'��(#&354('(#6��% T � SD4@9'4 �W?H7Gl�N@J8E ¥ Dö �b§ AB7:9L28J87GiHÑB2<7GNO?°Ò�PQJ87G2 )2<K>? ��� �W?H7Gl�NOJLE ¥ Dö �b§ ÒH7RlE
¥�õ §Üé � �� � � l�NOJ õ�! p Dö � q* N@2<SBK/JLPQ7G98K
PQSHK>J8K ��� TÜf-SHKÖAH7:9;2<JL7:iHÑB287:NO?%l�ÑB?H=>287:NO?�7:9
m�¥�õ § é�� � * õ���� � �� � � õ�! p Dö � q
ü õ � �0$��� �(; +.B�� � +(4�- -/#I+.% �Ó� SD469Q4 'ïNOJLE�4@[ ¥ N@J �^46ÑH989L7m4@? § AH7:9;2<J87GiHÑB287:NO?�PQ7R2<S
]D4@J84@E0K>2<K>J89��ß4@?HA� áÒDAHK>?HN@28K/A i½j �����|¥ � ö ùb§ ÒH7RlE¥�õ §Üé ü �� 1�� K � ] �D$ ü
1 ù ¥�õ $ � § ù�� ö õ�!`�
������� ">��� �� � ������ ���� �8�8��Z� � � � �c"j����������� ����������� �! " . $
PQSHK/JLK � !Å� 4@?HA � * T �#4628K/J%PÓK+98SD46[:[d9LK/KM28SD462 �V7G9�2<SHK*�;=/K/?c28K/J �ì¥ NOJEFK/4@? § N@lÓ2<SHKÔAH7G9L2<JL7:iHѬ2<7:N@?�46?HA �7:9¨28SHK �;98]HJLKµ4@A �º¥ NOJÖ9;254@?HAH4@J8AßAHK>a37:462<7GNO? § N@l2<SHKîAH7:9;2<JL7:iHÑB287:NO?æTCf-SHK 'ïN@J8EF4@[-]H[m4µj39%46?Û7:E0] N@JL2546?½20JLNO[:K�7:?�]HJ8N@iD4@iH7G[:7G2jÞ4@?HA9L2<462<7G9L2<7G=/9>T��`4@?cj�]HSHK>?HNOE0K/?D4Ô7:?�?D4628ÑHJ8KÖSD4µa@KÐ4@]B]HJ8N � 7:EF462<K>[Gj 'ïNOJLE�46[æAH7G9L2<JL7:iHÑ )2<7GNO?H9/T(�#4628K/J>ÒBPZKÖ9LSD4@[:[ 98K>K^28SD462¤28SHKÖAH7:9;2<J87GiHÑB287:NO?%N6l�4w9LÑHE N@l�J<46?HAHNOEÅa@46J87m46iH[:K>9=µ4@? irKh46]H]HJ8N � 7:EF4628K/A�icj�4�'ïNOJLE�4@[æAH7:9;2<J87GiHÑB287:NO? ¥ 2<SHK¨=>K/?c2<J84@[#[:7:E07G2 2<SBK/NOJLK/E § T� KÜ9<4µj_28SD462 � SD469�4 `%^ �D�����H}�� � �å}6���D�å���a`b^ }����c[d^X��� � 7Rl � é * 4@?BA éü Tf�J84@AH7R2<7:N@?FAH7G=>2<462<K>9Ó2<SH462¤4^9L2<4@?HAD4@JLA 'ïNOJLE�46[DJ<4@?HABNOE a@46J87m46iH[:KQ7:9 AHK>?HN@2<K>A%icj � Tf-SHK � k�l 4@?HA i>kZl N6l_4M9;254@?HAH4@J8A 'ïN@J8EF4@[Z46J8KkAHK>?HN@2<K>A�icj�� ¥��O§ 4@?HA� ¥��c§ Tf-SHK � k�l 7:9-]H[GN@2828K/A�7:? � 7GIOÑHJLK # T . T'f-SHK/JLKh7G9¤?HN0=/[GNO98K>A ) l�NOJLE�K � ]HJ8K>98987GNO?îl�N@J�dTUïK>J8K^4@J8KÖ9LNOEFK¨ÑB98K>l�ÑB[#l�46=>2<9 �¥ 7 § � l �����º¥ � ö °ù § 28SHK/? � é ¥�� $ � § , � �º¥ *¬öµüX§ T¥ 7G7 § � l � ���|¥ *Bö/üX§ 2<SHK>? � é � ú � ���|¥ � ö ùȧ T¥ 7G7:7 § � l � ! ���º¥ ��! ö ù! § Ò�� é�ü@öI$I$I$bö A 4@J8K¨7G?HAHK>] K>?HAHK/?c2W2<SHK>?
�H ! + � � ! ���� �H !,+ � ��! ö �H ! + � ù!�� $� 2Ql�NO[:[GNXPQ9Ól�J8NOE ¥ 7 § 28SD462Q7Rl �����º¥ � ö ùb§ 28SHK/?
# ¥� ��� � �b§ é # 0$ � � � � �"$ � � é � ��$ � � $ � 0$ � � $¥ # T 1O§
f-S3ÑB9�PÓKÜ=µ4@?^=/NOE0]HÑB28KÜ4@?cj¨]HJLNOiD4@iH7G[:7R2<7:K>9rPÓK'P¤46?½2�4@9¡[GNO?HIQ4@9°PÓKÜ=µ4@?^=/NOE0]HÑB28K2<SHK i>kZl � ¥��c§ N6l#4Ð9L2<4@?HAD46J8A 'WNOJ8EF4@[ T Ï [G[å9L254�2<7:9;2<7G=µ4@[D=>NOEF]BÑB2<7G?HIÐ]D46=5\646IOK/9 PQ7:[:[=/NOE0]HÑB28K�� ¥��c§ 4@?HA�� � � ¥��O§ T Ï [G[¤9L2<462<7G9L287:=/9Í28K � 2<9>ÒZ7G?H=/[GÑHAH7:?BIM2<SH7G9wNO?HK@Ò SD4µaOKî4254@iB[:K¨N@l�a64@[:ÑHK>9QN@l�� ¥��c§ T×½Ë@´cÀhÊc¶ o^vHxRy�� �å¢ �c�D© � �Ö���B�@�������º¥ # ö O§>®QÚÜ���rã # ¥�� � ü §/® (°�D�¨� ©� Ý¢3�Â� ©��`�:�
# ¥�� � ü §'é�üD$ # ¥�� � üX§'é�ü $ # � � ü $ #� � éü $ � ¥%$�*_$ �� .h.½§'é $ � ü�$
. � ��������� ����������������� ����������� �! "
* ü 1$^ü$�1EoÖ¥�õ §
õ� 7:IOÑBJ8K # T .H� � K/?B987G2j%N@lá4w9;254@?HAH4@J8A 'ïNOJLE�46[ T
$^©�ÿ rÜ�rã ���b¢å«5�`���B�@� # ¥�� � �O§Zé $ 1H®�1È�Þ©����D�b�^ÿZ©��8ã��5è rÜ�rã �wé � � � ¥ $ 1O§>®��`�� ©� �@�Ö���3�:�У �Fÿ'�È�������3�
$ 1Öé # ¥�� � �O§Üé # � � ��$ � � é � � $ � � $Úæ�5©������D��$^©��È�0�@ ¡� �c£/ :�5è � ¥ $B$ � .Bü ðc§Üé $ 1H® (°�D�b�<��⵩��<�5è$B$ � .Bü ðÐé ��$ � é ��$ #� �@�rãÔ�D�b�æ«5���^é # $ $ � .Bü ð � ^éü�$Gü@ü � ü�$ ò
��� � �$% " %�' #I+�B 0#&-2'��(#&354('(#6��% �¤� SD4@9ï4@? ��� ]rNO?HK/?c287m4@[#AH7G9L2<JL7:iHѬ2<7:N@?�PQ7R2<S]D4@J84@E0K>2<K>J��ÜÒHAHK/?BN@2<K>A`icj ��� � � ] ¥ � § ÒH7GlE
¥�õ §Üé ü�� � � ��� ö õ � *
PQSHK>J8K�� � * T-f-SHKhK � ]rNO?HK/?c287m4@[�AH7:9;2<JL7:iHÑB287:NO?�7G9WÑB98K/AM28NFE0N3AHK/[¡2<SBKh[G7Gl�K>287:E0K/9-N@lK/[GK/=b2<J8N@?H7:=¨=>NOEF]rNO?HK>?c2<9W4@?HAk2<SBKÖPZ4@7R2<7:?BIÔ287:E0K/9¤irK>2PÓK/K>?MJ<46J8K¨K>a@K/?c2<9>T
� +.;H; +@08#&-2'��(#&3 4�' #6�$% � � NOJ� � * Ò#28SHK� �D�º���Ne�[���Yh^ ���r� 7:9ÖAHK �D?HK>Aicj � ¥ � §-é � �� ÷�� � � � � �c÷ T � SD469¨4 �^46EFEF40AB7:9L28J87GiHÑB2<7GNO?�PQ7G2<S`]D4@J<46EFKb2<K/JL9��4@?HA��ÜÒDAHK>?HN@2<K>Aîicj ��� �Ö4@EFEF4 ¥ � ö � § ÒB7RlE
¥�õ §Üé ü� � � ¥ � § õ � � � � � � ��� ö õ � *
������� ">��� �� � ������ ���� �8�8��Z� � � � �c"j����������� ����������� �! " . PQSHK/JLK � ö � � * T�f-SHKQK � ] N@?HK/?c2<7:4@[HAH7G9L28J87:iBÑB2<7GNO?Ð7G9 � ÑH9L2 4 �Ö4@E0E�4 ¥;ü@ö � § AH7G9L2<JL7:iHÑ )2<7GNO?°T � l � ! � �Ö4@EFEF4 ¥ ��! ö � § 46J8K'7:?BAHK/]rK/?HABK/?c2µÒX2<SHK>? F �! + � � ! � �Ö4@EFEF4 ¥ F �!,+ � ��! ö � § T�! �" ?A" ' + 08#&-2'��(#&3 4�' #6�$% � � SH4@9¤4@?�ÕZK>2<4ÐAH7G9L2<JL7:iHѬ2<7:N@?ÔPQ7R2<SF]D4@J<46EFKb2<K/JL9
� � * 4@?HA � � * ÒHAHK/?BN@2<K>A`icj ��� ÕZK>2<4 ¥ � ö � § ÒB7RlE¥�õ §Üé � ¥ � ú � §
� ¥ � § � ¥ � § õ � � � ¥;ü $ºõ § � � � ö *�� õ��ìü�$
� +�% k�� + 4 i �� 0#&-2'��(#&354('(#6��% �Ü� SD469¤4 � AH7:9;2<J87GiHÑB287:NO?wPQ7G28S��ÔAHK/IOJLK/K>9ZN@ll�J8K>K/AHNOE � PQJL7G2828K/? ������ � 7RlE
¥�õ § é � � � � �ù �� � � ù �
ü� üÓú ���� � � � � � � � ù $
f-SHK � AH7G9L2<JL7:iHѬ2<7:N@?�7:9Ó987GEF7G[m4@Já28Nw4�'ïN@J8EF4@[åiHÑB2-7R2¤SD4@9 2<SH7G=5\@K>J¤254@7G[:9/T �?%l�4@=>2/Ò¬28SHK'ïNOJLE�46[D=/NOJLJ8K/9L] N@?HAH9Z28NÐ4 � PQ7G2<S� é�& Táf-SHK 4@ÑH=5Scj�AB7:9L28J87GiHÑB2<7GNO?Ô7:9Ó4h9L] K>=/7m46[=µ4@9LKhN6l�28SHK � AH7G9L28J87:iBÑB2<7GNO?k=/NOJLJ8K>98]rNO?HAH7G?HI028N�� é�ü T'f-SHKÖAHK/?H9L7G2j�7G9E
¥�õ §'é ü�Ü¥;üÓúêõ ù § $f�N098K>K¨2<SD462Q28SH7:9-7G9-7:?HAHK>K/Aî4wAHK/?H9L7G2j@Òr[GK>2 �Ù9-AHNÔ2<SBK^7G?c2<K/I@J<4@[ �� �
��� E¥�õ §��Oõ é ü� � �
��� �OõüÓúêõ ù é ü� � �
��� � 2546? � ��Oõé ü� � 2546? � � ¥ &C§�$ 2<4@? � � ¥%$�&C§��Wé ü��� � 1 $�� $ � 1���� éü�$
�! �"�� ù k #&-2'��(#&354('(#6��% �'� SD4@9�4 � ù AH7G9L2<JL7:iHѬ2<7:N@?_PQ7G2<S D AHK/I@J8K/K>9�N@l3l�J8K>K/AHN@E� PQJ87G2L2<K>? ��� � ù� � 7GlE
¥�õ § é ü� ¥FD ,�1@§ 1 � � ù õ � � � ù � � � � � � � ù öñõ � *_$
� l � � öI$7$I$/ö � � 4@J8K�7:?HAHK>] K>?HAHK>?½2�9;254@?HAH4@J8A 'ïN@J8EF4@[@J84@?HAHNOE a64@J87:4@iH[:K>9 28SHK/? F � !,+ � � ù! �� ù� T
h* ��������� ����������������� ����������� �! "
����� ��.��BMÍ&Ü.7Mh$$ G .IH�$á&Ü.IJ+*`$'./(0"LH�Ö7RaOK>?ì4º]H4@7:J%N6lhAB7:98=>J8Kb2<K+J<46?HAHNOE a@46J87m46iH[:K>9 � 4@?HA � ÒWAHK �D?HK+2<SHK�� � ��� ^
�|� `7`?e�[���Yh^ ���r� icjE¥�õ#ö<÷B§ é # ¥�� éìõ 4@?HA � é ÷H§ T � J8NOE�?HN PôNO?°ÒDPZK^PQJ87R2<K
# ¥���é õ 4@?HA �ôé ÷H§ 4@9 # ¥���é õ#ö �é ÷H§ T � KWPQJL7G2<KE4@9
Eo�� � PQSHK/?0PZKWP¤4@?c2
2<NÔirKÖEFNOJLK¨K � ]H[G7:=/7R2µT×½Ë@´cÀ^Êc¶ oÖvHxzy =h�b�<�`�:�g��£/� ���@�È���@�ª�Mã@�:�b���b��£/¢3��� ©��ß⵩�� �ÂÿZ©|�L�@�rã3©�� ���@�b���c£/ :�5�0��@�rã�� �8�3«5��� ������3� ���@ Ý¢D�5� <k©�� 9\6
�é * ��éü� � < 9%:�� @7:�� 9%: � � 9 @7:�� ;h:�� 9%:
9%: 9%: 9(°�3¢c�5è # ¥���é�üOö �ôéüX§'é
E¥ªüOöµüX§áé . , ®¤ò
n'obp qOs t<s³uBqÍvHxRy ? 1b�+���D�Ô«5©��D�Â���D¢D©�¢c�w«8��� �5è ÿZ�0«8�@ � ��dâÈ¢3�æ«b�Â� ©��E¥�õ#ö<÷B§Ð�h�BãLâ
⵩��k���D� �L�@�rã3©�� ���@�È���c£/ :�5� ¥��Mö �Ч � â � � E¥�õ#ö<÷B§ � * ⵩�� �@ � -¥�õ#ö<÷B§5è � ��� � �
��� � ���� E
¥�õ#ö8÷H§ �cõ �c÷ é üê�@�rã�èÓ⵩��`�@� �g� �b� � � ���ê�-è # ¥8¥��Mö �w§ !� § é�� �� E
¥�õ#ö<÷H§ �Oõ �½÷æ® 1b�%���D�dã@�:� «b�<�b�ª�¨©��_«5©��D�����D¢å©�¢c�_«8��� �WÿZ�ïã3�srÜ�æ�-���D�� ©����D�ci>kZl ����mZo�� �Ó¥�õ#ö8÷H§'é # ¥�� ûÛõ#ö � ûC÷B§/®
×½Ë@´cÀ^Êc¶ oÖvHx Ø =Ûà��b�¤¥��`ö �ͧh£b�Ö¢¬�D� ⵩��È��©��M���D�^¢3�D���Ü�/ä/¢B�@�<�/®?(°�D�b�BèE¥�õ#ö8÷H§áé � ü 7Gl *wûÛõMûVü@ö *wûC÷%ûVü
* N@28SHK/J;PQ7:98K $ÚÜ���rã # ¥�� �ìü-,�1¬ö �2� ü-,�1O§>® (°�D�d� �@�b�D� � éôó � �ñü-,�1¬ö �2�ìü-,�13ýF«5©��b�<�5��D©��rã���©��0�b¢H£b� �b�Q©ªâÖ���D�^¢3�D��� �/ä/¢B�@�<�/® 1È�D�ª� �@�L�@�����3� E
©�@�b�Ö���3�:�^�b¢B£b� �b�-«5©��b�<�5��D©��rã��5èÜ������3�:�|«8��� �5è��ª©Û«5©��d�r¢3�����3�����D���@�<�8�C©ªâM���D�M� �b� � ÿ��3� «5�C�:�]9%:�;r® �°© è # ¥�� �ü-,�13ö �2�ñü-,h1O§'é�ü-,2.D®¤ò
����� �N� �0��� ������� � � "!���� ���8Z�%�8��" ¬ü×½Ë@´cÀhÊc¶ o^vHxÙجyêà#�b�Z¥��Mö �ͧ¨�B� �@�^ã3�b�B�b��� �E
¥�õ#ö<÷B§'é � õÔúê÷ 7Gl *wûCõMûVüOö�*0ûC÷�ûVü* N@28SHK/J;PQ7:98K $
(°�D�b� � �� � �
� ¥�õwú�÷B§��Oõ �c÷ é � ��
� � �� õ��Oõ�� �c÷¨ú � �
�� � �� ÷ �Oõ�� �c÷
é � ��
ü1 �½÷_ú�� �
� ÷ �½÷wé ü1 ú ü
1 éüÿ��3� «5� �@�b�È� r¤�5�Ö���B�@�Ó���3�:�^�:�Ð� � k�l��ïò×½Ë@´cÀhÊc¶ o^vHxÙØ@Ø 1�âÜ���D�¤ã@�:�b���b��£/¢3��� ©��w�:�-ã3�srÜ�æ�8㨩�@�b�Z�_�æ©��'&�<�5«b� �@�3�@¢¬ R�@�'�<� �@� ©��Bèæ���D�b����D�Ô«8�@ :«b¢B�@�Â� ©��B�^�@�5�Ð��£/��� ��©��<�w«5©��d�r Ý� «8�@�ª�8ãc®�=^�b�<�Ð�:�Ð�@�+�D� «5�w�Lç½�@�d�r :�¨ÿ��3� «5� 1£b©��È�5©�ÿZ�8ãQâÈ�5©�� (Ð���¤�5© ©��Ó�@�rã��°«5�D�b� � �:�5� �g@_<\</@ @®-à��b�Z¥��`ö �ͧ_�B� �@�hã3�b�B�b��� �E
¥�õ#ö<÷H§�é���� õåù5÷ 7Gl õåùïûC÷%ûVü* N@28SHK/J;PQ7:98K $
$h©���8rÜ�<�b�W���B�@� $^ü%û�õ�ûÅü½® $^©�ÿ :�b�Q¢c�8rÜ�rã ���D� ���@ Ý¢D�%©ªâ � ® (°�D�Í�Â�È� « �%�D�b�<��:�Í�©k£b�F«8�@�<��âÈ¢3 á�c£b©�¢3�¤���D�Í�L�@�3�3�0©ªâÐ���D�� �@�L�@�Â� ©��r® �`�ï�r� « ��©��æ�����@�b���c£/ :�5è�õê�/� �Xè�@�rãÍ :�b�á�����L�@�3�3�Ö©�@�b�ï����� ���@ Ý¢å�5�/® (°�D�b�Bè¬âµ©��¨�8�3«5��r�ç¬�8ã ���@ Ý¢å�Ö©ªâÜõ°è'ÿZ�ï :�b�æ÷ ���@� �©�@�b�������%�L�@�3�3�Fÿ��3� «5���:�^õåù û�÷�û ü½® 1b�d�0� � �D�b � � â��c©�¢Þ :© © �M�@�cr��@¢3�5� ½®c®(°�3¢½�5è
ü é � � E¥�õ#ö<÷B§��c÷ �Oõ�é � � �
� �� ���� õ ù ÷ �c÷ �Oõ
é � � �� �
õ ù � � �� � ÷ �c÷ � �Oõ�é � � �� �
õ ù ü1$�õ��1 �cõké . �1¬ü $
=h�b�æ«5�5è � é 1¬ü0,2. $ $h©�ÿV :�b���«5©��d�r¢3�ª� # ¥�� � �w§>® (°�3�:�k«5©��È�<�5��D©��rã��Ô�©î���D�� �b� � é�ó¬¥�õ#ö8÷H§Èø *�ûÅõìû üOö8õåùkû�÷�ûÅõ¡ý¬® �� ©�¢�«8�@�Þ� �5�����3�:�k£ �+ã@�8�@ÿ'���3�g�ã@��� �@�L�@�0®� �°© è
# ¥�� � �Ч~é 1¬ü. � �
� � ���� õ ù ÷ �½÷ �Oõ�é 1¬ü. � �
� õ ù � � ���� ÷ �½÷ � �cõé 1¬ü
. � �� õ ù õåù"$ºõ��
1 �cõké #1h* $|ò
�1 ��������� ����������������� ����������� �! "
* ü
ü
÷ÔéCõåù÷Ôé õ
õ
÷
� 7GIOÑHJ8K # T 3� f-SHKÔ[:7:I@S½2_98SD4@ABK/AßJ8K/I@7:NO?+7G9 õ ù û�÷�û�ü Twf-SHKÔAHK>?H987R2j+7:9¨]rNO987R2<7RaOKN aOK>Jh2<SB7:9^J8K/I@7:NO?°T�f-SHKFSD4628=5SHK/AÞJLK/IO7GNO?�7G9^28SHK�KbaOK>?½2 � � � 7:?c28K/J89LK/=b2<K/AºPQ7R2<Sõ ù ûC÷%ûVü T
����� � MÍ&��0.µ"PMQR G .IH�$á& .7J+*`$'./(0"LH
n'obp qOs t<s³uBqÍvHxÙØ@v 1�âÜ¥��Mö �ͧ'�B� �@� � ©����D�°ã@�:�b���È��£/¢¬��� ©��wÿ'�����¨�0���5�åâÈ¢¬�æ«b��� ©��Eo � �'è
���D�b�+���D�ï���H} ���#�D�Z�|� `I` eg[��ZYh^X��� � e �å}�� �:�hã3�srÜ�æ�8ã�£ �Eo¨¥�õ §Üé # ¥�� é õ §Üé H # ¥�� éCõ#ö �éì÷H§'é H
E¥�õ#ö<÷B§ ¥ # T # §
�@�rãF���D�ï�|�H} r�����D�Ó�|� `I`�e�[���Yh^ ���r�Ne �å} �Y�:�hã3�srÜ�æ�8ã%£ �E�Ó¥�÷B§áé # ¥ ��é ÷H§'é H � # ¥���é õ#ö �é ÷B§'é H � E
¥�õ#ö<÷H§F$ ¥ # T .c§
×½Ë@´cÀ^Êc¶ oÖvHx Øh& �å¢X�c�D© � � ���B�@�Eo�� �+�:���@� �@�b�Í���w���D� � �c£/ :�'���B�@��⵩� � :©�ÿ��/® (°�D�Ü�0�@�Â�@���r�@
ã@�:�b���b��£/¢3��� ©��d⵩��á� «5©��È�<�5��D©��rã��Q�©Ö���D�W�<©�ÿº�©�� �@ ·�ï�@�rãh���D�Q�0�@�Â�@���r�@ åã@�:�b�Â�È��£/¢3�Â� ©��⵩�� � «5©��È�<�5��D©��rã��¨�©0���D�w«5©� Ý¢3�w�B�Ö�©�� �@ ·�/®
�����\� �D�����8� ��� � � � "!���� � �8Z�%�8� " #�é * �éü
� � < 9%:T9�< @7:T9�< b:T9�<� � 9 b:T9�< ;h:T9�< � :T9�<
;h:T9�< � :T9�< 9Ú�©��Ð�Lç½�@�d�r :�5è
Eo_¥ *c§'é # ,¬ü0*M�@�rã
EoÖ¥ªüX§'é $ ,¬ü0*D®¤ò
n'o>prqOs t<s uBqwvHx ØhVCÚ�©��-«5©��D�����D¢å©�¢c�¤�L�@�rã3©������@�È���c£/ :�5�5è°���D�-�0�@�Â�@���r�@ 3ã3�b�B�b����� �5��@�<� E
oÖ¥�õ §Üé � E¥�õ#ö<÷B§��c÷ ö 4@?BA
E�Ó¥�÷H§'é � E
¥�õ#ö<÷B§��Oõ!$ ¥ # T O§(°�D��«5©��b�<�5��D©��rã@���3�`�0�@�Â�@���r�@ Óã@�:�b�Â�È��£/¢3�Â� ©���âÈ¢3�æ«b�Â� ©��B�%�@�<��ã3�b�æ©���8ãg£ � mZo�@�rãQm��Ó®
×½Ë@´cÀhÊc¶ o^vHxÙØ � �å¢ �c�D© � �Ö���B�@� Eo�� �Ó¥�õ#ö8÷H§'é�� � � � � �
⵩��-õ#ö<÷ � *D® (°�D�b� EoÖ¥�õ §Üé�� � � � �� � � �c÷Ôé�� � � ®-ò
×½Ë@´cÀhÊc¶ o^vHxÙØ � �å¢ �c�D© � �Ö���B�@�E¥�õ#ö<÷B§'é � õÔúê÷ 7Gl *wûCõMûVüOö�*0ûC÷�ûVü
* N@28SHK/J;PQ7:98K $(°�D�b� E
�Ó¥�÷B§'é � �� ¥�õÔúê÷H§ �Oõ�é � �
� õ �cõÍú�� �� ÷ �c÷Ôé ü
1 ú�÷ $�ò
×½Ë@´cÀhÊc¶ o^vHxÙØ Cà#�b�Z¥��Mö �ͧ¨�B� �@�^ã3�b�B�b��� �E¥�õ#ö<÷B§áé � ù �� õåù5÷ 7Gl õåùïûC÷%ûVü
* N@2<SBK/JLPQ7G98K $(°�3¢½�5è E
oÖ¥�õ §Üé�� E¥�õ#ö<÷B§��c÷Íé 1¬ü
. õ ù � ���� ÷ �c÷wé 1¬ü� õ ù ¥ªü1$ºõ � §
⵩�� $hü^ûÛõMûVü��@�rãEoÖ¥�õ §Üé *ß©����D�b�Èÿ'�:� �/®¤ò
2. ��������� ����������������� ����������� �! "
����� !#"+) �� ¨"+) ¨"%$ � MÐ"+)M(�� � MÐ& .bM JLR� H
n'obp qOs t<s³uBqÍvHxÙØ ?X(#ÿZ© �L�@�rã3©�� ���@�b���c£/ :�5�Ö� �@�rã � �@�<�Ð�Â���#� A'�3�����3� ^�� â<è⵩��Í� �@�b� � � �@�rã��0è
# ¥�� ! � ö � !��w§Üé # ¥�� ! � § # ¥ ��!��w§F$ ¥ # T ðO§�`�^ÿ'�È���ª�Q��� �%®
�?Ô]HJ87G?H=/7G]H[:K6Ò@2<N¨=5SHK>=5\wPQSHK>28SHK/J � 4@?HA � 4@JLK-7:?HAHK>] K>?HAHK>?½2ÜPÓKQ?HK/K>A028N¨=5SHK/=5\K:�½ÑD4�2<7:N@? ¥ # T ðc§ l�NOJÖ4@[:[�98ÑHiH9LK>289�� 4@?HA � T � NOJL28ÑH?D4628K/[Gj@Ò°PZK0SD4µaOKÔ2<SBKÔl�N@[:[:N PQ7G?HIJ8K>98ÑH[R2¨PQSH7:=5SgPÓK09;254628Kwl�NOJ¨=/NO?c287:?½ÑHNOÑH9ÖJ84@?HAHNOEYa64@J87:4@iH[:K>9d2<SBNOÑHIOSg7G2Ö7G9d2<JLÑHKwl�NOJAH7G98=/JLK>28K^J84@?HAHN@E�a64@J87:4@iH[:K>9Z2<N3NHT¯Q²Xo/uc»mo�À+* ,6*���à#�b�¡� �@�rã �¼�B� �@� � ©����D�°�BãLâ
Eo � �Z®?(°�D�b�0�� � � â^�@�rãk©��D �0� âE
o � � ¥�õ#ö<÷B§'éEo_¥�õ §
E�Ó¥�÷B§á⵩��Ð�@ � ���@ Ý¢å�5�Qõ��@�rãh÷殯Q²Xo Ǫt;´ tªo6Àïo6q t s³Ç
qXu6t »Âs 9@u½»:uBÆXÇ Ä@ob¾±/´½ÆXÇ;o%t5²Xo%¿co�qXÇÈs t¸gs³Ç¿cobp qXo>¿ uBqc¶ ¸�ÆcÊ tªuÇ;o>tªÇQu@Ì Àïoµ´XÇbÆc»:o�=Hx ×½Ë@´cÀ^Êc¶ oÖvHx vByÞà��b��� �@�rã �¼�B� �@�Ö���D�Ü⵩� � :©�ÿ'���3�Fã@�:�b�Â�È��£/¢3�Â� ©�� 6
�é * ��éü� � < 9%:�; 9%:�; 9%: @� � 9 9%:�; 9%:�; 9%: @
9%: @ 9%: @ 9
(°�D�b�BèEo¨¥a*c§Wé
Eo_¥;üX§Wé�ü-,�1ß�@�rã
E�Ó¥ *c§Qé
E�Ó¥ªüX§Qé�ü-,�1H®^� �@�rã�� �@�<�w���rã3� &
�D�b�rã3�b�D�ï£b�5«8�@¢½� �Eo_¥ *c§
E�Ó¥ *c§dé
E¥a*Bö *c§<è
Eo_¥ *c§
E�Ó¥;üX§dé
E¥a*BöµüX§<è
Eo¨¥;üX§
E� ¥ *O§¨éE
¥ªüOö *c§<èEoÖ¥ªüX§
E� ¥;üX§Öé
E¥;üOöµü §/® �å¢ �c�D© � �0���B�b�ª�8�cãî���B�@�Ó� �@�rã � �B� �@�0���D�Q⵩� &
:©�ÿ'���3�%ã@�:�b���È��£/¢¬��� ©�� 6�é * ��éü
� � < 9%: @ < 9%: @� � 9 < 9%: @ 9%: @
9%: @ 9%: @ 9
�����0��� ��� � ��� � ����������� ����������� �! " � (°�D�5� ���@�<�Þ�æ©��k���rã3��D�b�rã3�b�D�k£b�5«8�@¢½� �
Eo_¥ *c§
E�Ó¥;üX§Cé ¥;ü0,�1O§/¥ªü-,�1O§ é ü-,2.��c�b�E
¥ *¬öµüX§'é *H®-ò
×½Ë@´cÀhÊc¶ o^vHxÙv@Ø �å¢ �c�D© � �Q���B�@�å� �@�rã � �@�<�Q���rã3��D�b�rã3�b�D���@�rãÍ£b©����Í�B� �@�Q���D�-�/�@���ã3�b�B�b��� � E
¥�õ §Üé � 1�õ 7Rl *ÔûÛõ`ûôü* N62<SHK>JLPQ7:9LK $
à#�b�Ó¢c� rÜ�rã # ¥���ú � ûôüX§>®��¡�b���3�����rã3��D�b�rã3�b�æ«5�5è����D� � ©����D�¤ã3�b�B�b��� �F�:�E¥�õ#ö<÷B§'é
Eo_¥�õ §
E�Ó¥�÷B§'é � .@õå÷ 7Rl *wûÛõ`ûü@ö *wûC÷%ûVü
* N@28SHK/J;PQ7:9LK $$h©�ÿ�è
# ¥���ú � ûVüX§~é � � � � K � E ¥�õ#ö<÷B§��c÷ �Oõé . � �
� õ � � � � �� ÷ �c÷ � �Oõ
é . � �� õ ¥;ü $ºõ §;ù
1 �cõké üð $�ò
f-SHK^l�N@[:[:N PQ7G?HIÍJ8K>98ÑH[R2Q7:9-SHK>[:]Bl�ÑB[°l�NOJ¤aOK>J87Gl�j37G?HIÔ7:?HAHK>] K>?HAHK>?H=/K6T¯W²Xo>uc»mo6À+*-,6* * �å¢X�c�D© � �î���B�@�Í���D�`�L�@�3�3�g©ªâÔ� �@�rã � �:�M� �G�D© �5�b��£/ �|���IrÜ�D���� �<�5«b� �@�3�@ :�/® 1�â E
¥�õ#ö<÷B§Óé��æ¥�õ §��#¥�÷B§Ü⵩��_� ©����ÜâÈ¢¬�æ«b��� ©��B���|�@�rã� � �æ©��Ó�æ�5«5�5�5�/�@�b�� ��r�<©@£5�c£/�� Ý��� ��ã3�b�B�b��� �-âÈ¢3�æ«b�Â� ©��B� 0���D�b��� �@�rã � �@�<�^���rã3��D�b�rã3�b�D� ®×½Ë@´cÀhÊc¶ o^vHxÙv2& à#�b�#� �@�rã�� �B� �@�^ã3�b�B�b��� �E
¥�õ#ö<÷B§'é � 1 � � � � �Hù � 7Rl õ � * 4@?HA ÷ � ** N62<SHK>JLPQ7:9LK $
(°�D�Ü�L�@�3�3�-©ªâ¡�~�@�rã ��:� ���D� �<�5«b� �@�3�@ :�Ó¥ *Bö &C§ �ï¥ *BöC&C§>® �M�-«8�@�wÿ'�È����E¥�õ#ö8÷H§áé
�æ¥�õ §�#¥�÷H§¨ÿ��D�b�<���æ¥�õ §'é 1 � � � �@�rã�#¥�÷B§'é�� � ù ®?(°�3¢½�5è¡��� �%®-ò
@ð ��������� ����������������� ����������� �! "
����� ��(0"+)+.>$á./(0"PM R G .7H�$'&Ü.IJ+*`$á./(0"LH� l � 4@?HA � 4@J8KÓAH7:9L=/JLK>2<K6Ò@2<SBK/?ÍPÓK-=µ4@?Í=>NOE0]HÑB2<K 2<SHKZ=/NO?HAB7G2<7GNO?D4@[cAB7:9L28J87GiHÑB2<7GNO?
N@l � IO7Ga@K/?g2<SD4�2_PZK0SD4µaOKÔNOiB98K/J;aOK>A �¼é ÷ TÔ1¬]rK/=/7 �D=µ46[:[Gj@Ò # ¥�� é õ�� �¼é ÷H§ïé# ¥�� é�õ#ö � é�÷H§ , # ¥ ��é�÷B§ T%f-SH7G9Ð[GKµ4@AH9^ÑH9Ö2<NîAHK��D?BKF2<SBK�=/N@?HAH7G287:NO?H4@['E�46989l�ÑH?H=b2<7GNO? 4@9-l�NO[:[GNXPQ9>T
n'obp qOs t<s³uBqÍvHxÙv2V (°�D��YO�r����� ^X��� �#�D�(A�}X�å���H�������s^CB��|� `7`�eg[��ZYh^X��� � � `Eo�� � ¥�õ�� ÷B§'é # ¥���é õ�� �éì÷H§�é # ¥���é õ#ö �é ÷B§
# ¥ �éì÷B§ éEo�� �Ó¥�õ#ö<÷B§E�Ó¥�÷B§
� âE�Z¥�÷H§ � *D®
� NOJ¨=/NO?c287:?½ÑHNOÑH9^AH7G9L2<JL7:iHѬ2<7:N@?H9dPÓKFÑH9LK028SHKÔ9<4@E0KwAHK �D?H7G287:NO?B9/TÔf-SHK07:?c2<K>J8]HJLK )�+o�´�»moÞt5»mo/´�¿Bs·q 9ôs·q¿co>o6Ê 3'´Xtªo�» ²Xo6»:oOx��²Xo6q 3'oô±>uBÀhÊcÆ tªo# ¥�� ! � � � é ÷B§s·q t5²Xo ±/u¬q t<sÎqcÆXuBÆXDZ/´�Ç;o 3'o ´�»moÛ±>uBqX¿Bs ¾t<s uBqOsÎq 9ÐuBq¨t5²Xo_o>ÁXo�q tó��é ÷rý 3W²Os ±6² ²�´�ÇÊ�»mu¬Ä�´½ÄOs·¶Ùs t�¸/=Hx �go´ ÁXu¬s ¿ t5²Os³Ç Ê�»muBÄc¶ o6ÀÄb¸ ¿co>prqOsÎq 9 t5²OsÎq 9OÇs·q tªo�» ÀïÇ u@Ì t5²Xo� kZlÞx ¯W²Xo Ì�´�±btt5²�´ tÐt5²Os Ç%¶ o/´�¿cÇÍtªuß´3áo6¶Ù¶ ¾¿cobp qXo>¿ t5²Xo>u½»G¸s Ç�Ê�»mu@ÁXo>¿ s·q~Àduc»mo´X¿OÁ�´cqX±/o/¿ ±/uBÆc»:ÇLo>Ç/x�+o ÇÈs·ÀhÊc¶ ¸ t;´ �5o�s t´XÇQ´Ð¿cobp qOs t<s³uBq3x
254�2<7:N@?`AH7 �æK>J89 � 7G?`2<SHKÐAH7:9L=/J8Kb2<KÍ=/4@98K6ÒEo�� � ¥�õ�� ÷B§ 7:9 # ¥�� éôõ�� �Åé÷B§ iHѬ2_7:?î2<SHK
=/N@?½287:?½ÑHNOÑB9W=/4@98K6ÒDPZKÖEÍÑH9;2W7:?c2<K>IOJ<4628K_2<NÔIOKb2d4w]HJLNOiD4@iB7:[:7R2jOT
n'obp qOs t<s³uBqÍvHxÙv � Ú�©��Ó«5©��D�Â���D¢D©�¢c� �8�@�rã3©�� ���@�b���c£/ :�5�5è ���D��YO�r�����s^ ���r�#�D� A�}X�å����H�����Â� ^CBê���3�Z`µ� ^�B e�[��ZYh^X���r�Û�a`E
o�� � ¥�õ�� ÷B§'éEo�� � ¥�õ#ö<÷B§E�Ó¥�÷B§
���5�b¢3�w���3�F���B�@�E�Z¥�÷H§ � *D®?(°�D�b�Bè# ¥�� ! � � �ôé ÷H§'é�� � E
o�� �Z¥�õ�� ÷B§��Oõ!$×½Ë@´cÀ^Êc¶ oÖvHx v �Ûà��b�'� �@�rã�� �B� �@�0��¢¬�D� ⵩��È� ã@�:�b���È��£/¢¬��� ©���©������D�Í¢¬�D���¤�/ä/¢B�@�5�/® �b�È� â �Ö���B�@�
Eo� � ¥�õ�� ÷B§'é�üÓ⵩�� *wûÛõMûVüw�@�rã�<w©����D�b�Èÿ'�:� �/® (°�3¢½�5èæ�@� �@�b� ��é ÷åè
� �:� �W?H7Gl�N@J8E � < è 9 @® �`�w«8�@�Mÿ'�È���ª�^���3�:�Ð���Q��� �ôéì÷ � �W?H7Gl ¥ *Bö/üX§/®¤ò� J8NOE�28SHKÔAHK��H?H7G287:NO?gN@lÓ2<SHKÔ=/N@?HAH7G287:NO?H4@[áAHK>?H987R2jOÒ#PZK098K>K028SD462
Eo�� �Ó¥�õ#ö<÷H§dé
������� �8�8��� �7Z�%�8��� � � � "!���� ��� Z�%����" $Eo�� � ¥�õ�� ÷B§
E� ¥�÷B§^é
E� � o ¥�÷ � õ §
Eod¥�õ § Tîf-SH7G9Ð=/4@?º98N@EFKb2<7:E0K/9ÖirK�ÑH9LK>l�ÑH[Z4@9Ð7G?�28SHK
?HK � 2ïK � 4@E0]H[:K6T×½Ë@´cÀhÊc¶ o^vHxÙv Cà#�b�E
¥�õ#ö<÷B§'é � õÔúê÷ 7Gl *wûCõMûVüOö�*0ûC÷�ûVü* N@28SHK/J;PQ7:98K $
à#�b�h¢c� rÜ�rã # ¥�� �~ü-,2. � � é ü-, # §>® 1È� �Lç½�@�d�r :� ½® @ �`ÿZ�%�/�@ÿ ���B�@�E�Ó¥�÷B§Fé
÷¨úì¥;ü-,�1@§/®�=^�b�æ«5�5è Eo� � ¥�õ�� ÷H§'é
Eo�� �Ó¥�õ#ö8÷H§E�Ó¥�÷B§ é õÔú�÷
÷Öú �ù$
�°© è
� B� � ü.����
�é ü# � é � ��� �
�Eo� � Hõ �
���
ü# � �Oõ
é � ��� ��
õwú ���� ú �ù�Oõ�é �� ù ú ���� ú �ù
é ü7.# 1 $ºò
×½Ë@´cÀhÊc¶ o^vHxÙv ? �å¢ �c�D© � �Ö���B�@����� �ï?H7Rl ¥ *Böµü §/®¤{áâÈ�ª�b�Í©@£/� �@���D���3�%� ���@ Ý¢D�Í©ªâ-� ÿZ��3�b�æ�b�L�@�ª� � � ��éCõ1� �ï?H7Rl�NOJ8E ¥�õ#öµüX§>® �%�B�@�'�:�_���D�_�0�@�Â�@���r�@ æã@�:�b���È��£/¢¬��� ©��ß©ªâ ���
ÚÜ���<�b�Ó�æ©���^���B�@��è Eo¨¥�õ §Üé � ü 7Rl *0ûÛõ`ûVü
* N@28SHK/J;PQ7:9LK�@�rã E
� � o_¥�÷ � õ §áé � �� � � 7Gl *�� õ���÷ � ü* N@2<SHK>JLPQ7G98K $
�°© è Eo�� �Ó¥�õ#ö<÷B§'é
E� � o_¥�÷ � õ §
Eod¥�õ §Üé�� �� � � 7Rl * ��õ�� ÷ � ü
* N62<SHK>JLPQ7:9LK $(°�D�Ö�0�@�Â�@���r�@ @⵩�� � �:�E
�Ó¥�÷B§'é � �
Eo�� �Ó¥�õ#ö<÷H§ �Oõké �
��Oõ
ü $|õ é $ � � � �
���� é/$ [:N@I ¥;ü $Þ÷B§
⵩���* � ÷ �ñü½®-ò
� ��������� ����������������� ����������� �! "
×½Ë@´cÀ^Êc¶ oÖvHx'& =þÓ©��B�b��ã3�b�k���D�îã3�b�B�b��� �ß��� ��ç½�@�d�r :�! ½® @��½®Cà#�b� " � rÜ�rã E� � o ¥�÷ � õ §/®
���D�b�ß� é õ°èï÷Û�w¢c�b�d�/�@�Â�:��â �%õ ù û¼÷ û~ü½® �-�@�È Ý� �b�<è^ÿZ���/�@ÿ���B�@�EoÖ¥�õ §Ôé
¥ 13ü-, � §õåùX¥;üD$ºõ��>§>®�=h�b�æ«5�5èD⵩��-õåùWûC÷%ûVü6èE� � o ¥�÷ � õ §áé
E¥�õ#ö<÷B§Eo¨¥�õ § é ù �� õåù<÷ù �� õ ù ¥ªü $ºõ � § é 16÷
ü $|õ � $$^©�ÿ :�b�¨¢c�`«5©��d�r¢3�ª� # ¥ � � # ,2. � � é ü-,�1O§>®+(°�3�:� «8�@�C£b�`«5©��d�r¢3��8ã�£ � rÜ�<�b��æ©������3�F���B�@�
E� � o_¥�÷ �Îü-,�1@§áé # 1�÷ ,¬ü H®�(°�3¢c�5è
# ¥ � � # ,/. � � é�ü0,�1O§'é � �� � �
E¥�÷ �·ü-,�1O§ �½÷hé � �
� � �# 16÷ü �c÷wé $
ü $|ò
����� � *LR>$á.��BMÍ&Ü.7Mh$$ G .7H�$'&Ü.IJ+*`$á./(0"LHNMÍ"+) ����� � M!� �LR� H�¡K>2 � éY¥�� � öI$I$7$/ö8� � § PQSHK/JLK � � öI$7$I$>öL� � 4@J8KÔJ<46?HAHNOEYa@46J87m46iH[:K>9/T � Kw=µ4@[G[� 4 �8�@�rã3©�� �@�5«b�©�� T �¡K>2 E
¥�õ � öI$I$7$>ö8õ � § ABK/?HN@28Kï28SHK_]rABlªT � 2¤7G9Ó]rNO9L987:iB[:KQ2<NÐABK��D?HK2<SBK/7:J�EF4@J8IO7G?D4@[G9/Ò�=/NO?HAB7G2<7GNO?D4@[G9#K>2<=6T�EÍÑB=5SÍ2<SHK¤984@E0KÓPZ4 jÍ4@9�7G?h28SHK¤iH7Ra@46J87m4�2<K =µ4@9LK@T� K^984µj%2<SD4�2 � � ö7$I$I$/öL� � 4@J8KÖ7:?BAHK/]rK/?HABK/?c2W7GlªÒBl�NOJ-KbaOK>JLj � � ö7$I$I$/ö � � Ò
# ¥�� � ! � � öI$I$I$>ö8� � ! � � §Üé � !,+ � # ¥�� ! ! �1! §F$ ¥ # T $ §
� 2Ü98Ñ �F=>K/9 2<N^=5SHK/=5\Í28SD462E¥�õ � öI$I$I$>ö8õ � §Üé� �!,+ � E o J ¥�õ ! § T � l � � öI$I$7$>ö8� � 4@JLK-7:?HABK )
]rK/?HAHK>?c2Q4@?HAkK/4@=5SkSD4@9Z28SHK_984@EFKïE�46J8IO7G?D4@[DAH7G9L2<JL7:iHѬ2<7:N@?0PQ7R2<S%AHK>?H987R2jEÒ½PÓK¨9<4µj
2<SH462 � � ö7$I$I$bö8� � 46J8K # # kV¥ 7:?HAHK>] K>?HAHK>?½2_4@?HAî7:AHK>?½287:=/4@[:[Rj�AH7G9L28J87:iBÑB2<K>A § T � K^98SD4@[G[PQJ87R2<K^2<SH7G9_4@9 � � ö7$I$I$ª� � �
ENOJ/Ò 7:?`28K/JLEF9ïN@l'2<SHK i>k�l Ò � � öI$I$I$;� � � m Tïf-SH7:9
E0Kµ4@?H9^2<SD4�2 � � öI$7$I$/ö8� � 4@J8K07:?BAHK/]rK/?HABK/?c2wAHJ84 PQ9hl�J8NOE 2<SBK�9<46EFK0AH7:9;2<JL7:iHÑB287:NO?æT� Kh46[:98Nw=/4@[:[ � � ö7$I$I$bö8� � 4 �L�@�rã3©����/�@�d�r :� l�JLNOE m T�îÑH=5SMN6l�9;2546287:9L287:=/4@[æ2<SBK/NOJ;j 4@?HA ]HJ<4@=b2<7G=/KÖi K>IO7:?B9QPQ7R2<S # # k N@iH98K>JLa6462<7GNO?H9W4@?HAPÓK^9LSD4@[G[¡9L2<ÑBABj%2<SB7:9Q=µ4@9LKÖ7:?�AHK>2<4@7:[ PQSHK/? PZKÖAH7G98=/ÑB989W9L254�2<7:9;2<7G=/9/T��� � � ��� ( !�� �|(0&á$�MÐ"k$ � *LR>$á.��BMÍ&Ü.7M^$� G .7H�$'&Ü.IJ+*DU
$á./(0"LH
�������\�+�� � � � ���8��_� � �� ���dZ�0����������� � � "!���� ��� Z�%����" ) 4�B�'(#&% �$;H#I+.B TÜf-SHK_EÍÑB[G2<7Ra64@J87:462<K¤a@K/J89L7:NO?�N@l�4ÍÕZ7G?HNOE07m4@[H7G9Ó=/4@[:[GK/A�4 �îÑH[R2<7 )
?HNOE07m4@[ÂT NO?H987GAHK/J AHJ84 PQ7G?HI^4^iH4@[:[Bl�JLNOE 4@?0ÑHJL?0PQSH7G=5S�SD469 iH4@[:[G9áPQ7G2<S8=wAH76� K/JLK/?c2=/NO[GNOJ89�[m4@irK/[GK/AÛ=>NO[:NOJ ü ÒW=/NO[GNOJ 1 Ò^TRTGTÅÒï=>NO[:N@J�\åT �¡Kb2 D�é ¥ D � öI$I$7$/ö D § PQSHK>J8KD�� � * 4@?HA F � + � D��wé ü 46?HA�9LÑH]H]rNO98K02<SD4�2 D�� 7:9_2<SBKF]HJLNOiD4@iB7:[:7R2j`N6l¤AHJ84 PQ7G?HI4%iD4@[G[�N@l =/N@[:NOJDT � J<4µP A 2<7GEFK>9 ¥ 7G?HAHK/]rK/?BAHK/?c2ÖAHJ84 PQ9¨PQ7R2<S+J8K>]H[m46=/K/E0K/?c2 § 4@?HA[:Kb2 � é ¥�� � öI$I$7$/ö8� § PQSBK/J8K ��� 7G9¤2<SHKÖ?½ÑHEÍirK/J-N@l�2<7GEFK>9¤2<SD4�2Q=/NO[GNOJ��46]H] K/4@J89>TUïK>?H=/K6Ò Aêé F � + � ��� T � Kw9<4µj+2<SD4�2 � SH4@9Ö4 �îÑH[G287:?HN@EF7:4@[ ¥sA Ò Dr§ AH7G9L2<JL7:iHѬ2<7:N@?PQJ87R282<K>? ��� �îÑB[G2<7G?HNOE07m4@[ ¥aA�ö Dr§ T'f-SHK^]HJLNOiD4@iH7G[:7R2j0l�ÑB?H=>287:NO? 7:9E
¥�õ §Üé Aõ � $7$I$Lõ � D �� � � � � D ��� ¥ # T � §
PQSHK/JLK Aõ � $I$7$Lõ � é A �
õ � � � � � õ � $6o�ÀhÀd´wvHx'&Hy �å¢ �c�D© � �F���B�@�Ó� � � ÑH[G287:?HNOE07m46[ ¥aA�ö Dr§Fÿ��D�b�<�^� é~¥�� � öI$7$I$/ö8� §�@�rã D%é ¥FD � öI$I$I$>ö D §/®?(°�D�Ö�0�@�Â�@���r�@ #ã@�:�b���È��£/¢¬��� ©��|©ªâ¤���w�:���Q���æ©��w���@ � A#è D�� @®) 4�B�'(#�� +��(#I+ ' " ��� �(; +.B TFf-SBK0ÑB?H7Ga64@JL7m4628K 'ïNOJLE�46[áSD4@Ag2PZN ]D4@J84@E0K>2<K>J89>Ò�Û4@?BA áT �?|28SHK�EÐÑH[G287Ga64@J87:462<KhaOK/JL987GNO?°Ò��7:9h4�a@K/=b2<NOJw46?HA �7:9^J8K/]B[m4@=>K/A|icj�4
E�4�2<J87 ��� TÜf�NFirK/I@7:?°ÒH[GK>2� é
��� � �TTT�
����PQSHK/JLK � � öI$7$I$/ö � ���º¥a*BöµüX§ 46J8KÖ7:?HABK/]rK/?HAHK>?c2µTÜf-SHK^AHK/?H9L7G2jkN6l � 7:9 �ÃÌ ´cqX¿ �`´6»:oMÁXo/±b¾
tªu½»:Ǽt5²Xo6q �� � éF !,+ � ! � ! xE¥��c§Üé !,+ �
E¥�� ! § é ü
¥ 1��#§ � ù K � ] � $ ü1H � + � � ù� � é ü
¥ 1 ��§ � ù K � ] � $ ü1 � � � � $
� Kd9<4µjF28SD462 � SD4@9¤4Ð9L2<4@?HAD4@JLA%EÍÑH[R2<7Ra@46J87m4�2<K 'ïN@J8EF4@[åAH7:9;2<J87GiHÑB287:NO?ÔPQJ87R282<K>? �*��º¥a*Bö"!¬§ PQSHK>J8KÔ7G2¨7G9_ÑH?HABK/J89;2<N3N¬A+28SD462 * JLK/]HJLK/98K>?c2<9h4ka@K/=b2<NOJ¨N@l =$#/K>J8N3K/9Ö4@?BA !7:9¤28SHKA= � =k7:ABK/?c2<7R2j%EF462<JL7 � T�îNOJ8KFIOK/?BK/J<46[:[Gj@Ò�4�aOK>=>2<N@J � SH4@9Í4 EÍÑH[G287Ga64@JL7m4628K 'ïNOJLE�46['AH7:9;2<J87GiHÑB287:NO?°Ò#AHK )?HN@28K/A i½j �����|¥ � ö%�-§ ÒD7Gl�7G2QSH4@9QAHK/?H9L7G2j � � � s Ç�t5²Xo_sÎq ÁXo�»mÇ;o-u@Ì
t5²XoÔÀd´ t5»Âs Ë&�_xE¥�õ#ø � ö%�-§Üé ü
¥ 1���§ � ù AHKb2 ¥'�-§ ��� ù K � ] �D$ ü1 ¥�õ $ � § � � � � ¥�õ $ � § � ¥ # T §
ð�* ��������� ����������������� ����������� �! "
PQSHK>J8KwAHKb2 ¥ � § AHK>?HN@2<K>9¨2<SHKÍABK>2<K>J8E07:?D46?½2_N@l 4%E�4�2<J87 � Ò �Þ7:9d4�aOK>=>2<N@J¨N@l [:K/?BI@2<S =4@?HA � 7:9Ô4 = � =º9Lj3E0EFKb2<J87G=@ÒÜ]rNO987R2<7Ga@K%AHK �D?H7G28K�E�4628J87 � Tº1¬K>2L2<7G?HI � é * 4@?HA8¼Àd´Xt5»�s Ë � s Ç`ÊOu@Ǫ¾
s t<s ÁXoº¿cobp qOs tªoÛs ̪¹hÌ�uc»´c¶·¶ qXuBqµ¾ � o�»mu�ÁXo>±>tªuc»mÇõ#¹Hõ � �Ó� � *Hx
� é ! IO7RaOK/9QiD46=5\�2<SHKÖ9;254@?HAH4@J8A 'ïNOJLE�46[ T1¬7:?B=/K � 7:9h9Lj3EFE0K>28J87G=%4@?HAÞ]rNO9L7G2<7RaOK%ABK��D?H7R2<K6Ò'7G2Í=/4@?ÞirKk98SHN PQ?Þ2<SD4�2Í2<SHK>J8K
K � 7G9L2<9F4gE�4628J87 � � ��� ù�� =µ46[:[:K>A�28SHK 9 �½ÑD4@JLKîJLN¬N@20N@l ��� PQ7G28S�28SHK l�N@[:[:N PQ7G?HI]HJLNO] K>JL287:K/9 �+¥ 7 § � ��� ù 7:9Ð9Lj3EFE0K>28J87G=@Ò ¥ 7G7 §���é � ��� ù�� ��� ù 4@?HA ¥ 7G7:7 § � ��� ù � � ��� ù%é� � ��� ù�� ��� ù-é ! PQSHK>J8K � � ��� ù-é ¥'� ��� ùb§ � � T¯Q²Xo/uc»mo�À+* ,���� 1�â � � �|¥ *Bö !B§ß�@�rãß� é � ú � ��� ù � ���D�b� � � �º¥ � ö%�-§>®þÓ©�� �@�b�<� �b �Xè'� â¤�����|¥ � ö �-§5è ���D�b��� � ��� ù6¥�� $ � § ���º¥a*Bö"!¬§/®
1¬ÑH]H]rNO9LKFPÓK%]D4@J;2<7G287:NO?ß4 J<4@?HABNOE 'ïNOJLE�4@[áa@K/=b2<NOJ � 4@9 � é ¥�� � ö8� � § � K=µ46?î9L7:E07:[m46J8[GjÍ]D4@JL287G287:NO? � é ¥ � � ö � � § 4@?BA��é � � � � � �� � � � � � � $¯Q²Xo/uc»mo�À+* ,�� *à#�b����� �º¥ � ö%�-§>®?(°�D�b� 6
� 9 L(°�D�^�0�@�Â�@���r�@ ¡ã@�:�b���È��£/���Â� ©��|©ªâ¤� � �:�-� � ���º¥ � � ö � ��� §/®�g@ L(°�D�w«5©��rã@����� ©��r�@ ¡ã@�:�b���b��£/����� ©��|©ªâ¤� � �@� �@�b�%� � éCõ � �:�� � � � � éCõ � ��� � � � ú � � � � � �� � ¥�õ � $ � � §bö�� � � $ � � � � � ���� � � � � $
� � 1�â +�:�h� �@�5«b�©��Ö���D�b� �r�����º¥��� � ö �� � ¬§>®� ; é ¥�� $ � § � � � � ¥�� $ � § � � ù ®
��� �Ô� ��& MÐ"LH��O(0& � Mh$á./(0"LHY( �J� MÐ"+)M(�� � MÐ& .bM JLR� H1¬ÑH]H]rNO9LK028SD462 � 7G9^4 J84@?HAHNOE a@46J87m46iH[:KÍPQ7R2<S � k�l E
o 4@?HA i>k�l mZo T �¡K>2� é��B¥��g§ irK%4�l�ÑH?H=>287:NO?�N@l � Ò�l�NOJhK � 4@EF]B[:K@Ò � é�� ùÐNOJ � é � o T � KF=µ4@[G[�é��B¥��ߧ 4¨2<J84@?H9Ll�N@J8EF462<7GNO?wN@l � T'UïN PCAHN^PZKW=/N@EF]HѬ2<KQ2<SHK � k�l 46?HA i>kZl N@l� e��?�28SHKÖAH7:9L=/JLK>2<K¨=/4@98K6ÒH2<SHK^4@?B9LPÓK/JQ7:9¤K/4@9Lj@TÜf-SHKÖE�4@9L9¤l�ÑH?H=b2<7:N@? N6l � 7G9¤IO7RaOK/?icj
����� ���+���� ��" � �8���D��Z�%�8��" � � ����������� ����������� �! " ðBü
E� ¥�÷H§'é # ¥ �ôéì÷H§'é # ¥ �B¥��ߧ'éì÷B§áé # ¥ó õ#ø �H¥�õ § é ÷rý6§áé # ¥�� ! � � � ¥�÷B§8§%$
×½Ë@´cÀhÊc¶ o^vHxW&h& �å¢ �c�D© � �^���B�@� # ¥�� é $hüX§Zé # ¥�� é ü §¤é ü-,2.ß�@�rã # ¥�� é+*c§Zéü-,�1H®Mà#�b� � é�� ù ® (°�D�b�Bè # ¥ � én*c§^é # ¥�� én*c§^é�ü-,�1Þ�@�rã # ¥ � é üX§hé# ¥���é�üX§¡ú # ¥���é $hü § éôü-,�1H® �墬�w�0�@�È�54/���3�\6
õEo¨¥�õ §& 9 9%:�;
< 9%: @9 9%:�;
÷E�Ó¥�÷H§
< 9%: @9 9%: @
� � ��c�5�Qâµ�bÿZ�b� ���@ Ý¢å�5�0���B�@�M� £b�5«8�@¢c� �0���D�0�Â�L�@�B��⵩��È�0�@�Â� ©��|�:�Ô�æ©��Ö©��æ� &�© &8©��æ�/®ò
f-SHKh=>NO?c2<7G?3ÑBNOÑH9Q=µ4@9LKh7G9-SD4@JLAHK/J>T f-SBK/J8K^4@JLK¨2<SHJLK/KÖ9L28K/]H9Ql�NOJ �D?HAB7:?HIE�'�
f-SBJ8K/KÖ9;2<K/]B9Wl�NOJ¤28J<4@?B9Ll�NOJLE�46287:NO?B9ü T � NOJ-K/4@=5S ÷ Ò �D?BAî2<SBK^9LK>2 � éó õM���H¥�õ §Zû ÷rý T1 T � 7G?HAk2<SHK i>kZl
m��Ó¥�÷H§ é # ¥ � û ÷B§áé # ¥ �H¥��ߧZûC÷H§é # ¥ó õ#ø �B¥�õ §¤ûC÷åý6§áé � � � E o¨¥�õ § �cõ ¥ # T üI*c§
# T-f-SHK � k�l 7:9E�Z¥�÷H§'é m �� ¥�÷B§ T
×½Ë@´cÀhÊc¶ o^vHxW&hV à#�b�Eo¨¥�õ §`é � � � ⵩��0õ � *D® (°�D�b� mZo¨¥�õ §îé � �� E
oÖ¥�§ ��
éü $ � � � ®Qà#�b� �é��B¥��g§'é [:NOI �ê®?(°�D�b� � éôó õ`�%õ`û � ýî�@�rãm��Ó¥�÷B§'é # ¥ � ûC÷B§'é # ¥ [:NOI � ûC÷H§'é # ¥�� û � §'é m!o¨¥ � §Üéôü1$ � ��� � $
(°�D�b�<��⵩��5�5èE�Ó¥�÷B§áé � � ��� � ⵩��Q÷ !`�d®Öò
ð\1 ��������� ����������������� ����������� �! "
×½Ë@´cÀ^Êc¶ oÖvHx'& �Ûà��b�#��� �ï?H7Rl ¥%$hü@ö # §/®-ÚÜ���rã0���D�-�BãLâЩªâ �é �MùX®?(°�D�hã3�b�B�b��� �k©ªâ� �:� E
oÖ¥�õ § é � ü-,2. 7Gl $ ü ��õ�� #* N@2<SHK>JLPQ7G98K $
� «8�@�ñ©��D �ß� ��c� ���@ Ý¢å�5� ��� ¥a*Bö §>® þÓ©��B�b��ã3�b�k�ÂÿZ©ê«8��� �76"� � D* ��÷ � üê�@�rã� ��� kü�û�÷ � ®FÚ�©��Ô«8��� � � � è � é p $ � ÷ ö � ÷\qd�@�rã?m.�Z¥�÷H§Wé � � � E o¨¥�õ §��Oõ|é¥;ü0,�1O§ � ÷ ®ÓÚ�©��Z«8��� � � ��� è � é p $^üOö � ÷\q¡�@�rã m.�Ó¥�÷B§'é �� � E o¨¥�õ §��Oõ�é ¥ªü-,2.½§>¥ � ÷¬úüX§>®�(Ö� �_�b�<�b�D�Â���@�����3�Qm¼ÿZ�d�3�b�
E�Z¥�÷H§'é
���� ������� 7Gl * � ÷ �ñü�� � 7Gl ü � ÷ � * N@2<SBK/JLPQ7G98K $ºò
� SHK/? � 7:9Ö9;2<J87G=>28[GjME0NO?HN62<NO?HKw7G?H=/JLKµ4@9L7:?HI�NOJÖ9;2<J87G=>28[GjME0NO?HN62<NO?HKwABK/=/JLKµ4@9L7:?HI2<SBK/? � SD4@9W4@? 7:?caOK>J89LK
�é � � � 4@?BAî7G?�28SH7:9-=/4@98KÖNO?BK^=/4@? 98SHN P 28SD462E�Ó¥�÷B§áé
Eo_¥
�¥�÷H§L§ ��
��
��¥�÷B§�c÷ �
���
$ ¥ # T üOüX§
��� �BE ��& MÐ"LH��O(0& � Mh$á./(0"LH ( � � ��¨& M R�� MÍ"+)M(�� � MÐ&Ü.%UMQJLR� H
�?%98N@EFKW=µ4@9LK/9ZPÓK¨4@JLKï7G?c2<K/JLK/9;2<K/Ak7G?F28J<4@?H9;l�NOJ8EF46287:NO?ÔN@l¡9LK>a@K/J<46[°J<4@?BAHNOE a64@J87 )4@iH[GK/9>T � N@JWK � 4@EF]B[:K@Òr7Gl � 4@?BA � 4@J8KhIO7Ga@K/?`J84@?HAHN@E a@46J87m46iH[:K>9/ÒDPZKhEF7GIOSc2QP¤4@?c22<N`\3?BNXPÅ2<SBKkAH7:9;2<J87GiHÑB287:NO?�N@l � , � Ò � ú � Ò'EF4 �åó �`ö �0ý NOJhEF7G? ó �Mö �0ý T �¡K>2�¼é �H¥��Mö �Ч irKF2<SBK�l�ÑH?H=b2<7:N@?ºN@lQ7:?c28K/J8K>9L2/TMf-SHK%9;2<K/]B9Íl�NOJ �D?BAH7:?HI
E�
4@J8K02<SHK9<46EFKÖ4@9QirK>l�NOJLK �
����� �_�+���� ��" � �8���D��Z�%�8��" � � "_ �� ����� ��� ������� ��� ����� � �! " ð #
ü T � NOJ-K/4@=5S � Ò �H?HA 28SHKÖ98Kb2 ��� éó¬¥�õ#ö<÷B§Ó� �B¥�õ#ö<÷B§Óû �¬ý T1 T � 7G?HAk2<SHK i>kZl
m�¥��c§ é # ¥ � û �c§ é # ¥ �B¥��`ö �ͧ¤û �c§
é # ¥ªó¬¥�õ#ö8÷H§Èø �H¥�õ#ö<÷H§ û �3ý6§ é � � ��
Eo�� �Ó¥�õ#ö8÷H§ �Oõ �½÷ $
# T-f-SHK>?E�¥��c§Üé m �
�¥��c§ T
×½Ë@´cÀhÊc¶ o^vHxW& �Cà#�b�¤� � ö8� ù � �W?H7Gl ¥a*BöµüX§k£b�%���rã3��D�b�rã3�b�D� ®|ÚÜ���rã+���D��ã3�b�B�b��� �|©ªâ�é � � úÞ� ù ® (°�D� � ©����D�Óã3�b�B�b��� �k©ªâh¥�� � ö8� ù §Ö�:�E
¥�õ � öLõ ù §'é � ü *�� õ � � üOö�*%��õ ù �ñü* N@28SHK/J;PQ7:9LK $
à#�b� �B¥�õ � ö8õ ù § é õ � úÞõ ù ® $h©�ÿ�èm��Ó¥�÷H§ é # ¥ � ûC÷H§'é # ¥ �H¥�� � ö8� ù §ZûC÷H§
é # ¥ó¬¥�õ � ö8õ ù §-� �B¥�õ � ö8õ ù §ZûC÷rý6§'é � � � � E ¥�õ � ö8õ ù § �cõ � �Oõ ù $$h©�ÿ«5©����5�Í���D�Í�B�@�Lãw�B�@�È� 68rÜ�rã@���3� � ®0ÚÜ���<�b�¤�b¢X�c�D© � �Í���B�@� * �V÷gû ü½® (°�D�b�� �:�%���D�����b���@�3�@ :�%ÿ'�������@�b�È�Â� «5�5�k¥ *¬ö *c§bö ¥�÷rö *c§��@�rãߥ *¬ö<÷H§>® �°�5�0ÚÜ�R�@¢3�<� ½®��@® 1b����3�:�+«8��� �5è�� �� � E ¥�õ � öLõ ù §��Oõ � �Oõ ù �:� ���D�`�@�<�8��©ªâ����3�:� �Â�È���@�3�@ :��ÿ��3� «5�ß÷3ù ,h1H® 1�âü �ô÷ � 1ß���D�b� � �:�F� �@�b� �����3���3�î���|���D�Ô¢3�D���-�/ä/¢B�@�<�%�L笫5��r�-���D�Ô�Â�È���@�3�@ :�Íÿ'������@�b�È�Â� «5�5�Ð¥ªüOö<÷ $CüX§böX¥ªüOöµüX§ÈöX¥�÷ $CüOö/üX§/® (°�3�:�d� �b�Ü�B���h�@�<�8��ü $º÷3ùF,�1H®�(°�D�b�5��⵩��<�5è
m.�Ó¥�÷B§'é������ �����
* ÷ � * �ù *wû ÷FûVüüD$ �ù üÖû ÷Fû 1ü ÷ � 1�$� ��ã@� �_�b�5�b�D�����@��� ©��BèÜ���D� � kZl �:�E
�Ó¥�÷H§'é��� ��
÷ *0ûC÷%ûVüüD$Þ÷ ühûC÷%û 1* N@2<SHK>JLPQ7G98K $ºò
ðh. ��������� ����������������� ����������� �! "
* ü*
ü
¥a*Bö<÷B§
¥�÷rö *c§
�
�
f-SH7:9Q7:9¤28SHKÖ=µ4@9LK *0ûC÷%ûVü T* ü*
ü
¥;ü@ö<÷0$Cü §
¥�÷�$ÛüOö/üX§
�
�
f-SH7G9W7G9¤2<SHKÖ=/4@98K ü � ÷%û 1 T
� 7:IOÑBJ8K # T ðB� f-SHK^98Kb2 � l�NOJ-K � 4@EF]B[:K # T . $ T
��� �Ö� �E¨,��+"+.µ,�M R�� � � ¨"+)+.���WK>=µ4@[G[�2<SH462h4�]BJ8NOiD46iH7:[G7G2jîEFK/4@98ÑHJLK # 7:9ÖAHK �D?HK/A�NO?�4 4) �DK>[:A��N@lZ4�9<4@E�)
]H[GK`98]H4@=/K � T Ï J<4@?BAHNOE a@46J87m46iH[:K � 7G9%4 �|�½� `7[�}X�H����� EF4@] � �W�~� � T�îK/4@98ÑHJ84@iH[GK^E0Kµ46?H9¤2<SD4�2µÒHl�NOJ-KbaOK/J;j õ Ò óµ¦Û�Ô��¥m¦Z§ZûÛõ¡ý ! �kT��� � � ��î,>¨&Ü,ï.7H$ H
ü Tï1¬SBNXPì2<SD4�2# ¥���é õ §Üé mF¥�õ � § $ mF¥�õ � §
4@?BAmF¥�õ ù § $ mF¥�õ � §'é # ¥�� ûÛõ ù § $ # ¥�� ûÛõ � §%$
1 T1�¡K>2 � i Kß98ÑB=5Sñ2<SD4�2 # ¥�� é 1O§ßé # ¥�� é # §ßé ü-,¬ü0* 4@?BA # ¥�� é O§�é � ,¬üI* T 7'[:N@2Ô28SHK iTk�l m T �ï9LK m 28N �D?HA # ¥ 1 � � û . $ � § 4@?HA# ¥ 1ÔûÛ� û . $ � § T
# T17'J8N a@K��¡K/E0EF4 # T ü T. T1�¡K>2 � SD4µaOK^]HJLNOiD4@iH7G[:7R2jFABK/?H9L7G2jkl�ÑH?H=b2<7GNO?E
o_¥�õ §Üé�� � ü-,2. *�� õ�� ü# , � # � õ�� * N@28SHK/J;PQ7:9LK $
����� ���N�� �� � �c� "_ " ð ¥ 4 § � 7:?HAk28SHKÖ=/ÑHEÍÑB[m46287GaOK¨AB7:9L28J87GiHÑB2<7GNO?�l�ÑH?B=>2<7GNO?�N@l � T¥ i § �¡K>2 �Åé�ü-,�� T � 7:?BA`2<SHKw]BJ8NOiD46iH7:[G7G2jkAHK>?H987R2j`l�ÑH?B=>2<7GNO?
E� ¥�÷H§ l�NOJ � T
Uï7G?c2 � NO?H9L7:AHK>J-2<SHJLK/KÖ=µ4@9LK/9 � ��ûC÷%û �� Ò �� ûC÷%ûVü 4@?BA ÷ �Vü T
T1�¡K>2 � 46?HA � irKFAB7:98=>J8Kb2<K0J<4@?HABNOE�a64@J87:4@iH[GK/9/T�1¬SHN P 28SD462 � 4@?HA � 4@JLK7:?BAHK/]rK/?HABK/?c2d7Rl�4@?HA NO?H[Gj�7Gl
Eo � �Z¥�õ#ö<÷B§'é
Eo¨¥�õ §
E�Ó¥�÷B§ l�NOJQ4@[G[ õ 46?HA ÷ T
ð T1�¡K>2 � SD4µa@KWAB7:9L28J87GiHÑB2<7GNO? m 4@?HAwAHK>?H987R2jÍl�ÑH?H=b2<7:N@?E4@?HAÔ[:Kb2 �ÛirKQ4_98ÑHiH9LK>2
N@l�2<SHKÖJLKµ4@[¡[G7:?HK6T �¡K>2 ! � ¥�õ § irK_2<SHKÖ7:?BAH7:=/462<NOJÓl�ÑH?H=>287:NO?kl�NOJ � �! � ¥�õ §Üé�� ü õ ! �* õ�,! � $�¡K>2 � é ! � ¥��ߧ T � 7:?HA 4@?îK � ]HJLK/989L7:NO?îl�NOJ-2<SHK^=/ÑHEÐÑH[m46287Ga@K^AB7:9L28J87GiHÑB2<7GNO?kN@l
� T ¥ Uï7:?c2 � �DJ89;2 �H?HA 28SHKÖ]HJ8N@iD4@iH7G[:7G2jFEF4@989¤l�ÑH?B=>2<7GNO?kl�NOJ � T §$ T1�¡K>2 � 4@?BA � irKZ7G?HAHK>] K>?HAHK/?c2á4@?HAÐ9LÑH]H]rNO98KÓ2<SD462�Kµ46=5SÍSD4@9�4 �ï?B7Gl�NOJLE ¥ *Böµü §
AH7G9L2<JL7:iHѬ2<7:N@?°T �¡K>2 � é E07:? ó �Mö �Ôý T � 7:?HA%28SHKÖAHK/?B987G2jE�¥��c§ l�NOJ � TÜUï7:?c2 �
� 2WEF7GIOSc2-irK^K/4@987GK/J¤28N �HJ89L2 �D?BA # ¥ � � �O§ T� T1�¡K>2 � SD4µaOK^=/ABl m T � 7:?HAk2<SBK^=>ABláN6l � � é EF4 �åó/*BöL�ºý T T1�¡K>2 ��� ��� ] ¥ � § T � 7G?HA m�¥�õ § 46?HA m � � ¥��O§ Tü0* T �¡K>2 � 4@?BA � i K¨7G?HAHK>] K>?HAHK/?c2/TZ13SHN Pì28SD462 �æ¥��g§ 7:9Ó7:?HAHK>] K>?HAHK>?½2WN@l �#¥ �Ч
PQSHK>J8K � 4@?HA � 46J8K¨l�ÑH?H=b2<7:N@?H9/TüOü TW1¬ÑH]B] NO9LK¨PZK_28NO989W4Í=/N@7:?kNO?H=>KÖ4@?HA�[:Kb2 D i K_28SHK¨]HJLNOiD4@iH7G[:7R2jÔN@l�SBKµ4@AH9>T"�#Kb2
� AHK/?BN@2<KÖ2<SHKÖ?½ÑHEÍirK/JQN@l�SHK/4@AH9W4@?HA [:Kb2 � ABK/?HN@28K¨2<SHKÖ?½ÑHEÍirK/JQN@l�254@7G[:9>T¥ 4 § 7ÜJLN aOKÖ2<SD462 � 4@?HA � 4@JLK^ABK/]rK/?HAHK>?c2µT¥ i § �#Kb2 � � 7�N@7:989LNO? ¥ � § 4@?BA�9LÑH]H]rNO98KwPÓK028NO989^4k=/NO7G? � 287:E0K/9>T �¡K>2 �4@?HA � irKî2<SBK`?½ÑHEÍirK/JFN@l¨SHK/4@AH9k4@?BA 254@7G[:9>T 1¬SHN P 2<SD462 � 4@?BA � 4@JLK7:?BAHK/]rK/?HABK/?c2µT
ü-1 T 7'J8N aOK^f-SHK>NOJ8K>E # T # # T
ðOð ��������� ����������������� ����������� �! "
ü # T1�¡K>2 �����º¥a*BöµüX§ 46?HAî[GK>2 �é�� o T¥ 4 § � 7:?BA%2<SHKÖ]rABl�l�NOJ � T"7Ü[GN@2-7R2µT
¥ i §+¥��æuBÀ^ÊcÆ tªo6»0×½Ë3Ê@o6»Âs·Àdo�q tbx·§ �¨K/?HK>J<4628Kg4Þa@K/=b2<NOJ õ é ¥�õ � öI$I$I$bö8õ � � � � � � §=>NO?H987G9L287:?HIïN@l ü0* Ò *�*�* J<4@?BAHNOEô9L2<4@?HAD4@JLA�'ïNOJLE�46[:9/T �¡K>2 ÷Ôé ¥�÷ � ö7$I$I$/ö8÷ � � � � � � §PQSHK>J8K ÷ ! é � �CJ T � J<4µP 4^SH7:9;2<NOIOJ84@E N@l ÷ 4@?HA�=/NOE0]D4@JLKd7G2 2<Nh2<SHK � kZl jONOÑl�NOÑB?HA�7:?�]D4@J;2 ¥ 4 § T
üI. T1�¡K>2 ¥��Mö �Ч i K-ÑB?H7Gl�NOJLEF[Rj^AH7:9;2<J87GiHÑB28K/AÔNO?w2<SHK-ÑB?H7G2ÜAH7G98= ó¬¥�õ#ö<÷H§Z�¨õ ù úk÷ ù ûü@ý T(�¡Kb2�� é � � ù ú � ù T � 7:?HAk2<SBK^=>ABl'46?HA�] ABl�N@l��ÍT
ü T ¥ 8 ÆcqOs ÁXo6»mÇL´c¶^»m´½qX¿cu¬À qcÆcÀhÄ@o6» 9Oo6qXo�»�´ tªuc»xÙ§ �¡K>2 � SH4 a@K 4 =/N@?½287:?½ÑHNOÑB9/Ò9;2<J87G=>28[GjÖ7:?B=/J8K/4@987G?HI iTk�lnm T �¡K>2 �é m�¥��ߧ T � 7G?HAh28SHKZAHK>?H987R2jÐN6l � T�f-SH7:97G9 =/4@[:[GK/Aw2<SBKï]HJLNOiD4@iB7:[:7R2jÍ7G?½28K/IOJ84@[H2<J84@?H9;l�NOJ8E T 'WNXPÛ[:K>2�� � �W?H7Gl�N@J8E ¥ *¬öµüX§4@?BAß[GK>2 � é m � � ¥ � § TÔ1¬SHN P2<SD4�2 � �Xm T�'WN P PQJ87R2<KÔ4%]HJLNOIOJ84@E 28SD4622<4@\@K>9 �ï?H7Rl�NOJ8E ¥ * Ò üX§ J<4@?HABNOE¼a@46J87m46iH[:K>9d4@?BA`IOK>?HK/J8462<K>9_J<4@?BAHNOE¼a64@J87:4@iH[:K>9l�JLNOE 4@? ��� ] N@?HK/?c2<7:4@[ ¥ � § AH7G9L28J87:iBÑB2<7GNO?°T
ü ð T1�¡K>2 ��� 7�NO7:9L98NO? ¥ � § 4@?HA �@� 7�NO7G989LNO? ¥ � § 4@?HA04@9L98ÑHE0KZ2<SH462 � 4@?HA � 46J8K7G?HAHK/]rK/?BAHK/?c2µTÍ1¬SHN P2<SD462d2<SHKwAH7G9L28J87:iBÑB2<7GNO?`N@l � IO7RaOK>?g28SD462 �¼ú ��é�A7G9QÕZ7:?HN@EF7:4@[ ¥aA�ö ��§ PQSHK>J8K �+é � ,B¥ � ú � § TUW7:?c2 üO� � NOÑgEF4µjMÑH9LK028SHKÍl�NO[:[GN PQ7:?HI�l�46=>2 � � l � � 7�NO7G9898N@? ¥ � § 4@?HA � �7�NO7:9L98NO? ¥ � § ÒB4@?HA � 4@?HA � 46J8K_7G?HAHK/]rK/?BAHK/?c2µÒ¬2<SHK>? � ú � � 7�NO7:9L98NO? ¥ � ú� § TUW7:?c2 1¬� 'ïN62<K¨2<SD4�2 ó � é õ#öW��ú ��é A�ýÖéVó � éCõ#ö �é A $�õ¡ý T
ü $ T1�¡K>2 Eo�� �Ó¥�õ#ö<÷B§'é ��� ¥�õwú�÷ ù §3*0ûÛõ`ûVü 46?HA *0ûC÷%ûVü
* N@2<SBK/JLPQ7G98K $� 7G?HA � � � � �ù ���é �ù � T
ü � T1�¡K>2 � � �º¥ # öµü ðO§ T 1¬N@[GaOK�2<SBKîl�NO[G[:N PQ7:?HI+ÑB987:?BI�2<SBK 'WNOJ8EF4@[¤2546iH[:K 4@?HAÑH9L7:?HI04w=/N@EF]HѬ2<K/JQ]D46=5\646IOK@T¥ 4 § � 7:?HA # ¥�� � $ § T
����� ���N�� �� � �c� "_ " ð $¥ i § � 7G?HA # ¥�� � $�1O§ T¥ = § � 7G?HA õ 9LÑH=5Sî28SD462 # ¥�� � õ §Üé $ * T¥ A § � 7G?HA # ¥ *ÔûÛ� �N.½§ T¥ K § � 7G?HA õ 9LÑH=5Sî28SD462 # ¥ � � � � � õ��³§'é $ * T
ü T 7'J8N aOKÖl�NOJLEÍÑH[m4 ¥ # T üOüX§ T1h* T �¡K>2 �`ö � � �ï?H7Rl ¥a*BöµüX§ irKÍ7:?HABK/]rK/?HAHK>?c2µT � 7:?BA`2<SHK � kZl l�NOJ � $ � 4@?HA
� , � T1¬ü T �¡K>2 � � öI$I$I$>ö8� � � ��� ] ¥ � § i K # # k T �¡Kb2 � é E�4 �åó � � ö7$I$I$/öL� � ý T � 7G?HA2<SBK � kZl N@l � TÜUW7:?c2 � � ûC÷ 7Rl�46?HAîN@?H[Gj�7Gl � ! û ÷ l�NOJ � é�üOöI$I$7$>ö A T1�1 T �¡K>2 � 46?HA � i K^J<46?HAHNOE�a64@JL7m4@iB[:K/9>T-1¬ÑB]H] N@98KÖ2<SD462�� ¥ � � �ߧ éñ� T-1¬SBNXP
2<SH462 �°u6ÁH¥��Mö �ͧ'é��0¥��ߧ T1 # T �¡K>2 ��� �ï?H7Rl�NOJ8E ¥a*BöµüX§ T �¡K>2 *��� � � �ìü T(�¡K>2
�é � ü *�� õ����* N@2<SBK/JLPQ7G98K
4@?HA [:Kb2� é � ü ��õ�� ü
* N@28SHK/J;PQ7:9LK¥ 4 § Ï J8K � 4@?HA � 7:?HABK/]rK/?HAHK>?c2Èe � Scj�� � S½j�?BN@2Èe¥ i §Í¥;ü0* ] NO7G?c2<9 § � 7:?HA�� ¥ � � �d§ TÖUï7G?c2 � � SD462_a64@[GÑHK/9 � =/4@? � 254@\6K e 'WNXP�D?HA�� ¥ � � � é �c§ T
� � ��� ���� � � � �� ����������������� ���! �"$#&%(')%+*-,/. ,/01' 23'4.65�,/7 8�':9;*-'4<6=-"
>&?A@B@DCFEG@-HJILKMILNPORQTSUOWVYX/@ZKWQ\[]OW^_K`VLKMQAaAORXcbMKWVdNeKWfAge@ihjNPkYIL?A@:K�bW@-VdKWlR@:bMKWgemn@OW^oh�p+>&?A@i^qORVdXrKWgsan@JtuQANvILNPORQwNek]KWk&^qORgPgeO�x]k-py+zJ{G|R} ~L} �n|:�u�������u�4�����+���W�������A�u�U�o�A�����4�����u�(�4���i���M D�¡��¢G�����£�W�)�¥¤h ¦e§4¨ �q©;ªs� ¨r« �`¬D�
$Sqh®[+¯±°³²Y´�µ`S¶²£[+¯³·³¸1¹ ²£º»Sq²£[ NP^]hjNPk¼anNekdHJVd@DIL@½ ²£º(Sq²£[¾´R² NP^]hjNPk¼HJORQ�ILNPQ�mnORmAk SU¿npPÀÁ[ §Ã§DÄFÅB¦ ª�Æ « �  «Ç« �u� §DÄ�ÅÉÈ ��� ¦ ª « �ÊÆW� ÂWË Ì ¦e§ÎÍ � ˶ËÐÏ ¨ �q©;ªs� ¨�Ñ`Ò � Ä�§ � « �u�(¤Z� Ë¶Ë � Í Ï¦ ª�Ærªs� «  «Ó¦ ��ª « � ¨ �Dªs� « � « �u�B�ÕÔJÖu�Ã× « � ¨rØ ÂWË Ä �:�¥¤ hTÙÎSqh®[+¯ÚÇhɯ۰ܲδ�µrSq²£[+¯1Ý!¯ÚÝsÞYß SU¿npáàR[>&?A@4@âCnEG@-HDIÃKMIdNeORQãNekäKåORQA@âæçQ�mAX:fG@-VäkdmAX/XrKWVÕè`OW^»Id?A@�anNekÕIdVdNPfAmnILNPORQ�p;>&?ANPQAéêOW^iS¶h®[ëKWk]IL?A@)KZbR@JVLKWlW@)bMKWgPmA@$èRORmãxìORmngeaãORfnIÃKMNeQwNP^oèRORmãH-OWXrEAmFIL@-aãIL?A@iQ�mAX/@-VÕNeHZKMgKZbR@-VdKWlR@îíðïuñ ¸1òóÐô ñ h ó OM^YK�geKWVdlW@rQ�mAX:fG@-VBOM^¼õÓõ¶ö÷aAVdK�x]k:h ñâø ßZßZß ø h ò p6>&?A@ê^UKWHDIIL?uK�I¼ÎSqh®[ìùí ïuñ ¸ òó�ô ñ h ó NPkÎKWHDILmuKMgegPèêXrORVÕ@)Id?uKWQúK`?A@-mnVdNek¾ILNPHWûëNvIYNPkYKåIL?A@JORVd@JXHZKWgPge@Ja�IL?A@úgüKZxcOW^igüKMVdlR@!Q�mnX:f£@JVdkêId?uKMI`x_@�x]NegPg¼anNekdHJmAkdkîgeKMIL@JV-p±>&?n@�QAOWILKMILNPORQ
ýRþ
��� ������� ��������������������������
½ ²Y´�µ`S¶²£[oaA@Jkd@JVÕbR@Jk;kdORX/@ëH-OWXrX/@-Q�IZp ��@ëmAkd@&NvI»X/@-VÕ@-gvè4KMk+KYH-OWQ�bW@-QANP@-Q�IÇmAQANv^¶èFNPQAlQAOWILKMILNPORQ6kdOwx_@raAOWQ"! Ii?AK�bW@/IdOãx]VÕNPId@ ¸ ¹ ²£º»Sq²£[Y^qORV$aANPkdH-VÕ@JId@/VdKWQAaAOWX�bMKWVdNeKWfAge@JkKWQAa ½ ²£º(S¶²£[Õ´R²å^qORV»HJORQ�ILNeQ�mAOWmAk(VLKMQAaAORX�bMKWVdNeKWfAge@Jk�fnmnI(èROWmBkd?AORmngea:fG@&KZxëKMVd@ìId?uKMI½ ²Y´�µ`S¶²£[ì?uKWkäKBEAVd@JH-NPkd@)Xr@-KWQANeQnl:IL?uKMI]NPk&aANekÕH-mAkÕkd@Ja�NeQîVÕ@ZKWgðKWQngPè�kdNPk&H-ORmAVÕkd@Jk-p>�O]@JQAkdmAVÕ@+IL?uKMI�ÎS¶h®[£Nek�x_@-gPgRaA@DtuQA@-a$#�x_@;kLKZèYId?uKMI�ÎSqh®[s@âCFNekÕIdk�NP^ ½ ¹ % ² % ´�µðÞ$S¶²£[�&' p�(YIL?A@JVÕx]NPkd@$xì@)kLKZèêId?uKMI]IL?n@)@âCFE£@JHJIÃK�ILNeOWQãanOF@JkäQnOWIä@DCFNek¾IZp
)+*�,.-0/.1 z$�u�3254ð� «(h 687ì@-VÕQAORmAgPgeNUS:9G[-Ñ ���u�Dª ÎS¶h®[¼¯ ¸ ñ¹Dô<; ²£º(S¶²£[Y¯cS �>= S¥À�?9G[Õ[A@ÛS¾À = 9\[ǯB9oÑ�C)+*�,.-0/.1 z$�u�3D5E Ë ¦ Ö Â ¤  ¦ �Y×Ã� ¦ ª «UÍ � «U¦¶Å � §-Ñ 4ð� «uh ¬D� « �u�äª ÄFÅ ¬D�D�Y�¥¤_�u�  ¨�§-Ñ ���u�Dªn�ÎSqh®[]¯ ½ ²G´�µoÞiS¶²£[]¯ ¸ ¹ ²£ºÁÞiS¶²£[¼¯³S �F= º(S � [d[ @ S¾À = º»S¾ÀÁ[Õ[ @ SÓà = º(SÊàR[Õ[]¯S �G= S¥ÀIHM¿�[Õ["@ÛS¥À = S¾ÀIHRàW[d["@1SÊà = S¾ÀIH�¿�[d[+¯�ÀRß"C)+*�,.-0/.1 z$�u� �J4ð� «�hK6MLäQANP^ZSN?)À øPO [-Ñ ���u�Dªn� $S¶h®[+¯ ½ ²G´�µðÞiS¶²£[ǯ ½ ²£º�ÞiSq²£[¾´R²w¯ñQ ½SRïuñ ²Y´�²ê¯ À�ÑTC)+*�,.-0/.1 z$�u�VUJW]�Ã× ÂWË¶Ë « �  «  �  ª ¨ � Å Ø Â � ¦  ¬ Ë �ì�  § ÂFX;Â Ä ×Ã��Y ¨W¦e§D« � ¦ ¬ Ä�«Ó¦ ��ª ¦ ¤ ¦¶« �  §¨ �Dª §D¦¶« Y ºÁÞiS¶²£[¼¯[Z]\ÇS¥À�@�²_^â[P` ïuñ Ñba�§D¦ ª�Æ ¦ ª « �ÊÆW�  «Ó¦ ��ª ¬cY4Ö Â � «q§ � ÈÓ§ � «Sd®¯ ²  ª ¨e ¯ÚIÃKWQ ïuñ ² Ì �
° % ² % ´�µrSq²£[;¯ à\ °�f
;²Y´R²Àg@ ² ^ ¯ihá² ILKWQ ïuñ S¶²£[kj f; ? °lf
;ILKWQ ïuñ ²Î´R²î¯ '
§ � « �u� Å �  ª ¨ ��� § ªs� « �ÕÔ ¦e§D«ÊÑnm ¤oY�� Ä�§D¦¶ÅBÄ Ëv « � Â5X;Â Ä ×Ã��Y ¨W¦e§D« � ¦ ¬ Ä�«Ó¦ ��ª Å Â ªpY«U¦¶Å � §  ª ¨/« Ârq � « �u� Â Ø �D�  Æ��Ã��Y�� ÄwÍ+¦ Ë¶Ë § �Ã� « �  «;« �u� Â Ø �D�  Æ��iªs� Ø �D� § � «U« Ë � §4¨ � Í ª Ñ��� ¦e§`¦e§ ¬D�Ã× Â Ä�§ � « �u� X;Â Ä ×Ã��Yú�  §î« � ¦ × q «  ¦ Ë §  ª ¨ �u�Dªs×Ã�ã�ÕÔ « �Ã� Å �!�W¬ § �D� Ø Â «Ó¦ ��ª §Â �L�:×Ã� ÅBÅ ��ª ÑsCtuVdORX QAO�xÜORQ"#Çx]?A@-Qn@JbR@JVrx_@waANekÕH-mAkÕkr@âCnEG@-HDIÃKMIdNeORQnkc#_x_@ãNeX/EAgeNPH-NPIdgPè¡KMkdkdmnXr@IL?AKMI&IL?A@Dèw@DCFNek¾IZpu�@JIwv¯JxnSqh¡[DpSyäOÁx1aAO4x_@$H-OWXrEAmFIL@ä$Szv4[|{}(ÎQA@¼xëKZèêNPk_IdO4tuQAaãºI~ìS��A[;KWQAaIL?n@-QãH-ORX/EAmnId@$ÎS�v:[;¯ ½ �AºI~ÇS��n[Õ´.�£p�7ìmnI&IL?n@-Vd@)Nek]KMQã@-KWkdNP@-V&xìK�èWp
����� ����� �������������w�[��� � �w����G��� ����w� ��� � � À ��ÁzJ���üz -��������� ��Áz����.1 zi���o~��ÁzG1 ,! #"�$¥~N,�~L}%$¥~L}%&�} ,�|'�)( 4o� « v¯ xnSqh®[JÑ ���u�Dª
$Szv:[;¯÷iS�xAS¶h®[d[(¯Û° xnSq²£[¾´�µoÞÎSq²£[Dß SU¿np O [>&?ANekrVd@-kÕmAgPI`X`KMéW@-krNeQ�ILmnNPILNvbR@ãkd@JQAkd@MpÛ>&?ANPQAé OW^$EAgeK�è�NPQAl K®l�KWX/@ãx]?A@JVd@úx_@aAVLKZxh KMI$VLKMQAaAORX�KWQAa6IL?A@JQ+*¼EuKZè!èRORm vܯ xnSqh®[âp-,ÇOWmAV$KZbR@JVLKWlR@åNPQAH-ORX/@4NPk
xAS¶²£[$IdNeX/@-kÎId?A@`HÃ?uKMQAH-@/IL?uKMIih ¯ ² #»kÕmAX/Xr@Ja÷SqORViNeQ�IL@JlRVLKMId@-a\[$O�bR@JV4KWgPg(bMKWgemA@JkOW^+²ðp0y¼@JVd@:NPkÎKêkdEG@-HJNüKWgoHZKMkd@Wp0u�@JI. f£@:KWQ�@JbW@-Q�I�KWQAa6gP@JIxnSq²£[]¯0/21(S¶²£[äx]?A@JVd@/31»Sq²£[;¯ ÀiNP^ð²546.�KWQAa7/31»Sq²£[;¯ � NP^ð² H46.4p;>&?A@-Q
ÎS8/21(S¶h®[d[+¯ ° /21(S¶²£[dºÁÞ$Sq²£[¾´�²î¯ ° 1 º�Þ$S¶²£[Õ´R²w¯:9ìS¶h;46.$[Dß*çQ!OMIL?A@JV]x_ORVÕaAkc#uEAVÕORfuKWfnNegeNvIçè`NPk]KBkdEG@-H-NeKWg�HZKMkd@)OW^»@âCFE£@JHJIÃK�ILNeOWQ�p)+*�,.- /.1 zi�u�=<l4ð� «ohK6ML¼QnNP^-S � ø ÀÁ[-Ñ 4ð� « v¯ xAS¶h®[+¯?> Þ Ñ ���u�Dªn�
ÎSzv:[;¯1° ñ;> ¹ º(Sq²£[¾´�²î¯±° ñ
;> ¹ ´R²w¯?>�? ÀRß
@ Ë « �D�⪠ «U¦¶Ø � Ë YÁ�0Y�� Ä ×Ã� Ä Ë ¨ ©;ª ¨!ºI~_S��n[`Í � ¦ ×Ã� «UÄ �⪠§ � Ä�«i« �¡¬D� ºI~ìS��A[B¯ ÀIH � ¤Z���À & � &A>�Ñ ���u�Dªn� iS�v:[;¯ ½CBñ �ëº(S��n[Õ´.�4¯D>�? À�ÑsC)+*�,.- /.1 zi�u�=E�� Ârq �  §D«U¦ × q �¥¤ Ä ª ¦¶« Ë �Dª�Æ « �  ª ¨ ¬-�L� Ârq ¦¶«  « �  ª ¨ � Å/Ñ 4ð� «Av ¬D� « �u�Ë �Dª�Æ « ���¥¤ « �u� Ë ��ª�Æ��D�¼Ö ¦ �Ã×Ã� Ñ�Ò �  «]¦e§B« �u� Å �  ª �¥¤ vGF m ¤ h ¦e§B« �u�r¬-�L� Ârq Öu� ¦ ª «« �u�Dª h 6 L¼QANv^-S � ø À�[  ª ¨Gv ¯ xAS¶h®[ë¯XrK�C Z�h ø À�?�hb`FÑ ��� Ä�§ � xAS¶²£[&¯ À�?T²Í �u�Dª � & ²}&ÛÀIHRà  ª ¨0xAS¶²£[ǯڲ Í �u�Dª ÀIHRà�H÷²}&ÛÀ�ÑJI �Dªs×Ã�Ã�
ÎSzvB[+¯ ° xnSq²£[¾´�µrSq²£[+¯ ° ñ�K ^;
S¾À�? ²£[Õ´R² @ ° ññ�K ^ ²Î´R²w¯ O¿ ß C
t\mnQAHJIdNeORQAkoOW^Fkd@DbR@-VdKWgFbMKWVÕNüKWfnge@-kðKMVd@;?uKWQAange@-a)NeQ�K]kdNeX/NegeKWV£xìKZèRpL* ^NM1¯5xAS¶h ø v4[IL?A@JQ ÎS�MÎ[;¯ $S�xnSqh ø v4[Õ[ǯ۰ ° xnSq² ø �n[Õ´�µrSq² ø �n[Dß Sq¿Ap ¿�[
� à ������� ��������������������������
)+*�,.-0/.1 z$�u���54ð� «sSqh ø v:[ � Â Ø � Â�� � ¦ ª « Ë Y Ä ª ¦ ¤Z��� Å�¨W¦e§D« � ¦ ¬ ÄF«U¦ ��ª`��ª « �u� Ä ª ¦¶«\§��-Ä Â �L� Ñ4ð� «LM�¯JxnSqh ø v:[;¯ h ^�@ v ^ÁÑ ���u�Dªn�ÎS MÎ[;¯ ° ° xnSq² ø �n[Õ´�µrSq² ø �n[»¯ ° ñ; ° ñ
;Sq² ^ @� ^ [A´R²G´+�:¯ ° ñ
;² ^ ´�² @ ° ñ
;� ^ ´+�:¯ à
O�C
>&?A@���� ��¢G�����£� OW^�h NPk&aA@JtAQA@-aúIdOrfG@ÎÎSqh �Z[]KMkdkdmnXrNPQAlBIL?uKMI&ÎS % h % �-[�&' p���@)kÕ?uKWgPg�VLKWVÕ@-gvèêXrKWéW@iX:mAHÃ?ãmAkÕ@)OM^(X/ORXr@JQ�ILk&fG@JèROWQAa��/¯±àFp����� ��9;,/ �"Î9+%(*-"�� ,/0±���! �"$#&%(')%+*-,/. � J�Áz-���üzr-�� ����� m ¤ h ñâø ßZßZß ø h ò  �L�ã�  ª ¨ � ÅjØ Â � ¦  ¬ Ë � §  ª ¨ � ñâø ß-ßZß ø � ò  �Ã�6×Ã��ª ϧD«  ª «q§ � « �u�Dª ���� ó � ó h ó�� ¯�� ó � ó ÎSqh ó [âß SU¿Ap! R[)+*�,.-0/.1 z$�u���R�B4o� «Çh 6 7ìNeQnORXrNeKWgqSqí ø 9G[JÑ Ò �  «ä¦e§/« �u� Å �  ª �¥¤ h F Ò �î×Ã� Ä Ë ¨« ��Y « �  Ö�Öu� ÂWË « � « �u� ¨ �q©;ª ¦¶«U¦ ��ª Ù
$Sqh®[+¯Û°Ü²Y´�µoÞÎSq²£[;¯ � ¹ ²£º�ÞiSq²£[;¯ ò� ¹Dô<; ²#" í ²%$ 9 ¹ S¾Àw?}9G[ ò ï ¹¬ Ä�«)« � ¦e§w¦e§ ªs� «  ªÚ�  § Y §DÄ�Å « � � Ø ÂWË Ä Â « � Ñ m ª §D« �  ¨ �)ªs� « � « �  «]h ¯ ¸Ûòó�ô ñ h óÍ �u�D�L� h ó ¯³Àú¦ ¤ « �u��& ���« � §Ã§å¦e§ �u�  ¨�§  ª ¨Bh ó ¯ � � « �u�D� Í+¦e§ � Ñ ���u�Dª ÎSqh ó [¼¯S 9 = À�[A@ÛSdS¥Àw?}9G[ =}� [;¯ 9  ª ¨iiS¶h®[+¯ÚÎS ¸1ó h ó [ǯ ¸1ó iS¶h ó [+¯1í�9oÑ J�Áz-���üzr-�� ���' 4ð� «oh ñâø ßZßZß ø h ò ¬D� ¦ ª ¨ �çÖu�Dª ¨ �Dª « �  ª ¨ � ÅÜØ Â � ¦  ¬ Ë � §-Ñ ���u�Dªn�
(� ò) óÐô ñ h ó�� ¯ ) ó ÎSqh ó [Dß SU¿Ap ý [*¼OWIdNeHJ@/Id?uKMI)IL?A@rkdmAX/X`K�ILNeOWQ®VÕmAge@åaAO�@-k�QAOWI�Vd@,+�mnNeVd@rNeQAan@-EG@-QAaA@JQAH-@`fAmnI)IL?A@X:mAgvILNPEAgeNPHZKMIdNeORQ/VdmAgP@$aAO�@-k-p
�������0����w� �� ���l�� � ��� ����w� �� ��� � O����� 8�'49;*-':.6#]" '4.65 � ,��)':9;*-'4.6#ä">&?A@)bMKWVÕNüKWQnH-@$X/@ZKWkÕmAVd@Jk]Id?A@��ÕkdEnVd@ZKMarOM^+KBaANek¾ILVÕNefAmnIdNeORQsp& ,�|�� ~ � $Õz iS¶h ?
, $>,b-¼z , $ ���ezú���z ,� $Ã}á| &-z ÎSqh ?¯É$Sqh®[�?�ݯ? Ý ¯ � � �¡z|l,�|� $¾��-YzD~L} -¼z3$� $¾z % h ? Ý % , $-¼z , $ ���ez�� �L$ / �ez ,� ~ -¼�'�ez����¶~¥z�|��+z~��Áz��r, �U}V,�| &-zW�
y+zJ{G|R} ~L} �n|:�u���]Dl4ð� «nh ¬D�  �  ª ¨ � Å Ø Â � ¦  ¬ Ë � Í+¦¶« � Å �  ª Ý;Ñ ���u�»�n�A���U�u�o�R��¥¤ h � ¨ �Dªs� « � ¨ ¬cY�� ^ ����� ^Þ ����� Sqh¡[ ����� Sqh®[ ����� h � ¦e§i¨ �q©;ªs� ¨ ¬cY� ^ ¯ÚÎSqh ? Ýð[ ^ ¯ ° Sq²F?�Ýð[ ^ ´�µrSq²£[ SU¿np � [
 §Ã§DÄFÅB¦ ª�Æ « � ¦e§ �ÕÔJÖu�Ã× «  «U¦ ��ª!�ÕÔ ¦e§D«q§-Ñ ���u�Ç J�Z�u���ð�A�����ð�����U�n���U¢G� ¦e§Çkda�Sqh®[;¯� � Sqh®[  ª ¨r¦e§ ÂWË § � ¨ �Dªs� « � ¨ ¬cY �  ª ¨ � ÞiÑ
��ÁzJ���üz -�������� @ §Ã§DÄ�ÅB¦ ª�Æ « �u� Ø Â � ¦  ªs×Ã� ¦e§úÍ � Ë¶Ë ¨ �q©;ªs� ¨ � ¦¶« �  §ú« �u��¤Z� Ë¶Ë � Í+¦ ª�ÆÖG�L�ÃÖu�D� «U¦ � §-Ù! Ñ � Sqh®[+¯ÚÎS¶h ^â[�? Ý"^ÁÑ" Ñ�m ¤ �  ª ¨$#  �L�B×Ã��ª §D«  ª «q§i« �u�Dª%� S �Rh[@&#D[;¯ � ^ � S¶h®[-Ñ' Ñ�m ¤ h ñDø ßZßZß ø h ò  �L� ¦ ª ¨ �çÖu�Dª ¨ �Dª «  ª ¨�� ñDø ßZß-ß ø � ò  �Ã�4×Ã��ª §D«  ª «¶§ � « �u�Dª
� � ò� óÐô ñ � ó h ó � ¯ ò� óÐô ñ � ^ó � Sqh ó [âß Sq¿Ap)(�[)+*�,.- /.1 zi�u�v�cU54ð� «(h 6 7ìNPQAORX/NüKWg¶SUí ø 9G[-Ñ�Ò � Í � ¦¶« � h ¯ ¸ ó h ó Í �u�D�L� h ó ¯�À¦ ¤ « � §Ã§ & ¦e§ �u�  ¨�§  ª ¨Bh ó ¯ � � « �u�D� Í+¦e§ � Ñ ���u�Dª h ¯ ¸ ó h ó  ª ¨ã« �u�å�  ª ¨ � ÅØ Â � ¦  ¬ Ë � §  �L� ¦ ª ¨ �çÖu�Dª ¨ �Dª «ÊÑ @ Ë § ��� 9ëS¶h ó ¯ ÀÁ[B¯ 9  ª ¨ 9ìSqh ó ¯ � [å¯ À ?B9oÑW]�Ã× ÂWË¶Ë « �  « $Sqh ó [+¯+* 9 = À-,+@.*vS¾Àw?o9G[ = � ,\¯ 9�ß/ � Í � $Sqh ^ó [+¯+* 9 = À ^ ,�@.*vS¾Àw?}9G[ =}� ^ ,G¯ 9�ß���u�D�L�q¤Z���Ã�Ã�0� S¶h ó [w¯ ÎSqh ^ó [�? 9 ^ ¯ 9 ? 9 ^ ¯ 9oS¾À ? 9G[JÑ E ¦ ª ÂWË¶Ë Y��0� Sqh¡[ã¯� S ¸1ó h ó [ë¯ ¸1ó � S¶h ó [ë¯ ¸1ó 9oS¾À�? 9G[ë¯ í 9oS¥À�? 9G[JÑ�/ � «U¦ ×Ã� « �  « � Sqh®[ì¯ � ¦ ¤9î¯�À ��� 9`¯ � Ñ21 Ârq � §DÄ �L�0Y�� Äw§ �Ã� Í ��Y « � ¦e§iÅ Ârq � §i¦ ª «UÄF¦¶«U¦¶Ø � § �Dª § � Ñ
� ¿ ������� ��������������������������
* ^»h ñâø ß-ßZß ø h ò KMVd@iVLKWQnaAORX�bMKWVÕNüKWfnge@-k_IL?A@JQ!x_@)an@JtuQA@iIL?n@ J�u�����U�ú�����u� ILOfG@ h ò ¯ Àí ò� ó�ô ñ h ó SU¿Ap þ [KWQAaîId?A@ -�u�����U�ã�A�A���q�u�o�R� ILOåfG@
� ^ò ¯ Àí ?÷À ò� ó�ô ñ � h ó ? h ò � ^ ß Sq¿ApvÀ � [ J�Áz-���üzr-�� ��� � 4ð� «oh ñâø ßZßZß ø h ò ¬D� õUõ¶ö  ª ¨ Ë � «�Ý!¯ÚÎS¶h ó [ � � ^ ¯ � Sqh ó [JÑ ���u�DªÎS h ò [+¯�Ý ø � S h ò [+¯ � ^í KWQAa ÎS � ^ò [+¯ � ^ ß
* ^Yh KWQAa v KWVd@wVdKWQAaAORX bMKWVdNeKWfAge@Jkc#+IL?n@-Q Id?A@wH-O�bMKWVdNeKWQAHJ@!KMQAa H-ORVÕVd@-geKMILNPORQfG@JIçx_@-@-Q�hjKWQAa}v Xr@-KWkdmnVd@4?nOÁx�k¾ILVÕORQAl`Id?A@4gPNeQA@-KWV]Vd@-geKMILNPORQAkÕ?ANeEwNekäfG@JIçx_@-@-Q6hKWQAa v`py+zD{£|R} ~L} �n|4�u�v� <54ð� «Çh  ª ¨ v ¬D�å�  ª ¨ � ÅÉØ Â � ¦  ¬ Ë � §åÍ+¦¶« � Å �  ª §iÝsÞ Â ª ¨Ý"~  ª ¨Î§D«  ª ¨  � ¨�¨ � Ø�¦  «U¦ ��ª § � Þ Â ª ¨ � ~_Ñ�� �q©;ªs� « �u�+�R¢F�n�A���U�u�o�R�î¬D� «UÍ �Ã�Dªh  ª ¨ v ¬cY � � � S¶h ø v4[ǯ÷ *PSqh ?�ÝsÞë[-S�vM?�Ý"~o[ , SU¿npPÀRÀ�[ ª ¨r« �u�¼�R¢\�Á�����q�n���q¢G�Ú¬cY
� ¯ � Þ�� ~¡¯ � Sqh ø v:[;¯� � � Sqh ø vB[� ¹ � ~ ß SU¿npPÀÁàW[
J�Áz-���üzr-�� ���3���u�:×Ã� Ø Â � ¦  ªs×Ã� §  «U¦e§ ©ë� §-Ù� � � Sqh ø v:[;¯ $Sqh v:[ ?�ÎSqh®[ ÎSzv:[âß���u�:×Ã���â�L� Ëv «Ó¦ ��ª §  «Ó¦e§ ©ë� §-Ù?)À H � S¶h ø v:[ H ÀRß
��� ���F��� �������������w� ���� ����w� �� ���B��� ���o������<���G �w����G�J� � ����� � �� ����� m ¤ v ¯ � @ #Lh ¤Z��� § � Å �î×Ã��ª §D«  ª «q§ �  ª ¨ #w« �u�Dª � Sqh ø v:[i¯3Àú¦ ¤ #�� �  ª ¨� Sqh ø vB[+¯n?)ÀΦ ¤ #& � Ñ�m ¤ h  ª ¨sv  �L� ¦ ª ¨ �çÖu�Dª ¨ �Dª « � « �u�Dª � � � Sqh ø vB[+¯ � ¯ � Ñ���u�:×Ã��ª Ø �D� § � ¦e§ ªs� «Ç« � Ä � ¦ ªãÆ��Dªs�D� ÂWË Ñ ��ÁzJ���üz -�������� � S¶h @Jv4[i¯ � Sqh®[�@ � Szv:[ @1à � � � Sqh ø v:[  ª ¨ � S¶h ?lvB[$¯� S¶h®[ @ � Szv:[�?$à � � � Sqh ø v:[JÑ 1 ���L�(Æ��Dªs�D� ÂWË¶Ë Y���¤Z���;�  ª ¨ � ÅÛØ Â � ¦  ¬ Ë � §�h ñDø ßZßZß ø h ò �
� � � ó � ó h ó � ¯ � ó � ^ó � Sqh ó [�@ à � � ó��� � ó � �� � � Sqh ó ø h � [âß
���¾� ���! �"$#&%(')%+*-,/. '4.65 8�'49Ç*J':.6#]" ,å0�ð7 �,/9(%('4.ê%23'4.65�,r7 8�':9;*-'4<6=-"��y¼@-VÕ@)x_@iVd@JH-ORVÕaãId?A@i@DCFE£@JHJILKMILNPORQ!OM^»kÕORXr@iNPXrEGORVÕILKWQ�IëVdKWQAaAORX�bWKMVdNüKMfAge@Jk-pyì}%$¥~��Ó} ����~L} �n| �:z ,�| � ,!�Ó}V,�| &-z��OWNeQ�IäX`KWkÕk]KMI � � �7ì@JVdQAORmngegeN+SUE\[ E EðS¾ÀJæ E\[7ìNPQAORX/NüKWg»SUQ"# E\[ E Q!E�S¾ÀJæ E\[� @-ORX/@JIdVdNeH/SqE\[ À��ME S¾Àw?}9G[|H 9 ^��OWNekdkÕORQ�S��G[ � �L¼QnNP^qORVÕX SÓK # f\[ SÓK.@Îf\[��Rà S #T? ��[ ^ HFÀÁà*¼OWVdXrKWg;SUÝ ø � ^ [ Ý � ^�»CFE£ORQn@-Q�ILNeKWgÇS���[ � ��^� KWXrXrKwS�� ø ��[ ��� ��� ^7ì@DIÃK!S�� ø �»[ ��HAS�� @���[ ��� HASdS�� @���[ ^ S�� @�� @ÚÀÁ[d[�! � SqNP^#"$�±ÀÁ[ "<HAS�" ?TàR[$SqNP^#"%� àR[& ^' 9 à�9(!mAgvILNPQAORX/NüKWg»SUí ø 9G[ QAE kd@-@)fG@-geO�x(!mAgvILNvbWKMVdNüK�IL@�*äORVdXrKWg+SqÝ ø*) [ Ý )
��@]aA@JVdNPbW@-aBiS¶h®[(KWQAa � Sqh®[�^qOWV(IL?A@&fANPQAORX/NüKWg�NeQBId?A@&EAVd@Db�NeORmAk;kÕ@-HJIdNeORQsp(>&?A@HZKWgPH-mAgeKMILNPORQAkì^qOWV&kdORX/@$OW^oIL?A@iOWId?A@-VÕkYKWVÕ@$NeQîIL?n@)@âCFH-@-VÕH-NPkd@-kJp
� ý ������� ��������������������������
>&?A@igüKWk¾IëIçx_O/@JQ�ILVdNP@-k&NPQêIL?A@$IÃKMfAge@$KMVd@ÎX:mngPILNvbMKWVdNeKMIL@äXrO�aA@Jgekìx]?ANPHÃ?wNeQ�bRORgvbR@iKVLKMQAaAORX bR@-HDILORV]h OW^oIL?A@$^qORVÕXh ¯ ��� h ñppph �
���� ß>&?A@iX/@ZKWQwOW^»K/VdKWQAaAOWX�bR@JHJIdORVäh NPkäan@JtuQA@Jaúf�è
Ýú¯ ��� Ý ñpppÝ ����� ¯ ��� $Sqh ñ [pppÎSqh � [
���� ß>&?A@ �A�n� �q�u�o�R���Ã�R¢��A�A���q�u���R�����n�����¶� ) Nek&an@JtuQA@Ja!IdO/fG@
� S¶h®[;¯ �� S¶h ñ [ � � � S¶h ñâø h ^ [ � � � � � � Sqh ñâø h � [� � � S¶h ^ ø h ñ [ � S¶h ^ [ � � � � � � Sqh ^ ø h � [ppp ppp ppp ppp� � � S¶h � ø h ñ [ � � � S¶h � ø h ^ [�� � � � Sqh � [
������ ß* ^»hi6 (!mAgvILNPQAORX/NüKWgqSqí ø 9\[_IL?A@JQãÎSqh®[+¯�í 9ê¯�í+S:9 ñDø ßZß-ß ø 9 � [ëKWQAa
� S¶h®[+¯ �����í�9 ñ S¥À�?}9 ñ [ ?¼í�9 ñ 9 ^ � � � ?¼í�9 ñ 9 �?äí 9 ^ 9 ñ í 9 ^ S¥À�? 9 ^ [�� � � ?¼í�9 ^ 9 �ppp ppp ppp ppp?äí 9 � 9 ñ ?¼í�9 � 9 ^ � � �±í 9 � S¾Àw?o9 � [
������ ß>oO�kÕ@-@!Id?ANek #ìQAOMIL@wIL?uKMIåId?A@!XrKWVÕlRNeQuKMgìaANPkÕILVÕNefAmFILNeOWQTOM^$KWQ�è ORQA@ãH-OWXrEGORQA@JQ�I/OW^IL?n@ãbW@-HJIdORV`NPkrfnNeQAORX/NüKMg #(IL?AKMI`NPk/h ó 6 7ìNPQAORX/NüKWg¶SUí ø 9 ó [âp >&?�mnkc#ëÎSqh ó [î¯ í 9 óKWQAa � S¶h ó [&¯�í�9 ó S¥À�? 9 ó [Dp#*äOWIL@)IL?uK�IYh ó @ h � 6�7ìNeQnORXrNeKWgqSqí ø 9 ó @ 9 � [Dp¼>&?�mAkc#� Sqh ó @�h � [+¯1í(S 9 ó @ 9 � [-S¥ÀS? * 9 ó @ 9 � ,ü[Dp�(ÎQ`Id?A@YOWId?A@-V_?uKWQAa$#nmAkÕNeQAl)IL?A@ä^qORVdX:mngüK^qORVBIL?n@ãbMKWVÕNüKWQAHJ@wOW^$K¡kdmAX #_x_@ã?uKZbR@!Id?uKMI � Sqh ó @Ûh � [ê¯ � Sqh ó [T@ � Sqh ó [g@à � � � Sqh ó ø h � [;¯1í�9 ó S¾Àw?}9 ó [A@ í 9 � S¥Àw?}9 � [A@ à � � � Sqh ó ø h � [Dp * ^»èWORmú@,+�muKMIL@iIL?nNek^qORVÕX:mAgüK)x]NPIL?ãí+S:9 ó @ 9 � [JS 9 ó @ 9 � []KMQAawkdORgvbR@�#uOWQA@ilR@JIdk � � � Sqh ó ø h � [+¯n?äí 9 ó 9 � pt»NeQAKWgegvè #�?A@JVd@ÇNekoKägP@-X/X`KìId?uKMIoHZKMQ4fG@Çmnkd@J^qmng�^qORV�tuQnaANeQAläX/@ZKWQAkoKWQna�bMKWVdNeKWQAHJ@-kOW^�geNPQA@ZKWVìH-ORX:fnNeQuKMIdNeORQnkëOW^�X:mAgPIdNPbMKWVÕNüKMId@¼VdKWQAaAORX�bR@-HDILORVÕk-p�Mzr- -�,4�u�32�� m ¤ �Ú¦e§ Â Ø �Ã× « ���  ª ¨!h ¦e§  �  ª ¨ � ÅjØ �Ã× « ��� Í+¦¶« � Å �  ª Ý Â ª ¨Ø  � ¦  ªs×Ã� ) « �u�Dª ÎS ���\h®[;¯ ���£Ý  ª ¨ � S ���\h®[;¯ ��� ) �GÑwm ¤ .³¦e§  Š « � ¦ Ô « �u�DªÎS8.äh®[;¯:.YÝ Â ª ¨ � S .¼h®[;¯ . ) .��(Ñ
�����+�M����� ��c���������b��� �������������w� � ������ � ,/.656*J%+*-,/.�'4= ���! �"$#&%(')%+*-,/.� mAEAEGORkd@êId?uKMI�h KWQAa v3KWVd@rVLKWQnaAORX bMKWVdNeKWfAge@Jk-p �Ú?uKMIBNPk)Id?A@êX/@ZKWQ�OW^ëhKWX/ORQAlåIL?AORkÕ@4ILNPXr@Jk&x]?A@-Q v ¯M��{�>&?A@BKWQAk¾xì@JVÎNek]IL?AKMI¼x_@:HJORXrEnmnIL@iIL?A@�Xr@-KWQOW^Îh KMk`fG@J^qORVÕ@!fAmFI`x_@�kdmAfnkÕILNvILmnId@¡º Þ�� ~ Sq² % �A[:^qORV`ºÁÞiS¶²£[/NPQ IL?n@úaA@DtuQANPIdNeORQTOW^@DCFEG@-HJILKMILNPORQ�py+zJ{G|R} ~L} �n|:�u�32n�����u�:×Ã��ª ¨W¦¶«U¦ ��ª ÂWË �ÕÔJÖu�Ã× «  «U¦ ��ª��¥¤ h Æ ¦¶Ø �Dª v¯ �¡¦e§
ÎS¶h % v�¯ �n[+¯�·�¸ ²$º�Þ�� ~ÇSq² % �A[A´R² aANekÕH-VÕ@JIL@4HZKWkÕ@½ ²Îº Þ�� ~ Sq² % �n[A´R² H-ORQ�IdNeQ�mAORmAkYHZKWkÕ@Rß SU¿ApvÀ O [m ¤ xnSq² ø �A[$¦e§  ¤ Ä ªs× «Ó¦ ��ª��¥¤ ²  ª ¨ �¡« �u�DªÎS�xnSqh ø v:[ % v¯J�n[+¯ · ¸ xAS¶² ø �A[uº Þ�� ~ S¶² % �n[A´R² aANPkdHJVd@JId@:HZKMkd@½ xnSq² ø �A[uº�Þ�� ~;S¶² % �n[A´�² HJORQ�ILNeQ�mAOWmAkÎHZKMkd@RßSU¿ApvÀZ¿�[
�Ú?A@-VÕ@ZKWk #ðÎSqh¡[YNek$KîQ�mAX:fG@-V #�ÎS¶h % v³¯ �n[¼NPk$Kê^qmAQAHJIdNeORQ6OM^T�Gp 7ì@D^qORVd@:x_@ORfAkÕ@-VÕbW@�v>#Rxì@äaAORQ"! I;é�QAO�x÷IL?n@]bMKWgPmA@ëOW^\$Sqh % v�¯ �A[okdO)NPI+NPk;KiVdKWQAaAOWX�bMKWVdNeKWfAge@x]?ANeHÃ?�xì@/aA@-QnOWIL@å$S¶h % v4[Dp *çQ�OWIL?A@JV)x_ORVÕaAkc#ðiS¶h % v�[iNPk$IL?A@rVLKWQAanORXÉbMKWVdNeKWfAge@x]?AORkÕ@úbMKWgemA@ãNek:$Sqh % v ¯ �n[Bx]?A@JQl�Ú¯ �Gp � NPXrNPgüKWVÕgPè #»$S�xnSqh ø v�[ % v4[/NPk/Id?A@VLKWQnaAORX bWKMVdNüKMfAge@åx]?AORkÕ@êbWKMgemA@êNPk)$S�xnSqh ø v:[ % v ¯ �A[$x]?A@-Q ��¯ �£p6>&?nNek:NPk:KbR@JVÕèwH-OWQn^qmAkdNPQAl/E£ORNPQ�I&kdO/ge@DI]mAk]geO�ORéêKMIäKMQã@âCnKWXrEnge@Wp)+*�,.- /.1 zi�u� 2�2� Ä Ö�Öu� § � Í � ¨ �  Í�h 6KL¼QANv^-S � ø À�[-Ñ @ ¤ « �D� Í �ã�W¬ § �D� Ø � h ¯ ² �Í � ¨ � Â Í v % h ¯�² 6[LäQANP^-S¶² ø ÀÁ[-Ñ>m ª «UÄF¦¶«U¦¶Ø � Ë Y�� Í �`�ÕÔJÖu�Ã× «ä« �  «+$S�v % h ¯�²£[ίS¾Àg@ ²£[ HRàAÑ�m ª4¤ Â × « � ºI~� Þ$S�� % ²£[+¯ ÀIHAS¥À�? ²£[ ¤Z��� ² & � &ÛÀ  ª ¨
$S�v % h�¯ ²£[ǯ ° ñ¹ �&º ~�� Þ S�� % ²£[Õ´.�4¯ ÀÀw? ² ° ñ¹ �ì´.�B¯ Àg@ ²à § �ÕÔJÖu�Ã× « � ¨�Ñ ��� Ä�§ � $Szv % h®[+¯ S¾À$@ãh®[|HWàAÑ / � «U¦ ×Ã� « �  «GÎS�v % h®[+¯ S¾À$@ãh®[ HRà/¦e§Â �  ª ¨ � Å Ø Â � ¦  ¬ Ë � Í �u� § � Ø ÂWË Ä � ¦e§)« �u�)ª Ä�Å ¬D�D� $Szv % hɯ�²£[ǯ S¾Àg@T²£[ HRà ��ªs×Ã�h ¯ ² ¦e§ �W¬ § �D� Ø � ¨�ÑsC
� ( ������� ��������������������������
J�Áz-���üzr-�� � '����� ��Áz����.1 zi���;} ~¥z � ,Á~¥z êz *�/Wz3&D~N,Á~L} �n| $2�)( E����+�  ª ¨ � Å±Ø Â � ¦  ¬ Ë � §�h  ª ¨v �  §Ã§DÄ�ÅB¦ ª�Æ « �u�B�ÕÔJÖu�Ã× «  «U¦ ��ª § �ÕÔ ¦e§D« � Í �$� Â Ø � « �  « * ÎSzv % h®[ ,£¯÷iS�v:[ KWQAa * $Sqh % v�[ ,£¯ÚÎSqh¡[Dß Sq¿ApvÀ R[1 ���Ã�YÆ��Dªs�D� ÂWË¶Ë Y��u¤Z���  ªpY]¤ Ä ªs× «U¦ ��ª xAS¶² ø �n[iÍ �Î� Â Ø � * $S�xAS¶h ø v4[ % h®[ ,G¯ $S�xAS¶h ø v4[d[ KWQAa * $S�xnSqh ø v:[ % h®[ ,G¯ $S�xnSqh ø v:[Õ[DßSq¿ApvÀ ý [
���������_��@�! geg�EAVdO�bR@ëId?A@ìtuVÕkÕI+@,+�muKMILNPORQ�p L¼kdNPQAlYId?A@ëaA@DtuQANPIdNeORQ4OW^\H-ORQnaANPIdNeORQuKMg@DCFEG@-HDIÃKMIdNeORQãKWQAawIL?n@i^UKMHJI&IL?AKMIYº(Sq² ø �n[(¯±º»Sq²£[dº»S�� % ²£[�# * ÎSzv % h®[ , ¯ ° $Szv % h�¯�²£[ÕºÁÞÎSq²£[Õ´R²î¯ °c° �nº(S�� % ²£[¾´.�Aº(S¶²£[Õ´R²
¯ ° ° �nº(S�� % ²£[Õº(Sq²£[¾´R²G´+�4¯±°c° �Aº»Sq² ø �n[Õ´R²G´.�:¯÷iS�v:[âß}C)+*�,.-0/.1 z$�u�32W� X ��ª §D¦U¨ �D���ÕÔ Â Å Ö Ë � � Ñ " " Ñ I � Í × Â ª Í ��×Ã� Å Ö Ä�« � $Szv4[ F � ªs�Å � « �u� ¨¼¦e§Ç« �ð©;ª ¨ä« �u� � � ¦ ª «s¨ �Dª §D¦¶« Y º(S¶² ø �n[  ª ¨Y« �u�Dªr×Ã� Å Ö Ä�« � ÎSzv:[+¯ ½`½ �Aº(S¶² ø �n[Õ´R²G´.��Ñ@ ªr�  §D¦ �D� Í Â Y ¦e§Ç« � ¨ � « � ¦e§Ç¦ ª «UÍ � §D« �çÖ §-Ñ E ¦ � §D« � Í � ÂWË �Ã�  ¨ Y q ªs� Í�« �  «�$Szv % h®[+¯S¾Àg@�h®[|HRànÑ ��� Ä�§ �$Szv:[;¯÷_ÎSzv % h®[+¯ " S¥Àg@ h®[à $ ¯ S¾Àg@�ÎS¶h®[d[à ¯ S¥Àg@ÛS¾À]HRàR[d[à ¯ O HM¿Aß C
y+zJ{G|R} ~L} �n|:�A� 2�U ���u�:×Ã��ª ¨W¦¶«U¦ ��ª ÂWË Ø Â � ¦  ªs×Ã� ¦e§�¨ �q©;ªs� ¨  §� Szv % h�¯�²£[;¯ ° S�� ?�Ý;S¶²£[d[ ^ º(S�� % ²£[¾´+� Sq¿ApvÀ � [
Í �u�D�L� Ý;S¶²£[ǯ÷iS�v % hɯ ²£[-Ñ J�Áz-���üzr-�� � ' � Eo���$�  ª ¨ � Å³Ø Â � ¦  ¬ Ë � §ìh  ª ¨ v �
� S�vB[+¯Ú � S�v % h®[A@ � ÎSzv % h®[Dß
�����.� ����������� ��B�������� �� � � þ)+*�,.- /.1 zi�u� 2�< � � Â Í Â ×Ã� Ä ª « Y  « �  ª ¨ � Å ¤â�L� Å « �u� a ª ¦¶« � ¨ � «  « � §-Ñ ���u�Dª ¨ �  Íí Öu�Ã�ÃÖ Ë �  « �  ª ¨ � Å ¤â�L� Å « �u�w×Ã� Ä ª « Y Ñ 4ð� «_h ¬D� « �u�`ª Ä�Å ¬D�D�ê�¥¤ « �u� § �iÖu�Ã�ÃÖ Ë �Í �u�å� Â Ø �  ×Ã�D� «  ¦ ª ¨W¦e§ �  § � Ñ m ¤�� ¨ �Dªs� « � §�« �u�¼ÖG�L�ÃÖu��� «Ó¦ ��ª �¥¤å�¥¤&Öu�Ã�ÃÖ Ë � ¦ ª « �  «×Ã� Ä ª « Y Í+¦¶« � « �u� ¨W¦e§ �  § � « �u�Dª�� ¦e§ ÂWË § �  �  ª ¨ � Å Ø Â � ¦  ¬ Ë � §D¦ ªs×Ã� ¦¶«+Ø Â � ¦ � § ¤â�L� Å×Ã� Ä ª « Y « ��×Ã� Ä ª « Y Ñ��ë¦¶Ø �Dª�� ¯� � Í �/� Â Ø � « �  «ìh 6 7ìNPQAORX/NüKWgUSqí ø �W[-Ñ ��� Ä�§ �iS¶h % � ¯�W[¼¯�í��  ª ¨ � Sqh % � ¯��W[ä¯�í �nS¾À�?��R[JÑ � Ä Ö�Öu� § � « �  «&« �u�å�  ª ¨ � ÅØ Â � ¦  ¬ Ë ��� �  §  a ª ¦ ¤Z��� Å È�� � ! Ì ¨W¦e§D« � ¦ ¬ Ä�«Ó¦ ��ª Ñ ���u�Dªn� iS¶h®[`¯3Ç$S¶h % � [`¯iSqí � [Y¯ íGÎS � [ί�í"HRàAÑ 4ð� «äÄ�§ ×Ã� Å Ö Ä�« � « �u� Ø Â � ¦  ªs×Ã�r�¥¤ hTÑ / � Í � � S¶h®[Y¯ � Sqh % � [S@ � iS¶h % � [Dß 4ð� «�� § ×Ã� Å Ö Ä�« � « �u� § � «UÍ � « �D� Å4§-Ñ E ¦ � §D« � � Sqh % � [:¯ * í � S¾À ? � [ ,£¯1íGÎS � S¾À ? � [Õ[ǯ�í ½ �nS¾À ?��W[dº(S��R[¾´��$¯1í ½ ñ; �AS¥À ?��W[Õ´��i¯�í"H ý Ñ/ �ÕÔ « � � $Sqh % � [&¯ � Sqí � [&¯í ^ � S � [&¯í ^ ½ S��0?1S¥ÀIHRàR[Õ[ ^ ´��B¯�í ^ HFÀ�àAÑ I �Dªs×Ã�Ã�� S¶h®[+¯ SqíAH ý [A@1SUí ^ HFÀÁàR[JÑsC����� �T"Î#��6.6*Z#]':=�� 6 �"$.656*D��� "!# �$ %'&�(*),+ -/.0-/132�45.768.94;:<4�-=),>@?A.9B
>&?A@åNeQ�IL@JlRVLKMg»OW^_KîXr@-KWkdmnVLKWfAgP@4^qmAQAHDILNPORQ xAS¶²£[YNekÎan@JtuQA@Ja�KMkÎ^qORgegPOÁx]kJp t�NeVdk¾I$kdmnEFæE£OWkd@BIL?uK�I x`Nek$kdNPXrEnge@�#�X/@ZKMQANeQAlêId?uKMIiNvI$IÃKWéM@-kituQANPId@-gvè�XrKWQ�èúbMKWgemn@-k � ñDø ßZßZß ø � �O�bR@-VãK�EuKWV¾ILNvILNeOWQA. ñDø ß-ßZß ø . � p�>&?A@-Q ½ xnSq²£[¾´�µrSq²£[6¯ ¸ �ó�ô ñ � ó 9ìS�xnSqh®[ 4 . ó [âp>&?A@YNPQ�IL@-lWVLKWg\OW^ðK)E£OWkdNPIdNPbW@¼X/@ZKWkÕmAVLKMfAge@]^qmAQAHDILNeOWQFxiNekÇaA@JtAQA@-aêf�è ½ xnSq²£[¾´�µrSq²£[+¯geNPX ó ½ x ó Sq²£[Õ´�µrSq²£[ox]?n@-Vd@wx ó NPk+KÎkd@,+�mA@-QAHJ@¼OM^£kÕNeX/EAge@Ç^qmAQAHDILNeOWQAk+kdmAHÃ?åIL?AKMISx ó S¶²£[�HxAS¶²£[rKWQna x ó S¶²£[DC xnSq²£[/KWk & C ' pÛ>&?nNek`anOF@JkêQAOWI`aA@-EG@-QAa OWQ÷IL?A@!EuKWVÕIdNeHâæmAgüKMVãkÕ@�+�mA@JQAH-@Mp >&?A@�NPQ�Id@-lRVdKWg$OW^:KTX/@ZKWkÕmAVLKMfAge@6^qmAQAHDILNPORQ x�NPkwaA@JtuQn@-aÛIdO fG@½ xnSq²£[¾´�µrSq²£[`¯ ½ xFE+S¶²£[Õ´�µ`S¶²£[w? ½ x�ïoS¶²£[Õ´�µ`S¶²£[/KMkdkdmnXrNPQAl�fGOWId? NPQ�Id@-lRVdKWgek/KWVÕ@tuQANvIL@�#Ax]?n@-Vd@ x E Sq²£[;¯1XrK�C_ZIxnSq²£[ ø � `4KWQAa x ï Sq²£[;¯n?�X/NeQ_ZIxAS¶²£[ ø � `�p�� "!# HG I 2@JK)L4�-NMN)L4O),?A.0-A1P4O>RQOST4O+ -A132�4O6
( � ������� ��������������������������
y+zD{£|R} ~L} �n|4�u� 2�E ���u�o��¢G�����£���\���o���Á�n���U�����ç�o�o�W���U¢G�G�d����� (Z�ð�����;�A���U�u�R����Á�u�o � ¢\���T�&�¥¤ h ¦e§4¨ �q©;ªs� ¨ ¬cY ÞiS � [;¯ $S�> � Þ [+¯ ° > � ¹ ´�µrSq²£[
Í �u�D�L� � Ø Â � ¦ � § � Ø �D� « �u�)�Ã� ÂWË ª Ä�Å ¬D�D� §-Ñ*çQãx]?AKMI¼^qOWgegeO�x]k #Fx_@4KWkÕkdmAX/@iIL?uKMI&Id?A@)XrlW^oNPk]x_@-gPg�aA@JtAQA@-aã^qORVäKWgeg � NeQwkdXrKWgPgQA@JNelR?�fGORVd?AO�O�aîOW^ � p���Vd@JgüKMId@-aw^qmAQAHDILNeOWQãNPkëIL?A@)HÃ?uKWVdKWHJId@-VÕNekÕIdNeH$^qmAQAHDILNPORQ"#AaA@DtuQA@Jaf�è�$S�> ó � Þ [�x]?A@-VÕ@ & ¯� ?)ÀRp(>&?ANPk(^qmAQAHDILNeOWQBNek+KWgvxëKZè�k+xì@JgegAan@JtuQA@Ja/^qOWV+KWgeg � p(>&?A@X/lW^(NPkämnkd@J^qmngo^qORVäkd@DbR@JVLKWg�Vd@ZKMkdORQAkJp�t»NeVÕkÕI #\NvI¼?n@-geEnk¼mAk¼H-ORX/EAmnId@)Id?A@)XrORX/@-Q�Idk¼OW^KîaANek¾ILVÕNefAmnIdNeORQsp � @JH-ORQAa$#�NvI$?A@-gPEAkimAk$tuQAa¡IL?A@åaANPkÕIdVdNefnmnILNPORQ6OW^_kdmAX/k$OW^_VLKWQAanORXbMKWVdNeKWfAgP@-k-p�>&?ANPVda"#äNPI`Nekêmnkd@-aÚILO�EnVdO�bR@6IL?n@6H-@-Q�IdVLKWgÎgPNeX/NPIåIL?n@-ORVÕ@-X x]?nNeHÃ? xì@aANPkdH-mnkdkägüKMId@-V-p�Ú?A@-QBId?A@&XrlM^\Nek�x_@-gPgAaA@DtuQA@-a$#�NvI(H-KWQBf£@ìkd?AO�x]QåIL?uKMI»xì@&HÃ?uKMQAlR@&NeQ�Id@-VdHÃ?AKWQAlR@IL?n@)OWE£@JVLKMIdNeORQAkìOW^(aAN��s@JVd@JQ�IdNüKMIdNeORQêKWQAa �¾IÃKWé�NPQAlB@DCFEG@-HJILKMILNPORQ�p �>&?ANPk]ge@ZKMaAkëILO
�� S � [+¯�� ´´ � �> � Þ�� � ô<; ¯÷�� ´´ � > � Þ�� � ô<; ¯Ú h h+> � Þ j � ô<; ¯ $Sqh®[âß7_èrIÃKWé�NeQnl:^qORV&aA@-VÕNPbMKMIdNPbR@Jkëxì@$HJORQAHJgemAaA@$Id?uKMI �� ��� S � [+¯ÚÎSqh � [Dp+>&?ANPk&lRNPbW@-k&mAkäKX/@JIL?nOFaî^qORV&HJORXrEnmnILNPQAl:IL?A@iX/ORX/@-Q�ILk&OW^»KåaANek¾ILVdNPfAmnIdNeORQ�p)+*�,.-0/.1 z$�u�32 �l4o� «ohi6 �(CFEoS¥ÀÁ[-Ñ Eo���  ªpY � &±À �
Þ$S � [;¯Ú > � Þ ¯ °lf;
> � ¹ > ï ¹ ´R²w¯ °lf;
> � � ïuñ � ¹ ´�²î¯ ÀÀw? � ß���u� ¦ ª « �ÊÆW� ÂWË ¦e§:¨W¦¶Ø �D�ÓÆ��Dª «Ç¦ ¤ ��� À�Ñ ������ Þ$S � [ǯ ÀIHAS¾Àw? � [ ¤Z��� ÂWË¶Ë � &�À�Ñ�/ � Í � � S � [B¯ À  ª ¨ � � S � [B¯�àAÑ I �Dªs×Ã�Ã� ÎSqh¡[B¯ À  ª ¨ � Sqh®[å¯Ü$S¶h ^ [s? Ý ^ ¯à?÷ÀY¯À�Ñ�Mzr- -�,4�u�3D����_�Ã�ÃÖu�D� «U¦ � § �¥¤ « �u� Å ÆÕ¤ ÑÈ ! Ì m ¤ v¯ ��h[@ #/« �u�Dª� ~ìS � [+¯?>! � ÞiS�� � [-ÑÈ " Ì m ¤ h ñâø ßZßZß ø h ò  �L� ¦ ª ¨ �çÖu�Dª ¨ �Dª «  ª ¨ v�¯ ¸1ó h ó « �u�Dª" ~_S � [+¯$# ó ó S � [Í �u�D�L�� ó ¦e§)« �u� Å ÆÕ¤��¥¤ h ó Ñ
�����]����� ����� � � ��� � (nÀ)+*�,.- /.1 zi�u� DF� 4ð� «+h 6 7_NeQAORX/NüKMgqSUí ø 9G[-Ñ @ § ¬D�q¤Z���L� Í � q ªs� Í1« �  «+h ¯ ¸ ó h óÍ �u�D�L� 9ìS¶h ó ¯�ÀÁ[;¯ 9  ª ¨ 9ëS¶h ó ¯ � [+¯ À�?G9oÑ / � Í ó S � [;¯Ú > Þ�� ��¯�S 9 = >��U[�@SdS¥À�?}9\[d[+¯ 9 > � @��wÍ �u�D�L� �)¯ À�?}9ðÑ ��� Ä�§ �� ÞiS � [ǯ # ó ó S � [;¯�S:9 > � @ �R[ ò Ñ ��ÁzJ���üz -���� � ' 4ð� «�h  ª ¨ v ¬D�)�  ª ¨ � ÅÜØ Â � ¦  ¬ Ë � §-Ñsm ¤� ÞiS � [;¯ ~_S � [ ¤Z��� ÂWË¶Ë �¦ ª  ª��ÃÖu�Dª ¦ ª « �D� Ø ÂWËo �Ã� Ä ª ¨�� � « �u�Dª h �¯ vêÑ)+*�,.- /.1 zi�u� D�D 4ð� «�hK6M7ìNeQAOWXrNeKWgqSUí ñDø 9G[  ª ¨ihi6M7ìNPQAORX/NüKWgUSqí ^ ø 9G[ ¬D� ¦ ª ¨ �çÖu�Dª Ϩ �Dª « Ñ 4ð� « v¯Úh ñ @�h ^ Ñ0/ � Í ~_S � [;¯ ñ S � [ ^ S � [;¯ S:9 > � @ �R[ ò�� S 9 > � @��W[ ò�� ¯ S 9 > � @��W[ ò� E ò� ª ¨/Í �Î�Ã�Ã×Ã�dÆWª ¦�� � « �u� Ëv «Ó« �D�  §$« �u� Å ÆÕ¤)�¥¤  7_NeQAORX/NüKMgqSUí ñ @�í ^ ø 9G[i¨W¦e§D« � ¦ ¬ Ä�«Ó¦ ��ª Ñ� ¦ ªs×Ã� « �u� Å ÆÕ¤B×Ã�  � Â × « �D� ¦�� � §)« �u� ¨W¦e§D« � ¦ ¬ Ä�«Ó¦ ��ª È ¦UÑ � Ñ$« �u�D�L�å× Â ª � « ¬D�  ªs� « �u�D�)�  ª Ϩ � Å Ø Â � ¦  ¬ Ë � Í � ¦ ×Ã�:�  §]« �u� §  Š� Å ÆÕ¤ Ì Í �$×Ã��ªs× Ë ÄA¨ � « �  «$v 6� & í+Sqí ñ @îí ^ ø 9\[-Ñ
(!OWXr@JQ�I � @-QA@JVLKMIdNeQAl tumAQAHDILNPORQê^qORV � ORX/@��ìORX/XrORQ��ÎNPkÕIdVdNefnmnILNPORQAk�ÎNek¾ILVdNPfAmnIdNeORQ X/lW^7ì@-VÕQAORmAgPgeN;SqE\[ 9N> � @1S¾Àw?}9G[7ìNeQnORXrNeKWg»SUQ"# E\[ S:9 > � @ÛS¾Àw?}9G[Õ[ ò��ORNPkdkÕORQ S��£[ >�� � B�� ïuñ �*¼ORVÕX`KMg;SUÝ�# � [ @âCFE��GÝ � @�� � � �^��� KMXrXrKwS���# ��[ ���� ï ���� ^qOWV � & �
���"! ���!#]"$9;#¼* ��"��ÀRp � mAEnE£ORkÕ@äx_@äEAgüKZèåKil�KWX/@ëx]?A@-VÕ@äx_@äkÕIÃKMVÕIÇx]NPIL?�#ëaAOWgegüKMVdk-p�(ÎQr@-KWHÃ?`EAgeKZè:OW^IL?n@Yl�KWX/@]èRORmê@JNPId?A@-VÇaAORmAfnge@YOWV_?uKMgP^sèWORmAV_XrORQn@Jè #�x]NPId?ê@�+�muKMg£EAVÕORfuKWfnNegeNvIçèRp�Ú?uKMIäNPkëèRORmAV]@âCnEG@-HDIL@Ja!^qORV¾ILmAQA@)KM^¶Id@-V&í6ILVÕNüKWgPkP{àFp � ?AO�xcIL?AKMI � Sqh¡[㯠� NP^$KWQAa�OWQAgPèTNP^¼IL?A@JVd@!NekêK®H-ORQnkÕIÃKMQ�I$#ãkÕmAHÃ?÷IL?AKMI� Sqh�¯%#J[;¯ ÀRp
O p�u�@JI¼h ñDø ßZßZß ø h ò 6 L¼QANv^qORVdXãS � ø ÀÁ[]KWQAaúge@JIv ò ¯XrK�C_Z�h ñDø ßZßZß ø h ò `�pt»NPQAaÎSzv ò [âp
(�à ������� ��������������������������
¿Ap��3EuKWVÕIdNeHJge@úk¾IÃKWV¾ILkîK�IêIL?A@úORVdNPlRNeQTOW^iIL?A@úVd@ZKMgYgeNPQA@!KWQAa X/OÁbW@-kwKWgPORQAl®IL?A@gPNeQA@iNeQ��ÕmAX/EAk¼OM^+ORQA@�mAQANPI-p�t\OWV¼@-KWHÃ?��ÕmAX/E!Id?A@:EAVÕORfuKWfANPgeNvIçèêNek�9�Id?uKMIäIL?A@EuKMVÕILNPH-gP@¼x]NegPg��ÕmnXrEêOWQA@$mAQANvIìILO:Id?A@$ge@D^¶I&KWQnawId?A@ÎEAVÕORfuKWfnNegeNvIçè/NekYÀT? 9îId?uKMIId?A@iEuKWVÕIdNeHJge@Îx]NegPg��ÕmAX/EwORQA@)mAQANvI&ILOåIL?A@iVdNPlR?�IZp�u�@JI]h ò fG@)Id?A@iE£ORkÕNPIdNeORQîOW^Id?A@4EuKMVÕILNPH-gP@:KM^¶Id@-VYí�mAQANvILk-pt»NeQAaã$S¶h ò [äKWQAa � Sqh ò [DpåSq>&?ANekÎNPkYé�QAO�x]Q¡KWkKBVLKMQAaAORX³xìKWgPéGp [ Fp�� ^UKWNeV)H-ORNPQ Nek)ILORkÕkd@JaTmnQ�IdNeg_Kú?A@-KWa�Nek�ORfnIÃKMNeQA@Ja�po�Ú?uKMI:Nek�Id?A@ê@âCnEG@-HDIL@JaQ�mAX:fG@-V]OW^oILOWkdkd@JkäId?uKMI&x]NPgegsfG@iVd@,+�mnNeVd@Ja_{ý p �+VdO�bW@�>&?A@JORVd@JX ¿Ap ý ^qOWV&aANekÕH-Vd@DIL@iVLKMQAaAORX³bMKWVÕNüKWfnge@-kJp� p�u�@JIåh f£@!K¡H-ORQ�ILNPQ�mAORmAk/VLKWQnaAORX bMKWVÕNüKWfAgP@êx]NPId?���ö �6µ/p � mAEnE£ORkÕ@!Id?uKMI� Sqh � � [ί À`KMQAa®Id?uKMIÎ$Sqh®[Y@âCFNekÕIdk-p � ?AO�x�Id?uKMI$ÎSqh®[Y¯ ½ f; 9_S¶h �²£[¾´�²ðpyäNeQ�IZû �ìORQAkÕNeaA@JV�NPQ�Id@-lRVdKMILNPQAlãf�è¡EAKWVÕIdk-pî>&?A@/^qORgegPOÁx]NPQAlê^UKWHDI�NPk�?n@-geEF^qmAgÊû)NP^ÎSqh¡[ì@DCFNek¾ILk]IL?A@JQ!gPNeX ¹�� f ² *�Àw? µrSq²£[ ,£¯ � p
(np �+VdO�bW@�>&?A@JORVd@JX ¿ApvÀ ý pþ p�S � ��-0/���~¥z � )+*�/Wz �U} -¼zM|�~ p [[u�@JIåh ñâø h ^ ø ßZß-ß ø h ò fG@�S � ø ÀÁ[:VLKWQAanORX bMKWVdN�æKWfnge@-k�KWQAaige@DI h ò ¯�íðïuñ ¸ òó�ô ñ h ó p#�+gPOWI h ò bW@-VÕkdmAkoí4^qORV�í!¯À ø ßZßZß ø À � ø ��� � p�]@-EG@ZKMI&^qOWVëh ñÃø h ^ ø ßZßZß ø h ò 6 �ëKWmAHÃ?�èðp �»CFEAgüKMNeQîx]?�èêIL?A@JVd@iNek&kÕmAHÃ?�KBaANv^üæ^q@JVd@-QnH-@WpÀ � p�u�@JI&hK6� S � ø ÀÁ[ëKMQAa!gP@JI�v¯?> Þ p�t»NeQnaîÎS�vB[ëKWQAa � S�v:[âpÀRÀRp�S � ��-0/���~¥z �g)+*�/Wz �U} -¼zM|�~� ��\} - �.1 ,�~L}á|��å~��Áz��F~¥� &�� � , ���ÃzJ~ p [ u�@JI�v ñâø v ^ ø ßZßZßnf£@NPQAaA@-EG@-QnaA@-Q�I�VLKWQAanORXÉbMKWVdNeKWfAge@Jk$kdmAHÃ?�IL?uKMI � Szv ó ¯3ÀÁ[$¯ � Szv ó ¯ ?)ÀÁ[i¯ÀIHWàFp�u�@JIäh ò ¯ ¸ òó�ô ñ v ó p&>&?ANPQAéwOW^Sv ó ¯ À:KMk �¾Id?A@�k¾ILO�HÃéwEAVdNPH-@)NeQAHJVd@ZKMkd@-af�è6ORQn@raAOWgegüKMV <#Av ó ¯ ?�ÀrKWk��¾IL?n@/k¾ILO�HÃé6EAVdNPH-@åaA@JH-Vd@-KWkd@Ja f�è¡ORQA@åaAORgPgüKWV KWQnawh ò KWk&Id?A@$bWKMgemA@$OW^oIL?n@)k¾ILO�HÃéêORQãauKZèîí»pSUK�[st»NeQna`iS¶h ò [ëKWQAa � Sqh ò [âpSqf\[ � NeX:mAgeKMIL@&h ò KWQAaêEAgPOWI;h ò bW@-VdkÕmAk&í!^qOWV_í!¯ À ø à ø ßZßZß ø À � ø � � � p��]@-EG@ZKMIId?A@)x]?nORge@)kdNPX:mAgüK�ILNeOWQãkÕ@JbW@-VLKMg�ILNPXr@Jk-p *äOWILNPH-@iIçxìO`Id?ANeQnlRk-pwt»NPVdk¾I #\NPIc! kä@ZKWk¾èIdO��ÕkÕ@-@ úEuKMIÕIL@JVdQAk�NeQ6IL?A@rkd@,+�mn@-QAHJ@ê@JbW@-Q�Id?AORmAlR?�NPIiNPk�VdKWQAaAORXãp � @-HJORQAa"#
�����]����� ����� � � ��� � ( OèROWm�x]NegPg(tuQAa�Id?uKMI)IL?A@/^qORmAViVdmAQnk4geO�ORé�bW@-VÕè®aAN��s@JVd@JQ�I4@JbW@-Q IL?nORmAlR?®IL?A@Dèx_@-Vd@ålR@JQA@-VdKMIL@Ja6IL?A@BkdKWXr@�xëKZèRp y¼O�x aAOîIL?A@BH-KWgeHJmAgüKMIdNeORQnk¼NPQTSÓK�[¼@âCFEAgüKWNPQIL?n@)kÕ@-HJORQAaúOWfAkd@JVÕbMKMILNPORQ_{ÀÁàFp �+VdO�bR@åIL?n@/^qORVÕX:mAgeKWkÎlRNvbR@-Q®NeQ6Id?A@BIÃKWfAgP@åKMIiIL?A@åfG@-lRNPQAQANPQAlîOW^ � @-HJIdNeORQ¡¿Ap ¿^qORV4Id?A@ 7ì@JVdQAORmngegeNz# �oORNekÕkdORQ"#�LäQANP^qORVÕX # �(CFEGORQA@JQ�IdNüKWgz# � KMXrXrKúKWQna 7_@JIÃKFp
y¼@JVd@`KWVd@rkdORX/@`?ANPQ�ILk-p tuORV$IL?A@rXr@-KWQ OW^ìIL?A@ ��ORNPkdkdOWQ"#(mAkÕ@`IL?A@/^UKWHJI)IL?AKMI>��ä¯ ¸ f¹Dô<; � ¹ H�²��vp»>oOYHJORX/EAmnIL@ÇIL?A@ÇbMKWVdNeKWQAH-@ #�tAVdkÕI»H-ORX/EAmnId@ÇÎSqh Sqh ?rÀÁ[d[âptuORV:IL?n@îXr@-KWQ OW^¼Id?A@ � KMXrXrK�#+NPI4x]Negeg_?A@-gPETIdO¡X:mAgPIdNeEAgvè�KWQAa aANPb�NPaA@îf�è�ÇS��F@�ÀÁ[|H� E£ñ KWQAaîmnkd@$IL?A@Î^UKMHJIëId?uKMI]K � KWXrXrK�an@-QAkÕNPIçèêNPQ�IL@-lWVLKMId@-këILOîÀWptuORVëIL?n@ 7_@JIÃK #uX:mAgPIdNeEAgvèrKWQAawaANPb�NPaA@$f�è��_S�� @�ÀÁ[��ÇS���[|H��ÇS�� @�� @�ÀÁ[âp
À O p � mAEnE£ORkÕ@:xì@BlW@-QA@JVLKMId@BK`VLKMQAaAORXcbMKWVdNeKWfAge@��NPQúIL?A@�^qORgegPOÁx]NPQAlåxëKZèRp0t»NPVdk¾Ix_@�uNPE�K6^UKWNPVBH-ORNPQ�p * ^YIL?A@wH-OWNeQTNekå?A@ZKMaAkc#ìILKWéW@�� ILO®?uKZbR@!K LäQANP^ÕS � #PÀ�[aANPkÕILVÕNefAmFILNeOWQ�p�* ^£Id?A@ëH-OWNeQBNek�IÃKWNPgekc#�IÃKWéM@��÷ILO$?AK�bW@äK L¼QANv^ÕS O # ¿�[oanNekÕIdVdNPfAmnILNPORQ�pSÓKR[st»NeQAaîId?A@iXr@-KWQwOW^ �4pSUfu[wt�NeQAaêId?A@ikÕIÃKMQAauKWVÕa!aA@Db�NüKMIdNeORQwOW^��:pÀZ¿Apwu�@JIäh ñâø ßZßZß ø h��ÚKWQAaov ñÃø ßZßZß ø v ò f£@)VLKWQnaAORXÜbMKWVdNeKWfAgP@-k]KWQAaãge@DI � ñDø ßZßZß ø ���KWQAa�# ñDø ßZßZß ø # ò fG@)HJORQAk¾IÃKWQ�ILkJp � ?AO�xÛIL?AKMI� � � � �� ó�ô ñ � ó h ó ø ò�
�Õô ñ # � v � � ¯ �� ó�ô ñ ò��dô ñ � ó # �
� � � Sqh ó ø v � [âßÀ Fpwu�@JI ºÁÞ�� ~ÇS¶² ø �n[+¯�� ñR Sq²G@ �A[ � H÷²6HÀ ø � H � H�à
� OMIL?A@JVÕx]NekÕ@Wßt»NPQAa � SÓà�h ? O v5@&(R[Dp
À ý pwu�@JIäV�SüCA[&fG@iKB^qmAQAHJIdNeORQwOW^ðCãKWQnaãgP@JI]kZSqèn[]fG@)K:^qmAQAHJIdNeORQwOW^oèRp � ?nOÁx1IL?uK�IÎS�xAS¶h®[���Szv4[ % h®[+¯JxnSqh®[ ÎS���Szv4[ % h¡[Dß
�YgPkdO<#nkd?AO�x±Id?uKMIë$S�xnSqh®[ % h®[(¯5xAS¶h®[Dp
(W¿ ������� ��������������������������
À � p �+VdO�bW@)Id?uKMI� S�vB[+¯Ú � Szv % h®[A@ � $Szv % h®[âß
yäNeQ�IZû u�@JI�� ¯ $S�v:[(KWQna/gP@JI�#�S¶²£[_¯÷iS�v % h�¯�²£[âp *¼OMIL@ëIL?AKMI(ÎS #�Sqh®[Õ[;¯;ÎSzv % h®[Y¯�ÎSzv4[ί��6ß"7ì@ZKWViNPQ6XrNPQAa�IL?uK�I�#4NekiKê^qmAQAHDILNPORQ¡OW^_²ðp *äO�xx]VÕNPIL@ � S�v:[;¯ $Szv ?��ã[k^ë¯ÚÎSÕSzv ? #MS¶h®[d[p@�S #�Sqh®["?��ã[Õ[N^�ß �(CFEuKMQAaîIL?A@k +�mAKWVd@ìKWQAa4ILKWéW@ÇIL?A@_@DCFE£@JHJILKMILNPORQ�p ,ÇORm4IL?n@-Q:?uKZbW@_ILOäIÃKWéM@ìIL?n@_@âCnEG@-HDIÃKMIdNeORQOW^;Id?AVd@J@:IL@-VÕXrkJp *çQ¡@-KWHÃ?¡HZKWkÕ@�#�mAkÕ@åId?A@BVdmnge@:OW^+Id?A@BNPId@-VdKMIL@Ja6@DCFE£@JHJILKMILNPORQ�ûNÓp @Mp»$SqkÕIdm �Ç[;¯ÚÎS¶ÎSUk¾ILm � % h®[d[âßÀ�(np � ?nOÁxIL?AKMI$NP^+ÎSqh % v�¯ �n[]¯ #)^qORVÎkÕORX/@BH-ORQAk¾IÃKWQ�I�#�Id?A@-Q6h KWQna v KMVd@mAQnH-ORVÕVd@-geKMIL@Ja�pÀ þ p]>&?ANPk#+�mA@-k¾ILNPORQ¡NekYIdOî?A@-gPE�èRORm¡mAQAan@-Vdk¾IÃKWQna®Id?A@BNeaA@-KêOW^_K -�u����� �Ó����� �U ��������»�����q¢G� � u�@DI$h ñÃø ßZßZß ø h ò fG@/õÓõ¶ö x]NvIL?6X/@ZKWQ¡Ý KWQAa�bMKWVÕNüKWQnH-@ � ^ p u�@JIh ò ¯íðïuñ ¸1òó�ôñ h ó p¼>&?A@-Q h ò NPkÎK J�Z�n���U J���U� #AIL?uKMIYNek #�Kr^qmAQnHJILNPORQúOM^+IL?A@auK�IÃKnp � NeQAHJ@ h ò Nek&K4VdKWQAaAOWX bMKWVÕNüKWfAgP@�#�NPIì?uKWk&K:anNekÕIdVdNPfAmnILNPORQ�po>&?ANekìaANek¾ILVdN�æfAmFILNeOWQ�Nek¼HZKWgPge@-a!IL?A@`§ Â Å Ö Ë ¦ ª�Æ ¨W¦e§D« � ¦ ¬ Ä�«U¦ ��ªT�¥¤ « �u� §D«  «U¦e§D«Ó¦ × Ñ �ä@JHZKWgPg�^qVdOWX>&?A@JORVd@JXc¿ApPÀ ý IL?AKMIäÎS h ò [ì¯ Ý�KWQAa � S h ò [_¯ � ^ HMí»p �ÎORQ$! I¼H-ORQF^qmAkd@)IL?A@aANPkÕIdVdNefnmnILNPORQ:OW^\IL?n@&auKMIÃK)º�Þ KWQAaBIL?n@&aANek¾ILVdNPfAmnIdNeORQ:OM^£Id?A@&kÕILKMILNPkÕILNPH]º Þ�� p�>�OXrKWéW@äIL?ANPkìHJge@ZKMVc#�ge@JI_h ñâø ßZßZß ø h ò 6ML¼QnNP^qORVÕXúS � ø ÀÁ[âp�u�@JI]ºÁÞ f£@YId?A@ÎaA@JQAkdNvIçèOW^nIL?A@�LäQANP^qOWVdXãS � ø ÀÁ[Dp��+geOWI(ºÁÞYp *äOÁxTgP@JI h ò ¯1íðïuñ ¸ òóÐô ñ h ó p�t»NPQAa�ÎS h ò [KWQna � S h ò [Dp �+geOMI»IL?n@-X KMk+KY^qmAQAHDILNeOWQ:OW^Gí»p �ìORXrX/@-Q�I-p *¼O�x kdNPX:mAgüK�IL@;IL?A@aANPkÕIdVdNefnmnILNPORQiOW^ h ò ^qORVoí!¯À ø ø à ø À � � p �ì?A@JHÃé)Id?uKMIoIL?A@_kdNPX:mAgüK�IL@-aibMKWgemn@-kOW^ÇiS h ò [)KWQna � S h ò [)KMlRVd@J@rx]NPId?�èWORmAV�Id?A@-OWVd@JIdNeH-KWg;HZKWgPH-mAgeKMILNPORQAk-p �Ú?uKMIaAOBèWORm!QnOWILNPH-@)KWfGORmnI&IL?n@)kdKWX/EAgeNPQAl:aANPkÕILVÕNefAmFILNeOWQîOW^ h ò KWk]í¡NeQAHJVd@ZKMkd@-k {à � p �+VdO�bW@ u�@-X/XrK:¿Apáà � p
� � ��� ����
� � ����� �������������
����� !#"%$'&)( !#* + ,�-/.1032546-/.7( 89*/.;:�</!#=?>A@B>C.1D
EGFIHCJLKNMPORQRSTQUHCVWMPXTHYKIVTH[Z\KIO6Z\]^X`_a]^KbFIcIQUFbd/JLKNMPFeSfQgSfQUHAVWSfhIMiS#jkQUd^heS`]PSfhbHCXmlnQRVTH�_oHhNMPXmcpST]'qC]^jkrIKbSTHPs�tuhIH[v3lnQUOROwMiOUVT]W_aH1KIVmHCcpQUF3SThIH;SfhbHC]^XxvY]PZyqA]^Fez^HCXmd^HCFbqCH{lnhIQUq|hQUVucIQRVTqAKIVTVmHCc/QRF3SfhIH{FIH~}�S�q|hNMPr�SfHCXAs��1KbXn�IXTVmS�QRFIHCJLKNMPORQRSGv�QUV���MPXm�P]�zo��V�QUFbHCJLKNMPORQRSGv^s
�����A�e���i���������A�'�i���|�i�L�� {¡£¢��C¤I¥¦�e§�¨ ©«ªL��¬®y¯[°)± ²[¯�³µ´w¶¦´N·G´w¯�¸e³P°�¹»ºP¯®¼m³P´o½¾¶¦¿º¦³P¼~¹�³e²CÀU¯)³P´o½kÁ[Â�ÃeÃN¶�Á�¯{°\Äb³P°�Å1Æ\±®ÇW¯mÈ^¹UÁ[°\ÁCÉ�Êy¶¦¼{³P´NË{ÌÎÍÐÏLÑ
Ò Æ\±ÓÍÔÌmÇÎÕ Å;Æ\±ÖÇÌ × Æ�Ø sRÙ Ç
Ú;ÛwÜ�ÜÞÝwß Å;Æ\±®Ç�àâá�ãäæåaç Æ å Çxè å àâáÞéäBåaç Æ å Çxè åaê á%ãé åaç Æ å Çxè å ë á�ãé åaç Æ å Çmè å/ëÌ á ãé ç Æ å Çxè å àìÌ Ò Æ»±�ÍíÌmÇ s�î
ï Ø
� ���������� �������������������������!
�n���C�e���¦�����#"Ô�%$Þ���'&Aª� ~���A�L� 7¨ ¢��A¤I¥¦�L§ ¨ ©�ªL��¬ 9¯[°�( àÅ{Æ»±®Ç ³P´o½�)+*1à-, Æ\±ÖÇCÉ. ÄN¯[´bÑ
Ò Æ%/ ±102(3/ ë ÌmÇuÕ)4*Ì * MPFIc Ò Æ5/76�/ ë98 ÇÎÕ
Ù8 *
Æ«Ø s;: Ç< ÄN¯[¼f¯=6 à Æ\±>0?(�ÇA@B)%ÉDC[´ Ãb³P¼~°�¹FE[Â�Àg³P¼fÑ Ò Æ%/G6�/�Í : Ç`Õ Ù @'H�³P´o½ Ò Æ%/G6�/�ÍI ÇÎÕ Ù @BJNÉ
Ú;ÛwÜ�ÜÞÝaß�K H{KIVmH#��MiXT�P]�zo� V�QRFIHCJLKNMPORQRSGv�Sf] qC]^FbqCOUKbcIH1SfhNMiS
Ò Æ%/ ±L0M(N/ ë ÌmÇ%à Ò Æ%/ ±L0M(N/ * ë Ì * ÇÎÕÅ;Æ\±10M(9ÇO*
Ì * à )4*Ì * ×
tuhIH{VTHAqC]^Fbc�rNMPXxSuZ\]^OUOR]�lnVu_Lv3VTHASmSfQRFId ÌBà 8 ) s6îP�QP�e�SRe§ �UTWVYX9Z Â�ÃeÃN¶�Á�¯ < ¯`° ¯|Á[°Î³WÃo¼f¯T½P¹FE[°�¹�¶¦´ ¿Y¯[°\ÄN¶C½¦Ñγ3´w¯[¾¼T³PÀ9´w¯[°\[?¶¦¼ ¯mÈL³P¿7ÃoÀU¯|Ѷ¦´ ³#Á�¯[° ¶�[N]�´w¯ < ° ¯|Á[°!ET³¦Á�¯|ÁCÉu9¯[°w±_^wà Ù ¹ [�°»ÄN¯ Ão¼|¯T½P¹FE[°G¶¦¼7¹UÁ < ¼f¶¦´¾¸ ³P´o½;±_^wà Ϲ [)°\ÄN¯nÃo¼f¯T½P¹FE[°G¶¦¼`¹UÁ`¼~¹g¸¦Ä¾° É . ÄN¯[´ ±a` àb]�cWdfe `^hg d ±_^n¹UÁ`°\ÄN¯ ¶P²[Á�¯[¼~ºP¯T½5¯[¼~¼f¶¦¼W¼m³P° ¯CÉi ³jE|Ä�±_^;¿k³PË ²[¯k¼f¯�¸e³P¼m½¾¯T½�³¦ÁY³lk;¯[¼~´w¶¦Â¾À»À£¹ < ¹»°»Ä®Â�´nm¦´w¶ < ´�¿Y¯T³P´poyÉ?q�¯ < ¶¦Â�Àg½À£¹#me¯'°G¶rm¦´w¶ < °\ÄN¯#°«¼~ÂN¯|Ñ1²C¾°nÂ�´nm¦´w¶ < ´�¯[¼~¼f¶¦¼'¼T³P°G¯soyÉ_C~´N°«Â¾¹»°«¹»ºP¯[À£Ë�Ñ < ¯Y¯mÈAÃN¯5E[°u°»Äb³P°±t`pÁ|ÄN¶¦Â�Àg½�²[¯uE[ÀU¶�Á�¯)°G¶�oyÉ�v`¶ < À£¹#me¯[À£Ëk¹UÁ ±t`/°G¶k´w¶¦°u²[¯ < ¹»°»Ä¾¹»´lw�¶�[xozy{q/¯1Äb³PºP¯°»Äb³P°�, Æ ±t`^Ç�à|, Æ\± d Ç}@B]4*6à�oyÆ Ù 0tooÇA@B]µ³P´o½
Ò Æ5/ ±10~o�/¾Í�w|ÇÎÕ ,kÆ ±t`eÇw * à oyÆ Ù 0tooÇ
]+w * Õ ÙH�]4w *
Á[¹»´4E|¯xoyÆ Ù 0~ooÇÎÕ d� [?¶¦¼`³PÀ»À�oyÉ�Êy¶¦¼zw6à × : ³P´o½S]�à Ù Ï^Ï °»ÄN¯W²[¶¦Â¾´o½�¹UÁWÉ����'�j�eÉ î
���%� � &�.�� + >?*t�t�mD 89*/.1:�</!W=?>A@�2��]¾H��acIQUFbdI��VnQRFIHCJLKNMPORQRSGv�QRVnVTQUjkQUOUMPX6QRF VmrIQUXmQRSuST]3��MiXT�P]�zo� V7QUFbHCJLKNMPORQRSGv�_IK�S7QgSnQUV
M{VThNMPXmraHAX6QRFIHCJLKNMPORQRSGvPs�K�H�rIXmHCVmHCFeS6SThIH7XmHCVTKbORS�hIHCXmH�QRFkSGl�]#rNMPXxSfVAsBtuhIH7rbXT]¾]PZ\V6MiXTHQUF�SThIH1SfHCq|hbFIQUqCMPO9MPrIroHCFIcbQg}as
�����n�����z������������� r������_�������� ï�������A�e���i��������9¯[°�� d�� ×C×?× � �W`�²[¯{¹»´o½¾¯GÃN¯[´o½¾¯[´N°Î¶P²[Á�¯[¼~º¦³P°«¹�¶¦´bÁ�Á[ E|Ä5°»Äb³P°Å;Æ��W^\Ç�à Ï ³P´o½���^yÕ��W^9Õ��5^GÉ�9¯[° wnÍÐÏNÉ . ÄN¯[´bÑf[?¶¦¼W³P´NË{ÌÎÍÔϾÑ
Ò � `�^hg d
�W^ ë w���Õ� c é�! `" ^hg d é$#&%('*) c,+ ).-$#0/21 × Æ�Ø s I Ç
�����A�e���i�����R�9¯[°y± d�� ×?×?× � ±�`4365 HCXmFI]^KIOROUQ ÆhooÇAÉ . ÄN¯[´bÑW[?¶¦¼)³P´NË=w�ÍÐÏLÑÒ87 / ±a`U0to�/LÍ|w�9`Õ : c * ` ! # Æ�Ø s HLÇ
< ÄN¯[¼f¯ ±t`)à ] cWd e `^hg d ±�^GÉ
P�QP�e�=Re§ � TWV�:µ9¯[°6± d�� ×?×C× � ±_`;3<5 HAXTFI]PKIOUORQ Æ ooÇCÉ®9¯[° ]µà Ù Ï^Ï ³P´o½ w'à × : É q/¯ÁC³ < °»Äb³P°>=�ÄN¯C²CË�Á|ÄN¯[º@? Á1˦¹�¯[Àg½¾¯T½
Ò Æ%/ ±t`U0~o /�Í�w|ÇÎÕ × Ï � : Ø ×A E5E|¶¦¼m½P¹»´¾¸�°G¶uv)¶�¯CB�½P¹»´¾¸D? Á{¹»´w¯�ECÂI³PÀ£¹»°�Ë�ÑÒ Æ%/ ±L0to�/LÍ × : Ç6Õ : c * d äGä %$F * -.# à × Ï^Ï^Ï ���
< ľ¹FE|Äp¹UÁ)¿'ÂWE|Ä3Á[¿k³PÀ»ÀU¯[¼1°\Äb³P´®É����'�j�eÉ î��]¾H��acIQUFIdb��V�QUFIHAJ¾KIMPOUQgSGv�d^QRzPHCV�KIV;MkVTQUjkrIORH{l6M?v ST]YqAXTH?M¦SfH#M�GIHKJML�NDO,JDGIOQPRJTSU OWVYX[Z]\�Z\]^X6M`_IQRFI]^jkQ�MPONrNMiXfMPjkHASTHCX o s�K H�lnQUOROocbQUVTqAKIVTV6qC]^Fb�IcIHCFIqAH1QUFeSfHAXmziMPORV�OUMiSfHAX
_IKbS�hIHAXTH{QUVÎSThIH{_NMPVTQRq{QUcIHCMbs�^ÞQg}`_ ÍµÏ MPFIcpOUH[Sw}``àba Ù
: ] OR]^d�c :_ed�f d / * ×5 vl��]�H �wcIQRFIdI� VuQUFIHAJLKNMPOUQgSGvIgÒ 7 / ±t`�0~o�/¾Í?w}` 9 Õ : c * ` ! #h à _ ×
ï^ï ���������� �������������������������!
��HAS�� à Æ ±t` 0�w � ±t` ê w|Ç sntuhIHAFTgaÒ Æ � @� ooÇÎà Ò Æ5/ ±a` 0 o /NÍ wfÇ�Õ _ s���HCFIqAH gÒ Æho � � Ç ë Ù 0 _ goSThNMiS7QUV g SfhbH#XfMPFbcI]^j QUFeSfHAXmziMPO�� STXfMPrbV�SfhbHWSfXmKIHWrNMiXfMPjkHASTHCXziMPOUKbH o lnQRSfh rIXT]P_NMP_IQROUQRSGv�Ù 0 _��ol�HWqCMPOUO�� M Ù 0 _íqA]^Fb�NcIHAFIqCH'QRFeSfHCXxziMPO�sn�5]^XTH]^F�SfhbQUVuO�M¦SfHCXAs
���� ,�!W< 46-32� �p46-�� !#"�� !W* + ��.1* DÞ.1* 89*/.1:�</!#=?>[@�>C.1DtuhIQUV1VmHCq[SfQU]PF®qA]^FeS|MPQRFIV;SGl�] QRFIHCJLKNMPORQRSTQUHCV�]^F H[}�roHCq[SfHCc®ziMPOUKIHAV;SfhNM¦S)MPXTH']PZ»STHCF
KIVmHAZ\KIO«s�n���C�e���¦�������í�%$��e¥��i��ª������i�����i���W¨ ¢��C¤I¥¦�e§�¨ ©«ªL��¬ C [B± ³P´o½>�æÄb³PºP¯ �%´N¹»° ¯�º¦³P¼~¹»·³P´4E|¯|Á{°»ÄN¯[´
Å2/ ±8� /¾Õ � Å1Æ\± * ÇGÅ;Æ�� * Ç × Æ«Ø s ØPÇ
!�HAq?MPOROwSfhIMiS�M Z\KbFIqASTQU]^F#"3QRV�GIH JKX O%$�QRZ9Z\]^XuHCMPq|h å �'& MPFIc5H?Miq|h _ �)( Ï � Ù+*�g" Æ _ åWê Æ Ù 0 _ Ç & Ç�Õ _ " Æ å Ç ê Æ Ù 0 _ Ç " Æ & Ç ×
E Z,"'QRV SGlnQRqCH�cIQ �aHCXmHCFeSfQUMP_IORH geSfhIHAF3qC]PFLzPH[}�QRSGv�XTHAcIKIqAHCV6ST]Wq|hIHAq|�¾QRFIdWSThNMiS�".- - Æ å Ç ë ÏZ\]^XnMPORO å s%E S�q?MPF5_oH)VmhI]�lnF SThNMiSnQRZ�"�QUVnqC]^FezPH[}pSfhbHCF�QgSuOUQRHCVnMP_o]�z^H)MPFev�ORQUFIH;SThNMiSSf]PKIq|hIHCV/"ÖMiS)Vm]^j�H'ro]^QRFLS g9q?MPOROUHCc®M�S|MPFbd^HCFeS`OUQUFbHPs10 Z\KIFIqASTQU]^F2" QRV4GIH J GIZ�X O QRZ0 " QUV1qA]^Fez^H[}as43Þ}bMPjkrIOUHAV)]iZÎqC]PFLzPH[}®Z\KIFIq[SfQR]^FIV)MPXTH1" Æ å Ç1à å * MPFbc5" Æ å Ç;à +6 s3Þ}bMPj�rbOUHCVu]iZÞqA]^FIq?M?zPH`Z\KIFIq[SfQR]^FIVnMPXmH/" Æ å Ç�à-0 å * MPFIc#" Æ å Ç%à OR]^d å s
�n���C�e���¦�����87Ô�:9��i¢� x�i¢L� {¡ ¢��A¤I¥¦�L§ ¨ ©�ªL�»¬ C [ " ¹UÁuE|¶¦´NºP¯mÈk°\ÄN¯[´Å " Æ\±ÖÇ ë " Æ�Å ±®Ç × Æ«Ø s � Ç
C [ " ¹UÁ E|¶¦´4ET³PºP¯{°»ÄN¯[´Å " Æ\±ÖÇ6Õ " Æ�Å ±®Ç × Æ«Ø s � Ç
Ú;ÛwÜ�ÜÞÝaß;��H[S;< Æ å Ç à � ê � å _aH1M'ORQUFIH g�S|MiFId^HCFeSuST]=" Æ å Ç MiSÎSThIH{ra]PQUFeS Å1Æ\±®Ç s
�����j� ��������� � ���r��!����� ������� �� �4��� � � ��s������������`� ������_�������� �ï J� QUFbqCH "3QRVuqC]^Fez^H~}KgaQgSuOUQRHCVnMP_o]�z^H1SfhIH{ORQUFIH < Æ å Ç s � ][g
Å " Æ»±®Ç ë Å < Æ\±®Ç%àìÅ;Æ�� ê �f±®Ç%à � ê �TÅ{Æ»±®Ç�à < Æ»Å1Æ\±ÖÇTÇ�à " Æ�Å�±ÖÇ × î^ Xm]^j �PHCFIVmHCF�� VÖQUFIHAJLKNMPOUQgSGvÐl6H�VmHCH�SThNMiS Å ± * ë Æ»Å�±®ÇO* MPFIc Å;Æ Ù @�±®Ç ëÙ @�Å;Æ\±ÖÇ s � QRFIqCH'OR]^dpQUV;qA]^FIqCM�zPH g Å{Æ OU]^d ±ÖÇ)Õ OR]^d Å;Æ\±®Ç s ^N]^X;H[}bMPjkrIORH g�VTKbrIra]PVTHSfhNM¦S ± 3� Æ I � Ù Ç s�tuhIHAF Å;Æ Ù @�±®Ç ë Ù @ I s
���� �í.;46- * >?4n!#=� ���� .1* + >�������"�&�&���&��a��&�.��+ >?*t�t�mD89*/.1:�</!W=?>A@�2
K H;lnQROUOwj�MP�iH;KIVTH{]PZ9SThIH1H[}bMPq[SuZ\]^XTj�]PZytÞM?v¾OU]^XA��VÎSThIHC]PXTHCj��BQgZ�"kQRVnM#VTjk]�]PSThZ\KIFIq[SfQU]PFTgISfhIHAF5SThIHCXmH`QRVnMkFLKIjW_aHAX�� � Æ�Ï ��� Ç VTKbq|h�SfhNM¦S " Æ � Ç�à " Æ«ÏeÇ ê � " -�Æ«ÏeÇ ê #* ""! ! Æ � Ç s
Ú;ÛwÜ�ÜÞÝ)]iZItuhIHA]^XTHAj Ø s H s�^N]^X9MPFev ÌÎÍµÏ g¦l�H6hIM�zPH g¦Z\XT]^j��MiXT�P]�zo� VÞQUFbHCJLKNMPORQRSGvIgSfhNM¦S
Ò � `�^hg d
� ^ ë w � à Ò � Ì `�^hg d
�W^ ë ÌOw � à Ò$# é�% h)'&)(+* ) ë é�!�,Õ c é�! Å # é-% h).&/(+* )0, à� c é�! " ^ Å;Æ� é * ) Ç × Æ«Ø s ï Ç
� QUFbqCH ��^1Õ � ^7Õ � ^ g�l6H3qCMPF�lnXTQgSfH � ^ MPV#M qC]^Fez^H~}�qA]^j#_IQRFNMiSfQR]^F®]PZ ��^ MPFIc �5^ gFNMPjkHCOgvIg � ^wà _ � ^ ê Æ Ù 0 _ Ç&��^ lnhbHCXTH _ à�Æ��W^50���^»ÇA@IÆR� ^50��^�Ç s � ] g^_ev)SThIH6qA]^Fez^H~}bQgSGv]PZ é21 l6H{hNM?z^H é * ) Õ �W^ 0 ��^� ^ 0 ��^ é�'*) ê � ^\0 � ^� ^\0 ��^ é + ) ×tÞMi�PH)H[}�raHAqASfMiSfQR]^FIVn]PZ�_a]PSThpVTQUcbHCV�MPFIcpKIVTH{SfhbH{Z�MiqASuSfhIMiS43 Æ��W^�Ç à Ï ST] d^HAS
3 é * ) Õ 0 ��^� ^\0 ��^ é�'*) ê � ^� ^ 0 ��^ é + ) à� 65 % - Æ«Ø s JeÇ
lnhIHCXmH � à ÌAÆ�� ^\0 ��^�Ç g�" Æ � Ç�à 0�7 � ê OR]^d Æ Ù 0$7 ê 7K Ç MPFIc 75à-0 ��^ @IÆR� ^ 0 ��^»Ç s
J^Ï ���������� �������������������������!
��]PSTH)SThNMiS;" Æ«ÏeÇ%à "%- Æ«ÏeÇ àâÏ s�07OUVm][g " ! ! Æ � ÇÎÕ Ù @BH Z\]^X�MPOUO � ÍìÏ s 5 vpt�M?v¾OU]^XA��VSfhbHC]^XmHCj gbSfhIHAXTH{QUV�M�� � Æ�Ï ��� Ç VTKbq|h SfhIMiS
" Æ � Ç à " Æ«ÏeÇ ê � " ! Æ�ÏeÇ ê � *: " ! ! Æ � Çà � *: " ! ! Æ � Ç6Õ � *ï à Ì * Æ�� ^\0 ��^»Ç *
ï ×��HAFIqCH g
Å é * ) Õ� 65 % - Õ� é$#�%('*) c,+ ).-$#0/21 ×tuhIH{XTHAVTKIOgSnZ\]^OUOR]�lnV�Z\XT]^j Æ�Ø s ï Ç s î
Ú;ÛwÜ�ÜÞÝÖ]PZÎtuhIHA]^XTHAj Ø s Ø s4�9H[S �W^6à Æ Ù @']�ÇCÆ»±_^�0 ooÇ s�tuhIHCF Å;ÆR� ^»Ç)à Ï MPFIc�µÕ<� ^`Õ<� lnhIHAXTH � à 0!o @B] MPFIc �pà Æ Ù 0MooÇA@B] s 07ORVT][g ÆR��0 �¾Ç * à Ù @B] * s07rbrIORv¾QUFbd#SfhIH{O�MiVmSntuhIHA]^XTHAj l6H)d^HAS
Ò Æ ±t`U0to5Í|wfÇ à Òc �^� ^yÍ�w d Õ� c é�! é # /&%(1 ` - ×
tuhIH Mi_a]�z^H�hI]^OUcbV�Z\]^XYMPFev Ì ÍÓÏ s EGFÐrNMPXxSfQUqAKIO�MiX g�SfMP�PH Ì à H�]+w MPFbcÔl6H dPHASÒ Æ ±a` 0 o�Í�w|ÇÎÕ� Bc * ` ! # s 5 v'M;VmQUjkQUOUMPXyMPXTd^Kbj�HAFLSÞl6Hnq?MiFkVmhI]�l SThNMiS�Ò Æ ±t` 0 o��0�w|ÇÎÕ� c * ` ! # s���KbSmSfQRFId'SfhIHAVTH{Sf]^d^H[SfhIHAXul6H)d^HASuÒ 7 / ±t`�0~o�/¾Í?w 9`Õ : c * ` ! # s�î���~� � >?0 =?>C&_�k"�!�� - >?4 � .�� !#"%$�D07F{H[}�qAHCOUORHCFeSyXTH[Z\HCXTHAFIqCH%]^F{rIXm]^_NMP_bQUOUQgSGv�QUFIHAJ¾KIMPOUQgSfQUHAV�MPFbc{SThIHCQRX�KIVTH%QRF1VmS|M¦SfQUVxSfQRqCV
MPFIc rNMiSmSfHCXmF�XmHCqC]Pd^FIQRSTQU]^FpQRV;H[z¾XT]�v^H g��;v� ]PXm� MiFIc ��KId^]PVTQ Æ Ù J�J � Ç s�tuhIHWrbXT]¾]PZB]PZ��]¾H��acIQRFIdI� VuQUFIHAJ¾KIMPOUQgSGvYQRVnZ\Xm]^j�SThNMiSuSTH[}¾S?s����� ��� 4n.1"%4�>?DÞ.1DÙ^s ��HAS ± 3 3Þ}�ro]^FIHCFeSTQ�MPO Æ���Ç s�^ÞQUFbc`Ò Æ%/ ± 0 (��=/ ë98 )��uÇ Z\]^X 8 Í Ù^s��6]Pj�rNMiXTHSThIQUVÎST] SThIH{_a]PKIFIcpv^]^K5d^HASnZ\Xm]^j��6hIHA_ev�VmhIHAzo� V1QUFIHAJLKNMPOUQgSGv^s
:�s ��HAS ± 3 �y]^QUVmVT]^F Æ��aÇ s���VTH��6hbHC_ev¾VThIH[za� V1QUFbHCJLKNMPORQRSGvYST]YVmhI]�lâSfhNM¦S�Ò Æ»± ë: �oÇ6Õ Ù @�� sI s ��HAS ± d�� ×?×?× � ±_`�365 HCXmFI]^KIOROUQ Æ ooÇ MPFbc ±t`)à ]�cWdWe `^ g d ±_^ s 5 ]^KbFIc#Ò Æ%/ ±a`�0o�/bÍ wfÇ KIVmQUFId �6hIHC_ev¾VThIH[zo��V)QUFIHAJLKNMPOUQgSGvpMPFIc�KIVTQRFIdr��]�H �wcIQRFIdI� V7QRFIHCJLKNMPORQRSGvPs
��������� �=���� U� �O n�! J Ù� hI]�l SThNMiS gylnhIHAF ] QUV{OUMPXTd^H g�SfhIHk_o]^KIFIc®Z\XT]^j ��]¾H��acIQUFIdb��V{QUFbHCJLKNMPORQRSGv/QRVVTj�MPOROUHCX�SfhNMPF�SThIH{_a]^KbFIc5Z\Xm]^j��6hbHC_ev¾VThIH[za� V1QUFbHCJLKNMPORQRSGv^s
H s ��HAS ± d�� ×?×C× � ±_` 365 HAXTFI]PKIOUORQ Æ ooÇ sÆ M Ç �9H[S _ ÍÐÏ _aH1�b}�HCc�MPFIcpcIH[�NFIH
w}``à Ù: ] OR]^d c :_edW×
��HAS��of``à9] cWd e `^hg d ±_^ s ;HA�NFIH�� `)à�Æ �of`U02w}` � �of` ê w}`^Ç s ��VTH!��]�H �wcIQRFIdI� VQUFbHCJLKNMPORQRSGvYST] VThI]�l SThNMiSÒ Æ � ` qC]PFLSfMPQUFbV ooÇ ë Ù 0 _ ×
K�HWq?MiOUO � ` M Ù 0 _ E|¶¦´���½¾¯[´4E|¯)¹»´N° ¯[¼~º¦³PÀ�[?¶¦¼�oyÉ EGF�rbXfMPq[SfQUqAH g l6H)SfXmKIFIq?M¦SfHSfhbH)QRFeSfHCXxziMPO�VT]kQRSncb]�HAV�Fb]PSnd^] _aHAOU]�l Ï ]^XnMi_a]�z^H Ù^sÆ _ Ç)Æ $����=Re¥�© �¦� P�QjRP�i��¨ ���i¢�© V�Ç ��HAS?� V�H[}bMPjkQUFIH1SfhbH`rIXm]^roHCXmSTQUHAV�]iZÞSfhbQUVnqC]^F����cIHAFIqCH)QUFeSfHAXmziMPO«s �9H[S _ à Ï × Ï^Ø MiFIc o3à Ï × H s �6]^FIcbKIqAS7MkVmQUj#KbO�MiSTQU]^F3VxSfKIc�vSf]�VTHAHWhI]�l ]PZ»STHCF5SfhIH)QUFeSfHAXmziMPO9qC]PFLSfMPQUFbV o�Æ q?MiOUOUHAc�SfhIH)qC]�z^HAXfMPdPH Ç s ;] SfhIQRVZ\]^XnziMPXTQR]^KIVuziMPOUKIHAV�]iZ ] _oHASGl�HCHAF�Ù#MPFbc Ù Ï^ÏPÏ^Ï s ��OU]iS�SThIH`qA]�z^HCXTMPd^HWzPHCXmVTKIV] sÆ q Ç ��OU]PSySfhbHÎOUHAFIdPSfhW]PZNSfhIH6QUFeSfHAXmziMPOLz^HCXmVTKIV ] s � KbrIra]PVTHÎl�H6lÎMiFLS�SThIH6ORHCFIdiSfh]PZySfhIH{QRFLSTHCXxzPMiOwSf]k_aH1Fb]�jk]^XTH;SThNMPF × ÏeØ s���]�l OUMPXTdPH;VThI]PKIOUc ] _aH��
This is page 89Printer: Opaque this
6Convergence of Random Variables
6.1 Introduction
The most important aspect of probability theory concerns
the behavior of sequences of random variables. This part of
probability is called large sample theory or limit theory
or asymptotic theory. This material is extremely important
for statistical inference. The basic question is this: what can we
say about the limiting behavior of a sequence of random vari-
ables X1, X2, X3, . . .? Since statistics and data mining are all
about gathering data, we will naturally be interested in what
happens as we gather more and more data.
In calculus we say that a sequence of real numbers xn con-
verges to a limit x if, for every ε > 0, |xn − x| < ε for all
large n. In probability, convergence is more subtle. Going back
to calculus for a moment, suppose that xn = x for all n. Then,
trivially, limn xn = x. Consider a probabilistic version of this
example. Suppose that X1, X2, . . . is a sequence of random vari-
ables which are independent and suppose each has a N(0, 1)
distribution. Since these all have the same distribution, we are
90 6. Convergence of Random Variables
tempted to say that Xn “converges” to X ∼ N(0, 1). But this
can’t quite be right since P(Xn = Z) = 0 for all n. (Two con-
tinuous random variables are equal with probability zero.)
Here is another example. Consider X1, X2, . . . where Xi ∼N(0, 1/n). Intuitively,Xn is very concentrated around 0 for large
n. But P (Xn = 0) = 0 for all n. This chapter develops appro-
priate methods of discussing convergence of random variables.
There are two main ideas in this chapter:
1. The law of large numbers says that sample average
Xn = n−1∑n
i=1Xi converges in probability to the ex-
pectation µ = E(X).
2. The central limit theorem says that sample average
has approximately a Normal distribution for large n. More
precisely,√n(Xn − µ) converges in distribution to a
Normal(0, σ2) distribution, where σ2 = V(X).
6.2 Types of Convergence
The two main types of convergence are defined as follows.
6.2 Types of Convergence 91
Definition 6.1 Let X1, X2, . . . be a sequence of random vari-
ables and let X be another random variable. Let Fn de-
note the cdf of Xn and let F denote the cdf of X.
1. Xn converges to X in probability, written XnP−→X,
if, for every ε > 0,
P(|Xn −X| > ε)→ 0 (6.1)
as n→∞.
2. Xn converges to X in distribution, written Xn à X,
if,
limn→∞
Fn(t) = F (t) (6.2)
at all t for which F is continuous.
There is another type of convergence which we introduce
mainly because it is useful for proving convergence in proba-
bility.
Definition 6.2 Xn converges to X in quadratic mean
(also called convergence in L2), written Xnqm−→X, if,
E(Xn −X)2 → 0 (6.3)
as n→∞.
If X is a point mass at c – that is P(X = c) = 1 – we write
Xnqm−→ c instead of X
qm−→X. Similarly, we write XnP−→ c and
Xn à c.
Example 6.3 Let Xn ∼ N(0, 1/n). Intuitively, Xn is concentrat-
ing at 0 so we would like to say that Xn à 0. Let’s see if this
is true. Let F be the distribution function for a point mass at
92 6. Convergence of Random Variables
0. Note that√nXn ∼ N(0, 1). Let Z denote a standard normal
random variable. For t < 0, Fn(t) = P(Xn < t) = P(√nXn <√
nt) = P(Z <√nt) → 0 since
√nt → −∞. For t > 0,
Fn(t) = P(Xn < t) = P(√nXn <
√nt) = P(Z <
√nt) → 1
since√nt → ∞. Hence, Fn(t) → F (t) for all t 6= 0 and so
Xn à 0. But notice that Fn(0) = 1/2 6= F (1/2) = 1 so con-
vergence fails at t = 0. But that doesn’t matter because t = 0 is
not a continuity point of F and the definition of convergence in
distribution only requires convergence at continuity points. ¥
The next theorem gives the relationship between the types of
convergence. The results are summarized in Figure 6.1.
Theorem 6.4 The following relationships hold:
(a) Xnqm−→X implies that Xn
P−→X.
(b) XnP−→X implies that Xn à X.
(c) If Xn à X and if P(X = c) = 1 for some real number c,
then XnP−→X.
In general, none of the reverse implications hold except the
special case in (c).
Proof. We start by proving (a). Suppose that Xnqm−→X. Fix
ε > 0. Then, using Chebyshev’s inequality,
P(|Xn −X| > ε) = P(|Xn −X|2 > ε2) ≤ E|Xn −X|2ε2
→ 0.
Proof of (b). This proof is a little more complicated. You may
skip if it you wish. Fix ε > 0 and let x be a continuity point of
F . Then
Fn(x) = P(Xn ≤ x) = P(Xn ≤ x,X ≤ x+ ε) + P(Xn ≤ x,X > x+ ε)
≤ P(X ≤ x+ ε) + P(|Xn −X| > ε)
= F (x+ ε) + P(|Xn −X| > ε).
Also,
F (x− ε) = P(X ≤ x− ε) = P(X ≤ x− ε,Xn ≤ x) + P(X ≤ x+ ε,Xn > x)
6.2 Types of Convergence 93
≤ Fn(x) + P(|Xn −X| > ε).
Hence,
F (x−ε)−P(|Xn−X| > ε) ≤ Fn(x) ≤ F (x+ε)+P(|Xn−X| > ε).
Take the limit as n→∞ to conclude that
F (x− ε) ≤ lim infn→∞
Fn(x) ≤ lim supn→∞
Fn(x) ≤ F (x+ ε).
This holds for all ε > 0. Take the limit as ε→ 0 and use the fact
that F is continuous at x and conclude that limn Fn(x) = F (x).
Proof of (c). Fix ε > 0. Then,
P(|Xn − c| > ε) = P(Xn < c− ε) + P(Xn > c+ ε)
≤ P(Xn ≤ c− ε) + P(Xn > c+ ε)
= Fn(c− ε) + 1− Fn(c+ ε)
→ F (c− ε) + 1− F (c+ ε)
= 0 + 1− 0 = 0.
Let us now show that the reverse implications do not hold.
Convergence in probability does not imply con-
vergence in quadratic mean. Let U ∼ Unif(0, 1) and let
Xn =√nI(0,1/n)(U). Then P(|Xn| > ε) = P(
√nI(0,1/n)(U) >
ε) = P(0 ≤ U < 1/n) = 1/n → 0. Hence, Then XnP−→ 0. But
E(X2n) = n
∫ 1/n0
du = 1 for all n so Xn does not converge in
quadratic mean.
Convergence in distribution does not imply con-
vergence in probability. Let X ∼ N(0, 1). Let Xn = −Xfor n = 1, 2, 3, . . .; hence Xn ∼ N(0, 1). Xn has the same distri-
bution function as X for all n so, trivially, limn Fn(x) = F (x)
for all x. Therefore, Xnd→ X. But P(|Xn−X| > ε) = P(|2X| >
ε) = P(|X| > ε/2) 6= 0. So Xn does not tend to X in probability.
¥
94 6. Convergence of Random Variables
quadratic mean probability distribution
point-mass distribution
FIGURE 6.1. Relationship between types of convergence.
Warning!One might conjecture that ifXnP−→ b then E(Xn)→
b. This is not true. Let Xn be a random variable defined byWe can concludethat E(Xn) → bif Xn is uniformlyintegrable. See thetechnical appendix.
P(Xn = n2) = 1/n and P(Xn = 0) = 1− (1/n). Now, P(|Xn| <ε) = P(Xn = 0) = 1 − (1/n) → 1. Hence, Z
P−→ 0. However,
E(Xn) = [n2×(1/n)]+[0×(1−(1/n))] = n. Thus, E(Xn)→∞.
Summary. Stare at Figure 6.1.
Some convergence properties are preserved under transforma-
tions.
Theorem 6.5 Let Xn, X, Yn, Y be random variables. Let g be a
continuous function.
(a) If XnP−→X and Yn
P−→Y , then Xn + YnP−→X + Y .
(b) If Xnqm−→X and Yn
qm−→Y , then Xn + Ynqm−→X + Y .
(c) If Xn à X and Yn à c, then Xn + Yn à X + c.
(d) If XnP−→X and Yn
P−→Y , then XnYnP−→XY .
(e) If Xn à X and Yn à c, then XnYn à cX.
(f) If XnP−→X then g(Xn)
P−→ g(X).
(g) If Xn à X then g(Xn)à g(X).
6.3 The Law of Large Numbers
Now we come to a crowning achievement in probability, the law
of large numbers. This theorem says that the mean of a large
sample is close to the mean of the distribution. For example,
the proportion of heads of a large number of tosses is expected
to be close to 1/2. We now make this more precise.
6.3 The Law of Large Numbers 95
Let X1, X2, . . . , be an iid sample and let µ = E(X1) and
σ2 = V(X1). Recall that the sample mean is defined as Xn = Note that µ =E(Xi) is the samefor all i so we candefine µ = E(Xi)for any i. By con-vention, we oftenwrite µ = E(X1).
n−1∑n
i=1Xi and that E(Xn) = µ and V(Xn) = σ2/n.
Theorem 6.6 (The Weak Law of Large Numbers (WLLN).)
If X1, . . . , Xn are iid , then XnP−→µ.
Interpretation of WLLN: The distribution of Xn be-
comes more concentrated around µ as n gets large.
Proof. Assume that σ < ∞. This is not necessary but it
simplifies the proof. Using Chebyshev’s inequality,
P(|Xn − µ| > ε
)≤ V(Xn)
ε2=
σ2
nε2
which tends to 0 as n→∞. ¥ There is a strongertheorem in the ap-pendix called thestrong law of largenumbers.
Example 6.7 Consider flipping a coin for which the probability
of heads is p. Let Xi denote the outcome of a single toss (0
or 1). Hence, p = P (Xi = 1) = E(Xi). The fraction of heads
after n tosses is Xn. According to the law of large numbers,
Xn converges to p in probability. This does not mean that Xn
will numerically equal p. It means that, when n is large, the
distribution of Xn is tightly concentrated around p. Suppose that
p = 1/2. How large should n be so that P (.4 ≤ Xn ≤ .6) ≥ .7?
First, E(Xn) = p = 1/2 and V(Xn) = σ2/n = p(1 − p)/n =
1/(4n). From Chebyshev’s inequality,
P(.4 ≤ Xn ≤ .6) = P(|Xn − µ| ≤ .1)
= 1− P(|Xn − µ| > .1)
≥ 1− 1
4n(.1)2= 1− 25
n.
The last expression will be larger than .7 if n = 84. ¥
96 6. Convergence of Random Variables
6.4 The Central Limit Theorem
Suppose that X1, . . . , Xn are iid with mean µ and variance σ.
The central limit theorem (CLT) says that Xn = n−1∑
iXi
has a distribution which is approximately Normal with mean µ
and variance σ2/n. This is remarkable since nothing is assumed
about the distribution of Xi, except the existence of the mean
and variance.
Theorem 6.8 (The Central Limit Theorem (CLT).) Let X1, . . . , Xn
be iid with mean µ and variance σ2. Let Xn = n−1∑n
i=1Xi.
Then
Zn ≡√n(Xn − µ)
σà Z
where Z ∼ N(0, 1). In other words,
limn→∞
P(Zn ≤ z) = Φ(z) =
∫ z
−∞
1√2πe−x
2/2dx.
Interpretation: Probability statements about Xn can
be approximated using a Normal distribution. It’s the
probability statements that we are approximating, not
the random variable itself.
In addition to Zn à N(0, 1), there are several forms of nota-
tion to denote the fact that the distribution of Zn is converging
to a Normal. They all mean the same thing. Here they are:
Zn ≈ N(0, 1)
Xn ≈ N
(µ,
σ2
n
)
Xn − µ ≈ N
(0,σ2
n
)
√n(Xn − µ) ≈ N
(0, σ2
)
6.4 The Central Limit Theorem 97
√n(Xn − µ)
σ≈ N(0, 1).
Example 6.9 Suppose that the number of errors per computer
program has a Poisson distribution with mean 5. We get 125 pro-
grams. Let X1, . . . , X125 be the number of errors in the programs.
We want to approximate P(X < 5.5). Let µ = E(X1) = λ = 5
and σ2 = V(X1) = λ = 5. Then,
P(X < 5.5) = P(√
n(Xn − µ)σ
<
√n(5.5− µ)
σ
)≈ P(Z < 2.5) = .9938.
The central limit theorem tells us that Zn =√n(X − µ)/σ
is approximately N(0,1). However, we rarely know σ. We can
estimate σ2 from X1, . . . , Xn by
S2n =
1
n− 1
n∑
i=1
(Xi −Xn)2.
This raises the following question: if we replace σ with Sn is the
central limit theorem still true? The answer is yes.
Theorem 6.10 Assume the same conditions as the CLT. Then,
√n(Xn − µ)
Snà N(0, 1).
You might wonder, how accurate the normal approximation
is. The answer is given in the Berry-Esseen theorem.
Theorem 6.11 (Berry-Esseen.) Suppose that E|X1|3 <∞. Then
supz|P(Zn ≤ z)− Φ(z)| ≤ 33
4
E|X1 − µ|3√nσ3
. (6.4)
There is also a multivariate version of the central limit theo-
rem.
98 6. Convergence of Random Variables
Theorem 6.12 (Multivariate central limit theorem) Let X1, . . . , Xn
be iid random vectors where
Xi =
X1i
X2i...Xki
with mean
µ =
µ1µ2...µk
=
E(X1i)E(X2i)
...E(Xki)
and variance matrix Σ. Let
X =
X1
X2...Xk
.
where X i = n−1∑n
i=1X1i. Then,
√n(X − µ)Ã N(0,Σ).
6.5 The Delta Method
If Yn has a limiting Normal distribution then the delta method
allows us to find the limiting distribution of g(Yn) where g is
any smooth function.
Theorem 6.13 (The Delta Method) Suppose that√n(Yn − µ)
σà N(0, 1)
and that g is a differentiable function such that g′(µ) 6= 0. Then√n(g(Yn)− g(µ))|g′(µ)|σ Ã N(0, 1).
6.5 The Delta Method 99
In other words,
Yn ≈ N
(µ,σ2
n
)=⇒ g(Yn) ≈ N
(g(µ), (g′(µ))2
σ2
n
).
Example 6.14 Let X1, . . . , Xn be iid with finite mean µ and fi-
nite variance σ2. By the central limit theorem,√n(Xn)/σ Ã
N(0, 1). Let Wn = eXn. Thus, Wn = g(Xn) where g(s) = es.
Since g′(s) = es, the delta method implies thatWn ≈ N(eµ, e2µσ2/n).
¥
There is also a multivariate version of the delta method.
Theorem 6.15 (The Multivariate Delta Method) Suppose that
Yn = (Yn1, . . . , Ynk) is a sequence of random vectors such that
√n(Yn − µ)Ã N(0,Σ).
Let g : Rk → R ane let
∇g(y) =
∂g∂y1...∂g∂yk
.
Let ∇µ denote ∇g(y) evaluated at y = µ and assume that the
elements of ∇µ are non-zero. Then
√n(g(Yn)− g(µ))Ã N
(0,∇T
µΣ∇µ
).
Example 6.16 Let
(X11
X21
),
(X12
X22
), . . . ,
(X1n
X2n
)
be iid random vectors with mean µ = (µ1, µ2)T and variance Σ.
Let
X1 =1
n
n∑
i=1
X1i, X2 =1
n
n∑
i=1
X2i
100 6. Convergence of Random Variables
and define Yn = X1X2. Thus, Yn = g(X1, X2) where g(s1, s2) =
s1s2. By the central limit theorem,
√n
(X1 − µ1X2 − µ2
)Ã N(0,Σ).
Now
∇g(s) =(
∂g∂s1∂g∂s2
)=
(s2s1
)
and so
∇TµΣ∇µ = (µ2 µ1)
(σ11 σ12σ12 σ22
)(µ2µ1
)= µ22σ11+2µ1µ2σ12+µ21σ22.
Therefore,
√n(X1X2 − µ1µ2)Ã N
(0, µ22σ11 + 2µ1µ2σ12 + µ21σ22
). ¥
6.6 Bibliographic Remarks
Convergence plays a central role in modern probability the-
ory. For more details, see Grimmet and Stirzaker (1982), Karr
(1993) and Billingsley (1979). Advanced convergence theory is
explained in great detail in van der Vaart and Wellner (1996)
and van der Vaart (1998).
6.7 Technical Appendix
6.7.1 Almost Sure and L1 Convergence
We say that Xn converges almost surely to X, written
Xnas−→X, if
P({s : Xn(s)→ X(s)}) = 1.
We say that Xn converges in L1 to X, written XnL1−→X, if
E|Xn −X| → 0
6.7 Technical Appendix 101
as n→∞.
Theorem 6.17 Let Xn and X be random vaiables. Then:
(a) Xnas−→X implies that Xn
P−→X.
(b) Xnqm−→X implies that Xn
L1−→X.
(c) XnL1−→X implies that Xn
P−→X.
The weak law of large numbers says that Xn converges to
EX1 in probability. The strong law asserts that this is also true
almost surely.
Theorem 6.18 (The strong law of large numbers.) Let X1, X2, . . .
be iid. If µ = E|X1| <∞ then Xnas−→µ.
A sequence Xn is asymptotically uniformly integrable if
limM→∞
lim supn→∞
E (|Xn|I(|Xn| > M)) = 0.
If XnP−→ b and Xn is asymptotically uniformly integrable, then
E(Xn)→ b.
6.7.2 Proof of the Central Limit Theorem
If X is a random variable, define its moment generating func-
tion (mgf) by ψX(t) = EetX . Assume in what follows that the
mgf is finite in a neighborhood around t = 0.
Lemma 6.19 Let Z1, Z2, . . . be a sequence of random variables.
Let ψn the mgf of Zn. Let Z be another random variable and
denote its mgf by ψ. If ψn(t) → ψ(t) for all t in some open
interval around 0, then Zn à Z.
proof of the central limit theorem. Let Yi = (Xi −µ)/σ. Then, Zn = n−1/2
∑i Yi. Let ψ(t) be the mgf of Yi. The
mgf of∑
i Yi is (ψ(t))n and mgf of Zn is [ψ(t/√n)]n ≡ ξn(t).
102 6. Convergence of Random Variables
Now ψ′(0) = E(Y1) = 0, ψ′′(0) = E(Y 21 ) = V ar(Y1) = 1. So,
ψ(t) = ψ(0) + tψ′(0) +t2
2!ψ′′(0) +
t3
3!ψ′′(0) + · · ·
= 1 + 0 +t2
2+t3
3!ψ′′(0) + · · ·
= 1 +t2
2+t3
3!ψ′′(0) + · · ·
Now,
ξn(t) =
[ψ
(t√n
)]n
=
[1 +
t2
2n+
t3
3!n3/2ψ′′(0) + · · ·
]n
=
[1 +
t2
2+ t3
3!n1/2ψ′′(0) + · · ·
n
]n
→ et2/2
which is the mgf of a N(0,1). The result follows from the previous
Theorem. In the last step we used the fact that, if an → a then(1 +
ann
)n→ ea.
6.8 Excercises
1. Let X1, . . . , Xn be iid with finite mean µ = E(X1) and
finite variance σ2 = V(X1). Let Xn be the sample mean
and let S2n be the sample variance.
(a) Show that E(S2n) = σ2.
(b) Show that S2n
P−→ σ2. Hint: Show that S2n = cnn
−1∑ni=1X
2i −
dnX2
n where cn → 1 and dn → 1. Apply the law of large
numbers to n−1∑n
i=1X2i and to Xn. Then use part (e) of
Theorem 6.5.
6.8 Excercises 103
2. Let X1, X2, . . . be a sequence of random variables. Show
that Xnqm−→ b if and only if
limn→∞
E(Xn) = b and limn→∞
V(Xn) = 0.
3. Let X1, . . . , Xn be iid and let µ = E(X1). Suppose that
the variance is finite. Show that Xnqm−→µ.
4. Let X1, X2, . . . be a sequence of random variables such
that
P(Xn =
1
n
)= 1− 1
n2and P (Xn = n) =
1
n2.
Does Xn converge in probability? Does Xn converge in
quadratic mean?
5. Let X1, . . . , Xn ∼ Bernoulli(p). Prove that
1
n
n∑
i=1
X2i
P−→ p and1
n
n∑
i=1
X2i
qm−→ p.
6. Suppose that the height of men has mean 68 inches and
standard deviation 4 inches. We draw 100 men at ran-
dom. Find (approximately) the probability that the aver-
age height of men in our sample will be at least 68 inches.
7. Let λn = 1/n for n = 1, 2, . . .. Let Xn ∼ Poisson(λn).
(a) Show that XnP−→ 0.
(b) Let Yn = nXn. Show that YnP−→ 0.
8. Suppose we have a computer program consisting of n =
100 pages of code. LetXi be the number of errors on the ith
page of code. Suppose that the X ′is are Poisson with mean
1 and that they are independent. Let Y =∑n
i=1Xi be the
total number of errors. Use the central limit theorem to
approximate P(Y < 90).
104 6. Convergence of Random Variables
9. Suppose that P(X = 1) = P(X = −1) = 1/2. Define
Xn =
{X with probability 1− 1
n
en with probability 1n.
Does Xn converge to X in probability? Does Xn converge
to X in distribution? Does E(X −Xn)2 converge to 0?
10. Let Z ∼ N(0, 1). Let t > 0.
(a) Show that, for any k > 0,
P(|Z| > t) ≤ E|Z|ktk
.
(b) (Mill’s inequality.) Show that
P(|Z| > t) ≤{2
π
}1/2e−t
2/2
t.
Hint. Note that P(|Z| > t) = 2P(Z > t). Now write out
what P(Z > t) means and note that x/t > 1 whenever
x > t.
11. Suppose that Xn ∼ N(0, 1/n) and let X be a random
variable with distribution F (x) = 0 if x < 0 and F (x) = 1
if x ≥ 0. Does Xn converge to X in probability? (Prove or
disprove). Does Xn converge to X in distribution? (Prove
or disprove).
12. Let X,X1, X2, X3, . . . be random variables that are posi-
tive and integer valued. Show that Xn à X if and only
if
limn→∞
P(Xn = k) = P(X = k)
for every integer k.
6.8 Excercises 105
13. Let Z1, Z2, . . . be i.i.d., random variables with density f .
Suppose that P(Zi > 0) = 1 and that λ = limx↓0 f(x) > 0.
Let
Xn = n min{Z1, . . . , Zn}.Show that Xn à Z where Z has an exponential distribu-
tion with mean 1/λ.
14. Let X1, . . . , Xn ∼ Uniform(0, 1). Let Yn = X2
n. Find the
limiting distribution of Yn.
15. Let (X11
X21
),
(X12
X22
), . . . ,
(X1n
X2n
)
be iid random vectors with mean µ = (µ1, µ2) and variance
Σ. Let
X1 =1
n
n∑
i=1
X1i, X2 =1
n
n∑
i=1
X2i
and define Yn = X1/X2. Find the limiting distribution of
Yn.
Part II
Statistical Inference
103
7
Models, Statistical Inference andLearning
7.1 Introduction
Statistical inference, or \learning" as it is called in computer
science, is the process of using data to infer the distribution
that generated the data. The basic statistical inference problem
is this:
We observe X1; : : : ; Xn � F . We want to infer (o r
estimate o r learn) F o r some feature of F such as its
mean.
7.2 Parametric and Nonparametric Models
A statistical model is a set of distributions (or a set of
densities) F. A parametric model is a set F that can be pa-
rameterized b y a �nite n umber of parameters. For example, if
we assume that the data come from a Normal distribution then
106 7. Models, Statistical Inference and Learning
the model is
F =
�f(x;�; �) =
1
�p�exp
��
1
2�2(x� �)2
�; � 2 R; � > 0
�:
(7.1)
This is a two-parameter model. We have written the density
as f(x;�; �) to show that x is a value of the random variable
whereas � and � are parameters. In general, a parametric model
takes the form
F =
�f(x; �) : � 2 �
�(7.2)
where � is an unknown parameter (or vector of parameters) that
can take values in the parameter space �. If � is a vector but
we are only interested in one component of �, we call the re-
maining parameters nuisance parameters. A nonparamet-
ric model is a set F that cannot be parameterized by a �nite
number of parameters. For example, FALL = fall cdf 0sg is non-parametric.The distinction
between parametric
and nonparametric
is more subtle than
this but we don't
need a rigorous
de�nition for our
purposes.
Example 7.1 (One-dimensional Parametric Estimation.) Let X1; : : : ; Xn
be independent Bernoulli(p) observations. The problem is to es-
timate the parameter p. �
Example 7.2 (Two-dimensional Parametric Estimation.) Suppose that
X1; : : : ; Xn � F and we assume that the pdf f 2 F where F is
given in (7.1). In this case there are two parameters, � and �.
The goal is to estimate the parameters from the data. If we are
only interested in estimating � then � is the parameter of inter-
est and � is a nuisance parameter. �
Example 7.3 (Nonparametric estimation of the cdf.) Let X1; : : : ; Xn
be independent observations from a cdf F . The problem is to es-
timate F assuming only that F 2 FALL = fall cdf 0sg. �
Example 7.4 (Nonparametric density estimation.) Let X1; : : : ; Xn
be independent observations from a cdf F and let f = F 0 be the
7.2 Parametric and Nonparametric Models 107
pdf . Suppose we want to estimate the pdf f . It is not pos-
sible to estimate f assuming only that F 2 FALL. We need to
assume some smoothness on f . For example, we might assume
that f 2 F = FDENSTFSOB where FDENS is the set of all proba-
bility density functions and
FSOB =
�f :
Z(f 00(x))2dx <1
�:
The class FSOB is called a Sobolev space; it is the set of func-
tions that are not \too wiggly." �
Example 7.5 (Nonparametric estimation of functionals.) Let X1; : : : ; Xn �F . Suppose we want to estimate � = E(X1) =
Rx dF (x) as-
suming only that � exists. The mean � may be thought of as a
function of F : we can write � = T (F ) =Rx dF (x). In general,
any function of F is called a statistical functional. Other
examples of functions are the variance T (F ) =Rx2dF (x) ��R
xdF (x)�2
and the median T (F ) = F�1=2. �
Example 7.6 (Regression, prediction and classi�cation.) Suppose we
observe pairs of data (X1; Y1); : : : (Xn; Yn). Perhaps Xi is the
blood pressure of subject i and Yi is how long they live. X is
called a predictor or regressor or feature or independent
variable. Y is called the outcome or the response variable
or the dependent variable. We call r(x) = E(Y jX = x) the
regression function. If we assume that f 2 F where F is �-
nite dimensional { the set of straight lines for example { then
we have a parametric regression model. If we assume that
f 2 F where F is not �nite dimensional then we have a para-
metric regression model. The goal of predicting Y for a new
patient based on their X value is called prediction. If Y is dis-
crete (for example, live or die) then prediction is instead called
classi�cation. If our goal is to estimate the functin f , then we
call this regression or curve estimation. Regression models
108 7. Models, Statistical Inference and Learning
are sometimes written as
Y = f(X) + � (7.3)
where E (�) = 0. We can always rewrite a regression model this
way. To see this, de�ne � = Y � f(X) and hence Y = Y +
f(X)�f(X) = f(X)+�. Moreover, E (�) = E E (�jX) = E (E (Y �f(X))jX) = E (E (Y jX)� f(X)) = E (f(X)� f(X)) = 0. �
What's Next? It is traditional in most introductory courses
to start with parametric inference. Instead, we will start with
nonparametric inference and then we will cover parametric in-
ference. In some respects, nonparametric inference is easier to
understand and is more useful than parametric inference.
Frequentists and Bayesians. There are many approaches
to statistical inference. The two dominant approaches are called
frequentist inference and Bayesian inference. We'll cover
both but we will start with frequentist inference. We'll postpone
a discussion of the pro's and con's of these two until later.
Some Notation. If F = ff(x; �) : � 2 �g is a parametric
model, we write P�(X 2 A) =RAf(x; �)dx and E �(r(X)) =R
r(x)f(x; �)dx. The subscript � indicates that the probability
or expectation is with respect to f(x; �); it does not mean we
are averaging over �. Similarly, we write V� for the variance.
7.3 Fundamental Concepts in Inference
Many inferential problems can be identi�ed as being one of
three types: estimation, con�dence sets or hypothesis testing.
We will treat all of these problems in detail in the rest of the
book. Here, we give a brief introduction to the ideas.
7.3.1 Point Estimation
7.3 Fundamental Concepts in Inference 109
Point estimation refers to providing a single \best guess"
of some quantity of interest. The quantity of interest could be a
parameter in a parametric model, a cdf F , a probability density
function f , a regression function r, or a prediction for a future
value Y of some random variable.
By convention, we denote a point estimate of � by b�.Remember that � is a �xed, unknown quantity. The es-
timate b� depends on the data so b� is a random variable.
More formally, let X1; : : : ; Xn be n iid data point from some
distribution F . A point estimator b�n of a parameter � is some
function of X1; : : : ; Xn:b�n = g(X1; : : : ; Xn):
We de�ne
bias (b�n) = E �(b�n)� � (7.4)
to be the bias of b�n. We say that b�n is unbiased if E (b�n) = �.
Unbiasedness used to receive much attention but these days it is
not considered very important; many of the estimators we will
use are biased. A point estimator b�n of a parameter � is con-
sistent if b�n P�! �. Consistency is a reasonable requirement for
estimators. The distribution of b�n is called the sampling dis-
tribution. The standard deviation of b�n is called the standarderror, denoted by se :
se = se (b�n) =qV(b�n): (7.5)
Often, it is not possible to compute the standard error but usu-
ally we can estimate the standard error. The estimated standard
error is denoted by bse .Example 7.7 Let X1; : : : ; Xn � Bernoulli(p) and let bpn = n�1
PiXi.
Then E (bpn) = n�1P
iE (Xi) = p so bpn is unbiased. The standard
110 7. Models, Statistical Inference and Learning
error is se =pV(bpn) = pp(1� p)=n. The estimated standard
error is bse =pbp(1� bp)=n. �
The quality of a point estimate is sometimes assessed by the
mean squared error, or MSE, de�ned by
MSE = E �(b�n � �)2:
Recall that E �(�) refers to expectation with respect to the dis-
tribution
f(x1; : : : ; xn; �) =
nYi=1
f(xi; �)
that generated the data. It does not mean we are averaging over
a density for �.
Theorem 7.8 The MSE can be written as
MSE = bias (b�n)2 + V�(b�n): (7.6)
Proof. Let �n = E�(b�n). ThenE �(b�n � �)2 = E �(b�n � �n + �n � �)2
= E �(b�n � �n)2 + 2(�n � �)2E �(b�n � �n) + E �(�n � �)2
= (�n � �)2 + E �(b�n � �n)2
= bias 2 + V(b�n): �Theorem 7.9 If bias ! 0 and se ! 0 as n ! 1 then b�n is
consistent, that is, b�n P�! �.
Proof. If bias ! 0 and se ! 0 then, by Theorem 7.8,
MSE ! 0. It follows that b�n qm�! �. (Recall de�nition 6.3.) The
result follows from part (b) of Theorem 6.4. �
Example 7.10 Returning to the coin ipping example, we have
that E p(bpn) = p so that bias = p�p = 0 and se =pp(1� p)=n!
0. Hence, bpn P�! p, that is, bpn is a consistent estimator. �
7.3 Fundamental Concepts in Inference 111
Many of the estimators we will encounter will turn out to
have, approximately, a Normal distribution.
De�nition 7.11 An estimator is asymptotically Nor-
mal if b�n � �
se N(0; 1): (7.7)
7.3.2 Con�dence Sets
A 1�� con�dence interval for a parameter � is an interval
Cn = (a; b) where a = a(X1; : : : ; Xn) and b = b(X1; : : : ; Xn) are
functions of the data such that
P�(� 2 Cn) � 1� �; for all � 2 �: (7.8)
In words, (a; b) traps � with probability 1 � �. We call 1 � �
the coverage of the con�dence interval. Commonly, people use
95 per cent con�dence intervals which corresponds to choosing
� = 0:05. Note: Cn is random and � is �xed! If � is a vector
then we use a con�dence set (such a sphere or an ellipse) instead
of an interval.
Warning! There is much confusion about how to interpret
a con�dence interval. A con�dence interval is not a probability
statement about � since � is a �xed quantity, not a random
variable. Some texts interpret con�dence intervals as follows: if
I repeat the experiment over and over, the interval will contain
the parameter 95 per cent of the time. This is correct but useless
since we rarely repeat the same experiment over and over. A
better interpretation is this:
On day 1, you collect data and construct a 95 per
cent con�dence interval for a parameter �1. On day
112 7. Models, Statistical Inference and Learning
2, you collect new data and construct a 95 per cent
con�dence interval for an unrelated parameter �2. On
day 3, you collect new data and construct a 95 per
cent con�dence interval for an unrelated parameter �3.
You continue this way constructing con�dence inter-
vals for a sequence of unrelated parameters �1; �2; : : :
Then 95 per cent your intervals will trap the true pa-
rameter value. There is no need to introduce the idea
of repeating the same experiment over and over.
Example 7.12 Every day, newspapers report opinion polls. For
example, they might say that \83 per cent of the population fa-
vor arming pilots with guns." Usually, you will see a statement
like \this poll is accurate to within 4 points 95 per cent of the
time." They are saying that 83 � 4 is a 95 per cent con�dence
interval for the true but unknown proportion p of people who
favor arming pilots with guns. If you form a con�dence inter-
val this way everyday for the rest of your life, 95 per cent of
your intervals will contain the true parameter. This is true even
though you are estimating a di�erent quantity (a di�erent poll
question) every day.
Later, we will discuss Bayesian methods in which we treat
� as if it were a random variable and we do make probability
statements about �. In particular, we will make statements like
\the probability that � is Cn, given the data, is 95 per cent."
However, these Bayesian intervals refer to degree-of-belief prob-
abilities. These Bayesian intervals will not, in general, trap the
parameter 95 per cent of the time.
Example 7.13 In the coin ipping setting, let Cn = (bpn��n; bpn+�n) where �
2n= log(2=�)=(2n). From Hoe�ding's inequality (5.4)
it follows that
P(p 2 Cn) � 1� �
for every p. Hence, Cn is a 1� � con�dence interval. �
7.3 Fundamental Concepts in Inference 113
As mentioned earlier, point estimators often have a limiting
Normal distribution, meaning that equation (7.7) holds, that
is, b�n � N(�; bse 2). In this case we can construct (approximate)
con�dence intervals as follows.
Theorem 7.14 (Normal-based Con�dence Interval.) Suppose thatb�n � N(�; bse 2). Let � be the cdf of a standard Normal and
let z�=2 = ��1(1 � (�=2)), that is, P(Z > z�=2) = �=2 and
P(�z�=2 < Z < z�=2) = 1� � where Z � N(0; 1). Let
Cn = (b�n � z�=2 bse ; b�n + z�=2 bse ): (7.9)
Then
P�(� 2 Cn)! 1� �: (7.10)
Proof. Let Zn = (b�n��)= bse . By assumption Zn Z where
Z � N(0; 1). Hence,
P�(� 2 Cn) = P�
�b�n � z�=2 bse < � < b�n + z�=2 bse�= P�
�z�=2 <
b�n � �bse < z�=2
!! P
��z�=2 < Z < z�=2
�= 1� �: �
For 95 per cent con�dence intervals, � = 0:05 and z�=2 =
1:96 � 2 leading to the approximate 95 per cent con�dence
interval b�n� 2 bse . We will discuss the construction of con�dence
intervals in more generality in the rest of the book.
Example 7.15 Let X1; : : : ; Xn � Bernoulli(p) and let bpn = n�1P
n
i=1Xi.
Then V(bpn) = n�2P
n
i=1 V(Xi) = n�2P
n
i=1 p(1�p) = n�2np(1�p) = p(1�p)=n. Hence, se =
pp(1� p)=n and bse =
pbpn(1� bpn)=n.By the Central Limit Theorem, bpn � N(p; bse2). Therefore, anapproximate 1� � con�dence interval is
bp� z�=2 bse = bp� z�=2
rbpn(1� bpn)n
:
114 7. Models, Statistical Inference and Learning
Compare this with the con�dence interval in the previous exam-
ple. The Normal-based interval is shorter but it only has approx-
imately (large sample) correct coverage. �
7.3.3 Hypothesis Testing
In hypothesis testing, we start with some default theory
{ called a null hypothesis { and we ask if the data provide
suÆcient evidence to reject the theory. If not we retain the null
hypothesis.The term \retaining
the null hypothesis"
is due to Chris Gen-
ovese. Other termi-
nology is \accepting
the null" or \failing
to reject the null."
Example 7.16 (Testing if a Coin is Fair) SupposeX1; : : : ; Xn � Bernoulli(p)
denote n independent coin ips. Suppose we want to test if the
coin is fair. Let H0 denote the hypothesis that the coin is fair
and let H1 denote the hypothesis that the coin is not fair. H0
is called the null hypothesis and H1 is called the alternative
hypothesis. We can write the hypotheses as
H0 : p = 1=2 versus H1 : p 6= 1=2:
It seems reasonable to reject H0 if T = jbpn � (1=2)j is large.
When we discuss hypothesis testing in detail, we will be more
precise about how large T should be to reject H0. �
7.4 Bibliographic Remarks
Statistical inference is covered in many texts. Elementary
texts include DeGroot and Schervish (2001) and Larsen and
Marx (1986). At the intermediate level I recommend Casella
and Berger (2002) and Bickel and Doksum (2001). At the ad-
vanced level, Lehmann and Casella (1998), Lehmann (1986) and
van der Vaart (1998).
7.5 Technical Appendix
7.5 Technical Appendix 115
Our de�nition of con�dence interval requires that P�(� 2Cn) � 1 � � for all � 2 �. An pointwise asymptotic con-
�dence interval requires that lim infn!1 P�(� 2 Cn) � 1�� for
all � 2 �. An uniform asymptotic con�dence interval requres
that lim infn!1 inf�2� P�(� 2 Cn) � 1 � �. The approximate
Normal-based interval is a pointwise asymptotic con�dence in-
terval. In general, it might not be a uniform asymptotic con�-
dence interval.
116 7. Models, Statistical Inference and Learning
8
Estimating the cdf and StatisticalFunctionals
The �rst inference problem we will consider is nonparametric
estimation ofthe cdf F and functions of the cdf.
8.1 The Empirical Distribution Function
Let X1; : : : ; Xn � F be iid where F is a distribution function
on the real line. We will estimate F with the empirical distribu-
tion function, which is de�ned as follo ws.
De�nition 8.1 The empirical distribution function bFnis the cdf that puts mass 1=n at e ach data point Xi. For-
mally,
bFn(x) =
Pn
i=1 I(Xi � x)
n
=n umber of observations less than or equal to x
n(8.1)
where
I(Xi � x) =
�1 if Xi � x
0 if Xi > x:
118 8. Estimating the cdf and Statistical Functionals
Example 8.2 (Nerve Data.) Cox and Lewis (1966) reported 799
waiting times between successive pulses along a nerve �bre. The
�rst plot in Figure 8.1 shows the a \toothpick plot" where each
toothpick shows the location of one data point. The second plot
shows that empirical cdf bFn. Suppose we want to estimate the
fraction of waiting times between .4 and .6 seconds. The estimate
is bFn(:6)� bFn(:4) = :93� :84 = :09. �
The following theorems give some properties of bFn(x).Theorem 8.3 At any �xed value of x,
E
� bFn(x)� = F (x) and V
� bFn(x)� =F (x)(1� F (x))
n:
Thus,
MSE =F (x)(1� F (x))
n! 0
and hence, bFn(x) P�!F (x).
Actually,
supxj bFn(x)�F (x)j
converges to 0
almost surely.
Theorem 8.4 (Glivenko-Cantelli Theorem.) Let X1; : : : ; Xn � F .
Then
supx
j bFn(x)� F (x)j P�! 0:
8.2 Statistical Functionals
A statistical functional T (F ) is any function of F . Examples
are the mean � =RxdF (x), the variance �2 =
R(x� �)2dF (x)
and the median m = F�1(1=2).
De�nition 8.5 The plug-in estimator of � = T (F ) is
de�ned by b�n = T ( bFn):In other words, just plug in bFn for the unknown F .
8.2 Statistical Functionals 119
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4
time
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4
0.0
0.4
0.8
time
cdf
FIGURE 8.1. Nerve data. The solid line in the middle is the empirical
distribution function. The lines above and below the middle line are
a 95 per cent con�dence band. The con�dence band is explained in
the appendix.
120 8. Estimating the cdf and Statistical Functionals
A functional of the formRr(x)dF (x) is called a linear func-
tional. Recall thatRr(x)dF (x) is de�ned to be
Rr(x)f(x)dx
in the continuous case andP
jr(xj)f(xj) in the discrete. The
empirical cdf bFn(x) is discrete, putting mass 1=n at each Xi.
Hence, if T (F ) =Rr(x)dF (x) is a linear functional then we
have:
The plug-in estimator for linear functional T (F ) =Rr(x)dF (x) is:
T ( bFn) = Z r(x)d bFn(x) = 1
n
nXi=1
r(Xi): (8.2)
Sometimes we can �nd the estimated standard error bse of
T ( bFn) by doing some calculations. However, in other cases it
is not obvious how to estimate the standard error. In the next
chapter, we will discuss a general method for �nding bse. Fornow, let us just assume that somehow we can �nd bse. In many
cases, it turns out that
T ( bFn) � N(T (F ); bse2):By equation (7.10), an approximate 1 � � con�dence interval
for T (F ) is then
T ( bFn)� z�=2 bse: (8.3)
We will call this theNormal-based interval. For a 95 per cent
con�dence interval, z�=2 = z:05=2 = 1:96 � 2 so the interval is
T ( bFn)� 2 bse:Example 8.6 (The mean.) Let � = T (F ) =
Rx dF (x). The plug-
in estimator is b� =Rx d bFn(x) = Xn. The standard error is
se =
qV(Xn) = �=
pn. If b� denotes an estimate of �, then
8.2 Statistical Functionals 121
the estimated standard error is b�=pn. (In the next example, we
shall see how to estimate �.) A Normal-based con�dence interval
for � is Xn � z�=2 se2: �
Example 8.7 (The Variance) Let �2 = T (F ) = V(X) =Rx2dF (x)��R
xdF (x)�2. The plug-in estimator is
b�2 =
Zx2d bFn(x)� �Z xd bFn(x)�2
=1
n
nXi=1
X2i�
1
n
nXi=1
Xi
!2
=1
n
nXi=1
(Xi �Xn)2:
Another reasoable estimator of �2 is the sample variance
S2n=
1
n� 1
nXi=1
(Xi �Xn)2:
In practice, there is little di�erence between b�2 and S2nand you
can use either one. Returning the our last example, we now see
that the estimated standard error of the estimate of the mean isbse = b�=pn. �Example 8.8 (The Skewness) Let � and �2 denote the mean and
variance of a random variable X. The skewness is de�ned to be
� =E(X � �)3
�3=
R(x� �)3dF (x)�R
(x� �)2dF (x)3=2 :
The skewness measure the lack of symmetry of a distribution.
To �nd the plug-in estimate, �rst recall that b� = n�1P
iXi andb�2 = n�1
Pi(Xi � b�)2. The plug-in estimate of � is
b� =
R(x� �)3d bFn(x)nR
(x� �)2d bFn(x)o3=2=
1n
Pi(Xi � b�)3b�3 : �
122 8. Estimating the cdf and Statistical Functionals
Example 8.9 (Correlation.) Let Z = (X; Y ) and let � = T (F ) =
E (X��X)(Y��Y )=(�x�y) denote the correlation between X and
Y , where F (x; y) is bivariate. We can write T (F ) = a(T1(F ); T2(F ); T3(F ); T4(F ); T5(F ))
where
T1(F ) =Rx dF (z) T2(F ) =
Ry dF (z) T3(F ) =
Rxy dF (z)
T4(F ) =Rx2 dF (z) T5(F ) =
Ry2 dF (z)
and
a(t1; : : : ; t5) =t3 � t1t2p
(t4 � t21)(t5 � t22):
Replace F with bFn in T1(F ), . . . , T5(F ), and take
b� = a(T1( bFn); T2( bFn); T3( bFn); T4( bFn); T5( bFn)):We get b� = P
i(Xi �Xn)(Yi � Y n)qP
i(Xi �Xn)2
qPi(Yi � Y n)2
which is called the sample correlation. �
Example 8.10 (Quantiles.) Let F be strictly increasing with den-
sity f . The T (F ) = F�1(p) be the pth quantile. The estimate
if T (F ) is bF�1n
(p). We have to be a bit careful since bFn is
not invertible. To avoid ambiguity we de�ne bF�1n
(p) = inffx :bFn(x) � pg. We call bF�1n
(p) the pth sample quantile. �
Only in the �rst example did we compute a standard error or
a con�dence interval. How shall we handle the other examples.
When we discuss parametric methods, we will develop formulae
for standard errors and con�dence intervals. But in our non-
parametric setting we need something else. In the next chapter,
we will introduce two methods { the jackknife and the bootstrap
{ for getting standard errors and con�dence intervals.
Example 8.11 (Plasma Cholesterol.) Figure 8.2 shows histograms
for plasma cholesterol (in mg/dl) for 371 patients with chest
8.2 Statistical Functionals 123
pain (Scott et al. 1978). The histograms show the percentage of
patients in 10 bins. The �rst histogram is for 51 patients who
had no evidence of heart disease while the second histogram is
for 320 patients who had narrowing of the arteries. Is the mean
cholesterol di�erent in the two groups? Let us regard these data
as samples from two distributions F1 and F2. Let �1 =RxdF1(x)
and �2 =RxdF2(x) denote the means of the two populations.
The plug-in estimates are b�1 =Rxd bFn;1(x) = Xn;1 = 195:27
and b�2 = R xd bFn;2(x) = Xn;2 = 216:19: Recall that the standard
error of the sample mean b� = 1n
Pn
i=1Xi is
se (b�) =vuutV
�1
n
nXi=1
Xi
�=
vuut 1
n2
nXi=1
V(Xi) =
rn�2
n2=
�pn
which we estimate by
bse (b�) = b�pn
where
b� =
vuut 1
n
nXi=1
(Xi �X)2:
For the two groups this yields bse (b�1) = 5:0 and bse (b�2) = 2:4.
Approximate 95 per cent con�dence intervals for �1 and �2 areb�1 � 2 bse (b�1) = (185; 205) and b�2 � 2 bse (b�2) = (211; 221).
Now, consider the functional � = T (F2)�T (F1) whose plug-in
estimate is b� = b�2�b�1 = 216:19�195:27 = 20:92: The standard
error of b� is
se =pV(b�2 � b�1) =pV(b�2) + V(b�1) =p(se (b�1))2 + (se (b�2))2
and we estimate this by
bse =p( bse (b�1))2 + ( bse (b�2))2 = 5:55:
124 8. Estimating the cdf and Statistical Functionals
An approximate 95 per cent con�dence interval for � is b��2 bse =
(9:8; 32:0). This suggests that cholesterol is higher among those
with narrowed arteries. We should not jump to the conclusion
(from these data) that cholesterol causes heart disease. The leap
from statistical evidence to causation is very subtle and is dis-
cussed later in this text. �
8.3 Technical Appendix
In this appendix we explain how to construct a con�dence
band for the cdf .
Theorem 8.12 (Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.) Let
X1; : : : ; Xn be iid from F . Then, for any � > 0,
P
�supx
jF (x)� bFn(x)j > �
�� 2e�2n�
2
: (8.4)
From the DKW inequality, we can construct a con�dence set.
Let �2n= log(2=�)=(2n), L(x) = maxf bFn(x)��n; 0g and U(x) =
minf bFn(x) + �n; 1g. It follows from (8.4) that for any F ,
P(F 2 Cn) � 1� �:
Thus, Cn is a nonparametric 1�� con�dence set for F . A better
name for Cn is a con�dence band. To summarize:
A 1�� nonparametric con�dence band for F is (L(x); U(x)) where
L(x) = maxf bFn(x)� �n; 0gU(x) = minf bFn(x) + �n; 1g
�n =
s1
2nlog
�2
�
�:
8.3 Technical Appendix 125
plasma cholesterol for patients without heart disease
100 150 200 250 300 350 400
plasma cholesterol for patients with heart disease
100 150 200 250 300 350 400
FIGURE 8.2. Plasma cholesterol for 51 patients with no heart disease
and 320 patients with narrowing of the arteries.
126 8. Estimating the cdf and Statistical Functionals
Example 8.13 The dashed lines in Figure 8.1 give a 95 per cent
con�dence band using �n =q
12nlog�
2:05
�= :048. �
8.4 Bibliographic Remarks
The Glivenko-Cantelli theorem is the tip of the iceberg. The
theory of distribution functions is a special case of what are
called empirical processes which underlie much of modern statis-
tical theory. Some references on empirical processes are Shorack
and Wellner (1986) and van der Vaart and Wellner (1996).
8.5 Exercises
1. Prove Theorem 8.3.
2. LetX1; : : : ; Xn � Bernoulli(p) and let Y1; : : : ; Ym � Bernoulli(q).
Find the plug-in estimator and estimated standard error
for p. Find an approximate 90 per cent con�dence interval
for p. Find the plug-in estimator and estimated standard
error for p�q. Find an approximate 90 per cent con�dence
interval for p� q.
3. (Computer Experiment.) Generate 100 observations from
a N(0,1) distribution. Compute a 95 per cent con�dence
band for the cdf F . Repeat this 1000 times and see how
often the con�dence band contains the true distribution
function. Repeat using data from a Cauchy distribution.
4. Let X1; : : : ; Xn � F and let bFn(x) be the empirical distri-
bution function. For a �xed x, use the central limit theo-
rem to �nd the limiting distribution of bFn(x).5. Let x and y be two distinct points. Find Cov( bFn(x); bFn(y)).
8.5 Exercises 127
6. Let X1; : : : ; Xn � F and let bF be the empirical distri-
bution function. Let a < b be �xed numbers and de�ne
� = T (F ) = F (b)�F (a). Let b� = T ( bFn) = bFn(b)� bFn(a).Find the estimated standard error of b�. Find an expressionfor an approximate 1� � con�dence interval for �.
7. Data on the magnitudes of earthquakes near Fiji are avail-
able on the course website. Estimate the cdf F (x). Com-
pute and plot a 95 per cent con�dence envelope for F .
Find an approximate 95 per cent con�dence interval for
F (4:9)� F (4:3).
8. Get the data on eruption times and waiting times between
eruptions of the old faithful geyser from the course web-
site. Estimate the mean waiting time and give a standard
error for the estimate. Also, give a 90 per cent con�dence
interval for the mean waiting time. Now estimate the me-
dian waiting time. In the next chapter we will see how to
get the standard error for the median.
9. 100 people are given a standard antibiotic to treat an in-
fection and another 100 are given a new antibiotic. In the
�rst group, 90 people recover; in the second group, 85 peo-
ple recover. Let p1 be the probability of recovery under
the standard treatment and let p2 be the probability of
recovery under the new treatment. We are interested in
estimating � = p1 � p2. Provide an estimate, standard er-
ror, an 80 per cent con�dence interval and a 95 per cent
con�dence interval for �.
10. In 1975, an experiment was conducted to see if cloud seed-
ing produced rainfall. 26 clouds were seeded with silver
nitrate and 26 were not. The decision to seed or not was
made at random. Get the data from
http://lib.stat.cmu.edu/DASL/Stories/CloudSeeding.html
128 8. Estimating the cdf and Statistical Functionals
Let � be the di�erence in the median precipitation from
the two groups. Estimate �. Estimate the standard error of
the estimate and produce a 95 per cent con�dence interval.
9
The Bootstrap
The bootstrap is a nonparametric method for estimating stan-
dard errors and computing con�dence intervals. Let
Tn = g(X1; : : : ; Xn)
be a statistic, that is, any function of the data. Suppose we
want to know VF (Tn), the variance of Tn. We hav ewritten VF
to emphasize that the variance usually depends on the unknown
distribution function F . For example, if Tn = n�1P
n
i=1Xi then
VF (Tn) = �2=n where �2 =R(x� �)2dF (x) and � =
RxdF (x).
The bootstrap idea has two parts. First we estimate VF (Tn) with
VbFn(Tn). Thus, for Tn = n�1
Pn
i=1Xi we hav eV bFn(Tn) = b�2=n
where b�2 = n�1P
n
i=1(Xi �Xn). However, in morecomplicated
cases we cannot write down a simple formula for VbFn(Tn). This
leads us to the second step which is to approximate VbFn(Tn)
using simulation. Before proceeding, let us discuss the idea of
simulation.
9.1 Simulation
130 9. The Bootstrap
Suppose we draw an iid sample Y1; : : : ; YB from a distribution
G. By the law of large numbers,
Y n =1
B
BXj=1
YjP�!Zy dG(y) = E (Y )
as B !1. So if we draw a large sample from G, we can use the
sample mean Y n to approximate E (Y ). In a simulation, we can
make B as large as we like in which case, the di�erence between
Y n and E (Y ) is negligible. More generally, if h is any function
with �nite mean then
1
B
BXj=1
h(Yj)P�!Zh(y)dG(y) = E (h(Y ))
as B !1. In particular,
1
B
BXj=1
(Yj�Y )2 =1
B
BXj=1
Y 2j��1
B
BXj=1
Yj
�2P�!Zy2dF (y)�
�ZydF (y)
�2
= V(Y ):
Hence, we can use the sample variance of the simulated values
to approximate V(Y ).
9.2 Bootstrap Variance Estimation
According to what we just learned, we can approximate VbFn(Tn)
by simulation. Now VbFn(Tn) means \the variance of Tn if the
distribution of the data is bFn." How can we simulate from the
distribution of Tn when the data are assumed to have distribu-
tion bFn? The answer is to simulateX�1 ; : : : ; X
�nfrom bFn and then
compute T �n= g(X�
1 ; : : : ; X�n). This constitutes one draw from
the distribution of Tn. The idea is illustrated in the following
diagram:
Real world F =) X1; : : : ; Xn =) Tn = g(X1; : : : ; Xn)
Bootstrap world bFn =) X�1 ; : : : ; X
�n
=) T �n= g(X�
1 ; : : : ; X�n)
9.2 Bootstrap Variance Estimation 131
How do we simulate X�1 ; : : : ; X
�nfrom bFn? Notice that bFn puts
mass 1/n at each data pointX1; : : : ; Xn. Therefore, drawing an
observation from bFn is equivalent to drawing one point
at random from the original data set. Thu, to simulate
X�1 ; : : : ; X
�n� bFn it suÆces to draw n observations with re-
placement from X1; : : : ; Xn. Here is a summary:
Boostrap Variance Estimation
1. Draw X�1 ; : : : ; X
�n� bFn.
2. Compute T �n= g(X�
1 ; : : : ; X�n).
3. Repeat steps 1 and 2, B times, to get T �n;1; : : : ; T
�n;B
.
4. Let
vboot =1
B
BXb=1
T �n;b�
1
B
BXr=1
T �n;b
!2
: (9.1)
Example 9.1 The following pseudo-code shows how to use the
bootstrap to estimate the standard of the median.
132 9. The Bootstrap
Bootstrap for The Median
Given data X = (X(1), ..., X(n)):
T <- median(X)
Tboot <- vector of length B
for(i in 1:N){
Xstar <- sample of size n from X (with replacement)
Tboot[i] <- median(Xstar)
}
se <- sqrt(variance(Tboot))
The following schematic diagram will remind you that we are
using two approximations:
VF (Tn)
not so smallz}|{� V
bFn(Tn)
smallz}|{� vboot:
Example 9.2 Consider the nerve data. Let � = T (F ) =R(x �
�)3dF (x)=�3 be the skewness. The skewness is a measure of
asymmetry. A Normal distribution, for example, has skewness
0. The plug-in estimate of the skewness is
b� = T ( bFn) = R(x� �)3d bFn(x)b�3 =
1n
Pn
i=1(Xi �Xn)3b�3 = 1:76:
To estimate the standard error with the bootstrap we follow the
same steps as with the medin example except we compute the
skewness from each bootstrap sample. When applied to the nerve
data, the bootstrap, based on B = 1000 replications, yields a
standard error for the estimated skewness of .16.
9.3 Bootstrap Con�dence Intervals 133
9.3 Bootstrap Con�dence Intervals
There are several ways to construct bootstrap con�dence in-
tervals. Here we discuss three methods.
Normal Interval. The simplest is the Normal interval
Tn � z�=2 bse boot (9.2)
where bse boot is the bootstrap estimate of the standard error.
This interval is not accurate unless the distribution of Tn is
close to Normal.
Pivotal Intervals. Let � = T (F ) and b�n = T ( bFn) and de�ne the
pivot Rn = b�n � �. Let b��n;1; : : : ;
b��n;B
denote bootstrap replica-
tions of b�n. Let H(r) denote the cdf of the pivot:
H(r) = PF (Rn � r): (9.3)
De�ne C?
n= (a; b) where
a = b�n �H�1�1�
�
2
�and b = b�n �H�1
��2
�: (9.4)
It follows that
P(a � � � b) = P(a� b�n � � � b�n � b� b�n)= P(b�n � b � b�n � � � b�n � a)
= P(b�n � b � Rn � b�n � a)
= H(b�n � a)�H(b�n � b)
= H�H�1
�1�
�
2
���H
�H�1
��2
��= 1�
�
2��
2= 1� �:
Hence, C?
nis an exact 1� � con�dence interval for �. Unfortu-
nately, a and b depend on the unknown distribution H but we
134 9. The Bootstrap
can form a bootstrap estimate of H:
bH(r) =1
B
BXb=1
I(R�n;b� r) (9.5)
where R�n;b
= b��n;b� b�n. Let r�� denote the � sample qauntile
of (R�n;1; : : : ; R
�n;B
) and let ���denote the � sample qauntile of
(��n;1; : : : ; �
�n;B
). Note that r��= ��
�� b�n. It follows that an ap-
proximate 1� � con�dence interval is Cn = (ba;bb) whereba = b�n � bH�1
�1�
�
2
�= b�n � r�1��=2 = 2b�n � ��1��=2bb = b�n � bH�1
��2
�= b�n � r�
�=2 = 2b�n � ���=2:
In summary:
The 1� � bootstrap pivotal con�dence interval is
Cn =�2b�n � b��1��=2; 2b�n � b���=2� : (9.6)
Typically, this is a pointwise, asymptotic con�dence interval.
Theorem 9.3 Under weak conditions on T (F ), PF (T (F ) 2 Cn)!1� � as n!1, where Cn is given in (9.6).
Percentile Intervals. The bootstrap percentile interval is
de�ned by
Cn =����=2; �
�1��=2
�:
The justi�cation for this interval is given in the appendix.
Example 9.4 For estimating the skewness of the nerve data, here
are the various con�dence intervals.
9.3 Bootstrap Con�dence Intervals 135
Method 95% Interval
Normal (1.44, 2.09)
Percentile (1.42, 2.03)
Pivotal (1.48, 2.11)
All these con�dence intervals are approximate. The probabil-
ity that T (F ) is in the interval is not exactly 1 � �. All three
intervals have the same level of accuracy. There are more accu-
rate bootstrap con�dence intervals but they are more compli-
cated and we will not discuss them here.
Example 9.5 (The Plasma Cholesterol Data.) Let us return to the
cholesterol data. Suppose we are interested in the di�erence of
the medians. Pseudo-code for the bootstrap analysis is as follows:
x1 <- first sample
x2 <- second sample
n1 <- length(x1)
n2 <- length(x2)
th.hat <- median(x2) - median(x1)
B <- 1000
Tboot <- vector of length B
for(i in 1:B){
xx1 <- sample of size n1 with replacement from x1
xx2 <- sample of size n2 with replacement from x2
Tboot[i] <- median(xx2) - median(xx1)
}
se <- sqrt(variance(Tboot))
Normal <- (th.hat - 2*se, th.hat + 2*se)
percentile <- (quantile(Tboot,.025), quantile(Tboot,.975))
pivotal <- ( 2*th.hat-quantile(Tboot,.975), 2*th.hat-quantile(Tboot,.025) )
The point estimate is 18.5, the bootstrap standard error is 7.42
and the resulting approximate 95 per cent con�dence intervals
are as follows:
136 9. The Bootstrap
Method 95% Interval
Normal (3.7, 33.3)
Percentile (5.0, 33.3)
Pivotal (5.0, 34.0)
Since these intervals exclude 0, it appears that the second group
has higher cholesterol although there is considerable uncertainty
about how much higher as re ected in the width of the intervals.
The next two examples are based on small sample sizes. In
practice, statistical methods based on very small sample sizes
might not be reliable. We include the examples for their peda-
gogical value but we do want to sound a note of caution about
interpreting the results with some skepticism.
Example 9.6 Here is an example that was one of the �rst used
to illustrate the bootstrap by Bradley Efron, the inventor of the
bootstrap. The data are LSAT scores (for entrance to law school)
and GPA.
LSAT 576 635 558 578 666 580 555 661
651 605 653 575 545 572 594
GPA 3.39 3.30 2.81 3.03 3.44 3.07 3.00 3.43
3.36 3.13 3.12 2.74 2.76 2.88 3.96
Each data point is of the form Xi = (Yi; Zi) where Yi = LSATi
and Zi = GPAi. The law school is interested in the correlation
� =
R R(y � �Y )(z � �Z)dF (y; z)qR
(y � �Y )2dF (y)R(z � �Z)2dF (z)
:
The plug-in estimate is the sample correlation
b� = Pi(Yi � Y )(Zi � Z)qP
i(Yi � Y )2
Pi(Zi � Z)2
:
9.3 Bootstrap Con�dence Intervals 137
The estimated correlation is b� = :776. The bootstrap based on
B = 1000 gives bse = :137. Figure 9.1 shows the data and a his-
togram of the bootstrap replications b��1; : : : ; b��B. This histogram is
an approximation to the sampling distribution of b�. The Normal-
based 95 per cent con�dence interval is :78 � 2 bse = (:51; 1:00)
while the percentile interval is (.46,.96). In large samples, the
two methods will show closer agreement.
Example 9.7 This example is borrowed from An Introduction to
the Bootstrap by B. Efron and R. Tibshirani. When drug com-
panies introduce new medications, they are sometimes requires
to show bioequivalence. This means that the new drug is not
substantially di�erent than the current treatment. Here are data
on eight subjects who used medical patches to infuse a hormone
into the blood. Each subject received three treatments: placebo,
old-patch, new-patch.
subject placebo old new old-placebo new-old
______________________________________________________________
1 9243 17649 16449 8406 -1200
2 9671 12013 14614 2342 2601
3 11792 19979 17274 8187 -2705
4 13357 21816 23798 8459 1982
5 9055 13850 12560 4795 -1290
6 6290 9806 10157 3516 351
7 12412 17208 16570 4796 -638
8 18806 29044 26325 10238 -2719
Let Z = old � placebo and Y = new � old. The Food and
Drug Administration (FDA) requirement for bioequivalence is
that j�j � :20 where
� =E F (Y )
E F (Z):
138 9. The Bootstrap
•
•
•
•
•
•
•
•
•
• •
••
•
•
LSAT
GPA
560 580 600 620 640 660
2.8
3.0
3.2
3.4
0.2 0.4 0.6 0.8 1.0
050
100
150
Bootstrap Samples
FIGURE 9.1. Law school data.
9.4 Bibliographic Remarks 139
The estimate of � is
b� = Y
Z=�452:36342
= �:0713:
The bootstrap standard error is bse = :105. To answer the bioe-
quivalence question, we compute a con�dence interval. From
B = 1000 bootstrap replications we get the 95 per cent inter-
val is (-.24,.15). This is not quite contained in [-.20,.20] so at
the 95 per cent level we have not demonstrated bioequivalence.
Figure 9.2 shows the histogram of the bootstrap values.
9.4 Bibliographic Remarks
The boostrap was invented by Efron (1979). There are several
books on these topics including Efron and Tibshirani (1993),
Davison and Hinkley (1997), Hall (1992), and Shao and Tu
(1995). Also, see section 3.6 of van der Vaart and Wellner (1996).
9.5 Technical Appendix
9.5.1 The Jackknife
There is another method for computing standard errors called
the jackknife, due to Quenouille (1949). It is less computa-
tionally expensive than the bootstrap but is less general. Let
Tn = T (X1; : : : ; Xn) be a statistic and T(�i) denote the statistic
with the ith observation removed. Let T n = n�1P
n
i=1 T(�i). The
jackknife estimate of var(Tn) is
vjack =n� 1
n
nXi=1
(T(�i) � T n)2
and the jackknife estimate of the standard error is bsejack =pvjack. Under suitable conditions on T , it can be shown that
vjack consistently estimates var(Tn) in the sense that vjack=var(Tn)p!
140 9. The Bootstrap
-0.3 -0.2 -0.1 0.0 0.1 0.2
020
4060
80
Bootstrap Samples
FIGURE 9.2. Patch data.
9.6 Excercises 141
1. However, unlike the bootstrap, the jackknife does not produce
consistent estimates of the standard error of sample quantiles.
9.5.2 Justi�cation For The Percentile Interval
Suppose there exists a monotone transformation U = m(T )
such that U � N(�; c2) where � = m(�). We do not suppose we
know the transformation, only that one exist. Let U�b= m(��
n;b).
Let u��be the � sample quantile of the U�
b's. Since a mono-
tone transformation preserves quantiles, we have that u��=2 =
m(���=2). Also, since U � N(�; c2), the �=2 quantile of U is
�� z�=2c. Hence u��=2 = �� z�=2c. Similarly, u�1��=2 = �+ z�=2c.
Therefore,
P(���=2 � � � ��1��=2) = P(m(��
�=2) � m(�) � m(��1��=2))
= P(u��=2 � � � u�1��=2)
= P(U � cz�=2 � � � U + cz�=2)
= P(�z�=2 �U � �
c� z�=2)
= 1� �:
An exact normalizing transformation will rarely exist but there
may exist approximate normalizing transformations.
9.6 Excercises
1. Consider the data in Example 9.6. Find the plug-in esti-
mate of the correlation coeÆcient. Estimate the standard
error using the bootstrap. Find a 95 per cent con�dence
inerval using all three methods.
2. (Computer Experiment.) Conduct a simulation to compare
the four bootstrap con�dence interval methods. Let n =
50 and let T (F ) =R(x � �)3dF (x)=�3 be the skewness.
Draw Y1; : : : ; Xn � N(0; 1) and set Xi = eYi, i = 1; : : : ; n.
Construct the four types of bootstrap 95 per cent intervals
for T (F ) from the data X1; : : : ; Xn. Repeat this whole
142 9. The Bootstrap
thing many times and estimate the true coverage of the
four intervals.
3. Let
X1; : : : ; Xn � t3
where n = 25. Let � = T (F ) = (q:75 � q:25)=1:34 where qp
denotes the pth quantile. Do a simulation to compare the
coverage and length of the following con�dence intervals
for �: (i) Normal interval with standard error from the
bootstrap, (ii) bootstrap percentile interval.
Remark: The jackknife does not give a consistent estima-
tor of the variance of a quantile.
4. Let X1; : : : ; Xn be distinct observations (no ties). Show
that there are �2n� 1
n
�distinct bootstrap samples.
Hint: Imagine putting n balls into n buckets.
5. LetX1; : : : ; Xn be distinct observations (no ties). LetX�1 ; : : : ; X
�n
denote a bootstrap sample and let X�n= n�1
Pn
i=1X�i.
Find: E (X�njX1; : : : ; Xn), V(X
�njX1; : : : ; Xn), E (X
�n) and
V(X�n).
6. (Computer Experiment.) Let X1; :::; Xn Normal(�; 1): Let
� = e� and let b� = eX be the mle. Create a data set (using
� = 5) consisting of n=100 observations.
(a) Use the bootstrap to get the se and 95 percent con�-
dence interval for �:
(b) Plot a histogram of the bootstrap replications for the
parametric and nonparametric bootstraps. These are es-
timates of the distribution of b�. Compare this to the true
sampling distribution of b�.
9.6 Excercises 143
7. LetX1; :::; Xn Unif(0; �):The mle is b� = Xmax = maxfX1; :::; Xng.Generate a data set of size 50 with � = 1:
(a) Find the distribution of b�. Compare the true distri-
bution of b� to the histograms from the parametric and
nonparametric bootstraps.
(b) This is a case where the nonparametric bootstrap does
very poorly. In fact, we can prove that this is the case.
Show that, for the parametric bootstrap P (b�� = b�) = 0
but for the nonparametric bootstrap P (b�� = b�) � :632.
Hint: show that, P (b�� = b�) = 1 � (1� (1=n))n then take
the limit as n gets large.
8. Let Tn = X2
n, � = E (X1), �k =
Rjx � �jkdF (x) andb�k = n�1
Pn
i=1 jXi �Xnjk. Show that
vboot =4X
2
nb�2
n+
4Xnb�3
n2+b�4
n3:
144 9. The Bootstrap
10
Parametric Inference
We now turn our attention to parametric models, that is,
models of the form
F =nf(x; �) : � 2 �
o(10.1)
where the � � Rk is the parameter space and � = (�1; : : : ; �k)
is the parameter. The problem of inference then reduces to the
problem of estimating the parameter �.
Students learning statistics often ask: how would we ever
know that the distribution that generated the data is in some
parametric model? This is an excellent question. Indeed, we
would rarely hav e such knowledge which is why nonparametric
methods are preferable. Still, studying methods for parametric
models is useful for two reasons. First, there are some cases
where background knowledge suggests that a parametric model
provides a reasonable approximation. F or example, counts of
traÆc accidents are known from prior experience to follow ap-
proximately a P oissonmodel. Second, the inferential concepts
146 10. Parametric Inference
for parametric models provide background for understanding
certain nonparametric methods.
We will begin with a brief dicussion about parameters of in-
terest and nuisance parameters in the next section, then we will
discuss two methods for estimating �, the method of moments
and the method of maximum likelihood.
10.1 Parameter of Interest
Often, we are only interested in some function T (�). For ex-
ample, if X � N(�; �2) then the parameter is � = (�; �). If our
goal is to estimate � then � = T (�) is called the parameter
of interest and � is called a nuisance parameter. The pa-
rameter of interest can be a complicated function of � as in the
following example.
Example 10.1 Let X1; : : : ; Xn � Normal(�; �2). The parameter
is � = (�; �) is � = f(�; �) : � 2 R; � > 0g. Suppose that Xi
is the outcome of a blood test and suppose we are interested in
� , the fraction of the population whose test score is larger than
1. Let Z denote a standard Normal random variable. Then
� = P(X > 1) = 1� P(X < 1) = 1� P
�X � �
�<
1� �
�
�= 1� P
�Z <
1� �
�
�= 1� �
�Z <
1� �
�
�:
The parameter of interest is � = T (�; �) = 1��((1� �)=�). �
Example 10.2 Recall that X has a Gamma(�; �) distribution if
f(x; �; �) =1
���(�)x��1e�x=�; x > 0
where �; � > 0 and
�(�) =
Z 1
0
y��1e�ydy
10.2 The Method of Moments 147
is the Gamma function. The parameter is � = (�; �). The Gamma
distribution is sometimes used to model lifetimes of people, an-
imals, and electronic equipment. Suppose we want to estimate
the mean lifetime. Then T (�; �) = E �(X1) = ��. �
10.2 The Method of Moments
The �rst method for generating parametric estimators that
we will study is called the method of moments. We will see
that these estimators are not optimal but they are often easy to
compute. They are are also useful as starting values for other
methods that require iterative numerical routines.
Suppose that the parameter � = (�1; : : : ; �k) has k compo-
nents. For 1 � j � k, de�ne the jth moment
�j � �j(�) = E �(Xj) =
ZxjdF�(x) (10.2)
and the jth sample moment
b�j = 1
n
nXi=1
Xj
i: (10.3)
De�nition 10.3 Themethod of moments estimator b�nis de�ned to be the value of � such that
�1(b�n) = b�1
�2(b�n) = b�2
......
...
�k(b�n) = b�k: (10.4)
Formula (10.4) de�nes a system of k equations with k un-
knowns.
148 10. Parametric Inference
Example 10.4 Let X1; : : : ; Xn � Bernoulli(p). Then �1 = E p(X) =
p and b�1 = n�1P
n
i=1Xi. By equating these we get the estimator
bpn = 1
n
nXi=1
Xi: �
Example 10.5 Let X1; : : : ; Xn � Normal(�; �2). Then, �1 = E �(X1) =
� and �2 = E �(X21 ) = V�(X1) + (E �(X1))
2 = �2 + �2. We needRecall that V(X) =
E (X2) � (E (X))2.
Hence, E (X2) =
V(X) + (E (X))2.
to solve the equations
b� =1
n
nXi=1
Xi
b�2 + b�2 =1
n
nXi=1
X2i:
This is a system of 2 equations with 2 unknowns. The solution
is
b� = Xn
b�2 =1
n
nXi=1
(Xi �Xn)2: �
Theorem 10.6 Let b�n denote the method of moments estimator.
Under the the conditions given in the appendix, the following
statements hold:
1. The estimate b�n exists with probability tending to 1.
2. The estimate is consistent: b�n P�! �.
3. The estimate is asymptotically Normal:
pn(b�n � �) N(0;�)
where
� = gE �(Y YT )gT ;
Y = (X;X2; : : : ; Xk)T , g = (g1; : : : ; gk) and gj = @��1j(�)=@�.
10.3 Maximum Likelihood 149
The last statement in the Theorem above can be used to �nd
standard errors and con�dence intervals. However, there is an
easier way: the bootstrap. We defer discussion of this until the
end of the chapter.
10.3 Maximum Likelihood
The most common method for estimating parameters in a
parametric model is the maximum likelihood method. Let
X1; : : : ; Xn be iid with pdf f(x; �).
De�nition 10.7 The likelihood function is de�ned by
Ln(�) =nYi=1
f(Xi; �): (10.5)
The log-likelihood function is de�ned by `n(�) = logLn(�).
The likelihood function is just the joint density of the data,
except that we treat it is a function of the parameter �.
Thus Ln : �! [0;1). The likelihood function is not a density
function: in general, it is not true that Ln(�) integrates to 1.
De�nition 10.8 The maximum likelihood estimator
mle , denoted by b�n, is the value of � that maximizes
Ln(�).
The maximum of `n(�) occurs at the same place as the max-
imum of Ln(�), so maximizing the log-likelihood leads to the
same answer as maximizing the likelihood. Often, it is easier to
work with the log-likelihood.
Remark 10.9 If we multiply Ln(�) by any positive constant c (notdepending on �) then this will not change the mle . Hence, we
150 10. Parametric Inference
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
p
bpnLn(p)
FIGURE 10.1. Likelihood function for Bernoulli with n = 20 andPn
i=1Xi = 12. The mle is bpn = 12=20 = 0:6.
shall often be sloppy about dropping constants in the likelihood
function.
Example 10.10 Suppose that X1; : : : ; Xn � Bernoulli(p). The prob-
ability function is f(x; p) = px(1 � p)1�x for x = 0; 1. The un-
known parameter is p. Then,
Ln(p) =nYi=1
f(Xi; p) =
nYi=1
pXi(1� p)1�Xi = pS(1� p)n�S
where S =P
iXi. Hence,
`n(p) = S log p+ (n� S) log(1� p):
Take the derivative of `n(p), set it equal to 0 to �nd that the
mle is bpn = S=n. See Figure 10.1. �
Example 10.11 Let X1; : : : ; Xn � N(�; �2). The parameter is
� = (�; �) and the likelihood function is (ignoring some con-
stants)
Ln(�; �) =Yi
1
�exp
��
1
2�2(Xi � �)2
�
10.3 Maximum Likelihood 151
= ��n exp
(�
1
2�2
Xi
(Xi � �)2
)
= ��n exp
��nS2
2�2
�exp
��n(X � �)2
2�2
�whereX = n�1
PiXi is the sample mean and S2 = n�1
Pi(Xi�
X)2. The last equality above follows from the fact thatP
i(Xi�
�)2 = nS2+n(X��)2 which can be veri�ed by writingP
i(Xi�
�)2 =P
i(Xi�X+X��)2 and then expanding the square. The
log-likelihood is
`(�; �) = �n log � �nS2
2�2�n(X � �)2
2�2:
Solving the equations
@`(�; �)
@�= 0 and
@`(�; �)
@�= 0
we conclude that b� = X and b� = S. It can be veri�ed that these
are indeed global maxima of the likelihood. �
Example 10.12 (A Hard Example.) Here is the example that con-
fuses everyone. Let X1; : : : ; Xn � Unif(0; �). Recall that
f(x; �) =
�1�
0 � x � �
0 otherwise:
Consider a �xed value of �. Suppose � < Xi for some i. Then,
f(Xi; �) = 0 and hence Ln(�) =Q
if(Xi; �) = 0. It follows that
Ln(�) = 0 if any Xi > �. Therefore, Ln(�) = 0 if � < X(n)
where X(n) = maxfX1; : : : ; Xng. Now consider any � � X(n).
For every Xi we then have that f(Xi; �) = 1=� so that Ln(�) =Qif(Xi; �) = ��n. In conclusion,
Ln(�) =� �
1�
�n� � X(n)
0 � < X(n):
See Figure 10.2. Now Ln(�) is strictly decreasing over the inter-
val [X(n);1). Hence, b�n = X(n). �
152 10. Parametric Inference
0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
0.6 0.8 1.0 1.2 1.4
0.0
0.5
1.0
1.5
� = :75 � = 1
� = 1:25
x x
x �
f(x;�)
f(x;�)
f(x;�)
Ln(�)
FIGURE 10.2. Likelihood function for Uniform (0; �). The
vertical lines show the observed data. The �rst three
plots show f(x; �) for three di�erent values of �. When
� < X(n) = maxfX1; : : : ;Xng, as in the �rst plot, f(X(n); �) = 0
and hence Ln(�) =Qn
i=1 f(Xi; �) = 0. Otherwise f(Xi; �) = 1=�
for each i and hence Ln(�) =Qn
i=1 f(Xi; �) = (1=�)n. The last plot
shows the likelihood function.
10.4 Properties of Maximum Likelihood Estimators. 153
10.4 Properties of Maximum Likelihood
Estimators.
Under certain conditions on the model, the maximum likeli-
hood estimator b�n possesses many properties that make it an ap-
pealing choice of estimator. The main properties of the mle are:
(1) It is consistent: b�n P�! �? where �? denotes the true value
of the parameter �;
(2) It is equivariant: if b�n is the mle of � then g(b�n) is themle of g(�);
(3) It is asymptotically Normal:pn(b�� �?)= bse N(0; 1)
where bse can be computed analytically;
(4) It is asymptotically optimal or eÆcient: roughly, this
means that among all well behaved estimators, the mle has the
smallest variance, at least for large samples.
(5) The mle is approximately the Bayes estimator. (To be
explained later.)
We will spend some time explaining what these properties
mean and why they are good things. In suÆciently complicated
problems, these properties will no longer hold and the mle will
no longer be a good estimator. For now we focus on the simpler
situations where the mle works well. The properties we discuss
only hold if the model satis�es certain regularity conditions.
These are essentially smoothness conditions on f(x; �). Unless
otherwise stated, we shall tacitly assume that these con-
ditions hold.
10.5 Consistency of Maximum Likelihood
Estimators.
Consistency means that the mle converges in probability to
the true value. To proceed, we need a de�nition. If f and g are
154 10. Parametric Inference
pdf 's, de�ne the Kullback-Leibler distance between f and
g to be
D(f; g) =
Zf(x) log
�f(x)
g(x)
�dx: (10.6)
It can be shown that D(f; g) � 0 and D(f; f) = 0. For anyThis is not a dis-
tance in the formal
sense.�; 2 � write D(�; ) to mean D(f(x; �); f(x; )). We will
assume that � 6= implies that D(�; ) > 0.
Let �? denote the true value of �. Maximizing `n(�) is equiv-
alent to maximizing
Mn(�) =1
n
Xi
logf(Xi; �)
f(Xi; �?):
By the law of large numbers, Mn(�) converges to
E �?
�log
f(Xi; �)
f(Xi; �?)
�=
Zlog
�f(x; �)
f(x; �?)
�f(x; �?)dx
= �Z
log
�f(x; �?)
f(x; �)
�f(x; �?)dx
= �D(�?; �):
Hence,Mn(�) � �D(�?; �) which is maximized at �? since�D(�?; �?) =
0 and �D(�?; �?) < 0 for � 6= �?. Hence, we expect that the max-
imizer will tend to �?. To prove this formally, we need more than
Mn(�)P�! �D(�?; �). We need this convergence to be uniform
over �. We also have to make sure that the function D(�?; �) is
well behaved. Here are the formal details.
Theorem 10.13 Let �? denote the true value of �. De�ne
Mn(�) =1
n
Xi
logf(Xi; �)
f(Xi; �?)
and M(�) = �D(�?; �). Suppose that
sup�2�
jMn(�)�M(�)j P�! 0 (10.7)
10.6 Equivariance of the mle 155
and that, for every � > 0,
sup�:j���?j��
M(�) < M(�?): (10.8)
Let b�n denote the mle. Then b�n P�! �?.
Proof. See appendix. �
10.6 Equivariance of the mle
Theorem 10.14 Let � = g(�) be a one-to-one function of �. Letb�n be the mle of �. Then b�n = g(b�n) is the mle of � .
Proof. Let h = g�1 denote the inverse of g. Then b�n = h(b�n).For any � , L(�) =
Qif(xi; h(�)) =
Qif(xi; �) = L(�) where
� = h(�). Hence, for ana � , Ln(�) = L(�) � L(b�) = Ln(b�). �Example 10.15 Let X1; : : : ; Xn � N(�; 1). The mle for � is b�n =Xn. Let � = e�. Then, the mle for � is b� = e
b� = eX. �
10.7 Asymptotic Normality
It turns out that b�n is approximately Normal and we can
compute its variance analytically. To explore this, we �rst need
a few de�nitions.
156 10. Parametric Inference
De�nition 10.16 The score function is de�ned to be
s(X; �) =@ log f(X; �)
@�: (10.9)
The Fisher information is de�ned to be
In(�) = V�
nXi=1
s(Xi; �)
!
=
nXi=1
V� (s(Xi; �)) : (10.10)
For n = 1 we will sometimes write I(�) instead of I1(�).
It can be shown that E �(s(X; �)) = 0. It then follows that
V�(s(X; �)) = E �(s2(X; �)). In fact a further simpli�cation of
In(�) is given in the next result.
Theorem 10.17 In(�) = nI(�) and
I(�) = �E ��@2 log f(X; �)
@2�2
�= �
Z �@2 log f(x; �)
@2�2
�f(x; �)dx: (10.11)
10.7 Asymptotic Normality 157
Theorem 10.18 (Asymptotic Normality of the mle .) Under appro-
priate regularity conditions, the following hold:
1. Let se =p1=In(�). Then,
(b�n � �)
se N(0; 1): (10.12)
2. Let bse =
q1=In(b�n). Then,
(b�n � �)bse N(0; 1): (10.13)
The proof is in the appendix. The �rst statement says thatb�n � N(�; se ) where the standard error of b�n is se =p1=In(�).
The second statement says that this is still true even if we re-
place the standard error by its estimated standard error bse =q1=In(b�n).Informally, the theorem says that the distribution of themle can
be approximated withN(�; bse2). From this fact we can construct
an (asymptotic) con�dence interval.
Theorem 10.19 Let
Cn =�b�n � z�=2 bse ; b�n + z�=2 bse �:
Then, P�(� 2 Cn)! 1� � as n!1.
Proof. Let Z denote a standard normal random variable.
Then,
P�(� 2 Cn) = P�
�b�n � z�=2 bse � � � b�n + z�=2 bse�= P�
�z�=2 �
b�n � �bse � z�=2
!
158 10. Parametric Inference
! P(�z�=2 < Z < z�=2) = 1� �: �
For � = :05, z�=2 = 1:96 � 2, so:
b�n � 2 bse (10.14)
is an approximate 95 per cent con�dence interval.
When you read an opinion poll in the newspaper, you often
see a statement like: the poll is accurate to within one point,
95 per cent of the time. They are simply giving a 95 per cent
con�dence interval of the form b�n � 2 bse .Example 10.20 Let X1; : : : ; Xn � Bernoulli(p). The mle is bpn =P
iXi=n and f(x; p) = px(1� p)1�x, log f(x; p) = x log p+ (1�
x) log(1�p), s(X; p) = (x=p)� (1�x)=(1�p), and �s0(X; p) =
(x=p2) + (1� x)=(1� p)2. Thus,
I(p) = E p(�s0(X; p)) =p
p2+
(1� p)
(1� p)2=
1
p(1� p):
Hence,
bse =1pIn(bpn) = 1p
nI(bpn) =�bp(1� bp)
n
�1=2
:
An approximate 95 per cent con�dence interval is
bpn � 2
�bp(1� bp)n
�1=2
:
Compare this with the Hoe�ding interval. �
Example 10.21 Let X1; : : : ; Xn � N(�; �2) where �2 is known.
The score function is s(X; �) = (X � �)=�2 and s0(X; �) =
�1=�2 so that I1(�) = 1=�2. The mle is b�n = Xn. According
to Theorem 10.18, Xn � N(�; �2=n). In fact, in this case, the
distribution is exact. �
10.8 Optimality 159
Example 10.22 Let X1; : : : ; Xn � Poisson(�). Then b�n = Xn
and some calculations show that I1(�) = 1=�, so
bse =1q
nI1(b�n) =sb�n
n:
Therefore, an approximate 1 � � con�dence interval for � isb�n � z�=2
qb�n=n. �10.8 Optimality
Suppose that X1; : : : ; Xn � N(�; �2). The mle is b�n = Xn.
Another reasonable estimator is the sample median e�n. Themle satis�es p
n(b�n � �) N(0; �2):
It can be proved that the median satis�es
pn(e�n � �) N
�0; �2
�
2
�:
This means that the median converges to the right value but
has a larger variance than the mle .
More generally, consider two estimators Tn and Un and sup-
pose that pn(Tn � �) N(0; t2)
and that pn(Un � �) N(0; u2):
We de�ne the asymptotic relative eÆciency of U to T by ARE(U; T ) =
t2=u2. In the Normal example, ARE(e�n; b�n) = 2=� = :63. The
interpretation is that if you use the median, you are only using
about 63 per cent of the available data.
Theorem 10.23 If b�n is the mle and e�n is any other estimator
then The result is ac-
tually more subtle
than this but we
needn't worry about
the details.
160 10. Parametric Inference
ARE(e�n; b�n) � 1:
Thus, the mle has the smallest (asymptotic) variance and we
say that mle is eÆcient or asymptotically optimal.
This result is predicated upon the assumed model being cor-
rect. If the model is wrong, the mle may no longer be optimal.
We will discuss optimality in more generality when we discuss
decision theory.
10.9 The Delta Method.
Let � = g(�) where g is a smooth function. The maximum
likelihood estimator of � is b� = g(b�). Now we address the fol-
lowing question: what is the distribution of b�?Theorem 10.24 (The Delta Method) If � = g(�) where g is di�er-
entiable and g0(�) 6= 0 then
pn(b�n � �)bse (b� ) N(0; 1) (10.15)
where b�n = g(b�n) andbse (b�n) = jg0(b�)j bse (b�n) (10.16)
Hence, if
Cn =�b�n � z�=2 bse (b�n); b�n + z�=2 bse (b�n)� (10.17)
then P�(� 2 Cn)! 1� � as n!1.
Example 10.25 Let X1; : : : ; Xn � Bernoulli(p) and let = g(p) =
log(p=(1�p)). The Fisher information function is I(p) = 1=(p(1�p)) so the standard error of the mle bpn is se = fbpn(1� bpn)=ng1=2.
10.9 The Delta Method. 161
The mle of is b = log bp=(1� bp). Since, g0(p) = 1=(p=(1�p)),according to the delta method
bse ( b n) = jg0(bpn)j bse (bpn) = 1pnbpn(1� bpn) :
An approximate 95 per cent con�dence interval is
b n � 2pnbpn(1� bpn) : �
Example 10.26 Let X1; : : : ; Xn � N(�; �2). Suppose that � is
known, � is unknown and that we want to estimate = log�.
The log-likelihood is `(�) = �n log � � 12�2
Pi(xi � �)2. Di�er-
entiate and set equal to 0 and conclude that
b� =
�Pi(Xi � �)2
n
�1=2
:
To get the standard error we need the Fisher information. First,
log f(X; �) = � log � �(X � �)2
2�2
with second derivative
1
�2�
3(X � �)2
�4
and hence
I(�) = �1
�2+
3�2
�4=
2
�2:
Hence, bse = b�n=p2n: Let = g(�) = log(�). Then, b n =
log b�n. Since, g0 = 1=�,
bse ( b n) = 1b� b�p2n
=1p2n
and an approximate 95 per cent con�dence interval is b n �2=p2n. �
162 10. Parametric Inference
10.10 Multiparameter Models
These ideas can directly be extended to models with several
parameters. Let � = (�1; : : : ; �k) and let b� = (b�1; : : : ; b�k) be themle . Let `n =
Pn
i=1 log f(Xi; �),
Hjj =@2`n
@�2j
and Hjk =@2`n
@�j@�k:
De�ne the Fisher Information Matrix by
In(�) = �
26664E �(H11) E �(H12) � � � E �(H1k)
E �(H21) E �(H22) � � � E �(H2k)...
......
...
E �(Hk1) E �(Hk2) � � � E �(Hkk)
37775 : (10.18)
Let Jn(�) = I�1n(�) be the inverse of In.
Theorem 10.27 Under appropriate regularity conditions,
pn(b� � �) � N(0; Jn):
Also, if b�j is the jth component of b�, thenpn(b�j � �j)bse j N(0; 1)
where bse 2j= Jn(j; j) is the j
th diagonal element of Jn. The ap-
proximate covariance of b�j and b�k is Cov(b�j; b�k) � Jn(j; k).
There is also a multiparameter delta method. Let � = g(�1; : : : ; �k)
be a function and let
rg =
0BBBBB@@g@�1.
.
.
@g@�k
1CCCCCAbe the gradient of g.
10.11 The Parametric Bootstrap 163
Theorem 10.28 (Multiparameter delta method) Suppose that
rg evaluated at b� is not 0. Let b� = g(b�). Thenpn(b� � �)bse (b�) N(0; 1)
where bse (b�) =q(brg)T bJn(brg); (10.19)
bJn = Jn(b�n) and brg is rg evaluated at � = b�.Example 10.29 Let X1; : : : ; Xn � N(�; �2). Let � = g(�; �) =
�=�. In homework question 8 you will show that
In(�; �) =
�n
�20
0 2n�2
�:
Hence,
Jn = I�1n(�; �) =
1
n
��2 0
0 �2
2
�:
The gradient of g is
rg =
� �
�2
1�
!:
Thus,
bse (b�) =q(brg)T bJn(brg) = 1pn
�1b�4 + b�2
2b�2�1=2
: �
10.11 The Parametric Bootstrap
For parametric models, standard errors and con�dence in-
tervals may also be estimated using the bootstrap. There is
only one change. In the nonparametric bootstrap, we sampled
164 10. Parametric Inference
X�1 ; : : : ; X
�nfrom the empirical distribution bFn. In the paramet-
ric bootstrap we sample instead from f(x; b�n). Here, b�n could
be the mle or the method of moments estimator.
Example 10.30 Consider example 10.29. To get to bootstrap stan-
dard error, simulate X1; : : : ; X�n� N(b�; b�2), compute b�� =
n�1P
iX�iand b�2� = n�1
Pi(X�
i� b��)2. Then compute b� � =
g(b��; b��) = b��=b��. Repeating this B times yields bootstrap repli-
cations b� �1 ; : : : ; b� �Band the estimated standard error is
bse boot =
sPB
b=1(b� �b � b�)2B
: �
The bootstrap is much easier than the delta method. On the
other hand, the delta method has the advantage that it gives a
closed form expression for the standard error.
10.12 Technical Appendix
10.12.1 Proofs
Proof of Theorem 10.13. Since b�n maximizes Mn(�), we
have Mn(b�n) �Mn(�?). Hence,
M(�?)�M(b�n) = Mn(�?)�M(b�n) +M(�?)�Mn(�?)
� Mn(b�)�M(b�n) +M(�?)�Mn(�?)
� sup�
jMn(�)�M(�)j+M(�?)�Mn(�?)
P�! 0
where the last line follows from (10.7). It follows that, for any
Æ > 0,
P
�M(b�n) < M(�?)� Æ
�! 0:
10.12 Technical Appendix 165
Pick any � > 0. By (10.8), there exists Æ > 0 such that j���?j � �
implies that M(�) < M(�?)� Æ. Hence,
P(jb�n � �?j > �) � P
�M(b�n) < M(�?)� Æ
�! 0: �
Next we want to prove Theorem 10.18. First we need a Lemma.
Lemma 10.31 The score function satis�es
E � [s(X; �)] = 0:
Proof. Note that 1 =Rf(x; �)dx. Di�erentiate both sides
of this equation to conclude that
0 =@
@�
Zf(x; �)dx =
Z@
@�f(x; �)dx
=
Z @f(x;�)
@�
f(x; �)f(x; �)dx =
Z@ log f(x; �)
@�f(x; �)dx
=
Zs(x; �)f(x; �)dx = E �s(X; �): �
Proof of Theorem 10.18. Let `(�) = logL(�). Then,
0 = `0(b�) � `0(�) + (b� � �)`00(�):
Rearrange the above equation to get b�� � = �`0(�)=`00(�) or, inother words,
pn(b� � �) =
1pn`0(�)
� 1n`00(�)
�TOP
BOTTOM:
Let Yi = @ log f(Xi; �)=@�. Recall that E (Yi) = 0 from the pre-
vious Lemma and also V(Yi) = I(�). Hence,
TOP = n�1=2Xi
Yi =pnY =
pn(Y � 0) W � N(0; I)
166 10. Parametric Inference
by the central limit theorem. Let Ai = �@2 log f(Xi; �)=@�2.
Then E (Ai) = I(�) and
BOTTOM = AP�! I(�)
by the law of large numbers. Apply Theorem 6.5 part (e), to
conclude that
pn(b� � �)
W
I(�)
d= N
�0;
1
I(�)
�:
Assuming that I(�) is a continuous function of �, it follows that
I(b�n) P�! I(�). Now
b�n � �bse =pnI1=2(b�n)(b�n � �)
=np
nI1=2(�)(b�n � �)o(I(b�n)
I(�)
)1=2
:
The �rst terms tends in distribution to N(0,1). The second term
tends in probability to 1. The result follows from Theorem 6.5
part (e). �
Outline of Proof of Theorem 10.24. Write,
b� = g(b�) � g(�) + (b� � �)g0(�) = � + (b� � �)g0(�):
Thus, pn(b� � �) �
pn(b� � �)g0(�)
and hence pnI(�)(b� � �)
g0(�)�pnI(�)(b� � �):
Theorem 10.18 tells us that the right hand side tends in distri-
bution to a N(0,1). Hence,pnI(�)(b� � �)
g0(�) N(0; 1)
10.12 Technical Appendix 167
or, in other words, b� � N��; se 2(b�n)�
where
se 2(b�n) = (g0(�))2
nI(�):
The result remains true if we substitute b� for � by Theorem 6.5
part (e). �
10.12.2 SuÆciency
A statistic is a function T (Xn) of the data. A suÆcient statis-
tic is a statistic that contains all the information in the data.
To make this more formal, we need some de�ntions.
De�nition 10.32 Write xn $ yn if f(xn; �) = c f(yn; �)
for some constant c that might depend on xn and yn but
not �. A statistic T (xn) is suÆcient if T (xn) $ T (yn)
implies that xn $ yn.
Notice that if xn $ yn then the likelihood function based on
xn has the same shape as the likelihood function based on yn.
Roughly speaking, a statistic is suÆcient if we can calculate the
likelihood function knowing only T (Xn).
Example 10.33 Let X1; : : : ; Xn � Bernoulli(p). Then L(p) =
pS(1� p)n�S where S =P
iXi so S is suÆcient. �
Example 10.34 Let X1; : : : ; Xn � N(�; �) and let T = (X;S).
Then,
f(Xn;�; �) =Yi
f(Xi;�; �)
=Yi
1
�p2�
exp
(�
1
2�2
Xi
(Xi � �)2
)
=
�1
�p2�
�nexp
��nS2
2�2
�exp
��n(X � �)2
2�2
�
168 10. Parametric Inference
The last expression depends on the data only through T and
therefore, T = (X;S) is a suÆcient statistic. Note that U =
(17X;S) is also a suÆcient statistic. If I tell you the value of U
then you can easily �gure out T and then compute the likelihood.
SuÆcient statistics are far from unique. Consider the following
statistics for the N(�; �2) model:
T1(Xn) = (X1; : : : ; Xn)
T2(Xn) = (X;S)
T3(Xn) = X
T4(Xn) = (X;S;X3):
The �rst statistic is just the whole data set. This is suÆcient.
The second is also suÆcient as we proved above. The third is not
suÆcient: you can't compute L(�; �) if I only tell you X. The
fourth statistic T4 is suÆcient. The statistics T1 and T4 are suÆ-
cient but they contain redundant information. Intuitively, there
is a sense in which T2 is a \more concise" suÆcient statistic
than either T1 or T4. We can express this formally by noting
that T2 is a function of T1 and similarly, T2 is a function of T4.
For example, T2 = g(T4) where g(a1; a2; a3) = (a1; a2). �
De�nition 10.35 A statistic T is minimally suÆcient if
(i) it is suÆcient and (ii) it is a function of every other
suÆcient statistic.
Theorem 10.36 T is minimal suÆcient if T (xn) = T (yn) if and
only if xn $ yn.
A statistic induces a partition on the set if outcomes. We can
think of suÆciency in terms of these partitions.
10.12 Technical Appendix 169
Example 10.37 Let X1; X2; X3 � Bernoulli(�). Let V = X1,
T =P
iXi and U = (T;X1). Here is the set of outcomes and
the statistics:
X1 X2 V T U
0 0 0 0 (0,0)
0 1 0 1 (1,0)
1 0 1 1 (1,1)
1 1 1 2 (2,1)
The partitions induced by these statistics are:
V �! f(0; 0); (0; 1)g; f(1; 0); (1; 1)gT �! f(0; 0)g; f(0; 1); (1; 0)g; f(1; 1)gU �! f(0; 0)g; f(0; 1)g; f(1; 0)g; f(1; 1)g:
Then V is not suÆcient but T and U are suÆcient. T is min-
imal suÆcient; U is not minimal since if xn = (1; 0; 1) and
yn = (0; 1; 1), then xn $ yn yet U(xn) 6= U(yn). The statistic
W = 17T generates the same partition as T . It is also minimal
suÆcient. �
Example 10.38 For a N(�; �2) model, T = (X;S) is a minimally
suÆcient statistic. For the Bernoulli model, T =P
iXi is a
minimally suÆcient statistic. For the Poisson model, T =P
iXi
is a minimally suÆcient statistic. Check that T = (P
iXi; X1)
is suÆcient but not minimal suÆcient. Check that T = X1 is
not suÆcient. �
I did not give the usual de�nition of suÆciency. The usual
de�nition is this: T is suÆcient if the distribution of Xn given
T (Xn) = t does not depend on �.
Example 10.39 Two coin ips. Let X = (X1; X2) � Bernoulli(p).
Then T = X1 + X2 is suÆcient. To see this, we need the dis-
tribution of (X1; X2) given T = t. Since T can take 3 possible
values, there are 3 conditional distributions to check. They are:
170 10. Parametric Inference
(i) the distribution of (X1; X2) given T = 0:
P (X1 = 0; X2 = 0jt = 0) = 1; P (X1 = 0; X2 = 1jt = 0) = 0;
P (X1 = 1; X2 = 0jt = 0) = 0; P (X1 = 1; X2 = 1jt = 0) = 0
(ii) the distribution of (X1; X2) given T = 1:
P (X1 = 0; X2 = 0jt = 1) = 0; P (X1 = 0; X2 = 1jt = 1) =1
2;
P (X1 = 1; X2 = 0jt = 1) =1
2; P (X1 = 1; X2 = 1jt = 1) = 0
(iii) the distribution of (X1; X2) given T = 2:
P (X1 = 0; X2 = 0jt = 2) = 0; P (X1 = 0; X2 = 1jt = 2) = 0;
P (X1 = 1; X2 = 0jt = 2) = 0; P (X1 = 1; X2 = 1jt = 2) = 1:
None of these depend on the parameter p. Thus, the distribution
of X1; X2jT does not depend on � so T is suÆcient. �
Theorem 10.40 (Factorization Theorem) T is suÆcient if and
only if there are functions g(t; �) and h(x) such that f(xn; �) =
g(t(xn); �)h(xn).
Example 10.41 Return to the two coin ips. Let t = x1 + x2.
Then
f(x1; x2; �) = f(x1; �)f(x2; �)
= �x1(1� �)1�x1�x2(1� �)1�x2
= g(t; �)h(x1; x2)
where g(t; �) = �t(1 � �)2�t and h(x1; x2) = 1. Therefore, T =
X1 +X2 is suÆcient. �
Now we discuss an implication of suÆciency in point estima-
tion. Let b� be an estimator of �. The Rao-Blackwell theorem says
that an estimator should only depend on the suÆcient statistic,
otherwise it can be improved. Let R(�; b�) = E �[(� � b�)2] denotethe mse of the estimator.
10.12 Technical Appendix 171
Theorem 10.42 (Rao-Blackwell) Let b� be an estimator and
let T be a suÆcient statistic. De�ne a new estimator by
e� = E (b�jT ):Then, for every �, R(�; e�) � R(�; b�):Example 10.43 Consider ipping a coin twice. Let b� = X1. This
is a well de�ned (and unbiased) estimator. But it is not a func-
tion of the suÆcient statistic T = X1 + X2. However, note
that e� = E (X1jT ) = (X1 +X2)=2. By the Rao-Blackwell Theo-
rem, e� has MSE at least as small as b� = X1. The same applies
with n coin ips. Again de�ne b� = X1 and T =P
iXi. Thene� = E (X1jT ) = n�1
PiXi has improved mse . �
10.12.3 Exponential Families
Most of the parametric models we have studied so far are spe-
cial cases of a general class of models called exponential families.
We say that ff(x; �; � 2 �g is a one-parameter exponential
family if there are functions �(�), B(�), T (x) and h(x) such
that
f(x; �) = h(x)e�(�)T (x)�B(�):
It is easy to see that T (X) is suÆcient. We call T the natural
suÆcient statistic.
Example 10.44 Let X � Poisson(�). Then
f(x; �) =�xe��
x!=
1
x!ex log ���
and hence, this is an exponential family with �(�) = log �, B(�) =
�, T (x) = x, h(x) = 1=x!. �
Example 10.45 Let X � Bin(n; �). Then
f(x; �) =
�n
x
��x(1��)n�x =
�n
x
�exp
�x log
��
1� �
�+ n log(1� �)
�:
172 10. Parametric Inference
In this case,
�(�) = log
��
1� �
�; B(�) = �n log(�)
and
T (x) = x; h(x) =
�n
x
�:
�
We can rewrite an exponential family as
f(x; �) = h(x)e�T (x)�A(�)
where � = �(�) is called the natural parameter and
A(�) = log
Zh(x)e�T (x)dx:
For example a Poisson can be written as f(x; �) = e�x�e�
=x!
where the natural parameter is � = log �.
LetX1; : : : ; Xn be iid from a exponential family. Then f(xn; �)
is an exponential family:
f(xn; �) = hn(xn)hn(x
n)e�(�)Tn(xn)�Bn(�)
where hn(xn) =
Qih(xi), Tn(x
n) =P
iT (xi) and Bn(�) =
nB(�). This implies thatP
iT (Xi) is suÆcient.
Example 10.46 Let X1; : : : ; Xn � Uniform(0; �). Then
f(xn; �) =1
�nI(x(n) � �)
where I is 1 if the term inside the brackets is true and 0 other-
wise, and x(n) = maxfx1; : : : ; xng. Thus T (Xn) = maxfX1; : : : ; Xngis suÆcient. But since T (Xn) 6=
PiT (Xi), this cannot be an ex-
ponential family. �
10.12 Technical Appendix 173
Theorem 10.47 Let X have density in an exponential family.
Then,
E (T (X)) = A0(�); V(T (X)) = A00(�):
If � = (�1; : : : ; �k) is a vector, then we say that f(x; �) has
exponential family form if
f(x; �) = h(x) exp
(kXj=1
�j(�)Tj(x)� B(�)
):
Again, T = (T1; : : : ; Tk) is suÆcient and n iid samples also has
exponential form with suÆcient statistic (P
iT1(Xi); : : : ;
PiTk(Xi)).
Example 10.48 Consider the normal family with � = (�; �). Now,
f(x; �) = exp
��
�2x�
x2
2�2�
1
2
��2
�2+ log(2��2)
��:
This is exponential with
�1(�) =�
�2; T1(x) = x
�2(�) = �1
2�2; T2(x) = x2
B(�) =1
2
��2
�2+ log(2��2)
�; h(x) = 1:
Hence, with n iid samples, (P
iXi;P
iX2i) is suÆcient. �
As before we can write an exponential family as
f(x; �) = h(x) exp�T T (x)� � A(�)
where A(�) =
Rh(x)eT
T (x)�dx. It can be shown that
E (T (X)) = _A(�) V(T (X)) = �A(�)
where the �rst expression is the vector of partial derivatives and
the second is the matrix of second derivatives.
174 10. Parametric Inference
10.13 Exercises
1. Let X1; : : : ; Xn � Gamma(�; �). Find the method of mo-
ments estimator for � and �.
2. Let X1; : : : ; Xn � Uniform(a; b) where a and b are un-
known parameters and a < b.
(a) Find the method of moments estimators for a and b.
(b) Find the mle ba and bb.(c) Let � =
RxdF (x). Find the mle of � .
(d) Let b� be the mle from (1bc). Let e� be the nonpara-
metric plug-in estimator of � =RxdF (x). Suppose that
a = 1, b = 3 and n = 10. Find the mse of b� by simulation.Find the mse of e� analytically. Compare.
3. Let X1; : : : ; Xn � N(�; �2). Let � be the .95 percentile,
i.e. P(X < �) = :95.
(a) Find the mle of � .
(b) Find an expression for an approximate 1�� con�dence
interval for � .
(c) Suppose the data are:
3.23 -2.50 1.88 -0.68 4.43 0.17
1.03 -0.07 -0.01 0.76 1.76 3.18
0.33 -0.31 0.30 -0.61 1.52 5.43
1.54 2.28 0.42 2.33 -1.03 4.00
0.39
Find the mle b� . Find the standard error using the delta
method. Find the standard error using the parametric
bootstrap.
10.13 Exercises 175
4. Let X1; : : : ; Xn � Uniform(0; �). Show that the mle is
consistent. Hint: Let Y = maxfX1; :::; Xng:. For any c,
P(Y < c) = P(X1 < c;X2 < c; :::; Xn < c) = P(X1 <
c)P(X2 < c):::P(Xn < c).
5. Let X1; : : : ; Xn � Poisson(�). Find the method of mo-
ments estimator, the maximum likelihood estimator and
the Fisher information I(�).
6. Let X1; :::; Xn � N(�; 1). De�ne
Yi =
�1 if Xi > 0
0 if Xi � 0:
Let = P(Y1 = 1):
(a) Find the maximum likelihood estimate b of .
(b) Find an approximate 95 per cent con�dence interval
for :
(c) De�ne e = (1=n)P
iYi. Show that e is a consistent
estimator of .
(d) Compute the asymptotic relative eÆciency of e to b .Hint: Use the delta method to get the standard error of the
mle . Then compute the standard error (i.e. the standard
deviation) of e .(e) Suppose that the data are not really normal. Show thatb is not consistent. What, if anything, does b converge to?
7. (Comparing two treatments.) n1 people are given treat-
ment 1 and n2 people are given treatment 2. Let X1 be
the number of people on treatment 1 who respond fa-
vorably to the treatment and let X2 be the number of
people on treatment 2 who respond favorably. Assume
that X1 � Binomial(n1; p1) X2 � Binomial(n2; p2). Let
= p1 � p2.
176 10. Parametric Inference
(a) Find the mle for .
(b) Find the Fisher information matrix I(P1; p2).
(c) Use the multiparameter delta method to �nd the asymp-
totic standard error of b .(d) Suppose that n1 = n2 = 200, X1 = 160 and X2 = 148.
Find b . Find an approximate 90 percent con�dence inter-
val for using (i) the delta method and (ii) the parametric
bootstrap.
8. Find the Fisher information matrix for Example 10.29.
9. Let X1; :::; Xn Normal(�; 1): Let � = e� and let b� = eX
be the mle. Create a data set (using � = 5) consisting of
n=100 observations.
(a) Use the delta method to get bse and 95 percent con�-
dence interval for �. Use the parametric bootstrap to getbse and 95 percent con�dence interval for �. Use the non-
parametric bootstrap to get bse and 95 percent con�dence
interval for �: Compare your answers.
(b) Plot a histogram of the bootstrap replications for the
parametric and nonparametric bootstraps. These are esti-
mates of the distribution of b�. The delta method also gives
an approximation to this distribution namely, Normal(b�; se 2).
Compare these to the true sampling distribution of b� (whihcyou can get by simulation). Which approximation, para-
metric bootstrap, bootstrap, or delta method is closer to
the true distribution?
10. LetX1; :::; Xn Unif(0; �): Themle is b� = X(n) = maxfX1; :::; Xng.Generate a data set of size 50 with � = 1:
(a) Find the distribution of b� analytically. Compare the
true distribution of b� to the histograms from the paramet-
ric and nonparametric bootstraps.
10.13 Exercises 177
(b) This is a case where the nonparametric bootstrap
does very poorly. Show that, for the parametric bootstrap
P(b�� = b�) = 0 but for the nonparametric bootstrap P(b�� =b�) � :632. Hint: show that, P(b�� = b�) = 1� (1� (1=n))n
then take the limit as n gets large. What is the implication
of this?
178 10. Parametric Inference
11
Hypothesis Testing and p-values
Suppose we want to know if a certain chemical causes cancer.
We take some rats and randomly divide them into two groups.
We expose one group to the chemical and then we compare
the cancer rate in the two groups. Consider the following two
h ypotheses:
The Null Hypothesis: The cancer rate is the same
in the two groups
The Alternative Hypothesis: The cancer rate is
not the same in the two groups.
If the exposed group has a much higher rate of cancer than the
unexposed group then we will reject the null hypothesis and
conclude that the evidence favors the alternative h ypothesis;
in other words we will conclude that there is evidence that the
chemical causes cancer. This is an example of hypothesis testing.
More formally, suppose that we partition the parameter space
� into two disjoint sets �0 and �1 and that we wish to test
H0 : � 2 �0 v ersus H1 : � 2 �1: (11.1)
180 11. Hypothesis Testing and p-values
We call H0 the null hypothesis and H1 the alternative hy-
pothesis.
Let X be a random variable and let X be the range of X. We
test a hypothesis by �nding an appropriate subset of outcomes
R � X called the rejection region. If X 2 R we reject the
null hypothesis, otherwise, we do not reject the null hypothesis:
X 2 R =) reject H0
X =2 R =) retain (do not reject) H0
Usually the rejection region R is of the form
R =nx : T (x) > c
o
where T is a test statistic and c is a critical value. The prob-
lem in hypothesis testing is to �nd an appropriate test statistic
T and an appropriate cuto� value c.
Warning! There is a tendency to use hypthesis testing meth-
ods even when they are not appropriate. Often, estimation and
con�dence intervals are better tools. Use hypothesis testing only
when you want to test a well de�ned hypothesis.
Hypothesis testing is like a legal trial. We asume someone is
innocent unless the evidence strongly suggests that he is guilty.
Similarly, we retain H0 unless there is strong evidence to reject
H0. There are two types of errors when can make. Rejecting H0
when H0 is true is called a type I error. Rejecting H1 when
H1 is true is called a type II error. The possible outcomes for
hypothesis testing are summarized in the Table below:
11. Hypothesis Testing and p-values 181
Retain Null Reject Null
H0 truep
type I error
H1 true type II errorp
Summary of outcomes of hypothesis testing.
De�nition 11.1 The power function of a test with rejection re-
gion R is de�ned by
�(�) = P�(X 2 R): (11.2)
The size of a test is de�ned to be
� = sup�2�0
�(�): (11.3)
A test is said to have level � if its size is less than or equal to
�.
A hypothesis of the form � = �0 is called a simple hypoth-
esis. A hypothesis of the form � > �0 or � < �0 is called a
composite hypothesis. A test of the form
H0 : � = �0 versus H1 : � 6= �0
is called a two-sided test. A test of the form
H0 : � � �0 versus H1 : � > �0
or
H0 : � � �0 versus H1 : � < �0
is called a one-sided test. The most common tests are two-
sided.
182 11. Hypothesis Testing and p-values
Example 11.2 Let X1; : : : ; Xn � N(�; �) where � is known. We
want to test H0 : � � 0 versus H1 : � > 0. Hence, �0 = (�1; 0]
and �1 = (0;1). Consider the test:
reject H0 if T > c
where T = X. The rejection region is R =nxn : T (xn) > c
o.
Let Z denote a standard normal random variable. The power
function is
�(�) = P��X > c
�= P�
�pn(X � �)
�>
pn(c� �)
�
�= P
�Z >
pn(c� �)
�
�= 1� �
�pn(c� �)
�
�:
This function is increasing in �. Hence
size = sup�<0
�(�) = �(0) = 1� �
�pnc
�
�:
To get a size � test, set this equal to � and solve for c to get
c =���1(1� �)p
n:
So we reject when X > ���1(1��)=pn. Equivalently, we reject
when pn (X � 0)
�> z�: �
Finding most powerful tests is hard and, in many cases, most
powerful tests don't even exist. Instead of searching for most
powerful tests, we'll just consider three widely used tests: the
Wald test, the �2 test and the permutation test. A fourth test,
the likelihood ratio test, is dicussed in the appendix.
11.1 The Wald Test 183
11.1 The Wald Test
Let � be a scalar parameter, let b� be an estimate of � and letbse be the estimated standard error of b�. The test is named
after Abraham Wald
(1902-1950), who
was a very in uen-
tial mathematical
statistician.
De�nition 11.3 The Wald Test
Consider testing
H0 : � = �0 versus H1 : � 6= �0:
Assume that b� is asymptotically Normal:
pn(b� � �0)bse N(0; 1):
The size � Wald test is: reject H0 when jW j > z�=2
where
W =b� � �0bse :
Theorem 11.4 Asymptotically, the Wald test has size �, that is,
P�0
�jZj > z�=2
�! �
as n!1.
Proof. Under � = �0, (b� � �0)= bse N(0; 1). Hence, the
probability of rejecting when the null � = �0 is true is
P�0
�jW j > z�=2
�= P�0
jb� � �0jbse > z�=2
!! P
�jN(0; 1)j > z�=2
�= �: �
Remark 11.5 Most texts de�ne the Wald test slightly di�erently.
They use the standard error computed at � = �0 rather then at
the estimated value b�. Both versions are valid.
184 11. Hypothesis Testing and p-values
Let us consider the power of the Wald test when the null
hypothesis is false.
Theorem 11.6 Suppose the true value of � is �? 6= �0. The power
�(�?) { the probability of correctly rejecting the null hypothesis
{ is given (approximately) by
1� �
��0 � �?bse + z�=2
�+ �
��0 � �?bse � z�=2
�: (11.4)
Recall that bse tends to 0 as the sample size increases. Inspect-
ing (11.4) closely we note that: (i) the power is large if �? is far
from �0 and (ii) the power is large if the sample size is large.
Example 11.7 (Comparing Two Prediction Algorithms) We test a
prediction algorithm on a test set of size m and we test a second
prediction algorithm on a second test set of size n. Let X be
the number of incorrect predictions for algorithm 1 and let Y
be the number of incorrect predictions for algorithm 2. Then
X � Binomial(m; p1) and Y � Binomial(n; p2). To test the null
hypothesis that p1 = p2 write
H0 : Æ = 0 versus H1 : Æ 6= 0
where Æ = p1 � p2. The mle is bÆ = bp1 � bp2 with estimated
standard error
bse =
rbp1(1� bp1)m
+bp2(1� bp2)
n:
The size � Wald test is to reject H0 when jW j > z�=2 where
W =bÆ � 0bse =
bp1 � bp2qbp1(1�bp1)
m+
bp2(1�bp2)n
:
The power of this test will be largest when p1 is far from p2 and
when the sample sizes are large.
11.1 The Wald Test 185
What if we used the same test set to test both algorithms?
The two samples are no longer independent. Instead we use the
following strategy. Let Xi = 1 if algorithm 1 is correct on test
case i and Xi = 0 otherwise. Let Yi = 1 if algorithm 2 is correct
on test case i Yi = 0 otherwise. A typical data set will look
something like this:
Test Case Xi Yi Di = Xi � Yi1 1 0 1
2 1 1 0
3 1 1 0
4 0 1 -1
5 0 0 0...
......
...
n 0 1 -1
Let
Æ = E (Di) = E (Xi)� E (Yi) = P(Xi = 1)� P(Yi = 1):
Then bÆ = D = n�1P
n
i=1Di and bse (bÆ) = S=pn where S2 =
n�1P
n
i=1(Di�D)2. To test H0 : Æ = 0 versus H1 : Æ 6= 0 we use
W = bÆ= bse and reject H0 if jW j > z�=2. This is called a paired
comparison. �
Example 11.8 (Comparing Two Means.) Let X1; : : : ; Xm and Y1; : : : ; Yn
be two independent samples from populations with means �1 and
�2, respectively. Let's test the null hypothesis that �1 = �2.
Write this as H0 : Æ = 0 versus H1 : Æ 6= 0 where Æ = �1 � �2.
Recall that the nonparametric plug-in estimate of Æ is bÆ = X�Ywith estimated standard error
bse =
rs21m
+s22n
186 11. Hypothesis Testing and p-values
where s21 and s22 are the sample variances. The size � Wald test
rejects H0 when jW j > z�=2 where
W =bÆ � 0bse =
X � Yqs21
m+
s22
n
: �
Example 11.9 (Comparing Two Medians.) Consider the previous
example again but let us test whether the medians of the two
distributions are the same. Thus, H0 : Æ = 0 versus H1 : Æ 6=0 where Æ = �1 � �2 where �1 and �2 are the medians. The
nonparametric plug-in estimate of Æ is bÆ = b�1� b�2 where b�1 andb�2 are the sample medians. The estimated standard error bse ofbÆ can be obtained from the bootstrap. The Wald test statistic is
W = bÆ= bse . �There is a relationship between the Wald test and the 1� �
asymptotic con�dence interval b� � bse z�=2.Theorem 11.10 The size � Wald test rejects H0 : � = �0 versus
H1 : � 6= �0 if and only if �0 =2 C where
C = (b� � bse z�=2; b� + bse z�=2):Thus, testing the hypothesis is equivalent to checking whether
the null value is in the con�dence interval.
11.2 p-values
Reporting \reject H0" or \retain H0" is not very informative.
Instead, we could ask, for every �, whether the test rejects at
that level. Generally, if the test rejects at level � it will also
reject at level �0 > �. Hence, there is a smallest � at which the
test rejects and we call this number the p-value.
11.2 p-values 187
De�nition 11.11 Suppose that for every � 2 (0; 1) we have
a size � test with rejection region R�. Then,
p-value = infn� : T (Xn) 2 R�
o:
That is, the p-value is the smallest level at which we can
reject H0.
Informally, the p-value is a measure of the evidence against
H0: the smaller the p-value, the stronger the evidence against
H0. Typically, researchers use the following evidence scale:
p-value evidence
< :01 very strong evidence againt H0
.01 - .05 strong evidence againt H0
.05 - .10 weak evidence againt H0
> :1 little or no evidence againt H0
Warning! A large p-value is not strong evidence in favor of
H0. A large p-value can occur for two reasons: (i) H0 is true or
(ii) H0 is false but the test has low power.
But do not confuse the p-value with P(H0jData). The p- We discuss quanti-
ties like P(H0jData)in the chapter on
Bayesian inference.
value is not the probability that the null hypothesis is
true.
The following result explains how to compute the p-value.
Theorem 11.12 Suppose that the size � test is of the form
reject H0 if and only if T (Xn) � c�:
Then,
p-value = sup�2�0
P�(T (Xn) � T (xn)):
In words, the p-value is the probability (under H0) of observing
a value of the test statistic as or more extreme than what was
actually observed.
188 11. Hypothesis Testing and p-values
For a Wald test, W has an aproximate N(0; 1) distribution
under H0. Hence, the p-value is
p-value � P
�jZj > jwj
�= 2P
�Z < �jwj
�= 2�
�Z < �jwj
�(11.5)
where Z � N(0; 1) and w = (b�� �0)= bse is the observed value of
the test statistic.
Here is an important property of p-values.
Theorem 11.13 If the test statistic has a continuous distribu-
tion, then under H0 : � = �0, the p-value has a Uniform (0,1)
distribution.
If the p-value is less than .05 then people often report that
\the result is statistically signi�cant at the 5 per cent level."
This just means that the null would be rejected if we used a size
� = 0:05 test. In general, the size � test rejects if and only if
p-value < �.
Example 11.14 Recall the cholesterol data from Example 8.11.
To test of the means ard di�erent we compute
W =bÆ � 0bse =
X � Yqs21
m+
s22
n
=216:2� 195:3
52 + 2:42= 3:78:
To compute the p-value, let Z � N(0; 1) denote a standard Nor-
mal random variable. Then,
p-value = P(jZj > 3:78) = 2P(Z < �3:78) = :0002
which is very strong evidence against the null hypothesis. To
test if the medians are di�erent, let b�1 and b�2 denote the sample
medians. Then,
W =b�1 � b�2bse =
212:5� 194
7:7= 2:4
11.3 The �2 distribution 189
where the standard error 7.7 was found using the boostrap. The
p-value is
p-value = P(jZj > 2:4) = 2P(Z < �2:4) = :02
which is strong evidence against the null hypothesis. �
Warning! A result might be statistically signi�cant and yet
the size of the e�ect might be small. In such a case we have a
result that is statistically signi�cant but not practically signi�-
cant. It is wise to report a con�dence interval as well.
11.3 The �2 distribution
Let Z1; : : : ; Zk be independent, standard normals. Let V =Pk
i=1 Z2i. Then we say that V has a �2 distribution with k de-
grees of freedom, written V � �2k. The probability density of V
is
f(v) =v(k=2)�1e�v=2
2k=2�(k=2)
for v > 0. It can be shown that E (V ) = k and V(k) = 2k. We
de�ne the upper � quantile �2k;�
= F�1(1 � �) where F is the
cdf . That is, P(�2k> �2
k;�) = �.
11.4 Pearson's �2 Test For Multinomial Data
Pearson's �2 test is used for multinomial data. Recall that
X = (X1; : : : ; Xk) has a multinomial distribution if
f(x1; : : : ; xk; p) =
�n
x1 : : : xk
�px11 � � � p
xk
k
where �n
x1 : : : xk
�=
n!
x1! � � �xk!:
The mle of p is bp = (bp1; : : : ; bpk) = (X1=n; : : : ; Xk=n).
190 11. Hypothesis Testing and p-values
Let (p01; : : : ; p0k) be some �xed set of probabilities and sup-
pose we want to test
H0 : (p1; : : : ; pk) = (p01; : : : ; p0k) versus H1 : (p1; : : : ; pk) 6= (p01; : : : ; p0k):
Pearson's �2statistic is
T =
kXj=1
(Xj � np0j)2
np0j=
kXj=1
(Oj � Ej)2
Ej
where Oj = Xj is the observed data and Ej = E (Xj) = np0j is
the expected value of Xj under H0.
Theorem 11.15 Under H0, T �2k�1. Hence the test: reject H0
if T > �2k�1;� has asymptotic level �. The p-value is P(�2
k> t)
where t is the observed value of the test statistic.
Example 11.16 (Mendel's peas.) Mendel bred peas with round yel-
low seeds and wrinkled green seeds. There are four types of progeny:
round yellow, wrinkled yellow, round green and (4) wrinkled
green. The number of each type is multinomial with probability
(p1; p2; p3; p4). His theory of inheritance predicts that
p =
�9
16;3
16;3
16;1
16
�� p0:
In n = 556 trials he observed X = (315; 101; 108; 32). Since,
np10 = 312:75; np20 = np30 = 104:25 and np40 = 34:75, the test
statistic is
�2 =(315� 312:75)2
312:75+(101� 104:25)2
104:25+(108� 104:25)2
104:25+(32� 34:75)2
34:75= 0:47:
The � = :05 value for a �23 is 7.815. Since 0:47 is not larger
than 7.815 we do not reject the null. The p-value is
p-value = P(�23 > :47) = :07
which is only moderate evidence againt H0. Hence, the data do
not contradict Mendel's theory. Interestingly, there is some con-
troversey about whether Mendel's results are \too good." �
11.5 The Permutation Test 191
11.5 The Permutation Test
The permutation test is a nonparametric method for test-
ing whether two distribution are the same. This test is \exact"
meaning that it is not based on large sample theory approxima-
tions. Suppose that X1; : : : ; Xm � FX and Y1; : : : ; Yn � FY are
two independent samples and H0 is the hypothesis that the two
samples are identically distributed. This is the type of hypothe-
sis we would consider when testing whether a treatment di�ers
from a placebo. More precisely we are testing
H0 : FX = FY versus H1 : FX 6= FY :
Let T (x1; : : : ; xm; y1; : : : ; yn) be some test statistic, for example,
T (X1; : : : ; Xm; Y1; : : : ; Yn) = jXm � Y nj:
Let N = m + n and consider forming all N ! permutations of
the dataX1; : : : ; Xm; Y1; : : : ; Yn. For each permutation, compute
the test statistic T . Denote these values by T1; : : : ; TN !. Under
the null hypothesis, each of these values is equally likely. The Under the null hy-
pothesis, given the
ordered data values,
X1; : : : ; Xm; Y1; : : : ; Ynis uniformly dis-
tributed over the
N ! permutations of
the data.
distribution P0 that puts mass 1=N ! on each Tj is called the
permutation distribution of T . Let tobs be the observed value
of the test statistic. Assuming we reject when T is large, the p-
value is
p-value = P0(T > tobs) =1
N !
N !Xj=1
I(Tj > tobs):
Example 11.17 Here is a toy example to make the idea clear.
Suppose the data are: (X1; X2; Y1) = (1; 9; 3). Let T (X1; X2; Y1) =
jX � Y j = 2. The permutations are:
192 11. Hypothesis Testing and p-values
permutation value of T probability
(1,9,3) 2 1/6
(9,1,3) 2 1/6
(1,3,9) 7 1/6
(3,1,9) 7 1/6
(3,9,1) 5 1/6
(9,3,1) 5 1/6
The p-value is P(T > 2) = 4=6. �
Usually, it is not practical to evaluate all N ! permutations.
We can approximate the p-value by sampling randomly from
the set of permutations. The fraction of times Tj > tobs among
these samples approximates the p-value.
Algorithm for Permutation Test
1. Compute the observed value of the test statistic tobs =
T (X1; : : : ; Xm; Y1; : : : ; Yn).
2. Randomly permute the data. Compute the statistic again
using the permuted data.
3. Repeat the previous step B times and let T1; : : : ; TB de-
note the resulting values.
4. The approximate p-value is
1
B
BXj=1
I(Tj > tobs):
Example 11.18 DNA microarrays allow researchers to measure
the expression levels of thousands of genes. The data are the lev-
els of messenger RNA (mRNA) of each gene, which is thought
11.6 Multiple Testing 193
to provide a measure how much protein that gene produces.
Roughly, the larger the number, the more active the gene. The
table below, reproduced from Efron, et. al. (JASA, 2001, p. 1160)
shows the expression levels for genes from two types of liver can-
cer cells. There are 2,638 genes in this experiment but here we
show just the �rst two. The data are log-ratios of the intensity
levels of two di�erent color dyes used on the arrays.
Type I Type II
Patient 1 2 3 4 5 6 7 8 9 10
Gene 1 230.0 -1,350 -1,580.0 -400 -760 970 110 -50 -190 -200
Gene 2 470.0 -850 -.8 -280 120 390 -1730 -1360 -.8 -330...
......
......
......
......
......
Let's test whether the median level of gene 1 is di�erent be-
tween the two groups. Let �1 denote the median level of gene 1
of Type I and let �2 denote the median level of gene 1 of Type II.
The absolute di�erence of sample medians is T = jb�1�b�2j = 710.
Now we estimate the permutation distribution by simulation and
we �nd that the estimated p-value is .045. Thus, if we use a
� = :05 level of signi�cance, we would say that there is evidence
to reject the null hypothesis of no di�erence. �
In large samples, the permutation test usually gives similar
results to a test that is based on large sample theory. The per-
mutation test is thus most useful for small samples.
11.6 Multiple Testing
In some situations we may conduct many hypothesis tests. In
example 11.18, there were actually 2,638 genes. If we tested for a
di�erence for each gene, we would be conducting 2,638 separate
hypothesis tests. Suppose each test is conducted at level �. For
any one test, the chance of a false rejection of the null is �.
But the chance of at least one false rejection is much higher.
194 11. Hypothesis Testing and p-values
This is themultiple testing problem. The problem comes up
in many data mining situations where one may end up testing
thousands or even millions of hypotheses. There are many ways
to deal with this problem. Here we discuss two methods.
Consider m hypothesis tests:
H0i versus H1i; i = 1; : : : ; m
and let P1; : : : ; Pm denote m p-values for these tests.
The Bonferroni Method
Given p-values P1; : : : ; Pm, reject null hypothesis H0i if Pi < �=m.
Theorem 11.19 Using the Bonferroni method, the probability of
falsely rejecting any null hypotheses is less than or equal to �.
Proof. Let R be the event that at least one null hypotheses
is falsely rejected. Let Ri be the event that the ith null hypoth-
esis is falsely rejected. Recall that if A1; : : : ; Ak are events then
P(Sk
i=1Ai) �P
k
i=1 P(Ai). Hence,
P(R) = P
m[i=1
Ri
!�
mXi=1
P (Ri) =
mXi=1
�
m= �
from Theorem 11.13. �
Example 11.20 In the gene example, using � = :05, we have
that :05=2638 = :00001895375. Hence, for any gene with p-value
less than .00001895375, we declare that there is a signi�cant
di�erence.
The Bonferroni method is very conservative because it is try-
ing to make it unlikely that you would make even one false
rejection. Sometimes, a more reasonable idea is to control the
11.6 Multiple Testing 195
false discovery rate (FDR) which is de�ned as the mean of the
number of false rejections divided by the number of rejections.
Suppose we reject all null hypotheses whose p-values fall be-
low some threshold. Let m0 be the number of null hypotheses
are true and m1 = m�m0 null hypotheses are false. The tests
can be categorize in a 2� 2 as in the following table.
H0 Not Rejected H0 Rejected Total
H0 True U V m0
H0 False T S m1
Total m�R R m
De�ne the False Discovery Proportion (FDP)
FDP =
�V=R if R > 0
0 if R = 0:
The FDP is the proportion of rejections that are incorrect. Next
de�ne FDR = E (FDP).
The Benjamini-Hochberg (BH) Method
1. Let P(1) < � � � < P(m) denote the ordered p-values.
2. De�ne
`i =i�
Cmm; and R = max
ni : P(i) < `i
o(11.6)
where Cm is de�ned to be 1 if the p-values are independent
and Cm =P
m
i=1(1=i) otherwise.
3. Let t = P(R); we call t the BH rejection threshold.
4. Rejects all null hypotheses H0i for which Pi � t:
196 11. Hypothesis Testing and p-values
Theorem 11.21 (Benjamini and Hochberg) If the procedure
above is applied, then regardless of how many nulls are true
and regardless of the distribution of the p-values when the null
hypthesis is false,
FDR = E (FDP) �m0
m� � �:
Example 11.22 Figure 11.1 shows 7 ordered p-values plotted as
vertical lines. If we tested at level � without doing any corec-
tion for mutiple testing, we would reject all hypotheses whose
p-values are less than �. In this case, the 5 hypotheses corre-
sponding to the 5 smallest p-values are rejected. The Bonferroni
method rejects all hypotheses whose p-values are less than �=m.
In this example, this leads to no rejections. The BH threshold
corresponds to the last p-value that falls under the line with slope
�. This leads to three hypotheses being rejected in this case. �
Summary. The Bonferonni method controls the probability
of a single false rejection. This is very strict and leads to low
power when there are many tests. The FDR method controls the
fraction of false discoveries which is a more reasonable criterion
when there are many tests.
11.7 Technical Appendix
11.7.1 The Neyman-Pearson Lemma
In the special case of a simple null H0 : � = �0 and a simple
alternative H1 : � = �1 we can say precisely what the most
powerful test is.
Theorem 11.23 (Neyman-Pearson.) Suppose we test H0 : � =
�0 versus H1 : � = �1. Let
T =L(�1)L(�0)
=
Qn
i=1 f(xi; �1)Qn
i=1 f(xi; �0):
11.7 Technical Appendix 197
threshold
reject don't reject
p-values
0 1
��
t
�=m
FIGURE 11.1. Schematic illustration of Benjamini-Hochberg pro-
cedure. All hypotheses corresponding to the last undercrossing are
rejected.
198 11. Hypothesis Testing and p-values
Suppose we reject H0 when T > k. If we choose k so that
P�0(T > k) = � then this test is the most powerful, size �
test. That is among all tests with size �, this test maximizes the
power �(�1).
11.7.2 Power of the Wald Test
Proof of Theorem 11.6.
Let Z � N(0; 1). Then,
Power = �(�?)
= P�?(Reject H0)
= P�?(jW j > z�=2)
= P�?
jb� � �0jbse > z�=2
!
= P�?
b� � �0bse > z�=2
!+ P�?
b� � �0bse < �z�=2
!= P�?
�b� > �0 + bse z�=2�+ P�?
�b� < �0 � bse z�=2�= P�?
b� � �?bse >�0 � �?bse + z�=2
!+ P�?
b� � �?bse <�0 � �?bse � z�=2
!
� P
�Z >
�0 � �?bse + z�=2
�+ P
�Z <
�0 � �?bse � z�=2
�= 1� �
��0 � �?bse + z�=2
�+ �
��0 � �?bse � z�=2
�: �
11.7.3 The t-test
To test H0 : � = �0 where � is the mean, we can use the Wald
test. When the data are assumed to be Normal and the sample
size is small, it is common instead to use the t-test. A random
variable T has a t-distribution with k degrees of freedom if it has
11.7 Technical Appendix 199
density
f(t) =��k+12
�pk��
�k
2
� �1 + t2
k
�(k+1)=2:
When the degrees of freedom k ! 1, this tends to a Normal
distribution. When k = 1 it reduces to a Cauchy.
Let X1; : : : ; Xn � N(�; �2) where � = (�; �2) are both un-
known. Suppose we want to test � = �0 versus � 6= �0. Let
T =
pn(Xn � �0)
Sn
where S2nis the sample variance. For large samples T � N(0; 1)
under H0. The exact distribution of T under H0 is tn�1. Hence
if we reject when jT j > tn�1;�=2 then we get a size� test.
11.7.4 The Likelihood Ratio Test
Let � = (�1; : : : ; �q; �q+1; : : : ; �r) and suppose that �0 consists
of all parameter values � such that (�q+1; : : : ; �r) = (�0;q+1; : : : ; �0;r).
De�nition 11.24 De�ne the likelihood ratio statistic
by
� = 2 log
�sup
�2� L(�)sup�2�0
L(�)
�= 2 log
L(b�)L(b�0)
!
where b� is the mle and b�0 is the mle when � is restricted
to lie in �0. The likelihood ratio test is: reject H0
when �(xn) > �2r�q;�.
For example, if � = (�1; �2; �3; �4) and we want to test the null
hypothesis that �3 = �4 = 0 then the limiting distribution has
4� 2 = 2 degrees of freedom.
200 11. Hypothesis Testing and p-values
Theorem 11.25 Under H0,
2 log�(xn)d! �2
r�q:
Hence, asymptotically, the LR test is level �.
11.8 Bibliographic Remarks
The most complete book on testing is (1986). See also Casella
and Berger (1990, Chapter 8). The FDR method is due to Ben-
jamini and Hochberg (1995).
11.9 Exercises
1. Prove Theorem 11.13.
2. Prove Theorem 11.10.
3. LetX1; :::; Xn � Uniform(0; �) and let Y = maxfX1; :::; Xng.We want to test
H0 : � = 1=2 versus H1 : � > 1=2.
The Wald test is not appropriate since Y does not converge
to a Normal. Suppose we decide to test this hypothesis by
rejecting H0 when Y > c.
(a) Find the power function.
(b) What choice of c will make the size of the test .05?
(c) In a sample of size n = 20 with Y=0.48 what is the
p-value? What conclusion about H0 would you make?
(d) In a sample of size n = 20 with Y=0.52 what is the
p-value? What conclusion about H0 would you make?
4. There is a theory that people can postpone their death
until after an important event. To test the theory, Phillips
11.9 Exercises 201
and King (1988) collected data on deaths around the Jew-
ish holiday Passover. Of 1919 deaths, 922 died the week
before the holiday and 997 died the week after. Think
of this as a binomial and test the null hypothesis that
� = 1=2: Report and interpret the p-value. Also construct
a con�dence interval for �:
Reference:
Phillips, D.P. and King, E.W. (1988).
Death takes a holiday: Mortality surrounding major social
occasions.
The Lancet, 2, 728-732.
5. In 1861, 10 essays appeared in the New Orleans Daily
Crescent. They were signed \Quintus Curtuis Snodgrass"
and some people suspected they were actually written by
Mark Twain. To investigate this, we will consider the pro-
portion of three letter words found in an author's work.
From eight Twain essays we have:
.225 .262 .217 .240 .230 .229 .235 .217
From 10 Snodgrass essays we have:
.209 .205 .196 .210 .202 .207 .224 .223 .220 .201
(source: Rice xxxx)
(a) Perform a Wald test for equality of the means. Use
the nonparametric plug-in estimator. Report the p-value
and a 95 per cent con�dence interval for the di�erence of
means. What do you conclude?
(b) Now use a permutation test to avoid the use of large
sample methods. What is your conclusion?
6. Let X1; : : : ; Xn � N(�; 1). Consider testing
H0 : � = 0 versus � = 1:
202 11. Hypothesis Testing and p-values
Let the rejection region be R = fxn : T (xn) > cg whereT (xn) = n�1
Pn
i=1Xi.
(a) Find c so that the test has size �.
(b) Find the power under H1, i.e. �nd �(1).
(c) Show that �(1)! 1 as n!1.
7. Let b� be themle of a parameter � and let bse = fnI(b�)g�1=2where I(�) is the Fisher information. Consider testing
H0 : � = �0 versus � 6= �0:
Consider the Wald test with rejection region R = fxn :
jZj > z�=2g where Z = (b� � �0)= bse. Let �1 > �0 be some
alternative. Show that �(�1)! 1.
8. Here are the number of elderly Jewish and Chinese women
who died just before and after the Chinese Harvest Moon
Festival.
Week Chinese Jewish
-2 55 141
-1 33 145
1 70 139
2 49 161
Compare the two mortality patterns.
9. A randomized, double-blind experiment was conducted to
assess the e�ectiveness of several drugs for reducing post-
operative nausea. The data are as follows.
Number of Patients Incidence of Nausea
Placebo 80 45
Chlorpromazine 75 26
Dimenhydrinate 85 52
Pentobarbital (100 mg) 67 35
Pentobarbital (150 mg) 85 37
11.9 Exercises 203
(source: )
(a) Test each drug versus the placebo at the 5 per cent
level. Also, report the estimated odds-ratios. Summarize
your �ndings.
(b) Use the Bonferoni and the FDR method to adjust for
multiple testing.
10. Let X1; :::; Xn � Poisson(�):
(a) Let �0 > 0. Find the size � Wald test for
H0 : � = �0 versus H1 : � 6= �0:
(b) (Computer Experiment.) Let �0 = 1, n = 20 and � =
:05. Simulate X1; : : : ; Xn � Poisson(�0) and perform the
Wald test. Repeat many times and count how often you
reject the null. How close is the type I error rate to .05?
204 11. Hypothesis Testing and p-values
12
Bayesian Inference
12.1 The Bay esianPhilosophy
The statistical theory and methods that we hav e discussed
so far are known as frequentist (or classical) inference. The
frequentist point of view is based on the following postulates:
(F1) Probability refers to limiting relative frequencies. Proba-
bilities are objective properties of the real world.
(F2) P arameters are �xed, (usually unknown) constants. Be-
cause they are not uctuating, no probability statements can
be made about parameters.
(F3) Statistical procedures should be designed to hav ewell de-
�ned long run frequency properties. F or example, a 95 per cent
con�dence in terval should trap the true value of the parameter
with limiting frequency at least 95per cent.
There is another approach to inference called Bay esian in-
ference. The Bay esianapproach is based on the following pos-
tulates:
206 12. Bayesian Inference
(B1) Probability describes degree of belief, not limiting fre-
quency. As such, we can make probability statements about lots
of things, not just data which are subject to random variation.
For example, I might say that `the probability that Albert Ein-
stein drank a cup of tea on August 1 1948" is .35. This does not
refer to any limiting frequency. It re ects my strength of belief
that the proposition is true.
(B2) We can make probability statements about parameters,
even though they are �xed constants.
(B3) We make inferences about a parameter �, by producing
a probability distribution for �. Inferences, such as point esti-
mates and interval estimates may then be extracted from this
distribution.
Bayesian inference is a controversial approach because it in-
herently embraces a subjective notion of probability. In general,
Bayesian methods provide no guarantees on long run perfor-
mance. The �eld of Statistics puts more emphasis on frequentist
methods although Bayesian methods certainly have a presence.
Certain data mining and machine learning communuties seem to
embrace Bayesian methods very strongly. Let's put aside philo-
sophical arguments for now and see how Bayesian inference is
done. We'll conclude this chapter with some discussion on the
strengths and weaknesses of each approach.
12.2 The Bayesian Method
Bayesian inference is usually carried out in the following way.
1. We choose a probability density f(�) { called the prior
distribution { that expresses our degrees of beliefs about
a parameter � before we see any data.
12.2 The Bayesian Method 207
2. We choose a statistical model f(xj�) that re ects our be-liefs about x given �. Notice that we now write this as
f(xj�) instead of f(x; �).
3. After observing data X1; : : : ; Xn, we update our beliefs
and form the posterior distribution f(�jX1; : : : ; Xn).
To see how the third step is carried out, �rst, suppose that � is
discrete and that there is a single, discrete observation X. We
should use a capital letter now to denote the parameter since
we are treating it like a random variable so let � denote the
parameter. Now, in this discrete setting,
P (� = �jX = x) =P (X = x;� = �)
P (X = x)=
P (X = xj� = �)P (� = �)P�P (X = xj� = �)P (� = �)
which you may recognize from earlier in the course as Bayes'
theorem. The version for continuous variables is obtained by
using density functions:
f(�jx) =f(xj�)f(�)Rf(xj�)f(�)d�
: (12.1)
If we have n iid observations X1; : : : ; Xn, we replace f(xj�)with f(x1; : : : ; xnj�) =
Qn
i=1 f(xij�). Let us write Xn to mean
(X1; : : : ; Xn) and xn to mean (x1; : : : ; xn). Then
f(�jxn) =f(xnj�)f(�)Rf(xnj�)f(�)d�
=Ln(�)f(�)RLn(�)f(�)d�
/ Ln(�)f(�):
(12.2)
In the right hand side of the last equation, we threw away the
denominatorRLn(�)f(�)d� which is a constant that does not
depend on �; we call this quantity the normalizing constant.
We can summarize all this by writing:
\posterior is proportional to likelihood times prior:00 (12.3)
You might wonder, doesn't it cause a problem to throw away
the constantRLn(�)f(�)d�? The answer is that we can always
208 12. Bayesian Inference
recover the constant is since we know thatRf(�jxn)d� = 1.
Hence, we often omit the constant until we really need it.
What do we do with the posterior? First, we can get a point
estimate by summarizing the center of the posterior. Typically,
we use the mean or mode of the posterior. The posterior mean
is
�n =
Z�f(�jxn)d� =
R�Ln(�)f(�)RLn(�)f(�)d�
: (12.4)
We can also obtain a Bayesian interval estimate. De�ne a and b
byRa
�1 f(�jxn)d� =R1bf(�jxn)d� = �=2. Let C = (a; b). Then
P(� 2 Cjxn) =Z
b
a
f(�jxn) d� = 1� �
so C is a 1� � posterior interval.
Example 12.1 Let X1; : : : ; Xn � Bernoulli(p). Suppose we take
the uniform distribution f(p) = 1 as a prior. By Bayes' theorem
the posterior has the form
f(pjxn) / f(p)Ln(p) = ps(1� p)n�s = ps+1�1(1� p)n�s+1�1
where s =P
ixi is the number of heads. Recall that random
variable has a Beta distribution with parameters � and � if its
density is
f(p; �; �) =�(� + �)
�(�)�(�)p��1(1� p)��1:
We see that the posterior for p is a Beta distribution with pa-
rameters s+ 1 and n� s+ 1. That is,
f(pjxn) =�(n+ 2)
�(s+ 1)�(n� s + 1)p(s+1)�1(1� p)(n�s+1)�1:
We write this as
pjxn � Beta(s+ 1; n� s+ 1):
12.2 The Bayesian Method 209
Notice that we have �gured out the normalizing constant without
actually doing the integralRLn(p)f(p)dp. The mean of a Beta
(�; �) is �=(� + �) so the Bayes estimator is
p =s+ 1
n+ 2:
It is instructive to rewrite the estimator as
p = �nbp+ (1� �n)epwhere bp = s=n is the mle, ep = 1=2 is the prior mean and �n =
n=(n + 2) � 1. A 95 per cent posterior interval can be obtained
by numerically �nding a and b such thatRb
af(pjxn) dp = :95.
Suppose that instead of a uniform prior, we use the prior p �Beta(�; �). If you repeat the calculations above, you will see that
pjxn � Beta(� + s; � + n� s). The at prior is just the special
case with � = � = 1. The posterior mean is
p =� + s
� + � + n=
�n
� + � + n
� bp+ � �+ �
� + � + n
�p0
where p0 = �=(�+ �) is the prior mean. �
In the previous example, the prior was a Beta distribution
and the posterior was a Beta distribution. When the prior and
the posterior are in the same family, we say that the prior is
conjugate.
Example 12.2 Let X1; : : : ; Xn � N(�; �2). For simplicity, let us
assume that � is known. Suppose we take as a prior � � N(a; b2).
In problem 1 in the homework, it is shown that the posterior for
� is
�jXn � N(a; b2) (12.5)
where
� = wX + (1� w)a
210 12. Bayesian Inference
where
w =1se2
1se2
+ 1b2
and1
� 2=
1
se2+
1
b2
and se = �=pn is the standard error of the mle X. This is
another example of a conjugate prior. Note that w ! 1 and
�=se ! 1 as n ! 1. So, for large n, the posterior is approx-
imately N(b�; se2). The same is true if n is �xed but b ! 1,
which corresponds to letting the prior become very at.
Continuing with this example, let is �nd C = (c; d) such that
Pr(� 2 CjXn) = :95. We can do this by choosing c such that
Pr(� < cjXn) = :025 and Pr(� > djXn) = :025. So, we want to
�nd c such that
P (� < cjXn) = P
�� � �
�<c� �
�
���� Xn
�= P
�Z <
c� �
�
�= :025:
Now, we know that P (Z < �1:96) = :025. So
c� �
�= �1:96
implying that c = ��1:96�: By similar arguments, d = �+1:96:
So a 95 per cent Bayesian interval is � � 1:96 � . Since � � b�and � � se, the 95 per cent Bayesian interval is approximated
by b� � 1:96 se which is the frequentist con�dence interval. �
12.3 Functions of Parameters
How do we make inferences about a function � = g(�)? Re-
member in Chapter 3 we solved the following problem: given
the density fX for X, �nd the density for Y = g(X). We now
simply apply the same reasoning. The posterior cdf for � is
H(� jxn) = P(g(�) � �) =
ZA
f(�jxn)d�
12.4 Simulation 211
where A = f� : g(�) � �g. The posterior density is h(� jxn) =H 0(� jxn).
Example 12.3 Let X1; : : : ; Xn � Bernoulli(p) and f(p) = 1 so
that pjXn � Beta(s + 1; n � s + 1) with s =P
n
i=1 xi. Let =
log(p=(1� p)). Then
H( jxn) = P( � jxn) = P
�log
�P
1� P
��
���� xn�= P
�P �
e
1 + e
���� xn�=
Ze =(1+e )
0
f(pjxn) dp
=�(n+ 2)
�(s+ 1)�(n� s+ 1)
Ze =(1+e )
0
ps(1� p)n�s dp
and
h( jxn) = H 0( jxn)
=�(n+ 2)
�(s+ 1)�(n� s+ 1)
�e
1 + e
�s�1
1 + e
�n�s0@@�
e
1+e
�@
1A=
�(n+ 2)
�(s+ 1)�(n� s+ 1)
�e
1 + e
�s�1
1 + e
�n�s�1
1 + e
�2
=�(n+ 2)
�(s+ 1)�(n� s+ 1)
�e
1 + e
�s�1
1 + e
�n�s+2
for 2 R. �
12.4 Simulation
The posterior can often be approximated by simulation. Sup-
pose we draw �1; : : : ; �B � p(�jxn). Then a histogram of �1; : : : ; �B
approximates the posterior density p(�jxn). An approximation
212 12. Bayesian Inference
to the posterior mean �n = E (�jxn) is B�1PB
j=1 �j: The poste-
rior 1� � interval can be approximated by (��=2; �1��=2) where
��=2 is the �=2 sample quantile of �1; : : : ; �B.
Once we have a sample �1; : : : ; �B from f(�jxn), let �i = g(�i).
Then �1; : : : ; �B is a sample from f(� jxn). This avoids the needto do any analytical calculations. Simulation is discussed in more
detail later in the book.
Example 12.4 Consider again Example 12.3. We can approxi-
mate the posterior for without doing any calculus. Here are
the steps:
1. Draw P1; : : : ; PB � Beta(s+ 1; n� s+ 1).
2. Let i = log(Pi=(1� Pi)) for i = 1; : : : ; B.
Now 1; : : : ; B are iid draws from h( jxn). A histogram of
these values provides an estimate of h( jxn). �
12.5 Large Sample Properties of Bayes'
Procedures.
In the Bernoulli and Normal examples we saw that the pos-
terior mean was close to the mle . This is true in greater gen-
erality.
Theorem 12.5 Under appropriate regularity conditions, we have
that the posterior is approximately N(b�; bse 2) where b�n is the
mle and bse = 1=
qnI(b�n). Hence, �n � b�n. Also, if C = (b�n �
z�=2 bse ; b�n+z�=2 bse ) is the asymptotic frequentist 1�� con�dence
interval, then Cn is also an approximate 1�� Bayesian posterior
interval:
P(� 2 CjXn)! 1� �:
There is also a Bayesian delta method. Let � = g(�). Then
� jXn � N(b� ; ese2)
12.6 Flat Priors, Improper Priors and \Noninformative" Priors. 213
where b� = g(b�) and ese = se jg0(b�)j.12.6 Flat Priors, Improper Priors and
\Noninformative" Priors.
A big question in Bayesian inference is: where do you get
the prior f(�)? One school of thought, called \subjectivism"
says that the prior should re ect our subjective opinion about
� before the data are collected. This may be possible in some
cases but seems impractical in complicated problems especially
if there are many parameters. An alternative is to try to de�ne
some sort of \noninformative prior." An obvious candidate for
a noninformative prior is to use a \ at" prior f(�) / constant.
In the Bernoulli example, taking f(p) = 1 leads to pjXn �Beta(s+ 1; n� s+ 1) as we saw earlier which seemed very rea-
sonable. But unfettered use of at priors raises some questions.
Improper Priors. Consider the N(�; 1) example. Suppose
we adopt a at prior f(�) / c where c > 0 is a constant. Note
thatRf(�)d� = 1 so this is not a real probability density
in the usual sense. We call such a prior an improper prior.
Nonetheless, we can still carry out Bayes' theorem and compute
the posterior density f(�) / Ln(�)f(�) / Ln(�). In the nor-
mal example, this gives �jXn � N(X; �2=n) and the resulting
point and interval estimators agree exactly with their frequen-
tist counterparts. In general, improper priors are not a problem
as long as the resulting posterior is a well de�ned probability
distribution.
Flat Priors are Not Invariant.Go back to the Bernoulli
example and consider using the at prior f(p) = 1. Recall that
a at prior presumably represents our lack of information about
p before the experiment. Now let = log(p=(1� p)). This is a
transformation and we can compute the resulting distribution
214 12. Bayesian Inference
for . It turns out that
f( ) =e
(1 + e )2:
But one could argue that if we are ignorant about p then we are
also ignorant about so shouldn't we use a at prior for ? This
contradicts the prior f( ) for that is implied by using a at
prior for p. In short, the notion of a at prior is not well-de�ned
because a at prior on a parameter does not imply a at prior
on a transformed version of the parameter. Flat priors are not
transformation invariant.
Jeffreys' Prior. Je�reys came up with a \rule" for cre-
ating priors. The rule is: take f(�) / I(�)1=2 where I(�) is the
Fisher information function. This rule turns out to be transfor-
mation invariant. There are various reasons for thinking that
this prior might be a useful prior but we will not go into details
here.
Example 12.6 Consider the Bernoulli (p). Recall that
I(p) =1
p(1� p):
Je�rey's rule says to use the prior
f(p) /pI(p) = p�1=2(1� p)�1=2:
This is a Beta (1/2,1/2) density. This is very close to a uniform
density.
In a multiparameter problem, the Je�reys' prior is de�ned to
be f(�) /pdetI(�) where det(A) denotes the determinant of
a matrix A.
12.7 Multiparameter Problems
12.7 Multiparameter Problems 215
In principle, multiparameter problems are handled the same
way. Suppose that � = (�1; : : : ; �p). The posterior density is still
given by
p(�jxn) / Ln(�)f(�):
The question now arises of how to extract inferences about one
parameter. The key is �nd the marginal posterior density for
the parameter of interest. Suppose we want to make inferences
about �1. The marginal posterior for �1 is
f(�1jxn) =Z� � �Zf(�1; � � � ; �pjxn)d�2 : : : d�p:
In practice, it might not be feasible to do this integral. Simula-
tion can help. Draw randomly from the posterior:
�1; : : : ; �B � f(�jxn)
where the superscripts index the di�erent draws. Each �j is a
vector �j = (�j
1; : : : ; �j
p). Now collect together the �rst compo-
nent of each draw:
�11; : : : ; �B
1 :
These are a sample from f(�1jxn) and we have avoided doing
any integrals.
Example 12.7 (Comparing two binomials.) Suppose we have n1 con-
trol patients and n2 treatment patients and that X1 control pa-
tients survive while X2 treatment patients survive. We want to
estimate � = g(p1; p2) = p2 � p1. Then,
X1 � Binomial(n1; p1) and X2 � Binomial(n2; p2):
Suppose we take f(p1; p2) = 1. The posterior is
f(p1; p2jx1; x2) / px11 (1� p1)n1�x1px22 (1� p2)
n2�x2:
Notice that (p1; p2) live on a rectangle (a square, actually) and
that
f(p1; p2jx1; x2) = f(p1jx1)f(p2jx2)
216 12. Bayesian Inference
where
f(p1jx1) / px11 (1� p1)n1�x1 and f(p2jx2) / px22 (1� p2)
n2�x2
which implies that p1 and p2 are independent under the pos-
terior. Also, p1jx1 � Beta(x1 + 1; n1 � x1 + 1) and p2jx2 �Beta(x2+1; n2�x2+1). If we simulate P1;1; : : : ; P1;B � Beta(x1+
1; n1�x1+1) and P2;1; : : : ; P2;B � Beta(x2+1; n2�x2+1) then
�b = P2;b � P1;b, b = 1; : : : ; B, is a sample from f(� jx1; x2). �
12.8 Strengths and Weaknesses of Bayesian
Inference
Bayesian inference is appealing when prior information is avail-
able since Bayes' theorem is a natural way to combine prior in-
formation with data. Some people �nd Bayesian inference psy-
chologically appealing because it allows us to make probability
statements about parameters. In contrast, frequentist inference
provides con�dence sets Cn which trap the parameter 95 per
cent of the time, but we cannot say that P(� 2 CnjXn) is .95.
In the frequentist approach we can make probability statements
about Cn not �. However, psychological appeal is not really an
argument for using one type of inference over another.
In parametric models, with large samples, Bayesian and fre-
quentist methods give approximately the same inferences. In
general, they need not agree. Consider the following example.
Example 12.8 Let X � N(�; 1) and suppose we use the prior
� � N(0; � 2). From (12.5), the posterior is
�jx � N
�x
1 + 1�2
;1
1 + 1�2
�= N (cx; c)
where c = � 2=(� 2 + 1). A 1 � � per cent posterior interval is
C = (a; b) where
a = cx�pcz�=2 and b = cx+
pcz�=2:
12.9 Appendix 217
Thus, P(� 2 CjX) = 1� �. We can now ask, from a frequntist
perspective, what is the coverage of C, that is, how often will
this interval contain the true value? The answer is
P�(a < � < b) = P�(cX � z�=2pc < � < cX + z�=2
pc)
= P�
�� � z�=2
pc� c�
c< X � � <
� + z�=2pc� c�
c
�= P�
��(1� c)� z�=2
pc
c< Z <
�(1� c) + z�=2pc
c
�= �
��(1� c) + z�=2
pc
c
�� �
��(1� c)� z�=2
pc
c
�where Z � N(0; 1). Figure 12.1 shows the coverage as a function
of � for � = 1 and � = :05. Unless the true value of � is close to
0, the coverage can be very small. Thus, upon repeated use, the
Bayesian 95 per cent interval might contain the true value with
frequency near 0! In contrast, a con�dence interval has coverage
95 per cent coverage no matter what the true value of � is. �
What should we conclude from all this? The important thing
is to understand that frequentist and Bayesian methods are an-
swering di�erent questions. To combine prior beliefs with data
in a principled way, use bayesian inference. To construct proce-
dures with guaranteed long run performance, such as con�dence
intervals, use frequentist methods. It is worth remarking that it
is possible to develop nonparametric Bayesian methods similar
to plug-in estimation and the bootstrap. be forewarned, how-
ever, that the frequency properties of nonparametric Bayesian
methods can sometimes be quite poor.
12.9 Appendix
Proof of Theorem 12.5.
It can be shown that the e�ect of the prior diminishes as
n increases so that f(�jXn) / Ln(�)f(�) � Ln(�). Hence,
218 12. Bayesian Inference
0 1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
�
coverage
FIGURE 12.1. Frequentist coverage of 95 per cent Bayesian posterior
interval as a function of the true value �. The dotted line marks the
95 per cent level.
12.10 Bibliographic Remarks 219
log f(�jXn) � `(�). Now, `(�) � `(b�) + (� � b�)`0(b�) + [(� �b�)2=2]`00(b�) = `(b�) + [(� � b�)2=2]`00(b�) since `0(b�) = 0. Exponen-
tiating, we get approximately that
f(�jXn) / exp
(�1
2
(� � b�)2�2n
)
where �2n= ��1=`00(b�n). So the posterior of � is approximately
Normal with mean b� and variance �2n. Let `i = log f(Xij�), then
��2n
= �`00(b�n) =Xi
�`00i(b�n)
= n
�1
n
�Xi
�`00i(b�n) � nE �
h�`00
i(b�n)i
= nI(b�n)and hence �n � se(b�). �12.10 Bibliographic Remarks
Some references on Bayesian inference include Carlin and
Louis (1996), Gelman, Carlin, Stern and Rubin (1995), Lee
(1997), Robert (1994) and Schervish (1995). See Cox (1997), Di-
aconis and Freedman (1996), Freedman (2001), Barron, Schervish
andWasserman (1999), Ghosal, Ghosh and van der Vaart (2001),
Shen and Wasserman (2001) and Zhao (2001) for discussions of
some of the technicalities of nonparametric Bayesian inference.
12.11 Exercises
1. Verify (12.5).
2. Let X1; :::; Xn Normal(�; 1): (a) Simulate a data set (using
� = 5) consisting of n=100 observations.
(b) Take f(�) = 1 and �nd the posterior density. Plot the
density.
220 12. Bayesian Inference
(c) Simulate 1000 draws from the posterior. Plot a his-
togram of the simulated values and compare the histogram
to the answer in (b).
(d) Let � = e�. Find the posterior density for � analytically
and by simulation.
(e) Find a 95 per cent posterior interval for �.
(f) Find a 95 per cent con�dence interval for �.
3. Let X1; :::; Xn Uniform(0; �): Let f(�) / 1=�. Find the
posterior density.
4. Suppose that 50 people are given a placebo and 50 are
given a new treatment. 30 placebo patients show improve-
ment while 40 treated patients show improvement. Let
� = p2�p1 where p2 is the probability of improving under
treatment and p1 is the probability of improving under
placebo.
(a) Find the mle of � . Find the standard error and 90 per
cent con�dence interval using the delta method.
(b) Find the standard error and 90 per cent con�dence
interval using the parametric bootstrap.
(c) Use the prior f(p1; p2) = 1. Use simulation to �nd the
posterior mean and posterior 90 per cent interval for � .
(d) Let
= log
��p1
1� p1
���
p2
1� p2
��be the log-odds ratio. Note that = 0 if p1 = p2. Find
the mle of . Use the delta method to �nd a 90 per cent
con�dence interval for .
(e) Use simulation to �nd the posterior mean and posterior
90 per cent interval for .
12.11 Exercises 221
5. Consider the Bernoulli(p) observations
0 1 0 1 0 0 0 0 0 0
Plot the posterior for p using these priors: Beta(1/2,1/2),
Beta(1,1), Beta(10,10), Beta(100,100).
6. Let X1; : : : ; Xn � Poisson(�).
(a) Let � � Gamma(�; �) be the prior. Show that the
posterior is also a Gamma. Find the posterior mean.
(b) Find the Je�reys' prior. Find the posterior.
������������� ������� �������������������� �"!#�$�������
%'&
(*),+�)�-/.0)�-213+547698�1:-/.�-2;=< >@?A8�;=B�C
DFEHGID J5KML#NPORQ7OPS,TUKVOPLFWXY����[Z��:\�]��#����^#�����_^*�`�_Z ���/ a��]����b�,�_�`����c* �/]��`�H��!#\2�d �H������ceVfbg
��ch!#cia���j ��a�����]"]F^k���l�/��c* ��]���m0�/���nco�_����]"^p] q:c*] c*�_�b���5�_�`�/��ceV�/]�� ��^r�����sU] �`�/�_����]��tco�[ �vuvwx�rqy \z�[mU�������`�= ���5ce ��{|] �/�#����}~�{"����]�����#���/V�/�=���l�/��c* ��]����_u��3]�}�^#]�}��o\2��]F] ���o co]����d�/�#��ce�o�����* �Fg�`}��������:q�] !���^r�����0�"���P�[�����7�M���b�����'}�����\2�r���3oq�] ��c* a��/���_]��`{Aq�]��\�]�co� �`�����s�l�2 �����l�/��\� a���`]F\_��^�!#�����_u
� ]��#����^#����d� �/ co�_�������*}��#��\2��a���Z����t����d� �/ co�_�������`� \��o�hu���_�:����U�t �e�_�`����c* �/]�� ] q0�"uFwx�*�/���3a¡ ��� !� ���:] q�^��_\�������] �*�/�#��]��l{�mF���l�/��c* ��]��~���¢��]�co�_����co����\[ a�a���^ds���"���y�[�y���£�V¤¦¥��* ��^d�/�#��U]��`������a��Z a�!��_��] q¦�����$^���\_���`��]�����!#a��$ �`��\[ a�a���^�§�� ���y����� u
XY�d���� a�a�c*�� ��!��`�d�/���n^#����\_���_� ��\z{p���_�x}������¨�r ��^©���!#�����#��¥y���[�rªI¤¦��� �M���U�¬«:P�"® ���¯zu¦°�]���c* a�a�{�m0«±c* ��s�³²´�µ�����/]�¶:u¢�:���`�
����� ��������� ��������������������������������� ����!�" �`����]�co�$�zf# co�a��_��] q a�]�����q�!��#\_�/��]����_�
«:P�F® ���¯�# y�%$ ���¯'& �`�b!� ���_^��_����] ��a�]��`��m«:P�F® ���¯�#)( �*$©��+( ����]�a�!#�/���_���`]���a�]��`��m«:P�F® ���¯�#)( �*$©��+( , « , a�]����_®«:P�F® ���¯�#.-=��q¢�/# ��s ��^10���q��32# �� 4_���`] gx]��#��a�]��`��m«:P�F® ���¯�#65*a�]��87�9;:=< >@?�A9;:=< >CB?�AED/F HGJI���¯'KLG M�!�a�a��� \2j�g ���_����a����~a�]����ON
P~�� �t����co����^A���A}��� �tq�] a�a�]�}����/��V�� �����`����c* �/] � ��o���toq�!���\z�/��]��] q�������^� �/#u#��]s��co��� ���Q4_�t�/�#���~�]����b��m#�`]�c*�z�/��c*�_� }���}���a�a�}��`���/���� � ���SRk¯zub��]s ���`���`�� �e�_�`����c* �/]��_m�}~�3�_Z a�!� �/���/�#��[Z��_�/ ���3a�]��`��] �������j�u
TVUOW+XLY Z[Y \]X3^�_]`a^cbedgf �V�P�Ohji�k/l�m f�nporqSs l o iut �� qQnv=y�F®2�� ¯�#xw ? 7�«:P�"®z���¯ D #zy «:P�"®z���SGU¯�¯ F HGJI2� ¯EKLGJN
X �����'�����oa�]��`�$q�!#��\_����]��k���$���b!� ����^'�����`]���m������*�`���`jr����{`!��l� �����|*}�~ �c*�� �����b!� �`��^|�����`]��2¯ �v=P�F®2���¯V#�w ? R��%$p��¯ & # |*}�~ #�� ? R���¯��j� Y��u� &? I���¯�N
wx�s�/�����`���l�,] qU������\2�� F�/���_mb�yª��h�o���|�0�����z��§#���8�*��§F��¥y���[��ªx¤�������M�����.�h�£§��M�£¤��[�y���J��§����[¤��Y�p�M���p¥y���[�Aªx¤¦��� �M����� �P��� ��¤0§��M�"��b�M�V�����DFEHGC� ����Q�� T�K OPS����nORW��j�V��SV �¡ O¢��S�W�¦]|\�]�co� �`�=�x}�]r�_�`�/��ceV�/]��`��}~�*\[ �Y\�]�co� �`�=���������$�����`jrq�!���\ g
�/��]����_u��3]�}��_Z��_��m"�/����� ^#]F�_� �#] � ���]�Z"��^��:$\�a��[ � ���l}~�_�~ �,��]�}��#��\2����l�/��ce ��]��¢�����U�z���/�_��u � ]��#����^#�����/����q�]�a�a�]�}����#�h�zf# co�a��_��u£�¤�� ¥§¦ ¨ U8^�_©`«ª�¬�fpo R®z¯py�F® 0�¯°l�m+±§l n�np²³s´f¶µ�f l�t f¶²�npq m¸· n;¹;² l�t f ±f t�t[iut�ºQi n�n;»½¼ ium npq ± f t o¢µ i f�nporqSs l o iut n;¾ ���¿/#ÀR l�m+±��� & #ÂÁ »Ãbedgf
�����«��� �����¸� !'������ ������� ����[�������� �����
- 0 � Á � �-0�Á
v=y�F® ���¿�¯v=P�"® �� & ¯
������������ ������� ��� ����¸� !E�����¶�� � !'�a����� ����[������¸���� ������ ��!"! ���%�a��� ������ ����� ��!�� �����$# ���% ���V���'& �t qQn)( k ² m�* o¢q ium n l�t f v=P�F®2���¿�¯�# w ? SR $¬��¯�&3# 0 l�m+± v=y�F® �� & ¯ #w ? ¢Á½$£��¯ & #�rÁ½$´��¯ & N,+§i o¢q * f/oSd l o.- q k3�0/ �0/1� oSdgf m3v=y�F® �� & ¯�/v=P�"®z���¿�¯ i oSdgf t µVqQn�f v=P�"® ���¿�¯2/ v=P�F®2�� & ¯ » + fpqSoHdgf t f�nporqSs l o iut ² m q43k iut s º65 ±¸i s8q m+l o f�n8oSdgf i oSdgf t)7 n�f�f98�q · ² t f;:=<�»4: »?>A@CB�q i ² n º65d�� & qQn lt q ± q * ² ºQi ² n f�npo¢qSs l o iut @;²³o*qSo n�f t B�f�n�o i q ºSº ² npo tEl o f´oSdgf�D i q m o oSd l o qSoqQn m i o i @CB�q i ² n*d i µ�o iE*�i sFD l�t f%orµ i�t qQn)( k ² m�* orq ium n;»HG£�¤�� ¥/¦ ¨ U ^�_©`«_x¬Jfpo R3¿ ® N N;N�® RJI� P~�_����]�!#a�a��P6K�¯ » ¼ ium npq ± f t n;¹;² l�t f ±f t�t�iut%ºQi n�n l�m+±´º fpo �K ¿V# R »ML�q m�* f½oHd¸qQn½d l n9NO@;q l n)-Vµ�f*d l B�f%oSd l o
v=%K�® �K ¿`¯�#��= R�¯V# K�'0 $PK�¯Q NR m i oHdgf t f�nporqSs l o iut qQn�K & # S �UTT �WV� Qµ dgf t f S #YX IZ [ ¿ R Z l�m+±\Tjl�m+±V�l�t f]D i npqSorq4B�f *�ium npo l�m oSn;» bed¸qQn qQnoHdgf^D i npo f t q iut s´f l�m ²�npq m¸·´l P~�_�/`_4T�®aV'b D t q iut » +§i µc-
v=6K�® �K & ¯ # � , �K & ¯�� � Y ��� , �K & ¯�¯ &# � , d S �UTT �eV� Q^f � d w , d S �WTT3�eV3� Q�f $PK f &# Q K0'0 $PK�¯
gT �WV� Q ¯ & � d Q K �WTT �eV� Q $PK f & N
����� ��������� ��������������������������������� ����!�"
0.0 0.2 0.4 0.6 0.8 1.0
0.00
000.
0005
0.00
100.
0015
0.00
200.
0025
K� ����
v= �K & ¯v= �K ¿�¯
� �����]� �������«��� � ����\�� ����[�������� �@��!�� ¿ ��� !� & ���0���L�����¸���%�����«���+§i µ º fpo T1# V #� Q�� � » _���m�����l sFD º f�:=<�»4:=< µ�f½µVq ºSº f � D ºal q m oSd¸qQn* d i q * f;» b bedgf t f�np² º o¢q m¸· f�npo¢qSs l o iut qQn
�K & # S �� Q�� �Q ��� Ql�m+±�t qQn)( k ² m�* o¢q ium qQn
v=6K�® �K & ¯V# Q�� Q � � Q ¯ & Nbedgf t qQn)( k ² m�* o¢q ium n l�t fOD ºQi oro�f ± q m���· ² t f :=<�»��¸» R n µ�f * l�m n�f�f)-m fpqSoSdgf t f�npo¢qSs l o iut ² m q k iut s º65�±¸i s8q m+l o�f�n½oHdgf i oSdgf t »�������`���zf# co�a��_���#������a��������H�����:�����_^e�/] �U�3 ��a�����] \_]�co� ���:������j
q�!���\z�/��]�����u"��]h^�]5�`]�mb}~���#����^� ]����zgI�b!�ch�����~�`!�coce �l{=] q�������������jq�!���\z�/��]���u��¢}�]���!�\2�p�`!�coce �`���_� ���=�/���*ceVfF��ch!�c �`����j� ��^Y�����P¢[{��_�3�`���`j�u
�����«��� �����¸� !'������ ������� ����[�������� ���L�TVUOW+XLY Z[Y�\³X^�_©`��)bedgf �'§��0�r��¤�� �V�y�;h qQn
v= ���¯V# ��!#? v=y�F® �� ¯ �0�Á#ua0M¯l�m+±��M�0���h§����F�=�V�P�Oh qQn
� ���® �� ¯�# y v=P�"® ���¯� P� ¯EKb� �0�Á#u ��¯µ dgf t f � P� ¯ qQn l D t q iutVk iut�� »
£�¤�� ¥/¦ ¨ U ^�_©`� ¼ ium npq ± f t l�· l q m oSdgf�orµ i f�npo¢qSs l o iut n q m ����l sFD º f�:=<�»�<�»� f½d l B�f v= �K ¿�¯V# ceMf
�� , ¿ K�'0 $ K�¯Q # 0� Ql�m+± v= �K & ¯�#¬c*Vf, Q�� Q � � Q ¯ & # Q�� Q � � Q ¯ & N�°l n�f ±3ium s l�� qSs8²³s t qQn)(�- �K & qQn l @pfpo¢o f t f�npo¢qSs l o iut npq m�* f v= �K & ¯]/v= �K ¿�¯ »�� i µ�f B�f t - µ dgf m Q qQn ºal�t · f)- v= �K ¿�¯ d l n½nps l�ºSº f t½t qQn)( f � * f�D+ok iut�l nps l�ºSº�t f · q ium q m oSdgfFD l�tEl s´fpo�f t n�D l�* f m f l�t]K1#À0 � � »%bed¸² n)-s l�m 5 Dgf i D º f D t f k f t �K ¿ o i �K & »�bed¸qQn3q ºSº ²�npo tEl o�f�n oSd l o ium f 3 m ²¸s;@pf tnp²³s8s l�t q f�n º q ( f8s l�� qSs8²¸s t qQn)( l�t f qSsFDgf tSk f * o » +/i µ *�ium npq ± f t oSdgf�°l 5 f�n t qQn)(�»"8 iut q ºSº ² npo tEl orq ium - º fpo ² n§o l ( f � %K�¯ #c0 »°bedgf m
� ���® �K ¿`¯V# y v=%K�® �K ¿`¯EK K´# y K��0 $ K�¯Q K K# 0� Ql�m+±
� ���® �K & ¯�#zy v=%K�® �K & ¯'K�K´# Q�� Q ��� Q ¯ & N8 iut Q�� ��- - � ���® �K & ¯�� � ���® �K ¿l¯ µ d¸q * d8np² ·�· f�npoHn°oSd l o �K ¿ qQn l @pfpo¢o f tf�npo¢qSs l o iut »�bed¸qQn3s8q · d¸o n�f�fps q m o¢²¸qSo¢q4B�f º65 t f l n ium+l @ º fE@;²³o§oHd¸qQn l�m 3npµ�f t ± f�Dgf m+± n ium oHdgf * d i q * f i�k D t q iut » bedgf l ± B�o l�m o l�· f i�k ² npq m¸·s l�� qSs8²³s t qQn)(�- ± f�n�D+qSo�f�qSoSnD t[i @ º fpsÃn)- qQn�oSd l o°qSo ±¸i f�n m i o t f ¹;²¸q t f
����� ��������� ��������������������������������� ����!�"ium f o i * d i�i n�f l D t q iut » �pm d¸q · d�3 ± qSs´f m npq ium+l�º - *�i sFD º f � D t[i @ º fpsÃn)-* d i�i npq m¸·´l3± f k f m npq�@ º f�D t q iut0* l�m @pf8f � o t fps´f º65�± q � * ² º o »HG�������`���x}�]´��!�coc* �������e] q�������������j´q�!���\z�/��] � ��!���� ���`�n�x}�]´^���q¡g
q����`������co�_����]"^��:q�] �3^��zZ"���������o���`����c* �/] ������\2��]"]��`�����d��=��]eco������c*�@4���/�#�oc*VfF��c5!�c9�`����j�a��[ ^����/]Ac*������c*Vf|���l�/��ce ��]����OI�\2��]"]�������� ��n��]co������c*�@4����/���§P¢[{��_�3�`���`jda��� ^��¢�/]�P¢[{ ���t�_�`����c* �/]��`��u
TVUOW+XLY Z[Y \]X3^�_]`�� R ± f * qQnpq ium t ² º f§oSd l o�s8q m qSs8q��uf�n½oSdgf �°l 5 f�nt qQn)( qQn * l�ºSº f ± l �h§��U�"���V¤�¥y� » 8 iut s l�ºSº65 - �� qQn l �A5 f�n t ² º fk iut D t q iut � q k v=P�"®z���¯�# ���#q�? � ��®����¯ '0�ÁFu=Á�¯
µ dgf t f§oHdgf§q m�� s8²¸sÂqQn i B�f t/l�ºSº f�nporqSs l o iut n��� »TVUOW+XLY Z[Y \]X3^�_]`� R m f�nporqSs l o iut oHd l o3s8q m qSs8q��uf�n oHdgf s l�� q43s8²¸s t qQn)(§qQn * l�ºSº f ± l½�p�y�¦�r�'§��r� ¤�¥�� » 8 iut s l�ºSº65 -��� qQn s8q m 3qSs l�� q k v=P�"® ���¯V#¬���#q
�? ��!�? v=P�"®���¯ '0�ÁFu �b¯µ dgf t f§oHdgf§q m�� s8²¸sÂqQn i B�f t/l�ºSº f�nporqSs l o iut n �� »
DFEHG�E �oT ��LFW��hW ¡ ORQ T�¡��UK W�0�z� �Y���$s�����]���u�°��`]�c P¢[{��������/���_]����_c|m�������U] �`�/�_����]���^����#�����x{
���
F P��( GU¯V# F HGV( � ¯ � y��¯� SGU¯ # F SGV( � ¯ � y��¯5 F HGV( ��¯� P� ¯EKb� '0 Á#u � ¯
}����_��� � HGU¯# 5 F HG0®/��¯EKb� # 5 F SGV( ��¯� P� ¯EKb�r���h����� �'§������P�0§�¥���P�_�[� ���¦¤0�M�����'] q�R£u������������/���������_���b�V�y���d�V�y�;hY] q, ���_�`����c* �/]��
�����«�����p"�������� ��%�%� ��!'� ��������SGU¯���{
� ���( GU¯V# y «:P�"® ���SGU¯�¯ F P��( GU¯EKb�¸N '0�Á#u � ¯����UO\��CU�¥�� ��� bedgf �°l 5 f�n t qQn)( � ��® �� ¯ n l orqQn � f�n
� ��® �� ¯�# y � ��+( GU¯ � HGU¯©KLGJN¬Jfpo ���HGU¯ @pf/oSdgf9B l�º ²gf i�k�� oSd l o¶s8q m qSs8q��uf�n � I��+( GU¯ »*bedgf m��� qQn oSdgf�°l 5 f�n f�npo¢qSs l o iut » ����������¦XY��\� �A���_}��`�������/���§P¢[{��_�3�`���`jn ��q�]�a�a�]�}����
� ��® �� ¯ # y v=y�F® �� ¯ � y��¯'K"�%# y d y «:P�F® ���HGU¯`¯ F SGV( � ¯EKLG f � y��¯EKb�# y y «:P�"®z���SGU¯�¯ F HG0®2� ¯EKLG+Kb�§# y y «:y�F®z��#HGU¯�¯ F P��( GU¯ � HGU¯EKLG+Kb�# y d yµ«:P�F®2���HGU¯`¯ F y�+( GU¯EKb� f � HGU¯©KLG #zy � I���( GU¯ � SGU¯©KLGJN
wIqU}���\2��]"]���� ���HGU¯���]��U�¢�����¢Z a�!��~] qU�3�/��V�,c*������co�Q4_��� � ��+( GU¯��/���_�}~��}���a�abco������c*�@4��H�����¢������������ ��^s � �_Z ���`{/Go ��^5���b!���c*������co�Q4_�H�/��������/�_���/ aJ5 � I���( GU¯ � SGU¯EKLG0u G�:]M} }��5\� � ���#^� ��� fF�a���\�����q�]��`ch!�a�=q�]����/�#�ÃP¢[{ �����_�`�/��ceV�/]��
q�]�����] c*�$�����\_� ��\$a�]����¢q�!��#\_�/��]����_u����UO\��CU�¥�� ��� �Hk�«:y�F® �� ¯ # P� $ ���¯ & oHdgf m oHdgf � l 5 f�n/f�npo¢qSs l o iut qQn
���SGU¯�#zy � F y�+( GU¯'Kb�/#�w$y�+( R #xGU¯pN '0�Á#u ��¯�Hk�«:P�"®z���¯%# ( � $ �� ( oHdgf m oSdgf �°l 5 f�n3f�nporqSs l o iut qQn�oSdgf�s´f ± q l�m i�koHdgf?D i npo�f t q iut F P��( GU¯ » �Hkt«:P�"® ���¯ qQn �uf t�i 3 ium f ºQi n�n)-�oHdgf m oHdgf �°l 5 f�nf�npo¢qSs l o iut qQn%oSdgf§s i;± f i�k oHdgf^D i npo f t q iut F P��( GU¯ » ����������#X'� }���a�a"#��]�Z��~������������] ����c q�]������b!� ����^s�_���`]�� a�]����_uV���#�P¢[{��_� ��!#a������SGU¯0co������c*�@4��_� � I��+( GU¯�#65:P�g$ ���HGU¯`¯'& F P�+( GU¯EKb�Fu��¦ jb���#�
����� ��������� ��������������������������������� ����!�"�/�#�3^��_����Z V�/��Z ��] q � I���( GU¯H}����/�o���_�����\z�~�/]h���SGU¯ �#^e���z������������� ���b!� a�/]´-*{"���_a�^����/���5���b!� ����]��r� 5�y�§$ ���SGU¯�¯ F P��( GU¯EKb� #c-#u��F]�a�Z"�����oq�] ����SGU¯~}~�$���z�/0 Á#u �"u G£�¤�� ¥§¦ ¨ U8^�_©`�^���¬Jfpo R ¿2® N N N_® RJI ¯'��H®�e&z¯ µ dgf t f � & qQn;( m i µ m »L�²�D�D i n�f¶µ�f�²�n�f l ¯'��® � & ¯ D t q iut k iut�� »�bedgf �°l 5 f�n f�nporqSs l o iut µVqSoSdt f�n�Dgf * o�o i n;¹;² l�t f ± f tpt[iut%ºQi n�n%qQn§oSdgf�D i npo f t q iut s´f l�m -�µ d¸q * d3qQn
���SR ¿2® N N N_® RJI�¯V# ��&� & �����I RÀ� � �I
� & �����I gN GDFEHG�� � ORS�ORQ7T��j�3��NPLFW����� #��]���a���c ] q ����^����#�oco���#��c*Vfo��!�a���������\�]�co�a���\[ ����^� �#^A}��
\[ ����] ��V���/�_c*F��p\�]�co�a��_�/��\�]�Z��_�/ ����] q����� �d������] �`{7�#�������#!#�}��d}���a�a co�����/��]��´�q��_} j �z{£���_��!�a��/��u �����nce ���pco���`�/ ���d��]r�2 j �[}¢[{oq���]�c������������_\_�/��]��d�����¸P¢[{��_�¢���l�/��c* ��]����,}������n \_]����`�/ ����������jq�!���\z�/��]��A ����co������ceMfvu� ��U;\��CUu¥�� � � �.¬Jfpo ���� @pf§oSdgf �°l 5 f�n t ² º f k iut n i s´f�D t q iut � ¾
� ��® �� � ¯V# ���#qB? � ��H® �� ¯�N '0 Á#u���¯L�²�D�D i n�f%oHd l ov=P�F® �� � ¯�� � ��H® �� � ¯±q�] �t a�a��³N '0 Á#u���¯
bedgf m����� qQn%s8q m qSs l��l�m+± � qQn * l�ºSº f ±l5¥��b§��_�oªI§! ����V§ � ¥�� ��� ����� » ����������"�F!#�U] ���s�/��V�5����n�����#] ��c*������c*VfUu��������k�/���_���=���$ �"g
] ���������`!�a��=�� � �`!�\2���/�� ����!� ? v=P�"® �� � ¯/ ��!� ? v=y�F®z�����¯ u#�F���#\��5�����[Z��_�/ ���*] q�nq�!���\z�/��]��'��� a�}~�{"� a����`������ �Y]��$���b!� a,�/]A������c*VfF��gch!�cAmF}�����[Z��$�/�� � � ��H® �� � ¯$� ��!� ? v=y�F® �� � ¯ u��3�_��\�� m
� ���® �� � ¯%�¬�`!�? v=P�"® �� � ¯]/7��!#? v=y�F® �� � ¯%� � ���® �� � ¯
����� � � �����%�%� �9� ������ �����}�����\2�A\�] �b���/ ^#��\_��� '0�Á#u ��¯ u G
����UO\ �QU�¥�� � ��� L�²�D�D i n�f oSd l o �� qQn°oSdgf � l 5 f�n t ² º f µVqSoSd t f 3n�Dgf * o o i n i s´f,D t q iut � » L�²�D�D i n�f k ² t oHdgf t oHd l o �� d l n *�ium npo l�m ot qQn)(�¾ v=P�F® ���¯�#�� k iut n i s´f � »°bedgf mn�� qQn%s8q m qSs l�� » ���������������� P¢[{ �����`����j���� � ��H® ���¯¶#c5�v=P�F® ���¯ �,P��¯'Kb�8#��$ �#^
�����#\�� v=P�"® ���¯ � � ��H® �� ¯�q�]��� a�a��Fu��3]�} ��a�{A�����h��`�_Z"��] !��3������] g���_c|u G£�¤�� ¥/¦ ¨ U ^�_©`�^�_ ¼ ium npq ± f t oSdgf � f t�m i ² ºSº q�s i;± f º µVqSoSd�n;¹;² l�t f ± f t�t[iutºQi n�n;» �pm f ��l sFD º f�:=<�»�<´µ�f½n�d i µ�f ± oSd l o oSdgf8f�nporqSs l o iut
�K0HR I ¯�# X IZ [ ¿ R Z � Q � �Q � � Qd l n l *�ium npo l�m o t qQn)( k ² m�* orq ium » bed¸qQn f�nporqSs l o iut qQn1oSdgfED i npo�f t q iuts´f l�m - l�m+± dgf m�* f oSdgf �°l 5 f�n t ² º f)- k iut oSdgfD t q iut%P~�_�/�gT�® V ¯ µVqSoSdT1# V # Q�� � »��/f m�* f)-]@ 5 oHdgf^D t f B�q i ² n§oHdgf iut fps - oSd¸qQn8f�npo¢qSs l o iutqQn§s8q m qSs l�� »�G£�¤�� ¥/¦ ¨ U ^�_©`�^ � ¼ ium npq ± f t*l�· l q m oSdgf � f t�m i ² ºSº q @;²¸o�µVqSoHd ºQi n�n k ² m�* 3o¢q ium
«:6K�® �K�¯V# 6K�$ �K�¯'&K0'0 $PK�¯ N¬Jfpo�K�HR I ¯�# �K# S Q Nbedgf t qQn)(�qQn
v=%K�® �K�¯V# d �K�$PK�¯�&K�'0 $PK�¯ f # 0K�'0 $PK�¯ d K��0 $2K�¯Q f # 0Qµ d¸q * d�- l n l k ² m�* orq ium i�k K -�qQn *�ium npo l�m o » � o * l�m @pf*n�d i µ m oHd l o4- k iutoHd¸qQn ºQi n�n k ² m�* orq ium - �K�HR I ¯ qQn°oSdgf � l 5 f�n%f�nporqSs l o iut ² m+± f t oHdgf D t q iut� %K�¯ #c0 »��§f m�* f)- �K qQn§s8q m qSs l�� »�G
����� ��������� ��������������������������������� ����!�"� �� ��!��/ a��b!����l�/��]����/]d ��jA���_��}���V�����:�/���5c*������c*Vfn�_�`����c* �/]��
q�]��� �3] ��c* a�co]F^��_a �� ��U;\��CUu¥�� � ��� ¬Jfpo R3¿z® N N;N_® RJI/ ¯py�F® 0�¯Ãl�m+±�º fpo ��/# R »°bedgf m�� qQn8s8q m qSs l�� µVqSoSd t f�n�Dgf * o o i l�m 5 µ�f ºSº 3 @pf�d l B�f ± ºQi n�n k ² m�* orq ium » � o��� U�¨ ¨ � � U �u���U� �¥°U � X��VZ �u��Z Z ��U§¨ U��U�¨�'UOZ�� ¥�����Z � U��;\]X��U�¤� X� ��� ¥§¥ UpZ �¢Y �� � \���Z Z ��U \��¢Y��¸Y X¸`
� ��U �QU;��� ¨ Z ��\]¨ ��� ¦Z�\ �'UOZ�� \�� ¥°U �������QU�]`
qQn%oSdgf iumgº65 f�nporqSs l o iut µVqSoSd oSd¸qQn^D t[i Dgf t o 5 »wIq �����h� �/ co�_�������`� \��s���t�`���`������\_�/�_^�mv�����5�/�#��]��`��c �U]�Z �s^�]F�_�
��] �: ��a�{d ���/���$��� f"�t� f� c*�a�������]�}��_u£�¤�� ¥§¦ ¨ U8^�_©`�^ � L�²�D�D i n�f�oSd l o R ¯'P�F®;0M¯l�m+± oSd l o � qQnJ( m i µ mo i º q fq m oSdgfq m o f t B l�º���$ � ® �! µ dgf t f -`/ � /®0 »8bedgf´² m qr¹;²gf)-s8q m qSs l�� f�nporqSs l o iut ² m+± f t n;¹;² l�t f ± f t�t[iut%ºQi n�n%qQn��#HRk¯�# � �2 ���0 � R�¯µ dgf t f �2 ����#"�¯ # �$&%�$'$)(�%_¯ � �$&% �*$)(�% ¯ » � o * l�m @pf n�d i µ m oHd l oVoSd¸qQnqQn oHdgf �°l 5 f�n t ² º f*µVqSoSd t f�n�Dgf * o�o i oSdgf D t q iut oHd l o�D+²¸oSn*s l n�n:�+ � l o
� l�m+± s l n�nF:�+ � l o $ � »-, iut f i B�f t -�qSo * l�m @pf n�d i µ m oHd l o�oHdgf t qQn)(qQn m i o *�ium npo l�m o @;²¸o�qSo ±¸i f�n n l orqQn k 5%v=P�F® ���¯$� � ��® �� ¯ k iut%l�ºSº���7 n�f�f8�q · ² t f;:=<�»�<�» �/f m�* f)-Ãbedgf iut fps :=<�»4:�:3qSsFD º q f�n oSd l o��� qQn s8q m qSs l�� »GDFEHG/. � T���OPQ���Q 03Or��L#NPO21V���4365 � OPS�ORQ T��¬TvS73 �=T ��L#W°�] ��� �/ c*�z�/����\Hco]F^��_a������� �0�/V�/���lq {�}��[ j��`����!�a� �����x{t\_]���^����/��]�����m
�/�#�dceMf#��ch!�c a���j �_a�����]F]"^k���`����c* �/] �5���5 ���`][f#��ceV�/��a�{�c*������c*VfUu� ]����`��^��_�~�`�"!� ���_^n�����`]��~a�]���� }�����\2�e���~�`�b!� ���_^n���¡ ���a�!���Z �`�¡ �#\�� uwx��� �/ co�_������\�c*]"^���a��¢}�������a¡ �������� c*#a����_m�����\[ ����� �`��]�}����/��V��/�#�hZ ���¡ ��\��h������c ^�]�co����V�/���������s���� ����]n�/�#�o�`���`j�] q,�/�#� |98¸~ ���:� ¦LY�� � ¨=¨ ��; Z ��U��<=�u� �QU� � Y��u��Y �V\��/ U �
> Q ( & ¯@? �LY ¨ U Z ��Uu� �¢Y � X��;U Y �\��/\��/ U �> Q ( ¿ ¯ `
��] !�����a�{d���b!� a����/����Z �`�¡ ��\_� �v=P�F® ���¯�#�� ? �� ¯���� Y��u� &BA � ? �� ¯pN� ��}��*��[} �����/�#� � �� #�/�_�$]��Y� �/ co�_������\hc*]"^��_a���mU�/���sZ ���� ��\_�] q��/��� |C8�~ ���� ���`][f#��ceV�/��a�{
�= �� ¯ A 0Q:D P��¯
����� � � � �L�%�F ��� �6������������=!�� � ���¸�%�%� �Ã����! ����"���� �����
−0.5 0.0 0.5
510
1520
�
�� �� ��
����������� �����«��� � �a��2�� ����[�������� �@��!Ã�[����� !'���a� � ! ��! �%��� � ���� �� � �������� �� � �'� ��!�/�a��� ���%����� � �� ����������\�C� #���!'���¸���;��!'����!�� �����E� �� ���§�����%���'��� ��� �\�©����������}������`� D y��¯~���¢������° ���`���������#q�]���c* ����]���u"�:����\_� m
Q v=P�F® ���¯ A 0D P��¯ N �0�Á#ua0�-�¯
°�]��= ��{£] �������=���l�/��ce ��]��=����m����s\� � �U�����#]M}�� �/��V�oq�] �sa¡ �`��� Q mv=P�"®2��� ¯ � v=y�F® �� ¯zu��|] ���$����_\�������a�{�m
a���c��� �
a���c7�`!�I ��� ��!#� ? ( ?�� � � � Q v=P� � ® ���¯ � 0
D P� ¯ N �0�Á#ua0L0M¯����������[{F�$�/�� �[mU����na�]"\[ aPmUa� ��� �h�/ co�a��h���_����� m��/��� |C8�~ ����c*���Fg��c*VfUuUwI�$\� �k a��`]d���s����]�}����/�� �t�/��� |C8�~ ���� #���][fF��c* �/�_a�{A�/���P¢[{��_�3�`!�a�� u
wx�7�`!�coce �l{�m ���7� �/ co�_������\|co]F^#��a��=}������7a¡ �����A�/ co�a�����m �/���|C8�~ ���h ���`]�fF��ce ����a�{'co������ceMfY ��^ P¢[{��_��u,�������`�����=�\[[Z��� �[��/���_���Y�����`!�a����|�#���[ j¬^#]M}��¬}������ �/���k�b!�ch�����A] qs� �� co�_�/�_�������a¡ �`���$ �¢�����$���zf"�3�zf# co�a����`��]�}���u£�¤�� ¥/¦ ¨ U ^�_©`�^ ��! 8� X��#"�\�� ¥ ��¨�¥°U � X��!$ ¬Jfpo S Z ¯py� Z ® �e& � Q ¯ -&% #0�® N N;N�® Q »Ã¬Jfpo S # S ¿ ®;N N N�® S I ¯ ± f m i o f oHdgf ± l o l�l�m+±�º fpo � #
����� ��������� ��������������������������������� ����!�"P��¿z®;N N Nz®2�CIb¯§± f m i o�f§oSdgf§² m ( m i µ m D l�tEl s´fpo�f t n;» R n�np²³s´f½oHd l o
�����FI�� � P��¿z® N N;N_®2�CI�¯~� I� Z [ ¿ � &Z � � &��k iut n i s´f ��� - » �pm oHd¸qQns i;± f º -*oSdgf t f l�t f l ns l�m 5 D l�tEl s´fpo�f t nl n i @pn�f t B l o¢q ium n;»§bedgf |C8�~ qQn �� # S # S ¿ ®;N N Nz® S I�¯ »�� m+± f t oSdgf� ��U ¥ � X�� " \�� ¥ ��¨¥°U � X�� ¦ �Q\ � ¨ Uu¥ Y��¥°\��CU �LUuX�U �S� ¨ Z �u� XY Z ¨ \³\;�O` 8��X��X�\³X ¦u� �S� ¥ UpZ �¢Y � UO� �Z[Y=¥*��Z[Y�\³X ¦ �Q\ � ¨ Uu¥ �� �CU ¥*��Z ��U�¥ ��Z[Y � � ¨=¨ �U�<=�LY u� ¨ UuX�Z Z�\ Z �LY��¥°\� U�¨�`
ºQi n�n k ² m�* orq ium£«:P�F® ���¯�# X IZ [ ¿ �� Z $ � Z ¯ & -%oSdgf t qQn)( i�k oSdgf |C8�~ qQnv=P�"® ���¯�# � &�» � o * l�m @pf½n�d i µ m oSd l o�oSdgf§s8q m qSs l���t qQn)(�qQn l D�D t[i � q43s l o f º65 �e& � ��e&p� �p&z¯�l�m+±§ium f * l�m ��m+±½l�m f�npo¢qSs l o iut �� oHd l o l�* d¸q f B�f�noSd¸qQn t qQn)(�» L�q m�* f �e& � ��e&©� �p& ¯]/ �e& -�µ�f n�f�f oSd l o �� d l n nps l�ºSº f t t qQn)(oSd l�m oHdgf |C8�~ » ��m D tEl�* orq * f)- oHdgf ± q �*f t f m�* f�@pfporµ�f�f m oHdgf t qQn)(�n * l�m@pf3np² @pnpo l�m orq l�º »�bed¸qQn3n�d i µ n oHd l o/s l�� qSs8²³s º q ( f º qQd i�i;± qQn m i o l�mi D+orqSs l�º f�npo¢qSs l o iut q m d¸q · d ± qSs´f m npq ium+l�º D t�i @ º fpsÃn;»DFEHG � �'3�Q¨ORWMWMO��,ORNPOr¡���|������ceMf$���l�/��c* ��]��� ��^8P¢[{ ���H�_�`�/��ceV�/]��� �`���`��]"]"^ �_�`�/��ceV�/]����
�����/�������_�����~���� �¦�/���z{ ���Z �¢��c* a�a"�`���`j�u �F] c*�z�/��co�������¦���¦ a���]t!#���_q�!#a�/]=\2�� �/ \z�/���`�Q4_���� ^|�_�`����c* �/]��_u
TVUOW+XLY Z[Y \]X3^�_]`a^�� R m f�npo¢qSs l o iut��� qQn �y��§����p�y�[�[��� ¥�� q k oHdgf t ff � qQnpoSn l�m i oHdgf t½t ² º f ���� np² * d oSd l ov=P�"®z�� � ¯ � v=y�F® �� ¯±q�]��3 a�a�� ��^v=P�"®z�� � ¯ / v=y�F® �� ¯±q�]��3 ��a��[ �l�t] ���h�¸N
£�¤�� ¥§¦ ¨ U8^�_©`�^���¬Jfpo R ¯'P�F®;0M¯ l�m+± *�ium npq ± f t f�nporqSs l o¢q m¸·h� µVqSoSdn;¹;² l�t f ± f t�t�iut´ºQi n�n;» ¬Jfpo ���HR�¯�#®Á » � fµVq ºSº n�d i µ oHd l o �� qQn l ± 3s8qQn�npq�@ º f;»,L�²�D�D i n�f m i o »°bedgf m oHdgf t f8f � qQnpoSn l3± q �*f t f m o t ² º f��� �VµVqSoSdnps l�ºSº f t t qQn)(�» ��m D l�t o¢q * ² ºal�t - v=¢ÁF® �� � ¯ � v=rÁ#® �� ¯ # - » �§f m�* f)- -3#v=¢ÁF® ���� ¯ # 53R�����SGU¯ $ Á�¯'& F HGJI�Á�¯'KLG »%bed¸² n)- ����yHGU¯ # Á » L i oSdgf t f8qQnm i�t ² º f oHd l oA@pf l oSn �� » � B�f m oSd i ² · d �� qQn l ± s8qQn�npq�@ º f/qSo¶qQn *pº f l�t�º65 l@ l ±± f * qQnpq ium t ² º f;»HG
����� ����!��%������� �¸������¢" ������ �����]��*^#�����`���x{©�� �eªI¤¦¥y¥*��¤ � �����M����q�q�]��o�_Z ���`{7�' �#^7�_Z ���`{
� ��-#m�5 ?�� �? ( � � P� ¯EKb� �x-#u����UO\��CU�¥�� � ����� �h§����F� �V¤¦¥��"�s§��M�r§����p�P���[� �¦¥��+��� L�²�D�D i n�f�oHd l o�� ¶ l�m+± oSd l o v=y�F®z���¯ qQn l2*�ium orq m ² i ² n k ² m�* orq ium i�k3�/k iut f B�f t 5�� »�¬Jfpo � @pf l D t q iut± f m npqSo 5 µVqSoSd k ² ºSº np²�D�D iut o l�m+± º fpo �� � @pfoSdgf�°l 5 f�n�� t ² º f;» �Sk oHdgf � l 5 f�n t qQn)( qQn ��m qSo f%oSdgf m �� � qQn l ± s8qQn�npq�@ º f;» ���������� �F!#�U] ��� �� ���������� ^�co���`������a�� u_�����_� �������`� � fF���`����:���_�`�/���
��!�a�� ��p��!�\2� �/�� � v=P�"® ���¯ � v=y�F® �� � ¯*q�] �� a�a3�£ ��^jv=P� � ® ���¯ /v=P� � ®2�����¯�q�] �¢��]�co��� � u����z����#6v=y� � ®z�����¯�$ v=P� � ®2�� ¯ �x-#u �F����\_�%v����\_]����/���b!�] !���m[�������`� ���0 � � ��-��`!�\2� ���� ��v=y�F® �� � ¯O$ v=P�"® ���¯���� � �q�]��� a�a����´P� � $ � ®2� � � � ¯zu �3]�}$m
� ��H® �� � ¯ $ � ���® �� ¯ # y v=y�F® �� � ¯� P� ¯EKb�*$ y v=P�"® ���¯� P� ¯EKb�# y �Qv=P�"®z�� � ¯ $ v=P�"®z���¯�� � y��¯EKb�� y ?���� �
?�� ( � � v=P�"® �� � ¯ $ v=P�"® ���¯ � � y��¯EKb��
�� y ?���� �
?�� ( � � y��¯'K"�� -]N
�3�_��\�� m � ��H® �� � ¯ � � ���®2���¯zu�����������c*�a����_�U�/�� � �� ��^�]"�����#] �0c*������co�Q4_�� ��® �� ¯�}�����\2�A\�]������/ ^���\_����������qy \_������ �t�� �h�����/���§P¢[{ ���:��!�a�� u G����UO\��CU�¥�� � � �.¬�fpo R3¿ ® N;N N�® RJI� ¯p �H® �e&z¯ »�� m+± f t n;¹;² l�t f ± f t 3t[iut§ºQi n�n)- R qQn l ± s8qQn�npq�@ º f;»
�����~���]"] q�] q������¢a¡ �`���/���_]����_c ��� �b!������~�/�_\2������\[ a# ��^h��� ]�co���`�/��^��!#�:�/��� ��^���o���: �:q�] a�a�]�}��_u������5�]��`�����`��]���co�[ �A���3 ^�co���`�����#a���q�]�� ��{n�`������\_�/a�{d�]������/��Z�����`��]��_u#�¦ j �$�/���$��`��]����/]o�U�%¯'��® ��&_¯zu#X ���_���&s��� Z ���l{pa¡ ����� m0�����d�]��`�����`��]���c*�� �p���5 #���][fF��c* �/�_a�{��_�"!� a���]R£u
��� � ��������� ��������������������������������� ����!�"�3]�} ���dco���#��c*VfF���x{� ��^´ ^�co���`�����#��a����x{ra�����j ��^��dwx�£���_������ aRm¦
��!#a�� ce[{$���~]���� m ��] ���5]��¦�������/���_��u�P~!#�¦�����`�� ��� ��] c*� qy \_��� a�����jb����� ^�co���`��������a����x{o ��^Aco������ceVfF���x{�u� ��U;\��CUu¥�� � � � L�²�D�D i n�f½oHd l o��� d l n *�ium npo l�m o t qQn)( l�m+± qQn l ± s8qQn 3npq�@ º f;» bedgf m qSo qQn§s8q m qSs l�� » �����������������������j ���v=R��F®2�� ¯ # �nq�]��e�`]�c*� �Vu wIqh��Y}~�_������] �
co������ceVf=�������A�/���_���$�zfF���`�/�:=��!�a����������!#\2�|�/�� �v=y�F® �� � ¯%� ��!�? v=P�"® �� � ¯�/¨��!#? v= ��F® �� ¯V#�uN
�������¢}~]�!#a�^A��co�a�{o�/��V�t��=�������� ^�c*�����`����a�� u G�3]�} }��=\[ ����`]MZ �o��`���l�/����\_����^�Z����`����]���] q~���#��]��`��c 0�ÁFu@0��*q�] �
���b!� ����^|���`��]���a�]����_u� ��U;\��CUu¥�� � � ¬Jfpo R3¿z® N;N N_®ER;I/z¯'P�"® 0M¯ »�bedgf m -�² m+± f t n;¹;² l�t f ±f t�t[iut§ºQi n�n)- ��/# R qQn%s8q m qSs l�� » ���������� � \_\�]��`^����#�k��]'�����_]����_c 0 Á#u �¸0�m ������o ^�co���`�����#a�� u �����
������jd] q ��s���½0 � Q }�����\2�A����\�] ���`�/ ���[u����������_��!�a��:q�] a�a�]�}��¢q��`]�c ����� g]��`��c 0�ÁFu ���"u G� a�����]�!�� �©co������ceMfp�`!�a��_�* ���A��] �o��!� �� ���/�_��^©��]Y���| ^�co�����`��g
��a��h�/���z{k ��� �`\_a�]��`�=��]A ^�co�����`����a�� u � �#[{��/��V� ��n���$�_���M������¥ � �y���§����p�y�[�[��� ¥��£��q��������`�r�zfF���`�/��Y��!�a�� �� � ��^ � � � -p�`!�\2� �/��V�v=P�"® ���� ¯�/�v=P�"®z���¯ $ � q�]��� a�a��"u� ��U;\��CUu¥�� � ��� �Hk �� qQn°s8q m qSs l�� oHdgf m qSo�qQn m i o�npo t[ium¸·�º65 q m+l ± s8qQn 3npq�@ º f;»DFEHG�� �J¡VL#OPS���WnJ�TUK T-3V� ��F!���]����|�/�� ��R ¯py�F® 0M¯= ��^¨\�]����`��^��_�*�_�`����c* �/�����'��}������
���b!� ����^±�_����] ��a�] ����ut°���]�c �/�#�p����zZ"��]�!�������\z�/��]�� }~�£jb��]�}i�/��V�
�����«����� �¸��������!'� �¸�����]��� �§� !��� ��������SRk¯# R ���o ^#c*����������a�� u �:]M} \�]����`��^��_�*�_�`����c* �/�������x}~]�m,!��#���zga¡ ����^ �"!� ���/�������_� �§# y��¿z®2� & ¯0 ��^5�`!���]������/�� ��R3¿�z¯'P��¿z® 0�¯� �#^R & z¯'P� & ® 0M¯����#^��������^#�����/a�{�mM}����/� a�]��`��«3y�F® ���¯V#YX &� [ ¿ y� � $ �� � ¯ & u�3] ���`!���#�����`����� a�{�m����HR�¯ # R ���� �� ��� ^�co���`������a��A}������`�1R #HR3¿ ® R & ¯ u �3]�} \�] ������^����$�/���o���_������ a��@4[ ����]��r��]�����]��`ce a�co�[ ����u���_����# P��¿ ® N;N N�®2���[¯zm�R # HR3¿ ®;N N N�®ER���¯�}����/� R Z ¯'P� Z ® 0�¯ ����Fg^�������#^������ ¯3 ��^ra�]����3«:P�F® ���¯�# X �
� [ ¿ P� � $ �� � ¯ & u �"�/�_���� �`��]�!���^��_^�_Z ���`{ ]����t}��#���d���:��`]MZ ��^d���� �[m���q�� � Á#m��/�#������#HRk¯V#jR ���,���� ^Fgc*�����`����a�� u�wI��\[ �Y���*�`��]�}��Y���� ���/���sq�]�a�a�]�}������n�_�`����c* �/]��_m�jb��]�}�� ���/������ co��� g �"�������A�_�`����c* �/]��_m#�� �:�`ce a�a��_�~�`����j��
���HRk¯�# d 0 $ � $ �X Z R &Z f � R Z �0�Á#ua0M��¯}������`�´#"�¯ � # ceMf�� "F®�-� bu,�������*���l�/��c* ��]��=�����`����jb�*����� R Z � �o�/] g}¢ ��^���-#uH������co���`�/ �������h�/�� ��m }����_�¨���l�/��ce ��������ce ��{£� �/ csg�_�������_m¢�/���_�������e������ �nZ a�!������ �`�`��������jb�������p���������l�/��ce �����_u¢�������]����`���`Z ����]��Y�a¡[{"�h �p��coU] �`�2 �b���`]�a��=���Yc*]"^��_���'�#]���� �� co�_�/�`��\q�!���\z�/��] �A�_�`����c* �/��]���uDFEHG�� �=O �,NRO � ��K T � 1�O �nLFQ TUK��¦WwI�5���h^����o\�!�a��5�/] ����^£��]F]�jb�5���� �s\_]�Z����sco]"^����`�p^���\_���`��]��'�����zg
]��`{����r� ���[V��^��_�/ ��aPu � �����\z�/�$] q ^���\_���`��]���������] �`{r\� ���U� q�]�!���^����� ���_a�a�o ��^ P������ ��� R��-�-���¯zm©P~���`����� '0 � ����¯ m�°������ !���]��''0 � � ��¯¢ �#^������c* ���| ��^ � �`��a�a�A'0�� � ��¯zuDFEHG�� � �HL"Ku FOPW�L#W
0�u�wx�=�[ \2�o] q��/�#�¢q�]�a�a�]M}�������co]F^#��a��_m ����^r�� ¯��/��� P¢[{��_�,�����`jh �#^�/�#�/P~�{ ���:���l�/��c* ��]���mF!��������*�`�b!� ���_^��_����] ��a�]��`��uP�¯ R zP~����]�co�¡ a Q ®gK�¯zm=K´zP~�z�2�gT ®aV¦¯zuy��¯¶R® ��]����`��]�����U¯ m��3��� c*c*"gT�® V ¯ uy\[¯�R z¯'P�"® �e&z¯~}������`� �e&3����jb��]�}��r ��^|�§z¯p��® ��&z¯ u
���u� ��������� ��������������������������������� ����!�"�Fu:���_� R3¿ ® N;N N�® RJI6 ¯py�F® �e&z¯n �#^ �`!���]�����}~�k���l�/��ce �����
}����/��a�] ���tq�!��#\_�/��]���«:P�"® ���¯ # P�/$ ���¯ & � � & u �F��]�} ���� � R ��� ^#c*����������a��t ��^Ac*������c*VfUu
Á#u:���_�s� # �V��¿ ® N;N N�®2��� ��U�d �������/�e� �/ co�_����� ��� \_� u �,�`]�Z������ � �����~�]��l�/���`��]���co]F^��������/����P¢[{������_�`����c* �/]���!���^#����4_����]Vg]��#��a�]��`��u��u � ���_a�a¡= �#^ P~�_�����_��u ¯:���_� R3¿z® N N N_® RJI=�U� =�/ co�a���q��`]�cµ^����`�������#!#�/��]��|}����/�rZ ���� ��\_� �e&[u � ]�������^������_�`����c* �/]��`�t] q,�����q�]��`c � �V& }������`� ��& ���=�/���|�/ co�a��AZ �`�¡ ��\_� u ���_�*�/���|a�]��`�q�!��#\_�/��]��nq�]������l�/��ce ������� �e&3���
«:�� & ® � � & ¯�# � � &� & $x0 $pa�]�� d � � &
� & f N° ����^*������]�F�/��c* a�Z a�!#�t] q �¢�/�� ��co������c*�@4����H�������`����j=q�]��~ a�a�e&�u�Fu rP~���`a����#����m�0�� �LÁ�¯zu"���_� R®cP�����]�co�¡ a� Q ® K�¯� ��^d��!#�U] ���t�����a�]����¢q�!��#\_�/��]������
«:%K�® �K�¯ # d 0 $ �K K f &}����_���8-E/WK / 0�u � ]�������^����3�/���h�_�`����c* �/]�� �K�SRk¯ # -FuU��������_�`�/��ceV�/]��:qy a�a���]�!#������^#�h�/���=� �� co�_�/�_����� \���¢-#®;0M¯3��!#�$}��}���a�a� a�a�]�}��/������u �F�#]M}'�/�� � �K�SRk¯�#.-������/�#��!��#���b!�� m�co���#��c*Vf�`!�a�� u
� u �� \]¥§¦ ��Z�U �¶£�¤¸¦LU �¢Y=¥ UuX�Zp` ¯ � ] c*� ���5�����h������j�] q������hcoa�� ��^����� �� co��� g �"�������e���l�/��ce ��]��t'0�Á#ua0M��¯���{e�`��ch!#a¡ ����]���u����`{=Z � g��]�!��hZ a�!#���o] q Q ��^´Z ����]�!��sZ��_\_��]����e�Fu �F!�coc* ���@4��d{�] !���`����!#a��/�_u
Part III
Statistical Models and
Methods
243
This is page 244
Printer: Opaque this
This is page 245
Printer: Opaque this
14
Linear Regression
Regression is a method for studying the relationship be-
tween a response variable Y and a covariates X. The co-
variate is also called a predictor variable or a feature. Later,
we will generalize and allow for more than one covariate. The The term \regres-
sion" is due to
Sir Francis Galton
(1822-1911) who
noticed that tall
and short men tend
to have sons with
heights closer to the
mean. He called this
\regression towards
the mean."
data are of the form
(Y1; X1); : : : ; (Yn; Xn):
One way to summarize the relationship between X and Y is
through the regression function
r(x) = E (Y jX = x) =
Zy f(yjx)dy: (14.1)
Most of this chapter is concerned with estimating the regression
function.
14.1 Simple Linear Regression
The simplest version of regression is when Xi is simple (a
scalar not a vector) and r(x) is assumed to be linear:
r(x) = �0 + �1x:
246 14. Linear Regression
This model is called the the simple linear regression model.
Let �i = Yi � (�0 + �1Xi). Then,
E (�ijXi) = E (Yi � (�0 + �1Xi)jXi) = E (YijXi)� (�0 + �1Xi)
= r(Xi)� (�0 + �1Xi)
= (�0 + �1Xi)� (�0 + �1Xi)
= 0:
Let �2(x) = V(�ijX = x). We will make the further simplifying
assumption that �2(x) = �2 does not depend on x. We can thus
write the linear regression model as follows.
De�nition 14.1 The Linear Regresion Model
Yi = �0 + �1Xi + �i (14.2)
where E (�ijXi) = 0 and V(�ijXi) = �2.
Example 14.2 Figure 14.1 shows a plot of Log surface tempera-
ture (Y) versus Log light intensity (X) for some nearby stars.
Also on the plot is an estimated linear regression line which will
be explained shortly.
The unknown parameters in the model are the intercept �0
and the slope �1 and the variance �2. Let b�0 and b�1 denote
estimates of �0 and �1. The �tted line is de�ned to be
br(x) = b�0 + b�1x: (14.3)
The predicted values or �tted values are bYi = br(Xi) and the
residuals are de�ned to be
b�i = Yi � bYi = Yi ��b�0 + b�1Xi
�: (14.4)
The residual sums of squares or rss is de�ned by
rss =
nXi=1
b�2i:
14.1 Simple Linear Regression 247
4.0 4.5 5.0 5.5
4.0
4.1
4.2
4.3
4.4
4.5
4.6
Logsurfacetemperature(Y)
Log light intensity (X)
FIGURE 14.1. Data on stars in nearby stars.
248 14. Linear Regression
The quantity rss measures how well the �tted line �ts the data.
De�nition 14.3 The least squares estimates are the val-
ues b�0 and b�1 that minimize rss =P
n
i=1 b�2i .Theorem 14.4 The least squares estimates are given by
b�1 =
Pn
i=1(Xi �Xn)(Yi � Y n)Pn
i=1(Xi �Xn)2; (14.5)
b�0 = Y n � b�1Xn: (14.6)
An unbiased estimate of �2is
b�2 =
�1
n� 2
� nXi=1
b�2i: (14.7)
Example 14.5 Consider the star data from Example 14.2. The
least squares estimates are b�0 = 3:58 and b�1 = 0:166. The �tted
line br(x) = 3:58 + 0:166x is shown in Figure 14.1. �
Example 14.6 (The 2001 Presidential Election.) Figure 14.2 shows
the plot of votes for Buchanan (Y) versus votes for Bush (X)
in Florida. The least squares estimates (omitting Palm Beach
County) and the standard errors areb�0 = 66:0991 bse (b�0) = 17:2926b�1 = 00:0035 bse (b�1) = 0:0002:
The �tted line is
Buchanon = 66:0991 + :0035Bush:
(We will see later how the standard errors were computed.) Fig-
ure 14.2 also shows the residuals. The inferences from linear
14.2 Least Squares and Maximum Likelihood 249
regression are most accurate when the residuals behave like ran-
dom normal numbers. Based on the residual plot, this is not the
case in this example. If we repeat the analysis replacing votes
with log(votes) we get
b�0 = �2:3298 bse (b�0) = :3529b�1 = 0:730300 bse (b�1) = 0:0358:
This gives the �t
log(Buchanon) = �2:3298 + :7303 log(Bush):
The residuals look much healthier. Later, we shall address two
interesting questions: (1) how do we see if Palm Beach County
has a statistically plausible outcome? (2) how do we do this prob-
lem nonparametrically? �
14.2 Least Squares and Maximum Likelihood
Suppose we add the assumption that �ijXi � N(0; �2), that
is,
YijXi � N(�i; �2i)
where �i = �0 + �1Xi. The likelihood function is
nYi=1
f(Xi; Yi) =
nYi=1
fX(Xi)fY jX(YijXi)
=
nYi=1
fX(Xi)�nYi=1
fY jX(YijXi)
= L1 � L2
where L1 =Q
n
i=1 fX(Xi) and
L2 =
nYi=1
fY jX(YijXi): (14.8)
250 14. Linear Regression
•• ••• •
• •• • •••
••
•••••••••••
••
•
• ••••• •••••••
••• •••
•
•
••
•• • ••••••••
•• ••
Bush
Buchanon
0 50000 150000 250000
01000
3000
••
••
• •• •
••
•
•
•
•••
•
•••••••••
••
•
• ••••
•
•
•••• •
•
••••
• ••
• •
••
•
••
••
••••
•• ••
Bush
Resid
uals
0 50000 150000 250000
-400
0200
•
•
•
•
• •
••
• •••
•
••
••
•• ••
•
•
• • •
••
•
• ••
•
•
• ••
•• •
••
•••
•
•
•
•
•
••
•
•• •
•••••
••
•
•••
log Bush
log B
uchanon
7 8 9 10 11 12
23
45
67
8
•••• • •
•
•
•
•
•
•
•
•••
•
••
••
•
•
••
•
•
••
•
•
••
•
•
•
••
•
• •
•
••
••• ••
•••• •
•
•
•
•
••
•
• •••
•
log Bush
Resid
uals
7 8 9 10 11 12
-1.0
0.0
1.0
FIGURE 14.2. Voting Data for Election 2000.
14.3 Properties of the Least Squares Estimators 251
The term L1 does not involve the parameters �0 and �1. We shall
focus on the second term L2 which is called the conditional
likelihood, given by
L2 � L(�0; �1; �) =nYi=1
fY jX(YijXi) / ��n exp
(� 1
2�2
Xi
(Yi � �i)2
):
The conditional log-likelihood is
`(�0; �1; �) = �n log � � 1
2�2
nXi=1
�Yi � (�0 + �1Xi)
�2: (14.9)
To �nd the mle of (�0; �1) we maximize `(�0; �1; �). From (14.9)
we see that maximizing the likelihood is the same as minimizing
the rssP
n
i=1
�Yi�(�0+�1Xi)
�2. Therefore, we have shown the
following.
Theorem 14.7 Under the assumption of Normality, the least squares
estimator is also the maximum likelihood estimator.
We can also maximize `(�0; �1; �) over � yielding the mle
b�2 =1
n
Xi
b�2i: (14.10)
This estimator is similar to, but not identical to, the unbiased
estimator. Common practice is to use the unbiased estimator
(14.7).
14.3 Properties of the Least Squares Estimators
We now record the standard errors and limiting distribution
of the least squares estimator. In regression problems, we usu-
ally focus on the properties of the estimators conditional on
Xn = (X1; : : : ; Xn). Thus, we state the means and variances as
conditional means and variances.
252 14. Linear Regression
Theorem 14.8 Let b�T = (b�0; b�1)T denote the least squares
estimators. Then,
E (b�jXn) =
��0�1
�V(b�jXn) =
�2
n s2X
1n
Pn
i=1X2i�Xn
�Xn 1
!(14.11)
where s2X= n�1
Pn
i=1(Xi �Xn)2.
The estimated standard errors of b�0 and b�1 are obtained by
taking the square roots of the corresponding diagonal terms of
V(b�jXn) and inserting the estimate b� for �. Thus,
bse (b�0) =b�
sXpn
rPn
i=1X2i
n(14.12)
bse (b�1) =b�
sXpn: (14.13)
We should really write these as bse (b�0jXn) and bse (b�1jXn) but
we will use the shorter notation bse (b�0) and bse (b�1).Theorem 14.9 Under appropriate conditions we have:
1. (Consistency): b�0 P�!�0 and b�1 P�!�1.
2. (Asymptotic Normality):
b�0 � �0bse (b�0) N(0; 1) andb�1 � �1bse (b�1) N(0; 1):
3. Approximate 1� � con�dence intervals for �0 and �1 areb�0 � z�=2 bse (b�0) and b�1 � z�=2 bse (b�1): (14.14)
The Wald test statistic for testing H0 : �1 = 0 versus H1 :
�1 6= 0 is: reject H0 if W > z�=2 where W = b�1= bse (b�1).
14.4 Prediction 253
Example 14.10 For the election data, on the log scale, a 95 per
cent con�dence interval is :7303 � 2(:0358) = (:66; :80). The
fact that the interval excludes 0 The Wald statistics for testing
H0 : �1 = 0 versus H1 : �1 6= 0 is W = j:7303�0j=:0358 = 20:40
with a p-value of P(jZj > 20:40) � 0. This is strong evidence
that that the true slope is not 0. �
14.4 Prediction
Suppose we have estimated a regression model br(x) = b�0+ b�1from data (X1; Y1); : : : ; (Xn; Yn). We observe the value X = x�
of the covariate for a new subject and we want to predict their
outcome Y�. An estimate of Y� is
bY� = b�0 + b�1X�: (14.15)
Using the formula for the variance of the sum of two random
variables,
V(bY�) = V(b�0 + b�1x�) = V(b�0) + x2�V(b�1) + 2x�Cov(b�0; b�1):
Theorem 14.8 gives the formulas for all the terms in this equa-
tion. The estimated standard error bse (bY�) is the square root ofthis variance, with b�2 in place of �2. However, the con�dence
interval for Y� is not of the usual form bY�� z�=2. The appendix
explains why. The correct form of the con�dence interval is given
in the following Theorem. We call the interval a prediction in-
terval.
254 14. Linear Regression
Theorem 14.11 (Prediction Interval) Let
b�2n
= bse 2(bY�) + b�2
= b�2
�Pn
i=1(Xi �X�)2
nP
i(Xi �X)2
+ 1
�: (14.16)
An approximate 1� � predicition interval for Y� is
bY� � z�=2�n: (14.17)
Example 14.12 (Election Data Revisited.) On the log-scale, our lin-
ear regression gives the following prediction equation: log(Buchanon) =
�2:3298 + :7303log(Bush). In Palm Beach, Bush had 152954
votes and Buchanan had 3467 votes. On the log scale this is
11.93789 and 8.151045. How likely is this outcome, assuming
our regression model is appropriate? Our prediction for log Buchanan
votes -2.3298 + .7303 (11.93789)=6.388441. Now 8.151045 is
bigger than 6.388441 but is is \signi�cantly" bigger? Let us com-
pute a con�dence interval. We �nd that b�n = :093775 and the ap-
proximate 95 per cent con�dence interval is (6.200,6.578) which
clearly excludes 8.151. Indeed, 8.151 is nearly 20 standard er-
rors from bY�. Going back to the vote scale by exponentiating, thecon�dence interval is (493,717) compared to the actual number
of votes which is 3467. �
14.5 Multiple Regression
Now suppose we have k covariates X1; : : : ; Xk. The data are
of the form
(Y1; X1); : : : ; (Yi; Xi); : : : ; (Yn; Xn)
where
Xi = (Xi1; : : : ; Xik):
14.5 Multiple Regression 255
Here, Xi is the vector of k covariate values for the ith observa-
tion. The linear regression model is
Yi =
kXj=1
�jXij + �i (14.18)
for i = 1; : : : ; n, where E (�ijX1i; : : : ; Xki) = 0. Usually we want
to include an intercept in the model which we can do by setting
Xi1 = 1 for i = 1; : : : ; n. At this point it will be more convenient
to express the model in matrix notation. The outcomes will be
denoted by
Y =
0BBB@Y1Y2...
Yn
1CCCAand the covariates will be denoted by
X =
0BBB@X11 X12 : : : X1k
X21 X22 : : : X2k
......
......
Xn1 Xn2 : : : Xnk
1CCCA :
Each row is one observation; the columns correspond to the k
covariates. Thus, X is a (n� k) matrix. Let
� =
0B@ �1...
�k
1CA and � =
0B@ �1...
�n
1CA :
Then we can write (14.18) as
Y = X� + �: (14.19)
Theorem 14.13 Assuming that the (k � k) matrix XTX is in-
vertible, the least squares estimate is
256 14. Linear Regression
b� = (XTX)�1XTY: (14.20)
The estimated regression function is
br(x) = kXj=1
b�jxj: (14.21)
The variance-covariance matrix of b� is
V(b�jXn) = �2(XTX)�1:
Under appropriate conditions,
b� � N(�; �2(XTX)�1):
An unbiased estimate of �2is
b�2 =
�1
n� k
� nXi=1
b�2i
where b� = X b� � Y is the vector of residuals. An approximate
1� � con�dence interval for �j is
b�j � z�=2 bse (b�j) (14.22)
where bse 2(b�j) is the jth diagonal element of the matrix b�2(XTX)�1.
Example 14.14 Crime data on 47 states in 1960 can be obtained
at http://lib.stat.cmu.edu/DASL/Stories/USCrime.html. If we
�t a linear regression of crime rate on 10 variables we get the
following:
14.6 Model Selection 257
Covariate Least Estimated t value p-value
Squares Standard
Estimate Error
(Intercept) -589.39 167.59 -3.51 0.001 **
Age 1.04 0.45 2.33 0.025 *
Southern State 11.29 13.24 0.85 0.399
Education 1.18 0.68 1.7 0.093
Expenditures 0.96 0.25 3.86 0.000 ***
Labor 0.11 0.15 0.69 0.493
Number of Males 0.30 0.22 1.36 0.181
Population 0.09 0.14 0.65 0.518
Unemployment (14-24) -0.68 0.48 -1.4 0.165
Unemployment (25-39) 2.15 0.95 2.26 0.030 *
Wealth -0.08 0.09 -0.91 0.367
This table is typical of the output of a multiple regression pro-
gram. The \t-value" is the Wald test statistic for testing H0 :
�j = 0 versus H1 : �j 6= 0. The asterisks denote \degree of
signi�cance" with more asterisks being signi�cant at a smaller
level. The example raises several important questions. In partic-
ular: (1) should we eliminate some variables from this model?
(2) should we interpret this relationships as causal? For exam-
ple, should we conclude that low crime prevention expenditures
cause high crime rates? We will address question (1) in the next
section. We will not address question (2) until a later Chapter.
14.6 Model Selection
Example 14.14 illustrates a problem that often arises in mul-
tiple regression. We may have data on many covariates but we
may not want to include all of them in the model. A smaller
model with fewer covariates has two advantages: it might give
better predictions than a big model and it is more parsimonious
(simpler). Generally, as you add more variables to a regression,
258 14. Linear Regression
the bias of the predictions decreases and the variance increases.
Too few covariates yields high bias; too many covariates yields
high variance. Good predictions result from achieving a good
balance between bias and variance.
In model selection there are two problems: (i) assigning a
\score" to each model which measures, in some sense, how good
the model is and (ii) searching through all the models to �nd
the model with the best score.
Let us �rst discuss the problem of scoring models. Let S �f1; : : : ; kg and let XS = fXj : j 2 Sg denote a subset of the
covariates. Let �S denote the coeÆcients of the corresponding
set of covariates and let b�S denote the least squares estimate of
�S. Also, let XS denote the X matrix for this subset of covari-
ates and de�ne brS(x) to be the estimated regression function
from (14.21). The predicted valus from model S are denoted bybYi(S) = brS(Xi). The prediction risk is de�ned to be
R(S) =
nXi=1
E (bYi(S)� Y �i)2 (14.23)
where Y �i
denotes the value of a future observation of Yi at
covariate value Xi. Our goal is to choose S to make R(S) small.
The training error is de�ned to be
bRtr(S) =
nXi=1
(bYi(S)� Yi)2:
This estimate is very biased and under-estimates R(S).
Theorem 14.15 The training error is a downward biased esti-
mate of the prediction risk:
E ( bRtr(S)) < R(S):
In fact,
bias ( bRtr(S)) = E ( bRtr(S))�R(S) = �2Xi=1
Cov(bYi; Yi): (14.24)
14.6 Model Selection 259
The reason for the bias is that the data are being used twice:
to estimate the parameters and to estimate the risk. When
we �t a complex model with many parameters, the covariance
Cov(bYi; Yi) will be large and the bias of the training error gets
worse. In summary, the training error is a poor estimate of risk.
Here are some better estimates.
Mallow's Cp statistic is de�ned by
bR(S) = bRtr(S) + 2jSjb�2 (14.25)
where jSj denotes the number of terms in S and b�2 is the es-
timate of �2 obtained from the full model (with all covariates
in the model). This is simply the training error plus a bias cor-
rection. This estimate is named in honor of Colin Mallows who
invented it. The �rst term in (14.25) measures the �t of the
model while the second measure the complexity of the model.
Think of the Cp statistic as:
lack of �t + complexity penalty:
Thus, �nding a good model involves trading o� �t and
complexity.
A related method for estimating risk is AIC (Akaike Infor-
mation Criterion). The idea is to choose S to maximize Some texts use a
slightly di�erent
de�nition of AIC
which involves
multiplying the
de�nition here by
2 or -2. This has
no e�ect on which
model is selected.
`S � jSj (14.26)
where `S is the log-likelihood of the model evaluated at the
mle . This can be thought of \goodness of �t" minus \com-
plexity." In linear regression with Normal errors, maximizing
AIC is equivalent to minimizing Mallow's Cp; see exercise 8.
Yet another method for estimating risk is leave-one-out
cross-validation. In this case, the risk estimator is
bRCV (S) =
nXi=1
(Yi � bY(i))2 (14.27)
260 14. Linear Regression
where bY(i)) is the prediction for Yi obtained by �tting the model
with Yi omitted. It can be shown that
bRCV (S) =
nXi=1
Yi � bYi(S)1� Uii(S)
!2
(14.28)
where Uii(S) is the ith diagonal element of the matrix
U(S) = XS(XT
SXS)
�1XT
S: (14.29)
Thus, one need not actually drop each observation and re-�t
the model. A generalization is k-fold cross-validation. Here
we divide the data into k groups; often people take k = 10. We
omit one group of data and �t the models to the remaining data.
We use the �tted model to predict the data in the group that
was omitted. We then estimate the risk byP
i(Yi � bYi)2 where
the sum is over the the data points in the omitted group. This
process is repeated for each of the k groups and the resulting
risk estimates are averaged.
For linear regression, Mallows Cp and cross-validation often
yield essentially the same results so one might as well use Mal-
lows' method. In some of the more complex problems we will
discuss later, cross-validation will be more useful.
Another scoring method is BIC (Bayesian information crite-
rion). Here we choose a model to maximize
BIC(S) = rss (S) + 2jSjb�2: (14.30)
The BIC score has a Bayesian interpretation. Let S = fS1; : : : ; Smgdenote a set of models. Suppose we assign the prior P(Sj) = 1=m
over the models. Also, assume we put a smooth prior on the pa-
rameters within each model. It can be shown that the posterior
probability for a model is approximately,
P(Sjjdata) � eBIC(Sj )PreBIC(Sr)
:
14.6 Model Selection 261
Hence, choosing the model with highest BIC is like choosing the
model with highest posterior probability. The BIC score also has
an information-theoretic interpretation in terms of something
called minimum description length. The BIC score is identical
to Mallows Cp except that it puts a more severe penalty for
complexity. It thus leads one to choose a smaller model than
the other methods.
Now let us turn to the problem of model search. If there are k
covariates then there are 2k possible models. We need to search
through all these models, assign a score to each one, and choose
the model with the best score. If k is not too large we can do
a complete search over all the models. When k is large, this is
infeasible. In that case we need to search over a subset of all
the models. Two common methods are forward and back-
ward stepwise regression. In forward stepwise regression, we
start with no covariates in the model. We then add the one vari-
able that leads to the best score. We continue adding variables
one at a time until the score does not improve. Backwards step-
wise regression is the same except that we start with the biggest
model and drop one variable at a time. Both are greedy searches;
nether is guaranteed to �nd the model with the best score. An-
other popular method is to do random searching through the
set of all models. However, there is no reason to expect this to
be superior to a deterministic search.
Example 14.16 We apply backwards stepwise regression to the
crime data using AIC. The following was obtained from the pro-
gram R. This program uses minus our version of AIC. Hence,
we are seeking the smallest possible AIC. This is the same is
minimizing Mallows Cp.
262 14. Linear Regression
The full model (which includes all covariates) has AIC= 310.37.
The AIC scores, in ascending order, for deleting one variable are
as follows:
variable Pop Labor South Wealth Males U1 Educ. U2 Age Expend
AIC 308 309 309 309 310 310 312 314 315 324
For example, if we dropped Pop from the model and kept the
other terms, then the AIC score would be 308. Based on this in-
formation we drop \population" from the model and the current
AIC score is 308. Now we consider dropping a variable from the
current model. The AIC scores are:
variable South Labor Wealth Males U1 Education U2 Age Expend
AIC 308 308 308 309 309 310 313 313 329
We then drop \Southern" from the model. This process is con-
tinued until there is no gain in AIC by dropping any variables.
In the end, we are left with the following model:
Crime = 1:2 Age + :75 Education + :87 Expenditure + :34 Males � :86 U1 + 2:31 U2:
Warning! This does not yet address the question of which vari-
ables are causes of crime.
14.7 The Lasso
There is an easier model search method although it addresses
a slightly di�erent question. The method, due to Tibshirani, is
called the Lasso. In this section we assume that the covari-
ates have all been rescaled to have the same variance. This
puts each covariate on the same scale. Consider estimating � =
(�1; : : : ; �k) by minimizing the loss function
nXi=1
(Yi � bYi)2 + �
kXj=1
j�jj (14.31)
14.8 Technical Appendix 263
where � > 0. The idea is to minimize the sums of squares but
we include a penalty that gets large if any of the � 0js are large.
The solution b� = (b�1; : : : ; b�k) can be found numerically and
will depend on the choice of �. It can be shown that some of
the b�j's will be 0. We interpret this is meaning that the jth is
omitted from the model. Hence, we are doing estimation and
model selection simultaneously. We need to choose a value of
�. We can do this by estimating the prediction risk R(�) as
a function of � and choosing � to minimze the estimated risk.
For example, we can estimate the risk using leave-one-out cross-
validation.
Example 14.17 Returning to the crime data, Figure 14.3 shows
the results of the lasso. The �rst plot shows the leave-one-out
cross-validation score as a function of �. The minimum occurs
at � = :7. The second plot shows the estimated coeÆcients as a
function of �. You can see how some estimated parameters are
zero until � gets larger. At � = :7, all the b�j are non-zero so theLasso chooses the full model.
14.8 Technical Appendix
Why is the predicition interval of a di�erent form than the
other con�dence intervals we have seen? The reason is that the
quantity we want to estimate, Y�, is not a �xed parameter, it
is a random variable. To understand this point better, let � =
�0+�1X� and let b� = b�0+ b�1X�. Thus, bY� = b� while Y� = �+ �.
Now, b� � N(�; se 2) where
se 2 = V(b�) = V(b�0 + b�1x�):Note that V(b�) is the same as V(bY�). Now, b� � 2
qV ar(b�) is a
95 per cent con�dence interval for � = �0+�1x� using the usual
264 14. Linear Regression
0.2 0.4 0.6 0.8
80
01
20
0
0.2 0.4 0.6 0.8
−1
01
03
0
�
cross-validationscore
�
b�(�)
FIGURE 14.3. The Lasso applied to the crime data.
14.8 Technical Appendix 265
argument for a con�dence interval. It is not a valid con�dence
interval for Y�. To see why, let's compute the probability thatbY� � 2
qV(bY�) contains Y�. Let s =qV ar(bY�). Then,
P(bY� � 2s < Y� < bY� + 2s) = P
�2 <
bY� � Y�
s< 2
!
= P
�2 <
b� � � � �
s< 2
!
= P
�2 <
b� � �
s� �
s< 2
!
� P
��2 < N(0; 1)�N
�0;�2
s2
�< 2
�= P
��2 < N
�0; 1 +
�2
s2
�< 2
�6= :95:
The problem is that the quantity of interest Y� is equal
to a parameter � plus a random variable. We can �x this
by de�ning
�2n= V(bY�) + �2 =
� Pi(xi � x�)
2
nP
i(xi � x)2
+ 1
��2:
In practice, we substitute b� for � and we denote the resulting
quantity by b�n. Now consider the interval bY� � 2b�n. Then,P(bY� � 2b�n < Y� < bY� + 2b�n) = P
�2 <
bY� � Y�b�n < 2
!
= P
�2 <
b� � � � �b�n < 2
!
� P
��2 < N(0; s2 + �2)b�n < 2
�� P
��2 < N(0; s2 + �2)
�n< 2
�
266 14. Linear Regression
= P (�2 < N(0; 1) < 2) = :95:
Of course, a 1� � interval is given by bY� � z�=2b�n.14.9 Exercises
1. Prove Theorem 14.4.
2. Prove the formulas for the standard errors in Theorem
14.8. You should regard the Xi's as �xed constants.
3. Consider the regression through the origin model:
Yi = �Xi + �:
Find the least squares estimate for �. Find the standard
error of the estimate. Find conditions that guarantee that
the estimate is consistent.
4. Prove equation (14.24).
5. In the simple linear regression model, construct a Wald
test for H0 : �1 = 17�0 versus H1 : �1 6= 17�0.
6. Get the passenger car mileage data from
http://lib.stat.cmu.edu/DASL/Data�les/carmpgdat.html
(a) Fit a simple linear regression model to predict MPG
(miles per gallon) from HP (horsepower). Summarize your
analysis including a plot of the data with the �tted line.
(b) Repeat the analysis but use log(MPG) as the response.
Compare the analyses.
7. Get the passenger car mileage data from
http://lib.stat.cmu.edu/DASL/Data�les/carmpgdat.html
(a) Fit a multiple linear regression model to predict MPG
(miles per gallon) from HP (horsepower). Summarize your
analysis.
14.9 Exercises 267
(b) Use Mallow Cp to select a best sub-model. To search
trhough the models try (i) all possible models, (ii) for-
ward stepwise, (iii) backward stepwise. Summarize your
�ndings.
(c) Repeat (b) but use BIC. Compare the results.
(d) Now use the Lasso and compare the results.
8. Assume that the errors are Normal. Show that the model
with highest AIC (equation (14.26)) is the model with the
lowest Mallows Cp statistic.
9. In this question we will take a closer look at the AIC
method. Let X1; : : : ; Xn be iid observations. Consider two
modelsM0 and M1. Under M0 the data are assumed to
be N(0; 1) while under M1 the data are assumed to be
N(�; 1) for some unknown � 2 R:
M0 : X1; : : : ; Xn � N(0; 1)
M1 : X1; : : : ; Xn � N(�; 1); � 2 R:
This is just another way to view the hypothesis testing
problem: H0 : � = 0 versus H1 : � 6= 0. Let `n(�) be
the log-likelihood function. The AIC score for a model is
the log-likelihood at the mle minus the number of param-
eters. (Some people multiply this score by 2 but that is
irrelevant.) Thus, the AIC score for M0 is AIC0 = `n(0)
and the AIC score for M1 is AIC1 = `n(b�) � 1. Suppose
we choose the model with the highest AIC score. Let Jn
denote the selected model:
Jn =
�0 if AIC0 > AIC1
1 if AIC1 > AIC0:
(a) Suppose thatM0 is the true model, i.e. � = 0. Find
limn!1
P (Jn = 0) :
268 14. Linear Regression
Now compute limn!1 P (Jn = 0) when � 6= 0.
(b) The fact that limn!1 P (Jn = 0) 6= 1 when � = 0 is
why some people say that AIC \over�ts." But this is not
quite true as we shall now see. Let ��(x) denote a Normal
density function with mean � and variance 1. De�ne
bfn(x) = � �0(x) if Jn = 0
�b�(x) if Jn = 1:
If � = 0, show that D(�0; bfn) p! 0 as n!1 where
D(f; g) =
Zf(x) log
�f(x)
g(x)
�dx
is the Kullback-Leibler distance. Show also thatD(��; bfn) p!0 if � 6= 0. Hence, AIC consistently estimates the true den-
sity even if it \overshoots" the correct model.
REMARK: If you are feeling ambitious, repeat this anal-
ysis for BIC which is the log-likelihood minus (p=2) logn
where p is the number of parameters and n is sample size.
This is page 269
Printer: Opaque this
15
Multivariate Models
In this chapter we revisit the Multinomial model and the mul-
tivariate Normal, as we will need them in future chapters.
Let us �rst review some notation from linear algebra. Re-
call that if x and y are vectors then xTy =
Pj xjyj. If A is a
matrix then det(A) denotes the determinant of A, AT denotes
the transpose of A and A� denotes the the inverse of A (if the
inverse exists). The trace of a square matrix A { denoted by
tr(A) { is the sum of its diagonal elements. The trace satis-
�es tr(AB) = tr(BA) and tr(A) + tr(B). Also, tr(a) = a if a
is a scalar (i.e. a real number). A matrix is positive de�nite if
xT�x > 0 for all non-zero vectors x. If a matrix � is symmet-
ric and positive de�nite, there exists a matrix �1=2 { called the
square root of � { with the following properties:
1. �1=2 is symmetric;
2. � = �1=2�1=2;
3. �1=2��1=2 = ��1=2�1=2 = I where ��1=2 = (�1=2)�1.
270 15. Multivariate Models
15.1 Random Vectors
Multivariate models involve a random vector X of the form
X =
0B@ X1
...
Xk
1CA :
The mean of a random vector X is de�ned by
� =
0B@ �1
...
�k
1CA =
0B@ E(X1)...
E(Xk)
1CA : (15.1)
The covariance matrix � is de�ned to be
� = V(X) =
26664V(X1) Cov(X1; X2) � � � Cov(X1; Xk)
Cov(X2; X1) V(X2) � � � Cov(X2; Xk)...
......
...
Cov(Xk; X1) Cov(Xk; X2) � � � V(Xk)
37775 :(15.2)
This is also called the variance matrix or the variance-covariance
matrix.
Theorem 15.1 Let a be a vector of length k and let X be a ran-
dom vector of the same length with mean � and variance �.
Then E (aTX) = aT� and V(aTX) = a
T�a. If A is a matrix
with k columns then E (AX) = A� and V(AX) = A�AT .
Now suppose we have a random sample of n vectors:0BBB@X11
X21
...
Xk1
1CCCA ;
0BBB@X12
X22
...
Xk2
1CCCA ; : : : ;
0BBB@X1n
X2n
...
Xkn
1CCCA : (15.3)
The sample mean X is a vector de�ned by
X =
0B@ X1
...
Xk
1CA
15.2 Estimating the Correlation 271
where X i = n�1Pn
j=1Xij. The sample variance matrix, also
called the covariance matrix or the variance-covariance matrix,
is
S =
26664s11 s12 � � � s1k
s12 s22 � � � s2k...
......
...
s1k s2k � � � skk
37775 (15.4)
where
sab =1
n� 1
nXj=1
(Xaj �Xa)(Xbj �Xb):
It follows that E (X) = �. and E (S) = �.
15.2 Estimating the Correlation
Consider n data points from a bivariate distribution:�X11
X21
�;
�X12
X22
�; � � � ;
�X1n
X2n
�:
Recall that the correlation between X1 and X2 is
� =E ((X1 � �1)(X2 � �2))
�1�2: (15.5)
The sample correlation (the plug-in estimator) is
b� = Pni=1(X1i �X1)(X2i �X2)
s1s2: (15.6)
We can construct a con�dence interval for � by applying the
delta method as usual. However, it turns out that we get a more
accurate con�dence interval by �rst constructing a con�dence
interval for a function � = f(�) and then applying the inverse
function f�1. The method, due to Fisher, is as follows. De�ne
f(r) =1
2
�log(1 + r)� log(1� r)
�
272 15. Multivariate Models
akd let � = f(�). The inverse of r is
g(z) � f�1(z) =
e2z � 1
e2z + 1:
Now do the following steps:
Approximate Con�dence Interval for The Correlation
1. Compute
b� = f(b�) = 1
2
�log(1 + b�)� log(1� b�)�:
2. Compute the approximate standard error of b� which can
be shown to be bse (b�) = 1pn� 3
:
3. An approximate 1� � con�dence interval for � = f(�) is
(a; b) ��b� � z�=2p
n� 3; b� + z�=2p
n� 3
�:
4. Apply the inverse transformation f�1(z) to get a con�-
dence interval for �:�e2a � 1
e2a + 1;e2b � 1
e2b + 1
�:
15.3 Multinomial
Let us now review the Multinomial distribution. Consider
drawing a ball from an urn which has balls with k di�erent
colors labeled color 1, color 2, ... , color k. Let p = (p1; : : : ; pk)
15.3 Multinomial 273
where pj � 0 andPk
j=1 pj = 1 and suppose that pj is the prob-
ability of drawing a ball of color j. Draw n times (independent
draws with replacement) and let X = (X1; : : : ; Xk) where Xj is
the number of times that color j appeared. Hence, n =Pk
j=1Xj.
We say that X has a Multinomial (n,p) distribution. The prob-
ability function is
f(x; p) =
�n
x1 : : : xk
�px11 � � � pxkk
where �n
x1 : : : xk
�=
n!
x1! � � �xk!:
Theorem 15.2 Let X � Multinomial(n; p). Then the marginal
distribution of Xj is Xj � Binomial(n; pj). The mean and vari-
ance of X are
E (X) =
0B@ np1...
npk
1CAand
V(X) =
0BBB@np1(1� p1) �np1p2 � � � �np1pk�np1p2 np2(1� p2) � � � �np2pk
......
......
�np1pk �np2pk � � � npk(1� pk)
1CCCA :
Proof. That Xj � Binomial(n; pj) follows easily. Hence,
E (Xj) = npj and V(Xj) = npj(1�pj). To compute Cov(Xi; Xj)
we proceed as follows. Notice thatXi+Xj � Binomial(n; pi+pj)
and so V(Xi+Xj) = n(pi+ pj)(1� pi� pj). On the other hand,
V(Xi +Xj) = V(Xi) + V(Xj) + 2Cov(Xi; Xj)
= npi(1� pi) + npj(1� pj) + 2Cov(Xi; Xj):
Equating this last expression with n(pi+pj)(1�pi�pj) implies
that Cov(Xi; Xj) = �npipj. �
274 15. Multivariate Models
Theorem 15.3 The maximum likelihood estimator of p is
bp =0B@ bp1
...bpk1CA =
0B@X1
n...Xk
n
1CA =X
n:
Proof. The log-likelihood (ignoring the constant) is
`(p) =
kXj=1
Xj log pj:
When we maximize ` we have to be careful since we must enforce
the constraint thatP
j pj = 1. We use the method of Lagrange
multipliers and instead maximize
A(p) =
kXj=1
Xj log pj + �
�Xj
pj � 1�:
Now@A(p)
@pj=
Xj
pj+ �:
Setting@A(p)
@pj= 0 yields bpj = �Xj=�. Since
Pj bpj = 1 we see
that � = �n and hence bpj = Xj=n as claimed. �
Next we would like to know the variability of the mle . We
can either compute the variance matrix of bp directly or we can
approximate the variability of the mle by computing the Fisher
information matrix. These two approaches give the same answer
in this case. The direct approach is easy: V(bp) = V(X=n) =
n�2V(X) and so
V(bp) = 1
n
0BBB@p1(1� p1) �p1p2 � � � �p1pk�p1p2 p2(1� p2) � � � �p2pk
......
......
�p1pk �p2pk � � � pk(1� pk)
1CCCA :
15.4 Multivariate Normal 275
15.4 Multivariate Normal
Let us recall how the multivariate Normal distribution is de-
�ned. To begin, let
Z =
0B@ Z1
...
Zk
1CAwhere Z1; : : : ; Zk � N(0; 1) are independent. The density of Z
is
f(z) =1
(2�)k=2exp
(�1
2
kXj=1
z2j
)=
1
(2�)k=2exp
��1
2zTz
�:
(15.7)
The variance matrix of Z is the identity matrix I. We write
Z � N(0; I) where it is understood that 0 denotes a vector of
k zeroes. We say that Z has a standard multivariate Normal
distribution.
More generally, a vector X has a multivariate Normal distri-
bution, denoted by X � N(�;�), if its density is
f(x; �;�) =1
(2�)k=2det(�)1=2exp
��1
2(x� �)T��1(x� �)
�(15.8)
where det(�) denotes the determinant of a matrix, � is a vector
of length k and � is a k�k symmetric, positive de�nite matrix.
Then E (X) = � and V(X) = �. Setting � = 0 and � = I gives
back the standard Normal.
Theorem 15.4 The following properties hold:
1. If Z � N(0; 1) and X = �+ �1=2Z then X � N(�;�).
2. If X � N(�;�), then ��1=2(X � �) � N(0; 1).
3. If X � N(�;�) a is a vector of the same length as X,
then aTX � N(aT�; aT�a).
276 15. Multivariate Models
4. Let
V = (X � �)T��1(X � �):
Then V � �2k.
Suppose we partition a random Normal vector X into two
parts X = (Xa; Xb) We can similarly partition the the mean
� = (�a; �b) and the variance
� =
��aa �ab
�ba �bb
�:
Theorem 15.5 Let X � N(�;�). Then:
(1) The marginal distribition of Xa is Xa � N(�a;�aa).
(2) The conditional distribition of Xb given Xa = xa is
XbjXa = xa � N(�(xa);�(xa))
where
�(xa) = �b + �ba��1aa (xa � �a) (15.9)
�(xa) = �bb � �ba��1aa �ab: (15.10)
Theorem 15.6 Given a random sample of size n from a N(�;�),
the log-likelihood is (up to a constant not depending on � or �)
is given by
`(�;�) = �n
2(X��)T��1(X��)�
n
2tr(��1S)�
n
2log det(�):
The mle is
b� = X and b� =
�n� 1
n
�S: (15.11)
15.5 Appendix 277
15.5 Appendix
Proof of Theorem 15.6. Denote the ith random vector by Xi.
The log-likelihood is
`(�;�) =Xi
f(X i; �;�) = �kn
2log(2�)�
n
2log det(�)�
1
2
Xi
(X i��)T��1(X i��):
Now,Xi
(X i � �)T��1(X i � �) =Xi
[(X i �X) + (X � �)]T��1[(X i �X) + (X � �)]
=Xi
�(X i �X)T��1(X i �X)
�+ n(X � �)T��1(X � �)
sinceP
i(Xi � X)��1(X � �) = 0. Also, notice that (X i �
�)T��1(X i � �) is a scalar, soXi
(X i � �)T��1(X i � �) =Xi
tr�(X i � �)T��1(X i � �)
�=
Xi
tr���1(X i � �)(X i � �)T
�= tr
"��1
Xi
(X i � �)(X i � �)T
#= n tr
���1S
�and the conclusion follows.
15.6 Exercises
1. Prove Theorem 15.1.
2. Find the Fisher informationmatrix for themle of a Multi-
nomial.
3. Prove Theorem 15.5.
4. (Computer Experiment. Write a function to generate nsim
observations from a Multinomial(n; p) distribution.
278 15. Multivariate Models
5. (Computer Experiment. Write a function to generate nsim
observations from a Multivariate normal with given mean
� and covariance matrix �.
6. (Computer Experiment.Generate 1000 random vectors from
a N(�;�) distribution where
� =
�3
8
�; � =
�2 6
6 2
�:
Plot the simulation as a scatterplot. Find the distribution
of X2jX1 = x1 using theorem 15.5. In particular, what is
the formula for E (X2jX1 = x1)? Plot E (X2jX1 = x1) on
your scatterplot. Find the correlation � between X1 and
X2. Compare this with the sample correlations from your
simulation. Find a 95 per cent con�dence interval for �.
Estimate the covariance matrix �.
7. Generate 100 random vectors from a multivariate Normal
with mean (0; 2)T and variance�1 3
3 1
�:
Find a 95 per cent con�dence interval for the correlation
�. What is the true value of �?
This is page 279
Printer: Opaque this
16
Inference about Independence
In this chapter we address the following questions:
(1) How do we test if two random variables are independent?
(2) How do we estimate the strength of dependence between two
random variables?
When Y and Z are not independent, we say that they are
dependent or associated or related. If Y and Z are associ- Recall that we write
Y qZ to mean that
Y and Z are inde-
pendent.
ated, it does not imply that Y causes Z or that Z causes Y . If
Y does cause Z then changing Y will change the distribution of
Z, otherwise it will not change the distribution of Z.
For example, quitting smoking Y will reduce your probabil-
ity of heart disease Z. In this case Y does cause Z. As another
example, owning a TV Y is associated with having a lower in-
cidence of starvation Z. This is because if you own a TV you
are less likely to live in an impoverished nation. But giving a
starving person will not cause them to stop being hungry. In
this case, Y and Z are associated but the relationship is not
causal.
280 16. Inference about Independence
We defer detailed discussion of the important question of cau-
sation until a later chapter.
16.1 Two Binary Variables
Suppose that Y and Z are both binary. Consider a data set
(Y1; Z1); : : : ; (Yn; Zn). Represent the data as a two-by-two table:
Y = 0 Y = 1
Z = 0 X00 X01 X0�
Z = 1 X10 X11 X1�
X�0 X
�1 n = X��
where the Xij represent counts:
Xij = number of observations for which Y = i and Z = j:
The dotted subscripts denote sums. For example, Xi� =P
jXij.
This is a convention we use throughout the remainder of the
book. Denote the corresponding probabilities by:
Y = 0 Y = 1
Z = 0 p00 p01 p0�Z = 1 p10 p11 p1�
p�0 p
�1 1
where pij = P(Z = i; Y = j). Let X = (X00; X01; X10; X11)
denote the vector of counts. Then X � Multinomial(n; p) where
p = (p00; p01; p10; p11). It is now convenient to introduce two new
parameters.
16.1 Two Binary Variables 281
De�nition 16.1 The odds ratio is de�ned to be
=p00p11
p01p10: (16.1)
The log odds ratio is de�ned to be
= log( ): (16.2)
Theorem 16.2 The following statements are equivalent:
1. Y q Z.2. = 1.3. = 0.4. For i; j 2 f0; 1g,
pij = pi�p�j: (16.3)
Now consider testing
H0 : Y q Z versus H1 : Y 6qZ:
First we consider the likelihood ratio test. Under H1, X �
Multinomial(n; p) and the mle is bp = X=n. Under H0, we again
have that X � Multinomial(n; p) but p is subject to the con-
straint
pij = pi�p�j; j = 0; 1:
This leads to the following test.
Theorem 16.3 (Likelihood Ratio Test for Independence in a 2-by-2 table)
Let
T = 2
1Xi=0
2Xj=0
Xij log
�XijX��
Xi�X�j
�: (16.4)
Under H0, T �21. Thus, an approximate level � test is
obtained by rejecting H0 then T > �21;�
.
282 16. Inference about Independence
Another popular test for independence is Pearson's �2 test.
Theorem 16.4 (Pearson's �2 test for Independence in a 2-by-2 table)
Let
U =
1Xi=0
1Xj=0
(Xij � Eij)2
Eij
(16.5)
where
Eij =Xi�X�j
n:
Under H0, U �21. Thus, an approximate level � test is
obtained by rejecting H0 then U > �21;�
.
Here is the intuition for the Pearson test. Under H0, pij =
pi�p�j, so the maximum likelihood estimator of pij under H0 is
bpij = bpi�bp�j = Xi�
n
X�j
n:
Thus, the expected number of observations in the (i,j) cell is
Eij = nbpij = Xi�X�j
n:
The statistic U compares the observed and expected counts.
Example 16.5 The following data from Johnson and Johnson
(1972) relate tonsillectomy and Hodgkins disease. (The data are
actually from a case-control study; we postpone discussion of
this point until the next section.)
Hodgkins Disease No Disease
Tonsillectomy 90 165 255
No Tonsillectomy 84 307 391
Total 174 472 646
We would like to know if tonsillectomy is related to Hodgkins
disease. The likelihood ratio statistic is T = 14:75 and the p-
value is P(�21> 14:75) = :0001. The �2 statistic is U = 14:96
16.1 Two Binary Variables 283
and the p-value is P(�21> 14:96) = :0001. We reject the null
hypothesis of independence and conclude that tonsillectomy is
associated with Hodgkins disease. This does not mean that ton-
sillectomies cause Hodgkins disease. Suppose, for example, that
doctors gave tonsillectomies to the most seriosuly ill patients.
Then the assocation between tonsillectomies and Hodgkins dis-
ease may be due to the fact that those with tonsillectomies were
the most ill patients and hence more likely to have a serious
disease.
We can also estimate the strength of dependence by estimat-
ing the odds ratio and the log-odds ratio .
Theorem 16.6 The mle 's of and are
b =X00X11
X01X10
; b = log b : (16.6)
The asymptotic stanrad errors (computed from the delta method)
are
bse (b ) =
r1
X00
+1
X01
+1
X10
+1
X11
(16.7)
bse ( b ) = b bse (b ): (16.8)
Remark 16.7 For small sample sizes, b and b can have a very
large variance. In this case, we often use the modi�ed estimator
b =
�X00 +
1
2
� �X11 +
1
2
��X01 +
1
2
� �X10 +
1
2
� : (16.9)
Yet another test for indepdnence is the Wald test for = 0
given by W = (b �0)= bse (b ). A 1�� con�dence interval for isb � z�=2 bse (b ). A 1�� con�dence interval for can be obtained
in two ways. First, we could use b � z�=2 bse ( b ). Second, since = e we could use
exp�b � z�=2 bse (b ) : (16.10)
This second method is usually more accurate.
284 16. Inference about Independence
Example 16.8 In the previous example,
b =90� 307
165� 84= 1:99
and b = log(1:99) = :69:
So tonsillectomy patients were twice as likely to have Hodgkins
disease. The standard error of b isr1
90+
1
84+
1
165+
1
307= :18:
The Wald statistic is W = :69=:18 = 3:84 whose p-value is
P(jZj > 3:84) = :0001, the same as the other tests. A 95 per
cent con�dence interval for is b � 2(:18) = (:33; 1:05). A 95
per cent con�dence interval for is (e:33; e1:05) = (1:39; 2:86).
16.2 Interpreting The Odds Ratios
Suppose event A as probability P (A). The odds of A are de�ned
as
odds(A) =P (A)
1� P (A):
It follows that
P(A) =odds(A)
1 + odds(A):
Let E be the event that someone is exposed to something
(smoking, radiation, etc) and let D be the event that they get
a disease. The odds of getting the disease given that you are
exposed are
odds(DjE) =P(DjE)
1� P(DjE)
and the odds of getting the disease given that you are not ex-
posed are
odds(DjEc) =P(DjEc)
1� P(DjEc):
16.2 Interpreting The Odds Ratios 285
The odds ratio is de�ned to be
=odds(DjE)
odds(DjEc):
If = 1 then disease probability is the same for exposed and un-
exposed. This implies that these events are independent. Recall
that the log-odds ratio is de�ned as = log( ). Independence
corresponds to = 0.
Consider this table of probabilities:
Dc D
Ec p00 p01 p0�E p10 p11 p1�
p�0 p
�1 1
Denote the data by
Dc D
Ec X00 X01 X0�
E X10 X11 X1�
X�0 X
�1 X��
Now
P(DjE) =p11
p10 + p11and P(DjEc) =
p01
p00 + p01
and so
odds(DjE) =p11
p10and odds(DjEc) =
p01
p00
and therefore,
=p11p00
p01p10:
To estimate the parameters, we have to �rst consider how the
data were collected. There are three methods.
Multinomial Sampling. We draw a sample from the pop-
ulation and, for each person, record their exposure and disease
status. In this case,X = (X00; X01; X10; X11) � Multinomial(n; p).
286 16. Inference about Independence
We then estimate the probabilities in the table by bpij = Xij=n
and b =bp11bp00bp01bp10 =
X11X00
X01X10
:
Prospective Sampling. (Cohort Sampling). We get
some exposed and unexposed people and count the number with
disease in each group. Thus,
X01 � Binomial(X0�; P (DjEc))
X11 � Binomial(X1�; P (DjE)):
We should really write x0� and x1� instead of X0� and X1� since
in this case, these are �xed not random but for notational sim-
plicity I'll keep using capital letters. We can estimate P (DjE)
and P (DjEc) but we cannot estimate all the probabilities in the
table. Still, we can estimate since is a function of P (DjE)
and P (DjEc). Now bP (DjE) = X11
X1�
and bP (DjEc) =X01
X0�
:
Thus, b =X11X00
X01X10
just as before.
Case-Control (Retrospective) Sampling. Here we get
some diseased and non-diseased people and we observe how
many are exposed. This is much more eÆcient if the disease
is rare. Hence,
X10 � Binomial(X�0; P (EjD
c))
X11 � Binomial(X�1; P (EjD)):
From these data we can estimate P(EjD) and P(EjDc). Sur-
prisingly, we can also still estimate . To understand why, note
16.2 Interpreting The Odds Ratios 287
that
P(EjD) =p11
p01 + p11; 1�P(EjD) =
p01
p01 + p11; odds(EjD) =
p11
p01:
By a similar argument,
odds(EjDc) =p10
p00:
Hence,odds(EjD)
odds(EjDc)=p11p00
p01p10= :
From the data, we form the following estimates:
bP (EjD) =X11
X�1
; 1� bP (EjD) =X01
X�1
; [odds(EjD) =X11
X01
; [odds(EjDc) =X10
X00
:
Therefore, b =X00X11
X01X10
:
So in all three data collection methods, the estimate of turns
out to be the same.
It is tempting to try to estimate P(DjE) � P(DjEc). In a
case-control design, this quantity is not estimable. To see this,
we apply Bayes' theorem to get
P(DjE)� P(DjEc) =P(EjD)P(D)
P(E)�P(EcjD)P(D)
P(Ec):
Because of the way we obtained the data, P(D) is not estimable
from the data. However, we can estimate � = P(DjE)=P(DjEc),
which is called the relative risk, under the rare disease as-
sumption.
Theorem 16.9 Let � = P(DjE)=P(DjEc). Then
�! 1
as P(D)! 0.
288 16. Inference about Independence
Thus, under the rare disease assumption, the relative risk is
approximately the same as the odds ratio and, as we have seen,
we can estimate the ods ratio.
In a randomized experiment, we can interpret a strong associ-
ation, that is 6= 1, as a causal relationship. In an observational
(non-randomized) study, the association can be due to other
unobserved confounding variables. We'll discuss causation in
more detail later.
16.3 Two Discrete Variables
Now suppose that Y 2 f1; : : : ; Ig and Z 2 f1; : : : ; Jg are two
discrete variables. The data can be represented as an I� by�J
table of counts:
Y = 1 Y = 2 � � � Y = j � � � Y = J
Z = 1 X11 X12 � � � X1j � � � X1J X1�
......
......
......
......
Z = i Xi1 Xi2 � � � Xij � � � XiJ Xi�
......
......
......
......
Z = I XI1 XI2 � � � XIj � � � XIJ XI�
X�1 X
�2 � � � X�j � � � X
�J n
Consider testing H0 : Y q Z versus H1 : Y 6qZ.
16.3 Two Discrete Variables 289
Theorem 16.10 Let
T = 2
IXi=1
JXj=1
Xij log
�XijX��
Xi�X�j
�: (16.11)
The limiting distribution of T under the null hypothesis
of independence is �2�where � = (I�1)(J�1). Pearson's
�2 test statistic is
U =
IXi=1
JXj=1
(nij � Eij)2
Eij
: (16.12)
Asymptotically, under H0, U has a �2�distribution where
� = (I � 1)(J � 1).
Example 16.11 These data are from a study by Hancock et al
(1979). Patients with Hodkins disease are classi�ed by their re-
sponse to treatment and by histological type.
Type Positive Response Partial Response No Response
LP 74 18 12 104
NS 68 16 12 96
MC 154 54 58 266
LD 18 10 44 72
The �2 test statistic is 75.89 with 2� 3 = 6 degrees of freedom.
The p-value is P(�26> 75:89) � 0. The likelihood ratio test
statistic is 68.30 with 2� 3 = 6 degrees of freedom. The p-value
is P(�26> 68:30) � 0. Thus there is strong evidence that reponse
to treatment and histological type are associated.
There are a variety of ways to quantify the strength of depen-
dence between two discrete variables Y and Z. Most of them
are not very intuitive. The one we shall use is not standard but
is more interpretable.
290 16. Inference about Independence
We de�ne
Æ(Y; Z) = maxA;B
jPY;Z(Y 2 A;Z 2 B)�PY (Y 2 A)�PZ(Z 2 B)j
(16.13)
where the maximum is over all pairs of events A and B.
Theorem 16.12 Properties of Æ:
1. 0 � Æ(Y; Z) � 1.
2. Æ(Y; Z) = 0 if and only if Y q Z.
3. The following identity holds:
Æ(X; Y ) =1
2
IXi=1
JXj=1
���pij � pi� p�j
���: (16.14)
4. The mle of Æ is
Æ(X; Y ) =1
2
IXi=1
JXj=1
���bpij � bpi� bp�j��� (16.15)
where bpij = Xij
n; bpi� = Xi�
n; bp
�j =X�j
n:
The interpretation of Æ is this: if one person makes probability
statements assuming independence and another person makes
probability statements without assuming independence, their
probability statements may di�er by as much as Æ. Here is a
suggested scale for interpreting Æ:
0 < Æ � :01 negligible association
:01 < Æ � :05 non-negligible association
:05 < Æ � :10 substantial association
Æ > :10 very strong association
A con�dence interval for Æ can be obtained by bootstrapping.
The steps for bootstrapping are:
16.4 Two Continuous Variables 291
1. Draw X� � Multinomial(n; bp);2. Compute bp�
ij; bp�
i�; bp�
�j.
3. Compute �.
4. Repeat.
Now we use any of the methods we learned earlier for construct-
ing bootstrap con�dence intervals. However, we should not use
a Wald interval in this case. The reason is that if Y q Z then
Æ = 0 and we are on the boundary of the parameter space. In
this case, the Wald method is not valid.
Example 16.13 Returning to Example 16.11 we �nd that bÆ = :11.
Using a pivotal bootstrap with 10; 000 bootstrap samples, a 95
per cent con�dence interval for Æ is (.09,.22). Our conclusion
is that the association between histological type and response is
substantial.
16.4 Two Continuous Variables
Now suppose that Y and Z are both continuous. If we assume
that the joint distribution of Y and Z is bivariate Normal, then
we mesure the dependence between Y and Z by means of the
correaltion coeÆcient �. Tests, estimates and con�dence inter-
vals for � in the Normal case are given in the previous chapter.
If we do not assume Normality then we need a nonparametric
method for assessing dependence.
Recall that the correaltion is
� =E ((X1 � �1)(X2 � �2))
�1�2:
A nonparametric estimator of � is the plug-in estimator which
is
b� = Pn
i=1(X1i �X1)(X2i �X2)qP
n
i=1(X1i �X1)2
Pn
i=1(X2i �X2)2
292 16. Inference about Independence
which is just the sample correlation. A con�dence interval can be
constructed using the bootstrap. A test for � = 0 can be based
on the Wald test using the bootstrap to estimate the standard
error.
The plug-in approach is useful for large samples. For small
samples, we measure the correlation using the Spearman rank
correaltion coeÆcient b�S. We simply replace the data by their
ranks { ranking each variable separately { then we compute the
correaltion coeÆcient of the ranks. To test the null hypothesis
that �S = 0 we need the distribution of b�S under the null hy-
pothesis. This can be obtained easily by simulation. We �x the
ranks of the �rst variable as 1; 2; : : : ; n. The ranks of the second
variable are chosen at random from the set of n! possible order-
ings. Then we compute the correlation. This is repeated many
times and the resulting distribution P0 is the null distribution ofb�S. The p-value for the test is P0(jRj > jb�Sj) where R is drawn
from the null distribution P0.
Example 16.14 The following data (Snedecor and Cochran, 1980,
p. 191) are systolic blood pressure X1 and are diastolic blood
pressure X2 in millimeters:
X1 100 105 110 110 120 120 125 130 130 150 170 195
X2 65 65 75 70 78 80 75 82 80 90 95 90
The sample correlation is b� = :88. The bootstrap standard
error is .046 and the Wald test statistic is :88=:046 = 19:23.
The p-value is near 0 giving strong evidence that the correlation
is not 0. The 95 per cent pivotal bootstrap con�dence interval is
(.78,.94). Because the sample size is small, consider Spearman's
rank correlation. In ranking the data, we will use average ranks
if there are ties. So if the third and fourth lowest numbers are
the same, they each get rank 3.5. The ranks of the data are:
16.5 One Continuous Variable and One Discrete 293
X1 1 2 3.5 3.5 5.5 5.5 7 8.5 8.5 10 11 12
X2 1.5 1.5 4.5 3 6 7.5 4.5 9 7.5 10.5 12 10.5
The rank correlation is b�S = :94 and the p-value for testing the
null hypothesis that there is no correaltion is P0(jRj > :94) � 0
which is was obtained by simulation.
16.5 One Continuous Variable and One Discrete
Suppose that Y 2 f1; : : : ; Ig is discrete and Z is continuous.
Let Fi(z) = P(Z � zjY = i) denote the cdf of Z conditional
on Y = i.
Theorem 16.15 When Y 2 f1; : : : ; Ig is discrete and Z is con-
tinuous, then Y q Z if and only if F1 = � � � = FI.
It follows from the previous theorem that to test for indepen-
dence, we need to test
H0 : F1 = � � � = FI versus H1 : not H0:
For simplicity, we consider the case where I = 2. To test the
null hypothesis that F1 = F2 we will use the two sample
Kolmogorov-Smirnov test. Let n1 denote the nimber of ob-
servations for which Yi = 1 and let n2 denote the nimber of
observations for which Yi = 2. Let
bF1(z) = 1
n1
nXi=1
I(Zi � z)I(Yi = 1)
and bF2(z) = 1
n2
nXi=1
I(Zi � z)I(Yi = 2)
denote the empirical distribution function of Z given Y = 1 and
Y = 2 respectively. De�ne the test statistic
D = supx
j bF1(x)� bF2(x)j:
294 16. Inference about Independence
Theorem 16.16 Let
H(t) = 1� 2
1Xj=1
(�1)j�1e�2j2t2
:
Under the null hypthesis that F1 = F2,
limn!1
P
�rn1n2
n1 + n2D � t
�= H(t):
It follows from the theorem that an approximate level � test
is obtained by rejecting H0 whenrn1n2
n1 + n2D > H�1(1� �):
16.6 Bibliographic Remarks
Johnson, S.K. and Johnson, R.E. (1972). New England Journal
of Medicine. 287. 1122-1125.
Hancock, B.W. (1979). Clinical Oncology, 5, 283-297.
16.7 Exercises
1. Prove Theorem 16.2.
2. Prove Theorem 16.3.
3. Prove Theorem 16.9.
4. Prove equation (16.14).
5. The New York Times (January 8, 2003, page A12) re-
ported the following data on death sentencing and race,The data here are
an approximate re-
creation using the
information in the
article.
from a study in Maryland:
Death Sentence No Death Sentence
Black Victim 14 641
White Victim 62 594
16.7 Exercises 295
Analyze the data using the tools from this Chapter. In-
terpret the results. Explain why, based only on this infor-
mation, you can't make causal conclusions. (The authors
of the study did use much more information in their full
report.)
6. Analyze the data on the variables Age and Financial Sta-
tus from:
http://lib.stat.cmu.edu/DASL/Data�les/montanadat.html
7. Estimate the correlation between temperature and lati-
tude using the data from
http://lib.stat.cmu.edu/DASL/Data�les/USTemperatures.html
Use the correlation coeÆcient and the Spearman rank cor-
realtion. Provide estimates, tests and con�dence intervals.
8. Test whether calcium intake and drop in blood pressure
are associated. Use the data in
http://lib.stat.cmu.edu/DASL/Data�les/Calcium.html
This is page 297
Printer: Opaque this
17
Undirected Graphs and ConditionalIndependence
Graphical models are a class of multivariate statistical models
that are useful for representing independence relations. They are
also useful develop parsimonious models for multivariate data.
To see why parsimonious models are useful in the multivariate
setting, consider the problem of estimating the joint distribu-
tion of several discrete random variables. Two binary variables
Y1 and Y2 can be represented as a two-by-two table which corre-
sponds to a multinomial with four categories. Similarly, k binary
variables Y1; : : : ; Yk correspond to a multinomial with N = 2k
categories. When k is even moderately large, N = 2k will be
huge. It can be shown in that case that the mle is a poor es-
timator. The reason is that the data are sparse: there are not
enough data to estimate so many parameters. Graphical mod-
els often require fewer parameters and may lead to estimators
with smaller risk. There are two main types of graphical models:
undirected and directed. Here, we introduce undirected graphs.
We'll discuss directed graphs later.
298 17. Undirected Graphs and Conditional Independence
17.1 Conditional Independence
Underlying graphical models is the concept of conditional inde-
pendence.
De�nition 17.1 Let X, Y and Z be discrete random vari-ables. We say that X and Y are conditionally inde-
pendent given Z, written X q Y j Z, if
P(X = x; Y = yjZ = z) = P(X = xjZ = z)P(Y = yjZ = z)
(17.1)
for all x; y; z. If X, Y and Z are continuous random vari-
ables, we say that X and Y are conditionally independent
given Z if
fX;Y jZ(x; yjz) = fXjZ(xjz)fY jZ(yjz):
for all x, y and z.
Intuitively, this means that, once you know Z, Y provides no
extra information about X.
The conditional independence relation satis�es some basic
properties.
Theorem 17.2 The following implications hold:
X q Y j Z =) Y qX j Z
X q Y j Z and U = h(X) =) U q Y j Z
X q Y j Z and U = h(X) =) X q Y j (Z; U)
X q Y j Z and X qW j(Y; Z) =) X q (W;Y ) j Z
X q Y j Z and X q Z j Y =) X q (Y; Z):
The last property requires the assumption that all events have
positive probability; the �rst four do not.
17.2 Undirected Graphs 299
b
b
bX
Y
Z
FIGURE 17.1. A graph with vertices V = fX;Y;Zg. The edge set isE = f(X;Y ); (Y;Z)g.
17.2 Undirected Graphs
An undirected graph G = (V;E) has a �nite set V of ver-
tices (or nodes) and a set E of edges (or arcs) consisting of
pairs of vertices. The vertices correspond to random variables
X; Y; Z; : : : and edges are written as unordered pairs. For exam-
ple, (X; Y ) 2 E means that X and Y are joined by an edge.
An example of a graph is in Figure 17.1.
Two vertices are adjacent, written X � Y , if there is an edge
between them. In Figure 17.1, X and Y are adjacent but X and
Z are not adjacent. A sequence X0; : : : ; Xn is called a path if
Xi�1 � Xi for each i. In Figure 17.1, X; Y; Z is a path. A graph
is complete if there is an edge between every pair of vertices.
A subset U � V of vertices together with their edges is called a
subgraph.
If A;B and C are three distinct subsets if V , we say that C
separates A and B if every path from a variable in A to a
variable in B intersects a variable in C. In Figure 17.2 fY;Wg
and fZg are separated by fXg. Also, W and Z are separated
by fX; Y g.
300 17. Undirected Graphs and Conditional Independence
b
b
b
b
Y
W X
Z
FIGURE 17.2. fY;Wg and fZg are separated by fXg. Also, W and
Z are separated by fX;Y g.
b
b
bX
Y
Z
FIGURE 17.3. X q ZjY .
17.3 Probability and Graphs
Let V be a set of random variables with distribution P. Con-
struct a graph with one vertex for each random variable in V .
Suppose we omit the edge between a pair of variables if they are
independent given the rest of the variables:
no edge between X and Y () X q Y jrest
where \rest" refers to all the other variables besides X and Y .
This type of graph is called a pairwise Markov graph. Some
examples are shown in Figures 17.3, 17.4 17.6 and 17.5.
The graph encodes a set of pairwise conditional independence
relations. These relations imply other conditional independence
17.3 Probability and Graphs 301
b
b
bX
Y
Z
FIGURE 17.4. No implied independence relations.
b
b
b
b
Y
X W
Z
FIGURE 17.5. X q ZjfY;Wg and Y qW jfX;Zg.
b b b b
X Y Z W
FIGURE 17.6. Pairwise independence implies that X q ZjfY;Wg.But is X q ZjY ?
302 17. Undirected Graphs and Conditional Independence
relations. How can we �gure out what they are? Fortunately, we
can read these other conditional independence relations directly
from the graph as well, as is explained in the next theorem.
Theorem 17.3 Let G = (V;E) be a pairwise Markov graph for
a distribution P. Let A;B and C be distinct subsets of V such
that C separates A and B. Then AqBjC.
Remark 17.4 If A and B are not connected (i.e. there is no path
from A to B) then we may regard A and B as being separated
by the empty set. Then Theorem 17.3 implies that A qB.
The independence condition in Theorem 17.3 is called the
global Markov property. We thus see that the pairwise and
global Markov properties are equivalent. Let us state this more
precisely. Given a graph G, let Mpair(G) be the set of distri-
butions which satisfy the pairwise Markov property: thus P 2
Mpair(G) if, under P, X q Y jrest if and only if there is no edge
between X and Y . Let Mglobal(G) be the set of distributions
which satisfy the global Markov property: thus P 2 Mpair(G) if,
under P, AqBjC if and only if C separates A and B.
Theorem 17.5 Let G be a graph. Then, Mpair(G) =Mglobal(G).
This theorem allows us to construct graphs using the simpler
pairwise property and then we can deduce other independence
relations using the global Markov property. Think how hard this
would be to do algebraically. Returning to 17.6, we now see that
X q ZjY and Y qW jZ.
Example 17.6 Figure 17.7 implies that X q Y , X q Z and X q
(Y; Z).
Example 17.7 Figure 17.8 implies that X qW j(Y; Z) and X q
ZjY .
17.3 Probability and Graphs 303
b
b
bX
Y
Z
FIGURE 17.7. X q Y , X q Z and X q (Y;Z).
b
b
b
b
X Y
Z
W
FIGURE 17.8. X qW j(Y;Z) and X q ZjY .
304 17. Undirected Graphs and Conditional Independence
17.4 Fitting Graphs to Data
Given a data set, how do we �nd a graphical model that �ts
the data. Some authors have devoted whole books to this sub-
ject. We will only treat the discrete case and we will consider a
method based on log-linear models which are the subject of
the next chapter.
17.5 Bibliographic Remarks
Thorough treatments of undirected graphs can be found inWhit-
taker (1990) and Lauritzen (1996). Some of the exercises below
are adapted from Whittaker (1990).
17.6 Exercises
1. Consider random variables (X1; X2; X3). In each of the
following cases, draw a graph that has the given indepen-
dence relations.
(a) X1 qX3 j X2.
(b) X1 qX2 j X3 and X1 qX3 j X2.
(c) X1 qX2 j X3 and X1 qX3 j X2 and X2 qX3 j X1.
2. Consider random variables (X1; X2; X3; X4). In each of the
following cases, draw a graph that has the given indepen-
dence relations.
(a) X1 qX3 j X2; X4 and X1 qX4 j X2; X3 and X2 qX4 j
X1; X3.
17.6 Exercises 305
b
b b b
X1 X2 X4
X3
FIGURE 17.9.
b b b b
X1 X2 X3 X4
FIGURE 17.10.
(b) X1 qX2 j X3; X4 and X1 qX3 j X2; X4 and X2 qX3 j
X1; X4.
(c) X1 qX3 j X2; X4 and X2 qX4 j X1; X3.
3. A conditional independence between a pair of variables is
minimal if it is not possible to use the Separation The-
orem to eliminate any variable from the conditioning set,
i.e. from the right hand side of the bar (Whittaker 1990).
Write down the minimal conditional independencies from:
(a) Figure 17.9; (b) Figure 17.10; (c) Figure 17.11; (d)
Figure 17.12.
4. Here are breast cancer data on diagnostic center (X1),
nuclear grade (X2), and survival (X3):
306 17. Undirected Graphs and Conditional Independence
b
b
b
b
X4
X3 X2
X1
FIGURE 17.11.
b
b
b
b b
b
X4
X2 X3
X1
X6X5
FIGURE 17.12.
17.6 Exercises 307
X2 malignant malignant benign benign
X3 died survived died survived
X1 Boston 35 59 47 112
Glamorgan 42 77 26 76
(a) Treat this as a multinomial and �nd the maximum
likelihood estimator.
(b) If someone has a tumour classi�ed as benign at the
Glamorgan clinic, what is the estimated probability that
they will die? Find the standard error for this estimate.
(c) Test the following hypotheses:
X1 qX2jX3 versus X1 =qX2jX3
X1 qX3jX2 versus X1 =qX3jX2
X2 qX3jX1 versus X2 =qX3jX1
Based on the results of your tests, draw and interpret the
resulting graph.
This is page 309
Printer: Opaque this
18
Loglinear Models
In this chapter we study loglinear models which are useful for
modelling multivariate discrete data. There is a strong connec-
tion between loglinear models and undirected graphs. Parts of
this Chapter draw on the material in Whittaker (1990).
18.1 The Loglinear Model
Let X = (X1; : : : ; Xm) be a random vector with probability
function
f(x) = P(X = x) = P(X1 = x1; : : : ; Xm = xm)
where x = (x1; : : : ; xm). Let rj be the number of values that
Xj takes. Without loss of generality, we can assume that Xj 2
f0; 1; : : : ; rj�1g. Suppose now that we have n such random vec-
tors. We can think of the data as a sample from a Multinomial
with N = r1 � r2 � � � � � rm categories. The data can be repre-
sented as counts in a r1�r2�� � ��rm table. Let p = (p1; : : : ; pN)
denote the multinomial parameter.
310 18. Loglinear Models
Let S = f1; : : : ; mg. Given a vector x = (x1; : : : ; xm) and a
subset A � S, let xA = (xj : j 2 A). For example, if A = f1; 3g
then xA = (x1; x3).
Theorem 18.1 The joint probability function f(x) of a single
random vector X = (X1; : : : ; Xm) can be written as
log f(x) =XA�S
A(x) (18.1)
where the sum is over all subsets A of S = f1; : : : ; mg and the
's satisfy the following conditions:
1. ;(x) is a constant;
2. For every A � S, A(x) is only a function of xA and not
the rest of the x0js.
3. If i 2 A and xi = 0, then A(x) = 0.
The formula in equation (18.1) is called the log-linear ex-
pansion of f . Note that this is the probability function for a
single draw. Each A(x) will depend on some unknown parame-
ters �A. Let � = (�A : A � S) be the set of all these parameters.
We will write f(x) = f(x; �) when we want to estimate the de-
pendence on the unknown parameters �.
In terms of the multinomial, the parameter space is
P =np = (p1; : : : ; pN) : pj � 0;
NXj=1
pj = 1o:
This is an N � 1 dimensional space. In the log-linear represen-
ation, the parameter space is
� =n� = (�1; : : : ; �N) : � = �(p); p 2 P
owhere �(p) is the set of � values associated with p. The set � is
a N � 1 dimensinal surface in RN . We can always go back and
forth betwee the two parameterizations we can write � = �(p)
and p = p(�).
18.1 The Loglinear Model 311
Example 18.2 Let X � Bernoulli(p) where 0 < p < 1. We can
write the probability mass function for X as
f(x) = px(1� p)1�x = px1 p1�x
2
for x = 0; 1, where p1 = p and p2 = 1� p. Hence,
log f(x) = ;(x) + 1(x)
where
;(x) = log(p2)
1(x) = x log
�p1
p2
�:
Notice that ;(x) is a constant (as a function of x) and 1(x) =
0 when x = 0. Thus the three conditions of the Theorem hold.
The loglinear parameters are
�0 = log(p2); �1 = log
�p1
p2
�:
The original, multinomial parameter space is P = f(p1; p2) :
pj � 0; p1 + p2 = 1g. The log-linear parameter space is
� = f(�0; �1) 2 R2 : e�0+�1 + e�0 = 1:g
Given (p1; p2) we can solve for (�0; �1). Conversely, given (�0; �1)
we can solve for (p1; p2). �
Example 18.3 Let X = (X1; X2) where X1 2 f0; 1g and X2 2
f0; 1; 2g. The joint distribution of n such random vectors is a
multinomial with 6 categories. The multinomial parameters can
be written as a 2-by-3 table as follows:
multinomial x2 0 1 2
x1 0 p00 p01 p021 p10 p11 p12
312 18. Loglinear Models
The n data vectors can be summarized as counts:
data x2 0 1 2
x1 0 C00 C01 C02
1 C10 C11 C12
For x = (x1; x2), the log-linear expansion takes the form
log f(x) = ;(x) + 1(x) + 2(x) + 12(x)
where
;(x) = log p00
1(x) = x1 log
�p10
p00
�
2(x) = I(x2 = 1) log
�p01
p00
�+ I(x2 = 2) log
�p02
p00
�
12(x) = I(x1 = 1; x2 = 1) log
�p11p00
p01p10
�+ I(x1 = 1; x2 = 2) log
�p12p00
p02p10
�:
Convince yourself that the three conditions on the 's of the
theorem are satis�ed. The six parameters of this model are:
�1 = log p00 �2 = log�p10
p00
��3 = log
�p01
p00
��4 = log
�p02
p00
��5 = log
�p11p00
p01p10
��6 = log
�p12p00
p02p10
�:
�
The next Theorem gives an easy way to check for conditional
independence in a loglinear model.
Theorem 18.4 Let (Xa; Xb; Xc) be a partition of a vectors (X1; : : : ; Xm).
Then XbqXcjXa if and only if all the -terms in the log-linear
expansion that have at least one coordinate in b and one coordi-
nate in c are 0.
To prove this Theorem, we will use the following Lemma
whose proof follows easily from the de�nition of conditional in-
dependence.
18.2 Graphical Log-Linear Models 313
Lemma 18.5 A partition (Xa; Xb; Xc) satis�es XbqXcjXa if and
only if f(xa; xb; xc) = g(xa; xb)h(xa; xc) for some functions g and
h
Proof. (Theorem 18.4.) Suppose that t is 0 whenever t has
coordinates in b and c. Hence, t is 0 if t 6� aSb or t 6� a
Sc.
Therefore
log f(x) =X
t�aSb
t(x) +X
t�aSc
t(x)�Xt�a
t(x):
Exponentiating, we see that the joint density is of the form
g(xa; xb)h(xa; xc). By Lemma 18.5, Xb q XcjXa. The converse
follows by reversing the argument. �
18.2 Graphical Log-Linear Models
A log-linear model is graphical if missing terms correspond
only to conditional independence constraints.
De�nition 18.6 Let log f(x) =P
A�S A(x) be a log-linear
model. Then f is graphical if all -terms are non-zero
except for any pair of coordinates not in the edge set for
some graph G. In other words, A(x) = 0 if and only if
fi; jg � A and (i; j) is not an edge.
Here is way to think about the de�nition above:
If you can add a term to the model and the graph does
not change, then the model is not graphical.
Example 18.7 Consider the graph in Figure 18.1.
The graphical log-linear model that corresponds to this graph
is
log f(x) = ; + 1(x) + 2(x) + 3(x) + 4(x) + 5(x)
+ 12(x) + 23(x) + 25(x) + 34(x) + 35(x) + 45(x) + 235(x) + 345(x):
314 18. Loglinear Models
b b b
b b
X1 X2 X3
X5 X4
FIGURE 18.1. Graph for Example 18.7.
Let's see why this model is graphical. The edge (1; 5) is missing
in the graph. Hence any term containing that pair of indices is
omitted from the model. For example,
15; 125; 135; 145; 1235; 1245; 1345; 12345
are all omitted. Similarly, the edge (2; 4) is missing and hence
24; 124; 234; 245; 1234; 1245; 2345; 12345
are all omitted. There are other missing edges as well. You can
check that the model omits all the corresponding terms. Now
consider the model
log f(x) = ;(x) + 1(x) + 2(x) + 3(x) + 4(x) + 5(x)
+ 12(x) + 23(x) + 25(x) + 34(x) + 35(x) + 45(x):
This is the same model except that the three way interactions
were removed. If we draw a graph for this model, we will get the
same graph. For example, no terms contain (1; 5) so we omit
the edge between X1 and X5. But this is not graphical since it has
extra terms omitted. The independencies and graphs for the two
models are the same but the latter model has other constraints
besides conditional independence constraints. This is not a bad
18.3 Hierarchical Log-Linear Models 315
thing. It just means that if we are only concerned about presence
or absence of conditional independences, then we need not con-
sider such a model. The presence of the three-way interaction
235 means that the strength of association between X2 and X3
varies as a function of X5. Its absence indicates that this is not
so. �
18.3 Hierarchical Log-Linear Models
There is a set of log-linear models that is larger than the set of
graphical models and that are used quite a bit. These are the
hierarchical log-linear models.
De�nition 18.8 A log-linear model is hierarchical if a =
0 and a � t implies that t = 0.
Lemma 18.9 A graphical model is hierarchical but the reverse
need not be true.
Example 18.10 Let
log f(x) = ;(x) + 1(x) + 2(x) + 3(x) + 12(x) + 13(x):
The model is hierarchical; its graph is given in Figure 18.2. The
model is graphical because all terms involving (1,3) are omitted.
It is also hierarchical. �
Example 18.11 Let
log f(x) = ;(x)+ 1(x)+ 2(x)+ 3(x)+ 12(x)+ 13(x)+ 23(x):
The model is hierarchical. It is not graphical. The graph corre-
sponding to this model is complete; see Figure 18.3. It is not
graphical because 123(x) = 0 which does not correspond to any
pairwise conditional independence. �
316 18. Loglinear Models
b b b
X1 X2 X3
FIGURE 18.2. Graph for Example 18.10.
b
bb
X3
X1 X2
FIGURE 18.3. The graph is complete. The model is hierarchical but
not graphical.
18.4 Model Generators 317
b b b
X1 X2 X3
FIGURE 18.4. The model for this graph is not hierarchical.
Example 18.12 Let
log f(x) = ;(x) + 3(x) + 12(x):
The graph corresponding is in Figure 18.4. This model is not
hierarchical since 2 = 0 but 12 is not. Since it is not hierar-
chical, it is not graphical either. �
18.4 Model Generators
Hierarchical models can be written succinctly using genera-
tors. This is most easily explained by example. Suppose that
X = (X1; X2; X3). Then, M = 1:2 + 1:3 stands for
log f = ; + 1 + 2 + 3 + 12 + 13:
The formulaM = 1:2+1:3 says: \include 12 and 13." We have
to also include the lower order terms or it won't be hierarchical.
The generator M = 1:2:3 is the saturated model
log f = ; + 1 + 2 + 3 + 12 + 13 + 23 + 123:
The saturated models corresponds to �tting an unconstrained
multinomial. Consider M = 1 + 2 + 3 which means
log f = ; + 1 + 2 + 3:
318 18. Loglinear Models
;
1 2
1 + 2
1.2
FIGURE 18.5. The lattice with two variables.
This is the mutual independence model. Finally, consider M =
1:2 which has log-linear expansion
log f = ; + 1 + 2 + 12:
This model makes X3jX2 = x2; X1 = x1 a uniform distribution.
18.5 Lattices
Hierarchical models can be organized into something called a
lattice. This is the set of all hierarchical models partially or-
dered by inclusion. The set of all hierarchical models for two
variables can be illustrated as in Figure 18.5.
M = 1:2 is the saturated model, M = 1 + 2 is the indepen-
dence model, M = 1 is independence plus X2jX1 is uniform,
M = 2 is independence plus X1jX2 is uniform, M = 0 is the
uniform distribution.
The lattice of trivariate models is shown in �gure 18.6.
18.6 Fitting Log-Linear Models to Data
Let � denote all the parameters in a log-linear model M . The
loglikelihood for � is
`(�) =Xj
xj log pj(�)
18.6 Fitting Log-Linear Models to Data 319
;
1 2 3
1 + 2 1 + 3 2 + 3
1.2 1.3 2.3
1 + 2 + 3
1.2 + 3 1.3 + 2 2.3 + 1
1.2 + 1.3 1.2 + 2.3 1.3 + 2.3
1.2 + 1.3 + 2.3
1.2.3
uniform
independence
conditional uniform
mutual independence
(graphical)
graphical
graphical
two-way
saturated(graphical)
FIGURE 18.6. The lattice of models for three variables.
320 18. Loglinear Models
where the sum is over the cells and p(�) denotes the cell proba-
bilities corresponding to �. The mle b� generally has to be found
numerically. The model with all possible -terms is called the
saturated model. We can also �t any sub-model which cor-
responds to setting some subset of terms to 0.
De�nition 18.13 For any submodelM , de�ne the deviance
dev(M) by
dev(M) = 2(bsat � bM)
where bsat is the log-likelihood of the saturated model eval-
uated at the mle and bM is the log-likelihood of the model
M evaluated at its mle .
Theorem 18.14 The deviance is the likelihood ratio test statistic
for
H0 : the model is M versus H1 : the model is not M:
Under H0, dev(M)d
! �2�with � degrees of freedom equal to the
di�erence in the number of parameters between the saturated
model and M .
One way to �nd a good model is to use the deviance to test
every sub-model. Every model that is not rejected by this test is
then considered a plausible model. However, this is not a good
strategy for two reasons. First, we will end up doing many tests
which means that there is ample opportunity for making Type I
and Type II errors. Second, we will end up using models where
we failed to reject H0. But we might fail to reject H0 due to low
power. The result is that we end up with a bad model just due
to low power.
There are many model searching strategies. A common ap-
proach is to use some form of penalized likelihood. One version
of penalized is the AIC that we used in regression. For any model
18.6 Fitting Log-Linear Models to Data 321
M de�ne
AIC(M) = �2�b(M)� jM j
�(18.2)
where jM j is the number of parameters. Here is the explanation
of where AIC comes from.
Consider a set of models fM1;M2; : : :g. Let bfj(x) denote theestimated probability function obtained by using the maximum
likelihood estimator ofr mode Mj. Thus, bfj(x) = bf(x; b�j) whereb�j is the mle of the set of parameters �j for modelMj. We will
use the loss function D(f; bf) whereD(f; g) =
Xx
f(x) log
�f(x)
g(x)
�
is the Kullback-Leibler distance between two probability func-
tions. The corresponding risk function is R(f; bf) = E (D(f; bf).Notice that D(f; bf) = c � A(f; bf) where c = P
xf(x) log f(x)
deos not depend on bf and
A(f; bf) =Xx
f(x) log bf(x):Thus, minimizing the risk is equivalent to maximizing a(f; bf) �E (A(f; bf)).It is tempting to estimate a(f; bf) by
Px
bf(x) log bf(x) but,
just as the training error in regression is a highly biased esti-
mate of prediction risk, it is also the case thatP
xbf(x) log bf(x)
is a highly biased estimate of a(f; bf). In fact, the bias is approx-
imately equal to jMjj. Thus:
Theorem 18.15 AIC(Mj) is an approximately unbiased estimate
of a(f; bf).After �nding a \best model" this way we can draw the corre-
sponding graph. We can also check the overall �t of the selected
model using the deviance as described above.
322 18. Loglinear Models
Example 18.16 Data on breast cancer from Morrison et al (1973)
were presented in homework question 4 of Chapter 17. The data
are on diagnostic center (X1), nuclear grade (X2), and survival
(X3): (Morrison et al 1973):
X2 malignant malignant benign benign
X3 died survived died survived
X1 Boston 35 59 47 112
Glamorgan 42 77 26 76
The saturated log-linear model is:
CoeÆcient Standard Error Wald Statistic p-value
(Intercept) 3.56 0.17 21.03 0.00 ***
center 0.18 0.22 0.79 0.42
grade 0.29 0.22 1.32 0.18
survival 0.52 0.21 2.44 0.01 *
center�grade -0.77 0.33 -2.31 0.02 *
center�survival 0.08 0.28 0.29 0.76
grade�survival 0.34 0.27 1.25 0.20
center�grade�survival 0.12 0.40 0.29 0.76
The best sub-model, selected using AIC and backward searching
is
CoeÆcient Standard Error Wald Statistic p-value
(Intercept) 3.52 0.13 25.62 < 0.00 ***
center 0.23 0.13 1.70 0.08
grade 0.26 0.18 1.43 0.15
survival 0.56 0.14 3.98 6.65e-05 ***
center�grade -0.67 0.18 -3.62 0.00 ***
grade�survival 0.37 0.19 1.90 0.05
The graph for this model M is shown in Figure 18.7. To test
the �t of this model, we compute the deviance of M which is
0.6. The appropriate �2 has 8� 6 = 2 degrees of freedom. The
p-value is P(�22 > :6) = :74. So we have no evidence to suggest
that the model is a poor �t. �
18.6 Fitting Log-Linear Models to Data 323
Center Grade Survival
FIGURE 18.7. The graph for Example 18.16.
The example above is a toy example. With so few variables,
there is little to be gained by searching for a best model. The
overall multinomial mle would be suÆcient. More importantly,
we are probably interested in predicting survivial from grade and
center in which case this should really be treated as a regresion
problem with outcome Y = survival and covariates center and
grade.
Example 18.17 Here is a synthetic example. We generate n =
100 random vectors X = (X1; : : : ; X5) of length 5. We generated
the data as follows:
X1 � Bernoulli(1=2)
and
XjjX1; : : : ; Xj�1 �
�1=4 if Xj�1 = 0
3=4 if Xj�1 = 1:
It follows that
f(x1; : : : ; x5) =
�1
2
��3
4
�x1�3
4
�1�x1�3
4
�x2�3
4
�1�x2�3
4
�x3�3
4
�1�x3�3
4
�x4�3
4
�1�x4
=
�1
2
��3
4
�x1+x2+x3+x4�3
4
�4�x1�x2�x3�x4
:
We estimated f using three methods: (i) maximum likelihood
treating this as a multinomial with 32 categories, (ii) maximum
likelihood from the best loglinear model using AIC and forward
selection and (iii) maximum likelihood from the best loglinear
324 18. Loglinear Models
model using BIC and forward selection. We estimated the risk
by simulating the example 100 times. The average risks were:
Method Risk
MLE 0.63
AIC 0.54
BIC 0.53
In this example, there is little di�erence between AIC and BIC.
Both are better than maximum likelihood. �
18.7 Bibliographic Remarks
A classic references on loglinear models is Bishop, Fienberg and
Holland (1975). See also Whittaker (1990) from which some of
the exercises are borrowed.
18.8 Exercises
1. Solve for the p0ijs in terms of the �'s in Example 18.3.
2. Repeat example 18.17 using 7 covariates and n = 1000.
To avoid numerical problems, replace any zero count with
a one.
3. Prove Lemma 18.5.
4. Prove Lemma 18.9.
5. Consider random variables (X1; X2; X3; X4). Suppose the
log-density is
log f(x) = ;(x) + 12(x) + 13(x) + 24(x) + 34(x):
(a) Draw the graph G for these variables.
(b) Write down all independence and conditional indepen-
dence relations implied by the graph.
18.8 Exercises 325
(c) Is this model graphical? Is it hierarchical?
6. Suppose that parameters p(x1; x2; x3) are proportional to
the following values:
x2 0 0 1 1
x3 0 1 0 1
x1 0 2 8 4 16
1 16 128 32 256
Find the -terms for the log-linear expansion. Comment
on the model.
7. Let X1; : : : ; X4 be binary. Draw the independence graphs
corresponding to the following log-linear models. Also,
identify whether each is graphical and/or hierarchical (or
neither).
(a) log f = 7 + 11x1 + 2x2 + 1:5x3 + 17x4
(b) log f = 7+11x1+2x2+1:5x3+17x4+12x2x3+78x2x4+
3x3x4 + 32x2x3x4
(c) log f = 7+11x1+2x2+1:5x3+17x4+12x2x3+3x3x4+
x1x4 + 2x1x2
(d) log f = 7 + 5055x1x2x3x4
This is page 327
Printer: Opaque this
19
Causal Inference
In this Chapter we discuss causation. Roughly speaking, the
statement \X causes Y " means that changing the value of X
will change the distribution of Y . When X causes Y , X and Y
will be associated but the reverse is not, in general, true. Associ-
ation does not necessarily imply causation. We will consider two
frameworks for discussing causation. The �rst uses the notation
of counterfactual random variables. The second, presented in
the next Chapter, uses directed acyclic graphs.
19.1 The Counterfactual Model
Suppose that X is a binary treatment variable where X = 1
means \treated" and where X = 0 means \not treated." We are
using the word \treatment" in a very broad sense: treatment
might refer to a medication or something like smoking. An al-
ternative to \treated/not treated" is \exposed/not exposed" but
we shall use the former.
328 19. Causal Inference
Let Y be some outcome variable such as presence or absence of
disease. To distinguish the statement \X is associated Y " from
the statement \X causes Y " we need to enrich our probabilistic
vocabulary. We will decompose the response Y into a more �ne-
grained object.
We introduce two new random variables (C0; C1), called po-
tential outcomes with the following interpretation: C0 is the
outcome if the subject is not treated (X = 0) and C1 is the
outcome if the subject is treated (X = 1). Hence,
Y =
�C0 if X = 0
C1 if X = 1:
We can express the relationship between Y and (C0; C1) more
succintly by
Y = CX : (19.1)
This equation is called the consistency relationship.
Here is a toy data set to make the idea clear:
X Y C0 C1
0 4 4 *
0 7 7 *
0 2 2 *
0 8 8 *
1 3 * 3
1 5 * 5
1 8 * 8
1 9 * 9
The asterisks denote unobserved values. When X = 0 we don't
observe C1 in which case we say that C1 is a counterfactual
since it is the outcome you would have had if, counter to the
fact, you had been treated (X = 1). Similarly, when X = 1 we
don't observe C0 and we say that C0 is counterfactual.
Notice that there are four types of subjects:
19.1 The Counterfactual Model 329
Type C0 C1
Survivors 1 1
Responders 0 1
Anti-repsonders 1 0
Doomed 0 0
Think of the potential outcomes (C0; C1) as hidden variables
that contain all the relavent information about the subject.
De�ne the average causal e�ect or average treatment
e�ect to be
� = E (C1)� E (C0): (19.2)
The parameter � has the following interpretation: � is the mean
if everyone were treated (X = 1) minus the mean if everyone
were not treated (X = 0). There are other ways of measuring
the causal e�ect. For example, if C0 and C1 are binary, we de�ne
the causal odds ratio
P(C1 = 1)
P(C1 = 0)�
P(C0 = 1)
P(C0 = 0)
and the causal relative risk
P(C1 = 1)
P(C0 = 1):
The main ideas will be the same whatever causal e�ect we use.
For simplicity, we shall work with the average causal e�ect �.
De�ne the association to be
� = E (Y jX = 1)� E (Y jX = 0): (19.3)
Again, we could use odds ratios or other summaries if we wish.
Theorem 19.1 (Association is not equal to Causation) In
general, � 6= �.
Example 19.2 Suppose the whole population is as follows:
330 19. Causal Inference
X Y C0 C1
0 0 0 0�
0 0 0 0�
0 0 0 0�
0 0 0 0�
1 1 1� 1
1 1 1� 1
1 1 1� 1
1 1 1� 1
Again, the asterisks denote unobserved values. Notice that C0 =
C1 for every subject, thus, this treatment has no e�ect. Indeed,
� = E (C1)� E (C0) =1
8
8Xi=1
C1i �
1
8
8Xi=1
C0i
=1 + 1 + 1 + 1 + 0 + 0 + 0 + 0
8�
1 + 1 + 1 + 1 + 0 + 0 + 0 + 0
8= 0:
Thus the average causal e�ect is 0. The observed data are X's
and Y 's only from which we can estimate the association:
� = E (Y jX = 1)� E (Y jX = 0)
=1 + 1 + 1 + 1
4�
0 + 0 + 0 + 0
4= 1:
Hence, � 6= �.
To add some intutition to this example, imagine that the out-
come variable is 1 if \healthy" and 0 if \sick". Suppose that
X = 0 means that the subject does not take vitamin C and that
X = 1 means that the subject does take vitamin C. Vitamin
C has no causal e�ect since C0 = C1. In this example there
are two types of people: healthy people (C0; C1) = (1; 1) and
unhealthy people (C0; C1) = (0; 0). Healthy people tend to take
vitamin C while unhealty people don't. It is this association be-
tween (C0; C1) and X that creates an association between X and
Y . If we only had data on X and Y we would conclude that X
19.1 The Counterfactual Model 331
and Y are associated. Suppose we wrongly interpret this causally
and conclude that vitamin C prevents illness. Next we might en-
courage everyone to take vitamin C. If most people comply the
population will look something like this:
X Y C0 C1
0 0 0 0�
1 0 0 0�
1 0 0 0�
1 0 0 0�
1 1 1� 1
1 1 1� 1
1 1 1� 1
1 1 1� 1
Now � = (4=7)� (0=1) = 4=7. We see that � went down from 1
to 4/7. Of course, the causal e�ect never changed but the naive
observer who does not distinguish association and causation will
be confused because his advice seems to have made things worse
instead of better. �
In the last example, � = 0 and � = 1. It is not hard to create
examples in which � > 0 and yet � < 0. The fact that the
association and causal e�ects can have di�erent signs is very
confusing to many people including well educated statisticians.
The example makes it clear that, in general, we cannot use the
association to estimate the causal e�ect �. The reason that � 6= �
is that (C0; C1) was not independent of X. That is, treatment
assigment was not independent of person type.
Can we ever estimate the causal e�ect? The answer is: some-
times. In particular, random asigment to treatment makes it
possible to estimate �.
Theorem 19.3 Suppose we randomly assign subjects to treat-
ment and that P(X = 0) > 0 and P(X = 1) > 0. Then � = �.
Hence, any consistent estimator of � is a consistent estimator
332 19. Causal Inference
of �. In particular, a consistent estimator is
b� = bE (Y jX = 1)� bE (Y jX = 0)
= Y 1 � Y 0
is a consistent estimator of �, where
Y 1 =1
n1
Xi:Xi=1
Yi; Y 0 =1
n0
Xi:Xi=0
Yi;
n1 =P
n
i=1Xi and n0 =
Pn
i=1(1�Xi).
Proof. Since X is randomly assigned, X is independent of
(C0; C1). Hence,
� = E (C1)� E (C0)
= E (C1jX = 1)� E (C0jX = 0) since X q (C0; C1)
= E (Y jX = 1)� E (Y jX = 0) since Y = CX
= �:
The consistency follows from the law of large numbers. �
If Z is a covariate, we de�ne the conditional causal e�ect
by
�z = E (C1jZ = z)� E (C0jZ = z):
For example, if Z denotes gender with values Z = 0 (women)
and Z = 1 (men), then �0 is the causal e�ect among women and
�1 is the causal e�ect among men. In a randomized experiment,
�z = E (Y jX = 1; Z = z) � E (Y jX = 0; Z = z) and we can
estimate the conditional causal e�ect using appropriate sample
averages.
19.2 Beyond Binary Treatments 333
Summary
Random variables: (C0; C1; X; Y ).
Consistency relationship: Y = CX .
Causal E�ect: � = E (C1)� E (C0).
Association: � = E (Y jX = 1)� E (Y jX = 0).
Random assigment =) (C0; C1)qX =) � = �.
19.2 Beyond Binary Treatments
Let us now generalize beyond the binary case. Suppose that
X 2 X . For example, X could be the dose of a drug in which
case X 2 R. The counterfactual vector (C0; C1) now becomes
the counterfactual process
fC(x) : x 2 Xg (19.4)
where C(x) is the outcome a subject would have if he received
dose x. The observed reponse is given by the consistency relation
Y � C(X): (19.5)
See Figure 19.1. The causal regression function is
�(x) = E (C(x)): (19.6)
The regression function, which measures association, is r(x) =
E (Y jX = x).
Theorem 19.4 In general, �(x) 6= r(x). However, when X is
randomly assigned, �(x) = r(x).
Example 19.5 An example in which �(x) is constant but r(x)
is not constant is shown in Figure 19.2. The �gure shows the
counterfactuals processes for four subjects. The dots represent
334 19. Causal Inference
0 1 2 3 4 5 6 7 8 9 10 110
1
2
3
4
5
6
7
x
C(x)
X
Y = C(X)
FIGURE 19.1. A counterfactual process C(x). The outcome Y is the
value of the curve C(x) evaluated at the observed dose X.
19.3 Observational Studies and Confounding 335
their X values X1; X2; X3; X4. Since Ci(x) is constant over x
for all i, there is no causal e�ect and hence
�(x) =C1(x) + C2(x) + C3(x) + C4(x)
4
is constant. Changing the dose x will not change anyone's out-
come. The four dots in the lower plot represent the observed data
points Y1 = C1(X1); Y2 = C2(X2); Y3 = C3(X3); Y4 = C4(X4).
The dotted line represents the regression r(x) = E (Y jX = x).
Although there is no causal e�ect, there is an association since
the regression curve r(x) is not constant. �
19.3 Observational Studies and Confounding
A study in which treatment (or exposure) is not randomly as-
signed, is called an observational study. In these studies, sub-
jects select their own value of the exposure X. Many of the
health studies your read about in the newspaper are like this.
As we saw, association and causation could in general be quite
di�erent. This descrepancy occurs in non-randomized studies
because the potential outcome C is not independent of treat-
ment X. However, suppose we could �nd groupings of subjects
such that, within groups, X and fC(x) : x 2 Xg are indepen-
dent. This would happen if the subjects are very similar within
groups. For example, suppose we �nd people who are very sim-
ilar in age, gender, educational background and ethnic back-
ground. Among these people we might feel it is reasonable to
assume that the choice of X is essentially random. These other
variables are called confounding variables. If we denote these A more precise def-
inition of confound-
ing is given in the
next Chapter.
other variables collectively as Z, then we can express this idea
by saying that
fC(x) : x 2 Xg qXjZ: (19.7)
Equation (19.7) means that, within groups of Z, the choice of
treatmentX does not depend on type, as represented by fC(x) :
336 19. Causal Inference
0 1 2 3 4 50
1
2
3
C1(x)
C2(x)
C3(x)
C4(x)
x
b
b
b
b
0 1 2 3 4 50
1
2
3�(x)
b
b
b
b Y1
Y2
Y3
Y4
FIGURE 19.2. The top plot shows the counterfactual process C(x)
for four subjects. The dots represent their X values. Since Ci(x) is
constant over x for all i, there is no causal e�ect. Changing the dose
will not change anyone's outcome. The lower plot shows the causal
regression function �(x) = (C1(x) + C2(x) + C3(x) + C4(x))=4.
The four dots represent the observed data points
Y1 = C1(X1); Y2 = C2(X2); Y3 = C3(X3); Y4 = C4(X4). The
dotted line represents the regression r(x) = E (Y jX = x). There is
no causal e�ect since Ci(x) is constant for all i. But there is an
association since the regression curve r(x) is not constant.
19.3 Observational Studies and Confounding 337
x 2 Xg. If (19.7) holds and we observe Z then we say that there
is no unmeasured confounding.
Theorem 19.6 Suppose that (19.7) holds. Then,
�(x) =
ZE (Y jX = x; Z = z)dFZ(z)dz: (19.8)
If br(x; z) is a consistent estimate of the regression function E (Y jX =
x; Z = z), then a consistent estimate of �(x) is
b�(x) = 1
n
nXi=1
br(x; Zi):
In particular, if r(x; z) = �0 + �1x + �2z is linear, then a con-
sistent estimate of �(x) is
b�(x) = b�0 + b�1x+ b�2Zn (19.9)
where (b�0; b�1; b�2) are the least squares estimators.
Remark 19.7 It is useful to compare equation (19.8) to E (Y jX =
x) which can be written as E (Y jX = x) =RE (Y jX = x; Z =
z)dFZjX(zjx).
Epidemiologists call (19.8) the adjusted treatment e�ect.
The process of computing adjusted treatment e�ects is called
adjusting (or controlling) for confounding. The selection
of what confounders Z to measure and control for requires sci-
enti�c insight. Even after adjusting for confounders, we cannot
be sure that there are not other confounding variables that we
missed. This is why observational studies must be treated with
healthy skepticism. Results from observational studies start to
become believable when: (i) the results are replicated in many
studies, (ii) each of the studies controlled for plausible confound-
ing variables, (iii) there is a plausible scienti�c explanation for
the existence of a causal relationship.
338 19. Causal Inference
A good example is smoking and cancer. Numerous studies
have shown a relationship between smoking and cancer even af-
ter adjusting for many confouding variables. Moreover, in labo-
ratory studies, smoking has been shown to damage lung cells. Fi-
nally, a causal link between smoking and cancer has been found
in randomized animal studies. It is this collection of evidence
over many years that makes this a convincing case. One single
observational study is not, by itself, strong evidence. Remember
that when you read the newspaper.
19.4 Simpson's Paradox
Simpson's paradox is a puzzling phenomenon that is dicussed in
most statistics texts. Unfortunately, it is explained incorrectly in
most statistics texts. The reason is that it is nearly impossible to
explain the paradox without using counterfactuals (or directed
acyclic graphs).
Let X be a binary treatment variable, Y a binary outcome
and Z a third binary variable such as gender. Suppose the joint
distribution of X; Y; Z is
Y = 1 Y = 0 Y = 1 Y = 0
X = 1 .1500 .2250 .1000 .0250
X = 0 .0375 .0875 .2625 .1125
Z = 1 (men) Z = 0 (women)
The marginal distribution for (X; Y ) is
19.4 Simpson's Paradox 339
Y = 1 Y = 0
X = 1 .25 .25 .50
X = 0 .30 .20 .50
.55 .45 1
Now,
P(Y = 1jX = 1)� P(Y = 1jX = 0) =P(Y = 1; X = 1)
P(X = 1)�
P(Y = 1; X = 0)
P(X = 0)
=:25
:50�
:30
:50
= �0:1:
We might naively interpret this to mean that the treatment
is bad for you since P(Y = 1jX = 1) < P(Y = 1jX = 0).
Furthermore, among men,
P(Y = 1jX = 1; Z = 1) � P(Y = 1jX = 0; Z = 1)
=P(Y = 1; X = 1; Z = 1)
P(X = 1; Z = 1)�
P(Y = 1; X = 0; Z = 1)
P(X = 0; Z = 1)
=:15
:3750�
:0375
:1250
= 0:1:
Among women,
P(Y = 1jX = 1; Z = 0) � P(Y = 1jX = 0; Z = 0)
=P(Y = 1; X = 1; Z = 0)
P(X = 1; Z = 0)�
P(Y = 1; X = 0; Z = 0)
P(X = 0; Z = 0)
=:1
:1250�
:2625
:3750
= 0:1:
To summarize, we seem to have the following information:
340 19. Causal Inference
Mathematical Statement English Statement?
P(Y = 1jX = 1) < P(Y = 1jX = 0) treatment is harmful
P(Y = 1jX = 1; Z = 1) > P(Y = 1jX = 0; Z = 1) treatment is bene�cial to men
P(Y = 1jX = 1; Z = 0) > P(Y = 1jX = 0; Z = 0) treatment is bene�cial to women
Clearly, something is amiss. There can't be a treatment which
is good for men, good for women but bad overall. This is non-
sense. The problem is with the set of English statements in the
table. Our translation from math into english is specious.
The inequality P(Y = 1jX = 1) < P(Y = 1jX =
0) does not mean that treatment is harmful.
The phrase \treatment is harmful" should be written mathe-
matically as P(C1 = 1) < P(C0 = 1). The phrase \treatment is
harmful for men" should be written P(C1 = 1jZ = 1) < P(C0 =
1jZ = 1). The three mathematical statements in the table are
not at all contradictory. It is only the translation into English
that is wrong.
Let us now show that a real Simpson's paradox cannot hap-
pen, that is, there cannot be a treatment that is bene�cial for
men and women but harmful overall. Suppose that treatment is
bene�cial for both sexes. Then
P(C1 = 1jZ = z) > P(C0 = 1jZ = z)
for all z. It then follows that
P(C1 = 1) =Xz
P(C1 = 1jZ = z)P(Z = z)
>Xz
P(C0 = 1jZ = z)P(Z = z)
= P(C0 = 1):
Hence, P(C1 = 1) > P(C0 = 1) so treatment is bene�cial overall.
No paradox.
19.5 Bibliographic Remarks 341
19.5 Bibliographic Remarks
The use of potential outcomes to clarify causation is due mainly
to Jerzy Neyman and Don Rubin. Later developments are due
to Jamie Robins, Paul Rosenbaum and others. A parallel devel-
opment took place in econometrics by various people including
Jim Heckman and Charles Manski. Currently, there are no text-
book treatments of causal inference from the potential outcomes
viewpoint.
19.6 Exercises
1. Create an example like Example 19.2 in which � > 0 and
� < 0.
2. Prove Theorem 19.4.
3. Suppose you are given data (X1; Y1); : : : ; (Xn; Yn) from an
observational study, where Xi 2 f0; 1g and Yi 2 f0; 1g.
Although it is not possible to estimate the causal e�ect �,
it is possible to put bounds on �. Find upper and lower
bounds on � that can be consistently estimated from the
data. Show that the bounds have width 1. Hint: Note that
E (C1) = E (C1jX = 1)P(X = 1) + E (C1jX = 0)P(X = 0).
4. Suppose that X 2 R and that, for each subject i, Ci(x) =
�1ix. Each subject has their own slope �1i. Construct a
joint distribution on (�1; X) such that P(�1 > 0) = 1
but E (Y jX = x) is a decreasing function of x, where Y =
C(X). Interpret. Hint: Write f(�1; x) = f(�1)f(xj�1). Choose
f(xj�1) so that when �1 is large, x is small and when �1
is small x is large.
This is page 343
Printer: Opaque this
20
Directed Graphs
20.1 Introduction
Directed graphs are similar to undirected graphs except that
there are arrows between vertices instead of edges. Like undi-
rected graphs, directed graphs can be used to represent inde-
pendence relations. They can also be used as an alternative to
counterfactuals to represent causal relationships. Some people
use the phrase Bayesian network to refer to a directed graph
endowed with a probability distribution. This is a poor choice of
terminology. Statistical inference for directed graphs can be per-
formed using frequentist or Bayesian methods so it is misleading
to call them Bayesian networks.
20.2 DAG's
A directed graph G consists of a set of vertices V and an edge
set E of ordered pairs of variables. If (X; Y ) 2 E then there is
an arrow pointing from X to Y . See Figure 20.1.
344 20. Directed Graphs
Y
X Z
FIGURE 20.1. A directed graph with V = fX;Y;Zg and
E = f(Y;X); (Y;Z)g.
If an arrow connects two variables X and Y (in either direc-
tion) we say that X and Y are adjacent. If there is an arrow
from X to Y then X is a parent of Y and Y is a child of
X. The set of all parents of X is denoted by �X or �(X). A
directed path from X to Y is a set of vertices beginning with
X, ending with Y such that each pair is connected by an arrow
and all the arrows point in the same direction as follows:
X �! � � � �! Y or X � � � � � Y
A sequence of adjacent vertices staring with X and ending with
Y but ignoring the direction of the arrows is called an undi-
rected path. The sequence fX; Y; Zg in Figure 20.1 is an undi-
rected path. X is an ancestor of Y is there is a directed path
from X to Y . We also say that Y is a descendant of X.
A con�guration of the form:
X �! Y � Z
is called a collider. A con�guration not of that form is called a
non-collider, for example,
X �! Y �! Z
or
X � Y �! Z
20.3 Probability and DAG's 345
A directed path that starts and ends at the same variable is
called a cycle. A directed graph is acyclic if it has no cycles. In
this case we say that the graph is a directed acyclic graph or
DAG. From now on, we only deal with graphs that are DAG's.
20.3 Probability and DAG's
Let G be a DAG with vertices V = (X1; : : : ; Xk).
De�nition 20.1 If P is a distribution for V with probability
function p, we say that If P is Markov to G or that G
represents P if
p(v) =
kYi=1
p(xi j �i) (20.1)
where �i are the parents of Xi. The set of distributions
represented by G is denoted by M(G).
Example 20.2 For the DAG in Figure 20.2, P 2 M(G) if and
only if its probability function p has the form
p(x; y; z; w) = p(x)p(y)p(z j x; y)p(w j z): �
The following theorem says that P 2M(G) if and only if the
Markov Condition holds. Roughly speaking, the Markov Con-
dition says that every variable W is independent of the \past"
given its parents.
Theorem 20.3 A distribution P 2 M(G) if and only if the fol-
lowing Markov Condition holds: for every variable W ,
W qfW j �W (20.2)
where fW denotes all the other variables except the parents and
descendants of W .
346 20. Directed Graphs
Y
X
Z W
FIGURE 20.2. Another DAG.
A
C
B
D E
FIGURE 20.3. Yet another DAG.
Example 20.4 In Figure 20.2, the Markov Condition implies that
X q Y and W q fX; Y g j Z: �
Example 20.5 Consider the DAG in Figure 20.3. In this case
probability function must factor like
p(a; b; c; d; e) = p(a)p(bja)p(cja)p(djb; c)p(ejd):
The Markov Condition implies the following independence rela-
tions:
D q A j fB;Cg; E q fA;B;Cg j D and B q C j A �
20.4 More Independence Relations 347
X1
X2
X3
X4
X5
FIGURE 20.4. And yet another DAG.
20.4 More Independence Relations
With undirected graphs, we saw that pairwise independence re-
lations implied other independence relations. Luckily, we were
able to deduce these other relations from the graph. The situ-
ation is similar for DAG's. The Markov Condition allows us to
list some independence relations. These relations may logically
imply other other independence relations. Consider the DAG in
Figure 20.4. The Markov Condition implies:
X1 qX2; X2 q fX1; X4g; X3 qX4 j fX1; X2g;
X4 q fX2; X3g j X1; X5 q fX1; X2g j fX3; X4g
It turns out (but it is not obvious) that these conditions imply
that
fX4; X5g qX2 j fX1; X3g:
How do we �nd these extra independence relations? The an-
swer is \d-separation." d-separation can be summarized by three In what follows, we
implicitly assume
that P is faithful
to G which means
that P has no
extra independence
relations other than
those logically im-
plied by the Markov
Condition.
rules. Consider the four DAG's in Figure 20.5 and the DAG in
Figure 20.6. The �rst 3 DAG's in Figure 20.5 are non-colliders.
The DAG in the lower right of Figure 20.5 is a collider. The
DAG in Figure 20.6 is a collider with a descendent.
348 20. Directed Graphs
X Y Z X Y Z
X Y Z X Y Z
FIGURE 20.5. The �rst three DAG's have no colliders. The fourth
DAG in the lower right corner has a collider at Y .
X Y Z
W
FIGURE 20.6. A collider with a descendent.
The rules of d-separation.
Consider the DAG's in Figures 20.5 and 20.6.
Rule (1). In a non-collider, X and Z are d-connected,
but they are d-separated given Y .
Rule (2). If X and Z collide at Y then X and Z are
d-separated but they are d-connected given Y .
Rule (3). Conditioning on the descendant of a collider
has the same e�ect as conditioning on the collider. Thus
in Figure 20.6, X and Z are d-separated but they are
d-connected given W .
Here is a more formal de�nition of d-separation. Let X and Y
be distinct vertices and letW be a set of vertices not containing
X or Y . Then X and Y are d-separated given W if there
20.4 More Independence Relations 349
X U V W Y
S1 S2
FIGURE 20.7. d-separation explained.
exists no undirected path U between X and Y such that (i)
every collider on U has a descendant in W and (ii) no other
vertex on U is in W . If U; V and W are distinct sets of vertices
and U and V are not empty, then U and V are d-separated given
W if for every X 2 U and Y 2 V , X and Y are d-separated
given W . Vertices that are not d-separated are said to be d-
connected.
Example 20.6 Consider the DAG in Figure 20.7. From the d-
seperation rules we conclude that:
X and Y are d-separated (given the empty set)
X and Y are d-connected given fS1; S2g
X and Y are d-separated given fS1; S2; V g
Theorem 20.7 (Spirtes, Glymour and Scheines) Let A, B
and C be disjoint sets of vertices. Then A q B j C if and only
if A and B are d-separated by C.
Example 20.8 The fact that conditioning on a collider creates
dependence might not seem intuitive. Here is a whimsical ex-
ample from Jordan (2003) that makes this idea more palatable.
Your friend appears to be late for a meeting with you. There
are two explanations: she was abducted by aliens or you forgot
to set your watch ahead one hour for daylight savings time. See
Figure 20.8 Aliens and Watch are blocked by a collider which
implies they are marginally independent. This seems reasonable
since, before we know anything about your friend being late, we
would expect these variables to be independent. We would also
expect that P(Aliens = yesjLate = Y es) > P(Aliens = yes);
350 20. Directed Graphs
late
aliens watch
FIGURE 20.8. Jordan's alien example. Was your friend kidnapped
by aliens or did you forget to set your watch?
learning that your friend is late certainly increases the probabil-
ity that she was abducted. But when we learn that you forgot to
set your watch properly, we would lower the chance that your
friend was abducted. Hence, P(Aliens = yesjLate = Y es) 6=
P(Aliens = yesjLate = Y es;Watch = no). Thus, Aliens and
Watch are dependent given Late.
Graphs that look di�erent may actually imply the same inde-
pendence relations. If G is a DAG, we let I(G) denote all the in-
dependence statements implied by G. Two DAG's G1 and G2 for
the same variables V areMarkov equivalent if I(G1) = I(G2).
Given a DAG G, let skeleton(G) denote the undirected graph ob-
tained by replacing the arrows with undirected edges.
Theorem 20.9 Two DAG's G1 and G2 are Markov equivalent if
and only if (i) skeleton(G1) = skeleton(G2) and (ii) G1 and G2have the same colliders.
Example 20.10 The �rst three DAG's in Figure 20.5 are Markov
equivalent. The DAG in the lower right of the Figure is not
Markov equivalent to the others. �
20.5 Estimation for DAG's
Let G be a DAG. Assume that all the variables V = (X1; : : : ; Xm)
are discrete. The probability function can be written
p(v) =
kYi=1
p(xi j �i): (20.3)
20.5 Estimation for DAG's 351
To estimate p(v) we need to estimate p(xi j �i) for each i. Think
of the parents of Xi as one discrete variable eXi with many levels.
For example, suppose that X3 has parents X1 and X2 and that
X1 2 f0; 1; 2g while X2 2 f0; 1g. We can regard the parents X1
and X2 as a single variable eX3 de�ned by
eX3 =
8>>>>>><>>>>>>:
1 if X1 = 0; X2 = 0
2 if X1 = 0; X2 = 1
3 if X1 = 1; X2 = 0
4 if X1 = 1; X2 = 1
5 if X1 = 2; X2 = 0
6 if X1 = 2; X2 = 1:
Hence we can write
p(v) =
kYi=1
p(xi j exi): (20.4)
Theorem 20.11 Let V1; : : : ; Vn be iid random vectors from dis-
tribution p given in (20.4). The maximum likelihood estimator
of p is
bp(v) = kYi=1
bp(xi j exi) (20.5)
where
bp(xi j exi) = #fi : Xi = xi and eXi = exig#fi : eXi = exig :
Example 20.12 Let V = (X; Y; Z) have the DAG in the top left of
Figure 20.5. Given n observations (X1; Y1; Z1); : : : ; (Xn; Yn; Zn),
the mle of p(x; y; z) = p(x)p(yjx)p(zjy) is
bp(x) = #fi : Xi = xg
n;
bp(yjx) = #fi : Xi = x and Yi = yg
#fi : Yi = yg;
and bp(zjy) = #fi : Yi = y and Zi = zg
#fi : Zi = zg:
352 20. Directed Graphs
It is possible to extend these ideas to continuous random vari-
ables as well. For example, we might use some parametric model
p(xj�x; �x) for each condiitonal density. The likelihood function
is then
L(�) =
nYi=1
p(Vi; �) =
nYi=1
mYj=1
p(Xijj�j; �j)
where Xij is the value of Xj for the ith data point and �j are the
parameters for the jth conditional density. We can then proceed
using maximum likelihood.
So far, we have assumed that the structure of the DAG is
given. One can also try to estimate the structure of the DAG it-
self from the data. For example, we could �t every possible DAG
using maximum likelihood and use AIC (or some other method)
to choose a DAG. However, there are many possible DAG's so
you would need much data for such a method to be reliable.
Producing a valid, accurate con�dence set for the DAG struc-
ture would require astronomical sample sizes. DAG's are thus
most useful for encoding conditional independence information
rather than discovering it.
20.6 Causation Revisited
We discussed causation in Chapter 19 using the idea of coun-
terfactual random variables. A di�erent approach to causation
uses DAG's. The two approaches are mathematically equivalent
though they appear to be quite di�erent. In Chapter 19, the
extra element we added to clarify causation was the idea of a
counterfactual. In the DAG approach, the extra element is the
idea of intervention. Consider the DAG in Figure 20.9.
The probability function for a distribution consistent with
this DAG has the form p(x; y; z) = p(x)p(yjx)p(zjx; y). Here is
pseudo-code for generating from this distribution:
20.6 Causation Revisited 353
X Y Z
FIGURE 20.9. Conditioning versus intervening.
for i = 1; : : : ; n :
xi <� pX(xi)
yi <� pY jX(yijxi)
zi <� pZjX;Y (zijxi; yi)
Suppose we repeat this code many times yielding data (x1; y1; z1); : : : ; (xn; yn; zn).
Among all the times that we observe Y = y, how often is Z = z?
The answer to this question is given by the conditional distri-
bution of ZjY . Speci�cally,
P(Z = zjY = y) =P(Y = y; Z = z)
P(Y = y)=p(y; z)
p(y)
=
Px p(x; y; z)
p(y)=
Px p(x) p(yjx) p(zjx; y)
p(y)
=Xx
p(zjx; y)p(yjx) p(x)
p(y)=Xx
p(zjx; y)p(x; y)
p(y)
=Xx
p(zjx; y) p(xjy):
Now suppose we intervene by changing the computer code.
Speci�cally, suppose we �x Y at the value y. The code now
looks like this:
354 20. Directed Graphs
set Y = y
for i = 1; : : : ; n
xi <� pX(xi)
zi <� pZjX;Y (zijxi; y)
Having set Y = y, how often was Z = z? To answer, note
that the intervention has changed the joint probability to be
p�(x; z) = p(x)p(zjx; y):
The answer to our question is given by the marginal distribution
p�(z) =Xx
p�(x; z) =Xx
p(x)p(zjx; y):
We shall denote this as P(Z = zjY := y) or p(zjY := y). We call
P(Z = zjY = y) conditioning by observation or passive
conditioning. We call P(Z = zjY := y) conditioning by
intervention or active conditioning.
Passive conditioning is used to answer a predictive question
like:
\Given that Joe smokes, what is the probability he will get lung
cancer?"
Active conditioning is used to answer a causal question like:
\If Joe quits smoking, what is the probability he will get lung
cancer?"
Consider a pair (G;P) where G is a DAG and P is a distribu-
tion for the variables V of the DAG. Let p denote the probability
function for P. Consider intervening and �xing a variable X to
be equal to x. We represent the intervention by doing two things:
(1) Create a new DAG G� by removing all arrows pointing
into X;
(2) Create a new distribution p�(v) = P(V = vjX := x) by
removing the term p(xj�X) from p(v).
20.6 Causation Revisited 355
The new pair (G�; P �) represents the intervention \set X =
x."
Example 20.13 You may have noticed a correlation between rain
and having a wet lawn, that is, the variable \Rain" is not in-
dependent of the variable \Wet Lawn" and hence pR;W (r; w) 6=
pR(r)pW (w) where R denotes Rain and W denotes Wet Lawn.
Consider the following two DAGS:
Rain �!Wet Lawn Rain �Wet Lawn:
The �rst DAG implies that p(w; r) = p(r)p(wjr) while the sec-
ond implies that p(w; r) = p(w)p(rjw) No matter what the joint
distribution p(w; r) is, both graphs are correct. Both imply that
R and W are not independent. But, intuitively, if we want a
graph to indicate causation, the �rst graph is right and the sec-
ond is wrong. Throwing water on your lawn doesn't cause rain.
The reason we feel the �rst is correct while the second is wrong is
because the interventions implied by the �rst graph are correct.
Look at the �rst graph and form the interventionW = 1 where
1 denotes \wet lawn." Following the rules of intervention, we
break the arrows into W to get the modi�ed graph:
Rain set Wet Lawn =1
with distribution p�(r) = p(r). Thus P(R = r j W := w) =
P(R = r) tells us that \wet lawn" does not cause rain.
Suppose we (wrongly) assume that the second graph is the
correct causal graph and form the intervention W = 1 on the
second graph. There are no arrows intoW that need to be broken
so the intervention graph is the same as the original graph. Thus
p�(r) = p(rjw) which would imply that changing \wet" changes
\rain." Clearly, this is nonsense.
Both are correct probability graphs but only the �rst is correct
causally. We know the correct causal graph by using background
knowledge.
356 20. Directed Graphs
Remark 20.14 We could try to learn the correct causal graph
from data but this is dangerous. In fact it is impossible with two
variables. With more than two variables there are methods that
can �nd the causal graph under certain assumptions but they are
large sample methods and, furthermore, there is no way to ever
know if the sample size you have is large enough to make the
methods reliable.
We can use DAG's to represent confouding variables. If X is
a treatment and Y is an outcome, a confounding variable Z is
a variable with arrows into both X and Y ; see Figure 20.10. It
is easy to check, using the formalism of interventions, that the
following facts are true.
In a randomized study, the arrow betwen Z and X is bro-
ken. In this case, even with Z unobserved (represented by en-
closing Z in a circle), the causal relationship between X and
Y is estimable because it can be shown that E (Y jX := x) =
E (Y jX = x) which does not involve the unobserved Z. In
an observational study, with all confounders observed, we get
E (Y jX := x) =RE (Y jX = x; Z = z)dFZ(z) as in formula
(19.8). If Z is unobserved then we cannot estimate the causal
e�ect because E (Y jX := x) =RE (Y jX = x; Z = z)dFZ(z)
involves the unobserved Z. We can't just use X and Y since in
this case. P(Y = yjX = x) 6= P(Y = yjX := x) which is just
another way of saying that causation is not association.
In fact, we can make a precise connection between DAG's and
counterfactuals as follows. Suppose that X and Y are binary.
De�ne the confounding variable Z by
Z =
8>><>>:
1 if (C0; C1) = (0; 0)
2 if (C0; C1) = (0; 1)
3 if (C0; C1) = (1; 0)
4 if (C0; C1) = (1; 1):
20.7 Bibliographic Remarks 357
X Y
Z
X Y
Z
X Y
Z
FIGURE 20.10. Randomized study; Observational study with mea-
sured confounders; Observational study with unmeasured con-
founders. The circled variables are unobserved.
From this, you can make the correspondence between the DAG
approach and the counterfactual approach explicit. I leave this
for the interested reader.
20.7 Bibliographic Remarks
There are a number of texts on DAG's including Edwards (1996)
and Jordan (2003). The �rst use of DAG's for representing
causal relationships was by Wright (1934). Modern tratments
are contained in Spirtes, Glymour, and Scheines (1990) and
Pearl (2000). Robins, Scheines, Spirtes and Wasserman (2003)
discuss the problems with estimating causal structure from data.
20.8 Exercises
1. Consider the three DAG's in Figure 20.5 without a col-
lider. Prove that X q ZjY .
2. Consider the DAG in Figure 20.5 with a collider. Prove
that X q Z and that X and Z are dependent given Y .
3. Let X 2 f0; 1g, Y 2 f0; 1g, Z 2 f0; 1; 2g. Suppose the
distribution of (X; Y; Z) is Markov to:
X �! Y �! Z
358 20. Directed Graphs
Create a joint distribution p(x; y; z) that is Markov to this
DAG. Generate 1000 random vectors from this distribu-
tion. Estimate the distribution from the data using max-
imum likelihood. Compare the estimated distribution to
the true distribution. Let � = (�000; �001; : : : ; �112) where
�rst = P(X = r; Y = s; Z = t). Use the bootstrap to get
standard errors and 95 per cent con�dence intervals for
these 12 parameters.
4. Let V = (X; Y; Z) have the following joint distribution
X � Bernoulli
�1
2
�
Y j X = x � Bernoulli
�e4x�2
1 + e4x�2
�
Z j X = x; Y = y � Bernoulli
�e2(x+y)�2
1 + e2(x+y)�2
�:
(a) Find an expression for P(Z = z j Y = y). In particular,
�nd P(Z = 1 j Y = 1).
(b) Write a program to simulate the model. Conduct a
simulation and compute P(Z = 1 j Y = 1) empirically.
Plot this as a function of the simulation size N . It should
converge to the theoretical value you computed in (a).
(c) Write down an expression for P(Z = 1 j Y := y). In
particular, �nd P(Z = 1 j Y := 1).
20.8 Exercises 359
(d) Modify your program to simulate the intervention \set
Y = 1." Conduct a simulation and compute P(Z = 1 j
Y := 1) empirically. Plot this as a function of the simu-
lation size N . It should converge to the theoretical value
you computed in (c).
5. This is a continuous, Gaussian version of the last question.
Let V = (X; Y; Z) have the following joint distribution
X � Normal (0; 1)
Y j X = x � Normal (�x; 1)
Z j X = x; Y = y � Normal (�y + x; 1):
Here, �; � and are �xed parameters. Economists refer to
models like this as structural equation models.
(a) Find an explicit expression for f(z j y) and E (Z j Y =
y) =Rzf(z j y)dz.
(b) Find an explicit expression for f(z j Y := y) and then
�nd E (Z j Y := y) �Rzf(z j Y := y)dy. Compare to (b).
(c) Find the joint distribution of (Y; Z). Find the correla-
tion � between Y and Z.
(d) Suppose that X is not observed and we try to make
causal conclusions from the marginal distribution of (Y; Z).
(Think ofX as unobserved confounding variables.) In par-
ticular, suppose we declare that Y causes Z if � 6= 0 and
360 20. Directed Graphs
we declare that Y does not cause Z if � = 0. Show that
this will lead to erroneous conclusions.
(e) Suppose we conduct a randomized experiment in which
Y is randomly assigned. To be concrete, suppose that
X � Normal(0; 1)
Y � Normal(�; 1)
Z j X = x; Y = y � Normal(�y + x; 1):
Show that the method in (d) now yields correct conclu-
sions i.e. � = 0 if and only if f(z j Y := y) does not depend
on y.
This is page 359
Printer: Opaque this
21
Nonparametric Curve Estimation
In this Chapter we discuss nonparametric estimation of proba-
bility density functions and regression functions which we refer
to as curve estimation.
In Chapter 8 we saw that it is possible to consistently estimate
a cumulative distribution function F without making any as-
sumptions about F . If we want to estimate a probability density
function f(x) or a regression function r(x) = E (Y jX = x) the
situation is di�erent. We cannot estimate these functions con-
sistently without making some smoothness assumptions. Corre-
spondingly, we will perform some sort of smoothing operation
on the data.
A simple example of a density estimator is a histogram,
which we discuss in detail in Section 21.2. To form a histogram
estimator of a density f , we divide the real line to disjoint sets
called bins. The histogram estimator is piecewise constant func-
tion where the height of the function is proportional to number
of observations in each bin; see Figure 21.3. The number of bins
is an example of a smoothing parameter. If we smooth too
360 21. Nonparametric Curve Estimation
bg(x)
This is a function of the data This is the point at which we are
evaluating bg(�)FIGURE 21.1. A curve estimate bg is random because it is a function
of the data. The point x at which we evaluate bg is not a random
variable.
much (large bins) we get a highly biased estimator while if we
smooth too little (small bins) we get a highly variable estimator.
Much of curve estimation is concerned with trying to optimally
balance variance and bias.
21.1 The Bias-Variance Tradeo�
Let g denote an unknown function and let bgn denote an es-
timator of g. Bear in mind that bgn(x) is a random function
evaluated at a point x. bgn is random because it depends on
the data. Indeed, we could be more explicit and write bgn(x) =hx(X1; : : : ; Xn) to show that bgn(x) is a function of the data
X1; : : : ; Xn and that the function could be di�erent for each x.
See Figure 21.1.
As a loss function, we will use the integrated squared error
(ISE):
L(g; bgn) = Z (g(u)� bgn(u))2 du: (21.1)
The risk or mean integrated squared error (MISE) is
R(f; bf) = E
�L(g; bg)�: (21.2)
21.2 Histograms 361
Lemma 21.1 The risk can be written as
R(g;bgn) = Z b2(x) dx+
Zv(x) dx (21.3)
where
b(x) = E (bgn(x))� g(x) (21.4)
is the bias of bgn(x) at a �xed x and
v(x) = V(bgn(x)) = E
�(bgn(x)� E (bgn(x))2)� (21.5)
is the variance of bgn(x) at a �xed x.
In summary,
RISK = BIAS2 +VARIANCE: (21.6)
When the data are over-smoothed, the bias term is large and
the variance is small. When the data are under-smoothed the op-
posite is true; see Figure 21.2. This is called the bias-variance
trade-o�. Minimizing risk corresponds to balancing bias and
variance.
21.2 Histograms
Let X1; : : : ; Xn be iid on [0; 1] with density f . The restriction
to [0; 1] is not crucial; we can always rescale the data to be on
this interval. Let m be an integer and de�ne bins
B1 =
�0;
1
m
�; B2 =
�1
m;2
m
�; : : : ; Bm =
�m� 1
m; 1
�: (21.7)
De�ne the binwidth h = 1=m, let �j be the number of obser-
vations in Bj, let bpj = �j=n and let pj =RBjf(u)du.
362 21. Nonparametric Curve Estimation
0.1 0.2 0.3 0.4 0.5
0.00
0.02
0.04
0.06
0.08
0.10
more smoothing −−−>
Bias squared
Variance
Risk
Optimal Smoothing
FIGURE 21.2. The Bias-Variance trade-o�. The bias increases and
the variance decreases with the amount of smoothing. The optimal
amount of smoothing, indicated by the vertical line, minimizes the
risk = bias2 + variance.
21.2 Histograms 363
The histogram estimator is de�ned by
bfn(x) =8>>><>>>:bp1=h x 2 B1bp2=h x 2 B2
......bpm=h x 2 Bm
which we can write more succinctly as
bfn(x) = nXj=1
bpjhI(x 2 Bj): (21.8)
To
understand the motivation for this estimator, let pj =RBjf(u)du
and note that, for x 2 Bj and h small,
bfn(x) = bpjh� pj
h=
RBjf(u)du
h� f(x)h
f(x)= f(x):
Example 21.2 Figure 21.3 shows three di�erent histograms based
on n = 1; 266 data points from an astronomical sky survey.
Each data point represents the distance from us to a galaxy.
The galaxies lie on a \pencilbeam" pointing directly from the
Earth out into space. Because of the �nite speed of light, look-
ing at galaxies farther and farther away corresponds to looking
back in time. Choosing the right number of bins involves �nding
a good tradeo� between bias and variance. We shall see later
that the top left histogram has too few bins resulting in over-
smoothing and too much bias. The bottom left histogram has too
many bins resulting in undersmoothing and too few bins. The
top right histogram is just right. The histogram reveals the pres-
ence of clusters of galaxies. Seeing how the size and number of
galaxy clusters varies with time, helps cosmologists understand
the evolution of the universe. �
The mean and variance of bfn(x) are given in the following
Theorem.
364 21. Nonparametric Curve Estimation
Oversmoothed
0.00 0.05 0.10 0.15 0.20
02
46
810
1214
Just Right
0.00 0.05 0.10 0.15 0.20
010
2030
4050
60
Undersmoothed
0.00 0.05 0.10 0.15 0.20
020
4060
0 200 400 600 800 1000
−14
−12
−10
−8−6
number of bins
cros
s va
lidat
ion
scor
e
FIGURE 21.3. Three versions of a histogram for the astronomy data.
The top left histogram has too few bins. The bottom left histogram
has too many bins. The top right histogram is just right. The lower,
right plot shows the estimated risk versus the number of bins.
21.2 Histograms 365
Theorem 21.3 Consider �xed x and �xed m, and let Bj be the
bin containing x. Then,
E ( bfn(x)) = pj
hand V( bfn(x)) = pj(1� pj)
nh2: (21.9)
Let's take a closer look at the bias-variance tradeo� using
equation (21.9) . Consider some x 2 Bj. For any other u 2 Bj,
f(u) � f(x) + (u� x)f 0(x)
and so
pj =
ZBj
f(u)du �ZBj
�f(x) + (u� x)f 0(x)
�du
= f(x)h + hf 0(x)
�h
�j � 1
2
�� x
�:
Therefore, the bias b(x) is
b(x) = E ( bfn(x))� f(x) =pj
h� f(x)
� f(x)h + hf 0(x)�h�j � 1
2
�� x�
h� f(x)
= f 0(x)
�h
�j � 1
2
�� x
�:
If exj is the center of the bin, thenZBj
b2(x) dx �ZBj
(f 0(x))2�h
�j � 1
2
�� x
�2
dx
� (f 0(exj))2 ZBj
�h
�j � 1
2
�� x
�2
dx
= (f 0(exj))2h312:
Therefore,Z 1
0
b2(x)dx =
mXj=1
ZBj
b2(x)dx �mXj=1
(f 0(exj))2h312
=h2
12
mXj=1
h (f 0(exj))2 � h2
12
Z 1
0
(f 0(x))2dx:
366 21. Nonparametric Curve Estimation
Note that this increases as a function of h. Now consider the
variance. For h small, 1� pj � 1, so
v(x) � pj
nh2
=f(x)h+ hf 0(x)
�h�j � 1
2
�� x�
nh2
� f(x)
nh
where we have kept only the dominant term. So,Z 1
0
v(x)dx � 1
nh:
Note that this decreases with h. Putting all this together, we
get:
Theorem 21.4 Suppose thatR(f 0(u))2du <1. Then
R( bfn; f) � h2
12
Z(f 0(u))2du+
1
nh: (21.10)
The value h� that minimizes (21.10) is
h� =1
n1=3
�6R
(f 0(u))2du
�1=3
: (21.11)
With this choice of binwidth,
R( bfn; f) � C
n2=3(21.12)
where C = (3=4)2=3�R
(f 0(u))2du�1=3
.
Theorem 21.4 is quite revealing. We see that with an opti-
mally chosen binwidth, the MISE decreases to 0 at rate n�2=3.
By comparison, most parametric estimators converge at rate
21.2 Histograms 367
n�1. The slower rate of convergence is the price we pay for be-
ing nonparametric. The formula for the optimal binwidth h� is
of theoretical interest but it is not useful in practice since it
depends on the unknown function f .
A practical way to choose the binwidth is to estimate the
risk function and minimize over h. Recall that the loss function,
which we now write as a function of h, is
L(h) =
Z( bfn(x)� f(x))2 dx
=
Z bf 2n(x) dx� 2
Z bfn(x)f(x)dx + Z f 2(x) dx:
The last term does not depend on the binwidth h so minimizing
the risk is equivalent to minimizing the expected value of
J(h) =
Z bf 2n(x) dx� 2
Z bfn(x)f(x)dx:We shall refer to E (J(h)) as the risk, although it di�ers from
the true risk by the constant termRf 2(x) dx.
De�nition 21.5 The cross-validation estimator of risk
is
bJ(h) = Z � bfn(x)�2 dx� 2
n
nXi=1
bf(�i)(Xi) (21.13)
where bf(�i) is the histogram estimator obtained after re-
moving the ith observation. We refer to bJ(h) as the cross-validation score or estimated risk.
Theorem 21.6 The cross-validation estimator is nearly unbiased:
E ( bJ(x)) � E (J(x)):
368 21. Nonparametric Curve Estimation
In principle, we need to recompute the histogram n times to
compute bJ(h). Moreover, this has to be done for all values of h.
Fortunately, there is a shortcut formula.
Theorem 21.7 The following identity holds:
bJ(h) = 2
(n� 1)h� n + 1
(n� 1)
mXj=1
bp2j: (21.14)
Example 21.8 We used cross-validation in the astronomy exam-
ple. The cross-validation function is quite at near its mini-
mum. Any m in the range of 73 to 310 is an approximate min-
imizer but the resulting histogram does not change much over
this range. The histogram in the top right plot in Figure 21.3
was constructed using m = 73 bins. The bottom right plot shows
the estimated risk, or more precisely, bA, plotted versus the num-
ber of bins.
Next we want some sort of con�dence set for f . Suppose bfnis a histogram with m bins and binwidth h = 1=m. We cannot
realistically make con�dence statements about the �ne details of
the true density f . Instead, we shall make con�dence statements
about f at the resolution of the histogram. To this end, de�ne
ef(x) = pj
hfor x 2 Bj (21.15)
where pj =RBjf(u)du which is a \histogramized" version of f .
Theorem 21.9 Let m = m(n) be the number of bins in the his-
togram bfn. Assume that m(n) ! 1 and m(n) logn=n ! 0 as
n!1. De�ne
`(x) =
�max
nqbfn(x)� c; 0o�2
u(x) =
�qbfn(x) + c
�2
(21.16)
21.2 Histograms 369
where
c =z�=(2m)
2
rm
n: (21.17)
Then,
limn!1
P
�`(x) � ef(x) � u(x) for all x
�� 1� �: (21.18)
Proof. Here is an outline of the proof. A rigorous proof
requires fairly sophisticated tools. From the central limit the-
orem, bpj � N(pj; pj(1 � pj)=n). By the delta method,pbpj �
N(ppj; 1=(4n)). Moreover, it can be shown that the
pbpj's areapproximately independent. Therefore,
2pn�pbpj �ppj� � Zj (21.19)
where Z1; : : : ; Zm � N(0; 1). Let A be the event that `(x) �ef(x) � u(x) for all x. So,
A =n`(x) � ef(x) � u(x) for all x
o=
np`(x) �
qef(x) �pu(x) for all xo
=nqbfn(x)� c �
qef(x) �qbfn(x) + c for all xo
=nmaxx
����qbfn(x)�qef(x)���� � co:
Therefore,
P(Ac) = P
�maxx
����qbfn(x)�qef(x)���� > c
�= P
maxj
�����rbpj
h�r
pj
h
����� > c
!
= P
�maxj
���pbpj �ppj��� > cph
�= P
�maxj
2pn���pbpj �ppj��� > 2c
phn
�= P
�maxj
2pn���pbpj �ppj��� > z�=(2m)
�� P
�maxj
jZjj > z�=(2m)
��
mXj=1
P�jZjj > z�=(2m)
�
370 21. Nonparametric Curve Estimation
=
mXj=1
�
m= �: �
Example 21.10 Figure 21.4 shows a 95 per cent con�dence enve-
lope for the astronomy data. We see that even with over 1,000
data points, there is still substantial uncertainty. �
21.3 Kernel Density Estimation
Histograms are discontinuous.Kernel density estimators are
smoother and they converge faster to the true density than his-
tograms.
Let X1; : : : ; Xn denote the observed data, a sample from f .
In this chapter, a kernel is de�ned to be any smooth func-
tion K such that K(x) � 0,RK(x) dx = 1,
RxK(x)dx = 0
and �2K� R
x2K(x)dx > 0. Two examples of kernels are the
Epanechnikov kernel
K(x) =
�34(1� x2=5)=
p5 jxj < p
5
0 otherwise(21.20)
and the Gaussian (Normal) kernel K(x) = (2�)�1=2e�x2=2.
De�nition 21.11 Given a kernel K and a positive number
h, called the bandwidth, the kernel density estima-
tor is de�ned to be
bf(x) = 1
n
nXi=1
1
hK
�x�Xi
h
�: (21.21)
An example of a kernel density estimator is show in Figure
21.5. The kernel estimator e�ectively puts a smoothed out lump
of mass of size 1=n over each data point Xi. The bandwidth h
21.3 Kernel Density Estimation 371
0.00 0.05 0.10 0.15 0.20
010
2030
4050
FIGURE 21.4. 95 per cent con�dence envelope for astronomy data
using m = 73 bins.
372 21. Nonparametric Curve Estimation
−10 −5 0 5
0.00
0.05
0.10
0.15
FIGURE 21.5. A kernel density estimator bf . At each point x, bf(x)
is the average of the kernels centered over the data points Xi. The
data points are indicated by short vertical bars.
21.3 Kernel Density Estimation 373
controls the amount of smoothing. When h is close to 0, bfnconsists of a set of spikes, one at each data point. The height of
the spikes tends to in�nity as h ! 0. When h ! 1, bfn tends
to a uniform density.
Example 21.12 Figure 21.6 shows kernel density estimators for
the astronomy data using for three di�erent bandwidths. In each
case we used a Gaussian kernel. The properly smoothed kernel
density estimator in the top right panel shows similar structure
as the histogram. However, it is easier to see the clusters in the
kernel estimator. �
To construct a kernel density estimator, we need to choose
a kernel K and a bandwidth h. It can be shown theoretically
and empirically that the choice of K is not crucial. However, the
choice of bandwidth h is very important. As with the histogram,
we can make a theoretical statement about how the risk of the
estimator depends on the bandwidth.
Theorem 21.13 Under weak assumptions on f and K,
R(f; bfn) � 1
4�4Kh4Z(f
00
(x))2 +
RK2(x)dx
nh(21.22)
where �2K=Rx2K(x)dx. The optimal bandwidth is
h� =c�2=51 c
1=52 c
�1=53
n1=5(21.23)
where c1 =Rx2K(x)dx, c2 =
RK(x)2dx and c3 =
R(f 00(x))2dx.
With this choice of bandwidth,
R(f; bfn) � c4
n4=5
for some constant c4 > 0.
Proof. Write Kh(x;X) = h�1K ((x�X)=h) and bfn(x) =
n�1P
iKh(x;Xi). Thus, E [ bfn(x)] = E [Kh(x;X)] and V[ bfn(x)] =
374 21. Nonparametric Curve Estimation
0.00 0.05 0.10 0.15 0.20
02
46
810
12
oversmoothed
0.00 0.05 0.10 0.15 0.20
010
2030
4050
6070
just right
0.00 0.05 0.10 0.15 0.20
020
040
060
080
010
0012
0014
00
undersmoothed
0.002 0.004 0.006 0.008 0.010 0.012
−7.8
−7.6
−7.4
−7.2
−7.0
bandwidth
cross
−vali
datio
n sc
ore
FIGURE 21.6. Kernel density estimators and estimated risk for the
astronomy data.
21.3 Kernel Density Estimation 375
n�1V[Kh(x;X)]. Now,
E [Kh(x;X)] =
Z1
hK
�x� t
h
�f(t) dt
=
ZK(u)f(u� hu) du
=
ZK(u)
�f(u)� hf
0
(u) +1
2f00
(u) + � � ��du
= f(u) +1
2h2f
00
(u)
Zu2K(u) du � � �
sinceRK(x) dx = 1 and
RxK(x) dx = 0. The bias is
E [Kh(x;X)]� f(x) � 1
2�2kh2f
00
(x):
By a similar calculation,
V[ bfn(x)] � f(x)RK2(x) dx
n hn:
The second result follows from integrating the squared bias plus
the variance. �
We see that kernel estimators converge at rate n�4=5 while his-
tograms converge at the slower rate n�2=3. It can be shown that,
under weak assumptions, there does not exist a nonparametric
estimator that converges faster than n�4=5.
The expression for h� depends on the unknown density f
which makes the result of little practical use. As with the his-
tograms, we shall use cross-validation to �nd a bandwidth. Thus,
we estimate the risk (up to a constant) by
bJ(h) = Z bf 2(x)dz � 21
n
nXi=1
bf�i(Xi) (21.24)
where bf�i is the kernel density estimator after omitting the ith
observation.
376 21. Nonparametric Curve Estimation
Theorem 21.14 For any h > 0,
E
h bJ(h)i = E [J(h)] :
Also,
bJ(h) � 1
hn2
Xi
Xj
K�
�Xi �Xj
h
�+
2
nhK(0) (21.25)
whereK�(x) = K(2)(x)�2K(x) andK(2)(z) =RK(z�y)K(y)dy.
In particular, if K is a N(0,1) Gaussian kernel then K(2)(z) is
the N(0; 2) density.
We then choose the bandwidth hn that minimizes bJ(h). AFor large data sets,bf and (21.25)
can be computed
quickly using the
fast Fourier trans-
form.
justi�cation for this method is given by the following remarkable
theorem due to Stone.
Theorem 21.15 (Stone's Theorem.) Suppose that f is bounded.
Let bfh denote the kernel estimator with bandwidth h and let hn
denote the bandwidth chosen by cross-validation. Then,
R �f(x)� bfhn(x)�2 dx
infhR �
f(x)� bfh(x)�2 dxP�! 1: (21.26)
Example 21.16 The top right panel of Figure 21.6 is based on
cross-validation. These data are rounded which problems for
cross-validation. Speci�cally, it causes the minimizer to be h =
0. To overcome this problem, we added a small amount of ran-
dom Normal noise to the data. The result is that bJ(h) is very
smooth with a well de�ned minimum.
Remark 21.17 Do not assume that, if the estimator bf is wiggly,
then cross-validation has let you down. The eye is not a good
judge of risk.
21.3 Kernel Density Estimation 377
To construct con�dence bands, we use something similar to
histograms although the details are more complicated. The ver-
sion described here is from Chaudhuri and Marron (1999).
378 21. Nonparametric Curve Estimation
Con�dence Band for Kernel Density Estimator
1. Choose an evenly spaced grid of points V = fv1; : : : ; vNg.For every v 2 V, de�ne
Yi(v) =1
hK
�v �Xi
h
�: (21.27)
Note that bfn(v) = Y n(v), the average of the Yi(v)0s.
2. De�ne
se (v) =s(v)pn
(21.28)
where s2(v) = (n� 1)�1P
n
i=1(Yi(v)� Y n(v))2.
3. Compute the e�ective sample size
ESS(v) =
Pn
i=1K�(v�Xi)
h
�K(0)
: (21.29)
Roughly, this is the number of data points being averaged
together to form bfn(v).4. Let V� = fv 2 V : ESS(v) � 5g. Now de�ne the number
of independent blocks m by
1
m=
ESS
n(21.30)
where ESS is the average of ESS over V�.
5. Let
q = ��1�1 + (1� �)1=m
2
�(21.31)
and de�ne
`(v) = bfn(v)� q se (v) and u(v) = bfn(v) + q se (v)
(21.32)
where we replace `(v) with 0 if it becomes negative.
Then,
P
n`(v) � f(v) � u(v) for all v
o� 1� �:
21.3 Kernel Density Estimation 379
Example 21.18 Figure 21.7 shows approximate 95 per cent con-
�dence bands for the astronomy data. �
Suppose now that the dataXi = (Xi1; : : : ; Xid) are d-dimensional.
The kernel estimator can easily be generalized to d dimensions.
Let h = (h1; : : : ; hd) be a vector of bandwidths and de�ne
bfn(x) = 1
n
nXi=1
Kh(x�Xi) (21.33)
where
Kh(x�Xi) =1
nh1 � � �hd
(dY
j=1
K
�xi �Xij
hj
�)(21.34)
where h1; : : : ; hd are bandwidths. For simplicity, we might take
hj = sjh where sj is the standard deviation of the jth variable.
There is now only a single bandwidth h to choose. Using calcu-
lations like those in the one-dimensional case, the risk is given
by
R(f; bfn) � 1
4�4K
"dX
j=1
h4j
Zf 2jj(x)dx+
Xj 6=k
h2jh2k
Zfjjfkkdx
#
+
�RK2(x)dx
�dnh1 � � �hd
where fjj is the second partial derivative of f . The optimal
bandwidth satis�es hi � c1n�1=(4+d)) leading to a risk of order
n�4=(4+d). From this fact, we see that the risk increases quickly
with dimension, a problem usually called the curse of dimen-
sionality. To get a sense of how serious this problem is, con-
sider the following table from Silverman (1986) which shows the
sample size required to ensure a relative mean squared error less
than 0.1 at 0 when the density is multivariate normal and the
optimal bandwidth is selected.
380 21. Nonparametric Curve Estimation
0.00 0.05 0.10 0.15 0.20
010
2030
40
FIGURE 21.7. 95 per cent Con�dence bands for kernel density esti-
mate for the astronomy data.
21.4 Nonparametric Regression 381
Dimension Sample Size
1 4
2 19
3 67
4 223
5 768
6 2790
7 10,700
8 43,700
9 187,000
10 842,000
This is bad news indeed. It says that having 842,000 observa-
tions in a ten dimensional problem is really like having 4 obser-
vations.
21.4 Nonparametric Regression
Consider pairs of points (x1; Y1); : : : ; (xn; Yn) related by
Yi = r(xi) + �i (21.35)
where E (�i) = 0 and we are treating the xi's as �xed. In nonpara-
metric regression, we want to estimate the regression function
r(x) = E (Y jX = x).
There are many nonparametric regression estimators. Most
involve estimating r(x) by taking some sort of weighted average
of the Yi's, giving higher weight to those points near x. In par-
ticular, the Nadaraya-Watson kernel estimator is de�ned
by
382 21. Nonparametric Curve Estimation
br(x) = nXi=1
wi(x)Yi (21.36)
where the weights wi(x) are given by
wi(x) =K�x�xi
h
�Pn
j=1K�x�xj
h
� : (21.37)
The form of this estimator comes from �rst estimating the
joint density f(x; y) using kernel density estimation and then
inserting this into:
r(x) = E (Y jX = x) =
Zyf(yjx)dy =
Ryf(x; y)dyRf(x; y)dy
:
Theorem 21.19 Suppose that V(�i) = �2. The risk of the Nadaraya-
Watson kernel estimator is
R(brn; r) � h4
4
�Zx2K2(x)dx
�4 Z �r00(x) + 2r0(x)
f 0(x)
f(x)
�2
dx
+
Z�2RK2(x)dx
nhf(x)dx: (21.38)
The optimal bandwidth decreases at rate n�1=5 and with this
choice the risk decreases at rate n�4=5.
In practice, to choose the bandwidth h we minimize the cross
validation score
bJ(h) = nXi=1
(Yi � br�i(xi))2 (21.39)
where br�i is the estimator we get by omitting the ith variable.
An approximation to bJ is given by
bJ(h) = nXi=1
(Yi � br(xi))2 1�1� K(0)Pn
j=1K
�xi�xj
h
��2
: (21.40)
21.4 Nonparametric Regression 383
Example 21.20 Figures 21.8 shows cosmic microwave background
(CMB) data from BOOMERaNG (Netter�eld et al. 2001), Max-
ima (Lee et al. 2001) and DASI (Halverson 2001). The data
consist of n pairs (x1; Y1); : : : ; (xn; Yn) where xi is called the
multipole moment and Yi is the estimated power spectrum of
the temperature uctuations. What your seeing are sound waves
in the cosmic microwave background radiation which is the heat,
left over from the big bang. If r(x) denotes the true power spec-
trum then
Yi = r(xi) + �i
where �i is a random error with mean 0. The location and size
of peaks in r(x) provides valuable clues about the behavior of
the early universe. Figure 21.8 shows the �t based on cross-
validation as well as an undersmoothed and oversmoothed �t.
The cross-validation �t shows the presence of three well de�ned
peaks, as predicted by the physics of the big bang. �
The procedure for �nding con�dence bands is similar to that
for density estimation. However, we �rst need to estimate �2.
Suppose that the xi's are ordered. Assuming r(x) is smooth, we
have r(xi+1)� r(xi) � 0 and hence
Yi+1 � Yi =hr(xi+1) + �i+1
i�hr(xi) + �i
i� �i+1 � �i
and hence
V(Yi+1 � Yi) � V(�i+1 � �i) = V(�i+1) + V(�i) = 2�2:
We can thus use the average of the n � 1 di�erences Yi+1 � Yi
to estimate �2. Hence, de�ne
b�2 = 1
2(n� 1)
n�1Xi=1
(Yi+1 � Yi)2: (21.41)
384 21. Nonparametric Curve Estimation
200 400 600 800 1000
010
0020
0030
0040
0050
0060
00
Undersmoothed
200 400 600 800 1000
010
0020
0030
0040
0050
0060
00
Oversmoothed
200 400 600 800 1000
010
0020
0030
0040
0050
0060
00
Just Right (Using cross−valdiation)
20 40 60 80 100 120
3e+0
54e
+05
5e+0
56e
+05
7e+0
58e
+05
bandwidth
estim
ated
risk
FIGURE 21.8. Regression analysis of the CMB data. The �rst �t is
undersmoothed, the second is oversmoothed and the third is based
on cross-validation. The last panel shows the estimated risk versus
the bandwidth of the smoother. The data are from BOOMERaNG,
Maxima and DASI.
21.4 Nonparametric Regression 385
Con�dence Bands for Kernel Regression
Follow the same procedure as for kernel density esti-
mators except, change the de�nition of se (v) to
se (v) = b�vuut nX
i=1
w2(xi) (21.42)
where b� is de�ned in (21.41).
Example 21.21 Figure 21.9 shows a 95 per cent con�dence enve-
lope for the CMB data. We see that we are highly con�dence of
the existence and position of the �rst peak. We are more uncer-
tain about the second and third peak. At the time of this writing,
more accurate data are becoming available that apparently pro-
vide sharper estimates of the second and third peak. �
The extension to multiple regressors X = (X1; : : : ; Xp) is
straightforward. As with kernel density estimation we just re-
place the kernel with a multivariate kernel. However, the same
caveats about the curse of dimensionality apply. In some cases,
we might consider putting some restrictions on the regression
function which will then reduce the curse of dimensionality. For
example, additive regression is based on the model
Y =
pXj=1
rj(Xj) + �: (21.43)
Now we only need to �t p one-dimensional functions. The model
can be enriched by adding various interactions, for example,
Y =
pXj=1
rj(Xj) +Xj<k
rjk(XjXk) + �: (21.44)
Additive models are usually �t by an algorithm called back�t-
ting.
386 21. Nonparametric Curve Estimation
200 400 600 800 1000
−100
00
1000
2000
3000
4000
5000
6000
FIGURE 21.9. 95 per cent con�dence envelope for the CMB data.
21.5 Appendix: Con�dence Sets and Bias 387
Back�tting
1. Initialize r1(x1), . . . , rp(xp).
2. For j = 1; : : : ; p:
(a) Let �i = Yi �P
s6=j rs(xi).
(b) Let rj be the function estimate obtained by regressing
the �i's on the jth covariate.
3. If converged STOP. Else, go back to step 2.
Additive models have the advantage that they avoid the curse
of dimensionality and they can be �t quickly but they have one
disadvantage: the model is not fully nonparametric. In other
words, the true regression function r(x) may not be of the form
(21.43).
21.5 Appendix: Con�dence Sets and Bias
The con�dence bands we computed are not for the density func-
tion or regression function but rather, for the smoothed function.
For example, the con�dence band for a kernel density estimate
with bandwidth h is a band for the function one gets by smooth-
ing the true function with a kernel with the same bandwidth.
Getting a con�dence set for the true function is complicated for
reasons we now explain.
First, let's review the parametric case. When estimating a
scalar quantity � with an estimator b�, the usual con�dence in-
terval is of the form b��z�=2sn where b� is the maximum likelihood
estimator and sn =
qV(b�) is the estimated standard error of the
estimator. Under standard regularity conditions, b� � N(�; sn)
388 21. Nonparametric Curve Estimation
and
limn!1
P
�b� � 2 sn � � � b� + 2 sn
�= 1� �:
But let us take a closer look.
Let bn = E(b�n) � � We know that b� � N(�; sn) but a more
accurate statement is b� � N(� + bn; sn). We usually ignore the
bias term bn in large sample calculations but let's keep track of
it. The coverage of the usual con�dence interval is
P
�b� � z�=2sn � � � b� + z�=2sn
�= P
�z�=2 �
b� � �
sn� z�=2
!
= P
��z�=2 � N (bn; sn)
sn� z�=2
�= P
��z�=2 � N
�bn
sn; 1
�� z�=2
�:
In a well behaved parametric model, sn is of size n�1=2 bn is
of size n�1. Hence, bn=sn ! 0 so the last displayed probability
statement becomes P(�z�=2 � N(0; 1) � z�=2) = 1 � �. What
makes parametric con�dence intervals have the right coverage
is the fact that bn=sn ! 0.
The situation is more complicated for kernel methods. Con-
sider estimating a density f(x) at a single point x with a kernel
density estimator. Since bf(x) is a sum if iid random variables,
the central limit theorem implies that
bf(x) � N
�f(x) + bn(x);
c2f(x)
nh
�(21.45)
where
bn(x) =1
2h2f 00(x)c1 (21.46)
is the bias, c1 =Rx2K(x)dx and c2 =
RK2(x)dx. The esti-
mated standard error is
sn(x) =
(c2 bf(x)nh
)1=2
: (21.47)
21.6 Bibliographic Remarks 389
Suppose we use the usual interval bf(x) � z�=2sn. Arguing as
above, the coverage is approximately,
P
��z�=2 � N
�bn(x)
sn(x); 1
�� z�=2
�:
The optimal bandwidth is of the form h = cn�1=5 for some
constant c. If we plug h = cn�1=5 into (21.46) and(21.47) we
see that bn(x)=sn(x) does not tend to 0. Thus, the con�dence
interval will have coverage less than 1� �.
21.6 Bibliographic Remarks
Two very good books on curve estimation are Scott (1992) and
Silverman (1986). I drew heavily on those books for this Chap-
ter.
21.7 Exercises
1. Let X1; : : : ; Xn � f and let bfn be the kernel density esti-
mator using the boxcar kernel:
K(x) =
�1 �1
2< x < 1
2
0 otherwise:
(a) Show that
E ( bf(x)) = 1
h
Zx+(h=2)
x�(h=2)
f(y)dy
and
V( bf(x)) = 1
nh2
24Z x+(h=2)
x�(h=2)
f(y)dy � Z
x+(h=2)
x�(h=2)
f(y)dy
!235 :
390 21. Nonparametric Curve Estimation
(b) Show that if h ! 0 and nh ! 1 as n ! 1 thenbfn(x) P�! f(x).
2. Get the data on fragments of glass collected in forensic
work. You need to download the MASS library for R then:
library(MASS)
data(fgl)
x <- fgl[[1]]
help(fgl)
The data are also on my website. Estimate the density
of the �rst variable (refractive index) using a histogram
and use a kernel density estimator. Use cross-validation
to choose the amount of smoothing. Experiment with dif-
ferent binwidths and bandwidths. Comment on the simi-
larities and di�erences. Construct 95 per cent con�dence
bands for your estimators.
3. Consider the data from question 2. Let Y be refractive
index and let x be aluminium content (the fourth vari-
able). Do a nonparametric regression to �t the model Y =
f(x) + �. Use cross-validation to estimate the bandwidth.
Construct 95 per cent con�dence bands for your estimate.
4. Prove Lemma 21.1.
5. Prove Theorem 21.3.
6. Prove Theorem 21.7.
7. Prove Theorem 21.14.
21.7 Exercises 391
8. Consider regression data (x1; Y1); : : : ; (xn; Yn). Suppose that
0 � xi � 1 for all i. De�ne bins Bj as in equation (21.7).
For x 2 Bj de�ne brn(x) = Y j
where Y j is the mean of all the Yi's corresponding to those
xi's in Bj. Find the approximate risk of this estimator.
From this expression for the risk, �nd the optimal band-
width. At what rate does the risk go to zero?
9. Show that with suitable smoothness assumptions on r(x),b�2 in equation (21.41) is a consistent estimator of �2.
This is page 393
Printer: Opaque this
22
Smoothing Using OrthogonalFunctions
In this Chapter we study a di�erent approach to nonparametric
curve estimation based on orthogonal functions. We begin
with a brief introduction to the theory of orthogonal functions.
Then we turn to density estimation and regression.
22.1 Orthogonal Functions and L2 Spaces
Let v = (v1; v2; v3) denote a three dimensional vector, that is,
a list of three real numbers. Let V denote the set of all such
vectors. If a is a scalar (a number) and v is a vector, we de�ne
av = (av1; av2; av3). The sum of vectors v and w is de�ned by
v+w = (v1+w1; v2+w2; v3+w3). The inner product between
two vectors v and w is de�ned by hv; wi =P3
i=1 viwi. The norm
(or length) of a vector v is de�ned by
jjvjj =phv; vi =
vuut 3Xi=1
v2i : (22.1)
394 22. Smoothing Using Orthogonal Functions
Two vectors are orthogonal (or perpendicular) if hv; wi = 0.
A set of vectors are orthogonal if each pair in the set is orthog-
onal. A vector is normal if jjvjj = 1.
Let �1 = (1; 0; 0), �2 = (0; 1; 0), �3 = (0; 0; 1). These vectors
are said to be an orthonormal basis for V since they have the
following properties:
(i) they are orthogonal;
(ii) they are normal;
(iii) they form a basis for V which means that if v 2 V then v
can be written as a linear combination of �1, �2, �3:
v =
3Xj=1
�j�j where �j = h�j; vi: (22.2)
For example, if v = (12; 3; 4) then v = 12�1+3�2+4�3. There
are other orthonormal bases for V, for example,
1 =
�1p3;1p3;1p3
�; 2 =
�1p2;� 1p
2; 0
�; 3 =
�1p6;1p6;� 2p
6
�:
You can check that these three vectors also form an orthonormal
basis for V. Again, if v is any vector then we can write
v =
3Xj=1
�j j where �j = h j; vi:
For example, if v = (12; 3; 4) then
v = 10:97 1 + 6:36 2 + 2:86 3:
Now we make the leap from vectors to functions. Basically, we
just replace vectors with functions and sums with integrals. Let
L2(a; b) denote all functions de�ned on the interval [a; b] such
thatR b
af(x)2dx <1:
L2[a; b] =
�f : [a; b]! R;
Z b
a
f(x)2dx <1�: (22.3)
22.1 Orthogonal Functions and L2 Spaces 395
We sometimes write L2 instead of L2(a; b). The inner product
between two functions f; g 2 L2 is de�ned byRf(x)g(x)dx. The
norm of f is
jjf jj =sZ
f(x)2dx: (22.4)
Two functions are orthogonal ifRf(x)g(x)dx = 0. A function
is normal if jjf jj = 1.
A sequence of functions �1; �2; �3; : : : is orthonormal ifR�2j(x)dx =
1 for each j andR�i(x)�j(x)dx = 0 for i 6= j. An orthonor-
mal sequence is complete if the only function that is orthog-
onal to each �j is the zero function. In this case, the functions
�1; �2; �3; : : : form in basis, meaning that if f 2 L2 then f can The equality in
the displayed
equation means
thatR(f(x) �
fn(x))2dx ! 0
where fn(x) =Pn
j=1 �j�j(x).
be written as
f(x) =
1Xj=1
�j�j(x); where �j =
Z b
a
f(x)�j(x)dx:
(22.5)
Parseval's relation says that
jjf jj2 �Zf2(x)dx =
1Xj=1
�2j � jj�jj2 (22.6)
where � = (�1; �2; : : :).
Example 22.1 An example of an orthonormal basis for L2[0; 1]
is the cosine basis de�ned as follows. Let �1(x) = 1 and for
j � 2 de�ne
�j(x) =p2 cos((j � 1)�x): (22.7)
The �rst six functions are plotted in Figure 22.1. �
Example 22.2 Let
f(x) =px(1� x) sin
�2:1�
(x+ :05)
�
396 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
0.6
0.8
1.0
1.2
1.4
0.0 0.2 0.4 0.6 0.8 1.0
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
0.0 0.2 0.4 0.6 0.8 1.0
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
0.0 0.2 0.4 0.6 0.8 1.0
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
0.0 0.2 0.4 0.6 0.8 1.0
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
0.0 0.2 0.4 0.6 0.8 1.0
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
FIGURE 22.1. The �rst six functions in the cosine basis.
22.1 Orthogonal Functions and L2 Spaces 397
which is called the \doppler function." Figure 22.2 shows f (top
left) and its approximation
fJ(x) =
JXj=1
�j�j(x)
with J equal to 5 (top right), 20 (bottom left) and 200 (bottom
right). As J increases we see that fJ(x) gets closer to f(x). The
coeÆcients �j =R 1
0f(x)�j(x)dx were computed numerically. �
Example 22.3 The Legendre polynomials on [�1; 1] are de-
�ne by
Pj(x) =1
2jj!
dj
dxj(x2 � 1)j; j = 0; 1; 2; : : : (22.8)
It can be shown that these functions are complete and orthogonal
and that Z 1
�1
P2j (x)dx =
2
2j + 1: (22.9)
It follows that the functions �j(x) =p(2j + 1)=2Pj(x), j =
0; 1; : : : form an orthonormal basis for L2(�1; 1). The �rst few
Legendre polynomials are
P0(x) = 1
P1(x) = x
P2(x) =1
2
�3x2 � 1
�P3(x) =
1
2
�5x3 � 3x
�:
These polynomials may be constructed explicitly using the fol-
lowing recursive relation:
Pj+1(x) =(2j + 1)xPj(x)� jPj�1(x)
j + 1: � (22.10)
398 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0.0 0.2 0.4 0.6 0.8 1.0
−0.1
0.0
0.1
0.2
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
FIGURE 22.2. Approximating the doppler function
f(x) =px(1� x) sin
�2:1�
(x+:05)
�with its expansion in
the cosine basis. The function f (top left) and its ap-
proximation fJ(x) =PJ
j=1 �j�j(x) with J equal to 5
(top right), 20 (bottom left) and 200 (bottom right). The
coeÆcients �j =R 1
0f(x)�j(x)dx were computed numer-
ically.
22.2 Density Estimation 399
The coeÆcients �1; �2; : : : are related to the smoothness of
the function f . To see why, note that if f is smooth, then
its derivatives will be �nite. Thus we expect that, for some k,R 1
0(f (k)(x))2dx <1 where f (k) is ther kth derivative of f . Now
consider the cosine basis (22.7) and let f(x) =P1
j=1 �j�j(x).
Then, Z 1
0
(f (k)(x))2dx = 2
1Xj=1
�2j (�(j � 1))2k:
The only way thatP1
j=1 �2j (�(j � 1))2k can be �nite is if the
�j's get small when j gets large. To summarize:
If the function f is smooth then the coe�cients �j will
be small when j is large.
For the rest of this chapter, assume we are using the cosine
basis unless otherwise speci�ed.
22.2 Density Estimation
Let X1; : : : ; Xn be iid observations from a distribution on [0; 1]
with density f . Assuming f 2 L2 we can write
f(x) =
1Xj=1
�j�j(x)
where �1; �2; : : : is an orthonormal basis. De�ne
b�j = 1
n
nXi=1
�j(Xi): (22.11)
Theorem 22.4 The mean and variance of b�j areE
�b�j� = �j (22.12)
V
�b�j� =�2j
n(22.13)
400 22. Smoothing Using Orthogonal Functions
where
�2j = V(�j(Xi)) =
Z(�j(x)� �j)
2f(x)dx: (22.14)
Proof. We have
E
�b�j� =1
n
nXi=1
E (�j(Xi))
= E (�j(X1))
=
Z�j(x)f(x)dx = �j:
The calculation for the variance is similar. �
Hence, b�j is an unbiased estimate of �j. It is tempting to
estimate f byP1
j=1b�j�j(x) but this turns out to have a very
high variance. Instead, consider the estimator
bf(x) = JXj=1
b�j�j(x): (22.15)
The number of terms J is a smoothing parameter. Increasing J
will decrease bias while increasing variance. For technical rea-
sons, we restrict J to lie in the range
1 � J � p
where p = p(n) =pn. To emphasize the dependence of the risk
function on J , we write the risk function as R(J).
Theorem 22.5 The risk of bf is given by
R(J) =
JXj=1
�2j
n+
1Xj=J+1
�2j : (22.16)
In kernel estimation, we used cross-validation to estimate the
risk. In the orthogonal function approach, we instead use the
risk estimator
bR(J) = JXj=1
b�2jn
+
pXj=J+1
�b�2j �
b�2jn
�+
(22.17)
22.2 Density Estimation 401
where a+ = maxfa; 0g and
b�2j = 1
n� 1
nXi=1
��j(Xi)� b�j�2 : (22.18)
To motivate this estimator, note that b�2j is an unbiased estmate
of �2j and b�2j � b�2j is an unbiased estimator of �2
j . We take the
positive part of the latter term since we know that �2j cannot be
negative. We now choose 1 � bJ � p to minimize bR( bf; f). Hereis a summary.
Summary of Orthogonal Function Density Estimation
1. Let b�j = 1
n
nXi=1
�j(Xi):
2. Choose bJ to minimize bR(J) over 1 � J � p =pn where
bR(J) = JXj=1
b�2jn
+
pXj=J+1
�b�2j �
b�2jn
�+
and b�2j = 1
n� 1
nXi=1
��j(Xi)� b�j�2 :
3. Let bf(x) = bJXj=1
b�j�j(x):
The estimator bfn can be negative. If we are interested in
exploring the shape of f , this is not a problem. However, if
we need our estimate to be a probability density function, we
402 22. Smoothing Using Orthogonal Functions
can truncate the estimate and then normalize it. That, we takebf � = maxf bfn(x); 0g= R 1
0maxf bfn(u); 0gdu.
Now let us construct a con�dence band for f . Suppose we
estimate f using J orthogonal functions. We are essentially
estimating ef(x) =PJ
j=1 �j�j(x) not the true density f(x) =P1
j=1 �j�j(x). Thus, the con�dence band should be regarded as
a band for ef(x).Theorem 22.6 An approximate 1 � � con�dence band for ef is
(`(x); u(x)) where
`(x) = bfn(x)� c; u(x) = bfn(x) + c (22.19)
where
c =JK
2
pn
s1 +
p2z�pJ
(22.20)
and
K = max1�j�J
maxxj�j(x)j:
For the cosine basis, K =p2.
Example 22.7 Let
f(x) =5
6�(x; 0; 1) +
1
6
5Xj=1
�(x;�j; :1)
where �(x;�; �) denotes a Normal density with mean � and
standard deviation �, and (�1; : : : ; �5) = (�1;�1=2; 0; 1=2; 1).Marron and Wand (1990) call this \the claw" although I prefer
the \Bart Simpson." Figure 22.3 shows the true density as well
as the estimated density based on n = 5; 000 observations and
a 95 per cent con�dence band. The density has been rescaled to
have most of its mass between 0 and 1 using the transformation
y = (x + 3)=6.
22.2 Density Estimation 403
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4
FIGURE 22.3. The top plot is the true density for the Bart Simpson
distribution (rescaled to have most of its mass between 0 and 1). The
bottom plot is the orthogonal function density estimate and 95 per
cent con�dence band.
404 22. Smoothing Using Orthogonal Functions
22.3 Regression
Consider the regression model
Yi = r(xi) + �i; i = 1; : : : ; n (22.21)
where the �i are independent with mean 0 and variance �2. We
will initially focus on the special case where xi = i=n. We assume
that r 2 L2[0; 1] and hence we can write
r(x) =
1Xj=1
�j�j(x) where �j =
Z 1
0
r(x)�j(x)dx (22.22)
where �1; �2; : : : where is an orthonormal basis for [0; 1].
De�ne b�j = 1
n
nXi=1
Yi �j(xi); j = 1; 2; : : : (22.23)
Since b�j is an average, the central limit theorem tells us that b�jwill be approximately normally distributed.
Theorem 22.8 b�j � N
��j;
�2
n
�: (22.24)
Proof. The mean of b�j isE (b�j) =
1
n
nXi=1
E (Yi)�j(xi) =1
n
nXi=1
r(xi)�j(xi)
�Zr(x)�j(x)dx = �j
where the approximate equality follows from the de�nition of a
Riemann integral:P
i�nh(xi) !R 1
0h(x)dx where �n = 1=n.
The variance is
V(b�j) =1
n2
nXi=1
V(Yi)�2j(xi)
22.3 Regression 405
=�2
n2
nXi=1
�2j(xi) =
�2
n
1
n
nXi=1
�2j(xi)
� �2
n
Z�2j(x)dx =
�2
n
sinceR�2j(x)dx = 1. �
As we did for density estimation, we will estimate r by
br(x) = JXj=1
b�j�j(x):Let
R(J) = E
Z(r(x)� br(x))2 dx
be the risk of the estimator.
Theorem 22.9 The risk R(J) of the estimator brn(x) =PJj=1b�j�j(x)
is
R(J) =J�
2
n+
1Xj=J+1
�2j : (22.25)
To estimate for �2 = V ar(�i) we use
b�2 = n
k
nXi=n�k+1
b�2j (22.26)
where k = n=4. To motivate this estimator, recall that if f is
smooth then �j � 0 for large j. So, for j � k, b�j � N(0; �2=n).
Thus, b�j � �b�j=pn for for j � k, where b�j � N(0; 1). There-
fore,
b�2 =n
k
nXi=n�k+1
b�2j
� n
k
nXi=n�k+1
��pn
b�j�2
406 22. Smoothing Using Orthogonal Functions
=�2
k
nXi=n�k+1
b�2j
=�2
k�2k
since a sum of k Normals has a �2k distribution. Now E (�2
k) = k
and hence E (b�2) � �2. Also, V(�2
k) = 2k and hence V(b�2) �(�4=k2)(2k) = (2�4=k)! 0 as n!1. Thus we expect b�2 to bea consistent estimator of �2. There is nothing special about the
choice k = n=4. Any k that increases with n at an appropriate
rate will suÆce.
We estimate the risk with
bR(J) = Jb�2n
+
nXj=J+1
�b�2j �
b�2n
�+
: (22.27)
We are now ready to give a complete description of the method,
which Beran (2000) calls REACT (Risk Estimation and Adap-
tation by Coordinate Transformation).
22.3 Regression 407
Orthogonal Series Regression Estimator
1. Let b�j = 1
n
nXi=1
Yi�j(xi); j = 1; : : : ; n:
2. Let b�2 = n
k
nXi=n�k+1
b�2j (22.28)
where k � n=4.
3. For 1 � J � n, compute the risk estimate
bR(J) = Jb�2n
+
nXj=J+1
�b�2j �
b�2n
�+
:
4. Choose bJ 2 f1; : : : ng to minimize bR(J).5. Let bf(x) = bJX
j=1
b�j�j(x):
Example 22.10 Figure 22.4 shows the doppler function f and
n = 2; 048 observations generated from the model
Yi = r(xi) + �i
where xi = i=n, �i � N(0; (:1)2). Then �gure shows the true
function, the data, the estimated function and the estimated risk.
The estimated �gure was based on bJ = 234 terms.
Finally, we turn to con�dence bands. As before, these bands
are not really for the true function r(x) but rather for the
smoothed version of the function er(x) =P bJ
j=1 �j�j(x).
408 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0.0 0.2 0.4 0.6 0.8 1.0
−0.5
0.0
0.5
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0 500 1000 1500 2000
0.00
0.02
0.04
0.06
0.08
J
Estim
ated
Risk
FIGURE 22.4. The doppler test function.
22.3 Regression 409
Theorem 22.11 Suppose the estimate br is based on J terms andb� is de�ned as in equation (22.28). Assume that J < n� k+1.
An approximate 1� � con�dence band for er is (`; u) where
`(x) = brn(x)�K
pJ
sz�b�pn+Jb�2n
(22.29)
u(x) = brn(x) +K
pJ
sz�b�pn+Jb�2n
(22.30)
where
K = max1�j�J
maxxj�j(x)j
and
b� 2 = 2Jb�4n
�1 +
J
k
�(22.31)
and k = n=4 is from equation (22.28). In the cosine basis, we
may use K =p2.
Example 22.12 Figure 22.5 shows the con�dence envelope for the
doppler signal. Th �rst plot is based on J = 234 (the value of
J that minimizes the estimated risk). The second is based on
J = 45 � pn. Larger J yields a higher resolution estimator
at the cost of large con�dence bands. Smaller J yields a lower
resolution estimator but has tighter con�dence bands. �
So far, we have assumed that the xi's are of the form f1=n; 2=n; : : : ; 1g.If the xi's are on interval [a; b] then we can rescale them so that
are in the interval [0; 1]. If the xi's are not equally spaced the
methods we have discussed still apply so long as the xi's \�ll
out" the interval [0,1] in such a way so as to not be too clumped
together. If we want to treat the xi's as random instead of �xed,
then the method needs signi�cant modi�cations which we shall
not deal with here.
410 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
−1.0
−0.5
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
−1.0
−0.5
0.0
0.5
1.0
FIGURE 22.5. The con�dence envelope for the doppler
test function using n = 2; 048 observations. The top plot
shows the estimate and envelope using J = 234 terms.
The bottom plot shows the estimate and envelope using
J = 45 terms.
22.4 Wavelets 411
22.4 Wavelets
Suppose there is a sharp jump in a regression function f at some
point x but that f is otherwise very smooth. Such a function
f is said to be spatially inhomogeneous. See Figure 22.6 for
an example.
It is hard to estimate f using the methods we have discussed
so far. If we use a cosine basis and only keep low order terms, we
will miss the peak; if we allow higher order terms we will �nd
the peak but we will make the rest of the curve very wiggly.
Similar comments apply to kernel regression. If we use a large
bandwidth, then we will smooth out the peak; if we use a small
bandwidth, then we will �nd the peak but we will make the rest
of the curve very wiggly.
One way to estimate inhomogeneous functions is to use a more
carefully chosen basis that allows us to place a \blip" in some
small region without adding wiggles elsewhere. In this section,
we describe a special class of bases called wavelets, that are
aimed at �xing this problem. Statistical inference suing wavelets
is a large and active area. We will just discuss a few of the main
ideas to get a avor of this approach.
We start with a particular wavelet called the Haar wavelet.
The Haar father wavelet or Haar scaling function is de-
�ned by
�(x) =
�1 if 0 � x < 1
0 otherwise:(22.32)
The mother Haar wavelet is de�ned by
(x) =
� �1 if 0 � x � 12;
1 if 12< x � 1:
(22.33)
For any integers j and k de�ne
�j;k(x) = 2j=2�(2jx�k) and j;k(x) = 2j=2 (2jx�k): (22.34)The function j;k has the same shape as but it has been
rescaled by a factor of 2j=2 and shifted by a factor of k.
412 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
0.4
0.6
0.8
1.0
1.2
1.4
FIGURE 22.6. An inhomogeneous function. The func-
tion is smooth except for a sharp peak in one place.
22.4 Wavelets 413
See Figure 22.7 for some examples of Haar wavelets. Notice
that for large j, j;k is a very localized function. This makes it
possible to add a blip to a function in one place without adding
wiggles elsewhere. Increasing j is like looking in a microscope at
increasing degrees of resolution. In technical terms, we say that
wavelets provide a multiresolution analysis of L2(0; 1).
Let
Wj = f jk; k = 0; 1; : : : ; 2j � 1g
be the set of rescaled and shifted mother wavelets at resolution
j.
Theorem 22.13 The set of functionsn�;W0;W1;W2; : : : ;
ois an orthonormal basis for L2(0; 1).
It follows from this theorem that we can expand any function
f 2 L2(0; 1) in this basis. Because each Wj is itself a set of
functions, we write the expansion as a double sum:
f(x) = ��(x) +
1Xj=0
2j�1Xk=0
�j;k j;k(x) (22.35)
where
� =
Z 1
0
f(x)�(x) dx; �j;k =
Z 1
0
f(x) j;k(x) dx:
We call � the scaling coeÆcient and the �j;k's are called
the detail coeÆcients. We call the �nite sum
ef(x) = ��(x) +
J�1Xj=0
2j�1Xk=0
�j;k j;k(x) (22.36)
414 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
−4−2
02
4
0.0 0.2 0.4 0.6 0.8 1.0
−4−2
02
4
0.0 0.2 0.4 0.6 0.8 1.0
−4−2
02
4
0.0 0.2 0.4 0.6 0.8 1.0
−4−2
02
4
FIGURE 22.7. Some Haar wavelets. Top left: the father
wavelet �(x); top right: the mother wavelet (x); bot-
tom left: 2;2(x); bottom right: 4;10(x).
22.4 Wavelets 415
the resolution J approximation to f . The total number of
terms in this sum is
1 +
J�1Xj=0
2j = 1 + 2J � 1 = 2J :
Example 22.14 Figure 22.8 shows the doppler signal, and its res-
olution J approximation for J = 3; 5 and J = 8. �
Haar wavelets are localized, meaning that they are zero out-
side an interval. But they are not smooth. This raises the ques-
tion of whether there exist smooth, localized wavelets that from
an orthonormal basis. In 1988, Ingrid Daubechie showed that
such wavelets do exist. These smooth wavelets are diÆcult to
describe. They can be constructed numerically but there is no
closed form formual for the smoother wavelets. To keep things
simple, we will continue to use Haar wavelets. For your interest,
�gure 22.9 shows an example of a smooth wavelet called the
\symmlet 8."
We can now use wavelets to do density estimation and re-
gression. We shall only discuss the regression problem Yi =
r(xi) + ��i where �i � N(0; 1) and xi = i=n. To simplify the
discussion we assume that n = 2J for some J .
There is one major di�erence between estimation using wavelets
instead of a cosine (or polynomial) basis. With the cosine basis,
we used all the terms 1 � j � J for some J . With wavelets,
we use a method called thresholding where we keep a term in
the function approximation if its coeÆcient is large, otherwise,
we throw we don't use that term. There are many versions of
thresholding. The simplest is called hard, universal threshold-
ing.
416 22. Smoothing Using Orthogonal Functions
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
x
f
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
FIGURE 22.8. The doppler signal and its reconstructionef(x) = ��(x) +PJ�1
j=0
Pk �j;k j;k(x) based on J = 3, J = 5 and
J = 8.
22.4 Wavelets 417
−2 0 2 4
−1.0
0.0
0.5
1.0
1.5
0 2 4 6
−0.2
0.2
0.6
1.0
FIGURE 22.9. The symmlet 8 wavelet. The top plot is the mother
and the bottom plot is the father.
418 22. Smoothing Using Orthogonal Functions
Haar Wavelet Regression
1. Let J = log2(n) and de�ne
b� =1
n
Xi
�k(xi)Yi and Dj;k =1
n
Xi
j;k(xi)Yi (22.37)
for 0 � j � J � 1.
2. Estimate � by
b� =pn� median
�jDJ�1;kj : k = 0; : : : ; 2J�1 � 1�
0:6745:
(22.38)
3. Apply universal thresholding:
b�j;k =(Dj;k if jDj;kj > b�q2 log n
n
0 otherwise:
)(22.39)
4. Set bf(x) = b��(x) + J�1Xj=j0
2j�1Xk=0
b�j;k j;k(x):
In practice, we do not compute Sk and Dj;k using (22.37). In-
stead, we use the discrete wavelet transform (DWT) which
is very fast. For Haar wavelets, the DWT works as follows.
22.4 Wavelets 419
DWT for Haar Wavelets
Let y be the vector of Yi's (length n) and let J =
log2(n). Create a list D with elements
D[[0]]; : : : ; D[[J � 1]]:
Set:
temp <� y=pn:
Then do:
for(j in (J � 1) : 0)fm <� 2j
I <� (1 : m)
D[[j]] <��temp[2 � I]� temp[(2 � I)� 1]
�=
p2
temp <��temp[2 � I] + temp[(2 � I)� 1]
�=
p2
g
Remark 22.15 If you use a programming language that does not
allow a 0 index for a list, then index the list as
D[[1]]; : : : ; D[[J ]]
and replace the second last line with
D[[j + 1]] <��temp[2 � I]� temp[(2 � I)� 1]
�=
p2
The estimate for � probably looks strange. It is similar to the
estimate we used for the cosine basis but it is designed to be
insensitive to sharp peaks in the function.
To understand the intuition behind universal thresholding,
consider what happens when there is no signal, that is, when
�j;k = 0 for all j and k.
420 22. Smoothing Using Orthogonal Functions
Theorem 22.16 Suppose that �j;k = 0 for all j and k and letb�j;k be the universal threshold estimator. Then
P(b�j;k = 0 for all j; k)! 1
as n!1.
Proof. To simplify the proof, assume that � is known. Now
Dj;k � N(0; �2=n). We will need Mill's inequality: if Z �N(0; 1) then P(jZj > t) � (c=t)e�t
2=2 where c =p2=� is a
constant. Thus,
P(max jDj;kj > �) �Xj;k
P(jDj;kj > �)
�Xj;k
P
�pnjDj;kj�
>
pn�
�
��
Xj;k
c�
�pnexp
��1
2
n�2
�2
�=
cp2 logn
! 0: �
Example 22.17 Consider Yi = r(xi) + ��i where f is the doppler
signal, � = :1 and n = 2048. Figure 22.10 shows the data and
the estimated function using universal thresholding. Of course,
the estimate is not smooth since Haar wavelets are not smooth.
Nonetheless, the estimate is quite accurate. �
22.5 Bibliographic Remarks
A reference for orthogonal function methods is Efromovich (1999).
See also Beran (2000) and Beran and D�umbgen (1998). An in-
troduction to wavelets is given in Ogden (1997). A more ad-
vanced treatment can be found in H�ardle, Kerkyacharian, Pi-
card and Tsybakov (1998). The theory of statistical estimation
22.5 Bibliographic Remarks 421
0.0 0.2 0.4 0.6 0.8 1.0
−0.5
0.0
0.5
0.0 0.2 0.4 0.6 0.8 1.0
−0.4
−0.2
0.0
0.2
0.4
FIGURE 22.10. Estimate of the Doppler function using Haar
wavelets and universal thresholding.
422 22. Smoothing Using Orthogonal Functions
using wavelets has been developed by many authors, especially
David Donoho and Ian Johnstone. There is a series of papers es-
pecially worth reading: Donoho and Johnstone (1994), Donoho
and Johnstone (1995a), Donoho and Johnstone (1995b), and
Donoho and Johnstone (1998).
22.6 Exercises
1. Prove Theorem 22.5.
2. Prove Theorem 22.9.
3. Let
1 =
�1p3;1p3;1p3
�; 2 =
�1p2;� 1p
2; 0
�; 3 =
�1p6;1p6;� 2p
6
�:
Show that these vectors have norm 1 and are orthogonal.
4. Prove Parseval's relation equation (22.6).
5. Plot the �rst �ve Legendre polynomials. Very, numerically,
that they are orthonormal.
6. Expand the following functions in the cosine basis on [0; 1].
For (a) and (b), �nd the coeÆcients �j analytically. For
(c) and (d), �nd the coeÆcients �j numerically, i.e.
�j =
Z 1
0
f(x)�j(x) � 1
N
NXr=1
f
�r
N
��j
�r
N
�for some large integerN . Then plot the partial sum
Pn
j=1 �j�j(x)
for increasing values of n.
(a) f(x) =p2 cos(3�x).
(b) f(x) = sin(�x).
(c) f(x) =P11
j=1 hjK(x�tj) where K(t) = (1+sign(t))=2,
22.6 Exercises 423
(tj) = (:1; :13; :15; :23; :25; :40; :44; :65; :76; :78; :81),
(hj) = (4;�5; 3;�4; 5;�4:2; 2:1; 4:3;�3:1; 2:1;�4:2).(d) f =
px(1� x) sin
�2:1�
(x+:05)
�:
7. Consider the glass fragments data. Let Y be refractive in-
dex and let X be aluminium content (the fourth variable).
(a) Do a nonparametric regression to �t the model Y =
f(x)+� using the cosine basis method. The data are not on
a regular grid. Ignore this when estimating the function.
(But do sort the data �rst.) Provide a function estimate,
an estimate of the risk and a con�dence band.
(b) Use the wavelet method to estimate f .
8. Show that the Haar wavelets are orthonormal.
9. Consider again the doppler signal:
f(x) =px(1� x) sin
�2:1�
x + 0:05
�:
Let n = 1024 and let (x1; : : : ; xn) = (1=n; : : : ; 1). Generate
data
Yi = f(xi) + ��i
where �i � N(0; 1).
(a) Fit the curve using the cosine basis method. Plot the
function estimate and con�dence band for J = 10; 20; : : : ; 100.
(b) Use Haar wavelets to �t the curve.
10. (Haar density Estimation.) Let X1; : : : ; Xn � f for some
density f on [0; 1]. Let's consider constructing a wavelet
histogram. Let � and be the Haar father and mother
wavelet. Write
f(x) � �(x) +
J�1Xj=0
2j�1Xk=0
�j;k j;k(x)
424 22. Smoothing Using Orthogonal Functions
where J � log2(n). Let
b�j;k = 1
n
nXi=1
j;k(Xi):
(a) Show that b�j;k is an unbiased estimate of �j;k.
(b) De�ne the Haar histogram
bf(x) = �(x) +
BXj=0
2j�1Xk=0
b�j;k j;k(x)
for 0 � B � J � 1.
(c) Find an approximate expression for the MSE as a func-
tion of B.
(d) Generate n = 1000 observations from a Beta (15,4)
density. Estimate the density using the Haar histogram.
Use leave-one-out cross vaidation to choose B.
11. In this question, we will explore the motivation for equa-
tion (22.38). Let X1; : : : ; Xn � N(0; �2). Let
b� =pn� median (jX1j; : : : ; jXnj)
0:6745:
(a) Show that E (b�) = �.
(b) Simulate n = 100 observations from a N(0,1) distribu-
tion. Compute b� as well as the usual estimate of �. Repeat
1000 times and compare the MSE.
(c) Repeat (b) but add some outliers to the data. To do
this, simulate each observation from a N(0,1) with prob-
ability .95 and simulate each observation from a N(0,10)
with probability .95.
12. Repeat question 6 using the Haar basis.
This is page 425
Printer: Opaque this
23
Classi�cation
23.1 Introduction
The problem of predicting a discrete random variable Y from an-
other random variable X is called classi�cation, supervised
learning, discrimination or pattern recognition.
In more detail, consider iid data (X1; Y1); : : : ; (Xn; Yn) where
Xi = (Xi1; : : : ; Xid) 2 X � Rd
is a d-dimensional vector and Yi takes values in some �nite set
Y. A classi�cation rule is a function h : X ! Y. When we
observe a new X, we predict Y to be h(X).
Example 23.1 Here is a an example with fake data. Figure 23.1
shows 100 data points. The covariateX = (X1; X2) is 2-dimensional
and the outcome Y 2 Y = f0; 1g. The Y values are indicated on
the plot with the triangles representing Y = 1 and the squares
representing Y = 0. Also shown is a linear classi�cation rule
represented by the solid line. This is a rule of the form
h(x) =
�1 if a+ b1x1 + b2x2 > 0
0 otherwise:
426 23. Classi�cation
Everything above the line is classi�ed as a 0 and everything below
the line is classi�ed as a 1. �
Example 23.2 The Coronary Risk-Factor Study (CORIS) data
involve 462 males between the ages of 15 and 64 from three ru-
ral areas in South Africa (Rousseauw et al 1983, Hastie and
Tibshirani, 1987). The outcome Y is the presence (Y = 1) or
absence (Y = 0) of coronary heart disease. There are 9 covari-
ates: systolic blood pressure, cumulative tobacco (kg), ldl (low
density lipoprotein cholesterol), adiposity, famhist (family his-
tory of heart disease), typea (type-A behavior), obesity, alco-
hol (current alcohol consumption), and age. Figure 23.2 shows
the outcomes and two of the covariates, systolic blood pressure
and tobaco consumption. People with/without heart disease are
coded with triangles and squares. Also shown is a linear deci-
sion boundary which we will explain shortly. In this example,
the groups are much harder to tell apart. In fact, 141 people are
misclassi�ed using this classi�cation rule.
At this point, it is worth revisiting the Statistics/Data Mining
dictionary:
Statistics Computer Science Meaning
classi�cation supervised learning predicting a discrete Y from X
data training sample (X1; Y1); : : : ; (Xn; Yn)
covariates features the Xi's
classi�er hypothesis map h : X ! Yestimation learning �nding a good classi�er
In most cases in this Chapter, we deal with the case Y =
f0; 1g.
23.1 Introduction 427
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
x1
x2
FIGURE 23.1. Two covariates and a linear decision
boundary. 4 means Y = 1. � means Y = 0. You won't
see real data like this.
428 23. Classi�cation
100 120 140 160 180 200 220
05
10
15
20
25
30
Systolic Blood Pressure
To
ba
cco
FIGURE 23.2. Two covariates and and a linear deci-
sion boundary with data from the Coronary Risk-Factor
Study (CORIS). 4 means Y = 1. � means Y = 0. This
is real life.
23.2 Error Rates and The Bayes Classi�er 429
23.2 Error Rates and The Bayes Classi�er
Our goal is to �nd a classi�cation rule h that makes accurate
predictions. We start with the following de�nitions. One can use other
measures of error as
well.De�nition 23.3 The true error rate of a classi�er h is
L(h) = P(fh(X) 6= Y g) (23.1)
and the empirical error rate or training error rate
is bLn(h) =1
n
nXi=1
I(h(Xi) 6= Yi): (23.2)
First we consider the special case where Y = f0; 1g. Let
r(x) = E (Y jX = x) = P(Y = 1jX = x)
denote the regression function. From Bayes' theorem we have
that
r(x) =�f1(x)
�f1(x) + (1� �)f0(x)(23.3)
where
f0(x) = f(xjY = 0)
f1(x) = f(xjY = 1)
and � = P(Y = 1).
De�nition 23.4 The Bayes classi�cation rule h� is de-
�ned to be
h�(x) =
�1 if r(x) > 1
2
0 otherwise:(23.4)
The set D(h) = fx : P(Y = 1jX = x) = P(Y = 0jX =
x)g is called the decision boundary.
430 23. Classi�cation
Warning! The Bayes rule has nothing to do with Bayesian
inference. We could estimate the Bayes rule using either fre-
quentist or Bayesian methods.
The Bayes' rule may be written in several equivalent forms:
h�(x) =
�1 if P(Y = 1jX = x) > P(Y = 0jX = x)
0 otherwise(23.5)
and
h�(x) =
�1 if �f1(x) > (1� �)f0(x)
0 otherwise:(23.6)
Theorem 23.5 The Bayes rule is optimal, that is, if h is any
other classi�cation rule then L(h�) � L(h).
The Bayes rule depends on unknown quantities so we need to
use the data to �nd some approximation to the Bayes rule. At
the risk of oversimplifying, there are three main approaches:
1. Empirical Risk Minimization. Choose a set of classi�ers Hand �nd bh 2 H that minimizes some estimate of L(h).
2. Regression. Find an estimate br of the regression function
r and de�ne
bh(x) = � 1 if br(x) > 12
0 otherwise:
3. Density Estimation. Estimate f0 from the Xi's for which
Yi = 0, estimate f1 from the Xi's for which Yi = 1 and letb� = n�1P
n
i=1 Yi. De�ne
br(x) = bP(Y = 1jX = x) =b� bf1(x)b� bf1(x) + (1� b�) bf0(x)
and bh(x) = � 1 if br(x) > 12
0 otherwise:
23.3 Gaussian and Linear Classi�ers 431
Now let us generalize to the case where Y takes on more than
two values as follows.
Theorem 23.6 Suppose that Y 2 Y = f1; : : : ; Kg. Theoptimal rule is
h(x) = argmaxk P(Y = kjX = x) (23.7)
= argmaxk�k fk(x) (23.8)
where
P(Y = kjX = x) =fk(x)�kPrfr(x)�r
; (23.9)
�r = P (Y = r), fr(x) = f(xjY = r) and argmaxk means
\the value of k that maximizes that expression."
23.3 Gaussian and Linear Classi�ers
Perhaps the simplest approach to classi�cation is to use the den-
sity estimation strategy and assume a parametric model for the
densities. Suppose that Y = f0; 1g and that f0(x) = f(xjY = 0)
and f1(x) = f(xjY = 1) are both multivariate Gaussians:
fk(x) =1
(2�)d=2j�kj1=2exp
��12(x� �k)
T��1k(x� �k)
�; k = 0; 1:
Thus, XjY = 0 � N(�0;�0) and XjY = 1 � N(�1;�1).
432 23. Classi�cation
Theorem 23.7 If XjY = 0 � N(�0;�0) and XjY = 1 �N(�1;�1), then the Bayes rule is
h�(x) =
(1 if r21 < r20 + 2 log
��1
�0
�+ log
�j�0j
j�1j
�0 otherwise
(23.10)
where
r2i= (x� �i)
T��1i(x� �i); i = 1; 2 (23.11)
is the Manalahobis distance. An equivalent way of ex-
pressing the Bayes' rule is
h(x) = argmaxkÆk(x)
where
Æk(x) = �1
2log j�kj �
1
2(x� �k)
T��1k(x� �k) + log�k
(23.12)
and jAj denotes the determinant of a matrix A.
The decision boundary of the above classi�er is quadratic so
this procedure is often called quadratic discriminant analy-
sis (QDA). In practice, we use sample estimates of �; �1; �2;�0;�1
in place of the true value, namely:
b�0 =1
n
nXi=1
(1� Yi); b�1 = 1
n
nXi=1
Yi
b�0 =1
n0
Xi: Yi=0
Xi; b�1 =1
n1
Xi: Yi=1
Xi
S0 =1
n0
Xi: Yi=0
(Xi � b�0)(Xi � b�0)T ; S1 =
1
n1
Xi: Yi=1
(Xi � b�1)(Xi � b�1)T
where n0 =P
i(1� Yi) and n1 =
PiYi.
23.3 Gaussian and Linear Classi�ers 433
A simpli�cation occurs if we assume that �0 = �0 = �. In
that case, the Bayes rule is
h(x) = argmaxkÆk(x) (23.13)
where now
Æk(x) = xT��1�k �1
2�Tk��1 + log �k: (23.14)
The parameters are estimated as before except that the mle of
� is
S =n0S0 + n1S1
n0 + n1:
The classi�cation rule is
h�(x) =
�1 if Æ1(x) > Æ0(x)
0 otherwise(23.15)
where
Æj(x) = xTS�1b�j � 1
2b�TjS�1b�j + log b�j
is called the discriminant function. The decision boundary
fx : Æ0(x) = Æ1(x)g is linear so this method is called linear
discrimination analysis (LDA).
Example 23.8 Let us return to the South African heart disease
data. The decision rule in Figure 23.2 was obtained by linear
discrimination. The outcome was
classi�ed as 0 classi�ed as 1
y = 0 277 25
y = 1 116 44
The observed misclassi�cation rate is 141=462 = :31. Including
all the covariates reduces the error rate to .27. The results from
quadratic discrimination are
classi�ed as 0 classi�ed as 1
y = 0 272 30
y = 1 113 47
434 23. Classi�cation
which has about the same error rate 143=462 = :31. Including
all the covariates reduces the error rate to .26. In this example,
there is little advantage to QDA over LDA.
Now we generalize to the case where Y takes on more than
two values.
Theorem 23.9 Suppose that Y 2 f1; : : : ; Kg. If fk(x) =f(xjY = k) is Gaussian, the Bayes rule is
h(x) = argmaxkÆk(x)
where
Æk(x) = �1
2log j�kj �
1
2(x� �k)
T��1k(x� �k) + log �k:
(23.16)
If the variances of the Gaussians are equal then
Æk(x) = xT��1�k �1
2�Tk��1 + log�k: (23.17)
We estimate Æk(x) by by inserting estimates of �k, �k and �k.
There is another version of linear discriminant analysis due to
Fisher. The idea is to �rst reduce the dimension of covariates to
one dimension by projecting the data onto a line. Algebraically,
this meansd replacing the covariate X = (X1; : : : ; Xd) with a
linear combination U = wTX =P
d
j=1wjXj. The goal is to
choose the vector w = (w1; : : : ; wd) that \best separates the
data." Then we perform classi�cation with the new covariate Z
instead of X.
We need de�ne what we mean by separation of the groups.
We would like the two groups to have means that are far apart
relative to their spread. Let �j denote the mean of X for Yj
and let � be the variance matrix of X. Then E (U jY = j) =
E (wTXjY = j) = wT�j and V(U) = wT�w. De�ne the separa-The quantity J
arises in physics,
where it is called the
Rayleigh coeÆcient.
23.3 Gaussian and Linear Classi�ers 435
tion by
J(w) =(E (U jY = 0)� E (U jY = 1))2
wT�w
=(wT�0 � wT�1)
2
wT�w
=wT (�0 � �1)(�0 � �1)
Tw
wT�w:
We estimate J as follows. Let njP
n
i=1 I(Yi = j) be the number
of observations in group j, let Xj be the sample mean vector of
the X's for group j, and let Sj be the sample covariance matrix
in group j. De�ne bJ(w) = wTSBw
wTSWw(23.18)
where
SB = (X0 �X1)(X0 �X1)
SW =(n0 � 1)S0 + (n1 � 1)S1
(n0 � 1) + (n1 � 1):
Theorem 23.10 The vector
w = S�1W(X0 �X1) (23.19)
is a minimizer of bJ(w). We call
U = wTX = (X0 �X1)TS�1
WX (23.20)
the Fisher linear discriminant function. The midpoint m
between X0 and X1 is
m =1
2(X0 +X1) =
1
2(X0 �X1)
TS�1B(X0 +X1) (23.21)
Fisher's classi�cation rule is
h(x) =
�0 if wTX � m
1 if wTX < m
=
�0 if (X0 �X1)
TS�1Wx � m
1 if (X0 �X1)TS�1
Wx < m:
Fisher's rule is the same as the Bayes linear classi�er in equa-
tion (23.14) when b� = 1=2.
436 23. Classi�cation
23.4 Linear Regression and Logistic Regression
Amore direct approach to classi�cation is to estimate the regres-
sion function r(x) = E (Y jX = x) without bothering to estimste
the densities fk. For the rest of this section, we will only con-
sider the case where Y = f0; 1g. Thus, r(x) = P(Y = 1jX = x)
and once we have an estimate br, we will use the classi�cation
rule
h(x) =
�1 if br(x) > 1
2
0 otherwise:(23.22)
The simplest regression model is the linear regression model
Y = r(x) + � = �0 +
dXj=1
�jXj + � (23.23)
where E (�) = 0. This model can't be correct since it does not
force Y = 0 or 1. Nonetheless, it can sometimes lead to a decent
classi�er.
Recall that the least squares estimate of � = (�0; �1; : : : ; �d)T
minimizes the residual sums of squares
RSS(�) =
nXi=1
�Yi � �0 �
dXj=1
Xij�j
�2:
Let us brie y review what that estimator is. Let X denote the
N � (d+ 1) matrix of the form
X =
266641 X11 : : : X1d
1 X21 : : : X2d
......
......
1 Xn1 : : : Xnd
37775 :Also let Y = (Y1; : : : ; Yn)
T . Then,
RSS(�) = (Y �X�)T (Y �X�)
23.4 Linear Regression and Logistic Regression 437
and the model can be written as
Y = X� + �
where � = (�1; : : : ; �n)T . The least squares solution b� that min-
imizes RSS can be found by elementary calculus and is given
by b� = (XTX)�1XTY:
The predicted values are
bY = Xb�:Now we use (23.22) to classify, where br(x) = b�0 +Pj
b�jxj.It seems sensible to use a regression method that takes into
account that Y 2 f0; 1g. The most common method for doing
so is logistic regression. Recall that the model is
r(x) = P (Y = 1jX = x) =e�0+
Pj �jxj
1 + e�0+P
j �jxj: (23.24)
We may write this is
logit P (Y = 1jX = x) = �0 +Xj
�jxj
where logit (a) = log(a=(1� a)). Under this model, each Yi is a
Bernoulli with success probability
pi(�) =e�0+
Pj �jXij
1 + e�0+P
j �jXij:
The likelihood function for the data set is
L(�) =nYi=1
pi(�)Yi(1� pi(�))
1�Yi:
We obtain the mle numerically.
438 23. Classi�cation
Example 23.11 Let us return to the heart disease data. A logistic
regression yields the following estimates and Wald tests for the
coeÆcients:Covariate Estimate Std. Error Wald statistic p-value
Intercept -6.145 1.300 -4.738 0.000
sbp 0.007 0.006 1.138 0.255
tobacco 0.079 0.027 2.991 0.003
ldl 0.174 0.059 2.925 0.003
adiposity 0.019 0.029 0.637 0.524
famhist 0.925 0.227 4.078 0.000
typea 0.040 0.012 3.233 0.001
obesity -0.063 0.044 -1.427 0.153
alcohol 0.000 0.004 0.027 0.979
age 0.045 0.012 3.754 0.000Are surprised by the fact that systolic blood pressure is not
signi�cant or by the minus sign for the obesity coeÆcient? If
yes, then you are confusing correlation and causation. There
are lots of unobserved confounding variables so the coeÆcients
above cannot be interpreted causally. The fact that blood pres-
sure is not signi�cant does not mean that blood pressure is not an
important cause of heart disease. It means that it is not an im-
portant predictor of heart disease relative to the other variables
in the model. The error rate, using this model for classi�cation,
is .27. The error rate from a linear regression is .26.
We can get a better classi�er by �tting a richer model. For
example we could �t
logit P (Y = 1jX = x) = �0 +Xj
�jxj +Xj;k
�jkxjxk: (23.25)
More generally, we could add terms of up to order r for some
integer r. Large values of r give a more complicated model which
should �t the data better. But there is a bias-variance tradeo�
which we'll discuss later.
Example 23.12 If we use model (23.25) for the heart disease data
with r = 2, the error rate is reduced to .22.
23.5 Relationship Between Logistic Regression and LDA 439
The logistic regression model can easily be extended to k
groups but we shall not give the details here.
23.5 Relationship Between Logistic Regression and
LDA
LDA and logistic regression are almost the same thing. If we
assume that each group is Gaussian with the same covariance
matrix then we saw that
log
�P(Y = 1jX = x)
P(Y = 0jX = x)
�= log
��0
�1
�� 1
2(�0 + �1)
T��1(�1 � �0) + xT��1(�1 � �0)
� �0 + �Tx:
On the other hand, the logistic model is, by assumption,
log
�P(Y = 1jX = x)
P(Y = 0jX = x)
�= �0 + �Tx:
These are the same model since they both lead to classi�cation
rules that are linear in x. The di�erence is in how we estimate
the parameters.
The joint density of a single observation is f(x; y) = f(xjy)f(y) =f(yjx)f(x). In LDA we estimated the whole joint distribution by
estimating f(xjy) and f(y); speci�cally, we estimated fk(x) =
f(xjY = k), �1 = fY (1) and �0 = fY (0). We maximized the
likelihood Yi
f(xi; yi) =Yi
f(xijyi)| {z }Gaussian
Yi
f(yi)| {z }Bernoulli
: (23.26)
In logistic regression we maximized the conditional likelihoodQif(yijxi) but we ignored the second term f(xi):Y
i
f(xi; yi) =Yi
f(yijxi)| {z }logistic
Yi
f(xi)| {z }ignored
: (23.27)
440 23. Classi�cation
Since classi�cation only requires knowing f(yjx), we don't reallyneed to estimate the whole joint distribution. Here is an analogy:
we can estimate the mean � of a distribution with the sample
mean X without bothering to estimate the whole density func-
tion. Logistic regression leaves the marginal distribution f(x)
un-speci�ed so it is more nonparametric than LDA.
To summarize: LDA and logistic regression both lead to a
linear classi�cation rule. In LDA we estimate the entire joint
distribution f(x; y) = f(xjy)f(y). In logistic regression we only
estimate f(yjx) and we don't bother estimating f(x).
23.6 Density Estimation and Naive Bayes
Recall that the Bayes rule is h(x) = argmaxk �k fk(x): If we
can estimate �k and fk then we can estimate the Bayes classi-
�cation rule. Estimating �k is easy but what about fk? We did
this previously by assuming fk was Gaussian. Another strategy
is to estimate fk with some nonparametric density estimatorbfk such as a kernel estimator. But if x = (x1; : : : ; xd) is high
dimensional, nonparametric density estimation is not very reli-
able. This problem is ameliorated if we assume that X1; : : : ; Xd
are independent, for then, fk(x1; : : : ; xd) =Q
d
j=1 fkj(xj). This
reduces the problem to d one-dimensional density estimation
problems, within each of the k groups. The resulting classi�er
is called the naive Bayes classi�er. The assumption that the
components of X are independent is usually wrong yet the re-
sulting classi�er might still be accurate. Here is a summary of
the steps in the naive Bayes classi�er:
23.7 Trees 441
The Naive Bayes Classi�er
1. For each group k, compute an estimate bfkj of the densityfkj for Xj, using the data for which Yi = k.
2. Let bfk(x) = bfk(x1; : : : ; xd) = dYj=1
bfkj(xj):3. Let b�k = 1
n
nXi=1
I(Yi = k)
where I(Yi = k) = 1 if Yi = k and I(Yi = k) = 0 if Yi 6= k.
4. Let
h(x) = argmaxk b�k bfk(x):The naive Bayes classi�er is especially popular when x is high
dimensional and discrete. In that case, bfkj(xj) is especially sim-
ple.
23.7 Trees
Trees are classi�cation methods that partition the covariate
space X into disjoint pieces and then classify the observations
according to which partition element they fall in. As the name
implies, the classi�er can be represented as a tree.
For illustration, suppose there are two covariates, X1 = age
and X2 = blood pressure. Figure 23.3 shows a classi�cation tree
using these variables.
The tree is used in the following way. If a subject has Age
� 50 then we classify him as Y = 1. If a subject has Age < 50
then we check his blood pressure. If blood pressure is < 100
442 23. Classi�cation
0 1
Blood Pressure 1
Age
< 100 � 100
< 50 � 50
FIGURE 23.3. A simple classi�cation tree.
then we classify him as Y = 1, otherwise we classify him as
Y = 0. Figure 23.4 shows the same classi�er as a partition of
the covariate space.
Here is how a tree is constructed. For simplicity, we focus on
the case in which 2 Y = f0; 1g. First, suppose there is only a
single covariate X. We choose a split point t that divides the
real line into two sets A1 = (�1; t] and A2 = (t;1). Let bps(j)be the proportion of observations in As such that Yi = j:
bps(j) = Pn
i=1 I(Yi = j;Xi 2 As)Pn
i=1 I(Xi 2 As)(23.28)
for s = 1; 2 and j = 0; 1. The impurity of the split t is de�ned
to be
I(t) =
2Xs=1
s (23.29)
where
s = 1�1X
j=0
bps(j)2: (23.30)
23.7 Trees 443
1
0
1
Age
BloodPressure
50
110
FIGURE 23.4. Partition representation of classi�cation tree.
444 23. Classi�cation
This measure of impurity is known as the Gini index. If a
partition element As contains all 0's or all 1's, then s = 0.
Otherwise, s > 0. We choose the split point t to minimize the
impurity. (Other indices of impurity besides can be used besides
the Gini index.)
When there are several covariates, we choose whichever co-
variate and split that leads to the lowest impurity. This process
is continued until some stopping criterion is met. For example,
we might stop when every partition element has fewer than n0
data points, where n0 is some �xed number. The bottom nodes
of the tree are called the leaves. Each leaf is assigned a 0 or 1
depending on whether there are more data points with Y = 0
or Y = 1 in that partition element.
This procedure is easily generalized to the case where Y 2f1; : : : ; Kg: We simply de�ne the impurity by
s = 1�kX
j=1
bps(j)2 (23.31)
where bpi(j) is the proportion of observations in the partition
element for which Y = j.
Example 23.13 Figure 23.5 shows a classi�cation tree for the
heart disease data. The misclassi�cation rate is .21. Suppose we
do the same example but only using two covariates: tobacco and
age. The misclassi�cation rate is then .29. The tree is shown in
Figure 23.6. Because there are only two covariates, we can also
display the tree as a partition of X = R2 as shown in Figure
23.7. �
Our description of how to build trees is incomplete. If we keep
splitting until there are few cases in each leaf of the tree, we are
likely to over�t the data. We should choose the complexity of
the tree in such a way that the estimated true error rate is low.
In the next section, we discuss estimation of the error rate.
23.7 Trees 445
|age < 31.5
tobacco < 0.51
alcohol < 11.105
age < 50.5
typea < 68.5 famhist:a
tobacco < 7.605
typea < 42.5
adiposity < 24.435
adiposity < 28.955
ldl < 6.705
tobacco < 4.15
adiposity < 28
typea < 48
0
0 0 0 1
0
0 0
1 1
0 1
0
1
1
FIGURE 23.5. A classi�cation tree for the heart disease data.
446 23. Classi�cation
|age < 31.5
tobacco < 0.51 age < 50.5
tobacco < 7.470 0 0
0 1
FIGURE 23.6. A classi�cation tree for the heart disease
data using two covariates.
23.7 Trees 447
20 30 40 50 60
05
10
15
20
25
30
age
tob
acco
0
00
0
1
FIGURE 23.7. The tree pictured as a partition of the
covariate space.
448 23. Classi�cation
23.8 Assessing Error Rates and Choosing a Good
Classi�er
How do we choose a good classi�er? We would like to have a
classi�er h with a low true error rate L(h). Usually, we can't use
the training error rate bLn(h) as an estimate of the true error rate
because it is biased downward.
Example 23.14 Consider the heart disease data again. Suppose
we �t a sequence logistic regression models. In the �rst model we
include one covariate. In the second model we include two co-
variates and so on. The ninth model includes all the covariates.
We can go even further. Let's also �t a tenth model that in-
cludes all nine covariates plus the �rst covariate squared. Then
we �t an eleventh model that includes all nine covariates plus the
�rst covariate squared and the second covariate squared. Con-
tinuing this way we will get a sequence of 18 classi�ers of in-
creasing complexity. The solid line in Figure 23.8 shows the ob-
served classi�cation error which steadily decreases as we make
the model more complex. If we keep going, we can make a model
with zero observed classi�cation error. The dotted line shows the
cross-validation estimate of the error rate (to be explained
shortly) which is a better estimate of the true error rate than the
observed classi�cation error. The estimated error decreases for
a while then increases. This is the bias-variance tradeo� phe-
nomenon we have seen before. �
There are many ways to estimate the error rate. We'll consider
two: cross-validation and probability inequalities.
Cross-Validation. The basic idea of cross-validation, which
we have already encountered in curve estmation, is to leave out
some of the data when �tting a model. The simplest version of
cross-validation involves randomply splitting the data into two
23.8 Assessing Error Rates and Choosing a Good Classi�er 449
5 10 15
0.2
60
.28
0.3
00
.32
0.3
4
Number of terms in the model
err
or
rate
FIGURE 23.8. Solid line is the observed error rate and
dashed line is the cross-validation estimate of true error
rate.
450 23. Classi�cation
Training Data T Validation Data V
| {z }bh | {z }bLFIGURE 23.9. Cross-validation.
pieces: the training set T and the validation set V. Often,about 10 per cent of the data might be set aside as the valida-
tion set. The classi�er h is constructed from the training set.
We then estimate the error by
bL(h) = 1
m
XXi2V
I(h(Xi) 6= YI): (23.32)
where m is the size of the validation set. See Figure 23.9.
Another approach to cross-validation isK-fold cross-validation
which is obtained from the following algorithm.
23.8 Assessing Error Rates and Choosing a Good Classi�er 451
K-fold cross-validation.
1. Randomly divide the data intoK chunks of approximately
equal size. A common choice is K = 10.
2. For k = 1 to K do the following:
(a) Delete chunk k from the data.
(b) Compute the classi�er bh(k) from the rest of the data.
(c) Use bh(k) to the predict the data in chunk k. Let bL(k)
denote the observed error rate.
3. Let bL(h) = 1
K
KXk=1
bL(k): (23.33)
This is how dotted line in Figure 23.8 was computed. An al-
ternative approach to K-fold cross-validation is to split the data
into two pieces called the training set and the test set. All
the models are �t on the training set. The models are then used
to do predictions on the test set and the errors are estimated
from these predictions. When the clasi�cation procedure is time
consuming and there are lots of data, this approach is preferred
to K-fold cross-validation.
Cross-validation can be applied to any classi�cation method.
To apply it to trees, one begins by �tting an initial tree. Smaller
trees are obtained by pruning tree, that is removing splits from
the tree. We can do this for trees of various sizes where size refers
to the number of terminal nodes on the tree. Cross-validation is
then used to estimate error rate as a function of tree size.
Example 23.15 Figure 23.10 shows the cross-validation plot for
a classi�cation tree �tted to the heart disease data. The error
452 23. Classi�cation
is shown as a function of tree size. Figure 23.11 shows the tree
with the size chosen by minimizing the cross-validation error. �
Probability Inequalities. Another approach to estimat-
ing the error rate is to �nd a con�dence interval for bLn(h) using
probability inequalities. This method is useful in the context of
empirical risk minimization.
Let H be a set of classi�ers, for example, all linear classi�ers.
Empirical risk minimization means choosing the classi�er bh 2 Hto minimize the training error bLn(h), also called the empirical
risk. Thus,
bh = argminh2HbLn(h) = argminh2H
1
n
Xi
I(h(Xi) 6= Yi)
!:
(23.34)
Typically, bLn(bh) underestimates the true error rate L(bh) becausebh was chosen to make bLn(bh) small. Our goal is to assess how
much underestimation is taking place. Our main tool for this
analysis isHoe�ding's inequality. Recall that ifX1; : : : ; Xn �Bernoulli(p), then, for any � > 0,
P (jbp� pj > �) � 2e�2n�2
(23.35)
where bp = n�1P
n
i=1Xi.
First, suppose thatH = fh1; : : : ; hmg consists of �nitely many
classi�ers. For any �xed h, bLn(h) converges in almost surely
to L(h) by the law of large numbers. We will now establish a
stronger result.
Theorem 23.16 (Uniform Convergence.) AssumeH is �nite
and has m elements. Then,
P
�maxh2H
jbLn(h)� L(h)j > �
�� 2me�2n�
2
:
Proof. We will use Hoe�ding's inequality and we will also
use the fact that ifA1; : : : ; Am is a set of events then P(S
m
i=1Ai) �
23.8 Assessing Error Rates and Choosing a Good Classi�er 453
2 4 6 8 10 12 14
13
01
35
14
01
45
15
01
55
16
0
size
sco
re
FIGURE 23.10. Cross validation plot
454 23. Classi�cation
|age < 31.5
age < 50.5
typea < 68.5 famhist:a
tobacco < 7.605
0
0 1
0 1
1
FIGURE 23.11. Smaller classi�cation tree with size cho-
sen by cross-validation.
23.8 Assessing Error Rates and Choosing a Good Classi�er 455Pm
i=1 P(Ai). Now,
P
�maxh2H
jbLn(h)� L(h)j > �
�= P
[h2H
jbLn(h)� L(h)j > �
!�
XH2H
P
�jbLn(h)� L(h)j > �
��
XH2H
2e�2n�2
= 2me�2n�2
: �
Theorem 23.17 Let
� =
s2
nlog
�2m
�
�:
Then bLn(bh)� � is a 1� � con�dence interval for L(bh).Proof. This follows from the fact that
P (jbLn(bh)� L(bh)j > �) � P
�maxh2H
jbLn(bh)� L(bh)j > �
�� 2me�2n�
2
= �: �
When H is large the con�dence interval for L(bh) is large.
The more functions there are in H the more likely it is we have
\over�t" which we compensate for by having a larger con�dence
interval.
In practice we usually use sets H that are in�nite, such as the
set of linear classi�ers. To extend our analysis to these cases we
want to be able to say something like
P
�suph2H
jbLn(h)� L(h)j > �
�� something not too big:
All the other results followed from this inequality. One way
to develop such a generalization is by way of the Vapnik-
Chervonenkis or VC dimension. We now consider the main
ideas in VC theory.
456 23. Classi�cation
Let A be a class of sets. Give a �nite set F = fx1; : : : ; xng let
NA(F ) = #nF\
A : A 2 Ao
(23.36)
be the number of subsets of F \picked out" by A. Here #(B)denotes the number of elements of a set B. The shatter coef-
�cient is de�ned by
s(A; n) = maxF2Fn
NA(F ) (23.37)
where Fn consists of all �nite sets of size n. Now letX1; : : : ; Xn �P and let
Pn(A) =1
n
Xi
I(Xi 2 A)
denote the empirical probability measure. The following remark-
able theorem bounds the distance between P and Pn.
Theorem 23.18 (Vapnik and Chervonenkis (1971).) For any
P, n and � > 0,
P
�supA2A
jPn(A)� P(A)j > �
�� 8s(A; n)e�n�2=32: (23.38)
The proof, though very elegant, is long and we omit it. If His a set of classi�ers, de�ne A to be the class of sets of the form
fx : h(x) = 1g. When then de�ne s(H; n) = s(A; n).
Theorem 23.19
P
�suph2H
jbLn(h)� L(h)j > �
�� 8s(H; n)e�n�2=32:
A 1� � con�dence interval for L(bh) is bLn(bh)� �n where
�2n=
32
nlog
�8s(H; n)
�
�:
23.8 Assessing Error Rates and Choosing a Good Classi�er 457
These theorems are only useful if the shatter coeÆcients do
not grow too quickly with n. This is where VC dimension enters.
De�nition 23.20 The VC (Vapnik-Chervonenkis) dimen-
sion of a class of sets A is de�ned as follows. If s(A; n) =2n for all n set V C(A) = 1. Otherwise, de�ne V C(A)to be the largest k for which s(A; n) = 2k.
Thus, the VC-dimension is the size of the largest �nite set
F than can be shattered by A meaning that A picks out each
subset of F . IfH is a set of classi�ers we de�ne V C(H) = V C(A)where A is the class of sets of the form fx : h(x) = 1g as hvaries in H. The following theorem shows that if A has �nite
VC-dimension then the shatter coeÆcients grow as a polynomial
in n.
Theorem 23.21 If A has �nite VC-dimension v, then
s(A; n) � nv + 1:
Example 23.22 Let A = f(�1; a]; a 2 Rg. The A shatters
ever one point set fxg but it shatters no set of the form fx; yg.Therefore, V C(A) = 1.
Example 23.23 Let A be the set of closed intervals on the real
line. Then A shatters S = fx; yg but it cannot shatter sets with3 points. Consider S = fx; y; zg where x < y < z. One cannot
�nd an interval A such that ATS = fx; zg. So, V C(A) = 2.
Example 23.24 Let A be all linear half-spaces on the plane. Any
three point set (not all on a line) can be shattered. No 4 point
set can be shattered. Consider, for example, 4 points forming a
diamond. Let T be the left and rightmost points. This can't be
picked out. Other con�gurations can also be seen to be unshat-
terable. So V C(A) = 3. In general, halfspaces in Rd have VC
dimension d+ 1.
458 23. Classi�cation
Example 23.25 Let A be all rectangles on the plane with sides
parallel to the axes. Any 4 point set can be shattered. Let S be
a 5 point set. There is one point that is not leftmost, rightmost,
uppermost, or lowermost. Let T be all points in S except this
point. Then T can't be picked out. So V C(A) = 4.
Theorem 23.26 Let x have dimension d and let H be th set of
linear classi�ers. The VC dimension of H is d+1. Hence, a 1��con�dence interval for the true error rate is bL(bh)� � where
�2n=
32
nlog
�8(nd+1 + 1)
�
�:
23.9 Support Vector Machines
In this section we consider a class of linear classi�ers called sup-
port vector machines. Throughout this section, we assume
that Y is binary. It will be convenient to label the outcomes as
�1 and +1 instead of 0 and 1. A linear classi�er can then be
written as
h(x) = sign�H(x)
�where x = (x1; : : : ; xd),
H(x) = a0 +
dXi=1
aixi
and
sign(z) =
8<:�1 if z < 0
0 if z = 0
1 if z > 0:
First, suppose that the data are linearly separable, that
is, there exists a hyperplane that perfectly separates the two
classes.
23.9 Support Vector Machines 459
Lemma 23.27 The data can be separated by some hyperplane if
and only if there exists a hyperplane H(x) = a0+P
d
i=1 aixi such
that
YiH(xi) � 1; i = 1; : : : ; n: (23.39)
Proof. Suppose the data can be separated by a hyperplane
W (x) = b0+P
d
i=1 bixi. It follows that there exists some constant
c such that Yi = 1 implies W (Xi) � c and Yi = �1 implies
W (Xi) � �c. Therefore, YiW (Xi) � c for all i. Let H(x) =
a0+P
d
i=1 aixi where aj = bj=c. Then YiH(Xi) � 1 for all i. The
reverse direction is straightforward. �
In the separable case, there will be many separating hyper-
planes. How should we choose one? Intuitively, it seems reason-
able to choose the hyperplane \furthest" from the data in the
sense that it separates the +1'a and -1's and maximizes the dis-
tance to the closest point. This hyperplane is called the maxi-
mum margin hyperplane. The margin is the distance to from
the hyperplane to the nearest point. Points on the boundary of
the margin are called support vectors. See Figure 23.12.
Theorem 23.28 The hyperplane bH(x) = ba0+Pd
i=1 baixi that sep-arates the data and maximizes the margin is given by minimizing
(1=2)P
d
j=1 b2jsubject to (23.39).
It turns out that this problem can be recast as a quadratic
programming problem. Recall that hXi; Xki = XT
iXk is the in-
ner product of Xi and Xk.
Theorem 23.29 Let bH(x) = ba0 +Pd
i=1 baixi denote the optimal
(largest margin) hyperplane. Then, for j = 1; : : : ; d,
baj = nXi=1
b�iYiXj(i)
460 23. Classi�cation
bc
bc
bc
bc
bc
bc
bc
bc
bc
bc
bc
bc
bc
bc
H(x) = a0 + aTx = 0
FIGURE 23.12. The hyperplane H(x) has the largest margin of all
hyperplanes that separate the two classes.
where Xj(i) is the value of the covariate Xj for the ith data
point, and b� = (b�1; : : : ; b�n) is the vector that maximizes
nXi=1
�i �1
2
nXi=1
nXk=1
�i�kYiYkhXi; Xki (23.40)
subject to
�i � 0
and
0 =Xi
�iYi:
The points Xi for which b� 6= 0 are called support vectors. ba0can be found by solving
b�i�Yi(XT
iba+ b�0� = 0
for any support point Xi. bH may be written as
bH(x) = ba0 + nXi=1
b�iYihx;Xii:
23.10 Kernelization 461
There are many software packages that will solve this problem
quickly. If there is no perfect linear classi�er, then one allows
overlap between the groups by replacing the condition (23.39)
with
YiH(xi) � 1� �i; �i � 0; i = 1; : : : ; n: (23.41)
The variables �1; : : : ; �n are called slack variables.
We now maximize (23.40) subject to
0 � �i � c; i = 1; : : : ; n
andnXi=1
�iYi = 0:
The constant c is a tuning parameter that controls the amount
of overlap.
23.10 Kernelization
There is a trick for improving a computationally simple classi�er
h called kernelization. The idea is to map the covariate X {
which takes values in X { into a higher dimensional space Z and
apply the classi�er in the bigger space Z. This can yield a more
exible classi�er while retaining computationally simplicity.
The standard example of this idea is illustrated in Figure
23.13. The covariate x = (x1; x2). The Yi's can be separated
into two groups using an ellipse. De�ne a mapping � by
z = (z1; z2; z3) = �(x) = (x21;p2x1x2; x
22):
This � maps X = R2 into Z = R
3 . In the higher dimensional
space Z, the Yi's are seperable by a linear decision boundary.
In other words, a linear classi�er in a higher dimenional space
corresponds to a non-linear classi�er in the original space.
462 23. Classi�cation
x1
x2
+
+
+
++
+
+ +
+
+
+++
+ +
z1
z2
z3
+
+
+
+
+
+
+
+
�
FIGURE 23.13. Kernelization. Mapping the covariates into a higher
dimensional space can make a complicated decision boundary into a
simpler decision boundary.
23.10 Kernelization 463
There is a potential drawback. If we signicantly expand the
dimension of the problem, we might increase the computational
burden. For example, x has dimension d = 256 and we wanted
to use all fourth order terms, then z = �(x) has dimension
183,181,376. What saves us is two observations. First, many
classi�ers do not require that we know the values of the indi-
vidual points but, rather, just the inner product between pairs
of points. Second, notice in our example that the inner product
in Z can be written
hz; ezi = h�(x); �(ex)i= x21ex21 + 2x1ex1x2ex2 + x22ex22= (hx; exi)2� k(x; ex):
Thus, we can compute hz; ezi without ever computing Zi =
�(Xi).
To summarize, kernelization means involves �nding a map-
ping � : X ! Z and a classi�er such that:
1. Z has higher dimension than X and so leads a richer set
of classi�ers.
2. The classi�er only requires computing inner products.
3. There is a functionK, called a kernel, such that h�(x); �(ex)i =K(x; ex).
4. Everywhere the term hx; exi appears in the algorithm, re-
place it with K(x; ex).In fact, we never need to construct the mapping � at all.
We only need to specify a kernel k(x; ex) that corresponds to
h�(x); �(ex)i for some �. This raises an interesting question: given
a function of two variables K(x; y), does there exist a function
�(x) such that K(x; y) = h�(x); �(y)i. The answer is provided
464 23. Classi�cation
byMercer's theorem which says, roughly, that ifK is positive
de�nite { meaning thatZ ZK(x; y)f(x)f(y)dxdy � 0
for square integrable functions f { then such a � exists. Exam-
ples of commonly used kernels are:
polynomial K(x; ex) =�hx; exi+ a
�rsigmoid K(x; ex) = tanh(ahx; exi+ b)
Gaussian K(x; ex) = exp��jjx� exjj2=(2�2)
�Let us now see how we can use this trick in LDA and in
support vector machines.
Recall that the Fisher linear discriminant method replaces X
with U = wTX where w is chosen to maximize the Rayleigh
coeÆcient
J(w) =wTSBw
wTSWw;
SB = (X0 �X1)(X0 �X1)T
and
SW =
�(n0 � 1)S0
(n0 � 1) + (n1 � 1)
�+
�(n1 � 1)S1
(n0 � 1) + (n1 � 1)
�:
In the kernelized version, we replace Xi with Zi = �(Xi) and
we �nd w to maximize
J(w) =wT eSBwwT eSWw (23.42)
where eSB = (Z0 � Z1)(Z0 � Z1)T
and
SW =
(n0 � 1)eS0
(n0 � 1) + (n1 � 1)
!+
(n1 � 1)eS1
(n0 � 1) + (n1 � 1)
!:
23.10 Kernelization 465
Here, eSj is the sample of covariance of the Zi's for which Y =
j. However, to take advantage of kernelization, we need to re-
express this in terms of inner products and then replace the
inner products with kernels.
It can be shown that the maximizing vector w is a linear
combination of the Zi's. Hence we can write
w =
nXi=1
�iZi:
Also,
Zj =1
nj
nXi=1
�(Xi)I(Yi = j):
Therefore,
wTZj =� nXi=1
�iZi
�T� 1
nj
nXi=1
�(Xi)I(Yi = j)�
=1
nj
nXi=1
nXs=1
�iI(Ys = j)ZT
i�(Xs)
=1
nj
nXi=1
�i
nXs=1
I(Ys = j)�(Xi)T�(Xs)
=1
nj
nXi=1
�i
nXs=1
I(Ys = j)K(Xi; Xs)
= �TMj
where Mj is a vector whose ith component is
Mj(i) =1
nj
nXs=1
K(Xi; Xs)I(Yi = j):
It follows that
wT eSBw = �TM�
where M = (M0 �M1)(M0 �M1)T . By similar calcuations, we
can write
wT eSWw = �TN�
466 23. Classi�cation
where
N = K0
�I � 1
n01
�KT
0 +K1
�I � 1
n11
�KT
1 ;
I is the identity matrix, 1 is a matrix of all one's, and Kj is
the n� nj matrix with entries (Kj)rs = K(xr; xs) with xs vary-
ing over the observations in group j. Hence, we now �nd � to
maximize
J(�) =�TM�
�TN�:
Notice that all the qauntities are expressed in terms of the ker-
nel. Formally, the solution is � = N�1(M0 �M1). However, N
might be non-invertible. In this case on replaces N by N + bI
form some constant b. Finally, the projection onto the new sub-
space can be written as
U = wT�(x) =
nXi=1
�iK(xi; x):
The support vector machine can similarly be kernelized. We
simply replace hXi; Xji with K(Xi; Xj). For example, instead
of maximizing (23.40), we now maximize
nXi=1
�i �1
2
nXi=1
nXk=1
�i�kYiYkK(Xi; Xj): (23.43)
The hyperplane can be written as bH(x) = ba0+Pn
i=1 b�iYiK(X;Xi).
23.11 Other Classi�ers
There are many other classi�ers and space precludes a full di-
cussion of all of them. Let us brie y mention a few.
The k-nearest-neighbors classi�er is very simple. Given a
point x, �nd the k data points closest to x. Classify x using the
majority vote of these k neighbors. Ties can be broke randomly.
The parameter k can be chosen by cross-validation.
23.11 Other Classi�ers 467
Bagging is a method for reducing the variability of a clas-
si�er. It is most helpful for highly non-linear classi�ers such as
a tree. We draw B boostrap samples from the data. The bth
bootstrap sample yields a classi�er hb. The �nal classi�er is
bh(x) = � 1 if 1B
PB
b=1 hb(x) � 12
0 otherwise:
Boosting is a method for starting with a simple classi�er
and gradually improving it by re�tting the data giving higher
weight to misclassi�ed samples. Suppose thatH is a collection of
classi�ers, for example, tress with onely one split. Assume that
Yi 2 f�1; 1g and that each h is such that h(x) 2 f�1; 1g. We
usually give equal weight to all data points in the methods we
have discussed. But one can incorporate unequal weights quite
easily in most algorithms. For example, in constructing a tree,
we could replace the impurity measure with a weighted impurity
measure. The original version of bossting, called AdaBoost, is
as follows.
1. Set the weights wi = 1=n, i = 1; : : : ; n.
2. For j = 1; : : : ; J , do the following steps:
(a) Constructing a classi�er hj from the data using the
weights w1; : : : ; wn.
(b) Compute the weighted error estimate:
bLj =
Pn
i=1wiI(Yi 6= hj(Xi))Pn
i=1wi
:
(c) Let �j = log((1� bLj)=bLj).
(d) Update the weights:
wi � wie�jI(Yi 6=hj(Xi))
468 23. Classi�cation
3. The �nal classi�er is
bh(x) = sign� JXj=1
�jhj(x)�:
There is now an enormous literature trying to explain and
improve on boosting. Whereas bagging is a variance reduction
technique, boosting can be thought of as a bias reduction tech-
nique. We starting with a simple { and hence highly biased {
classi�er, and we gradually reduce the bias. The disadvantage
of boosting is that the �nal classi�er is quite complicated.
23.12 Bibliographic Remarks
An excellent reference on classi�cation is Hastie, Tibshirani and
Friedman (2001). For more on the theory, see Devroye, Gy�or�,
Lugosi (1996) and Vapnik (2001).
23.13 Exercises
1. Prove Theorem 23.5.
2. Prove Theorem 23.7.
3. Download the spam data from:
http://www-stat.stanford.edu/�tibs/ElemStatLearn/index.html
The data �le can also be found on the course web page.
The data contain 57 covariates relating to email messages.
Each email message was classi�ed as spam (Y=1) or not
spam (Y=0). The outcome Y is the last column in the �le.
The goal is to predict whether an email is spam or not.
23.13 Exercises 469
(a) Construct classi�cation rules using (i) LDA, (ii) QDA,
(iii) logistic regression and (iv) a classi�cation tree. For
each, report the observed misclassi�cation error rate and
construct a 2-by-2 table of the form
bh(x) = 0 bh(x) = 1
Y = 0
Y = 1
Hint: Here are some things to remember when using R.
(i) To �t a logistic regression and get the classi�ed values:
d <- data.frame(x,y)
n <- nrow(x)
out <- glm(y ~ .,family=binomial,data=d)
p <- out$fitted.values
pred <- rep(0,n)
pred[p > .5] <- 1
table(y,pred)
(ii) To build a tree you must turn y into a factor and you
must do it before you make the data frame:
y <- as.factor(y)
d <- data.frame(x,y)
library(tree)
out <- tree(y ~ .,data=d)
print(summary(out))
plot(out,type="u",lwd=3)
text(out)
pred <- predict(out,d,type="class")
table(y,pred)
470 23. Classi�cation
(b) Use 5-fold cross-validation to estimate the prediction
accuracy of LDA and logistic regression.
(c) Sometimes it helps to reduce the number of covariates.
One strategy is to compare Xi for the spam and email
group.. For each of the 57 covariates, test whether the
mean of the covriate is the same or di�erent between the
two groups. Keep the 10 covariates with the smallest p-
values. Try LDA and logstic regression using only these 10
variables.
4. Let A be the set of two-dimensional spheres. That is, A 2A if A = f(x; y) : (x�a)2+(y� b)2 � c2g for some a; b; c.
Find the VC dimension of A.
5. Classify the spam data using support vector machines.
Free software for the support vector machine is at
http://svmlight.joachims.org/
It is very easy to use. After installing, do the following.
First, you must recode the Yi's as +1 and -1. Then type
svm_learn data output
where \data" is the �le with the data. It expects the data
in the following form:
1 1:3.4 2:.17 3:129 etc
where the �rst number is Y (+1 or -1) and the rest of the
line means: X1 is 3.4, X2 is .17, etc.
If the �le \test" contains test data, then type
23.13 Exercises 471
svm_classify test output predictions
to get the predictions on the test data. There are many
options available, as explained on the web site.
6. Use VC theory to get a con�dence interval on the true
error rate of the LDA classi�er in question 3. Suppose
that the covariate X = (X1; : : : ; Xd) has d dimensions.
Assume that each variable has a continuous distribution.
7. Suppose that Xi 2 R and that Yi = 1 whenever jXij �1 and Yi = 0 whenever jXij > 1. Show that no linear
classi�er can perfectly classify these data. Show that the
kernelized data Zi = (Xi; X2i) can be linearly separated.
8. Repeat question 5 using the kernel K(x; ex) = (1 + xT ex)p.Choose p by cross-validation.
9. Apply the k nearest neighbors classi�er to the \iris data."
Get the data in R from the command data(iris). Choose
k by cross-validation.
10. (Curse of Dimensionality.) Suppose that X has a uniform
distribution on the d-dimensional cube [�1=2; 1=2]d. LetR be the distance from the origin to the closest neighbor.
Show that the median of R is0@�1�
�12
�1=n�vd(1)
1A1=d
where
vd(r) = rd�d=2
�((d=2) + 1)
is the volume of a sphere of radius r. When does R exceed
the edge of the cube when n = 100; 1000; 10000?
472 23. Classi�cation
11. Fit a tree to the data in question 3. Now apply bagging
and report your results.
12. Fit a tree that uses only one split on one variable to the
data in question 3. Now apply boosting.
13. Let r(x) = P(Y = 1jX = x) and let br(x) be an estimate
of r(x). Consider the classi�er
h(x) =
�1 if br(x) � 1=2
0 otherwise:
Assume that br(x) � N(r(x); �2(x)) for some functions
r(x) and �2(x)). Show that
P(Y 6= h(X)) � 1��
0B@sign�(br(x)� (1=2))(r(x)� (1=2))
��(x)
1CAwhere � is the standard Normal cdf. Regard sign
�(br(x)�
(1=2))(r(x)� (1=2))�as a type of bias term. Explain the
implications for the bias-variance trade-o� in classi�cation
(Friedman, 1997).
This is page 473
Printer: Opaque this
24
Stochastic Processes
24.1 Introduction
Most of this book has focused on iid sequences of random vari-
ables. Now we consider sequences of dependent random vari-
ables. For example, daily temperatures will form a sequence of
time-ordered random variables and clearly the temperature on
one day is not independent of the temperature on the previous
day.
A stochastic process fXt : t 2 Tg is a collection of ran-
dom variables. We shall sometimes write X(t) instead of Xt.
The variables Xt take values in some set X called the state
space. The set T is called the index set and for our pur-
poses can be thought of as time. The index set can be discrete
T = f0; 1; 2; : : :g or continuous T = [0;1) depending on the
application.
Example 24.1 (iid observations.) A sequence of iid random vari-
ables can be written as fXt : t 2 Tg where T = f1; 2; 3; : : : ; g.
474 24. Stochastic Processes
Thus, a sequence of iid random variables is an example of a
stochastic process. �
Example 24.2 (The Weather.) Let X = fsunny; cloudyg. A typi-
cal sequence (depending on where you live) might be
sunny; sunny; cloudy; sunny; cloudy; cloudy; � � �This process has a discrete state space and a discrete index set.
�
Example 24.3 (Stock Prices.) Figure 24.1 shows the price of a
stock over time. The price is monitored continuously so the index
set T is not discrete. Price is discrete but, for practical purposes
we can imagine treating it as a continuous variable. �
Example 24.4 (Empirical Distribution Function.) Let X1; : : : ; Xn �F where F is some cdf on [0,1]. Let
bFn(t) =1
n
nXi=1
I(Xi � t)
be the empirical cdf . For any �xed value t, bFn(t) is a random
variable. But the whole empirical cdfn bFn(t) : t 2 [0; 1]o
is a stochastic process with a continuous state space and a con-
tinuous index set. �
We end this section by recalling a basic fact. If X1; : : : ; Xn
are random variables then we can write the joint density as
f(x1; : : : ; xn) = f(x1)f(x2jx1) � � �f(xnjx1; : : : ; xn�1)
=
nYi=1
f(xijpasti) (24.1)
where pasti refers to all the variables before Xi.
24.1 Introduction 475
0 2 4 6 8 10
78
91
01
11
21
3
time
price
FIGURE 24.1. Stock price over ten week period.
476 24. Stochastic Processes
24.2 Markov Chains
The simplest stochastic process is a Markov chain in which the
distribution of Xt depends only on Xt�1. In this section we as-
sume that the state space is discrete, either X = f1; : : : ; Ngor X = f1; 2; : : : ; g and that the index set is T = f0; 1; 2; : : :g.Typically, most authors write Xn instead of Xt when discussing
Markov chains and I will do so as well.
De�nition 24.5 The process fXn : n 2 Tg is a Markov
chain if
P(Xn = x j X0; : : : ; Xn�1) = P(Xn = x j Xn�1) (24.2)
for all n and for all x 2 X .
For a Markov chain, equation (24.1) simplifes to
f(x1; : : : ; xn) = f(x1)f(x2jx1)f(x3jx2) � � � f(xnjxn�1):
A Markov chain can be represented by the following DAG:
X1 X2 X3 � � � Xn � � �
Each variable has a single parent, namely, the previous obser-
vation.
The theory of Markov chains is a very rich and complex. We
have to get through many de�nitions before we can do anything
interesting. Our goal is to answer the following questions:
1. When does a Markov chain \settle down" into some sort
of equilibrium?
2. How do we estimate the parameters of a Markov chain?
24.2 Markov Chains 477
3. How can we construct Markov chains that converge to a
given equilibrium distribution and why would we want to
do that?
We will answer questions 1 and 2 in this chapter. We will
answer question 3 in the next chapter. To understand question
1, look at the two chains in Figure 24.2. The �rst chain oscillates
all over the place and will continue to do so forever. The second
chain eventually settles into an equilibrium. If we constructed a
histogram of the �rst process, it would keep changing as we got
more and more observations. But a histogram from the second
chain would eventually converge to some �xed distribution.
Transition Probabilities.The key quantities of a Markov
chain are the probabilities of jumping from one state into an-
other state.
De�nition 24.6 We call
P(Xn+1 = jjXn = i) (24.3)
the transition probabilities. If the transition probabil-
ities do not change with time, we say the chain is homo-
geneous. In this case we de�ne pij = P(Xn+1 = jjXn =
i). The matrix P whose (i; j) element is pij is called the
transition matrix.
We will only consider homogeneous chains. Notice that P has
two properties: (i) pij � 0 and (ii)P
ipij = 1. Each row is
a probability mass function. A matrix with these properties is
called a stochastic matrix.
Example 24.7 (Random Walk With Absorbing Barriers.) Let X =
f1; : : : ; Ng. Suppose you are standing at one of these points.
478 24. Stochastic Processes
0 200 400 600 800 1000
−1
50
−1
00
−5
00
time
x
0 200 400 600 800 1000
−2
02
46
81
0
time
x
FIGURE 24.2. The �rst chain does not settle down into
an equilibrium. The second does.
24.2 Markov Chains 479
Flip a coin with P(Heads) = p and P(Tails) = q = 1 � p. If
it is heads, take one step to the right. If it is tails, take one
step to the left. If you hit one of the endpoints, stay there. The
transition matrix is
P =
26666664
1 0 0 0 � � � 0 0
q 0 p 0 � � � 0 0
0 q 0 p � � � 0 0
� � � � � � � � � � � � � � � � � � � � �0 0 0 0 q 0 p
0 0 0 0 0 0 1
37777775: �
Example 24.8 Suppose the state space is X = fsunny; cloudyg.Then X1; X2; : : : represents the weather for a sequence of days.
The weather today clearly depends on yesterday's weather. It
might also depend on the weather two days ago but as a �rst
approximation we might assume that the dependence is only one
day back. In that case the weather is a Markov chain and a
typical transition matrix might be
Sunny Cloudy
Sunny 0.4 0.6Cloudy 0.8 0.2
For example, if it is sunny today, there is a 60 per cent chance
it will be cloudy tommorrow. �
Let
pij(n) = P(Xm+n = jjXm = i) (24.4)
be the probability of of going from state i to state j in n steps.
Let Pn be the matrix whose (i; j) element is pij(n). These are
called the n-step transition probabilities.
Theorem 24.9 (The Chapman-Kolmogorov equations.) The
n-step probabilities satisfy
pij(m + n) =Xk
pik(m)pkj(n): (24.5)
480 24. Stochastic Processes
Proof. Recall that, in general,
P(X = x; Y = y) = P(X = x)P(Y = yjX = x):
This fact is true in the more general form
P(X = x; Y = yjZ = z) = P(X = xjZ = z)P(Y = yjX = x; Z = z):
Also, recall the law of total probability:
P(X = x) =Xy
P(X = x; Y = y):
Using these facts and the Markov property we have
pij(m + n) = P(Xm+n = jjX0 = i)
=Xk
P(Xm+n = j;Xm = kjX0 = i)
=Xk
P(Xm+n = jjXm = k;X0 = i)P(Xm = kjX0 = i)
=Xk
P(Xm+n = jjXm = k)P(Xm = kjX0 = i)
=Xk
pik(m)pkj(n): �
Look closely at equation (24.5). This is nothing more than the
equation for matrix multiplication. Hence we have shown that
Pm+n = PmPn: (24.6)
By de�nition, P1 = P. Using the above theorem, P2 = P1+1 =
P1P1 = PP = P2. Continuing this way, we see that
Pn = Pn � P�P� � � � �P| {z }
multiply the matrix n times
: (24.7)
24.2 Markov Chains 481
Let �n = (�n(1); : : : ; �n(N)) be a row vector where
�n(i) = P(Xn = i) (24.8)
is the marginal probability that the chain is in state i at time
n. In particular, �0 is called the initial distribution. To sim-
ulate a Markov chain, all you need to know is �0 and P. The
simulation would look like this:
Step 1: Draw X0 � �0. Thus, P(X0 = i) = �0(i).
Step 2: Suppose the outcome of step 1 is i. Draw X1 � P. In
other words, P(X1 = jjX0 = i) = pij.
Step 3: Suppose the outcome of step 2 is j. Draw X2 � P. In
other words, P(X2 = kjX1 = j) = pjk.
And so on.
It might be diÆcult to understand the meaning of �n. Imag-
ing simulating the chain many times. Collect all the outcomes at
time n from all the chains. This histogram would look approxi-
mately like �n. A consequence of theorem 24.9 is the following.
Lemma 24.10 The marginal probabilities are given by
�n = �0Pn:
Proof.
�n(j) = P(Xn = j)
=Xi
P(Xn = jjX0 = i)P (X0 = i)
=Xi
�0(i)pij(n) = �0Pn: �
482 24. Stochastic Processes
Summary
1. Transition matrix: P(i; j) = P(Xn+1 = jjXn = i).
2. n-step matrix: Pn(i; j) = P(Xn+m = jjXm = i).
3. Pn = Pn.
4. Marginal: �n(i) = P(Xn = i).
5. �n = �0Pn.
States. The states of a Markov chain can be classi�ed ac-
cording to various properties.
De�nition 24.11 We say that i reaches j (or j is ac-
cessible from i) if pij(n) > 0 for some n, and we write
i ! j. If i ! j and j ! i then we write i $ j and we
say that i and j communicate.
Theorem 24.12 The communication relation satis�es the fol-
lowing properties:
1. i$ i.
2. If i$ j then j $ i.
3. If i$ j and j $ k then i$ k.
4. The set of states X can be written as a disjoint union
of classes X = X1
SX2
S � � � where two states i and j
communicate with each other if and only if they are in the
same class.
24.2 Markov Chains 483
If all states communicate with each other then the chain is
called irreducible. A set of states is closed if, once you enter
that set of states you never leave. A closed set consisting of a
single state is called an absorbing state.
Example 24.13 Let X = f1; 2; 3; 4g and
P =
0BBBB@
13
23
0 023
13
0 014
14
14
14
0 0 0 1
1CCCCA
The classes are f1; 2g; f3g and f4g. State 4 is an absorbing state.
�
Suppose we start a chain in state i. Will the chain ever return
to state i? If so, that state is called persistent or recurrent.
De�nition 24.14 State i is recurrent or persistent if
P(Xn = i for some n � 1 j X0 = i) = 1:
Otherwise, state i is transient.
Theorem 24.15 A state i is recurrent if and only ifXn
pii(n) =1: (24.9)
A state i is transient if and only ifXn
pii(n) <1: (24.10)
Proof. De�ne
In =
�1 if Xn = i
0 if Xn 6= i:
484 24. Stochastic Processes
The number of times that the chain is in state i is Y =P1
n=0 In.
The mean of Y , given that the chain starts in state i, is
E (Y jX0 = i) =
1Xn=0
E (InjX0 = i)
=
1Xn=0
P(Xn = ijX0 = i)
=
1Xn=0
pii(n):
De�ne ai = P(Xn = i for some n � 1 j X0 = i). If i is recurrent,
ai = 1. Thus, the chain will eventually return to i. Once it
does return to i, we argue again that since ai = 1, the chain will
return to state i again. By repeating this argument, we conclude
that E (Y jX0 = i) =1. If i is transient, then ai < 1. When the
chain is in state i, there is a probability 1� ai > 0 that it will
never return to state i. Thus, the probability that the chain is
in state i exactly n times is an�1i (1 � ai). This is a geometric
distribution which has �nite mean. �
Theorem 24.16 Facts about recurrence.
1. If state i is recurrent and i$ j then j is recurrent.
2. If state i is transient and i$ j then j is transient.
3. A �nite Markov chain must have at least one recurrent
state.
4. The states of a �nite, irreducible Markov chain are all
recurrent.
Theorem 24.17 (Decomposition Theorem.) The state space
X can be written as the disjoint union
X = XT
[X1
[X2 � � �
24.2 Markov Chains 485
where XT are the transient states and each Xi is a closed, irre-
ducible set of recurrent states.
Convergence of Markov Chains. To discuss the convre-
gence of chains, we need a few more de�nitions. Suppose that
X0 = i. De�ne the recurrence time
Tij = minfn > 0 : Xn = jg (24.11)
assuming Xn ever returns to state i, otherwise de�ne Tij =1.
The mean recurrence time of a recurrent state i is
mi = E(Tii) =Xn
nfii(n) (24.12)
where
fij(n) = P (X1 6= j;X2 6= j; : : : ; Xn�1 6= j;Xn = jjX0 = i):
A recurrent state is null if mi =1 otherwise it is called non-
null or positive.
Lemma 24.18 If a state is null and recurrent, then pnii ! 0.
Lemma 24.19 In a �nite state Markov chain, all recurrent states
are positive.
Consider a three state chain with transition matrix24 0 1 0
0 0 1
1 0 0
35 :
Suppose we start the chain in state 1. Then we will be in state
3 at times 3, 6, 9, . . . . This is an example of a periodic chain.
Formally, the period of state i is d if pii(n) = 0 whenever n is
not divisible by d and d is the largest integer with this property.
Thus, d = gcdfn : pii(n) > 0g where gcd means \greater
common divisor." State i is periodic if d(i) > 1 and aperiodic
if d(i) = 1. A state with period 1 is called aperiodic.
486 24. Stochastic Processes
Lemma 24.20 If state i has period d and i$ j then j has period
d.
De�nition 24.21 A state is ergodic if it is recurrent, non-
null and aperiodic. A chain is ergodic if all its states are
ergodic.
Let � = (�i : i 2 X ) be a vector of non-negative numbers
that sum to one. Thus � can be thought of as a probability mass
function.
De�nition 24.22 We say that � is a stationary (or in-
variant) distribution if � = �P.
Here is the intuition. Draw X0 from distribution � and sup-
pose that � is a stationary distribution. Now draw X1 accord-
ing to the transition probability of the chain. The distribition
of X1 is then �1 = �0P = �P = �. The distribution of X2 is
�P2 = (�P)P = �P = �. Continuing this way, we see that the
distribution of Xn is �Pn = �. In other words:
If at any time the chain has distribution � then it will
continue to have distribution � forever.
24.2 Markov Chains 487
De�nition 24.23 We say that a chain has limiting dis-
tribution if
Pn !
26664�
�...
�
37775
for some �. In other words, �j = limn!1Pnij exists and
is independent of i.
Here is the main theorem.
Theorem 24.24 An irreducible, ergodic Markov chain has
a unique stationary distribution �. The limiting distrubu-
tion exists and is equal to �. If g is any bounded function,
then, with probability 1,
limN!1
1
N
NXn=1
g(Xn)! E �(g) �Xj
g(j)�j: (24.13)
The last statement of the theorem, is the law of large numbers
for Markov chains. It says that sample averages converge to their
expectations. Finally, there is a special condition which will be
useful later. We say that � satis�es detailed balance if
�ipij = pji�j: (24.14)
Detailed balance guarantees that � is a stationary distribution.
Theorem 24.25 If � satis�es detailed balance then � is a sta-
tionary distribution.
Proof. We need to show that �P = �. The jth element of
�P isP
i�ipij =
Pi�jpji = �j
Pipji = �j: �
488 24. Stochastic Processes
The importance of detailed balance will become clear when
we discuss Markov chain Monte Carlo methods.
Warning! Just because a chain has a stationary distribution
does not mean it converges.
Example 24.26 Let
P =
24 0 1 0
0 0 1
1 0 0
35 :
Let � = (1=3; 1=3; 1=3). Then �P = � so � is a stationary
distribution. If the chain is started with the distribution � it will
stay in that distribution. Imagine simulating many chains and
checking the marginal distribution at each time n. It will always
be the uniform distribution �. But this chain does not have a
limit. It continues to cycle around forever. �
Examples of Markov Chains
Example 24.27 (Random Walk) . Let X = f: : : ;�2;�1; 0; 1; 2; : : : ; gand suppose that pi;i+1 = p, pi;i�1 = q = 1 � p. All states com-
municate hence either all the states are recurrent or all are tran-
sient. To see which, suppose we start at X0 = 0. Note that
p00(2n) =
�2n
n
�pnqn (24.15)
since, the only way to get back to 0 is to have n heads (steps
to the right) and n tails (steps to the left). We can approximate
this expression using Stirling's formula which says that
n! � nnpne�n
p2�:
24.2 Markov Chains 489
Inserting this approximation into (24.15) shows that
p00(2n) � (4pq)npn�
:
It is easy to check thatP
np00(n) <1 if and only if
Pnp00(2n) <
1. Moreover,P
np00(2n) = 1 if and only if p = q = 1=2. By
Theorem (24.15), the chain is recurrent if p = 1=2 otherwise it
is transient. �
Example 24.28 Let X = f1; 2; 3; 4; 5; 6g. Let
P =
26666666664
12
12
0 0 0 014
34
0 0 0 014
14
14
14
0 014
0 14
14
0 14
0 0 0 0 12
12
0 0 0 0 12
12
37777777775
Then C1 = f1; 2g and C2 = f5; 6g are irreducible closed sets.
States 3 and 4 are transient because the of the path 3! 4! 6
and once you hit state 6 you cannot return to 3 or 4. Since
pii(1) > 0, all the states are aperiodic. In summary, 3 and 4 are
transient while 1,2,5 and 6 are ergodic. �
Example 24.29 (Hardy-Weinberg.) Here is a famous example from
genetics. Suppose a gene can be type A or type a. There are three
types of people (called genotypes): AA, Aa and aa. Let (p; q; r)
denote the fraction of people of each genotype. We assume that
everyone contributes one of their two copies of the gene at ran-
dom to their children. We also assume that mates are selected
at random. The latter is not realistic however, it is often rea-
sonable to assume that you do not choose your mate based on
whether they are AA, Aa or aa. (This would be false if the gene
was for eye color and if people chose mates based on eye color.)
Imagine if we pooled eveyone's genes together. The proportion
490 24. Stochastic Processes
of A genes is P = p + (q=2) and the proportion of a genes is
Q = r+(q=2). A child is AA with probability P 2, aA with prob-
ability 2PQ and aa with probability Q2. Thus, the fraction of A
genes in this generation is
P 2 + PQ =�p+
q
2
�2+�p+
q
2
��r +
q
2
�:
However, r = 1 � p � q. Substitute this in the above equation
and you get P 2 + PQ = P . A similar calculation shows that
fraction of a genes is Q. We have shown that the proportion of
type A and type a is P and Q and this remains stable after the
�rst generation. The proportion of people of type AA, Aa, aa is
thus (P 2; 2PQ;Q2) from the second generation and on. This is
called the Hardy-Weinberg law.
Assume everyone has exactly one child. Now consider a �xed
person and let Xn be the genotype of their nth descendent. This
is a Markov chain with state space X = fAA;Aa; aag. Some
basic calculations will show you that the transition matrix is24 P Q 0
P
2
P+Q
2
Q
2
0 P Q
35 :
The stationary distribution is � = (P 2; 2PQ;Q2). �
Inference for Markov Chains. Consider a chain with �nite
state space X = f1; 2; : : : ; Ng. Suppose we observe n observa-
tions X1; : : : ; Xn from this chain. The unknown parameters of a
Markov chain are the initial probabilities �0 = (�0(1); �0(2); : : : ; )
and the elements of the transition matrix P. Each row of P is
a multinomial distribution. So we are essentially estimating N
distributions (plus the initial probabilities). Let nij be the ob-
served number of transitions from state i to state j. The likeli-
hood function is
L(�0;P) = �0(x0)
nYr=1
pXr�1;Xr= �0(x0)
NYi=1
NYj=1
pnijij :
24.3 Poisson Processes 491
There is only one observation on �0 so we can't estimate that.
Rather, we focus on estimatingP. The mle is obtained by max-
imizing L(�0;P) subject to the constraint that the elements are
non-negative and the rows sum to 1. The solution is
bpij = nij
ni
where ni =PN
j=1 nij. Here we are assuming that ni > 0. If note,
then we set bpij = 0 by convention.
Theorem 24.30 (Consistency and Asymptotic Normality of the mle .)
Assume that the chain is ergodic. Let bpij(n) denote the mle after
n observations. Then bpij(n) P�! pij. Also,hpNi(n)(bpij � pij)
i N(0;�)
where the left hand side is a matrix, Ni(n) =Pn
r=1 I(Xr = i)
and
�ij;k` =
8<:
pij(1� pij) (i; j) = (k; `)
�pijpi` i = k; j 6= `
0 otherwise:
24.3 Poisson Processes
One of the most studied and useful stochastic processes is the
Poisson process. It arises when we count occurences of events
over time, for example, traÆc accidents, radioactive decay, ar-
rival of email messages etc. As the name suggests, the Poisson
process is intimately related to the Poisson distribution. Let's
�rst review the Poisson distribution.
Recall that X has a Poisson distribution with parameter � {
written X � Poisson(�) { if
P(X = x) � p(x; �) =e���x
x!; x = 0; 1; 2; : : :
492 24. Stochastic Processes
Also recall that E (X) = � and V(X) = �. If X � Poisson(�),
Y � Poisson(�) and X q Y , then X + Y � Poisson(� + �).
Finally, if N � Poisson(�) and Y jN = n � Binomial(n; p), then
the marginal distribution of Y is Y � Poisson(�p).
Now we describe the Poisson process. Imaging you are at your
computer. Each time a new email message arrives you record the
time. Let Xt be the number of messages you have received up
to and including time t. Then, fXt : t 2 [0;1)g is a stochastic
process with state space X = f0; 1; 2; : : :g. A process of this form
is called a counting process. A Poisson process is a counting
process that satis�es certain conditions. In what follows, we will
sometimes write X(t) instead of Xt. Also, we need the following
notation. Write f(h) = o(h) if f(h)=h ! 0 as h ! 0. This
means that f(h) is smaller than h when h is close to 0. For
example, h2 = o(h).
24.3 Poisson Processes 493
De�nition 24.31 A Poisson process is a stochastic pro-
cess fXt : t 2 [0;1)g with state space X = f0; 1; 2; : : :gsuch that
1. X(0) = 0.
2. For any 0 = t0 < t1 < t2 < � � � < tn, the increments
X(t1)�X(t0); X(t2)�X(t1); � � � ; X(tn)�X(tn�1)
are independent.
3. There is a function �(t) such that
P(X(t+ h)�X(t) = 1) = �(t)h+ o(h) (24.16)
P(X(t+ h)�X(t) � 2) = o(h): (24.17)
We call �(t) the intensity function.
The last condition means that the probability of an event in
[t; t + h] is approximately h�(t) while the probability of more
than one event is small.
Theorem 24.32 If Xt is a Poisson process with intensity func-
tion �(t), then
X(s+ t)�X(s) � Poisson(m(s+ t)�m(s))
where
m(t) =
Z t
0
�(s) ds:
In particular, X(t) � Poisson(m(t)). Hence, E (X(t)) = m(t)
and V(X(t)) = m(t).
494 24. Stochastic Processes
De�nition 24.33 A Poisson process with intensity func-
tion �(t) � � for some � > 0 is called a homogeneous
Poisson process with rate �. In this case,
X(t) � Poisson(�t):
Let X(t) be a homogeneous Poisson process with rate �. Let
Wn be the time at which the nth event occurs and set W0 = 0.
The random variables W0;W1; : : : ; are called waiting times.
Let Sn = Wn+1�Wn. Then S0; S1; : : : ; are called sojourn times
or interarraival times.
Theorem 24.34 The sojourn times S0; S1; : : : are iid random
variables. Their distribution is exponential with mean 1=�, that
is, they have density
f(s) = �e��s; s � 0:
The waiting time Wn � Gamma(n; 1=�) i.e. it has density
f(w) =1
�(n)�nwn�1e��t:
Hence, E (Wn) = n=� and V(Wn) = n=�2.
Proof. First, we have
P(S1 > t) = P(X(t) = 0) = e��t
with shows that the cdf for S1 is 1�e��t. This shows the resultfor S1. Now,
P(S2 > tjS1 = s) = P(no events in (s; s+ t]jS1 = s)
= P(no events in (s; s+ t]) (increments are independent)
= e��t:
24.3 Poisson Processes 495
Hence, S2 has an exponential distribution and is independent
of S1. The result follows by repeating the argument. The re-
sult for Wn follows since a sum of exponentials has a Gamma
distribution. �
Example 24.35 Figures 24.3 and 24.4 show requests to a WWW
server in Calgary.1 Assuming that this is a homogeneous Pois-
son process, N � X(T ) � Poisson(�T ). The likelihood is
L(�) / e��T (�T )N
which is maxmimized at
b� =N
T= 48:0077
in units per minute.
In the preceding example we might want to check if the as-
sumption that the data follow a Poisson process is reasonable.
To proceed, let us �rst note that if a Poisson process is observed
on [0; T ] then, given N = X(T ), the event times W1; : : : ;WN
are a sample from a Uniform(0; T ) distribution. To see intutively
why this is true, note that the probability of �nding an event in
a small interval [t; t + h] is proportional to h. But this doesn't
depend on t. Since the probability is constant over t it must be
uniform.
Figure 24.5 shows a histogram and a some density estimates.
The density does not appear to be uniform. To test this formally,
let is divide into 4 equal length intervals I1; I2; I3; I4. Let pi be
the probability of a point being in Ii. The null hypothesis is that
p1 = p2 = p3 = p4. We can test this hypothesis using either a
likelihood ratio test or a �2 test. The latter is
4Xi=1
(Oi � Ei)2
Ei
1See http://ita.ee.lbl.gov/html/contrib/Calgary-HTTP.html for more information.
496 24. Stochastic Processes
0 2 4 6
0.6
0.8
1.0
1.2
1.4
time
0 200 400 600 800 1000 1200
0.6
0.8
1.0
1.2
1.4
time
FIGURE 24.3. Hits on a web server. First plot is �rst
20 events. Second plot is several hours worth of data.
24.3 Poisson Processes 4975
10
15
20
time (seconds)
X(t
)
24/09 24/09 24/09 24/09 24/09 24/09
02
00
60
01
00
0
time (seconds)
X(t
)
24/09 24/09 25/09 25/09 25/09
FIGURE 24.4. X(t) for hits on a web server. First plot
is �rst 20 events. Second plot is several hours worth of
data.
498 24. Stochastic Processes
where Oi is the number of observations in Ii and Ei = n=4 is the
expected number under the null. This yields �2 = 252 which a
p-value of 0. Strong evidence againt the null.
24.4 Hidden Markov Models
24.5 Bibliographic Remarks
This is standard material and there are many good references.
In this Chapter, I followed the following references quite closely:
Probability and Random Processes by Grimmett and Stirzaker
(1982), An Introduction to Stochastic Modeling by Taylor and
Karlin (1998), Stochastic Modeling of Scienti�c Data by Guttorp
(1995) and Probability Models for Computer Science by Ross
(2002). Some of the exercises are from these texts.
24.6 Exercises
1. Let X0; X1; : : : be a Markov chain with states f0; 1; 2g andtransition matrix
P =
24 0:1 0:2 0:7
0:9 0:1 0:0
0:1 0:8 0:1
35
Assume that �0 = (0:3; 0:4; 0:3). Find P(X0 = 0; X1 =
1; X2 = 2) and P(X0 = 0; X1 = 1; X2 = 1).
2. Let Y1; Y2; : : : be a sequence of iid observations such that
P(Y = 0) = 0:1, P(Y = 1) = 0:3, P(Y = 2) = 0:2,
P(Y = 3) = 0:4. Let X0 = 0 and let
Xn = maxfY1; : : : ; Yng:
Show that X0; X1; : : : is a Markov chain and �nd the tran-
sition matrix.
24.6 Exercises 499
Histogram of w/60
w/60
Fre
quency
0 200 400 600 800 1000 1200
050
100
150
200
0 500 1000 15000.0
000
0.0
005
0.0
010
0.0
015
density(x = w/60)
Density
0 200 400 600 800 1000 1200
0.0
00
0.0
01
0.0
02
0.0
03
0.0
04
density(x = w/60, bw = "ucv")
Density
FIGURE 24.5. Density estimate of event times.
500 24. Stochastic Processes
3. Consider a two state Markov chain with states X = f1; 2gand transition matrix
P =
�1� a a
b 1� b
�where 0 < a < 1 and 0 < b < 1. Prove that
limn!1
Pn =
�b
a+ba
a+bb
a+ba
a+b
�:
4. Consider the chain from question 3 and set a = :1 and
b = :3. Simulate the chain. Let
bpn(1) =1
n
nXi=1
I(Xi = 1)
bpn(2) =1
n
nXi=1
I(Xi = 2)
be the proportion of times the chain is in state 1 and
state 2. Plot bpn(1) and bpn(2) versus n and verify that they
converge to the values predicted from the answer in the
previous question.
5. An important Markov chain is the branching process
which is used in biology, genetics, nuclear physics and
many other �elds. Suppose that an animal has Y chil-
dren. Let pk = P (Y = k). Hence, pk � 0 for all k andP1
k=0 pk = 1. Assume each animal has the same lifespan
and that they produce o�spring according to the distri-
bution pk. Let Xn be the number of animals in the nth
generation. Let Y(n)1 ; : : : ; Y
(n)
Xnbe the o�spring produced
in the nth generation. Note that
Xn+1 = Y(n)1 + � � �+ Y
(n)
Xn:
Let � = E (Y ) and �2 = V(Y ). Assume throughout this
question that X0 = 1. Let M(n) = E (Xn) and V (n) =
V(Xn).
24.6 Exercises 501
(a) Show thatM(n+1) = �M(n) and V (n+1) = �2M(n)+
�2V (n).
(b) Show that M(n) = �n and that V (n) = �2�n�1(1 +
�+ � � �+ �n�1).
(c) What happens to the variance if � > 1? What happens
to the variance if � = 1? What happens to the variance if
� < 1?
(d) The population goes extinct if Xn = 0 for some n. Let
us thus de�ne the extinction time N by
N = minfn : Xn = 0g:
Let F (n) = P (N � n) be the cdf of the random variable
N . Show that
F (n) =
1Xk=0
pk(F (n� 1))k; n = 1; 2; : : :
Hint: Note that the event fN � ng is the same as event
fXn = 0g. Thus, P(fN � ng) = P(fXn = 0g). Let k be
the number of o�spring of the original parent. The pop-
ulation becomes extinct at time n if and only if each of
the k sub-populations generated from the k o�spring goes
extinct in n� 1 generations.
(e) Suppose that p0 = 1=4, p1 = 1=2, p2 = 1=4. Use the
formula from (5d) to compute the cdf F (n).
502 24. Stochastic Processes
6. Let
P =
24 0:40 0:50 0:10
0:05 0:70 0:25
0:05 0:50 0:45
35
Find the stationary distribution �.
7. Show that if i is a recurrent state and i $ j, then j is a
recurrent state.
8. Let
P =
26666664
13
0 13
0 0 13
12
14
14
0 0 0
0 0 0 0 1 014
14
14
0 0 14
0 0 1 0 0 0
0 0 0 0 0 1
37777775
Which states are transient? Which states are recurrent?
9. Let
P =
�0 1
1 0
�Show that � = (1=2; 1=2) is a stationary distribution. Does
this chain converge? Why/why not?
10. Let 0 < p < 1 and q = 1� p. Let
P =
266664q p 0 0 0
q 0 p 0 0
q 0 0 p 0
q 0 0 0 p
1 0 0 0 0
377775
Find the limiting distribution of the chain.
11. Let X(t) be an inhomogeneous Poisson process with in-
tensity function �(t) > 0. Let �(t) =R t0�(u)du. De�ne
Y (s) = X(t) where s = �(t). Show that Y (s) is a homo-
geneous Poisson process with intensity � = 1.
24.6 Exercises 503
12. Let X(t) be a Poisson process with intensity �. Find the
conditional distribution of X(t) given that X(t+ s) = n.
13. Let X(t) be a Poisson process with intensity �. Find the
probability that X(t) is odd, i.e. P(X(t) = 1; 3; 5; : : :).
14. Suppose that people logging in to the University computer
system is described by a Poisson process X(t) with inten-
sity �. Assume that a person stays logged in for some
random time with cdf G. Assume these times are all inde-
pendent. Let Y (t) be the number of people on the system
at time t. Find the distribution of Y (t).
15. LetX(t) be a Poisson process with intensity �. LetW1;W2; : : : ;
be the waiting times. Let f be an arbitrary function. Show
that
E
0@X(t)X
i=1
f(Wi)
1A = �
Z t
0
f(w)dw:
16. A two dimensional Poisson point process is a process of
random points on the plane such that (i) for any set A,
the number of points falling in A is Poisson with mean
��(A) where �(A) is the area of A, (ii) the number of
events in nonoverlapping regions is independent. Consider
an arbitrary point x0 in the plane. Let X denote the dis-
tance from x0 to the nearest random point. Show that
P(X > t) = e���t2
and
E (X) =1
2p�:
This is page 505
Printer: Opaque this
25
Simulation Methods
In this we will see that by generating data in a clever way, we
can solve a number of problems such as integrating or maxi-
mizing a complicated function.1 For integration, we will study
three methods: (i) basic Monte Carlo integration, (ii) impor-
tance sampling and (iii) Markov chain Monte Carlo (MCMC).
25.1 Bayesian Inference Revisited
Simulation methods are especially useful in Bayesian inference
so let us brie y review the main ideas in Bayesian inference.
Given a prior f(�) and data Xn = (X1; : : : ; Xn) the posterior
density is
f(�jXn) =L(�)f(�)R L(u)f(u)du
1My main source for this chapter is Monte Carlo Statistical Methods by C. Robert
and G. Casella.
506 25. Simulation Methods
where L(�) is the likelihood function. The posterior mean is
� =
Z�f(�jXn)d� =
R�L(�)f(�)d�R L(�)f(�)d� :
If � = (�1; : : : ; �k) is multidimensional, then we might be inter-
ested in the posterior for one of the components, �1, say. This
marginal posterior density is
f(�1jXn) =
Z Z� � �Zf(�1; : : : ; �kjXn)d�2 � � �d�k
which involves high dimensional integration.
You can see that integrals play a big role in Bayesian infer-
ence. When � is high dimensional, it may not be feasible to
calculate these integrals analytically. Simulation methods will
often be very helpful.
25.2 Basic Monte Carlo Integration
Suppose you want to evaluate the integral I =Rb
ah(x)dx for
some function h. If h is an \easy" function like a polynomial
or triginometric function then we can do the integral in closed
form. In practice, h can be very complicated and there may be no
known closed form expression for I. There are many numerical
techniques for evaluating I such as Simpson's rule, the trape-
zoidal rule, Guassian quadrature and so on. In some cases these
techniques work very well. But other times they may not work
so well. In particular, it is hard to extend them to higher dimen-
sions. Monte Carlo integration is another approach to evaluating
I which is notable for its simplicity, generality and scalability.
Let us begin by writing
I =
Zb
a
h(x)dx =
Zb
a
h(x)(b� a)1
b� adx =
Zb
a
w(x)f(x)dx
25.2 Basic Monte Carlo Integration 507
where w(x) = h(x)(b � a) and f(x) = 1=(b � a). Notice that f
is the density for a uniform random variable over (a; b). Hence,
I = Ef(w(X))
where X � Unif(a; b).
Suppose we generate X1; : : : ; XN � Unif(a; b) where N is
large. By the law of large numbers
bI � 1
N
NXi=1
w(Xi)p! E(w(X)) = I:
This is the basic Monte Carlo integration method. We can
also compute the standard error of the estimate
bse = spN
where
s2 =
Pi(Yi � bI)2N � 1
where Yi = w(Xi). A 1�� con�dence interval for I is bI�z�=2 bse.We can take N as large as we want and hence make the length
of the con�dence inteval very small.
Example 25.1 Let's try this on an example where we know the
true answer. Let h(x) = x3. Hence, I =R 10x3dx = 1=4. Based
on N = 10; 000 I got bI = :2481101 with a standard error of
.0028. Not bad at all.
A simple generalization of the basic method is to consider
integrals of the form
I =
Zh(x)f(x)dx
where f(x) is a probability density function. Taking f to be a
Unif (a,b) gives us the special case above. The only di�erence is
508 25. Simulation Methods
that now we draw X1; : : : ; XN � f and take
bI � 1
N
NXi=1
h(Xi)
as before.
Example 25.2 Let
f(x) =1p2�e�x
2=2
be the standard Normal pdf. Suppose we want to compute the cdf
at some point x:
I =
Zx
�1
f(s)ds = �(x):
Of course, we could look this up in a table or use the pnorm
command in R. But suppose you don't have access to either.
Write
I =
Zh(s)f(s)ds
where
h(s) =
�1 s < x
0 s � x:
Now we generate X1; : : : ; XN � N(0; 1) and set
bI = 1
N
Xi
h(Xi) =number of observations � x
N:
For example, with x = 2, the true answer is �(2) = :9772 and
the Monte Carlo estimate with N = 10; 000 yields .9751. Using
N = 100; 000 I get .9771.
Example 25.3 (Two Binomials) Let X � Binomial(n; p1) and Y �Binomial(m; p2). We would like to estimate Æ = p2�p1. The mle
is bÆ = bp2� bp1 = (Y=m)� (X=n). We can get the standard errorbse using the delta method (remember?) which yields
bse =rbp1(1� bp1)
n+bp2(1� bp2)
m
25.2 Basic Monte Carlo Integration 509
and then construct a 95 per cent con�dence interval bÆ � 2 bse.Now consider a Bayesian analysis. Suppose we use the prior
f(p1; p2) = f(p1)f(p2) = 1, that is, a at prior on (p1; p2). The
posterior is
f(p1; p2jX; Y ) / pX1 (1� p1)n�X pY2 (1� p2)
m�Y :
The posterior mean of Æ is
Æ =
Z 1
0
Z 1
0
Æ(p1; p2)f(p1; p2jX; Y ) =Z 1
0
Z 1
0
(p2�p1)f(p1; p2jX; Y ):
If we want the posterior density of Æ we can �rst get the posterior
cdf
F (cjX; Y ) = P (Æ � cjX; Y ) =ZA
f(p1; p2jX; Y )
where A = f(p1; p2) : p2 � p1 � cg. The density can then be
obtained by di�erentiating F .
To avoid all these integrals, let's use simulation. Note that
f(p1; p2jX; Y ) = f(p1jX)f(p2jY ) which implies that p1 and p2are independent under the posterior distribution. Also, we see
that p1jX � Beta(X+1; n�X+1) and p2jY � Beta(Y +1; m�Y + 1). Hence, we can simulate (P
(1)1 ; P
(1)2 ); : : : ; (P
(N)1 ; P
(N)2 )
from the posterior by drawing
P(i)1 � Beta(X + 1; n�X + 1)
P(i)2 � Beta(Y + 1; m� Y + 1)
for i = 1; : : : ; N . Now let Æ(i) = P(i)2 � P
(i)1 . Then,
Æ � 1
N
Xi
Æ(i):
We can also get a 95 per cent posterior interval for Æ by sorting
the simulated values, and �nding the :025 and :975 quantile. The
posterior density f(ÆjX; Y ) can be obtained by applying density
510 25. Simulation Methods
estimation techniques to Æ(1); : : : ; Æ(N) or, simply by plotting a
histogram.
For example, suppose that n = m = 10, X = 8 and Y = 6.
The R code is:
x <- 8; y <- 6
n <- 10; m <- 10
B <- 10000
p1 <- rbeta(B,x+1,n-x+1)
p2 <- rbeta(B,y+1,m-y+1)
delta <- p2 - p1
print(mean(delta))
left <- quantile(delta,.025)
right <- quantile(delta,.975)
print(c(left,right))
I found that Æ and a 95 per cent posterior interval of (-0.52,0.20).
The posterior density can be estimated from a histogram of the
simulated values as shown in Figure 25.1.
Example 25.4 (Dose Response) Suppose we conduct an experiment
by giving rats one of ten possible doses of a drug, denoted by
x1 < x2 < : : : < x10. For each dose level xi we use n rats and
we observe Yi, the number that survive. Thus we have ten inde-
pendent binomials Yi � Binomial(n; pi). Suppose we know from
biological considerations that higher doses should have higher
probability of death. Thus, p1 � p2 � � � � � p10. Suppose we
want to estimate the dose at which the animals have a 50 per
cent chance of dying. This is called the LD50. Formally, Æ = xjwhere
j = minfi : pi � :50g:
Notice that Æ is implicitly a (complicated) function of p1; : : : ; p10so we can write Æ = g(p1; : : : ; p10) for some g. This just means
that if we know (p1; : : : ; p10) then we can �nd Æ. The posterior
25.2 Basic Monte Carlo Integration 511
delta = p2 − p1
f(d
elta
| d
ata
)
−0.8 −0.6 −0.4 −0.2 0.0 0.2 0.4 0.6
0.0
0.5
1.0
1.5
2.0
FIGURE 25.1. Posterior of Æ from simulation.
512 25. Simulation Methods
mean of Æ isZ Z� � �ZA
g(p1; : : : ; p10)f(p1; : : : ; p10jY1; : : : ; Y10)dp1dp2 : : : dp10:
The integral is over the region
A = f(p1; : : : ; p10) : p1 � � � � � p10g:
Similarly, the posterior cdf of Æ is
F (cjY1; : : : ; Y10) = P (Æ � cjY1; : : : ; Y10)=
Z Z� � �ZB
f(p1; : : : ; p10jY1; : : : ; Y10)dp1dp2 : : : dp10
where
B = A\f(p1; : : : ; p10) : g(p1; : : : ; p10) � cg:
We would need to do a 10 dimensional integral over a resticted
region A. Instead, let's use simulation.
Let us take a at prior truncated over A. Except for the trun-
cation, each Pi has once again a Beta distribution. To draw from
the posterior we do the following steps:
(1) Draw Pi � Beta(Yi + 1; n� Yi + 1); i = 1; : : : ; 10.
(2) If P1 � P2 � � � � � P10 keep this draw. Otherwise, throw
it away and draw again until you get one you can keep.
(3) Let Æ = xj where
j = minfi : Pi > :50g:
We repeat this N times to get Æ(1); : : : ; Æ(N) and take
E(ÆjY1; : : : ; Y10) � 1
N
Xi
Æ(i):
25.2 Basic Monte Carlo Integration 513
Æ is a discrete variable. We can estimate its probability mass
function by
P (Æ = xjjY1; : : : ; Y10) � 1
N
NXi=1
I(Æ(i) = j):
For example, suppose that x = (1; 2; : : : ; 10). Figure 25.2 shows
some data with n = 15 animals at each dose. The R code is a
bit tricky.
## assume x and y are given
n <- rep(15,10)
B <- 10000
p <- matrix(0,B,10)
check <- rep(0,B)
for(i in 1:10){
p[,i] <- rbeta(B, y[i]+1, n[i]-y[i]+1)
}
for(i in 1:B){
check[i] <- min(diff(p[i,]))
}
good <- (1:B)[check >= 0]
p <- p[good,]
B <- nrow(p)
delta <- rep(0,B)
for(i in 1:B){
temp <- cumsum(p[i,])
j <- (1:10)[temp > .5]
j <- min(j)
delta[i] <- x[j]
}
print(mean(delta))
left <- quantile(delta,.025)
right <- quantile(delta,.975)
print(c(left,right))
514 25. Simulation Methods
2 4 6 8 10
0.0
0.2
0.4
0.6
0.8
1.0
dose
pro
port
ion w
ho d
ied
11 1
1
1
1
1 11 1
2 4 6 8 10
0.0
0.2
0.4
0.6
0.8
1.0
xt(
p)
2 2 22
2
22
2 2 2
33
3 33
3
3 33 3
4
44
4
4
4 44
4 4
5 5
5 5
5
5
5 5 5 5
66 6
6
6
6
66 6 6
77
7
7
7
7
7 7 77
8
88
8 8
8
8 8 88
9
9 9
9 9
9
99 9 9
0 00
0 0
00 0
00
a a a
a
a
a
a a a a
b bb
b
b
b
b b b b
c c c
c c
cc
cc c
d d
d
dd
d
d d d d
ee e
ee
e e
ee e
f ff
f
f
f
f f
f f
g
g g
g
g
g gg g
g
h h
h
h
h
h
h h h h
i
i ii
i
i
i i i i
jj
j jj
j
j j jj
kk
k
k
k
kk
k kk
l ll
ll
l
l l ll
m mm
m
m
m
m
m mm
nn
n
nn
n
n
n n n
o oo
oo
o
o o o o
p p pp
p
pp p p
p
q q
q
q
qq q q
r
r r
r r
r
r r r r
s ss
ss
s
ss
s s
t t
t
t
t
tt
t t t
u u
u
u
u
uu u u
u
v
vv
v
v
vv v
v v
w w w
w w
w
w w w w
xx
x x
x
xx
x x x
y
y yy y
y
y y yy
z z
z z z
zz
z z z
N
NN
NN
N
NN N N
N
N
N
N N
N
N N N N
N
NN
N
N
NN
N N N
N N
NN N
N
N N N N
N
N
N N
N
N
N N NN
N
NN
N
N
NN N N N
NN N
N
N
N
NN N N
N
N N
N N
NN N N N
N N
N
N N
NN N N N
N
N N
N
N
NN
N NN
N
N NN
N
N
N N N N
NN
N
N
N
N
N NN N
N N
N
N
N N
N N N N
N
N N
N N
N
N NN N
0.0
0.2
0.4
0.6
0.8
LD50
f(delta | d
ata
)
3 4 5
FIGURE 25.2. Dose repoonse data.
25.3 Importance Sampling 515
The posterior draws for p1; : : : ; p10 are shown in the second
panel in the �gure. We �nd that that Æ = 4:04 with a 95 per
cent interval of (3,5). The third panel shows the posterior for Æ
which is necessarily a discrete distribution.
25.3 Importance Sampling
Consider the integral I =Rh(x)f(x)dx where f is a probability
density. The basic Monte Carlo method involves sampling from
f . However, there are cases where we may not know how to
sample from f . For example, in Bayesian inference, the posterior
density density is is obtained by multiplying the likelihood L(�)times the prior f(�). There is no guarantee that f(�jx) will bea known distribution like a Normal or Gamma or whatever.
Importance sampling is a generalization of basic Monte Carlo
which overcomes this problem. Let g be a probability density
that we know how to simulate from. Then
I =
Zh(x)f(x)dx =
Zh(x)f(x)
g(x)g(x)dx = Eg(Y )
where Y = h(X)f(X)=g(X) and the expectation Eg(Y ) is with
respect to g. We can simulate X1; : : : ; XN � g and estimate I
by
bI = 1
N
Xi
Yi =1
N
Xi
h(Xi)f(Xi)
g(Xi):
This is called importance sampling. By the law of large num-
bers, bI p! I. However, there is a catch. It's possible that bI might
have an in�nite standard error. To see why, recall that I is the
mean of w(x) = h(x)f(x)=g(x). The second moment of this
quantity is
Eg(w2(X)) =
Z �h(x)f(x)
g(x)
�2
g(x)dx =
Zh2(x)f 2(x)
g(x)dx:
516 25. Simulation Methods
If g has thinner tails than f , then this integral might be in�nite.
To avoid this, a basic rule in importance sampling is to sample
from a density g with thicker tails than f . Also, suppose that
g(x) is small over some set A where f(x) is large. Again, the
ratio of f=g could be large leading to a large variance. This
implies that we should choose g to be similar in shape to f . In
summary, a good choice for an importance sampling density g
should be similar to f but with thicker tails. In fact, we can say
what the optimal choice of g is.
Theorem 25.5 The choice of g that minimizes the variance of bIis
g�(x) =jh(x)jf(x)R jh(s)jf(s)ds:
PROOF. The variance of w = fh=g is
Eg(w2)� (E(w2))2 =
Zw2(x)g(x)dx�
�Zw(x)g(x)dx
�2
=
Zh2(x)f 2(x)
g2(x)g(x)dx�
�Zh(x)f(x)
g(x)g(x)dx
�2
=
Zh2(x)f 2(x)
g2(x)g(x)dx�
�Zh(x)f(x)dx
�2
:
The second integral does not depend on g so we only need to
minimize the �rst integral. Now, from Jensen's inequality we
have
Eg(W2) � (Eg(jW j))2 =
�Zjh(x)jf(x)dx
�2
:
This establishes a lower bound on Eg(W2). However, Eg�(W
2)
equals this lower bound which proves the claim. �
This theorem is interesting but it is only of theoretical inter-
est. If we did not know how to sample from f then it is unlikely
25.3 Importance Sampling 517
that we could sample from jh(x)jf(x)= R jh(s)jf(s)ds. In prac-
tice, we simply try to �nd a thick tailed distribution g which is
similar to f jhj.Example 25.6 (Tail Probability) Let's estimate I = P (Z > 3) =
:0013 where Z � N(0; 1). Write I =Rh(x)f(x)dx where f(x) is
the standard Normal density and h(x) = 1 if x > 3 and 0 other-
wise. The basic Monte Carlo estimator is bI = N�1P
ih(Xi).
Using N = 100 we �nd (from simulating many times) that
E(bI) = :0015 and V ar(bI) = :0039. Notice that most obser-
vations are wasted in the sense that most are not near the right
tail. We will estimate this with importance sampling taking g
to be a Normal(4,1) density. We draw values from g and the
estimate is now bI = N�1P
if(Xi)h(Xi)=g(Xi). In this case we
�nd that E(bI) = :0011 and V ar(bI) = :0002: We have reduced
the standard deviation by a factor of 20.
Example 25.7 (Measurement Model With Outliers) Suppose we have
measurements X1; : : : ; Xn of some physical quantity �. We might
model this as
Xi = � + �i:
If we assume that �i � N(0; 1) then Xi � N(�i; 1). However,
when taking measurements, it is often the case that we get the
occasional wild obvervation, or outlier. This suggests that a Nor-
mal might be a poor model since Normals have thin tails which
implies that extreme observations are rare. One way to improve
the model is to use a density for �i with a thicker tail, for ex-
ample, a t-distribution with � degrees of freedom which has the
form
t(x) =���+12
����
2
� 1
��
�1 +
x2
�
��(�+1)=2
:
Smaller values of � correspond to thicker tails. For the sake of
illustration we will take � = 3. Suppose we observe n Xi = �+�i,
i = 1; : : : ; n where �i has a t distribution with � = 3. We will
518 25. Simulation Methods
take a at prior on �. The likelihood is L(�) = Qn
i=1 t(Xi � �)
and the posterior mean of � is
� =
R�L(�)d�R L(�)d� :
We can estimate the top and bottom integral using importance
sampling. We draw �1; : : : ; �N � g and then
� �1N
PN
j=1
�jL(�j)
g(�j)
1N
PN
j=1
L(�j)
g(�j)
:
To illustrate the idea, I drew n = 2 observations. The posterior
mean (ccomputed numerically) is -0.54. Using a Normal impor-
tance sampler g yields an estimate of -0.74. Using a Cauchy (t-
distribution with 1 degree of freedom) importance sampler yields
an estimate of -0.53.
25.4 MCMC Part I: The Metropolis-Hastings
Algorithm
Consider again the problem of estimating the integral I =Rh(x)f(x)dx.
In this chapter we introduce Markov chain Monte Carlo (MCMC)
methods. The idea is to constuct a Markov chain X1; X2; : : : ;
whose stationary distribution is f . Under certain conditions it
will then follow that
1
N
NXi=1
h(Xi)p! Ef(h(X)) = I:
This works because there is a law of large numbers for Markov
chains called the ergodic theorem.
The Metropolis-Hastings algorithm is a speci�c MCMC
method that works as follows. Let q(yjx) be an arbitrary, friendlydistribution (i.e. we know how to sample from q(yjx)). The con-ditional density q(yjx) is called the proposal distribution.
25.4 MCMC Part I: The Metropolis-Hastings Algorithm 519
The Metropolis-Hastings algorithm creates a sequence of obser-
vations X0; X1; : : : ; as follows.
Metropolis-Hastings Algorithm. Choose X0 arbitrarily. Suppose
we have generated X0; X1; : : : ; Xi. To generate Xi+1 do the following:
(1) Generate a proposal or candidate value Y � q(yjXi).
(2) Evaluate r � r(Xi; Y ) where
r(x; y) = min
�f(y)
f(x)
q(xjy)q(yjx) ; 1
�:
(3) Set
Xi+1 =
�Y with probability r
Xi with probability 1� r:
REMARK 1: A simple way to execute step (3) is to generate
U � (0; 1). If U < r set Xi+1 = Y otherwise set Xi+1 = Xi.
REMARK 2: A common choice for q(yjx) is N(x; b2) for some
b > 0. This means that the proposal is draw from a Normal,
centered at the current value.
REMARK 3: If the proposal density q is symmetric, q(yjx) =q(xjy), then r simpli�es to
r = min
�f(Y )
f(Xi); 1
�:
The Normal proposal distribution mentioned in Remark 2 is an
example of a symmetric proposal density.
520 25. Simulation Methods
By construction, X0; X1; : : : is a Markov chain. But why does
this Markov chain have f as its stationary distribution? Before
we explain why, let us �rst do an example.
Example 25.8 The Cauchy distribution has density
f(x) =1
�
1
1 + x2:
Our goal is to simulate a Markov chain whose stationary distri-
bution is f . As suggested in the remark above, we take q(yjx) tobe a N(x; b2). So in this case,
r(x; y) = min
�f(y)
f(x); 1
�= min
�1 + x2
1 + y2; 1
�:
So the algorithm is to draw Y � N(Xi; b2) and set
Xi+1 =
�Y with probability r(Xi; Y )
Xi with probability 1� r(Xi; Y ):
The R code is very simple:
metrop <- function(N,b){
x <- rep(0,N)
for(i in 2:N){
y <- rnorm(1,x[i-1],b)
r <- (1+x^2)/(1+y^2)
u <- runif(1)
if(u < r)x[i] <- y else x[i] <- x[i-1]
}
return(x)
}
The simulator requires a choice of b. Figure 25.3 shows three
chains of length N = 1000 using b = :1, b = 1 and b = 10. The
histograms of the simulated values are shown in Figure 25.4. Set-
ting b = :1 forces the chain to take small steps. As a result, the
chain doesn't \explore" much of the sample space. The histogram
25.4 MCMC Part I: The Metropolis-Hastings Algorithm 521
from the sample does not approximate the true density very well.
Setting b = 10 causes the proposals to often be far in the tails,
making r small and hence we reject the proposal and keep the
chain at its current position. The result is that the chain \gets
stuck" at the same place quite often. Again, this means that the
histogram from the sample does not approximate the true density
very well. The middle choice avoids these extremes and results in
a Markov chain sample that better represents the density sooner.
In summary, there are are tuning parameters and the eÆciency
of the chain depends on these parameters. We'll discus this in
more detail later.
If the sample from the Markov chain starts to \look like"
the target distribution f quickly, then we say that the chain is
\mixing well." Constructing a chain that mixes well is somewhat
of an art.
Why It Works. Recall from the previous chapter that a
distribution � satis�es detailed balance for a Markov chain if
pij�i = pji�j:
We then showed that if � satis�es detailed balance, then it is a
stationary distribution for the chain.
Because we are now dealing with continuous state Markov
chains, we will change notation a little and write p(x; y) for the
probability of making a transition from x to y. Also, let's use
f(x) instead of � for a distribution. In this new notation, f is
a stationary distribution if f(x) =Rf(y)p(y; x)dy and detailed
balance holds for f if
f(x)p(x; y) = f(y)p(y; x):
Detailed balance implies that f is a stationary distribution since,
if detailed balance holds, thenZf(y)p(y; x)dy =
Zf(x)p(x; y)dy = f(x)
Zp(x; y)dy = f(x)
522 25. Simulation Methods
0 200 400 600 800 1000
−1
.00
.01
.02
.0
0 200 400 600 800 1000
−2
02
4
0 200 400 600 800 1000
−8
−4
02
4
FIGURE 25.3. Three Metropolis chains corresponding
to b = :1, b = 1, b = 10.
25.4 MCMC Part I: The Metropolis-Hastings Algorithm 523
−5 0 5
0.0
0.1
0.2
0.3
0.4
0.5
−5 0 5
0.0
0.1
0.2
0.3
0.4
0.5
−5 0 5
0.0
0.1
0.2
0.3
0.4
0.5
FIGURE 25.4. Histograms from the three chains. The
true density is also plotted.
524 25. Simulation Methods
which shows that f(x) =Rf(y)p(y; x)dy as required. Our goal
is to show that f satis�es detailed balance which will imply that
f is a stationary distribution for the chain.
Consider two points x and y. Either
f(x)q(yjx) < f(y)q(xjy) or f(x)q(yjx) > f(y)q(xjy):
We will ignore ties which we can do in the continuous setting.
Without loss of generality, assume that f(x)q(yjx) > f(y)q(xjy).This implies that
r(x; y) =f(y)
f(x)
q(xjy)q(yjx)
and that r(y; x) = 1. Now p(x; y) is the probability of jumping
from x to y. This requires two things: (i) the proposal distribu-
tion must generate y and (ii) you must accept y. Thus,
p(x; y) = q(yjx)r(x; y) = q(yjx)f(y)f(x)
q(xjy)q(yjx) =
f(y)
f(x)q(xjy):
Therefore,
f(x)p(x; y) = f(y)q(xjy): (25.1)
On the other hand, p(y; x) is the probability of jumping from
y to x. This requires two things: (i) the proposal distribution
must generate x and (ii) you must accept x. This occurs with
probability p(y; x) = q(xjy)r(y; x) = q(xjy). Hence,
f(y)p(y; x) = f(y)q(xjy): (25.2)
Comparing (25.1) and (25.2), we see that we have shown that
detailed balance holds.
25.5 MCMC Part II: Di�erent Flavors
There are di�erent types of MCMC algorithm. Here we will
consider a few of the most popular versions.
25.5 MCMC Part II: Di�erent Flavors 525
Random-walk-Metropolis-Hastings. In the previous sec-
tion we considered drawing a proposal Y of the form
Y = Xi + �i
where �i comes from some distribution with density g. In other
words, q(yjx) = g(y � x). We saw that in this case,
r(x; y) = min
�1;f(y)
f(x)
�:
This is called a random-walk-Metropolis-Hastingsmethod.
The reason for the name is that, if we did not do the accept-
reject step, we would be simulating a random walk. The most
common choice for g is a N(0; b2). The hard part is choosng b
so that the chain mixes well. A good rule of thumb is: choose b
so that you accept the proposals about 50 per cent of the time.
Warning: This method doesn't make sense unless X takes
values on the whole real line. If X is restricted to some interval
then it is best to transformX, say, Y = m(X) say, where Y takes
values on the whole real line. For example, if X 2 (0;1) then
you might take Y = logX and then simulate the distribution
for Y instead of X.
Independence-Metropolis-Hastings.This is like an importance-
sampling version of MCMC. In this we draw the proposal from
a �xed distribution g. Generally, g is chosen to be an approxi-
mation to f . The acceptance probability becomes
r(x; y) = min
�1;f(y)
f(x)
g(x)
g(y)
�:
Gibbs Sampling. The two previous methods can be easily
adapted, in principle, to work in higher dimensions. In practice,
526 25. Simulation Methods
tuning the chains to make them mix well is hard. Gibbs sampling
is a way to turn a high-dimensional problem into several one
dimensional problems.
Here's how it works for a bivariate problem. Suppose that
(X; Y ) has density fX;Y (x; y). First, suppose that it is possi-
ble to simulate from the conditional distributions fXjY (xjy) andfY jX(yjx). Let (X0; Y0) be starting values. Asume we have drawn
(X0; Y0); : : : ; (Xn; Yn). Then the Gibbs sampling algorithm for
getting (Xn+1; Yn+1) is:
The n+ 1 step in Gibbs sampling:
Xn+1 � fXjY (xjYn)Yn+1 � fY jX(yjXn+1)
This generalizes in the obvious way to higher dimensions.
Example 25.9 (Normal Hierarchical Model) Gibbs sampling is very
useful for a class of models called hierarchical models. Here
is a simple case. Suppose we draw a sample of k cities. From
each city we draw ni people and observe how many people Yihave a disease. Thus, Yi � Binomial(ni; pi). We are allowing
for di�erent disease rates in di�erent cities. We can also think
of the p0is as random draws from some distribution F . We can
write this model in the following way:
Pi � F
YijPi = pi � Binomial(ni; pi):
We are interested in estimating the p0is and the overall disease
rateRpdF (p).
To proceed, it will simplify matters if we make some transfor-
mations that allow us to use some Normal approxmations. Let
25.5 MCMC Part II: Di�erent Flavors 527
bpi = Yi=ni. Recall that bpi � N(pi; si) where si =pbpi(1� bpi)=ni.
Let i = log(pi=(1� pi)) and de�ne Zi � b i = log(bpi=(1� bpi)).By the delta method,
b i � N( i; �2i)
where �2i= 1=(nbpi(1 � bpi)). Experience shows that the Normal
approximation for is more accurate than the Normal approx-
imation for p so we shall work with . We shall treat �i as
known. Furthermore, we shall take the distribution of the 0is to
be Normal. The hierarchical model is now
i � N(�; � 2)
Zij i � N( i; �2i):
As yet another simpli�cation we take � = 1. The unknown pa-
rameter are � = (�; 1; : : : ; k). The likelihood function is
L(�) /Yi
f( ij�)Yi
f(Zij )
/Yi
exp
��1
2( i � �)2
�exp
�� 1
2�2i
(Zi � i)2
�:
If we use the prior f(�) / 1 then the posterior is proportional
to the likelihood. To use Gibbs sampling, we need to �nd the
conditional distribution of each parameter conditional on all the
others. Let us begin by �nding f(�jrest) where \rest" refers to
all the other variables. We can throw away any terms that don't
involve �. Thus,
f(�jrest) /Yi
exp
��1
2( i � �)2
�
/ exp
��k2(�� b)2
�where
b =1
k
Xi
i:
528 25. Simulation Methods
Hence we see that �jrest � N(b; 1=k). Next we will �nd f( jrest).Again, we can throw away any terms not involving i leaving us
with
f( ijrest) / exp
��1
2( i � �)2
�exp
�� 1
2�2i
(Zi � i)2
�
/ exp
�� 1
2d2i
( i � ei)2
�
where
ei =
Zi
�2i+ �
1 + 1�2i
and d2i=
1
1 + 1�2i
and so ijrest � N(ei; d2i). The Gibbs sampling algorithm then
involves iterating the following steps N times:
draw � � N(b; v2)
draw 1 � N(e1; d21)
......
draw k � N(ek; d2k):
It is understood that at each step, the most recently drawn ver-
sion of each variable is used.
Let's now consider a numerical example. Suppose that there
are k = 20 cities and we sample n = 20 people from each city.
The R function for this problem is:
gibbs.fun <- function(y,n,N){
k <- length(y)
p.hat <- y/n
Z <- log(p.hat/(1-p.hat))
sigma <- sqrt(1/(n*p.hat*(1-p.hat)))
v <- sqrt(1/sum(1/sigma^2))
mu <- rep(0,N)
psi <- matrix(0,N,k)
for(i in 2:N){
25.5 MCMC Part II: Di�erent Flavors 529
### draw mu given rest
b <- v^2*sum(psi[i-1,]/sigma^2)
mu[i] <- rnorm(1,b,v)
### draw psi given rest
e <- (Z + mu[i]/sigma^2)/(1 + (1/sigma^2))
d <- sqrt(1/(1 + (1/sigma^2)))
psi[i,] <- rnorm(k,e,d)
}
list(mu=mu,psi=psi)
}
After running the chain, we can convert each i back into piby way of pi = e i=(1 + e i). The raw proportions are shown in
Figure 25.6. Figure 25.5 shows \trace plots" of the Markov chain
for p1 and �. (We should also look at the plots for p2; : : : ; p20.)
Figure 25.6 shows the posterior for � based on the simulated
values. The second panel of Figure 25.6 shows the raw propor-
tions and the Bayes estimates. Note that the Bayes estimates
are \shrunk" together. The parameter � controls the amount of
shrinkage. We set � = 1 but in practice, we should treat � as an-
other unknown parameter and let the data determine how much
shrinkage is needed.
So far we assumed that we know how to draw samples from
the conditionals fXjY (xjy) and fY jX(yjx). If we don't know how,
we can still use the Gibbs sampling algorithm by drawing each
observation using a Metropolis-Hastings step. Let q be a pro-
posal distribution for x and let eq be a proposal distribution for
y. When we do a Metropolis step for X we treat Y as �xed.
Similarly, when we do a Metropolis step for Y we treat X as
�xed. Here are the steps:
530 25. Simulation Methods
0 200 400 600 800 1000
0.4
0.6
0.8
Simulated values of p1
p1
0 200 400 600 800 1000
−0
.6−
0.2
0.2
Simulated values of mu
mu
FIGURE 25.5. Simulated values of p1 and �.
25.5 MCMC Part II: Di�erent Flavors 531
mu
−0.6 −0.4 −0.2 0.0 0.2 0.4
05
01
00
15
02
00
0.0 0.2 0.4 0.6 0.8 1.0
0.0
1.0
2.0
3.0
raw estimates and Bayes estimates
FIGURE 25.6. Top panel: posterior histogram of �.
Lower panel: raw proptions and the Bayes posterior es-
timates.
532 25. Simulation Methods
Metropolis within Gibbs:
(1a) Draw a proposal Z � q(zjXn).
(1b) Evaluate
r = min
�f(Z; Yn)
f(Xn; Yn)
q(XnjZ)q(ZjXn)
; 1
�:
(1c) Set
Xn+1 =
�Z with probability r
Xn with probability 1� r:
(2a) Draw a proposal Z � eq(zjYn).
(2b) Evaluate
r = min
�f(Xn+1; Z)
f(Xn+1; Yn)
eq(YnjZ)eq(ZjYn) ; 1�:
(2c) Set
Yn+1 =
�Z with probability r
Yn with probability 1� r:
Again, this generalizes to more than two dimensions.
� � ��� ����
� � � ��� � ����� � � � ������� � � � ���� � �
��� �"!$#%�'&)(+*,(+�"-.#/� 0213*5476'0/(+8:9<;=� �>6?(@#%A3(+�"-.#CBD(+A$� 9E6?(+#%�3FG47�3(H4 &I-.J'*5(+(K-L!M1N(O9+P(+9Q-.#/8R�S-5#/47�UTV;+47�VBDA3(+�';+(K95(O-59W4 *XJY!M1?4 -.J'(+95#%9<-5(+9Q-.#/�'F3Z
[]\?^`_NaWbMcOad^fe�g'ad^)\?_G*5(h&)(+*59i-.4]13*,4kjV#%A3#/�'F]�]95#%�3F70%(ml,6?(+9Q-IF7n3(O959,op4 &q95478:(srYnD�S�Vt-.#C-L!H4S&D#/�"-.(O*5(O9,-=ZvuXJ3(wrYnD� �"-.#C-L!K4 &q#/�"-.(O*5(O9,-x;O47n30%Am6N(s�E1D� *5� 8:(O-.(O*i#%�y�W13� *.� 8:(O-5*5#%;8R4MA3(O0zTY�H{U|i}>~�TM�H13*,476D� 63#%0/#C-L!@A3(O�395#C-L!�&)n3�3;O-5#/47���iTM��*5(+F *5(+9,95#%47�R&)n3�';O-.#%47���iTY47*�y13*5(OA3#/;h-.#/4 ��&)4 *��y&)nV-.n3*,(�jS� 0%n3(���4 &i95478:(�*.� �3A'478�jS� *,#�� 630%( Z
�����+�V�k�d�S�k�.���V�Y�v�I�G�"���d� �����>�7�V���k�H�O�Q�.���]�k���G� �W�¡ O��¢�'£�¤N�S�W�S�K 7��¥�§¦��k�w�R� ���H¨Y©S�O�D��ª"�"«��d�k�<�@¬3ª��Y�k�.� �`�+£x<¦d�p�O�Q�.���]�k��� ¢�y�"�S�7���d�"�X�'���§¦d��Y�k�Q�@�,��� �<���X��¥��Y�d�"�V�®���S¥`���Y "¯ �7£
°²± \?_´³xµ2bM_ ± b¶c+b"aK·s¸H#/9X�@9,(O-X-5JD�S-X;O47�"-§� #%�39w�mrYnD� �"-5#%-L!G4S&2#/�"-.(O*5(+9Q- �H¹ #C-.J95478:(E1'*5(+9,;+*,#/6?(+AG13*,476D� 6'#/0/#C-L!�ºI»$¼½ZI¾D47*½(h¿'� 8:130/(ST ¹ (]8:#/F7J"- ¹ �S�Y-s-.4@;O47�39Q-.*5n';O-� �:#/�"-.(O*,jS� 0q·s¸KÀÂÁfÃDħÅhÆx-5JD�S-Ç-.*5� 139½� �:n3�3ÈY�34 ¹ �G1D�S*.� 8:(O-5(+* ��É"Ê 1N(O*½;O(+�"-Ë4 &�-5J3(-.#%8R(SZ
ÌL�RÍ?δÏI\DadÍÐbMc=^fcWakbMcOak^`_2Ñ2T ¹ (X9Q-§� *Q- ¹ #%-5Jy95478:(wA3(h&f� n30C-x-.J3(O47*,!pÒH;=�S0/0/(OAy�p_UÓÐÔ`ÔÍNδÏI\DadÍ2bMc=^`cwÒ:� �'A ¹ (E� 9,È�#%&�-5J3(<AD�S-§��13*,4djM#%A3(W9,n'Õ:;+#%(+�"-w(hjV#%A3(+�';+(E-54H*,(�Ö,(+;h-s-5J3(-.J3(O47*,! ZÇÌ×&x�34 - ¹ (K*5(O-.� #/��-5J3(��Mn'0/0UJ"!M1N4S-.J3(O95#/9OZ <¦d�E����¥z�ÙØ5¥/�O�Q�7�Ú�7���dÛ
�§¦d�R�"ª"¯�¯q¦k�"� � �§¦d�+�§����Ü���X�3ªd�]����Ýv¦"¥f���EÞÐ�S�=ß� �d�+�Q�7£HàX�§¦d��¥I���S¥`�K� ß�d�'¯ �7ÛS�K� �:Ø×���O�+�S�k�.���dÛ�§¦d�¶�"ª"¯�¯ Ü$�"¥GØ`�f�"��¯á�Ú�dÛ���â¥/�×ãL�+�h�s�§¦d�y�"ª"¯�¯Ú£äÜ
º7ºkå
º7º�� ���������� ����������������������� �"!"�#�����%$�&'�(&'����� )���#���*Y© �"�K�"¯ �,+3£.-0/QÝ��V�Ú�213¯á�Ú�"�7���dÛD£436587:9;9=<:>:?%@BA ÄDC�C�C+Ä @ ¸,EGFs(+*,�347n30%0/#`Á.H?Æ"I ?KJL<NMO?%PRQSJ=TI ?O9=?KJ I ?KJ=MUV<NQSJXWYQZ9=>D[X\^]_?a`N>:<NJb`;cDde?�9=<NQSJ=M�?V>KMfQSgh`iM'?B<'j H Qe> ¢H�À Plk A=monp.q A @ p [\ º]» ¼ 9=?K]rUV?KJ=M�UV<NJ�s I ?KJLUV?tQSJ=M'?K]vuN`id�j�<N] H Qe> ·s¸GÀ Á ¢H3¸H»xw,¸3Ä ¢H'¸�y w,¸"Æ#z%{ ?K]V?wa|¸ À~}�� P�� k A 0%47FqÁ���� ¼xÆ [R��<2MO?V>KM)M { ?�J=7�dSd {�� 9=<NM { ?V>KQe>�M { `iM)M { ?2UV<NQSJRQe>�j�`iQS]_��M { `iMQe>V� H$À�º���� � z ?hUa`iJ�]_?���?VUKMYM { ?#J=7�dSd {�� 9=<NM { ?V>KQe> z%{ ?KJ�� ¢H3¸K» A| � �8� ¢H3¸DÁ�º�» ¢H3¸"Æ�� PQe>td�`i]���?K]tM { `iJ���[�� { ?��v7;>KMfQ s�Ua`iM�Q�<NJ0j�<N]#M { ?V>:?9b]_<:UV? I 7�]_?V> z QSdSd�cK?#M { ?�>K7�c���?VUKM<'jM { ?�JL?���M8j�? z U { `K9bMO?K]_>D[
���: ¡G¢h£�¤X¥§¦G¨�¥©£Dª «�¥©£�¢r¤¬´(O- @BA Ä�C�C�COÄ @ ¸�6N( Px� | AD��-§��1N4 #/�"-�&)*5478 9,478:(:A'#/9,-5*5#%63n'-.#%47�¡~�Zr® 1?47#/�"-
(+9Q-.#%8��S-547* ¢� ¸�4 &v�y1D� *5� 8R(h-.(O* � #%9X95478:(]&)n3�3;h-.#%47�¶4S& @BA Ä�CDC�C+Ä @ ¸MP¢� ¸HÀ°¯�Á @BA Ä�CDC�C+Ä @ ¸7ÆKC
±�(KA'(OBD�3( 7�ä�d� Á ¢� ¸"ÆÇÀ ²´³dÁ ¢� ¸"Æv» � Á�µ'ZCºdÆ
-.4�6?(:-5J3(�63#��S9p4 & ¢� ¸VZh±�(�9.�=!>-.J3�S- ¢� ¸�#/9KÓi_�¶v^)gDc+bMµ #%&©²pÁ ¢� ¸7ÆpÀ � Zh·<�M6'#�� 9,(+AVt�3(O959<n395(OA�-.4:*5(O;+(+#Cj7(K8mn3;§J��S-5-5(+�"-.#%47�¶6'n'-W-5J3(+9,(HA3�k!M9W#%-�#/9��34S-W;O47�39,#/A3(O*5(+Aâj7(O*,!#/8:1?47*,-.� �"-�¸Y8��S�Y!:4 &´-5J3(](O9,-.#%8���-.47*,9 ¹ ( ¹ #/0%0?n'95(p� *,(E63#/� 95(OAUZ©®®1?47#/�"-Ë(+9Q-.#/8R�S-547*¢� ¸$4S&<�$1D� *.�S8R(h-.(+* � #/9 ± \N_Ðc=^fcOakbM_Na�#%& ¢� ¸G¹»bº � ZR»s4 �395#%9,-.(O�3;O!�#/9m�$*,(=� 9,47�D� 6'0/(*5(OrYn3#/*,(+8:(+�"-@&)47*H(+9Q-.#/8R�S-547*59OZ¶uXJ3(�A3#/9Q-.*5#%63n'-5#/47� 4 & ¢� ¸¶#/9@;+� 0/0%(+A -.J3(�cOgDe�ÏiÔ`^f_ÐѵÐ^`cOa�¼S^�¶iÓ2ad^)\?_½ZWuXJ3(y9,-.� �3AD�S*5A¡A3(OjM#/�S-.#%47�$4 & ¢� ¸�#/9];+� 0/0%(+A�-5J3(ycOakg3_iµ2g�¼�µ®b½¼N¼d\8¼YTA3(O�34 -.(OA¶6"! �Q� P
�,� À �,� Á ¢� ¸YÆIÀ¿¾ ÀyÁ ¢� ¸YÆKC Á�µ'ZZ�7ÆÁ & -.(O�UT2#%-�#/9p�'4 -K1N479,95#%630/(m-54¶;O478R1'n'-.(y-.J'(:9Q-§� �'AD� *5A (+*,*547*�63nV-Hn'95nD� 0%0%! ¹ (:;=� �(+9Q-.#%8��S-5(<-5J3(]9Q-§� �3A3� *5A�(O*5*54 *+ZIuXJ3(](O9,-5#/8R�S-.(OAG9,-.� �3AD� *,A�(+*,*547*Ë#/9sA'(+�34 -5(+A�6"! ¢�Q� Z*Y© �"�K�"¯ �,+3£Ã°Äl?KM´@BA Ä�CDC�C+Ä @ ¸ÅEÆFË(+*5�'47n30/0%#`Á.H?Æ `iJ I de?KM ¢H'¸>À P k A m p @ p [Ç� { ?KJ²]Á ¢H3¸7Æ@À P k A m p ²pÁ @ p Æ@À�H >:< ¢H'¸ Qe>r7�JbcDQf`N>:? I [È� { ?0>KM�`iJ I `i] I ?K]v]_<N]2Qe>H�Q� À� À�Á ¢H3¸7ÆÇÀÉ� HÐÁ�º�»ÊH?Æa� P©[#� { ?�?V>KM�QSgh`iMO? I >KM�`iJ I `i] I ?K]K]_<N]�Qe> ¢�Q� À¿� ¢HÐÁQº�» ¢H?Æ�� P©[Ë
���������)!�&'�0��Y$ �& ����&K!"� º º Ê
uXJ3( rMn3� 0/#C-L!®4 &R� 1N4 #/�"->(+9Q-.#%8��S-5(�#/9>9,478:(O-.#%8R(O9>� 9,95(+9,95(OA 6"!®-.J'( e�bMg3_c��UÓ2g�¼dbMµ b½¼�¼d\8¼MT'47*�����HT3A3(hBD�3(OA�6"!
��L* À°²Y³dÁ ¢� ¸�» � Æ | C <(O;=� 0%0�-.J3�S-�²�³kÁ�� Æs*,(O&)(O*59X-.4:(h¿V1?(+;O-.�S-.#%47� ¹ #%-5Jâ*,(+9,1N(O;O-<-.4�-.J3(�A3#%9,-.*,#/63nV-.#/4 �
�xÁ�� A Ä�C�C�COÄ��D¸�¸ � ÆÇÀ ¸� p.q A �xÁ�� p ¸ � Æ
-.JD��-pF7(+�'(+*.��-.(+A¡-.J'(yAD�S-§�VZKÌ×-�A'4V(O9p�34 -p8:(=� � ¹ (:� *5(m�kj (+*5� F7#/�'F�4dj (+*K�GA3(O�395#C-L!&)47* � Z<¦d�O�"¥��S������� � { ?���L* Ua`iJRcK? z ]vQSM�MO?KJR`N>
���* À 7�ä�d� Á ¢� ¸YÆ | y À�³dÁ ¢� ¸YÆvC Á�µVZ�å"Æ������v} "¬´(O- � ¸HÀ"!�³�Á ¢� ¸7ÆhZIuXJ3(O�²Y³dÁ ¢� ¸]» � Æ | À ²Y³dÁ ¢� ¸�» � ¸)y � ¸p» � Æ |
À ²Y³dÁ ¢� ¸�» � ¸7Æ | y��3Á � ¸p» � Æ | ²Y³dÁ ¢� ¸�» � ¸7Æ y�²Y³dÁ � ¸p» � Æ |À Á � ¸p» � Æ | y�²´³dÁ ¢� ¸p» � ¸"Æ |À 7�ä�d� | y À�Á ¢� ¸7ÆKCÊË
<¦d�O�"¥��S�����$#&%�j� 7����� º ' `iJ I �,� º ' `N>P º ( M { ?KJ$¢� ¸ Qe>0UV<NJ >KQe>KMO?KJ=M��´M { `iMQe>V� ¢� ¸ ¹»�º ��[������v} âÌ×& 7�ä�d� º 'â� �3A �Q� º '�-5J3(+��Tv6"!¡uXJ3(+4 *5(+8 µ'Záå'T ��L* º ''ZRÌ×-
&)470/0%4 ¹ 9x-.JD�S- ¢� ¸*)�+»�º � Z�Á, �(+;+� 0/0DA'(OBD�3#C-.#%47�.-'Záå'ZÚÆ@uXJ'(W*,(+95n'0%-Ç&)470/0%4 ¹ 9v&)*5478 1D� *,-EÁ)6qÆ4 &iuXJ3(+47*,(+8/-'ZÃ�3Z�Ë*Y© �"�H�"¯ ��+3£1032�?KM�7�]vJ=QSJ��#M'<�M { ?�UV<NQSJ�WYQZ9;9bQSJ��h?��½`ig9bde?V� z ? { `iui?)M { `iM ² n Á ¢H3¸7ÆIÀ H>:<XM { `iM 63#/� 9�À°H�»6H$À&' `iJ I �Q� À � HÐÁ�º<»ÅHqÆa� P º ' [54�?KJLUV?V� ¢H3¸ ¹»�º H �"M { `iMQe>V� ¢H3¸ Qe>t` UV<NJ >KQe>KM'?KJ=M�?V>KMfQSgh`iM'<N]�[ Ë
º7º - ���������� ����������������������� �"!"�#�����%$�&'�(&'����� )���#���»s47�39,#/A3(O* P AD��-§�y1D� #%*59KÁ @BA Ä.� A Æ Ä�C�C�COÄdÁ @ ¸Mħ�D¸"ÆË&)*,478 ��*5(OF7*5(O959,#/47�â8R4MA3(O0
� p À®�xÁ @ p Æ yxw p¹ J3(O*5(�²pÁ�w p Æ�À 'G� �'A ��#/9E� �$n3�3ÈY�34 ¹ �¡*5(OF7*5(O95#%47��&)n3�3;h-.#%47�UZ�®W�$(+9Q-.#/8R�S-5(@4 &Ë�#/9
¢�k¸DÁ��NƽÀ �kj (+*5� F7(�4 &�� 0%0´� p 95n3;§Jâ-.JD�S- � @ p » � � ���&)47*y9,478R( ��� ''Z��<4 -.#%;+(G-5JD�S-�&)47*y(Oj (+*Q! � ¹ (âJD�=j7(�� � n3�3ÈY�34 ¹ � 1D�S*.� 8:(O-5(+*�xÁ��NÆ��S�3A��S�¶(O9,-.#%8���-.47* ¢�k¸DÁ��NÆhZ½uXJ3(m^f_NakbYÑ8¼dg'akbMµ e�bYgD_ c���ÓÐg�¼dbMµ b�¼�¼d\8¼�� ��L*#/9XA'(OBD�3(OA�6"!
� ��L* À ��L* Á ¢�d¸3Á �NÆ5Æ |� �À �´ 7����� Á ¢�d¸3Á �NÆ,Æ | y À�Á ¢�k¸DÁ��NÆ5Æ |��� �À 7�ä�d� Á ¢�k¸DÁ��NÆ5Æ |� �#y�~À�Á ¢�d¸3Á �NÆ,Æ |� �À #/�"-.(OF7*.��-.(+A�63#/� 9 | y #/�"-.(OF7*.�S-5(+A�jS� *5#/� �3;O(iC Á�µ'ZÃ�YÆ� 8R� 0/0Vj �S0/n3(O9Ë4 & � ;=� n39,(E9,8�� 0%0363#/� 9Ë6'n'-s0/� *5F (wj �S*5#��S�3;+(SZ©¬2� *5F7(�jS� 0/n'(+9Ç4 & � ;+� n395(
0��S*5F7(Ë63#�� 9i63n'-I9,8��S0/0Yj �S*5#��S�3;+(SZvuXJ3#%9x0/(+� A39v-54E-.J3("¶i^)gDc����3g�¼�^)gD_ ± bya:¼dgDµ2bY\���-5JD�S-¹ ( ¹ #%0/0UA'#/95;On3959�#%�âA'(O-§�S#/0 ¹ J3(O� ¹ (H;O4kj7(+*<�347�'1D� *.�S8R(h-.*5#%;]*5(OF7*5(O959,#/47�UZ
ÌL��8R� �"!�;=� 9,(+9OT3�H1?47#%�Y-½(O9,-.#%8���-.(�JD� 9Ë� �G�S9,!M8R1V-.471'-5#/;��W47*,8��S03A3#/9Q-.*,#/63n'-5#/47��T-.J3�S-�#/9OT
¢� ¸�����Á � Ä �Q� | ÆKC Á�µ'Z Ê ÆuXJ3#%9X#/9<�m&f� ;O- ¹ ( ¹ #%0/0N4 & -.(O�¶8R� È (pn'95(K4 &�Z
����� �~¢h¤ �"!$#�¤&%'# ()#¥©¨®�ºI»�¼ ± \?_´³xµ2bM_ ± b�^f_Nakb½¼*�3g3Ô�&)47*s�K1D� *5� 8R(h-.(O* � #/9Ë� ��#%�"-.(+*QjS� 0?·s¸HÀ Á`ÃDħÅhÆ
¹ J3(O*5(�Ã�À Ã?Á @BA ÄDC�C�C+Ä @ ¸7ÆH�S�3A ÅGÀ Å�Á @BA Ä�C�CDCOÄ @ ¸7ÆH� *,(�&)n3�3;h-.#%47�39@4S&X-.J3(GA3�S-§�95n';§J¶-.J3�S- + ³dÁ �-, ·s¸7Æ/. º<»�¼ËÄ &)47*��S0/0 �-,10 C Á�µ'Z -"ÆÌL� ¹ 47*,A39+TvÁ`ÃDħÅhÆs-.*5� 139 �m¹ #%-.Jâ13*,476D� 63#%0/#C-L!�º<» ¼ËZ�±�(K;=�S0/0xº<» ¼�-.J3( ± \2�?b½¼dg3Ñqb4 &p-5J3($;+4 �'BDA3(O�3;+(>#%�Y-5(+*Qj �S0zZ(»s4 8R8:47�30C!7TX1?(+47130%(�n39,( É"Ê 1?(+*�;+(+�"-�;O47�'BDA3(O�3;+(
����� �G�"!"���&'�����t��� $ �%$ º º��
#/�"-.(O*,jS� 0%9 ¹ J3#/;§J ;+47*,*5(+9,1N4 �3A39�-.4 ;§J34V4 95#/�'F�¼ À ' C ' Ê Z �<4 -.(SP�·s¸ ^fc�¼dgD_еÐ\?egD_е � ^fc�³��´bMµ��]Ì×& � #/9��yj7(O;O-.4 *<-5J3(+� ¹ (Kn395(H��;+47�'B3A3(+�3;O(@9,(O-mÁf95n';§J>�y9513J3(O*5(47*�� �â(+0%0/#/1'95(kƽ#/�39Q-.(=�SA¶4 &v� ��#/�"-5(+*,jS� 0`Z
� g�¼�_Ð^`_2Ñ��<uXJ'(+*5(K#/9X8@n3;§J¶;O47�'&)n39,#/47�â� 6?47n'-�J'4 ¹ -.4�#/�"-5(+*51'*5(O-W�y;+4 �'BDA3(O�3;+(#/�"-.(O*,jS� 0`Z�®�;+47�'B3A3(+�3;O(�#/�"-.(O*,jS� 0v#%9E�34S-��G13*54 6D� 63#%0/#%-L!â9Q-§�S-5(+8:(+�"-�� 6?47n'- � 9,#/�3;O(� #%9w�HB'¿V(OA�rMn3� �"-.#%-L! T'�34S-X�@*5� �3A3478�jS� *,#�� 630%( Z � 478:(W-.( ¿M-.9w#%�Y-5(+*,13*5(h-w;+4 �'BDA3(O�3;+(#/�"-.(O*,jS� 0%9I� 9x&)470%0/4 ¹ 9+PÐ#%&qÌi*5(+1?(=��-I-.J3(X( ¿'1?(+*,#/8:(+�"-Ç4kj7(O*Ë�S�3A�4dj (+*OTY-5J3(�#/�"-.(O*,jS� 0 ¹ #/0/0;+47�"-.� #/�p-.J'(½13� *.� 8:(O-5(+* É"Ê 1?(+*´;O(+�"-Ð4 &V-5J3(I-.#%8R(SZÐuXJ'#/9´#/9´;O47*5*,(+;h-i6'n'-Ðn395(O0/(+9,9i9,#/�3;O(¹ (E*.� *,(+0%!m*5(+1?(=��-s-.J'(E9.�S8R(<(h¿V1?(+*5#%8R(O�"-s4kj7(O*X� �3AR4kj7(+*OZ�® 6?(O-,-.(+*Ë#/�"-.(O*513*,(O-.�S-.#%47�#/9w-5J3#/9OP
à]���Y�O� -7�U�O�'ª:�+�V¯�¯ �O�O�E�Y�d�Q�R�Y�d���+�V�d�Q�§¥`ªd�O��� 0$�7��¥Ç�+�S�k�E�+�'�k¨V�"�S�d�O��Ú�k����¥/���"¯2�)�Y¥E�����S¥��Y�W�O����¥]��AO£ àp���Y�O�6Â'�½�+�Vª$�+�V¯�¯ �O�O�G�d� � �Y�d�Q�>�"�d��+�V�d�Q�§¥`ªd�O�´�� 0]�7��¥Y�+�S�k���+�'�k¨V�"�S�d�O�w�Ú�k���S¥%���Y¯§�)�"¥M�"��ª"�"¥��S¯ �d���O�H���S¥��Y�W�O����¥� | £ àp���Y�O� �'�Ç�O�'ª$�O�'¯�¯ �+�O���d�h� �Y�k�Q�$�Y�d���+�'�d���§¥zªd�h�:�� 0 �7��¥W�O�S�k��+�V�k¨'�"���d�+�@�Ú�k����¥/���"¯ �)�"¥´�"�yª"�"¥���¯��k���+�����S¥��Y�W�O����¥2���d£��N�'ªK�+�'�k�.���"ªd�E�§¦7����Ç�h���O�'�d���§¥zªd�O�.���dÛ �O�'�k¨V�"�S�d�+�����k���S¥%���Y¯ ���)�"¥����Q�+¬3ªd���d�+�¡� �yª"�"¥���¯��k���+����S¥��Y�W�O����¥��H��A Ä � | Ä�C�C�C �¦d�S�� 0�� �S¥½�+���k�p�+�Vª"¥<���k���S¥/���"¯ �]���Ú¯á¯N�§¥��Y�R�§¦d��§¥zªd�����S¥ �"�E�h���S¥]���"¯�ªd� £ �¦d�S¥/� �����d� �d�O�+� ��� �Ú�k�§¥/�V�3ªd�O���§¦d� � �"�=� � �¥����7�+�d�.���dÛ:�§¦d���,�Y�W��� ©M�7��¥`���E���k�X� �d�S¥Ð�"�d�:� �d�S¥×£
*Y© �"�H�"¯ ��+3£����Yui?K] � I ` � ��JL? z >O9 `K9=?K]_>�]_?O9=<N]vM<V9bQSJ=Q�<NJ�9=<NdSdZ>D[���<N]0?��½`ig9bde?V�"M { ? �g�Q�� { M�>D` � M { `iM������9=?K]�UV?KJ=M�<'jM { ?�9=<V9b7�d�`iMfQ�<NJ�j�`iui<N]�`i]vg�QSJ���9bQSde<NM�> z QSM { �i7�J >D[�� >K7 `idSd � � � <N7 z QSdSd8>:?V?�`t>KM `iMO?Kg2?KJ=M�d Q"!;?#��M { Qe>Y9=<NdSd8Qe>�`�UVUK7�]�`iMO?�MO< z QSM { QSJ%$#9=<NQSJ=M�>&�' 9=?K] UV?KJ=MX<'j6M { ?�M�QSg2?D[�� � { ? � `i]V?Ê>D` � QSJ��xM { `iM µ7å(�� Qe>�` &�' 9=?K] UV?KJ=MUV<NJ�s I ?KJLUV?"QSJ=MO?K]vuN`id�j�<N]"M { ?)Mf]v78?cD7�M%7�J)!NJL< z J�9b]_<V9=<N]vM�Q�<NJ H <'j�9=?V<V9bde? z%{ <�j�`iui<N]`i]vg�QSJ���9bQSde<NMS> z QSM { �i7�J >D[ %Sj � <N7�j�<N]Kg `BUV<NJ�s I ?KJLUV?�QSJ=MO?K]vuN`idLM { Qe> z ` � ?Kui?K] �iI ` �j�<N]rM { ?r]_?V>KM�<'j � <N7�]rd Q j�?V� &�' 9=?K]XUV?KJ=M�<'j � <N7�]hQSJ=MO?K]KuN`idZ> z QSdSd´UV<NJ=M�`iQSJ M { ?hMf]v78?9 `i]�`ig2?KM'?K]�[Ê� { Qe>�Qe>#Mf]v78?r?Kui?KJRM { <N7i� {Å� <N7R`i]_?h?V>KMfQSgh`iMfQSJ���` I Q *�?K]_?KJ=M,+D7 `iJ=M�QSM �- ` I Q *�?K]_?KJ=M 9=<NdSd.+D78?V>KMfQ�<NJ0/B?Kui?K] �2I ` � [¬Ð��-.(+*OT ¹ ( ¹ #%0/0?A3#/9,;+n39,9�Fs�k! (+9,#�� �â8R(h-.J34MA39X#%� ¹ J3#/;§J ¹ (p-5*5(=��- � � 9X#%&Ð#%- ¹ (O*5(
�R*5� �3A3478�jS� *,#�� 630%(K� �3A ¹ (mA34R8��SÈ (@1'*5476D�S63#/0%#%-L!G9Q-§�S-5(+8:(+�"-.9]� 6N4 n'- � Z�ÌL�$1D� *�t-.#%;+n30/� *+T ¹ ( ¹ #/0/0?8R� È (]9Q-§�S-5(+8:(+�"-.9s0/#/ÈS(�lQ-5J3(]13*,476D� 6'#/0/#C-L!y-.JD��- � #%9X·s¸MT3F7#%j (+�G-5J3(
º7º:µ ���������� ����������������������� �"!"�#�����%$�&'�(&'����� )���#���AD�S-.�'T´#/9 É7Ê 1?(+*�;O(+�"-=Z o��<4 ¹ (Oj (+*OTi-.J'(+95(rFw�=!7(O95#/� ��#%�"-.(+*QjS� 0/9p*,(O&)(+*p-54âA'(+F7*,(+(ht×4 &�t6?(+0/#%(O&s13*54763� 63#/0%#%-5#/(+9OZ�uXJ3(+9,(BFw�=!7(O95#/� � #/�"-.(O*,jS� 0%9 ¹ #%0/0Ç�34 -+Tv#/��F7(+�'(+*.�S0zTi-5*.� 1�-.J3(1D� *5� 8:(O-.(O* É"Ê 1N(O*�;+(+�"-<4 &Ð-.J3(p-5#/8:( Z*Y© �"�K�"¯ �,+3£�� %vJ M { ?BUV<NQSJrW´QZ9;9bQSJ���>:?KMfMfQSJ��N�de?KM ·s¸âÀ Á ¢H3¸H» w,¸VÄ ¢H3¸�yÈwQ¸"Æ0z%{ ?K]V?w |¸ À 0/47FqÁ���� ¼vÆa�3Á�� P Æ [ � ]_<Ng 4�<:? * I QSJ���� >,QSJL? +D7 `id QSM � - ' [ $0/0QSM8j�<NdSde< z >,M { `iM+
Á H , ·s¸7Æ/. º�» ¼j�<N]t?Kui?K] �)H [�4�?KJLUV?V� ·s¸ Qe>t` º�» ¼ UV<NJ�s I ?KJLUV?,QSJ=M'?K]vuN`id�[ Ë®E9�8:(+�"-.#%47�3(+Aâ(=�S*50/#%(+*OT31N4 #/�"-�(+9Q-.#/8R�S-547*59w4S& -.(+�âJD�=j7(H��0/#/8:#%-5#/�3F��W4 *58R� 0UA3#%9Qt
-.*,#/63nV-.#/4 �UTY8R(+� �3#/�'FH-.J3�S-s(OrYnD�S-.#%47�¡Áfµ'Z Ê Æ½J3470%A39+TY-.J3�S-w#%9+T ¢� ¸ ����Á � Ä ¢�,� | Æ ZxÌL�G-.J'#/9;=�S95( ¹ (K;=�S�¶;O47�39,-5*5n3;h-@Á`� 131'*54=¿V#/8R�S-.(=Æs;+47�'B3A3(+�3;O(H#%�"-.(+*QjS� 0/9�� 9X&)470%0/4 ¹ 9+Z�¦d�+�"¥������ ������v�"¥z�E�"¯ ß. ��d�Q�+�âÝ��'�k¨V�"�S�d�O� � �k���S¥%���Y¯Ú£Z3Ê587:9;9=<:>:?�M { `iM ¢� ¸�����Á � Ä ¢�Q� | Æ [Äl?KM� cK?�M { ? {U|Ð} <'j#`r>KM�`iJ I `i] I � <N]vgh`id�`iJ I de?KM������ | À k A ÁQº<» Áf¼���� Æ5Æ ��M { `iMQe>V� + Á�� � ����� | ƽÀ ¼���� `iJ I + Á�» ����� | � � � ����� | ÆÇÀº½»¡¼ z%{ ?K]V? �ÈE���Á,'VÄ=ºdÆ [Äl?KM
·s¸HÀ Á ¢� ¸�» � ��� | ¢�Q� Ä ¢� ¸�y � ��� | ¢�Q� ÆKC Á�µ'Z �7Æ� { ?KJ + ³dÁ � , ·s¸ Æ�º º<» ¼�C Á�µ'Z4µ"Æ������v} �¬´(O-��I¸ À Á ¢� ¸�» � Æa� ¢�,� ZÇFË! � 959,n38R1V-.#/4 ���x¸�� � ¹ J3(O*5(�� E��Á ''Ä=ºdÆ Z��W(O�3;+(ST+ ³dÁ �-, ·s¸7Æ À
+ ³�� ¢� ¸p» ����� | ¢�,� � � � ¢� ¸)y ����� | ¢�Q���À+ ³��v» � ��� | � ¢� ¸�» �
¢�,� � � ��� |! º +#"
» � ��� | � � � � ��� |%$À º�» ¼�C Ë
¾D47* É"Ê 1N(O*x;+(O�Y-Ç;O47�'BDA'(+�3;O(<#%�"-.(+*QjS� 0/9OT"¼¡À ' C ' Ê � �'A � ��� | ÀºiC É - � �W0%(=� A3#%�3F-.4H-.J3(�� 1'13*54=¿V#/8R�S-.( É"Ê 1?(+*w;O(+�"-�;+4 �'BDA3(O�3;+(�#/�"-5(+*,jS� 0 ¢� ¸�( � ¢�,� Z�±�( ¹ #/0%0?A'#/95;On3959-.J'(H;O47�39,-5*5n3;h-.#%47�¶4S&x;+4 �'BDA3(O�3;+(m#%�"-.(+*QjS� 0/9�#/��8:47*5(�F7(O�3(+*5� 0/#C-L!�#/��-.J'(H*,(+9,-W4 &i-.J3(6?4V47ÈqZ
�����������©�)!�����$�& $����$l�&'��� º º É
*Y© �"�H�"¯ ��+3£�oÄl?KM�@BA Ä�CDC�C+Ä @ ¸�E(Fs(O*5�347n'0/0/#`Á H?Æ `iJ I de?KM ¢H3¸@À P k A�m ¸p.q A @ p [2� { ?KJÀ:Á ¢H3¸7Æ À Plk�| m ¸p.q A À:Á @ p Æ�À P�k�| m ¸p.q A HÐÁQº:»oH?Æ¡À Plk�|VP HÐÁ�º:»oHqÆ�À H2ÁQº:»H?Æa� P©[ 4�?KJLUV?V�I�,� À � HÐÁ�ºX»ÅH?Æa� P `iJ I ¢�Q� À � ¢H3¸3ÁQº�» ¢H3¸7Æa� P©[�� � M { ?�Y?KJ=M�]�`idÄ�QSg�QSM�� { ?V<N]V?Kgt� ¢H3¸�����Á.H´Ä ¢�� | Æ [h� { ?K]_?�j�<N]V?V�©`iJ6`K9;9b]_<���QSgh`iMO? ºÇ» ¼ UV<NJ�s I ?KJLUV?QSJ=M'?K]vuN`id Qe>¢H�( � ��� | ¢�Q� À ¢H�( � ��� | ¢H'¸3Á�º�» ¢H3¸"ÆP C
�Y<Ng9 `i]_? M { Qe> z QSM { M { ?xUV<NJ�s I ?KJLUV?RQSJ=MO?K]vuN`id,QSJ M { ? 9b]V?Ku:Q�<N7;> ?��½`ig9bde?D[ � { ?� <N]vgh`id T cV`N>:? I QSJ=M'?K]vuN`idYQe>2> { <N]vM'?K]BcD7�M,QSM#<NJ=d �Ê{ `N> `K9;9b]_<���QSgh`iMO?Kd � - d�`i]���?2>D`ig�T9bde? /XUV<N]v]_?VUKM�UV<Nui?K]�`:��?D[ Ë����� ����� ¢�¥��$#�¨�£�¨�� #�¨�¥�£�¤���½47n3*Ç;+470%0/(+� F7n3(wJ3� 9 ¹ *5#%-,-.(O��� �R� 0%F747*5#C-.J38 &)47*I9,#/8mn'0��S-5#/�3F��3#/139Ç4 &��]&f� #/*x;O47#/��Z�½47n � *,(R9,n3951'#/;+#%47n39H4 &X!747n'*@;+4 0/0/(+� F7n3(��á9K13*,47F7*.�S8R8:#/�3F�9,ÈM#%0/0%9+Z�¬´(O- @�A ÄDC�C�ChÄ @ ¸
A3(+�'4 -.( P 9,#/8mn'0��S-5(+A ;+47#%� �3#/139:� �3A 0/(h-,H À+Á @ À�ºdÆ Z�¬´(O-"!$#RA'(+�34 -5(¶-5J3(
J"!V1?4 -5J3(+9,#/9�-.JD�S-y-5J3(�95#%8mn30���-.47*H#/9y;+4 *5*5(O;O-R� �3A 0/(h-%! A A3(O�34 -5(�-5J3(�J"!V1?4 -5J3(+9,#/9-.JD��-p-.J3(y9,#/8mn30/�S-.4 *E#/9E�34 -p&f� #%*+Z&!$#K#%9p;=� 0%0/(OA�-.J3(:_UÓÐÔfÔWÍNÎUÏI\qakÍÐbMc=^fc�� �3A'! A #%9;=� 0%0/(OA�-5J3(�gDÔ akb½¼S_2g'ad^ �?b�Í?δÏI\DadÍÐbMc=^fc Z�±�(�;=� � ¹ *5#C-.(]-.J'(KJ"!M1?4 -.J3(O95(O9]� 9
!$#<P�HGÀº���� j7(O*595n'9 ! A P�H)(À�º:����CÌ×& ¹ (WF7(O-Eº ' '�95#%8mn30���-.(+A��D#%139w� �'AG� 0%0к�' '@� *,(Hº T ¹ (]8:#/F7J"-½*5(+� 9547�3� 630%!y-.� È (<-.J3#%9� 9p(OjM#%A3(+�3;O(:� F"� #%�39,-*!$#=ZHuXJ3#/9p#%9p� �¡(h¿'� 8:130%(m4 &sÍNδÏI\DadÍ2bVc+^`cGakbMcOad^f_2ÑÐZ�¬2�S-.(O*¹ ( ¹ #/0%0�9,(+(�-.JD�S-�;O478:8R47��1'*.� ;h-.#/;O(�#/9w-.4�*,(�Ö,(+;h-+!�# ¹ J3(O�
� ¢H3¸p» A| �¾ A, ¸ � ��C±�( ¹ #%0/0�A'#/95;On3959�-.J'#/9�� �3Aâ4 -5J3(+*�J"!M1N4 -5J3(+9,#/9X-.(O9,-59W#%�âA'(O-§�S#/0U#%�¶�y0/�S-.(O*X;§JD� 1'-5(+*+Z���.- � # %/�6¤6£ %�«10 2��3� #�¤&!6£54Á n'*mA3(OB3�3#%-5#/47��4 &�;+47�VBDA3(+�';+(�#/�"-5(+*,jS� 0½*5(OrYn3#/*,(+9@-5JD�S-
+ ³dÁ � , ·s¸7Æ . º�» ¼&)47*]�S0/0 �),�0 Z�®E�$ÏI\?^f_Na76R^`c+b¡gDchÎ2e�Ï2ak\Dad^ ± ;+47�'B3A3(+�3;O(y#/�"-.(O*,jS� 0Ð*5(OrMn'#/*5(O9E-.J3�S-
º�� ' ���������� ����������������������� �"!"�#�����%$�&'�(&'����� )���#���0/#%8 #/�V&z¸�� �
+ ³�Á � , ·s¸ Æ). º@» ¼ &)47*y�S0/0 � , 0 Z ®W� Ói_Ð^��z\b¼�e gDcOδe�ÏÐa=\qak^ ±;+4 �'BDA3(O�3;+(y#%�Y-5(+*Qj �S0v*5(OrYn3*5(O9]-.JD�S-E0/#%8 #/�'&`¸�� � #/�'& ³���� + ³kÁ �), ·s¸7Æ . ºW» ¼ËZpuXJ3(� 131'*54=¿V#/8R�S-.( �<47*58R� 0Ct×6D� 9,(+A�#/�"-5(+*,jS� 0´#%9��:1?47#%�Y- ¹ #%95(K� 9Q!V8:1'-54 -.#%;p;+47�'B3A3(+�3;O(@#%�Vt-.(O*,jS� 0`ZxÌL��F7(+�3(O*.� 0`T'#C-X8R#%F7J"-w�34S-�6N(��mn3�'#%&)47*,8 � 9Q!M8R1'-54 -.#%;];+4 �'BDA3(O�3;+(K#/�"-.(O*,jS� 0`Z