

Page 1: phar2813revision

School of Mathematics and Statistics
The University of Sydney

Statistics component of PHAR2813 Therapeutic Principles

Lecture Notes - Revision

John T. Ormerod

17th May 2011

Page 2: phar2813revision

1. Probability on Sets

Formulae:

- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

- If A and B are independent then

  P(A ∩ B) = P(A) × P(B),

- otherwise A and B are dependent and

  P(A ∩ B) ≠ P(A) × P(B).

- If A and B are mutually exclusive then

  P(A ∩ B) = 0.

- Bayes rule:

  $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad\text{and}\quad P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
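The rules above can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die; the events A, B and C are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(set(event) & OUTCOMES), len(OUTCOMES))

A = {2, 4, 6}      # the roll is even
B = {1, 2, 3, 4}   # the roll is at most 4
C = {5}            # the roll is a five, which cannot happen together with B

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A | B)
p_union_formula = prob(A) + prob(B) - prob(A & B)

# Independence: P(A and B) equals P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)

# Mutually exclusive: P(B and C) = 0
mutually_exclusive = prob(B & C) == 0

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B = prob(A & B) / prob(B)

print(p_union, p_union_formula, independent, mutually_exclusive, p_A_given_B)
```

Exact fractions are used so that the equalities hold without rounding error.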

Page 3: phar2813revision

2. Medical Jargon used in Diagnostics

Know the definitions of medical jargon in terms of probabilities, e.g.

- Sensitivity – P(S+|D+).
- Specificity – P(S−|D−).
- False Negative – P(S−|D+).
- False Positive – P(S+|D−).
- PV+ – P(D+|S+).
- PV− – P(D−|S−).
- Prevalence – P(D+).

Page 4: phar2813revision

3. Calculate PV+

Calculate P(D+|S+):

- You will be given the formula for P(D+|S+):

  $$\mathrm{PV}^{+} = P(D^{+} \mid S^{+}) = \frac{P(S^{+} \mid D^{+})\,P(D^{+})}{P(S^{+} \mid D^{-})\,P(D^{-}) + P(S^{+} \mid D^{+})\,P(D^{+})}.$$

- You will need to read the problem description and identify from it the constituent parts of the formula.

- A little algebra might be needed, e.g. know that

  P(S+|D−) = 1 − P(S−|D−) and P(D−) = 1 − P(D+).
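A minimal sketch of the PV+ calculation, including the two complement steps. The function name and the input values (sensitivity 0.90, specificity 0.80, prevalence 0.05) are hypothetical and not taken from the notes.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return PV+ = P(D+|S+) from P(S+|D+), P(S-|D-) and P(D+)."""
    false_positive_rate = 1.0 - specificity   # P(S+|D-) = 1 - P(S-|D-)
    p_no_disease = 1.0 - prevalence           # P(D-) = 1 - P(D+)
    numerator = sensitivity * prevalence      # P(S+|D+) P(D+)
    denominator = false_positive_rate * p_no_disease + numerator
    return numerator / denominator

# Hypothetical inputs, for illustration only.
pv_plus = positive_predictive_value(sensitivity=0.90, specificity=0.80, prevalence=0.05)
print(round(pv_plus, 3))
```

With a low prevalence the PV+ can be small even when the sensitivity is high, because the false positives from the large disease-free group dominate the denominator.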

Page 5: phar2813revision

4. Binomial and Poisson Probabilities

Binomial Probabilities: X ∼ Binomial(n, p), 0 ≤ X ≤ n and X is an integer.

- P(X = x) = BINOMDIST(x,n,p,0) in EXCEL.
- P(X ≤ x) = BINOMDIST(x,n,p,1) in EXCEL.
- E(X) = np and Var(X) = np(1 − p).

Poisson Probabilities: X ∼ Poisson(µ), X ≥ 0 is an integer.

- P(X = x) = POISSON(x,µ,0) in EXCEL.
- P(X ≤ x) = POISSON(x,µ,1) in EXCEL.
- E(X) = µ and Var(X) = µ.

Be careful about:

- P(X > x) = 1 − P(X ≤ x)
- P(X < x) = P(X ≤ x − 1)
- and P(X ≥ x) = 1 − P(X ≤ x − 1)

since X only takes integer values.
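For readers without EXCEL, the same quantities can be sketched in Python from the probability formulas. The function names below are hypothetical stand-ins for the EXCEL calls, and n = 10, p = 0.3, x = 4 are illustrative values only.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); plays the role of BINOMDIST(x,n,p,0)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); plays the role of BINOMDIST(x,n,p,1)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu)."""
    return sum(poisson_pmf(k, mu) for k in range(0, x + 1))

# Hypothetical example values.
n, p, x = 10, 0.3, 4

p_greater = 1 - binom_cdf(x, n, p)        # P(X > 4)  = 1 - P(X <= 4)
p_less = binom_cdf(x - 1, n, p)           # P(X < 4)  = P(X <= 3)
p_at_least = 1 - binom_cdf(x - 1, n, p)   # P(X >= 4) = 1 - P(X <= 3)

print(p_greater, p_less, p_at_least)
```

The three "be careful" identities differ only in whether the cumulative probability is taken at x or at x − 1.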

Page 6: phar2813revision

5. Normal Probabilities

Normal Probabilities: X ∼ N(µ, σ²), X is continuous.

- E(X) = µ and Var(X) = σ².
- P(X = x) = 0 (But impossible things happen all the time!)
- Many equivalent statements:

  P(X ≤ x) = NORMDIST(x, µ, σ, 1) in EXCEL

  and if Z = (X − µ)/σ then

  P(Z ≤ z) = Φ(z) = NORMSDIST(z) in EXCEL
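The equivalence between the two statements can be sketched with Python's standard `statistics.NormalDist` class. The values µ = 100, σ = 15 and x = 130 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration.
mu, sigma, x = 100.0, 15.0, 130.0

# Direct calculation, analogous to NORMDIST(x, mu, sigma, 1).
p_direct = NormalDist(mu, sigma).cdf(x)

# Standardise first, then use the standard normal, analogous to NORMSDIST(z).
z = (x - mu) / sigma
p_standardised = NormalDist().cdf(z)

print(z, p_direct, p_standardised)
```

Both routes give the same probability, which is why a single table of Φ(z) is enough for any normal distribution.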

Page 7: phar2813revision

6. Normal Probabilities Continued

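Some formulae:

- P(Z ≥ a) = 1 − P(Z ≤ a) = 1 − Φ(a).
- P(Z ≥ −a) = 1 − P(Z ≤ −a) = 1 − Φ(−a) = Φ(a).
- Assuming a < b, P(a ≤ Z ≤ b) = Φ(b) − Φ(a).
- If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent then

  $$a_1 X_1 + a_2 X_2 \sim N\left(a_1\mu_1 + a_2\mu_2,\; a_1^2\sigma_1^2 + a_2^2\sigma_2^2\right)$$

- Means and totals: if $X_1, \dots, X_n$ are independent $N(\mu, \sigma^2)$ then

  $$T = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2) \quad\text{and}\quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$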
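A short sketch of these formulae using `statistics.NormalDist`. The helper `linear_combination` and all numerical values are hypothetical; note that variances add with squared coefficients, so a difference X1 − X2 still has variance σ1² + σ2².

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(a1, dist1, a2, dist2):
    """Distribution of a1*X1 + a2*X2 for independent normal X1 and X2."""
    mean = a1 * dist1.mean + a2 * dist2.mean
    variance = a1**2 * dist1.variance + a2**2 * dist2.variance
    return NormalDist(mean, sqrt(variance))

# Hypothetical example: X1 ~ N(10, 3^2), X2 ~ N(4, 4^2).
X1 = NormalDist(10, 3)
X2 = NormalDist(4, 4)
difference = linear_combination(1, X1, -1, X2)   # X1 - X2 ~ N(6, 25)

# Means and totals for n = 25 independent N(50, 10^2) observations.
n, mu, sigma = 25, 50.0, 10.0
total = NormalDist(n * mu, sqrt(n) * sigma)      # T ~ N(n*mu, n*sigma^2)
sample_mean = NormalDist(mu, sigma / sqrt(n))    # Xbar ~ N(mu, sigma^2/n)

# Symmetry of the standard normal: 1 - Phi(-a) = Phi(a).
Z = NormalDist()
a = 1.5
upper_tail_at_minus_a = 1 - Z.cdf(-a)

print(difference, total, sample_mean, upper_tail_at_minus_a)
```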

Page 8: phar2813revision

7. Approximating Distributions

Poisson Distribution on different time lengths:

- If the number of occurrences is Poisson(µ) per unit interval, and if W counts the occurrences over an interval of length t, then

  W ∼ Poisson(tµ).

Approximating a Binomial Distribution by a Poisson Distribution:

- If X ∼ Binomial(n, p), and if n is large and p is small, a good approximating variable is Y ∼ Poisson(µ), where µ = np, and so

  P(X ≤ r) ≈ P(Y ≤ r).
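To get a feel for how close the approximation is, the sketch below compares the exact binomial probability with its Poisson approximation. The values n = 2000, p = 0.001, r = 3 and the rate of 2 events per hour are hypothetical.

```python
from math import comb, exp, factorial

def binomial_cdf(r, n, p):
    """Exact P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def poisson_cdf(r, mu):
    """P(Y <= r) for Y ~ Poisson(mu)."""
    return sum(exp(-mu) * mu**k / factorial(k) for k in range(r + 1))

# Large n, small p (hypothetical values).
n, p, r = 2000, 0.001, 3
mu = n * p                       # Poisson mean used in the approximation

exact = binomial_cdf(r, n, p)
approx = poisson_cdf(r, mu)
print(exact, approx)

# Changing the interval length: a rate of 2 per hour over 3 hours gives Poisson(6).
rate_per_hour, hours = 2.0, 3.0
p_none_in_three_hours = poisson_cdf(0, hours * rate_per_hour)
print(p_none_in_three_hours)
```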

Page 9: phar2813revision

8. Approximating Distributions

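Approximating the distribution of a total or mean by a Normal Distribution, i.e. the central limit theorem:

- If $X_1, \dots, X_n$ are independent with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, and n is large, then approximately

  $$T = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2) \quad\text{and}\quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$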
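The central limit theorem can be illustrated by simulation. This sketch assumes exponential observations with mean 1 and variance 1 (a clearly non-normal choice made only for illustration), with a hypothetical sample size of 50 and 2000 replications.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2813)  # fixed seed so the sketch is reproducible

n = 50               # size of each sample (hypothetical)
replications = 2000  # number of simulated samples (hypothetical)

# Each observation is exponential with mean 1 and variance 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

observed_mean = mean(sample_means)   # should be close to mu = 1
observed_sd = stdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_sd = 1.0 / sqrt(n)

# Under a normal approximation, about 95% of sample means lie within 1.96 sd of mu.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_sd for m in sample_means) / replications

print(observed_mean, observed_sd, theoretical_sd, within)
```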

Page 10: phar2813revision

9. Confidence Intervals

Know how to calculate a confidence interval for the cases:

- Normal/Known σ² = σ0² case: $\bar{x} \pm z^{*} \times \frac{\sigma_0}{\sqrt{n}}$
- Normal/Unknown σ² case: $\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}$
- Proportions: $\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

where P(|Z| ≤ z∗) = 1 − α, P(|t_{n−1}| ≤ t∗) = 1 − α and α is typically 5%, e.g. P(|Z| ≤ 1.96) = 0.95. Note z∗ = ABS(NORMSINV(α/2)) and t∗ = ABS(TINV(α, n − 1)).

Also:

- Interpretation of confidence intervals.
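A minimal sketch of the two z-based intervals, using only the Python standard library. The function names and the example figures are hypothetical. The t-based interval is left out because the standard library has no t quantile function; t∗ would have to come from tables or from EXCEL's TINV.

```python
from math import sqrt
from statistics import NormalDist

def z_star(alpha=0.05):
    """Critical value with P(|Z| <= z*) = 1 - alpha, like ABS(NORMSINV(alpha/2))."""
    return abs(NormalDist().inv_cdf(alpha / 2))

def mean_ci_known_sigma(xbar, sigma0, n, alpha=0.05):
    """Confidence interval for the mean when sigma = sigma0 is known."""
    half_width = z_star(alpha) * sigma0 / sqrt(n)
    return (xbar - half_width, xbar + half_width)

def proportion_ci(successes, n, alpha=0.05):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z_star(alpha) * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)

# Hypothetical data: sample mean 120 from n = 36 with known sigma0 = 15,
# and 40 successes out of 100 trials.
mean_interval = mean_ci_known_sigma(xbar=120.0, sigma0=15.0, n=36)
prop_interval = proportion_ci(successes=40, n=100)
print(mean_interval, prop_interval)
```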

Page 11: phar2813revision

10. p-values

Definition: The p-value is the probability of observations at least as extreme or unusual as those actually observed. Also, the p-value is calculated assuming that H0 is true.

Interpretation:

- Small p-values (< 0.05), for example a p-value of 0.01, mean either 1. or 2. is true (but we can't tell which):
  1. H0 is true and the observed sample is improbable.
  2. H0 is not true.

- Large p-values (> 0.05), for example a p-value of 0.2, mean:
  1. The observed sample is consistent with H0.
  2. It does not mean H0 is actually true (the sample could have come from a different distribution, for example).

The smaller the p-value, the stronger the evidence against H0 in favour of HA.
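As one concrete illustration of the definition, the sketch below computes a two-sided p-value. It assumes a z test for a normal mean with known σ, which is only one of the tests covered in the lecture notes; the hypothesised mean, σ0, n and sample mean are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setting: H0: mu = 100 against HA: mu != 100, with known sigma0.
mu0, sigma0, n = 100.0, 15.0, 36
xbar = 106.0   # observed sample mean (hypothetical)

# Test statistic: how many standard errors the sample mean lies from mu0.
z = (xbar - mu0) / (sigma0 / sqrt(n))

# Two-sided p-value: probability, assuming H0 is true, of a statistic at
# least as extreme as the one observed, in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, p_value)
```

Here the p-value is below 0.05, so the sample would be read as evidence against H0; it is not the probability that H0 is true.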

Page 12: phar2813revision

12. Short Answer – Hypothesis Testing

Given a problem description:

- Select an appropriate null and alternative hypothesis.
- Select an appropriate test statistic for the problem (and know its distribution), i.e. choose the correct test.
- State the EXCEL command to calculate the p-value.
- Given the p-value draw a conclusion (again, interpret the p-value in relation to the hypothesis).

Note that test statistics and their distributions are in the formula sheet.

Too many tests to go through now. Consult the lecture notes.

Page 13: phar2813revision

13. Short Answer – Study Types

What is the difference between a Prospective and Retrospective Study?

A prospective study is based on subjects who are initially identified as “disease-free” and classified by presence or absence of the risk factor. A random sample from each group is followed in time (prospectively) until eventually classified by disease outcome.

In a prospective study the row totals are fixed.

A retrospective study is based on random samples from each of the two outcome categories which are followed back (retrospectively) to determine the presence or absence of the risk factor for each individual.

In a retrospective study the column totals are fixed.

Page 14: phar2813revision

14. Short Answer – Relative Risk/Odds Ratios

Consider the general table:

          D+       D−       Total
S+        a        b        a + b
S−        c        d        c + d
Total     a + c    b + d    a + b + c + d

Relative risk: Risk of the disease given a risk factor divided by the risk of the disease without the risk factor. Formula:

$$RR = \frac{P(D^{+} \mid S^{+})}{P(D^{+} \mid S^{-})} = \frac{a(c + d)}{c(a + b)}.$$

Only makes sense if the data come from a prospective study or from a sample of completed records.

Odds ratio: Many definitions. For the general 2 × 2 table all come down to:

$$OR = \frac{ad}{bc}.$$

Can be calculated regardless of the type of study used.
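Both measures can be sketched directly from the cell counts of the general table. The function names and the counts a = 30, b = 70, c = 10, d = 90 are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = P(D+|S+) / P(D+|S-) = [a / (a + b)] / [c / (c + d)]."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Hypothetical 2 x 2 table: rows S+ and S-, columns D+ and D-.
a, b, c, d = 30, 70, 10, 90

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(rr, odds)
```

In this made-up table the two measures differ noticeably because the disease is not rare; they move closer together as the disease becomes rarer in both rows.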

Page 15: phar2813revision

15. Sample Question

The presence of a symptom (S+) is used to diagnose the presence of a certain disease (D+). The probability P(D−|S−) is known as:
(a) sensitivity
(b) specificity
(c) PV− (This one is correct)
(d) odds ratio
(e) relative risk.

Page 16: phar2813revision

16. Sample Question

In a certain community, 10% of all adults have depression (P(D+), the prevalence). Suppose that a social worker in this community correctly diagnoses 95% of all adults with depression as having depression (P(S+|D+), the sensitivity). This same social worker also incorrectly diagnoses 2% of all adults without depression as having depression (P(S+|D−), the false positive rate). What is the probability that an adult, diagnosed by the social worker as having depression, actually has depression?
(a) 0.492
(b) 0.995
(c) 0.010
(d) 0.841 (This one is correct)
(e) none of the above
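As a check on option (d), substituting the stated values into the PV+ formula from section 3, with P(D−) = 1 − 0.10 = 0.90, gives:

```latex
P(D^{+} \mid S^{+})
  = \frac{P(S^{+} \mid D^{+})\,P(D^{+})}{P(S^{+} \mid D^{-})\,P(D^{-}) + P(S^{+} \mid D^{+})\,P(D^{+})}
  = \frac{0.95 \times 0.10}{0.02 \times 0.90 + 0.95 \times 0.10}
  = \frac{0.095}{0.113}
  \approx 0.841
```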

Page 17: phar2813revision

17. Sample Question

If the prevalence of an infection is 0.0003, the probability of at most 2 cases in a random sample of 10000 is (using the Poisson approximation):
(a) P(Y < 3), with Y ∼ Poisson(3), (This one is correct)
(b) P(Y ≤ 3), with Y ∼ Poisson(3),
(c) =1-POISSON(0,3,1), (in EXCEL)
(d) =POISSON(0,3,1)+POISSON(1,3,1)+POISSON(2,3,1), (in EXCEL)
(e) none of these.

Page 18: phar2813revision

19. Sample Question

Consider the distribution of serum cholesterol levels for all males in the U.S. who smoke. The distribution is normal with an unknown mean (µ) and a known standard deviation σ = σ0 = 46 mg/100 ml. Suppose we draw a random sample of size 12 from the population of male U.S. smokers and these men have a mean serum cholesterol level x̄ = 217 mg/100 ml. Based on this sample, an appropriate 95% confidence interval for the population mean µ is:
(a) (217 − 1.96 × 46, 217 + 1.96 × 46).
(b) $\left(217 - t^{*} \times \frac{46}{\sqrt{12}},\; 217 + t^{*} \times \frac{46}{\sqrt{12}}\right)$ where P(|t11| < t∗) = 0.95.
(c) $\left(217 - 1.96 \times \frac{46}{\sqrt{12}},\; 217 + 1.96 \times \frac{46}{\sqrt{12}}\right)$. (This one is correct)
(d) $\left(217 - t^{*} \times \frac{46}{\sqrt{12}},\; 217 + t^{*} \times \frac{46}{\sqrt{12}}\right)$ where P(|t11| > t∗) = 0.95.
(e) none of these.

Page 19: phar2813revision

20. Sample Question

A p-value of 0.01 means:
(a) there is 1% chance H0 true,
(b) there is 1% chance H1 true,
(c) the data are consistent with H0,
(d) there is evidence against H0, (This one is correct)
(e) none of these.