Introduction to Biostatistics Prof. Ho Kim (김호hosting03.snu.ac.kr/~hokim/int/2014/chap1.pdf ·...


Page 1: Introduction to Biostatistics Prof. Ho Kim (김호hosting03.snu.ac.kr/~hokim/int/2014/chap1.pdf · 2014-03-06 · Introduction to Biostatistics Prof. Ho Kim (김호) lecture Thur

Introduction to Biostatistics

Prof. Ho Kim (김호)

• Lecture: Thur 10:00-12:00 and 6:30-8:30

• Lab: Thur 1:00-3:00 and 8:30-10:30

• Textbooks:

– 통계학의 이해, 이승욱 편저, 자유아카데미

– Biostatistics: A Foundation for Analysis in the Health Sciences, W.W. Daniel, Wiley

• Slide files are available at http://hosting03.snu.ac.kr/~hokim/ -> 열린강의실 English ~~


• We will learn

– Basic concepts of statistics

– Applications to public health research

– Statistical software (SAS and R)

• Problems will be given at the end of every lecture and are due at the next lecture (one week later)

• Grading: homework (30%), midterm exam (30%), final exam (30%), etc. (10%)

• Office hours: Thursday 9:00-10:00 and 5:00-6:00, at 221-209 or 220-706


Statistical Software (SAS and R)

http://dss.princeton.edu/training/RStata.pdf

R을 이용해서 누구나하는 통계분석, 안재형, 한나래

R is case-sensitive


• Descriptive statistics

• Inferential statistics

• Statistics:

1) To collect, organize, and summarize data

2) To observe only a portion of the data and draw inferences about the characteristics of the whole body of data from which that portion came


1.2 basic concepts

• Biostatistics

– Statistics is used in many fields (economics, psychology, engineering, etc.)

– Characteristics of data in the life sciences

– In particular, characteristics of biomedical and public health data

– Examples from the US


1.2 basic concepts

• Variables

– One of the major purposes of research in biostatistics is to identify the factors that affect human health and to explain why they do.

– If an outcome takes the same value for everyone (a constant, not a variable), it is impossible to find explanatory variables related to it.


1.2 basic concepts

• Variables

– For this purpose, we observe health variables and risk-factor variables whose values differ from person to person.

• Quantitative variables: expressed as numbers

• Qualitative variables: difficult to quantify

• Random variable: the values of the variable are observed by chance, following certain probability rules

– Discrete random variable: the observed values are separated (not continuous)

– Continuous random variable: the values lie on a continuous scale


1.2 basic concepts

• Population

– The ultimate target of the study

– A collection composed of individual elements

– Finite or infinite

• Sample

– If it is not appropriate to investigate every element of the population, we must select a sample.

– Selecting the sample scientifically is extremely important.
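One common way to select a sample scientifically is simple random sampling, in which every element of a finite population has the same chance of being chosen. The following is only a minimal Python sketch (the course software is SAS and R); the population of 1,000 ID numbers, the sample size of 50, and the seed are all hypothetical.

```python
import random

# Hypothetical finite population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

# Fixed seed so the sketch is reproducible; any seed would do.
rng = random.Random(2014)

# Simple random sample without replacement: every element is equally likely.
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))  # sample size and number of distinct IDs
```

Sampling without replacement guarantees that no element is selected twice, which is why the two printed counts agree.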


Representativeness(1)


Representativeness(2)


Representativeness(3)


Representativeness(4)


Basic concepts of sampling survey

• Census (complete enumeration) vs. sampling survey: a census may be impossible, and it is not always more accurate.

• Target population and sampling population: these may be different (e.g., a telephone survey)


Basic concepts of sampling survey

• Sampling error: arises because we observe only a sample, not the whole population. It is always positive, is statistical (probabilistic) in nature, and can be derived theoretically.

• Non-sampling error: all other errors

– Difference between the target population and the sampling population

– Faults in the questionnaire

– Non-response error

– Others (interviewer negligence, omission of survey units, errors in data processing, etc.)
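Sampling error can be illustrated by simulation: repeated samples from the same population give different sample means, and the typical gap from the population mean shrinks as the sample size grows. This is a rough Python sketch under stated assumptions; the population (10,000 values drawn from a normal distribution), the sample sizes, and the helper name `mean_abs_sampling_error` are all hypothetical. Non-sampling errors are not represented here.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, for reproducibility only

# Hypothetical population of 10,000 values (illustrative, not course data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def mean_abs_sampling_error(n, repeats=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

small = mean_abs_sampling_error(25)
large = mean_abs_sampling_error(400)
print(round(small, 2), round(large, 2))
```

The error is never exactly zero for either sample size, but it is noticeably smaller for the larger samples.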


1.4 organizing data

Example 1.4.1: Age of the patients


Sort by age


1.5 frequency table

• Sturges' rule: k = 1 + 3.322 (log10 n)

k = number of classes, n = sample size

Width of the classes: w = R/k

R = range (max - min)


Example 1.5.1

• k = 1 + 3.322 (log10 57) = about 7

• w = R/k = (79 - 12)/7 = 9.6 = about 10
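The arithmetic of this example can be checked with a few lines of Python. This is only a sketch (the course software is SAS and R); the function name `sturges_classes` is made up for illustration, and the constant 3.322 is the one used in the example.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' rule: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

n = 57                          # sample size in Example 1.5.1
k = round(sturges_classes(n))   # about 6.8, rounded to 7
data_range = 79 - 12            # R = max - min
width = data_range / k          # w = R / k

print(k, round(width, 1))       # 7 9.6
```

A width of 9.6 is awkward to work with, which is why it is rounded up to a convenient class width of 10.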


• Frequency table
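A frequency table with class width 10 can be built as sketched below. The ages are hypothetical (they are not the 57 patients of Example 1.4.1), and starting the first class at 10 is an assumption chosen so that the classes read 10-19, 20-29, and so on.

```python
from collections import Counter

# Hypothetical ages (illustrative; not the data of Example 1.4.1).
ages = [12, 18, 23, 25, 31, 34, 36, 38, 41, 45, 47, 52, 58, 63, 79]

width = 10   # class width, as in Example 1.5.1
start = 10   # lower limit of the first class (an assumption)

# Map each age to the lower limit of its class, then count.
counts = Counter(start + ((a - start) // width) * width for a in ages)

n = len(ages)
cumulative = 0
table = []   # rows: (class label, frequency, relative frequency, cumulative frequency)
for lower in range(start, 80, width):
    f = counts.get(lower, 0)
    cumulative += f
    table.append((f"{lower}-{lower + width - 1}", f, f / n, cumulative))

for label, f, rel, cum in table:
    print(f"{label:>5}  {f:>2}  {rel:.3f}  {cum:>2}")
```

The frequency column is what a histogram plots as bar heights, one bar per class.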


• histogram


Population & sample

• Population

• Parameters: $N(\mu, \sigma^2)$

• Sample: $Y_1, \ldots, Y_n$

• Estimates:

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
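The relation between parameters and estimates can be sketched by drawing a sample from a normal population and computing the sample mean and the sample variance (with divisor n - 1). The parameter values, sample size, and seed below are hypothetical, and Python is used only for illustration.

```python
import random
import statistics

rng = random.Random(0)      # fixed seed, for reproducibility only
mu, sigma = 100.0, 15.0     # hypothetical population parameters
n = 500                     # hypothetical sample size

# Sample Y_1, ..., Y_n drawn from N(mu, sigma^2).
y = [rng.gauss(mu, sigma) for _ in range(n)]

y_bar = sum(y) / n                                   # estimate of mu
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)    # estimate of sigma^2

# The standard library implements the same two formulas.
print(y_bar, statistics.mean(y))
print(s2, statistics.variance(y))
```

The estimates land near 100 and 225 (= 15 squared) without matching them exactly, which is the difference between a parameter and its estimate.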


• Parameter: a constant that determines the statistical properties of the assumed model

$Y = a + bx$, $N(\mu, \sigma^2)$

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
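For the normal model, the two parameters mu and sigma^2 completely determine the density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2). A small Python sketch, with illustrative parameter values, writes the formula out and compares it with the standard library's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, written out term by term."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 0.0, 1.0   # illustrative parameter values (standard normal)

print(normal_pdf(0.0, mu, sigma2))
print(NormalDist(mu, math.sqrt(sigma2)).pdf(0.0))   # takes sigma, not sigma^2
```

Changing `mu` shifts the curve and changing `sigma2` widens or narrows it; nothing else about the model is free.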


1.6 measuring central location

• Statistic <- sample

• Parameter <- population

• Mean of the population (parameter): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

• Mean of the sample (statistic): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$


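1.6 measuring central location

• Median: unlike the mean, the median is much less affected by extreme values (outliers); it is a robust statistic. With an odd number of observations it is the middle value; with an even number it is the mean of the two middle values.

• Mode: the most frequent value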
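The robustness of the median can be seen by replacing one value with an outlier and comparing how the mean and the median react. The data below are hypothetical, and Python's `statistics` module is used only as a convenient sketch.

```python
import statistics

values = [21, 23, 25, 27, 29]          # hypothetical data, odd count
with_outlier = values[:-1] + [290]     # largest value replaced by an outlier

print(statistics.mean(values), statistics.median(values))              # 25 25
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 77.2 25

even = [21, 23, 25, 27]                # even count
print(statistics.median(even))         # mean of the two middle values: 24.0

print(statistics.mode([1, 2, 2, 3, 2]))  # most frequent value: 2
```

The single outlier moves the mean from 25 to 77.2 while the median stays at 25.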


1.7 Measuring variability

• Variability: range, variance (standard deviation), CV (coefficient of variation)

• Range = max - min

• Variance of the population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• Variance of the sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
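The only difference between the population variance and the sample variance is the divisor: N for a population, n - 1 for a sample. A short Python sketch with hypothetical data makes the difference concrete:

```python
import math
import statistics

x = [4, 8, 6, 5, 7]   # hypothetical data

n = len(x)
mean = sum(x) / n
ss = sum((xi - mean) ** 2 for xi in x)   # sum of squared deviations from the mean

data_range = max(x) - min(x)   # range = max - min
pop_var = ss / n               # data treated as a whole population (divide by N)
sample_var = ss / (n - 1)      # data treated as a sample (divide by n - 1)
sample_sd = math.sqrt(sample_var)

print(data_range, pop_var, sample_var)   # 4 2.0 2.5
print(statistics.pvariance(x), statistics.variance(x))
```

`statistics.pvariance` and `statistics.variance` correspond to the two divisors, so they reproduce `pop_var` and `sample_var`.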


1.7 Measuring variability

• Coefficient of variation: $CV = \frac{s}{\bar{x}} (100)$

• SDs are similar

• CV: 10/145*100 = 6.9 (%), 10/80*100 = 12.5 (%)
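The two CV values on this slide (standard deviation 10 with means 145 and 80) can be reproduced with a one-line function. The name `cv_percent` is made up for this sketch.

```python
def cv_percent(sd, mean):
    """Coefficient of variation in percent: (s / x-bar) * 100."""
    return sd / mean * 100

print(round(cv_percent(10, 145), 1))   # 6.9
print(round(cv_percent(10, 80), 1))    # 12.5
```

Because the CV is unit-free, it allows variability to be compared between groups whose means are very different even when their standard deviations are the same.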


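1.8 measuring central location for grouped data

$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$

where $m_i$ is the midpoint of class $i$, $f_i$ is its frequency, and $k$ is the number of classes.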


$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} = \frac{2086.5}{57} = 36.6$
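The grouped mean weights each class midpoint by its frequency. The sketch below first checks the quoted totals (2086.5 and 57) and then applies the formula to a small frequency table; that table, and the function name `grouped_mean`, are hypothetical and are not the table used in the lecture.

```python
def grouped_mean(midpoints, freqs):
    """Mean for grouped data: sum(m_i * f_i) / sum(f_i)."""
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

# Totals quoted in the example: sum(m_i * f_i) = 2086.5 and sum(f_i) = 57.
print(round(2086.5 / 57, 1))   # 36.6

# Hypothetical frequency table (midpoints and counts are illustrative only).
midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [2, 5, 8, 5]
print(grouped_mean(midpoints, freqs))   # 32.5
```

The result is an approximation of the mean of the raw data, because every observation in a class is treated as if it sat at the class midpoint.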


1.9 variance and SD for grouped data

$s^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1} = \frac{13347.37}{56} = 238.3459$
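The grouped variance uses the same midpoints and frequencies, with the total frequency minus one as the divisor. The sketch below checks the quoted figures (13347.37 and 57 observations) and takes the square root to get the SD; the small frequency table and the function name `grouped_variance` are hypothetical.

```python
import math

def grouped_variance(midpoints, freqs):
    """Sample variance for grouped data: sum((m_i - mean)^2 * f_i) / (sum(f_i) - 1)."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    return sum((m - mean) ** 2 * f for m, f in zip(midpoints, freqs)) / (n - 1)

# Figures quoted in the example: numerator 13347.37, total frequency 57.
s2 = 13347.37 / (57 - 1)
print(round(s2, 4), round(math.sqrt(s2), 2))   # 238.3459 15.44

# Hypothetical frequency table (illustrative only).
midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [2, 5, 8, 5]
print(grouped_variance(midpoints, freqs))
```

The standard deviation, about 15.44 here, is the square root of the variance and is expressed in the same units as the data.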


homework

• Exercises 1.9.1, 1.9.2, 1.9.3

• Review exercises 22, 23, 28

• An English translation file is also available at my web site.