Statistical Methods in Genetic Research · hosting03.snu.ac.kr/~hokim/seminar/asan20070113.pdf · 2007-01-03

Statistical Methods in Genetic Research · Ho Kim (김호), Graduate School of Public Health, Seoul National University · 2007/01/13



Human and Chimp

• How similar?

• Very similar! (99.999%?)


Contents

• Introduction and general theory (Statistical Genetics)

• Linkage Analysis

• SNP association

• Sample Size Problem (Two-stage Design)

• Introduction to SAS/Genetics and implementation examples


Genotype


Allele frequency


Genotype frequency


Hardy-Weinberg: In a stable population with random mating, allele frequencies predict genotype frequencies: P(AA) = p², P(Aa) = 2pq, P(aa) = q².

A goodness-of-fit test can be applied to test Hardy-Weinberg equilibrium (HWE).


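Basic statistical theory

• Chi-square test. H0: the data follow a specified model (here, HWE).

Under random mating, the Punnett square gives AA = p², Aa = pq + qp = 2pq, aa = q².

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \sim \chi^2_{df}$$

Degrees of freedom = number of categories − 1 − number of estimated parameters (for HWE: number of genotypes − 1 − number of estimated allele frequencies).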


• HWE example 1

- p = (2×298 + 489)/(2×1000) = 0.5425, q = (489 + 2×213)/(2×1000) = 0.4575

- P(AA) = p² = (0.5425)² = 0.2943, P(Aa) = 2pq = 2×(0.5425)×(0.4575) = 0.4964, P(aa) = q² = (0.4575)² = 0.2093

- Expected counts: AA = P(AA)×1000 = 294.3063, Aa = P(Aa)×1000 = 496.3875, aa = P(aa)×1000 = 209.3063

| Genotype | Observed count | Expected count | Observed frequency | Expected frequency |
|---|---|---|---|---|
| AA | 298 | 294.3063 | 0.2980 | 0.2943 |
| Aa | 489 | 496.3875 | 0.4890 | 0.4964 |
| aa | 213 | 209.3063 | 0.2130 | 0.2093 |
| Total | 1000 | 1000 | 1 | 1 |


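• Test statistic

$$\chi^2 = \frac{(298 - 294.3063)^2}{294.3063} + \frac{(489 - 496.3875)^2}{496.3875} + \frac{(213 - 209.3063)^2}{209.3063} = 0.2215$$

Degrees of freedom = 3 − 1 − 1 = 1

∴ Based on the chi-square distribution with 1 degree of freedom, the p-value is 0.6379, so there is not enough evidence to reject H0 (HWE). That is, we conclude that the data are consistent with HWE.

In practice, this test is widely used as a check for genotyping errors.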
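The HWE calculation above can be reproduced with a short Python sketch that uses only the standard library. The function name `hwe_chi_square` is illustrative. The p-value relies on the fact that, for 1 degree of freedom, the chi-square upper tail equals erfc(√(χ²/2)), so SciPy is not needed.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson goodness-of-fit test of Hardy-Weinberg equilibrium for a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)           # allele frequency of A by gene counting
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = 3 - 1 - 1                             # genotypes - 1 - estimated allele frequencies
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return p, expected, chi2, df, p_value

p, expected, chi2, df, p_value = hwe_chi_square(298, 489, 213)
print(f"p={p:.4f} chi2={chi2:.4f} df={df} p-value={p_value:.4f}")
# p=0.5425 chi2=0.2215 df=1 p-value=0.6379
```

The same function can be run on every SNP as the genotyping-error screen mentioned above. Note that very small expected counts make the chi-square approximation unreliable.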


• Test of association (odds ratio, chi-square test)

| | 1 | 2 | Total |
|---|---|---|---|
| Case | n11 | n12 | n1+ |
| Control | n21 | n22 | n2+ |
| Total | n+1 | n+2 | N |

$$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{(n_{11}/n_{1+})/(n_{12}/n_{1+})}{(n_{21}/n_{2+})/(n_{22}/n_{2+})} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$

where p1 = n11/n1+ and p2 = n21/n2+ are the proportions in column 1 among cases and controls.

Chi-square test with df = (#col − 1)(#row − 1):

H0: OR = 1

Every expected cell frequency should be greater than 5; if not, use Fisher's exact test.
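Below is a minimal sketch of the 2×2 analysis above. It computes the cross-product odds ratio, the Pearson chi-square with df = 1 together with the smallest expected count (to check against the rule of 5), and a two-sided Fisher exact p-value from the hypergeometric distribution. The counts in the usage lines are hypothetical. The two-sided rule used here, which sums the probabilities of all tables no more likely than the observed one, is one common convention among several.

```python
import math

def odds_ratio(n11, n12, n21, n22):
    """Cross-product odds ratio n11*n22 / (n12*n21); rows = case/control."""
    return (n11 * n22) / (n12 * n21)

def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square (no continuity correction), its df=1 p-value,
    and the smallest expected cell count."""
    table = [[n11, n12], [n21, n22]]
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    n = rows[0] + rows[1]
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value, min(min(r) for r in expected)

def fisher_exact_2x2(n11, n12, n21, n22):
    """Two-sided Fisher exact p-value with all margins held fixed."""
    r1, c1 = n11 + n12, n11 + n21
    n = n11 + n12 + n21 + n22

    def prob(a):  # hypergeometric probability that cell (1,1) equals a
        return math.comb(c1, a) * math.comb(n - c1, r1 - a) / math.comb(n, r1)

    p_obs = prob(n11)
    support = range(max(0, r1 + c1 - n), min(r1, c1) + 1)
    return sum(prob(a) for a in support if prob(a) <= p_obs * (1 + 1e-7))

print(odds_ratio(30, 70, 20, 80), chi_square_2x2(30, 70, 20, 80))
print(fisher_exact_2x2(1, 9, 11, 3))   # small table: use Fisher instead of chi-square
```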


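• Chi-square example 1: genotype-based

| | MM | Mm | mm | Total |
|---|---|---|---|---|
| Case | n2A | n1A | n0A | n+A |
| Control | n2O | n1O | n0O | n+O |
| Total | n2+ | n1+ | n0+ | N |

The index 2, 1, 0 counts the copies of M, and mm is the reference genotype:

$$OR_{MM/mm} = \frac{n_{2A}\,n_{0O}}{n_{2O}\,n_{0A}}, \qquad OR_{Mm/mm} = \frac{n_{1A}\,n_{0O}}{n_{1O}\,n_{0A}}$$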
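Below is a sketch of the genotype-based analysis, with mm as the reference genotype. The counts are hypothetical. For a 2×3 table the chi-square test has (3 − 1)(2 − 1) = 2 degrees of freedom. For exactly 2 df the upper-tail probability has the closed form exp(−χ²/2), which is what the code uses.

```python
import math

def genotype_or_and_chi2(cases, controls):
    """cases/controls: counts (MM, Mm, mm). Returns OR_MM/mm, OR_Mm/mm,
    the Pearson chi-square on the 2x3 table, its df and p-value."""
    n2A, n1A, n0A = cases
    n2O, n1O, n0O = controls
    or_MM = (n2A * n0O) / (n2O * n0A)
    or_Mm = (n1A * n0O) / (n1O * n0A)
    table = [list(cases), list(controls)]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    N = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            e = row_tot[i] * col_tot[j] / N        # expected count under H0
            chi2 += (table[i][j] - e) ** 2 / e
    df = (3 - 1) * (2 - 1)
    p_value = math.exp(-chi2 / 2)                 # upper tail of chi-square with 2 df
    return or_MM, or_Mm, chi2, df, p_value

print(genotype_or_and_chi2((30, 50, 20), (20, 50, 30)))
```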


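Linkage Disequilibrium: Under linkage equilibrium, alleles at different sites occur together (in haplotypes) at frequencies predicted by their SNP allele frequencies, that is, the product of the allele frequencies. Linkage disequilibrium is a departure from this expectation.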
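The slides do not give the formulas, so the following is an illustration. It computes the usual pairwise LD measures for two biallelic SNPs from the four haplotype frequencies: D = p_AB − p_A·p_B, Lewontin's D′ = D/D_max, and r². The haplotype frequencies in the usage line are hypothetical. D′ is returned with its sign, and many tools report |D′|.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Pairwise LD between two biallelic loci from the four haplotype frequencies."""
    p_A = p_AB + p_Ab            # allele frequency of A at locus 1
    p_B = p_AB + p_aB            # allele frequency of B at locus 2
    D = p_AB - p_A * p_B         # departure from linkage equilibrium
    if D >= 0:
        D_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        D_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    D_prime = D / D_max if D_max > 0 else 0.0   # signed D'
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, D_prime, r2

# Hypothetical haplotype frequencies (they must sum to 1).
print(ld_measures(0.5, 0.1, 0.1, 0.3))
```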


LD Block


Shaw et al., Am J Med Genet 114:205-213 (2002)


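From SNP to Haplotype

| Sequence | DNA sequence (aligned) | Haplotype | Phenotype |
|---|---|---|---|
| 1 | GATATTCGTACGGA-T | AG- | Black eye |
| 2 | GATGTTCGTACTGAAT | GTA | Brown eye |
| 3 | GATATTCGTACGGA-T | AG- | Black eye |
| 4 | GATATTCGTACGGAAT | AGA | Blue eye |
| 5 | GATGTTCGTACTGAAT | GTA | Brown eye |
| 6 | GATGTTCGTACTGAAT | GTA | Brown eye |

The variable sites (SNPs) are at alignment positions 4, 12 and 15. Haplotypes: AG- 2/6, GTA 3/6, AGA 1/6.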
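The step from aligned sequences to haplotypes can be sketched in a few lines of Python. The sketch finds the columns where the six sequences disagree, reads each sequence's alleles at those columns as its haplotype, and tabulates haplotype frequencies. The function names are illustrative.

```python
from collections import Counter

# The six aligned sequences from the slide ('-' marks a gap).
SEQUENCES = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]

def variable_sites(seqs):
    """0-based alignment columns where not all sequences carry the same character."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def haplotype_frequencies(seqs):
    sites = variable_sites(seqs)
    haplotypes = ["".join(s[i] for i in sites) for s in seqs]
    counts = Counter(haplotypes)
    return sites, {h: c / len(seqs) for h, c in counts.items()}

sites, freqs = haplotype_frequencies(SEQUENCES)
print([i + 1 for i in sites], freqs)   # 1-based positions, as on the slide
```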


In-silico Haplotyping: Approaches

1) Clark’s algorithm

2) E-M algorithm (expectation-maximization algorithm)

3) Bayesian algorithm


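Clark's Algorithm

1) Find individuals who are homozygous at every locus or heterozygous at only one locus. Their haplotypes are unambiguously defined.

Individual 1: SNP1 T/T, SNP2 A/A, SNP3 C/C → haplotypes T-A-C and T-A-C

Individual 2: SNP1 T/T, SNP2 A/A, SNP3 C/G → haplotypes T-A-C and T-A-G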


Clark's Algorithm

2) Try to resolve each ambiguous genotype as a combination of already-solved haplotypes.

Individual 3: SNP1 A/T, SNP2 A/A, SNP3 C/G. This genotype contains the solved haplotype T-A-C, so the other haplotype must be A-A-G.

Continue until either all haplotypes have been solved or no more haplotypes can be found in this way.


Clark's Algorithm: problems

• If there are no homozygotes or single-SNP heterozygotes, the chain might never get started.

• Many unresolved haplotypes may be left at the end.

• Nevertheless, quite useful in practice!
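Below is a compact Python sketch of the two steps of Clark's algorithm described above. Genotypes are lists of (allele, allele) pairs, one pair per SNP, and `clark`, `compatible` and `complement` are illustrative names. When several known haplotypes fit an ambiguous genotype, this sketch simply takes the first one in sorted order. Real implementations differ on this point, and the results of Clark's algorithm depend on the processing order.

```python
def compatible(hap, genotype):
    """True if haplotype `hap` can be one of the two haplotypes of `genotype`."""
    return all(h in pair for h, pair in zip(hap, genotype))

def complement(hap, genotype):
    """The other haplotype implied by `genotype` once `hap` is fixed."""
    return tuple(b if h == a else a for h, (a, b) in zip(hap, genotype))

def clark(genotypes):
    known, resolved = set(), {}
    # Step 1: individuals heterozygous at no more than one SNP are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            h1, h2 = tuple(a for a, _ in g), tuple(b for _, b in g)
            resolved[i] = (h1, h2)
            known.update((h1, h2))
    # Step 2: resolve ambiguous individuals with already-known haplotypes.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in sorted(known):
                if compatible(h, g):
                    other = complement(h, g)
                    resolved[i] = (h, other)
                    known.add(other)      # the inferred haplotype joins the pool
                    progress = True
                    break
    return resolved, known

genotypes = [
    [("T", "T"), ("A", "A"), ("C", "C")],   # individual 1
    [("T", "T"), ("A", "A"), ("C", "G")],   # individual 2
    [("A", "T"), ("A", "A"), ("C", "G")],   # individual 3 (ambiguous)
]
resolved, known = clark(genotypes)
print(resolved[2])
```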


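EM Algorithm

• Use the multinomial likelihood of haplotypes under HWE. For an individual with genotype SNP1 A/T, SNP2 A/A, SNP3 C/G:

Pr(AT//AA//CG) = Pr(AAC/TAG) + Pr(AAG/TAC) = 2 Pr(AAC) Pr(TAG) + 2 Pr(AAG) Pr(TAC)

(The factor 2 counts the two parental orders of each haplotype pair. It is common to both terms and cancels in the E-step.)

Fallin and Schork (2000) showed that EM is better than Clark's algorithm.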
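Below is a minimal Python sketch of EM haplotype-frequency estimation under HWE. It enumerates every phase resolution of each individual, so it is meant for a handful of SNPs, not long haplotypes. The E-step weights each resolution by 2·Pr(h1)·Pr(h2), or Pr(h)² when both haplotypes are identical. The M-step divides the expected haplotype counts by the 2n chromosomes. The function names and the uniform starting values are choices made for this illustration.

```python
from collections import defaultdict
from itertools import product

def phase_resolutions(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype
    given as a list of (allele, allele) pairs, one per SNP."""
    het = [k for k, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    for flips in product((False, True), repeat=len(het)):
        h1 = [a for a, _ in genotype]
        h2 = [b for _, b in genotype]
        for k, flip in zip(het, flips):
            if flip:
                h1[k], h2[k] = h2[k], h1[k]
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    resolutions = [phase_resolutions(g) for g in genotypes]
    haplotypes = sorted({h for res in resolutions for pair in res for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}   # uniform start
    n = len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for res in resolutions:
            # E-step: posterior weight of each phase resolution under HWE
            weights = [(2.0 if h1 != h2 else 1.0) * freq[h1] * freq[h2] for h1, h2 in res]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes
        new_freq = {h: counts[h] / (2 * n) for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq

genotypes = [
    [("T", "T"), ("A", "A"), ("C", "C")],
    [("A", "A"), ("A", "A"), ("G", "G")],
    [("A", "T"), ("A", "A"), ("C", "G")],   # AAC/TAG or AAG/TAC
]
print(em_haplotype_frequencies(genotypes))
```

In this toy example the homozygous individuals carry T-A-C and A-A-G, so EM drives the ambiguous individual toward the resolution A-A-G/T-A-C.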


A Gibbs sampler, Stephens et al. (2001)

• G = (G_1, …, G_n): observed multilocus genotypes

H = (H_1, …, H_n): unknown haplotype pairs

F = (F_1, …, F_M): M unknown population haplotype frequencies

1. Choose an individual i from all ambiguous individuals.

2. Sample $H_i^{(t+1)}$ from $\Pr(H_i \mid G, H_{-i}^{(t)})$.

3. Set $H_j^{(t+1)} = H_j^{(t)}$ for $j = 1, 2, \dots, i-1, i+1, \dots, n$.


Environmental Effects

Assume the phenotype (Y) is the combined effect of the genotype (G) and the environment (E):

$$Y_i = G_i + E_i$$

Assume $E_i \sim \mathrm{NID}(0, \sigma_E^2)$: normally distributed (bell-shaped), independent (not correlated), with mean 0 and homogeneous variance.


[Figure: from phenotype to gene. Phenotype → Gene? → putative gene (locus) → new gene discovery, via segregation analysis, linkage analysis (LD, sib pair, etc.) and association study.]


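Heritability

| Trait | Heritability (%) | Trait | Heritability (%) |
|---|---|---|---|
| Lifespan | 29 | Verbal ability | 63 |
| Height | 85 | Maximum heart rate | 84 |
| Body weight | 63 | Numerical ability | 76 |
| Amino acid secretion | 72 | Memory | 47 |
| Blood lipid concentration | 44 | Social adaptability | 66 |
| Maximum blood lactate concentration | 34 | Emotionality | 58 |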


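Model

$$Y_{ij} = \mu + X_i + E_{ij}, \qquad X_i \sim N(0, \sigma_B^2), \qquad E_{ij} \sim N(0, \sigma_W^2)$$

where i = 1, 2, …, I indexes the sibships, j = 1, 2, …, F_i the members of sibship i, X_i is the deviation of family i from the overall mean µ, and E_ij is the deviation of member j from the family mean.

$$\text{Heritability} = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}$$

(This ratio is the intraclass correlation among sibs. Full sibs share on average half of the additive genetic variance, so twice this correlation is an upper bound for the narrow-sense heritability; it also absorbs dominance and shared-environment effects.)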
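Below is a minimal sketch of one common way to estimate σ_B² and σ_W² for balanced sibships: one-way ANOVA by the method of moments, with σ̂_W² = MSW and σ̂_B² = (MSB − MSW)/k for k members per sibship. The sib data are hypothetical. Unbalanced sibships need an adjusted k, and maximum likelihood or REML are common alternatives.

```python
def sibship_variance_components(families):
    """One-way ANOVA estimates for Y_ij = mu + X_i + E_ij with equal sibship size k."""
    I = len(families)
    k = len(families[0])
    assert all(len(f) == k for f in families), "this sketch assumes balanced sibships"
    grand = sum(sum(f) for f in families) / (I * k)
    means = [sum(f) / k for f in families]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for f, m in zip(families, means) for y in f)
    msb = ss_between / (I - 1)
    msw = ss_within / (I * (k - 1))
    sigma2_w = msw
    sigma2_b = max((msb - msw) / k, 0.0)   # truncate a negative estimate at zero
    ratio = sigma2_b / (sigma2_b + sigma2_w)
    return sigma2_b, sigma2_w, ratio

# Three hypothetical sib pairs
print(sibship_variance_components([[10, 12], [14, 16], [18, 20]]))
```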


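Variance Component Model

$$P = \underbrace{A + D + I}_{\text{Genetic}} + E + \underbrace{G \times E}_{\text{Nuisance (?)}}$$

A: additive, D: dominance, I: epistatic, E: environmental (common + independent), G × E: genotype-by-environment interaction, with G = A + D + I

Narrow-sense heritability = Var(A) / Var(P) (additive over total)

Broad-sense heritability = Var(G) / Var(P) (genetic over total)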


How to identify the genes

• Family study

– Linkage analysis: requires pedigrees

– Sib-pair analysis: oligogenic, multigenic traits

• Population study

– Case-control association study


$$T = \frac{\left\{\sum_{i=1}^{k} \left(D_i - E_i\right)\right\}^2}{\sum_{i=1}^{k} V_i}$$


Number of genetic determinants by effects

$$t_1 < t_2 < \dots < t_k$$


Linkage analysis: Key Concepts



Recombination

[Figure: recombination during transmission from parent to offspring 1, offspring 2, … is stochastic at each step.]


• PARAMETRIC LINKAGE ANALYSIS: Estimates the recombination fraction between markers and a hypothesized trait locus. The inheritance parameters of the trait locus (mode of inheritance, penetrance, phenocopy rate, allele frequencies, etc.) must be specified.

Example: LOD score method


• LOD SCORE

The common (base-10) logarithm of the likelihood ratio:

$$Z(\theta) = \log_{10}\left[\frac{L(\theta)}{L(1/2)}\right]$$

where θ is the recombination fraction between two loci.


• Purpose of the LOD score method

1. Estimation of the recombination fraction θ

2. Hypothesis testing

H0: θ = ½ (absence of linkage)

H1: θ < ½ (linkage)

$$Z_{\max} = Z(\hat\theta) = \log_{10}\left[\frac{L(\hat\theta)}{L(1/2)}\right]$$


• Scale for testing linkage

Zmax ≥ 3: strong evidence of linkage (likelihood ratio of at least 1000:1 in favor of linkage)

Zmax > 0: supports linkage

Zmax < 0: evidence against linkage

Zmax = 0: no support either way (the data do not distinguish linkage from no linkage)


Phase known pedigree


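Figure 2 Phase known pedigree

• With 2 recombinants among 6 informative meioses, the maximum likelihood estimator of θ is 2/6 = 1/3.

$$Z(\theta) = \log_{10}\frac{\theta^2(1-\theta)^4}{0.5^2\,0.5^4} = \log_{10}\left[2^6\,\theta^2(1-\theta)^4\right]$$

$$Z(1/3) = \log_{10}(1024/729) \approx 0.1476$$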
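The phase-known LOD computation can be checked with a small Python sketch. The helper names are illustrative. The MLE R/(R + NR) is capped at 0.5 because θ cannot exceed ½, and the formula assumes 0 < θ ≤ 0.5.

```python
import math

def lod_phase_known(theta, recombinants, nonrecombinants):
    """Z(theta) = log10[theta^R (1 - theta)^NR / 0.5^(R + NR)] for 0 < theta <= 0.5."""
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            - n * math.log10(0.5))

def theta_hat_phase_known(recombinants, nonrecombinants):
    """MLE R / (R + NR), restricted to the parameter space theta <= 0.5."""
    return min(recombinants / (recombinants + nonrecombinants), 0.5)

R, NR = 2, 4
theta_hat = theta_hat_phase_known(R, NR)
print(round(theta_hat, 4), round(lod_phase_known(theta_hat, R, NR), 4))  # 0.3333 0.1476
```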


Phase-unknown pedigree


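Figure 3 Phase-unknown pedigree

• The maximum likelihood estimator of θ is not so trivial. Averaging over the two equally likely phases of the parent:

$$L(\theta) = \tfrac{1}{2}\theta^4(1-\theta)^2 + \tfrac{1}{2}\theta^2(1-\theta)^4 = \tfrac{1}{2}\theta^2(1-\theta)^2\left[\theta^2 + (1-\theta)^2\right]$$

• The MLE is found to be 0.5 by a numerical method. Analytically, with u = θ(1 − θ) we have L = ½u²(1 − 2u), which increases in u for u < 1/3. Since u ≤ 1/4 on 0 ≤ θ ≤ ½, the maximum is at θ = ½, where Z = 0.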
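The "numerical method" for the phase-unknown case can be as simple as a grid search over 0 < θ ≤ 0.5, as in this sketch. A grid with step 0.001 is an assumption made here for illustration. Any one-dimensional optimizer restricted to the parameter space would also work.

```python
import math

def likelihood_phase_unknown(theta):
    """L(theta) = 1/2 theta^4 (1-theta)^2 + 1/2 theta^2 (1-theta)^4 (two equally likely phases)."""
    return 0.5 * theta**4 * (1 - theta)**2 + 0.5 * theta**2 * (1 - theta)**4

def lod_phase_unknown(theta):
    return math.log10(likelihood_phase_unknown(theta) / likelihood_phase_unknown(0.5))

# Simple grid search over 0 < theta <= 0.5 (step 0.001).
grid = [k / 1000 for k in range(1, 501)]
theta_mle = max(grid, key=likelihood_phase_unknown)
print(theta_mle, lod_phase_unknown(theta_mle))
```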


Genotype Unknown-Phenotype known


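Figure 4 Genotype Unknown-Phenotype known

$$L(\text{data};\theta) = \Pr(Ph_{ma})\,\Pr(Ph_{pa}) \times \prod_{\text{offspring}} \Pr(Ph_{offs} \mid Ph_{ma}, Ph_{pa}; \theta)$$

and we know that

$$\Pr(Ph_{ma}) = \sum_{G} \Pr(G)\,\Pr(Ph_{ma} \mid G)$$

(ma: mother, pa: father, offs: offspring, Ph: phenotype, G: genotype)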


• NONPARAMETRIC LINKAGE ANALYSIS: Inheritance parameters of the trait locus are not specified. Instead, one focuses on pairs (or larger sets) of affected individuals and investigates marker allele sharing among them, contrasting the observed allele sharing with the sharing expected when the marker has nothing to do with the trait.

Example: IBD (identical by descent) test


IBS (Identical by state), IBD (Identical by descent)

Human Genome Epidemiology, Khoury, Little, Burke


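AN EXAMPLE FAMILY WITH DISEASE LOCUS AT THE MARKER

[Figure: pedigree with marker genotype and disease alleles. Parents: 3 4 (+ –) and 3 2 (+ –). Offspring: 3 3 (+ +), 3 4 (+ –), 2 3 (– +), 2 4 (– –).]

• Only '+ +' individuals are affected ('+' is recessive to '–').

** Qualitative trait

| Sib-pair markers (sib1, sib2) | Disease status (d1, d2) | # shared IBD | C |
|---|---|---|---|
| 3 3, 3 3 | + + | 2 | 1 |
| 3 3, 3 4 | + – | 1 | 0.25 |
| 3 3, 2 3 | + – | 1 | 0.25 |
| 3 3, 2 4 | + – | 0 | 0.25 |
| 3 4, 3 4 | – – | 2 | 0.5 |
| 3 4, 2 3 | – – | 0 | 0.5 |
| 3 4, 2 4 | – – | 1 | 0.5 |
| 2 3, 2 3 | – – | 2 | 0.5 |
| 2 3, 2 4 | – – | 1 | 0.5 |
| 2 4, 2 4 | – – | 2 | 0.5 |

$$C_j = (d_{j1} - \mu)(d_{j2} - \mu) = \alpha + \beta\,\mathrm{IBD}_j + \varepsilon_j$$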
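This sketch recomputes the "# shared IBD" column and fits the regression of C on IBD by ordinary least squares. It assumes, as holds in this family, that each sib genotype is written with the allele from one parent first and the allele from the other parent second, and that each parent's two marker alleles are distinguishable, so IBD sharing can be counted position by position. The C values are copied from the table as listed. `ibd_count` and `ols_slope` are illustrative names.

```python
def ibd_count(sib1, sib2):
    """Alleles shared IBD when each genotype is (allele from parent 1, allele from parent 2)
    and each parent's two alleles are distinguishable."""
    return sum(a == b for a, b in zip(sib1, sib2))

def ols_slope(x, y):
    """Least-squares slope beta in y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

pairs = [((3, 3), (3, 3)), ((3, 3), (3, 4)), ((3, 3), (2, 3)), ((3, 3), (2, 4)),
         ((3, 4), (3, 4)), ((3, 4), (2, 3)), ((3, 4), (2, 4)),
         ((2, 3), (2, 3)), ((2, 3), (2, 4)), ((2, 4), (2, 4))]
C = [1, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # as listed in the table
ibd = [ibd_count(s1, s2) for s1, s2 in pairs]
beta = ols_slope(ibd, C)
print(ibd, round(beta, 4))
```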


• Linkage and LD

- LD can exist even between two loci assumed to reside on different chromosomes (for example, through population admixture or stratification). The presence of LD therefore does not necessarily imply linkage between the loci considered.

- LD originally referred to any association of alleles at different loci. It has since become customary to take LD to mean association among alleles due to close linkage, and to call the general phenomenon "allelic association".


• Genomewide Linkage Analysis of Bipolar Disorder by Use of a High-Density Single-Nucleotide Polymorphism (SNP) Genotyping Assay: A Comparison with Microsatellite Marker Assays and Finding of Significant Linkage to Chromosome 6q22

• F. A. Middleton, M. T. Pato, K. L. Gentile, C. P. Morley, X. Zhao, A. F. Eisener, A. Brown, T. L. Petryshen, A. N. Kirby, H. Medeiros, C. Carvalho, A. Macedo, A. Dourado, I. Coelho, J. Valente, M. J. Soares, C. P. Ferreira, M. Lei, M. H. Azevedo, J. L. Kennedy, M. J. Daly, P. Sklar, and C. N. Pato

• Am. J. Hum. Genet., 74:000, 2004


We performed a linkage analysis on 25 extended multiplex Portuguese families segregating for bipolar disorder, by use of a high-density single-nucleotide polymorphism (SNP) genotyping assay, the GeneChip Human Mapping 10K Array (HMA10K). Of these families, 12 were used for a direct comparison of the HMA10K with the traditional 10-cM microsatellite marker set and the more dense 4-cM marker set. This comparative analysis indicated the presence of significant linkage peaks in the SNP assay in chromosomal regions characterized by poor coverage and low information content on the microsatellite assays. The HMA10K provided consistently high information and enhanced coverage throughout these regions. Across the entire genome, the HMA10K had an average information content of 0.842 with 0.21-Mb intermarker spacing. In the 12-family set, the HMA10K-based analysis detected two chromosomal regions with genomewide significant linkage on chromosomes 6q22 and 11p11; both regions had failed to meet this strict threshold with the microsatellite assays. The full 25-family collection further strengthened the findings on chromosome 6q22, achieving genomewide significance with a maximum nonparametric linkage (NPL) score of 4.20 and a maximum LOD score of 3.56 at position 125.8 Mb. In addition to this highly significant finding, several other regions of suggestive linkage have also been identified in the 25-family data set, including two regions on chromosome 2 (57 Mb, NPL = 2.98; 145 Mb, NPL = 3.09), as well as regions on chromosomes 4 (91 Mb, NPL = 2.97), 16 (20 Mb, NPL = 2.89), and 20 (60 Mb, NPL = 2.99). We conclude that at least some of the linkage peaks we have identified may have been largely undetected in previous whole-genome scans for bipolar disorder because of insufficient coverage or information content, particularly on chromosomes 6q22 and 11p11.


• Figure 2 Linkage signals obtained with 10-cM spaced and 4-cM spaced microsatellite assays, as well as the HMA10K SNP genotyping assay. These assays were performed on the same individuals from each of the same 12 families. Note the high correlation of the different assays in general, and that for both chromosomes 6 and 11, the SNP assay detected major linkage peaks at locations where the information content and coverage of the microsatellite panels were relatively low. Mb, megabase position; MSM, microsatellite markers.


• Figure 3 NPL analysis of 25 families with bipolar disorder from the Portuguese Island Collection. The number of each chromosome is shown at the top of each plot. The X-axis indicates the physical position (Mb) of the SNP marker. The Y-axis indicates the NPL Z score (black) or Kong and Cox LOD score (gray). For this scan, the empirical limit for genomewide significance was an NPL score of 3.85 and a LOD score of 3.15. Note that only the peak on chromosome 6 at 125.8 Mb was significant when both NPL Z and LOD thresholds were used.


Figure 4 Comparison of the 12-family (gray) and 25-family (black) genomewide linkage scans for selected chromosomes showing suggestive or significant linkage (see table 1). The X-axis indicates physical position (Mb). Note that for both scans, the signal on chromosome 6 at position 125.8 Mb is the only genomic region that achieves genomewide significance (of NPL score and/or LOD score).


• Another Approach to LD Analysis ("Family-Based Study")

1. Haplotype relative risk (HRR) method: Falk and Rubinstein (1987)

2. Haplotype-based haplotype relative risk (HHRR) method: Terwilliger and Ott (1992)

3. Transmission/disequilibrium test (TDT): Spielman et al. (1993)

4. Sib transmission/disequilibrium test (S-TDT): Spielman and Ewens (1998)


• Transmission/disequilibrium test

| Transmitted allele | Not transmitted: Allele 1 (A1) | Not transmitted: Allele 2 (A2) |
|---|---|---|
| Allele 1 (A1) | a | b |
| Allele 2 (A2) | c | d |

- Focuses on heterozygous parents only: only the discordant cells b and c (transmissions from A1/A2 parents) enter the statistic. Allows the use of multiple affected siblings.

- McNemar's test (standard χ² test), H0: b = c. The TDT statistic:

$$\chi^2_1 = \frac{(b - c)^2}{b + c}$$

- Powerful only in the presence of LD.
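The TDT statistic is simple enough to compute by hand, but this sketch makes the df = 1 p-value explicit. The counts b = 60 and c = 40 are hypothetical.

```python
import math

def tdt(b, c):
    """McNemar-type TDT: b = A1 transmitted / A2 not, c = A2 transmitted / A1 not."""
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value

print(tdt(60, 40))   # hypothetical transmission counts from heterozygous parents
```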


SNP Association Study

1. Study design
   1. Select target disease
   2. Case-control criteria
   3. Determine # of samples

2. Sample and data collection
   1. Genetic materials
   2. Clinical information / phenotypic classification
   3. Environmental information

3. Genotyping
   1. Select candidate genes/SNPs
   2. Whole-genome screening
   3. Select an appropriate method of genotyping

4. Statistical analysis


• Statistical analysis scheme of SNP Genotyping Data


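Multiple Comparisons

Example: suppose a single test has significance level α, usually α = 0.05. If we perform multiple comparisons, each at level α, the overall significance level is no longer 0.05. With 4 tests it becomes 0.1855, so the type I error is inflated.

Let H01: θ1 = 0 with Pr(do not reject H01 | H01 is true) = 1 − α1, and H02: θ2 = 0 with Pr(do not reject H02 | H02 is true) = 1 − α2. For H0 = H01 and H02, assuming the two tests are independent,

$$\Pr(\text{do not reject } H_0) = \Pr(\text{do not reject } H_{01} \text{ and do not reject } H_{02}) = (1-\alpha_1)(1-\alpha_2) = (1-\alpha)^2$$

when α1 = α2 = α. In general, with α1 = α2 = ⋯ = αk = α,

$$(1-\alpha)^k \le 1-\alpha, \qquad k = 4,\ \alpha = 0.05:\quad (0.95)^4 = 0.8145 \le 0.95$$

∴ the overall significance level is 1 − 0.8145 = 0.1855.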


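Multiple Comparisons: Bonferroni Correction

Bonferroni correction: if m comparisons are made and each is tested at significance level α/m, the overall significance level stays close to α.

Example: m = 4

$$\left(1 - \frac{0.05}{4}\right)^4 \approx 0.95 = 1 - 0.05$$

Application: when 10 tests are performed (for example, on 10 means), using 0.05 as the p-value threshold for each test would not maintain the overall level. Each test therefore uses

$$\frac{0.05}{10} = 0.005$$

as its threshold. This is called the "Bonferroni-corrected" threshold (equivalently, compare m × p with α).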
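The two calculations above, the inflation of the overall type I error and the Bonferroni fix, can be sketched as follows. The family-wise formula assumes independent tests, as in the derivation. `bonferroni_adjust` returns adjusted p-values min(1, m·p), which are compared with the overall α.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted p-values min(1, m * p); compare them with the overall alpha."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]

print(round(familywise_error(0.05, 4), 4))       # 0.1855  (each test at 0.05)
print(round(familywise_error(0.05 / 4, 4), 4))   # 0.0491  (each test at 0.05/4)
```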


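Multiple Comparisons: FDR (False Discovery Rate)

FDR = expected proportion of false positives among all positives (false positives / total positives)

1. Order the p-values (largest to smallest).

2. Compare the k-th smallest p-value with 0.05·k/N for k = N, N − 1, …, 1, and stop at the first k that passes.

• The threshold is reduced only step by step, so much less power is lost.

• Bonferroni is too conservative; FDR is helpful.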


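Multiple Comparisons: FDR (independent tests), Benjamini and Hochberg (1995)

1. Order the p-values: P(1) ≤ P(2) ≤ … ≤ P(m).

2. Find the largest k such that $P_{(k)} \le \frac{k}{m}\alpha$.

3. Hypotheses 1, 2, …, k are declared significant.

(Example) m = 500K:

0.05/500K = 10^(-7) is the Bonferroni threshold.

The BH thresholds are 0.05/500K × 1, 0.05/500K × 2, 0.05/500K × 3, …, that is, k × 10^(-7).

If P(2000) ≤ 2000 × 10^(-7) and P(k) > k × 10^(-7) for every k > 2000 (in particular P(2001) > 2001 × 10^(-7)), the 2000 smallest p-values are selected.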


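Multiple Comparisons: FDR (dependent tests), Benjamini and Yekutieli (2001)

1. Order the p-values: P(1) ≤ P(2) ≤ … ≤ P(m).

2. Find the largest k such that $P_{(k)} \le \frac{k}{m\,c(m)}\alpha$.

3. Hypotheses 1, 2, …, k are declared significant.

If the tests are independent or positively correlated, then c(m) = 1.

If the tests are negatively correlated (or the dependence is arbitrary), then $c(m) = \sum_{i=1}^{m} 1/i$.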
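Both FDR procedures above share the same step-up rule and differ only in the constant c(m). This sketch returns the original indices of the rejected hypotheses. The p-values in the usage lines are made up for illustration.

```python
def step_up(pvalues, alpha=0.05, c_m=1.0):
    """Largest k with P_(k) <= k * alpha / (m * c_m); reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / (m * c_m):
            k_max = rank                 # keep the largest rank that passes
    return sorted(order[:k_max])

def benjamini_hochberg(pvalues, alpha=0.05):
    return step_up(pvalues, alpha, 1.0)

def benjamini_yekutieli(pvalues, alpha=0.05):
    c_m = sum(1.0 / i for i in range(1, len(pvalues) + 1))
    return step_up(pvalues, alpha, c_m)

p = [0.001, 0.008, 0.025, 0.035, 0.2]
print(benjamini_hochberg(p), benjamini_yekutieli(p))
```

For comparison, Bonferroni at 0.05/5 = 0.01 would reject only the first two hypotheses here. BH rejects four, and the more conservative BY rejects two.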


Permutation test


Five steps to a permutation test (Good 1994)

1. Analyze the problem: identify the hypothesis and the alternative(s) of interest.

2. Choose a statistic.

3. Compute the test statistic for the original observations.

4. Generate the null reference distribution by
   - rearranging the labels in a manner consistent with the randomization procedure,
   - computing the test statistic,
   - repeating these two steps until you obtain the distribution of the test statistic for all possible rearrangements.

5. Accept or reject the hypothesis using this permutation distribution as a guide.

• The test statistic may be based on: a) the actual observations (Fisher-Pitman), or b) their ranks.


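A numerical example

• Gene expression data for two groups

• Test statistic = t statistic (comparing two means)

| | Group 1 | Group 2 | T |
|---|---|---|---|
| Original data | 1, 2, 3, 4 | 0, −1, −2, −3 | 4.38 |
| P1 | 1, 0, −2, −1 | 2, 3, 4, −3 | −1.19 |
| … | … | … | … |

One-sided p-value = proportion of rearrangements whose T is at least the observed value 4.38.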
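Below is a minimal sketch of the exact permutation test for this example. It enumerates all C(8, 4) = 70 ways of assigning four of the eight values to group 1 and uses the pooled-variance two-sample t statistic (group 1 minus group 2). Only the original labeling reaches T = 4.38, so the exact one-sided p-value is 1/70.

```python
import math
from itertools import combinations

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def permutation_p_value(x, y):
    """Exact one-sided p-value: share of all relabelings with T >= observed T."""
    pooled = x + y
    t_obs = pooled_t(x, y)
    n_extreme = n_total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        if pooled_t(gx, gy) >= t_obs - 1e-12:   # tolerance for floating-point ties
            n_extreme += 1
    return t_obs, n_extreme / n_total

t_obs, p = permutation_p_value([1, 2, 3, 4], [0, -1, -2, -3])
print(round(t_obs, 2), p)
```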


Statistical Models for SNP Association Study

| Response variable | Group | Statistical method |
|---|---|---|
| Binary (case-control) | 2 groups | Chi-square test |
| | 2 groups (N<5 per group) | Fisher's exact test |
| | With adjustment covariates | Logistic regression |
| Continuous (BMI, BP, etc.) | 2 groups | T-test |
| | 2 groups (N<5 per group) | Wilcoxon test |
| | 3 groups or more | ANOVA |
| | With adjustment covariates | ANCOVA, regression |


Logistic regression for SNP Association Study

• Diagnostic accuracy and the choice of the control group

• Unit of analysis: SNP, haplotype, or haplotype pair

• Genetic model (e.g., additive or dominant)

• Beware of small cell sizes

• Uncertainty of haplotype estimation
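To show how the genetic-model choice enters a logistic regression, this sketch codes a SNP as additive (0/1/2 copies of M), dominant or recessive, and fits logit P(case) = b0 + b1·x by Newton-Raphson in pure Python. The genotype counts are hypothetical, and in practice a statistics package would be used. For a binary coding (dominant or recessive), exp(b1) equals the 2×2 odds ratio, which serves as a check.

```python
import math

# Genotype coding under common genetic models (M = risk allele, mm = reference).
CODINGS = {
    "additive": {"mm": 0, "Mm": 1, "MM": 2},
    "dominant": {"mm": 0, "Mm": 1, "MM": 1},
    "recessive": {"mm": 0, "Mm": 0, "MM": 1},
}

def expand(counts, model):
    """Turn {(status, genotype): count} into per-person covariate x and outcome y."""
    x, y = [], []
    for (status, genotype), n in counts.items():
        x += [CODINGS[model][genotype]] * n
        y += [status] * n
    return x, y

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson for logit P(case) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                 # score for the intercept
            g1 += (yi - p) * xi          # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # beta += I^{-1} * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical genotype counts: 1 = case, 0 = control.
counts = {(1, "MM"): 30, (1, "Mm"): 50, (1, "mm"): 20,
          (0, "MM"): 20, (0, "Mm"): 50, (0, "mm"): 30}
for model in CODINGS:
    b0, b1 = fit_logistic(*expand(counts, model))
    print(model, round(math.exp(b1), 3))   # odds ratio per unit of the coded genotype
```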


Two-Stage Genotyping Designs for Genome-Wide Association Scans

• Optimal Two-stage Genotyping Design for Genome-Wide Association Scans, Wang et al. Genetic Epidemiology (2006)

• All SNPs are genotyped in the first stage in a fraction of samples.

• A liberal significance level threshold is used to identify a subset of SNPs with putative associations.

• In later stages, these putative associations are re-tested in a separate sample.


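Cost Function

Expected genotyping cost of the overall study:

$$t_1 n_1 m + t_2 n_2 m_2, \qquad E[m_2] = (m - T)\alpha_1 + T(1 - \beta_1)$$

where t1 and t2 are the per-genotype costs in stages 1 and 2, n1 and n2 are the numbers of samples genotyped in each stage, m is the number of markers genotyped in stage 1, m2 is the number of markers carried forward to stage 2, and T is the number of truly causal SNPs.

The goal is to find the minimum expected cost and the corresponding parameters n1, c1, n2, c2 (and thus α1, β1, α2, β2), subject to the two constraints (1) and (2).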
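This sketch only evaluates the expected-cost formula above for one hypothetical design. It does not search for the optimum, because constraints (1) and (2) are not reproduced here. All parameter values (500K SNPs, 10 causal SNPs, 1000 samples per stage, unit costs, α1 = 0.01, β1 = 0.2) are illustrative assumptions.

```python
def expected_m2(m, T, alpha1, beta1):
    """Expected number of SNPs passing stage 1: false positives plus true positives."""
    return (m - T) * alpha1 + T * (1 - beta1)

def expected_cost(t1, n1, t2, n2, m, T, alpha1, beta1):
    """t1 * n1 * m + t2 * n2 * E[m2]."""
    return t1 * n1 * m + t2 * n2 * expected_m2(m, T, alpha1, beta1)

two_stage = expected_cost(1.0, 1000, 1.0, 1000, 500_000, 10, 0.01, 0.2)
one_stage = 1.0 * 2000 * 500_000      # genotyping all 2000 samples on all SNPs
print(two_stage, one_stage)
```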


Public DB


SNP DB (NCBI): http://www.ncbi.nlm.nih.gov/projects/SNP/


Genetic Association Database: http://geneticassociationdb.nih.gov/


HapMap Project: http://www.hapmap.org

Page 84

KSNP database: http://www.ngri.go.kr/SNP

Page 86

Introduction to SAS/Genetics with Implementation Examples

Page 87

data markers;
   /* 5 markers for 25 individuals: each pair of columns (a1-a2, a3-a4, ...) holds one marker's two alleles */
   input (a1-a10) ($);
   datalines;
B B A B B B A A B B
A A B B A B A B C C
B B A A B B B B A C
A B A B A B A B A B
A A A B A B B B C C
B B A A A B A B C C
A B B B A B A A A B
A B A A A A A A A A
B B A A A A A B B B
A B A B A B B B A C
A A A B A A A B B C
B B A B A B A B A C
A B B B A A A B A C
B B B B A A A A A B
A B A A A B A A C C
A B A A A B A B C C
B B A A A A A B A A
A A A B A A A B A B
A B A A A A B B C C
A A A A A A A A B B
A B B B A A A A C C
A B A B A B A A B B
B B A B A B A A A C
A B A A A B A B A C
A B B B B B A B B B
;

proc allele data=markers outstat=ld   /* input and output data sets */
            prefix=Marker
            perms=10000                /* permutations for the HW exact test */
            boot=1000                  /* bootstrap samples for the HWD coefficients */
            seed=123;
   var a1-a10;
run;

proc print data=ld;
run;

Page 88

Marker Summary

Locus    Number of Indiv  Number of Alleles  PIC     Heterozygosity  Allelic Diversity
Marker1  25               2                  0.3714  0.4800          0.4928
Marker2  25               2                  0.3685  0.3600          0.4872
Marker3  25               2                  0.3546  0.4800          0.4608
Marker4  25               2                  0.3648  0.4800          0.4800
Marker5  25               3                  0.5817  0.4400          0.6552

Marker Summary: Test for HWE

Locus    Chi-Square  DF  Pr > ChiSq  Prob Exact
Marker1  0.0169      1   0.8967      1.0000
Marker2  1.7041      1   0.1918      0.2262
Marker3  0.0434      1   0.8350      1.0000
Marker4  0.0000      1   1.0000      1.0000
Marker5  9.3537      3   0.0249      0.0106

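PIC (Polymorphism Information Content): the probability of being able to tell which allele a parent transmitted to its offspring

Heterozygosity: the observed proportion of heterozygous individuals

Allelic Diversity: the proportion of heterozygous individuals expected under HWE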
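As a cross-check of the PROC ALLELE output, the Marker Summary columns and the HWE chi-square can be recomputed from genotype counts. The counts used below are the reported genotype frequencies times 25 (for example, Marker1 is 5 A/A, 12 A/B, 8 B/B). The PIC formula is the usual Botstein et al. (1980) one. That SAS uses exactly this formula is an assumption, although it reproduces the printed values. The exact-test column (Prob Exact) is permutation-based and is not recomputed here.

```python
from itertools import combinations


def marker_summary(genotype_counts):
    """genotype_counts maps an unordered genotype (a1, a2) to its count.
    Returns allele frequencies, observed heterozygosity, allelic diversity
    (expected heterozygosity under HWE) and PIC."""
    n = sum(genotype_counts.values())
    allele_counts = {}
    for (a1, a2), c in genotype_counts.items():
        allele_counts[a1] = allele_counts.get(a1, 0) + c
        allele_counts[a2] = allele_counts.get(a2, 0) + c
    p = {a: c / (2 * n) for a, c in allele_counts.items()}

    het_obs = sum(c for (a1, a2), c in genotype_counts.items() if a1 != a2) / n
    diversity = 1.0 - sum(f * f for f in p.values())
    # PIC = diversity - sum over allele pairs i<j of 2 * p_i^2 * p_j^2
    pic = diversity - sum(2 * p[a] ** 2 * p[b] ** 2
                          for a, b in combinations(sorted(p), 2))
    return p, het_obs, diversity, pic


def hwe_chisq(genotype_counts):
    """Pearson goodness-of-fit statistic against HWE expected counts,
    with k(k-1)/2 degrees of freedom for k alleles."""
    n = sum(genotype_counts.values())
    p, *_ = marker_summary(genotype_counts)
    alleles = sorted(p)
    chisq = 0.0
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            expected = n * (p[a] ** 2 if a == b else 2 * p[a] * p[b])
            observed = genotype_counts.get((a, b), 0)
            if a != b:
                observed += genotype_counts.get((b, a), 0)
            chisq += (observed - expected) ** 2 / expected
    k = len(alleles)
    return chisq, k * (k - 1) // 2


if __name__ == "__main__":
    marker1 = {("A", "A"): 5, ("A", "B"): 12, ("B", "B"): 8}
    marker5 = {("A", "A"): 2, ("A", "B"): 4, ("A", "C"): 6,
               ("B", "B"): 5, ("B", "C"): 1, ("C", "C"): 7}
    for name, counts in (("Marker1", marker1), ("Marker5", marker5)):
        _, het, div, pic = marker_summary(counts)
        chisq, df = hwe_chisq(counts)
        print(name, round(pic, 4), round(het, 4), round(div, 4), round(chisq, 4), df)
```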

Page 89

Allele Frequencies

Locus  Allele  Frequency  Standard Error  95% Confidence Limits (lower, upper)

Marker1 A 0.4400 0.0711 0.3000 0.5800

Marker1 B 0.5600 0.0711 0.4200 0.7000

Marker2 A 0.5800 0.0784 0.4200 0.7400

Marker2 B 0.4200 0.0784 0.2600 0.5800

Marker3 A 0.6400 0.0665 0.5200 0.7600

Marker3 B 0.3600 0.0665 0.2400 0.4800

Marker4 A 0.6000 0.0693 0.4600 0.7400

Marker4 B 0.4000 0.0693 0.2600 0.5400

Marker5 A 0.2800 0.0637 0.1400 0.4200

Marker5 B 0.3000 0.0800 0.1600 0.4600

Marker5 C 0.4200 0.0833 0.2800 0.6000
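The standard errors in this table are consistent with the variance formula Var(p̂_u) = [p_u(1 − p_u) + D_u] / (2n), where D_u = P_uu − p_u² is the within-locus disequilibrium (Weir's formula). For Marker1 allele A this gives sqrt((0.2464 + 0.0064) / 50) ≈ 0.0711. That PROC ALLELE uses exactly this formula is an inference from the matching numbers. The confidence limits fall on multiples of 1/50, which suggests they come from the bootstrap requested by boot=1000, so they are not recomputed here.

```python
import math


def allele_freq_se(genotype_counts, allele):
    """Frequency of `allele` and its standard error sqrt((p(1-p) + D) / (2n)),
    where D = P(allele/allele homozygote) - p**2."""
    n = sum(genotype_counts.values())
    copies = sum(c * ((a1 == allele) + (a2 == allele))
                 for (a1, a2), c in genotype_counts.items())
    p = copies / (2 * n)
    d = genotype_counts.get((allele, allele), 0) / n - p * p
    return p, math.sqrt((p * (1 - p) + d) / (2 * n))


if __name__ == "__main__":
    # Genotype counts = reported genotype frequencies x 25 individuals.
    marker1 = {("A", "A"): 5, ("A", "B"): 12, ("B", "B"): 8}
    marker5 = {("A", "A"): 2, ("A", "B"): 4, ("A", "C"): 6,
               ("B", "B"): 5, ("B", "C"): 1, ("C", "C"): 7}
    print(allele_freq_se(marker1, "A"))
    for a in "ABC":
        print(a, allele_freq_se(marker5, a))
```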

Page 90

Genotype Frequencies

Locus  Genotype  Frequency  HWD Coeff  Standard Error  95% Confidence Limits (lower, upper)

Marker1 A/A 0.2000 0.0064 0.0493 -0.0916 0.0956

Marker1 A/B 0.4800 0.0064 0.0493 -0.0916 0.0956

Marker1 B/B 0.3200 0.0064 0.0493 -0.0916 0.0956

Marker2 A/A 0.4000 0.0636 0.0477 -0.0336 0.1484

Marker2 A/B 0.3600 0.0636 0.0477 -0.0336 0.1484

Marker2 B/B 0.2400 0.0636 0.0477 -0.0336 0.1484

Marker3 A/A 0.4000 -0.0096 0.0457 -0.1044 0.0800

Marker3 A/B 0.4800 -0.0096 0.0457 -0.1044 0.0800

Marker3 B/B 0.1200 -0.0096 0.0457 -0.1044 0.0800

Marker4 A/A 0.3600 0.0000 0.0480 -0.0916 0.0864

Marker4 A/B 0.4800 0.0000 0.0480 -0.0916 0.0864

Marker4 B/B 0.1600 0.0000 0.0480 -0.0916 0.0864

Marker5 A/A 0.0800 0.0016 0.0405 -0.0756 0.0816

Marker5 A/B 0.1600 0.0040 0.0337 -0.0664 0.0636

Marker5 A/C 0.2400 -0.0024 0.0380 -0.0736 0.0680

Marker5 B/B 0.2000 0.1100 0.0445 0.0144 0.1884

Marker5 B/C 0.0400 0.1060 0.0282 0.0440 0.1564

Marker5 C/C 0.2800 0.1036 0.0453 0.0096 0.1884
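The HWD coefficients above match the definitions D_uu = P_uu − p_u² for homozygotes and D_uv = p_u·p_v − P_uv/2 for heterozygotes. For a biallelic marker all three rows therefore share the same value. For Marker5 B/C, for example, 0.30 × 0.42 − 0.04 / 2 = 0.106. The sketch below recomputes the coefficients from genotype counts. The definitions are inferred from the printed numbers, and the standard errors and bootstrap limits are not reproduced.

```python
def hwd_coefficients(genotype_counts):
    """Within-locus disequilibrium coefficients for every genotype u/v:
    D_uu = P_uu - p_u**2 and D_uv = p_u * p_v - P_uv / 2 (u != v)."""
    n = sum(genotype_counts.values())
    freq = {g: c / n for g, c in genotype_counts.items()}
    p = {}
    for (a1, a2), f in freq.items():
        p[a1] = p.get(a1, 0.0) + f / 2
        p[a2] = p.get(a2, 0.0) + f / 2
    coeffs = {}
    alleles = sorted(p)
    for i, u in enumerate(alleles):
        for v in alleles[i:]:
            if u == v:
                coeffs[(u, v)] = freq.get((u, u), 0.0) - p[u] ** 2
            else:
                p_uv = freq.get((u, v), 0.0) + freq.get((v, u), 0.0)
                coeffs[(u, v)] = p[u] * p[v] - p_uv / 2
    return coeffs


if __name__ == "__main__":
    marker5 = {("A", "A"): 2, ("A", "B"): 4, ("A", "C"): 6,
               ("B", "B"): 5, ("B", "C"): 1, ("C", "C"): 7}
    for genotype, d in hwd_coefficients(marker5).items():
        print("/".join(genotype), round(d, 4))
```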

Page 91

data markers;

input (g1-g5) ($);

datalines;

B/B A/B B/B A/A B/B

A/A B/B A/B A/B C/C

B/B A/A B/B B/B A/C

A/B A/B A/B A/B A/B

A/A A/B A/B B/B C/C

B/B A/A A/B A/B C/C

A/B B/B A/B A/A A/B

A/B A/A A/A A/A A/A

B/B A/A A/A A/B B/B

……

;

proc allele data=markers outstat=ld prefix=Marker

perms=10000 boot=1000 seed=123

genocol delimiter='/';

var g1-g5;

run;

Page 92

data snps;

input s1-s10;

datalines;

2 2 2 1 2 1 1 1 2 2

2 2 2 2 2 1 1 1 2 2

2 2 2 2 2 1 2 1 2 2

2 2 2 2 . . 1 1 2 2

2 2 2 2 1 2 1 2 2 2

2 2 2 2 . . 2 1 2 2

2 2 2 2 2 1 2 1 2 2

2 2 2 2 . . 2 1 2 2

2 2 2 2 1 1 1 1 2 2

2 2 1 1 2 2 2 1 2 2

2 2 2 1 2 2 2 1 2 2

2 2 2 2 1 1 1 1 2 2

2 2 2 1 2 2 2 2 2 2

2 2 2 2 2 2 1 1 2 2

2 2 2 2 2 1 2 1 2 2

2 2 2 2 2 2 2 2 2 2

2 2 2 2 2 2 2 1 2 2

……

;

proc allele data=snps prefix=SNP

nofreq haplo=est corrcoeff dprime yulesq;

var s1-s10;

run;

Page 93

The ALLELE Procedure

Marker Summary

Locus  Number of Indiv  Number of Alleles  PIC  Heterozygosity  Allelic Diversity

SNP1 44 1 0.0000 0.0000 0.0000

SNP2 44 2 0.1190 0.0909 0.1271

SNP3 41 2 0.3283 0.4390 0.4140

SNP4 43 2 0.3728 0.4884 0.4957

SNP5 44 1 0.0000 0.0000 0.0000

Marker Summary

Test for HWE

Locus  Chi-Square  DF  Pr > ChiSq

SNP1 0.0000 0 .

SNP2 3.5627 1 0.0591

SNP3 0.1493 1 0.6992

SNP4 0.0093 1 0.9231

SNP5 0.0000 0 .

Page 94

Linkage Disequilibrium Measures

Locus1  Locus2  Haplotype  Frequency  LD Coeff  Corr Coeff

SNP1 SNP2 2-1 0.0682 -0.0000 .

SNP1 SNP2 2-2 0.9318 -0.0000 .

SNP1 SNP3 2-1 0.2927 -0.0000 .

SNP1 SNP3 2-2 0.7073 -0.0000 .

SNP1 SNP4 2-1 0.5465 -0.0000 .

SNP1 SNP4 2-2 0.4535 -0.0000 .

SNP1 SNP5 2-2 1.0000 0.0000 .

SNP2 SNP3 1-2 0.0732 0.0214 0.1807

SNP2 SNP3 2-1 0.2927 0.0214 0.1807

SNP2 SNP3 2-2 0.6341 -0.0214 -0.1807

SNP2 SNP4 1-1 0.0331 -0.0050 -0.0398

SNP2 SNP4 1-2 0.0367 0.0050 0.0398

SNP2 SNP4 2-1 0.5134 0.0050 0.0398

Locus1  Locus2  Lewontin's D'  Yule's Q

SNP1 SNP2 . .

SNP1 SNP2 . .

SNP1 SNP3 . .

SNP1 SNP3 . .

SNP1 SNP4 . .

SNP1 SNP4 . .

SNP1 SNP5 . .

SNP2 SNP3 1.0000 1.0000

SNP2 SNP3 1.0000 1.0000

SNP2 SNP3 -1.0000 -1.0000

SNP2 SNP4 -0.1322 -0.1546

SNP2 SNP4 0.1322 0.1546

SNP2 SNP4 0.1322 0.1546

Page 95

Linkage Disequilibrium Measures

Locus1  Locus2  Haplotype  Frequency  LD Coeff  Corr Coeff

SNP2 SNP4 2-2 0.4168 -0.0050 -0.0398

SNP2 SNP5 1-2 0.0682 0.0000 .

SNP2 SNP5 2-2 0.9318 0.0000 .

SNP3 SNP4 1-1 0.2221 0.0608 0.2661

SNP3 SNP4 1-2 0.0779 -0.0608 -0.2661

SNP3 SNP4 2-1 0.3154 -0.0608 -0.2661

SNP3 SNP4 2-2 0.3846 0.0608 0.2661

SNP3 SNP5 1-2 0.2927 0.0000 .

SNP3 SNP5 2-2 0.7073 0.0000 .

SNP4 SNP5 1-2 0.5465 0.0000 .

SNP4 SNP5 2-2 0.4535 0.0000 .

Linkage Disequilibrium Measures

Locus1  Locus2  Lewontin's D'  Yule's Q

SNP2 SNP4 -0.1322 -0.1546

SNP2 SNP5 . .

SNP2 SNP5 . .

SNP3 SNP4 0.4382 0.5529

SNP3 SNP4 -0.4382 -0.5529

SNP3 SNP4 -0.4382 -0.5529

SNP3 SNP4 0.4382 0.5529

SNP3 SNP5 . .

SNP3 SNP5 . .

SNP4 SNP5 . .

SNP4 SNP5 . .
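The LD measures in these tables follow from the four two-locus haplotype frequencies. D = p11 − pA·pB, r = D / sqrt(pA(1−pA)pB(1−pB)), Lewontin's D' = D / Dmax, and Yule's Q = (p11·p22 − p12·p21) / (p11·p22 + p12·p21). The sketch below applies these standard definitions to the SNP3 to SNP4 haplotype frequencies. It reproduces the printed 0.0608, 0.2661, 0.4382 and 0.5529 up to the rounding of the four input frequencies. The function name is illustrative.

```python
import math


def ld_measures(p11, p12, p21, p22):
    """Pairwise LD for haplotype 1-1 from the four haplotype frequencies
    (alleles 1/2 at locus 1 crossed with alleles 1/2 at locus 2).
    Returns (D, r, Lewontin's D', Yule's Q)."""
    pA = p11 + p12          # frequency of allele 1 at locus 1
    pB = p11 + p21          # frequency of allele 1 at locus 2
    d = p11 - pA * pB
    r = d / math.sqrt(pA * (1 - pA) * pB * (1 - pB))
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(pA * (1 - pB), (1 - pA) * pB)
    else:
        d_max = min(pA * pB, (1 - pA) * (1 - pB))
    q = (p11 * p22 - p12 * p21) / (p11 * p22 + p12 * p21)
    return d, r, d / d_max, q


if __name__ == "__main__":
    # SNP3-SNP4 haplotype frequencies 1-1, 1-2, 2-1, 2-2 from the table above.
    print(ld_measures(0.2221, 0.0779, 0.3154, 0.3846))
```

The signs for the other three haplotypes (1-2, 2-1, 2-2) are obtained by flipping which allele is called "1" at one or both loci, which is why the table repeats the same magnitudes with alternating signs.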

Page 96

data markers;

input (m1-m8) ($);

datalines;

B B A B B B A A

A A B B A B A B

B B A A B B B B

A B A B A B A B

A A A B A B B B

B B A A A B A B

A B B B A B A A

A B A A A A A A

B B A A A A A B

A B A B A B B B

A B A B A B A A

B B A B A B A A

A B A A A B A B

A B B B B B A B

A A A B A A A B

B B A B A B A B

A B B B A A A B

B B B B A A A A

A B A A A B A A

A B A A A B A B

B B A A A A A B

A A A B A A A B

A B A A A A B B

A A A A A A A A

A B B B A A A A

;

proc haplotype data=markers out=hapout

init=random prefix=SNP;

var m1-m8;

run;

Page 97

The HAPLOTYPE Procedure

Analysis Information

Loci Used SNP1 SNP2 SNP3 SNP4

Number of Individuals 25

Number of Starts 1

Convergence Criterion 0.00001

Iterations Checked for Conv. 1

Maximum Number of Iterations 100

Number of Iterations Used 24

Log Likelihood -97.62955

Initialization Method Random

Random Number Seed 520781000

Standard Error Method Binomial

Haplotype Frequency Cutoff 0

Algorithm converged.

Haplotype Frequencies

Number  Haplotype  Freq  Standard Error  95% Confidence Limits (lower, upper)

1 A-A-A-A 0.16312 0.05278 0.05967 0.26657

2 A-A-A-B 0.02642 0.02291 0.00000 0.07132

3 A-A-B-A 0.00000 0.00000 0.00000 0.00001

4 A-A-B-B 0.02655 0.02297 0.00000 0.07157

5 A-B-A-A 0.02942 0.02414 0.00000 0.07673

6 A-B-A-B 0.12429 0.04713 0.03192 0.21667

7 A-B-B-A 0.06964 0.03636 0.00000 0.14091

8 A-B-B-B 0.00056 0.00339 0.00000 0.00720

9 B-A-A-A 0.09444 0.04178 0.01256 0.17632

10 B-A-A-B 0.07297 0.03715 0.00015 0.14579

11 B-A-B-A 0.07333 0.03724 0.00034 0.14632

12 B-A-B-B 0.12317 0.04695 0.03116 0.21519

13 B-B-A-A 0.12935 0.04794 0.03539 0.22331

14 B-B-A-B 0.00000 0.00000 0.00000 0.00000

15 B-B-B-A 0.04071 0.02823 0.00000 0.09603

16 B-B-B-B 0.02603 0.02275 0.00000 0.07061
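PROC HAPLOTYPE estimates these frequencies from unphased genotypes by an iterative EM fit, which is what the convergence criterion, iteration count and log likelihood in the output report. The sketch below is a textbook EM for haplotype frequencies under HWE, not SAS's implementation. It starts from equal frequencies rather than the random start used above, and it has no standard errors. The function names and the three-person toy data set are hypothetical. With the toy data, the ambiguous double heterozygote is resolved toward A-B/a-b because those haplotypes are observed in the phase-known individuals.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.
    genotype: list of (allele1, allele2) tuples, one per locus;
    haplotypes are written like the SAS output, e.g. "A-B-A-A"."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = []
    # Keep the first heterozygous locus fixed so each unordered pair appears once.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        swap = {i for i, f in zip(het[1:], flips) if f}
        h1 = "-".join(b if i in swap else a for i, (a, b) in enumerate(genotype))
        h2 = "-".join(a if i in swap else b for i, (a, b) in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, tol=1e-8, max_iter=1000):
    """Maximum-likelihood haplotype frequencies under HWE via EM."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: weight of each compatible pair given current frequencies.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haplotypes}
        converged = max(abs(new[h] - freq[h]) for h in haplotypes) < tol
        freq = new
        if converged:
            break
    return freq


if __name__ == "__main__":
    toy = [
        [("A", "A"), ("B", "B")],   # phase known: A-B / A-B
        [("a", "a"), ("b", "b")],   # phase known: a-b / a-b
        [("A", "a"), ("B", "b")],   # ambiguous: A-B/a-b or A-b/a-B
    ]
    print(em_haplotype_freqs(toy))
```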

Page 98

proc print data=hapout noobs round; run;

_ID_ m1 m2 m3 m4 m5 m6 m7 m8 HAPLOTYPE1 HAPLOTYPE2 PROB

1 B B A B B B A A B-A-B-A B-B-B-A 1.00

2 A A B B A B A B A-B-A-A A-B-B-B 0.00

2 A A B B A B A B A-B-A-B A-B-B-A 1.00

3 B B A A B B B B B-A-B-B B-A-B-B 1.00

4 A B A B A B A B A-A-A-A B-B-B-B 0.16

4 A B A B A B A B A-A-A-B B-B-B-A 0.04

4 A B A B A B A B A-A-B-B B-B-A-A 0.13

4 A B A B A B A B A-B-A-A B-A-B-B 0.14

4 A B A B A B A B A-B-A-B B-A-B-A 0.34

4 A B A B A B A B A-B-B-A B-A-A-B 0.19

4 A B A B A B A B A-B-B-B B-A-A-A 0.00

5 A A A B A B B B A-A-A-B A-B-B-B 0.00

5 A A A B A B B B A-A-B-B A-B-A-B 1.00

6 B B A A A B A B B-A-A-A B-A-B-B 0.68

6 B B A A A B A B B-A-A-B B-A-B-A 0.32

7 A B B B A B A A A-B-A-A B-B-B-A 0.12

7 A B B B A B A A A-B-B-A B-B-A-A 0.88

8 A B A A A A A A A-A-A-A B-A-A-A 1.00

9 B B A A A A A B B-A-A-A B-A-A-B 1.00

10 A B A B A B B B A-A-A-B B-B-B-B 0.04

10 A B A B A B B B A-B-A-B B-A-B-B 0.95

10 A B A B A B B B A-B-B-B B-A-A-B 0.00

Page 99

11 A B A B A B A A A-A-A-A B-B-B-A 0.43

11 A B A B A B A A A-B-A-A B-A-B-A 0.14

11 A B A B A B A A A-B-B-A B-A-A-A 0.43

12 B B A B A B A A B-A-A-A B-B-B-A 0.29

12 B B A B A B A A B-A-B-A B-B-A-A 0.71

13 A B A A A B A B A-A-A-A B-A-B-B 0.82

13 A B A A A B A B A-A-A-B B-A-B-A 0.08

13 A B A A A B A B A-A-B-B B-A-A-A 0.10

14 A B B B B B A B A-B-B-A B-B-B-B 0.99

14 A B B B B B A B A-B-B-B B-B-B-A 0.01

15 A A A B A A A B A-A-A-A A-B-A-B 0.96

15 A A A B A A A B A-A-A-B A-B-A-A 0.04

16 B B A B A B A B B-A-A-A B-B-B-B 0.12

16 B B A B A B A B B-A-A-B B-B-B-A 0.14

16 B B A B A B A B B-A-B-B B-B-A-A 0.75

17 A B B B A A A B A-B-A-B B-B-A-A 1.00

18 B B B B A A A A B-B-A-A B-B-A-A 1.00

19 A B A A A B A A A-A-A-A B-A-B-A 1.00

20 A B A A A B A B A-A-A-A B-A-B-B 0.82

20 A B A A A B A B A-A-A-B B-A-B-A 0.08

20 A B A A A B A B A-A-B-B B-A-A-A 0.10

21 B B A A A A A B B-A-A-A B-A-A-B 1.00

22 A A A B A A A B A-A-A-A A-B-A-B 0.96

22 A A A B A A A B A-A-A-B A-B-A-A 0.04

23 A B A A A A B B A-A-A-B B-A-A-B 1.00
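The PROB column of the OUT= data set is the posterior probability of each haplotype pair for an individual, given the estimated frequencies and HWE. Each pair (h1, h2) gets weight 2·f(h1)·f(h2), or f(h)² when both are the same, and the weights are normalized within the individual. The sketch below plugs in the frequencies printed in the Haplotype Frequencies table. For individual 4 (heterozygous at all four SNPs) it gives 0.16, 0.04, 0.13, 0.14, 0.34, 0.19 and 0.00, and for individual 6 it gives 0.68 and 0.32, matching the listing. The function name is illustrative.

```python
from itertools import product

# Haplotype frequencies as printed in the PROC HAPLOTYPE output (4 SNPs).
FREQ = {
    "A-A-A-A": 0.16312, "A-A-A-B": 0.02642, "A-A-B-A": 0.00000, "A-A-B-B": 0.02655,
    "A-B-A-A": 0.02942, "A-B-A-B": 0.12429, "A-B-B-A": 0.06964, "A-B-B-B": 0.00056,
    "B-A-A-A": 0.09444, "B-A-A-B": 0.07297, "B-A-B-A": 0.07333, "B-A-B-B": 0.12317,
    "B-B-A-A": 0.12935, "B-B-A-B": 0.00000, "B-B-B-A": 0.04071, "B-B-B-B": 0.02603,
}


def phase_posteriors(genotype, freq):
    """Posterior probability of each haplotype pair consistent with an
    unphased genotype (list of (allele1, allele2) per locus), assuming HWE."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = []
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        swap = {i for i, f in zip(het[1:], flips) if f}
        h1 = "-".join(b if i in swap else a for i, (a, b) in enumerate(genotype))
        h2 = "-".join(a if i in swap else b for i, (a, b) in enumerate(genotype))
        pairs.append((h1, h2))
    weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}


if __name__ == "__main__":
    # Individual 4: m1-m8 = A B A B A B A B (heterozygous at every SNP).
    for pair, prob in phase_posteriors([("A", "B")] * 4, FREQ).items():
        print(pair, round(prob, 2))
```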

Page 100

Haplotype Trend Regression (Zaykin)

data alleles;
   input (a1-a6) ($) disease;
   datalines;
A a B B c C 1
A A B b c C 1
a A B b c c 0
A A B B c C 1
A A b B c C 1
A A B b C c 0
A a b B C c 1
A A b B C c 1
A a B B c c 1
a a B b c c 0
A A B B C C 1
A A B B c c 1
a A b b c c 0
A A B B c c 1
A A b b c c 0
A A b B c C 0
A A B b c C 1
A a b B c c 1
A a B B c C 1
A A b b C C 0
A A B B C C 1
A A b B C c 1
A A b B c C 1
a A B b C c 0
A a B B C C 0
A A B B C c 1
A A B b C c 0
A A B B c C 1
a A B b C C 1
A a B b C c 1
A A B b c C 1
A a B B c c 1
A A B b C c 1
a A B b C c 1
A A B b C C 1
A a B B C C 1
a A B b C c 0
a A b B C C 0
A A B b c C 1
a A B b c c 0
A A B B C C 0
A A B B c c 1
A a B B C c 1
;

proc haplotype data=alleles out=out outid;
   var a1-a6;
   trait disease;
   id disease;
run;

Page 101

The HAPLOTYPE Procedure

Analysis Information

Loci Used M1 M2 M3

Number of Individuals 43

Number of Starts 1

Convergence Criterion 0.00001

Iterations Checked for Conv. 1

Maximum Number of Iterations 100

Number of Iterations Used 31

Log Likelihood -115.48338

Initialization Method Linkage Equilibrium

Standard Error Method Binomial

Haplotype Frequency Cutoff 0

Algorithm converged.

Haplotype Frequencies

Number  Haplotype  Freq  Standard Error  95% Confidence Limits (lower, upper)

1 A-B-C 0.26144 0.04766 0.16802 0.35485

2 A-B-c 0.22524 0.04531 0.13643 0.31405

3 A-b-C 0.13806 0.03742 0.06473 0.21140

4 A-b-c 0.14270 0.03794 0.06834 0.21706

5 a-B-C 0.07662 0.02885 0.02007 0.13316

6 a-B-c 0.08787 0.03071 0.02768 0.14805

7 a-b-C 0.00063 0.00271 0.00000 0.00595

8 a-b-c 0.06745 0.02720 0.01413 0.12076

Test for Marker-Trait Association

Trait Number  Trait Value  Num Obs  DF  LogLike     Chi-Square  Pr > ChiSq
1             1            29       7   -68.11558
2             0            14       7   -37.28544
Combined                   43       7   -115.48338  20.1647     0.0052
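The marker-trait chi-square is the likelihood-ratio statistic comparing separate haplotype-frequency fits in cases and controls against one pooled fit: 2[(−68.11558) + (−37.28544) − (−115.48338)] = 20.1647. This is referred to a chi-square distribution with 7 df, which is consistent with 8 haplotypes minus 1. The sketch below recomputes the statistic and its p-value. To stay within the standard library it uses a hand-rolled incomplete-gamma series for the chi-square tail, which is adequate for moderate statistics. In practice `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, computed from the
    series for the regularized lower incomplete gamma P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def haplotype_trait_lrt(loglik_by_group, loglik_combined, df):
    """Likelihood-ratio test: separate fits per trait group vs one pooled fit."""
    chisq = 2.0 * (sum(loglik_by_group) - loglik_combined)
    return chisq, chi2_sf(chisq, df)


if __name__ == "__main__":
    chisq, p = haplotype_trait_lrt([-68.11558, -37.28544], -115.48338, df=7)
    print(f"chi-square = {chisq:.4f}, p = {p:.4f}")
```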

Page 102

proc print data=out;

run;

OBS _ID_ disease a1 a2 a3 a4 a5 a6 HAPLOTYPE1 HAPLOTYPE2 PROB

1 1 1 A a B B c C A-B-C a-B-c 0.57103

2 1 1 A a B B c C A-B-c a-B-C 0.42897

3 2 1 A A B b c C A-B-C A-b-c 0.54538

4 2 1 A A B b c C A-B-c A-b-C 0.45462

5 3 0 a A B b c c A-B-c a-b-c 0.54783

6 3 0 a A B b c c A-b-c a-B-c 0.45217

7 4 1 A A B B c C A-B-C A-B-c 1.00000

8 5 1 A A b B c C A-B-C A-b-c 0.54538

9 5 1 A A b B c C A-B-c A-b-C 0.45462

10 6 0 A A B b C c A-B-C A-b-c 0.54538

11 6 0 A A B b C c A-B-c A-b-C 0.45462

12 7 1 A a b B C c A-B-C a-b-c 0.43177

13 7 1 A a b B C c A-B-c a-b-C 0.00346

14 7 1 A a b B C c A-b-C a-B-c 0.29706

Page 103

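data out1; set out; haplotype = tranwrd(haplotype1, '-', '_'); run;
data out2; set out; haplotype = tranwrd(haplotype2, '-', '_'); run;

/* stack both haplotypes of every candidate pair and sort by haplotype */
data outnew; set out1 out2; run;
proc sort data=outnew; by haplotype; run;

/* number the distinct haplotypes H1, H2, ... in sorted order */
data outnew2;
   set outnew;
   lagh = lag(haplotype);
   if haplotype ne lagh then num + 1;
   hapname = compress("H" || num, ' ');
run;
proc sort data=outnew2; by _id_ hapname; run;

/* haplotype dosage per individual: sum of PROB/2 over the pairs containing it */
data outt;
   set outnew2;
   by _id_ haplotype;
   if first.haplotype then totprob = prob/2;
   else totprob + prob/2;
   if last.haplotype;
run;

/* one row per individual, one column (H1-H8) per haplotype; ID must be a separate statement */
proc transpose data=outt out=outreg(drop=_NAME_);
   by _id_ disease;
   id hapname;
   idlabel haplotype;
   var totprob;
run;

/* haplotypes incompatible with an individual's genotype get dosage 0 */
data htr;
   set outreg;
   array h{8};
   do i = 1 to 8;
      if h{i} = . then h{i} = 0;
   end;
   keep _id_ disease h1-h8;
run;

proc print data=htr; *noobs round label; run;

/* h1-h8 sum to 1 for every individual, so they cannot all enter together; stepwise keeps a subset */
proc logistic data=htr descending;
   model disease = h1-h8 / selection=stepwise;
run;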
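The DATA steps above turn the OUT= data set into haplotype "dosages" (the TOTPROB values). For each individual, every candidate pair (HAPLOTYPE1, HAPLOTYPE2, PROB) contributes PROB/2 to each of its two haplotypes, so an individual's dosages sum to 1. The same bookkeeping is sketched in Python below, with a helper that fills incompatible haplotypes with 0 as the `data htr` step does. The function names are illustrative. For individual 1 (pairs with PROB 0.57103 and 0.42897) it gives 0.29, 0.21, 0.21 and 0.29, as in the printed table.

```python
from collections import defaultdict


def haplotype_dosages(pairs):
    """pairs: (haplotype1, haplotype2, prob) rows for one individual.
    Each pair contributes prob/2 to each of its two haplotypes."""
    dose = defaultdict(float)
    for h1, h2, prob in pairs:
        dose[h1] += prob / 2
        dose[h2] += prob / 2
    return dict(dose)


def design_row(dose, haplotypes):
    """Regression covariates in a fixed haplotype order, 0 where incompatible."""
    return [dose.get(h, 0.0) for h in haplotypes]


if __name__ == "__main__":
    hap_order = ["A-B-C", "A-B-c", "A-b-C", "A-b-c", "a-B-C", "a-B-c", "a-b-C", "a-b-c"]
    ind1 = [("A-B-C", "a-B-c", 0.57103), ("A-B-c", "a-B-C", 0.42897)]
    print([round(x, 2) for x in design_row(haplotype_dosages(ind1), hap_order)])
```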

Page 104

ID  disease  A_B_C  A_B_c  a_B_C  a_B_c  A_b_C  A_b_c  a_b_c  a_b_C
(variable)   H1     H2     H5     H6     H3     H4     H8     H7
1   1        0.29   0.21   0.21   0.29   0.00   0.00   0.00   0.00
2   1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
3   0        0.00   0.27   0.00   0.23   0.00   0.23   0.27   0.00
4   1        0.50   0.50   0.00   0.00   0.00   0.00   0.00   0.00
5   1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
6   0        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
7   1        0.22   0.00   0.13   0.15   0.15   0.13   0.22   0.00
8   1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
9   1        0.00   0.50   0.00   0.50   0.00   0.00   0.00   0.00
10  0        0.00   0.00   0.00   0.50   0.00   0.00   0.50   0.00
11  1        1.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
12  1        0.00   1.00   0.00   0.00   0.00   0.00   0.00   0.00
13  0        0.00   0.00   0.00   0.00   0.00   0.50   0.50   0.00
14  1        0.00   1.00   0.00   0.00   0.00   0.00   0.00   0.00
15  0        0.00   0.00   0.00   0.00   0.00   1.00   0.00   0.00
16  0        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
17  1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
18  1        0.00   0.27   0.00   0.23   0.00   0.23   0.27   0.00
19  1        0.29   0.21   0.21   0.29   0.00   0.00   0.00   0.00
20  0        0.00   0.00   0.00   0.00   1.00   0.00   0.00   0.00
21  1        1.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
22  1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
23  1        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
24  0        0.22   0.00   0.13   0.15   0.15   0.13   0.22   0.00
25  0        0.50   0.00   0.50   0.00   0.00   0.00   0.00   0.00
26  1        0.50   0.50   0.00   0.00   0.00   0.00   0.00   0.00
27  0        0.27   0.23   0.00   0.00   0.23   0.27   0.00   0.00
28  1        0.50   0.50   0.00   0.00   0.00   0.00   0.00   0.00
29  1        0.01   0.00   0.49   0.00   0.49   0.00   0.00   0.01

Page 105

Testing Global Null Hypothesis: BETA=0

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  6.1962      1   0.0128
Score             6.3995      1   0.0114
Wald              4.9675      1   0.0258

Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  1   1.1986    0.4058          8.7224           0.0031
H8         1   -6.3249   2.8378          4.9675           0.0258
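Stepwise selection retained only H8 (haplotype a_b_c, per the column labels). Its Wald chi-square is (estimate / SE)², here (−6.3249 / 2.8378)² ≈ 4.9675. The odds ratio is exp(estimate). Because the dosages are on a 0 to 1 scale, exp(β) compares an individual carrying a-b-c on both chromosomes with one carrying it on neither. The sketch below recomputes these quantities and a 95% Wald interval from the printed estimate and standard error. The function name is illustrative, and the interval is not part of the SAS output shown.

```python
import math


def wald_summary(estimate, se, z_crit=1.959963984540054):
    """Wald chi-square, two-sided p-value, odds ratio and 95% Wald CI
    for a logistic-regression coefficient."""
    z = estimate / se
    chisq = z * z
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail = chi-square(1) tail
    odds_ratio = math.exp(estimate)
    ci = (math.exp(estimate - z_crit * se), math.exp(estimate + z_crit * se))
    return chisq, p, odds_ratio, ci


if __name__ == "__main__":
    chisq, p, odds_ratio, ci = wald_summary(-6.3249, 2.8378)
    print(f"Wald = {chisq:.4f}, p = {p:.4f}, OR = {odds_ratio:.5f}, 95% CI = {ci}")
```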
