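Introduction to the Basic Concepts of Statistical Genetics: An Introduction to Genomics for Non-Specialists. Ho Kim, Graduate School of Public Health, Seoul National University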


Page 1: 비전공자를위한 유전체학의소개hosting03.snu.ac.kr/~hokim/seminar/genestat20041105.pdf · 2004-11-12 · HMA10K-based analysis detected two chromosomal regions with genomewidesignificant

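Introduction to the Basic Concepts of Statistical Genetics: An Introduction to Genomics for Non-Specialists

Ho Kim

Graduate School of Public Health, Seoul National University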


Key Concepts of This Talk


[Diagram: recombination shown as a stochastic step at each transmission, ending in child 1, child 2, …]


Contents

A. Introduction

B. Basic Concepts in Population (Statistical) Genetics

C. Statistical Tools for Genetics and Genomics for Health Research

D. Examples & Discussion


A. Introduction

• Basic Biology

• DNA

• Gene

• Genome

• Chromosome


What is a genome?

• A person’s genome is the complete DNA sequence of their chromosomes

• Each person has a unique genome

• The human genome project provides a reference sequence for the human genome based on ~5 individuals

• How the genome is expressed in a cell determines the cell’s size, shape, and function


What do we know about the human genome?

• The human genome contains ~3.1 billion DNA bases.

• Almost all DNA bases (99.9%) are exactly the same in all people.

• However, we still differ from one another at millions of DNA bases.

• The size of genes varies greatly.


What does the human genome sequence tell us?

• Less than 2% of the genome codes for genes.

• Some areas of the genome are gene-rich and some are gene-poor.

• Gene-rich areas have an abundance of Gs and Cs; gene-poor areas have an abundance of As and Ts.

• The placement of gene-rich areas in the genome appears random.

• Chromosome 1 has the most genes (2968).

• The Y chromosome has the fewest genes (231).


What is DNA?

• The double helix structure of DNA makes it very stable.


What is a gene?

• A gene is a DNA sequence that contains the coding instructions for making a particular protein.

• The average gene is ~3000 bases long.

• Some of the DNA sequence of the gene helps regulate the expression of the gene in our cells.


Synonymous SNP (Single Nucleotide Polymorphism)


Chromosomal Banding Pattern

• Chromosomes condense during cell division.

• Chromosomes are numbered according to their size.

• The banding pattern of each chromosome is unique.

• Each band contains millions of DNA nucleotides.

• Light bands correspond to areas rich in Gs and Cs.

• Dark bands correspond to areas rich in As and Ts.


International System

• Short arms are labeled ‘p’ (petit).

• Long arms are labeled ‘q’ (queue).

• Chromosome bands are labeled ‘p11’, ‘p12’, etc., like zip codes.

• The terminal ends of the chromosomes are labeled ‘ter’.

• The point where the arms meet in the center is called the ‘centromere’.


What do we know about the human genome?

• The human genome contains ~3.1 billion DNA bases.

• Almost all DNA bases (99.9%) are exactly the same in all people.

• However, we still differ from one another at millions of DNA bases.

• The size of genes varies greatly.

• The largest known human gene is dystrophin, at 2.4 million bases.

• The total number of genes is estimated at 30,000 to 40,000.


B. Basic Concepts in Population (Statistical) Genetics

• Terminology (Genotype, Allele, Haplotype, Hardy-Weinberg Equilibrium, Linkage Disequilibrium)

• SNP and haplotype Estimation


Genotype


Allele frequency


Genotype frequency


Hardy-Weinberg

In a stable population with random mating, allele frequencies predict genotype frequencies.

A goodness-of-fit test can be applied to test Hardy-Weinberg equilibrium.
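The goodness-of-fit idea can be sketched as follows for a single biallelic locus. This is only an illustration: the function name and the genotype counts are made up for the example and do not come from the talk, and the sketch assumes both alleles are present so that no expected count is zero.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium.

    Returns the estimated frequency of allele A, the expected genotype
    counts under HWE, and the chi-square statistic (1 degree of freedom).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # allele frequency of A from genotype counts
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


# Hypothetical counts that sit exactly at HWE proportions (p = 0.6).
p_eq, exp_eq, chi2_eq = hwe_chi_square(36, 48, 16)

# Hypothetical counts with the same allele frequency but too few heterozygotes.
p_dev, exp_dev, chi2_dev = hwe_chi_square(50, 20, 30)
```

The statistic is compared with a chi-square distribution with 1 degree of freedom (5% critical value 3.84), so the second set of counts would be judged out of equilibrium.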


Linkage Disequilibrium

Alleles at different sites should occur in combinations in proportion to their SNP allele frequencies; linkage disequilibrium is a departure from this expectation.
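As a small illustration of this definition, the sketch below computes two common LD summaries for a pair of biallelic sites from haplotype counts: D, the difference between the observed haplotype frequency and the product of allele frequencies, and r². The function name, the allele labels A/a and B/b, and the counts are assumptions made for the example, not material from the talk.

```python
def ld_stats(hap_counts):
    """Return (D, r2) from counts of the four two-site haplotypes AB, Ab, aB, ab."""
    n = sum(hap_counts.values())
    p_AB = hap_counts["AB"] / n
    p_A = (hap_counts["AB"] + hap_counts["Ab"]) / n
    p_B = (hap_counts["AB"] + hap_counts["aB"]) / n
    D = p_AB - p_A * p_B                       # zero under linkage equilibrium
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2


# Haplotypes occurring exactly in proportion to allele frequencies: no LD.
D_eq, r2_eq = ld_stats({"AB": 24, "Ab": 36, "aB": 16, "ab": 24})

# Only two of the four haplotypes observed: complete LD.
D_full, r2_full = ld_stats({"AB": 50, "Ab": 0, "aB": 0, "ab": 50})
```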


LD Block


Shaw et al. Am J of Medical Genet 114 205-213 (2002)


SNPs (pronounced “snips”)


SNPs as DNA Landmarks

• Help in DNA sequencing

• Help in the discovery of genes responsible for many major diseases:

– asthma, diabetes, heart disease, schizophrenia and cancer among others


From SNP to Haplotype

No.   DNA sequence        Haplotype   Phenotype
1     GATATTCGTACGGA-T    AG-         Black eye
2     GATGTTCGTACTGAAT    GTA         Brown eye
3     GATATTCGTACGGA-T    AG-         Black eye
4     GATATTCGTACGGAAT    AGA         Blue eye
5     GATGTTCGTACTGAAT    GTA         Brown eye
6     GATGTTCGTACTGAAT    GTA         Brown eye

Variable positions: 4 and 12 (SNPs) and 15 (base present or deleted)

Haplotypes: AG- 2/6, GTA 3/6, AGA 1/6
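The step from aligned sequences to haplotypes on this slide can be reproduced with a short sketch. The six 16-base sequences are taken from the slide's sequence string; the variable names are illustrative.

```python
from collections import Counter

# The six aligned sequences from the slide ('-' marks a deleted base).
sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]

# A column is variable if more than one character occurs in it (0-based index).
variable_columns = [
    i for i in range(len(sequences[0]))
    if len({seq[i] for seq in sequences}) > 1
]

# A haplotype is the string of alleles at the variable columns only.
haplotypes = ["".join(seq[i] for i in variable_columns) for seq in sequences]
haplotype_counts = Counter(haplotypes)
```

The counts come out as AG- twice, GTA three times, and AGA once, matching the frequencies 2/6, 3/6, and 1/6 on the slide.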


In-silico Haplotyping: Approaches

1) Clark’s algorithm

2) E-M algorithm (expectation-maximization algorithm)

3) Bayesian algorithm


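Clark’s Algorithm

1) Find individuals who are homozygous at every locus or heterozygous at only one locus

Homozygous at every locus:

SNP1 T T

SNP2 A A

SNP3 C C

Haplotypes: T-A-C and T-A-C

Heterozygous at one locus:

SNP1 T T

SNP2 A A

SNP3 C G

Haplotypes: T-A-C and T-A-G

In both cases the haplotypes are unambiguously defined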


Clark’s Algorithm

2) Try to resolve each ambiguous genotype as a combination of an already solved haplotype and its complement

SNP1 A T

SNP2 A A

SNP3 C G

T-A-C : solved one

A-A-G : the complement, added to the solved set

Continue until either all haplotypes have been solved or no more haplotypes can be found in this way


Clark’s Algorithm: problems

• No homozygotes or single-SNP heterozygotes -> the chain might never get started

• Many unsolved haplotypes may be left at the end

• Still quite useful in practice!
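The two steps of Clark's algorithm, and the way the chain can fail to start, can be sketched as below. This is a simplified illustration rather than a reference implementation: genotypes are lists of unordered allele pairs (one pair per SNP), and the function names and the rule of trying known haplotypes in order of discovery are assumptions of the sketch.

```python
def n_het(genotype):
    """Number of heterozygous sites in a genotype given as (allele, allele) pairs."""
    return sum(a != b for a, b in genotype)


def partner(genotype, hap):
    """Haplotype that pairs with `hap` to give `genotype`, or None if `hap` does not fit."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None
    return tuple(other)


def clark(genotypes):
    known, resolved = [], {}
    # Step 1: genotypes with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if n_het(g) <= 1:
            pair = (tuple(a for a, _ in g), tuple(b for _, b in g))
            resolved[i] = pair
            for h in pair:
                if h not in known:
                    known.append(h)
    # Step 2: explain ambiguous genotypes with a known haplotype plus its partner.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                other = partner(g, h)
                if other is not None:
                    resolved[i] = (h, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


# The three individuals from the slides.
g0 = [("T", "T"), ("A", "A"), ("C", "C")]
g1 = [("T", "T"), ("A", "A"), ("C", "G")]
g2 = [("A", "T"), ("A", "A"), ("C", "G")]
resolved, unresolved = clark([g0, g1, g2])

# With only the double heterozygote, the chain never gets started.
resolved_alone, unresolved_alone = clark([g2])
```

The answer for the ambiguous individual depends on which known haplotype is tried first (T-A-C here, giving A-A-G as on the slide), which is one reason results can vary with the order of the data.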


EM Algorithm

• Use a multinomial likelihood under HWE

Pr(AT//AA//CG)

= Pr(AAC/TAG) + Pr(AAG/TAC)

= Pr(AAC) Pr(TAG) + Pr(AAG) Pr(TAC)

Fallin and Schork (2000) showed that EM is better than Clark’s algorithm
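A minimal sketch of EM haplotype-frequency estimation under HWE follows. It is not the implementation evaluated in the cited paper. Genotypes are coded 0/1/2 per SNP (number of copies of one allele) and haplotypes 0/1, and the function names and the toy data are assumptions made for the example. The factor 2 for pairs of distinct haplotypes is included explicitly; it cancels within any one genotype, so the posterior weights agree with the slide's expression.

```python
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs (tuples of 0/1) consistent with a 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]          # homozygous sites: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b            # heterozygous sites: one copy each
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freq(genotypes, n_iter=100):
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # start from equal frequencies
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for pairs in pair_lists:
            # E-step: posterior weight of each compatible pair under HWE.
            w = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Toy data: six unambiguous individuals and two double heterozygotes.
genotypes = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 2
freq = em_haplotype_freq(genotypes)
```

In this toy data the unambiguous individuals carry only haplotypes 00 and 11, so EM assigns the double heterozygotes to the 00/11 pair and the estimates approach 0.5 for each.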


A Gibbs sampler, Stephens et al. (2001)

• $G = (G_1, \dots, G_n)$: observed multilocus genotypes

$H = (H_1, \dots, H_n)$: unknown haplotype pairs

$F = (F_1, \dots, F_M)$: $M$ unknown population haplotype frequencies

1. Choose an individual $i$ from all ambiguous individuals

2. Sample $H_i^{(t+1)}$ from $\Pr(H_i \mid G, H_{-i}^{(t)})$

3. Set $H_j^{(t+1)} = H_j^{(t)}$ for $j = 1, 2, \dots, i-1, i+1, \dots, n$


Haplotype Inference

A: SNP data: 0 (MM), 1 (Mm), 2 (mm) for a single locus

B: Haplotype data: 0 (M), 1 (m) for a single locus


C. Statistical Tools in Genetics and Genomics for Health Research

• Linkage Analysis (LOD score, Pedigree Analysis)

• Segregation Analysis

• SNP and haplotype Inference

• Association Study

• Examples


[Diagram relating a putative gene (locus), the gene, and the phenotype, with the labels: Linkage analysis (LD, sib-pair, et al.); Association study; New gene discovery; Segregation]


Biological Basis of Linkage

• If two loci are on different chromosomes, they recombine with probability 0.5

• Similarly, if two loci are very far apart on the same chromosome,..

• But when the two loci are very close together, the recombination fraction tends towards zero.


· PARAMETRIC LINKAGE ANALYSIS

Estimates the recombination fraction between markers and a hypothesized trait locus. The inheritance parameters of the trait locus (mode of inheritance, penetrance, phenocopy rate, allele frequencies, etc.) must be specified.

Ex. LOD score method


· LOD SCORE

The common logarithm of the likelihood ratio:

Z(θ) = log10 [L(θ ) / L(½)]

where θ is the recombination fraction between two loci


· Purpose Of The Lod Score Method

1. Estimation of the recombination fraction, θ

2. Hypothesis testing

H0: θ = ½ (absence of linkage)

H1: θ < ½ (linkage)

$Z_{\max} = Z(\hat\theta) = \log_{10} [L(\hat\theta) / L(1/2)]$


· Scale For Testing Linkage

Zmax ≥ 3 : Strong linkage

Zmax > 0 : Support linkage

Zmax < 0 : Against linkage

Zmax = 0 : No support

(not related to recombination in linkage or no linkage)


· Asymptotic Distribution

$2 \ln [L(\hat\theta) / L(1/2)] = 4.6 \times Z_{\max} \sim \chi^2_1$

under the null hypothesis of no linkage

$P(Z_{\max} \ge 3) = P(\chi^2_1 \ge 13.8) = 0.0002$

α = 0.0001 (one-sided, since H1: θ < ½)
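The numbers on this slide can be checked with the standard library alone. The sketch below uses the identity that, for 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x/2)); the function names are illustrative.

```python
import math


def lod_to_chi2(z_max):
    """Convert a LOD score to the likelihood-ratio statistic 2*ln(10)*Z (about 4.6*Z)."""
    return 2 * math.log(10) * z_max


def chi2_1df_upper_tail(x):
    """P(chi-square with 1 degree of freedom >= x)."""
    return math.erfc(math.sqrt(x / 2))


x = lod_to_chi2(3)                     # about 13.8
p_two_sided = chi2_1df_upper_tail(x)   # about 0.0002
p_one_sided = p_two_sided / 2          # about 0.0001, because H1 is theta < 1/2
```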


Phase known pedigree


Figure 2 Phase known pedigree

• The maximum likelihood estimator of θ is 2/6 = 1/3

$$Z(\theta) = \log_{10} \frac{\theta^2 (1-\theta)^4}{0.5^2 \cdot 0.5^4} = \log_{10} \left[ 2^6\, \theta^2 (1-\theta)^4 \right]$$

$Z(1/3) = 0.1475$
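A quick numerical check of the phase-known case, assuming (as the formula implies) 2 recombinants among 6 informative meioses. The function name and the grid search are illustrative choices.

```python
import math


def lod_phase_known(theta, n_recomb, n_nonrecomb):
    """LOD score when phase is known: binomial likelihood against theta = 1/2."""
    n = n_recomb + n_nonrecomb
    likelihood = theta ** n_recomb * (1 - theta) ** n_nonrecomb
    return math.log10(likelihood / 0.5 ** n)


# Closed-form MLE: the proportion of recombinants.
theta_hat = 2 / 6
z_at_mle = lod_phase_known(theta_hat, 2, 4)

# The same maximum found on a grid over (0, 0.5].
grid = [i / 1000 for i in range(1, 501)]
theta_grid = max(grid, key=lambda t: lod_phase_known(t, 2, 4))
```

The LOD at θ = 1/3 evaluates to about 0.1476, in line with the slide's value.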


Phase-unknown pedigree


Figure 3 Phase-unknown pedigree

• The maximum likelihood estimator of θ is not so trivial

• The MLE is found to be 0.5 by a numerical method

$$L(\theta) = \tfrac{1}{2} \theta^4 (1-\theta)^2 + \tfrac{1}{2} \theta^2 (1-\theta)^4 = \tfrac{1}{2} \theta^2 (1-\theta)^2 \left[ \theta^2 + (1-\theta)^2 \right]$$
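The numerical maximization mentioned here can be sketched with a simple grid search over [0, 0.5]. The grid resolution and the names are arbitrary choices for the illustration.

```python
import math


def likelihood_phase_unknown(theta):
    """Average of the likelihoods under the two equally likely phases."""
    return (0.5 * theta ** 4 * (1 - theta) ** 2
            + 0.5 * theta ** 2 * (1 - theta) ** 4)


grid = [i / 1000 for i in range(0, 501)]
theta_hat = max(grid, key=likelihood_phase_unknown)

# LOD at the maximum, relative to theta = 1/2.
z_max = math.log10(likelihood_phase_unknown(theta_hat)
                   / likelihood_phase_unknown(0.5))
```

The maximum sits at the boundary θ = 0.5, so the maximum LOD is 0 and this pedigree gives no support for linkage once phase is unknown.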


Genotype Unknown-Phenotype known


Figure 4 Genotype Unknown-Phenotype known

$$L(\theta; \text{data}) = \Pr(Ph_{ma}) \Pr(Ph_{pa}) \times \prod_{\text{offspring}} \Pr(Ph_{offs} \mid Ph_{ma}, Ph_{pa}; \theta)$$

and we know that

$$\Pr(Ph_{ma}) = \sum_{G} \Pr(G) \Pr(Ph_{ma} \mid G)$$


· NONPARAMETRIC LINKAGE ANALYSIS

Inheritance parameters of the trait locus are not specified. Rather, one focuses on pairs (or multiples) of affected individuals and investigates marker allele sharing among these individuals, contrasting observed allele sharing with that expected when the marker has nothing to do with the trait.

Ex. IBD (identical by descent) test


AN EXAMPLE FAMILY WITH DISEASE LOCUS AT THE MARKER

Parents: 3 4 (+ –) and 3 2 (+ –)

Offspring: 3 3 (+ +), 3 4 (+ –), 2 3 (– +), 2 4 (– –)

• Only ‘+ +’ individuals are affected (‘+’ is recessive to ‘–’)

** Qualitative Trait

Sib-pair markers     Disease status     # of shared i.b.d.     C
sib1     sib2        d1     d2
3 | 3    3 | 3       +      +           2                      1
3 | 3    3 | 4       +      -           1                      0.25
3 | 3    2 | 3       +      -           1                      0.25
3 | 3    2 | 4       +      -           0                      0.25
3 | 4    3 | 4       -      -           2                      0.5
3 | 4    2 | 3       -      -           0                      0.5
3 | 4    2 | 4       -      -           1                      0.5
2 | 3    2 | 3       -      -           2                      0.5
2 | 3    2 | 4       -      -           1                      0.5
2 | 4    2 | 4       -      -           2                      0.5

• $C_j = (d_{j1} - \mu)(d_{j2} - \mu) = \alpha + \beta\, \mathrm{IBD}_j + \varepsilon_j$
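The "# of shared i.b.d." column can be recomputed from the sib genotypes. The sketch assumes, as the slide's notation suggests, that in each genotype "x | y" the first allele comes from the 3/2 parent and the second from the 3/4 parent; because each parent carries two different alleles, matching alleles at the same position are then identical by descent. The names are illustrative.

```python
# The four possible offspring genotypes, written (allele from 3/2 parent, allele from 3/4 parent).
sib_types = [(3, 3), (3, 4), (2, 3), (2, 4)]


def ibd_shared(sib1, sib2):
    """Number of alleles (0, 1 or 2) the two sibs share identical by descent."""
    return sum(a == b for a, b in zip(sib1, sib2))


# All unordered sib pairs, in the same order as the rows of the table.
sib_pairs = [(s1, s2) for i, s1 in enumerate(sib_types) for s2 in sib_types[i:]]
ibd_counts = [ibd_shared(s1, s2) for s1, s2 in sib_pairs]
```

These counts are the IBD values that enter the regression of $C_j$ on $\mathrm{IBD}_j$.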


· Linkage and LD

- Two loci in LD may even reside on different chromosomes: the presence of LD does not necessarily imply linkage between the loci considered.

- Although LD originally referred to any association of alleles at different loci (“allelic association”), it has become customary to take LD to mean association among alleles due to close linkage.


• Genomewide Linkage Analysis of Bipolar Disorder by Use of a High-Density Single-Nucleotide Polymorphism (SNP) Genotyping Assay: A Comparison with Microsatellite Marker Assays and Finding of Significant Linkage to Chromosome 6q22

• F. A. Middleton, M. T. Pato, K. L. Gentile, C. P. Morley, X. Zhao, A. F. Eisener, A. Brown, T. L. Petryshen, A. N. Kirby, H. Medeiros, C. Carvalho, A. Macedo, A. Dourado, I. Coelho, J. Valente, M. J. Soares, C. P. Ferreira, M. Lei, M. H. Azevedo, J. L. Kennedy, M. J. Daly, P. Sklar, and C. N. Pato

• Am. J. Hum. Genet., 74:000, 2004


We performed a linkage analysis on 25 extended multiplex Portuguese families segregating for bipolar disorder, by use of a high-density single-nucleotide polymorphism (SNP) genotyping assay, the GeneChip Human Mapping 10K Array (HMA10K). Of these families, 12 were used for a direct comparison of the HMA10K with the traditional 10-cM microsatellite marker set and the more dense 4-cM marker set. This comparative analysis indicated the presence of significant linkage peaks in the SNP assay in chromosomal regions characterized by poor coverage and low information content on the microsatellite assays. The HMA10K provided consistently high information and enhanced coverage throughout these regions. Across the entire genome, the HMA10K had an average information content of 0.842 with 0.21-Mb intermarker spacing. In the 12-family set, the HMA10K-based analysis detected two chromosomal regions with genomewide significant linkage on chromosomes 6q22 and 11p11; both regions had failed to meet this strict threshold with the microsatellite assays. The full 25-family collection further strengthened the findings on chromosome 6q22, achieving genomewide significance with a maximum nonparametric linkage (NPL) score of 4.20 and a maximum LOD score of 3.56 at position 125.8 Mb. In addition to this highly significant finding, several other regions of suggestive linkage have also been identified in the 25-family data set, including two regions on chromosome 2 (57 Mb, NPL = 2.98; 145 Mb, NPL = 3.09), as well as regions on chromosomes 4 (91 Mb, NPL = 2.97), 16 (20 Mb, NPL = 2.89), and 20 (60 Mb, NPL = 2.99). We conclude that at least some of the linkage peaks we have identified may have been largely undetected in previous whole-genome scans for bipolar disorder because of insufficient coverage or information content, particularly on chromosomes 6q22 and 11p11.


• Figure 2 Linkage signals obtained with 10-cM spaced and 4-cM spaced microsatellite assays, as well as the HMA10K SNP genotyping assay. These assays were performed on the same individuals from each of the same 12 families. Note the high correlation of the different assays in general, and that for both chromosomes 6 and 11, the SNP assay detected major linkage peaks at locations where the information content and coverage of the microsatellite panels were relatively low. Mb, megabase position; MSM, microsatellite markers.


• Figure 3 NPL analysis of 25 families with bipolar disorder from the Portuguese Island Collection. The number of each chromosome is shown at the top of each plot. The X-axis indicates the physical position (Mb) of the SNP marker. The Y-axis indicates the NPL Z score (black) or Kong and Cox LOD score (gray). For this scan, the empirical limit for genomewide significance was an NPL score of 3.85 and a LOD score of 3.15. Note that only the peak on chromosome 6 at 125.8 Mb was significant when both NPL Z and LOD thresholds were used.


Figure 4 Comparison of the 12-family (gray) and 25-family (black) genomewide linkage scans for selected chromosomes showing suggestive or significant linkage (see table 1). The X-axis indicates physical position (Mb). Note that for both scans, the signal on chromosome 6 at position 125.8 Mb is the only genomic region that achieves genomewide significance (of NPL score and/or LOD score).


· QUANTITATIVE TRAIT

A phenotype with a continuous (normal/lognormal) distribution.

Ex. Height, blood pressure, head circumference, and the cholesterol level in the blood

· QUALITATIVE TRAIT

A phenotype with a discrete distribution.

Ex. Signs and symptoms indicate whether a disease state is present or absent.


· HERITABILITY of the Trait ($H^2$)

The fraction of the phenotypic variation caused by genetic variation.

$H^2 = V_g / V_p = V_g / (V_g + V_e)$ (broad sense)

$h^2 = V_a / V_p$ (narrow sense)

· QUANTITATIVE TRAIT LOCI (QTL)

The location of a gene that affects a trait that is measured on a quantitative (linear) scale. The loci that are determinants of quantitative trait expression.


Example: Descriptive statistics

Variable          D (region), Dornod     S (region), Selenge
AGE (MEAN)        38.2                   29.4
EDU-YR (MEAN)     8.0                    8.2
MALE (%)          39.9                   46.3
MADE (%)          69.9                   90.5
SMOK (%)          18.7                   14.3
ALCOHOL (%)       38.7                   9.4
HP (%)            21.2                   14.8
BMI≥30 (%)        10.07                  8.78
CHOL≥240 (%)      1.8                    0


                D (Dornod)             S (Selenge)
Variable        Female     Male        Female     Male
HEIGHT          151.9      159.8       149.6      147.5
WEIGHT          54.9       58.1        54.7       48.0
BMI             24.3       22.3        24.1       21.0
SBP             116.0      127.0       114.0      107.6
DBP             76.1       82.0        73.2       67.4
WC              74.1       75.0        74.1       69.2
CHOL            163.4      165.4       159.7      154.2
SKIN FOLD       16.8       10.5        16.6       7.3


Heritability of selected phenotypes estimated with SOLAR

Variable        Heritability     Significance
SBP             0.51             **
WEIGHT          0.39             **
HEIGHT          0.17             *
HC              0.42             **
HDL-C           0.50             **
BMD_LF1 ^       0.43             **
SKIN_FOLD ^     0.38             **
BMI ^           0.35             **
DBP ^           0.53             **
WC ^            0.50             **

** P-value < 0.05; * P-value < 0.1; ^ variables transformed to address normality, skewness, and kurtosis. For HEIGHT, a kurtosis problem led to a low heritability estimate. Covariates were chosen from among age, sex, age^2, age*sex, and BMI; the covariates used differ from variable to variable.


• Traditional linkage studies > use recombination information only in pedigrees

• Association methods > use recombination information at the population level

• Association methods have greater power to detect small and moderate genetic effects than does linkage analysis (Risch and Merikangas 1996)


A Suggested Strategy for Association Studies of Complex Disease

1. A small number of people (10-20) are genotyped at a very dense SNP map; haplotypes are also determined

2. Haplotype block partitioning algorithm: haplotype blocks and tag SNPs

3. Large # of people genotyped at tag SNP marker loci

4. Association study analysis


D. Examples & Discussion

• Colorectal cancer study

• Population admixture

• Gene-Environment Interactions

• Mitochondria study


Polymorphisms in the XRCC1 gene and alcohol consumption are associated with colorectal cancer risk

• A case-control study of 209 colorectal cancer cases and 209 age- and sex-matched controls in the Korean population

• Allelic variants of the XRCC1 gene at codons 194, 280 and 399 were analyzed in lymphocyte DNA by PCR-RFLP


Table 1. Frequencies of single nucleotide polymorphisms and the odds ratios of colorectal cancer

Code        Genotype               Patients (%)     Controls (%)     OR (95% CI)*          P-value

XRCC1 Codon 194
00          Arg/Arg                90 (43.0)        101 (48.3)       1
01          Arg/Trp                94 (45.0)        82 (39.2)        1.29 (0.85, 1.94)     0.229
11          Trp/Trp                25 (12.0)        26 (12.5)        1.08 (0.58, 2.00)     0.809
01 or 11    Arg/Trp or Trp/Trp     119 (57.0)       108 (51.7)       1.24 (0.84, 1.82)     0.280

XRCC1 Codon 280
00          Arg/Arg                161 (77.0)       173 (82.8)       1
01          Arg/His                47 (22.5)        34 (16.2)        1.49 (0.91, 2.43)     0.114
11          His/His                1 (0.5)          2 (1.0)          0.54 (0.05, 5.98)     0.613
01 or 11    Arg/His or His/His     48 (23.0)        36 (17.2)        1.43 (0.88, 2.32)     0.144

XRCC1 Codon 399
00          Arg/Arg                112 (53.6)       136 (65.1)       1
01          Arg/Gln                88 (42.1)        64 (30.6)        1.67 (1.11, 2.51)     0.014
11          Gln/Gln                9 (4.3)          9 (4.3)          1.21 (0.47, 3.16)     0.691
01 or 11    Arg/Gln or Gln/Gln     97 (46.4)        73 (34.9)        1.61 (1.09, 2.39)     0.017
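As a worked illustration of how an odds ratio and its confidence interval follow from such counts, the sketch below computes a crude odds ratio with a Woolf (log-scale) 95% interval for the codon 399 Arg/Gln row against the Arg/Arg reference. Whether the study used this particular interval is not stated in the slides, so treat the method as an assumption. The function name is illustrative.

```python
import math


def odds_ratio_ci(case_exposed, case_ref, ctrl_exposed, ctrl_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (case_exposed * ctrl_ref) / (case_ref * ctrl_exposed)
    se_log = math.sqrt(1 / case_exposed + 1 / case_ref
                       + 1 / ctrl_exposed + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Codon 399: Arg/Gln (patients 88, controls 64) vs Arg/Arg (patients 112, controls 136).
or_399, lower_399, upper_399 = odds_ratio_ci(88, 112, 64, 136)
```

For this row the crude calculation gives 1.67 (1.11, 2.51), the same as the tabulated value.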


Table 2. Estimated haplotype frequencies and odds ratios of colorectal cancer based on haplotypes

Code     XRCC1*                      Patients (%)     Controls (%)     OR (95% CI)           P-value
000      194Arg-280Arg-399Arg        119 (28.4)       164 (39.2)       1
100      194Trp-280Arg-399Arg        143 (34.2)       134 (32.1)       1.47 (1.05, 2.05)     0.023
010      194Arg-280His-399Arg        50 (12.0)        38 (9.1)         1.81 (1.12, 2.94)     0.015
001      194Arg-280Arg-399Gln        106 (25.4)       82 (19.6)        1.78 (1.23, 2.59)     0.002

* The frequencies of 194Trp-280His-399Arg, 194Trp-280Arg-399Gln, 194Arg-280His-399Gln, 194Trp-280His-399Gln were zero in both groups.


Table 3. Estimated genotype frequencies and the odds ratios of colorectal cancer after controlling for alcohol intake, smoking, dietary habits and exercise

Code        Genotype                                          Patients (%)     Controls (%)     OR (95% CI)           P-value
000/000     194Arg-280Arg-399Arg / 194Arg-280Arg-399Arg       18 (8.6)         32 (15.3)        1
000/100     194Arg-280Arg-399Arg / 194Trp-280Arg-399Arg       37 (17.7)        48 (23.0)        0.90 (0.34, 2.35)     0.826
000/010     194Arg-280Arg-399Arg / 194Arg-280His-399Arg       17 (8.1)         16 (7.7)         1.14 (0.76, 1.71)     0.540
000/001     194Arg-280Arg-399Arg / 194Arg-280Arg-399Gln       29 (13.9)        36 (17.2)        1.07 (0.68, 1.69)     0.770
100/100     194Trp-280Arg-399Arg / 194Trp-280Arg-399Arg       25 (12.0)        26 (12.4)        1.32 (0.46, 3.75)     0.609
100/010     194Trp-280Arg-399Arg / 194Arg-280His-399Arg       14 (6.7)         12 (5.7)         1.54 (0.36, 6.60)     0.564
100/001     194Trp-280Arg-399Arg / 194Arg-280Arg-399Gln       42 (20.1)        22 (10.5)        2.08 (1.27, 3.40)     0.004
010/010     194Arg-280His-399Arg / 194Arg-280His-399Arg       1 (0.5)          2 (1.0)          1.79 (0.62, 5.14)     0.280
010/001     194Arg-280His-399Arg / 194Arg-280Arg-399Gln       17 (8.1)         6 (2.9)          3.69 (1.53, 8.90)     0.004
001/001     194Arg-280Arg-399Gln / 194Arg-280Arg-399Gln       9 (4.3)          9 (4.3)          1.28 (0.64, 2.54)     0.486


Table 4. Risk of colorectal cancer associated with alcohol intake after controlling for smoking, dietary habits and exercise, and the risk modification by genotype

0.3154.14 (0.26, 66.36)1 (16.7)7 (41.2)A bottle or more a week

5 (83.3)10 (58.8)Less than a bottle a week

0.0317.15 (1.20, 42.46)19 (86.4)3 (13.6)

27 (64.3)15 (35.7)

194Trp-280Arg-399Arg /194Arg-280Arg-399Gln Less than a bottle a weekA bottle or more a week

194Arg-280His-399Arg /194Arg-280Arg-399Gln

0.6181.58 (0.26, 9.65)6 (18.7)6 (33.3)A bottle or more a week

26 (81.3)12 (66.7)Less than a bottle a week

194Arg-280Arg-399Arg /194Arg-280Arg-399Arg

0.0012.45 (1.41, 4.25)52 (24.9)64 (30.6)A bottle or more a week

157 (75.1)145 (69.4)Less than a bottle a week

All subjects

P-valueOR (95% CI)Controls (%)Patients (%)Amount of alcohol intake

000/000

100/001

010/001
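As a rough cross-check on Table 4, the crude (unadjusted) odds ratio for heavier drinking can be computed directly from the counts in each genotype group. The sketch below is illustrative only: the function name `crude_odds_ratio` and the use of the Woolf (log-based) confidence interval are assumptions of this example, not the method used for the table. The ORs in Table 4 are adjusted for smoking, dietary habits and exercise, so the crude values are not expected to match them.

```python
import math

def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) with a Woolf log-based interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts from Table 4, in the order:
# (patients >= 1 bottle, patients < 1 bottle,
#  controls >= 1 bottle, controls < 1 bottle)
strata = {
    "All subjects": (64, 145, 52, 157),
    "000/000": (6, 12, 6, 26),
    "100/001": (15, 27, 3, 19),
    "010/001": (7, 10, 1, 5),
}

for name, counts in strata.items():
    odds_ratio, lower, upper = crude_odds_ratio(*counts)
    print(f"{name}: crude OR = {odds_ratio:.2f} ({lower:.2f}, {upper:.2f})")
```

The wide intervals in the genotype-specific groups reflect how few subjects fall in each cell once the data are split by genotype.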

Page 79

Considerations in haplotype analysis

• Uncertainty in haplotype estimation

• Examine LD (linkage disequilibrium)

• When sub-cell frequencies are too small, consider nonparametric or similar methods

• The population mixture problem
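When a cell count is very small, large-sample tests can be unreliable. One commonly used alternative is Fisher's exact test, sketched below with only the standard library. The slide does not name a specific method, so this choice is an assumption of the example, as are the function name `fisher_exact_two_sided` and the decision to compare the sparse 010/010 genotype of Table 3 (1 patient, 2 controls) against the 000/000 reference (18 patients, 32 controls).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Rows: genotype 010/010, genotype 000/000; columns: patients, controls
p_value = fisher_exact_two_sided(1, 2, 18, 32)
print(f"Fisher exact p-value = {p_value:.3f}")
```

This test is unadjusted, so it does not replace the covariate-adjusted ORs in the tables; it only illustrates how a sparse cell can be handled without a large-sample approximation.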

Page 80

Table 1. Risk of disease by genetic information (Example 1)

Genetic information | Disease present | Disease absent | Risk
A | 81 | 29 | 81/(81+29) = 0.7364
a | 28 | 182 | 28/(28+182) = 0.1333

Relative risk: 0.7364/0.1333 = 5.52

Conclusion: genetic information and disease status are associated.
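The relative risk in Table 1 is simply the ratio of the two row risks. A minimal sketch of that arithmetic follows; the function name `relative_risk` and the added log-based 95% confidence interval are assumptions of this example and do not appear on the slide.

```python
import math

def relative_risk(cases_1, noncases_1, cases_0, noncases_0, z=1.96):
    """Return (RR, lower, upper) comparing group 1 with group 0."""
    n1 = cases_1 + noncases_1
    n0 = cases_0 + noncases_0
    rr = (cases_1 / n1) / (cases_0 / n0)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / cases_1 - 1 / n1 + 1 / cases_0 - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Table 1 (Example 1): A has 81 diseased and 29 not; a has 28 diseased and 182 not
rr, lower, upper = relative_risk(81, 29, 28, 182)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```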

Page 81

Table 2. Risk by level of the confounding variable (Example 1)

Race A

Genetic information | Disease present | Disease absent | Risk
A | 80 | 20 | 0.800
a | 8 | 2 | 0.800

Relative risk: 1.00

Race B

Genetic information | Disease present | Disease absent | Risk
A | 1 | 9 | 0.100
a | 20 | 180 | 0.100

Relative risk: 1.00

Conclusion: in both races, genetic information and disease status are not associated.

Page 82

In summary

• In the whole population, disease and genetic information are associated.

• In race A, disease and genetic information are not associated.

• In race B, disease and genetic information are not associated.

• ? ? ?

Page 83

Table 3. Risk of disease by genetic information (Example 2)

Genetic information | Disease present | Disease absent | Risk
A | 240 | 420 | 0.3636
a | 200 | 350 | 0.3636

Relative risk: 1.0000

Conclusion: disease status and genetic information are not associated.

Page 84

Table 4. Risk by level of the confounding variable (Example 2)

Race A

Genetic information | Disease present | Disease absent | Risk
A | 105 | 5 | 0.9545
a | 195 | 305 | 0.3900

Relative risk: 2.45

Race B

Genetic information | Disease present | Disease absent | Risk
A | 135 | 415 | 0.2455
a | 5 | 45 | 0.1000

Relative risk: 2.45

Conclusion: in both races, genetic information and disease status are associated.

Page 85

In summary

• In the whole population, disease and genetic information are not associated. (RR=1.00)

• In race A, disease and genetic information are associated. (RR=2.45)

• In race B, disease and genetic information are associated. (RR=2.45)

• ? ? ?

Page 86

• The relationship between disease status and genetic information is confounded by race.

• In such cases, a correct analysis must take race into account together with disease status and genetic information. (Race is called a confounding variable.)

• This is the population mixture problem
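The two worked examples can be reproduced numerically: collapsing the race-specific tables gives the crude relative risk, while a stratified estimate keeps the races separate. The sketch below uses the counts from the example tables; the helper names and the use of the Mantel-Haenszel pooled risk ratio as the stratified summary are assumptions of this example, since the slides do not name a particular adjustment method. The strata are simply numbered, because the result does not depend on which stratum is called race A or race B.

```python
def risk_ratio(table):
    """table = ((cases_A, noncases_A), (cases_a, noncases_a))."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))

def collapse(tables):
    """Add the strata cell by cell, i.e. ignore the stratifying variable."""
    a = sum(t[0][0] for t in tables)
    b = sum(t[0][1] for t in tables)
    c = sum(t[1][0] for t in tables)
    d = sum(t[1][1] for t in tables)
    return ((a, b), (c, d))

def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = denominator = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    return numerator / denominator

# Each stratum: ((diseased A, healthy A), (diseased a, healthy a))
example1 = [((80, 20), (8, 2)), ((1, 9), (20, 180))]
example2 = [((105, 5), (195, 305)), ((135, 415), (5, 45))]

for label, strata in (("Example 1", example1), ("Example 2", example2)):
    crude = risk_ratio(collapse(strata))
    per_stratum = [round(risk_ratio(t), 2) for t in strata]
    pooled = mantel_haenszel_rr(strata)
    print(f"{label}: crude RR = {crude:.2f}, "
          f"stratum RRs = {per_stratum}, MH RR = {pooled:.2f}")
```

In Example 1 the crude estimate shows an association that disappears within strata; in Example 2 the crude estimate hides an association that is present in both strata. In both cases the stratified summary agrees with the within-race results rather than with the collapsed table.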

Page 87

Gene-Environment Interaction

Nature vs. Nurture

Genes ... or environment?

? Sasang constitutional medicine (body constitution types) ?

Page 88
Page 89

Types of gene-environment interactions

Environments: Nutritional, Chemical, Pharmacological, Physical, Behavioral

Page 90

Model 1: Neither Genotype nor Environment alone increases Risk

Genotype

Environment

Genotype

Page 91

Model 2: Genotype exacerbates the effect of the Risk factor

UV light Skin Cancer

Nucleotide Excision Repair (NER) gene

Page 92

Model 3: The risk factor exacerbates the effect of the Genotype

G6PD variants

Hemolytic Anemia

Fava bean consumption

Page 93

Model 4: Genotype and Risk Factor each increase Risk by themselves

Alpha-1 antitrypsin deficiency

Smoking

Emphysema

Page 94

Complicated !!

More than One gene ?

Page 95

Time line of developments in human statistical genetics

Theory Technology Study design

Page 96

Mitochondrial DNA

• 16.5 kb

• High copy number

• Lack of recombination

• High substitution rate

• Maternal mode of inheritance

• Y chromosome: Paternal inheritance

Page 97

MtDNA-based analyses of modern human variation

• Cann et al. (1987) Mitochondrial DNA and human evolution. Nature 325:31

“Mitochondrial DNAs from 147 people, drawn from five geographic populations have been analysed by restriction mapping. All these mitochondrial DNAs stem from one woman who is postulated to have lived about 200,000 years ago, probably in Africa. All the populations examined except the African population have multiple origins, implying that each area was colonised repeatedly.”

Page 98

Neighbour-joining phylogram based on complete mtDNA genome sequences (excluding the D-loop)

Page 99

Mismatch distributions of pairwise nucleotide differences between mtDNA genomes (excluding the D-loop)

Panels: African, Non-African
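A mismatch distribution is a histogram of the number of nucleotide differences over all pairs of sequences in a sample. The sketch below shows the counting step only; the function name `mismatch_distribution` and the four short toy sequences are hypothetical and are not data from the talk, and real mtDNA genomes would first need to be aligned to equal length.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(sequences):
    """Map each pairwise difference count to the number of pairs showing it."""
    counts = Counter()
    for first, second in combinations(sequences, 2):
        if len(first) != len(second):
            raise ValueError("sequences must be aligned to the same length")
        # Count positions at which the two aligned sequences differ
        differences = sum(x != y for x, y in zip(first, second))
        counts[differences] += 1
    return dict(sorted(counts.items()))

# Hypothetical aligned toy sequences, for illustration only
toy_sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
distribution = mismatch_distribution(toy_sequences)
for differences, pairs in distribution.items():
    print(f"{differences} differences: {pairs} pairs")
```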

Page 100
Page 101

References

• Clark (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7: 111-122.

• Excoffier and Slatkin (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12: 921-927.

• Stephens, Smith, and Donnelly (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68: 978-989.

• Niu, Qin, Xu, and Liu (2002) Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70: 157-169.

• Patil et al. (2001) Science 294: 1719-1723.

Page 102

• Toivonen et al. (2000) Data mining applied to linkage disequilibrium mapping. Am J Hum Genet 67: 133-145.

• Sevon, Toivonen, and Ollikainen (2001) TreeDT: gene mapping by tree disequilibrium test. The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), pp. 365-370. San Francisco, California, August 2001.

Page 103

• Wallenstein, Hodge, and Weston (1998) Logistic regression model for analyzing extended haplotype data. Genet Epidemiol 15: 173-181.

• Http://www.genome.helsinki.fi/eng/research/projects/DM/index.html

• Qin, Niu, and Liu (2002) Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am J Hum Genet 71: 1242-1247.

• Sevon, Ollikainen, Onkamo, Toivonen, Mannila, and Kere.

• Johnson et al. (2001) Nat Genet 29: 233-237.

Page 104

Useful Sites

http://www.genomicawareness.org/ (very nice introduction)

http://linkage.rockefeller.edu/soft/list.html

http://www.biology.lsu.edu/general/software.html (software list)

http://www.ngic.re.kr (National Genome Information Center)

http://www.jax.org/staff/churchill/labsite/index.html (Gary Churchill’s stat genetics group)

http://linkage.cpmc.columbia.edu/index2.html (Joseph D. Terwilliger)

Page 105

Acknowledgement

• 조성일 (Graduate School of Public Health, Seoul National University)

• 김종일 (Department of Biochemistry, Hallym University College of Medicine)

• 서정선 (Department of Biochemistry, Seoul National University College of Medicine)

• 홍윤철 (Department of Preventive Medicine, Seoul National University College of Medicine)

• 서영주

• Dr. Terwilliger

• Dr. Kardia