Introduction of Genome Research

Posted on 08-Jan-2016




Bioinformatics Research Center (BRC), Academia Sinica, 90/4/9 pm (ROC year 90, i.e. 9 April 2001)

Bioinformatics Research Center, Institute of Biomedical Sciences

ACADEMIA SINICA

莊樹諄
www.sinica.edu.tw/~trees/bioinformatics
E-mail: trees@gate.sinica.edu.tw

Outline

Introduction
Some Research Topics
Related Links and Resources
Bioinformatics Research Center (BRC)

Introduction

[Figure: a chromosome]

[Figure: gene structure, labeled: gene DNA sequence (5' to 3'), exons (coding regions), introns, 5'UTR, 3'UTR, mRNA, cDNA (complementary DNA), ORF; and the flow DNA → RNA → Protein → Function]

DNA sequence: A, C, G, T (4 letters)
RNA sequence: A, C, G, U (4 letters; U = uracil, which takes the place of thymine)
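Because RNA uses U where DNA uses T, the mRNA sequence of a gene can be written down from the DNA coding (sense) strand by substituting U for T. A minimal Python sketch of that substitution, assuming the input is the coding strand written 5' to 3'; `transcribe` is a hypothetical helper name, and in the cell the polymerase actually copies the complementary template strand:

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (5'->3') by replacing T with U."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected only A, C, G, T")
    return dna.replace("T", "U")


print(transcribe("ACCGTGTGGCAG"))  # ACCGUGUGGCAG
```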

DNA nucleotide: phosphoric acid + deoxyribose + nitrogenous base

Nitrogenous bases:
Purines: Adenine (A), Guanine (G)
Pyrimidines: Cytosine (C), Thymine (T)

5'-ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA-3'
3'-TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT-5'
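The two strands pair base by base (A with T, C with G) and run antiparallel. A short Python sketch, with hypothetical helper names `complement` and `reverse_complement`, checks the pairing above: the complement written under the top strand reads 3' to 5', and reversing it gives the same strand in the conventional 5' to 3' direction.

```python
# Watson-Crick pairing: A-T and C-G (lowercase kept lowercase)
PAIR = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Complementary strand written base-by-base under seq (reads 3'->5')."""
    return seq.translate(PAIR)


def reverse_complement(seq: str) -> str:
    """Complementary strand in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


top = "ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"  # 5'->3', from the slide
print(complement(top))          # TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT
print(reverse_complement(top))  # the same strand, read 5'->3'
```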

[Figure: the sequence ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA read as codons (successive base triplets), each codon mapped to an amino acid]

$4^3 = 64$ codons vs. 20 amino acids

DNA sequence: A, C, G, T (4 letters)
RNA sequence: A, C, G, U (4 letters)
Amino acid sequence: 20 letters

Standard genetic code (mRNA codons)

First      Second position                            Third
pos. (5')  U         C         A         G            pos. (3')
U          Phe (F)   Ser (S)   Tyr (Y)   Cys (C)      U
           Phe (F)   Ser (S)   Tyr (Y)   Cys (C)      C
           Leu (L)   Ser (S)   Stop      Stop         A
           Leu (L)   Ser (S)   Stop      Trp (W)      G
C          Leu (L)   Pro (P)   His (H)   Arg (R)      U
           Leu (L)   Pro (P)   His (H)   Arg (R)      C
           Leu (L)   Pro (P)   Gln (Q)   Arg (R)      A
           Leu (L)   Pro (P)   Gln (Q)   Arg (R)      G
A          Ile (I)   Thr (T)   Asn (N)   Ser (S)      U
           Ile (I)   Thr (T)   Asn (N)   Ser (S)      C
           Ile (I)   Thr (T)   Lys (K)   Arg (R)      A
           Met (M)   Thr (T)   Lys (K)   Arg (R)      G
G          Val (V)   Ala (A)   Asp (D)   Gly (G)      U
           Val (V)   Ala (A)   Asp (D)   Gly (G)      C
           Val (V)   Ala (A)   Glu (E)   Gly (G)      A
           Val (V)   Ala (A)   Glu (E)   Gly (G)      G

Stop codons: UAA, UAG, UGA. Met (M), codon AUG, also serves as the start codon.
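The table can be encoded compactly by listing the 64 amino-acid letters in codon order (first base slowest, third base fastest, each in the order U, C, A, G). A sketch assuming the standard code shown above, with `*` for stop; `CODON_TABLE` and `translate` are illustrative names. It also confirms the $4^3 = 64$ codons and 20 amino acids:

```python
from itertools import product

BASES = "UCAG"
# One letter per codon in the order UUU, UUC, UUA, UUG, UCU, ..., GGG; "*" marks stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the order of AMINO.
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str) -> str:
    """Translate one reading frame; a trailing partial codon is ignored."""
    rna = rna.upper().replace("T", "U")  # accept DNA letters too
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


print(len(CODON_TABLE))                                          # 64
print(len(set(AMINO) - {"*"}))                                   # 20
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
print(translate("ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"))         # TVWQCTGIWP*T
```

The last line translates the earlier example strand in its first frame; the `*` marks the in-frame TAG stop.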


6-frame translations (- = stop codon)

5'3' Frame 1   aagctgatcgatcgattttagatagagaaaaaact   K L I D R F - I E K K
5'3' Frame 2   aagctgatcgatcgattttagatagagaaaaaact   S - S I D F R - R K N
5'3' Frame 3   aagctgatcgatcgattttagatagagaaaaaact   A D R S I L D R E K T
3'5' Frame 1   agttttttctctatctaaaatcgatcgatcagctt   S F F S I - N R S I S
3'5' Frame 2   agttttttctctatctaaaatcgatcgatcagctt   V F S L S K I D R S A
3'5' Frame 3   agttttttctctatctaaaatcgatcgatcagctt   F F L Y L K S I D Q L

The 3'5' frames are read from the reverse complement of the top strand.
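The six frames above can be reproduced by translating the top strand from offsets 0, 1 and 2, then doing the same on its reverse complement. A self-contained sketch, assuming the standard genetic code and using `-` for stop to match the slide; the function names are hypothetical:

```python
from itertools import product

# Standard genetic code in DNA letters (codon order TTT, TTC, TTA, TTG, TCT, ..., GGG);
# "-" marks a stop codon, as on the slide.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(dna: str) -> str:
    """Translate one frame, space-separated; a trailing partial codon is dropped."""
    dna = dna.upper()
    return " ".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def six_frames(dna: str) -> dict:
    """Frames 1-3 of the given strand (5'3') and of its reverse complement (3'5')."""
    rc = reverse_complement(dna)
    frames = {f"5'3' Frame {k + 1}": translate(dna[k:]) for k in range(3)}
    frames.update({f"3'5' Frame {k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


for name, protein in six_frames("aagctgatcgatcgattttagatagagaaaaaact").items():
    print(name, protein)
```

Scanning all six frames is the usual first step when the strand and reading frame of a gene are unknown.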


EST (Expressed Sequence Tags) DB
HGI (Human Gene Index) DB
Gene: Exon & Intron
cDNA Database
UniGene DB

Human Genome Sequencing (status as of 2/11/2001)

Draft: 61.0%
Finished: 32.5%
Total: 93.5%

[Figure: chromosome map showing gaps in the sequence]


Genome database: $3 \times 10^9$ bp

HTGS (High Throughput Genomic Sequences)

Phase 0: single- or few-pass reads of a single clone (not contigs).

Phase 1: unfinished; contigs may be unordered and unoriented, with gaps.

Phase 2: unfinished; ordered, oriented contigs, with or without gaps.

Phase 3: finished, no gaps (with or without annotations).


Size range (kb)   Contigs   Aggregate size (kb)   Percent of total
<30                    44                   666               0.1%
30-100                479                32,172               4.9%
100-250             1,628               260,933              39.9%
250-500               421               144,518              22.1%
500-1000              145                98,623              15.1%
>1000                  43               116,557              17.8%
Total               2,760               653,471             100.0%
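The percentage column can be recomputed from the aggregate sizes. A quick check in Python (the row values are copied from the table; nothing else is assumed): the contig counts sum to the stated 2,760, while the aggregate sizes sum to 653,469 kb, 2 kb below the printed total of 653,471 kb, presumably from rounding the per-row sizes. The percentages agree to one decimal place either way.

```python
# (size range in kb, number of contigs, aggregate size in kb), copied from the table above
ROWS = [
    ("<30", 44, 666),
    ("30-100", 479, 32172),
    ("100-250", 1628, 260933),
    ("250-500", 421, 144518),
    ("500-1000", 145, 98623),
    (">1000", 43, 116557),
]

total_contigs = sum(n for _, n, _ in ROWS)
total_kb = sum(kb for _, _, kb in ROWS)
percent = [100 * kb / total_kb for _, _, kb in ROWS]

print(total_contigs, total_kb)  # 2760 653469
for (size, _, _), pct in zip(ROWS, percent):
    print(f"{size:>9} kb {pct:5.1f}%")
```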

Some Research Topics

Number of human genes

Early estimate: 60,000~100,000

By Ch22: ~45,000

By EST: ~140,000

By Ch22 & HGI-5.0: ~120,000 (taking Ch22 as 1.38-fold gene-rich, after an extensive cleaning and assembly process)

By Science (2/16/2001): ~30,000

There are many more genes awaiting discovery within the sequence.

Some Research Topics

Alternative Splicing

Human Diversity

Gene Signature

Genome Annotation

[Figure: human genome variation. Human genome: $3 \times 10^9$ bp of genomic sequence, made up of genes (coding and non-coding regions) and inter-genic regions. Variations: single nucleotide polymorphisms (SNPs), about $10^6$-$10^7$ in number. SNP types: gSNP; cSNP, rSNP, iSNP, nSNP. Functional variants: about 5%.]
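Taking the figure's numbers at face value, the SNP count implies an average spacing between SNPs. This back-of-envelope calculation assumes SNPs are spread evenly along the genome, which real variation is not:

```latex
\frac{3 \times 10^9\ \text{bp}}{10^7\ \text{SNPs}} = 300\ \text{bp per SNP},
\qquad
\frac{3 \times 10^9\ \text{bp}}{10^6\ \text{SNPs}} = 3000\ \text{bp per SNP}
```

That is, roughly one SNP every 300 to 3,000 bp on average.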

Gene-based SNPs

[Figure: Gene 1 and Gene 2 with promoters P1 and P2, exons and introns, marking example positions of nSNP, rSNP, cSNP and iSNP]

Human Diversity

SNP (Single Nucleotide Polymorphism): cSNP (Coding SNP)

acccgctcgtcgct tgtgtt cggctaattgcgcgaat (codon tgt encodes C)

Synonymous (silent): tgt → tgc, still C.

Non-synonymous, non-conservative: tgt (C) → tgg (W); C is polar, W is nonpolar.

Non-synonymous, conservative: tgt (C) → tat (Y); Y is polar, like C.
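Whether a cSNP is synonymous can be decided mechanically by translating the reference and variant codons. A sketch assuming the standard genetic code; the polar/nonpolar labels cover only C, Y and W, taken from the slide's example, so any other residue pair is reported without a conservative/non-conservative call. `classify_csnp` is a hypothetical name:

```python
from itertools import product

# Standard genetic code in DNA letters, codon order TTT, TTC, ..., GGG; "*" = stop.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# Polarity only for the residues in the slide's example (C, Y polar; W nonpolar).
POLAR = {"C": True, "Y": True, "W": False}


def classify_csnp(ref_codon: str, alt_codon: str) -> str:
    ref, alt = CODE[ref_codon.upper()], CODE[alt_codon.upper()]
    if ref == alt:
        return "synonymous (silent)"
    if ref in POLAR and alt in POLAR:
        kind = "conservative" if POLAR[ref] == POLAR[alt] else "non-conservative"
        return f"non-synonymous, {kind} ({ref}->{alt})"
    return f"non-synonymous ({ref}->{alt})"


for alt in ("tgc", "tgg", "tat"):
    print("tgt ->", alt, ":", classify_csnp("tgt", alt))
```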


Purines (A/G) and pyrimidines (C/T)

Transition: A ↔ G, C ↔ T (purine to purine, or pyrimidine to pyrimidine)

Transversion: A/G ↔ C/T (purine to pyrimidine, or the reverse)

CD-CV: common disease, common variant (the hypothesis that common diseases are influenced by common variants).
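The transition/transversion distinction depends only on whether both bases are purines or both are pyrimidines. A minimal sketch (the function name `substitution_type` is hypothetical):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "G"))  # transversion
```

In the cSNP example, tgt → tgc is a T → C transition, tgt → tgg a T → G transversion, and tgt → tat a G → A transition.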


Pseudogene

Ch22: 134 pseudogenes (134/679 ≈ 19.7%)

Processed pseudogenes (derived from cDNA; 82% of the 134 pseudogenes):

a) a single block

b) lack the characteristic intron-exon structure

Spliced pseudogenes: segments of duplicated gene families


Repetitive Sequence

Interspersed Repeats
SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR
LINEs (Long Interspersed Elements): LINE1, LINE2

Tandem Repeats
Minisatellites (Variable Number Tandem Repeats, VNTR): repeat unit 15~100 bp
Microsatellites (Short Tandem Repeats, STR): repeat unit 2~5 bp
α-Satellite: at the centromere
Telomere Repeats: at the telomeres

[Figure: chromosome with centromere and telomeres]
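Microsatellites of the kind listed above (a 2-5 bp unit repeated in tandem) can be spotted with a regular-expression backreference. A simple sketch, not a dedicated repeat finder: `find_microsatellites` and its `min_copies` threshold are hypothetical choices, and homopolymer runs such as AAAAAA are reported with a 2-base unit ("AA").

```python
import re


def find_microsatellites(seq: str, min_copies: int = 3) -> list:
    """Report (start, repeat unit, copy number) for tandem runs of a 2-5 bp unit."""
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least 2 copies")
    # The lazy {2,5}? tries the shortest unit first, so CACACA is reported as unit "CA";
    # \1 then requires further exact copies of that unit.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("GGTCACACACACAGGT"))  # [(3, 'CA', 5)]
```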

Related Links and Resources

TIGR (The Institute for Genomic Research): http://www.tigr.org/

Japan Science and Technology Corporation - Advanced Lifescience Information System (JST-ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl

NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/

Sanger: http://www.ensembl.org/

Gene Prediction Programs:
http://www.bork.embl-heidelberg.de/genepredict.html
http://linkage.rockefeller.edu/wli/gene/programs.html

ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html

Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html

Bioinformatics Research Center (BRC)

[Figure: BRC network layout: a firewall, a local server, and Lab. 1, Lab. 2, Lab. 3]

CRASA: Complexity Reduction Algorithm for Sequence Analysis

Genome Annotation, Alternative Splicing, SNP (Single Nucleotide Polymorphism)

cDNA database

Genome sequences: chromosomes 1~22, X, Y

CRASA: Complexity Reduction Algorithm for Sequence Analysis

Environment
PC cluster: 10 PCs (Pentium III 667 MHz) and 1 server running Windows 2000 (NT); hard disks: IDE, with RAID support; database: DB2

Algorithm
Progressive processing: pyramid structure, pattern match, direct search
Parallel processing

Parallel Processing

[Figure: a query passed from the server to processors p1, p2 and p3, with the bottlenecks labeled: hard-disk I/O bound, network I/O bound, and sorting & assembling (CPU bound)]
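The parallel-processing figure can be illustrated with a generic scatter/gather search. This is a sketch under stated assumptions, not CRASA's actual pyramid or complexity-reduction method: the database is split round-robin into one partition per worker (standing in for p1, p2, p3), each worker runs a plain substring "direct search", and the "server" merges and sorts the partial hits. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(job):
    """Direct search of one partition: every (sequence id, offset) where query occurs."""
    query, partition = job
    hits = []
    for seq_id, seq in partition:
        pos = seq.find(query)
        while pos != -1:
            hits.append((seq_id, pos))
            pos = seq.find(query, pos + 1)  # allow overlapping matches
    return hits


def parallel_search(query, database, n_workers=3):
    # Split the database round-robin into one partition per worker (p1, p2, p3, ...).
    partitions = [database[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(search_partition, [(query, p) for p in partitions])
        # "Server" step: merge the partial hit lists and sort them.
        return sorted(hit for hits in partial_results for hit in hits)


db = [("seq1", "ACGTACGTAC"), ("seq2", "TTACGTT"), ("seq3", "GGGG"), ("seq4", "ACGT")]
print(parallel_search("ACGT", db))
# [('seq1', 0), ('seq1', 4), ('seq2', 2), ('seq4', 0)]
```

Threads stand in for the cluster PCs only to keep the sketch self-contained. In CPython, threads share one interpreter lock, so CPU-bound matching on a single machine would use `ProcessPoolExecutor` (the worker function is top-level and picklable) or, as in the slide's setup, separate machines.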

Bioinformatics: Computer Science + Biology ??