90/4/9 (ROC calendar, i.e. 2001/4/9) pm, Academia Sinica Bioinformatics Research Center (BRC) 1 Introduction of Genome Research Bioinformatics...

Page 1

Introduction of Genome Research

Bioinformatics Research Center, Institute of Biomedical Sciences

ACADEMIA SINICA

莊樹諄

www.sinica.edu.tw/~trees/bioinformatics

E-mail: [email protected]

Page 2

Outline

Introduction

Some Research Topics

Related Links and Resources

Bioinformatics Research Center (BRC)

Introduction

Page 3

[Figure: chromosome]

Page 4

Introduction

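[Figure: gene structure and information flow. The gene's DNA sequence (5' to 3') consists of exons (coding regions) interrupted by introns. The mRNA, and the cDNA (complementary DNA) made from it, contain a 5' UTR, the ORF (open reading frame), and a 3' UTR. DNA → RNA → Protein → Function.]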

Page 5

Introduction

DNA sequence: A, C, G, T (4 letters)

RNA sequence: A, C, G, U (4 letters; uracil (U) takes the place of thymine (T))

DNA nucleotide: phosphoric acid (phosphate) + deoxyribose + nitrogenous base

Nitrogenous bases:

Purines: adenine (A), guanine (G)

Pyrimidines: cytosine (C), thymine (T)
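The base classes above can be captured in a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names `base_class` and `transcribe` are hypothetical, and `transcribe` assumes its input is the coding (sense) strand written 5'→3', so transcription reduces to replacing T with U.

```python
# Hypothetical helpers illustrating the base classes on this slide.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}  # U (uracil) replaces T in RNA


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a single nucleotide letter."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a nucleotide: {base!r}")


def transcribe(coding_strand: str) -> str:
    """Assumes the input is the coding strand 5'->3'; the mRNA has the same letters with T -> U."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print([base_class(b) for b in "ACGT"])  # purine, pyrimidine, purine, pyrimidine
    print(transcribe("ACCGTGTGG"))
```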

Page 6

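5' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 3'

3' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 5'

Codon: each group of three consecutive nucleotides along the strand (for example ACC, GTG, TGG, ... in ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA) specifies one amino acid.

4^3 = 64 possible codons encode 20 amino acids.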
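The pairing rule shown above (A with T, C with G, strands antiparallel) and the 4^3 codon count can be checked with a short sketch. The function names are illustrative only, and `codons` assumes reading frame 1, starting at the first base.

```python
from itertools import product

# A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, aligned under the input (the partner strand read 3'->5')."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """The partner strand rewritten in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def codons(seq: str) -> list[str]:
    """Split into consecutive non-overlapping triplets (frame 1); a trailing partial codon is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


# Every possible triplet over a 4-letter alphabet: 4 * 4 * 4 = 64.
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

if __name__ == "__main__":
    top = "ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"
    print(complement(top))
    print(codons(top))
    print(len(ALL_CODONS))
```

Applied to the slide's top strand, `complement` reproduces the bottom strand letter for letter.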

Page 7

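Introduction

DNA sequence: A, C, G, T (4 letters)

RNA sequence: A, C, G, U (4 letters)

Amino acid sequence: 20 letters

Genetic code. Columns: second position (U, C, A, G). Rows: grouped by first position (5'), and within each group by third position (3') in the order U, C, A, G. Each entry gives the codon and its amino acid.

UUU Phe (F)   UCU Ser (S)   UAU Tyr (Y)   UGU Cys (C)
UUC Phe (F)   UCC Ser (S)   UAC Tyr (Y)   UGC Cys (C)
UUA Leu (L)   UCA Ser (S)   UAA Stop      UGA Stop
UUG Leu (L)   UCG Ser (S)   UAG Stop      UGG Trp (W)
CUU Leu (L)   CCU Pro (P)   CAU His (H)   CGU Arg (R)
CUC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)
CUA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)
CUG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)
AUU Ile (I)   ACU Thr (T)   AAU Asn (N)   AGU Ser (S)
AUC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)
AUA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)
AUG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)
GUU Val (V)   GCU Ala (A)   GAU Asp (D)   GGU Gly (G)
GUC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)
GUA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)
GUG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)

Stop codons: UAA, UAG, UGA. Met (M) is encoded by AUG, which also serves as the start codon.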
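One compact way to encode the table above in code is a 64-character string indexed by the three codon positions in U, C, A, G order; this is a common programming trick, not something taken from the slides. `translate` is a hypothetical helper that marks stop codons with `*` and ignores trailing bases that do not form a full codon.

```python
BASES = "UCAG"
# Amino acids for first, second, third position, each in U, C, A, G order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Translate in frame 1; DNA input is accepted by mapping T to U first."""
    rna = rna.upper().replace("T", "U")
    return "".join(CODON_TABLE[rna[p:p + 3]] for p in range(0, len(rna) - 2, 3))


if __name__ == "__main__":
    print(translate("AUGUUUUAA"))
    print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))
```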

Page 8

Introduction

6-frame translations ("-" marks a stop codon)

Given strand (5'→3'): aagctgatcgatcgattttagatagagaaaaaact

5'→3' Frame 1: K L I D R F - I E K K

5'→3' Frame 2: S - S I D F R - R K N

5'→3' Frame 3: A D R S I L D R E K T

Reverse complement (5'→3'): agttttttctctatctaaaatcgatcgatcagctt

3'→5' Frame 1: S F F S I - N R S I S

3'→5' Frame 2: V F S L S K I D R S A

3'→5' Frame 3: F F L Y L K S I D Q L
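The six frames on this slide can be reproduced with the standard genetic code: three offsets (0, 1, 2) on the given strand, then the same three offsets on its reverse complement. This is a minimal sketch under that assumption; the function names are hypothetical, and stops are written as "-" to match the slide.

```python
BASES = "TCAG"
# Standard genetic code in T, C, A, G order for positions 1, 2, 3; '-' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str, offset: int) -> str:
    """Translate complete codons starting at `offset` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Frames 1-3 on the given strand, then frames 1-3 on the reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate_frame(dna, f) for f in range(3)] + [translate_frame(rev, f) for f in range(3)]


if __name__ == "__main__":
    for frame in six_frames("aagctgatcgatcgattttagatagagaaaaaact"):
        print(" ".join(frame))
```

Run on the slide's sequence, the six printed lines should match the six frames listed above.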

Page 9

Introduction

EST (Expressed Sequence Tags) DB

HGI (Human Gene Index) DB

Gene: exon & intron

cDNA database

UniGene DB

Page 10

Introduction

Human Genome Sequencing (2/11/2001)

Draft: 61.0%

Finished: 32.5%

Total: 93.5%

Page 11

[Figure: chromosome with gaps in the sequence]

Page 12

Introduction

Genome database: ~3×10^9 bp

HTGS (High Throughput Genomic Sequences)

Phase 0: single or few-pass reads of a single clone (not contigs).

Phase 1: unfinished; contigs may be unordered and unoriented, with gaps.

Phase 2: unfinished; ordered, oriented contigs, with or without gaps.

Phase 3: finished, no gaps (with or without annotations).
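The four HTGS phases amount to a small decision rule. The sketch below encodes it under an explicit assumption: the `CloneStatus` record and its three boolean fields are hypothetical summaries of a clone's state, not fields of any GenBank or HTGS record format.

```python
from dataclasses import dataclass


@dataclass
class CloneStatus:
    """Hypothetical per-clone summary; field names are illustrative only."""
    has_contigs: bool           # reads have been assembled into contigs
    ordered_and_oriented: bool  # contig order and orientation are known
    finished: bool              # finished sequence with no gaps


def htgs_phase(status: CloneStatus) -> int:
    """Map a clone's state onto HTGS phases 0 to 3 as defined on the slide."""
    if status.finished:
        return 3  # finished, no gaps
    if not status.has_contigs:
        return 0  # single or few-pass reads, not contigs
    if status.ordered_and_oriented:
        return 2  # unfinished, ordered and oriented contigs
    return 1      # unfinished, possibly unordered/unoriented contigs with gaps
```

Checking "finished" first mirrors the slide: once a clone has no gaps, its contig layout no longer matters for the phase.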

Page 13

Introduction

Size range (kb) | Contigs | Aggregate size (kb) | Percent of total
<30 | 44 | 666 | 0.1%
30-100 | 479 | 32172 | 4.9%
100-250 | 1628 | 260933 | 39.9%
250-500 | 421 | 144518 | 22.1%
500-1000 | 145 | 98623 | 15.1%
>1000 | 43 | 116557 | 17.8%
Total | 2760 | 653471 | 100.0%
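The "Percent of total" column is each row's aggregate size divided by the total. A quick check in Python, using only the numbers from the table (the variable names are illustrative):

```python
# (size range in kb, number of contigs, aggregate size in kb), copied from the table
ROWS = [
    ("<30", 44, 666),
    ("30-100", 479, 32172),
    ("100-250", 1628, 260933),
    ("250-500", 421, 144518),
    ("500-1000", 145, 98623),
    (">1000", 43, 116557),
]
TOTAL_KB = 653471  # total aggregate size as printed in the table


def percent_of_total(rows, total_kb):
    """Each size class's share of the total, rounded to one decimal like the table."""
    return {label: round(100 * size / total_kb, 1) for label, _, size in rows}


if __name__ == "__main__":
    print(percent_of_total(ROWS, TOTAL_KB))
    print(sum(contigs for _, contigs, _ in ROWS))
    print(sum(size for _, _, size in ROWS))
```

The recomputed percentages and the contig total (2760) agree with the table. The aggregate sizes add up to 653469 kb, 2 kb short of the printed total, presumably because each row was rounded to whole kilobases.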

Page 14

Outline

Introduction

Some Research Topics

Related Links and Resources

Bioinformatics Research Center (BRC)

Some Research Topics

Page 15

Number of human genes (estimates)

Early estimate: 60,000 to 100,000

From chromosome 22 (Ch22): ~45,000

From ESTs: ~140,000

From Ch22 and HGI-5.0: ~120,000 (Ch22 taken as 1.38-fold gene-rich; extensive cleaning and assembly process)

Science, 2/16/2001: ~30,000

There are many more genes awaiting discovery within the sequence.

Page 16

Some Research Topics

Alternative Splicing

Human Diversity

Gene Signature

Genome Annotation

Page 17

Human genome: 3×10^9 bp

[Figure: classification of genomic variation. The genomic sequence contains genes (coding and non-coding regions) and inter-genic regions. Variations: single nucleotide polymorphisms (SNPs), about 10^6 to 10^7; SNP classes shown: gSNP, cSNP, rSNP, iSNP, nSNP. Functional variants: about 5%.]

Page 18

Gene-based SNPs

[Figure: two genes (Gene 1, Gene 2) with regions labeled P1 and P2, exons, and introns; SNP positions labeled nSNP, rSNP, cSNP, and iSNP.]

Page 19

Human Diversity

SNP (Single Nucleotide Polymorphism); cSNP (coding SNP)

Example: acccgctcgtcgct tgtgtt cggctaattgcgcgaat, where the codon tgt encodes Cys (C).

Synonymous (silent): tgt → tgc, still C.

Non-synonymous, non-conservative: tgt (C) → tgg (W); C is polar, W is nonpolar.

Non-synonymous, conservative: tgt (C) → tat (Y); C and Y are both polar.
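The three outcomes of a cSNP can be told apart mechanically: translate both codons, then compare the amino acids' chemical classes. The sketch below is illustrative, not the author's method. Its polarity split is an assumption (textbooks differ on borderline residues such as Gly and Cys); it follows the slide in treating C and Y as polar and W as nonpolar. It also assumes the standard genetic code and labels stop-codon changes separately.

```python
BASES = "TCAG"
# Standard genetic code, T/C/A/G order for positions 1, 2, 3; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumed two-class polarity split (one common convention, not from the slides).
POLAR = set("STCYNQDEKRH")


def classify_csnp(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding SNP by the amino acids of the reference and variant codons."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "nonsense"   # variant creates a stop codon
    if ref == "*":
        return "stop-loss"  # variant removes a stop codon
    same_class = (ref in POLAR) == (alt in POLAR)
    return "non-synonymous (conservative)" if same_class else "non-synonymous (non-conservative)"
```

With the slide's codon: `classify_csnp("tgt", "tgc")` gives synonymous, `("tgt", "tgg")` gives non-conservative, and `("tgt", "tat")` gives conservative.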

Page 20

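Human Diversity

SNP (Single Nucleotide Polymorphism); cSNP (coding SNP)

Purines (A/G) and pyrimidines (C/T)

Transition: A ↔ G, C ↔ T (purine to purine, or pyrimidine to pyrimidine)

Transversion: A/G ↔ C/T (purine to pyrimidine, or the reverse)

CD-CV: common diseases, common variants (the hypothesis that common diseases are influenced by common genetic variants).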
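The transition/transversion rule depends only on whether the two bases fall in the same chemical class. A minimal sketch (the function name is illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}->{alt}")
    # Same class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"
```

Of the 12 possible single-base changes, 4 are transitions (A↔G, C↔T in either direction) and 8 are transversions.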

Page 21

Pseudogene

Ch22: 134 pseudogenes (134 of 679 annotated genes, about 19.7%)

Processed pseudogene (a cDNA-like copy in the genome; 82% of the 134 pseudogenes):

a) a single block

b) lacks the characteristic intron-exon structure

Spliced pseudogene: segments of duplicated gene families

Page 22

Repetitive Sequence

Interspersed repeats:

SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR, ...

LINEs (Long Interspersed Elements): LINE1, LINE2, ...

Tandem repeats:

Minisatellite (Variable Number Tandem Repeats, VNTR): repeat unit of 15 to 100 bp

Microsatellite (Short Tandem Repeats, STR): repeat unit of 2 to 5 bp

α-satellite: at the centromere

Telomere repeats

[Figure: chromosome showing the centromere and telomeres]
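A microsatellite is a short unit (2 to 5 bp here) repeated back to back, so a naive scan can find one. The sketch below is a simple illustration, not a production repeat finder such as RepeatMasker. The parameters `min_unit`, `max_unit`, and `min_copies` are hypothetical, with defaults chosen to match the 2 to 5 bp range above.

```python
def find_strs(seq: str, min_unit: int = 2, max_unit: int = 5, min_copies: int = 3):
    """Scan left to right for short tandem repeats.

    Returns (start, unit, copies) tuples. At each position the unit length
    giving the longest repeat span wins; ties go to the shorter unit.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (span, unit, copies)
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[0]):
                best = (span, unit, copies)
        if best:
            hits.append((i, best[1], best[2]))
            i += best[0]  # skip past the whole repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_strs("GGCACACACACATT"))
```

This scan does not filter degenerate units, so a mononucleotide run such as AAAAAA would be reported as an "AA" repeat. A real tool would exclude units that are themselves repeats of a shorter unit.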

Page 23

Outline

Introduction

Some Research Topics

Related Links and Resources

Bioinformatics Research Center (BRC)

Related Links and Resources

Page 24

Related Links and Resources

TIGR (The Institute for Genomic Research): http://www.tigr.org/

Japan Science and Technology Corporation, Advanced Lifescience Information System (JST-ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl

NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/

Sanger: http://www.ensembl.org/

Page 25

Related Links and Resources

Gene Prediction Programs: http://www.bork.embl-heidelberg.de/genepredict.html

http://linkage.rockefeller.edu/wli/gene/programs.html

ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html

Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html

Page 26

Outline

Introduction

Some Research Topics

Related Links and Resources

Bioinformatics Research Center (BRC)

Bioinformatics Research Center (BRC)

Page 27

[Figure: BRC network with a firewall, a local server, and Lab. 1, Lab. 2, Lab. 3]

Page 28

CRASA: Complexity Reduction Algorithm for Sequence Analysis

Genome Annotation; Alternative Splicing; SNP (Single Nucleotide Polymorphism)

cDNA database

Genome sequences: chromosomes 1 to 22, X, Y

Page 29

CRASA: Complexity Reduction Algorithm for Sequence Analysis

Environment: PC cluster of 10 PCs (Pentium III, 667 MHz) and 1 server running Windows 2000 (NT); IDE hard disks with RAID support; DB2 database

Algorithm: progressive processing (pyramid structure), pattern match, direct search, parallel processing

Page 30

Parallel Processing

[Figure: a server sends a query to processors p1, p2, p3]

Bottlenecks: hard-disk I/O bound; network I/O bound; sorting and assembling: CPU bound
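The scatter, search, and sort-and-assemble pattern in the figure can be illustrated with a toy exact-match search. This is a hypothetical sketch, not CRASA's implementation: the genome is split into one partition per worker (standing in for p1, p2, p3), each worker searches its own chunk, and the results are merged and sorted. Neighbouring chunks overlap by `len(pattern) - 1` bases so that a match straddling a boundary is still found.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(task):
    """Find every start of `pattern` inside one chunk (the work of one processor)."""
    offset, chunk, pattern = task
    hits = []
    pos = chunk.find(pattern)
    while pos != -1:
        hits.append(offset + pos)  # convert to a genome-wide coordinate
        pos = chunk.find(pattern, pos + 1)
    return hits


def parallel_search(genome: str, pattern: str, workers: int = 3) -> list[int]:
    """Scatter the query over `workers` partitions, then sort and assemble the hits."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if not genome:
        return []
    size = -(-len(genome) // workers)  # ceiling division: bases per partition
    overlap = len(pattern) - 1         # catches matches that straddle a boundary
    tasks = [(lo, genome[lo:min(len(genome), lo + size + overlap)], pattern)
             for lo in range(0, len(genome), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(search_partition, tasks)
    # Each match start lies in exactly one partition's [lo, lo + size) range,
    # so merging and sorting yields a complete list with no duplicates.
    return sorted(pos for hits in partial_results for pos in hits)


if __name__ == "__main__":
    print(parallel_search("ACGTACGTTTACGT", "ACGT"))
```

Threads keep the sketch short. For CPU-bound work in CPython, swapping in `ProcessPoolExecutor` (or separate machines, as on the slide) would be the usual choice.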

Page 31

Bioinformatics

Computer Science and Biology

??