Repeated (Conserved) Patterns in Bioinformatics2010/04/09 · appear many times, e.g., commonor...

Repeated (Conserved) Patterns in Bioinformatics

Francis Y.L. Chin {钱玉麟教授}

Taikoo Chair of Engineering

Chair Professor of Computer Science

Associate Dean of Engineering

University of Hong Kong

March, 2010

Bioinformatics

Using laboratory experiments to understand biological processes is difficult, laborious, expensive and time-consuming.

Nowadays, large volumes of biological data are available.

Bioinformatics aims to exploit these data to understand biological processes through computational approaches.

What are Repeated Patterns?

Repeated patterns are similar patterns that appear many times, e.g., common or conserved patterns.

Repeated patterns can be measured by the probability of their occurrence in a random environment (p-value).

A low p-value means “information or signal bearing” and not “artifact”.

A high p-value implies “inconclusive”.

Why Repeated Patterns?

Finding repeated patterns is important in bioinformatics research

Sequence analysis

Analysis of mutations

Comparative genomics

Evolutionary biology

Biodiversity

Protein-protein interaction

Repeated (Conserved) Patterns in DNA Sequences

The Central Dogma

DNA

produces

Protein

http://research.microsoft.com/uai2004/Slides/FriedmanUAI2004-part-II.ppt

Binding Site

[Figure: DNA with a binding site and a gene; a transcription factor attaches to the binding site, and the gene produces a protein (Tyr, Leu, ...).]

Identifying the TF binding sites on a DNA sequence is an important problem.

Binding Sites Identification

• Find genes associated with the same TF.

• Search for short similar patterns (motifs), i.e., binding sites, in the DNA sequences.

[Figure: the TF GCN4 binds the promoter regions of the genes HIS7, TRP4 and ADE4. Layout: promoter regions | gene]

HIS7 DNA: AAAATTGAGTCATATC…GAGAATGCCGGTCGTTCACGTG…

TRP4 DNA: AGTTATGACTAATATT…TATCATGTCCGAGGCGACTTTG…

ADE4 DNA: CCGAATGACTGCTCAT…AAAAATGTGTGGTATTTTAGGTA…

Example

>SPR3
……CTGGTCGTAATACAAATAGAAGAGGTAAACCAATCAATGGCCC GTTAGTTTGCCATTTGCTGCATCCTTCCCATGCAAAGTGTCTT……
>COX6
……ACAGAAAATTCCAATCAAAAAGTTGGTGTTAGGCTATACTGAT GGCCGTATCGCTCCATACGAGCCAATCAGGGCCCCGCGCGTTA……
>QCR8
……CCACGTGACTAGTCCAAGGATTTTTTTTAAGCCAATTAAAATG AAGAAATGCGTGATCGGAAATTACGGGTAGTACGAGAAGGAAA……
>CYC1
……GGGCTTGATCCACCAACCAACGCTCGCCAAATGAACTGGCGCT TTGGTCTTCTGCCATCGTCCGTAAACCCCTTCCAAAGAGACCG……

Hypothesis: The binding sites are short, similar string patterns in each sequence.

Example

>SPR3
……CTGGTCGTAATACAAATAGAAGAGGTAAACCAATCAATGGCCC GTTAGTTTGCCATTTGCTGCATCCTTCCCATGCAAAGTGTCTT……
>COX6
……ACAGAAAATTCCAATCAAAAAGTTGGTGTTAGGCTATACTGAT GGCCGTATCGCTCCATACGAGCCAATCAGGGCCCCGCGCGTTA……
>QCR8
……CCACGTGACTAGTCCAAGGATTTTTTTTAAGCCAATTAAAATG AAGAAATGCGTGATCGGAAATTACGGGTAGTACGAGAAGGAAA……
>CYC1 (reverse)
……GGGCTTGATCCACCAACCAACGCTCGCCAAATGAACTGGCGCT TTGGTCTTCTGCCATCGTCCGTAAACCCCTTCCAAAGAGACCG……

All these strings are binding sites, and they are similar to a common pattern, CCAATCA, called the motif.

String Motif

C. elegans binding sites:

GTTGTCATGGTGAC
GTTTCCATGGAAAC
GCTACCATGGCAAC
GTTACCATAGTAAC
GTTTCCATGGTAAC

GTTACCATGGTAAC – Consensus string (motif)

The motif represents the common pattern of the binding sites.

The binding sites are variants of the motif.

The consensus motif gives the minimum total number of errors (finding the motif with the minimum maximum error is NP-complete) (Li et al., JCSS 2002).
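The two error measures named on this slide can be checked directly on the five binding sites. The following is a minimal sketch; the function name `hamming` and the variable names are illustrative and not taken from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Binding sites and consensus string as listed on the slide.
SITES = [
    "GTTGTCATGGTGAC",
    "GTTTCCATGGAAAC",
    "GCTACCATGGCAAC",
    "GTTACCATAGTAAC",
    "GTTTCCATGGTAAC",
]
CONSENSUS = "GTTACCATGGTAAC"

errors = [hamming(CONSENSUS, site) for site in SITES]
total_error = sum(errors)  # the quantity the consensus string minimises
max_error = max(errors)    # the quantity minimised in the NP-complete variant

print(errors, total_error, max_error)
```

A column-wise majority vote minimises the total error; at position 4 the sites are tied between A and T, so either letter gives the same total.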

Planted (l,d)-Motif Problem (PMP) (Pevzner and Sze, ISMB 2000)

l = length of motif M

d = Hamming distance

Input: T = t length-n sequences, each with at least one binding site.

Problem: Find M and the binding sites (substrings within Hamming distance d from M).
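Once a candidate motif M is fixed, locating its binding sites is a simple scan. The sketch below shows only that half of the problem (it does not search for M itself); `find_sites` is a hypothetical helper name, and the two test strings are short fragments of the SPR3 and QCR8 promoter sequences from the earlier example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_sites(seq: str, motif: str, d: int):
    """Return (position, substring) for every length-l window within distance d of motif."""
    l = len(motif)
    return [
        (i, seq[i:i + l])
        for i in range(len(seq) - l + 1)
        if hamming(seq[i:i + l], motif) <= d
    ]

# Fragments of the SPR3 and QCR8 promoter sequences shown earlier.
spr3 = "GGTAAACCAATCAATGGCCC"
qcr8 = "TTAAGCCAATTAAAATG"

print(find_sites(spr3, "CCAATCA", 0))  # exact occurrence
print(find_sites(qcr8, "CCAATCA", 1))  # occurrence with one mismatch
```

Solving the full planted motif problem would additionally require searching over candidate motifs, which is where the difficulty lies.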

Can Motif Always be Found?

Many methods exist to find the motif: EM, Gibbs sampling, exhaustive search, maximal clique, …

The motif cannot be found when:

There are too few sequences/binding sites (t and n are too small)

The binding sites are “too” short (l is too small)

The binding sites vary too much (d is too large)

In these cases the p-value of the similar patterns is too high, i.e., more than one possible solution will result.

Motif finding is successful only when the “similar” patterns are “many” and “long” (low p-values), i.e., the probability that such patterns exist by chance is very low.

Problems with String Motif

Consensus string as motif:

GTTGTCATGGTGAC
GTTTCCATGGAAAC
GCTACCATGGCAAC
GTTACCATAGTAAC
GTTTCCATGGTAAC

GTTACCATGGTAAC is the motif.

CTTACCATGGTAAC is not a binding site, although it differs from the motif in only one position.

Hamming distance cannot model the real situation.

Hypothesis: The binding sites are short, similar string patterns in each sequence, and some positions are conserved.

Motif represented by Matrix

Probability matrix M: M(α,j) = probability that the jth position is α

     1    2    3    4    5    6    7
A    0    0    0    .8   0    0    0
C    .6   .8   1    0    0    0    .2
G    .2   .2   0    0    0    1    .8
T    .2   0    0    .2   1    0    0

Binding sites:

TCCATGG
CGCATGG
CCCATGC
GCCATGG
CCCTTGG

Matrix motif (PSSM)
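The probability matrix on this slide is the column-wise letter frequency of the five binding sites. A minimal sketch that rebuilds it is given below; `probability_matrix` is an illustrative name, and no pseudocounts are added (an assumption made to match the slide's numbers).

```python
from collections import Counter

ALPHABET = "ACGT"

def probability_matrix(sites):
    """Column-wise frequencies: M[a][j] = fraction of sites with letter a at position j."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all binding sites must have the same length")
    matrix = {a: [0.0] * length for a in ALPHABET}
    for j in range(length):
        counts = Counter(s[j] for s in sites)
        for a in ALPHABET:
            matrix[a][j] = counts[a] / len(sites)
    return matrix

# Binding sites as listed on the slide.
SITES = ["TCCATGG", "CGCATGG", "CCCATGC", "GCCATGG", "CCCTTGG"]
M = probability_matrix(SITES)

for a in ALPHABET:
    print(a, M[a])
```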

Problems about Matrix Representation

Given the matrix, what are the binding sites? (In the string representation: Hamming distance.)

Which matrix can be the motif? (In the string representation: the pattern which gives the largest number of binding sites.)

Motif Representation

Given matrix M, is σ = GGCTTGC a binding site?

Pr(σ generated by M):

$p(M,\sigma) = \prod_{j=1}^{l} M(\sigma[j], j)$

     1    2    3    4    5    6    7
A    0    0    0    .8   0    0    0
C    .6   .8   1    0    0    0    .2
G    .2   .2   0    0    0    1    .8
T    .2   0    0    .2   1    0    0

p(M,“GGCTTGC”) = 0.2 × 0.2 × 1 × 0.2 × 1 × 1 × 0.2 = 0.0016

Motif Representation

Pr(σ generated by Background):

$p(B,\sigma) = \prod_{j=1}^{l} B(\sigma[j])$

B(A) = 0.2, B(C) = 0.3, B(G) = 0.3, B(T) = 0.2

p(B,“GGCTTGC”) = 0.3 × 0.3 × 0.3 × 0.2 × 0.2 × 0.3 × 0.3 = 0.0000972

What are the Binding Sites?

σ is a binding site iff

$\log\left(\frac{p(M,\sigma)}{p(B,\sigma)}\right) \ge t$ (threshold)

Large ⇒ likely to be a binding site

Example: “GGCTTGC”

$\log\left(\frac{p(M,\texttt{GGCTTGC})}{p(B,\texttt{GGCTTGC})}\right) = \log\left(\frac{0.0016}{0.0000972}\right)$ ≈ log 16.5 ≈ 1.218 (a binding site if t = 1)
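The worked example for σ = GGCTTGC can be reproduced with a few lines of code. This is a sketch under one assumption: the logarithm is taken to base 10, which is what the slide's value of about 1.218 implies. The function names are illustrative.

```python
import math

# Probability matrix M and background distribution B from the slides.
M = {
    "A": [0, 0, 0, .8, 0, 0, 0],
    "C": [.6, .8, 1, 0, 0, 0, .2],
    "G": [.2, .2, 0, 0, 0, 1, .8],
    "T": [.2, 0, 0, .2, 1, 0, 0],
}
B = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}

def p_motif(matrix, sigma):
    """p(M, sigma): product of M(sigma[j], j) over all positions j."""
    return math.prod(matrix[ch][j] for j, ch in enumerate(sigma))

def p_background(background, sigma):
    """p(B, sigma): product of B(sigma[j]) over all positions j."""
    return math.prod(background[ch] for ch in sigma)

def log_odds(matrix, background, sigma):
    """log10 of p(M, sigma) / p(B, sigma); -inf if the motif cannot generate sigma."""
    pm = p_motif(matrix, sigma)
    if pm == 0:
        return -math.inf
    return math.log10(pm / p_background(background, sigma))

sigma = "GGCTTGC"
score = log_odds(M, B, sigma)
print(p_motif(M, sigma), p_background(B, sigma), score)
print("binding site" if score >= 1 else "not a binding site")  # threshold t = 1
```

The unrounded ratio is about 16.46, so the computed score is slightly below the slide's 1.218, which comes from rounding the ratio to 16.5 first.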

Which is the Correct Matrix?

Each matrix M is given a score (Information Content), IC(M):

$IC(M) = \sum_{\sigma:\ \log(p(M,\sigma)/p(B,\sigma)) \ge t} \left( \log\left(\frac{p(M,\sigma)}{p(B,\sigma)}\right) - t \right)$

The score increases with

the number of binding sites

the similarity with M, log(p(M,σ)/p(B,σ)) – t

The correct matrix M* has the maximum score.
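The score IC(M) sums, over all substrings whose log-odds reach the threshold t, the amount by which they exceed it. A minimal sketch follows, assuming the log-odds values have already been computed for every candidate substring; the list of scores below is made up for illustration only.

```python
def information_content(log_odds_scores, t):
    """IC(M): sum of (score - t) over all substrings whose log-odds score is at least t."""
    return sum(s - t for s in log_odds_scores if s >= t)

# Hypothetical log-odds scores of four candidate substrings under some matrix M.
scores = [1.2, 0.5, 2.0, 1.0]
ic = information_content(scores, t=1.0)
print(ic)
```

Substrings below the threshold contribute nothing, so the score grows both with the number of binding sites and with how well each one matches M.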

Motif-Finding Problem

Input: A set of sequences bound by a particular transcription factor

Output:

A motif (probability matrix)

Positions of binding sites in each sequence

Leung and Chin, "Finding Exact Optimal Motif in Matrix Representation by Partitioning", Bioinformatics, Vol 21, Supp 2, ECCB/JBI, ii86-92 (September 2005)

False positives

Pattern “CGCGCG” appears many times in the sequences, but its occurrences are not binding sites. Why?

Some genes are not regulated (they contain no binding sites), yet their promoter regions also contain many “CGCGCG” patterns.

These sequences can be used as a negative set (control set).

Hypothesis: Sequences without binding sites probably do not contain any string patterns similar to the motif.

Generalized Motif-Finding Problem

Input:

T = sequences containing binding sites

F = sequences not containing binding sites

Output: Motif M and the binding sites

Leung and Chin, "Finding Motifs from All Sequences With and Without Binding Sites," Bioinformatics 2006

Color Ratio and Binding Energy

Microarray experiments can be used to indicate gene expression, which is measured by color intensity (the probability of TF binding).

The amount of binding energy (strong or weak) can be estimated by the color ratio.

Hypothesis: Each binding site, depending on its pattern, has different binding energy with the TF.

Hypothesis: Higher color intensity means stronger binding.

Energy-based Model

[Figure: four sequences (Seq 1 to Seq 4), each with a TF bound at one site; the binding energies shown are -4.8, -5.1, -4.6 and -0.5.]

Sequences:

CCAGATGAGATG
GACGATGAACGC
AGTGCTGAGGCT
CCACCAGCTATT

Energy Matrix:

     1      2      3      4      5      6
A   -0.5   -0.7    0.5   -0.5   -0.6    0.1
C    0.3   -0.4   -0.1    0.3    0.1    0.1
G   -1.1    0.5    0.2   -0.8   -0.2   -0.4
T    0.1    0.8   -1.5   -0.2    0.3   -0.2

Binding energies: -5.1, -4.6, -4.8, -0.5

Problem with binding energy

Input:

A set of DNA sequences

The binding energy between the TF and each sequence (color intensity in microarray)

Output: The motif (energy matrix) M which produces the binding energy of each sequence

The binding sites are those patterns in the sequence with the lowest energy.
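The energy matrix shown on the preceding slide can be applied to a sequence by scanning every window and keeping the one with the lowest energy. The sketch below assumes an additive model (the energy of a site is the sum of its per-position entries) and scans the forward strand only; both are assumptions, although the additive model does reproduce the energies -5.1, -4.6 and -4.8 shown on the slide. Function names are illustrative.

```python
# Energy matrix from the slide: ENERGY[letter][position].
ENERGY = {
    "A": [-0.5, -0.7, 0.5, -0.5, -0.6, 0.1],
    "C": [0.3, -0.4, -0.1, 0.3, 0.1, 0.1],
    "G": [-1.1, 0.5, 0.2, -0.8, -0.2, -0.4],
    "T": [0.1, 0.8, -1.5, -0.2, 0.3, -0.2],
}
WIDTH = 6  # number of columns in the energy matrix

def site_energy(site: str) -> float:
    """Additive binding energy of one length-6 site (assumed model)."""
    return sum(ENERGY[ch][j] for j, ch in enumerate(site))

def lowest_energy_site(seq: str):
    """Return (position, site, energy) of the window with the lowest energy."""
    windows = [
        (site_energy(seq[i:i + WIDTH]), i, seq[i:i + WIDTH])
        for i in range(len(seq) - WIDTH + 1)
    ]
    energy, pos, site = min(windows)
    return pos, site, energy

pos, site, energy = lowest_energy_site("CCAGATGAGATG")
print(pos, site, round(energy, 1))
```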

Simulated Data

         Expected number   EBMF           AlignACE       MEME
         of matrices       Find?  rank    Find?  rank    Find?  rank
B = 7    149475            yes    1       no     -       no     -
B = 8    0.000439          yes    1       no     -       yes    1
B = 9    7.7×10^-7         yes    1       yes    1       yes    1

Results of the algorithms on simulated data for 200 sequences of length 700, where 10 of them contain B binding sites of length 17 with expected likelihood -10.

Leung, Chin, Yiu, et al, "Finding Motifs with Insufficient Number of Strong Binding Sites", Journal of Computational Biology, 2005, preliminary version appeared in RECOMB 2004.

Real Data

GAL4 (motif pattern is CGG N11 CCG)

                                                            EBMF           AlignACE       MEME
                                                            Find?  rank    Find?  rank    Find?  rank
Top 100 sequences in the original data                      yes    2       yes    1       yes    1
Top 100 sequences except sequences 2, 3, 4 and 6            yes    1       no     -       no     -
Top 100 sequences except sequences 1 to 6                   yes    10      no     -       no     -
Top 100 sequences except sequences 1 to 8                   yes    5       no     -       no     -

Further Information about Transcription Factors

Proteins (transcription factors) have 3D structures and can be grouped into classes, e.g., zinc finger, leucine zipper and helix-turn-helix.

Their binding sites have different characteristics.

Six Major Classes

Guess the motif class based on the characteristics of the binding sites.

Search for motifs in that class by modifying their likelihood accordingly.

Classes            Sub-Classes                  Characteristics                                     Freq
Zinc Finger        I. Cys2 His2                 G . . G | G . . G . . G | [CG] . . [CG] . . [CG]    13%
                   II. Cys4                     AGGTCA | TGACCT                                     13%
Leucine zipper     III. bZip                    TGA .* TCA                                          23%
Helix-Turn-Helix   IV. bHLH                     CA . . TG                                           3%
                   V. Homeodomain               TAAT | ATTA                                         11%
                   VI. Others (e.g. Forkhead)   unknown                                             33%
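The characteristics in this table are written in a regular-expression-like notation, so a rough class guess can be sketched with ordinary regular expressions. This is only an illustration of the idea, not the method of the cited work: the dictionary keys and the function `candidate_classes` are hypothetical names, the patterns are transcribed from the table with spaces removed, and a real classifier would weight the classes by likelihood rather than use a yes/no match.

```python
import re

# Patterns transcribed from the table (class VI has no known pattern and is omitted).
CLASS_PATTERNS = {
    "Cys2His2": r"G..G|G..G..G|[CG]..[CG]..[CG]",
    "Cys4": r"AGGTCA|TGACCT",
    "bZip": r"TGA.*TCA",
    "bHLH": r"CA..TG",
    "Homeodomain": r"TAAT|ATTA",
}

def candidate_classes(site: str):
    """Names of all classes whose characteristic pattern occurs somewhere in the site."""
    return [name for name, pattern in CLASS_PATTERNS.items() if re.search(pattern, site)]

print(candidate_classes("TGAGTCA"))  # the GCN4 site seen in the HIS7 promoter
print(candidate_classes("CACGTG"))
```

Short patterns such as `G..G` match many strings by chance, which is one reason the slides adjust likelihoods instead of filtering outright.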

3D Binding Domains and DNA Binding Sites

TFs are classified by their 3D binding domains.

3D binding domains and DNA binding sites are related.

DNA binding sites are classified accordingly.

Hypothesis: Most transcription factors can be classified into a few protein structures and binding domains, so most binding sites should follow a few patterns.

Leung and Chin, "Discovering Motifs with Transcription Factor Domain Knowledge", PSB2007

Experimental Results (MEME / DIMDom)

Number of motifs with known sites discovered: MEME: 38, DIMDom: 47

Average accuracy: MEME: 0.3141, DIMDom: 0.4471

Higher accuracy: MEME: 9 data sets, DIMDom: 26 data sets, same accuracy: 3 data sets

Protein Sequences: A Motif-Pair for Binding

Leung, Siu, Yiu, Chin, Sung, "Finding Linear Motif Pair from Protein Interaction Networks: A Probabilistic Approach", CSB2007

Protein-Protein Interaction (PPI) Network

Vertex: protein sequence

Edge (interaction) between u and v: protein u and protein v can bind together

Many interactions are missing or erroneous

Motif Pairs for Protein Interaction

Problem: When two proteins interact, where are their binding sites (domains)?

Binding sites and Motif

1) Different proteins may have similar binding sites.

GLFPSNY

GFIPGNY

GVFPGNY

GIFPLNY

2) The proteins that these proteins bind to also contain another set of similar binding sites.

PTLPPR

PIKPPR

PTAPQR

PTLPSR

PPLPNR

PPLPTR

Interaction

Hypothesis

If M and M’ are two motifs representing two sets of real binding sites that interact, we expect that the sequences containing instances of the corresponding motifs should have many interactions.

The Motif Pair Finding Problem

Input: protein-protein interaction network

Problem: To find a pair of motifs (M1, M2) such that sequences containing M1 and sequences containing M2 have an unexpectedly large number of interactions.

[Figure: two example motif pairs drawn on the network; one pair may be a real motif pair, the other is not a real motif pair.]
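The quantity at the heart of this problem, the number of interactions between proteins containing M1 and proteins containing M2, is easy to count for a given pair. The sketch below uses a toy network: the protein names, sequences and edges are invented, and the two patterns are simple regular expressions suggested by the example binding sites listed earlier (an assumption, not the motif model of the cited paper). Deciding whether the count is "unexpectedly large" would need a statistical test, which is omitted here.

```python
import re

def count_pair_interactions(proteins, edges, motif1, motif2):
    """Count edges joining a protein that contains motif1 to a protein that contains motif2."""
    has1 = {p for p, seq in proteins.items() if re.search(motif1, seq)}
    has2 = {p for p, seq in proteins.items() if re.search(motif2, seq)}
    count = sum(
        1
        for u, v in edges
        if (u in has1 and v in has2) or (u in has2 and v in has1)
    )
    return count, has1, has2

# Toy data (hypothetical): sequences embed sites such as GLFPSNY and PTLPPR.
proteins = {
    "P1": "MKGLFPSNYAA",
    "P2": "AAGFIPGNYLL",
    "P3": "QQPTLPPRSS",
    "P4": "TTPIKPPRGG",
    "P5": "MMMMMMMM",
}
edges = [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P5")]

count, has1, has2 = count_pair_interactions(proteins, edges, r"G..P.NY", r"P..P.R")
print(count, sorted(has1), sorted(has2))
```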

Another Problem related to PPI: Predicting Protein Complexes

Leung, Xiang, Yiu and Chin, "Predicting Protein Complexes from PPI Data: A Core-Attachment Approach", Journal of Computational Biology, 2009. Preliminary version presented in RECOMB Satellite 2008.

Protein Complex

A protein complex is a group of proteins that bind together.

It is represented as a connected subgraph in the PPI network.

Problem: To predict protein complexes from PPI network

Hypothesis: Proteins in the same complex have more interactions among them.

Protein complexes (dense subgraphs)

Heuristics to Find Dense Sub-graphs

Markov Cluster (MCL) (Enright et al., Nucl Acids Res 2002): bootstrapping through random walks in graphs, i.e., walks are trapped in clusters and rarely go out to other clusters

Molecular Complex Detection (MCODE) (Bader and Hogue, BMC Bioinf 2003): start with high-degree vertices and recursively merge with neighbors while ensuring the density stays above a given threshold

CFinder (Adamcsek et al., Bioinformatics 2006): locate overlapping cliques by merging two k-cliques if they share k-1 nodes

Core and Attachment (Leung et al., JCB 2009): based on the biological information that each complex has a core

Difficulties

Many interactions are missing

The protein complexes may not necessarily be dense subgraphs, especially when the complexes are large

Some proteins are present in multiple complexes

Structure of Protein Complex

Proteins in a complex consist of [Gavin et al. 2006]:

Core proteins

Attachments

Each protein complex has a unique set of core proteins

[Figure labels: Core, Attachments, Modules]

Core Proteins

Core proteins have relatively more interactions among themselves (dense subgraphs)

Attachments bind to core proteins to form complexes (attachments are neighbors of cores)

Each protein complex has a unique set of core proteins: cores are disjoint, and a core is not present in other complexes

Experimental Results on Cores

Comparison with Mediator [Andreopoulos et al. 2007] on the Gavin dataset

# of correct cores   acc ≥ 0.4   acc ≥ 0.6   acc ≥ 0.8
Mediator             29          8           0
Ours                 267         169         103

Experiments on Complexes

Comparison with 3 methods (MCL, MCODE and CFinder) on 3 datasets

Datasets   Number of proteins   Number of interactions   Average degree
DIP        4,928                17,201                   6.98
Krogan     2,675                7,080                    5.29
Gavin      1,430                6,531                    9.13
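The average degree column follows from the other two columns: in an undirected network every interaction contributes to the degree of two proteins, so the average degree is 2 × interactions / proteins. A short check, with illustrative names:

```python
# (number of proteins, number of interactions) as listed in the table.
DATASETS = {
    "DIP": (4928, 17201),
    "Krogan": (2675, 7080),
    "Gavin": (1430, 6531),
}

def average_degree(num_proteins: int, num_interactions: int) -> float:
    """Average vertex degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * num_interactions / num_proteins

for name, (n, m) in DATASETS.items():
    print(name, round(average_degree(n, m), 2))
```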

Comparison with MCL

# of correct complexes (MCL / Ours)

         acc ≥ 0.6   acc ≥ 0.7   acc ≥ 0.8
DIP      30 / 36     20 / 26     10 / 15
Krogan   28 / 37     16 / 21     7 / 11
Gavin    32 / 35     26 / 29     11 / 17

Comparison with MCODE

# of correct complexes (MCODE / Ours)

         acc ≥ 0.6   acc ≥ 0.7   acc ≥ 0.8
DIP      17 / 29     13 / 22     7 / 13
Krogan   6 / 24      5 / 16      2 / 8
Gavin    23 / 32     19 / 27     8 / 17

Comparison with CFinder

# of correct complexes (CFinder / Ours)

         acc ≥ 0.6   acc ≥ 0.7   acc ≥ 0.8
DIP      19 / 28     14 / 22     10 / 13
Krogan   28 / 28     13 / 19     5 / 11
Gavin    25 / 29     20 / 26     13 / 16

Different Biological Processes between Two Groups of Species

Given two groups of species, each with a metabolic network

Find those reactions or metabolic pathways which belong to most of the networks in one group but not in the other.

Hypothesis: There must exist a set of reactions or metabolic pathways in one group of species but not in the other.

Repeated Patterns Make Genome Assembly Difficult

Peng, Leung, Yiu and Chin, “IDBA - Iterative de Bruijn Graph de Novo Assembler”, RECOMB 2010 (to appear).

De novo Assembling

[Figure: genome with unknown sequence → sequencing → reads (45 – 140bp) → assembling → genome with known sequence]

However, there are many problems

Problems

Error in reads: 1-2% error rate per nucleotide

e.g. 1% error rate, 75bp read length: ~1 – (1 – 1%)^75 = 53% of reads have an error

Gap: positions with no read cover

Repeat: if the length of the repeat ≥ read length, it is impossible to assemble
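The 53% figure for erroneous reads follows from the per-nucleotide error rate if errors at different positions are assumed to be independent (an assumption implicit in the formula). A minimal sketch, with an illustrative function name:

```python
def fraction_reads_with_error(error_rate: float, read_length: int) -> float:
    """Probability that a read has at least one error, assuming independent per-base errors."""
    return 1 - (1 - error_rate) ** read_length

# 1% error rate per nucleotide, 75bp reads, as in the example above.
print(round(fraction_reads_with_error(0.01, 75), 2))
```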

Repeats in E. coli

Length   Repeat #
30       3899
40       2784
50       2248
100      1074
200      536
300      345
500      200
1000     101

De novo Assembling

[Figure: genome → sequencing → assembling; gaps, repeats and errors in the reads break the assembly into contigs separated by new gaps.]

Assembling Problem

Input: A set of reads from a genome

Objective: Construct contigs of the genome with good

Accuracy (>99.9%)

Coverage

Length, measured by N50 (the length of the shortest contig in the smallest set of longest contigs that covers ≥ 50% of the genome)
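N50 as defined here can be computed by sorting the contigs from longest to shortest and accumulating lengths until half of the genome is covered. The sketch below follows that definition (relative to the genome length, as on the slide) and assumes contigs do not overlap; the contig lengths in the example are made up.

```python
def n50(contig_lengths, genome_length):
    """Length of the shortest contig in the smallest set of longest contigs covering >= 50% of the genome."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_length:
            return length
    return 0  # the contigs together cover less than half of the genome

# Hypothetical contig lengths for a genome of 300bp.
print(n50([80, 70, 50, 40, 30, 20], genome_length=300))
```

In the example, the two longest contigs (80 and 70) already cover 150 of 300 bases, so the N50 is 70.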

Existing Approaches

Greedy: SSAKE (Warren, 2007), SHARCGS (Dohm, 2007)

Works well with high-coverage and error-free data

[Figure: a contig is extended by reads with overlap ≥ k. Stop when overlap < k (favors small k); stop at an error or repeat (favors large k).]

String Graph: Edena (Hernandez, 2008)

Vertex: read

Edge: overlap ≥ k

Simple path: contig

Handles some errors (dead-ends), but not other errors

Large graph (requires memory)

Works with high-coverage and error-free data

Ex: k = 5

Reads: GTACTCT, TACTCTA, CTCTAGC, CTAGCTG, TAGCTGA, AGCTGCT, CTGCTCC

Contig: GTACTCTAGCTG

Dead-end: CTCC…

De Bruijn Graph: Velvet (Zerbino, 2008)

Vertex: k-mer in a read (length-k substring)

Edge: overlap of k – 1 bp

Simple path: contig

Advantages:

Gains information from erroneous reads

Better filtering

Ex: k = 5

Reads: GTACTCT, TACTCTA, CTCTAGC, CTAGCTG

k-mers: GTACT, TACTC, ACTCT, CTCTA, TCTAG, CTAGC, TAGCT, AGCTG
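The k = 5 example can be reproduced with a small de Bruijn graph built from the four reads GTACTCT, TACTCTA, CTCTAGC and CTAGCTG. This is a minimal sketch of the definition on the slide (vertices are k-mers, edges join k-mers overlapping by k – 1 bp), not Velvet's actual implementation; the function names are illustrative, and the contig walk handles only a single unbranched path.

```python
def build_de_bruijn(reads, k):
    """Vertices are the k-mers in the reads; there is an edge u -> v when u[1:] == v[:-1]."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    succ = {u: sorted(v for v in kmers if u[1:] == v[:-1]) for u in kmers}
    return kmers, succ

def contig_from_simple_path(kmers, succ):
    """Spell the contig along an unbranched path starting at the vertex with no predecessor."""
    indegree = {u: 0 for u in kmers}
    for u in kmers:
        for v in succ[u]:
            indegree[v] += 1
    starts = sorted(u for u in kmers if indegree[u] == 0)
    if len(starts) != 1:
        raise ValueError("expected exactly one start vertex")
    node = starts[0]
    contig = node
    # Follow the path while it neither branches nor merges.
    while len(succ[node]) == 1 and indegree[succ[node][0]] == 1:
        node = succ[node][0]
        contig += node[-1]
    return contig

reads = ["GTACTCT", "TACTCTA", "CTCTAGC", "CTAGCTG"]
kmers, succ = build_de_bruijn(reads, k=5)
print(sorted(kmers))
print(contig_from_simple_path(kmers, succ))
```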

Filtering

Sequencing depth: 75

Read length: 75bp

String graph: a correct read appears ~1 time; an erroneous read appears ~1 time

De Bruijn graph: a correct 30-mer appears ~45 times; an erroneous 30-mer appears ~1 – 2 times

Erroneous reads can be filtered easily

Reads:

ACGCCATCACGTTC
CGCCTTCACGTTCT
CCATCACGTTCTCA
CCATCACGCTCTCA
CATCACGTTCTCAG

Correct 7-mers: CATCACG: 4 times, ATCACGT: 3 times, TCACGTT: 4 times

Error 7-mers: CTTCACG: 1 time, TTCACGT: 1 time, ATCACGC: 1 time, TCACGCT: 1 time
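The 7-mer counts in this example can be reproduced by counting every k-mer across the five reads. The sketch below also applies a simple count threshold; the threshold value `MIN_COUNT` and the names are assumptions for illustration, not parameters given in the slides.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))

# The five reads from the example above.
READS = [
    "ACGCCATCACGTTC",
    "CGCCTTCACGTTCT",
    "CCATCACGTTCTCA",
    "CCATCACGCTCTCA",
    "CATCACGTTCTCAG",
]

counts = kmer_counts(READS, 7)
MIN_COUNT = 2  # hypothetical threshold: k-mers seen only once are treated as errors
solid = {kmer for kmer, c in counts.items() if c >= MIN_COUNT}

for kmer in ["CATCACG", "ATCACGT", "TCACGTT", "CTTCACG", "ATCACGC"]:
    print(kmer, counts[kmer], kmer in solid)
```

With only five short reads, correct k-mers near the read ends also appear once and would be discarded by this threshold; at the sequencing depths quoted on the slide the gap between correct and erroneous k-mer counts is much wider.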

Problem of Existing Algorithms

Find Suitable k

Genome: GTACTACTATGC

Reads: GTACTAC, TACTACT, ACTACTA, CTACTAT, ACTATGC

k = 5 (branch problem): GTACT, TACTA, ACTAC, CTACT, ACTAT, CTATG, TATGC

k = 6 (gap problem): GTACTA, TACTAC, ACTACT, CTACTA, TACTAT, ACTATG, CTATGC

No suitable k

                                              Large k   Small k   Target
Branch problem (due to repeat / error)        Less      More      Less
Gap problem (due to low coverage / error)     More      Less      Less

Selecting a suitable k reduces these problems, but still produces short contigs.

Remark: these problems appear in greedy and string graph algorithms too.

Iterative de Bruijn Graph Assembler

Construct an algorithm that achieves a better result than any single selection of k.

Start with a small k: fewer gaps but more branches, giving many short contigs.

Iterate to higher k to resolve branches and to get longer contigs by merging.

Simulated Data

Genome: Escherichia coli (O157:H7 str. EC4115), 5.6 million bp

Sequencing depth: 30x

Read length: 75bp

Error rate: 1%

Paired-end, insert distance: 250bp

Requirements for contigs: accuracy > 99.9%, length > 100bp

Comparison (Simulated)

N50: the largest length such that all contigs of at least that length together cover at least 50% of the genome

E. coli, L = 75, depth = 30, error rate = 1%

Other Statistics

          time    Mem     k       Contigs#   N50     max len.   avg. len.   cov.    wrong #   wrong len.
velvet    155s    1641M   30      1369       19284   96905      2652        94.6%   19        9813
edena     957s    678M    40      4672       5104    46908      900         97.2%   650       72019
abyss     1113s   1749M   40      1390       22109   87118      2966        95.1%   66        34998
IDBA      370s    360M    25-50   1550       63218   217365     2210        97.5%   10        3935
optimal                   50      1561       63218   217365     2051        99.1%

Other Setting (N50 under other settings)

                           High Coverage,    Low Coverage,     High Coverage,     Low Coverage,
                           Low Error Rate    Low Error Rate    High Error Rate    High Error Rate
                           (100x, 0.5%)      (30x, 1%)         (100x, 2%)         (30x, 2%)
Edena (string graph)       63256             5104              53491              147
Velvet (de Bruijn graph)   63214             24772             59285              16527
Abyss (de Bruijn graph)    58678             22109             50009              10992
IDBA (our algorithm)       63218             63218             59287              32612

Real Biological Data

Genome: Bacillus subtilis

Sequencing depth: ~45x

Read length: 75bp

Error rate: ~1%

Requirement for contigs: length > 100bp

Comparison (Real)

Bacillus subtilis, L = 75, depth = 45, error rate = 1%

Other Statistics

          time   Mem    k       Contigs#   N50      max len.   avg. len.   total len.
velvet    89s    893M   35      476        35136    164023     8580        4.08M
edena     649s   632M   40      926        19423    66455      4444        4.11M
abyss     729s   923M   45      445        30081    134067     9184        4.09M
IDBA      313s   310M   25-50   283        122574   602412     14489       4.1M

Conclusions

IDBA outperforms existing assembling algorithms in contig length and accuracy

Main idea: using multiple values of k; better results are guaranteed by increasing the value of k

Can be downloaded at http://www.cs.hku.hk/~alse/idba/

Future Works

Develop a more effective algorithm for paired-end reads

Assembling the transcriptome (RNA): different expression levels, alternative splicing

Metagenomic sequencing: mixed genomes with different coverages

Thanks and Questions
