Optimal alignments in linear space

Introduction

By Eugene W. Myers and Webb Miller

Outline: Introduction; Gotoh's algorithm; O(N)-space Gotoh's algorithm; Main algorithm; Implementation; Conclusion.

Introduction
The focus is space, not time. Hirschberg's algorithm maximizes the similarity score of an alignment; Gotoh's algorithm minimizes the difference score (cost) of a conversion. The paper gives a linear-space version for affine gap penalties. With one megabyte of memory, Myers and Miller can align sequences of length 62500, while the approach of Altschul and Erickson is limited to sequences shorter than 1070.

Transformation (1/2)
Hirschberg's algorithm maximizes a similarity score S built from substitution scores σ(a, b); Gotoh's algorithm minimizes a conversion cost C built from replacement costs w(a, b) and gap costs g + h·k. Let max be the largest σ(a, b), and let a gap of k symbols score -(G + H·k). Then set

$$w(a,b) = \max - \sigma(a,b), \qquad g = G, \qquad h = H + \frac{\max}{2}.$$

Each aligned pair now carries max and each gap symbol carries max/2, and an alignment of A and B covers all M + N symbols, so every alignment satisfies

$$C = \frac{\max}{2}(M + N) - S.$$

Minimizing C is therefore the same as maximizing S.

Transformation (2/2)
Scores: match = 8, mismatch = -5, gap symbol = -3, gap open = -4.
Costs: a match costs 8 - 8 = 0, a mismatch costs 8 - (-5) = 13, g = 4, and h = 3 + 8/2 = 7, so a one-symbol gap costs 11.

Example (1/2) and (2/2)
Hirschberg's algorithm reports the maximum score S; Gotoh's algorithm reports the minimum conversion cost C. A match that scores 8 under Hirschberg's scheme costs 0 as a conversion.
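The identity C = (max/2)(M + N) - S can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: `min_cost`, `best_score`, and `transformed_cost` are hypothetical helper names, the score maximization is computed as a minimization of negated scores with the same Gotoh-style recurrences, and the sequence pair is borrowed from the score table in the Conclusion. It also assumes max is even so that max/2 stays an integer.

```python
def min_cost(a, b, w, g, h):
    """Minimum conversion cost with replacement cost w(x, y) and gap cost g + h*k.

    Gotoh-style recurrences, rows kept in linear space; costs may be negative.
    """
    n = len(b)
    CC = [0] + [g + h * j for j in range(1, n + 1)]   # C(0, j)
    DD = [c + g for c in CC]                           # D(0, j) = C(0, j) + g
    for i in range(1, len(a) + 1):
        s, CC[0] = CC[0], g + h * i                    # s = C(i-1, 0); CC[0] = C(i, 0)
        c = CC[0]
        e = c + g                                      # I(i, 0) = C(i, 0) + g
        for j in range(1, n + 1):
            e = min(e, c + g) + h                      # I(i, j)
            DD[j] = min(DD[j], CC[j] + g) + h          # D(i, j)
            c = min(DD[j], e, s + w(a[i - 1], b[j - 1]))  # C(i, j)
            s, CC[j] = CC[j], c
    return CC[n]


MATCH, MISMATCH, GAP_SYMBOL, GAP_OPEN = 8, -5, -3, -4   # scores from the slide
MAX = MATCH                                             # largest substitution score (assumed even)


def sigma(x, y):
    return MATCH if x == y else MISMATCH


def best_score(a, b):
    # Maximizing the score is minimizing its negation.
    return -min_cost(a, b, lambda x, y: -sigma(x, y), -GAP_OPEN, -GAP_SYMBOL)


def transformed_cost(a, b):
    # w = max - sigma, g = gap-open penalty, h = gap-symbol penalty + max/2
    return min_cost(a, b, lambda x, y: MAX - sigma(x, y), -GAP_OPEN, -GAP_SYMBOL + MAX // 2)


A, B = "CTTAACT", "CGGATCAT"
S, C = best_score(A, B), transformed_cost(A, B)
print(S, C, MAX // 2 * (len(A) + len(B)) - S)   # the last two numbers agree
```

Because the identity holds for every alignment, it also holds for the optimal one, so the two searches pick the same alignment.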

Gotoh's algorithm (R99922005)
Some notation: $A_i = a_1 a_2 \ldots a_i$ is the i-symbol prefix of A, $B_j = b_1 b_2 \ldots b_j$ is the j-symbol prefix of B, and C(i, j) is the minimum cost of a conversion of $A_i$ to $B_j$.

Simple gap (1/4)
A gap of length k costs $\mathrm{gap}(k) = h \cdot k$.

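Simple gap (2/4)
The full table C for A = AGTAC (rows) and B = AAG (columns) needs space O(n^2). (The entries shown are those of the affine-gap example below, with g = 2.0 and h = 0.5.)

         -     A     A     G
-      0.0   2.5   3.0   3.5
A      2.5   0.0   2.5   3.0
G      3.0   2.5   1.0   2.5
T      3.5   3.0   3.5   2.0
A      4.0   3.5   3.0   4.5
C      4.5   4.0   4.5   4.0

Simple gap (3/4)
Split A at the middle row m/2.

Simple gap (4/4)
Compute forward scores down to row m/2 and backward scores up to it; only O(m + n) space is needed.

Affine gap (1/8)
A gap of length k costs g + k·h. Example alignment with one gap of length 3 and one of length 2:

A - - - T A A C T
C G A A T C - - T

Affine gap (2/8)
C(i, j): minimum cost of a conversion of $A_i$ to $B_j$.
D(i, j): minimum cost of a conversion of $A_i$ to $B_j$ that deletes $a_i$.
I(i, j): minimum cost of a conversion of $A_i$ to $B_j$ that inserts $b_j$.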
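Stepping back to the simple-gap slides (3/4) and (4/4): the forward/backward split can be sketched as below. This is a minimal illustration assuming edit-distance costs (h = 1, replacement cost 0 for a match and 1 for a mismatch) rather than the slides' values; `last_row`, `hirschberg_midpoint`, and `unit_cost` are hypothetical names. Only two rows are alive at any time, and the midpoint is the column j that minimizes forward[j] + backward[N - j].

```python
def unit_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0 if x == y else 1


def last_row(a, b, h, w):
    """Row len(a) of the cost table for gap(k) = h*k, computed in O(len(b)) space."""
    prev = [h * j for j in range(len(b) + 1)]               # row 0: C(0, j) = h*j
    for i in range(1, len(a) + 1):
        cur = [h * i] + [0] * len(b)                        # C(i, 0) = h*i
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j - 1] + w(a[i - 1], b[j - 1]),  # replace a_i by b_j
                         prev[j] + h,                          # delete a_i
                         cur[j - 1] + h)                       # insert b_j
        prev = cur
    return prev


def hirschberg_midpoint(a, b, h, w):
    """Return (optimal cost, column j) where an optimal path crosses row len(a) // 2."""
    mid, n = len(a) // 2, len(b)
    forward = last_row(a[:mid], b, h, w)                 # forward[j]: a[:mid] vs b[:j]
    backward = last_row(a[mid:][::-1], b[::-1], h, w)    # backward[k]: a[mid:] vs last k symbols of b
    j = min(range(n + 1), key=lambda j: forward[j] + backward[n - j])
    return forward[j] + backward[n - j], j


print(hirschberg_midpoint("AGTAC", "AAG", 1, unit_cost))  # prints (3, 1)
```

Only the midpoint is found here; the divide-and-conquer step then recurses on the two halves on either side of it.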

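Affine gap (3/8)

$$C(i,j) = \begin{cases} \min\{D(i,j),\ I(i,j),\ C(i-1,j-1) + w(a_i,b_j)\} & \text{if } i > 0 \text{ and } j > 0 \\ g + h\,j & \text{if } i = 0 \text{ and } j > 0 \\ g + h\,i & \text{if } i > 0 \text{ and } j = 0 \\ 0 & \text{if } i = 0 \text{ and } j = 0 \end{cases}$$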

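Affine gap (4/8)

$$D(i,j) = \begin{cases} \min\{D(i-1,j),\ C(i-1,j) + g\} + h & \text{if } i > 0 \text{ and } j > 0 \\ C(0,j) + g & \text{if } i = 0 \text{ and } j > 0 \end{cases}$$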

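Affine gap (5/8)

$$I(i,j) = \begin{cases} \min\{I(i,j-1),\ C(i,j-1) + g\} + h & \text{if } i > 0 \text{ and } j > 0 \\ C(i,0) + g & \text{if } i > 0 \text{ and } j = 0 \end{cases}$$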

Affine gap (6/8)

Affine gap (7/8)
Tables D, C, and I for A = AGTAC (rows) and B = AAG (columns), with g = 2.0, h = 0.5, and replacement cost 0 for a match and 1 for a mismatch; * marks an undefined entry.

D        -     A     A     G
-        *   4.5   5.0   5.5
A        *   5.0   5.5   6.0
G        *   2.5   5.0   5.5
T        *   3.0   3.5   5.0
A        *   3.5   4.0   4.5
C        *   4.0   4.5   5.0

C        -     A     A     G
-      0.0   2.5   3.0   3.5
A      2.5   0.0   2.5   3.0
G      3.0   2.5   1.0   2.5
T      3.5   3.0   3.5   2.0
A      4.0   3.5   3.0   4.5
C      4.5   4.0   4.5   4.0

I        -     A     A     G
-        *     *     *     *
A      4.5   5.0   2.5   3.0
G      5.0   5.5   5.0   3.5
T      5.5   6.0   5.5   6.0
A      6.0   6.5   6.0   5.5
C      6.5   7.0   6.5   7.0

Affine gap (8/8)
The same three tables; the optimal conversion cost is C(5, 3) = 4.0.
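The D, C, and I tables can be reproduced with a direct, full-matrix transcription of the recurrences in Affine gap (3/8) to (5/8). This is a sketch that assumes a replacement cost of 0 for a match and 1 for a mismatch, which is consistent with the tabulated values; `gotoh_tables` and `unit_cost` are hypothetical names, and INF stands in for the undefined * entries.

```python
INF = float("inf")  # stands in for the undefined (*) entries


def unit_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def gotoh_tables(a, b, g, h, w):
    """Fill the full C, D, I tables (O(MN) space) from Gotoh's recurrences."""
    m, n = len(a), len(b)
    C = [[INF] * (n + 1) for _ in range(m + 1)]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    I = [[INF] * (n + 1) for _ in range(m + 1)]
    C[0][0] = 0.0
    for j in range(1, n + 1):
        C[0][j] = g + h * j                 # i = 0, j > 0
        D[0][j] = C[0][j] + g               # boundary for D
    for i in range(1, m + 1):
        C[i][0] = g + h * i                 # i > 0, j = 0
        I[i][0] = C[i][0] + g               # boundary for I
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j], C[i - 1][j] + g) + h   # extend or open a deletion gap
            I[i][j] = min(I[i][j - 1], C[i][j - 1] + g) + h   # extend or open an insertion gap
            C[i][j] = min(D[i][j], I[i][j], C[i - 1][j - 1] + w(a[i - 1], b[j - 1]))
    return C, D, I


C, D, I = gotoh_tables("AGTAC", "AAG", 2.0, 0.5, unit_cost)
print(C[5][3])  # prints 4.0, the optimal conversion cost
```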

O(N) space Gotoh's algorithm (R99922041)
Observation: row i of C and D depends only on rows i and i - 1; row i of I depends only on row i.

Linear space
Use two one-dimensional arrays (CC and DD) and three scalar variables (s, c, and e).

Linear space algorithm
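A sketch of the linear-space cost computation, in the spirit of the slide: two arrays CC and DD plus the scalars t, s, c, and e. It is an illustration rather than the paper's exact code; `linear_space_cost` and `unit_cost` are hypothetical names, and the 0/1 replacement cost is assumed. After row i, CC[j] holds C(i, j) and DD[j] holds D(i, j); I is never stored as a row, only carried along in e.

```python
def unit_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def linear_space_cost(a, b, g, h, w):
    """Return the last row C(M, 0..N) using only CC, DD and a few scalars."""
    n = len(b)
    CC = [0.0] * (n + 1)
    DD = [0.0] * (n + 1)
    t = g
    for j in range(1, n + 1):                # row 0
        t += h
        CC[j] = t                            # C(0, j) = g + h*j
        DD[j] = t + g                        # D(0, j) = C(0, j) + g
    t = g
    for i in range(1, len(a) + 1):
        s = CC[0]                            # C(i-1, 0): diagonal predecessor for j = 1
        t += h
        c = t                                # C(i, 0) = g + h*i
        CC[0] = c
        e = t + g                            # I(i, 0) = C(i, 0) + g
        for j in range(1, n + 1):
            e = min(e, c + g) + h                          # I(i, j); c holds C(i, j-1)
            DD[j] = min(DD[j], CC[j] + g) + h              # D(i, j); CC[j] holds C(i-1, j)
            c = min(DD[j], e, s + w(a[i - 1], b[j - 1]))   # C(i, j)
            s = CC[j]                                      # C(i-1, j) is the next diagonal
            CC[j] = c
    return CC


print(linear_space_cost("AGTAC", "AAG", 2.0, 0.5, unit_cost))  # prints [4.5, 4.0, 4.5, 4.0]
```

The last entry, 4.0, is the optimal conversion cost C(5, 3), the same value as in the full table, but only O(N) storage was used.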


Walkthrough on A = AGTAC and B = AAG with g = 2.0 and h = 0.5: before the first row, t = 2.0 and CC and DD hold row 0; when row i = 5 begins, t = 4.5 (= g + 5h); the following frames step through j = 1 of row 5, updating s, c, and e along CC and DD.

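Optimal conversion cost: after the last row, CC[N] holds C(M, N), here C(5, 3) = 4.0. But what is the conversion of AGTAC to AAG itself? CC and DD hold only costs, not the conversion.

Main algorithm, part 1 (B95902077)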

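Midpoint
Hirschberg (1975): recursive divide-and-conquer. Split A at the middle row i*, then compute forward (from the top down to row i*) and backward (from the bottom up to row i*).
With gap penalties, entry (i, j) is computed from (i - 1, j - 1), (i, j - 1), and (i - 1, j).
Forward: CC(j) = minimum cost of a conversion of $A_{i^*}$ to $B_j$; DD(j) = minimum cost of a conversion of $A_{i^*}$ to $B_j$ that ends with a delete.
Backward: RR(N - j) = minimum cost of a conversion of $A_{i^*}^T$ to $B_j^T$; SS(N - j) = minimum cost of a conversion of $A_{i^*}^T$ to $B_j^T$ that begins with a delete. Here $A_{i^*}^T$ and $B_j^T$ denote the suffixes $a_{i^*+1} \ldots a_M$ and $b_{j+1} \ldots b_N$, processed in reverse.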

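Find the midpoint with gap penalties by forward and backward computing. How do we compute the midpoint?

Main algorithm, part 2 (R99922035)
Midpoint
The difficulty in computing the midpoint is that when the two halves are concatenated into one conversion, a gap that ends the first half and a gap that starts the second half may coalesce into a single gap, which must pay the gap-open penalty g only once.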

This means we must consider min{CC + RR, DD + SS - g, II + JJ - g}, where II and JJ are the insertion counterparts of DD and SS.
Recall, however, that the algorithm above saves space by not keeping II and JJ arrays.

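We can reduce this to min{CC + RR, DD + SS - g}: an insertion gap lies within a single row, so when one lies on row i*, the split column j can be chosen at its right end, and the whole gap is then counted in CC + RR. So instead of

$$\min_{j \in [0, N]} \min\{CC(j) + RR(N-j),\ DD(j) + SS(N-j) - g,\ II(j) + JJ(N-j) - g\}$$

it suffices to find

$$\min_{j \in [0, N]} \min\{CC(j) + RR(N-j),\ DD(j) + SS(N-j) - g\}.$$

Type 1 (the CC + RR term is minimal): the optimal conversion passes through (i*, j); recurse on ($A_{i^*}$, $B_j$) and on ($a_{i^*+1} \ldots a_M$, $b_{j+1} \ldots b_N$).
Type 2 (the DD + SS - g term is minimal): a deletion gap crosses row i* at column j, deleting both $a_{i^*}$ and $a_{i^*+1}$; recurse on ($a_1 \ldots a_{i^*-1}$, $b_1 \ldots b_j$) and on ($a_{i^*+2} \ldots a_M$, $b_{j+1} \ldots b_N$).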

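Example: A = agtac, B = aag, i* = 2. An optimal conversion (cost 4.0 with g = 2.0 and h = 0.5) is

agtac
a__ag

Its deletion gap removes $a_2 a_3$ = gt and crosses row i* = 2 at column j = 1, so the midpoint is of type 2.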

Recursive calls on (a, a) and (ac, ag).
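The midpoint search can be sketched by running the same linear-space pass twice: forward over $a_1 \ldots a_{i^*}$, and forward again over the reversed suffix against the reversed B, which yields RR and SS. This is a hedged illustration, not the paper's code: `forward_pass`, `find_midpoint`, and `unit_cost` are hypothetical names, the 0/1 replacement cost is assumed, and column 0 of DD (and SS) is set to the all-delete cost, since that conversion ends (or, reversed, begins) with a delete. On the example above it reports a type 2 midpoint at j = 1.

```python
def unit_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def forward_pass(a, b, g, h, w):
    """Return (CC, DD) for the last row: CC[j] = C(len(a), j), DD[j] = D(len(a), j)."""
    n = len(b)
    CC, DD = [0.0] * (n + 1), [0.0] * (n + 1)
    t = g
    for j in range(1, n + 1):
        t += h
        CC[j], DD[j] = t, t + g                            # C(0, j) and D(0, j) = C(0, j) + g
    t = g
    for i in range(1, len(a) + 1):
        s, t = CC[0], t + h                                # s = C(i-1, 0)
        c = CC[0] = t                                      # C(i, 0) = g + h*i
        e = t + g                                          # I(i, 0)
        for j in range(1, n + 1):
            e = min(e, c + g) + h                          # I(i, j)
            DD[j] = min(DD[j], CC[j] + g) + h              # D(i, j)
            c = min(DD[j], e, s + w(a[i - 1], b[j - 1]))   # C(i, j)
            s, CC[j] = CC[j], c
    DD[0] = CC[0]  # column 0 (a nonempty): deleting all of a ends with a delete
    return CC, DD


def find_midpoint(a, b, g, h, w):
    """Return (cost, j, type) for the midpoint on row i* = len(a) // 2."""
    assert len(a) >= 2, "both halves must be nonempty"
    i_star, n = len(a) // 2, len(b)
    CC, DD = forward_pass(a[:i_star], b, g, h, w)
    RR, SS = forward_pass(a[i_star:][::-1], b[::-1], g, h, w)  # reversed suffixes
    best = None
    for j in range(n + 1):
        type1 = CC[j] + RR[n - j]          # path passes through (i*, j)
        type2 = DD[j] + SS[n - j] - g      # one deletion gap crosses row i*: pay g once
        for cost, kind in ((type1, 1), (type2, 2)):
            if best is None or cost < best[0]:
                best = (cost, j, kind)
    return best


print(find_midpoint("agtac", "aag", 2.0, 0.5, unit_cost))  # prints (4.0, 1, 2)
```

Taking the minimum over j of only the two terms gives the full optimal cost, which is why II and JJ never need to be stored.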

Implementation (R99922062)
Outline: storage requirement; memory vs. sequence length; comparison with the classic dynamic programming algorithm. The linear-space algorithm saves space, not time.

Storage requirement (1/4)
Vectors CC, DD, RR, and SS: 4N words.

M + N words for an optimal conversion

M = N = 38

Storage requirement (2/4)
A 128 × 128 table w of replacement costs, one entry per pair of ASCII characters, takes 16384 words:

              ASCII[1]   ASCII[2]   ...   ASCII[128]
ASCII[1]      W1,1       W1,2       ...   W1,128
ASCII[2]      W2,1       W2,2       ...   W2,128
...           ...        ...        ...   ...
ASCII[128]    W128,1     W128,2     ...   W128,128

Storage requirement (3/4)
For the four DNA letters, the 4 × 4 table w of replacement costs takes only 16 words:

      A        T        C        G
A     W(A,A)   W(A,T)   W(A,C)   W(A,G)
T     W(T,A)   W(T,T)   W(T,C)   W(T,G)
C     W(C,A)   W(C,T)   W(C,C)   W(C,G)
G     W(G,A)   W(G,T)   W(G,C)   W(G,G)

Storage requirement (4/4)
The sequences A and B take M + N bytes. They could be compressed: for DNA sequences, only 2(M + N) bits are necessary.
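A minimal sketch of the 2-bit encoding, assuming the alphabet is exactly A, C, G, T. The names `pack` and `unpack` and the particular code assignment are illustrative choices, not taken from the paper. The sequence length has to be kept separately, because the last byte may be only partly used.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit code per nucleotide
BASES = "ACGT"                              # inverse of CODE


def pack(seq):
    """Pack a DNA string into bytes, four bases (2 bits each) per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for k, base in enumerate(seq):
        out[k // 4] |= CODE[base] << (2 * (k % 4))
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    return "".join(BASES[(data[k // 4] >> (2 * (k % 4))) & 3] for k in range(length))


a, b = "AGTAC", "AAG"
packed = pack(a + b)                        # M + N = 8 bases -> 16 bits
print(len(packed), unpack(packed, len(a) + len(b)))  # prints 2 AGTACAAG
```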

Further compression: a Huffman code.

Memory vs. sequence length
Maximum length of sequences that can be aligned in a given amount of memory:

Altschul and Erickson use a 7MN-bit approach.

Memory     Linear space               Linear space               Altschul and
(bytes)    (w/o optimal conversion)   (with optimal conversion)  Erickson
64K        4000                       2666                       270
128K       8000                       5333                       382
256K       16000                      10666                      540
1000K      62500                      41666                      1069
N =        Memory / (4·4)             Memory / (6·4)             √(Memory·8/7)
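The table follows directly from its three formulas. The sketch below assumes 4-byte words and reads K as 1000 bytes, which is what reproduces the tabulated numbers; `max_length` is a hypothetical name.

```python
import math


def max_length(memory_bytes, word_bytes=4):
    """Largest N (with M = N) that fits in memory_bytes for each approach."""
    linear_without = memory_bytes // (4 * word_bytes)       # CC, DD, RR, SS: 4N words
    linear_with = memory_bytes // (6 * word_bytes)          # plus M + N = 2N words for the conversion
    altschul_erickson = math.isqrt(memory_bytes * 8 // 7)   # 7MN bits with M = N
    return linear_without, linear_with, altschul_erickson


for kb in (64, 128, 256, 1000):
    print(f"{kb}K", *max_length(kb * 1000))
```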

Comparison with the classic dynamic programming algorithm (Wagner and Fischer, 1974)
Space: the classic dynamic programming algorithm needs O(MN); the linear-space algorithm needs O(N + lg M).
Time: both are O(MN), but in practice the linear-space algorithm is slower: linear-space : classic DP = 2.84 : 1.

Conclusion (R99945020)
Similarity scores for A = CTTAACT (rows) and B = CGGATCAT (columns), with match = 8, mismatch = -5, and gap = -3 per symbol:

        -    C    G    G    A    T    C    A    T
-       0   -3   -6   -9  -12  -15  -18  -21  -24
C      -3    8    5    2   -1   -4   -7  -10  -13
T      -6    5    3    0   -3    7    4    1   -2
T      -9    2    0   -2   -5    5    2   -1    9
A     -12   -1   -3   -5    6    3    0   10    7
A     -15   -4   -6   -8    3    1   -2    8    5
C     -18   -7   -9  -11    0   -2    9    6    3
T     -21  -10  -12  -14   -3    8    6    4   14

Reduce problem
Reduce problem (cont.)

Reduce problem (cont.)
Partition line at row m/2.
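The score table and the partition-line idea can be sketched as follows. The scoring (match 8, mismatch -5, gap -3 per symbol) is the one the tabulated values are consistent with; `score_matrix`, `last_score_row`, and `partition` are hypothetical names. The best sum of forward and backward scores across the partition row m/2 equals the optimal score, which is what lets the problem be reduced to two halves.

```python
def score_matrix(a, b, match=8, mismatch=-5, gap=-3):
    """Full similarity table S[i][j] for a[:i] vs b[:j], with linear gap score gap*k."""
    S = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        S[0][j] = gap * j
    for i in range(1, len(a) + 1):
        S[i][0] = gap * i
        for j in range(1, len(b) + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S


def last_score_row(a, b, match=8, mismatch=-5, gap=-3):
    """Same recurrence, keeping only the final row (O(len(b)) space)."""
    prev = [gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [gap * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def partition(a, b):
    """Best score through the partition line at row m/2, and the column where it is reached."""
    mid, n = len(a) // 2, len(b)
    top = last_score_row(a[:mid], b)                    # forward scores down to row m/2
    bottom = last_score_row(a[mid:][::-1], b[::-1])     # backward scores up to row m/2
    j = max(range(n + 1), key=lambda j: top[j] + bottom[n - j])
    return top[j] + bottom[n - j], j


A, B = "CTTAACT", "CGGATCAT"
print(score_matrix(A, B)[-1][-1], partition(A, B)[0])  # prints 14 14
```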
