The World of Wavelet Trees (ウェーブレット木の世界)

The World of Wavelet Trees. Okanohara (岡野原), Preferred Infrastructure, Inc. 2013/1/9, 統数研チャンネル

description

On 2013/1/9 I gave a talk explaining wavelet trees on 統数研チャンネル. It serves as a commentary on the book 「高速文字列解析の世界」, published by Iwanami Shoten.

Transcript of The World of Wavelet Trees

2. 5, 18; 2012/12/27; Burrows Wheeler
4. v[0] v[1] v[2] v[3] v[3] v[4]; 17 19 80 66 10 6 1103; 77; 7777777
5. a[s, e] = a[s ... e]; a[s, e) = a[s, e-1]; v[0, n); v[s, e); m M w k k; v[0] v[1] ... v[s] v[s+1] ... v[e-1] ... v[n-1]
6. 1; 0 [...]
34. [...] >> 2) & 0x3333333333333333ULL); r = (r + (r >> 4)) & 0x0f0f0f0f0f0f0f0fULL; r = r + (r >> 8); r = r + (r >> 16); r = r + (r >> 32); return (uint64_t)(r & 0x7f); }
36. T[0, n); rangefreq/list; 0 1 2 3 4 5 6 7; 0* 1* * 31621405 2* 3 * O(log n) 4* 1 O(log n) 5* 6* [...]
48. 0 0 1551 51 2422 4 3 243 4 15 2
49. 0155; T = 155242412124 2324; B = 10001001100101041; i52 B select(1) i T(2) i select_i(T, p)
51. = -1; [0, 2^64-1]; 2^64-1
52. 0721436725047263
53. 0721436725047263; 1st bit, 0s first: 0213202374675476; [0, 3] [4, 7]
54. 0721436725047263; 0213202374675476; 2nd bit, 0s first: 0104542322376776; 0, 1, 4, 5 | 2, 3, 6, 7
55. 0721436725047263; 0213202374675476; 0104542322376776; 3rd bit, 0s first: 0044222661533777; 0, 2, 4, 6 | 1, 3, 5, 7
56. 0721436725047263; 0213202374675476; 0104542322376776; 0044222661533777
57. 0721436725047263 0213202374675476 0104542322376776 0044222661533777
58. 0100101101011010 0101101110110011 0100100100110110; 0 4 2 6 1 5 3 7
59. 0100101101011010 0101101110110011 0100100100110110; [0, s-1]; log2 s; 1; nz_d = 0; [0, 2^64-1]; 64; 1
60. rank_2(T, 12)
    0123456789012345
    T = 0721436725047263
    0100101101011010; 0101101110110011; 0100100100110110; 0044222661533777
    s, e; B_d : d-th bit; nz_d : 0s of B_(d-1)
    s = 0, e = pos
    for d = 0 to log2 s
      b = d-th bit of c
      s = rank_b(B_d, s)
      e = rank_b(B_d, e)
      if (b == 1)
        s += nz_d, e += nz_d
      end if
    end for
    return e - s
61. Burrows Wheeler
62. 1; access, rank, select, topk; 0; 0; 2/3; 1/2
63. 2; 01; RRR; rank/select; 100; rsdic; 11
65. sdarray; rsdic; wat_array; fmindex++ http://code.google.com/p/fmindex-plus-plus/; gwt http://code.google.com/p/gwt/
66. state-of-the-art
67. Wavelet Trees for All, G. Navarro, CPM 2012; The Wavelet Matrix, F. Claude, G. Navarro, SPIRE 2012.
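
The code fragment on slide 34 is the tail of a bit-parallel population count, the word-level primitive that rank on a bit vector is usually built from. Below is a minimal sketch of a complete version. It assumes the two opening steps missing from the transcript are the usual 1-bit and 2-bit pairwise sums; the names popcount64 and rank1_word are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Count the 1 bits of a 64-bit word with bit-parallel (SWAR) sums.
// The first two statements are an assumed reconstruction; the rest
// matches the fragment that survives in the transcript.
inline uint64_t popcount64(uint64_t r) {
  // Sum adjacent bits into 2-bit fields, then 2-bit fields into 4-bit fields.
  r = (r & 0x5555555555555555ULL) + ((r >> 1) & 0x5555555555555555ULL);
  r = (r & 0x3333333333333333ULL) + ((r >> 2) & 0x3333333333333333ULL);
  // Sum 4-bit fields into bytes (each byte now holds a count in 0..8).
  r = (r + (r >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  // Fold all byte counts into the lowest byte.
  r = r + (r >> 8);
  r = r + (r >> 16);
  r = r + (r >> 32);
  return (uint64_t)(r & 0x7f);  // the total is at most 64, so 7 bits suffice
}

// Hypothetical helper: number of 1 bits among bit positions [0, pos) of word,
// where bit 0 is the least significant bit.
inline uint64_t rank1_word(uint64_t word, unsigned pos) {
  assert(pos <= 64);
  if (pos == 0) return 0;
  if (pos == 64) return popcount64(word);
  return popcount64(word & ((1ULL << pos) - 1));
}
```

Masking off the bits at and above pos before counting turns the population count into a rank query on one word; a full rank dictionary would add precomputed block counts on top of this.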
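
The construction on slides 52 to 59 and the rank loop on slide 60 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of any library named in the talk. The type WaveletMatrix and its members are hypothetical, plain prefix-sum arrays stand in for a succinct rank dictionary, and zeros[d] is taken to be the number of 0s in the bit vector of the same level d that the loop queries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal wavelet matrix over symbols of a fixed bit width.
struct WaveletMatrix {
  int bits;                                   // bits per symbol
  std::size_t n;                              // sequence length
  std::vector<std::vector<uint32_t>> prefix;  // prefix[d][i] = 1s in B_d[0, i)
  std::vector<uint32_t> zeros;                // zeros[d] = 0s in B_d

  WaveletMatrix(std::vector<uint64_t> seq, int bits_per_symbol)
      : bits(bits_per_symbol), n(seq.size()) {
    for (int d = 0; d < bits; ++d) {
      const int shift = bits - 1 - d;  // level 0 uses the most significant bit
      std::vector<uint32_t> ones(n + 1, 0);
      std::vector<uint64_t> zero_part, one_part;
      for (std::size_t i = 0; i < n; ++i) {
        const uint64_t bit = (seq[i] >> shift) & 1;
        ones[i + 1] = ones[i] + static_cast<uint32_t>(bit);
        (bit ? one_part : zero_part).push_back(seq[i]);
      }
      zeros.push_back(static_cast<uint32_t>(zero_part.size()));
      prefix.push_back(std::move(ones));
      // Stable reorder for the next level: symbols with bit 0 first, then bit 1.
      seq = std::move(zero_part);
      seq.insert(seq.end(), one_part.begin(), one_part.end());
    }
  }

  // Number of bits equal to b in B_d[0, i).
  std::size_t rank_bit(int d, uint64_t b, std::size_t i) const {
    const std::size_t ones = prefix[d][i];
    return b ? ones : i - ones;
  }

  // Number of occurrences of symbol c in T[0, pos).
  std::size_t rank(uint64_t c, std::size_t pos) const {
    assert(pos <= n);
    std::size_t s = 0, e = pos;
    for (int d = 0; d < bits; ++d) {
      const uint64_t b = (c >> (bits - 1 - d)) & 1;
      s = rank_bit(d, b, s);
      e = rank_bit(d, b, e);
      if (b == 1) {  // symbols with bit 1 sit after all the 0s of this level
        s += zeros[d];
        e += zeros[d];
      }
    }
    return e - s;
  }
};
```

With T = 0721436725047263 and 3 bits per symbol, WaveletMatrix(T, 3).rank(2, 12) follows the trace of slide 60 and returns 2, since the symbol 2 occurs at positions 2 and 8 of T[0, 12).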