IR Project, Team 9
Information Retrieval Project
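Team 9
黃國瑜, 90522035 (Computer Science graduate program, year 1)
何聰鑫, 90522045 (Computer Science graduate program, year 1)
丁智凱, 90522077 (Computer Science graduate program, year 1)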
System architecture
– CPU Speed: PIII 1 GHz
– RAM: 256 MB
– OS: Win 2000
– Programming: PHP
– Database: MySQL
Indexing method (1/3): Indexing
– Converting letters to lower case
– Eliminating stopwords
  Using a hash table of 317 words
– Removing punctuation marks
– Removing words shorter than 3 letters
– Removing <tag>
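The preprocessing steps above could be sketched roughly as follows. This is an illustration in Python rather than the team's PHP code; the function name `index_terms` and the tiny stopword set are hypothetical placeholders (the slides mention a 317-word table but do not list it), and the order of the steps is an assumption.

```python
import re

# Hypothetical stand-in for the 317-word stopword hash table (contents not given in the slides).
STOPWORDS = {"the", "and", "for", "with", "that", "this"}

TAG_RE = re.compile(r"<[^>]+>")      # matches markup such as <p> or </title>
PUNCT_RE = re.compile(r"[^\w\s]|_")  # anything that is not a letter, digit, or whitespace


def index_terms(text, stopwords=STOPWORDS, min_len=3):
    """Return the terms of `text` that would be indexed (assumed step order)."""
    text = TAG_RE.sub(" ", text)     # remove <tag>
    text = text.lower()              # lower-case all letters
    text = PUNCT_RE.sub(" ", text)   # remove punctuation marks
    return [
        word
        for word in text.split()
        if len(word) >= min_len      # drop words shorter than 3 letters
        and word not in stopwords    # set membership is a hash lookup
    ]
```

A Python `set` plays the role of the hash table here, so each stopword check is a constant-time lookup on average.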
Indexing method (2/3): Database Table
– IndexMap (Index, TermID, DocID, Line, Pattern)
– DocMap (DocID, FileName, DocTitle)
– TermMap (TermID, Term)
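One way to picture how the three tables fit together is the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. The column types, the keys, and the reading of `Line` as a line number and `Pattern` as a per-line match count are all assumptions, since the slides give only the column names. The document title in the demo row is a placeholder.

```python
import sqlite3

# Assumed schema: only the table and column names come from the slides.
SCHEMA = """
CREATE TABLE DocMap  (DocID INTEGER PRIMARY KEY, FileName TEXT, DocTitle TEXT);
CREATE TABLE TermMap (TermID INTEGER PRIMARY KEY, Term TEXT UNIQUE);
CREATE TABLE IndexMap (
    "Index" INTEGER PRIMARY KEY,               -- quoted because INDEX is an SQL keyword
    TermID  INTEGER REFERENCES TermMap(TermID),
    DocID   INTEGER REFERENCES DocMap(DocID),
    Line    INTEGER,                           -- assumed: line number of the match
    Pattern INTEGER                            -- assumed: number of matches on that line
);
"""


def build_demo_db():
    """Create an in-memory database with one placeholder posting."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DocMap VALUES (1, 'FB496255', 'Sample title')")
    conn.execute("INSERT INTO TermMap VALUES (1, 'retrieval')")
    conn.execute(
        "INSERT INTO IndexMap (TermID, DocID, Line, Pattern) VALUES (1, 1, 12, 2)"
    )
    return conn


def lookup(conn, term):
    """Resolve a term to (FileName, DocTitle, Line, Pattern) rows via the three tables."""
    return conn.execute(
        """SELECT d.FileName, d.DocTitle, i.Line, i.Pattern
           FROM TermMap t
           JOIN IndexMap i ON i.TermID = t.TermID
           JOIN DocMap d ON d.DocID = i.DocID
           WHERE t.Term = ?""",
        (term,),
    ).fetchall()
```

Under this reading, TermMap and DocMap act as dictionaries for terms and documents, while IndexMap holds the postings that link them.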
Indexing method (3/3): Indexing Speed
– About 130 sec/MB
– Total: 125 sec * 490 MB = 17 hr
– E.g.
  File Name: FB496255
  File Size: 997438
  Total Term: 8523
  Start: 1004540338.9145 sec
  End: 1004540464.1279 sec
  Total: 125.2134180069 sec
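The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes the file size is given in bytes and that 1 MB means 2**20 bytes; neither is stated on the slide. The function names are illustrative only.

```python
def elapsed_seconds(start, end):
    """Wall-clock time between the logged Start and End timestamps."""
    return end - start


def seconds_per_mb(elapsed, size_bytes, bytes_per_mb=2**20):
    """Indexing rate, assuming the file size is in bytes (not stated on the slide)."""
    return elapsed / (size_bytes / bytes_per_mb)


def total_hours(sec_per_mb, collection_mb):
    """Projected time to index the whole collection, in hours."""
    return sec_per_mb * collection_mb / 3600


elapsed = elapsed_seconds(1004540338.9145, 1004540464.1279)  # about 125.2 sec
rate = seconds_per_mb(elapsed, 997438)                       # about 131.6 sec/MB
hours = total_hours(125, 490)                                # about 17.0 hr
```

Under these assumptions the sample file works out to roughly 130 sec/MB, and 125 sec for each of 490 MB comes to about 17 hours, which matches the slide.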
Query (1/3): Interface
– Query
– Insert New Data
– Existing Data View
– Help
– Mail
Query (2/3): Query
– Feature
  Multiple keyword query
  Title Query
– Speed
  Match String: 6448, Search Time: 2.3293360471725 sec
  Match String: 239, Search Time: 0.72075593471527 sec
  (Depends on network speed and number of results)
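As a rough illustration of the two query features, the sketch below runs a multiple-keyword query and a title query over small in-memory structures. The slides do not say how several keywords are combined, so the AND semantics here is an assumption, and the names `keyword_search` and `title_search` are hypothetical.

```python
def keyword_search(index, keywords):
    """Return doc IDs containing every keyword (AND semantics is an assumption).

    `index` maps a lower-case term to the set of doc IDs that contain it.
    """
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)


def title_search(titles, keywords):
    """Return doc IDs whose title contains every keyword, ignoring case.

    `titles` maps a doc ID to its document title.
    """
    wanted = [k.lower() for k in keywords]
    return {
        doc_id
        for doc_id, title in titles.items()
        if all(k in title.lower() for k in wanted)
    }
```

For example, with `index = {"information": {1, 2}, "retrieval": {2, 3}}`, the call `keyword_search(index, ["Information", "Retrieval"])` returns `{2}`.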
Query (3/3): Output
– Performance
  Match String
  Search Time
– Query Result
  File Name
  Document Title
  Line (show 5 lines)
  # of Pattern (Highlight Mark)
  Top Related
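The result listing could be produced along the lines of the sketch below, which shows at most five matching lines with each match marked and reports the total number of matches. The square-bracket highlight, case-insensitive matching, and the function name `render_hits` are assumptions; the slides do not describe how the highlight mark is rendered.

```python
import re


def render_hits(lines, keyword, max_lines=5, mark=("[", "]")):
    """Return (shown, total): up to `max_lines` highlighted lines and the match count.

    `shown` is a list of (line_number, highlighted_text) pairs, 1-based.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    shown, total = [], 0
    for number, line in enumerate(lines, start=1):
        # subn returns the rewritten line and how many matches were replaced
        marked, count = pattern.subn(
            lambda m: mark[0] + m.group(0) + mark[1], line
        )
        if count:
            total += count
            if len(shown) < max_lines:
                shown.append((number, marked))
    return shown, total
```

The total keeps counting after the display limit is reached, so the reported number of patterns covers the whole document, not only the lines shown.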