Korean script searching in Korean Library OPACs

Korean script searching in Korean Library OPACs Junglim Chae Yonsei University

Indexing methods for Korean-script searching: N-gram indexing (unigram, bigram, trigram, ..., n-gram) and morphological analysis. E.g.) 아버지가 방에 들어가신다: 12 index terms by bigram segmentation.

  • Indexing Method

    N-Gram

    Morphological Analysis

  • N-Gram Indexing

    N-gram: unigram, bigram, trigram, ..., n-gram

    E.g.) 아버지가 방에 들어가신다

    12 index terms by bigram segmentation (0 marks a space): 아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다

    Many index terms: many results, but a lot of noise

    High recall ratio but low precision ratio
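The bigram segmentation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular OPAC: the function name `char_ngrams` and the choice to show a space as `0` are assumptions made here so that the count of 12 index terms for 아버지가 방에 들어가신다 can be reproduced.

```python
import unicodedata


def char_ngrams(text, n=2, space_marker="0"):
    """Return overlapping character n-grams, showing each space as space_marker."""
    # Normalize to NFC so that each Hangul syllable is a single code point.
    normalized = unicodedata.normalize("NFC", text)
    marked = normalized.replace(" ", space_marker)
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


sentence = "아버지가 방에 들어가신다"
bigrams = char_ngrams(sentence)
print(len(bigrams))
print(bigrams)
```

The sentence has 13 characters including the two spaces, so sliding a two-character window over it gives 12 bigrams. Several of them (for example 지가 or 가신) are not words at all, which is where the noise in the results comes from.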

  • Morphological Analysis

    Requires a morphological analysis dictionary

    E.g.) Three index terms by morphological analysis

    Ability to match linguistically similar terms

    Faster performance with a smaller index

    Accurate matches that meet user expectations

    High precision ratio but low recall ratio
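As a rough illustration of dictionary-based analysis, the toy sketch below strips a known particle or ending from each word and keeps the stem only if it is in the dictionary. Everything in it is an assumption for demonstration: `STEMS` and `SUFFIXES` are tiny hand-made tables, `analyze` is a hypothetical name, and the three terms it produces are not necessarily the three terms shown on the original slide. A real analyzer uses a full dictionary and handles inflection far more carefully.

```python
# Hypothetical toy data, not a real morphological analysis dictionary.
STEMS = {"아버지", "방", "들어가"}
# Endings to strip: honorific declarative ending, subject particle, locative particle.
SUFFIXES = ["신다", "가", "에"]


def analyze(sentence):
    """Return one dictionary stem per word, skipping words that cannot be analyzed."""
    terms = []
    for word in sentence.split():
        if word in STEMS:
            terms.append(word)
            continue
        for suffix in SUFFIXES:
            stem = word[:-len(suffix)]
            if word.endswith(suffix) and stem in STEMS:
                terms.append(stem)
                break
    return terms


terms = analyze("아버지가 방에 들어가신다")
print(terms)
```

With this toy dictionary the sentence yields three index terms instead of twelve bigrams, which shows why the index is smaller and the matches are more precise, and also why anything missing from the dictionary is not indexed at all.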

  • N-Gram Vs. Morphological Analysis

                       N-Gram       Morphological Analysis
    Recall Ratio       High         Low
    Precision Ratio    Low          High
    Size of Index      Big          Small
    Indexing Speed     Fast         Slow
    Search Speed       Slow         Fast
    Application        Libraries    Web Search Engines
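The recall and precision ratios in this comparison follow the usual definitions: recall is the share of relevant records that were retrieved, and precision is the share of retrieved records that are relevant. The sketch below computes both for two made-up result sets; the record IDs and set sizes are invented purely to illustrate the trade-off and are not measurements from any library system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record IDs: four records are actually relevant to the query.
relevant = {"r1", "r2", "r3", "r4"}
# An n-gram index tends to return everything relevant plus noise.
ngram_hits = {"r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"}
# A morphological index tends to return fewer, cleaner results.
morph_hits = {"r1", "r2"}

print(precision_recall(ngram_hits, relevant))
print(precision_recall(morph_hits, relevant))
```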

  • A Case Study

    Yonsei University Library

    Library System: Maestro-Y

    Search Engine: K2 by Verity

    Indexing Method: N-Gram (bigram) + Morphological Analysis

    Indexing Rules

    Rule 1: Divide strings by space

    Rule 2: Extract index terms using the bigram indexing method

    Rule 3: Add the whole string, excluding spaces between strings

    Rule 4: Add words from the Korean morphological analysis dictionary
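The four indexing rules can be sketched as a single function. This is only an interpretation, not the Maestro-Y or K2 implementation: the names `bigrams` and `build_index_terms` are hypothetical, the sample dictionary is invented, and rule 4 is read here as "add every dictionary word that occurs in the space-free string", which is an assumption about how the dictionary is applied.

```python
def bigrams(text):
    """Overlapping two-character windows of a string with no spaces."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index_terms(field, dictionary):
    """Apply the four rules to one field value and return unique index terms."""
    parts = field.split()                      # Rule 1: divide strings by space
    terms = []
    for part in parts:
        terms.extend(bigrams(part))            # Rule 2: bigrams of each string
    joined = "".join(parts)
    terms.append(joined)                       # Rule 3: whole string without spaces
    # Rule 4 (assumed reading): dictionary words found in the joined string.
    terms.extend(sorted(word for word in dictionary if word in joined))
    return list(dict.fromkeys(terms))          # drop duplicates, keep order


sample_dictionary = {"한국", "도서관", "대학"}  # hypothetical dictionary entries
index_terms = build_index_terms("한국 도서관", sample_dictionary)
print(index_terms)
```

Combining the rules means a record can be found by a fragment (도서), by the full compound typed without a space (한국도서관), or by a dictionary word (도서관).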

  • A Case Study

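    Yonsei University Library

    E.g.) [Korean example omitted: the string is divided in two at the space (rule 1), bigrams are extracted from each part (rule 2), the whole string without spaces is added (rule 3), dictionary words are added (rule 4), and all of these terms together form the index]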

  • Search Tips

  • Search Tips (1)

    Keyword Search

    Default search option

    Use at most 3 keywords

    Use Boolean operators

    Omit stop-words

  • Search Tips (2)

    Keyword Search

    Follow the Korean word division rules

    E.g.) a correctly spaced phrase (O); the same phrase with incorrect spacing (X)

  • Search Tips (3)

    Keyword Search

    Compound nouns: do not use spaces between nouns

    E.g.) the compound written as one string (O); the compound with a space between the nouns (X)

  • Search Tips (4)

    Browse Search

    "Begins with" or truncation

    Use when you already know the first word of the title, author, or publisher
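A browse ("begins with") search is essentially a prefix lookup in a sorted list of headings. The sketch below shows one common way to do that with binary search. The function name `browse` and the sample titles are made up for illustration, and plain code-point ordering is assumed instead of whatever collation a real catalogue uses.

```python
import bisect


def browse(sorted_headings, prefix):
    """Return the headings that start with prefix from a sorted list."""
    start = bisect.bisect_left(sorted_headings, prefix)
    # Any string beginning with prefix sorts before prefix + highest code point.
    end = bisect.bisect_left(sorted_headings, prefix + "\U0010FFFF")
    return sorted_headings[start:end]


# Hypothetical title headings.
headings = sorted(["한국문학사", "한국사", "한글의 역사", "현대시론"])
matches = browse(headings, "한국")
print(matches)
```

Because only the beginning of the heading is compared, this kind of search works well when the first word is known and gives nothing useful when only a word from the middle of the title is remembered.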

  • Search Tips (5)

    Browse Search

    Korean classics

  • Search Tips (6)

    Exact Match

    Precise search

    Known items

  • Search Tips (7)

    Exact Match

    Single-character words

    E.g.) C
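One plausible reason for recommending exact match here, offered as an assumption and not something the slides state, is that a one-character word produces no bigrams, so a keyword index built from bigrams has nothing to match it against. The sketch below contrasts the two; `bigrams`, `exact_match`, and the sample titles are hypothetical.

```python
def bigrams(text):
    """Overlapping two-character windows of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def exact_match(query, field):
    """True when the whole field equals the query, ignoring case and outer spaces."""
    return query.strip().casefold() == field.strip().casefold()


# Hypothetical title headings.
titles = ["C", "C Programming Language", "Calculus"]

no_terms = bigrams("C")
exact_hits = [title for title in titles if exact_match("C", title)]
print(no_terms)
print(exact_hits)
```

Exact match compares the whole field, so it returns only the record whose title is exactly C and skips every longer title that merely contains the letter.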

  • Search Tips (8)

    Support for Hangul/Hancha searching
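One way a system could let a Hangul query find a Hancha heading (and the reverse) is to convert both sides to Hangul readings before comparing. The sketch below assumes a tiny hand-made reading table, `HANCHA_READINGS`, and hypothetical function names. A real system needs a complete table and has to deal with characters that have more than one reading, which this sketch ignores.

```python
# Hypothetical, deliberately tiny reading table (one reading per character).
HANCHA_READINGS = {"韓": "한", "國": "국", "大": "대", "學": "학"}


def to_hangul(text):
    """Replace each known Hancha character with its Hangul reading."""
    return "".join(HANCHA_READINGS.get(ch, ch) for ch in text)


def same_term(query, heading):
    """Compare two strings after converting both to Hangul."""
    return to_hangul(query) == to_hangul(heading)


print(to_hangul("韓國"))
print(same_term("대학", "大學"))
```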

  • Search Tips (9)

    Japanese Kana, Archaic Korean, Russian, special characters: choose scripts from the Multi-language Input Table

  • E.g.) Multi-Script Input Table

  • Search Tips (10)

    Japanese Kana

  • Search Tips (11)

    Personal names

    E.g.) Shakespeare; Murakami, Haruki

  • Search Tips (12)

    Space considered as AND

    E.g.) two keywords separated by a space = first keyword AND second keyword

    In some OPACs, spaces in the character fields do make a difference in retrieval
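Treating a space as AND amounts to splitting the query on spaces and intersecting the result sets of the individual terms. The sketch below shows this on a toy inverted index. The record IDs, titles, and function names are invented for illustration, and the index is built from space-separated words only, so it is not the bigram-based index of the case study.

```python
def build_inverted_index(records):
    """Map each space-separated term to the set of record IDs that contain it."""
    index = {}
    for record_id, text in records.items():
        for term in text.split():
            index.setdefault(term, set()).add(record_id)
    return index


def search(index, query):
    """Treat each space in the query as AND: intersect the postings of all terms."""
    terms = query.split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical records.
records = {"b1": "한국 문학 연구", "b2": "한국 역사", "b3": "문학 이론"}
index = build_inverted_index(records)
print(search(index, "한국 문학"))
print(search(index, "한국문학"))
```

In this toy index the spaced query finds the record that contains both words, while the unspaced query finds nothing because no record was indexed under the joined string. That difference is the kind of effect a comparative search with and without a space is meant to reveal.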

  • Comparative search with and without space

  • Thank You
