Text Mining : Experience
Experience in Analyzing Data with Text Mining
Dr. Alisa Kongthon
Researcher, Human Language Technology Laboratory
National Electronics and Computer Technology Center (NECTEC)
Text Mining is about…
“Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.”
Tapping the Power of Text Mining
Communications of the ACM, Sept. 2006
Humans vs. Computers
• Humans: Ability to distinguish and apply linguistic patterns to text
– Can overcome language difficulties such as slang, spelling
variations, and contextual meaning
• Computers: Ability to process text in large volumes at high speed
– Can sift through a large collection of texts to find simple statistics
and relationships among terms almost instantly
• Text mining requires a combination of both:
human linguistic capability (NLP) + computer speed and accuracy (data mining)
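The "simple statistics and relationships among terms" that a computer can produce quickly can be illustrated with a minimal sketch. It assumes a plain lowercase word tokenizer and treats two terms as related when they appear in the same document. The function names and sample documents are hypothetical and only for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_statistics(documents):
    """Return (term counts, counts of term pairs sharing a document)."""
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        # Count each unordered pair of distinct terms once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return term_counts, pair_counts


# Hypothetical sample documents.
docs = [
    "text mining finds patterns in text",
    "data mining finds patterns in data",
]
terms, pairs = term_statistics(docs)
print(terms["mining"])                  # how often "mining" occurs overall
print(pairs[("mining", "patterns")])    # documents containing both terms
```

Pair keys are stored in alphabetical order, so lookups should use the same order. A real system would add the linguistic side (stemming, stop words, word sense), which is where the human-like NLP capability comes in.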
Text Mining Tasks
• Information extraction:
– Analyze unstructured text and identify key words or
phrases and relationships within text
• Topic detection and tracking:
– Filter and present only documents relevant to the user
profile
• Summarization:
– Text summarization reduces the content by retaining only its main points and overall meaning
Text Mining Tasks
• Categorization:
– Automatically classify documents into predefined
categories
• Clustering:
– Group documents based on their similarity
• Concept linkage:
– Connect related documents by identifying their shared
concepts, helping users find information they perhaps
wouldn't have found through traditional search methods
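Grouping documents by similarity can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes bag-of-words vectors, cosine similarity, and a greedy single-pass grouping with a hypothetical threshold of 0.5.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent a document as a Counter of lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(documents, threshold=0.5):
    """Greedy grouping: put each document in the first group whose first
    member is similar enough, otherwise start a new group."""
    vectors = [bag_of_words(d) for d in documents]
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical sample documents.
docs = [
    "hotel room was clean and quiet",
    "the hotel room was quiet and clean",
    "mobile network signal is weak",
]
print(group_similar(docs))
```

The same similarity score could also support categorization (compare a document with labeled examples) or concept linkage (link documents that share terms), under the same assumptions.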
Text Mining Tasks
• Information visualization:
– Represent documents or information in graphical
formats for easy browsing, viewing, or searching
• Question answering (Q&A):
– Search for and extract the best answer to a given question
Applications: Tech Mining
• Tech mining is the application of text mining
tools to science and technology (S&T)
information, particularly bibliographic abstracts
• It exploits S&T databases to see patterns,
detect associations, and foresee opportunities
Tech Mining Process
Technical Intelligence:
Who, What, When, Where?
• Digest multiple S&T information resources
• Profile Research Domains:
– Who?
– What?
– When?
– Where?
• Map Relationships: Topics & Teams
• Analyze Trends: What’s Hot & What’s Coming
• And do so quickly
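Profiling a research domain by who, what, when, and where amounts to counting fields across bibliographic records. The sketch below assumes each record is a dictionary with hypothetical field names (authors, keywords, year, country). The records themselves are made up for illustration and do not come from any real database.

```python
from collections import Counter

# Hypothetical bibliographic records; field names are assumptions.
records = [
    {"authors": ["Author A", "Author B"], "keywords": ["text mining", "nlp"],
     "year": 2008, "country": "Thailand"},
    {"authors": ["Author A"], "keywords": ["text mining", "opinion mining"],
     "year": 2009, "country": "Thailand"},
    {"authors": ["Author C"], "keywords": ["nlp"],
     "year": 2009, "country": "USA"},
]


def profile(records):
    """Count authors (who), keywords (what), years (when), countries (where)."""
    who, what, when, where = Counter(), Counter(), Counter(), Counter()
    for rec in records:
        who.update(rec["authors"])
        what.update(rec["keywords"])
        when[rec["year"]] += 1
        where[rec["country"]] += 1
    return {"who": who, "what": what, "when": when, "where": where}


summary = profile(records)
for question, counts in summary.items():
    print(question, counts.most_common(2))
```

Counting publications per year for a keyword gives a rough trend line, and counting author pairs in the same way would give a simple map of teams.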
What if I don’t have Tech
Mining Software?
Output example from Tech
Mining Software
Source: A.L. Porter, "QTIP: Quick Technology Intelligence Processes," Technol. Forecast. Soc. Change 72 (2005)
Applications: Expert Finder
Applications: ABDUL
(Artificial BudDy U Love)
• An online information service that currently provides access to Thai linguistic resources (e.g., dictionary and sentence
translation) and information resources (e.g., weather,
stock prices, gas prices, and traffic conditions)
• Users can interact with ABDUL in natural language
via instant messaging (IM) protocols, Web
browsers, and mobile devices
Web 1.0 vs. Web 2.0
User-Generated Content
• With Web 2.0 and social networking websites, the amount of user-generated content has increased
exponentially
• User-generated content often contains opinions and/or sentiments
• An in-depth analysis of these opinionated texts could
reveal potentially useful information, e.g.,
– People's preferences on many different topics, including news
events, social issues, and commercial products
Online Opinion Resources
Characteristics of Online
Reviews
• Written in natural language, in an unstructured text format
• Some reviews are long yet contain only a few
sentences expressing opinions on the product
• It can be difficult for a potential reader to
understand and analyze every review that
may be relevant to his or her decision making
Opinion Mining
• Opinion mining (sentiment analysis) is the task of analyzing and summarizing what people think about a
certain topic
• Opinion mining has gained a lot of interest in the text mining and NLP communities
• Three granularities of opinion mining:
– Document level
– Sentence level
– Feature level
Feature-Based Opinion Mining
• This approach typically consists of the following
two steps:
1. Identifying and extracting features of an object,
topic or event from each sentence
2. Determining whether the opinions regarding the
features are positive or negative
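The two steps can be sketched with a small lexicon-based example. This is only an illustration under strong assumptions: the feature list and the positive and negative word lists are hypothetical and hand-written, and every feature in a sentence receives the polarity of that sentence. It is not the system shown in the talk.

```python
import re

# Hypothetical hand-written lexicons for hotel reviews.
FEATURES = {"room", "staff", "location", "breakfast"}
POSITIVE = {"clean", "friendly", "great", "good"}
NEGATIVE = {"dirty", "rude", "noisy", "bad"}


def mine_opinions(review):
    """Return (feature, polarity) pairs found in a review."""
    results = []
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Step 1: identify the features mentioned in this sentence.
        features = [t for t in tokens if t in FEATURES]
        # Step 2: decide the polarity from the opinion words it contains.
        score = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens))
        if score == 0:
            continue  # no opinion words, or positive and negative cancel out
        polarity = "positive" if score > 0 else "negative"
        results.extend((feature, polarity) for feature in features)
    return results


# Hypothetical review text.
review = "The room was clean. The staff were rude! We arrived on Monday."
print(mine_opinions(review))
```

The sketch ignores negation ("not clean"), implicit features, and sentences that praise one feature while criticizing another. Handling these cases is where most of the real effort goes.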
Opinion Mining on Hotel Reviews in
Thailand (Graphical Display)
Opinion Mining on Hotel Reviews in
Thailand (Textual Display)
Comparison among Hotels
Opinion Mining on Mobile
Network Operators in Thailand
Challenges in Text Mining
• Text Mining = NLP + Data Mining
• Statistical NLP
– Ambiguity
– Context
– Tokenization / sentence detection
– POS tagging
• Data Mining
– Ability to process the data
– Massive amounts of data
– Determining and extracting information of interest
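Why sentence detection is listed as a challenge can be seen in a small sketch. It compares a naive rule (split after every period, question mark, or exclamation mark) with a rule that skips known abbreviations. The abbreviation list is a hypothetical, non-exhaustive assumption, and the sample text is made up.

```python
import re

# Illustrative abbreviation list; a real system would need far more.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_sentences(text):
    """Split on sentence-final punctuation unless the token is a known
    abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Smith studies text mining. It is useful."
print(naive_split(text))      # breaks the sentence after "Dr."
print(split_sentences(text))  # keeps "Dr. Smith ..." together
```

Even the second rule fails when an abbreviation actually ends a sentence, which is one reason statistical NLP methods are used for this step.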
Conclusions
• As the amount of data increases, text-mining
tools that sift through it will be increasingly
valuable
• Text mining has various applications in both academia
and industry
Thank you for your attention
Q&A