Post on 16-Apr-2017
Text Mining / Data Mining
STKS Applied ICT for Executive Librarians 30 2553
Outline
Definition
Text Mining Techniques
Applications
Text Mining Tools
STKS Text Mining
Text Mining
Text mining is the process of analyzing and structuring large sets of documents, applying statistical and/or computational linguistics technology, in order to extract previously unknown knowledge that is useful for making crucial business decisions.
Information extraction
Text Mining
Text mining is a new and exciting research area in computer science that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval, and knowledge management.
A key element of text mining is its focus on the document collection. At its simplest, a document collection can be any grouping of text-based documents, such as business reports, legal memoranda, e-mails, research papers, manuscripts, articles, and press releases.
Text Mining
Searching
Text Mining
Text Mining Data Mining
Scientometrics
Webometrics
Bibliometrics etc.
Information Extraction (IE)
Natural language processing community: MUC (Message Understanding Conference) series
MUC conference, 1987: US DARPA (naval tactical operations)
MUC-2 conference, 1989
MUC-3 conference, 1991: Latin American terrorism
MUC-4, 1992
MUC-5, 1993: Japanese documents (joint ventures + microelectronics)
MUC-6, 1995: financial domain
MUC-7, 1998: airline crashes domain (Chinese, Japanese, English)
European Commission / LRE (Linguistic Research & Engineering)
IE: NAMIC / CROSSMARC / MOSES
Figure 1: The evolution of database system technology.
Example of output from industry analyzer term extraction process
Biogen Idec Inc. ended its third quarter with $543 million in revenues, slightly lower than analyst estimates, as it nears the one-year anniversary of a merger that made it the world's largest biotech company.
The Cambridge, Mass.-based company reported non-GAAP earnings per share of 37 cents and net income of $132 million, compared with 35 cents and $123 million for the quarter last year. Analysts' consensus estimate for the quarter was 35 cents.
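As a rough illustration of what a term extraction step does with a passage like the one above, the sketch below pulls monetary expressions out of the text with a regular expression. This is not the method of the industry analyzer mentioned on the slide; the pattern `MONEY` and the function `extract_money_terms` are hypothetical names chosen for this example, and a real extractor would also handle company names, dates, and other term types.

```python
import re

# Shortened copy of the press-release passage used as sample input.
TEXT = (
    "Biogen Idec Inc. ended its third quarter with $543 million in revenues. "
    "The company reported non-GAAP earnings per share of 37 cents and net "
    "income of $132 million, compared with 35 cents and $123 million for the "
    "quarter last year."
)

# Hypothetical pattern: optional dollar sign, a number, then a unit word.
MONEY = re.compile(r"\$?\d+(?:\.\d+)?\s(?:million|billion|cents)")


def extract_money_terms(text):
    """Return every monetary expression found in the text, in reading order."""
    return MONEY.findall(text)


money_terms = extract_money_terms(TEXT)
print(money_terms)
```

The extracted terms can then be stored as structured fields (for example, revenue and earnings per share) instead of free text.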
Text Mining
TM
Consumer purchasing patterns
Bioscience: Don Swanson, hypothesizing causes of rare diseases; TM has great impact
Genomics TM
2 Co-Occurrence
TM
Security applications (the CIA analyzes terrorist events)
Software applications: IBM, Microsoft
Academic applications
Nature / NIH / Univ. Manchester / Univ. California
Customer service: quick response
1000 /
Text Mining Techniques
TM Text Extraction
Summarized Extraction
Feature Selection
Cluster Generation
Topic Identification
Information Mapping, Visualization
Text Categorization
TM: Data Mining / Information Retrieval / Linguistics / Machine pattern / Statistics / Pattern recognition / Database / Visualization
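Feature selection, one of the techniques listed above, can be sketched with TF-IDF weighting: a term scores high in a document when it is frequent there but rare in the rest of the collection. The example below is a minimal sketch in plain Python; the sample documents and the names `tf_idf` and `top_features` are assumptions made for illustration, not part of any tool named in these slides.

```python
import math
from collections import Counter

# Hypothetical three-document collection.
DOCS = [
    "merger boosts biotech revenue",
    "police report robbery in district",
    "biotech company reports revenue",
]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def top_features(weights, k=2):
    """Keep the k highest-weighted terms per document (ties broken alphabetically)."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


weights = tf_idf(DOCS)
features = top_features(weights)
print(features)
```

Terms shared by several documents ("biotech", "revenue") get lower weights than terms unique to one document, so the selected features are the ones that best distinguish each document. The same weights can feed later steps such as cluster generation or text categorization.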
Text Mining
TM 4
Customer Relationship Management (CRM): $15.2 bn
Intelligence (security / corporate / research): $12 bn
Knowledge & content management: $1.9 bn
Information retrieval technology: $3.5 bn
TM
Customer Transaction Analysis
Competitive Intelligence / CI
R & D support
Crime Pattern Detection
Virginia, USA.
Police Information Report (PIR) TM
1 Data pre-processing
Date        District  Event type  Description
1/05/2003   Reston    Robbery     .
5/05/2003   Lake      Accident    .
6/05/2003   South     Narcotics
2 Extract important concepts
3 Analyze patterns (co-occurrence)
Software: PolyAnalyst for text mining (German / Spanish / French / Russian / Italian / Portuguese / Dutch / Swedish / Greek)
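The co-occurrence analysis in step 3 can be sketched as counting how often two extracted concepts appear in the same report. The code below is a minimal illustration, not the algorithm PolyAnalyst uses; the concept sets in `REPORTS` are invented sample data standing in for the output of step 2, and `cooccurrence_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of step 2: the set of concepts extracted from each report.
REPORTS = [
    {"robbery", "knife", "night"},
    {"robbery", "night", "store"},
    {"narcotics", "night"},
]


def cooccurrence_counts(reports):
    """Count, for each pair of concepts, the number of reports containing both."""
    counts = Counter()
    for concepts in reports:
        # Sort so each pair has one canonical order, e.g. ("night", "robbery").
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(REPORTS)
print(counts.most_common(1))
```

Pairs with high counts, such as "night" and "robbery" in this sample, are candidate patterns for an analyst to review.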
Text Mining Tools / Software
Megaputer Intelligence
SAS
SPSS
Synthema
TEMIS
Autonomy
Clearforest
Fast
IBM
Inxight
Vantage Point
etc.
Text Mining Tools: Open Source Software
GATE - natural language processing & language engineering tool
YALE - with its Word Vector Tool plugin, data and text mining software
Pimiento - a text-mining application framework written in Java (http://ee.usyd.edu.au/~jjga/pimiento)
Text Mining Applications
(have proven particularly fertile ground for TM)
Corporate finance / business intelligence
Patent research
Life science: identify complex patterns of interactivities between proteins
Text Mining
Issue identification
Selection of information sources
Search refinement and data retrieval
Data cleaning
Basic analyses
Advanced analyses
Representation
Text Mining Tasks
Search & retrieve information: mine various databases (internal, external publications/patents), retrieve search results, analyze with text mining software
Profile (statistical analyses): R&D activities / technology application emphases
Represent: text, tables, graphs; activities by time / player / technology map
Interpret: perform competitive analyses; describe & project technology by nation / company; anticipate / forecast technology trends
STKS TM
TM tool: Vantage Point (VP); ISI / Scopus / Delphion
Data mining features: ISI WOS / SCOPUS / Delphion / Aureka etc.
Thomson : ISI Web of Science
PT J
AU Yoksan, R
   Akashi, M
AF Yoksan, Rangrong
   Akashi, Mitsuru
TI Low molecular weight chitosan-g-L-phenylalanine: Preparation, characterization, and complex formation with DNA
SO CARBOHYDRATE POLYMERS
LA English
DT Article
DE Chitosan; Phenylalanine; DNA; Nanoparticle; Complex coacervation; DNA release
ID HUMAN ENDOTHELIAL-CELLS; GENE DELIVERY; PLASMID DNA; TRANSFECTION EFFICIENCY; IN-VITRO; NANOPARTICLES; OLIGOSACCHARIDE; SCAFFOLDS; VECTORS; REMOVAL
AB The grafting of L-phenylalanine onto low molecular weight chitosan is .............................................................................
C1 [Akashi, Mitsuru] Osaka Univ, Grad Sch Engn, Dept Appl Chem, Suita, Osaka 5650871, Japan.
   [Yoksan, Rangrong] Kasetsart Univ, Fac Agroind, Dept Packaging Technol & Mat, Bangkok 10900, Thailand.
RP Akashi, M, Osaka Univ, Grad Sch Engn, Dept Appl Chem, 2-2 Yamadaoka, Suita, Osaka 5650871, Japan.
EM akashi@chem.eng.osaka-u.ac.jp
FU Japan Society for the Promotion of Science (JSPS), Japan [P05133]
FX This work was financially supported by the Japan Society for the Promotion of Science (JSPS), Japan (P05133). One of the authors (R.Y.) thanks Assist. Prof. Michiya Matsusaki (Osaka University, Japan) for the technique and discussion on cell culture.
NR 36
TC 5
PU ELSEVIER SCI LTD
PI OXFORD
PA THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, OXON, ENGLAND
SN 0144-8617
J9 CARBOHYD POLYM
JI Carbohydr. Polym.
PD JAN 5
PY 2009
VL 75
IS 1
BP 95
EP 103
DI 10.1016/j.carbpol.2008.07.001
PG 9
SC Chemistry, Applied; Chemistry, Organic; Polymer Science
GA 361SY
UT ISI:000260148600015
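Before a bibliographic record like this one can be mined, its tagged fields have to be read into a data structure. The sketch below assumes the export layout in which each field starts with a two-character tag (PT, AU, TI, ...) at the beginning of a line and further values of the same field appear on indented continuation lines. The function name `parse_tagged_record` and the shortened `SAMPLE` are illustrative assumptions, not part of Vantage Point or of any Thomson tool.

```python
def parse_tagged_record(text):
    """Parse one field-tagged record into a {tag: [values]} dictionary."""
    record = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[:2].strip():
            # The line starts with a two-character field tag.
            tag, value = line[:2], line[3:].strip()
            record.setdefault(tag, []).append(value)
        elif tag is not None:
            # Indented line: another value for the current field.
            record[tag].append(line.strip())
    return record


# Shortened version of the record shown above (assumed line layout).
SAMPLE = """PT J
AU Yoksan, R
   Akashi, M
SO CARBOHYDRATE POLYMERS
PY 2009
VL 75
"""

record = parse_tagged_record(SAMPLE)
print(record["AU"], record["PY"])
```

Once records are in this form, fields such as AU (authors), PY (publication year), and DE (keywords) can be counted and cross-tabulated for bibliometric profiling.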
Thomson : Delphion
Text / Data Mining
Intelligence Market /
Technology Intelligence
(hidden content)
(relationship)
(sorting/ranking)
4 W (Who/What/When/Where)
Mining
Metadata / Controlled Vocabulary / Taxonomy / Ontology
STKS Mining (Owned raw data) STKS ... . / . . ............... ...... / .................... 2545 / 1997
Zanasi, A. 2005. Text Mining and its Applications to Intelligence, CRM and Knowledge Management.
ppt. Text Mining: Techniques and Application. 2550.
Wikipedia. Text Mining. http://en.wikipedia.org (as of 13/11/2007).
END
Thank you for your attention