Data Mining for Healthcare Documents


Data Mining for Healthcare Documents

陳啟煌, Programming Division, Computer and Information Networking Center, National Taiwan University

2011.10.27


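About Me
陳啟煌
Education
– National Chiao Tung University (Computer Science), National Taiwan University (Computer Science and Information Engineering), National Taiwan University (Electrical Engineering)
Experience
– 興匯財務顧問公司 (a financial consulting company), National Taiwan University Computer and Information Networking Center
Email: [email protected]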


Outline

Introduction
Biomedical Semantic Similarity Measure
Semantic-driven Keyword Matching Extractor
Web-based Discharge Summary System
Healthcare Mining Project with Mongolia
Conclusions and Future Works


Clinical Mining

Clinical Database

Clinical Pathways


Introduction

The IOM 2000 report estimated 44,000 to 98,000 unnecessary deaths per year from medical errors
– Death rate equivalent to three jumbo jets crashing every two days
– Motor vehicle accidents: 43,458
– Breast cancer: 42,297
– AIDS: 16,516


Suggested Solutions

Development of IT infrastructure
– Computerized Physician Order Entry (CPOE)
• Order sets: make it easier to do the right thing
• Alerts / reminders
• Clinical guidelines
Restriction on working hours
Greater staff-to-patient ratios


Motivation

Clinical Pathway
– A way of treating a patient with a standardized procedure in order to
• Enhance efficiency
• Increase quality
• Lower costs
• Shorten the length of stay in hospital

Usually represented in a script book and/or flow chart diagram


Order Sets System Evolution

Paper Order Sets
– Predefined orders written on paper
Electronic Order Sets
– Just a UI to create and look up order sets
Knowledge-based Order Sets
– Machine learning
– Interactive UI for the user


How to Create Order Sets

Committee
– Traditional method, time-consuming
Feedback system
– Interaction with users, suggestions
Data mining
– Find patterns in existing clinical data


Raw Data


Introduction


Motivation

Free-Text Reports
– Discharge summaries
– Radiology reports
– Pathology reports
– The treatments they record can be extracted and learned from to gain knowledge


Motivation

Semantically similar biomedical terms exist in medical reports.
– “congestive heart failure”, “cardiac decompensation”, and “volume overload”


Approaches

Biomedical Semantic Similarity Measure
– Calculate semantic similarity between terms
A Powerful Extractor
– To view, verify, and extract data items from reports
Structuring
– Providing a highly interactive editor
• Auto-complete
• Model essay
• User phrases


Biomedical Semantic Similarity Measure


Introduction(1/4)

Ontology-based techniques
– Ontology tree
• Single ontology
• Cross ontology
– Path length, edge counting
Corpus-based techniques
– Context vector measure, latent semantic analysis (LSA)


Introduction(2/4)

The Web Corpus
– The Web provides unprecedented access to information and is part of people’s daily lives.
– The idea of using the Web as a corpus for NLP research is becoming popular.


Introduction(3/4)

How can each document on the Web be analyzed directly?


Introduction(4/4)

Web search engines
– Efficient interface
– Numerous documents & high growth rate
– Google: page count


Background and Related Work

Ontology-based techniques
– Single ontology
• Edge counting
• Information content
• Feature based
• Hybrid
– Cross ontology
• Hliaoutakis et al.


Methodologies

Sample Construction
Feature Definitions
Feature Selection Strategy
Machine Learning Model
– Support Vector Machine Model


Sample Construction(1/3)


Sample Construction(2/3)


Sample Construction(3/3)

In our study, we collect
– 1500 synonymous term pairs
– 1500 non-synonymous term pairs


Feature Definitions(1/4)

Features
– Co-occurrence


Feature Definitions(2/4)

Features
– Co-occurrence
– Semantic distance
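The formulas on these two slides did not survive in the transcript. The feature ranking later in the deck names four co-occurrence features (WebJaccard, WebOverlap, WebDice, WebPMI) and one distance feature (NGD). The sketch below uses the definitions of those measures that are common in the web-similarity literature; whether the talk used exactly these forms is an assumption. Here `h_p`, `h_q` and `h_pq` stand for the page counts H(P), H(Q) and H(P ∩ Q), and `n` is an assumed number of indexed pages.

```python
import math

def web_jaccard(h_p, h_q, h_pq):
    """H(P∩Q) / (H(P) + H(Q) - H(P∩Q)); 0 when the terms never co-occur."""
    if h_pq == 0:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_overlap(h_p, h_q, h_pq):
    """H(P∩Q) / min(H(P), H(Q))."""
    if h_pq == 0:
        return 0.0
    return h_pq / min(h_p, h_q)

def web_dice(h_p, h_q, h_pq):
    """2 * H(P∩Q) / (H(P) + H(Q))."""
    if h_pq == 0:
        return 0.0
    return 2 * h_pq / (h_p + h_q)

def web_pmi(h_p, h_q, h_pq, n):
    """Pointwise mutual information estimated from page counts."""
    if h_pq == 0:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))

def ngd(h_p, h_q, h_pq, n):
    """Normalized Google Distance; smaller means more closely related."""
    if h_pq == 0:
        return float("inf")
    log_p, log_q, log_pq = math.log(h_p), math.log(h_q), math.log(h_pq)
    return (max(log_p, log_q) - log_pq) / (math.log(n) - min(log_p, log_q))
```

All five functions need only three page-count queries per term pair (P, Q, and P with Q), which is what makes a search engine usable as the corpus interface.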


Feature Definitions(3/4)

“Apoptosis known as programmed cell death”
– The phrase “known as” indicates a synonymous relationship between apoptosis and programmed cell death.
“Apoptosis known as programmed cell death”
– Google page count: 141
“Isoflavone known as Cyclooxygenase”
– Google page count: 0


Feature Definitions(4/4)

Features
– Lexico-syntactic pattern
• P known as Q: H(P known as Q) / H(P ∩ Q)
• of P (Q)
• P (Q)
• and P (Q
• , P (Q
• against P (Q
• prevalence of P Q
• patients with P Q
• P/Q
• P, Q
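The pattern features above can be sketched as follows. Each pattern is filled with the two terms, the exact phrase is looked up, and the hit count is divided by the co-occurrence count H(P ∩ Q). The `page_count` callable is a hypothetical stand-in for a search-engine query, and the stub counts other than 141 (taken from the previous slide) are made up for illustration.

```python
# Patterns from the slide, written as format templates (P and Q are the two terms).
PATTERNS = [
    "{P} known as {Q}", "of {P} ({Q})", "{P} ({Q})", "and {P} ({Q}",
    ", {P} ({Q}", "against {P} ({Q}", "prevalence of {P} {Q}",
    "patients with {P} {Q}", "{P}/{Q}", "{P}, {Q}",
]

def pattern_feature(pattern, p, q, page_count):
    """H(pattern filled with P and Q) / H(P ∩ Q), or 0 if P and Q never co-occur."""
    phrase_hits = page_count('"' + pattern.format(P=p, Q=q) + '"')
    joint_hits = page_count(f'"{p}" "{q}"')
    return phrase_hits / joint_hits if joint_hits else 0.0

def make_stub(counts):
    """Hypothetical page-count source backed by a dictionary (no network access)."""
    return lambda query: counts.get(query, 0)

stub = make_stub({
    '"Apoptosis known as programmed cell death"': 141,   # count quoted on the slide
    '"Apoptosis" "programmed cell death"': 10000,        # made-up co-occurrence count
})
score = pattern_feature("{P} known as {Q}", "Apoptosis", "programmed cell death", stub)
```

Dividing by the co-occurrence count keeps frequently mentioned term pairs from scoring high on every pattern simply because they are common.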


Feature Selection Strategy

Rank the features according to their ability to express synonymy by F-score:
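The F-score formula itself is missing from the transcript. A minimal sketch follows, assuming it is the per-feature F-score often used for feature selection alongside LIBSVM: the squared distance of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function and variable names are illustrative.

```python
def f_score(pos, neg):
    """F-score of one feature, given its values on positive and negative samples."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
              + sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1))
    return between / within if within else float("inf")

def rank_features(feature_values):
    """feature_values maps a feature name to (positive values, negative values).

    Returns (name, F-score) pairs sorted from most to least discriminative.
    """
    scores = {name: f_score(pos, neg) for name, (pos, neg) in feature_values.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this setting the positive samples would be the synonymous term pairs and the negative samples the non-synonymous ones; a larger score means the feature separates the two groups better.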


Support Vector Machine Model(1/2)


Support Vector Machine Model(2/2)

LIBSVM 2.89
– C-SVC
• Linear
• Polynomial degree=2
• Polynomial degree=3
• RBF
– nu-SVC
• Linear
• Polynomial degree=2
• Polynomial degree=3
• RBF
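For reference, the kernels listed above can be written out in a few lines. This is a plain-Python sketch of the usual linear, polynomial and RBF kernel definitions, not LIBSVM code; the `gamma` and `coef0` defaults are illustrative assumptions rather than the settings used in the talk.

```python
import math
from itertools import product

def linear_kernel(u, v):
    """u . v"""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree, gamma=1.0, coef0=0.0):
    """(gamma * u . v + coef0) ** degree"""
    return (gamma * linear_kernel(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * |u - v|^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# The eight model configurations compared in the experiments.
SVM_TYPES = ["C-SVC", "nu-SVC"]
KERNELS = ["linear", "polynomial degree=2", "polynomial degree=3", "RBF"]
CONFIGURATIONS = list(product(SVM_TYPES, KERNELS))
```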


Datasets (1/5)

Concept 1 | Concept 2 | Human
Anemia | Appendicitis | 0.031
Dementia | Atopic Dermatitis | 0.062
Bacterial Pneumonia | Malaria | 0.156
Osteoporosis | Patent Ductus Arteriosus | 0.156
Amino Acid Sequence | Anti-Bacterial Agents | 0.156
Acquired Immunodeficiency Syndrome | Congenital Heart Defects | 0.062
Otitis Media | Infantile Colic | 0.156
Meningitis | Tricuspid Atresia | 0.031
Sinusitis | Mental Retardation | 0.031
Hypertension | Kidney Failure | 0.5
Hyperlipidemia | Hyperkalemia | 0.156
Hypothyroidism | Hyperthyroidism | 0.406
Sarcoidosis | Tuberculosis | 0.406
Vaccines | Immunity | 0.593
Asthma | Pneumonia | 0.375

Table 1: Dataset 1 of 36 medical term pairs


Datasets (2/5)

Concept 1 | Concept 2 | Human
Diabetic Nephropathy | Diabetes Mellitus | 0.5
Lactose Intolerance | Irritable Bowel Syndrome | 0.468
Urinary Tract Infection | Pyelonephritis | 0.656
Neonatal Jaundice | Sepsis | 0.187
Sickle Cell Anemia | Iron Deficiency Anemia | 0.437
Psychology | Cognitive Science | 0.593
Adenovirus | Rotavirus | 0.437
Migraine | Headache | 0.718
Myocardial Ischemia | Myocardial Infarction | 0.75
Hepatitis B | Hepatitis C | 0.562
Carcinoma | Neoplasm | 0.75
Pulmonary Valve Stenosis | Aortic Valve Stenosis | 0.531
Failure To Thrive | Malnutrition | 0.625
Breast Feeding | Lactation | 0.843
Antibiotics | Antibacterial Agents | 0.937

Table 1: Dataset 1 of 36 medical term pairs


Datasets (3/5)

Concept 1 | Concept 2 | Human
Seizures | Convulsions | 0.843
Pain | Ache | 0.875
Malnutrition | Nutritional Deficiency | 0.875
Measles | Rubeola | 0.906
Chicken Pox | Varicella | 0.968
Down Syndrome | Trisomy 21 | 0.875

Table 1: Dataset 1 of 36 medical term pairs


Datasets (4/5)

Concept 1 | Concept 2 | Physician | Expert
Renal Failure | Kidney Failure | 4 | 4
Heart | Myocardium | 3.3 | 3
Stroke | Infarct | 3 | 2.8
Abortion | Miscarriage | 3 | 3.3
Delusion | Schizophrenia | 3 | 2.2
Congestive Heart Failure | Pulmonary Edema | 3 | 1.4
Metastasis | Adenocarcinoma | 2.7 | 1.8
Calcification | Stenosis | 2.7 | 2
Diarrhea | Stomach Cramps | 2.3 | 1.3
Mitral Stenosis | Atrial Fibrillation | 2.3 | 1.3
Chronic Obstructive Pulmonary Disease | Lung Infiltrates | 2.3 | 1.9
Rheumatoid Arthritis | Lupus | 2 | 1.1
Brain Tumor | Intracranial Hemorrhage | 2 | 1.3
Carpel Tunnel Syndrome | Osteoarthritis | 2 | 1.1
Diabetes mellitus | Hypertension | 2 | 1

Table 2: Dataset 2 of 30 medical term pairs


Datasets (5/5)

Concept 1 | Concept 2 | Physician | Expert
Acne | Syringe | 1.7 | 1.2
Antibiotic | Allergy | 1.7 | 1
Cortisone | Total Knee Replacement | 1.7 | 1.2
Pulmonary Embolus | Myocardial Infarction | 1.7 | 1.4
Pulmonary Fibrosis | Lung Cancer | 1.3 | 1
Cholangiocarcinoma | Colonoscopy | 1.3 | 1
Lymphoid Hyperplasia | Laryngeal Cancer | 1 | 1
Multiple sclerosis | Psychosis | 1 | 1
Appendicitis | Osteoporosis | 1 | 1
Rectal Polyp | Aorta | 1 | 1
Xerostomia | Alcoholic Cirrhosis | 1 | 1
Peptic Ulcer Disease | Myopia | 1 | 1
Depression | Cellulitis | 1 | 1
Varicose Vein | Entire Knee Meniscus | 1 | 1
Hyperlipidemia | Metastasis | 1 | 1

Table 2: Dataset 2 of 30 medical term pairs


Experiment Results

Rank | Feature | F(i)
1 | NGD | 0.2751
2 | WebPMI | 0.237
3 | , X (Y | 0.1648
4 | X/Y | 0.1632
5 | X(Y) | 0.1606
6 | X, Y | 0.1585
7 | WebOverlap | 0.1173
8 | WebDice | 0.0555
9 | WebJaccard | 0.0347
10 | of X (Y) | 0.0185
11 | and X (Y | 0.0093
12 | against X (Y | 0.0027
13 | patients with X Y | 0.0017
14 | X known as Y | 0.0014
15 | prevalence of X Y | 0.0011


Experiment Results

Figure 3.4(a): Correlation vs. number of features and training samples using C-SVC with linear kernel


Experiment Results

Figure 3.4(b): Correlation vs. number of features and training samples using C-SVC with polynomial degree=2 kernel


Experiment Results

Figure 3.4(c): Correlation vs. number of features and training samples using C-SVC with polynomial degree=3 kernel


Experiment Results

Figure 3.4(d): Correlation vs. number of features and training samples using C-SVC with RBF kernel


Experiment Results

Figure 3.5(a): Correlation vs. number of features and training samples using nu-SVC with linear kernel


Experiment Results

Figure 3.5(b): Correlation vs. number of features and training samples using nu-SVC with polynomial degree=2 kernel


Experiment Results

Figure 3.5(c): Correlation vs. number of features and training samples using nu-SVC with polynomial degree=3 kernel


Experiment Results

Figure 3.5(d): Correlation vs. number of features and training samples using nu-SVC with RBF kernel


Experiment Results

Model | Maximum correlation | Number of samples | Number of features
C-SVC (Linear) | 0.758 | 1500 | 9
C-SVC (Poly=2) | 0.776 | 1200 | 7
C-SVC (Poly=3) | 0.759 | 300 | 13
C-SVC (RBF) | 0.612 | 1100 | 10
nu-SVC (Linear) | 0.798 | 900 | 7
nu-SVC (Poly=2) | 0.766 | 300 | 11
nu-SVC (Poly=3) | 0.736 | 300 | 12
nu-SVC (RBF) | 0.743 | 100 | 11


Experiment Results

Table 5: Correlation of different models with Dataset 1 and with the physician and expert scores of Dataset 2

Model | Dataset 1 | Dataset 2 (Phy) | Dataset 2 (Exp)
C-SVC (Linear) | 0.758 | 0.689 | 0.482
C-SVC (Poly=2) | 0.776 | 0.698 | 0.479
C-SVC (Poly=3) | 0.759 | 0.649 | 0.395
C-SVC (RBF) | 0.612 | 0.388 | 0.171
nu-SVC (Linear) | 0.798 | 0.705 | 0.496
nu-SVC (Poly=2) | 0.766 | 0.671 | 0.424
nu-SVC (Poly=3) | 0.736 | 0.641 | 0.384
nu-SVC (RBF) | 0.743 | 0.632 | 0.373
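The slides do not say which correlation coefficient is reported. The sketch below assumes Pearson's r between a model's similarity scores and the human ratings for the same term pairs. The three human ratings are taken from Dataset 1; the model scores are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Human ratings for Anemia/Appendicitis, Migraine/Headache, Chicken Pox/Varicella.
human = [0.031, 0.718, 0.968]
# Hypothetical model outputs for the same three pairs.
model = [0.10, 0.65, 0.90]
r = pearson(human, model)
```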


Result comparison

Table 3.4: Result comparison for Dataset 1 (rank in parentheses)

Measure | Dataset 1
SemDist | 0.726 (2)
Path length | 0.422 (5)
Leacock & Chodorow | 0.600 (3)
Wu & Palmer | 0.498 (4)
Proposed | 0.798 (1)


Result comparison

Table 3.5: Result comparison for Dataset 2 (rank in parentheses)

Measure | Dataset 2 (Physician) | Dataset 2 (Expert)
Path length | 0.512 (4) | 0.731 (2)
Leacock & Chodorow | 0.358 (7) | 0.497 (5)
Lin | 0.522 (3) | 0.565 (4)
Resnik | 0.534 (2) | 0.61 (3)
Jiang & Conrath | 0.506 (5) | 0.741 (1)
Vector (All sect, 1M notes) | 0.436 (6) | 0.497 (5)
Proposed | 0.705 (1) | 0.496 (6)


Semantic-driven Keyword Matching Extractor


Introduction

For Structured Clinical Data
– Data can be directly exported for further analysis and mining
For Unstructured Clinical Data
– Data need to be further processed to extract the relevant information


Background and Related works

Marking concepts and related semantics
– Cancer Text Information Extraction System (caTIES)
Extracting data items and filling the outcomes into the predefined template
– IBM Watson Research Center & Mayo Clinic
Providing the verification user interface
– Commercial natural language processing (NLP) engines


Architecture

Apply match pattern on textual reports

Send matching profile

Review and verify matched information

Clinical data warehouse

Textual clinical reports
Matching metadata
Retrieve keyword list
Select keyword
Retrieve matching profile
Store structured data
Case-oriented template schema
Keyword selection interface
Information matching modules
Textual documents viewer
Extraction verification editor


Methodology

The default common keyword lists for each type of textual document
The personal keyword lists
– Match the keyword and the keywords with related semantics
– Map the corresponding matching rules using the retrieved matching pattern, and apply the matching rules to the textual reports
– Date: 2009/01/01, 12/01
– Size: “4.9 x 1 x 1.8” (length x width x height)
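The slides do not show the extractor's actual matching rules. A minimal sketch of what date and size rules like the two examples above might look like, using regular expressions, follows; the patterns and function names are illustrative assumptions.

```python
import re

# Dates such as 2009/01/01 (year/month/day) or 12/01 (month/day).
DATE_RE = re.compile(r"\b(?:\d{4}/\d{1,2}/\d{1,2}|\d{1,2}/\d{1,2})\b")
# Sizes such as "4.9 x 1 x 1.8", read as length x width x height.
NUMBER = r"(\d+(?:\.\d+)?)"
SIZE_RE = re.compile(NUMBER + r"\s*x\s*" + NUMBER + r"\s*x\s*" + NUMBER)

def extract_dates(text):
    """Return every date-like string found in the report text."""
    return DATE_RE.findall(text)

def extract_size(text):
    """Return (length, width, height) as floats, or None if no size is found."""
    match = SIZE_RE.search(text)
    return tuple(float(g) for g in match.groups()) if match else None

report = "Tumor size 4.9 x 1 x 1.8 cm. Biopsy on 2009/01/01, follow-up 12/01."
dates = extract_dates(report)
size = extract_size(report)
```

Matches produced this way would still go to the verification editor, since a pattern like month/day can also hit unrelated fractions in free text.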


Result


Result


Discharge Summary System


Background

Old discharge summary system (Dis32)
– Client/server architecture
– Requires installing/upgrading client applications
Web discharge summary system
– Service-oriented architecture
– Online since 2009.10


Motivation

Discharge summary user interface
– Chief Complaint, Brief History
– Free-text fields
– How to generate a list of suggested phrases


Motivation

Auto-Complete


Language Modeling

We want to compute P(w1, w2, ..., wn), the probability of a word sequence.

Alternatively, we want to compute P(w5 | w1, w2, w3, w4), the probability of a word given the previous words.

A model that computes P(W) or P(wn | w1, w2, ..., wn-1) is called a language model.
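A small sketch of how such a model can drive phrase suggestion follows. It assumes a bigram approximation, where P(wn | w1, ..., wn-1) is replaced by P(wn | wn-1) estimated from counts. The three example sentences are invented stand-ins for chief-complaint text.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for every word, which words follow it ("<s>" marks sentence start)."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            following[prev][word] += 1
    return following

def cond_prob(following, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(following[prev].values())
    return following[prev][word] / total if total else 0.0

def suggest(following, prev, k=3):
    """The k most likely next words after `prev`, with their probabilities."""
    total = sum(following[prev].values())
    return [(word, count / total) for word, count in following[prev].most_common(k)]

corpus = [
    "chest pain for two days",
    "chest pain and fever",
    "chest tightness for one week",
]
model = train_bigrams(corpus)
next_words = suggest(model, "chest")
```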


SRILM

– The SRI Language Modeling Toolkit
– SRILM is a toolkit for building and applying statistical language models (LMs)
– http://www.speech.sri.com/projects/srilm/


SRILM

Three Main Functionalities
– Generate the n-gram count file from the corpus
– Train the language model from the n-gram count file
– Calculate the test data perplexity using the trained language model
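The same three steps can be mirrored in a few lines of plain Python to show what each one produces. This is not SRILM and does not use its file formats; it assumes a bigram model with add-one smoothing, which is a simplification chosen for brevity.

```python
import math
from collections import Counter

def sentence_words(sentence):
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def count_ngrams(sentences):
    """Step 1: unigram and bigram counts from the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence_words(sentence)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def train_model(unigrams, bigrams):
    """Step 2: turn the counts into an add-one smoothed P(word | prev)."""
    vocab_size = len(unigrams)
    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

def perplexity(prob, sentences):
    """Step 3: perplexity of test sentences (lower means a better fit)."""
    log_sum, n = 0.0, 0
    for sentence in sentences:
        words = sentence_words(sentence)
        for prev, word in zip(words, words[1:]):
            log_sum += math.log(prob(prev, word))
            n += 1
    return math.exp(-log_sum / n)

unigrams, bigrams = count_ngrams(["chest pain for two days", "chest pain and fever"])
prob = train_model(unigrams, bigrams)
seen = perplexity(prob, ["chest pain and fever"])
unseen = perplexity(prob, ["fever and chest days"])
```

A sentence made of familiar word sequences gets a lower perplexity than a scrambled one, which is the signal used to judge how well a trained model fits held-out text.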


Implementation

N-gram Count File
– Chief Complaint, Brief History
Static
– Phrase lists
Dynamic
– AJAX + AutoComplete toolkit


Discharge notes


Results

Average time spent by 7 intern participants, in seconds (hh:mm:ss)

System Name | Time Spent
Client-server system | 652 seconds (00:10:52)
Web-based system | 372 seconds (00:06:12)


Healthcare Mining Project with Mongolia


Background

Taiwan and Mongolia
– National Science Council
– Mongolian Ministry of Education, Culture and Sciences
NTU and MUST
– Mongolian University of Science and Technology
3-Year Project
– 2009/8/1 to 2012/7/31


Motivation

Reduce cost
– Length of stay in hospital
– Early detection of disease
Improve quality and patient safety
– SOP, Clinical Pathways


Motivation

Clinical Pathway
– A way of treating a patient with a standardized procedure in order to
• Enhance efficiency
• Increase quality
• Lower costs
• Shorten the length of stay in hospital

Usually represented in a script book and/or flow chart diagram


Project Goal

Build a data mining framework for
– Early detection of disease
• Find the sequential patterns between different diseases
– Standardized therapeutic procedure
• Discover clinical pathways and clinical guidelines


Mining Clinical Pathway

Clinical Database

Clinical Pathways


Clinical Data

The clinical data include
– Patient information
– Diagnosis
– Sequences of physicians’ orders taken at different points in time


Clinical Sequence Mining system diagram

Data Preparation
Data Pre-Processing
Mining Model
Historical Diagnosis Database
Orders Sequence Knowledge Base
Alert and Reminding System
Clinical Pathway Creation System


Data Preparation

Inpatient Department (IPD) raw data
– From 2007/1/1 to 2007/5/26
Discharge notes
– With admission/discharge diagnosis and chief complaint
– 22,000 records
Diagnosis records in IPD
– With ICD9 code
Related orders in IPD


Data Preparation

Chief complaint
– For scheduled chemotherapy
– Total
• 791 cases
• 33,771 physician orders


Data Pre-processing

Select relevant data according to the order type attribute
– Drop orders that carry no clinical meaning for mining, such as nursing care and routine administration orders
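The filtering step can be sketched as below. The record layout, the `ordercode` field name, and the choice of which type codes to drop are hypothetical; the slides do not say which order type codes stand for nursing or administrative orders. The statistics helper assumes that `cnt` is the number of order records of a type and `ordercnt` is the number of distinct order codes of that type.

```python
from collections import Counter, defaultdict

def filter_orders(orders, excluded_types):
    """Keep only orders whose type code is not in the excluded set."""
    return [o for o in orders if o["ordertypecode"] not in excluded_types]

def order_type_statistics(orders):
    """Rows of (ordertypecode, cnt, ordercnt), sorted by cnt in descending order."""
    cnt = Counter()
    distinct = defaultdict(set)
    for o in orders:
        cnt[o["ordertypecode"]] += 1
        distinct[o["ordertypecode"]].add(o["ordercode"])
    return [(code, n, len(distinct[code])) for code, n in cnt.most_common()]

orders = [
    {"ordertypecode": "L", "ordercode": "08011CZP"},
    {"ordertypecode": "L", "ordercode": "08013CZP"},
    {"ordertypecode": "L", "ordercode": "08011CZP"},
    {"ordertypecode": "N", "ordercode": "HYPOTHETICAL-1"},
]
kept = filter_orders(orders, excluded_types={"N"})  # "N" is a made-up choice
stats = order_type_statistics(kept)
```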


Order Type Statistics

ordertypecode | cnt | ordercnt
R | 10309 | 368
T | 6180 | 135
A | 6063 | 20
L | 5569 | 175
M | 4026 | 84
D | 814 | 25
X | 360 | 41
B | 168 | 5
O | 106 | 58
J | 47 | 6
E | 40 | 14
P | 12 | 4
I | 11 | 3
N | 6 | 2


Mining Model

Sequence Clustering Algorithm
Mining Tool
– Microsoft SQL Server 2005
– Sequence Clustering Model
– Visualize Data Analysis
Parameter
– Support
– Confidence


Sequence Clustering Mining

The Sequence Clustering algorithm finds clusters of cases that contain similar paths in a sequence.


Sequence Clustering Sample

Customer ID | Sequence Data
1 | (30) (60 90)
2 | (10 20) (30) (40 60 70)
3 | (30 50 70)
4 | (30) (40 70) (90)
5 | (90)

Sequential patterns: (30) (90), (30) (40 70)
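The two patterns in this sample can be checked with a short support count. The sketch below assumes the usual sequential-pattern definition: a customer supports a pattern if the pattern's itemsets appear, in order, as subsets of that customer's itemsets. It verifies given patterns and does not discover them; the mining itself was done with the SQL Server tool named earlier.

```python
def contains(sequence, pattern):
    """True if every itemset of `pattern` is a subset of a distinct, later itemset."""
    pos = 0
    for wanted in pattern:
        # Advance to the first remaining itemset that includes all wanted items.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(sequences, pattern):
    """Number of customers whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in sequences.values())

SEQUENCES = {
    1: [(30,), (60, 90)],
    2: [(10, 20), (30,), (40, 60, 70)],
    3: [(30, 50, 70)],
    4: [(30,), (40, 70), (90,)],
    5: [(90,)],
}
support_a = support(SEQUENCES, [(30,), (90,)])      # customers 1 and 4
support_b = support(SEQUENCES, [(30,), (40, 70)])   # customers 2 and 4
```

Each pattern is supported by 2 of the 5 customers, so both qualify under a minimum support of two customers.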


Mapping

Customer | Patient
Item | Order
Shopping Cart | Concurrent Orders


Result


Sequence Sample

09029CZP Bilirubin, total

08011CZP CBC & platelet

08013CZP WBC differential count

09015CZP (Blood)Creatinine

09002CZP (Blood)UN

09025CZP AST(GOT)

09026CZP ALT(GPT)

09038CZP Albumin(Blood)

09021CZP Sodium, Na

09022CZP Potassium, K

Platelets

White blood cells

Creatinine

Bilirubin

Liver function index

Liver function index

Albumin


The SAGE Guideline Model

Standards-Based Sharable Active Guideline Environment (SAGE)
– Developed by
• Stanford Medical Informatics, IDX Systems Corporation, Apelon Inc., Intermountain Health Care, Mayo Clinic, and University of Nebraska Medical Center


The Protégé


Activity Graphs

Aspirin Therapy for diabetic patients


Cooperation Architecture

Hospital in Mongolia

VM-DB VM-Web VM-DB VM-Web

Hospital in Taiwan

VM Images

Model Feedback


Cloud Architecture

Health Mining Server

Hospital in Taiwan

Hospital in Mongolia

Hospital in Canada


Conclusions

A measure that uses page counts to calculate the semantic similarity between two given concepts.

A semantic-driven keyword matching extractor that helps extract data items from reports.


Conclusions

A highly interactive free-text editor with an auto-complete feature speeds up the composition of discharge summaries.

A data mining framework is proposed.


Future Works

Find out why corpus-based methods correlate more closely with physicians’ scores than with experts’ scores
Structure the healthcare documents
Prove the robustness of the data mining models
– Variation analysis across hospitals/regions
– Taiwan and Mongolia
– Canada, Taiwan and Mongolia


Q&A