Wen-Hsiang Lu ( 盧文祥 ) Department of Computer Science and Information Engineering,
Department of Computer Science and Engineering
Department of Computer Science and Engineering: Yixin Chen (陈一昕), Yi Mao, Minmin Chen, Rahav Dor, Greg Hackermann, Zhicheng Yang, Chengyang Lu
School of Medicine: Kelly Faulkner, Kevin Heard, Marin Kollef, Thomas Bailey
Real-Time Clinical Warning for Hospitalized Patients via Data Mining (数据挖掘实现的住院病人的实时预警)
Background
• The ICU direct costs per day for survivors are between six and seven times those for non-ICU care.
• Unlike ICU patients, general hospital ward (GHW) patients are not under extensive electronic monitoring and nursing care.
• A clinical study has found that 4–17% of patients undergo cardiopulmonary or respiratory arrest while in the GHW of a hospital.
Project mission
• Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life-threatening.
• Goal: provide early detection and intervention based on data mining
– to prevent these serious, often life-threatening events
– using both clinical data and wireless body sensor data
• An NIH-ICTS funded project, currently under clinical trials at Barnes-Jewish Hospital, St. Louis, MO
What exactly do we predict?
• Will the patient die?
• Will the patient be transferred to the ICU?
System Architecture
• Tier 1: EWS (early warning system)
– Clinical data, lab tests, manually collected, low frequency
• Tier 2: RDS (real-time data sensing)
– Body sensor data, automatically collected, wirelessly transmitted, high frequency
Agenda
1. Background and overview
2. Early warning system (EWS)
3. Real-time data sensing (RDS)
5. Future work
Medical Record (34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, age, blood pressure …)
[Figure: two plots with x-axis "Time/second"]
Related Work
Medical data mining
• Based on medical knowledge:
– SCAP and PSI
– Acute Physiology Score, Chronic Health Score, and APACHE score, used to predict renal failures
– Modified Early Warning Score (MEWS)
• Based on machine learning methods:
– Decision trees
– Neural networks
– SVM
Main problem: most previous general work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.
Overview of EWS
Goal: design a data mining algorithm that can automatically identify patients at risk of clinical deterioration based on the time series in their existing electronic medical records.
[Figure: bar chart comparing Non-ICU and ICU counts; axis from 0 to 30000]
Challenges:
• Classification of high-dimensional time series data
• Irregular data gaps
• Measurement errors
• Class imbalance
Key Techniques in the EWS Algorithm
• Temporal bucketing
• Discriminative classification
• Bootstrap aggregating (bagging)
• Exploratory undersampling
• Exponential moving average smoothing
• Kernel-density estimation
Workflow of the System
[Flowchart with two panels]
(A) Model building phase: Data set D, T → Data preprocessing → Bucketing → Bucket bagging → Logistic regression → Predict model → Exploratory undersampling → Converge? / > iteration count? (No: repeat; Yes: Final model)
(B) Deployment phase: Real-time data stream → Generate a 24-hour window → Data preprocessing → Bucketing → Final model → EMA smoothing → > threshold? (Yes: Alarm warning; No: continue monitoring)
Data Preprocessing
• Outlier removal
• Normalization
Temporal Bucketing
We retain data in a sliding window of the last 24 hours and divide it evenly into 6 buckets.
To capture temporal variations, we compute several feature values for each bucket, including the minimum, maximum, and average.
Bucket 1 | Bucket 2 | Bucket 3 | Bucket 4 | Bucket 5 | Bucket 6
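A minimal sketch of the bucketing step for one vital sign, assuming readings arrive as (time in hours, value) pairs. The function name `bucket_features`, the input layout, and the use of `None` for empty buckets are illustrative assumptions, not taken from the slides.

```python
from statistics import mean

def bucket_features(samples, now, window_hours=24, n_buckets=6):
    """Return [min, max, avg] per bucket, oldest bucket first.

    samples: list of (time_in_hours, value) pairs for one vital sign
    (an assumed layout). Empty buckets yield None placeholders.
    """
    start = now - window_hours
    width = window_hours / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for t, value in samples:
        if start < t <= now:  # keep only the last 24 hours
            # Clamp so that a reading taken exactly at `now` lands in the last bucket.
            index = min(int((t - start) // width), n_buckets - 1)
            buckets[index].append(value)
    features = []
    for values in buckets:
        if values:
            features.extend([min(values), max(values), mean(values)])
        else:
            features.extend([None, None, None])
    return features

# Example: pulse readings, evaluated at hour 24.
pulse = [(1, 80), (3, 90), (5, 100), (23, 70), (30, 60)]
print(bucket_features(pulse, now=24))
```

How to fill empty buckets (irregular data gaps) is left open here; the slides do not say how gaps are handled.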
Discriminative Classification
Pipeline: Clinical data → Data preprocessing → Temporal bucketing → Classification algorithm → Output: model, threshold
• Logistic regression (LR)
• Support vector machine (SVM)
• Use the max, min, and avg of each bucket and each vital sign as the input features (~400 features in total).
• Use the training data to learn the model parameters.
Bootstrap Aggregating (bagging)
[Diagram: multiple bootstrap models combined into the final model]
Advantages:
1. Handles outliers
2. Avoids over-fitting
3. Better model quality
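A minimal sketch of plain bagging with logistic regression, assuming normalized features and averaging of the predicted probabilities. The gradient-descent trainer and all function names are illustrative stand-ins; the bucket-based variants used in the slides are not reproduced here.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, epochs=200, lr=0.1):
    """Batch gradient descent for logistic regression; the bias is the last weight."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_lr(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def bagged_lr(X, y, n_models=10, seed=0):
    """Train one model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_lr([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Aggregate by averaging the predicted probabilities."""
    return sum(predict_lr(w, x) for w in models) / len(models)

X = [[-1.0], [-0.6], [-0.2], [0.2], [0.6], [1.0]]
y = [0, 0, 0, 1, 1, 1]
models = bagged_lr(X, y, n_models=5, seed=1)
print(predict_bagged(models, [1.0]), predict_bagged(models, [-1.0]))
```

Averaging over bootstrap models is what dampens the effect of individual outliers and reduces over-fitting.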
Biased Bucket Bagging
[Diagram: Bucketing → … → Final Model]
Exploratory Undersampling
• Goal: class balance
• Use the predict model to choose which records to remove from the majority class, according to their predicted values
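One way to read this step, sketched under stated assumptions: in each round, train on a balanced subset, then drop the majority-class records that the current model already scores most confidently as negative. The toy one-dimensional `train_midpoint` classifier, the `drop_fraction` parameter, and the stopping rule are all hypothetical; the slides do not give these details.

```python
import random
from statistics import mean

def train_midpoint(pos, neg):
    """Toy stand-in for a real classifier: score = distance above the
    midpoint of the two class means (score > 0 means predicted positive)."""
    midpoint = (mean(pos) + mean(neg)) / 2.0
    return lambda x: x - midpoint

def exploratory_undersample(pos, neg, rounds=3, drop_fraction=0.3, seed=0):
    rng = random.Random(seed)
    pool = list(neg)          # majority-class records still under consideration
    models = []
    for _ in range(rounds):
        if len(pool) < len(pos):
            break             # not enough majority records left to balance
        balanced_neg = rng.sample(pool, len(pos))
        score = train_midpoint(pos, balanced_neg)
        models.append(score)
        # Lowest scores are the majority records the model already gets
        # right most confidently; remove them so later rounds explore harder ones.
        pool.sort(key=score)
        pool = pool[int(len(pool) * drop_fraction):]
    return models, pool

models, remaining = exploratory_undersample([8, 9, 10], list(range(10)), drop_fraction=0.5)
print(len(models), remaining)
```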
Exponential Moving Average (EMA)
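A minimal sketch of EMA smoothing applied to a stream of model outputs, assuming the common recursive form s_t = α·x_t + (1 − α)·s_{t−1} with the first value used as the initial state. The parameter name `alpha` is an assumption; the default of 0.06 is the value reported as best later in these slides.

```python
def ema(values, alpha=0.06):
    """Exponentially smooth a sequence; a small alpha gives more weight to history."""
    smoothed = []
    current = None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed

# A single spike in the raw predictions is damped before thresholding.
print(ema([0.1, 0.1, 0.9, 0.1, 0.1]))
```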
Evaluation Criteria
AUC (area under the receiver operating characteristic (ROC) curve) represents the probability that a randomly chosen positive example is rated with greater suspicion than a randomly chosen negative example.
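The probabilistic reading of AUC can be computed directly by comparing every positive/negative pair. This is a small illustrative sketch (ties counted as one half, a common convention), not the evaluation code used in the study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pairs are ordered correctly
```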
Results on Historical Database (at specificity = 0.95)
Method  AUC      SENS     PPV      NPV      ACCU
1       0.86809  0.44753  0.29562  0.97345  0.92747
2       0.8907   0.5135   0.3386   0.9751   0.9293
3       0.91995  0.58558  0.36864  0.97871  0.93269
4       0.92108  0.60087  0.37466  0.97948  0.93342
5       0.9221   0.60961  0.37805  0.97992  0.93384
Methods:
1: bucketing + logistic regression
2: bucketing + logistic regression + bagging
3: bucketing + logistic regression + bucket bagging
4: bucketing + logistic regression + biased bucket bagging
5: bucketing + logistic regression + biased bucket bagging + exploratory undersampling
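A sketch of how metrics at a fixed specificity can be obtained: pick the alarm threshold from the scores of the negative examples, then count the confusion-matrix cells. The function name and the threshold rule (alarm when the score is strictly above the chosen negative score) are assumptions for illustration.

```python
import math

def metrics_at_specificity(scores, labels, target_spec=0.95):
    """Choose a threshold so that at least target_spec of negatives raise no alarm."""
    neg = sorted(s for s, l in zip(scores, labels) if l == 0)
    k = max(1, math.ceil(target_spec * len(neg)))
    threshold = neg[k - 1]
    tp = fp = tn = fn = 0
    for s, l in zip(scores, labels):
        alarm = s > threshold
        if alarm and l == 1:
            tp += 1
        elif alarm:
            fp += 1
        elif l == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "threshold": threshold,
        "SPEC": ratio(tn, tn + fp),
        "SENS": ratio(tp, tp + fn),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "ACCU": ratio(tp + tn, tp + tn + fp + fn),
    }

scores = [0.1, 0.2, 0.3, 0.9, 0.4, 0.8, 0.95, 0.6]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(metrics_at_specificity(scores, labels, target_spec=0.75))
```

With heavy class imbalance, PPV stays low even at high specificity, which matches the pattern in the table.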
Comparison of various models
Method                  AUC     SPEC     SENS     PPV      NPV      ACCU
RPART                   0.6703  0.93     0.55     0.287    0.977    0.912
SVM (linear kernel)     0.6879  0.9762   0.3997   0.4405   0.9719   0.95033
SVM (quadratic kernel)  0.6851  0.9675   0.4028   0.3676   0.9718   0.94216
SVM (cubic kernel)      0.6792  0.9681   0.3904   0.3646   0.9713   0.94216
SVM (RBF kernel)        0.6968  0.9615   0.4321   0.3448   0.9730   0.93774
Our method 5            0.9221  0.94996  0.60961  0.37805  0.97992  0.93384
Clinical Trial at Barnes-Jewish Hospital
Dates: 277 days; start date 1/24/2011; last date 11/1/2011

ICU Transfers
               Total   With alert  W/o alert
ICU transfers  510     243         267
Total          11286   1430        9856
Ratio          4.5%    17.0%       2.7%

Deaths
               Total   With alert  W/o alert
Deaths         239     138         102
Total          11286   1430        9856
Ratio          2.12%   9.65%       1.02%

Alerts already triggered early prevention that may have prevented deaths.
A challenging problem
• Classification based on multiple high-frequency real-time time series (heart rate, pulse, oxygen sat., CO2, temperature, etc.)
Overview of RDS
Wireless Sensor Network at BJH
Overview of Learning Algorithm
Key techniques:
• Feature extraction from multiple time series
• Feature selection
• Classification algorithms
• Exploratory undersampling
A Large Pool of Features
Features:
• Detrended fluctuation analysis (DFA) features
• Approximate entropy (ApEn)
• Spectral features
• First-order features
• Second-order features
• Cross-sign features
Detrended Fluctuation Analysis (DFA)
DFA is a method for quantifying the statistical self-affinity of a time-series signal (see, e.g., Peng et al. 1994).
Applicable to both pulse rate and SpO2
Spectral Analysis (FFT)
We use the component values of VLF (<0.04 Hz), LF (0.04–0.15 Hz), and HF (0.15–0.4 Hz), and the ratio LF/HF, for each signal.
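A rough sketch of the band-power computation, assuming an evenly sampled signal and using a naive discrete Fourier transform so that it needs only the standard library (an FFT would be used in practice). The function name `band_powers`, the mean removal, and summing squared magnitudes per band are illustrative choices; the slides do not specify how the component values are computed.

```python
import cmath
import math

def band_powers(signal, fs):
    """Sum DFT power into the VLF, LF and HF bands; fs is the sampling rate in Hz."""
    n = len(signal)
    avg = sum(signal) / n
    centered = [v - avg for v in signal]  # remove the DC offset
    powers = {"VLF": 0.0, "LF": 0.0, "HF": 0.0}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if freq < 0.04:
            powers["VLF"] += power
        elif freq < 0.15:
            powers["LF"] += power
        elif freq <= 0.4:
            powers["HF"] += power
    powers["LF/HF"] = powers["LF"] / powers["HF"] if powers["HF"] > 0 else float("inf")
    return powers

# A pure 0.1 Hz oscillation sampled at 1 Hz falls entirely in the LF band.
signal = [math.sin(2 * math.pi * 0.1 * t) for t in range(100)]
print(band_powers(signal, fs=1.0))
```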
Other Features
• Approximate entropy (ApEn): quantifies the unpredictability of fluctuations in a time series
– A low value indicates a deterministic series
– A high value indicates an unpredictable series
• First-order features:
– Mean, standard deviation
– Skewness (symmetry of the distribution), kurtosis (peakedness of the distribution)
• Second-order features: related to the co-occurrence of patterns
– First quantize a time series into Q discrete bins, then construct a pattern matrix
– Energy (E), entropy (S), correlation (COR), inertia (F), local homogeneity (LH)
• Cross-sign features: link multiple vital signs together
– Correlation: the degree of departure of two signals from independence
– Coherence: amplitude and phase about the frequencies held in common between two signals
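As an illustration of the first-order features, the following sketch computes them from central moments. It assumes population (not sample) moments and non-excess kurtosis; the slides do not state which conventions were used.

```python
import math

def first_order_features(values):
    """Mean, standard deviation, skewness and kurtosis of a non-constant series."""
    n = len(values)
    avg = sum(values) / n
    m2 = sum((v - avg) ** 2 for v in values) / n  # variance
    m3 = sum((v - avg) ** 3 for v in values) / n
    m4 = sum((v - avg) ** 4 for v in values) / n
    return {
        "mean": avg,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,  # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,    # 3 for a normal distribution
    }

print(first_order_features([1, 2, 3, 4, 5]))
```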
Forward Feature Selection
1. Start with an empty feature set.
2. Evaluate each of the remaining features together with the current feature set.
3. Pick one feature to add into the set.
4. Repeat steps 2 and 3; if there is no improvement, stop and output the final feature set.
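The greedy loop can be sketched as follows. The `evaluate` callback stands in for whatever score is used to compare feature sets (for example, cross-validated AUC); the toy scoring function and feature names below are hypothetical and only make the example runnable.

```python
def forward_select(candidates, evaluate):
    """Greedily add the feature that most improves evaluate(feature_list)."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no improvement: stop with the current set
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Toy score: a baseline plus a fixed, made-up gain per feature.
gains = {"hr_std": 0.10, "hr_apen": 0.05, "spo2_energy": 0.02, "noise": -0.01}

def toy_score(features):
    return 0.5 + sum(gains[f] for f in features)

print(forward_select(list(gains), toy_score))
```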
Experimental Setup
Dataset: MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care II), a public-access ICU database
The data model can be used for both GHW patients with sensors and ICU patients
Our data: collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal)
Prediction goal: death or survival
Real-time vital signs: heart rate and oxygen saturation rate
Class imbalance: most patients survived
Evaluation: based on 10-fold cross validation
Result – Linear and Nonlinear Classification
Method  Feature  AUC     Specificity  Sensitivity  PPV     NPV
LSVM    1        0.5759  0.9497       0.0755       0.2550  0.7781
LR      1        0.4742  0.9483       0.0729       0.3181  0.7555
KSVM    1        0.5897  0.9497       0.1265       0.3643  0.7879
LSVM    2        0.4473  0.9497       0.0346       0.1300  0.7705
LR      2        0.4902  0.9483       0.0313       0.1667  0.7473
KSVM    2        0.5016  0.9497       0.0676       0.2450  0.7768
LSVM    1 & 2    0.5757  0.9497       0.1416       0.3917  0.7694
LR      1 & 2    0.5370  0.9483       0.0521       0.2500  0.7513
KSVM    1 & 2    0.6332  0.9497       0.1428       0.4146  0.7911
Methods: LSVM: linear SVM; LR: logistic regression; KSVM: RBF kernel SVM
Features: 1: DFA of heart rate; 2: DFA of oxygen saturation
Result – Feature Combinations
Algorithm            Features                          AUC
KSVM                 DFA                               0.6332
KSVM                 DFA + cross-sign features         0.6565
KSVM                 DFA + cross-sign features + ApEn  0.6753
KSVM                 All features                      0.7079
Logistic regression  DFA                               0.5370
Logistic regression  DFA + cross-sign features         0.5731
Logistic regression  DFA + cross-sign features + ApEn  0.5974
Logistic regression  All features                      0.7402
Result – Feature Selection
Method  #Selected features  AUC     Specificity  Sensitivity  PPV     NPV
KSVM    5                   0.7752  0.9654       0.4852       0.8041  0.8651
LR      23                  0.7844  0.9483       0.5208       0.7692  0.8567
LR is our first choice: better AUC, interpretability, efficiency
First 12 Selected Features (in logistic regression)
standard deviation of heart rate
ApEn of heart rate
Energy of oxygen saturation
LF of oxygen saturation
LF of heart rate
DFA of oxygen saturation
Mean of heart rate
HF of heart rate
Inertia of heart rate
Homogeneity of heart rate
Energy of heart rate
Linear correlation between heart rate and oxygen saturation
Result – Our Final Model
Method  AUC     Specificity  Sensitivity  PPV     NPV
1       0.7402  0.9500       0.3646       0.7000  0.8185
2       0.7767  0.9500       0.4615       0.9000  0.6440
3       0.8082  0.9500       0.4865       0.9000  0.6546
Method 1: logistic regression + all features
Method 2: logistic regression + all features + exploratory undersampling
Method 3: logistic regression + feature selection + exploratory undersampling
Current Work: Density-based LR
• Standard logistic regression uses φk(x) = xk:
– P(y=1|x) = 1/(1 + exp(-∑ wk xk))
– The probability of an event (e.g., ICU transfer, death) increases or decreases monotonically with each feature
– This is not true in many cases, e.g., ICU transfer rate vs. age
• Idea: transform each feature xk
Current Work: Density-based LR
• Use a kernel-density estimator to estimate p(xk, y=1) and p(xk, y=0) for each feature xk
• Resulting in a nonlinear separation plane that conforms to the true distribution of data
• Advantages over KLR and SVM
– Efficiency, interpretability
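A sketch of one plausible per-feature transform built from kernel-density estimates: the log ratio of the estimated joint densities p(xk, y=1) and p(xk, y=0). The slides do not show the exact transform, so the log-ratio form, the Gaussian kernel, the fixed bandwidth, and the small `eps` guard are all assumptions made for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def density_transform(pos_values, neg_values, bandwidth=1.0, eps=1e-12):
    """Map a raw feature value to log(p(x, y=1) / p(x, y=0))."""
    total = len(pos_values) + len(neg_values)
    kde_pos = gaussian_kde(pos_values, bandwidth)
    kde_neg = gaussian_kde(neg_values, bandwidth)

    def phi(x):
        joint_pos = kde_pos(x) * len(pos_values) / total  # p(x | y=1) p(y=1)
        joint_neg = kde_neg(x) * len(neg_values) / total  # p(x | y=0) p(y=0)
        return math.log((joint_pos + eps) / (joint_neg + eps))

    return phi

# Toy ages: events occur among the youngest and oldest patients only.
phi_age = density_transform([20, 22, 80, 82], [48, 50, 52, 54], bandwidth=5.0)
print(phi_age(21), phi_age(50), phi_age(81))
```

The transformed feature is positive at both ends of the age range and negative in the middle, so a linear model on it can represent a non-monotonic risk curve.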
Example of Density-based LR
[Figure: test data classified by the original LR (left) and by density-based LR (right)]
Future Work
• Distance-based classification algorithms for multi-dimensional time series
– Dynamic time warping, information distance
• Combination of feature-based and distance-based classification algorithms
– Include distance information in the objective function
• Combining Tier-1 and Tier-2 data
– Multi-kernel methods
• Interpretation of alerts
– Based on the magnitude and sign of model coefficients
Acknowledgement
Real-Time Simulation on Historical Data (at specificity = 0.95)
Method   AUC       SENS     PPV      NPV      ACCU
1        0.6834    0.30159  0.2345   0.9634   0.9128
1 + EMA  0.78203   0.36508  0.27059  0.96664  0.9128
2        0.74359   0.30159  0.23457  0.96342  0.9293
2 + EMA  0.777737  0.38095  0.27907  0.96342  0.92134
4        0.77689   0.38905  0.27907  0.96745  0.9336
4 + EMA  0.81411   0.39683  0.28736  0.96825  0.92212
5        0.79902   0.4127   0.29545  0.96096  0.9229
5 + EMA  0.79902   0.4127   0.29545  0.96096  0.9229
(Assuming feature Independence)
Feature                                  Coefficient
local homogeneity of heart rate          -14.50
standard deviation of oxygen saturation  10.20
entropy of oxygen saturation             10.17
LF of heart rate                         8.62
local homogeneity of oxygen saturation   7.77
LF/HF of oxygen saturation               4.53
inertia of heart rate                    3.86
entropy of heart rate                    2.97
low frequency of oxygen saturation       -2.89
mean of oxygen saturation                -2.86
Why Bagging Works?
Let each $(D_i, y_i)$, $1 \le i \le m$, be the bucket sample that is independently drawn from $(D, y)$. $\varphi(D_i, y_i)$ is the predictor.
The aggregated predictor is: $\varphi_A(D, y) = E_i(\varphi(D_i, y_i))$
The average prediction error in $\varphi(D_i, y_i)$ is: $e' = E_i\,(y - \varphi(D_i, y_i))^2$
The error in the aggregated predictor is: $e = E\,(y - \varphi_A(D, y))^2$
Using the inequality $(EZ)^2 \le EZ^2$ gives us $e \le e'$.
Algorithm details – Biased Bucket Bagging (BBB)
[Figure: bar charts comparing BBB with 2, 3, 4, and 5 buckets, axis from 0 to 180000; one panel shows the standard deviation]
A critical factor deciding how much bagging will improve accuracy is the variance of these bootstrap models. We see that BBB with 4 buckets has the largest difference between $(E_i\,\varphi(D_i, y_i))^2$ and $E_i\,\varphi^2(D_i, y_i)$, where $\varphi(D_i, y_i)$ denotes the predictor trained on bucket sample $(D_i, y_i)$. In addition, BBB with 4 buckets has the highest standard deviation in its prediction results, so we choose BBB with 4 buckets as the final method.
Algorithm Details – Bucket Bagging
[Diagram: Bucketing → … → Final Model]
Result on Real-Time System
[Figure: cross validation for the EMA parameter]
We can see that all cases attain their best performance when the EMA parameter is around 0.06, showing that the choice of this parameter is robust. This small optimal value shows that historical records play an important role in prediction.