Rough-set-based ADR signaling from SRS data with missing values
Intelligent Computing Laboratory
Advisor: Prof. 林文揚
Author: 藍琳
Presenter: 王敏賢
On the Feasibility of Rough-Set-based ADR Signaling from
Spontaneous Reporting Data with Missing Values
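What is an ADR?
• Adverse Drug Reaction (ADR)
• ADR rule: Predc, drug → symptom
• e.g. sex=“Female”, drug=“d1” → symptom=“s1”
ADR case
• Thalidomide, marketed in Germany in the 1950s, was regarded at the time as one of the safest and fastest-acting sedatives and was often used to suppress nausea during pregnancy.
• It resulted in more than 12,000 malformed infants, and in several countries it was also found to readily cause polyneuritis.
Birth defects caused by Thalidomide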
Spontaneous Reporting System (SRS)
• FDA Adverse Event Reporting System (FAERS)
• All reports are stored in a line-oriented format and released to the public at regular intervals.
FAERS open data
The 2×2 contingency table
Predc.        Symptom   Other symptoms   Total
Drug          a         b                a + b
Other drugs   c         d                c + d
Total         a + c     b + d            N = a + b + c + d
• For ADR signal detection
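As a minimal sketch of how the four cells could be counted for one rule, the snippet below tallies a, b, c, and d among the reports that satisfy Predc. The report layout (`sex`, `drugs`, `symptoms`), the sample records, and the function name `contingency` are hypothetical and chosen only for illustration; restricting the count to reports that satisfy Predc is an assumption based on the table's corner label.

```python
# Hypothetical reports: each has a sex, a set of drugs, and a set of symptoms.
reports = [
    {"sex": "Female", "drugs": {"d1"}, "symptoms": {"s1"}},
    {"sex": "Female", "drugs": {"d1", "d2"}, "symptoms": {"s2"}},
    {"sex": "Female", "drugs": {"d2"}, "symptoms": {"s1", "s3"}},
    {"sex": "Female", "drugs": {"d3"}, "symptoms": {"s2"}},
    {"sex": "Male", "drugs": {"d1"}, "symptoms": {"s1"}},
]


def contingency(reports, predc, drug, symptom):
    """Count the cells a, b, c, d among the reports that satisfy predc."""
    a = b = c = d = 0
    for r in reports:
        if not predc(r):
            continue  # report is outside the Predc stratum
        has_drug = drug in r["drugs"]
        has_symptom = symptom in r["symptoms"]
        if has_drug and has_symptom:
            a += 1
        elif has_drug:
            b += 1
        elif has_symptom:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Rule: sex="Female", drug="d1" -> symptom="s1"
counts = contingency(reports, lambda r: r["sex"] == "Female", "d1", "s1")
print(counts)  # (1, 1, 1, 1)
```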
ADR signal measures
• Frequentist methods
– Proportional Reporting Ratio (PRR)
– Reporting Odds Ratio (ROR)
• Bayesian methods
– Bayesian Confidence Propagation Neural Network (BCPNN)
– Multi-item Gamma Poisson Shrinker (MGPS)
$\mathrm{PRR} = \dfrac{a/(a+b)}{c/(c+d)}$
$\mathrm{ROR} = \dfrac{a/c}{b/d}$
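The two frequentist measures can be sketched directly from the contingency-table cells. The function names and the sample counts below are hypothetical, and the sketch assumes all denominators are non-zero.

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting Odds Ratio: (a / c) / (b / d), i.e. ad / (bc)."""
    return (a / c) / (b / d)


# Hypothetical counts, for illustration only.
a, b, c, d = 20, 80, 10, 890
prr_value = prr(a, b, c, d)
ror_value = ror(a, b, c, d)
print(round(prr_value, 2), round(ror_value, 2))  # 18.0 22.25
```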
ADR signal measures
[Figure: values plotted by quarter from 04Q1 to 13Q3; vertical axis from 0 to 35]
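The world is not always perfect!
Problems with SRS data
• The data are not fully rigorous, and their reliability cannot be verified.
• In data mining, records with missing values strongly affect the results.
Missing Value
Traditional methods for handling missing values
• Deletion methods:
– Listwise deletion
– Pairwise deletion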
[Bar chart, Series1]
Total             5,104,523
Listwise          3,528,835
Pairwise-age      3,720,493
Pairwise-gender   4,709,159
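The difference between the two deletion methods can be illustrated with a small sketch. The records, attribute names, and helper functions below are hypothetical: listwise deletion drops a report if any analyzed attribute is missing, while pairwise deletion drops it only if the one attribute under analysis is missing.

```python
# Hypothetical reports; None marks a missing value.
reports = [
    {"age": 34, "gender": "F", "drug": "d1", "symptom": "s1"},
    {"age": None, "gender": "F", "drug": "d1", "symptom": "s2"},
    {"age": 51, "gender": None, "drug": "d2", "symptom": "s1"},
    {"age": None, "gender": None, "drug": "d2", "symptom": "s1"},
    {"age": 27, "gender": "M", "drug": "d1", "symptom": "s1"},
]


def listwise(records, attrs):
    """Keep only records with no missing value in any of attrs."""
    return [r for r in records if all(r[a] is not None for a in attrs)]


def pairwise(records, attr):
    """Keep records whose value for the single analyzed attribute is present."""
    return [r for r in records if r[attr] is not None]


n_total = len(reports)
n_listwise = len(listwise(reports, ["age", "gender"]))
n_pairwise_age = len(pairwise(reports, "age"))
n_pairwise_gender = len(pairwise(reports, "gender"))
print(n_total, n_listwise, n_pairwise_age, n_pairwise_gender)  # 5 2 3 3
```

Listwise deletion can never retain more records than either pairwise variant, which matches the ordering of the counts in the chart.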
ROUGH SET BASED METHOD
Rough Set Theory
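• Proposed in 1982 by the Polish mathematician Zdzisław I. Pawlak (1926-2006), rough set theory is a tool for analyzing data that carry uncertainty.
• It is used to obtain the lower and upper approximations of a crisp set.
Some basic terms
Case   Height   Weight   Gender
1      170      60       Male
2      165      55       Female
3      155      45       Female
4      150      65       Male
S = {U, A}
• U = {1, 2, 3, 4}
• A = {Height, Weight, Gender}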
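Lower and Upper Approximations
• Given an information system S = {U, A}, let X and P be subsets of U and A respectively. The lower and upper approximations of X with respect to P are defined as follows, where $[e]_P$ is the equivalence class of e under P:
$\underline{P}X = \{\, e \in U \mid [e]_P \subseteq X \,\}$
$\overline{P}X = \{\, e \in U \mid [e]_P \cap X \neq \emptyset \,\}$
[Figure: lower approximation, set X, upper approximation]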
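Example
Case   Height   Weight   Age
1      170      75       18
2      165      50       30
3      165      60       18
4      145      75       18
5      145      50       30
6      170      45       45
7      145      50       45
8      170      45       30
X = {1, 2, 6, 8}
P = {Weight, Age}
Equivalence classes: {1,4}, {2,5}, {3}, {6}, {7}, {8}
$\underline{P}X = \{\, e \in U \mid [e]_P \subseteq X \,\} = \{6, 8\}$
$\overline{P}X = \{\, e \in U \mid [e]_P \cap X \neq \emptyset \,\} = \{1, 2, 4, 5, 6, 8\}$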
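The example above can be checked with a short sketch that groups cases into equivalence classes on P and then collects the classes contained in X (lower) and the classes that intersect X (upper). The function names are hypothetical; the data are the eight cases from the example.

```python
from collections import defaultdict

# The eight cases from the example.
table = {
    1: {"Height": 170, "Weight": 75, "Age": 18},
    2: {"Height": 165, "Weight": 50, "Age": 30},
    3: {"Height": 165, "Weight": 60, "Age": 18},
    4: {"Height": 145, "Weight": 75, "Age": 18},
    5: {"Height": 145, "Weight": 50, "Age": 30},
    6: {"Height": 170, "Weight": 45, "Age": 45},
    7: {"Height": 145, "Weight": 50, "Age": 45},
    8: {"Height": 170, "Weight": 45, "Age": 30},
}


def equivalence_classes(table, attrs):
    """Group cases that share the same values on every attribute in attrs."""
    groups = defaultdict(set)
    for case, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(case)
    return list(groups.values())


def approximations(table, attrs, X):
    """Return (lower, upper) approximations of X with respect to attrs."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= X:   # class lies entirely inside X
            lower |= cls
        if cls & X:    # class shares at least one case with X
            upper |= cls
    return lower, upper


classes = equivalence_classes(table, ["Weight", "Age"])
lower, upper = approximations(table, ["Weight", "Age"], {1, 2, 6, 8})
print(sorted(lower), sorted(upper))  # [6, 8] [1, 2, 4, 5, 6, 8]
```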
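ROUGH SET STRATEGIES FOR DATA WITH MISSING VALUES
The original contingency table
• When the information system is complete, the four values a, b, c, and d are exact.
Predc.        Symptom   Other symptoms
Drug          a         b
Other drugs   c         d
Contingency table with approximation ranges
The specific attribute   Symptom   Other symptoms
Drug
Other drugs
• Rough set theory is used here to obtain the lower and upper approximations of that crisp set.
Two interpretations of missing values
• Lost (?):
– A value that should have been present but was lost or deleted.
– It should not be ignored.
• Don’t care (*):
– The missing attribute value is dispensable; it may or may not be present.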
Characteristic relation & Characteristic set
• Lost (?)
– Similarity characteristic relation:
$(x, y) \in R_S(P)$ if and only if $a(x) = a(y)$ for all $a \in P$ such that $a(x) \neq\ ?$
– Similarity characteristic set:
$K_S(P, x) = \{\, y \mid (x, y) \in R_S(P) \,\}$
• Don’t care (*):
– Tolerance characteristic relation:
$(x, y) \in R_T(P)$ if and only if $a(x) = a(y)$ or $a(x) = *$ or $a(y) = *$, for all $a \in P$
– Tolerance characteristic set:
$K_T(P, x) = \{\, y \mid (x, y) \in R_T(P) \,\}$
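A minimal sketch of both characteristic sets follows, using the eight-report table from the worked example later in these slides. Two assumptions are made for illustration: P is taken to be all four attributes (the slides do not state P explicitly), and for the tolerance case the same "?" marks are simply reread as don't-care values. Multi-valued entries such as "d2,d3" are compared as whole strings. The function names are hypothetical.

```python
MISSING = "?"  # marker used for a missing value in the table

# Eight-report example from the slides (ISR -> attribute values).
reports = {
    1: {"Age": "?", "Gender": "?", "Drug": "d1", "PT": "s1"},
    2: {"Age": "a2", "Gender": "?", "Drug": "d2,d3", "PT": "s1,s2"},
    3: {"Age": "a1", "Gender": "g1", "Drug": "d1", "PT": "s1"},
    4: {"Age": "a1", "Gender": "g1", "Drug": "d2,d3", "PT": "s1,s2"},
    5: {"Age": "?", "Gender": "?", "Drug": "d2,d3", "PT": "s1,s2"},
    6: {"Age": "?", "Gender": "g2", "Drug": "d1", "PT": "s1"},
    7: {"Age": "?", "Gender": "g1", "Drug": "d1", "PT": "s1"},
    8: {"Age": "a1", "Gender": "g1", "Drug": "d3", "PT": "s1,s2"},
}
P = ["Age", "Gender", "Drug", "PT"]  # assumption: the slides do not list P


def similarity_set(x, table, attrs):
    """K_S(P, x): y must equal x on every attribute where x is not missing."""
    return {
        y for y, row in table.items()
        if all(row[a] == table[x][a] for a in attrs if table[x][a] != MISSING)
    }


def tolerance_set(x, table, attrs):
    """K_T(P, x): a missing value on either side matches any value."""
    return {
        y for y, row in table.items()
        if all(
            row[a] == table[x][a] or MISSING in (row[a], table[x][a])
            for a in attrs
        )
    }


K_S = {x: similarity_set(x, reports, P) for x in reports}
K_T = {x: tolerance_set(x, reports, P) for x in reports}
print(sorted(K_S[3]), sorted(K_T[3]))  # [3] [1, 3, 7]
```

The similarity relation is not symmetric (report 3 is in K_S(P, 1), but report 1 is not in K_S(P, 3)), whereas the tolerance relation is.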
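Lower and Upper approximations
• Singleton approximation
$\underline{P}X = \{\, x \in U \mid K(P, x) \subseteq X \,\}$
$\overline{P}X = \{\, x \in U \mid K(P, x) \cap X \neq \emptyset \,\}$
• Subset approximation
$\underline{P}X = \bigcup \{\, K(P, x) \mid x \in U,\ K(P, x) \subseteq X \,\}$
$\overline{P}X = \bigcup \{\, K(P, x) \mid x \in U,\ K(P, x) \cap X \neq \emptyset \,\}$
• Concept approximation
$\underline{P}X = \bigcup \{\, K(P, x) \mid x \in X,\ K(P, x) \subseteq X \,\}$
$\overline{P}X = \bigcup \{\, K(P, x) \mid x \in X,\ K(P, x) \cap X \neq \emptyset \,\}$
[Figure: lower approximation, set X, upper approximation]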
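The three approximation types differ only in which cases x are scanned (all of U, or only X) and in whether the result collects the cases themselves or their characteristic sets. The sketch below illustrates this with the similarity characteristic sets listed in the slides' eight-report example and the set X_c = {3, 7, 8} from that example; the function names are hypothetical.

```python
# Similarity characteristic sets K_S(P, x) as listed in the slides' example.
K = {
    1: {1, 3, 6, 7}, 2: {2}, 3: {3}, 4: {4},
    5: {2, 4, 5}, 6: {6}, 7: {3, 7}, 8: {8},
}
U = set(K)


def singleton(K, U, X):
    """Collect the cases x in U whose characteristic set fits or touches X."""
    lower = {x for x in U if K[x] <= X}
    upper = {x for x in U if K[x] & X}
    return lower, upper


def subset(K, U, X):
    """Union of characteristic sets K[x], scanning every x in U."""
    lower = set().union(*(K[x] for x in U if K[x] <= X))
    upper = set().union(*(K[x] for x in U if K[x] & X))
    return lower, upper


def concept(K, X):
    """Union of characteristic sets K[x], scanning only x in X."""
    lower = set().union(*(K[x] for x in X if K[x] <= X))
    upper = set().union(*(K[x] for x in X if K[x] & X))
    return lower, upper


X_c = {3, 7, 8}
sing = singleton(K, U, X_c)
subs = subset(K, U, X_c)
conc = concept(K, X_c)
print(sorted(sing[0]), sorted(sing[1]))  # [3, 7, 8] [1, 3, 7, 8]
print(sorted(subs[0]), sorted(subs[1]))  # [3, 7, 8] [1, 3, 6, 7, 8]
print(sorted(conc[0]), sorted(conc[1]))  # [3, 7, 8] [3, 7, 8]
```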
Incomplete SRS data
Attribute set P
Strength computation
• global
• local
Characteristic set K(P, x)
• tolerance (don’t care)
• similarity (lost)
Approximation PX
• singleton
• subset
• concept
Known rule: Predc, drug → reaction
• Analyze the feasibility of the 12 different methods (2 strength computations × 2 characteristic sets × 3 approximations)
Rough Set: Basic Idea
Example: singleton approximation, global
ISR   Age   Gender   Drug    PT
1     ?     ?        d1      s1
2     a2    ?        d2,d3   s1,s2
3     a1    g1       d1      s1
4     a1    g1       d2,d3   s1,s2
5     ?     ?        d2,d3   s1,s2
6     ?     g2       d1      s1
7     ?     g1       d1      s1
8     a1    g1       d3      s1,s2
Similarity characteristic relation:
$(x, y) \in R_S(P)$ if and only if $a(x) = a(y)$ for all $a \in P$ such that $a(x) \neq\ ?$
$K_S(P, 1) = \{1, 3, 6, 7\}$, $K_S(P, 5) = \{2, 4, 5\}$
$K_S(P, 2) = \{2\}$, $K_S(P, 6) = \{6\}$
$K_S(P, 3) = \{3\}$, $K_S(P, 7) = \{3, 7\}$
$K_S(P, 4) = \{4\}$, $K_S(P, 8) = \{8\}$
Gender = g1   PT = s1             other PT
Drug = d2     X_a = {4}           X_b = {}
other drugs   X_c = {3, 7, 8}     X_d = {}
$\underline{P}X = \{\, x \in U \mid K(P, x) \subseteq X \,\}$
$\overline{P}X = \{\, x \in U \mid K(P, x) \cap X \neq \emptyset \,\}$
$\underline{P}X_a = \{4\}$, $\overline{P}X_a = \{4\}$
$\underline{P}X_b = \emptyset$, $\overline{P}X_b = \emptyset$
$\underline{P}X_c = \{3, 7, 8\}$, $\overline{P}X_c = \{1, 3, 7, 8\}$
$\underline{P}X_d = \emptyset$, $\overline{P}X_d = \emptyset$
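Gender = g1   PT = s1   other reactions
Drug = d2     [1, 1]    0
other drugs   [3, 4]    0
$\mathrm{PRR}_{upper} = \dfrac{\overline{a}\,(\overline{c} + \overline{d})}{\underline{c}\,(\underline{a} + \underline{b})} = \dfrac{1 \times (4 + 0)}{3 \times (1 + 0)} = 1.333$
$\mathrm{PRR}_{lower} = \dfrac{\underline{a}\,(\underline{c} + \underline{d})}{\overline{c}\,(\overline{a} + \overline{b})} = \dfrac{1 \times (3 + 0)}{4 \times (1 + 0)} = 0.75$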
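The interval arithmetic in this example can be sketched as follows. Each cell is given as a (lower, upper) pair of counts; the upper PRR puts upper counts in the numerator and lower counts in the denominator, and the lower PRR does the reverse, which is how the numbers on the slide are read here. The function name is hypothetical, and the sketch assumes non-zero denominators.

```python
def prr_interval(a, b, c, d):
    """Return (PRR_lower, PRR_upper); each argument is a (lower, upper) count pair."""
    (a_lo, a_hi), (b_lo, b_hi), (c_lo, c_hi), (d_lo, d_hi) = a, b, c, d
    # Largest PRR: upper counts in the numerator, lower counts in the denominator.
    prr_upper = a_hi * (c_hi + d_hi) / (c_lo * (a_lo + b_lo))
    # Smallest PRR: lower counts in the numerator, upper counts in the denominator.
    prr_lower = a_lo * (c_lo + d_lo) / (c_hi * (a_hi + b_hi))
    return prr_lower, prr_upper


# Cell ranges from the slide: a = [1, 1], b = 0, c = [3, 4], d = 0.
prr_lower, prr_upper = prr_interval((1, 1), (0, 0), (3, 4), (0, 0))
print(prr_lower, round(prr_upper, 3))  # 0.75 1.333
```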
Experiment
No. Rule   Drug Name   Symptom                                      Suitable group (Age or Gender)   Marketed year in US   Year withdrawn in US
R1-1       AVANDIA     MYOCARDIAL INFARCTION                        18~                              1990                  2010
R1-2       AVANDIA     DEATH                                        18~                              1990                  2010
R1-3       AVANDIA     CEREBROVASCULAR ACCIDENT                     18~                              1990                  2010
R2         TYSABRI     PROGRESSIVE MULTIFOCAL LEUKOENCEPHALOPATHY   18~                              2004                  2005
R3         ZELNORM     CEREBROVASCULAR ACCIDENT                     Female                           2002                  2007
Experimental results
[Figure: Method 1 M(s, g, g) for R1-2. Series: PRR_ld, PRR_lower, PRR_pd, PRR_upper, Threshold=2, A_ld, A_rs, A_pd. Axes: PRR and A Value, by quarter from 04Q1 to 13Q3.]
[Figure: Method 1 M(s, g, g) for R1-2. Series: ROR_ld, ROR_lower, ROR_pd, ROR_upper, Threshold=2, A_ld. Axes: ROR and A Value, by quarter from 04Q1 to 13Q3.]
Q & A