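How to Write Up a Diagnostic Accuracy Analysis: Lessons from STARD2015

Yoshitake Takebayashi (竹林由武), Project Assistant Professor
Risk Analysis Research Center, The Institute of Statistical Mathematics

2016/07/09, 26th REQUIRE Meeting, "Diagnostic accuracy analysis: STARD2015", Tokyo Medical and Dental University, Yushima Campus, 14:30-17:45

ytake2 [at] ism.ac.jp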

Outline

1. Overview of diagnostic accuracy analysis (5 min)
2. Risk of bias in diagnostic accuracy analysis (10 min)
3. How to write up a diagnostic accuracy analysis (25 min)

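Features of diagnostic accuracy studies
• Definition
• Accuracy measures
• Research question (PIRATE)
• Study design

Features of diagnostic accuracy studies: definition

A diagnostic accuracy study takes the most accurate diagnostic method currently available as the reference standard (gold standard) and evaluates the diagnostic accuracy of the index test of interest against it.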

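Diagnostic accuracy: measures (binary test)

                 Reference standard (true status)
                 Disease present (+)    Disease absent (-)      Total
Index test +     True positive (TP)     False positive (FP)     Test positives (TP+FP)
Index test -     False negative (FN)    True negative (TN)      Test negatives (FN+TN)
Total            Diseased (TP+FN)       Non-diseased (FP+TN)    Everyone (TP+FP+FN+TN)

• Sensitivity: the proportion of index-test positives among reference-standard positives, TP / (TP+FN)
• Specificity: the proportion of index-test negatives among reference-standard negatives, TN / (FP+TN)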
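As a small illustration of the two definitions above, the following R sketch computes sensitivity and specificity from the four cell counts. The counts are illustrative; they are the same ones used in the epiR example in the analysis-code part of this deck, and the variable names are my own.

```r
# 2x2 counts: index test (rows) against the reference standard (columns).
# Illustrative values, taken from the epiR example table in this deck.
tp <- 670; fp <- 202
fn <- 74;  tn <- 640

sensitivity <- tp / (tp + fn)  # index-test positives among reference positives
specificity <- tn / (fp + tn)  # index-test negatives among reference negatives

print(round(c(sensitivity = sensitivity, specificity = specificity), 3))
```

Both measures condition on the reference-standard column, which is why they can be estimated even when diseased and non-diseased participants are sampled separately.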

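• Positive likelihood ratio (LR+): sensitivity / (1 - specificity)
  Numerator: index-test positives among reference-standard positives (sensitivity)
  Denominator: index-test positives among reference-standard negatives (1 - specificity)
  How many times more likely is a positive result in people who truly have the disease than in people who truly do not?

• Negative likelihood ratio (LR-): (1 - sensitivity) / specificity
  Numerator: index-test negatives among reference-standard positives (1 - sensitivity)
  Denominator: index-test negatives among reference-standard negatives (specificity)
  How many times more likely is a negative result in people who truly have the disease than in people who truly do not?

• Diagnostic odds ratio (DOR): LR+ / LR-
  The larger the value, the higher the diagnostic accuracy.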
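The likelihood ratios and the diagnostic odds ratio can be sketched in a few lines of R. The counts are again the illustrative ones from the epiR example in this deck; the check at the end uses the standard identity DOR = (TP × TN) / (FP × FN).

```r
# Illustrative 2x2 counts (same table as the epiR example in this deck)
tp <- 670; fp <- 202; fn <- 74; tn <- 640

sens <- tp / (tp + fn)
spec <- tn / (fp + tn)

lr_pos <- sens / (1 - spec)      # positive likelihood ratio
lr_neg <- (1 - sens) / spec      # negative likelihood ratio
dor    <- lr_pos / lr_neg        # diagnostic odds ratio

# The DOR equals the cross-product ratio of the 2x2 table
dor_from_counts <- (tp * tn) / (fp * fn)

print(round(c(LR_pos = lr_pos, LR_neg = lr_neg, DOR = dor), 2))
```

A useful test has LR+ above 1 and LR- below 1, so the DOR is above 1.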

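Diagnostic accuracy: measures (binary test)

• Positive predictive value (PPV): the proportion of reference-standard positives among index-test positives, TP / (TP+FP)
• Negative predictive value (NPV): the proportion of reference-standard negatives among index-test negatives, TN / (FN+TN)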
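For comparison with sensitivity and specificity, here is a minimal R sketch of the predictive values, using the same illustrative counts as the epiR example in this deck. Unlike sensitivity and specificity, these condition on the index-test row.

```r
# Illustrative 2x2 counts (same table as the epiR example in this deck)
tp <- 670; fp <- 202; fn <- 74; tn <- 640

ppv <- tp / (tp + fp)  # reference positives among index-test positives
npv <- tn / (fn + tn)  # reference negatives among index-test negatives

# Proportion diseased in this sample; PPV and NPV are tied to this value
sample_prevalence <- (tp + fn) / (tp + fp + fn + tn)

print(round(c(PPV = ppv, NPV = npv, prevalence = sample_prevalence), 3))
```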

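Diagnostic accuracy: measures (continuous test)
• ROC curve
  – Area under the curve (AUC): overall diagnostic accuracy
  – Optimal cut-off

[Figure: frequency distributions of index-test scores for D+ and D-, with three candidate cut-offs (①, ②, ③). Each cut-off splits the scores into TP, FP, FN and TN, giving a 2×2 table of index test (+/-) by D+/D-. Axes: index-test score (x), frequency (y).]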
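To make the link between cut-offs, the ROC curve and the AUC concrete, the following base-R sketch uses simulated scores (the data, the seed and the use of the Youden index as the "optimal" criterion are all assumptions made for illustration). Each candidate cut-off yields one (sensitivity, specificity) pair; the AUC is computed as the probability that a diseased score exceeds a non-diseased one.

```r
set.seed(1)
score_pos <- rnorm(50, mean = 1)  # simulated index-test scores, disease present
score_neg <- rnorm(50, mean = 0)  # simulated index-test scores, disease absent

# Every observed score is a candidate cut-off ("positive" if score >= cut-off)
cutoffs <- sort(unique(c(score_pos, score_neg)))
sens <- sapply(cutoffs, function(k) mean(score_pos >= k))
spec <- sapply(cutoffs, function(k) mean(score_neg < k))

# AUC: proportion of (diseased, non-diseased) pairs ranked correctly; ties count 1/2
auc <- mean(outer(score_pos, score_neg, ">")) +
  0.5 * mean(outer(score_pos, score_neg, "=="))

# One common choice of "optimal" cut-off: maximise Youden's J = sens + spec - 1
best_cutoff <- cutoffs[which.max(sens + spec - 1)]

plot(1 - spec, sens, type = "s", xlab = "1 - specificity", ylab = "Sensitivity")
abline(0, 1, lty = 2)
print(c(AUC = auc, best_cutoff = best_cutoff))
```

Other criteria for the cut-off (for example a clinically required minimum sensitivity) are equally legitimate; what matters for reporting is whether the rule was fixed in advance.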

Diagnostic accuracy analysis: formulating the research question

http://doctorvoodoocartoons.com/diagnostic-challenge/

P: Population
I: Index test
R: Reference test
A: Accuracy methods (the measures used to assess accuracy)
T: Test cut-off point (how categories are chosen)
E: Expected use (intended use of the test)

PICO can also be used to formulate the question, but PIRATE fits diagnostic accuracy studies better.

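Diagnostic accuracy analysis: formulating the research question (PIRATE)

P: What is the population of interest? What condition are the patients in?
I: What is the test of interest?
R: What reference standard is used to evaluate the index test? What is currently the best available test?
A: Which measures of diagnostic accuracy are used (sensitivity, specificity, likelihood ratios)?
T: How does the test classify participants? How is the cut-off determined?
E: What is the intended use of the index test?

"Chapter 4: Planning a systematic review of diagnostic test accuracy evidence", Synthesizing Evidence of Diagnostic Accuracy, Lippincott Williams & Wilkins, 2011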

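Diagnostic accuracy analysis: formulating the research question (worked example)

P: Women with suspected pregnancy
I: Double decidual sac sign (DDSS) at days 32-34
R: Transvaginal ultrasound scan (TVS) at 7 weeks' gestation
A: Sensitivity, specificity, positive and negative likelihood ratios, positive and negative predictive values
T: Index test: whether or not the DDSS is visible on ultrasound. Reference standard: expert judgement based on the ultrasound scan.
E: Triage. Intrauterine pregnancy can be diagnosed accurately before the confirmatory test, so women with ectopic pregnancy can be ruled out efficiently.

Richardson, A., Hopkisson, J., Campbell, B., & Raine-Fenning, N. (2016). Use of the double decidual sac sign to confirm intra-uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy study. Ultrasound in Obstetrics & Gynecology.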

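Features of diagnostic accuracy studies: study design
– Basically cross-sectional. Depending on how participants are recruited, designs divide into single-gate and two-gate types.

[Figure: two 2×2 tables of index test result (Positive/Negative) by disease status (+D/-D) with cells TP, FP, FN, TN. Cross-sectional study (single gate): one sample. Case-control study (two gate): +D and -D drawn as separate samples.]

Kohn, M. A., Carpenter, C. R., & Newman, T. B. (2013). Understanding the direction of bias in studies of diagnostic test accuracy. Academic Emergency Medicine, 20(11), 1194-1206.

In a case-control (two-gate) study, the positive and negative predictive values (PPV, NPV) and the prevalence (apparent or true) cannot be estimated correctly, so results from studies that report them must be interpreted with caution.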
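The reason predictive values break down under two-gate sampling can be sketched with Bayes' theorem: for fixed sensitivity and specificity, PPV and NPV depend on the proportion diseased, and in a case-control sample that proportion is set by the investigator. All numbers below (accuracy, a true prevalence of 10%, a 1:1 case-control ratio) are assumptions chosen for illustration.

```r
# Predictive values as functions of sensitivity, specificity and prevalence
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

sens <- 0.90; spec <- 0.76   # assumed accuracy of the index test
true_prev   <- 0.10          # assumed prevalence in the target population
sample_prev <- 0.50          # proportion diseased in a 1:1 case-control sample

round(c(PPV_population   = ppv(sens, spec, true_prev),
        PPV_case_control = ppv(sens, spec, sample_prev),
        NPV_population   = npv(sens, spec, true_prev),
        NPV_case_control = npv(sens, spec, sample_prev)), 3)
```

Under these assumptions the case-control sample makes the PPV look far better, and the NPV worse, than they would be in the population where the test is meant to be used.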





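Study-planning steps and the biases to watch for

Step: biases to watch for
1. Set the study objective
2. Identify the target population
3. Choose the sampling plan: selection / spectrum bias
4. Choose the reference standard: imperfect gold standard bias, incorporation bias, treatment paradox, disease progression bias, work-up bias, differential verification bias, verification bias
5. Choose the accuracy measure: location bias
6. Identify the target population of readers
7. Choose the sampling plan for readers
8. Plan the data collection: diagnostic review bias, test review bias, reading order bias, context bias
9. Plan the data analysis
10. Determine the sample size

How the reference standard is chosen and how it is measured are critically important.

Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic medicine. Wiley.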

Main biases in diagnostic accuracy studies
• Incorporation bias
• Verification bias 1 (partial verification bias)
• Verification bias 2 (differential verification bias)
• Misclassification bias (imperfect gold standard bias)
• Spectrum bias 1 (disease and non-disease)
• Spectrum bias 2 (ambiguous test results)

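Incorporation bias

A bias that arises when the index test (or some of its items) is part of the reference standard.

• If the reference standard consists of a set of several tests: is the index test excluded from that set?
• If the reference standard is an expert's assessment of the condition: was the expert blinded to the index test result?

→ Are the reference standard and the index test independent?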

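Verification bias 1: partial verification bias

People who test positive on the index test are more likely to receive the reference standard, and only those who received the reference standard enter the study, so true negatives and false negatives go missing.

True cross-tabulation (everyone)
                 Positive (+D)        Negative (-D)
Index test +     TP 23                FP 87
Index test -     FN 14 + FN' 13       TN 55 + TN' 182

Sensitivity = TP / (TP+FN+FN') = 23 / 50 = 46%
Specificity = (TN+TN') / (FP+TN+TN') = 237 / 324 = 73%

Biased cross-tabulation (FN' and TN' did not receive the reference standard and are excluded)
                 Positive (+D)        Negative (-D)
Index test +     TP 23                FP 87
Index test -     FN 14                TN 55

Sensitivity = TP / (TP+FN) = 23 / 37 = 62%
Specificity = TN / (FP+TN) = 55 / 142 = 39%

Direction of bias: sensitivity up, specificity down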
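The arithmetic of this example can be reproduced with a short R sketch. The counts are those shown on the slide; the variable names are my own.

```r
# Counts from the slide; "unverified" cells never received the reference standard
tp <- 23; fp <- 87
fn_verified <- 14; fn_unverified <- 13
tn_verified <- 55; tn_unverified <- 182

# What we would see if everyone had been verified
true_sens <- tp / (tp + fn_verified + fn_unverified)
true_spec <- (tn_verified + tn_unverified) /
  (fp + tn_verified + tn_unverified)

# What the study sees after unverified index-test negatives drop out
biased_sens <- tp / (tp + fn_verified)
biased_spec <- tn_verified / (fp + tn_verified)

round(100 * c(true_sens = true_sens, biased_sens = biased_sens,
              true_spec = true_spec, biased_spec = biased_spec))
```

Because the dropped participants are all index-test negatives, both the FN and TN cells shrink, which pushes sensitivity up and specificity down.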


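Appraisal check: when participants are enrolled on the basis of a single reference standard result, is the decision to perform the reference standard independent of the index test result?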

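Verification bias 2: differential verification bias

When there is a long interval between the index test and the reference standard, some people with a false-negative index test recover in the meantime, so the number of false negatives falls and the number of true negatives rises.

True cross-tabulation
                 Positive (+D)        Negative (-D)
Index test +     TP 311               FP 336
Index test -     FN' 5 + 6 = 11       TN' 300

Sensitivity = TP / (TP+FN') = 311 / 322 = 96.6%
Specificity = TN' / (FP+TN') = 300 / 636 = 47.2%

Biased cross-tabulation (6 of the 11 false negatives are counted as true negatives)
                 Positive (+D)        Negative (-D)
Index test +     TP 311               FP 336
Index test -     FN 5                 TN 300 + 6 = 306

Sensitivity = TP / (TP+FN) = 311 / 316 = 98.4%
Specificity = TN / (FP+TN) = 306 / 642 = 47.7%

Direction of bias: sensitivity up, specificity up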


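Appraisal check: are most index-test positives given the reference standard immediately, while index-test negatives are followed up instead? If so, did follow-up give the same result as the immediately performed reference standard?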

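Misclassification bias: imperfect gold standard bias

A bias that arises because the classification given by the reference standard is inaccurate.

True cross-tabulation (Cx(+)/Cx(-): result of the imperfect reference standard)
                 Positive (+D)                        Negative (-D)
Index test +     TP, Cx(+): 5;  TP', Cx(-): 5         FP 0
Index test -     FN, Cx(+): 3;  FN', Cx(-): 14        TN 185

Sensitivity = (TP+TP') / (TP+TP'+FN+FN') = 10 / 27 = 37%
Specificity = TN / (FP+TN) = 185 / 185 = 100%

Biased cross-tabulation (Cx taken as the reference: TP' is counted as a false positive, FN' as a true negative)
                 Cx(+)          Cx(-)
Index test +     5              5
Index test -     3              14 + 185 = 199

Sensitivity = 5 / 8 = 62.5%
Specificity = 199 / 204 = 97.5%

Direction of bias (in this example): sensitivity up, specificity down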
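The effect of an imperfect reference standard in this example can be checked with the R sketch below. It uses the counts on the slide and assumes that "Cx" denotes the result of the imperfect reference standard, so that truly diseased people with Cx(-) are misclassified as non-diseased.

```r
# Counts from the slide, split by true disease status and Cx result
tp_cxpos <- 5;  tp_cxneg <- 5    # index test +, truly diseased
fn_cxpos <- 3;  fn_cxneg <- 14   # index test -, truly diseased
fp <- 0; tn <- 185               # truly non-diseased (assumed all Cx(-))

# Accuracy against the truth
true_sens <- (tp_cxpos + tp_cxneg) /
  (tp_cxpos + tp_cxneg + fn_cxpos + fn_cxneg)
true_spec <- tn / (fp + tn)

# Accuracy when Cx is treated as if it were the truth
apparent_sens <- tp_cxpos / (tp_cxpos + fn_cxpos)
apparent_spec <- (fn_cxneg + tn) / (tp_cxneg + fp + fn_cxneg + tn)

round(100 * c(true_sens = true_sens, apparent_sens = apparent_sens,
              true_spec = true_spec, apparent_spec = apparent_spec), 1)
```

The direction of this bias is specific to the example; with a different pattern of reference-standard errors it can go the other way.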


Appraisal check: does the reference standard always classify the target condition correctly?

Spectrum bias: disease and non-disease

Non-diseased people with mild symptoms are excluded (only extremely healthy people are taken as Negative (-D)).

True cross-tabulation
                 Positive (+D)        Negative (-D)
Index test +     TP 670               FP 142 + FP' 60
Index test -     FN 74                TN 628 + TN' 12

Sensitivity = TP / (TP+FN) = 670 / 744 = 90%
Specificity = (TN'+TN) / (FP'+FP+TN'+TN) = 640 / 842 = 76%

Biased cross-tabulation (FP' and TN' are excluded)
                 Positive (+D)        Negative (-D)
Index test +     TP 670               FP 142
Index test -     FN 74                TN 628

Sensitivity = TP / (TP+FN) = 670 / 744 = 90%
Specificity = TN / (FP+TN) = 628 / 770 = 82%

Direction of bias (in this example): specificity up (sensitivity unchanged)


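Appraisal check: are D+ and D- sampled separately? Is the spectrum of D+ appropriate (are moderate cases included in D+)? Is the spectrum of D- appropriate (is sampling broad enough to include people who might be suspected of being D+)?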

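Spectrum bias: removing ambiguous test results

Ambiguous index test results are excluded.

True cross-tabulation
Index test result              Positive (+D)     Negative (-D)
High probability               TP 89             FP 13
Intermediate probability       TP' 47            FP' 105
Low and very low probability   FN' 30            TN' 474
Normal                         FN 2              TN 150

Sensitivity = (TP+TP') / (TP+TP'+FN+FN') = 136 / 168 = 81%
Specificity = (TN'+TN) / (FP'+FP+TN'+TN) = 624 / 742 = 84%

Biased cross-tabulation (intermediate and low/very low probability results are excluded)
Index test result              Positive (+D)     Negative (-D)
High probability               TP 89             FP 13
Normal                         FN 2              TN 150

Sensitivity = TP / (TP+FN) = 89 / 91 = 98%
Specificity = TN / (FP+TN) = 150 / 163 = 92%

Direction of bias: sensitivity up, specificity up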
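The following R sketch reproduces the arithmetic of this example from the slide's counts, treating high and intermediate probability as "test positive" and low/very low and normal as "test negative" in the full table (that grouping is read off from the TP'/FP'/FN'/TN' labels on the slide).

```r
# Counts from the slide: rows are index-test categories, columns disease status
counts <- matrix(c(89,  13,    # high probability
                   47, 105,    # intermediate probability
                   30, 474,    # low and very low probability
                    2, 150),   # normal
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("high", "intermediate", "low", "normal"),
                                 c("D+", "D-")))

# All participants: high + intermediate = positive, low + normal = negative
true_sens <- sum(counts[c("high", "intermediate"), "D+"]) / sum(counts[, "D+"])
true_spec <- sum(counts[c("low", "normal"), "D-"]) / sum(counts[, "D-"])

# Ambiguous categories dropped: only "high" versus "normal" remain
kept <- counts[c("high", "normal"), ]
biased_sens <- kept["high", "D+"] / sum(kept[, "D+"])
biased_spec <- kept["normal", "D-"] / sum(kept[, "D-"])

round(100 * c(true_sens = true_sens, biased_sens = biased_sens,
              true_spec = true_spec, biased_spec = biased_spec))
```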


Appraisal check: are people with ambiguous index test results (moderate or mild) included in the study?

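QUADAS2
• Items for assessing the risk of bias in diagnostic accuracy studies

A. Participant selection (STARD2015 items 5, 6, 7, 9)
B. Index test (STARD2015 items 12a, 13a)
C. Reference standard (STARD2015 items 12b, 13b)
D. Flow and timing (STARD2015 items 8, 22)

A. Participant selection (STARD2015 items 5, 6, 7, 9)
• Were participants sampled consecutively or at random?
• Was a case-control design avoided?
• Were inappropriate exclusions avoided?

B. Index test (STARD2015 items 12a, 13a)
• Were the index test results interpreted blind to the reference standard results?
• If a threshold was used, was it defined in advance?

C. Reference standard (STARD2015 items 12b, 13b)
• Were the reference standard results interpreted blind to the index test results?
• Can the reference standard be assumed to classify the target condition correctly?

D. Flow and timing (STARD2015 items 8, 22)
• Was the interval between the index test and the reference standard appropriate?
• Did all participants receive the reference standard?
• Were all participants classified by the same reference standard?
• Were all participants included in the analysis?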

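Risk of bias: summary
• In diagnostic accuracy studies, study design is everything.
• Understand the causes and directions of bias, and plan the study in advance with countermeasures in mind.
• The interest lies in the accuracy of the new test, but raising study quality requires attention to the quality of the reference standard, the selection of a broad range of participants, and the timing of the measurements.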





How to write up a diagnostic accuracy analysis
• Survey of reporting adequacy
  Korevaar, D. A., van Enst, W. A., Spijker, R., Bossuyt, P. M., & Hooft, L. (2014). Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Evidence Based Medicine, 19(2), 47-54.
• Reporting guideline: STARD2015
  Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L., ... & Kressel, H. Y. (2015). STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Radiology, 277(3), 826-832.

Guidelines for diagnostic accuracy studies

It is important to identify the points at which bias can enter and to explain and describe each of those points.

STARD2003 and QUADAS2
• Explanations of STARD2003 and QUADAS2 (slides)

http://www.slideshare.net/shinjiyamagata/v14-17600186

http://www.slideshare.net/YoshihikoKunisato/ss-40713224?qid=45212543-1b92-4b13-9958-391e479aee76&v=&b=&from_search=3

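STARD2003 → 2015
• The reporting standard STARD was revised in 2015.

Korevaar et al. Research Integrity and Peer Review (2016) 1:7, DOI 10.1186/s41073-016-0014-7

Main revisions and additions
Title: structured abstract based on STARD for Abstracts
Introduction: intended use of the index test; study hypotheses
Methods: whether positivity cut-offs or categories were pre-specified or exploratory; pre-specification of variability analyses; sample size determination
Results: flow diagram for the recruitment of the analysed participants
Discussion: discussion of potential biases; clinical implications
Other: study registration, availability of the protocol, sources of funding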

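Title and abstract (STARD items 1, 2)
• Make clear that the report is a study of diagnostic accuracy by using at least one measure of accuracy (sensitivity, specificity, predictive values, or AUC).
• Write a structured abstract following STARD for Abstracts.

Note: STARD for Abstracts has not been published yet, but a draft can be seen in the following document.

Bossuyt P. Draft STARD for abstracts (personal communication). 2016.

https://www.ruor.uottawa.ca/bitstream/10393/34253/1/Protocol%20for%20Registration%201-3-2016.pdf

STARD for Abstracts (provisional)

Structure of the structured abstract: title; background and objectives; study design; participants; test methods; flow diagram; test results; discussion

Bossuyt P. Draft STARD for abstracts (personal communication). 2016.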

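Introduction (STARD item 3)
• State the study objectives and hypotheses clearly, specifying the intended use (clinical usefulness) of the index test.

[Figure: diagnostic pathways showing where a new test sits relative to an existing test after a screening test, for four cases: current practice, replacement, triage, add-on.]

Also state the advantages of the index test (faster? cheaper? more accurate?).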

Introduction (STARD item 3)
• Example: intended use (triage), first passage below
• Example: hypothesis, second passage below

A gestation sac is the first ultrasonographic sign of an intrauterine pregnancy (IUP). It appears as a uniformly round, hypoechoic structure with an echogenic rim. Initially it does not contain any internal echoes and can therefore be difficult to differentiate from a ‘pseudosac’, that is, an endometrial fluid collection that occurs in up to 15% of ectopic pregnancies (EPs) (1). It is clinically important not to confuse these two structures and hence several different ultrasonographic signs have been proposed to help differentiate between them prior to visualisation of any embryonic contents. The double decidual sac sign (DDSS) is one such sign.

our hypothesis being that all intrauterine fluid collections that exhibit the DDSS represent a true gestation sac.

Richardson, A., Hopkisson, J., Campbell, B., & Raine-Fenning, N. (2016). Use of the double decidual sac sign to confirm intra-uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy study. Ultrasound in Obstetrics & Gynecology.

STARD2015: methods and results sections
• Participants (QUADAS2: A, D)
• Test methods (QUADAS2: B, C, D)
• Analysis
• Other

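Participants
• Methods section
  – Details of the eligibility criteria
  – Recruitment flow (setting, location, dates)
  – Composition of the participants
• Results section
  – Participant flow diagram
  – Baseline demographics and clinical characteristics
  – Interval between the index test and the reference standard; clinical interventions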

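Participants
• Use a diagram to show the flow through which the analysed participants were selected.
  – Potentially eligible participants
  – Eligible participants
  – Number who received the index test
  – Number who received the reference standard
  – Final analysis population

Report the number excluded at each phase and the reasons for exclusion, so that the flow is clear.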

Participants
• Results section
  – Participant flow diagram: example

[Figure: participant flow diagram from Richardson et al. (2016); the accompanying text from the paper follows.]

Richardson, A., Hopkisson, J., Campbell, B., & Raine-Fenning, N. (2016). Use of the double decidual sac sign to confirm intra-uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy study. Ultrasound in Obstetrics & Gynecology.

Between 1st January and 31st October 2015, 620 IVF/ICSI cycles were undertaken within the unit. Of these, 124 (20%) women agreed to participate in the study. In addition to these, a further six women were approached by one of the authors at the time of embryo transfer and declined to participate in the study due to various reasons, namely work commitments (n=3), reluctance to have a TVS (n=2) and distance to travel to the clinic (n=1). 45 (36.3%) of the 124 women were subsequently excluded as they had a negative urinary pregnancy test. Of the 79 women who had a positive pregnancy test, two (2.53%) did not attend for the index test and nine (11.39%) of those that did attend did not have an intrauterine fluid collection present on TVS and were therefore excluded. 77 intrauterine fluid collections were observed in the remaining 68 women (nine of the women had two intrauterine fluid collections detected).
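A flow description like the one quoted above is easy to check for internal consistency. The R sketch below simply re-derives the counts and percentages stated in the quoted passage; the variable names are my own.

```r
# Numbers taken from the quoted passage
agreed            <- 124  # women who agreed to participate
negative_test     <- 45   # excluded: negative urinary pregnancy test
positive_test     <- agreed - negative_test
did_not_attend    <- 2    # did not attend for the index test
no_fluid_collect  <- 9    # attended, but no intrauterine fluid collection
analysed_women    <- positive_test - did_not_attend - no_fluid_collect

c(positive_test = positive_test, analysed_women = analysed_women)

# Percentages as reported in the passage
round(100 * c(negative_test / agreed,
              did_not_attend / positive_test,
              no_fluid_collect / positive_test), 2)
```

The derived totals (79 women with a positive test, 68 women analysed) agree with the figures in the quoted text, which is the kind of check a flow diagram makes possible for readers.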


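Participants
• Methods section
  – Criteria for identifying potentially eligible participants
    • Symptoms
    • Results from previous tests
    • Registry
  – When and where potentially eligible participants were identified
    • Setting, location, dates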

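Participants
• Methods section
  – Study design
  – Criteria for identifying potentially eligible participants
  – When and where potentially eligible participants were identified
• Example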

Participants were recruited prospectively from Nurture Fertility, Nottingham, United Kingdom between 1st January and 31st October 2015. Women were aged between 18 and 45 years of age and had undergone IVF/ICSI treatment using a standard long agonist or antagonist protocol depending on ovarian reserve tests as previously described (13). The study was well advertised within the IVF unit using posters and patient information leaflets. Whenever possible, one of the authors (AR) was also present to discuss the study with women following their embryo transfer procedure. All women were invited to participate in the study.

Participants
• Methods section
  – Details of the inclusion (or exclusion) criteria
• Example

Women were excluded from the study if they had a negative urinary pregnancy test (performed 18 days after oocyte retrieval in a fresh cycle or 13-16 days after embryo transfer in a frozen embryo replacement cycle depending on the stage of embryo development at the time of transfer) or if, at the time of the index test, there was either no ultrasonographic evidence of an intrauterine fluid collection, or a yolk sac and/or fetal pole was clearly visible within the intrauterine fluid collection. Women were also excluded if no outcome data were available or if, following the reference standard, the final diagnosis was not known (for example resolving or persistent pregnancies of unknown location).

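Participants
• Results section
  – State the interval between the index test and the reference standard, and any clinical interventions carried out in between.
• Example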

If the urinary pregnancy test was positive, an early ultrasound scan was scheduled for either 19 or 20 days after oocyte retrieval corresponding to a gestational age of 33 or 34 days. This range was specifically chosen to optimize the chances of a gestation sac being present but a yolk sac or fetal pole being absent (14, 15).

All women were scheduled to have a routine viability ultrasound scan at between 6 and 7 weeks gestation (between 8 and 16 days after the index test) as per the fertility unit’s standard practice.

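(The first passage above describes the index test, the second the reference standard.)

Use this information to check for verification bias 2 (differential verification bias).

Note: there was no clinical intervention in this study.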

Participants
• Results section
  – Baseline demographics and clinical characteristics
• Example (Table 1)

The baseline characteristics of study participants are illustrated in Table 1 (values refer to mean ±standard deviation). These were not significantly different from the baseline characteristics of the general population attending the IVF unit during the same time period.

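Test methods
• Methods section (common to the index test and the reference standard)
  – Details of the test (in enough detail to allow replication)
  – Definition of and rationale for positivity cut-offs and result categories
  – Blinding of the tests
  – Rationale for choosing the reference standard (reference standard only)
• Results section
  – Cross-tabulation
  – Estimates of diagnostic accuracy and their precision (confidence intervals)
  – Adverse events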

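Test methods
• Methods section
  – Details of the (index and reference) tests
    • Give replicable information on the materials, the specification and use of the instruments, and the concrete measurement procedure (including citations).
    • Definition of and rationale for positivity cut-offs and result categories
      – Was the cut-off specified in advance, or were the classification criteria adjusted post hoc to make the accuracy look higher?
    • Rationale for choosing the reference standard

Always give readers enough information, including citations, to replicate the tests.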

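Test methods
• Methods section
  – Blinding of the tests: examples (the first passage below concerns the reference standard, the second the index test)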

Interpretation of the reference standard was performed by an experienced gynaecologist without knowledge of the findings from the index test.

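The findings from the early scan were interpreted immediately and recorded separate to the main clinical notes.

Because the index test was performed before the reference standard, its readers could not have known the reference standard result.

Blinding is one of the risk-of-bias assessment items.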

Test methods
• Results section
  – Cross-tabulation
• Example

Of the six intrauterine fluid collections that did not display the DDSS, four were subsequently proven to have an IUP and two were found to have an EP (Table 2).

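Although this study does not report it, the number of participants for whom the reference standard or the index test could not establish a diagnosis is also important for understanding test performance and should be reported.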

Test methods
• Results section
  – Estimates of diagnostic accuracy and their precision (confidence intervals)
• Example (binary test)

The DDSS therefore has a sensitivity of 93.9% (95% CI 85.0%-98.3%), specificity of 100% (95% CI 15.8%-100%) and overall diagnostic accuracy of 94.0% (95% CI 88.3%-99.7%) for predicting an IUP. The positive and negative predictive values are 100% (95% CI 94.1%-100%) and 33.3% (95% CI 4.3%-77.7%) respectively whilst the positive likelihood ratio was infinite and the negative likelihood ratio was 0.06 (95% CI 0.02-0.16).
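Two of the intervals in the quoted passage can be reproduced in base R, which shows why precision has to be reported alongside the point estimate. The sketch assumes the paper used exact (Clopper-Pearson) binomial intervals, and it takes the counts from the quoted cross-tabulation sentence: two ectopic pregnancies, both without the DDSS (specificity 2/2), among six collections without the DDSS (negative predictive value 2/6).

```r
# Exact (Clopper-Pearson) 95% confidence intervals via binom.test
ci_spec <- binom.test(x = 2, n = 2)$conf.int  # specificity: 2 of 2 EPs test-negative
ci_npv  <- binom.test(x = 2, n = 6)$conf.int  # NPV: 2 EPs among 6 test-negatives

round(100 * ci_spec, 1)
round(100 * ci_npv, 1)
```

A specificity of 100% based on two participants is compatible with a true specificity anywhere from roughly 16% to 100%, so the point estimate alone would be misleading.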

Test methods
• Results section
  – Adverse events
    • Report any adverse events caused by performing the (index or reference) tests.
• Example

No adverse events from performing the index test or reference standard were reported.

Analysis
• Methods section
  – Analysis of variability
  – Sample size determination
  – Handling of missing data

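Analysis
• Examining variability
  • Report any analyses of variability in diagnostic accuracy that were carried out. Distinguish pre-specified analyses from exploratory ones (a pre-specified analysis plan is preferable).
  • Subgroup analyses, etc.

Does diagnostic accuracy vary with particular factors?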

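Analysis
• Sources of variability

1. Patient covariates: demographics, type of symptoms, comorbidity, study site, etc.
2. Factors related to the target condition: severity, region, etc.
3. Factors related to the test device or modality: change in accuracy as the equipment ages, etc.
4. Factors related to the readers of the test results: level of experience, etc.

Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic medicine. Wiley.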

Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic medicine. Wiley.

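Analysis
• Sample size determination
  – State the rationale for the sample size concretely.
    • Among diagnostic accuracy studies of depression, only 3% state how the sample size was determined.
    • Among diagnostic accuracy studies of depression, only 8% have a 95% confidence interval for sensitivity with a width of 10% or less, and in 62% the width is 21% or more.

Sample size should be planned around precision (the width of the confidence interval), not only around the point estimate of accuracy.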

Thombs, B. D., & Rice, D. B. (2016). Sample sizes and precision of estimates of sensitivity and specificity from primary studies on the diagnostic accuracy of depression screening tools: a survey of recently published studies. International journal of methods in psychiatric research, 25(2), 145-152.

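Analysis
• Methods for sample size determination
  – Diagnostic accuracy of a single test
    • Sensitivity and specificity of a binary test
    • ROC (AUC) of a continuous test
  – Comparison of the diagnostic accuracy of two tests
    • Sensitivity and specificity of a binary test
    • ROC (AUC) of a continuous test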

Hajian-Tilaki, K. (2014). Sample size estimation in diagnostic test studies of biomedical informatics. Journal of biomedical informatics, 48, 193-204.

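Analysis
• Sample size: sensitivity and specificity of a binary test

$$ n_{Se} = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{d^{2} \times Prev}, \qquad n_{Sp} = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{d^{2} \times (1-Prev)} $$

P = sensitivity or specificity; Z_{α/2} = 1.96 (α = 0.05); d = precision (margin of error); Prev = prevalence

With significance level 0.05, sensitivity 0.90, specificity 0.70, precision 0.07 and prevalence 0.10, the required sample sizes are:

Sensitivity: 1.96² × 0.90 × 0.10 / (0.07² × 0.10) = 706
Specificity: 1.96² × 0.70 × 0.30 / (0.07² × (1 - 0.10)) = 183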

Analysis
• Sample size: sensitivity and specificity of a binary test
  – Example

Our sample size calculation was based on the following formula as described by Karimollah16. …( 中略 )…As for our study, the predetermined values of sensitivity and specificity were 99% and 98%, respectively, Zα/2=1.96, and the margin of error (d) was set as ±5%, which yielded results that would be accurate to within ±5 percentage points. Based on the formula, the sample sizes for sensitivity and specificity were 15 and 30, respectively. Subsequently, the overall sample sizes for sensitivity and specificity were calculated using the following formulae, respectively:…( 中略 )… Prev denotes the prevalence of disease in the population. The prevalence of disease in the population was 40% in our present study, and thus the overall sample sizes calculated based on sensitivity and specificity were 38.0 and 50.2, respectively. The maximum total number of participants based on sensitivity and specificity was 50.2, and thus a sample size of 51 was finally selected in our study.

Gao, J., Wu, H., Wang, L., Zhang, H., Duan, H., Lu, J., & Liang, Z. (2016). Validation of targeted next-generation sequencing for RAS mutation detection in FFPE colorectal cancer tissues: comparison with Sanger sequencing and ARMS-Scorpion real-time PCR. BMJ open, 6(1), e009532.
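The numbers in the quoted sample size statement can be re-derived with the precision-based formula for a binary test: first the required numbers of diseased and non-diseased participants, then the totals after dividing by the prevalence and by one minus the prevalence. This is a sketch under the assumption that the authors used exactly this formula; all inputs are the values stated in the quotation.

```r
z <- qnorm(1 - 0.05 / 2)  # about 1.96
d <- 0.05                 # margin of error

# Required numbers with and without the target condition
n_diseased     <- z^2 * 0.99 * (1 - 0.99) / d^2  # anticipated sensitivity 99%
n_non_diseased <- z^2 * 0.98 * (1 - 0.98) / d^2  # anticipated specificity 98%

# Scale up to total sample sizes using the prevalence
prev <- 0.40
total_for_sens <- n_diseased / prev
total_for_spec <- n_non_diseased / (1 - prev)

round(c(n_diseased = n_diseased, n_non_diseased = n_non_diseased,
        total_for_sens = total_for_sens, total_for_spec = total_for_spec), 1)
ceiling(max(total_for_sens, total_for_spec))
```

The larger of the two totals drives the final sample size, which is why the authors settled on 51 participants.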

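Analysis
• Sample size: continuous test, ROC AUC

$$ n = \frac{Z_{\alpha/2}^{2}\, V(AUC)}{d^{2}}, \qquad V(AUC) = 0.0099\, e^{-a^{2}/2}\,(6a^{2} + 16), \qquad a = 1.414\, \Phi^{-1}(AUC) $$

Φ^{-1} is the inverse cumulative standard normal distribution. For AUC = .70 and a precision of 0.07, the required sample size is…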

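Analysis
• Sample size: binary test, comparison of two tests

$$ n = \frac{\left[ Z_{\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^{2}}{(P_1 - P_2)^{2}} $$

\bar{P} = mean of the sensitivities (or specificities) of the two tests; P_1 = sensitivity (or specificity) of one test; P_2 = sensitivity (or specificity) of the other test; Z_{α/2} = 1.96 (α = 0.05); Z_β = 0.84 (power 1 - β = 0.80)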

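Analysis
• Sample size: continuous test, ROC AUC, comparison of two tests

$$ n = \frac{\left[ Z_{\alpha/2}\sqrt{2\,V_{H_0}(AUC)} + Z_{\beta}\sqrt{V(AUC_1) + V(AUC_2)} \right]^{2}}{(AUC_2 - AUC_1)^{2}} $$

The AUC in V_{H_0}(AUC) is the mean of the AUCs of the two tests being compared. Each V(AUC) is obtained in the same way as for a single test.

Example: the sample size needed to detect a difference AUC_2 - AUC_1 = 0.10 from AUC_1 = .70 with power .80 at the 95% confidence level.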

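Analysis
• Missing data
  – It is important to report the reasons for and the proportion of missing data.

Handling missing data
  – Verification bias
    • Correction by the BG method
    • Multiple imputation
  – Differential verification bias
    • Bayesian methods

Methods tailored to each type of missingness have been developed, but they are not yet widely used.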

de Groot, J. A., Bossuyt, P. M., Reitsma, J. B., Rutjes, A. W., Dendukuri, N., Janssen, K. J., & Moons, K. G. (2011). Verification problems in diagnostic accuracy studies: consequences and solutions. BMJ, 343, d4770.
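As a rough illustration of the first correction listed above, here is a minimal R sketch, assuming "BG" refers to the Begg and Greenes correction for partial verification. The counts are hypothetical (they are not from the slides), and the method assumes that whether a participant is verified depends only on the index test result.

```r
# Hypothetical counts: everyone took the index test, only some were verified
n_pos <- 110; n_neg <- 264           # all index-test positives / negatives
ver_pos_d <- 23; ver_pos_nd <- 87    # verified positives: diseased / non-diseased
ver_neg_d <- 6;  ver_neg_nd <- 60    # verified negatives (only 66 of 264 verified)

# Naive estimates from verified participants only
naive_sens <- ver_pos_d / (ver_pos_d + ver_neg_d)
naive_spec <- ver_neg_nd / (ver_pos_nd + ver_neg_nd)

# Disease probability given the test result, estimated among the verified
p_d_pos <- ver_pos_d / (ver_pos_d + ver_pos_nd)  # P(D+ | T+)
p_d_neg <- ver_neg_d / (ver_neg_d + ver_neg_nd)  # P(D+ | T-)

# Corrected estimates: reweight by the full index-test totals
bg_sens <- n_pos * p_d_pos / (n_pos * p_d_pos + n_neg * p_d_neg)
bg_spec <- n_neg * (1 - p_d_neg) /
  (n_neg * (1 - p_d_neg) + n_pos * (1 - p_d_pos))

round(c(naive_sens = naive_sens, bg_sens = bg_sens,
        naive_spec = naive_spec, bg_spec = bg_spec), 3)
```

If verification also depends on unrecorded factors related to disease status, this reweighting does not remove the bias, which is one reason reporting the reasons for non-verification matters.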

Study registration and protocol publication
• Only about 15% of diagnostic accuracy studies are registered in advance [1].
• Diagnostic accuracy studies with favourable results are published sooner [2].

Prospective registration and publication of the protocol are essential for diagnostic accuracy studies too.

[1] Korevaar DA, Bossuyt PM, Hooft L. Infrequent and incomplete registration of test accuracy studies: analysis of recent study reports. BMJ Open. 2014;4(1):e004596.

[2] Korevaar, D. A., van Es, N., Zwinderman, A. H., Cohen, J. F., & Bossuyt, P. M. (2016). Time to publication among completed diagnostic accuracy studies: associated with reported accuracy estimates. BMC medical research methodology, 16(1), 1.

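Important additions in STARD2015
• Registration number and name of the registry (item 28)
• Where the study protocol can be accessed (item 29)
• Sources of funding (item 30)

Example: methods section and end of the paper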

The study was registered with www.clinicaltrials.gov (NCT02700789) and conducted following STARD guidelines (12). The full study protocol can be accessed by contacting the corresponding author.

FUNDING: University of Nottingham and Nurture Fertility


Protocol publication
• Publish the study protocol as a paper.
• State this in the article.

Ethical approval was given by the National Research Ethics Service Committee North-West (Cheshire) on February 19, 2013 (13/NW/0010; 118638), and the study protocol was published (Macey et al. 2013).

Macey, R., Glenny, A., Walsh, T., Tickle, M., Worthington, H., Ashley, J., & Brocklehurst, P. (2015). The efficacy of screening for common dental diseases by hygiene-therapists a diagnostic test accuracy study. Journal of dental research, 0022034514567335.

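Take Home Message
• Use STARD2015 as a guide to planning transparent studies.
• Measures of diagnostic accuracy are descriptive statistics and are directly affected by who is measured, so study design is extremely important.
• Design the study with the risk of bias in mind.
• Prospective registration and sample size determination are essential not only for RCTs but also for diagnostic accuracy studies.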

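Analysis code
• Cross-tabulation and diagnostic accuracy
• Drawing ROC curves
• Sample size determination

Accuracy measures from a cross-tabulation
• epiR

# Cross-tabulation (rows: index test, columns: disease status)
dat <- as.table(matrix(c(670, 202, 74, 640), nrow = 2, byrow = TRUE))
colnames(dat) <- c("Dis+", "Dis-")
rownames(dat) <- c("Test+", "Test-")

# Estimate diagnostic accuracy with epiR
library(epiR)
rval <- epi.tests(dat, conf.level = 0.95)
print(rval)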

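ROC curves
• pROC

library(pROC)
data(aSAH)

# Empirical ROC curve with the AUC and its confidence interval
roc1 <- roc(outcome ~ s100b, aSAH, plot = TRUE, ci = TRUE, print.auc = TRUE,
            grid = TRUE, show.thres = TRUE, auc.polygon = TRUE)

# Smoothed ROC curve
roc2 <- roc(outcome ~ s100b, aSAH, plot = TRUE, ci = TRUE, print.auc = TRUE,
            smooth = TRUE, grid = TRUE, show.thres = TRUE, auc.polygon = TRUE)

# Optimal cut-off
coords(roc1, "best", ret = c("threshold", "specificity", "1-npv"))
#   threshold specificity       1-npv
#   0.2050000   0.8055556   0.2054795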

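Sample size
• Binary test, sensitivity and specificity

# Function: sample size for a target precision (w) of sensitivity and specificity
# prev is the prevalence of the target condition
precision.sn.sp <- function(sn = NULL, sp = NULL, w = NULL, prev = NULL, sig.level = .05) {
  z.value <- qnorm(sig.level / 2, lower.tail = FALSE)
  tp.fn <- z.value^2 * (sn * (1 - sn))          # numerator for the diseased group
  n.sn <- ceiling(tp.fn / ((w^2) * prev))       # total n: divide by the prevalence
  fp.tn <- z.value^2 * (sp * (1 - sp))          # numerator for the non-diseased group
  n.sp <- ceiling(fp.tn / ((w^2) * (1 - prev))) # total n: divide by 1 - prevalence
  return(list(n.sensitivity = n.sn, n.specificity = n.sp))
}

# Run
precision.sn.sp(sn = .90, sp = .70, w = .07, prev = .10)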

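Sample size
• Binary test, sensitivity and specificity, comparison of two tests

# Function: sample size for comparing two sensitivities (or specificities)
# beta is the desired power (0.80 by default)
precision.comp <- function(p1 = NULL, p2 = NULL, sig.level = .05, beta = 0.80) {
  z.alpha <- qnorm(sig.level / 2, lower.tail = FALSE)
  z.beta <- qnorm(beta)
  mean.p <- (p1 + p2) / 2
  d.p <- (p1 - p2)^2
  tp.fn <- ((z.alpha) * sqrt(2 * mean.p * (1 - mean.p)) +
            (z.beta) * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2
  n.p <- ceiling(tp.fn / d.p)
  return(list(n.compare = n.p))
}

# Run
precision.comp(p1 = .70, p2 = .90)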

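Sample size
• Continuous test, AUC

# Function: sample size for a target precision (w) of the AUC
precision.auc <- function(auc = NULL, w = NULL, sig.level = 0.05) {
  z.value <- qnorm(sig.level / 2, lower.tail = FALSE)  # uses sig.level rather than a fixed 0.05
  a <- qnorm(auc) * 1.414
  v_auc <- (0.0099 * exp(1)^((-a^2) / 2)) * (6 * a^2 + 16)  # variance function of the AUC
  n.auc <- ceiling(((z.value^2) * v_auc) / w^2)
  return(list(n.auc = n.auc))
}

# Run
precision.auc(auc = .70, w = 0.07)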

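Sample size
• Continuous test, AUC, comparison of two tests

# Function: sample size for comparing two AUCs
# beta is the desired power (0.80 by default)
precision.auc.comp <- function(auc1 = NULL, auc2 = NULL, sig.level = 0.05, beta = 0.80) {
  z.alpha <- qnorm(sig.level / 2, lower.tail = FALSE)
  z.beta <- qnorm(beta)
  a1 <- qnorm(auc1) * 1.414
  v_auc1 <- (0.0099 * exp(1)^((-a1^2) / 2)) * (6 * a1^2 + 16)
  a2 <- qnorm(auc2) * 1.414
  v_auc2 <- (0.0099 * exp(1)^((-a2^2) / 2)) * (6 * a2^2 + 16)
  # Under H0 the common AUC is taken as the mean of the two AUCs
  a_H0 <- qnorm(mean(c(auc2, auc1))) * 1.414
  v_auc_H0 <- (0.0099 * exp(1)^((-a_H0^2) / 2)) * (6 * a_H0^2 + 16)
  n.auc.comp <- ceiling((z.alpha * sqrt(2 * v_auc_H0) +
                         z.beta * sqrt(v_auc1 + v_auc2))^2 / (auc2 - auc1)^2)
  return(list(n.auc.compare = n.auc.comp))
}

# Run
precision.auc.comp(auc1 = 0.50, auc2 = 0.80)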

Recommended reading
• Contains a Japanese translation of the STARD2003 explanatory paper.
  中山健夫, & 津谷喜一郎. (2008). 臨床研究と疫学研究のための国際ルール集.
• Explains the design of and biases in diagnostic accuracy studies, in Japanese.
  HULLEY, S. B., et al. 木原雅子・木原正博 (訳): 医学的研究のデザイン. 2004.
• Introduction to diagnostic accuracy research
  – Knottnerus, J. A., & Buntinx, F. (2009). The evidence base of clinical diagnosis. Theory and methods of diagnostic research.
• Overview of diagnostic accuracy analysis: study planning and statistical methods (strong on statistical methods)
  – Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2009). Statistical methods in diagnostic medicine (Vol. 569). John Wiley & Sons.
