Why You Should Care About Statistics - Jeff Leek

@leekgroup

@simplystats

why you should care about statistics

Jeff Leek, Johns Hopkins Bloomberg Biostatistics

jtleek@gmail.com

credits

•  slides shamelessly borrowed from:
   –  Ingo Ruczinski (Google: “ingo’s pond”)
   –  Josh Akey (UW Genomics)
   –  Karl Broman (Google: “the stupidest thing broman”)

why this stuff matters

seems like an exciting result!

http://www.nature.com/nm/journal/v12/n11/full/nm1491.html

stunning problems

how it went down

http://www.nature.com/news/2011/110111/full/469139a/box/1.html

still going on

worth a watch

http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308141121-Baggerly.mp4

worth a read

http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx

what were the problems?

•  irreproducibility
•  lack of cooperation

•  silly prediction rules
•  study design/batch effects
•  procedures not locked down

Expertise

Transparency

tip #1: know the analysis

http://bit.ly/OgW3xv

tip #2: care about the analysis

Drinkel et al. Organometallics 2013

tip #3: have a data/analysis sharing plan

http://www.nature.com/nature/journal/v467/n7314/full/467401b.html

tip #4: know where to get help

http://www.biostat.jhsph.edu/consult/

tip #5: no substitute for the real thing

“central dogma” of statistics

Adapted from Josh Akey

sample size

some experiment

example calculations
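
The slides do not spell out the calculation, so the sketch below is only an illustration under our own assumptions: a two-sided comparison of two group means with a common standard deviation `sigma`, using the textbook normal-approximation formula. The function name `n_per_group` and the default `alpha` and `power` values are hypothetical choices, not taken from the talk.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    where delta is the difference worth detecting and sigma the common SD.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up: subjects come in whole numbers


print(n_per_group(delta=1.0, sigma=1.0))  # 16
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

Halving the difference you want to detect roughly quadruples the required sample size, which is why small effects need large studies.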

better technology ≠ no variability

http://www.nature.com/nbt/journal/v29/n7/full/nbt.1910.html

bad study design

78% of genes differentially expressed

group and date “confounded”

uh-oh!

confounding:

association between shoe size and literacy in kids
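
To see how a third variable can manufacture an association, here is a small simulation sketch. It assumes, for illustration only, that age drives both shoe size and literacy and that the two are otherwise unrelated; all numbers and helper names are made up. The pooled correlation is strong, but once each age group's mean is subtracted the correlation all but vanishes.

```python
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_kids(n=10_000, seed=1):
    """Hypothetical data: age (the confounder) drives both shoe size and literacy."""
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        age = rng.randint(5, 12)
        shoe = age + rng.gauss(0, 1)             # grows with age
        literacy = 10 * age + rng.gauss(0, 10)   # also grows with age
        kids.append((age, shoe, literacy))
    return kids


def within_age_correlation(kids):
    """Correlate shoe size and literacy after removing each age group's mean."""
    groups = defaultdict(list)
    for age, shoe, lit in kids:
        groups[age].append((shoe, lit))
    shoe_res, lit_res = [], []
    for members in groups.values():
        mean_shoe = sum(s for s, _ in members) / len(members)
        mean_lit = sum(lit for _, lit in members) / len(members)
        for s, lit in members:
            shoe_res.append(s - mean_shoe)
            lit_res.append(lit - mean_lit)
    return pearson(shoe_res, lit_res)


kids = simulate_kids()
overall = pearson([k[1] for k in kids], [k[2] for k in kids])
print(f"pooled r = {overall:.2f}")                              # strong and positive
print(f"within-age r = {within_age_correlation(kids):.2f}")     # near zero
```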

proteomics

proteomics

gene expression

gene expression

confounding is a big deal

http://www.nature.com/nrg/journal/v11/n10/full/nrg2825.html

confounding and study design

tip #6: randomization
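
A minimal sketch of simple randomization, assuming samples are identified by name and split into two arms; `randomize` and the group labels are hypothetical. Shuffling before dealing samples out means no human choice (or processing order) decides which arm a sample lands in.

```python
import random


def randomize(samples, groups=("treatment", "control"), seed=None):
    """Randomly assign samples to groups in sizes that differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Deal the shuffled samples out round-robin, like cards.
    return {sample: groups[i % len(groups)] for i, sample in enumerate(shuffled)}


samples = [f"sample{i}" for i in range(1, 11)]
assignment = randomize(samples, seed=2013)
for sample in samples:
    print(sample, assignment[sample])
```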

an example study

a bad design

stratified design
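
One common way to stratify, sketched here under our own assumptions, is to randomize separately within each level of a known nuisance variable, such as processing date, so that every stratum contains both groups in (nearly) equal numbers. `stratified_randomize` and the `day1`/`day2` labels are illustrative, not from the slides.

```python
import random
from collections import defaultdict


def stratified_randomize(samples, stratum_of, groups=("treatment", "control"), seed=None):
    """Randomize separately within each stratum (e.g. processing date)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sample in samples:
        by_stratum[stratum_of[sample]].append(sample)
    assignment = {}
    for stratum in sorted(by_stratum):  # sorted so a fixed seed is reproducible
        members = by_stratum[stratum]
        rng.shuffle(members)
        for i, sample in enumerate(members):
            assignment[sample] = groups[i % len(groups)]
    return assignment


# Hypothetical: eight samples processed on two days, four per day.
processing_day = {f"s{i}": ("day1" if i <= 4 else "day2") for i in range(1, 9)}
assignment = stratified_randomize(list(processing_day), processing_day, seed=7)
for sample, day in processing_day.items():
    print(sample, day, assignment[sample])
```

Because each day gets two treatment and two control samples, group can no longer be confounded with processing date.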

more good study characteristics

•  Balanced
•  Replicated
•  Has Controls

tip #7: look at the data

http://en.wikipedia.org/wiki/Anscombe's_quartet
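
Anscombe's quartet (the Wikipedia link above) is the classic reason to plot before summarizing: four small datasets share nearly the same means, variances, and correlation, yet look completely different when plotted. The values below are the published quartet; the `pearson` helper is our own, written out so the sketch needs only the standard library.

```python
from statistics import mean, variance


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# The summaries are nearly identical; only a plot reveals the differences.
for name, (xs, ys) in quartet.items():
    print(f"{name:>3}: mean x={mean(xs):.2f}  mean y={mean(ys):.2f}  "
          f"var x={variance(xs):.2f}  var y={variance(ys):.2f}  r={pearson(xs, ys):.3f}")
```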

summarizing data

http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

replicates

watch the scale!

log transform is common/useful
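
A small illustration of why the log scale is popular for ratios such as fold changes (the numbers are hypothetical): a halving and a doubling are equal and opposite effects, but they only look that way after taking logs.

```python
import math

fold_changes = [0.5, 2.0]  # one gene halves, another doubles

# Raw scale: the average suggests an overall increase.
raw_mean = sum(fold_changes) / len(fold_changes)

# log2 scale: -1 and +1 cancel, as they should.
log_mean = sum(math.log2(fc) for fc in fold_changes) / len(fold_changes)

print(raw_mean, log_mean)  # 1.25 0.0
```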

Bland-Altman plots

http://en.wikipedia.org/wiki/Bland%E2%80%93Altman_plot
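
A minimal Bland-Altman sketch: for each pair of measurements of the same thing (for example, two technical replicates), plot the difference against the average; the usual 95% limits of agreement are the mean difference ± 1.96 standard deviations of the differences. The function returns coordinates rather than drawing, so any plotting library can be used; `bland_altman` and the replicate values are hypothetical. In genomics the same idea is often applied to log-scale values, where it is known as an MA plot.

```python
import statistics


def bland_altman(a, b):
    """Coordinates and 95% limits of agreement for a Bland-Altman plot.

    a, b: paired measurements of the same items (e.g. two replicates).
    Returns (averages, differences, (lower_limit, upper_limit)).
    """
    averages = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis
    differences = [x - y for x, y in zip(a, b)]     # y-axis
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return averages, differences, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical replicate measurements of five samples.
rep1 = [10.2, 12.1, 9.8, 15.3, 11.0]
rep2 = [10.0, 12.5, 9.5, 15.0, 11.4]
avg, diff, limits = bland_altman(rep1, rep2)
print(avg, diff, limits)
```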

beware ridiculograms!

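ack! math!

Observations: $X_1, \ldots, X_M$ and $Y_1, \ldots, Y_N$

Averages: $\bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i$ and $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$

SD² or variances: $s_X^2 = \frac{1}{M-1}\sum_{i=1}^{M} (X_i - \bar{X})^2$ and $s_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$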

an important issue

t-statistic: you’ll see this a lot*

$t = \dfrac{\bar{Y} - \bar{X}}{\sqrt{\dfrac{s_Y^2}{N} + \dfrac{s_X^2}{M}}}$

Invented to improve beer: http://en.wikipedia.org/wiki/Student's_t-test
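
The averages, variances, and t-statistic translate directly into code. This sketch follows the slides' notation (`x` holds the M observations $X_i$, `y` the N observations $Y_i$) and uses the unequal-variance form of the denominator; the helper names are our own.

```python
import math


def mean(values):
    """Average: (1/n) times the sum of the observations."""
    return sum(values) / len(values)


def sample_variance(values):
    """Squared SD with the n - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_statistic(x, y):
    """(Y-bar - X-bar) / sqrt(s_Y^2 / N + s_X^2 / M)."""
    m, n = len(x), len(y)
    standard_error = math.sqrt(sample_variance(y) / n + sample_variance(x) / m)
    return (mean(y) - mean(x)) / standard_error


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(t_statistic(x, y))
```

If SciPy is available, `scipy.stats.ttest_ind(y, x, equal_var=False)` should report the same statistic.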

p-values

Original Statistic

how to calculate

$\text{p-value} = \dfrac{\#\{|S_{\text{perm}}| \geq |S_{\text{obs}}|\}}{\#\text{ of permutations}}$

Observed Statistic = 2
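
The permutation recipe can be sketched as follows, assuming the statistic is the difference in group means (the slides do not say which statistic produced the observed value of 2). This version enumerates every relabeling, which is only practical for small samples; larger studies typically use a random subset of shuffles instead. The function names are hypothetical.

```python
from itertools import combinations


def diff_in_means(x, y):
    """Statistic S = mean(y) - mean(x)."""
    return sum(y) / len(y) - sum(x) / len(x)


def permutation_p_value(x, y, statistic=diff_in_means):
    """Exact two-sided p-value: #{|S_perm| >= |S_obs|} / # of permutations."""
    pooled = list(x) + list(y)
    s_obs = statistic(x, y)
    n_extreme = n_perms = 0
    # Every way of choosing which pooled values get the "y" label.
    for y_idx in combinations(range(len(pooled)), len(y)):
        chosen = set(y_idx)
        y_perm = [pooled[i] for i in y_idx]
        x_perm = [v for i, v in enumerate(pooled) if i not in chosen]
        n_perms += 1
        if abs(statistic(x_perm, y_perm)) >= abs(s_obs):
            n_extreme += 1
    return n_extreme / n_perms


print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # 0.1
```

Here only 2 of the 20 possible relabelings (the observed one and its mirror image) are as extreme as the observed split.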

tip #8: know what a p-value is(n’t)

The p-value is the probability of observing a statistic at least as extreme as the one observed if the null hypothesis is true. The p-value is not:
•  the probability the null is true
•  the probability the alternative is true
•  a measure of statistical evidence
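
The definition can be checked by simulation: if the null hypothesis is true in every experiment, about 5% of p-values should still fall below 0.05 purely by chance. The sketch assumes a one-sample z-test with known standard deviation; the seed, sample size, number of simulated studies, and the name `z_test_p_value` are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming the SD sigma is known."""
    z = (sum(sample) / len(sample)) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(2013)
# Every simulated study draws 10 observations with the null hypothesis true (mean 0).
p_values = [z_test_p_value([rng.gauss(0, 1) for _ in range(10)]) for _ in range(20_000)]
fraction_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"fraction of null studies with p < 0.05: {fraction_significant:.3f}")  # close to 0.05
```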

an easy mistake to make

a problem

a problem

a problem

multiple comparison error rates

•  Family-wise error rate (FWER): $\Pr(\#\text{ false positives} \geq 1)$
•  False discovery rate (FDR): $E\left[\dfrac{\#\text{ false positives}}{\#\text{ of discoveries}}\right]$
•  Expected number of false positives, EFP (e-values): $E[\#\text{ false positives}]$

difference in interpretation

Suppose 550 out of 10,000 genes are significant at the 0.05 level.

•  P-value < 0.05: expect up to 0.05 × 10,000 = 500 false positives (exactly 500 if no gene is truly differentially expressed)
•  False discovery rate < 0.05: expect 0.05 × 550 = 27.5 false positives among the 550 discoveries
•  Family-wise error rate < 0.05: the probability of at least 1 false positive is ≤ 0.05
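
The family-wise error rate and the false discovery rate are usually controlled with standard procedures: Bonferroni for the FWER and Benjamini-Hochberg for the FDR. The sketch below implements both from scratch; the ten p-values are made up for illustration and the function names are our own.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices called significant while controlling the family-wise error rate."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices called significant while controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("unadjusted:", [i for i, pv in enumerate(p) if pv < 0.05])
print("Benjamini-Hochberg:", benjamini_hochberg(p))
print("Bonferroni:", bonferroni(p))
```

With these ten p-values, five are below 0.05 unadjusted, Benjamini-Hochberg keeps two, and Bonferroni keeps one: the stricter the error rate, the fewer discoveries.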

read this

http://www.pnas.org/content/100/16/9440.long

the inevitable

http://simplystatistics.org/2013/08/26/statistics-meme-sad-p-value-bear/

why I’m sympathetic

beware of “hacking” statistics

be nice to the poor statistician

tip #9: correlation and causation

http://xkcd.com/552/

most common mistake

Fit regression models (correlations) followed by: “In summary, our results support a causal relationship of breastfeeding in infancy with receptive language at age 3 and with verbal and nonverbal IQ at school age. These findings support National and international recommendations to promote exclusive breastfeeding through age 6 months and continuation of breastfeeding through at least age 1 year.”

prediction and association

diagnostics

tip #10: know these quantities

key quantities as fractions
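
The slide's figure is not in the transcript, but the standard diagnostic quantities are all fractions of a 2×2 table of test result versus true disease status. A sketch with hypothetical counts for 1,000 people; `diagnostic_quantities` is our own name.

```python
def diagnostic_quantities(tp, fp, fn, tn):
    """Key quantities from a 2x2 table of test result vs. true disease status."""
    return {
        "sensitivity": tp / (tp + fn),  # Pr(test + | disease)
        "specificity": tn / (tn + fp),  # Pr(test - | no disease)
        "ppv": tp / (tp + fp),          # Pr(disease | test +)
        "npv": tn / (tn + fn),          # Pr(no disease | test -)
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: 100 people with the disease, 900 without.
q = diagnostic_quantities(tp=90, fp=10, fn=10, tn=890)
for name, value in q.items():
    print(f"{name}: {value:.3f}")
```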

important to keep in mind

general population

general population

at risk subpopulation

at risk subpopulation
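
Comparing the general population with an at-risk subpopulation shows that the same test can have very different positive predictive values depending on prevalence. This sketch applies Bayes' rule; the 99% sensitivity and specificity and the two prevalences (0.1% and 10%) are hypothetical numbers chosen for illustration, not figures from the slides.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Pr(disease | positive test) by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test with 99% sensitivity and 99% specificity.
general = positive_predictive_value(0.99, 0.99, prevalence=0.001)
at_risk = positive_predictive_value(0.99, 0.99, prevalence=0.10)
print(f"general population PPV: {general:.3f}")     # about 0.09
print(f"at-risk subpopulation PPV: {at_risk:.3f}")  # about 0.92
```

In the general population most positives are false positives, simply because the disease is rare; the identical test is far more informative in the at-risk group.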

summary of tips

1.  know the analysis
2.  care about the analysis
3.  have a data sharing plan
4.  know where/when to get help
5.  this isn’t a substitute for learning statistics
6.  randomize in your study design
7.  look at your data
8.  know what p-values are(n’t)
9.  beware causality creep
10. know the key diagnostic quantities
