RESEARCH & DATA ANALYSIS. SCIENTISTS COLLECT STATISTICAL DATA FROM EXPERIMENTS STATISTICAL OR...

Post on 21-Jan-2016



RESEARCH & DATA ANALYSIS

SCIENTISTS COLLECT STATISTICAL DATA FROM EXPERIMENTS.

STATISTICAL OR NUMERICAL DATA ALLOW FOR MORE ACCURATE ANALYSIS & EVALUATION OF THE RESULTS FROM EXPERIMENTS.

STATISTICS DEALS WITH COLLECTING, ANALYZING, AND INTERPRETING INFORMATION OR RESULTS.

TYPES OF DATA:

QUANTITATIVE DATA – AMOUNTS, MEASUREMENTS OR NUMERICAL DATA

QUALITATIVE DATA – NON-NUMERICAL IN NATURE (CHARACTERISTICS – COLOR, SHAPE, ETC.)

TYPES OF DATA COLLECTED:

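• POPULATION: 100% OF THE DATA ARE COLLECTED (VALUES CAN BE EXACT)

(GREEK LETTERS, SUCH AS μ FOR THE MEAN AND σ FOR THE STANDARD DEVIATION, ARE USED FOR POPULATION QUANTITIES)

• SAMPLE: A SMALLER SUBSET THAT REPRESENTS THE POPULATION IS COLLECTED, GIVING AN ESTIMATE

(ENGLISH LETTERS, SUCH AS x̄ FOR THE MEAN AND s FOR THE STANDARD DEVIATION, STAND FOR SAMPLE QUANTITIES)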

EXPERIMENTS ARE CONDUCTED USING MULTIPLE REPLICATIONS

• REPLICATION MAKES RESULTS MORE RELIABLE / ACCURATE

DATA ARE COLLECTED FROM MULTIPLE EXPERIMENTS, AND AN AVERAGE IS DETERMINED FROM THOSE RESULTS

• THE LARGER THE SAMPLE SIZE, THE BETTER IT REPRESENTS THE TRUE VALUE.

MEASURES OF CENTRAL TENDENCY INCLUDE:

MEAN (AVERAGE) *

MEDIAN

MODE

* USUALLY THE BEST CHOICE FOR DESCRIBING CENTRAL TENDENCY

THE AVERAGE OF A SET OF DATA / NUMBERS IS ALSO CALLED THE MEAN.

EXAMPLE:

10
12
8
11
+ 12
= 53

53 ÷ 5 = 10.6 MEAN (AVG)
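The mean calculation above can be sketched in Python as "add the values, then divide by how many there are". The function name `mean_of` is hypothetical, chosen only for illustration.

```python
def mean_of(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

scores = [10, 12, 8, 11, 12]  # the five values from the example above
print(sum(scores), mean_of(scores))  # prints: 53 10.6
```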

THE MEDIAN IS THE MIDDLE VALUE IN A SAMPLE OF VALUES ARRANGED IN ORDER

THERE MUST BE THE SAME NUMBER OF VALUES ABOVE AND BELOW THE MEDIAN

SAMPLE: 6, 9, 10, 11, 12, 14, 18

MEDIAN VALUE = 11

MEAN / AVERAGE = 11.43
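A minimal sketch of finding the median: sort the values, then take the middle one. The slide's sample has an odd count; for an even count this sketch averages the two middle values, which is a common convention rather than something the slides specify. `median_of` is a hypothetical name.

```python
def median_of(values):
    """Return the middle value of the sorted data.

    With an even number of values there is no single middle value, so this
    sketch averages the two middle values (a common convention).
    """
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)  # the median is defined on ordered data
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]  # odd count: equal numbers of values above and below
    return (ordered[mid - 1] + ordered[mid]) / 2

sample = [6, 9, 10, 11, 12, 14, 18]
print(median_of(sample))                     # prints: 11
print(round(sum(sample) / len(sample), 2))   # prints: 11.43
```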

THE MODE IS THE VALUE THAT OCCURS MOST OFTEN IN A SAMPLE

• SAMPLE: 10, 8, 11, 12, 14, 8, 11, 11

MODE = 11
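Finding the mode amounts to counting how often each value occurs. This sketch uses `collections.Counter` from the Python standard library and returns a list, because a sample can have more than one most-frequent value; `modes_of` is a hypothetical name.

```python
from collections import Counter

def modes_of(values):
    """Return every value that occurs most often (a sample can have several modes)."""
    counts = Counter(values)        # value -> number of occurrences
    top = max(counts.values())      # highest occurrence count
    return sorted(v for v, c in counts.items() if c == top)

sample = [10, 8, 11, 12, 14, 8, 11, 11]
print(modes_of(sample))  # prints: [11]  (11 occurs three times, 8 only twice)
```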

Which of the measures below gives the most accurate average?

MODE? MEAN? MEDIAN?

WHY IS IT MORE ACCURATE?

MEASURES OF VARIATION

RANGE = HIGHEST SCORE – LOWEST SCORE IN THE SET OF NUMBERS

STANDARD DEVIATION *: $s = \sqrt{\dfrac{\sum (x - \bar{x})^{2}}{n - 1}}$

* BEST CHOICE - USES ALL NUMBERS IN THE LIST

RANGE OF A SET OF DATA = HIGHEST VALUE – LOWEST VALUE

EXAMPLE:

6, 7, 8, 11, 12, 14, 14, 15, 15, 16, 19, 20

20 – 6 = 14 (RANGE)

RANGE HAS LIMITED USE BECAUSE IT DEPENDS ONLY ON THE HIGHEST AND LOWEST VALUES
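The range calculation, and why it has limited use, can be sketched in a few lines: only the two extreme values affect the result. The second data set is made up purely to show that the middle values are ignored; `data_range` is a hypothetical name.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

example = [6, 7, 8, 11, 12, 14, 14, 15, 15, 16, 19, 20]
print(data_range(example))  # prints: 14

# Hypothetical data: very different middle values, same extremes, same range.
print(data_range([6, 13, 13, 13, 20]))  # prints: 14
```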

10% RULE:

Some researchers consider data to be valid and representative if they fall within 10% of the mean.

EXAMPLE (10% RULE):

VALUE    DIFFERENCE FROM MEAN
10       0.6
12       1.4
8        2.6
11       0.4
12       1.4
53 (TOTAL) ÷ 5 = 10.6 MEAN

10% of 10.6 (mean) is +/- 1.06, or a range of 9.54 – 11.66

THE RANGE (9.54 – 11.66) REPRESENTS A VALID RANGE FOR ACCEPTING THE DATA

EXAMPLE:

10
12
8
11
12
= 53

NOTE: USING THE 10% RULE & THE RANGE (9.54 – 11.66), WHICH VALUES WOULD BE CONSIDERED OUT OF RANGE?

ANSWER: 8 & 12
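The 10% rule described above can be sketched as: compute the mean, build a band of plus or minus 10% of the mean around it, and flag any value outside that band. The function name and the `fraction` parameter are hypothetical; taking `abs(mean)` is an added safeguard so the band stays valid if the mean were negative.

```python
def ten_percent_rule(values, fraction=0.10):
    """Return the accepted (low, high) band and the values falling outside it.

    The band is mean +/- fraction * |mean|, i.e. +/- 10% of the mean by default.
    """
    mean = sum(values) / len(values)
    margin = fraction * abs(mean)
    low, high = mean - margin, mean + margin
    outside = [v for v in values if v < low or v > high]
    return (low, high), outside

(low, high), outside = ten_percent_rule([10, 12, 8, 11, 12])
print(round(low, 2), round(high, 2))  # prints: 9.54 11.66
print(sorted(set(outside)))           # prints: [8, 12]
```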

STANDARD DEVIATION (SD) IS A MEASUREMENT OF THE VARIATION FROM THE MEAN

SD CONSIDERS HOW MANY VALUES FALL AWAY FROM THE MEAN AND HOW FAR FROM THE MEAN THEY ARE

STANDARD DEVIATION REPRESENTS HOW CLOSELY DATA ARE CLUSTERED AROUND THE MEAN

STANDARD DEVIATION TERMS:

x̄ = MEAN

x = INDIVIDUAL SCORES IN THE SET

Σx = SUM OF ALL SCORES / VALUES

n = TOTAL NUMBER OF SCORES OR VALUES IN THE SET

Calculating a Standard Deviation

Take a sample problem with eight data points total and a mean (or average) value of 5.

To calculate the standard deviation, compute the difference of each data point from the mean, then square the result.

Next, divide the sum of these squared differences by the number of values, then take the square root to get the standard deviation. (Dividing by the number of values, n, gives the population standard deviation; the sample standard deviation divides by n − 1 instead.)

The standard deviation of this example is 2.
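The step-by-step procedure above can be sketched in Python. The transcript does not preserve the eight original values, so the data set below is an illustrative assumption chosen to match the stated mean of 5 and standard deviation of 2. The `sample` flag switches between dividing by n (as in the walkthrough) and by n − 1 (as in the sample formula); the function name is hypothetical.

```python
import math

def standard_deviation(values, sample=True):
    """Standard deviation computed step by step.

    sample=True  divides by n - 1 (the sample formula).
    sample=False divides by n (the population form used in the walkthrough).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # (x - mean)^2 for each value
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_diffs) / divisor)

# Illustrative data (an assumption, not the slide's missing values):
# eight values whose mean is 5; the squared differences sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, sample=False))           # prints: 2.0  (sqrt(32 / 8))
print(round(standard_deviation(data, sample=True), 2))  # prints: 2.14 (sqrt(32 / 7))
```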

FINDING STANDARD DEVIATION CAN BE CONFUSING & DIFFICULT IN SOME SITUATIONS

• PROCEDURES VARY DEPENDING ON THE PURPOSE & TYPE OF DATA RECORDED

• COMPUTER PROGRAMS & SCIENTIFIC CALCULATORS MAKE THE TASK EASIER
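As an example of a computer program doing the work, Python's standard-library `statistics` module provides both forms of standard deviation: `stdev` (sample, divides by n − 1) and `pstdev` (population, divides by n). The readings reuse the five values from the mean example.

```python
import statistics

readings = [10, 12, 8, 11, 12]  # the five values from the mean example

print(statistics.mean(readings))              # prints: 10.6
print(round(statistics.stdev(readings), 3))   # sample SD, divides by n - 1; prints: 1.673
print(round(statistics.pstdev(readings), 3))  # population SD, divides by n; prints: 1.497
```

Which one to report depends on whether the data are treated as a sample or as the whole population, which is one reason procedures vary with the purpose and type of data.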

Use of Standard Deviation:

For a normally distributed population, the band one standard deviation away from the mean in either direction contains about 68% of the population. Two standard deviations away from the mean account for roughly 95% of the population, and three standard deviations account for about 99.7% of the population.

If the curve were flatter and more spread out, the standard deviation would be larger, because a wider band would be needed to cover 68% of the population. So standard deviation tells you how spread out the values in a set are from the mean.

This is useful when comparing results for different things (drugs, equipment, etc.): standard deviation shows how widely the test scores vary for each specific thing being measured.

NORMAL DISTRIBUTION OR A BELL CURVE

[Figure: bell curve centered on the MEAN. Each colored band has a width of one standard deviation.]

GAUSSIAN CURVE

• SCORES ARE PLOTTED ON A GRAPH

• ALSO KNOWN AS:

NORMAL DISTRIBUTION CURVE

NORMAL DISTRIBUTION OR GAUSSIAN CURVE

Shows Normal Frequency:

– 68.27% of the values are within 1 standard deviation of the mean.

– 95.45% of the values are within 2 standard deviations of the mean (a common choice for accepting data).

– 99.73% of the values are within 3 standard deviations of the mean.
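These percentages follow from the shape of the normal curve: the fraction of values within k standard deviations of the mean equals erf(k / √2), where erf is the error function (available as `math.erf` in Python). A short sketch reproducing the figures, rounded to two decimals:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.2f}%")
# prints:
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```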

STANDARD DEVIATION: EXAMPLE: 2 SD: 10 12 10 12 13 9 } SD FROM 11 MEAN 14 53 ÷ 5 = 10.6 MEAN

• MOST RESEARCHERS CONSIDER DATA WITHIN +/- 2 SD OF THE MEAN TO BE VALID / ACCEPTABLE
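The +/- 2 SD acceptance rule can be sketched like the 10% rule, but with the band set by the standard deviation. This sketch assumes the sample standard deviation (`statistics.stdev`, dividing by n − 1); the function name is hypothetical. Applied to the same five values, it accepts all of them, whereas the 10% rule flagged 8 and 12.

```python
import statistics

def within_two_sd(values):
    """Return (low, high, outliers) using mean +/- 2 sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1), an assumption
    low, high = mean - 2 * sd, mean + 2 * sd
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers

low, high, outliers = within_two_sd([10, 12, 8, 11, 12])
print(round(low, 2), round(high, 2))  # prints: 7.25 13.95
print(outliers)                       # prints: []
```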

ACCURACY IS HOW CLOSE A RESULT IS TO THE TRUE VALUE,

WHEREAS

PRECISION REFERS TO THE REPRODUCIBILITY OF RESULTS, OR HOW CLOSE THE RESULTS ARE TO EACH OTHER

LABORATORY INSTRUMENTS MUST BE PRECISE AS WELL AS ACCURATE
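One way to put numbers on the accuracy/precision distinction is sketched below, under stated assumptions: accuracy is measured as the distance between the mean reading and a known true value, and precision as the sample standard deviation of repeated readings. The true value of 10.0 and both sets of readings are hypothetical.

```python
import statistics

TRUE_VALUE = 10.0  # hypothetical known value of a reference standard

def accuracy_and_precision(readings, true_value=TRUE_VALUE):
    """Return (error of the mean, sample SD) for repeated readings.

    Smaller error = more accurate (closer to the true value).
    Smaller SD    = more precise (readings closer to each other).
    """
    error = abs(statistics.mean(readings) - true_value)
    spread = statistics.stdev(readings)
    return error, spread

precise_not_accurate = [12.0, 12.1, 11.9, 12.0]  # tightly clustered, far from 10.0
accurate_not_precise = [8.0, 12.0, 9.0, 11.0]    # centered on 10.0, widely spread

for label, readings in (("precise, not accurate", precise_not_accurate),
                        ("accurate, not precise", accurate_not_precise)):
    error, spread = accuracy_and_precision(readings)
    print(f"{label}: error {error:.2f}, SD {spread:.2f}")
```

The first set has a small SD but a large error; the second has no error in its mean but a large SD, so neither instrument would meet the "precise as well as accurate" requirement.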


Coefficient of Variation

• Used to compare the precision of a new instrument with the precision of an old instrument

• CV = $\dfrac{\text{STANDARD DEVIATION}}{\text{MEAN AVERAGE}} \times 100\%$

OR

% DIFFERENCE = $\dfrac{\text{HIGH \#} - \text{LOW \#}}{\text{HIGH \#}} \times 100\%$
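The % difference formula can be sketched as a small function that takes the higher and lower of two results and expresses their gap as a percentage of the higher one. The readings are hypothetical mean results for the same specimen on an old and a new instrument, and `percent_difference` is a hypothetical name.

```python
def percent_difference(a, b):
    """% difference = (high - low) / high x 100, following the slide's formula."""
    high, low = max(a, b), min(a, b)
    if high == 0:
        raise ValueError("the higher value must be non-zero")
    return (high - low) / high * 100

# Hypothetical mean readings of the same specimen from an old and a new instrument.
old_reading, new_reading = 50.0, 48.0
print(round(percent_difference(old_reading, new_reading), 2))  # prints: 4.0
```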

COEFFICIENT OF VARIATION (CV) IS THE STANDARD DEVIATION RELATIVE TO THE MEAN OF THAT SAMPLE

• MAY BE EXPRESSED AS A % OF THE MEAN

CV = $\dfrac{s}{\bar{x}} \times 100$

THE PURPOSE OF DETERMINING THE COEFFICIENT OF VARIATION IS TO COMPARE THE VARIATION OF TWO DIFFERENT SAMPLES, OR THE PRECISION OF TWO DIFFERENT INSTRUMENTS OR METHODS
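Comparing two instruments by CV can be sketched as follows, assuming the sample standard deviation is used. Both reading sets are hypothetical repeated measurements of the same specimen; the "old" set simply reuses the slides' five example values. The instrument with the lower CV is the more precise one.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (sample standard deviation / mean) x 100, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical repeated readings of the same specimen on two instruments.
old_instrument = [10, 12, 8, 11, 12]
new_instrument = [10.4, 10.7, 10.5, 10.8, 10.6]

for name, readings in (("old", old_instrument), ("new", new_instrument)):
    print(name, round(coefficient_of_variation(readings), 1))
# prints:
# old 15.8
# new 1.5
```

Because CV divides by the mean, it stays comparable even when two methods report on different scales, which is why it is preferred over comparing raw standard deviations.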

Chi Square Analysis

• A STATISTICAL MEASURING INSTRUMENT THAT DETERMINES HOW WELL A SET OF DATA SUPPORT THE HYPOTHESIS OR EXPECTED VALUES

• MAJOR USE IS IN GENETICS

• EMPLOYS THE PUNNETT SQUARE

• PREDICTIONS ARE BASED ON PROBABILITY

Chi Square Analysis

• TESTS WHETHER THE COUNTS OF ITEMS IN VARIOUS CATEGORIES DEVIATE FROM EXPECTATIONS OR ARE THE SAME

• THE NULL HYPOTHESIS STATES THAT THE DATA MEET EXPECTATIONS, WITH LITTLE OR NO DIFFERENCE

• A PROBABILITY OF 0.05 OR LESS SHOWS A SIGNIFICANT DIFFERENCE FROM THE EXPECTED OBSERVATION / HYPOTHESIS

• THE SMALLER THE CHI SQUARE VALUE, THE CLOSER THE DATA ARE TO THE EXPECTED VALUES AND THE GREATER THE LIKELIHOOD THAT THEY SUPPORT THE HYPOTHESIS
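A minimal chi-square goodness-of-fit sketch, assuming the standard statistic χ² = Σ (observed − expected)² / expected. The cross and its counts are hypothetical: 400 offspring from a monohybrid cross where a Punnett square predicts a 3:1 ratio. With two categories there is 1 degree of freedom; 3.841 is the standard table critical value for df = 1 at p = 0.05, and for df = 1 the p-value can be computed with `math.erfc`.

```python
import math

def chi_square(observed, expected):
    """Chi-square statistic: sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical monohybrid cross: 400 offspring, Punnett-square prediction of 3:1.
observed = [290, 110]                        # dominant, recessive (made-up counts)
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]    # 300 and 100

chi2 = chi_square(observed, expected)        # 100/300 + 100/100, about 1.333
# For 1 degree of freedom, P(chi-square >= chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

CRITICAL_DF1_P05 = 3.841  # table value for df = 1, p = 0.05
if chi2 > CRITICAL_DF1_P05:
    print("significant difference: reject the null hypothesis")
else:
    print("no significant difference: data fit the 3:1 expectation")
```

Here χ² is well below 3.841 (equivalently, p is above 0.05), so the made-up counts would be taken as supporting the 3:1 hypothesis.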

IN SUMMARY:

• THERE ARE A VARIETY OF DATA ANALYSIS INSTRUMENTS

• EACH INSTRUMENT IS BEST SUITED TO MEASURE CERTAIN PARAMETERS OF DATA

• SCIENTISTS AND RESEARCHERS USE THESE INSTRUMENTS TO INTERPRET TEST RESULTS