Statistics 02

Post on 05-Feb-2016


Normal distribution

Also called normal curve or the Gaussian curve

Any variable whose value comes about as the result of summing the values of several independent, or almost independent, components can be modeled successfully as a normal distribution.

Normal distribution

• Three features of the normal distribution

• 1. symmetrical histogram

• 2. the mean of the sample is very close to that of the original population.

• 3. the standard deviation of the set of sample means will be very close to the original population standard deviation divided by the square root of the sample size, n.
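The second and third features can be checked with a small simulation. The sketch below is only illustrative: the population mean (50), standard deviation (10), sample size (25) and number of draws (2000) are hypothetical values chosen for the demonstration, not figures from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 50.0, 10.0   # hypothetical population parameters
N = 25                          # size of each sample
DRAWS = 2000                    # number of repeated samples

# Draw many samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(DRAWS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = POP_SD / N ** 0.5  # population SD divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.2f} (sigma/sqrt(n) = {expected_sd:.2f})")
```

With these settings the standard deviation of the sample means should come out close to 10/√25 = 2.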

Z score

• A raw score converted on the basis of the standard deviation. We convert a raw score to a z score to determine how many standard deviation units that raw score is above or below the mean.

• Z=(X-M)/s

Application of Z score

• Comparison of two scores from two tests

• Conversion to standardized score (T score): T=50+10Z

• Determining the proportion below a particular raw score: X < Score

• Statistical inference: Range estimation

Case

• Student A takes 2 tests with the following data:

• Test 1: Raw score=67. Mean=63, Standard deviation=3

• Test 2: R=56, M=51, s=4

• Question: What possible information can we obtain?

Case

• Two students take two different tests of English.

• Student A: RS=67, M=63, s=3

• Student B: RS=56, M=51, s=4

• Question 1: Which student is better in English?

• Question 2: Their T scores?
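Both questions can be worked through with the two formulas given earlier, Z=(X-M)/s and T=50+10Z. The following is a minimal sketch; the function names `z_score` and `t_score` are illustrative only.

```python
def z_score(raw, mean, sd):
    """Number of standard deviation units the raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Standardized T score as defined on the slide: T = 50 + 10Z."""
    return 50 + 10 * z


# (raw score, test mean, test standard deviation) from the case
students = {"A": (67, 63, 3), "B": (56, 51, 4)}

results = {}
for name, (raw, mean, sd) in students.items():
    z = z_score(raw, mean, sd)
    results[name] = (z, t_score(z))
    print(f"Student {name}: Z = {z:.2f}, T = {t_score(z):.1f}")
```

Student A's Z score (about 1.33) is slightly higher than Student B's (1.25), so A stands higher relative to the group that took the same test. Whether that means A is "better in English" depends on the two test groups being comparable, which the case does not state.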

Table of Normal Distribution

• Relation between Z score and Proportion

Case

• When we select a score randomly from the population, what is the probability that this score is below or above a certain score?

• That is: the probability that this score (X) is below a certain score (say, 60)

• X<60

Case

• Z < ?

• Z = (X-M)/s

• Therefore, the inequality X < 60 becomes:

• X-M < 60-M

• (X-M)/s < (60-M)/s

• Z < -1

• P = 0.1587

• The chance that we randomly select a score that is below 60 is 16%.
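The proportion read from the table can be cross-checked numerically. This sketch evaluates the standard normal cumulative distribution with the error function from Python's standard library; the helper name `normal_cdf` is illustrative only.

```python
import math


def normal_cdf(z):
    """Proportion of the standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_below = normal_cdf(-1.0)
print(round(p_below, 4))  # 0.1587, i.e. about 16%
```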

Case

• Xiamen University wants to give the freshmen a placement test upon admission and put them into 5 levels of English learning. Work out a plan for this test and inform the students before the test of the scores required for each level.

• Total of freshmen: 5000

• Classes for each level:

• B0: 4

• B1: remaining

• B2: 20

• B3: 8

• B4: 4

• Normal class size: 35

Level   Classes   Number   %       Z (upper boundary)   Cut-off (upper boundary)

Sub     4         140      0.028   -1.90                44.6

1       109       3810     0.762   0.80                 60.8

2       20        700      0.14    1.5                  65

3       6         210      0.042   1.90                 67.4

4       4         140      0.028
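The Z boundaries and cut-off scores in the table can be approximately reproduced by accumulating the proportions from the lowest level upward and looking up the Z score for each cumulative proportion. The sketch below makes one loud assumption: the slides do not state the test mean and standard deviation, so the values 56 and 6 are inferred from the cut-offs 44.6 and 67.4 at Z = ±1.90 and should be treated as hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION: mean and SD are not given in the slides; 56 and 6 are
# inferred from the cut-off scores 44.6 and 67.4 at Z = -1.90 and 1.90.
ASSUMED_MEAN, ASSUMED_SD = 56.0, 6.0

# Proportion of freshmen per level, lowest level first (from the table).
proportions = [0.028, 0.762, 0.14, 0.042, 0.028]

std_normal = NormalDist()  # mean 0, standard deviation 1
cumulative = 0.0
boundaries = []            # (Z, cut-off score) between adjacent levels

for p in proportions[:-1]:  # the top level has no upper boundary
    cumulative += p
    z = std_normal.inv_cdf(cumulative)
    boundaries.append((z, ASSUMED_MEAN + z * ASSUMED_SD))

for z, score in boundaries:
    print(f"Z = {z:+.2f}, cut-off score = {score:.1f}")
```

The computed values differ slightly from the table because the table rounds each Z score before converting it to a raw score.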

Statistical inference

• Use a collection of observed values to make inferences about a larger set of potential values.

• Classical problem of statistical inference: how to infer from the properties of a part the likely properties of the whole.

• Because of the way in which samples are selected, it is often impossible to generalize beyond the samples.

Population

• The largest class to which we can generalize the results of an investigation based on a subclass, in other words, the set of all possible values of a variable.

• A population, for statistical purposes, is a set of values.

• We need to be sure that the values that constitute the sample somehow reflect the target statistical population.

Sampling

• Random sampling gives us reasonable confidence that our inferences from sample values to population values are valid.

• The most common type of sampling frame is a list (actual or notional) of all the subjects in the group to which generalization is intended.

• What the techniques of statistics offer is a common ground, a common measuring stick by which experimenters can measure and compare the strength of evidence for one hypothesis or another that can be obtained from a sample of subjects.

Sampling

• Careful consideration is needed to ensure the sample represents the population. E.g., a study of the gravity of errors in written English as perceived by two different groups: native English-speaking teachers of English and Greek teachers of English. Both samples contained individuals from different institutions to avoid institutional attitude bias.

• Researchers have an inescapable duty of describing carefully how their experimental material -- including subjects -- was actually obtained. It is also a good practice to attempt to foresee some of the objections that might be made about the quality of the material and either attempt to forestall criticism or admit openly to any serious defects.

Case Study

• Study the population and sample for the following investigations:

• Vocabulary size

• Listening input and listening comprehension

• Social backgrounds and learning strategy

Random Sampling

• Use the Table of Random Numbers

• Other methods
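One of the "other methods" is to let a pseudo-random number generator pick subjects from the sampling frame. The sketch below is a minimal illustration: the frame of 5000 student IDs, the sample size of 100 and the seed are all hypothetical.

```python
import random

# Hypothetical sampling frame: a list of every subject in the population.
sampling_frame = [f"student_{i:04d}" for i in range(1, 5001)]

rng = random.Random(2016)  # seeded only so the draw can be repeated

# Simple random sampling without replacement: every subject in the frame
# has the same chance of being chosen, and nobody is chosen twice.
sample = rng.sample(sampling_frame, k=100)

print(sample[:5])
```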

Statistic Parameters

• Population parameters

• Mean: μ (mu, [mjuː], English correspondent: m)

• Standard deviation: σ (sigma, [ˈsɪɡmə], English correspondent: s)

• Sample parameters

• Mean: M

• Standard deviation: s

Other Greek Letters

• Σ sigma, symbol of sum, English correspondent: S

• ε: epsilon, symbol of error, English correspondent: e

• α: alpha

• χ: chi [kai], English correspondent: x

Parameter Estimation

• Point estimator: a single number calculated from a sample and used to estimate a population parameter.

• Interval estimator: a likely range within which the population value may lie.

Standard error of the sample means

• If we repeatedly draw samples from the population and calculate the mean of each sample, these means will follow a normal distribution. The variability of these means around the population mean is called the standard error of the sample means, and is calculated as follows:

• Standard error σx = σ/√n

• When the population standard deviation σ is unknown, we often use the sample standard deviation s: σx = s/√n

Case

• If the following data are obtained from a test

• N=132

• M=67

• S=6.5

• What is the standard error of the sample means?

Case

• σx = s/√n

• =6.5/√132

• =6.5/11.49

• =0.566
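The same arithmetic as a short sketch; the function name `standard_error` is illustrative only.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample means: s / sqrt(n)."""
    return sd / math.sqrt(n)


se = standard_error(6.5, 132)
print(round(se, 3))  # 0.566
```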

Confidence

• The probability with which we are confident that the value falls within the estimated range, usually 95% or 99%.

• Procedure: calculate the Z score

• Look up in the Normal Distribution Table the critical value Zα/2 that corresponds to a tail probability of α/2.

• Compare Z and Zα/2

Case

• N=132, M=67, S=6.5

• μ = ? (α = 0.05)

Case

• Z=(X-M)/s

• =(X-μ)/ σx

• =(67-μ)/0.566

• -Zα/2 ≤ Z ≤ Zα/2

• -1.96 ≤ (67-μ)/0.566 ≤ 1.96

• -1.10936 ≤ 67-μ ≤ 1.10936

• -68.10936 ≤ -μ ≤ -65.89064

• 65.89064 ≤ μ ≤ 68.10936
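The interval can be recomputed without a printed table. This sketch takes the critical value from Python's `statistics.NormalDist` instead of the Normal Distribution Table; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

n, sample_mean, sample_sd = 132, 67.0, 6.5  # data from the case
alpha = 0.05

se = sample_sd / math.sqrt(n)                 # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

lower = sample_mean - z_crit * se
upper = sample_mean + z_crit * se
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The result agrees with the hand calculation to two decimal places. Small differences further out come from the slide rounding the standard error to 0.566 and the critical value to 1.96.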

t Distribution

• When the sample size is less than 30, the sample means follow the t distribution.

• The t distribution is a family of curves

• Degree of freedom: the number of conditions that are free to vary. In the t distribution, df=n-1

Case

• Sample mean=63.16

• Sample standard deviation=7.25

• N=19

• μ = ? (α = 0.05)

Case

• Standard error = s/√19 = 7.25/4.36 = 1.66

• t = (X-μ)/σx

• = (63.16-μ)/1.66

• t0.05/2(18) = 2.101

• -2.101 ≤ (63.16-μ)/1.66 ≤ 2.101

• -3.48766 ≤ 63.16-μ ≤ 3.48766

• -66.64766 ≤ -μ ≤ -59.67234

• 59.7 ≤ μ ≤ 66.6
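The small-sample interval follows the same steps with the t critical value in place of Z. Python's standard library has no t-distribution quantile function, so this sketch simply hard-codes the table value 2.101 for df = 18 quoted in the case; the variable names are illustrative only.

```python
import math

n, sample_mean, sample_sd = 19, 63.16, 7.25  # data from the case
df = n - 1                                   # degrees of freedom = 18

# Critical value t(0.05/2, df=18), copied from the case (table lookup),
# not computed here.
T_CRIT = 2.101

se = sample_sd / math.sqrt(n)  # standard error of the mean
lower = sample_mean - T_CRIT * se
upper = sample_mean + T_CRIT * se
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Because the standard error is not rounded to 1.66 before multiplying, the bounds differ from the hand calculation in the second decimal place.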