Statistics 02


Transcript of Statistics 02

Page 1: Statistics 02

Statistics 02

Page 2: Statistics 02

Normal distribution

Also called normal curve or the Gaussian curve

Any variable whose value comes about as the result of summing the values of several independent, or almost independent, components can be modeled successfully as a normal distribution.

Page 3: Statistics 02

Normal distribution

Page 4: Statistics 02

Normal distribution

• Three features of the normal distribution

• 1. symmetrical histogram

• 2. the mean of the sample is very close to that of the original population.

• 3. the standard deviation of the set of sample means will be very close to the original population standard deviation divided by the square root of the sample size, n.
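Features 2 and 3 can be checked with a small simulation. The sketch below is only an illustration: the population (10,000 scores with mean near 50 and standard deviation near 10), the sample size of 25, and the number of samples are all hypothetical choices, not values from the slides.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (not from the slides): 10,000 normally distributed scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 25  # sample size
# Draw 2,000 samples (with replacement) and record the mean of each one.
sample_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = pop_sd / n ** 0.5  # population SD divided by sqrt(n)

print(f"population mean = {pop_mean:.2f}, mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means = {sd_of_means:.2f}, population SD / sqrt(n) = {expected_sd:.2f}")
```

The two printed pairs should be close to each other, which is what features 2 and 3 describe.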

Page 5: Statistics 02

Z score

• A raw score converted on the basis of the standard deviation. We convert a raw score to a Z score to determine how many standard deviation units that raw score is above or below the mean.

• Z=(X-M)/s

Page 6: Statistics 02

Application of Z score

• Comparison of two scores from two tests

• Conversion to standardized score (T score): T=50+10Z

• Determining the proportion below a particular raw score: X < Score

• Statistical inference: range estimation

Page 7: Statistics 02

Case

• Student A takes 2 tests with the following data:

• Test 1: Raw score=67, Mean=63, Standard deviation=3

• Test 2: R=56, M=51, s=4

• Question: What possible information can we obtain?

Page 8: Statistics 02

Case

• Two students take two different tests of English.

• Student A: RS=67, M=63, s=3

• Student B: RS=56, M=51, s=4

• Question 1: Which student is better in English?

• Question 2: Their T scores?
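One way to work through this case is to apply Z=(X-M)/s and T=50+10Z to each student. The sketch below uses the figures given above; the function and variable names are illustrative only.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Standardized T score as defined in the slides: T = 50 + 10Z."""
    return 50 + 10 * z


# (raw score, test mean, test standard deviation) for each student
students = {"A": (67, 63, 3), "B": (56, 51, 4)}

results = {}
for name, (raw, mean, sd) in students.items():
    z = z_score(raw, mean, sd)
    results[name] = (z, t_score(z))
    print(f"Student {name}: Z = {z:.2f}, T = {t_score(z):.1f}")
```

On these figures Student A's Z score (about 1.33) is slightly higher than Student B's (1.25), so A stands a little further above the mean of their own test.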

Page 9: Statistics 02

Table of Normal Distribution

• Relation between Z score and Proportion

Page 10: Statistics 02

Case

• When we select a score randomly from the population, what is the probability that this score falls below or above a certain score?

• That is, the probability that this score (X) is less than a certain score (say, 60):

• X<60

Page 11: Statistics 02

Case

• Z<?

• Z=(X-M)/s

• Therefore, the inequality X<60 becomes:

• X-M < 60-M

• (X-M)/s < (60-M)/s

• Z<-1

• P=0.1587

• The chance that we randomly select a score that is below 60 is 16%.
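The table lookup for Z<-1 can also be reproduced numerically. This is a minimal sketch using the standard normal cumulative distribution function written with `math.erf`; the helper name `normal_cdf` is an illustrative choice.

```python
import math


def normal_cdf(z):
    """Proportion of the standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_below = normal_cdf(-1.0)
print(f"P(Z < -1) = {p_below:.4f}")
```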

Page 12: Statistics 02

Case

• Xiamen University wants to give the freshmen a placement test upon admission and put them into 5 levels of English learning. Work out a plan for this test and inform the students before the test of the scores required for each level.

• Total of freshmen: 5000

• Classes for each level:

• B0: 4

• B1: remaining

• B2: 20

• B3: 8

• B4: 4

• Normal class size: 35

Page 13: Statistics 02

Level   Classes   Number   %       Z       Cut-off
Sub     4         140      0.028   -1.90   44.6
1       109       3810     0.762   0.80    60.8
2       20        700      0.14    1.5     65
3       6         210      0.042   1.90    67.4
4       4         140      0.028
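The Z values and cut-off scores for this placement case can be approximated from the cumulative proportion of students at or below each level. The sketch below relies on one loud assumption: the slides do not state the test mean or standard deviation, so the values 56 and 6 used here are back-calculated from the tabulated cut-offs and should be treated as hypothetical.

```python
from statistics import NormalDist

# Proportion of freshmen in each level, lowest level first (from the case figures).
proportions = [0.028, 0.762, 0.14, 0.042, 0.028]

# ASSUMPTION: mean and SD are not given in the slides; 56 and 6 are inferred
# from the tabulated cut-off scores and are used here only for illustration.
MEAN, SD = 56.0, 6.0

std_normal = NormalDist()  # standard normal: mean 0, SD 1
boundaries = []
cumulative = 0.0
for p in proportions[:-1]:  # the top level has no upper boundary
    cumulative += p
    z = std_normal.inv_cdf(cumulative)  # Z below which this share of scores falls
    boundaries.append((round(z, 2), round(MEAN + z * SD, 1)))

for z, cutoff in boundaries:
    print(f"Z = {z:5.2f}  cut-off = {cutoff}")
```

Because the table rounds its Z values, the exact quantiles computed here differ slightly from the tabulated ones.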

Page 14: Statistics 02

Statistical inference

• Use a collection of observed values to make inferences about a larger set of potential values.

• Classical problem of statistical inference: how to infer from the properties of a part the likely properties of the whole.

• Because of the way in which samples are selected, it is often impossible to generalize beyond the samples.

Page 15: Statistics 02

Population

• The largest class to which we can generalize the results of an investigation based on a subclass, in other words, the set of all possible values of a variable.

• A population, for statistical purposes, is a set of values.

• We need to be sure that the values that constitute the sample somehow reflect the target statistical population.

Page 16: Statistics 02

Sampling

• Random sampling gives us reasonable confidence that our inferences from sample values to population values are valid.

• The most common type of sampling frame is a list (actual or notional) of all the subjects in the group to which generalization is intended.

• What the techniques of statistics offer is a common ground, a common measuring stick by which experimenters can measure and compare the strength of evidence for one hypothesis or another that can be obtained from a sample of subjects.

Page 17: Statistics 02

Sampling

• Careful consideration is needed to ensure the sample represents the population. E.g., a study of the gravity of errors in written English as perceived by two different groups: native English-speaking teachers of English and Greek teachers of English. Both samples contained individuals from different institutions to avoid institutional attitude bias.

• Researchers have an inescapable duty of describing carefully how their experimental material -- including subjects -- was actually obtained. It is also a good practice to attempt to foresee some of the objections that might be made about the quality of the material and either attempt to forestall criticism or admit openly to any serious defects.

Page 18: Statistics 02

Case Study

• Study the population and sample for the following investigations:

• Vocabulary size

• Listening input and listening comprehension

• Social backgrounds and learning strategy

Page 19: Statistics 02

Random Sampling

• Use the Table of Random Numbers

• Other methods
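As one example of another method, a pseudo-random number generator can play the role of the random number table. The sketch below assumes a hypothetical sampling frame of 500 student IDs and a sample of 30; neither figure comes from the slides.

```python
import random

rng = random.Random(42)  # fixed seed so the same sample can be reproduced

# Hypothetical sampling frame: a list of all subjects in the target group.
sampling_frame = [f"S{i:04d}" for i in range(1, 501)]

# Simple random sample without replacement: every subject has the same
# chance of selection and no subject is chosen twice.
sample = rng.sample(sampling_frame, k=30)

print(sample[:5])
```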

Page 20: Statistics 02

Statistical Parameters

• Population parameters

• Mean: μ (mu [mjuː], English correspondent: m)

• Standard deviation: σ (sigma [ˈsɪɡmə], English correspondent: s)

• Sample parameters

• Mean: M

• Standard deviation: s

Page 21: Statistics 02

Other Greek Letters

• Σ: capital sigma, symbol of sum, English correspondent: S

• ε: epsilon, symbol of error, English correspondent: e

• α: alpha

• χ: chi [kai], English correspondent: x

Page 22: Statistics 02

Parameter Estimation

• Point estimator: a single number calculated from a sample and used to estimate a population parameter.

• Interval estimator: a likely range within which the population value may lie.

Page 23: Statistics 02

Standard error of the sample means

• If we repeatedly draw samples from the population and calculate the mean of each sample, these means will be approximately normally distributed. The variability of these means around the population mean is called the standard error of the sample means, and is calculated as follows:

• Standard error: σx = σ/√n

• When the population standard deviation σ is unknown, we often use the sample standard deviation s: σx = s/√n

Page 24: Statistics 02

Case

• If the following data are obtained from a test

• N=132

• M=67

• S=6.5

• What is the standard error of the sample means?

Page 25: Statistics 02

Case

• σx = s/√n

• =6.5/√132

• =6.5/11.49

• =0.566
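The same arithmetic can be sketched in a few lines, using the case figures s=6.5 and n=132. The function name is illustrative.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample means: sd divided by the square root of n."""
    return sd / math.sqrt(n)


se = standard_error(6.5, 132)
print(round(se, 3))  # 0.566
```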

Page 26: Statistics 02

Confidence

• The probability with which we are confident that the value falls within a given range, usually 95% or 99%.

• Procedure: calculate the Z score

• Look up in the Normal Distribution Table the critical value Zα/2, the Z score that corresponds to a tail probability of α/2.

• Compare Z and Zα/2

Page 27: Statistics 02

Case

• N=132, M=67, S=6.5

• μ=? (α=0.05)

Page 28: Statistics 02

Case

• Z=(X-M)/s

• Applied to the sample mean: Z=(M-μ)/σx

• =(67-μ)/0.566

• -Zα/2 ≤ Z ≤ Zα/2

• -1.96 ≤ (67-μ)/0.566 ≤ 1.96

• -1.10936 ≤ 67-μ ≤ 1.10936

• -68.10936 ≤ -μ ≤ -65.89064

• 65.89064 ≤ μ ≤ 68.10936
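The interval for μ can be reproduced as a short calculation. This sketch takes the critical value from Python's `statistics.NormalDist` instead of the printed table and keeps the standard error unrounded, so its result differs from the hand calculation in the third decimal place. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sample_mean, sample_sd = 132, 67.0, 6.5
alpha = 0.05

se = sample_sd / math.sqrt(n)                 # standard error of the sample means
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
margin = z_crit * se

lower, upper = sample_mean - margin, sample_mean + margin
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```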

Page 29: Statistics 02

t Distribution

• When the sample size is less than 30, the sample means follow a t distribution rather than the normal distribution.

• The t distribution is a family of curves, one for each number of degrees of freedom.

• Degrees of freedom: the number of conditions that are free to vary. In the t distribution, df=n-1

Page 30: Statistics 02

Case

• Sample mean=63.16

• Sample standard deviation=7.25

• N=19

• μ=? (α=0.05)

Page 31: Statistics 02

Case

• Standard error=s/√19=7.25/4.36=1.66

• Z=(X-M)/s

• Applied to the sample mean: t=(M-μ)/σx

• =(63.16-μ)/1.66

• t0.05/2(18)=2.101

• -2.101 ≤ (63.16-μ)/1.66 ≤ 2.101

• -3.48766 ≤ 63.16-μ ≤ 3.48766

• -66.64766 ≤ -μ ≤ -59.67234

• 59.7 ≤ μ ≤ 66.6
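The small-sample interval follows the same pattern, with the t critical value in place of 1.96. Python's standard library has no t-distribution quantile function, so this sketch simply reuses the tabulated value 2.101 for df=18 quoted in the case; the variable names are illustrative.

```python
import math

n, sample_mean, sample_sd = 19, 63.16, 7.25
df = n - 1  # degrees of freedom

# Critical value taken from the case above: t(0.05/2, df=18) = 2.101.
T_CRIT = 2.101

se = sample_sd / math.sqrt(n)  # standard error of the sample means
margin = T_CRIT * se

lower, upper = sample_mean - margin, sample_mean + margin
print(f"df = {df}: {lower:.1f} <= mu <= {upper:.1f}")
```

Keeping the standard error unrounded shifts the bounds by less than 0.01 compared with the hand calculation, which rounds it to 1.66.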