1/54 Statistics Descriptive Statistics— Tables and Graphics.
-
date post
19-Dec-2015 -
Category
Documents
-
view
229 -
download
2
Transcript of 1/54 Statistics Descriptive Statistics— Tables and Graphics.
1/54
Statistics
Descriptive Statistics—Tables and Graphics
2/54
Summarizing Qualitative Data Summarizing Quantitative Data The Stem-and-Leaf Display(枝葉圖 ) Cross-tabulations and Scatter Diagrams
Contents
STATISTICS in PRACTICE
The Colgate-Palmolive Company uses statistics in its quality assurance program for home laundry detergent products.
One concern is customersatisfaction with the quantity ofdetergent in a carton. To control theproblem of heavy detergent powder, limits are placed on the acceptable range of powder density.
STATISTICS in PRACTICE
Statistical samples are taken and the density of each powder sample is measured.
Data summaries are then providedfor operating personnel so thatcorrective action can be taken if necessary to keep the density within the desired quality.
Summarizing Qualitative Data Frequency Distribution Relative Frequency Distributions Percent Frequency Distributions Bar Graphs Pie Charts
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Summarizing Qualitative Data -- Frequency Distribution
Example: Data from a sample of 50 Soft Drink Purchases
Summarizing Qualitative Data -- Frequency Distribution
Frequency Distribution
Soft Drink FrequencyCoke Classic 19Diet Coke 8Dr. Pepper 5Pepsi-Cola 13Sprite 5 Total 50
Summarizing Qualitative Data -- Frequency Distribution
Example: Marada Inn, Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor.
Summarizing Qualitative Data -- Frequency Distribution
The ratings provided by a sample of 20 guests are:
Below Average Above Average Above Average Average Above Average Average Above Average
Average Above Average Below Average Poor Excellent Above Average Average
Above Average Above Average Below Average Poor Above Average Average Average
Summarizing Qualitative Data -- Frequency Distribution
PoorBelow AverageAverageAbove AverageExcellent
2 3 5 9 1
Total 20
Rating Frequency
Summarizing Qualitative Data -- Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
Frequency of the classRelative frequency of a class= n
Summarizing Qualitative Data – Relative Frequency
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Summarizing Qualitative Data – Relative Frequency Distribution
Example: Relative and Percent Frequency Distribution of Soft Drink Purchases
Soft Drink Relative FrequencyCoke Classic .38Diet Coke .16Dr. Pepper .10Pepsi-Cola .26Sprite .10 Total 1.00
Summarizing Qualitative Data – Relative Frequency Distribution
Summarizing Qualitative Data – Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Example: Percent Frequency Distribution of Soft Drink Purchases
Soft Drink Percent FrequencyCoke Classic 38Diet Coke 16Dr. Pepper 10Pepsi-Cola 26Sprite 10 Total 100
Summarizing Qualitative Data – Percent Frequency Distribution
PoorBelow AverageAverageAbove AverageExcellent
.10 .15 .25 .45 .05
Total 1.00
10 15 25 45 5 100
RelativeFrequency
PercentFrequencyRating
.10(100) = 10
1/20 = .05
Summarizing Qualitative Data – Relative and Percent Frequency Distribution
Summarizing Qualitative Data – Bar Graph
A bar graph is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis)
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
Example: Bar Graph of Soft Drink Purchases
Summarizing Qualitative Data – Bar Graph
PoorPoor BelowAverageBelowAverage
AverageAverage AboveAverage AboveAverage
ExcellentExcellent
Fre
qu
en
cy
Fre
qu
en
cy
RatingRating
1122
33
44
55
66
77
88
991010
Marada Inn Quality Ratings
Summarizing Qualitative Data – Bar Graph
Summarizing Qualitative Data – Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
Example: Pie Chart of Soft Drink Purchases
Summarizing Qualitative Data – Pie Chart
BelowAverage 15%
BelowAverage 15%
Average 25%Average 25%
AboveAverage 45%
AboveAverage 45%
Poor10%Poor10%
Excellent 5%Excellent 5%
Marada Inn Quality Ratings Marada Inn Quality Ratings
Summarizing Qualitative Data – Pie Chart--Example
Summarizing Qualitative Data – Pie Chart--Example
Insights Gained from the Preceding Pie Chart One-half of the customers surveyed gave Marada
a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
Summarizing Quantitative Data
Frequency Distribution Relative Frequency and Percent Frequency
Distributions Dot Plot Histogram Cumulative Distributions Ogive(折線圖 )
The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the number of nonoverlapping classes. we recommend using between 5 and 20 classes. 2. Determine the width of each class.
3. Determine the class limits.
Largest data value –Smallest data valueApproximate class width= Number of classes
Summarizing Quantitative Data-- Frequency Distribution
Summarizing Quantitative Data-- Frequency Distribution
Use between 5 and 20 classes. Data sets with a larger number of elements
usually require a larger number of classes. Smaller data sets usually require fewer classes
Guidelines for Selecting Width of Classes
Use classes of equal width.
Approximate Class Width =
Largest data value –Smallest data value
Number of classes
Summarizing Quantitative Data-- Frequency Distribution
Example: These data show the time in days required to complete year-end audits for a sample of 20 clients of Sanderson and Clifford, a small public accounting firm with the data rounded to the nearest day. YEAR-END AUDIT
TIMES (IN DAYS)12 14 19 18 15 15 18 1720 27 22 2322 21 33 2814 18 16 13
Summarizing Quantitative Data-- Frequency Distribution
Example:
1. Number of classes = 5
2. provides an approximate class width of
(33 — 12)/5= 4.2.
3. We therefore decided to round up and use a class width of
five days in the frequency distribution.
Audit Time Frequency (days) 10-14 4 15-19 8 20-24 5 25-29 2 30-34 1 Total 20
Summarizing Quantitative Data-- Frequency Distribution
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Summarizing Quantitative Data-- Frequency Distribution
Sample of Parts Cost for 50 Tune-ups
91 78 93 57 75 52 99 80 97 6271 69 72 89 66 75 79 75 72 76104 74 62 68 97 105 77 65 80 10985 97 88 68 83 68 71 69 67 7462 82 98 101 79 105 79 69 62 73
Summarizing Quantitative Data-- Frequency Distribution
For Hudson Auto Repair, if we choose six classes:
50-59 60-69 70-79 80-89 90-99 100-109
2 13 16 7 7 5Total 50
Parts Cost ($)Frequency
Approximate Class Width = (109 - 52)/6 = 9.5 10
Summarizing Quantitative Data-- Frequency Distribution
50-59 60-69 70-79 80-89 90-99 100-109
PartsCost ($)
.04 .26 .32 .14 .14 .10Total 1.00
RelativeFrequency
4 26 32 14 14 10 100
Percent Frequency
2/50 .04(100)
Summarizing Quantitative Data-- Relative Frequency andPercent Frequency Distributions
• Only 4% of the parts costs are in the $50-59 class.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
• 30% of the parts costs are under $70.
• 10% of the parts costs are $100 or more.
• Insights Gained from the Percent Frequency
Distribution
Summarizing Quantitative Data-- Relative Frequency andPercent Frequency Distributions
Dot Plot One of the simplest graphical summaries of
data is a dot plot. A horizontal axis shows the range of data
values. Then each data value is represented by a dot
placed above the axis. Example: Dot Plot for The Audit Time Data
Dot Plot
50 60 70 80 90 100 11050 60 70 80 90 100 110
Cost ($)Cost ($)
Dot Plot
Tune-up Parts Cost
. . . ..... .......... .. . .. . . ... . .. . . . . ..... .......... .. . .. . . ... . .. .
. . . .. . . . . .. . . . . .. .. .. .. . . . .. .. .. .. . .
Histogram Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency , relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Histogram
Example: Histogram for The Audit Time Data
Histogram
22
44
66
88
1010
1212
1414
1616
1818
PartsCost ($) PartsCost ($)
Fre
qu
ency
Fre
qu
ency
50-59 60-69 70-79 80-89 90-99 100-11050-59 60-69 70-79 80-89 90-99 100-110
Tune-up Parts Cost
Histogram provides information about the shape. Symmetric
Left tail is the mirror image of the right tail Examples: heights and weights of people
HistogramR
ela
tive
Freq
uen
cyR
ela
tive
Freq
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Rel
ativ
e F
requ
ency
Rel
ativ
e F
requ
ency
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Histogram
Moderately Skewed Left A longer tail to the left Example: exam scores
Rel
ativ
e F
requ
ency
Rel
ativ
e F
requ
ency
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Moderately Right Skewed A Longer tail to the right Example: housing values
Histogram
Rel
ativ
e F
requ
ency
Rel
ativ
e F
requ
ency
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Histogram
Highly Skewed Right A very long tail to the right Example: executive salaries
Cumulative frequency distribution - shows the number of items with values less than or equal to the upper limit of each class.. Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative Distributions
Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
Cumulative Distributions
Example: Cumulative Frequency, Cumulative Relative Frequency and Cumulative Percent
Frequency Distributions for the Audit Data.
Cumulative Distributions
Hudson Auto Repair
< 59 < 69 < 79 < 89 < 99< 109
Cost ($) CumulativeFrequency
CumulativeRelativeFrequency
CumulativePercent Frequency
2 15 31 38 45 50
.04 .30 .62 .76 .90 1.00
4 30 62 76 90 100
2 + 13
15/50 .30(100)
Ogive
• An ogive is a graph of a cumulative distribution.• The data values are shown on the horizontal
axis.• Shown on the vertical axis are the:
• cumulative frequencies, or• cumulative relative frequencies, or• cumulative percent frequencies
Ogive
• The frequency (one of the above) of each class is plotted as a point.
• The plotted points are connected by straight lines.
Ogive
These gaps are eliminated by plotting points halfway between the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
Hudson Auto Repair Because the class limits for the parts-
cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
Ogive Example: Ogive for the Audit Time Data.
PartsCost ($) PartsCost ($)
2020
4040
6060
8080
100100
Cu
mu
lati
ve P
erc
en
t Fr
eq
uen
cyC
um
ula
tive P
erc
en
t Fr
eq
uen
cy
50 60 70 80 90 100 11050 60 70 80 90 100 110
(89.5, 76)
Ogive with
Cumulative Percent Frequencies Tune-up Parts CostTune-up Parts Cost
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
Stem-and-Leaf Display
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Stem-and-Leaf Display
Example: Number of Questions Answered Correctly on An Aptitude Test
Stem-and-Leaf Display
Stem: The numbers to the left of the vertical line (6, 7, 8, 9, 10, 11, 12, 13, and 14).
Leaf: each digit to the right of the vertical line.
Stem-and-Leaf Display Although the stem-and-leaf display may
appear to offer the same information as a histogram, it has two primary advantages. The stem-and-leaf display is easier to construct by
hand. Within a class interval, the stem-and-leaf display
provides more information than the histogram because the stem-and-leaf shows the actual data.
Example: Hudson Auto Repair
The manager of Hudson Auto
would like to have a better
understanding of the cost
of parts used in the engine
tune-ups performed in the
shop. She examines 50 customer invoices for
tune-ups. The costs of parts , rounded to the
nearest dollar, are listed on the next slide.
Example: Hudson Auto Repair Sample of Parts Cost for 50 Tune-ups
91 78 93 57 75 52 99 80 97 6271 69 72 89 66 75 79 75 72 76104 74 62 68 97 105 77 65 80 10985 97 88 68 83 68 71 69 67 7462 82 98 101 79 105 79 69 62 73
Stem-and-Leaf Display
56789
10
2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9
a stema leaf
Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).
Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value corresponds to leaf values of 5 - 9.
Stretched Stem-and-Leaf Display
5 5 91 47 7 7 8 91 35 8 90 0 2 35 5 5 6 7 8 9 9 91 1 2 2 3 4 45 6 7 8 8 8 9 9 92 2 2 2725
566778899
1010
Stem-and-Leaf Display
Leaf Units
• Where the leaf unit is not shown, it is assumed to equal 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.• In the preceding example, the leaf unit was 1.
• A single digit is used to define each leaf.
Example: Leaf Unit = 0.1
If we have data with values such as
8 91011
Leaf Unit = 0.16 8
1 42
0 7
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be
Example: Leaf Unit = 10
If we have data with values such as
16 17 18 19
Leaf Unit = 108
1 90 3
1 7
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be
The 82 in 1682is rounded downto 80 and isrepresented as an 8.
Cross-tabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.
Cross-tabulation and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.
Crosstabulation
The left and top margin labels define the classes for the two variables.
Crosstabulation can be used when:• one variable is qualitative and the other is• quantitative,• both variables are qualitative, or• both variables are quantitative.
A crosstabulation is a tabular summary of data for two variables.
Crosstabulation
Example: Data from Zagat’s Restaurant Review. Data on a restaurant’s quality rating and
typical meal price are reported. Quality rating is a qualitative variable with
rating categories of good, very good, and
excellent. Meal price is a quantitative variable
that ranges from $10 to $49.
Crosstabulation
Crosstabulation
Example: Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants
PriceRange Colonial Log Split A-FrameTotal
< $99,000> $99,000
18 6 19 12 55
45
30 20 35 15Total 100
12 14 16 3
Home Style
Crosstabulation• Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. quantitativ
e variable
qualitative variable
Crosstabulation Insights Gained from Preceding Crosstabulation
• Only three homes in the sample are an A-Frame style and priced at more than $99,000.
• The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.
PriceRange Colonial Log Split A-FrameTotal
< $99,000> $99,000
18 6 19 12 55
45
30 20 35 15Total 100
12 14 16 3
Home Style
CrosstabulationFrequency distributionfor the price variable
Frequency distributionfor the home style variable
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
PriceRange Colonial Log Split A-Frame Total
< $99,000
> $99,000
32.73 10.91 34.55 21.82 100
100
Note: row totals are actually 100.01 due to rounding.
26.67 31.11 35.56 6.67
Home Style
(Colonial and > $99K)/(All >$99K) × 100 = (12/45) × 100
Crosstabulation: Row Percentages
PriceRange Colonial Log Split A-Frame
< $99,000> $99,000
60.00 30.00 54.29 80.0040.00 70.00 45.71 20.00
Home Style
100 100 100 100Total
(Colonial and > $99K)/(All Colonial) × 100 = (12/30) × 100
Crosstabulation: Column Percentages
Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
Simpson’ Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
• One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
Scatter Diagram and Trendline
• The general pattern of the plotted points suggests the overall relationship between the variables.
Scatter Diagram and Trendline
• A trendline is an approximation of the
relationship.
Scatter Diagram and Trendline A Positive Relationship
x
y
Scatter Diagram and Trendline A Negative Relationship
x
y
Scatter Diagram and Trendline
No Apparent Relationship
x
y
Example: Panthers Football Team
Scatter Diagram
The Panthers football team is interested
in investigating the relationship, if any,
between interceptions made and points scored.
13213
1424181730
x = Number ofInterceptions
y = Number of Points Scored
Scatter Diagramy
x
Number of Interceptions
Num
ber
of
Poin
ts S
core
d
510
15
2025
30
0
35
1 2 30 4
• Insights Gained from the Preceding Scatter Diagram
• The relationship is not perfect; all plotted points in the scatter diagram are not on a straight line.
• Higher points scored are associated with a higher number of interceptions.
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
Example: Panthers Football Team
Tabular and Graphical Procedures
Qualitative DataQualitative Data Quantitative DataQuantitative Data
TabularMethods TabularMethods
TabularMethods TabularMethods
Graphical MethodsGraphical Methods
Graphical MethodsGraphical Methods
• Frequency Distribution• Rel. Freq. Dist.• Percent Freq. Distribution• Crosstabulation
• Bar Graph• Pie Chart
• Frequency Distribution• Rel. Freq. Dist.• Cum. Freq. Dist.• Cum. Rel. Freq. Distribution • Stem-and-Leaf Display• Crosstabulation
• Dot Plot• Histogram• Ogive• Scatter Diagram
DataData