Lecture II
Uploaded by ankit-verma
Bivariate Frequency Distribution
Worker  Age  Wage   Worker  Age  Wage
1       26   785    16      23   680
2       39   1480   17      23   790
3       37   1390   18      27   1080
4       24   825    19      28   1240
5       25   680    20      32   1250
6       21   625    21      36   1460
7       27   1050   22      33   1270
8       32   1225   23      34   1430
9       30   1300   24      30   1260
10      33   1420   25      27   1020
11      26   890    26      22   880
12      28   1020   27      23   640
13      22   630    28      24   680
14      28   1040   29      32   1200
15      29   1240   30      31   1150

Problem 1
In a newspaper account describing the incidence of influenza among tubercular persons living in the same family, the following paragraph appeared: Exactly a fifth of the 100,000 inhabitants showed signs of tuberculosis, and no fewer than 5,000 among them had an attack of influenza, but among them only 1,000 lived in uninfected houses. In contrast with this, 1/15th of the tubercular persons who did not have influenza were still exposed to infection. Altogether 21,000 were attacked by influenza and 41,000 were exposed to risk of infection, but the number who had influenza but not tuberculosis, and lived in houses where no other cases of influenza occurred, was only 2,000.
Redraft the information in a concise tabular form.
Ratio Chart
A ratio chart is used when we want to see the rate of change in a variable, or when the rates of change of two or more variables are to be compared.
With a linear scale, the physical distance on the scale from, say, $1 to $2 is the same as the distance from $2 to $3.
On a ratio scale, the intervals between amounts are determined by their ratio to each other. Thus the distance from $1 to $2 is the same as the distance from $2 to $4 or from $3 to $6, because each represents an increase of 100%.
A ratio scale makes it possible to compare at a glance the magnitudes of changes that occur at different ranges. For example, if an investment rises from $1 to $2 during one time period, and later in the graph it rises from $3 to $6, a ratio scale will make it obvious that each rise was of the same magnitude. A linear scale would make it appear that the second rise was proportionally three times as great as the first rise.
On a linear scale, a vertical inch (or any other distance) represents the same number of units (such as dollars) wherever it appears on the graph. On a ratio scale, a vertical inch represents the same degree of growth wherever it appears on the graph.
A ratio scale demonstrates comparisons, growth, relationships between items and between time periods. It is especially useful for plotting two or more items on the same graph, because it's the only way to compare the growth of two items starting at different levels. If you compare them on a linear scale, the apparent differences between them will be meaningless.
If a graph bears no indication of which kind of scale it's using, it probably is a linear scale. If it's a ratio scale, there should be a label on it somewhere saying ratio scale, log scale, or semi-log scale.
Why Ratio Scale
To see the reason a ratio scale is important, look at this graph. It shows a trend that appears out of control. After making a slow start, the subject of the graph appears to be growing faster and faster, and near the end of the graph it is exploding.
What is this thing that's racing to the sky? Is it the federal debt, careening on the road to national bankruptcy? Is it some investment in the last stages of a runaway bull market, climbing toward its final blowoff?
Well, as a matter of fact, it's neither of those things. It's the yearly value of a 5% passbook savings account with interest compounded year after year. Hold on to the account long enough and the curve will rise with a steeper and steeper arc, even though each year's value is only 5% greater than the prior year's, just as at the beginning of the trend.
Sooner or later, a linear graph will turn any constant growth rate into a picture of a skyrocket. It distorts comparisons because it makes the change from 100 to 120 appear to be 20 times as large as the change from 5 to 6, even though each is, proportionately, a 20% increase.
Now look at this graph. The same 5% growth rate is plotted on a ratio scale. Each horizontal gridline represents the same percentage growth from the previous gridline. Since the growth is a constant 5% per year, the plot line is perfectly straight, because 5% represents the same degree of gain towards the end of the graph as it did at the beginning.
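The straight-line behaviour on a ratio scale can be checked numerically. The sketch below is only an illustration: the starting balance of 100, the 50-year horizon, and the function name are assumptions, not figures from the lecture. It compares the year-to-year step on a linear scale (the difference in value) with the step on a ratio scale (the difference in the logarithm of the value).

```python
import math


def compound_values(principal, rate, years):
    """Account value at the end of each year 0..years with yearly compounding."""
    return [principal * (1 + rate) ** t for t in range(years + 1)]


# Assumed figures, for illustration only.
values = compound_values(100.0, 0.05, 50)

# Linear scale: the height of each yearly step is the change in value.
linear_steps = [b - a for a, b in zip(values, values[1:])]

# Ratio (log) scale: the height of each yearly step is the change in log(value).
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print("first and last linear step:", linear_steps[0], linear_steps[-1])
print("first and last log step:   ", log_steps[0], log_steps[-1])
```

The linear steps keep growing from year to year, while every log step equals log10(1.05), which is why the plotted line is straight on a ratio scale.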
Economists and mathematicians almost always use ratio scales whenever rates of growth are important.
Ratio scales are appropriate for investment prices, sales figures, income, or any other absolute amounts being plotted over a period of time. They should not be used to plot anything in which a relationship is already inherent in the amounts, such as percentages (like the inflation rate) or ratios between two items (such as a gold-silver ratio or price-earnings ratio), because the benefit provided by a ratio scale is already built into the figures being plotted.
Measures of Central Tendency
Mean
\[ \bar{x} = \frac{\sum x}{N} \]
OR
\[ \bar{x} = \frac{\sum f x}{\sum f} \]
Mean of Grouped Data
\[ \bar{x} = \frac{\sum f m}{\sum f} \]
(where m is the mid point of the class interval)
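The grouped-data mean (sum of frequency times class mid point, divided by the sum of frequencies) can be sketched in a few lines. The function names and the three-class data set below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def grouped_mean(class_limits, freqs):
    """Mean of grouped data: sum(f * m) / sum(f), where m is the class mid point."""
    mid_points = [(lower + upper) / 2 for lower, upper in class_limits]
    return frequency_mean(mid_points, freqs)


# Hypothetical data, for illustration only.
classes = [(0, 10), (10, 20), (20, 30)]
freqs = [2, 5, 3]

# Mid points are 5, 15, 25, so sum(f * m) = 10 + 75 + 75 = 160 and sum(f) = 10.
print(grouped_mean(classes, freqs))
```

The same `frequency_mean` helper covers the ungrouped case, where each value x simply appears with its frequency f.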
Problem 2
Marks (x)   Frequency (f)
9           1
10          2
11          3
12          6
13          10
14          11
15          7
16          3
17          2
18          1
Total       46

Calculate Mean
Problem 3
Monthly earnings   No. of Workers
27.5-32.5          120
32.5-37.5          152
37.5-42.5          170
42.5-47.5          214
47.5-52.5          410

Calculate Mean
Problem 4
The mean age of a group of 100 persons was found to be 32.02. Later, it was discovered that age 57 had been misread as 27. Find the corrected mean.
Problem 5
The mean monthly salary paid to all employees in a company was Rs 500. The monthly salaries paid to male and female employees average Rs 520 and Rs 420 respectively. Determine the percentage of males and females employed by the company.
Mean Formula to be used
\[ \bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} \]
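One way to apply the combined-mean formula to Problem 5 is to write p for the fraction of employees in the first group, so that the combined mean equals p times the first mean plus (1 - p) times the second mean, and then solve for p. The sketch below does this with the salary figures from Problem 5; the function names are hypothetical.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups: (N1*x1 + N2*x2) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def share_of_first_group(overall_mean, mean1, mean2):
    """Fraction p of group 1 that solves overall = p*mean1 + (1 - p)*mean2."""
    return (overall_mean - mean2) / (mean1 - mean2)


# Figures from Problem 5: overall Rs 500, males Rs 520, females Rs 420.
p_male = share_of_first_group(500, 520, 420)
print("share of males:  ", p_male)
print("share of females:", 1 - p_male)

# Cross-check: a company of 100 split in that proportion reproduces the overall mean.
print(combined_mean(100 * p_male, 520, 100 * (1 - p_male), 420))
```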
Problem 6
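x       f
9       1
10      2
11      3
12      6
13      10
14      11
15      7
16      3
17      2
18      1
Total   46

Short cut method for calculating mean
\[ \bar{x} = A + \frac{\sum f d}{\sum f} \]
where A is the assumed mean and d = x - A is the deviation of each value from A.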
Problem 7
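Monthly earning   No. of workers
1000-1500         3
1500-2000         24
2000-2500         55
2500-3000         51
3000-3500         39
3500-4000         22
4000-4500         15
4500-5000         11
Total             220

Short cut method for calculating mean
\[ \bar{x} = A + \frac{\sum f d}{\sum f} \times i \]
where A is the assumed mean (a class mid point), i is the class width, and d = (m - A)/i is the deviation of each mid point m from A measured in class widths.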
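The short cut (step-deviation) method gives the same answer as the direct grouped mean; it only makes the hand arithmetic smaller. The sketch below checks this on the Problem 7 distribution. The choice of 2750 as the assumed mean A is an assumption made here (any class mid point works), and the function names are hypothetical.

```python
def direct_mean(mid_points, freqs):
    """Direct grouped mean: sum(f * m) / sum(f)."""
    return sum(f * m for m, f in zip(mid_points, freqs)) / sum(freqs)


def step_deviation_mean(mid_points, freqs, assumed_mean, width):
    """Short cut method: A + (sum(f * d) / sum(f)) * i, with d = (m - A) / i."""
    steps = [(m - assumed_mean) / width for m in mid_points]
    return assumed_mean + sum(f * d for d, f in zip(steps, freqs)) / sum(freqs) * width


# Problem 7: class mid points of 1000-1500, ..., 4500-5000 and their frequencies.
mid_points = [1250, 1750, 2250, 2750, 3250, 3750, 4250, 4750]
freqs = [3, 24, 55, 51, 39, 22, 15, 11]

shortcut = step_deviation_mean(mid_points, freqs, assumed_mean=2750, width=500)
direct = direct_mean(mid_points, freqs)
print(round(shortcut, 2), round(direct, 2))
```

With A = 2750 and i = 500 the step deviations are -3, -2, -1, 0, 1, 2, 3, 4, so the sum of f times d is 60 and the mean is 2750 + (60 / 220) × 500.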
Weighted Arithmetic Mean
\[ \bar{x} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4}{w_1 + w_2 + w_3 + w_4} \]
OR
\[ \bar{x} = \frac{\sum w x}{\sum w} \]
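As a small illustration of the weighted mean, the sketch below applies the formula to the toy price data given in this section, with the price per toy as x and the number sold as the weight w. The function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Toy data from this section: price per toy (Rs) and number sold.
prices = [3, 5, 7, 9]          # Car, Locomotive, Airplane, Double Decker
sold = [50, 25, 15, 10]

# sum(w * x) = 150 + 125 + 105 + 90 = 470 and sum(w) = 100.
print(weighted_mean(prices, sold))
```

The simple average of the four prices would be 6, but the weighted mean is lower because the cheapest toy sells the most.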
Toy             Price per Toy (Rs) x   Number Sold w
Car             3                      50
Locomotive      5                      25
Airplane        7                      15
Double Decker   9                      10
Total                                  100

Median
\[ \text{Median} = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item} \]
Also when the data is in class intervals
\[ Me = l + \frac{\frac{N + 1}{2} - C}{f} \times i \]
where l is the lower limit of the median class
i is the width of median class
f is frequency of median class
C is the cumulative frequency up to (but excluding) the median class
N is the total number of cases
Problem 8
Class Intervals   Cumulative Frequency
0-5               29
5-10              224
10-15             465
15-20             582
20-25             634
25-30             644
30-35             650
35-40             653
40-45             655

Quartiles, Deciles and Percentiles
The Median divides a distribution into two halves.
The first and third quartiles (denoted Q1 and Q3) are defined as follows:
25% of the data lie below Q1 (and 75% is above Q1),
25% of the data lie above Q3 (and 75% is below Q3)
Deciles divide the data into 10 equal parts.
Percentiles divide the data into 100 equal parts.
The nth percentile of a data set is the value such that n percent of the items lie below it.
Percentile
\[ P_n = l + \frac{\frac{nN}{100} - C}{f} \times i \]
Decile
\[ D_n = l + \frac{\frac{nN}{10} - C}{f} \times i \]
Quartile
\[ Q_n = l + \frac{\frac{nN}{4} - C}{f} \times i \]
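The percentile, decile and quartile formulas differ only in the divisor (100, 10 or 4), so a single function can cover all three. The sketch below assumes equal class widths; the function name, its parameter names, and the data set are hypothetical.

```python
def grouped_quantile(n, parts, lower_limits, width, freqs):
    """n-th quantile out of `parts`: l + (n*N/parts - C) / f * i (equal widths assumed)."""
    total = sum(freqs)
    position = n * total / parts
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile position lies outside the distribution")


# Hypothetical data: classes 0-10, 10-20, 20-30, 30-40 (N = 30).
lower_limits = [0, 10, 20, 30]
freqs = [5, 8, 12, 5]

q1 = grouped_quantile(1, 4, lower_limits, 10, freqs)      # first quartile
p70 = grouped_quantile(70, 100, lower_limits, 10, freqs)  # 70th percentile
d8 = grouped_quantile(8, 10, lower_limits, 10, freqs)     # 8th decile
print(q1, p70, d8)
```

For the first quartile the position is 30/4 = 7.5, which falls in the class 10-20 (C = 5, f = 8), giving 10 + (2.5 / 8) × 10.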
Problem 9
Minutes/week    No. of teenagers
0-99.5          27
99.5-199.5      42
199.5-299.5     65
299.5-399.5     78
399.5-499.5     58
499.5-599.5     32
599.5 & above   9

The above distribution represents the number of minutes spent by a group of teenagers in going to movies.
Find the median, 1st and 3rd quartile, 70th percentile and 8th decile.
Problem 10
Marks          No. of Students
More than 0    50
More than 10   46
More than 20   40
More than 30   20
More than 40   10
More than 50   3

Above is the distribution of marks obtained by 50 students in Decision Science.
Calculate the median marks. If 60 percent of the students pass this test, find the minimum marks obtained by a pass candidate.
Inter Quartile Range
The inter-quartile range (IQR) is the difference between the first and third quartiles, i.e. IQR = Q3 - Q1
Example
The ordered blood pressure data is:
113  124  124  132  146  151  170
Q1 = 124, Q3 = 151
Inter Quartile Range (IQR) is 151-124 = 27
An alternative definition of Q1 and Q3 is based on Q1 having a rank position = 0.25(n+1) and Q3 having rank position = 0.75(n+1), where n is the sample size.
If n = 10, then Q1 would have rank position 0.25 × 11 = 2.75 and Q3 would have rank position 0.75 × 11 = 8.25. Therefore Q1 is found by interpolating between the second and third observations, and Q3 is found by interpolating between observations 8 and 9.
JMP software uses this definition of quartiles.
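The rank-position definition can be sketched as follows: compute the position 0.25(n+1) or 0.75(n+1), then interpolate linearly between the two neighbouring ordered observations. This is a minimal illustration of the definition as stated here, not a reproduction of any particular package's implementation; the function names and the ten-value data set are hypothetical.

```python
import math


def value_at_rank(ordered, position):
    """Value at a (possibly fractional) 1-based rank position, by linear interpolation."""
    n = len(ordered)
    k = math.floor(position)   # rank of the observation just below the position
    fraction = position - k
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


def quartiles_by_rank(data):
    """Q1 and Q3 at rank positions 0.25(n+1) and 0.75(n+1)."""
    ordered = sorted(data)
    n = len(ordered)
    return value_at_rank(ordered, 0.25 * (n + 1)), value_at_rank(ordered, 0.75 * (n + 1))


# Blood pressure example: n = 7, so the positions are 2 and 6.
blood_pressure = [113, 124, 124, 132, 146, 151, 170]
print(quartiles_by_rank(blood_pressure))

# Hypothetical n = 10 data set: positions 2.75 and 8.25.
ten_values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(quartiles_by_rank(ten_values))
```

For the blood pressure data the positions are whole numbers, so no interpolation is needed and the quartiles are the second and sixth observations, matching the IQR of 27 in the example.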
Box-plots
A box-plot is a visual description of the distribution based on
Minimum
Q1
Median
Q3
Maximum
Useful for comparing large sets of data
Example 1
The pulse rates of 12 individuals arranged in increasing order are:
62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80
Q1 = (68 + 70)/2 = 69, Q3 = (76 + 78)/2 = 77
IQR = (77 - 69) = 8
Example 1: Box-plot
Example 2: Box-plots of intensities from 11 gene expression arrays
Outliers
An outlier is an observation which does not appear to belong with the other data
Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc.
An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient.
Outlier Boxplot
Re-define the upper and lower limits of the boxplots (the whisker lines) as:
Lower limit = Q1 - 1.5 × IQR, and
Upper limit = Q3 + 1.5 × IQR
Note that the lines may not go as far as these limits
If a data point is < lower limit or > upper limit, the data point is considered to be an outlier.
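The outlier rule can be sketched as a pair of small functions that take the quartiles as given. The example reuses the pulse-rate data and quartiles from Example 1 (Q1 = 69, Q3 = 77); the extra reading of 95 is made up purely to show a value being flagged, and the function names are hypothetical.

```python
def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Data points below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Pulse rates and quartiles from Example 1.
pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]
print(outlier_limits(69, 77))
print(find_outliers(pulse, 69, 77))

# 95 is a made-up extra reading, added only to show a flagged value.
print(find_outliers(pulse + [95], 69, 77))
```

With IQR = 8 the limits are 69 - 12 and 77 + 12, so none of the twelve pulse rates is an outlier, while the added reading lies above the upper limit.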
Example: box-plot with outliers marked
Problem 11
The numbers below represent forty eighteen-hole scores of a golfer at Crystal Springs Golf Club. Make a box-and-whisker plot of this data and find the outlier.
88 80 80 77 84 74 80 79 83 77
75 75 78 79 97 81 72 85 75 76
78 76 77 72 75 70 78 77 72 79
78 75 79 77 63 73 77 78 82 86
Mode
Mode by Interpolation
\[ Mo = l_1 + \frac{f_2}{f_0 + f_2} \times i \]
OR
\[ Mo = l_2 - \frac{f_0}{f_0 + f_2} \times i \]
OR
\[ Mo = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i \]
Where l1 is the lower limit of the modal class
l2 is the upper limit of the modal class
f0 is the frequency of the class just below the modal class
f1 is the frequency of the modal class
f2 is the frequency of the class just above the modal class, and
i is the width of the modal class interval
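The three interpolation formulas can be compared side by side. The first two are algebraically the same, since l2 = l1 + i, while the third weights by the differences from the modal frequency and generally gives a different value. The function names and the figures below are hypothetical.

```python
def mode_from_lower_limit(l1, width, f0, f2):
    """Mo = l1 + f2 / (f0 + f2) * i"""
    return l1 + f2 / (f0 + f2) * width


def mode_from_upper_limit(l2, width, f0, f2):
    """Mo = l2 - f0 / (f0 + f2) * i"""
    return l2 - f0 / (f0 + f2) * width


def mode_by_differences(l1, width, f0, f1, f2):
    """Mo = l1 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i"""
    return l1 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width


# Hypothetical modal class 20-30 with f1 = 12, preceded by f0 = 8 and followed by f2 = 5.
l1, l2, width = 20, 30, 10
f0, f1, f2 = 8, 12, 5

print(mode_from_lower_limit(l1, width, f0, f2))
print(mode_from_upper_limit(l2, width, f0, f2))
print(mode_by_differences(l1, width, f0, f1, f2))
```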
Problem 12
Wage Group   Frequency
14-18        6
18-22        18
22-26        19
26-30        12
30-34        5
34-38        4
38-42        3
42-46        2
46-50        1
50-54        0
54-58        1

Mode = 3 Median - 2 Mean
Problem 13
Twenty percent of the workers in a firm employing a total of 2,000 earn less than Rs 2 per hour, 440 earn from Rs 2 to 2.24 per hour, 24% earn from Rs 2.25 to 2.49 per hour, 370 earn from Rs 2.50 to 2.74 per hour, 12% earn from Rs 2.75 to 2.99 per hour, and the rest earn Rs 3 or more per hour. Set up a frequency table and calculate the modal wage.
Thank You