Cross-tabulation. 金枝玉孽 無間道 神鵰俠侶 白娘子傳奇 Objectives To study the use of...

Cross-tabulation

金枝玉孽

無間道

神鵰俠侶

白娘子傳奇

Objectives

To study the use of Crosstab for data analysis.

To study certain measures of association.

Content

Introduction
Crosstab
Measures of Association

Introduction

X                  Y                  Method
Nominal            Nominal            Chi-squared test in Crosstab
Nominal            Ordinal or above   ANOVA
Ordinal or above   Nominal            Discriminant Analysis
Ordinal or above   Ordinal or above   Regression Analysis

Introduction

A cross-tabulation involves the simultaneous counting of the number of observations that fall into each combination of categories of two or more variables.

Age group   Highly loyal   Moderately loyal   Brand switchers   Total
<30         30             42                 18                90
30-40       14             20                 31                65
>40         34             25                 16                75
Total       78             87                 65                230
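
As a minimal sketch, the age-by-loyalty table can be entered in R and its totals recomputed. The object names `loyalty` and `with_totals` are illustrative and not part of the slides; with raw survey records one would more likely build the table with `table()` or `xtabs()`.

```r
# Observed counts from the age-by-loyalty table (rows = age group)
loyalty <- matrix(
  c(30, 42, 18,
    14, 20, 31,
    34, 25, 16),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    age     = c("<30", "30-40", ">40"),
    loyalty = c("Highly loyal", "Moderately loyal", "Brand switchers")
  )
)

# addmargins() appends the row and column totals, labelled "Sum"
with_totals <- addmargins(loyalty)
print(with_totals)
```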

1. Introduction

Example: Testing the effectiveness of a coupon in increasing consumer awareness of Brand A:

               Before coupon                After coupon
               Aware   Not aware   Total    Aware   Not aware   Total
Test area      250     350         600      330     170         500
Control area   160     240         400      160     220         380
Total          410     590         1000     490     390         880

1. Introduction

In percentage of column totals:

               Before coupon                After coupon
               Aware   Not aware   Total    Aware   Not aware   Total
Test area      61      59          60       67      44          57
Control area   39      41          40       33      56          43
Total          100     100         100      100     100         100

Introduction

In percentage of row totals:

               Before coupon                After coupon
               Aware   Not aware   Total    Aware   Not aware   Total
Test area      42      58          100      66      34          100
Control area   40      60          100      42      58          100
Total          41      59          100      56      44          100
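
The row and column percentages can be reproduced with `prop.table()`. The sketch below uses only the after-coupon counts (330 and 170 in the test area, 160 and 220 in the control area); the object names are illustrative.

```r
# After-coupon counts: rows = area, columns = awareness
after <- matrix(c(330, 170,
                  160, 220),
                nrow = 2, byrow = TRUE,
                dimnames = list(area = c("Test", "Control"),
                                awareness = c("Aware", "Not aware")))

# margin = 1: each row sums to 100; margin = 2: each column sums to 100
row_pct <- round(100 * prop.table(after, margin = 1))
col_pct <- round(100 * prop.table(after, margin = 2))
print(row_pct)
print(col_pct)
```

Row percentages answer "what share of each area is aware?", which is the comparison the coupon test needs; column percentages answer "what share of the aware respondents comes from each area?".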

Simpson’s Paradox

Lurking variables can change or even reverse a relation between two categorical variables!

        1997                  2007
Age     Fraction   Income     Fraction   Income
<=45    0.5        $60,000    0.7        $70,000
>45     0.5        $120,000   0.3        $130,000
Mean               $90,000               $88,000

Income rose in both age groups, yet the overall mean fell because the population shifted toward the lower-income group.

Simpson’s Paradox

         Buyer   Nonbuyer   Total   Buyer %
Male     700     300        1000    70%
Female   400     600        1000    40%
Total    1100    900        2000    55%

Simpson’s Paradox

Luxury   Buyer   Nonbuyer   Total   Buyer %
Male     40      160        200     20%
Female   200     600        800     25%
Total    240     760        1000    24%

Plain    Buyer   Nonbuyer   Total   Buyer %
Male     660     140        800     82.5%
Female   200     0          200     100%
Total    860     140        1000    86%
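
The reversal can be checked numerically. The sketch below stores the Luxury and Plain tables in one three-way array (the name `purchases` and the dimension labels are illustrative) and compares buyer shares within each product against the shares after pooling the products.

```r
# gender x outcome x product; values fill column by column within each slice
purchases <- array(
  c(40, 200, 160, 600,     # Luxury: buyers (M, F), nonbuyers (M, F)
    660, 200, 140, 0),     # Plain:  buyers (M, F), nonbuyers (M, F)
  dim = c(2, 2, 2),
  dimnames = list(gender  = c("Male", "Female"),
                  outcome = c("Buyer", "Nonbuyer"),
                  product = c("Luxury", "Plain"))
)

# Buyer share by gender within each product type
within_product <- purchases[, "Buyer", ] / apply(purchases, c(1, 3), sum)

# Buyer share by gender after collapsing over product type
pooled  <- apply(purchases, c(1, 2), sum)
overall <- pooled[, "Buyer"] / rowSums(pooled)

print(within_product)   # females have the higher share in both products
print(overall)          # males have the higher share overall
```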

Chi-squared Test

The chi-squared test assumes a multinomial experiment. The multinomial experiment is analogous to tossing n balls at k boxes, where:

1. Each ball must fall in one of the boxes.

2. The probability, p_i, that a ball will fall in box i remains the same in repeated tosses.

3. The trials are independent.

4. At the conclusion of the experiment, we have n_1 balls in box one, n_2 balls in box two, and so on.

Multinomial Distribution

n independent trials permit k mutually exclusive outcomes whose respective probabilities are $P_1, P_2, \ldots, P_k$, with $\sum_{i=1}^{k} P_i = 1$.

Each $P_i$ remains constant throughout the n trials.

We are interested in the probability of getting $X_1$ outcomes of the 1st kind, $X_2$ outcomes of the 2nd kind, ..., $X_k$ outcomes of the kth kind, with $\sum_{i=1}^{k} X_i = n$:

$$P(X_1, X_2, \ldots, X_k) = \frac{n!}{X_1!\, X_2! \cdots X_k!}\, P_1^{X_1} P_2^{X_2} \cdots P_k^{X_k}$$
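
For a quick numerical check, the usual multinomial probability n!/(X1! ... Xk!) P1^X1 ... Pk^Xk can be compared with R's built-in `dmultinom()`. The example values (three equally likely outcomes, one occurrence of each in three trials) are made up for illustration.

```r
# Three equally likely outcomes, n = 3 trials, one outcome of each kind
x <- c(1, 1, 1)
p <- c(1/3, 1/3, 1/3)

# Multinomial probability written out by hand
by_formula <- factorial(sum(x)) / prod(factorial(x)) * prod(p^x)

# Same probability from the stats package
by_dmultinom <- dmultinom(x, prob = p)

print(c(by_formula, by_dmultinom))
```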

Chi-squared Test

The chi-squared test can be used to test whether the observed association between the variables in a cross-tabulation is statistically significant. This is called the test of independence.

Example (expected counts in parentheses):

Age group   Highly loyal   Moderately loyal   Brand switchers   Total
<30         30 (30.5)      42 (34.1)          18 (25.4)         90
30-40       14 (22.1)      20 (24.5)          31 (18.4)         65
>40         34 (25.4)      25 (28.4)          16 (21.2)         75
Total       78             87                 65                230

Chi-squared Test

Chi-squared statistic $\chi^2 = \sum (O - E)^2 / E$, with degrees of freedom $df = (r - 1)(c - 1)$, where O is the observed count in a cell and E is its expected count (the value in parentheses in the age-by-loyalty table).

$H_0$: Age and Loyalty are independent.

$$\chi^2 = \frac{(30 - 30.5)^2}{30.5} + \frac{(42 - 34.1)^2}{34.1} + \cdots + \frac{(16 - 21.2)^2}{21.2} = 21.01$$

$$df = (3 - 1)(3 - 1) = 4$$

$$P(\chi^2_4 \ge 21.01) = 0.0003 < 0.05, \text{ so reject } H_0.$$
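
The same test can be sketched in base R with `chisq.test()`. Object names are illustrative. Because R keeps the expected counts unrounded, the statistic it reports (about 21.08) is slightly above the hand calculation of 21.01, which uses expected counts rounded to one decimal.

```r
# Observed age-by-loyalty counts
loyalty <- matrix(c(30, 42, 18,
                    14, 20, 31,
                    34, 25, 16),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(age = c("<30", "30-40", ">40"),
                                  loyalty = c("Highly", "Moderately", "Switchers")))

test <- chisq.test(loyalty)

print(round(test$expected, 1))  # expected counts under independence
print(test$statistic)           # Pearson chi-squared
print(test$parameter)           # degrees of freedom, (3 - 1)(3 - 1)
print(test$p.value)
```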

Chi-Square Distribution Table

CrossTable

• The CrossTable() function in the gmodels package produces cross-tabulations and test results.

• Run install.packages("gmodels") if necessary.

    library(gmodels)
    sales <- read.table('sales.csv', header = TRUE, sep = ',')
    # gender and purchase are columns of sales, so evaluate the call inside the data frame
    with(sales, CrossTable(gender, purchase, expected = TRUE, prop.chisq = FALSE,
                           chisq = TRUE, fisher = TRUE, sresid = TRUE, format = "SPSS"))

CrossTable

Total Observations in Table:  582

             | purchase
      gender |        no |       yes | Row Total |
-------------|-----------|-----------|-----------|
      female |       102 |       271 |       373 |
             |   125.615 |   247.385 |           |
             |   27.346% |   72.654% |   64.089% |
             |   52.041% |   70.207% |           |
             |   17.526% |   46.564% |           |
             |    -2.107 |     1.501 |           |
-------------|-----------|-----------|-----------|
        male |        94 |       115 |       209 |
             |    70.385 |   138.615 |           |
             |   44.976% |   55.024% |   35.911% |
             |   47.959% |   29.793% |           |
             |   16.151% |   19.759% |           |
             |     2.815 |    -2.006 |           |
-------------|-----------|-----------|-----------|
Column Total |       196 |       386 |       582 |
             |   33.677% |   66.323% |           |
-------------|-----------|-----------|-----------|

CrossTable

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 18.64021    d.f. = 1    p = 1.578558e-05

Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 17.85923    d.f. = 1    p = 2.378625e-05

$$\text{Standardized Residual} = \frac{\text{Count} - \text{Expected Count}}{\sqrt{\text{Expected Count}}}$$
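
If the sales.csv file is not at hand, the reported statistics can be approximated from the printed counts alone. This is a sketch using base R's `chisq.test()` rather than `CrossTable()`; the object names are illustrative.

```r
# Gender-by-purchase counts as printed in the CrossTable output
purchase_tab <- matrix(c(102, 271,
                          94, 115),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("female", "male"),
                                       purchase = c("no", "yes")))

pearson <- chisq.test(purchase_tab, correct = FALSE)  # plain Pearson test
yates   <- chisq.test(purchase_tab, correct = TRUE)   # with continuity correction

# Standardized residuals: (count - expected) / sqrt(expected)
std_resid <- (pearson$observed - pearson$expected) / sqrt(pearson$expected)

print(pearson$statistic)
print(yates$statistic)
print(round(std_resid, 3))
```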

CrossTable

Fisher's Exact Test for Count Data
------------------------------------------------------------
Sample estimate odds ratio: 0.4611142

Alternative hypothesis: true odds ratio is not equal to 1
p = 2.408778e-05
95% confidence interval: 0.3179706 0.6676994

Alternative hypothesis: true odds ratio is less than 1
p = 1.352295e-05
95% confidence interval: 0 0.6308124

Alternative hypothesis: true odds ratio is greater than 1
p = 0.999994
95% confidence interval: 0.3367543 Inf

Minimum expected frequency: 70.38488

Yates’s continuity correction
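
The slide text does not reproduce the formula. As a sketch, assuming the usual form of the correction for a 2 x 2 table (0.5 is subtracted from each absolute deviation before squaring), the corrected statistic for the gender-by-purchase table works out as follows. Every cell of that table has |O - E| = 23.615.

```latex
\chi^2_{\text{Yates}} = \sum \frac{\left(|O - E| - 0.5\right)^2}{E}

\chi^2_{\text{Yates}}
  = (23.615 - 0.5)^2
    \left(\frac{1}{125.615} + \frac{1}{247.385} + \frac{1}{70.385} + \frac{1}{138.615}\right)
  \approx 17.86
```

This agrees with the value 17.85923 reported by CrossTable, compared with 18.64021 without the correction.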

Chi-squared Test

Someone gives you a 52-card deck. You draw a card, record its suit, replace the card, shuffle the deck, and repeat that process 200 times, obtaining the following table:

Diamonds   Clubs   Hearts   Spades
46         54      49       51

Does the distribution of suits appear to be standard?
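
One way to answer the question is a goodness-of-fit test against equal suit probabilities. In a standard deck each suit has probability 1/4, so each expected count is 50. A minimal sketch in base R (object names are illustrative):

```r
# Observed suit counts from 200 draws with replacement
suits <- c(Diamonds = 46, Clubs = 54, Hearts = 49, Spades = 51)

# Goodness-of-fit test against equal probabilities
gof <- chisq.test(suits, p = rep(1/4, 4))

print(gof$expected)   # 50 per suit
print(gof$statistic)  # (16 + 16 + 1 + 1) / 50
print(gof$parameter)  # k - 1 degrees of freedom
print(gof$p.value)
```

A large p-value here means the observed counts give no evidence against a standard deck.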

Chi-squared Test

To use the test, every expected cell frequency (E) should be at least 1, and no more than 20% of the expected frequencies should be less than 5.

Pooling categories is one way to handle a table in which more than 20% of the cells have an expected frequency below 5.

Measures of Association

• The vcd package is needed to get measures of association.

• Run install.packages("vcd") if needed.

    library(vcd)
    # produce a table between gender and purchase
    tab <- xtabs(~ gender + purchase, data = sales)
    summary(assocstats(tab))

Measures of Association

Call: xtabs(formula = ~gender + purchase)
Number of cases in table: 582
Number of factors: 2
Test for independence of all factors:
        Chisq = 18.64, df = 1, p-value = 1.579e-05

                    X^2 df   P(> X^2)
Likelihood Ratio 18.368  1 1.8212e-05
Pearson          18.640  1 1.5786e-05

Phi-Coefficient   : 0.179
Contingency Coeff.: 0.176
Cramer's V        : 0.179

Magnitude of Effect

For phi, Contingency Coefficient, Cramer’s V:

small ≈ 0.1
moderate ≈ 0.3
large ≈ 0.5

Measures of Association

$$\text{Phi} = \sqrt{\frac{\chi^2}{n}}$$

$$\text{Cramer's V} = \sqrt{\frac{\chi^2}{n(k - 1)}}$$

$$\text{Contingency Coefficient} = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
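
These three coefficients can be recomputed from the Pearson chi-squared reported for the gender-by-purchase table (18.64021 with n = 582). A short sketch with illustrative variable names:

```r
chi2 <- 18.64021   # Pearson chi-squared for gender by purchase
n    <- 582        # number of cases
k    <- 2          # smaller of the number of rows and columns

phi         <- sqrt(chi2 / n)
cramers_v   <- sqrt(chi2 / (n * (k - 1)))
contingency <- sqrt(chi2 / (chi2 + n))

print(round(c(phi = phi, cramers_v = cramers_v, contingency = contingency), 3))
```

For a 2 x 2 table k - 1 = 1, so phi and Cramer's V coincide.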

The likelihood ratio
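
The formula for this slide is not preserved in the text. Assuming the usual likelihood-ratio statistic G^2 = 2 Σ O ln(O/E), the sketch below recomputes it for the gender-by-purchase counts; it should land close to the 18.368 reported by `assocstats()`. Object names are illustrative.

```r
# Gender-by-purchase counts (rows: female, male; columns: no, yes)
observed <- matrix(c(102, 271,
                      94, 115), nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Likelihood-ratio chi-squared and its p-value on (2 - 1)(2 - 1) = 1 df
g2      <- 2 * sum(observed * log(observed / expected))
p_value <- pchisq(g2, df = 1, lower.tail = FALSE)

print(c(G2 = g2, p = p_value))
```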

Measures of Association

Introduction

There is a plethora of indices for measuring the strength of association. Factors affecting the value of a particular measure:

Skewed marginal distributions - only a few indices are unaffected by the marginal distributions.

Nonsquare tables - some measures cannot attain their maximum if the table is nonsquare.

Nominal Measures

These measures only indicate the strength of the association; they cannot show the direction or nature of the relationship.

Odds Ratio

Consider the following table:

               Loyal   Disloyal
White collar   30      42
Blue collar    14      20

Look at the odds of being white-collar. Among the Loyal, these odds are 30 to 14, or about 2:1. Among the Disloyal, these odds are 42 to 20, also about 2:1.

The ratio of these two odds is called the odds ratio:

odds ratio = n11n22/(n12n21)
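
A short sketch of the calculation for the collar-by-loyalty table, with illustrative object names. The ratio of the two odds and the cross-product n11n22/(n12n21) give the same number.

```r
collar <- matrix(c(30, 42,
                   14, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(job = c("White collar", "Blue collar"),
                                 loyalty = c("Loyal", "Disloyal")))

# Odds of being white-collar within each loyalty group
odds_loyal    <- collar["White collar", "Loyal"]    / collar["Blue collar", "Loyal"]
odds_disloyal <- collar["White collar", "Disloyal"] / collar["Blue collar", "Disloyal"]

odds_ratio    <- odds_loyal / odds_disloyal
cross_product <- (collar[1, 1] * collar[2, 2]) / (collar[1, 2] * collar[2, 1])

print(c(odds_loyal, odds_disloyal, odds_ratio, cross_product))
```

An odds ratio this close to 1 indicates essentially no association between job type and loyalty in this table.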

Odds Ratio with a Stratifying Variable

The data come from a cohort study or case-control study that is stratified; for example, the data may be separated (stratified) by the sex of the people studied. Here the strata are age groups. Consider the following table for the high-age stratum:

High age   High alcohol consumption   Low alcohol consumption
Case       25                         21
Control    29                         128

Assuming a constant odds ratio across the age strata, test whether the odds ratio is 1 and report a p-value.

Odds Ratio with a Stratifying Variable

> mymatrix1 <- matrix(c(8,5,52,164),nrow=2,byrow=TRUE)
> colnames(mymatrix1) <- c("High","Low")
> rownames(mymatrix1) <- c("Case","Control")
> print(mymatrix1)
        High Low
Case       8   5
Control   52 164
> mymatrix2 <- matrix(c(25,21,29,128),nrow=2,byrow=TRUE)
> colnames(mymatrix2) <- c("High","Low")
> rownames(mymatrix2) <- c("Case","Control")
> print(mymatrix2)
        High Low
Case      25  21
Control   29 128

Odds Ratio with a Stratifying Variable

The Mantel-Haenszel odds ratio estimates the odds ratio for the association between case/control status and alcohol consumption, controlling for the possible confounding effects of the stratifying variable (age here).

> install.packages("lawstat")
> library("lawstat")
> myarray <- array(c(mymatrix1,mymatrix2),dim=c(2,2,2))
> cmh.test(myarray)

        Cochran-Mantel-Haenszel Chi-square Test

data:  myarray
CMH statistic = 32.181, df = 1.000, p-value = 0.000, MH Estimate = 5.197, Pooled Odd Ratio = 4.575, Odd Ratio of level 1 = 5.046, Odd Ratio of level 2 = 5.255
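
As an alternative sketch that needs no extra package, base R's `mantelhaen.test()` gives the Mantel-Haenszel common odds ratio for the same two strata. This swaps in a different function from the `cmh.test()` call used on the slide, so the test statistic may differ slightly (base R applies a continuity correction by default), but the common odds ratio estimate should match. Object names and stratum labels are illustrative.

```r
stratum1 <- matrix(c(8, 5, 52, 164),   nrow = 2, byrow = TRUE)
stratum2 <- matrix(c(25, 21, 29, 128), nrow = 2, byrow = TRUE)

strata <- array(c(stratum1, stratum2), dim = c(2, 2, 2),
                dimnames = list(status  = c("Case", "Control"),
                                alcohol = c("High", "Low"),
                                age     = c("stratum 1", "stratum 2")))

# Cochran-Mantel-Haenszel test of a common odds ratio equal to 1
mh <- mantelhaen.test(strata)

# Cross-product odds ratio within each stratum
per_stratum <- apply(strata, 3, function(m) m[1, 1] * m[2, 2] / (m[1, 2] * m[2, 1]))

print(mh$estimate)
print(per_stratum)
print(mh$p.value)
```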

Measures of Association

The log odds ratio is insensitive to the marginal distributions. The following two tables have the same log odds ratio, since 75 × 100/(15 × 10) = 750 × 100/(15 × 100) = 50:

        Table 1         Table 2
        75    15        750   15
        10    100       100   100
Total   85    115       850   115

The odds ratio is also invariant under interchange of rows and columns; hence, the odds ratio is a symmetric measure.

Measures of Association

Yule's Y (or the Coefficient of Colligation)

Y^ = (sqrt(θ^) - 1)/(sqrt(θ^) + 1), where θ^ = n11n22/(n12n21) is the sample odds ratio

variance = (1/n11 + 1/n21 + 1/n12 + 1/n22)(1 - Y^2)^2/16

Pearson's Product Moment Correlation

The following formula is used:

r = (n11n22 - n12n21)/sqrt(n1.n2.n.1n.2)

r lies between -1.0 and 1.0 and equals 0 if the variables are independent.
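
A small sketch applying these 2 x 2 measures to the gender-by-purchase counts. It assumes the standard definition of Yule's Y in terms of the odds ratio, Y = (sqrt(OR) - 1)/(sqrt(OR) + 1); the variable names are illustrative.

```r
# Gender-by-purchase counts (rows: female, male; columns: no, yes)
n <- matrix(c(102, 271,
               94, 115), nrow = 2, byrow = TRUE)

# Sample odds ratio and Yule's Y (coefficient of colligation)
theta   <- n[1, 1] * n[2, 2] / (n[1, 2] * n[2, 1])
yules_y <- (sqrt(theta) - 1) / (sqrt(theta) + 1)
var_y   <- sum(1 / n) * (1 - yules_y^2)^2 / 16

# Pearson's r for a 2 x 2 table: cross-product difference over
# the square root of the product of the four marginal totals
r <- (n[1, 1] * n[2, 2] - n[1, 2] * n[2, 1]) /
  sqrt(prod(rowSums(n)) * prod(colSums(n)))

print(c(theta = theta, yules_y = yules_y, var_y = var_y, r = r))
```

The absolute value of r equals the phi coefficient reported for this table; the sign depends only on how the rows and columns are ordered.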

Measures of Association

Cramer introduced the following variant (it can achieve the maximum value of 1):

V = sqrt(χ2/(n(k - 1)))

k is the smaller of the number of rows and columns. If one of the table's dimensions is 2, V and phi are the same.

Another version is Tschuprow's T = sqrt(χ2/(n sqrt[(I - 1)(J - 1)])), where I and J are the numbers of rows and columns.

T varies between 0 and 1 and can attain its maximum of 1 only when I = J.

Measures of Association

Proportional Reduction in Error

Refer to the following table:

              Like the product   Dislike the product   Total
High income   26                 8                     34
Low income    17                 33                    50
Total         43                 41                    84

Measures of Association

Suppose that we want to predict which category a randomly selected person would fall in when asked the question "Do you like the product?". If we had no knowledge of the row variable, the best bet is always "like the product", and we would be wrong in 41 of the 84 cases. If we know that the subject comes from the high income group, we predict "like the product" and are wrong in only 8 cases. If we know that he/she is in the low income group, we predict "dislike the product" and are wrong in 17 cases. Hence, we have reduced our number of prediction errors to 8 + 17 = 25 cases. The proportional reduction in error is:

lambdaC/R = (41 - 25)/41 = 0.39

Measures of Association

That is, 39% of the errors in predicting the column variable are eliminated by knowing the row variable.

Lambda is asymmetric and varies between 0, indicating no ability at all to eliminate errors in the column variable on the basis of the row variable, and 1, indicating an ability to eliminate all errors in the column variable predictions, given knowledge of the row variable. In general, the asymmetric lambda for predicting the column variable is:

lambdaC/R = (ΣfKR* - FC*)/(n - FC*)

fKR* is the maximum frequency found within each subclass of the row variable.

FC* is the maximum frequency among the marginal totals of the column variable.

For the rows:

lambdaR/C = (ΣfLC* - FR*)/(n - FR*)

fLC* is the maximum frequency found within each subclass of the column variable, and FR* is the maximum frequency among the marginal totals of the row variable.

A symmetric measure:

lambda = (ΣfKR* - FC* + ΣfLC* - FR*)/(2n - FC* - FR*)
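
The three lambdas can be sketched in a few lines of R for the income-by-liking table (26, 8 / 17, 33). The variable names below are illustrative stand-ins for the symbols in the formulas.

```r
income <- matrix(c(26, 8,
                   17, 33),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(income = c("High", "Low"),
                                 opinion = c("Like", "Dislike")))
n <- sum(income)

sum_row_max   <- sum(apply(income, 1, max))  # sum of fKR*: largest cell in each row
sum_col_max   <- sum(apply(income, 2, max))  # sum of fLC*: largest cell in each column
max_col_total <- max(colSums(income))        # FC*: largest column total
max_row_total <- max(rowSums(income))        # FR*: largest row total

lambda_col <- (sum_row_max - max_col_total) / (n - max_col_total)  # predict columns from rows
lambda_row <- (sum_col_max - max_row_total) / (n - max_row_total)  # predict rows from columns
lambda_sym <- (sum_row_max - max_col_total + sum_col_max - max_row_total) /
  (2 * n - max_col_total - max_row_total)

print(round(c(lambda_col, lambda_row, lambda_sym), 3))
```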

Ordinal Measures of Association

Kendall’s tau, or the Kendall rank correlation coefficient.

A pair of cases is concordant (P) if the values of both variables for one case are higher (or both are lower) than the corresponding values for the other case. The pair is discordant (Q) if the value of one variable for a case is larger than the corresponding value for the other case, and the direction is reversed for the second variable. When the 2 cases have identical values on one or on both variables, they are tied.

Ordinal Measures of Association

If the preponderance of pairs is concordant, the association is said to be positive; if the preponderance is discordant, it is negative. If concordant and discordant pairs are equally likely, no association is said to exist. The following are some measures:

Kendall tau-a = (P - Q)/(total number of pairs), where the total number of pairs is n(n - 1)/2.

tau-b = (P - Q)/sqrt[(P + Q + Tx)(P + Q + Ty)], where Tx = pairs tied on X only and Ty = pairs tied on Y only.

tau-c = 2m(P - Q)/(n^2(m - 1)), where m is the smaller of the number of rows and columns.

Goodman and Kruskal’s gamma = (P - Q)/(P + Q); gamma = 0 when the variables are independent.

Somers’ d = (P - Q)/(P + Q + Ty)
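
To make P and Q concrete, here is a sketch that counts concordant and discordant pairs in an ordered contingency table and derives gamma and tau-a from them. It reuses the age-by-loyalty counts and treats the row and column order as the ordinal order; the helper `count_pairs()` is hypothetical and not from any package.

```r
loyalty <- matrix(c(30, 42, 18,
                    14, 20, 31,
                    34, 25, 16), nrow = 3, byrow = TRUE)

# For each cell, pair it with every cell in a lower row:
# cells to the right are concordant, cells to the left are discordant
count_pairs <- function(tab) {
  r <- nrow(tab)
  k <- ncol(tab)
  P <- 0
  Q <- 0
  for (i in seq_len(r - 1)) {            # the last row has nothing below it
    below <- tab[(i + 1):r, , drop = FALSE]
    for (j in seq_len(k)) {
      if (j < k) P <- P + tab[i, j] * sum(below[, (j + 1):k])
      if (j > 1) Q <- Q + tab[i, j] * sum(below[, 1:(j - 1)])
    }
  }
  c(P = P, Q = Q)
}

pq <- count_pairs(loyalty)
n  <- sum(loyalty)

gk_gamma <- unname((pq["P"] - pq["Q"]) / (pq["P"] + pq["Q"]))
tau_a    <- unname((pq["P"] - pq["Q"]) / (n * (n - 1) / 2))

print(pq)
print(c(gamma = gk_gamma, tau_a = tau_a))
```

Pairs that share a row or a column are ties and enter neither P nor Q, which is why P + Q is smaller than the total number of pairs.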

Measures of Association

Interval Data

Pearson's correlation coefficient and many other measures can be used.

Kappa

For measuring agreement. The 2 variables must have the same range of values.

Kappa = (Agreement - Expected Agreement) / (1 – Expected Agreement)

Example:

Rater 1 (columns) rated 44.4% of the customers as loyal. Rater 2 (rows) rated 40.3% of the customers as loyal.

                   Loyal   Moderately loyal   Brand switcher   Total
Loyal              17      4                  8                29      40.3%
Moderately loyal   5       12                 0                17      23.6%
Brand switcher     10      3                  13               26      36.1%
Total              32      19                 21               72
                   44.4%   26.4%              29.2%

Kappa

If the ratings are independent, 17.9% (44.4% × 40.3%) of the customers would be rated as loyal by both; 6.2% would be rated as moderately loyal by both, and 10.5% would be rated as brand switcher by both.

Therefore, (17.9 + 6.2 + 10.5)% = 34.6% would be classified the same merely by chance.

Now, observed percentage of customers classified the same = 42/72 = 58.3%

And the largest possible non-chance agreement = 1 - 34.6% = 65.4%

Then Kappa = (0.583 - 0.346)/(1 - 0.346) = 0.362.
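
The same calculation as a sketch in R. The middle-row entry under Brand switcher is taken to be 0, the value implied by the row total of 17 and the column total of 21; the object names are illustrative.

```r
# Rows: one rater; columns: the other rater; same three categories in the same order
ratings <- matrix(c(17,  4,  8,
                     5, 12,  0,    # 0 is implied by the marginal totals
                    10,  3, 13),
                  nrow = 3, byrow = TRUE)
n <- sum(ratings)

observed_agreement <- sum(diag(ratings)) / n
expected_agreement <- sum(rowSums(ratings) * colSums(ratings)) / n^2

kappa_hat <- (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(round(c(observed = observed_agreement,
              expected = expected_agreement,
              kappa    = kappa_hat), 3))
```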

Kappa

Kappa is always less than or equal to 1.

Kappa = 1: perfect agreement. Values less than 1 imply less than perfect agreement.
Kappa = 0: agreement is at chance level.
Kappa < 0: agreement is worse than chance. Kappa cannot fall below -1.

Poor agreement = Less than 0.20
Fair agreement = 0.20 to 0.40
Moderate agreement = 0.40 to 0.60
Good agreement = 0.60 to 0.80
Very good agreement = 0.80 to 1.00

Entering data: weight cases

SPSS Crosstabs

From the pull-down menu Analyze, select Descriptive Statistics and choose Crosstabs to open the Crosstabs dialog box.

Select Gender from the variable list and move it to the Row variable box.

For the column variable, select Shopping Duty.

SPSS Crosstabs

Click Cells in the previous dialog box to bring up the Cells dialog box.

Select Expected under Counts to compute the expected frequency of each cell.

Select Standardized under Residuals.

Click Continue to go back to the previous dialog box.

SPSS Crosstabs

Click Continue to go back to the previous dialog box, then click Statistics to display the Statistics dialog box.

Select Chi-square, Contingency Coefficient, and Phi and Cramer's V.

Click Continue and OK to get the output.

Test of Independence

Ho: Gender and shopping duty are independent (or in other words, there is no gender difference in shopping duty).

Ha: Gender and shopping duty are dependent (or in other words, there is a gender difference in shopping duty).

Since the Sig. (or p-value) associated with the Likelihood Ratio is 0.490 > 0.05, we do not reject Ho: there is insufficient evidence of a difference in shopping duty between male and female respondents.

Recode

To combine categories, choose Transform from the main menu, and then select Recode and Into Different Variables.

In the dialog box that pops up, select Duty and put it into the Numeric Variable -> Output box.

Recode

Make up a name for the Output Variable.

Let’s call the new variable rec_duty, which stands for recoded duty. Type recoded duty in the textbox under Label.

Recode

Click the Old and New Values button to go into another dialog box.

Recode

Click Range under Old Values and enter the numbers 1 and 2, which represent the duty = yes group and the duty = shared responsibility group. Then enter the number 1 into the New Value box.

Recode

Now enter 3 into the Old Value text box and 2 into the New Value text box.

Recode

Specify the value labels of rec_duty.

Fisher’s Exact Test

When drinking tea, a woman claimed to be able to distinguish whether milk or tea was added to the cup first.

To test this claim, she was given eight cups of tea. In four of the cups, tea was added first, and in four of the cups, milk was added first.

The order in which the cups were presented to her was randomized.

She was told that there were four cups of each type, so that she should make four predictions of each order.

Ho: The order in which milk or tea is poured into a cup and the taster’s guess of the order are independent.

Ha: The taster can correctly guess the order in which milk or tea is poured into a cup.

Fisher’s Exact Test

Guess * Actual Crosstabulation

                                       Actual
                                       Milk First   Tea First   Total
Guess   Milk First   Count             3            1           4
                     Expected Count    2.0          2.0         4.0
        Tea First    Count             1            3           4
                     Expected Count    2.0          2.0         4.0
Total                Count             4            4           8
                     Expected Count    4.0          4.0         8.0

Chi-Square Tests

                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             2.000(b)   1    .157
Continuity Correction(a)       .500       1    .480
Likelihood Ratio               2.093      1    .148
Fisher's Exact Test                                          .486         .243
Linear-by-Linear Association   1.750      1    .186
N of Valid Cases               8

a. Computed only for a 2x2 table
b. 4 cells (100.0%) have expected count less than 5. The minimum expected count is 2.00.
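
The exact p-values in this output can be reproduced with base R's `fisher.test()`. The sketch assumes the counts 3, 1 / 1, 3 from the Guess by Actual table; object names are illustrative.

```r
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(guess  = c("Milk first", "Tea first"),
                              actual = c("Milk first", "Tea first")))

# One-sided: the taster does better than chance (positive association)
one_sided <- fisher.test(tea, alternative = "greater")

# Two-sided: any departure from independence
two_sided <- fisher.test(tea)

print(one_sided$p.value)
print(two_sided$p.value)
```

With all four expected counts equal to 2, the chi-squared approximation is unreliable, which is why the exact test is the one to read here.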

http://www.swogstat.org/stat/public/fisher.htm

              Y
X       yes   no    total
yes     3     7     10
no      5     10    15
total   8     17    25

TABLE = [ 3 , 7 , 5 , 10 ]
Left : p-value = 0.6069
Right : p-value = 0.72639
2-Tail : p-value = 1

The output consists of three p-values:

Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right.

Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right.

2-Tail: Use this when there is no prior alternative.

Fisher’s Exact Test

Fisher's exact test returns exact one-tailed and two-tailed p-values for a given frequency table.

The probability of observing a given set of frequencies A, B, C, and D in a 2 x 2 contingency table, given fixed row and column marginal totals and sample size N, is:

$$P = \frac{(A+B)!\,(C+D)!\,(A+C)!\,(B+D)!}{N!\,A!\,B!\,C!\,D!}$$

Fisher’s Exact Test

Fisher's exact test computes the probability, given the observed marginal frequencies, of obtaining exactly the frequencies observed and any configuration more extreme.

Example:

2 (A) 3 (B) 5   (A+B)

6 (C) 4 (D) 10 (C+D)

8 (A+C) 7 (B+D) 15 (N)

Fisher’s Exact Test

All configurations with the same marginal frequencies include:
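
The list of configurations is not reproduced in the text. As a sketch, they can be enumerated in R for the example with row totals 5 and 10 and column totals 8 and 7: the top-left cell A can take the values 0 to 5, and its probability under fixed margins is hypergeometric. Variable names are illustrative.

```r
# Row 1 total = 5, column totals = 8 and 7, N = 15
a    <- 0:5
prob <- dhyper(a, m = 8, n = 7, k = 5)   # P(A = a) given the margins

# Each configuration is determined by A once the margins are fixed
configs <- data.frame(A = a, B = 5 - a, C = 8 - a, D = 2 + a,
                      prob = round(prob, 3))
print(configs)

observed   <- dhyper(2, m = 8, n = 7, k = 5)              # the observed table has A = 2
one_tailed <- sum(prob[a <= 2])                           # observed or more extreme, one direction
two_tailed <- sum(prob[prob <= observed * (1 + 1e-7)])    # all tables no more probable than observed

print(c(one_tailed = one_tailed, two_tailed = two_tailed))
```

Summing the unrounded probabilities gives about 0.427 and 0.608; adding the three-decimal values by hand gives .426 for the one-tailed figure.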

Fisher’s Exact Test

Thus, the one-tailed probability for this table would be: .326 + .093 + .007 = .426

...whereas the two-tailed probability would be: .326 + .093 + .007 + .163 + .019 = .608

The probability for the fourth configuration is not included because it is less extreme (more probable) than the observed frequency configuration.

Since the two-tailed p = 0.608 > 0.05, the null hypothesis is not rejected.

Standardized Residuals

$$\text{Standardized Residual} = \frac{\text{Count} - \text{Expected Count}}{\sqrt{\text{Expected Count}}}$$

Symmetric Measures

Magnitude of Effect

For phi, Contingency Coefficient, Cramer’s V (in absolute value):

trivial if |value| < 0.1
small if 0.1 < |value| < 0.3
medium if 0.3 < |value| < 0.5
large if |value| > 0.5

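One-Sample Chi-Square Example

Null hypothesis: Ho: O = E
Statistical test: One-sample chi-square
Significance level: .05
Calculated value: 9.89
Critical test value: 7.815

Since the calculated value 9.89 exceeds the critical value 7.815, Ho is rejected at the .05 level.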
