STA429/1007 F 2004 Handout 3: The Berkeley Data
Control by Subdivision
/*************************** berkeley.sas *********************************/
options linesize=79 pagesize=35 noovp formdlim='_';
title 'Berkeley Graduate Admissions Data: ';
proc format;
value sexfmt 1 = 'Female' 0 = 'Male';
value ynfmt 1 = 'Yes' 0 = 'No';
data berkeley;
input line sex dept $ admit count;
format sex sexfmt.; format admit ynfmt.;
datalines;
1 0 A 1 512
2 0 B 1 353
3 0 C 1 120
4 0 D 1 138
5 0 E 1 53
6 0 F 1 22
7 1 A 1 89
8 1 B 1 17
9 1 C 1 202
10 1 D 1 131
11 1 E 1 94
12 1 F 1 24
13 0 A 0 313
14 0 B 0 207
15 0 C 0 205
16 0 D 0 279
17 0 E 0 138
18 0 F 0 351
19 1 A 0 19
20 1 B 0 8
21 1 C 0 391
22 1 D 0 244
23 1 E 0 299
24 1 F 0 317
;
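proc freq;
tables sex*admit / nopercent nocol chisq;      /* Sex by admission, ignoring department */
tables dept*sex / nopercent nocol chisq;       /* Where each sex applied */
tables dept*admit / nopercent nocol chisq;     /* Admission rate by department */
tables dept*sex*admit / nopercent nocol chisq; /* Sex by admission within each department */
weight count;                                  /* Each data line is a cell frequency, not one case */
run;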
/homes/students/u0/stats/brunner > cat berkeley.lst
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 1
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table of sex by admit
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 1493 | 1198 | 2691
| 55.48 | 44.52 |
---------+--------+--------+
Female | 1278 | 557 | 1835
| 69.65 | 30.35 |
---------+--------+--------+
Total 2771 1755 4526
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 2
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table of sex by admit
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 92.2053 <.0001
Likelihood Ratio Chi-Square 1 93.4494 <.0001
Continuity Adj. Chi-Square 1 91.6096 <.0001
Mantel-Haenszel Chi-Square 1 92.1849 <.0001
Phi Coefficient -0.1427
Contingency Coefficient 0.1413
Cramer's V -0.1427
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 1493
Left-sided Pr <= F 2.854E-22
Right-sided Pr >= F 1.0000
Table Probability (P) 1.314E-22
Two-sided Pr <= P 4.836E-22
Sample Size = 4526
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 3
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table of dept by sex
dept sex
Frequency|
Row Pct |Male |Female | Total
---------+--------+--------+
A | 825 | 108 | 933
| 88.42 | 11.58 |
---------+--------+--------+
B | 560 | 25 | 585
| 95.73 | 4.27 |
---------+--------+--------+
C | 325 | 593 | 918
| 35.40 | 64.60 |
---------+--------+--------+
D | 417 | 375 | 792
| 52.65 | 47.35 |
---------+--------+--------+
E | 191 | 393 | 584
| 32.71 | 67.29 |
---------+--------+--------+
F | 373 | 341 | 714
| 52.24 | 47.76 |
---------+--------+--------+
Total 2691 1835 4526
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 4
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table of dept by sex
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 5 1068.3717 <.0001
Likelihood Ratio Chi-Square 5 1220.6148 <.0001
Mantel-Haenszel Chi-Square 1 507.0346 <.0001
Phi Coefficient 0.4859
Contingency Coefficient 0.4370
Cramer's V 0.4859
Sample Size = 4526
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 5
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table of dept by admit
dept admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
A | 332 | 601 | 933
| 35.58 | 64.42 |
---------+--------+--------+
B | 215 | 370 | 585
| 36.75 | 63.25 |
---------+--------+--------+
C | 596 | 322 | 918
| 64.92 | 35.08 |
---------+--------+--------+
D | 523 | 269 | 792
| 66.04 | 33.96 |
---------+--------+--------+
E | 437 | 147 | 584
| 74.83 | 25.17 |
---------+--------+--------+
F | 668 | 46 | 714
| 93.56 | 6.44 |
---------+--------+--------+
Total 2771 1755 4526
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 6
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table of dept by admit
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 5 778.9065 <.0001
Likelihood Ratio Chi-Square 5 855.3209 <.0001
Mantel-Haenszel Chi-Square 1 724.8170 <.0001
Phi Coefficient 0.4148
Contingency Coefficient 0.3832
Cramer's V 0.4148
Sample Size = 4526
Table 1 of sex by admit
Controlling for dept=A
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 313 | 512 | 825
| 37.94 | 62.06 |
---------+--------+--------+
Female | 19 | 89 | 108
| 17.59 | 82.41 |
---------+--------+--------+
Total 332 601 933
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 7
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 1 of sex by admit
Controlling for dept=A
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 17.2480 <.0001
Likelihood Ratio Chi-Square 1 19.0540 <.0001
Continuity Adj. Chi-Square 1 16.3718 <.0001
Mantel-Haenszel Chi-Square 1 17.2295 <.0001
Phi Coefficient 0.1360
Contingency Coefficient 0.1347
Cramer's V 0.1360
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 313
Left-sided Pr <= F 1.0000
Right-sided Pr >= F 1.151E-05
Table Probability (P) 7.672E-06
Two-sided Pr <= P 1.669E-05
Sample Size = 933
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 8
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table 2 of sex by admit
Controlling for dept=B
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 207 | 353 | 560
| 36.96 | 63.04 |
---------+--------+--------+
Female | 8 | 17 | 25
| 32.00 | 68.00 |
---------+--------+--------+
Total 215 370 585
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 9
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 2 of sex by admit
Controlling for dept=B
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 0.2537 0.6145
Likelihood Ratio Chi-Square 1 0.2586 0.6111
Continuity Adj. Chi-Square 1 0.0851 0.7705
Mantel-Haenszel Chi-Square 1 0.2533 0.6148
Phi Coefficient 0.0208
Contingency Coefficient 0.0208
Cramer's V 0.0208
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 207
Left-sided Pr <= F 0.7598
Right-sided Pr >= F 0.3918
Table Probability (P) 0.1516
Two-sided Pr <= P 0.6771
Sample Size = 585
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 10
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table 3 of sex by admit
Controlling for dept=C
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 205 | 120 | 325
| 63.08 | 36.92 |
---------+--------+--------+
Female | 391 | 202 | 593
| 65.94 | 34.06 |
---------+--------+--------+
Total 596 322 918
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 11
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 3 of sex by admit
Controlling for dept=C
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 0.7535 0.3854
Likelihood Ratio Chi-Square 1 0.7510 0.3862
Continuity Adj. Chi-Square 1 0.6332 0.4262
Mantel-Haenszel Chi-Square 1 0.7527 0.3856
Phi Coefficient -0.0287
Contingency Coefficient 0.0286
Cramer's V -0.0287
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 205
Left-sided Pr <= F 0.2129
Right-sided Pr >= F 0.8265
Table Probability (P) 0.0394
Two-sided Pr <= P 0.3866
Sample Size = 918
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 12
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table 4 of sex by admit
Controlling for dept=D
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 279 | 138 | 417
| 66.91 | 33.09 |
---------+--------+--------+
Female | 244 | 131 | 375
| 65.07 | 34.93 |
---------+--------+--------+
Total 523 269 792
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 13
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 4 of sex by admit
Controlling for dept=D
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 0.2980 0.5852
Likelihood Ratio Chi-Square 1 0.2979 0.5852
Continuity Adj. Chi-Square 1 0.2216 0.6378
Mantel-Haenszel Chi-Square 1 0.2976 0.5854
Phi Coefficient 0.0194
Contingency Coefficient 0.0194
Cramer's V 0.0194
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 279
Left-sided Pr <= F 0.7328
Right-sided Pr >= F 0.3188
Table Probability (P) 0.0516
Two-sided Pr <= P 0.5995
Sample Size = 792
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 14
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table 5 of sex by admit
Controlling for dept=E
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 138 | 53 | 191
| 72.25 | 27.75 |
---------+--------+--------+
Female | 299 | 94 | 393
| 76.08 | 23.92 |
---------+--------+--------+
Total 437 147 584
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 15
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 5 of sex by admit
Controlling for dept=E
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.0011 0.3171
Likelihood Ratio Chi-Square 1 0.9904 0.3196
Continuity Adj. Chi-Square 1 0.8080 0.3687
Mantel-Haenszel Chi-Square 1 0.9994 0.3175
Phi Coefficient -0.0414
Contingency Coefficient 0.0414
Cramer's V -0.0414
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 138
Left-sided Pr <= F 0.1841
Right-sided Pr >= F 0.8646
Table Probability (P) 0.0486
Two-sided Pr <= P 0.3604
Sample Size = 584
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 16
09:50 Thursday, September 23, 2004
The FREQ Procedure
Table 6 of sex by admit
Controlling for dept=F
sex admit
Frequency|
Row Pct |No |Yes | Total
---------+--------+--------+
Male | 351 | 22 | 373
| 94.10 | 5.90 |
---------+--------+--------+
Female | 317 | 24 | 341
| 92.96 | 7.04 |
---------+--------+--------+
Total 668 46 714
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 17
09:50 Thursday, September 23, 2004
The FREQ Procedure
Statistics for Table 6 of sex by admit
Controlling for dept=F
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 0.3841 0.5354
Likelihood Ratio Chi-Square 1 0.3836 0.5357
Continuity Adj. Chi-Square 1 0.2182 0.6404
Mantel-Haenszel Chi-Square 1 0.3836 0.5357
Phi Coefficient 0.0232
Contingency Coefficient 0.0232
Cramer's V 0.0232
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 351
Left-sided Pr <= F 0.7801
Right-sided Pr >= F 0.3198
Table Probability (P) 0.1000
Two-sided Pr <= P 0.5458
To test for an association between gender and admission controlling for department, here is a good approach: pool the chi-square tests by adding the values of the chi-square statistics and also adding their degrees of freedom. Each of the six 2x2 tables contributes 1 degree of freedom, so we get Chi-square = 17.2480 + 0.2537 + 0.7535 + 0.2980 + 1.0011 + 0.3841 = 19.9384 with 6 degrees of freedom. This exceeds the 0.05-level critical value of 12.59 for a chi-square with 6 degrees of freedom, and we conclude that, controlling for department, admission is related to gender. A good reference for this method of pooling chi-square values is Stephen Fienberg's (1989) book, The Analysis of Cross-Classified Categorical Data.
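The pooling arithmetic above can be checked with a short DATA step. This is only a sketch: the data set name pooled and the variables total, df, crit and pvalue are made up for illustration, and the six chi-square values are typed in by hand from the listing rather than captured from PROC FREQ.

```sas
/* Sketch: pool the six per-department Pearson chi-squares.         */
/* Values are copied from the listing; all names are hypothetical.  */
data pooled;
     input dept $ chisq;
     total + chisq;                    /* Sum statement: running total of the statistics */
     df + 1;                           /* Each 2x2 table adds 1 degree of freedom */
     crit   = cinv(0.95, df);          /* 0.05-level critical value for the current df */
     pvalue = 1 - probchi(total, df);  /* Upper-tail p-value for the running total */
     datalines;
A 17.2480
B 0.2537
C 0.7535
D 0.2980
E 1.0011
F 0.3841
;
proc print data=pooled;
run;
```

Only the last row of the printout matters: there total should equal the pooled statistic 19.9384, df should be 6, and crit should agree with the tabled value of 12.59.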
This overall test does not tell you what the nature of the relationship is. For that, you need to examine the 6 individual 2x2 tables. So look at the chi-square test separately for each department. Use a Bonferroni correction to allow for the fact that you are doing 6 tests. This means declaring the results significant only if p < 0.05/6 = 0.0083. The only results that are significant by this criterion (or with a one-at-a-time test, for that matter) are the results for Department A. There, we see ... what?
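The Bonferroni screen just described can also be sketched in a DATA step. As before, this is an illustration under assumptions, not part of berkeley.sas: the data set name bonf and the variables pvalue, alpha_b and significant are hypothetical, and the p-values are recomputed from the chi-square statistics in the listing because the listing shows only <.0001 for Department A.

```sas
/* Sketch: Bonferroni-corrected follow-up tests, one per department. */
/* Values are copied from the listing; all names are hypothetical.   */
data bonf;
     input dept $ chisq;
     pvalue  = 1 - probchi(chisq, 1);   /* Each 2x2 table has 1 degree of freedom */
     alpha_b = 0.05 / 6;                /* Bonferroni-adjusted level for 6 tests */
     significant = (pvalue < alpha_b);  /* 1 = significant, 0 = not */
     datalines;
A 17.2480
B 0.2537
C 0.7535
D 0.2980
E 1.0011
F 0.3841
;
proc print data=bonf;
run;
```

The recomputed p-values for Departments B through F should match the Prob column of the listing, and significant should equal 1 for Department A only.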