STA429/1007 F 2004 Handout 9
Logistic regression (math and Berkeley data)
/********************** mathlog1.sas **********************/
title2 'Logistic regression on math data';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math 'mathlib'; /* Location of permanent SAS datasets */
libname library 'mathlib'; /* SAS will search for permanently stored
formats ONLY in a place called "library." */
/* mathread.sas creates the variable passed like this
if (50<=mark<=100) then passed=1; else passed=0;
label passed = 'Passed the course';
format passed ynfmt.; */
data quant; /* Includes only cases that are used for full model.
It's a shame that this is necessary. */
set math.explore;
goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */
if goodcase = . then delete ;
proc logistic descending; /* Always use descending */
title3 'Fit full model and do Wald tests';
model passed = gpa hscalc precalc calc;
hschool: test gpa=hscalc=0;
dtest: test precalc=calc=0;
proc logistic descending;
title3 'Fit reduced model for testing hschool (gpa & hscalc)';
model passed = precalc calc;
proc iml;
title3 'Calculate Likelihood Ratio Test for hschool';
G = 434.817 - 366.007; /* Got these numbers from the printout */
pval = 1-probchi(G,2);
print "G = " G ", df = 2, p = " pval;
print "Compare Wald chisquare = 51.2448";
proc logistic descending ;
title3 'Reduced model for testing diagnostic test (precalc & calc)';
model passed = gpa hscalc;
proc iml;
title3 'Calculate Likelihood Ratio Test for dtest';
G = 380.910 - 366.007; /* Got these numbers from the printout */
pval = 1-probchi(G,2);
print "G = " G ", df = 2, p = " pval;
print "Compare Wald chisquare = 13.8587";
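Copying -2 Log L values from the printout works, but it is easy to mistype a digit. As a minimal sketch, the values could instead be captured with ODS and differenced in a DATA step. This assumes a SAS release with ODS OUTPUT support, and that the fit table of proc logistic is named FitStatistics with columns Criterion and InterceptAndCovariates (worth confirming with ods trace on). The data set names fitfull, fitred and lrtest are made up for illustration.

```sas
/* Sketch only: capture -2 Log L from both fits instead of retyping it.    */
/* Assumed: ODS table FitStatistics with columns Criterion and             */
/* InterceptAndCovariates; fitfull, fitred, lrtest are illustrative names. */
ods output FitStatistics=fitfull;
proc logistic data=quant descending;
   model passed = gpa hscalc precalc calc;
run;

ods output FitStatistics=fitred;
proc logistic data=quant descending;
   model passed = precalc calc;      /* drops gpa and hscalc */
run;

data lrtest;
   /* One-to-one merge of the single '-2 Log L' row from each fit */
   merge fitfull(where=(Criterion='-2 Log L')
                 rename=(InterceptAndCovariates=full))
         fitred (where=(Criterion='-2 Log L')
                 rename=(InterceptAndCovariates=reduced));
   G    = reduced - full;            /* likelihood ratio chi-square   */
   df   = 2;                         /* two coefficients set to zero  */
   pval = 1 - probchi(G, df);
   keep G df pval;
run;

proc print data=lrtest noobs;
run;
```

The same pattern would apply to the dtest comparison by changing the reduced model to gpa and hscalc.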
Here is mathlog1.lst
_______________________________________________________________________________
The SAS System 1
Logistic regression on math data
Fit full model and do Wald tests
21:38 Saturday, October 30, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.QUANT
Response Variable passed Passed the course
Number of Response Levels 2
Number of Observations 375
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value passed Frequency
1 Yes 234
2 No 141
Probability modeled is passed='Yes'.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC           498.554       376.007
SC            502.481       395.642
-2 Log L      496.554       366.007

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      130.5468     4        <.0001
Score                 108.2737     4        <.0001
Wald                   79.7057     4        <.0001

Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1    -14.6351     2.2803       41.1914        <.0001
gpa           1      0.1181     0.0311       14.4227        0.0001
hscalc        1      0.0592     0.0136       18.9109        <.0001
precalc       1      0.2633     0.0890        8.7518        0.0031
calc          1      0.0821     0.0650        1.5969        0.2063

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
gpa          1.125       1.059       1.196
hscalc       1.061       1.033       1.090
precalc      1.301       1.093       1.549
calc         1.086       0.956       1.233

Association of Predicted Probabilities and Observed Responses

Percent Concordant     83.4    Somers' D    0.670
Percent Discordant     16.4    Gamma        0.671
Percent Tied            0.1    Tau-a        0.315
Pairs                 32994    c            0.835

Linear Hypotheses Testing Results

                 Wald
Label      Chi-Square    DF    Pr > ChiSq
hschool       51.2448     2        <.0001
dtest         13.8587     2        0.0010
_______________________________________________________________________________
The SAS System 2
Logistic regression on math data
Fit reduced model for testing hschool (gpa & hscalc)
21:38 Saturday, October 30, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.QUANT
Response Variable passed Passed the course
Number of Response Levels 2
Number of Observations 375
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value passed Frequency
1 Yes 234
2 No 141
Probability modeled is passed='Yes'.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC           498.554       440.817
SC            502.481       452.598
-2 Log L      496.554       434.817

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       61.7369     2        <.0001
Score                  55.0700     2        <.0001
Wald                   47.8371     2        <.0001

Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1     -1.8113     0.3604       25.2549        <.0001
precalc       1      0.3696     0.0821       20.2696        <.0001
calc          1      0.2066     0.0567       13.2947        0.0003

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
precalc      1.447       1.232       1.700
calc         1.230       1.100       1.374

Association of Predicted Probabilities and Observed Responses

Percent Concordant     72.4    Somers' D    0.467
Percent Discordant     25.7    Gamma        0.475
Percent Tied            1.9    Tau-a        0.220
Pairs                 32994    c            0.733
_______________________________________________________________________________
The SAS System 3
Logistic regression on math data
Calculate Likelihood Ratio Test for hschool
21:38 Saturday, October 30, 2004
            G                      PVAL
G =     68.81 , df = 2, p =    1.11E-15
Compare Wald chisquare = 51.2448
_______________________________________________________________________________
The SAS System 4
Logistic regression on math data
Reduced model for testing diagnostic test (precalc & calc)
21:38 Saturday, October 30, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.QUANT
Response Variable passed Passed the course
Number of Response Levels 2
Number of Observations 375
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value passed Frequency
1 Yes 234
2 No 141
Probability modeled is passed='Yes'.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC           498.554       386.910
SC            502.481       398.691
-2 Log L      496.554       380.910

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      115.6442     2        <.0001
Score                  99.0512     2        <.0001
Wald                   73.7669     2        <.0001

Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1    -14.7000     2.2138       44.0913        <.0001
gpa           1      0.1250     0.0303       17.0524        <.0001
hscalc        1      0.0719     0.0130       30.5780        <.0001

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
gpa          1.133       1.068       1.202
hscalc       1.075       1.048       1.102

Association of Predicted Probabilities and Observed Responses

Percent Concordant     81.9    Somers' D    0.640
Percent Discordant     18.0    Gamma        0.641
Percent Tied            0.1    Tau-a        0.301
Pairs                 32994    c            0.820
_______________________________________________________________________________
The SAS System 5
Logistic regression on math data
Calculate Likelihood Ratio Test for dtest
21:38 Saturday, October 30, 2004
            G                      PVAL
G =    14.903 , df = 2, p =   0.0005806
Compare Wald chisquare = 13.8587
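The p = 1.11E-15 printed for hschool sits close to the resolution of double-precision arithmetic near 1, so 1-probchi(G,2) is probably reporting rounding as much as tail area. A hedged alternative is the SDF function, which returns the upper tail directly. For df = 2 the tail also has the closed form exp(-G/2), which serves as a cross-check. The sketch below recomputes both tests from the printed -2 Log L values; the data set name lrcheck and the variable names are illustrative, not part of the original programs.

```sas
/* Sketch: recompute both math-data LR tests from the printed -2 Log L. */
data lrcheck;
   length test $ 8;
   input test $ reduced full df;
   G     = reduced - full;             /* difference in -2 Log L            */
   p_sub = 1 - probchi(G, df);         /* handout's method: 1 minus the CDF */
   p_sdf = sdf('CHISQUARE', G, df);    /* upper tail computed directly      */
   p_exp = exp(-G/2);                  /* closed form, valid only for df=2  */
   datalines;
hschool 434.817 366.007 2
dtest   380.910 366.007 2
;
run;

proc print data=lrcheck noobs;
   format p_sub p_sdf p_exp e12.;
run;
```

For dtest the three columns should agree to the digits shown; any visible disagreement is expected only in the hschool row, where the p-value is tiny.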
For reference, here is berkdata.sas. Use it with %include.
/* berkdata.sas: Define Berkeley Grad admissions data
Always need weight count */
options linesize=79 pagesize=35 noovp formdlim='_';
title 'Berkeley Graduate Admissions Data: ';
proc format;
value sexfmt 1 = 'Female' 0 = 'Male';
value ynfmt 1 = 'Yes' 0 = 'No';
data berkley;
input line sex dept $ admit count;
/* Dummy vars for department: F is ref category */
if dept='A' then a = 1 ; else a = 0;
if dept='B' then b = 1 ; else b = 0;
if dept='C' then c = 1 ; else c = 0;
if dept='D' then d = 1 ; else d = 0;
if dept='E' then e = 1 ; else e = 0;
format sex sexfmt.; format admit ynfmt.;
datalines;
1 0 A 1 512
2 0 B 1 353
3 0 C 1 120
4 0 D 1 138
5 0 E 1 53
6 0 F 1 22
7 1 A 1 89
8 1 B 1 17
9 1 C 1 202
10 1 D 1 131
11 1 E 1 94
12 1 F 1 24
13 0 A 0 313
14 0 B 0 207
15 0 C 0 205
16 0 D 0 279
17 0 E 0 138
18 0 F 0 351
19 1 A 0 19
20 1 B 0 8
21 1 C 0 391
22 1 D 0 244
23 1 E 0 299
24 1 F 0 317
;
/* Commented out
proc freq;
tables dept * (a--e) / norow nocol nopercent;
*/
Now the program file, followed by the list file.
/* logberk.sas*/
%include 'berkdata.sas'; /* Always need weight count */
title2 'Logistic dummy var regression on Berkeley data';
proc logistic descending;
title3 'Admit by sex: proc freq gives LR chisq = 93.4494';
model admit = sex;
weight count;
proc logistic descending;
title3 'Admit by Dept: proc freq gives LR chisq = 855.3209';
title4 '(Also the reduced model for testing sex controlling for dept.)';
model admit = a -- e;
weight count;
proc logistic descending;
title3 'Full model with Sex and Dept and Interaction';
class sex dept;
model admit = sex dept sex*dept;
weight count;
/* The subdivision approach to controlling for dept specifies a reduced
model in which there is no relationship between sex and admission for
ANY department. If we were testing means in an ordinary ANOVA, this
would look like two profiles (one for males and one for females) that
were exactly on top of each other. That is, there would be no main
effect for sex AND no dept by sex interaction. That's what we test
in the present case too. We have already fit the reduced model, which
has just dept, but no sex and no interaction. */
proc iml;
title3 'Calculate LR test of Sex controlling for Dept';
print " proc freq gives LR chisq =";
print "19.0540+0.2586+0.75100+0.2979+0.9904+0.3836 = 21.7355";
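/* Note on df: the full model has 11 slopes (1 for sex, 5 for dept, 5 for
sex*dept) and the dept-only model has 5, so the difference is 6, matching
the six 1-df proc freq statistics summed above. The df = 5 below is what
produced the listing that follows; with df = 6 the p-value is larger
(about 0.0014) but the conclusion is the same. */
G = 5189.020 - 5167.284; pval = 1-probchi(G,5);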
print "G = " G ", df = 5, p = " pval;
/* Finally, the significant Wald test for interaction is intriguing. If it
holds up, it would represent the substantial association in Dept A,
contrasted with the weak or absent association in other departments.
Fit reduced model with sex and dept but no interaction. */
proc logistic descending;
title3 'Reduced model with Sex and Dept';
class sex dept;
model admit = sex dept;
weight count;
proc iml;
title3 'Calculate LR test of Sex by Dept';
print "Wald chisquare = 17.9011, df=5, p = 0.0031";
G = 5187.488 - 5167.284; pval = 1-probchi(G,5);
print "G = " G ", df = 5, p = " pval;
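The titles and print statements in logberk.sas quote likelihood ratio chi-squares from proc freq, but that run is not shown in this handout. The following is a minimal sketch of how such tables might be requested, assuming berkdata.sas has been %included so that the berkley data set exists. The chisq option reports a likelihood ratio chi-square alongside Pearson's, and the three-way request produces one sex by admit table for each department.

```sas
/* Sketch: proc freq runs whose LR chi-squares the logistic titles quote. */
/* Assumes berkdata.sas has been %included so WORK.BERKLEY exists.        */
proc freq data=berkley;
   weight count;                     /* each line stands for count cases  */
   tables sex*admit  / chisq nocol nopercent;    /* admit by sex          */
   tables dept*admit / chisq nocol nopercent;    /* admit by dept         */
   tables dept*sex*admit / chisq nocol nopercent;
                                     /* one sex-by-admit table per dept   */
run;
```

As with proc logistic, leaving out the weight statement would treat the 24 data lines as 24 cases.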
Here is logberk.lst.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 1
Logistic dummy var regression on Berkeley data
Admit by sex: proc freq gives LR chisq = 93.4494
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.BERKLEY
Response Variable admit
Number of Response Levels 2
Number of Observations 24
Weight Variable count
Sum of Weights 4526
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total Total
Value admit Frequency Weight
1 Yes 12 1755.0000
2 No 12 2771.0000
Probability modeled is admit='Yes'.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 2
Logistic dummy var regression on Berkeley data
Admit by sex: proc freq gives LR chisq = 93.4494
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC          6046.341      5954.891
SC           6047.519      5957.247
-2 Log L     6044.341      5950.891

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       93.4494     1        <.0001
Score                  92.2053     1        <.0001
Wald                   91.2356     1        <.0001

Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.2201     0.0388       32.2086        <.0001
sex           1     -0.6103     0.0639       91.2356        <.0001
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 3
Logistic dummy var regression on Berkeley data
Admit by sex: proc freq gives LR chisq = 93.4494
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
sex          0.543       0.479       0.616

Association of Predicted Probabilities and Observed Responses

Percent Concordant     25.0    Somers' D    0.000
Percent Discordant     25.0    Gamma        0.000
Percent Tied           50.0    Tau-a        0.000
Pairs                   144    c            0.500
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 4
Logistic dummy var regression on Berkeley data
Admit by Dept: proc freq gives LR chisq = 855.3209
(Also the reduced model for testing sex controlling for dept.)
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.BERKLEY
Response Variable admit
Number of Response Levels 2
Number of Observations 24
Weight Variable count
Sum of Weights 4526
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total Total
Value admit Frequency Weight
1 Yes 12 1755.0000
2 No 12 2771.0000
Probability modeled is admit='Yes'.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 5
Logistic dummy var regression on Berkeley data
Admit by Dept: proc freq gives LR chisq = 855.3209
(Also the reduced model for testing sex controlling for dept.)
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC          6046.341      5201.020
SC           6047.519      5208.088
-2 Log L     6044.341      5189.020

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      855.3209     5        <.0001
Score                 778.9065     5        <.0001
Wald                  623.0288     5        <.0001
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 6
Logistic dummy var regression on Berkeley data
Admit by Dept: proc freq gives LR chisq = 855.3209
(Also the reduced model for testing sex controlling for dept.)
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1     -2.6755     0.1524      308.1089        <.0001
a             1      3.2689     0.1671      382.8915        <.0001
b             1      3.2183     0.1749      338.6368        <.0001
c             1      2.0598     0.1674      151.4376        <.0001
d             1      2.0106     0.1699      140.0623        <.0001
e             1      1.5860     0.1798       77.8152        <.0001

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
a           26.283      18.944      36.464
b           24.986      17.735      35.202
c            7.844       5.650      10.890
d            7.468       5.353      10.418
e            4.884       3.433       6.947
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 7
Logistic dummy var regression on Berkeley data
Admit by Dept: proc freq gives LR chisq = 855.3209
(Also the reduced model for testing sex controlling for dept.)
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Association of Predicted Probabilities and Observed Responses

Percent Concordant     41.7    Somers' D    0.000
Percent Discordant     41.7    Gamma        0.000
Percent Tied           16.7    Tau-a        0.000
Pairs                   144    c            0.500
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 8
Logistic dummy var regression on Berkeley data
Full model with Sex and Dept and Interaction
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.BERKLEY
Response Variable admit
Number of Response Levels 2
Number of Observations 24
Weight Variable count
Sum of Weights 4526
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total Total
Value admit Frequency Weight
1 Yes 12 1755.0000
2 No 12 2771.0000
Probability modeled is admit='Yes'.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 9
Logistic dummy var regression on Berkeley data
Full model with Sex and Dept and Interaction
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Class Level Information

                          Design Variables
Class    Value        1     2     3     4     5
sex      Female       1
         Male        -1
dept     A            1     0     0     0     0
         B            0     1     0     0     0
         C            0     0     1     0     0
         D            0     0     0     1     0
         E            0     0     0     0     1
         F           -1    -1    -1    -1    -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 10
Logistic dummy var regression on Berkeley data
Full model with Sex and Dept and Interaction
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC          6046.341      5191.284
SC           6047.519      5205.421
-2 Log L     6044.341      5167.284

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      877.0564    11        <.0001
Score                 797.7045    11        <.0001
Wald                  628.1667    11        <.0001

Type III Analysis of Effects

                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
sex          1        3.3927        0.0655
dept         5      389.4863        <.0001
sex*dept     5       17.9011        0.0031
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 11
Logistic dummy var regression on Berkeley data
Full model with Sex and Dept and Interaction
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates

                                      Standard          Wald
Parameter            DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept             1     -0.5552     0.0551      101.5729        <.0001
sex       Female      1      0.1015     0.0551        3.3927        0.0655
dept      A           1      1.5733     0.1206      170.2821        <.0001
dept      B           1      1.1990     0.1869       41.1307        <.0001
dept      C           1     -0.0428     0.0805        0.2822        0.5953
dept      D           1     -0.1078     0.0824        1.7094        0.1911
dept      E           1     -0.5019     0.0986       25.9191        <.0001
sex*dept  Female A    1      0.4246     0.1206       12.3999        0.0004
sex*dept  Female B    1     0.00854     0.1869        0.0021        0.9635
sex*dept  Female C    1     -0.1639     0.0805        4.1420        0.0418
sex*dept  Female D    1     -0.0605     0.0824        0.5382        0.4632
sex*dept  Female E    1     -0.2016     0.0986        4.1808        0.0409

Association of Predicted Probabilities and Observed Responses

Percent Concordant     45.8    Somers' D    0.000
Percent Discordant     45.8    Gamma        0.000
Percent Tied            8.3    Tau-a        0.000
Pairs                   144    c            0.500
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 12
Logistic dummy var regression on Berkeley data
Calculate LR test of Sex controlling for Dept
06:40 Sunday, October 31, 2004
proc freq gives LR chisq =
19.0540+0.2586+0.75100+0.2979+0.9904+0.3836 = 21.7355
            G                      PVAL
G =    21.736 , df = 5, p =   0.0005877
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 13
Logistic dummy var regression on Berkeley data
Reduced model with Sex and Dept
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.BERKLEY
Response Variable admit
Number of Response Levels 2
Number of Observations 24
Weight Variable count
Sum of Weights 4526
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total Total
Value admit Frequency Weight
1 Yes 12 1755.0000
2 No 12 2771.0000
Probability modeled is admit='Yes'.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 14
Logistic dummy var regression on Berkeley data
Reduced model with Sex and Dept
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Class Level Information

                          Design Variables
Class    Value        1     2     3     4     5
sex      Female       1
         Male        -1
dept     A            1     0     0     0     0
         B            0     1     0     0     0
         C            0     0     1     0     0
         D            0     0     0     1     0
         E            0     0     0     0     1
         F           -1    -1    -1    -1    -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 15
Logistic dummy var regression on Berkeley data
Reduced model with Sex and Dept
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Model Fit Statistics

                         Intercept
            Intercept       and
Criterion      Only      Covariates
AIC          6046.341      5201.488
SC           6047.519      5209.735
-2 Log L     6044.341      5187.488

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      856.8521     6        <.0001
Score                 780.0984     6        <.0001
Wald                  623.9394     6        <.0001

Type III Analysis of Effects

                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
sex          1        1.5260        0.2167
dept         5      534.7084        <.0001
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 16
Logistic dummy var regression on Berkeley data
Reduced model with Sex and Dept
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates

                                      Standard          Wald
Parameter            DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept             1     -0.6424     0.0397      262.4455        <.0001
sex       Female      1      0.0499     0.0404        1.5260        0.2167
dept      A           1      1.2744     0.0723      310.8017        <.0001
dept      B           1      1.2310     0.0856      206.9662        <.0001
dept      C           1      0.0118     0.0714        0.0272        0.8690
dept      D           1     -0.0202     0.0729        0.0772        0.7812
dept      E           1     -0.4649     0.0898       26.7929        <.0001

Odds Ratio Estimates

                             Point          95% Wald
Effect                    Estimate      Confidence Limits
sex  Female vs Male          1.105       0.943       1.295
dept A vs F                 27.284      19.553      38.070
dept B vs F                 26.125      18.403      37.087
dept C vs F                  7.719       5.555      10.726
dept D vs F                  7.476       5.358      10.430
dept E vs F                  4.792       3.365       6.825
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 17
Logistic dummy var regression on Berkeley data
Reduced model with Sex and Dept
06:40 Sunday, October 31, 2004
The LOGISTIC Procedure
Association of Predicted Probabilities and Observed Responses

Percent Concordant     45.8    Somers' D    0.000
Percent Discordant     45.8    Gamma        0.000
Percent Tied            8.3    Tau-a        0.000
Pairs                   144    c            0.500
_______________________________________________________________________________
Berkeley Graduate Admissions Data: 18
Logistic dummy var regression on Berkeley data
Calculate LR test of Sex by Dept
06:40 Sunday, October 31, 2004
Wald chisquare = 17.9011, df=5, p = 0.0031
            G                      PVAL
G =    20.204 , df = 5, p =   0.0011442
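Two pieces of arithmetic behind the Berkeley output can be checked in a DATA step. First, the class variables are effect coded (Female = 1, Male = -1; dept F = -1 on every design variable), so an odds ratio is not simply exp of one coefficient: Female vs Male is exp(2 * 0.0499), and the coefficient for dept F is minus the sum of the other five. Second, the two likelihood ratio tests can be recomputed with the upper-tail function SDF. This sketch counts parameters to get df = 6 for sex controlling for dept (11 slopes in the full model, 5 in the dept-only model), which differs from the 5 used in logberk.sas, so both are computed. All variable names are illustrative.

```sas
/* Sketch: check odds ratios and LR tests from the printed Berkeley output. */
data _null_;
   /* Odds ratios from the effect-coded 'Reduced model with Sex and Dept' */
   or_sex = exp(2 * 0.0499);                     /* Female vs Male         */
   b_f    = -(1.2744 + 1.2310 + 0.0118 - 0.0202 - 0.4649);   /* dept F    */
   or_a_f = exp(1.2744 - b_f);                   /* dept A vs F            */

   /* LR test of sex controlling for dept: dept-only vs full model */
   G1     = 5189.020 - 5167.284;
   p1_df5 = sdf('CHISQUARE', G1, 5);             /* df used in logberk.sas  */
   p1_df6 = sdf('CHISQUARE', G1, 6);             /* df from parameter count */

   /* LR test of sex by dept interaction: main-effects vs full model */
   G2 = 5187.488 - 5167.284;
   p2 = sdf('CHISQUARE', G2, 5);

   put or_sex= or_a_f= / G1= p1_df5= p1_df6= / G2= p2=;
run;
```

The two odds ratios should land close to the 1.105 and 27.284 in the Odds Ratio Estimates table; small differences come from the four-decimal rounding of the printed coefficients.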