STA429/1007 F 2004 Handout 10
Logistic regression (Math data): Tie up some loose ends
/********************** mathlog2.sas **********************/
title2 'Logistic regression on math data: Part II';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math '/homes/students/u0/stats/brunner/mathlib';
/* Full path to permanent SAS datasets */
libname library '/homes/students/u0/stats/brunner/mathlib';
/* SAS will search for permanently stored formats ONLY in a
place called "library." */
data quant; /* Includes only cases that are used for full model. */
set math.explore;
goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */
if goodcase =. then delete; /* Includes only cases used for full model. */
/* Standardize these vars in proc standard below */
zgpa=gpa ; zhscalc=hscalc ; zprecalc=precalc ; zcalc=calc ;
proc standard data=quant mean=0 std=1 out=withz;
var zgpa zhscalc zprecalc zcalc; /* Standardize these vars */
proc logistic descending;
title3 'Fit full model and do Wald tests with original vars';
model passed = gpa hscalc precalc calc;
hschool: test gpa=hscalc=0;
dtest: test precalc=calc=0;
proc logistic descending;
title3 'Fit full model and do Wald tests with standardized vars';
title4 'Compare -2LL=366.007, hschool=51.2448, dtest=13.8587';
model passed = zgpa zhscalc zprecalc zcalc;
hschool: test zgpa=zhscalc=0;
dtest: test zprecalc=zcalc=0;
proc iml;
title3 'Calculate predicted probability of passing';
print "For a student at the mean on all Independent Variables,";
/* Using estimated intercept for standardized model */
avepass = exp(0.8036) / (1 + exp(0.8036) );
print "Estimated probability of passing is " avepass ;
b = {-14.6351,0.1181,0.0592,0.2633,0.0821}; print b;
/* From the Estimate col, non-standardized */
print "For a student with HS GPA = 80, HS calc = 75,";
print "7 out of 10 on the precalc and 6 out of 10 on calc,";
x = {1,80,75,7,6}; /* That 1 corresponds to the intercept */
lcombo = b` * x; /* Matrix multiplication: Short for
lcombo = -14.6351 + 0.1181*80 + 0.0592*75 + 0.2633*7 + 0.0821*6 */
pass = exp(lcombo) / (1+exp(lcombo));
print "Estimated probability of passing is " pass ;
/* Try to make proc logistic do the LR test for dtest, G = 14.903 */
proc logistic data=math.explore descending;
title3 'Try to get LR test for calc & precal: G = 14.903 ';
model passed = gpa hscalc precalc calc
/ include=2 selection=forward sequential slentry=1;
/************* Here is what the options are doing: *************
include=2 Include the first two variables regardless
selection=forward Add vars to the model (not remove)
sequential In the order they appear in the model statement
slentry=1 Enter a var if p < 1 (default is 0.05). So get them all.
******************************************************************/
/* Here's what happened. After the first two variables were entered (step
zero), -2 Log L = 380.910. After the remaining two were entered (end of
step 2), we have -2 Log L = 366.007. That's the same -2 Log L as the full
model, and 380.910 - 366.007 = 14.903, the likelihood ratio statistic G with
2 degrees of freedom. Good. It's still clumsy, but at least we don't have to
fit full and reduced models separately. */
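The subtraction in that last comment can also be handed to SAS, which then supplies a p-value as well. This is a minimal sketch rather than part of mathlog2.sas: the two -2 Log L values are typed in by hand from the listing, and the variable names (reduced, full, G, df, pvalue) are made up for the illustration.

```sas
/* Sketch: p-value for the likelihood ratio test of precalc and calc.  */
/* The -2 Log L values are copied by hand from the proc logistic       */
/* output; all variable names here are illustrative.                   */
data _null_;
   reduced = 380.910;              /* -2 Log L, gpa and hscalc only (Step 0) */
   full    = 366.007;              /* -2 Log L, all four variables (Step 2)  */
   G       = reduced - full;       /* Likelihood ratio statistic             */
   df      = 2;                    /* Two coefficients are set to zero       */
   pvalue  = 1 - probchi(G, df);   /* Upper tail area of chi-square(df)      */
   put 'G = ' G 8.3 '  df = ' df '  p = ' pvalue 8.5;
run;
```

With 2 degrees of freedom the upper tail area is exp(-G/2), so the p-value written to the log should be about 0.0006.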
Output from the first two proc logistic runs is omitted, because all it shows is that in logistic regression, as in ordinary regression, you can standardize the independent variables (or just center them by subtracting off the mean) without affecting tests of whether a regression coefficient or a collection of regression coefficients equals zero. (This excludes tests about the intercept β0, of course.) If you want to see the entire list file anyway, it's available.
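A short sketch of why this holds, in notation that is ours rather than the handout's: write m_j and s_j for the sample mean and standard deviation of independent variable x_j, and z_j = (x_j - m_j)/s_j for its standardized version. Substituting x_j = m_j + s_j z_j into the linear part of the model gives

```latex
\begin{align*}
\beta_0 + \sum_{j=1}^{4} \beta_j x_j
  &= \beta_0 + \sum_{j=1}^{4} \beta_j \,(m_j + s_j z_j) \\
  &= \Bigl(\beta_0 + \sum_{j=1}^{4} \beta_j m_j\Bigr)
     + \sum_{j=1}^{4} (s_j \beta_j)\, z_j .
\end{align*}
```

So the standardized model is the same model with slopes s_j β_j and a new intercept. Because s_j > 0, a slope is zero in one version exactly when it is zero in the other, and the fitted probabilities and -2 Log L are unchanged. Only the intercept takes on a different meaning (the log odds for a student at the mean on every variable), which is why tests about the intercept are excluded.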
Here is the rest of mathlog2.lst
_______________________________________________________________________________
The SAS System 3
Logistic regression on math data: Part II
Calculate predicted probability of passing
10:52 Sunday, November 7, 2004
For a student at the mean on all Independent Variables,
AVEPASS
Estimated probability of passing is 0.690744
B
-14.6351
0.1181
0.0592
0.2633
0.0821
For a student with HS GPA = 80, HS calc = 75,
7 out of 10 on the precalc and 6 out of 10 on calc,
PASS
Estimated probability of passing is 0.830419
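For readers who want to check the two printed probabilities by hand, the arithmetic can be laid out as follows. The symbols are ours: the hatted pi stands for an estimated probability of passing and the script l for the quantity called lcombo in the program.

```latex
\begin{align*}
\hat{\pi}_{\text{mean}} &= \frac{e^{0.8036}}{1 + e^{0.8036}}
   \approx \frac{2.2336}{3.2336} \approx 0.6907, \\[4pt]
\ell &= -14.6351 + 0.1181(80) + 0.0592(75) + 0.2633(7) + 0.0821(6) \\
     &= -14.6351 + 9.4480 + 4.4400 + 1.8431 + 0.4926 = 1.5886, \\[4pt]
\hat{\pi} &= \frac{e^{1.5886}}{1 + e^{1.5886}}
   \approx \frac{4.8969}{5.8969} \approx 0.8304 .
\end{align*}
```

Both values agree with AVEPASS and PASS above. The intermediate quantity 4.8969 is the estimated odds of passing for this student, a little under 5 to 1.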
_______________________________________________________________________________
The SAS System 4
Logistic regression on math data: Part II
Try to get LR test for calc & precal: G = 14.903
10:52 Sunday, November 7, 2004
The LOGISTIC Procedure
Model Information
Data Set                      MATH.EXPLORE
Response Variable             passed            Passed the course
Number of Response Levels     2
Number of Observations        375
Model                         binary logit
Optimization Technique        Fisher's scoring
Response Profile
Ordered                     Total
  Value     passed      Frequency
      1     Yes               234
      2     No                141
Probability modeled is passed='Yes'.
NOTE: 204 observations were deleted due to missing values for the response or
explanatory variables.
Forward Selection Procedure
The following effects will be included in each model:
Intercept gpa hscalc
Step 0. The INCLUDE effects were entered.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             498.554        386.910
SC              502.481        398.691
-2 Log L        496.554        380.910
Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio       115.6442      2         <.0001
Score                   99.0512      2         <.0001
Wald                    73.7669      2         <.0001
Residual Chi-Square Test
Chi-Square     DF     Pr > ChiSq
   14.4788      2         0.0007
Step 1. Effect precalc entered:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             498.554        375.618
SC              502.481        391.326
-2 Log L        496.554        367.618
Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio       128.9358      3         <.0001
Score                  107.7971      3         <.0001
Wald                    79.6583      3         <.0001
Residual Chi-Square Test
Chi-Square     DF     Pr > ChiSq
    1.6051      1         0.2052
Step 2. Effect calc entered:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             498.554        376.007
SC              502.481        395.642
-2 Log L        496.554        366.007
Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio       130.5468      4         <.0001
Score                  108.2737      4         <.0001
Wald                    79.7057      4         <.0001
NOTE: All effects have been entered into the model.
Summary of Forward Selection
        Effect             Number         Score
Step    Entered     DF         In    Chi-Square    Pr > ChiSq
   1    precalc      1          3       13.0467        0.0003
   2    calc         1          4        1.6051        0.2052
Summary of Forward Selection
        Variable
Step    Label
   1    Number precalculus correct
   2    Number calculus correct
Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1    -14.6351      2.2803       41.1914        <.0001
gpa           1      0.1181      0.0311       14.4227        0.0001
hscalc        1      0.0592      0.0136       18.9109        <.0001
precalc       1      0.2633      0.0890        8.7518        0.0031
calc          1      0.0821      0.0650        1.5969        0.2063
Odds Ratio Estimates
              Point          95% Wald
Effect     Estimate      Confidence Limits
gpa           1.125       1.059       1.196
hscalc        1.061       1.033       1.090
precalc       1.301       1.093       1.549
calc          1.086       0.956       1.233
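Each row of the odds ratio table is just the exponentiated coefficient, with limits obtained by exponentiating the estimate plus or minus about 1.96 standard errors. As a rough check, assuming the usual Wald construction, the gpa row can be reproduced from the Estimate and Standard Error columns. The variable names below are illustrative and not part of mathlog2.sas.

```sas
/* Sketch: reproduce the gpa row of the Odds Ratio Estimates table.    */
/* Estimate and standard error are copied by hand from the Analysis of */
/* Maximum Likelihood Estimates table; variable names are illustrative.*/
data _null_;
   b  = 0.1181;                  /* Estimated coefficient for gpa      */
   se = 0.0311;                  /* Its standard error                 */
   z  = probit(0.975);           /* Normal critical value, about 1.96  */
   oddsratio = exp(b);           /* Point estimate of the odds ratio   */
   lower     = exp(b - z*se);    /* Lower 95% Wald confidence limit    */
   upper     = exp(b + z*se);    /* Upper 95% Wald confidence limit    */
   put oddsratio= 6.3 lower= 6.3 upper= 6.3;
run;
```

The log should show values close to 1.125, 1.059 and 1.196. Small discrepancies in the last digit are possible for other rows, because the inputs are already rounded to four decimal places.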
Association of Predicted Probabilities and Observed Responses
Percent Concordant     83.4    Somers' D    0.670
Percent Discordant     16.4    Gamma        0.671
Percent Tied            0.1    Tau-a        0.315
Pairs                 32994    c            0.835
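The numbers in this last table are tied together by simple arithmetic, which can serve as a check on one's reading of the output. The sketch below uses the standard definitions of these rank measures with our own symbols: p_c and p_d are the proportions of concordant and discordant pairs, t is the number of pairs (one student who passed paired with one who did not), and N = 375 is the sample size. Since the inputs are the rounded percentages from the listing, agreement is only to about three decimals.

```latex
\begin{align*}
t &= 234 \times 141 = 32994, \\
\text{Somers' } D &= p_c - p_d = 0.834 - 0.164 = 0.670, \\
\text{Gamma} &= \frac{p_c - p_d}{p_c + p_d} = \frac{0.670}{0.998} \approx 0.671, \\
\text{Tau-}a &= \frac{(p_c - p_d)\, t}{N(N-1)/2}
   = \frac{0.670 \times 32994}{70125} \approx 0.315, \\
c &= \frac{1 + D}{2} = \frac{1.670}{2} = 0.835 .
\end{align*}
```

The quantity c is the proportion of pairs in which the student who passed had the higher predicted probability, counting ties as one half.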