STA429/1007 F 2004 Handout 10
Logistic regression (Math data): Tie up some loose ends
/********************** mathlog2.sas **********************/
title2 'Logistic regression on math data: Part II';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math '/homes/students/u0/stats/brunner/mathlib';
     /* Full path to permanent SAS datasets */
libname library '/homes/students/u0/stats/brunner/mathlib';
     /* SAS will search for permanently stored formats ONLY in a place
        called "library." */

data quant;  /* Includes only cases that are used for full model. */
     set math.explore;
     goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */
     if goodcase =. then delete;  /* Includes only cases used for full model. */
     /* Standardize these vars in proc standard below */
     zgpa=gpa ; zhscalc=hscalc ; zprecalc=precalc ; zcalc=calc ;

proc standard data=quant mean=0 std=1 out=withz;
     var zgpa zhscalc zprecalc zcalc; /* Standardize these vars */

proc logistic descending;
     title3 'Fit full model and do Wald tests with original vars';
     model passed = gpa hscalc precalc calc;
     hschool: test gpa=hscalc=0;
     dtest:   test precalc=calc=0;

proc logistic descending;
     title3 'Fit full model and do Wald tests with standardized vars';
     title4 'Compare -2LL=366.007, hschool=51.2448, dtest=13.8587';
     model passed = zgpa zhscalc zprecalc zcalc;
     hschool: test zgpa=zhscalc=0;
     dtest:   test zprecalc=zcalc=0;

proc iml;
     title3 'Calculate prodicted probability of passing';
     print "For a student at the mean on all Independent Variables,";
     /* Using estimated intercept for standardized model */
     avepass = exp(0.8036) / (1 + exp(0.8036) );
     print "Estimated probability of passing is " avepass ;
     b = {-14.6351,0.1181,0.0592,0.2633,0.0821};
     print b;  /* From the Estimate col, non-standardized */
     print "For a student with HS GPA = 80, HS calc = 75,";
     print "7 out of 10 on the precalc and 6 out of 10 on calc,";
     x = {1,80,75,7,6};  /* That 1 corresponds to the intercept */
     lcombo = b` * x;    /* Matrix multiplication: Short for
          lcombo = -14.6351 + 0.1181*80 + 0.0592*75 + 0.2633*7 + 0.0821*6 */
     pass = exp(lcombo) / (1+exp(lcombo));
     print "Estimated probability of passing is " pass ;

/* Try to make proc logistic do the LR test for dtest, G = 14.903 */

proc logistic data=math.explore descending;
     title3 'Try to get LR test for calc & precal: G = 14.903 ';
     model passed = gpa hscalc precalc calc
           / include=2 selection=forward sequential slentry=1;

/************* Here is what the options are doing: *************
    include=2          Include the first two variables regardless
    selection=forward  Add vars to the model (not remove)
    sequential         In the order they appear in the model statement
    slentry=1          Enter a var if p < 1 (default is 0.05).
                       So get them all.
******************************************************************/

/* Here's what happened. After the first two variables were entered
   (step zero) -2 Log L = 380.910. After all the rest were entered,
   (end of step 2), we have -2 Log L = 366.007. That's the same
   -2 Log L as the full model, and 380.910 - 366.007 = 14.903. Good.
   It's still clumsy, but at least we don't have to fit full and
   reduced models separately. */
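The arithmetic in that last comment can be cross-checked outside SAS. What follows is a minimal Python sketch, not part of mathlog2.sas; the variable names are made up for illustration. It assumes the usual large-sample chi-square reference distribution with 2 degrees of freedom (one each for precalc and calc). For 2 degrees of freedom the upper tail probability has the closed form exp(-G/2), so no statistics library is needed.

```python
import math

# -2 Log L values copied from the forward-selection output
neg2ll_reduced = 380.910   # Step 0: intercept, gpa, hscalc
neg2ll_full = 366.007      # Step 2: precalc and calc added

# Likelihood ratio statistic for H0: both added coefficients are zero
G = neg2ll_reduced - neg2ll_full
df = 2                     # two coefficients set to zero under H0

# Chi-square upper tail for df = 2 has the closed form exp(-x/2)
p_value = math.exp(-G / 2)

print(f"G = {G:.3f}, df = {df}, p = {p_value:.5f}")
```

The p-value comes out at roughly 0.0006, so the likelihood ratio test points the same way as the Wald test labelled dtest.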
Output from the first two proc logistic steps is omitted, because all it shows is that in logistic regression, as in ordinary regression, you can standardize the independent variables (or just center them by subtracting off the mean) without affecting tests of whether a regression coefficient or a collection of regression coefficients is equal to zero (this excludes tests about β0, of course). If you want to see the entire list file anyway, it's available.
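A short sketch of why this works, in notation that is not in the handout: write the sample mean and standard deviation of predictor j as \bar{x}_j and s_j (assumed symbols), and mark the coefficients of the standardized model with a star.

```latex
\[
\beta_0 + \sum_{j=1}^{4} \beta_j x_j
  \;=\;
  \underbrace{\Bigl(\beta_0 + \sum_{j=1}^{4} \beta_j \bar{x}_j\Bigr)}_{\beta_0^{*}}
  \;+\; \sum_{j=1}^{4} \underbrace{(s_j \beta_j)}_{\beta_j^{*}} \, z_j ,
\qquad
z_j = \frac{x_j - \bar{x}_j}{s_j} .
\]
```

Both versions give the same log odds for every case, so the fitted probabilities and -2 Log L are the same (366.007 in both runs, as title4 notes). Because s_j > 0, each slope satisfies \beta_j^{*} = 0 exactly when \beta_j = 0, so tests on the slopes are unaffected. The intercept does change: \beta_0^{*} is the log odds for a student at the mean on every variable, which is why the proc iml step uses the standardized intercept 0.8036 for the "average" student.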
Here is the rest of mathlog2.lst
_______________________________________________________________________________

                                 The SAS System                               3
                   Logistic regression on math data: Part II
                  Calculate prodicted probability of passing
                                                10:52 Sunday, November 7, 2004

            For a student at the mean on all Independent Variables,

                                                      AVEPASS
                 Estimated probability of passing is 0.690744

                                       B
                                   -14.6351
                                     0.1181
                                     0.0592
                                     0.2633
                                     0.0821

                 For a student with HS GPA = 80, HS calc = 75,
              7 out of 10 on the precalc and 6 out of 10 on calc,

                                                         PASS
                 Estimated probability of passing is 0.830419

_______________________________________________________________________________

                                 The SAS System                               4
                   Logistic regression on math data: Part II
               Try to get LR test for calc & precal: G = 14.903
                                                10:52 Sunday, November 7, 2004

                             The LOGISTIC Procedure

                               Model Information

       Data Set                      MATH.EXPLORE
       Response Variable             passed             Passed the course
       Number of Response Levels     2
       Number of Observations        375
       Model                         binary logit
       Optimization Technique        Fisher's scoring

                                Response Profile

                       Ordered                      Total
                         Value     passed       Frequency

                             1     Yes                234
                             2     No                 141

                     Probability modeled is passed='Yes'.

NOTE: 204 observations were deleted due to missing values for the response or
      explanatory variables.

                          Forward Selection Procedure

The following effects will be included in each model:

Intercept  gpa  hscalc

Step  0. The INCLUDE effects were entered.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        386.910
                     SC              502.481        398.691
                     -2 Log L        496.554        380.910

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       115.6442        2         <.0001
            Score                   99.0512        2         <.0001
            Wald                    73.7669        2         <.0001

                           Residual Chi-Square Test

                      Chi-Square       DF     Pr > ChiSq

                         14.4788        2         0.0007

Step  1. Effect precalc entered:

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        375.618
                     SC              502.481        391.326
                     -2 Log L        496.554        367.618

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       128.9358        3         <.0001
            Score                  107.7971        3         <.0001
            Wald                    79.6583        3         <.0001

                           Residual Chi-Square Test

                      Chi-Square       DF     Pr > ChiSq

                          1.6051        1         0.2052

Step  2. Effect calc entered:

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        376.007
                     SC              502.481        395.642
                     -2 Log L        496.554        366.007

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       130.5468        4         <.0001
            Score                  108.2737        4         <.0001
            Wald                    79.7057        4         <.0001

NOTE: All effects have been entered into the model.

                         Summary of Forward Selection

               Effect               Number         Score
       Step    Entered      DF          In    Chi-Square    Pr > ChiSq

          1    precalc       1           3       13.0467        0.0003
          2    calc          1           4        1.6051        0.2052

                         Summary of Forward Selection

                       Variable
               Step    Label

                  1    Number precalculus correct
                  2    Number calculus correct

                  Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1    -14.6351      2.2803       41.1914        <.0001
     gpa           1      0.1181      0.0311       14.4227        0.0001
     hscalc        1      0.0592      0.0136       18.9109        <.0001
     precalc       1      0.2633      0.0890        8.7518        0.0031
     calc          1      0.0821      0.0650        1.5969        0.2063

                             Odds Ratio Estimates

                               Point          95% Wald
                  Effect    Estimate      Confidence Limits

                  gpa          1.125       1.059       1.196
                  hscalc       1.061       1.033       1.090
                  precalc      1.301       1.093       1.549
                  calc         1.086       0.956       1.233

         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     83.4    Somers' D    0.670
               Percent Discordant     16.4    Gamma        0.671
               Percent Tied            0.1    Tau-a        0.315
               Pairs                 32994    c            0.835
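Several numbers in this listing can be reproduced by hand from the Estimate and Standard Error columns. The following is a rough Python sketch for checking them, not a replacement for the SAS run. The names inv_logit, estimates, student and odds_ratios are illustrative, and 1.96 is used as an approximation to the normal critical value, so the confidence limits should only be expected to agree with the listing to about three decimal places.

```python
import math

def inv_logit(eta):
    """Convert a log odds value to a probability."""
    return math.exp(eta) / (1 + math.exp(eta))

# (estimate, standard error) pairs copied from the
# "Analysis of Maximum Likelihood Estimates" table
estimates = {
    "Intercept": (-14.6351, 2.2803),
    "gpa": (0.1181, 0.0311),
    "hscalc": (0.0592, 0.0136),
    "precalc": (0.2633, 0.0890),
    "calc": (0.0821, 0.0650),
}

# Example student from the proc iml step: HS GPA 80, HS calc 75,
# 7 of 10 on precalc, 6 of 10 on calc (the 1 multiplies the intercept)
student = {"Intercept": 1, "gpa": 80, "hscalc": 75, "precalc": 7, "calc": 6}
log_odds = sum(estimates[k][0] * student[k] for k in estimates)
p_pass = inv_logit(log_odds)

# Student at the mean of every predictor: the log odds is the intercept
# of the standardized model (0.8036, taken from the program)
p_average = inv_logit(0.8036)

# Odds ratio and approximate 95% Wald limits for each slope:
# exp(b) and exp(b -/+ 1.96 * se)
odds_ratios = {}
for name, (b, se) in estimates.items():
    if name == "Intercept":
        continue
    odds_ratios[name] = (
        math.exp(b),
        math.exp(b - 1.96 * se),
        math.exp(b + 1.96 * se),
    )

print(f"P(pass), example student: {p_pass:.6f}")
print(f"P(pass), average student: {p_average:.6f}")
for name, (or_hat, lower, upper) in odds_ratios.items():
    print(f"{name:8s} {or_hat:.3f} ({lower:.3f}, {upper:.3f})")
```

The two probabilities should agree with PASS and AVEPASS on page 3 of the listing, and the last four lines with the Odds Ratio Estimates table. Note that the interval for calc includes 1, which goes with its non-significant Wald test (p = 0.2063).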