STA429/1007 F 2004 Handout 9
Logistic regression (math and Berkeley data)
/********************** mathlog1.sas **********************/
title2 'Logistic regression on math data';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math 'mathlib';    /* Location of permanent SAS datasets */
libname library 'mathlib'; /* SAS will search for permanently stored
                              formats ONLY in a place called "library." */

/* mathread.sas creates the variable passed like this
if (50<=mark<=100) then passed=1; else passed=0;
label passed = 'Passed the course';
format passed ynfmt.;
*/

data quant; /* Includes only cases that are used for full model. It's a
               shame that this is necessary. */
     set math.explore;
     goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */
     if goodcase = . then delete ;

proc logistic descending; /* Always use descending */
     title3 'Fit full model and do Wald tests';
     model passed = gpa hscalc precalc calc;
     hschool: test gpa=hscalc=0;
     dtest:   test precalc=calc=0;

proc logistic descending;
     title3 'Fit reduced model for testing hschool (gpa & hscalc)';
     model passed = precalc calc;

proc iml;
     title3 'Calculate Likelihood Ratio Test for hschool';
     G = 434.817 - 366.007; /* Got these numbers from the printout */
     pval = 1-probchi(G,2);
     print "G = " G ", df = 2, p = " pval;
     print "Compare Wald chisquare = 51.2448";

proc logistic descending ;
     title3 'Reduced model for testing diagnostic test (precalc & calc)';
     model passed = gpa hscalc;

proc iml;
     title3 'Calculate Likelihood Ratio Test for dtest';
     G = 380.910 - 366.007; /* Got these numbers from the printout */
     pval = 1-probchi(G,2);
     print "G = " G ", df = 2, p = " pval;
     print "Compare Wald chisquare = 13.8587";
Here is mathlog1.lst
_______________________________________________________________________________

                                 The SAS System                               1
                        Logistic regression on math data
                        Fit full model and do Wald tests
                                               21:38 Saturday, October 30, 2004

                             The LOGISTIC Procedure

                               Model Information

        Data Set                      WORK.QUANT
        Response Variable             passed          Passed the course
        Number of Response Levels     2
        Number of Observations        375
        Model                         binary logit
        Optimization Technique        Fisher's scoring

                                Response Profile

                       Ordered                      Total
                         Value     passed       Frequency

                             1     Yes                234
                             2     No                 141

                      Probability modeled is passed='Yes'.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        376.007
                     SC              502.481        395.642
                     -2 Log L        496.554        366.007

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       130.5468        4         <.0001
            Score                  108.2737        4         <.0001
            Wald                    79.7057        4         <.0001

                   Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1    -14.6351      2.2803       41.1914        <.0001
     gpa           1      0.1181      0.0311       14.4227        0.0001
     hscalc        1      0.0592      0.0136       18.9109        <.0001
     precalc       1      0.2633      0.0890        8.7518        0.0031
     calc          1      0.0821      0.0650        1.5969        0.2063

                              Odds Ratio Estimates

                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   gpa          1.125       1.059       1.196
                   hscalc       1.061       1.033       1.090
                   precalc      1.301       1.093       1.549
                   calc         1.086       0.956       1.233

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      83.4    Somers' D    0.670
              Percent Discordant      16.4    Gamma        0.671
              Percent Tied             0.1    Tau-a        0.315
              Pairs                  32994    c            0.835

                       Linear Hypotheses Testing Results

                                    Wald
                   Label      Chi-Square      DF    Pr > ChiSq

                   hschool       51.2448       2        <.0001
                   dtest         13.8587       2        0.0010

_______________________________________________________________________________

                                 The SAS System                               2
                        Logistic regression on math data
              Fit reduced model for testing hschool (gpa & hscalc)
                                               21:38 Saturday, October 30, 2004

                             The LOGISTIC Procedure

                               Model Information

        Data Set                      WORK.QUANT
        Response Variable             passed          Passed the course
        Number of Response Levels     2
        Number of Observations        375
        Model                         binary logit
        Optimization Technique        Fisher's scoring

                                Response Profile

                       Ordered                      Total
                         Value     passed       Frequency

                             1     Yes                234
                             2     No                 141

                      Probability modeled is passed='Yes'.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        440.817
                     SC              502.481        452.598
                     -2 Log L        496.554        434.817

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio        61.7369        2         <.0001
            Score                   55.0700        2         <.0001
            Wald                    47.8371        2         <.0001

                   Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1     -1.8113      0.3604       25.2549        <.0001
     precalc       1      0.3696      0.0821       20.2696        <.0001
     calc          1      0.2066      0.0567       13.2947        0.0003

                              Odds Ratio Estimates

                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   precalc      1.447       1.232       1.700
                   calc         1.230       1.100       1.374

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      72.4    Somers' D    0.467
              Percent Discordant      25.7    Gamma        0.475
              Percent Tied             1.9    Tau-a        0.220
              Pairs                  32994    c            0.733

_______________________________________________________________________________

                                 The SAS System                               3
                        Logistic regression on math data
                  Calculate Likelihood Ratio Test for hschool
                                               21:38 Saturday, October 30, 2004

                            G                       PVAL

               G =      68.81 , df = 2, p =     1.11E-15

                        Compare Wald chisquare = 51.2448

_______________________________________________________________________________

                                 The SAS System                               4
                        Logistic regression on math data
           Reduced model for testing diagnostic test (precalc & calc)
                                               21:38 Saturday, October 30, 2004

                             The LOGISTIC Procedure

                               Model Information

        Data Set                      WORK.QUANT
        Response Variable             passed          Passed the course
        Number of Response Levels     2
        Number of Observations        375
        Model                         binary logit
        Optimization Technique        Fisher's scoring

                                Response Profile

                       Ordered                      Total
                         Value     passed       Frequency

                             1     Yes                234
                             2     No                 141

                      Probability modeled is passed='Yes'.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC             498.554        386.910
                     SC              502.481        398.691
                     -2 Log L        496.554        380.910

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       115.6442        2         <.0001
            Score                   99.0512        2         <.0001
            Wald                    73.7669        2         <.0001

                   Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1    -14.7000      2.2138       44.0913        <.0001
     gpa           1      0.1250      0.0303       17.0524        <.0001
     hscalc        1      0.0719      0.0130       30.5780        <.0001

                              Odds Ratio Estimates

                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   gpa          1.133       1.068       1.202
                   hscalc       1.075       1.048       1.102

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      81.9    Somers' D    0.640
              Percent Discordant      18.0    Gamma        0.641
              Percent Tied             0.1    Tau-a        0.301
              Pairs                  32994    c            0.820

_______________________________________________________________________________

                                 The SAS System                               5
                        Logistic regression on math data
                   Calculate Likelihood Ratio Test for dtest
                                               21:38 Saturday, October 30, 2004

                            G                       PVAL

               G =     14.903 , df = 2, p =    0.0005806

                        Compare Wald chisquare = 13.8587
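
Both likelihood ratio tests above use the same arithmetic: G is the reduced model's -2 Log L minus the full model's -2 Log L, and the degrees of freedom equal the number of coefficients set to zero under the null hypothesis. As a minimal sketch, the DATA step below repeats the two calculations without proc iml. The variable names are made up for illustration, the -2 Log L values are still copied by hand from the printout, and the sdf function is assumed to be available (SAS 9 or later). It returns the upper tail probability, which is what 1-probchi(G,df) computes in mathlog1.sas.

```sas
/* Sketch only: LR tests for the math data in a DATA step.             */
/* Variable names are illustrative; numbers come from the printout.    */
data _null_;
     neg2ll_full = 366.007;            /* -2 Log L, full model          */
     df = 2;                           /* Two coefficients tested each  */

     /* hschool: reduced model drops gpa and hscalc */
     G = 434.817 - neg2ll_full;
     pval = sdf('CHISQUARE', G, df);   /* Upper tail probability        */
     put 'hschool: ' G= df= pval=;

     /* dtest: reduced model drops precalc and calc */
     G = 380.910 - neg2ll_full;
     pval = sdf('CHISQUARE', G, df);
     put 'dtest:   ' G= df= pval=;
run;
```

The results go to the log rather than the list file. For a very small p-value, like the one for hschool, the later digits may differ from the proc iml result, because 1-probchi subtracts from a number very close to 1.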
For reference, here is berkdata.sas. Use it with %include.
/* berkdata.sas: Define Berkeley Grad admissions data
   Always need weight count */

options linesize=79 pagesize=35 noovp formdlim='_';
title 'Berkeley Graduate Admissions Data: ';

proc format;
     value sexfmt 1 = 'Female' 0 = 'Male';
     value ynfmt  1 = 'Yes'    0 = 'No';

data berkley;
     input line sex dept $ admit count;
     /* Dummy vars for department: F is ref category */
     if dept='A' then a = 1 ; else a = 0;
     if dept='B' then b = 1 ; else b = 0;
     if dept='C' then c = 1 ; else c = 0;
     if dept='D' then d = 1 ; else d = 0;
     if dept='E' then e = 1 ; else e = 0;
     format sex sexfmt.;
     format admit ynfmt.;
     datalines;
 1 0 A 1 512
 2 0 B 1 353
 3 0 C 1 120
 4 0 D 1 138
 5 0 E 1  53
 6 0 F 1  22
 7 1 A 1  89
 8 1 B 1  17
 9 1 C 1 202
10 1 D 1 131
11 1 E 1  94
12 1 F 1  24
13 0 A 0 313
14 0 B 0 207
15 0 C 0 205
16 0 D 0 279
17 0 E 0 138
18 0 F 0 351
19 1 A 0  19
20 1 B 0   8
21 1 C 0 391
22 1 D 0 244
23 1 E 0 299
24 1 F 0 317
;

/* Commented out
proc freq;
     tables dept * (a--e) / norow nocol nopercent;
*/
Now the program file, followed by the list file.
/* logberk.sas */
%include 'berkdata.sas'; /* Always need weight count */
title2 'Logistic dummy var regression on Berkeley data';

proc logistic descending;
     title3 'Admit by sex: proc freq gives LR chisq = 93.4494';
     model admit = sex;
     weight count;

proc logistic descending;
     title3 'Admit by Dept: proc freq gives LR chisq = 855.3209';
     title4 '(Also the reduced model for testing sex controlling for dept.)';
     model admit = a -- e;
     weight count;

proc logistic descending;
     title3 'Full model with Sex and Dept and Interaction';
     class sex dept;
     model admit = sex dept sex*dept;
     weight count;

/* The subdivision approach to controlling for dept specifies a reduced
   model in which there is no relationship between sex and admission for
   ANY department. If we were testing means in an ordinary ANOVA, this
   would look like two profiles (one for males and one for females) that
   were exactly on top of each other. That is, there would be no main
   effect for sex AND no dept by sex interaction. That's what we test in
   the present case too. We have already fit the reduced model, which has
   just dept, but no sex and no interaction. */

proc iml;
     title3 'Calculate LR test of Sex controlling for Dept';
     print " proc freq gives LR chisq =";
     print "19.0540+0.2586+0.75100+0.2979+0.9904+0.3836 = 21.7355";
     G = 5189.020 - 5167.284;
     pval = 1-probchi(G,5);
     print "G = " G ", df = 5, p = " pval;

/* Finally, the significant Wald test for interaction is intriguing. If it
   holds up, it would represent the substantial association in Dept A,
   contrasted with the weak or absent association in other departments.
   Fit reduced model with sex and dept but no interaction. */

proc logistic descending;
     title3 'Reduced model with Sex and Dept';
     class sex dept;
     model admit = sex dept;
     weight count;

proc iml;
     title3 'Calculate LR test of Sex by Dept';
     print "Wald chisquare = 17.9011, df=5, p = 0.0031";
     G = 5187.488 - 5167.284;
     pval = 1-probchi(G,5);
     print "G = " G ", df = 5, p = " pval;
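
The titles and the first proc iml step quote likelihood ratio chi-squares "from proc freq", but that run is not shown in this handout. A minimal sketch of how such numbers could be obtained is below. It assumes berkdata.sas is in the current directory, and the particular tables requests are an assumption, not the original program. With a three-way request, proc freq produces one sex by admit table for each department, which is the subdivision approach described in the comment above.

```sas
/* Sketch only: not part of logberk.sas. The tables requests are assumed. */
%include 'berkdata.sas';

proc freq data=berkley;
     title3 'Crosstabs with likelihood ratio chi-squares';
     tables sex*admit      / chisq nocol nopercent;  /* Admit by sex         */
     tables dept*admit     / chisq nocol nopercent;  /* Admit by dept        */
     tables dept*sex*admit / chisq nocol nopercent;  /* Sex by admit, one
                                                        table per department */
     weight count;   /* Always need weight count */
run;
```

The chisq option prints a line labelled Likelihood Ratio Chi-Square for each table. Adding up the six per-department values gives the subdivision test statistic that the proc iml step compares with G.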
Here is logberk.lst.
_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      1
                 Logistic dummy var regression on Berkeley data
                Admit by sex: proc freq gives LR chisq = 93.4494
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                               Model Information

               Data Set                      WORK.BERKLEY
               Response Variable             admit
               Number of Response Levels     2
               Number of Observations        24
               Weight Variable               count
               Sum of Weights                4526
               Model                         binary logit
               Optimization Technique        Fisher's scoring

                                Response Profile

                Ordered                     Total            Total
                  Value     admit       Frequency           Weight

                      1     Yes                12        1755.0000
                      2     No                 12        2771.0000

                      Probability modeled is admit='Yes'.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      2
                 Logistic dummy var regression on Berkeley data
                Admit by sex: proc freq gives LR chisq = 93.4494
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC            6046.341       5954.891
                     SC             6047.519       5957.247
                     -2 Log L       6044.341       5950.891

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio        93.4494        1         <.0001
            Score                   92.2053        1         <.0001
            Wald                    91.2356        1         <.0001

                   Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1     -0.2201      0.0388       32.2086        <.0001
     sex           1     -0.6103      0.0639       91.2356        <.0001

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      3
                 Logistic dummy var regression on Berkeley data
                Admit by sex: proc freq gives LR chisq = 93.4494
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                              Odds Ratio Estimates

                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   sex          0.543       0.479       0.616

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      25.0    Somers' D    0.000
              Percent Discordant      25.0    Gamma        0.000
              Percent Tied            50.0    Tau-a        0.000
              Pairs                    144    c            0.500
_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      4
                 Logistic dummy var regression on Berkeley data
               Admit by Dept: proc freq gives LR chisq = 855.3209
         (Also the reduced model for testing sex controlling for dept.)
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                               Model Information

               Data Set                      WORK.BERKLEY
               Response Variable             admit
               Number of Response Levels     2
               Number of Observations        24
               Weight Variable               count
               Sum of Weights                4526
               Model                         binary logit
               Optimization Technique        Fisher's scoring

                                Response Profile

                Ordered                     Total            Total
                  Value     admit       Frequency           Weight

                      1     Yes                12        1755.0000
                      2     No                 12        2771.0000

                      Probability modeled is admit='Yes'.

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      5
                 Logistic dummy var regression on Berkeley data
               Admit by Dept: proc freq gives LR chisq = 855.3209
         (Also the reduced model for testing sex controlling for dept.)
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC            6046.341       5201.020
                     SC             6047.519       5208.088
                     -2 Log L       6044.341       5189.020

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       855.3209        5         <.0001
            Score                  778.9065        5         <.0001
            Wald                   623.0288        5         <.0001

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      6
                 Logistic dummy var regression on Berkeley data
               Admit by Dept: proc freq gives LR chisq = 855.3209
         (Also the reduced model for testing sex controlling for dept.)
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
     Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept     1     -2.6755      0.1524      308.1089        <.0001
     a             1      3.2689      0.1671      382.8915        <.0001
     b             1      3.2183      0.1749      338.6368        <.0001
     c             1      2.0598      0.1674      151.4376        <.0001
     d             1      2.0106      0.1699      140.0623        <.0001
     e             1      1.5860      0.1798       77.8152        <.0001

                              Odds Ratio Estimates

                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   a           26.283      18.944      36.464
                   b           24.986      17.735      35.202
                   c            7.844       5.650      10.890
                   d            7.468       5.353      10.418
                   e            4.884       3.433       6.947

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      7
                 Logistic dummy var regression on Berkeley data
               Admit by Dept: proc freq gives LR chisq = 855.3209
         (Also the reduced model for testing sex controlling for dept.)
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      41.7    Somers' D    0.000
              Percent Discordant      41.7    Gamma        0.000
              Percent Tied            16.7    Tau-a        0.000
              Pairs                    144    c            0.500

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      8
                 Logistic dummy var regression on Berkeley data
                  Full model with Sex and Dept and Interaction
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                               Model Information

               Data Set                      WORK.BERKLEY
               Response Variable             admit
               Number of Response Levels     2
               Number of Observations        24
               Weight Variable               count
               Sum of Weights                4526
               Model                         binary logit
               Optimization Technique        Fisher's scoring

                                Response Profile

                Ordered                     Total            Total
                  Value     admit       Frequency           Weight

                      1     Yes                12        1755.0000
                      2     No                 12        2771.0000

                      Probability modeled is admit='Yes'.
_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                      9
                 Logistic dummy var regression on Berkeley data
                  Full model with Sex and Dept and Interaction
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                            Class Level Information

                                          Design Variables
              Class     Value        1      2      3      4      5

              sex       Female       1
                        Male        -1

              dept      A            1      0      0      0      0
                        B            0      1      0      0      0
                        C            0      0      1      0      0
                        D            0      0      0      1      0
                        E            0      0      0      0      1
                        F           -1     -1     -1     -1     -1

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     10
                 Logistic dummy var regression on Berkeley data
                  Full model with Sex and Dept and Interaction
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC            6046.341       5191.284
                     SC             6047.519       5205.421
                     -2 Log L       6044.341       5167.284

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       877.0564       11         <.0001
            Score                  797.7045       11         <.0001
            Wald                   628.1667       11         <.0001

                          Type III Analysis of Effects

                                           Wald
                   Effect       DF    Chi-Square    Pr > ChiSq

                   sex           1        3.3927        0.0655
                   dept          5      389.4863        <.0001
                   sex*dept      5       17.9011        0.0031

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     11
                 Logistic dummy var regression on Berkeley data
                  Full model with Sex and Dept and Interaction
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates

                                          Standard         Wald
  Parameter             DF    Estimate       Error   Chi-Square   Pr > ChiSq

  Intercept              1     -0.5552      0.0551     101.5729       <.0001
  sex       Female       1      0.1015      0.0551       3.3927       0.0655
  dept      A            1      1.5733      0.1206     170.2821       <.0001
  dept      B            1      1.1990      0.1869      41.1307       <.0001
  dept      C            1     -0.0428      0.0805       0.2822       0.5953
  dept      D            1     -0.1078      0.0824       1.7094       0.1911
  dept      E            1     -0.5019      0.0986      25.9191       <.0001
  sex*dept  Female A     1      0.4246      0.1206      12.3999       0.0004
  sex*dept  Female B     1     0.00854      0.1869       0.0021       0.9635
  sex*dept  Female C     1     -0.1639      0.0805       4.1420       0.0418
  sex*dept  Female D     1     -0.0605      0.0824       0.5382       0.4632
  sex*dept  Female E     1     -0.2016      0.0986       4.1808       0.0409

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      45.8    Somers' D    0.000
              Percent Discordant      45.8    Gamma        0.000
              Percent Tied             8.3    Tau-a        0.000
              Pairs                    144    c            0.500

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     12
                 Logistic dummy var regression on Berkeley data
                 Calculate LR test of Sex controlling for Dept
                                                06:40 Sunday, October 31, 2004

                           proc freq gives LR chisq =

             19.0540+0.2586+0.75100+0.2979+0.9904+0.3836 = 21.7355

                            G                       PVAL

               G =     21.736 , df = 5, p =    0.0005877

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     13
                 Logistic dummy var regression on Berkeley data
                        Reduced model with Sex and Dept
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                               Model Information

               Data Set                      WORK.BERKLEY
               Response Variable             admit
               Number of Response Levels     2
               Number of Observations        24
               Weight Variable               count
               Sum of Weights                4526
               Model                         binary logit
               Optimization Technique        Fisher's scoring

                                Response Profile

                Ordered                     Total            Total
                  Value     admit       Frequency           Weight

                      1     Yes                12        1755.0000
                      2     No                 12        2771.0000

                      Probability modeled is admit='Yes'.

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     14
                 Logistic dummy var regression on Berkeley data
                        Reduced model with Sex and Dept
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                            Class Level Information

                                          Design Variables
              Class     Value        1      2      3      4      5

              sex       Female       1
                        Male        -1

              dept      A            1      0      0      0      0
                        B            0      1      0      0      0
                        C            0      0      1      0      0
                        D            0      0      0      1      0
                        E            0      0      0      0      1
                        F           -1     -1     -1     -1     -1

                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.
_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     15
                 Logistic dummy var regression on Berkeley data
                        Reduced model with Sex and Dept
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                              Model Fit Statistics

                                                  Intercept
                                   Intercept            and
                     Criterion          Only     Covariates

                     AIC            6046.341       5201.488
                     SC             6047.519       5209.735
                     -2 Log L       6044.341       5187.488

                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       856.8521        6         <.0001
            Score                  780.0984        6         <.0001
            Wald                   623.9394        6         <.0001

                          Type III Analysis of Effects

                                           Wald
                   Effect       DF    Chi-Square    Pr > ChiSq

                   sex           1        1.5260        0.2167
                   dept          5      534.7084        <.0001

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     16
                 Logistic dummy var regression on Berkeley data
                        Reduced model with Sex and Dept
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates

                                          Standard         Wald
  Parameter             DF    Estimate       Error   Chi-Square   Pr > ChiSq

  Intercept              1     -0.6424      0.0397     262.4455       <.0001
  sex       Female       1      0.0499      0.0404       1.5260       0.2167
  dept      A            1      1.2744      0.0723     310.8017       <.0001
  dept      B            1      1.2310      0.0856     206.9662       <.0001
  dept      C            1      0.0118      0.0714       0.0272       0.8690
  dept      D            1     -0.0202      0.0729       0.0772       0.7812
  dept      E            1     -0.4649      0.0898      26.7929       <.0001

                              Odds Ratio Estimates

                                         Point          95% Wald
          Effect                      Estimate      Confidence Limits

          sex  Female vs Male            1.105       0.943       1.295
          dept A vs F                   27.284      19.553      38.070
          dept B vs F                   26.125      18.403      37.087
          dept C vs F                    7.719       5.555      10.726
          dept D vs F                    7.476       5.358      10.430
          dept E vs F                    4.792       3.365       6.825

_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     17
                 Logistic dummy var regression on Berkeley data
                        Reduced model with Sex and Dept
                                                06:40 Sunday, October 31, 2004

                             The LOGISTIC Procedure

         Association of Predicted Probabilities and Observed Responses

              Percent Concordant      45.8    Somers' D    0.000
              Percent Discordant      45.8    Gamma        0.000
              Percent Tied             8.3    Tau-a        0.000
              Pairs                    144    c            0.500
_______________________________________________________________________________

                      Berkeley Graduate Admissions Data:                     18
                 Logistic dummy var regression on Berkeley data
                        Calculate LR test of Sex by Dept
                                                06:40 Sunday, October 31, 2004

                   Wald chisquare = 17.9011, df=5, p = 0.0031

                            G                       PVAL

               G =     20.204 , df = 5, p =    0.0011442
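
One detail in logberk.lst is easy to miss when reading the odds ratios. In the first model, sex is an ordinary 0/1 dummy variable, so the odds ratio is exp(estimate). In the models with a class statement, the Class Level Information table shows sex coded 1 for Female and -1 for Male. Female and Male then differ by two units of the design variable, so the Female vs Male odds ratio is exp(2 x estimate). The DATA step below is a small check of this, using estimates copied from the printout. The variable names are made up for illustration.

```sas
/* Sketch only: odds ratios recomputed from printed estimates.        */
data _null_;
     or_dummy  = exp(-0.6103);    /* sex coded 0/1 (admit by sex)     */
     or_effect = exp(2*0.0499);   /* sex coded 1/-1 (sex and dept)    */
     put or_dummy= 6.3 or_effect= 6.3;
run;
```

The two values written to the log should agree with the printed odds ratios of 0.543 and 1.105, up to rounding of the estimates.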