STA429/1007 F 2004 Handout 11
Logistic regression with more than 2 response categories (Math data)
/********************** mathlog3.sas **********************/ title2 'Logistic regression on math data: Part III Using proc catmod'; options linesize=79 pagesize=2000 noovp formdlim='_'; libname math '/homes/students/u0/stats/brunner/mathlib'; /* Full path to permanent SAS datasets */ libname library '/homes/students/u0/stats/brunner/mathlib'; /* SAS will seach for permanently stored formats ONLY in a place called "library." */ /* Make proc catmod do the logistic regression LR test for dtest, G = 14.903 */ data quant; /* Includes only cases that are used for full model. */ set math.explore; goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */ if goodcase =. then delete; proc catmod; /* Regular 2-category logistic regression as a test */ title3 'Test precalc & calc controlling for gpa & hscalc: Full model'; direct gpa hscalc precalc calc; /* Direct means no dummy vars please */ model passed = gpa hscalc precalc calc / noprofile; contrast 'dtest' all_parms 0 0 0 1 0, all_parms 0 0 0 0 1; /* Just a Wald test (sigh) */ proc catmod; title3 'Test precalc & calc controlling for gpa & hscalc: Reducd model'; direct gpa hscalc; model passed = gpa hscalc / noprofile; /* Okay, got 380.90973 - 366.00715 = 14.90258. The parameter estimates have reversed signs, but we can live with it. */ /* Now try a 4-category response */ data quant2; set math.explore; goodcase = sex+course; if goodcase=. then delete; proc freq; tables sex*course / nocol nopercent chisq; proc catmod; title3 'Course is a 4-category DV: Full'; direct sex; /* already 0=M, 1=F */ model course = sex; proc catmod; title3 'Course is a 4-category DV: Reduced'; model course = ; proc iml; title3 'Calculate G: Should be Likelihood Ratio Chi-Square = 15.4099'; title4 '(from proc freq)'; G = 895.64588 - 880.23601; pval = 1-probchi(G,3); print "G = " G ", df = 3, p = " pval; proc iml; title3 'Predicted probability of female taking course 2 = 0.8158 ?'; /* Need to solve 3 linear equations in 3 unknowns */ xm = {1,0}; xf = {1,1}; beta1 = {0.2007,0.6817}; /* Values from printout */ beta2 = {2.0025,0.8925}; beta3 = {-0.0770,-0.3285}; A = J(3,3); /* 3x3 matrix of ones */ A(|1,1|) = 1+exp(-xf`*beta1); A(|2,2|) = 1+exp(-xf`*beta2); A(|3,3|) = 1+exp(-xf`*beta3); one = {1,1,1}; print A one; print "Need to solve A pi = one for pi"; pi = solve(A,one); /* Tack on the last (redundant) element of pi */ pi4 = 1-sum(pi); pi = pi // pi4; /* Vertical concatination */ print "Predicted probabilities for females:"; print pi; /* Could do the same thing for males, of course */ /* Predict choice of class from HS & diagnos. To make things simpler, make course 2 (MAT132) the reference (last) cagtegory. But proc catmod does not use alphabetical order. */ proc format; value cfmt 1 = 'Mat122Y?' 2 = 'Mat138Y' 3 = 'Other' 4 = 'Mat132Y'; data quant3; /* Only cases used in full model */ set math.explore; /* Make course2 with Mat132 last */ course2=course; if course=1 then course2=1; else if course=2 then course2=4; else if course=3 then course2=2; else if course=4 then course2=3; else course=.; good = precalc+calc+gpa+hscalc+english+sex; if good=. then delete; format course2 cfmt.; proc freq; title3 ''; tables course2*course / norow nocol nopercent; proc catmod; direct precalc calc gpa hscalc english sex; model course = precalc calc gpa hscalc english sex / noprofile; contrast 'dtest' all_parms 0 1 0 0 0 0 0, all_parms 0 0 1 0 0 0 0;
Here is mathlog3.lst
_______________________________________________________________________________ The SAS System 1 Logistic regression on math data: Part III Using proc catmod Test precalc & calc controlling for gpa & hscalc: Full model 21:27 Wednesday, November 10, 2004 The CATMOD Procedure Data Summary Response passed Response Levels 2 Weight Variable None Populations 374 Data Set QUANT Total Frequency 375 Frequency Missing 0 Observations 375 Maximum Likelihood Analysis Sub -2 Log Convergence Iteration Iteration Likelihood Criterion --------------------------------------------------- 0 0 519.86039 1.0000 1 0 380.37464 0.2683 2 0 366.86764 0.0355 3 0 366.01214 0.002332 4 0 366.00715 0.0000136 5 0 366.00715 5.337E-10 Maximum Likelihood Analysis Parameter Estimates Iteration 1 2 3 4 5 -------------------------------------------------------------------------- 0 0 0 0 0 0 1 8.4509 -0.0548 -0.0481 -0.1728 -0.0336 2 12.9510 -0.1001 -0.0567 -0.2429 -0.0687 3 14.5028 -0.1167 -0.0590 -0.2619 -0.0811 4 14.6347 -0.1181 -0.0592 -0.2633 -0.0821 5 14.6355 -0.1181 -0.0592 -0.2633 -0.0821 Maximum likelihood computations converged. Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq -------------------------------------------------- Intercept 1 41.20 <.0001 gpa 1 14.42 0.0001 hscalc 1 18.91 <.0001 precalc 1 8.75 0.0031 calc 1 1.60 0.2063 Likelihood Ratio 369 366.01 0.5342 Analysis of Maximum Likelihood Estimates Standard Chi- Parameter Estimate Error Square Pr > ChiSq ---------------------------------------------------------- Intercept 14.6355 2.2803 41.20 <.0001 gpa -0.1181 0.0311 14.42 0.0001 hscalc -0.0592 0.0136 18.91 <.0001 precalc -0.2633 0.0890 8.75 0.0031 calc -0.0821 0.0650 1.60 0.2063 Contrasts of Maximum Likelihood Estimates Contrast DF Chi-Square Pr > ChiSq ----------------------------------------- dtest 2 13.86 0.0010 _______________________________________________________________________________ The SAS System 2 Logistic regression on math data: Part III Using proc catmod Test precalc & calc controlling for gpa & hscalc: Reducd model 21:27 Wednesday, November 10, 2004 The CATMOD Procedure Data Summary Response passed Response Levels 2 Weight Variable None Populations 354 Data Set QUANT Total Frequency 375 Frequency Missing 0 Observations 375 Maximum Likelihood Analysis Sub -2 Log Convergence Parameter Estimates Iteration Iteration Likelihood Criterion 1 2 3 ------------------------------------------------------------------------------ 0 0 519.86039 1.0000 0 0 0 1 0 391.3283 0.2472 9.0738 -0.0663 -0.0563 2 0 381.37804 0.0254 13.3942 -0.1109 -0.0688 3 0 380.91122 0.001224 14.6249 -0.1242 -0.0717 4 0 380.90973 3.9307E-6 14.6999 -0.1250 -0.0719 5 0 380.90973 4.506E-11 14.7001 -0.1250 -0.0719 Maximum likelihood computations converged. Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq -------------------------------------------------- Intercept 1 44.09 <.0001 gpa 1 17.05 <.0001 hscalc 1 30.58 <.0001 Likelihood Ratio 351 360.46 0.3524 Analysis of Maximum Likelihood Estimates Standard Chi- Parameter Estimate Error Square Pr > ChiSq ---------------------------------------------------------- Intercept 14.7001 2.2138 44.09 <.0001 gpa -0.1250 0.0303 17.05 <.0001 hscalc -0.0719 0.0130 30.58 <.0001 _______________________________________________________________________________ The SAS System 3 Logistic regression on math data: Part III Using proc catmod Test precalc & calc controlling for gpa & hscalc: Reducd model 21:27 Wednesday, November 10, 2004 The FREQ Procedure Table of sex by course sex course Frequency| Row Pct | 1 | 2 | 3 |No Resp | Total ---------+--------+--------+--------+--------+ Male | 33 | 200 | 25 | 27 | 285 | 11.58 | 70.18 | 8.77 | 9.47 | ---------+--------+--------+--------+--------+ Female | 29 | 217 | 8 | 12 | 266 | 10.90 | 81.58 | 3.01 | 4.51 | ---------+--------+--------+--------+--------+ Total 62 417 33 39 551 Statistics for Table of sex by course Statistic DF Value Prob ------------------------------------------------------ Chi-Square 3 14.8404 0.0020 Likelihood Ratio Chi-Square 3 15.4099 0.0015 Mantel-Haenszel Chi-Square 1 6.9148 0.0085 Phi Coefficient 0.1641 Contingency Coefficient 0.1619 Cramer's V 0.1641 Sample Size = 551 _______________________________________________________________________________ The SAS System 4 Logistic regression on math data: Part III Using proc catmod Course is a 4-category DV: Full 21:27 Wednesday, November 10, 2004 The CATMOD Procedure Data Summary Response course Response Levels 4 Weight Variable None Populations 2 Data Set QUANT2 Total Frequency 551 Frequency Missing 0 Observations 551 Population Profiles Sample sex Sample Size ------------------------------- 1 Male 285 2 Female 266 Response Profiles Response course ------------------- 1 1 2 2 3 3 4 No Resp Maximum Likelihood Analysis Sub -2 Log Convergence Iteration Iteration Likelihood Criterion ------------------------------------------------- 0 0 1527.6964 1.0000 1 0 906.57678 0.4066 2 0 884.29299 0.0246 3 0 880.2869 0.004530 4 0 880.23602 0.0000578 5 0 880.23601 1.6226E-8 6 0 880.23601 1.292E-15 Maximum Likelihood Analysis Parameter Estimates Iteration 1 2 3 4 5 6 --------------------------------------------------------------------------- 0 0 0 0 0 0 0 1 0.0842 2.4281 -0.0281 0.1714 0.6546 -0.0321 2 0.2527 1.9553 -0.0932 0.9879 0.9340 -0.2970 3 0.2034 2.0023 -0.0773 0.7232 0.8930 -0.3287 4 0.2007 2.0025 -0.0770 0.6825 0.8925 -0.3285 5 0.2007 2.0025 -0.0770 0.6817 0.8925 -0.3285 6 0.2007 2.0025 -0.0770 0.6817 0.8925 -0.3285 Maximum likelihood computations converged. Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq -------------------------------------------------- Intercept 3 227.38 <.0001 sex 3 13.90 0.0030 Likelihood Ratio 0 . . Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq ------------------------------------------------------------------- Intercept 1 0.2007 0.2595 0.60 0.4393 2 2.0025 0.2050 95.39 <.0001 3 -0.0770 0.2776 0.08 0.7816 sex 1 0.6817 0.4303 2.51 0.1131 2 0.8925 0.3605 6.13 0.0133 3 -0.3285 0.5342 0.38 0.5386 _______________________________________________________________________________ The SAS System 5 Logistic regression on math data: Part III Using proc catmod Course is a 4-category DV: Reduced The CATMOD Procedure Data Summary Response course Response Levels 4 Weight Variable None Populations 1 Data Set QUANT2 Total Frequency 551 Frequency Missing 0 Observations 551 Population Profiles Sample Sample Size --------------------- 1 551 Response Profiles Response course ------------------- 1 1 2 2 3 3 4 No Resp Maximum Likelihood Analysis Sub -2 Log Convergence Parameter Estimates Iteration Iteration Likelihood Criterion 1 2 3 ------------------------------------------------------------------------------ 0 0 1527.6964 1.0000 0 0 0 1 0 920.21756 0.3976 0.1670 2.7441 -0.0436 2 0 897.90453 0.0242 0.6237 2.3308 -0.1972 3 0 895.65595 0.002504 0.4774 2.3699 -0.1669 4 0 895.64588 0.0000112 0.4636 2.3695 -0.1671 5 0 895.64588 3.033E-10 0.4636 2.3695 -0.1671 Maximum likelihood computations converged. Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq -------------------------------------------------- Intercept 3 499.35 <.0001 Likelihood Ratio 0 . . Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq ------------------------------------------------------------------- Intercept 1 0.4636 0.2044 5.14 0.0233 2 2.3695 0.1674 200.24 <.0001 3 -0.1671 0.2365 0.50 0.4800 _______________________________________________________________________________ Calculate G: Should be Likelihood Ratio Chi-Square = 15.4099 (from proc freq) 21:27 Wednesday, November 10, 2004 G PVAL G = 15.40987 , df = 3, p = 0.0014979 _______________________________________________________________________________ The SAS System 7 Logistic regression on math data: Part III Using proc catmod Predicted probability of female taking course 2 = 0.8158 ? 21:27 Wednesday, November 10, 2004 A ONE 1.4137886 1 1 1 1 1.055299 1 1 1 1 2.5000523 1 Need to solve A pi = one for pi Predicted probabilities for females: PI 0.1090229 0.8157908 0.0300739 0.0451124 _______________________________________________________________________________ The SAS System 8 Logistic regression on math data: Part III Using proc catmod 21:27 Wednesday, November 10, 2004 The FREQ Procedure Table of course2 by course course2 course Frequency| 1 | 2 | 3 | Total ---------+--------+--------+--------+ Mat122Y? | 20 | 0 | 0 | 20 ---------+--------+--------+--------+ Mat138Y | 0 | 0 | 24 | 24 ---------+--------+--------+--------+ Mat132Y | 0 | 324 | 0 | 324 ---------+--------+--------+--------+ Total 20 324 24 368 _______________________________________________________________________________ The SAS System 9 Logistic regression on math data: Part III Using proc catmod Predicted probability of female taking course 2 = 0.8158 ? 21:27 Wednesday, November 10, 2004 The CATMOD Procedure Data Summary Response course Response Levels 3 Weight Variable None Populations 368 Data Set QUANT3 Total Frequency 368 Frequency Missing 0 Observations 368 Maximum Likelihood Analysis Sub -2 Log Convergence Parameter Estimates Iteration Iteration Likelihood Criterion 1 2 3 ------------------------------------------------------------------------------ 0 0 808.57864 1.0000 0 0 0 1 0 310.6374 0.6158 1.1397 1.5742 -0.0490 2 0 281.36458 0.0942 6.4226 2.8677 -0.2137 3 0 274.79215 0.0234 10.0860 3.3236 -0.2707 4 0 273.9678 0.003000 11.7154 3.4453 -0.2896 5 0 273.94419 0.0000862 11.9603 3.4508 -0.2918 6 0 273.94416 1.0121E-7 11.9671 3.4508 -0.2919 7 0 273.94416 1.544E-13 11.9671 3.4508 -0.2919 Maximum Likelihood Analysis Parameter Estimates Iteration 4 5 6 7 8 9 --------------------------------------------------------------------------- 0 0 0 0 0 0 0 1 -0.0568 -0.0321 0.0174 0.004018 -0.0173 -0.0198 2 -0.1669 -0.1672 -0.007892 0.000429 -0.0207 -0.0819 3 -0.1949 -0.2670 -0.002234 -0.0183 -0.005235 -0.1074 4 -0.2009 -0.3299 -0.001903 -0.0296 -0.002096 -0.1172 5 -0.2011 -0.3439 -0.001982 -0.0316 -0.002008 -0.1187 6 -0.2011 -0.3444 -0.001985 -0.0316 -0.002008 -0.1187 7 -0.2011 -0.3444 -0.001985 -0.0316 -0.002008 -0.1187 Maximum Likelihood Analysis Parameter Estimates Iteration 10 11 12 13 14 ------------------------------------------------------------------------ 0 0 0 0 0 0 1 0.004332 0.004316 0.0253 0.0426 0.3729 2 -0.0193 0.0124 0.0440 0.2434 0.7878 3 -0.0351 0.0122 0.0414 0.3387 0.8962 4 -0.0389 0.0127 0.0410 0.3577 0.9171 5 -0.0390 0.0129 0.0409 0.3576 0.9177 6 -0.0390 0.0129 0.0409 0.3576 0.9177 7 -0.0390 0.0129 0.0409 0.3576 0.9177 Maximum likelihood computations converged. Maximum Likelihood Analysis of Variance Source DF Chi-Square Pr > ChiSq -------------------------------------------------- Intercept 2 5.10 0.0782 precalc 2 1.98 0.3724 calc 2 3.65 0.1614 gpa 2 0.18 0.9133 hscalc 2 10.76 0.0046 english 2 2.47 0.2910 sex 2 4.21 0.1218 Likelihood Ratio 722 273.94 1.0000 Analysis of Maximum Likelihood Estimates Function Standard Chi- Parameter Number Estimate Error Square Pr > ChiSq ------------------------------------------------------------------- Intercept 1 11.9671 5.3022 5.09 0.0240 2 3.4508 2.9503 1.37 0.2422 precalc 1 -0.2919 0.2390 1.49 0.2220 2 -0.2011 0.1543 1.70 0.1925 calc 1 -0.3444 0.2034 2.87 0.0904 2 -0.00198 0.1005 0.00 0.9842 gpa 1 -0.0316 0.0906 0.12 0.7268 2 -0.00201 0.0597 0.00 0.9732 hscalc 1 -0.1187 0.0400 8.82 0.0030 2 -0.0390 0.0306 1.62 0.2030 english 1 0.0129 0.0495 0.07 0.7943 2 0.0409 0.0284 2.07 0.1500 sex 1 0.3576 0.7373 0.24 0.6277 2 0.9177 0.5082 3.26 0.0710 Contrasts of Maximum Likelihood Estimates Contrast DF Chi-Square Pr > ChiSq ----------------------------------------- dtest 2 2.98 0.2255