STA429/1007 F 2004 Handout 10

Logistic regression (Math data): Tie up some loose ends


/********************** mathlog2.sas **********************/
title2 'Logistic regression on math data: Part II';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math '/homes/students/u0/stats/brunner/mathlib';
               /* Full path to permanent SAS datasets */
libname library '/homes/students/u0/stats/brunner/mathlib';
               /* SAS will search for permanently stored formats ONLY in a
                  place called "library."  */

data quant; /* Includes only cases that are used for full model. */
     set math.explore;
     goodcase = gpa+hscalc+precalc+calc; /* Will be missing if any missing */
     if goodcase =. then delete; /* Includes only cases used for full model. */
     /* Standardize these vars in proc standard below */
     zgpa=gpa ; zhscalc=hscalc ; zprecalc=precalc ; zcalc=calc ;

proc standard data=quant mean=0  std=1 out=withz;
     var zgpa zhscalc zprecalc zcalc; /* Standardize these vars */

proc logistic descending;
     title3 'Fit full model and do Wald tests with original vars';
     model passed = gpa hscalc precalc calc;
     hschool: test gpa=hscalc=0;
     dtest: test precalc=calc=0;

proc logistic descending;
     title3 'Fit full model and do Wald tests with standardized vars';
     title4 'Compare -2LL=366.007, hschool=51.2448, dtest=13.8587';
     model passed = zgpa zhscalc zprecalc zcalc;
     hschool: test zgpa=zhscalc=0;
     dtest: test zprecalc=zcalc=0;

proc iml;
     title3 'Calculate predicted probability of passing';
     print "For a student at the mean on all Independent Variables,";
     /* Using estimated intercept for standardized model */
     avepass = exp(0.8036) / (1 +  exp(0.8036) );
     print "Estimated probability of passing is " avepass ;
     b = {-14.6351,0.1181,0.0592,0.2633,0.0821}; print b;
     /* From the Estimate col, non-standardized  */
     print "For a student with HS GPA = 80, HS calc = 75,";
     print "7 out of 10 on the precalc and 6 out of 10 on calc,";
     x = {1,80,75,7,6}; /* That 1 corresponds to the intercept */
     lcombo = b` * x; /* Matrix multiplication: Short for
     lcombo =  -14.6351 + 0.1181*80 + 0.0592*75 + 0.2633*7 + 0.0821*6    */
     pass = exp(lcombo) / (1+exp(lcombo));
     print "Estimated probability of passing is " pass ;

/* Try to make proc logistic do the LR test for dtest, G = 14.903 */

proc logistic data=math.explore descending;
     title3 'Try to get LR test for calc & precal: G = 14.903 ';
     model passed = gpa hscalc precalc calc
                   / include=2 selection=forward sequential slentry=1;

/*************  Here is what the options are doing:  *************
include=2            Include the first two variables regardless
selection=forward    Add vars to the model (not remove)
sequential           In the order they appear in the model statement
slentry=1            Enter a var if p < 1 (default is 0.05). So get them all.
******************************************************************/

/* Here's what happened.  After the first two variables were entered (step
zero), -2 Log L = 380.910.  After the remaining two were entered (end of step
2), -2 Log L = 366.007.  That's the same -2 Log L as the full model, and
380.910 - 366.007 = 14.903, with 2 degrees of freedom.  Good.  It's still
clumsy, but at least we don't have to fit full and reduced models separately. */
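
The subtraction gives the test statistic but not its p-value. A minimal sketch of one way to get it, assuming G is referred to a chi-square distribution with 2 degrees of freedom (one for each of precalc and calc). The variable names G, df and pvalue are made up for illustration.

```sas
data _null_;
     G = 380.910 - 366.007;        /* Difference in -2 Log L, reduced minus full */
     df = 2;                       /* Two variables tested: precalc and calc */
     pvalue = 1 - probchi(G, df);  /* Upper tail area of chi-square */
     put G= df= pvalue=;           /* Written to the log file */
run;
```

The p-value should come out at roughly 0.0006, in line with the Wald test dtest.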

Output from the first two proc logistics is omitted. All it shows is that in logistic regression, as in regular regression, you can standardize the independent variables (or just center them by subtracting off the mean) without affecting tests of whether a regression coefficient, or a collection of regression coefficients, is equal to zero. (This excludes tests about the intercept β0, of course.) If you want to see the entire list file anyway, it's available.
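
The centering claim can be checked directly. A minimal sketch, assuming the quant data set created in mathlog2.sas; the data set name centered is made up for illustration. With mean=0 and no std= option, proc standard subtracts off the mean and leaves the scale alone.

```sas
proc standard data=quant mean=0 out=centered;
     var zgpa zhscalc zprecalc zcalc; /* Centered only; std= is omitted */

proc logistic data=centered descending;
     title3 'Fit full model and do Wald tests with centered vars';
     model passed = zgpa zhscalc zprecalc zcalc;
     hschool: test zgpa=zhscalc=0;
     dtest: test zprecalc=zcalc=0;
run;
```

Because centering does not change the scale, the slopes and their tests should match the fit with the original variables. Only the intercept should change.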

Here is the rest of mathlog2.lst

_______________________________________________________________________________

                                The SAS System                                3
                   Logistic regression on math data: Part II
                  Calculate predicted probability of passing
                                                 10:52 Sunday, November 7, 2004

            For a student at the mean on all Independent Variables,


                                                       AVEPASS

                Estimated probability of passing is   0.690744


                                       B

                                    -14.6351
                                      0.1181
                                      0.0592
                                      0.2633
                                      0.0821


                 For a student with HS GPA = 80, HS calc = 75,


              7 out of 10 on the precalc and 6 out of 10 on calc,


                                                          PASS

                Estimated probability of passing is   0.830419

_______________________________________________________________________________

                                The SAS System                                4
                   Logistic regression on math data: Part II
               Try to get LR test for calc & precal: G = 14.903
                                                 10:52 Sunday, November 7, 2004

                            The LOGISTIC Procedure

                              Model Information

     Data Set                      MATH.EXPLORE
     Response Variable             passed               Passed the course
     Number of Response Levels     2
     Number of Observations        375
     Model                         binary logit
     Optimization Technique        Fisher's scoring


                                Response Profile

                        Ordered                    Total
                          Value     passed     Frequency

                              1     Yes              234
                              2     No               141

                     Probability modeled is passed='Yes'.

NOTE: 204 observations were deleted due to missing values for the response or
      explanatory variables.


                          Forward Selection Procedure


The following effects will be included in each model:

Intercept  gpa  hscalc

Step  0. The INCLUDE effects were entered.


                           Model Convergence Status

                Convergence criterion (GCONV=1E-8) satisfied.


                             Model Fit Statistics

                                                  Intercept
                                   Intercept         and
                    Criterion        Only        Covariates

                    AIC              498.554        386.910
                    SC               502.481        398.691
                    -2 Log L         496.554        380.910


                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       115.6442        2         <.0001
            Score                   99.0512        2         <.0001
            Wald                    73.7669        2         <.0001


                           Residual Chi-Square Test

                      Chi-Square       DF     Pr > ChiSq

                         14.4788        2         0.0007


Step  1. Effect precalc entered:


                           Model Convergence Status

                Convergence criterion (GCONV=1E-8) satisfied.


                             Model Fit Statistics

                                                  Intercept
                                   Intercept         and
                    Criterion        Only        Covariates

                    AIC              498.554        375.618
                    SC               502.481        391.326
                    -2 Log L         496.554        367.618


                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       128.9358        3         <.0001
            Score                  107.7971        3         <.0001
            Wald                    79.6583        3         <.0001


                           Residual Chi-Square Test

                      Chi-Square       DF     Pr > ChiSq

                          1.6051        1         0.2052


Step  2. Effect calc entered:


                           Model Convergence Status

                Convergence criterion (GCONV=1E-8) satisfied.


                             Model Fit Statistics

                                                  Intercept
                                   Intercept         and
                    Criterion        Only        Covariates

                    AIC              498.554        376.007
                    SC               502.481        395.642
                    -2 Log L         496.554        366.007


                    Testing Global Null Hypothesis: BETA=0

            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio       130.5468        4         <.0001
            Score                  108.2737        4         <.0001
            Wald                    79.7057        4         <.0001

NOTE: All effects have been entered into the model.


                         Summary of Forward Selection

      Effect        Number       Score
Step  Entered  DF       In  Chi-Square  Pr > ChiSq  Variable Label

   1  precalc   1        3     13.0467      0.0003  Number precalculus correct
   2  calc      1        4      1.6051      0.2052  Number calculus correct


                   Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
      Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

      Intercept     1    -14.6351      2.2803       41.1914        <.0001
      gpa           1      0.1181      0.0311       14.4227        0.0001
      hscalc        1      0.0592      0.0136       18.9109        <.0001
      precalc       1      0.2633      0.0890        8.7518        0.0031
      calc          1      0.0821      0.0650        1.5969        0.2063


                             Odds Ratio Estimates

                                Point          95% Wald
                  Effect     Estimate      Confidence Limits

                  gpa           1.125       1.059       1.196
                  hscalc        1.061       1.033       1.090
                  precalc       1.301       1.093       1.549
                  calc          1.086       0.956       1.233


         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     83.4    Somers' D    0.670
               Percent Discordant     16.4    Gamma        0.671
               Percent Tied            0.1    Tau-a        0.315
               Pairs                 32994    c            0.835
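
The Odds Ratio Estimates table can be reproduced from the Estimate and Standard Error columns: the point estimate is exp(Estimate), and the 95% Wald limits are exp(Estimate plus or minus 1.96 standard errors). A minimal sketch; the data set name oddsrat and its variable names are made up for illustration, and the inputs are the rounded values printed above.

```sas
data oddsrat;
     input effect $ estimate stderr;
     z = probit(0.975);                   /* About 1.96 */
     oddsratio = exp(estimate);
     lower95 = exp(estimate - z*stderr);  /* Lower 95% Wald limit */
     upper95 = exp(estimate + z*stderr);  /* Upper 95% Wald limit */
     datalines;
gpa      0.1181  0.0311
hscalc   0.0592  0.0136
precalc  0.2633  0.0890
calc     0.0821  0.0650
;

proc print data=oddsrat;
     var effect oddsratio lower95 upper95;
     format oddsratio lower95 upper95 8.3;
run;
```

Since the inputs are rounded to four decimal places, the last digit may occasionally differ from what proc logistic prints.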