/* mathlogreg1.sas */ title2 'Logistic Regression on the Exploratory Math data'; %include '/home/u1407221/441s24/SAS08/ReadLabelMath2.sas'; proc freq data=explore; title3 'Course by passed with proc freq'; tables course2 * passed / nocol nopercent chisq; proc logistic data=explore; title3 'Course by passed with dummy vars: Compare LR Chisq = 34.4171'; model passed (event='Yes') = c1 c3; /* Omit c2 so Mainstream is reference category */ /* Without proc format, (event='1') would work. */ /* Wald chi-squared tests */ course: test c1=c3=0; Course1_vs_2: test c1=0; Course1_vs_3: test c1=c3; Course2_vs_3: test c3=0; /* Question: The estimated odds of passing the course are ___ times as great for a student in the elite course, compared to a student in the mainstream course. Question: With 95% confidence, the chances of a student passing the catch-up course are between ___% and ___% as great as the chances of passing the mainstream course. Note the deliberately vague but useful word "chances." > exp(-1.4838) [1] 0.2267743 > A = -1.4838 - 1.96*0.3171; B = -1.4838 + 1.96*0.3171 > exp(c(A,B)) [1] 0.1218072 0.4221967 A few details about the output : The higher the minus 2 Log Likelihood, the lower the (estimated) maximum probability of observing these responses. It is a meaure of lack of model fit. The Akaike information criterion and Schwarz's Bayesian criterion both impose a further penalty for number of explanatory variables. Small is good. "Association of Predicted Probabilities and Observed Responses": * Every case has Y=0 or Y=1. * Every case has a p-hat. * Pick a case with Y=0, and another case with Y=1. That's a pair. * If the case with Y=0 has a lower p-hat than the case with Y=1, the pair is concordant. */ proc iml; title3 'Estimate prob. of passing for for course=3: Compare 31/39 = 0.7949'; b0 = 0.4077; b1 = -1.4838; b2 = 0.9468; c1 = 0; c3=1; lcombo = b0 + b1*c1 + b2*c3; probpass = exp(lcombo) / (1+exp(lcombo)); print "Estimated probability of passing course 3 (Elite) is " probpass; proc logistic data=explore; title3 'Use the class and contrast statements'; class course2 / param=ref; /* This param option makes the ALPHABETICALLY last category (Mainstream) the reference category. Default is effect coding. */ model passed (event='Yes') = course2; contrast 'Catch-up vs Mainstream' course2 1 0; contrast 'Elite vs Mainstream' course2 0 1; contrast 'Catch-up vs Elite' course2 1 -1; /* Contrast is a little tricky in proc logistic compared to proc glm. It lets you specify a set of linear combinations of regression coefficients to test against zero. It is essential to know exactly what the dummy variable coding scheme is. This can still be more convenient than defining your own dummy variables in the data step. */ proc logistic data=explore; title3 'Course controlling for score on diagnostic test'; class course2 / param=ref; model passed (event='Yes') = course2 totscore; contrast 'Course controlling for totscore' course2 1 0, course2 0 1; contrast 'Catch-up vs. Elite controlling for totscore' course2 1 -1; run; /* Estimate a probability of passing without typing in the * estimated regression coefficients: Use the Output Delivery * System (ODS). All the tables in a SAS results file have names. * You can find out what they are with a web search on * "proc logistic ods table names" , which will take you to the * manual. Easier is to do a preliminary run with ods trace on, * which writes the table names on the log file as they are produced. */ ods trace on; proc logistic data=explore; title3 'What are the ods table names?'; model passed (event='Yes') = c1 c3 totscore; run; /* Need run with ods trace */ ods trace off; ods output ParameterEstimates = estimout; /* The ParameterEstimates table will be written to a SAS data set called estimout. */ proc logistic data=explore; title3 'Save parameter estimates in data set estimout using ods'; model passed (event='Yes') = c1 c3 totscore; proc print data=estimout; proc iml; title3 'Estimated Probabilty of Passing'; use estimout; read all var {Estimate} into b; print "Estimated regression coefficients"; print b; /* Student in the catch-up class who got 10 right out of 20 */ x1 = {1, 1, 0, 10}; /* Rows are separated by commas */ pihat1 = exp(x1`*b)/(1+exp(x1`*b)); print "Student in the catch-up class who got 10 right out of 20" pihat1; /* Student in the elite class who got all 20 right */ x2 = {1, 0, 1, 20}; /* Rows are separated by commas */ Pihat2 = exp(x2`*b)/(1+exp(x2`*b)); print "Student in the elite class who got 20 right out of 20" pihat2; /********************** Output not shown ****************************/ proc logistic data=explore noprint; title3 'Course controlling for score on diagnostic test'; class course2 / param=ref; model passed (event='Yes') = course2 totscore; contrast 'Course controlling for totscore' course2 1 0, course2 0 1 / e; /* The e option gives the "effect" matrix L in H0: L beta = 0 */ proc logistic data=explore noprint; title3 'Course controlling for score on diagnostic test'; class course2 / param=ref; model passed (event='Yes') = course2 totscore; contrast 'Course controlling for totscore' course2 -19.7 0, course2 0 9 / e; quit; /********************************************************************/