/* MathLogReg1.sas */
%include '/folders/myfolders/441s16/Lecture/readmath2b.sas';
title2 'Logistic Regression with dummy variables on the Math data';

/* Recall definition of passed
      if (50<=mark<=100) then passed=1; else passed=0;
   And
      if course=4 then course2=.; else course2=course;
      if course2=. then c1=.;
         else if course2=1 then c1=1;
         else c1=0;
      if course2=. then c2=.;
         else if course2=2 then c2=1;
         else c2=0;
      if course2=. then c3=.;
         else if course2=3 then c3=1;
         else c3=0;
      label c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite';
*/

proc freq;
     title3 'Check course2 and dummy vars -- and why so many no course?';
     tables (course c1-c3) * course2 / norow nocol nopercent missing;

proc freq;
     title3 'A few simple Chi-squared tests to predict passed';
     tables (course2 sex ethnic tongue) * passed / nocol nopercent chisq;

proc logistic descending order=internal; /* To model Y=1 */
     title3 'Course2 by passed with dummy vars: Compare LR Chisq = 34.4171';
     model passed = c1 c3; /* Mainstream is reference category */
     Course1_vs_2: test c1=0;
     Course1_vs_3: test c1=c3;
     Course2_vs_3: test c3=0;

/* A few details:

   The higher the minus 2 Log Likelihood, the lower the (estimated)
   maximum probability of observing these responses. It is a measure
   of lack of model fit.

   The Akaike information criterion and Schwarz's Bayesian criterion
   both impose a further penalty for number of explanatory variables.
   Small is good.

   Association of Predicted Probabilities and Observed Responses:
     * Every case has Y=0 or Y=1.
     * Every case has a p-hat.
     * Pick a case with Y=0, and another case with Y=1. That's a pair.
     * If the case with Y=0 has a lower p-hat than the case with Y=1,
       the pair is concordant.
*/

proc iml;
     title3 'Estimate prob. of passing for course=3: Compare 31/39 = 0.7949';
     b0 = 0.4077; b1 = -1.4838; b2 = 0.9468;
     c1 = 0; c3 = 1;
     lcombo = b0 + b1*c1 + b2*c3;
     probpass = exp(lcombo) / (1+exp(lcombo));
     print "Estimated probability of passing course 3 is " probpass;

proc logistic descending order=internal;
     title3 'Use the Class statement';
     class course2 / param=ref; /* This param option makes the
                                   ALPHABETICALLY last category (Mainstream)
                                   the reference category */
     model passed = course2;
     contrast 'Catch-up vs Mainstream' course2 1 0;
     contrast 'Elite vs Mainstream'    course2 0 1;
     contrast 'Catch-up vs Elite'      course2 1 -1;

/* Contrast is a little tricky in proc logistic. It lets you specify
   a set of linear combinations (not necessarily contrasts) to test
   on the regression coefficients. It is essential to know exactly
   what the dummy variable coding scheme is. This can still be more
   convenient than defining your own dummy variables in the data step.
*/
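
/* A minimal sketch, not part of the original lecture code. It assumes
   the Class-statement model above, with Mainstream as the reference
   category, so the two design columns are Catch-up and Elite in that
   order. The idea is to let proc logistic show the coding scheme and
   do the probability arithmetic itself:
     * The E option on contrast displays the coefficient row that is
       actually being tested, which is one way to check the coding.
     * Including the intercept and asking for estimate=prob should give
       the estimated probability of passing for that category; for
       Elite, compare the proc iml calculation above.
   The contrast labels and the data set name phat_sketch are made up
   for illustration. This step comes before the data step on purpose,
   so that it still reads the most recently created data set from the
   %include file.
*/
proc logistic descending order=internal;
     title3 'Sketch: show contrast coefficients and estimated probabilities';
     class course2 / param=ref;
     model passed = course2;
     contrast 'Elite vs Mainstream (odds ratio)' course2 0 1
              / e estimate=exp;
     contrast 'P(pass) for Elite' intercept 1 course2 0 1
              / estimate=prob;
     contrast 'P(pass) for Catch-up' intercept 1 course2 1 0
              / estimate=prob;
run;

/* The same arithmetic as the proc iml step, done by hand for all three
   courses. The b values are the estimates quoted in the proc iml step
   above, copied rather than re-estimated. */
data phat_sketch; /* hypothetical data set name */
     b0 = 0.4077; b1 = -1.4838; b2 = 0.9468;
     do course2 = 1 to 3;
        c1 = (course2=1);   /* 1 for Catch-up, 0 otherwise */
        c3 = (course2=3);   /* 1 for Elite, 0 otherwise    */
        lcombo = b0 + b1*c1 + b2*c3;               /* estimated log odds */
        probpass = exp(lcombo) / (1+exp(lcombo));  /* estimated P(pass)  */
        output;
     end;
     keep course2 c1 c3 lcombo probpass;
run;

proc print data=phat_sketch noobs;
     title3 'Sketch: estimated prob. of passing for each course';
run;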