1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56         /* MathLogReg1.sas */
57         %include '/folders/myfolders/441s16/Lecture/readmath2b.sas';
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

166        title2 'Logistic Regression with dummy variables on the Math data';
167
168        /* Recall definition of passed
169           if (50<=mark<=100) then passed=1; else passed=0;
170
171           And
172
173           if course=4 then course2=.; else course2=course;
174
175           if course2=. then c1=.; else if course2=1 then c1=1; else c1=0;
176           if course2=. then c2=.; else if course2=2 then c2=1; else c2=0;
177           if course2=. then c3=.; else if course2=3 then c3=1; else c3=0;
178           label c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite';
179        */
180
181
NOTE: The infile '/folders/myfolders/exploremath.data.txt' is:
      Filename=/folders/myfolders/exploremath.data.txt,
      Owner Name=root,Group Name=vboxsf,
      Access Permission=-rwxrwx---,
      Last Modified=18Jan2016:17:34:49,
      File Size (bytes)=44583

NOTE: 579 records were read from the infile '/folders/myfolders/exploremath.data.txt'.
      The minimum record length was 75.
      The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      99 at 80:24    99 at 117:13
NOTE: The data set WORK.MATHEX has 579 observations and 34 variables.
NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.05 seconds

182        proc freq;
183             title3 'Check course2 and dummy vars -- and why so many no course?';
184             tables (course c1-c3) * course2
185                    / norow nocol nopercent missing;
186
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.25 seconds
      cpu time            0.24 seconds

187        proc freq;
188             title3 'A few simple Chi-squared tests to predict passed';
189             tables (course2 sex ethnic tongue) * passed /
                       nocol nopercent chisq;
190
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.34 seconds
      cpu time            0.33 seconds

191        proc logistic descending order=internal; /* To model Y=1 */
192             title3 'Course2 by passed with dummy vars: Compare LR Chisq = 34.4171';
193             model passed = c1 c3; /* Mainstream is reference category */
194             Course1_vs_2: test c1=0;
195             Course1_vs_3: test c1=c3;
196             Course2_vs_3: test c3=0;
197
198        /*
199        A few details:
200
201        The higher the minus 2 Log Likelihood, the lower the (estimated) maximum
202        probability of observing these responses. It is a measure of lack of
203        model fit. The Akaike information criterion and Schwarz's Bayesian
204        criterion both impose a further penalty for the number of explanatory
205        variables. Small is good.
206
207        Association of Predicted Probabilities and Observed Responses:
208          * Every case has Y=0 or Y=1.
209          * Every case has a p-hat.
210          * Pick a case with Y=0, and another case with Y=1. That's a pair.
211          * If the case with Y=0 has a lower p-hat than the case with Y=1,
212            the pair is concordant.
213        */
214
215
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.28 seconds
      cpu time            0.25 seconds

216        proc iml;
NOTE: IML Ready
217             title3 'Estimate prob. of passing for course=3: Compare 31/39 = 0.7949';
218             b0 = 0.4077;
218      !      b1 = -1.4838;
218      !      b2 = 0.9468;
219             c1 = 0;
219      !      c3=1;
220             lcombo = b0 + b1*c1 + b2*c3;
221             probpass = exp(lcombo) / (1+exp(lcombo));
222             print "Estimated probability of passing course 3 is " probpass;
223
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

224        proc logistic descending order=internal;
225             title3 'Use the Class statement';
226             class course2 / param=ref; /* This param option makes the ALPHABETICALLY
227                                           last category (Mainstream) the reference
228                                           category */
229             model passed = course2;
230             contrast 'Catch-up vs Mainstream' course2 1 0;
231             contrast 'Elite vs Mainstream' course2 0 1;
232             contrast 'Catch-up vs Elite' course2 1 -1;
233
234        /* Contrast is a little tricky in proc logistic. It lets you specify a
235           set of linear combinations (not necessarily contrasts) to test on the
236           regression coefficients. It is essential to know exactly what the dummy
237           variable coding scheme is. This can still be more convenient than
238           defining your own dummy variables in the data step. */
239
240
241
242        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
254
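
The PROC IML step in the log evaluates the fitted probability for course 3 only. The same calculation can be sketched for all three courses at once, assuming the dummy coding recalled in the comments (Mainstream as reference, so c1 = c3 = 0). The coefficients are copied from the log; the matrix names b, x, and course are illustrative and do not appear in the original program.

```sas
/* Sketch only: coefficients are copied from the PROC IML step in the log.
   The names b, x, and course are hypothetical. */
proc iml;
   /* Column vector: b0 (intercept), b1 (for c1), b2 (for c3) */
   b = {0.4077, -1.4838, 0.9468};
   /* One row per course, columns are intercept, c1, c3:
      row 1 = Catch-up, row 2 = Mainstream (reference), row 3 = Elite */
   x = {1 1 0,
        1 0 0,
        1 0 1};
   course = {"Catch-up", "Mainstream", "Elite"};
   lcombo = x * b;                               /* linear predictor for each course */
   probpass = exp(lcombo) / (1 + exp(lcombo));   /* elementwise inverse logit */
   print probpass[rowname=course];
quit;
```

If this runs as intended, the Elite row should agree with the 31/39 = 0.7949 quoted in the title3 statement, up to rounding of the coefficients.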