56 /* MathLogReg1.sas */
57 %include '/folders/myfolders/441s16/Lecture/readmath2b.sas';
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
166 title2 'Logistic Regression with dummy variables on the Math data';
167
168 /* Recall definition of passed
169 if (50<=mark<=100) then passed=1; else passed=0;
170
171 And
172
173 if course=4 then course2=.; else course2=course;
174
175 if course2=. then c1=.; else if course2=1 then c1=1; else c1=0;
176 if course2=. then c2=.; else if course2=2 then c2=1; else c2=0;
177 if course2=. then c3=.; else if course2=3 then c3=1; else c3=0;
178 label c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite';
179 */
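The IF/ELSE chains recalled above can be written more compactly. In SAS, a comparison such as (course2 = 1) evaluates to 1 when true and 0 when false. The sketch below is a hypothetical alternative run on made-up toy data (the data set name dummytest and its values are illustrative only). It is not the code in readmath2b.sas.

```sas
/* Hypothetical toy data: course2 values 1-3, plus one missing value */
data dummytest;
   input course2 @@;
   if missing(course2) then call missing(c1, c2, c3);  /* missing course stays missing */
   else do;
      c1 = (course2 = 1);   /* 1 if Catch-up, else 0 */
      c2 = (course2 = 2);   /* 1 if Mainstream, else 0 */
      c3 = (course2 = 3);   /* 1 if Elite, else 0 */
   end;
   label c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite';
   datalines;
1 2 3 . 2
;

proc print data=dummytest label;
run;
```

For every non-missing course2, exactly one of c1-c3 equals 1, which matches the behavior of the original IF/ELSE chains.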
180
181
NOTE: The infile '/folders/myfolders/exploremath.data.txt' is:
Filename=/folders/myfolders/exploremath.data.txt,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=18Jan2016:18:34:49,
File Size (bytes)=44583
NOTE: 579 records were read from the infile '/folders/myfolders/exploremath.data.txt'.
The minimum record length was 75.
The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
99 at 80:24 99 at 117:13
NOTE: The data set WORK.MATHEX has 579 observations and 34 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
182 proc freq;
183 title3 'Check course2 and dummy vars -- and why so many no course?';
184 tables (course c1-c3) * course2
185 / norow nocol nopercent missing;
186
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.11 seconds
cpu time 0.12 seconds
187 proc freq;
188 title3 'A few simple Chi-squared tests to predict passed';
189 tables (course2 sex ethnic tongue) * passed / nocol nopercent chisq;
190
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.15 seconds
cpu time 0.15 seconds
191 proc logistic;
192 title3 'Course2 by passed with dummy vars: Compare LR Chisq = 34.4171';
193 model passed (event='Yes') = c1 c3; /* Mainstream is reference category */
194 Course1_vs_2: test c1=0;
195 Course1_vs_3: test c1=c3;
196 Course2_vs_3: test c3=0;
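Writing the model out course by course makes clear what each TEST statement compares. Mainstream is the reference category. Here \pi is the probability that passed='Yes'. The symbols \beta_0, \beta_1 and \beta_3 are labels introduced for this sketch, standing for the intercept and the coefficients of c1 and c3. They are not names that SAS prints.

```latex
\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 c_1 + \beta_3 c_3
\quad\Longrightarrow\quad
\begin{aligned}
\text{Catch-up } (c_1=1,\ c_3=0):&\quad \beta_0 + \beta_1\\
\text{Mainstream } (c_1=0,\ c_3=0):&\quad \beta_0\\
\text{Elite } (c_1=0,\ c_3=1):&\quad \beta_0 + \beta_3
\end{aligned}
```

The tests map onto these log odds as follows:

- Course1_vs_2 tests H0: \beta_1 = 0.
- Course1_vs_3 tests H0: \beta_1 = \beta_3, meaning Catch-up and Elite have the same log odds of passing.
- Course2_vs_3 tests H0: \beta_3 = 0.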
197
198 /*
199 A few details:
200
201 The higher the minus 2 Log Likelihood, the lower the (estimated) maximum
202 probability of observing these responses, so it is a measure of lack of
203 model fit. The Akaike information criterion (AIC) and Schwarz's Bayesian
204 criterion (SC) both add a further penalty for the number of estimated
205 parameters, including the intercept. Small is good.
206
207 Association of Predicted Probabilities and Observed Responses:
208 * Every case has Y=0 or Y=1.
209 * Every case has a p-hat.
210 * Pick a case with Y=0, and another case with Y=1. That's a pair.
211 * If the case with Y=0 has a lower p-hat than the case with Y=1,
212 the pair is concordant; if higher, discordant; if equal, tied.
213 */
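The two criteria mentioned in the comment are computed from -2 Log L as shown below. Here k is the number of estimated parameters (3 in this model: the intercept, c1 and c3), n is the number of observations actually used in the fit, and log is the natural logarithm.

```latex
\mathrm{AIC} = -2\log L + 2k, \qquad \mathrm{SC} = -2\log L + k\log n
```

Because \log n > 2 once n \ge 8, SC penalizes each extra parameter more heavily than AIC for any realistic sample size.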
214
215
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.11 seconds
cpu time 0.10 seconds
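The pair-counting rule in the comment above can be sketched in SAS/IML. The y and phat values are hypothetical toy numbers, not the Math data. The c statistic and Somers' D lines use the usual definitions built from these counts. This illustrates the definitions and is not PROC LOGISTIC's internal code. The output data set name assoc is an arbitrary choice.

```sas
proc iml;
   /* Hypothetical toy data: observed responses and predicted probabilities */
   y    = {0, 0, 1, 1, 1};
   phat = {0.2, 0.6, 0.6, 0.7, 0.4};

   zeros = loc(y = 0);   /* positions of cases with Y=0 */
   ones  = loc(y = 1);   /* positions of cases with Y=1 */

   nc = 0; nd = 0; nt = 0;
   do i = 1 to ncol(zeros);
      do j = 1 to ncol(ones);
         p0 = phat[zeros[i]];
         p1 = phat[ones[j]];
         if p0 < p1 then nc = nc + 1;        /* concordant pair */
         else if p0 > p1 then nd = nd + 1;   /* discordant pair */
         else nt = nt + 1;                   /* tied pair */
      end;
   end;

   npairs  = nc + nd + nt;            /* = (number of 0s) * (number of 1s) */
   cstat   = (nc + 0.5*nt) / npairs;  /* c statistic: ties count as half */
   somersd = (nc - nd) / npairs;      /* Somers' D */
   print nc nd nt npairs cstat somersd;

   /* Save the results to a data set so later steps can use them */
   create assoc var {nc nd nt npairs cstat somersd};
   append;
   close assoc;
quit;
```

With these toy values there are 2 × 3 = 6 pairs: 4 concordant, 1 discordant and 1 tied. That gives c = 4.5/6 = 0.75 and Somers' D = 3/6 = 0.5.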
216 proc iml;
NOTE: IML Ready
217 title3 'Estimate prob. of passing for course=3: Compare 31/39 = 0.7949';
218 b0 = 0.4077;
218 ! b1 = -1.4838;
218 ! b2 = 0.9468;
219 c1 = 0;
219 ! c3=1;
220 lcombo = b0 + b1*c1 + b2*c3;
221 probpass = exp(lcombo) / (1+exp(lcombo));
222 print "Estimated probability of passing course 3 is " probpass;
223
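The arithmetic in this IML step can be checked by hand. In the code, b2 is the coefficient of c3 (Elite). The model has three parameters for three courses, so it is saturated. As a result, the fitted odds of passing for Elite reproduce the observed odds 31/8 (31 passes and 8 failures out of 39), which is why the estimate matches 31/39.

```latex
\hat\eta = b_0 + b_2 = 0.4077 + 0.9468 = 1.3545, \qquad
e^{1.3545} \approx 3.875 = \tfrac{31}{8}, \qquad
\hat\pi = \frac{e^{1.3545}}{1 + e^{1.3545}} \approx \frac{3.875}{4.875} = \tfrac{31}{39} \approx 0.7949
```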
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 0.03 seconds
cpu time 0.02 seconds
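The same calculation can be done for all three courses at once by using a matrix of dummy-variable values. This sketch reuses the coefficient values typed into the step above. The following choices are specific to this sketch: a matrix layout with one row per course (intercept, c1, c3), the row labels, and the data set name courseprob.

```sas
proc iml;
   /* Coefficients as typed into the IML step above: intercept, c1, c3 */
   b = {0.4077, -1.4838, 0.9468};
   /* One row per course: intercept, c1, c3 */
   X = {1 1 0,
        1 0 0,
        1 0 1};
   course = {"Catch-up", "Mainstream", "Elite"};
   lcombo   = X * b;                               /* linear predictor for each course */
   probpass = exp(lcombo) / (1 + exp(lcombo));     /* elementwise inverse logit */
   print probpass[rowname=course];

   /* Save one observation per course */
   create courseprob var {probpass};
   append;
   close courseprob;
quit;
```

Because the model is saturated, each entry of probpass should reproduce the observed pass rate for that course, up to rounding of the coefficients. The Elite row should match 31/39.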
224 proc logistic;
225 title3 'Use the Class statement';
226 class course2 / param=ref; /* PARAM=REF gives 0/1 dummy coding with
227 the ALPHABETICALLY last formatted value
228 (Mainstream) as the reference category */
229 model passed (event='Yes') = course2;
230 contrast 'Catch-up vs Mainstream' course2 1 0;
231 contrast 'Elite vs Mainstream' course2 0 1;
232 contrast 'Catch-up vs Elite' course2 1 -1;
233
234 /* CONTRAST is a little tricky in proc logistic. It tests whether one or
235 more linear combinations of the regression coefficients equal zero; the
236 weights need not sum to zero, so these are not necessarily contrasts. You
237 must know exactly how the dummy variables are coded. Even so, this can be
238 more convenient than defining your own dummy variables in the data step. */
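One way to make the CONTRAST coefficients less error-prone is to name the reference level explicitly and have SAS also report each estimate on the odds-ratio scale. The sketch below uses a small hypothetical data set (toy, with made-up counts), not the Math data. REF=, FREQ, ESTIMATE=EXP and ODDSRATIO are standard PROC LOGISTIC syntax. Still, check the Class Level Information table in your own output to confirm the order of the design columns.

```sas
/* Hypothetical toy data, for illustration only: NOT the Math data */
data toy;
   input course2 :$10. passed :$3. count;
   datalines;
Catch-up   Yes 3
Catch-up   No  7
Mainstream Yes 6
Mainstream No  4
Elite      Yes 8
Elite      No  2
;

proc logistic data=toy;
   freq count;                                   /* each row stands for count students */
   class course2(ref='Mainstream') / param=ref;  /* reference level named explicitly */
   model passed(event='Yes') = course2;
   /* Design columns: Catch-up, then Elite (remaining levels in sorted order) */
   contrast 'Catch-up vs Mainstream' course2 1  0 / estimate=exp;
   contrast 'Elite vs Mainstream'    course2 0  1 / estimate=exp;
   contrast 'Catch-up vs Elite'      course2 1 -1 / estimate=exp;
   oddsratio course2;                            /* all pairwise odds ratios */
run;
```

With REF='Mainstream', the two design columns are Catch-up and Elite. The coefficient vectors 1 0, 0 1 and 1 -1 therefore mean the same thing as in the program above, but the reference no longer depends on alphabetical order.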
239
240
241