1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
71
72 /* mathlogreg2.sas */
73 %include '/home/brunner0/441s20/readmath2b.sas';
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 244.34k
OS Memory 30372.00k
Timestamp 02/10/2020 12:09:23 AM
Step Count 64 Switch Count 0
Page Faults 0
Page Reclaims 24
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 32
180 /* %include '/home/brunner0/441s20/readexplor.sas'; */
181 title2 'Predict Passing the course (Y-N) with Logistic Regression';
182
183 /* We know course is useful:
184 c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite' */
185
NOTE: The infile '/home/brunner0/441s20/exploremath.data.txt' is:
Filename=/home/brunner0/441s20/exploremath.data.txt,
Owner Name=brunner0,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=26Jan2020:18:49:34,
File Size (bytes)=44583
NOTE: 579 records were read from the infile '/home/brunner0/441s20/exploremath.data.txt'.
The minimum record length was 75.
The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
99 at 96:24 99 at 135:13
NOTE: The data set WORK.MATHEX has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 891.56k
OS Memory 30888.00k
Timestamp 02/10/2020 12:09:23 AM
Step Count 65 Switch Count 3
Page Faults 0
Page Reclaims 106
Page Swaps 0
Voluntary Context Switches 24
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
186 proc logistic data = mathex;
187 title3 'Course and HS variables';
188 model passed (event='Yes') = c1 c3 hsgpa hscalc hsengl;
189 course: test c1=c3=0;
190 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.10 seconds
user cpu time 0.10 seconds
system cpu time 0.00 seconds
memory 4573.59k
OS Memory 32952.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 66 Switch Count 1
Page Faults 0
Page Reclaims 435
Page Swaps 0
Voluntary Context Switches 6
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 56
191
192
193 /* Decision: Drop course */
194
195
196 ods select ParameterEstimates; /* Limit the output */
197 proc logistic data = mathex;
198 title3 'Just HS variables';
199 model passed (event='Yes') = hsgpa hscalc hsengl;
200 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 2339.25k
OS Memory 32952.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 67 Switch Count 1
Page Faults 0
Page Reclaims 192
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
201
202
203 /* Decision: Drop HS English.
204 Does the diagnostic test add anything? */
205
206
207 ods select ParameterEstimates;
208 proc logistic data = mathex;
209 title3 'HS GPA, HS Calculus and Diagnostic Test';
210 model passed (event='Yes') = hsgpa hscalc calc precalc;
211 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2306.09k
OS Memory 32952.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 68 Switch Count 1
Page Faults 0
Page Reclaims 202
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
212
213
214 /* Decision: Drop the calc subscale, but which is better,
215 precalc or total score? */
216
217
218 ods select ParameterEstimates TestStmts; /* Table names were found with ods trace */
219 proc logistic data = mathex;
220 title3 'HS GPA, HS Calculus and Diagnostic Test';
221 model passed (event='Yes') = hsgpa hscalc precalc totscore;
222 precalc_n_totscore: test precalc = totscore = 0;
223 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.03 seconds
user cpu time 0.03 seconds
system cpu time 0.01 seconds
memory 2331.43k
OS Memory 32952.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 69 Switch Count 1
Page Faults 0
Page Reclaims 201
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 48
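/* Sketch only, not part of the run logged here: one way to find the ODS
   table names used in ODS SELECT, assuming the same WORK.MATHEX data set.
   ODS TRACE ON writes the name, label and path of each output object to
   the log; ODS TRACE OFF stops the listing. */
ods trace on;
proc logistic data = mathex;
     model passed (event='Yes') = hsgpa hscalc precalc totscore;
     precalc_n_totscore: test precalc = totscore = 0;
run;
ods trace off;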
224
225
226 /* Decision: Keep precalc rather than totscore. Confirm */
227
228
229 ods select ParameterEstimates;
230 proc logistic data = mathex;
231 title3 'HS GPA, HS Calculus and Pre-calculus test';
232 model passed (event='Yes') = hsgpa hscalc precalc;
233 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2282.78k
OS Memory 32952.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 70 Switch Count 1
Page Faults 0
Page Reclaims 188
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
234
235 proc logistic data = mathex;
236 title3 'Try gender, ethnic and mother tongue controlling for good stuff';
237 class ethnic (param=ref ref='East Indian');
238 /* Specifying a reference category that's not the last value */
239 model passed (event='Yes') = hsgpa hscalc precalc ethnic gender mtongue;
240 contrast 'Demographics' ethnic 1 0 0 0 0,
241 ethnic 0 1 0 0 0,
242 ethnic 0 0 1 0 0,
243 ethnic 0 0 0 1 0,
244 ethnic 0 0 0 0 1,
245 gender 1,
246 mtongue 1 / e;
247 /* The e option displays the L matrix of contrast coefficients */
248 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.14 seconds
user cpu time 0.15 seconds
system cpu time 0.00 seconds
memory 2586.50k
OS Memory 33208.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 71 Switch Count 1
Page Faults 0
Page Reclaims 253
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 104
249
250
251 /* Decision: Forget about ethnicity. */
252
253
254 ods select ParameterEstimates;
255 proc logistic data = mathex;
256 title3 'HS GPA, HS Calculus, Pre-calculus test, Gender and Mother tongue';
257 model passed (event='Yes') = hsgpa hscalc precalc gender mtongue;
258 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 2303.68k
OS Memory 33208.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 72 Switch Count 1
Page Faults 0
Page Reclaims 198
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
259
260
261 /* Decision: Drop Gender and Mother tongue too.
262 My model now has just HS GPA, HS Calculus and Pre-calculus test. */
263
264
265 proc logistic data = mathex;
266 title3 'Try automatic (stepwise) selection';
267 model passed (event='Yes') =
268 gender mtongue e1-e6
269 hsgpa hscalc hsengl
270 c1-c3 precalc calc totscore
271 / selection = stepwise slentry = 0.05 slstay = 0.05 ;
272 /* PROC LOGISTIC defaults: slentry = slstay = 0.05 (0.15 is the PROC REG stepwise default) */
273 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 3.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.16 seconds
user cpu time 0.17 seconds
system cpu time 0.00 seconds
memory 2577.09k
OS Memory 33208.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 73 Switch Count 1
Page Faults 0
Page Reclaims 211
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 104
274
275
276 /* Note 211 observations lost to missingness for stepwise, compared to 204
277 for the earlier model with hsgpa, hscalc and precalc. */
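/* Sketch only, not part of the run logged here: the count for the earlier
   model could be checked directly. CMISS counts missing arguments of
   either type, so the flag below is 1 for cases that are complete on the
   response and the three predictors. The names COMPLETECHECK and
   COMPLETE3 are hypothetical. */
data completecheck;
     set mathex;
     complete3 = (cmiss(passed, hsgpa, hscalc, precalc) = 0);
run;
proc freq data = completecheck;
     tables complete3;
run;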
278
279 /* Perhaps missingness on the variables we dropped could be useful. */
280
281 proc freq;
282 title2 'Explore missingness on omitted variables';
283 tables gender mtongue ethnic;
284 tables gender*mtongue / norow nocol nopercent missing;
285 tables gender*course / norow nocol nopercent missing;
286 run;
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.08 seconds
user cpu time 0.08 seconds
system cpu time 0.00 seconds
memory 1629.90k
OS Memory 32944.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 74 Switch Count 5
Page Faults 0
Page Reclaims 232
Page Swaps 0
Voluntary Context Switches 33
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 544
287
288
289 data mathex2;
290 set mathex;
291 if gender = . then sexmiss = 1; else sexmiss=0; /* Also flags missing mtongue */
292 if course = . then coursemiss = 1; else coursemiss=0;
293 format sexmiss coursemiss ynfmt.;
294 label sexmiss = 'Gender and mother tongue missing'
295 coursemiss = 'Course missing';
296
297 /* Checks are commented out
298 proc freq;
299 tables gender*sexmiss / norow nocol nopercent missing;
300 tables course*coursemiss / norow nocol nopercent missing;
301 tables sexmiss*coursemiss / norow nocol nopercent missing chisq;
302 */
303
304
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: The data set WORK.MATHEX2 has 579 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1291.53k
OS Memory 32684.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 75 Switch Count 2
Page Faults 0
Page Reclaims 95
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
305 proc logistic data = mathex2;
306 title3 'Try adding missingness on gender/mtongue and course';
307 model passed (event='Yes') = hsgpa hscalc precalc sexmiss coursemiss;
308 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.08 seconds
user cpu time 0.08 seconds
system cpu time 0.00 seconds
memory 2429.46k
OS Memory 33720.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 76 Switch Count 1
Page Faults 0
Page Reclaims 211
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 8
Block Input Operations 0
Block Output Operations 64
309
310
311 /* All the cases with course missing were deleted because of
312 missingness on other variables. */
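/* Sketch only, not part of the run logged here: one way to check this is
   to tabulate coursemiss among the cases PROC LOGISTIC can use, that is,
   those with no missing values on the response or the other predictors.
   If the comment above holds, only the 'No' level should appear. */
proc freq data = mathex2;
     where cmiss(passed, hsgpa, hscalc, precalc) = 0;
     tables coursemiss;
run;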
313
314 proc logistic data = mathex2;
315 title3 'Try adding missingness on gender/mtongue only';
316 model passed (event='Yes') = hsgpa hscalc precalc sexmiss;
317 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.08 seconds
user cpu time 0.07 seconds
system cpu time 0.00 seconds
memory 2333.09k
OS Memory 33720.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 77 Switch Count 1
Page Faults 0
Page Reclaims 205
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 80
318
319 /* Here's the current model. */
320
321 proc logistic data = mathex;
322 title3 'HS GPA, HS Calculus and Pre-calculus test';
323 model passed (event='Yes') = hsgpa hscalc precalc;
324 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.07 seconds
user cpu time 0.07 seconds
system cpu time 0.00 seconds
memory 2308.87k
OS Memory 33720.00k
Timestamp 02/10/2020 12:09:24 AM
Step Count 78 Switch Count 1
Page Faults 0
Page Reclaims 201
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 64
325
326
327 /* Goal: Develop a prediction model that uses all the data and makes a
328 prediction for every case. */
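/* Sketch only, not part of the run logged here: the OUTPUT statement saves
   a predicted probability for each case. The data set name PRED and the
   variable name PHAT are hypothetical. Cases with a missing predictor get
   a missing PHAT, which is presumably the gap the goal above refers to;
   PROC MEANS counts how many predictions are present and missing. */
proc logistic data = mathex;
     model passed (event='Yes') = hsgpa hscalc precalc;
     output out = pred predicted = phat;
run;
proc means data = pred n nmiss;
     var phat;
run;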
329
330
331
332