1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
73
74 /* mathlogreg2.sas */
75 %include '/home/u1407221/441s24/SAS08/ReadLabelMath2.sas';
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT has been output.
NOTE: Format NCFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 302.78k
OS Memory 25252.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 24 Switch Count 2
Page Faults 0
Page Reclaims 89
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 56
NOTE: The infile '/home/u1407221/441s24/data/math.data.txt' is:
Filename=/home/u1407221/441s24/data/math.data.txt,
Owner Name=u1407221,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=10Feb2024:17:04:10,
File Size (bytes)=90324
NOTE: 1158 records were read from the infile '/home/u1407221/441s24/data/math.data.txt'.
The minimum record length was 76.
The maximum record length was 76.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
180 at 121:24
NOTE: The data set WORK.MATH has 1158 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1176.21k
OS Memory 26536.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 25 Switch Count 3
Page Faults 0
Page Reclaims 283
Page Swaps 0
Voluntary Context Switches 23
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 776
NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.REPLIC has 579 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1412.81k
OS Memory 26924.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 26 Switch Count 2
Page Faults 0
Page Reclaims 152
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.EXPLORE has 579 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1408.28k
OS Memory 26924.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 27 Switch Count 2
Page Faults 0
Page Reclaims 132
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
236
237 title2 'Predict Passing the course (Y-N) with Logistic Regression';
238
239 /* We know course is useful:
240 c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite' */
241
242 proc logistic data = explore;
243 title3 'Course and HS variables';
244 model passed (event='Yes') = c1 c3 hsgpa hscalc hsengl;
245 course: test c1=c3=0;
246 HSvars: test hsgpa=hscalc=hsengl=0;
247 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.09 seconds
user cpu time 0.09 seconds
system cpu time 0.00 seconds
memory 5277.37k
OS Memory 30648.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 28 Switch Count 1
Page Faults 0
Page Reclaims 1896
Page Swaps 0
Voluntary Context Switches 6
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 56
248
249 /* Decision: Drop course */
250
251 ods select ParameterEstimates; /* Limit the output */
252 proc logistic data = explore;
253 title3 'Just HS variables';
254 model passed (event='Yes') = hsgpa hscalc hsengl;
255 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2338.65k
OS Memory 30904.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 29 Switch Count 1
Page Faults 0
Page Reclaims 257
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 48
256
257 /* Decision: Drop HS English.
258 Does the diagnostic test add anything? */
259
260 ods select ParameterEstimates;
261 proc logistic data = explore;
262 title3 'HS GPA, HS Calculus and Diagnostic Test';
263 model passed (event='Yes') = hsgpa hscalc calc precalc;
264 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.01 seconds
memory 2301.65k
OS Memory 30904.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 30 Switch Count 1
Page Faults 0
Page Reclaims 199
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
265
266 /* Decision: Drop the calc subscale, but which is better,
267 precalc or total score? */
268
269 ods select ParameterEstimates TestStmts; /* I ran a trace to find out the name */
270 proc logistic data = explore;
271 title3 'HS GPA, HS Calculus and Diagnostic Test';
272 model passed (event='Yes') = hsgpa hscalc precalc totscore;
273 precalc_n_totscore: test precalc = totscore = 0;
274 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2339.25k
OS Memory 30904.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 31 Switch Count 1
Page Faults 0
Page Reclaims 209
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 48
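The comment on line 269 mentions running a trace to find the table name. A minimal sketch of how that can be done with ODS TRACE, which writes the name of each output object to the log. The model is the one fitted above; nothing here is taken from the author's actual trace run.

```sas
ods trace on;   /* List each output object (name, label, path) in the log */
proc logistic data = explore;
     model passed (event='Yes') = hsgpa hscalc precalc totscore;
     precalc_n_totscore: test precalc = totscore = 0;
run;
ods trace off;  /* Stop tracing so later steps do not clutter the log */
```

The object names reported in the log can then be listed in an ods select statement, as is done with ParameterEstimates and TestStmts above.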
275
276 /* Decision: Keep precalc rather than totscore. Confirm by refitting without totscore. */
277
278 ods select ParameterEstimates;
279 proc logistic data = explore;
280 title3 'HS GPA, HS Calculus and Pre-calculus test';
281 model passed (event='Yes') = hsgpa hscalc precalc;
282 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 2302.59k
OS Memory 30904.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 32 Switch Count 1
Page Faults 0
Page Reclaims 198
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
283
284 proc logistic data = explore;
285 title3 'Try gender, ethnic and mother tongue controlling for good stuff';
286 class ethnic (param=ref ref='East Indian');
287 /* Specifying a reference category that's not the last value */
288 model passed (event='Yes') = hsgpa hscalc precalc ethnic gender mtongue;
289 contrast 'Demographics' ethnic 1 0 0 0 0,
290 ethnic 0 1 0 0 0,
291 ethnic 0 0 1 0 0,
292 ethnic 0 0 0 1 0,
293 ethnic 0 0 0 0 1,
294 gender 1,
295 mtongue 1 / e;
296 /* The e option displays the L matrix of contrast coefficients */
297 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.11 seconds
user cpu time 0.11 seconds
system cpu time 0.01 seconds
memory 2690.68k
OS Memory 31416.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 33 Switch Count 1
Page Faults 0
Page Reclaims 311
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 104
298
299 /* Decision: Forget about ethnicity. */
300
301 ods select ParameterEstimates;
302 proc logistic data = explore;
303 title3 'HS GPA, HS Calculus, Pre-calculus test, Gender and Mother tongue';
304 model passed (event='Yes') = hsgpa hscalc precalc gender mtongue;
305 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 2325.37k
OS Memory 31416.00k
Timestamp 03/10/2024 06:14:57 PM
Step Count 34 Switch Count 1
Page Faults 0
Page Reclaims 193
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
306
307 /* Decision: Drop Gender and Mother tongue too.
308 My model now has just HS GPA, HS Calculus and Pre-calculus test. */
309
310 proc logistic data = explore;
311 title3 'Try automatic (stepwise) selection';
312 model passed (event='Yes') =
313 gender mtongue e1-e6
314 hsgpa hscalc hsengl
315 c1-c3 precalc calc totscore
316 / selection = stepwise slentry = 0.05 slstay = 0.05 ;
317 /* PROC LOGISTIC's defaults are also slentry = slstay = 0.05 */
318 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 3.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.13 seconds
user cpu time 0.13 seconds
system cpu time 0.00 seconds
memory 2602.12k
OS Memory 31416.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 35 Switch Count 1
Page Faults 0
Page Reclaims 246
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 2
Block Input Operations 0
Block Output Operations 176
319
320
321 /* Note: 211 observations were lost to missingness for stepwise, compared with 204
322 for the earlier model with hsgpa, hscalc and precalc. */
323
324 /* Perhaps missingness on the variables we dropped could be useful. */
325
326 proc freq data = explore;
327 title2 'Explore missingness on omitted variables';
328 tables gender mtongue ethnic;
329 tables gender*mtongue / norow nocol nopercent missing;
330 tables gender*course2 / norow nocol nopercent missing;
331 run;
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.05 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 1762.21k
OS Memory 31152.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 36 Switch Count 5
Page Faults 0
Page Reclaims 452
Page Swaps 0
Voluntary Context Switches 37
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 544
332
333
334 data explore2;
335 set explore;
336 if gender = . then sexmiss = 1; else sexmiss=0; /* mtongue is missing whenever gender is */
337 if course = . then coursemiss = 1; else coursemiss=0;
338 format sexmiss coursemiss ynfmt.;
339 label sexmiss = 'Gender and mother tongue missing'
340 coursemiss = 'Course missing';
341
342 /* Checks are commented out
343 proc freq;
344 tables gender*sexmiss / norow nocol nopercent missing;
345 tables course*coursemiss / norow nocol nopercent missing;
346 tables sexmiss*coursemiss / norow nocol nopercent missing chisq;
347 */
348
349
NOTE: Variable course is uninitialized.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: The data set WORK.EXPLORE2 has 579 observations and 31 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1266.25k
OS Memory 31148.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 37 Switch Count 2
Page Faults 0
Page Reclaims 162
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
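The NOTE "Variable course is uninitialized" indicates that EXPLORE has no variable named course, so the DATA step creates it as all-missing and coursemiss equals 1 for every case. This is consistent with EXPLORE2 having 31 variables rather than 28 + 2 = 30. A sketch of what may have been intended, assuming the course variable is course2 (the name used in the PROC FREQ step above); explore2b is a hypothetical data set name chosen so as not to overwrite explore2.

```sas
data explore2b;                       /* Hypothetical name, not in the original program */
     set explore;
     /* missing() returns 1 or 0 and works for numeric and character variables */
     sexmiss    = missing(gender);
     coursemiss = missing(course2);   /* Assumption: course2 is the intended variable */
     format sexmiss coursemiss ynfmt.;
     label sexmiss    = 'Gender and mother tongue missing'
           coursemiss = 'Course missing';
run;
```

If course2 is in fact never missing among the complete cases, coursemiss would still be constant in the fitted sample, so the checks in the commented-out PROC FREQ are worth running against whichever variable is used.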
350 proc logistic data = explore2;
351 title3 'Try adding missingness on gender/mtongue and course';
352 model passed (event='Yes') = hsgpa hscalc precalc sexmiss coursemiss;
353 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.06 seconds
user cpu time 0.07 seconds
system cpu time 0.00 seconds
memory 2414.34k
OS Memory 32184.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 38 Switch Count 1
Page Faults 0
Page Reclaims 243
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 72
354
355
356 /* All the cases with course missing were deleted because of
357 missingness on other variables. */
358
359 proc logistic data = explore2;
360 title3 'Try adding missingness on gender/mtongue only';
361 model passed (event='Yes') = hsgpa hscalc precalc sexmiss;
362 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.00 seconds
memory 2342.46k
OS Memory 32184.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 39 Switch Count 1
Page Faults 0
Page Reclaims 208
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 80
363
364 /* Here's the current model. */
365
366 proc logistic data = explore;
367 title3 'HS GPA, HS Calculus and Pre-calculus test';
368 model passed (event='Yes') = hsgpa hscalc precalc;
369 output out=explore3 prob=pihat;
370 run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: The data set WORK.EXPLORE3 has 579 observations and 30 variables.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.00 seconds
memory 2841.50k
OS Memory 32444.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 40 Switch Count 3
Page Faults 0
Page Reclaims 265
Page Swaps 0
Voluntary Context Switches 26
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 584
371
372
373 proc print data=explore3 (obs=13);
374 /* List only the first 13 observations */
375 var hsgpa hscalc precalc pihat passed;
376 run;
NOTE: There were 13 observations read from the data set WORK.EXPLORE3.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 821.53k
OS Memory 31144.00k
Timestamp 03/10/2024 06:14:58 PM
Step Count 41 Switch Count 0
Page Faults 0
Page Reclaims 147
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
377
378
379 /* Based on invariance and the Law of Total Probability (double expectation),
380 I predict that the mean pihat will be 234/375 = 0.624, the proportion of
381 students with non-missing data who passed. */
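For a logistic model with an intercept, the prediction in the comment above can also be checked directly from the likelihood equations. A short derivation follows, using notation that is not in the original: $y_i = 1$ for a pass, $\widehat{\pi}_i$ for pihat, $x_i$ for the predictor vector including the constant, and $n = 375$ complete cases.

```latex
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \, x_i^\top \beta - \log\!\left(1 + e^{x_i^\top \beta}\right) \right],
\qquad
\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} \left( y_i - \pi_i \right).

\text{Setting this derivative to zero at the MLE gives }
\frac{1}{n} \sum_{i=1}^{n} \widehat{\pi}_i
  = \frac{1}{n} \sum_{i=1}^{n} y_i
  = \frac{234}{375} = 0.624 .
```

So the mean of pihat over the cases used in the fit should match the observed pass rate up to convergence tolerance.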
382
383 proc univariate normal plot data=explore3;
384 title2 'Explore the distribution of estimated probabilities';
385 where pihat ne .;
386 var pihat; /* Should have n=375 non-missing. */
387 run;
NOTE: PROCEDURE UNIVARIATE used (Total process time):
real time 2.43 seconds
user cpu time 0.17 seconds
system cpu time 0.03 seconds
memory 22269.43k
OS Memory 52396.00k
Timestamp 03/10/2024 06:15:00 PM
Step Count 42 Switch Count 4
Page Faults 0
Page Reclaims 7226
Page Swaps 0
Voluntary Context Switches 523
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 1632
388
389
390 /* Goal: Develop a prediction model that uses all the data and makes a
391 prediction for every case. Base it on the estimated probabilities. */
392
393
394
395
396
397
398
399
400
401 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
413