71 /* MathReg3.sas */
72 %include '/home/brunner0/441s20/readexplor.sas';
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 244.34k
OS Memory 30884.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 119 Switch Count 0
Page Faults 0
Page Reclaims 29
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 32
189 /* Creates data table explore */
190 %include '/home/brunner0/441s20/readreplic.sas';
NOTE: The infile '/home/brunner0/441s20/exploremath.data.txt' is:
Filename=/home/brunner0/441s20/exploremath.data.txt,
Owner Name=brunner0,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=26Jan2020:18:49:34,
File Size (bytes)=44583
NOTE: 579 records were read from the infile '/home/brunner0/441s20/exploremath.data.txt'.
The minimum record length was 75.
The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
99 at 98:24 99 at 134:18
NOTE: The data set WORK.EXPLORE has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1014.21k
OS Memory 31656.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 120 Switch Count 3
Page Faults 0
Page Reclaims 158
Page Swaps 0
Voluntary Context Switches 25
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 273.84k
OS Memory 31396.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 121 Switch Count 0
Page Faults 0
Page Reclaims 14
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
303 /* Creates data table replic */
304 title2 'Predict Grade for Replication Sample';
305
306 /* Plan:
307
308 1. Non-obvious findings from the exploration (based on Model I,
309 which predicts grade from hsgpa hscalc hsengl totscore mtongue) were
310 a. hsengl (HS English): negative coefficient
311 b. mtongue: negative coefficient
312 c. totscore: positive coefficient (the diagnostic test matters)
313 Test these three on the replication sample, with a Bonferroni correction for 3 tests.
314 The other two results (HS GPA and HS Calculus) were obvious.
315
316 2. See if prediction intervals work as advertised for Model H, which
317 predicts grade from hsgpa hscalc hsengl totscore.
318
319 3. Compare prediction of letter grade for the models with and without
320 the diagnostic test.
321
322
323 First, just illustrate use of different data tables in the same run. */
324
NOTE: The infile '/home/brunner0/441s20/replicmath.data.txt' is:
Filename=/home/brunner0/441s20/replicmath.data.txt,
Owner Name=brunner0,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=26Jan2020:18:49:13,
File Size (bytes)=38214
NOTE: 579 records were read from the infile '/home/brunner0/441s20/replicmath.data.txt'.
The minimum record length was 64.
The maximum record length was 64.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
81 at 218:24 81 at 254:18
NOTE: The data set WORK.REPLIC has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 924.90k
OS Memory 31656.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 122 Switch Count 3
Page Faults 0
Page Reclaims 92
Page Swaps 0
Voluntary Context Switches 22
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
325 proc freq data = explore;
326 title3 'Exploratory Sample';
327 tables outcome;
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.03 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 2760.93k
OS Memory 32172.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 123 Switch Count 3
Page Faults 0
Page Reclaims 211
Page Swaps 0
Voluntary Context Switches 16
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 272
328 proc freq data = replic;
329 title3 'Replication Sample';
330 tables outcome;
331
332 /* Now test the three findings: Point 1 above */
333
NOTE: There were 579 observations read from the data set WORK.REPLIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.01 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 977.12k
OS Memory 32428.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 124 Switch Count 3
Page Faults 0
Page Reclaims 185
Page Swaps 0
Voluntary Context Switches 21
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
334 proc reg data = replic plots = none;
335 title3 'Try to replicate HS Engl neg, mtongue neg, totscore pos';
336 title4 'with a Bonferroni correction (check p < 0.05/3 = 0.01666667)';
337 model grade = hsgpa hscalc hsengl totscore mtongue;
338
339 /* Make combined data table, look at prediction intervals: Point 2 */
340
NOTE: PROCEDURE REG used (Total process time):
real time 0.05 seconds
user cpu time 0.05 seconds
system cpu time 0.01 seconds
memory 2711.15k
OS Memory 33984.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 125 Switch Count 3
Page Faults 0
Page Reclaims 297
Page Swaps 0
Voluntary Context Switches 32
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 64
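The Bonferroni check named in title4 can also be done in code rather than by eye. The following is a minimal sketch, not part of the original program. It assumes the ODS table ParameterEstimates from proc reg, with its columns Variable, Estimate, tValue and Probt. The data table names pe and pe_bonf and the variable bonf_sig are made up for illustration.

```sas
/* Sketch only: pe, pe_bonf and bonf_sig are illustrative names. */
ods output ParameterEstimates = pe;
proc reg data = replic plots = none;
   model grade = hsgpa hscalc hsengl totscore mtongue;
run;
quit;

data pe_bonf;
   set pe;
   /* Keep only the three findings being replicated */
   where lowcase(Variable) in ('hsengl', 'totscore', 'mtongue');
   length bonf_sig $ 3;
   /* Compare each two-sided p-value to 0.05/3; a missing p-value is never flagged */
   if . < Probt < 0.05/3 then bonf_sig = 'Yes';
   else bonf_sig = 'No';
run;

proc print data = pe_bonf;
   var Variable Estimate tValue Probt bonf_sig;
run;
```

A finding counts as replicated only if bonf_sig is Yes and the sign of Estimate matches the direction seen in the exploratory sample, so the sign still has to be read from the printed table.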
341 data predict;
342 set explore replic;
343 keeper = grade+hsgpa+hscalc+hsengl+totscore;
344 /* keeper is missing if any of these five variables is missing */
345 if keeper ne .; /* Subsetting IF: keep only cases with no missing values */
346 grade2 = grade; /* Save value of grade for future use */
347 if sample=2 then grade=. ;
348 /* Response variable is now missing for replication sample.
349 But it is preserved in grade2 */
350
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
486 at 343:21 21 at 343:27 4 at 343:34 65 at 343:41
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: There were 579 observations read from the data set WORK.REPLIC.
NOTE: The data set WORK.PREDICT has 582 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1820.68k
OS Memory 32944.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 126 Switch Count 2
Page Faults 0
Page Reclaims 132
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
351 proc reg plots = none data = predict;
352 /* data = predict is optional: proc reg defaults to the most recently created data table */
353 title3 'Model H: hsgpa hscalc hsengl totscore: R-sq = 0.4532';
354 model grade = hsgpa hscalc hsengl totscore;
355 output out = predataH predicted = Yhat
356 L95 = lowpred
357 U95 = hipred;
358 /* Data table predataH has everything in predict plus
359 Yhat and the lower and upper 95% prediction limits. */
360
NOTE: The data set WORK.PREDATAH has 582 observations and 40 variables.
NOTE: PROCEDURE REG used (Total process time):
real time 0.04 seconds
user cpu time 0.05 seconds
system cpu time 0.00 seconds
memory 3133.56k
OS Memory 34244.00k
Timestamp 02/01/2020 08:40:44 PM
Step Count 127 Switch Count 4
Page Faults 0
Page Reclaims 344
Page Swaps 0
Voluntary Context Switches 40
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 576
361 proc print;
362 title3 'Look at predictions for the replication sample';
363 var id sample grade2 Yhat lowpred hipred;
364 where sample = 2;
365 /* Should predicted marks be used to advise students? */
366
367 /* Does 95 Percent Prediction Interval really contain 95 percent of grades?
368 Recall that the data fail all tests for normality, and the prediction
369 intervals are based on normal theory. */
370
NOTE: There were 293 observations read from the data set WORK.PREDATAH.
WHERE sample=2;
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.41 seconds
user cpu time 0.42 seconds
system cpu time 0.00 seconds
memory 1754.50k
OS Memory 32940.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 128 Switch Count 6
Page Faults 0
Page Reclaims 172
Page Swaps 0
Voluntary Context Switches 27
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 248
371 data predictB;
372 set predataH;
373 if (lowpred < grade2 < hipred) then ininterval='Yes';
374 else ininterval='No';
375
NOTE: There were 582 observations read from the data set WORK.PREDATAH.
NOTE: The data set WORK.PREDICTB has 582 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1349.25k
OS Memory 33196.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 129 Switch Count 2
Page Faults 0
Page Reclaims 99
Page Swaps 0
Voluntary Context Switches 14
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
376 proc freq;
377 title3 'Does 95 Percent Prediction Interval Work?';
378 tables sample * ininterval / nocol nopercent;
379
380 /* Keep trying. Try to predict letter grade. */
381
NOTE: There were 582 observations read from the data set WORK.PREDICTB.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 1396.90k
OS Memory 33456.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 130 Switch Count 5
Page Faults 0
Page Reclaims 219
Page Swaps 0
Voluntary Context Switches 38
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 536
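The question in title3 ('Does 95 Percent Prediction Interval Work?') can be given a rough formal answer by testing whether the coverage rate in the replication sample differs from 0.95. This is a sketch under assumptions, not part of the original program. It uses the data table predictB and the variable ininterval created above, and the binomial option of proc freq.

```sas
/* Sketch only: tests H0: coverage = 0.95 in the replication sample. */
proc freq data = predictB;
   where sample = 2;
   /* Proportion of ininterval = 'Yes', compared with the nominal 0.95 */
   tables ininterval / binomial(level = 'Yes' p = 0.95);
   exact binomial;
run;
```

The where statement restricts the test to sample = 2 because those are the cases whose grades were hidden from the regression. Coverage in the exploratory sample is less informative, since those grades were used to fit the model.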
382 data predictC;
383 set predictB;
384 if 80 <= grade2 <= 100 then lgrade = 'A';
385 else if 70 <= grade2 <= 79 then lgrade = 'B';
386 else if 60 <= grade2 <= 69 then lgrade = 'C';
387 else if 50 <= grade2 <= 59 then lgrade = 'D';
388 else if 0 <= grade2 <= 49 then lgrade = 'F';
389 label lgrade = 'Letter Grade';
390 pregrade = round(Yhat);
391 if 80 <= pregrade <= 100 then prelgrade = 'A';
392 else if 70 <= pregrade <= 79 then prelgrade = 'B';
393 else if 60 <= pregrade <= 69 then prelgrade = 'C';
394 else if 50 <= pregrade <= 59 then prelgrade = 'D';
395 else if 0 <= pregrade <= 49 then prelgrade = 'F';
396 label prelgrade = 'Predicted Letter Grade';
397
NOTE: There were 582 observations read from the data set WORK.PREDICTB.
NOTE: The data set WORK.PREDICTC has 582 observations and 44 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1368.81k
OS Memory 33196.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 131 Switch Count 2
Page Faults 0
Page Reclaims 102
Page Swaps 0
Voluntary Context Switches 17
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
398 proc freq;
399 title3 'Accuracy of predicting Letter Grades From Model H';
400 tables sample*prelgrade*lgrade / nocol nopercent;
401 /* Will yield separate table for each sample. */
402
403 /* Predict grade for a new student with hsgpa=80 hscalc=90 hsengl=70
404 totscore=15. For just a prediction (no interval), proc glm is easier. */
405
NOTE: There were 582 observations read from the data set WORK.PREDICTC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.08 seconds
user cpu time 0.09 seconds
system cpu time 0.00 seconds
memory 1345.62k
OS Memory 33456.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 132 Switch Count 5
Page Faults 0
Page Reclaims 213
Page Swaps 0
Voluntary Context Switches 38
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 560
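The three-way table above shows where predicted and actual letter grades agree, but not the overall hit rate. A minimal sketch of one way to get it, assuming the data table predictC and the variables lgrade and prelgrade created above. The names agree and match are illustrative and not in the original program.

```sas
/* Sketch only: agree and match are illustrative names. */
data agree;
   set predictC;
   /* Skip any case where either letter grade could not be assigned */
   if lgrade ne ' ' and prelgrade ne ' ';
   match = (lgrade = prelgrade);   /* 1 if predicted letter equals actual, else 0 */
run;

proc means data = agree n mean;
   class sample;
   var match;
run;
```

The mean of match within each level of sample is the proportion of letter grades predicted exactly. A blank prelgrade can occur if a rounded prediction falls outside the range 0 to 100, which is why those cases are dropped first.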
406 proc glm data = explore;
407 model grade = hsgpa hscalc hsengl totscore;
408 estimate 'New Student' intercept 1 hsgpa 80 hscalc 90 hsengl 70
409 totscore 15;
410
411 /* Prediction for Y_{n+1} is the same as estimate of E[Y|X]. CI from proc glm
412 is for E[Y|X]. PREDICTION interval for Y_{n+1} is wider. */
413
NOTE: PROCEDURE GLM used (Total process time):
real time 0.09 seconds
user cpu time 0.10 seconds
system cpu time 0.00 seconds
memory 1961.28k
OS Memory 33976.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 133 Switch Count 2
Page Faults 0
Page Reclaims 221
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 312
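The comment about the confidence interval and the wider prediction interval can be made concrete with the standard normal-theory formulas. The notation here is an assumption, not taken from the program: x is the new student's vector of predictor values with a leading 1, X is the design matrix from the exploratory sample, MSE is the mean squared error, n is the number of complete cases, and p = 5 is the number of regression coefficients in Model H.

```latex
% Confidence interval for E[Y|x]
\hat{y} \pm t_{0.975,\,n-p}\,\sqrt{MSE \cdot x^\top (X^\top X)^{-1} x}

% Prediction interval for Y_{n+1}
\hat{y} \pm t_{0.975,\,n-p}\,\sqrt{MSE \cdot \left(1 + x^\top (X^\top X)^{-1} x\right)}
```

Both intervals are centred at the same value, and the extra 1 under the square root accounts for the error term of the new observation. That term does not shrink as n grows, so the prediction interval stays wide even in a large sample.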
414 data student;
415 hsgpa=80; hscalc=90; hsengl=70; totscore=15; id = -27;
416
NOTE: The data set WORK.STUDENT has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 609.25k
OS Memory 33192.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 134 Switch Count 2
Page Faults 0
Page Reclaims 92
Page Swaps 0
Voluntary Context Switches 14
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
417 data together;
418 set explore student;
419 /* Variables not assigned in student (including grade) will be missing for the observation with id = -27 */
420
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: There were 1 observations read from the data set WORK.STUDENT.
NOTE: The data set WORK.TOGETHER has 580 observations and 35 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1624.78k
OS Memory 33712.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 135 Switch Count 2
Page Faults 0
Page Reclaims 132
Page Swaps 0
Voluntary Context Switches 15
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
421 proc reg plots = none;
422 title3 'Model H: hsgpa hscalc hsengl totscore: R-sq = 0.4532';
423 model grade = hsgpa hscalc hsengl totscore;
424 output out = guess predicted = PredictedY
425 L95 = LowerLimit
426 U95 = UpperLimit;
427
NOTE: The data set WORK.GUESS has 580 observations and 38 variables.
NOTE: PROCEDURE REG used (Total process time):
real time 0.05 seconds
user cpu time 0.05 seconds
system cpu time 0.00 seconds
memory 3151.62k
OS Memory 35524.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 136 Switch Count 4
Page Faults 0
Page Reclaims 340
Page Swaps 0
Voluntary Context Switches 30
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 592
428 data newguess;
429 set guess;
430 if id < 0; /* Keep only the new student (id = -27); discard all other cases */
431
NOTE: There were 580 observations read from the data set WORK.GUESS.
NOTE: The data set WORK.NEWGUESS has 1 observations and 38 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1184.46k
OS Memory 33964.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 137 Switch Count 2
Page Faults 0
Page Reclaims 122
Page Swaps 0
Voluntary Context Switches 15
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
432 proc print;
433 title3 'hsgpa=80 hscalc=90 hsengl=70 totscore=15';
434 var predictedY LowerLimit UpperLimit;
435 quit;
NOTE: There were 1 observations read from the data set WORK.NEWGUESS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 772.15k
OS Memory 33704.00k
Timestamp 02/01/2020 08:40:45 PM
Step Count 138 Switch Count 1
Page Faults 0
Page Reclaims 72
Page Swaps 0
Voluntary Context Switches 9
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8