1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 /* MathReg3.sas */
74 %include '/home/u1407221/441s24/SAS08/ReadLabelMath2.sas';
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT has been output.
NOTE: Format NCFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 327.93k
OS Memory 25252.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 24 Switch Count 2
Page Faults 0
Page Reclaims 101
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 56
NOTE: The infile '/home/u1407221/441s24/data/math.data.txt' is:
Filename=/home/u1407221/441s24/data/math.data.txt,
Owner Name=u1407221,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=10Feb2024:16:04:10,
File Size (bytes)=90324
NOTE: 1158 records were read from the infile '/home/u1407221/441s24/data/math.data.txt'.
The minimum record length was 76.
The maximum record length was 76.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
180 at 120:24
NOTE: The data set WORK.MATH has 1158 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1176.34k
OS Memory 26536.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 25 Switch Count 3
Page Faults 0
Page Reclaims 281
Page Swaps 0
Voluntary Context Switches 23
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 776
NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.REPLIC has 579 observations and 37 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1412.75k
OS Memory 26924.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 26 Switch Count 2
Page Faults 0
Page Reclaims 157
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 528
NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.EXPLORE has 579 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1409.87k
OS Memory 26924.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 27 Switch Count 2
Page Faults 0
Page Reclaims 131
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
235 title2 'Replication and Prediction';
236
237 /* Plan:
238
239 1. Non-obvious findings from the exploratory analysis (based on Model I,
240 which predicts grade from hsgpa hscalc hsengl totscore mtongue) were
241 a. HS Engl negative
242 b. mtongue negative
243 c. totscore positive (diagnostic test matters controlling for HS)
244 Test these on the replication with a Bonferroni correction for 3 tests.
245 The other two results (HS GPA and HS Calculus) were obvious.
246
247 2. See if prediction intervals work as advertised on replication data.
248
249 3. Try prediction of letter grade.
250
251 4. Try predictions of grade for some imaginary students.
252 */
253 /* Test the three findings: Point 1 above */
254
255 proc reg data = replic plots = none;
256 title3 'Try to replicate HS Engl neg, mtongue neg, totscore pos';
257 title4 'with a Bonferroni correction (check p < 0.05/3 = 0.01666667)';
258 model grade = hsgpa hscalc hsengl totscore mtongue;
259
260 /* Point 2: Look at prediction intervals */
261
NOTE: PROCEDURE REG used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.02 seconds
memory 5302.25k
OS Memory 30400.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 28 Switch Count 3
Page Faults 0
Page Reclaims 1745
Page Swaps 0
Voluntary Context Switches 29
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 56
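/* Sketch only, not part of the logged run: one possible way to apply the
   Bonferroni check from Point 1 without reading p-values off the listing.
   It assumes the PROC REG ODS table ParameterEstimates (columns Variable,
   Estimate, tValue, Probt). The data set names parms and bonf are
   hypothetical. Probt is the two-sided p-value, so the sign of Estimate
   still has to be checked against the predicted direction. */
ods output ParameterEstimates = parms;
proc reg data = replic plots = none;
     model grade = hsgpa hscalc hsengl totscore mtongue;
run;
quit;

data bonf;
     set parms;
     /* Keep only the three non-obvious findings being replicated */
     where upcase(Variable) in ('HSENGL', 'MTONGUE', 'TOTSCORE');
     alpha_bonf  = 0.05 / 3;                /* Bonferroni cutoff for 3 tests */
     significant = (Probt < alpha_bonf);    /* 1 = significant, 0 = not */
run;

proc print data = bonf;
     var Variable Estimate tValue Probt alpha_bonf significant;
run;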
262 data predict;
263 set math; /* Combined data set */
264 keeper = grade+hsgpa+hscalc+hsengl+totscore+mtongue;
265 /* keeper will be missing if any of the vars are missing */
266 if keeper ne .; /* Keep complete cases only; cases with any missing variable are dropped */
267 grade2 = grade; /* Save value of grade for future use */
268 if sample=2 then grade=. ;
269 /* Response variable is now missing for replication sample.
270 But it is preserved in grade2 */
271
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
486 at 264:21 21 at 264:27 4 at 264:34 65 at 264:41 7 at 264:50
NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.PREDICT has 575 observations and 39 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1424.53k
OS Memory 30252.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 29 Switch Count 2
Page Faults 0
Page Reclaims 246
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 536
272 proc reg plots=none data=predict;
273 /* predict is the most recently created data set, so it is the default anyway */
274 title3 'Re-running Model I to generate y-hat and prediction intervals';
275 model grade = hsgpa hscalc hsengl totscore mtongue;
276 output out = predataI predicted = Yhat
277 L95 = lowpred
278 U95 = hipred;
279 /* Data set predataI has everything in predict plus
280 Yhat and the lower and upper 95% prediction limits. */
281
282 /* Does the 95 percent prediction interval really contain 95 percent of grades?
283 Recall that the data fail all tests for normality, and the prediction
284 intervals are based on normal theory. */
285
NOTE: The data set WORK.PREDATAI has 575 observations and 42 variables.
NOTE: PROCEDURE REG used (Total process time):
real time 0.04 seconds
user cpu time 0.04 seconds
system cpu time 0.00 seconds
memory 3192.81k
OS Memory 31940.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 30 Switch Count 4
Page Faults 0
Page Reclaims 448
Page Swaps 0
Voluntary Context Switches 34
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 584
286 data predictB;
287 set predataI;
288 if (lowpred < grade2 < hipred) then ininterval='Yes';
289 else ininterval='No';
290
NOTE: There were 575 observations read from the data set WORK.PREDATAI.
NOTE: The data set WORK.PREDICTB has 575 observations and 43 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 1304.93k
OS Memory 30636.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 31 Switch Count 2
Page Faults 0
Page Reclaims 119
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
291 proc freq data=predictB;
292 title3 'Does 95 Percent Prediction Interval Work?';
293 tables sample * ininterval / nocol nopercent;
294
NOTE: There were 575 observations read from the data set WORK.PREDICTB.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 1489.90k
OS Memory 30896.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 32 Switch Count 5
Page Faults 0
Page Reclaims 408
Page Swaps 0
Voluntary Context Switches 31
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 544
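/* Sketch only, not part of the logged run: the coverage question above can
   also be answered as a single proportion per sample. This assumes the data
   set predataI produced by the preceding PROC REG step. The names coverage
   and covered are hypothetical. The mean of a 0/1 indicator is the fraction
   of grades falling inside their 95% prediction interval. */
data coverage;
     set predataI;
     covered = (lowpred < grade2 < hipred);   /* 1 if inside, 0 otherwise */
run;

proc means data = coverage n mean;
     class sample;       /* sample 2 is the replication sample */
     var covered;
run;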
295 proc print data=predataI;
296 title3 'Look at predictions for the replication sample';
297 var id sample grade2 Yhat lowpred hipred;
298 where sample = 2;
299 /* Should predicted marks be used to advise students? */
300
301 /* Keep trying. Try to predict letter grade. */
302
NOTE: There were 288 observations read from the data set WORK.PREDATAI.
WHERE sample=2;
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.31 seconds
user cpu time 0.31 seconds
system cpu time 0.00 seconds
memory 4210.28k
OS Memory 34220.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 33 Switch Count 6
Page Faults 0
Page Reclaims 1040
Page Swaps 0
Voluntary Context Switches 23
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 304
303 data predictC;
304 set predictB;
305 if 80 <= grade2 <= 100 then lgrade = 'A';
306 else if 70 <= grade2 <= 79 then lgrade = 'B';
307 else if 60 <= grade2 <= 69 then lgrade = 'C';
308 else if 50 <= grade2 <= 59 then lgrade = 'D';
309 else if 0 <= grade2 <= 49 then lgrade = 'F';
310 label lgrade = 'Letter Grade';
311 pregrade = round(Yhat);
312 if 80 <= pregrade <= 100 then prelgrade = 'A';
313 else if 70 <= pregrade <= 79 then prelgrade = 'B';
314 else if 60 <= pregrade <= 69 then prelgrade = 'C';
315 else if 50 <= pregrade <= 59 then prelgrade = 'D';
316 else if 0 <= pregrade <= 49 then prelgrade = 'F';
317 label prelgrade = 'Predicted Letter Grade';
318
NOTE: There were 575 observations read from the data set WORK.PREDICTB.
NOTE: The data set WORK.PREDICTC has 575 observations and 46 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1326.59k
OS Memory 34220.00k
Timestamp 02/24/2024 05:11:05 PM
Step Count 34 Switch Count 2
Page Faults 0
Page Reclaims 125
Page Swaps 0
Voluntary Context Switches 15
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 520
319 proc freq data=predictC;
320 title3 'Accuracy of predicting Letter Grades From Model I';
321 tables sample*prelgrade*lgrade / nocol nopercent;
322 /* Will yield separate table for each sample. */
323
324 /* Predict grade for two new students with hsgpa=80 hscalc=90 mtongue=1
325 totscore=15, and hsengl=70 or 0. For just a prediction (no interval), proc glm is easier. */
326
NOTE: There were 575 observations read from the data set WORK.PREDICTC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.07 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 1375.50k
OS Memory 34480.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 35 Switch Count 5
Page Faults 0
Page Reclaims 203
Page Swaps 0
Voluntary Context Switches 36
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 552
327 proc glm data = explore;
328 model grade = hsgpa hscalc hsengl mtongue totscore;
329 estimate 'New Student 1' intercept 1 hsgpa 80 hscalc 90 hsengl 70
330 mtongue 1 totscore 15;
331 estimate 'New Student 2' intercept 1 hsgpa 80 hscalc 90 hsengl 0
332 mtongue 1 totscore 15;
333
334 /* Prediction for Y_{n+1} is the same as estimate of E[Y|X]. CI from proc glm
335 is for E[Y|X]. PREDICTION interval for Y_{n+1} is wider. */
336
NOTE: PROCEDURE GLM used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.00 seconds
memory 2030.46k
OS Memory 35000.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 36 Switch Count 2
Page Faults 0
Page Reclaims 357
Page Swaps 0
Voluntary Context Switches 14
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 312
337 data student;
338 hsgpa=80; hscalc=90; hsengl=70; mtongue=1; totscore=15; id = -1; output;
339 hsgpa=80; hscalc=90; hsengl=0; mtongue=1; totscore=15; id = -2; output;
NOTE: The data set WORK.STUDENT has 2 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 670.03k
OS Memory 33960.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 37 Switch Count 2
Page Faults 0
Page Reclaims 92
Page Swaps 0
Voluntary Context Switches 14
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
340 proc print data=student;
341
NOTE: There were 2 observations read from the data set WORK.STUDENT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 667.25k
OS Memory 33960.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 38 Switch Count 0
Page Faults 0
Page Reclaims 69
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 24
342 data together;
343 set explore student;
344 /* All variables not assigned will be missing for new observations */
345
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: There were 2 observations read from the data set WORK.STUDENT.
NOTE: The data set WORK.TOGETHER has 581 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1602.56k
OS Memory 34480.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 39 Switch Count 2
Page Faults 0
Page Reclaims 136
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 528
346 proc reg noprint data=together;
347 title3 'Fit Model I to predict new student data';
348 model grade = hsgpa hscalc hsengl mtongue totscore;
349 output out = guess predicted = PredictedY
350 L95 = LowerLimit
351 U95 = UpperLimit;
352
NOTE: The data set WORK.GUESS has 581 observations and 31 variables.
NOTE: PROCEDURE REG used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 2671.65k
OS Memory 35780.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 40 Switch Count 4
Page Faults 0
Page Reclaims 299
Page Swaps 0
Voluntary Context Switches 39
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 568
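/* Sketch only, not part of the logged run: to see that the prediction
   interval for Y_{n+1} is wider than the confidence interval for E[Y|X],
   both can be requested in one OUTPUT statement. L95M/U95M are the PROC REG
   keywords for the limits on the mean; L95/U95 are the limits for an
   individual prediction. The data set name both and the four limit variable
   names are hypothetical. */
proc reg noprint data = together;
     model grade = hsgpa hscalc hsengl mtongue totscore;
     output out = both predicted = PredictedY
            L95M = LowerMean  U95M = UpperMean    /* CI for E[Y|X] */
            L95  = LowerPred  U95  = UpperPred;   /* PI for Y_{n+1} */
run;
quit;

proc print data = both;
     where id < 0;    /* the two imaginary students only */
     var id PredictedY LowerMean UpperMean LowerPred UpperPred;
run;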
353 data newguess;
354 set guess;
355 if id < 0; /* Discard all other cases */
356
NOTE: There were 581 observations read from the data set WORK.GUESS.
NOTE: The data set WORK.NEWGUESS has 2 observations and 31 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1163.09k
OS Memory 34220.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 41 Switch Count 2
Page Faults 0
Page Reclaims 130
Page Swaps 0
Voluntary Context Switches 14
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
357 proc print data=newguess;
358 title3 'Prediction intervals for new students';
359 var id hsgpa hscalc hsengl totscore predictedY LowerLimit UpperLimit;
360
361 quit;
NOTE: There were 2 observations read from the data set WORK.NEWGUESS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.01 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 687.75k
OS Memory 33960.00k
Timestamp 02/24/2024 05:11:06 PM
Step Count 42 Switch Count 1
Page Faults 0
Page Reclaims 67
Page Swaps 0
Voluntary Context Switches 9
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8