1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
70
71         /* MathReg3.sas */
72         %include '/home/brunner0/441s20/readexplor.sas';
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              244.34k
      OS Memory           30884.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        119  Switch Count  0
      Page Faults                       0
      Page Reclaims                     29
      Page Swaps                        0
      Voluntary Context Switches        0
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           32
189        /* Creates data table explore */
190        %include '/home/brunner0/441s20/readreplic.sas';
NOTE: The infile '/home/brunner0/441s20/exploremath.data.txt' is:
      Filename=/home/brunner0/441s20/exploremath.data.txt,
      Owner Name=brunner0,Group Name=oda,
      Access Permission=-rw-r--r--,
      Last Modified=26Jan2020:18:49:34,
      File Size (bytes)=44583
NOTE: 579 records were read from the infile '/home/brunner0/441s20/exploremath.data.txt'.
      The minimum record length was 75.
      The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      99 at 98:24    99 at 134:18
NOTE: The data set WORK.EXPLORE has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1014.21k
      OS Memory           31656.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        120  Switch Count  3
      Page Faults                       0
      Page Reclaims                     158
      Page Swaps                        0
      Voluntary Context Switches        25
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              273.84k
      OS Memory           31396.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        121  Switch Count  0
      Page Faults                       0
      Page Reclaims                     14
      Page Swaps                        0
      Voluntary Context Switches        0
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
303        /* Creates data table replic */
304        title2 'Predict Grade for Replication Sample';
305
306        /* Plan:
307
308        1. Non-obvious findings from the exploration (based on Model I,
309           which predicts grade from hsgpa hscalc hsengl totscore mtongue) were
310              a. HS Engl neg
311              b. mtongue neg
312              c. totscore positive (diagnostic test matters)
313           Test these on the replication with a Bonferroni correction for 3 tests.
314           The other two results (HS GPA and HS Calculus) were obvious.
315
316        2. See if prediction intervals work as advertised for Model H, which
317           predicts grade from hsgpa hscalc hsengl totscore.
318
319        3. Compare prediction of letter grade for the models with and without
320           the diagnostic test.
321
322
323        First, just illustrate use of different data tables in the same run. */
324
NOTE: The infile '/home/brunner0/441s20/replicmath.data.txt' is:
      Filename=/home/brunner0/441s20/replicmath.data.txt,
      Owner Name=brunner0,Group Name=oda,
      Access Permission=-rw-r--r--,
      Last Modified=26Jan2020:18:49:13,
      File Size (bytes)=38214
NOTE: 579 records were read from the infile '/home/brunner0/441s20/replicmath.data.txt'.
      The minimum record length was 64.
      The maximum record length was 64.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      81 at 218:24   81 at 254:18
NOTE: The data set WORK.REPLIC has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              924.90k
      OS Memory           31656.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        122  Switch Count  3
      Page Faults                       0
      Page Reclaims                     92
      Page Swaps                        0
      Voluntary Context Switches        22
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
325        proc freq data = explore;
326             title3 'Exploratory Sample';
327             tables outcome;
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.03 seconds
      user cpu time       0.03 seconds
      system cpu time     0.00 seconds
      memory              2760.93k
      OS Memory           32172.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        123  Switch Count  3
      Page Faults                       0
      Page Reclaims                     211
      Page Swaps                        0
      Voluntary Context Switches        16
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           272
328        proc freq data = replic;
329             title3 'Replication Sample';
330             tables outcome;
331
332        /* Now test the three findings: Point 1 above */
333
NOTE: There were 579 observations read from the data set WORK.REPLIC.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              977.12k
      OS Memory           32428.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        124  Switch Count  3
      Page Faults                       0
      Page Reclaims                     185
      Page Swaps                        0
      Voluntary Context Switches        21
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264
334        proc reg data = replic plots = none;
335             title3 'Try to replicate HS Engl neg, mtongue neg, totscore pos';
336             title4 'with a Bonferroni correction (check p < 0.05/3 = 0.01666667)';
337             model grade = hsgpa hscalc hsengl totscore mtongue;
338
339        /* Make combined data table, look at prediction intervals: Point 2 */
340
NOTE: PROCEDURE REG used (Total process time):
      real time           0.05 seconds
      user cpu time       0.05 seconds
      system cpu time     0.01 seconds
      memory              2711.15k
      OS Memory           33984.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        125  Switch Count  3
      Page Faults                       0
      Page Reclaims                     297
      Page Swaps                        0
      Voluntary Context Switches        32
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           64
341        data predict;
342             set explore replic;
343             keeper = grade+hsgpa+hscalc+hsengl+totscore;
344             /* keeper will be missing if any of the vars are missing */
345             if keeper ne .; /* Discards all other cases */
346             grade2 = grade; /* Save value of grade for future use */
347             if sample=2 then grade=. ;
348             /* Response variable is now missing for replication sample.
349                But it is preserved in grade2 */
350
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      486 at 343:21   21 at 343:27    4 at 343:34     65 at 343:41
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: There were 579 observations read from the data set WORK.REPLIC.
NOTE: The data set WORK.PREDICT has 582 observations and 37 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1820.68k
      OS Memory           32944.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        126  Switch Count  2
      Page Faults                       0
      Page Reclaims                     132
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
351        proc reg plots = none data = predict;
352             /* Data table predict is the default anyway */
353             title3 'Model H: hsgpa hscalc hsengl totscore: R-sq = 0.4532';
354             model grade = hsgpa hscalc hsengl totscore;
355             output out = predataH predicted = Yhat
356                                   L95 = lowpred
357                                   U95 = hipred;
358             /* Data table predataH has everything in predict plus
359                Yhat and the lower and upper 95% prediction limits. */
360
NOTE: The data set WORK.PREDATAH has 582 observations and 40 variables.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.04 seconds
      user cpu time       0.05 seconds
      system cpu time     0.00 seconds
      memory              3133.56k
      OS Memory           34244.00k
      Timestamp           02/01/2020 08:40:44 PM
      Step Count                        127  Switch Count  4
      Page Faults                       0
      Page Reclaims                     344
      Page Swaps                        0
      Voluntary Context Switches        40
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           576
361        proc print;
362             title3 'Look at predictions for the replication sample';
363             var id sample grade2 Yhat lowpred hipred;
364             where sample = 2;
365             /* Should predicted marks be used to advise students? */
366
367        /* Does 95 Percent Prediction Interval really contain 95 percent of grades?
368           Recall that the data fail all tests for normality, and the prediction
369           intervals are based on normal theory. */
370
NOTE: There were 293 observations read from the data set WORK.PREDATAH.
      WHERE sample=2;
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.41 seconds
      user cpu time       0.42 seconds
      system cpu time     0.00 seconds
      memory              1754.50k
      OS Memory           32940.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        128  Switch Count  6
      Page Faults                       0
      Page Reclaims                     172
      Page Swaps                        0
      Voluntary Context Switches        27
      Involuntary Context Switches      1
      Block Input Operations            0
      Block Output Operations           248
371        data predictB;
372             set predataH;
373             if (lowpred < grade2 < hipred) then ininterval='Yes';
374             else ininterval='No';
375
NOTE: There were 582 observations read from the data set WORK.PREDATAH.
NOTE: The data set WORK.PREDICTB has 582 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1349.25k
      OS Memory           33196.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        129  Switch Count  2
      Page Faults                       0
      Page Reclaims                     99
      Page Swaps                        0
      Voluntary Context Switches        14
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
376        proc freq;
377             title3 'Does 95 Percent Prediction Interval Work?';
378             tables sample * ininterval / nocol nopercent;
379
380        /* Keep trying. Try to predict letter grade. */
381
NOTE: There were 582 observations read from the data set WORK.PREDICTB.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              1396.90k
      OS Memory           33456.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        130  Switch Count  5
      Page Faults                       0
      Page Reclaims                     219
      Page Swaps                        0
      Voluntary Context Switches        38
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           536
382        data predictC;
383             set predictB;
384             if 80 <= grade2 <= 100 then lgrade = 'A';
385                else if 70 <= grade2 <= 79 then lgrade = 'B';
386                else if 60 <= grade2 <= 69 then lgrade = 'C';
387                else if 50 <= grade2 <= 59 then lgrade = 'D';
388                else if 0 <= grade2 <= 49 then lgrade = 'F';
389             label lgrade = 'Letter Grade';
390             pregrade = round(Yhat);
391             if 80 <= pregrade <= 100 then prelgrade = 'A';
392                else if 70 <= pregrade <= 79 then prelgrade = 'B';
393                else if 60 <= pregrade <= 69 then prelgrade = 'C';
394                else if 50 <= pregrade <= 59 then prelgrade = 'D';
395                else if 0 <= pregrade <= 49 then prelgrade = 'F';
396             label prelgrade = 'Predicted Letter Grade';
397
NOTE: There were 582 observations read from the data set WORK.PREDICTB.
NOTE: The data set WORK.PREDICTC has 582 observations and 44 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1368.81k
      OS Memory           33196.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        131  Switch Count  2
      Page Faults                       0
      Page Reclaims                     102
      Page Swaps                        0
      Voluntary Context Switches        17
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
398        proc freq;
399             title3 'Accuracy of predicting Letter Grades From Model H';
400             tables sample*prelgrade*lgrade / nocol nopercent;
401             /* Will yield separate table for each sample. */
402
403        /* Predict grade for a new student with hsgpa=80 hscalc=90 hsengl=70
404           totscore=15. For just a prediction (no interval), proc glm is easier. */
405
NOTE: There were 582 observations read from the data set WORK.PREDICTC.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.08 seconds
      user cpu time       0.09 seconds
      system cpu time     0.00 seconds
      memory              1345.62k
      OS Memory           33456.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        132  Switch Count  5
      Page Faults                       0
      Page Reclaims                     213
      Page Swaps                        0
      Voluntary Context Switches        38
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           560
406        proc glm data = explore;
407             model grade = hsgpa hscalc hsengl totscore;
408             estimate 'New Student' intercept 1 hsgpa 80 hscalc 90 hsengl 70
409                                    totscore 15;
410
411        /* Prediction for Y_{n+1} is the same as estimate of E[Y|X]. CI from proc glm
412           is for E[Y|X]. PREDICTION interval for Y_{n+1} is wider. */
413
NOTE: PROCEDURE GLM used (Total process time):
      real time           0.09 seconds
      user cpu time       0.10 seconds
      system cpu time     0.00 seconds
      memory              1961.28k
      OS Memory           33976.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        133  Switch Count  2
      Page Faults                       0
      Page Reclaims                     221
      Page Swaps                        0
      Voluntary Context Switches        13
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           312
414        data student;
415             hsgpa=80; hscalc=90; hsengl=70; totscore=15; id = -27;
416
NOTE: The data set WORK.STUDENT has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              609.25k
      OS Memory           33192.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        134  Switch Count  2
      Page Faults                       0
      Page Reclaims                     92
      Page Swaps                        0
      Voluntary Context Switches        14
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264
417        data together;
418             set explore student;
419             /* All variables not assigned will be missing for observation -27 */
420
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: There were 1 observations read from the data set WORK.STUDENT.
NOTE: The data set WORK.TOGETHER has 580 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1624.78k
      OS Memory           33712.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        135  Switch Count  2
      Page Faults                       0
      Page Reclaims                     132
      Page Swaps                        0
      Voluntary Context Switches        15
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520
421        proc reg plots = none;
422             title3 'Model H: hsgpa hscalc hsengl totscore: R-sq = 0.4532';
423             model grade = hsgpa hscalc hsengl totscore;
424             output out = guess predicted = PredictedY
425                                L95 = LowerLimit
426                                U95 = UpperLimit;
427
NOTE: The data set WORK.GUESS has 580 observations and 38 variables.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.05 seconds
      user cpu time       0.05 seconds
      system cpu time     0.00 seconds
      memory              3151.62k
      OS Memory           35524.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        136  Switch Count  4
      Page Faults                       0
      Page Reclaims                     340
      Page Swaps                        0
      Voluntary Context Switches        30
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           592
428        data newguess;
429             set guess;
430             if id < 0; /* Discard all other cases */
431
NOTE: There were 580 observations read from the data set WORK.GUESS.
NOTE: The data set WORK.NEWGUESS has 1 observations and 38 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1184.46k
      OS Memory           33964.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        137  Switch Count  2
      Page Faults                       0
      Page Reclaims                     122
      Page Swaps                        0
      Voluntary Context Switches        15
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264
432        proc print;
433             title3 'hsgpa=80 hscalc=90 hsengl=70 totscore=15';
434             var predictedY LowerLimit UpperLimit;
435        quit;
NOTE: There were 1 observations read from the data set WORK.NEWGUESS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              772.15k
      OS Memory           33704.00k
      Timestamp           02/01/2020 08:40:45 PM
      Step Count                        138  Switch Count  1
      Page Faults                       0
      Page Reclaims                     72
      Page Swaps                        0
      Voluntary Context Switches        9
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           8
436
437
438
439
440
441
442
443
444
445        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
456
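
Point 2 of the plan asks whether the 95% prediction intervals contain about 95% of the replication grades, and the run answers it with a cross-tabulation of sample by ininterval (log lines 376-378). The same check can be expressed as a single proportion. What follows is a minimal sketch, not part of the original program. It assumes the data table predictB from this run is still in the WORK library, and the names coverage and covered are hypothetical.

```sas
/* Sketch only: empirical coverage of the 95% prediction interval
   in the replication sample (sample = 2). The data table name
   coverage and the variable name covered are hypothetical. */
data coverage;
    set predictB;
    where sample = 2;                       /* replication cases only */
    covered = (lowpred < grade2 < hipred);  /* 1 if grade is inside the interval, else 0 */
run;

proc means data = coverage n mean;
    title3 'Proportion of replication grades inside the 95% interval';
    var covered;
run;
```

If the intervals behave as advertised, the mean of covered should be close to 0.95. The comparison uses the same strict inequalities as log line 373, so it should agree with the 'Yes' row of the frequency table for sample 2.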