1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
73
74         /* mathlogreg2.sas */
75         %include '/home/u1407221/441s24/SAS08/ReadLabelMath2.sas';
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT has been output.
NOTE: Format NCFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              302.78k
      OS Memory           25252.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        24  Switch Count  2
      Page Faults                       0
      Page Reclaims                     89
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           56

NOTE: The infile '/home/u1407221/441s24/data/math.data.txt' is:
      Filename=/home/u1407221/441s24/data/math.data.txt,
      Owner Name=u1407221,Group Name=oda,
      Access Permission=-rw-r--r--,
      Last Modified=10Feb2024:17:04:10,
      File Size (bytes)=90324

NOTE: 1158 records were read from the infile '/home/u1407221/441s24/data/math.data.txt'.
      The minimum record length was 76.
      The maximum record length was 76.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      180 at 121:24
NOTE: The data set WORK.MATH has 1158 observations and 37 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1176.21k
      OS Memory           26536.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        25  Switch Count  3
      Page Faults                       0
      Page Reclaims                     283
      Page Swaps                        0
      Voluntary Context Switches        23
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           776

NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.REPLIC has 579 observations and 37 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1412.81k
      OS Memory           26924.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        26  Switch Count  2
      Page Faults                       0
      Page Reclaims                     152
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520

NOTE: There were 1158 observations read from the data set WORK.MATH.
NOTE: The data set WORK.EXPLORE has 579 observations and 28 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1408.28k
      OS Memory           26924.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        27  Switch Count  2
      Page Faults                       0
      Page Reclaims                     132
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520

236
237        title2 'Predict Passing the course (Y-N) with Logistic Regression';
238
239        /* We know course is useful:
240           c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite' */
241
242        proc logistic data = explore;
243             title3 'Course and HS variables';
244             model passed (event='Yes') = c1 c3 hsgpa hscalc hsengl;
245             course: test c1=c3=0;
246             HSvars: test hsgpa=hscalc=hsengl=0;
247             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.09 seconds
      user cpu time       0.09 seconds
      system cpu time     0.00 seconds
      memory              5277.37k
      OS Memory           30648.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        28  Switch Count  1
      Page Faults                       0
      Page Reclaims                     1896
      Page Swaps                        0
      Voluntary Context Switches        6
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           56

248
249        /* Decision: Drop course */
250
251        ods select ParameterEstimates; /* Limit the output */
252        proc logistic data = explore;
253             title3 'Just HS variables';
254             model passed (event='Yes') = hsgpa hscalc hsengl;
255             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              2338.65k
      OS Memory           30904.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        29  Switch Count  1
      Page Faults                       0
      Page Reclaims                     257
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           48

256
257        /* Decision: Drop HS English.
258           Does the diagnostic test add anything? */
259
260        ods select ParameterEstimates;
261        proc logistic data = explore;
262             title3 'HS GPA, HS Calculus and Diagnostic Test';
263             model passed (event='Yes') = hsgpa hscalc calc precalc;
264             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.01 seconds
      memory              2301.65k
      OS Memory           30904.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        30  Switch Count  1
      Page Faults                       0
      Page Reclaims                     199
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

265
266        /* Decision: Drop the calc subscale, but which is better,
267           precalc or total score? */
268
269        ods select ParameterEstimates TestStmts; /* I ran a trace to find out the name */
270        proc logistic data = explore;
271             title3 'HS GPA, HS Calculus and Diagnostic Test';
272             model passed (event='Yes') = hsgpa hscalc precalc totscore;
273             precalc_n_totscore: test precalc = totscore = 0;
274             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              2339.25k
      OS Memory           30904.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        31  Switch Count  1
      Page Faults                       0
      Page Reclaims                     209
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           48

275
276        /* Decision: Keep precalc rather than totscore. Confirm */
277
278        ods select ParameterEstimates;
279        proc logistic data = explore;
280             title3 'HS GPA, HS Calculus and Pre-calculus test';
281             model passed (event='Yes') = hsgpa hscalc precalc;
282             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              2302.59k
      OS Memory           30904.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        32  Switch Count  1
      Page Faults                       0
      Page Reclaims                     198
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

283
284        proc logistic data = explore;
285             title3 'Try gender, ethnic and mother tongue controlling for good stuff';
286             class ethnic (param=ref ref='East Indian');
287             /* Specifying a reference category that's not the last value */
288             model passed (event='Yes') = hsgpa hscalc precalc ethnic gender mtongue;
289             contrast 'Demographics' ethnic 1 0 0 0 0,
290                                     ethnic 0 1 0 0 0,
291                                     ethnic 0 0 1 0 0,
292                                     ethnic 0 0 0 1 0,
293                                     ethnic 0 0 0 0 1,
294                                     gender 1,
295                                     mtongue 1 / e;
296             /* Display the effect matrix */
297             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.11 seconds
      user cpu time       0.11 seconds
      system cpu time     0.01 seconds
      memory              2690.68k
      OS Memory           31416.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        33  Switch Count  1
      Page Faults                       0
      Page Reclaims                     311
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           104

298
299        /* Decision: Forget about ethnicity. */
300
301        ods select ParameterEstimates;
302        proc logistic data = explore;
303             title3 'HS GPA, HS Calculus, Pre-calculus test, Gender and Mother tongue';
304             model passed (event='Yes') = hsgpa hscalc precalc gender mtongue;
305             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              2325.37k
      OS Memory           31416.00k
      Timestamp           03/10/2024 06:14:57 PM
      Step Count                        34  Switch Count  1
      Page Faults                       0
      Page Reclaims                     193
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

306
307        /* Decision: Drop Gender and Mother tongue too.
308           My model now has just HS GPA, HS Calculus and Pre-calculus test. */
309
310        proc logistic data = explore;
311             title3 'Try automatic (stepwise) selection';
312             model passed (event='Yes') =
313                   gender mtongue e1-e6
314                   hsgpa hscalc hsengl
315                   c1-c3 precalc calc totscore
316                   / selection = stepwise slentry = 0.05 slstay = 0.05 ;
317                   /* Default slentry = slstay = 0.15 */
318             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 3.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.13 seconds
      user cpu time       0.13 seconds
      system cpu time     0.00 seconds
      memory              2602.12k
      OS Memory           31416.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        35  Switch Count  1
      Page Faults                       0
      Page Reclaims                     246
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      2
      Block Input Operations            0
      Block Output Operations           176

319
320
321        /* Note 211 observations lost to missingness for stepwise, compared to 204
322           for the earlier model with hsgpa, hscalc and precalc. */
323
324        /* Perhaps missingness on the variables we dropped could be useful. */
325
326        proc freq;
327             title2 'Explore missingness on omitted variables';
328             tables gender mtongue ethnic;
329             tables gender*mtongue / norow nocol nopercent missing;
330             tables gender*course2 / norow nocol nopercent missing;
331             run;
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.05 seconds
      user cpu time       0.06 seconds
      system cpu time     0.01 seconds
      memory              1762.21k
      OS Memory           31152.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        36  Switch Count  5
      Page Faults                       0
      Page Reclaims                     452
      Page Swaps                        0
      Voluntary Context Switches        37
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           544

332
333
334        data explore2;
335             set explore;
336             if gender = . then sexmiss = 1; else sexmiss=0; /* Includes mtongue */
337             if course = . then coursemiss = 1; else coursemiss=0;
338             format sexmiss coursemiss ynfmt.;
339             label sexmiss = 'Gender and mother tongue missing'
340                   coursemiss = 'Course missing';
341
342        /* Checks are commented out
343        proc freq;
344             tables gender*sexmiss / norow nocol nopercent missing;
345             tables course*coursemiss / norow nocol nopercent missing;
346             tables sexmiss*coursemiss / norow nocol nopercent missing chisq;
347        */
348
349
NOTE: Variable course is uninitialized.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: The data set WORK.EXPLORE2 has 579 observations and 31 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1266.25k
      OS Memory           31148.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        37  Switch Count  2
      Page Faults                       0
      Page Reclaims                     162
      Page Swaps                        0
      Voluntary Context Switches        13
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520

350        proc logistic data = explore2;
351             title3 'Try adding missingness on gender/mtongue and course';
352             model passed (event='Yes') = hsgpa hscalc precalc sexmiss coursemiss;
353             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.06 seconds
      user cpu time       0.07 seconds
      system cpu time     0.00 seconds
      memory              2414.34k
      OS Memory           32184.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        38  Switch Count  1
      Page Faults                       0
      Page Reclaims                     243
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           72

354
355
356        /* All the cases with course missing were deleted because of
357           missingness on other variables. */
358
359        proc logistic data = explore2;
360             title3 'Try adding missingness on gender/mtongue and course';
361             model passed (event='Yes') = hsgpa hscalc precalc sexmiss;
362             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.06 seconds
      user cpu time       0.06 seconds
      system cpu time     0.00 seconds
      memory              2342.46k
      OS Memory           32184.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        39  Switch Count  1
      Page Faults                       0
      Page Reclaims                     208
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           80

363
364        /* Here's the current model. */
365
366        proc logistic data = explore;
367             title3 'HS GPA, HS Calculus and Pre-calculus test';
368             model passed (event='Yes') = hsgpa hscalc precalc;
369             output out=explore3 prob=pihat;
370             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.EXPLORE.
NOTE: The data set WORK.EXPLORE3 has 579 observations and 30 variables.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.06 seconds
      user cpu time       0.06 seconds
      system cpu time     0.00 seconds
      memory              2841.50k
      OS Memory           32444.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        40  Switch Count  3
      Page Faults                       0
      Page Reclaims                     265
      Page Swaps                        0
      Voluntary Context Switches        26
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           584

371
372
373        proc print data=explore3 (obs=13);
374             /* List only the first 13 observations */
375             var hsgpa hscalc precalc pihat passed;
376             run;
NOTE: There were 13 observations read from the data set WORK.EXPLORE3.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              821.53k
      OS Memory           31144.00k
      Timestamp           03/10/2024 06:14:58 PM
      Step Count                        41  Switch Count  0
      Page Faults                       0
      Page Reclaims                     147
      Page Swaps                        0
      Voluntary Context Switches        0
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0

377
378
379        /* Based on invariance and the Law of Total Probability (double expectation),
380           I predict that the mean pihat will be 234/375 = 0.624, the proportion of
381           students with non-missing data who passed. */
382
383        proc univariate normal plot data=explore3;
384             title2 'Explore the distribution of estimated probabilities';
385             where pihat ne .;
386             var pihat; /* Should have n=375 non-missing. */
387             run;
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           2.43 seconds
      user cpu time       0.17 seconds
      system cpu time     0.03 seconds
      memory              22269.43k
      OS Memory           52396.00k
      Timestamp           03/10/2024 06:15:00 PM
      Step Count                        42  Switch Count  4
      Page Faults                       0
      Page Reclaims                     7226
      Page Swaps                        0
      Voluntary Context Switches        523
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           1632

388
389
390        /* Goal: Develop a prediction model that uses all the data and makes a
391           prediction for every case. Base on estimated probabilities. */
392
393
394
395
396
397
398
399
400
401        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
413
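
The prediction in the program comment before PROC UNIVARIATE (mean pihat = 234/375 = 0.624) can also be checked directly from the likelihood equations. What follows is a short sketch, not part of the original program. It assumes the model is fit by maximum likelihood with an intercept, which is the PROC LOGISTIC default. The notation is introduced here for illustration: n = 375 complete cases, y_i = 1 when passed = 'Yes', x_i the vector (hsgpa, hscalc, precalc), and \pi_i the modeled probability of passing.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \Big[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Big],
  \qquad \log \frac{\pi_i}{1 - \pi_i} = \beta_0 + x_i^\top \beta_1, \\
\frac{\partial \ell}{\partial \beta_0}
  &= \sum_{i=1}^{n} (y_i - \pi_i) = 0
  \quad \text{at the maximum likelihood estimate } \hat\beta, \\
\frac{1}{n} \sum_{i=1}^{n} \hat\pi_i
  &= \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{234}{375} = 0.624 .
\end{align*}
```

Under these assumptions the sample mean of pihat reported by PROC UNIVARIATE should agree with 0.624 up to the convergence tolerance, since the intercept's score equation forces the fitted probabilities to sum to the number of observed passes.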