1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
71
72         /* mathlogreg2.sas */
73         %include '/home/brunner0/441s20/readmath2b.sas';
NOTE: Format YNFMT is already on the library WORK.FORMATS.
NOTE: Format YNFMT has been output.
NOTE: Format CRSFMT is already on the library WORK.FORMATS.
NOTE: Format CRSFMT has been output.
NOTE: Format NFMT is already on the library WORK.FORMATS.
NOTE: Format NFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              244.34k
      OS Memory           30372.00k
      Timestamp           02/10/2020 12:09:23 AM
      Step Count                        64  Switch Count  0
      Page Faults                       0
      Page Reclaims                     24
      Page Swaps                        0
      Voluntary Context Switches        0
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           32

180        /* %include '/home/brunner0/441s20/readexplor.sas'; */
181        title2 'Predict Passing the course (Y-N) with Logistic Regression';
182
183        /* We know course is useful:
184           c1 = 'Catch-up' c2 = 'Mainstream' c3 = 'Elite' */
185
NOTE: The infile '/home/brunner0/441s20/exploremath.data.txt' is:
      Filename=/home/brunner0/441s20/exploremath.data.txt,
      Owner Name=brunner0,Group Name=oda,
      Access Permission=-rw-r--r--,
      Last Modified=26Jan2020:18:49:34,
      File Size (bytes)=44583

NOTE: 579 records were read from the infile '/home/brunner0/441s20/exploremath.data.txt'.
      The minimum record length was 75.
      The maximum record length was 75.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      99 at 96:24   99 at 135:13
NOTE: The data set WORK.MATHEX has 579 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              891.56k
      OS Memory           30888.00k
      Timestamp           02/10/2020 12:09:23 AM
      Step Count                        65  Switch Count  3
      Page Faults                       0
      Page Reclaims                     106
      Page Swaps                        0
      Voluntary Context Switches        24
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520

186        proc logistic data = mathex;
187             title3 'Course and HS variables';
188             model passed (event='Yes') = c1 c3 hsgpa hscalc hsengl;
189             course: test c1=c3=0;
190             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.10 seconds
      user cpu time       0.10 seconds
      system cpu time     0.00 seconds
      memory              4573.59k
      OS Memory           32952.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        66  Switch Count  1
      Page Faults                       0
      Page Reclaims                     435
      Page Swaps                        0
      Voluntary Context Switches        6
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           56

191
192
193        /* Decision: Drop course */
194
195
196        ods select ParameterEstimates; /* Limit the output */
197        proc logistic data = mathex;
198             title3 'Just HS variables';
199             model passed (event='Yes') = hsgpa hscalc hsengl;
200             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.03 seconds
      system cpu time     0.00 seconds
      memory              2339.25k
      OS Memory           32952.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        67  Switch Count  1
      Page Faults                       0
      Page Reclaims                     192
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

201
202
203        /* Decision: Drop HS English.
204           Does the diagnostic test add anything? */
205
206
207        ods select ParameterEstimates;
208        proc logistic data = mathex;
209             title3 'HS GPA, HS Calculus and Diagnostic Test';
210             model passed (event='Yes') = hsgpa hscalc calc precalc;
211             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              2306.09k
      OS Memory           32952.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        68  Switch Count  1
      Page Faults                       0
      Page Reclaims                     202
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

212
213
214        /* Decision: Drop the calc subscale, but which is better,
215           precalc or total score? */
216
217
218        ods select ParameterEstimates TestStmts; /* I ran a trace to find out the name */
219        proc logistic data = mathex;
220             title3 'HS GPA, HS Calculus and Diagnostic Test';
221             model passed (event='Yes') = hsgpa hscalc precalc totscore;
222             precalc_n_totscore: test precalc = totscore = 0;
223             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.03 seconds
      user cpu time       0.03 seconds
      system cpu time     0.01 seconds
      memory              2331.43k
      OS Memory           32952.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        69  Switch Count  1
      Page Faults                       0
      Page Reclaims                     201
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           48

224
225
226        /* Decision: Keep precalc rather than totscore. Confirm */
227
228
229        ods select ParameterEstimates;
230        proc logistic data = mathex;
231             title3 'HS GPA, HS Calculus and Pre-calculus test';
232             model passed (event='Yes') = hsgpa hscalc precalc;
233             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              2282.78k
      OS Memory           32952.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        70  Switch Count  1
      Page Faults                       0
      Page Reclaims                     188
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

234
235        proc logistic data = mathex;
236             title3 'Try gender, ethnic and mother tongue controlling for good stuff';
237             class ethnic (param=ref ref='East Indian');
238             /* Specifying a reference category that's not the last value */
239             model passed (event='Yes') = hsgpa hscalc precalc ethnic gender mtongue;
240             contrast 'Demographics' ethnic 1 0 0 0 0,
241                                     ethnic 0 1 0 0 0,
242                                     ethnic 0 0 1 0 0,
243                                     ethnic 0 0 0 1 0,
244                                     ethnic 0 0 0 0 1,
245                                     gender 1,
246                                     mtongue 1 / e;
247             /* Display the effect matrix */
248             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.14 seconds
      user cpu time       0.15 seconds
      system cpu time     0.00 seconds
      memory              2586.50k
      OS Memory           33208.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        71  Switch Count  1
      Page Faults                       0
      Page Reclaims                     253
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      1
      Block Input Operations            0
      Block Output Operations           104

249
250
251        /* Decision: Forget about ethnicity. */
252
253
254        ods select ParameterEstimates;
255        proc logistic data = mathex;
256             title3 'HS GPA, HS Calculus, Pre-calculus test, Gender and Mother tongue';
257             model passed (event='Yes') = hsgpa hscalc precalc gender mtongue;
258             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.02 seconds
      user cpu time       0.03 seconds
      system cpu time     0.00 seconds
      memory              2303.68k
      OS Memory           33208.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        72  Switch Count  1
      Page Faults                       0
      Page Reclaims                     198
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           40

259
260
261        /* Decision: Drop Gender and Mother tongue too.
262           My model now has just HS GPA, HS Calculus and Pre-calculus test. */
263
264
265        proc logistic data = mathex;
266             title3 'Try automatic (stepwise) selection';
267             model passed (event='Yes') =
268                   gender mtongue e1-e6
269                   hsgpa hscalc hsengl
270                   c1-c3 precalc calc totscore
271                   / selection = stepwise slentry = 0.05 slstay = 0.05 ;
272                   /* Default slentry = slstay = 0.15 */
273             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 3.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.16 seconds
      user cpu time       0.17 seconds
      system cpu time     0.00 seconds
      memory              2577.09k
      OS Memory           33208.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        73  Switch Count  1
      Page Faults                       0
      Page Reclaims                     211
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           104

274
275
276        /* Note 211 observations lost to missingness for stepwise, compared to 204
277           for the earlier model with hsgpa, hscalc and precalc. */
278
279        /* Perhaps missingness on the variables we dropped could be useful. */
280
281        proc freq;
282             title2 'Explore missingness on omitted variables';
283             tables gender mtongue ethnic;
284             tables gender*mtongue / norow nocol nopercent missing;
285             tables gender*course / norow nocol nopercent missing;
286             run;
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.08 seconds
      user cpu time       0.08 seconds
      system cpu time     0.00 seconds
      memory              1629.90k
      OS Memory           32944.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        74  Switch Count  5
      Page Faults                       0
      Page Reclaims                     232
      Page Swaps                        0
      Voluntary Context Switches        33
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           544

287
288
289        data mathex2;
290             set mathex;
291             if gender = . then sexmiss = 1; else sexmiss=0; /* Includes mtongue */
292             if course = . then coursemiss = 1; else coursemiss=0;
293             format sexmiss coursemiss ynfmt.;
294             label sexmiss = 'Gender and mother tongue missing'
295                   coursemiss = 'Course missing';
296
297        /* Checks are commented out
298        proc freq;
299             tables gender*sexmiss / norow nocol nopercent missing;
300             tables course*coursemiss / norow nocol nopercent missing;
301             tables sexmiss*coursemiss / norow nocol nopercent missing chisq;
302        */
303
304
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: The data set WORK.MATHEX2 has 579 observations and 37 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1291.53k
      OS Memory           32684.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        75  Switch Count  2
      Page Faults                       0
      Page Reclaims                     95
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           520

305        proc logistic data = mathex2;
306             title3 'Try adding missingness on gender/mtongue and course';
307             model passed (event='Yes') = hsgpa hscalc precalc sexmiss coursemiss;
308             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.08 seconds
      user cpu time       0.08 seconds
      system cpu time     0.00 seconds
      memory              2429.46k
      OS Memory           33720.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        76  Switch Count  1
      Page Faults                       0
      Page Reclaims                     211
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      8
      Block Input Operations            0
      Block Output Operations           64

309
310
311        /* All the cases with course missing were deleted because of
312           missingness on other variables. */
313
314        proc logistic data = mathex2;
315             title3 'Try adding missingness on gender/mtongue and course';
316             model passed (event='Yes') = hsgpa hscalc precalc sexmiss;
317             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.08 seconds
      user cpu time       0.07 seconds
      system cpu time     0.00 seconds
      memory              2333.09k
      OS Memory           33720.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        77  Switch Count  1
      Page Faults                       0
      Page Reclaims                     205
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           80

318
319        /* Here's the current model. */
320
321        proc logistic data = mathex;
322             title3 'HS GPA, HS Calculus and Pre-calculus test';
323             model passed (event='Yes') = hsgpa hscalc precalc;
324             run;
NOTE: PROC LOGISTIC is modeling the probability that passed='Yes'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 579 observations read from the data set WORK.MATHEX.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.07 seconds
      user cpu time       0.07 seconds
      system cpu time     0.00 seconds
      memory              2308.87k
      OS Memory           33720.00k
      Timestamp           02/10/2020 12:09:24 AM
      Step Count                        78  Switch Count  1
      Page Faults                       0
      Page Reclaims                     201
      Page Swaps                        0
      Voluntary Context Switches        8
      Involuntary Context Switches      1
      Block Input Operations            0
      Block Output Operations           64

325
326
327        /* Goal: Develop a prediction model that uses all the data and makes a
328           prediction for every case. */
329
330
331
332
333        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
344
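
The goal in the closing comment is stated but not carried out in this log. As a minimal sketch of one possible next step (an illustration, not the author's code), the current model could be refit with an OUTPUT statement so that each case's estimated probability of passing is saved to a data set. The data set name mathpred and the variable name phat are made up for this example. Cases with a missing value on hsgpa, hscalc or precalc would get a missing phat, so the PROC MEANS step simply counts how many of the 579 cases receive a prediction under this model and how many do not.

```sas
/* Hypothetical sketch: save predicted probabilities from the current model.
   mathpred and phat are illustrative names, not taken from the original job. */
proc logistic data = mathex noprint;
     model passed (event='Yes') = hsgpa hscalc precalc;
     output out = mathpred p = phat;   /* phat = estimated P(passed='Yes') */
     run;

/* N = cases with a prediction, NMISS = cases without one */
proc means data = mathpred n nmiss;
     var phat;
     run;
```

A nonzero NMISS count would indicate how many cases still lack a prediction because of missing explanatory variables, which is the gap the stated goal is aiming to close.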