OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.

/********************* cars2.sas ***************************/
title 'Regression on the Metric Cars Data';

/* Read data directly from Excel spreadsheet */
proc import datafile="/home/u1407221/441s24/data/mcars4.xlsx"
            out=cars dbms=xlsx replace;
     getnames=yes;
     /* Input data file is mcars4.xlsx
        Output data set is called cars
        dbms=xlsx The input file is an Excel spreadsheet.
                  Necessary to read an Excel spreadsheet directly under unix/linux
                  Works in PC environment too except for Excel 4.0 spreadsheets
                  If there are multiple sheets, use sheet="sheet1" or something.
        replace If the data set cars already exists, replace it.
        getnames=yes Use column names as variable names. */

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 100 observations and 4 variables.
NOTE: WORK.CARS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              3348.31k
      OS Memory           35836.00k
      Timestamp           02/21/2024 06:18:32 PM
      Step Count                        87  Switch Count  4
      Page Faults                       0
      Page Reclaims                     781
      Page Swaps                        0
      Voluntary Context Switches        30
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264

proc print data=cars(obs=10);
     title2 'Look at first 10 lines of input data set';

NOTE: There were 10 observations read from the data set WORK.CARS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.03 seconds
      user cpu time       0.03 seconds
      system cpu time     0.00 seconds
      memory              2524.75k
      OS Memory           33704.00k
      Timestamp           02/21/2024 06:18:32 PM
      Step Count                        88  Switch Count  1
      Page Faults                       0
      Page Reclaims                     149
      Page Swaps                        0
      Voluntary Context Switches        6
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           8

proc contents;
     title2 'Contents of the default data set';

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.04 seconds
      user cpu time       0.04 seconds
      system cpu time     0.00 seconds
      memory              1511.34k
      OS Memory           34476.00k
      Timestamp           02/21/2024 06:18:32 PM
      Step Count                        89  Switch Count  1
      Page Faults                       0
      Page Reclaims                     162
      Page Swaps                        0
      Voluntary Context Switches        13
      Involuntary Context Switches      1
      Block Input Operations            0
      Block Output Operations           24

data auto;
     set cars;
     mpg = 100/lper100k * 0.6214/0.2642;
     Country = Cntry; /* I just like the spelling more */
     label Country = 'Location of Head Office'
           lper100k = 'Litres per 100 kilometers'
           mpg = 'Miles per Gallon'
           weight = 'Weight in kg'
           length = 'Length in meters';
     /* Indicator dummy vars: Ref category is Japanese */
     if country = 'US' then c1=1; else c1=0;
     if country = 'Europ' then c2=1; else c2=0;
     if country = 'Japan' then c3=1; else c3=0;
     label c1 = 'US = 1'
           c2 = 'Europe = 1'
           c3 = 'Japan';
     /* Interaction Terms */
     cw1 = c1*weight; cw2 = c2*weight; cw3 = c3*weight;
     cL1 = c1*length; cL2 = c2*length; cL3 = c3*length;

     /* This way of creating dummy variables is safe only because
        Country is never missing. If it could be missing, better is
     if country = ' ' then c1 = .;
        else if country = 'US' then c1=1;
        else c1=0;
     if country = ' ' then c2 = .;
        else if country = 'Europ' then c2=1;
        else c2=0;
     if country = ' ' then c3 = .;
        else if country = 'Japan' then c3=1;
        else c3=0;
     Note that a blank space is the missing value code for character variables,
     while a period is missing for numeric variables.
     */

NOTE: There were 100 observations read from the data set WORK.CARS.
NOTE: The data set WORK.AUTO has 100 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              963.90k
      OS Memory           34476.00k
      Timestamp           02/21/2024 06:18:32 PM
      Step Count                        90  Switch Count  2
      Page Faults                       0
      Page Reclaims                     166
      Page Swaps                        0
      Voluntary Context Switches        16
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           272

proc freq;
     title2 'Check dummy variables';
     tables (c1 c2 c3)*country / norow nocol nopercent;

NOTE: There were 100 observations read from the data set WORK.AUTO.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.05 seconds
      user cpu time       0.05 seconds
      system cpu time     0.01 seconds
      memory              1552.93k
      OS Memory           34736.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        91  Switch Count  5
      Page Faults                       0
      Page Reclaims                     237
      Page Swaps                        0
      Voluntary Context Switches        31
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           552

proc means;
     title2 'Means of quantitative variables';
     var weight length lper100k mpg;


/* First an analysis with country only. */

/* Questions for every significance test:
   * What is E(y|x) for the model SAS is using?
   * Give the null hypothesis in symbols.
   * Do you reject H0 at alpha = 0.05? Answer Yes or No.
   * In plain, non-statistical language, what do you conclude?
*/

NOTE: There were 100 observations read from the data set WORK.AUTO.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              6584.53k
      OS Memory           40124.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        92  Switch Count  2
      Page Faults                       0
      Page Reclaims                     1468
      Page Swaps                        0
      Voluntary Context Switches        27
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           8

proc means;
     title2 'Litres per 100 k Broken Down by Country';
     class Country;
     var lper100k;

NOTE: There were 100 observations read from the data set WORK.AUTO.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.02 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              9060.34k
      OS Memory           43196.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        93  Switch Count  2
      Page Faults                       0
      Page Reclaims                     2009
      Page Swaps                        0
      Voluntary Context Switches        19
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0

proc reg plots=none; /* Suppress diagnostic plots for now*/
     title2 'Regression with Just Country';
     model lper100k = c1 c2;
     USvsEURO: test c1=c2;

NOTE: PROCEDURE REG used (Total process time):
      real time           0.05 seconds
      user cpu time       0.05 seconds
      system cpu time     0.00 seconds
      memory              2434.21k
      OS Memory           37568.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        94  Switch Count  2
      Page Faults                       0
      Page Reclaims                     262
      Page Swaps                        0
      Voluntary Context Switches        21
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           80

proc glm;
     title2 'Compare Oneway with proc glm';
     class country;
     model lper100k = country;

NOTE: PROCEDURE GLM used (Total process time):
      real time           0.35 seconds
      user cpu time       0.13 seconds
      system cpu time     0.02 seconds
      memory              21981.09k
      OS Memory           52664.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        95  Switch Count  3
      Page Faults                       0
      Page Reclaims                     5073
      Page Swaps                        0
      Voluntary Context Switches        488
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           1352

proc reg simple plots=none data=auto; /* simple gives descriptive stats */
     title2 'Country, Weight and Length';
     model lper100k = c1 c2 weight length;
     country: test c1 = c2 = 0; /* Country controlling for wgt, length */
     USvsEURO: test c1=c2; /* US vs. Europe controlling for wgt, length */
     wgt_len: test weight=length=0; /* wgt, length controlling for Country */

/* Proportions of remaining variation, using a = sF/(n-p+sF) */

NOTE: PROCEDURE REG used (Total process time):
      real time           0.09 seconds
      user cpu time       0.09 seconds
      system cpu time     0.00 seconds
      memory              3159.28k
      OS Memory           53696.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        96  Switch Count  2
      Page Faults                       0
      Page Reclaims                     313
      Page Swaps                        0
      Voluntary Context Switches        21
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           152

proc iml;
NOTE: IML Ready
     title2 'Proportion of remaining variation';
     print "Country controlling for Weight and Length";
     n = 100; p = 5; s = 2;
     F = 6.90; a = s*F/(n-p + s*F);
     print a;

     print "Weight and Length controlling for Country";
     F = 115.16;
     a = s*F/(n-p + s*F);
     print a;

NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.01 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              606.65k
      OS Memory           51876.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        97  Switch Count  1
      Page Faults                       0
      Page Reclaims                     56
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0

proc glm data=auto plots=none;
     title2 'Country, weight and length with proc glm';
     class country;
     model lper100k = weight length country;
     lsmeans country / pdiff tdiff adjust = bon;

/* Reproduce Bonferroni p-values from proc reg output */

NOTE: PROCEDURE GLM used (Total process time):
      real time           0.07 seconds
      user cpu time       0.07 seconds
      system cpu time     0.00 seconds
      memory              1921.53k
      OS Memory           53176.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        98  Switch Count  3
      Page Faults                       0
      Page Reclaims                     288
      Page Swaps                        0
      Voluntary Context Switches        25
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           328

proc iml;
NOTE: IML Ready
     title2 "Reproduce Bonferroni p-values from proc reg output";
     USvsJap = 0.0010; EURvsJap = 0.4448; USvsEUR = 0.0113;
     print "Uncorrected" USvsJap EURvsJap USvsEUR;
     BonUSvsJap = 3*USvsJap; BonEURvsJap = 3*EURvsJap; BonUSvsEUR = 3*USvsEUR;
     print "Bonferroni " BonUSvsJap BonEURvsJap BonUSvsEUR;

/* Reproduce LS means from proc reg output
   Hold weight and length fixed at sample mean values.
*/

NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.01 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              547.84k
      OS Memory           51876.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        99  Switch Count  1
      Page Faults                       0
      Page Reclaims                     57
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           8

proc iml;
NOTE: IML Ready
     title2 "Reproduce LS means from proc reg output";
     /* yhat = b0 + b1*c1 + b2*c2 + b3*xbar1 + b4*xbar2 */
     EuropLSM = -5.28270 - 0.50652 + 0.00546*1413.21 + 2.34597*4.8492;
     JapanLSM = -5.28270 + 0.00546*1413.21 + 2.34597*4.8492;
     US_LSM = -5.28270 -1.99424 + 0.00546*1413.21 + 2.34597*4.8492;
     print EuropLSM JapanLSM US_LSM;
     run;
NOTE: Module MAIN is undefined in IML; cannot be RUN.
     quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              532.09k
      OS Memory           51876.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        100  Switch Count  1
      Page Faults                       0
      Page Reclaims                     55
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
     /* Try quitting iml. Needed for ods below. */

/* Reproduce LS means a more sophisticated way.
*/

/*
ods trace on;
proc reg simple data=auto plots=none;
     model lper100k = c1 c2 weight length;
     run;
ods trace off;
*/


options replace=yes;
ods output SimpleStatistics=simplestats;
ods output ParameterEstimates=parest;
proc reg simple data=auto plots=none;
     model lper100k = c1 c2 weight length;
     run;
NOTE: The data set WORK.PAREST has 5 observations and 9 variables.
NOTE: The data set WORK.SIMPLESTATS has 6 observations and 7 variables.

NOTE: PROCEDURE REG used (Total process time):
      real time           0.06 seconds
      user cpu time       0.06 seconds
      system cpu time     0.00 seconds
      memory              3131.25k
      OS Memory           54472.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        101  Switch Count  5
      Page Faults                       0
      Page Reclaims                     518
      Page Swaps                        0
      Voluntary Context Switches        36
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           600

proc print data=simplestats;
     title2 "Simple Statistics";
NOTE: There were 6 observations read from the data set WORK.SIMPLESTATS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              661.25k
      OS Memory           52392.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        102  Switch Count  1
      Page Faults                       0
      Page Reclaims                     64
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           8

proc print data=parest;
     title2 "Parameter Estimates";

NOTE: There were 5 observations read from the data set WORK.PAREST.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
      memory              726.53k
      OS Memory           52392.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        103  Switch Count  1
      Page Faults                       0
      Page Reclaims                     72
      Page Swaps                        0
      Voluntary Context Switches        9
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           16

proc iml;
NOTE: IML Ready
     title2 "Least squares means";
     use simplestats;
     read point 4 var {Mean} into xbar1;
     read point 5 var {Mean} into xbar2;
     close simplestats;
     use parest;
     read all var {Estimate} into b;
     close parest;
     /* Calculate y-hat values: U.S. first */
     /* Order is Intercept c1 c2 weight length */
     x = {1 0 0 0 0}; /* Row vector */
     x[2] = 1; x[4] = xbar1; x[5] = xbar2;
     US = x * b;
     /* Set dummy variables for Europe */
     x[2] = 0; x[3] = 1;
     Europe = x * b;
     /* Now Japan */
     x[3]=0;
     Japan = x * b;
     print US Europe Japan;

NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              798.46k
      OS Memory           52392.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        104  Switch Count  1
      Page Faults                       0
      Page Reclaims                     129
      Page Swaps                        0
      Voluntary Context Switches        9
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           16

proc iml;
NOTE: IML Ready
     title2 "Repeat LS means from earlier";
     /* yhat = b0 + b1*c1 + b2*c2 + b3*xbar1 + b4*xbar2 */
     EuropLSM = -5.28270 - 0.50652 + 0.00546*1413.21 + 2.34597*4.8492;
     JapanLSM = -5.28270 + 0.00546*1413.21 + 2.34597*4.8492;
     US_LSM = -5.28270 -1.99424 + 0.00546*1413.21 + 2.34597*4.8492;
     print US_LSM EuropLSM JapanLSM ;
     run;
NOTE: Module MAIN is undefined in IML; cannot be RUN.
     quit;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              531.87k
      OS Memory           52132.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        105  Switch Count  1
      Page Faults                       0
      Page Reclaims                     56
      Page Swaps                        0
      Voluntary Context Switches        10
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0

proc reg data=auto plots = none;
     title2 'Country, Weight and Length with Interactions';
     model lper100k = c1 c2 weight length cw1 cw2 cL1 cL2;
     country: test c1 = c2 = 0; /* Is it really still country? */
     Interactions: test cw1 = cw2 = cL1 = cL2 = 0;

/* Centering an explanatory variable by subtracting off the mean affects the
   intercept, but not the relationships among variables. I want to create a new
   data set with weight and length centered. It's not an issue for these data,
   but it's important to subtract off the mean of the cases used in the regression.
   Also, to avoid confusion I will make sure the centered variables are nicely
   labelled. */

NOTE: PROCEDURE REG used (Total process time):
      real time           0.06 seconds
      user cpu time       0.07 seconds
      system cpu time     0.00 seconds
      memory              2449.12k
      OS Memory           53952.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        106  Switch Count  2
      Page Faults                       0
      Page Reclaims                     274
      Page Swaps                        0
      Voluntary Context Switches        19
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           72

data auto2;
     set auto;
     utility = lper100k + c1 + c2 + weight + length;
     if utility = . then delete;
     drop utility;

NOTE: There were 100 observations read from the data set WORK.AUTO.
NOTE: The data set WORK.AUTO2 has 100 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              981.53k
      OS Memory           52652.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        107  Switch Count  2
      Page Faults                       0
      Page Reclaims                     140
      Page Swaps                        0
      Voluntary Context Switches        11
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264

proc standard mean=0 data=auto2 out=cntrd;
     var weight length;

/* In the new data set "cntrd," weight and length are adjusted to have mean
   zero (the sample means have been subtracted from each observation). If I had
   said mean=0 std=1, they would have been converted to z-scores. All the other
   variables (including the product terms) are as they were before, and the
   labels are the same as before too.
*/

NOTE: The data set WORK.CNTRD has 100 observations and 15 variables.
NOTE: PROCEDURE STANDARD used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              861.46k
      OS Memory           52652.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        108  Switch Count  2
      Page Faults                       0
      Page Reclaims                     113
      Page Swaps                        0
      Voluntary Context Switches        14
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264

data centered;
     set cntrd; /* Now centered has everything in cntrd */
     /* Re-create Interaction Terms and re-label explanatory vars*/
     cw1 = c1*weight; cw2 = c2*weight;
     cL1 = c1*length; cL2 = c2*length;
     label weight = 'Weight in kg (Centered)'
           length = 'Length in meters (Centered)';

/* By default, SAS procedures use the most recently created data set,
   but specify it anyway. */

NOTE: There were 100 observations read from the data set WORK.CNTRD.
NOTE: The data set WORK.CENTERED has 100 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              993.84k
      OS Memory           52652.00k
      Timestamp           02/21/2024 06:18:33 PM
      Step Count                        109  Switch Count  2
      Page Faults                       0
      Page Reclaims                     125
      Page Swaps                        0
      Voluntary Context Switches        14
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264

proc reg plots=none simple data=centered;
     title2 'Weight and length are now centered: Mean=0';
     model lper100k = c1 c2 weight length cw1 cw2 cL1 cL2;
     country: test c1 = c2 = 0; /* Does this make better sense? */
     Interactions: test cw1 = cw2 = cL1 = cL2 = 0;

NOTE: PROCEDURE REG used (Total process time):
      real time           0.08 seconds
      user cpu time       0.09 seconds
      system cpu time     0.00 seconds
      memory              2418.06k
      OS Memory           53952.00k
      Timestamp           02/21/2024 06:18:34 PM
      Step Count                        110  Switch Count  2
      Page Faults                       0
      Page Reclaims                     278
      Page Swaps                        0
      Voluntary Context Switches        19
      Involuntary Context Switches      1
      Block Input Operations            0
      Block Output Operations           104

proc sgplot data=centered;
     title2 'Look at the regression lines';
     reg x=weight y=lper100k / group = country;

proc reg plots=none data=centered;
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           0.31 seconds
      user cpu time       0.07 seconds
      system cpu time     0.01 seconds
      memory              4009.43k
      OS Memory           54956.00k
      Timestamp           02/21/2024 06:18:34 PM
      Step Count                        111  Switch Count  2
      Page Faults                       0
      Page Reclaims                     926
      Page Swaps                        0
      Voluntary Context Switches        627
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           912
NOTE: There were 100 observations read from the data set WORK.CENTERED.
     title2 'Drop Length: Weight is centered';
     model lper100k = c1 c2 weight cw1 cw2;
     Interactions: test cw1 = cw2 = 0;
     USvsEurSlope: test cw1 = cw2;

NOTE: PROCEDURE REG used (Total process time):
      real time           0.06 seconds
      user cpu time       0.06 seconds
      system cpu time     0.00 seconds
      memory              2618.21k
      OS Memory           55744.00k
      Timestamp           02/21/2024 06:18:34 PM
      Step Count                        112  Switch Count  2
      Page Faults                       0
      Page Reclaims                     260
      Page Swaps                        0
      Voluntary Context Switches        18
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           88

proc reg plots=none data=auto;
     title2 'Cell means coding with interactions. Weight is uncentered.';
     title3 'Compare F = 4.36 for interaction';
     model lper100k = c1 c2 c3 cw1 cw2 cw3 / noint ;
     Interactions: test cw1 = cw2 = cw3;
     USvsEurSlope: test cw1 = cw2;
     USvsJapSlope: test cw1 = cw3;
     EurvsJapSlope: test cw2 = cw3;

quit;
NOTE: PROCEDURE REG used (Total process time):
      real time           0.08 seconds
      user cpu time       0.08 seconds
      system cpu time     0.01 seconds
      memory              2500.03k
      OS Memory           55744.00k
      Timestamp           02/21/2024 06:18:34 PM
      Step Count                        113  Switch Count  3
      Page Faults                       0
      Page Reclaims                     263
      Page Swaps                        0
      Voluntary Context Switches        29
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           144

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
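
The log records only the statements of the proc iml step that computes a = sF/(n-p+sF); the printed values of a are not part of it. As a rough cross-check, the same arithmetic can be sketched in a DATA step that writes to the log. This is an illustrative sketch, not part of cars2.sas: the values n = 100, p = 5, s = 2 and the two F statistics (6.90 and 115.16) are copied from the program, and the use of data _null_ with put is an assumption about how one might want to view the result.

```sas
/* Sketch only (not in cars2.sas): proportion of remaining variation,
   a = sF/(n-p+sF), with n, p, s and both F values copied from the
   proc iml step in the log. Results are written to the log by put. */
data _null_;
     n = 100;  p = 5;  s = 2;
     do F = 6.90, 115.16;        /* country test, then weight-length test */
        a = s*F / (n - p + s*F);
        put F= a=;
     end;
run;
```

By hand, the two proportions come to roughly 0.127 for country controlling for weight and length, and roughly 0.708 for weight and length controlling for country.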