/********************* cars2.sas ***************************/
title 'Regression on Metric Cars Data';

/* Read data directly from Excel spreadsheet */
proc import datafile="/home/brunner0/441s18/mcars4.xlsx"
     out=cars dbms=xlsx replace;
     getnames=yes;
/* Input data file is mcars4.xlsx
   Output data set is called cars
   dbms=xlsx     The input file is an Excel spreadsheet.
                 Necessary to read an Excel spreadsheet directly under unix/linux.
                 Works in the PC environment too, except for Excel 4.0 spreadsheets.
                 If there are multiple sheets, use sheet="sheet1" or something.
   replace       If the data set cars already exists, replace it.
   getnames=yes  Use column names as variable names. */

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 100 observations and 4 variables.
NOTE: WORK.CARS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time                         0.01 seconds
      user cpu time                     0.01 seconds
      system cpu time                   0.00 seconds
      memory                            2618.75k
      OS Memory                         28840.00k
      Timestamp                         01/06/2018 03:04:54 AM
      Step Count                        18  Switch Count  1
      Page Faults                       0
      Page Reclaims                     934
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            24
      Block Output Operations           264

proc print;

NOTE: There were 100 observations read from the data set WORK.CARS.

data auto;
     set cars;
     mpg = 100/lper100k * 0.6214/0.2642;
     Country = Cntry; /* I just like the spelling more */
     label Country = 'Location of Head Office'
           lper100k = 'Litres per 100 kilometers'
           mpg = 'Miles per Gallon'
           weight = 'Weight in kg'
           length = 'Length in meters';
     /* Indicator dummy vars: Ref category is Japanese */
     if country = 'US' then c1=1; else c1=0;
     if country = 'Europ' then c2=1; else c2=0;
     /* Interaction Terms */
     cw1 = c1*weight; cw2 = c2*weight;
     cL1 = c1*length; cL2 = c2*length;
     /* This way of creating dummy variables is safe only because
        Country is never missing. If it could be missing, it would be better to write
          if country = ' ' then c1 = .;
             else if country = 'US' then c1=1;
             else c1=0;
          if country = ' ' then c2 = .;
             else if country = 'Europ' then c2=1;
             else c2=0;
        Note that a blank space is the missing value code for character variables,
        while a period is the missing value code for numeric variables. */

NOTE: There were 100 observations read from the data set WORK.CARS.
NOTE: The data set WORK.AUTO has 100 observations and 12 variables.

proc freq;
     title2 'Check dummy variables';
     tables (c1 c2)*country / norow nocol nopercent;

/* First an analysis with country only. */

/* Questions for every significance test:
   * What is E(y|x) for the model SAS is using?
   * Give the null hypothesis in symbols.
   * Do you reject H0 at alpha = 0.05? Answer Yes or No.
   * In plain, non-statistical language, what do you conclude?
   */

NOTE: There were 100 observations read from the data set WORK.AUTO.

proc means;
     title2 'Litres per 100 k Broken Down by Country';
     class Country;
     var lper100k;

NOTE: There were 100 observations read from the data set WORK.AUTO.

proc reg plots = none; /* Suppress diagnostic plots for now */
     title2 'Regression with Just Country';
     model lper100k = c1 c2;
     USvsEURO: test c1=c2;

proc reg plots = none;
     title2 'Country, Weight and Length';
     model lper100k = c1 c2 weight length;
     country:  test c1 = c2 = 0;      /* Country controlling for wgt, length */
     USvsEURO: test c1=c2;            /* US vs.
                                         Europe controlling for wgt, length */
     wgt_len:  test weight=length=0;  /* wgt, length controlling for Country */

/* Proportions of remaining variation, using a = sF/(n-p+sF) */

proc iml;
NOTE: IML Ready
     title2 'Proportion of remaining variation';
     print "Country controlling for Weight and Length";
     /* p = number of betas in the full model: intercept, c1, c2, weight, length,
        so n-p = 95 is the denominator df of both F tests */
     n = 100; p = 5; s = 2;
     F = 6.90; a = s*F/(n-p + s*F);
     print a;

     print "Weight and Length controlling for Country";
     F = 115.16; a = s*F/(n-p + s*F);
     print a;

NOTE: Exiting IML.

proc reg plots = none;
     title2 'Country, Weight and Length with Interactions';
     model lper100k = c1 c2 weight length cw1 cw2 cL1 cL2;
     country: test c1 = c2 = 0; /* Is it really still country? */
     Interactions: test cw1 = cw2 = cL1 = cL2 = 0;

/* Centering an explanatory variable by subtracting off the mean changes the
   intercept (and, in a model with product terms, the meaning of lower-order
   coefficients such as those of c1 and c2), but not the relationships among
   the variables. I want to create a new data set with weight and length
   centered, and to avoid confusion I will make sure the variables are nicely
   labelled.
   */

proc standard mean=0 data=auto out=cntrd;
     var weight length;

/* In the new data set "cntrd," weight and length are adjusted to have mean
   zero (the sample means have been subtracted from each observation). If I had
   said mean=0 std=1, they would have been converted to z-scores. All the other
   variables (including the product terms) are as they were before, and the
   labels are the same as before too. */

NOTE: The data set WORK.CNTRD has 100 observations and 12 variables.

data centered;
     set cntrd; /* Now centered has everything in cntrd */
     /* Re-create Interaction Terms and re-label explanatory vars */
     cw1 = c1*weight; cw2 = c2*weight;
     cL1 = c1*length; cL2 = c2*length;
     label weight = 'Weight in kg (Centered)'
           length = 'Length in cm (Centered)';

/* By default, SAS procedures use the most recently created data set,
   but specify it anyway.
   */

NOTE: There were 100 observations read from the data set WORK.CNTRD.
NOTE: The data set WORK.CENTERED has 100 observations and 12 variables.

proc reg plots=none simple data=centered;
     title2 'Weight and length are now centered: Mean=0';
     model lper100k = c1 c2 weight length cw1 cw2 cL1 cL2;
     country: test c1 = c2 = 0; /* Does this make better sense? */
     Interactions: test cw1 = cw2 = cL1 = cL2 = 0;
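The mpg line in the auto DATA step converts fuel consumption (litres per 100 km) into fuel economy (miles per gallon). A worked version of that arithmetic follows, assuming, as the constants in the code suggest, that 1 km is about 0.6214 miles and 1 litre is about 0.2642 US gallons:

```latex
\begin{aligned}
\text{mpg}
  &= \frac{100\ \text{km}}{\text{lper100k}\ \text{litres}}
     \times \frac{0.6214\ \text{miles}}{1\ \text{km}}
     \times \frac{1\ \text{litre}}{0.2642\ \text{US gallons}} \\
  &= \frac{100}{\text{lper100k}} \times \frac{0.6214}{0.2642}
   \approx \frac{235.2}{\text{lper100k}}
\end{aligned}
```

For example, a car that uses 10 litres per 100 km gets about 235.2/10, or roughly 23.5, miles per gallon.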
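For the first two PROC REG steps, the expected value of lper100k implied by the indicator coding (Japanese is the reference category, c1 = 1 for US, c2 = 1 for Europe) and the null hypotheses behind each TEST statement can be written out as below. This is a sketch in standard regression notation; the beta labels are chosen here for illustration and are not names that appear in the SAS output.

```latex
\begin{aligned}
&\textbf{Country only} \quad (\texttt{model lper100k = c1 c2;}) \\
&E(y \mid \mathbf{x}) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 \\
&\quad \text{Japan: } \beta_0, \quad \text{US: } \beta_0 + \beta_1, \quad \text{Europe: } \beta_0 + \beta_2 \\
&\quad \texttt{USvsEURO}\colon\; H_0\colon \beta_1 = \beta_2 \\[6pt]
&\textbf{Country, weight and length} \quad (\texttt{model lper100k = c1 c2 weight length;}) \\
&E(y \mid \mathbf{x}) = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3\,\text{weight} + \beta_4\,\text{length} \\
&\quad \texttt{country}\colon\; H_0\colon \beta_1 = \beta_2 = 0 \\
&\quad \texttt{USvsEURO}\colon\; H_0\colon \beta_1 = \beta_2 \\
&\quad \texttt{wgt\_len}\colon\; H_0\colon \beta_3 = \beta_4 = 0
\end{aligned}
```

Under this coding, beta_1 and beta_2 are the US-minus-Japan and Europe-minus-Japan differences in expected litres per 100 km (in the second model, holding weight and length fixed).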
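The formula quoted before the PROC IML step, a = sF/(n-p+sF), comes from writing the F statistic for testing s explanatory variables in terms of a = (R^2 full - R^2 reduced)/(1 - R^2 reduced), the proportion of the variation left unexplained by the reduced model that the tested variables explain. The sketch below shows the algebra and the arithmetic. It takes n - p to be the error degrees of freedom of the full model, which for model lper100k = c1 c2 weight length with n = 100 and five beta parameters (intercept, c1, c2, weight, length) is 95. The F values 6.90 and 115.16 are the ones entered in the PROC IML step.

```latex
\begin{aligned}
F &= \frac{(R^2_{F} - R^2_{R})/s}{(1 - R^2_{F})/(n-p)}
   = \frac{n-p}{s}\cdot\frac{a}{1-a},
   \qquad a = \frac{R^2_{F} - R^2_{R}}{1 - R^2_{R}} \\[4pt]
\frac{sF}{n-p} &= \frac{a}{1-a}
   \quad\Longrightarrow\quad
   a = \frac{sF}{n-p+sF} \\[4pt]
\text{Country given weight, length:}\quad
a &= \frac{2(6.90)}{95 + 2(6.90)} = \frac{13.80}{108.80} \approx 0.127 \\
\text{Weight, length given country:}\quad
a &= \frac{2(115.16)}{95 + 2(115.16)} = \frac{230.32}{325.32} \approx 0.708
\end{aligned}
```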
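Two of the program's comments, "Is it really still country?" and "Does this make better sense?", turn on what the coefficients of c1 and c2 mean once the product terms cw1, cw2, cL1 and cL2 are in the model. Writing out the regression function makes this visible. In this sketch w stands for weight, L for length, and the beta labels follow the order of the MODEL statement; they are notation chosen here, not names from the SAS output.

```latex
\begin{aligned}
E(y \mid \mathbf{x}) &= \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \beta_3 w + \beta_4 L
   + \beta_5 c_1 w + \beta_6 c_2 w + \beta_7 c_1 L + \beta_8 c_2 L \\[4pt]
\text{Japan } (c_1 = c_2 = 0)&\colon\quad \beta_0 + \beta_3 w + \beta_4 L \\
\text{US } (c_1 = 1)&\colon\quad (\beta_0 + \beta_1) + (\beta_3 + \beta_5)\, w + (\beta_4 + \beta_7)\, L \\
\text{Europe } (c_2 = 1)&\colon\quad (\beta_0 + \beta_2) + (\beta_3 + \beta_6)\, w + (\beta_4 + \beta_8)\, L \\[4pt]
\text{US} - \text{Japan} &= \beta_1 + \beta_5 w + \beta_7 L, \qquad
\text{Europe} - \text{Japan} = \beta_2 + \beta_6 w + \beta_8 L
\end{aligned}
```

Because the country differences depend on w and L, country: test c1 = c2 = 0 (H0: beta_1 = beta_2 = 0) asks whether there are country differences at w = 0 and L = 0, that is, for a car with zero weight and zero length. After PROC STANDARD centers weight and length, w = 0 and L = 0 correspond to the sample means, so the same TEST statement asks about country differences for a car of average weight and length. Centering does not change the Interactions test (H0: beta_5 = beta_6 = beta_7 = beta_8 = 0): the centered and uncentered models describe the same set of regression functions, and the interaction coefficients are the same in both.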