1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;NOTE: ODS statements in the SAS Studio environment may disable some output features.7374 /* permutation.sas */7576 %include '/home/u1407221/441s24/SAS02/senicread.sas';NOTE: Format YESNOFMT has been output.NOTE: Format REGFMT has been output.NOTE: PROCEDURE FORMAT used (Total process time):real time 0.00 secondsuser cpu time 0.00 secondssystem cpu time 0.00 secondsmemory 243.06kOS Memory 29860.00kTimestamp 01/16/2024 06:34:08 PMStep Count 145 Switch Count 0Page Faults 0Page Reclaims 20Page Swaps 0Voluntary Context Switches 0Involuntary Context Switches 0Block Input Operations 0Block Output Operations 40146 title "Permutation and randomization tests";147NOTE: The infile '/home/u1407221/441s24/SAS02/openSENIC2.data.txt' is:Filename=/home/u1407221/441s24/SAS02/openSENIC2.data.txt,Owner Name=u1407221,Group Name=oda,Access Permission=-rw-r--r--,Last Modified=21Dec2023:11:37:19,File Size (bytes)=8585NOTE: 100 records were read from the infile '/home/u1407221/441s24/SAS02/openSENIC2.data.txt'.The minimum record length was 83.The maximum record length was 83.NOTE: Missing values were generated as a result of performing an operation on missing values.Each place is given by: (Number of times) at (Line):(Column).3 at 112:16 3 at 114:44 2 at 115:43NOTE: The data set WORK.SENIC has 100 observations and 18 variables.NOTE: DATA statement used (Total process time):real time 0.02 secondsuser cpu time 0.00 secondssystem cpu time 0.00 secondsmemory 691.03kOS Memory 30376.00kTimestamp 01/16/2024 06:34:08 PMStep Count 146 Switch Count 3Page Faults 0Page Reclaims 129Page Swaps 0Voluntary Context Switches 27Involuntary Context Switches 0Block Input Operations 0Block Output Operations 264148 proc print data=senic (obs=10);149NOTE: There were 10 observations read from the data set WORK.SENIC.NOTE: PROCEDURE PRINT used (Total process time):real time 0.05 secondsuser cpu time 0.05 secondssystem cpu time 0.01 secondsmemory 3023.46kOS Memory 30888.00kTimestamp 01/16/2024 06:34:08 PMStep Count 147 Switch Count 0Page Faults 0Page Reclaims 245Page Swaps 0Voluntary Context Switches 0Involuntary Context Switches 0Block Input Operations 0Block Output Operations 8150 proc freq data=senic;151 title2 'The SENIC data';152 title3 "Fisher's exact test is part of the default output";153 tables mdschl*agecat / nocol nopercent fisher;154NOTE: There were 100 observations read from the data set WORK.SENIC.NOTE: PROCEDURE FREQ used (Total process time):real time 0.04 secondsuser cpu time 0.04 secondssystem cpu time 0.00 secondsmemory 1976.62kOS Memory 31664.00kTimestamp 01/16/2024 06:34:08 PMStep Count 148 Switch Count 6Page Faults 0Page Reclaims 295Page Swaps 0Voluntary Context Switches 43Involuntary Context Switches 0Block Input Operations 0Block Output Operations 552155 proc freq data=senic;156 title3 'An exact permutation test';157 tables mdschl*region / norow nopercent;158 exact pchi / maxtime=120; /* Unnecessary 2 minute limit */159160 /******************************************************************************/161 /* Illustrate anova-like tests with the scab disease data */162 /******************************************************************************/163NOTE: There were 100 observations read from the data set WORK.SENIC.NOTE: PROCEDURE FREQ used (Total process time):real time 0.04 secondsuser cpu time 0.05 secondssystem cpu time 0.00 secondsmemory 1247.56kOS Memory 31920.00kTimestamp 01/16/2024 06:34:08 PMStep Count 149 Switch Count 5Page Faults 0Page Reclaims 209Page Swaps 0Voluntary Context Switches 41Involuntary Context Switches 1Block Input Operations 0Block Output Operations 544164 proc import datafile="/home/u1407221/441s24/SAS03/ScabDisease.xlsx"165 out=scab dbms=xlsx replace;166 getnames=yes;167 /* Input data file is ScabDisease.xlsx168 Ouput data table is called scab.169 dbms=xlsx The input file is an Excel spreadsheet.170 Necessary to read an Excel spreadsheet directly under unix/linux171 Works in PC environment too except for Excel 4.0 spreadsheets172 If there are multiple sheets, use sheet="sheet1" or something.173 replace If the data table already exists, replace it. Use this!174 getnames=yes Use column names as variable names. Beware of embedded,175 leading and trailing blanks.176 By default, blanks in a spreadsheet are missing. */177NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run withoptions MSGLEVEL=I.NOTE: The import data set has 32 observations and 2 variables.NOTE: WORK.SCAB data set was successfully created.NOTE: PROCEDURE IMPORT used (Total process time):real time 0.00 secondsuser cpu time 0.00 secondssystem cpu time 0.00 secondsmemory 3344.81kOS Memory 34044.00kTimestamp 01/16/2024 06:34:08 PMStep Count 150 Switch Count 2Page Faults 0Page Reclaims 766Page Swaps 0Voluntary Context Switches 18Involuntary Context Switches 0Block Input Operations 0Block Output Operations 272178 proc print data=scab;179 title2 'The scab disease data';180 run;NOTE: There were 32 observations read from the data set WORK.SCAB.NOTE: PROCEDURE PRINT used (Total process time):real time 0.02 secondsuser cpu time 0.03 secondssystem cpu time 0.00 secondsmemory 618.15kOS Memory 31656.00kTimestamp 01/16/2024 06:34:08 PMStep Count 151 Switch Count 1Page Faults 0Page Reclaims 64Page Swaps 0Voluntary Context Switches 8Involuntary Context Switches 0Block Input Operations 0Block Output Operations 8181182 data potato;183 set scab;184 label infection = 'Ave percent surface covered';185NOTE: There were 32 observations read from the data set WORK.SCAB.NOTE: The data set WORK.POTATO has 32 observations and 2 variables.NOTE: DATA statement used (Total process time):real time 0.00 secondsuser cpu time 0.00 secondssystem cpu time 0.00 secondsmemory 941.21kOS Memory 31916.00kTimestamp 01/16/2024 06:34:08 PMStep Count 152 Switch Count 2Page Faults 0Page Reclaims 141Page Swaps 0Voluntary Context Switches 12Involuntary Context Switches 0Block Input Operations 0Block Output Operations 264186 proc glm data=potato order=data;187 title3 'Overall F-test with proc glm for Comparison';188 class condition;189 model infection=condition;190 ods select OverallANOVA; /* Just the ANOVA summary table */191192 /* How to find out the names of the tables:193 ods stands for Output Delivery System.194 Information is written on the log file as the tables are produced.195 ods trace is esier than looking in the documentation.196197 ods trace on;198 proc glm data=potato order=data;199 class condition;200 model infection=condition;201 run;202 ods trace off;203 */204205 /* Wilcoxon scores are the ranks of the observations. */NOTE: PROCEDURE GLM used (Total process time):real time 0.02 secondsuser cpu time 0.02 secondssystem cpu time 0.01 secondsmemory 1943.40kOS Memory 32696.00kTimestamp 01/16/2024 06:34:08 PMStep Count 153 Switch Count 4Page Faults 0Page Reclaims 255Page Swaps 0Voluntary Context Switches 34Involuntary Context Switches 0Block Input Operations 0Block Output Operations 768206 proc npar1way wilcoxon data=potato;207 title3 'Classical Kruskal-Wallis Test: Compare F = 4.26, p = 0.0043';208 class condition;209 var infection;210211 /* Permutation test using the raw data. The usual F-test statistic is212 computed for each permutation. The anova option displays the ANOVA213 summary table and suppresses some other statistics. The anova214 option does not affect the permutation or randomization p-value. */215NOTE: PROCEDURE NPAR1WAY used (Total process time):real time 0.28 secondsuser cpu time 0.12 secondssystem cpu time 0.01 secondsmemory 21309.56kOS Memory 51376.00kTimestamp 01/16/2024 06:34:08 PMStep Count 154 Switch Count 1Page Faults 0Page Reclaims 5807Page Swaps 0Voluntary Context Switches 702Involuntary Context Switches 1Block Input Operations 0Block Output Operations 1208216 proc npar1way anova scores=data data=potato;217 title3 'Randomization Test';218 class condition;219 var infection;220 exact / mc seed=88888 maxtime=300;221222 /* Follow up with proc multtest, which randomizes the t statistic for testing223 a single linear combination equal to zero.224 Requesting all 21 pairwise comparisons plus225 * Control vs. Average of Spring226 * Control vs. Average of Fall227 * Average of Spring vs. Average of Fall228 */229NOTE: PROCEDURE NPAR1WAY used (Total process time):real time 0.43 secondsuser cpu time 0.20 secondssystem cpu time 0.02 secondsmemory 4764.03kOS Memory 54448.00kTimestamp 01/16/2024 06:34:09 PMStep Count 155 Switch Count 1Page Faults 0Page Reclaims 1460Page Swaps 0Voluntary Context Switches 1318Involuntary Context Switches 1Block Input Operations 0Block Output Operations 1448230 proc multtest data=potato order=data permutation nsample=20000231 seed=99999 out=pvals;232 title2 'Randomization multiple comparisons with proc multtest';233 class condition;234 /* Pairwise Bonferroni */235 contrast 'Control vs. Spring300' 1 -1 0 0 0 0 0; /* 1.0000 */236 contrast 'Control vs. Spring600' 1 0 -1 0 0 0 0; /* 1.0000 */237 contrast 'Control vs. Spring1200' 1 0 0 -1 0 0 0; /* 1.0000 */238 contrast 'Control vs. Fall300' 1 0 0 0 -1 0 0; /* 0.0888 */239 contrast 'Control vs. Fall600' 1 0 0 0 0 -1 0; /* 1.0000 */240 contrast 'Control vs. Fall1200' 1 0 0 0 0 0 -1; /* 0.0096 */241 contrast 'Spring300 vs. Spring600' 0 1 -1 0 0 0 0;242 contrast 'Spring300 vs. Spring1200' 0 1 0 -1 0 0 0;243 contrast 'Spring300 vs. Fall300' 0 1 0 0 -1 0 0;244 contrast 'Spring300 vs. Fall600' 0 1 0 0 0 -1 0;245 contrast 'Spring300 vs. Fall1200' 0 1 0 0 0 0 -1;246 contrast 'Spring600 vs. Spring1200' 0 0 1 -1 0 0 0;247 contrast 'Spring600 vs. Fall300' 0 0 1 0 -1 0 0;248 contrast 'Spring600 vs. Fall600' 0 0 1 0 0 -1 0;249 contrast 'Spring600 vs. Fall1200' 0 0 1 0 0 0 -1;250 contrast 'Spring1200 vs. Fall300' 0 0 0 1 -1 0 0;251 contrast 'Spring1200 vs. Fall600' 0 0 0 1 0 -1 0;252 contrast 'Spring1200 vs. Fall1200' 0 0 0 1 0 0 -1;253 contrast 'Fall300 vs. Fall600' 0 0 0 0 1 -1 0;254 contrast 'Fall300 vs. Fall1200' 0 0 0 0 1 0 -1;255 contrast 'Fall600 vs. Fall1200' 0 0 0 0 0 1 -1;256 /* Averages */257 contrast 'Control vs. Spring' 3 -1 -1 -1 0 0 0;258 contrast 'Control vs. Fall' 3 0 0 0 -1 -1 -1;259 contrast 'Spring vs. Fall' 0 1 1 1 -1 -1 -1;260 test mean(infection); /* Requests t-tests */261 run;NOTE: The multiple testing procedure for this run is not closed. In cases with badly heteroskedastic data, tests for individualnull hypotheses can have inflated familywise Type I error rates.NOTE: There were 32 observations read from the data set WORK.POTATO.NOTE: The data set WORK.PVALS has 24 observations and 9 variables.NOTE: PROCEDURE MULTTEST used (Total process time):real time 0.21 secondsuser cpu time 0.21 secondssystem cpu time 0.00 secondsmemory 1634.40kOS Memory 53164.00kTimestamp 01/16/2024 06:34:09 PMStep Count 156 Switch Count 3Page Faults 0Page Reclaims 205Page Swaps 0Voluntary Context Switches 26Involuntary Context Switches 0Block Input Operations 0Block Output Operations 328262263 /* Calculate 99% confidence intervals for the permutation p-values,264 based on the output data set pvals. Reject the null hypothesis265 that the true permutation p-value = 0.05 at the 0.01 level if266 and only if the confidence interval does not include 0.05. */267268 proc print data=pvals;269 title3 'Output data set with permutation p-values';270271 /* Need the alpha = 0.01 critical value for the normal distribution. Use272 R, or ... */273NOTE: There were 24 observations read from the data set WORK.PVALS.NOTE: PROCEDURE PRINT used (Total process time):real time 0.04 secondsuser cpu time 0.05 secondssystem cpu time 0.00 secondsmemory 805.78kOS Memory 53160.00kTimestamp 01/16/2024 06:34:09 PMStep Count 157 Switch Count 1Page Faults 0Page Reclaims 87Page Swaps 0Voluntary Context Switches 8Involuntary Context Switches 0Block Input Operations 0Block Output Operations 24274 proc iml;NOTE: IML Ready275 title3 'What is that critical value?';276 CritVal = probit(0.995);277 print CritVal;278NOTE: Exiting IML.NOTE: PROCEDURE IML used (Total process time):real time 0.01 secondsuser cpu time 0.01 secondssystem cpu time 0.01 secondsmemory 664.34kOS Memory 53156.00kTimestamp 01/16/2024 06:34:09 PMStep Count 158 Switch Count 1Page Faults 0Page Reclaims 203Page Swaps 0Voluntary Context Switches 9Involuntary Context Switches 0Block Input Operations 0Block Output Operations 16279 data CI;280 set pvals;281 Lower99 = perm_p - 2.5758*sim_se;282 Upper99 = perm_p + 2.5758*sim_se;283 keep _contrast_ perm_p Lower99 Upper99;NOTE: There were 24 observations read from the data set WORK.PVALS.NOTE: The data set WORK.CI has 24 observations and 4 variables.NOTE: DATA statement used (Total process time):real time 0.00 secondsuser cpu time 0.00 secondssystem cpu time 0.00 secondsmemory 950.28kOS Memory 53932.00kTimestamp 01/16/2024 06:34:09 PMStep Count 159 Switch Count 2Page Faults 0Page Reclaims 160Page Swaps 0Voluntary Context Switches 12Involuntary Context Switches 0Block Input Operations 0Block Output Operations 264284 proc print data=CI;285 title3 '99% confidence intervals for the permutation p-values';286287 /******************************************************************************/288 /* Rank correlation with the SENIC data */289 /******************************************************************************/290NOTE: There were 24 observations read from the data set WORK.CI.NOTE: PROCEDURE PRINT used (Total process time):real time 0.02 secondsuser cpu time 0.03 secondssystem cpu time 0.00 secondsmemory 631.65kOS Memory 53672.00kTimestamp 01/16/2024 06:34:09 PMStep Count 160 Switch Count 1Page Faults 0Page Reclaims 75Page Swaps 0Voluntary Context Switches 8Involuntary Context Switches 0Block Input Operations 0Block Output Operations 16291 proc corr data=senic nosimple;292 title2 'The SENIC Data';293 title3 'Pearson Product-moment Correlations';294 var nurses lngstay age infpercent;NOTE: PROCEDURE CORR used (Total process time):real time 0.03 secondsuser cpu time 0.03 secondssystem cpu time 0.00 secondsmemory 864.12kOS Memory 53672.00kTimestamp 01/16/2024 06:34:09 PMStep Count 161 Switch Count 2Page Faults 0Page Reclaims 61Page Swaps 0Voluntary Context Switches 20Involuntary Context Switches 0Block Input Operations 0Block Output Operations 8295 proc corr data=senic nosimple spearman;296 title3 'Spearman Rank Correlations';297 var nurses lngstay age infpercent;298299 /* Length of stay and age is interesting because it's barely significant300 with Pearson but n.s. wth Spearman. */301302 /* The usual p-values for Spearman correlations from proc corr assume303 bivariate normality before ranking, and this assumption matters. It looks like304 proc freq with the scorr test will do permutation/randomization.305 Can use pcorr for the Pearson correlation. */306307 /* Permutation test for length of stay and age308 Compare rho = 0.12404, p = 0.2212 */NOTE: PROCEDURE CORR used (Total process time):real time 0.03 secondsuser cpu time 0.03 secondssystem cpu time 0.00 secondsmemory 981.03kOS Memory 53672.00kTimestamp 01/16/2024 06:34:09 PMStep Count 162 Switch Count 1Page Faults 0Page Reclaims 56Page Swaps 0Voluntary Context Switches 8Involuntary Context Switches 0Block Input Operations 0Block Output Operations 24309 proc freq data=senic;310 title3 'Randomization test on ranks: Compare p = 0.2212';311 tables lngstay*age / noprint; /* No frequency table */312 exact scorr / mc seed = 7777;313NOTE: There were 100 observations read from the data set WORK.SENIC.NOTE: PROCEDURE FREQ used (Total process time):real time 0.29 secondsuser cpu time 0.30 secondssystem cpu time 0.00 secondsmemory 1258.96kOS Memory 54192.00kTimestamp 01/16/2024 06:34:10 PMStep Count 163 Switch Count 5Page Faults 0Page Reclaims 194Page Swaps 0Voluntary Context Switches 37Involuntary Context Switches 1Block Input Operations 0Block Output Operations 560314 proc freq data=senic;315 title3 'Randomization test on raw numbers: Compare p = 0.0491';316 tables lngstay*age / noprint measures; /* No frequency table */317 exact pcorr / mc seed = 7777;318319320 /* This ran out of memory321 proc freq data=senic;322 title3 'Full exact test';323 tables lngstay*age / noprint;324 exact pcorr / maxtime = 120;325 */326327 /* Try a bigger MC sample size: 2 million */NOTE: There were 100 observations read from the data set WORK.SENIC.NOTE: PROCEDURE FREQ used (Total process time):real time 0.29 secondsuser cpu time 0.30 secondssystem cpu time 0.01 secondsmemory 1255.40kOS Memory 54192.00kTimestamp 01/16/2024 06:34:10 PMStep Count 164 Switch Count 5Page Faults 0Page Reclaims 198Page Swaps 0Voluntary Context Switches 37Involuntary Context Switches 0Block Input Operations 0Block Output Operations 600328 proc freq data=senic;329 title3 'Randomization test on raw numbers: Big Monte Carlo sample size';330 tables lngstay*age / noprint; /* No frequency table */331 exact pcorr / mc n=2000000 maxtime=120 seed = 7777;332 ods select PearsonCorrMC;333334 run;NOTE: There were 100 observations read from the data set WORK.SENIC.NOTE: PROCEDURE FREQ used (Total process time):real time 51.79 secondsuser cpu time 51.78 secondssystem cpu time 0.01 secondsmemory 1249.34kOS Memory 54192.00kTimestamp 01/16/2024 06:35:02 PMStep Count 165 Switch Count 6Page Faults 0Page Reclaims 195Page Swaps 0Voluntary Context Switches 53Involuntary Context Switches 40Block Input Operations 0Block Output Operations 600335336337338339340341342343344345346347348349350351 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;363