1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
73
74 /* permutation.sas */
75
76 %include '/home/u1407221/441s24/SAS02/senicread.sas';
NOTE: Format YESNOFMT has been output.
NOTE: Format REGFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 243.06k
OS Memory 29860.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 145 Switch Count 0
Page Faults 0
Page Reclaims 20
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
146 title "Permutation and randomization tests";
147
NOTE: The infile '/home/u1407221/441s24/SAS02/openSENIC2.data.txt' is:
Filename=/home/u1407221/441s24/SAS02/openSENIC2.data.txt,
Owner Name=u1407221,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=21Dec2023:11:37:19,
File Size (bytes)=8585
NOTE: 100 records were read from the infile '/home/u1407221/441s24/SAS02/openSENIC2.data.txt'.
The minimum record length was 83.
The maximum record length was 83.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
3 at 112:16 3 at 114:44 2 at 115:43
NOTE: The data set WORK.SENIC has 100 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 691.03k
OS Memory 30376.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 146 Switch Count 3
Page Faults 0
Page Reclaims 129
Page Swaps 0
Voluntary Context Switches 27
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
148 proc print data=senic (obs=10);
149
NOTE: There were 10 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.05 seconds
user cpu time 0.05 seconds
system cpu time 0.01 seconds
memory 3023.46k
OS Memory 30888.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 147 Switch Count 0
Page Faults 0
Page Reclaims 245
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
150 proc freq data=senic;
151 title2 'The SENIC data';
152 title3 "Fisher's exact test is part of the default output";
153 tables mdschl*agecat / nocol nopercent fisher;
154
NOTE: There were 100 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.04 seconds
user cpu time 0.04 seconds
system cpu time 0.00 seconds
memory 1976.62k
OS Memory 31664.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 148 Switch Count 6
Page Faults 0
Page Reclaims 295
Page Swaps 0
Voluntary Context Switches 43
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 552
155 proc freq data=senic;
156 title3 'An exact permutation test';
157 tables mdschl*region / norow nopercent;
158 exact pchi / maxtime=120; /* Unnecessary 2 minute limit */
159
160 /******************************************************************************/
161 /* Illustrate anova-like tests with the scab disease data */
162 /******************************************************************************/
163
NOTE: There were 100 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.04 seconds
user cpu time 0.05 seconds
system cpu time 0.00 seconds
memory 1247.56k
OS Memory 31920.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 149 Switch Count 5
Page Faults 0
Page Reclaims 209
Page Swaps 0
Voluntary Context Switches 41
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 544
164 proc import datafile="/home/u1407221/441s24/SAS03/ScabDisease.xlsx"
165 out=scab dbms=xlsx replace;
166 getnames=yes;
167 /* Input data file is ScabDisease.xlsx
168 Ouput data table is called scab.
169 dbms=xlsx The input file is an Excel spreadsheet.
170 Necessary to read an Excel spreadsheet directly under unix/linux
171 Works in PC environment too except for Excel 4.0 spreadsheets
172 If there are multiple sheets, use sheet="sheet1" or something.
173 replace If the data table already exists, replace it. Use this!
174 getnames=yes Use column names as variable names. Beware of embedded,
175 leading and trailing blanks.
176 By default, blanks in a spreadsheet are missing. */
177
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
options MSGLEVEL=I.
NOTE: The import data set has 32 observations and 2 variables.
NOTE: WORK.SCAB data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 3344.81k
OS Memory 34044.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 150 Switch Count 2
Page Faults 0
Page Reclaims 766
Page Swaps 0
Voluntary Context Switches 18
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 272
178 proc print data=scab;
179 title2 'The scab disease data';
180 run;
NOTE: There were 32 observations read from the data set WORK.SCAB.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 618.15k
OS Memory 31656.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 151 Switch Count 1
Page Faults 0
Page Reclaims 64
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
181
182 data potato;
183 set scab;
184 label infection = 'Ave percent surface covered';
185
NOTE: There were 32 observations read from the data set WORK.SCAB.
NOTE: The data set WORK.POTATO has 32 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 941.21k
OS Memory 31916.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 152 Switch Count 2
Page Faults 0
Page Reclaims 141
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
186 proc glm data=potato order=data;
187 title3 'Overall F-test with proc glm for Comparison';
188 class condition;
189 model infection=condition;
190 ods select OverallANOVA; /* Just the ANOVA summary table */
191
192 /* How to find out the names of the tables:
193 ods stands for Output Delivery System.
194 Information is written on the log file as the tables are produced.
195 ods trace is esier than looking in the documentation.
196
197 ods trace on;
198 proc glm data=potato order=data;
199 class condition;
200 model infection=condition;
201 run;
202 ods trace off;
203 */
204
205 /* Wilcoxon scores are the ranks of the observations. */
NOTE: PROCEDURE GLM used (Total process time):
real time 0.02 seconds
user cpu time 0.02 seconds
system cpu time 0.01 seconds
memory 1943.40k
OS Memory 32696.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 153 Switch Count 4
Page Faults 0
Page Reclaims 255
Page Swaps 0
Voluntary Context Switches 34
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 768
206 proc npar1way wilcoxon data=potato;
207 title3 'Classical Kruskal-Wallis Test: Compare F = 4.26, p = 0.0043';
208 class condition;
209 var infection;
210
211 /* Permutation test using the raw data. The usual F-test statistic is
212 computed for each permutation. The anova option displays the ANOVA
213 summary table and suppresses some other statistics. The anova
214 option does not affect the permutation or randomization p-value. */
215
NOTE: PROCEDURE NPAR1WAY used (Total process time):
real time 0.28 seconds
user cpu time 0.12 seconds
system cpu time 0.01 seconds
memory 21309.56k
OS Memory 51376.00k
Timestamp 01/16/2024 06:34:08 PM
Step Count 154 Switch Count 1
Page Faults 0
Page Reclaims 5807
Page Swaps 0
Voluntary Context Switches 702
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 1208
216 proc npar1way anova scores=data data=potato;
217 title3 'Randomization Test';
218 class condition;
219 var infection;
220 exact / mc seed=88888 maxtime=300;
221
222 /* Follow up with proc multtest, which randomizes the t statistic for testing
223 a single linear combination equal to zero.
224 Requesting all 21 pairwise comparisons plus
225 * Control vs. Average of Spring
226 * Control vs. Average of Fall
227 * Average of Spring vs. Average of Fall
228 */
229
NOTE: PROCEDURE NPAR1WAY used (Total process time):
real time 0.43 seconds
user cpu time 0.20 seconds
system cpu time 0.02 seconds
memory 4764.03k
OS Memory 54448.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 155 Switch Count 1
Page Faults 0
Page Reclaims 1460
Page Swaps 0
Voluntary Context Switches 1318
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 1448
230 proc multtest data=potato order=data permutation nsample=20000
231 seed=99999 out=pvals;
232 title2 'Randomization multiple comparisons with proc multtest';
233 class condition;
234 /* Pairwise Bonferroni */
235 contrast 'Control vs. Spring300' 1 -1 0 0 0 0 0; /* 1.0000 */
236 contrast 'Control vs. Spring600' 1 0 -1 0 0 0 0; /* 1.0000 */
237 contrast 'Control vs. Spring1200' 1 0 0 -1 0 0 0; /* 1.0000 */
238 contrast 'Control vs. Fall300' 1 0 0 0 -1 0 0; /* 0.0888 */
239 contrast 'Control vs. Fall600' 1 0 0 0 0 -1 0; /* 1.0000 */
240 contrast 'Control vs. Fall1200' 1 0 0 0 0 0 -1; /* 0.0096 */
241 contrast 'Spring300 vs. Spring600' 0 1 -1 0 0 0 0;
242 contrast 'Spring300 vs. Spring1200' 0 1 0 -1 0 0 0;
243 contrast 'Spring300 vs. Fall300' 0 1 0 0 -1 0 0;
244 contrast 'Spring300 vs. Fall600' 0 1 0 0 0 -1 0;
245 contrast 'Spring300 vs. Fall1200' 0 1 0 0 0 0 -1;
246 contrast 'Spring600 vs. Spring1200' 0 0 1 -1 0 0 0;
247 contrast 'Spring600 vs. Fall300' 0 0 1 0 -1 0 0;
248 contrast 'Spring600 vs. Fall600' 0 0 1 0 0 -1 0;
249 contrast 'Spring600 vs. Fall1200' 0 0 1 0 0 0 -1;
250 contrast 'Spring1200 vs. Fall300' 0 0 0 1 -1 0 0;
251 contrast 'Spring1200 vs. Fall600' 0 0 0 1 0 -1 0;
252 contrast 'Spring1200 vs. Fall1200' 0 0 0 1 0 0 -1;
253 contrast 'Fall300 vs. Fall600' 0 0 0 0 1 -1 0;
254 contrast 'Fall300 vs. Fall1200' 0 0 0 0 1 0 -1;
255 contrast 'Fall600 vs. Fall1200' 0 0 0 0 0 1 -1;
256 /* Averages */
257 contrast 'Control vs. Spring' 3 -1 -1 -1 0 0 0;
258 contrast 'Control vs. Fall' 3 0 0 0 -1 -1 -1;
259 contrast 'Spring vs. Fall' 0 1 1 1 -1 -1 -1;
260 test mean(infection); /* Requests t-tests */
261 run;
NOTE: The multiple testing procedure for this run is not closed. In cases with badly heteroskedastic data, tests for individual
null hypotheses can have inflated familywise Type I error rates.
NOTE: There were 32 observations read from the data set WORK.POTATO.
NOTE: The data set WORK.PVALS has 24 observations and 9 variables.
NOTE: PROCEDURE MULTTEST used (Total process time):
real time 0.21 seconds
user cpu time 0.21 seconds
system cpu time 0.00 seconds
memory 1634.40k
OS Memory 53164.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 156 Switch Count 3
Page Faults 0
Page Reclaims 205
Page Swaps 0
Voluntary Context Switches 26
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 328
262
263 /* Calculate 99% confidence intervals for the permutation p-values,
264 based on the output data set pvals. Reject the null hypothesis
265 that the true permutation p-value = 0.05 at the 0.01 level if
266 and only if the confidence interval does not include 0.05. */
267
268 proc print data=pvals;
269 title3 'Output data set with permutation p-values';
270
271 /* Need the alpha = 0.01 critical value for the normal distribution. Use
272 R, or ... */
273
NOTE: There were 24 observations read from the data set WORK.PVALS.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.04 seconds
user cpu time 0.05 seconds
system cpu time 0.00 seconds
memory 805.78k
OS Memory 53160.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 157 Switch Count 1
Page Faults 0
Page Reclaims 87
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 24
274 proc iml;
NOTE: IML Ready
275 title3 'What is that critical value?';
276 CritVal = probit(0.995);
277 print CritVal;
278
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 664.34k
OS Memory 53156.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 158 Switch Count 1
Page Faults 0
Page Reclaims 203
Page Swaps 0
Voluntary Context Switches 9
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16
279 data CI;
280 set pvals;
281 Lower99 = perm_p - 2.5758*sim_se;
282 Upper99 = perm_p + 2.5758*sim_se;
283 keep _contrast_ perm_p Lower99 Upper99;
NOTE: There were 24 observations read from the data set WORK.PVALS.
NOTE: The data set WORK.CI has 24 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 950.28k
OS Memory 53932.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 159 Switch Count 2
Page Faults 0
Page Reclaims 160
Page Swaps 0
Voluntary Context Switches 12
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
284 proc print data=CI;
285 title3 '99% confidence intervals for the permutation p-values';
286
287 /******************************************************************************/
288 /* Rank correlation with the SENIC data */
289 /******************************************************************************/
290
NOTE: There were 24 observations read from the data set WORK.CI.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 631.65k
OS Memory 53672.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 160 Switch Count 1
Page Faults 0
Page Reclaims 75
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16
291 proc corr data=senic nosimple;
292 title2 'The SENIC Data';
293 title3 'Pearson Product-moment Correlations';
294 var nurses lngstay age infpercent;
NOTE: PROCEDURE CORR used (Total process time):
real time 0.03 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 864.12k
OS Memory 53672.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 161 Switch Count 2
Page Faults 0
Page Reclaims 61
Page Swaps 0
Voluntary Context Switches 20
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
295 proc corr data=senic nosimple spearman;
296 title3 'Spearman Rank Correlations';
297 var nurses lngstay age infpercent;
298
299 /* Length of stay and age is interesting because it's barely significant
300 with Pearson but n.s. wth Spearman. */
301
302 /* The usual p-values for Spearman correlations from proc corr assume
303 bivariate normality before ranking, and this assumption matters. It looks like
304 proc freq with the scorr test will do permutation/randomization.
305 Can use pcorr for the Pearson correlation. */
306
307 /* Permutation test for length of stay and age
308 Compare rho = 0.12404, p = 0.2212 */
NOTE: PROCEDURE CORR used (Total process time):
real time 0.03 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 981.03k
OS Memory 53672.00k
Timestamp 01/16/2024 06:34:09 PM
Step Count 162 Switch Count 1
Page Faults 0
Page Reclaims 56
Page Swaps 0
Voluntary Context Switches 8
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 24
309 proc freq data=senic;
310 title3 'Randomization test on ranks: Compare p = 0.2212';
311 tables lngstay*age / noprint; /* No frequency table */
312 exact scorr / mc seed = 7777;
313
NOTE: There were 100 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.29 seconds
user cpu time 0.30 seconds
system cpu time 0.00 seconds
memory 1258.96k
OS Memory 54192.00k
Timestamp 01/16/2024 06:34:10 PM
Step Count 163 Switch Count 5
Page Faults 0
Page Reclaims 194
Page Swaps 0
Voluntary Context Switches 37
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 560
314 proc freq data=senic;
315 title3 'Randomization test on raw numbers: Compare p = 0.0491';
316 tables lngstay*age / noprint measures; /* No frequency table */
317 exact pcorr / mc seed = 7777;
318
319
320 /* This ran out of memory
321 proc freq data=senic;
322 title3 'Full exact test';
323 tables lngstay*age / noprint;
324 exact pcorr / maxtime = 120;
325 */
326
327 /* Try a bigger MC sample size: 2 million */
NOTE: There were 100 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.29 seconds
user cpu time 0.30 seconds
system cpu time 0.01 seconds
memory 1255.40k
OS Memory 54192.00k
Timestamp 01/16/2024 06:34:10 PM
Step Count 164 Switch Count 5
Page Faults 0
Page Reclaims 198
Page Swaps 0
Voluntary Context Switches 37
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 600
328 proc freq data=senic;
329 title3 'Randomization test on raw numbers: Big Monte Carlo sample size';
330 tables lngstay*age / noprint; /* No frequency table */
331 exact pcorr / mc n=2000000 maxtime=120 seed = 7777;
332 ods select PearsonCorrMC;
333
334 run;
NOTE: There were 100 observations read from the data set WORK.SENIC.
NOTE: PROCEDURE FREQ used (Total process time):
real time 51.79 seconds
user cpu time 51.78 seconds
system cpu time 0.01 seconds
memory 1249.34k
OS Memory 54192.00k
Timestamp 01/16/2024 06:35:02 PM
Step Count 165 Switch Count 6
Page Faults 0
Page Reclaims 195
Page Swaps 0
Voluntary Context Switches 53
Involuntary Context Switches 40
Block Input Operations 0
Block Output Operations 600
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
363