STA 402S 1998: Introduction to SAS (Lesson One)
tuzo.erin > ls
cars.dat      cars98a.sas   class90.dat   intro98a.sas
tuzo.erin > less class90.dat
1 2 9 1 7 8 4 3 5 2 6 10 10 10 5 0 0 0 0 55 43
0 2 10 10 5 9 10 8 6 8 10 10 8 9 9 9 9 10 10 66 79
1 2 10 10 5 10 10 10 9 8 10 10 10 10 10 10 9 10 10 94 67
1 2 10 10 8 9 10 7 10 9 10 10 10 9 10 10 9 10 10 81 65
0 1 10 1 0 0 8 6 5 2 10 9 0 0 10 6 0 5 0 54 29
And so on. There are 21 columns and 62 rows of data; the columns are not aligned. This is a typical data file, in which the rows are cases and the columns are variables. If there are many variables, you may need more than one line of data for each case. In this instance the cases are people, but they can be anything: cities, families, farms, days (as in time series analysis).
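When a case does need more than one line of data, one common way to read it is the slash line-pointer control in the input statement. The sketch below is only an illustration: the file name wide.dat and the variable names id, x1-x10 and y1-y10 are hypothetical and are not part of this lesson's data.

```sas
/* Hypothetical file 'wide.dat': each case takes two lines of data. */
data wide;
   infile 'wide.dat';
   input id x1-x10      /* values on the first line of the case     */
       / y1-y10;        /* slash: continue on the next line of data */
run;
```

With this layout, a file of 124 lines would yield 62 cases.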
Now the COMMAND FILE, a simple program that reads the data and specifies what statistics to compute.
tuzo.erin > cat intro98a.sas
options linesize=79 pagesize=100;
title 'Grades from STA 302: Fall, 1990';

proc format; /* Used to label values of the categorical variables */
     value sexfmt   0 = 'Male'    1 = 'Female';
     value ethfmt   1 = 'Chinese'
                    2 = 'European'
                    3 = 'Other' ;

data erindale;
     infile 'class90.dat';
     input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
     /* Drop lowest score for quiz & computer */
     quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;
     compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;
     label ethnic  = 'Apparent ethnic background (ancestry)'
           quizave = 'Quiz Average (drop lowest)'
           compave = 'Computer Average (drop lowest)';
     mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
     label mark = 'Final Mark';
     format sex sexfmt.;        /* Associates sex & ethnic    */
     format ethnic ethfmt.;     /* with formats defined above */

proc freq;
     tables sex ethnic;

proc means n mean std;
     var quiz1 -- mark; /* single dash only works with numbered
                           lists, like quiz1-quiz8 */

proc freq;
     tables sex*ethnic / chisq;

proc corr;
     var final midterm quizave compave;

proc ttest;
     class sex;
     var mark;

/* Predict final exam score from midterm, quiz & computer */

proc reg simple;
     model final = midterm quizave compave / ss1;
     smalstuf: test quizave = 0, compave = 0;
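To see what the assignment statements in the data step compute, here is a minimal sketch that applies the same three formulas to the first line of class90.dat shown earlier. The data set name check, the cards statement and the proc print step are illustrative additions and are not part of intro98a.sas.

```sas
/* Sketch: recompute the derived variables for one student only. */
data check;
   input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
   /* Same formulas as in intro98a.sas */
   quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;
   compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;
   mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
   cards;
1 2 9 1 7 8 4 3 5 2 6 10 10 10 5 0 0 0 0 55 43
;
proc print data=check;
   var quizave compave mark;
run;
```

By hand: the eight quizzes sum to 39 and the lowest is 1, so quizave = 38/7, about 5.43. The nine computer assignments sum to 31 and the lowest is 0, so compave = 31/8 = 3.875. Then mark = .3(54.29) + .1(38.75) + .3(55) + .3(43), about 49.56.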
The command file intro98a.sas ends with the test statement in proc reg. To run it on tuzo we must use the command sasb instead of sas.
tuzo.erin > sasb intro98a
tuzo.erin >
tuzo.erin > ls
cars.dat      class90.dat   intro98a.lst
cars98a.sas   intro98a.log  intro98a.sas
tuzo.erin >
SAS produces two output files: the log file (intro98a.log) and the list file (intro98a.lst). If your program has errors, there may be no lst file; the error messages appear in the log file.
tuzo.erin > cat intro98a.log
                                The SAS System
                                                 13:58 Sunday, January 7, 1998

NOTE: Copyright(c) 1989 by SAS Institute Inc., Cary, NC USA.
NOTE: SAS (r) Proprietary Software Release 6.07  TS203
      Licensed to UNIVERSITY OF TORONTO, Site 0008987003.

NOTE: AUTOEXEC processing beginning; file is /local/lib/sas/autoexec.sas.

NOTE: SAS initialization used:
      real time           0.255 seconds
      cpu time            0.120 seconds

NOTE: AUTOEXEC processing completed.

1          options linesize=79 pagesize=100;
2          title 'Grades from STA 302: Fall, 1990';
3
4          proc format;
4             /* Used to label values of the categorical variables */
5               value sexfmt   0 = 'Male'    1 = 'Female';
NOTE: Format SEXFMT has been output.
6               value ethfmt   1 = 'Chinese'
7                              2 = 'European'
8                              3 = 'Other' ;
NOTE: Format ETHFMT has been output.
9
NOTE: PROCEDURE FORMAT used:
      real time           0.122 seconds
      cpu time            0.040 seconds

10         data erindale;
11              infile 'class90.dat';
12              input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
13              /* Drop lowest score for quiz & computer */
14              quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;
15              compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;
16              label ethnic  = 'Apparent ethnic background (ancestry)'
17                    quizave = 'Quiz Average (drop lowest)'
18                    compave = 'Computer Average (drop lowest)';
19              mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
20              label mark = 'Final Mark';
21              format sex sexfmt.;        /* Associates sex & ethnic    */
22              format ethnic ethfmt.;     /* with formats defined above */
23
NOTE: The infile 'class90.dat' is:
      File Name=/res/jbrunner/class/sta402s98/class90.dat,
      Owner Name=jbrunner,Group Name=research,
      Access Permission=rw-------,
      File Size (bytes)=4372

NOTE: 62 records were read from the infile 'class90.dat'.
      The minimum record length was 65.
      The maximum record length was 77.
NOTE: The data set WORK.ERINDALE has 62 observations and 24 variables.
NOTE: DATA statement used:
      real time           0.160 seconds
      cpu time            0.060 seconds

24         proc freq;
25              tables sex ethnic;
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used:
      real time           0.186 seconds
      cpu time            0.060 seconds

26         proc means n mean std;
27              var quiz1 -- mark; /* single dash only works with numbered
28                                    lists, like quiz1-quiz8 */
NOTE: The PROCEDURE MEANS printed page 2.
NOTE: PROCEDURE MEANS used:
      real time           0.025 seconds
      cpu time            0.030 seconds

29         proc freq;
30              tables sex*ethnic / chisq;
NOTE: The PROCEDURE FREQ printed page 3.
NOTE: PROCEDURE FREQ used:
      real time           0.291 seconds
      cpu time            0.040 seconds

31         proc corr;
32              var final midterm quizave compave;
NOTE: The PROCEDURE CORR printed page 4.

2                               The SAS System
                                                 13:58 Sunday, January 7, 1998

NOTE: PROCEDURE CORR used:
      real time           0.022 seconds
      cpu time            0.020 seconds

33         proc ttest;
34              class sex;
35              var mark;
36
37         /* Predict final exam score from midterm, quiz & computer */
38
NOTE: The PROCEDURE TTEST printed page 5.
NOTE: PROCEDURE TTEST used:
      real time           0.015 seconds
      cpu time            0.020 seconds

39         proc reg simple;
40              model final = midterm quizave compave / ss1;
41
42
43
NOTE: 62 observations read.
      62 observations used in computations.
NOTE: The PROCEDURE REG printed pages 6-7.
NOTE: PROCEDURE REG used:
      real time           0.382 seconds
      cpu time            0.030 seconds

NOTE: The SAS System used:
      real time           1.511 seconds
      cpu time            0.460 seconds

NOTE: SAS Institute Inc., SAS Circle, PO Box 8000, Cary, NC 27512-8000
tuzo.erin >
tuzo.erin > cat intro98a.lst

                       Grades from STA 302: Fall, 1990                       1
                                                 13:58 Sunday, January 7, 1998

                                          Cumulative  Cumulative
          SEX       Frequency   Percent   Frequency    Percent
          ----------------------------------------------------
          Male            39      62.9          39       62.9
          Female          23      37.1          62      100.0

                  Apparent ethnic background (ancestry)

                                           Cumulative  Cumulative
         ETHNIC      Frequency   Percent   Frequency    Percent
         ------------------------------------------------------
         Chinese           41      66.1          41       66.1
         European          15      24.2          56       90.3
         Other              6       9.7          62      100.0

Variable  Label                             N          Mean       Std Dev
------------------------------------------------------------------------
QUIZ1                                      62     9.0967742     2.2739413
QUIZ2                                      62     5.8870968     3.2294995
QUIZ3                                      62     6.0483871     2.3707744
QUIZ4                                      62     7.7258065     2.1590022
QUIZ5                                      62     9.0645161     1.4471109
QUIZ6                                      62     7.1612903     1.9264641
QUIZ7                                      62     5.7903226     2.1204477
QUIZ8                                      62     6.3064516     2.3787909
COMP1                                      62     9.1451613     1.1430011
COMP2                                      62     8.8225806     1.7604414
COMP3                                      62     8.3387097     2.5020880
COMP4                                      62     7.8548387     3.2180168
COMP5                                      62     9.4354839     1.7237109
COMP6                                      62     7.8548387     2.4350364
COMP7                                      62     6.6451613     2.7526248
COMP8                                      62     8.8225806     1.6745363
COMP9                                      62     8.2419355     3.7050497
MIDTERM                                    62    70.1935484    13.6235557
FINAL                                      62    49.4677419    17.5141327
QUIZAVE   Quiz Average (drop lowest)       62     7.6751152     1.1266917
COMPAVE   Computer Average (drop lowest)   62     8.8346774     1.1204997
MARK      Final Mark                       62    67.7584101    11.0235746
------------------------------------------------------------------------

                         TABLE OF SEX BY ETHNIC

          SEX       ETHNIC(Apparent ethnic background (ancestry))

          Frequency|
          Percent  |
          Row Pct  |
          Col Pct  |Chinese |European|Other   |  Total
          ---------+--------+--------+--------+
          Male     |     27 |      7 |      5 |     39
                   |  43.55 |  11.29 |   8.06 |  62.90
                   |  69.23 |  17.95 |  12.82 |
                   |  65.85 |  46.67 |  83.33 |
          ---------+--------+--------+--------+
          Female   |     14 |      8 |      1 |     23
                   |  22.58 |  12.90 |   1.61 |  37.10
                   |  60.87 |  34.78 |   4.35 |
                   |  34.15 |  53.33 |  16.67 |
          ---------+--------+--------+--------+
          Total          41       15        6       62
                      66.13    24.19     9.68   100.00

                 STATISTICS FOR TABLE OF SEX BY ETHNIC

          Statistic                     DF     Value        Prob
          ------------------------------------------------------
          Chi-Square                     2     2.921       0.232
          Likelihood Ratio Chi-Square    2     2.996       0.224
          Mantel-Haenszel Chi-Square     1     0.000       0.995
          Phi Coefficient                      0.217
          Contingency Coefficient              0.212
          Cramer's V                           0.217

          Sample Size = 62
          WARNING: 33% of the cells have expected counts less
                   than 5. Chi-Square may not be a valid test.

                       Grades from STA 302: Fall, 1990                       4
                                                 13:58 Sunday, January 7, 1998

                             Correlation Analysis

           4 'VAR' Variables:  FINAL    MIDTERM  QUIZAVE  COMPAVE

                               Simple Statistics

Variable      N          Mean       Std Dev             Sum
FINAL        62     49.467742     17.514133     3067.000000
MIDTERM      62     70.193548     13.623556     4352.000000
QUIZAVE      62      7.675115      1.126692      475.857143
COMPAVE      62      8.834677      1.120500      547.750000

                               Simple Statistics

Variable       Minimum       Maximum   Label
FINAL        15.000000     89.000000
MIDTERM      44.000000    103.000000
QUIZAVE       4.571429      9.714286   Quiz Average (drop lowest)
COMPAVE       5.000000     10.000000   Computer Average (drop lowest)

  Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 62

                                     FINAL    MIDTERM    QUIZAVE    COMPAVE

FINAL                              1.00000    0.51078    0.47127    0.14434
                                    0.0        0.0001     0.0001     0.2630

MIDTERM                            0.51078    1.00000    0.59294    0.41277
                                    0.0001     0.0        0.0001     0.0009

QUIZAVE                            0.47127    0.59294    1.00000    0.52649
Quiz Average (drop lowest)          0.0001     0.0001     0.0        0.0001

COMPAVE                            0.14434    0.41277    0.52649    1.00000
Computer Average (drop lowest)      0.2630     0.0009     0.0001     0.0

                       Grades from STA 302: Fall, 1990                       5
                                                 13:58 Sunday, January 7, 1998

                                TTEST PROCEDURE

Variable: MARK        Final Mark

SEX       N          Mean       Std Dev     Std Error       Minimum       Maximum
-------------------------------------------------------------------------------
Male     39   67.62097070   10.11112521    1.61907581   43.61428571   89.93214286
Female   23   67.99145963   12.65945704    2.63967927   48.48214286   95.45714286

Variances        T       DF    Prob>|T|
---------------------------------------
Unequal    -0.1196     38.5      0.9054
Equal      -0.1268     60.0      0.8995

For H0: Variances are equal, F' = 1.57    DF = (22,38)    Prob>F' = 0.2190

                       Grades from STA 302: Fall, 1990                       6
                                                 13:58 Sunday, January 7, 1998

                            Descriptive Statistics

Variables            Sum            Mean  Uncorrected SS

INTERCEP              62               1              62
MIDTERM             4352    70.193548387          316804
QUIZAVE     475.85714286    7.6751152074    3729.6938776
COMPAVE           547.75    8.8346774194      4915.78125
FINAL               3067    49.467741935          170429

Variables       Variance   Std Deviation

INTERCEP               0               0
MIDTERM     185.60126917    13.623555673
QUIZAVE     1.2694341618    1.1266916889
COMPAVE     1.2555195664    1.1204996949
FINAL         306.744844    17.514132693

Model: MODEL1
Dependent Variable: FINAL

                             Analysis of Variance

                          Sum of          Mean
Source          DF       Squares        Square      F Value       Prob>F

Model            3    6216.33032    2072.11011        9.618       0.0001
Error           58   12495.10516     215.43285
C Total         61   18711.43548

    Root MSE      14.67763     R-square       0.3322
    Dep Mean      49.46774     Adj R-sq       0.2977
    C.V.          29.67112

                             Parameter Estimates

                Parameter      Standard    T for H0:
Variable  DF     Estimate         Error   Parameter=0    Prob > |T|

INTERCEP   1     0.554234   16.09660613         0.034        0.9727
MIDTERM    1     0.498057    0.17318547         2.876        0.0056
QUIZAVE    1     5.371214    2.24349814         2.394        0.0199
COMPAVE    1    -3.086879    1.99437533        -1.548        0.1271

                               Variable
Variable  DF      Type I SS    Label

INTERCEP   1         151718    Intercept
MIDTERM    1    4881.795290
QUIZAVE    1     818.430992    Quiz Average (drop lowest)
COMPAVE    1     516.104040    Computer Average (drop lowest)

Dependent Variable: FINAL
Test: SMALSTUF   Numerator:    667.2675  DF:    2   F value:   3.0973
                 Denominator:  215.4328  DF:   58   Prob>F:    0.0527
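
As a check on the last table: quizave and compave are the last two variables in the model statement, so their Type I (sequential) sums of squares add up to the numerator sum of squares for the smalstuf test. The sketch below redoes that arithmetic in a data step that reads no data. The variable names are made up for illustration, and the numbers are copied from the listing above.

```sas
/* Sketch: reproduce the SMALSTUF F test from numbers in the listing. */
data _null_;
   ss_quiz = 818.430992;    /* Type I SS for QUIZAVE             */
   ss_comp = 516.104040;    /* Type I SS for COMPAVE             */
   mse     = 215.43285;     /* Error mean square (ANOVA table)   */
   numer = (ss_quiz + ss_comp) / 2;   /* 2 numerator df          */
   f = numer / mse;
   p = 1 - probf(f, 2, 58);           /* upper tail of F(2, 58)  */
   put numer= f= p=;                  /* results go to the log   */
run;
```

By hand, (818.430992 + 516.104040)/2 = 667.2675 and 667.2675/215.43285 = 3.0973, matching the Numerator and F value printed by proc reg. The value of p written to the log should agree with the Prob>F of 0.0527.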