STA 402S 1998: Introduction to SAS (Lesson One)


tuzo.erin > ls
cars.dat      cars98a.sas   class90.dat   intro98a.sas
tuzo.erin >  less class90.dat
1  2  9  1  7  8  4  3  5  2  6  10  10  10  5  0  0  0  0  55  43
0  2  10  10  5  9  10  8  6  8  10  10  8  9  9  9  9  10  10  66  79
1  2  10  10  5  10  10  10  9  8  10  10  10  10  10  10  9  10  10  94  67
1  2  10  10  8  9  10  7  10  9  10  10  10  9  10  10  9  10  10  81  65
0  1  10  1  0  0  8  6  5  2  10  9  0  0  10  6  0  5  0  54  29 

And so on. There are 21 columns and 62 rows of data; the columns are not aligned. This is a typical data file, in which the rows are cases and the columns are variables. If there are many variables, you may need more than one line of data for each case. In this instance the cases are people, but they can be anything: cities, families, farms, or days (as in time series analysis).
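As a minimal sketch of the "more than one line per case" situation, suppose a hypothetical file big.dat (not part of this lesson) holds x1-x10 on the first line of each case and y1-y5 on the second. The file name, data set name and variable names are all assumptions made for illustration. A slash in the input statement tells SAS to move to the next line of data:

```sas
/* Hypothetical example: big.dat is an assumed file in which each case
   occupies two lines, x1-x10 on the first and y1-y5 on the second.   */
data twoline;
     infile 'big.dat';
     input x1-x10        /* first line of the case            */
         / y1-y5;        /* slash: move to the next data line */
run;

proc print data=twoline;
run;
```

With this kind of list input, SAS would normally continue onto the next line by itself when it runs out of values, so the slash mainly makes the layout explicit.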

Now for the COMMAND FILE, a simple program that reads the data and specifies which statistics to compute.

tuzo.erin >  cat intro98a.sas
options linesize=79 pagesize=100;
title 'Grades from STA 302:  Fall, 1990';

proc format; /* Used to label values of the categorical variables */
     value sexfmt    0 = 'Male'   1 = 'Female';
     value ethfmt    1 = 'Chinese'
                     2 = 'European'
                     3 = 'Other' ;

data erindale;
     infile 'class90.dat';
     input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
     /* Drop lowest score for quiz & computer  */
     quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;
     compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;
     label ethnic = 'Apparent ethnic background (ancestry)'
           quizave = 'Quiz Average (drop lowest)'
           compave = 'Computer Average (drop lowest)';
     mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
     label mark = 'Final Mark';
     format sex sexfmt.;       /* Associates sex & ethnic    */
     format ethnic ethfmt.;    /* with formats defined above */


proc freq;
     tables sex ethnic;
proc means n mean std;
     var quiz1 -- mark;        /*  single dash only works with numbered
                                  lists, like quiz1-quiz8    */
proc freq;
     tables sex*ethnic / chisq;
proc corr;
     var final midterm quizave compave;
proc ttest;
     class sex;
     var mark;

/* Predict final exam score from midterm, quiz & computer */

proc reg  simple;
     model final = midterm quizave compave / ss1;
     smalstuf: test quizave = 0, compave = 0;

That was the end of the command file. On tuzo, we must run SAS with the command sasb instead of sas.
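Before running the program, the formulas in the data step can be checked by hand on a single case. The sketch below types the first line of class90.dat in-stream; the data set name check is made up for this illustration. For that student the quiz scores sum to 39 with a minimum of 1, so quizave = 38/7, about 5.43; the computer scores sum to 41 with a minimum of 0, so compave = 41/8 = 5.125. The two averages are out of 10, so they are multiplied by 10 to put them on the same 0-100 scale as the midterm and final, and the weights .3 + .1 + .3 + .3 add up to 1. By hand, mark = 16.29 + 5.125 + 16.5 + 12.9, about 50.81.

```sas
/* Sketch: the first line of class90.dat, typed in-stream so that the
   arithmetic can be compared with a hand calculation.
   The data set name "check" is an assumption, not part of the lesson. */
data check;
     input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
     quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;  /* (39 - 1)/7 */
     compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;  /* (41 - 0)/8 */
     mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
     datalines;
1  2  9  1  7  8  4  3  5  2  6  10  10  10  5  0  0  0  0  55  43
;
run;

proc print data=check;
     var quizave compave mark;
run;
```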


tuzo.erin > sasb intro98a
tuzo.erin > 
tuzo.erin >  ls
cars.dat      class90.dat   intro98a.lst
cars98a.sas   intro98a.log  intro98a.sas
tuzo.erin > 

SAS produces two output files: the log file (intro98a.log) and the list file (intro98a.lst). If your program has errors, there may be no lst file, and the error messages will appear in the log file.

tuzo.erin >  cat intro98a.log 
                                                         The SAS System
                       13:58 Sunday, January 7, 1998

NOTE: Copyright(c) 1989 by SAS Institute Inc., Cary, NC USA.
NOTE: SAS (r) Proprietary Software Release 6.07  TS203
      Licensed to UNIVERSITY OF TORONTO, Site 0008987003.


NOTE: AUTOEXEC processing beginning; file is /local/lib/sas/autoexec.sas.

NOTE: SAS initialization used:
      real time           0.255 seconds
      cpu time            0.120 seconds


NOTE: AUTOEXEC processing completed.





1          options linesize=79 pagesize=100;
2          title 'Grades from STA 302:  Fall, 1990';
3
4          proc format;
4                       /* Used to label values of the categorical variables
*/
5               value sexfmt    0 = 'Male'   1 = 'Female';
NOTE: Format SEXFMT has been output.
6               value ethfmt    1 = 'Chinese'
7                               2 = 'European'
8                               3 = 'Other' ;
NOTE: Format ETHFMT has been output.
9

NOTE: PROCEDURE FORMAT used:
      real time           0.122 seconds
      cpu time            0.040 seconds

10         data erindale;
11              infile 'class90.dat';
12              input sex ethnic quiz1-quiz8 comp1-comp9 midterm final;
13              /* Drop lowest score for quiz & computer  */
14              quizave = ( sum(of quiz1-quiz8) - min(of quiz1-quiz8) ) / 7;
15              compave = ( sum(of comp1-comp9) - min(of comp1-comp9) ) / 8;
16              label ethnic = 'Apparent ethnic background (ancestry)'
17                    quizave = 'Quiz Average (drop lowest)'
18                    compave = 'Computer Average (drop lowest)';
19              mark = .3*quizave*10 + .1*compave*10 + .3*midterm + .3*final;
20              label mark = 'Final Mark';
21              format sex sexfmt.;       /* Associates sex & ethnic    */
22              format ethnic ethfmt.;    /* with formats defined above */
23

NOTE: The infile 'class90.dat' is:
      File Name=/res/jbrunner/class/sta402s98/class90.dat,
      Owner Name=jbrunner,Group Name=research,
      Access Permission=rw-------,
      File Size (bytes)=4372

NOTE: 62 records were read from the infile 'class90.dat'.
      The minimum record length was 65.
      The maximum record length was 77.
NOTE: The data set WORK.ERINDALE has 62 observations and 24 variables.
NOTE: DATA statement used:
      real time           0.160 seconds
      cpu time            0.060 seconds


24         proc freq;
25              tables sex ethnic;
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used:
      real time           0.186 seconds
      cpu time            0.060 seconds


26         proc means n mean std;
27              var quiz1 -- mark;    /*  single dash only works with numbered
28                                           lists, like quiz1-quiz8    */

NOTE: The PROCEDURE MEANS printed page 2.
NOTE: PROCEDURE MEANS used:
      real time           0.025 seconds
      cpu time            0.030 seconds


29         proc freq;
30              tables sex*ethnic / chisq;

NOTE: The PROCEDURE FREQ printed page 3.
NOTE: PROCEDURE FREQ used:
      real time           0.291 seconds
      cpu time            0.040 seconds


31         proc corr;
32              var final midterm quizave compave;

NOTE: The PROCEDURE CORR printed page 4.

2                               The SAS System    13:58 Sunday, January 7, 1998

NOTE: PROCEDURE CORR used:
      real time           0.022 seconds
      cpu time            0.020 seconds


33         proc ttest;
34              class sex;
35              var mark;
36
37         /* Predict final exam score from midterm, quiz & computer */
38

NOTE: The PROCEDURE TTEST printed page 5.
NOTE: PROCEDURE TTEST used:
      real time           0.015 seconds
      cpu time            0.020 seconds

39         proc reg  simple;
40              model final = midterm quizave compave / ss1;
41
42
43
NOTE: 62 observations read.
      62 observations used in computations.
NOTE: The PROCEDURE REG printed pages 6-7.
NOTE: PROCEDURE REG used:
      real time           0.382 seconds
      cpu time            0.030 seconds


NOTE: The SAS System used:
      real time           1.511 seconds
      cpu time            0.460 seconds

NOTE: SAS Institute Inc., SAS Circle, PO Box 8000, Cary, NC 27512-8000

tuzo.erin > 
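
The log reports that WORK.ERINDALE has 62 observations and 24 variables: the 21 read from class90.dat plus quizave, compave and mark, which were created in the data step. One way to see the full list of variables, with their labels and formats, is proc contents. This is only a sketch; it assumes the step is added to the end of intro98a.sas, so that the data set erindale already exists in the same job.

```sas
/* Assumes the data step in intro98a.sas has already run in this job,
   so that WORK.ERINDALE exists.                                     */
proc contents data=erindale;
run;
```

The listing goes to the lst file along with the output of the other procedures.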


tuzo.erin >  cat intro98a.lst
                       Grades from STA 302:  Fall, 1990                       1
                                                  13:58 Sunday, January 7, 1998

                                           Cumulative  Cumulative
                SEX   Frequency   Percent   Frequency    Percent
             ----------------------------------------------------
             Male           39      62.9          39       62.9
             Female         23      37.1          62      100.0




                     Apparent ethnic background (ancestry)

                                            Cumulative  Cumulative
              ETHNIC   Frequency   Percent   Frequency    Percent
            ------------------------------------------------------
            Chinese          41      66.1          41       66.1
            European         15      24.2          56       90.3
            Other             6       9.7          62      100.0


   Variable  Label                            N          Mean       Std Dev
   ------------------------------------------------------------------------
   QUIZ1                                     62     9.0967742     2.2739413
   QUIZ2                                     62     5.8870968     3.2294995
   QUIZ3                                     62     6.0483871     2.3707744
   QUIZ4                                     62     7.7258065     2.1590022
   QUIZ5                                     62     9.0645161     1.4471109
   QUIZ6                                     62     7.1612903     1.9264641
   QUIZ7                                     62     5.7903226     2.1204477
   QUIZ8                                     62     6.3064516     2.3787909
   COMP1                                     62     9.1451613     1.1430011
   COMP2                                     62     8.8225806     1.7604414
   COMP3                                     62     8.3387097     2.5020880
   COMP4                                     62     7.8548387     3.2180168
   COMP5                                     62     9.4354839     1.7237109
   COMP6                                     62     7.8548387     2.4350364
   COMP7                                     62     6.6451613     2.7526248
   COMP8                                     62     8.8225806     1.6745363
   COMP9                                     62     8.2419355     3.7050497
   MIDTERM                                   62    70.1935484    13.6235557
   FINAL                                     62    49.4677419    17.5141327
   QUIZAVE   Quiz Average (drop lowest)      62     7.6751152     1.1266917
   COMPAVE   Computer Average (drop lowest)  62     8.8346774     1.1204997
   MARK      Final Mark                      62    67.7584101    11.0235746
   ------------------------------------------------------------------------

                            TABLE OF SEX BY ETHNIC

                 SEX       ETHNIC(Apparent ethnic background (ancestry))

                 Frequency|
                 Percent  |
                 Row Pct  |
                 Col Pct  |Chinese |European|Other   |  Total
                 ---------+--------+--------+--------+
                 Male     |     27 |      7 |      5 |     39
                          |  43.55 |  11.29 |   8.06 |  62.90
                          |  69.23 |  17.95 |  12.82 |
                          |  65.85 |  46.67 |  83.33 |
                 ---------+--------+--------+--------+
                 Female   |     14 |      8 |      1 |     23
                          |  22.58 |  12.90 |   1.61 |  37.10
                          |  60.87 |  34.78 |   4.35 |
                          |  34.15 |  53.33 |  16.67 |
                 ---------+--------+--------+--------+
                 Total          41       15        6       62
                             66.13    24.19     9.68   100.00


                     STATISTICS FOR TABLE OF SEX BY ETHNIC

            Statistic                     DF     Value        Prob
            ------------------------------------------------------
            Chi-Square                     2     2.921       0.232
            Likelihood Ratio Chi-Square    2     2.996       0.224
            Mantel-Haenszel Chi-Square     1     0.000       0.995
            Phi Coefficient                      0.217
            Contingency Coefficient              0.212
            Cramer's V                           0.217

            Sample Size = 62
            WARNING:  33% of the cells have expected counts less
                       than 5. Chi-Square may not be a valid test.

                       Grades from STA 302:  Fall, 1990                       4
                                                  13:58 Sunday, January 7, 1998

                             Correlation Analysis

              4 'VAR' Variables:  FINAL    MIDTERM  QUIZAVE  COMPAVE


                              Simple Statistics

 Variable               N             Mean          Std Dev              Sum

 FINAL                 62        49.467742        17.514133      3067.000000
 MIDTERM               62        70.193548        13.623556      4352.000000
 QUIZAVE               62         7.675115         1.126692       475.857143
 COMPAVE               62         8.834677         1.120500       547.750000

                              Simple Statistics

 Variable         Minimum          Maximum     Label

 FINAL          15.000000        89.000000
 MIDTERM        44.000000       103.000000
 QUIZAVE         4.571429         9.714286     Quiz Average (drop lowest)
 COMPAVE         5.000000        10.000000     Computer Average (drop lowest)


   Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 62

                                     FINAL     MIDTERM     QUIZAVE     COMPAVE

FINAL                              1.00000     0.51078     0.47127     0.14434
                                    0.0         0.0001      0.0001      0.2630

MIDTERM                            0.51078     1.00000     0.59294     0.41277
                                    0.0001      0.0         0.0001      0.0009

QUIZAVE                            0.47127     0.59294     1.00000     0.52649
Quiz Average (drop lowest)          0.0001      0.0001      0.0         0.0001

COMPAVE                            0.14434     0.41277     0.52649     1.00000
Computer Average (drop lowest)      0.2630      0.0009      0.0001      0.0

                       Grades from STA 302:  Fall, 1990                       5
                                                  13:58 Sunday, January 7, 1998

                                TTEST PROCEDURE

Variable: MARK         Final Mark

   SEX       N         Mean      Std Dev    Std Error      Minimum  Maximum
-------------------------------------------------------------------------------
  Male      39  67.62097070  10.11112521   1.61907581  43.61428571 89.93214286
Female      23  67.99145963  12.65945704   2.63967927  48.48214286 95.45714286

Variances        T       DF    Prob>|T|
---------------------------------------
Unequal    -0.1196     38.5      0.9054
Equal      -0.1268     60.0      0.8995

For H0: Variances are equal, F' = 1.57    DF = (22,38)    Prob>F' = 0.2190

                       Grades from STA 302:  Fall, 1990                       6
                                                  13:58 Sunday, January 7, 1998

                            Descriptive Statistics

     Variables                 Sum                Mean      Uncorrected SS

     INTERCEP                   62                   1                  62
     MIDTERM                  4352        70.193548387              316804
     QUIZAVE          475.85714286        7.6751152074        3729.6938776
     COMPAVE                547.75        8.8346774194          4915.78125
     FINAL                    3067        49.467741935              170429


               Variables            Variance       Std Deviation

               INTERCEP                    0                   0
               MIDTERM          185.60126917        13.623555673
               QUIZAVE          1.2694341618        1.1266916889
               COMPAVE          1.2555195664        1.1204996949
               FINAL              306.744844        17.514132693


Model: MODEL1
Dependent Variable: FINAL

                             Analysis of Variance

                                Sum of         Mean
       Source          DF      Squares       Square      F Value       Prob>F

       Model            3   6216.33032   2072.11011        9.618       0.0001
       Error           58  12495.10516    215.43285
       C Total         61  18711.43548

           Root MSE      14.67763     R-square       0.3322
           Dep Mean      49.46774     Adj R-sq       0.2977
           C.V.          29.67112

                              Parameter Estimates

                      Parameter      Standard    T for H0:
     Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

     INTERCEP   1      0.554234   16.09660613         0.034        0.9727
     MIDTERM    1      0.498057    0.17318547         2.876        0.0056
     QUIZAVE    1      5.371214    2.24349814         2.394        0.0199
     COMPAVE    1     -3.086879    1.99437533        -1.548        0.1271

                                 Variable
     Variable  DF     Type I SS     Label

     INTERCEP   1        151718  Intercept
     MIDTERM    1   4881.795290
     QUIZAVE    1    818.430992  Quiz Average (drop lowest)
     COMPAVE    1    516.104040  Computer Average (drop lowest)



Dependent Variable: FINAL
Test: SMALSTUF Numerator:    667.2675  DF:    2   F value:   3.0973
               Denominator:  215.4328  DF:   58   Prob>F:    0.0527
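
The SMALSTUF test can be reconstructed from numbers already on the page. Because quizave and compave were entered last, the numerator is the sum of their Type I sums of squares divided by the 2 degrees of freedom, (818.430992 + 516.104040)/2 = 667.2675, and the denominator is the error mean square, 215.43285. The sketch below, which is an illustration rather than part of the original program, recomputes the F ratio and its p-value using the probf function and writes them to the log; the results should be close to the F value of 3.0973 and Prob>F of 0.0527 printed above.

```sas
/* Sketch: recompute the SMALSTUF F test from the printed output. */
data _null_;
     numer = (818.430992 + 516.104040) / 2;  /* Type I SS for quizave, compave, over 2 df */
     denom = 215.43285;                      /* error mean square, 58 df                  */
     f = numer / denom;
     p = 1 - probf(f, 2, 58);                /* upper tail of F with 2 and 58 df          */
     put f= p=;
run;
```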