STA2453 Assignment One

Due at the beginning of class on Friday, October 9th


This assignment is based on the fish data. This data set was submitted to the Journal of Statistics Education by Professor Juha Puranen, Department of Statistics, University of Helsinki, Finland. Here is a description.

159 fish of 7 species were caught and measured. Each record has 9 fields: the observation number plus the 8 variables that follow it. All the fish were caught from the same lake (Laengelmavesi) near Tampere in Finland.

VARIABLE DESCRIPTIONS:

1  Obs       Observation number, ranging from 1 to 159
2  Species   (Numeric)
        Code Finnish  Swedish    English        Latin      
         1   Lahna    Braxen     Bream          Abramis brama
         2   Siika    Iiden      Whitefish      Leuciscus idus
         3   Saerki   Moerten    Roach          Leuciscus rutilus
         4   Parkki   Bjoerknan  Silver Bream   Abramis bjoerkna
         5   Norssi   Norssen    Smelt          Osmerus eperlanus
         6   Hauki    Jaedda     Pike           Esox lucius
         7   Ahven    Abborre    Perch          Perca fluviatilis

3  Weight      Weight of the fish (in grams)
4  Length1     Length from the nose to the beginning of the tail (in cm)
5  Length2     Length from the nose to the notch of the tail (in cm)
6  Length3     Length from the nose to the end of the tail (in cm)
7  Height%     Maximal height as % of Length3
8  Width%      Maximal width as % of Length3
9  Sex         1 = male, 0 = female



          ___/////___                  _
         /           \    ___          |
       /\             \_ /  /          H
     <   )            __)  \           |
       \/_\\_________/   \__\          _

     |------- L1 -------|
     |------- L2 ----------|
     |------- L3 ------------|


Values are aligned and delimited by blanks.
Missing values are denoted with NA.
There is one data line for each case.


==========

    1      1     242.0     23.2    25.4    30.0    38.4   13.4   NA
    2      1     290.0     24.0    26.3    31.2    40.0   13.8   NA
    3      1     340.0     23.9    26.5    31.1    39.8   15.1   NA

etc.
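
A minimal sketch of one way to read and label lines in this layout follows. It is an illustration under stated assumptions, not the required solution. The data set name fish, the variable names height and width (SAS names cannot contain %), and the format names spfmt and sexfmt are placeholders chosen for this sketch. The three sample lines are read inline rather than from the data file. Because NA is not SAS's own missing-value code, each record is held and NA is rewritten as a period before the fields are read.

```sas
/* Value labels for the coded variables (format names are placeholders) */
proc format;
    value spfmt
        1 = 'Bream'
        2 = 'Whitefish'
        3 = 'Roach'
        4 = 'Silver Bream'
        5 = 'Smelt'
        6 = 'Pike'
        7 = 'Perch';
    value sexfmt
        0 = 'Female'
        1 = 'Male';
run;

data fish;
    infile datalines;   /* assumption: for the real file, name it here instead */
    input @;                                   /* hold the current record      */
    _infile_ = tranwrd(_infile_, 'NA', '.');   /* NA becomes SAS missing (.)   */
    input obs species weight length1 length2 length3 height width sex;
    label obs     = 'Observation number'
          species = 'Species'
          weight  = 'Weight (g)'
          length1 = 'Nose to beginning of tail (cm)'
          length2 = 'Nose to notch of tail (cm)'
          length3 = 'Nose to end of tail (cm)'
          height  = 'Maximal height as % of Length3'
          width   = 'Maximal width as % of Length3'
          sex     = 'Sex';
    format species spfmt. sex sexfmt.;
    datalines;
    1      1     242.0     23.2    25.4    30.0    38.4   13.4   NA
    2      1     290.0     24.0    26.3    31.2    40.0   13.8   NA
    3      1     340.0     23.9    26.5    31.1    39.8   15.1   NA
;
run;
```

Without the NA rewrite, SAS would still store a missing value for each NA, but it would write an "Invalid data" note to the log for every affected line.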


The complete data file is available here.

For this assignment, use SAS to read and label the data.

  1. Produce a table of means, standard deviations and sample sizes for the continuous variables.
  2. Produce frequency distributions for the categorical variables.
  3. Produce a correlation matrix of weight and the length variables.
  4. You will turn in a hard copy of your log file (not the program file, the log file) and your output file. Make sure your name is written clearly on both files.
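
The three analysis steps map onto standard SAS procedures, sketched below under stated assumptions. The data set name fish and the variable names are placeholders. Treating species and sex as the categorical variables, and weight, the lengths, height and width as the continuous ones, is this sketch's reading of the variable list. Only the three sample lines are used, with NA typed as a period to keep the example short.

```sas
/* Three sample lines only; NA is typed as "." (SAS's missing-value code) */
data fish;
    input obs species weight length1 length2 length3 height width sex;
    datalines;
1 1 242.0 23.2 25.4 30.0 38.4 13.4 .
2 1 290.0 24.0 26.3 31.2 40.0 13.8 .
3 1 340.0 23.9 26.5 31.1 39.8 15.1 .
;
run;

/* 1. Sample sizes, means and standard deviations */
proc means data=fish n mean std;
    var weight length1 length2 length3 height width;
run;

/* 2. Frequency distributions */
proc freq data=fish;
    tables species sex;
run;

/* 3. Correlation matrix of weight and the length variables */
proc corr data=fish;
    var weight length1 length2 length3;
run;
```

With these three lines sex is missing throughout, so PROC FREQ reports only a missing-value count for it. The complete file should give a proper table.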

Please note that this is not a group project. You must do the work yourself, using an installation of SAS on your computer. Do not copy anyone's code (except mine), or allow yours to be copied. I will enforce this.