STA442/1008 Final Exam Information


General Information

The final exam will be on Wednesday, May 1st, from 9 am to 12 noon, in Room 3093, South Building. It will be open book and open notes, and it will be comprehensive. Bring a calculator.

As you know, the final exam is optional. You do not have to take it, but if you do take the final, the mark you receive will be substituted for either your lowest quiz score or your two lowest quiz scores, whichever will help you more. If your final exam mark is lower than your lowest quiz score, it will still be substituted. So in theory, taking the final could hurt your mark. However, I have never seen this happen.
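
As a rough illustration of the substitution rule, here is a small R sketch. It assumes that quizzes are equally weighted, that the final is marked on the same scale as the quizzes, and that "help you more" means a higher quiz average. The function name substitute_final and the scores are made up for the example.

```r
# Hypothetical helper; assumes equally weighted quizzes on the same scale as the final.
substitute_final <- function(quiz, final) {
  s   <- sort(quiz)                      # ascending, so s[1] is the lowest score
  one <- c(final, s[-1])                 # final replaces the lowest quiz
  two <- c(final, final, s[-(1:2)])      # final replaces the two lowest quizzes
  if (mean(two) > mean(one)) two else one
}

quiz <- c(55, 70, 80, 90)    # made-up quiz scores
substitute_final(quiz, 75)   # final beats both 55 and 70: two scores replaced
substitute_final(quiz, 60)   # final beats only 55: one score replaced
substitute_final(quiz, 50)   # final below the lowest quiz: still replaces 55
```

The last call shows the one case where taking the final lowers the average.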

There will be a computer portion of the final exam. It will include both SAS and S. There will be a lot more SAS than S. You will not be asked to write any SAS statements or S code. The exam will include pieces of input and output, and you will be asked questions about the output.

All the computer-related questions, and lots of the other questions too, will be based on two data sets.

Detailed descriptions of these data sets are given below.

A good way to study for the final is to familiarize yourself with the data sets, and then read through the notes, connecting the concepts and methods in the notes to the data sets. For example, in each data set, what are the most natural independent and dependent variables? For various hypotheses in multiple regression, what proportion of the remaining variation is explained? What proportion of the remaining variation is required for statistical significance?
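
For reference, the last two questions can be tied to the usual F-test for comparing a full and a reduced regression model. The notation here is an assumption and may differ from the notes: n is the sample size, p is the number of regression coefficients in the full model (including the intercept), s is the number of coefficients set to zero by the null hypothesis, and a is the proportion of remaining variation explained.

```latex
% SSE_R and SSE_F are the error sums of squares of the reduced and full models.
a = \frac{SSE_R - SSE_F}{SSE_R},
\qquad
F = \frac{(SSE_R - SSE_F)/s}{SSE_F/(n-p)} = \frac{n-p}{s}\cdot\frac{a}{1-a},
\qquad
a = \frac{sF}{n-p+sF}.
% With critical value f_c, the result is significant when
% a \ge \frac{s f_c}{n-p+s f_c}.
```

Substituting the critical value of F into the last expression gives the smallest proportion of remaining variation that would be statistically significant.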

Also, please do some reasonable analyses of the data. Try to understand everything on the printouts. Chances are good that you will do something very similar to what you see on the final exam. This is just a suggestion, though. It is not required. You will not be asked to turn in any printouts.

Description of Data Sets

The Tooth Growth Data: The Effect of Vitamin C on Tooth Growth in Guinea Pigs

The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). The data file has 4 variables. I will use the variable names given below, and you should too.

  1. id: Identification number
  2. len: Tooth length
  3. supp: Supplement type (VC = ascorbic acid, OJ = orange juice)
  4. dose: Dose in milligrams
Here are the data. Copy and paste.
    len supp dose
1   4.2   VC  0.5
2  11.5   VC  0.5
3   7.3   VC  0.5
4   5.8   VC  0.5
5   6.4   VC  0.5
6  10.0   VC  0.5
7  11.2   VC  0.5
8  11.2   VC  0.5
9   5.2   VC  0.5
10  7.0   VC  0.5
11 16.5   VC  1.0
12 16.5   VC  1.0
13 15.2   VC  1.0
14 17.3   VC  1.0
15 22.5   VC  1.0
16 17.3   VC  1.0
17 13.6   VC  1.0
18 14.5   VC  1.0
19 18.8   VC  1.0
20 15.5   VC  1.0
21 23.6   VC  2.0
22 18.5   VC  2.0
23 33.9   VC  2.0
24 25.5   VC  2.0
25 26.4   VC  2.0
26 32.5   VC  2.0
27 26.7   VC  2.0
28 21.5   VC  2.0
29 23.3   VC  2.0
30 29.5   VC  2.0
31 15.2   OJ  0.5
32 21.5   OJ  0.5
33 17.6   OJ  0.5
34  9.7   OJ  0.5
35 14.5   OJ  0.5
36 10.0   OJ  0.5
37  8.2   OJ  0.5
38  9.4   OJ  0.5
39 16.5   OJ  0.5
40  9.7   OJ  0.5
41 19.7   OJ  1.0
42 23.3   OJ  1.0
43 23.6   OJ  1.0
44 26.4   OJ  1.0
45 20.0   OJ  1.0
46 25.2   OJ  1.0
47 25.8   OJ  1.0
48 21.2   OJ  1.0
49 14.5   OJ  1.0
50 27.3   OJ  1.0
51 25.5   OJ  2.0
52 26.4   OJ  2.0
53 22.4   OJ  2.0
54 24.5   OJ  2.0
55 24.8   OJ  2.0
56 30.9   OJ  2.0
57 26.4   OJ  2.0
58 27.3   OJ  2.0
59 29.4   OJ  2.0
60 23.0   OJ  2.0

The Motor Trend Car Data

The data were extracted from the 1974 Motor Trend US magazine, and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). There are 12 variables. I will use the variable names given below, and you should too.

  1. name: Name of car
  2. mpg: Miles/(US) gallon
  3. cyl: Number of cylinders
  4. disp: Displacement (cu.in.)
  5. hp: Gross horsepower
  6. drat: Rear axle ratio
  7. wt: Weight (1000 lb)
  8. qsec: 1/4 mile time (seconds)
  9. vs: Engine shape (0 = V-shaped, 1 = straight)
  10. am: Transmission (0 = automatic, 1 = manual)
  11. gear: Number of forward gears
  12. carb: Number of carburettors
Here are the data. Copy and paste.
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda_RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda_RX4_Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun_710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet_4_Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet_Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster_360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc_240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc_230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc_280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc_280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc_450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc_450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc_450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac_Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln_Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler_Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat_128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda_Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota_Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota_Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge_Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC_Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro_Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac_Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat_X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche_914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus_Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford_Pantera_L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari_Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati_Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo_142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Hints

  1. If you are unsure about whether a quantitative independent variable with just a few values should be treated as quantitative or categorical (for example, number of cylinders of a car), the answer is probably categorical.
  2. The data files are set up with the first line containing variable names. This is great for the S read.table command, but seemingly inconvenient for SAS. Try the firstobs option on the infile statement to start reading from line 2 of the data file. My infile statement for the Tooth Growth data is
    infile 'toothgrowth.dat' firstobs=2;
  3. Remember, in SAS proc glm output, tests based on Type I sums of squares control each effect only for the effects that precede it in the model statement, while tests based on Type III sums of squares control each effect for all the others.
  4. I'm going to do two things with R. One will be a multiple regression, so do at least one multiple regression and practice looking at the output, say from the anova and summary functions. The other thing will be a randomization test. I'll just do it, and you'll be asked to explain what I did and state the conclusions.
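
For practice with hints 1 and 4, here is a minimal R sketch. It is only one possible version: the choice of predictors, the test statistic, the seed, and the number of shuffles are all assumptions, and the analysis on the exam may differ. It uses R's built-in mtcars and ToothGrowth data frames, which are assumed to match the listings above.

```r
# Sketch only: built-in data frames assumed to match the listings above.

# Multiple regression, treating number of cylinders as categorical.
# The predictors here are an arbitrary choice, for practice only.
full    <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)
summary(full)          # coefficients, t-tests, R-squared
anova(reduced, full)   # F-test for cyl, controlling for wt and hp

# Randomization test: does mean tooth length differ between OJ and VC?
# (This simple version ignores dose.)
set.seed(442)                                   # arbitrary seed
len  <- ToothGrowth$len
supp <- ToothGrowth$supp
mean_diff <- function(y, g) mean(y[g == "OJ"]) - mean(y[g == "VC"])
observed  <- mean_diff(len, supp)
# Shuffle the supplement labels many times, recomputing the statistic each time
permuted  <- replicate(5000, mean_diff(len, sample(supp)))
# Two-sided p-value: proportion of shuffles at least as extreme as observed
p_value   <- mean(abs(permuted) >= abs(observed))
p_value
```

The idea of the randomization test is that if supplement type were unrelated to tooth length, shuffling the labels would produce differences as large as the observed one fairly often.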

Sample Questions

  1. In the Motor Trend Cars data, consider the correlation between hp and qsec.
    1. What proportion of the variation in quarter mile time is explained by horsepower?
    2. Is the result statistically significant?
    3. State the finding in non-technical language. Assume the person you are talking to has never heard of a correlation.
  2. In the Tooth Growth data, what are the independent variables? What is the dependent variable? Do the most obvious analysis. State your conclusions, in non-technical language.
  3. In the Motor Trend Cars data, once you control for weight, horsepower, and automatic vs. manual transmission, is number of gears (categorical) related to time in the quarter mile? To miles per gallon? Use proc glm as well as proc reg to check. There are no interactions in your models.
  4. In the question above, what are your conclusions if you do a crude multivariate analysis, allowing for the two dependent variables with a Bonferroni correction?
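
The mechanics of the Bonferroni correction in the last question can be sketched in R rather than SAS. This is an illustration under assumptions: it uses R's built-in mtcars data frame (assumed to match the listing above), a 0.05 significance level, and full versus reduced model F-tests for gear.

```r
# Sketch only: built-in mtcars assumed to match the listing above.
# Test gear (categorical) controlling for wt, hp and am, once per response.
red_q  <- lm(qsec ~ wt + hp + am, data = mtcars)
full_q <- lm(qsec ~ wt + hp + am + factor(gear), data = mtcars)
red_m  <- lm(mpg ~ wt + hp + am, data = mtcars)
full_m <- lm(mpg ~ wt + hp + am + factor(gear), data = mtcars)

# p-values from the full versus reduced F-tests
p <- c(qsec = anova(red_q, full_q)[2, "Pr(>F)"],
       mpg  = anova(red_m, full_m)[2, "Pr(>F)"])

alpha <- 0.05                        # assumed significance level
p < alpha / length(p)                # Bonferroni: compare each p-value to alpha/2
p.adjust(p, method = "bonferroni")   # equivalent: double each p-value (capped at 1)
```

Either form leads to the same decisions: a test counts as significant only if its p-value is below 0.05 divided by the number of dependent variables.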