STA429/1007 Assignment 2

Quiz on Thursday Oct. 4th


This assignment is based on material in Chapter 2 of the online text, and associated lecture material. You may also want to review the material on sample correlation and regression in Chapter 1.

The file trees.data has diameter in inches at 4.5 feet from the ground, height in feet, and volume in cubic feet for a sample of Black Cherry trees. Read and label the data with SAS. Obtain a full set of descriptive statistics with proc univariate; use the normal and plot options. Then use proc corr to get a matrix of Pearson and Spearman (rank) correlations, with significance tests. Take a close look at the output, and be ready to answer questions like the following.

  1. How many cherry trees are in the sample?
  2. What is the median height, in feet?
  3. What is the third smallest diameter?
  4. What is the standard deviation of the volumes?
  5. Do the volume measurements appear to be from a normal distribution? Answer Yes or No and give four p-values.
  6. For the height measurements, we have a t statistic of 66.40969 with p < 0.0001. What does this tell you?
  7. What is the Pearson correlation between height and volume?
  8. What is the p-value associated with the correlation between height and volume?
  9. Is the correlation between height and volume statistically significant at the 0.05 level?
  10. The correlation is positive, so greater height tends to go with ______ volume.
  11. What proportion of the variation in volume is explained by height?
  12. What is the Spearman (rank) correlation between height and volume?
  13. What is the p-value associated with the rank correlation between height and volume?
  14. Is the rank correlation between height and volume statistically significant at the 0.05 level?
  15. The rank correlation is _______, so greater height tends to go with ______ volume.
  16. Does it look like the departure of volume from normality caused a problem here? Briefly justify your answer.

You will not be asked to hand in the answers to these questions. They are just practice for the quiz. Questions on the quiz will be similar to these, but probably not identical.

Bring your list file and log file (not just a listing of the program file) to the quiz. Please do not write anything on the printouts in advance of the quiz, except possibly your name. Please ensure that the log file and list file you bring come from the same run. It is common for students to notice mistakes in the log file, fix them, run the job again, and then bring the new list file along with the old log file. This makes it so frustrating for me to track down errors that I promise to deduct marks if you do it.

As mentioned in the text and lecture, the Unix curl command is a great way to get data from the course website. For example,

curl http://fisher.utstat.toronto.edu/~brunner/429f07/code_n_data/hw/trees.data > trees.data
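
If you want to check that the download worked before you start SAS, something like the following may help. It is only a sketch: the function names fetch, count_fields and count_rows are made up for illustration, and it assumes the data file is plain text with values separated by spaces or tabs. The --fail option asks curl to report an HTTP error rather than quietly saving an error page under the name of your data file.

```shell
# Hypothetical helper functions; only curl, awk and sort are standard tools.

# fetch URL FILE: download URL into FILE. With --fail, curl exits with an
# error on an HTTP failure (such as 404) instead of saving the error page.
fetch() {
    curl --fail --silent --show-error --output "$2" "$1"
}

# count_fields FILE: print each distinct number of whitespace-separated
# values found on the non-blank lines of FILE, one number per line.
# ASSUMPTION: the file is plain text with whitespace-separated values.
count_fields() {
    awk 'NF > 0 { print NF }' "$1" | sort -n -u
}

# count_rows FILE: print the number of non-blank lines in FILE.
count_rows() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

After defining these, you could run fetch with the URL above and trees.data as the two arguments, then count_fields trees.data and count_rows trees.data. If count_fields prints more than one number, the lines do not all have the same number of values, and you should look at the file before writing your input statement.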