STA442/1008 Assignment 8

Do this for Test Three: Friday, March 26th


The file potato2.dat contains data from an agricultural study in which potatoes were randomly assigned to be infected with one of three common strains of bacteria. Again randomly, equal numbers of potatoes with each type of bacteria were stored at one of two temperatures (1=cooler, 2=warmer). The dependent variable is amount of rot.

  1. Use proc glm to do an ordinary two-factor Analysis of Variance on these data. For each of the main effects and the interaction, answer these questions:
    1. What is the value of the test statistic?
    2. What is the p-value?
    3. Are the results statistically significant at the 0.05 level?
    4. Controlling for other effects in the model, what proportion of the remaining variation does the effect explain?
    5. What do you conclude?

    There is a bit of a trap here. I believe that because of the interaction, it makes sense to interpret one of the main effects, but not the other.

  2. Graph the interaction. A rough sketch is fine, or you can use graph paper. Does the effect of temperature appear to depend on type of bacteria? Describe any apparent trend in words. Does the effect of type of bacteria appear to depend on temperature? Describe any apparent trend in words.
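  3. Now make 6 indicator variables, one for each treatment combination; of course you'll have to go back into your data step to do this. Write µij for the expected amount of rot at temperature i with bacteria type j, so that µ12 is the cooler temperature with the second type of bacteria. If the two lines on your graph (one for each temperature) were parallel, you'd have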

    µ12 - µ11 = µ22 - µ21

    and

    µ13 - µ12 = µ23 - µ22.

    Using your indicator variables, set up a regression model with cell means coding (one indicator for each treatment combination and no intercept). Use the test statement of proc reg to duplicate the test for interaction. You should get exactly the same F and p-value that proc glm gave you.

  4. In the preceding question, you tested whether the effect of bacteria type depended on storage temperature. You can look at the interaction the other way: does the effect of storage temperature depend on type of bacteria? If not, you would have

    µ21 - µ11 = µ22 - µ12

    and

    µ22 - µ12 = µ23 - µ13.

    Test this using the test statement of proc reg. It's just another way of looking at the interaction, so you should get exactly the same F and p-value again.

  5. That last test for the interaction was a test for difference among three differences. It was significant, and this raises the question of which differences are different from one another. So please follow up with tests for all three pairwise differences between differences. You can just put three more test statements on your proc reg. What do you conclude? Explain your conclusions in plain language. Zero marks if you use terms that would not be understood by the average potato farmer.
  6. Would your answer to the last question change if you used a Bonferroni correction to protect the three tests of pairwise differences at joint significance level 0.05? Answer Yes or No. If the answer is Yes, say how the conclusion changed.
  7. Now please go back to the TV data. You'd like to test for regional differences (Rural vs Small Town vs Urban) in total TV hours watched, controlling for number of persons in the household. But the standard test would assume that the three regression lines (one for each region) are parallel, that is, that they have equal slopes. You want to test this assumption. Do it with proc reg. Of course you have to create some additional variables in your data step.
    1. What is the value of the test statistic?
    2. What is the p-value?
    3. Are the results statistically significant at the 0.05 level?
    4. Controlling for other effects in the model, what proportion of the remaining variation does the effect explain?
    5. What do you conclude?
  8. The preceding test was not significant, so go ahead and do the usual test for regional differences controlling for number of persons in the household.
    1. First do it with proc glm, getting least squares means. What appears to be happening?
    2. Now go back to proc reg. Controlling for number of persons in the household, test all pairwise differences between locations. Use a Bonferroni correction. At joint significance level 0.05, be able to say whether each test is significant. What do you conclude? Use plain language that even a politician could understand. Even the words "significant" and "mean" are forbidden.
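
As a check on the bookkeeping in questions 3 through 6, here is a small Python sketch (not SAS, and not something to hand in). It writes each null hypothesis as a vector of coefficients on the six cell means, confirms that the constraints in questions 3 and 4 are the same constraints rearranged, builds the three pairwise comparisons from question 5, and applies the Bonferroni rule. The cell ordering, the function names and the p-values at the bottom are assumptions made up for illustration; they are not output from the potato data.

```python
from itertools import combinations

# Assumed ordering of the six cell-mean indicators: (temperature, bacteria type).
CELLS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]


def contrast(terms):
    """Coefficient vector, in CELLS order, for a linear combination of cell means."""
    return [terms.get(cell, 0) for cell in CELLS]


# Question 3: (mu12 - mu11) - (mu22 - mu21) = 0 and (mu13 - mu12) - (mu23 - mu22) = 0
q3 = [
    contrast({(1, 2): 1, (1, 1): -1, (2, 2): -1, (2, 1): 1}),
    contrast({(1, 3): 1, (1, 2): -1, (2, 3): -1, (2, 2): 1}),
]

# Question 4: (mu21 - mu11) - (mu22 - mu12) = 0 and (mu22 - mu12) - (mu23 - mu13) = 0
q4 = [
    contrast({(2, 1): 1, (1, 1): -1, (2, 2): -1, (1, 2): 1}),
    contrast({(2, 2): 1, (1, 2): -1, (2, 3): -1, (1, 3): 1}),
]


def temperature_difference_contrast(j, k):
    """Coefficients of (mu2j - mu1j) - (mu2k - mu1k) for bacteria types j and k."""
    return contrast({(2, j): 1, (1, j): -1, (2, k): -1, (1, k): 1})


# Question 5: all three pairwise differences between temperature differences.
pairwise = {
    (j, k): temperature_difference_contrast(j, k)
    for j, k in combinations([1, 2, 3], 2)
}


def bonferroni_significant(p_value, n_tests, joint_alpha=0.05):
    """Bonferroni rule: compare each p-value to joint_alpha / n_tests."""
    return p_value < joint_alpha / n_tests


print("Q3 and Q4 constraints identical:", q3 == q4)
for pair, coefficients in pairwise.items():
    print("bacteria types", pair, "->", coefficients)

# Made-up p-values, for illustration only.
hypothetical_p = {(1, 2): 0.03, (1, 3): 0.004, (2, 3): 0.40}
for pair, p in hypothetical_p.items():
    print(pair, "unprotected:", p < 0.05,
          "Bonferroni:", bonferroni_significant(p, len(hypothetical_p)))
```

With cell means coding, each regression coefficient is one cell mean, so a coefficient vector like the ones above is what a test statement in proc reg has to express in terms of your own indicator variable names.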

Bring your log files and your list files to the test. Also bring your hand-drawn graph. You may need to hand some of this in as one of the test questions.