Scab Disease Data

Scab disease is a fungal infection that affects potatoes. The fungus does not grow well in acidic soil, so investigators designed a study to see whether adding sulphur to the soil would reduce scab disease. In a completely randomized design, plots of land were randomly assigned either to a control condition or to one of several sulphur treatments, with the sulphur spread on the land in the Spring or the Fall. The amount of sulphur was 300, 600 or 1200 pounds per acre. In the control condition, no sulphur was applied.

The potatoes were harvested at the end of the growing season. One hundred potatoes were randomly selected from each plot of land. The potatoes were washed, and then a lab assistant estimated the percent of each potato's surface that was infected with scab disease. The response variable is, for each plot of land, the mean percent surface area covered with scab disease. The design is a 2x3 factorial (season by amount of sulphur) augmented by a control condition.

Control (No sulphur)

          Sulphur (pounds per acre)
           300    600   1200
Fall
Spring

Here are the raw data.


Control    35.26
Control    30.69
Control    15.56
Control    31.59
Control    15.91
Control    15.77
Control    19.07
Control    17.15
Spring300  19.96
Spring300  30.12
Spring300  11.78
Spring300   5.13
Spring600  20.63
Spring600  22.75
Spring600  18.22
Spring600  11.40
Spring1200 19.84
Spring1200 15.45
Spring1200  8.11
Spring1200 13.61
Fall300     5.52
Fall300    14.80
Fall300     5.08
Fall300    12.59
Fall600    16.14
Fall600    14.54
Fall600    11.10
Fall600    20.23
Fall1200    2.01
Fall1200    8.48
Fall1200    7.43
Fall1200    5.08
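
As a quick check on the listing above, the following sketch computes the number of plots and the sample mean for each treatment. The dictionary layout and the names `scab` and `summarize` are illustrative choices, not part of the original document; only the numbers come from the data.

```python
from statistics import mean

# Raw data copied from the listing above, keyed by treatment label.
scab = {
    "Control":    [35.26, 30.69, 15.56, 31.59, 15.91, 15.77, 19.07, 17.15],
    "Spring300":  [19.96, 30.12, 11.78, 5.13],
    "Spring600":  [20.63, 22.75, 18.22, 11.40],
    "Spring1200": [19.84, 15.45, 8.11, 13.61],
    "Fall300":    [5.52, 14.80, 5.08, 12.59],
    "Fall600":    [16.14, 14.54, 11.10, 20.23],
    "Fall1200":   [2.01, 8.48, 7.43, 5.08],
}


def summarize(data):
    """Return {label: (number of plots, sample mean)} for each treatment."""
    return {label: (len(ys), mean(ys)) for label, ys in data.items()}


summary = summarize(scab)
for label, (n, ybar) in summary.items():
    print(f"{label:<10}  n={n}  mean={ybar:8.4f}")
```

Note that the control condition has 8 plots while each sulphur treatment has 4, so the design is unbalanced once the control is included.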

Click here for a convenient plain text version of this short data file.

Here are some questions we would like to answer. For each question, start by stating the null hypothesis in terms of μj values. The term amount of scab disease refers to average percent surface area infected for 100 potatoes.

  1. Is amount of scab disease affected by experimental treatment? Treatment includes the control condition.
  2. Is the expected amount of scab disease for each treatment different from the expected amount of scab disease in the control condition? That's six tests.
  3. Are there any other significant differences between experimental conditions?
  4. Is the average expected amount of scab disease in the 3 Fall conditions different from the expected amount in the control condition? This is one test.
  5. Is the average expected amount of scab disease in the 3 Spring conditions different from the expected amount in the control condition? This is one test.
  6. Is the average expected amount of scab disease in the 3 Spring conditions different from the average expected amount of scab disease in the 3 Fall conditions? This is one test.
  7. Is amount of scab disease affected by the amount of sulphur when the sulphur is applied in the Spring? This is one test.
  8. Is amount of scab disease affected by the amount of sulphur when the sulphur is applied in the Fall? This is one test.
  9. For each amount of sulphur, is the expected amount of scab disease different depending on whether the treatment is applied in Fall or the Spring? That's three tests.
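
One possible way to compute test statistics for questions like these is sketched below, treating the seven conditions as a one-way layout with a pooled error term. The helper names `equality_f` and `contrast_f` and the choice of which questions to illustrate (1, 4, 6 and 7) are assumptions made for this sketch. It returns F statistics only; p-values or critical values would still have to come from an F table or a statistics package.

```python
from statistics import mean

# Raw data from the listing above, keyed by treatment label.
scab = {
    "Control":    [35.26, 30.69, 15.56, 31.59, 15.91, 15.77, 19.07, 17.15],
    "Spring300":  [19.96, 30.12, 11.78, 5.13],
    "Spring600":  [20.63, 22.75, 18.22, 11.40],
    "Spring1200": [19.84, 15.45, 8.11, 13.61],
    "Fall300":    [5.52, 14.80, 5.08, 12.59],
    "Fall600":    [16.14, 14.54, 11.10, 20.23],
    "Fall1200":   [2.01, 8.48, 7.43, 5.08],
}

groups = list(scab)
n = {g: len(scab[g]) for g in groups}
ybar = {g: mean(scab[g]) for g in groups}
N, p = sum(n.values()), len(groups)

# Pooled within-treatment variance (mean squared error) on N - p df.
ssw = sum((y - ybar[g]) ** 2 for g in groups for y in scab[g])
mse = ssw / (N - p)


def equality_f(subset):
    """F statistic for H0: all treatment means in `subset` are equal.

    Numerator df = len(subset) - 1, denominator df = N - p.
    """
    total_n = sum(n[g] for g in subset)
    centre = sum(n[g] * ybar[g] for g in subset) / total_n
    ss = sum(n[g] * (ybar[g] - centre) ** 2 for g in subset)
    return (ss / (len(subset) - 1)) / mse


def contrast_f(weights):
    """Estimate and F statistic (1 and N - p df) for H0: sum of c_j * mu_j = 0."""
    estimate = sum(c * ybar[g] for g, c in weights.items())
    variance = mse * sum(c ** 2 / n[g] for g, c in weights.items())
    return estimate, estimate ** 2 / variance


spring = ["Spring300", "Spring600", "Spring1200"]
fall = ["Fall300", "Fall600", "Fall1200"]

# Question 1: all seven means equal (6 numerator df).
f_q1 = equality_f(groups)
# Question 4: average of the Fall means minus the control mean.
est_q4, f_q4 = contrast_f({**{g: 1 / 3 for g in fall}, "Control": -1.0})
# Question 6: average of the Spring means minus average of the Fall means.
est_q6, f_q6 = contrast_f({**{g: 1 / 3 for g in spring},
                           **{g: -1 / 3 for g in fall}})
# Question 7: the three Spring means are equal (2 numerator df).
f_q7 = equality_f(spring)

print(f"Q1: F = {f_q1:.3f} on {p - 1} and {N - p} df")
print(f"Q4: estimate = {est_q4:.3f}, F = {f_q4:.3f} on 1 and {N - p} df")
print(f"Q6: estimate = {est_q6:.3f}, F = {f_q6:.3f} on 1 and {N - p} df")
print(f"Q7: F = {f_q7:.3f} on 2 and {N - p} df")
```

The remaining questions follow the same pattern: each pairwise comparison in Questions 2 and 9 is a contrast with weights 1 and -1, and Question 8 is `equality_f(fall)`.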

The tests in Questions 2 through 9 can be viewed as follow-up tests to the overall test in Question 1. That is, they are testing for differences that would produce inequality among the seven treatment means.

Assuming the overall test is significant (otherwise why try to find out where the effect came from?), suppose we want to protect all the follow-up tests simultaneously at a joint significance level of 0.05.
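
The text does not say which method provides this joint protection. One standard choice is the Scheffé family of follow-up tests, and the sketch below assumes that choice. Under Scheffé's method, a follow-up test with s numerator degrees of freedom is declared significant when its F statistic exceeds (r/s) times the critical value of the initial test, where r is the numerator degrees of freedom of the initial test (here r = 6). The critical value 2.49 is the approximate upper 5% point of the F distribution with 6 and 25 degrees of freedom, taken from a standard table; the function name `scheffe_threshold` is hypothetical.

```python
def scheffe_threshold(f_crit, r, s):
    """Scheffe-adjusted critical value for a follow-up F test.

    f_crit: critical value of the initial (overall) test
    r:      numerator df of the initial test
    s:      numerator df of the follow-up test (1 <= s <= r)
    """
    if not 1 <= s <= r:
        raise ValueError("s must be between 1 and r")
    return (r / s) * f_crit


# Approximate table value for the 0.05 critical value of F(6, 25).
# With SciPy installed, scipy.stats.f.ppf(0.95, 6, 25) gives a more precise figure.
F_CRIT_6_25 = 2.49
R = 6  # seven treatment means, so 6 numerator df for the overall test

for s in (1, 2):
    threshold = scheffe_threshold(F_CRIT_6_25, R, s)
    print(f"follow-up test with {s} df: reject if F > {threshold:.2f}")
```

A single-contrast follow-up therefore needs an F statistic about six times as large as the overall critical value, which is the price of protecting every possible follow-up at once.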

 

 

There are so many good questions and discussion topics that could be based on this example.

  1. Is n=4 per treatment really enough?
  2. Why not use the individual potatoes as experimental units?
  3. Why is the normal assumption pretty reasonable? (CLT, but not iid)
  4. Explanatory variable is numeric. Why not just do a linear regression?
  5. This brings up the possibility of testing departure from linearity.
  6. Is there a case for one-sided tests here?
  7. Is the amount of reduction worth the expense?
  8. Monitoring actual soil acidity, at various depths. Mediating variable, if that's what it's called.
  9. Degree of infestation is likely not controlled, and could be the most powerful influence. Consider analysis of covariance or blocking.
  10. Other potential response variables, like size and flavour of the potatoes!
  11. Possible interactions with fertilizers and other pesticides. At least these should be held constant.
  12. How would things be different if this were a purely observational study? It would be easier, you know. Just ask the farmers.

 


This data set is based on an example in Cochran and Cox's (1958) classic text Experimental Designs. The original data appear on page 97 of Cochran and Cox's book. The data above are carefully designed to give the same results as the original, without actually using their numbers. The R function used to reconstruct the data appears in a comment statement at the end of this document. View the html source to see it.

This document, including the data and the R function, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use any part of it almost any way you like, as long as you share the results freely. See the license for details.