STA305s14 Computer Assignment Three
Quiz in lecture on Monday Feb. 10th: Bring a calculator.
This is a continuation of your work on the sulphur and scab disease data used in Assignment 2. One possibility is to start by making a copy of your SAS program from last time, using something like cp hw2.sas hw3.sas, and then editing hw3.sas.
- It will help to have the means and standard deviations available, so just repeat the code that produced n, mean and standard deviation of the response variable for each treatment condition separately, including the control.
- Again, use proc reg to fit a regression model with an intercept and indicator dummy variables for the experimental treatments. If you didn't lose marks for this on Assignment 2, you can just start with the code you already have.
- Do non-directional tests (which would be two-sided tests if they were t-tests) to compare the expected amounts of scab disease for each pair of experimental conditions, including the control. We'll call these "pairwise comparisons." Some of these tests are already on the default output. It's unnecessary to do them twice, though you can if you really want to. For each test, be able to answer questions like these:
- Give the null hypothesis in terms of β values.
- Give the value of the test statistic. The answer is a number from your printout.
- Give the p-value. The answer is a number from your printout.
- Do you reject the null hypothesis at α = 0.05? Answer Yes or No.
- In plain, non-statistical language, what do you conclude from the test? The answer is something about scab disease on potatoes.
Remember, even though the tests are non-directional, you will give directional conclusions if the null hypothesis is rejected; so look at the means from Question 1. That is, you would not say "Putting 1200 pounds of sulphur per acre on the fields has an effect on the amount of scab disease." You would say "Putting 1200 pounds of sulphur per acre on the fields reduces the amount of scab disease." This feature of your answer is worth half the marks.
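The arithmetic behind each pairwise comparison can be sketched in Python (not SAS) as a check on your understanding. This is a minimal sketch that assumes the one-way layout of this assignment. With an intercept and a dummy variable for every condition but one, the regression MSE equals the pooled within-treatment variance, and the t statistic for comparing conditions i and j is (ȳᵢ − ȳⱼ)/√(MSE(1/nᵢ + 1/nⱼ)) on N − k degrees of freedom, where N is the total sample size and k is the number of conditions. A one-degree-of-freedom F from a test statement is the square of this t. The group labels and numbers are hypothetical, not the scab data. The sketch stops at the test statistic; the p-values still come from your SAS output.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def summarize(groups):
    """Return {label: (n, mean, sd)} for each treatment condition."""
    return {label: (len(y), mean(y), stdev(y)) for label, y in groups.items()}


def pooled_mse(summary):
    """Pooled within-treatment variance and its degrees of freedom (N - k).

    For a one-way layout this equals the MSE of the dummy-variable regression.
    """
    sse = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    n_total = sum(n for n, _, _ in summary.values())
    df = n_total - len(summary)
    return sse / df, df


def pairwise_t(summary, a, b):
    """t statistic for H0: mu_a = mu_b, using the MSE pooled over all groups."""
    mse, df = pooled_mse(summary)
    n_a, mean_a, _ = summary[a]
    n_b, mean_b, _ = summary[b]
    t = (mean_a - mean_b) / math.sqrt(mse * (1 / n_a + 1 / n_b))
    return t, df


def direction(summary, a, b):
    """Directional wording for 'a compared with b'; report it only if H0 is rejected."""
    diff = summary[a][1] - summary[b][1]
    if diff < 0:
        return "lower"
    if diff > 0:
        return "higher"
    return "no different"


# Hypothetical data: three observations per condition.
groups = {
    "control": [10, 12, 14],
    "trt_a": [6, 8, 10],
    "trt_b": [7, 9, 11],
}
summary = summarize(groups)
for a, b in combinations(summary, 2):
    t, df = pairwise_t(summary, a, b)
    print(f"{a} vs {b}: t = {t:.3f} on {df} df; {a} is {direction(summary, a, b)}")
```

The sign of the difference in means is what turns a rejected non-directional null hypothesis into a directional conclusion.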
- Now let's do more with the comparisons of treatment to control. Denote the treatment effects by Δ₃, Δ₆ and Δ₁₂. That is, Δ₆ is the change in expected amount of scab disease that results from adding 600 pounds of sulphur to a field. What is Δ₆ in terms of your β parameters?
- Using proc iml to obtain the necessary critical value, give a 95% confidence interval for Δ₆. This is calculator work. Do not hand-write the calculations and answers on your printout. If you do, it's an unauthorized aid and I will charge you with an academic offence. But feel free to make proc reg do it so you can check your work.
- A scientist might be very interested if adding sulphur to a field actually increased the amount of scab disease. It would be important to understand what happened, so the test must be able to detect a finding that is the opposite of what is expected. The scientist needs a two-tailed test. But from a practical standpoint, a sophisticated farmer who is considering adding sulphur to her potato fields does not care. If there's no evidence that it works, she won't do it. If there's evidence that it's harmful, she still won't do it. The other tail of the distribution just does not matter. For her, a one-tailed test makes more sense. So with a one-sided test and the usual α=0.05 significance level, is there evidence that adding 600 pounds of sulphur reduces scab disease? Answer Yes or No. What is the one-tailed p-value?
- Give the one-tailed p-values for the other two experimental treatments.
- But the farmer is really sophisticated. She says "I'm not going to bother with this unless there's evidence that the treatment can reduce the expected amount of scab by at least 10 percent of surface area. I'm fine with a one-tailed test at α=0.05." Consider the 300-pound treatment.
- Give the null hypothesis in terms of β values.
- Give the alternative hypothesis in terms of β values.
- Give the value of the test statistic. The answer is a number. This is calculator work. Again, do not hand-write the calculations and answers on your printout. Just be ready to do something like this on the quiz.
- Do you reject H₀? Answer Yes or No.
- Is the farmer willing to apply a sulphur treatment of 300 pounds per acre?
- Try the same thing for the 1200-pound treatment.
- Give the null hypothesis in terms of β values.
- Give the alternative hypothesis in terms of β values.
- Give the value of the test statistic. The answer is a number. This is calculator work. Again, do not hand-write the calculations and answers on your printout. Just be ready to do something like this on the quiz.
- Do you reject H₀? Answer Yes or No.
- Is the farmer willing to apply a sulphur treatment of 1200 pounds per acre?
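The calculator work for the treatment-versus-control questions comes down to three formulas, sketched here in Python. The sketch assumes you have already read an estimate of the treatment effect (est), its standard error (se) and the error degrees of freedom from proc reg, and a critical value from proc iml. A 95% interval is est ± t·se, with t the 0.975 quantile of the t distribution. A one-sided p-value is half the two-sided one when the estimate points in the predicted direction, and one minus half of it otherwise. For the farmer's question the boundary moves from 0 to −10 (assuming the response is scab in percent of surface area), so the statistic is (est − (−10))/se, and H₀ is rejected only if it falls below minus the one-tailed critical value (the 0.95 quantile). The function names, critical values and numbers are hypothetical.

```python
def conf_int(est, se, t_crit):
    """Two-sided confidence interval est +/- t_crit * se.

    For 95% coverage, t_crit is the 0.975 quantile of t on the error df.
    """
    return est - t_crit * se, est + t_crit * se


def one_tailed_p(t, p_two_sided, predicted="negative"):
    """Convert a two-sided p-value into a one-sided p-value.

    predicted="negative" means H1 says the effect is below the null value
    (sulphur reduces scab); the p-value is halved only when t points that way.
    """
    in_predicted_direction = t < 0 if predicted == "negative" else t > 0
    if in_predicted_direction:
        return p_two_sided / 2
    return 1 - p_two_sided / 2


def shifted_t(est, se, null_value):
    """t statistic when the boundary of H0 is null_value rather than 0."""
    return (est - null_value) / se


def reject_lower_tail(t, t_crit_one_sided):
    """Reject H0: effect >= null_value in favour of H1: effect < null_value?"""
    return t < -t_crit_one_sided


# Hypothetical values (not the scab data); critical values are made up too.
est, se = -12.0, 1.5
print(conf_int(est, se, t_crit=2.0))           # (-15.0, -9.0)
print(one_tailed_p(t=-2.2, p_two_sided=0.04))  # 0.02
t = shifted_t(est, se, null_value=-10.0)       # about -1.33
print(t, reject_lower_tail(t, t_crit_one_sided=1.7))  # not below -1.7, so False
```

Shifting the null value changes only the numerator of the t statistic; the standard error and degrees of freedom are the same ones used for the test of Δ = 0.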
- On Assignment 2, some people fit a linear regression with a single independent variable: amount of sulphur, in hundreds of pounds. This was not enough on its own, but it is not unreasonable, and we can make it good enough by adding terms to it.
Think of it this way. We know that within each treatment, the sum of squared deviations is smallest when taken around the sample mean, and the dummy-variable regression you have been fitting attains this minimum because its predicted values are exactly the treatment means. But any regression surface that passes exactly through the sample means should do the trick. So let's see. A straight line passes exactly through two points, a quadratic passes exactly through 3 points ... Oh, we can do this with polynomial regression.
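The claim that a polynomial of high enough degree passes exactly through the treatment means can be checked numerically. The Python sketch below uses Lagrange interpolation, a standard construction chosen here only for illustration (it is not what proc reg does), to evaluate the unique polynomial of degree at most 3 through four points with distinct x values, and confirms that it reproduces each mean. The sulphur levels 0, 3, 6 and 12 are in hundreds of pounds, taking the control as no sulphur; the means are hypothetical.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs[i], ys[i]) at x.

    With k distinct xs this polynomial has degree at most k - 1.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def poly_terms(x, degree):
    """Powers x, x**2, ..., x**degree, like assignment statements in a data step."""
    return [x ** p for p in range(1, degree + 1)]


levels = [0, 3, 6, 12]            # control taken as 0, in hundreds of pounds
means = [21.0, 15.0, 11.0, 10.0]  # hypothetical treatment means

for x, m in zip(levels, means):
    print(x, m, round(lagrange_eval(levels, means, x), 10))
print(poly_terms(12, 3))  # [12, 144, 1728]
```

Since the treatment means already minimize the within-treatment sum of squares, any model whose predicted values equal those means, such as this cubic, has the same SSE as the dummy-variable regression.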
- First, fit a simple linear regression model with just amount of sulphur as the independent variable. Is there evidence of a linear trend relating amount of sulphur to amount of scab disease? I don't have anything tricky in mind here. Just base your answer on a two-sided test of β₁.
- Now fit a polynomial regression model of the right degree. You can create the polynomial terms with assignment statements in the data step. SAS syntax is x**2 for x², and so on. Look at the overall F-test and R². Compare them with what you got from dummy variable regression.
- We have some evidence of a linear trend. Now let's test for departure from a linear trend. If we cannot reject this null hypothesis, we might as well go with a simple linear regression. You have actually fit both the full and the reduced model, but don't do the test that way; you might press the wrong key on your calculator. Instead, use a general linear test. What is the null hypothesis? What do you conclude? Plain language is not necessary here.
- You can also do this test with the first model (the dummy-variable regression), though it requires a little thought. You will probably want to make a table. In terms of the β values from your first model, what is the null hypothesis? Carry out the test. Does the F-statistic match what you got for the polynomial model? (I just went back and added one more test statement to my earlier model.)
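For reference, the F statistic for departure from linearity equals the full-versus-reduced comparison F = [(SSE_R − SSE_F)/q] / [SSE_F/(N − k)]. Here the reduced model is the straight line, the full model is either the dummy-variable regression or the cubic (they have the same SSE), k is the number of sulphur levels, and q = k − 2 is the number of restrictions. The Python sketch below computes it from hypothetical data so the pieces are visible; in the assignment itself, let SAS do the test as instructed. The function names and data are illustrative.

```python
from statistics import mean


def sse_linear(xs, ys):
    """Residual sum of squares from the least-squares line of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def group_by_level(xs, ys):
    """Collect the responses observed at each distinct x value."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return groups


def sse_cell_means(groups):
    """SSE when every observation is predicted by its own treatment mean.

    Both the dummy-variable model and a polynomial of degree k - 1 give these fits.
    """
    total = 0
    for g in groups.values():
        m = mean(g)
        total += sum((y - m) ** 2 for y in g)
    return total


def departure_from_linearity_f(xs, ys):
    """General linear test of H0: the treatment means lie on a straight line."""
    groups = group_by_level(xs, ys)
    k = len(groups)
    sse_full = sse_cell_means(groups)
    sse_reduced = sse_linear(xs, ys)
    q = k - 2                 # full model has k parameters, the line has 2
    df_error = len(ys) - k
    f = ((sse_reduced - sse_full) / q) / (sse_full / df_error)
    return f, q, df_error


# Hypothetical data: two observations at each sulphur level.
xs = [0, 0, 3, 3, 6, 6, 12, 12]
ys = [20, 22, 14, 16, 10, 12, 9, 11]
print(departure_from_linearity_f(xs, ys))  # F about 7.8 on (2, 4) df
```

Because both possible full models have the same SSE, this calculation does not depend on whether the full model is written with dummy variables or with polynomial terms.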
Bring both your log file and your procedure output file to the quiz. You will be asked to hand them in with your quiz. Please put your name and student number in the title statement.
Please be reminded of the rules for computer assignments and quizzes. See the syllabus for more detail.
- You may copy freely from me, but do not look at anyone else's SAS code before the quiz.
- Do not show your SAS code to any other student in the class.
- Do not write anything on the printouts.
- Important: Comment statements and other typed material that would help with interpretation of the computer output are expressly forbidden. For example, you may not type assignment questions or answers into your code, or otherwise cause them to appear on your log file or procedure output file. Any such material is an unauthorized aid for the purposes of this course, and if you use or possess an unauthorized aid, you will be charged with an academic offence.
- The log and procedure output files must be generated at the same time by the same SAS program or you may lose a lot of marks.
This assignment is based on an example in Cochran and Cox's (1958) classic text Experimental Designs. The data in this assignment are a reconstructed data set: the original data appear on page 97 of Cochran and Cox's book, and the reconstructed values are carefully designed to give the same results as the original without actually using their numbers. The R function used to reconstruct the data appears in a comment statement at the end of this document. View the HTML source to see it.
This assignment, including the data and the R function, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use any part of it almost any way you like, as long as you share the results freely. See the license for details.