STA441 Assignment 10

Quiz in Tutorial on Friday April the 5th

Suppose you have a data set with a quantitative response variable Y, quantitative explanatory variables X₁ and X₂, and factors A, B and C. Factor A has 3 levels (categories), factor B has 2 levels and factor C has 3 levels. You see this SAS code:
```
                proc glm;
                class A B C;
                model Y = X1 X2 A|B|C;
```
1. Is this an analysis of covariance? Answer Yes or No. If Yes, what are the covariates?
2. Indicate how you would define dummy variables for the factors, using effect coding. That's the setup with 1, 0 and -1. Use names like a₁, a₂, and so on for your dummy variables.
3. Write E(Y|X=x) for a regression model with your dummy variables, equivalent to the proc glm model above.
4. For the overall F-test produced by proc glm,
  1. State the null hypothesis in terms of β quantities from your regression model.
  2. How many (numerator) degrees of freedom will be in your test?
5. For each of the following effects in the proc glm model, state the null hypothesis in terms of β quantities from your regression model. All effects are controlling for x₁ and x₂. You need not show any work. Just write down the answers.
  1. Main effect of A.
  2. Averaging over B and C, is the mean of Y the same for each level of A?
  3. Main effect of B.
  4. Averaging over A and C, is the mean of Y different for the various levels of B?
  5. Main effect of C.
  6. Averaging over A and B, is Y related to C?
  7. Are the marginal means different for the different levels of C?
  8. A by B interaction.
  9. Averaging over C, does the relationship of A to Y depend on the level of B?
  10. Averaging over C, does the relationship of B to Y depend on the level of A?
  11. A by C interaction.
  12. Averaging over B, do the differences in mean Y for different values of A depend on the value of C?
  13. Averaging over B, do the differences in mean Y for different values of C depend on the value of A?
  14. B by C interaction.
  15. A by B by C interaction.
  16. Does the nature of the A by B interaction depend on the value of C?
  17. Does the nature of the A by C interaction depend on the value of B?
  18. Does the nature of the B by C interaction depend on the value of A?
6. Controlling for the covariates, are there any differences among the 18 treatment means? State the null hypothesis you would test in order to answer this question. Your answer should be in terms of β quantities from your regression model.
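
As a rough illustration of the effect coding scheme (1, 0 and -1) described above, the data step below builds dummy variables for the three factors. It is only a sketch under stated assumptions: the data set names `mydata` and `coded` are hypothetical, the factor levels are assumed to be stored as the numbers 1, 2, 3 with no missing values, and the last level of each factor is arbitrarily chosen as the one that gets -1.

```
data coded;
     set mydata;  /* hypothetical input data set containing Y, X1, X2, A, B, C */
     /* Factor A (3 levels): level 3 gets -1 on both dummy variables */
     if A=1 then a1=1; else if A=3 then a1=-1; else a1=0;
     if A=2 then a2=1; else if A=3 then a2=-1; else a2=0;
     /* Factor B (2 levels): level 2 gets -1 */
     if B=1 then b1=1; else b1=-1;
     /* Factor C (3 levels): coded the same way as A */
     if C=1 then c1=1; else if C=3 then c1=-1; else c1=0;
     if C=2 then c2=1; else if C=3 then c2=-1; else c2=0;
     /* Interaction terms are products of dummy variables, for example */
     a1b1 = a1*b1;
     a2b1 = a2*b1;
run;
```

The remaining two-factor and three-factor product terms would be formed in the same way, one product for every combination of dummy variables from different factors.
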

Now let's make that last example a little smaller, and try cell means coding. Again the quantitative response variable is Y. This time there is one quantitative covariate X, and factors A and B. Factor A has 3 levels (categories), factor B has 2 levels. The SAS code is
```
                proc glm;
                class A B;
                model Y = X A|B;
```
1. Indicate how you would define dummy variables, using cell means coding. That's the setup with just zeros and ones, and no intercept. You need to make a table with 6 rows, and a column for each dummy variable.
2. Write E(Y|X=x) for a regression model with your dummy variables, equivalent to the proc glm model above. This equation has all 7 β quantities.
3. Add one more column to your table, showing E(Y|X=x) for each treatment combination. My answer has only two β quantities in each row.
4. Make a 3x2 (A by B) table, and write E(Y|x) in each cell. This will help you answer the next question.
5. For each of the following effects in the proc glm model, state the null hypothesis in terms of β quantities from your regression model. You need not show any work. Just write down the answers.
  1. Main effect of A, controlling for x.
  2. Main effect of B, controlling for x.
  3. A by B interaction, controlling for x.
6. Perhaps the clearest test of a factor is to test the null hypothesis that it has no relationship with the response variable within any combination of the other factors. This is the null hypothesis of "conditional independence." That is, conditionally upon the values of all the other factors, the factor in question is independent of y.
  1. What is the null hypothesis for testing whether, controlling for x, factor A is related to y within any level of factor B? This is the null hypothesis of conditional independence of A and y given B and x. State the null hypothesis in terms of β quantities from your regression model.
  2. What is the null hypothesis for testing whether, controlling for x, factor B is related to y within any level of factor A? This is the null hypothesis of conditional independence of B and y given A and x. State the null hypothesis in terms of β quantities from your regression model.
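
One possible way to set up cell means coding for this smaller example in SAS is sketched below. The data set names `mydata` and `cellmeans`, the indicator names `mu11` through `mu32`, and the numeric level codes (A = 1, 2, 3 and B = 1, 2) are all assumptions made for illustration. The single `test` statement only demonstrates the syntax; it is not one of the hypotheses asked about above.

```
data cellmeans;
     set mydata;  /* hypothetical input data set containing Y, X, A, B */
     /* One zero-one indicator per treatment combination:
        a comparison in parentheses evaluates to 1 if true, 0 if false */
     mu11 = (A=1 and B=1);  mu12 = (A=1 and B=2);
     mu21 = (A=2 and B=1);  mu22 = (A=2 and B=2);
     mu31 = (A=3 and B=1);  mu32 = (A=3 and B=2);
run;

proc reg data=cellmeans;
     /* noint suppresses the intercept, as cell means coding requires */
     model Y = X mu11 mu12 mu21 mu22 mu31 mu32 / noint;
     /* Syntax example only: compare the two levels of B within A=1 */
     B_at_A1: test mu11 = mu12;
run;
```

With this setup, a null hypothesis involving several equalities can be written as one `test` statement with the equations separated by commas.
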

Consider a two-factor analysis of variance in which each factor has two levels. Use this regression equation for the problem:
E[Y|X=d] = β₀ + β₁d₁ + β₂d₂ + β₃d₁d₂
where d₁ and d₂ are dummy variables.
1. Make a two-by-two table showing the four treatment means in terms of β values. Use effect coding (the scheme with the minus ones). In terms of the β values, state the null hypothesis you would use to test for
  1. Main effect of the first factor
  2. Main effect of the second factor
  3. Interaction
2. Make another two-by-two table showing the four treatment means in terms of β values. Use indicator dummy variables (just zeros and ones). In terms of the β values, state the null hypothesis you would use to test for
  1. Main effect of the first factor
  2. Main effect of the second factor
  3. Interaction
3. Which dummy variable scheme do you like more for this purpose? Why?

I know this is pretty gruesome, but the data are real -- from the U of T School of Dentistry.

An experiment in dentistry seeks to test the effectiveness of a drug (HEBP) that is supposed to help dental implants become more firmly attached to the jaw bone. This is an initial test on animals. False teeth were implanted into the leg bones of rabbits, and the rabbits were randomly assigned to receive either the drug or a saline solution (placebo). Technicians administering the drug were blind to experimental condition.

Rabbits were also randomly assigned to be "sacrificed" after either 3, 6, 9 or 12 days. At that time, the implants were pulled out of the bone by a machine that measures force in newtons and stiffness in newtons/mm. For both of these measurements, higher values indicate more healing. A measure of "pre-load stiffness" in newtons/mm is also available for each animal. This may be another indicator of how firmly the false tooth was implanted into the bone, but it might even be a covariate. Nobody can seem to remember what "preload" means, so we'll ignore this variable for now.

The data are available in the file bunnies.data.txt. The variables are
1. Identification code
2. Time (3,6,9,12 days of healing)
3. Drug (1=HEBP, 0=saline solution)
4. Stiffness in newtons/mm
5. Force in newtons
6. Preload stiffness in newtons/mm
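
A minimal sketch of reading the file described above into SAS follows. Several details are assumptions, since the layout of the file is not specified here: the values are assumed to be separated by blanks in the order listed, with no header line, a numeric identification code, and the file in the current working directory. The variable names are also just one possible choice.

```
data bunnies;
     /* Path is an assumption; add firstobs=2 if the file has a header line */
     infile 'bunnies.data.txt';
     /* Use "id $" instead of "id" if the identification code is not numeric */
     input id time drug stiffness force preload;
run;

/* Look at the first few cases to check that the data were read correctly */
proc print data=bunnies (obs=5);
run;
```
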

Please do the following.
1. Classify the factors as within cases or between cases. See Lecture One Slide 32 for the definition.
2. Use proc freq to find out how many rabbits are in each experimental condition.
3. Use the means statement to get cell means and marginal means.
4. Using proc glm, conduct a standard two-way ANOVA, with force as the response variable. Don't suppress the interaction plot. Be prepared to answer the following questions about each of the significance tests that SAS produces by default (I count 4 default tests).
  1. What is the value of the test statistic? The answer is a number from your printout.
  2. What proportion of the remaining variation is explained? Better use proc iml.
  3. What is the p-value? The answer is a number from your printout.
  4. Is the result statistically significant at the 0.05 level? Yes or No.
  5. What, if anything, do you conclude? This is not the place for statistical jargon. "What do you conclude" means say something about the drug, healing, time.
5. Here are some questions you have already answered, partly.
  1. Averaging across healing time, does the drug have an effect on implantation of the false teeth into the bone? Answer Yes or No. If Yes, is it possible to draw a directional conclusion without further testing?
  2. Averaging across drug versus placebo, does healing time have an effect on implantation of the false teeth into the bone? Answer Yes or No. If Yes, is it possible to draw a directional conclusion without further testing?
  3. Does the effect of drug depend on amount of healing time? Answer Yes or No. If Yes, is it possible to draw a directional conclusion without further testing?
6. Make a table with one row for each treatment combination. Make columns showing the dummy variables for effect coding. That's the setup with 1, 0 and -1. You do not need to make columns for the product terms.
7. Give E[y|X=x] for a regression model with both main effects and the interaction. Use your variable names from the table. Notice that you are not being asked to actually fit this model to the data.
8. In terms of the β values of your regression model, give the null hypothesis you would test in order to answer each of the following questions.
  1. Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth?
  2. Averaging across drug and placebo, does elapsed time affect the mean force required to extract the tooth?
  3. Does the effect of the drug depend upon elapsed time before sacrifice?
  4. Does the pattern of healing over time depend upon drug?
  5. Is there a drug by time interaction?
9. Now, make a table with a row for each treatment combination. Make columns showing how you would set up the dummy variables for cell means coding. That's the setup with just zeros and ones, and no intercept.
10. Write E(y|X=x) for a regression model with your dummy variables. This equation has all 8 β quantities.
11. Add one more column to your table, showing E(Y) for each treatment combination in terms of your β quantities.
12. Make a 2x4 (Drug by Time) table, and write E(Y) in each cell. This will help you answer the next question.
13. In terms of the β values of your regression model, give the null hypothesis you would test in order to answer each of the following questions.
  1. Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth?
  2. Averaging across drug and placebo, does elapsed time affect the mean force required to extract the tooth?
  3. Does the effect of the drug depend upon elapsed time?
  4. Does the pattern of healing over time depend upon drug?
  5. Is there a drug by time interaction?
14. Now please return to SAS. Using proc reg and cell means coding (zero-one indicators and no intercept), test whether the drug has an effect at any time period. This is the main point of the study. You are testing the null hypothesis of conditional independence with one test. Obtain the F statistic and p-value. Do you reject H₀? Are the results statistically significant? What do you conclude?
  To follow up, conduct tests to answer the following questions. You will Bonferroni-correct the four tests at a joint significance level of α=0.05, and base any conclusions on Bonferroni-adjusted p-values.
  1. Is there a difference between Drug and Placebo at 3 days?
  2. Is there a difference between Drug and Placebo at 6 days?
  3. Is there a difference between Drug and Placebo at 9 days?
  4. Is there a difference between Drug and Placebo at 12 days?
  Be able to answer questions like these for each test.
  1. What is the value of the test statistic? The answer is a number.
  2. What is the Bonferroni-adjusted p-value? The answer is a number that you calculate with a calculator.
  3. Is the result statistically significant at the joint 0.05 level? Yes or No.
  4. What, if anything, do you conclude? This is not the place for statistical jargon. "What do you conclude" means say something about the drug, implantation of the false tooth into bone, healing, time -- something like that.
15. Now do the four follow-up tests in a non-parametric way using proc multest. While in the previous question you applied a Bonferroni correction for multiple testing, here you will use the permutation approach. You will be permuting (randomizing) the data values, not ranks. Obtain adjusted p-values. Do your conclusions change?
16. Finally, this was an animal trial. Based on the results of this experiment, do you recommend proceeding to clinical trials with humans? Answer Yes or No and briefly comment.
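
For the proc reg analysis with cell means coding, the code might look something like the sketch below. It assumes a SAS data set named `bunnies` containing variables `time`, `drug` and `force` coded as in the variable list; the data set name `bunnies2` and the indicator names (`d3` for drug at 3 days, `p3` for placebo at 3 days, and so on) are hypothetical choices, not part of the assignment.

```
data bunnies2;
     set bunnies;  /* assumed data set with time, drug (1=HEBP, 0=saline), force */
     /* Cell means coding: one zero-one indicator per treatment combination */
     d3  = (drug=1 and time=3);    p3  = (drug=0 and time=3);
     d6  = (drug=1 and time=6);    p6  = (drug=0 and time=6);
     d9  = (drug=1 and time=9);    p9  = (drug=0 and time=9);
     d12 = (drug=1 and time=12);   p12 = (drug=0 and time=12);
run;

proc reg data=bunnies2;
     /* No intercept, so each coefficient is a treatment mean */
     model force = d3 d6 d9 d12 p3 p6 p9 p12 / noint;
     /* One test of four equalities: no drug effect at any time period */
     anytime: test d3=p3, d6=p6, d9=p9, d12=p12;
     /* Follow-up tests, one per time period */
     day3:  test d3=p3;
     day6:  test d6=p6;
     day9:  test d9=p9;
     day12: test d12=p12;
run;
```

A Bonferroni-adjusted p-value for each of the four follow-up tests is the unadjusted p-value multiplied by 4, with any product greater than 1 set to 1.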

Please bring your log file and your results file to the quiz. As usual, answers to the questions are not to be handed in. They are just practice for the quiz. Please do not write anything on your printouts except your name and student number. It is okay to highlight the results file, but do not write interpretations on your results file, or cause them to appear in any way (including comment statements) on your log file.