STA441 Assignment 8

Quiz in Tutorial on Thursday March 10th

Suppose you have a data set with a quantitative response variable Y, quantitative explanatory variables X₁ and X₂, and factors A, B and C. Factor A has 3 levels (categories), factor B has 2 levels and factor C has 3 levels. You see this SAS code:
```
proc glm;
     class A B C;
     model Y = X1 X2 A|B|C;
```
1. Is this an analysis of covariance? Answer Yes or No. If Yes, what are the covariates?
2. Indicate how you would define dummy variables for the factors, using effect coding. That's the setup with 1, 0 and -1. Use names like a₁, a₂, and so on for your dummy variables.
3. Write E(Y|X=x) for a regression model with your dummy variables, equivalent to the proc glm model above.
4. For the overall F-test produced by proc glm,
  1. State the null hypothesis in terms of β quantities from your regression model.
  2. How many (numerator) degrees of freedom will be in your test?
5. For each of the following effects in the proc glm model, state the null hypothesis in terms of β quantities from your regression model. You need not show any work. Just write down the answers.
  1. Main effect of A.
  2. Main effect of B.
  3. Main effect of C.
  4. A by B interaction.
  5. A by C interaction.
  6. B by C interaction.
  7. A by B by C interaction.
6. Controlling for the covariates, are there any differences among the 18 treatment means? State the null hypothesis you would test in order to answer this question. Your answer should be in terms of β quantities from your regression model.
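As a rough illustration of what effect coding looks like in practice, here is a minimal SAS sketch for factors A and B only (C works the same way as A). It assumes the levels are coded 1, 2, 3 for A and 1, 2 for B, which the problem does not actually say. The data set name `effect` and the do-loop construction are made up for this sketch.

```sas
/* Build one row per A-by-B combination and attach effect-coded dummies. */
/* Assumption: A takes values 1,2,3 and B takes values 1,2.              */
data effect;
   do A = 1 to 3;
      do B = 1 to 2;
         /* A comparison like (A=1) is 1 if true and 0 if false in SAS. */
         /* The last level of each factor gets -1 on all its dummies.   */
         a1 = (A=1) - (A=3);
         a2 = (A=2) - (A=3);
         b1 = (B=1) - (B=2);
         output;
      end;
   end;
run;

proc print data=effect noobs;
run;
```

With dummy variables like these, the interaction terms in the regression model would be products of the columns, such as a1*b1 and a2*b1.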
Now let's make that last example a little smaller, and try cell means coding. Again the quantitative response variable is Y. This time there is one quantitative covariate X, and factors A and B. Factor A has 3 levels (categories), factor B has 2 levels. The SAS code is
```
proc glm;
     class A B;
     model Y = X A|B;
```
1. Indicate how you would define dummy variables, using cell means coding. That's the setup with just zeros and ones, and no intercept. You need to make a table with 6 rows, and a column for each dummy variable.
2. Write E(Y|X=x) for a regression model with your dummy variables, equivalent to the proc glm model above. This equation has all 7 β quantities.
3. Add one more column to your table, showing E(Y|X=x) for each treatment combination. My answer has only two β quantities in each row.
4. Make a 3x2 (A by B) table, and write E(Y|x) in each cell. This will help you answer the next question.
5. For each of the following effects in the proc glm model, state the null hypothesis in terms of β quantities from your regression model. You need not show any work. Just write down the answers.
  1. Main effect of A.
  2. Main effect of B.
  3. A by B interaction.
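For comparison with effect coding, the following is a small sketch of how cell means dummy variables might be created in a SAS data step. As before, it assumes A is coded 1, 2, 3 and B is coded 1, 2. The names `cellmeans` and `m11` through `m32` are hypothetical.

```sas
/* One indicator per treatment combination; no intercept is used later. */
/* Assumption: A takes values 1,2,3 and B takes values 1,2.             */
data cellmeans;
   do A = 1 to 3;
      do B = 1 to 2;
         m11 = (A=1 and B=1);
         m12 = (A=1 and B=2);
         m21 = (A=2 and B=1);
         m22 = (A=2 and B=2);
         m31 = (A=3 and B=1);
         m32 = (A=3 and B=2);
         output;
      end;
   end;
run;

proc print data=cellmeans noobs;
run;
```

In a real data set that also contains Y and X, the corresponding proc reg model statement would be something like `model Y = X m11 m12 m21 m22 m31 m32 / noint;`, where the `noint` option suppresses the intercept.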
Consider a two-factor analysis of variance in which each factor has two levels. Use this regression equation for the problem:
E[Y|X=d] = β₀ + β₁d₁ + β₂d₂ + β₃d₁d₂
where d₁ and d₂ are dummy variables.
1. Make a two-by-two table showing the four treatment means in terms of β values. Use effect coding (the scheme with the minus ones). In terms of the β values, state the null hypothesis you would use to test for
  1. Main effect of the first factor
  2. Main effect of the second factor
  3. Interaction
2. Make another two-by-two table showing the four treatment means in terms of β values. Use indicator dummy variables (just zeros and ones). In terms of the β values, state the null hypothesis you would use to test for
  1. Main effect of the first factor
  2. Main effect of the second factor
  3. Interaction
3. Which dummy variable scheme do you like more for this purpose? Why?
I know this is pretty gruesome, but the data are real -- from the U of T School of Dentistry.
An experiment in dentistry seeks to test the effectiveness of a drug (HEBP) that is supposed to help dental implants become more firmly attached to the jaw bone. This is an initial test on animals. False teeth were implanted into the leg bones of rabbits, and the rabbits were randomly assigned to receive either the drug or a saline solution (placebo). Technicians administering the drug were blind to experimental condition.
Rabbits were also randomly assigned to be "sacrificed" after either 3, 6, 9 or 12 days. At that time, the implants were pulled out of the bone by a machine that measures force in newtons and stiffness in newtons/mm. For both of these measurements, higher values indicate more healing. A measure of "pre-load stiffness" in newtons/mm is also available for each animal. This may be another indicator of how firmly the false tooth was implanted into the bone, but it might even be a covariate. Nobody can seem to remember what "preload" means, so we'll ignore this variable for now.
The data are available in the file bunnies.data.txt. The variables are
1. Identification code
2. Time (3,6,9,12 days of healing)
3. Drug (1=HEBP, 0=saline solution)
4. Stiffness in newtons/mm
5. Force in newtons
6. Preload stiffness in newtons/mm
Please do the following.
1. Classify the factors as within cases or between cases.
2. Use proc freq to find out how many rabbits are in each experimental condition.
3. Using proc glm, conduct a two-way ANOVA, with force as the response variable. Use the means statement to get cell means and marginal means. Be prepared to answer the following questions about each of the significance tests that SAS produces by default (I count 4 default tests).
  1. What is the value of the test statistic? The answer is a number from your printout.
  2. What proportion of the remaining variation is explained? Better use proc iml.
  3. What is the p-value? The answer is a number from your printout.
  4. Is the result statistically significant at the 0.05 level? Yes or No.
  5. What, if anything, do you conclude? This is not the place for statistical jargon. "What do you conclude" means say something about the drug, healing, time -- something like that.
4. I know this is a bit redundant with the preceding question, but did the HEBP drug cause the implants to become more firmly attached to the bone? If the results justify an answer, then answer Yes or No. If the results do not justify an answer, say "The results of this experiment were consistent with no effect of the drug."
5. Make a table with one row for each treatment combination. Make columns showing the dummy variables for effect coding. That's the setup with 1, 0 and -1. You do not need to make columns for the product terms.
6. Give E[Y|X=x] for a regression model with both main effects and the interaction. Use your variable names from the table.
7. In terms of the β values of your regression model, give the null hypothesis you would test in order to answer each of the following questions.
  1. Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth?
  2. Averaging across drug and placebo, does elapsed time affect the mean force required to extract the tooth?
  3. Does the effect of the drug depend upon elapsed time?
  4. Does the pattern of healing over time depend upon drug?
  5. Is there a drug by time interaction?
8. Now, make a table with a row for each treatment combination. Make columns showing how you would set up the dummy variables for cell means coding. That's the setup with just zeros and ones, and no intercept.
9. Write E(Y|X=x) for a regression model with your dummy variables. This equation has all 8 β quantities.
10. Add one more column to your table, showing E(Y) for each treatment combination in terms of your β quantities.
11. Make a 2x4 (Drug by Time) table, and write E(Y) in each cell. This will help you answer the next question.
12. In terms of the β values of your regression model, give the null hypothesis you would test in order to answer each of the following questions.
  1. Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth?
  2. Averaging across drug and placebo, does elapsed time affect the mean force required to extract the tooth?
  3. Does the effect of the drug depend upon elapsed time?
  4. Does the pattern of healing over time depend upon drug?
  5. Is there a drug by time interaction?
13. Now please return to SAS. Using proc reg and cell means coding, conduct tests to answer the following questions. Just do regular one-at-a-time (custom) tests. Don't bother with any Bonferroni correction this time. Just consider one response variable: Force. As usual, we are guided by the alpha = 0.05 significance level.
  1. Are the marginal means different at 3 and 6 days?
  2. Are the marginal means different at 6 and 9 days?
  3. Are the marginal means different at 9 and 12 days?
  4. Is there a difference between Drug and Placebo just at 3 days?
  5. Is there a difference between Drug and Placebo just at 6 days?
  6. Is there a difference between Drug and Placebo just at 9 days?
  7. Is there a difference between Drug and Placebo just at 12 days?
  Be able to answer questions like these for each test:
  1. What is the value of the test statistic? The answer is a number.
  2. What is the p-value? The answer is a number.
  3. Is the result statistically significant at the 0.05 level? Yes or No.
  4. What, if anything, do you conclude? This is not the place for statistical jargon. "What do you conclude" means say something about the drug, healing, time -- something like that.
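
If it helps to get started, here is one possible outline of the SAS job for the rabbit data. It is only a sketch under several assumptions: that bunnies.data.txt is whitespace-delimited with no header line and sits in the current directory, that missing values (if any) are coded as periods, and that the six variables appear in the order listed above. All variable and data set names are my own inventions. The numbers in the proc iml step are placeholders, not results. Only one custom test is shown.

```sas
/* Read the data. Names are made up; id is read as character to be safe. */
data bunnies;
   infile 'bunnies.data.txt';
   input id $ time drug stiffness force preload;
run;

/* Number of rabbits in each Drug by Time condition */
proc freq data=bunnies;
   tables drug*time / norow nocol nopercent;
run;

/* Two-way ANOVA on force, with marginal means and cell means */
proc glm data=bunnies;
   class drug time;
   model force = drug|time;
   means drug time drug*time;
run;
quit;

/* Proportion of remaining variation: a = sF / (dfe + sF), where s is   */
/* the numerator df and dfe is the error df. The values below are       */
/* placeholders; replace them with numbers from your own printout.      */
proc iml;
   F = 1; s = 1; dfe = 1;
   a = s*F / (dfe + s*F);
   print a;
quit;

/* Cell means coding: p = placebo (drug=0), h = HEBP (drug=1) */
data bunnycells;
   set bunnies;
   p3  = (drug=0 and time=3);   h3  = (drug=1 and time=3);
   p6  = (drug=0 and time=6);   h6  = (drug=1 and time=6);
   p9  = (drug=0 and time=9);   h9  = (drug=1 and time=9);
   p12 = (drug=0 and time=12);  h12 = (drug=1 and time=12);
run;

/* No-intercept regression; each coefficient is a cell mean */
proc reg data=bunnycells;
   model force = p3 p6 p9 p12 h3 h6 h9 h12 / noint;
   DrugAt3: test h3 - p3 = 0;   /* Drug versus placebo at 3 days only */
run;
quit;
```

The other custom tests follow the same pattern, with one labelled `test` statement per question.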

Please bring your log file and your results file to the quiz. As usual, answers to the questions are not to be handed in. They are just practice for the quiz. Please do not write anything on your printouts except your name and student number. It is okay to highlight the results file, but do not write interpretations on your results file, or cause them to appear in any way (including comment statements) on your log file.