STA442/1008 S2002 Assignment 5

For Quiz on April 5

For this assignment, it may be useful to have a copy of sasfiles.txt. This is a plain text file containing all the SAS programs from this class, in alphabetical order. For some of the questions below, you will want to take a program from Chapter 4, make slight modifications, and re-run it.

You may also use scode.txt in the same way. It has programs and pieces of code from Chapter 6 and Chapter 7.

Recall that for the mcars.dat data, we are comparing three countries of origin, and we have one covariate -- weight. Consider the (full) model that includes the interaction between country and weight.
1. As mentioned in the notes, an interesting follow-up would be to use Scheffé tests to compare the heights of the regression lines at many values of weight; infinitely many comparisons would be protected simultaneously. These tests are not proper follow-ups to the test for interaction. What is the right initial test?
2. Suppose that you wanted the interaction to be significant provided that it explained at least 4% of the remaining sample variation. What sample size would be required?
3. Actually, there were 100 cars. What proportion of the remaining sample variation would the interaction have to explain in order to be significant?
4. Suppose that you want to have an 80% chance of detecting the interaction as significant if the interaction explains 4% of the remaining variation in the population. What sample size would be required?
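The two questions about sample variation rest on the link between F and the proportion of remaining variation explained. Here is a minimal R sketch, assuming the usual identity a = sF/(n - p + sF), where s is the number of regression coefficients being tested and p is the number of coefficients in the full model. The function names and the numbers in the illustration are hypothetical and are not the values for the mcars.dat questions.

```r
# Smallest proportion of remaining variation that is significant at level
# alpha, given n cases, p betas in the full model, and s betas being tested.
# Assumed identity: a = s*F / (n - p + s*F), evaluated at the critical F.
min_a_significant <- function(n, p, s, alpha = 0.05) {
  fcrit <- qf(1 - alpha, df1 = s, df2 = n - p)   # critical value of F
  s * fcrit / (n - p + s * fcrit)
}

# Smallest n for which an effect explaining proportion a of the remaining
# sample variation would be significant. Simple search upward from p + 1.
min_n_significant <- function(a, p, s, alpha = 0.05, n_max = 100000) {
  for (n in (p + 1):n_max) {
    if (min_a_significant(n, p, s, alpha) <= a) return(n)
  }
  NA   # not reached within n_max
}

# Hypothetical illustration: 5 betas in the full model, testing 1 of them
min_a_significant(n = 50, p = 5, s = 1)
min_n_significant(a = 0.10, p = 5, s = 1)
```

The question about an 80% chance of detection is a different calculation, because it refers to variation in the population and needs the non-central F distribution.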
Recall that bunnies.dat from the last assignment involved a 2 by 4 (drug by time) factorial ANOVA.
1. Using effect coding and representing the interaction by a collection of dummy variables,
  1. Write a general expression for E[Y]. It has 8 beta values.
  2. What is E[Y] for rabbits that didn't get the drug and were killed after 6 days? Note that a correct answer will not have any symbols for the dummy variables or products of dummy variables. They are all -1, 0 or 1.
  3. Which regression coefficient(s) would you test to assess the main effect of drug?
  4. Which regression coefficient(s) would you test to assess the main effect of time?
  5. Which regression coefficient(s) would you test to assess the interaction of drug by time?
2. Suppose you want the main effect for drug to be significant provided it explains at least 10% of the remaining variation after allowing for the main effect of time and the interaction. What sample size is required?
3. Well, actually there were only n=40 rabbits. What proportion of the remaining variation would the main effect for drug need to explain in order to be significant at the usual 0.05 level?
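If you want to check your effect-coded dummy variables for the 2 by 4 design, R can display them. This is only a sketch: the factor level labels are hypothetical stand-ins, and the order of the levels in bunnies.dat may differ, which changes which category gets the -1 values.

```r
# One row per treatment combination; level labels are hypothetical
design <- expand.grid(drug = factor(c("drug", "none")),
                      time = factor(c("t1", "t2", "t3", "t4")))

# Effect coding: the last category of each factor is coded -1
contrasts(design$drug) <- contr.sum(2)
contrasts(design$time) <- contr.sum(4)

# Intercept, 1 drug dummy, 3 time dummies, and 3 products for the interaction
X <- model.matrix(~ drug * time, data = design)
cbind(design, X)
```

Each row of the printed matrix gives the dummy variable values for one treatment combination, so E[Y] for that combination is the corresponding sum and difference of beta values.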
Give your own example of a two-factor design (not nested) where one factor is fixed, and the other is random. Do not use any examples from class or the online notes.
Give your own example of a design with Factor B nested within A, and C nested within B, where A is fixed, and B and C are random effects. Do not use any examples from class or the online notes.
Read bunnies.dat into R.
1. Use the lm function to repeat the two-way ANOVA just for stiffness in newtons/mm.
2. Write an S function that will let you calculate the proportion of remaining variation explained, given the F statistic and the degrees of freedom. You are being asked to program a formula from Chapter 3. Of course you should test your function on the example in the class notes!
3. In the two-way ANOVA you just did on the bunny data, find the proportion of remaining variation explained by each of the main effects and by the interaction.
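One way to test a function like this is to compare it with the direct definition on data where you can compute both. The sketch below assumes the identity a = sF/(dfe + sF), with s the numerator degrees of freedom and dfe the error degrees of freedom; check it against Chapter 3 before relying on it. The function name and the simulated data are hypothetical and are not the bunny data.

```r
# Proportion of remaining variation explained, from F and its degrees of freedom
# Assumed identity: a = s*F / (dfe + s*F)
prop_remaining <- function(Fstat, s, dfe) {
  s * Fstat / (dfe + s * Fstat)
}

# Check on simulated data: does x2 matter, controlling for x1?
set.seed(442)
x1 <- rnorm(30)
x2 <- rnorm(30)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(30)
full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)
tab <- anova(reduced, full)   # full versus reduced F test

a_formula <- prop_remaining(tab$F[2], s = tab$Df[2], dfe = tab$Res.Df[2])
# Direct definition: the share of the reduced model's error sum of squares
# that is removed by adding x2
a_direct <- (deviance(reduced) - deviance(full)) / deviance(reduced)
c(a_formula, a_direct)        # the two numbers should agree
```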
Now you will use some of my code from the regression artifact example to see what happens when you do an independent t-test on data that demand a matched t-test. Suppose you were to simulate n = 200 (not n = 10,000!) pre-post pairs just the way I did, with the same means, standard deviations and so on. Then, instead of doing a matched t-test, suppose you mistakenly did an independent t-test, pretending you had n=200 observations in each group. What's the probability of a significant difference? Accompany your answer with a 99% confidence interval that does not include 0.05. This last requirement will help you choose your Monte Carlo sample size. Bring printout(s) to the quiz. The printouts will show all your code, and the answers. In this as in all simulations, please set a seed with the set.seed function so we can reproduce your results if need be.
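The general shape of such a simulation might look like the sketch below. It is not the regression artifact code from class: the way the pairs are generated (a shared true score plus independent errors) and every parameter value are assumptions made for illustration, so substitute the class code and values.

```r
set.seed(9999)
n <- 200     # pre-post pairs in each simulated data set
m <- 2000    # Monte Carlo sample size; increase it if the interval is too wide

# Hypothetical parameters, NOT the values from the class example
mu <- 100; sd_true <- 10; sd_err <- 10

pvals <- numeric(m)
for (i in 1:m) {
  true <- rnorm(n, mu, sd_true)         # shared true score makes pairs correlated
  pre  <- true + rnorm(n, 0, sd_err)
  post <- true + rnorm(n, 0, sd_err)
  # Wrong on purpose: independent t-test on matched data
  pvals[i] <- t.test(pre, post, var.equal = TRUE)$p.value
}

phat <- mean(pvals < 0.05)   # estimated probability of significance
ci99 <- phat + c(-1, 1) * qnorm(0.995) * sqrt(phat * (1 - phat) / m)
phat
ci99
```

If the interval includes 0.05, a larger m will narrow it.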
Please continue the power analysis for the two-sample t-test with unequal variances. Using the same parameter values that we have been using all along and letting the sample sizes be proportional to the standard deviations, what is the smallest total sample size that gives power of at least 0.80? How much sample size did we save by abandoning equal sample sizes? Accompany your final power value with a confidence interval of width no more than 0.01. Bring printout(s) to the quiz. The printouts will show all your code, and the answers. In this as in all simulations, please set a seed with the set.seed function so we can reproduce your results if need be.
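Two pieces of this can be sketched in R: the Monte Carlo sample size needed for a confidence interval of a given width, and a simulated power estimate with sample sizes proportional to the standard deviations. The function names, the 95% confidence level, and the parameter values in the example call are assumptions for illustration, not the values we have been using in class.

```r
# Monte Carlo sample size so that a normal-approximation confidence interval
# for a proportion near p has total width at most `width`.
# From 2 * z * sqrt(p * (1 - p) / m) <= width.
mc_size_for_width <- function(width, conf = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling(4 * z^2 * p * (1 - p) / width^2)
}
mc_size_for_width(0.01)             # worst case, p = 0.5
mc_size_for_width(0.01, p = 0.80)   # if the power is near 0.80

# Simulated power of the unequal-variance (Welch) t-test, with the total
# sample size split in proportion to the standard deviations.
welch_power <- function(n_total, mu1, mu2, sd1, sd2, m = 2000, alpha = 0.05) {
  n1 <- round(n_total * sd1 / (sd1 + sd2))
  n2 <- n_total - n1
  sig <- replicate(m, t.test(rnorm(n1, mu1, sd1),
                             rnorm(n2, mu2, sd2))$p.value < alpha)
  mean(sig)
}

set.seed(3333)
# Hypothetical parameter values
welch_power(n_total = 60, mu1 = 0, mu2 = 1, sd1 = 1, sd2 = 3, m = 500)
```

Using the worst-case p = 0.5 is conservative; once you know roughly where the power is, the second call shows how much smaller the Monte Carlo sample size can be.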
Using the bunnies data again, do a permutation test to test the main effect of drug on the dependent variable force. Your test statistic should be the absolute value of the difference between marginal means. Accompany your p-value with a confidence interval of width no more than 0.02. Bring printout(s) to the quiz. The printouts will show all your code, and the answers. In this as in all simulations, please set a seed with the set.seed function so we can reproduce your results if need be.
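The mechanics of a Monte Carlo permutation test can be sketched as follows. The data here are simulated stand-ins with hypothetical variable names, not bunnies.dat. The sketch also assumes that drug labels are shuffled across all rabbits without regard to time, and that the design is balanced so the marginal means equal the simple group means; adjust if your reading of the question differs.

```r
set.seed(4242)
# Simulated stand-in for the bunny data; names and values are hypothetical
dat <- data.frame(drug  = rep(c(0, 1), each = 20),
                  force = rnorm(40, mean = 50, sd = 10))

# Test statistic: absolute difference between the two drug means
abs_diff <- function(y, g) abs(mean(y[g == 1]) - mean(y[g == 0]))

obs <- abs_diff(dat$force, dat$drug)   # observed value

m <- 10000                             # number of random permutations
perm <- replicate(m, abs_diff(dat$force, sample(dat$drug)))

pval <- mean(perm >= obs)              # Monte Carlo p-value
ci <- pval + c(-1, 1) * qnorm(0.975) * sqrt(pval * (1 - pval) / m)
pval
ci
```

With m = 10000 the 95% interval has width at most about 0.0196 even in the worst case, which meets a width requirement of 0.02.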