STA441 Assignment 11

Quiz Online March 30, 7-8 pm

In a two-factor analysis of variance, with 2 levels of A and 3 levels of B, suppose you have

μ₁₁=4 μ₁₂=4 μ₁₃=5

μ₂₁=3 μ₂₂=3 μ₂₃=4

and σ² = 9. The sample sizes are all equal. What total sample size do you need to detect the main effect for factor B with a power of 0.80? Your answer is an integer, the smallest multiple of 6 that will get the job done. Feel free to use my code.
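One way to set up this search is sketched below. This is not the course code referred to above, only a minimal data step built on the standard noncentral F calculation. It assumes a significance level of 0.05, which the question does not state, and all variable names are made up for illustration. For a balanced design with n observations per cell, the noncentrality parameter for the main effect of B is a·n·Σ(μ̄.ⱼ − μ̄..)²/σ², with numerator degrees of freedom b − 1 and denominator degrees of freedom N − ab.

```sas
/* Sketch only: alpha = 0.05 is an assumption, and the names are hypothetical. */
data _null_;
   alpha = 0.05;  wantpow = 0.80;
   a = 2;  b = 3;  sigma2 = 9;
   /* Marginal means of factor B: average each column over the 2 levels of A */
   mb1 = (4 + 3)/2;  mb2 = (4 + 3)/2;  mb3 = (5 + 4)/2;
   grand = (mb1 + mb2 + mb3)/3;
   ssb = (mb1 - grand)**2 + (mb2 - grand)**2 + (mb3 - grand)**2;
   ntotal = a*b;                        /* first value tried is 2 per cell */
   power = 0;
   do until (power >= wantpow);
      ntotal = ntotal + a*b;            /* keep the total a multiple of 6 */
      ncell  = ntotal/(a*b);            /* observations per cell */
      ncp    = a*ncell*ssb/sigma2;      /* noncentrality parameter */
      dfnum  = b - 1;
      dfden  = ntotal - a*b;
      fcrit  = finv(1 - alpha, dfnum, dfden);          /* critical value */
      power  = 1 - probf(fcrit, dfnum, dfden, ncp);    /* noncentral F */
   end;
   put ntotal= power=;                  /* written to the log */
run;
```

The loop stops at the first multiple of 6 whose power reaches the target, and the result appears in the log.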
In a one-way analysis of variance with three treatment groups, suppose that μ₁, μ₂ and μ₃ are equally spaced, half a standard deviation apart. With equal sample sizes, what total sample size is needed to reject H₀: μ₁ = μ₂ = μ₃ with probability 0.90? Your answer is an integer, the smallest multiple of 3 that will get the job done. Feel free to use my code.
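The same kind of search works here. The sketch below is again an illustration rather than the course code, and it assumes a significance level of 0.05. Because only differences between means matter, the means are written in standard deviation units as -0.5, 0 and 0.5, so σ² drops out of the noncentrality parameter n·Σ(μᵢ − μ̄)²/σ².

```sas
/* Sketch only: alpha = 0.05 is an assumption, and the names are hypothetical. */
data _null_;
   alpha = 0.05;  wantpow = 0.90;
   p = 3;                               /* number of treatment groups */
   /* Means in standard deviation units, half a standard deviation apart */
   m1 = -0.5;  m2 = 0;  m3 = 0.5;
   mbar = (m1 + m2 + m3)/3;
   ss = (m1 - mbar)**2 + (m2 - mbar)**2 + (m3 - mbar)**2;
   ntotal = p;                          /* first value tried is 2 per group */
   power = 0;
   do until (power >= wantpow);
      ntotal = ntotal + p;              /* keep the total a multiple of 3 */
      ngroup = ntotal/p;                /* observations per group */
      ncp    = ngroup*ss;               /* noncentrality, sigma = 1 */
      dfnum  = p - 1;
      dfden  = ntotal - p;
      fcrit  = finv(1 - alpha, dfnum, dfden);
      power  = 1 - probf(fcrit, dfnum, dfden, ncp);
   end;
   put ntotal= power=;                  /* written to the log */
run;
```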
This question uses data from the Toronto Raptors' 2006-2007 season. For each regular season and playoff game, the following variables were recorded:
- Date
- Home or Away game
- Opponent
- Won or lost
- Days since last game
- Points scored by the Raptors
- Points scored by opponents
- Opponents' won-lost record the preceding year.
The data are available in the file Raptors06-07.data.txt.
The response variable will be point spread, defined as Raptors' score minus opponents' score. The explanatory variables will be Opponents' won-lost record, Home vs Away game, Days since last game, and a binary variable that equals one if the game was on a weekend, and zero if on a weekday.
Here are a few hints and reminders intended to make it easier for you to read and process the data.
- Date is actually two variables. You'll use only the first one.
- Remember, the name of a character-valued variable is followed by a dollar sign in the input statement. Opponents' name will be truncated to 8 characters, but that's just cosmetic.
- You can put an apostrophe (single quote) inside double quotes if you wish.
- You may not have seen "or" used inside an if statement. Here's how I did it:
  if dayweek='Sat' or dayweek='Sun' then weekend=1; else weekend=0;
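As a sketch of where that statement fits, the derived variables could be created in a second data step once the raw file has been read. The data set name raptors_raw and the variable names raptorpts and opppts are hypothetical; substitute whatever names your own input statement uses.

```sas
/* Sketch only: assumes raptors_raw has already been read from the data file,
   with hypothetical variables dayweek, raptorpts and opppts. */
data raptors;
   set raptors_raw;
   spread = raptorpts - opppts;   /* point spread: Raptors minus opponents */
   if dayweek='Sat' or dayweek='Sun' then weekend=1; else weekend=0;
run;
```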
An ordinary regression on these data lacks credibility, because it's obviously a time series and the assumption of independent random sampling (which means no sequential association between observations) is implausible without further evidence. So please follow these steps:
1. Start with an ordinary least squares (OLS) regression, using the explanatory and response variables given above. First use proc reg, requesting the Durbin-Watson statistic. Then fit the same OLS model using proc autoreg; request the p-value for the Durbin-Watson statistic, which the proc reg output does not include. The least squares estimates, t-tests and the value of the Durbin-Watson statistic should be the same.
  1. What is the estimated first-order autocorrelation of the residuals given by proc reg? The answer is a number.
  2. Can you find that autocorrelation in the autocorrelation function? What shows that it's not quite significant?
2. P = 0.0595 is not quite evidence for autocorrelation, but it's not a completely clean bill of health either. Fit a first-order autoregressive model with proc autoreg.
  1. Are the beta-hat values different when you fit the first-order autoregressive model? Do the conclusions change (using the 0.05 significance level)?
  2. Based on the first-order autoregressive model, when the Raptors played at home it was worth an estimated ______ points.
  3. Did they do better when they played on the weekend, or worse?
3. Finally, run your regression analysis again using proc mixed, specifying a first-order autoregressive error structure. To do this, I had to create a "subject" variable that was the same for all games. Please include the cl option on the proc mixed statement to get confidence intervals for the covariance parameters, and the / solution option on the model statement so you get the β-hats. They are not quite what you got from proc reg, because the usual beta-hats are maximum likelihood estimates, while these are restricted maximum likelihood (REML).
4. In my proc mixed output, there are two ways to test whether that first-order autocorrelation is zero; one of them is based on the confidence interval. Is there acceptable evidence of sequential correlation in the errors?
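The four steps might be laid out roughly as follows. This is a sketch under stated assumptions, not the instructor's program: it assumes a data set named raptors, sorted in game order, with hypothetical variables spread, opprecord, home (a 0/1 indicator), dayssince and weekend. The estimation method in proc autoreg is left at its default.

```sas
/* Sketch only: the data set and variable names are hypothetical. */

/* Step 1: OLS with the Durbin-Watson statistic, two ways */
proc reg data=raptors;
   model spread = opprecord home dayssince weekend / dw;
run;
quit;

proc autoreg data=raptors;
   model spread = opprecord home dayssince weekend / dwprob;  /* adds the p-value */
run;

/* Step 2: first-order autoregressive errors */
proc autoreg data=raptors;
   model spread = opprecord home dayssince weekend / nlag=1;
run;

/* Steps 3 and 4: the same model in proc mixed. The constant "subject"
   variable puts every game in one subject, so the AR(1) structure
   runs across the whole season. */
data raptors_subj;
   set raptors;
   season = 1;
run;

proc mixed data=raptors_subj cl;
   class season;
   model spread = opprecord home dayssince weekend / solution;
   repeated / type=ar(1) subject=season;
run;
```

Because no repeated effect is named, proc mixed takes the order of the observations in the data set as the time order, so the games need to be in chronological order.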
In summary, my conclusion is that the ordinary regression on these time series data was fine, but we didn't know it in advance. We had to check.

You will probably attach part or all of your output to the quiz.
