STA429/1007 Assignment 9

Quiz on Thursday Nov. 22nd at 10:10 a.m.


This assignment uses data from the Toronto Raptors' 2006-2007 season. For each regular-season and playoff game, the following variables were recorded:

  1. Date
  2. Home or Away game
  3. Opponent
  4. Won or lost
  5. Days since last game (If I've made a mistake and you notice, please let me know, but don't correct it.)
  6. Points scored by the Raptors
  7. Points scored by opponents
  8. Opponents' won-lost record the preceding year

The data are available in the file 2006-07.data.

The dependent variable will be point spread, defined as the Raptors' score minus the opponents' score. The independent variables will be Opponents' won-lost record, Home vs. Away game, Days since last game, and a binary variable that equals one if the game was played on a weekend and zero if it was played on a weekday.
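If you want to double-check how the dependent variable and the weekend indicator are built, here is a minimal sketch in Python. The function names are hypothetical, the date is assumed to be already parsed into a datetime.date (the layout of 2006-07.data is not described here), and "weekend" is assumed to mean Saturday and Sunday only.

```python
from datetime import date


def point_spread(raptors_points: int, opponent_points: int) -> int:
    """Dependent variable: Raptors' score minus opponents' score."""
    return raptors_points - opponent_points


def weekend_indicator(game_date: date) -> int:
    """Return 1 for a Saturday or Sunday game, 0 for Monday to Friday.

    Assumption: "weekend" means Saturday and Sunday only.
    date.weekday() numbers the days Monday == 0 through Sunday == 6.
    """
    return 1 if game_date.weekday() >= 5 else 0
```

For example, weekend_indicator(date(2006, 11, 4)) returns 1 (a Saturday) and weekend_indicator(date(2006, 11, 1)) returns 0 (a Wednesday). A positive point spread means the Raptors won.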

Here are a few hints and reminders intended to make it easier for you to read the data.

An ordinary regression on these data lacks all credibility, because the data are obviously a time series, and the assumption of independent random sampling (which means no sequential association between observations) is implausible without further evidence. So please follow these steps:

  1. Start with an ordinary regression, using the dependent and independent variables described above. Request the Durbin-Watson statistic, and save the residuals. So that you can check your work, my regression had R² = 0.2739.
  2. To see whether the Durbin-Watson test is significant, you need a table of Durbin-Watson critical values. Take a look. How do you know that the test is inconclusive?
  3. Using proc arima, examine the autocorrelation function of the residuals. The lag 1 autocorrelation (0.17524) comes right to the edge of the 95% interval around zero. Again, inconclusive. There might be some autocorrelation. If so, it extends no further than lag one.
  4. Finally, run your analysis again using proc autoreg, specifying a first-order autoregressive error structure with maximum likelihood estimation (as in the handout). We'll believe the test for the first-order autoregressive parameter (it's called AR1 in the printout). Is there significant sequential correlation in the errors?
  5. Compare the significance tests (based on the t statistics) associated with the Maximum Likelihood Estimates to those you got from proc reg. Do your conclusions change? What are those conclusions? For example, how much better or worse do we predict the Raptors will do on a weekend, and what is the value of the corresponding test statistic? Answer similar questions for the other independent variables.
  6. As one final check, please test whether the residuals from proc reg come from a normal distribution. Do we have reason to question the normal assumption?
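Steps 1 to 3 rest on two statistics that proc reg and proc arima print for you: the Durbin-Watson statistic d and the lag-1 autocorrelation r1 of the residuals. As a sketch for checking your understanding (not a replacement for the SAS output), here is how both can be computed from a list of residuals in Python. The function names are hypothetical. The bounds d_lower and d_upper are not computed here; they must be read from a Durbin-Watson table for your sample size and number of predictors. The band of plus or minus 1.96/√n is only a rough large-sample approximation and may not match the band proc arima prints exactly.

```python
import math
from typing import Sequence


def durbin_watson(resid: Sequence[float]) -> float:
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2.

    Values near 2 suggest no lag-1 autocorrelation; values well below 2
    suggest positive autocorrelation (roughly, d is about 2 * (1 - r1)).
    """
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def dw_decision(d: float, d_lower: float, d_upper: float) -> str:
    """Classic one-sided bounds test for positive first-order autocorrelation.

    d_lower and d_upper must come from a Durbin-Watson table; they are
    inputs here, not computed.
    """
    if d < d_lower:
        return "reject H0 (positive autocorrelation)"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


def lag1_autocorrelation(resid: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation r1 of a mean-centred series."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    c0 = sum(x * x for x in dev)
    c1 = sum(dev[t] * dev[t - 1] for t in range(1, n))
    return c1 / c0


def approx_white_noise_band(n: int, z: float = 1.96) -> float:
    """Half-width z / sqrt(n) of a rough 95% band around zero for r1."""
    return z / math.sqrt(n)
```

For a toy series such as [1.0, 2.0, 3.0, 4.0], durbin_watson returns 0.1 and lag1_autocorrelation returns 0.25. By the rough approximation d ≈ 2(1 − r1), the lag-1 autocorrelation of 0.17524 quoted in step 3 corresponds to d of about 1.65; the value proc reg actually prints may differ somewhat.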
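The model fitted in step 4 can be written as a regression whose errors follow a first-order autoregressive process. In the sketch below the symbols are my own labels, not taken from the handout: y_t is the point spread in game t, and x_{1t} to x_{4t} are the opponents' won-lost record, the home/away indicator, days since last game and the weekend indicator. Setting ρ = 0 gives back the ordinary regression of step 1, so the test of the AR1 parameter is a test of that restriction. Be aware that the proc autoreg documentation writes the error as ν_t = ε_t − φ_1 ν_{t−1}, with the sign reversed, so a negative AR1 estimate in the printout corresponds to a positive ρ below; check this against the handout.

```latex
y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + \nu_t,
\qquad
\nu_t = \rho\,\nu_{t-1} + \varepsilon_t,
\qquad
\varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad |\rho| < 1 .
```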
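For step 6, SAS users commonly run proc univariate with its normal option, which prints the Shapiro-Wilk test and several others. The sketch below instead uses a different, moment-based test, the Jarque-Bera test, simply to show the idea of checking skewness and kurtosis against their normal-theory values (0 and 3). The function name is hypothetical, and the chi-square(2) p-value is only asymptotic, so with one season of games treat it as rough.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(resid: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value).

    JB = n/6 * (S^2 + (K - 3)^2 / 4) is approximately chi-square with
    2 df under normality; for 2 df the upper-tail probability is exp(-x/2).
    """
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    m2 = sum(x ** 2 for x in dev) / n
    m3 = sum(x ** 3 for x in dev) / n
    m4 = sum(x ** 4 for x in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return skew, kurt, jb, p_value
```

A small p-value would give reason to question the normal assumption; a large one means this particular check found no strong evidence against it.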

In summary, my conclusion is that the ordinary regression on these time series data was fine, but we didn't know it in advance. We had to check.

Please bring your log file and your list file to the quiz.