This question uses data from the Toronto Raptors' 2006-2007 season. For
each regular season and playoff game, the following variables were recorded:
- Date
- Home or Away game
- Opponent
- Won or lost
- Days since last game
- Points scored by the Raptors
- Points scored by opponents
- Opponents' won-lost record the preceding year.
The data are available in the file
Raptors06-07.data.txt.
The response variable will be point spread, defined as Raptors' score
minus opponents' score. The explanatory variables will be Opponents' won-lost record, Home vs Away game, Days since last game, and a binary variable that equals one if the game was on a weekend, and zero if on a weekday.
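As a rough illustration of how the response variable and the weekend indicator are defined, here is a minimal Python sketch (not SAS, and not part of the required analysis). The function names and the scores are made up for illustration, and counting only Saturday and Sunday as the weekend is an assumption; the assignment does not say how Friday games should be treated.

```python
from datetime import date


def point_spread(raptors_points, opponent_points):
    """Response variable: Raptors' score minus opponents' score."""
    return raptors_points - opponent_points


def is_weekend(game_date):
    """Return 1 for a Saturday or Sunday game, 0 otherwise.

    Assumption: "weekend" means Saturday or Sunday only.
    """
    # date.weekday() numbers Monday as 0, so Saturday is 5 and Sunday is 6.
    return 1 if game_date.weekday() >= 5 else 0


# Made-up scores, for illustration only (not taken from the data file).
example_spread = point_spread(100, 92)
saturday_flag = is_weekend(date(2006, 11, 4))  # 4 Nov 2006 was a Saturday
monday_flag = is_weekend(date(2006, 11, 6))    # 6 Nov 2006 was a Monday
print(example_spread, saturday_flag, monday_flag)
```

In a SAS data step, the weekday function (which returns 1 for Sunday through 7 for Saturday) can serve the same purpose once the date has been read as a SAS date value.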
Here are a few hints and reminders intended to make it easier for you to read and process the data.
An ordinary regression on these data lacks credibility, because the games form a time series, and the assumption of independent random sampling (which implies no sequential association between observations) is implausible without further evidence. So please follow these steps:
- Start with an ordinary least squares (OLS) regression, using the explanatory and response variables given above. First use proc reg, requesting the Durbin-Watson statistic. Then fit the same OLS model using proc autoreg, and request the p-value for the Durbin-Watson statistic, which the proc reg run above does not report. The least squares estimates, t-tests and the value of the Durbin-Watson statistic should be the same.
- What is the estimated first-order autocorrelation of the residuals given by proc reg? The answer is a number.
- Can you find that autocorrelation in the autocorrelation function? What shows that it's not quite significant?
- P = 0.0595 is not quite evidence for autocorrelation, but it's not a completely clean bill of health either. Fit a first-order autoregressive model with proc autoreg.
- Are the beta-hat values different when you fit the first-order autoregressive model? Do the conclusions change (using the 0.05 significance level)?
- Based on the first-order autoregressive model, when the Raptors played at home it was worth an estimated ______ points.
- Did they do better when they played on the weekend, or worse?
- Finally, run your regression analysis again using proc mixed, specifying a first-order autoregressive error structure. To do this, I had to create a "subject" variable that was the same for all games. Please include the cl option on the proc mixed statement to get confidence intervals for the covariance parameters, and the / solution option on the model statement so you get the beta-hats. They are not quite what you got from proc reg, because the usual beta-hats are maximum likelihood estimates, while these are restricted maximum likelihood (REML).
- In my proc mixed output, there are two ways to test whether that first-order autocorrelation is zero; one of them is based on the confidence interval. Is there acceptable evidence of sequential correlation in the errors?
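The Durbin-Watson statistic and the first-order autocorrelation of the residuals asked about in the steps above are closely related, and it can help to see both computed by hand. The sketch below is in Python rather than SAS. It assumes the standard textbook definitions (with residuals that average zero, as they do in a regression with an intercept); the function names are made up, and the residuals are made-up numbers rather than output from the Raptors data. SAS output may differ in rounding or small computational details.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def lag1_autocorrelation(residuals):
    """First-order sample autocorrelation, assuming mean-zero residuals."""
    cross = sum(residuals[t] * residuals[t - 1]
                for t in range(1, len(residuals)))
    return cross / sum(e ** 2 for e in residuals)


# Made-up residuals that sum to zero, for illustration only.
e = [1.0, 2.0, -1.0, -2.0]
dw = durbin_watson(e)
r1 = lag1_autocorrelation(e)
# End-point correction that makes the relationship below exact.
edge = (e[0] ** 2 + e[-1] ** 2) / sum(x ** 2 for x in e)
print(dw, r1, 2 * (1 - r1) - edge)
```

For a long series the end-point term is negligible, so the Durbin-Watson statistic is roughly 2(1 - r), where r is the first-order autocorrelation. Values near 2 correspond to little autocorrelation, and values well below 2 correspond to positive autocorrelation.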
In summary, my conclusion is that the ordinary regression on these time series data was fine, but we didn't know it in advance. We had to check.