Assignment 8

STA441 Assignment 8

Quiz in Tutorial on Friday March 15th

In Question 2 of Assignment 7, your analysis of the exploratory Diversity data pointed to four predictors of job satisfaction: relations with colleagues at work, relations with management, fair opportunities for advancement, and visible minority status. Your analyses for this question will be based on a model with just these four explanatory variables.
1. Use the exploratory data to produce a 95% prediction interval for each employee in the replication sample. The value of R² for the exploratory sample should be 0.4374, as in Assignment 7.
  1. Using proc print, display just the first ten cases in the replication sample, for the following variables:
    - id (should be 501-510)
    - The four predictors
    - Job satisfaction
    - Predicted job satisfaction
    - Lower 95% prediction limit
    - Upper 95% prediction limit
    Just display the first 10 cases. Do you really want to bring a printout of 500 data lines to the quiz?! There is an example of how to do this in the regression on the cars data (SAS Example 5).
2. Using a calculator and your results from the regression, be able to reproduce predicted job satisfaction for one of the first ten cases.
3. Find the percentage of observations in the replication sample that fall within the prediction interval. Don't bother with the percentage for the exploratory sample. I used proc freq.
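One way the three parts above might fit together is sketched below. It is only an illustration, not the posted solution. The data set names (explore, replic) and variable names (colleagues, management, advance, vismin, jobsat) are hypothetical stand-ins for whatever you used in Assignment 7. The idea is to stack the two samples and set the response to missing in the replication sample. Then proc reg fits the model to the exploratory cases only, but still produces predictions and limits for every case.

```sas
/* Sketch under assumed names: explore and replic are hypothetical data sets,
   and the variable names are placeholders for your own. */
data both;
    set explore replic(in=inrep);
    sample = inrep;             /* 1 = replication sample, 0 = exploratory */
    actual = jobsat;            /* keep a copy of the observed response */
    if inrep then jobsat = .;   /* missing response: left out of the fit, still predicted */
run;

proc reg data=both;
    model jobsat = colleagues management advance vismin;
    /* p = predicted value; lcl, ucl = 95% limits for an individual prediction */
    output out=predict p=predsat lcl=lower95 ucl=upper95;
run;
quit;

data reppred;
    set predict;
    if sample = 1;                                /* replication cases only */
    inside = (lower95 <= actual <= upper95);      /* 1 if observed value is in the interval */
run;

proc print data=reppred(obs=10);                  /* first ten cases only */
    var id colleagues management advance vismin actual predsat lower95 upper95;
run;

proc freq data=reppred;
    tables inside;                                /* percentage inside the interval */
run;
```

Note that lcl and ucl give limits for an individual prediction, while lclm and uclm would give a confidence interval for the mean response, which is not what the question asks for.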
This question uses data from the Toronto Raptors' 2006-2007 season. For each regular season and playoff game, the following variables were recorded:
- Date
- Home or Away game
- Opponent
- Won or lost
- Days since last game
- Points scored by the Raptors
- Points scored by opponents
- Opponents' won-lost record the preceding year.
The data are available in the file Raptors06-07.data.txt.
The response variable will be point spread, defined as Raptors' score minus opponents' score. The explanatory variables will be Opponents' won-lost record, Home vs Away game, Days since last game, and a binary variable that equals one if the game was on a weekend, and zero if on a weekday.
Here are a few hints and reminders intended to make it easier for you to read and process the data.
- Date is actually two variables. You'll use only the first one.
- Remember, the name of a character-valued variable is followed by a dollar sign in the input statement. Opponents' name will be truncated to 8 characters, but that's just cosmetic.
- You can put an apostrophe (single quote) inside double quotes if you wish.
- You may not have seen an or inside an if statement. Here's how I did it:
  if dayweek='Sat' or dayweek='Sun' then weekend=1; else weekend=0;
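The hints above could be combined in a data step roughly like the following. This is a sketch only: the column order, the variable names, and the coding of each field are assumptions, so look at the raw data file and adjust the input statement before relying on it.

```sas
/* Sketch only: the layout of the file is assumed, not checked.
   All variable names are hypothetical. */
data raptors;
    infile 'Raptors06-07.data.txt';
    /* dayweek and datestring are the two pieces of Date; only dayweek is used.
       A dollar sign follows each character-valued variable. */
    input dayweek $ datestring $ homeaway $ opponent $ result $
          rest raptorpts opppts oppwl;
    spread = raptorpts - opppts;    /* response: Raptors minus opponents */
    if dayweek='Sat' or dayweek='Sun' then weekend=1; else weekend=0;
    /* Double quotes allow an apostrophe inside the label */
    label spread = "Raptors' score minus opponents' score"
          oppwl  = "Opponents' won-lost record the preceding year"
          rest   = 'Days since last game';
run;

/* Check that the weekend indicator was created as intended */
proc freq data=raptors;
    tables dayweek*weekend / norow nocol nopercent;
run;
```

The if statement with or could also be written as weekend = (dayweek in ('Sat','Sun')); since a logical expression in SAS evaluates to 1 or 0. A 0/1 indicator for home games can be built the same way once you know how Home and Away are coded in the file.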
An ordinary regression on these data lacks credibility, because it's obviously a time series and the assumption of independent random sampling (which means no sequential association between observations) is implausible without further evidence. So please follow these steps:
1. Start with an ordinary least squares (OLS) regression using proc reg, with the explanatory and response variables given above. Request the Durbin-Watson statistic and associated p-value. What is the estimated first-order autocorrelation of the residuals? The answer is a number.
2. P = 0.0595 is not quite evidence for autocorrelation, but it's not a completely clean bill of health either. Fit a first-order autoregressive model with proc autoreg.
  1. Are the beta-hat values different when you fit the first-order autoregressive model? Do the conclusions change (using the 0.05 significance level)?
  2. Two of the time series plots indicate that the first-order autoregressive model is okay. Which ones?
  3. Based on the first-order autoregressive model, when the Raptors played at home it was worth an estimated ______ points.
  4. Did they do better when they played on the weekend, or worse?
3. It's probably not necessary, but fit an AR(6) model (a model with 6 autoregressive parameters). Is there evidence of higher-order autocorrelation? Was the ordinary regression acceptable for these time series data?
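The three model fits described above might be requested along these lines. This is a sketch, assuming a data set named raptors that already contains the hypothetical variables spread, oppwl, home (a 0/1 indicator), rest and weekend; substitute your own names.

```sas
/* Sketch under assumed names: raptors, spread, oppwl, home, rest, weekend
   are all hypothetical. */
ods graphics on;    /* needed for the diagnostic plots if not already on */

/* Step 1: OLS with the Durbin-Watson statistic (dw) and its p-value (dwprob) */
proc reg data=raptors;
    model spread = oppwl home rest weekend / dw dwprob;
run;
quit;

/* Step 2: same regression with first-order autoregressive errors */
proc autoreg data=raptors;
    model spread = oppwl home rest weekend / nlag=1;
run;

/* Step 3: six autoregressive parameters */
proc autoreg data=raptors;
    model spread = oppwl home rest weekend / nlag=6;
run;
```

As a check on the proc reg output, the Durbin-Watson statistic is approximately 2(1 - r), where r is the first-order autocorrelation of the residuals, so a value near 2 goes with little or no first-order autocorrelation.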

Bring both log files and both sets of output to the quiz. A calculator might be useful too.