STA441 Assignment 8

Quiz in Tutorial on Friday March 15th



  1. In Question 2 of Assignment 7, your analysis of the exploratory Diversity data pointed to four predictors of job satisfaction: relations with colleagues at work, relations with management, fair opportunities for advancement, and visible minority status. Your analyses for this question will be based on a model with just these four explanatory variables.
    1. Use the exploratory data to produce a 95% prediction interval for each employee in the replication sample. The value of R² for the exploratory sample should be 0.4374, as in Assignment 7.
      1. Using proc print, display just the first ten cases in the replication sample, for the following variables:
        • id (should be 501-510)
        • The four predictors
        • Job satisfaction
        • Predicted job satisfaction
        • Lower 95% prediction limit
        • Upper 95% prediction limit
        Just display the first 10 cases. Do you really want to bring a printout of 500 data lines to the quiz?! There is an example of how to do this in the regression on the cars data (SAS Example 5).
    2. Using a calculator and your results from the regression, be able to reproduce predicted job satisfaction for one of the first ten cases.
    3. Find the percentage of observations in the replication sample that fall within the prediction interval. Don't bother with the percentage for the exploratory sample. I used proc freq.
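
As a check on the arithmetic in parts 2 and 3 of Question 1, here is a minimal sketch in Python (not SAS). It assumes the predicted value is the intercept plus the sum of each slope times its predictor value, and that a case counts as covered when its observed value lies between the lower and upper prediction limits, inclusive. Every number and name below is made up for illustration; none comes from the Diversity data or from SAS output.

```python
# Hypothetical numbers throughout: substitute the estimates from your own output.

def predict(intercept, slopes, x):
    """Return b0 + b1*x1 + ... + bk*xk for one case."""
    return intercept + sum(b * xj for b, xj in zip(slopes, x))

def percent_covered(y, lower, upper):
    """Percentage of observed y values with lower <= y <= upper."""
    inside = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    return 100.0 * inside / len(y)

# Made-up intercept, four made-up slopes, and one made-up case.
b0 = 1.0
b = [0.5, 0.25, 0.75, -2.0]
case = [4.0, 8.0, 2.0, 1.0]
y_hat = predict(b0, b, case)  # 1 + 2 + 2 + 1.5 - 2

# Made-up observed values and prediction limits for four cases.
observed = [4.0, 9.0, 5.5, 3.0]
lower = [2.0, 3.0, 4.0, 3.5]
upper = [7.0, 8.0, 9.0, 8.5]
coverage = percent_covered(observed, lower, upper)  # 2 of the 4 cases fall inside

print(y_hat, coverage)
```

With a real 95% prediction interval you would expect the coverage in the replication sample to be somewhere near 95%, though nothing guarantees it.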
  2. This question uses data from the Toronto Raptors' 2006-2007 season. For each regular season and playoff game, the following variables were recorded: The data are available in the file Raptors06-07.data.txt.

    The response variable will be point spread, defined as Raptors' score minus opponents' score. The explanatory variables will be Opponents' won-lost record, Home vs Away game, Days since last game, and a binary variable that equals one if the game was on a weekend, and zero if on a weekday.
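
The two derived variables can be sketched as follows, in Python rather than SAS. This is an illustration only: the function names are hypothetical, the scores are made up, and "weekend" is assumed here to mean Saturday or Sunday, which may not match the definition intended for the assignment.

```python
from datetime import date

def point_spread(raptors_score, opponent_score):
    """Raptors' score minus opponents' score; negative means a loss."""
    return raptors_score - opponent_score

def is_weekend(game_date):
    """1 if the game fell on a weekend, else 0.

    Assumption: weekend means Saturday or Sunday.
    date.weekday() returns Monday=0, ..., Saturday=5, Sunday=6.
    """
    return 1 if game_date.weekday() >= 5 else 0

# Made-up scores, not taken from the data file.
spread = point_spread(100, 92)

# 17 March 2007 was a Saturday; 14 March 2007 was a Wednesday.
saturday_flag = is_weekend(date(2007, 3, 17))
wednesday_flag = is_weekend(date(2007, 3, 14))

print(spread, saturday_flag, wednesday_flag)
```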

    Here are a few hints and reminders intended to make it easier for you to read and process the data.

    An ordinary regression on these data lacks credibility, because it's obviously a time series and the assumption of independent random sampling (which means no sequential association between observations) is implausible without further evidence. So please follow these steps:

    1. Start with an ordinary least squares (OLS) regression using proc reg, with the explanatory and response variables given above. Request the Durbin-Watson statistic and associated p-value. What is the estimated first-order autocorrelation of the residuals? The answer is a number.
    2. P = 0.0595 is not quite evidence for autocorrelation, but it's not a completely clean bill of health either. Fit a first-order autoregressive model with proc autoreg.
      1. Are the beta-hat values different when you fit the first-order autoregressive model? Do the conclusions change (using the 0.05 significance level)?
      2. Two of the time series plots indicate that the first-order autoregressive model is okay. Which ones?
      3. Based on the first-order autoregressive model, when the Raptors played at home it was worth an estimated ______ points.
      4. Did they do better when they played on the weekend, or worse?
    3. It's probably not necessary, but fit an AR(6) model (a model with 6 autoregressive parameters). Is there evidence of higher-order autocorrelation? Was the ordinary regression acceptable for these time series data?
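
To see what the two numbers requested in step 1 measure, here is a minimal sketch in Python (not SAS) of the textbook formulas. The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The lag-1 autocorrelation used here is the sum of products of adjacent residuals divided by the sum of squared residuals, which assumes the residuals have mean zero (true for OLS with an intercept). The residuals are made up, and SAS may compute its estimate slightly differently.

```python
def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def lag1_autocorrelation(resid):
    """Sum of e[t]*e[t-1] over sum of e[t]^2, assuming mean-zero residuals."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Made-up residuals in time order, not from the Raptors data.
e = [1.0, 2.0, 1.0, -1.0, -2.0, -1.0]

dw = durbin_watson(e)
r1 = lag1_autocorrelation(e)

# DW is roughly 2 * (1 - r1); the match is loose in a series this short.
print(dw, r1, 2 * (1 - r1))
```

A Durbin-Watson value near 2 goes with a lag-1 autocorrelation near zero. Values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.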

 


Bring both log files and both sets of output to the quiz. A calculator might be useful too.