STA442/1008 Assignment 10

Quiz in tutorial on Friday April 11th


This assignment uses the Noise data described in Chapter 5. You are free to take and modify my code if you wish.

  1. Create a new variable that is the average of the 5 discrimination scores. Using proc means, calculate the mean and standard deviation of all 6 variables (the 5 discrimination scores and their average). Also calculate the mean and standard deviation of the average variable separately for each age category (use the class statement in proc means). Based on the fact that a higher "detection score" means better detection of the signal in the presence of background noise, make your best guess:
    1. Is age=1 youngest, or oldest?
    2. Is Noise Level=1 loudest, or softest?
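
    A minimal sketch of the steps in this question, assuming the multivariate data set is called loud and contains discrim1-discrim5 and age. The names loudavg and avediscrim are made up for illustration. Note that the mean function skips missing values.

    data loudavg;                                   /* hypothetical data set name */
         set loud;
         avediscrim = mean(of discrim1-discrim5);   /* hypothetical variable name */
    run;

    proc means data=loudavg mean std;
         title2 'All six variables';
         var discrim1-discrim5 avediscrim;
    run;

    proc means data=loudavg mean std;
         title2 'Average score by age category';
         class age;
         var avediscrim;
    run;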

  2. Using the multivariate approach to repeated measures in proc glm, carry out an analysis of covariance in which the factors are age (three categories) and noise level (we're ignoring sex of subject), the dependent variable is discrimination score, and level of interest in the topic being discussed is a covariate. Use lsmeans to view the "corrected" means for discrim1-discrim5, separately for each level of age. The picture emerging is pretty consistent, but it would be nice to have least squares means on the average discrimination variable. This reflects what proc glm is doing under the surface, but I can't make lsmeans produce it. See the next question.
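
    One possible layout for this analysis, offered as a sketch rather than the required answer. It assumes the multivariate data set loud holds discrim1-discrim5, age and interest. The name noise on the repeated statement is just a label for the within-cases factor with 5 levels.

    proc glm data=loud;
         title2 'Multivariate approach to repeated measures, interest as covariate';
         class age;
         model discrim1-discrim5 = age interest;
         repeated noise 5;     /* within-cases factor: 5 noise levels */
         lsmeans age;          /* corrected means of discrim1-discrim5 for each age */
    run;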

  3. Using proc reg output and a calculator (or proc iml),
    1. Reproduce the least squares means of discrim5 separately for the 3 age categories. They are good only to the first decimal place, because of rounding in the numbers you got from the printout. Still, this shows you know what you are doing.
    2. Now generate least squares means (one for each age category) based on the mean discrimination score. This shows what is going on for one of the tests from your proc glm; which one?
    3. In the same proc reg (the one where mean discrimination score is the dependent variable), reproduce the F test for the main effect of age from proc glm (you will find the F and p values under "Tests of Hypotheses for Between Subjects Effects"). Also carry out F tests for all pairwise comparisons of marginal means controlling for interest in topic. For two of these comparisons, F = t^2. Can you find the t statistics? Using a Bonferroni correction for the fact that you are carrying out three pairwise comparisons, what do you conclude? Use plain language. Start with "Allowing for interest in the topic ..."
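
    A sketch of how the tests in part 3 might be set up with test statements in proc reg. It assumes a data set (hypothetically named loudavg) that contains the mean discrimination score in a variable called avediscrim (also a made-up name), along with interest, A1 and A2. It further assumes that A1 and A2 are 0/1 indicators for two of the age categories, with the third category as the reference; check the coding in your own data read. With three pairwise comparisons, a Bonferroni correction at the joint 0.05 level compares each p-value to 0.05/3, or about 0.0167.

    proc reg data=loudavg;
         title2 'Tests on mean discrimination score, controlling for interest';
         model avediscrim = interest A1 A2;
         AgeMain:  test A1=0, A2=0;     /* main effect of age (2 df)             */
         A1vsRef:  test A1=0;           /* A1 category versus reference category */
         A2vsRef:  test A2=0;           /* A2 category versus reference category */
         A1vsA2:   test A1-A2=0;        /* the two indicator categories compared */
    run;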

  4. Now use the covariance structure approach of proc mixed to carry out basically the same analysis of covariance you have been doing with the multivariate approach. The factors are age and noise level, the dependent variable is discrimination score, and level of interest in the topic being discussed is the covariate. Assume the covariance structure is unknown. Note that the lsmeans and contrast statements work the same way in proc mixed as they do in proc glm. Obtain least-squares means for any significant effects. For comparison with your results in 3c, follow up the main effect for age with Bonferroni-corrected pairwise comparisons using contrasts. Please do not use the method=ml (maximum likelihood) option, even though it is used in the text. For age, my F = 7.63, which is close to what I got from proc glm and proc reg, but not exactly the same. The least-squares means for age are also close. They are not exactly equal to what we got from proc reg, because the proc reg was based on 60 averages, while proc mixed is doing a regression with all 300 observations.

    There is one major difference between the proc glm and proc mixed results, and it leads to a substantially different conclusion about one of the independent variables. What is the difference? Which set of results makes more sense on intuitive grounds?
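
    A sketch of one way to set up the proc mixed run, under several assumptions. Proc mixed needs one line of data per observation, so the first step reshapes the multivariate data set loud. The names noisylong, ident, noise, discrim and dscore are all made up for illustration; ident is simply the case number. An unknown covariance structure is requested with type=un, and method=ml is left out so that the default estimation method is used.

    data noisylong;                     /* hypothetical name: one line per score */
         set loud;
         ident = _n_;                   /* case number used as subject identifier */
         array dscore{5} discrim1-discrim5;
         do noise = 1 to 5;
              discrim = dscore{noise};
              output;
         end;
         keep ident age interest noise discrim;
    run;

    proc mixed data=noisylong;
         title2 'Covariance structure approach, unknown covariance matrix';
         class age noise ident;
         model discrim = interest age|noise;
         repeated noise / type=un subject=ident;
         lsmeans age;
         contrast 'Age 1 vs 2' age 1 -1  0;
         contrast 'Age 1 vs 3' age 1  0 -1;
         contrast 'Age 2 vs 3' age 0  1 -1;
    run;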

  5. My humble conclusion is that I do not know exactly what proc glm is doing when it tests for the main effects of within-cases factors in the presence of covariates. However, I know what I wish it were doing. Your last task is to explain why the code below makes sense.

    The option data=loud means use the SAS data set called loud, which I created with a multivariate data read. Proc mixed, of course, requires a univariate data read. The dependent variables d1 through d4 are differences between successive levels of the discrimination variables: d1 = discrim1-discrim2, etc. A1 and A2 are indicator dummy variables for age. Why .3333? This question will require some thought. By the way, I get F = 16.26, p < .0001, which is very close to the proc mixed results.

    proc reg data=loud;
         title2 'Try to do main effect for noise better with multivariate approach';
         model d1-d4 = interest A1 A2;
         Noise: mtest intercept + 2.43167*interest + .3333*A1 + .3333*A2= 0;