STA441 Assignment 9

Quiz in Tutorial on Tuesday March 20th

Multivariate regression model

This assignment uses 2 different data sets, so please bring 2 separate sets of output: the log file and the results file for each data set. You might not be asked to hand in both of them.

An IQ (Intelligence Quotient) is a score on a test that supposedly measures intelligence. A score of 100 is average. IQ tests are controversial, but it's fair to say that an IQ test is an inexact measure of one kind of intelligence -- the kind that helps you do well in school. In the Longitudinal IQ data, adopted children had their IQs tested at ages 2, 4, 8 and 13. Birth mother's IQ and adoptive mother's education are also available for each child.
The data are available in the file origIQ.data.txt. The variables are
- Adoptive mother's education in years
- Birth mother's IQ
- Child's IQ at age 2
- Child's IQ at age 4
- Child's IQ at age 8
- Child's IQ at age 13
1. Use proc means to produce the default descriptive statistics on all the variables.
2. Just to get a feel for the relationships among variables, produce a correlation matrix with proc corr and take a look. Be able to interpret all the tests. Is there evidence that adoptive mother's education is related to birth mother's IQ?
3. Now treat this as a multivariate regression with two explanatory variables (no interaction) and four response variables.
  1. Carry out multivariate tests of adoptive mother's education controlling for birth mother's IQ, and birth mother's IQ controlling for adoptive mother's education. What do you conclude?
  2. You have four univariate tests of adoptive mother's education controlling for birth mother's IQ, and four univariate tests of birth mother's IQ controlling for adoptive mother's education. Protecting these 8 tests with a Bonferroni correction is a good idea, and makes you wonder why you bothered with the multivariate test. With the correction, what do you conclude? Be able to state your conclusions in plain, non-statistical language.
  3. I can't ask this on the quiz, but do you see why these results could make some people uneasy? Think of the political implications.
Bring your log file and results file to the quiz.
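
The Bonferroni correction for the eight univariate tests amounts to a stricter significance threshold for each test. Here is a minimal sketch in Python (not SAS), assuming a joint significance level of 0.05. The p-values are invented placeholders, not results from the IQ data, and the function names are hypothetical.

```python
# Bonferroni correction: compare each p-value to alpha divided by the number of tests.
ALPHA = 0.05   # ASSUMED joint significance level
N_TESTS = 8    # 4 tests for education + 4 tests for birth mother's IQ


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that protects the whole family of tests."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha=ALPHA):
    """Return True/False for each p-value, using the corrected threshold."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# Hypothetical p-values, for illustration only.
example_p = [0.0004, 0.03, 0.0071, 0.0001, 0.2, 0.0062, 0.048, 0.9]

threshold = bonferroni_threshold(ALPHA, N_TESTS)
flags = significant_after_correction(example_p)
print(threshold)  # 0.00625
print(flags)
```

With the correction, a test counts as significant only if its p-value is below 0.05/8 = 0.00625, so several of the invented p-values that are below 0.05 no longer qualify.
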
A random sample of male and female university students is weighed midway through years 1, 2, 3 and 4.
1. What are the cases in this study?
2. How many numbers (observations) do you have for each student?
3. This is a factorial experiment. What are the factors? (Let's say that case is not a factor.)
4. Classify each factor as between-cases or within-cases.
5. Make a 2 by 4 table. Draw an oval or ovals on the table, indicating the crossing or nesting of cases within experimental conditions. See the lecture slides for some examples.
6. In the "multivariate" approach to within-cases analysis, you set up effect coding dummy variables for the between-cases factors (if any), and calculate response variables that are linear combinations of the variables that are recorded for each case. You can then obtain tests for all the main effects and interactions by testing null hypotheses about the β values in the regression model. Sometimes the model has more than one response variable (linear combination). In this case it really is multivariate, and the second subscript on the βs refers to the response variable.
  For this sex by year example, denote the four weights for student i by y_i1, y_i2, y_i3, y_i4. The response variables will be linear combinations of these values. First consider the main effect for sex of student.
  1. Give a formula (or formulas) for the linear combination (or combinations) that you would use as the response variable (or variables).
  2. Write the regression model -- just the expected value(s).
  3. What is the null hypothesis in terms of the β values from your model?
7. A single model applies to the main effect of Year and to the Sex by Year interaction.
  1. Give a formula (or formulas) for the linear combination (or combinations) that you would use as the response variable (or variables).
  2. Write the regression model -- just the expected value(s).
  3. In terms of β values from your model, what is the null hypothesis for testing the main effect of Year?
  4. In terms of β values from your model, what is the null hypothesis for testing the Sex by Year interaction?
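
The general recipe of the multivariate approach can be sketched on a smaller, hypothetical design: two groups of cases, each measured on three occasions. This Python sketch (not SAS) only builds the explanatory and response variables. The data values, the names, and the choice of adjacent differences are all assumptions made for illustration, and other sets of differences would serve equally well.

```python
# Hypothetical data: each entry is one case measured on three occasions.
# All values are invented for illustration.
cases = [
    {"group": "A", "y": [10.0, 12.0, 15.0]},
    {"group": "A", "y": [11.0, 11.0, 14.0]},
    {"group": "B", "y": [9.0, 13.0, 13.0]},
    {"group": "B", "y": [8.0, 10.0, 15.0]},
]


def effect_code(group):
    """Effect coding for a two-level between-cases factor: +1 or -1."""
    return 1 if group == "A" else -1


def average_response(y):
    """Single response variable used for the between-cases main effect."""
    return sum(y) / len(y)


def difference_responses(y):
    """k-1 response variables (adjacent differences) for within-cases effects."""
    return [y[j] - y[j + 1] for j in range(len(y) - 1)]


design = []
for case in cases:
    design.append({
        "x": effect_code(case["group"]),
        "ybar": average_response(case["y"]),
        "d": difference_responses(case["y"]),
    })

for row in design:
    print(row)
```

In this sketch, a regression of ybar on x has a single response variable, while a regression of the differences on x has two, which is where the second subscript on the β values comes from.
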

In an experiment on anxiety medications, volunteer patients took a pill every morning, and every evening they rated how anxious they had felt on average during that day. The pill contained Drug A (Yes or No) and Drug B (Yes or No), in all four combinations. The patient did not know what was in the pill, and the four combinations came in a different random order for every patient. The four numbers for each patient are actually average ratings over 10 cycles, so the experiment took 40 days. The four numbers for each patient are

y₁₁: Average rating in the No, No condition. E(y₁₁)=μ₁₁
y₁₂: Average rating in the No, Yes condition. E(y₁₂)=μ₁₂
y₂₁: Average rating in the Yes, No condition. E(y₂₁)=μ₂₁
y₂₂: Average rating in the Yes, Yes condition. E(y₂₂)=μ₂₂

1. What are the cases in this study?
2. This is a factorial experiment. What are the factors? (Let's say that case is not a factor.)
3. Classify each factor as between-cases or within-cases.
4. Make a 2 by 2 table and write expected values in the cells.
5. Make an oval or ovals on the table, indicating the crossing or nesting of cases within experimental conditions. See the lecture slides for some examples.
6. You can test the main effects and interactions in this study with one-sample t-tests, testing whether the means of certain linear combinations of the y_ij variables equal zero. Give the linear combination you would use to test for each of the following. In each case, the answer is a formula, a function of the y_ij.
  1. Main effect for Drug A.
  2. Main effect for Drug B.
  3. Drug A by Drug B interaction.
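
The mechanics of a one-sample t-test on a linear combination can be sketched as follows, in Python rather than SAS. The data, the weights, and the function names are hypothetical. The contrast shown belongs to an invented design with three measurements per case, not to the drug study.

```python
import math
import statistics


def linear_combination(y, weights):
    """Weighted sum of one case's measurements."""
    return sum(w * v for w, v in zip(weights, y))


def one_sample_t(values):
    """t statistic and degrees of freedom for H0: population mean equals 0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical data: 5 cases, 3 within-cases measurements each (invented values).
data = [
    [4.0, 6.0, 5.0],
    [3.0, 7.0, 6.0],
    [5.0, 6.0, 8.0],
    [4.0, 9.0, 6.0],
    [2.0, 5.0, 4.0],
]
# Hypothetical contrast: first measurement versus the average of the other two.
weights = [1.0, -0.5, -0.5]

combos = [linear_combination(y, weights) for y in data]
t_stat, df = one_sample_t(combos)
print(combos, round(t_stat, 3), df)
```

Each case contributes one number, so the usual one-sample machinery applies. A matched t-test is the special case with weights 1 and -1 on two measurements.
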

Psychoactive drugs can have very different effects depending on the age of the person taking them. So consider independent samples of patients aged 5-12, 13-18, 19-29, 30-64 and 65+.

1. Write a regression model in which the expected value of the response variable depends on age group. You don't have to specify how your dummy variables are defined. You will do that in the next part.
2. Make a table with 5 rows, showing how your dummy variables for age group are defined. Add another column for the expected value of the response variable.
3. Why is your dummy variable coding scheme a good choice for testing whether the average expected value of some linear combination is equal to zero?
4. We now have a 3-factor design, with one between-cases factor and 2 within-cases factors. What is the between-cases factor?
5. For each of the effects in your 3-way design, give the linear combination of the y_ij that you would use as the response variable, and the null hypothesis you would test in terms of β values from your regression model.

Effect	Linear combination	Null hypothesis
Drug A
Drug B
Age
A × B
A × Age
B × Age
A × B × Age

In a study of the psychology of attention, subjects attempted to solve word problems while listening to distracting background noise. The distracting material was either music, or spoken words related to the problem they were trying to solve. The distracting material was presented at three different levels of loudness. Each subject attempted 10 problems at each combination of loudness and type of distraction, for a total of 60 problems. Order of presentation was randomized. Data for each subject are number correct in each of the six treatment combinations, and are available in distract.data.txt.
1. How many factors are there in this study? Classify each one as between cases or within cases. Let's say that case is not a factor.
2. Make a 2 by 3 table. Draw an oval or ovals on the table, indicating the crossing or nesting of cases within experimental conditions.
3. Using the multivariate approach to repeated measures, carry out the appropriate analysis, obtaining F-tests for all main effects and interaction(s).
4. Prepare a plot of the means. You can do it by hand, or use Excel or a similar program. Either way, bring your plot to the quiz. You may hand it in.
5. Your plot suggests that the interaction and one of the main effects (not the other one) should be interpreted. Which test for main effects is more interpretable and why?
6. Describe the more interpretable effect in plain, non-statistical language.
7. Based on the interaction plot, let's test for a volume effect (low-medium-high) just within the musical distraction condition. This is a multivariate test. Is there evidence of an effect?
8. Just for completeness, test all pairwise volume differences just within the voice condition, even though we know how it's going to come out. Use matched t-tests. There are several ways to do this. The results are so strong that correction for multiple testing is not an issue. In plain, non-statistical language, what do you conclude?
Bring your log file and results file to the quiz.
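
The table of cell means behind the plot can also be computed outside SAS. Here is a minimal Python sketch, with invented numbers and an ASSUMED column order; the actual layout of distract.data.txt should be checked before adapting it.

```python
# Hypothetical rows: number correct (out of 10) in six conditions per subject.
# ASSUMED column order: music-low, music-medium, music-high,
#                       voice-low, voice-medium, voice-high.
# These numbers are invented, not taken from distract.data.txt.
rows = [
    [8, 7, 7, 6, 4, 2],
    [9, 8, 6, 7, 5, 3],
    [7, 7, 8, 5, 3, 1],
]
TYPES = ["music", "voice"]
LEVELS = ["low", "medium", "high"]


def cell_means(rows):
    """Mean of each column, arranged as a type-by-loudness table."""
    n = len(rows)
    col_means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    table = {}
    for i, kind in enumerate(TYPES):
        table[kind] = {
            level: col_means[i * len(LEVELS) + k]
            for k, level in enumerate(LEVELS)
        }
    return table


means = cell_means(rows)
for kind in TYPES:
    print(kind, [round(means[kind][level], 2) for level in LEVELS])
```

Plotting one line per type of distraction, with loudness on the horizontal axis, gives the usual plot of means for a 2 by 3 design.
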

Please bring both log files and both results files to the quiz. Also bring the plot for Question 4. As usual, answers to the questions are not to be handed in. They are just practice for the quiz. Please do not write anything on your printouts except your name and student number. It is okay to highlight the results file, but do not write interpretations on your results file, or cause them to appear in any way (including comment statements) on your log file.