STA441 Assignment 8

Quiz in Tutorial on Monday March 9th

This assignment uses two different data sets, so please bring two separate sets of output: the log file and the results file for each data set. You will hand in at least one of them, but maybe not both.

An IQ (Intelligence Quotient) is a score on a test that supposedly measures intelligence. A score of 100 is average. IQ tests are controversial, but it's fair to say that an IQ test is an inexact measure of one kind of intelligence -- a kind that helps you do well in school. In the Longitudinal IQ data, adopted children had their IQs tested at ages 2, 4, 8 and 13. Birth mother's IQ and adoptive mother's education are also available for each child.
The data are available in the file origIQ.data.txt. The variables are:
- Adoptive mother's education in years
- Birth mother's IQ
- Child's IQ at age 2
- Child's IQ at age 4
- Child's IQ at age 8
- Child's IQ at age 13
1. Use proc means to produce the default descriptive statistics on all the variables.
2. Just to get a feel for the relationships among variables, produce a correlation matrix with proc corr and take a look. Be able to interpret all the tests. Is there evidence that adoptive mother's education is related to birth mother's IQ?
3. Now treat this as a multivariate regression with two explanatory variables (no interaction) and four response variables.
  1. Carry out multivariate tests of adoptive mother's education controlling for birth mother's IQ, and birth mother's IQ controlling for adoptive mother's education. What do you conclude?
  2. You have four univariate tests of adoptive mother's education controlling for birth mother's IQ, and four univariate tests of birth mother's IQ controlling for adoptive mother's education. Protecting these 8 tests with a Bonferroni correction is a good idea, and makes you wonder why you bothered with the multivariate test. With the correction, what do you conclude? Be able to state your conclusions in plain, non-statistical language.
  3. Do you see why these results could make some people uneasy? Think of the political implications.
Bring your log file and results file to the quiz.
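The Bonferroni correction mentioned in question 3 can be sketched in a few lines. This is a minimal illustration in Python rather than SAS. The function name `bonferroni` and the p-values are made up for this sketch; they are not results from the IQ data. With k tests in the family, each test is judged at level alpha/k, which is equivalent to multiplying each p-value by k (capped at 1) and comparing it to alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) for a family of tests.

    Each test is rejected only if its raw p-value is below alpha / k,
    where k is the number of tests in the family.
    """
    k = len(p_values)
    adjusted = [min(1.0, p * k) for p in p_values]
    reject = [p < alpha / k for p in p_values]
    return adjusted, reject


# Hypothetical p-values for a family of 8 tests (NOT from the assignment data).
hypothetical_p = [0.001, 0.004, 0.020, 0.300, 0.0005, 0.010, 0.049, 0.700]

adjusted, reject = bonferroni(hypothetical_p)
for raw, adj, rej in zip(hypothetical_p, adjusted, reject):
    print(f"raw p = {raw:<7} adjusted p = {adj:.4f}  reject H0: {rej}")
```

Note that several of the made-up p-values are below 0.05 but are not significant once the family of 8 tests is protected at a joint 0.05 level.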
The Titanic was a passenger ship that hit an iceberg and sank on its very first voyage in 1912. It was the largest passenger ship in the world at the time, and supposedly unsinkable. More than 1,500 of the roughly 2,200 passengers and crew died. See the Titanic Data.
1. First, explore the data using proc freq. Produce simple frequency distributions of all the variables, and two-way tables of survival (the natural response variable) by the other variables. For the two-way tables, carry out chi-squared tests.
2. Now do a table of class by survival, but just for children. Include the chi-squared test and display the expected frequencies. In spite of any warnings about expected frequencies less than 5, you actually only have to worry if an expected frequency is less than one.
  Why would a logistic regression model for these data present a problem? For example, use the table to estimate the odds of survival for a child in first class. To avoid problems like this, we will limit the rest of the analysis to adult passengers.
3. Controlling for class by sub-division (look at the Berkeley graduate admission example if you need to), check the relationship of sex to survival in each sub-table. Protecting the three tests with a Bonferroni correction, what do you conclude? You are able to draw directional conclusions here.
4. Here is a question you cannot answer with proc freq. Is the relationship between sex and survival stronger for some passenger classes than for others? Consider this odds ratio: odds of female survival divided by odds of male survival. You could call it the female survival advantage. Is this number the same for all three passenger classes? We are definitely talking about an interaction: the interaction of class by sex.
  1. One way to test for the interaction is with effect coding dummy variables, just as you would for ordinary regression. Recall that by default, the class statement in proc logistic uses effect coding to produce dummy variables, and proc glm syntax works. Obtain the test statistic for the interaction, and the p-value. My test statistic is 52.2607. There is a Note about "Type 3 effect tests under GLM parameterization." You can ignore it. Effect coding is a full-rank parameterization, while proc glm uses a model that is deliberately over-parameterized.
  2. To understand the interaction, produce the same test statistic and p-value you got from effect coding, but this time using a model with no intercept and cell means coding.
  3. Estimate that "female survival advantage" for each class. The answer is a set of three numbers. My answer for first class is 72.44. That is, the (estimated) odds of survival were over 72 times as great for women. Please do this with proc iml so the numbers appear on your printout.
    It's much easier to get those survival advantage odds ratios from the cell means coding output; getting them from the effect coding output would be unpleasant. Actually, I did it three ways, and my answers for first class were 72.44, 72.45 and 72.46. There is some rounding error, and some minor noise in the numerical MLEs.
  4. In the cell means coding output, there is a table labelled "Odds Ratio Estimates." Are those really estimates of odds ratios, or are they just estimates of odds?
  5. Now conduct tests for the pairwise comparisons of the three survival advantage numbers. With a Bonferroni correction, what do you conclude?
5. The final question is this. The passengers and crew aboard the Titanic were not a random sample from any population, but even so, survival had a random component. Would you say that the observations were independent?
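The odds and odds-ratio arithmetic behind questions 2 and 4 can be sketched as follows. This is a Python illustration, not the proc iml solution the assignment asks for. The helper names (`odds`, `odds_ratio`, `odds_ratio_from_log_odds`) are invented for this sketch, and every count is hypothetical, chosen only to keep the arithmetic simple. The last helper relies on the fact that in a no-intercept model with one indicator per cell, each coefficient is the log odds for its cell, so an odds ratio is the exponential of a difference between two coefficients.

```python
import math


def odds(survived, died):
    """Estimated odds of survival from counts in one cell."""
    if survived == 0 and died == 0:
        raise ValueError("empty cell: odds are undefined")
    if died == 0:
        # Nobody died, so the estimated odds are infinite and the
        # log odds has no finite estimate.
        return math.inf
    return survived / died


def odds_ratio(surv_f, died_f, surv_m, died_m):
    """Odds of female survival divided by odds of male survival."""
    return odds(surv_f, died_f) / odds(surv_m, died_m)


def odds_ratio_from_log_odds(beta_f, beta_m):
    """Same ratio, starting from two cell-means (log odds) coefficients."""
    return math.exp(beta_f - beta_m)


# Hypothetical counts for one passenger class (NOT the Titanic data).
surv_f, died_f = 30, 10   # women: odds = 30/10
surv_m, died_m = 20, 40   # men:   odds = 20/40

print("odds, women:", odds(surv_f, died_f))
print("odds, men:  ", odds(surv_m, died_m))
print("odds ratio: ", odds_ratio(surv_f, died_f, surv_m, died_m))

# The same ratio via log odds, as one would read them off cell means output.
beta_f = math.log(odds(surv_f, died_f))
beta_m = math.log(odds(surv_m, died_m))
print("via log odds:", odds_ratio_from_log_odds(beta_f, beta_m))

# A hypothetical cell with no deaths: the estimated odds are infinite.
print("zero-death cell:", odds(5, 0))
```

The two routes to the odds ratio agree up to floating-point rounding, which is the same kind of small discrepancy as the 72.44, 72.45 and 72.46 mentioned above.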

Please bring both log files and both results files to the quiz. As usual, answers to the questions are not to be handed in. They are just practice for the quiz. Please do not write anything on your printouts except your name and student number. It is okay to highlight the results files, but do not write interpretations on your results files, or cause them to appear in any way (including comment statements) on your log files.