STA429/1007 Final Exam


The final exam will be on Friday Dec 10th in , from 7-10 pm. It will be open book, open notes, open handout, open everything except communication. For each computer part, please bring your log file and list file; you may be asked to hand them in. Bring a calculator. Here are the topics to be covered, with suggestions about how to prepare.

Basic concepts and vocabulary

Review the material from Assignment 1; that's Chapter 1 of the online text. Also review the more obscure vocabulary from later sections. Be prepared to make up examples of studies with certain characteristics, such as a purely experimental study with one categorical and one quantitative independent variable, in which the quantitative variable is a random effect. The purpose is to verify that you are up to speed on the jargon and the basic concepts of significance testing.

Choice of statistical methods

You might be asked to design an original study that would use, say, a three-way multivariate analysis of covariance, or a logistic regression with one quantitative IV and dummy variables for a categorical IV with 3 categories. Or, you might be asked to explain how you would choose between a multivariate analysis and loglinear modelling if you had a study with multiple dependent variables.
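As a sketch of what "dummy variables for a categorical IV with 3 categories" means in practice, here is indicator (reference-cell) coding written in Python rather than SAS. The category labels a, b, c and the choice of c as the reference category are illustrative assumptions, not something the course prescribes.

```python
def dummy_code(values, categories, reference):
    """Indicator (reference-cell) coding for one categorical IV.

    Returns the names of the indicator columns (one per non-reference
    category) and one row of 0/1 values per observation.
    """
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not a category")
    columns = [c for c in categories if c != reference]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        # The reference category gets all zeros; every other category gets a single 1.
        rows.append([1 if v == c else 0 for c in columns])
    return columns, rows


# Hypothetical IV with categories a, b, c; c is used as the reference.
columns, X = dummy_code(["a", "b", "c", "a"], categories=["a", "b", "c"], reference="c")
print(columns)  # ['a', 'b']
print(X)        # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

With an intercept in the model, two indicators are enough; a third would be perfectly collinear with the intercept. The coefficient of the indicator for a is then the difference between category a and the reference category c, holding the other IVs constant (on the log-odds scale, in a logistic regression).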

Univariate Multiple Regression and ANOVA

Expect questions about full and reduced models, interpretation of β coefficients, dummy variable setup, and interactions. Fill in the table. Also, do the following and bring your printouts to the final exam.

The file mathACT.dat contains data based on a 1987 administration of the ACT Mathematics Assessment test, a standardized multiple-choice test sometimes given to screen American High School students for admission to university. There are three variables: Sex, Course work in High School mathematics, and score on the test. Students were selected so that they met one of three common profiles of Course work in High School mathematics, labelled a, b and c in the data file. They are:

  a. Algebra I only
  b. Two algebra courses and geometry
  c. Two algebra courses, geometry, Trigonometry, Advanced Mathematics and Beginning Calculus

  1. What are the independent variables? What is the dependent variable?
  2. Is this study experimental, observational, or both? Why?
  3. Test both main effects and the interaction. I recommend proc glm. Answer these questions:
    1. Averaging across different High School math backgrounds, do males and females perform differently on average? If so, who does better on average?
    2. Averaging across sex, is course work in High School mathematics related to achievement on the Math ACT? If the answer to this question is Yes, follow up with Bonferroni-corrected pairwise comparisons of marginal means. I believe it will be easier to do this in a separate proc reg run with cell means dummy variable coding and no intercept.
    3. Does the magnitude of the sex difference in performance depend upon the profile of course work in High School math? Answer "Yes" or "No conclusion."
    4. In every case, please base your conclusions on the Type III sums of squares. Why might tests based on the Type I sums of squares be different?
    5. For each test, what proportion of the remaining variation is explained by the effect being tested?
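The arithmetic behind these questions can be checked by hand. Below is a small Python sketch (not SAS) of the full-versus-reduced F test, the proportion of remaining variation, and a Bonferroni cutoff. It assumes "proportion of remaining variation" means a = (SSE_reduced − SSE_full) / SSE_reduced, which works out to sF / (sF + n − p); check that against your notes. All sums of squares and the sample size are made up.

```python
from math import comb


def general_linear_f(sse_reduced, sse_full, s, n, p):
    """F statistic for H0: the s extra betas in the full model are all zero.

    n - p is the full model's error df; p counts every beta in the full
    model, intercept included.
    """
    if s <= 0 or n <= p:
        raise ValueError("need s > 0 and n > p")
    return ((sse_reduced - sse_full) / s) / (sse_full / (n - p))


def remaining_variation(F, s, df_error):
    """a = sF / (sF + df_error), which equals (SSE_reduced - SSE_full) / SSE_reduced."""
    return s * F / (s * F + df_error)


# Made-up sums of squares: a 2 x 3 design has p = 6 betas, and the
# interaction test sets s = 2 of them to zero. n = 54 is arbitrary.
n, p, s = 54, 6, 2
F = general_linear_f(sse_reduced=120.0, sse_full=100.0, s=s, n=n, p=p)
a = remaining_variation(F, s, n - p)
print(round(F, 4), round(a, 4))  # 4.8 0.1667

# Bonferroni: 3 marginal means give comb(3, 2) = 3 pairwise comparisons,
# each tested at 0.05 / 3 to hold the familywise error rate at 0.05.
k = comb(3, 2)
alpha_each = 0.05 / k
print(k, round(alpha_each, 4))  # 3 0.0167
```

For a Type III test, SSE_reduced − SSE_full is the effect's Type III sum of squares, so a is that sum of squares divided by itself plus the full model's error sum of squares.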

Multivariate Regression and ANOVA

The file bank.dat has the following data for a sample of employees at a bank.

There is just one question: Controlling for seniority, age, education and prior work experience, is there a difference between the salaries of men and women? There are two dependent variables, to be considered simultaneously: starting salary and 1977 salary.

  1. Do the appropriate multivariate test. Base your conclusions on Wilks' lambda. Give the p-value.
  2. If the test is significant, follow up with Bonferroni-corrected univariate tests. What, if anything, do you conclude?
  3. When seniority, age, education and prior work experience are held constant at their sample mean levels, what is the estimated
    1. Starting salary of female employees
    2. Starting salary of male employees
    3. 1977 salary of female employees
    4. 1977 salary of male employees
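    To do this efficiently, you might want to standardize the quantitative independent variables for this question. Every standardized variable equals zero at its sample mean, so each estimate becomes a simple combination of the intercept and the coefficient for sex. Of course, the significance tests of interest will not be affected.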
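Wilks' lambda compares the error and hypothesis sums-of-squares-and-cross-products (SSCP) matrices as Λ = |E| / |E + H|, where E is the full model's error SSCP matrix and H is the increase in that matrix when the sex term is dropped. Here is a minimal Python sketch for two dependent variables, using made-up matrices rather than the bank data. In SAS, Λ and its p-value are reported directly (for example, by the MANOVA statement in proc glm or the MTEST statement in proc reg).

```python
def det2(m):
    """Determinant of a 2 x 2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(E, H):
    """Wilks' lambda = det(E) / det(E + H) for two dependent variables.

    E: error SSCP matrix of the full model.
    H: hypothesis SSCP matrix for the effect being tested (here, sex).
    """
    E_plus_H = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(E_plus_H)


# Made-up SSCP matrices; rows/columns are (starting salary, 1977 salary).
E = [[4.0, 1.0], [1.0, 3.0]]
H = [[2.0, 1.0], [1.0, 1.0]]
print(wilks_lambda(E, H))  # 0.55
```

Λ is 1 when the tested effect adds nothing beyond the covariates, and smaller values are stronger evidence against the null hypothesis.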

Bring your log file and your list file to the exam.

Logistic Regression

As with regular regression, you need to know what the β coefficients mean. There is also a little data analysis part. Please bring log and list files to the exam.

The file donner.dat contains data from the ill-fated Donner party, a group of American pioneers who, in the mid-1800s, decided to attempt a new and untested route over the Sierra Nevada mountains. They were snowed in, and the legend is that the survivors were forced into cannibalism. The data file supposedly contains three pieces of information on each adult (15 and over) in the party. I say supposedly because the historical record is not perfect, and there is even room for disagreement about what it meant to be a member of the Donner party, because some people split off from the party during the trek and may or may not have rejoined it later.

Anyway, the variables are

Please conduct likelihood ratio tests to answer these questions. In each case, be able to give the value of the test statistic and the p-value, and say what happened. As usual, be specific. For example, don't say age was related to the odds of survival; say older (or younger) people were more likely to survive.
  1. Controlling for age, were the chances of survival significantly different for men and women?
  2. Regardless of the answer to the last question, fill in the blank. Controlling for age, the estimated odds of survival are ____ times as great for women.
  3. Controlling for sex, were the chances of survival significantly related to age?
  4. For the preceding questions, your full model had just age and sex; there was no interaction. Now please test the interaction, and draw conclusions if appropriate.
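Each of these tests compares a full and a reduced model. Given the −2 Log L values that proc logistic prints for each fit, G = (−2 log L_reduced) − (−2 log L_full), which is approximately chi-squared with df equal to the number of β's set to zero. Each question above tests a single β (sex, age, or one age × sex product term), so df = 1 and the p-value is erfc(√(G/2)). The coefficient-to-odds-ratio step is exp(β). This is a Python sketch, not SAS; the −2 Log L values and β below are made up, and the coding women = 1, men = 0 is an assumption.

```python
import math


def lr_test_1df(m2ll_reduced, m2ll_full):
    """Likelihood ratio test of a single restriction (1 df).

    Takes the -2 Log L values of the reduced and full models and returns
    G = (-2 log L_reduced) - (-2 log L_full) with its chi-squared(1) p-value.
    For 1 df the upper tail probability is erfc(sqrt(G / 2)).
    """
    G = m2ll_reduced - m2ll_full
    if G < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    return G, math.erfc(math.sqrt(G / 2.0))


def odds_ratio(beta):
    """Factor by which the estimated odds change when the IV goes up by one unit."""
    return math.exp(beta)


# Made-up numbers, not results from donner.dat.
G, p_value = lr_test_1df(m2ll_reduced=61.8, m2ll_full=56.0)
print(round(G, 2), p_value)
# If women are coded 1 and men 0, exp(beta_sex) fills the blank in question 2.
print(round(odds_ratio(1.6), 2))  # 4.95
```

The same reading applies to age: a negative β for age would mean the estimated odds of survival shrink by a factor of exp(β) with each additional year, so older people were less likely to survive.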

On the exam, there will not be any logistic regression with more than 2 categories in the dependent variable.

Loglinear Models

Path Analysis

Consider this model:

Using the data in wishbone.dat, use proc calis to test whether β1 = 0 and β2 = 0 simultaneously. Fit the full and reduced models, and use proc iml to calculate the test statistic G and the p-value. When you fit the reduced model, SAS will complain that