About the Final

STA442/1008 Final Exam Information

Recent additions to this document are at the end in a section called Afterthoughts.

Time and Location

The final exam will be on Wednesday, April 23rd, from 4-7 p.m. in the cafeteria of the South Building.

Format

The exam will be closed book. You should bring a calculator.

There are ten questions, worth 10 marks each. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The first 6 questions do not involve computer output; the questions on the assignments and quizzes are a good indication of what to expect.

The last 4 questions are based on SAS output that will be provided to you. Again, the type of questions will be familiar from the assignments and quizzes. More information about the SAS part is given below.

Coverage

The final exam is cumulative, and you are responsible for the text as well as lectures. However, I did not actually look at the text while making up the exam. I glanced over the assignments, handouts and quizzes. On the other hand, there was a lot of overlap between the text and lectures, so reading over the text should be valuable, especially to fill in gaps in your lecture notes.

Not all parts of the course are equally represented. Here are some details.

The basic concepts and vocabulary of Chapter One are important, and are worth a relatively large number of marks. See Assignment 1.
The elementary tests are de-emphasized. There are no questions about the choice of elementary tests.
Regression is over-represented, relative to how much time we spent on it. Review the ideas of full versus reduced models, interpretation of regression coefficients, and dummy variable coding for categorical independent variables. You may be asked to calculate Y-hat. Bring a calculator. Do you think you might have to make a table of E[Y|X] for the different values of a categorical independent variable?
Of course, setting up tests for main effects and interactions in terms of contrasts is on the exam.
Analysis of within-cases data is important. Of the three methods for treating such data, the old-fashioned mixed-model approach (in which subject is a random effect nested within the between-cases factors) is de-emphasized.
If Scheffé tests appear, they will be in the SAS section, and only in connection with univariate tests. You don't need to know any formulas.
Early versions of the exam were much too long. Some things you would expect do not appear because they were cut out.
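As an illustration of the kind of E[Y|X] table mentioned in the regression item above, here is a sketch using indicator (dummy) coding with an intercept. The three categories are borrowed from the educ variable in the Cartoon data described below, with Pre-professional as the reference category; the names d_1, d_2 and the choice of reference category are assumptions for illustration, not necessarily the coding used on the exam.

```latex
E[Y \mid \mathbf{x}] = \beta_0 + \beta_1 d_1 + \beta_2 d_2,
\qquad
d_1 = \begin{cases} 1 & \text{Professional} \\ 0 & \text{otherwise} \end{cases},
\quad
d_2 = \begin{cases} 1 & \text{Student} \\ 0 & \text{otherwise} \end{cases}

\begin{array}{lccc}
\text{Education} & d_1 & d_2 & E[Y \mid \mathbf{x}] \\ \hline
\text{Pre-professional} & 0 & 0 & \beta_0 \\
\text{Professional}     & 1 & 0 & \beta_0 + \beta_1 \\
\text{Student}          & 0 & 1 & \beta_0 + \beta_2
\end{array}
```

Reading down the table, \beta_1 is the difference in expected Y between Professional and Pre-professional subjects, and \beta_2 - \beta_1 is the Student-versus-Professional difference. To calculate Y-hat for a case, plug the estimated coefficients into the row for that case's category.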

SAS

You will not be asked to write any SAS code on the final. There will be three new data sets, which are described below. To prepare for the final exam, familiarize yourself with the data sets and analyze them using methods from the course. Draw conclusions, and be ready to state them in plain, non-statistical language. You will not bring your printouts to the exam. Instead, you will get a copy of my printouts, and will answer questions based on that. The idea is that even if you do not do things exactly the same way I do, you will understand the output a lot better and faster if you have done it yourself.

If you do nothing else, at least familiarize yourself with the studies and variables (my SAS variable names are given, and I suggest you use them). The final exam does not include a description of the studies and variables. During the exam, Christine and I will answer questions about the data sets, but only if the answers are brief.

The Mantids Data

Mantids are insects, kind of like crickets or grasshoppers. When frightened, they emit loud noises that function as alarm calls. I believe they make the sounds by rubbing their hind legs together. The frequency (number of calls per minute) may indicate how alarmed the mantids are.

In this study, caged mantids (either Female or Male) were randomly assigned to be exposed to one of four predators (birds), and the number of alarm calls per minute was recorded. Each mantid was tested at three distances from the predator: 8 cm, 13 cm and 18 cm. The three distances were presented in different orders, but I am going to ignore order in the analysis I do.

There are three lines of data per case. The variables are

Case identification number (id1-id3): Repeated on each line.
Sex (sex, sex2, sex3): 0=Male, 1=Female. Repeated on each line.
Order of presentation (order, order2, order3): Repeated on each line. Ignore this.
Predator (predator, predator2, predator3): Numbered 1-4. Repeated on each line.
Distance (distance1-distance3): Distance from the predator in centimeters.
Number of alarm calls per minute (calls1-calls3)

I might do three things with this data set; at most I will do two of the three.

Two-factor multivariate ANOVA with three dependent variables, following up any significant multivariate tests with Bonferroni-corrected univariate tests.
Univariate two-factor ANOVA on just one of the dependent variables, with contrasts and Scheffé follow-ups.
Three-factor ANOVA with one of the factors within-cases, using the multivariate approach, with any follow-ups Bonferroni-corrected rather than Scheffé.
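Several of the analyses above follow up with Bonferroni-corrected tests. The idea is simple: with k follow-up tests, each one is tested at level alpha/k instead of alpha. Here is a minimal Python sketch of that rule; the function name and the p-values are hypothetical (made up for illustration, not taken from the mantids data or from my printouts), and SAS does not print anything in this form.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which tests are significant after a Bonferroni correction.

    Each of the k p-values is compared with alpha / k, which keeps the
    probability of at least one false rejection at or below alpha.
    """
    k = len(p_values)
    cutoff = alpha / k
    return [p < cutoff for p in p_values]


# Hypothetical p-values (made up, NOT from the mantids data) for the
# univariate follow-up tests on calls1, calls2 and calls3.
p_by_variable = {"calls1": 0.010, "calls2": 0.030, "calls3": 0.200}

flags = bonferroni(list(p_by_variable.values()))
for name, significant in zip(p_by_variable, flags):
    verdict = "significant" if significant else "not significant"
    print(f"{name}: p = {p_by_variable[name]:.3f} -> {verdict} at 0.05/3")
```

With these made-up values, only calls1 falls below 0.05/3 (about 0.0167). The same rule applies to the pairwise comparisons in the Longitudinal IQ data described below: four ages give 6 pairs, so each comparison would be tested at 0.05/6.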

The Longitudinal IQ Data

IQ is short for "Intelligence Quotient." It is measured by various kinds of test. For all the tests, a score of 100 is considered average, while scores above 100 are above average and scores below 100 are below average. Whether IQ tests really measure intelligence is debatable and highly political. How much IQ is influenced by heredity as opposed to environment is also a question on which many people have strong opinions.

In the Longitudinal IQ Data, the IQs of adopted children were measured at ages 2, 4, 8 and 13. The birth mother's IQ was assessed at the time of adoption, and the adoptive mother's education (in years) was also recorded. The variables are:

ameduc: Adoptive mother's education
bmiq: Birth mother's IQ
age2iq: IQ at age 2
age4iq: IQ at age 4
age8iq: IQ at age 8
age13iq: IQ at age 13

I might do one or both of the following with these data:

Multivariate regression with Bonferroni-corrected univariate follow-ups.
One-factor within-cases analysis of covariance (covariance structure approach), following up with Bonferroni-corrected pairwise comparisons if indicated.

The Cartoon Data

In a test of how well people remember instructional materials, subjects of various educational levels were presented with training materials that were either in Black & White or in Colour. Their ability to recall the material was tested with both Cartoon and Realistic testing materials at two points in time: immediately after training, and several weeks later. Scores on an IQ test (the Otis Mental Ability Test) were available for all subjects. The variables are

id: Subject identification number
colour: Colour of training materials: 0 = Black & White, 1 = Colour
educ: Educational level: 0 = Pre-professional, 1 = Professional, 2 = Student.
location: 1 = Hospital A, 2 = Hospital B, 3 = Hospital C, 4 = Penn State University. Ignore this one too.
otis: Otis Test of Mental Ability (IQ)
cartoon1: Recall at Time One, Cartoon testing materials
real1: Recall at Time One, Realistic testing materials
cartoon2: Recall at Time Two, Cartoon testing materials
real2: Recall at Time Two, Realistic testing materials

I'm going to do a four-factor analysis of covariance with proc mixed.

Not everything will be on the exam

In this section, I have suggested 6 SAS analyses. Only four of them will be on the exam.

Afterthoughts

Added April 10th
- In any regressions you do, include the simple and corr options so you are used to how they look. Like this: proc reg simple corr;
- Make sure you answer the question that is asked. If you just dump memory and answer a similar question from one of the assignments, you will be lucky to get any marks at all.
- Several questions on the final ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like correlation, regression, ANOVA, statistically significant, factorial design, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
- It is also very important in describing a set of findings to say what happened! For example, do not just say that the average amount of rot in potatoes was related to temperature. Instead, say that there was more rot on average at warmer temperatures.
  In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?
- At more than one point in my analysis of the final exam data, I followed up a significant main effect with pairwise comparisons of marginal means, and protected them with either a Bonferroni or a Scheffé correction.
- Here is a copy of the STA 442 final exam from 2005. It should be useful, except for the parts dealing with categorical dependent variables.