About the Final Exam

STA442/1008 Final Exam Information

Time and Location

The final exam will be on Thursday April 12th from 9 a.m. to 12 p.m. in Gym C (Davis Building).

Jerry's Office Hours for the Final

Tuesday April 3rd, 10:30-12:30
Thursday April 5th, 10:30-12:30
Monday April 9th, 11:00-2:00
Wednesday April 11th, 11:00-2:00

Quiz solutions are now posted below.

Format

You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator (any kind is acceptable unless it has communications capability). Pencil is okay.

There are 10 questions, occupying 9 pages. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect.

The last 4 questions (worth 52 out of 100 marks on the exam) are based on SAS output that will be provided to you with the exam. Again, the type of questions will be familiar from the assignments and quizzes. More information about the SAS part is given below.

Coverage

The final exam is cumulative. What you are supposed to be able to do is indicated by the assignments, including the SAS part of this one. The text and lecture overheads are intended to help you understand how to answer questions like the ones in the assignments.

Not all parts of the course are equally represented. Here are some details.

The basic concepts and vocabulary of Chapter One are important, and worth a relatively large number of marks. See Assignment 1. There will be some true-false questions. You get either full marks or zero on the true-false, and you will have to get 10 out of 13 right in order to get any marks. Some of the true-false questions will be about material from later in the course, not just Chapter 1. Do you think you might have to make up a study?
Based on a data-oriented question (like "Given level of rainfall, is soil acidity a useful predictor of crop yield?"), you should be able to state the null hypothesis you'd test. The answer will be some statement about μ or (more likely) β quantities. This kind of question is quick, clean, and requires you to connect the research questions of a study to the statistical model. There is more than one question of this type. Together, they are worth a lot of marks.
Regression is important. Review the ideas of full versus reduced models, interpretation of regression coefficients, and dummy variable coding for categorical independent variables. How many kinds of dummy variable coding do you know? They are all on the exam. You may be asked to calculate Y-hat. Formulas (including the formula for F in terms of a and a in terms of F) will be on the cover sheet of the exam if you need them; you need not memorize any formulas. Bring a calculator. Do you think you might have to fill in a table of E[Y|X] for the different values of a categorical independent variable?
Of course you will be asked to set up tests for main effects and interactions in terms of contrasts or regression coefficients.
Analysis of within-cases data is important. Of the three main methods for treating such data, the old-fashioned mixed model approach (in which subject is a random effect nested within the between-cases factors) is in the text only (no lecture) and will not be on the exam. As you know, these tests appear as a by-product of the multivariate approach (labelled "univariate tests"), but you will not be asked any questions about them. For any data where a within-cases analysis is possible, however, you should be prepared for either the multivariate approach or the covariance structure approach. You will not see both for the same data set, but you should do it both ways to prepare for the exam.
Bonferroni follow-ups will be emphasized over Scheffé. You don't need to know any formulas, except maybe that the adjusted p-value = p*k.
Early versions of the exam were much too long. The exam is very predictable, but some things you would expect to see do not appear because they were cut out.
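
As a worked illustration of two of the points above (a sketch, not taken from the exam): for the rainfall and soil acidity question, suppose the model is an ordinary multiple regression with no interaction term. The symbols x_1 and x_2, and the raw p-values in the Bonferroni example, are made up for illustration.

```latex
% Hypothetical model: x_1 = rainfall, x_2 = soil acidity, Y = crop yield
E[Y \mid x_1, x_2] = \beta_0 + \beta_1 x_1 + \beta_2 x_2,
\qquad H_0 : \beta_2 = 0

% Bonferroni with k = 3 follow-up tests and made-up raw p-values
p_{\text{adj}} = k \, p: \quad
3(0.010) = 0.030, \quad 3(0.030) = 0.090, \quad 3(0.200) = 0.600
```

At the 0.05 level, only the first of these three tests would still be significant after correction. An adjusted value above 1 is reported as 1.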

SAS

You will not be asked to write any SAS code on the final. There will be five data sets, two of which you have seen before. They are described below. To prepare for the final exam, familiarize yourself with the data sets and analyze them using methods from the course. My SAS variable names are given, and I suggest you use them, even for the furnace data. Draw conclusions, and be ready to state your findings in plain, non-statistical language. You will not bring your printouts to the exam. Instead, you will get a copy of my printouts, and will answer questions based on them. The idea is that even if you do not do things exactly the same way I do, you will understand the output a lot better and faster if you have done it yourself. Note that I may or may not center the data for some analyses.

If you do nothing else, at least familiarize yourself with the studies and variables. The final exam does not include a full description of the studies and variables. During the exam, Cristina and I will answer questions about the data, but only if the answers are very brief.

You will notice below that unlike the SAS assignments during the term, you are not always being asked specific sample questions about the data sets. This time, it is your job to ask the relevant questions and choose the statistical techniques that will help you answer them. The questions on computer assignments during the term should be your guide. For some of the data sets, more than one statistical technique is applicable, and you should not hesitate to do more than one kind of analysis. Be prepared to follow up any significant multivariate tests with Bonferroni-corrected univariate tests. See, if you understand that last sentence, you've learned something in the course.

Of course you may discuss the questions with other people, but this is not the time to let yourself be convinced too easily by your friends. I promise you that in several cases, there is more than one set of questions you could ask about the data, and (correspondingly) more than one natural and reasonable analysis. Please avoid tunnel vision, and do it your way first. Then compare answers. This way, it's more likely that somebody will think of what I'll do on the exam. In a group setting, if four people come up with six analyses, the whole group will benefit. The question is not which analysis is right, but whether each one is reasonable (or not).

The Furnace Data

We have some unfinished business here. We never had a quiz on Assignment 6, so there are some potential exam questions. Also, think of a 2-factor ANOVA: Type of vent damper by in-out. There are 2 main ways I might do it. And elementary tests are always possible. My variable names are: typfurn area shape height liner house age dampin dampout damper.

The Poverty Data

For 97 countries, the United Nations supplied data on birth rates, death rates, infant death rates, life expectancies for males and females, and Gross National Product. The variables (with my variable names) are:

birthrate: Live birth rate per 1,000 of population
deathrate: Death rate per 1,000 of population
infmort: Infant deaths per 1,000 of population under 1 year old
lifexM: Life expectancy at birth for males
lifexF: Life expectancy at birth for females
gnp: Gross National Product per capita in U.S. dollars
group: Country Group
1. Eastern Europe
2. South America and Mexico
3. Western Europe, North America, Japan, Australia, New Zealand -- let's just call them "Industrialized."
4. Middle East
5. Asia
6. Africa
country: Country -- not a variable, more like a case identifier.

To me, the birth and health stuff are the dependent variables.
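
One possible starting point, offered only as a sketch: a one-way multivariate analysis with country group as the explanatory variable, followed by Bonferroni-corrected univariate tests. The data set name poverty is an assumption, gnp is left out here (whether and how to use it is part of the exercise), and this is not necessarily the analysis that will appear on the exam.

```sas
/* Sketch only: assumes the data have already been read into a
   SAS data set called poverty (a hypothetical name). */
proc glm data=poverty;
     class group;
     model birthrate deathrate infmort lifexM lifexF = group;
     manova h = group;   /* multivariate test for differences among country groups */
run;
quit;
```

proc glm prints a univariate F-test for each dependent variable along with the multivariate test. If the multivariate test is significant, a Bonferroni follow-up with five dependent variables would compare each univariate p-value to 0.05/5 = 0.01 (or, equivalently, multiply each p-value by 5).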

The Mantids Data

You saw this data set in Assignment 10. The data are already set up nicely for proc mixed. My variable names are id Sex Order Predator Distance Calls. If you recall, obtaining the marginal means (including 2-way tables of marginal means, averaging over the third variable) was a chore in Assignment 10. But proc mixed gives them to you easily with lsmeans. My strategy will be to list only the effects that are statistically significant. For example, if the only significant effects were the main effect for B, the A by C interaction and the A by B by C interaction, I would say

lsmeans B A*C A*B*C;
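
For context, here is a minimal sketch of where that lsmeans statement would sit in a proc mixed job under the covariance structure approach. The data set name mydata, the response Y, and the factor names A, B and C are placeholders rather than the Mantids variables, and the unstructured covariance matrix is just one possible choice.

```sas
/* Sketch only: mydata, Y, A, B and C are hypothetical placeholders. */
proc mixed data=mydata;
     class id A B C;
     model Y = A|B|C;                 /* all main effects and interactions */
     repeated / type=un subject=id;   /* unstructured covariance within each case */
     lsmeans B A*C A*B*C;             /* means only for the significant effects */
run;
```

In practice the lsmeans statement would be added on a second run, after the first run shows which effects are significant.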

The Salmon Data

These data represent growth for a sample of Alaskan and Canadian salmon. Apparently, growth during different time periods can be estimated by the diameter of rings in a fish's scales. We have two measurements of growth: marine growth (growth during the fishes' first year of life in the ocean) and freshwater growth. The variables (with my variable names) are:

country: 1=Alaskan 2=Canadian
gender: 1=Female 2=Male
fresh: Diameter of rings for first-year freshwater growth in 100ths of an inch
marine: Diameter of rings for first-year marine growth in 100ths of an inch

Either approach to repeated measures is a possibility.

The Blood Flow Data

A clinician studied the effects of 2 drugs used either alone or together on the blood flow of human subjects. Twelve healthy middle-aged men participated in the study; they are viewed as a random sample. Each of the men received all four treatment combinations in a random order, with 2-week resting periods in between. The four values for each subject are increases in blood flow compared to a single baseline measurement. My variable names are Patient NN NY YN YY.
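
One way to set this up under the multivariate approach is sketched below. The data set name bloodflow and the factor names drug1 and drug2 are assumptions, as is the reading of the variable names (first letter for whether the first drug was given, second letter for the second drug).

```sas
/* Sketch only: bloodflow, drug1 and drug2 are hypothetical names. */
proc glm data=bloodflow;
     model NN NY YN YY = / nouni;   /* four measurements per man, no between-cases factors */
     repeated drug1 2, drug2 2;     /* the last factor listed changes fastest across NN NY YN YY */
run;
quit;
```

The output gives tests for each drug's main effect and for the drug by drug interaction. It includes both the multivariate tests and the "univariate tests" mentioned under Coverage.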

Quizzes and Past Exams

For the past exams, ignore anything with categorical dependent variables, except for basic chi-square tests of independence. Especially, ignore the questions on logistic regression in the 2009 exam.

STA442 Spring 2008
- Exam
- Printouts
STA442 Fall 2009
- Exam
- Printouts

Here are some of the quizzes (all except number six), with solutions. From this point on, no further changes to the marking of the quizzes will be considered.

More comments and suggestions

Cristina marks the quizzes, and I mark the final examination. We have basically the same standards and objectives, but we are not identical (lucky for her). You might say that this section is about my personal peculiarities -- just in the way I mark exams, of course. It is helpful for you to know about this, so your exam-taking strategy will not conflict with my exam-marking strategy.

The purpose of STA442 is for you to learn to use statistical methods to draw reasonable conclusions from numerical data. Often, the first several parts of a question will ask for technical details, and the last part will ask for a conclusion (often in plain, non-statistical language). If the technical part is missing, it does not matter what you conclude. Similarly, an answer that has most of the technical details right but gets the conclusion wrong (or leaves it off, or states it incompletely) is almost worthless, and will get few marks. On the other hand, if you make minor technical mistakes but draw reasonable conclusions from what you have, you can still get substantial marks.

When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.

Make sure you answer the question that is asked.
1. If you answer another question instead of the one that's asked, you will lose substantial marks. It is especially risky to just dump memory and answer a similar question from one of the assignments. If I detect this, you will get a zero for the question. Thinking is what's important. Memory without thinking is a crime that you should try to hide if you do commit it.
2. If you answer the question and also write something correct that is not asked, you will not get any extra marks. Your marks will be based on your answer to what is asked.
3. However, if you say something off-topic that is wrong, you can definitely lose marks. To repeat, if you write a perfect answer to the question that is asked, and also write something incorrect, you will lose marks.
Vocabulary is important. A large part of this course is about communication. You must be able to deal with the subject matter using both technical terms and plain language.
Some questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, if plain language is requested then you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like correlation, regression, ANOVA, statistically significant, factorial design, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
It is also very important in describing a set of findings to say what happened! For example, do not just say that the average amount of rot in potatoes was related to temperature. Instead, say that there was more rot on average at warmer temperatures.
In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?
Some professors mark by looking for the correct answer, or part of it. If they find something good, you get points for it. This can encourage a kind of shotgun strategy for writing answers. Just write everything you can think of, and maybe some of it will be what this peculiar individual is looking for.
But that strategy backfires when I mark an exam, because (except for simple numerical answers) I usually do not give marks for things that are correct; I take off marks for things that are wrong or missing. So, if a student writes a long answer that includes the correct conclusion, the wrong conclusion (based on the same information!) and something irrelevant, all I really see is the contradiction between the two conclusions, and I will probably give the answer a zero. Yet it might be that the student understands everything perfectly, but is just writing all the crazy stuff as insurance against the unlikely possibility that maybe that's what I am looking for. Let's make sure that you don't fall into this trap!