STA442/1008 Final Exam Information


Time and Location

The final exam will be on Thursday April 12th from 9 a.m. to 12 p.m. in Gym C (Davis Building).

Jerry's Office Hours for the Final

Quiz solutions are now posted below.

Format

You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator (any kind is acceptable unless it has communications capability). Pencil is okay.

There are 10 questions, occupying 9 pages. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect.

The last 4 questions (worth 52 out of 100 marks on the exam) are based on SAS output that will be provided to you with the exam. Again, the type of questions will be familiar from the assignments and quizzes. More information about the SAS part is given below.

Coverage

The final exam is cumulative. What you are supposed to be able to do is indicated by the assignments, including the SAS part of this one. The text and lecture overheads are intended to help you understand how to answer questions like the ones in the assignments.

Not all parts of the course are equally represented. Here are some details.

SAS

You will not be asked to write any SAS code on the final. There will be five data sets, two of which you have seen before. They are described below. To prepare for the final exam, familiarize yourself with the data sets and analyze them using methods from the course. My SAS variable names are given, and I suggest you use them, even for the furnace data. Draw conclusions, and be ready to state your findings in plain, non-statistical language. You will not bring your printouts to the exam. Instead, you will get a copy of my printouts, and will answer questions based on them. The idea is that even if you do not do things exactly the same way I do, you will understand the output a lot better and faster if you have done it yourself. Note that I may or may not center the data for some analyses.

If you do nothing else, at least familiarize yourself with the studies and variables. The final exam does not include a full description of the studies and variables. During the exam, Cristina and I will answer questions about the data, but only if the answers are very brief.

You will notice below that unlike the SAS assignments during the term, you are not always being asked specific sample questions about the data sets. This time, it is your job to ask the relevant questions and choose the statistical techniques that will help you answer them. The questions on computer assignments during the term should be your guide. For some of the data sets, more than one statistical technique is applicable, and you should not hesitate to do more than one kind of analysis. Be prepared to follow up any significant multivariate tests with Bonferroni-corrected univariate tests. See, if you understand that last sentence, you've learned something in the course.
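As a reminder of the arithmetic behind a Bonferroni correction, here is a small sketch under the usual rule. The symbols k (the number of univariate follow-up tests), α (the joint significance level) and p_j (the p-value of the j-th test) are notation introduced here for illustration, and the numbers in the comment are an assumed example, not taken from any of the data sets.

```latex
\[
  \text{reject } H_{0,j} \text{ when } p_j < \frac{\alpha}{k},
  \qquad j = 1, \dots, k
\]
% Assumed example: k = 3 dependent variables and \alpha = 0.05,
% so each univariate test is judged against 0.05 / 3 \approx 0.0167.
```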

Of course you may discuss the questions with other people, but this is not the time to let yourself be convinced too easily by your friends. I promise you that in several cases, there is more than one set of questions you could ask about the data, and (correspondingly) more than one natural and reasonable analysis. Please avoid tunnel vision, and do it your way first. Then compare answers. This way, it's more likely that somebody will think of what I'll do on the exam. In a group setting, if four people come up with six analyses, the whole group will benefit. The question is not which analysis is right, but whether each one is reasonable (or not).

The Furnace Data

We have some unfinished business here. We never had a quiz on Assignment 6, so there are some potential exam questions. Also, think of a 2-factor ANOVA: Type of vent damper by in-out. There are 2 main ways I might do it. And elementary tests are always possible. My variable names are: typfurn area shape height liner house age dampin dampout damper.
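For practice, here is a minimal sketch of two plausible ways to set up the damper-type by in-out analysis; these are guesses at reasonable approaches, not necessarily the two that will appear on the exam. The data set names `furnace` and `long`, and the new variables `caseid`, `inout` and `response`, are hypothetical. Only `damper`, `dampin` and `dampout` come from the variable list above.

```sas
/* Approach 1 (assumed): treat dampin and dampout as two responses
   and let proc glm build the within-cases in-out factor. */
proc glm data=furnace;          /* data set name is hypothetical */
     class damper;
     model dampin dampout = damper / nouni;
     repeated inout 2;          /* 'inout' is a made-up factor name */
run;

/* Approach 2 (assumed): stack the data so each case has two rows,
   then model the covariance structure with proc mixed. */
data long;
     set furnace;
     caseid = _n_;              /* hypothetical case identifier */
     inout = 'in ';  response = dampin;  output;
     inout = 'out';  response = dampout; output;
     keep caseid damper inout response;
run;

proc mixed data=long;
     class damper inout caseid;
     model response = damper|inout;
     repeated inout / type=un subject=caseid;
     lsmeans damper*inout;
run;
```

With only two within-case levels, the two approaches should lead to essentially the same tests; the point of running both is to get used to reading both kinds of output.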

The Poverty Data

For 97 countries, the United Nations supplied data on birth rates, death rates, infant death rates, life expectancies for males and females, and Gross National Product. The variables (with my variable names) are:

To me, the birth and health stuff are the dependent variables.

The Mantids Data

You saw this data set in Assignment 10. The data are already set up nicely for proc mixed. My variable names are id Sex Order Predator Distance Calls. If you recall, obtaining the marginal means (including 2-way tables of marginal means, averaging over the third variable) was a chore in Assignment 10. But proc mixed gives them to you easily with lsmeans. My strategy will be to list only the effects that are statistically significant. For example, if the only significant effects were the main effect for B, the A by C interaction and the A by B by C interaction, I would say

lsmeans B A*C A*B*C;
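To show where that statement would sit, here is a generic sketch of a proc mixed run. Everything in it is a placeholder: the data set name `mydata`, the factors `A`, `B` and `C`, the response `y`, and the compound symmetry covariance structure are all assumptions for illustration, not the actual mantids analysis.

```sas
proc mixed data=mydata;         /* hypothetical data set name */
     class A B C id;
     model y = A|B|C;           /* full factorial in placeholder factors */
     /* Covariance structure is an assumption; choose the one
        that suits the design. */
     repeated / type=cs subject=id;
     /* Request marginal means only for the significant effects */
     lsmeans B A*C A*B*C;
run;
```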

The Salmon Data

These data represent growth for a sample of Alaskan and Canadian salmon. Apparently, growth during different time periods can be estimated by the diameter of rings in a fish's scales. We have two measurements of growth: marine growth (growth during the fishes' first year of life in the ocean) and freshwater growth. The variables (with my variable names) are:

Either approach to repeated measures is a possibility.

The Blood Flow Data

A clinician studied the effects of 2 drugs used either alone or together on the blood flow of human subjects. Twelve healthy middle-aged men participated in the study; they are viewed as a random sample. Each of the men received all four treatment combinations in a random order, with 2-week resting periods in between. The four values for each subject are increases in blood flow compared to a single baseline measurement. My variable names are Patient NN NY YN YY.
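One way to practise with these variables is a two-factor within-subjects analysis in proc glm. This is only a sketch under assumptions: the data set name `bloodflow` and the factor names `drug1` and `drug2` are hypothetical, and it assumes the first letter of each variable name says whether the first drug was given (N or Y) and the second letter does the same for the second drug.

```sas
proc glm data=bloodflow;        /* hypothetical data set name */
     /* No between-subjects factors, so nothing on the right side */
     model NN NY YN YY = / nouni;
     /* The last factor listed varies fastest across the variables:
        NN NY YN YY changes the second letter first, so drug2 is last. */
     repeated drug1 2, drug2 2;
run;
```

The output should include tests for each drug and for the drug by drug interaction, which is where the "alone or together" question would show up.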

Quizzes and Past Exams

For the past exams, ignore anything with categorical dependent variables, except for basic chi-square tests of independence. Especially, ignore the questions on logistic regression in the 2009 exam.

Here are some of the quizzes (all except number six), with solutions. From this point on, no further changes to the marking of the quizzes will be considered.

More comments and suggestions

Cristina marks the quizzes, and I mark the final examination. We have basically the same standards and objectives, but we are not identical (lucky for her). You might say that this section is about my personal peculiarities -- just in the way I mark exams, of course. It is helpful for you to know about this, so your exam-taking strategy will not conflict with my exam-marking strategy.

The purpose of STA442 is for you to learn to use statistical methods to draw reasonable conclusions from numerical data. Often, the first several parts of a question will ask for technical details, and the last part will ask for a conclusion (often in plain, non-statistical language). If the technical part is missing, it does not matter what you conclude. Similarly, an answer that has most of the technical details right but gets the conclusion wrong (or leaves it off, or states it incompletely) is almost worthless, and will get few marks. On the other hand, if you make minor technical mistakes but draw reasonable conclusions from what you have, you can still get substantial marks.

When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.

  1. Make sure you answer the question that is asked.
    1. If you answer another question instead of the one that's asked, you will lose substantial marks. It is especially risky to just dump memory and answer a similar question from one of the assignments. If I detect this, you will get a zero for the question. Thinking is what's important. Memory without thinking is a crime that you should try to hide if you do commit it.
    2. If you answer the question and also write something correct that is not asked, you will not get any extra marks. Your marks will be based on your answer to what is asked.
    3. However, if you say something off-topic that is wrong, you can definitely lose marks. To repeat, if you write a perfect answer to the question that is asked, and also write something incorrect, you will lose marks.
  2. Vocabulary is important. A large part of this course is about communication. You must be able to deal with the subject matter using both technical terms and plain language.
  3. Some questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, if plain language is requested then you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like correlation, regression, ANOVA, statistically significant, factorial design, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
  4. It is also very important in describing a set of findings to say what happened! For example, do not just say that the average amount of rot in potatoes was related to temperature. Instead, say that there was more rot on average at warmer temperatures.

    In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?

  5. Some professors mark by looking for the correct answer, or part of it. If they find something good, you get points for it. This can encourage a kind of shotgun strategy for writing answers. Just write everything you can think of, and maybe some of it will be what this peculiar individual is looking for.

    But that strategy backfires when I mark an exam, because (except for simple numerical answers) I usually do not give marks for things that are correct; I take off marks for things that are wrong or missing. So, if a student writes a long answer that includes the correct conclusion, the wrong conclusion (based on the same information!) and something irrelevant, all I really see is the contradiction between the two conclusions, and I will probably give the answer a zero. Yet it might be that the student understands everything perfectly, but is just writing all the crazy stuff as insurance against the unlikely possibility that maybe that's what I am looking for. Let's make sure that you don't fall into this trap!