About the Final Exam

STA312 Fall 2012 Final Exam Information

Time and Location

The final exam will be on Friday, December 14th, from 9 a.m. to 12 noon in IB 140.

Jerry's Office Hours for the Final

Wed. Dec. 5th, 11-2
Tues. Dec. 11th, 11-2
Wed. Dec. 12th, 11-2
Thurs. Dec. 13th, 11-2

Format

You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator with a natural log/exponential function. Any kind is acceptable unless it has communications capability. Pencil is okay.

There are 9 questions, occupying 13 pages. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on homework assignments and quizzes are a good indication of what to expect.

Some of the questions (worth 37 points out of 100) include R printouts, and you are asked to answer typical questions about them. But this time it's my output rather than yours. More information about the R part is given below.

Coverage

The final exam is cumulative. What you are supposed to be able to do is indicated by the assignments. The text and lecture overheads are intended to help you understand how to answer questions like the ones in the assignments.

Not all parts of the course are equally represented. Mostly this is because otherwise the exam would have been too long. Here are some details.

For any method, know what the model parameters mean! This is the connection between reality (data) and the statistical model. If you can do this, you will be able to answer questions like
- State the model
- State the null hypothesis
- State the conclusion in plain language
- Use parameter estimates for prediction
- Give the test statistic that helps answer a concrete question.
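As a small illustration of using parameter estimates for prediction, here is a sketch in R for a logistic regression with one explanatory variable. The estimates b0 and b1 and the value x are made-up numbers chosen for illustration; they are not output from any course data set.

```r
# Hypothetical estimates from a logistic regression: log odds = b0 + b1 * x
b0 <- -1.0   # assumed intercept estimate
b1 <- 0.5    # assumed slope estimate
x  <- 2      # value of the explanatory variable to predict at

log_odds   <- b0 + b1 * x        # estimated log odds of Y = 1
p_hat      <- plogis(log_odds)   # inverse logit: exp(L) / (1 + exp(L))
odds_ratio <- exp(b1)            # odds are multiplied by this when x goes up by 1
```

With these made-up numbers the estimated log odds are zero, so the estimated probability is one half.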
Any power analysis will use the non-central chi-square and not the kind of elaborate calculations involving the standard normal distribution that you see in the introductory slide set.
You will not be asked to give definitions, but vocabulary is important. Know what the technical terminology means.
But also be able to step away from technical vocabulary. The ability to describe results in plain, non-statistical language is important, and you will be asked to do it.
Even though we mostly relied on numerical maximum likelihood, explicit maximum likelihood by differentiating and setting the derivative to zero is possible -- but only for the Poisson and the multinomial (including the Bernoulli and Binomial as special cases).
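For the Poisson, differentiating the log likelihood and setting the derivative to zero gives the sample mean as the maximum likelihood estimate. The sketch below checks that against a numerical search in R. The data vector x is made up for illustration, and the search interval is an arbitrary choice.

```r
# Made-up counts, assumed to be a random sample from a Poisson distribution
x <- c(2, 0, 3, 1, 4, 2, 1)

# Log likelihood as a function of the Poisson parameter lambda
loglike <- function(lambda) sum(dpois(x, lambda, log = TRUE))

# Numerical maximum likelihood over a search interval
numerical <- optimize(loglike, interval = c(0.01, 20), maximum = TRUE)$maximum

# Explicit MLE: solving d/dlambda log L = sum(x)/lambda - n = 0 gives the mean
explicit <- mean(x)

c(numerical = numerical, explicit = explicit)  # the two should agree closely
```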
Regression and analysis with normally distributed response variables were introduced to clarify concepts useful for categorical data analysis. They will not be directly on the final exam.
Make a table showing how dummy variables are defined? Quite possibly.
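One way to check a hand-made dummy variable table is to let R build it. This is only a sketch: the variable name treatment and its three categories are hypothetical, and it assumes R's default indicator coding with the first category as the reference category, which may or may not be the coding a particular question asks for.

```r
# Hypothetical explanatory variable with three categories
treatment <- factor(c("Control", "DrugA", "DrugB"))

# One row per category. R's default indicator coding uses the first
# level ("Control") as the reference category.
dummy_table <- model.matrix(~ treatment)
rownames(dummy_table) <- levels(treatment)
dummy_table
```

Each row shows the values of the dummy variables for one category; the reference category gets zeros on both.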
Power analysis? Quite possibly. It pretty much has to be for a multinomial model or a special case of a multinomial.
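Here is a rough sketch of a power calculation with the non-central chi-square for a one-dimensional multinomial problem. The probabilities, sample size and significance level are all made up, and the non-centrality parameter uses the Pearson form, n times the sum of (true minus null probability) squared over the null probability. Treat that as an assumption and use the formula from lecture if it differs.

```r
# Hypothetical setup: three categories, H0 says all are equally likely
pi0   <- c(1/3, 1/3, 1/3)   # probabilities under the null hypothesis
pi1   <- c(0.4, 0.3, 0.3)   # assumed true probabilities
n     <- 200                # assumed sample size
alpha <- 0.05               # significance level

df  <- length(pi0) - 1
ncp <- n * sum((pi1 - pi0)^2 / pi0)   # non-centrality, Pearson form (assumed)

critical_value <- qchisq(1 - alpha, df)             # from the central chi-square
power <- 1 - pchisq(critical_value, df, ncp = ncp)  # from the non-central chi-square
```

To find a required sample size, one could repeat the calculation for increasing n until the power reaches the desired level.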
What are the main tools for data analysis that were introduced in the course?
- Little one-sample and two-sample problems involving the Bernoulli distribution. These are helpful for clarifying and illustrating basic concepts.
- One-dimensional methods for multinomial data.
- Fisher's exact test
- Logistic regression
- Poisson regression
- Multinomial logit models
- Log-linear models (lambda and beta notation)
There will be nothing on structural zeros.
Early versions of the exam were much too long. The exam is very predictable, but some things you would expect to see do not appear because they were cut out.

R

You will not be asked to write any R code on the final. Instead, you will be asked questions about R output that I have produced. I will limit myself to the following data sets.

mathcat.data: Used in many lectures
attack.data: Heart attack data described in Assignment 9
The Birth Weight data. This is information about a sample of new mothers. The response variable is a variable called low, an indicator of a baby with dangerously low weight at birth. To get the data and more information, type library(MASS); help(birthwt) at the R prompt.
The Titanic data. I used this in lecture to illustrate structural zeros, but there will be no structural zeros on the exam. For more information, type help(Titanic) at the R prompt.
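If you want to get familiar with the last two data sets before the exam, something like the following sketch may help. The choice of smoke as the explanatory variable is only an example and is not necessarily a model that appears on the exam. According to help(birthwt), low indicates a birth weight under 2.5 kg.

```r
library(MASS)  # MASS ships with standard R installations

# Birth weight data: 'low' is 1 when birth weight is under 2.5 kg
data(birthwt)
fit <- glm(low ~ smoke, family = binomial, data = birthwt)
summary(fit)   # the kind of printout you should be able to read

# Estimated probability of low birth weight for non-smokers (0) and smokers (1)
p_hat <- predict(fit, newdata = data.frame(smoke = c(0, 1)), type = "response")

# Titanic is a 4-dimensional table: Class x Sex x Age x Survived
dim(Titanic)
survival_by_class <- apply(Titanic, c(1, 4), sum)  # collapse over Sex and Age
survival_by_class
```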

More comments and suggestions

Kaijie marks the quizzes, and I mark the final examination. You might say that this section is about my personal peculiarities -- just in the way I mark exams, of course. It is helpful for you to know about this, so your exam-taking strategy will not conflict with my exam-marking strategy.

The purpose of learning Statistics is so you can use statistical methods to draw reasonable conclusions from numerical data. Often, the first several parts of a question will ask for technical details, and the last part will ask for a conclusion (often in plain, non-statistical language). If the technical part is missing, it does not matter what you conclude. Similarly, an answer that has most of the technical details right but gets the conclusion wrong (or leaves it off, or states it incompletely) is almost worthless, and will get few marks. On the other hand, if you make minor technical mistakes but draw reasonable conclusions from what you have, you can still get substantial marks.

When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.

Make sure you answer the question that is asked.
1. If you answer another question instead of the one that's asked, you will lose substantial marks. It is especially risky to just dump memory and answer a similar question from one of the assignments. If I detect this, you will get a zero for the question. Thinking is what's important. Memory without thinking is a crime that you should try to hide if you do commit it.
2. If you answer the question and also write something correct that is not asked, you will not get any extra marks. Your marks will be based on your answer to what is asked.
3. However, if you say something off-topic that is wrong, you can definitely lose marks. To repeat, if you write a perfect answer to the question that is asked, and also write something incorrect, you will lose marks.
Vocabulary is important. A large part of this course is about communication. You must be able to deal with the subject matter using both technical terms and plain language.
Some questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, if plain language is requested then you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like logistic regression, log-linear model, independence, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
It is also very important in describing a set of findings to say what happened! For example, do not just say that the average amount of rot in potatoes was related to temperature. Instead, say that there was more rot on average at warmer temperatures.
In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?
Some professors mark by looking for the correct answer, or part of it. If they find something good, you get points for it. This can encourage a kind of shotgun strategy for writing answers. Just write everything you can think of, and maybe some of it will be what this peculiar individual is looking for.
But that strategy backfires when I mark an exam, because (except for simple numerical answers) I usually do not give marks for things that are correct; I take off marks for things that are wrong or missing. So, if a student writes a long answer that includes the correct conclusion, the wrong conclusion (based on the same information!) and something irrelevant, all I really see is the contradiction between the two conclusions, and I will probably give the answer a zero. Yet it might be that the student understands everything perfectly, but is just writing all the crazy stuff as insurance against the unlikely possibility that maybe that's what I am looking for. Let's make sure that you don't fall into this trap!