STA312 Fall 2012 Final Exam Information
Time and Location
The final exam will be on Friday December 14th from 9 a.m. to 12 p.m. in IB 140.
Jerry's Office Hours for the Final
- Wed. Dec. 5th, 11 a.m. to 2 p.m.
- Tues. Dec. 11th, 11 a.m. to 2 p.m.
- Wed. Dec. 12th, 11 a.m. to 2 p.m.
- Thurs. Dec. 13th, 11 a.m. to 2 p.m.
Format
You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator with a natural log/exponential function. Any kind is acceptable unless it has communications capability. Pencil is okay.
There are 9 questions, occupying 13 pages. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on homework assignments and quizzes are a good indication of what to expect.
Some of the questions (worth 37 points out of 100) include R printouts, and you will be asked to answer typical questions about them. But this time it's my output rather than yours. More information about the R part is given below.
Coverage
The final exam is cumulative. What you are supposed to be
able to do is indicated by the assignments. The text and lecture overheads
are intended to help you understand how to answer questions like the ones
in the assignments.
Not all parts of the course are equally represented. Mostly this is
because otherwise the exam would have been too long. Here are some details.
- For any method, know what the model parameters mean! This is the connection between reality (data) and the statistical model. If you can do this, you will be able to answer questions like
- State the model
- State the null hypothesis
- State the conclusion in plain language
- Use parameter estimates for prediction
- Give the test statistic that helps answer a concrete question.
- Any power analysis will use the non-central chi-square and not the kind of elaborate calculations involving the standard normal distribution that you see in the introductory slide set.
- You will not be asked to give definitions, but vocabulary is important. Know what the technical terminology means.
- But also be able to step away from technical vocabulary. The ability to describe results in plain, non-statistical language is important, and you will be asked to do it.
- Even though we mostly relied on numerical maximum likelihood, you may be asked to find maximum likelihood estimates explicitly, by differentiating the log likelihood and setting the derivative to zero. This will only be for the Poisson and the multinomial (including the Bernoulli and binomial as special cases).
- Regression and other analyses with normally distributed response variables were introduced to clarify concepts useful for categorical data analysis. They will not appear directly on the final exam.
- Make a table showing how dummy variables are defined? Quite possibly.
- Power analysis? Quite possibly. It pretty much has to be for a multinomial model or a special case of a multinomial.
- What are the main tools for data analysis that were introduced in the course?
- Little one-sample and two-sample problems involving the Bernoulli distribution. These are helpful for clarifying and illustrating basic concepts.
- One-dimensional methods for multinomial data.
- Fisher's exact test
- Logistic regression
- Poisson regression
- Multinomial logit models
- Log-linear models (lambda and beta notation)
- There will be nothing on structural zeros.
- Early versions of the exam were much too long. The exam is very predictable, but some things you would expect to see do not appear because they were cut out.
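To make the explicit maximum likelihood approach concrete, here is a sketch of the Poisson case. It assumes a random sample y_1, ..., y_n from a Poisson distribution with mean λ (this notation is assumed for illustration). Write down the log likelihood, differentiate, set the derivative to zero, and check the second derivative:

```latex
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!} \\
\ell(\lambda) = \log L(\lambda) &= -n\lambda + \Big(\sum_{i=1}^{n} y_i\Big)\log\lambda - \sum_{i=1}^{n}\log(y_i!) \\
\ell'(\lambda) &= -n + \frac{\sum_{i=1}^{n} y_i}{\lambda} = 0
  \quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y} \\
\ell''(\lambda) &= -\frac{\sum_{i=1}^{n} y_i}{\lambda^{2}} < 0
  \quad \text{(when } \textstyle\sum_{i} y_i > 0\text{), so } \hat{\lambda} \text{ is a maximum.}
\end{aligned}
```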
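The link between power and the non-central chi-square can be sketched in a few lines. Under the alternative hypothesis, a chi-square test statistic with df degrees of freedom has (approximately, for large samples) a non-central chi-square distribution with some noncentrality parameter, and power is the probability that it exceeds the usual critical value. The sketch below is in Python rather than R, uses hypothetical function names, and takes the noncentrality parameter as given, because how you get it depends on the model, the test and the sample size. It builds the non-central chi-square from its standard representation as a Poisson mixture of central chi-squares. In R, the same power is 1 - pchisq(qchisq(1 - alpha, df), df, ncp = lambda).

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """CDF of a central chi-square with df degrees of freedom.

    Equals the regularized lower incomplete gamma P(df/2, x/2),
    evaluated here by its power series.
    """
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # P(a, z) = z^a e^(-z) / Gamma(a+1) * sum_k z^k / ((a+1)(a+2)...(a+k))
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def ncx2_cdf(x: float, df: float, ncp: float) -> float:
    """CDF of a non-central chi-square: a Poisson(ncp/2) mixture of
    central chi-squares with df + 2j degrees of freedom."""
    if ncp == 0:
        return chi2_cdf(x, df)
    half = ncp / 2.0
    j_max = int(half + 10.0 * math.sqrt(half) + 50)  # covers essentially all Poisson mass
    total = 0.0
    for j in range(j_max + 1):
        log_weight = -half + j * math.log(half) - math.lgamma(j + 1.0)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return min(1.0, total)


def chi2_critical_value(df: float, alpha: float = 0.05) -> float:
    """Upper-alpha critical value of a central chi-square, found by bisection."""
    lo, hi = 0.0, float(df) + 10.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_power(ncp: float, df: float, alpha: float = 0.05) -> float:
    """Power = P(non-central chi-square > central critical value)."""
    crit = chi2_critical_value(df, alpha)
    return 1.0 - ncx2_cdf(crit, df, ncp)


if __name__ == "__main__":
    # With df = 1, a noncentrality parameter of about 7.85 gives power near 0.80.
    print(round(chi2_power(7.849, df=1), 3))  # prints 0.8
```

With a noncentrality parameter of zero the "power" is just alpha, which is a handy sanity check. Power grows as the noncentrality parameter grows, and the noncentrality parameter typically grows with the sample size, which is how a power analysis turns into a sample size calculation.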
The R Part
You will not be asked to write any R code on the final. Instead, you will be asked questions about R output that I have produced. I will limit myself to the following data sets.
- mathcat.data: Used in many lectures
- attack.data: Heart attack data described in Assignment 9
- The Birth Weight data. This is information about a sample of new mothers. The response variable is a variable called low, an indicator of a baby with dangerously low weight at birth. To get the data and more information, type library(MASS); help(birthwt) at the R prompt.
- The Titanic data. I used this in lecture to illustrate structural zeros, but there will be no structural zeros on the exam. For more information, type help(Titanic) at the R prompt.
Marking
Kaijie marks the quizzes, and I mark the final examination.
You might say that this section is about my personal
peculiarities -- just in the way I mark exams, of course. It is helpful for
you to know about this, so your exam-taking strategy will not conflict with
my exam-marking strategy.
The purpose of learning Statistics is so you can use statistical methods to
draw reasonable conclusions from numerical data. Often, the first several
parts of a question will ask for technical details, and the last part will
ask for a conclusion (often in plain, non-statistical language). If the technical part is missing, it does not matter
what you conclude. Similarly, an answer that has most of the technical
details right but gets the conclusion wrong (or leaves it off, or states it
incompletely) is almost worthless, and will get few marks. On the other
hand, if you make minor technical mistakes but draw reasonable conclusions from
what you have, you can still get substantial marks.
When I read an answer, my main goal is to verify that you know what's
going on. Here are some more details, mostly about what to avoid.
- Make sure you answer the question that is asked.
- If you answer another question instead of the one that's asked, you will lose substantial marks. It is especially risky to just
dump memory and answer a similar question from one of the assignments. If I detect this, you
will get a zero for the question. Thinking is what's important. Memory
without thinking is a crime that you should try to hide if you do commit
it.
- If you answer the question and also write something correct that is not asked, you will not get any extra marks. Your marks will be based on your answer to what is asked.
- However, if you say something off-topic that is wrong, you can definitely lose marks. To repeat, if you write a perfect answer to the question that is asked, and also write something incorrect, you will lose marks.
- Vocabulary is important. A large part of this course is about communication. You must be able to deal with the subject matter using both technical terms and plain language.
- Some questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, if plain language is requested then you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like logistic regression, log-linear model, independence, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
- It is also very important in describing a set of findings to
say what happened! For example, do not just say that
the average amount of rot in potatoes was related to
temperature. Instead, say that there was more rot on average at
warmer temperatures.
In a real-world situation (and in the artificial world we presently
inhabit, too), you don't get part marks for an answer that (correctly)
indicates a relationship is present, but does not say what it is. Imagine
you are working in marketing, and you leave a voice mail that says
"Consumers recalled one of the commercials better than the other one."
Click. Are you trying to frustrate your boss? Are you trying
to get fired?
- Some professors mark by looking for the correct answer, or
part of it. If they find something good, you get points for
it. This can encourage a kind of shotgun strategy for writing
answers. Just write everything you can think of, and maybe some of
it will be what this peculiar individual is looking for.
But that strategy backfires when I mark an exam, because
(except for simple numerical answers) I usually do not give marks for
things that are correct; I take off marks for things that are wrong or
missing. So, if a student writes a long answer that includes the correct
conclusion, the wrong conclusion (based on the same information!) and
something irrelevant, all I really see is the contradiction between the two
conclusions, and I will probably give the answer a zero. Yet it might be
that the student understands everything perfectly, but is just writing all
the crazy stuff as insurance against the unlikely possibility that maybe
that's what I am looking for. Let's make sure that you don't fall
into this trap!