STA312 Fall 2022 Final Exam Information
Time and Location
The final exam will be on Friday December 16th from 9 a.m. to 12 p.m. in Gym A/B (Rec. center attached to the Davis building).
Jerry's Office Hours for the Final
- Wednesday Dec. 14th 12-2, in Deerfield 3028. If we have to move to another space I'll leave a note on the door.
- Thursday Dec. 15th 2-4, starting in Deerfield 3028.
Format
You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator with a natural log/exponential function. Any kind is acceptable unless it has communications capability. Pencil is okay.
The exam is 14 pages long, including space for answers. There are 9 questions. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on homework assignments and quizzes are a good indication of what to expect. This is like a big long quiz.
Some of the questions (worth 30 points out of 100) include R printouts, and you are asked to answer typical questions about them. But this time it's my output rather than yours. More information about the R part is given below.
Coverage
The final exam is cumulative, except for the following.
- There are no questions on the analysis of within-cases data. That is, there are no questions on lecture units 21, 22, 23 or 24. There is no homework on this material, either.
- Regression and analysis of variance with normally distributed response variables were introduced to clarify concepts useful for logistic regression and multinomial logit models. They will not be directly on the final exam.
- Review material (most of Assignment 1) is de-emphasized, but you do need to know how to derive an MLE.
For the rest of the course material, what you are supposed to be able to do is indicated by the assignments. The text and lecture overheads are intended to help you understand how to answer questions like the ones in the assignments.
How to prepare
I would say do the homework again -- well, most of the homework, anyway. Specifically, why not start with these questions? For the R parts, answer the questions again based on your printouts. These are the kinds of questions I like to ask, and you should consider asking similar questions about the final exam data sets.
- Assignment 1: Just Q7. Question 8 is attractive too, but it's not going to be on the exam.
- Assignment 2: Questions 2 through 8.
- Assignment 3: All the questions.
- Assignment 4: Questions 8 through 11. I particularly like 11.
- Assignment 5: All the questions.
- Assignment 6: This is all normal theory regression. Skip it.
- Assignment 7: All the questions.
- Assignment 8: Questions 1 and 2. Skip 3, which is normal theory regression. You do have to be able to make a table, of course.
- Assignment 9: Question 3 only. The other two are normal theory regression.
- Assignment 10: All the questions.
- Assignment 11: Is this really another assignment? It's just a couple more questions so that nothing comes as a surprise. I don't even know if it should get its own number.
Comments and suggestions
- Assignment 1
- Assignment 3
- Assignment 4
- Assignment 5
- Assignment 6
- Assignment 7
- Assignment 8
- Assignment 9
- Assignment 11
You will not be asked to write any R code on the final. Instead, you will be asked questions about R output that I have produced. The R questions will be based on at least two of the following data sets. Do some analyses similar to what you were asked to do in homework. What I'm going to do with these data on the final exam is quite predictable.
- deathpen4.data.txt: Prisoners who were convicted of murder in Florida were classified as either Black or White, their victims were either Black or White, and they either got the death penalty, or they did not.
- ltc.data.txt: LTC stands for Long Term Care. Operators of long-term care homes are very interested in whether their elderly residents are going to survive, because they need to plan. In one study, the variables for a sample of residents were
- One year survival (1=Yes, 0=No)
- Age in years
- Gender (1=F, 0=M)
- Indicator for dementia (1=Yes, 0=No)
To get a Yes for dementia, it has to be pretty serious, so that the person cannot safely go outside without supervision.
- choice.data.txt: In the Program Choice data, graduating grade eight students were choosing their high school program. The potential choices were Academic, General and Vocational. Predictor variables are gender, socioeconomic status, and scores on standardized tests in reading, writing, math and science.
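As one illustration of "analyses similar to what you were asked to do in homework," here is a minimal R sketch of a logistic regression on data shaped like the long-term care study. It is only a sketch under stated assumptions: the data are simulated, because the layout and column names of ltc.data.txt are not given here, and the variable names (survived, age, female, dementia) and the coefficients used to generate the data are hypothetical. It is not a statement of what appears on the exam.

```r
# Simulated stand-in for the LTC data. The real file's column names and
# layout are not shown in this document, so everything here is hypothetical.
set.seed(312)
n <- 200
age      <- round(runif(n, 65, 100))   # Age in years
female   <- rbinom(n, 1, 0.6)          # Gender (1=F, 0=M)
dementia <- rbinom(n, 1, 0.3)          # Indicator for dementia (1=Yes, 0=No)

# Made-up "true" model, chosen only so the simulated data look plausible
eta      <- 6 - 0.06 * age + 0.5 * female - 0.8 * dementia
survived <- rbinom(n, 1, plogis(eta))  # One year survival (1=Yes, 0=No)
ltc <- data.frame(survived, age, female, dementia)

# Logistic regression of survival on age, gender and dementia
fit <- glm(survived ~ age + female + dementia, family = binomial, data = ltc)
summary(fit)      # Estimated coefficients with Wald z-tests

# Estimated odds ratios: exp of each regression coefficient
exp(coef(fit))

# Likelihood ratio test for dementia, controlling for age and gender
reduced <- glm(survived ~ age + female, family = binomial, data = ltc)
anova(reduced, fit, test = "Chisq")

# Estimated probability of one-year survival for an 80-year-old woman
# without dementia
newx <- data.frame(age = 80, female = 1, dementia = 0)
phat <- predict(fit, newdata = newx, type = "response")
phat
```

With output like this in hand, the useful practice is to state each conclusion in words, including the direction of the relationship, rather than just reading off p-values.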
Haris marks the quizzes, and I mark the final examination. When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.
- Make sure you answer the question that is asked.
- If you answer another question instead of the one that's asked, you will lose substantial marks. It is especially risky to just
dump memory and answer a similar question from one of the assignments. If I detect this, you
will get a zero for the question. Thinking is what's important. Memory
without thinking is a crime that you should try to hide if you do commit
it.
- If you answer the question and also write something correct that is not asked, you will not get any extra marks. Your marks will be based on your answer to what is asked.
- However, if you say something off-topic that is wrong, you can definitely lose marks. To repeat, if you write a perfect answer to the question that is asked, and also write something incorrect, you will lose marks.
- Vocabulary is important. A large part of this course is about communication. You must be able to deal with the subject matter using both technical terms and plain language.
- Some questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, if plain language is requested then you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like logistic regression, log-linear model, independence, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
- It is also very important in describing a set of findings to
say what happened! For example, do not just say that
the average amount of rot in potatoes was related to
temperature. Instead, say that there was more rot on average at
warmer temperatures.
In a real-world situation (and in the artificial world we presently
inhabit, too), you don't get part marks for an answer that (correctly)
indicates a relationship is present, but does not say what it is. Imagine
you are working in marketing, and you leave a voice mail that says
"Consumers recalled one of the commercials better than the other one."
Click. Are you trying to frustrate your boss? Are you trying
to get fired?
- Some professors mark by looking for the correct answer, or
part of it. If they find something good, you get points for
it. This can encourage a kind of shotgun strategy for writing
answers. Just write everything you can think of, and maybe some of
it will be what this peculiar individual is looking for.
But that strategy backfires when I mark an exam, because
(except for simple numerical answers) I usually do not give marks for
things that are correct; I take off marks for things that are wrong or
missing. So, if a student writes a long answer that includes the correct
conclusion, the wrong conclusion (based on the same information!) and
something irrelevant, all I really see is the contradiction between the two
conclusions, and I will probably give the answer a zero. Yet it might be
that the student understands everything perfectly, but is just writing all
the crazy stuff as insurance against the unlikely possibility that maybe
that's what I am looking for. Let's make sure that you don't fall
into this trap!