About the Final

STA312f10 Final Exam Information

Time and Location

The final exam will be on Thursday Dec 9th from 8-11 a.m. in the Cafeteria (South Building).

Office Hours:

Tuesday Nov. 30th: 10 a.m. - 12 noon (Jerry)
Thursday Dec. 2nd: 10 a.m. - 12 noon (Jerry)
Tuesday Dec. 7th: 10 a.m. - 12 noon (Christine)
Wednesday Dec. 8th: 10 a.m. - 1 p.m. (Jerry)

Review slides (One part accidentally repeated near the end.)

Aids Allowed

Calculator (a statistical calculator is allowed) and formula sheet. The formula sheet will be supplied with the exam; click here for a copy. You will notice that the error caught in class has been corrected.

Make sure that the calculator you bring has natural log and exponential functions. Be careful not to use the log key on your calculator; almost certainly it gives log base 10. In this course, log means natural log, which is probably ln on your calculator.
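As a quick check of the difference between the two log keys, here is a minimal sketch in Python (not a language used in this course; the value 20 is arbitrary). `math.log` is the natural log, and `math.log10` is log base 10.

```python
import math

x = 20.0  # arbitrary illustrative value

natural_log = math.log(x)    # natural log: the "ln" key on most calculators
common_log = math.log10(x)   # log base 10: usually the "log" key

# The exponential function undoes the natural log, not log base 10.
recovered = math.exp(natural_log)

print("ln(x)      =", round(natural_log, 4))
print("log10(x)   =", round(common_log, 4))
print("exp(ln(x)) =", round(recovered, 4))
```

If your calculator's answers match the first line rather than the second, you are using the right key.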

Format

It's a three-hour exam. You will write your answers on the examination paper. There are six questions. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect. Also see more detailed information below.

Preparing for the exam

I see three main ways to study for the exam: reviewing your answers to homework assignments, reviewing the lecture displays (the overheads), and doing one final set of data analyses.

Reviewing the Homework

For the exam, some homework problems are more important than others. If you know how to do the following, you will be fine.

Assignment 1: #13 (not #14, though it is interesting ...)
Assignment 2:
- #9, but forget about the R part. See formula sheet for the critical values. You may see p-values on the exam, but if so they will be on printouts provided as part of the exam. Just know how to interpret them (Reject H₀ or not).
- #10
Assignment 3:
- #5: Be able to do (c) by hand
- #7
- #8: I should have mentioned that this is a saturated model.
- #9, #10, #11
- #13 by hand with a calculator. I will not ask you to do any plotting on the exam, though I do think plotting log observed frequencies is a useful trick.
Assignment 4
- #1-3, 5
- Just look at your output from #6, and be sure you understand the output. One thing I did not ask much on the homework, but I do ask a lot on the exam, is to state the null hypothesis in symbols. So, for every log-linear model you fit, be able to state the model in bracket notation and, when the fit of the model is tested, be able to state the null hypothesis in μ notation. In little bits and pieces, this is worth quite a few marks on the exam.
Assignment 5 and Assignment 6: Carefully review the process and the kind of question, but realize it will be about the heart data; see the computer assignment below for more detail. Again, for every log-linear model you fit, be able to state the model in bracket notation and, when the fit of the model is tested, be able to state the null hypothesis in μ notation.
Assignment 7: #3-7
Assignment 8:
- #1, 2
- #4 Especially see Part (b), but of course with the heart data. Note the kind of question. Here, I did ask for H₀ in symbols, something that you will see for logistic regression on the exam. But the logistic regression output on the exam is generated by SAS.
Assignment 9:
- #1. Consider a question of this type for logistic regression, too.
- #3, but of course with the heart data.
Assignment 10:
- #1: And, could you carry out a Score test for this hypothesis? But don't spend a huge amount of time working on the Wald and Score tests. Just because something is on the formula sheet does not mean it is on the shortened version of the exam that you will see on Thursday. This hint does not apply to the Special Deferred exam, which I hope no one will take.
  But take a look at parts (m) through (r). They are trying to teach the same lesson as #2 on Assignment 8.
- #2: This is the same data set that will be used on the final exam, so review this one carefully and use your work as a starting point for the computer assignment. For every test on the printout, you should be able to give the null hypothesis in symbols. For the symbols, see the formula sheet. Remember, the order of variables in the model statement will correspond to the order of the independent variables in the logistic regression formula.
- #3: There is definitely something like this on the exam. What could you do if you had the asymptotic covariance matrix as well?
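One possibility for the last question (a sketch only, not necessarily the intended answer) is that an asymptotic covariance matrix lets you compute a Wald test for a difference between two parameters. The estimates and covariance values below are invented for illustration and do not come from any course printout. The critical value 3.84 is the chi-squared critical value with 1 degree of freedom at the 0.05 level.

```python
# Hypothetical parameter estimates (illustrative numbers only)
estimates = [0.80, 0.30]  # estimated beta_1 and beta_2

# Hypothetical asymptotic covariance matrix of the two estimates
covariance = [[0.040, 0.010],
              [0.010, 0.050]]

# H0: beta_1 = beta_2, that is, beta_1 - beta_2 = 0
difference = estimates[0] - estimates[1]

# Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2)
variance_of_difference = (covariance[0][0] + covariance[1][1]
                          - 2 * covariance[0][1])

# Wald statistic: squared difference divided by its estimated variance
wald_statistic = difference ** 2 / variance_of_difference

critical_value = 3.84  # chi-squared, df = 1, 0.05 level
reject_h0 = wald_statistic > critical_value

print("Wald statistic:", round(wald_statistic, 4))
print("Reject H0 at the 0.05 level?", reject_h0)
```

Without the covariance term, the variance of the difference cannot be computed from the standard errors alone.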

Final Computer Assignment

Seventy marks out of 100 are based on answering questions about pieces of computer printout from R and SAS. Log-linear models and Poisson regression (if there is any) will be done with R, and logistic regression will be done with SAS. Everything will be based on the heart data used in Assignment 10. The file heartread.sas has been slightly expanded; you should download the most recent version and work from that. Here are some details and suggestions.

Log-linear models: In Assignment 10, you found two useful predictors of heart attacks, and they were both categorical. This gives you a nice 3-dimensional table to analyze with log-linear modeling methods. To get the data into R, I suggest you not struggle with the raw data, which has missing values and could be a real pain with R. Instead, make a three-dimensional table with proc freq, and put it into R manually. Find a good model. For any model you fit, be able to describe the model in bracket notation, and using language like "Television is associated with traffic accidents," and so on. When you test the model, be able to state the null hypothesis in symbols. Once you have the model you like most, be able to say what is going on in plain language, like "Those who watch a lot of Television have fewer traffic accidents."
Look at some 2-dimensional marginal tables. There certainly will be a 2x2 table. A lot of questions can be asked about a 2x2 table. Could you label the output as in Log-linear Part 2?
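To illustrate the kind of hand calculation a 2x2 table supports, here is a minimal sketch in Python (not a course language). The counts are made up and are not the heart data. It computes expected frequencies under independence, the likelihood ratio statistic G², the Pearson statistic X², and the odds ratio, then compares G² with the critical value 3.84 (chi-squared, 1 degree of freedom, 0.05 level).

```python
import math

# Hypothetical 2x2 table of counts (NOT the heart data).
# Rows: exposure yes / no.  Columns: event yes / no.
table = [[30, 70],
         [20, 130]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Expected frequencies under independence: (row total)(column total)/n
expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
            for i in range(2)]

cells = [(i, j) for i in range(2) for j in range(2)]

# Likelihood ratio statistic G^2 = 2 * sum of observed * log(observed/expected)
g2 = 2 * sum(table[i][j] * math.log(table[i][j] / expected[i][j])
             for i, j in cells)

# Pearson statistic X^2 = sum of (observed - expected)^2 / expected
x2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i, j in cells)

# Sample odds ratio (cross-product ratio)
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

critical_value = 3.84  # chi-squared, df = 1, 0.05 level
reject_independence = g2 > critical_value

print("Expected frequencies:", expected)
print("G^2 =", round(g2, 3), " X^2 =", round(x2, 3))
print("Odds ratio =", round(odds_ratio, 3))
print("Reject independence at the 0.05 level?", reject_independence)
```

Note that the log here is the natural log, as everywhere in this course.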
Logistic Regression with a 2-category outcome: With the heart data, consider models not just for heart attack (attack), but for presence versus absence of coronary heart disease (chd), and whether the person was alive 10 years later (alive). Potential predictor variables will be limited to age diastol cholest bmi smoker famhist educat.
Logistic Regression where the outcome has more than 2 categories: The variable you will see is named outcome. Make a frequency distribution to see what it is. On the exam, the reference category will be "Alive 10 yrs later." Find a nice model. Again, the predictor variables will be limited to age diastol cholest bmi smoker famhist educat. Try generating some simple models too, and be sure you can give estimates of quantities like β₁₂ and so on. For all tests, be able to state the null hypothesis in symbols, and be able to state conclusions in plain, non-technical language. You will not see ordered categories on the final exam.
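As an illustration of how parameter estimates turn into estimated probabilities when the outcome has more than 2 categories, here is a minimal sketch in Python (not a course language). It assumes a generalized logit model in which each non-reference category has its own intercept and slope relative to the reference category. The category labels, coefficients, and predictor value are all hypothetical and do not come from the heart data.

```python
import math

# Hypothetical estimates (NOT from any printout): one (intercept, slope)
# pair for each non-reference category, with a single predictor x.
coefficients = {
    "category A": (-2.0, 0.03),
    "category B": (-3.0, 0.04),
}
x = 50.0  # hypothetical predictor value

# Linear predictor for each non-reference category:
# log(probability of category / probability of reference)
linear_predictors = {name: b0 + b1 * x
                     for name, (b0, b1) in coefficients.items()}

# The reference category contributes exp(0) = 1 to the denominator.
denominator = 1.0 + sum(math.exp(v) for v in linear_predictors.values())

probabilities = {name: math.exp(v) / denominator
                 for name, v in linear_predictors.items()}
probabilities["reference"] = 1.0 / denominator

for name, p in probabilities.items():
    print(name, round(p, 4))
```

The estimated probabilities add up to one, and the ratio of any category's probability to the reference probability is the exponential of that category's linear predictor.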

Here are a few more comments.

For any printout, be able to give parameter estimates. They are numbers from the printout. This shows you understand how the model is connected to what the software does.
Does the model fit adequately? Answer Yes or No.
Be able to use a critical value. I may sometimes edit out the column of p-values.
It's always the 0.05 level, whether I say it explicitly or not. No tricks.
You may see some test statements in proc logistic. Be sure you can carry out and interpret these tests, even when the outcome has more than 2 categories.
You will not be asked to write any SAS or R code on the final.