STA442/2101 Final Exam Information
Office hours will be in Bissel 114, the same basement location we have had for most of the term.
Here are my answers to Quizzes 1-10, except for the R code. From this point on, no further changes to the marking of Quizzes 1-10 will be considered. I will post the solution to Quiz 11 once you've had a chance to get your quiz back and look it over.
The current (and final) formula sheet will be supplied with the exam. It is a very good idea to be familiar with it, so you can find what you need easily.
There are 8 questions, occupying 15 pages. Some of the pages consist of R output. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect.
Not all parts of the course are equally represented. Early versions of the exam were much too long. The exam is quite predictable, but some things you would expect to see do not appear because they were cut out.
If you do nothing else, at least familiarize yourself with the studies and variables. The final exam assumes you are familiar with these data sets. It does not include a full description of the studies and variables. During the exam, I will answer questions about the data, but only if the answers are very brief.
You will notice below that, unlike the R assignments during the term, you are not always being asked specific sample questions about the data sets. This time, it is your job to ask the relevant questions and choose the statistical techniques that will help you answer them. The questions on computer assignments during the term should be your guide. For some of the data sets, more than one statistical technique is applicable, and you should not hesitate to do more than one kind of analysis. Be prepared to follow up any significant tests with multiple comparisons.
Of course you may discuss the questions with other people, but this is not the time to let yourself be convinced too easily by your friends. I promise you that in several cases, there is more than one set of questions you could ask about the data, and (correspondingly) more than one natural and reasonable analysis. Please avoid tunnel vision, and do it your way first. Then compare answers. This way, it's more likely that somebody will think of what I'll do on the exam. In a group setting, if four people come up with six analyses, the whole group will benefit. The question is not which analysis is right, but whether each one is reasonable (or not).
Thirty-four marks out of 100 are based on R output.
This is a built-in R data set. In the R Package manager, check MASS. Then click on birthwt to see a description of the data set. One possible response variable is baby's weight at birth, but also the "indicator of birth weight less than 2.5 kg" variable is clinically meaningful because babies in that category tend to have health problems.
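As a starting point only, here is a minimal sketch of loading the data and fitting one model for each of the two response variables mentioned above. The choice of explanatory variables (age, lwt, race, smoke) and the use of Tukey's HSD for the follow-up are assumptions made for illustration, not a statement of what the exam asks; other analyses are equally reasonable.

```r
# Sketch only: one of several reasonable analyses of MASS::birthwt.
library(MASS)   # MASS is a recommended package included with standard R installs

data(birthwt)
bw <- birthwt

# Code the categorical variables as factors (labels follow the MASS help page)
bw$race  <- factor(bw$race,  levels = 1:3, labels = c("white", "black", "other"))
bw$smoke <- factor(bw$smoke, levels = 0:1, labels = c("no", "yes"))

# Continuous response: birth weight in grams
fit_lm <- lm(bwt ~ age + lwt + race + smoke, data = bw)
print(summary(fit_lm))

# Binary response: indicator of birth weight less than 2.5 kg
fit_glm <- glm(low ~ age + lwt + race + smoke, family = binomial, data = bw)
print(summary(fit_glm))

# Follow-up multiple comparisons among the three race groups.
# Tukey's HSD is an assumption here; substitute the method used in the course.
fit_aov <- aov(bwt ~ race, data = bw)
print(TukeyHSD(fit_aov))
```

Typing help(birthwt) after library(MASS) shows the same variable descriptions as the package manager.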
I know this is gruesome, but the data are real -- from the U of T School of Dentistry.
An experiment in dentistry seeks to test the effectiveness of a drug (HEBP) that is supposed to help dental implants become more firmly attached to the jaw bone. This is an initial test on animals. False teeth were implanted into the leg bones of rabbits, and the rabbits were randomly assigned to receive either the drug or a saline solution (placebo). Technicians administering the drug were blind to experimental condition.
Rabbits were also randomly assigned to be "sacrificed" after either 3, 6, 9 or 12 days. At that time, the implants were pulled out of the bone by a machine that measures force in newtons and stiffness in newtons/mm. For both of these measurements, higher values indicate more healing, because it takes more force to pull out the tooth. A measure of "pre-load stiffness" in newtons/mm is also available for each animal. This may be another indicator of how firmly the false tooth was implanted into the bone, but it might even be a covariate. Nobody can seem to remember what "preload" means, so we'll ignore this variable for now. The variables are
These data represent growth for a sample of Alaskan and Canadian salmon. Apparently, growth during different time periods can be estimated by the diameter of rings in a fish's scales. We have two measurements of growth: marine growth (growth during the fishes' first year of life in the ocean) and freshwater growth. The variables are:
A clinician studied the effects of two drugs, used either alone or together, on the blood flow of human subjects. Twelve healthy middle-aged men participated in the study; they are viewed as a random sample. Each of the men received all four treatment combinations in a random order, with 2-week resting periods in between. The four values for each subject are increases in blood flow compared to a single baseline measurement. Here is the format of the data file:
            No Drug A          Yes Drug A
          _____________      ______________
          No B    Yes B      No B    Yes B
Patient   ----    -----      ----    -----
   1        2      10          9      25
   2       -1       8          6      21
   3        0      11          8      24
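Because every man received all four combinations, one possible approach (a sketch, not necessarily the intended analysis) is to reduce each subject's four values to within-subject contrasts for the effect of Drug A, the effect of Drug B, and their interaction, and then test whether each contrast has mean zero. The code below uses only the three patients shown in the format listing; the column names are hypothetical, and the real file has twelve rows.

```r
# Only the three patients shown in the format listing (the full file has twelve).
# Column names are made up for this illustration.
flow <- data.frame(
  patient   = 1:3,
  noA_noB   = c( 2, -1,  0),
  noA_yesB  = c(10,  8, 11),
  yesA_noB  = c( 9,  6,  8),
  yesA_yesB = c(25, 21, 24)
)

# Within-subject contrasts, one value per patient
effA  <- with(flow, (yesA_noB + yesA_yesB) / 2 - (noA_noB + noA_yesB) / 2)  # main effect of A
effB  <- with(flow, (noA_yesB + yesA_yesB) / 2 - (noA_noB + yesA_noB) / 2)  # main effect of B
inter <- with(flow, (yesA_yesB - yesA_noB) - (noA_yesB - noA_noB))          # A-by-B interaction

# One-sample t-tests of H0: mean contrast = 0
print(t.test(effA))
print(t.test(effB))
print(t.test(inter))
```

With only three rows the tests have just 2 degrees of freedom, so this shows the mechanics rather than a result. A positive interaction contrast means that the effect of Drug B is larger when Drug A is also given.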
An anonymous (I hope) graduate student marked the quizzes, but I mark the final examination. We have basically the same standards and objectives, but we are not identical (lucky for him). You might say that this section is about my personal peculiarities -- just in the way I mark exams, of course. It is helpful for you to know about this, so your exam-taking strategy will not conflict with my exam-marking strategy.
The purpose of this course is to help you learn to use statistical methods to draw reasonable conclusions from numerical data. Often, the first several parts of a question will ask for technical details, and the last part will ask for a conclusion (often in plain, non-statistical language). If the technical part is missing, it does not matter what you conclude. Similarly, an answer that has most of the technical details right but gets the conclusion wrong (or leaves it off, or states it incompletely) is almost worthless, and will get few marks. On the other hand, if you make minor technical mistakes but draw reasonable conclusions from what you have, you can still get substantial marks.
When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.
In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?
But that strategy backfires when I mark an exam, because (except for simple numerical answers) I usually do not give marks for things that are correct; I take off marks for things that are wrong or missing. So, if a student writes a long answer that includes the correct conclusion, the wrong conclusion (based on the same information!) and something irrelevant, all I really see is the contradiction between the two conclusions, and I will probably give the answer a zero. Yet it might be that the student understands everything perfectly, but is just writing all the crazy stuff as insurance against the unlikely possibility that maybe that's what I am looking for. Let's make sure that you don't fall into this trap!