About the Final

STA442/1008 Final Exam Information

Time and Location

The final exam will be on Monday Dec 7th from 12-3 p.m. in the Gym (South Building).

Office Hours

Wednesday Dec. 2nd: 10-12 (Jerry)
Thursday Dec. 3rd: Cancelled because of illness
Friday Dec. 4th: 2-5 (Jerry)
Sunday Dec. 6th: 10-1 (Jerry)

Format

You will write your answers in examination books. Please avoid the temptation to write answers on the examination paper, especially when filling out tables. Copy the table into your exam book, and then fill it out. If you write on the examination paper rather than the book, your answer will probably get lost, and you will get a zero for the question.

The exam will be closed book and closed notes. You should bring a calculator.

There are nine questions. Most of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect.

The last several questions (worth 40 out of 100 marks on the exam) are based on SAS output that will be provided to you. Again, the type of questions will be familiar from the assignments and quizzes. More information about the SAS part is given below.

Coverage

The final exam is cumulative. What you are supposed to be able to do is indicated by the assignments, including this one. The text, handouts and lecture overheads are intended to help you understand how to do the assignments.

Not all parts of the course are equally represented. Here are some details.

The basic concepts and vocabulary of Chapter One are important, and worth a relatively large number of marks. See Assignment 1. There will be some true-false questions, and you will have to get 8 out of 10 right in order to get any marks. Do you think you might have to make up a study?
Given a data-oriented question (like "Given level of rainfall, is soil acidity a useful predictor of crop yield?"), you should be able to state the null hypothesis you would test. The answer will be some statement about μ or (more likely) β quantities.
Regression and logistic regression are important. Review the ideas of full versus reduced models, interpretation of regression coefficients, and dummy variable coding for categorical independent variables. You may be asked to calculate Y-hat or an estimated probability. Formulas will be on the cover sheet of the exam if you need them; you need not memorize any formulas. Bring a calculator. Do you think you might have to make a table of E[Y|X] for the different values of a categorical independent variable?
Of course setting up tests for main effects and interactions in terms of contrasts is on the exam.
Analysis of within-cases data is important. Of the three main methods for treating such data, the old-fashioned mixed-model approach (in which subject is a random effect nested within the between-cases factors) is covered in the text only (not in lecture), and will not be on the exam. As you know, these tests appear as a by-product of the multivariate approach (labelled "univariate tests"), but you will not be asked any questions about them.
If Scheffé tests appear, they will be in the SAS section, and not in connection with multivariate tests. You don't need to know any formulas.
Early versions of the exam were much too long. Some things you would expect do not appear because they were cut out.
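
To see what a table of E[Y|X] looks like under dummy variable coding, here is a minimal Python sketch (the course itself uses SAS). It assumes a hypothetical categorical independent variable with three levels, labelled A, B and C. These are coded with indicator dummy variables d1 and d2, with C as the reference category, so that E[Y|X] = β0 + β1d1 + β2d2. The coefficient values are invented for illustration and do not come from any course data set.

```python
# Sketch: table of E[Y|X] for a categorical independent variable with
# three hypothetical levels, using indicator (dummy) coding with
# level "C" as the reference category.
#   E[Y|X] = b0 + b1*d1 + b2*d2
# The coefficient values used below are invented for illustration only.

def dummy_code(level):
    """Return (d1, d2) for a level; 'C' is the reference category."""
    return {"A": (1, 0), "B": (0, 1), "C": (0, 0)}[level]

def expected_y(level, b0, b1, b2):
    """Conditional expected value of Y for one level of the factor."""
    d1, d2 = dummy_code(level)
    return b0 + b1 * d1 + b2 * d2

if __name__ == "__main__":
    b0, b1, b2 = 10.0, 2.5, -1.0  # hypothetical estimates
    print("Level  d1  d2  E[Y|X]")
    for level in ("A", "B", "C"):
        d1, d2 = dummy_code(level)
        print(f"{level:>5}  {d1:>2}  {d2:>2}  {expected_y(level, b0, b1, b2)}")
```

Reading down the last column shows how the coefficients are interpreted. β1 is the difference between the expected values for A and for the reference category C, and β2 is the corresponding difference for B. Testing H0: β1 = β2 = 0 therefore amounts to testing whether the three expected values are equal.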

SAS

You will not be asked to write any SAS code on the final. There will be five data sets, one of which you have seen before. They are described below. To prepare for the final exam, familiarize yourself with the data sets and analyze them using methods from the course. Draw conclusions, and be ready to state your findings in plain, non-statistical language. You will not bring your printouts to the exam. Instead, you will get a copy of my printouts, and will answer questions based on them. The idea is that even if you do not do things exactly the same way I do, you will understand the output a lot better and faster if you have done it yourself.

If you do nothing else, at least familiarize yourself with the studies and variables. My SAS variable names are given, and I suggest you use them. The final exam does not include a full description of the studies and variables. During the exam, Christine and I will answer questions about the data, but only if the answers are brief.

You will notice below that unlike the SAS assignments during the term, you are not always being asked specific sample questions about the data sets. This time, it is your job to ask the relevant questions and choose the statistical techniques that will help you answer them. The questions on computer assignments during the term should be your guide. For some of the data sets, more than one statistical technique is applicable, and you should not hesitate to do more than one kind of analysis.

Of course you may discuss this with other people, but this is not the time to let yourself be convinced too easily by your friends. I promise you that in several cases, there is more than one set of questions you could ask about the data, and (correspondingly) more than one natural and reasonable analysis. If you avoid tunnel vision by doing it your way first and then comparing answers, it's more likely that one of you will come up with what you'll see on the exam. In a group setting, if four people come up with six analyses, the whole group will benefit.

The Longitudinal IQ Data

IQ is short for "Intelligence Quotient." It is measured by various tests. For all the tests, a score of 100 is considered average, while scores above 100 are above average and scores below 100 are below average. Whether IQ tests really measure intelligence is debatable and highly political. How much IQ is influenced by heredity as opposed to environment is also a question on which many people have strong opinions.

In the Longitudinal IQ Data, the IQs of adopted children were measured at ages 2, 4, 8 and 13. The birth mother's IQ was assessed at the time of adoption, and the adoptive mother's education (in years) was also recorded. The variables (with my variable names) are:

ameduc: Adoptive mother's education
bmiq: Birth mother's IQ
age2iq: IQ at age 2
age4iq: IQ at age 4
age8iq: IQ at age 8
age13iq: IQ at age 13

The Birdkeeping Data

People who raise large numbers of birds inhale potentially dangerous material, especially tiny fragments of feathers. Can this be a risk factor for lung cancer, controlling for other possible risk factors? Which of those other possible risk factors are important? Here are the variables in the file, along with my variable names.

                       
       Variable                 My variable Name        Values
                                
       Lung Cancer                  cancer              1=Yes, 0=No
       Gender                       sex                 1=Female, 0=Male
       Socioeconomic Status         ses                 1=High, 0=Low
       Birdkeeping                  birdkeep            1=Yes, 0=No
       Age                          age
       Years smoked                 yrsmoke
       Cigarettes per day           cigday
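
For questions that ask for an estimated probability, the logistic regression formula is P(Y=1|x) = e^L/(1 + e^L), where L = β0 + β1x1 + ... + βkxk. The minimal Python sketch below does that arithmetic for one hypothetical case, using the variable names above. Every coefficient value in it is invented for illustration; none is an estimate from the birdkeeping data. It also computes e^β for birdkeep. This is the factor by which the estimated odds of lung cancer are multiplied for a birdkeeper, compared to a non-birdkeeper with the same values of the other variables.

```python
import math

# Hypothetical coefficients for a logistic regression of cancer on the
# birdkeeping-study variables. These numbers are invented for
# illustration; they are NOT estimates from the actual data.
COEF = {
    "intercept": -4.0,
    "sex": 0.5,
    "ses": 0.1,
    "birdkeep": 1.4,
    "age": 0.02,
    "yrsmoke": 0.05,
    "cigday": 0.01,
}

def linear_predictor(x):
    """L = b0 + b1*x1 + ... + bk*xk for one case (dict of variable values)."""
    return COEF["intercept"] + sum(COEF[name] * value for name, value in x.items())

def estimated_probability(x):
    """Estimated P(cancer = 1 | x) = exp(L) / (1 + exp(L))."""
    L = linear_predictor(x)
    return math.exp(L) / (1.0 + math.exp(L))

def odds_ratio(name):
    """Factor by which the estimated odds are multiplied when `name`
    increases by one unit, holding the other variables constant."""
    return math.exp(COEF[name])

if __name__ == "__main__":
    case = {"sex": 0, "ses": 0, "birdkeep": 1, "age": 60, "yrsmoke": 20, "cigday": 40}
    print(round(estimated_probability(case), 4))
    print(round(odds_ratio("birdkeep"), 2))
```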

The Berkeley Data from Chapter 4

Using proc freq, verify that the departments are not equally likely to admit a student who applies for graduate study. But which ones are different from each other? Using another procedure (not proc freq), obtain Wald tests for all pairwise comparisons of departments. Using a Bonferroni correction, which departments have different admission rates? State your findings in clear, non-statistical language.
Men and women tend to apply to different departments, on average. Carry out an analysis in which the independent variable is Sex, and the dependent variable is Department. Obtain a Wald test.
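
For the pairwise comparisons, each Wald test of H0: βi = βj has a chi-square statistic with one degree of freedom: W = (bi - bj)^2 / (Var(bi) + Var(bj) - 2Cov(bi, bj)). With m comparisons, the Bonferroni correction tests each one at level α/m. The minimal Python sketch below shows this bookkeeping. Its function names are hypothetical helpers, not part of SAS. It assumes six departments labelled A to F, as in the commonly used version of the Berkeley admissions data; check this against your own output. For a comparison involving the reference category of the dummy coding, set the other coefficient, its variance and the covariance to zero.

```python
import math
from itertools import combinations

def n_pairs(k):
    """Number of pairwise comparisons among k groups: k(k-1)/2."""
    return k * (k - 1) // 2

def wald_chisq(b_i, b_j, var_i, var_j, cov_ij):
    """Wald chi-square (1 df) for H0: beta_i = beta_j.
    For a comparison with the reference category, pass b_j = var_j = cov_ij = 0."""
    return (b_i - b_j) ** 2 / (var_i + var_j - 2.0 * cov_ij)

def chisq1_pvalue(w):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(w / 2.0))

def bonferroni_significant(pvalues, alpha=0.05):
    """Comparisons (dict keys) whose p-values are below alpha / m,
    where m is the number of comparisons in the dict."""
    m = len(pvalues)
    return [pair for pair, p in pvalues.items() if p < alpha / m]

if __name__ == "__main__":
    departments = "ABCDEF"  # assumption: six departments, A to F
    pairs = list(combinations(departments, 2))
    print(len(pairs), n_pairs(len(departments)), 0.05 / len(pairs))
```

With six departments there are 15 pairs, so each Wald test would be done at 0.05/15, or about 0.0033.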

The Eating Data

In this study, pairs of university students came to a Psychology laboratory to eat a meal together. They were either friends or strangers, they ate from either small or large plates, and the food was in either a common bowl or separate bowls. Before the meal, they rated how hungry they were. The total amount of food they served out onto their plates and the total amount of food they actually ate were recorded, in grams. Here are the variables in the file, along with my variable names.

                       
       Variable        Values
       
        Friend          1 = Friends      2 = Strangers
        Plate           1 = Large Plate  2 = Small Plate
        Share           1 = Common Bowl  2 = Separate Bowls
        Hunger          Mean of the two ratings
        FoodSrv         Grams
        FoodEat         Grams

Here are some hints. What are the natural dependent variables? What is the natural covariate? I did not make my own dummy variables.

The Dichotic Listening Data

In a dichotic listening experiment, subjects wear stereo headphones that allow the presentation of different sound tracks to each ear, at the same time. In this example, right-handed female university students listened to short lectures on art history in the presence of background noise. After each lecture, they answered a set of multiple choice questions.

Two factors were varied experimentally:

Noise Type: The background noise consisted of either Hip-hop music, Classical music or Radio commercials. Volume was carefully held constant.
Presentation: Subjects heard (simultaneously) either
- Lecture (Signal) in the left ear and distraction (Noise) in the right
- Distraction (Noise) in the left ear and lecture (Signal) in the right, or
- Both Signal and Noise in both ears

Each subject in the experiment experienced all nine treatment combinations, in a balanced order that was different for each subject, and randomly assigned. Thus, there are nine data values for each subject: number of questions answered correctly in each experimental condition. The first part of the raw data appears below; it shows just the first three cases. Follow the link above to get the whole data set. You should use my SAS variable names (test11, etc.).

           Signal in           Signal in            Signal in
           Left Ear            Right Ear            Both Ears
     ____________________ ____________________ ____________________
     HipHop Classc Radio  HipHop Classc Radio  HipHop Classc Radio
     ------ ------ ------ ------ ------ ------ ------ ------ ------
     test11 test12 test13 test21 test22 test23 test31 test32 test33
 1     13     12     10     15     14     14     14     13     14
 2      4      8      8      6      5      8      6      3      4
 3     13     15     11     11     13     15     11     13     12
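
In the multivariate approach, a within-cases contrast is tested by computing the same linear combination of each subject's scores and testing whether its population mean is zero. For a single contrast, this comes down to a one-sample t-test on the resulting difference scores. The minimal Python sketch below computes one such difference score for the three cases shown above. It takes the mean with the signal in the left ear minus the mean with the signal in the right ear, averaging over noise type. This choice of contrast is only an illustration, and three cases are far too few to test anything.

```python
# First three cases of the dichotic listening data, copied from the table above.
# Variable testPN: P = presentation (1 = signal in left ear, 2 = right ear,
# 3 = both ears), N = noise type (1 = hip-hop, 2 = classical, 3 = radio).
NAMES = [f"test{p}{n}" for p in (1, 2, 3) for n in (1, 2, 3)]
ROWS = [
    [13, 12, 10, 15, 14, 14, 14, 13, 14],
    [4, 8, 8, 6, 5, 8, 6, 3, 4],
    [13, 15, 11, 11, 13, 15, 11, 13, 12],
]
DATA = [dict(zip(NAMES, row)) for row in ROWS]

def presentation_mean(case, p):
    """Mean number correct over the three noise types for presentation p."""
    return sum(case[f"test{p}{n}"] for n in (1, 2, 3)) / 3

def left_minus_right(case):
    """Within-subject difference score: signal-left mean minus signal-right mean."""
    return presentation_mean(case, 1) - presentation_mean(case, 2)

if __name__ == "__main__":
    for i, case in enumerate(DATA, start=1):
        print(i, round(left_minus_right(case), 3))
```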

More comments and suggestions

Christine marks the quizzes, and I mark the final examination. We have basically the same standards and objectives, but we are not identical (lucky for her). You might say that this section is about my personal peculiarities -- just in the way I mark exams, of course. It is helpful for you to know about this, so your exam-taking strategy will not conflict with my exam-marking strategy.

The purpose of STA442 is for you to learn to use statistical methods to draw reasonable conclusions from numerical data. Often, the first several parts of a question will ask for technical details, and the last part will ask for a conclusion. If the technical part is missing, it does not matter what you conclude. Similarly, an answer that has most of the technical details right but gets the conclusion wrong (or leaves it off, or states it incompletely) is almost worthless, and will get few marks. On the other hand, if you make technical mistakes but draw reasonable conclusions from what you have, you can still get substantial marks.

When I read an answer, my main goal is to verify that you know what's going on. Here are some more details, mostly about what to avoid.

Make sure you answer the question that is asked. If you just dump memory and answer a similar question from one of the assignments, you will be lucky to get any marks at all. Thinking is what's important. Memory without thinking is a crime that you should try to hide if you do commit it.
Questions on the final may ask you to state results "in plain, non-statistical language." Please do not ignore the request for plain language. Regardless of what you say, you will get zero marks if you mention the null hypothesis, or use any statistical or technical terms like correlation, regression, ANOVA, statistically significant, factorial design, positive relationship, controlling for, and so on. Even the word "significant" (without "statistically") should be avoided; it's borderline.
It is also very important in describing a set of findings to say what happened! For example, do not just say that the average amount of rot in potatoes was related to temperature. Instead, say that there was more rot on average at warmer temperatures.
In a real-world situation (and in the artificial world we presently inhabit, too), you don't get part marks for an answer that (correctly) indicates a relationship is present, but does not say what it is. Imagine you are working in marketing, and you leave a voice mail that says "Consumers recalled one of the commercials better than the other one." Click. Are you trying to frustrate your boss? Are you trying to get fired?
Some professors mark by looking for the correct answer, or part of it. If they find something good, you get points for it. This can encourage a kind of shotgun strategy for writing answers. Just write everything you can think of, and maybe some of it will be what this clown is looking for.
But this strategy backfires when I mark an exam, because (except for simple numerical answers) I usually do not give marks for things that are correct; I take off marks for things that are wrong or missing. So, if a student writes a long answer that includes the correct conclusion, the wrong conclusion (based on the same information!) and something irrelevant, all I really see is the contradiction between the two conclusions, and I will probably give the answer a zero. Yet it might be that the student understands everything perfectly, but is just writing all the crazy stuff as insurance against the unlikely possibility that maybe that's what I am looking for. Let's make sure that you don't fall into this trap!