STA441s24 Final Exam
This information applies only to the regular final exam, not the special deferred exam.
Time and location
The final exam will be on Wednesday April 17th in KN137, from 9 am to 12 noon. Questions are like the quiz questions, and as you know, the quiz questions are like the homework.
Format
The exam has 10 questions with parts a, b, c, etc., occupying 10 pages including the cover sheet. You will write your answers on the question paper. There will be a separate packet with the formula sheet, my SAS programs and my Results files. Keep a copy of the formula sheet handy as you prepare for the exam. Forty marks out of 100 are based on my SAS programs and results.
You may be sitting in alternate rows with students from another class taking their final exam in the same room. Our exam papers are white.
Homework
This course is all about the homework. The homework tells you what I want you to be able to do. Lecture material is only useful to the extent that it helps you do the homework. The text may help too. It is less focused on what we are doing this time, but it is more detailed.
To study for the final, I recommend that you
- Re-do the non-SAS parts of the homework.
- For each assignment, locate the corresponding lecture slides. They are pretty much in chronological order (order of time). If this is a difficult task, you are not familiar enough with the course material.
- Look at the lecture slides and the homework problems together. Observe how most of the homework problems are asking you to use some concept or method from the lecture. Of course sometimes I just want you to think about something, but most questions have a lesson.
- Re-do the problems, referring to your earlier answers.
- If you do not get what a problem means or what it is asking you to do, this means you should find out. You are missing something, and it could be on the final exam.
- Don't forget the Piazza conversations. There's some good stuff there, including clarification of many questions. There are folders for the various homework assignments, so you should be able to find what you want.
- Using SAS, do something reasonable with the final data sets described below. What's reasonable? In my opinion, more or less what you did on the SAS part of the homework. However, there is more than one "right answer." The important thing is to become familiar with the data sets, try some analyses, and understand the results. You will not bring your output to the exam. Questions will be based on my output.
Hint
Looking over the exam, I see that it is fairly heavy on regression models for factorial experiments. This wasn't exactly planned, but that's how it turned out.
Office hours for the final exam
- Jerry: Thursday April 11, 12-2 in 3028 Deerfield Hall
- Marija: Sunday April 14, 10-11 am via Zoom. The Meeting ID is 856 8758 5768. The web link is https://utoronto.zoom.us/j/85687585768.
- Jerry: Monday April 15, 12-2 in 3028 Deerfield Hall
- Jerry: Tuesday April 16, 12-2 in 3028 Deerfield Hall
Quizzes
- Quiz 1
- Quiz 2
- Quiz 3a
- Quiz 3b
- Quiz 4
- Quiz 5
- Quiz 6
- Quiz 7
- Quiz 8
- Quiz 9
- Quiz 10
Marked quizzes are available during Jerry's office hours.
A few students who missed quizzes with a valid excuse were able to take a make-up test covering the material from Quizzes 1-10. It was a lot like a final exam for this course, except there was no multivariate and no repeated measures. The SAS part was based on data sets from the assignments. The test is posted below. I recommend taking it with a 3 hour time limit, and only then checking the answers.
Old Final Exams
The topics covered in STA441 are never exactly the same from year to year. So, some of the exams below will have questions on material we have not covered -- for example, on logistic regression with more than two outcomes (multinomial logit models), or logistic regression with repeated measures using a random shock model. In 2020, the course was online. We covered a lot more material than usual, and students did write and run SAS code on the final exam, something you will not have to do.
Also, you should be aware that I often take homework problems and lecture examples from old final exam questions. So, a question on a past final might seem like it is straight from homework, but in that year it was probably new.
Some exam questions (worth 40 points) will be based on my SAS output for at least two of the following data sets. Try some analyses. Look up any unfamiliar terminology, or ask in office hours (but why wait?). Understand what the variables are, because we will not be answering questions about the data sets during the exam, unless it's a very short answer. What I will do with the data is very predictable.
- The Tri-campus Data: Can you guess the university? The variables are Campus, High school grade point average, Fourth year university GPA, and the number of credits on which the fourth year GPA was based. I will use the variable names given in the first line of the data file.
- The Little Heart Data are from a study of employees at the Western Electric company. It's "little" because it's only a subset of the data. Variables (with my variable names) are
  - age: In years
  - height: In inches
  - weight: In pounds
  - educ: In years
  - cig: Reported number of cigarettes per day
  - famhist: Family history of coronary heart disease, 1=Yes and 0=No.
  - CHD: Have coronary heart disease? 1=Yes and 0=No.
  - alive: Alive 10 years after the study? 1=Yes and 0=No.
Here are a couple of comments. First, I'm going to calculate body mass index and call it bmi. Second, use proc freq to produce a table of CHD by alive. Based on this table, estimate the following odds ratio: survival odds given CHD divided by survival odds given no CHD. The problem here is that the absence of CHD is just too good as a predictor of survival. This will cause the failure of any logistic regression model with CHD as one of the predictors of survival, and also of any logistic regression model with survival as a "predictor" of CHD. So I didn't do it.
- The Distraction data: In a study of the psychology of attention, subjects attempted to solve word problems while listening to distracting background noise. The distracting material was either music, or spoken words related to the problem they were trying to solve. The distracting material was presented at three different levels of loudness. Each subject attempted 10 problems at each combination of loudness and type of distraction, for a total of 60 problems. Order of presentation was randomized. Data for each subject are gender and the number correct in each of the six treatment combinations.
- The Hand-Ear Data: In an experiment on perception and attention, left-handed and right-handed subjects push a key when they hear their names over background noise. They are wearing stereo headphones. The signal comes in the left ear, the right ear, or both. There are 50 trials in each condition, presented in a different random order for each subject. The response variable is median reaction time in milliseconds. Each subject contributes 3 medians.
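For the Little Heart data, the body mass index and the odds ratio described above can be written out explicitly. This is a sketch using the standard formulas, and the SAS code in my programs may look different. The factor 703 converts pounds and inches to the usual kilograms per square metre. The cell counts n_{ij} are notation introduced here only for illustration: i is the value of CHD and j is the value of alive (1=Yes, 0=No).

```latex
\begin{align*}
\text{bmi} &= 703 \times \frac{\text{weight in pounds}}{(\text{height in inches})^2} \\[6pt]
\widehat{\mathrm{OR}}
  &= \frac{\text{survival odds given CHD}}{\text{survival odds given no CHD}}
   = \frac{n_{11}/n_{10}}{n_{01}/n_{00}}
   = \frac{n_{11}\,n_{00}}{n_{10}\,n_{01}}
\end{align*}
```

If a cell is empty, for example if everyone without CHD was still alive after 10 years (n_{00} = 0), then the estimated survival odds given no CHD are infinite and the estimated odds ratio is zero. In a logistic regression with CHD as the only predictor of survival, or with survival as the only predictor of CHD, the estimated slope is the log of this same odds ratio. It therefore heads toward minus infinity, and the maximum likelihood estimate does not exist. This is one way to see why those models fail.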
Extras
Plain language conclusions: For example, see basicmath.sas in SAS Example Six.
- tables ethnic * tongue / nocol nopercent chisq;
First language was related to ethnic background.
- Ethnic by Tongue: Asians vs. Eastern Europeans
Asians were less likely than Eastern Europeans to have English as their first language.
- Ethnic by Tongue: Asians vs. Other Europeans
Asians were less likely than Europeans who were not Eastern European to have English as their first language.
- Ethnic by Tongue: Asians vs. Middle East
Students whose ethnic background was judged to be Asian were less likely to have English as their first language than students whose ethnic background was judged to be Middle-Eastern or Pakistani.
- Ethnic by Tongue: Asians vs. East Indian
Asians were less likely than East Indians to have English as their first language.
- Ethnic by Tongue: Asians vs. Other / DK
Asians were less likely than students in the Other/DK category to have English as their first language.
- Eastern European vs. Other European
Eastern Europeans were less likely to have English as their first language than Europeans who were not Eastern European.
- Eastern European vs. Middle Eastern
There was no evidence of a difference between Eastern Europeans and Middle Eastern/Pakistanis in having English as their first language.
- tables (sex ethnic tongue course) * passed / nocol nopercent chisq;
These results are consistent with males and females being equally likely to pass the course.
Differences between ethnic groups in passing the course were small enough to be attributed to chance.
There was no clear evidence of a connection between first language and passing the course.
- Sex and Grade
There was no evidence of a real difference in the final marks of male and female students.
- Ethnic and Grade
Differences in final marks between students from the various ethnic groups were small enough to be attributed to sampling error.
- Mother tongue and Grade
Students whose first language was not English got higher marks on average.
- Course and Grade
Average marks in the three courses were roughly the same.
What to say about SAS (in a job interview).
I had a course where we used SAS University Edition, so that's base SAS running in the SAS Studio environment. We read data from plain text data files using a simple form of the input statement and we used proc import to read from Excel spreadsheets. We used assignment statements and if statements to create new variables, and proc format to label the values. We used arrays and do loops in the data step. We used proc reg and proc glm for univariate and multivariate regression and analysis of variance, and we used proc logistic for logistic regression. We used proc autoreg on time series data. We used proc mixed as well as proc glm and proc reg to analyze repeated measures and longitudinal data when the response variable was assumed normal. We used ODS to send results to proc iml for further calculations, and we also used ODS select sometimes to limit the output.
If they ask about the put statement, say "Oh, that's like a print statement for writing on the log file, but we didn't use it."
If they ask about macros, say "The only part of the macro language we used was %include."
This document is licensed under a Creative Commons Attribution-ShareAlike 3.0 (or later) Unported License. The basketball data are protected by the Creative Commons license too.