STA302f17 Final Exam Information
Note: This information applies only to the regularly scheduled exam, not the special deferred exam.
The current formula sheet will be supplied with the exam. Use it as you re-do the homework problems. The rule is that you may use anything on the formula sheet unless you are directly asked to prove it. Facts like Cov(X,Y) = E(XY) - E(X)E(Y) are assumed and do not require proof.
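For reference, here is the short derivation of that covariance identity. It uses only the definition of covariance and linearity of expectation. The shorthand μ_X = E(X) and μ_Y = E(Y) is introduced here for convenience.

```latex
\begin{aligned}
\mathrm{Cov}(X,Y) &= E\big[(X-\mu_X)(Y-\mu_Y)\big] \\
&= E\big[XY - \mu_Y X - \mu_X Y + \mu_X\mu_Y\big] \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X\mu_Y \\
&= E(XY) - \mu_X\mu_Y - \mu_X\mu_Y + \mu_X\mu_Y \\
&= E(XY) - E(X)E(Y).
\end{aligned}
```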
There are 6 questions, occupying 13 pages including the cover page, R printout and space for you to write the answers. Many of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect. It is a three-hour exam.
Seventy-two of the 100 points are for questions of the "prove this, derive that" kind. There is also a 28-point question based on my R output. The types of questions will be familiar from the assignments and quizzes. More information about the R part is given below.
A partial exception to the rule above is Assignment One, which was review. Nothing from Assignment One will appear directly on the final exam unless it also appears on a later assignment. Of course, the knowledge needed to do Assignment One is assumed.
The cases in the Census Tract data (there are n cases) are a sample of census tracts in the United States. For each census tract, the following variables are recorded.
A big issue with these data is that the census tracts vary a lot in population size. As a result, the number of physicians, for example, will automatically have a strong positive correlation with the number of serious crimes. We could try to correct for population size with regression methods, but the more standard approach is to convert counts to rates. Dividing a count by population size yields a rate, such as the crime rate. It will help if we all use the same variable names, so that my R output will look familiar and be easier to read when you see it on the final exam. Please use this code directly.
# Calculate rates: Divide by population size, yielding number per 1,000 people
doc_rate = docs/pop
bed_rate = beds/pop
labor_rate = labor/pop
ave_income = income/pop # In thousands of dollars
crime_rate = crimes/pop
density = pop/area # Thousands of people per square mile
# Make region a factor
region = factor(region,labels=c("Northeast","NorthCentral","South","West"))
contrasts(region) # Note: 1 = Northeast is still the first level, so it is the reference category
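The correlation problem described above can be seen in a small simulation. This is a sketch with made-up numbers, not the Census Tract data. The population sizes and per-1,000 rates below are hypothetical, and the physician and crime rates are generated independently of each other. The raw counts still come out strongly correlated because both are multiplied by population, while the rates are essentially uncorrelated.

```r
# Toy simulation with hypothetical numbers (NOT the Census Tract data).
set.seed(302)
n   <- 400
pop <- runif(n, min = 10, max = 1000)            # population in thousands (made up)
true_doc_rate   <- runif(n, min = 1,  max = 3)   # physicians per 1,000 people (made up)
true_crime_rate <- runif(n, min = 40, max = 80)  # serious crimes per 1,000 people (made up)

# Counts are rate times population, so both counts grow with pop
docs   <- true_doc_rate * pop
crimes <- true_crime_rate * pop

# Same transformation as the course code: divide by population size
doc_rate   <- docs / pop
crime_rate <- crimes / pop

count_cor <- cor(docs, crimes)          # inflated by shared dependence on pop
rate_cor  <- cor(doc_rate, crime_rate)  # near zero: rates were generated independently
print(round(c(counts = count_cor, rates = rate_cor), 3))
```

Running it prints the two correlations. The exact values depend on the random seed, but the count correlation is large and the rate correlation is small.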
Naturally, the dependent variable is crime rate. I am going to try backwards stepwise selection and forward stepwise selection, which yield different sets of independent variables in this case. I will choose the set that seems more interesting. Then I will do some fairly predictable things and ask you questions about them.
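Backward and forward stepwise selection can be sketched in R with the built-in step() function. This example uses the swiss data set that ships with R as a stand-in so that it runs on its own. It is not the Census Tract analysis. Also note that step() adds and drops terms by AIC, which may not be the criterion used in lecture or in the exam output.

```r
# Stand-in example on R's built-in swiss data (NOT the Census Tract data).
# step() judges each add/drop by AIC; other stepwise rules (e.g. F-tests) may differ.
full_model <- lm(Fertility ~ ., data = swiss)  # all candidate independent variables
null_model <- lm(Fertility ~ 1, data = swiss)  # intercept only

# Backward: start from the full model, drop terms while AIC decreases
back_model <- step(full_model, direction = "backward", trace = 0)

# Forward: start from the intercept-only model, add terms from the full model's formula
fwd_model <- step(null_model, scope = formula(full_model),
                  direction = "forward", trace = 0)

# The two searches need not end at the same set of variables
print(formula(back_model))
print(formula(fwd_model))
```

Comparing the two printed formulas shows whether the backward and forward searches settled on the same independent variables. As with the census data, they need not agree.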
My answers to Quizzes 1-9 will be posted after you've had a chance to get your quizzes back from Yanbo on Thursday November 30th and deal with any marking issues. After I post my answers, there will be no further discussion of the marking. Release of the Quiz 10 solutions will be delayed until you have had a chance to get your quizzes back.
Past exams