About the Final Exam

STA302f17 Final Exam Information

Note: This information applies only to the regularly scheduled exam, not the special deferred exam.

Time and Location

The final exam will be on Tuesday December 12th from 1 to 4 p.m. in Gym C in the Davis Building.

Jerry's Office Hours for the Final

Tuesday Dec. 5th 11-1
Thursday Dec. 7th 11-1
Monday Dec. 11th 11-1

Yanbo's Office Hours for the Final

To be announced.

Format

You will write your answers on the question paper. The exam will be closed book and closed notes. You should bring a calculator (any kind is acceptable unless it has communications capability). Pencil is okay.

The current formula sheet will be supplied with the exam. Use it as you re-do the homework problems. The rule is that you may use anything on the formula sheet unless you are directly asked to prove it. Facts like Cov(X,Y) = E(XY) - E(X)E(Y) are assumed and do not require proof.

There are 6 questions, occupying 13 pages including the cover page, R printout and space for you to write the answers. Many of the questions have more than one part. The questions are not equally difficult, and not equally time-consuming. The questions on assignments and quizzes are a good indication of what to expect. It is a three-hour exam.

Seventy-two points out of 100 are for "prove this, derive that" questions and so on. There is a 28-point question based on my R output. The type of questions will be familiar from the assignments and quizzes. More information about the R part is given below.

Coverage

The final exam is cumulative, but as I look over the exam I see more emphasis on the early and middle part of the course, except that some of the later stuff appears in the R part. What you are supposed to be able to do is indicated by the assignments. The text and lecture overheads are intended to help you understand how to answer questions like the ones in the assignments. If you are wondering whether you're responsible for something, look in the assignments. If it's asked, you're responsible for it. If it's not asked, then you may safely disregard it. This applies to concepts and methods of course, not the exact wording of the questions.

A partial exception to the rule above is Assignment One, which was review. Nothing from Assignment One will directly be on the final exam unless it also appears on a later assignment. Of course the knowledge needed to do Assignment One is assumed.

R

You will not be asked to write any R code on the final. You will answer questions based on my R input and output. Everything will be based on the Census Tract Data described below. My plan is to do some reasonable, predictable things like what I did in lecture and what you did in the homework. You should do some of those things too, and think about what the results mean. That way you will be able to understand what I did a lot more rapidly and easily. Be able to draw plain-language, directional conclusions. At the very least, familiarize yourself with the data and understand what all the variables are. This is important because we will not answer questions about the data during the exam.

The cases in the Census Tract data (there are n cases) are a sample of census tracts in the United States. For each census tract, the following variables are recorded.

Identification number
area: Land area in square miles
pop: Population in thousands
urban: Percent of population in cities
old: Percent of population 65 or older
docs: Number of active physicians
beds: Number of hospital beds
hs: Percent of population 25 or older completing 12+ years of school
labor: Number of persons 16+ employed or looking for work
income: Total before-tax income in millions of dollars
crimes: Total number of serious crimes reported by police
region: Region of the country: 1=Northeast, 2=North Central, 3=South, 4=West

A big issue with these data is that the census tracts vary a lot in population size, so that, for example, number of physicians will automatically have a strong positive correlation with number of serious crimes. We could try to correct for population size with regression methods, but a more standard thing to do is to convert to rates. Dividing a count by population size yields a rate, like crime rate. It will be helpful for all of us to have the same variable names, so that my R output will be more familiar and easier to read when you see it on the final exam. Please use this code directly.

# Calculate rates: Divide by population size, yielding number per 1,000 people
doc_rate = docs/pop
bed_rate = beds/pop
labor_rate = labor/pop
ave_income = income/pop # In thousands of dollars
crime_rate = crimes/pop
density = pop/area # Thousands of people per square mile
# Make region a factor
region = factor(region,labels=c("Northeast","NorthCentral","South","West"))
contrasts(region) # Note Northeast (code 1) is still the first level, so it is the reference category
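
To see why converting to rates helps, here is a minimal sketch on simulated data. It does not use the Census Tract data; every number and every name ending in _sim is made up for illustration. Physician and crime counts are generated as population size times independent per-capita rates, so any correlation between the counts comes only from population size. The last lines show, on a toy region variable, which category ends up as the reference category under R's default coding.

```r
# Simulated illustration only: invented numbers, not the Census Tract data.
set.seed(302)
n_sim = 200
pop_sim = exp(rnorm(n_sim, mean = 5, sd = 1))   # Population in thousands, varies a lot
doc_per_1000 = runif(n_sim, 1, 3)               # Physicians per 1,000 people
crime_per_1000 = runif(n_sim, 30, 80)           # Crimes per 1,000 people, independent of physicians
docs_sim = pop_sim * doc_per_1000               # Number of physicians
crimes_sim = pop_sim * crime_per_1000           # Number of crimes

# Correlation between counts is driven by population size
cor_counts = cor(docs_sim, crimes_sim)
# Dividing by population size removes that common factor
cor_rates = cor(docs_sim/pop_sim, crimes_sim/pop_sim)
print(c(counts = cor_counts, rates = cor_rates))

# Toy region variable coded 1-4, converted to a factor as in the code above
region_sim = factor(c(1, 2, 3, 4, 3, 1),
                    labels = c("Northeast","NorthCentral","South","West"))
ref_level = levels(region_sim)[1]   # First level = reference category by default
print(ref_level)
print(contrasts(region_sim))        # One indicator column for each other region
```

The counts should be strongly correlated while the rates should not, even though physicians and crime were generated independently per capita.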

Naturally, the dependent variable is crime rate. I am going to try backwards stepwise selection and forward stepwise selection, which yield different sets of independent variables in this case. I will choose the set that seems more interesting. Then I will do some fairly predictable things and ask you questions about them.
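
If you want to practice, the following is a rough sketch of backward and forward stepwise selection using R's built-in step function on simulated data. This is an assumption about method, not a description of how the exam output was produced: step chooses variables by AIC by default, and other selection criteria are possible. The names sim, y_sim and x1 through x4 are hypothetical.

```r
# Simulated illustration only; sim, y_sim and x1..x4 are hypothetical names.
set.seed(2017)
n_sim = 150
sim = data.frame(x1 = rnorm(n_sim), x2 = rnorm(n_sim),
                 x3 = rnorm(n_sim), x4 = rnorm(n_sim))
sim$y_sim = 1 + 2*sim$x1 - 1.5*sim$x2 + rnorm(n_sim)   # Only x1 and x2 matter

full_mod = lm(y_sim ~ x1 + x2 + x3 + x4, data = sim)   # Starting point for backward selection
null_mod = lm(y_sim ~ 1, data = sim)                   # Starting point for forward selection

# Backward: start with all the variables and drop terms while AIC improves
back_mod = step(full_mod, direction = "backward", trace = 0)
# Forward: start with just the intercept; the scope formula is the largest model allowed
forw_mod = step(null_mod, scope = formula(full_mod), direction = "forward", trace = 0)

back_terms = attr(terms(back_mod), "term.labels")      # Variables kept by backward selection
forw_terms = attr(terms(forw_mod), "term.labels")      # Variables added by forward selection
print(back_terms)
print(forw_terms)
```

With your own variables, replace the simulated data frame and formula, then compare the two selected sets and think about what the signs of the estimated coefficients say in plain language.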

Quizzes and Past Exams

My answers to Quizzes 1-9 will be posted after you've had a chance to get your quizzes back from Yanbo on Thursday November 30th and deal with any marking issues. After I post my answers, there will be no further discussion of the marking. Release of the Quiz 10 solutions will be delayed until you have had a chance to get your quizzes back.

Past exams

Quizzes with solutions (but no R code)