Statistical Consulting Assignment 1
Due at the beginning of class on Tuesday, Sept. 25
In the United States, admission to university is based partly on high
school grades and recommendations, and partly on applicants' performance on a
standardized multiple-choice test called the Scholastic Aptitude Test
(SAT). The SAT has two sub-tests, Verbal and Math. A university administrator
selects a random sample of 200 applicants, and obtains the Verbal SAT, the
Math SAT and first-year university Grade Point Average (GPA) for each
student. She wants an equation for predicting GPA from Verbal SAT and Math
SAT. That is, predicted GPA will be a function of two variables, Verbal SAT
and Math SAT. The raw data are available here.
Use SAS, preferably on utstat, to do the data analysis. No software other
than SAS is acceptable.
- For each of the 3 variables in the data file, produce (and examine):
  - A histogram
  - A boxplot
  - A stem-and-leaf display
  - Mean and standard deviation
  - First and third quartiles
- Do a procedure that will let you write down the prediction equation
requested by the client.
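The two steps above might be sketched in SAS roughly as follows. This is only an illustrative outline, not a model answer: the file name sat.data, the data set name sat, the variable names verbal, math and gpa, and the assumed column order are all hypothetical, so check them against the actual raw data file before running anything.

```sas
/* Sketch only: the file name and column order below are assumptions. */
title 'Assignment 1: predicting first-year GPA from SAT scores';

data sat;
    infile 'sat.data';        /* hypothetical name for the raw data file */
    input verbal math gpa;    /* assumed column order; verify in the file */
run;

/* Mean, standard deviation and quartiles are printed by default.
   In line-printer output, the PLOT option adds a box plot and a
   stem-and-leaf display (SAS may substitute a horizontal bar chart
   when many observations fall in one interval). */
proc univariate data=sat plot;
    var verbal math gpa;
run;

/* Line-printer histograms, one per variable. */
proc chart data=sat;
    vbar verbal math gpa;
run;

/* Least-squares fit; the parameter estimates give the prediction equation. */
proc reg data=sat;
    model gpa = verbal math;
run;
quit;
```

Examine the output of the descriptive procedures carefully before trusting the regression results.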
Here is what you will hand in.
- The log file (not the command file).
- The list file.
- On a separate sheet of paper (handwritten is okay), answers to these
questions:
  - What proportion of the sample variation in GPA is explained by
    the two components of the SAT? The answer to this question is a number
    between zero and one.
  - Write down the prediction equation based on material in your list
    file. Denote Verbal SAT by X1 and Math SAT by X2. All
    other components of the formula should be numbers.
  - Give a full statement of the statistical model you would adopt in
    order to carry out interval estimation and inference. You are being asked
    to copy something very standard from a book (or remember it).
  - There was something very strange going on here. The client was on
    holiday and could not be reached, so you just fixed the problem using
    common sense. What was the problem and what did you do about it?
  - This last question (which has two parts) is a bit tougher. If you
    are able to do it, good! If you are not able to do this last question,
    it's not the end of the world.
    - The client wants to be able to calculate a "margin of error" as
      well as a predicted first-year GPA for any new student who applies for
      admission. Give a formula for the lower limit of the 95% prediction
      interval (not the confidence interval; you'll need to look this up).
      Give a formula for the upper limit of the 95% prediction interval. In
      the formulas, denote Verbal SAT by X1 and Math SAT by X2. All other
      components of the formulas should be numbers.
    - For a new student with a Verbal SAT of 600 and a Math SAT of 650,
      give a predicted GPA along with numerical values of the 95% prediction
      interval. Use a hand calculator for this. You can use any software you
      want to get the critical value, though you can do it with SAS if you
      know how.
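As a reminder of the general shape of what is being asked for, the usual textbook normal-theory regression model and the corresponding prediction interval can be written as below. The notation is the standard textbook one and is an assumption here, not something taken from this handout: b0, b1, b2 are the least-squares estimates, MSE is the mean squared error, X is the n x 3 design matrix with a column of ones, and n is the number of cases actually used in the fit. Your answers must still replace everything except X1 and X2 with numbers from your own list file.

```latex
% Model: independent normal errors with constant variance
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,
\qquad \varepsilon_1, \ldots, \varepsilon_n \ \text{i.i.d.}\ N(0, \sigma^2)

% 95% prediction interval for a new student with scores X_1 and X_2
\hat{Y}_0 = b_0 + b_1 X_1 + b_2 X_2, \qquad
\mathbf{x}_0 = (1, X_1, X_2)^{\top}

\hat{Y}_0 \;\pm\; t_{0.975,\, n-3}\,
\sqrt{\mathrm{MSE}\left(1 + \mathbf{x}_0^{\top} (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{x}_0\right)}
```

The extra 1 inside the square root is what distinguishes a prediction interval for a single new observation from a confidence interval for the mean response.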
Comments
- The main purpose of this assignment is to let you familiarize
yourself with SAS on utstat. Therefore, some things are left out and will be
covered later. Two examples are residual plots (very desirable in real data
analysis) and high-resolution graphics.
- There are a few lessons about consulting and data analysis hidden in
the assignment, however. These will be pointed out later in class, though you
will probably discover them yourselves.
- Please feel free to ask Tamara, Jerry or Barbara for
assistance. Jerry will be around most of the day on Tuesday Sept. 18.
Rules
- Assignments are really due at the beginning of class. No
late assignments will be accepted under any circumstances, including last
moment failure of utstat. Excuses will be treated courteously, but ignored.
- This is not a group project. You are expected to do most or all
of the work independently. It is okay to discuss general principles with each
other (and to come to the instructors for help), but you should not look at
the SAS code of any other student, or allow any other student to look at your
code. It is okay to compare numerical answers, and compare SAS
output. It is not okay to copy, or to allow your work to be copied.