Statistical Consulting Assignment 2
Due at the beginning of class on Thursday Oct. 2nd
In the United States, admission to university is based partly on high school marks and recommendations, and partly on applicants' performance on a standardized multiple-choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selects a random sample of 200 applicants and obtains the Verbal SAT score, the Math SAT score and the first-year university Grade Point Average (GPA) for each student. She wants an equation for predicting GPA from Verbal SAT and Math SAT; that is, predicted GPA will be a function of two variables, Verbal SAT and Math SAT. The raw data are available in the file sat.data.
Use SAS on utstat to do the data analysis. No software other than SAS is acceptable.
- For each of the three variables in the data file, produce (and examine) basic descriptive statistics, including:
  - Means and standard deviations
  - Boxplots
  - Stem-and-leaf displays
  - First and third quartiles
  - Minimum and maximum
- Produce a correlation matrix of the three variables.
- Produce a scatterplot for each pair of variables.
- Run a procedure that gives you the prediction equation requested by the client.
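By default, proc corr reports Pearson product-moment correlations in the correlation matrix requested above. The Python sketch below shows how that matrix is built from paired observations. Python is used only to illustrate the formula; the analysis itself must still be done in SAS. The function names `pearson` and `pearson_matrix` and the toy numbers in the usage line are hypothetical and do not come from sat.data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx = sum(x) / n
    my = sum(y) / n
    # Cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_matrix(columns):
    """Return the k-by-k correlation matrix for a list of k variables."""
    k = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(k)]
            for i in range(k)]


# Toy data for illustration only (not from sat.data).
print(pearson_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

The matrix is symmetric with ones on the diagonal, which is also how proc corr lays it out.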
Here is what you will hand in.
- The log file (not just a listing of the program).
- The list file.
- On a separate sheet of paper (handwritten is okay, and one side of one page should be sufficient), answers to these questions:
  - What proportion of the sample variation in GPA is explained by the two SAT sub-tests? The answer to this question is a single number between zero and one.
  - Write down the prediction equation based on material in your list file. Denote predicted GPA by Y-hat, Verbal SAT by X1 and Math SAT by X2. All other components of the formula should be numbers.
  - Give a full statement of the statistical model you would adopt in order to carry out interval estimation and inference. You are being asked to copy something very standard from a book (or remember it).
  - There was something a bit strange here. The client was on holiday and could not be reached, so you just fixed the problem using common sense. What was the problem, and what did you do about it? Note: this is a serious hint. All your results will be off if you ignore it.
  - Give the predicted GPA for a new student with a Verbal SAT of 600 and a Math SAT of 650. The answer is a single number. I expect you to do this with a calculator, though you could do it with SAS if you wanted to go to the trouble.
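Two of the questions above involve only simple arithmetic. For a least-squares fit with an intercept, which is what proc reg produces by default, the proportion of sample variation explained is R-squared = 1 - SSE/SST. The prediction equation has the form Y-hat = b0 + b1*X1 + b2*X2, where b0, b1 and b2 are the parameter estimates in the list file. The Python sketch below shows both calculations. The function names are hypothetical, and the coefficients 0.5, 0.001 and 0.002 are made-up placeholders, not the answer. Your submitted numbers must come from your own SAS output (and a calculator).

```python
def r_squared(observed, predicted):
    """1 - SSE/SST: share of sample variation in `observed` explained by the fit."""
    n = len(observed)
    mean = sum(observed) / n
    sst = sum((y - mean) ** 2 for y in observed)  # total sum of squares
    sse = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))  # error sum of squares
    return 1.0 - sse / sst


def predict_gpa(b0, b1, b2, x1, x2):
    """Y-hat = b0 + b1*X1 + b2*X2, with X1 = Verbal SAT and X2 = Math SAT."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients for illustration only; substitute the
# parameter estimates from your own proc reg output.
print(predict_gpa(0.5, 0.001, 0.002, 600, 650))
```

In practice proc reg prints R-square directly, so `r_squared` only shows where that number comes from.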
Rules
- Assignments are really due at the beginning of class. No late assignments will be accepted under any circumstances, including last-minute failure of utstat. Excuses will be treated courteously, but ignored.
- This is not a group project. You are expected to do most or all of the work independently. It is okay to discuss general principles with each other, but you should not look at the SAS code of any other student, or allow any other student to look at your code. It is okay to compare numerical answers and SAS output. It is not okay to copy, or to allow your work to be copied.
More hints
I used proc univariate, proc corr, proc plot and proc reg. The following descriptive statistics were part of my output; you will likely find them in more than one place. If we agree on these numbers, then at least you read the data correctly (or else we made the same mistakes).
                            Simple Statistics

Variable      N          Mean       Std Dev           Sum       Minimum       Maximum
verbal      200     595.65000      73.20988        119130     361.00000     780.00000
math        200     649.53000      66.34711        129906     441.00000     800.00000
gpa         200       2.63000       0.58033     526.00000       0.30000       3.90000
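A quick internal check on a table like the one above is that each Sum should equal N times the Mean. The short Python sketch below uses only the numbers printed in the table. The dictionary name `simple_stats` is hypothetical, and Python here is just a calculator stand-in, not part of the required SAS work.

```python
# N, Mean and Sum copied from the Simple Statistics table above.
simple_stats = {
    "verbal": (200, 595.65, 119130.0),
    "math": (200, 649.53, 129906.0),
    "gpa": (200, 2.63, 526.0),
}

for name, (n, mean, total) in simple_stats.items():
    # Sum / N should reproduce the printed Mean, up to rounding in the output.
    consistent = abs(total / n - mean) < 1e-6
    print(f"{name}: Sum/N = {total / n:.5f}, Mean = {mean:.5f}, consistent: {consistent}")
```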