Statistical Consulting Assignment 2

Due at the beginning of class on Thursday Oct. 2nd


In the United States, admission to university is based partly on high school marks and recommendations, and partly on applicants' performance on a standardized multiple-choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selects a random sample of 200 applicants and obtains the Verbal SAT, the Math SAT and the first-year university Grade Point Average (GPA) for each student. She wants an equation for predicting GPA from Verbal SAT and Math SAT; that is, predicted GPA will be a function of two variables, Verbal SAT and Math SAT. The raw data are available in the file sat.data.

Use SAS on utstat to do the data analysis. No software other than SAS is acceptable.

Here is what you will hand in.

  1. The log file (not just a listing of the program).
  2. The list file.
  3. On a separate sheet of paper (handwritten is okay, and one side of one page should be sufficient), answers to these questions:
    1. What proportion of the sample variation in GPA is explained by the two components of the SAT? The answer to this question is a single number between zero and one.
    2. Write down the prediction equation based on material in your list file. Denote predicted GPA by Y-hat, Verbal SAT by X1 and Math SAT by X2. All other components of the formula should be numbers.
    3. Give a full statement of the statistical model you would adopt in order to carry out interval estimation and inference. You are being asked to copy something very standard from a book (or remember it).
    4. There was something a bit strange here. The client was on holiday and could not be reached, so you just fixed the problem using common sense. What was the problem and what did you do about it? Note: This is a serious hint. All your results will be off if you ignore it.
    5. Give the predicted GPA for a new student with a Verbal SAT of 600 and a Math SAT of 650. The answer is a single number. I expect you to do this with a calculator, though you could do it with SAS if you wanted to go to the trouble.
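
For reference, the quantities asked for in questions 1, 2 and 5 have the general forms sketched here. This is notation only: b_0, b_1 and b_2 stand for the estimates you read from your list file, and SSE and SST stand for the error and corrected total sums of squares. Apart from Y-hat, X1 and X2, these symbols are assumptions introduced for illustration and are not defined in the assignment.

```latex
% Question 2: fitted prediction equation (b_0, b_1, b_2 are numbers from the list file)
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2

% Question 1: proportion of sample variation in GPA explained by X_1 and X_2
R^2 = 1 - \frac{SSE}{SST}

% Question 5: substitute Verbal SAT = 600 and Math SAT = 650
\hat{Y}_{\text{new}} = b_0 + b_1 (600) + b_2 (650)
```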

More hints

I used proc univariate, proc corr, proc plot and proc reg. The following descriptive statistics were part of my output; you will likely find them in more than one place. If we agree on these numbers, then at least you read the data correctly (or else we made the same mistakes).

                              Simple Statistics
 
Variable         N        Mean     Std Dev         Sum     Minimum     Maximum

verbal         200   595.65000    73.20988      119130   361.00000   780.00000
math           200   649.53000    66.34711      129906   441.00000   800.00000
gpa            200     2.63000     0.58033   526.00000     0.30000     3.90000
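
A minimal sketch of a SAS program that would produce output of this kind, using the four procedures named in the hints. The variable names come from the table. The data set name sat, the assumption that sat.data sits in the working directory with one student per line, and the column order verbal, math, gpa are all guesses that should be checked against the file itself. The sketch does nothing about the problem mentioned in question 4.

```sas
/* Sketch only: the column order in sat.data is an ASSUMPTION; check the file. */
/* It does not address the data problem referred to in question 4.             */
title 'Assignment 2: predicting GPA from Verbal and Math SAT';

data sat;                        /* "sat" is a hypothetical data set name */
    infile 'sat.data';           /* assumes the file is in the working directory */
    input verbal math gpa;       /* assumed order: Verbal SAT, Math SAT, GPA */
run;

/* Descriptive statistics for each variable */
proc univariate data=sat;
    var verbal math gpa;
run;

/* Correlation matrix; also prints the Simple Statistics table */
proc corr data=sat;
    var verbal math gpa;
run;

/* Line-printer scatterplots of GPA against each predictor */
proc plot data=sat;
    plot gpa*verbal gpa*math;
run;

/* Multiple regression of GPA on the two SAT sub-tests */
proc reg data=sat;
    model gpa = verbal math;
run;
```

If the N, Mean and Std Dev columns in your proc corr output match the table, the input statement is probably reading the columns as intended.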