Statistical Consulting Assignment 1

Due at the beginning of class on Tuesday Sept 25th


In the United States, admission to university is based partly on high school grades and recommendations, and partly on applicants' performance on a standardized multiple-choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selects a random sample of 200 applicants and obtains the Verbal SAT, the Math SAT and the first-year university Grade Point Average (GPA) for each student. She wants an equation for predicting GPA from Verbal SAT and Math SAT. That is, predicted GPA will be a function of two variables, Verbal SAT and Math SAT. The raw data are available here.

Use SAS, preferably on utstat, to do the data analysis. No software other than SAS is acceptable.

Here is what you will hand in.

  1. The log file (not the command file).
  2. The list file.
  3. On a separate sheet of paper (handwritten is okay), answers to these questions:
    1. What proportion of the sample variation in GPA is explained by the two components of the SAT test? The answer to this question is a number between zero and one.
    2. Write down the prediction equation based on material in your list file. Denote Verbal SAT by X1 and Math SAT by X2. All other components of the formula should be numbers.
    3. Give a full statement of the statistical model you would adopt in order to carry out interval estimation and inference. You are being asked to copy something very standard from a book (or remember it).
    4. There was something very strange going on here. The client was on holiday and could not be reached, so you just fixed the problem using common sense. What was the problem and what did you do about it?
    5. This last question (which has two parts) is a bit tougher. If you are able to do it, good! If you are not able to do this last question, it's not the end of the world.
      1. The client wants to be able to calculate a "margin of error" as well as a predicted first-year GPA for any new student who applies for admission. Give a formula for the lower limit of the 95% prediction interval (not the confidence interval; you'll need to look this up). Give a formula for the upper limit of the 95% prediction interval. In the formulas, denote Verbal SAT by X1 and Math SAT by X2. All other components of the formulas should be numbers.
      2. For a new student with a Verbal SAT of 600 and a Math SAT of 650, give a predicted GPA along with the numerical limits of the 95% prediction interval. Use a hand calculator for this. You can use any software you want to get the critical value, including SAS if you know how.
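
Since any software may be used for the critical value, here is one possible way to get it, sketched in Python with only the standard library. This is an illustration, not part of the required SAS analysis. The function names (t_pdf, t_cdf, t_critical) are made up for this sketch, and the degrees of freedom are an assumption: with two predictors plus an intercept they would be n - 3, where n is the number of cases actually used in the regression, which may not be 200.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df, level=0.95):
    """Two-sided critical value: the (1 - (1 - level)/2) quantile, by bisection."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumption: all 200 cases are kept, so df = 200 - 3 = 197.
# Change df if the number of cases used in the regression differs.
print(round(t_critical(197), 4))
```

The value printed should be slightly larger than the familiar normal value of 1.96. It is worth checking it against a t table or against SAS before using it in the hand calculation.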

Comments

  1. The main purpose of this assignment is to let you familiarize yourself with SAS on utstat. Therefore, some things are left out and will be covered later. Two examples are residual plots (very desirable in real data analysis) and high-resolution graphics.
  2. There are, however, a few lessons about consulting and data analysis hidden in the assignment. These will be pointed out later in class, though you will probably discover them yourselves.
  3. Please feel free to ask Tamara, Jerry or Barbara for assistance. Jerry will be around most of the day on Tuesday Sept. 18.

Rules