STA305 s14 Computer Assignment One
Quiz in lecture on Monday Jan. 20th
The SAT Data
In the United States, admission to university is based partly on high school marks and recommendations, and partly on applicants' performance on a standardized multiple choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selected a random sample of 200 applicants, and recorded the Verbal SAT, the Math SAT and first-year university Grade Point Average for each student. The data are given in the file sat2.data. The file has four columns: identification number, score on the verbal sub-test, score on the math sub-test, and grade point average at the end of first year.
Write a single SAS program to do the following. This means you will not use %include. Also, there should be just one procedure output file. Your SAS program should
- Read and label the data.
- Use proc means to obtain n, mean and standard deviation for each variable.
- Use proc reg to fit a regression model in which the independent variables are the verbal and math sub-tests, and the dependent variable is grade point average.
- In addition to the default output, generate a test of whether the regression coefficients for the verbal and math sub-tests are different from one another.
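The four steps above might be laid out roughly as follows. This is only a sketch under stated assumptions: it supposes that sat2.data sits in the working directory, has no header line, and holds four whitespace-separated numeric columns. The data set name `sat`, the variable names `id`, `verbal`, `math` and `gpa`, and the test label `verbal_vs_math` are all hypothetical choices, not names taken from the assignment. The comments are there for illustration only; under the rules for computer assignments, comment statements may not appear in the program you bring to the quiz.

```sas
/* Sketch only: names below (sat, id, verbal, math, gpa) are hypothetical. */
/* Assumes sat2.data is in the working directory, has no header line,      */
/* and holds four whitespace-separated numeric columns.                    */
title 'Your Name, Student Number: STA305 Computer Assignment One';

data sat;
    infile 'sat2.data';
    input id verbal math gpa;
    label verbal = 'Score on verbal sub-test'
          math   = 'Score on math sub-test'
          gpa    = 'First-year grade point average';
run;

/* n, mean and standard deviation; id is left out because it is only an identifier */
proc means data=sat n mean std;
    var verbal math gpa;
run;

/* Regression of gpa on both sub-tests, plus a labelled test of equal slopes */
proc reg data=sat;
    model gpa = verbal math;
    verbal_vs_math: test verbal = math;
run;
quit;
```

If the data file turns out to begin with a line of column names, the `infile` statement would need a `firstobs=2` option; check the log for notes about invalid data either way.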
You will bring a hard copy of your log file and procedure output file to the quiz, answer a few questions about the output, and hand in the output with your quiz. Please put your name and student number in the title statement. The following should give you an idea of what to expect on the quiz. Bring a calculator.
- Be able to give E(Y|X) for the full model, and for the reduced model corresponding to each test.
- For each F or t test on the output, be able to state the null hypothesis in symbols, and whether it is rejected at the α=0.05 significance level. Note that all the null hypotheses in this class are non-directional.
- If the null hypothesis is rejected, you should be able to state the conclusion in plain, non-statistical language, like "Allowing for score on the verbal sub-test, students who do better on the math sub-test tend to get lower first-year marks." Notice that this conclusion is just an example of the wording; it is not supported by the data. Also notice that the conclusion is directional, even though the test is two-sided. Always state a directional conclusion where possible; this feature is worth half the marks.
- What proportion of the variation in first-year GPA is explained by the verbal and math sub-tests together? The answer is a number from your printout.
- Give the test statistic, the degrees of freedom and the p-value for each of the following null hypotheses. The answers are numbers from your printout.
  - H0: β1 = β2 = 0
  - H0: β1 = 0
  - H0: β2 = 0
  - H0: β0 = 0
  - H0: β1 = β2
- Controlling for Math score, is Verbal score related to first-year grade point average?
  - Give the null hypothesis in symbols.
  - Give the value of the test statistic. The answer is a number from your printout.
  - Give the p-value. The answer is a number from your printout.
  - Do you reject the null hypothesis?
  - In plain, non-statistical language, what do you conclude? The answer is something about test scores and grade point average.
- Controlling for Verbal score, is Math score related to first-year grade point average?
  - Give the null hypothesis in symbols.
  - Give the value of the test statistic. The answer is a number from your printout.
  - Give the p-value. The answer is a number from your printout.
  - Do you reject the null hypothesis?
  - In plain, non-statistical language, what do you conclude? The answer is something about test scores and grade point average.
- We want to know whether expected GPA increases faster as a function of the Verbal SAT, or the Math SAT.
  - Give the null hypothesis in symbols.
  - Give the value of the test statistic. The answer is a number from your printout.
  - Give the p-value. The answer is a number from your printout.
  - Do you reject the null hypothesis?
  - In plain, non-statistical language, what do you conclude? The answer is something about test scores and grade point average.
- Give a predicted first-year grade point average for a student who got 650 on the Verbal and 700 on the Math SAT. Use a calculator.
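As a study aid for the items above, the full model, the reduced model under each null hypothesis, and the prediction formula can be written out as below. The notation is an assumption: the assignment does not say which coefficient goes with which sub-test, so this sketch takes x1 to be the Verbal score and x2 the Math score, following the order in which the sub-tests are listed. The degrees of freedom shown assume all 200 cases are read without missing values. The block needs the amsmath package.

```latex
% Assumed notation (not stated in the assignment):
%   Y = first-year GPA, x_1 = Verbal SAT, x_2 = Math SAT.
% Requires \usepackage{amsmath}.
\begin{align*}
\text{Full model:}\quad
  & E(Y \mid X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\[4pt]
\text{Reduced, } H_0\colon \beta_1 = \beta_2 = 0:\quad
  & E(Y \mid X) = \beta_0 \\
\text{Reduced, } H_0\colon \beta_1 = 0:\quad
  & E(Y \mid X) = \beta_0 + \beta_2 x_2 \\
\text{Reduced, } H_0\colon \beta_2 = 0:\quad
  & E(Y \mid X) = \beta_0 + \beta_1 x_1 \\
\text{Reduced, } H_0\colon \beta_0 = 0:\quad
  & E(Y \mid X) = \beta_1 x_1 + \beta_2 x_2 \\
\text{Reduced, } H_0\colon \beta_1 = \beta_2:\quad
  & E(Y \mid X) = \beta_0 + \beta_1 (x_1 + x_2) \\[4pt]
% General full-versus-reduced F statistic; r = number of restrictions,
% p = number of regression coefficients in the full model (here p = 3).
\text{Test statistic:}\quad
  & F = \frac{(SSE_R - SSE_F)/r}{SSE_F/(n - p)},
    \qquad df = (r,\; n - p) = (r,\; 197) \\[4pt]
% Prediction at Verbal = 650, Math = 700, using the estimates on the printout.
\text{Prediction:}\quad
  & \widehat{Y} = \widehat{\beta}_0 + 650\,\widehat{\beta}_1 + 700\,\widehat{\beta}_2
\end{align*}
```

For a null hypothesis with a single restriction (r = 1), the F statistic is the square of the corresponding t statistic, and the t test has n - p = 197 degrees of freedom under the same assumptions.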
Bring both your log file and your procedure output file to the quiz. You will be asked to hand them in with your quiz. Remember, correct interpretation of the wrong numbers is not worth much, and correct numbers without understanding are worth nothing. Know what your output means.
Please be reminded of the rules for computer assignments and quizzes. See the syllabus for more detail.
- You may copy freely from me, but do not look at anyone else's SAS code before the quiz.
- Do not write anything on the printouts except possibly your name and student number. Even your name and student number should be in the title statement rather than hand-written, but no marks will be deducted if you forget this the first time.
- Important: Comment statements and other typed material that would help with interpretation of the computer output are expressly forbidden. For example, you may not type assignment questions or answers into your code, or otherwise cause them to appear on your log file or procedure output file. Any such material is an unauthorized aid for the purposes of this course, and if you use or possess an unauthorized aid, you will be charged with an academic offence.
- The log and procedure output files must be generated at the same time by the same SAS program or you may lose a lot of marks.
- There must be no errors or notes about invalid data in your log file.
I believe that the SAT data were originally a Minitab data set, and Minitab (or the PWS-Kent publishing company) probably owns the rights to it. This assignment is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use it almost any way you like, as long as you share the results freely. See the license for details.