STA442/1008 Assignment 5

Quiz on Friday Feb. 10th


This assignment is based on material in Chapter 5 of the online text and the associated lecture material. The following formulas will be provided with the quiz (and the final exam), whether they are needed or not:

          

  1. High school history classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test, and the median score of each class is recorded. Please consider a regression model with these variables:

    The full regression model is E[Y|X] = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5.

    Give E[Y|X] for the reduced model you would use to answer each of the following questions. Don't re-number the variables. Also, for each question please give the null hypothesis in terms of β values.

    1. If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word "affect"?)
    2. Controlling for parents' education and income and for curriculum type, is teacher's university background (two variables) related to their students' test performance?
    3. Controlling for teacher's university background and for curriculum type, are parents' education and income (considered simultaneously) related to students' test performance?
    4. Controlling for curriculum type, teacher's university background and parents' education, is parents' income related to students' test performance?
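For Question 1, each reduced model is compared with the full model using the general linear F test. The sketch below is a minimal pure-Python illustration under stated assumptions, not the software used in the course: `sse` fits least squares on an intercept plus the chosen predictor columns and returns the error sum of squares, and `general_linear_f` computes F = [(SSE_R - SSE_F)/r] / [SSE_F/(n - p)], where r is the number of β's the null hypothesis sets to zero and p is the number of β's in the full model. Because SSE_R - SSE_F equals SSR_F - SSR_R, the same F can also be written in terms of regression sums of squares. The function names and the tiny data set are hypothetical.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's lists are left unchanged.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def sse(rows: Sequence[Sequence[float]], y: Sequence[float], cols: Sequence[int]) -> float:
    """Fit y on an intercept plus the listed columns of `rows` by least squares
    (normal equations) and return the error sum of squares."""
    X = [[1.0] + [float(row[j]) for j in cols] for row in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, x)) for x in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def general_linear_f(sse_full: float, sse_reduced: float, r: int, n: int, p: int) -> float:
    """F = [(SSE_R - SSE_F) / r] / [SSE_F / (n - p)], on r and n - p degrees of freedom."""
    return ((sse_reduced - sse_full) / r) / (sse_full / (n - p))


if __name__ == "__main__":
    # Tiny made-up data set: one predictor column, four cases.
    rows = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 3.0, 2.0, 5.0]
    sse_f = sse(rows, y, cols=[0])  # full model: intercept + X1
    sse_r = sse(rows, y, cols=[])   # reduced model under H0: beta1 = 0
    F = general_linear_f(sse_f, sse_r, r=1, n=len(y), p=2)
    print(f"SSE_F = {sse_f:.4f}, SSE_R = {sse_r:.4f}, F = {F:.4f}")
```

In a question like 1.2, the full model would use all five predictor columns (p = 6) and the reduced model would drop the two columns for teacher's university background, so r = 2. The p-value is the upper-tail probability of the F distribution with r and n - p degrees of freedom, which this sketch does not compute. Normal equations are adequate for a classroom example, but statistical packages generally use more numerically stable methods.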
  2. In the United States, admission to university is based partly on high school marks and recommendations, and partly on applicants' performance on a standardized multiple choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selected a random sample of 200 applicants, and recorded the Verbal SAT, the Math SAT and first-year university Grade Point Average (GPA) for each student. The data are given in the file sat.data.
    1. First, fit a model using just the Math score as a predictor. "Fit" means estimate the model parameters. Does there appear to be a relationship between Math score and grade point average?
      1. Answer Yes or No.
      2. Pick one: Students who did better on the Math test tended to have (Better or Worse) first-year grade point average.
      3. Do you reject H0: β1=0 at the α=0.05 significance level?
      4. Are the results statistically significant? Answer Yes or No.
      5. What is the value of the test statistic? There are two correct answers, both numbers on your printout.
      6. What is the p-value? The answer can be found in two places on your printout.
      7. What proportion of the variation in first-year grade point average is explained by score on the SAT Math test? The answer is a number from your printout.
      8. Give a predicted first-year grade point average for a student who got 550 (out of 800) on the Math SAT.
    2. Now fit a model with both the Verbal and Math sub-tests. Please list Verbal and Math in that order in your model statement, so that the βs mean the same thing for everyone.
      1. Give the test statistic and the p-value for each of the following null hypotheses. The answers are on your printout.
        1. H0: β1 = β2 = 0
        2. H0: β1 = 0
        3. H0: β2 = 0
        4. H0: β0 = 0
      2. Controlling for Math score, is Verbal score related to first-year grade point average?
        1. Give the null hypothesis in symbols.
        2. Give the value of the test statistic. The answer is a number from your printout.
        3. Give the p-value. The answer is a number from your printout.
        4. Do you reject the null hypothesis? Answer Yes or No.
        5. Are the results statistically significant? Answer Yes or No.
        6. Once you allow for Math score, Verbal score explains what percent of the remaining variation in first-year GPA? The answer is a number between 0 and 100%.
        7. In plain, non-statistical language, what do you conclude? The answer is something about test scores and first-year marks in university. Make it simple! Your goal is to produce a single sentence that an alert 10-year-old could understand.
      3. Controlling for Verbal score, is Math score related to first-year grade point average?
        1. Give the null hypothesis in symbols.
        2. Give the value of the test statistic. The answer is a number from your printout.
        3. Give the p-value. The answer is a number from your printout.
        4. Do you reject the null hypothesis? Answer Yes or No.
        5. Are the results statistically significant? Answer Yes or No.
        6. Once you allow for Verbal score, Math score explains what percent of the remaining variation in first-year GPA? The answer is a number between 0 and 100%.
        7. In plain, non-statistical language, what do you conclude? The answer is something about test scores and first-year marks in university. Make it simple! Your goal is to produce a single sentence that an alert 10-year-old could understand.
      4. Give a predicted first-year grade point average for a student who got 650 on the Verbal and 550 on the Math SAT.
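Several parts of Question 2 come down to short arithmetic on numbers from the printout. The helpers below are a hedged sketch with hypothetical names. `remaining_variation_explained` converts an F statistic into a = rF/(n - p + rF), which equals (SSE_R - SSE_F)/SSE_R, the proportion of the variation left unexplained by the reduced model that the extra r variables do explain. `f_from_t` reflects the fact that when a single β is tested (r = 1), F = t^2, which is why one null hypothesis can have two test statistics on the printout. `predict` evaluates a fitted regression equation. The coefficients in the usage lines are placeholders, not estimates from sat.data.

```python
from typing import Sequence


def remaining_variation_explained(F: float, r: int, n: int, p: int) -> float:
    """Proportion a = rF / (n - p + rF) of the variation that the reduced model
    leaves unexplained and the r extra variables do explain."""
    return r * F / (n - p + r * F)


def f_from_t(t: float) -> float:
    """For a test of one coefficient (r = 1), the F statistic is t squared."""
    return t * t


def predict(coefficients: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate b0 + b1*x1 + ... + bk*xk for fitted coefficients [b0, b1, ..., bk]."""
    b0, *slopes = coefficients
    if len(slopes) != len(x):
        raise ValueError("need one x value per slope coefficient")
    return b0 + sum(b * xi for b, xi in zip(slopes, x))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not from sat.data).
    a = remaining_variation_explained(F=4.0, r=1, n=200, p=3)
    print(f"a = {a:.4f}  ({100 * a:.2f}% of the remaining variation)")
    print("F from t = 2:", f_from_t(2.0))
    # Placeholder coefficients [b0, b_verbal, b_math]; Verbal is listed first, as in 2.2.
    print("predicted GPA:", predict([0.5, 0.001, 0.002], [650, 550]))
```

For the Verbal-and-Math model, n = 200 and p = 3, and a test of one coefficient has r = 1; multiply a by 100 to express it as a percentage.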
  3. This question refers to the Furnace data of Assignment Three. However, no computation is required. This is just paper-and-pencil.

    Consider a model in which the dependent variable (Y) is the average of energy consumption with the vent damper in and with the vent damper out, and the independent variables are age of house (X1), chimney area (X2) and house type (5 categories).

    1. Write E[Y|X] for your full model.
    2. Make a table with one row for each house type. Make columns showing how the indicator dummy variables for house type are defined. The reference category should be Ranch. Write the name of each dummy variable at the top of its column.
    3. Add another column showing E[Y|X] for each house type. The names of your dummy variables must not appear in this column. Why?
    4. You want to test whether, controlling for age of house and chimney area, average energy consumption depends on house type.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model.
    5. You want to test whether, controlling for age of house and chimney area, average energy consumption is different for ranch houses and tri-level houses.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model.
    6. You want to test whether, controlling for age of house and chimney area, average energy consumption is different for Two-story houses and Bi-level houses.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model. This may be too tricky for the quiz.
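For Question 3, five house types with Ranch as the reference category call for four indicator (dummy) variables. As a hedged sketch in Python, `indicator_coding` builds the coding table for any list of categories: the reference category is 0 on every dummy, and each other category is 1 on its own dummy and 0 on the rest. The function name, the `D_` naming scheme, and the placeholder "Other" for the fifth house type (which is not named in this question) are all hypothetical.

```python
from typing import Dict, List


def indicator_coding(categories: List[str], reference: str) -> Dict[str, Dict[str, int]]:
    """Map each category to its 0/1 dummy values, one dummy per non-reference category.

    The reference category gets 0 on every dummy; every other category gets a 1
    on its own dummy and 0 on the rest.
    """
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not in {categories}")
    # Hypothetical naming scheme: "D_" plus the category name with hyphens removed.
    dummy_for = {c: "D_" + c.replace("-", "") for c in categories if c != reference}
    return {
        c: {name: int(c == level) for level, name in dummy_for.items()}
        for c in categories
    }


if __name__ == "__main__":
    # Four house types are named in Question 3; "Other" is a hypothetical stand-in
    # for the fifth category of the Furnace data.
    types = ["Ranch", "Tri-level", "Two-story", "Bi-level", "Other"]
    coding = indicator_coding(types, reference="Ranch")
    names = list(coding["Ranch"])
    print(f"{'House type':<12}" + "".join(f"{n:>13}" for n in names))
    for house, row in coding.items():
        print(f"{house:<12}" + "".join(f"{row[n]:>13}" for n in names))
```

Passing a different `reference` changes which category is coded 0 on every dummy.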

Please bring your log file and list file to the quiz. Bring a calculator, too.