STA442/1008 Assignment 9

Quiz in tutorial on Friday, April 4th


This assignment is based on Chapter 6 and associated lecture material.

  1. High school history classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test and the median score of each class is recorded. Cases are history classes, and the dependent variable is median score. Please consider a regression model with these variables:

    The full regression model is E[Y|X] = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5.

    Give the reduced model you would use to answer each of the following questions.

    1. If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word "affect?")
    2. Controlling for parents' education and income and for curriculum type, is teacher's university background (two variables) related to their students' test performance?
    3. Controlling for teacher's university background and for curriculum type, are parents' education and income (considered simultaneously) related to students' test performance?
    4. Controlling for curriculum type, teacher's university background and parents' education, is parents' income related to students' test performance?
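
Each question above compares the full model with a reduced model that omits the variables being tested. The following is only a minimal sketch of that comparison: `f_statistic` is a hypothetical helper name, and the sums of squared errors and the sample size are made-up numbers, not results from any data set.

```python
def f_statistic(sse_reduced, sse_full, num_dropped, df_error_full):
    """F statistic for testing a full model against a reduced model.

    sse_reduced   : error sum of squares of the reduced model
    sse_full      : error sum of squares of the full model
    num_dropped   : number of regression coefficients set to zero under H0
    df_error_full : n minus the number of coefficients in the full model
    """
    if num_dropped <= 0 or df_error_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (sse_reduced - sse_full) / num_dropped
    denominator = sse_full / df_error_full
    return numerator / denominator


# Hypothetical illustration: n = 100 classes, full model with an intercept
# and 5 slopes (6 coefficients), reduced model dropping 2 of the slopes.
n = 100
df_error_full = n - 6
f_value = f_statistic(sse_reduced=520.0, sse_full=470.0,
                      num_dropped=2, df_error_full=df_error_full)
print(f_value)
```

Under the null hypothesis, the statistic is referred to an F distribution with `num_dropped` and `df_error_full` degrees of freedom.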
  2. In a study of how people may get sick by staying in hospital, the cases are hospitals, and the dependent variable is "Infection risk," the (estimated) probability of getting sick in hospital. Two variables of interest are Age (average age of patient in the hospital) and Geographic Region in the U.S.
    1. In the table below, set up indicator dummy variables for geographic region so that South is the reference category.
      Region   D1     D2     D3  
      North East
      North Central
      South
      West
    2. Representing infection risk by Y, age by the variable x, and your three dummy variables by D1, D2 and D3, write a regression equation with an intercept and 4 independent variables. Complete the equation below (put x before the dummy variables).

             E[Y|X] =

    3. Give E[Y|X] for each region. The symbols "D1," "D2" and "D3" should not appear in your answer; they are numbers!
      Region E[Y|X]
      North East
      North Central
      South
      West

    4. For the Northeast region, when average patient age is increased by one year, expected infection risk increases by _______.

    5. For the West region, when average patient age is increased by one year, expected infection risk increases by _______.

    6. For any region, when average patient age is increased by one year, expected infection risk increases by _______.

    7. Controlling for average patient age, the difference between expected infection risk in the Northeast and South regions is ____.

    8. Controlling for average patient age, the difference between expected infection risk in the North Central and South regions is ____.

    9. Controlling for average patient age, the difference between expected infection risk in the Northeast and West regions is ____.

    10. What does β0 mean?

    11. Suppose we simultaneously tested D1, D2 and D3 (or equivalently, β2, β3, and β4), and the test was not significant. If you were in an exploratory mode and allowing yourself to accept the null hypothesis, what would you conclude?

    12. Is this study experimental, observational, or both? Why?

    13. Suppose the results in question 11 were statistically significant. Could you conclude that the difference in infection risk was caused by differences in how hospitals are run in the different regions? Why or why not?
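
The mechanics of indicator coding with a reference category can be sketched in a few lines. This is an illustration under stated assumptions, not the official solution: the function names are hypothetical, the ordering of D1, D2 and D3 (table order, skipping the reference category) is an assumption, and the coefficient values are invented purely to show the arithmetic.

```python
REGIONS = ["North East", "North Central", "South", "West"]


def indicator_code(region, reference="South"):
    """Return (D1, D2, D3): one indicator per non-reference region.

    The reference category gets zeros on every indicator.
    """
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    non_reference = [r for r in REGIONS if r != reference]
    return tuple(1 if region == r else 0 for r in non_reference)


def expected_risk(age, region, beta):
    """E[Y|X] = b0 + b1*x + b2*D1 + b3*D2 + b4*D3 for given coefficients."""
    dummies = indicator_code(region)
    return beta[0] + beta[1] * age + sum(
        b * d for b, d in zip(beta[2:], dummies)
    )


# Invented coefficients (b0, b1, b2, b3, b4), for illustration only.
beta = [1.0, 0.125, 0.5, 0.25, -0.75]

for region in REGIONS:
    print(region, indicator_code(region), expected_risk(48, region, beta))
```

Because the reference category has all indicators equal to zero, each indicator's coefficient is the difference between that region and the reference at any fixed age, and the slope on age is the same in every region.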

  3. For the TV data, consider a regression model in which the dependent variable is number of TV sets, and the independent variables are Assessed value of home, Total number of people in household, and Location (represented by a collection of dummy variables). For the overall (initial) F-test, I get F = 23.56. This will allow you to verify that you have the right variables. By the way, what does this (highly significant) result mean, in plain language?

    Carry out tests to answer the following questions. For each test, give the numerical value of the test statistic (F or t), and the p-value. State your conclusion. You are in exploratory mode, so you may freely accept the null hypothesis if it is not rejected.

    1. Controlling for number of people in the household and assessed value of home, is there a difference among the three locations in average number of TV sets?
    2. Controlling for location, are number of people in the household and value of home (considered together) related to average number of TV sets?
    3. Controlling for location and number of people in the household, is value of home related to average number of TV sets? Give the t as well as the F statistic. If the relationship is significant, is it positive or negative? State the finding in simple language.
    4. Controlling for location and value of home, is number of people in the household related to average number of TV sets? Give the t as well as the F statistic. If the relationship is significant, is it positive or negative? State the finding in simple language.
    5. Controlling for number of people in the household and assessed value of home, is there a difference between Rural and Urban locations in average number of TV sets?
    6. Controlling for number of people in the household and assessed value of home, is there a difference between Small Town and Urban locations in average number of TV sets?
    7. Controlling for number of people in the household and assessed value of home, is there a difference between Rural and Small Town locations in average number of TV sets?
    8. Controlling for number of people in the household and assessed value of home, is there a difference between Urban locations and the average of Rural and Small Town in mean number of TV sets? My F = 0.51.
    9. Controlling for number of people in the household and assessed value of home, is there a difference between Rural locations and the average of Urban and Small Town in mean number of TV sets? My F = 1.42.
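
Questions that compare one location with the average of the other two are statements about a linear combination of the dummy-variable coefficients. The sketch below shows one way to work out that combination. It assumes, purely for illustration, that Urban is the reference category with one indicator for Rural and one for Small Town; a different coding gives different coefficient weights but an equivalent test. The names `MEAN_ROWS` and `contrast_in_betas` are hypothetical.

```python
from fractions import Fraction

# Assumed coding: Urban is the reference category.
# Each row gives the weights on (intercept, b_rural, b_small_town) in the
# expected number of TV sets for that location. The covariates (home value,
# household size) are held fixed, so their terms cancel in any contrast.
MEAN_ROWS = {
    "Rural": (1, 1, 0),
    "Small Town": (1, 0, 1),
    "Urban": (1, 0, 0),
}


def contrast_in_betas(weights):
    """Translate a contrast among location means into coefficient weights."""
    if sum(weights.values()) != 0:
        raise ValueError("contrast weights must sum to zero")
    combo = [Fraction(0)] * 3
    for location, weight in weights.items():
        for j, coefficient in enumerate(MEAN_ROWS[location]):
            combo[j] += Fraction(weight) * coefficient
    return tuple(combo)


half = Fraction(1, 2)
urban_vs_others = contrast_in_betas(
    {"Urban": 1, "Rural": -half, "Small Town": -half}
)
rural_vs_others = contrast_in_betas(
    {"Rural": 1, "Urban": -half, "Small Town": -half}
)
print(urban_vs_others)
print(rural_vs_others)
```

The intercept weight is zero in both results, so each null hypothesis involves only the two location coefficients. The null hypothesis sets the resulting linear combination equal to zero, which most regression software can test directly.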