Assignment Five: Quiz on Friday Feb. 16th in tutorial


This assignment is on regression with normal error terms. It is based on Lecture slide set 9, and material in Chapter 5 of the online text.

  1. High school history classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test and the median score of each class is recorded. Please consider a regression model with these variables:

    The equation of the full regression model is E(y|x) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5.

    For each question below, please give the null hypothesis in terms of β values. Also, give E(y|x) for the reduced (restricted) model you would compare to the full model in order to answer the question. Don't re-number the variables.

    1. If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word "affect"?)
    2. Controlling for parents' education and income and for curriculum type, is teacher's university background (two variables) related to their students' test performance?
    3. Controlling for teacher's university background and for curriculum type, are parents' education and income (considered simultaneously) related to students' test performance?
    4. Controlling for curriculum type, teacher's university background and parents' education, is parents' income related to students' test performance?
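
The full-versus-reduced comparisons asked for above all rest on the same general linear F statistic. The following is a minimal sketch of that calculation, not the course's own code. The data are simulated, the helper names `rss` and `f_statistic` are hypothetical, and the choice of which columns to drop is arbitrary; it does not correspond to any particular part of the question.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(X_full, X_reduced, y):
    """F statistic for a reduced model nested within a full model.

    Returns (F, numerator df, denominator df).
    """
    n, p_full = X_full.shape
    r = p_full - X_reduced.shape[1]      # number of betas set to zero under H0
    rss_full = rss(X_full, y)
    rss_reduced = rss(X_reduced, y)
    F = ((rss_reduced - rss_full) / r) / (rss_full / (n - p_full))
    return F, r, n - p_full

# Simulated data (an assumption for illustration only): five explanatory variables.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 5))
y = 1.0 + 0.5 * x[:, 0] + 0.3 * x[:, 1] + rng.normal(size=n)

ones = np.ones((n, 1))
X_full = np.hstack([ones, x])            # intercept plus x1..x5
X_reduced = np.hstack([ones, x[:, :3]])  # drops x4 and x5: H0 is beta4 = beta5 = 0

F, df1, df2 = f_statistic(X_full, X_reduced, y)
print(f"F = {F:.3f} on {df1} and {df2} degrees of freedom")
```

The reduced model is simply the full model with the columns named in the null hypothesis removed, which is why the variables keep their original numbers.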

  2. The U.S. Census Bureau divides the United States into small pieces called census tracts; lots of information is collected about each census tract. The census tracts are grouped into four geographic regions: Northeast, North Central, South and West. In one study, the cases were census tracts, the explanatory variables were Region and average income, and the response variable was crime rate, defined as the number of reported serious crimes in a census tract, divided by the number of people in the census tract.
    1. Write E(y|x) for a regression model with parallel regression lines. You do not have to say how your dummy variables are defined. You will do that in the next part.
    2. Make a table showing how your dummy variables are set up. There should be one row for each region, and a column for each dummy variable. Add a wider column on the right, in which you show E(y|x) for each region. Note that the symbols for your dummy variables will not appear in this column. There are examples of this format in the lecture slides and the text.
    3. For each of the following questions, give the null hypothesis in terms of the β parameters of your regression model. Remember that we are not doing one-tailed tests in this class.
      1. Controlling for income, does average crime rate differ by geographic region?
      2. Controlling for income, is average crime rate different in the Northeast and North Central regions?
      3. Controlling for income, is average crime rate different in the Northeast and Western regions?
      4. Controlling for income, is the crime rate in the South more than the average of the other three regions?
      5. Controlling for income, is the average crime rate in the Northeast and North Central regions different from the average of the South and West?
      6. Controlling for geographic region, is crime rate connected to income?
    4. Referring back to the previous set of questions, say why each of the following is a bad way to ask the question.
      1. Controlling for income, does geographic region affect the average crime rate?
      2. Allowing for geographic region, does average income have any effect on crime rate?
    5. Write E(y|x) for a regression model in which the regression lines might not be parallel. For this new model, give the null hypothesis you would test in order to answer each question.
      1. Are the four regression lines parallel in the population?
      2. Is there an interaction between average income and geographic region?
      3. Does the relationship of average income to crime rate depend on geographic region?
      4. Do regional differences in average crime rate depend on the average income in the census tract?
      5. Is the slope of the line relating average income to expected crime rate different for the Northeast and North Central regions?
      6. Is the slope of the line relating average income to crime rate different for the Northeast and South regions?
      7. Is the slope of the line relating average income to crime rate different for the Northeast and West regions?
      8. Is the slope of the line relating average income to crime rate different for the North Central and South regions?
      9. Is the slope of the line relating average income to crime rate different for the North Central and West regions?
      10. Is the slope of the line relating average income to crime rate different for the South and West regions?
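
As a rough illustration of how dummy variables and product terms enter a design matrix, the sketch below builds the columns for a parallel-lines model and for an unequal-slopes model. It assumes indicator coding with the last category as the reference, which is only one of several valid coding schemes. The category labels, the data values, and the helper `indicator_dummies` are all hypothetical.

```python
import numpy as np

def indicator_dummies(labels, categories):
    """One 0/1 column per category except the last, which is the reference."""
    labels = np.asarray(labels)
    return np.column_stack(
        [(labels == c).astype(float) for c in categories[:-1]]
    )

# Hypothetical four-category variable and one quantitative covariate.
categories = ["A", "B", "C", "D"]
group = np.array(["A", "B", "C", "D", "A", "C"])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

d = indicator_dummies(group, categories)   # shape (6, 3)
ones = np.ones((len(x), 1))

# Parallel lines: intercept, covariate, and the dummy variables.
X_parallel = np.column_stack([ones, x, d])

# Possibly non-parallel lines: add one product term per dummy variable.
X_unequal = np.column_stack([X_parallel, d * x[:, None]])

print(X_parallel.shape, X_unequal.shape)
```

Substituting each row's dummy values into E(y|x) gives the table format the questions ask for: one row per category, with no dummy-variable symbols left in the final column.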

  3. Consider a regression model with one continuous covariate (call it x1), and a categorical explanatory variable with three categories.
    1. Using polynomial regression, write a regression model with three parallel curves. All you have to give is E(y|x).
    2. Make a table with one row for each value of the categorical variable, and columns showing how the dummy variables are defined. Make a wider column on the right, showing the conditional expected value of y for each category. If symbols for the dummy variables appear in this column, the answer is wrong.
    3. You want to test whether, controlling for x1, there are any differences in expected y for the three categories. Does this question still make sense with curves instead of straight lines? If it makes sense, give the null hypothesis using symbols from your regression model.
    4. What null hypothesis would you test to determine whether three straight lines would give an adequate fit to the data? Give the answer using symbols from your regression model.
  4. In a health study on young to middle-aged adults, the explanatory variables are x1 = Age and x2 = Amount of exercise. The response variable is y = BMI (Body Mass Index), basically a measure of how heavy a person is relative to his or her height. We expect BMI to go up on average with age and go down on average with amount of exercise. However, might the connection of age to BMI depend on amount of exercise? Maybe the increase of BMI with age is less for those who exercise more.
    1. Write E(y|x).
    2. Re-write the equation so that it's easier to think of the increase of expected BMI as a function of age, for a fixed amount of exercise.
    3. For a fixed x2, what is the slope of the line relating age to expected body mass index?
    4. What null hypothesis would you test to check whether the connection of age to BMI depends on amount of exercise? Give the answer in symbols from your regression model.
    5. What null hypothesis would you test to check whether the connection of exercise to BMI depends on age? Give the answer in symbols from your regression model.
    6. Suppose that the more you exercise, the more slowly your BMI increases as you age. Would β3 be positive, or negative?

 


This assignment is licensed under a Creative Commons Attribution-ShareAlike 3.0 (or later) Unported License. Use and share it freely.