STA312f12 Assignment Six: Quiz on Friday Oct. 26th


  1. In a study of remedies for lower back pain, volunteer patients at a back clinic were randomly assigned to one of seven treatment conditions:
    1. OxyContin: A pain pill in the opiate family.
    2. Ibuprofen: A non-steroidal anti-inflammatory drug (Advil, Motrin).
    3. Acupuncture: The insertion and manipulation of thin needles into specific points on the body to relieve pain or for therapeutic purposes.
    4. Chiropractic: A form of therapy that includes manipulation of the spine, other joints and soft tissue.
    5. Stress reduction training based on thinking positive thoughts, a treatment that theoretically should not be effective. This is the non-drug control condition.
    6. Placebo: A sugar pill; patients were told that it was a pain killer with few side effects. This is the drug control condition.
    7. Waiting list control: Patients were told that the clinic was overcrowded (true), and that they were on a waiting list. This group received no treatment at all, not even a pretend treatment, until the study was over, at which point they received the most effective treatment based on the results of the study. We'll call this the "No treatment" group.

    The idea is that the effectiveness of the drug treatments should be measured relative to the drug control (placebo), while the effectiveness of the non-drug treatments should be measured relative to the non-drug control (stress reduction training). The two control treatments can be measured relative to no treatment at all.

    Degree of reported pain was measured by a questionnaire before treatment began, and again after six weeks. The dependent variable was Before-minus-After difference in reported pain, which will be called "improvement," or "effectiveness." Each of the following questions can be answered by testing whether one or more contrasts of treatment means are different from zero. For each question below, state the null hypothesis in terms of the population treatment means μ1 through μ7.

    Note that some of these questions ask whether certain treatments are better than others, while other questions just ask about a difference in effectiveness. In some courses, this would be a signal to choose between a one-tailed and a two-tailed test. But here, we will always use two-tailed tests.

    1. Does OxyContin work any better than the placebo?
    2. Does Ibuprofen work any better than the placebo?
    3. Do Chiropractic treatment and Stress reduction training differ in their effectiveness?
    4. Which results in more mean improvement, Acupuncture or Stress reduction training?
    5. Is the average improvement from the two drug therapies different from the improvement from the placebo?
    6. Does either non-drug therapy differ in effectiveness from Stress reduction training? This is a single null hypothesis.
    7. Is the Placebo better than no treatment at all?
    8. Is Stress reduction training better than no treatment at all?
    9. Is the average effectiveness of the drug therapies different from the average effectiveness of the non-drug therapies?
    10. Do Stress reduction training and the Placebo differ in their effectiveness?
    11. Does either control condition (Drug or Non-Drug) differ from no treatment at all? This is a single null hypothesis.
    12. Is treatment condition (the full independent variable) related to improvement? This is a single null hypothesis.
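
A contrast of treatment means is a linear combination c1μ1 + ... + cpμp whose weights c1, ..., cp sum to zero, and a null hypothesis of the kind asked for above says that one or more such combinations equal zero. The following is a minimal sketch of that idea in Python. The function names `is_contrast` and `contrast_value` are hypothetical, and the four group means are made up for illustration; they are not the back pain data.

```python
from fractions import Fraction

def is_contrast(weights):
    """A set of weights defines a contrast when the weights sum to zero."""
    return sum(weights) == 0

def contrast_value(weights, means):
    """Linear combination of the means: sum of weight * mean."""
    if len(weights) != len(means):
        raise ValueError("need one weight per mean")
    return sum(w * m for w, m in zip(weights, means))

# Hypothetical sample means for four imaginary groups (made up for illustration).
means = [Fraction(12), Fraction(10), Fraction(7), Fraction(7)]

# Compare group 1 with group 3.
pairwise = [1, 0, -1, 0]
# Compare the average of groups 1 and 2 with group 4.
average_vs_one = [Fraction(1, 2), Fraction(1, 2), 0, -1]

print(is_contrast(pairwise), contrast_value(pairwise, means))
print(is_contrast(average_vs_one), contrast_value(average_vs_one, means))
```

A question that involves several comparisons at once (a "single null hypothesis" about more than one contrast) corresponds to setting several such linear combinations to zero simultaneously.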

     

  2. The Wisconsin Power and Light Company studied the effectiveness of two devices for improving the efficiency of gas home-heating systems. The electric vent damper (EVD) reduces heat loss through the chimney when the furnace is in the off cycle by closing off the vent. It is controlled electrically. The thermally activated vent damper (TVD) is the same as the EVD except it is controlled by the thermal properties of a set of bimetal fins set in the vent. Ninety test houses were randomly assigned to have a free vent damper installed; 40 received EVDs and 50 received TVDs. For each house, energy consumption was measured for a period of several weeks with the vent damper active ("vent damper in") and for an equal period with the vent damper not active ("vent damper out"). Here are the variables:
        House Identification Number
        Type of furnace (1=Forced air  2=Gravity  3=Forced water)
        Chimney area
        Chimney shape (1=Round  2=Square  3=Rectangular)
        Chimney height in feet
        Type of Chimney liner (0=Unlined  1=Tile  2=Metal)
        Type of house (1=Ranch  2=Two-story  3=Tri-level
                       4=Bi-level  5=One and a half stories)
        House age in yrs (99=99+)
        Type of damper (1=EVD 2=TVD)
        Energy consumption with damper active (in)
        Energy consumption with damper inactive (out)
    

    Consider a model in which the response variable (Y) is average energy consumption with vent damper in and vent damper out, and the explanatory variables are age of house (X1), chimney area (X2) and house type (5 categories).

    1. Write E[Y|X] for your full model.
    2. Make a table with one row for each house type. Make columns showing how the indicator dummy variables for house type are defined. The reference category should be Ranch. Write the name of each dummy variable at the top of its column.
    3. Add another column showing E[Y|X] for each house type. The names of your dummy variables must not appear in this column. Why?
    4. You want to test whether, controlling for age of house and chimney area, average energy consumption depends on house type.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model.
    5. You want to test whether, controlling for age of house and chimney area, average energy consumption is different for ranch houses and tri-level houses.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model.
    6. You want to test whether, controlling for age of house and chimney area, average energy consumption is different for Two-story houses and Bi-level houses.
      1. Give the null hypothesis in symbols.
      2. Give E[Y|X] for the reduced model.
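
Indicator dummy coding with a reference category can be sketched as follows. This is only an illustration in Python: the function `indicator_coding` and the dummy names beginning with `d_` are hypothetical, and the example uses furnace type from the variable list above rather than house type.

```python
def indicator_coding(categories, reference):
    """Return {category: {dummy_name: 0 or 1}}; the reference row is all zeros."""
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    # One indicator for every category except the reference.
    others = [c for c in categories if c != reference]
    return {
        cat: {f"d_{other}": int(cat == other) for other in others}
        for cat in categories
    }

# Furnace type from the variable list, with Forced air as the reference category.
furnace = ["Forced air", "Gravity", "Forced water"]
table = indicator_coding(furnace, reference="Forced air")

for cat, row in table.items():
    print(f"{cat:12s}", row)
```

With k categories there are k - 1 indicators, and the reference category is the row in which every indicator equals zero.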

     

  3. High School History classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test and the median score of each class is recorded. Please consider a regression model with these variables:

    The full regression model is E[Y|X] = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5.

    Give E[Y|X] for the reduced model you would use to answer each of the following questions. Don't re-number the variables. Also, for each question please give the null hypothesis in terms of β values.

    1. If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word "affect"?)
    2. Controlling for parents' education and income and for curriculum type, is teacher's university background (two variables) related to their students' test performance?
    3. Controlling for teacher's university background and for curriculum type, are parents' education and income (considered simultaneously) related to students' test performance?
    4. Controlling for curriculum type, teacher's university background and parents' education, is parents' income related to students' test performance?
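
Each full-versus-reduced comparison of this kind is usually tested with an F statistic built from the error sums of squares of the two fitted models. The sketch below shows the arithmetic only. The function name `f_statistic` and all of the numbers are made up for illustration; nothing here comes from real data.

```python
def f_statistic(sse_reduced, sse_full, n, p_full, r):
    """F statistic for H0: the r coefficients dropped from the full model are zero.

    sse_reduced, sse_full: error sums of squares of the two fitted models
    n: number of cases
    p_full: number of betas in the full model, counting beta0
    r: number of coefficients set to zero under H0
    Returns the F value and its (numerator, denominator) degrees of freedom.
    """
    if sse_reduced < sse_full:
        raise ValueError("the reduced model cannot fit better than the full model")
    df_error = n - p_full
    f_value = ((sse_reduced - sse_full) / r) / (sse_full / df_error)
    return f_value, (r, df_error)

# Made-up numbers: 106 classes, six betas in the full model, two dropped under H0.
f_value, df = f_statistic(sse_reduced=1300.0, sse_full=1000.0, n=106, p_full=6, r=2)
print(f_value, df)
```

The numerator degrees of freedom equal the number of β values that the null hypothesis sets to zero, which is also the number of terms missing from the reduced model.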

     

  4. In the United States, admission to university is based partly on high school marks and recommendations, and partly on applicants' performance on a standardized multiple choice test called the Scholastic Aptitude Test (SAT). The SAT has two sub-tests, Verbal and Math. A university administrator selected a random sample of 200 applicants, and recorded the Verbal SAT, the Math SAT and first-year university Grade Point Average (GPA) for each student. The data are given in the file sat.data.
    1. First, fit a model using just the Math score as a predictor. "Fit" means estimate the model parameters. Does there appear to be a relationship between Math score and grade point average?
      1. Answer Yes or No.
      2. Pick one: Students who did better on the Math test tended to have (Better  Worse) first-year grade point average.
      3. Do you reject H0: β1=0 at the α=0.05 significance level?
      4. Are the results statistically significant? Answer Yes or No.
      5. What is the value of the test statistic? There are two correct answers, both numbers on your printout.
      6. What is the p-value? The answer can be found in two places on your printout.
      7. What proportion of the variation in first-year grade point average is explained by score on the SAT Math test? The answer is a number from your printout.
      8. Give a predicted first-year grade point average for a student who got 550 (out of 800) on the Math SAT.
    2. Now fit a model with both the Verbal and Math sub-tests. Please list Verbal and Math in that order in your lm statement, so that our βs will mean the same thing.
      1. Give the test statistic and the p-value for each of the following null hypotheses. The answers are on your printout.
        1. H0: β1 = β2 = 0
        2. H0: β1 = 0
        3. H0: β2 = 0
        4. H0: β0 = 0
      2. Controlling for Math score, is Verbal score related to first-year grade point average?
        1. Give the null hypothesis in symbols.
        2. Give the value of the test statistic. The answer is a number from your printout.
        3. Give the p-value. The answer is a number from your printout.
        4. Do you reject the null hypothesis? Answer Yes or No.
        5. Are the results statistically significant? Answer Yes or No.
        6. In plain, non-statistical language, what do you conclude? The answer is something about test scores and first-year marks in university. Make it simple! Your goal is to produce a single sentence that an alert 10-year-old could understand.
      3. Controlling for Verbal score, is Math score related to first-year grade point average?
        1. Give the null hypothesis in symbols.
        2. Give the value of the test statistic. The answer is a number from your printout.
        3. Give the p-value. The answer is a number from your printout.
        4. Do you reject the null hypothesis? Answer Yes or No.
        5. Are the results statistically significant? Answer Yes or No.
        6. In plain, non-statistical language, what do you conclude? The answer is something about test scores and first-year marks in university. Make it simple! Your goal is to produce a single sentence that an alert 10-year-old could understand.
      4. Give a predicted first-year grade point average for a student who got 650 on the Verbal and 550 on the Math SAT.
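
To show where the numbers on a regression printout come from, here is a minimal sketch of a one-predictor least-squares fit, written in plain Python rather than with lm. The function `simple_regression` is hypothetical, and the six data points are toy values invented for illustration; they are not taken from sat.data.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, with t, F and R-squared for the slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)      # total sum of squares
    b1 = sxy / sxx                               # estimated slope
    b0 = ybar - b1 * xbar                        # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                          # error mean square
    t = b1 / math.sqrt(mse / sxx)                # t statistic for H0: beta1 = 0
    f = (syy - sse) / mse                        # overall F, 1 numerator df
    r2 = 1 - sse / syy                           # proportion of variation explained
    return {"b0": b0, "b1": b1, "t": t, "F": f, "R2": r2}

# Toy data invented for illustration (NOT the contents of sat.data).
math_score = [450, 500, 550, 600, 650, 700]
gpa = [2.1, 2.6, 2.4, 3.0, 3.1, 3.5]

fit = simple_regression(math_score, gpa)
predicted = fit["b0"] + fit["b1"] * 550          # prediction at a Math score of 550
print(fit, predicted)
```

In a regression with a single predictor, the square of the t statistic for the slope equals the overall F statistic, so the two tests give the same p-value. A prediction is obtained by substituting the given score into the estimated equation.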