STA442/1008 Assignment 6

Quiz on Friday Oct. 23rd in tutorial


This assignment is based on Chapter 5 and associated lecture material. Using the TV data, fit a regression model in which the answer to Question 4 (price willing to pay for cable TV) is the dependent variable, and the independent variables are location (a 3-value categorical variable represented by indicator dummy variables, with City as the reference category), value of home, number of TV sets in household, and Total TV hours watched last week. Use the simple option on proc reg to get some basic descriptive statistics you'll need later.

  1. Make a table with three rows, showing E(Y|X=x) for each location. Symbols for the indicators should not appear in your answer, because they are zeros and ones, and different in each row. You are doing this with pencil and paper. The answer should not appear anywhere on your output.
  2. Make sure you know what the default output means. For example: "Controlling for Value of home, Number of TV sets in household, and Total TV hours watched last week, is there a difference between the Rural and City locations in average price willing to pay for cable TV?" My test statistic was t = 2.45. For which location is predicted Y greater, and how can you tell from the regression coefficient (that is, from b)?
  3. Perform a custom F-test for comparing the 3 locations controlling for the quantitative variables. What is the value of the test statistic (a single number)? What is the p-value? Is it significant at the 0.05 level? What, if anything, do you conclude?
  4. Perform a custom F-test for comparing Rural to Small Town controlling for the quantitative variables (there's no t-test for this one). What is the value of the test statistic (a single number)? What is the p-value? Is it significant at the 0.05 level?
  5. Based on just the default output, use proc iml to calculate the following:
    1. The proportion of remaining variation explained by location after you control for the quantitative predictors.
    2. The proportion of remaining variation explained by value of home after you control for all the other predictors. Using F = t^2, I got 0.5122626.
  6. Using proc iml, calculate a single Y-hat value for each of the three locations, with the quantitative independent variables set to their sample mean values. Use all the decimal places given on the printout.
  7. Check your last calculation using the lsmeans statement in proc glm. My answers agree with proc iml to the third decimal place. You can also check your test of location controlling for Value of home, Number of TV sets, and Total TV hours; look at the F-test associated with the Type III sums of squares.
  8. If you did not do it initially, obtain the Bonferroni-corrected p-values for all pairwise comparisons of the three location means, controlling for the quantitative variables. Compare these numbers to what you get with a calculator, doing the Bonferroni adjustment by hand. Do not write your calculator results on the printout or put them in your program. Does it look like you can trust the pdiff adjust= option to do what you want? Answer Yes or No.
  9. Finally, predict price willing to pay for a household in a Small Town with a house worth $70k, 2 TV sets, and 60 total hours of TV watched per week. Do it with a calculator, but do not write your calculations on the printout or put them in your program. Your answer is a single number (in 1982 dollars).
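
For Question 5, one way to sanity-check the proc iml arithmetic is the identity a = sF/(n - p + sF), which follows from the definition of the F statistic. Here s is the numerator degrees of freedom of the F-test and n - p is the error degrees of freedom. The sketch below is in Python rather than proc iml, so it is only a cross-check and not a substitute for the proc iml step. The function names and the numbers in the example calls are hypothetical and are not taken from the TV data.

```python
def prop_remaining_variation(F, s, df_error):
    """Proportion of remaining variation explained by the tested terms.

    F        : F statistic for the test
    s        : numerator degrees of freedom (number of terms tested)
    df_error : error degrees of freedom, n - p
    """
    return s * F / (df_error + s * F)


def prop_from_t(t, df_error):
    """Same quantity for a single coefficient, using F = t^2 with s = 1."""
    return prop_remaining_variation(t ** 2, 1, df_error)


# Hypothetical inputs, for illustration only (not from the assignment data).
print(prop_remaining_variation(F=4.0, s=1, df_error=96))
print(prop_from_t(t=2.0, df_error=96))
```

The two calls give the same value, because a t-test on one coefficient is the F-test with s = 1 and F = t^2.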

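For the hand check in Question 8, the Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons (three pairwise comparisons among three locations) and caps the result at 1. A minimal sketch in Python, assuming that capped form of the adjustment; the function name is hypothetical and the p-values are made up, not results from the TV data:

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


# Made-up unadjusted p-values for three pairwise comparisons (illustration only).
raw = [0.01, 0.2, 0.5]
print(bonferroni_adjust(raw))
```

If the adjusted values you compute by hand match the software's adjusted p-values, that is evidence the option is doing what you expect.
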
Please bring your log file and your list file to the quiz. Bring a calculator, too.