Assignment Six: Quiz
on Friday March 2nd
The quiz will be based on Chapter 5 and associated lecture material.
This assignment uses the Furnace
data of Assignment Three. The main
purpose of this study is to see which kind of vent damper uses more energy,
thermally activated or electrically activated. The vent damper can't do
anything when it's inactive (out), yet energy consumption with the vent
damper active is still strongly related to energy consumption with the vent
damper inactive. That's because a huge number of important unmeasured variables
(insulation, exposure to the wind, total surface area, efficiency of the
furnace, how much heat the residents like, etc.) are identical for the two
measurements, because it's the same house. So, let's take energy
consumption with vent damper in as the dependent variable. We'll use
energy consumption with vent damper out (inactive) as a covariate, and see
what else matters.
Note that there are no interactions in this assignment. We'll
get to those later.
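As a rough sketch of what "no interactions" means for the main question (type of vent damper, controlling for consumption with the damper out), the model could be written as below. The symbols are assumptions introduced here for illustration, not notation from the course: Y_i is energy consumption with vent damper in for house i, x_i is energy consumption with vent damper out, and d_i is the vent damper dummy variable (1=EVD, 0 otherwise).

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 x_i + \beta_2 d_i + \epsilon_i \\
E[Y \mid x, d = 0] &= \beta_0 + \beta_1 x \\
E[Y \mid x, d = 1] &= (\beta_0 + \beta_2) + \beta_1 x
\end{align*}
```

Under this sketch the two regression lines are parallel, so the difference between damper types is the same value, \beta_2, at every level of the covariate, and the test of interest is H_0: \beta_2 = 0. An interaction would add a product term in x_i d_i, which this assignment leaves out.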
- First, use proc reg to fit a regression model in which energy consumption with vent damper out is the only independent variable. What proportion of the variation does it explain?
- Make indicator dummy variables for the following categorical independent variables. You should check to make sure you did it right, but you need not print the frequency tables.
- Type of Furnace: Forced water is reference category
- Chimney Shape: Rectangular is reference category
- Chimney Liner: Unlined is reference category
- House Type (all 5 categories): Ranch is reference category
- Type of Vent Damper (1=EVD)
For each of these variables, use proc reg to test whether it is related to energy consumption with vent damper in, once you control for energy consumption with vent damper out. Be able to give the value of the test statistic, the p-value, and whether the results are statistically significant. Be able to state your conclusions (if any) in plain, non-statistical language. Make sure you cannot be accused of accepting the null hypothesis. Check your work with proc glm. Again, you are considering each of these variables one at a time, controlling for energy consumption with vent damper out, but not yet controlling for each other.
- Consider the last item, Type of vent damper controlling for energy consumption with vent damper inactive. Please obtain the least squares means as part of your proc glm output. Be able to reproduce the least squares means (using a calculator) from your proc reg output.
- Using proc reg, fit a full model. The independent variables are Energy consumption with vent damper inactive, Chimney area, Chimney height, Age of house, and dummy variables for the categorical independent variables of Question 2. Test each variable controlling for all the others. For the categorical independent variables with more than two categories, this means using the test statement. Be able to give the value of the test statistic, the p-value, and whether the results are statistically significant. Be able to state your conclusions (if any) in plain, non-statistical language. Make sure you cannot be accused of accepting the null hypothesis. Check your work with proc glm.
- Starting with the full model, try stepwise selection with the significance level for entry to the model and the significance level for staying in the model both equal to 0.05. You are responsible for understanding all the output except C(p). What model do you arrive at?
- Now do the same, except using selection = backward instead of selection = stepwise. Does this suggest a different model?
- Based on the results of the backward variable selection, fit a model in which Energy consumption with vent damper active depends on Energy consumption with vent damper inactive, Chimney area, and Type of chimney liner. This is the full model for the following questions. If you start with proc reg simple (that is, with the simple option on the proc reg statement), you will get simple descriptive statistics that will be useful later. Here's the question: Controlling for Energy consumption with vent damper inactive and Chimney area, is Type of chimney liner related to Energy consumption with vent damper active?
- Give the value of the test statistic. The answer is a number from the printout.
- What is the p-value? The answer is a number from the printout.
- Do you reject the null hypothesis at α=0.05? Answer Yes or No.
- Are the results statistically significant at the 0.05 level? Answer Yes or No.
- After allowing for Energy consumption with vent
damper inactive and Chimney area, what proportion of
the remaining variation in Energy consumption with vent
damper active is explained by Type of chimney liner?
The answer is a number between zero and one. Show a
little work.
- Using regression output, calculate three least
squares means, one for each type of chimney liner.
- Based on proc reg output, carry out all
pairwise comparisons of means for the three types of
chimney liner. Use a Bonferroni correction. Calculate
all the Bonferroni-corrected p-values.
- In simple, non-technical language, what do you
conclude? Include a statement that (allowing for ...)
houses with certain types of chimney liner use (more,
less) energy.
- Check your least squares means and
Bonferroni-corrected p-values with proc glm.
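The hand calculations in the chimney liner questions above can be sketched in a few lines of Python standing in for the calculator. This is only an illustration under stated assumptions: the function names, the category labels liner_1 and liner_2, and every number are made-up placeholders, not values from the Furnace data. The proportion of remaining variation uses one standard identity, a = sF / (error df + sF), where s is the numerator degrees of freedom of the F-test. The least squares means assume the covariates are held at their sample means and that Unlined, the reference category, has effect zero.

```python
# Calculator-style checks for the chimney liner questions.
# Every number below is a made-up placeholder, NOT a result from the Furnace data.

def proportion_remaining(f_stat, num_df, error_df):
    """Proportion of remaining variation explained: s*F / (error_df + s*F)."""
    return num_df * f_stat / (error_df + num_df * f_stat)

def least_squares_means(intercept, cov_slopes, cov_means, category_effects):
    """Predicted response per category with each covariate set to its sample mean."""
    base = intercept + sum(b * m for b, m in zip(cov_slopes, cov_means))
    return {cat: base + effect for cat, effect in category_effects.items()}

def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return {pair: min(1.0, k * p) for pair, p in p_values.items()}

# Hypothetical F-test for the two liner dummy variables: F = 5.0 on (2, 80) df.
a = proportion_remaining(f_stat=5.0, num_df=2, error_df=80)

# Hypothetical regression coefficients and covariate means (as from proc reg simple).
means = least_squares_means(
    intercept=1.0,
    cov_slopes=[0.5, 0.25],   # slopes for damper-out consumption and chimney area
    cov_means=[10.0, 8.0],    # sample means of those two covariates
    category_effects={"unlined": 0.0, "liner_1": 1.5, "liner_2": -0.5},
)

# Hypothetical raw p-values for the three pairwise comparisons.
adjusted = bonferroni({
    "liner_1 vs unlined": 0.004,
    "liner_2 vs unlined": 0.03,
    "liner_1 vs liner_2": 0.5,
})

print(a)
print(means)
print(adjusted)
```

In this sketch, the raw p-values for comparisons against the reference category would come from the t-tests on the two dummy variables, while the comparison between the two non-reference categories would need its own test statement in proc reg.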
Bring your log and list files to the quiz. Do not write anything
on the printouts in advance except your name and student number. You may be
asked to hand them in. The log and list files for each data set must
be generated by the same SAS program or you may lose a lot of marks. There
must be no errors or warnings in your log files. There must be no
notes about invalid data. Bring a calculator to the quiz.