Assignment Three: Quiz on Friday Jan 27th

The Furnace Data

The Wisconsin Power and Light Company studied the effectiveness of two devices for improving the efficiency of gas home-heating systems. The electric vent damper (EVD) reduces heat loss through the chimney when the furnace is in the off cycle by closing off the vent. It is controlled electrically. The thermally activated vent damper (TVD) is the same as the EVD except that it is controlled by the thermal properties of a set of bimetal fins mounted in the vent. Ninety test houses were randomly assigned to have a free vent damper installed; 40 received EVDs and 50 received TVDs. For each house, energy consumption was measured for a period of several weeks with the vent damper active ("vent damper in") and for an equal period with the vent damper not active ("vent damper out"). Here are the variables:

    House Identification Number
    Type of furnace (1=Forced air  2=Gravity  3=Forced water)
    Chimney area
    Chimney shape (1=Round  2=Square  3=Rectangular)
    Chimney height in feet
    Type of Chimney liner (0=Unlined  1=Tile  2=Metal)
    Type of house (1=Ranch  2=Two-story  3=Tri-level
                   4=Bi-level  5=One and a half stories)
    House age in years (99=99+)
    Type of damper (1=EVD  2=TVD)
    Energy consumption with damper active (in)
    Energy consumption with damper inactive (out)

Here is the raw data file: furnace.data.

Write a SAS program that reads and labels the data (including proc format where appropriate). Create the following new variables:

Then,
  1. Run proc means to obtain sample sizes, means, medians and standard deviations of the quantitative variables. Run proc freq to get frequency distributions of the categorical variables (a variable may occur in both sets). Be able to answer basic questions like "What is the median chimney height?" (my answer is 20 ft.), or "What percentage of houses have a tile chimney liner?" (my answer is 44.94%)
  2. Ignoring all other variables, test whether there is more energy consumption with the damper active or the damper inactive. Be able to answer questions like the following:
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about furnaces and vent dampers. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss, who failed the only Statistics course he ever took, and is touchy about it. Remember that he may forward the email to other "statistical experts" he knows.
  3. If you observe the shape of a house's chimney, does that improve your ability to predict what type of vent damper the house has?
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. Are these results what you would expect from the description of the study? Why or why not? This is important.
    6. In plain, non-technical language, what do you conclude, if anything? Say something about chimney shape and type of vent damper. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  4. Is there a tendency for houses that consume lots of energy with the vent damper inactive to also consume a lot of energy with the vent damper active?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. What proportion of the variation in energy consumption with vent damper active is explained by energy consumption with vent damper inactive? Your answer is a single number. You can use a calculator, and you probably will want to bring a calculator to the quiz.
    7. The equation for predicting energy consumption with vent damper active from energy consumption with vent damper inactive is Predicted Y = b0 + b1 X. What are b0 and b1? The answer is a pair of numbers from your printout.
    8. For a house that consumed 10 BTU (British Thermal Units) with vent damper out, what is the predicted energy expenditure with vent damper in? A calculator may be helpful here.
  5. Do the two types of vent damper differ in the amount of energy they save? Your dependent variable should be the difference of two variables in the raw data file.
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. What is the mean saving in energy consumption from using an electric vent damper (compared to not using it)? The answer is a single number from your printout.
    7. What is the mean saving in energy consumption from using a thermally activated vent damper (compared to not using it)? The answer is a single number from your printout.
  6. Does the average amount of energy consumption (the mean of consumption with the vent damper active and inactive) depend on the type of chimney liner?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the main test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about chimney liners and energy consumption. If the overall test is significant, you are allowed to make statements about which means are different from each other without any formal testing -- just this once. We will discuss multiple comparisons later, in connection with Chapter Three. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  7. Do different types of house (Ranch, etc.) tend to have chimneys of different shapes?
    1. First, run the analysis using the original House Type variable, the one with five categories. See the warning about expected cell frequencies less than five? We are only interested in whether any are less than one, but checking this still means you need to use the expected option on the tables statement of proc freq.
    2. How many cells have expected frequencies less than one?
    3. We avoid expected frequencies less than one like they are poison, because they tend to inflate the Type I error rate. So now try the House Type variable you created, the one with three categories. How many expected frequencies are less than one now? Don't worry about the expected frequencies that are between one and five.
    4. What is the value of the main test statistic? The answer is a single number from the printout.
    5. What is the p-value? The answer is a single number from the printout.
    6. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    7. In plain, non-technical language, what do you conclude, if anything? Say something about chimneys. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
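For question 2 above, each house is measured twice, so a natural analysis is a paired t-test on the within-house difference. In SAS this could be done, for example, with proc ttest and a paired statement, or with proc means using the t and probt options on a difference variable. The minimal Python sketch below shows only the arithmetic behind the test statistic. The numbers are made up and are not the furnace data, and `paired_t` is a hypothetical helper. It stops at the statistic because Python's standard library has no t distribution, so the p-value should come from your SAS printout.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic for H0: mean(first - second) = 0; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences (divisor n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Made-up values for five houses (NOT the furnace data).
energy_out = [10.0, 12.0, 9.0, 11.0, 8.0]  # vent damper inactive
energy_in = [9.0, 11.5, 8.0, 10.0, 8.5]    # vent damper active

t, df = paired_t(energy_out, energy_in)
print(f"t = {t:.3f} on {df} df")  # t = 2.058 on 4 df
```

A positive t here means more consumption with the damper out than in. Swapping the two arguments flips the sign of t but leaves the two-sided p-value unchanged.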
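Questions 3 and 7 above each relate two categorical variables. In SAS this is usually a proc freq two-way table with the chisq and expected options on the tables statement. The Python sketch below uses hypothetical helper names and made-up counts. It shows where expected frequencies come from (row total times column total, divided by the sample size), how to count cells below one, and why merging a sparse category helps. It illustrates the Pearson chi-square arithmetic only. It assumes no row or column total is zero, and it does not replace the SAS output.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    expected = expected_counts(table)
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def count_small_expected(table, cutoff=1.0):
    """Number of cells whose expected count is below the cutoff."""
    return sum(exp < cutoff for row in expected_counts(table) for exp in row)


# Made-up 3 x 2 table; the first row is a rare category.
sparse = [[1, 0], [20, 30], [25, 24]]
print(count_small_expected(sparse))  # 2

# The same counts with the rare first row merged into the second row.
merged = [[21, 30], [25, 24]]
print(count_small_expected(merged))  # 0
chi2, df = pearson_chi_square(merged)
print(f"chi-square = {chi2:.3f} on {df} df")
```

Merging categories changes the question slightly, so it is worth combining only categories that belong together substantively.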
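For question 5 above, one reasonable setup is to define each house's saving as consumption with the damper out minus consumption with it in. (This is an assumption; the question only says the dependent variable is a difference of two raw variables.) The EVD and TVD houses can then be compared with a two-sample t-test, for example proc ttest with a class statement. The Python sketch below uses made-up numbers and a hypothetical `pooled_t` helper. It shows the equal-variance (pooled) version of the statistic, whereas SAS also prints an unequal-variance (Satterthwaite) version. The group means of the savings correspond to parts 6 and 7.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # divisor n - 1
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2


# Made-up consumption values (NOT the furnace data).
evd_out = [10.0, 12.0, 9.0, 11.0]
evd_in = [9.0, 10.5, 8.0, 10.5]
tvd_out = [9.0, 13.0, 10.0, 8.0, 11.0]
tvd_in = [8.5, 12.5, 10.0, 7.5, 11.0]

# Saving = consumption with damper out minus consumption with damper in.
evd_saving = [o - i for o, i in zip(evd_out, evd_in)]
tvd_saving = [o - i for o, i in zip(tvd_out, tvd_in)]

print(statistics.mean(evd_saving), statistics.mean(tvd_saving))  # mean savings
t, df = pooled_t(evd_saving, tvd_saving)
print(f"t = {t:.3f} on {df} df")
```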
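For question 6 above, comparing mean average consumption across the three liner types is a one-way analysis of variance, for example with proc glm and a class statement. The minimal Python sketch below uses a hypothetical helper name and made-up group values. It shows how the F statistic is built from the between-group and within-group sums of squares. Python's standard library has no F distribution, so the p-value should come from the SAS printout.

```python
import statistics


def one_way_anova_f(groups):
    """One-way ANOVA F statistic; returns (F, numerator df, denominator df)."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up average consumption, (in + out) / 2, by liner type (NOT the furnace data).
unlined = [10.0, 12.0, 11.0]
tile = [9.0, 8.0, 10.0]
metal = [13.0, 14.0, 12.0]

f, df1, df2 = one_way_anova_f([unlined, tile, metal])
print(f"F = {f:.2f} on ({df1}, {df2}) df")  # F = 12.00 on (2, 6) df
```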

Bring both your log file and your list file to the quiz. Do not write anything on the printouts except your name and student number. You may be asked to hand one or both of them in. These two files must be generated by the same SAS program or you may lose a lot of marks. There must be no errors, warnings or notes about invalid data in your log file. In the list file, warnings about expected frequencies less than five are okay as long as there are no expected frequencies below one. Bring a calculator.

Hint: Once again, there is something funny about the data file. You should always expect this. Common sense is allowed in this course. Please use it.