STA441 Assignment 2

Quiz on Friday Jan. 26th in tutorial


The Furnace Data

Wisconsin Power and Light studied the effectiveness of two devices for improving the efficiency of gas home-heating systems. The electric vent damper (EVD) reduces heat loss through the chimney when the furnace is in the off cycle by closing off the vent. It is controlled electrically. The thermally activated vent damper (TVD) is the same as the EVD except it is controlled by the thermal properties of a set of bimetal fins set in the vent. Ninety test houses were used, 40 with EVDs and 50 with TVDs. For each house, energy consumption was measured for a period of several weeks with the vent damper active ("vent damper in") and for an equal period with the vent damper not active ("vent damper out"). The variables are:

    House Identification Number
    Type of furnace (1=Forced air  2=Gravity  3=Forced water)
    Chimney area
    Chimney shape (1=Round  2=Square  3=Rectangular)
    Chimney height in feet
    Type of Chimney liner (0=Unlined  1=Tile  2=Metal)
    Type of house (1=Ranch  2=Two-story  3=Tri-level
                   4=Bi-level  5=One and a half stories)
    House age in years (99=99+)
    Type of damper (1=EVD  2=TVD)
    Energy consumption with damper active (in)
    Energy consumption with damper inactive (out)

Here is the raw data file: furnace.data.txt.
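
A minimal sketch of how the file might be read and labelled. It assumes that furnace.data.txt is whitespace-delimited, has no header line, and lists the variables in the order shown above; none of that is guaranteed, so check it against the file. The data set name, variable names and format names are invented for illustration.

```sas
/* Sketch only. Assumptions: whitespace-delimited file, no header line,  */
/* variables in the order of the list above. All names are hypothetical. */
proc format;
    value furnfmt  1 = 'Forced air'  2 = 'Gravity'  3 = 'Forced water';
    value shapefmt 1 = 'Round'  2 = 'Square'  3 = 'Rectangular';
    value linerfmt 0 = 'Unlined'  1 = 'Tile'  2 = 'Metal';
    value housefmt 1 = 'Ranch'  2 = 'Two-story'  3 = 'Tri-level'
                   4 = 'Bi-level'  5 = 'One and a half stories';
    value dampfmt  1 = 'EVD'  2 = 'TVD';
run;

data furnace;
    infile 'furnace.data.txt';   /* adjust the path to where the file lives */
    input id furntype charea chshape chheight chliner housetype age damper
          cons_in cons_out;
    label id        = 'House identification number'
          furntype  = 'Type of furnace'
          charea    = 'Chimney area'
          chshape   = 'Chimney shape'
          chheight  = 'Chimney height in feet'
          chliner   = 'Type of chimney liner'
          housetype = 'Type of house'
          age       = 'House age in years (99=99+)'
          damper    = 'Type of damper'
          cons_in   = 'Energy consumption with damper active (in)'
          cons_out  = 'Energy consumption with damper inactive (out)';
    /* Attach the value labels to the categorical variables */
    format furntype furnfmt. chshape shapefmt. chliner linerfmt.
           housetype housefmt. damper dampfmt.;
run;
```

If the log reports lost cards, invalid data, or a number of observations other than 90, the assumed layout does not match the file and the input statement needs to be revisited.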

Some details about the collection of these data have been lost, but it seems unlikely that houses were randomly assigned to damper type. It would have made the study too expensive. It's much more likely that existing furnace configurations were selected in some unknown way that included the homeowner being willing to participate in the study.


Write a SAS program that reads and labels the data (including proc format where appropriate). Marks may be deducted for failing to provide labels. Create the following new variables:

Then,
  1. Run proc means to obtain sample sizes, means, medians, standard deviations, maxima and minima of the quantitative variables. Don't be afraid to look at online documentation. Run proc freq to get frequency distributions of the categorical variables. Be able to answer basic questions like "What is the median chimney height?" (my answer is 20 ft.), or "What percentage of houses have a tile chimney liner?" (my answer is 44.94%).
  2. There's something wrong. You may live in the world of the metric system, but for reference, a regulation basketball hoop is 10 feet high. When was the last time you saw a single-family home with a 20-foot chimney? Use common sense and fix the problem. This is data analysis.
  3. Ignoring all other variables, test whether there is more energy consumption with the damper active or the damper inactive. Be able to answer questions like the following:
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about furnaces and vent dampers. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss, who failed the only statistics course he ever took, and is touchy about it. Remember that he may forward the email to other "statistical experts" he knows.
  4. If you observe the shape of a house's chimney, does that improve your ability to predict what type of vent damper the house has?
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about chimney shape and type of vent damper. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  5. Is there a tendency for houses that consume lots of energy with the vent damper inactive to also consume a lot of energy with the vent damper active?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. The equation for predicting energy consumption with vent damper active from energy consumption with vent damper inactive is    Predicted Y = b0 + b1 X. What are b0 and b1? The answer is a pair of numbers from your printout.
    7. What proportion of the variation in energy consumption with vent damper active is explained by energy consumption with vent damper inactive? The answer is a single number.
    8. For a house that consumed 10 BTU (British Thermal Units) with vent damper out, what is the predicted energy expenditure with vent damper in? A calculator may be helpful here. You should bring a calculator to the quiz.
    9. Make a scatterplot showing the least squares line. How many points appear to merit further investigation? You don't need to actually do the investigation at this point, because we have not gotten to residuals.
  6. Do the two types of vent damper differ in the amount of energy they save? Your dependent variable should be the difference of two variables in the raw data file.
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. What is the mean saving in energy consumption by using an electrical vent damper (compared to not using it)? The answer is a single number from your printout.
    7. What is the mean saving in energy consumption by using a thermal vent damper (compared to not using it)? The answer is a single number from your printout.
  7. Does average amount of energy consumption (mean of consumption with vent damper active and inactive) depend on type of chimney liner?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the main test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about chimney liners and energy consumption. If the overall test is significant, you are allowed to make statements about which means are different from each other without any formal testing -- just this once. We will discuss multiple comparisons later, in connection with Chapter 3. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  8. Do different types of house (Ranch, etc.) tend to have chimneys of different shapes?
    1. First, run the analysis using the original House Type variable -- the one with five categories. See the warning about expected cell frequencies less than five? We are only interested in whether they are less than one, but this still means you need to use the expected option.
    2. How many cells have expected frequencies less than one?
    3. We avoid expected frequencies less than one like they are poison, because they tend to inflate the Type I error probability. So now try the House Type variable you created, the one with three categories. How many expected frequencies are less than one now? Don't worry about the ones that are between one and five.
    4. What is the value of the main test statistic? The answer is a single number from the printout.
    5. What is the p-value? The answer is a single number from the printout.
    6. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    7. In plain, non-technical language, what do you conclude, if anything? Say something about chimneys. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  9. Now that you see the kinds of question that might be asked, use elementary tests to assess the relationship between vent damper type and all the other variables in the file, including the ones you created. You have already done some of this. For the quiz, be ready to interpret the results in plain language, if asked.
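
The questions above can be matched to elementary procedures roughly as follows. This is a sketch under stated assumptions, not a model answer: it assumes a data set named furnace with the hypothetical variables cons_in, cons_out, damper, chshape and chliner, and the choice of test for each question is one reasonable reading rather than the only one.

```sas
/* Sketch only: data set and variable names are hypothetical. */
data furnace2;
    set furnace;
    saving  = cons_out - cons_in;         /* energy saved with damper active */
    avgcons = (cons_in + cons_out) / 2;   /* mean of the two measurements    */
    label saving  = 'Energy saved (out minus in)'
          avgcons = 'Average energy consumption';
run;

/* Question 1: descriptive statistics and frequency distributions */
proc means data=furnace2 n mean median std max min;
    var cons_in cons_out saving avgcons;
run;
proc freq data=furnace2;
    tables damper chshape chliner;
run;

/* Question 3: matched (paired) t-test, damper in versus damper out */
proc ttest data=furnace2;
    paired cons_in*cons_out;
run;

/* Question 4: chi-squared test of independence, with expected frequencies */
proc freq data=furnace2;
    tables chshape*damper / chisq expected;
run;

/* Question 5: simple regression and a scatterplot with the fitted line */
proc reg data=furnace2;
    model cons_in = cons_out;
run;
quit;
proc sgplot data=furnace2;
    reg x=cons_out y=cons_in;
run;

/* Question 6: two-sample t-test on the saving, by damper type */
proc ttest data=furnace2;
    class damper;
    var saving;
run;

/* Question 7: one-way analysis of variance, by chimney liner */
proc glm data=furnace2;
    class chliner;
    model avgcons = chliner;
    means chliner;
run;
quit;
```

Question 8 follows the same pattern as Question 4, with the house type variable in place of damper type.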

Bring both your log file and your results file to the quiz. Do not write anything on the printouts except your name and student number. You may be asked to hand one or both of them in. These two files must be generated by the same SAS program or you may lose a lot of marks. There must be no errors or warnings in your log file. Bring a calculator.

Hint: Never fully believe what you are told about a data file. If you find problems, use common sense. Of course you are allowed to edit the data file in an extreme case like this.
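
One way to start looking for problems of this kind is sketched below. It assumes a data set named furnace with hypothetical variables chheight and age; what counts as an implausible value is a judgement call that the code does not make for you.

```sas
/* Sketch only: data set and variable names are hypothetical. */

/* Counts, missing values and ranges for every numeric variable */
proc means data=furnace n nmiss min max;
run;

/* Every distinct value of variables that look suspicious */
proc freq data=furnace;
    tables chheight age;
run;

/* The first few raw records, to compare against the variable list */
proc print data=furnace (obs=10);
run;
```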