Assignment Three: Quiz on Thursday Jan. 28 in tutorial

This assignment mostly asks for elementary tests. The ideas were covered in lecture slide set 1; also see your answer to Question 22 of Assignment 1. Doing elementary tests with SAS was not specifically covered in lecture, though there are some examples. See Chapter 2 of the textbook and all the lecture material to this point. Be resourceful. I do suggest that you avoid proc ttest. If you need a two-sample t-test, do the equivalent F-test with proc glm. If you need a matched t-test, use proc means as illustrated in the text.
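The equivalence behind the proc glm suggestion can be written out as follows. The notation is an assumption for illustration and may not match the textbook: group sample sizes are n_1 and n_2, sample means are \bar{Y}_1 and \bar{Y}_2, sample variances are s_1^2 and s_2^2, and s_p^2 is the pooled variance.

```latex
\[
t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
F = t^2 .
\]
```

Under the null hypothesis, t has a t distribution with n_1 + n_2 - 2 degrees of freedom and F has an F distribution with 1 and n_1 + n_2 - 2 degrees of freedom, so the two-sided t-test and the F-test from a one-factor proc glm with two groups give the same p-value.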

Wisconsin Power and Light studied the effectiveness of two devices for improving the efficiency of gas home-heating systems. The electric vent damper (EVD) reduces heat loss through the chimney when the furnace is in the off cycle by closing off the vent. It is controlled electrically. The thermally activated vent damper (TVD) is the same as the EVD except it is controlled by the thermal properties of a set of bimetal fins set in the vent. Ninety test houses were used, 40 with EVDs and 50 with TVDs. For each house, energy consumption was measured for a period of several weeks with the vent damper active ("vent damper in") and for an equal period with the vent damper not active ("vent damper out"). Here are the variables:

    House Identification Number
    Type of furnace (1=Forced air  2=Gravity  3=Forced water)
    Chimney area
    Chimney shape (1=Round  2=Square  3=Rectangular)
    Chimney height in feet
    Type of Chimney liner (0=Unlined  1=Tile  2=Metal)
    Type of house (1=Ranch  2=Two-story  3=Tri-level
                   4=Bi-level  5=One and a half stories)
    House age in years (99=99+)
    Type of damper (1=EVD  2=TVD)
    Energy consumption with damper active (in)
    Energy consumption with damper inactive (out)

The raw data are available in furnace.data.txt. There is a lesson here. Never trust what you are told about a data file. This is nothing compared to what you will encounter in practice. When in doubt, use common sense.
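As a starting point, here is a minimal sketch of reading and checking the file. Everything about the layout is an assumption: that the file is space-delimited, has no header line, has one record per house, and lists the eleven variables in the order given above. The variable names and format names are hypothetical. Given the warning above, expect to adjust this after looking at the raw file.

```sas
/* Sketch only. The file layout and all names below are ASSUMPTIONS. */
proc format;
   value furnfmt  1 = 'Forced air'  2 = 'Gravity'  3 = 'Forced water';
   value shapefmt 1 = 'Round'       2 = 'Square'   3 = 'Rectangular';
   value linerfmt 0 = 'Unlined'     1 = 'Tile'     2 = 'Metal';
   value housefmt 1 = 'Ranch'       2 = 'Two-story' 3 = 'Tri-level'
                  4 = 'Bi-level'    5 = 'One and a half stories';
   value dampfmt  1 = 'EVD'         2 = 'TVD';
run;

data furnace;
   infile 'furnace.data.txt';   /* assumes the file is in the working directory */
   input id furnace area shape height liner house age damper cons_in cons_out;
   label id       = 'House identification number'
         furnace  = 'Type of furnace'
         area     = 'Chimney area'
         shape    = 'Chimney shape'
         height   = 'Chimney height'
         liner    = 'Type of chimney liner'
         house    = 'Type of house'
         age      = 'House age in years (99=99+)'
         damper   = 'Type of damper'
         cons_in  = 'Energy consumption with damper active'
         cons_out = 'Energy consumption with damper inactive';
   format furnace furnfmt. shape shapefmt. liner linerfmt.
          house housefmt. damper dampfmt.;
run;

/* Look before analysing: list a few cases and tabulate the coded variables. */
proc print data=furnace (obs=10);
run;

proc freq data=furnace;
   tables furnace shape liner house damper;
run;
```

Codes that fall outside the ranges in the formats print as raw numbers in the proc freq output, which makes unexpected values in the file easy to spot.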

Write a SAS program that reads and labels the data, including proc format where appropriate. Create the following new variables:

Then,
  1. Run proc means to obtain sample sizes, means, medians and standard deviations of the quantitative variables. Search for proc means online to see how to get medians; this is the quickest way to access documentation. Run proc freq to get frequency distributions of the categorical variables (a variable may occur in both sets). Be able to answer basic questions like "What is the median chimney height?" (my answer is 20 ft. -- What?! I guess that should be inches.), or "What percentage of houses have a forced water furnace?" (my answer is 7.78%)
  2. Ignoring all other variables, test whether there is more energy consumption with the damper active or the damper inactive. Be able to answer questions like the following:
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about furnaces and vent dampers. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss, who failed the only Statistics course he ever took, and is touchy about it. Remember that he may forward the email to other "statistical experts" he knows, so beware of accepting H0 if the test is not significant.
  3. If you observe the shape of a house's chimney, does that improve your ability to predict what type of vent damper the house has?
    1. What is the value of the test statistic? The answer is a single number from the printout.
    2. What is the p-value? The answer is a single number from the printout.
    3. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    4. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    5. In plain, non-technical language, what do you conclude, if anything? Say something about chimney shape and type of vent damper. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
  4. Is there a tendency for houses that consume lots of energy with the vent damper inactive to also consume a lot of energy with the vent damper active?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. What proportion of the variation in energy consumption with vent damper active is explained by energy consumption with vent damper inactive? Your answer is a single number. You can use a calculator, and you probably will want to bring a calculator to the quiz.
    7. The equation for predicting energy consumption with vent damper active from energy consumption with vent damper inactive is    Predicted Y = b0 + b1 X. What are b0 and b1? The answer is a pair of numbers from your printout. (One of the residuals may be an outlier, but don't worry about that for now.)
    8. For a house that consumed 10 BTU (British Thermal Units) with vent damper out, what is the predicted energy expenditure with vent damper in? A calculator may be helpful here.
  5. Do the two types of vent damper differ in the amount of energy they save? Your response variable should be the difference of two variables in the raw data file.
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. Are the results statistically significant at the 0.05 level? Answer Yes or No.
    6. What is the mean saving in energy consumption by using an electrical vent damper (compared to not using it)? The answer is a single number from your printout.
    7. What is the mean saving in energy consumption by using a thermal vent damper (compared to not using it)? The answer is a single number from your printout.
  6. Does average energy consumption (mean of consumption with vent damper active and vent damper inactive) depend on type of chimney liner?
    1. Answer the question Yes or No. If the answer is No, add a sentence that protects you against accusations that you are accepting the null hypothesis.
    2. What is the value of the main test statistic? The answer is a single number from the printout.
    3. What is the p-value? The answer is a single number from the printout.
    4. Do you reject the null hypothesis at the 0.05 level? Answer Yes or No.
    5. What proportion of the variation in average energy consumption is explained by type of chimney liner? The answer is a single number from the printout.
    6. In plain, non-technical language, what do you conclude, if anything? Say something about chimney liners and energy consumption. If the overall test is significant, base your conclusions on Bonferroni pairwise multiple comparisons. Statistical terminology is absolutely not allowed here. Pretend you are writing a quick email to your boss … 
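The questions above might map onto SAS procedures roughly as in the sketch below. It assumes a SAS data set named furnace already exists with the hypothetical variable names area, height, age, furnace, shape, liner, house, damper, cons_in and cons_out. The choice of procedure for each question is one reasonable reading, not the required answer; check it against Chapter 2 of the textbook.

```sas
/* Sketch only. Data set and variable names are hypothetical. */
data furnace2;
   set furnace;
   saving  = cons_out - cons_in;         /* energy saved with damper active */
   avgcons = (cons_in + cons_out) / 2;   /* average of the two periods      */
   label saving  = 'Consumption out minus consumption in'
         avgcons = 'Average energy consumption';
run;

/* Question 1: descriptive statistics and frequency distributions */
proc means data=furnace2 n mean median std;
   var area height age cons_in cons_out;
run;

proc freq data=furnace2;
   tables furnace shape liner house damper;
run;

/* Question 2: matched t-test, done as a one-sample test on the difference */
proc means data=furnace2 n mean std t probt;
   var saving;
run;

/* Question 3: chi-squared test of independence */
proc freq data=furnace2;
   tables shape*damper / chisq;
run;

/* Question 4: simple regression of consumption "in" on consumption "out" */
proc reg data=furnace2;
   model cons_in = cons_out;
run;
quit;

/* Question 5: two-sample comparison as an F-test, with group means */
proc glm data=furnace2;
   class damper;
   model saving = damper;
   means damper;
run;
quit;

/* Question 6: one-way ANOVA with Bonferroni pairwise comparisons */
proc glm data=furnace2;
   class liner;
   model avgcons = liner;
   means liner / bon;
run;
quit;
```

For Question 4, the R-Square value in the proc reg output is the proportion of variation explained, and the predicted value in part 8 is b0 + b1 times 10, using the two parameter estimates from the same output.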

Bring both your log file and your output file to the quiz. Do not write anything on the printouts except your name and student number. You may be asked to hand one or both of them in. These two files must be generated by the same SAS program or you may lose a lot of marks. There must be no errors, no warnings and no notes about invalid data in your log file. Bring a calculator.