\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
\topmargin=-.3in
\textheight=9.4in
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 2101/442 Assignment Nine}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent Please bring printouts of your SAS log and list files to the quiz. Note that the log and list files \emph{must be from the same run of SAS}. Marks will be deducted for errors or warnings. \textbf{Do not write anything on your log and list files in advance, except possibly your name and student number.} The non-computer questions are just practice for the quiz, and are not to be handed in.

\vspace{5mm}

\begin{enumerate}

\item For the usual fixed effects multiple regression model, let $\mathbf{W} = (\mathbf{X}^\prime \mathbf{X})^{-1} \mathbf{X}^\prime \mathbf{e}$.
\begin{enumerate}
\item Simplify this expression for $\mathbf{W}$.
\item What is the probability distribution of $\mathbf{W}$?
\item Now you know whether $V(\mathbf{e})$ has an inverse. Why?
\end{enumerate}

\item \label{SAT} This question uses data from the furnace study described in Assignment 8. The data file is
\href{http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/furnace3.data}
{\texttt{furnace3.data}}. There is a link from the course home page in case the one in this document does not work.
% furnace3 has some extra missing values for liner type, to make this assignment more interesting.
You will see that the description of the data file in Assignment 8 is not completely accurate. This is typical.
Furthermore, the client is spear fishing off a coral reef in Samoa, and is unavailable to answer questions. Please use common sense and do the best you can.

Using SAS, fit a regression model in which the response variable is the average of energy consumption with the vent damper in and with the vent damper out, and the explanatory variables are age of house, chimney height and type of chimney liner (3 categories). Use indicator dummy variables to represent type of chimney liner, and make Unlined the reference category. Please use \texttt{proc reg simple} instead of just \texttt{proc reg}. This way, you will get simple descriptive statistics, including the means of house age and chimney height, which will be useful below.
\begin{enumerate}
% \item Use \texttt{proc freq} to make frequency distributions of Type of chimney liner, and also of your dummy variables. Do the numbers agree?
\item Allowing for type of chimney liner and age of house, is chimney height related to energy consumption?
\begin{enumerate}
\item Give the value of the test statistic, a number from your printout.
\item Give the $p$-value, a number from your printout.
\item Do you reject the null hypothesis at $\alpha = 0.05$? Answer Yes or No.
\item If the answer is Yes, what do you conclude? Use plain, non-statistical language.
\end{enumerate}
\item Controlling for type of chimney liner and chimney height, is age of house related to energy consumption?
\begin{enumerate}
\item Give the value of the test statistic, a number from your printout.
\item Give the $p$-value, a number from your printout.
\item Do you reject the null hypothesis at $\alpha = 0.05$? Answer Yes or No.
\item If the answer is Yes, what do you conclude? Use plain, non-statistical language.
\end{enumerate}
\item Taking chimney height and age of house into account, is type of chimney liner related to energy consumption?
\begin{enumerate}
\item Give the value of the test statistic, a number from your printout.
\item Give the $p$-value, a number from your printout.
\item Do you reject the null hypothesis at $\alpha = 0.05$? Answer Yes or No.
\end{enumerate}
\item \label{greatest} Looking at the estimated regression coefficients and disregarding hypothesis tests for the moment, when you control for chimney height and age of house, for which type of chimney liner is estimated energy consumption greatest? For which type of liner is it least?
\item Now it's important to decide which of these differences are real, and which ones might be due to chance. Still guided by the $\alpha = 0.05$ significance level and for the present not worrying about the problem of multiple testing, carry out tests of all pairwise comparisons of the chimney liner types, controlling for chimney height and age of house. Two of these tests are already part of the default output; you'll need to request only one custom test. Express your conclusion \emph{briefly}, using non-statistical language.
\item \label{lsmeans} When you do an analysis like this, it's really helpful to present numbers for the average energy consumption of houses with different types of chimney liner. But you don't want to just give sample means, because these ignore chimney height and age of house, rather than controlling for them. A good solution is to calculate three $\widehat{Y}$ values, one for each liner type, with chimney height and age of house set to their \emph{sample mean values}. That is, use the means for the entire sample, not separate means for each liner type. Go ahead and do this. Please use \texttt{proc iml}, so that the numbers appear on your printout. Are they consistent with your answer to Question~\ref{greatest}?
\end{enumerate}
\end{enumerate}

% \vspace{80mm}
\noindent
\begin{center}\begin{tabular}{l} \hspace{6.5in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto.
It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf13}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf13}}

\end{document}

            consume
liner        LSMEAN

Metal     7.9729383
Tile     10.7462956
Unlined  11.2050119

/* furnaceread.sas */
options linesize=79 noovp formdlim='_';
title 'STA2101/442(G): Furnace Data';

proc format;
     value frnfmt 1 = 'Forced air'  2 = 'Gravity'  3 = 'Forced water';
     value shfmt  1 = 'Round'  2 = 'Square'  3 = 'Rectangular';
     value lnfmt  0 = 'Unlined'  1 = 'Tile'  2 = 'Metal';
     value hfmt   1 = 'Ranch'  2 = 'Two-story'  3 = 'tri-level'
                  4 = 'Bi-level'  5 = '1.5 stories';
     value catfmt 1 = 'Ranch'  2 = 'Two Story'  3 = 'Other';
run;

data warm;
     infile 'furnace3.data';
     input typfurn area shape height liner house age dampin dampout damper $; /*$*/
     label typfurn = 'Type of furnace'
           area    = 'Chimney area'
           shape   = 'Chimney shape'
           height  = 'Chimney height in feet'
           liner   = 'Type of Chimney liner'
           house   = 'Type of house'
           age     = 'House age in yrs (99=99+)'
           damper  = 'Type of damper'
           dampin  = 'Energy consumpt with damper active'
           dampout = 'Energy consumpt with damper inactive';
     format typfurn frnfmt.;
     format shape shfmt.;
     format liner lnfmt.;
     format house hfmt.;

     /******* Creating New Variables ******/
     consume = (dampin+dampout)/2;
     label consume = 'Aver Energy Consumpt';
     diff = dampout-dampin;
     label diff = 'consumpt w/ damper out minus in';
     if house = . then housecat = .;
        else if house = 1 then housecat = 1;
        else if house = 2 then housecat = 2;
        else housecat = 3;
     label housecat = 'Recoded House Type';
     format housecat catfmt.;
run;

/************************** Notes ****************************************
There was no id. Type of vent damper was last.
Case 24 had * for 3 variables. Manually changed to . (But it does not affect results)
**************************************************************************/
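The dummy-variable coding and the tests requested in the furnace question can be sketched in SAS as follows. This is a minimal sketch, not the official solution. It assumes the data set warm created by furnaceread.sas above; the data set name warm2, the indicator names tile and metal, and the test labels Liner and TvsM are hypothetical choices. Because furnace3.data has some missing values for liner type, the sketch keeps a missing liner missing in both indicators rather than letting it fall into the Unlined reference category.

```sas
/* Hypothetical sketch: indicator dummies for type of chimney liner,  */
/* with Unlined (liner = 0) as the reference category.                */
data warm2;
     set warm;
     /* Keep a missing liner missing in both indicators; otherwise it   */
     /* would silently be coded as Unlined (tile = 0, metal = 0).       */
     if missing(liner) then do;
          tile  = .;
          metal = .;
     end;
     else do;
          tile  = (liner = 1);   /* 1 if Tile,  else 0 */
          metal = (liner = 2);   /* 1 if Metal, else 0 */
     end;
     label tile  = 'Liner is Tile (vs. Unlined)'
           metal = 'Liner is Metal (vs. Unlined)';
run;

/* SIMPLE adds descriptive statistics, including the means of age and height. */
proc reg simple data=warm2;
     model consume = age height tile metal;
     Liner: test tile = 0, metal = 0;   /* any liner effect, controlling for age and height */
     TvsM:  test tile = metal;          /* Tile vs. Metal: the pairwise comparison that is  */
                                        /* not among the default t-tests                   */
run;
```

Proc reg drops any case with a missing value on a model variable, so with this coding the cases of unknown liner type are excluded from the fit instead of being counted as Unlined.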