\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts} % for \mathbb{R} The set of reals
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f15 Assignment Twelve}}\footnote{Copyright information is at the end of the last page.}
% A subset of Assignment 8 from 2014. That assignment also had residual analysis of the trees data and other good stuff.
\vspace{1 mm}
\end{center}

\noindent
This assignment is preparation for the final exam; the homework questions are not to be handed in. The final exam may or may not have material from this assignment.

\begin{enumerate}

\item Regression diagnostics are mostly based on the residuals. This question compares the error terms $\epsilon_i$ to the residuals $\widehat{\epsilon}_i$. Answer True or False to each statement. For statements about the residuals, show a calculation that proves your answer. You may use anything on the formula sheet.
\begin{enumerate}
\item $E(\epsilon_i) = 0$
\item $E(\widehat{\epsilon}_i) = 0$
\item $Var(\epsilon_i) = 0$
\item $Var(\widehat{\epsilon}_i) = 0$
\item $\epsilon_i$ has a normal distribution.
\item $\widehat{\epsilon}_i$ has a normal distribution.
\item $\epsilon_1, \ldots, \epsilon_n$ are independent.
\item $\widehat{\epsilon}_1, \ldots, \widehat{\epsilon}_n$ are independent.
\end{enumerate}

\item One of these statements is true, and the others are false. Pick the true one, and show it is true with a quick calculation. Start with something from the formula sheet.
\begin{itemize}
\item $\widehat{\mathbf{y}} = \mathbf{X} \widehat{\boldsymbol{\beta}} + \widehat{\boldsymbol{\epsilon}}$
\item $\mathbf{y} = \mathbf{X} \widehat{\boldsymbol{\beta}} + \widehat{\boldsymbol{\epsilon}}$
\item $\widehat{\mathbf{y}} = \mathbf{X} \boldsymbol{\beta} + \widehat{\boldsymbol{\epsilon}}$
\end{itemize}
As the saying goes, ``Data equals fit plus residual.''

\item The \emph{deleted residual} is $\widehat{\epsilon}_{(i)} = y_i - \mathbf{x}^\prime_i \widehat{\boldsymbol{\beta}}_{(i)}$, where $\widehat{\boldsymbol{\beta}}_{(i)}$ is defined as usual, but based on the $n-1$ observations with observation $i$ deleted.
\begin{enumerate}
\item Guided by an expression on the formula sheet, write the formula for the Studentized deleted residual. You don't have to prove anything. You will need the symbols $\mathbf{X}_{(i)}$ and $MSE_{(i)}$, which are defined in the natural way.
\item If the model is correct, what is the distribution of the Studentized deleted residual? Make sure you have the degrees of freedom right.
\item Why are numerator and denominator independent?
\end{enumerate}

\pagebreak

\item For the general linear regression model, are $\widehat{\mathbf{y}}$ and $\widehat{\boldsymbol{\epsilon}}$ independent?
\begin{enumerate}
\item Answer Yes or No and prove your answer.
\item What does this imply about the plot of predicted values against residuals?
\end{enumerate}

\item For the general linear regression model, are $\mathbf{y}$ and $\widehat{\mathbf{y}}$ independent? Answer Yes or No and prove your answer.
% No, C = sigma^2 H neq 0. Plus sample r^2 = R^2

\item For the general linear regression model, are $\mathbf{y}$ and $\widehat{\boldsymbol{\epsilon}}$ independent? Answer Yes or No and prove your answer.
% No, C = sigma^2 (I-H) neq 0.

\item For the general linear regression model, calculate $\mathbf{X}^\prime \, \widehat{\boldsymbol{\epsilon}}$ one more time. This will help with the next question.

\item For the general linear regression model in which $\mathbf{X}$ is a matrix of constants,
\begin{enumerate}
\item Why does it not make sense to ask about independence of the independent variable values and the residuals?
\item Prove that the sample correlation between residuals and independent variable values must equal exactly zero.
\item Does this result depend on the correctness of the model?
\item What does the correlation between residuals and independent variable values imply about the corresponding plots?
\end{enumerate}

\end{enumerate}

\vspace{60mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f15}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f15}}

\end{document}