% 302f20Assignment4.tex
\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue,
citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f20 Assignment Four}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f20}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f20}}} }
\vspace{1 mm}
\end{center}

\noindent The following problems are not to be handed in. They are preparation for the Quiz on Oct.~8th during tutorial, and for the final exam. Please try them before looking at the answers. Use the formula sheet.

\begin{enumerate}

\item Independently for $i=1, \ldots, n$, let $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$, where the $\beta_j$ are unknown constants, the $x_{ij}$ are known, observable constants, and the $\epsilon_i$ are unobservable random variables with expected value zero. Of course, values of the dependent variable $y_i$ are observable. Start deriving the least squares estimates of $\beta_0$, $\beta_1$ and $\beta_2$ by minimizing the sum of squared differences between the $y_i$ and their expected values. I say \emph{start} because you don't have to finish the job. Stop when you have three linear equations in three unknowns, arranged so they are clearly the so-called ``normal'' equations $\mathbf{X}^\prime \mathbf{X}\boldsymbol{\beta} = \mathbf{X}^\prime\mathbf{y}$.

\item Assuming $(\mathbf{X}^\prime \mathbf{X})^{-1}$ exists, solve the normal equations for the general case of $k$ predictor variables, obtaining $\widehat{\boldsymbol{\beta}}$.

\item \label{xbarybar} For the regression model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$ etc.,
\begin{enumerate}
\item Differentiate and simplify to obtain the first normal equation.
\item Realizing that the least-squares estimates must satisfy this equation, put hats on the $\beta_j$ parameters.
\item Defining ``predicted'' $y_i$ as $\widehat{y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 x_{i1} + \cdots + \widehat{\beta}_k x_{ik}$, show that $\sum_{i=1}^n \widehat{y}_i = \sum_{i=1}^n y_i$.
\item The \emph{residual} for observation $i$ is defined by $\widehat{\epsilon}_i = y_i - \widehat{y}_i$. Show that the sum of residuals equals exactly zero.
\item What is $\widehat{y}$ when $x_1 = \overline{x}_1, x_2 = \overline{x}_2, \ldots, x_k = \overline{x}_k$? Show your work.
\item Thus, the least squares plane passes through the point $(\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_k, \underline{~~~~})$. Fill in the blank. You have shown that predicted $y$ for average $x$ values is exactly average $y$, and this fact does not depend upon the data at all.
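If you would like a quick numerical sanity check of parts (c) through (f), the following illustrative Python/\texttt{numpy} sketch (optional, with made-up data values, and no substitute for the algebra) confirms that the residuals sum to zero and that the fitted plane passes through the point of means.
\begin{verbatim}
import numpy as np
# Made-up data set: n = 5, k = 2 predictors, first column of ones
X = np.array([[1, 2, 1],
              [1, 4, 0],
              [1, 6, 3],
              [1, 5, 2],
              [1, 8, 4]], dtype=float)
y = np.array([3, 7, 5, 8, 9], dtype=float)
betahat = np.linalg.solve(X.T @ X, X.T @ y)        # solve the normal equations
yhat = X @ betahat
ehat = y - yhat
print(ehat.sum())                                  # essentially zero
print(yhat.mean(), y.mean())                       # equal
xbar = X[:, 1:].mean(axis=0)
print(betahat[0] + xbar @ betahat[1:], y.mean())   # equal: the plane passes
                                                   # through the means
\end{verbatim}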
\end{enumerate}

\item For the general regression model of Question~\ref{xbarybar}, show that $SST = SSR+SSE$; see the formula sheet for definitions. I find it helpful to switch to matrix notation partway through the calculation.

\item It is possible to think of the total variation in the $y_i$ not as variation around $\overline{y}$, but as variation around zero. This would make sense if the $y_i$ were differences, like weight loss or increase in profits. Then, variation of $y_i$ around zero can be split into variation of $y_i$ around $\overline{y}$, plus variation of $\overline{y}$ around zero.
\begin{enumerate}
\item Prove $\sum_{i=1}^n(y_i-0)^2 = \sum_{i=1}^n(y_i-\overline{y})^2 + \sum_{i=1}^n(\overline{y}-0)^2$.
\item Propose a version of $R^2$ for this setting.
\end{enumerate}

~ \vspace{5mm}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item % Centered simple regression, lifted and cut down from 2101f19 Assignment 1
In the \emph{centered} linear regression model, sample means are subtracted from the explanatory variables, so that values above average are positive and values below average are negative. Here is a version with one explanatory variable. For $i=1, \ldots, n$, let $y_i = \beta_0 + \beta_1(x_i-\overline{x}) + \epsilon_i$, where the $x_i$ values are fixed constants, and so on.
\begin{enumerate}
\item \label{centeredbetahat} Find the least squares estimates of $\beta_0$ and $\beta_1$. The answer is a pair of formulas. Show your work.
\item Because of the centering, it is possible to verify that the solution actually \emph{minimizes} the sum of squares using only single-variable second derivative tests. Do this part too.
\item In an $x,y$ scatterplot, centering $x$ just slides the cloud of points over to the left or right. Should the slope of the least squares line be affected? Comparing your answer to Question~\ref{centeredbetahat} with the formula for $\widehat{\beta}_1$ for the uncentered model on the formula sheet, what do you see?
\begin{comment}
\item Calculate $\widehat{\beta}_0$ and $\widehat{\beta}_1$ for the following data. Your answer is a pair of numbers.
% \begin{center}
~~~~~
\begin{tabular}{c|ccccc}
$x$ & 8 & 7 & 7 & 9 & 4 \\ \hline
$y$ & 9 & 13 & 9 & 8 & 6
\end{tabular}
% \end{center}
~~~~~ I get $\widehat{\beta}_1 = \frac{1}{2}$.
\end{comment}
\end{enumerate}

\item Consider the centered multiple regression model
\begin{displaymath}
y_i = \beta_0 + \beta_1 (x_{i,1}-\overline{x}_1) + \cdots + \beta_k (x_{i,k}-\overline{x}_k) + \epsilon_i
\end{displaymath}
with the usual details.
\begin{enumerate}
\item What is the least squares estimate of $\beta_0$? Show your work.
\item What is the connection to Problem~\ref{xbarybar}?
\end{enumerate}

\item \label{glm} For the general linear regression model in matrix form,
\begin{enumerate}
\item Show (there is no difference between ``show'' and ``prove'') that the matrix $\mathbf{X^\prime X}$ is symmetric. You may use without proof the fact that the transpose of an inverse is the inverse of the transpose.
\item Show that $\mathbf{X}^\prime\mathbf{X}$ is non-negative definite.
\item Show that if the columns of $\mathbf{X}$ are linearly independent, then $\mathbf{X^\prime X}$ is positive definite.
\item Show that if $\mathbf{X^\prime X}$ is positive definite, then $(\mathbf{X^\prime X})^{-1}$ exists.
\item Show that if $(\mathbf{X^\prime X})^{-1}$ exists, then the columns of $\mathbf{X}$ are linearly independent.
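These parts call for algebraic proofs, but it may help intuition to see the equivalence numerically. In the illustrative Python sketch below (made-up data), the third column of $\mathbf{X}$ is a linear combination of the first two, and $\mathbf{X}^\prime\mathbf{X}$ is then rank-deficient, so it has no inverse.
\begin{verbatim}
import numpy as np
# Made-up X whose third column equals (column 1) + 2*(column 2)
x1 = np.ones(6)
x2 = np.array([1., 2., 3., 4., 5., 6.])
X = np.column_stack([x1, x2, x1 + 2 * x2])
XtX = X.T @ X
print(np.linalg.matrix_rank(X))      # 2, not 3: columns are dependent
print(np.linalg.matrix_rank(XtX))    # also 2, so XtX has no inverse
# Dropping the redundant column restores invertibility:
X2 = X[:, :2]
print(np.linalg.inv(X2.T @ X2))      # exists
\end{verbatim}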
\end{enumerate}
This is a good problem because it establishes that the least squares estimator $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}$ exists if and only if the columns of $\mathbf{X}$ are linearly independent, meaning that no predictor variable is a linear combination of the other ones.

\item For the general linear regression model in matrix form with the columns of $\mathbf{X}$ linearly independent as usual, show that $(\mathbf{X}^\prime \mathbf{X})^{-1}$ is positive definite. You may use the existence and properties of $\boldsymbol{\Sigma}^{-1/2}$ without proof.

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item % Hat matrix
In the matrix version of the general linear regression model, $\mathbf{X}$ is $n \times (k+1)$ and $\mathbf{y}$ is $n \times 1$.
\begin{enumerate}
\item What are the dimensions of the hat matrix $\mathbf{H}$? Give the number of rows and the number of columns.
\item Show that $\mathbf{H}$ is symmetric.
\item Show that $\mathbf{H}$ is idempotent, meaning $\mathbf{H} = \mathbf{H}^2$.
\item Using $tr(\mathbf{AB})=tr(\mathbf{BA})$, find $tr(\mathbf{H})$.
\item Show that if $\mathbf{H}$ has an inverse, $\mathbf{H} = \mathbf{I}$.
\item Assuming that the columns of $\mathbf{X}$ are linearly independent (and we always do), what is the rank of $\mathbf{H}$?
\item Show that $\widehat{\mathbf{y}} = \mathbf{Hy}$.
\item Show that $\widehat{\boldsymbol{\epsilon}} = (\mathbf{I}-\mathbf{H})\mathbf{y}$.
\item Show that $\mathbf{I}-\mathbf{H}$ is symmetric.
\item Show that $\mathbf{I}-\mathbf{H}$ is idempotent.
\item What is $tr(\mathbf{I}-\mathbf{H})$?
\item Show $(\mathbf{I}-\mathbf{H})\mathbf{y} = (\mathbf{I}-\mathbf{H})\boldsymbol{\epsilon}$.
\end{enumerate}

\item \label{perpendicular} Prove that $\mathbf{X}^{\prime\,} \widehat{\boldsymbol{\epsilon}} = \mathbf{0}$. If the statement is false (not true in general), explain why it is false when $k>2$.

\item In all practical applications, the sample size is larger than the number of regression coefficients: $n>k+1$. But suppose for once that $n=k+1$ and the columns of $\mathbf{X}$ are still linearly independent. This means that $\mathbf{X}^{-1}$ could be obtained by elementary row reduction, proving that $\mathbf{X}^{-1}$ exists. So, for this weird case,
\begin{enumerate}
\item What is $\widehat{\boldsymbol{\beta}}$?
\item What is $\mathbf{H}$?
\item What is $\widehat{\mathbf{y}}$?
\item What is $\widehat{\boldsymbol{\epsilon}}$?
\item How do you know that all the points are exactly on the best-fitting plane?
\item For simple regression with an intercept, what is $n$?
\item Are all the points exactly on the least squares line?
\end{enumerate}

\item \label{nocalc} Returning to the matrix version of the linear model and writing $Q(\boldsymbol{\beta}) = (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^\prime (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$,
\begin{enumerate}
\item Show that $Q(\boldsymbol{\beta}) = \widehat{\boldsymbol{\epsilon}}^{\,\prime} \, \widehat{\boldsymbol{\epsilon}} + (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})^\prime (\mathbf{X^\prime X}) (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})$.
\item Why does this imply that the minimum of $Q(\boldsymbol{\beta})$ occurs at $\boldsymbol{\beta} = \widehat{\boldsymbol{\beta}}$?
\item The columns of $\mathbf{X}$ are linearly independent. Why does linear independence guarantee that the minimum is unique?
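If you want to see the decomposition in part (a) at work numerically (optional; an illustrative Python sketch with made-up data and an arbitrary seed), the increase in $Q$ away from $\widehat{\boldsymbol{\beta}}$ matches the quadratic form exactly:
\begin{verbatim}
import numpy as np
rng = np.random.default_rng(302)              # arbitrary seed, made-up data
n, k = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
betahat = np.linalg.solve(X.T @ X, X.T @ y)
Q = lambda b: (y - X @ b) @ (y - X @ b)       # sum of squares Q(beta)
beta = betahat + np.array([0.3, -0.2, 0.5])   # some other beta
lhs = Q(beta)
rhs = Q(betahat) + (betahat - beta) @ (X.T @ X) @ (betahat - beta)
print(np.isclose(lhs, rhs))                   # True: the decomposition holds
print(Q(beta) >= Q(betahat))                  # True: Q is minimized at betahat
\end{verbatim}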
\end{enumerate}

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \label{simple} ``Simple'' regression is just regression with a single predictor variable. The model equation is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Fitting this simple regression problem into the matrix framework of the general linear regression model,
\begin{enumerate}
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\end{enumerate}

\item Show that for simple regression, the proportion of explained sum of squares is the square of the correlation coefficient. That is, $R^2=\frac{SSR}{SST} = r^2$.

\item In Question~\ref{simple}, the model had an intercept and one predictor variable. But suppose the model has no intercept. This is called simple \emph{regression through the origin}. The model equation would be $y_i = \beta_1 x_i + \epsilon_i$.
\begin{enumerate}
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\item What is $\widehat{\boldsymbol{\beta}}$?
\end{enumerate}

\item There can even be a regression model with an intercept but no predictor variables. In this case the model equation is $y_i = \beta_0 + \epsilon_i$.
\begin{enumerate}
\item Find the least squares estimator $\widehat{\beta}_0$ with calculus.
\item Find the least squares estimator $\widehat{\beta}_0$ without calculus, using Problem~\ref{nocalc} as a model.
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\item Verify that your expression for $\widehat{\beta}_0$ agrees with $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}$.
\item What is $\widehat{\mathbf{y}}$? What are its dimensions?
\end{enumerate}

\item For the general linear regression model,
\begin{enumerate}
\item Show that $s^2 = \frac{\widehat{\boldsymbol{\epsilon}}^{\,\prime \,} \widehat{\boldsymbol{\epsilon}}}{n-k-1}$ is an unbiased estimator of $\sigma^2$.
\item What is the connection of this $s^2$ to the usual $s^2$?
\end{enumerate}

\end{enumerate}

%\vspace{60mm}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Next time
% Decomposition of SS, and R^2
% Estimating sigma-squared.

\item For the general linear regression model in matrix form, find $E(\mathbf{y})$ and $cov(\mathbf{y})$. Show your work.

% Maximum likelihood HW: sigma^2 as part b.

% Next time, a series of these:
% \item What are the dimensions of the matrix ?
% \item What is $E()$? Show your work.
% \item What is $cov()$? Show your work.

\item What are the dimensions of the matrix $\widehat{\boldsymbol{\epsilon}}$?

\item What is $E(\widehat{\boldsymbol{\epsilon}})$? Show your work. Is $\widehat{\boldsymbol{\epsilon}}$ an unbiased estimator of $\boldsymbol{\epsilon}$? This is a trick question, and requires thought.

\item What is $cov(\widehat{\boldsymbol{\epsilon}})$? Show your work. It is easier if you use $\mathbf{I}-\mathbf{H}$.

\item What are the dimensions of the random vector $\mathbf{b}$ as defined in Expression (2.9)? Give the number of rows and the number of columns.
\item Is $\mathbf{b}$ an unbiased estimator of $\boldsymbol{\beta}$? Answer Yes or No and show your work.

\item Calculate $cov(\mathbf{b})$ and simplify. Show your work.

\item What are the dimensions of the random vector $\widehat{\mathbf{y}}$?

\item What is $E(\widehat{\mathbf{y}})$? Show your work.

\item What is $cov(\widehat{\mathbf{y}})$? Show your work. It is easier if you use $H$.

\item What are the dimensions of the random vector $\mathbf{e}$?

\item What is $E(\mathbf{e})$? Show your work. Is $\mathbf{e}$ an unbiased estimator of $\boldsymbol{\epsilon}$? This is a trick question, and requires thought.

\item What is $cov(\mathbf{e})$? Show your work. It is easier if you use $I-H$.

\item Let $s^2 = \mathbf{e}^\prime\mathbf{e}/(n-k-1)$ as in Expression (2.33). Show that $s^2$ is an unbiased estimator of $\sigma^2$. The way this was done in lecture is preferable to the way it is done in the text, in my opinion.

\item The set of vectors $\mathcal{V} = \{\mathbf{v} = X\mathbf{a}: \mathbf{a} \in \mathbb{R}^{k+1}\}$ is the subset of $\mathbb{R}^{n}$ consisting of all linear combinations of the columns of $X$. That is, $\mathcal{V}$ is the space \emph{spanned} by the columns of $X$. The least squares estimator $\mathbf{b} = (X^\prime X)^{-1}X^\prime\mathbf{y}$ was obtained by minimizing $(\mathbf{y}-X\mathbf{a})^\prime(\mathbf{y}-X\mathbf{a})$ over all $\mathbf{a} \in \mathbb{R}^{k+1}$. Thus, $\widehat{\mathbf{y}} = X\mathbf{b}$ is the point in $\mathcal{V}$ that is \emph{closest} to the data vector $\mathbf{y}$. Geometrically, $\widehat{\mathbf{y}}$ is the \emph{projection} (shadow) of $\mathbf{y}$ onto $\mathcal{V}$. The hat matrix $H$ is a \emph{projection matrix}: it projects any point in $\mathbb{R}^{n}$ onto $\mathcal{V}$. Now we will test out several consequences of this idea.
\begin{enumerate}
\item The shadow of a point already in $\mathcal{V}$ should be right at the point itself. Show that if $\mathbf{v} \in \mathcal{V}$, then $H\mathbf{v}= \mathbf{v}$.
\item The vector of differences $\mathbf{e} = \mathbf{y} - \widehat{\mathbf{y}}$ should be perpendicular (at right angles) to each and every basis vector of $\mathcal{V}$. How is this related to Theorem 2.1?
\item Show that the vector of residuals $\mathbf{e}$ is perpendicular to any $\mathbf{v} \in \mathcal{V}$.
\end{enumerate}
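% Optional numerical illustration of the projection idea (illustrative Python
% sketch with made-up data and an arbitrary seed, not part of any assignment):
% H leaves vectors in the column space of X alone, and the residual vector is
% orthogonal to every column of X.
\begin{verbatim}
import numpy as np
rng = np.random.default_rng(2020)        # arbitrary seed, made-up data
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
v = X @ np.array([1.0, -2.0, 0.5])       # a point in the column space of X
print(np.allclose(H @ v, v))             # True: Hv = v
e = (np.eye(n) - H) @ y                  # residual vector
print(np.allclose(X.T @ e, 0))           # True: e is perpendicular to
                                         # each column of X
\end{verbatim}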