% 302f17Assignment4.tex
\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f17 Assignment Four}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

% In an effort to use the text, I've commented out some material that may be useful for quiz and exam questions.
% 2016 A7 Q5: r^2 = R^2
% Decomp of SS

\noindent
%The general linear regression model is $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $X$ is an $n \times (k+1)$ matrix of observable constants, $\boldsymbol{\beta}$ is a $(k+1) \times 1$ vector of unknown constants (parameters), and $\boldsymbol{\epsilon}$ is an $n \times 1$ vector of unobservable random variables with $E(\boldsymbol{\epsilon})=\mathbf{0}$ and $cov(\boldsymbol{\epsilon})=\sigma^2I_n$. The error variance $\sigma^2>0$ is an unknown constant parameter.

\begin{enumerate}
\item Independently for $i=1, \ldots, n$, let $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$, where the $\beta_j$ are unknown constants, the $x_{ij}$ are known, observable constants, and the $\epsilon_i$ are unobservable random variables with expected value zero. Of course, values of the dependent variable $y_i$ are observable. Start deriving the least squares estimates of $\beta_0$, $\beta_1$ and $\beta_2$ by minimizing the sum of squared differences between the $y_i$ and their expected values. I say \emph{start} because you don't have to finish the job. Stop when you have three linear equations in three unknowns, arranged so they are clearly the so-called ``normal'' equations $X^\prime X\boldsymbol{\beta} = X^\prime\mathbf{y}$.

\item \label{glm} For the general linear regression model in matrix form,
\begin{enumerate}
\item Show (there is no difference between ``show'' and ``prove'') that the matrix $X^\prime X$ is symmetric.
\item Show that $X^\prime X$ is non-negative definite.
\item Show that if the columns of $X$ are linearly independent, then $X^\prime X$ is positive definite.
\item Show that if $X^\prime X$ is positive definite, then $(X^\prime X)^{-1}$ exists.
\item Show that if $(X^\prime X)^{-1}$ exists, then the columns of $X$ are linearly independent.
\end{enumerate}
This is a good problem because it establishes that the least squares estimator $\mathbf{b} = (X^\prime X)^{-1}X^\prime\mathbf{y}$ exists if and only if the columns of $X$ are linearly independent, meaning that no independent variable is a linear combination of the other ones.

\item %Let $\widehat{\mathbf{y}} = X\mathbf{b} = H\mathbf{y}$, where $H = X(X^\prime X)^{-1}X^\prime$. The residuals are in the vector $\mathbf{e} = \mathbf{y}-\widehat{\mathbf{y}}$.
In the matrix version of the general linear regression model, $X$ is $n \times (k+1)$ and $\mathbf{y}$ is $n \times 1$.
\begin{enumerate}
\item What are the dimensions of the matrix $H$? Give the number of rows and the number of columns.
\item Assuming that the columns of $X$ are linearly independent (and we always do), what is the rank of $H$?
\item You know that the inverse of a square matrix exists if and only if its columns (and rows) are linearly independent. If $n > k+1$, can $H$ have an inverse? Answer Yes or No. (A numerical sketch illustrating this question and Question~\ref{glm} follows.)
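Not part of the assignment, and not a substitute for the algebra: the following is a minimal numerical sketch (Python with NumPy) of Questions 1--3, using a made-up $5 \times 3$ design matrix. The numbers, the random seed, and the variable names are invented for illustration only; the answers you hand in should not depend on them.
\begin{verbatim}
# A numerical sketch of Questions 1-3 with made-up data; illustration only.
import numpy as np

n, k = 5, 2
rng = np.random.default_rng(302)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # 1, x1, x2
y = rng.normal(size=n)

XtX = X.T @ X
print(np.allclose(XtX, XtX.T))        # X'X is symmetric
print(np.linalg.eigvalsh(XtX))        # eigenvalues all > 0: positive definite

# Solve the normal equations X'X b = X'y; agrees with least squares.
b = np.linalg.solve(XtX, X.T @ y)
print(np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0]))

H = X @ np.linalg.inv(XtX) @ X.T      # the n x n hat matrix
print(H.shape)                        # (5, 5)
print(np.trace(H))                    # k+1 = 3
print(np.linalg.matrix_rank(H))       # also 3 < n = 5, so H has no inverse
print(np.allclose(H, H.T))            # symmetric
print(np.allclose(H, H @ H))          # idempotent

# Make one independent variable a linear combination of the others:
# X'X becomes singular, so b = (X'X)^{-1} X'y does not exist.
X_bad = X.copy()
X_bad[:, 2] = 2.0 * X_bad[:, 0] - X_bad[:, 1]
print(np.linalg.matrix_rank(X_bad.T @ X_bad))   # 2 < 3
\end{verbatim}
Changing the seed or the dimensions (keeping $n > k+1$) should not change any of the Yes/No or rank answers; that is the point of these questions.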
\item Show that $H$ is symmetric.
\item Show that $H$ is idempotent, meaning $H = H^2$.
\item Using $tr(AB)=tr(BA)$, find $tr(H)$.
\item Show that $\widehat{\mathbf{y}} = H\mathbf{y}$.
\item Show that $\mathbf{e} = (I-H)\mathbf{y}$.
\item Show that $M = I-H$ is symmetric.
\item Show that $M$ is idempotent.
\item Find $tr(M)$.
\end{enumerate}

\item Please read Chapter 2, pages 28--37 in the textbook. % Just before decomposition of SS.
\begin{enumerate}
\item Show that $M\boldsymbol{\epsilon}=\mathbf{e}$.
%\item \label{perpendicular} Prove that $X^\prime \mathbf{e} = \mathbf{0}$. If the statement is false (not true in general), explain why it is false.
\item Prove Theorem 2.1 in the text. % I know this partly repeats work you have already done.
\item Why does $X^\prime\mathbf{e}=\mathbf{0}$ tell you that if a regression model has an intercept, the residuals must add up to zero?
\item Letting $\mathcal{S} = (\mathbf{y}-X\boldsymbol{\beta})^\prime (\mathbf{y}-X\boldsymbol{\beta})$,
\begin{enumerate}
\item Show that $\mathcal{S} = \mathbf{e}^\prime\mathbf{e} + (\mathbf{b}-\boldsymbol{\beta})^\prime (X^\prime X) (\mathbf{b}-\boldsymbol{\beta})$.
\item Why does this imply that the minimum of $\mathcal{S}(\boldsymbol{\beta})$ occurs at $\boldsymbol{\beta} = \mathbf{b}$?
\item The columns of $X$ are linearly independent. Why does linear independence guarantee that the minimum is unique?
\end{enumerate}
\item What are the dimensions of the random vector $\mathbf{b}$ as defined in Expression (2.9)? Give the number of rows and the number of columns.
\item Is $\mathbf{b}$ an unbiased estimator of $\boldsymbol{\beta}$? Answer Yes or No and show your work.
\item Calculate $cov(\mathbf{b})$ and simplify. Show your work.
\item What are the dimensions of the random vector $\widehat{\mathbf{y}}$?
\item What is $E(\widehat{\mathbf{y}})$? Show your work.
\item What is $cov(\widehat{\mathbf{y}})$? Show your work. It is easier if you use $H$.
\item What are the dimensions of the random vector $\mathbf{e}$?
\item What is $E(\mathbf{e})$? Show your work. Is $\mathbf{e}$ an unbiased estimator of $\boldsymbol{\epsilon}$? This is a trick question, and it requires thought.
\item What is $cov(\mathbf{e})$? Show your work. It is easier if you use $I-H$.
\item Let $s^2 = \mathbf{e}^\prime\mathbf{e}/(n-k-1)$ as in Expression (2.33). Show that $s^2$ is an unbiased estimator of $\sigma^2$. The way this was done in lecture is preferable to the way it is done in the text, in my opinion.
\item Do Exercises 2.1, 2.3 and 2.6 in the text. In 2.6, $\Gamma$ orthogonal means $\Gamma^\prime = \Gamma^{-1}$.
\end{enumerate}

\pagebreak
%%%%%%%%%%%%%%%%%%%%

\item \label{scalar} The scalar form of the general linear regression model is
\begin{displaymath}
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i,
\end{displaymath}
where $\epsilon_1, \ldots, \epsilon_n$ are a random sample from a distribution with expected value zero and variance $\sigma^2$. The numbers $x_{ij}$ are known, observed constants, while $\beta_0, \ldots, \beta_k$ and $\sigma^2$ are unknown constants (parameters). The term ``random sample'' means independent and identically distributed in this course, so the $\epsilon_i$ random variables have zero covariance with one another.
\begin{enumerate}
\item What is $E(y_i)$?
\item What is $Var(y_i)$?
\item What is $Cov(y_i,y_j)$ for $i \neq j$?
\item Defining $SSTO=\sum_{i=1}^n(y_i-\overline{y})^2$, $SSR = \sum_{i=1}^n(\widehat{y}_i-\overline{y})^2$ and $SSE=\sum_{i=1}^n(y_i-\widehat{y}_i)^2$, show $SSTO=SSE+SSR$. I find it helpful to switch to matrix notation partway through the calculation.
\end{enumerate}

\item \label{simple} ``Simple'' regression is just regression with a single independent variable. The model equation is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Fitting this simple regression model into the matrix framework of the general linear regression model (see formula sheet),
\begin{enumerate}
\item What is the $X$ matrix?
\item What is $X^\prime X$? Your answer is a $2 \times 2$ matrix with a formula in each cell.
\item What is $X^\prime \mathbf{y}$? Again, your answer is a matrix with a formula in each cell.
\end{enumerate}

\item Show that for simple regression, the proportion of explained sum of squares is the square of the correlation coefficient. That is, $R^2=\frac{SSR}{SSTO} = r^2$.

\item In \emph{simple regression through the origin}, there is one independent variable and no intercept. The model is $y_i = \beta_1 x_i + \epsilon_i$.
\begin{enumerate}
\item What is the $X$ matrix?
\item What is $X^\prime X$?
\item What is $X^\prime \mathbf{y}$?
\item What is $(X^\prime X)^{-1}$?
\item What is $b_1 = (X^\prime X)^{-1}X^\prime\mathbf{y}$? Compare your answer to (1.22) on page 11 in the textbook.
\end{enumerate}

\pagebreak
\item There can even be a regression model with an intercept and no independent variables. In this case the model would be $y_i = \beta_0 + \epsilon_i$.
\begin{enumerate}
\item \label{ybar} Find the least squares estimator of $\beta_0$ with calculus.
\item What is the $X$ matrix?
\item What is $X^\prime X$?
\item What is $X^\prime \mathbf{y}$?
\item What is $(X^\prime X)^{-1}$?
\item What is $b_0 = (X^\prime X)^{-1}X^\prime\mathbf{y}$? Compare this with your answer to Question~\ref{ybar}.
\end{enumerate}

\item The set of vectors $\mathcal{V} = \{\mathbf{v} = X\mathbf{a}: \mathbf{a} \in \mathbb{R}^{k+1}\}$ is the subset of $\mathbb{R}^{n}$ consisting of linear combinations of the columns of $X$. That is, $\mathcal{V}$ is the space \emph{spanned} by the columns of $X$. The least squares estimator $\mathbf{b} = (X^\prime X)^{-1}X^\prime\mathbf{y}$ was obtained by minimizing $(\mathbf{y}-X\mathbf{a})^\prime(\mathbf{y}-X\mathbf{a})$ over all $\mathbf{a} \in \mathbb{R}^{k+1}$. Thus, $\widehat{\mathbf{y}} = X\mathbf{b}$ is the point in $\mathcal{V}$ that is \emph{closest} to the data vector $\mathbf{y}$. Geometrically, $\widehat{\mathbf{y}}$ is the \emph{projection} (shadow) of $\mathbf{y}$ onto $\mathcal{V}$. The hat matrix $H$ is a \emph{projection matrix}: it projects any point in $\mathbb{R}^{n}$ onto $\mathcal{V}$. Now we will test out several consequences of this idea. (A numerical sketch, not part of the assignment, follows part~(c).)
\begin{enumerate}
\item The shadow of a point already in $\mathcal{V}$ should be right at the point itself. Show that if $\mathbf{v} \in \mathcal{V}$, then $H\mathbf{v}= \mathbf{v}$.
\item The vector of differences $\mathbf{e} = \mathbf{y} - \widehat{\mathbf{y}}$ should be perpendicular (at right angles) to each and every basis vector of $\mathcal{V}$. How is this related to Theorem 2.1?
\item Show that the vector of residuals $\mathbf{e}$ is perpendicular to any $\mathbf{v} \in \mathcal{V}$.
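Not part of the assignment: a minimal numerical sketch (Python with NumPy, made-up numbers) of the projection picture described above. For one invented design matrix it checks that $H$ leaves points of $\mathcal{V}$ where they are and that $\mathbf{e}$ is perpendicular to $\mathcal{V}$. It proves nothing, but it may make the geometry easier to visualize.
\begin{verbatim}
# A numerical sketch of the projection picture; illustration only.
import numpy as np

n, k = 6, 2
rng = np.random.default_rng(17)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat (projection) matrix
yhat = H @ y                          # projection of y onto V
e = y - yhat                          # residual vector

a = rng.normal(size=k + 1)
v = X @ a                             # an arbitrary point in V
print(np.allclose(H @ v, v))          # H leaves points of V alone
print(np.allclose(X.T @ e, 0))        # e is perpendicular to each column of X
print(np.isclose(v @ e, 0))           # ... and hence to any v = Xa in V
\end{verbatim}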
\end{enumerate}
\end{enumerate}

%\vspace{60mm}
\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f17}{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f17}}

\end{document}

% \item What are the dimensions of the matrix ?
% \item What is $E()$? Show your work.
% \item What is $cov()$? Show your work.