% 302f20Assignment4.tex
\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue,
citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f20 Assignment Four}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f20}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f20}}} }
\vspace{1 mm}
\end{center}

\noindent The following problems are not to be handed in. They are preparation for the Quiz on Oct.~8th during tutorial, and for the final exam. Please try them before looking at the answers. Use the formula sheet.

\begin{enumerate}

\item Independently for $i=1, \ldots, n$, let $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$, where the $\beta_j$ are unknown constants, the $x_{ij}$ are known, observable constants, and the $\epsilon_i$ are unobservable random variables with expected value zero. Of course, values of the dependent variable $y_i$ are observable. Start deriving the least squares estimates of $\beta_0$, $\beta_1$ and $\beta_2$ by minimizing the sum of squared differences between the $y_i$ and their expected values. I say \emph{start} because you don't have to finish the job. Stop when you have three linear equations in three unknowns, arranged so they are clearly the so-called ``normal'' equations $\mathbf{X}^\prime \mathbf{X}\boldsymbol{\beta} = \mathbf{X}^\prime\mathbf{y}$.

\item Assuming $(\mathbf{X}^\prime \mathbf{X})^{-1}$ exists, solve the normal equations for the general case of $k$ predictor variables, obtaining $\widehat{\boldsymbol{\beta}}$.

\item \label{xbarybar} For the regression model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$ etc.,
\begin{enumerate}
\item Differentiate and simplify to obtain the first normal equation.
\item Realizing that the least-squares estimates must satisfy this equation, put hats on the $\beta_j$ parameters.
\item Defining ``predicted'' $y_i$ as $\widehat{y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 x_{i1} + \cdots + \widehat{\beta}_k x_{ik}$, show that $\sum_{i=1}^n \widehat{y}_i = \sum_{i=1}^n y_i$.
\item The \emph{residual} for observation $i$ is defined by $\widehat{\epsilon}_i = y_i - \widehat{y}_i$. Show that the sum of residuals equals exactly zero.
\item What is $\widehat{y}$ when $x_1 = \overline{x}_1, x_2 = \overline{x}_2, \ldots, x_k = \overline{x}_k$? Show your work.
\item Thus, the least squares plane passes through the point $(\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_k, \underline{~~~~})$. Fill in the blank. You have shown that predicted $y$ for average $x$ values is exactly average $y$, and this fact does not depend upon the data at all.
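If you would like a quick numerical sanity check of parts (c) through (f), the following illustrative Python/\texttt{numpy} sketch (optional, with made-up data values, and no substitute for the algebra) confirms that the residuals sum to zero and that the fitted plane passes through the point of means.
\begin{verbatim}
import numpy as np
# Made-up data set: n = 5, k = 2 predictors, first column of ones
X = np.array([[1, 2, 1],
              [1, 4, 0],
              [1, 6, 3],
              [1, 5, 2],
              [1, 8, 4]], dtype=float)
y = np.array([3, 7, 5, 8, 9], dtype=float)
betahat = np.linalg.solve(X.T @ X, X.T @ y)        # solve the normal equations
yhat = X @ betahat
ehat = y - yhat
print(ehat.sum())                                  # essentially zero
print(yhat.mean(), y.mean())                       # equal
xbar = X[:, 1:].mean(axis=0)
print(betahat[0] + xbar @ betahat[1:], y.mean())   # equal: the plane passes
                                                   # through the means
\end{verbatim}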
\end{enumerate}

\item For the general regression model of Question~\ref{xbarybar}, show that $SST = SSR+SSE$; see the formula sheet for definitions. I find it helpful to switch to matrix notation partway through the calculation.

\item It is possible to think of the total variation in the $y_i$ not as variation around $\overline{y}$, but as variation around zero. This would make sense if the $y_i$ were differences, like weight loss or increase in profits. Then, variation of $y_i$ around zero can be split into variation of $y_i$ around $\overline{y}$, plus variation of $\overline{y}$ around zero.
\begin{enumerate}
\item Prove $\sum_{i=1}^n(y_i-0)^2 = \sum_{i=1}^n(y_i-\overline{y})^2 + \sum_{i=1}^n(\overline{y}-0)^2$.
\item Propose a version of $R^2$ for this setting.
\end{enumerate}

~ \vspace{5mm}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item % Centered simple regression, lifted and cut down from 2101f19 Assignment 1
In the \emph{centered} linear regression model, sample means are subtracted from the explanatory variables, so that values above average are positive and values below average are negative. Here is a version with one explanatory variable. For $i=1, \ldots, n$, let $y_i = \beta_0 + \beta_1(x_i-\overline{x}) + \epsilon_i$, where the $x_i$ values are fixed constants, and so on.
\begin{enumerate}
\item \label{centeredbetahat} Find the least squares estimates of $\beta_0$ and $\beta_1$. The answer is a pair of formulas. Show your work.
\item Because of the centering, it is possible to verify that the solution actually \emph{minimizes} the sum of squares using only single-variable second derivative tests. Do this part too.
\item In an $x,y$ scatterplot, centering $x$ just slides the cloud of points over to the left or right. Should the slope of the least squares line be affected? Comparing your answer to Question~\ref{centeredbetahat} with the formula for $\widehat{\beta}_1$ for the uncentered model on the formula sheet, what do you see?
\begin{comment}
\item Calculate $\widehat{\beta}_0$ and $\widehat{\beta}_1$ for the following data. Your answer is a pair of numbers.
% \begin{center}
~~~~~
\begin{tabular}{c|ccccc}
$x$ & 8 & 7 & 7 & 9 & 4 \\ \hline
$y$ & 9 & 13 & 9 & 8 & 6
\end{tabular}
% \end{center}
~~~~~ I get $\widehat{\beta}_1 = \frac{1}{2}$.
\end{comment}
\end{enumerate}

\item Consider the centered multiple regression model
\begin{displaymath}
y_i = \beta_0 + \beta_1 (x_{i,1}-\overline{x}_1) + \cdots + \beta_k (x_{i,k}-\overline{x}_k) + \epsilon_i
\end{displaymath}
with the usual details.
\begin{enumerate}
\item What is the least squares estimate of $\beta_0$? Show your work.
\item What is the connection to Problem~\ref{xbarybar}?
\end{enumerate}

\item \label{glm} For the general linear regression model in matrix form,
\begin{enumerate}
\item Show (there is no difference between ``show'' and ``prove'') that the matrix $\mathbf{X^\prime X}$ is symmetric. You may use without proof the fact that the transpose of an inverse is the inverse of the transpose.
\item Show that $\mathbf{X}^\prime\mathbf{X}$ is non-negative definite.
\item Show that if the columns of $\mathbf{X}$ are linearly independent, then $\mathbf{X^\prime X}$ is positive definite.
\item Show that if $\mathbf{X^\prime X}$ is positive definite, then $(\mathbf{X^\prime X})^{-1}$ exists.
\item Show that if $(\mathbf{X^\prime X})^{-1}$ exists, then the columns of $\mathbf{X}$ are linearly independent.
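These parts call for algebraic proofs, but it may help intuition to see the equivalence numerically. In the illustrative Python sketch below (made-up data), the third column of $\mathbf{X}$ is a linear combination of the first two, and $\mathbf{X}^\prime\mathbf{X}$ is then rank-deficient, so it has no inverse.
\begin{verbatim}
import numpy as np
# Made-up X whose third column equals (column 1) + 2*(column 2)
x1 = np.ones(6)
x2 = np.array([1., 2., 3., 4., 5., 6.])
X = np.column_stack([x1, x2, x1 + 2 * x2])
XtX = X.T @ X
print(np.linalg.matrix_rank(X))      # 2, not 3: columns are dependent
print(np.linalg.matrix_rank(XtX))    # also 2, so XtX has no inverse
# Dropping the redundant column restores invertibility:
X2 = X[:, :2]
print(np.linalg.inv(X2.T @ X2))      # exists
\end{verbatim}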
\end{enumerate}
This is a good problem because it establishes that the least squares estimator $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}$ exists if and only if the columns of $\mathbf{X}$ are linearly independent, meaning that no predictor variable is a linear combination of the other ones.

\item For the general linear regression model in matrix form with the columns of $\mathbf{X}$ linearly independent as usual, show that $(\mathbf{X}^\prime \mathbf{X})^{-1}$ is positive definite. You may use the existence and properties of $\boldsymbol{\Sigma}^{-1/2}$ without proof.

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item % Hat matrix
In the matrix version of the general linear regression model, $\mathbf{X}$ is $n \times (k+1)$ and $\mathbf{y}$ is $n \times 1$.
\begin{enumerate}
\item What are the dimensions of the hat matrix $\mathbf{H}$? Give the number of rows and the number of columns.
\item Show that $\mathbf{H}$ is symmetric.
\item Show that $\mathbf{H}$ is idempotent, meaning $\mathbf{H} = \mathbf{H}^2$.
\item Using $tr(\mathbf{AB})=tr(\mathbf{BA})$, find $tr(\mathbf{H})$.
\item Show that if $\mathbf{H}$ has an inverse, $\mathbf{H} = \mathbf{I}$.
\item Assuming that the columns of $\mathbf{X}$ are linearly independent (and we always do), what is the rank of $\mathbf{H}$?
\item Show that $\widehat{\mathbf{y}} = \mathbf{Hy}$.
\item Show that $\widehat{\boldsymbol{\epsilon}} = (\mathbf{I}-\mathbf{H})\mathbf{y}$.
\item Show that $\mathbf{I}-\mathbf{H}$ is symmetric.
\item Show that $\mathbf{I}-\mathbf{H}$ is idempotent.
\item What is $tr(\mathbf{I}-\mathbf{H})$?
\item Show $(\mathbf{I}-\mathbf{H})\mathbf{y} = (\mathbf{I}-\mathbf{H})\boldsymbol{\epsilon}$.
\end{enumerate}

\item \label{perpendicular} Prove that $\mathbf{X}^{\prime\,} \widehat{\boldsymbol{\epsilon}} = \mathbf{0}$. If the statement is false (not true in general), explain why it is false when $k>2$.

\item In all practical applications, the sample size is larger than the number of regression coefficients: $n>k+1$. But suppose for once that $n=k+1$ and the columns of $\mathbf{X}$ are still linearly independent. This means that $\mathbf{X}^{-1}$ could be obtained by elementary row reduction, proving that $\mathbf{X}^{-1}$ exists. So, for this weird case,
\begin{enumerate}
\item What is $\widehat{\boldsymbol{\beta}}$?
\item What is $\mathbf{H}$?
\item What is $\widehat{\mathbf{y}}$?
\item What is $\widehat{\boldsymbol{\epsilon}}$?
\item How do you know that all the points are exactly on the best-fitting plane?
\item For simple regression with an intercept, what is $n$?
\item Are all the points exactly on the least squares line?
\end{enumerate}

\item \label{nocalc} Returning to the matrix version of the linear model and writing $Q(\boldsymbol{\beta}) = (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^\prime (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$,
\begin{enumerate}
\item Show that $Q(\boldsymbol{\beta}) = \widehat{\boldsymbol{\epsilon}}^{\,\prime} \, \widehat{\boldsymbol{\epsilon}} + (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})^\prime (\mathbf{X^\prime X}) (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})$.
\item Why does this imply that the minimum of $Q(\boldsymbol{\beta})$ occurs at $\boldsymbol{\beta} = \widehat{\boldsymbol{\beta}}$?
\item The columns of $\mathbf{X}$ are linearly independent. Why does linear independence guarantee that the minimum is unique?
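If you want to see the decomposition in part (a) at work numerically (optional; an illustrative Python sketch with made-up data and an arbitrary seed), the increase in $Q$ away from $\widehat{\boldsymbol{\beta}}$ matches the quadratic form exactly:
\begin{verbatim}
import numpy as np
rng = np.random.default_rng(302)              # arbitrary seed, made-up data
n, k = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
betahat = np.linalg.solve(X.T @ X, X.T @ y)
Q = lambda b: (y - X @ b) @ (y - X @ b)       # sum of squares Q(beta)
beta = betahat + np.array([0.3, -0.2, 0.5])   # some other beta
lhs = Q(beta)
rhs = Q(betahat) + (betahat - beta) @ (X.T @ X) @ (betahat - beta)
print(np.isclose(lhs, rhs))                   # True: the decomposition holds
print(Q(beta) >= Q(betahat))                  # True: Q is minimized at betahat
\end{verbatim}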
\end{enumerate}

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \label{simple} ``Simple'' regression is just regression with a single predictor variable. The model equation is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Fitting this simple regression problem into the matrix framework of the general linear regression model,
\begin{enumerate}
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\end{enumerate}

\item Show that for simple regression, the proportion of explained sum of squares is the square of the correlation coefficient. That is, $R^2=\frac{SSR}{SST} = r^2$.

\item In Question~\ref{simple}, the model had an intercept and one predictor variable. But suppose the model has no intercept. This is called simple \emph{regression through the origin}. The model equation would be $y_i = \beta_1 x_i + \epsilon_i$.
\begin{enumerate}
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\item What is $\widehat{\boldsymbol{\beta}}$?
\end{enumerate}

\item There can even be a regression model with an intercept but no predictor variables. In this case the model equation is $y_i = \beta_0 + \epsilon_i$.
\begin{enumerate}
\item Find the least squares estimator $\widehat{\beta}_0$ with calculus.
\item Find the least squares estimator $\widehat{\beta}_0$ without calculus, using Problem~\ref{nocalc} as a model.
\item What is the $\mathbf{X}$ matrix?
\item What is $\mathbf{X^\prime X}$?
\item What is $\mathbf{X^\prime y}$?
\item What is $(\mathbf{X^\prime X})^{-1}$?
\item Verify that your expression for $\widehat{\beta}_0$ agrees with $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}$.
\item What is $\widehat{\mathbf{y}}$? What are its dimensions?
\end{enumerate}

\item For the general linear regression model,
\begin{enumerate}
\item Show that $s^2 = \frac{\widehat{\boldsymbol{\epsilon}}^{\,\prime \,} \widehat{\boldsymbol{\epsilon}}}{n-k-1}$ is an unbiased estimator of $\sigma^2$.
\item What is the connection of this $s^2$ to the usual $s^2$?
\end{enumerate}

\end{enumerate}

%\vspace{60mm}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Next time
% Decomposition of SS, and R^2
% Estimating sigma-squared.

\item For the general linear regression model in matrix form, find $E(\mathbf{y})$ and $cov(\mathbf{y})$. Show your work.

% Maximum likelihood HW: sigma^2 as part b.

% Next time, a series of these:
% \item What are the dimensions of the matrix ?
% \item What is $E()$? Show your work.
% \item What is $cov()$? Show your work.

\item What are the dimensions of the matrix $\widehat{\boldsymbol{\epsilon}}$?

\item What is $E(\widehat{\boldsymbol{\epsilon}})$? Show your work. Is $\widehat{\boldsymbol{\epsilon}}$ an unbiased estimator of $\boldsymbol{\epsilon}$? This is a trick question, and requires thought.

\item What is $cov(\widehat{\boldsymbol{\epsilon}})$? Show your work. It is easier if you use $\mathbf{I}-\mathbf{H}$.

\item What are the dimensions of the random vector $\mathbf{b}$ as defined in Expression (2.9)? Give the number of rows and the number of columns.
\item Is $\mathbf{b}$ an unbiased estimator of $\boldsymbol{\beta}$? Answer Yes or No and show your work.

\item Calculate $cov(\mathbf{b})$ and simplify. Show your work.

\item What are the dimensions of the random vector $\widehat{\mathbf{y}}$?

\item What is $E(\widehat{\mathbf{y}})$? Show your work.

\item What is $cov(\widehat{\mathbf{y}})$? Show your work. It is easier if you use $H$.

\item What are the dimensions of the random vector $\mathbf{e}$?

\item What is $E(\mathbf{e})$? Show your work. Is $\mathbf{e}$ an unbiased estimator of $\boldsymbol{\epsilon}$? This is a trick question, and requires thought.

\item What is $cov(\mathbf{e})$? Show your work. It is easier if you use $I-H$.

\item Let $s^2 = \mathbf{e}^\prime\mathbf{e}/(n-k-1)$ as in Expression (2.33). Show that $s^2$ is an unbiased estimator of $\sigma^2$. The way this was done in lecture is preferable to the way it is done in the text, in my opinion.

\item The set of vectors $\mathcal{V} = \{\mathbf{v} = X\mathbf{a}: \mathbf{a} \in \mathbb{R}^{k+1}\}$ is the subset of $\mathbb{R}^{n}$ consisting of all linear combinations of the columns of $X$. That is, $\mathcal{V}$ is the space \emph{spanned} by the columns of $X$. The least squares estimator $\mathbf{b} = (X^\prime X)^{-1}X^\prime\mathbf{y}$ was obtained by minimizing $(\mathbf{y}-X\mathbf{a})^\prime(\mathbf{y}-X\mathbf{a})$ over all $\mathbf{a} \in \mathbb{R}^{k+1}$. Thus, $\widehat{\mathbf{y}} = X\mathbf{b}$ is the point in $\mathcal{V}$ that is \emph{closest} to the data vector $\mathbf{y}$. Geometrically, $\widehat{\mathbf{y}}$ is the \emph{projection} (shadow) of $\mathbf{y}$ onto $\mathcal{V}$. The hat matrix $H$ is a \emph{projection matrix}: it projects any point in $\mathbb{R}^{n}$ onto $\mathcal{V}$. Now we will test out several consequences of this idea.
\begin{enumerate}
\item The shadow of a point already in $\mathcal{V}$ should be right at the point itself. Show that if $\mathbf{v} \in \mathcal{V}$, then $H\mathbf{v}= \mathbf{v}$.
\item The vector of differences $\mathbf{e} = \mathbf{y} - \widehat{\mathbf{y}}$ should be perpendicular (at right angles) to each and every basis vector of $\mathcal{V}$. How is this related to Theorem 2.1?
\item Show that the vector of residuals $\mathbf{e}$ is perpendicular to any $\mathbf{v} \in \mathcal{V}$.
\end{enumerate}
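% Optional numerical illustration of the projection idea (illustrative Python
% sketch with made-up data and an arbitrary seed, not part of any assignment):
% H leaves vectors in the column space of X alone, and the residual vector is
% orthogonal to every column of X.
\begin{verbatim}
import numpy as np
rng = np.random.default_rng(2020)        # arbitrary seed, made-up data
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
v = X @ np.array([1.0, -2.0, 0.5])       # a point in the column space of X
print(np.allclose(H @ v, v))             # True: Hv = v
e = (np.eye(n) - H) @ y                  # residual vector
print(np.allclose(X.T @ e, 0))           # True: e is perpendicular to
                                         # each column of X
\end{verbatim}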