% 302f20Assignment7.tex
\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage{euscript} % for \EuScript
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 302f20 Assignment Seven}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/302f20}{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f20}}}}
\vspace{1 mm}
\end{center}

\noindent The following problems are not to be handed in. They are preparation for the Quiz in tutorial and the final exam. Please try them before looking at the answers. Use the formula sheet.
% Please remember that the R parts (Questions~\ref{sat} and~\ref{faraway}) are \emph{not group projects}. You may compare numerical answers, but do not show anyone your code or look at anyone else's.

\begin{enumerate}

\item Label each of the following statements True (meaning always true) or False (meaning not always true), and show your work or explain. Assume the general linear regression model with normal error terms. As usual, the columns of $\mathbf{X}$ are linearly independent.
\begin{enumerate}
\item $\widehat{\mathbf{y}} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$
\item $\mathbf{y} = \mathbf{X} \widehat{\boldsymbol{\beta}} + \widehat{\boldsymbol{\epsilon}}$
\item $\widehat{\mathbf{y}} = \mathbf{X} \widehat{\boldsymbol{\beta}} + \widehat{\boldsymbol{\epsilon}}$
\item $\mathbf{y} = \mathbf{X} \boldsymbol{\beta}$
\item $\mathbf{X}^\prime\boldsymbol{\epsilon} = \mathbf{0}$
\item $(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^\prime (\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = \boldsymbol{\epsilon}^\prime\boldsymbol{\epsilon}$
\item $\widehat{\boldsymbol{\epsilon}}^\prime \, \widehat{\boldsymbol{\epsilon}} = \mathbf{0}$
\item $\widehat{\boldsymbol{\epsilon}}^\prime \, \widehat{\boldsymbol{\epsilon}} = \mathbf{y}^\prime \, \widehat{\boldsymbol{\epsilon}}$
\item $w = \frac{\boldsymbol{\epsilon}^\prime\boldsymbol{\epsilon}}{\sigma^2}$ has a chi-squared distribution.
\item $E(\boldsymbol{\epsilon}^\prime\boldsymbol{\epsilon})=0$
\item $E(\widehat{\boldsymbol{\epsilon}}^\prime \, \widehat{\boldsymbol{\epsilon}})=0$
\end{enumerate}

\item For the general linear regression model with normal error terms,
\begin{enumerate}
\item What is the distribution of the response variable $\mathbf{y}$? Just write down the answer.
\item What is the distribution of the vector of estimated regression coefficients $\widehat{\boldsymbol{\beta}}$? Show the calculations.
\item What is the distribution of the vector of ``predicted'' values $\widehat{\mathbf{y}}$? Show the expected value and covariance matrix calculations. Express the covariance matrix in terms of the hat matrix $\mathbf{H}$.
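This is not part of the question, but if you want a numerical sanity check on your answer, here is a minimal R sketch with a made-up design matrix (all numbers are invented for illustration). It computes $\mathbf{H} = \mathbf{X}(\mathbf{X}^\prime \mathbf{X})^{-1}\mathbf{X}^\prime$ and confirms that $\mathbf{H}$ is symmetric and idempotent, which is what makes the covariance matrix of $\widehat{\mathbf{y}}$ collapse to $\sigma^2\mathbf{H}$.
\begin{verbatim}
# Sketch only: hat matrix for a made-up design matrix (n = 5, k = 2)
set.seed(302)
X = cbind(1, rnorm(5), rnorm(5))      # First column of ones for the intercept
H = X %*% solve(t(X) %*% X) %*% t(X)  # H = X(X'X)^{-1}X'
max(abs(H - t(H)))                    # Symmetric: essentially zero
max(abs(H %*% H - H))                 # Idempotent: essentially zero
# So cov(y-hat) = cov(Hy) = H (sigma^2 I) H' = sigma^2 HH' = sigma^2 H
\end{verbatim}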
\item What is the distribution of the vector of residuals $\widehat{\boldsymbol{\epsilon}}$? Show the calculations. Simplify. Express the covariance matrix in terms of the hat matrix $\mathbf{H}$.
\end{enumerate}

\item For the general linear regression model with normal error terms, show that the $n \times (k+1)$ matrix of covariances $cov(\widehat{\boldsymbol{\epsilon}},\widehat{\boldsymbol{\beta}}) = \mathbf{O}$. Why does this show that \emph{SSE} $= \widehat{\boldsymbol{\epsilon}}^{\,\prime\,}\widehat{\boldsymbol{\epsilon}}$ and $\widehat{\boldsymbol{\beta}}$ are independent?

\item Calculate $cov(\widehat{\boldsymbol{\epsilon}},\widehat{\mathbf{y}})$; show your work. Why should you have known this answer without doing the calculation, assuming normal error terms? Why does the assumption of normality matter? Is the assumption of normality necessary?
% Maybe this is done better later.
% \item For the general linear regression model with normal error terms, show that $\hat{\boldsymbol{\epsilon}}$ and $\overline{y}$ are independent.

\item What is the distribution of $\mathbf{s}_1 = \mathbf{X}^\prime\boldsymbol{\epsilon}$? Show the calculation of the expected value and variance-covariance matrix.

\item What is the distribution of $\mathbf{s}_2 = \mathbf{X}^\prime \, \widehat{\boldsymbol{\epsilon}}$?
\begin{enumerate}
\item Answer the question.
\item Show the calculation of the expected value and variance-covariance matrix.
\item Is this a surprise? Answer Yes or No.
\item What is the probability that $\mathbf{s}_2=\mathbf{0}$? The answer is a single number.
\end{enumerate}

\item In an earlier Assignment, you proved that
\begin{displaymath}
(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^\prime (\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = \widehat{\boldsymbol{\epsilon}}^\prime \, \widehat{\boldsymbol{\epsilon}} + (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})^\prime (\mathbf{X^\prime X}) (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta}).
\end{displaymath}
\begin{enumerate}
\item Since you were able to do it once, please do it again for practice. Adding and subtracting the projection $\widehat{\mathbf{y}} = \mathbf{X}\widehat{\boldsymbol{\beta}}$ is what makes it work.
\item Starting with this expression, show that \emph{SSE}$/\sigma^2 \sim \chi^2(n-k-1)$. Use the formula sheet as necessary.
\end{enumerate}

\item For the general linear regression model with normal errors, tests and confidence intervals for linear combinations of regression coefficients are very useful. Derive the appropriate $t$ distribution and some applications by following these steps. Let $\mathbf{a}$ be a $(k+1) \times 1$ vector of constants.
\begin{enumerate}
\item What is the distribution of $\mathbf{a}^\prime \widehat{\boldsymbol{\beta}}$? Show a little work. Your answer includes formulas for the parameters of the distribution.
\item Now standardize $\mathbf{a}^\prime \widehat{\boldsymbol{\beta}}$ (subtract off the mean and divide by the standard deviation) to obtain a standard normal random variable.
\item Divide by the square root of a well-chosen chi-squared random variable, divided by its degrees of freedom, and simplify. Call the result $t$.
\item How do you know that the numerator and denominator are independent?
\item Suppose you wanted to test $H_0: \mathbf{a}^\prime\boldsymbol{\beta} = c$. Write down a formula for the test statistic.
\item For a regression model with four predictor variables, suppose you wanted to test $H_0: \beta_2=0$. Give the vector $\mathbf{a}$.
\item For a regression model with four predictor variables, suppose you wanted to test $H_0: \beta_1=\frac{1}{2}(\beta_2+\beta_3)$. Give the vector $\mathbf{a}$.
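Not part of the question, but once you have $\mathbf{a}$, you can check your test statistic numerically. Here is a minimal R sketch with simulated data in which this $H_0$ is true (all parameter values are made up); the point is the last two lines, which compute $t = \mathbf{a}^\prime \widehat{\boldsymbol{\beta}} \left/ \sqrt{MSE \, \mathbf{a}^\prime (\mathbf{X}^\prime\mathbf{X})^{-1} \mathbf{a}} \right.$ and its two-sided $p$-value.
\begin{verbatim}
# Sketch only: simulate a regression with four predictors where H0 is true
set.seed(302); n = 50
X = cbind(1, matrix(rnorm(n*4), n, 4))      # n x (k+1), intercept included
beta = c(10, 2, 2, 2, 0)                    # beta1 = (beta2+beta3)/2 holds
y = X %*% beta + rnorm(n, sd = 3)
betahat = solve(t(X) %*% X) %*% t(X) %*% y
MSE = sum((y - X %*% betahat)^2) / (n - 5)  # n - k - 1 with k = 4
a = c(0, 1, -1/2, -1/2, 0)                  # H0: a'beta = 0
tstat = t(a) %*% betahat / sqrt(MSE * t(a) %*% solve(t(X) %*% X) %*% a)
2 * pt(-abs(tstat), df = n - 5)             # Two-sided p-value
\end{verbatim}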
\item Consider a data set in which there are $n$ first-year students in ECO100. $x_1$ is High School Calculus mark, $x_2$ is High School grade point average, $x_3$ is score on a test of general mathematical knowledge, and $y$ is mark in ECO100. You seek to estimate the expected mark for a student with a 91\% in High School Calculus, a High School GPA of 83\%, and 24 out of 25 on the test. You are estimating $\mathbf{a}^\prime\boldsymbol{\beta}$. Give the vector $\mathbf{a}$.
\item Letting $t_{\alpha/2}$ denote the point cutting off the top $\alpha/2$ of the $t$ distribution with $n-k-1$ degrees of freedom, derive the $(1-\alpha) \times 100\%$ confidence interval for $\mathbf{a}^\prime\boldsymbol{\beta}$. ``Derive'' means show the High School algebra.
\end{enumerate}

\item In Question 17 of Assignment 4, you considered a regression model with no predictor variables. If the errors are normal, this is the same as sampling $y_1, \ldots, y_n$ from a normal distribution with expected value $\mu=\beta_0$ and variance $\sigma^2$.
\begin{enumerate}
\item What is \emph{MSE} for this problem? Show some work. Feel free to use your results from Assignment~4.
\item Show that with no predictor variables, the confidence interval for $\beta_0$ is just the usual confidence interval for $\mu$. You may use your answer to Question~3 of Assignment~2.
\end{enumerate}
% In Assignment 2,

% General linear test
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item For the general linear model with normal errors,
\begin{enumerate}
\item Let $\mathbf{C}$ be a $q \times (k+1)$ matrix of constants with linearly independent rows. What is the distribution of $\mathbf{C}\widehat{\boldsymbol{\beta}}$?
\item If $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{t}$ is true, what is the distribution of $\frac{1}{\sigma^2} (\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t})^\prime \left(\mathbf{C}(\mathbf{X}^\prime \mathbf{X})^{-1}\mathbf{C}^\prime\right)^{-1} (\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t})$? Please locate support for your answer on the formula sheet. For full marks, don't forget the degrees of freedom.
\item What is the distribution of \emph{SSE}$/\sigma^2$
\begin{enumerate}
\item if $H_0$ is true?
\item if $H_0$ is false?
\end{enumerate}
\item Form the $F$ ratio for testing $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{t}$.
\begin{enumerate}
\item Write down the formula. Simplify.
\item How do you know that the numerator and denominator are independent?
\item What is the distribution of the test statistic if the null hypothesis is true?
\end{enumerate}
\end{enumerate}

\item \label{tsq} Suppose you wish to test the null hypothesis that a \emph{single} linear combination of regression coefficients is equal to a scalar constant $t_0$. That is, you want to test $H_0: \mathbf{a}^\prime\boldsymbol{\beta} = t_0$. Referring to the formula sheet, verify that $F=t^2$. Show your work.
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item The exact way that you express a linear null hypothesis does not matter. Let $\mathbf{A}$ be a $q \times q$ nonsingular matrix (meaning $\mathbf{A}^{-1}$ exists), so that $\mathbf{C}\boldsymbol{\beta} = \mathbf{t}$ if and only if $\mathbf{AC}\boldsymbol{\beta} = \mathbf{At}$. This is a useful way to express a logically equivalent linear null hypothesis. Show that the general linear test statistic $F^*$ for testing $H_0: \mathbf{AC}\boldsymbol{\beta} = \mathbf{At}$ is the same as the one for testing $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{t}$.
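Again not part of the question, but here is a minimal R sketch of the general linear test, with all numbers made up. It computes $F^* = (\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t})^\prime \left(\mathbf{C}(\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{C}^\prime\right)^{-1} (\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t}) / (q \, MSE)$ directly. Letting $\mathbf{C}$ have a single row as below, you can verify Question~\ref{tsq} numerically by comparing $F^*$ with $t^2$.
\begin{verbatim}
# Sketch only: F* = (Cb - t)'(C(X'X)^{-1}C')^{-1}(Cb - t) / (q MSE)
set.seed(302); n = 50; k = 4
X = cbind(1, matrix(rnorm(n*k), n, k))
y = X %*% c(10, 2, 2, 2, 0) + rnorm(n, sd = 3)  # Made-up true beta
XtXinv = solve(t(X) %*% X)
betahat = XtXinv %*% t(X) %*% y
MSE = sum((y - X %*% betahat)^2) / (n - k - 1)
C = rbind(c(0, 1, -1/2, -1/2, 0))               # One row, so q = 1
t0 = 0; q = nrow(C)
Fstar = t(C %*% betahat - t0) %*% solve(C %*% XtXinv %*% t(C)) %*%
        (C %*% betahat - t0) / (q * MSE)
Fstar                       # With q = 1, this equals the t statistic squared
\end{verbatim}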
\item In the full-reduced approach to testing linear null hypotheses, there are several formulas for the test statistic $F^*$, all connected by a bit of High School algebra.
\begin{enumerate}
\item Show $\frac{\left(R^2(\mbox{\footnotesize\emph{full}}) - R^2(\mbox{\footnotesize\emph{reduced}})\right)/q} {\left(1-R^2(\mbox{\footnotesize\emph{full}})\right)/(n-k-1)} = \frac{\mbox{\footnotesize\emph{SSR(full)$-$SSR(reduced)}}} {\mbox{\footnotesize\emph{q MSE(full)}}}$.
\item Show $\frac{\mbox{\footnotesize\emph{SSR(full)$-$SSR(reduced)}}} {\mbox{\footnotesize\emph{q MSE(full)}}} = \frac{\mbox{\footnotesize\emph{SSE(reduced)$-$SSE(full)}}} {\mbox{\footnotesize\emph{q MSE(full)}}}$.
\item Show $\frac{\mbox{\footnotesize\emph{SSR(full)$-$SSR(reduced)}}} {\mbox{\footnotesize\emph{q MSE(full)}}} = \left( \frac{n-k-1}{q} \right) \left( \frac{p}{1-p} \right)$, where $p = \frac{R^2(\mbox{\footnotesize\emph{full}}) - R^2(\mbox{\footnotesize\emph{reduced}})} {1-R^2(\mbox{\footnotesize\emph{reduced}})}$.
\item Show $p = \frac{qF^*}{qF^*+n-k-1}$.
\end{enumerate}

% Initial test
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item For the general linear regression model with normal error terms, we'd like to show that if the model has an intercept, then $\widehat{\boldsymbol{\epsilon}}$ and $\overline{y}$ are independent. If you can show that $\overline{y}$ is a function of $\widehat{\boldsymbol{\beta}}$, you are done (why?). Here are some ingredients to start you out. For the model with an intercept,
\begin{enumerate}
\item What does $\mathbf{X}^\prime\widehat{\boldsymbol{\epsilon}} = \mathbf{0}$ tell you about $\sum_{i=1}^n \widehat{\epsilon}_i$?
\item Therefore, what do you know about $\sum_{i=1}^n y_i$ and $\sum_{i=1}^n \widehat{y}_i$?
\item Now indicate why $\widehat{\boldsymbol{\epsilon}}$ and $\overline{y}$ are independent.
\end{enumerate}

\item Examine the formulas for $SST=SSE+SSR$ on the formula sheet. How do you know that $SSR$ and $SSE$ are independent if the model has an intercept?

\item Continue assuming that the regression model has an intercept. Many statistical programs automatically provide an \emph{overall} test. The null hypothesis of this test says that none of the predictor variables makes any difference. If you can't reject that, you're in trouble. Supposing $H_0: \beta_1 = \cdots = \beta_k = 0$ is true,
\begin{enumerate}
\item What is the distribution of $y_i$?
\item What is the distribution of $SST/\sigma^2$? Just write down the answer. If necessary, check the formula sheet.
\item What is the distribution of $SSR/\sigma^2$? Use the formula sheet and show your work. Don't forget the degrees of freedom.
\item \label{Fstat} Recall the definition of the $F$ distribution: if $w_1 \sim \chi^2(\nu_1)$ and $w_2 \sim \chi^2(\nu_2)$ are independent, then $F = \frac{w_1/\nu_1}{w_2/\nu_2} \sim F(\nu_1,\nu_2)$. Show that $F^* = \frac{SSR/k}{SSE/(n-k-1)}$ has an $F$ distribution under $H_0: \beta_1 = \cdots = \beta_k = 0$. Refer to earlier results as you use them.
\item Obtain an $F^*$ test statistic for this same null hypothesis, based on the full-reduced model approach. Does it equal the $F^*$ test statistic of Question~\ref{Fstat}?
\item The null hypothesis $H_0: \beta_1 = \cdots = \beta_k = 0$ is less and less believable as $R^2$ becomes larger. Show that the $F^*$ statistic of Question~\ref{Fstat} is an increasing function of $R^2$ for fixed $n$ and $k$. This means it makes sense to reject $H_0$ for large values of $F^*$.
\end{enumerate}
% Do I put an R question here?
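Not part of the assignment, but here is a minimal R sketch of this overall test, using the built-in \texttt{trees} data from lecture. \texttt{summary} reports $F^*$ directly, comparing the full model to the intercept-only (reduced) model with \texttt{anova} gives the same number, and so does the $R^2$ formula, since $SSR/SST = R^2$ under a model with an intercept.
\begin{verbatim}
# Sketch: overall F test for the trees data (used in lecture)
full = lm(Volume ~ Girth + Height, data = trees)
reduced = lm(Volume ~ 1, data = trees)   # Intercept-only model
summary(full)$fstatistic                 # Overall F*, with k and n-k-1 df
anova(reduced, full)                     # Full vs. reduced: same F*
# Check the R^2 formula: F* = (R^2/k) / ((1-R^2)/(n-k-1))
r2 = summary(full)$r.squared; n = nrow(trees); k = 2
(r2/k) / ((1-r2)/(n-k-1))
\end{verbatim}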
\end{enumerate} % End of all the questions

%\vspace{60mm}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Removed

\item \label{computer} The \texttt{statclass} data consist of Quiz average, Computer assignment average, Midterm score and Final Exam score from a statistics class, long ago. At the R prompt, type
{\scriptsize
\begin{verbatim}
statclass = read.table("http://www.utstat.utoronto.ca/~brunner/data/legal/LittleStatclassdata.txt")
\end{verbatim}
} % End size
You now have access to the \texttt{statclass} data, just as you have access to the \texttt{trees} data set used in lecture, or any other R data set.
\begin{enumerate}
\item Calculate $\widehat{\boldsymbol{\beta}}$ two ways, with matrix commands and with the \texttt{lm} function. What is $\widehat{\beta}_2$? The answer is a number on your printout.
\item What is the predicted Final Exam score for a student with a Quiz average of 8.5, a Computer average of 5, and a Midterm mark of 60\%? The answer is a number. Be able to do this kind of thing on the quiz with a calculator. My answer is 63.84144.
\item For any fixed Quiz average and Computer average, a score one point higher on the Midterm yields a predicted mark on the Final Exam that is \underline{\hspace{10mm}} higher.
\item For any fixed Quiz average and Midterm score, a Computer average one point higher yields a predicted mark on the Final Exam that is \underline{\hspace{10mm}} higher. Or is it lower?
\end{enumerate}
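% A sketch that could accompany item (a) of the removed question above. The
% column names are an assumption: if the file has no header row, read.table
% names the columns V1-V4 (Quiz, Computer, Midterm, Final Exam, in that
% order). Check names(statclass) before trusting it.
\begin{verbatim}
statclass = read.table("http://www.utstat.utoronto.ca/~brunner/data/legal/LittleStatclassdata.txt")
y = statclass$V4                              # Final Exam mark (assumed)
X = cbind(1, statclass$V1, statclass$V2, statclass$V3)
solve(t(X) %*% X) %*% t(X) %*% y              # beta-hat by matrix commands
coef(lm(V4 ~ V1 + V2 + V3, data = statclass)) # Same numbers from lm
\end{verbatim}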