\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts} % for \mathbb{R} The set of reals
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f13 Assignment Six}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent
These problems are preparation for the quiz in tutorial on Friday October 25th, and are not to be handed in. For reference, the general linear model with normal error terms is $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, the columns of $\mathbf{X}$ are linearly independent, and $\boldsymbol{\epsilon} \sim N_n(\mathbf{0},\sigma^2\mathbf{I}_n)$.

\begin{enumerate}

\item In the general linear model, what is the distribution of $\mathbf{Y}$?

\item You know that the least squares estimate of $\boldsymbol{\beta}$ is $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime \mathbf{X})^{-1} \mathbf{X}^\prime \mathbf{Y}$. What is the distribution of $\widehat{\boldsymbol{\beta}}$? Show the calculations.

\item Let $\widehat{\mathbf{Y}}=\mathbf{X}\hat{\boldsymbol{\beta}}$. What is the distribution of $\widehat{\mathbf{Y}}$? Show the calculations.

\item Let the vector of residuals $\hat{\boldsymbol{\epsilon}} = \mathbf{Y}-\widehat{\mathbf{Y}}$. What is the distribution of $\hat{\boldsymbol{\epsilon}}$? Show the calculations. Simplify both the expected value (which is zero) and the covariance matrix.

\item Recall from an earlier homework problem that if $\mathbf{T}$ is a random vector with expected value $\boldsymbol{\mu}$, then $cov(\mathbf{T}) = E(\mathbf{TT}^\prime) - \boldsymbol{\mu\mu}^\prime$.
Using this fact, give expressions for
    \begin{enumerate}
    \item $E(\mathbf{YY}^\prime)$
    \item $E(\widehat{\boldsymbol{\beta}}\widehat{\boldsymbol{\beta}}^\prime)$
    \end{enumerate}
These may be helpful in the next question.

\item For the general linear regression model, show that the $n \times (k+1)$ matrix of covariances $C(\hat{\boldsymbol{\epsilon}},\widehat{\boldsymbol{\beta}}) = \mathbf{0} $. Why does this show that $SSE = \hat{\boldsymbol{\epsilon}}^\prime\hat{\boldsymbol{\epsilon}}$ and $\widehat{\boldsymbol{\beta}}$ are independent?

\item In Assignment 4, you proved that
\begin{displaymath}
(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^\prime (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}) = (\mathbf{Y}-\mathbf{X}\widehat{\boldsymbol{\beta}})^\prime (\mathbf{Y}-\mathbf{X}\widehat{\boldsymbol{\beta}}) + (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta})^\prime (\mathbf{X^\prime X}) (\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta}).
\end{displaymath}
Starting with this expression, show that $SSE/\sigma^2 \sim \chi^2(n-k-1)$. A result you proved in Assignment 2 will be useful.

\item For the general fixed effects linear regression model, tests and confidence intervals for linear combinations of regression coefficients are very useful. Derive the appropriate $t$ distribution and some applications by following these steps. Let $\mathbf{a}$ be a $(k+1) \times 1$ vector of constants.
    \begin{enumerate}
    \item What is the distribution of $\mathbf{a}^\prime \widehat{\boldsymbol{\beta}}$? Show a little work. Your answer includes both the expected value and the variance.
    \item Now standardize the difference (subtract off the mean and divide by the standard deviation) to obtain a standard normal.
    \item Divide by the square root of a well-chosen chi-squared random variable, divided by its degrees of freedom, and simplify. Call the result $T$.
    \item How do you know the numerator and denominator are independent?
    \item Suppose you wanted to test $H_0: \mathbf{a}^\prime\boldsymbol{\beta} = c$.
Write down a formula for the test statistic.
    \item Suppose you wanted to test $H_0: \beta_2=0$. Give the vector $\mathbf{a}$.
    \item Suppose you wanted to test $H_0: \beta_1=\beta_2$. Give the vector $\mathbf{a}$.
    \item Letting $t_{\alpha/2}$ denote the point cutting off the top $\alpha/2$ of the $t$ distribution with $n-k-1$ degrees of freedom, derive the $(1-\alpha) \times 100\%$ confidence interval for $\mathbf{a}^\prime\boldsymbol{\beta}$.
    \end{enumerate}

\item Letting $SST = \sum_{i=1}^n(Y_i-\overline{Y})^2$, $SSE = \sum_{i=1}^n(Y_i-\widehat{Y}_i)^2$ and $SSR = \sum_{i=1}^n(\widehat{Y}_i-\overline{Y})^2$, show $SST=SSR+SSE$.

\item Show that $\overline{Y}$ is a function of $\widehat{\boldsymbol{\beta}}$. Why does this establish that $SSR$ and $SSE$ are independent?

\item If $H_0: \beta_1 = \cdots = \beta_k = 0$ is true,
    \begin{enumerate}
    \item What is the distribution of $Y_i$?
    \item What is the distribution of $\frac{SST}{\sigma^2}$? Just write down the answer. You already did it in Assignment 2.
    \end{enumerate}

\item Still assuming $H_0: \beta_1 = \cdots = \beta_k = 0$ is true, what is the distribution of $SSR/\sigma^2$? Again you may use material from Assignment 2.

\item Suppose $H_0: \beta_1 = \cdots = \beta_k = 0$ were \emph{false}. Would you expect $SSR$ to be bigger, or would you expect it to be smaller? Which one, and why?

\item Recall the definition of the $F$ distribution. If $W_1 \sim \chi^2(\nu_1)$ and $W_2 \sim \chi^2(\nu_2)$ are independent, $F = \frac{W_1/\nu_1}{W_2/\nu_2} \sim F(\nu_1,\nu_2)$. How do you know $F = \frac{SSR/k}{SSE/(n-k-1)}$ has an $F$ distribution under $H_0: \beta_1 = \cdots = \beta_k = 0$? List the numbers of the questions that establish the necessary facts.

\end{enumerate}

\vspace{20mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto.
It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f13}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f13}}

\end{document}

\item Give the $(1-\alpha)\times 100\%$ prediction interval for $Y_{n+1}$.