\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts} % for \mathbb{R} The set of reals
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f13 Assignment Five}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent These problems are preparation for the quiz in tutorial on Friday October 17th, and are not to be handed in.
%
For reference, the general linear model in matrix form is $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{X}$ is an $n \times (k+1)$ matrix of observable constants, the columns of $\mathbf{X}$ are linearly independent, $\boldsymbol{\beta}$ is a $(k+1) \times 1$ vector of unknown constants (parameters), and $\boldsymbol{\epsilon}$ is an $n \times 1$ vector of unobservable random variables with $E(\boldsymbol{\epsilon})=\mathbf{0}$ and $cov(\boldsymbol{\epsilon})=\sigma^2\mathbf{I}_n$, where $\sigma^2>0$ is an unknown constant parameter. The least squares estimator of $\boldsymbol{\beta}$ is $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{Y}$.

\begin{enumerate}

\item The ``hat'' matrix is given by $\mathbf{H} = \mathbf{X}(\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime$. It's called the hat matrix because it puts a hat on $\mathbf{Y}$: $\mathbf{HY} = \widehat{\mathbf{Y}}$. The hat matrix is special.
\begin{enumerate}
\item What are the dimensions (number of rows and columns) of the hat matrix?
\item Show that the hat matrix is symmetric.
\item Show that the hat matrix is \emph{idempotent}, meaning $\mathbf{H}^2 = \mathbf{H}$.
\item Show that $(\mathbf{I}-\mathbf{H})$ is also symmetric and idempotent.
\item Write $\widehat{\boldsymbol{\epsilon}}$ in terms of the hat matrix (it's a function of $\mathbf{I}-\mathbf{H}$).
\item Write $SSE$ in terms of the hat matrix. Simplify.
\item From the last assignment, recall that the subset of $\mathbb{R}^n$ spanned by the columns of the $\mathbf{X}$ matrix is $\mathcal{V} = \{\mathbf{v} = \mathbf{Xb}: \mathbf{b} \in \mathbb{R}^{k+1}\}$. Also recall that $\widehat{\mathbf{Y}}$, being the closest point in $\mathcal{V}$ to the data vector $\mathbf{Y}$, is the orthogonal projection of $\mathbf{Y}$ onto $\mathcal{V}$. Since $\mathbf{Y}$ could be any point in $\mathbb{R}^n$, multiplication by the hat matrix $\mathbf{H}$ is the operation that projects any point in $\mathbb{R}^n$ onto $\mathcal{V}$. It's like the light bulb above the point that you turn on in order to cast a shadow onto $\mathcal{V}$. All this talk implies that if a point is already in $\mathcal{V}$, its shadow is the point itself. Verify that $\mathbf{Hv} = \mathbf{v}$ for any $\mathbf{v} \in \mathcal{V}$.
\end{enumerate}
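If you would like to see these properties numerically before proving them (this is optional and not part of the assignment), here is one way to do it in R. The $x$ values and the coefficients in $\mathbf{b}$ below are made up; any small design matrix would do.
\begin{verbatim}
# Optional numerical check of hat matrix properties, using made-up data
x = c(1, 8, 3, 6, 4, 7)                # arbitrary predictor values
X = cbind(1, x)                        # n x (k+1) design matrix with intercept
H = X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
max(abs(H - t(H)))                     # symmetric: should be essentially zero
max(abs(H %*% H - H))                  # idempotent: should be essentially zero
v = X %*% c(2, -1)                     # a point v = Xb in the column space V
max(abs(H %*% v - v))                  # Hv = v: should be essentially zero
\end{verbatim}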
\item \label{nox} The first parts of this question were in Assignment One. Let $Y_1, \ldots, Y_n$ be independent random variables with $E(Y_i)=\mu$ and $Var(Y_i)=\sigma^2$ for $i=1, \ldots, n$.
\begin{enumerate}
\item Write down $E(\overline{Y})$ and $Var(\overline{Y})$.
\item Let $c_1, \ldots, c_n$ be constants and define the linear combination $L$ by $L = \sum_{i=1}^n c_i Y_i$. What condition on the $c_i$ values makes $L$ an unbiased estimator of $\mu$? Recall that $L$ being unbiased means that $E(L)=\mu$ for \emph{all} real $\mu$. Treat the cases $\mu=0$ and $\mu \neq 0$ separately.
\item Is $\overline{Y}$ a special case of $L$? If so, what are the $c_i$ values?
\item What is $Var(L)$?
\item Now show that $Var(\overline{Y}) < Var(L)$ for every unbiased $L \neq \overline{Y}$. Hint: Add and subtract $\frac{1}{n}$.
\end{enumerate}

\newpage
\item \label{GM} For the general linear model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, suppose we want to estimate the linear combination $\mathbf{a}^\prime\boldsymbol{\beta}$ based on sample data. The Gauss-Markov Theorem tells us that the most natural choice is also (in a sense) the best choice. This question leads you through the proof of the Gauss-Markov Theorem. Your class notes should help. Also see your solution of Question~\ref{nox}.
\begin{enumerate}
\item What is the most natural choice for estimating $\mathbf{a}^\prime\boldsymbol{\beta}$?
\item Show that it's unbiased.
\item The natural estimator is a \emph{linear} unbiased estimator of the form $\mathbf{c}_0^\prime \mathbf{Y}$. What is the $n \times 1$ vector $\mathbf{c}_0$?
\item Of course there are lots of other possible linear unbiased estimators of $\mathbf{a}^\prime\boldsymbol{\beta}$. They are all of the form $\mathbf{c}^\prime \mathbf{Y}$; the natural estimator $\mathbf{c}_0^\prime \mathbf{Y}$ is just one of these. The best one is the one with the smallest variance, because its distribution is the most concentrated around the right answer. What is $Var(\mathbf{c}^\prime \mathbf{Y})$? Show your work.
\item We insist that $\mathbf{c}^\prime \mathbf{Y}$ be unbiased. Show that if $E(\mathbf{c}^\prime \mathbf{Y}) = \mathbf{a}^\prime\boldsymbol{\beta}$ for \emph{all} $\boldsymbol{\beta} \in \mathbb{R}^{k+1}$, we must have $\mathbf{X}^\prime\mathbf{c} = \mathbf{a}$.
\item So, the task is to minimize $Var(\mathbf{c}^\prime \mathbf{Y})$ by minimizing $\mathbf{c}^\prime\mathbf{c}$ over all $\mathbf{c}$ subject to the constraint $\mathbf{X}^\prime\mathbf{c} = \mathbf{a}$. As preparation for this, show $(\mathbf{c}-\mathbf{c}_0)^\prime\mathbf{c}_0 = 0$.
\item Using the result of the preceding question, show
\begin{displaymath}
\mathbf{c}^\prime\mathbf{c} = (\mathbf{c}-\mathbf{c}_0)^\prime(\mathbf{c}-\mathbf{c}_0) + \mathbf{c}_0^\prime\mathbf{c}_0.
\end{displaymath}
\item Since the formula for $\mathbf{c}_0$ does not involve $\mathbf{c}$, what choice of $\mathbf{c}$ minimizes the preceding expression? How do you know that the minimum is unique?
\end{enumerate}
The conclusion is that $\mathbf{c}_0^\prime \mathbf{Y} = \mathbf{a}^\prime\widehat{\boldsymbol{\beta}}$ is the Best Linear Unbiased Estimator (BLUE) of $\mathbf{a}^\prime\boldsymbol{\beta}$.
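Here is a concrete instance of the theorem, just for perspective (it is not one of the questions). Take $\mathbf{X} = \mathbf{1}$, an $n \times 1$ column of ones, so that $k=0$, the parameter vector $\boldsymbol{\beta}$ is just the single constant $\mu$, and the setting is essentially that of Question~\ref{nox}. With $\mathbf{a} = (1)$, the target is $\mathbf{a}^\prime\boldsymbol{\beta} = \mu$, and the least squares estimator is
\begin{displaymath}
\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{Y}
= \frac{1}{n}\sum_{i=1}^n Y_i = \overline{Y},
\end{displaymath}
so in this special case the Gauss-Markov conclusion says that no linear unbiased estimator of $\mu$ has a smaller variance than $\overline{Y}$, which is exactly what you show in Question~\ref{nox}.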
\item The model for simple regression through the origin is $Y_i = \beta x_i + \epsilon_i$, where $\epsilon_1, \ldots, \epsilon_n$ are independent with expected value $0$ and variance $\sigma^2$. In previous homework, you found the least squares estimator of $\beta$ to be $\widehat{\beta} = \frac{\sum_{i=1}^n x_iY_i}{\sum_{i=1}^n x_i^2}$.
\begin{enumerate}
\item What is $Var(\widehat{\beta})$?
\item Let $\widehat{\beta}_2 = \frac{\overline{Y}_n}{\overline{x}_n}$.
\begin{enumerate}
\item Is $\widehat{\beta}_2$ an unbiased estimator of $\beta$? Answer Yes or No and show your work.
\item Is $\widehat{\beta}_2$ a linear combination of the $Y_i$ variables, of the form $L = \sum_{i=1}^n c_i Y_i$? If so, what is $c_i$?
\item What is $Var(\widehat{\beta}_2)$?
\item How do you know $Var(\widehat{\beta}) \leq Var(\widehat{\beta}_2)$? No calculations are necessary.
\end{enumerate}

\newpage
\item Let $\widehat{\beta}_3 = \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{x_i}$.
\begin{enumerate}
\item Is $\widehat{\beta}_3$ an unbiased estimator of $\beta$? Answer Yes or No and show your work.
\item Is $\widehat{\beta}_3$ a linear combination of the $Y_i$ variables, of the form $L = \sum_{i=1}^n c_i Y_i$? If so, what is $c_i$?
\item What is $Var(\widehat{\beta}_3)$?
\item How do you know $Var(\widehat{\beta}) \leq Var(\widehat{\beta}_3)$? No calculations are necessary.
\end{enumerate}
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% MVN via MGF %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Need new formula sheet!
% \pagebreak
\item The joint moment-generating function of a $p$-dimensional random vector $\mathbf{X}$ is defined as $M_{\mathbf{X}}(\mathbf{t}) = E\left(e^{\mathbf{t}^\prime \mathbf{X}} \right)$.
\begin{enumerate}
\item Let $\mathbf{Y} = \mathbf{AX}$, where $\mathbf{A}$ is a matrix of constants. Find the moment-generating function of $\mathbf{Y}$.
\item Let $\mathbf{Y} = \mathbf{X} + \mathbf{c}$, where $\mathbf{c}$ is a $p \times 1$ vector of constants. Find the moment-generating function of $\mathbf{Y}$.
\end{enumerate}

\item Let $Z_1, \ldots, Z_p \stackrel{i.i.d.}{\sim} N(0,1)$, and
\begin{displaymath}
\mathbf{Z} = \left( \begin{array}{c} Z_1 \\ \vdots \\ Z_p \end{array} \right).
\end{displaymath}
\begin{enumerate}
\item What is the joint moment-generating function of $\mathbf{Z}$? Show some work.
\item Let $\mathbf{Y} = \boldsymbol{\Sigma}^{1/2}\mathbf{Z} + \boldsymbol{\mu}$, where $\boldsymbol{\Sigma}$ is a $p \times p$ symmetric \emph{non-negative definite} matrix and $\boldsymbol{\mu} \in \mathbb{R}^p$.
\begin{enumerate}
\item What is $E(\mathbf{Y})$?
\item What is the variance-covariance matrix of $\mathbf{Y}$? Show some work.
\item What is the moment-generating function of $\mathbf{Y}$? Show your work.
\end{enumerate}
\end{enumerate}

\item We say the $p$-dimensional random vector $\mathbf{Y}$ is multivariate normal with expected value $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$, and write $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, when $\mathbf{Y}$ has moment-generating function $M_{_\mathbf{Y}}(\mathbf{t}) = e^{\mathbf{t}^\prime\boldsymbol{\mu} + \frac{1}{2} \mathbf{t}^\prime\boldsymbol{\Sigma}\mathbf{t}}$.
\begin{enumerate}
\item Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{W}=\mathbf{AY}$, where $\mathbf{A}$ is an $r \times p$ matrix of constants. What is the distribution of $\mathbf{W}$? Show your work.
\item Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{W}=\mathbf{Y}+\mathbf{c}$, where $\mathbf{c}$ is a $p \times 1$ vector of constants. What is the distribution of $\mathbf{W}$? Show your work.
\end{enumerate}
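As a quick check on this definition (again, not one of the questions), set $p=1$, $\boldsymbol{\mu} = (\mu)$ and $\boldsymbol{\Sigma} = (\sigma^2)$. The moment-generating function in the definition becomes
\begin{displaymath}
M_{Y}(t) = e^{t\mu + \frac{1}{2}\sigma^2 t^2},
\end{displaymath}
the familiar moment-generating function of the univariate $N(\mu,\sigma^2)$ distribution, so the multivariate definition agrees with the one you already know when $p=1$.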
\item Let $\mathbf{Y} \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with
\begin{displaymath}
\mathbf{Y} = \left(\begin{array}{c} Y_1 \\ Y_2 \end{array}\right) ~~~~~
\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1 \\ \mu_2 \end{array}\right) ~~~~~
\boldsymbol{\Sigma} = \left(\begin{array}{cc} \sigma^2_1 & 0 \\ 0 & \sigma^2_2 \end{array}\right).
\end{displaymath}
Using moment-generating functions, show that $Y_1$ and $Y_2$ are independent.

\item Let $\mathbf{X}= (X_1,X_2,X_3)^\prime$ be multivariate normal with
\begin{displaymath}
\boldsymbol{\mu} = \left[ \begin{array}{c} 1 \\ 0 \\ 6 \end{array} \right]
\mbox{ and }
\boldsymbol{\Sigma} = \left[ \begin{array}{c c c} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{array} \right].
\end{displaymath}
Let $Y_1=X_1+X_2$ and $Y_2=X_2+X_3$. Find the joint distribution of $Y_1$ and $Y_2$.

\item Let $X_1$ be Normal$(\mu_1, \sigma^2_1)$, and $X_2$ be Normal$(\mu_2, \sigma^2_2)$, independent of $X_1$. What is the joint distribution of $Y_1=X_1+X_2$ and $Y_2=X_1-X_2$? What is required for $Y_1$ and $Y_2$ to be independent? Hint: Use matrices.

\item Show that if $\mathbf{X} \sim N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$, with $\boldsymbol{\Sigma}$ positive definite, then $Y = (\mathbf{X}-\boldsymbol{\mu})^\prime \boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})$ has a chi-square distribution with $p$ degrees of freedom.

\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution.
\begin{enumerate}
\item Show $Cov(\overline{X},(X_j-\overline{X}))=0$ for $j=1, \ldots, n$.
\item Show that $\overline{X}$ and $S^2$ are independent.
\item Show that
\begin{displaymath}
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),
\end{displaymath}
where $S^2 = \frac{\sum_{i=1}^n\left(X_i-\overline{X} \right)^2 }{n-1}$. Hint: $\sum_{i=1}^n\left(X_i-\mu \right)^2 = \sum_{i=1}^n\left(X_i-\overline{X} + \overline{X} - \mu \right)^2 = \ldots$
\end{enumerate}

\item Recall the definition of the $t$ distribution. If $Z\sim N(0,1)$, $W \sim \chi^2(\nu)$ and $Z$ and $W$ are independent, then $T = \frac{Z}{\sqrt{W/\nu}}$ is said to have a $t$ distribution with $\nu$ degrees of freedom, and we write $T \sim t(\nu)$. As in the last question, let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Show that $T = \frac{\sqrt{n}(\overline{X}-\mu)}{S} \sim t(n-1)$.

\end{enumerate}

\vspace{20mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f14}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f14}}

\end{document}