\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts} % for \mathbb{R} The set of reals
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f13 Assignment Twelve}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent These questions are practice for the final exam, and are not to be handed in. Material like this may or may not be on the final. This assignment explores the \emph{consistency} of estimated regression coefficients when independent variables are missing from the model or measured with error. Roughly speaking, an estimator is consistent if it converges to the parameter it is estimating as the sample size $n \rightarrow \infty$. This kind of large-sample accuracy is pretty much the least you can ask.

\begin{enumerate}

\item In the following regression model, the independent variables $X_1$ and $X_2$ are random variables. The true model is
\begin{displaymath}
Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i,
\end{displaymath}
independently for $i= 1, \ldots, n$, where $\epsilon_i \sim N(0,\sigma^2)$. The mean and covariance matrix of the independent variables are given by
\begin{displaymath}
E\left( \begin{array}{c} X_{i,1} \\ X_{i,2} \end{array} \right) =
\left( \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right)
\mbox{~~ and ~~}
cov\left( \begin{array}{c} X_{i,1} \\ X_{i,2} \end{array} \right) =
\left( \begin{array}{rr} \phi_{11} & \phi_{12} \\
                         \phi_{12} & \phi_{22} \end{array} \right).
\end{displaymath}
Unfortunately $X_{i,2}$, which has an impact on $Y_i$ and is correlated with $X_{i,1}$, is not part of the data set. Since $X_{i,2}$ is not observed, it is absorbed by the intercept and error term, as follows.
\begin{eqnarray*}
Y_i &=& \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i \\
    &=& (\beta_0 + \beta_2\mu_2) + \beta_1 X_{i,1} + (\beta_2 X_{i,2} - \beta_2 \mu_2 + \epsilon_i) \\
    &=& \beta^\prime_0 + \beta_1 X_{i,1} + \epsilon^\prime_i.
\end{eqnarray*}
The primes just denote a new $\beta_0$ and a new $\epsilon_i$. It was necessary to add and subtract $\beta_2 \mu_2$ in order to obtain $E(\epsilon^\prime_i)=0$. And of course there could be more than one omitted variable. They would all get swallowed by the intercept and error term, the garbage bins of regression analysis.
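Just to spell out that last point (a remark, not one of the questions below): since $E(X_{i,2})=\mu_2$ and $E(\epsilon_i)=0$, linearity of expectation gives
\begin{displaymath}
E(\epsilon^\prime_i) = E\left(\beta_2 X_{i,2} - \beta_2 \mu_2 + \epsilon_i\right)
                     = \beta_2 \mu_2 - \beta_2 \mu_2 + 0 = 0.
\end{displaymath}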
\begin{enumerate}
\item What is $Cov(X_{i,1},\epsilon^\prime_i)$?
\item Calculate the variance-covariance matrix of $(X_{i,1},Y_i)$ under the true model. Is it possible to have non-zero covariance between $X_{i,1}$ and $Y_i$ when $\beta_1=0$?
\item Suppose we want to estimate $\beta_1$. The usual least squares estimator is
\begin{displaymath}
\widehat{\beta}_1 = \frac{\sum_{i=1}^n(X_{i,1}-\overline{X}_1)(Y_i-\overline{Y})}
                         {\sum_{i=1}^n(X_{i,1}-\overline{X}_1)^2}.
\end{displaymath}
You may just use this formula; you don't have to derive it. You may also use the fact that, like sample means, sample variances and covariances converge to the corresponding Greek-letter versions as $n \rightarrow \infty$ (except possibly on a set of probability zero) like ordinary limits, and all the usual rules of limits apply. So for example, defining $\widehat{\sigma}_{xy}$ as $\frac{1}{n-1}\sum_{i=1}^n(X_{i,1}-\overline{X}_1)(Y_i-\overline{Y})$, we have $\widehat{\sigma}_{xy} \rightarrow Cov(X_{i,1},Y_i)$.

So finally, here is the question. As $n \rightarrow \infty$, does $\widehat{\beta}_1 \rightarrow \beta_1$? Show your work.
\end{enumerate}

\item Consider simple regression through the origin, in which the independent variable values are random variables rather than fixed constants. In addition, the independent variable values cannot be observed directly. Instead, we observe $X_i$ plus a piece of random noise. The model is this: Independently for $i=1, \ldots, n$, let
\begin{eqnarray} \label{witherror}
Y_i & = & X_i \beta + \epsilon_i \\
W_i & = & X_i + e_i, \nonumber
\end{eqnarray}
where
\begin{itemize}
\item $X_i$ has expected value $\mu$ and variance $\sigma^2_x$,
\item $\epsilon_i$ has expected value $0$ and variance $\sigma^2_\epsilon$,
\item $e_i$ has expected value $0$ and variance $\sigma^2_e$, and
\item $X_i$, $e_i$ and $\epsilon_i$ are all independent.
\end{itemize}
Again, the $X_i$ values are unavailable. All we can see are the pairs $(W_i,Y_i)$ for $i=1, \ldots, n$.
\begin{enumerate} % Starting parts of the mereg question
\item Following common practice, we ignore the measurement error and apply the usual regression estimator with $W_i$ in place of $X_i$. The parameter $\beta$ is estimated by
\begin{displaymath}
\widehat{\beta}_{(1)} = \frac{\sum_{i=1}^n W_iY_i}{\sum_{i=1}^n W_i^2}.
\end{displaymath}
Does $\widehat{\beta}_{(1)} \rightarrow \beta$? Answer Yes or No and show your work.
\item Now consider instead the estimator
\begin{displaymath}
\widehat{\beta}_{(2)} = \frac{\sum_{i=1}^n Y_i}{\sum_{i=1}^n W_i}.
\end{displaymath}
Does $\widehat{\beta}_{(2)} \rightarrow \beta$? Answer Yes or No and show your work.
\end{enumerate} % End mereg

\end{enumerate}

\vspace{40mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f13}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f13}}

\end{document}