% 302f20Assignment12.tex
\documentclass[11pt]{article}
%\usepackage{amsbsy}  % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage{euscript} % for \EuScript
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 302f20 Assignment Twelve}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/302f20}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f20}}} }
\vspace{1 mm}
\end{center}

\noindent The following problems are not to be handed in. They are preparation for the final exam.

\begin{enumerate}

\item % STA431s17
Ordinary least squares is often applied to data sets where the explanatory variables are best modeled as random variables. In what way does the usual conditional linear regression model imply that (random) explanatory variables have zero covariance with the error term? Hint: Assume that the vector of explanatory variables $\mathbf{x}_i$ and the error term $\epsilon_i$ are continuous. For the usual linear regression model with normal errors, what is the conditional distribution of $\epsilon_i$ given $\mathbf{x}_i$?

\item In a regression with one random explanatory variable, show that $E(\epsilon_i|X_i=x_i)=0$ for all $x_i$ implies $Cov(X_i,\epsilon_i)=0$, so that a standard regression model \emph{without} the normality assumption still implies zero covariance (though not necessarily independence) between the error term and the explanatory variable. For convenience, you may assume that the distributions are continuous, so you can integrate.
% Hint: the matrix version of this calculation is in the text.

\item % 2017
In the usual multiple regression model, the $\mathbf{X}$ matrix is an $n \times (k+1)$ matrix of known constants. But in practice, the independent variables are often random and not fixed. Clearly, if the model holds \emph{conditionally} upon the values of the independent variables, then all the usual results hold, again conditionally upon the particular values of the independent variables. The probabilities (for example, $p$-values) are conditional probabilities, and the usual theory gives the $F$ statistic not an unconditional $F$ distribution, but a conditional $F$ distribution, given $\mathcal{X}=\mathbf{X}$.

Here, the $n \times (k+1)$ matrix $\mathcal{X}$ is used to denote the matrix containing the random explanatory variables. It does not have to be \emph{all} random. For example, the first column might contain only ones if the model has an intercept.
\begin{enumerate}
\item Show that the least-squares estimator
$\widehat{\boldsymbol{\beta}} = (\mathcal{X}^\prime\mathcal{X})^{-1} \mathcal{X}^\prime \mathbf{y}$
is unbiased, conditionally upon $\mathcal{X} = \mathbf{X}$. You've done this before with a slightly different notation.
\item Show that $\widehat{\boldsymbol{\beta}}$ is also unbiased unconditionally.
\item A similar calculation applies to the significance level of a hypothesis test. Let $F$ be the test statistic (say for an $F$-test comparing full and reduced models), and let $f_c$ be the critical value. If the null hypothesis is true, the probability of rejecting it is $\alpha$, conditionally upon the independent variable values. That is, $P(F>f_c|\mathcal{X}=\mathbf{X})\stackrel{H_0}{=}\alpha$. Using the Law of Total Probability (see lecture slides), find the \emph{unconditional} probability of a Type I error. Assume that the explanatory variables are discrete, so you can write a multiple sum.
\end{enumerate}
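For intuition about the last part, the following minimal simulation sketch (Python, assuming \texttt{numpy} and \texttt{scipy} are available) draws a fresh random explanatory variable in every replication, carries out the $F$-test of the full versus intercept-only model with the null hypothesis true, and reports the proportion of rejections. The sample size, parameter values and number of replications are arbitrary illustrative choices, not part of the problem.
\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(302)
n, alpha, nsim = 50, 0.05, 2000
rejections = 0
for _ in range(nsim):
    x = rng.normal(10, 2, size=n)             # random explanatory variable
    y = 3 + 0 * x + rng.normal(0, 1, size=n)  # H0: beta_1 = 0 is true
    # Full model: intercept and slope.  Reduced model: intercept only.
    X_full = np.column_stack([np.ones(n), x])
    b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
    sse_full = np.sum((y - X_full @ b_full) ** 2)
    sse_red = np.sum((y - y.mean()) ** 2)
    F = (sse_red - sse_full) / (sse_full / (n - 2))
    if F > stats.f.ppf(1 - alpha, 1, n - 2):
        rejections += 1
print(rejections / nsim)   # Compare with alpha = 0.05
\end{verbatim}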
\newpage

\item % 2017
Consider the following model with random predictor variables. Independently for $i=1, \ldots, n$,
\begin{eqnarray*}
y_i &=& \alpha + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \\
    &=& \alpha + \boldsymbol{\beta}^\prime \mathbf{x}_i + \epsilon_i,
\end{eqnarray*}
where
\begin{displaymath}
\mathbf{x}_i = \left( \begin{array}{c} x_{i1} \\ \vdots \\ x_{ik} \end{array} \right)
\end{displaymath}
and $\mathbf{x}_i$ is independent of $\epsilon_i$. Note that in this notation, $\alpha$ is the intercept, and $\boldsymbol{\beta}$ does not include the intercept. The ``independent'' variables $\mathbf{x}_i = (x_{i1}, \ldots, x_{ik})^\prime$ are not statistically independent. They have the symmetric and positive definite $k \times k$ covariance matrix $\boldsymbol{\Sigma}_x = [\sigma_{ij}]$, which need not be diagonal. They also have the $k \times 1$ vector of expected values $\boldsymbol{\mu}_x = (\mu_1, \ldots, \mu_k)^\prime$.
\begin{enumerate}
% \item What is $Cov(x_{i1},x_i)$? Express your answer in terms of $\beta$ and $\sigma_{ij}$ quantities. Show your work.
\item Let $\boldsymbol{\Sigma}_{xy}$ denote the $k \times 1$ matrix of covariances between $y_i$ and $x_{ij}$ for $j=1, \ldots, k$. Calculate $\boldsymbol{\Sigma}_{xy} = cov(\mathbf{x}_i,y_i)$. Stay with matrix notation and don't expand.
%, obtaining $\boldsymbol{\Sigma}_{xy} = \boldsymbol{\Sigma}_x \boldsymbol{\beta}$.
\item From the equation you just obtained, solve for $\boldsymbol{\beta}$ in terms of $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Sigma}_{xy}$.
\item Based on your answer to the last part, and letting $\widehat{\boldsymbol{\Sigma}}_x$ and $\widehat{\boldsymbol{\Sigma}}_{xy}$ denote matrices of \emph{sample} variances and covariances, what would be a reasonable estimator of $\boldsymbol{\beta}$ that you could calculate from sample data?
% If you are not sure, check the lecture notes in which we centered $y_i$ as well as the independent variables, and fit a regression through the origin.
\end{enumerate}
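A numerical check can be reassuring here. The Python sketch below (the parameter values, and the use of \texttt{numpy}, are illustrative assumptions only) simulates a large sample from a model of this form and slices $\widehat{\boldsymbol{\Sigma}}_x$ and $\widehat{\boldsymbol{\Sigma}}_{xy}$ out of the sample covariance matrix of the data. Whatever estimator you propose in the last part can be computed from these two pieces and compared with the $\boldsymbol{\beta}$ that generated the data.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(302)
n, k = 100000, 3
alpha = 1.0
beta  = np.array([1.0, -2.0, 0.5])            # true beta (intercept not included)
mu_x  = np.array([10.0, 20.0, 30.0])
Sigma_x = np.array([[4.0, 1.0, 0.5],
                    [1.0, 9.0, 2.0],
                    [0.5, 2.0, 1.0]])          # symmetric and positive definite
x = rng.multivariate_normal(mu_x, Sigma_x, size=n)
epsilon = rng.normal(0.0, 1.0, size=n)         # independent of x
y = alpha + x @ beta + epsilon

# Sample covariance matrix of (x_1, ..., x_k, y); variables are in columns.
S = np.cov(np.column_stack([x, y]), rowvar=False)
Sigma_x_hat  = S[:k, :k]   # k x k block: sample covariance matrix of the x's
Sigma_xy_hat = S[:k,  k]   # k x 1 block: sample covariances of the x's with y
print(Sigma_x_hat)
print(Sigma_xy_hat)        # Plug these into your estimator and compare with beta
\end{verbatim}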
\item % 2017
In the following regression model, the explanatory variables $x_1$ and $x_2$ are random variables. The true model is
\begin{displaymath}
y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \epsilon_i,
\end{displaymath}
independently for $i= 1, \ldots, n$, where $\epsilon_i \sim N(0,\sigma^2)$ is independent of $x_{i,1}$ and $x_{i,2}$. The mean and covariance matrix of the explanatory variables are given by
\begin{displaymath}
E\left( \begin{array}{c} x_{i,1} \\ x_{i,2} \end{array} \right)
 = \left( \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right)
\mbox{~~ and ~~}
cov\left( \begin{array}{c} x_{i,1} \\ x_{i,2} \end{array} \right)
 = \left( \begin{array}{rr} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{array} \right).
\end{displaymath}
Unfortunately $x_{i,2}$, which has an impact on $y_i$ and is correlated with $x_{i,1}$, is not part of the data set. Since $x_{i,2}$ is not observed, it is absorbed by the intercept and error term, as follows:
\begin{eqnarray*}
y_i &=& \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \epsilon_i \\
    &=& (\beta_0 + \beta_2\mu_2) + \beta_1 x_{i,1} + (\beta_2 x_{i,2} - \beta_2 \mu_2 + \epsilon_i) \\
    &=& \beta^*_0 + \beta_1 x_{i,1} + \epsilon^*_i.
\end{eqnarray*}
It was necessary to add and subtract $\beta_2 \mu_2$ in order to obtain $E(\epsilon^*_i)=0$. And of course there could be more than one omitted variable. They would all get swallowed by the intercept and error term, the garbage bins of regression analysis.
\begin{enumerate}
\item What is $Cov(x_{i,1},\epsilon^*_i)$? This is a scalar calculation.
\item Calculate $Cov(x_{i,1},y_i)$. This is another scalar calculation. Is it possible to have non-zero covariance between $x_{i,1}$ and $y_i$ when $\beta_1=0$?
\item Suppose we want to estimate $\beta_1$ using the usual least squares estimator $\widehat{\beta}_1$ (see formula sheet). As $n \rightarrow \infty$, does $\widehat{\beta}_1 \rightarrow \beta_1$? You may use the fact that, like sample means, sample variances and covariances converge to the corresponding Greek-letter versions as $n \rightarrow \infty$ (except possibly on a set of probability zero) like ordinary limits, and all the usual rules of limits apply. So for example, defining $\widehat{\sigma}_{xy}$ as $\frac{1}{n-1}\sum_{i=1}^n(x_{i,1}-\overline{x}_1)(y_i-\overline{y})$, we have $\widehat{\sigma}_{xy} \rightarrow Cov(x_{i,1},y_i)$. A simulation sketch illustrating this part appears below.
\end{enumerate}
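Here is the simulation sketch mentioned in the last part (Python with \texttt{numpy} assumed; all parameter values are arbitrary illustrative choices). It generates correlated $x_{i,1}$ and $x_{i,2}$, computes $y_i$ from the true model, and then regresses $y_i$ on $x_{i,1}$ alone for increasing sample sizes. Watch whether the slope estimates settle down at $\beta_1$, and compare what you see with your algebraic answer.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2020)
beta0, beta1, beta2, sigma = 1.0, 1.0, 2.0, 1.0
mu  = np.array([0.0, 0.0])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])   # phi_12 = 0.5: x1 and x2 are correlated

for n in (100, 10000, 1000000):
    x = rng.multivariate_normal(mu, Phi, size=n)
    y = beta0 + beta1 * x[:, 0] + beta2 * x[:, 1] + rng.normal(0.0, sigma, size=n)
    x1 = x[:, 0]               # x2 is "not part of the data set"
    # Usual least squares slope from regressing y on x1 only
    beta1_hat = (np.sum((x1 - x1.mean()) * (y - y.mean()))
                 / np.sum((x1 - x1.mean()) ** 2))
    print(n, beta1_hat)
\end{verbatim}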
\item % Instrumental variables from 431s17
Independently for $i= 1, \ldots, n$, let $Y_i = \beta X_i + \epsilon_i$, where
\begin{displaymath}
\left( \begin{array}{c} X_i \\ \epsilon_i \end{array} \right) \sim
N_2\left( \left( \begin{array}{c} \mu_x \\ 0 \end{array} \right),
\left( \begin{array}{cc} \sigma^2_x & c \\ c & \sigma^2_\epsilon \end{array} \right) \right).
\end{displaymath}
The observable data are $\mathbf{D}_1, \ldots, \mathbf{D}_n$, where
$\mathbf{D}_i = \left( \begin{array}{c} X_i \\ Y_i \end{array} \right)$.
\begin{enumerate}
\item Draw a path diagram for this model.
\item What is the distribution of $\mathbf{D}_i$?
% Hint: If $\mathbf{w} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, what is the distribution of $\mathbf{Aw}$? What is $\mathbf{A}$ for this problem?
\item What is the vector of parameters for this model? What are their possible values?
% (That is, what is the parameter space?) Note that $c$ cannot be just anything.
% \item Give the formula for a Method of Moments estimate of $\beta$. Is it consistent? That is, does it converge in probability to the right answer \emph{everywhere} in the parameter space?
% \item For this model, showing parameter identifiability consists of solving five equations in five unknowns. What are the equations? The lecture slide entitled ``Five equations in six unknowns'' should help.
% \item For some points in the parameter space these equations can be solved, and for others they cannot. Where can they be solved? You don't have to literally give the solutions.
\item Give a numerical example of two different parameter vectors that yield the same distribution of $\mathbf{D}_i$.
% Stay in the parameter space. What are the mean vector and covariance matrix of $\mathbf{D}_i$ for your example?
% This shows that the parameter is not technically identifiable, even though things are okay in most of the parameter space. It suggests that we need a pointwise definition of parameter identifiability, which is given later in Chapter Zero.
\end{enumerate}

% Traditional notation for an instrumental variable is Z, and the path diagram is better than lecture slides and text in other ways, too.
\item For a simple instrumental variables model, the model equations are
\begin{eqnarray*}
X_i & = & \alpha_1 + \beta_1 W_i + \epsilon_{i1} \\
Y_i & = & \alpha_2 + \beta_2 X_i + \epsilon_{i2}
\end{eqnarray*}
and the path diagram is
\begin{center}
\includegraphics[width=3in]{InstruVar}
\end{center}
\begin{enumerate}
\item Calculate the expected value vector and covariance matrix of the observable data.
\item Suggest an estimator for the parameter $\beta_1$. Does your $\widehat{\beta}_1 \rightarrow \beta_1$? Why?
\item Suggest an estimator of the covariance parameter $c$ in terms of $\widehat{\sigma}_{ij}$ values.
\item Do you have $\widehat{c} \rightarrow c$? Why?
\end{enumerate}

\end{enumerate} % End of all the questions

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%