% 431Assignment4.tex Repeat of derive MVN likelihood, Surface regression with SAS, Omitted vars, instrumental vars.
\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue,
citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 431s17 Assignment Four}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/431s17}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s17}}}
\vspace{1 mm}
\end{center}

\noindent This assignment is based mostly on lecture units Six (SAS Example 2) and Seven (Omitted Variables and Instrumental Variables). Also see Sections 0.5 and 0.6 in Chapter Zero of the text. The non-computer parts of this assignment are just practice for the quiz. They are not to be handed in.
% This is what I should be doing -- providing specific guidance as to what the assignment covers.

\begin{enumerate}

\item This question is a deliberate repeat from last week. Starting with the multivariate normal density on the formula sheet, derive the multivariate normal likelihood, also on the formula sheet. You will use $tr(\mathbf{AB}) = tr(\mathbf{BA})$ and other properties of the trace.
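If you get stuck, here is a sketch of the central step, writing $\overline{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i$ and $\widehat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf{x}})(\mathbf{x}_i-\overline{\mathbf{x}})^\top$; check this notation against your formula sheet. In the exponent of the joint density,
\begin{eqnarray*}
\sum_{i=1}^n (\mathbf{x}_i-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i-\boldsymbol{\mu})
& = & \sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf{x}})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i-\overline{\mathbf{x}})
      + n\,(\overline{\mathbf{x}}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{x}}-\boldsymbol{\mu}) \\
& = & \sum_{i=1}^n tr\left[ \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i-\overline{\mathbf{x}}) (\mathbf{x}_i-\overline{\mathbf{x}})^\top \right]
      + n\,(\overline{\mathbf{x}}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{x}}-\boldsymbol{\mu}) \\
& = & n \, tr\left( \widehat{\boldsymbol{\Sigma}} \boldsymbol{\Sigma}^{-1} \right)
      + n\,(\overline{\mathbf{x}}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{x}}-\boldsymbol{\mu}).
\end{eqnarray*}
The first line adds and subtracts $\overline{\mathbf{x}}$; the cross terms vanish because $\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf{x}}) = \mathbf{0}$. The second line uses the fact that a $1 \times 1$ matrix equals its trace, together with $tr(\mathbf{AB}) = tr(\mathbf{BA})$. The third line moves the sum inside the trace by linearity, and uses $tr(\mathbf{AB}) = tr(\mathbf{BA})$ once more to put the factors in the order that appears on the formula sheet.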
\item \label{SAS} The \texttt{statclass} data consist of Quiz average, Computer assignment average, Midterm score and Final Exam score from a statistics class, long ago. The first three variables are explanatory, and Final Exam score is the response variable. Data are in the plain text file
{\small
\begin{center}
\href{http://www.utstat.utoronto.ca/~brunner/data/legal/LittleStatclassdata.txt}
{\texttt{http://www.utstat.utoronto.ca/$\sim$brunner/data/legal/LittleStatclassdata.txt}}.
\end{center}
} % End size
Fit a standard regression model using \texttt{proc calis}.
% 14 parameters?
Please make sure to use the \texttt{vardef=n} and \texttt{nostand} options. All the variables are observed, and the explanatory variables could be correlated with one another, but they are independent of the error term. So that our regression coefficients will mean the same thing, please use the order of explanatory variables in the data file. Be able to answer questions like the following.
\begin{enumerate}
\item What is $\widehat{\beta}_0$? The answer is a number on your printout.
\item What is the test statistic for testing $H_0: \beta_2=0$? What is the $p$-value? The answers are numbers on your printout.
\item What is the predicted Final Exam score for a student with a Quiz average of 8.5, a Computer average of 5, and a Midterm mark of 60\%? The answer is a number. Be able to do this kind of thing on the quiz with a calculator.
% My answer was 63.84144 using R.
\item For any fixed Quiz Average and Computer Average, a score one point higher on the Midterm yields a predicted mark on the Final Exam that is \underline{\hspace{10mm}} higher.
\item From your \texttt{proc calis} output, what is the estimated covariance between Quiz average and Computer average? The answer is a number on your printout.
\item Your regression model should have an error term. What is its estimated variance? The answer is a number from your \texttt{proc calis} output.
\end{enumerate}
Please bring printouts of your log file and results file to the quiz; you may be asked to hand them in. Make sure your name and student number appear on both files, preferably using a \texttt{title} or \texttt{title2} statement. \textbf{Do not write anything on your printouts} except possibly your name and student number if you forgot to put them in your code.
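To help you get started, here is a minimal sketch of the kind of program this question asks for. It is only a sketch: the file path is hypothetical, the \texttt{data} step assumes the file contains exactly these four columns in this order, and you should check the \texttt{lineqs} syntax against the lecture examples before relying on it.
\begin{verbatim}
/* Sketch only. The file path is hypothetical, and the data step
   assumes the file has exactly these four columns in this order
   -- look at the data file first. */
title 'Your name and student number here';
data statclass;
    infile '/folders/myfolders/LittleStatclassdata.txt';
    input quiz computer midterm final;
run;
proc calis data=statclass vardef=n nostand;
    /* Including Intercept in the equation should make proc calis
       fit a mean structure too. By default, the variances and
       covariances of the explanatory variables and the variance
       of e1 should be free parameters. */
    lineqs
        final = beta0 * Intercept + beta1 * quiz
              + beta2 * computer + beta3 * midterm + e1;
run;
\end{verbatim}
If the estimated covariance between the explanatory variables or the error variance does not appear in your output, you may need explicit \texttt{variance}, \texttt{cov} and \texttt{mean} statements; again, see the lecture examples.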
\pagebreak
\item Ordinary least squares is often applied to data sets where the explanatory variables are best modeled as random variables. In what way does the usual conditional linear regression model imply that (random) explanatory variables have zero covariance with the error term? Hint: Assume that $\mathbf{X}_i$ as well as $\epsilon_i$ is continuous. What is the conditional distribution of $\epsilon_i$ given $\mathbf{X}_i=\mathbf{x}_i$?

\item In a regression with one explanatory variable, show that $E(\epsilon_i|X_i=x_i)=0$ for all $x_i$ implies $Cov(X_i,\epsilon_i)=0$, so that a standard regression model without the normality assumption still implies zero covariance (though not necessarily independence) between the error term and explanatory variables. Hint: the matrix version of this calculation is in the text.

\item \label{omit} In the following regression model, the explanatory variables $X_1$ and $X_2$ are random variables. The true model is
\begin{displaymath}
Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i,
\end{displaymath}
independently for $i= 1, \ldots, n$, where $\epsilon_i \sim N(0,\sigma^2)$ and is independent of $X_{i,1}$ and $X_{i,2}$. The explanatory variables have a bivariate normal distribution with
\begin{displaymath}
E\left( \begin{array}{c} X_{i,1} \\ X_{i,2} \end{array} \right) =
\left( \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right)
\mbox{~~ and ~~}
cov\left( \begin{array}{c} X_{i,1} \\ X_{i,2} \end{array} \right) =
\left( \begin{array}{rr} \phi_{11} & \phi_{12} \\
                         \phi_{12} & \phi_{22} \end{array} \right).
\end{displaymath}
Unfortunately $X_{i,2}$, which has an impact on $Y_i$ and is correlated with $X_{i,1}$, is not part of the data set. Since $X_{i,2}$ is not observed, it is absorbed by the intercept and error term, as follows.
\begin{eqnarray*}
Y_i &=& \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i \\
    &=& (\beta_0 + \beta_2\mu_2) + \beta_1 X_{i,1} + (\beta_2 X_{i,2} - \beta_2 \mu_2 + \epsilon_i) \\
    &=& \beta^\prime_0 + \beta_1 X_{i,1} + \epsilon^\prime_i.
\end{eqnarray*}
The primes just denote a new $\beta_0$ and a new $\epsilon_i$. It was necessary to add and subtract $\beta_2 \mu_2$ in order to obtain $E(\epsilon^\prime_i) = \beta_2 E(X_{i,2}) - \beta_2\mu_2 + E(\epsilon_i) = 0$. And of course there could be more than one omitted variable. They would all get swallowed by the intercept and error term, the garbage bins of regression analysis.
\begin{enumerate}
\item What is $Cov(X_{i,1},\epsilon^\prime_i)$?
\item Calculate $Cov(X_{i,1},Y_i)$. Is it the same under the true model and the re-parameterized model?
% \pagebreak
\item Suppose we want to estimate $\beta_1$. The usual least squares estimator is
\begin{displaymath}
\widehat{\beta}_1 = \frac{\sum_{i=1}^n(X_{i,1}-\overline{X}_1)(Y_i-\overline{Y})}
                         {\sum_{i=1}^n(X_{i,1}-\overline{X}_1)^2}.
\end{displaymath}
You may just use this formula; you don't have to derive it. Is $\widehat{\beta}_1$ a consistent estimator of $\beta_1$ if the true model holds? Answer Yes or No and show your work. Remember, $X_2$ is not available, so you are doing a regression with one explanatory variable. You may use the consistency of the sample variance and covariance without proof.
\item What is the parameter space under the true model?
\item Are there \emph{any} points in the parameter space for which $\widehat{\beta}_1 \stackrel{p}{\rightarrow} \beta_1$ when the true model holds?
\end{enumerate}

\item If a parameter is a function of the distribution of the observable data, it is said to be \emph{identifiable}. You know that if $X \sim N(\mu,\sigma^2)$, then $E(X)=\mu$ and $Var(X)=\sigma^2$. Why does this tell you that the parameters of the normal model are identifiable?

\item \label{poisson} % See 2101f2013 Assignment 6 for data, more detail.
Men and women are calling a technical support line according to independent Poisson processes with rates $\lambda_1$ and $\lambda_2$ per hour. Data for 144 hours are available, but unfortunately the sex of the caller was not recorded. All we have is the number of callers for each hour, which is distributed Poisson($\lambda_1+\lambda_2$); just use this; you don't have to show it. The parameter in this problem is $\boldsymbol{\theta} = (\lambda_1,\lambda_2)$.
\begin{enumerate}
\item Try to find the MLE by differentiating. Show your work. Are there any points in the parameter space where both partial derivatives are zero? Why did estimation fail for this fairly realistic model?
\item To show that the parameters of a model are not identifiable, all you need to do is find two different sets of parameter values that yield the same distribution of the observable data. Then, the parameter vector cannot possibly be a function of the distribution of the observable data. Use this to show that the parameters in this problem are not identifiable. A simple numerical example is enough, and in fact it is best; there is an illustration on a different model following this question.
\end{enumerate}
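Here is the promised illustration of the two-points technique on a different model, so as not to give away the answer to Question \ref{poisson}. Suppose $X \sim N(\mu_1+\mu_2, \, 1)$, with parameter vector $(\mu_1,\mu_2)$. The two points $(\mu_1,\mu_2) = (0,1)$ and $(\mu_1,\mu_2) = (1,0)$ are different, but both yield exactly the same distribution of the observable data, namely $N(1,1)$. So $(\mu_1,\mu_2)$ cannot be a function of that distribution, and the parameter vector is not identifiable.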
\item Independently for $i= 1, \ldots, n$, let $Y_i = \beta X_i + \epsilon_i$, where
\begin{displaymath}
\left( \begin{array}{c} X_i \\ \epsilon_i \end{array} \right) \sim
N_2\left( \left( \begin{array}{c} \mu_x \\ 0 \end{array} \right),
\left( \begin{array}{cc} \sigma^2_x & c \\
                         c & \sigma^2_\epsilon \end{array} \right) \right).
\end{displaymath}
The observable data are $\mathbf{D}_1, \ldots, \mathbf{D}_n$, where $\mathbf{D}_i = \left( \begin{array}{c} X_i \\ Y_i \end{array} \right)$.
\begin{enumerate}
\item Draw a path diagram for this model.
\item What is the distribution of $\mathbf{D}_i$? Hint: If $\mathbf{w} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, what is the distribution of $\mathbf{Aw}$? What is $\mathbf{A}$ for this problem?
\item What is the parameter space? Note that $c$ cannot be just anything.
\item Give the formula for a Method of Moments estimate of $\beta$. Is it consistent? That is, does it converge in probability to the right answer \emph{everywhere} in the parameter space?
\item For this model, showing parameter identifiability consists of solving five equations in five unknowns. What are the equations? The lecture slide entitled ``Five equations in six unknowns'' should help.
\item For some points in the parameter space these equations can be solved, and for others they cannot. Where can they be solved? You don't have to literally give the solutions.
\item As in Question \ref{poisson}, give a numerical example of two different parameter vectors that yield the same distribution of $\mathbf{D}_i$. Stay in the parameter space. What are the mean vector and covariance matrix of $\mathbf{D}_i$ for your example? This shows that the parameter is not technically identifiable, even though things are okay in most of the parameter space. It suggests that we need a pointwise definition of parameter identifiability, which is given later in Chapter Zero.
\end{enumerate}

% Traditional notation for an instrumental variable is Z, and the path diagram is better than lecture slides and text in other ways, too.
\item For a simple instrumental variables model, the model equations are
\begin{eqnarray*}
X_i & = & \alpha_1 + \beta_1 W_i + \epsilon_{i1} \\
Y_i & = & \alpha_2 + \beta_2 X_i + \epsilon_{i2}
\end{eqnarray*}
and the path diagram is
\begin{center}
\includegraphics[width=3in]{InstruVar}
\end{center}
\begin{enumerate}
\item Calculate the expected value vector and covariance matrix of the observable data.
\item Is the parameter $\beta_1$ identifiable? Answer Yes or No and prove it.
\item Give the formula for a Method of Moments estimate of the covariance parameter $c$ in terms of $\widehat{\sigma}_{ij}$ values.
\item This is also the maximum likelihood estimate. Why?
\end{enumerate}

\end{enumerate}

\end{document}