\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 431s23 Assignment Eight}}\footnote{This assignment was prepared by
\href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner},
Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}.
Use any part of it as you like and share the result freely.
The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/431s23}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s23}}}
\vspace{1 mm}
\end{center}

\noindent
The non-computer questions on this assignment are practice for the quiz, and will not be handed in. Please bring printouts for the R part of this assignment (Question~\ref{Rsim}) to the quiz. There may be one or more questions about the printouts, and you may be asked to hand them in with the quiz.

\begin{enumerate}

\item In the following model, assume that $E(X)=\mu_x$, and the regression equations \emph{do} have intercepts.
% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(100,75){\framebox{$X$}} % Put at location, object framed by box
\put(117,78){\vector(1,0){55}} % Put at location, vector in direction (1,0), length 55
\put(138,81){$\gamma_1$}
\put(180,25){$\epsilon_1$}
\put(183,33){\vector(0,1){35}}
\put(175,75){\framebox{$Y_1$}}
\put(195,78){\vector(1,0){53}}
\put(215,81){$\beta_1$}
\put(255,25){$\epsilon_2$}
\put(258,33){\vector(0,1){35}}
\put(250,75){\framebox{$Y_2$}}
\end{picture}
\begin{enumerate}
\item Classify all the random variables in the model (including error terms) as either Exogenous or Endogenous, and as either Observable or Latent.
\item Express the model as a set of equations. Please start by writing ``Independently for $i=1, \ldots, n$, \ldots" and put a subscript $i$ on all the random variables. Assume that all the exogenous variables are normal, and include this in the statement of the model. Make up your own symbols for parameters when necessary, but try to stay consistent with the notation being used in the course.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model?
\item Write the mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$ of an observable data vector $(X_i,Y_{i,1},Y_{i,2})$ in terms of the model parameters; show your work. Each element of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ should contain a formula in terms of quantities like $\phi$, $\beta$ and so on.
\item Are the parameters of the model identifiable? Answer Yes or No and prove it. If the answer is No, all you need is a simple numerical example of two distinct parameter vectors that yield the same mean and covariance matrix of the observable data.
\item Is this model saturated? Answer Yes or No.
\item Suppose that $X$, $Y_1$ and $Y_2$ were all latent variables, and there were two independent measurements of each one. Independent means no covariance between any measurement errors.
For simplicity, assume no intercepts and $E(X)=0$.
\begin{enumerate}
\item Draw the path diagram for the new model.
\item Are all the parameters of the new model identifiable? Answer yes or no and explain why. No detailed calculations are needed.
\end{enumerate}
\end{enumerate}

\item Patients with high blood pressure are randomly assigned to different dosages of a blood pressure medication. There are lots of different dosages, so dosage may be treated as a continuous variable. Because the exact dosage is known, this variable is observed without error. After one month of taking the medication, the level of the drug in the patient's bloodstream is measured once (with error, of course), by an independent lab. Then, three independent measurements of the patient's blood pressure are taken. One is done by the lab that did the blood test, one is the average of 7 daily measurements taken at home by the patient, and one is done in the doctor's office. Notice that the same lab measures the blood level of the drug, and also does one of the blood pressure measurements. Do \emph{not} assume that errors in the two measurements carried out by the lab are independent.
\textbf{Make a path diagram. Do not bother to write coefficients on the arrows this time}, but write brief labels (``Dose" etc.) in the boxes and ovals.

\item Consider the following path diagram.
\begin{center}
\includegraphics[width=4in]{BigHandPath} % Need \usepackage{graphicx}
\end{center}
\begin{enumerate}
\item Give the model equations in scalar form, for a centered model with no intercepts or expected values.
\item Give the latent variable model equations in matrix form: $\mathbf{y}_i = \boldsymbol{\beta} \mathbf{y}_i + \boldsymbol{\Gamma} \mathbf{x}_i + \boldsymbol{\epsilon}_i$. Use symbols from the path diagram. For example, you will write $\boldsymbol{\beta}$ as a $3 \times 3$ matrix of zeros and $\beta_j$ symbols.
\item Give the measurement model equations in matrix form: $\mathbf{d}_i = \boldsymbol{\Lambda}\mathbf{F}_i + \mathbf{e}_i$. Use symbols from the path diagram.
\item Give the following matrices: $\boldsymbol{\Phi}_x, \boldsymbol{\Psi}, \boldsymbol{\Omega}$. Make sure the dimensions are correct. Some of the symbols you need are on the path diagram, but not all.
\end{enumerate}

\item For the General Structural Equation Model (see formula sheet), calculate
\begin{enumerate}
\item $cov(\mathbf{y}_i)$
\item $cov(\mathbf{x}_i,\mathbf{y}_i)$
\end{enumerate}

\item In your calculation of $cov(\mathbf{y}_i)$ and $cov(\mathbf{x}_i,\mathbf{y}_i)$, you used the matrix $(\mathbf{I}-\boldsymbol{\beta})^{-1}$. As described in lecture, the existence of this matrix is implied by the model. Assume it does \emph{not} exist. Then the rows of $(\mathbf{I}-\boldsymbol{\beta})$ are linearly dependent, and there is a $q \times 1$ vector $\mathbf{v} \neq \mathbf{0}$ with $\mathbf{v}^\top (\mathbf{I}-\boldsymbol{\beta}) = \mathbf{0}$.
\begin{enumerate}
\item Under this assumption, show $\mathbf{v}^\top\boldsymbol{\epsilon}_i = - \mathbf{v}^\top \boldsymbol{\Gamma} \mathbf{x}_i$.
\item Show that the last equality contradicts $\boldsymbol{\Psi}$ positive definite.
% The joint distribution of x and epsilon is degenerate -- concentrated on a lower-dimensional subset of R^{p+q}. This contradicts independence. But instead, calculate covariances.
\end{enumerate}

\item The following centered model has zero covariance between all pairs of exogenous variables, including error terms.
\begin{eqnarray}
Y_1 &=& \gamma_1 X + \epsilon_1 \nonumber \\
Y_2 &=& \beta Y_1 + \gamma_2 X + \epsilon_2 \nonumber \\
W &=& X + e_1 \nonumber \\
V_1 &=& Y_1 + e_2 \nonumber \\
V_2 &=& Y_2 + e_3 \nonumber
\end{eqnarray}
\begin{enumerate}
\item As the notation suggests, the observable variables are $W$, $V_1$ and $V_2$. Draw the path diagram.
Put a coefficient on each straight arrow that does not come from an error term, either the number one or a Greek letter. It is assumed that all straight arrows coming from error terms have a one.
\item Are the parameters of this model identifiable from the covariance matrix? Respond Yes or No and justify your answer.
\end{enumerate}

\pagebreak

\item Consider the following model.
\vspace{10mm}
% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(116,46){$F$}
\put(120,50){\circle{20}}
\put(127,59){\vector(1,1){40}} % Put at location, vector toward (1,1), length 40
\put(135,75){$\lambda$}
\put(129,44){\vector(1,-1){39}}
\put(147,25){$\lambda$}
\put(30,46){\framebox{$D_1$}} % Put at location, object framed by box
\put(108,48){\vector(-1,0){52}}
\put(35,100){$e_1$}
\put(38,95){\vector(0,-1){35}}
\put(170,100){\framebox{$D_2$}}
\put(245,100){$e_2$}
\put(243,102){\vector(-1,0){52}}
\put(170,000){\framebox{$D_3$}}
\put(245,000){$e_3$}
\put(242,002){\vector(-1,0){50}}
\end{picture}
\begin{enumerate}
\item Write the model equations in centered form. Please start by writing ``Independently for $i=1, \ldots, n$, \ldots" and put a subscript $i$ on all the random variables.
\item Let $\boldsymbol{\theta}$ denote the vector of parameters that appear in the covariance matrix of the observable data. What is $\boldsymbol{\theta}$?
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give the numbers.
\item The parameters are identifiable at some points in the parameter space, but not at all points. For what points in the parameter space are the parameters identifiable? Show your work.
\item In a test of model fit, what would the degrees of freedom be? The answer is a single number.
\end{enumerate}

\item \label{hiddenIV} In the following model, all random variables are normally distributed with expected value zero, and there are no intercepts.
\vspace{10mm}
% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(30,46){\framebox{$X$}} % Put at location, object framed by box
\put(50,50){\vector(1,0){59}}% Put at location, vector toward (1,0), length 59
\put(75,55){$\gamma$}
\put(112,100){\framebox{$V$}}
\put(120,60){\vector(0,1){35}}
\put(112,70){$1$}
\put(35,100){$e$}
\put(45,102){\vector(1,0){65}}
\put(116,46){$Y_1$}
\put(120,50){\circle{20}}
\put(127,59){\vector(1,1){40}}
%\put(75,000){$\epsilon_1$}
%\put(82,007){\vector(1,1){32}}
\put(90,20){$\epsilon_1$}
\put(97,27){\vector(1,1){15}}
\put(170,100){\framebox{$Y_2$}}
\put(245,100){$\epsilon_2$}
\put(243,102){\vector(-1,0){52}}
\put(135,80){$\beta_1$}
\put(170,000){\framebox{$Y_3$}}
\put(245,000){$\epsilon_3$}
\put(242,002){\vector(-1,0){50}}
\put(129,44){\vector(1,-1){39}}
\put(148,25){$\beta_2$}
\put(270,52){\oval(20,100)[r]} % r for Right side of oval
\put(270,102){\vector(-1,0){10}} % Top arrow on curve
\put(270,002){\vector(-1,0){10}} % Bottom arrow on curve
\end{picture}
\begin{enumerate}
\item Write the model equations in scalar form.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model? Use standard notation. Include unknown parameters only.
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give both numbers.
\item Show that $\phi$ and $\gamma$ are identifiable, and then show that $\beta_1$ and $\beta_2$ are identifiable provided $\gamma \neq 0$. The other parameters are also identifiable in most of the parameter space, but you don't have to do the calculations.
\end{enumerate}
% X is an instrumental variable. Ha!

\pagebreak

\item \label{Rsim} Simulate a data set based on the model of Question~\ref{hiddenIV}. You choose the true parameter values and you choose the sample size, but make it large. Fit the model with \texttt{lavaan}, and verify that you are estimating the parameters accurately.
For the correlated error terms, you may find my \texttt{rmvn} function helpful. You can access it with \begin{center} \texttt{source("https://www.utstat.toronto.edu/brunner/openSEM/fun/rmvn.txt")} \end{center} \vspace{5mm} \noindent Remember, \emph{Question \ref{Rsim} is not a group project}. You are required to do the work yourself. You may discuss it with your classmates, but do not look at anyone else's code, and do not show anyone else your code. \end{enumerate} \vspace{60mm} \noindent \textbf{Please bring your printout from Question~\ref{Rsim} to the quiz. You may be asked for numbers from your printout, and you may be asked to hand it in as part of the quiz.} \end{document}