\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 431s17 Assignment Eleven}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/431s17} {\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s17}}}
\vspace{1 mm}
\end{center}

% Slide sets 19: CFA2: Original and surrogate models
% 20:
% 21:

\noindent These problems are preparation for the quiz on Wednesday March 29th, and are not to be handed in.

\begin{enumerate}

\item \label{deep6} Refer to the following as the \emph{original model}. It's really not quite original because it has been centered. Let
\begin{eqnarray*}
D_1 & = & \lambda_1 F_1 + e_1 \\
D_2 & = & \lambda_2 F_1 + e_2 \\
D_3 & = & \lambda_3 F_1 + e_3 \\
D_4 & = & \lambda_4 F_2 + e_4 \\
D_5 & = & \lambda_5 F_2 + e_5 \\
D_6 & = & \lambda_6 F_2 + e_6,
\end{eqnarray*}
where all expected values are zero, $Var(e_i)=\omega_i$ for $i=1, \ldots, 6$,
$cov\left( \begin{array}{c} F_{1} \\ F_{2} \end{array} \right) = \left( \begin{array}{c c} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{array} \right)$,
the factors are independent of the error terms, and all the error terms are independent of each other.
All the factor loadings are non-zero.
\begin{enumerate}
\item Make a path diagram.
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give the numbers.
% 3+6+6 = 15 parameters, 6(6+1)/2 = 21 equations, Yes
\item \label{Sigma} Calculate or otherwise obtain the covariance matrix of the observable variables.
\item \label{nonident} Show that the parameters of this model are not identifiable by giving specific $\lambda_1^\prime$, $\lambda_2^\prime$, $\lambda_3^\prime$, $\phi_{11}^\prime$ and $\phi_{12}^\prime$ (different from the original parameters) that yield the same covariance matrix as the parameters of the original model. The parameters not named are the same in both models.
\item Show that $\lambda_2/\lambda_1$ is identifiable.
\item Show that $\omega_4$ is identifiable.
\item Show that the reliability of $D_1$ is identifiable. If $\lambda_1$ is getting in your way, remember that the reliability is one minus the proportion of variance that \emph{is} error.
% Calculate from the definition.
\item When we set a factor loading to one in order to obtain identifiability, we are not really \emph{assuming} something like $\lambda_1=1$. That would be wildly unrealistic. Instead, we are transforming the latent variable to obtain $D_1 = F_1^* + e_1$. Let's do the same thing to $F_2$ as well (the standard trick) so that $D_4 = F_2^* + e_4$. Now we will see that the change of variables produces a re-parameterization that changes the meaning of the other parameters in the model.
\begin{enumerate}
\item Under the re-parameterized (surrogate) model, $D_2 = \lambda_2^* F_1^* + e_2$. What is $\lambda_2^*$? There's just a little work to show. Show it.
\item What is $\lambda_6^*$? There's just a little work to show. Show it.
\item What is $\phi_{22}^*$? There's just a little work to show. Show it.
\item What is $\phi_{12}^*$? There's just a little work to show. Show it.
\item What is $\lambda_1^*$? $\lambda_4^*$?
\end{enumerate}
\item The other standard trick is to set the variances of the factors to one. For the resulting surrogate model,
\begin{enumerate}
\item What are $F_1^*$ and $F_2^*$ in terms of the original model?
\item What is $\lambda_2^*$ in terms of the original model?
\item Show that $\phi_{12}^* = Corr(F_1,F_2)$.
\item The parameters of this surrogate model will be identifiable provided that the sign of one factor loading is known for each factor. Assuming $\lambda_1^*>0$ and $\lambda_4^*>0$, give formulas for $\lambda_1^*$ and $\lambda_4^*$ in terms of $\sigma_{ij}$ quantities. You can use your answer to Question~\ref{Sigma}, modifying the notation a bit in your mind.
\item Using your answer to the last question, give a formula for $\phi_{12}^*$ in terms of $\sigma_{ij}$ quantities.
\item Now take that same function of the $\sigma_{ij}$ quantities assuming the \emph{original} model. When you solve for $\phi_{12}^*$, what are you really identifying in terms of the original model? The beauty of this is that $\phi_{12}^*$ is the correlation between the factors under both the original model and the surrogate model.
% This is just an important example. The general argument about meaning of the parameters goes in the text. Matrix version?
\end{enumerate}
\item Continuing with the two-factor example in which variances of the factors equal one under the surrogate model, suppose that in the original model, $F_2 = \gamma F_1 + \epsilon$.
\begin{enumerate}
\item Give the $2 \times 2$ matrix $\boldsymbol{\Phi}$ assuming the original model.
\item For the surrogate latent model $F_2^* = \gamma^* F_1^* + \epsilon^*$, express $\gamma^*$ and $\psi^*$ in terms of the parameters of the original model.
\item What are $F_1^*$, $F_2^*$ and $\epsilon^*$ in terms of the original model?
\item What is $Var(\epsilon^*)$ in terms of the parameters of the original model?
\end{enumerate}
\end{enumerate}
% Need a question about how the equality constraints on Sigma are the same.
\item \label{onefact} Now consider an original (except it's centered) single-factor model in which \begin{eqnarray*} D_1 &=& \lambda_1 F + e_1 \\ D_2 &=& \lambda_2 F + e_2 \\ D_3 &=& \lambda_3 F + e_3 \\ D_4 &=& \lambda_4 F + e_4 \end{eqnarray*} where $e_1,\ldots, e_4, F$ are all independent, $Var(e_j) = \omega_j$, $Var(F) = \phi$ and $\lambda_j \neq 0$. Write $\boldsymbol{\Sigma}$ in terms of the model parameters. This is mostly just for later use. You have already done almost all the work in Question~\ref{Sigma}. \item Now set $\lambda_1=1$ in the model of Question~\ref{onefact}, resulting in a surrogate model. \begin{enumerate} \item \label{explicit} Show that $\lambda_2, \lambda_3, \lambda_4$ and $\phi$ are identifiable by solving for them explicitly in terms of $\sigma_{ij}$ quantities. The $\omega_j$ are identifiable too, but don't bother with them. \item Now substitute your solutions for the parameters from Question~\ref{explicit} back into the six covariance structure equations for $\sigma_{ij}$ where $i \neq j$. The result is two model-induced \emph{equality constraints} on the covariances. What are they? By the way, this is a general method for deriving equality constraints (the null hypothesis of the goodness of fit test), but the parameters have to be identifiable. \item But all is not lost. Verify that the constraints you obtained for the surrogate model also are true of the original model of Question~\ref{onefact}. It is interesting and useful that models with non-identifiable parameters can imply testable constraints on the covariance matrix. \item Are the constraints also true under the other surrogate model with $\phi=1$? \end{enumerate} \item \label{bigpath} For the following path diagram, assume that any arrow unmarked by a symbol has the coefficient one. When you give model equations and matrices below, please use the symbols from the path diagram. 
\begin{center}
\includegraphics[width=4in]{A12Pic2} % Need \usepackage{graphicx}
\end{center}
Explain why the model parameters are all identifiable, making specific reference to the identifiability rules on the reference sheet.
\item Write symbols on the arrows of the path diagram below, selecting a surrogate model whose parameters are identifiable. Cite the rule you are using by name, letter and number. What else must you assume?
\begin{center}
\includegraphics[width=4in]{A12Pic3} % Need \usepackage{graphicx}
\end{center}
\item Take a look at the path diagram below.
\begin{center}
\includegraphics[width=4in]{A12Pic1} % Need \usepackage{graphicx}
\end{center}
\begin{enumerate}
\item At a glance, do the parameters look identifiable to you? How come? You don't have to prove anything.
\item Now examine the latent variable model. How does the acyclic rule fail?
\end{enumerate}
\pagebreak
\item Make a path diagram with two factors that illustrates both the three-variable rule for unstandardized factors and the crossover rule. Write symbols on the arrows. If an arrow does not have a symbol it means the coefficient equals one. Make the parameters identifiable.
%
\item In a reaction time study, subjects are seated at a screen. A light flashes on the screen, and they press a key as fast as they can; the time between the light flash and the key press is recorded automatically. After some warmup trials, the subjects do the task 50 times, so 50 reaction times are recorded. The 50 times are divided randomly into two sets of 25, and then the median is calculated for each set. In the end, each subject produces two median reaction times. The scientists locate a sample of university student volunteers whose parents and grandparents are also available to do the experiment. When all the data have been collected, there is a data file with $n$ lines of data. Each line of data has 14 numbers.
There are two median reaction times for each of the following individuals: \begin{itemize} \item The student \item Mother \item Father \item Maternal grandmother (mother's mother) \item Maternal grandfather (mother's father) \item Paternal grandmother (father's mother) \item Paternal grandfather (father's father) \end{itemize} As in most applications of statistical methods to real data, your job is to translate this flood of words into a statistical model. \begin{enumerate} \item Make a path diagram. Write symbols on the arrows, making it a surrogate model with identifiable parameters. \item Explain why the parameters are identifiable. Cite the rules you are using by name, letter and number. \end{enumerate} \item Make a path diagram that illustrates the acyclic rule, the error-free rule and double measurement. Write symbols on the arrows. If an arrow does not have a symbol it means the coefficient equals one. Keep it fairly simple and make the parameters identifiable. \pagebreak \item In a study of maternal behaviour in cats, mother cats with new litters of kittens were injected daily with estrogen, a female sex hormone. The cats were randomly assigned to different dosages (amounts) of estrogen. There are lots of different dosages, so dosage may be treated as a continuous variable. Because the exact amount injected is known, the variable Dosage is observed without error. After three days, the amount of estrogen in the animal's bloodstream is measured, once. Of course it is measured with error. Then for the next seven days, the following maternal behaviours are recorded. \begin{itemize} \item Nursing time in total minutes. \item Licking in total number of times the cat licked one of her kittens. \item Retrievals: The mother cat picks up one of her kittens by the skin on the back of its neck, and carries it somewhere. \end{itemize} \begin{enumerate} \item Make a path diagram. Write symbols on the arrows, making it a surrogate model with identifiable parameters. 
\item Explain why the parameters are identifiable. Cite the rules you are using by name, letter and number.
\end{enumerate}
\item This question outlines a different approach to identifying the parameters of a measurement model. Recall that when latent variables are measured with error and the error terms are not correlated, only the variances of $\boldsymbol{\Sigma}$ are affected by measurement error. The covariances are untouched.
\begin{enumerate}
\item Just to remind yourself of this fact, calculate $\boldsymbol{\Sigma}$ for the following model. Independently for $i=1, \ldots, n$, let
% Need eqnarray inside a parbox to make it the cell of a table
\begin{tabular}{ccc}
\parbox[m]{1.5in}
{
\begin{eqnarray*}
D_{i,1} &=& F_{i,1} + e_{i,1} \\
D_{i,2} &=& F_{i,2} + e_{i,2} \\
&&
\end{eqnarray*}
} % End parbox
&
$cov\left( \begin{array}{c} F_{i,1} \\ F_{i,2} \end{array} \right) = \left( \begin{array}{c c} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{array} \right)$
&
$cov\left( \begin{array}{c} e_{i,1} \\ e_{i,2} \end{array} \right) = \left( \begin{array}{c c} \omega_1 & 0 \\ 0 & \omega_2 \end{array} \right)$
\end{tabular}
\item Based on a random sample of $(D_1,D_2)$ pairs, do we have $\widehat{\boldsymbol{\Sigma}} \stackrel{p}{\rightarrow} \boldsymbol{\Phi}$? Answer Yes or No and briefly justify your answer.
\item Denote the reliability of $D_{i,1}$ as a measure of $F_{i,1}$ by $r_1$, and denote the reliability of $D_{i,2}$ as a measure of $F_{i,2}$ by $r_2$. Suppose you have good (consistent) estimates of $r_1$ and $r_2$ from another source; say $\widehat{r}_1 \stackrel{p}{\rightarrow} r_1$ and $\widehat{r}_2 \stackrel{p}{\rightarrow} r_2$. Give a consistent estimator of $\boldsymbol{\Phi}$. Show your work.
\end{enumerate}
The point of this question is that sometimes you can use ``auxiliary'' (outside) information to identify the parameters of a measurement model and rescue the analysis of data that were not collected with latent variables in mind.
\end{enumerate}
\end{document}

\item Now let us consider $\phi_{12}$.
\begin{enumerate}
\item Show that $\phi_{12}^\prime$ is identifiable under the re-parameterized model by expressing it as a function of $\sigma_{ij}$ quantities.
\item Now take that same function of the $\sigma_{ij}$ quantities assuming the \emph{original} model. What are you really identifying in terms of the original model?
\end{enumerate}