\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 431s15 Assignment Eight}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/431s15} {\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s15}}}
\vspace{1 mm}
\end{center}

\noindent
The non-computer questions on this assignment are practice for the quiz, and will not be handed in. Please bring your log files and your output files for the SAS part of this assignment (Question~\ref{SASmanager}) to the quiz. There may be one or more questions about them, and you may be asked to hand printouts in with the quiz.

\begin{enumerate}

\item In the following model, assume that $E(X)=\mu_x$, and the regression equations \emph{do} have intercepts.

% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(100,75){\framebox{$X$}} % Put at location, object framed by box
\put(117,78){\vector(1,0){55}} % Put at location, vector in direction (1,0), length 55
\put(138,81){$\gamma$}
\put(180,25){$\epsilon_1$}
\put(183,33){\vector(0,1){35}}
\put(175,75){\framebox{$Y_1$}}
\put(195,78){\vector(1,0){53}}
\put(215,81){$\beta$}
\put(255,25){$\epsilon_2$}
\put(258,33){\vector(0,1){35}}
\put(250,75){\framebox{$Y_2$}}
\end{picture}

\begin{enumerate}
\item Classify all the random variables in the model as either Exogenous or Endogenous, and as either Manifest or Latent.
\item Express the model as a set of equations. Please start by writing ``Independently for $i=1, \ldots, n$, \ldots'' and put a subscript $i$ on all the random variables. Assume that everything is normal, and include this in your statement of the model. Make up your own symbols for parameters when necessary, but try to stay consistent with the notation being used in the course.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model?
\item What is the joint distribution of the manifest variables? Express the mean vector and variance-covariance matrix in terms of the model parameters; show your work. Each element of the mean vector and the variance-covariance matrix should contain a formula in terms of quantities like $\phi$, $\beta$ and so on.
\item Are the parameters of the model identifiable? Answer Yes or No and prove it. If the answer is No, all you need is a simple numerical example of two distinct parameter vectors that yield the same mean and covariance matrix of the observable data.
\item Is this model saturated? Answer Yes or No.
\item Suppose that $X$, $Y_1$ and $Y_2$ were all latent variables, and there were two independent measurements of each one. Independent means no covariance between any measurement errors. For simplicity, assume no intercepts and $E(X)=0$.
\begin{enumerate}
\item Draw the path diagram for the new model.
\item Are all the parameters of the new model identifiable? Answer yes or no and explain why. No detailed calculations are needed.
\end{enumerate}
\end{enumerate}

\item Patients with high blood pressure are randomly assigned to different dosages of a blood pressure medication. There are lots of different dosages, so dosage may be treated as a continuous variable. Because the exact dosage is known, this variable is observed without error. After one month of taking the medication, the level of the drug in the patient's bloodstream is measured once (with error, of course), by an independent lab. Then, three independent measurements of the patient's blood pressure are taken. One is done by the lab that did the blood test, one is the average of 7 daily measurements taken at home by the patient, and one is done in the doctor's office. Notice that the same lab measures the blood level of the drug, and also does one of the blood pressure measurements. Do \emph{not} assume that errors in the two measurements carried out by the lab are independent. \textbf{Make a path diagram. Do not bother to write coefficients on the arrows this time}, but write brief labels (``Dose'' etc.) in the boxes and ovals.

\item Consider the following path diagram.
\begin{center}
\includegraphics[width=3in]{FinalPic1} % Need \usepackage{graphicx}
\end{center}
\begin{enumerate}
\item Write down the model equations in scalar form.
\item Refer to the general two-stage structural equation model on the formula sheet. Write down the following, using symbols from the path diagram. This part is intended to help you use matrices with the correct dimensions.
\begin{enumerate}
\item $\mathbf{X}$
\item $\mathbf{Y}$
\item $\mathbf{F}$
\item $\mathbf{D}$
\end{enumerate}
\item Give the matrix $\boldsymbol{\beta}$, using symbols from the path diagram.
\emph{Make sure it has the correct dimensions.}
\item Give the matrix $\boldsymbol{\Gamma}$, using symbols from the path diagram. Make sure it has the correct dimensions.
\item Give the matrix $\boldsymbol{\Lambda}$, using symbols from the path diagram. Make sure it has the correct dimensions.
\item Give the matrix $\boldsymbol{\Phi}_x$, using symbols from the path diagram. Make sure it has the correct dimensions. Not all the symbols in $\boldsymbol{\Phi}_x$ appear on the path diagram.
\item Give the matrix $\boldsymbol{\Psi}$. Make sure it has the correct dimensions. The symbols in $\boldsymbol{\Psi}$ do not appear on the path diagram. Use standard notation.
\item Give the matrix $\boldsymbol{\Omega}$, using symbols from the path diagram. Make sure it has the correct dimensions. Not all the symbols in $\boldsymbol{\Omega}$ appear on the path diagram.
\end{enumerate}

\item For the General Structural Equation Model (see formula sheet), calculate
\begin{enumerate}
\item $V(\mathbf{Y}_i)$
\item $C(\mathbf{X}_i,\mathbf{Y}_i)$
\end{enumerate}

\item In your calculation of $V(\mathbf{Y}_i)$ and $C(\mathbf{X}_i,\mathbf{Y}_i)$, you used the matrix $(\mathbf{I}-\boldsymbol{\beta})^{-1}$. As described in lecture, the existence of this matrix is implied by the model. Assume it does \emph{not} exist. Then the rows of $(\mathbf{I}-\boldsymbol{\beta})$ are linearly dependent, and there is a $q \times 1$ vector $\mathbf{v} \neq \mathbf{0}$ with $(\mathbf{I}-\boldsymbol{\beta})^\top \mathbf{v} = \mathbf{0}$.
\begin{enumerate}
\item Show that there is a $p \times 1$ vector $\mathbf{a} \neq \mathbf{0}$ and a $q \times 1$ vector $\mathbf{b} \neq \mathbf{0}$ with $\mathbf{a}^\top\mathbf{X}_i = \mathbf{b}^\top\boldsymbol{\epsilon}_i$ for $i=1, \ldots, n$. Give formulas for $\mathbf{a}$ and $\mathbf{b}$.
\item How does the existence of $\mathbf{a}$ and $\mathbf{b}$ contradict the model?
% It goes beyond just lack of independence of X and epsilon.
% Their joint distribution is actually degenerate -- concentrated on a lower-dimensional subset of R^{p+q}.
\end{enumerate}

\item The following model has zero covariance between all pairs of exogenous variables.
\begin{eqnarray}
Y_1 &=& \gamma_1 X + \epsilon_1 \nonumber \\
Y_2 &=& \beta Y_1 + \gamma_2 X + \epsilon_2 \nonumber \\
W &=& X + e_1 \nonumber \\
V_1 &=& Y_1 + e_2 \nonumber \\
V_2 &=& Y_2 + e_3 \nonumber
\end{eqnarray}
\begin{enumerate}
\item Draw the path diagram. Put a coefficient on each straight arrow that does not come from an error term, either the number one or a Greek letter. It is assumed that all straight arrows coming from error terms have a one.
\item As the notation suggests, the observable variables are $W$, $V_1$ and $V_2$. Are the parameters of this model identifiable at every point in the parameter space? Respond Yes or No and justify your answer. You may assume $E(X)=0$.
\end{enumerate}

\pagebreak

\item In the following model, all expected values are zero.

\vspace{10mm}

% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(116,46){$F$}
\put(120,50){\circle{20}}
\put(127,59){\vector(1,1){40}} % Put at location, vector toward (1,1), length 40
\put(135,75){$\lambda$}
\put(129,44){\vector(1,-1){39}}
\put(147,25){$\lambda$}
\put(30,46){\framebox{$D_1$}} % Put at location, object framed by box
\put(108,48){\vector(-1,0){52}}
\put(35,100){$e_1$}
\put(38,95){\vector(0,-1){35}}
\put(170,100){\framebox{$D_2$}}
\put(245,100){$e_2$}
\put(243,102){\vector(-1,0){52}}
\put(170,000){\framebox{$D_3$}}
\put(245,000){$e_3$}
\put(242,002){\vector(-1,0){50}}
\end{picture}

\begin{enumerate}
\item Write this model as a set of simultaneous equations. Please start by writing ``Independently for $i=1, \ldots, n$, \ldots'' and put a subscript $i$ on all the random variables. You may assume that everything is normal.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model?
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give the numbers.
\item Are the parameters of the model identifiable? Answer Yes or No and prove it. If the answer is No, all you need is a simple numerical example of two distinct parameter vectors that yield the same mean and covariance matrix of the observable data.
\item In a test of model fit, what would the degrees of freedom be? The answer is a single number.
\end{enumerate}

\item In the following model, all random variables are normally distributed with expected value zero, and there are no intercepts.

\vspace{10mm}

% Path diagram: Had to fiddle with this!
\begin{picture}(100,100)(0,0) % Size of picture (does not matter), origin
\put(30,46){\framebox{$X$}} % Put at location, object framed by box
\put(50,50){\vector(1,0){59}}% Put at location, vector toward (1,0), length 59
\put(75,55){$\gamma$}
\put(112,100){\framebox{$V$}}
\put(120,60){\vector(0,1){35}}
\put(112,70){$1$}
\put(35,100){$e$}
\put(45,102){\vector(1,0){65}}
\put(116,46){$Y_1$}
\put(120,50){\circle{20}}
\put(127,59){\vector(1,1){40}}
%\put(75,000){$\epsilon_1$}
%\put(82,007){\vector(1,1){32}}
\put(90,20){$\epsilon_1$}
\put(97,27){\vector(1,1){15}}
\put(170,100){\framebox{$Y_2$}}
\put(245,100){$\epsilon_2$}
\put(243,102){\vector(-1,0){52}}
\put(135,80){$\beta_1$}
\put(170,000){\framebox{$Y_3$}}
\put(245,000){$\epsilon_3$}
\put(242,002){\vector(-1,0){50}}
\put(129,44){\vector(1,-1){39}}
\put(148,25){$\beta_2$}
\put(270,52){\oval(20,100)[r]} % r for Right side of oval
\put(270,102){\vector(-1,0){10}} % Top arrow on curve
\put(270,002){\vector(-1,0){10}} % Bottom arrow on curve
\end{picture}

\begin{enumerate}
\item Write the model equations in scalar form.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model? Use standard notation. Include unknown parameters only.
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give both numbers.
\end{enumerate}

\pagebreak

\item \label{managerpath} Make a path diagram for the following data set. A farm co-operative (co-op) is an association of farmers. The co-op can buy fertilizer and other supplies in large quantities for a lower price, it often provides a common storage location for harvested crops, and it arranges sale of farm products in large quantities to grocery store chains and other food suppliers. Farm co-ops usually have professional managers, and some do a better job than others. We have data from a study of farm co-op managers. The response variable of interest is job performance, a latent variable. The variables in the ``latent variable'' part of the model are the following, but note that one of them is assumed observable.
\begin{itemize}
\item[] $X_1$: Knowledge of business principles and products (economics, fertilizers and chemicals). This is a latent variable measured by $W_{11}$ and $W_{12}$.
\item[] $X_2$: Profit-loss orientation (``Tendency to rationally evaluate means to an economic end''). This is a latent variable measured by $W_{21}$ and $W_{22}$.
\item[] $X_3$: Job satisfaction. This is a latent variable measured by $W_{31}$ and $W_{32}$.
\item[] $X_4$: Formal Education = Number of years of formal schooling divided by 6. This is an observable variable, assumed to be measured without error.
\item[] $Y$: Job performance. This is a latent variable measured by $V_1$ and $V_2$.
\end{itemize}
The data file has these observable variables in addition to an identification code for the managers.
\begin{itemize}
\item[] $W_{11}$: Knowledge measurement 1
\item[] $W_{12}$: Knowledge measurement 2
\item[] $W_{21}$: Profit-Loss Orientation 1
\item[] $W_{22}$: Profit-Loss Orientation 2
\item[] $W_{31}$: Job Satisfaction 1
\item[] $W_{32}$: Job Satisfaction 2
\item[] $X_{4}$: Formal education, assumed measured without error
\item[] $V_1$: Job Performance 1
\item[] $V_2$: Job Performance 2
\end{itemize}
In this study, the double measurements are obtained by just splitting questionnaires in two, as in split-half reliability. Furthermore, all the measurement errors are assumed independent of one another. This is consistent with mainstream psychometric theory, though maybe not with common sense. For this assignment, please assume that the errors are independent of one another, and independent of the explanatory variables. The explanatory variables, of course, should \emph{not} be assumed independent of one another.

\pagebreak

\item \label{SASmanager} The file \href{http://www.utstat.toronto.edu/~brunner/data/legal/manager.data.txt} {\texttt{manager.data.txt}} has raw data for the study described in Question~\ref{managerpath}. This is a reconstructed data set based on a covariance matrix in Joreskog (1978, p. 465). Joreskog got it from Warren, White and Fuller (1974). There is a link on the course web page in case the one in this document does not work.
\begin{enumerate}
\item Using proc calis, fit the appropriate model. There are 98 co-ops, so please make sure you are reading the correct number of cases. For comparison, my value of Akaike's Information Criterion (which will not be on the quiz) is 73.1819. If you get this number, we must be fitting the same model. Using your output file when necessary, be ready to answer questions like the following on the quiz.
\begin{enumerate}
\item There is one manifest exogenous variable. What is it?
\item There is one latent endogenous variable. What is it?
\item Based on the number of covariance structure equations and the number of unknown parameters, how many equality restrictions should the model impose on the covariance matrix? The answer is a single number; you need not say what they all are.
\item Does your model fit the data adequately? Answer Yes or No and give three numbers: a chi-squared statistic, the degrees of freedom, and a $p$-value.
\item Controlling for knowledge, profit-loss orientation and job satisfaction, is there evidence that formal education is related to job performance? Answer Yes or No and give the value of a test statistic (actually it's a $Z$) that supports your conclusion. Of course in all these questions you are using the $\alpha=0.05$ significance level and a 2-sided test.
\item Controlling for formal education, knowledge and profit-loss orientation, is there evidence that job satisfaction is related to job performance? Answer Yes or No and give the value of a test statistic (actually it's a $Z$) that supports your conclusion. If the answer is Yes, say whether satisfaction is positively related to performance, or negatively related.
\item Controlling for job satisfaction, formal education and knowledge, is there evidence that profit-loss orientation is related to job performance? Answer Yes or No and give the value of a test statistic (actually it's a $Z$) that supports your conclusion. If the answer is Yes, say whether profit-loss orientation is positively related to performance, or negatively related.
\item Carry out a Wald test of all the regression coefficients at once; use the \texttt{simtests} command. Be able to give the value of the chi-squared test statistic, the degrees of freedom, and the $p$-value -- all numbers from your printout. Using the usual $\alpha=0.05$ significance level, is there evidence that at least one regression coefficient must be non-zero? As a by-product, you get four tests with $p$-values identical to some in the default output.
What null hypotheses are they testing?
\item Estimate the reliability of Knowledge measurement 1. Your answer is a number. You will need a calculator.
\item Estimate the reliability of Knowledge measurement 2. Your answer is a number. You will need a calculator.
\pagebreak
\item Show that the reliabilities of Knowledge Measurements 1 and 2 are equal if and only if the variances of the two measurement error terms are equal. This is a paper-and-pencil calculation. It is the basis of the next (and last) test you are asked to carry out.
\end{enumerate}
\item Carry out a Wald (not likelihood ratio) test of the null hypothesis that the variances of the two measurement error terms for Knowledge measurements 1 and 2 are equal. By the last calculation you did, this is equivalent to testing whether the two reliabilities are equal.
\begin{enumerate}
\item What is the value of the chi-squared statistic? The answer is a number.
\item What are the degrees of freedom? The answer is a number.
\item What is the $p$-value? The answer is a number.
\item Do you reject the null hypothesis at $\alpha=0.05$? Answer Yes or No.
\item What do you conclude about the reliabilities of the two measurements? Using $\alpha=0.05$, do you have sufficient evidence to conclude that they are different? Answer Yes or No. If the answer is Yes, say which one seems to be more reliable. Of course the answer may not be Yes. If the answer is No, do \emph{not} draw any conclusions about which measurement is more reliable.
\end{enumerate}
\end{enumerate}

\end{enumerate}

\vspace{60mm}

\noindent
Bring your log and your output files to the quiz. You may be asked for numbers from your printouts, and you may be asked to hand them in. \textbf{There must be no error messages, and no notes or warnings about invalid data on your log file.}

\end{document}