\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
% \usepackage{fullpage}
%\pagestyle{empty} % No page numbers
% To use more of the top and bottom margins than fullpage
\oddsidemargin=-.2in % Good for US Letter paper
\evensidemargin=-.2in
\textwidth=6.6in
\topmargin=-1.1in
\headheight=0.2in
\headsep=0.5in
\textheight=9.4in

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2053 Assignment 4 (Double measurement regression, surrogate models)}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/brunner/oldclass/2053f22} {\texttt{http://www.utstat.toronto.edu/brunner/oldclass/2053f22}}}
\vspace{1 mm}
\end{center}

\noindent
Questions \ref{pigprep} and \ref{omod} are not to be handed in. They are practice for the quiz on November 13th. Please bring your complete R input and output for Question~\ref{Rpig} to the quiz.

\begin{enumerate}

\item \label{pigprep} Question \ref{Rpig} (the R part of this assignment) will use the \emph{Pig Birth Data}. As part of a much larger study, farmers filled out questionnaires about various aspects of their farms. Some questions were asked twice, on two different questionnaires several months apart.
Buried in all the questions were
\begin{itemize}
    \item Number of breeding sows (female pigs) at the farm on June 1st.
    \item Number of sows giving birth later that summer.
\end{itemize}
There are two readings of these variables, one from each questionnaire. We will assume (maybe incorrectly) that because the questions were buried in a lot of other material and were asked months apart, errors of measurement are independent between the two questionnaires. However, errors of measurement might be correlated within a questionnaire.
\begin{enumerate}
    \item Propose a reasonable \emph{surrogate} centered model for these data, using the usual notation. Give all the details. You may assume normality if you wish.
    \item Make a path diagram of the model you have proposed.
    \item Calculate the covariance matrix of one observable data vector $\mathbf{d}_i$.
    \item Even though you have a general result that applies to this case, prove that all the parameters in the covariance matrix are identifiable.
    \item If there are any equality constraints on the covariance matrix, say what they are.
    \item \label{df} Based on your answer to the last question, how many degrees of freedom should there be in the chi-squared test for model fit?
    \item \label{mombetahat} Give a consistent estimator of $\beta$ that is \emph{not} the MLE, and explain why it is consistent. You may use the consistency of sample variances and covariances without proof. Your estimator must not be a function of any unknown parameters.
\end{enumerate}
% \pagebreak

\item \label{Rpig} The Pig Birth Data are given in the file \href{http://www.utstat.toronto.edu/brunner/openSEM/data/openpigs2.data.txt} {\texttt{openpigs2.data.txt}}. There are $n=114$ farms; please verify that you are reading the correct number of cases.
\begin{enumerate}
    \item Start by reading the data and then using the \texttt{var} function to produce a sample covariance matrix of all the observable variables. Don't worry about $n$ versus $n-1$.
    \item Use \texttt{lavaan} to fit your model. Look at \texttt{summary}. If you experience numerical problems, you are doing something differently from the way I did it. When I fit a good model everything was fine. When I fit a poor model there was trouble. Just to verify that we are fitting the same model, my estimate of the variance of the latent exogenous variable is 357.145.
    \item Does your model fit the data adequately? Answer Yes or No and give three numbers: a chi-squared statistic, the degrees of freedom, and a $p$-value.
    % G^2 = 0.087, df = 1, p = 0.768
    Do the degrees of freedom agree with your answer to Question~\ref{df}?
    \item \label{betahat} If the number of breeding sows present in September increases by one, what happens to the estimated number giving birth that summer? Your answer is based on a single number from the output of \texttt{summary}. It is not an integer.
    % betahat = 0.757
    \item Using your answer to Question~\ref{mombetahat} and the output of \texttt{var}, give a \emph{numerical} version of your consistent estimate of $\beta$. How does it compare to the MLE?
    % 0.5*(272.67101+260.02857)/348.52989 = 0.7642093 v.s. MLE of 0.7567. Pretty good!
    \item %Since maximum likelihood estimates are asymptotically normal,
    % (approximately normal for large samples), a large-sample confidence interval is $\widehat{\theta} \pm 1.96 se$, where $se$ is the standard error (estimated standard deviation) of $\widehat{\theta}$.
    Give a large-sample confidence interval for your answer to Question~\ref{betahat}. I used \texttt{parameterEstimates} to do it the easy way.
    \item \label{rely} Recall that reliability of a measurement is the proportion of its variance that does \emph{not} come from measurement error. What is the estimated reliability of number of breeding sows? There are two numbers, one for questionnaire one and another for questionnaire two. You could do this with a calculator and the output of \texttt{summary}, but I did it with \texttt{:=} in the model string.
    %, which you get with a calculator and the output. I think this is the reliability of the number giving birth.
    % 1 - 93.82358/(0.7567^2*360.30522+33.93153 +93.82358) = 0.7191449 from proc calis MLEs
    \item It would be inconvenient at best to get confidence intervals for reliability with a calculator. Obtain confidence intervals for the reliabilities in Question~\ref{rely}. Try \texttt{parameterEstimates}. It uses the delta method.
    \item Is there evidence of correlated measurement error within questionnaires? Answer Yes or No and give some numbers from the results file to support your conclusion.
    \item The answer to that last question was based on two separate tests. Though it is already pretty convincing, conduct a \emph{single} Wald (not likelihood ratio) test of the two null hypotheses simultaneously. Give the Wald chi-squared statistic, the degrees of freedom and the $p$-value. What do you conclude? Is there evidence of correlated measurement error, or not?
    % SAS gave W =45.41656 , df=2, p < 0.0001. With R, I got 45.818
\end{enumerate} % End of R question

\item \label{omod} Of course the model of Questions \ref{pigprep} and \ref{Rpig} is a surrogate model.
\begin{enumerate}
    \item Give the model equations for the original model, in centered form. It's still double measurement.
    \item Calculate the covariance matrix of the observable data for the original model. Do you have the same equality constraint?
    \item Re-parameterizing by a change of variables (actually, two changes of variables), obtain the surrogate model of Question~\ref{pigprep}. What is $\beta^\prime$ in terms of the parameters of the original model?
\end{enumerate}

\end{enumerate} % End of questions

\end{document}

\vspace{2mm}
\noindent
Please bring your \emph{complete} R printout from Question~\ref{Rpig} to the quiz, showing all input and output. It may be handed in.