\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 431s15 Assignment Five}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/431s15} {\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s15}}}
\vspace{1 mm}
\end{center}

\noindent
The non-computer questions on this assignment are for practice, and will not be handed in. For the SAS part of this assignment (Question~\ref{SAS}) please bring your log file and your output file to the quiz. There may be one or more questions about them, and you may be asked to hand the printouts in with the quiz.

\begin{enumerate}

\item \label{poisson} Men and women are calling a technical support line according to independent Poisson processes with rates $\lambda_1$ and $\lambda_2$ per hour. Data for 144 hours are available, but unfortunately the sex of the caller was not recorded. All we have is the number of callers for each hour, which is distributed Poisson($\lambda_1+\lambda_2$).
\begin{enumerate}
\item The parameter in this problem is $\boldsymbol{\theta} = (\lambda_1,\lambda_2)^\prime$. Try to find the MLE by differentiating. Show your work.
Are there any points in the parameter space where both partial derivatives are zero?
\item Is the parameter identifiable? Answer Yes or No and prove your answer. If the answer is no, all you need to prove your answer is a simple numerical example of two different parameter vectors that yield the same probability distribution for the sample data.
\item Give one identifiable \emph{function} of $\boldsymbol{\theta}$. Show it is identifiable by expressing it as a function of the probability distribution of the observable data.
\end{enumerate}

\item \label{randiv} Independently for $i = 1 , \ldots, n$, let
\begin{eqnarray*}
Y_i & = & \beta X_i + \epsilon_i \\
W_i & = & X_i + e_i
\end{eqnarray*}
where all random variables are normal with expected value zero, $Var(X_i)=\phi>0$, $Var(\epsilon_i)=\psi>0$, $Var(e_i)=\omega>0$ and $\epsilon_i$, $e_i$ and $X_i$ are all independent. The variables $W_i$ and $Y_i$ are observable, while $X_i$ is latent. Error terms are never observable.
\begin{enumerate}
\item What is the parameter vector $\boldsymbol{\theta}$ for this model?
\item Calculate the covariance matrix $\boldsymbol{\Sigma}$ of the observable variables, expressed as a function of the model parameters.
\item Does this model pass the test of the parameter count rule? Answer Yes or No and give the numbers.
\item Are there any points in the parameter space where the parameter $\beta$ is identifiable? Are there infinitely many, or just one point?
\item The usual estimator of $\beta$ is
\begin{displaymath}
\widehat{\beta}_n = \frac{\sum_{i=1}^n W_i Y_i}{\sum_{i=1}^n W_i^2}.
\end{displaymath}
Is $\widehat{\beta}_n$ a consistent estimator of $\beta$? Answer Yes or No and prove your answer.
\end{enumerate}

\item Independently for $i=1, \ldots, n$, let
\begin{eqnarray*}
Y_{i~~} &=& \beta X_i + \epsilon_i \\
V_{i~~} &=& Y_i + e_i \\
W_{i,1} &=& X_i + e_{i,1} \\
W_{i,2} &=& X_i + e_{i,2}
\end{eqnarray*}
where
\begin{itemize}
\item $Y_i$ is a latent variable.
\item $V_i$, $W_{i,1}$ and $W_{i,2}$ are all observable variables.
\item $X_i$ is a normally distributed \emph{latent} variable with mean zero and variance $\phi>0$.
\item $\epsilon_i$ is normally distributed with mean zero and variance $\psi>0$.
\item $e_{i}$ is normally distributed with mean zero and variance $\omega>0$.
\item $e_{i,1}$ is normally distributed with mean zero and variance $\omega_1>0$.
\item $e_{i,2}$ is normally distributed with mean zero and variance $\omega_2>0$.
\item $X_i$, $\epsilon_i$, $e_i$, $e_{i,1}$ and $e_{i,2}$ are all independent of one another.
\end{itemize}
\begin{enumerate}
\item Make a path diagram of this model.
\item What is the parameter vector $\boldsymbol{\theta}$ for this model?
\item Does this problem pass the test of the Parameter Count Rule? Answer Yes or No and give the numbers.
\item Calculate the variance-covariance matrix of the observable variables as a function of the model parameters. Show your work.
\item Is the parameter vector identifiable at every point in the parameter space? Answer Yes or No and prove your answer.
\item Some parameters are identifiable, while others are not. Which ones are identifiable?
\item If $\beta$ (the parameter of main interest) is identifiable, propose a Method of Moments estimator for it and prove that your proposed estimator is consistent.
\item Suppose the sample variance-covariance matrix $\widehat{\Sigma}$ is
\begin{verbatim}
         W1      W2       V
W1    38.53   21.39   19.85
W2    21.39   35.50   19.00
V     19.85   19.00   28.81
\end{verbatim}
Give a reasonable estimate of $\beta$. There is more than one right answer. The answer is a number. (Is this the Method of Moments estimate you proposed? It does not have to be.) \textbf{Circle your answer.}
\end{enumerate}

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item \label{SAS} This is a slight extension of the SAS part of Assignment Four.
Again, students in a first-year Calculus class took a diagnostic test with two parts: Pre-calculus and Calculus. Their High School Calculus marks and their marks in University Calculus were also available. In order, the variables in the data file are: Identification code, Mark in High School Calculus, Score on the Pre-calculus portion of the diagnostic test, Score on the Calculus portion of the diagnostic test, and mark in University Calculus\footnote{Thanks to Cleo Boyd for permission to use these original data.}. Data are available in the file \href{http://www.utstat.toronto.edu/~brunner/data/legal/mathtest.txt} {\texttt{mathtest.data.txt}}.
\begin{enumerate}
\item Using SAS \texttt{proc calis}, first carry out an unconditional regression in which the explanatory variables are Mark in High School Calculus, Score on the Pre-calculus portion of the diagnostic test and Score on the Calculus portion of the diagnostic test. The response variable is mark in University Calculus. All the variables are observable. This is your full, unrestricted model, and it is saturated.
\item You want a single test for whether the two parts of the diagnostic test are useful for predicting university calculus mark. Carry out a likelihood ratio test to do this. What is the null hypothesis? It doesn't matter whether your test statistic is $G^2$ or $\frac{n-1}{n} \, G^2$. Be able to state your conclusions from this test in simple, non-technical language. Your conclusion is something about the usefulness of the diagnostic test. You should be guided by the $\alpha=0.05$ significance level, but \emph{do not mention the significance level in your conclusion}. It's too technical for your boss, who will fire you if you start showing off.
\item In addition to the requested likelihood ratio test, your output file has lots of ``$t$-tests'' (actually, $Z$-tests). Know what null hypotheses they all are testing, but pay attention \emph{only} to the ones based on the unrestricted model.
This is a general rule.
\item What null hypothesis is being tested by the ``Baseline Model Chi-Square?'' Is it the same for the restricted and unrestricted model? Why are the degrees of freedom what they are?
\end{enumerate}

\vspace{10mm}
\noindent
Bring your log file and your list file to the quiz. You may be asked for numbers from your printouts, and you may be asked to hand them in. \textbf{There must be no error messages, and no notes or warnings about invalid data on your log file.}

\end{enumerate}

\end{document}