\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue,
citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 431s15 Assignment Four}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/431s15} {\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s15}}}
\vspace{1 mm}
\end{center}

\noindent The non-computer questions on this assignment are for practice, and will not be handed in. For the SAS part of this assignment (Question~\ref{SAS}), please bring your log file and your output file to the quiz. There may be one or more questions about them, and you may be asked to hand the printouts in with the quiz.

\begin{enumerate}

\item\label{measurementbias} In a study of diet and health, suppose we want to know how much snack food each person eats, and we ``measure'' it by asking a question on a questionnaire. Surely there will be measurement error, and suppose it is of a simple additive nature. But we are pretty sure people under-report how much snack food they eat, so a model like~$W = X + e$ with $E(e)=0$ is hard to defend. Instead, let
\begin{displaymath}
W = \nu + X + e,
\end{displaymath}
where $E(X)=\mu$, $E(e)= 0$, $Var(X)=\sigma^2_X$, $Var(e)=\sigma^2_e$, and $Cov(X,e)=0$. The unknown constant $\nu$ could be called \emph{measurement bias}. Calculate the reliability of $W$ for this model. Is it the same as the expression for reliability given in the text and lecture, or does $\nu\neq 0$ make a difference?
% Lesson: Assuming expected values and intercepts zero does no harm.

\item Continuing Exercise~\ref{measurementbias}, suppose that two measurements of $X$ are available.
\begin{eqnarray}
W_1 & = & \nu_1 + X + e_1 \nonumber \\
W_2 & = & \nu_2 + X + e_2, \nonumber
\end{eqnarray}
where $E(X)=\mu$, $Var(X)=\sigma^2_X$, $E(e_1)=E(e_2)=0$, $Var(e_1)=Var(e_2)=\sigma^2_e$, and $X$, $e_1$ and $e_2$ are all independent. Calculate $Corr(W_1,W_2)$. Does this correlation still equal the reliability even when $\nu_1$ and $\nu_2$ are non-zero?
% Yes. Intercepts don't matter.

\item\label{goldstandard} Let $X$ be a latent variable, $W = X + e_1$ be the usual measurement of $X$ with error, and $G = X+e_2$ be a measurement of $X$ that is deemed ``gold standard,'' but of course it's not completely free of measurement error. It's better than $W$ in the sense that $0 < Var(e_2) < Var(e_1)$. Assume $E(e_1)=E(e_2)=0$, and that $X$, $e_1$ and $e_2$ are all independent. Show that the squared correlation between $W$ and $G$ is strictly \emph{less} than the reliability of $W$. This means that in practice, checking a measurement against an imperfect gold standard will under-estimate its reliability.

\item Suppose the two measurements $W_1$ and $W_2$ share an omitted influence, so that their measurement errors are correlated:
\begin{eqnarray}
W_1 & = & X + e_1 \nonumber \\
W_2 & = & X + e_2, \nonumber
\end{eqnarray}
where $E(X)=\mu$, $Var(X)=\sigma^2_X$, $E(e_1)=E(e_2)=0$, $Var(e_1)=Var(e_2)=\sigma^2_e$, $X$ is independent of $e_1$ and $e_2$, and $Cov(e_1,e_2)=c>0$.
\begin{enumerate}
\item Draw a path diagram of the model.
\item Show that $Corr(W_1,W_2)$ is strictly \emph{greater} than the reliability. This means that in practice, omitted variables will result in over-estimates of reliability. And there are always omitted variables.
\end{enumerate}
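For a quick numerical check on part (b), pick arbitrary values, say $\sigma^2_X = \sigma^2_e = 1$ and $c = \frac{1}{2}$. A direct calculation from the model above gives
\begin{displaymath}
Corr(W_1,W_2) = \frac{3}{4} > \frac{1}{2} = \frac{\sigma^2_X}{\sigma^2_X+\sigma^2_e},
\end{displaymath}
so the correlation between the two measurements overstates the reliability; your general answer should agree with numbers like these.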
\item\label{testlength} A ``test'' consists of $n$ equivalent measurements of a latent variable $X$:
\begin{displaymath}
W_j = X + e_j, \hspace{5mm} j = 1, \ldots, n,
\end{displaymath}
where $E(X)=\mu$, $Var(X)=\sigma^2_X$, $E(e_j)=0$, $Var(e_j)=\sigma^2_e$, and $X, e_1, \ldots, e_n$ are all independent. Let $\overline{W}_n = \frac{1}{n}\sum_{j=1}^n W_j$. Calculate the reliability of $\overline{W}_n$ as a measurement of $X$. What happens as $n \rightarrow \infty$? So other things being equal, longer tests are more reliable.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution with density
\begin{displaymath}
f(x;\theta) = \frac{1}{\theta^{1/2}\sqrt{2\pi}} \, e^{-\frac{x^2}{2\theta}},
\end{displaymath}
where the parameter $\theta>0$. Propose a reasonable estimator for the parameter $\theta$, and use the Law of Large Numbers to show that your estimator is consistent.

\item Let $X_1, \ldots, X_{n_1}$ be a random sample from a distribution with expected value $\mu$ and variance $\sigma^2_x$. Independently of $X_1, \ldots, X_{n_1}$, let $Y_1, \ldots, Y_{n_2}$ be a random sample from a distribution with the same expected value $\mu$ and variance $\sigma^2_y$. Let $T_n= \alpha \overline{X}_{n_1} + (1-\alpha) \overline{Y}_{n_2}$, where $0 \leq \alpha \leq 1$.
\begin{enumerate}
\item Is $T_n$ an unbiased estimator of $\mu$ for any value of $\alpha \in [0,1]$? Answer Yes or No and show your work.
\item Is $T_n$ a consistent estimator of $\mu$ for any value of $\alpha \in [0,1]$, as $n_1$ and $n_2$ both go to infinity? Answer Yes or No and show your work.
\item Find the value of $\alpha$ that minimizes the variance of the estimator $T_n$.
\end{enumerate}
% Always is deliberately misleading. Have confidence!

\item Let $X_1, \ldots, X_n$ be a random sample from a Gamma distribution with $\alpha=\beta=\theta>0$. That is, the density is
\begin{displaymath}
f(x;\theta) = \frac{1}{\theta^\theta \Gamma(\theta)} e^{-x/\theta} x^{\theta-1},
\end{displaymath}
for $x>0$. Let $\widehat{\theta} = \overline{X}_n$. Is $\widehat{\theta}$ a consistent estimator of $\theta$? Answer Yes or No and prove your answer. Hint: If $X$ has a Gamma distribution with parameters $\alpha$ and $\beta$, then $E(X)=\alpha\beta$.

\item \label{varconsistent} Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Prove that the sample variance $S^2=\frac{\sum_{i=1}^n(X_i-\overline{X})^2}{n-1}$ is consistent for $\sigma^2$.

\item \label{covconsistent} Let $(X_1, Y_1), \ldots, (X_n,Y_n)$ be a random sample from a bivariate distribution with $E(X_i)=\mu_x$, $E(Y_i)=\mu_y$, $Var(X_i)=\sigma^2_x$, $Var(Y_i)=\sigma^2_y$, and $Cov(X_i,Y_i)=\sigma_{xy}$. Show that the sample covariance $S_{xy} = \frac{\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})}{n-1}$ is a consistent estimator of $\sigma_{xy}$.

\item \label{randiv} Independently for $i = 1, \ldots, n$, let
\begin{displaymath}
Y_i = \beta X_i + \epsilon_i,
\end{displaymath}
where $E(X_i)=\mu$, $E(\epsilon_i)=0$, $Var(X_i)=\sigma^2_x$, $Var(\epsilon_i)=\sigma^2_\epsilon$, and $\epsilon_i$ is independent of $X_i$. The variables $X_i$ and $Y_i$ are both observable.
\begin{enumerate}
\item Let
\begin{displaymath}
\widehat{\beta}_1 = \frac{\sum_{i=1}^n Y_i}{\sum_{i=1}^n X_i}.
\end{displaymath}
\begin{enumerate}
\item Is $\widehat{\beta}_1$ a consistent estimator of $\beta$? Answer Yes or No and justify your answer.
\item Does it matter if $\mu=0$?
\end{enumerate}
% I said justify because the breakdown at mu=0 just blocks the simple application of the LLN and continuous mapping. But say X and epsilon are both normal. Then betahat1 is beta plus a (constant multiple of) a Cauchy, which establishes a clear NO, at least for the normal case.
\item Let
\begin{displaymath}
\widehat{\beta}_2 = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2}.
\end{displaymath}
\begin{enumerate}
\item Is $\widehat{\beta}_2$ a consistent estimator of $\beta$? Answer Yes or No and prove your answer.
\item Does it matter if $\mu=0$?
\end{enumerate}
\end{enumerate}
\end{enumerate}
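Although the questions above ask for proofs, consistency claims like these can also be explored by simulation. Here is a minimal SAS sketch; the data set name, seed, sample size, normal distributions, and the choices $\beta=1$ and $\mu=1$ are all arbitrary, for illustration only. It computes both estimators from one large sample, and both should come out close to the true $\beta$.
\begin{verbatim}
/* Simulation sketch for the two regression estimators.
   Seed, sample size and distributions are arbitrary choices. */
data consist;
   call streaminit(431);
   beta = 1;
   sumx = 0; sumy = 0; sumxy = 0; sumxsq = 0;
   do i = 1 to 100000;
      x = rand('NORMAL', 1, 1);       /* E(X) = mu = 1  */
      epsilon = rand('NORMAL', 0, 1); /* E(epsilon) = 0 */
      y = beta*x + epsilon;
      sumx = sumx + x;     sumy = sumy + y;
      sumxy = sumxy + x*y; sumxsq = sumxsq + x**2;
   end;
   betahat1 = sumy/sumx;    /* Ratio of sample totals        */
   betahat2 = sumxy/sumxsq; /* Least squares through origin  */
   output;
   keep betahat1 betahat2;
run;

proc print data=consist;
run;
\end{verbatim}
Re-running the sketch with the mean of $X$ set to zero, perhaps with several different seeds, suggests what happens to each estimator in that case.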
\newpage

\item The Laws of Large Numbers we are using in this class assume that independent observations are being averaged.
\begin{enumerate}
\item Does our Law of Large Numbers apply to $\overline{W}_n$ of Question~\ref{testlength}? Answer Yes or No and \emph{say why}.
\item Do we have $\overline{W}_n \stackrel{a.s.}{\rightarrow} \mu$, or $\overline{W}_n \stackrel{a.s.}{\rightarrow} X$? Prove your answer. Hint: If $X_n \stackrel{a.s.}{\rightarrow} X$ and $Y_n \stackrel{a.s.}{\rightarrow} Y$, then the vector $(X_n,Y_n)^\top$ converges almost surely to $(X,Y)^\top$.
\end{enumerate}

\item \label{SAS} Before the beginning of the Fall term, students in a first-year Calculus class took a diagnostic test with two parts: Pre-calculus and Calculus. Their High School Calculus marks and their marks in University Calculus were also available. In order, the variables in the data file are: Identification code, Mark in High School Calculus, Score on the Pre-calculus portion of the diagnostic test, Score on the Calculus portion of the diagnostic test, and Mark in University Calculus\footnote{Thanks to Cleo Boyd for permission to use these original data.}. Data are available in the file \href{http://www.utstat.toronto.edu/~brunner/data/legal/mathtest.txt} {\texttt{mathtest.txt}}.

Using SAS \texttt{proc calis}, carry out an unconditional regression in which the explanatory variables are Mark in High School Calculus, Score on the Pre-calculus portion of the diagnostic test and Score on the Calculus portion of the diagnostic test. The response variable is Mark in University Calculus. All the variables are observable. You are fitting just one model, and it is saturated -- meaning its parameters are one-to-one with those of an unrestricted multivariate normal model.

Bring your log file and your list file to the quiz. You may be asked for numbers from your printouts, and you may be asked to hand them in. There are lots of ``$t$-tests'' (actually, $Z$-tests). Know what null hypotheses they all are testing. \textbf{There must be no error messages, and no notes or warnings about invalid data on your log file.}
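The job might be structured as in the following sketch. The variable names and the \texttt{infile} path here are hypothetical (use whatever names you choose when you read the data), and the syntax assumes a recent release of \texttt{proc calis}; older releases declare variances and covariances with \texttt{std} and \texttt{cov} statements instead.
\begin{verbatim}
/* Sketch only: variable names and the infile path are hypothetical. */
data mathtest;
   infile 'mathtest.txt';   /* Wherever you saved the data file */
   input id hscalc precalc calctest unicalc;
run;

proc calis data=mathtest;
   lineqs
      unicalc = beta1*hscalc + beta2*precalc + beta3*calctest + e1;
run;
\end{verbatim}
In a recent \texttt{proc calis}, the variance of the error term \texttt{e1} and the variances and covariances of the three explanatory variables are free parameters by default; together with \texttt{beta1}--\texttt{beta3}, that gives the ten parameters of the saturated model. If your release fixes any of these quantities instead, declare them explicitly.

\end{enumerate}

\end{document}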