% 431Assignment1.tex
\documentclass[10pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 431s17 Assignment One}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/431s17}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s17}}}
\vspace{1 mm}
\end{center}

\begin{enumerate}

\item In the following, $X$ and $Y$ are scalar random variables, while $a$ and $b$ are fixed constants. For each pair of statements below, one is true and one is false (that is, not true in general). State which one is true, and prove it. Zero marks if you prove both statements are true, even if one of the proofs is correct. Please use only expected value signs in your answers, not integrals or summations. (An illustration of the intended style of argument follows the list.)
\begin{enumerate}
\item $Var(aX) = a Var(X)$, or $Var(aX) = a^2 Var(X)$.
\item $Var(aX+b) = a^2 Var(X) + b^2$, or $Var(aX+b) = a^2 Var(X)$. % Important
\item $Var(a)=0$, or $Var(a)=a^2$.
\item $Cov(aX,bY) = ab\, Cov(X,Y)$, or $Cov(aX,bY) = a^2Var(X)+b^2Var(Y)+2abCov(X,Y)$.
\item $Cov(X+a,Y+b)=Cov(X,Y)+ab$, or $Cov(X+a,Y+b)=Cov(X,Y)$. % Important
\item $Var(aX+bY)=a^2Var(X)+b^2Var(Y)$, or $Var(aX+bY)=a^2Var(X)+b^2Var(Y)+2abCov(X,Y)$.
\end{enumerate}
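For illustration only (this identity is not one of the pairs above), here is a sketch of the intended style of argument, using expected value signs only and writing $\mu = E(X)$:
\begin{displaymath}
Var(X) = E\left[(X-\mu)^2\right] = E\left(X^2 - 2\mu X + \mu^2\right)
       = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.
\end{displaymath}
Each step uses only the linearity of expected value and the fact that the expected value of a constant is that constant.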
\item Let $X_1, \ldots, X_n$ be random variables, let $a_1, \ldots, a_n$ be constants, and let $Y=\sum_{i=1}^n a_iX_i$. Derive a general formula for $Var(Y)$. Show your work; it's a lot easier with the centering rule (see Question~\ref{centeringrule}). Now give the useful special case that applies when $X_1, \ldots, X_n$ are independent.

\item Let $X_1, \ldots, X_n$ be independent and identically distributed random variables (the standard model of a random sample with replacement). Denoting $E(X_i)$ by $\mu$ and $Var(X_i)$ by $\sigma^2$,
\begin{enumerate}
\item Show $E(\overline{X})=\mu$; that is, the sample mean is unbiased for $\mu$.
\item Find $Var(\overline{X})$.
\item Letting $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2$, show that $E(S^2)=\sigma^2$. That is, the sample variance is an unbiased estimator of the population variance. Hint: consider adding and subtracting $\mu$.
\end{enumerate}

\item In the following regression model, the explanatory variables are random. Independently for $i=1, \ldots, n$, let $Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i$, where $E(X_{i,1})=\mu_1$, $E(X_{i,2})=\mu_2$, $E(\epsilon_i)=0$, $Var(\epsilon_i)=\sigma^2$, $\epsilon_i$ is independent of both $X_{i,1}$ and $X_{i,2}$, and
\begin{displaymath}
cov\left( \begin{array}{c} X_{i,1} \\ X_{i,2} \end{array} \right) =
\left( \begin{array}{c c}
\phi_{11} & \phi_{12} \\
\phi_{12} & \phi_{22}
\end{array} \right).
\end{displaymath}
%
\begin{enumerate}
\item What is $Var(Y_i)$? You may be able to just write down the answer.
\item What is $Cov(X_{i,1},Y_i)$? Show your work. The centering rule definitely helps; see Question~\ref{centeringrule}.
\item What is $Cov(X_{i,2},Y_i)$?
\end{enumerate}

% \pagebreak
\item Let
\begin{displaymath}
\mathbf{x}_i = \left(\begin{array}{c} x_{i,1} \\ \vdots \\ x_{i,p} \end{array}\right)
\mbox{ and }
\overline{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i =
\left(\begin{array}{c} \overline{x}_1 \\ \vdots \\ \overline{x}_p \end{array}\right).
\end{displaymath}
Let the $p \times p$ matrix $\boldsymbol{\widehat{\Sigma}} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf{x}}) (\mathbf{x}_i-\overline{\mathbf{x}})^\top$. Give a \emph{scalar} formula for element $(2,3)$ of $\boldsymbol{\widehat{\Sigma}}$. If you get stuck, an example with $p=3$ should help.

\item Let the $p \times 1$ random vector $\mathbf{X}$ have mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$, let $\mathbf{A}$ be an $r \times p$ matrix of constants, and let $\mathbf{c}$ be an $r \times 1$ vector of constants. Starting from the definition of a covariance matrix on the formula sheet, find $cov(\mathbf{AX}+\mathbf{c})$. Show your work.

\item Let $\mathbf{X}$ and $\mathbf{Y}$ be random vectors, and let $\mathbf{A}$ and $\mathbf{B}$ be matrices of constants. Starting from the definition on the formula sheet, find $cov(\mathbf{AX},\mathbf{BY})$. Show your work. Of course we are assuming that the matrices are of the right sizes.

\item \label{centeringrule} This question takes you through the useful \emph{centering rule}; see the lecture slides or Appendix~A of the text. Denote the centered version of the random vector $\mathbf{X}$ by $\stackrel{c}{\mathbf{X}} = \mathbf{X} - \boldsymbol{\mu}_x$. Most of the following are very quick. (A scalar illustration follows the list.)
\begin{enumerate}
\item Show $E(\stackrel{c}{\mathbf{X}})=\mathbf{0}$.
\item Show $cov(\stackrel{c}{\mathbf{X}}) = cov(\mathbf{X})$.
\item Let $\mathbf{L} = \mathbf{A}_1\mathbf{X}_1 + \cdots + \mathbf{A}_m\mathbf{X}_m + \mathbf{b}$, where the $\mathbf{A}_j$ are matrices of constants, and $\mathbf{b}$ is a vector of constants. Show $\stackrel{c}{\mathbf{L}} = \mathbf{A}_1 \stackrel{c}{\mathbf{X}}_1 + \cdots + \mathbf{A}_m \stackrel{c}{\mathbf{X}}_m$.
\item Show $cov(\mathbf{L}) = E(\stackrel{c}{\mathbf{L}}\stackrel{c}{\mathbf{L}} \stackrel{\top}{\vphantom{r}})$.
\item Show $cov(\mathbf{L}_1,\mathbf{L}_2) = E(\stackrel{c}{\mathbf{L}}_1\,\stackrel{c}{\mathbf{L}} \stackrel{\top}{\vphantom{r}_2})$.
\end{enumerate}
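For illustration only (this identity is not one of the parts above), the same kind of centered calculation in the scalar case yields the familiar computing formula for covariance. Writing $\mu_x = E(X)$ and $\mu_y = E(Y)$,
\begin{displaymath}
Cov(X,Y) = E\left[(X-\mu_x)(Y-\mu_y)\right]
         = E(XY) - \mu_y E(X) - \mu_x E(Y) + \mu_x\mu_y
         = E(XY) - E(X)E(Y).
\end{displaymath}
The matrix versions in this question work the same way, with transposes in the appropriate places.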
\item High School History classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test, and the median score of each class is recorded. Please consider a regression model with these variables:
\begin{itemize}
\item[$X_1$] Equals 1 if the class uses the discovery-oriented curriculum, and equals 0 if the class uses the memory-oriented curriculum.
\item[$X_2$] Average parents' education for the classroom.
\item[$X_3$] Average family income for the classroom.
\item[$X_4$] Number of university History courses taken by the teacher.
\item[$X_5$] Teacher's final cumulative university grade point average.
\item[$Y$] Class median score on the standardized history test.
\end{itemize}
The full regression model (as opposed to the reduced models for various null hypotheses) implies
\begin{displaymath}
E[Y|X] = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5.
\end{displaymath}
For each question below, please give the null hypothesis in terms of $\beta$ values, and give $E[Y|X]$ for the reduced model you would use to answer the question. Don't re-number the variables.
\begin{enumerate}
\item If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word ``affect''?)
\item Controlling for parents' education and income and for curriculum type, is the teacher's university background (two variables) related to the students' test performance?
\item Controlling for teacher's university background and for curriculum type, are parents' education and family income (considered simultaneously) related to students' test performance?
\item Controlling for curriculum type, teacher's university background and parents' education, is family income related to students' test performance?
\item Here is one final question. Assuming that $X_1, \ldots, X_5$ are random variables (and I hope you agree that they are), would you expect $X_1$ to be related to the other explanatory variables? Would you expect the other explanatory variables to be related to each other?
\end{enumerate}

\end{enumerate}

\end{document}