% 431Assignment1.tex Review
\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 431s23 Assignment One}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/brunner/oldclass/431s23} {\small\texttt{http://www.utstat.toronto.edu/brunner/oldclass/431s23}}}
\vspace{1 mm}
\end{center}

\noindent \emph{These problems are not to be handed in. They are practice for the Quiz on Friday January 20.}

\begin{enumerate}

\item Look at the formula sheet and answer the following True or False for general random variables $X$ and $Y$.
\begin{enumerate}
\item $Var(X)=0$.
\item $Cov(X,Y) = Cov(Y,X)$.
\item $Cov(X,X) = Var(X)$.
\end{enumerate}

\item The discrete random variables $X$ and $Y$ have joint distribution
\begin{center}
\begin{tabular}{c|ccc}
      & $x=1$  & $x=2$  & $x=3$ \\ \hline
$y=1$ & $3/12$ & $1/12$ & $3/12$ \\
$y=2$ & $1/12$ & $3/12$ & $1/12$ \\
\end{tabular}
\end{center}
\begin{enumerate}
\item What is the marginal distribution of $X$? List the values with their probabilities.
\item What is the marginal distribution of $Y$? List the values with their probabilities.
\item Calculate $E(X)$. Show some work.
\item Calculate $E(Y)$. Show some work.
\item Let $Z = g(X,Y) = XY$. What is the probability distribution of $Z$? List the values with their probabilities. Show some work.
\item Calculate $E(Z)= E(XY)$. Show your work.
\item Do we have $E(XY) = E(X)E(Y)$? Answer Yes or No.
\item Using the well-known formula $Cov(X,Y)=E(XY)-E(X)E(Y)$ (to be proved later), what is $Cov(X,Y)$?
\item Are $X$ and $Y$ independent? Answer Yes or No and show your work. Note that for discrete random variables, $X$ and $Y$ independent means $P(X=x,Y=y) = P(X=x) \, P(Y=y)$ for all real $x$ and $y$. So to prove independence, you need to establish it for all $x$ and $y$, while to prove lack of independence, you only need to find one exception.
\end{enumerate}

\item \label{prod} Let $X$ and $Y$ be \emph{continuous} random variables that are independent, meaning $f_{x,y}(x,y) = f_x(x) \, f_y(y)$ for all real $x$ and $y$. Using the expression for $E(g(\mathbf{x}))$ on the formula sheet, show $E(XY) = E(X)E(Y)$. Draw an arrow to the place in your answer where you use independence, and write ``This is where I use independence.'' Because $X$ and $Y$ are continuous, you will need to integrate. Does your proof still apply if $X$ and $Y$ are discrete?

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item This question clarifies the meaning of $E(a)$ and $Var(a)$ when $a$ is a constant.
\begin{enumerate}
\item Let $X$ be a discrete random variable with $P(X=a)=1$ (later we will call this a \emph{degenerate} random variable). Using the definitions on the formula sheet, calculate $E(X)$ and $Var(X)$. % This is the real meaning of the concept.
\item Let $a$ be a real constant and $X$ be a continuous random variable with density $f(x)$. Let $Y = g(X) = a$. Using the formula for $E(g(X))$ on the formula sheet, calculate $E(Y)$ and $Var(Y)$. This reminds us that the change of variables formula (which is a very big theorem) applies to the case of a constant function.
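As an aside (not part of any problem), hand calculations with a small discrete joint distribution, such as the table in Problem 2 above, can be double-checked with exact rational arithmetic so that nothing is lost to rounding. A minimal sketch, assuming Python is at hand:

```python
# Sanity check of joint-distribution calculations like those in Problem 2,
# done with exact fractions rather than floating point.
from fractions import Fraction

F = Fraction
# joint[(x, y)] = P(X = x, Y = y), copied from the table in Problem 2
joint = {(1, 1): F(3, 12), (2, 1): F(1, 12), (3, 1): F(3, 12),
         (1, 2): F(1, 12), (2, 2): F(3, 12), (3, 2): F(1, 12)}
assert sum(joint.values()) == 1  # the probabilities add to one

# Marginal distributions: sum the joint probabilities over the other variable
px = {x: sum(joint[(x, y)] for y in (1, 2)) for x in (1, 2, 3)}
py = {y: sum(joint[(x, y)] for x in (1, 2, 3)) for y in (1, 2)}

# Expected values and E(XY), straight from the definitions
EX = sum(x * p for x, p in px.items())
EY = sum(y * p for y, p in py.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
print(px, py, EX, EY, EXY, EXY - EX * EY)
```

The same few lines adapt to any small joint table; only the `joint` dictionary changes.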
\end{enumerate}
% See 2016 for another version of this question.

\item \label{handy} Using the definitions of variance and covariance along with the linear property $E(\sum_{i=1}^na_iY_i) = \sum_{i=1}^na_iE(Y_i)$ (no integrals), show the following:
\begin{enumerate}
\item \label{handyA} $Var(Y) = E(Y^2)-\left( E(Y) \right)^2$
\item \label{handyB} $Cov(X,Y)=E(XY)-E(X)E(Y)$
\item If $X$ and $Y$ are independent, $Cov(X,Y) = 0$. Of course you may use Problem~\ref{prod}.
\end{enumerate}

\item In the following, $X$ and $Y$ are scalar random variables, while $a$ and $b$ are fixed constants. For each pair of statements below, one is true and one is false (that is, not true in general). State which one is true, and prove it. Zero marks if you prove both statements are true, even if one of the proofs is correct. Please use only expected value signs in your answers, not integrals or summations.
\begin{enumerate}
\item $ Var(aX) = a Var(X)$, or $ Var(aX) = a^2 Var(X)$.
\item $ Var(aX+b) = a^2 Var(X) + b^2$, or $ Var(aX+b) = a^2 Var(X)$. % Important
\item $ Var(a)=0$, or $ Var(a)=a^2$.
\item $Cov(aX,bY) = ab\, Cov(X,Y)$, or $Cov(aX,bY) = a^2Var(X)+b^2Var(Y)+2abCov(X,Y)$.
\item $Cov(X+a,Y+b)=Cov(X,Y)+ab$, or $Cov(X+a,Y+b)=Cov(X,Y)$. % Important
\item $Var(X+Y)=Var(X)+Var(Y)$, or $Var(X+Y)=Var(X)+Var(Y)+2 \, Cov(X,Y)$.
\end{enumerate}

\item Let $E(X)=\mu_x$, $E(Y)=\mu_y$, and $E(Z)=\mu_z$. Show $Cov(X,Y+Z) = Cov(X,Y) + Cov(X,Z)$.

\item Show $Cov(X_1+X_2,Y_1+Y_2) = Cov(X_1,Y_1) + Cov(X_1,Y_2) +Cov(X_2,Y_1) +Cov(X_2,Y_2)$. There is a generalization of this fact on the formula sheet. Can you find it?

\item Let $X$ and $Y$ be random variables, with $E(X)=\mu_x$, $E(Y)=\mu_y$, $Var(X)=\sigma^2_x$, $Var(Y)=\sigma^2_y$, and $Cov(X,Y) = \sigma_{xy}$. Let $a$ be a non-zero constant. Find $Corr(aX,Y)$. Do not forget that $a$ could be negative. Use $sign(a)$ to denote the sign of $a$. Show your work.
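As an aside, the identities above lend themselves to a quick empirical check. The following sketch (assuming Python is available; it is not part of the assignment) simulates correlated pairs and confirms two of them numerically: $Var(aX+b) = a^2 Var(X)$, and $Corr(aX,Y) = sign(a)\,Corr(X,Y)$.

```python
# Empirical check of Var(aX + b) = a^2 Var(X) and
# Corr(aX, Y) = sign(a) Corr(X, Y), using simulated data.
import random

random.seed(431)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # correlated with the xs

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(vi - m) ** 2 for vi in v])

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(ui - mu) * (vi - mv) for ui, vi in zip(u, v)])

def corr(u, v):
    return cov(u, v) / (var(u) * var(v)) ** 0.5

a, b = -3.0, 7.0
# The additive constant b drops out of the variance entirely.
print(var([a * x + b for x in xs]), a ** 2 * var(xs))
# With a < 0, multiplying X by a flips the sign of the correlation.
print(corr([a * x for x in xs], ys), -corr(xs, ys))
```

Both pairs of printed numbers agree up to floating-point rounding; in fact these two identities hold exactly for the sample versions of variance and covariance as well, so the agreement does not depend on $n$ being large.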
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item Let $y_1, \ldots, y_n$ be numbers (not necessarily random variables), and $\overline{y}=\frac{1}{n}\sum_{i=1}^ny_i$. Show
\begin{enumerate}
\item $\sum_{i=1}^n(y_i-\overline{y})=0$
\item $\sum_{i=1}^n(y_i-\overline{y})^2=\sum_{i=1}^ny_i^2 \,-\, n\overline{y}^2$
\end{enumerate}

\item Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be numbers, with $\overline{x}=\frac{1}{n}\sum_{i=1}^nx_i$ and $\overline{y}=\frac{1}{n}\sum_{i=1}^ny_i$. Show $\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y}) = \sum_{i=1}^n x_iy_i \,-\, n\overline{x} \, \overline{y}$.

\item Let $Y_1, \ldots, Y_n$ be \emph{independent} random variables with $E(Y_i)=\mu$ and $Var(Y_i)=\sigma^2$ for $i=1, \ldots, n$. For this question, please use definitions and familiar properties of expected value, not integrals or sums.
\begin{enumerate}
\item Find $E(\sum_{i=1}^nY_i)$. Are you using independence?
\item Find $Var\left(\sum_{i=1}^n Y_i\right)$. What earlier questions are you using in connection with independence?
\item Using your answer to the last question, find $Var(\overline{Y})$.
\item A statistic $T$ is an \emph{unbiased estimator} of a parameter $\theta$ if $E(T)=\theta$. Show that $\overline{Y}$ is an unbiased estimator of $\mu$.
\item Let $a_1, \ldots, a_n$ be constants and define the linear combination $L$ by $L = \sum_{i=1}^n a_i Y_i$. What condition on the $a_i$ values makes $L$ an unbiased estimator of $\mu$? Show your work.
\item Is $\overline{Y}$ a special case of $L$? If so, what are the $a_i$ values?
\item What is $Var(L)$?
\end{enumerate}

\item Let $X_1, \ldots, X_n$ be independent and identically distributed random variables (the standard model of a random sample with replacement). Denote $E(X_i)$ by $\mu$ and $Var(X_i)$ by $\sigma^2$.
\begin{enumerate}
% \item Show $E(\overline{X})=\mu$; that is, the sample mean is unbiased for $\mu$.
\item Letting $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2$, show that $E(S^2)=\sigma^2$. That is, the sample variance is an unbiased estimator of the population variance. Consider adding and subtracting $\mu$.
\item Find $Cov(X_j,\overline{X})$. Show the calculation.
\item Find $Cov(\overline{X}, X_j-\overline{X})$. Show the calculation.
\end{enumerate}

\item The pairs of random variables $(X_1, Y_1), \ldots, (X_n, Y_n)$ are a random sample from a bivariate distribution with $E(X_i) = \mu_x$, $E(Y_i) = \mu_y$, $Var(X_i) = \sigma^2_x$, $Var(Y_i) = \sigma^2_y$, and $Cov(X_i, Y_i) = \sigma_{xy}$. Because $(X_1, Y_1), \ldots, (X_n, Y_n)$ are a random sample, the pairs are independent of one another. However, $X_i$ and $Y_i$ are not necessarily independent of each other, because $\sigma_{xy}$ might not equal zero.
\begin{enumerate}
\item Find $Cov(\overline{X},\overline{Y})$. Show the calculation.
\item Find $Corr(\overline{X},\overline{Y})$. Show the calculation.
\end{enumerate}

\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item Let
\begin{eqnarray*}
Y_1 & = & \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \epsilon_1 \\
Y_2 & = & \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon_2,
\end{eqnarray*}
where $E(X_1)= \mu_1$, $E(X_2)= \mu_2$, $Var(X_1)=\phi_{11}$, $Var(X_2)=\phi_{22}$, $Cov(X_1,X_2)=\phi_{12}$, $E(\epsilon_1)=E(\epsilon_2)= 0$, $Var(\epsilon_1)=\sigma_{11}$, $Var(\epsilon_2)=\sigma_{22}$, $Cov(\epsilon_1,\epsilon_2)=\sigma_{12}$, and also, $X_1$ and $X_2$ are independent of $\epsilon_1$ and $\epsilon_2$. Calculate $Cov(Y_1,Y_2)$. Show your work. My answer is $\alpha_1\beta_1\phi_{11} + \alpha_2\beta_2\phi_{22} + (\alpha_1\beta_2+\alpha_2\beta_1)\phi_{12} + \sigma_{12}$.

\item High School History classes from across Ontario are randomly assigned to either a discovery-oriented or a memory-oriented curriculum in Canadian history. At the end of the year, the students are given a standardized test and the median score of each class is recorded.
Please consider a regression model with these variables:
\begin{itemize}
\item[$X_1$] Equals 1 if the class uses the discovery-oriented curriculum, and equals 0 if the class uses the memory-oriented curriculum.
\item[$X_2$] Average parents' education for the classroom.
\item[$X_3$] Average family income for the classroom.
\item[$X_4$] Number of university History courses taken by the teacher.
\item[$X_5$] Teacher's final cumulative university grade point average.
\item[$Y$] Class median score on the standardized history test.
\end{itemize}
The full regression model (as opposed to the reduced models for various null hypotheses) implies
\begin{displaymath}
E[Y|\mathbf{x}] = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5.
\end{displaymath}
For each question below, please give the null hypothesis in terms of $\beta$ values. The terms Controlling, Correcting, Holding constant and Adjusting all mean the same thing.
\begin{enumerate}
\item If you control for parents' education and income and for teacher's university background, does curriculum type affect test scores? (And why is it okay to use the word ``affect''?)
\item Correcting for parents' education and income and for curriculum type, is teacher's university background (two variables) related to their students' test performance?
\item Holding teacher's university background and curriculum type constant, are parents' education and family income (considered simultaneously) related to students' test performance?
\item Adjusting for curriculum type, teacher's university background and parents' education, is parents' income related to students' test performance?
\item Here is one final question. Assuming that $X_1, \ldots, X_5$ are random variables (and I hope you agree that they are), would you expect $X_1$ to be related to the other explanatory variables? Would you expect the other explanatory variables to be related to each other?
\end{enumerate}

\end{enumerate}

\end{document}

% This was first
\item The discrete random variable $X$ has probability mass function $p(x) = |x|/20$ for $x = -4, \ldots, 4$ and zero otherwise. Let $Y=X^2-1$.
\begin{enumerate}
\item What is $E(X)$? The answer is a number. Show some work. % zero
\item Calculate the variance of $X$. The answer is a number. My answer is 10.
\item What is $P(Y=8)$? My answer is 0.30.
\item What is $P(Y=-1)$? My answer is zero.
\item What is $P(Y=-4)$? My answer is zero.
\item What is the probability distribution of $Y$? Give the $y$ values with their probabilities.
\begin{verbatim}
   y    0    3    8    15
 p(y)  0.1  0.2  0.3  0.4
\end{verbatim}
\item What is $E(Y)$? The answer is a number. My answer is 9.
\item What is $Var(Y)$? The answer is a number. My answer is 30.
\end{enumerate}
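% The stated answers above can be verified exactly with a short script.
% A minimal sketch (an aside, assuming Python is at hand):

```python
# Exact verification of the stated answers to the extra problem above:
# p(x) = |x|/20 for x = -4, ..., 4, and Y = X^2 - 1.
from fractions import Fraction

p = {x: Fraction(abs(x), 20) for x in range(-4, 5)}
assert sum(p.values()) == 1  # a genuine probability mass function

EX = sum(x * px for x, px in p.items())                    # 0, by symmetry
VarX = sum(x ** 2 * px for x, px in p.items()) - EX ** 2   # 10

# Distribution of Y = X^2 - 1: collect the probability of each y value
pY = {}
for x, px in p.items():
    y = x ** 2 - 1
    pY[y] = pY.get(y, Fraction(0)) + px

EY = sum(y * py for y, py in pY.items())                   # 9
VarY = sum(y ** 2 * py for y, py in pY.items()) - EY ** 2  # 30
print(EX, VarX, pY[8], EY, VarY)  # 0 10 3/10 9 30
```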