% 302f14Assignment2.tex MGFs and one regression through the origin
\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue,
citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 302f15 Assignment Two}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent These problems are preparation for the quiz in tutorial, and are not to be handed in. Starting with Problem~\ref{mgfstart}, you can play a little game. Try not to do the same work twice. Instead, use results of earlier problems whenever possible.

\begin{enumerate}

%%%%%%%%%%%%%%%% Matrices

\item In \emph{Linear models in statistics}, do problems 2.6 (a,c,d), 2.7, 2.17, 2.18, 2.20, 2.23, 2.24 and 2.25. The answers in the back of the book are helpful. The answer to 2.24 is incomplete; consider all four cases.

\item Let
\begin{tabular}{ccc}
$\mathbf{A} = \left( \begin{array}{c c} 1 & 2 \\ 2 & 4 \end{array} \right)$ &
$\mathbf{B} = \left( \begin{array}{c c} 0 & 2 \\ 2 & 1 \end{array} \right)$ &
$\mathbf{C} = \left( \begin{array}{c c} 2 & 0 \\ 1 & 2 \end{array} \right)$
\end{tabular}
\begin{enumerate}
\item Calculate $\mathbf{AB}$ and $\mathbf{AC}$.
\item Do we have $\mathbf{AB} = \mathbf{AC}$?
\item Prove $\mathbf{B} = \mathbf{C}$. Show your work.
\end{enumerate}
The idea for this problem comes from Example 2.4b (page 21) in the textbook. Also see Problem 2.25 in the text.

%%%%%%%%%%%%%%%% Least squares

\item \label{LS} Sometimes, you want the least squares line to go through the origin, so that the predicted $Y$ automatically equals zero when $x=0$. For example, suppose the cases are $n$ half-kilogram batches of rice purchased from grocery stores. The independent variable $x$ is the concentration of arsenic in the rice before washing, and the dependent variable $Y$ is the concentration of arsenic after washing. Discounting the very unlikely possibility that arsenic contamination can happen \emph{during} washing, you want to use your knowledge that zero arsenic before washing implies zero arsenic after washing. You will use your knowledge by building it into the statistical model. Accordingly, let $Y_i = \beta x_i + \epsilon_i$ for $i=1, \ldots, n$, where $\epsilon_1, \ldots, \epsilon_n$ are a random sample (that is, independent and identically distributed) from a distribution with expected value zero and variance $\sigma^2$, and $\beta$ and $\sigma^2$ are unknown constants. The numbers $x_1, \ldots, x_n$ are known, observed constants.
\begin{enumerate}
\item What is $E(Y_i)$?
\item What is $Var(Y_i)$?
\item Find the least squares estimate of $\beta$ by minimizing the function
\begin{displaymath}
Q(\beta)=\sum_{i=1}^n(Y_i-\beta x_i)^2
\end{displaymath}
over all values of $\beta$. Let $\widehat{\beta}$ denote the point at which $Q(\beta)$ is minimal.
\item Give the equation of the least-squares line. Of course it's the \emph{constrained} least-squares line, passing through $(0,0)$. \pagebreak
\item \label{numbers} Calculate $\widehat{\beta}$ for the following data set. Your answer is a number. Bring a calculator to the quiz in case you have to do something like this.
\begin{verbatim}
x    0.0   1.3   3.2  -2.5  -4.6  -1.6   4.5   3.8
y   -0.8  -1.3   7.4  -5.2  -6.5  -4.9   9.9   7.2
\end{verbatim}
% answer: 1.8885
\item Recall that a statistic is an \emph{unbiased estimator} of a parameter if the expected value of the statistic is equal to the parameter. Is $\widehat{\beta}$ an unbiased estimator of $\beta$? Answer Yes or No and show your work.
\item What is $Var(\widehat{\beta})$? Show your work.
\end{enumerate}

%%%%%%%%%%%%%%%% Stats

\item Let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, so that $T = \frac{\sqrt{n}(\overline{Y}-\mu)}{S} \sim t(n-1)$. This is something you don't need to prove, for now.
\begin{enumerate}
\item Derive a $(1-\alpha)100\%$ confidence interval for $\mu$. ``Derive'' means show all the high school algebra. Use the symbol $t_{\alpha/2}$ for the number satisfying $Pr(T>t_{\alpha/2})= \alpha/2$.
\item \label{ci} A random sample with $n=23$ yields $\overline{Y} = 2.57$ and a sample variance of $S^2=5.85$. Using the critical value $t_{0.025}=2.07$, give a 95\% confidence interval for $\mu$. The answer is a pair of numbers.
\item Test $H_0: \mu=3$ at $\alpha=0.05$.
\begin{enumerate}
\item Give the value of the $T$ statistic. The answer is a number.
\item State whether you reject $H_0$, Yes or No.
\item Can you conclude that $\mu$ is different from 3? Answer Yes or No.
\item If the answer is Yes, state whether $\mu>3$ or $\mu<3$. Pick one.
\end{enumerate}
\end{enumerate}

% \newpage
%%%%%%%%%%%%%%%%%%%%%%%%% MGF %%%%%%%%%%%%%%%%%%%%%%%%%

\item \label{mgfstart} Denote the moment-generating function of a random variable $X$ by $M_X(t)$. The moment-generating function is defined by $M_X(t) = E(e^{Xt})$.
\begin{enumerate}
\item Let $a$ be a constant. Prove that $M_{aX}(t) = M_X(at)$.
\item Prove that $M_{X+a}(t) = e^{at}M_X(t)$.
\item Let $X_1, \ldots, X_n$ be \emph{independent} random variables. Prove that
\begin{displaymath}
M_{\sum_{i=1}^n X_i}(t) = \prod_{i=1}^n M_{X_i}(t).
\end{displaymath}
For convenience, you may assume that $X_1, \ldots, X_n$ are all continuous, so you will integrate.
\end{enumerate}

\item Recall that if $X\sim N(\mu,\sigma^2)$, it has moment-generating function $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2t^2}$.
\begin{enumerate}
\item Let $X\sim N(\mu,\sigma^2)$ and $Y=aX+b$, where $a$ and $b$ are constants. Find the distribution of $Y$. Show your work.
\item Let $X\sim N(\mu,\sigma^2)$ and $Z = \frac{X-\mu}{\sigma}$. Find the distribution of $Z$. Show your work.
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Find the distribution of $Y = \sum_{i=1}^nX_i$. Show your work.
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Find the distribution of the sample mean $\overline{X}$. Show your work.
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Find the distribution of $Z = \frac{\sqrt{n}(\overline{X}-\mu)}{\sigma}$. Show your work.
\item Let $X_1, \ldots, X_n$ be independent random variables, with $X_i \sim N(\mu_i,\sigma_i^2)$. Let $a_1, \ldots, a_n$ be constants. Find the distribution of $Y = \sum_{i=1}^n a_iX_i$. Show your work.
\end{enumerate}

\item For the model of Question \ref{LS}, suppose that the $\epsilon_i$ are normally distributed, which is the usual assumption. What is the distribution of $Y_i$? What is the distribution of $\widehat{\beta}$? You should be able to just write down the answers based on your earlier work.
\item A Chi-squared random variable $X$ with parameter $\nu>0$ has moment-generating function $M_X(t) = (1-2t)^{-\nu/2}$.
\begin{enumerate}
\item Let $X_1, \ldots, X_n$ be independent random variables with $X_i \sim \chi^2(\nu_i)$ for $i=1, \ldots, n$. Find the distribution of $Y = \sum_{i=1}^n X_i$.
\item Let $Z \sim N(0,1)$. Find the distribution of $Y=Z^2$. For this one, you need to integrate. Recall that the density of a normal random variable is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. You will still use moment-generating functions.
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Find the distribution of $Y = \frac{1}{\sigma^2} \sum_{i=1}^n\left(X_i-\mu \right)^2$.
\item Let $Y=X_1+X_2$, where $X_1$ and $X_2$ are independent, $X_1\sim\chi^2(\nu_1)$ and $Y\sim\chi^2(\nu_1+\nu_2)$, where $\nu_1$ and $\nu_2$ are both positive. Show $X_2\sim\chi^2(\nu_2)$.
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Show
\begin{displaymath}
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),
\end{displaymath}
where $S^2 = \frac{\sum_{i=1}^n\left(X_i-\overline{X} \right)^2 }{n-1}$. Hint: $\sum_{i=1}^n\left(X_i-\mu \right)^2 = \sum_{i=1}^n\left(X_i-\overline{X} + \overline{X} - \mu \right)^2 = \ldots$ For this question, you may use the independence of $\overline{X}$ and $S^2$ without proof. We will prove it later.
\end{enumerate}

\end{enumerate}

\vspace{25mm}

\noindent
\begin{center}\begin{tabular}{l}
\hspace{6in} \\ \hline
\end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/302f15}{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/302f15}}

\end{document}

=======================================================
% t-test question
=======================================================

> set.seed(2222)
> x = rnorm(23,2,2)
> t.test(x,mu=3)

	One Sample t-test

data:  x
t = -0.8458, df = 22, p-value = 0.4068
alternative hypothesis: true mean is not equal to 3
95 percent confidence interval:
 1.527200 3.619482
sample estimates:
mean of x 
 2.573341 

> s2 = round(var(x),2); s2
[1] 5.85
> xbar = round(mean(x),2); xbar
[1] 2.57
> t = sqrt(23)*(xbar-3)/sqrt(s2); t
[1] -0.8526179
> cv = round(qt(0.975,22),2); cv
[1] 2.07
> xbar - cv*sqrt(s2/23)
[1] 1.526039
> xbar + cv*sqrt(s2/23)
[1] 3.613961

=== Check data with R for simple regression through the origin ===

> rbind(x,y)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
x  0.0  1.3  3.2 -2.5 -4.6 -1.6  4.5  3.8
y -0.8 -1.3  7.4 -5.2 -6.5 -4.9  9.9  7.2
> summary(lm(y ~ -1+x))

Call:
lm(formula = y ~ -1 + x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7550 -1.0696 -0.2275  1.3680  2.1871 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x   1.8885     0.2248   8.402 6.66e-05 ***

=== Check lovely matrix example with R ===

A = rbind( c(1,2), c(2,4) )
B = rbind( c(0,2), c(2,1) )
C = rbind( c(2,0), c(1,2) )
# Verify AB = AC
A %*% B
A %*% C

> # Verify AB = AC
> A %*% B
     [,1] [,2]
[1,]    4    4
[2,]    8    8
> A %*% C
     [,1] [,2]
[1,]    4    4
[2,]    8    8

===== These were cut from the regression through origin question.
\item Let $\widehat{\beta}_2 = \frac{\overline{Y}_n}{\overline{x}_n}$. Is $\widehat{\beta}_2$ also unbiased for $\beta$? Answer Yes or No and show your work.

\item What is $Var(\widehat{\beta}_2)$? Show your work, what little there is of it.

\item Calculate $\widehat{\beta}_2$ for the data of Question~\ref{numbers}. The answer is a number.

\item \emph{This last part is a challenge for your entertainment. It will not be on the quiz.} Prove that $\widehat{\beta}$ is a more accurate estimator than $\widehat{\beta}_{2}$ in the sense that it has smaller variance. Hint: The sample variance of the independent variable values cannot be negative.
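
=== Possible R check for the cut beta2 items ===

A minimal sketch, not part of the original notes. It assumes the closed forms the
problems ask students to derive: betahat = sum(x*y)/sum(x^2) (which agrees with the
lm() check above and the answer 1.8885), beta2 = ybar/xbar as defined in the cut
items, and the variances Var(betahat) = sigma^2/sum(x^2), Var(beta2) = sigma^2/(n*xbar^2).

x = c(0.0, 1.3, 3.2, -2.5, -4.6, -1.6, 4.5, 3.8)
y = c(-0.8, -1.3, 7.4, -5.2, -6.5, -4.9, 9.9, 7.2)
n = length(x)

betahat = sum(x*y) / sum(x^2)    # constrained least squares estimate
beta2   = mean(y) / mean(x)      # alternative estimator from the cut items
c(betahat = betahat, beta2 = beta2)

# Compare the sigma^2 multipliers of the two variances.
# sum(x^2) >= n*xbar^2 because the sample variance of x cannot be negative,
# so the multiplier for betahat is never larger than the one for beta2.
c(var.factor.betahat = 1/sum(x^2), var.factor.beta2 = 1/(n*mean(x)^2))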