\documentclass[10pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2101/442 Assignment Six}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent
Please bring two \emph{separate} R printouts to the quiz. \emph{Your printouts must \emph{not} contain answers to the non-computer parts of this assignment.} The non-computer questions are just practice for the quiz, and will not be handed in.
%, though you may use R as a calculator.
% Bring a real calculator to the quiz.
Note that the
\href{http://www.utstat.toronto.edu/~brunner/appliedf13/formulas/2101f13Formulas1.pdf}
{formula sheet} will be displayed with Quiz 6. See the link on the course home page in case the one in this document does not work.

\begin{enumerate}

\item You probably should do one Fisher Information exercise by hand. This example is very standard, but informative. Among other things, it shows when you do and when you do \emph{not} divide by $n$ to get asymptotic variances. Let $Y_1, \ldots, Y_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. You may use the well-known maximum likelihood estimators $\widehat{\mu}= \overline{Y}$ and $\widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (Y_i-\overline{Y})^2$ without deriving them.
\begin{enumerate}
\item \label{varmean} What is $Var(\overline{Y})$? Just write down the answer.
\item Calculate the Fisher Information matrix $\boldsymbol{\mathcal{I}}(\boldsymbol{\theta})$. It is a function of $\mu$ and $\sigma^2$.
\item What is $\boldsymbol{\mathcal{I}}(\mu,\sigma^2)^{-1}$?
\item Recalling $\sqrt{n}(\widehat{\boldsymbol{\theta}}_n-\boldsymbol{\theta}) \stackrel{d}{\rightarrow} \mathbf{T} \sim N_k\left(\mathbf{0}, \boldsymbol{\mathcal{I}}(\boldsymbol{\theta})^{-1}\right)$,
\begin{enumerate}
\item What is the asymptotic covariance matrix of $(\overline{Y}, \widehat{\sigma}^2)$?
\item What part of your answer agrees with your answer to Question~\ref{varmean}?
\item Why does it make sense that the asymptotic covariance matrix is diagonal?
\end{enumerate}
\item As a cross-check on your asymptotic variance of $\widehat{\sigma}^2$, recall that if $W \sim \chi^2(\nu)$, then $Var(W) = 2\nu$, and for a normal random sample, $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. Use this to calculate the \emph{exact} (not asymptotic) variance of $\widehat{\sigma}^2$.
\item \label{fishervar} Based on what you have done so far, give the \emph{estimated} asymptotic covariance matrix of $(\overline{Y}, \widehat{\sigma}^2)$. Your answer is a $2 \times 2$ matrix of handwritten symbols.
\item Another formula for the asymptotic covariance matrix is based on the observed Fisher information. Look at the formula sheet, and calculate it for this problem. Compare it to your answer to Question~\ref{fishervar}.
\item The file
\href{http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/normal.data}
{\texttt{normal.data}} contains some data that are rounded, but otherwise closer to normal than almost anything in nature. There is a link from the course website in case the one in this document does not work.
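One way (certainly not the only way) to read the data into R is with \texttt{scan}, assuming the link works:
\begin{verbatim}
y = scan("http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/normal.data")
\end{verbatim}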
\begin{enumerate}
\item Give the estimated asymptotic covariance matrix of $(\overline{Y}, \widehat{\sigma}^2)$ based on your work so far. It is a $2 \times 2$ matrix of numbers. Produce it with R.
\item Do the job numerically with R's \texttt{nlm} function. Obtain an estimated asymptotic covariance matrix of $(\overline{Y}, \widehat{\sigma}^2)$ by inverting the Hessian. Again, your answer is a $2 \times 2$ matrix of numbers. Are you shocked? Why or why not?
\item Now do the job with the bootstrap, using $B=1,000$ as usual. Again, your answer is a $2 \times 2$ matrix of numbers. Are you even more shocked? Which estimated asymptotic covariance matrix do you like most?
\end{enumerate}
\end{enumerate}
Bring your printout to the quiz. Your printout must \emph{not} contain answers to the non-computer parts of this question. That is, it must contain only numerical answers.
\pagebreak

\item \label{poisson} Men and women are calling a technical support line according to independent Poisson processes with rates $\lambda_1$ and $\lambda_2$ per hour. Data for 144 hours are available, but unfortunately the sex of the caller was not recorded. All we have is the number of callers for each hour, which is distributed Poisson($\lambda_1+\lambda_2$). Here are the data, which are also available in the file
\href{http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/poisson.data}
{\texttt{poisson.data}} on the course website:
\begin{verbatim}
12  9 25 11 22 16 17  8 14 17 14 17
 8 14 18 16 13 17 13 11  9 15 18 14
16 17 16 19 13 18 12 12 13 10  8 15
13 11 15 15  6 10 13 13 11 13  9 15
16  9  5 10  8 18 13 17  7 13 13 17
12 17 14 16  6 12 17 10  9 14 11 19
13 17 15 20 14 10 13 14 17  9 13 14
 7 16 16  9 25 10 10  9 17  7 15 12
14 21 14 18 14 12 13 15 12 11 16 14
15 16  8 19 13 17 15 11 18 13 12 11
19 14 16 17 13 13 19 19 11 19 10 12
 9 18 11 14  9 14 14 14 13  9 13 18
\end{verbatim}
\begin{enumerate}
\item The parameter in this problem is $\boldsymbol{\theta} = (\lambda_1,\lambda_2)^\prime$. Try to find the MLE analytically. Show your work. Are there any points in the parameter space where both partial derivatives are zero?
\item Now try to find the MLE numerically with R's \texttt{nlm} function\footnote{When I did this, I got lots of warning messages with some starting values, because the search repeatedly left the parameter space and then bounced back in. For this problem, you don't need to worry about the warnings as long as the exit code is one.}. The Hessian is interesting; ask for it. Try two different starting values. Compare the minus log likelihoods at your two answers. What seems to be happening here?
\item Try inverting the Hessian to get the asymptotic covariance matrix. Any comments?
\item To understand what happened in the last item, calculate the Fisher information in a single observation from the definition. That is, letting $\ell = \log f(Y;\boldsymbol{\theta})$, calculate the elements of the $2 \times 2$ matrix whose $(i,j)$ element is
\begin{displaymath}
-E\left(\frac{\partial^2 \ell}{\partial\theta_i\partial\theta_j}\right)
\end{displaymath}
\item The Fisher information in the sample is just $n$ times the Fisher information in a single observation. Using the numerical MLEs from one of your \texttt{nlm} runs, estimate this quantity (a $2 \times 2$ matrix). Compare it to the Hessian. Now do you see what happened when you tried to calculate the asymptotic covariance matrix?
\item Why did estimation fail for this fairly realistic model?
\end{enumerate}
Bring your printout to the quiz.
Your printout must \emph{not} contain answers to the non-computer parts of this question. That is, it must contain only numerical answers.
\newpage

\item Consider the usual fixed effects multiple regression model in scalar form:
\begin{displaymath}
Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i,
\end{displaymath}
where the $x_{i,j}$ quantities are fixed constants, and the $\epsilon_i$ variables are independent $N(0,\sigma^2)$.
\begin{enumerate}
\item Write the likelihood function.
\item The least squares estimates of the regression coefficients are the $\beta_j$ values that minimize
\begin{displaymath}
Q(\boldsymbol{\beta})=\sum_{i=1}^n(Y_i-\beta_0 - \beta_1 x_{i,1} - \cdots - \beta_{p-1} x_{i,p-1})^2.
\end{displaymath}
Are the least squares and maximum likelihood estimates the same in this case (the normal case), or are they different? You don't need to derive $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{Y}$ in order to see the answer, so don't do it.
\item Differentiate $Q(\boldsymbol{\beta})$ with respect to $\beta_0$ and set the derivative to zero, obtaining the first \emph{normal equation}.
\item Noting that the quantities $\widehat{\beta}_0, \ldots, \widehat{\beta}_{p-1}$ must satisfy the first normal equation, show that the least squares plane must pass through the point $(\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_{p-1}, \overline{Y})$.
\item Defining the ``predicted'' $Y_i$ as $\widehat{Y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 x_{i,1} + \cdots + \widehat{\beta}_{p-1} x_{i,p-1}$, show that $\sum_{i=1}^n \widehat{Y}_i = \sum_{i=1}^n Y_i$.
\item The \emph{residual} for observation $i$ is defined by $e_i = Y_i - \widehat{Y}_i$. Show that the sum of the residuals equals exactly zero.
\end{enumerate}

\item For the general fixed effects linear regression model in matrix form (see formula sheet), show that the $n \times p$ matrix of covariances $C(\mathbf{e},\widehat{\boldsymbol{\beta}}) = \mathbf{0}$. Why does this establish that $SSE$ and $\widehat{\boldsymbol{\beta}}$ are independent?

\item For the general fixed effects linear regression model, tests and confidence intervals for linear combinations of regression coefficients are very useful. Derive the appropriate $t$ distribution and some applications by following these steps. Let $\mathbf{a}$ be a $p \times 1$ vector of constants.
\begin{enumerate}
\item What is the distribution of $\mathbf{a}^\prime \widehat{\boldsymbol{\beta}}$? Show a little work. Your answer includes both the expected value and the variance.
\item Now standardize $\mathbf{a}^\prime \widehat{\boldsymbol{\beta}}$ (subtract off the mean and divide by the standard deviation) to obtain a standard normal.
\item Divide by the square root of a well-chosen chi-squared random variable, divided by its degrees of freedom, and simplify. Call the result $T$.
\item How do you know numerator and denominator are independent?
\item Suppose you wanted to test $H_0: \mathbf{a}^\prime\boldsymbol{\beta} = c$. Write down a formula for the test statistic.
\item Suppose you wanted to test $H_0: \beta_2=0$. Give the vector $\mathbf{a}$.
\item Suppose you wanted to test $H_0: \beta_1=\beta_2$. Give the vector $\mathbf{a}$.
\item Letting $t_{\alpha/2}$ denote the point cutting off the top $\alpha/2$ of the $t$ distribution with $n-p$ degrees of freedom, give a $(1-\alpha) \times 100\%$ confidence interval for $\mathbf{a}^\prime\boldsymbol{\beta}$.
% \item Derive the $(1-\alpha)\times 100\%$ prediction interval for $Y_{n+1}$.
\end{enumerate}

\end{enumerate}

% \vspace{25mm}
\noindent
\begin{center}\begin{tabular}{l} \hspace{6.5in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf13}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf13}}

\end{document}

###### R work for the normal Fisher Information example. ###################
# set.seed(32448)
# y = round(rnorm(100,100,15))
y = scan("http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/normal.data")
n = length(y); ybar = mean(y); sigma2hat = (n-1)/n * var(y)
n; ybar; sigma2hat
varhatmean = sigma2hat/n; varhatmean

# Asymptotic covariance matrix based on hand calculated Fisher Information.
AV1 = rbind(c(varhatmean,0),
            c(0,2*sigma2hat^2/n) ); AV1

# Asymptotic covariance matrix based on numerical search
mll = function(theta,datta) # Minus LL for normal
    {
    mu = theta[1]; sigma2 = theta[2]
    mll = -sum(dnorm(datta,mu,sqrt(sigma2),log=T))
    mll
    } # End of function mll
search2 = nlm(mll,p=c(ybar,sigma2hat),hessian=T,datta=y)
AV2 = solve(search2$hessian); AV2 # $

# Asymptotic covariance matrix based on bootstrap
set.seed(9999)
B = 1000
statstar = NULL # Will be a B x 2 matrix of ybar, sigma2hat values
for(draw in 1:B)
    {
    dstar = y[sample(1:n,size=n,replace=T)]
    statstar = rbind(statstar,c(mean(dstar),(n-1)/n*var(dstar)) )
    }
AV3 = var(statstar); AV3

############################################################################
# R work for the Poisson example
rm(list=ls())
x = scan("http://www.utstat.toronto.edu/~brunner/appliedf13/code_n_data/hw/poisson.data")
xbar = mean(x); xbar
pmll = function(theta,data) # Poisson Minus Log Likelihood
    {
    L1 = theta[1]; L2 = theta[2]
    pmll = -sum(dpois(data,L1+L2,log=T)); pmll
    } # End function pmll
search1 = nlm(pmll,p=c(xbar/2,xbar/2),hessian=T,data=x); search1
search2 = nlm(pmll,p=c(10,8),hessian=T,data=x); search2
# One-parameter fit for lambda = lambda1 + lambda2
joe = function(t,data) {joe = -sum(dpois(data,t,log=T)); joe}
nlm(joe,xbar,data=x)
############################################################################

% Maybe next time. Also see more 2012 A5.

\item Consider the prediction interval for $Y_{n+1}$.
\begin{enumerate}
\item What is the distribution of $Y_{n+1}-\widehat{Y}_{n+1} = Y_{n+1}-\mathbf{x}_{n+1}^\prime \widehat{\boldsymbol{\beta}}$? Show your work. Your answer includes both the expected value and the variance.
\item Now standardize the difference to obtain a standard normal.
\item Divide by the square root of a chi-squared random variable, divided by its degrees of freedom, and simplify. Call it $T$. Compare your answer to a slide from lecture.
\item Using your result, derive the $(1-\alpha)\times 100\%$ prediction interval for $Y_{n+1}$.
\end{enumerate}

\item There is a difference between a prediction interval and a confidence interval. Using the preceding question as a guide, derive a $(1-\alpha)\times 100\%$ confidence interval for $E(Y_{n+1})= \mathbf{x}_{n+1}^\prime \boldsymbol{\beta}$. This time, rather than trapping $Y_{n+1}$ in the interval, you are trapping $\mathbf{x}_{n+1}^\prime \boldsymbol{\beta}$.
\item When you fit a full and a reduced regression model, the proportion of remaining variation explained by the additional variables in the full model is $a = \frac{R^2_F-R^2_R}{1-R^2_R}$. Show
\begin{displaymath}
F = \frac{(SSR_F-SSR_R)/r}{MSE_F} = \left(\frac{n-p}{r}\right) \left(\frac{a}{1-a}\right).
\end{displaymath}
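############################################################################
###### Optional extra sketch: Fisher information in the Poisson sample #####
# One possible numerical check for the "n times the information in one
# observation" step, not from the original work above. It assumes x and
# search1 from the Poisson code above are still in the workspace, and that
# the hand calculation gives 1/(lambda1+lambda2) for every element of the
# single-observation Fisher information matrix.
lambdahat = search1$estimate                 # MLE from one of the nlm runs
n = length(x)
FisherInfo = n * matrix(1/sum(lambdahat),2,2); FisherInfo
search1$hessian                              # Compare with the Hessian from nlm
# Note what the four entries of each matrix have in common.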
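############################################################################
###### Optional extra sketch: numerical check of the least squares facts ###
# A small simulated example (all names and numbers below are made up just
# for illustration) verifying that the residuals sum to zero, the fitted
# values sum to sum(Y), and the fitted plane passes through the point of
# sample means.
set.seed(444)
n = 40; p = 3
x1 = rnorm(n); x2 = rnorm(n)
Y = 5 + x1 - 2*x2 + rnorm(n,sd=2)
fit = lm(Y ~ x1 + x2)
sum(residuals(fit))                          # Zero, up to rounding error
sum(fitted(fit)); sum(Y)                     # Equal
predict(fit, newdata=data.frame(x1=mean(x1),x2=mean(x2))); mean(Y)   # Equal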
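############################################################################
###### Optional extra sketch: t statistic for a linear combination a'beta ##
# Continues the simulated example above (x1, x2, Y, fit, n = 40, p = 3).
# The vector a below corresponds to testing H0: beta1 = beta2; it is just
# one illustrative choice.
X = cbind(1,x1,x2)
betahat = solve(t(X) %*% X) %*% t(X) %*% Y
MSE = sum((Y - X %*% betahat)^2)/(n-p)
a = c(0,1,-1)
se = sqrt(MSE * t(a) %*% solve(t(X) %*% X) %*% a)
tstat = t(a) %*% betahat / se; tstat
tstat^2; anova(lm(Y ~ I(x1+x2)), fit)$F[2]   # t^2 equals this F
tcrit = qt(0.975,n-p)
c(t(a) %*% betahat - tcrit*se, t(a) %*% betahat + tcrit*se)   # 95% CI for a'beta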
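############################################################################
###### Optional extra sketch: prediction interval vs confidence interval ###
# Continues the simulated example above. The new predictor values x1 = 1,
# x2 = -1 are arbitrary, chosen only for illustration.
xnew = data.frame(x1=1, x2=-1)
predict(fit, newdata=xnew, interval="prediction", level=0.95)
predict(fit, newdata=xnew, interval="confidence", level=0.95)
# The prediction interval is wider, because it also has to allow for the
# new error term epsilon_{n+1}.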
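############################################################################
###### Optional extra sketch: F in terms of R-squared ######################
# Continues the simulated example above: full model Y ~ x1 + x2, reduced
# model Y ~ x1, so r = 1 additional variable. Checks numerically that
# F = ((n-p)/r) * (a/(1-a)), with a as defined in the question.
reduced = lm(Y ~ x1)
r = 1
F1 = anova(reduced,fit)$F[2]; F1
# a here is the scalar from the question (it overwrites the vector a above)
a = (summary(fit)$r.squared - summary(reduced)$r.squared) /
    (1 - summary(reduced)$r.squared)
F2 = ((n-p)/r) * (a/(1-a)); F2               # Same as F1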