\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2101/442 Assignment 1 (Mostly Review)}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf17} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf17}}}
\vspace{1 mm}
\end{center}
\noindent The questions on this assignment are practice for the quiz on Friday September 15th, and are not to be handed in. For the linear algebra part starting with Question~\ref{firstmat}, there is an excellent review in Chapter Two of Rencher and Schaalje's \emph{Linear Models in Statistics}. The chapter has more material than you need for this course. Note they use $\mathbf{A}^\prime$ for the transpose, while in this course we'll use $\mathbf{A}^\top$.
\begin{enumerate}
\item For each of the following distributions, derive a general expression for the Maximum Likelihood Estimator (MLE); don't bother with the second derivative test. Then use the data to calculate a numerical estimate; you should bring a calculator to the quiz in case you have to do something like this.
% R checks are in the notes after \end{document}.
\begin{enumerate}
\item $p(x)=\theta(1-\theta)^x$ for $x=0,1,\ldots$, where $0<\theta<1$. Data: \texttt{4, 0, 1, 0, 1, 3, 2, 16, 3, 0, 4, 3, 6, 16, 0, 0, 1, 1, 6, 10}. Answer: 0.2061856 % Geometric .25, thetahat = 1/(1+xbar)
\item $f(x) = \frac{\alpha}{x^{\alpha+1}}$ for $x>1$, where $\alpha>0$. Data: \texttt{1.37, 2.89, 1.52, 1.77, 1.04, 2.71, 1.19, 1.13, 15.66, 1.43} Answer: 1.469102 % Pareto with true alpha = 1 (one over uniform) alphahat = 1/mean(log(x))
\item $f(x) = \frac{\tau}{\sqrt{2\pi}} e^{-\frac{\tau^2 x^2}{2}}$, for $x$ real, where $\tau>0$. Data: \texttt{1.45, 0.47, -3.33, 0.82, -1.59, -0.37, -1.56, -0.20} Answer: 0.6451059 % Normal mean zero tauhat = sqrt(1/mean(x^2))
\item $f(x) = \frac{1}{\theta} e^{-x/\theta}$ for $x>0$, where $\theta>0$. Data: \texttt{0.28, 1.72, 0.08, 1.22, 1.86, 0.62, 2.44, 2.48, 2.96} Answer: 1.517778 % Exponential, true theta=2, thetahat = xbar
\end{enumerate}
\item In the \emph{centered} linear regression model, sample means are subtracted from the explanatory variables, so that values above average are positive and values below average are negative. Here is a version with one explanatory variable. Independently for $i=1, \ldots, n$, let $Y_i = \beta_0 + \beta_1(x_i-\overline{x}) + \epsilon_i$, where
\begin{itemize}
\item[] $\beta_0$ and $\beta_1$ are unknown constants (parameters).
\item[] $x_i$ are known, observed constants.
\item[] $\epsilon_1, \ldots, \epsilon_n$ are random variables with $E(\epsilon_i)=0$, $Var(\epsilon_i)=\sigma^2$ and $Cov(\epsilon_i,\epsilon_j)=0$ for $i \neq j$.
\item[] $\sigma^2$ is an unknown constant (parameter).
\item[] $Y_1, \ldots, Y_n$ are observable random variables.
\end{itemize}
\begin{enumerate}
\item What is $E(Y_i)$? $Var(Y_i)$?
\item Prove that $Cov(Y_i,Y_j)=0$. Use the definition \\ $Cov(U,V) = E\{(U-E(U))(V-E(V)) \}$.
\item If $\epsilon_i$ and $\epsilon_j$ are independent (not just uncorrelated), then so are $Y_i$ and $Y_j$, because functions of independent random variables are independent. Proving this in full generality requires advanced definitions, but in this case the functions are so simple that we can get away with an elementary definition. Let $X_1$ and $X_2$ be independent random variables, meaning $P\{X_1 \leq x_1, X_2 \leq x_2 \} = P\{X_1 \leq x_1\} P\{X_2 \leq x_2\}$ for all real $x_1$ and $x_2$. Let $Y_1=X_1+a$ and $Y_2=X_2+b$, where $a$ and $b$ are constants. Prove that $Y_1$ and $Y_2$ are independent.
\item In \emph{least squares estimation}, we observe random variables $Y_1, \ldots, Y_n$ whose distributions depend on a parameter $\theta$, which could be a vector. To estimate $\theta$, write the expected value of $Y_i$ as a function of $\theta$, say $E_\theta(Y_i)$, and then estimate $\theta$ by the value that gets the observed data values as close as possible to their expected values. To do this, minimize
\begin{displaymath}
Q = \sum_{i=1}^n\left(Y_i-E_\theta(Y_i)\right)^2 .
\end{displaymath}
The value of $\theta$ that makes $Q$ as small as possible is the least squares estimate. Using this framework, find the least squares estimates of $\beta_0$ and $\beta_1$ for the centered regression model. The answer is a pair of formulas. Show your work.
\item Because of the centering, it is possible to verify that the solution actually \emph{minimizes} the sum of squares $Q$, using only single-variable second derivative tests. Do this part too.
\item How about a least squares estimate of $\sigma^2$?
\item You know that the least squares estimators $\widehat{\beta}_0$ and $\widehat{\beta}_1$ must be unbiased, but show it by calculating their expected values for this particular case.
\item Calculate $\widehat{\beta}_0$ and $\widehat{\beta}_1$ for the following data. Your answer is a pair of numbers.
% \begin{center}
~~~~~
\begin{tabular}{c|ccccc}
$x$ & 8 & 7 & 7 & 9 & 4 \\ \hline
$y$ & 9 & 13 & 9 & 8 & 6
\end{tabular}
% \end{center}
~~~~~ I get $\widehat{\beta}_1 = \frac{1}{2}$.
\item Going back to the general setting (not just the numerical example with $n=5$), suppose the $\epsilon_i$ are normally distributed.
\begin{enumerate}
\item What is the distribution of $Y_i$?
\item Write the log likelihood function.
\item Obtain the maximum likelihood estimates of $\beta_0$ and $\beta_1$; don't bother with $\sigma^2$. The answer is a pair of formulas. \emph{Don't do more work than you have to!} As soon as you realize that you have already solved this problem, stop and write down the answer.
\end{enumerate}
\end{enumerate}
\item For the general uncentered regression model $Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$ (you fill in details as necessary), show that the maximum likelihood and least squares estimates of the $\beta_j$ parameters are the same. You are most definitely \emph{not} being asked to derive an explicit formula; that's too much work. Just show that (assuming existence) they are the same.
\item Let $x_1, \dots, x_{n_1} \stackrel{i.i.d.}{\sim} N(\mu_1,\sigma^2)$, and $y_1, \dots, y_{n_2} \stackrel{i.i.d.}{\sim} N(\mu_2,\sigma^2)$. These two random samples are independent, meaning all the $x$ variables are independent of all of the $y$ variables.
Every elementary Statistics text tells you that
\begin{displaymath}
t = \frac{\overline{x}-\overline{y} - (\mu_1-\mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1+n_2-2),
\end{displaymath}
where
\begin{displaymath}
s^2_p = \frac{\sum_{i=1}^{n_1}(x_i-\overline{x})^2 + \sum_{i=1}^{n_2}(y_i-\overline{y})^2} {n_1+n_2-2}.
\end{displaymath}
This is the basis of tests and confidence intervals for $\mu_1-\mu_2$.
\begin{enumerate}
\item Using (not proving) the distribution fact above, derive a $(1-\alpha)100\%$ confidence interval for $\mu_1-\mu_2$. ``Derive" means show all the High School algebra. Use $t_{\alpha/2}$ to denote the point cutting off the upper $\alpha/2$ of the $t$ distribution with $n_1+n_2-2$ degrees of freedom. The core of the answer is a formula for the lower confidence limit and another formula for the upper confidence limit.
\item Suppose you wanted to test $H_0:\mu_1=\mu_2$. Give a formula for the test statistic.
\item Prove that $H_0:\mu_1=\mu_2$ is rejected at significance level $\alpha$ if and only if the $(1-\alpha)100\%$ confidence interval for $\mu_1-\mu_2$ does not include zero. It is easiest to start with the event that the null hypothesis is \emph{not} rejected.
\item Here are some $x_i$ and $y_i$ values, together with a set of $t_{0.025}$ critical values produced by R.
\begin{center}
\begin{tabular}{c|rrrrr}
$x$ & 6 & 10 & 8 & 12 & \\ \hline
$y$ & 7 & 4 & 8 & 7 & 9
\end{tabular}
\end{center}
\begin{verbatim}
> df = 1:10; critvalue = qt(0.975,df)
> round(rbind(df,critvalue),3) # Round to 3 digits
df         1.000 2.000 3.000 4.000 5.000 6.000 7.000 8.000 9.000 10.000
critvalue 12.706 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262  2.228
\end{verbatim}
\begin{enumerate}
\item Calculate $s^2_p$. My answer is $34/7$.
\item Give a 95\% confidence interval for $\mu_1-\mu_2$. I get (-1.496, 5.496), give or take a little rounding error.
\item What is the numerical value of the $t$ statistic for testing $H_0: \mu_1=\mu_2$ against $H_1: \mu_1 \neq \mu_2$? I get $t=1.353$. Do you reject $H_0$ at $\alpha=0.05$? Are you able to conclude that there is a real difference between means? Answer Yes or No.
\end{enumerate}
\end{enumerate}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item A medical researcher conducts a study using twenty-seven litters of cancer-prone mice. Two members are randomly selected from each litter, and all mice are subjected to daily doses of cigarette smoke. For each pair of mice, one is randomly assigned to Drug A and one to Drug B. Time (in weeks) until the first clinical sign of cancer is recorded.
\begin{enumerate}
\item State a reasonable model for these data. Remember, a statistical model is a set of assertions that partly specify the probability distribution of the observable data. For simplicity, you may assume that the study continues until all the mice get cancer, and that log time until cancer has a normal distribution.
\item What is the parameter space for your model?
\end{enumerate}
\item Suppose that volunteer patients undergoing elective surgery at a large hospital are randomly assigned to one of three different pain killing drugs, and one week after surgery they rate the amount of pain they have experienced on a scale from zero (no pain) to 100 (extreme pain).
\begin{enumerate}
\item State a reasonable model for these data. For simplicity, you may assume normality.
\item What is the parameter space?
\end{enumerate}
\item A fast food chain is considering a change in the blend of coffee beans they use to make their coffee. To determine whether their customers prefer the new blend, the company selects a random sample of $n=100$ coffee-drinking customers and asks them to taste coffee made with the new blend and with the old blend. Customers indicate their preference, Old or New. State a reasonable model for these data. What is the parameter space?
\item Label each statement below True or False. Write ``T" or ``F" beside each statement. Assume the $\alpha=0.05$ significance level. If your answer is False, be able to explain why.
\begin{enumerate}
\item \underline{\hspace{10mm}} The $p$-value is the probability that the null hypothesis is true. % F
\item \underline{\hspace{10mm}} The $p$-value is the probability that the null hypothesis is false. % F
\item \underline{\hspace{10mm}} In a study comparing a new drug to the current standard treatment, the null hypothesis is rejected. We conclude that the new drug is ineffective. % F
\item \underline{\hspace{10mm}} If $p > .05$ we reject the null hypothesis at the .05 level. % F
\item \underline{\hspace{10mm}} If $p < .05$ we reject the null hypothesis at the .05 level. % T
\item \underline{\hspace{10mm}} The greater the $p$-value, the stronger the evidence against the null hypothesis. % F
\item \underline{\hspace{10mm}} In a study comparing a new drug to the current standard treatment, $p > .05$. We conclude that the new drug and the existing treatment are not equally effective. % F
\item \underline{\hspace{10mm}} The 95\% confidence interval for $\beta_3$ is from $-0.26$ to $3.12$. This means $P\{-0.26 < \beta_3 < 3.12\} = 0.95$. % F
\end{enumerate}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item Suppose that several studies have tested the same hypothesis or similar hypotheses. For example, six studies could all be testing the effect of a treatment for arthritis. Two studies rejected the null hypothesis, one was almost significant, and the other three were clearly non-significant. What should be concluded? Ideally, one would pool the data from the six studies, but in practice the raw data are not available. All we have are the published test statistics. How do we combine the information we have and come to an overall conclusion? That is the main task of \emph{meta-analysis}. In this question you will develop some simple, standard tools for meta-analysis.
\begin{enumerate}
\item Let the test statistic $T$ be continuous, with pdf $f(t)$ and strictly increasing cdf $F(t)$ under the null hypothesis. The null hypothesis is rejected if $T>c$. Show that if $H_0$ is true, the distribution of the $p$-value is $U(0,1)$. Derive the density. Start with the cumulative distribution function of the $p$-value: $Pr\{P \leq x\} = \ldots$.
\item Suppose $H_0$ is false. Would you expect the distribution of the $p$-value to still be uniform? Pick one of the alternatives below. You are not asked to derive anything for now.
\begin{enumerate}
\item The distribution should still be uniform.
\item We would expect more small $p$-values.
\item We would expect more large $p$-values.
\end{enumerate}
% A small R simulation illustrating this is in the notes after \end{document}.
\item Let $P_i \sim U(0,1)$. Show that $Y_i = -2\ln(P_i)$ has a $\chi^2$ distribution. What are the degrees of freedom? Remember, a chi-squared with $\nu$ degrees of freedom is Gamma($\alpha=\nu/2,\beta=2$).
\item \label{log} Let $P_1, \ldots, P_n$ be a random sample of $p$-values with the null hypotheses all true, and let $Y=\sum_{i=1}^n-2\ln(P_i)$. What is the distribution of $Y$? Only derive it (using moment-generating functions) if you don't know the answer.
\item Suppose we observe the following random sample of $p$-values: \texttt{0.016 0.188 0.638 0.148 0.917 0.124 0.695}. For the test statistic of Question~\ref{log},
\begin{enumerate}
\item What is the critical value at $\alpha = 0.05$? The answer is a number. It doesn't matter how you find it. If you have to do something like this on a quiz or the final, several numbers will be supplied and you will have to choose the right one. My answer is 23.68. % qchisq(0.95,14)
\item What is the value of the test statistic? The answer is a number. My answer is 21.41.
% > p = c(0.016, 0.188, 0.638, 0.148, 0.917, 0.124, 0.695)
% > sum(-2*log(p))
% [1] 21.40881
% (A full R check is in the notes after \end{document}.)
\item What null hypothesis are you testing? I find it easier to state in words, rather than in symbols.
\item Do you reject the null hypothesis at $\alpha = 0.05$? Answer Yes or No.
\item What if anything do you conclude?
\end{enumerate}
\end{enumerate}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\enlargethispage*{1000 pt}
\item \label{firstmat} Which statement is true? (Quantities in boldface are matrices of constants.)
\begin{enumerate}
\item $\mathbf{A(B+C) = AB+AC}$
\item $\mathbf{A(B+C) = BA+CA}$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\item Which statement is true?
\begin{enumerate}
\item $a\mathbf{(B+C)}=a\mathbf{B} + a\mathbf{C}$
\item $a\mathbf{(B+C)}=\mathbf{B}a + \mathbf{C}a$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\item Which statement is true?
\begin{enumerate}
\item $\mathbf{(B+C)A = AB+AC}$
\item $\mathbf{(B+C)A = BA+CA}$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\item Which statement is true?
\begin{enumerate}
\item $\mathbf{(AB)^\top = A^\top B^\top}$
\item $\mathbf{(AB)^\top = B^\top A^\top}$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\item Which statement is true?
\begin{enumerate}
\item $\mathbf{A^{\top\top} = A }$
\item $\mathbf{A^{\top\top\top} = A^\top }$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\item Suppose that the square matrices $\mathbf{A}$ and $\mathbf{B}$ both have inverses and are the same size. Which statement is true?
\begin{enumerate}
\item $\mathbf{(AB)}^{-1} = \mathbf{A}^{-1}\mathbf{B}^{-1}$
\item $\mathbf{(AB)}^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
\item Both a and b
\item Neither a nor b
\end{enumerate}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item Which statement is true?
\begin{enumerate}
\item $\mathbf{(A+B)^\top = A^\top + B^\top}$
\item $\mathbf{(A+B)^\top = B^\top + A^\top }$
\item $\mathbf{(A+B)^\top = (B+A)^\top}$
\item All of the above
\item None of the above
\end{enumerate}
\item Which statement is true?
\begin{enumerate}
\item $(a+b)\mathbf{C} = a\mathbf{C}+ b\mathbf{C}$
\item $(a+b)\mathbf{C} = \mathbf{C}a+ \mathbf{C}b$
\item $(a+b)\mathbf{C} = \mathbf{C}(a+b)$
\item All of the above
\item None of the above
\end{enumerate}
\item Let
\begin{tabular}{ccc}
$\mathbf{A} = \left( \begin{array}{c c} 1 & 2 \\ 2 & 4 \end{array} \right) $ &
$\mathbf{B} = \left( \begin{array}{c c} 0 & 2 \\ 2 & 1 \end{array} \right) $ &
$\mathbf{C} = \left( \begin{array}{c c} 2 & 0 \\ 1 & 2 \end{array} \right) $
\end{tabular}
\begin{enumerate}
\item Calculate $\mathbf{AB}$ and $\mathbf{AC}$
\item Do we have $\mathbf{AB} = \mathbf{AC}$? Answer Yes or No.
\item Prove $\mathbf{B} = \mathbf{C}$. Show your work.
\end{enumerate}
% (An R check of these products is in the notes after \end{document}.)
\item Let $\mathbf{A}$ be a square matrix with the determinant of $\mathbf{A}$ (denoted $|\mathbf{A}|$) equal to zero. What does this tell you about $\mathbf{A}^{-1}$? No proof is required here.
\item Recall that an inverse of the square matrix $\mathbf{A}$ (denoted $\mathbf{A}^{-1}$) is defined by $\mathbf{A}^{-1}\mathbf{A} = \mathbf{AA}^{-1}=\mathbf{I}$. Prove that inverses are unique, as follows. Let $\mathbf{B}$ and $\mathbf{C}$ both be inverses of $\mathbf{A}$. Show that $\mathbf{B=C}$.
\item Suppose that the square matrices $\mathbf{A}$ and $\mathbf{B}$ both have inverses. Using the definition of an inverse, prove that $\mathbf{(AB)}^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. Because you are using the definition, you have two things to show.
\item Let $\mathbf{X}$ be an $n$ by $p$ matrix with $n \neq p$. Why is it incorrect to say that $(\mathbf{X^\top X})^{-1}= \mathbf{X}^{-1}\mathbf{X}^{\top -1}$?
\item \label{ivt} Let $\mathbf{A}$ be a non-singular square matrix. Prove $(\mathbf{A}^\top)^{-1} = (\mathbf{A}^{-1})^\top$.
\item Using Question~\ref{ivt}, prove that if the inverse of a symmetric matrix exists, it is also symmetric.
\item \label{ss} Let $\mathbf{a}$ be an $n \times 1$ matrix of real constants. How do you know $\mathbf{a}^\top\mathbf{a}\geq 0$?
\pagebreak
% \small
\item The $p \times p$ matrix $\boldsymbol{\Sigma}$ is said to be \emph{positive definite} if $\mathbf{a}^\top \boldsymbol{\Sigma} \mathbf{a} > 0$ for all $p \times 1$ vectors $\mathbf{a} \neq \mathbf{0}$. Show that the eigenvalues of a positive definite matrix are all strictly positive. Hint: start with the definition of an eigenvalue and the corresponding eigenvector: $\boldsymbol{\Sigma}\mathbf{v} = \lambda \mathbf{v}$. Eigenvectors are typically scaled to have length one, so you may assume $\mathbf{v}^\top \mathbf{v} = 1$.
\item Recall the \emph{spectral decomposition} of a symmetric matrix (for example, a variance-covariance matrix). Any such matrix $\boldsymbol{\Sigma}$ can be written as $\boldsymbol{\Sigma} = \mathbf{P} \boldsymbol{\Lambda} \mathbf{P}^\top$, where $\mathbf{P}$ is a matrix whose columns are the (orthonormal) eigenvectors of $\boldsymbol{\Sigma}$, $\boldsymbol{\Lambda}$ is a diagonal matrix of the corresponding eigenvalues, and $\mathbf{P}^\top\mathbf{P} =~\mathbf{P}\mathbf{P}^\top =~\mathbf{I}$. If $\boldsymbol{\Sigma}$ is real, the eigenvalues are real as well.
% (A numerical illustration with eigen() is in the notes after \end{document}.)
\begin{enumerate}
\item Let $\boldsymbol{\Sigma}$ be a square symmetric matrix with eigenvalues that are all strictly positive.
\begin{enumerate}
\item What is $\boldsymbol{\Lambda}^{-1}$?
\item Show $\boldsymbol{\Sigma}^{-1} = \mathbf{P} \boldsymbol{\Lambda}^{-1} \mathbf{P}^\top$
\end{enumerate}
\item Let $\boldsymbol{\Sigma}$ be a square symmetric matrix, and this time the eigenvalues are non-negative.
\begin{enumerate}
\item What do you think $\boldsymbol{\Lambda}^{1/2}$ might be?
\item Define $\boldsymbol{\Sigma}^{1/2}$ as $\mathbf{P} \boldsymbol{\Lambda}^{1/2} \mathbf{P}^\top$. Show $\boldsymbol{\Sigma}^{1/2}$ is symmetric.
\item Show $\boldsymbol{\Sigma}^{1/2}\boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}$, justifying the notation.
\end{enumerate}
\item Now return to the situation where the eigenvalues of the square symmetric matrix $\boldsymbol{\Sigma}$ are all strictly positive. Define $\boldsymbol{\Sigma}^{-1/2}$ as $\mathbf{P} \boldsymbol{\Lambda}^{-1/2} \mathbf{P}^\top$, where the elements of the diagonal matrix $\boldsymbol{\Lambda}^{-1/2}$ are the reciprocals of the corresponding elements of $\boldsymbol{\Lambda}^{1/2}$.
\begin{enumerate}
\item Show that the inverse of $\boldsymbol{\Sigma}^{1/2}$ is $\boldsymbol{\Sigma}^{-1/2}$, justifying the notation.
\item Show $\boldsymbol{\Sigma}^{-1/2} \boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Sigma}^{-1}$.
\end{enumerate}
\item Let $\boldsymbol{\Sigma}$ be a symmetric, positive definite matrix. How do you know that $\boldsymbol{\Sigma}^{-1}$ exists?
\end{enumerate}
% \pagebreak
\item Let $\mathbf{X}$ be an $n \times p$ matrix of constants. The idea is that $\mathbf{X}$ is the ``design matrix" in the linear model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, so this problem is really about linear regression.
\begin{enumerate}
% \item Recall that $\mathbf{A}$ symmetric means $\mathbf{A=A^\top}$. Let $\mathbf{X}$ be an $n$ by $p$ matrix. Show that $\mathbf{X^\top X}$ is symmetric.
\item Recall the definition of linear independence. The columns of $\mathbf{A}$ are said to be \emph{linearly dependent} if there exists a column vector $\mathbf{v} \neq \mathbf{0}$ with $\mathbf{Av} = \mathbf{0}$. If $\mathbf{Av} = \mathbf{0}$ implies $\mathbf{v} = \mathbf{0}$, the columns of $\mathbf{A}$ are said to be linearly \emph{independent}. Show that if the columns of $\mathbf{X}$ are linearly independent, then $\mathbf{X}^\top\mathbf{X}$ is positive definite.
\item Show that if $\mathbf{X}^\top\mathbf{X}$ is positive definite then $(\mathbf{X}^\top\mathbf{X})^{-1}$ exists.
\item Show that if $(\mathbf{X}^\top\mathbf{X})^{-1}$ exists then the columns of $\mathbf{X}$ are linearly independent.
\end{enumerate}
This is a good problem because it establishes that the least squares estimator $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$ exists if and only if the columns of $\mathbf{X}$ are linearly independent.
\end{enumerate}
\end{document}

# R for regression data
x = c(8, 7, 7, 9, 4); dx = x-7
y = c(9, 13, 9, 8, 6); dy = y-9
cbind(dx,dy)
lm(y~dx); sum(dx*dy)/sum(dx^2) # beta0hat = 9, beta1hat = 0.5

# R for two-sample t-test
x = c(6,10,8,12); mean(x)
y = c(7, 4, 8, 7, 9); mean(y)
t.test(x,y,var.equal=T)
# Output:
#         Two Sample t-test
# data:  x and y
# t = 1.3528, df = 7, p-value = 0.2182
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -1.495899  5.495899
# sample estimates:
# mean of x mean of y
#         9         7

df = 1:10
critvalue = qt(0.975,df)
round(rbind(df,critvalue),3) # Round the whole thing to 3 digits
s2p = ((4-1)*var(x)+(5-1)*var(y))/7; s2p # 4.857143
se = sqrt(s2p*(1/4+1/5)); se # 1.478416
low = 2 - 2.365*se; low # -1.496454
high = 2 + 2.365*se; high # 5.496454
tstat = 2/se; tstat # 1.352799
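# Added R sketch (not part of the original handout): numerical answers for the
# MLEs in Question 1, using the closed-form estimators the derivations lead to.
x1 = c(4, 0, 1, 0, 1, 3, 2, 16, 3, 0, 4, 3, 6, 16, 0, 0, 1, 1, 6, 10)
1/(1+mean(x1))        # geometric: 0.2061856
x2 = c(1.37, 2.89, 1.52, 1.77, 1.04, 2.71, 1.19, 1.13, 15.66, 1.43)
1/mean(log(x2))       # Pareto: 1.469102
x3 = c(1.45, 0.47, -3.33, 0.82, -1.59, -0.37, -1.56, -0.20)
sqrt(1/mean(x3^2))    # normal with mean zero: 0.6451059
x4 = c(0.28, 1.72, 0.08, 1.22, 1.86, 0.62, 2.44, 2.48, 2.96)
mean(x4)              # exponential: 1.517778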
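# Added R sketch (not part of the original handout): a small simulation
# illustrating the p-value parts of the meta-analysis question. Under H0 the
# p-value of a correct test is Uniform(0,1); under H1 small p-values become
# more common. The sample size and effect size below are arbitrary choices.
set.seed(9999)
p.null = replicate(10000, t.test(rnorm(25, mean = 0.0))$p.value) # H0 true
p.alt  = replicate(10000, t.test(rnorm(25, mean = 0.5))$p.value) # H0 false
mean(p.null < 0.05)   # should be close to 0.05
mean(p.alt  < 0.05)   # should be much larger than 0.05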
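# Added R sketch (not part of the original handout): the meta-analysis test
# statistic for the observed p-values. If the n null hypotheses are all true,
# Y = sum of -2*log(p) over n independent p-values is chi-squared with 2n
# degrees of freedom.
p = c(0.016, 0.188, 0.638, 0.148, 0.917, 0.124, 0.695)
Y = sum(-2*log(p)); Y       # 21.40881
qchisq(0.95, 2*length(p))   # critical value 23.68479, so do not reject H0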
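# Added R sketch (not part of the original handout): the products AB and AC
# from the 2 x 2 matrix question.
A = matrix(c(1, 2,
             2, 4), nrow = 2, byrow = TRUE)
B = matrix(c(0, 2,
             2, 1), nrow = 2, byrow = TRUE)
C = matrix(c(2, 0,
             1, 2), nrow = 2, byrow = TRUE)
A %*% B
A %*% C   # compare with A %*% B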
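# Added R sketch (not part of the original handout): eigen() illustrates the
# spectral decomposition questions. The 2 x 2 matrix Sigma below is my own
# arbitrary positive definite example.
Sigma = matrix(c(2, 1,
                 1, 2), nrow = 2)
eig = eigen(Sigma)
P = eig$vectors; Lambda = diag(eig$values)
P %*% Lambda %*% t(P)                              # recovers Sigma
SqrtSigma = P %*% diag(sqrt(eig$values)) %*% t(P)  # Sigma^{1/2}
SqrtSigma %*% SqrtSigma                            # recovers Sigma again
solve(Sigma) - P %*% diag(1/eig$values) %*% t(P)   # approximately zero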