\documentclass[10pt]{article} %\usepackage{amsbsy} % for \boldsymbol and \pmb \usepackage{graphicx} % To include pdf files! \usepackage{amsmath} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links \usepackage{fullpage} %\pagestyle{empty} % No page numbers \begin{document} %\enlargethispage*{1000 pt} \begin{center} {\Large \textbf{STA 2101/442 Assignment Ten}}\footnote{Copyright information is at the end of the last page.} \vspace{1 mm} \end{center} \noindent For this assignment, there is a computer part to each question. Use R, and bring your printouts to the quiz. \textbf{Your printouts should show \emph{all} R input and output, and \emph{only} R input and output}. Do not write anything on your printouts except your name and student number. \vspace{1mm} \begin{enumerate} \item \label{bunnies} I know this is pretty gruesome, but the data are real. An experiment in dentistry seeks to test the effectiveness of a drug (HEBP) that is supposed to help dental implants become more firmly attached to the jaw bone. This is an initial test on animals. False teeth were implanted into the leg bones of rabbits, and the rabbits were randomly assigned to receive either the drug or a saline solution (placebo). Technicians administering the drug were blind to experimental condition. Rabbits were also randomly assigned to be "sacrificed" after either 3, 6, 9 or 12 days. At that time, the implants were pulled out of the bone by a machine that measures force in newtons and stiffness in newtons/mm. For both of these measurements, higher values indicate more healing. A measure of "pre-load stiffness" in newtons/mm is also available for each animal. This may be another indicator of how firmly the false tooth was implanted into the bone, but it might even be a covariate. Nobody can seem to remember what "preload" means, so we'll ignore this variable for now. The explanatory variables are Time and Drug. The response variable is Force required to pull out the tooth. There is more than one reasonable way to do this analysis, but just to keep us together please treat Time as a categorical variable. The data are available in the file \href{http://www.utstat.toronto.edu/~brunner/data/legal/bunnies.data.txt} {\texttt{bunnies.data.txt}}. The variables are \begin{itemize} \item Identification code \item Time (3,6,9,12 days of healing) \item Drug (1=HEBP, 0=saline solution) \item Stiffness in newtons/mm \item Force in newtons \item Preload stiffness in newtons/mm \end{itemize} \begin{enumerate} \item Use \texttt{table} to find out how many rabbits are in each experimental condition. \item Carry out the standard tests of main effects and interactions. Be prepared to answer the following questions about each test. \begin{enumerate} \item What is the value of the test statistic? The answer is a number from your printout. \item What proportion of the remaining variation is explained? You can use \texttt{R}, or just be ready to do it with a calculator. \item What is the $p$-value? The answer is a number from your printout. \item Do you reject the null hypothesis at the 0.05 level? Yes or No. \item What, if anything, do you conclude? This is not the place for statistical jargon. ``What do you conclude" means say something about the drug, healing, time -- something like that. \end{enumerate} \item I know this is a bit redundant with the preceding question, but \emph{averaging across time, did the drug help the teeth become more firmly attached to the bone?} If the results justify an answer, then answer Yes or No. \item Make a table with a row for each treatment combination. Make columns showing the dummy variables for effect coding. \item Give $E[Y|\mathbf{X}=\mathbf{x}]$ for a regression model with both main effects and the interaction. Use your variable names from the preceding question. \pagebreak \item In terms of the $\beta$ values of your regression model, give the null hypothesis you would test in order to answer each of the following questions. \begin{enumerate} \item Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth? \item Averaging across drug and placebo, is does elapsed time affect the mean force required to extract the tooth? \item Does the effect of the drug depend upon elapsed time? \end{enumerate} \item Now please return to R. Doing it the easiest way you can, conduct tests to answer the following questions. Just do regular one-at-a-time (custom) tests. Don't bother with any Bonferroni correction this time. Just consider one response variable: Force. As usual, we are guided by the $\alpha = 0.05$ significance level. \begin{enumerate} \item Are the marginal means different at 3 and 6 days? \item Are the marginal means different at 6 and 9 days? \item Are the marginal means different at 9 and 12 days? \item Is there a difference between Drug and Placebo just at 3 days? \item Is there a difference between Drug and Placebo just at 6 days? \item Is there a difference between Drug and Placebo just at 9 days? \item Is there a difference between Drug and Placebo just at 12 days? % At any day is good, not asked. \item Be able to answer questions like these for each test: \begin{enumerate} \item What is the value of the test statistic? The answer is a number from your printout. \item What proportion of the remaining variation is explained? You don't have to do all these calculations in advance; just be ready to do them with a calculator. \item What is the $p$-value? The answer is a number from your printout. \item Do you reject the null hypothesis at the 0.05 level? Yes or No. \item What, if anything, do you conclude? This is not the place for statistical jargon. ``What do you conclude" means say something about the drug, healing, time -- something like that. \end{enumerate} \end{enumerate} \end{enumerate} \item Let $\mathbf{w} \sim N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}$ is symmetric and positive definite. \begin{enumerate} \item Show that $\mathbf{w}^\top \boldsymbol{\Sigma}^{-1}\mathbf{w}$ has a non-central chi-squared distribution with degrees of freedom $p$ and non-centrality parameter $\lambda = \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$. \item Use what you have just proved to obtain the formula for the non-centrality parameter for the general linear $F$-test; see the formula sheet. I think this is cleaner than the way it was done in lecture. \end{enumerate} \newpage \item \label{potatopower} For this question, please feel free to use my code from the power lecture. Remember the rotten potato example of factorial ANOVA? It was a $2 \times 3$ design with nine cases per treatment combination. Suppose the true value of $\sigma^2 = 16$, and the true expected values are as follows: \begin{verbatim} Bact1 Bact2 Bact3 Cool 4 4 12 Warm 5 5 14 \end{verbatim} \begin{enumerate} \item For $n=54$, what is the power to detect the main effect of temperature? \item Still maintaining equal sample sizes, what minimum total sample size is required so that the test of the main effect for temperature will have a power of at least 0.8? \item For $n=54$, what is the power to detect the main effect of bacteria type? \item Still maintaining equal sample sizes, what minimum total sample size is required so that the test of the main effect for bacteria type will have a power of at least 0.8? \item For $n=54$, what is the power to detect the interaction of temperature and bacteria type? \item Still maintaining equal sample sizes, what minimum total sample size is required so that the test of the interaction will have a power of at least 0.8? % \item For $n=54$, what is the power to detect an effect of bacteria type at cool temperatures? % \item Buried in all those tests of special contrasts was a test of equality of three Bacteria means at cool temperature. Suppose the means for Bacteria types 1 and 2 were equal, but the mean for Bacteria type 3 was one-half standard deviation lower. % \begin{enumerate} % \item What was the power to detect this effect? The answer is a number between zero and one. % \item What \emph{total sample size} would have been required for a power of 0.80? Remember, all 6 sample sizes are equal. % \end{enumerate} Please use R and bring your printout to the quiz. \end{enumerate} \item \label{intercept} Let $Y_1, \ldots, Y_n$ be independent and identically distributed $N(\mu,\sigma^2)$ random variables. We are interested in the power of a test of $H_0: \mu=\mu_0$ against the alternative that $\mu \neq 0$. The easy way out is to use a regression model with an intercept but no explanatory variables, so that the null hypothesis is a statement about the intercept. \begin{enumerate} \item Calculate the non-centrality parameter $\lambda$ and simplify. \item What is ``effect size" for this problem? \item Suppose $\frac{|\mu-\mu_0|}{\sigma} = \frac{1}{3}$. What is the minimum sample size required for a power of 0.80? The answer is a number. \end{enumerate} \end{enumerate} Please bring your printouts Question~\ref{bunnies},~\ref{potatopower} and \ref{intercept} to the quiz. \textbf{Your printouts should show \emph{all} R input and output, and \emph{only} R input and output}. Do not write anything on your printouts except your name and student number. % \vspace{50mm} \noindent \begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center} This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf16} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf16}} \end{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Maybe next time, and also 2013A12 Q4 \pagebreak \item \label{randslopes} It is perfectly natural to assume that something like response to a drug might be approximately linear over some range of dosage values, but that each person in the population might have his or her own slope. Thus each time you select a random sample you'll get a different collection of slopes, and the regression coefficient corresponding to the slope would be a random variable. Here is a simple model illustrating this situation. Let \begin{displaymath} Y_i = S_ix_i + \epsilon_i, \end{displaymath} where $x_1, \ldots, x_n$ are known constants, and independently for $i=1, \ldots, n$, \begin{itemize} \item[] $S_i$ ($S$ for slope) is a normal random variable with expected value $\beta$ and variance $\sigma^2_1$, \item[] $\epsilon_i$ is a normal random variable with expected value zero and variance $\sigma^2_2$, and \item[] $S_i$ and $\epsilon_i$ are independent. \end{itemize} \begin{enumerate} \item This is a special case of the general mixed linear model, in which $\mathbf{Y}~=~\mathbf{X} \boldsymbol{\beta} ~+~ \mathbf{Zb} ~+~\boldsymbol{\epsilon}$, where \begin{itemize} \item $\mathbf{X}$ is an $n \times p$ matrix of known constants \item $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown constants. \item $\mathbf{Z}$ is an $n \times q$ matrix of known constants \item $\mathbf{b} \sim N_q(\mathbf{0},\boldsymbol{\Sigma}_b)$ with $\boldsymbol{\Sigma}_b$ unknown \item $\boldsymbol{\epsilon} \sim N(\mathbf{0},\sigma^2 \mathbf{I}_n)$ , where $\sigma^2 > 0$ is an unknown constant. \end{itemize} At first, the model in this problem might not seem to fit the specifications above. Why? But actually it does. To see this, give the distribution of $Y_i$, and then write a general mixed model that yields this same probability distribution. Now you can answer these questions. \begin{enumerate} \item What is the matrix $\mathbf{X}$? What is $p$? \item What is the matrix $\boldsymbol{\beta}$? \item What is the matrix $\mathbf{Z}$? What is $q$? \item What is the matrix $\mathbf{b}$? \item What is the matrix $\boldsymbol{\Sigma}_b$? \end{enumerate} \item What would happen if you tried to estimate $\beta$ in the usual way with \begin{displaymath} \widehat{\beta} = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2} \end{displaymath} Under what conditions on the $x_i$ values is this estimator consistent? \item Find another estimator of $\beta$ by calculating $E(\overline{Y}_n)$. Why does the Law of Large Numbers \emph{not} apply here? Okay, anyway, propose an estimator, and give a set of conditions on the $x_i$ values that will make it consistent for $\beta$. \item Which estimator do you like more? Why? \item Now suppose that all the $x_i$ values are equal to one. \begin{enumerate} \item What is the distribution of $Y_i$ in this situation? \item Propose an estimator of $\beta$ that should satisfy anyone. \end{enumerate} %This last little example shows you two things. First, whether the parameters of a model can be estimated depends on how you collect the data; this is a matter of experimental design (assuming the $x_i$ values are under the control of the investigator). Second, it is possible that some parameters can be estimated very successfully, while others cannot be estimated at all. \end{enumerate}