\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
\topmargin=-.3in
\textheight=9.4in
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2101/442 Assignment Twelve}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\vspace{3mm}

\noindent
Do these questions in preparation for the final exam. The R questions will not be on the final, but they provide some guidance about what to do with the data. So if you do them, you will be able to guess what I will do with similar data sets on the final exam.

\begin{enumerate}

\item In a dichotic listening experiment, subjects wear stereo headphones that allow the presentation of different sound tracks to each ear, at the same time. In this example, right-handed female university students listened to short lectures on art history in the presence of background noise. After each lecture, they answered a set of multiple choice questions. Two factors were varied experimentally:
\begin{itemize}
\item Noise Type: The background noise consisted of either Hip-hop music, Classical music or Radio commercials. Volume was carefully held constant.
\item Presentation: Subjects heard (simultaneously) either
    \begin{itemize}
    \item Lecture (Signal) in the left ear and distraction (Noise) in the right, or
    \item Distraction (Noise) in the left ear and lecture (Signal) in the right, or
    \item Both Signal and Noise in both ears
    \end{itemize}
\end{itemize}
Each subject in the experiment experienced all nine treatment combinations, in a balanced order that was different for each subject, and randomly assigned.
Thus, there are nine data values for each subject: number of questions answered correctly in each experimental condition. The layout of the data is given below:
\begin{verbatim}
          Signal in               Signal in               Signal in
           Left Ear               Right Ear               Both Ears
     ____________________    ____________________    ____________________
     HipHop Classc Radio     HipHop Classc Radio     HipHop Classc Radio
     ------ ------ ------    ------ ------ ------    ------ ------ ------
     test11 test12 test13    test21 test22 test23    test31 test32 test33
 1     13     12     10        15     14     14        14     13     14
 2      4      8      8         6      5      8         6      3      4
 3     13     15     11        11     13     15        11     13     12
\end{verbatim}
Data are given in the file
\href{http://www.utstat.toronto.edu/~brunner/appliedf14/code_n_data/hw/Dichotic.data}
{\texttt{Dichotic.data}}. There is a link from the course home page in case the one in this document does not work.
    \begin{enumerate}
    \item Test for both main effects and for the interaction. Be able to give the $F$ statistic, the degrees of freedom and the $p$-value.
    \item If a main effect is significant, calculate the marginal means and test for pairwise differences of marginal means, with a Bonferroni correction. Remember, Hotelling's T-squared for a single contrast is equivalent to a matched t-test. Be able to state conclusions (if any) in plain, non-statistical language.
    \item If the interaction is significant at $\alpha=0.05$ (and only then), make a two-way table of treatment means. Conduct tests of contrasts that will allow you to understand the interaction and describe it in plain, non-statistical language. Again, only do this if you reject the null hypothesis of no interaction.
    \end{enumerate}

\item In a test of how well people remember instructional materials, subjects were presented with training materials that were either in Black and White or in Colour. Their ability to recall the material was tested with both Cartoon and Realistic testing materials at two points in time -- immediately after training, and several weeks later.
The variables are
\begin{itemize}
\item Colour versus Black and White training materials
\item Cartoon1: Recall at Time One, Cartoon testing materials
\item Real1: Recall at Time One, Realistic testing materials
\item Cartoon2: Recall at Time Two, Cartoon testing materials
\item Real2: Recall at Time Two, Realistic testing materials
\end{itemize}
Data are given in the file
\href{http://www.utstat.toronto.edu/~brunner/appliedf14/code_n_data/hw/cartoon2.data}
{\texttt{cartoon2.data}}. There is a link from the course home page in case the one in this document does not work.
    \begin{enumerate}
    \item Think of this as a three-factor design. Which factors are between cases and which are within?
    \item Fortunately, the within-cases factors have only two levels, so you never need to test several difference variables simultaneously. This means you don't need \texttt{manova}; you can get away with purely univariate analysis of variance. Think about this.
    \item Your goal is to test all the main effects and interactions. In every case, you are doing a regression analysis with effect coding, and the response variable is \emph{computed} from the data in the data file. For each effect (how many are there?) specify the response variable, and give the null hypothesis in terms of $\beta$ values. Which of the tests are about the intercept?
    \item Produce all tests of the main effects and interactions. For each one, be able to give the $F$ statistic, the degrees of freedom and the $p$-value. You might as well make a table.
    \item Interpret all results in plain, non-statistical language. You will have to calculate some treatment means to figure out what is going on.
    \item This data set is similar to one of the data sets for the final exam. Which one?
    \end{enumerate}

\pagebreak

\item The Ontario government selects a random sample of $q$ grade schools. From each school, a random sample of $k$ students is selected and given a reading test. School is the explanatory variable.
Because the values of this variable represent a random sample from a larger population, a \emph{random effects} model is appropriate. A standard version that applies to this situation is
\begin{displaymath}
Y_{ij} = \mu + \tau_i + \epsilon_{ij},
\end{displaymath}
where
\begin{itemize}
\item[] $\mu$ is an unknown constant parameter.
\item[] $\tau_i \sim N(0,\sigma^2_\tau)$ and $\epsilon_{ij} \sim N(0,\sigma^2)$.
\item[] $\tau_i$ and $\epsilon_{ij}$ are all independent, $i=1, \ldots, q$ and $j=1, \ldots, k$.
\end{itemize}
\vspace{1mm}
    \begin{enumerate}
    \item What is the distribution of $Y_{ij}$? Just write down the answer. You need not show any work.
    \item\label{indep} Are the $Y_{ij}$ all independent? Consider two cases.
    \item What is the distribution of $\overline{Y}_i = \frac{1}{k} \sum_{j=1}^k Y_{ij}$? State your answer; the only work you need to show is your calculation of the variance.
    \item Find $Cov(\overline{Y}_i,Y_{ij}-\overline{Y}_i)$. Show your work.
    \item Define $SSTR = k\sum_{i=1}^q(\overline{Y}_i -\overline{Y}_. )^2$, where $\overline{Y}_. = \frac{1}{q} \sum_{i=1}^q \overline{Y}_i$. Find the distribution of $\frac{SSTR}{\sigma^2+k\sigma^2_\tau}$. Hint: You can make this a very easy problem. What is the joint distribution of $\overline{Y}_1, \ldots, \overline{Y}_q$?
    \item Define $SSE = \sum_{i=1}^q \sum_{j=1}^k(Y_{ij} - \overline{Y}_i )^2$. Find the distribution of $\frac{SSE}{\sigma^2}$. Again you may use a well-known fact to make the problem easier, but \emph{do not forget your answer to~\ref{indep}}.
    \item The proportion of variance in an observation $Y_{ij}$ that is explained by School is $\frac{\sigma^2_\tau}{\sigma^2_\tau+\sigma^2}$. Give a reasonable estimator for this quantity; show some work.
    \item What null hypothesis would you use to test for the effect of school on students' reading scores? State the null hypothesis in symbolic form; that is, it's a statement in terms of Greek letters.
    \item An exact (not large-sample) test is available for this hypothesis. Give a formula for the test statistic. Also state its distribution under $H_0$, including the degrees of freedom. Briefly indicate why it has the distribution you claim.
    \item Show that the power of this test is based on a \emph{central} rather than a non-central $F$ distribution.
    \item Suppose that the government's primary interest is in testing whether School has any effect at all on average reading score. Since resources are limited, would you advise the government to spend money on sampling more schools, or more students per school? Why?
    \end{enumerate}

\end{enumerate}

\vspace{5mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6.5in} \\ \hline
\end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf14}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf14}}

\end{document}