\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
\topmargin=-.3in
\textheight=9.4in
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2101/442 Assignment Twelve}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\vspace{3mm}

\noindent These questions are practice for the final exam, and are not to be handed in. Material like this may or may not be on the final.
% needs factorial, merror, mixed.

\begin{enumerate}

\item Consider simple regression through the origin in which the explanatory variable values are random variables rather than fixed constants. In addition, the explanatory variable values cannot be observed directly. Instead, we observe $X_i$ plus a piece of random noise. The model is this: Independently for $i=1, \ldots, n$, let
\begin{eqnarray} \label{witherror}
Y_i & = & X_i \beta + \epsilon_i \\
W_i & = & X_i + e_i, \nonumber
\end{eqnarray}
where
\begin{itemize}
\item $X_i$ has expected value $\mu$ and variance $\sigma^2_x$,
\item $\epsilon_i$ has expected value $0$ and variance $\sigma^2_\epsilon$,
\item $e_i$ has expected value $0$ and variance $\sigma^2_e$, and
\item $X_i$, $e_i$ and $\epsilon_i$ are all independent.
\end{itemize}
Again, the $X_i$ values are unavailable. All we can see are the pairs $(W_i,Y_i)$ for $i=1, \ldots, n$.
\begin{enumerate} % Starting parts of the mereg question
\item Following common practice, we ignore the measurement error and apply the usual regression estimator with $W_i$ in place of $X_i$.
The parameter $\beta$ is estimated by
\begin{displaymath}
\widehat{\beta}_{(2)} = \frac{\sum_{i=1}^n W_iY_i}{\sum_{i=1}^n W_i^2}.
\end{displaymath}
Is $\widehat{\beta}_{(2)}$ a consistent estimator of $\beta$? Answer YES, NO or IMPOSSIBLE TO DETERMINE. Show your work.

\item Now consider instead the estimator
\begin{displaymath}
\widehat{\beta}_{(3)} = \frac{\sum_{i=1}^n Y_i}{\sum_{i=1}^n W_i}.
\end{displaymath}
Is $\widehat{\beta}_{(3)}$ a consistent estimator of $\beta$? Answer YES, NO or IMPOSSIBLE TO DETERMINE. Show your work.

\item If $X_i$, $\epsilon_i$ and $e_i$ are normally distributed, Model~(\ref{witherror}) has five parameters: $\beta$, $\mu$, $\sigma^2_x$, $\sigma^2_\epsilon$ and $\sigma^2_e$. Perhaps surprisingly, it is possible to obtain the maximum likelihood estimates explicitly without differentiating anything. Do it if you can. It is acceptable to start the calculation and just indicate how you would finish it without giving formulas for all the MLEs. Is $\widehat{\beta}_{(3)}$ the MLE of $\beta$? Hint: Begin by calculating the covariance matrix of $(W_i,Y_i)^\prime$.

\item Even without assuming normal distributions, a large-sample confidence interval for $\beta$ is within reach. Indicate how you would derive it, not necessarily giving all the details. There is more than one way to get the standard error you need.
\end{enumerate} % Ending parts of mereg question

\item Steel is made by heating iron and adding some carbon. A steel company conducted an experiment in which knife blades were manufactured using two different amounts of carbon (Low and High), and three different temperatures (Low, Medium and High). Of course even the Low temperature was very hot. A sample of knife blades was manufactured at each combination of carbon and temperature levels, and then the breaking strength of each blade was measured by a specially designed machine. The response variable is breaking strength.
\begin{enumerate} \item In a table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test for main effects of Temperature. \item In another table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test the Temperature by Carbon Level interaction. \item In one last table with one row for each treatment combination, please make columns showing how you would set up dummy variables for both independent variables, using \emph{effect coding} (that's the scheme with the -1s). \item Write $E(Y|\mathbf{X=x})$ for the regression model, using the names from your table above. Include the interactions! \item Using the $\beta$ values from your answer to the preceding question, state the null hypothesis you'd use to test whether the effect of carbon level on breaking strength depends on the temperature. \end{enumerate} \item Consider a two-factor analysis of variance in which each factor has two levels. Use this regression model for the problem: \begin{displaymath} Y_i = \beta_0 + \beta_1 d_{i,1} + \beta_2 d_{i,2} + \beta_3 d_{i,1}d_{i,2} + \epsilon_i, \end{displaymath} where $d_{i,1}$ and $d_{i,2}$ are dummy variables. \begin{enumerate} \item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{effect coding} (the scheme with $0,~1,~-1$). In terms of the $\beta$ values, state the null hypothesis you would use to test for \begin{enumerate} \item Main effect of the first factor \item Main effect of the second factor \item Interaction \end{enumerate} \item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{indicator dummy variables} (zeros and ones). 
In terms of the $\beta$ values, state the null hypothesis you would use to test for
\begin{enumerate}
\item Main effect of the first factor
\item Main effect of the second factor
\item Interaction
\end{enumerate}
\item Which dummy variable scheme do you like more?
\end{enumerate}

\pagebreak

\item In a study of agricultural productivity, small apple farms are randomly assigned to use one of three Pesticides (Type $A$, $B$ or $C$) and one of three Fertilizers (Type 1, 2 or 3). The dependent variable is total crop yield in kilograms, and there are two covariates: number of trees on the farm, and crop yield last year.
\begin{enumerate}
\item In the table below, fill in the definitions of the dummy variables for Pesticide ($p_1$ and $p_2$), and the dummy variables for Fertilizer ($f_1$ and $f_2$). Use \emph{effect coding} (the scheme with $0,~1,~-1$).
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|} \hline
\emph{Pesticide} & \emph{Fertilizer} & $p_1$ & $p_2$ & $f_1$ & $f_2$ \\ \hline \hline
$A$ & $1$ & & & & \\ \hline
$A$ & $2$ & & & & \\ \hline
$A$ & $3$ & & & & \\ \hline
$B$ & $1$ & & & & \\ \hline
$B$ & $2$ & & & & \\ \hline
$B$ & $3$ & & & & \\ \hline
$C$ & $1$ & & & & \\ \hline
$C$ & $2$ & & & & \\ \hline
$C$ & $3$ & & & & \\ \hline
\end{tabular}
\end{center}
\item Write $E[Y|\mathbf{X}]$ for a model that includes a possible Pesticide by Fertilizer interaction as well as their main effects. Denote the covariates by $X_1$ and $X_2$. Of course the vector $\mathbf{X}$ includes $p_1$, $p_2$ and so on as well as $X_1$ and $X_2$. There are no interactions between the two covariates, or between covariates and factors.
% \vspace{30mm}
\item Give the null hypothesis you would test to answer each question below. The answers are in terms of the $\beta$ parameters in your model. Some of the answers are the same. Except for the last one, assume that each question begins with ``Controlling for number of trees and crop yield last year \ldots''
% This shows wrapping text within a cell of a table. The repeated \vspace{1mm} is unfortunate, but I can't get around it. parbox seems to negate the effects of \renewcommand{\arraystretch}{1.5}, which only seems to affect the top row and rows with a single line of text inside parbox.
% \renewcommand{\arraystretch}{1.5}
\begin{center}\begin{tabular}{|c|c|} \hline
\emph{Question} & \hspace{15mm}\emph{Null Hypothesis}\hspace{15mm} \\ \hline
\parbox{8 cm}{\vspace{1mm}Does type of pesticide affect average crop yield?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of fertilizer type on crop yield depend on the type of pesticide used?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of pesticide type on crop yield depend on the type of fertilizer used?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does fertilizer type affect average crop yield?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for pesticide type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for fertilizer type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there an interaction between fertilizer type and pesticide type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Test both main effects and the interaction, all at the same time.\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Test both covariates simultaneously, controlling for the main effects and the interaction.\vspace{1mm}} & \\ \hline
\end{tabular}\end{center}
\renewcommand{\arraystretch}{1.0}
\end{enumerate}

\pagebreak

\item The Ontario government selects a random sample of $q$ grade schools. From each school, a random sample of $k$ students is selected and given a reading test. School is the explanatory variable. Because the values of this variable represent a random sample from a larger population, a \emph{random effects} model is appropriate.
A standard version that applies to this situation is
\begin{displaymath}
Y_{ij} = \mu + \tau_i + \epsilon_{ij},
\end{displaymath}
where
\begin{itemize}
\item[] $\mu$ is an unknown constant parameter.
\item[] $\tau_i \sim N(0,\sigma^2_\tau)$ and $\epsilon_{ij} \sim N(0,\sigma^2)$.
\item[] $\tau_i$ and $\epsilon_{ij}$ are all independent, $i=1, \ldots, q$ and $j=1, \ldots, k$.
\end{itemize}
\vspace{1mm}
\begin{enumerate}
\item What is the distribution of $Y_{ij}$? Just write down the answer. You need not show any work.
\item\label{indep} Are the $Y_{ij}$ all independent? Consider two cases.
\item What is the distribution of $\overline{Y}_i = \frac{1}{k} \sum_{j=1}^k Y_{ij}$? State your answer; the only work you need to show is your calculation of the variance.
\item Find $Cov(\overline{Y}_i,Y_{ij}-\overline{Y}_i)$. Show your work.
\item Define $SSTR = k\sum_{i=1}^q(\overline{Y}_i -\overline{Y}_. )^2$, where $\overline{Y}_. = \frac{1}{q} \sum_{i=1}^q \overline{Y}_i$. Find the distribution of $\frac{SSTR}{\sigma^2+k\sigma^2_\tau}$. Hint: You can make this a very easy problem by using something from the formula sheet.
\item Define $SSE = \sum_{i=1}^q \sum_{j=1}^k(Y_{ij} - \overline{Y}_i )^2$. Find the distribution of $\frac{SSE}{\sigma^2}$. Again you may use a well-known fact to make the problem easier, but \emph{do not forget your answer to~\ref{indep}}.
\item The proportion of variance in an observation $Y_{ij}$ that is explained by School is $\frac{\sigma^2_\tau}{\sigma^2_\tau+\sigma^2}$. Give a reasonable estimator for this quantity; show some work.
\item What null hypothesis would you use to test for the effect of school on students' reading scores? State the null hypothesis in symbolic form; that is, it's a statement in terms of Greek letters.
\item An exact (not large-sample) test is available for this hypothesis. Give a formula for the test statistic. Also state its distribution under $H_0$, including the degrees of freedom.
Briefly indicate why it has the distribution you claim.
% \item Show that the power of this test is based on a \emph{central} rather than a non-central $F$ distribution.
% \item Suppose that the government's primary interest is in testing whether School has any effect at all on average reading score. Since resources are limited, would you advise the government to spend money on sampling more schools, or more students per school? Why?
\end{enumerate}

\end{enumerate}

\vspace{30mm}

\noindent
\begin{center}\begin{tabular}{l} \hspace{6.5in} \\ \hline
\end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf13}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf13}}

\end{document}