% 305s14regular2.tex
\documentclass[10pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 305s14 Regular Assignment Four}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}
\noindent This assignment is preparation for the final exam. Your solutions to these homework problems will not be handed in. Use the formula sheet, which is posted on the course home page. As more material is covered, additional problems will be added at the end of the assignment.
% \vspace{3mm}
\begin{enumerate}
%%%%%%%%%%
\item This question should have been on Assignment Three. Assume a random sampling (not randomization) model for a completely randomized one-factor design. Define a contrast of the expected responses as $c = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p = \mathbf{a}^\prime\boldsymbol{\mu}$, and the corresponding contrast of the sample means as $\widehat{c} = a_1\overline{Y}_1 + a_2\overline{Y}_2 + \cdots + a_p\overline{Y}_p = \mathbf{a}^\prime\overline{\mathbf{Y}}$. The ``weights'' $a_1, \ldots, a_p$ add up to zero.
\begin{enumerate}
\item Using scalar (not matrix) calculations, show that $\widehat{c}$ is an unbiased estimator of $c$.
\item Calculate $Var(\widehat{c})$ using scalar (not matrix) notation. Denote the sample sizes by $n_1, \ldots, n_p$.
\item Suppose that $p=2$, $a_1=1$, $a_2=-1$ and the total sample size $n=n_1+n_2$ is fixed, possibly by budgetary constraints. Show that $Var(\widehat{c})$ is minimized (so that the estimate is most accurate on average) when the sample sizes are equal. You did this in assignment three.
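If you feel like checking this numerically before doing the algebra, here is a small sketch (not part of the assignment; the helper \texttt{var\_contrast} is hypothetical, and $\sigma^2=1$ is assumed):

```python
# Numerical check (hypothetical helper, not part of the assignment):
# with p = 2, a1 = 1, a2 = -1 and sigma^2 = 1, Var(c-hat) = 1/n1 + 1/n2.
# Hold n = n1 + n2 fixed, sweep n1, and see where the variance is smallest.

def var_contrast(n1, n):
    """Var(c-hat) with sigma^2 = 1 and n2 = n - n1."""
    return 1.0 / n1 + 1.0 / (n - n1)

n = 100
best_n1 = min(range(1, n), key=lambda n1: var_contrast(n1, n))
print(best_n1)  # the variance is smallest at n1 = n/2 = 50
```

The sweep agrees with the calculus: equal sample sizes give the smallest variance.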
\item \label{threemeans} Now assume that there are $p=3$ experimental conditions, and we are interested in the contrast $\mu_1 - \frac{1}{2}(\mu_2+\mu_3)$. Again, let the total sample size $n=n_1+n_2+n_3$ be fixed. What choice of $n_1$, $n_2$ and $n_3$ minimizes the variance of the estimated contrast? Show your work. I did this by letting $x_1=\frac{n_1}{n}$ and $x_2=\frac{n_2}{n}$, and then minimizing a function of $x_1$ and $x_2$. I took partial derivatives, and then solved two equations in two unknowns and got a satisfying general answer.
% x_1 = 2 x_2 and (4 x_2 - 1)(2 x_2 - 1) = 0 => x1=1/ and x_2 = 1/4
Is this answer really the location of the unique minimum, and not a maximum or saddle point? Well yes, but to really prove it you need to calculate the eigenvalues of a matrix of second partial derivatives, which is the extension of the second derivative test. It's only a $2 \times 2$ matrix, but still it's a lot of work, so let it go. This is a job for software; it's easy with R. As an alternative, you can sort of convince yourself that you have located the minimum by playing around with the function if you feel like it.
\end{enumerate}
\section*{Lecture Unit 9: Factorial ANOVA}
\item Steel is made by heating iron and adding some carbon. A steel company conducted an experiment in which knife blades were manufactured using two different amounts of carbon (Low and High), and three different temperatures (Low, Medium and High). Of course even the Low temperature was very hot. A sample of knife blades was manufactured at each combination of carbon and temperature levels, and then the breaking strength of each blade was measured by a specially designed machine. The response variable is breaking strength.
\begin{enumerate}
\item In a table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test for main effects of Temperature.
\item In another table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test the Temperature by Carbon Level interaction.
\item In one last table with one row for each treatment combination, please make columns showing how you would set up dummy variables for both independent variables, using \emph{effect coding} (that's the scheme with the $-1$s).
\item Write $E(Y|\mathbf{X}=\mathbf{x})$ for the regression model, using the names from your table above. Include the interactions!
\item Using the $\beta$ values from your answer to the preceding question, state the null hypothesis you'd use to test whether the effect of carbon level on breaking strength depends on the temperature.
\end{enumerate}
\item Consider a two-factor analysis of variance in which each factor has two levels. Use this regression model for the problem:
\begin{displaymath}
Y_i = \beta_0 + \beta_1 d_{i,1} + \beta_2 d_{i,2} + \beta_3 d_{i,1}d_{i,2} + \epsilon_i,
\end{displaymath}
where $d_{i,1}$ and $d_{i,2}$ are dummy variables.
\begin{enumerate}
\item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{effect coding} (the scheme with $0,~1,~-1$). In terms of the $\beta$ values, state the null hypothesis you would use to test for
\begin{enumerate}
\item Main effect of the first factor
\item Main effect of the second factor
\item Interaction
\end{enumerate}
\item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{indicator dummy variables} (zeros and ones). In terms of the $\beta$ values, state the null hypothesis you would use to test for
\begin{enumerate}
\item Main effect of the first factor
\item Main effect of the second factor
\item Interaction
\end{enumerate}
\item Which dummy variable scheme do you like more?
\end{enumerate}
\item \label{mathed} In a study of math education in elementary school, equal numbers of boys and girls were randomly assigned to one of three training programs designed to improve spatial reasoning. After five school days of training, the students were given a standardized test of spatial reasoning. Score on this test is the dependent variable.
\begin{enumerate}
\item Write $E[Y|\mathbf{X}]$ for a regression model with an intercept. The model includes the possible interaction of Sex by Program, as well as both main effects. You need not say how the dummy variables are defined. You'll do that in the next item.
\item In the table below, fill in the definitions of the dummy variable(s) for Program, and the dummy variable(s) for Sex. Use \emph{effect coding} (the scheme with $0,~1,~-1$). Show the product terms too. Remember, you \emph{never} include products of the dummy variables for the same factor.
\begin{center}
\begin{tabular}{|l|c|} \hline
Girls, Program 1 & ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \\ \hline
Girls, Program 2 & \\ \hline
Girls, Program 3 & \\ \hline
Boys, Program 1 & \\ \hline
Boys, Program 2 & \\ \hline
Boys, Program 3 & \\ \hline
\end{tabular}
\end{center}
\pagebreak
\item Give the null hypothesis you would test to answer each question below. The answers are in terms of the $\beta$ parameters in your model. Some of the answers are the same.
% This shows wrapping text within a cell of a table. The repeated \vspace{1mm} is unfortunate, but I can't get around it. parbox seems to negate the effects of \renewcommand{\arraystretch}{1.5}, which only seems to affect the top row and rows with a single line of text inside parbox.
% \renewcommand{\arraystretch}{1.5}
\begin{center}\begin{tabular}{|c|c|} \hline
\emph{Question} & \hspace{15mm}\emph{Null Hypothesis}\hspace{15mm} \\ \hline
\parbox{8 cm}{\vspace{1mm}Averaging the expected values for boys and girls, does program affect test score?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of program on test score depend on the sex of the student?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of sex on test score depend on which program the student experienced?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Averaging across expected values for the three programs, is there a sex difference in mean test score?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for sex?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for program?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there an interaction between sex and program?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Test both main effects and the interaction, all at the same time; this is one test.\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Are there any differences in average test score among the six treatment combinations? This is one test. \vspace{1mm}} & \\ \hline
\end{tabular}\end{center}
\renewcommand{\arraystretch}{1.0}
\end{enumerate}
\item Consider again the math education study of Question~\ref{mathed}. Use this notation for the expected test scores.
\begin{center}
\begin{tabular}{cccccc} \hline
\multicolumn{3}{c}{Girls} & \multicolumn{3}{c}{Boys} \\ \hline
Program 1 & Program 2 & Program 3 & Program 1 & Program 2 & Program 3 \\ \hline
$\mu_{11}$ & $\mu_{12}$ & $\mu_{13}$ & $\mu_{21}$ & $\mu_{22}$ & $\mu_{23}$ \\
% \hline
\end{tabular}
\end{center}
\begin{enumerate}
\item In terms of the $\mu_{ij}$ values, state the null hypotheses you would test to answer the following questions.
\begin{enumerate}
\item Averaging the expected values for boys and girls, does program affect test score?
\item Is there an effect of program type for \emph{either} boys or girls? The negation of this is that there is no effect for boys and no effect for girls. A single test is being requested.
\item Does the effect of program on test score depend on the sex of the student?
\item Does the effect of sex on test score depend on which program the student experienced?
\item Averaging across expected values for the three programs, is there a sex difference in mean test score?
\item Is there a sex difference in expected test score for any of the three programs? The negation of this is that there is no difference for any program. You are being asked for one test.
\item Is there a main effect for sex?
\item Is there a main effect for program?
\item Is there an interaction between sex and program?
\item Test both main effects and the interaction, all at the same time; this is one test.
\item Are there any differences in average test score among the six treatment combinations? This is one test.
\end{enumerate}
\pagebreak
\item Now consider contrasts of the form
\begin{displaymath}
L = c_1 \mu_{11} + c_2 \mu_{12} + c_3 \mu_{13} + c_4 \mu_{21} + c_5 \mu_{22} + c_6 \mu_{23}.
\end{displaymath}
For each question below, give the coefficients ($c_j$ quantities) of the contrasts you would test in order to answer the question. In each case, you are testing the null hypothesis that the contrast or set of contrasts equals zero.
\begin{enumerate}
\item Averaging the expected values for boys and girls, does program affect test score?
\item Is there an effect of program type for \emph{either} boys or girls? The negation of this is that there is no effect for boys and no effect for girls. A single test is being requested.
\item Does the effect of program on test score depend on the sex of the student?
\item Does the effect of sex on test score depend on which program the student experienced?
\item Averaging across expected values for the three programs, is there a sex difference in mean test score?
\item Is there a sex difference in expected test score for any of the three programs? The negation of this is that there is no difference for any program. You are being asked for one test.
\item Is there a main effect for sex?
\item Is there a main effect for program?
\item Is there an interaction between sex and program?
\item Test both main effects and the interaction, all at the same time; this is one test.
\item Are there any differences in average test score among the six treatment combinations? This is one test.
\end{enumerate}
\item Suppose you rejected the null hypothesis of no sex by program interaction, and you interpreted this as meaning that the sex differences were not the same for the three programs. Next, you need to follow this up, to determine where the effect came from.
\begin{enumerate}
\item How about testing all pairwise differences between differences? Give the null hypotheses you would test, in terms of $\mu_{ij}$ quantities.
\item In this situation, people often look at the usual tests for pairwise differences between expected values to see where the interaction came from. Are these null hypotheses implied by the null hypothesis of the test we are following up?
\end{enumerate}
\end{enumerate}
\section*{Lecture Unit 10: Analysis of Covariance}
\item Suppose that an experiment has just one treatment condition and one control condition. For experimental units exposed to the control condition, the expected response is $\beta_0 + \beta_1 x$, where $x$ is the value of a covariate. Under the assumption of unit-treatment additivity, what is the expected value in the treatment condition? What is the connection to the concept of an interaction?
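As a reminder of the assumption (a sketch of the definition, with $\tau$ denoting the hypothetical constant treatment effect): unit-treatment additivity says that treatment adds the \emph{same} constant to every unit's response, so that
\begin{displaymath}
Y_i(\mbox{treatment}) = Y_i(\mbox{control}) + \tau \mbox{ for every experimental unit } i.
\end{displaymath}
Taking expected values on both sides, and remembering that $\tau$ does not depend on $x$, should point you in the right direction.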
\item In a study of agricultural productivity, small apple farms are randomly assigned to use one of three Pesticides (Type $A$, $B$ or $C$) and one of three Fertilizers (Type 1, 2 or 3). The dependent variable is total crop yield in kilograms, and there are two covariates: number of trees on the farm, and crop yield last year.
\begin{enumerate}
\item In the table below, fill in the definitions of the dummy variables for Pesticide ($p_1$ and $p_2$), and the dummy variables for Fertilizer ($f_1$ and $f_2$). Use \emph{effect coding} (the scheme with $0,~1,~-1$).
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|} \hline
\emph{Pesticide} & \emph{Fertilizer} & $p_1$ & $p_2$ & $f_1$ & $f_2$ \\ \hline \hline
$A$ & $1$ & & & & \\ \hline
$A$ & $2$ & & & & \\ \hline
$A$ & $3$ & & & & \\ \hline
$B$ & $1$ & & & & \\ \hline
$B$ & $2$ & & & & \\ \hline
$B$ & $3$ & & & & \\ \hline
$C$ & $1$ & & & & \\ \hline
$C$ & $2$ & & & & \\ \hline
$C$ & $3$ & & & & \\ \hline
\end{tabular}
\end{center}
\item Write $E[Y|\mathbf{X}]$ for a model that includes a possible Pesticide by Fertilizer interaction as well as their main effects. Denote the covariates by $X_1$ and $X_2$. Of course the vector $\mathbf{X}$ includes $p_1$, $p_2$ and so on, as well as $X_1$ and $X_2$. There are no interactions between the two covariates, or between covariates and factors.
% \vspace{30mm}
% \pagebreak
\item Give the null hypothesis you would test to answer each question below. The answers are in terms of the $\beta$ parameters in your model. Some of the answers are the same. Except for the last one, assume that each question begins with ``Controlling for number of trees and crop yield last year \ldots''
% This shows wrapping text within a cell of a table. The repeated \vspace{1mm} is unfortunate, but I can't get around it. parbox seems to negate the effects of \renewcommand{\arraystretch}{1.5}, which only seems to affect the top row and rows with a single line of text inside parbox.
% \renewcommand{\arraystretch}{1.5}
\begin{center}\begin{tabular}{|c|c|} \hline
\emph{Question} & \hspace{15mm}\emph{Null Hypothesis}\hspace{15mm} \\ \hline
\parbox{8 cm}{\vspace{1mm}Averaging across fertilizer types, does type of pesticide affect average crop yield?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of fertilizer type on crop yield depend on the type of pesticide used?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Does the effect of pesticide type on crop yield depend on the type of fertilizer used?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Averaging across pesticide types, does fertilizer type affect average crop yield?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for pesticide type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there a main effect for fertilizer type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Is there an interaction between fertilizer type and pesticide type?\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Test both main effects and the interaction, all at the same time.\vspace{1mm}} & \\ \hline
\parbox{8 cm}{\vspace{1mm}Test both covariates simultaneously, controlling for the main effects and the interaction.\vspace{1mm}} & \\ \hline
\end{tabular}\end{center}
\renewcommand{\arraystretch}{1.0}
\end{enumerate}
\item In the \emph{Eating Study}, pairs of university students came to a Psychology laboratory to eat a meal together. They were either friends or strangers, they ate from either small or large plates, and the food was in either a common bowl or separate bowls. Before the meal, they rated how hungry they were. The variable \texttt{Hunger} in the data is the mean rating of the two subjects who are eating together. The total amount of food they served out onto their plates and the total amount of food they actually ate were recorded, in grams. Here is the SAS program, followed by part of the list file.
\begin{verbatim}
/* eating.sas: Pliner's Yummy data */
options linesize=79 pagesize=500 noovp formdlim='_';
title "Eating Data";

proc format;
     value ffmt 1 = 'Friends'     2 = 'Strangers';
     value pfmt 1 = 'Large Plate' 2 = 'Small Plate';
     value sfmt 1 = 'Common Bowl' 2 = 'Separate bowls';

data chowtime;
     infile 'Eating.data' firstobs=2;
     input Friend Plate Share Hunger FoodSrv FoodEat;
     format Friend ffmt.;
     format Plate pfmt.;
     format Share sfmt.;

proc glm;
     class Friend Plate Share;
     model FoodSrv FoodEat = hunger Friend|Plate|Share;
     lsmeans Friend|Plate|Share;
________________________________________

Eating Data                                                      1

The GLM Procedure

Class Level Information

Class     Levels    Values
Friend         2    Friends Strangers
Plate          2    Large Plate Small Plate
Share          2    Common Bowl Separate bowls

Number of Observations Read    57
Number of Observations Used    57
_______________________________________________________________________________

Eating Data                                                      2

The GLM Procedure

Dependent Variable: FoodSrv

                                 Sum of
Source                DF        Squares    Mean Square   F Value   Pr > F
Model                  8    247157.8287     30894.7286      2.97   0.0087
Error                 48    498824.3092     10392.1731
Corrected Total       56    745982.1379

R-Square    Coeff Var    Root MSE    FoodSrv Mean
0.331319     23.36760    101.9420        436.2537

Source                DF      Type I SS    Mean Square   F Value   Pr > F
Hunger                 1    64064.75136    64064.75136      6.16   0.0166
Friend                 1    58889.57844    58889.57844      5.67   0.0213
Plate                  1    12511.69338    12511.69338      1.20   0.2780
Friend*Plate           1     9950.85369     9950.85369      0.96   0.3327
Share                  1    39207.81294    39207.81294      3.77   0.0580
Friend*Share           1      232.07441      232.07441      0.02   0.8818
Plate*Share            1    50525.55846    50525.55846      4.86   0.0323
Friend*Plate*Share     1    11775.50606    11775.50606      1.13   0.2924

Source                DF    Type III SS    Mean Square   F Value   Pr > F
Hunger                 1     8242.14175     8242.14175      0.79   0.3776
Friend                 1    75213.88643    75213.88643      7.24   0.0098
Plate                  1    13230.41227    13230.41227      1.27   0.2648
Friend*Plate           1     6940.34667     6940.34667      0.67   0.4178
Share                  1    47995.34574    47995.34574      4.62   0.0367
Friend*Share           1      157.24354      157.24354      0.02   0.9026
Plate*Share            1    55315.60357    55315.60357      5.32
   0.0254
Friend*Plate*Share     1    11775.50606    11775.50606      1.13   0.2924
_______________________________________________________________________________

Eating Data                                                      3

The GLM Procedure

Dependent Variable: FoodEat

                                 Sum of
Source                DF        Squares    Mean Square   F Value   Pr > F
Model                  8    215056.3060     26882.0383      2.57   0.0203
Error                 48    502109.4096     10460.6127
Corrected Total       56    717165.7157

R-Square    Coeff Var    Root MSE    FoodEat Mean
0.299870     24.11728    102.2771        424.0825

Source                DF      Type I SS    Mean Square   F Value   Pr > F
Hunger                 1    67061.23143    67061.23143      6.41   0.0147
Friend                 1    48734.93228    48734.93228      4.66   0.0359
Plate                  1     5124.99555     5124.99555      0.49   0.4873
Friend*Plate           1    11886.26519    11886.26519      1.14   0.2918
Share                  1    42825.98220    42825.98220      4.09   0.0486
Friend*Share           1      540.51108      540.51108      0.05   0.8211
Plate*Share            1    34885.13845    34885.13845      3.33   0.0740
Friend*Plate*Share     1     3997.24985     3997.24985      0.38   0.5394

Source                DF    Type III SS    Mean Square   F Value   Pr > F
Hunger                 1    16310.92742    16310.92742      1.56   0.2178
Friend                 1    58466.31824    58466.31824      5.59   0.0222
Plate                  1     4413.24880     4413.24880      0.42   0.5191
Friend*Plate           1     9396.68815     9396.68815      0.90   0.3480
Share                  1    50998.56677    50998.56677      4.88   0.0321
Friend*Share           1      336.44991      336.44991      0.03   0.8584
Plate*Share            1    37062.36817    37062.36817      3.54   0.0659
Friend*Plate*Share     1     3997.24985     3997.24985      0.38   0.5394
_______________________________________________________________________________

Eating Data                                                      5

The GLM Procedure
Least Squares Means

                  FoodSrv       FoodEat
Friend             LSMEAN        LSMEAN
Friends        476.288291    459.474409
Strangers      399.704253    391.952903

                  FoodSrv       FoodEat
Plate              LSMEAN        LSMEAN
Large Plate    421.637672    416.265683
Small Plate    454.354872    435.161629

                               FoodSrv       FoodEat
Friend      Plate               LSMEAN        LSMEAN
Friends     Large Plate     471.172417    463.108264
Friends     Small Plate     481.404164    455.840553
Strangers   Large Plate     372.102926    369.423101
Strangers   Small Plate     427.305579    414.482705

                     FoodSrv       FoodEat
Share                 LSMEAN        LSMEAN
Common Bowl       467.487383    456.113444
Separate bowls    408.505160    395.313867

                                  FoodSrv       FoodEat
Friend      Share                  LSMEAN        LSMEAN
Friends
            Common Bowl    507.470277    492.347545
Friends     Separate bowls 445.106304    426.601272
Strangers   Common Bowl    427.504490    419.879343
Strangers   Separate bowls 371.904016    364.026462

                                    FoodSrv       FoodEat
Plate         Share                  LSMEAN        LSMEAN
Large Plate   Common Bowl        483.349917    473.039931
Large Plate   Separate bowls     359.925426    359.491434
Small Plate   Common Bowl        451.624849    439.186958
Small Plate   Separate bowls     457.084894    431.136300

                                              FoodSrv       FoodEat
Friend      Plate         Share                LSMEAN        LSMEAN
Friends     Large Plate   Common Bowl      549.411222    530.999536
Friends     Large Plate   Separate bowls   392.933613    395.216993
Friends     Small Plate   Common Bowl      465.529332    453.695554
Friends     Small Plate   Separate bowls   497.278996    457.985551
Strangers   Large Plate   Common Bowl      417.288613    415.080326
Strangers   Large Plate   Separate bowls   326.917240    323.765875
Strangers   Small Plate   Common Bowl      437.720366    424.678361
Strangers   Small Plate   Separate bowls   416.890792    404.287049
\end{verbatim}
Now please answer these questions. Please remember to ignore the tests based on Type I sums of squares; Type III corresponds to the general linear test.
\begin{enumerate}
\item This is an analysis of covariance. What is the covariate?
\item I believe it's possible (though not guaranteed) that the covariate was influenced by one of the factors. Which one, and why?
\item Controlling for reported hunger and averaging across size of plate and whether they were serving from a common bowl or separate bowls, did the amount the subjects ate depend on whether they were eating with a friend? Give the value of the $F$ statistic, the $p$-value, and whether you reject $H_0$ at $\alpha=0.05$. In plain language, what do you conclude?
\item Controlling for reported hunger and averaging across size of plate and whether they were eating with a friend or a stranger, did the amount the subjects ate depend on whether they were served from a common bowl or separate bowls? Give the value of the $F$ statistic, the $p$-value, and whether you reject $H_0$ at $\alpha=0.05$.
In plain language, what do you conclude?
\item Controlling for reported hunger and averaging across whether they were eating with a friend or a stranger, it looks like the amount of food served onto the plate might depend on the \emph{combination} of the size of plate and whether they were served from the same bowl. Give the value of the $F$ statistic and the $p$-value. Describe the interaction in plain language.
\end{enumerate}
\item In a study of fuel efficiency, mid-sized SUVs were randomly assigned to receive one of three fuel additives (call them $A$, $B$ and $C$), and then their fuel consumption in liters per 100 kilometers was assessed by driving over a standard course. Weight of the vehicle was a covariate. For this problem, you will use a regression model with an intercept. You will use \emph{effect coding} (the scheme with $0,~1,~-1$). The model will allow you to test for a possible interaction of fuel additive by vehicle weight. That's right, just multiply the dummy variables by the covariate. It's not so obvious why this is a good idea with effect coding. The point of this question is to see how it works.
\begin{enumerate}
\item Make a table with one row for each experimental condition, showing how the dummy variables are defined. Make a wider row in which you show $E[Y|\mathbf{X}]$ for each experimental condition.
\item Make another table in which you simplify your expressions for $E[Y|\mathbf{X}]$, giving the formulas for the regression lines in slope-intercept form.
\item Averaging across the three experimental treatments, what is the average (arithmetic mean) intercept? Give the answer in terms of the $\beta$ parameters of your model.
\item What is the average slope?
\item What is the difference between the slope for additive $A$ and the average slope?
\item What is the difference between the intercept for additive $B$ and the average intercept?
\item What is the difference between the slope for additive $C$ and the average slope?
\item What is the difference between the intercept for additive $C$ and the average intercept?
\item In terms of the $\beta$ parameters of your model, what null hypothesis would you test to compare the slopes of additive $A$ and additive $C$? Simplify.
\item In terms of the $\beta$ parameters of your model, what null hypothesis would you test to check the equal slopes assumption?
\item In terms of the $\beta$ parameters of your model, what null hypothesis would you test to see whether the three regression lines were the same? (This is sometimes called ``equal regressions.'')
\end{enumerate}
\section*{Lecture Unit 12: Randomized block designs}
\item Sprinters on High School track teams are to be randomly assigned to different doses of anabolic steroids. Briefly explain why team would be a good blocking variable.
\item Suppose there are $m$ treatments, arranged in a randomized complete block design with $k$ blocks. This means each treatment appears exactly once within each block.
\begin{enumerate}
\item How many experimental units are required? % mk
\item In how many ways can the units within each block be assigned to experimental treatments? % m!
\item In how many total ways can the experimental units be assigned to treatments? % (m!)^k
\item What is the maximum number of values in the permutation distribution of the test statistic? % (m!)^k
\end{enumerate}
\item As in the last question, suppose there are $m$ treatments, arranged in a randomized complete block design with $k$ blocks, so that each treatment appears exactly once within each block. Consider a regression model for this, with effect coding for the dummy variables and products for the interaction of Block and Treatment.
\begin{enumerate}
\item How many dummy variables are there for Treatment?
\item How many dummy variables are there for Block?
\item How many products of dummy variables are in the regression model?
\item How many regression coefficients are there in total?
\item What is the sample size $n$?
\item What are the error degrees of freedom for this model?
\item Looking at the formula sheet, why does this tell you that you can't test any hypotheses with the general linear $F$-test?
\end{enumerate}
So now you see why models for randomized block designs have no interaction between block and treatment. However, you should know that the interaction is testable another way -- one due to our old friend Mr.~Tukey.
\item A study is designed to compare two contact lens cleaning solutions in a randomized block design. Block is the person. For each person, one eye is randomly assigned to each treatment. The dependent variable is amount of redness in the eye as rated by a nurse after two weeks. For a small study with only five subjects,
\begin{enumerate}
\item Write a regression model with an intercept. Begin with ``$Y_i = \ldots$'' and so on.
\item Make a table showing how your dummy variables are defined.
\item In terms of the $\beta$ parameters of your model, what null hypothesis would you test to see which contact lens solution worked better?
\end{enumerate}
\item The lecture slide show has an example of a Latin Square design with four treatments.
\begin{enumerate}
\item Write down a Latin Square design with three treatments.
\item For your design with three treatments, write a regression model with an intercept. Begin with ``$Y_i = \ldots$'' and so on.
\item For your design with three treatments, make a table showing how your dummy variables are defined.
\item In terms of the $\beta$ parameters of your model, what null hypothesis would you test to see whether there were any treatment effects?
\end{enumerate}
\pagebreak
\section*{Lecture Unit 14: Choosing sample size}
\item Assume a random sampling model for a completely randomized design with $p$ experimental treatments.
Let
\begin{eqnarray*}
\ell & = & a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p \\ &&\\
\hat{\ell} & = & a_1\overline{Y}_1 + a_2\overline{Y}_2 + \cdots + a_p\overline{Y}_p
\end{eqnarray*}
\begin{enumerate}
\item Calculate the expected value and variance of $\hat{\ell}$.
\item In terms of a regression model with cell means coding (indicator dummy variables and no intercept), what is $\hat{\ell}$?
\item The ordinary $(1-\alpha)100\%$ confidence interval for $\ell$ is $\hat{\ell}$ plus or minus something. What is that something? Use the formula sheet.
\item Why might it be reasonable to assume that the distribution of $\hat{\ell}$ is approximately normal even if the $Y_{ij}$ data values are not?
\item Derive a formula for $Pr\{|\hat{\ell}- \ell| < m \}$, where $m$ is some desired margin of error. Express your answer in terms of the function $\Phi(x)$, the cumulative distribution function of a standard normal random variable.
\item Using $z_{\alpha/2}$ to denote the number satisfying $\Phi(z_{\alpha/2}) = 1-\alpha/2$, what value of the sample size $n$ is required for $Pr\{|\hat{\ell}- \ell| < m \} = 1-\alpha$? Show your work.
\item In a study with just an experimental group and a control group, the experimenter uses intuition to decide $n_2=2n_1$, because units in the control group require about half as much time (and therefore expense) to process. What should the minimum total sample size be so that $\mu_1-\mu_2$ can be estimated to within $\frac{\sigma}{10}$, with probability 0.95? For comparison, the (corrected) lecture slides indicate that with $n_1=n_2$, the required sample size is $n=1{,}538$. Note that the formula sheet now has critical values of the standard normal distribution.
\item Continuing with the last item, which experiment will cost more to achieve the same precision, the one with equal sample sizes, or the one with unequal sample sizes?
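The $n=1{,}538$ benchmark quoted above can be reproduced numerically. Here is a sketch (assuming the normal critical value $z_{0.025} \approx 1.96$, with each group size rounded up to a whole number):

```python
# Reproduce the n = 1,538 equal-allocation benchmark from the lecture slides.
# To estimate mu1 - mu2 to within sigma/10 with probability 0.95 and n1 = n2 = k,
# we need z * sqrt(sigma^2 * (1/k + 1/k)) <= sigma/10, i.e. k >= 2 * z^2 / margin^2
# where the margin is expressed in units of sigma.
import math

z = 1.96          # standard normal critical value for 0.95 (assumed)
margin = 1 / 10   # margin of error in units of sigma

k = math.ceil(2 * z**2 / margin**2)  # per-group size, rounded up
print(k, 2 * k)   # 769 1538
```

The same calculation with the unequal allocation (replace $1/k + 1/k$ by $1/n_1 + 1/(2n_1)$) is the pattern you need for the question above.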
\item In an experiment with $p=3$ treatment conditions, suppose we are interested in the contrast $\mu_1 - \frac{1}{2}(\mu_2+\mu_3)$.
\begin{enumerate}
\item With equal sample sizes, what total sample size $n=n_1+n_2+n_3$ is required so that the contrast can be estimated to within $\frac{\sigma}{4}$ of its true value, with $90\%$ probability? % n_1=n_2=n_3=65, n=195
\item In Problem~\ref{threemeans}, you found that the variance was least with $n_1=2n_2$ and $n_2=n_3$. How much total sample size would you save on the last problem if you used these optimal relative sample sizes? % Save n=19
\end{enumerate}
\item Specialize the formula on the formula sheet for the case of estimating a single expected value based on one random sample. % n = sigma^2 z^2 / m^2
\item No matter what he does, the approval rating of our Mayor seems to hover around 40\%. We are planning a survey to estimate it again.
\begin{enumerate}
\item Assuming $Y_1, \ldots, Y_n$ are Bernoulli and the approval rating does not change a whole lot, this gives us a pretty good idea of $\sigma^2$. What is a good guess of $\sigma^2$?
\item We want to be able to say the usual ``These results are expected to be accurate within three percentage points, 19 times out of 20.'' If the approval rating goes up, we might have to say something like ``accurate to within 3.5 percentage points,'' but nobody will really mind. What is the required sample size? % 0.24 * 1.96^2 / 0.03^2 = 1024.427 -> 1025
\end{enumerate}
\end{enumerate}
% \vspace{20mm}
%%%%%%%%%%%%%%%%%%%% Power
\item If a random variable $W$ has moment-generating function
\begin{displaymath}
M_W(t) = (1-2t)^{-\frac{\nu}{2}} e^{\frac{\lambda t}{1-2t}}
\end{displaymath}
then we say that $W$ has a non-central chi-square distribution with degrees of freedom $\nu>0$ and non-centrality parameter $\lambda \geq 0$, and we write $W \sim \chi^2(\nu,\lambda)$.
Let $W_1, \ldots, W_n$ be independent random variables, with $W_i \sim \chi^2(\nu_i,\lambda_i)$, and let $W = \sum_{i=1}^n W_i$. Find the distribution of $W$; show your work.
\item Recall from lecture that if $Z$ is normal with variance one, then $Z^2 \sim \chi^2(1,(E(Z))^2)$.
\begin{enumerate}
\item First the univariate version: Let $Y \sim N(\mu,\sigma^2)$. Show $\frac{Y^2}{\sigma^2} \sim \chi^2(1,\frac{\mu^2}{\sigma^2})$.
\item Now the multivariate version: Let $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Show $ \mathbf{Y}^\prime \boldsymbol{\Sigma}^{-1}\mathbf{Y} \sim \chi^2(p, \boldsymbol{\mu}^\prime \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu})$.
\end{enumerate}
\item For the general linear test, you know that whether $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{t}$ is true or not, $SSE/\sigma^2 \sim \chi^2(n-p)$ and $SSE$ is independent of $\widehat{\boldsymbol{\beta}}$. Thus, to prove that $F^*$ on the formula sheet has a non-central $F$ distribution, all you need to show is that
\begin{displaymath}
\frac{1}{\sigma^2}(\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t})^\prime
(\mathbf{C}(\mathbf{X}^\prime \mathbf{X})^{-1}\mathbf{C}^\prime)^{-1}
(\mathbf{C}\widehat{\boldsymbol{\beta}}-\mathbf{t})
\end{displaymath}
has a non-central chi-square distribution. Show it, and also verify that the formula for the non-centrality parameter on the formula sheet is correct.
\item For the special case of testing the difference between a treatment and control condition,
\begin{enumerate}
\item Show that the non-centrality parameter may be written $\lambda = n f (1-f) d^2$, where $f = \frac{n_1}{n}$ and $d = \frac{|\mu_1-\mu_2|}{\sigma}$.
\item You know that the greater the non-centrality parameter, the greater the power. Prove that for any total sample size $n$ and any effect size $d$, the power is greatest when $n_1=n_2$.
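For the last item, notice that $\lambda = nf(1-f)d^2$ depends on the sample sizes only through $g(f) = f(1-f)$ for $0 < f < 1$. A sketch of the calculus (check the details yourself):
\begin{displaymath}
g^\prime(f) = 1 - 2f = 0 \iff f = \frac{1}{2},
\hspace{5mm} \mbox{and} \hspace{5mm}
g^{\prime\prime}(f) = -2 < 0,
\end{displaymath}
so $g$ has its maximum at $f = \frac{n_1}{n} = \frac{1}{2}$, that is, at $n_1 = n_2$.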
\end{enumerate}
\item In an experiment with three conditions, the investigator plans to test $H_0: \mu_1=\mu_2=\mu_3$, and decides on equal sample sizes. The investigator would like to be able to reject the null hypothesis with high probability when $\mu_1=45$, $\mu_2=50$, $\mu_3=55$ and $\sigma^2=100$. The non-centrality parameter $\lambda$ may be written as the sample size $n$, multiplied by a quantity that could be called ``effect size.'' Find the effect size for this problem. The answer is a number. This is something you can do by hand, but you can also look at the code on the lecture slides and see how to check your answer with SAS.
% Answer is 1/6

You can invert a $2 \times 2$ matrix by hand if you need to, but to make it easier you might want to use orthogonal contrasts. The ones I used are called \emph{orthogonal polynomials}, and if you look at the coefficients you can see why. Here they are:
\begin{center}
\begin{tabular}{rrr}
-1 &  0 & 1 \\
 1 & -2 & 1
\end{tabular}
\end{center}
%%%%%%%%%%
%%%%%%%%%%
\end{enumerate}

\vspace{10mm}
\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/305s14}{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/305s14}}

\end{document}

# R function for the estimation problems. If m is in units of sigma, let sigma=1.
estn = function(a, f, m, alpha=0.05, sigma=1)
    # Minimum n so the contrast sum(a*mu) is estimated to within m
    # with probability 1-alpha, given relative sample sizes f.
    {
    z = qnorm(1-alpha/2)
    f = f/sum(f)
    n = (sigma^2 * z^2 * sum(a^2 / f)) / m^2
    return(n)
    } # End function estn

> estn(a=c(1,-1), f=c(1/2,1/2), m=1/10)
[1] 1536.584
> estn(a=c(1,-1/2,-1/2), f=c(1/3,1/3,1/3), m=1/4, alpha=0.10)
[1] 194.7991
> 194.7991/3
[1] 64.93303
> 65*3
[1] 195

# Oh dear, more R work on that last one.
sigma = 10
mu = rbind(45,50,55)
L = rbind(c(1,-1,0),
          c(0,1,-1))
eff = L %*% mu; eff
f = c(1,1,1) # Relative sample sizes
f <- f/sum(f)
kore <- solve(L %*% diag(1/f) %*% t(L))
effsize <- t(eff) %*% kore %*% eff / sigma^2
effsize
          [,1]
[1,] 0.1666667

# Same calculation with the orthogonal polynomial contrasts.
L = rbind(c(-1,0,1),
          c(1,-2,1))
eff = L %*% mu; eff
f = c(1,1,1) # Relative sample sizes
f <- f/sum(f)
kore <- solve(L %*% diag(1/f) %*% t(L))
effsize <- t(eff) %*% kore %*% eff / sigma^2
effsize

# First the natural ugly way.
> estn(a=c(1,-1/2,-1/2), f=c(2,1,1), m=1/4, alpha=0.10)
[1] 173.1548
> 173.1548/4
[1] 43.2887
> 88*2
[1] 176