\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage} % Good for US Letter paper
% \pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 2201/442 Assignment 11}}\\
\vspace{4 mm}
\end{center}

\begin{enumerate}

\item Steel is made by heating iron and adding some carbon. A steel company conducted an experiment in which knife blades were manufactured using two different amounts of carbon (Low and High), and three different temperatures (Low, Medium and High). Of course even the Low temperature was very hot. A sample of knife blades was manufactured at each combination of carbon and temperature levels, and then the breaking strength of each blade was measured by a specially designed machine. The response variable is breaking strength.
\begin{enumerate}
    \item In a table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test for main effects of Temperature.
    \item In another table with one row for each treatment combination, please make columns giving the coefficients of the contrast or contrasts you would use to test the Temperature by Carbon Level interaction.
    \item In one last table with one row for each treatment combination, please make columns showing how you would set up dummy variables for both independent variables, using \emph{effect coding} (that's the scheme with the -1s).
    \item Write $E(Y|\mathbf{X=x})$ for the regression model, using the names from your table above. Include the interactions!
\item Using the $\beta$ values from your answer to the preceding question, state the null hypothesis you'd use to test whether the effect of carbon level on breaking strength depends on the temperature. \end{enumerate} \item Consider a two-factor analysis of variance in which each factor has two levels. Use this regression model for the problem: \begin{displaymath} Y_i = \beta_0 + \beta_1 d_{i,1} + \beta_2 d_{i,2} + \beta_3 d_{i,1}d_{i,2} + \epsilon_i, \end{displaymath} where $d_{i,1}$ and $d_{i,2}$ are dummy variables. \begin{enumerate} \item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{effect coding}. In terms of the $\beta$ values, state the null hypothesis you would use to test for \begin{enumerate} \item Main effect of the first factor \item Main effect of the second factor \item Interaction \end{enumerate} \item Make a two-by-two table showing the four treatment means in terms of $\beta$ values. Use \emph{indicator dummy variables} (zeros and ones). In terms of the $\beta$ values, state the null hypothesis you would use to test for \begin{enumerate} \item Main effect of the first factor \item Main effect of the second factor \item Interaction \end{enumerate} \item Which dummy variable scheme do you like more? \end{enumerate} \item In the slide show on power for the $F$ test, I obtained a special case formula for comparing two means. It was $\phi = n f (1-f) d^2$, where $f = \frac{n_1}{n}$ and $d = \frac{|\mu_1-\mu_2|}{\sigma}$. I did this the hard way, using a regression model with an intercept and a single indicator dummy variable. I believe it would be a lot easier with cell means coding. Try it. You should get the same formula for $\phi$. The \href{http://www.utstat.toronto.edu/~brunner/appliedf12/2101f12Formulas1.pdf} {\texttt{formula sheet}} has the general expression for the non-centrality parameter $\phi$. The formula sheet will be provided with the final exam, and also with Quiz 11 if necessary. 
There is a link to the formula sheet on the course home page in case the one in this document does not work.

\item Remember the rotten potato example of factorial ANOVA. It was a $2 \times 3$ design with equal sample sizes. Buried in all those tests of special contrasts was a test of equality of three Bacteria means at cool temperature. Suppose the means for Bacteria types 1 and 2 were equal, but the mean for Bacteria type 3 was one-half standard deviation lower.
\begin{enumerate}
    \item What was the power to detect this effect? The answer is a number between zero and one.
    % Get phi = n/6 ... p=6, r=2
    % > phi=9
    % > 1 - pf(qf(0.95,2,48),2,48,phi)
    % [1] 0.7425811
    % > fpow2(6,2,effsize=9/54)
    % [1] 62
    % 62 -> 66
    \item What \emph{total sample size} would have been required for a power of 0.80? Remember, all 6 sample sizes are equal.
\end{enumerate}
Please use R and bring your printout to the quiz. You may use my function \texttt{fpow2} if you wish. The lecture slides show how to get it. After you use the \texttt{source} command, type \texttt{fpow2} to see the definition of the function.

\item I know this is pretty gruesome, but the data are real. An experiment in dentistry seeks to test the effectiveness of a drug (HEBP) that is supposed to help dental implants become more firmly attached to the jaw bone. This is an initial test on animals. False teeth were implanted into the leg bones of rabbits, and the rabbits were randomly assigned to receive either the drug or a saline solution (placebo). Technicians administering the drug were blind to experimental condition. Rabbits were also randomly assigned to be ``sacrificed'' after either 3, 6, 9 or 12 days. At that time, the implants were pulled out of the bone by a machine that measures force in newtons and stiffness in newtons/mm. For both of these measurements, higher values indicate more healing. A measure of ``preload stiffness'' in newtons/mm is also available for each animal.
This may be another indicator of how firmly the false tooth was implanted into the bone, but it might even be a covariate. Nobody seems to remember what ``preload'' means, so we'll ignore this variable for now. The data are available in the file
\href{http://www.utstat.toronto.edu/~brunner/appliedf12/data/bunnies.data}
{\texttt{bunnies.data}}. The variables are
\begin{itemize}
    \item Identification code
    \item Time (3, 6, 9, 12 days of healing)
    \item Drug (1=HEBP, 0=saline solution)
    \item Stiffness in newtons/mm
    \item Force in newtons
    \item Preload stiffness in newtons/mm
\end{itemize}
Classify the factors as within cases or between cases. Then please do the following using SAS. Remember that in the \texttt{input} statement, character-valued variables should be followed by a dollar sign.
\begin{enumerate}
    \item Use \texttt{proc freq} to find out how many rabbits are in each experimental condition.
    \item Using \texttt{proc glm}, conduct a two-way ANOVA, with force as the dependent variable. Use the \texttt{means} statement to get cell means and marginal means. Be prepared to answer the following questions about each of the significance tests that SAS produces by default (I count 4 default tests).
    \begin{enumerate}
        \item What is the value of the test statistic? The answer is a number from your printout.
        \item What proportion of the remaining variation is explained? You can use \texttt{proc iml}, or just be ready to do it with a calculator. The formula will be supplied with the quiz if this question is asked.
        \item What is the $p$-value? The answer is a number from your printout.
        \item Do you reject the null hypothesis at the 0.05 level? Yes or No.
        \item What, if anything, do you conclude? This is not the place for statistical jargon. ``What do you conclude'' means say something about the drug, healing, time -- something like that.
    \end{enumerate}
    \item I know this is a bit redundant with the preceding question, but \emph{did the drug work?} If the results justify an answer, then answer Yes or No.
    \item Now, make a table with a row for each treatment combination. Give the coefficients of the contrast or contrasts that would be used to test for
    \begin{enumerate}
        \item The main effect of Drug
        \item The main effect(s) of Time
        \item The Drug by Time interaction.
    \end{enumerate}
    \item Make another table with a row for each treatment combination. Make columns showing the dummy variables for effect coding.
    \item Give $E[Y|\mathbf{X}=\mathbf{x}]$ for a regression model with both main effects and the interaction. Use your variable names from the preceding question.
    \item In terms of the $\beta$ values of your regression model, give the null hypothesis you would test in order to answer each of the following questions.
    \begin{enumerate}
        \item Averaging across time periods, is there a difference between the drug and placebo in mean force required to extract the tooth?
        \item Averaging across drug and placebo, does elapsed time affect the mean force required to extract the tooth?
        \item Does the effect of the drug depend upon elapsed time?
    \end{enumerate}
    \item Now please return to SAS. Using \texttt{proc reg} and cell means coding, conduct tests to answer the following questions. Just do regular one-at-a-time (custom) tests. Don't bother with any Bonferroni correction this time. Just consider one dependent variable: Force. As usual, we are guided by the $\alpha = 0.05$ significance level.
    \begin{enumerate}
        \item Are the marginal means different at 3 and 6 days?
        \item Are the marginal means different at 6 and 9 days?
        \item Are the marginal means different at 9 and 12 days?
        \item Is there a difference between Drug and Placebo just at 3 days?
        \item Is there a difference between Drug and Placebo just at 6 days?
        \item Is there a difference between Drug and Placebo just at 9 days?
        \item Is there a difference between Drug and Placebo just at 12 days?
        \item Be able to answer questions like these for each test:
        \begin{enumerate}
            \item What is the value of the test statistic? The answer is a number from your printout.
            \item What proportion of the remaining variation is explained? You can use \texttt{proc iml}, or just be ready to do it with a calculator. The formula will be supplied with the quiz if this question is asked.
            \item What is the $p$-value? The answer is a number from your printout.
            \item Do you reject the null hypothesis at the 0.05 level? Yes or No.
            \item What, if anything, do you conclude? This is not the place for statistical jargon. ``What do you conclude'' means say something about the drug, healing, time -- something like that.
        \end{enumerate}
    \end{enumerate}
\end{enumerate}

\end{enumerate}

\vspace{5mm}
\noindent
Please bring your \texttt{SAS} log file and list file to the quiz. Remember, the log file is not the same as a plain listing of the program. Please be sure that the log file and list file come from the same SAS run. \emph{Do not write anything on your printouts in advance except maybe your name} (and student number, if you wish).

\end{document}

Suppose you have a data set with a quantitative dependent variable Y, quantitative independent variables X1 and X2, and factors A, B and C. Factor A has 3 levels (categories), factor B has 2 levels and factor C has 3 levels. You see this SAS code:

proc glm;
     class A B C;
     model Y = X1 X2 A|B|C;

Is this an analysis of covariance? Answer Yes or No. If Yes, what are the covariates?

Indicate how you would define dummy variables for the factors, using effect coding. Use names like a1, a2, and so on.

Write E(Y|X=x) for a regression model equivalent to the proc glm model above.

For the initial F-test produced by proc glm, state the null hypothesis in terms of β quantities from your regression model. How many (numerator) degrees of freedom will be in your test?
For each of the following effects in the proc glm model, state the null hypothesis in terms of β quantities from your regression model.
Main effect of A.
Main effect of B.
Main effect of C.
A by B interaction.
A by C interaction.
B by C interaction.
A by B by C interaction.

Controlling for the covariates, are there any differences among the 18 treatment means? State the null hypothesis you would test in order to answer this question. Your answer should be in terms of β quantities from your regression model.