% Came from STA441s20 through STA442s12, 441s16, 441s18 - replacing most of mixed model
% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
% Uncomment next 2 lines instead of the first for article-style handout:
% \documentclass[12pt]{article}
% \usepackage{beamerarticle}
% \usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols at bottom
% \usetheme{Berlin} % Displays sections on top
% \usetheme{Warsaw} % Displays sections on top
\usetheme{Frankfurt} % Displays sections on top: Fairly thin but swallows some material at bottom of crowded slides
% \usetheme{AnnArbor} % CambridgeUS
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage{graphpap} % Graph paper for pictures.
\setbeamertemplate{footline}[frame number]
% \mode{\setbeamercolor{background canvas}{bg=black!5}}

\title{Within Cases ANOVA Part One: Mixed Model and Multivariate Approaches\footnote{See last slide for copyright information.}}
\subtitle{STA441 Spring 2024} % (optional)
\date{} % To suppress date

\begin{document}

\begin{frame}
\titlepage
\end{frame}

\begin{frame}
\frametitle{Contents}
\tableofcontents
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview of Within Cases}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Within Cases}

Example: A random sample of male and female university students is weighed midway through years 1, 2, 3 and 4. The explanatory variables are gender and year (time).

\vspace{1mm}
Gender is a between-cases factor and year is a within-cases factor.
\pause
\begin{itemize}
\item For a within-cases factor, a case contributes a response variable value for more than one value of the explanatory variable --- usually all of them.
\pause
\item It is natural to expect data from the same case to be correlated -- \emph{not} independent.
\pause
\item For example, the same subject appears in several treatment conditions.
\item Hearing study: How does pitch affect our ability to hear faint sounds? The same subjects will hear a variety of different pitch and volume levels (in a random order). They press a key when they think they hear something.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Student's Sleep Study (\emph{Biometrika}, 1908)}
\framesubtitle{First Published Example of a $t$-test}
\pause
\begin{itemize}
\item Patients take two sleeping medicines several days apart.
\item Half get $A$ first, half get $B$ first.
\item Reported extra hours of sleep are recorded (difference from baseline).
\pause
\item It's natural to subtract, and test whether the mean \emph{difference} equals zero.
\item That's what Gosset did.
\pause
\item But some might do an independent $t$-test with $n_1=n_2$.
\item This treats observations from the same person as independent.
\item It's unrealistic, but is it harmful?
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Matched pairs, testing $H_0: \mu_1=\mu_2$}
\framesubtitle{Independent \emph{vs.} Matched $t$-test}
\pause
\begin{itemize}
\item If the population covariance between the two measurements is positive, the Type I error probability of both tests is no more than 0.05, but the matched $t$-test has better power.
\item If the population covariance between the measurements is negative, the matched $t$-test has Type I error probability 0.05, but the independent $t$-test has Type I error probability greater than 0.05.
\end{itemize}

Why?
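A quick numerical sketch previews the answer. This simulation is illustrative only (the seed, sample size, and correlation are invented, not from the slides): with positively correlated pairs, the standard error computed under a pretend-independence assumption overestimates the true standard error of $\overline{d}$.

```python
import numpy as np

rng = np.random.default_rng(441)  # arbitrary seed
n, rho = 30, 0.8                  # assumed sample size and within-pair correlation
cov = [[1.0, rho], [rho, 1.0]]
y = rng.multivariate_normal([0.0, 0.0], cov, size=n)
d = y[:, 0] - y[:, 1]

# Matched t denominator: SD of the differences over sqrt(n).
se_matched = d.std(ddof=1) / np.sqrt(n)

# "Independent" t denominator: pretends Cov(y1, y2) = 0.
se_indep = np.sqrt(y[:, 0].var(ddof=1) / n + y[:, 1].var(ddof=1) / n)

# With positive within-pair covariance, pretending independence
# overestimates Var(d-bar), so the denominator is too large and
# the t statistic is too small.
print(se_matched < se_indep)
```

Reversing the sign of the correlation reverses the inequality, which is the negative-covariance case on the previous slide.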
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Why the matched $t$-test is better}
%\framesubtitle{}
\begin{itemize}
\item Numerator of both test statistics is $\overline{d} = \overline{y}_1 - \overline{y}_2$.
\pause
\item Denominator is an estimate of the standard deviation of the difference.
\pause
\item $Corr(\overline{y}_1, \overline{y}_2) = Corr(y_{i,1},y_{i,2})$.
\pause
\item So $Cov(\overline{y}_1, \overline{y}_2)$ has the same sign as $Cov(y_{i,1},y_{i,2})$.
\pause
\item $Var(\overline{y}_1 - \overline{y}_2) = Var(\overline{y}_1) + Var(\overline{y}_2) - 2 Cov(\overline{y}_1, \overline{y}_2)$.
\pause
\item If $Cov(\overline{y}_1, \overline{y}_2) > 0$, pretending independence results in overestimation of $Var(\overline{y}_1 - \overline{y}_2)$. Denominator is larger, so the $t$~statistic is smaller.
\pause
\item If $Cov(\overline{y}_1, \overline{y}_2) < 0$, pretending independence results in underestimation of $Var(\overline{y}_1 - \overline{y}_2)$. Denominator is smaller, so the $t$~statistic is greater (too big).
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Within-cases Terminology}

You may hear terms like ``longitudinal" and ``repeated measures."
\vspace{5mm}
\pause
\begin{itemize}
\item \textbf{Longitudinal}: The same variables are measured repeatedly over time. Usually there are lots of variables, including categorical ones, and large samples. If there's an experimental treatment, it's usually applied once at the beginning, like a surgery. Longitudinal studies basically track what happens over time.
\pause
\item \textbf{Repeated measures}: Usually, the same subjects experience two or more experimental treatments. Usually quantitative response variables, and often small samples.
\end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Wine Tasting Example} \framesubtitle{A single within-cases factor} \pause In a taste test of wine, 6 professional judges judged 4 wines. The numbers they gave do not exactly represent quality. Instead, they are maximum prices in dollars per bottle that the judge thinks the company can charge and still sell most of the wine. \pause \begin{itemize} \item Cases are judges: $n=6$. \item Each judge tastes and rates all four wines. \item The single factor is Wine: Four categories. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Archery Example: Bow and Arrow} \framesubtitle{Two within-cases factors} \begin{itemize} \item Cases are archers. There are $n$ archers. \item Test two bows, three arrow types. \pause \item Warmup, then each archer takes 10 shots with each Bow-Arrow combination --- 60 shots. \item In a different random order for each archer, of course. \pause \item $y_{i,1}, \ldots, y_{i,6}$ are mean distances from arrow tip to centre of target, for $i=1, \ldots, n$. \item Each $y_{i,j}$ is based on 10 shots. \pause \item $E(y_{i,j})=\mu_j$ for $j=1,\ldots,6$. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}{One Between, One Within} \begin{itemize} \item Grapefruit study: Cases are $n$ grocery stores. \item Within stores factor: Three price levels. \item Between-stores factor: Incentive program for produce managers (Yes-No). \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}{Monkey Study} \begin{itemize} \item Train monkeys on discrimination tasks, at 16, 12, 8, 4 and 2 weeks prior to treatment. Different task each time, equally difficult (randomize order). 
\pause \item Treatment is to block function of the hippocampus (with drug, not surgery), \pause re-test. Get 5 scores for each monkey. \pause \begin{center} \includegraphics[width=4in]{Timeline} \end{center} \item 11 randomly assigned to treatment, 7 to control. \pause \item Treatment is between, time elapsed since training is within. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}{Advantages of Within-cases Designs}{If measurement of the response variable does not mess things up too much} \pause \begin{itemize} \item Convenience (sometimes). \pause \item Each case serves as its own control. A huge number of extraneous variables are automatically held constant. The result can be a very sensitive analysis. \pause \item For some models, you can have lots of measurements on just a few subjects --- if you are willing to make some assumptions. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}{Three main approaches for normal response variables} \begin{itemize} \item Classical Mixed (Random Shock) model \item Multivariate \item Covariance Structure \pause \item[] \item Randomization. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Mixed Model} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Random Shock Model} \framesubtitle{``Mixed" fixed and random effects} \begin{itemize} \item Expect multiple measurements coming from the same individual to be correlated. \item Here's a model for how it could happen. 
\item In addition to an ordinary regression model, each person (case) brings his or her own individual contribution, a little piece of noise that pushes all his or her data values up or down by the same amount.
\pause
\item Random shock --- random because the individual is randomly sampled.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Mixed Model}
\framesubtitle{Both fixed and random effects}
\begin{itemize}
\item Random shock from each person.
\pause
\item A random effect (for person).
\item A mixed model because there are also fixed effects in the regression model.
\item Fixed means the regression coefficients are constants.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Regression model}
\framesubtitle{Cases $i=1, \ldots, n$, with $j = 1, \ldots, k$ values of $y$ for each case}
{\Large
\begin{displaymath}
y_{ij} = \beta_0 + \beta_1x_{ij1} + \beta_2x_{ij2} + \cdots + \beta_{p-1}x_{ij,p-1} + \epsilon_{ij} + {\color{red} \delta_i}
\end{displaymath}
} % End size
\begin{itemize}
\item $\epsilon_{ij} \sim N(0,\sigma^2)$
\item ${\color{red} \delta_i} \sim N(0,\sigma^2_1)$
\item All $\epsilon_{ij}$ and ${\color{red} \delta_i}$ independent.
\end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[fragile] \frametitle{Example: Grapefruit sales} \framesubtitle{Three price levels within stores, one incentive program between} \begin{verbatim} Store p1 p2 Incent Sales 1 1 0 1 18 1 0 1 1 27 1 0 0 1 14 2 1 0 0 32 2 0 1 0 11 2 0 0 0 9 \end{verbatim} \pause \begin{eqnarray*} y_{i1} & = & \beta_0 + \beta_1 p_{i11} + \beta_2 p_{i12} + \beta_3 d_{i} + \epsilon_{i1} + {\color{red} \delta_i}\\ y_{i2} & = & \beta_0 + \beta_1 p_{i21} + \beta_2 p_{i22} + \beta_3 d_{i} + \epsilon_{i2} + {\color{red} \delta_i}\\ y_{i3} & = & \beta_0 + \beta_1 p_{i31} + \beta_2 p_{i32} + \beta_3 d_{i} + \epsilon_{i3} + {\color{red} \delta_i} \end{eqnarray*} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[fragile] \frametitle{Variances and Covariances} %\framesubtitle{} \begin{eqnarray*} y_{i1} & = & \beta_0 + \beta_1 p_{i11} + \beta_2 p_{i12} + \beta_3 d_{i} + \epsilon_{i1} + {\color{red} \delta_i}\\ y_{i2} & = & \beta_0 + \beta_1 p_{i21} + \beta_2 p_{i22} + \beta_3 d_{i} + \epsilon_{i2} + {\color{red} \delta_i}\\ y_{i3} & = & \beta_0 + \beta_1 p_{i31} + \beta_2 p_{i32} + \beta_3 d_{i} + \epsilon_{i3} + {\color{red} \delta_i} \end{eqnarray*} \vspace{1mm} \begin{eqnarray*} Var(y_{i1}) & = & Var(\epsilon_{i1}) + Var({\color{red} \delta_i}) \\ \pause & = & \sigma^2+\sigma^2_1 \\ \pause & = & Var(y_{i2}) = Var(y_{i3}) \\ \pause &&\\ Cov(y_{i1},y_{i2}) & = & Cov(\epsilon_{i1} + {\color{red} \delta_i} \, , \, \epsilon_{i2} + {\color{red} \delta_i}) \\ \pause & = & Cov({\color{red} \delta_i} , {\color{red} \delta_i}) \pause = Var({\color{red} \delta_i}) = \sigma^2_1 \\ \pause & = & Cov(y_{i1},y_{i3}) = Cov(y_{i2},y_{i3}) \end{eqnarray*} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Compound Symmetry} %\framesubtitle{} {\large \begin{displaymath} cov(\mathbf{y}_i) 
=\left( \begin{array}{c c c c} \sigma^2+\sigma^2_1 & \sigma^2_1 & \sigma^2_1 & \sigma^2_1 \\ \sigma^2_1 & \sigma^2+\sigma^2_1 & \sigma^2_1 & \sigma^2_1 \\ \sigma^2_1 & \sigma^2_1 & \sigma^2+\sigma^2_1 & \sigma^2_1 \\ \sigma^2_1 & \sigma^2_1 & \sigma^2_1 & \sigma^2+\sigma^2_1 \end{array} \right) \end{displaymath} \pause } % End size \begin{itemize} \item All variances equal. \item All covariances equal. \item All covariances positive. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Classical $F$-tests} \framesubtitle{Based on theory of mixed models} \begin{itemize} \item No covariates allowed. \item $F$-tests exist only for balanced designs, \item And not for all models, even if balanced. \item [] \pause \item Modern, high quality \emph{approximate} $F$-tests are available. \item \texttt{proc mixed} in SAS and \texttt{lmer} in R. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Random Intercept} %\framesubtitle{Cases $i=1, \ldots, n$, with $j = 1, \ldots k$ values of $y$ for each case} \begin{eqnarray*} y_{ij} &=& \beta_0 + \beta_1x_{ij1} + \beta_2x_{ij2} + \cdots + \beta_{p-1}x_{ij,p-1} + \epsilon_{ij} + {\color{red} \delta_i} \\ \pause &=& (\beta_0 + {\color{red} \delta_i}) + \beta_1x_{ij1} + \beta_2x_{ij2} + \cdots + \beta_{p-1}x_{ij,p-1} + \epsilon_{ij} \end{eqnarray*} \pause \begin{itemize} \item $(\beta_0 + \delta_i) \sim N(\beta_0,\sigma^2_1)$ \item So the random shock model is sometimes called a ``random intercept" model. 
\pause
\item That's why the \texttt{lmer} syntax in R might look like
\end{itemize}
\begin{center}
\texttt{lmer(sales $\sim$ price*incent + (1|store) )}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Advantages and disadvantages of the random shock model}
%\framesubtitle{}
{\small
\begin{displaymath}
cov(\mathbf{y}_i) =\left( \begin{array}{c c c c}
\sigma^2+\sigma^2_1 & \sigma^2_1 & \sigma^2_1 & \sigma^2_1 \\
\sigma^2_1 & \sigma^2+\sigma^2_1 & \sigma^2_1 & \sigma^2_1 \\
\sigma^2_1 & \sigma^2_1 & \sigma^2+\sigma^2_1 & \sigma^2_1 \\
\sigma^2_1 & \sigma^2_1 & \sigma^2_1 & \sigma^2+\sigma^2_1
\end{array} \right)
\end{displaymath}
\pause
} % End size
\begin{itemize}
\item Makes sense for some data sets more than others.
\item I like it for experiments where the participants respond to several different stimuli, in a different random order for each participant.
\pause
\item For longitudinal data, not so much.
\pause
\item Should the correlation between time 1 and time 2 be the same as the correlation between time 1 and time 50?
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Can be a life saver for small samples}
%\framesubtitle{}

Example: Ten men and ten women judge the beauty of randomly generated shapes. There are seven levels of colour (roygbiv), five levels of complexity, and four levels of size. Each subject judges all $7 \times 5 \times 4 = 140$ pictures, in a different random order for each subject.

\vspace{1mm}
\pause
There are 3 within-subjects factors and one between-subjects factor.
\pause
\begin{itemize}
\item There are 280 $\beta_j$ parameters in the regression model, but only $n=20$ subjects.
\pause
\item However, there are $20 \times 140 = 2,800$ data lines, and only $280+2=282$ parameters.
\item Everything is fine, if you don't hate the model too much.
\pause \item[] % Unknown 140x140 covariance matrix has 9870 unknown parameters. \item General principle: There is a trade-off between sample size and assumptions. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{If assumptions are violated} %\framesubtitle{} \begin{itemize} \item Compound symmetry is assumed, but only ``sphericity" is actually required. \item Sphericity means the variances of the differences are all the same. \item Without sphericity, the Type I error probability can be greater than 0.05. \pause \item[] \item Two corrections of the $p$-values are available: Greenhouse-Geisser and Huynh-Feldt. \item There is also a test for sphericity. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Multivariate Approach} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}{Multivariate Approach to Repeated Measures} \begin{itemize} \item Multivariate methods allow the analysis of more than one response variable at the same time. \item When a case (subject) provides data under more than one set of conditions, it is natural to think of the measurements as multivariate. \pause \item The humble matched $t$-test has a multivariate version: Hotelling's $T^2$. \pause \item Simultaneously test whether the means of several \emph{differences} equal zero. \pause \item Like rating of Wine One minus Wine Two, Wine Two minus Wine Three, and Wine Three minus Wine Four. \pause \item When there are also between-subjects factors (like nationality of judge), use multivariate regression methods. 
\end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Pure within-cases: Multiple factors} \framesubtitle{Archery example} Each archer contributes 6 numbers: \begin{center} \begin{tabular}{|c|c|c|c|} \hline & \multicolumn{3}{c|}{\textbf{Arrow type}} \\ \hline \textbf{Bow type} & $1$ & $2$ & $3$ \\ \hline 1 & $E(y_{i,1})=\mu_{11}$ & $E(y_{i,2})=\mu_{12}$ & $E(y_{i,3})=\mu_{13}$ \\ \hline 2 & $E(y_{i,4})=\mu_{21}$ & $E(y_{i,5})=\mu_{22}$ & $E(y_{i,6})=\mu_{23}$ \\ \hline \end{tabular}\end{center} \pause \begin{itemize} \item Form (sets of) linear combinations of the response variables. \pause \item Want to test main effect of Bow Type? \begin{itemize} \item $H_0: \mu_{11}+\mu_{12}+\mu_{13} = \mu_{21}+\mu_{22}+\mu_{23}$ \pause \item Calculate $L_i = y_{i,1}+y_{i,2}+y_{i,3} - (y_{i,4}+y_{i,5}+y_{i,6})$. \pause \item $E(L_i) = \mu_{11}+\mu_{12}+\mu_{13} - (\mu_{21}+\mu_{22}+\mu_{23})$. \pause \item Test $H_0: E(L_i)=0$. \pause \item Could use an ordinary matched $t$-test for this one. 
\end{itemize} \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Main effect for arrow type} \framesubtitle{Differences between marginal means} \begin{center} \begin{tabular}{|c|c|c|c|} \hline & \multicolumn{3}{c|}{\textbf{Arrow type}} \\ \hline \textbf{Bow type} & $1$ & $2$ & $3$ \\ \hline 1 & $E(y_{i,1})=\mu_{11}$ & $E(y_{i,2})=\mu_{12}$ & $E(y_{i,3})=\mu_{13}$ \\ \hline 2 & $E(y_{i,4})=\mu_{21}$ & $E(y_{i,5})=\mu_{22}$ & $E(y_{i,6})=\mu_{23}$ \\ \hline \end{tabular}\end{center} \pause \begin{itemize} \item $H_0: \mu_{11}+\mu_{21} = \mu_{12}+\mu_{22}$ and $\mu_{12}+\mu_{22} = \mu_{13}+\mu_{23}$ \pause \item Calculate two linear combinations for each archer: \begin{itemize} \item $L_{i,1} = y_{i,1}+y_{i,4}-(y_{i,2}+y_{i,5})$ \item $L_{i,2} = y_{i,2}+y_{i,5}-(y_{i,3}+y_{i,4})$ \pause \end{itemize} \item Simultaneously test $H_0: E(L_{i,1})=0$ and $E(L_{i,2})=0$.\pause \item Use Hotelling's $T^2$. \item Or something equivalent. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[fragile] \frametitle{Matched $t$-tests with \texttt{proc reg}} \pause %\framesubtitle{} \begin{itemize} \item Regression with no explanatory variables. \item $y_i = \beta_0 + \epsilon_i \pause \sim \pause N(\beta_0,\sigma^2)$. \pause \item Test $H_0: \beta_0=0$. \end{itemize} \pause \begin{verbatim} proc reg; model y = ; \end{verbatim} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[fragile] \frametitle{Hotelling's $T$-squared} \framesubtitle{Multivariate matched $t$-test} \pause \begin{itemize} \item Official SAS documentation claims that SAS won't calculate Hotelling's $T$-squared, but \ldots \pause \item $T^2 = (n-1) \left(\frac{1}{\lambda}-1 \right)$, so just get Wilks' Lambda from the \texttt{mtest} statement of \texttt{proc reg}. The $p$-value will be correct. 
\pause
\item In a regression model with \emph{no explanatory variables}, and a vector of differences $\mathbf{d}_i$ for $i = 1, \ldots, n$, $E(\mathbf{d}_i) = \boldsymbol{\beta}_0$, so test $H_0: \boldsymbol{\beta}_0=\mathbf{0}$.
\pause
\begin{verbatim}
proc reg;
     model D1 D2 D3 = ;
     Wine: mtest intercept=0;
\end{verbatim}
\pause
\item Or just use the test for Wilks' lambda directly.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Designs with both between and within cases factors}
\pause
\begin{itemize}
\item Could have main effects and interactions for between-cases factors,
\pause
\item Could have main effects and interactions for within-cases factors,
\pause
\item Could have interactions of between by within.
\pause
\item Again, observations from the same case are treated as multivariate.
\pause
\item Again we form linear combinations of response variables and test hypotheses about them.
\pause
\item \textbf{Recipe}: \emph{Use a regression model with effect coding dummy variables for the between-cases factors} (if any). \pause Use these same explanatory variables in every model.
\pause
\item Response variables (linear combinations) will vary depending on the effect being tested.
\pause
\item Null hypotheses for all the main effects and interactions are statements about the $\beta$ values.
\end{itemize}
\end{frame}

%set up effect coding dummy variables for the between-cases factors (if any), and calculate response variables that are linear combinations of the variables that are recorded for each case. You can then obtain tests for all the main effects and interactions by testing null hypotheses about the values in the regression model. Sometimes the model has more than one response variable (linear combination). In this case it really is multivariate, and the second subscript on the refers to the response variable.
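The recipe's first step --- computing one linear combination per case for each effect --- can be sketched numerically. This is an illustrative computation only (the data values are hypothetical, not from the slides), using the archery layout where columns \texttt{y1}--\texttt{y6} are the six bow-by-arrow conditions:

```python
import numpy as np

# Hypothetical responses: rows are cases (archers); columns y1..y6 are the
# bow-by-arrow conditions in the order (1,1),(1,2),(1,3),(2,1),(2,2),(2,3).
y = np.array([[3.1, 2.9, 3.5, 2.8, 3.0, 3.3],
              [2.7, 2.5, 3.0, 2.6, 2.4, 2.9],
              [3.4, 3.2, 3.8, 3.1, 3.3, 3.6]])

# Between-cases effects: average over all within-cases conditions.
L_between = y.mean(axis=1)

# Bow main effect: bow 1 cells minus bow 2 cells; test H0: E(L_bow) = 0.
L_bow = y[:, :3].sum(axis=1) - y[:, 3:].sum(axis=1)

# Arrow main effect: two contrasts on the arrow marginals, to be tested
# simultaneously (e.g., with Hotelling's T-squared).
L_arrow1 = (y[:, 0] + y[:, 3]) - (y[:, 1] + y[:, 4])  # arrow 1 vs. 2
L_arrow2 = (y[:, 1] + y[:, 4]) - (y[:, 2] + y[:, 5])  # arrow 2 vs. 3

print(L_between.shape, L_bow.shape)  # one value per case for each combination
```

Each resulting variable (or set of variables) then goes into a regression on the effect-coded between-cases dummy variables, exactly as in the SAS examples.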
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Main effects and interactions for the between-cases factors} \pause %\framesubtitle{} \begin{itemize} \item These are marginals, averaging $\mu$ parameters over the within-cases factors. \pause \item Let $L_i=$ the mean (or sum) of the $y_{i,j}$ values averaging or adding over $j$. \pause \item Do a standard between-cases analysis with $L_i$ as the response variable. \end{itemize} \end{frame} \begin{frame} \frametitle{Main effects and interactions for the within-cases factors} \pause %\framesubtitle{} \begin{itemize} \item Need to average $\mu$ parameters over the between-cases factors. \pause \item Effect coding! $\beta_0$ is the grand mean. \pause \item Form linear combinations as in the archery example. \pause \item Test $H_0: \beta_0=0$. \pause \item Or test multiple $\beta_{0,j}=0$ if need be. \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Interactions of between by within} \pause %\framesubtitle{} \begin{itemize} \item The nature of a within-cases effect \emph{depends} on a between-cases treatment combination. \pause \item Take the linear combinations for the within-cases effect. \pause \item Test the between-cases effect on those. \pause \item For example, factors are Bow Type, Arrow Type and Gender. \item Want to test the Arrow Type by Gender interaction. \pause \item Are the differences between arrow types (averaging over bow types) different for men and women? \pause \item Simultaneously test for gender differences in the two linear combinations representing arrow type one versus two and two versus three. \pause \item It's a standard multivariate test. 
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[fragile]
\frametitle{You could use \texttt{proc reg}}
\framesubtitle{To test the arrow type by gender interaction}
{\scriptsize
\begin{center}
\begin{tabular}{|c|c|c|c|} \hline
 & \multicolumn{3}{c|}{\textbf{Arrow type}} \\ \hline
\textbf{Bow type} & $1$ & $2$ & $3$ \\ \hline
1 & $E(y_{i,1})=\mu_{11}$ & $E(y_{i,2})=\mu_{12}$ & $E(y_{i,3})=\mu_{13}$ \\ \hline
2 & $E(y_{i,4})=\mu_{21}$ & $E(y_{i,5})=\mu_{22}$ & $E(y_{i,6})=\mu_{23}$ \\ \hline
\end{tabular}
\end{center}
\pause
} % End size
{\footnotesize % or scriptsize
\begin{verbatim}
L1 = y1+y4 - (y2+y5);
L2 = y2+y5 - (y3+y6);
\end{verbatim}
\pause
\begin{verbatim}
proc reg;
     model L1 L2 = gender;
     arrow_by_sex: mtest gender=0;
\end{verbatim}
\pause
} % End size
Or you can let \texttt{proc glm} do the dummy variables and linear combinations for you.
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{If within-cases factors have just two levels}
\framesubtitle{Like before and after, experimental vs. control}
\pause
\begin{itemize}
\item You can always do it with a univariate analysis.
\item No fancy software is needed.
% \item All three approaches to repeated measures yield the same $F$ statistics.
\pause
\item Make a sum variable and a difference variable.
\pause
\item Salmon study: Fish are Canadian or Alaskan\pause, Female or Male\pause, Growth is measured in freshwater \emph{and} marine environments.
\pause
\item Three factors: Country by sex by environment -- environment is within cases.
\item Response variable is growth.
\end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[fragile] \frametitle{Salmon example} \framesubtitle{SAS code not tested} % {\footnotesize % or scriptsize \begin{verbatim} sumgrowth = freshgrowth + marinegrowth; difgrowth = freshgrowth - marinegrowth; \end{verbatim} \pause Assume effect coding for country and sex, product term \texttt{cs}. \pause \begin{verbatim} proc reg; title2 'Between-cases effects'; model sumgrowth = country sex cs; proc reg; title2 'Within and between-within'; model difgrowth = country sex cs; \end{verbatim} What do the $t$-tests give you? % } % End size \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Advantages of the multivariate approach} %\framesubtitle{} \begin{itemize} \item Straightforward application of classical multivariate methods. \pause \item No restrictions on the correlations of observations coming from the same case. Could have some positive, some negative, some zero. Let the data speak. \pause % Another way to say it is no assumptions about how the observations from a case came to be correlated. \item Can be extended to repeated measures on \emph{vectors} of observations (called doubly multivariate repeated measures). \end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Disadvantages of the multivariate approach} %\framesubtitle{} \begin{itemize} \item Lots of unknown variance and covariance parameters. Small sample size and lots of experimental conditions will not work. Longitudinal at lots of time points is out. \pause \item No way to model the dependence between observations. \pause \item No time-varying covariates. Explanatory variable values must be the same for all $y$ from a given case. 
\end{itemize} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame} \frametitle{Copyright Information} This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/brunner/oldclass/441s24} {\small\texttt{http://www.utstat.toronto.edu/brunner/oldclass/441s24}} \end{frame} \end{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%