% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
% To create handout using article mode: Comment above and uncomment below (2 places)
%\documentclass[12pt]{article}
%\usepackage{beamerarticle}
%\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=red]{hyperref} % For live Web links with href in article mode
%\usepackage{amsmath} % For \binom{n}{y}
%\usepackage{graphicx} % To include pdf files!
%\usepackage{fullpage}
\usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols
% \usetheme{Berlin} % Displays sections on top
\usetheme{Frankfurt} % Displays section titles on top: Fairly thin but still swallows some material at bottom of crowded slides
%\usetheme{Berkeley}
\usepackage{graphpap} % For graph paper in picture
\usepackage[english]{babel}
\usepackage{amsmath} % for binom
% \usepackage{graphicx} % To include pdf files!
% \definecolor{links}{HTML}{2A1B81}
% \definecolor{links}{red}
\setbeamertemplate{footline}[frame number]
\mode<presentation>
% \mode<presentation>{\setbeamercolor{background canvas}{bg=black!5}} % Comment this out for handout

\title{Nested and Random Effects Models\footnote{See last slide for copyright information.}}
\subtitle{STA441 Winter 2016}
\date{} % To suppress date

% Trying to shift big equations a bit to the left
\setbeamersize{text margin left = 0.5cm}

\begin{document}

\begin{frame}
\titlepage
\end{frame}

\begin{frame}
\frametitle{Overview}
\tableofcontents
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Nested Designs}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Nested Designs}
\framesubtitle{Example}
A chain of commercial business colleges is teaching a software certification course.
After 6 weeks of instruction, students take a certification exam and receive a score ranging from zero to 100. \pause
\begin{itemize}
\item The owners want to see whether performance is related to which school students attend, or which instructor they have -- or both. \pause
\item They compare two schools; one of the schools has three instructors teaching the course, and the other school has four instructors teaching the course. \pause
\item A teacher only works in one school. \pause
\item There are two categorical explanatory variables, school and teacher. \pause
\item But it's not a factorial design, because ``Teacher 1" does not mean the same thing in School 1 and School 2. \pause
\item It's a different person.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Teacher is nested within school}
\begin{picture}(100,100)(-100,0)
% \graphpaper(0,0)(200,100) % Need \usepackage{graphpap}
\put(-55,120){School One} % Co-ordinates of text lower left
\put(-30,110){\line(0,-1){100}} % Down from School One % Co-ordinates, direction, length
\put(-80,40){\line(1,0){100}} % Across under School One
\put(-80,40){\line(0,-1){30}} % Branch to Teacher 1
\put(20,40){\line(0,-1){30}} % Branch to Teacher 3
\put(-95,0){\tiny Teacher 1}
\put(-85,-10){\small $\mu_1$}
\put(-45,0){\tiny Teacher 2}
\put(-35,-10){\small $\mu_2$}
\put(5,0){\tiny Teacher 3}
\put(15,-10){\small $\mu_3$}
\put(120,120){School Two}
\put(150,110){\line(0,-1){70}} % Down from School Two
\put(75,40){\line(1,0){150}} % Across under School Two
\put(75,40){\line(0,-1){30}} % Branch to Teacher 1 School 2
\put(125,40){\line(0,-1){30}} % Branch to Teacher 2 School 2
\put(175,40){\line(0,-1){30}} % Branch to Teacher 3 School 2
\put(225,40){\line(0,-1){30}} % Branch to Teacher 4 School 2
\put(60,0){\tiny Teacher 1}
\put(70,-10){\small $\mu_4$}
\put(110,0){\tiny Teacher 2}
\put(120,-10){\small $\mu_5$}
\put(160,0){\tiny Teacher 3}
\put(170,-10){\small $\mu_6$}
\put(210,0){\tiny Teacher 4}
\put(220,-10){\small $\mu_7$}
\pause
\put(-95,-40){\small Schools $H_0: \frac{1}{3}(\mu_1+\mu_2+\mu_3) = \frac{1}{4}(\mu_4+\mu_5+\mu_6+\mu_7)$}
\pause
\put(-95,-60){\small Teachers within Schools $H_0: \mu_1=\mu_2=\mu_3$ and $\mu_4=\mu_5=\mu_6=\mu_7$}
\end{picture}
% Unbalanced design
% Teachers(Schools) pools main effect for teachers AND the interaction
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[fragile]
\frametitle{Tests of nested effects are tests of contrasts}
\pause
$H_0: \frac{1}{3}(\mu_1+\mu_2+\mu_3) = \frac{1}{4}(\mu_4+\mu_5+\mu_6+\mu_7)$
\pause

$H_0: \mu_1=\mu_2=\mu_3$ and $\mu_4=\mu_5=\mu_6=\mu_7$
\pause
\vspace{3mm}

You can specify the contrasts yourself, or you can take advantage of \texttt{proc glm}'s syntax for nested models. \pause
\begin{verbatim}
proc glm;
     class school teacher;
     model score = school teacher(school);
\end{verbatim}
The notation \texttt{teacher(school)} should be read ``teacher within school."
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Easy to extend the ideas}
%\framesubtitle{}
\begin{itemize}
\item Can have more than one level of nesting. You could have climate zones, lakes within climate zones, fishing boats within lakes, \ldots \pause
\item There is no problem with combining nested and factorial structures. \pause You just have to keep track of what's nested within what. \pause
\item Factors that are not nested are sometimes called ``crossed." \pause
\item The combination of nesting and \emph{random effects} is very powerful.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Random Effects}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Random Effects}
\framesubtitle{As opposed to \emph{fixed effects}}
\pause
A random factor is one in which the \emph{values of the factor are a random sample} from a population of values. \pause
\begin{itemize}
\item Randomly select 10 schools, test students at each school. \pause School is a random factor with 10 values. \pause
\item Randomly select 15 naturopathic medicines for arthritis (there are quite a few), and then randomly assign arthritis patients to try them. \pause Drug is a random factor. \pause
\item Randomly select 15 lakes. In each lake, measure how clear the water is at 20 randomly chosen points. \pause Lake is a random factor. \pause
\item Randomly select 20 fast food outlets, survey customers in each about quality of the fries. \pause Outlet is a random factor with 20 values. \pause Amount of salt would be a fixed factor, which could be crossed with outlet.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{One random factor}
\framesubtitle{A nice simple example}
\pause
\begin{itemize}
\item Randomly select 5 farms. \pause
\item Randomly select 10 cows from each farm, milk them, and record the amount of milk from each one. \pause
\item The one random factor is Farm. \pause
\item Total $n=50$. \pause
\item The idea is that ``Farm" is a kind of random shock that pushes all the amounts of milk in a particular farm up or down by the same amount. \pause
\item You could also think of cow (the cases are cows) as a random factor nested within farm.
\pause
\end{itemize}
{\LARGE
\begin{displaymath}
Y_{ij} = \mu + \tau_i + \epsilon_{ij}
\end{displaymath}
} % End size
\end{frame}

\begin{frame}
\frametitle{Analysis of variance}
%\framesubtitle{}
{\LARGE
$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ \pause
} % End size
\vspace{5mm}
\begin{eqnarray*}
Var(Y_{ij}) & = & Var(\mu + \tau_i + \epsilon_{ij}) \\ \pause
& = & Var(\tau_i) + Var(\epsilon_{ij}) \\ \pause
& = & \sigma^2_\tau + \sigma^2 \pause
\end{eqnarray*}
\begin{itemize}
\item Split the variance up into two parts: \pause The part that comes from farms, \pause and the part that comes from cows (within farms). \pause
\item \emph{Analysis} of variance. \pause
\item Test $H_0: \sigma^2_\tau=0$ \pause
\item Estimate $\frac{\sigma^2_\tau}{\sigma^2_\tau+\sigma^2}$
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Nesting and random effects}
%\framesubtitle{}
\begin{itemize}
\item Nested models are often viewed as random effects models, but there is no necessary connection between the two concepts. \pause
\item It depends on how the study was conducted. \pause Were the two schools randomly selected from some population of schools, or did someone just pick those two (maybe because there are just two schools)? \pause
\item Random effects, like fixed effects, can either be nested or not; it depends on the logic of the design.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Sub-sampling}
\pause
%\framesubtitle{}
\begin{itemize}
\item Sub-sampling is a useful case of nested and purely random effects. \pause
\item For example,
    \begin{itemize}
    \item Select a random sample of towns. \pause
    \item From each town, select a random sample of households. \pause
    \item From each household, select a random sample of people \pause to test, or measure, or question. \pause
    \end{itemize}
\item Start with the grand mean $\mu$.
\pause
\item Town, household and person are random shocks that push the measurement up or down from the grand mean. \pause
\item In the model, shocks are independent normal with mean zero. \pause
\item Analysis of variance.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Sub-sampling with SAS}
\framesubtitle{Waste water treatment}
\pause
\begin{itemize}
\item We are studying the porosity of ``flocks," nasty little pieces of something floating in the tanks. \pause
\item We randomly select a sample of flocks, and then cut each one up into very thin slices. \pause
\item We then randomly select a sample of slices (called ``sections") from each flock, look at each section under a microscope, and assign it a number representing how porous it is \pause (how much empty space there is in a designated region of the section). \pause
\item The explanatory variables are flock and section. \pause
\item The research question is whether section is explaining a significant amount of the variance in porosity \pause -- because if not, we can use just one section per flock, and save considerable time and expense.
\end{itemize}
\end{frame}

\begin{frame}[fragile]
\frametitle{SAS \texttt{proc nested}}
%\framesubtitle{}
SAS \texttt{proc nested} is built specifically for pure random effects models with each explanatory variable nested within all the preceding ones. \pause
\begin{verbatim}
proc sort; by flock section; /* Data must be sorted */
proc nested;
     class flock section;
     var por;
\end{verbatim}
\pause
You could use \texttt{proc glm}, but the \texttt{proc nested} syntax is easier and the output is nicer for this special case.
\end{frame}

\begin{frame}
\frametitle{Mixed models}
\framesubtitle{The classical approach}
\pause
\begin{itemize}
\item There can be both fixed and random factors in the same experiment. \pause This makes it a \emph{mixed} model. \pause
\item Factors can be nested or crossed, in various patterns. \pause
\item Random factors can be nested within fixed.
\pause
\item Fixed effects cannot be nested within random. \pause
\item The interaction of any random factor with another factor (whether fixed or random) is random. \pause
\item $F$-tests are often possible, \pause but they don't always use Mean Squared Error in the denominator of the $F$ statistic. \pause
\item Often, it's the Mean Square for some interaction term. \pause
\item The choice of what error term to use is relatively mechanical for balanced models \pause --- based on expected mean squares. \pause
\item Mechanical means SAS can do it for you.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{One more example}
\framesubtitle{And some sample questions}
Independent random samples of 10 Canadian and 10 U.S. large companies were selected. In each company, 25 female and 25 male managers were randomly selected, and their formal education in years was recorded. \pause
\begin{enumerate}
\item Is this an observational study, or experimental? \pause {\tiny \color{red} Observational.} \pause
\item What are the factors? \pause {\tiny \color{red} Nation, Company and Sex.} \pause
\item Designate the factors as fixed or random. \pause {\tiny \color{red} Nation and Sex are fixed. \pause Company is random.} \pause
\item Describe the nesting, if any. \pause {\tiny \color{red} Company is nested within Nation.}
\end{enumerate}
\end{frame}
% \pause {\tiny \color{red} } \pause

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Copyright Information}
This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely.
The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/441s16}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/441s16}}
\end{frame}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{}
%\framesubtitle{}
\begin{itemize}
\item
\item
\item
\end{itemize}
\end{frame}

{\LARGE
\begin{displaymath}
\end{displaymath}
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% I cut out a bit of detail that could be useful some day.

\begin{frame}
\frametitle{Another Example}
\begin{itemize}
\item Randomly sample school boards in Ontario. \pause
\item Within school boards, sample K-8 schools. \pause
\item Within schools, sample Grade 3 classrooms. \pause
\item Within classrooms (teachers), sample students. \pause
\item Measure something.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Random shocks}
\framesubtitle{Modelled as independent normal random variables with mean zero.}
\begin{itemize}
\item Start with the grand mean, a constant.
\item Randomly sample $I$ school boards.
    \begin{itemize}
    \item[] School board is a random ``shock" that pushes the scores of all students in the board up or down by the same amount.
    \end{itemize}
\item Within each school board, randomly sample $J$ grade schools.
    \begin{itemize}
    \item[] School is a random shock that pushes the scores of all students in the school up or down by the same amount.
    \end{itemize}
\item Within each school, randomly sample $K$ Grade 3 classrooms.
    \begin{itemize}
    \item[] Classroom is a random shock that pushes the scores of all students in the class up or down by the same amount.
    \end{itemize}
\item Within each classroom, randomly sample $L$ students.
    \begin{itemize}
    \item[] Individual student is a random shock that pushes the score up or down one more time.
    \end{itemize}
\end{itemize}
\begin{center}
%$Y_{ijk\ell} = \mu + \tau_i + \tau_{j(i)} + \tau_{k(ij)} + \epsilon_{\ell(ijk)}$
$Y_{ijk\ell} = \mu + \tau_i + \tau_{ij} + \tau_{ijk} + \epsilon_{ijk\ell}$
\end{center}
\end{frame}

\begin{frame}
\frametitle{Analysis of variance}
\framesubtitle{$Y_{ijk\ell} = \mu + \tau_i + \tau_{ij} + \tau_{ijk} + \epsilon_{ijk\ell}$}
\begin{eqnarray*}
Y_{ijk\ell} & = & \mu + \tau_i + \tau_{ij} + \tau_{ijk} + \epsilon_{ijk\ell} \\
&&\\
Var(Y_{ijk\ell}) & = & Var(\mu + \tau_i + \tau_{ij} + \tau_{ijk} + \epsilon_{ijk\ell}) \\
& = & Var(\tau_i) + Var(\tau_{ij}) + Var(\tau_{ijk}) + Var(\epsilon_{ijk\ell}) \\
& = & \sigma^2_1 +\sigma^2_2 +\sigma^2_3 + \sigma^2
\end{eqnarray*}
\begin{itemize}
\item \emph{Analysis} of variance.
\item What proportion of the variance in student performance comes from the teacher?
\item[] $\frac{\sigma^2_3}{\sigma^2_1 +\sigma^2_2 +\sigma^2_3 + \sigma^2}$
\end{itemize}
\end{frame}

% This last one is too complicated for me, so I cut it out.
\begin{frame}
\frametitle{One more example}
\framesubtitle{A sample question}
Independent random samples of 10 Canadian and 10 U.S. universities were selected. For each university, 10 female and 10 male professors in the teaching stream were randomly selected. Two courses were randomly selected for each professor. Ten female and 10 male students were randomly selected from each course, and asked to rate the quality of the professor's teaching. \pause
\begin{enumerate}
\item Is this an observational study, or experimental? \pause {\tiny \color{red} Observational.} \pause
\item What are the factors? \pause {\tiny \color{red} Nation, Sex of professor, Sex of student, University, Professor and Course.} \pause
\item Designate the factors as fixed or random. \pause {\tiny \color{red} Nation, Sex of professor and Sex of student are fixed. \pause University, Professor and Course are random.} \pause
\item Describe the nesting and crossing.
    \begin{itemize} {\tiny \color{red}
    \item University is nested within Nation.
    \item Professor is nested within the combination of University and Sex of professor.
    \item Course is nested within Professor.
    \item Nation, Sex of professor and Sex of student are crossed.
    } % End tiny red
    \end{itemize}
\end{enumerate}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%