% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout mode to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
% To create handout using article mode: Comment above and uncomment below (2 places)
%\documentclass[12pt]{article}
%\usepackage{beamerarticle}
%\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=red]{hyperref} % For live Web links with href in article mode
%\usepackage{amsmath} % For \binom{n}{y}
%\usepackage{graphicx} % To include pdf files!
%\usepackage{fullpage}
\usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols
% \usetheme{Berlin} % Displays sections on top
\usetheme{Frankfurt} % Displays section titles on top: Fairly thin but still swallows some material at bottom of crowded slides
%\usetheme{Berkeley}
\usepackage{graphpap} % For graph paper in picture
\usepackage[english]{babel}
\usepackage{amsmath} % for binom
% \usepackage{graphicx} % To include pdf files!
% \definecolor{links}{HTML}{2A1B81}
% \definecolor{links}{red}
\setbeamertemplate{footline}[frame number]
\mode<presentation>
% \mode<presentation>{\setbeamercolor{background canvas}{bg=black!5}} % Comment this out for handout

\title{Permutation and Randomization Tests\footnote{See last slide for copyright information.}}
\subtitle{STA442/2101 Fall 2014}
\date{} % To suppress date

% Trying to shift big equations a bit to the left
%\setbeamersize{text margin left = 0.5cm}

\begin{document}

\begin{frame}
\titlepage
\end{frame}

%\begin{frame}
%\frametitle{Background Reading}
%\framesubtitle{It may be a little bit helpful.}
% \begin{itemize}
% \item Davison's \emph{Statistical models}: Section
% \end{itemize}
%\end{frame}

\begin{frame}
\frametitle{Overview}
\tableofcontents
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Permutation Tests}

\begin{frame}
\frametitle{The lady and the tea}
\framesubtitle{From Fisher's \emph{The design of experiments}, first published in 1935}
Once upon a time, there was a British lady who claimed that she could tell from the taste which had been poured into the cup first, the tea or the milk. So Fisher designed an experiment to test her claim.
\pause
\begin{itemize}
\item Eight cups of tea were prepared.
\pause
\item In four, the tea was poured first.
\pause
\item In the other four, the milk was poured first.
\pause
\item Other features of the cups of tea (size, temperature, etc.) were held constant.
\pause
\item Cups were presented in a random order (critical).
\pause
\item The lady tasted each cup and judged which had been poured first.
\pause
\item She knew there were four of each type.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{The null hypothesis}
%\framesubtitle{}
\begin{itemize}
\item The null hypothesis is that the lady has no ability to taste the difference.
\pause
\item If so, all possible ways of lining up the lady's judgements and the truth about the tea cups should be equally likely.
\pause
\item Equally likely \emph{because of the random order of presentation}.
\pause
\item The test statistic $T$ is the number of correct judgements among the four cups where the tea was poured first. Because she calls exactly four cups ``tea first,'' the total number of correct judgements is $2T$.
\pause
\item What is the distribution of the test statistic under the null hypothesis?
\end{itemize}
\end{frame}

\begin{frame}[fragile]
\frametitle{Data file}
%\framesubtitle{}
{\footnotesize % or scriptsize
\begin{verbatim}
  Truth Judgement
1   tea      milk
2  milk       tea
3  milk      milk
4  milk      milk
5   tea       tea
6   tea       tea
7   tea      milk
8  milk       tea
\end{verbatim}
\pause
} % End size
{\small
\begin{itemize}
\item Under $H_0$, the reasons for the lady's judgements are unknown, except that they have nothing to do with the truth.
\pause
\item The judgements are what they are; they are fixed.
\pause
\item Because of randomization, all $8! = 40{,}320$ permutations of the cups are equally likely, and each one has its own number of correct judgements.
\pause
\item But there are lots of repeats: re-ordering the four tea-first cups among themselves, or the four milk-first cups among themselves, changes nothing. That leaves $40{,}320/(4!\,4!) = 70$ distinct possibilities.
\end{itemize}
} % End size
\end{frame}

\begin{frame}
\frametitle{Counting argument}
%\framesubtitle{}
\begin{itemize}
\item How many ways are there to choose 4 cups to put the tea in first? $\binom{8}{4}=70$
\pause
\item All are equally likely.
\pause
\item Only one lines up perfectly with the lady's judgements.
\pause
\item The probability of this under $H_0$ is $\frac{1}{70} \approx 0.0143 < 0.05$.
\pause
\item So $H_0$ would be rejected at $\alpha = 0.05$ if she judged every cup correctly.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{The permutation distribution}
\framesubtitle{In general}
\pause
\begin{itemize}
\item Decide on a test statistic $T$.
\pause
\item List the possible values of $T$.
\pause
\item Under $H_0$, all ways of re-arranging the data are equally likely.
\pause
\item $P(T=t)$ is proportional to the number of ways of getting the value $t$.
\pause
\end{itemize}
The permutation $p$-value is the probability of getting a value of $T$ as extreme as or more extreme than the value we observed, where ``extreme'' means in a direction inconsistent with $H_0$.
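\end{frame}

\begin{frame}[fragile]
\frametitle{Listing a permutation distribution by brute force}
\framesubtitle{A sketch for the tea-tasting data}
A minimal R sketch, not part of the original course code (the object names are hypothetical): hold the judgements from the data file fixed, and list all $\binom{8}{4} = 70$ ways the tea-first cups could have been chosen.
{\scriptsize
\begin{verbatim}
# Lady's judgements for cups 1-8, from the data file (fixed under H0)
judge <- c("milk","tea","milk","milk","tea","tea","milk","tea")
said.tea <- which(judge == "tea")      # cups she called "tea first"
tea.cups <- c(1, 5, 6, 7)              # cups where tea really was first
# T = number of tea-first cups she called "tea"
Tstat <- function(cups) sum(cups %in% said.tea)
t.obs <- Tstat(tea.cups)
# All choose(8,4) = 70 possible sets of tea-first cups, one per column
arrangements <- combn(8, 4)
perm.dist <- apply(arrangements, 2, Tstat)
table(perm.dist) / ncol(arrangements)  # P(T = t) for t = 0,...,4
p.value <- mean(perm.dist >= t.obs)    # upper-tail permutation p-value
\end{verbatim}
} % End size
The counts behind the table are $1, 16, 36, 16, 1$ out of $70$, agreeing with $\binom{4}{t}\binom{4}{4-t}/\binom{8}{4}$. For these data $T = 2$, so the upper-tail $p$-value is $53/70 \approx 0.757$.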
\end{frame}

\begin{frame}
\frametitle{Permutation distribution is hypergeometric}
\framesubtitle{For the tea-tasting experiment}
\pause
{\LARGE
\begin{displaymath}
P(T=t) = \frac{\binom{4}{t}\binom{4}{4-t}}{\binom{8}{4}}
\end{displaymath}
}
\pause
\vspace{5mm}
Of the four cups where the tea was poured first, select the $t$ she correctly calls ``tea''; of the four cups where the milk was poured first, select the $4-t$ she incorrectly calls ``tea.''
\end{frame}

\begin{frame}[fragile]
\frametitle{$P(T=t) = \frac{\binom{4}{t}\binom{4}{4-t}}{\binom{8}{4}}$}
%\framesubtitle{}
\pause
{\footnotesize % or scriptsize
\begin{verbatim}
> p = function(t)
+     {p = choose(4,t)*choose(4,4-t)/choose(8,4)
+      p}
>
> cbind(0:4,p(0:4))
     [,1]       [,2]
[1,]    0 0.01428571
[2,]    1 0.22857143
[3,]    2 0.51428571
[4,]    3 0.22857143
[5,]    4 0.01428571
\end{verbatim}
} % End size
\pause
If she tasted 12 cups (six of each), it would be possible to reject $H_0$ without requiring perfect judgement: $P(T \geq 5) = 37/924 \approx 0.040$. With 10 cups (five of each), a single mix-up already gives $P(T \geq 4) = 26/252 \approx 0.103$.
\end{frame}

\begin{frame}
\frametitle{Fisher's exact test}
\pause
%\framesubtitle{}
\begin{itemize}
\item Again, this tests for association between two binary variables.
\pause
\item This time, there is no requirement of a 50-50 split.
\pause
\item $p$-values are still exact probabilities based on the hypergeometric distribution.
\pause
\item Large samples are not required.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Permutation tests are not just for categorical data}
\framesubtitle{Another example from Fisher's \emph{The design of experiments}}
\pause
Darwin's experiment on self-fertilized versus cross-fertilized corn plants:
\begin{itemize}
\item Plants are grown in 15 pairs, one cross-fertilized and one self-fertilized.
\pause
\item Response variable is height.
\pause
\item Calculate the differences within pairs.
\pause
\item Do a matched $t$-test, or \ldots
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{A randomization test for matched pairs}
{\footnotesize
\begin{itemize}
\item Fisher wishes the self-fertilized plant in each pair had been randomly assigned to be on either the left or the right. Otherwise he loves the experiment.
\pause
\item Under the null hypothesis that self-fertilization versus cross-fertilization does not matter at all, only chance determined whether the height of plant $A$ in a pair was subtracted from that of plant $B$, or $B$ from $A$.
\pause
\item So the absolute value of each difference is what it is, but its plus or minus sign is determined by chance alone (under $H_0$).
\pause
\item The test statistic is the sum of the differences.
\pause
\item There are $2^{15}=32{,}768$ ways to assign the plus and minus signs, all equally likely under $H_0$.
\pause
\item Calculate the sum of differences for each one, yielding the permutation distribution of the test statistic under $H_0$.
\pause
\item The $p$-value is the proportion of these sums that equal or exceed in absolute value the sum of differences Darwin observed: $D=314$.
\pause
\item Fisher's answer is $p = 0.05267$, compared to $p = 0.0497$ from a $t$-test.
\pause
\item He used his brain as well as a lot of tedious calculation.
\end{itemize}
} % End size
\end{frame}

\begin{frame}
\frametitle{Some big advantages of the permutation test idea}
%\framesubtitle{}
\begin{itemize}
\item The test is distribution-free under $H_0$.
\pause
\item Some non-parametric methods depend on large sample sizes for their validity. Permutation tests do not. Even for tiny samples, the chance of false significance cannot exceed $\alpha = 0.05$.
\pause
\item $p$-values are exact, not asymptotic.
\pause
\item There is no pretense of random sampling from some imaginary population.
\pause
\item All the probability comes from random assignment.
\pause
\item The idea extends easily to tests comparing several independent treatments.
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{More comments}
%\framesubtitle{}
{\small
\begin{itemize}
\item For observational studies too, the null hypothesis is that the explanatory variable(s) and response variable(s) are independent.
\pause
\item It's even better than that.
Bell and Doksum (1967) proved that \emph{any} valid distribution-free test of independence \emph{must} be a permutation test (possibly a permutation test in disguise).
\pause
\item It doesn't matter whether the data are categorical or quantitative. Scrambling one variable relative to the other destroys any possible relationship between the explanatory and response variables.
\pause
\item If either the explanatory or the response variable is multivariate, scramble \emph{vectors} of data.
\pause
\item What is ``the'' test statistic? In fact, the test statistic is up to you. No matter what you choose, the chance of wrongly rejecting $H_0$ is limited to $\alpha$.
\pause
\item But some choices are better than others in terms of power, depending on \emph{how} $H_0$ is false.
\end{itemize}
} % End size
\end{frame}

\begin{frame}
\frametitle{To summarize}
A permutation test is conducted by following these three steps.
\pause
\begin{enumerate}
\item Compute the test statistic from the original observations.
\pause
\item Re-arrange the observations in all possible orders, computing the test statistic each time.
\pause
\item Calculate the permutation test $p$-value, which is the proportion of test statistic values from the re-arranged data that equal or exceed the value of the test statistic from the original data.
\end{enumerate}
\end{frame}

\begin{frame}
\frametitle{Fisher said}
\framesubtitle{\emph{Statistical methods for research workers}, 1936}
{\LARGE
\begin{quote}
Actually, the statistician does not carry out this very tedious process but his conclusions have no justification beyond the fact they could have been arrived at by this very elementary method.
\end{quote}
} % End size
\end{frame}

\begin{frame}
\frametitle{Main drawback is that it's hard to compute}
\pause
%\framesubtitle{}
\begin{itemize}
\item Fisher considered permutation tests to be mostly hypothetical, but that was before computers.
\pause
\item Even with computers, listing all the permutations can be out of the question: two groups of 20 already give $\binom{40}{20} \approx 1.4 \times 10^{11}$ ways to split the observations. Combinatorial simplification may be challenging.
\pause
\item One way around the computational problem is to convert the data to ranks, and then carry out the permutation test on the ranks.
\pause
\item Then (ignoring ties) the permutation distributions depend only on the sample sizes, so they can be worked out in advance.
\pause
\item All the common non-parametric rank tests are permutation tests carried out on ranks.
\end{itemize}
\end{frame}

\section{Randomization Tests}

\begin{frame}
\frametitle{Randomization tests: A modern solution}
\pause
\framesubtitle{}
\begin{itemize}
\item Scramble the values of the response variable in a random order.
\pause
\item Compute the test statistic for the randomly shuffled data.
\pause
\item In this way, we have randomly sampled a value of the test statistic from its permutation distribution.
\pause
\item Carry out the procedure a large number of times.
\pause
\item By the Law of Large Numbers, the permutation $p$-value is approximated by the proportion of randomly generated values that equal or exceed the observed value of the test statistic.
\pause
\item This proportion is the $p$-value of the randomization test.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Copyright Information}

This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner},
Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely.
The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf14}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf14}}

\end{frame}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\LARGE
\begin{displaymath}

\end{displaymath}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Darwin's Maize data from Davison p. 2

   Pot Crossed Self Diff
1    1     188  139   49
2    1      96  163  -67
3    1     168  160    8
4    2     176  160   16
5    2     153  147    6
6    2     172  149   23
7    3     177  149   28
8    3     163  122   41
9    3     146  132   14
10   3     173  144   29
11   3     186  130   56
12   4     168  144   24
13   4     177  102   75
14   4     184  124   60
15   4      96  144  -48

# On Titanic,
maize <- read.table("maize.data")
attach(maize)
sum(Diff)
rbinom(15,1,.5)
mult = 2*rbinom(15,1,.5) - 1

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
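
A sketch (an addition, not part of the original notes) of where the scratch code above seems to be heading: an exact sign-flip permutation test and a randomization test for Darwin's differences. The differences are typed in from the table above instead of being read from maize.data; the seed and B = 10000 are arbitrary choices.

# Darwin's paired differences (Crossed - Self), from the table above
Diff <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
n <- length(Diff)
D <- sum(Diff)                                 # observed test statistic
# Exact permutation test: every one of the 2^15 sign patterns, one per row
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
sums <- as.vector(signs %*% abs(Diff))
p.exact <- mean(abs(sums) >= abs(D))           # two-sided, as on the slides
# Randomization test: B random sign patterns, built like mult above
set.seed(9999)                                 # arbitrary seed
B <- 10000
rsums <- replicate(B, sum((2 * rbinom(n, 1, 0.5) - 1) * abs(Diff)))
p.random <- mean(abs(rsums) >= abs(D))
# Matched t-test, for comparison
p.t <- t.test(Diff)$p.value
c(exact = p.exact, randomization = p.random, t = p.t)

p.exact should reproduce Fisher's 0.05267 and p.t the 0.0497 quoted on the slides; p.random will wander a little around p.exact from one seed to another. Full enumeration is cheap here, but the randomization version is the one that scales to larger samples.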