% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
\usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols
\usetheme{AnnArbor} % CambridgeUS
% \usetheme{Frankfurt} % Displays section titles on top: Fairly thin but still swallows some material at bottom of crowded slides
% \usetheme{Berlin} % Displays sections on top
% \usetheme{Berkeley}
\usepackage[english]{babel}
\usepackage{amsmath} % for binom
% \usepackage{graphicx} % To include pdf files!
% \definecolor{links}{HTML}{2A1B81}
% \definecolor{links}{red}
\setbeamertemplate{footline}[frame number]
\mode<presentation>

\title{Permutation and Randomization Tests\footnote{
This slide show is an open-source document. See last slide for copyright information.}}
\date{} % To suppress date

\begin{document}

\begin{frame}
\titlepage
\end{frame}

\begin{frame}
\frametitle{Overview}
\tableofcontents
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{The lady and the tea}
\framesubtitle{From Fisher's \emph{The design of experiments}, first published in 1935}

Once upon a time, there was a British lady who claimed that she could tell from the taste which had been poured into the cup first, the tea or the milk. So Fisher designed an experiment to test it. \pause
\begin{itemize}
\item Eight cups of tea were prepared. \pause
\item In four, the tea was poured first. \pause
\item In the other four, the milk was poured first. \pause
\item Other features of the cups of tea (size, temperature, etc.) were held constant. \pause
\item Cups were presented in a random order (critical). \pause
\item The lady tasted them, and judged. \pause
\item She knew there were four of each type.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{The null hypothesis}
%\framesubtitle{}
\begin{itemize}
\item The null hypothesis is that the lady has no ability to taste the difference. \pause
\item If so, all possible ways of lining up the lady's judgements and the truth about the tea cups should be equally likely. \pause
\item Equally likely \emph{because of the random order of presentation}. \pause
\item The test statistic is the number of correct judgements. \pause
\item What is the distribution of the test statistic under the null hypothesis?
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{Data file}
%\framesubtitle{}
{\footnotesize % or scriptsize
\begin{verbatim}
  Truth Judgement
1   tea      milk
2  milk       tea
3  milk      milk
4  milk      milk
5   tea       tea
6   tea       tea
7   tea      milk
8  milk       tea
\end{verbatim} \pause
} % End size
{\small
\begin{itemize}
\item Under $H_0$, the reasons for the lady's judgements are unknown, except that they have nothing to do with the truth. \pause
\item The judgements are what they are; they are fixed. \pause
\item Because of randomization, all $8! = 40,320$ permutations of the cups are equally likely, and each one has its own number of correct judgements. \pause
\item But there are lots of repeats.
\end{itemize}
} % End size
\end{frame}
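%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{An R sketch of the permutation distribution}
\framesubtitle{Not in Fisher's account --- a brute-force sketch, with the judgements fixed as in the data file}

{\footnotesize
\begin{verbatim}
# The 8! orders collapse to choose(8,4) = 70 equally likely ways
# to pick the four tea-first cups. Count correct judgements for each.
judgement = c("milk","tea","milk","milk","tea","tea","milk","tea")
correct = apply(combn(8,4), 2, function(teacups) {
    truth = rep("milk",8); truth[teacups] = "tea"
    sum(truth == judgement) })
table(correct) # A perfect score (8 correct) happens once in 70
\end{verbatim}
} % End size
\end{frame}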
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Counting argument}
%\framesubtitle{}
\begin{itemize}
\item How many ways are there to choose 4 cups to put the tea in first? $\binom{8}{4}=70$ \pause
\item All are equally likely. \pause
\item Only one lines up perfectly with the lady's judgements. \pause
\item The chances of this under $H_0$ are $\frac{1}{70} = 0.0143 < 0.05$. \pause
\item So $H_0$ would be rejected at $\alpha = 0.05$ if she guessed perfectly.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Fisher's exact test} \pause
%\framesubtitle{}
\begin{itemize}
\item Testing association of two binary variables. \pause
\item Unlike the tea-tasting example, there is no requirement of a 50-50 split. \pause
\item The marginal totals (numbers of A = Yes/No and B = Yes/No) are fixed. \pause
\item Subject to those restrictions, the count in one cell is free to vary ($df=1$). \pause
\item The number in the (Yes, Yes) cell is one-to-one with the odds ratio. \pause
\item If counts in the (Yes, Yes) cell were completely random subject to the restriction of row and column totals \pause (that's $H_0$)\pause, what's the probability of getting such a large (or small) odds ratio? \pause
\item $p$-values are exact probabilities based on the hypergeometric distribution. \pause
\item Large samples are not required.
\end{itemize}
\end{frame}
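%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{Fisher's exact test in R}
\framesubtitle{A sketch with made-up counts, using base R's \texttt{fisher.test}}

{\footnotesize
\begin{verbatim}
# Hypothetical 2x2 table of counts: rows are A = Yes/No,
# columns are B = Yes/No. The p-value is hypergeometric, not asymptotic.
counts = matrix(c(12,  3,
                   5, 14), nrow=2, byrow=TRUE)
fisher.test(counts) # Reports the p-value and an estimated odds ratio
\end{verbatim}
} % End size
\end{frame}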
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Permutation Tests}

\begin{frame}
\frametitle{The idea}
\framesubtitle{Thank you Mr. Fisher} \pause
\begin{itemize}
\item Experimental study, with random assignment of units to conditions. \pause
\item Under $H_0$, the treatment has no effect at all. \pause
\item The process producing values of $y$ is unspecified. \pause
\item Except that it has nothing to do with experimental condition. \pause
\item The particular values of the response variable are what they are. \pause
\item The only reason for differences among conditions is the random assignment.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{The permutation distribution} \pause
%\framesubtitle{}
\begin{itemize}
\item Pick a test statistic (more on that later). \pause
\item Under $H_0$, all the ways of distributing $y$ values into experimental conditions are equally likely. \pause
\item Each re-arrangement (permutation) of the $y$ values produces a value of the test statistic. \pause
\item Compute the test statistic for each re-arrangement. \pause
\item Relative frequencies are the \emph{permutation distribution} of the test statistic. \pause
\item[]
\item Make a histogram.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Another example from Fisher's \emph{Design of experiments}} \pause
%\framesubtitle{}

Darwin's experiment on self-fertilized versus cross-fertilized corn plants: \pause
\begin{itemize}
\item Plants are grown in 15 pairs, one cross and one self-fertilized. \pause
\item Response variable is height. \pause
\item Calculate differences. \pause
\item Do a matched $t$-test, or \ldots
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{A randomization test for matched pairs}
{\footnotesize
\begin{itemize}
\item Fisher wishes the self-fertilized plants had been randomly assigned to be on either the left or the right. Otherwise he loves the experiment. \pause
\item Under the null hypothesis that self-fertilized versus cross-fertilized does not matter at all, only chance determined whether $A$ was subtracted from $B$ or $B$ was subtracted from $A$. \pause
\item So the absolute value of the difference is what it is, but the plus or minus sign is by chance alone (under $H_0$). \pause
\item The test statistic is the sum of the differences. \pause
\item There are $2^{15}=32,768$ ways to swap the plus and minus signs, all equally likely under $H_0$. \pause
\item Calculate the sum of differences for each one, yielding a permutation distribution for the test statistic under $H_0$. \pause
\item The $p$-value is the proportion of these that equal or exceed in absolute value the sum of differences Darwin observed: $D=314$. \pause
\item Fisher's answer is $p = 0.05267$, compared to $p = 0.0497$ from a $t$-test. \pause
\item He used his brain as well as doing a lot of tedious calculation.
\end{itemize}
} % End size
\end{frame}
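%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{An exact sign-flip test in R}
\framesubtitle{A sketch; assumes \texttt{diff} holds Darwin's 15 differences, as in the code appendix}

{\footnotesize
\begin{verbatim}
# All 2^15 = 32,768 sign patterns, each equally likely under H0.
# expand.grid lists them exhaustively; no random sampling involved.
signs = as.matrix(expand.grid(rep(list(c(-1,1)), 15)))
sums = signs %*% abs(diff)  # Sum of differences for each pattern
mean(abs(sums) >= 314)      # Two-sided p-value; the slide reports 0.05267
\end{verbatim}
} % End size
\end{frame}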
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{}
\framesubtitle{}
\begin{center}
\includegraphics[width=3.5in]{Darwin}
\end{center}
{\footnotesize
%Observed sum of differences = 314 in eighths of an inch: $p = 0.05267$
} % End size
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Permutation $p$-value}
%\framesubtitle{}

The permutation test $p$-value is the proportion of values in the permutation distribution that equal or exceed the observed value of the test statistic from the un-scrambled data \pause --- in the direction(s) of the alternative hypothesis.
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Advantages of the permutation test idea} \pause
%\framesubtitle{}
\begin{itemize}
\item Simplicity. The distribution theory is elementary. \pause
\item The test is distribution-free under the null hypothesis. \pause There is no assumption of the normal or any other distribution. \pause
\item Some non-parametric methods depend on large sample sizes for their validity. Permutation tests do not. \pause Even for tiny samples, the chance of false significance cannot exceed 0.05. \pause
\item $p$-values are exact, not asymptotic. \pause
\item There is no pretense of random sampling from some imaginary population. \pause
\item All the probability comes from random assignment. \pause
\item Random assignment actually happens. Random sampling often does not.
% \item Inference is \emph{conditional} on the observed data values.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{More comments}
%\framesubtitle{}
%{\small
\begin{itemize}
\item Applies to observational studies too. \pause
\item The null hypothesis is that the explanatory variable(s) and response variable(s) are independent. \pause
\item It's even better than that. \pause Bell and Doksum (1967) proved that \emph{any} valid distribution-free test of independence \emph{must} be a permutation test (maybe a permutation test in disguise). \pause
\item It doesn't matter if data are categorical or quantitative. \pause By scrambling the data, any possible relationship between explanatory and response variables is destroyed. \pause
\item If either explanatory or response variable is multivariate, scramble \emph{vectors} of data.
\end{itemize}
%} % End size
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{What is ``the'' test statistic?} \pause
%\framesubtitle{}
\begin{itemize}
\item It's up to you. \pause
\item No matter what you choose, the chance of wrongly rejecting the null hypothesis cannot exceed $\alpha = 0.05$. \pause
\item One good choice is a descriptive statistic that accurately reflects the phenomenon as you understand it. \pause
\item Could that number (or greater) have been produced by random assignment or random sampling? \pause No doubt. \pause
\item The question is, how unlikely is this? \pause
\item The answer is given by the permutation $p$-value.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Choice of test statistic} \pause
%\framesubtitle{}
\begin{itemize}
\item We are testing a null hypothesis based on the value of the test statistic. \pause
\item The probability of wrongly rejecting $H_0$ (and making a false discovery) is limited to $\alpha = 0.05$. Good. \pause
\item Some test statistics are better than others, depending on \emph{how} $H_0$ is false: statistical power. \pause
\item See Good (1994), \emph{Permutation tests}. \pause
\item Traditional test statistics are a popular choice, and usually a good one. \pause
\item When the assumptions happen to be approximately satisfied, they are often nearly optimal.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{To summarize}

A permutation test is conducted by following these three steps. \pause
\begin{enumerate}
\item Compute some test statistic using the set of original observations. \pause
\item Re-arrange the observations in all possible orders, computing the test statistic each time. \pause Re-arrangement corresponds exactly to the details of random assignment. \pause
\item Calculate the permutation test $p$-value, which is the proportion of test statistic values from the re-arranged data that equal or exceed the value of the test statistic from the original data. \pause Or, locate the critical value(s) in the permutation distribution.
\end{enumerate}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Using the $p$-value from a traditional test} \pause
\framesubtitle{As the test statistic}
\begin{itemize}
\item The $p$-value from a traditional test is sometimes a more convenient test statistic than the original test statistic. \pause
\item The $p$-value is a one-to-one function of the test statistic, so the permutation $p$-value is the same. \pause
\item The permutation $p$-value is the proportion of $p$-values from the scrambled data that are less than or equal to the observed $p$-value. \pause
\item That's exactly the cdf of the permutation distribution of $p$-values. \pause
\item One-sided versus two-sided does not matter. \pause
\item Handy for multiple comparisons (more later).
\end{itemize}
\end{frame}
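%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{Sketch: the $t$-test $p$-value as test statistic}
\framesubtitle{Made-up data; a random sample of re-arrangements, anticipating the randomization tests section}

{\footnotesize
\begin{verbatim}
set.seed(101)
g = factor(rep(c("a","b"), each=6))  # Hypothetical two-group data
y = rnorm(12) + (g=="b")
pobs = t.test(y ~ g)$p.value
pperm = replicate(5000, t.test(sample(y) ~ g)$p.value)
mean(pperm <= pobs) # Proportion of scrambled p-values at or below pobs
\end{verbatim}
} % End size
\end{frame}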
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Fisher said}
\framesubtitle{\emph{Statistical methods for research workers}, 1936} \pause
{\LARGE
\begin{quote}
Actually, the statistician does not carry out this very tedious process but his conclusions have no justification beyond the fact they could have been arrived at by this very elementary method.
\end{quote} \pause
} % End size

\vspace{10mm}

See Cox and Reid (2000), \emph{The Theory of the Design of Experiments}, for the research literature.
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Scab disease data} \pause
\framesubtitle{Illustrating Fisher's claim}
%\framesubtitle{Raw $p$-value = 0.0103, Permutation $p$-value = 0.0111}
% Actually I lied. That's a randomization p-value
\begin{center}
\includegraphics[width=3in]{PermDistF}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame} % Superimpose F density
\frametitle{Scab disease data}
\framesubtitle{Illustrating Fisher's claim}
\begin{center}
\includegraphics[width=3in]{PermDistF2}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{The approximation is not always so good}
%\framesubtitle{}
{\Large
\begin{verbatim}
Group1  Group2  Group3
   220       1       4
     0       0       0
     1       2       0
     0       4       3
     1       2       1
     1       0       4
     0       1       1
\end{verbatim}
} % End size
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Permutation Distribution of the $F$ Statistic}
%\framesubtitle{}
\begin{center}
\includegraphics[width=3.3in]{AbsCauchy1}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{}
%\framesubtitle{Permutation Distribution versus Theoretical F Distribution}
\begin{center}
\includegraphics[width=3.5in]{AbsCauchy2a}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{}
%\framesubtitle{Permutation Distribution versus Theoretical F Distribution}
\begin{center}
\includegraphics[width=3.5in]{AbsCauchy2b}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{It was the outlier} \pause
%\framesubtitle{}
\begin{itemize}
\item Approximation was excellent for exponential data. \pause
\item Awful for absolute Cauchy not rounded. \pause
\item Likert scale: 7-point scale, strongly disagree to strongly agree. \pause
\end{itemize}
\begin{center}
\includegraphics[width=2.4in]{LikertScale}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{$n=7$ for each of three treatments}
\framesubtitle{0.0542 of the permutation distribution is above the $F$ critical value}
\begin{center}
\includegraphics[width=3in]{Likert21}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{$n=30$ for each of three treatments}
\framesubtitle{0.0472 of the permutation distribution is above the $F$ critical value}
\begin{center}
\includegraphics[width=3in]{Likert90}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{The main drawback of permutation tests is that they're hard to compute} \pause
%\framesubtitle{}
\begin{itemize}
\item Fisher considered permutation tests to be mostly hypothetical, but that was before computers. \pause
\item Even with computers, listing all the permutations can be out of the question, and combinatorial simplification may be challenging.
\end{itemize}
\end{frame}
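%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{How fast do the counts grow?}
\framesubtitle{A quick R check, not in the original argument}

{\footnotesize
\begin{verbatim}
factorial(8)  # 40,320 orders of eight cups ...
choose(8,4)   # ... but only 70 distinct tea-first assignments
choose(40,20) # Two groups of 20: 137,846,528,820 distinct splits
\end{verbatim}
} % End size
\end{frame}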
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Scab disease} \pause
%\framesubtitle{}
\begin{itemize}
\item Eight plots of land in the control condition. \pause
\item Four plots in each of six experimental conditions. \pause
\item Total $n=32$. \pause
\item This is a small sample. \pause
\item There are $ \frac{32!}{8! \, 4! \, 4! \, 4! \, 4! \, 4! \, 4!}$ ways to place the observed data into treatment conditions. \pause
\item Calculate the $F$ statistic for each one. \pause
\item SAS will do it. \pause Or anyway, it will try.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Calculate the $F$ statistic for each re-arrangement of the data} \pause
%\framesubtitle{}
\begin{displaymath}
\frac{32!}{8! \, 4! \, 4! \, 4! \, 4! \, 4! \, 4!} = 34,149,454,710,484,113,000,000
\end{displaymath} \pause
\begin{itemize}
\item That's a big number: about $3.415 \times 10^{22}$. \pause
\item Maybe we can distribute the computation among lots of computers. \pause
\item World population is approximately 7.51 billion. \pause % Based on a projection from 2014 data
\item That's $3.415 \times 10^{22} / (7.51 \times 10^9) \approx 4.547 \times 10^{12}$ calculations per person. \pause
\item If they all had computers and could do one test every 0.01 seconds, \pause
\item it would take around 1,441.9 years to finish the job.
\end{itemize}
\end{frame}
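%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{Checking the arithmetic in R}
\framesubtitle{A sketch using log-factorials to dodge overflow}

{\footnotesize
\begin{verbatim}
N = exp(lfactorial(32) - lfactorial(8) - 6*lfactorial(4))
N                              # About 3.415e+22 re-arrangements
percap = N / 7.51e9; percap    # About 4.547e+12 tests per person
percap * 0.01 / (60*60*24*365) # About 1,442 years at 0.01 sec per test
\end{verbatim}
} % End size
\end{frame}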
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Some problems can be figured out in advance} \pause
%\framesubtitle{}
\begin{itemize}
\item If both explanatory and response variable are binary (an important case), \pause Fisher derived the permutation distribution of the number of observations in the (Yes, Yes) cell \pause (equivalent to the odds ratio) \pause based on the hypergeometric distribution. \pause
\item The result is called Fisher's exact test. \pause
\item For non-binary response variables, one can convert the data to ranks. \pause
\item Then, permutation distributions can be figured out in advance. \pause
\item All the common non-parametric rank tests are permutation tests carried out on ranks.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Randomization Tests}

\begin{frame}
\frametitle{Randomization tests: A modern solution} \pause
\framesubtitle{}
\begin{itemize}
\item Scramble the values of the response variable in a random order\pause, leaving the explanatory variable values in place. \pause
\item Compute the test statistic for the randomly shuffled data. \pause
\item We have randomly sampled a value of the test statistic from its permutation distribution. \pause
\item Carry out the procedure a large number of times. \pause
\item By the Law of Large Numbers, the permutation $p$-value is approximated by the proportion of randomly generated values that exceed or equal the observed value of the test statistic. \pause
\item This proportion is the $p$-value of the randomization test. \pause
\item The $p$-value of a randomization test is an \emph{estimate} of the $p$-value of the corresponding permutation test. \pause
\item SAS does this (among other options) in \texttt{proc~npar1way}. \pause
\item With a confidence interval for the permutation test $p$-value.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Multiple Comparisons}

\begin{frame}
\frametitle{Multiple Comparisons}
\framesubtitle{Using randomization} \pause
\begin{itemize}
\item You could Bonferroni-protect a collection of randomization tests, but this is better. \pause
\item It's \emph{not} conservative. \pause
\item You do need to know what all the tests are, in advance.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Use a standard $p$-value as the test statistic} \pause
\begin{center}
\includegraphics[width=3in]{PermDistP}
\end{center}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{$p$-values}
%\framesubtitle{}
\begin{itemize}
\item Have a family of tests, and an observed $p$-value from each one. \pause
\item The event that at least one $p$-value is less than some critical value is the event that the \emph{minimum} is less than the critical value. \pause
\item If the distribution of the test statistic is continuous and $H_0$ is true, $p$-values are uniformly distributed on $(0,1)$. \pause
\item It's easy to derive the distribution of the minimum of a collection of independent uniforms. \pause
\item Except the $p$-values are not independent.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{The randomization test solution}
\framesubtitle{Approximate permutation distribution for a family of tests} \pause
\begin{itemize}
\item Randomly permute the data, scrambling $y$ against $x$. \pause
\item Calculate $p$-values for all the tests and take the minimum. \pause
\item Repeat. \pause
\item The result is a randomization distribution of minimum $p$-values. \pause
\item This is an approximation of the corresponding permutation distribution. \pause
\item Compare each observed $p$-value to the distribution of the minimum. \pause
\item The proportion of minimum $p$-values at or below any given observed $p$-value is an \emph{adjusted} $p$-value. \pause
\item If all null hypotheses are true, the probability of getting at least one adjusted $p$-value less than 0.05 equals 0.05. \pause
\item Give or take discreteness and Monte Carlo sampling error. \pause
\item \texttt{proc multtest} does this.
\end{itemize}
\end{frame}
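%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{A min-$p$ sketch in R}
\framesubtitle{Made-up data: one $x$, three correlation tests; scrambling $x$ preserves the dependence among the tests}

{\footnotesize
\begin{verbatim}
set.seed(202)
x = rnorm(20); Y = matrix(rnorm(20*3), 20, 3)  # All H0 true here
pobs = apply(Y, 2, function(yj) cor.test(x, yj)$p.value)
minp = replicate(2000,
  { xs = sample(x)                             # Scramble x against Y
    min(apply(Y, 2, function(yj) cor.test(xs, yj)$p.value)) })
sapply(pobs, function(p) mean(minp <= p))      # Adjusted p-values
\end{verbatim}
} % End size
\end{frame}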
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bootstrap}

\begin{frame}
\frametitle{Bootstrap (Efron, 1979)}
\framesubtitle{For comparison} \pause
\begin{itemize}
\item If the sample size is large enough, the histogram of the sample data is a lot like the histogram of the entire population. \pause
\item Thus, sampling from the sample \emph{with replacement} is a lot like sampling from the population. \pause
\item Sampling from the sample is called \textbf{resampling}.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Bootstrap distribution} \pause
%\framesubtitle{}

One can approximate the sampling distribution of a statistic as follows. \pause
\begin{itemize}
\item Select a random sample of size $n$ from the sample data, \emph{with replacement}. \pause
\item Compute the statistic from the resampled data. \pause
\item Do this over and over again, accumulating the values of the statistic. \pause
\item A histogram of the values you have accumulated will resemble the sampling distribution of the statistic. \pause
\item Use it to construct tests and confidence intervals.
\end{itemize}
\end{frame}
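%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}[fragile]
\frametitle{A bootstrap sketch in R}
\framesubtitle{Made-up data; the percentile interval is one simple option}

{\footnotesize
\begin{verbatim}
set.seed(303)
x = rexp(50) # Hypothetical sample
meds = replicate(5000, median(sample(x, replace=TRUE)))
hist(meds, breaks='fd',
     main='Bootstrap distribution of the sample median')
quantile(meds, c(0.025, 0.975)) # Percentile bootstrap interval
\end{verbatim}
} % End size
\end{frame}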
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Bootstrap vs. Randomization tests}
\framesubtitle{Similarities and differences} \pause
\begin{itemize}
\item Both are computer-intensive Monte Carlo methods based on random number generation. \pause
\item Neither requires any assumption about the distribution of the data. \pause
\item Both substitute computing power for probability theory. \pause
\item The bootstrap assumes random sampling and is justifiable only as $n \rightarrow \infty$\pause, though it often seems to work well with moderate sample sizes. \pause
\item Randomization tests do \emph{not} assume random sampling and are \emph{exact} for small samples \pause (almost).
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{frame}
\frametitle{Copyright Information}

This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:

\vspace{5mm}

\href{http://www.utstat.toronto.edu/~brunner/oldclass/441s20}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/441s20}}
\end{frame}

\end{document}

%%%%%%%%%%%%%%%%%% R code for the pictures

########################### Darwin's plants ####################################
plants = read.table("DarwinPlants.data.txt",header=T)
# Data are heights in eighths of an inch, like Fisher
# Fisher's sum of differences is 314
# Permutation p-value 0.05267
# Fisher says t = 2.148, critical value 2.145
# Would yield p = 0.0497
attach(plants); diff = Crossed-Self; sum(diff)
urn = c(-1,1)
m = 100000; SumOfDifferences = numeric(m)
for(j in 1:m)
    {
    jolt = sample(urn,15,replace=T)
    rdiff = diff*jolt # Randomizing the signs of the differences
    SumOfDifferences[j] = sum(rdiff)
    }
hist(SumOfDifferences, breaks='fd', xlab = 'Sum of Differences', freq=F,
     main = "Permutation Distribution for Darwin's Plant Data")
points(314,0,pch=17)
text(314,-0.00005,'314',cex=0.5)

########################## Scab Disease ########################################
rm(list=ls())
scab = read.table("ScabDisease.data.txt",col.names=c("Condition","Infection"))
table0 = anova(lm(Infection ~ Condition, data=scab))
f0 = table0[1,4]; p0 = table0[1,5]; c(f0,p0)
condition = scab$Condition
set.seed(4444)
nsim = 100000; f = p = numeric(nsim)
for(j in 1:nsim)
    {
    infection = sample(scab$Infection)
    Table = anova(lm(infection ~ condition))
    f[j] = Table[1,4]; p[j] = Table[1,5]
    } # Next simulation
hist(f,freq=F,breaks=100,xlab='F statistic',
     main = 'Permutation Distribution of the F Statistic')
points(f0,0,pch=17)
text(f0,-0.015,'3.61',cex=0.75)
# subtitle = paste("F0 = ",round(f0,2)," Randomization p-value = ",round(mean(f>f0),4))
# title(sub=subtitle)

# Superimpose F density: 7 groups and n = 32, so df = (6, 25)
x = seq(from=0,to=8,by=0.05); y = df(x,6,25)
lines(x,y,lty=1)
x0 = c(5,8); y0 = c(0.4,0.4); lines(x0,y0,lty=1)
text(9,0.4,"F(6,25)")

# titl = paste("Permutation Distribution of the p-value: p0 = ",round(p0,4))
titl = 'Permutation Distribution of the p-value'
hist(p,freq=F,breaks=100,xlab='p-value', main=titl)
points(p0,0,pch=17)
text(p0,-0.025,'0.0103',cex=0.75)

########################## Absolute Cauchy #####################################
# The F approximation to the permutation distribution is amazingly good for
# the scab disease data. Can I make it fail?
rm(list=ls())
# Generate data
nj = 7
# Explanatory variable
group = rep(c(1,2,3),each=nj); group
group = factor(group)
# Response variable
set.seed(11111)
y = round(abs(rcauchy(3*nj)))
cbind(group,y)
# For display
cbind(y[1:7],y[8:14],y[15:21])
anova(lm(y ~ group)) # For un-scrambled data

# Randomization distribution of F statistic
set.seed(4444)
nsim = 10000; f = p = numeric(nsim)
for(j in 1:nsim)
    {
    yrand = sample(y) # Random permutation
    Table = anova(lm(yrand ~ group))
    f[j] = Table[1,4]; p[j] = Table[1,5]
    } # Next simulation

# Generates AbsCauchy1.pdf
hist(f,freq=F,breaks='fd',xlab='F statistic',
     main = 'Permutation Distribution of the F Statistic')

# Generates AbsCauchy2.pdf
hist(f,freq=F,breaks='fd',xlab='F statistic', xlim = c(0,9),
     main = 'Permutation Distribution versus Theoretical F Distribution')
# Superimpose F density, not to scale
# (note that this reuses the name y, overwriting the data vector)
x = seq(from=0,to=9,by=0.01); y = df(x,2,18)
y = y * 20 * max(y)
lines(x,y,lty=1)
critval = qf(0.95,2,18); critval
points(critval,0,pch=17)
text(critval,-0.7,'Critical value',cex=0.75)
x0 = c(4,6); y0 = c(20,20); lines(x0,y0,lty=1)
text(6.75,20,"F(2,18)")
text(7,19,"Not to scale")

########################## Likert ##############################################
# Likert histogram: Makes LikertScale.pdf
library(MASS) # truehist() lives in MASS
p = round((1:7)/28, 2) # Subjects use the upper end of the scale.
y = sample(7, size=100000, replace = TRUE, prob = p)
truehist(y,breaks=seq(from=.5,to=7.5,by=1),xlab='Rating', col=3)

########################################
rm(list=ls())
# Generate data
nj = 7; n = 3*nj
# Explanatory variable
group = rep(c(1,2,3),each=nj); group
group = factor(group)
# Response variable
set.seed(11111)
p = round((1:7)/28, 2) # Subjects use the upper end of the scale.
y = sample(7, size=n, replace = TRUE, prob = p)
cbind(group,y)

# Randomization distribution of F statistic
set.seed(4444)
nsim = 10000; f = p = numeric(nsim)
for(j in 1:nsim)
    {
    yrand = sample(y) # Random permutation
    Table = anova(lm(yrand ~ group))
    f[j] = Table[1,4]; p[j] = Table[1,5]
    } # Next simulation
hist(f,freq=F,breaks='fd',xlab='F statistic', xlim=c(0,10),
     main = 'Permutation Distribution of the F Statistic with Likert Data')
# Superimpose F density
xx = seq(from=0,to=10,by=0.01); yy = df(xx,2,n-3)
lines(xx,yy,lty=1)
x0 = c(4,6); y0 = c(0.6,0.6); lines(x0,y0,lty=1)
text(6.75,0.6,"F(2,18)")
# What about the test?
critval = qf(0.95,2,n-3); critval
points(critval,0,pch=17)
text(critval,-0.02,'Critical value',cex=0.75)
# Proportion of randomization test statistics above critical value
# 0.0542 for nj=7, 0.0472 for nj=30
mean(f>critval)
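########## Added sketch: CI for the randomization estimate #####################
# Not in the original code. The proportion just computed estimates a
# permutation-distribution probability, so it deserves a confidence interval
# (compare the interval SAS reports in proc npar1way). Uses f, critval and
# nsim from the Likert simulation above.
phat = mean(f > critval); phat
phat + c(-1,1) * 1.96 * sqrt(phat*(1-phat)/nsim) # Approximate 95% CI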