\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage[scr=rsfs,cal=boondox]{mathalfa} % For \mathscr, which is very cursive.
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 312f22 Assignment Five}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}
\noindent
Please bring hard copy of your complete R input and output from Question \ref{Rquestion} to the quiz. The non-computer questions are practice for the quiz on Friday Oct. 28th, and are not to be handed in.
\vspace{2mm}
\hrule
\begin{enumerate}
\item This question and the next one could have been on Assignment 4. That is, they are based on Contingency Tables Part One, not Two. Let $X_1, \ldots, X_{n_1} \stackrel{iid}{\sim}$ Bernoulli$(\pi_1)$, and independently, $Y_1, \ldots, Y_{n_2} \stackrel{iid}{\sim}$ Bernoulli$(\pi_2)$. That is, we have independent random samples from two Bernoulli distributions, and we want to test whether the two probabilities $\pi_1$ and $\pi_2$ are different.
    \begin{enumerate}
    \item Derive the likelihood ratio statistic for testing $H_0: \pi_1 = \pi_2$. Your final answer is a formula. You may use the fact that the unrestricted MLE of $\pi_1$ is $p_1$ and the unrestricted MLE of $\pi_2$ is $p_2$.
    \item Of 140 study participants receiving a placebo, 22.1\% got sick, while only 12.2\% of the 139 participants who took Vitamin~C became ill. Calculate your test statistic and the $p$-value, and state your conclusion. The percentages are not quite exact (they are rounded), so it may help to start by recovering the observed frequencies.
    \item How could you have done this problem with a lot less work? Hint: Your $G^2$ statistic (the actual number) is in the lecture slides.
    \end{enumerate}
\item Please use the notation shown in the following table. Assume it's a cross-sectional study.
\begin{center}
\begin{tabular}{|c|c|c|} \hline
$x$ & $a-x$ &{\color{red} $a$ } \\ \hline
$b-x$ & $1-a-b+x$ &{\color{red} $1-a$ } \\ \hline
{\color{red} $b$ } &{\color{red} $1-b$ } & {\color{red} $1$ } \\ \hline
\end{tabular}
\end{center}
    \begin{enumerate}
    \item How many free (unknown) parameters are there under the null hypothesis of independence? In the notation of the table, what are they?
    \item Write the likelihood function, restricted by the null hypothesis of independence. Use the symbols $n_{11}, n_{12}, n_{21}, n_{22}$ for the cell frequencies.
    \item Find the MLE of the free parameters of the restricted model. Show your work.
    \item Write down the entire restricted MLE (four quantities) in a $2 \times 2$ table.
    \item Based on your work, give a formula for $\widehat{\mu}_{ij}$. Simplify.
    % \item Switching back to the usual $\pi_{ij}$ notation of the table in Question~\ref{table1}, find $\pi_{ij}(M)$ for $i=1,2$ and $j=1,2$. Use the Law of Large Numbers, which basically says $p \rightarrow \pi$. Show your work
    % \item Give a formula for the non-centrality parameter $\lambda$ for the Pearson $X^2$ test of independence, applied to a $2 \times 2$ table.
    \end{enumerate}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item Denote the probability of an event by $\pi$, and the odds of the event by $d$.
    \begin{enumerate}
    \item Give an expression for $\pi$ as a function of $d$. Show your work.
    \item Show that $d$ is a strictly increasing function of $\pi$. So, the greater the probability, the greater the odds. For this item, you could differentiate either the odds or the log odds.
    \end{enumerate}
\item \label{table1} Using the notation in this table and assuming a cross-sectional study,
\begin{center}
\begin{tabular}{|l|c|c|c|} \hline
 & $Y=1$ & $Y=2$ & Total \\ \hline
$X=1$ & $\pi_{11}$ & $\pi_{12}$ &{\color{red} $\pi_{11}+\pi_{12}$ } \\ \hline
$X=2$ & $\pi_{21}$ & $\pi_{22}$ &{\color{red} $\pi_{21}+\pi_{22}$ } \\ \hline
Total &{\color{red} $\pi_{11}+\pi_{21}$ } &{\color{red} $\pi_{12}+\pi_{22}$ } & \\ \hline
\end{tabular}
\end{center}
    \begin{enumerate}
    \item What are the odds of $Y=1$ given $X=1$? Just write the answer down.
    \item What are the odds of $Y=1$ given $X=2$? Just write the answer down.
    \item What are the odds of $Y=1$ given $X=1$ \emph{divided by} the odds of $Y=1$ given $X=2$? Simplify.
    \item Show that $P(Y=1|X=1)=P(Y=1|X=2)$ if and only if $\theta=1$.
    \item Show that $P(X=1|Y=1)=P(X=1|Y=2)$ if and only if $\theta=1$.
    \item The two-by-two table is a matrix. Show that the odds ratio equals one if and only if the determinant of the matrix is zero.
    \end{enumerate}
\item Make a table like the table in Question~\ref{table1}, except for a prospective design. Show that the odds ratio equals the cross-product ratio.
\item Make a table like the table in Question~\ref{table1}, except for a \emph{retrospective} design. This time, the odds ratio of interest is the odds of $X=1$ given $Y=1$ divided by the odds of $X=1$ given $Y=2$. Show that the odds ratio equals the cross-product ratio.
% Fisher's exact
\item This question is about Fisher's exact test. In the $2 \times 2$ table below, the integers $a$, $b$ and $n$ are fixed.
\begin{center}
\begin{tabular}{|c||c|c||c|} \hline
 & Response=1 & Response=2 & \\ \hline \hline
Explanatory=1 & $x$ & $a-x$ &{\color{red}$a$} \\ \hline
Explanatory=2 & $b-x$ & $n-a-b+x$ &{\color{red}$n-a$} \\ \hline \hline
 &{\color{red} $b$ }&{\color{red}$n-b$}& {\color{red}$n$} \\ \hline
\end{tabular}
\end{center}
    \begin{enumerate}
    \item How many ways are there for $a$ of the cases to have the explanatory variable equal to one and $b$ of the cases to have the response variable equal to one? Express your answer in terms of binomial coefficients.
    \item How many ways are there to observe the four cell frequencies shown in the table? Express your answer as a multinomial coefficient.
    \item Note that if $x$ is specified, the other 3 cell frequencies are determined. If all the ways of sorting the observations subject to the constraints are equally likely (that's what $H_0$ says), what is the probability of $n_{11}=x$? Express your answer in terms of binomial coefficients.
    \item Show that the cross-product ratio $\theta$ is an increasing function of $x$. This means that tail probabilities for any observed cross-product ratio can be obtained by summing over $x$ in your last answer.
    \item Noting that each of the four cells in the table must be non-negative, what is the range of possible values for the random variable $n_{11}$? Show some work.
    \end{enumerate}
\item \label{Rquestion} Data from the 1912 sinking of the Titanic are available in a built-in R dataset. See \texttt{help(Titanic)} for details. For this question, you might want to look at the Contingency Tables with R lecture. Also, I found the following R functions to be helpful: \texttt{as.data.frame}, \texttt{subset}, \texttt{xtabs}, and of course \texttt{chisq.test}.
    \begin{enumerate}
    \item What kind of design is this? Introspective?
    \item Make a $2 \times 2$ table of Sex by Survived, just for adult passengers. Display the observed frequencies.
        \begin{enumerate}
        \item Use \texttt{prop.table} to get a table of the relevant proportions. Round to~3 decimal places.
        \item Calculate the sample odds ratio (not the restricted MLE produced by \texttt{fisher.test}). In words, what does this number represent?
        % Estimated odds of death for men were 12.6 times as great as the odds of death for women. Or, estimated odds of survival were 12.6 times as great for women compared to men.
        \item Carry out a Pearson chi-squared test. Display the test statistic, degrees of freedom and $p$-value.
        \item In plain, non-statistical language, what do you conclude?
        \end{enumerate}
    \item Now make a $3 \times 2 \times 2$ table of Class by Sex by Survived, again just for adult passengers.
        \begin{enumerate}
        \item Use \texttt{prop.table} to get a table of proportions. What you want is a table showing the proportions of males and females who survived, for each class separately.
        \item Carry out a Pearson chi-squared test separately for each Class. In plain, non-statistical language, what do you conclude? So you can check, I get $X^2 = {\color{red}59.159}$ for third class.
        \item Calculate the sample odds ratio for each sub-table. So you can check, I get 72.45614 for first class. Do these odds ratios look equal? Later, we will be able to test it.
        \end{enumerate}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    \item You have done a lot of the coding already, so it's easy to look at just the children.
        \begin{enumerate}
        \item Start with a $3 \times 2 \times 2$ table of Class by Sex by Survived, just for children. Comment.
        \item How about estimated odds ratios?
        \item Do the only analysis you really can do. What proportion of female 3rd class children survived? Males? Is the difference statistically significant at $\alpha = 0.05$ using the Pearson test?
        \item Describe your findings for the children (all 3 passenger classes) in plain, non-statistical language.
        \end{enumerate}
    \end{enumerate}
% \texttt{}
\end{enumerate} % End of all the questions
\vspace{10mm}
\begin{center}
\textbf{Please bring hard copy of your complete R input and output from Question \ref{Rquestion} to the quiz.}
\end{center}
\vspace{20mm}
%\newpage
\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/312f22}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/312f22}}
\end{document}