\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[scr=rsfs,cal=boondox]{mathalfa} % For \mathscr, which is very cursive.
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 312f22 Assignment Three}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent
Please bring your complete R input and output from Question \ref{beer1} to the quiz. The non-computer questions are practice for the quiz on Friday Oct. 7th, and are not to be handed in. On the quiz, you will be given a copy of the
\href{https://www.utstat.toronto.edu/brunner/312f22/formulas/312f22Formulas2.pdf}{formula sheet},
so please use the formula sheet while doing your homework, and let me know if there are problems. There is a link to the formula sheet from the
\href{http://www.utstat.toronto.edu/~brunner/oldclass/312f22}{course home page},
in case the one in this document does not work.

\begin{enumerate}

\item For a random sample of size $n$ from a multinomial distribution $M\left(1,(\pi_1, \ldots, \pi_c)\right)$,
    \begin{enumerate}
    \item What is the probability distribution of the frequency $n_j$? It has a familiar name. Just write down the answer.
    \item What is the expected value of the frequency $n_j$? Just write down the answer.
    \item Let $\widehat{\pi}_1, \ldots, \widehat{\pi}_c$ be the estimated probabilities given some null hypothesis. What is the \emph{estimated} expected value of the frequency $n_j$? Just write down the answer.
    \end{enumerate}

\item The formula sheet has two formulas for the likelihood ratio test statistic $G^2$.
The one on the left is general, and the one on the right is specific to the multinomial model. Show how to get from
$G^2 = -2 \log \left( \frac{\max_{\beta \in \mathscr{B}_0} \ell(\beta)} {\max_{\beta \in \mathscr{B}} \ell(\beta)} \right)$
to
$G^2 = 2 \sum_{j=1}^c n_j\log \left(\frac{n_j}{\widehat{\mu}_j}\right)$.

\item You must admit that the Bernoulli is a special case of the multinomial. Based on a random sample of size $n$ from a Bernoulli, you want to test $H_0: \pi=\pi_0$.
    \begin{enumerate}
    \item Obtain a special-case formula for the likelihood ratio statistic $G^2$. The answer is something you can compute based on $\pi_0$ and the frequencies.
    \item Obtain a special-case formula for the Pearson chi-squared statistic $X^2$. The answer is something you can compute based on $\pi_0$ and the frequencies.
    \item What are the degrees of freedom? The answer is a number.
    \end{enumerate}

\item \label{bigred} Here is a repeat of the question from the last assignment. This time, you are asked to do it with technology from the multinomial. For years, brand awareness for Big Red chewing gum has been stuck at about 6\%, meaning that about 6\% of consumers who chew gum say they remember hearing about Big Red gum. The company started an advertising campaign to increase brand awareness, and after it had been running for a few weeks, they interviewed a random sample of 200 gum chewers, and found that twenty had heard of Big Red.
    \begin{enumerate}
    \item State the model and the null hypothesis, or look at your answer from last week.
    \item The Pearson $X^2$ and large-sample likelihood test $G^2$ have the same critical value. What is it? The answer is a number.
    \item Carry out the large-sample likelihood test $G^2$. Use the formula sheet and a calculator.
        \begin{enumerate}
        \item What is the value of the test statistic? The answer is a number.
        \item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No.
        \item State your conclusion in plain, non-statistical language.
        For a directional conclusion, compare the observed and (estimated) expected frequencies.
        \end{enumerate}
    \item Carry out the Pearson chi-squared test $X^2$. Use the formula sheet and a calculator.
        \begin{enumerate}
        \item What is the value of the test statistic? The answer is a number.
        \item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No.
        \item State your conclusion in plain, non-statistical language.
        \end{enumerate}
    \end{enumerate} % End of Big Red

\item \label{beer1} Under carefully controlled conditions, 120 beer drinkers tasted 6 beers and indicated which one they liked best. Here are the numbers preferring each beer.
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|c|} \hline
          & \multicolumn{6}{c|}{Preferred Beer} \\ \hline
          & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
Frequency & 30 & 24 & 22 & 28 & 9 & 7 \\ \hline
\end{tabular}
\end{center}
% x2 = rmultinom(1,size=120,prob=c(2,2,2,2,1,1)); x2
The main question is whether preference for the 6 beers is different in the population from which this sample was taken. Use R whenever possible. \emph{You may be asked to hand in your printout of the R parts, so please print this R session on a separate sheet of paper.}
    \begin{enumerate}
    \item State a reasonable model for these data.
    \item What is the parameter space $\mathscr{B}$?
    \item State the null hypothesis in symbols. It is a statement about the $\pi_j$s. Please be specific. The research question allows you to give a specific numerical value for each $\pi_j$ under $H_0$.
    \item What is the restricted parameter space $\mathscr{B}_0$? How many points are in $\mathscr{B}_0$?
    \item What are the degrees of freedom of the test? The answer is a number.
    \item What are the expected frequencies under $H_0$? Your answer is a set of 6 numbers. Are these estimated expected values, or exact expected values?
    \item Calculate the likelihood ratio test statistic $G^2$ using $R$.
    \item Obtain the critical value at $\alpha=0.05$ with $R$. Compare it with the number on the formula sheet.
    \item Calculate the $p$-value using $R$. Print out all the $R$ input and output, and bring it to the quiz.
    \item Do you reject the null hypothesis at $\alpha=0.05$? Answer Yes or No.
    \item It is tempting to ask you to state your conclusion in words. But all you can conclude without further testing is that preference for all the beers is not equal. It \emph{looks} like preference for beers 1 through 4 is greater than preference for 5 and 6, and this is what you would tell your management or client in a job situation.
    \item Calculate the Pearson chi-squared statistic $X^2$ for these data, using $R$.
    \item Do you reject the null hypothesis at $\alpha=0.05$? Answer Yes or No. Just so you can check your answer to this question, my $p$-value for $X^2$ is 0.0002479085.
    \end{enumerate} % End of beer1

\item \label{beer2} In Question \ref{beer1}, you found that preference for the 6 beers was not the same. But it seems that the first 4 beers are lagers and the last two are ales. No one would expect preference for lagers and ales to be the same.\footnote{Actually, I am making all this up with only a vague idea of what these terms mean.} So let's test whether preference for the 4 lagers is different, and at the same time, whether preference for the 2 ales is different.
    \begin{enumerate}
    \item State the null hypothesis in symbols. It is a statement about the $\pi_j$s.
    \item What are the degrees of freedom of the test? The answer is a number.
    \item Differentiating the log likelihood function, obtain the maximum likelihood estimator of the parameter $\boldsymbol{\pi}$, under the null hypothesis. Show all your work. The answer is a symbolic expression, a vector of length 6. If it is not absolutely the most natural thing you can imagine, then either it's wrong or you have not simplified enough.
    % 4x (x1+x2+x3+x4)/(4N), 2x (x5+x6)/(2N)
    \item Give the maximum likelihood estimate of $\boldsymbol{\pi}$ under the null hypothesis for this particular data set.
    The answer is a set of 6 numbers. Note the difference between an estimator and an estimate.
    \item What are the expected frequencies under $H_0$? Your answer is a set of 6 numbers. Are these estimated expected values, or exact expected values?
    \item Using the formula sheet, calculate the likelihood ratio test statistic $G^2$. Show your work. Your answer is a number. This is something you should be able to do with a calculator if necessary on the quiz. You would be silly not to check it with $R$.
    \item Calculate the Pearson $X^2$ statistic for these data. The answer is a number.
    \item What is the critical value for this test at $\alpha=0.05$? The answer is a number.
    \item Do you reject the null hypothesis at $\alpha=0.05$ based on the likelihood ratio test? Answer Yes or No.
    \item Do you reject the null hypothesis at $\alpha=0.05$ based on the Pearson chi-squared test? Answer Yes or No.
    \item Based on this analysis, are you able to conclude that there is any difference in preference among the 4 lagers or between the 2 ales? Answer Yes or No. (In applied situations, beware of concluding that there is no effect, that is, \emph{absolutely no difference at all} in the population.)
    \end{enumerate}
Just so you can check your answer to this question, my $p$-value for the likelihood ratio test is 0.7735209.

\item A sample of 150 students each try to solve two difficult logic problems. The problems are complicated, but it's multiple choice, so the answers are either right or wrong. For each student, the data file indicates whether he or she got Question 1 right, and whether he or she got Question 2 right. It is expected that students who get one question correct will also tend to get the other correct; this is not the issue. The issue is whether the one question is more difficult than the other. At first glance, this seems like just a problem of comparing two proportions, but there is a twist. Each student answers \emph{both} questions.
In a multinomial model, there are $n$ \emph{independent} observations, so each case (person or whatever) must contribute only one frequency to the table. This experiment is an example of a within-cases (repeated measures) design, but with categorical data. We will set up the problem as a multinomial with four categories, as follows.
\begin{center}
\begin{tabular}{|l|c|c|} \hline
                     & Question 1 Correct & Question 1 Incorrect \\ \hline
Question 2 Correct   & $\pi_1$ & $\pi_2$ \\ \hline
Question 2 Incorrect & $\pi_3$ & $\pi_4$ \\ \hline
\end{tabular}
\end{center}
The observed frequencies are:
% x1 = rmultinom(1,size=150,prob=c(70,40,30,10)); x1
\begin{center}
\begin{tabular}{|l|c|c|} \hline
                     & Question 1 Correct & Question 1 Incorrect \\ \hline
Question 2 Correct   & 66 & 41 \\ \hline
Question 2 Incorrect & 30 & 13 \\ \hline
\end{tabular}
\end{center}
Note that even though these look like 2-dimensional tables, we have not reached contingency tables yet in the assignments. This is a one-dimensional multinomial.
    \begin{enumerate}
    \item State the null hypothesis in symbols. It is a statement about the $\pi_j$s. Simplify.
    \item What are the degrees of freedom for this test?
    % $p_1$, $p_2$, $p_3$ and $p_4$.
    \item Differentiating the log likelihood function, obtain the maximum likelihood estimator of the parameter $\boldsymbol{\pi}$, under the null hypothesis. Show all your work. The answer is a symbolic expression, a vector of length 4.
    \item Give the maximum likelihood estimate of $\boldsymbol{\pi}$ under the null hypothesis for this particular data set. The answer is a set of 4 numbers.
    \item What are the expected frequencies under $H_0$? Your answer is a set of 4 numbers. Are these estimated expected values, or exact expected values?
    \item Using the formula sheet, calculate the likelihood ratio test statistic $G^2$. Show your work. Your answer is a number. This is something you should be able to do with a calculator if necessary on the quiz.
    \item Calculate the Pearson $X^2$ statistic for these data. The answer is a number. This is something you should be able to do with a calculator if necessary on the quiz.
    \item What is the critical value for this test at $\alpha=0.05$? The answer is a number.
    \item Do you reject the null hypothesis at $\alpha=0.05$ based on the likelihood ratio test? Answer Yes or No.
    \item Do you reject the null hypothesis at $\alpha=0.05$ based on the Pearson chi-squared test? Answer Yes or No.
    \item What percent of students answered Question 1 correctly? The answer is a number.
    \item What percent of students answered Question 2 correctly? The answer is a number.
    \item Does this study provide solid evidence that the two questions differ in their difficulty? Answer Yes or No.
    \end{enumerate}

\end{enumerate} % End of all the questions

\vspace{10mm}
\begin{center}
\textbf{Please bring hard copy of your R input and output for Question~\ref{beer1} to the quiz.}
\end{center}

\vspace{20mm}
%\newpage
\noindent
\begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline
\end{tabular}\end{center}
This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}.
Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/312f22}
{\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/312f22}}

\end{document}