\documentclass[11pt]{article} 
%\usepackage{amsbsy} % for \boldsymbol and \pmb 
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers


\begin{document}
%\enlargethispage*{1000 pt} 

\begin{center}   
{\Large \textbf{STA 312f22 Assignment Two}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent
These questions are practice for the quiz on Friday Sept. 30th, and are not to be handed in.

\begin{enumerate} 

\item Customers arrive at a Tim Hortons according to a Poisson process with rate $\lambda=30$ per hour. What is the probability that exactly 40 customers arrive during a one-hour period? The answer is a number. My answer is 0.01394346. % dpois(40,30) = 0.01394346
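
As a check on the arithmetic, and assuming the number of arrivals in $t$ hours is Poisson with mean $\lambda t$ (here $t=1$, so the mean is 30), the calculation can be sketched as follows. The notation $N(t)$ for the number of arrivals by time $t$ is introduced here for illustration only.
\[
P\{N(1)=40\} = \frac{e^{-30}\, 30^{40}}{40!} \approx 0.0139 .
\]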

    \item \label{bigred} For years, brand awareness for Big Red chewing gum has been stuck at about 6\%, meaning that about 6\% of consumers who chew gum say they remember hearing about Big Red gum. The marketing department planned an advertising campaign to increase brand awareness, in the hope that increased brand awareness would lead to increased sales. After the campaign had been running for a few weeks, they interviewed a random sample of 200 gum chewers and found that 20 had heard of Big Red.
        \begin{enumerate}
            \item State a reasonable model for these data.
        %   \item What is the parameter space $\mathcal{B}$?
            \item Without any derivation, estimate the brand awareness for Big Red, in percent. Your answer is a number between zero and one hundred. 
            \item Give an approximate 95\% confidence interval for the brand awareness in percent. Your answer is a set of two numbers. My upper confidence limit is 14.16\% (that is, 0.1416 as a proportion). % (0.0584,0.1416)
            \item What is the null hypothesis corresponding to the \emph{main question}, in symbols?
            \item What is the critical value (or values) of the test statistic at $\alpha=0.05$ for a 2-sided test?  The answer is a number or a pair of numbers.
            \item Calculate the test statistic $Z_2$ --- see lecture notes. The formulas for $Z_1$ and $Z_2$ will be provided with the quiz if necessary. What is the value of the test statistic? Your answer is a number. Show some work. My answer is 1.88.
                \begin{enumerate}
                    \item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No.
                    % \item Using R, calculate the $p$-value. If necessary, see \texttt{help(pnorm)}. Make sure the p-value is on the printout you bring to the quiz.
                    \item Do the data provide adequate evidence against the null hypothesis?
                    \item In plain, non-statistical language, what do you conclude? Your answer is a statement about brand awareness.
                \end{enumerate}
            \item Some clever person suggests that for this problem, the test statistic $Z_1$ will always be bigger than $Z_2$ as long as $p>\pi_0$. Try it and see. What is the value of $Z_1$? Your answer is a number. Show some work.
                \begin{enumerate}
                    \item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No.
                    \item What is the critical value (or values) of the test statistic at $\alpha=0.05$ for a 2-sided test? 
                    \item Do the data provide adequate evidence against the null hypothesis?
                    \item In plain, non-statistical language, what do you conclude? Your answer is a statement about brand awareness.
                \end{enumerate}

\newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

            \item But is $Z_1$ \emph{always} bigger than $Z_2$ when $p>\pi_0$, as claimed? To answer this question, it is easier to base the test on $Z^2$ rather than $Z$. The null hypothesis is rejected when $|Z|>1.96$, which occurs if and only if $Z^2>1.96^2 = \chi^2_{0.05(1)}$, so it is the same test. Now find the value of $p$ for which the denominator of $Z_2^2$ is greatest. Make a rough sketch of that denominator as a function of $p$. Now you can answer the question: Is $Z_1$ \emph{always} bigger than $Z_2$ when $p>\pi_0$?

        \end{enumerate}
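
For reference, here is a sketch of the two statistics, assuming the lecture notes use the usual one-sample forms with $p$ the sample proportion, $\pi_0$ the null value and $n$ the sample size. These definitions are an assumption rather than a quotation from the notes, although they do reproduce the value quoted for $Z_2$ above.
\[
Z_1 = \frac{\sqrt{n}\,(p-\pi_0)}{\sqrt{\pi_0(1-\pi_0)}},
\qquad
Z_2 = \frac{\sqrt{n}\,(p-\pi_0)}{\sqrt{p(1-p)}} .
\]
With $n=200$, $p=0.10$ and $\pi_0=0.06$, this gives
\[
Z_2 = \frac{\sqrt{200}\,(0.10-0.06)}{\sqrt{0.10 \times 0.90}} \approx 1.886 ,
\]
which is 1.88 when truncated to two decimal places. Under these definitions the two statistics share a numerator, so comparing them comes down to comparing $\pi_0(1-\pi_0)$ with $p(1-p)$.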

    \item Ten friends have a party right after graduating from university. At the time, none of them has ever been married. The party includes a visit by a fortune teller, who says ``Five years from now, 3 of you will still be unmarried, 3 of you will be married for the first time, 2 will be divorced, one will be married for the second time, and one will be widowed.''

    How many ways are there for this to happen? The answer is a number. Show some work. My answer is 50,400.
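
Counting problems of this kind are usually handled with the multinomial coefficient. As a sketch with smaller, made-up numbers (not the ones in this question): the number of ways to split $n$ distinct people into labelled groups of sizes $n_1, \ldots, n_c$, where $n_1 + \cdots + n_c = n$, is
\[
\binom{n}{n_1 ~ \cdots ~ n_c} = \frac{n!}{n_1! \, n_2! \cdots n_c!} ,
\qquad \mbox{for example} \qquad
\frac{4!}{2!\,1!\,1!} = 12
\]
ways to split four people into labelled groups of sizes 2, 1 and 1.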

    \item Students entering U of T have to choose a division: Humanities, Social Sciences, or Sciences. 
        \begin{enumerate}
            \item Of the 25 students from a particular high school, how many ways are there for 8 to choose the Humanities, 14 to choose the Social Sciences and 3 to choose the Sciences? The answer is a number. My answer is 735,471,000, but you can just leave it in factorial form.
            \item Of the 3 students from another high school, how many ways are there for 1 to choose the Humanities, 1 to choose the Social Sciences and 1 to choose the Sciences? The answer is a number. Show your work.
        \end{enumerate}

    \item Please do problem 1.6 from the text. % An easy multinomial.
          For (b), I get 10 possibilities. My answer to (c) is 3/16.

    \item A fair die is tossed 8 times. What is the probability of observing the numbers 3 and 4 twice each, and the others once each? The answer is a number. My answer is around 0.006. % about 0.006, stolen from Schaum's outline with slight changes.
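
Probabilities like this one come from the multinomial distribution. As a sketch, assuming $n$ independent trials with $c$ possible outcomes, where outcome $j$ has probability $\pi_j$ and is observed $n_j$ times (so $n_1 + \cdots + n_c = n$), the probability of the counts is
\[
P(n_1, \ldots, n_c) = \frac{n!}{n_1! \cdots n_c!} \, \pi_1^{n_1} \cdots \pi_c^{n_c} .
\]
A small made-up example, not taken from this question: with three outcomes of probability $\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{6}$ and $n=3$ trials, the probability of seeing each outcome exactly once is
\[
\frac{3!}{1!\,1!\,1!} \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{6} = \frac{6}{36} = \frac{1}{6} .
\]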

    \item A box contains 5 red, 3 white and 2 blue marbles. A sample of six marbles is drawn with replacement. Find the probability that
        \begin{enumerate}
            \item 3 are red, 2 are white and one is blue. My answer is 0.135.
            \item 2 are red, 3 are white and 1 is blue. My answer is 0.0810.
            \item 2 of each colour appear. My answer is 0.0810.
        \end{enumerate}
% All the answers are numbers. 
% Stolen from Schaum's outline without modification except for the spelling of colour.

    \item Let $\mathbf{Y}_1, \ldots, \mathbf{Y}_n$ be a random sample from a $M\left(1,(\pi_1, \ldots ,\pi_c)\right)$ distribution, and let $n_j$ denote the number of observations that fall in category $j$. Show why the likelihood function is written $\ell(\boldsymbol{\pi})  = \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}$.

    \item Let $\mathbf{Y}_1, \ldots, \mathbf{Y}_n$ be a random sample from a $M\left(1,(\pi_1,\pi_2,\pi_3)\right)$ distribution. Find the maximum likelihood estimator of $(\pi_1,\pi_2,\pi_3)$. Show \emph{all} your work.

\end{enumerate}

% \vspace{50mm}

%\newpage
\noindent
\begin{center}\begin{tabular}{l}
\hspace{6in} \\ \hline
\end{tabular}\end{center}
This assignment was prepared by  \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner},
Department of Statistics, University of Toronto. It is licensed under a 
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
     {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/brunner/oldclass/312f22} {\texttt{http://www.utstat.toronto.edu/brunner/oldclass/312f22}}

\end{document}


% Excellent big beer problem in 2010 ass2
\begin{comment}
\item \label{beer} Under carefully controlled conditions, 120 beer drinkers each tasted 6 beers and indicated which one they liked best. Here are the numbers preferring each beer.
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|c|}  \hline
           & \multicolumn{6}{c|}{Preferred Beer}  \\ \hline
           &   1   &  2    &  3     &  4     &  5     &  6     \\ \hline
Frequency  &  30   &  24   &  22    & 28    &   9    &   7   \\ \hline
\end{tabular}
\end{center}
% x2 = rmultinom(1,size=120,prob=c(2,2,2,2,1,1)); x2

The main question is whether preference for the 6 beers is different in the population from which this sample was taken. Use R whenever possible. \emph{You may be asked to hand in your printout of the R parts, so please print this R session on a separate sheet of paper.}

    \begin{enumerate}
                \item State a reasonable model for these data.
                \item What is the parameter space $\mathcal{B}$?
                \item State the null hypothesis in symbols. It is a statement about the $\pi_j$s. Please be specific. The research question allows you to give a specific numerical value for each $\pi_j$ under $H_0$.
                \item What is the restricted parameter space $\mathcal{B}_0$?
                \item What are the degrees of freedom of the test? The answer is a number.
                \item What are the expected frequencies under $H_0$? Your answer is a set of 6 numbers. Are these estimated expected values, or exact expected values?
                \item Calculate the likelihood ratio test statistic $G^2$. Show some work. This is something you should be able to do with a calculator if necessary on the quiz. Your answer is a number. 
                \item Now calculate $G^2$ again using $R$. 
                \item Obtain the critical value at $\alpha=0.05$ with $R$. 
                \item Calculate the $p$-value using $R$. Print out all the $R$ output and bring it to the quiz.
                \item Do you reject the null hypothesis at $\alpha=0.05$? Answer Yes or No.
                \item It is tempting to ask you to state your conclusion in words. But all you can conclude without further testing is that preference for all the beers is not equal. It \emph{looks} like preference for beers 1 through 4 is greater than preference for 5 and 6, and this is what you would tell your management or client in a job situation. 
                \item Calculate the Pearson chi-square statistic $X^2$ for these data. Your answer is a number. This is something you should be able to do with a calculator if necessary on the quiz.
                \item Now calculate $X^2$ again using $R$. 
                \item Do you reject the null hypothesis at $\alpha=0.05$? Answer Yes or No.
    Just so you can check your answer to this question, my $p$-value for $X^2$ is 0.0002479085.
        \end{enumerate}
\end{comment}

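# Beer: test of equal preference for the 6 beers (H0: pi_j = 1/6 for all j)
obs <- c(30, 24, 22, 28, 9, 7); obs      # Observed frequencies, n = 120
ex <- rep(120/6, 6); ex                  # Expected frequencies under H0
G2 <- 2 * sum(obs * log(obs/ex)); G2     # Likelihood ratio statistic
pvalG <- 1 - pchisq(G2, df = 5); pvalG   # p-value for G2, df = 6 - 1
X2 <- sum((obs - ex)^2 / ex); X2         # Pearson chi-squared statistic
pvalX <- 1 - pchisq(X2, df = 5); pvalX   # p-value for X2, df = 6 - 1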