\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 2101/442 Assignment Two}}\footnote{Copyright information is at the end of the last page.}
\vspace{1 mm}
\end{center}

\noindent
Please bring your R printouts to the quiz on Friday Sept. 21st. The non-computer parts are just practice for the quiz, and are not to be handed in.

\begin{enumerate}

\item In a risky type of brain surgery, seventy-five percent of patients survive for at least 24 hours after the surgery. But at a hospital that usually achieves this success rate, 15 out of the last 30 patients have died. Could this be due to chance?
\begin{enumerate}
\item State a reasonable model for these data.
\item What is the parameter space?
\item Without any derivation, estimate the parameter in your model. Your answer is a number.
\item Give an approximate 95\% confidence interval for the probability of survival. Your answer is a set of two numbers.
\item What is the null hypothesis corresponding to the \emph{main question}, in symbols?
\item What is the critical value (or values) of the test statistic at $\alpha=0.05$ for a 2-sided test? The answer is a number or a pair of numbers.
\item Calculate a reasonable test statistic. Your answer is a number. Show some work.
\begin{enumerate}
\item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No.
\item Using R, calculate the $p$-value. Make sure it's on the printout you bring to the quiz.
\item Do the data provide convincing evidence against the null hypothesis?
\item In plain, non-statistical language, what do you conclude? Your answer is a statement about surviving this surgery.
\end{enumerate}
\end{enumerate}

\item \label{quebec} A polling firm plans to ask a random sample of registered voters in Quebec whether Quebec should separate from Canada and become an independent nation: Yes or No. They would like to be able to say that their results are expected to be accurate within three percentage points, nineteen times out of twenty.
\begin{enumerate}
\item Suppose the population percent favouring independence is 25\%. What sample size is required to achieve the desired margin of error?
\item Suppose the population percent favouring independence is 40\%. What sample size is required to achieve the desired margin of error?
\item What sample size would be required if you were unwilling to make any assumptions about the true percentage favouring independence?
\end{enumerate}

\item \label{qtest} Still for the Quebec polling study of Problem~\ref{quebec}, suppose we intend to test whether the true percent favouring independence is different from 50\%, and in fact the true percent is 53\%. What is the minimum sample size required to reject the null hypothesis at $\alpha=0.05$ with probability at least 0.80, using a 2-sided $Z_2$ test? Do the calculation with R. Please use the non-central chi-squared distribution. \emph{You may be asked to hand in your printout, so please print this R session on a separate sheet of paper.}

\item \label{bigred} For years, brand awareness for Big Red chewing gum has been stuck at about 6\%, meaning that about 6\% of consumers who chew gum say they remember hearing about Big Red gum. The gum company is planning an advertising campaign to increase brand awareness, in the hope that increased brand awareness will lead to increased sales. The advertising agency has a problem. With the budget they have been given to purchase media (air time and so on), they are confident they can move brand awareness a little -- perhaps to 8\%.
In the old days, they could tell the client they had increased awareness by 33\% and start to celebrate, but now the client has fallen under the influence of a U of T graduate who insists that a null hypothesis be rejected at the $\alpha=0.05$ level with a non-directional test before they admit that anything actually worked. So the advertising agency has to decide how many people they need to survey when they measure brand awareness, in order to have a good chance of rejecting the null hypothesis. It's important, because if the client thinks the advertising didn't work, they might get a new advertising agency.
\begin{enumerate}
\item Suppose they want to be 90\% sure of rejecting $H_0$ if they increase brand awareness to 8\%. What sample size do they need if they use a $Z_2$ test as in Problem~\ref{qtest}? Please obtain the answer using R and bring your printout to the quiz.
\item What sample size do they need if they use a $Z_1$ test? Following some hand calculations involving the non-central chi-squared distribution, please obtain the answer using R and bring your printout to the quiz.
\item Please check your calculation involving the $Z_1$ test with a simulation. Use a Monte Carlo sample size of 10,000. Bring your printout to the quiz.
\end{enumerate}

\item \label{inverseCDF} This is about how to simulate from a continuous univariate distribution. Let the random variable $X$ have a continuous distribution with density $f_X(x)$ and cumulative distribution function $F_X(x)$. Suppose the cumulative distribution function is strictly increasing over the set of $x$ values where $F_X(x)>0$, so that $F_X(x)$ has an inverse. Let $U$ have a uniform distribution over the interval $(0,1)$. Show that the random variable $Y = F_X^{-1}(U)$ has the same distribution as $X$. Hint: Start by finding $F_U(u) = Pr\{U \leq u\}$.

\item Let $X_1 , \ldots, X_n$ be a random sample from a Binomial distribution with parameters $3$ and $\theta$.
That is,
\begin{displaymath}
P(X_i = x_i) = \binom{3}{x_i} \theta^{x_i} (1-\theta)^{3-x_i},
\end{displaymath}
for $x_i=0,1,2,3$. Find the maximum likelihood estimator of $\theta$, and show that it is strongly consistent.

\item Let $X_1 , \ldots, X_n$ be a random sample from a continuous distribution with density
\begin{displaymath}
f(x;\tau) = \frac{\tau^{1/2}}{\sqrt{2\pi}} \, e^{-\frac{\tau x^2}{2}},
\end{displaymath}
where the parameter $\tau>0$. Let
\begin{displaymath}
\widehat{\tau} = \frac{n}{\sum_{i=1}^n X_i^2}.
\end{displaymath}
Is $\widehat{\tau}$ a consistent estimator of $\tau$? Answer Yes or No and prove your answer. Hint: You can just write down $E(X^2)$ by inspection. This is a very familiar distribution.

% \item Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$. Show that $T_n = \frac{1}{n+400}\sum_{i=1}^n X_i$ is a strongly consistent estimator of $\mu$. % That could be a quiz Q

\item Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Prove that the sample variance $S^2=\frac{\sum_{i=1}^n(X_i-\overline{X})^2}{n-1}$ is a strongly consistent estimator of $\sigma^2$.

\item \label{randiv} Independently for $i = 1 , \ldots, n$, let
\begin{displaymath}
Y_i = \beta X_i + \epsilon_i,
\end{displaymath}
where $E(X_i)=E(\epsilon_i)=0$, $Var(X_i)=\sigma^2_X$, $Var(\epsilon_i)=\sigma^2_\epsilon$, and $\epsilon_i$ is independent of $X_i$. Let
\begin{displaymath}
\widehat{\beta} = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2}.
\end{displaymath}
Is $\widehat{\beta}$ a consistent estimator of $\beta$? Answer Yes or No and prove your answer.
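
Hint (a sketch of one possible first step, not a full solution and not the only route): substituting the model equation for $Y_i$ into the estimator above separates it into the target plus an error term. Assuming $\sum_{i=1}^n X_i^2 > 0$, so that the ratio is defined,
\begin{displaymath}
\widehat{\beta} = \frac{\sum_{i=1}^n X_i (\beta X_i + \epsilon_i)}{\sum_{i=1}^n X_i^2}
= \beta + \frac{\frac{1}{n}\sum_{i=1}^n X_i \epsilon_i}{\frac{1}{n}\sum_{i=1}^n X_i^2}.
\end{displaymath}
The question then comes down to the large-sample behaviour of the two sample means in the last fraction.
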
\item In this problem, you'll use (without proof) the \emph{variance rule}, which says that if $\theta$ is a real constant and $T_1, T_2, \ldots$ is a sequence of random variables with
\begin{displaymath}
\lim_{n \rightarrow \infty} E(T_n) = \theta \mbox{ and } \lim_{n \rightarrow \infty} Var(T_n) = 0,
\end{displaymath}
then $T_n \stackrel{P}{\rightarrow} \theta$.
In Problem~\ref{randiv}, the independent variables are random. Here they are fixed constants, which is more standard (though a little strange if you think about it). Accordingly, let
\begin{displaymath}
Y_i = \beta x_i + \epsilon_i
\end{displaymath}
for $i=1, \ldots, n$, where $\epsilon_1, \ldots, \epsilon_n$ are a random sample from a distribution with expected value zero and variance $\sigma^2$, and $\beta$ and $\sigma^2$ are unknown constants.
\begin{enumerate}
\item What is $E(Y_i)$?
\item What is $Var(Y_i)$?
\item Find the Least Squares estimate of $\beta$ by minimizing $Q=\sum_{i=1}^n(Y_i-\beta x_i)^2$ over all values of $\beta$. Let $\widehat{\beta}_n$ denote the point at which $Q$ is minimal.
\item Is $\widehat{\beta}_n$ unbiased? Answer Yes or No and show your work.
\item Give a sufficient condition for $\widehat{\beta}_n$ to be consistent. Show your work. Remember, in this model the $x_i$ are fixed constants, not random variables.
\item Let $\widehat{\beta}_{2,n} = \frac{\overline{Y}_n}{\overline{x}_n}$. Is $\widehat{\beta}_{2,n}$ unbiased? Consistent? Answer Yes or No to each question and show your work.
\item Prove that $\widehat{\beta}_n$ is a more accurate estimator than $\widehat{\beta}_{2,n}$ in the sense that it has smaller variance. Hint: The sample variance of the independent variable values cannot be negative.
\end{enumerate}

\item Let $X_1 , \ldots, X_n$ be a random sample from a Gamma distribution with $\alpha=\beta=\theta>0$. That is, the density is
\begin{displaymath}
f(x;\theta) = \frac{1}{\theta^\theta \Gamma(\theta)} e^{-x/\theta} x^{\theta-1},
\end{displaymath}
for $x>0$.
Let $\widehat{\theta} = \overline{X}_n$. Is $ \widehat{\theta}$ a consistent estimator of $\theta$? Answer Yes or No and prove your answer. \end{enumerate} \vspace{90mm} \noindent \begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center} This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf12} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf12}} \end{document} %