\documentclass[11pt]{article} %\usepackage{amsbsy} % for \boldsymbol and \pmb \usepackage{graphicx} % To include pdf files! \usepackage{amsmath} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage[scr=rsfs,cal=boondox]{mathalfa} % For \mathscr, which is very cursive. \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links \usepackage{fullpage} %\pagestyle{empty} % No page numbers \begin{document} %\enlargethispage*{1000 pt} \begin{center} {\Large \textbf{STA 312f22 Practice Questions}} \vspace{1 mm} \end{center} \begin{enumerate} \item A strange three-sided die has $Pr(1)=1/6$, $Pr(2)=2/6$, and $Pr(3)=3/6$. If you roll this die four times, what is the probability of getting one 1, one 2 and two 3s? The answer is a number. \textbf{Circle your answer}. \pagebreak %%%%%%%%%%%%%%%%%%%%%%%%%% \item For $i=1, \ldots,n$, let $\mathbf{X}_i = (X_{i,1},X_{i,2},X_{i,3})$ be independent multinomial $M(1,(\pi_1,\pi_2,\pi_3))$ random vectors. \begin{enumerate} \item \label{mle} Consider $H_0:\pi_2=2\pi_3$. Starting with the likelihood on the formula sheet, derive the maximum likelihood estimator under this restriction. Your answer is a vector of \emph{three} quantities, each a function of $n_1, n_2$ and $n_3$ (and maybe $n$, depending on how you write it.) Show your work and \textbf{Circle your final answer}. \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item Suppose we sample 300 adults, give each of them three unlabeled cups of tea to taste, and ask them to indicate their preference. We find that 104 prefer tea Type $A$, 87 prefer $B$ and 109 prefer $C$. Give the restricted maximum likelihood estimate. This is the numerical version of your answer to~\ref{mle}. The answer is a vector of three numbers. \textbf{Circle your answer}. \end{enumerate} \vspace{75mm} \item For the following table, show that $P(Y=1|X=1)=P(Y=1|X=2)$ implies that the odds ratio $\theta=1$. \begin{center} \begin{tabular}{|l|c|c|} \hline & $Y=1$ & $Y=2$ \\ \hline $X=1$ & $\pi_{11}$ & $\pi_{12}$ \\ \hline $X=2$ & $\pi_{21}$ & $\pi_{22}$ \\ \hline \end{tabular} \end{center} \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item \label{peas} In a famous genetics experiment first described by Gregor Mendel, plants produce peas that are either smooth or wrinkled. For this experiment (a back-cross involving a single dominant/recessive gene pair), classical genetic theory tells us that the probability of observing a plant with smooth peas is 0.75. We breed and raise 50 plants according to the experimental protocol, and observe that 34 produce smooth peas. \begin{enumerate} \item What is a reasonable model for these data? \vspace{5mm} \item What is the null hypothesis, in symbols? \vspace{5mm} \item Would rejection of $H_0$ be evidence for the theory, or against it? \vspace{5mm} \item You will test the null hypothesis with a large-sample likelihood ratio test. What is the approximate distribution of the test statistic when $H_0$ is true? You don't have to show anything. Just write down the answer. \vspace{10mm} \item What is the critical value of the test statistic at $\alpha = 0.05$? The answer is a number. \vspace{5mm} \item Calculate the test statistic. Show some work. Your answer is a number. \textbf{Circle your answer.} % 3.071821 \vspace{80mm} \item Do you reject $H_0$ at $\alpha = 0.05$? Answer Yes or No. \vspace{5mm} \item Do the results of this experiment provide evidence against the theory? Answer Yes or No. \vspace{5mm} \item Do the results of this experiment prove that the theory is correct? Answer Yes or No. \end{enumerate} \vspace{5mm} \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item Information on a sample of drivers includes Age, Sex and how many traffic tickets (moving violations) they had the past 12 months. The number of tickets is modeled as a Poisson random variable with conditional mean $\lambda$, and \begin{displaymath} \log\lambda = \beta_0 + \beta_1 x + \beta_2 s, \end{displaymath} where $x$ is age, and $s$ is a dummy variable that equals one for females and zero for males. \begin{enumerate} \item Make a table with one row for females and one row for males. Make one column for the dummy variable $s$, and a second, wider column for the expected number of tickets. \vspace{50mm} \item In terms of $\beta$ quantities, what is the expected number of traffic tickets for a 20 year old woman? \vspace{20mm} \item Suppose that for any age, the expected number of traffic tickets is twice as great for men as for women. What is $\beta_2$? Show some work. \textbf{Circle your answer}. \end{enumerate} \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % A logistic regression question here? \item This question is based on the following printout from the Birth Weight study. Remember, \texttt{lwt} refers to mother's weight. {\small \begin{verbatim} > library(MASS) > head(birthwt) low age lwt race smoke ptl ht ui ftv bwt 85 0 19 182 2 0 0 0 1 0 2523 86 0 33 155 3 0 0 0 0 3 2551 87 0 20 105 1 1 0 0 0 1 2557 > attach(birthwt) > race=factor(race,labels=c("White","Black","Other")) > contrasts(race) Black Other White 0 0 Black 1 0 Other 0 1 > mod1 = glm(low ~ lwt + smoke + race, family=binomial) > summary(mod1) Call: glm(formula = low ~ lwt + smoke + race, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.5278 -0.9053 -0.5863 1.2878 2.0364 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.10922 0.88211 -0.124 0.90146 lwt -0.01326 0.00631 -2.101 0.03562 * smoke 1.06001 0.37832 2.802 0.00508 ** raceBlack 1.29009 0.51087 2.525 0.01156 * raceOther 0.97052 0.41224 2.354 0.01856 * --- Signif. codes: 0 ë***í 0.001 ë**í 0.01 ë*í 0.05 ë.í 0.1 ë í 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 234.67 on 188 degrees of freedom Residual deviance: 215.01 on 184 degrees of freedom AIC: 225.01 Number of Fisher Scoring iterations: 4 > anova(mod1,test='LRT') Analysis of Deviance Table Model: binomial, link: logit Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 188 234.67 lwt 1 5.9813 187 228.69 0.014458 * smoke 1 4.3500 186 224.34 0.037009 * race 2 9.3260 184 215.01 0.009438 ** --- Signif. codes: 0 ë***í 0.001 ë**í 0.01 ë*í 0.05 ë.í 0.1 ë í 1 \end{verbatim} } % End size \begin{enumerate} \item For any race and any mother's weight, the odds of a low birth weight baby are estimated to be \underline{\hspace{15mm}} as great for a mother who smokes during pregnancy. The answer is a number. \vspace{10mm} \item Give a 95\% confidence interval for that last odds ratio. Your answer is a pair of numbers, a lower confidence limit and an upper confidence limit. Show your work. \textbf{Circle your answer.} \vspace{100mm} \item Estimate the probability of a low birth weight baby for a 130 pound, White, non-smoking mother. The answer is a number. \textbf{Circle your answer.} \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item Controlling for mother's weight and smoking status, the estimated odds of a low birth weight baby are \underline{\hspace{15mm}} as great for a Black mother, compared to an Other mother. The answer is a number. \vspace{20mm} \item Controlling for mother's weight and smoking status, do White and Black mothers differ in their odds of having a low birth weight baby? \begin{enumerate} \item Give the test statistic ($Z$ or $\chi^2$). The answer is a number. \vspace{20mm} \item What is the critical value of the test statistic at $\alpha = 0.05$? The answer is a number. \vspace{20mm} \item Do you reject $H_0$ at $\alpha=0.05$? Answer Yes or No. \vspace{30mm} \item In plain, non-statistical language, what do you conclude? %\vspace{30mm} \end{enumerate} \end{enumerate} % End of bweight question \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item Using the \texttt{Math} data, we investigate choice of university Calculus course as a function of High School Calculus mark and sex. The questions come after the printout. {\footnotesize \begin{verbatim} > # Choice of university course based on High School data, sex and first language > # install.packages("mlogit", dependencies=TRUE) # Only need to do this once > library(mlogit) # Load the package every time > datta = math[,c(1,5,9)] # Just course, hscalc and sex > datta = na.omit(datta) > summary(datta); attach(datta) course hscalc sex Catch-up: 20 Min. : 50.00 F:193 Elite : 28 1st Qu.: 67.00 M:186 Mainstrm:331 Median : 77.00 Mean : 76.09 3rd Qu.: 86.00 Max. :100.00 > # Make Mainstream the reference category for course by changing alphabetical order. > n = length(course); Course = character(n) > Course[course=='Mainstrm'] = '1_Mainstrm' > Course[course=='Elite'] = '2_Elite' > Course[course=='Catch-up'] = '3_Catch-up' > Course = factor(Course); table(Course) Course 1_Mainstrm 2_Elite 3_Catch-up 331 28 20 > datta$course = Course # Put the fixed-up version back in the data frame > > # Make an mlogit data frame in long format > long = mlogit.data(datta,shape="wide",choice="course") > > # Fit full model > full = mlogit(course ~ 0 | hscalc + sex, data=long) > summary(full) Call: mlogit(formula = course ~ 0 | hscalc + sex, data = long, method = "nr", print.level = 0) Frequencies of alternatives: 1_Mainstrm 2_Elite 3_Catch-up 0.873351 0.073879 0.052770 nr method 6 iterations, 0h:0m:0s g'(-H)^-1g = 1.94E-07 gradient close to zero Coefficients : Estimate Std. Error t-value Pr(>|t|) 2_Elite:(intercept) -6.277453 1.573941 -3.9884 6.653e-05 *** 3_Catch-up:(intercept) 4.213750 1.472793 2.8611 0.004222 ** 2_Elite:hscalc 0.036789 0.018833 1.9535 0.050762 . 3_Catch-up:hscalc -0.107888 0.023086 -4.6732 2.965e-06 *** 2_Elite:sexM 1.411205 0.475800 2.9660 0.003017 ** 3_Catch-up:sexM 0.733976 0.497212 1.4762 0.139895 --- Signif. codes: 0 ë***í 0.001 ë**í 0.01 ë*í 0.05 ë.í 0.1 ë í 1 Log-Likelihood: -153.11 McFadden R^2: 0.13307 Likelihood ratio test : chisq = 47.003 (p.value = 1.5226e-09) > > # Restricted models > NoCalculus = mlogit(course ~ 0 | sex, data=long) > summary(NoCalculus) Call: mlogit(formula = course ~ 0 | sex, data = long, method = "nr", print.level = 0) Frequencies of alternatives: 1_Mainstrm 2_Elite 3_Catch-up 0.873351 0.073879 0.052770 nr method 5 iterations, 0h:0m:0s g'(-H)^-1g = 3.08E-08 gradient close to zero Coefficients : Estimate Std. Error t-value Pr(>|t|) 2_Elite:(intercept) -3.39563 0.41503 -8.1816 2.22e-16 *** 3_Catch-up:(intercept) -3.10794 0.36137 -8.6005 < 2.2e-16 *** 2_Elite:sexM 1.46279 0.47359 3.0887 0.00201 ** 3_Catch-up:sexM 0.56897 0.46957 1.2117 0.22564 --- Signif. codes: 0 ë***í 0.001 ë**í 0.01 ë*í 0.05 ë.í 0.1 ë í 1 Log-Likelihood: -170.31 McFadden R^2: 0.035674 Likelihood ratio test : chisq = 12.601 (p.value = 0.0018356) > anova(NoCalculus, full) Error in UseMethod("anova") : no applicable method for 'anova' applied to an object of class "mlogit" > NoSex = mlogit(course ~ 0 | hscalc, data=long) > summary(NoSex) Call: mlogit(formula = course ~ 0 | hscalc, data = long, method = "nr", print.level = 0) Frequencies of alternatives: 1_Mainstrm 2_Elite 3_Catch-up 0.873351 0.073879 0.052770 nr method 6 iterations, 0h:0m:0s g'(-H)^-1g = 1.56E-07 gradient close to zero \end{verbatim} \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{verbatim} Coefficients : Estimate Std. Error t-value Pr(>|t|) 2_Elite:(intercept) -5.641500 1.514151 -3.7258 0.0001947 *** 3_Catch-up:(intercept) 4.472142 1.453862 3.0760 0.0020977 ** 2_Elite:hscalc 0.040052 0.018460 2.1696 0.0300373 * 3_Catch-up:hscalc -0.106000 0.022873 -4.6343 3.581e-06 *** --- Signif. codes: 0 ë***í 0.001 ë**í 0.01 ë*í 0.05 ë.í 0.1 ë í 1 Log-Likelihood: -159.34 McFadden R^2: 0.097752 Likelihood ratio test : chisq = 34.528 (p.value = 3.1797e-08) \end{verbatim} } % End size \begin{enumerate} \item The \texttt{summary} output for the full model includes two tests (excluding tests for the intercepts) that are statistically significant. In plain, non-statistical language and mentioning \emph{no numbers}, give the conclusions from these two tests. You have more room than you need. \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \item We seek a \emph{single} test of the relationship between sex and choice of university Calculus course, controlling for mark in High School Calculus. \begin{enumerate} \item Write the numerical value of the test statistic in the space below. The answer is a number. If you need to calculate this number from material on the printout, show a little work. \textbf{Circle the number}. \vspace{140mm} \item What is the critical value? The answer is a number from the formula sheet. \vspace{10mm} \item Do you reject the null hypothesis? Answer Yes or No. \vspace{10mm} \item Is there evidence that sex is related to choice of Calculus course, controlling for High School performance? Just answer Yes or No. \end{enumerate} \end{enumerate} \end{enumerate} % End of all the questions \vspace{20mm} %\newpage \noindent \begin{center}\begin{tabular}{l} \hspace{6in} \\ \hline \end{tabular}\end{center} This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistics, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/312f22} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/312f22}} \end{document}