% 431Assignment2.tex SAS, MLE, MOM
\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 431s17 Assignment Two}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/431s17}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/431s17}}}
\vspace{1 mm}
\end{center}

\begin{enumerate}

\item Two latent explanatory variables $X_1$ and $X_2$ (say motivation and ability) potentially have non-zero covariance. Four job performance measures $D_1$, $D_2$, $D_3$ and $D_4$ are potentially related to $X_1$ and $X_2$ as follows:
\begin{eqnarray*}
D_1 & = & \alpha_1 + \beta_{11}X_1 + \beta_{12}X_2 + \epsilon_1 \\
D_2 & = & \alpha_2 + \beta_{21}X_1 + \beta_{22}X_2 + \epsilon_2 \\
D_3 & = & \alpha_3 + \beta_{31}X_1 + \beta_{32}X_2 + \epsilon_3 \\
D_4 & = & \alpha_4 + \beta_{41}X_1 + \beta_{42}X_2 + \epsilon_4,
\end{eqnarray*}
where the $\alpha$ and $\beta$ quantities are unknown parameters, $\epsilon_1$ through $\epsilon_4$ are independent of one another and independent of $X_1$ and $X_2$, and everything is normally distributed.
    \begin{enumerate}
    \item Make a path diagram of this model. Write $\beta_{ij}$ parameters on the appropriate arrows.
    \item What are the unknown parameters of this model? I count 21. You will have to make up some notation for the expected values, variances and covariances.
    \end{enumerate}

\item \label{SAS} Pigs are routinely given large doses of antibiotics even when they show no signs of illness, to protect their health under unsanitary conditions. Pigs were randomly assigned to one of three antibiotic drugs. Dressed weight (weight of the pig after slaughter and removal of head, intestines and skin) was the response variable. Explanatory variables are Drug type, Mother's live adult weight and Father's live adult weight. Data are in the plain text file
\begin{center}
\href{http://www.utstat.toronto.edu/~brunner/data/legal/pigweight.data.txt}
{\texttt{http://www.utstat.toronto.edu/$\sim$brunner/data/legal/pigweight.data.txt}}.
\end{center}
Once you have the raw data file open in a Web browser, you need to save the page to your computer and drag it to the \texttt{myfolders} sub-folder in your shared folder --- that is, to the folder that is shared between your computer and the virtual \texttt{linux} machine on which SAS is installed. Exactly how you save a web page to your computer depends on your Web browser.
\begin{itemize}
\item In Firefox, choose \texttt{Save Page As} from the \texttt{File} menu.
\item In Chrome, click on the wrench icon in the upper right corner, and choose \texttt{Save Page As}.
\item In Safari, choose \texttt{Save As \ldots} from the \texttt{File} menu, and then under \texttt{Format}, choose \texttt{Page Source}.
\end{itemize}
For this assignment, all you have to do is
    \begin{enumerate}
    \item Use \texttt{proc freq} to make a frequency distribution of drug condition.
    \item Use \texttt{proc means} to calculate means and standard deviations for all the quantitative variables, separately for each drug condition.
    \item Use \texttt{proc corr} to produce a correlation matrix of the quantitative variables.
    By default you will also get means and standard deviations \emph{not} broken down by drug condition.
    \end{enumerate}
While it will be useful to use \texttt{kars.sas} (presented in lecture) as an example, please do not imitate the whole thing. Do only what you are asked to do. For example, you are \emph{not} being asked to make dummy variables, and you are \emph{not} being asked to do a regression. Be able to answer questions like the following based on your results file:
\begin{itemize}
\item How many pigs received Drug 2?
\item What percentage of pigs received Drug 2?
\item What is the standard deviation of Father's Weight for pigs receiving Drug 3?
\item What is the mean dressed weight of the pigs in the sample?
\item What is the correlation between Mother's weight and Father's weight? What is the $p$-value? Is it statistically significant? Using the usual $\alpha=0.05$ significance level, are you able to conclude that heavier fathers tend to be paired with lighter mothers?
\end{itemize}
Please follow these guidelines. Marks will be deducted if you do not.
\begin{itemize}
\item Put your name and student number in a \texttt{title} or \texttt{title2} statement.
\item Do not write anything on the printouts except your name and student number. The other questions are just practice for the quiz, and are not to be handed in.
\item Bring your log file to the quiz, \emph{not} just a listing of the program file.
\item The log file and the output file must be from the same run of SAS.
%\item Your output file must have a time and date stamp. This is automatically generated if you save a pdf file or print from SAS Studio.
\item You must use \emph{your} installation of SAS, not the installation on someone else's computer.
\end{itemize}

\pagebreak

\item \label{mle} For each of the following distributions, derive a general expression for the Maximum Likelihood Estimator (MLE); don't bother with the second derivative test.
Then use the data to calculate a numerical estimate; you should bring a calculator to the quiz in case you have to do something like this. You will not be asked to carry out the second derivative test.
    \begin{enumerate}
    \item $p(x)=\theta(1-\theta)^x$ for $x=0,1,\ldots$, where $0<\theta<1$. Data: \texttt{4, 0, 1, 0, 1, 3, 2, 16, 3, 0, 4, 3, 6, 16, 0, 0, 1, 1, 6, 10}.
    % MLE=MOM= 0.2061856 = 1/(1+xbar)
    \item $f(x) = \frac{1}{\theta} e^{-x/\theta}$ for $x>0$, where $\theta>0$. Data: \texttt{0.28, 1.72, 0.08, 1.22, 1.86, 0.62, 2.44, 2.48, 2.96}
    % Exponential, true theta=2, thetahat = xbar MLE=MOM: 1.517778
    \item $f(x) = \frac{\alpha}{x^{\alpha+1}}$ for $x>1$, where $\alpha>0$. Data: \texttt{1.37, 2.89, 1.52, 1.77, 1.04, 2.71, 1.19, 1.13, 15.66, 1.43}
    % % Pareto alpha = 1 (one over uniform) MLE = 1/mean(log(x)) = 1.469102, MOM = xbar/(xbar-1) = 1.482859.
    \item $f(x) = \theta x^{\theta-1}$ for $0<x<1$, where $\theta>0$. Data: \texttt{0.04, 0.69, 0.86, 0.24, 0.99}
    % Beta, alpha=theta and beta=1. True theta=1, MLE = -1/mean(log(x)) = 0.965637, MOM = xbar/(1-xbar) = 1.293578
    \end{enumerate}

\item For each of the distributions in Question~\ref{mle}, derive a formula for the Method of Moments estimator, and calculate it for the given data. To do this you need the expected values, and while it would be ``interesting'' to calculate them yourself, that's not a goal of this course. So, here are the expected values.
    \begin{enumerate}
    \item Geometric: $E(X)=\frac{1-\theta}{\theta}$
    \item Exponential: $E(X)=\theta$
    \item Pareto: $E(X)=\frac{\alpha}{\alpha-1}$ for $\alpha>1$. For $0<\alpha\leq 1$, the expected value does not exist.
    \item Beta with $\alpha=\theta$ and $\beta=1$: $E(X)=\frac{\theta}{\theta+1}$.
    \end{enumerate}

\item Let $Y_i = \beta x_i + \epsilon_i$ for $i=1, \ldots, n$, where $\epsilon_1, \ldots, \epsilon_n$ are a random sample from a normal distribution with expected value zero and variance $\sigma^2$. The parameters $\beta$ and $\sigma^2$ are unknown constants.
The numbers $x_1, \ldots, x_n$ are known, observed constants.
    \begin{enumerate}
    \item What is the parameter space $\Theta$?
    \item Find the Maximum Likelihood Estimator of the pair $(\beta,\sigma^2)$. Show your work.
    % \item Find a Method of Moments estimator of $\beta$ based on $\overline{Y}$. Don't bother estimating $\sigma^2$ by the Method of Moments. Start by calculating $E(\overline{Y})$, which does not equal $E(Y_i)$ for this problem.
    % MOM does not quite fit the definition with fixed x.
    \item \label{numbers} Based on the small data set below, calculate
    % both your MLE and your Method of Moments estimator
    the MLE for $\beta$. Your answer is a number. Bring a calculator in case you have to do something like this on the quiz.
\begin{verbatim}
x  0.0  1.3  3.2 -2.5 -4.6 -1.6  4.5  3.8
y -0.8 -1.3  7.4 -5.2 -6.5 -4.9  9.9  7.2
\end{verbatim}
    % MLE: 1.8885 and MOM = 1.414634
    \end{enumerate}

\end{enumerate}

\end{document}