\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}

\begin{center}
{\Large \textbf{STA 312f23 Assignment Four}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/brunner}{Jerry Brunner}, Department of Mathematical and Computational Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/brunner/oldclass/312f23}
{\texttt{http://www.utstat.toronto.edu/brunner/oldclass/312f23}}}
\vspace{1 mm}
\end{center}

%\textbf{Need to fix up Gumbel}

\noindent The paper and pencil parts of this assignment are not to be handed in. They are practice for Quiz 4 on Oct.~6th. The R input and output for Question~\ref{WeibillR} may be handed in as part of the quiz. \textbf{Bring a hard copy of your printout to the quiz}. Do not write anything on your printout in advance except possibly your name and student number.

\vspace{2mm}

\noindent Unless otherwise noted, $T$ is a continuous random variable with $P(T>0)=1$, density $f(t)$ and cumulative distribution function $F(t) = P(T \leq t)$.

\begin{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%% Survival and Hazard %%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \label{WeibillR} This is a repeat of Question 3(e) from Assignment 3. For background, see your answers to the earlier parts of the question.
The file \href{http://www.utstat.toronto.edu/brunner/data/legal/Weibull.data1.txt}
{\texttt{http://www.utstat.toronto.edu/brunner/data/legal/Weibull.data1.txt}} contains a random sample from a Weibull distribution. Read it with \texttt{scan()}.
\begin{enumerate}
\item Use R's \texttt{optim} function to find the MLE. The answer is a pair of numbers from your printout.
\item Calculate a 95\% confidence interval for $\alpha$. The answer is a pair of numbers from your printout. My lower confidence limit is \texttt{1.780622}.
\item Give a point estimate of the expected value for the Weibull data. The answer is a number that you calculate with R. It should appear on your printout. The answer should be fairly close to the sample mean. Why?
\item Calculate a 95\% confidence interval for the expected value. Hint: to apply the delta method, you will need the derivative of the gamma function. See \texttt{help(digamma)}. Because of the Central Limit Theorem, the \texttt{t.test} function yields a confidence interval for the expected value $\mu$ that is close to the right answer.
\item Give a point estimate of the median for the Weibull data. The answer is a number that you calculate with R. It should appear on your printout. The answer should be fairly close to the sample median.
\item Give a 95\% confidence interval for the median. The answer is a pair of numbers on your printout, the lower confidence limit and the upper confidence limit. My lower confidence limit is \texttt{3.182938}.
\item This is the last part of the question, and you should do it last. Using the usual 0.05 significance level, test whether the expected value equals the median. What do you conclude? There is more than one right way to do this. I did it several ways. You only need to do it one way. Using the delta method, I got $Z=9.92$. Another approach is to observe that the mean equals the median if and only if $\alpha$ has a particular value.
I found that value numerically, and tested $H_0:\alpha=\alpha_0$, obtaining $Z=-22.71$ and $G^2=381.89$. Note that the $Z$-tests are 2-sided, so the difference in sign is not a problem.
\end{enumerate} % End of numerical Weibull questions

\item The survival function is $S(t) = P(T>t)$. Prove $\displaystyle E(T) = \int_0^\infty S(t) \, dt$.

\item The hazard function is denoted by $h(t)$, and defined on the formula sheet. Starting with the definition, prove $h(t) = \frac{f(t)}{S(t)}$.

\item Prove $\displaystyle S(t) = e^{-\int_0^t h(x) \, dx}$. You may use anything on the formula sheet except the fact you are proving.

\item Let $T$ have a Weibull distribution with parameters $\alpha>0$ and $\lambda>0$.
\begin{enumerate}
\item Derive the survival function $S(t)$ for $t>0$.
\item What is the hazard function $h(t)$ for $t>0$?
\item For what values of $\alpha$ and $\lambda$ is $h(t)$ increasing? Decreasing?
\item What happens to $h(t)$ as $t \rightarrow \infty$?
\end{enumerate}

\item \label{bowl} Let the continuous random variable $T$ have hazard function $h(t) = (t-2)^2$ for $t>0$, so that the risk of failure decreases at first, and then increases without bound.
\begin{enumerate}
\item What is the survival function $S(t)$ for $t>0$? Show your work.
\item What is the density $f(t)$ for $t>0$? Show a little work.
% \item Using R, make a plot of $f(t)$. Bring hard copy to the quiz, including the R code that generated the plot.
\end{enumerate}

\pagebreak

%%%%%%%%%%%%%%%%%%%%%%%%%%% Weibull and Gumbel %%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \label{gumbel} Let $X$ have an exponential distribution with $\lambda=1$, and let $Y=-\log(X)$. The distribution of $Y$ is another version of the (standard) Gumbel, or extreme value distribution\footnote{In Assignment One, you saw another version of the Gumbel distribution, in which $Y=\log(X)$. }. The extreme value distribution plays an important role in the analysis of survival data, and is also used to model rare events like natural disasters.
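As a numerical companion to the Wolfram Alpha check suggested in the expected-value part below, here is a minimal sketch in Python rather than R. The helper names \texttt{gumbel\_density} and \texttt{simpson} are hypothetical. The sketch assumes that the standard Gumbel density is $f(y)=e^{-y-e^{-y}}$, the function that multiplies $y$ in that Wolfram Alpha query. It truncates the range of integration to $[-10, 40]$, where the density is negligible, and uses composite Simpson's rule to check three things: that the density integrates to one, that $E(Y)$ agrees with the Euler-Mascheroni constant $\gamma \approx 0.5772$, and that $\mathrm{Var}(Y)$ agrees with the value $\pi^2/6 \approx 1.6449$ quoted at the end of this assignment.

\begin{verbatim}
import math

# Euler-Mascheroni constant, hard-coded (the math module does not provide it).
EULER_GAMMA = 0.5772156649015329

def gumbel_density(y):
    """Assumed standard Gumbel density of Y = -log(X), X ~ exponential(1)."""
    return math.exp(-y - math.exp(-y))

def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for g on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4.
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# The density is negligible outside [-10, 40], so truncate the infinite range.
LOWER, UPPER = -10.0, 40.0

area = simpson(gumbel_density, LOWER, UPPER)
mean = simpson(lambda y: y * gumbel_density(y), LOWER, UPPER)
second = simpson(lambda y: y * y * gumbel_density(y), LOWER, UPPER)
variance = second - mean ** 2

print(f"integral of f(y): {area:.8f}")
print(f"E(Y):   {mean:.8f}   gamma:   {EULER_GAMMA:.8f}")
print(f"Var(Y): {variance:.8f}   pi^2/6:  {math.pi ** 2 / 6:.8f}")
\end{verbatim}

\noindent Running the script should print values that match $1$, $\gamma$ and $\pi^2/6$ to several decimal places. In R, the built-in \texttt{integrate} function could do the same job.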
\begin{enumerate}
\item Where is the density of $Y$ non-zero?
\item Find the probability density function of $Y$. Show your work.
% \item \label{gplot} Using R, make a plot of the standard Gumbel density. Bring hard copy to the quiz, including the R code that generated the plot.
\item What is the median of $Y$? The answer is a number. Use your calculator. % loglog2 = -0.3665 or so.
\item The \emph{mode} of a continuous distribution is the point where the density is highest. What is the mode of $Y$? Show your work.
\item What is the survival function $S(y)$?
\item Finding the expected value of $Y$ is surprisingly difficult. Trust me that if you try to use the definition of expected value, you will not be able to do the integral. Instead, use moment-generating functions. Recall that the moment-generating function of $Y$ is $M(t)=E(e^{Yt})$, and $M^\prime(0) = E(Y)$.
\begin{enumerate}
\item Derive the moment-generating function of $Y$. Show your work.
\item Differentiate with respect to $t$ and set $t=0$. Using R's \texttt{digamma} function, get a numerical answer.
% Please put the one line of calculation on the same printout as Question~\ref{gplot}.
You can check your answer by giving this to Wolfram Alpha: \verb|integral of y*exp(-y-e^(-y)) from y = minus infinity to infinity|. A minor bonus is that we find the expected value to be $\gamma$, where $\gamma$ is the Euler-Mascheroni constant. Who knew? By the way, ChatGPT got the same wrong answer (zero) three times in a row.
\end{enumerate}
\end{enumerate} % End of standard Gumbel distribution questions.

\item Let $Z \sim N(0,1)$, and let $X = \sigma Z + \mu$, where $\sigma>0$. Find the density of $X$. Show your work. Identify the distribution by name. It is on the formula sheet.

\item Let the continuous random variable $Z$ have density $f(z)$, and let $X = \sigma Z + \mu$, where $\sigma>0$. Show that the density of $X$ is $f_x(x) = \frac{1}{\sigma}f\left( \frac{x-\mu}{\sigma}\right)$.
The quantity $\mu$ is called a \emph{location parameter}, and $\sigma$ is called a \emph{scale parameter}.

\item \label{nsgumbel} Let $Z$ have the standard extreme value distribution of Question~\ref{gumbel}, and let $X = \sigma Z + \mu$, where $\sigma>0$. Give the density of $X$. This is a Gumbel (extreme value) distribution with location $\mu$ and scale $\sigma$. Is $\mu$ the expected value?

\pagebreak

\item \label{logweibull} Let $T$ have a Weibull distribution with parameters $\alpha>0$ and $\lambda>0$, and let $Y = -\log T$.
\begin{enumerate}
\item Find the density of $Y$; show your work. Do not forget to specify where the density is non-zero.
\item Now re-parameterize, meaning express the parameters in a different, equivalent way. Instead of the parameters $\alpha$ and $\lambda$, we will have $\mu$ and $\sigma$. Let $\sigma = \frac{1}{\alpha}$ and $\mu=\log\lambda$. Write the density of $Y$ in terms of $\mu$ and $\sigma$. Simplify, and compare your answer to Question~\ref{nsgumbel}.
\end{enumerate}
The lesson here is that the (minus) log of a Weibull random variable has an extreme value (Gumbel) distribution. So if you believe the distribution of a set of failure time data could be Weibull (a popular choice), you can log-transform the data and apply a Gumbel model. The Gumbel distribution may be preferable because the parameters $\mu$ and $\sigma$ are easy to interpret.

% \pagebreak

\item This question is about the meaning of $\mu$ and $\sigma$ in the Gumbel distribution. You can use your answers to earlier questions to make it easier. Show your work when necessary. Let $Y$ have density
\begin{displaymath}
f(y) = \frac{1}{\sigma} \, \exp\left\{ -\left( \frac{y-\mu}{\sigma}\right) - e^{ -\left(\frac{y-\mu}{\sigma}\right)} \right\} .
\end{displaymath}
\begin{enumerate}
\item What is the survival function $S(y)$?
\item The hazard function $h(y)$ does not simplify. Forget it.
\item What is the mode?
\item What is the median?
\item What is the expected value?
Write your answer in terms of $\gamma$, the Euler-Mascheroni constant.
\item The variance of a standard Gumbel is $\frac{\pi^2}{6}$, though this is not easy to show. How do you know that the variance of a general Gumbel (with density given at the beginning of this question) is proportional to $\sigma^2$?
\end{enumerate}

\end{enumerate} % End of all the questions

\noindent \textbf{Bring your printout from Question~\ref{WeibillR} to the quiz.} Do not write anything on your printout in advance except possibly your name and student number.

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% See work and quiz files for R work.