% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout mode to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
\usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols
\usetheme{AnnArbor} % CambridgeUS Blue and yellow, shows current section title
% \usetheme{Berlin} % Displays sections on top
\usepackage[english]{babel}
% \definecolor{links}{HTML}{2A1B81}
% \definecolor{links}{red}
\setbeamertemplate{footline}[frame number]
\mode<presentation>
% \mode<presentation>{\setbeamercolor{background canvas}{bg=black!5}}
\title{Log-Normal Regression\footnote{See last slide for copyright information.}}
\subtitle{STA312 Fall 2023}
\date{} % To suppress date

\begin{document}

\begin{frame}
\titlepage
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Overview}
\tableofcontents
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The Model}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{The Log-Normal Distribution}
%\framesubtitle{}
\begin{itemize}
\item Failure time $t \sim$ Log-Normal$(\mu,\sigma^2)$ means $y = \log(t) \sim N(\mu,\sigma^2)$. \pause
\item $y = \log(t) \Leftrightarrow t = e^y$. \pause
\item The log-normal density is
\begin{displaymath}
f(t|\mu,\sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \,
\exp\left\{-\frac{(\log(t)-\mu)^2}{2\sigma^2}\right\} \, \frac{1}{t}
\end{displaymath}
\item $P(t>0)=1$, right-skewed.
\item The median is $e^\mu$ and the expected value is $e^{\mu+\frac{1}{2}\sigma^2}$.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Regression}
%\framesubtitle{}
\begin{itemize}
\item In normal regression, $\mu_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1}$.
\item So just take logs of the failure times and do normal regression.
\item Lots of things are familiar, except for censoring. \pause
\item Because of censoring, formulas like
$\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$
do not apply.
\item $F$ and $t$ distributions do not apply.
\item Everything is large-sample maximum likelihood (see the R sketch at the end of this section).
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Interpretation in Terms of Failure Time}
%\framesubtitle{}
\begin{itemize}
\item People think in terms of time, not log time.
\item Don't talk about log failure time, except to statisticians. \pause
\item $\mu_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} = \mathbf{x}_i^\top\boldsymbol{\beta}$. \pause
\item The quantity $\mu_i$ has meaning on the time scale. The median failure time for a
log-normal is $e^{\mu_i}$. The mean failure time is $e^{\mu_i+\frac{1}{2}\sigma^2}$. \pause
\item Anything that makes $\mathbf{x}_i^\top\boldsymbol{\beta}$ larger makes the average failure time larger.
\item Ideas like positive and negative relationship, ``controlling for,'' etc.~carry over directly.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
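%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[fragile]
\frametitle{Fitting the Model in R: A Sketch}
%\framesubtitle{}
A minimal sketch using \texttt{survreg} from the \texttt{survival} package. The data frame
\texttt{mydata} and the variables \texttt{t} (failure time), \texttt{delta} (1 = failure
observed, 0 = right censored), \texttt{x1} and \texttt{x2} are hypothetical placeholders.
{\footnotesize
\begin{verbatim}
# Log-normal regression by maximum likelihood, allowing right censoring.
library(survival)
fit <- survreg(Surv(t, delta) ~ x1 + x2, data = mydata,
               dist = "lognormal")
summary(fit)           # Large-sample Z tests for the betas
betahat  <- coef(fit)  # Estimated regression coefficients
sigmahat <- fit$scale  # Estimated sigma on the log-time scale
# Estimated median failure time at x: exp(x'betahat)
# Estimated mean failure time at x:   exp(x'betahat + sigmahat^2/2)
\end{verbatim}
} % End size
\end{frame}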
\section{Prediction Intervals}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Prediction Intervals} \pause
%\framesubtitle{}
\begin{itemize}
\item You have a good log-normal regression analysis of a set of data.
\item You want to predict the value of a future observation, given the explanatory variable values.
\item That is, you have $\mathbf{x}_{n+1}$ and you want to predict $t_{n+1}$. This is a very practical goal. \pause
\item A natural prediction would be the estimated median for those $\mathbf{x}_{n+1}$ values:
$e^{\mathbf{x}_{n+1}^\top\widehat{\boldsymbol{\beta}}}$. \pause
\item[]
\item Estimates and predictions are more valuable when they come with a margin of error, or interval of likely values.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Prediction versus Estimation} \pause
%\framesubtitle{}
\begin{itemize}
\item In statistics, we estimate parameters -- or functions of parameters.
\item These are fixed constants.
\item With increasing sample size, the width of the confidence interval shrinks to zero. \pause
\item Prediction tries to ``estimate'' the value of a random variable. \pause
\item There is uncertainty about the average value, and further uncertainty that comes from
variation of random variables around the average.
\item Prediction intervals are always wider than confidence intervals.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Prediction for the Normal Model}
%\framesubtitle{}
\begin{itemize}
\item Prediction intervals for normal regression are straightforward. \pause
\item In survival analysis, the distinction between confidence intervals and prediction intervals is largely ignored.
\item This is probably because the distribution theory for prediction intervals is so hard. \pause
\item Except for log-normal regression \ldots
\item So the following is ``new,'' and based on the derivation for ordinary regression.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Derivation of the Prediction Interval}
\framesubtitle{Details will be covered in the sample problems}
\pause
\begin{itemize}
\item Get a point prediction and interval for $y_{n+1} = \log(t_{n+1})$, and then take the
exponential function. \pause
\begin{eqnarray*}
0.95 & \approx & P(A < y_{n+1} < B) \\ \pause
     & =       & P(e^A < e^{y_{n+1}} < e^B) \\ \pause
     & =       & P(e^A < t_{n+1} < e^B)
\end{eqnarray*} \pause
\item $\widehat{\boldsymbol{\beta}} \stackrel{.}{\sim} N(\boldsymbol{\beta},\mathbf{C}_n)$. \pause
\item $\widehat{y}_{n+1} = \mathbf{x}_{n+1}^\top\widehat{\boldsymbol{\beta}}
\stackrel{.}{\sim} N\left(\mathbf{x}_{n+1}^\top\boldsymbol{\beta}, \,
\mathbf{x}_{n+1}^\top\mathbf{C}_n\mathbf{x}_{n+1}\right)$. \pause
\item $y_{n+1} \sim N\left(\mathbf{x}_{n+1}^\top\boldsymbol{\beta},\sigma^2\right)$. \pause
\item $y_{n+1}$ and $\widehat{y}_{n+1}$ are independent. \pause
\item $y_{n+1} - \widehat{y}_{n+1} \stackrel{.}{\sim}
N\left(0, \, \sigma^2 + \mathbf{x}_{n+1}^\top\mathbf{C}_n\mathbf{x}_{n+1}\right)$.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Derivation of the Prediction Interval Continued}
\begin{itemize}
\item $y_{n+1} - \widehat{y}_{n+1} \stackrel{.}{\sim}
N\left(0, \, \sigma^2 + \mathbf{x}_{n+1}^\top\mathbf{C}_n\mathbf{x}_{n+1}\right)$ \pause
\item $Z = \frac{y_{n+1} - \widehat{y}_{n+1}}
{\sqrt{\widehat{\sigma}^2 + \mathbf{x}_{n+1}^\top \widehat{\mathbf{C}}_n\mathbf{x}_{n+1}}}
\stackrel{.}{\sim} N(0,1)$ \pause
\begin{eqnarray*}
0.95 & \approx & P(-1.96 < Z < 1.96) \\ \pause
     & = & P\left(-1.96 < \frac{y_{n+1} - \widehat{y}_{n+1}}
{\sqrt{\widehat{\sigma}^2 + \mathbf{x}_{n+1}^\top \widehat{\mathbf{C}}_n\mathbf{x}_{n+1}}}
< 1.96 \right)
\end{eqnarray*} \pause
\item Isolate $y_{n+1}$. \pause
\item The prediction interval is $\widehat{y}_{n+1} \pm 1.96
\sqrt{\widehat{\sigma}^2 + \mathbf{x}_{n+1}^\top \widehat{\mathbf{C}}_n \mathbf{x}_{n+1}}$. \pause
\item The exponential function of the endpoints gives a prediction interval for $t_{n+1}$.
\end{itemize}
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Derivation of the Prediction Interval Concluded}

The prediction interval for $t_{n+1}$ runs from
\begin{displaymath}
\exp\left( \mathbf{x}_{n+1}^\top\widehat{\boldsymbol{\beta}} - 1.96
\sqrt{\widehat{\sigma}^2 + \mathbf{x}_{n+1}^\top \widehat{\mathbf{C}}_n \mathbf{x}_{n+1}} \right)
\end{displaymath}
to
\begin{displaymath}
\exp\left( \mathbf{x}_{n+1}^\top\widehat{\boldsymbol{\beta}} + 1.96
\sqrt{\widehat{\sigma}^2 + \mathbf{x}_{n+1}^\top \widehat{\mathbf{C}}_n \mathbf{x}_{n+1}} \right)
\end{displaymath} \pause
where $\widehat{\mathbf{C}}_n$ is the estimated asymptotic covariance matrix of
$\widehat{\boldsymbol{\beta}}$, obtained from the inverse of the Hessian. A sketch of this
calculation in R is on the next slide.
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
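\begin{frame}[fragile]
\frametitle{Computing the Prediction Interval in R: A Sketch}
%\framesubtitle{}
A minimal sketch of the calculation above, assuming a fitted \texttt{survreg} object
\texttt{fit} as in the earlier sketch, and a vector \texttt{xnew} of explanatory variable
values for the new case. All object names are hypothetical.
{\footnotesize
\begin{verbatim}
# xnew includes a leading 1 for the intercept, in the order of coef(fit).
betahat  <- coef(fit)
p        <- length(betahat)
Cn       <- vcov(fit)[1:p, 1:p]   # Drop the Log(scale) row and column
sigmahat <- fit$scale
yhat <- sum(xnew * betahat)       # Predicted log failure time
me   <- 1.96 * sqrt(sigmahat^2 + drop(t(xnew) %*% Cn %*% xnew))
exp(c(lower = yhat - me, upper = yhat + me))  # Interval for the new t
\end{verbatim}
} % End size
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%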
\begin{frame}
\frametitle{Copyright Information}

This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner},
Department of Statistics, University of Toronto. It is licensed under a
\href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you
like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/brunner/oldclass/312f23}
{\footnotesize \texttt{http://www.utstat.toronto.edu/brunner/oldclass/312f23}}

\end{frame}

\end{document}

# Residual Plots with R

# Plot residuals against independent variables not in the equation.
# True model: E[Y|X1,X2] = beta0 + beta1*X1 + beta2*X2, with beta2 < 0.
# Fit a model with just X1.

rm(list=ls())

rmvn <- function(nn,mu,sigma)
# Returns an nn by kk matrix, rows are independent MVN(mu,sigma)
    {
    kk <- length(mu)
    dsig <- dim(sigma)
    if(dsig[1] != dsig[2]) stop("Sigma must be square.")
    if(dsig[1] != kk) stop("Sizes of sigma and mu are inconsistent.")
    ev <- eigen(sigma,symmetric=T)
    sqrl <- diag(sqrt(ev$values))
    PP <- ev$vectors
    ZZ <- rnorm(nn*kk) ; dim(ZZ) <- c(kk,nn)
    rmvn <- t(PP%*%sqrl%*%ZZ+mu)
    rmvn
    } # End of function rmvn

set.seed(9999)
n = 50
b0 = 10; b1 = 1; b2 = -1; sig = 10
Phi = rbind(c(100,50),
            c(50,100))
IV = rmvn(n,c(50,50),Phi)
X1 = IV[,1]; X2 = IV[,2]
epsilon = rnorm(n,0,sig)
Y = b0 + b1*X1 + b2*X2 + epsilon
cor(cbind(X1,X2,Y))
mod1 = lm(Y~X1); residual = mod1$residuals #$
plot(X2,residual,xlab=expression(X[2]),
     ylab=expression(paste('Residual from model with just ',X[1])))
title(expression(paste('True model has both ', X[1], ' and ',X[2])))

# Plot residuals against the fitted values: suspect a curvilinear
# relationship with one or more X variables.

set.seed(999)
n = 50
b0 = 10; b1 = 1; b2 = 1/10; sig = 10
Phi = rbind(c(100,50),
            c(50,100))
IV = rmvn(n,c(50,50),Phi)
X1 = IV[,1]; X2 = IV[,2]
# X2 = pnorm(X2,50,10) * 80 # Uniform on (0,80) but correlated with X1
epsilon = rnorm(n,0,sig)
Y = b0 + b1*X1 + b2*X2 + 10*sqrt(900-(X2-50)^2) + epsilon
cor(cbind(X1,X2,Y))
mod3 = lm(Y~X1+X2); residual = mod3$residuals; Yhat = mod3$fitted.values
plot(Yhat,residual, ylab="Residual")
title(expression(paste('Suspect Curvilinear Relationship with one or more X variables')))

# Plot residuals against independent variables in the equation:
# detect a curvilinear relationship.

set.seed(999)
n = 50
b0 = 10; b1 = 1; b2 = 1/10; sig = 10
Phi = rbind(c(100,50),
            c(50,100))
IV = rmvn(n,c(50,50),Phi)
X1 = IV[,1]; X2 = IV[,2]
# X2 = pnorm(X2,50,10) * 80 # Uniform on (0,80) but correlated with X1
epsilon = rnorm(n,0,sig)
Y = b0 + b1*X1 + b2*X2 + 10*sqrt(900-(X2-50)^2) + epsilon
cor(cbind(X1,X2,Y))
mod3 = lm(Y~X1+X2); residual = mod3$residuals #$
plot(X2,residual,xlab=expression(X[2]),
     ylab=expression(paste('Residual from model with E(Y|',bold(X),') = ',
                           beta[0]+beta[1]*X[1]+beta[2]*X[2])))
title(expression(paste('Detect Curvilinear Relationship with ',X[2])))

# Plot residuals against independent variables in the equation:
# detect variance increasing with X1.

set.seed(9999)
n = 50
b0 = 10; b1 = 1; b2 = 1
Phi = rbind(c(100,50),
            c(50,100))
IV = rmvn(n,c(50,50),Phi)
X1 = IV[,1]; X2 = IV[,2]
sig = seq(from=1,to=20,length.out=n)
sig = sig[rank(X1)] # Sig an increasing function of X1
epsilon = rnorm(n,0,sig)
Y = b0 + b1*X1 + b2*X2 + epsilon
mod2 = lm(Y~X1+X2); Residual = mod2$residuals #$
plot(X1,Residual,xlab=expression(X[1]))
title(expression(paste('Detect Variance Increasing with ',X[1])))