\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers
\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 312f23 Assignment Ten}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Mathematical and Computational Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}
{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/brunner/oldclass/312f23}
{\texttt{http://www.utstat.toronto.edu/brunner/oldclass/312f23}}}
\vspace{1 mm}
\end{center}
\noindent
The paper and pencil part of this assignment is not to be handed in. It is practice for Quiz~10 on November 24th. The R parts may be handed in as part of the quiz. \textbf{Bring hard copy of your printout for Questions~\ref{veteran} and~\ref{cancer} to the quiz}. Do not write anything on your printouts in advance except possibly your name and student number. \emph{Answers to the ``plain language" questions are specifically prohibited.} Do not write them, or type them, or otherwise cause them to appear on your printouts.
%\vspace{5mm}
\begin{enumerate}
\item Our main concrete example of a proportional hazards regression model is Weibull regression.
\begin{enumerate}
\item What is the baseline hazard function for Weibull regression? Assume $e^{\beta_0}$ is part of the baseline hazard function.
\item Suppose that the Weibull regression model is the true model for a set of data.
When we fit a proportional hazards regression model by maximum partial likelihood and estimate $\beta_1$, what function of the Weibull regression model parameters are we estimating?
\end{enumerate}
\item Prove $S(t) = e^{-H(t)}$, where $H(t) = \int_0^t h(y) \, dy$. This is a general statement, not just for the proportional hazards model.
\item For the proportional hazards model, again assume that $e^{\beta_0}$ is part of the baseline hazard function. We will always do this from now on. Prove that for the proportional hazards model, $S(t) = S_0(t)^{\exp\{\mathbf{x}^\top\boldsymbol{\beta}\}}$.
\item A sample of lung cancer patients is classified according to their type of cancer: squamous, small cell, adenocarcinoma, and large cell. We also have age and physician's rating of how far the disease has progressed on a scale from 1 to 10, which we will call ``severity." Small cell lung cancer is found exclusively in smokers, ex-smokers, and people who have worked in the asbestos industry. \emph{For this entire question, assume a proportional hazards regression model.}
\begin{enumerate}
\item \label{haz} Write the hazard function, denoting the length of time between diagnosis and death (call it survival time) by $t$. Denote age by $x_1$ and disease severity by $x_2$. There should be \emph{no interactions} in the model, in case you know what that is. You do not need to say how the dummy variables are defined. You will do that in the next part. Complete the equation below.

\vspace{3mm}
$h(t) = $
\vspace{3mm}

\item In the table below, make columns showing how your dummy variables are defined. In the last column, write the hazard function $h(t)$ given a particular vector of explanatory variable values $\mathbf{x}$, using the notation of your model from Question~\ref{haz} above. If \emph{symbols} for your dummy variables appear in the last column, the answer is wrong.
\vspace{4mm}
\hspace{4.2in} $h(t)$
\begin{center}
\renewcommand{\arraystretch}{2.5}
\begin{tabular}{|l|c|c|} \hline
Squamous   & \hspace{50mm} & \hspace{70mm} \\ \hline
Small Cell &               &               \\ \hline
Adeno      &               &               \\ \hline
Large Cell &               &               \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1.0}
\end{center}
\vspace{10mm}
\item In the notation of your model, what is the hazard function for a 45-year-old patient with adenocarcinoma and a disease severity of 6?
% \pagebreak %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item For a patient with small-cell lung cancer, the hazard of death is \underline{\hspace{20mm}} times as great as the hazard for a patient with large-cell lung cancer. Answer in terms of the Greek letters from your model. Do age and disease severity affect the answer (in this model)? Does time $t$ affect the answer?
\item For a 47-year-old patient with squamous lung cancer and a disease severity of 3, the chances of death are \underline{\hspace{20mm}} times as great as the chances for a 47-year-old with adenocarcinoma and a disease severity of 3. Answer in terms of the Greek letters from your model.
\item You want to know whether, controlling for age and disease severity, type of lung cancer has any effect on the risk of death. What is the null hypothesis? Answer in terms of the Greek letters from your model.
% \pagebreak %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\item That last question could be answered with either a large-sample likelihood ratio test, or a Wald test.
\begin{enumerate}
\item Suppose you decided on a likelihood ratio test. Write the hazard function for the restricted model.

\vspace{3mm}
$h(t) =$
\vspace{3mm}

\item Suppose you decided on a Wald test. Write the null hypothesis $H_0: \mathbf{L}\boldsymbol{\theta} = \mathbf{0}$ in matrix form.
\end{enumerate}
\item You want to know whether, allowing for type of cancer and disease severity, the patient's age has any connection to risk of death. What is the null hypothesis?
Answer in terms of the Greek letters from your model.
\item You want to know whether, controlling for age and disease severity, the hazard is different for patients with large-cell or small-cell cancer. What is the null hypothesis? Answer in terms of the Greek letters from your model.
\item You want to know whether, controlling for age and disease severity, risk is different for patients with squamous lung cancer or adenocarcinoma. What is the null hypothesis? Answer in terms of the Greek letters from your model.
\end{enumerate}
% \pagebreak
\item \label{veteran} The classic data set \texttt{veteran} is available as part of the \texttt{survival} package. Type \texttt{help(veteran)} for details. Look at \texttt{contrasts(veteran\$celltype)} to see how the dummy variables for cell type are set up. It's not what you might expect.
\begin{enumerate}
\item \label{model1} Based on a preliminary analysis (one that you don't have to do), I request that you fit a proportional hazards regression model with just experimental treatment, cell type and Karnofsky score. Based on this model, carry out significance tests to answer the following questions. Be able to state $H_0$, give the value of the test statistic ($Z$ or chi-squared) and the $p$-value. Be able to state your conclusions, if any, in plain, non-statistical language. Guidelines for the plain language statements are
\begin{itemize}
\item Be guided by the 0.05 significance level, but never mention it. If you do, you get a zero even if what you say is correct.
\item Any use of statistical vocabulary such as $p$-value, null hypothesis, significance etc. will get you a zero. Instead of saying ``controlling for," say ``allowing for," or ``correcting for," or ``taking into account." The phrase ``controlling for" will not get you a zero, but please avoid it when talking to non-statisticians.
\item If a directional conclusion is possible, make it. Don't say ``Survival time was related to sex."
Say ``Women tended to live longer."
\item If a test is not significant, do \emph{not} say there was no effect, or no difference. Avoid accepting the null hypothesis, or implying that you accept it. Say ``There was no evidence that surgery was related to survival time," or ``These results do not provide evidence of a connection between marital status and time required to graduate," or something like that.
\item For any explanatory variable that was \emph{not} randomly assigned, avoid language that suggests influence, or causal connection. Say ``Patients with health club memberships were at less risk for heart attack," not ``Exercise lowered the risk of heart attacks."
\end{itemize} % End plain language guidelines
%\hrule
Now here are the questions.
\begin{enumerate}
\item Controlling for cell type and Karnofsky score, does treatment appear to affect survival time?
\item Allowing for experimental treatment and cell type, does Karnofsky score help predict survival? In spite of the word ``predict," you are being asked for a significance test.
\item Correcting for experimental treatment and Karnofsky score, do patients with different types of cancer (cell type) differ in their hazard of dying? Do a partial likelihood ratio test.
\item Follow up the last question by carrying out tests for all pairwise comparisons of cancer types, controlling for the other variables. Some of the comparisons you want are $z$-tests on the \texttt{summary} output. Use Wald tests for the other comparisons. Directional conclusions are possible for all the tests that are statistically significant, including the Wald tests.
\end{enumerate}
\end{enumerate} % End veteran questions
\item \label{cancer} This question uses the same old \texttt{cancer} data set you have been analyzing in the past two assignments. Even though you may be getting tired of it, there is an interesting technical question we have not explored.
\emph{Please print the output for this question on a separate sheet.}
\begin{enumerate}
\item Fit a proportional hazards model using the same explanatory variables you did for Weibull regression and log-normal regression: \texttt{sex} and \texttt{ph.ecog}. Be able to state the conclusions in plain, non-statistical language. See Question~\ref{model1} for guidelines.
\item Now do a table of \texttt{ph.ecog}. Also do \texttt{summary}. You can see that even though it's technically a 6-point scale, in practice the physicians are using just a few categories. It makes you wonder whether we should be treating \texttt{ph.ecog} as quantitative or categorical.
\item \label{dummies} Fit a model with \texttt{sex} and \texttt{ph.ecog}, in which \texttt{ph.ecog} is represented by dummy variables. Before you do this, make \texttt{ph.ecog = 3} into \texttt{NA}, since there's only one patient.
\item If \texttt{ph.ecog} is quantitative, the proportional hazards model says the hazard is multiplied by $e^{\beta_2}$ when we increase by one unit (whether it's from 0 to 1 or from 1 to 2). Express this as a null hypothesis about the parameters of your model from Question~\ref{dummies}.
\item Test the null hypothesis with a Wald test. What do you conclude? (This is \emph{not} a plain language question.) Does it seem okay to treat \texttt{ph.ecog} as quantitative?
\end{enumerate} % End computer questions
\end{enumerate} % End of all the questions
\noindent
Please bring \textbf{both} printouts to the quiz. Your printout should show \emph{all} R input and output, and \emph{only} R input and output. Do not write anything on your printouts in advance except your name and student number. The rule is that you may not put anything on your printout that you could not have known before seeing the results. So question numbers are okay. You may even copy-paste the entire questions (for the computer parts) into comment statements if you wish. But results, conclusions and interpretation are not allowed.
In particular, answers to the ``plain language" questions must not appear on your printout.
% \pagebreak %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%