\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 312s19 Assignment Ten}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Mathematical and Computational Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/312s19} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/312s19}}}
\vspace{1 mm}
\end{center}

\noindent
The paper and pencil part of this assignment is not to be handed in. It is practice for Quiz~10 on March 25th. The R parts may be handed in as part of the quiz. \textbf{Bring hard copy of your printout for Questions~\ref{veteran} and~\ref{cancer} to the quiz}. Do not write anything on your printouts in advance except possibly your name and student number. \emph{Answers to the ``plain language" questions are specifically prohibited.} Do not write them, or type them, or otherwise cause them to appear on your printouts.

%\vspace{5mm}

\begin{enumerate}

\item Our main concrete example of a proportional hazards regression model is Weibull regression.
    \begin{enumerate}
    \item What is the baseline hazard function for Weibull regression? Assume $e^{\beta_0}$ is part of the baseline hazard function.
    \item Suppose that the Weibull regression model is the true model for a set of data.
    When we fit a proportional hazards regression model by maximum partial likelihood and estimate $\beta_1$, what function of the Weibull regression model parameters are we estimating?
    \end{enumerate}

\item Prove $S(t) = e^{-H(t)}$, where $H(t) = \int_0^t h(y) \, dy$. This is a general statement, not just for the proportional hazards model.

\item For the proportional hazards model, again assume that $e^{\beta_0}$ is part of the baseline hazard function. We will always do this from now on. Prove that for the proportional hazards model, $S(t) = S_0(t)^{\exp\{\mathbf{x}^\top\boldsymbol{\beta}\}}$.

~ \vspace{70mm}
\pagebreak

\item \label{veteran} The classic data set \texttt{veteran} is available as part of the \texttt{survival} package. Type \texttt{help(veteran)} for details.
    \begin{enumerate}
    \item \label{model1} Based on a preliminary analysis (one that you don't have to do), I request that you fit a model with just experimental treatment, cell type and Karnofsky score. Based on this model, carry out significance tests to answer the following questions. Be able to state $H_0$, give the value of the test statistic ($Z$ or chi-squared) and the $p$-value. Be able to state your conclusions, if any, in plain, non-statistical language. Guidelines for the plain language statements are
        \begin{itemize}
        \item Be guided by the 0.05 significance level, but never mention it. If you do, you get a zero even if what you say is correct.
        \item Any use of statistical vocabulary such as $p$-value, null hypothesis, significance etc. will get you a zero. Instead of saying ``controlling for," say ``allowing for," or ``correcting for," or ``taking into account." The phrase ``controlling for" will not get you a zero, but please avoid it when talking to non-statisticians.
        \item If a directional conclusion is possible, make it. Don't say ``Survival time was related to sex." Say ``Women tended to live longer."
        \item If a test is not significant, do \emph{not} say there was no effect, or no difference. Avoid accepting the null hypothesis, or implying that you accept it. Say ``There was no evidence that surgery was related to survival time," or ``These results do not provide evidence of a connection between marital status and time required to graduate," or something like that.
        \item For any explanatory variable that was \emph{not} randomly assigned, avoid language that suggests influence, or causal connection. Say ``Patients with health club memberships were at lower risk for heart attack," not ``Exercise prevented heart attacks."
        \end{itemize} % End plain language guidelines
    %\hrule
    Now here are the questions.
        \begin{enumerate}
        \item Controlling for cell type and Karnofsky score, does treatment appear to affect survival time?
        \item Allowing for experimental treatment and cell type, does Karnofsky score help predict survival? In spite of the word ``predict," you are being asked for a significance test.
        \item Correcting for experimental treatment and Karnofsky score, do patients with different types of cancer (cell type) differ in their hazard of dying? Do a partial likelihood ratio test.
        \item Follow up the last question by carrying out tests for all pairwise comparisons of cancer types. Some of the comparisons you want are $Z$-tests on the \texttt{summary} output. Use Wald tests for the other comparisons. Directional conclusions are possible for all the tests that are statistically significant, including the Wald tests.
        \end{enumerate} % End Model 1 questions
    \item Now we are interested in whether there could be an effect of experimental treatment that depends on the type of cancer. Fit another model with experimental treatment, cell type and Karnofsky score -- except this one also allows for interaction between treatment and cell type. Use a different dummy variable coding. Display \texttt{summary} and carry out a partial likelihood ratio test for the interaction.
    Are you able to conclude that the effect of treatment depends on type of cancer?
    % No fishing, even though the summary output suggests that there may be an effect within one type of cancer. Categorical variable with 8 categories, etc.
    \item Returning to the model of Question~\ref{model1} (which has no interaction of treatment with cell type), test whether there might be an effect of treatment that depends on Karnofsky score. Just look at the $Z$-test. In plain language, what do you conclude?
    \end{enumerate} % End veteran questions

\item \label{cancer} This question uses the same old \texttt{cancer} data set you have been analyzing in the past two assignments. Even though you may be getting tired of it, there is an interesting technical question we have not explored. \emph{Please print the output for this question on a separate sheet.}
    \begin{enumerate}
    \item Fit a proportional hazards model using the same explanatory variables you did for Weibull regression and log-normal regression: \texttt{sex} and \texttt{ph.ecog}. Be able to state the conclusions in plain, non-statistical language. See Question~\ref{model1} for guidelines.
    \item Now do a table of \texttt{ph.ecog}. Also do \texttt{summary}. You can see that even though it's technically a 6-point scale, in practice the physicians are using just a few categories. It makes you wonder whether we should be treating \texttt{ph.ecog} as quantitative or categorical.
    \item \label{dummies} Fit a model with \texttt{sex} and \texttt{ph.ecog}, in which \texttt{ph.ecog} is represented by dummy variables. Before you do this, make \texttt{ph.ecog = 3} into \texttt{NA}, since there's only one patient.
    \item If \texttt{ph.ecog} is quantitative, the proportional hazards model says the hazard is multiplied by $e^{\beta_2}$ when we increase by one unit (whether it's from 0 to 1 or from 1 to 2). Express this as a null hypothesis about the parameters of your model from Question~\ref{dummies}.
    \item Test the null hypothesis with a Wald test. What do you conclude? (This is not a plain language question.) Does it seem okay to treat \texttt{ph.ecog} as quantitative?
    \end{enumerate} % End computer questions

\end{enumerate} % End of all the questions

\noindent
Please bring \textbf{both} printouts to the quiz. Your printout should show \emph{all} R input and output, and \emph{only} R input and output. Do not write anything on your printouts in advance except your name and student number. The rule is that you may not put anything on your printout that you could not have known before seeing the results. So question numbers are okay. You may even copy-paste the entire questions (for the computer parts) into comment statements if you wish. But results, conclusions and interpretation are not allowed. In particular, do not write answers to ``plain language" questions on your printout, put them in comment statements, or otherwise cause them to appear on your printout. If you do, it's an unauthorized aid and you will be charged with an academic offence, whether or not that particular question was asked.

% \pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%