\documentclass[12pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{fullpage}
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 312s19 Assignment Eleven}}\footnote{This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Mathematical and Computational Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US} {Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely. The \LaTeX~source code is available from the course website: \href{http://www.utstat.toronto.edu/~brunner/oldclass/312s19} {\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/312s19}}}
\vspace{1 mm}
\end{center}

\noindent
The paper and pencil part of this assignment is not to be handed in. It is practice for Quiz~11 on April 1st. The R parts may be handed in as part of the quiz. \textbf{Bring hard copy of your printout for Questions~\ref{goldenyears} and~\ref{liver} to the quiz}. Separate printouts would be a good idea. Do not write anything on your printouts in advance except possibly your name and student number. \emph{Answers to the ``plain language" questions are specifically prohibited.} Do not write the answers to the plain language questions, or type them, or otherwise cause them to appear on your printouts.

%\vspace{5mm}

\begin{enumerate}

\item In a study of whether vitamin C can help protect against the common cold, volunteer subjects were randomly assigned to either an observation only condition (no pill), a placebo condition (sugar pill), a 100 mg pill, a 500 mg pill or a 2,000 mg pill.
The pills all had the same size and appearance. Time until the first reported cold was recorded. Subjects were followed for 6 months. At that point, the data for any subject who had not caught a cold was censored. In addition to experimental condition, age was recorded for each subject. Naturally, a lot more data about the subjects would be recorded in a real study.
\begin{enumerate}
\item \label{haz1} These data will be analyzed using proportional hazards regression. Write the hazard function, including the baseline hazard. Denote age by $x$, and put it first. There are no interactions yet. You do not have to say how your dummy variables are defined. You will do that in the next part.
\item \label{table} In the table below, make columns showing how your dummy variables are defined. Make placebo the reference category. In the last column, write the hazard function. If \emph{symbols} for your dummy variables appear in the last column, the answer is wrong.

\vspace{4mm}

\hspace{4in} Hazard Function

\renewcommand{\arraystretch}{2.0}
\hspace{-10mm}
\begin{tabular}{|l|c|c|} \hline
Observation Only & & \\ \hline
Placebo & \hspace{50mm} & \hspace{70mm} \\ \hline
100 mg & & \\ \hline
500 mg & & \\ \hline
2,000 mg & & \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1.0}
%\end{center}
%\vspace{10mm}
\pagebreak

\item You want to test whether, controlling for age, experimental condition has any effect on the risk (technically, the hazard) of getting a cold. Experimental condition includes Observation Only.
\begin{enumerate}
\item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz1}.
\item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$.
\end{enumerate}

\item Placebo effects are interesting. It is well documented that sometimes, people can get better by taking a pill that is medically inert, just because they \emph{believe} it will make them better.
It is particularly likely in this study, because we have data only on time to the first \emph{reported} cold. You want to test for a placebo effect. \begin{enumerate} \item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz1}. \item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$. \end{enumerate} \item You want to test whether, controlling for age, dosage level has any effect on the risk of getting a cold. This test includes the placebo, which is a dosage level of zero, but excludes Observation Only. \begin{enumerate} \item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz1}. \item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$. \end{enumerate} \item If that last test is statistically significant, you definitely want to follow up with all pairwise comparisons of dosage levels, to see where the overall difference comes from. How many pairwise comparisons are there? The answer is a number. \item One of the pairwise comparisons is between the placebo and 2,000 mg. \begin{enumerate} \item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz1}. \item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$. \end{enumerate} \item Another pairwise comparison is between 100 mg and 500 mg. \begin{enumerate} \item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz1}. \item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$. \end{enumerate} \pagebreak \item \label{haz2} Now we want to check whether the magnitude of the effects might depend on the age of the subject. Write the hazard function for proportional hazards regression, including the baseline hazard. Dummy variable definitions are the same as in Question~\ref{table}. 
\item Fill in the table for the model of Question~\ref{haz2} above.

\vspace{4mm}

\hspace{3in} Hazard Function

\renewcommand{\arraystretch}{2.0}
\hspace{-20mm}
\begin{tabular}{|l|c|c|} \hline
Obs.~Only & & \\ \hline
Placebo & \hspace{30mm} & \hspace{120mm} \\ \hline
100 mg & & \\ \hline
500 mg & & \\ \hline
2,000 mg & & \\ \hline
\end{tabular}
\renewcommand{\arraystretch}{1.0}
%\end{center}
\vspace{5mm}

\item For a test of the interaction of age by experimental condition (including Observation Only),
\begin{enumerate}
\item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz2}.
\item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$.
\end{enumerate}

\item Maybe older people have more faith in the medical system. Test whether the magnitude of the placebo effect depends on age.
\begin{enumerate}
\item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz2}.
\item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$.
\end{enumerate}

\item Test whether the effect of dosage depends on age. This excludes Observation Only.
\begin{enumerate}
\item Write the null hypothesis in scalar form, using the notation of your answer to Question~\ref{haz2}.
\item Write the null hypothesis in matrix form as $H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$.
\end{enumerate}
\end{enumerate} % End of Vitamin C question

\pagebreak

\item \label{goldenyears} The \texttt{channing} data set in the \texttt{KMsurv} package has data on death in nursing homes as a function of age and sex. You will need to install the \texttt{KMsurv} package. Information about the data is provided by \texttt{help(channing)}. For some reason you need \texttt{data(channing)} before you can use it. Please make gender a binary variable with 1=Female and 0=Male.
Also, create a centered version of age, by subtracting off the sample mean for the entire sample. This variable should be in years, not months. When a question refers to a resident of ``average age," it refers to sample mean age. When I say ``age," do I mean the variable \texttt{ageentry} or the variable \texttt{age} in the data file? Why?
\begin{enumerate}
\item In any data analysis, you should start with some basic descriptive statistics to see what the data are like. Please answer these questions. The answers are numbers. A \texttt{summary} of the data frame will help.
\begin{enumerate}
\item What was the median age at which participants entered the nursing home, in years?
\item What was the age of the youngest person to enter the nursing home, in years?
\item What was the age of the oldest person to enter the nursing home, in years?
\item What proportion of the nursing home residents were women?
\item What was the longest length of stay at the nursing home, in years?
\item What proportion of the observations were censored?
\end{enumerate}
\item Fit a proportional hazards model with just age at entry, and gender. There is no interaction yet.
\begin{enumerate}
\item Interpret both $Z$-tests in plain, non-statistical language. This is one sentence each.
\item Correcting for age, the estimated hazard of death is \underline{\hspace{15mm}\%} as great for women.
\item Give a 95\% confidence interval for that last answer. Allowing for age, we estimate the hazard to be between \underline{\hspace{15mm}\%} and \underline{\hspace{15mm}\%} as great for women.
\item Controlling for gender, if age at entry is increased by 10 years, we estimate the hazard of death to be multiplied by \underline{\hspace{15mm}}.
\item Give a 95\% confidence interval for that last number.
\item Test the proportional hazards assumption. What do you conclude?
\end{enumerate}
\item Now fit a model that allows you to test whether the sex difference in risk of death depends on the age of the nursing home resident.
\begin{enumerate}
\item Do it two ways, with age at entry centered and age at entry uncentered. Which way do you like more? What is the evidence that the fit of the two models is the same?
\item State your conclusion (about the interaction) in plain, non-statistical language.
\end{enumerate}
\end{enumerate} % End nursing home questions.

\item \label{liver} In the liver disease data, patients were randomly assigned to one of two drugs, or to a placebo. The data file includes age and sex (1=F). Blood platelet count was recorded for each patient in each time period. The data are available at
\begin{center}
\href{http://www.utstat.toronto.edu/~brunner/data/legal/liver.data.txt} {\texttt{http://www.utstat.toronto.edu/$\sim$brunner/data/legal/liver.data.txt}}
\end{center}
\begin{enumerate}
\item What was the failure time for Patient 1? Patient 2? Patient 3?
\item For a data file in this format, it can be challenging to get basic descriptive statistics. Let's see if you can do it. See \texttt{help(subset)}.
\begin{enumerate}
\item How many patients took part in the study?
\item What was their mean age?
\item What percent were male?
\end{enumerate}
\item Fit a proportional hazards model with no interactions.
\begin{enumerate}
\item Controlling for sex, experimental condition and platelet count, is there evidence that the risk of death depends on age?
\begin{enumerate}
\item Answer the question Yes or No.
\item If the answer is Yes, state the conclusion in plain, non-statistical language. If the answer is No, that's all you need to say.
\item What test statistic value supports your conclusion? The answer is a number on your printout.
\item What $p$-value supports your conclusion? The answer is a number on your printout.
\item State the null hypothesis in symbols.
\item Do you reject $H_0$? Answer Yes or No.
\item Are the results statistically significant? Answer Yes or No. \item All other things being equal, if age is increased by one year, the estimated hazard of death is multiplied by \underline{\hspace{15mm}}. \item Give a 95\% confidence interval to go with that last estimate. \item All other things being equal, if age is increased by ten years, the estimated hazard of death is multiplied by \underline{\hspace{15mm}}. You could get this from the output of \texttt{summary} with a calculator, but please use R and display the result on your printout. \item Give a 95\% confidence interval to go with that last estimate. Again, this is an easy calculation based on the output of \texttt{summary}. \end{enumerate} \item Controlling for age, experimental condition and platelet count, is there evidence that the risk of death depends on sex of patient? \begin{enumerate} \item Answer the question Yes or No. \item If the answer is Yes, state the conclusion in plain, non-statistical language. If the answer is No, that's all you need to say. \item What test statistic value supports your conclusion? The answer is a number on your printout. \item What $p$-value supports your conclusion? The answer is a number on your printout. \item State the null hypothesis in symbols. \item Do you reject $H_0$? Answer Yes or No. \item Are the results statistically significant? Answer Yes or No. \end{enumerate} \item Controlling for age, sex and platelet count, is there evidence that experimental treatment (including the placebo condition) affects the chances of survival? Answer the question with a Wald test. \begin{enumerate} \item Answer the question Yes or No. \item What test statistic value supports your conclusion? The answer is a number on your printout. \item What $p$-value supports your conclusion? The answer is a number on your printout. \item State the null hypothesis in symbols. \item Do you reject $H_0$? Answer Yes or No. \item Are the results statistically significant? 
Answer Yes or No.
\item If the results are statistically significant, follow up with all pairwise comparisons. State your conclusions in plain, non-statistical language.
\end{enumerate}
\end{enumerate}
\item Now fit a model that allows you to test whether the effect of experimental treatment depends on the patient's blood platelet count. Test the null hypothesis with a partial likelihood ratio test. What do you conclude?
\item Test the proportional hazards assumption. What do you conclude?
\end{enumerate} % End liver disease questions

% End computer questions

\end{enumerate} % End of all the questions

\vspace{20mm}
\hrule
\vspace{5mm}

\noindent
Please bring \textbf{both} printouts to the quiz. Your printout should show \emph{all} R input and output, and \emph{only} R input and output. Do not write anything on your printouts in advance except your name and student number. The rule is that you may not put anything on your printout that you could not have known before seeing the results. So question numbers are okay. You may even copy-paste the entire questions (for the computer parts) into comment statements if you wish. But results, conclusions and interpretation are not allowed. In particular, do not write answers to ``plain language" questions on your printout, put them in comment statements, or otherwise cause them to appear on your printout. If you do, it's an unauthorized aid and you will be charged with an academic offence, whether or not that particular question was asked on the quiz.

% \pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%