STA429/1007 Assignment 8
Quiz on Thursday Nov. 15th at 10:10 a.m.
This assignment will guide you through a logistic regression analysis of the Heart data from Assignment 7. The dependent variable will be whether the participant was dead 10 years after the beginning of the study. Please use the original coding, in which 0=alive and 1=dead. You may call this variable anything you like, but I will refer to it as "Died" or "Death."
First, create a new Body Mass Index (BMI) variable. Wikipedia gives a formula that applies to weight in pounds and height in inches. Now, please follow these steps.
Log Odds of Death $= \beta_0 + \beta_1 x$, so Odds of Death $= e^{\beta_0 + \beta_1 x}$.
With $x = 0$ (No Coronary Heart Disease), we have
Odds of Death $= e^{\beta_0}$.
But if a probability equals zero, then the odds equal zero too. This means
$e^{\beta_0} = 0$.
Unfortunately, there is no number $\beta_0$ that satisfies this equation, and finding an estimated $\beta_0$ that satisfies it will not be possible either. This is why the logistic regression blows up if CHD is included as an independent variable. It's a great predictor -- too good!
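One way to see why no finite value works: the exponential function is strictly positive for every real argument and reaches zero only in the limit. A short sketch of that step, using the same $\beta_0$ as above:

```latex
\[
e^{\beta_0} > 0 \quad \text{for every real } \beta_0,
\qquad \text{while} \qquad
\lim_{\beta_0 \to -\infty} e^{\beta_0} = 0 .
\]
```

So the fitting procedure keeps pushing the estimate of $\beta_0$ toward $-\infty$ instead of settling on a finite number.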
Now carry out logistic regression on the 104 people with Coronary Heart Disease. The dependent variable is died, and the independent variables are age, education, BMI, diastolic blood pressure, cholesterol level, number of cigarettes, and family history of heart disease. (Do you have anybody with 99 years of education? What do you think you should do about it?)
data sick;
     set heart;
     if chd=1; /* Keep just the data for patients with CHD */
run;
Subsequent proc steps will use the most recently created SAS data set, which is sick.
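As a rough sketch of what such a proc step might look like: all the variable names below (died, age, educ, bmi, dbp, chol, cigs, famhist) are hypothetical placeholders, so substitute the names in your own heart data set. The descending option is used here on the assumption that you want to model the probability of Died=1 rather than Died=0.

```sas
/* Sketch only: variable names are hypothetical placeholders. */
/* No data= option, so SAS uses the most recently created data set. */
proc logistic descending;   /* descending: model Pr(died=1), not Pr(died=0) */
     model died = age educ bmi dbp chol cigs famhist;
run;
```

This sketch does nothing about unusual values in the independent variables. Any recoding you decide on belongs in the data step, before the proc step runs.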
Please bring your log file and your list file to the quiz.