STA 301F97 Assignment 8: Quiz in tutorial Nov. 7
The file /student/jbrunner/public/heart contains data from a study of coronary heart disease. Get a copy of the data for yourself with
cp /student/jbrunner/public/heart .
The period is important; it refers to your current directory. Feel free to edit and/or rename the file if you wish.
Create an SPSS command file with variable labels for all variables and value labels where appropriate. PLEASE USE MY VARIABLE NAMES: frstchd age bpress educat cholest numcigs height weight daydeath alive famhist chd
1. Use RECODE to create a new variable called SMOKER, with values 1 = smokes vs 0 = does not smoke. Provide variable and value labels. Generate two frequency distributions, one for the original variable and one for the recoded variable. Did you handle missing values correctly?
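The point of the last question in item 1 is that a missing-value code must not be silently recoded into one of the two real categories. The following is a minimal Python sketch of that logic, not SPSS syntax. The missing code 99 and the sample values are assumptions made up for illustration; check the actual data file for the real convention.

```python
# ASSUMPTION: 99 is a hypothetical missing-value code, not taken from the data.
MISSING_CODE = 99

def recode_smoker(numcigs):
    """Return 1 (smokes), 0 (does not smoke), or None (missing)."""
    if numcigs is None or numcigs == MISSING_CODE:
        return None          # missing stays missing
    return 1 if numcigs > 0 else 0

def frequencies(values):
    """Count how often each distinct value occurs."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts

# Made-up cigarettes-per-day values, including two missing codes.
numcigs = [0, 20, 5, 99, 0, 40, 99]
smoker = [recode_smoker(x) for x in numcigs]

print(frequencies(numcigs))   # frequency distribution of the original variable
print(frequencies(smoker))    # frequency distribution of the recoded variable
```

A quick check is that the number of missing cases is the same in both frequency distributions.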
2. Use COMPUTE to create a new variable called "fatness," equal to weight divided by height. Provide a variable label.
3. The following choice of IV and DV is a little unnatural, but please do it anyway for practice. The independent variable will be type of first coronary heart disease event, and the dependent variable will be age. First do a oneway anova, with planned comparisons comparing "no chd" to each of the other types of heart attack; to make things nice, give "no chd" a negative weight. (Notice that NO Tukey or Scheffe tests are requested.) Now, when you first try to do this, you get some funny results. Fix it up by guessing what to do (guessing right).
Now do the same thing with multiple regression, as follows. Make indicator dummy variables for type of first coronary heart disease event, with "no chd" as the comparison category. Call them e1, e2, etc.
If you do this right, you will see a remarkable similarity between the results of oneway and regression (this always happens). How are they the same? How are they different?
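As a rough illustration of why the two analyses line up, here is a small Python sketch with made-up ages (not the heart data) and hypothetical group codes 0 to 3, where 0 stands in for "no chd". With indicator dummy variables, the intercept equals the mean of the comparison group and each slope equals that group's mean minus the comparison group's mean. This is the same quantity as a planned comparison with weight -1 on "no chd" and +1 on the other group.

```python
from statistics import mean

# Made-up ages; group 0 is assumed to play the role of "no chd".
ages = {0: [50, 54, 52], 1: [60, 62], 2: [57, 59, 61], 3: [66, 64]}
reference = 0
others = [g for g in sorted(ages) if g != reference]

# Indicator coding: e_j = 1 if the case is in group j, otherwise 0.
rows = []  # each entry is (y, [1, e1, e2, e3])
for g, ys in ages.items():
    for y in ys:
        rows.append((y, [1] + [1 if g == j else 0 for j in others]))

means = {g: mean(ys) for g, ys in ages.items()}

# Candidate coefficients: b0 = reference mean, bj = mean_j - reference mean.
b = [means[reference]] + [means[j] - means[reference] for j in others]

# Least-squares check: residuals must be orthogonal to every design column.
residuals = [y - sum(bk * xk for bk, xk in zip(b, x)) for y, x in rows]
normal_eq = [sum(r * x[k] for r, (_, x) in zip(residuals, rows))
             for k in range(len(b))]

# Planned comparison of group 1 against the reference group.
weights = {0: -1, 1: 1, 2: 0, 3: 0}
contrast = sum(weights[g] * means[g] for g in ages)

print(b, normal_eq, contrast)
```

Because the normal equations are satisfied, these coefficients are the least-squares estimates, and the contrast estimate matches the slope for e1.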
4. Make up a table showing the values of the dummy variables and predicted Y (Y-hat) for each group. An example is on page 7 of the notes, but you also need a column for each dummy variable. On the quiz, you may be asked to do this either for this exact categorical IV, or for another one you have never seen before.
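The layout of such a table can be sketched in Python as below. The category labels "type A", "type B" and "type C" are hypothetical placeholders, not the actual categories of frstchd; only the pattern of zeros and ones and the form of Y-hat matter here.

```python
def dummy_table(groups, reference):
    """Rows of (group, dummy values, Y-hat expression) for indicator coding."""
    others = [g for g in groups if g != reference]
    table = []
    for g in groups:
        dummies = [1 if g == j else 0 for j in others]
        if g == reference:
            yhat = "b0"                      # all dummies are zero
        else:
            yhat = "b0 + b%d" % (others.index(g) + 1)
        table.append((g, dummies, yhat))
    return table

# Hypothetical labels; substitute the real categories of the IV.
groups = ["no chd", "type A", "type B", "type C"]
for g, d, yhat in dummy_table(groups, "no chd"):
    print("%-8s %s  %s" % (g, " ".join(str(v) for v in d), yhat))
```

The same function works for any categorical IV: pass its list of categories and name the comparison category.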
5. Now do a multiple regression in which cholesterol level is to be predicted from age, blood pressure, years of education, whether the person is a smoker (NOT number of cigarettes per day), fatness, whether the person has a family history of coronary heart disease, and whether the person has coronary heart disease himself. What proportion of the variation in cholesterol level do these variables explain? Is it a statistically significant amount? What do the t-tests tell you? Disregarding significance, what trends (that's a good way to say it) do the b coefficients indicate?
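For reference, the proportion of variation explained is R-squared, and the overall test of whether that amount is significant is an F test. The Python sketch below shows the arithmetic only; the numbers in the usage lines are made up, and the p-value is left to the SPSS output because it needs the F distribution.

```python
def r_squared(y, yhat):
    """Proportion of variation in y explained by the fitted values yhat.

    Assumes yhat comes from a least-squares fit that includes an intercept.
    """
    ybar = sum(y) / len(y)
    ss_total = sum((v - ybar) ** 2 for v in y)
    ss_error = sum((v - f) ** 2 for v, f in zip(y, yhat))
    return 1 - ss_error / ss_total

def overall_f(r2, n, p):
    """F statistic for H0: all p slopes are zero; df = (p, n - p - 1)."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# Made-up illustration: 7 predictors (as in item 5), 108 cases, R-squared 0.2.
f_stat = overall_f(0.2, 108, 7)
print(round(f_stat, 3))
```

The t-tests answer a different question: each one tests a single predictor controlling for all the others in the model.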
Bring a printout of your unedited list file to the quiz. Don't write anything on it except your name and student number.