STA442/1008 Assignment 8
Quiz on Friday Nov. 6th in Tutorial (Bring a calculator.)
The following formulas may be useful, but you do not need to memorize them. If they are necessary, they will be provided with the quiz.
Note: For the purposes of this assignment, all test statistics are Wald chi-squares.
- Consider a logistic regression in which the cases are newly married couples with both people from the same religion, the independent variable is religion (A, B, C and None -- let's call "None" a religion), and the dependent variable is whether the marriage lasted 5 years (1=Yes, 0=No).
- Make a table with four rows, showing how you would set up indicator dummy variables for Religion, with None as the reference category.
- Add a column showing the odds of the marriage lasting 5 years. The symbols for your dummy variables should not appear in your answer, because they are zeros and ones, and different for each row.
- What is the ratio of the odds of lasting 5 years or more for religion C to the odds of lasting 5 years or more for No Religion? Answer in terms of the β symbols of your model.
- What is the ratio of the odds of lasting 5 years or more for religion A to the odds of lasting 5 years or more for Religion B? Answer in terms of the β symbols of your model.
- You want to test whether Religion is related to whether the marriage lasts 5 years. State the null hypothesis in terms of one or more β values.
- You want to know whether marriages from Religion A are more likely to last 5 years than marriages from Religion C. State the null hypothesis in terms of one or more β values.
- You want to test whether marriages between people of No Religion have a 50-50 chance of lasting 5 years. State the null hypothesis in terms of one or more β values.
- If two events have equal probability, the odds ratio equals ___.
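To check symbolic answers numerically, a minimal Python sketch (outside SAS, and not required by the assignment) can compute odds from an indicator coding. The coefficient values below are made up for illustration and are not estimates from any data; the names `INDICATORS`, `BETA` and `odds` are hypothetical.

```python
import math

# Indicator (dummy) coding with "None" as the reference category.
# Each tuple holds the values of the three indicators (d_A, d_B, d_C).
INDICATORS = {
    "A":    (1, 0, 0),
    "B":    (0, 1, 0),
    "C":    (0, 0, 1),
    "None": (0, 0, 0),
}

# Hypothetical coefficients (beta0, beta1, beta2, beta3); NOT real estimates.
BETA = (0.5, 0.3, -0.2, 0.1)

def odds(religion):
    """Odds of lasting 5 years: exp(beta0 + sum of beta_k * d_k)."""
    d = INDICATORS[religion]
    log_odds = BETA[0] + sum(b * x for b, x in zip(BETA[1:], d))
    return math.exp(log_odds)

for religion, d in INDICATORS.items():
    print(religion, d, round(odds(religion), 4))
```

Dividing one row's odds by another's gives a number that can be compared with the symbolic odds ratio you wrote down, evaluated at the same made-up coefficients.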
The file heart.txt contains data from a long-term study of middle-aged male employees of the Western Electric Company in the 1950's. The first part of the file gives descriptions of the variables. This part should be stripped off or skipped using the firstobs option on the infile statement.
Please write a SAS program that reads and labels the data, including a proc format. This data file contains numeric missing value codes: 99, 999 and so on. You should convert them to the SAS missing value code using if statements (not emacs!). In addition to the variables in the file, please create an additional quantitative variable: Body Mass Index (BMI). Wikipedia has a definition at http://en.wikipedia.org/wiki/Body_mass_index.
- Obtain means and standard deviations of all the quantitative variables. I got a mean years of education equal to 11.6603774, and minimum BMI of 18.8928114.
- Obtain frequency distributions of the categorical variables. It seems that 13 people died on Friday.
- Look at a table of first coronary heart disease event by whether or not the person has coronary heart disease. Does it look okay? If so, relax. If not, track down any problems and fix them using common sense.
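The data-preparation logic (recoding missing-value codes, then computing BMI) can be sketched in Python as a cross-check on the SAS program. This is only an illustration under stated assumptions: the set of codes is hypothetical and must be confirmed per variable from the descriptions at the top of heart.txt, and the units of height and weight in the file are not stated here, so both the metric and the imperial form of the formula are shown. The function names are invented for this sketch.

```python
# Hypothetical missing-value codes; confirm them variable by variable,
# because a code such as 99 can be a legitimate value for another variable.
MISSING_CODES = {99, 999}

def clean(value, codes=MISSING_CODES):
    """Return None (missing) when value is a missing-value code."""
    return None if value in codes else value

def bmi_metric(weight_kg, height_m):
    """BMI = weight in kilograms / (height in metres) squared."""
    if weight_kg is None or height_m is None:
        return None  # missing in, missing out
    return weight_kg / height_m ** 2

def bmi_imperial(weight_lb, height_in):
    """BMI is approximately 703 * weight in pounds / (height in inches) squared."""
    if weight_lb is None or height_in is None:
        return None
    return 703 * weight_lb / height_in ** 2

print(clean(999), clean(70))
print(round(bmi_metric(70, 1.75), 2))
print(bmi_imperial(clean(999), 68))
```

Returning a missing result whenever an input is missing mirrors what the SAS program should do once the codes have been converted.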
The objective here is to find variables that predict presence of coronary heart disease (CHD). One could call this homework assignment "Risk factors in Coronary Heart Disease," and it would sound good.
- Please consider a very simple model with just one independent variable: family history of CHD.
- Are the independent and dependent variables related? Answer Yes, No or No Conclusion.
- Give the value of the test statistic; the answer is a number from your printout.
- Give the p-value; the answer is a number from your printout.
- The odds of coronary heart disease are estimated to be ____ times as great for those with a family history of CHD. The answer is a number from your printout.
- Using numbers from your printout and proc iml, estimate the probability of CHD for study participants with a family history of CHD. Also estimate the probability for those without a family history. Be able to do these calculations with a calculator, too. How could you check your answers with proc freq?
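For the one-predictor model, the calculator arithmetic can be rehearsed with a small Python sketch. The counts below are invented and do not come from heart.txt; the variable names are hypothetical. With a single binary independent variable, the maximum likelihood estimates reproduce the observed log odds in each group, so the estimated probabilities match the sample proportions in a two-way table.

```python
import math

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability corresponding to a log odds value."""
    return math.exp(x) / (1 + math.exp(x))

# Hypothetical counts (CHD yes, CHD no); NOT from heart.txt.
with_history = (30, 70)
without_history = (40, 160)

p1 = with_history[0] / sum(with_history)        # observed proportion, history
p0 = without_history[0] / sum(without_history)  # observed proportion, no history

# Coefficients implied by the table when family history is coded 1/0.
b0 = logit(p0)
b1 = logit(p1) - logit(p0)

odds_ratio = math.exp(b1)
print(inv_logit(b0 + b1), inv_logit(b0), odds_ratio)
```

The same three steps (add the coefficients, exponentiate, divide by one plus the result) are what a calculator version of the problem requires.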
- Now add age to the model.
- Controlling for age, is family history of CHD significantly related to CHD? Answer Yes or No and give the value of the test statistic and the p-value (numbers from the printout).
- Controlling for family history of CHD, is age significantly related to CHD? Answer Yes or No and give the value of the test statistic and the p-value (numbers from the printout).
- Give the value of the test statistic and the p-value for the simultaneous test of age and family history of CHD.
- Controlling for family history of CHD, for each year of increase in age, the estimated odds of coronary heart disease are multiplied by ____. The answer is a number from your printout. Please disregard the significance test this time.
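Since all tests in this assignment are Wald chi-squares, the following Python sketch shows how a single-coefficient Wald statistic, its p-value, and the odds multiplier relate to an estimate and its standard error. The numbers are hypothetical, not from the printout, and the function names are invented. The simultaneous test of two coefficients needs the estimated covariance matrix, so only its p-value calculation (2 degrees of freedom) is sketched, starting from a statistic read off the output.

```python
import math

def wald_chisq(estimate, std_error):
    """Wald chi-square for H0: beta = 0, one degree of freedom."""
    return (estimate / std_error) ** 2

def p_value_df1(chisq):
    """Upper tail of chi-square(1): P(|Z| > sqrt(chisq))."""
    return math.erfc(math.sqrt(chisq / 2))

def p_value_df2(chisq):
    """Upper tail of chi-square(2), which has the closed form exp(-x/2)."""
    return math.exp(-chisq / 2)

def odds_multiplier(estimate, units=1):
    """Factor by which the odds are multiplied for a given increase in x."""
    return math.exp(estimate * units)

# Hypothetical estimate and standard error for age; NOT from the printout.
b_age, se_age = 0.05, 0.02
w = wald_chisq(b_age, se_age)
print(w, p_value_df1(w), odds_multiplier(b_age))
```

Setting `units=10` in `odds_multiplier` gives the multiplier for a ten-year increase, which is the one-year multiplier raised to the tenth power.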
- Now fit a larger model in which the independent variables are
- Family history of CHD
- Age
- Reported number of cigarettes per day
- Blood pressure
- Cholesterol level
- BMI
- Education
Carry out a simultaneous test of the set of independent variables that are not significantly related to CHD, controlling for all the others. Give the value of the test statistic and the p-value. Does it look okay to drop all these variables from the model?
- It is amazing, but we seem to have only two useful independent variables. Fit the model with just those two.
- Fill in the blank. Allowing for education, the more you smoke, the ____ likely you are to have CHD.
- Fill in the blank. Allowing for smoking, the more educated you are, the ____ likely you are to have CHD.
- When we control for reported number of cigarettes per day and increase reported years of education by one year, the estimated odds of coronary heart disease are multiplied by ____. The answer is a single number from your printout. Does this make sense?
- Use proc iml to estimate the probability of coronary heart disease for a man with 16 years of education who smokes 25 cigarettes per day. Be able to do this calculation with a calculator, too.
- Use proc iml to estimate the probability of coronary heart disease for a man with 12 years of education who smokes zero cigarettes per day. Be able to do this calculation with a calculator, too.
- Why does this data set illustrate the danger of formally accepting the null hypothesis with too much certainty?
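The two probability estimates requested above for the smoking and education model follow the same pattern, which can be sketched in Python as a check on proc iml and calculator work. The coefficients here are arbitrary placeholders, including their signs, and say nothing about what the printout will show; `chd_probability` is a hypothetical helper name.

```python
import math

def chd_probability(b0, b_cigs, b_educ, cigs, educ):
    """Estimated P(CHD) from a logistic model with two predictors."""
    log_odds = b0 + b_cigs * cigs + b_educ * educ
    return 1 / (1 + math.exp(-log_odds))

# Placeholder coefficients; replace with the estimates from your printout.
B0, B_CIGS, B_EDUC = -1.0, 0.02, -0.05

p_a = chd_probability(B0, B_CIGS, B_EDUC, cigs=25, educ=16)
p_b = chd_probability(B0, B_CIGS, B_EDUC, cigs=0, educ=12)
print(p_a, p_b)
```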
Please bring your log file and your list file to the quiz. Bring a calculator.