STA441 Assignment 7
Quiz on Thursday in Tutorial (Bring a calculator.)
- A psychiatric consulting firm was hired by the Ontario Department of Corrections. Their task was to conduct in-depth interviews, and predict whether convicted felons released on parole would be arrested again within 12 months. The table below shows predicted re-arrest by observed re-arrest for a sample of 250 prisoners.
                  Observed
Predicted       No      Yes
No              82       75
Yes             37       49
- Are the psychiatrists doing significantly better than chance? Use SAS to carry out the analysis. Give the Pearson chi-square and p-value. Do the shrinks want to feature your findings on their website? Answer Yes or No. For guidance on how to get the data into SAS, see the Berkeley graduate admissions example. This was SAS Example 3; also see Chapter 4 of the text for a full discussion. If you want to check your SAS work, I get a Phi Coefficient of 0.0881.
- Now test the same hypothesis with proc catmod. You'll get a Wald chi-square rather than Pearson, but they should be fairly close and your conclusions should be the same. Don't forget weight count;
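If you want to check the Pearson chi-square by hand before trusting your printout, the arithmetic is short. The sketch below is Python rather than SAS, and it is no substitute for the SAS analysis; the function name pearson_chisq is made up for illustration. It uses only the four counts from the table, plus the fact that for a 2x2 table the absolute value of phi is the square root of chi-square divided by n.

```python
import math

# Counts copied from the table: rows are Predicted (No, Yes), columns are Observed (No, Yes).
table = [[82, 75],
         [37, 49]]

def pearson_chisq(t):
    """Return (Pearson chi-square, n) for a two-way table of counts."""
    n = sum(sum(row) for row in t)
    row_tot = [sum(row) for row in t]
    col_tot = [sum(col) for col in zip(*t)]
    chisq = 0.0
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_tot[i] * col_tot[j] / n
            chisq += (observed - expected) ** 2 / expected
    return chisq, n

chisq, n = pearson_chisq(table)
phi = math.sqrt(chisq / n)                  # absolute value of phi for a 2x2 table
p_value = math.erfc(math.sqrt(chisq / 2))   # upper tail of chi-square with 1 df
print(round(chisq, 4), round(phi, 4), round(p_value, 4))
```

The phi it prints should agree with the 0.0881 quoted above. The chi-square and p-value should match the Pearson line of your proc freq output.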
- In the Berkeley graduate admissions data, we saw that male grad school applicants were more likely to be accepted, but this tendency disappeared or even reversed in one case when we looked at the departments separately. Try it with proc logistic. Make Department F the reference category. There is no interaction in your model (we'll get back to that later). Feel free to lift my code directly from SAS Example 3. Don't forget weight count; Here are some sample questions.
- Controlling for department, is sex of applicant related to admission?
- What is the value of the test statistic? The answer is a number from your printout.
- Do you reject H0 at α=0.05? Yes or No.
- Are the results statistically significant? Yes or No.
- Disregarding significance, the estimated odds of a woman being admitted are ____ times the odds for a man, once you allow for department.
- In plain, non-statistical language, what do you conclude?
- Controlling for sex of applicant, is academic department of applicant related to admission?
- What is the value of the test statistic? The answer is a number from your printout.
- Do you reject H0 at α=0.05? Yes or No.
- Are the results statistically significant? Yes or No.
- In plain, non-statistical language, what do you conclude?
- Controlling for sex of applicant, are the chances of admission different for Department C and Department F?
- What is the value of the test statistic? The answer is a number from your printout.
- Do you reject H0 at α=0.05? Yes or No.
- Are the results statistically significant? Yes or No.
- Disregarding significance, the estimated odds of admission to Department C are ____ times the odds of admission to Department F, once you allow for sex.
- In plain, non-statistical language, what do you conclude?
- Controlling for sex of applicant, are the chances of admission different for Department D and Department E?
- What is the value of the test statistic? The answer is a number from your printout. It's not part of the default output.
- Do you reject H0 at α=0.05? Yes or No.
- Are the results statistically significant? Yes or No.
- Disregarding significance, the estimated odds of admission to Department D are ____ times the odds of admission to Department E, once you allow for sex. Use a calculator.
- In plain, non-statistical language, what do you conclude?
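The fill-in-the-blank odds questions above all come down to exponentiating a regression coefficient, or a difference between two coefficients. Here is a minimal sketch in Python (a calculator does the same job). It assumes indicator dummy variable coding with Department F as the reference category, so that F's coefficient is zero; check how your own model was coded. The function name and the numbers are placeholders for illustration, not values from any printout.

```python
import math

def odds_ratio(b_num, b_den=0.0):
    """Estimated odds ratio comparing two categories of a dummy-coded factor.

    b_num and b_den are the logistic regression coefficients for the two
    categories. Under the assumed coding, the reference category has
    coefficient 0, which is the default for b_den.
    """
    return math.exp(b_num - b_den)

# Placeholder values only (NOT from any printout): replace with your own estimates.
b_dept_d = 0.50
b_dept_e = 0.20

print(odds_ratio(b_dept_d))            # Department D versus the reference category F
print(odds_ratio(b_dept_d, b_dept_e))  # Department D versus Department E
```

The second call shows why the D versus E comparison needs a calculator: neither department is the reference category, so the odds ratio is the exponential of the difference between the two coefficients.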
- This question is based upon the Heart data again. Please start by creating a new variable with 3 categories:
- Died from first heart attack (Sudden Death or Fatal Myocardial Infarction)
- Died in next 10 years
- Alive 10 years after entering the study
The new 3-category variable will be your response variable. You will create it using if statements. You will find that or, and, and parentheses work the way you'd expect.
For interpretability, make the probability of being alive 10 years later the denominator in each generalized logit.
- To verify that you've created the response variable correctly, make three tables:
- First CHD event by Alive 10 years after entering study.
- First CHD event by new response variable.
- New response variable by Alive 10 years after entering study.
Suppress all the percentages and include the missing values.
- Carry out a test of Number of cigarettes and Family history of CHD (considered simultaneously -- one test) controlling for Age and Blood pressure. Be able to state the value of the test statistic and the p-value (numbers from the printout), as well as whether the results are statistically significant, whether you reject the null hypothesis, and what (if anything) you'd conclude. For the conclusion, use plain, non-statistical language. I get b0,2 = -14.2147.
- Why does it make sense that both regression coefficients for age are positive?
- Now fit a model with just Age and Blood pressure. For a 50 year old with a diastolic blood pressure of 100, estimate the probability of
- Dying from a first heart attack.
- Dying in the following 10 years.
- Being alive 10 years after entering the study.
Use proc iml. Should your probabilities add to one? For a 5-year-old with diastolic blood pressure equal to 400, I get the following estimated probability of being alive 10 years later: 0.0057949.
- Be able to interpret all the parameter estimates and Wald chi-square tests (except those for the intercepts) under "Analysis of Maximum Likelihood Estimates" and "Maximum Likelihood Analysis of Variance."
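For the estimated probabilities in the Age and Blood pressure model, it may help to see the arithmetic written out before you do it in proc iml. With "alive 10 years later" in the denominator of each generalized logit, each of the other two probabilities is exp(logit) divided by one plus the sum of both exponentials, and the alive probability is one over that same quantity. The Python sketch below is only an illustration of that calculation: the function name is made up, and the coefficients are placeholders, not estimates from the Heart data.

```python
import math

def generalized_logit_probs(coefs, x):
    """Category probabilities from a generalized (baseline-category) logit model.

    coefs: one coefficient list per non-reference category, intercept first.
    x: explanatory variable values, in the same order as the slopes.
    Returns the non-reference probabilities in order, then the reference one.
    """
    # One linear predictor (estimated log odds versus the reference) per category
    linear = [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in coefs]
    denom = 1.0 + sum(math.exp(value) for value in linear)
    return [math.exp(value) / denom for value in linear] + [1.0 / denom]

# Placeholder coefficients (intercept, age, blood pressure); NOT from the Heart data.
coefs = [[-6.0, 0.05, 0.02],   # died from first heart attack versus alive
         [-7.0, 0.08, 0.02]]   # died in next 10 years versus alive

probs = generalized_logit_probs(coefs, [50, 100])  # age 50, diastolic pressure 100
print(probs)
```

To check your proc iml work, plug your own estimates into the same formula and compare.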
Please bring your log file(s) and your results file(s) to the quiz. Bring a calculator.