Erindale College - University of Toronto
Faculty of Arts and Science
December Examinations 1995
STA 301F
Duration - 3 hours
Aids allowed: Calculator. Computer printout will be supplied.

Name (Please print) _____________________
Signature _____________________________
Student Number ________________________

1. (10 points) Answer the following questions T for true or F for false; write the letters on the lines. Assume the significance level is alpha = .05 in all cases. You must get at least 10 out of 11 right in order to get any marks at all on this question.

____ In an observational study, a significant relationship between the independent variable and the dependent variable provides good evidence of a causal relationship.
____ We observe r = -0.041, p = .762. We conclude that there is a nonlinear relationship between X and Y.
____ We would like to predict type of automobile (North American vs. Japanese vs. other) from income, education and sex. It does not make sense to use the SPSS Regression procedure to do this.
____ An experimental study is one in which cases are randomly assigned to the different values of an independent variable.
____ In simple regression, a negative regression coefficient b1 (if it is significant) implies that high values of X go with low values of Y and low values of X go with high values of Y.
____ If P > .05 we say the results are statistically significant at the .05 level.
____ We attempt to predict the independent variable from the dependent variable.
____ In a study seeking to predict income from education and race, there is a significant interaction between education and race. This means that education and race are related.
____ We observe r = 0.50, p = .002. This means that 50% of the variation in the dependent variable is explained by a linear relationship with the independent variable.
____ When a relationship between the independent variable and the dependent variable is statistically significant, we conclude that the two variables are actually related.
____ We observe r = -0.70, p = .009. We conclude that X and Y are related.

2. (30 pts for 3 tables) Please fill in the tables below and on the following pages. Use my variable names from Section A of the printout. Infection risk always refers to infrisk, not riskcat. Write the letter of the best technique, using

A = Matched t-test
B = Independent t-test
C = One-way anova
D = Chisq test of ind.
E = Simple regression
F = Multiple regression
G = Logistic regression
H = Factor or PC analysis
I = None of the above

Question    IV(s)    DV(s)    Letter of best test

Is there a linear relationship between the average patient age in a hospital and the average length of time a patient stays?
Is the average of xratio different from the average of culratio?
Do the geographic regions differ in average length of patient stay?
Once we control for number of nurses and length of stay, can the geographic region of a hospital be predicted from medical school affiliation?
Once we control for number of patients, do the geographic regions differ in average number of nurses?
Do patients in hospitals with medical schools have a lower risk of getting an infection?
Once we control for number of patients, number of beds and length of stay, is there a relationship between medical school affiliation & infection risk?
Do the geographic regions differ in how likely a patient is to acquire an infection while in the hospital?
Can percent of facilities and services be predicted from the number of patients in a hospital?
Once we control for size of hospital (represented by nbeds and census), do the geographic regions differ in how likely a hospital is to be affiliated with a medical school?
Do hospitals with a medical school affiliation tend to have more patients?
Do the geographic regions differ in how likely a hospital is to be affiliated with a medical school?
Do patients in hospitals with medical schools stay a shorter length of time on average?
Once we control for number of patients, can medical school affiliation be predicted from number of nurses?
Do hospitals with more nurses tend to have higher infection risk?

3. (4 points) Refer to Section B of the printout. In everyday language, what do we learn from this analysis? (If you just say something is "related" to something else, you get no credit. Be specific about what the results are -- but don't quote numbers).

4. (4 points) Refer to Section C of the printout. In everyday language, what do we learn from this analysis? (If you just say something is "related" to something else, you get no credit. Be specific about what the results are -- but don't quote numbers).

5. (4 points) Refer to Section D of the printout. In everyday language, what do we learn from this analysis?

6a. (3 pts) In the table below, set up indicator dummy variables for geographic region so that SOUTH is the reference category.

Region           D1      D2      D3
NORTHEAST        ____    ____    ____
NORTH CENTRAL    ____    ____    ____
SOUTH            ____    ____    ____
WEST             ____    ____    ____

b. (8 pts) Representing infection risk by Y, age by the variable X, and your three dummy variables by D1, D2 and D3, write a regression equation in which the linear relationship of age to infection risk might depend on geographic region. Complete the equation below.

Ŷ = b0 + b1X + ____________________________________________

c. (4 pts) Using your notation from parts (a) and (b) above, write the equations of the 4 regression lines (one for each region) in the table below. You may want to do some scratch work on the (mostly blank) sheet of paper that has your name, but DO NOT HAND IN THE SCRATCH PAPER. All I want are the answers.

Region           Equation: Ŷ = (Intercept) + (Slope) X
NORTHEAST        ________________________________
NORTH CENTRAL    ________________________________
SOUTH            ________________________________
WEST             ________________________________

7. Refer to Section E of the printout: Ordinary Multiple Regression.
We are always using nondirectional tests (two-sided, if it's a t-test) with significance level alpha = 0.05.

a. (1 point) Without considering any other variables, are number of beds and number of patients (together) significant predictors of infection risk? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

b. (1 point) When we control for number of beds in the hospital, is the number of patients a significant predictor of infection risk? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

c. (1 point) When we control for number of patients, is number of beds in the hospital a significant predictor of infection risk? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

d. (5 points) Explain the apparent inconsistency between your answer to (a) and your answers to (b) and (c). Don't worry about "everyday language;" just convince me that you understand what's going on.

7e. (1 point) When we control for number of beds and number of patients, do medical school affiliation and number of nurses (considered together) predict infection risk? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

7f. (1 point) Controlling for number of beds and number of patients (but NOT for number of nurses), does medical school affiliation predict infection risk? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

7g. (1 point) What percentage of the variation in infection risk is explained by the 4 independent variables together?

7h. (1 point) Taken all together, do the 4 independent variables explain a significant amount of variation in the dependent variable? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

7i. (1 point) Which variable or variables explain a significant amount of variation in the dependent variable once you control for the other three? Just name the variable or variables.

8. Refer to Section F of the printout: Logistic Regression. We are always using nondirectional tests with significance level alpha = 0.05.
If both a Wald test and a likelihood ratio test answer the same question, use the likelihood ratio test.

a. (1 point) Without considering any other variables, is average length of stay a significant predictor of infection risk category? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

b. (1 point) When we control for average length of stay, are number of patients and number of beds (together) significant predictors of infection risk category? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

c. (1 point) When we control for average length of stay, number of patients and number of beds, is geographic region a significant predictor of infection risk category? Write your answers in the boxes.
Answer Yes or No ________    P-value ________

d. (1 point) When we control for number of beds, number of patients and geographic region, and the average length of stay increases by one day, the odds of being above the median in infection risk are ___________ times as great (fill in the blank).

e. (1 point) When we control for average length of stay, number of beds and number of patients, the odds of being above the median in infection risk are ___________ times as great for a hospital in the West as a hospital in the South (fill in the blank).

9. This is about principal components analysis, and you don't want to see it, I believe.