STA 301F97 Assignment 9: Test 2 in tutorial Nov. 7

This assignment is now complete. The only change since Monday, Nov. 3 is the set of hints immediately below.

Hints

As with Test 2 a couple of years ago, this test may refer to the heart data (which you used in Assignment 8), the SMSA data, or both. To get a copy of the SMSA data set (and an explanation of what it is), type

cp /student/jbrunner/public/smsa .

The period is important; it refers to your current directory. Feel free to edit and/or rename the file if you wish.

As you did with the heart data, write an SPSS command file that labels the variables, including value labels where appropriate. For both the heart data and the SMSA data, it is best to use my variable names. Here are my SMSA variable names: id landarea totpop urban oldfolks doctors hospbeds hsgrads labforce income crimes region.
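If you also want to check numbers outside SPSS, here is a minimal Python sketch of reading the file with these variable names. It assumes (not confirmed above) that each data line holds the 12 values in the order listed, separated by whitespace, and that any line that is not exactly 12 numbers is part of the explanation and can be skipped. The names `parse_smsa`, `add_crime_rate` and `crimerate` are hypothetical; `add_crime_rate` computes the crime rate asked for in question 4 below.

```python
# Hypothetical reader for the SMSA file. Assumes each data line has the
# 12 values below, in this order, separated by whitespace.
SMSA_NAMES = ["id", "landarea", "totpop", "urban", "oldfolks", "doctors",
              "hospbeds", "hsgrads", "labforce", "income", "crimes", "region"]


def parse_smsa(text):
    """Return one dict per SMSA. Lines that are not exactly 12 numbers
    (for example, the explanatory text) are skipped."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(SMSA_NAMES):
            continue  # wrong number of fields: not a data line
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # a 12-word line of text, not data
        records.append(dict(zip(SMSA_NAMES, values)))
    return records


def add_crime_rate(records):
    """Add a hypothetical 'crimerate' field: total serious crimes / total population."""
    for r in records:
        r["crimerate"] = r["crimes"] / r["totpop"]
    return records
```

Usage would be something like `rows = add_crime_rate(parse_smsa(open("smsa").read()))`. In SPSS itself the same steps are a data list, variable and value labels, and a compute statement.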

On the test, you will see printouts from either data set or both. The only SPSS procedures I will use are oneway, correlations and regression.

You should make up many questions about these data sets that can be answered by one-way analysis of variance (including planned comparisons and Tukey follow-ups, but not Scheffé tests) and multiple regression, and then do the analyses with SPSS. Look at the output and decide what is going on. How much variation are you explaining? How much additional variation is explained when you bring in a second block of variables? What proportion of the remaining (previously unexplained) variation does that second block explain?
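The last three questions can all be answered from the R-squared values of two nested models. As a sketch (the function name `block_summary` is hypothetical): with R²(reduced) for the first block alone and R²(full) after adding a block of q variables, for n cases and k predictors in the full model, the additional variation explained is R²(full) − R²(reduced), the proportion of the remaining variation explained is that difference divided by 1 − R²(reduced), and the usual F test for the added block has q and n − k − 1 degrees of freedom.

```python
def block_summary(r2_reduced, r2_full, n, k_full, q):
    """Compare a reduced regression model with a full model that adds a
    block of q predictors (k_full predictors in all), fitted to n cases.

    Returns (additional proportion of variation explained,
             proportion of the remaining variation explained,
             F statistic for the added block)."""
    added = r2_full - r2_reduced               # R-squared change
    remaining = added / (1.0 - r2_reduced)     # share of what was left unexplained
    df_error = n - k_full - 1                  # error df of the full model
    f_stat = (added / q) / ((1.0 - r2_full) / df_error)
    return added, remaining, f_stat
```

For example, with made-up values R²(reduced) = 0.5, R²(full) = 0.6, n = 141, k = 5 and q = 2, `block_summary(0.5, 0.6, 141, 5, 2)` gives approximately 0.1, 0.2 and 16.875. Writing a for the proportion of remaining variation, the same F equals (n − k − 1)/q × a/(1 − a), so a large a goes with a large F. When the block is a single variable (q = 1), this F is the square of that variable's t statistic in the full model.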

Important Note: For both the heart and SMSA data sets, any analyses you do with continuous independent variables should be done using the original variables and also their standardized equivalents (Z-scores). See senicreg97d.sps on page 37 of the class notes for an easy way to do this. On the test you will definitely see some Z-scores.
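To see why the Z-score analyses are worth doing, here is a small Python sketch (the helpers `z_scores` and `slope` are hypothetical). It assumes Z-scores are computed with the sample standard deviation (divisor n − 1). When both variables are standardized, the simple regression slope equals the correlation coefficient; in multiple regression, the coefficients from standardized variables are the standardized coefficients, each equal to b × s_x / s_y.

```python
import math


def z_scores(values):
    """Standardize using the sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def slope(x, y):
    """Least-squares slope of y on x in a simple regression."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

With the toy data x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], `slope(x, y)` is 0.6, while `slope(z_scores(x), z_scores(y))` is 6/√60 ≈ 0.775, which is the correlation between x and y.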

The sample test from a couple of years ago will give you some ideas of what questions might be asked about the data sets, especially the heart data. Here are some more, about the SMSA data. When you do a one-way ANOVA, follow up with Tukey tests if appropriate. Note that because the word "depends" does not appear in any of the questions, you are NOT being asked to test for interactions. Also, if a question asks you to test independent variables A and B controlling for C and D, then A, B, C and D are the only independent variables in the analysis.
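For reference, this Python sketch (the function name `oneway_anova` is hypothetical) computes the pieces of the one-way ANOVA table. It stops at the F statistic: the p-value and the Tukey critical values need the F and studentized range distributions, which are not in Python's standard library. The MSE it returns is the error mean square that Tukey comparisons are based on. The same F is obtained from a regression on k − 1 dummy variables for the k groups.

```python
def oneway_anova(groups):
    """One-way ANOVA for a list of groups (each a list of numbers).
    Returns (F, df_between, df_within, MSE, R-squared)."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    mse = ss_within / df_w
    f_stat = (ss_between / df_b) / mse
    r_squared = ss_between / (ss_between + ss_within)  # proportion explained
    return f_stat, df_b, df_w, mse, r_squared
```

With the toy groups [[1, 2, 3], [4, 5, 6], [7, 8, 9]], this gives F = 27 on 2 and 6 degrees of freedom, MSE = 1 and R-squared = 0.9.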

Answer each question below by filling in these columns:
  (a) Independent variable(s), including any we are controlling for
  (b) Dependent variable
  (c) Are the results significant at alpha = 0.05? (Yes or No)
  (d) p-value
  (e) Answer to the question (Yes or No)
  (f) What do you conclude about crime (in everyday language)?

Questions:
1. Once we control for total population and income, is education level related to total number of crimes?
2. Do the geographic regions differ in average number of crimes?
3. Once we control for population size, do the geographic regions differ in number of crimes?
4. Create a new variable called crime RATE (divide total serious crimes by total population). Do the geographic regions differ in average crime rate?
5. Once we control for total population, percent of population in central cities and geographic region, does total personal income predict total serious crimes?

Here are a couple of follow-up questions that do not fit into the table format.

For question 3: Suppose that population size is held constant at its mean level, that is, the sample mean population size across all 141 SMSAs. Use a calculator to obtain the predicted number of crimes in each region. Is this consistent with what you learn from the t-statistics for the dummy variables?
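The calculator work amounts to the following Python sketch (the function `predicted_by_region` is hypothetical). It assumes a model of the form crimes = b0 + b_pop × totpop plus one dummy term per non-reference region, with region 1 taken as the reference category purely for illustration; use whatever coding your own dummy variables have. The coefficients in the usage line are made-up placeholders, not results from the SMSA data.

```python
def predicted_by_region(b0, b_pop, dummy_coefs, mean_pop, reference):
    """Predicted crimes for each region with population held at mean_pop.

    Model: crimes = b0 + b_pop * totpop + (dummy coefficient for the region),
    where the reference region has no dummy. dummy_coefs maps each
    non-reference region to its dummy coefficient."""
    base = b0 + b_pop * mean_pop        # prediction for the reference region
    preds = {reference: base}
    for region, coef in dummy_coefs.items():
        preds[region] = base + coef     # a dummy shifts the intercept only
    return preds
```

For example, `predicted_by_region(100.0, 0.5, {2: 20.0, 3: 50.0, 4: -10.0}, 1000.0, reference=1)` returns `{1: 600.0, 2: 620.0, 3: 650.0, 4: 590.0}`. The predicted difference between any region and the reference region is exactly that region's dummy coefficient, at the mean population or any other fixed population, which is what the t-test for that dummy variable tests.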

For question 5: I think the results of this analysis are a little surprising at first glance. Think about the results and try to come up with an EXPLANATION (not just a description) of what happened.

As in Test 1, you may NOT bring your list files to class.