STA441s16 Final Assignment
Please note that the Scab Disease data set has been replaced with the Tooth Growth data.
The final exam, like the quizzes, will have a computer part. Links to several data sets are given below. Your job is to get familiar with them and do appropriate analyses in preparation for the exam. On the exam, I will provide my SAS programs and results files. You will answer questions based on my input and output. What I do will be quite predictable, so even if you do not do exactly what I do, my input and output should be fairly easy to follow if you prepare. Again, you are not going to bring your SAS work to the final exam. You will answer questions based on my SAS work.
In addition to numerical answers and plain-language conclusions, you should be able to do the following for every analysis.
Not all of the data sets below will appear on the final exam. There won't be time.
The file TV1.data.txt contains data from a 1982 survey conducted in Stevens County in the United States. Well, actually Stevens County is fictitious, and the data were simulated using a program written by Ted Chang of the University of Virginia (see The American Statistician, 46 (1992), 232-237 for more information), but the details are realistic -- or anyway, they were realistic in 1982. The imaginary "Stevens County" is divided into 75 districts including rural, small-town and urban areas. For each of 500 households interviewed, the data file contains district number, household number within district, assessed value of home in US dollars (an indirect measure of income, which was not asked), and answers to 9 questions related to the respondents' interest in getting cable TV. The variables are:
When you look at the data file, you will see that the columns with the 9 survey questions are numbered 1 through 9. My variable names are q1-q9. The primary response variable is q4: Price willing to pay for cable TV. I am going to make a variable called Location with three values: Rural, Small town, and City.
The Tooth Growth data are in the file ToothGrowth.data.txt. The response is the length of odontoblasts (the cells responsible for tooth growth) in guinea pigs. There were 10 guinea pigs at each combination of three dose levels of Vitamin C (0.5, 1, and 2 mg) and two delivery methods (orange juice or ascorbic acid), for 60 animals in all.
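The design just described is a 3 x 2 factorial with 10 animals per cell. As a minimal sketch of what that layout implies, here is a small Python example (Python rather than SAS, and not the analysis that will appear on the exam) that lists the six treatment combinations and computes cell and marginal means. The function name, the method codes "OJ" and "VC", the record layout, and the toy lengths are all hypothetical; the actual layout of ToothGrowth.data.txt is not described here.

```python
from collections import defaultdict
from itertools import product

DOSES = (0.5, 1.0, 2.0)   # mg of Vitamin C, from the description above
METHODS = ("OJ", "VC")    # hypothetical codes: orange juice, ascorbic acid


def group_means(records, key):
    """Mean response per group; key maps a (method, dose, length) record to its group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        g = key(rec)
        totals[g] += rec[2]
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# The six cells of the 3 x 2 design; with 10 animals per cell, n = 60.
cells = list(product(METHODS, DOSES))
n_total = len(cells) * 10

# Toy records (made-up lengths, two per cell) only to show the calculation.
toy = [(m, d, 10.0 * d + (2.0 if m == "OJ" else 0.0) + e)
       for m, d in cells for e in (-1.0, 1.0)]

cell_means = group_means(toy, key=lambda r: (r[0], r[1]))   # one mean per cell
dose_means = group_means(toy, key=lambda r: r[1])           # marginal means by dose
method_means = group_means(toy, key=lambda r: r[0])         # marginal means by method
```

In the toy records the difference between delivery methods is the same at every dose, so there is no interaction. Whether that holds in the real data is one of the questions an analysis of this design would address.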
The file donner.data.txt contains data from the ill-fated Donner party, a group of American pioneers who, in the mid-1800s, decided to attempt a new and untested route over the Sierra Nevada mountains. They were snowed in, and the legend is that the survivors were forced to resort to cannibalism. The data file supposedly contains three pieces of information for each adult (15 and over) in the party. I say supposedly because the historical record is not perfect, and there is even room for disagreement about what it meant to be a member of the Donner party, because some people split off from the party during the trek, rejoined later or not, and so on.
Anyway, the variables are
The file BeatTheBlues.data.txt contains data from a longitudinal clinical trial of an interactive, multimedia program known as "Beating the Blues", designed to deliver cognitive behavioural therapy to depressed patients via a computer terminal. Patients with depression recruited in primary care were randomised to either the Beating the Blues program or to "Treatment as Usual" (TAU). The variables are
Some people disappeared. That's another variable. Is disappearance at random, or is it related to other variables in the study? If disappearance is not at random, how might it bias the results? Think about it.
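One way to start on the disappearance question is to code it as an indicator and compare dropout rates across groups. The following is a minimal sketch in Python (not SAS, and not necessarily what will appear on the exam). The record layout, the group label "BtheB", and the toy values are hypothetical and are not taken from BeatTheBlues.data.txt; a missing follow-up score is represented as None.

```python
def dropout_table(records):
    """Count patients and dropouts in each treatment group.

    records: iterable of (group, followup_score) pairs, where a missing
    follow-up score (None) marks a patient who disappeared.
    Returns {group: {"n": ..., "dropped": ..., "rate": ...}}.
    """
    table = {}
    for group, score in records:
        row = table.setdefault(group, {"n": 0, "dropped": 0})
        row["n"] += 1
        if score is None:
            row["dropped"] += 1
    for row in table.values():
        row["rate"] = row["dropped"] / row["n"]
    return table


# Toy data, made up for illustration only ("BtheB" is a hypothetical label).
toy = [("BtheB", 12), ("BtheB", None), ("BtheB", 7), ("BtheB", 9),
       ("TAU", None), ("TAU", 20), ("TAU", None), ("TAU", 15)]

table = dropout_table(toy)
for group, row in table.items():
    print(group, row["dropped"], "of", row["n"], "dropped out")
```

If the rates differ between groups (a test of independence would make the comparison formal), disappearance is related to treatment. The same indicator can be compared against the other variables in the study.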
Once you've explored the data, do some analyses that try to answer the main research question.