STA442/1008 Assignment 2
Quiz on Friday Jan 26th
This "quiz" is really just an opportunity to get your feet wet with unix and SAS. It is based on Chapter 2 of the class notes and associated lecture material. You will do the job described below, and bring your log file and your list file to the quiz. Then you will be asked a few very simple questions, like what's the mean number of hours of sports programming watched per household, or how many households in the sample do not have a TV. The answers will all be numerical, and they will all be directly from your printout. The quiz will consist of your circling a few numbers and writing a few words on your printouts, and turning them in. It should only take a few minutes, but Alison will be there until the end of the period.
There is a warning about copying at the end of this Web page; please read it and believe it. Now here's the assignment.
The file tv1.dat contains data from a 1982 survey conducted in Stevens County in the United States. Well, actually Stevens County is fictitious, and the data were simulated using a program written by Ted Chang of the University of Virginia (see The American Statistician, 46 (1992), 232-237 for more information), but the details are realistic -- or anyway, they were realistic in 1982. The imaginary "Stevens County" is divided into 75 districts including rural, small-town and urban areas. For each of 500 households interviewed, the data file contains district number, household number within district, assessed value of home in US dollars (an indirect measure of income, which was not asked), and answers to 9 questions related to the respondents' interest in getting cable TV. The variables are:
You can get a copy of the data file here, or
If you're on tuzo, then at the unix prompt, type
cp /student/jbrunner/public/442s01/tv1.dat .
If you're on credit, then at the unix prompt, type
cp /res/jbrunner/public/442s01/tv1.dat .
The period is important; it refers to your current directory.
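If the role of the period is unclear, here is a small sketch you can try in a scratch area. The directory names and the one-line stand-in file are made up for illustration (they are not the real course paths or the real data); the only point is that a trailing period tells cp to put the copy in whatever directory you are currently in.

```shell
# Hypothetical stand-ins: one scratch directory plays the role of the
# instructor's public directory, another plays your own working directory.
src_dir=$(mktemp -d)
work_dir=$(mktemp -d)

# Made-up contents, NOT the real survey data.
printf '1 1 50000\n' > "$src_dir/tv1.dat"

# Move into the working directory, then copy "to here".
cd "$work_dir"
cp "$src_dir/tv1.dat" .    # the trailing period means the current directory

# Confirm that the copy arrived under the same name.
ls -l tv1.dat
```

Without the period, cp has a source but no destination, and it will complain instead of copying.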
Write a SAS program that reads the data and labels the variables with the label statement. Use proc freq to obtain frequency distributions of all the survey questions. Use proc means to obtain n, mean and standard deviation for all the quantitative variables. That's it.
Again, bring your log file and your list file to the quiz. You will hand them both in. Here are a few suggestions and comments
Plagiarism
It is academic dishonesty to present someone else's work as your own, or to allow your work to be copied for this purpose. To repeat: the person who allows her/his work to be copied is equally guilty, and subject to disciplinary action by the university.
Note that if I catch you, I am not allowed to impose some reasonable penalty (like a zero on the assignment). I am required to pass it on to the Dean. In the past I have done this, even though I liked the students involved. The penalties were harsh.
Here are some guidelines. It is fine to discuss the assignments and to learn from each other, but don't copy. Never look at anyone else's work or show anyone your work before the time when it might be handed in -- these times will be very explicit. Do not give anyone a copy of your program file before a computer assignment is due, and do not look at anyone else's.
For some of the quizzes, you will be asked to bring your printout to class; maybe you will hand part of it in, and maybe you will use it to answer some questions. Never, ever, bring a copy of somebody else's printout, or allow anyone to have a copy of yours.
Don't copy. If we catch you, you will get in big trouble. And even if we do not catch you, after you die you will be reincarnated as a tadpole in a polluted stream.
A final note: You must use your own computer account, and only your own computer account, to do the work for this course. If you use the account of another student -- or allow your account to be used by another student in this class -- your computer account will be cancelled, and so will your friend's. This will make it very difficult for you to pass this course (and possibly others). The reason for this seemingly insane rule is to prevent the following well-worn defense (never believed) for having identical printouts: "I was using my friend's account and I accidentally printed the wrong file."