STA429/1007 Assignment 5
Quiz on Thursday Oct. 25th at 10:10 a.m.
You will find Chapter 2 and Handout 2 to be useful for this assignment.
The file tv1.data contains data from a 1982 survey conducted in Stevens County in the United States. At the time, Stevens County was divided into 75 districts, including rural, small-town and urban areas. For each of the 500 households interviewed, the data file contains the district number, the household number within the district, the assessed value of the home in US dollars (an indirect measure of income, which was not asked about), and answers to 9 questions related to the respondents' interest in getting cable TV. The variables are:
- District: 1-25 are rural, 26-50 small town, 51-75 city.
- Household (numbered within district)
- Assessed value of home in US dollars
- Number of persons 12 and older in household
- Number of persons 11 and younger in household
- Number of TV sets in Household
- Price willing to pay for cable TV (in US Dollars per month)
- Total TV hours watched last week (add hours for all persons in household)
- Hours Public Affairs watched last week
- Hours Sports watched last week
- Hours Children's programming watched last week
- Hours Movies watched last week
Write a SAS program that reads the data and labels the variables with the
label statement. Create a new variable called "Location" with 3 values: Rural, Small Town and City.
You will need to edit the data file. Notice that some people were not at home. Please do not delete the entire data lines for these respondents. Instead, put the SAS missing value code (a period) in place of each missing value. I did it with search and replace in emacs; see the shorter emacs handout. I will also illustrate search and replace in class.
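A minimal sketch of such a data step is below. It assumes that tv1.data is in the directory SAS is run from, that each line holds the 12 values in the order listed above separated by blanks, and that missing values have already been replaced by periods. All variable names are hypothetical. With list input a period is read as a numeric missing value. A blank field, by contrast, would make SAS take the next value on the line (or on the following line) instead, which is why the periods matter. Location is coded 1, 2, 3 here so that a format created with proc format can attach the text labels.

```sas
/* Sketch only: variable names are hypothetical; adjust the path to tv1.data. */
data tv;
   infile 'tv1.data';
   /* List input: values separated by blanks, in the order of the variable list */
   input district household value n12plus n11less ntvsets
         price tvhours pubhours sporthrs kidhours moviehrs;
   label district  = 'District number'
         household = 'Household number within district'
         value     = 'Assessed value of home (US dollars)'
         n12plus   = 'Number of persons 12 and older'
         n11less   = 'Number of persons 11 and younger'
         ntvsets   = 'Number of TV sets in household'
         price     = 'Price willing to pay for cable TV ($/month)'
         tvhours   = 'Total TV hours watched last week'
         pubhours  = 'Hours Public Affairs watched last week'
         sporthrs  = 'Hours Sports watched last week'
         kidhours  = "Hours Children's programming watched last week"
         moviehrs  = 'Hours Movies watched last week';
   /* Location: 1 = Rural (districts 1-25), 2 = Small Town (26-50), 3 = City (51-75).
      A missing or out-of-range district leaves location missing. */
   if 1 <= district <= 25 then location = 1;
   else if 26 <= district <= 50 then location = 2;
   else if 51 <= district <= 75 then location = 3;
   label location = 'Location';
run;
```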
- Use proc means to obtain n, mean and standard deviation for all the quantitative variables.
- Use proc freq to obtain frequency distributions of Location, and also of Number of persons 12 and older, Number of persons 11 and younger and Number of TV sets. Location should be nicely labelled; I used proc format. So you can check your work, here is my frequency distribution of Location:
                      The FREQ Procedure

                                       Cumulative   Cumulative
location      Frequency     Percent     Frequency      Percent
--------------------------------------------------------------
Rural                91       18.20            91        18.20
Small Town           98       19.60           189        37.80
Urban               311       62.20           500       100.00
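One way to request this output is sketched below. It assumes a data set named tv in which location is coded 1 = Rural, 2 = Small Town, 3 = City. The variable names and the format name locfmt are hypothetical and must match whatever your own data step uses. Choosing which variables count as quantitative is partly a judgment call; this sketch leaves out the district and household identifiers and the categorical location. The label text for code 3 is also a choice: the variable description says City, while the output above shows Urban.

```sas
/* Sketch only: data set, variable and format names are hypothetical. */
proc format;
   value locfmt 1 = 'Rural'
                2 = 'Small Town'
                3 = 'City';
run;

/* n, mean and standard deviation of the quantitative variables */
proc means data=tv n mean std;
   var value n12plus n11less ntvsets price
       tvhours pubhours sporthrs kidhours moviehrs;
run;

/* Frequency distributions; the FORMAT statement attaches the location labels */
proc freq data=tv;
   tables location n12plus n11less ntvsets;
   format location locfmt.;
run;
```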
For debugging, your initial runs should include frequency distributions of everything, but once the data are clean and your program is correct, just produce what is requested above.
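During debugging, one quick way to get a frequency distribution of every variable is the _all_ keyword in the tables statement. This sketch again assumes a data set named tv. Long tables for variables such as assessed value are expected. Any value outside its plausible range (for example, a district number above 75) points to a problem in the data file or in the input statement.

```sas
/* Debugging only: one-way frequency table for every variable in the
   (hypothetical) data set tv. Remove once the data are clean. */
proc freq data=tv;
   tables _all_;
run;
```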
Bring your log file and your list file to the quiz. You will hand both of them in. Also,
- The quiz will consist of a few questions like "How many households have exactly two persons 12 and older?" and "What is the mean price people said they were willing to pay for cable TV?"
- You could get zero marks for the quiz if there are any error messages in your log file. Substantial marks will be deducted if there are warnings or "NOTE: Invalid data." Error messages, warnings and notes about invalid data mean there is something wrong, and it needs to be fixed before you proceed to statistical analysis.
- You must be absolutely sure that the log file and list file come from the same run. For me to check your work, the program in your log file must be exactly the program that generated your statistical output.
- Don't delete your program; later assignments will build on it.
- The first draft of this assignment had regression too, but I think this is enough.