Quiz on Friday Feb. 8th
Explain the difference between random sampling (simple independent random
sampling) and random assignment. Use original examples.
You will find Chapter 2 and Handout 2 to be useful for the remainder of
this assignment.
The file tv1.data contains data from a 1982 survey conducted in Stevens County in the United
States. At the time, Stevens County was divided into 75 districts including
rural, small-town and urban areas. For each of 500 households interviewed, the
data file contains district number, household number within district, assessed
value of home in US dollars (an indirect measure of income, which was not
asked), and answers to 9 questions related to the respondents' interest in
getting cable TV. The variables are:
- District: 1-25 are rural, 26-50 small town, 51-75 city.
- Household (numbered within district)
- Assessed value of home in US dollars
- Number of persons 12 and older in household
- Number of persons 11 and younger in household
- Number of TV sets in household
- Price willing to pay for cable TV (in US Dollars per month)
- Total TV hours watched last week (add hours for all persons in
household)
- Hours Public Affairs watched last week
- Hours Sports watched last week
- Hours Children's programming watched last week
- Hours Movies watched last week
Write a SAS program that reads the data and labels the variables with the
label statement. Create a new variable called "Location" with 3 values: Rural, Small Town and City.
You will need to edit the data file. Notice that some people were not at home. Please do not delete the data lines for these households; instead, put the SAS missing value code (a period) in place of every missing value. I did this with search and replace in Emacs; see the shorter
Emacs handout.
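Search and replace in Emacs is the suggested route. If you would rather script the edit, here is a minimal Python sketch; it is not part of the required SAS work. It assumes (check this against the actual file before relying on it) that tv1.data is whitespace-delimited with one household per line, 12 fields per complete line, and that a not-at-home household's line contains only its leading fields, such as district and household number, with nothing after them. The names `pad_missing`, `clean_file` and the output file name are hypothetical.

```python
N_FIELDS = 12  # district, household, assessed value, and the 9 survey answers


def pad_missing(line: str, n_fields: int = N_FIELDS) -> str:
    """Return the line with '.' appended for each missing trailing field.

    Assumes whitespace-delimited fields and that an incomplete line is
    missing only its trailing fields (an assumption about tv1.data).
    """
    fields = line.split()
    if not fields:
        return line.rstrip("\n")  # leave blank lines alone
    if len(fields) > n_fields:
        raise ValueError(f"too many fields ({len(fields)}): {line!r}")
    fields += ["."] * (n_fields - len(fields))
    return " ".join(fields)


def clean_file(src: str, dst: str) -> None:
    """Write a copy of src in which every missing field is a period."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(pad_missing(line) + "\n")


if __name__ == "__main__":
    # A line holding only district 3, household 7 gets ten periods appended.
    print(pad_missing("3 7"))
```

For example, `clean_file("tv1.data", "tv1_clean.data")` writes a padded copy, and the SAS `infile` statement can then point at the new file. With list input, SAS reads a lone period as a missing value for a numeric variable. Look at the padded lines before running SAS: if the not-at-home lines actually contain text or gaps in the middle, this sketch does not apply.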
- Use proc means to obtain n, mean and standard
deviation for all the quantitative variables.
- Use proc freq to obtain
frequency distributions of Location, and also of Number of
persons 12 and older, Number of persons 11 and younger and Number of TV sets. Location should be nicely labelled; I used proc format. To help you check your work: I had 98 households in the Small Town location.
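If you want an independent check on the proc freq and proc means output, a small Python sketch like the following recomputes the Location counts and the n, mean and standard deviation of one column. It assumes the cleaned file described above (whitespace-delimited, one household per line, periods for missing values); the function names are hypothetical. `statistics.stdev` uses the n - 1 divisor, which matches the proc means default.

```python
import statistics
from collections import Counter

LOCATIONS = ("Rural", "Small Town", "City")  # districts 1-25, 26-50, 51-75


def location_counts(lines):
    """Count households per Location using only the district (first field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        district = int(fields[0])
        if not 1 <= district <= 75:
            raise ValueError(f"bad district {district} in line {line!r}")
        counts[LOCATIONS[(district - 1) // 25]] += 1
    return counts


def column_summary(lines, col):
    """Return (n, mean, sd) for 0-based column col, skipping '.' codes."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) > col and fields[col] != ".":
            values.append(float(fields[col]))
    n = len(values)
    mean = statistics.mean(values) if n else float("nan")
    sd = statistics.stdev(values) if n > 1 else float("nan")  # n - 1 divisor
    return n, mean, sd
```

Read the file once into a list (for example, `lines = open("tv1_clean.data").readlines()`, with a hypothetical file name) and pass it to both functions. With the variable order listed above, price willing to pay is column 6 (counting from 0), so `column_summary(lines, 6)` should agree with the proc means line for that variable. Households that were not at home still have a district, so they are counted in `location_counts`, just as proc freq counts them under Location.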
For debugging, your initial runs should include frequency distributions of
everything, but once the data are clean and your program is correct, just
produce what is requested above.
Bring your log file and your list file to the quiz. You will hand
both of them in. If you decide to use %include, you'll need to bring two log files. Also note the following:
- The quiz will have a few questions like "How many households have exactly two persons 12 and older?" and "What is the mean price people said they were willing to pay for cable TV?"
- You could fail the quiz if there are any error
messages in your log file. Substantial marks will be deducted if there
are warnings or "NOTE: Invalid data." messages. Error messages,
warnings and notes about invalid data mean there is something wrong, and
it needs to be fixed before you proceed to statistical analysis.
- You must be absolutely sure that the log file and list file come
from the same run. For Christine to check your work, the program in your log
file must be exactly the program that generated your statistical output.
- Don't delete your program; a later assignment will build on it.